This is a discussion on Re: Enums patch v2 within the Pgsql Patches forums, part of the PostgreSQL category; --> Tom Dunstan wrote: > Here is an updated version of the enums patch. It has been brought up to ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Tom Dunstan wrote: > Here is an updated version of the enums patch. It has been brought up to > date and applies against current CVS HEAD. The original email is at [1], > and describes the implementation. I'm sorry I missed the original discussions, but I have to ask: Why do we want enums in core? The only potential advantage I can see over using a look-up table and FK references is performance. And I'd rather spend time improving the performance of FK checks than add extra machinery to do the same thing in a different way. Ignoring my general dislike of enums, I have a few issues with the patch as it is: 1. What's the point of having comparison operators for enums? For most use cases, there's no natural ordering of enum values. 2. The comparison routine compares oids, right? If the oids wrap around when the enum values are created, the ordering isn't what the user expects. 3. 4 bytes per value is wasteful if you're storing simple status codes etc. Especially if you're doing this for performance, let's do no harm by wasting space. One byte seems enough for the typical use cases. I'd even argue that having a high upper limit on the number of enum values encourages misuse of the feature. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Tue, Dec 19, 2006 at 08:09:47AM +0000, Heikki Linnakangas wrote: > Tom Dunstan wrote: > >Here is an updated version of the enums patch. It has been brought up to > >date and applies against current CVS HEAD. The original email is at [1], > >and describes the implementation. > > I'm sorry I missed the original discussions, but I have to ask: Why do > we want enums in core? The only potential advantage I can see over using > a look-up table and FK references is performance. A natural ordering is another. I'd love to be able to make a type color that has Red Orange Yellow Green Blue Indigo Violet and then be able to do an ORDER BY color; > And I'd rather spend time improving the performance of FK checks > than add extra machinery to do the same thing in a different way. Not the same thing. > Ignoring my general dislike of enums, I have a few issues with the patch > as it is: > > 1. What's the point of having comparison operators for enums? For most > use cases, there's no natural ordering of enum values. A natural ordering is precisely the use case for enums. Otherwise, you just use a FK to a one-column table and have done. Cheers, D -- David Fetter <david@fetter.org> http://fetter.org/ phone: +1 415 235 3778 AIM: dfetter666 Skype: davidfetter Remember to vote! ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Heikki Linnakangas wrote: > I'm sorry I missed the original discussions, but I have to ask: Why > do we want enums in core? The only potential advantage I can see over > using a look-up table and FK references is performance. The difference is that foreign-key-referenced data is part of your data whereas enums would be part of the type system used to model the data. An objection to enums on the ground that foreign keys can accomplish the same thing could be extended to object to any data type with a finite domain. -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: > >> 1. What's the point of having comparison operators for enums? For most >> use cases, there's no natural ordering of enum values. >> > > If you would like to be able to index enum columns, or even GROUP BY one, > you need those; whether the ordering is arbitrary or not is irrelevant. > Heikki's assertion is wrong in any case. The enumeration definition defines the ordering, and I can think of plenty of use cases where it does matter. We do not use an arbitrary ordering. An enum type is an *ordered* set of string labels. Without this the feature would be close to worthless. But if a particular application doesn't need them ordered, it need not use the comparison operators. Leaving aside the uses for GROUP BY and indexes, I would ask what the justification would be for leaving off comparison operators? > >> 2. The comparison routine compares oids, right? If the oids wrap around >> when the enum values are created, the ordering isn't what the user expects. >> > > This is a fair point --- it'd be better if the ordering were not > dependent on chance OID assignments. Not sure what we are willing > to pay to have that though. > This is a non-issue. The code sorts the oids before assigning them: /* allocate oids */ oids = (Oid *) palloc(sizeof(Oid) * n); for(i = 0; i < n; i++) { oids[i] = GetNewOid(pg_enum); } /* wraparound is unlikely, but just to be safe...*/ qsort(oids, n, sizeof(Oid), oid_cmp); > >> 3. 4 bytes per value is wasteful if you're storing simple status codes >> etc. >> > > I've forgotten exactly which design Tom is proposing to implement here, > but at least one of the contenders involved storing an OID that would be > unique across all enum types. 1 byte is certainly not enough for that > and even 2 bytes would be pretty marginal. I'm unconvinced by arguments > about 2 bytes being so much better than 4 anyway --- in the majority of > real table layouts, the hoped-for savings would disappear into alignment > padding. > > > Globally unique is the design adopted, after much on-list discussion. That was a way of getting it *down* to 4 bytes. The problem is that the output routines need enough info from just the internal representation of the type value to do their work. The original suggestions was for 8 bytes - type oid + offset in value set. Having them globally unique lets us get down to 4. As for efficiency, I agree with what Tom says about alignment and padding dissolving away any perceived advantage in most cases. If we ever get around to optimising record layout we could revisit it. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Andrew Dunstan wrote: > As for efficiency, I agree with what Tom says about alignment and > padding dissolving away any perceived advantage in most cases. If we > ever get around to optimising record layout we could revisit it. I don't, because there are always those that are knowledgeable enough to know how to reduce space lost to padding. So it would be nice to have 2-byte enums on-disk, and resolve them based on the column's typid. But then, I'm not familiar with the patch at all so I'm not sure if it's possible. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Alvaro Herrera wrote: > Andrew Dunstan wrote: > > >> As for efficiency, I agree with what Tom says about alignment and >> padding dissolving away any perceived advantage in most cases. If we >> ever get around to optimising record layout we could revisit it. >> > > I don't, because there are always those that are knowledgeable enough to > know how to reduce space lost to padding. So it would be nice to have > 2-byte enums on-disk, and resolve them based on the column's typid. But > then, I'm not familiar with the patch at all so I'm not sure if it's > possible. > > The trouble is that we have one output routine for all enum types. See previous discussions about disallowing extra params to output routines. So if all we have is a 2 byte offset into the list of values for the given type, we do not have enough info to allow the output routine to deduce which particular enum type it is dealing with. With the globally unique oid approach it doesn't even need to care - it just looks up the corresponding value. Note that this was a reduction from the previously suggested (by TGL) 8 bytes. I'm not a big fan of ordering columns to optimise record layout, except in the most extreme cases (massive DW type apps). I think visible column order should be logical, not governed by physical considerations. cheers andrew ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| "Andrew Dunstan" <andrew@dunslane.net> writes: > I'm not a big fan of ordering columns to optimise record layout, except in the > most extreme cases (massive DW type apps). I think visible column order should > be logical, not governed by physical considerations. Well as long as we're talking "should"s the database should take care of this for you anyways. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Heikki Linnakangas wrote: > I'm sorry I missed the original discussions, but I have to ask: Why do > we want enums in core? The only potential advantage I can see over using > a look-up table and FK references is performance. Well, there are a few things. Sometimes its tidiness, sometimes integrity... I've seen more than one system with hundreds of these things, and they've either gone down the table-per-enum solution, with LOTS of extra tables whose values never change, or the EAV solution, with one or two globally referenced tables to which everything in your system has as a FK, and an integrity check in a trigger if you're very lucky. Yuck on both accounts. Enums hit a sweet spot in the middle and provide data integrity and performance for non-changing values. > 1. What's the point of having comparison operators for enums? For most > use cases, there's no natural ordering of enum values. Well, there are a number of cases where ordering IS important, and indeed, enums provide a way to do it easily where many of the alternative solutions do not. It's one of the key benefits. > 2. The comparison routine compares oids, right? If the oids wrap around > when the enum values are created, the ordering isn't what the user expects. As has been pointed out by others quicker on the draw than me, I do sort the OIDs at enum creation time, for exactly this reason. > 3. 4 bytes per value is wasteful if you're storing simple status codes > etc. Especially if you're doing this for performance, let's do no harm > by wasting space. One byte seems enough for the typical use cases. I'd > even argue that having a high upper limit on the number of enum values > encourages misuse of the feature. I'd really love to have these fit into a 1 or 2 byte value on disk, but AFAIK there's simply no way to do it currently in postgresql. If we ever move to a solution where on-disk representation is noticeably different from in-memory representation, then it might be doable. If that does happen, we might benefit from other improvements such as being able to order columns in a tuple on disk so as to minimize alignment padding, not having to store a composite type's oid, etc. Until that happens, though, if it ever does, this is probably the tightest on-disk representation we're likely to get, unless we're happy to impose some pretty severe restrictions, like 8 bits per enum, and only 256 enums in total (for a 2 byte total). I was already shot down trying to make similar restrictions when I first brought it up. seems to offend the least. We did discuss this somewhat earlier, and I'm happy to take alternative suggestions, but AFAIK this is about as good as it's going to get right now. Cheers Tom ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Alvaro Herrera wrote: > I don't, because there are always those that are knowledgeable enough to > know how to reduce space lost to padding. So it would be nice to have > 2-byte enums on-disk, and resolve them based on the column's typid. But > then, I'm not familiar with the patch at all so I'm not sure if it's > possible. Not with this patch, and AFAIK not possible generally, without writing separate I/O functions for each type. I'd love to be able to do that, but I don't think it's possible currently. The main stumbling block is the output function (and cast-to-text function), because output functions do not get provided the oid of the type that they're dealing with, for security reasons IIRC. It was never clear to me why I/O functions should ever be directly callable by a user (and hence open to security issues), but apparently it was enough to purge any that were designed like that from the system, so I wasn't going to go down that road with the patch. Cheers Tom ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| Peter Eisentraut wrote: > An objection to enums on the ground that foreign keys can accomplish the > same thing could be extended to object to any data type with a finite > domain. Exactly. The extreme case is the boolean type, which could easily be represented by a two-value enum. Or, if you were feeling masochistic, a FK to a separate table. Which is easier? People regularly do stuff like having domains over finite text values, or having a FK to a separate (static) table, or using some sort of EAV. Enums are type-safe, easily ordered, relatively efficient and don't leave zillions of little static tables all over the place, a combination of attributes that none of the alternative solutions in this space present. Cheers Tom ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |