Unix Technical Forum

Re: Enums patch v2

This is a discussion on Re: Enums patch v2 within the Pgsql Patches forums, part of the PostgreSQL category; --> Tom Dunstan wrote: > Here is an updated version of the enums patch. It has been brought up to ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 10:10 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Enums patch v2

Tom Dunstan wrote:
> Here is an updated version of the enums patch. It has been brought up to
> date and applies against current CVS HEAD. The original email is at [1],
> and describes the implementation.


I'm sorry I missed the original discussions, but I have to ask: Why do
we want enums in core? The only potential advantage I can see over using
a look-up table and FK references is performance. And I'd rather spend
time improving the performance of FK checks than add extra machinery to
do the same thing in a different way.

Ignoring my general dislike of enums, I have a few issues with the patch
as it is:

1. What's the point of having comparison operators for enums? For most
use cases, there's no natural ordering of enum values.

2. The comparison routine compares oids, right? If the oids wrap around
when the enum values are created, the ordering isn't what the user expects.

3. 4 bytes per value is wasteful if you're storing simple status codes
etc. Especially if you're doing this for performance, let's do no harm
by wasting space. One byte seems enough for the typical use cases. I'd
even argue that having a high upper limit on the number of enum values
encourages misuse of the feature.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 10:10 AM
David Fetter
 
Posts: n/a
Default Re: Enums patch v2

On Tue, Dec 19, 2006 at 08:09:47AM +0000, Heikki Linnakangas wrote:
> Tom Dunstan wrote:
> >Here is an updated version of the enums patch. It has been brought up to
> >date and applies against current CVS HEAD. The original email is at [1],
> >and describes the implementation.

>
> I'm sorry I missed the original discussions, but I have to ask: Why do
> we want enums in core? The only potential advantage I can see over using
> a look-up table and FK references is performance.


A natural ordering is another. I'd love to be able to make a type
color that has

Red
Orange
Yellow
Green
Blue
Indigo
Violet

and then be able to do an ORDER BY color;

> And I'd rather spend time improving the performance of FK checks
> than add extra machinery to do the same thing in a different way.


Not the same thing.

> Ignoring my general dislike of enums, I have a few issues with the patch
> as it is:
>
> 1. What's the point of having comparison operators for enums? For most
> use cases, there's no natural ordering of enum values.


A natural ordering is precisely the use case for enums. Otherwise,
you just use a FK to a one-column table and have done.

Cheers,
D
--
David Fetter <david@fetter.org> http://fetter.org/
phone: +1 415 235 3778 AIM: dfetter666
Skype: davidfetter

Remember to vote!

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 10:10 AM
Peter Eisentraut
 
Posts: n/a
Default Re: Enums patch v2

Heikki Linnakangas wrote:
> I'm sorry I missed the original discussions, but I have to ask: Why
> do we want enums in core? The only potential advantage I can see over
> using a look-up table and FK references is performance.


The difference is that foreign-key-referenced data is part of your data
whereas enums would be part of the type system used to model the data.

An objection to enums on the ground that foreign keys can accomplish the
same thing could be extended to object to any data type with a finite
domain.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 10:11 AM
Andrew Dunstan
 
Posts: n/a
Default Re: Enums patch v2

Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>
>> 1. What's the point of having comparison operators for enums? For most
>> use cases, there's no natural ordering of enum values.
>>

>
> If you would like to be able to index enum columns, or even GROUP BY one,
> you need those; whether the ordering is arbitrary or not is irrelevant.
>



Heikki's assertion is wrong in any case. The enumeration definition
defines the ordering, and I can think of plenty of use cases where it
does matter. We do not use an arbitrary ordering. An enum type is an
*ordered* set of string labels. Without this the feature would be close
to worthless. But if a particular application doesn't need them ordered,
it need not use the comparison operators. Leaving aside the uses for
GROUP BY and indexes, I would ask what the justification would be for
leaving off comparison operators?

>
>> 2. The comparison routine compares oids, right? If the oids wrap around
>> when the enum values are created, the ordering isn't what the user expects.
>>

>
> This is a fair point --- it'd be better if the ordering were not
> dependent on chance OID assignments. Not sure what we are willing
> to pay to have that though.
>


This is a non-issue. The code sorts the oids before assigning them:

/* allocate oids */
oids = (Oid *) palloc(sizeof(Oid) * n);
for(i = 0; i < n; i++)
{
oids[i] = GetNewOid(pg_enum);
}
/* wraparound is unlikely, but just to be safe...*/
qsort(oids, n, sizeof(Oid), oid_cmp);


>
>> 3. 4 bytes per value is wasteful if you're storing simple status codes
>> etc.
>>

>
> I've forgotten exactly which design Tom is proposing to implement here,
> but at least one of the contenders involved storing an OID that would be
> unique across all enum types. 1 byte is certainly not enough for that
> and even 2 bytes would be pretty marginal. I'm unconvinced by arguments
> about 2 bytes being so much better than 4 anyway --- in the majority of
> real table layouts, the hoped-for savings would disappear into alignment
> padding.
>
>
>


Globally unique is the design adopted, after much on-list discussion.
That was a way of getting it *down* to 4 bytes. The problem is that the
output routines need enough info from just the internal representation
of the type value to do their work. The original suggestions was for 8
bytes - type oid + offset in value set. Having them globally unique lets
us get down to 4.

As for efficiency, I agree with what Tom says about alignment and
padding dissolving away any perceived advantage in most cases. If we
ever get around to optimising record layout we could revisit it.

cheers

andrew


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 10:11 AM
Alvaro Herrera
 
Posts: n/a
Default Re: [HACKERS] Enums patch v2

Andrew Dunstan wrote:

> As for efficiency, I agree with what Tom says about alignment and
> padding dissolving away any perceived advantage in most cases. If we
> ever get around to optimising record layout we could revisit it.


I don't, because there are always those that are knowledgeable enough to
know how to reduce space lost to padding. So it would be nice to have
2-byte enums on-disk, and resolve them based on the column's typid. But
then, I'm not familiar with the patch at all so I'm not sure if it's
possible.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 10:11 AM
Andrew Dunstan
 
Posts: n/a
Default Re: [HACKERS] Enums patch v2

Alvaro Herrera wrote:
> Andrew Dunstan wrote:
>
>
>> As for efficiency, I agree with what Tom says about alignment and
>> padding dissolving away any perceived advantage in most cases. If we
>> ever get around to optimising record layout we could revisit it.
>>

>
> I don't, because there are always those that are knowledgeable enough to
> know how to reduce space lost to padding. So it would be nice to have
> 2-byte enums on-disk, and resolve them based on the column's typid. But
> then, I'm not familiar with the patch at all so I'm not sure if it's
> possible.
>
>


The trouble is that we have one output routine for all enum types. See
previous discussions about disallowing extra params to output routines.
So if all we have is a 2 byte offset into the list of values for the
given type, we do not have enough info to allow the output routine to
deduce which particular enum type it is dealing with. With the globally
unique oid approach it doesn't even need to care - it just looks up the
corresponding value. Note that this was a reduction from the previously
suggested (by TGL) 8 bytes.

I'm not a big fan of ordering columns to optimise record layout, except
in the most extreme cases (massive DW type apps). I think visible column
order should be logical, not governed by physical considerations.

cheers

andrew



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 10:11 AM
Gregory Stark
 
Posts: n/a
Default Re: [HACKERS] Enums patch v2

"Andrew Dunstan" <andrew@dunslane.net> writes:

> I'm not a big fan of ordering columns to optimise record layout, except in the
> most extreme cases (massive DW type apps). I think visible column order should
> be logical, not governed by physical considerations.


Well as long as we're talking "should"s the database should take care of this
for you anyways.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 10:11 AM
Tom Dunstan
 
Posts: n/a
Default Re: Enums patch v2

Heikki Linnakangas wrote:
> I'm sorry I missed the original discussions, but I have to ask: Why do
> we want enums in core? The only potential advantage I can see over using
> a look-up table and FK references is performance.


Well, there are a few things. Sometimes its tidiness, sometimes
integrity... I've seen more than one system with hundreds of these
things, and they've either gone down the table-per-enum solution, with
LOTS of extra tables whose values never change, or the EAV solution,
with one or two globally referenced tables to which everything in your
system has as a FK, and an integrity check in a trigger if you're very
lucky. Yuck on both accounts. Enums hit a sweet spot in the middle and
provide data integrity and performance for non-changing values.

> 1. What's the point of having comparison operators for enums? For most
> use cases, there's no natural ordering of enum values.


Well, there are a number of cases where ordering IS important, and
indeed, enums provide a way to do it easily where many of the
alternative solutions do not. It's one of the key benefits.

> 2. The comparison routine compares oids, right? If the oids wrap around
> when the enum values are created, the ordering isn't what the user expects.


As has been pointed out by others quicker on the draw than me, I do sort
the OIDs at enum creation time, for exactly this reason.

> 3. 4 bytes per value is wasteful if you're storing simple status codes
> etc. Especially if you're doing this for performance, let's do no harm
> by wasting space. One byte seems enough for the typical use cases. I'd
> even argue that having a high upper limit on the number of enum values
> encourages misuse of the feature.


I'd really love to have these fit into a 1 or 2 byte value on disk, but
AFAIK there's simply no way to do it currently in postgresql. If we ever
move to a solution where on-disk representation is noticeably different
from in-memory representation, then it might be doable. If that does
happen, we might benefit from other improvements such as being able to
order columns in a tuple on disk so as to minimize alignment padding,
not having to store a composite type's oid, etc. Until that happens,
though, if it ever does, this is probably the tightest on-disk
representation we're likely to get, unless we're happy to impose some
pretty severe restrictions, like 8 bits per enum, and only 256 enums in
total (for a 2 byte total). I was already shot down trying to make
similar restrictions when I first brought it up. The OID solution
seems to offend the least.

We did discuss this somewhat earlier, and I'm happy to take alternative
suggestions, but AFAIK this is about as good as it's going to get right now.

Cheers

Tom

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 10:11 AM
Tom Dunstan
 
Posts: n/a
Default Re: [HACKERS] Enums patch v2

Alvaro Herrera wrote:
> I don't, because there are always those that are knowledgeable enough to
> know how to reduce space lost to padding. So it would be nice to have
> 2-byte enums on-disk, and resolve them based on the column's typid. But
> then, I'm not familiar with the patch at all so I'm not sure if it's
> possible.


Not with this patch, and AFAIK not possible generally, without writing
separate I/O functions for each type. I'd love to be able to do that,
but I don't think it's possible currently. The main stumbling block is
the output function (and cast-to-text function), because output
functions do not get provided the oid of the type that they're dealing
with, for security reasons IIRC. It was never clear to me why I/O
functions should ever be directly callable by a user (and hence open to
security issues), but apparently it was enough to purge any that were
designed like that from the system, so I wasn't going to go down that
road with the patch.

Cheers

Tom



---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 10:11 AM
Tom Dunstan
 
Posts: n/a
Default Re: Enums patch v2

Peter Eisentraut wrote:
> An objection to enums on the ground that foreign keys can accomplish the
> same thing could be extended to object to any data type with a finite
> domain.


Exactly. The extreme case is the boolean type, which could easily be
represented by a two-value enum. Or, if you were feeling masochistic, a
FK to a separate table. Which is easier?

People regularly do stuff like having domains over finite text values,
or having a FK to a separate (static) table, or using some sort of EAV.
Enums are type-safe, easily ordered, relatively efficient and don't
leave zillions of little static tables all over the place, a combination
of attributes that none of the alternative solutions in this space present.

Cheers

Tom


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:38 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com