Unix Technical Forum

Change sort order on UUIDs?

This is a discussion on Change sort order on UUIDs? within the pgsql Hackers forums, part of the PostgreSQL category; --> I've been testing the new UUID functionality in 8.3dev and noticed that UUIDs are sorted using memcmp in their ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 09:07 AM
Robert Wojciechowski
 
Posts: n/a
Default Change sort order on UUIDs?

I've been testing the new UUID functionality in 8.3dev and noticed that
UUIDs are sorted using memcmp in their default in-memory layout, which
is:



struct uuid {

uint32_t time_low;

uint16_t time_mid;

uint16_t time_hi_and_version;

uint8_t clock_seq_hi_and_reserved;

uint8_t clock_seq_low;

uint8_t node[_UUID_NODE_LEN];

};



When done that way, you're going to see a lot of index B-tree
fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
as described above. With random (version 4) or hashed based (version 3
or 5) UUIDs there's nothing that can be done to improve the situation,
obviously.



So I went down the path of changing the pgsql sorting order to instead
sort by, from most significant to least:



1) Node (MAC address),

2) clock sequence, then

3) time.



The implementation is as follows:



/* internal uuid compare function */

static int

uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2)

{

int result;



/* node */

if ((result = memcmp(&arg1->data[10], &arg2->data[10], 6)) != 0)

return result;



/* clock_seq_hi_and_reserved, clock_seq_low */

if ((result = memcmp(&arg1->data[8], &arg2->data[8], 2)) != 0)

return result;



/* time_hi_and_version */

if ((result = memcmp(&arg1->data[6], &arg2->data[6], 2)) != 0)

return result;



/* time_mid */

if ((result = memcmp(&arg1->data[4], &arg2->data[4], 2)) != 0)

return result;



/* time_low */

return memcmp(&arg1->data[0], &arg2->data[0], 4);

}



This results in much less fragmentation and reduced page hits when
indexing a UUID column. When multiple UUID generators with different
node values contribute to a single table concurrently, it should also
result in better performance than if they sorted the way they do now or
by time first.



Sorting UUIDs when they are random/hashed with memcmp seems pretty darn
useless in all scenarios and performs poorly on indexes. This method is
equally poor with random/hashed UUIDs, but much better with version 1
time based UUIDs.



What do you guys think about changing the default behavior of pgsql to
compare UUIDs this way?



-- Robert


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 09:07 AM
Tom Lane
 
Posts: n/a
Default Re: Change sort order on UUIDs?

"Robert Wojciechowski" <robertw@expressyard.com> writes:
> I've been testing the new UUID functionality in 8.3dev and noticed that
> UUIDs are sorted using memcmp in their default in-memory layout,
> ...
> When done that way, you're going to see a lot of index B-tree
> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,


This claim seems like nonsense. Btrees don't care about the ordering
details of what they index.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 09:07 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Change sort order on UUIDs?

Tom Lane wrote:
> "Robert Wojciechowski" <robertw@expressyard.com> writes:
>> I've been testing the new UUID functionality in 8.3dev and noticed that
>> UUIDs are sorted using memcmp in their default in-memory layout,
>> ...
>> When done that way, you're going to see a lot of index B-tree
>> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,

>
> This claim seems like nonsense. Btrees don't care about the ordering
> details of what they index.


I believe he means that with his modified comparison function, when
inserting a series of UUIDs with increasing time-fields, the index keys
are always inserted to the rightmost page, which gives a more tightly
packed index than scattered inserts all-around the index.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 09:07 AM
Tom Lane
 
Posts: n/a
Default Re: Change sort order on UUIDs?

Heikki Linnakangas <heikki@enterprisedb.com> writes:
> I believe he means that with his modified comparison function, when
> inserting a series of UUIDs with increasing time-fields, the index keys
> are always inserted to the rightmost page, which gives a more tightly
> packed index than scattered inserts all-around the index.


Hm. Still, given that that benefit would only accrue for one version of
uuid generation, it's a pretty weak argument.

The concrete reason for not changing it is that the sort ordering of
uuids would then look quite unnatural compared to the display format.
Which would provoke confusion and bug reports...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 09:07 AM
Robert Wojciechowski
 
Posts: n/a
Default Re: Change sort order on UUIDs?

> >> I've been testing the new UUID functionality in 8.3dev and noticed
that
> >> UUIDs are sorted using memcmp in their default in-memory layout,
> >> ...
> >> When done that way, you're going to see a lot of index B-tree
> >> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based

UUIDs,
> >
> > This claim seems like nonsense. Btrees don't care about the

ordering
> > details of what they index.

>
> I believe he means that with his modified comparison function, when
> inserting a series of UUIDs with increasing time-fields, the index

keys
> are always inserted to the rightmost page, which gives a more tightly
> packed index than scattered inserts all-around the index.
>


That was my thinking; that it would speed up (bulk) inserts causing
fewer page splits.

I'm also using my own contrib module that uses FreeBSD's uuid_create
generating DCE 1.1 UUIDs, which keeps the state of the UUID generator in
the kernel. The 8.3 contrib module based on uuid-ossp seems to 1) not
compile on FreeBSD (conflicts with uuid.h from the OS) and 2) randomizes
the clock sequence as there is no state stored between invocations.

The other thing this modification does is allow ORDER BY to order by
time when possible, which is a nice default behavior as well, yes?

-- Robert

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 09:07 AM
Robert Wojciechowski
 
Posts: n/a
Default Re: Change sort order on UUIDs?

> Heikki Linnakangas <heikki@enterprisedb.com> writes:
> > I believe he means that with his modified comparison function, when
> > inserting a series of UUIDs with increasing time-fields, the index

keys
> > are always inserted to the rightmost page, which gives a more

tightly
> > packed index than scattered inserts all-around the index.

>
> Hm. Still, given that that benefit would only accrue for one version

of
> uuid generation, it's a pretty weak argument.
>
> The concrete reason for not changing it is that the sort ordering of
> uuids would then look quite unnatural compared to the display format.
> Which would provoke confusion and bug reports...
>
> regards, tom lane


If it improves non-user controllable indexing behavior, doesn't
negatively affect the indexing of random/hash based UUIDs, and only
seems to affect ordering for the display format, it seems worth it to
me.

A paragraph in the documentation stating how UUIDs are sorted seems to
satisfy the visual ordering concern, which is more than what Microsoft
is doing (I had to dig for a blog post to find this out.)

In addition it would be very odd to sort random/hashed GUIDs and expect
anything that in meaningful, anyway. If the user wants to see a UUID
lexographically sorted, they could also cast the column to text like so:

select uuid_column from uuid_test order by uuid_column::text;

.... which produces the desired output for visual analysis if that was
desired while still retaining all the other benefits.

I'll continue thinking about any other downsides to this tonight, too.

-- Robert

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 09:07 AM
Gregory Stark
 
Posts: n/a
Default Re: Change sort order on UUIDs?


"Robert Wojciechowski" <robertw@expressyard.com> writes:

> When done that way, you're going to see a lot of index B-tree
> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
> as described above. With random (version 4) or hashed based (version 3
> or 5) UUIDs there's nothing that can be done to improve the situation,
> obviously.


Is this based on empirical results or just a theory? I'm asking because it's
actually a common technique to reverse the natural index key to construct
basically exactly this situation -- for performance reasons. The idea is that
low order bits have higher cardinality and that that can *improve* btree
performance by avoiding contention.

I'm not sure how much I believe in the effectiveness of that strategy myself
or for that matter whether it's universally applicable or only useful in
certain types of loads.

I'm not saying you're wrong, but I'm not sure it's a simple open and shut case
either.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 09:07 AM
Gregory Stark
 
Posts: n/a
Default Re: Change sort order on UUIDs?

"Robert Wojciechowski" <robertw@expressyard.com> writes:

> That was my thinking; that it would speed up (bulk) inserts causing
> fewer page splits.


Ah, I understand better now. hm. high data density would be good for reading.
But I think the case for inserting is actually quite mixed. If you have lots
of processes trying to insert you'll actually get poorer performance because
they'll all have to get access to the same page. Worse, you'll probably have a
unique index.

> The other thing this modification does is allow ORDER BY to order by
> time when possible, which is a nice default behavior as well, yes?


I think that actually is quite a nice effect. Certainly the loss of it is one
of the big practical disadvantages of using UUIDs over a sequence.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 09:07 AM
mark@mark.mielke.cc
 
Posts: n/a
Default Re: Change sort order on UUIDs?

On Thu, Jun 14, 2007 at 03:38:44PM -0400, Robert Wojciechowski wrote:
> I've been testing the new UUID functionality in 8.3dev and noticed that
> UUIDs are sorted using memcmp in their default in-memory layout, which
> is:
> struct uuid {
> uint32_t time_low;
> uint16_t time_mid;
> uint16_t time_hi_and_version;
> uint8_t clock_seq_hi_and_reserved;
> uint8_t clock_seq_low;
> uint8_t node[_UUID_NODE_LEN];
> };
> When done that way, you're going to see a lot of index B-tree
> fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs,
> as described above. With random (version 4) or hashed based (version 3
> or 5) UUIDs there's nothing that can be done to improve the situation,
> obviously.


I suggest that treating the UUID as anything other than a unique
random value is a mistake. There should be no assumptions by users
with regard to how the order is displayed. Also, as UUID generation
based on time is always in sequence, it seems to me that sorting by
UUID time would have the effect of inserts always being to the end of
the index. While this might pack tightly, wouldn't this hurt
concurrency? Random access vs sequential performance. For UUID, I
would value random access before sequential performance. Why would
anybody scan UUID through the index in "sequential" order?

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
.. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 09:07 AM
rse+jquery-discuss@engelschall.com
 
Posts: n/a
Default Re: Change sort order on UUIDs?

On Jun 14, 11:26 pm, robe...@expressyard.com ("Robert Wojciechowski")
wrote:

> [...]
> I'm also using my own contrib module that uses FreeBSD's uuid_create
> generating DCE 1.1 UUIDs, which keeps the state of the UUID generator in
> the kernel. The 8.3 contrib module based on uuid-osspseems to 1) not
> compile on FreeBSD (conflicts with uuid.h from the OS) and 2) randomizes
> the clock sequence as there is no state stored between invocations.
> [...]


That's correct. But the next (still unreleased) version of OSSP uuid
for this purpose contains an optional "context" object where those
states are stored. Once this version is released (soon) and PostgreSQL
uses this context feature (just a few lines of code changes necessary)
this drawback will be gone.

> The other thing this modification does is allow ORDER BY to order by
> time when possible, which is a nice default behavior as well, yes?


This works for version 1 UUIDs only. I don't know which version
PostgreSQL currently uses. If it is version 1, then this is true.
For other versions this doesn work. The FreeBSD UUID generator
BTW just supports version 1 anyway...

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:16 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com