Unix Technical Forum

Patch for UUID datatype (beta)

  #1 (permalink)  
Old 04-18-2008, 10:00 AM
Gevik Babakhani
 
Posts: n/a
Default Patch for UUID datatype (beta)

Folks,

The following patch implements the UUID datatype. I would like to send
this beta patch to see if I am still on the right track. Please send
your comments.

Description of UUID:

- The type is called uuid.
- btree and hash indexes are supported.
- uuid array is supported.
- uuid text i/o is supported.
- uuid binary i/o is supported.
- uuid_to_text and text_to_uuid casting is supported.
- uuid_to_varchar and varchar_to_uuid casting is supported.
- the < <= = >= > <> operators are supported. Please note that some of
these operators have no mathematical meaning for UUIDs and are only good
for sorting.

- new_guid() function is supported. This function is based on the V4 random
uuid value. It generates 16 random bytes and sets the uuid 'variant' and
'version' bits. According to the docs it is not guaranteed to produce unique
values, but I have inserted 6 million records and it did not create any
duplicates.

- the uuid datatype supports 3 input formats:
1. "00000000-0000-0000-0000-000000000000"
2. "00000000000000000000000000000000"
3. "{00000000-0000-0000-0000-000000000000}"

- the uuid datatype supports the output format defined by the RFC:
"00000000-0000-0000-0000-000000000000"

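To make the three input formats and the single output format concrete, here is a minimal sketch in Python. It is not taken from the patch: parse_uuid and format_uuid are hypothetical names, and the sketch assumes the standard 32 hex digit form (groups of 8-4-4-4-12). It is slightly more lenient than the list above in that it also accepts braces around the unhyphenated form.

```python
import re

# Hypothetical helpers for illustration; the patch itself is C code inside
# PostgreSQL and may validate its input differently.
_HEX32 = re.compile(r"[0-9a-fA-F]{32}")


def parse_uuid(text: str) -> bytes:
    """Accept the hyphenated, plain, or brace-wrapped form; return 16 bytes."""
    s = text.strip()
    # Format 3: strip one pair of surrounding braces.
    if s.startswith("{") and s.endswith("}"):
        s = s[1:-1]
    # Format 1: hyphens must split the digits into 8-4-4-4-12.
    if "-" in s:
        groups = s.split("-")
        if [len(g) for g in groups] != [8, 4, 4, 4, 12]:
            raise ValueError(f"invalid uuid syntax: {text!r}")
        s = "".join(groups)
    # Format 2 (and the stripped formats 1 and 3): exactly 32 hex digits.
    if not _HEX32.fullmatch(s):
        raise ValueError(f"invalid uuid syntax: {text!r}")
    return bytes.fromhex(s)


def format_uuid(raw: bytes) -> str:
    """Render 16 bytes in the lowercase 8-4-4-4-12 output form."""
    if len(raw) != 16:
        raise ValueError("a uuid is exactly 16 bytes")
    h = raw.hex()
    return "-".join((h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]))
```

All three spellings of the same value parse to the same 16 bytes, so comparison and indexing only ever see one representation.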

Areas yet in development and testing:

- uuid array indexing.
- testing with joins (merge,hash,gin)
- new_guid() fail proof testing
- performance testing
- testing with internal storage and compression.
- regression test addition
- proper documentation
- overall sanity testing/checking

Please note that I consider this a beta patch.
You can download it from:
http://www.truesoftware.net/pgsql/uuid/patch-0.1/


Regards,
Gevik.









---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

  #2 (permalink)  
Old 04-18-2008, 10:00 AM
Andreas Pflug
 
Posts: n/a
Default Re: Patch for UUID datatype (beta)

Gevik Babakhani wrote:
> - new_guid() function is supported. This function is based on V4 random
> uuid value. It generated 16 random bytes with uuid 'variant' and
> 'version'. It is not guaranteed to produce unique values


Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
there's a commonly accepted algorithm providing this.

Regards,
Andreas


  #3 (permalink)  
Old 04-18-2008, 10:00 AM
Gevik Babakhani
 
Posts: n/a
Default Re: Patch for UUID datatype (beta)

On Mon, 2006-09-18 at 09:21 +0200, Andreas Pflug wrote:
> Gevik Babakhani wrote:
> > - new_guid() function is supported. This function is based on V4 random
> > uuid value. It generated 16 random bytes with uuid 'variant' and
> > 'version'. It is not guaranteed to produce unique values

>
> Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> there's a commonly accepted algorithm providing this.
>


Uniqueness is never guaranteed; that is according to the RFC docs.
However, new_guid() generates a random value in a range of roughly 256^16
(2^128, less the few bits fixed by the version and variant fields). The
random value is in turn based on PG's randomizer, which is a very good one.

Uniqueness is never guaranteed in the sense that there is a tiny
chance someone on the other side of the planet might generate the same
guid. Or if you set your PC's clock back to the past (1981) you have a
tiny chance of generating the same guid twice.

I am running a test that has been going on for the past two days; it has
generated over 14 million guids with new_guid() and has produced no
duplicates yet.
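
For readers who want to see what "16 random bytes with variant and version" amounts to, here is a rough Python sketch. It is an illustration only, not the code from the patch: new_guid_sketch is a made-up name, and it draws its bytes from os.urandom rather than from PostgreSQL's random number generator.

```python
import os
import uuid


def new_guid_sketch() -> bytes:
    """Return 16 random bytes stamped as an RFC 4122 version 4 UUID."""
    raw = bytearray(os.urandom(16))      # assumption: OS randomness, not PG's
    raw[6] = (raw[6] & 0x0F) | 0x40      # high nibble of byte 6 = version 4
    raw[8] = (raw[8] & 0x3F) | 0x80      # top two bits of byte 8 = variant 10
    return bytes(raw)


# The standard library agrees with the stamping above.
u = uuid.UUID(bytes=new_guid_sketch())
print(u, u.version, u.variant == uuid.RFC_4122)
```

Six of the 128 bits are fixed by the version and variant fields, so 122 bits are actually random.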

Regards,
Gevik


> Regards,
> Andreas
>
>



  #4 (permalink)  
Old 04-18-2008, 10:00 AM
Harald Armin Massa
 
Posts: n/a
Default Re: Patch for UUID datatype (beta)

Gevik,
>uniqueness is never a guaranteed. that is according to the RFC docs.


>uniqueness is never a guaranteed in the sense that there is a tiny
>chance someone of the other side of the planet might generate the same
>guid.


As far as I have learned, it is recommended to give information about the
"grade of uniqueness". I think it would be valuable to know which information
your UUID generator takes into account, and what its "grade of uniqueness"
is.

(I know of the Windows UUID, which takes the MAC address of the installed
Ethernet card into its calculation, which may be guaranteed to be unique)


Some more questions about UUIDs and your patch:

a) compatibility of UUIDs -> I have generated a lot of UUIDs via the WIN32
provided function (for the unix-only-people: Windows uses UUIDs all around
its registry, software IDs and on and on). How unique are those UUIDs when
mixed with "your" UUIDs ?

b) I read some time ago about the problems with UUIDs as primary keys in
contrast to serials: serials get produced in ascending order; and often data
which was produced in one timespan is also connected semantically. "near
serial values" are also local within a btree-index; but UUIDs generated in
"near times" are usually spread around the possible bitranges.
(example for sequence of serials: 1 - 2 - 3 - 4 - 5 - 6
example for sequence of UUIDs : 1 - 999919281921843191 - 782 -
18291831912318971231)
that is supposed to affect the locality of the index, and from that also the
performance of the system.

I do not know how valid this information is, so I am asking you for your
feedback, especially since you put a lot of thought into this UUID patch.
Maybe you have already taken care of this situation when constructing the
index operators?
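
The locality concern in (b) can be illustrated with a toy model in Python. This is only a sketch under strong assumptions: a sorted list stands in for the leaf level of a btree, tail_insert_fraction is a made-up helper, and nothing here measures real PostgreSQL behaviour.

```python
import bisect
import random


def tail_insert_fraction(keys):
    """Fraction of inserts that land at the right-hand end of a sorted list.

    The sorted list is a crude stand-in for the leaf level of a btree: an
    insert at the end touches the "rightmost page", anything else lands
    somewhere in the middle of the index.
    """
    ordered = []
    at_tail = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos == len(ordered):
            at_tail += 1
        ordered.insert(pos, key)
    return at_tail / len(keys)


n = 2000
serial_keys = list(range(1, n + 1))          # like a serial column
rng = random.Random(42)                      # fixed seed, reproducible run
random_keys = [rng.getrandbits(128) for _ in range(n)]  # like random UUIDs

serial_fraction = tail_insert_fraction(serial_keys)
random_fraction = tail_insert_fraction(random_keys)
print(serial_fraction, random_fraction)
```

Serial keys always append at the end, while random 128-bit keys almost never do; they scatter across the whole key range, which is the effect on index locality being asked about.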

Thanks

Harald






--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.

  #5 (permalink)  
Old 04-18-2008, 10:00 AM
Tom Lane
 
Posts: n/a
Default Re: Patch for UUID datatype (beta)

Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> there's a commonly accepted algorithm providing this.


Anyone who thinks UUIDs are guaranteed unique has been drinking too much
of the kool-aid. They're at best probably unique. Some generator
algorithms might make it more probable than others, but you simply
cannot "guarantee" it for UUIDs generated on noncommunicating machines.

One of the big reasons that I'm hesitant to put a UUID generation
function into core is the knowledge that none of them are or can be
perfect ... so people might need different ones depending on local
conditions. I'm inclined to think that a reasonable setup would put
the datatype (with input, output, comparison and indexing support)
into core, but provide a generation function as a contrib module,
making it easily replaceable.

regards, tom lane

  #6 (permalink)  
Old 04-18-2008, 10:00 AM
Gevik Babakhani
 
Posts: n/a
Default Re: Patch for UUID datatype (beta)

Completely agreed. I can remove the function from the patch. The
temptation was just too high not to include the new_guid() in the
patch


On Mon, 2006-09-18 at 10:33 -0400, Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> > Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> > there's a commonly accepted algorithm providing this.

>
> Anyone who thinks UUIDs are guaranteed unique has been drinking too much
> of the kool-aid. They're at best probably unique. Some generator
> algorithms might make it more probable than others, but you simply
> cannot "guarantee" it for UUIDs generated on noncommunicating machines.
>
> One of the big reasons that I'm hesitant to put a UUID generation
> function into core is the knowledge that none of them are or can be
> perfect ... so people might need different ones depending on local
> conditions. I'm inclined to think that a reasonable setup would put
> the datatype (with input, output, comparison and indexing support)
> into core, but provide a generation function as a contrib module,
> making it easily replaceable.
>
> regards, tom lane
>



  #7 (permalink)  
Old 04-18-2008, 10:00 AM
mark@mark.mielke.cc
 
Posts: n/a
Default Re: [HACKERS] Patch for UUID datatype (beta)

On Mon, Sep 18, 2006 at 10:33:22AM -0400, Tom Lane wrote:
> Andreas Pflug <pgadmin@pse-consulting.de> writes:
> > Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> > there's a commonly accepted algorithm providing this.

> Anyone who thinks UUIDs are guaranteed unique has been drinking too much
> of the kool-aid. They're at best probably unique. Some generator
> algorithms might make it more probable than others, but you simply
> cannot "guarantee" it for UUIDs generated on noncommunicating machines.


The versions that include a MAC address, time, and serial number for
the machine come pretty close, presuming that the user has not
overwritten the MAC address with something else. It's unique at
manufacturing time. If the generation is performed from a library
with the same state, on the same machine, on the off chance that you
do request multiple generations at the same exact time (from my
experience, this is already unlikely) the serial number should be
bumped for that time.

So yeah - if you set your MAC address, or if your machine time is ever
set back, or if you assume a serial number of 0 each time (generation
routine isn't shared among processes on the system), you can get overlap.
All of these can be controlled, making it possible to eliminate overlap.
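
As a rough illustration of the MAC + time + serial number scheme, Python's standard uuid module can build such version 1 UUIDs. This is a sketch, not anything from the patch; the node and clock sequence values below are made up, and I am assuming the "serial number" above corresponds to what RFC 4122 calls the clock sequence.

```python
import uuid

# Hypothetical values for illustration only.
NODE = 0x0123456789AB      # 48-bit node id, normally the card's MAC address
CLOCK_SEQ = 0              # 14-bit clock sequence (the "serial number")

a = uuid.uuid1(node=NODE, clock_seq=CLOCK_SEQ)
b = uuid.uuid1(node=NODE, clock_seq=CLOCK_SEQ)

# All three ingredients are recoverable from the generated value.
print(a, hex(a.node), a.clock_seq, a.time)
print(b, hex(b.node), b.clock_seq, b.time)
```

With the node and clock sequence pinned, the two values should differ only in their timestamp field, which is why a clock that is set back, or a second generator on the same machine starting from the same state, can produce overlap.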

> One of the big reasons that I'm hesitant to put a UUID generation
> function into core is the knowledge that none of them are or can be
> perfect ... so people might need different ones depending on local
> conditions. I'm inclined to think that a reasonable setup would put
> the datatype (with input, output, comparison and indexing support)
> into core, but provide a generation function as a contrib module,
> making it easily replaceable.


I have UUID generation in core in my current implementation. In the
last year that I've been using it, I have already chosen twice to
generate UUIDs from my calling program. I find it faster, as it avoids
having to call out to PostgreSQL twice: once to generate the UUID, and
once to insert the row using it. I have no strong need for UUID
generation to be in core, and believe there do exist strong reasons
not to. Performance is better when not in core. Portability of
PostgreSQL is better when not in core. Ability to control how UUID is
defined is better when not in core.

The only thing an in-core version provides is convenience for those
that do not have easy access to a UUID generation library. I don't
care for that convenience.

Cheers,
mark

--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
.. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/


  #8 (permalink)  
Old 04-18-2008, 10:00 AM
Jim C. Nasby
 
Posts: n/a
Default Re: [HACKERS] Patch for UUID datatype (beta)

If you're going to yank it, please at least include a generator in
contrib.

Personally, I'd like to see at least some kind of generator in core,
with appropriate info/disclaimers in the docs. A simple random-number
generator is probably the best way to go in that regard. I think that
most people know that UUID generation isn't 100.00000% perfect.

BTW, at a former company we used SHA1s to identify files that had been
uploaded. We were wondering about the odds of 2 different files hashing to
the same value and found some statistical comparisons of probabilities.
I don't recall the details, but the odds of duplicating a SHA1 (1 in
2^160) are so insanely small that it's hard to find anything in the
physical world that compares. To duplicate random 256^256 numbers you'd
probably have to search until the heat-death of the universe.

On Mon, Sep 18, 2006 at 05:14:22PM +0200, Gevik Babakhani wrote:
> Completely agreed. I can remove the function from the patch. The
> temptation was just too high not to include the new_guid() in the
> patch
>
>
> On Mon, 2006-09-18 at 10:33 -0400, Tom Lane wrote:
> > Andreas Pflug <pgadmin@pse-consulting.de> writes:
> > > Isn't guaranteed uniqueness the very attribute that's expected? AFAIK
> > > there's a commonly accepted algorithm providing this.

> >
> > Anyone who thinks UUIDs are guaranteed unique has been drinking too much
> > of the kool-aid. They're at best probably unique. Some generator
> > algorithms might make it more probable than others, but you simply
> > cannot "guarantee" it for UUIDs generated on noncommunicating machines.
> >
> > One of the big reasons that I'm hesitant to put a UUID generation
> > function into core is the knowledge that none of them are or can be
> > perfect ... so people might need different ones depending on local
> > conditions. I'm inclined to think that a reasonable setup would put
> > the datatype (with input, output, comparison and indexing support)
> > into core, but provide a generation function as a contrib module,
> > making it easily replaceable.
> >
> > regards, tom lane
> >

>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>


--
Jim Nasby jimn@enterprisedb.com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)

  #9 (permalink)  
Old 04-18-2008, 10:00 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: [HACKERS] Patch for UUID datatype (beta)

On Mon, Sep 18, 2006 at 04:00:22PM -0500, Jim C. Nasby wrote:
> BTW, at a former company we used SHA1s to identify files that had been
> uploaded. We were wondering on the odds of 2 different files hashing to
> the same value and found some statistical comparisons of probabilities.
> I don't recall the details, but the odds of duplicating a SHA1 (1 in
> 2^160) are so insanely small that it's hard to find anything in the
> physical world that compares. To duplicate random 256^256 numbers you'd
> probably have to search until the heat-death of the universe.


The birthday paradox gives you about 2^80 (about 10^24) files before a
SHA1 match, which is huge enough as it is. AIUI a UUID is only 128
bits, so that would make 2^64 (about 10^19) random strings before you
get a duplicate. Embed the time in there and the chance becomes
*really* small, because then you have to get it in the same second.
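
The birthday estimate can be checked with a few lines of Python. This is a sketch using the usual approximation p = 1 - exp(-n(n-1)/2N) for n uniform draws from N possible values; the function names are made up for this example.

```python
import math


def collision_probability(n: int, bits: int) -> float:
    """Approximate chance of at least one duplicate among n random values
    drawn uniformly from 2**bits possibilities (birthday approximation)."""
    space = 2.0 ** bits
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n * (n - 1) / (2.0 * space))


def draws_for_even_odds(bits: int) -> float:
    """Approximate number of draws at which a duplicate becomes 50% likely."""
    return math.sqrt(2.0 * math.log(2.0)) * 2.0 ** (bits / 2)


print(draws_for_even_odds(160))                 # SHA1-sized values
print(draws_for_even_odds(128))                 # a fully random 128-bit value
print(collision_probability(14_000_000, 122))   # 14 million V4 UUIDs
```

The even-odds point lands slightly above 2^(bits/2), which matches the 2^80 and 2^64 figures. For 14 million values with 122 random bits the probability comes out on the order of 10^-23, so a test of that size finding no duplicates is exactly what one would expect.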

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.


  #10 (permalink)  
Old 04-18-2008, 10:01 AM
Jim C. Nasby
 
Posts: n/a
Default Re: [HACKERS] Patch for UUID datatype (beta)

On Mon, Sep 18, 2006 at 12:23:16PM -0400, mark@mark.mielke.cc wrote:
> I have UUID generation in core in my current implementation. In the
> last year that I've been using it, I have already chosen twice to
> generate UUIDs from my calling program. I find it faster, as it avoids
> have to call out to PostgreSQL twice. Once to generate the UUID, and
> once to insert the row using it. I have no strong need for UUID
> generation to be in core, and believe there does exist strong reasons
> not to. Performance is better when not in core. Portability of
> PostgreSQL is better when not in core. Ability to control how UUID is
> defined is better when not in control.


That's kinda short-sighted. You're assuming that the only place you'll
want to generate UUIDs is outside the database. What about a stored
procedure that's adding data to the database? How about populating a
table via a SELECT INTO? There's any number of cases where you'd want to
generate a UUID inside the database.

> The only thing an in-core version provides is convenience for those
> that do not have easy access to a UUID generation library. I don't
> care for that convenience.


It's not about access to a library, it's about how do you get to that
library from inside the database, which may not be very easy.

You may not care for that convenience, but I certainly would.
--
Jim Nasby jimn@enterprisedb.com
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
