Unix Technical Forum

Re: Re: Which qsort is used

This is a discussion on Re: Re: Which qsort is used within the pgsql Hackers forums, part of the PostgreSQL category; --> > -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > Sent: Friday, December 16, 2005 9:03 PM > To: Dann ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 07:23 AM
Dann Corbit
 
Posts: n/a
Default Re: Re: Which qsort is used

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, December 16, 2005 9:03 PM
> To: Dann Corbit
> Cc: Qingqing Zhou; Bruce Momjian; Luke Lonergan; Neil Conway; pgsql-
> hackers@postgresql.org
> Subject: Re: [HACKERS] Re: Which qsort is used
>
> "Dann Corbit" <DCorbit@connx.com> writes:
> > Both of them are modified versions of Bentley's sort algorithm.

>
> Jon Bentley of Bell Labs? Small world ... he was my thesis adviser
> for awhile when he was at CMU. He's a good algorithms man, for sure.
>
> > I just added the in-order check to both of them, and the reversed
> > partition check for the second method (in the case of the subfiles
> > because insertion sort is allergic to reversed partitions).

>
> I've still got a problem with these checks; I think they are a net
> waste of cycles on average. They would only be a win if you expected

a
> nontrivial percentage of perfectly-in-order or perfectly-reverse-order
> inputs. What I think is much more probable in the Postgres

environment
> is almost-but-not-quite-ordered inputs --- eg, a table that was
> perfectly ordered by key when filled, but some of the tuples have

since
> been moved by UPDATEs. This is the worst possible case for the

in-order
> checks, because they can then grovel over large percentages of the

file
> before failing ... and when they fail, those cycles are entirely

wasted;
> you have not advanced the state of the sort at all.
>
> For the "usual" case of randomly ordered input, of course it doesn't
> matter much at all because the in-order checks will fail after

examining
> not too many items. But to argue that the checks are worth making,

you
> have to assume that perfectly-ordered inputs are more common than
> almost-ordered inputs, and I think that is exactly the wrong

assumption
> for the Postgres environment. I sure haven't seen any evidence that
> it's a good assumption.


The benchmarks say that they (order checks) are a good idea on average
for ordered data, random data, and partly ordered data.

It does not require perfectly ordered data for the checks to be useful.
On mostly ordered data, it is likely that some partitions are perfectly
ordered.

If you trace the algorithms in a debugger you will be surprised at how
often the partitions are ordered, even with random sets as input.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-11-2008, 07:23 AM
Qingqing Zhou
 
Posts: n/a
Default Re: Re: Which qsort is used



On Sat, 17 Dec 2005, Dann Corbit wrote:

>
> The benchmarks say that they (order checks) are a good idea on average
> for ordered data, random data, and partly ordered data.
>


I interpret that in linux, 5000000 seems a divide for qsortpdq. Before
that number, it wins, after that, bsd wins more. On SunOS, qsortpdq takes
the lead till the last second -- I suspect this is due to the rand()
function:

Linux - #define RAND_MAX 2147483647
SunOS - #define RAND_MAX 32767

So in SunOS, the data actually not that scattered - so more favourate for
sorted() or reversed() check?

Regards,
Qingqing

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-11-2008, 07:23 AM
Tom Lane
 
Posts: n/a
Default Re: Re: Which qsort is used

"Dann Corbit" <DCorbit@connx.com> writes:
>> I've still got a problem with these checks; I think they are a net
>> waste of cycles on average.


> The benchmarks say that they (order checks) are a good idea on average
> for ordered data, random data, and partly ordered data.


There are lies, damn lies, and benchmarks ;-)

The problem with citing a benchmark for this discussion is that a
benchmark can't tell you anything about real-world probabilities;
it only tells you about the probabilities occuring in the benchmark
case. You need to make the case that the benchmark reflects the
real world, which you didn't.

> If you trace the algorithms in a debugger you will be surprised at how
> often the partitions are ordered, even with random sets as input.


Well, I do agree that checking for orderedness on small partitions would
succeed more often than on larger partitions or the whole file --- but
the code-as-given checks all the way down. Moreover, the argument given
for spending these cycles is that insertion sort sucks on reverse-order
input ... where "sucks" means that it spends O(N^2) time. But it spends
O(N^2) in the average case, too.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:27 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com