Unix Technical Forum

Re: [PERFORM] Releasing memory during External sorting?

This is a discussion on Re: [PERFORM] Releasing memory during External sorting? within the pgsql Hackers forums, part of the PostgreSQL category; --> The cited book also explains how to use a callback function to perform arbitrary radix sorts (you simply need ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 06:53 AM
Dann Corbit
 
Posts: n/a
Default Re: [PERFORM] Releasing memory during External sorting?

The cited book also explains how to use a callback function to perform
arbitrary radix sorts (you simply need a method that returns the
[bucketsize] most significant bits for a given data type, for the length
of the key).

So you can sort fairly arbitrary data in linear time (of course if the
key is long then O(n*log(n)) will be better anyway.)

But in any case, if we are talking about external sorting, then disk
time will be so totally dominant that the choice of algorithm is
practically irrelevant.

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailtogsql-hackers-
> owner@postgresql.org] On Behalf Of Dann Corbit
> Sent: Friday, September 23, 2005 2:21 PM
> To: Ron Peacetree; Mark Lewis; Tom Lane; pgsql-hackers@postgresql.org;
> pgsql-performance@postgresql.org
> Subject: Re: [HACKERS] [PERFORM] Releasing memory during External

sorting?
>
> For the subfiles, load the top element of each subfile into a priority
> queue. Extract the min element and write it to disk. If the next

value
> is the same, then the queue does not need to be adjusted. If the next
> value in the subfile changes, then adjust it.
>
> Then, when the lowest element in the priority queue changes, adjust

the
> queue.
>
> Keep doing that until the queue is empty.
>
> You can create all the subfiles in one pass over the data.
>
> You can read all the subfiles, merge them, and write them out in a
> second pass (no matter how many of them there are).
>
> Replacement selection is not a good idea any more, since obvious

better
> ideas should take over. Longer runs are of no value if you do not

have
> to do multiple merge passes.
>
> I have explained this general technique in the book "C Unleashed",
> chapter 13.
>
> Sample code is available on the book's home page.
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org [mailtogsql-hackers-
> > owner@postgresql.org] On Behalf Of Ron Peacetree
> > Sent: Friday, September 23, 2005 11:41 AM
> > To: Mark Lewis; Tom Lane; pgsql-hackers@postgresql.org; pgsql-
> > performance@postgresql.org
> > Subject: Re: [HACKERS] [PERFORM] Releasing memory during External

> sorting?
> >
> > Yep. Also, bear in mind that the lg(n!)= ~ nlgn - n lower bound on
> > the number of comparisions:
> > a= says nothing about the amount of data movement used.
> > b= only holds for generic comparison based sorting algorithms.
> >
> > As Knuth says (vol 3, p180), Distribution Counting sorts without
> > ever comparing elements to each other at all, and so does Radix
> > Sort. Similar comments can be found in many algorithms texts.
> >
> > Any time we know that the range of the data to be sorted is

> substantially
> > restricted compared to the number of items to be sorted, we can sort

> in
> > less than O(lg(n!)) time. DB fields tend to take on few values and

> are
> > therefore "substantially restricted".
> >
> > Given the proper resources and algorithms, O(n) sorts are very

> plausible
> > when sorting DB records.
> >
> > All of the fastest external sorts of the last decade or so take

> advantage
> > of
> > this. Check out that URL I posted.
> >
> > Ron
> >
> >
> > -----Original Message-----
> > From: Mark Lewis <mark.lewis@mir3.com>
> > Sent: Sep 23, 2005 1:43 PM
> > To: Tom Lane <tgl@sss.pgh.pa.us>
> > Subject: Re: [PERFORM] Releasing memory during External sorting?
> >
> > operations != passes. If you were clever, you could probably write

a
> > modified bubble-sort algorithm that only made 2 passes. A pass is a
> > disk scan, operations are then performed (hopefully in memory) on

what
> > you read from the disk. So there's no theoretical log N lower-bound

> on
> > the number of disk passes.
> >
> > Not that I have anything else useful to add to this discussion, just

a
> > tidbit I remembered from my CS classes back in college
> >
> > -- Mark
> >
> > On Fri, 2005-09-23 at 13:17 -0400, Tom Lane wrote:
> > > Ron Peacetree <rjpeace@earthlink.net> writes:
> > > > 2= No optimal external sorting algorithm should use more than 2

> > passes.
> > > > 3= Optimal external sorting algorithms should use 1 pass if at

all
> > possible.
> > >
> > > A comparison-based sort must use at least N log N operations, so

it
> > > would appear to me that if you haven't got approximately log N

> passes
> > > then your algorithm doesn't work.
> > >
> > > regards, tom lane

> >
> > ---------------------------(end of

> broadcast)---------------------------
> > TIP 1: if posting/reading through Usenet, please send an appropriate
> > subscribe-nomail command to majordomo@postgresql.org so that

> your
> > message can get through to the mailing list cleanly

>
> ---------------------------(end of

broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that

your
> message can get through to the mailing list cleanly


---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 07:35 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com