Unix Technical Forum

Re: Merge algorithms for large numbers of "tapes"

This is a discussion on Re: Merge algorithms for large numbers of "tapes" within the pgsql Hackers forums, part of the PostgreSQL category; --> There are some articles here that are worth reading if you want to sort fast: http://research.microsoft.com/barc/SortBenchmark/ > -----Original Message----- ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 02:31 AM
Dann Corbit
 
Posts: n/a
Default Re: Merge algorithms for large numbers of "tapes"

There are some articles here that are worth reading if you want to sort
fast:

http://research.microsoft.com/barc/SortBenchmark/

> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailtogsql-hackers-
> owner@postgresql.org] On Behalf Of Dann Corbit
> Sent: Wednesday, March 08, 2006 1:59 PM
> To: Luke Lonergan; Tom Lane; Jim C. Nasby
> Cc: Simon Riggs; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
>
> > -----Original Message-----
> > From: Luke Lonergan [mailto:llonergan@greenplum.com]
> > Sent: Wednesday, March 08, 2006 1:52 PM
> > To: Dann Corbit; Tom Lane; Jim C. Nasby
> > Cc: Simon Riggs; pgsql-hackers@postgreSQL.org
> > Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
> >
> > Dann,
> >
> > On 3/8/06 12:39 PM, "Dann Corbit" <DCorbit@connx.com> wrote:
> >
> > > Here are some suggestions of things that I know work really,

really
> > > well:

> >
> > Can you point to an example? That might help move the discussion

> along.
>
> I wrote all of the sorting and merging stuff for CONNX Solutions
> http://www.connx.com
>
> I have carefully benched all of this stuff and (at least for our

system)
> the ideas I propose work well. Of course, every system is different

and
> the only way to know if it is an improvement is to try it in place.
>
> > The reason to interject about the tape goo in this discussion is

that
> we
> > seem to be spending a lot of time optimizing around the tape goo

> without
> > tackling the overall structure of the external sort. I think we'll

> just
> > end
> > up having to replace all of this goo when we really get around to

> fixing
> > the
> > problem.

>
> I suggest trying several alternatives and benching them with real

world
> queries and especially with the open database benchmark suite.
>
> > Add to this that other commercial databases external sort in 1/4 the

> time
> > or
> > better on the same hardware with the same CPU/memory resources using

a
> > 2-pass external sort.

>
> Our sort merge is so fast that I can join two tables on a column with

no
> index faster than on a database that has a unique clustered index on

the
> column. Benchmarked against Oracle, SQL*Server, and several others.
>
> If you check our ORDER BY on a large table with no index, you will see
> that it is competitive with the best commercial systems.
>
> If you are interested, you could get an eval of CONNX and try it
> yourself (eval is free for some number of days, I don't remember

what).
>
> > > #1. Two pass merge (none of that silly poly-tape merge goo)

> >
> > Voice of reason here. It's what the other database systems do.
> >
> > > #2. Load ONLY the keys that are to be sorted into memory. Use a
> > > pointer exchange sort, and do not move the physical rows of data

at
> all.
> >
> > Sounds right. Example of this in practice?

>
> It is what we use here. It is the only way to fly. This is well

known,
> and if you read a few articles from the ACM, you will see that it has
> been known for decades.
>
> > > I am pretty sure from this thread that PostgreSQL is not doing #1,

> and I
> > > have no idea if it is doing #2.

> >
> > Yep. Even Knuth says that the tape goo is only interesting from a
> > historical perspective and may not be relevant in an era of disk

> drives.
> >
> > - Luke
> >

>
>
> ---------------------------(end of

broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:40 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com