Unix Technical Forum

Re: Merge algorithms for large numbers of "tapes"

This is a discussion on Re: Merge algorithms for large numbers of "tapes" within the pgsql Hackers forums, part of the PostgreSQL category; --> > -----Original Message----- > From: Luke Lonergan [mailto:llonergan@greenplum.com] > Sent: Wednesday, March 08, 2006 1:52 PM > To: Dann ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 02:31 AM
Dann Corbit
 
Posts: n/a
Default Re: Merge algorithms for large numbers of "tapes"

> -----Original Message-----
> From: Luke Lonergan [mailto:llonergan@greenplum.com]
> Sent: Wednesday, March 08, 2006 1:52 PM
> To: Dann Corbit; Tom Lane; Jim C. Nasby
> Cc: Simon Riggs; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] Merge algorithms for large numbers of "tapes"
>
> Dann,
>
> On 3/8/06 12:39 PM, "Dann Corbit" <DCorbit@connx.com> wrote:
>
> > Here are some suggestions of things that I know work really, really
> > well:

>
> Can you point to an example? That might help move the discussion

along.

I wrote all of the sorting and merging stuff for CONNX Solutions
http://www.connx.com

I have carefully benched all of this stuff and (at least for our system)
the ideas I propose work well. Of course, every system is different and
the only way to know if it is an improvement is to try it in place.

> The reason to interject about the tape goo in this discussion is that

we
> seem to be spending a lot of time optimizing around the tape goo

without
> tackling the overall structure of the external sort. I think we'll

just
> end
> up having to replace all of this goo when we really get around to

fixing
> the
> problem.


I suggest trying several alternatives and benching them with real world
queries and especially with the open database benchmark suite.

> Add to this that other commercial databases external sort in 1/4 the

time
> or
> better on the same hardware with the same CPU/memory resources using a
> 2-pass external sort.


Our sort merge is so fast that I can join two tables on a column with no
index faster than on a database that has a unique clustered index on the
column. Benchmarked against Oracle, SQL*Server, and several others.

If you check our ORDER BY on a large table with no index, you will see
that it is competitive with the best commercial systems.

If you are interested, you could get an eval of CONNX and try it
yourself (eval is free for some number of days, I don't remember what).

> > #1. Two pass merge (none of that silly poly-tape merge goo)

>
> Voice of reason here. It's what the other database systems do.
>
> > #2. Load ONLY the keys that are to be sorted into memory. Use a
> > pointer exchange sort, and do not move the physical rows of data at

all.
>
> Sounds right. Example of this in practice?


It is what we use here. It is the only way to fly. This is well known,
and if you read a few articles from the ACM, you will see that it has
been known for decades.

> > I am pretty sure from this thread that PostgreSQL is not doing #1,

and I
> > have no idea if it is doing #2.

>
> Yep. Even Knuth says that the tape goo is only interesting from a
> historical perspective and may not be relevant in an era of disk

drives.
>
> - Luke
>



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:48 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com