Unix Technical Forum

Re: Merge algorithms for large numbers of "tapes"

This is a discussion on Re: Merge algorithms for large numbers of "tapes" within the pgsql Hackers forums, part of the PostgreSQL category; --> > Two pass will create the count of subfiles proportional to: > Subfile_count = original_stream_size/sort_memory_buffer_size > > The merge ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 02:33 AM
Zeugswetter Andreas DCP SD
 
Posts: n/a
Default Re: Merge algorithms for large numbers of "tapes"


> Two pass will create the count of subfiles proportional to:
> Subfile_count = original_stream_size/sort_memory_buffer_size
>
> The merge pass requires (sizeof record * subfile_count) memory.


That is true from an algorithmic perspective. But to make the
merge efficient you would need to have enough RAM to cache a reasonably
large block per subfile_count. Else you would need to reread the same
page/block from one subfile multiple times.
(If you had one disk per subfile you could also rely on the disk's own
cache,
but I think we can rule that out)

> Example:
> You have a 7 gigabyte table to sort and you have 100 MB sort buffer.
> The number of subfiles will be:
> 7000000000 / 100000000 = 70 files


To be efficient you need (70 + 1) \* max(record_size, 256k) = 18 Mb

Plus you need a structure per subfile that points to the current record
in the buffer.

Andreas

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 02:33 AM
Martijn van Oosterhout
 
Posts: n/a
Default Re: Merge algorithms for large numbers of "tapes"

On Fri, Mar 10, 2006 at 09:57:28AM +0100, Zeugswetter Andreas DCP SD wrote:
>
> > Two pass will create the count of subfiles proportional to:
> > Subfile_count = original_stream_size/sort_memory_buffer_size
> >
> > The merge pass requires (sizeof record * subfile_count) memory.

>
> That is true from an algorithmic perspective. But to make the
> merge efficient you would need to have enough RAM to cache a reasonably
> large block per subfile_count. Else you would need to reread the same
> page/block from one subfile multiple times.
> (If you had one disk per subfile you could also rely on the disk's own
> cache,
> but I think we can rule that out)


But what about the OS cache? Linux will read upto the next 128KB of a
file if it's contiguous on disk, which is likely with modern
filesystems. It's likely to be much "fairer" than any way we can come
up with to share memory.

Question is, do we want our algorithm to rely on that caching?
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)

iD8DBQFEEUqAIB7bNG8LQkwRAndoAJ0aTelxQz3EdjzsCIRPPX DrwgbovQCcCBh0
Z7JhtZNzNGzbskE4IWSqmlA=
=UEM3
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:01 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com