Unix Technical Forum

Synchronized Scan update

#1 04-12-2008, 07:25 AM
Jeff Davis
Synchronized Scan update


I have found some interesting results from my tests with the
Synchronized Scan patch I'm working on.

The two benefits that I hope to achieve with the patch are:
(1) Better caching behavior with multiple sequential scans running in
parallel
(2) Faster sequential reads from disk and less seeking

I have consistently seen #1 to be true. There is still more testing to
be done (hopefully soon), but I haven't found a problem yet. And the
benefits I've seen are very substantial, which isn't hard, since in the
typical case, a large sequential scan will have 0% cache hit rate. These
numbers were retrieved using log_executor_stats=on.
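To see why an unsynchronized large scan typically gets a 0% hit rate, here is a toy model. It is a sketch under simplifying assumptions: a single LRU cache smaller than the table, and two scans that each read one block per step. It is not PostgreSQL's buffer manager (which uses a clock sweep) or the OS page cache, and `simulate` and its parameters are hypothetical.

```python
from collections import OrderedDict


def simulate(nblocks, cache_pages, delay, synchronized):
    """Two sequential scans of an nblocks-page table sharing one LRU cache.
    Scan A starts at block 0 at step 0; scan B starts `delay` steps later.
    Each step, every active scan reads one block (A first). Returns B's hits."""
    cache = OrderedDict()

    def read(block):
        if block in cache:
            cache.move_to_end(block)  # mark as most recently used
            return True
        cache[block] = True
        if len(cache) > cache_pages:
            cache.popitem(last=False)  # evict the least recently used block
        return False

    # Unsynchronized: B starts at block 0. Synchronized: B starts wherever
    # A currently is and wraps around to pick up the blocks it skipped.
    start = delay % nblocks if synchronized else 0
    b_order = list(range(start, nblocks)) + list(range(start))

    b_hits = 0
    for step in range(delay + nblocks):
        if step < nblocks:
            read(step)  # scan A reads block `step`
        if step >= delay:
            b_hits += read(b_order[step - delay])
    return b_hits
```

With a 1000-block table, a 100-block cache and B starting 200 steps after A, `simulate(1000, 100, 200, False)` gives B no hits at all, while `simulate(1000, 100, 200, True)` gives 800: B rides along with A for blocks 200 to 999 and only misses on the 200 blocks it reads after wrapping around.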

#2 however, is a little trickier. IIRC, Tom was the first to point out
that the I/O system might not recognize that reads coming from different
processes are indeed one sequential read.

At first I never saw the problem actually happen, and I assumed that the
OS was being smart enough. However, recently I noticed this problem on
my home machine, which experienced great caching behavior but poor I/O
throughput (as measured by iostat). My home machine was using the Linux
CFQ io scheduler, and when I swapped the CFQ io scheduler for the
anticipatory scheduler (AS), it worked great. When I sent Josh my patch
(per his request) I mentioned the problem I experienced.
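On Linux 2.6 the active I/O scheduler can be inspected and switched per device through sysfs. A sketch: the device name is only an example, writing the file requires root, and the set of available schedulers depends on the kernel (later kernels dropped the anticipatory scheduler). The helper names are hypothetical.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split the contents of /sys/block/<dev>/queue/scheduler, e.g.
    'noop anticipatory deadline [cfq]', into (active, available).
    The active scheduler is the one shown in brackets."""
    names = text.split()
    active = next((n.strip("[]") for n in names if n.startswith("[")), None)
    return active, [n.strip("[]") for n in names]


def current_scheduler(dev):
    return parse_scheduler(Path(f"/sys/block/{dev}/queue/scheduler").read_text())


def set_scheduler(dev, name):
    # Needs root; e.g. set_scheduler("sda", "anticipatory")
    Path(f"/sys/block/{dev}/queue/scheduler").write_text(name)
```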

Then I started investigating, and found some mixed results. My test was
basically to use iostat (or zpool iostat) to measure disk throughput,
and N processes of "dd if=bigfile of=/dev/null" (started simultaneously)
to run the test. I consider the test to be "passed" if the additional
processes did not interfere (i.e. each process finished as though it
were the only one running). Of course, all tests were I/O bound.
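The dd test can be scripted roughly as follows. This is a sketch, assuming `dd` is on the PATH, the test file is much larger than RAM, and the cache is cold before each run (on Linux, for example, by running `sync` and then writing 3 to /proc/sys/vm/drop_caches as root). The 20% tolerance in `passed` is an arbitrary stand-in for "finished as though it were the only one running"; throughput would still be watched separately with iostat.

```python
import subprocess
import time


def time_parallel_dd(path, n, bs=1024 * 1024):
    """Start n `dd if=path of=/dev/null` processes at once and return the
    wall-clock seconds until the last one exits."""
    start = time.monotonic()
    procs = [
        subprocess.Popen(
            ["dd", f"if={path}", "of=/dev/null", f"bs={bs}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        for _ in range(n)
    ]
    for p in procs:
        p.wait()
    return time.monotonic() - start


def passed(single_secs, parallel_secs, tolerance=0.2):
    """'Passed' if the concurrent readers took about as long as one reader alone."""
    return parallel_secs <= single_secs * (1 + tolerance)
```

For example, time one reader with `time_parallel_dd(path, 1)`, drop the cache again, time four with `time_parallel_dd(path, 4)`, and compare the two with `passed`.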

My home machine (Core 2 Duo, single SATA disk, Intel controller):
Linux/ext3/AS: passed
Linux/ext3/CFQ: failed
Linux/ext3/noop: passed
Linux/ext3/deadline: passed

Machine 2 (old ThinkPad, IDE disk):
Solaris/UFS: failed
Solaris/ZFS: passed

Machine 3 (Dell 2950, LSI PERC/5i controller, 6 SAS disks, RAID-10,
adaptive read ahead):
FreeBSD/UFS: failed

(I suspect the last test would be fine with read ahead always on, and it
may just be a problem with the adaptive read ahead feature)

There are a lot of factors involved, because several components of the
I/O system have the ability to reorder requests or read ahead, such as
the block layer and the controller.

The block request ordering isn't the only factor because Solaris/UFS
only orders the requests by cylinder and moves only in one direction
(i.e. looks like a simple elevator algorithm that isn't affected by
process id). At least, that's how I understand it.
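A one-direction elevator of the kind described here orders pending requests by position only, so requests from two processes reading the same region merge into a single ascending sweep. A sketch of that ordering policy (C-LOOK style), with block numbers standing in for cylinders; it is not Solaris's actual code, and `one_way_elevator` is a hypothetical name.

```python
def one_way_elevator(head, requests):
    """Order pending (pid, block) requests as a one-direction elevator would:
    ascending block number from the head position, then wrap to the lowest
    block. The pid is carried along but never consulted."""
    ahead = sorted((r for r in requests if r[1] >= head), key=lambda r: r[1])
    behind = sorted((r for r in requests if r[1] < head), key=lambda r: r[1])
    return ahead + behind
```

Interleaved requests from processes 1 and 2, such as `[(1, 102), (2, 100), (1, 103), (2, 101), (2, 5)]` with the head at block 100, are served as blocks 100, 101, 102, 103 and then 5.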

Readahead can't be the only factor either because replacing the io
scheduler in Linux solved the problem, even when that replacement was
the noop scheduler.

Anyway, back to the patch, it looks like there are some complications if
you try to use it with the wrong combination of fs, io scheduler, and
controller.

The patch is designed for certain query patterns anyway, so I don't
think that this is a show-stopper. Given the better cache behavior, it
seems like it's really the job of the I/O system to get a single,
sequential stream of blocks efficiently.

The alternative would be to have a single block-reader process, which I
don't think we want to do. However, I/O systems don't really seem to
know how to handle multiple processes reading from the same file very
well.
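The single block-reader alternative would look something like this in miniature. A sketch only: threads and in-memory queues stand in for backend processes and shared memory, and `broadcast_reader` is hypothetical.

```python
import queue
import threading


def broadcast_reader(blocks, n_consumers, depth=16):
    """One reader makes a single sequential pass over `blocks` and hands
    each block to every consumer through that consumer's own bounded queue."""
    queues = [queue.Queue(maxsize=depth) for _ in range(n_consumers)]
    results = [[] for _ in range(n_consumers)]
    done = object()  # end-of-scan sentinel

    def reader():
        for block in blocks:
            for q in queues:
                q.put(block)  # blocks while that consumer's queue is full
        for q in queues:
            q.put(done)

    def consumer(i):
        while (block := queues[i].get()) is not done:
            results[i].append(block)

    threads = [threading.Thread(target=reader)]
    threads += [threading.Thread(target=consumer, args=(i,)) for i in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

One cost is visible even in this toy: because the queues are bounded, the slowest consumer sets the pace for all of them.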

Comments?

Regards,
Jeff Davis


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

#2 04-12-2008, 07:31 AM
Jeff Davis
Re: Synchronized Scan update


Is there any consensus about whether to include these two parameters as
GUCs or constants if my patch is to be accepted?

(1) sync_scan_threshold: Use synchronized scanning for tables greater
than this many pages; smaller tables will not be affected.
(2) sync_scan_offset: Start a new scan this many pages before a
currently running scan to take advantage of the pages
that are likely already in cache.
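The two proposed settings could drive a scan roughly like this. Tables at or below the threshold are not synchronized. Otherwise the new scan starts `sync_scan_offset` pages behind the position of a scan already in progress and wraps around at the end of the table, so every page is still read exactly once. A sketch: `use_sync_scan`, `scan_order` and the shared position `hint` are illustrative names, not the patch's code. Treating a negative threshold as "off" follows Josh's suggestion in this thread.

```python
def use_sync_scan(nblocks, sync_scan_threshold):
    """Synchronize only tables larger than the threshold (in pages);
    a negative threshold such as -1 turns the feature off."""
    return sync_scan_threshold >= 0 and nblocks > sync_scan_threshold


def scan_order(nblocks, hint, sync_scan_offset):
    """Block order for a new scan: start sync_scan_offset pages behind
    block `hint`, where another scan currently is, then wrap around."""
    start = (hint - sync_scan_offset) % nblocks
    return list(range(start, nblocks)) + list(range(start))
```

For a 10-page table with another scan at page 7 and an offset of 2, `scan_order(10, 7, 2)` gives pages 5 through 9 followed by 0 through 4.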

Right now they are just constants defined in a header, but a GUC might
make sense. I'd like to know which version is more acceptable when I
submit my final patch.

Regards,
Jeff Davis


#3 04-12-2008, 07:32 AM
Josh Berkus
Re: Synchronized Scan update

Jeff,

> Right now they are just constants defined in a header, but a GUC might
> make sense. I'd like to know which version is more acceptable when I
> submit my final patch.


As much as I hate the thought of more GUCs, until we have a solid
performance profile for synch scan we probably need them. You should
include the option to turn synch scan off, such as by setting
sync_scan_threshold to -1.

Oh, and remember that these now need to be able to take K/MB/GB.
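Accepting unit suffixes means converting them to pages. A sketch, assuming the default 8 kB block size and the kB/MB/GB suffixes that PostgreSQL's memory settings accept, with a bare number taken as a page count; `to_pages` is hypothetical and is not PostgreSQL's own GUC parser.

```python
import re

BLOCK_KB = 8  # assumes the default 8 kB block size
UNIT_KB = {"kB": 1, "MB": 1024, "GB": 1024 * 1024}


def to_pages(value):
    """Convert '64MB', '1GB', '16kB' or a bare page count such as '100'
    into pages. A bare '-1' passes through as the 'disabled' value."""
    m = re.fullmatch(r"\s*(-?\d+)\s*(kB|MB|GB)?\s*", value)
    if m is None:
        raise ValueError(f"invalid value: {value!r}")
    number, unit = int(m.group(1)), m.group(2)
    if unit is None:
        return number  # already in pages (-1 = disabled)
    if number < 0:
        raise ValueError("negative sizes only make sense as a bare -1")
    return number * UNIT_KB[unit] // BLOCK_KB
```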

These options should probably go in postgresql.conf under QUERY TUNING,
with their own sub-head.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

#4 04-12-2008, 07:32 AM
Jeff Davis
Re: Synchronized Scan update

On Fri, 2007-03-02 at 15:49 -0800, Josh Berkus wrote:
> Jeff,
>
> > Right now they are just constants defined in a header, but a GUC might
> > make sense. I'd like to know which version is more acceptable when I
> > submit my final patch.

>
> As much as I hate the thought of more GUCs, until we have a solid
> performance profile for synch scan we probably need them. You should


I will include them in the final patch then.

> include the option to turn synch scan off, such as by setting
> sync_scan_threshold to -1.


Naturally.

> Oh, and remember that these now need to be able to take K/MB/GB.


Will do.

> These options should probably go in postgresql.conf under QUERY TUNING,
> with their own sub-head.


That makes sense to me.

Regards,
Jeff Davis

PS: Did you happen to get my patch for testing (sent off-list)? If
testing will take a while, that's OK, I'd just like to know whether to
expect the results before feature freeze.


#5 04-12-2008, 07:32 AM
Josh Berkus
Re: Synchronized Scan update

Jeff,

> PS: Did you happen to get my patch for testing (sent off-list)? If
> testing will take a while, that's OK, I'd just like to know whether to
> expect the results before feature freeze.


I'm not sure. We have a bunch of patches in our queue to test, and the
benchmark guys don't really expect synch scan to affect OLTP benchmarks
much. You might want to pester Greenplum about testing on TPC-H.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

#6 04-12-2008, 07:33 AM
Simon Riggs
Re: Synchronized Scan update

On Fri, 2007-03-02 at 15:03 -0800, Jeff Davis wrote:
> Is there any consensus about whether to include these two parameters as
> GUCs or constants if my patch is to be accepted?
>
> (1) sync_scan_threshold: Use synchronized scanning for tables greater
> than this many pages; smaller tables will not be affected.


That sounds OK.

> (2) sync_scan_offset: Start a new scan this many pages before a
> currently running scan to take advantage of the pages
> that are likely already in cache.


I'm somewhat dubious about this parameter, I have to say, even though I
am eager for this feature. It seems like a "magic" parameter that works
only when we have the right knowledge to set it correctly.

How will we know what to default it to and how will we know whether to
set it higher or lower for better performance? Does that value vary
according to the workload on the system? How?

I'm worried that we get a feature that works well on simple tests and
not at all in real world circumstances. I don't want to cast doubt on
what could be a great patch or be negative: I just see that the feature
relies on the dynamic behaviour of the system. I'd like to see some
further studies on how this works to make sure that we can realistically
know how to set this knob, that it's the correct knob, and that it is the
only one we need.

Further thoughts: It sounds like sync_scan_offset is related to
effective_cache_size. Can you comment on whether that might be something
we can use as well/instead? (i.e. set the scan offset to, say,
K * effective_cache_size, with 0.1 <= K <= 0.5)???

Might we do roughly the same thing with sync_scan_threshold as well, and
just have enable_sync_scan instead? i.e. sync_scan_threshold =
effective_cache_size? When would those two parameters not be connected
directly to each other?
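Simon's alternative amounts to deriving both values from effective_cache_size. A sketch, assuming all values are in pages; the default K of 0.25 is an arbitrary pick from his 0.1 to 0.5 range, and `derived_settings` is hypothetical.

```python
def derived_settings(effective_cache_pages, k=0.25):
    """Derive the threshold and offset from effective_cache_size:
    threshold = effective_cache_size, offset = K * effective_cache_size."""
    if not 0.1 <= k <= 0.5:
        raise ValueError("K outside the suggested 0.1..0.5 range")
    return {
        "sync_scan_threshold": effective_cache_pages,
        "sync_scan_offset": int(k * effective_cache_pages),
    }
```

With effective_cache_size at 1 GB (131072 pages of 8 kB), this gives a threshold of 131072 pages and an offset of 32768 pages.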

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



#7 04-12-2008, 07:33 AM
Joshua D. Drake
Re: Synchronized Scan update

>
>> (2) sync_scan_offset: Start a new scan this many pages before a
>> currently running scan to take advantage of the pages
>> that are likely already in cache.
>>

>
> I'm somewhat dubious about this parameter, I have to say, even though I
> am eager for this feature. It seems like a "magic" parameter that works
> only when we have the right knowledge to set it correctly.
>
>

Hello,

Don't get me wrong, I want things to be easily understandable as well,
but the reason you cite above would pretty much have us remove most of
postgresql.conf, including all the bgwriter, vacuum cost delay, and
autovac settings. Not to mention commit delay and others.

Sincerely,

Joshua D. Drake







#8 04-12-2008, 07:34 AM
Jeff Davis
Re: Synchronized Scan update

On Sun, 2007-03-04 at 11:54 +0000, Simon Riggs wrote:
> > (2) sync_scan_offset: Start a new scan this many pages before a
> > currently running scan to take advantage of the pages
> > that are likely already in cache.

>
> I'm somewhat dubious about this parameter, I have to say, even though I
> am eager for this feature. It seems like a "magic" parameter that works
> only when we have the right knowledge to set it correctly.
>


That was my concern about this parameter also.

> How will we know what to default it to and how will we know whether to
> set it higher or lower for better performance? Does that value vary
> according to the workload on the system? How?
>


Perhaps people would only set this parameter when they know it will
help, and for more complex (or varied) usage patterns they'd set
sync_scan_offset to 0 to be safe.

My thinking on the subject (and this is only backed up by very basic
tests) is that there are basically two situations where setting this
parameter too high can hurt:
(1) It's too close to the limits of your physical memory, and you end up
diverging the scans when they could be kept together.
(2) You're using a lot of CPU and the backends aren't processing the
buffers as fast as your I/O system is delivering them. This will prevent
the scans from converging.

If your CPUs are well below capacity and you choose a size significantly
less than your effective cache size, I don't think it will hurt.
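Situation (1) can be illustrated with the same kind of toy LRU model. A sketch under simplifying assumptions: one LRU cache, one block read per step per scan, and no CPU effects, so situation (2) is not modeled. `follower_hit_rate` is hypothetical. In this model, once the new scan runs steadily, about 2 × offset other blocks are touched between the leader's read of a block and the follower's read of it (offset blocks by the leader ahead, offset by the follower behind), so the follower keeps hitting only while 2 × offset stays below the cache size.

```python
from collections import OrderedDict


def follower_hit_rate(nblocks, cache_pages, offset, lead_start):
    """A leading scan is at block `lead_start` when a new scan starts
    `offset` blocks behind it. Both then read one block per step, leader
    first, until the leader reaches the end of the table. Returns the
    fraction of the follower's reads that were cache hits."""
    assert 0 <= offset <= lead_start < nblocks
    cache = OrderedDict()

    def read(block):
        if block in cache:
            cache.move_to_end(block)
            return True
        cache[block] = True
        if len(cache) > cache_pages:
            cache.popitem(last=False)  # evict the least recently used block
        return False

    for block in range(lead_start):  # leader's reads before the follower joins
        read(block)

    hits = reads = 0
    follower = lead_start - offset
    for leader in range(lead_start, nblocks):
        read(leader)
        hits += read(follower)
        reads += 1
        follower += 1
    return hits / reads
```

With a 100-block cache, an offset of 10 gives the follower a hit on every read, while an offset of 80 leaves it with only a handful of early hits before the scans diverge.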

> I'm worried that we get a feature that works well on simple tests and
> not at all in real world circumstances. I don't want to cast doubt on
> what could be a great patch or be negative: I just see that the feature
> relies on the dynamic behaviour of the system. I'd like to see some
> further studies on how this works to make sure that we can realistically
> know how to set this knob, that it's the correct knob, and that it is the
> only one we need.


I will do some better tests on some better hardware this week and next
week. I hope that sheds some light.

> Further thoughts: It sounds like sync_scan_offset is related to
> effective_cache_size. Can you comment on whether that might be something
> we can use as well/instead? (i.e. set the scan offset to, say,
> K * effective_cache_size, with 0.1 <= K <= 0.5)???
>
> Might we do roughly the same thing with sync_scan_threshold as well, and
> just have enable_sync_scan instead? i.e. sync_scan_threshold =
> effective_cache_size? When would those two parameters not be connected
> directly to each other?
>


Originally, these parameters were in terms of the effective_cache_size.
Somebody else convinced me that it was too confusing to have the
variables dependent on each other, so I made them independent. I don't
have a strong opinion either way.

Regards,
Jeff Davis





#9 04-12-2008, 07:36 AM
Josh Berkus
Re: Synchronized Scan update

JD,

> Don't get me wrong, I want things to be easily understandable as well,
> but the reason you cite above would pretty much have us remove most of
> postgresql.conf, including all the bgwriter, vacuum cost delay, and
> autovac settings. Not to mention commit delay and others.


Wouldn't that be nice!

The explosion of GUC settings is primarily a result of not enough information.
The reason there are 7 bgwriter settings, for example, is that we have no
idea what those settings should be and are hoping that people will tinker
with them and tell us. Someday when I can fully profile bgwriter, we'll just
have one setting: bgwriter_aggressive, set to a number between 0 and 9.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

#10 04-12-2008, 07:38 AM
Jim Nasby
Re: Synchronized Scan update

On Mar 6, 2007, at 9:43 AM, Josh Berkus wrote:
>> Don't get me wrong, I want things to be easily understandable as well,
>> but the reason you cite above would pretty much have us remove most of
>> postgresql.conf, including all the bgwriter, vacuum cost delay, and
>> autovac settings. Not to mention commit delay and others.

>
> Wouldn't that be nice!
>
> The explosion of GUC settings is primarily a result of not enough information.
> The reason there are 7 bgwriter settings, for example, is that we have no
> idea what those settings should be and are hoping that people will tinker
> with them and tell us. Someday when I can fully profile bgwriter, we'll just
> have one setting: bgwriter_aggressive, set to a number between 0 and 9.


In the meantime, it would be great for these multiple-settings cases
to be listed somewhere, indicating that it's something we could use
help with. I think that with some explanation of what we're looking
for, there's any number of people who could do this kind of profiling.
--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)


