Synchronized Scan update (pgsql Hackers forum, PostgreSQL category)
I have found some interesting results from my tests with the Synchronized Scan patch I'm working on.

The two benefits that I hope to achieve with the patch are:
(1) Better caching behavior with multiple sequential scans running in parallel
(2) Faster sequential reads from disk and less seeking

I have consistently seen #1 to be true. There is still more testing to be done (hopefully soon), but I haven't found a problem yet. And the benefits I've seen are very substantial, which isn't hard, since in the typical case, a large sequential scan will have 0% cache hit rate. These numbers were retrieved using log_executor_stats=on.

#2, however, is a little trickier. IIRC, Tom was the first to point out that the I/O system might not recognize that reads coming from different processes are indeed one sequential read.

At first I never saw the problem actually happen, and I assumed that the OS was being smart enough. However, recently I noticed this problem on my home machine, which experienced great caching behavior but poor I/O throughput (as measured by iostat). My home machine was using the Linux CFQ io scheduler, and when I swapped the CFQ io scheduler for the anticipatory scheduler (AS), it worked great. When I sent Josh my patch (per his request) I mentioned the problem I experienced.

Then I started investigating, and found some mixed results. My test was basically to use iostat (or zpool iostat) to measure disk throughput, and N processes of "dd if=bigfile of=/dev/null" (started simultaneously) to run the test. I consider the test to be "passed" if the additional processes did not interfere (i.e. each process finished as though it were the only one running). Of course, all tests were I/O bound.

My home machine (core 2 duo, single SATA disk, intel controller):
  Linux/ext3/AS: passed
  Linux/ext3/CFQ: failed
  Linux/ext3/noop: passed
  Linux/ext3/deadline: passed

Machine 2 (old thinkpad, IDE disk):
  Solaris/UFS: failed
  Solaris/ZFS: passed

Machine 3 (dell 2950, LSI PERC/5i controller, 6 SAS disks, RAID-10, adaptive read ahead):
  FreeBSD/UFS: failed

(I suspect the last test would be fine with read ahead always on, and it may just be a problem with the adaptive read ahead feature)

There are a lot of factors involved, because several components of the I/O system have the ability to reorder requests or read ahead, such as the block layer and the controller.

The block request ordering isn't the only factor because Solaris/UFS only orders the requests by cylinder and moves only in one direction (i.e. looks like a simple elevator algorithm that isn't affected by process id). At least, that's how I understand it.

Readahead can't be the only factor either because replacing the io scheduler in Linux solved the problem, even when that replacement was the noop scheduler.

Anyway, back to the patch, it looks like there are some complications if you try to use it with the wrong combination of fs, io scheduler, and controller.

The patch is designed for certain query patterns anyway, so I don't think that this is a show-stopper. Given the better cache behavior, it seems like it's really the job of the I/O system to get a single, sequential stream of blocks efficiently.

The alternative would be to have a single block-reader process, which I don't think we want to do. However, I/O systems don't really seem to know how to handle multiple processes reading from the same file very well.

Comments?

Regards,
    Jeff Davis

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org

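For readers who want to repeat the test described in this post, here is a minimal sketch of the "N simultaneous readers" idea in Python. It is not the script used for the results above (those used dd and iostat). The names `run_concurrent_readers` and `passed`, the 1 MiB read size, and the 25% tolerance are all assumptions made for illustration. Each reader runs in its own process, since the open question is how the I/O system treats reads coming from different processes.

```python
import subprocess
import sys

# Child program: read the file front to back and discard the data,
# roughly what "dd if=bigfile of=/dev/null" does.
READER = r"""
import sys, time
start = time.monotonic()
total = 0
with open(sys.argv[1], "rb", buffering=0) as f:
    while True:
        block = f.read(1 << 20)  # 1 MiB per read (arbitrary choice)
        if not block:
            break
        total += len(block)
print(total, time.monotonic() - start)
"""


def run_concurrent_readers(path, n):
    """Start n reader processes at nearly the same time.

    Returns a list of (bytes_read, seconds) tuples, one per process.
    """
    procs = [
        subprocess.Popen(
            [sys.executable, "-c", READER, path],
            stdout=subprocess.PIPE,
            text=True,
        )
        for _ in range(n)
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()
        total, seconds = out.split()
        results.append((int(total), float(seconds)))
    return results


def passed(solo_seconds, results, tolerance=1.25):
    """Hypothetical pass criterion: no reader took much longer than a solo run.

    The tolerance factor is an assumption, not a figure from the post.
    """
    return all(seconds <= solo_seconds * tolerance for _, seconds in results)
```

The file has to be considerably larger than RAM (or the OS cache has to be dropped first) for the run to be I/O bound; otherwise the timings only measure the page cache. Disk throughput would still be watched separately with iostat, as in the post.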
Is there any consensus about whether to include these two parameters as GUCs or constants if my patch is to be accepted?

(1) sync_scan_threshold: Use synchronized scanning for tables greater than this many pages; smaller tables will not be affected.

(2) sync_scan_offset: Start a new scan this many pages before a currently running scan to take advantage of the pages that are likely already in cache.

Right now they are just constants defined in a header, but a GUC might make sense. I'd like to know which version is more acceptable when I submit my final patch.

Regards,
    Jeff Davis

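The interaction of the two parameters described in this post can be sketched roughly as follows. This is an illustration only, not code from the patch: the function name `choose_start_page`, the use of `None` for "no scan in progress", and the wrap-around at the start of the table are assumptions.

```python
def choose_start_page(table_pages, running_scan_page,
                      sync_scan_threshold, sync_scan_offset):
    """Return the page at which a new sequential scan would begin.

    table_pages         -- size of the table in pages
    running_scan_page   -- page a running scan is currently at, or None
    sync_scan_threshold -- only tables larger than this many pages are affected
    sync_scan_offset    -- how many pages behind the running scan to start
    """
    # Small tables, or no scan to synchronize with: ordinary scan from page 0.
    if table_pages <= sync_scan_threshold or running_scan_page is None:
        return 0
    # Start a little behind the running scan so the first pages read are
    # likely still in cache. Wrapping around the table is an assumption.
    return (running_scan_page - sync_scan_offset) % table_pages
```

With a 1000-page table, a threshold of 100 and an offset of 16, a new scan joining a scan that is currently at page 500 would begin at page 484 under these assumptions.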
Jeff,

> Right now they are just constants defined in a header, but a GUC might
> make sense. I'd like to know which version is more acceptable when I
> submit my final patch.

As much as I hate the thought of more GUCs, until we have a solid performance profile for synch scan we probably need them. You should include the option to turn synch_scan off, such as by setting synch_scan_threshold to -1.

Oh, and remember that these now need to be able to take K/MB/GB.

These options should probably go in postgresql.conf under QUERY TUNING, with their own sub-head.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

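As a rough illustration of the two requests in this post (a -1 "off" value and unit suffixes), a setting could be converted to a page count along these lines. This is a sketch, not PostgreSQL's GUC parser: the function name `parse_threshold`, the exact suffix spellings (kB, MB, GB) and the 8 kB page size are assumptions.

```python
BLOCK_SIZE = 8192  # assumption: 8 kB pages
UNITS = {"kB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_threshold(value):
    """Convert a setting such as '512MB', '1000' or '-1' into pages.

    Returns None when the setting is -1, meaning the feature is turned off.
    A bare number is taken to be a page count already.
    """
    text = value.strip()
    if text == "-1":
        return None
    for suffix, multiplier in UNITS.items():
        if text.endswith(suffix):
            number = int(text[: -len(suffix)].strip())
            return number * multiplier // BLOCK_SIZE
    return int(text)
```

Under these assumptions `parse_threshold("512MB")` yields 65536 pages, and `parse_threshold("-1")` yields `None`, which a caller would treat as "never synchronize".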
On Fri, 2007-03-02 at 15:49 -0800, Josh Berkus wrote:
> Jeff,
>
> > Right now they are just constants defined in a header, but a GUC might
> > make sense. I'd like to know which version is more acceptable when I
> > submit my final patch.
>
> As much as I hate the thought of more GUCs, until we have a solid
> performance profile for synch scan we probably need them. You should

I will include them in the final patch then.

> include the option to turn synch_scan off, such as by setting
> synch_scan_threshold to -1.

Naturally.

> Oh, and remember that these now need to be able to take K/MB/GB.

Will do.

> These options should probably go in postgresql.conf under QUERY TUNING,
> with their own sub-head.

That makes sense to me.

Regards,
    Jeff Davis

PS: Did you happen to get my patch for testing (sent off-list)? If testing will take a while, that's OK, I'd just like to know whether to expect the results before feature freeze.

Jeff,

> PS: Did you happen to get my patch for testing (sent off-list)? If
> testing will take a while, that's OK, I'd just like to know whether to
> expect the results before feature freeze.

I'm not sure. We have a bunch of patches in our queue to test, and the benchmark guys don't really expect synch scan to affect OLTP benchmarks much. You might want to pester Greenplum about testing on TPCH.

--
--Josh

Josh Berkus
PostgreSQL @ Sun
San Francisco

On Fri, 2007-03-02 at 15:03 -0800, Jeff Davis wrote:
> Is there any consensus about whether to include these two parameters as
> GUCs or constants if my patch is to be accepted?
>
> (1) sync_scan_threshold: Use synchronized scanning for tables greater
> than this many pages; smaller tables will not be affected.

That sounds OK.

> (2) sync_scan_offset: Start a new scan this many pages before a
> currently running scan to take advantage of the pages
> that are likely already in cache.

I'm somewhat dubious about this parameter, I have to say, even though I am eager for this feature. It seems like a "magic" parameter that works only when we have the right knowledge to set it correctly.

How will we know what to default it to and how will we know whether to set it higher or lower for better performance? Does that value vary according to the workload on the system? How?

I'm worried that we get a feature that works well on simple tests and not at all in real world circumstances. I don't want to cast doubt on what could be a great patch or be negative: I just see that the feature relies on the dynamic behaviour of the system. I'd like to see some further studies on how this works to make sure that we realistically know how to set this knob, that it's the correct knob and that it is the only one we need.

Further thoughts: It sounds like sync_scan_offset is related to effective_cache_size. Can you comment on whether that might be something we can use as well/instead? (i.e. set the scan offset to, say, K * effective_cache_size, 0.1 <= K <= 0.5)???

Might we do roughly the same thing with sync_scan_threshold as well, and just have enable_sync_scan instead? i.e. sync_scan_threshold = effective_cache_size? When would those two parameters not be connected directly to each other?

--
  Simon Riggs
  EnterpriseDB   http://www.enterprisedb.com

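The suggestion in this post, deriving both values from effective_cache_size instead of exposing them separately, could look something like the following sketch. The function name `derive_sync_scan_settings` and the default K of 0.25 are assumptions chosen for illustration; the post only proposes the range 0.1 <= K <= 0.5 and does not pick a value.

```python
def derive_sync_scan_settings(effective_cache_size_pages, k=0.25):
    """Derive (threshold, offset) in pages from effective_cache_size.

    threshold = effective_cache_size
    offset    = K * effective_cache_size, with 0.1 <= K <= 0.5
    """
    if not 0.1 <= k <= 0.5:
        raise ValueError("K is expected to lie between 0.1 and 0.5")
    threshold = effective_cache_size_pages
    offset = int(k * effective_cache_size_pages)
    return threshold, offset
```

With an effective_cache_size of 16384 pages and K = 0.25, this gives a threshold of 16384 pages and an offset of 4096 pages. Whether a fixed K holds up across workloads is exactly the question raised above.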
>> (2) sync_scan_offset: Start a new scan this many pages before a
>> currently running scan to take advantage of the pages
>> that are likely already in cache.
>
> I'm somewhat dubious about this parameter, I have to say, even though I
> am eager for this feature. It seems like a "magic" parameter that works
> only when we have the right knowledge to set it correctly.

Hello,

Don't get me wrong, I want things to be easily understandable as well, but the reason you cite above pretty much makes us need to remove most of the postgresql.conf, including all bgwriter, vacuum cost delay, and autovac settings. Not to mention commit delay and others.

Sincerely,

Joshua D. Drake

On Sun, 2007-03-04 at 11:54 +0000, Simon Riggs wrote:
> > (2) sync_scan_offset: Start a new scan this many pages before a
> > currently running scan to take advantage of the pages
> > that are likely already in cache.
>
> I'm somewhat dubious about this parameter, I have to say, even though I
> am eager for this feature. It seems like a "magic" parameter that works
> only when we have the right knowledge to set it correctly.
>

That was my concern about this parameter also.

> How will we know what to default it to and how will we know whether to
> set it higher or lower for better performance? Does that value vary
> according to the workload on the system? How?
>

Perhaps people would only set this parameter when they know it will help, and for more complex (or varied) usage patterns they'd set sync_scan_offset to 0 to be safe.

My thinking on the subject (and this is only backed up by very basic tests) is that there are basically two situations where setting this parameter too high can hurt:
(1) It's too close to the limits of your physical memory, and you end up diverging the scans when they could be kept together.
(2) You're using a lot of CPU and the backends aren't processing the buffers as fast as your I/O system is delivering them. This will prevent the scans from converging.

If your CPUs are well below capacity and you choose a size significantly less than your effective cache size, I don't think it will hurt.

> I'm worried that we get a feature that works well on simple tests and
> not at all in real world circumstances. I don't want to cast doubt on
> what could be a great patch or be negative: I just see that the feature
> relies on the dynamic behaviour of the system. I'd like to see some
> further studies on how this works to make sure that we realistically
> know how to set this knob, that it's the correct knob and that it is
> the only one we need.

I will do some better tests on some better hardware this week and next week. I hope that sheds some light.

> Further thoughts: It sounds like sync_scan_offset is related to
> effective_cache_size. Can you comment on whether that might be
> something we can use as well/instead? (i.e. set the scan offset to say K
> * effective_cache_size, 0.1 <= K <= 0.5)???
>
> Might we do roughly the same thing with sync_scan_threshold as well, and
> just have enable_sync_scan instead? i.e. sync_scan_threshold =
> effective_cache_size? When would those two parameters not be connected
> directly to each other?
>

Originally, these parameters were in terms of the effective_cache_size. Somebody else convinced me that it was too confusing to have the variables dependent on each other, so I made them independent. I don't have a strong opinion either way.

Regards,
    Jeff Davis

JD,

> Don't get me wrong, I want things to be easily understandable as well
> but the reason you cite above pretty much
> makes us need to remove most of the postgresql.conf, including all
> bgwriter, vacuum cost delay, and autovac settings.
> Not to mention commit delay and others

Wouldn't that be nice!

The explosion of GUC settings is primarily a result of not enough information. The reason there are 7 bgwriter settings, for example, is that we have no idea what those settings should be and are hoping that people will tinker with them and tell us. Someday when I can fully profile bgwriter, we'll just have one setting: bgwriter_aggressive, set to a number between 0 and 9.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco

On Mar 6, 2007, at 9:43 AM, Josh Berkus wrote:
>> Don't get me wrong, I want things to be easily understandable as well
>> but the reason you cite above pretty much
>> makes us need to remove most of the postgresql.conf, including all
>> bgwriter, vacuum cost delay, and autovac settings.
>> Not to mention commit delay and others
>
> Wouldn't that be nice!
>
> The explosion of GUC settings is primarily a result of not enough
> information. The reason there are 7 bgwriter settings, for example,
> is that we have no idea what those settings should be and are hoping
> that people will tinker with them and tell us. Someday when I can
> fully profile bgwriter, we'll just have one setting:
> bgwriter_aggressive, set to a number between 0 and 9.

In the meantime, it would be great for these multiple-settings cases to be listed somewhere, indicating that it's something we could use help with. I think that with some explanation of what we're looking for, there are any number of people who could do this kind of profiling.

--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)