vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi all, I like to know the caching policies of Postgresql. What parameter in the postgresql.conf affects the cache size used by the Postgresql? As far as I have searched my knowledge of the parameters are 1. shared_buffers - Sets the limit on the amount of shared memory used. If I take this is as the cache size then my performance should increase with the increase in the size of shared_buffers. But it seems it is not the case and my performance actually decreases with the increase in the shared_buffers. I have a RAM size of 32 GB. The table which I use more frequently has around 68 million rows. Can I cache this entire table in RAM? 2. work_mem - It is the amount of memory used by an operation. My guess is once the operation is complete this is freed and hence has nothing to do with the caching. 3. effective_cache_size - The parameter used by the query planner and has nothing to do with the actual caching. So kindly help me in pointing me to the correct parameter to set. It will be great if you can point me to the docs that explains the implementation of caching in Postgresql which will help me in understanding things much clearly. Thanks in advance. Gokul. __________________________________________________ Do You Yahoo!? Tired of spam? Yahoo! Mail has the best spam protection around http://mail.yahoo.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| gokulnathbabu manoharan wrote: > Hi all, > > I like to know the caching policies of Postgresql. > What parameter in the postgresql.conf affects the > cache size used by the Postgresql? As far as I have > searched my knowledge of the parameters are In general, you don't. The OS handles caching based on file usage. So if you are using the files, the OS should cache them. Just like it does with any other program. > > 1. shared_buffers - Sets the limit on the amount of > shared memory used. If I take this is as the cache > size then my performance should increase with the > increase in the size of shared_buffers. But it seems > it is not the case and my performance actually > decreases with the increase in the shared_buffers. I > have a RAM size of 32 GB. The table which I use more > frequently has around 68 million rows. Can I cache > this entire table in RAM? There is a portion of this which is used for caching. But I believe before 8.1 there was code that went linearly through all of the shared_buffers and checked for dirty/clean pages. So there was a tradeoff that the bigger you make it, the longer that search goes. So you got diminishing returns, generally around 10k shared buffers. I think it is better in 8.1, but if the OS is going to cache it anyway (since it does), then having a Postgres cache is just wasting memory, and not letting cache as much. So I'm guessing that with 8.1 there would be 2 sweet spots. Low shared_buffers (<= 10k), and really high shared buffers (like all of available ram). But because postgres has been tuned for the former I would stick with it (I don't think shared_buffers can go >2GB, but that might just be work_mem/maintenance_work_mem). > > 2. work_mem - It is the amount of memory used by an > operation. My guess is once the operation is complete > this is freed and hence has nothing to do with the > caching. > > 3. effective_cache_size - The parameter used by the > query planner and has nothing to do with the actual > caching. This is important from a planner issue. Because the planner can then expect that the OS is doing its job and caching the tables, so index scans are cheaper than they would be otherwise. John =:-> > > So kindly help me in pointing me to the correct > parameter to set. > > It will be great if you can point me to the docs that > explains the implementation of caching in Postgresql > which will help me in understanding things much > clearly. > > Thanks in advance. > Gokul. > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.0 (Darwin) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFDC1woJdeBCYSNAAMRAo5oAKDK1553m7IjY7NssS5Ot/I7lDwKjQCfW21x 1B1z4H/JdL58uUMxS1ma3Ac= =QG/Q -----END PGP SIGNATURE----- |
| |||
| On Tue, Aug 23, 2005 at 10:10:45 -0700, gokulnathbabu manoharan <gokulnathbabu@yahoo.com> wrote: > Hi all, > > I like to know the caching policies of Postgresql. > What parameter in the postgresql.conf affects the > cache size used by the Postgresql? As far as I have > searched my knowledge of the parameters are The main policy is to let the OS do most of the caching. > 1. shared_buffers - Sets the limit on the amount of > shared memory used. If I take this is as the cache > size then my performance should increase with the > increase in the size of shared_buffers. But it seems > it is not the case and my performance actually > decreases with the increase in the shared_buffers. I > have a RAM size of 32 GB. The table which I use more > frequently has around 68 million rows. Can I cache > this entire table in RAM? Using extermely large values for shared buffers is known to be a performance loss for Postgres. Some improvements were made for 8.0 and more for 8.1. The OS will cache frequently used data from files for you. So if you are using that table a lot and the rows aren't too wide, it should mostly be cached for you by the OS. > 2. work_mem - It is the amount of memory used by an > operation. My guess is once the operation is complete > this is freed and hence has nothing to do with the > caching. This is used for sorts and some other things. > 3. effective_cache_size - The parameter used by the > query planner and has nothing to do with the actual > caching. You are supposed to use this to give the planner an idea about how much space the OS will using for caching on behalf of Posgres. > So kindly help me in pointing me to the correct > parameter to set. > > It will be great if you can point me to the docs that > explains the implementation of caching in Postgresql > which will help me in understanding things much > clearly. You probably want to read the following: http://www.varlena.com/varlena/Gener...bits/perf.html ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Tue, 23 Aug 2005 10:10:45 -0700 (PDT) gokulnathbabu manoharan <gokulnathbabu@yahoo.com> wrote: > Hi all, > > I like to know the caching policies of Postgresql. > What parameter in the postgresql.conf affects the > cache size used by the Postgresql? As far as I have > searched my knowledge of the parameters are > > 1. shared_buffers - Sets the limit on the amount of > shared memory used. If I take this is as the cache > size then my performance should increase with the > increase in the size of shared_buffers. But it seems > it is not the case and my performance actually > decreases with the increase in the shared_buffers. I > have a RAM size of 32 GB. The table which I use more > frequently has around 68 million rows. Can I cache > this entire table in RAM? increasing shared_buffers to a point helps, but after a certain threshold it can actually degree performance. > 2. work_mem - It is the amount of memory used by an > operation. My guess is once the operation is complete > this is freed and hence has nothing to do with the > caching. This is the amount of memory used for things like sorts and order bys on a per backend process basis. > 3. effective_cache_size - The parameter used by the > query planner and has nothing to do with the actual > caching. The instructs the query planner on how large the operating system's disk cache is. There isn't a built in cache, PostgreSQL relies on the operating system to cache the on disk information based on how often it is used. In most cases this is probably more accurate anyway. I wrote an article on PostgreSQL performance tuning that has links to several other related sites, you can find it here: http://www.revsys.com/writings/postg...rformance.html --------------------------------- Frank Wiles <frank@wiles.org> http://www.wiles.org --------------------------------- ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| John, > So I'm guessing that with 8.1 there would be 2 sweet spots. Low > shared_buffers (<= 10k), and really high shared buffers (like all of > available ram). > But because postgres has been tuned for the former I would stick with it > (I don't think shared_buffers can go >2GB, but that might just be > work_mem/maintenance_work_mem). I'll be testing this as soon as we get some issues with the 64bit shared_buffer patch worked out. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| I mean well with this comment - This whole issue of data caching is a troubling issue with postreSQL in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. I am not alone in having the *expectation* that a database should have some cache size parameter and the option to skip the file system. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. Is this a crazy idea - that a project be started to get this adopted? Is it too big and structural to contemplate? From one who likes postgreSQL dc Frank Wiles wrote: >On Tue, 23 Aug 2005 10:10:45 -0700 (PDT) >gokulnathbabu manoharan <gokulnathbabu@yahoo.com> wrote: > > > >>Hi all, >> >>I like to know the caching policies of Postgresql. >>What parameter in the postgresql.conf affects the >>cache size used by the Postgresql? As far as I have >>searched my knowledge of the parameters are >> >>1. shared_buffers - Sets the limit on the amount of >>shared memory used. If I take this is as the cache >>size then my performance should increase with the >>increase in the size of shared_buffers. But it seems >>it is not the case and my performance actually >>decreases with the increase in the shared_buffers. I >>have a RAM size of 32 GB. The table which I use more >>frequently has around 68 million rows. Can I cache >>this entire table in RAM? >> >> > > increasing shared_buffers to a point helps, but after > a certain threshold it can actually degree performance. > > > >>2. work_mem - It is the amount of memory used by an >>operation. My guess is once the operation is complete >>this is freed and hence has nothing to do with the >>caching. >> >> > > This is the amount of memory used for things like sorts and > order bys on a per backend process basis. > > > >>3. effective_cache_size - The parameter used by the >>query planner and has nothing to do with the actual >>caching. >> >> > > The instructs the query planner on how large the operating > system's disk cache is. There isn't a built in cache, PostgreSQL > relies on the operating system to cache the on disk information > based on how often it is used. In most cases this is probably > more accurate anyway. > > I wrote an article on PostgreSQL performance tuning that has > links to several other related sites, you can find it here: > > http://www.revsys.com/writings/postg...rformance.html > > --------------------------------- > Frank Wiles <frank@wiles.org> > http://www.wiles.org > --------------------------------- > > >---------------------------(end of broadcast)--------------------------- >TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > > ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Donald Courtney <Donald.Courtney@Sun.COM> writes: > I am not alone in having the *expectation* that a database should have > some cache size parameter and the option to skip the file system. If > I use oracle, sybase, mysql and maxdb they all have the ability to > size a data cache and move to 64 bits. And you're not alone in holding that opinion despite having no shred of evidence that it's worthwhile expanding the cache that far. However, since we've gotten tired of hearing this FUD over and over, 8.1 will have the ability to set shared_buffers as high as you want. I expect next we'll be hearing from people complaining that they set shared_buffers to use all of RAM and performance went into the tank ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Donald, > This whole issue of data caching is a troubling issue with postreSQL > in that even if you ran postgreSQL on a 64 bit address space > with larger number of CPUs you won't see much of a scale up > and possibly even a drop. Since when? Barring the context switch bug, you're not going to get a drop with more processors/more RAM. You may fail to get any gain, though. If your database is only 100MB in size, having 11G of cache space isn't going to help you much over having only 1G. > I am not alone in having the *expectation* > that a database should have some cache size parameter and > the option to skip the file system. Sure, because that's the conventional wisdom, as writ by Oracle. However, this comes with substantial code maintenance costs and portability limitations which have to be measured against any gain in performance. > If I use oracle, sybase, mysql > and maxdb they all have the ability to size a data cache and move > to 64 bits. And yet, we regularly outperform Sybase and MySQL on heavy OLTP loads on commodity x86 hardware. So apparently DB caching isn't everything. ;-) I'm not saying that it's not worth testing larger database caches -- even taking over most of RAM -- on high-speed systems. In fact, I'm working on doing that kind of test now. However, barring test results, we can't assume that taking over RAM and the FS cache would have a substantial performance benefit; that remains to be shown. The other thing is that we've had, and continue to have, low-hanging fruit which have a clear and measurable effect on performance and are fixable without bloating the PG code. Some of these issues (COPY path, context switching, locks, GiST concurrency, some aggregates) have been addressed in the 8.1 code; some remain to be addressed (sorts, disk spill, 64-bit sort mem, other aggregates, index-only access, etc.). Why tackle a huge, 250-hour project which could fail when a 20-hour patch is more likely to provide the same performance benefit? We have the same discussion (annually) about mmap. Using mmap *might* provide us with a huge performance boost. However, it would *definitely* require 300hours (or more) of programmer time to test properly, and might not benefit us at all. Of course, if *you* want to work on large database cache improvements, be my guest ... it's an open source project! Submit your patches! I'll be happy to test them. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Tue, Aug 23, 2005 at 02:41:39PM -0400, Donald Courtney wrote: > I mean well with this comment - > This whole issue of data caching is a troubling issue with postreSQL > in that even if you ran postgreSQL on a 64 bit address space > with larger number of CPUs you won't see much of a scale up > and possibly even a drop. I am not alone in having the *expectation* > that a database should have some cache size parameter and > the option to skip the file system. If I use oracle, sybase, mysql > and maxdb they all have the ability to size a data cache and move > to 64 bits. > Is this a crazy idea - that a project be started to get this adopted? > Is it > too big and structural to contemplate? > From one who likes postgreSQL Hey Donald. :-) This is an operating system issue, not a PostgreSQL issue. If you have more physical memory than fits in 32-bit addresses, and your operating system isn't using this extra memory to cache files (or anything else), than your OS is what I would consider to be broken (or at the very least, not designed for a 64-bit host). The only questions that can be asked here is - 1) can PostgreSQL do a better job than the OS at best utilizing system RAM, and 2) if so, is the net gain worth the added complexity to PostgreSQL? I happen to think that yes, PostgreSQL can do a better job than most OS's, as it has better information to make decisions as to which pages are worth keeping, and which are not, but no, it isn't worth the effort until PostgreSQL developers start running out of things to do. Buy your 64-bit platforms - but if page caching is your concern, 1) ensure that you really have more physical memory than can fit in 32 bits, and 2) ensure that your operating system is comfortable caching data pages from files above the 32-bit mark. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ .. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bind them... http://mark.mielke.cc/ ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| On Tue, Aug 23, 2005 at 12:38:04PM -0700, Josh Berkus wrote: >which have a clear and measurable effect on performance and are fixable >without bloating the PG code. Some of these issues (COPY path, context >switching Does that include increasing the size of read/write blocks? I've noticed that with a large enough table it takes a while to do a sequential scan, even if it's cached; I wonder if the fact that it takes a million read(2) calls to get through an 8G table is part of that. Mike Stone ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |