Page-at-a-time Locking Considerations (pgsql Hackers forum, PostgreSQL category)
In heapgetpage() we hold the buffer locked while we look for visible tuples. That works well in most cases since the visibility check is fast if we have status bits set. If we don't have visibility bits set we have to do things like scan the snapshot and confirm things via clog lookups. All of that takes time and can lead to long buffer lock times, possibly across multiple I/Os in the very worst cases.

This doesn't just happen for old transactions. Accessing very recent TransactionIds is prone to rare but long waits when we ExtendClog().

Such problems are numerically rare, but the buffers with long lock times are also the ones that have concurrent or at least recent write operations on them. So all SeqScans have the potential to induce long wait times for write transactions, even if they are scans on 1 block tables. Tables with heavy write activity on them from multiple backends have their work spread across multiple blocks, so a SeqScan will hit this issue repeatedly as it encounters each current insertion point in a table and so greatly increases the chances of it occurring.

It seems possible to just memcpy() the whole block away and then drop the lock quickly. That gives a consistent lock time in all cases and allows us to do the visibility checks in our own time. It might seem that we would end up copying irrelevant data, which is true. But the greatest cost is memory access time. If hardware memory pre-fetch cuts in we will find that the memory is retrieved en masse anyway; if it doesn't we will have to wait for each cache line. So the best case is actually an en masse retrieval of cache lines, in the common case where blocks are fairly full (vague cutoff is determined by exact mechanism of hardware/compiler induced memory prefetch).

The copied block would be used only for visibility checks. The main buffer would retain its pin and we would pass references to the block through the executor as normal.

So this would be a change completely isolated to heapgetpage().

Was the copy-aside method considered when we introduced page at a time mode? Any reasons to think it would be dangerous or infeasible? If not, I'll give it a bash and get some test results.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
| |||
Simon Riggs <simon@2ndquadrant.com> writes:
> In heapgetpage() we hold the buffer locked while we look for visible
> tuples.

It's a share lock though. Do you have any direct proof that this behavior is as nasty as you claim?

regards, tom lane
| |||
On Mon, 2008-02-04 at 13:27 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > In heapgetpage() we hold the buffer locked while we look for visible
> > tuples.
>
> It's a share lock though.

Which conflicts with write locks.

> Do you have any direct proof that this
> behavior is as nasty as you claim?

No, but I've been thinking about how to get some, for this and other situations. This one is difficult to track down because it moves from buffer to buffer reasonably quickly. Starting another thread on that.

We still have a higher than desirable variability in response times and I'm looking at possible causes.

I'll try patching it, unless you can think of a reason why it's a complete non-starter? I'm not saying we'd want it yet, just that it seems worth trying.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
| |||
"Simon Riggs" <simon@2ndquadrant.com> writes:
> We still have a higher than desirable variability in response times and
> I'm looking at possible causes.

I agree we have a problem with this. My feeling is that the problems have more to do with higher level things like stats being toasted, or checkpoints or wal file changes, or a myriad of other things. But clog lru thrashing while holding other locks is a definite possibility too.

I wonder how hard it would be to shove the clog into regular shared memory pages and let the clock sweep take care of adjusting the percentage of shared mem allocated to the clog versus data pages.

> I'll try patching it, unless you can think of a reason why it's a
> complete non-starter? I'm not saying we'd want it yet, just that it
> seems worth trying.

Sure, but a good experiment needs a theory to test. I think you have to find a way to measure this first. Otherwise you're going to write a patch and then have two trees and be searching around in the dark for a difference.

This strikes me as something dtrace might be able to help measure.

-- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!
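As an illustration of the "measure first" point, the following is a rough in-process sketch of recording how long a lock is held, bucketed by powers of two so that rare long holds stand out. This is not dtrace and not anything PostgreSQL ships: `HoldHistogram`, `hold_begin()`, `hold_end()` and `hold_record()` are hypothetical names, and the use of `clock_gettime(CLOCK_MONOTONIC)` is an assumption made for the sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define HOLD_BUCKETS 32

/*
 * count[i] holds the number of lock holds with 2^i <= ns < 2^(i+1).
 * count[0] also takes 0 ns and 1 ns; the last bucket takes everything larger.
 */
typedef struct {
    uint64_t count[HOLD_BUCKETS];
    uint64_t max_ns;             /* longest hold seen so far */
} HoldHistogram;

/* floor(log2(ns)), clamped to the number of buckets. */
static unsigned hold_bucket(uint64_t ns)
{
    unsigned b = 0;
    while (ns > 1 && b < HOLD_BUCKETS - 1) {
        ns >>= 1;
        b++;
    }
    return b;
}

void hold_record(HoldHistogram *h, uint64_t ns)
{
    h->count[hold_bucket(ns)]++;
    if (ns > h->max_ns)
        h->max_ns = ns;
}

static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t) ts.tv_sec * UINT64_C(1000000000) + (uint64_t) ts.tv_nsec;
}

/* Call right after acquiring the lock; pass the result to hold_end(). */
uint64_t hold_begin(void)
{
    return now_ns();
}

/* Call right after releasing the lock. */
void hold_end(HoldHistogram *h, uint64_t started_ns)
{
    hold_record(h, now_ns() - started_ns);
}
```

With a histogram like this collected on both trees, the comparison has something concrete to test: whether the tail buckets shrink under the patch.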
| |||
Gregory Stark wrote:
> I wonder how hard it would be to shove the clog into regular shared memory
> pages and let the clock sweep take care of adjusting the percentage of shared
> mem allocated to the clog versus data pages.

Hmm, this is an interesting idea. I wonder what would happen if we let other SLRU users go into shared buffers too -- for example it has been reported several times that pg_subtrans thrashing can cause severe problems in case of long running transactions. (I wonder whether pg_subtrans would occupy a big portion of shared buffers if we let it go unchecked).

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
| |||
On Mon, 2008-02-04 at 20:03 +0000, Gregory Stark wrote:
> I wonder how hard it would be to shove the clog into regular shared
> memory pages and let the clock sweep take care of adjusting the
> percentage of shared mem allocated to the clog versus data pages.

There is a reason that's not been done... try it and see.

Plus it doesn't fully resolve the main issue as described.

-- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
| |||
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Gregory Stark wrote:
>> I wonder how hard it would be to shove the clog into regular shared memory
>> pages and let the clock sweep take care of adjusting the percentage of shared
>> mem allocated to the clog versus data pages.

> Hmm, this is an interesting idea. I wonder what would happen if we let
> other SLRU users go into shared buffers too -- for example it has been
> reported several times that pg_subtrans thrashing can cause severe
> problems in case of long running transactions.

My recollection is that we didn't do that because the standard buffer manager has some assumptions that are violated by clog/etc pages --- notably the lack of LSNs on the pages. Not sure how hard that is to fix.

I also note that we'd not really be removing any contention, rather just pushing it into the bufmgr. Maybe the bufmgr is now scalable enough that it could take the extra load better than SLRU can, but this is hardly a given.

It sounds worth experimenting with, but it's not a slam-dunk win.

regards, tom lane
| |||
Simon Riggs wrote:
> On Mon, 2008-02-04 at 20:03 +0000, Gregory Stark wrote:
>
> > I wonder how hard it would be to shove the clog into regular shared
> > memory pages and let the clock sweep take care of adjusting the
> > percentage of shared mem allocated to the clog versus data pages.
>
> There is a reason that's not been done... try it and see.

What is it?

-- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
| |||
Alvaro Herrera wrote:
> Gregory Stark wrote:
>
>> I wonder how hard it would be to shove the clog into regular shared memory
>> pages and let the clock sweep take care of adjusting the percentage of shared
>> mem allocated to the clog versus data pages.
>
> Hmm, this is an interesting idea. I wonder what would happen if we let
> other SLRU users go into shared buffers too -- for example it has been
> reported several times that pg_subtrans thrashing can cause severe
> problems in case of long running transactions. (I wonder whether
> pg_subtrans would occupy a big portion of shared buffers if we let it go
> unchecked).

Presumably we would have a fair way of accounting cache hits, and increase the usage_count accordingly. It should occupy just the right amount, in proportion to how often it's used vs. other buffers.

That definitely seems worthwhile to me. Not only because of any possible performance gains you might get, but perhaps even more importantly it would eliminate an option (clog_buffers) that you may need to tune manually otherwise.

-- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
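The usage_count argument can be made concrete with a toy clock sweep. The sketch below is a simplified model, not the PostgreSQL buffer manager: `ToyPool`, `toy_pool_access()` and `toy_pool_contains()` are hypothetical names, and the pool size and usage cap are arbitrary values picked for the example. A hit bumps the slot's usage count; on a miss the clock hand decrements counts as it passes and evicts the first slot it finds at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NBUFFERS 4     /* tiny pool so the behaviour is easy to trace */
#define TOY_MAX_USAGE 5    /* cap chosen for this sketch */

typedef struct {
    int page_id;           /* -1 when the slot is empty */
    int usage_count;       /* bumped on hit, decremented by the sweep */
} ToySlot;

typedef struct {
    ToySlot slots[TOY_NBUFFERS];
    size_t hand;           /* next slot the clock sweep will inspect */
} ToyPool;

void toy_pool_init(ToyPool *pool)
{
    for (size_t i = 0; i < TOY_NBUFFERS; i++) {
        pool->slots[i].page_id = -1;
        pool->slots[i].usage_count = 0;
    }
    pool->hand = 0;
}

bool toy_pool_contains(const ToyPool *pool, int page_id)
{
    for (size_t i = 0; i < TOY_NBUFFERS; i++)
        if (pool->slots[i].page_id == page_id)
            return true;
    return false;
}

/* Returns the slot now holding page_id, evicting a victim on a miss. */
size_t toy_pool_access(ToyPool *pool, int page_id)
{
    /* Hit: credit the page for being used again. */
    for (size_t i = 0; i < TOY_NBUFFERS; i++) {
        if (pool->slots[i].page_id == page_id) {
            if (pool->slots[i].usage_count < TOY_MAX_USAGE)
                pool->slots[i].usage_count++;
            return i;
        }
    }

    /* Miss: sweep until a slot with usage_count 0 is found. */
    for (;;) {
        size_t idx = pool->hand;
        ToySlot *victim = &pool->slots[idx];

        pool->hand = (pool->hand + 1) % TOY_NBUFFERS;
        if (victim->usage_count == 0) {
            victim->page_id = page_id;
            victim->usage_count = 1;
            return idx;
        }
        victim->usage_count--;   /* give the page one fewer chance */
    }
}
```

In this model a page touched repeatedly (think of a hot clog page) survives a sweep that evicts pages touched only once, which is the "occupy just the right amount" behaviour being described. It says nothing about the LSN and contention concerns raised elsewhere in the thread.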
| ||||
Tom Lane wrote:
> > Gregory Stark wrote:
> >> I wonder how hard it would be to shove the clog into regular shared memory
> >> pages and let the clock sweep take care of adjusting the percentage of shared
> >> mem allocated to the clog versus data pages.
>
> My recollection is that we didn't do that because the standard buffer
> manager has some assumptions that are violated by clog/etc pages ---
> notably the lack of LSNs on the pages. Not sure how hard that is to
> fix. I also note that we'd not really be removing any contention,
> rather just pushing it into the bufmgr. Maybe the bufmgr is now
> scalable enough that it could take the extra load better than SLRU can,
> but this is hardly a given.

Well, in the case of pg_subtrans, I don't think the problem is contention -- rather, the fact that the number of buffers is fixed and small.

-- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.