This is a discussion on Re: [WIP PATCH] Lazily assign xids for toplevel Transactions within the pgsql Hackers forums, part of the PostgreSQL category; --> I wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Yeah - I do not really like that dual-locking thing ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Yeah - I do not really like that dual-locking thing either. But it makes >> prepared transaction handling much easier - if we were to only lock the >> RID, we'd have to store the rid<->xid mapping for prepared transactions > Hmmm .... that's a good point. Not sure how to include prepared xacts > in the scheme. After further thought I think we should avoid depending on PIDs as part of a lock tag --- their behavior is too system-dependent, in particular we have no right to assume that when a backend exits the same PID won't be reassigned shortly after, possibly leading to confusion. Instead, I suggest that we keep a session counter in shared memory and have each backend assign itself a session ID at startup using that. A 32-bit session ID in combination with a 32-bit locally assigned transaction number should be sufficiently unique to identify a transaction (prepared or otherwise) for the purposes of locking. These "transient XIDs" only need to be unique as long as the transaction exists plus shortly thereafter (to avoid race conditions when someone waits for a transaction that actually terminated a moment before). So wraparound of the counters isn't a problem, although we probably want to reserve zero as an invalid value. I think we only need a transient XID of this sort for top-level transactions, not subtransactions. I thought for awhile about how to avoid taking a lock on the regular XID as well as the transient XID, but it seems pretty messy, particularly if we don't want to assign transient XIDs to subtransactions. Probably best not to go there, at least not in the first-cut patch. To make CREATE INDEX CONCURRENTLY work, we'd need two things: * GetLockConflicts would need to report the transient XIDs of the conflicting xacts, not regular XIDs, since they might not have regular XIDs. Then we'd wait on those locks instead of regular-XID locks. * The second phase where we wait out transactions that can still see old tuples doesn't work because such transactions won't necessarily be listed in the snapshot. Instead, what we have to do is look through the ProcArray for transactions whose advertised xmin is less than the xmax of our reference snapshot. When we find one, wait for it using its transient XID. AFAICT, C.I.C. is currently the only place in the system where we really need transient XIDs at all. Everyplace else that we need to wait for a transaction, it's because we found its regular XID in a tuple we want to lock or modify. So the whole thing is a bit annoying. Maybe we could get rid of the extra overhead with some shenanigans inside the lock manager, like not bothering to create a data structure representing the holding of a transient-XID lock until such time as C.I.C. actually tries to wait for it. But again, that seems like a second-pass optimization. Can anyone suggest better terminology than "transient XID"? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Tom Lane wrote: > I wrote: >> "Florian G. Pflug" <fgp@phlo.org> writes: >>> Yeah - I do not really like that dual-locking thing either. But it makes >>> prepared transaction handling much easier - if we were to only lock the >>> RID, we'd have to store the rid<->xid mapping for prepared transactions > >> Hmmm .... that's a good point. Not sure how to include prepared xacts >> in the scheme. > > After further thought I think we should avoid depending on PIDs as part > of a lock tag --- their behavior is too system-dependent, in particular > we have no right to assume that when a backend exits the same PID won't > be reassigned shortly after, possibly leading to confusion. > > Instead, I suggest that we keep a session counter in shared memory and > have each backend assign itself a session ID at startup using that. > A 32-bit session ID in combination with a 32-bit locally assigned > transaction number should be sufficiently unique to identify a > transaction (prepared or otherwise) for the purposes of locking. > These "transient XIDs" only need to be unique as long as the transaction > exists plus shortly thereafter (to avoid race conditions when someone > waits for a transaction that actually terminated a moment before). > So wraparound of the counters isn't a problem, although we probably > want to reserve zero as an invalid value. > > I think we only need a transient XID of this sort for top-level > transactions, not subtransactions. Sounds good, if we decide to go with the transient XID idea. So below for an alternative that I just came up with. > To make CREATE INDEX CONCURRENTLY work, we'd need two things: > > * GetLockConflicts would need to report the transient XIDs of the > conflicting xacts, not regular XIDs, since they might not have regular > XIDs. Then we'd wait on those locks instead of regular-XID locks. Yes. This is exactly what my patch does today. > * The second phase where we wait out transactions that can still see > old tuples doesn't work because such transactions won't necessarily > be listed in the snapshot. Instead, what we have to do is look > through the ProcArray for transactions whose advertised xmin is > less than the xmax of our reference snapshot. When we find one, > wait for it using its transient XID. > AFAICT, C.I.C. is currently the only place in the system where we > really need transient XIDs at all. Everyplace else that we need to > wait for a transaction, it's because we found its regular XID in > a tuple we want to lock or modify. So the whole thing is a bit > annoying. Maybe we could get rid of the extra overhead with some > shenanigans inside the lock manager, like not bothering to create > a data structure representing the holding of a transient-XID lock > until such time as C.I.C. actually tries to wait for it. But > again, that seems like a second-pass optimization. I've given some thought to that. There are two distinct things we need to be able to wait for 1) Until all current holders of a lock, grantmask conflicts with a given locklevel have dropped their lock 2) Until all currently in-use snapshots have an xmin larger than some given value (The xmax of the reference snapshot). (1) Could be solved directly in the lock manager. We'd need some mechanism to wake up a process whenever someone releases a ceratin lock. (2) Could be done by acquireing a ShareLock (with a new locktype LOCKTYPE_XMIN) on the xmin of a transaction's serializable snapshot when it's created. The second waiting phase of concurrent index builds would then be a) Find the oldest xmin in the ProcArray. b) If that xmin is equal or greater than the xmax of our reference snapshot, we're done. c) Wait until the ExclusiveLock (for LOCKTYPE_TRANSACTION) is released on that xmin. After that point, new transactions will compute an xmin greater than the oldest one we found in the ProcArray, because the limiting transactions has exited, and because ReadNewTransactionId returns a value greater than that xmin too (Otherwise, we'd have exited in (b)). d) Wait for all current holders of LOCKTYPE_XMIN to release their locks. (Using the machinery needed for (1)). No new holders can show up, because new snapshots will computer a larger xmin. e) Goto a). I could code (2), but I'd need help with (1) - The details of the locking subsystems are still somewhat a mystery to me. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: > Sounds good, if we decide to go with the transient XID idea. So below > for an alternative that I just came up with. This proposal appears to require taking and releasing a brand-new lock type every time a snapshot is made or destroyed. That is certainly not going to be less overhead than the transient-XID scheme. At least in READ COMMITTED mode, there are normally multiple snapshots taken per transaction. (Something worth noting here is that I expect soon, probably 8.4, we will fix things so that what a backend advertises in MyProc->xmin is the xmin of its oldest still-live snapshot. That means that xmin will change intra-transaction in READ COMMITTED mode, and thus that we would indeed need to take and release the sort of lock you are suggesting each time.) regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Sounds good, if we decide to go with the transient XID idea. So below >> for an alternative that I just came up with. > > This proposal appears to require taking and releasing a brand-new lock > type every time a snapshot is made or destroyed. That is certainly not > going to be less overhead than the transient-XID scheme. At least in > READ COMMITTED mode, there are normally multiple snapshots taken per > transaction. Only for the serializable shapsnot I'd have thought, making the overhead bearable (it's is surely the oldest of all the xact's shapshots) but ... > (Something worth noting here is that I expect soon, probably 8.4, > we will fix things so that what a backend advertises in MyProc->xmin > is the xmin of its oldest still-live snapshot. That means that xmin > will change intra-transaction in READ COMMITTED mode, and thus that > we would indeed need to take and release the sort of lock you are > suggesting each time.) .... with this in mind, my proposal looks pretty bad :-( What do you think about solving the requirements of the *first* waiting phase (Where we wait for current ShareLock holders) inside the lock manager? The only real downside I can see is that I feel uneasy about messing with that code... It seems to be subtle, and quick to anger ;-) For the second phase, I see two options .) Go forward with the transient XIDs / RIDs approach .) Do something similar to the indcreatexid idea the HOT patch implements. This essentially puts the burden of deciding an index's usability on the using xact, not on the one creating the index. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: > What do you think about solving the requirements of the *first* waiting > phase (Where we wait for current ShareLock holders) inside the lock manager? > The only real downside I can see is that I feel uneasy about messing with > that code... It seems to be subtle, and quick to anger ;-) It sounded pretty dubious to me. The problem is that we don't want to wait until we could actually *get* the lock, we only want to wait out the conflicting xacts that existed when we looked. This is important because there might be a steady stream of new xacts acquiring conflicting locks (ie, a steady stream of writers), and we don't want to either block them, or have to hope for a window where there are none. But the lock manager does not currently track who acquired a lock when, and I think it would add a lot of usually-wasted overhead to do that. I spent a fair amount of time yesterday trying to think of alternative solutions, and didn't really think of much. The core reason why C.I.C. is implemented the way it is is that it's entirely possible that there will be a deadlock with some other process (ie, the other process is old enough that we must wait for it, but it's blocked trying to acquire some lock that conflicts with our ShareUpdateExclusiveLock). Thus, waiting by trying to acquire XID locks is a good idea because it automatically detects deadlock, and may even be able to escape the deadlock by wait-queue rearrangement. (I'm not certain that the latter is applicable in any situation C.I.C. would get into, but I'm not certain it's not, either.) Schemes involving "sleep awhile and check the ProcArray again" are right out because they fail to detect deadlock. Everything else I could think of involved special new lockmanager features that would have to still preserve the ability to handle deadlocks, which didn't sound like something I want to tackle for this. So on the whole the extra transaction identifier seems to be the way to go. I haven't looked at how that interacts with HOT though. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> What do you think about solving the requirements of the *first* waiting >> phase (Where we wait for current ShareLock holders) inside the lock manager? >> The only real downside I can see is that I feel uneasy about messing with >> that code... It seems to be subtle, and quick to anger ;-) > > It sounded pretty dubious to me. The problem is that we don't want to > wait until we could actually *get* the lock, we only want to wait out > the conflicting xacts that existed when we looked. This is important > because there might be a steady stream of new xacts acquiring > conflicting locks (ie, a steady stream of writers), and we don't want to > either block them, or have to hope for a window where there are none. > But the lock manager does not currently track who acquired a lock when, > and I think it would add a lot of usually-wasted overhead to do that. > > I spent a fair amount of time yesterday trying to think of alternative > solutions, and didn't really think of much. The core reason why C.I.C. > is implemented the way it is is that it's entirely possible that there > will be a deadlock with some other process (ie, the other process is > old enough that we must wait for it, but it's blocked trying to acquire > some lock that conflicts with our ShareUpdateExclusiveLock). Thus, > waiting by trying to acquire XID locks is a good idea because it > automatically detects deadlock, and may even be able to escape the > deadlock by wait-queue rearrangement. (I'm not certain that the latter > is applicable in any situation C.I.C. would get into, but I'm not > certain it's not, either.) Schemes involving "sleep awhile and check > the ProcArray again" are right out because they fail to detect > deadlock. Everything else I could think of involved special new > lockmanager features that would have to still preserve the ability > to handle deadlocks, which didn't sound like something I want to > tackle for this. > > So on the whole the extra transaction identifier seems to be the > way to go. I haven't looked at how that interacts with HOT though. Ok, I'll update the patch to use your global id plus local id idea than, and remove TemporaryTransactionIds. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| Thread Tools | |
| Display Modes | |
|
|