| Hi,

I've finished (hopefully) the code to handle a current list of open snapshots in a transaction. I'm now wondering how to put it to good use ;-) I'm not posting it yet -- first I want to get some feedback on the previous patch I posted, http://archives.postgresql.org/pgsql...3/msg00245.php

I think the important change here is switching the semantics of MyProc->xmin. Currently, it is "the minimum of Xmin and Xid, across all backends, at the moment the current transaction fetches its serializable snapshot". The first important bit is that it is computed only once: when the serializable snapshot is taken.

So ISTM the important change is that we will have to update MyProc->xmin more frequently than that. I'm thinking of keeping enough local state so that we can detect at what time the earliest open snapshot is unregistered; when that happens, we can recalculate MyProc->xmin based on the snapshots we have and the Xid/Xmin of remote backends (which could have also been updating their own xmins).

There is one hole here: contention on ProcArrayLock. Basically, for simple transactions we will need to update MyProc after every command. It has been reported that ProcArrayLock is the most contended lock for some loads; this would only add to that, and heavily I think. Perhaps we could restructure the locking here somehow to avoid this problem, but it is complex enough already that it may not even be possible.

Another idea is to throttle the updating of Xmin so it only happens once in a while, but it's difficult to find a useful criterion and avoid falling into the trap that we just neglected to update it before a large command.

Thoughts?

-- Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
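The local bookkeeping described in this post can be sketched roughly as follows. This is a minimal, self-contained model in C and not the patch under discussion: `SnapshotRegistry`, `registry_register`, `registry_unregister`, and `MAX_OPEN_SNAPSHOTS` are hypothetical names, `published_xmin` merely stands in for MyProc->xmin, and the recalculation only looks at the backend's own remaining snapshots. It ignores the Xid/Xmin of remote backends and does no locking at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)
#define MAX_OPEN_SNAPSHOTS 64   /* arbitrary bound, assumption of this sketch */

/* Wraparound-aware "a is older than b" (modulo-2^32 comparison). */
static bool xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical per-backend list of the xmins of all open snapshots. */
typedef struct SnapshotRegistry
{
    TransactionId xmins[MAX_OPEN_SNAPSHOTS];
    size_t        count;
    TransactionId published_xmin;   /* stand-in for MyProc->xmin */
} SnapshotRegistry;

static void registry_init(SnapshotRegistry *reg)
{
    reg->count = 0;
    reg->published_xmin = InvalidTransactionId;
}

/* Oldest xmin among the open snapshots, or InvalidTransactionId if none. */
static TransactionId registry_oldest(const SnapshotRegistry *reg)
{
    TransactionId oldest = InvalidTransactionId;

    for (size_t i = 0; i < reg->count; i++)
        if (oldest == InvalidTransactionId || xid_precedes(reg->xmins[i], oldest))
            oldest = reg->xmins[i];
    return oldest;
}

/* Record a newly opened snapshot; returns false if the registry is full. */
static bool registry_register(SnapshotRegistry *reg, TransactionId xmin)
{
    if (reg->count == MAX_OPEN_SNAPSHOTS)
        return false;
    reg->xmins[reg->count++] = xmin;
    if (reg->published_xmin == InvalidTransactionId ||
        xid_precedes(xmin, reg->published_xmin))
        reg->published_xmin = xmin;
    return true;
}

/*
 * Drop one snapshot. Returns true only when the published xmin changed,
 * i.e. when the earliest open snapshot was the one that went away.
 */
static bool registry_unregister(SnapshotRegistry *reg, TransactionId xmin)
{
    size_t i;

    for (i = 0; i < reg->count; i++)
        if (reg->xmins[i] == xmin)
            break;
    if (i == reg->count)
        return false;                           /* was never registered */

    reg->xmins[i] = reg->xmins[--reg->count];   /* swap-remove */

    TransactionId oldest = registry_oldest(reg);
    if (oldest == reg->published_xmin)
        return false;                           /* earliest snapshot still open */
    reg->published_xmin = oldest;
    return true;
}
```

With snapshots registered at xmin 100 and then 105, unregistering the first one moves `published_xmin` to 105, while unregistering the newer one first changes nothing. In a real backend, every `true` return would be a point where the shared value has to be updated, which is where the contention question comes in.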
| |||
| On Tue, 2008-03-25 at 17:26 -0300, Alvaro Herrera wrote:
> I've finished (hopefully) the code to handle a current list of open snapshots in a transaction. I'm now wondering how to put it to good use ;-) I'm not posting it yet -- first I want to get some feedback on the previous patch I posted, http://archives.postgresql.org/pgsql...3/msg00245.php

As I said before, it looks fine. In your words, it "just moves code around", so there's not much to complain about.

> I think the important change here is switching the semantics of MyProc->xmin. Currently, it is "the minimum of Xmin and Xid, across all backends, at the moment the current transaction fetches its serializable snapshot". The first important bit is that it is computed only once: when the serializable snapshot is taken.

Yes, I see that as necessary. So the refactoring makes sense, since we'll be adding lots of stuff in that area and it's good to keep it isolated.

> So ISTM the important change is that we will have to update MyProc->xmin more frequently than that. I'm thinking of keeping enough local state so that we can detect at what time the earliest open snapshot is unregistered; when that happens, we can recalculate MyProc->xmin based on the snapshots we have and the Xid/Xmin of remote backends (which could have also been updating their own xmins).
>
> There is one hole here: contention on ProcArrayLock. Basically, for simple transactions we will need to update MyProc after every command. It has been reported that ProcArrayLock is the most contended lock for some loads; this would only add to that, and heavily I think. Perhaps we could restructure the locking here somehow to avoid this problem, but it is complex enough already that it may not even be possible.

I don't see that this would be a contention problem. We are already careful to read the xmin just once during GetSnapshotData(). We advance it while holding only a LW_SHARED lock during a serializable snapshot, so I'm not sure why we wouldn't advance it at other times also without contention issues. Why does anyone else know or care whether we're taking a serializable snapshot or not?

The issue is whether we agree that it is correct to do so. If we're advancing it in the circumstances you say, then yes I agree it is.

-- Simon Riggs
2ndQuadrant http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk |
| |||
| On Tue, 2008-03-25 at 17:26 -0300, Alvaro Herrera wrote:
> There is one hole here: contention on ProcArrayLock. Basically, for simple transactions we will need to update MyProc after every command.

If we're just updating MyProc->xmin, we only need to acquire ProcArrayLock in shared mode, right?

> Another idea is to throttle the updating of Xmin so it only happens once in a while, but it's difficult to find a useful criterion and avoid falling into the trap that we just neglected to update it before a large command.

Using LWLockConditionalAcquire() might help also.

-Neil |
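As a rough illustration of the conditional-acquire idea: LWLockConditionalAcquire() returns immediately with false instead of waiting when the lock is not available, so a backend could skip an xmin update while the lock is busy and retry later. The sketch below is a self-contained C model and does not call any PostgreSQL API. `TryLock` (built on a C11 `atomic_flag`, exclusive only, so it does not model shared mode), `XminPublisher`, and `publish_xmin_if_uncontended` are hypothetical names, and `published` stands in for MyProc->xmin.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical try-lock; stands in for ProcArrayLock in this sketch. */
typedef struct TryLock
{
    atomic_flag held;
} TryLock;

static void trylock_init(TryLock *l)
{
    atomic_flag_clear(&l->held);
}

/* Returns true if the lock was free and is now held; never waits. */
static bool trylock_acquire(TryLock *l)
{
    return !atomic_flag_test_and_set(&l->held);
}

static void trylock_release(TryLock *l)
{
    atomic_flag_clear(&l->held);
}

typedef struct XminPublisher
{
    TryLock      *lock;
    TransactionId published;    /* stand-in for MyProc->xmin */
    TransactionId pending;      /* newest value not yet published */
    bool          has_pending;
} XminPublisher;

/*
 * Try to publish new_xmin. If the lock is busy, remember the value and
 * return false so the caller can retry at a later point.
 */
static bool publish_xmin_if_uncontended(XminPublisher *p, TransactionId new_xmin)
{
    p->pending = new_xmin;
    p->has_pending = true;

    if (!trylock_acquire(p->lock))
        return false;           /* contended: leave the old value in place */

    p->published = p->pending;
    p->has_pending = false;
    trylock_release(p->lock);
    return true;
}
```

This only shows the skip-when-busy pattern. It does not by itself address the concern quoted above: a caller would still need some rule for forcing the update before a long-running command, and the sketch takes no position on what that rule should be.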
| |||
| "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Neil Conway wrote:
>> If we're just updating MyProc->xmin, we only need to acquire ProcArrayLock in shared mode, right?

> In fact, do you need a lock at all?

I think you probably do. GetSnapshotData needs to be confident that the global xmin it computes is <= the xmin that any other backend might be about to store into its MyProc->xmin; how can you ensure that if there's no locking happening?

Now the way I'd been envisioning this would work is that whenever the number of active snapshots goes to zero, we clear MyProc->xmin, and that probably could be done without a lock. Then the next time we do GetSnapshotData, it would compute and store a new MyProc->xmin (this would be the same activity that we currently think of as "setting the serializable snapshot"). So you don't need any more locking than already exists.

regards, tom lane |
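The counter-based scheme described in this post can be modelled in a few lines. This is a minimal sketch under stated assumptions, not PostgreSQL code: `BackendSnapshotState`, `take_snapshot`, and `release_snapshot` are hypothetical names, `published_xmin` stands in for MyProc->xmin, and the xmin that GetSnapshotData would compute under its existing lock is simply passed in as an argument.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical per-backend state: just a counter plus the published value. */
typedef struct BackendSnapshotState
{
    unsigned      active_snapshots;
    TransactionId published_xmin;   /* stand-in for MyProc->xmin */
} BackendSnapshotState;

/*
 * Stand-in for taking a snapshot. computed_xmin is the value the real
 * snapshot code would have computed while holding its lock; it is only
 * stored when no xmin is currently published.
 */
static TransactionId take_snapshot(BackendSnapshotState *s,
                                   TransactionId computed_xmin)
{
    if (s->published_xmin == InvalidTransactionId)
        s->published_xmin = computed_xmin;
    s->active_snapshots++;
    return computed_xmin;
}

/* Releasing the last active snapshot clears the published xmin. */
static void release_snapshot(BackendSnapshotState *s)
{
    assert(s->active_snapshots > 0);
    if (--s->active_snapshots == 0)
        s->published_xmin = InvalidTransactionId;
}
```

In this model the published xmin never moves forward while any snapshot is still open: with snapshots taken at xmin 100 and 105, releasing one of them leaves the value at 100, and only releasing both clears it so that the next snapshot publishes a fresh value. The design trades that missed opportunity for not having to track individual snapshots.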
| |||
| Neil Conway wrote:
> On Tue, 2008-03-25 at 17:26 -0300, Alvaro Herrera wrote:
>> There is one hole here: contention on ProcArrayLock. Basically, for simple transactions we will need to update MyProc after every command.
>
> If we're just updating MyProc->xmin, we only need to acquire ProcArrayLock in shared mode, right?

In fact, do you need a lock at all? We already assume that reading/writing a TransactionId is atomic in many places. We acquire ProcArrayLock at the end of transaction when we clear MyProc->xid, to ensure that we don't exit the set of running transactions while someone else is taking a snapshot, but AFAICS that's not necessary when we just advance MyProc->xmin.

-- Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com |
| |||
| On Wednesday 26 March 2008, Tom Lane wrote:
> whenever the number of active snapshots goes to zero

Does this ever happen? I mean, if the way to avoid locking contention is to rely on a production system which lets the service "breathe" from time to time, maybe there's something wrong in the reasoning.

Of course I'm much more ready to accept that I don't understand the first bit of it all than to consider you're off track here, but...

-- dim

If you ask a stupid question, you may feel stupid. If you don’t ask a stupid question, you remain stupid. -- Tony Rothman, Ph.D. U. Chicago, Physics |
| |||
| "Tom Lane" <tgl@sss.pgh.pa.us> writes:
> "Heikki Linnakangas" <heikki@enterprisedb.com> writes:
>> Neil Conway wrote:
>>> If we're just updating MyProc->xmin, we only need to acquire ProcArrayLock in shared mode, right?
>
>> In fact, do you need a lock at all?
>
> I think you probably do. GetSnapshotData needs to be confident that the global xmin it computes is <= the xmin that any other backend might be about to store into its MyProc->xmin; how can you ensure that if there's no locking happening?

Surely xmin would only ever advance? How can removing snapshots cause xmin to retreat at all, let alone behind the global xmin GetSnapshotData calculated?

> Now the way I'd been envisioning this would work is that whenever the number of active snapshots goes to zero, we clear MyProc->xmin, and that probably could be done without a lock. Then the next time we do GetSnapshotData, it would compute and store a new MyProc->xmin (this would be the same activity that we currently think of as "setting the serializable snapshot"). So you don't need any more locking than already exists.

It's the same locking in theory from the point of view of where in the code the locking happens. But I don't think it's the same locking in practice from the point of view of how much wall-clock time passes between locks.

Consider a data loading job which has millions of INSERT statements in a file. Currently if you put them all in a transaction it takes a single snapshot and runs them all with the same snapshot.

If you reset xmin whenever you have no live snapshots then that job would be doing that between every INSERT statement.

-- Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training! |
| |||
| Dimitri Fontaine <dfontaine@hi-media.com> writes:
> On Wednesday 26 March 2008, Tom Lane wrote:
>> whenever the number of active snapshots goes to zero

> Does this ever happen?

Certainly: between any two commands of a non-serializable transaction. In a serializable transaction the whole thing is a dead issue anyway, since the original snapshot has to be kept.

There are corner cases involving open cursors where a snapshot might persist longer, and then the optimization wouldn't apply. The formulation that Alvaro gave would sometimes be able to move xmin forward when the simple no-snaps-left rule wouldn't, such as create cursor A, create cursor B (with a newer snap), close cursor A. However I really doubt that scenarios like this occur often enough to be worth having a much more expensive snapshot-management mechanism.

regards, tom lane |
| |||
| Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> I think you probably do. GetSnapshotData needs to be confident that the global xmin it computes is <= the xmin that any other backend might be about to store into its MyProc->xmin; how can you ensure that if there's no locking happening?

> Surely xmin would only ever advance?

You couldn't guarantee that without any lock. The risk case is where someone else is in the process of setting his own xmin, but is running so slowly that he's included an XID that isn't there anymore. So someone else coming in and doing a computation of global xmin will compute a higher value than what the slow guy is about to publish.

I agree that it would be safe for a backend to increase its already-published xmin to some higher value without a lock. But I don't see the point. The place where you'd actually be computing the new value is in GetSnapshotData, and that can't run without a lock for the above-mentioned reason.

> It's the same locking in theory from the point of view of where in the code the locking happens. But I don't think it's the same locking in practice from the point of view of how much wall-clock time passes between locks.

> Consider a data loading job which has millions of INSERT statements in a file. Currently if you put them all in a transaction it takes a single snapshot and runs them all with the same snapshot.

> If you reset xmin whenever you have no live snapshots then that job would be doing that between every INSERT statement.

These statements are 100% nonsense.

regards, tom lane |
| ||||
| "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Consider a data loading job which has millions of INSERT statements in a file. Currently if you put them all in a transaction it takes a single snapshot and runs them all with the same snapshot.
>
>> If you reset xmin whenever you have no live snapshots then that job would be doing that between every INSERT statement.
>
> These statements are 100% nonsense.

Uhm, yeah, I somehow didn't write what I was thinking. I didn't mean to say we would be taking a new snapshot for each INSERT but that we would be resetting xmin for each INSERT. Whereas currently we only set xmin once when we set the serializable snapshot.

-- Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's On-Demand Production Tuning |