vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi I've updated the lazy xid assignment patch, incorporating the results of the discussion over the past few days. I'll try to explain what the patch does, and why. This reasoning got a bit lengthy, but I felt that I should summarize my own thought, plus the relevant parts of the discussions here... And I might be a bit chatty too ;-) The patch can be found at: http://soc.phlo.org/lazyxidassign.v2.patch (Seems that large attachments don't get through on this list - or was that just bad luck when I tried posting the first version of that patch?) It includes a new regression test that asserts the we really don't assign xids for pure selects. The patch got a bit large in the end, though most of it is just the removal of XLOG_NO_TRAN and the various MyXact* flags and friends. The interesting parts really aren't that much code. The patch survives "make installcheck-parallel", and some light testing I did. I didn't really hammer it yet, though. I'll post this to the patches list if nobody finds a hole, or suggest further improvements. XID Assignment, Snapshot creation and oldest XMIN computation ================================================== ============ We no longer call GetNewTransactionId during transaction start. Instead, we leave the xid set to InvalidTransactionId, similar to what we currently do for subtransactions. GetTopTransactionId checks if we have a xid assigned, and assigns one using GetNewTransactionId if it's still InvalidTransactionId - again, similar to what GetCurrentTransactionId does for subtransactions already. There is a new function GetTopTransactionIdIfAny() bears the same relation to GetTopTransactionId() as GetCurrentTransactionIdIfAny() does to GetCurrentTransactionId() - it won't force xid assignment, but rather return InvalidTransactionId if the xid is not set. Both GetTopTransactionId() and GetCurrentTranscationId() use AssignTranscationId() to do the actualy work - this is a modified version of what is currently called AssignSubTransactionId(). GetOldestXmin() is changed slightly - it used to use GetTopTransactionId() as a starting point for it's xmin calculation, falling back to ReadNewTransactionId() if called from outside a transaction. Now, it always starts with ReadNewTransactionId(). It also used to ignore procarray entries without a valid xmin, which it obviously mustn't do anymore - they might still have a valid xmin set. The changes to GetSnapshotData() are similar. Again, we initialize xmin to ReadNewTransactionId() instead of GetTopTransactionId(). When scanning the procarray, we don't skip our own entry, but just process it normally. The function xid_age computes the age of a given xid relative to the transaction's toplevel xid. Since forcing xid assignment just to do that computation seemed silly, I modified xid_age to use MyProc->xmin as a reference instead. I could have used ReadNewTransactionId(), but that involves locking overhead, and we probably want to use something that remains constant during a transaction. Correctness issues: ------------------- * Users of Get{Top|Current}TransactionId{IfAny} * Since subtransactions *already* assign their xids on-demand, most of the code correctly distinguishes between GetCurrentTransactionId() and GetCurrentTransactionIdIfAny(). So that code is safe. Callers of GetTopTransactionId() have to be checked - but there are only very few of them. In fact, all callers are discussed above, with the exception of some logging code which I modified to use GetCurrentTransactionIdIfAny() instead. * Obtaining a snapshot * The most interesting function from a correctness perspective is GetSnapshotData(). It's correctness *does* heavily depend on the assumption that no transaction can *exit* the set of running transactions while it takes a snapshot - not after it set xmax in any case. This assumption ensures that if transaction A sees B as as committed, and B sees C as committed, than A will see C as committed. The assumption implies the "commit transitivity" because because C can't commit while A and B take their snapshot, and B can't commit while A takes it's snapshot. So for A to see B as committed, it'll have to take it's snapshot after B committed. And B will have to have taken it's snapshot after C commited, if B is to see C as committed. But since B will obviously haven taken it's snapshot before it committed, the ordering must then be |C commit|B snapshot|B commit|A snapshot|. Which implies that A will indeed see C as commited. This reasoning stays valid even if we lazy assign xids (No xids show up in the reasoning at all), provided that we see *all* transactions as committed that did so before we took a snapshot. But even with lazy xid assignment, the xid will surely be assigned before commit (or not at all, so nobody cares about us anyway) - so when we a snapshot after some commit, ReadNewTransactionId() will return a larger value than the just committed xid. So our snapshot's xmax will end up > that xid, the xid won't show up in our snasphot, and so we'll see it as committed. * Global xmin computation* Both GetSnapshotData() and GetOldestXmin() assume that concurrent callers will end up computing the same GlobalXmin. This is guaranteed as long as we hold the ProcArrayLock in ExclusiveMode while doing the GlobalXmin computation - something that my patch doesn't change. To cut the long story short - the small modifications outlined above are all that is needed to ensure the correctness of both GetSnapshotData and GetOldestXmin(), even with lazily assignment of xids. Top- and subtransaction commit and abort: ========================================= * Deciding whether to write COMMIT/ABORT records or not * The existing code maintains quite a few flags needed to decide if a COMMIT or ABORT record needs to be written, if the XLOG needs to be flushed afterwards and if so, how far. Those flage are * MyXactMadeTempRelUpdate * MyXactMadeXLogEntry * MyLastRecPtr * ProcLastRecEnd There is also a historical distinction between transactional and non-transactional xlog entries - the latter are generated by passing XLOG_NO_TRAN to XLogInsert. The only difference between those two types is that MyLastRecPtr is only advanced for transactional xlog entries, though. With lazy xid assignment, we have a much cleaner way to decide if a transaction has done anything that warrants writing a commit record. We assume that if someone forced xid assignment by calling GetCurrentTransactionId, than he will also have written that xid to disk, and so we must write a commit record. If no transaction id was assigned, then writing a commit record is impossible (there is no xid to put into that record), and also completely useless. The same hold true for ABORT records. A transaction might still have created an xlog entry, even without having been assigned an xid. An example for this is nextval(). We still want to force those records to permanent storage, even if we don't write a commit record. We therefore still track the end of the latest xlog record a transaction has written. My patch renames ProcLastRecEnd to XactLastRecEnd, and resets that value after committing or aborting a transaction. The rest of the flags (MyXactMadeTempRelUpdate, MyXactMadeXLogEntry and MyLastRecPtr) are removed because they no longer serve a useful purpose. Since removing MyLastRecPtr makes XLOG_NO_TRAN utterly useless, that flags is remove too. * Files scheduled for deletion on COMMIT & asynchronous commits * Whenever a transaction drops a relation, it schedules the physical file for deletion on COMMIT. Obviously we must only delete such a file when we are *absolutely* sure that our transaction is committed. Since we can only assume that when the COMMIT record has physically hit the disk, we first *must* not skip writing the COMMIT record if there are files scheduled for deletion, and second *must* not allow asynchronous commits in that case. The first requirement is already satisfied - file creation will surely be followed by a catalog update, which will force xid assingment, and thus writing a COMMIT record. There is an assertion in RecordTransactionCommit that checks for no schedules file deletions when we don't have a xid assigned. The second requirement can be relaxed a bit. If all to-be-deleted files are in fact temporary relations, then they will be gone after a crash anyway. So we may delete them before the COMMIT record hits the disk. Note that there is a small window where we might leak a file. If we flush the COMMIT record to disk, CHECKPOINT immediatly afterwards, and then crash before physically deleting the file, we won't do the deletion during recovery since our COMMIT record is before the latest CHECKPOINT. For asynchronous commit that window is actually *smaller* since the COMMIT record hits the disk later (maybe even after deleting the files. This is therefore *only* allowed if all files are temporary relations). * Files scheduled for deletion on ABORT * Whenever a transaction creates a file, it put it onto the on-abort delete list. There is no protection against leaks here - if we crash after creating the file, but before the transaction COMMITS or ABORTS, the file is leaked. The current code attempts to make that window smaller by forcing a xlog flush on ABORT if there are files to be deleted. But there is not much point in that - the transaction might have run for an arbitrarily long time after creating the file, and during that whole time a crash would have caused file leakage. So getting paranoid at ABORT time seems unjustified. And flushing might even do more harm than good here - it's entirely possible that it'll take longer to flush that it would have to get to the file deletion point if we had not flushed. * Suma sumarum * This modifies RecordTransactionCommit and RecordSubTransactionCommit, plus it joins RecordSubTransactionAbort and RecordTransactionAbort into one function, implementing the following rules (Which seems much clearer than the ones in the current code). Toplevel COMMIT: ..) If we have an XID assigned, write a COMMIT record. If not, assert that there are no pending deletions. ..) If we're not committing asynchronously, we flush the XLOG up to XactLastRecEnd. Otherwise, we just do XLogSetAsyncCommitLSN(), and let the wal writer do the flushing. ..) If we have an XID assigned, we update the CLOG. For asynchronous commits, we have to pass the LSN of our COMMIT record to prevent the update from hitting the disk before the xlog record does. ..) Reset XactLastRecEnd Subtransaction COMMIT: ..) Do nothing if we don't have a xid assigned, otherwise ..) Just update the CLOG. No WAL-logging needed. Toplevel *and* Subtransaction ABORT: ..) Do nothing if we don't have a xid assigned, otherwise ..) Write an ABORT record, and then ..) update the CLOG. We don't ensure that this doesn't hit the disk early - there is not much difference between actively aborting, and just being assumed to have crashed. ..) For subtransactions, remove the relevent xids from the procarray, otherwise reset XactLastRecEnd. Waiting for transactions to exit / ResourceOwnerIds ================================================== = Each transaction currently holds a lock on it's xid. This is used for two quite different purposes - one is waiting for a transaction that modified some tuple - this still works with lazily assigned xids. But during CREATE INDEX CONCURRENTLY, this is used to first wait for all transactions that are holding a lock conflicting with ShareLock on the relation in question, and than to out-wait *all* currently running transactions. This obviously fails when xids are assigned lazily. In the first waiting phase because a transaction will acquire the lock *before* requesting an xid, in the second phase because we have to out-wait even pure selects. Therefore, the concept of a ResourceOwnerId is introduced. This id identifies the toplevel resourceowner of a transaction, and the owning transaction holds a lock on it just as it does for transaction ids. Porting the first waiting phase to this is trivial - we just collect the RIDs instead of the XIDs of the current lock holders, and wait for them in turn. The second waiting phase is a bit more complicated, because the current code just waits on all xids in it's reference snapshot - which won't work for RIDs, because they do not show up in snapshots. So, instead, we scan the procarray, and wait for any transaction with an xmin < the xmax of our reference snapshot. ResourceOwnerIds are a struct composed of two uint32 fields - one sessionId that is assigned to each backend at backend startup (From a counter in shmem), and one localTransactionId that is just incremented locally whenever we need a new unique RID in a backend. This ensures that RIDs create no new locking overhead, apart from having to hold at least two locks per transaction instead of one. Locks on ResourceOwnerIds are not saved to the 2PC state file on PREPARE, since the RID or a transaction will be reused at time point after it was prepared. ResourceOwnerIds & pg_locks. ============================ Locking requests on RIDs are shown in pg_locks, just as those on xids are. Therefore, some representation has to be chosen for RIDs. XID won't work, because due to the local/global split we need 8 bytes to represent a RID. After much discussion (A new datatype is to heavy-weight, leaving out bits might hinder debugging, and using int8 is impossible as long as we support INT64_IS_BUSTED), we settled on a string representation. This at least has the advantage that the transition to a new full-blown type for RIDs would be seamless shoudl we even choose to go that way. My patch adds two columns to pg_locks - "resourceownerid" on the 'left' side (The locking request), and "resourceowner" on the 'right' side (the holder/waiter). This is similar to how xids are handled in that view. Possible future Improvements: ============================= It's not entirely clear that we really need the exclusive ProcArrayLock when exiting a transaction that didn't hold an xid. Not holding that lock *will* make two concurrent calculations of globalxmin get two different results sometimes - but that might actually be OK. With some effort, we could get rid of the lock we hold on the xid entirely I think. We'd need to reserve a special sessionId (0xffffffff probably) for RIDs of prepared transactions. During prepare, we'd then grab a lock on the RID (0xffffffff, xid) *before* releasing the lock on our old RID. In XactLockTableWait we'd have to first check if the transaction using the xid we want to wait for is still running, and get it's RID if it is. We'd than have to wait on that RID, and after getting that lock we'd have to wait on the RID (0xffffffff, xid). Only after getting that lock too, we may return. But that doesn't seem much prettier than just holding two locks per transaction, so for now I think it's best to leave things as they are. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: > The patch can be found at: http://soc.phlo.org/lazyxidassign.v2.patch > (Seems that large attachments don't get through on this list - or was that > just bad luck when I tried posting the first version of that patch?) No, the message size limit on -hackers is relatively small. You shouldn't post patches here anyway, they belong on -patches. A few comments ... > GetOldestXmin() is changed slightly - it used to use > GetTopTransactionId() as a starting point for it's xmin > calculation, falling back to ReadNewTransactionId() if called > from outside a transaction. Now, it always starts with ReadNewTransactionId(). It's kind of annoying to do ReadNewTransactionId() when most of the time it won't affect the result. However I don't see much choice; if we tried to read it only after failing to find any entries in the procarray, we'd have a race condition, since someone might acquire an xid and put it into the procarray after we look at their PGPROC and before we can do ReadNewTransactionId. It might be worth doing GetTopTransactionIdIfAny and only calling ReadNewTransactionId when that fails, since the former is far cheaper than the latter. Also we could cheaply check to see if our own xmin is set, and use that to initialize the computation. > The function xid_age computes the age of a given xid relative > to the transaction's toplevel xid. Since forcing xid assignment > just to do that computation seemed silly, I modified xid_age > to use MyProc->xmin as a reference instead. I could have used > ReadNewTransactionId(), but that involves locking overhead, > and we probably want to use something that remains constant > during a transaction. I don't much like this, since as I mentioned before I don't think MyProc->xmin is going to be constant over a whole transaction for long. I don't think xid_age is performance-critical in any way, so I'd vote for letting it force XID assignment. > The first requirement is already satisfied - file creation > will surely be followed by a catalog update, which will force xid assingment, > and thus writing a COMMIT record. There is an assertion in > RecordTransactionCommit that checks for no schedules file deletions when > we don't have a xid assigned. Rather than an Assert (which won't be there in production), I'd suggest a standard test and elog(ERROR). This point is sufficiently critical that I don't want the test to be missing in production builds ... > Therefore, the concept of a ResourceOwnerId is introduced. This id identifies > the toplevel resourceowner of a transaction, and the owning transaction holds > a lock on it just as it does for transaction ids. Porting the first waiting > phase to this is trivial - we just collect the RIDs instead of the XIDs of > the current lock holders, and wait for them in turn. I don't particularly like calling these ResourceOwnerId -- ResourceOwners are an implementation artifact that's purely backend-local. If we were to get rid of them in favor of some other resource management mechanism, we'd still need RIDs for the waiting purposes you mention. So I'm in favor of calling them something else, maybe virtual transaction ids (VXID or VID) --- not wedded to that if someone has a better idea. > Locks on ResourceOwnerIds are not saved to the 2PC state file on PREPARE, since > the RID or a transaction will be reused at time point after it was prepared. Hmm. I think this is all right since we don't actually need to wait for a prepared xact, but it still seems like a wart. Other than these relatively minor quibbles, the description seems all right, and it's so complete that I'm not sure I need to read the code ;-) As I just mentioned over in pgsql-performance, I'm strongly inclined to push this patch into 8.3, even though we're certainly far past the feature freeze deadline. Not allocating XIDs for transactions that only do SELECTs seems to me like it'll be a big win for a lot of high-performance applications. It's not so much that the act of allocating an XID is really expensive, it's that making a significant reduction in the rate at which XIDs are used up translates directly to a reduction in many overhead costs. For instance, the "live" area of pg_clog and pg_subtrans gets smaller, reducing pressure on those buffer areas, and the frequency with which vacuums are needed for anti-wraparound purposes drops. Comments? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: > A few comments ... > >> GetOldestXmin() is changed slightly - it used to use >> GetTopTransactionId() as a starting point for it's xmin >> calculation, falling back to ReadNewTransactionId() if called >> from outside a transaction. Now, it always starts with ReadNewTransactionId(). > > It's kind of annoying to do ReadNewTransactionId() when most of the time > it won't affect the result. However I don't see much choice; if we > tried to read it only after failing to find any entries in the > procarray, we'd have a race condition, since someone might acquire > an xid and put it into the procarray after we look at their PGPROC > and before we can do ReadNewTransactionId. > > It might be worth doing GetTopTransactionIdIfAny and only calling > ReadNewTransactionId when that fails, since the former is far > cheaper than the latter. Also we could cheaply check to see if > our own xmin is set, and use that to initialize the computation. Sounds worthwile, I'm going to do this. >> The function xid_age computes the age of a given xid relative >> to the transaction's toplevel xid. Since forcing xid assignment >> just to do that computation seemed silly, I modified xid_age >> to use MyProc->xmin as a reference instead. I could have used >> ReadNewTransactionId(), but that involves locking overhead, >> and we probably want to use something that remains constant >> during a transaction. > > I don't much like this, since as I mentioned before I don't think > MyProc->xmin is going to be constant over a whole transaction for > long. I don't think xid_age is performance-critical in any way, > so I'd vote for letting it force XID assignment. Hm... I agree that xid_age is probably not performance-critical. I guess it's more the complete counter-intuitivity of forcing xid assignment in some arbitrary function thats bugging me a bit. Maybe it'd be sufficient to guarantee a constant return value of this function within one statement for read-committed transaction? For serializable transactions I'd still be constant Sounds not completely ill-logical to me - after all the age of an xid *does* really change if my xmin moves forward. But I can live with just doing GetTopTransactionId() too... >> The first requirement is already satisfied - file creation >> will surely be followed by a catalog update, which will force xid assingment, >> and thus writing a COMMIT record. There is an assertion in >> RecordTransactionCommit that checks for no schedules file deletions when >> we don't have a xid assigned. > > Rather than an Assert (which won't be there in production), I'd suggest > a standard test and elog(ERROR). This point is sufficiently critical > that I don't want the test to be missing in production builds ... I agree - I didn't consider the fact that Asserts are disable in production builds when I made it an assertion. >> Therefore, the concept of a ResourceOwnerId is introduced. This id identifies >> the toplevel resourceowner of a transaction, and the owning transaction holds >> a lock on it just as it does for transaction ids. Porting the first waiting >> phase to this is trivial - we just collect the RIDs instead of the XIDs of >> the current lock holders, and wait for them in turn. > > I don't particularly like calling these ResourceOwnerId -- > ResourceOwners are an implementation artifact that's purely > backend-local. If we were to get rid of them in favor of some other > resource management mechanism, we'd still need RIDs for the waiting > purposes you mention. So I'm in favor of calling them something else, > maybe virtual transaction ids (VXID or VID) --- not wedded to that if > someone has a better idea. I not particularly proud of the name "ResourceOwnerId" - it was just the best I could come up with ;-) I quite like VirtualTransactionId actually - it emphasizes the point that they never hit the disk. So I'll rename it to VirtualTransactionId if no better idea comes up. >> Locks on ResourceOwnerIds are not saved to the 2PC state file on PREPARE, since >> the RID or a transaction will be reused at time point after it was prepared. > > Hmm. I think this is all right since we don't actually need to wait for > a prepared xact, but it still seems like a wart. It certainly is a wart, though but removing it will create a bigger one I fear... The only other I idea I had was to make locks on RIDs session locks, and to drop them explicitly at toplevel commit and abort time, and that certainly has a quite high warty-ness score... > Other than these relatively minor quibbles, the description seems all > right, and it's so complete that I'm not sure I need to read the code > ;-) I *did* warn that I tend to be a bit chatty at times, though ;-) greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: > Tom Lane wrote: >> I don't much like this, since as I mentioned before I don't think >> MyProc->xmin is going to be constant over a whole transaction for >> long. I don't think xid_age is performance-critical in any way, >> so I'd vote for letting it force XID assignment. > Hm... I agree that xid_age is probably not performance-critical. > I guess it's more the complete counter-intuitivity of forcing > xid assignment in some arbitrary function thats bugging me a bit. Well, I don't see it as arbitrary --- we're trying to hide the fact that XIDs are assigned lazily from user-visible behavior. >>> Locks on ResourceOwnerIds are not saved to the 2PC state file on PREPARE, since >>> the RID or a transaction will be reused at time point after it was prepared. >> >> Hmm. I think this is all right since we don't actually need to wait for >> a prepared xact, but it still seems like a wart. > > It certainly is a wart, though but removing it will create a bigger one > I fear... The only other I idea I had was to make locks on RIDs session > locks, and to drop them explicitly at toplevel commit and abort time, > and that certainly has a quite high warty-ness score... Hmm. I was thinking that we could just not do anything special here, but on second thought I see the problem: on crash and restart we are intending to restart the session ID counter from zero, which creates a significant probability of some session conflicting with a VXID lock held by a prepared xact. So I guess filtering the VXID lock out of the set sent to disk is probably the best answer. There's definitely no need for the prepared xact to continue to hold that lock after crash/restart. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| "Tom Lane" <tgl@sss.pgh.pa.us> writes: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Tom Lane wrote: >>> I don't much like this, since as I mentioned before I don't think >>> MyProc->xmin is going to be constant over a whole transaction for >>> long. I don't think xid_age is performance-critical in any way, >>> so I'd vote for letting it force XID assignment. > >> Hm... I agree that xid_age is probably not performance-critical. >> I guess it's more the complete counter-intuitivity of forcing >> xid assignment in some arbitrary function thats bugging me a bit. > > Well, I don't see it as arbitrary --- we're trying to hide the fact that > XIDs are assigned lazily from user-visible behavior. It seems both this and some of the other cases of having to call ReadNewTransactionId would be eliminated if we invented a new xid treated similarly to current_time() and others which are calculated once per transaction and then cached for subsequent accesses. So xid_age() would measure relative to a fixed xid, it just wouldn't be *our* xid necessarily. Just a thought. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Gregory Stark <stark@enterprisedb.com> writes: > It seems both this and some of the other cases of having to call > ReadNewTransactionId would be eliminated if we invented a new xid treated > similarly to current_time() and others which are calculated once per > transaction and then cached for subsequent accesses. So xid_age() would > measure relative to a fixed xid, it just wouldn't be *our* xid necessarily. Hm, so basically call ReadNewTransactionId the first time one of these functions is used within a given transaction, and save that? Or we could use our own transaction ID if one was already assigned at that point. Seems reasonable. Not sure what to call the thing though. ArbitraryReferenceXID? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| August Zajonc wrote: I assume that you meant that mail to go to pgsql-hackers too, and just clicked the wrong button by accident. If not, please forgive that I post the reply to the list. > Are you going to look at putting temp relations into a special folder > for 8.4? I really like that idea for crash cleanup and generally. According to Alvaro, that is already the case. I didn't check, though. > For non temp files, can they be created in a pending folder, work done > on them, and then moved over to final destination just before commit? Hm... that doesn't sound bad at first sight. We'd need one such folder per tablespace, though. > You could still leak them (ie, you move to final location and bail > before commit) but it seems to narrow the window down significantly. That leak could be prevented I think if the COMMIT record indicated which files are to be moved, and the actual move happens after the flushing the COMMIT record to disk. Probably the move would need to happen while we're in the commit critical section too, to guard against concurrent checkpoints - but that all seems doable. > Actually, as I write this, a bad idea, the temp files are the low > hanging fruit, don't think it's worth dealing with non-temp files until > you can make it totally leakproof. There is no fundamental difference between temporary and non-temporary delete-on-abort files I think. There is one for delete-on-commit files, (and only because of asynchronous commits) but the leakage possibility for those is already really small (I'd take a COMMIT, a crash *and* a CHECKPOINT at just the right time to get a leak). The hardest part seems to be making the code cope with moved files. Though we only need to care about that local backend I believe, because all others haven't yet seen the files anyway. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Gregory Stark wrote: > "Tom Lane" <tgl@sss.pgh.pa.us> writes: > >> "Florian G. Pflug" <fgp@phlo.org> writes: >>> Tom Lane wrote: >>>> I don't much like this, since as I mentioned before I don't think >>>> MyProc->xmin is going to be constant over a whole transaction for >>>> long. I don't think xid_age is performance-critical in any way, >>>> so I'd vote for letting it force XID assignment. >>> Hm... I agree that xid_age is probably not performance-critical. >>> I guess it's more the complete counter-intuitivity of forcing >>> xid assignment in some arbitrary function thats bugging me a bit. >> Well, I don't see it as arbitrary --- we're trying to hide the fact that >> XIDs are assigned lazily from user-visible behavior. > > It seems both this and some of the other cases of having to call > ReadNewTransactionId would be eliminated if we invented a new xid treated > similarly to current_time() and others which are calculated once per > transaction and then cached for subsequent accesses. So xid_age() would > measure relative to a fixed xid, it just wouldn't be *our* xid necessarily. Hm.. for xid_age that would be OK I think. I'm not so sure about GetOldestXmin(), though. If we cache the value *before* assigning a xid, ur cached value might very well turn out to be older than any xmin and xid in the procarray. GetOldestXmin() would then return a value older than necessary, which effectively means keeping old data around longer than necessary. If we only cache the value *after* setting an xmin it's entirely redundant - we might just use the xid, and be done with it. I've actually checked who actually uses GetOldestXmin now. The list is surprisingly short. CreateCheckPoint - To find the pg_subtrans truncation point. Called from bgwriter, and therefore from outside a transaction. IndexBuildHeapScan - Only in the nonconcurrent case. Hardly performance-critical. vacuum_set_xid_limits - To decide which tuples are recently dead (Only called one per VACUUM). Hardly performance-critical. The reason this list is so short is that we *already* cache a global xmin value in RecentGlobalXmin. So doing any more optimizations for GetOldestXmin() seems like overkill. So I'd say lets do the optimization Tom suggested (Checking GetTopTransactionIdIfAny() before calling ReadNewTransactionId()), and be done with it. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Florian G. Pflug wrote: > August Zajonc wrote: > I assume that you meant that mail to go to pgsql-hackers too, and just > clicked the wrong button by accident. If not, please forgive that > I post the reply to the list. > >> Are you going to look at putting temp relations into a special folder for >> 8.4? I really like that idea for crash cleanup and generally. > According to Alvaro, that is already the case. I didn't check, though. Huh, no, I was just suggesting we could do it :-) -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Tom Lane wrote: >>> I don't much like this, since as I mentioned before I don't think >>> MyProc->xmin is going to be constant over a whole transaction for >>> long. I don't think xid_age is performance-critical in any way, >>> so I'd vote for letting it force XID assignment. > >> Hm... I agree that xid_age is probably not performance-critical. >> I guess it's more the complete counter-intuitivity of forcing >> xid assignment in some arbitrary function thats bugging me a bit. > > Well, I don't see it as arbitrary --- we're trying to hide the fact that > XIDs are assigned lazily from user-visible behavior. Hm.. but xid_age already exposes an implementation detail (that xid comparison is circular). Actually, after more thought on this I'd argue that the only really correct reference for computing a xid's age is GetOldestXmin(). It's what VACUUM uses to calculate the freezeLimit, after all. Now, using GetOldestXmin() directly is probably unwise because it might change at any time. So I now think that using RecentOldestXmin is the correct thing to do. The fact that it might change between statements for a read-committed transaction is quite natural I'd say. After all, read-committed transaction *do* see changes made by other transaction, so why shouldn't they see xid age values changing, if this is what those values really do? Of course, this is very similar to Gregory's suggestion - only that it uses a cached xid value that already exists, instead of adding a new one. Plus, it uses a value set once per snapshot instead of once per transaction. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| Thread Tools | |
| Display Modes | |
|
|