This is a discussion on Advice on MyXactMade* flags, MyLastRecPtr, pendingDeletes and lazyXID assignment within the pgsql Hackers forums, part of the PostgreSQL category; --> Hi When we lazily assign XIDs, we gain another flag beside the existing MyXactMadeTempRelUpdates, MyXactMadeXLogEntry, MyLastRecPtr and smgr's pendingDeletes ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi When we lazily assign XIDs, we gain another flag beside the existing MyXactMadeTempRelUpdates, MyXactMadeXLogEntry, MyLastRecPtr and smgr's pendingDeletes to tell what kind of actions a transaction performed. Adding TransactionIsIsValid(GetCurrentTransactionIdIfAny( )) on top of that makes things quite impenetrable - at least for me. So I'm trying to wrap my head around that logic, and simplify it a bit if possible. (Nowadays, async commit even adds a bit more complexity) Currently, we write out a COMMIT record if a transaction either created any transaction-controlled XLOG entries (MyLastRecPtr.xrecoff != 0), or scheduled the deletion of files on commit. Afterwards, the xlog is flushed to the end last record created by the session (ProcLastRecEnd) if the transaction created any xlog record at all. If we either made transaction-controlled XLOG entries, or temporary relation updates, we update the XID status in the CLOG. An ABORT record is currently created if a transaction either created transaction-controlled XLOG entries or scheduled the deletion of files on abort. If we schedules file deletions, we flush the XLOG up to the ABORT record. If we either made transaction-controlled XLOG entries, updated temporary relations, or scheduled deletions we update the XID status in the CLOG. For subtransaction commit, a COMMIT record is emitted if we either made transaction-controlled XLOG entries, or updated temporary relations. No XLOG flush is performed. Subtransaction ABORTS are handled the same way as regular transaction aborts. For toplevel transaction commits, we defer flushing the xlog if synchronous_commit = off, and we didn't schedule any file deletions. Now, with lazy XID assignment I believe the following holds true ..) If we didn't assign a valid XID, we cannot have made transaction-controlled XLOG entries (Can be checked by asserting that the current transaction id is valid if XLOG_NO_TRAN isn't set in XLogInsert). ..) We cannot have scheduled files for deletion (on either COMMIT or ABORT) if we don't have a valid XID, since any such deletion will happen together with a catalog update. Note that this is already assumed to be true for subtransactions, since they only call RecordSubTransaction{Commit|Abort} if they have an XID assigned. I propose to do the following in my lazy XID assignment patch - can anyone see a hole in that? ..) Get rid of MyLastRecPtr and MyXactMadeTempRelUpdates. Those are superseeded by TransactionIdIsValid(GetCurrentTransactionIdIfAny( )). ..) Get rid of MyXactMadeXLogEntry. Instead, just reset ProcLast ..) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting a new toplevel transaction. Transaction COMMIT: Write an COMMIT record if and only if we have a valid XID. Then, flush the XLOG to XactLastRecEnd if that is set, and synchronous_commit=on. Aferwards, update the CLOG if and only if we have a valid XID. Transaction ABORT: Write an ABORT record if and only if we have a valid XID. Then, flush the XLOG to XactLastRecEnd if that is set, and we scheduled on-abort deletions. Update the CLOG if and only if we have a valid XID. Subtransaction COMMIT: Update the CLOG if and only if we have a valid XID. Subtransaction ABORT: Write an ABORT record if and only if we have a valid XID. Then, flush the XLOG to XactLastRecEnd if that is set, and we scheduled on-abort deletions. Update the CLOG if and only if we have a valid XID. I think we might go even further, and *never* flush the XLOG on abort, since if we crash just before the abort won't log anything either. But if we leak the leftover files in such a case, that's probably a bad idea. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: > I propose to do the following in my lazy XID assignment patch - can > anyone see a hole in that? Cleaning up this area seems like a good idea. Just FYI, one reason why there are so many LastRec pointer variables is that the WAL record format used to include a back-link to the previous record of the same transaction, so we needed to track that location. Since that's gone, simplification is definitely possible. A lot of the other behavior you're looking at "just grew" as incremental optimizations added over time. One comment is that at the time we make an entry into smgr's pending-deletes list, I think we might not have acquired an XID yet --- if I understand your patch correctly, a CREATE TABLE would acquire an XID when it makes its first catalog insertion, and that happens after creating the on-disk table file. So it seems like a good idea for smgr itself to trigger acquisition of an XID before it makes a pending-deletes entry. This ensures that you can't have a situation where you have deletes to record and no XID; otherwise, an elog between smgr insertion and catalog insertion would lead to just that. > .) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting > a new toplevel transaction. I'm not very happy with that name for the variable, because it looks like it might refer to the last transaction-controlled record we emitted, rather than the last record of any type. Don't have a really good suggestion though --- CurXactLastRecEnd is the best I can do. One thought here is that it's not clear that we really need a concept of transaction-controlled vs not-transaction-controlled xlog records anymore. In CVS HEAD, the *only* difference no_tran makes is whether to set MyLastRecPtr, and you propose removing that variable. This seems sane to me --- the reason for having the distinction at all was Vadim's plan to implement transaction UNDO by scanning its xlog records backwards, and that idea is as dead as a doornail. So we could simplify matters conceptually if we got rid of any reference to such a distinction. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> I propose to do the following in my lazy XID assignment patch - can >> anyone see a hole in that? > > One comment is that at the time we make an entry into smgr's > pending-deletes list, I think we might not have acquired an XID yet > --- if I understand your patch correctly, a CREATE TABLE would acquire > an XID when it makes its first catalog insertion, and that happens > after creating the on-disk table file. So it seems like a good idea > for smgr itself to trigger acquisition of an XID before it makes a > pending-deletes entry. This ensures that you can't have a situation > where you have deletes to record and no XID; otherwise, an elog > between smgr insertion and catalog insertion would lead to just that. I wonder a bit about the whole special-casing of COMMITs/ABORTs with pending delete, though. A crash might always leave stray file around, so there ought to be a way to clean them up anyway. Still, for now I'll go with your suggestion, and force XID assignment in the smgr. >> .) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting >> a new toplevel transaction. > > I'm not very happy with that name for the variable, because it looks > like it might refer to the last transaction-controlled record we > emitted, rather than the last record of any type. Don't have a really > good suggestion though --- CurXactLastRecEnd is the best I can do. Hm.. don't have a good suggestion, either - the reason I want to rename it is that ProcLastRecEnd doesn't sound like it's be reset at transaction start. > One thought here is that it's not clear that we really need a concept of > transaction-controlled vs not-transaction-controlled xlog records > anymore. In CVS HEAD, the *only* difference no_tran makes is whether > to set MyLastRecPtr, and you propose removing that variable. This > seems sane to me --- the reason for having the distinction at all was > Vadim's plan to implement transaction UNDO by scanning its xlog records > backwards, and that idea is as dead as a doornail. So we could simplify > matters conceptually if we got rid of any reference to such a > distinction. I've thinking about keeping XLOG_NO_TRAN, and doing if (!no_tran) Assert(TransactionIdIsValid(GetCurrentTransactionI dIfAny()) in xlog.c as a safety measure. We can't make that assertion unconditionally, I think, because nextval() won't force XID assigment, but might do XLogInsert. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: > Tom Lane wrote: >> One thought here is that it's not clear that we really need a concept of >> transaction-controlled vs not-transaction-controlled xlog records >> anymore. > I've thinking about keeping XLOG_NO_TRAN, and doing > if (!no_tran) > Assert(TransactionIdIsValid(GetCurrentTransactionI dIfAny()) > in xlog.c as a safety measure. Why do you think this is a safety measure? All that it is checking is whether the caller has preserved an entirely useless distinction. The real correctness property is that you can't write your XID into a heap tuple or XLOG record if you haven't acquired an XID, but that seems nearly tautological. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Tom Lane wrote: >>> One thought here is that it's not clear that we really need a concept of >>> transaction-controlled vs not-transaction-controlled xlog records >>> anymore. > >> I've thinking about keeping XLOG_NO_TRAN, and doing >> if (!no_tran) >> Assert(TransactionIdIsValid(GetCurrentTransactionI dIfAny()) >> in xlog.c as a safety measure. > > Why do you think this is a safety measure? All that it is checking > is whether the caller has preserved an entirely useless distinction. > The real correctness property is that you can't write your XID > into a heap tuple or XLOG record if you haven't acquired an XID, > but that seems nearly tautological. I was confused. I wanted to protect against the case the an XID hits the disk, but doesn't show up in any xl_xid field, and therefore might be reused after crash recovery. But of course, to make that happen you'd have to actually *store* the XID into the data you pass to XLogInsert, which is kind of hard if you haven't asked for it first. So, I now agree, XLOG_NO_TRAN should be buried. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Wed, 2007-08-29 at 19:32 +0200, Florian G. Pflug wrote: > I propose to do the following in my lazy XID assignment patch The lazy XID assignment seems to be the key to unlocking this whole area. > - can > anyone see a hole in that? > > .) Get rid of MyLastRecPtr and MyXactMadeTempRelUpdates. Those are > superseeded by TransactionIdIsValid(GetCurrentTransactionIdIfAny( )). > .) Get rid of MyXactMadeXLogEntry. Instead, just reset ProcLast > .) Rename ProcLastRecEnd to XactLastRecEnd, and reset when starting > a new toplevel transaction. I followed you up to this point. Nothing bad springs immediately to mind, but please can you explain the proposals rather than just assert them and ask us to find the holes? > I think we might go even further, and *never* flush the XLOG on abort, > since if we crash just before the abort won't log anything either. But > if we leak the leftover files in such a case, that's probably a bad idea. That doesn't gain us much, yet we lose the ability to tell the difference between an abort and a crash. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| Tom Lane wrote: > One comment is that at the time we make an entry into smgr's > pending-deletes list, I think we might not have acquired an XID yet > --- if I understand your patch correctly, a CREATE TABLE would acquire > an XID when it makes its first catalog insertion, and that happens > after creating the on-disk table file. So it seems like a good idea > for smgr itself to trigger acquisition of an XID before it makes a > pending-deletes entry. This ensures that you can't have a situation > where you have deletes to record and no XID; otherwise, an elog > between smgr insertion and catalog insertion would lead to just that. Hm.. I was just going to implement this, but I'm now wondering if thats really worth it. For smgrcreate, this would catch the following case: ..) CREATE something ..) smgrcreate: Creates file, and puts it onto the delete-on-abort list ..) We elog() *before* acquiring an XID ..) RecordTransactionAbort or RecordSubTransactionAbort: We don't write an ABORT record. ..) We crash *before* actually deleting the file Compare the probability of that happening (The elog *and* the crash) with the probability of ..) CREATE something ..) smgrcreate: Creates the file ..) We crash *before* we have to chance to commit or abort. The window in which a crash causes us to leak the file seems to be much wider in the second case, yet forcing XID assignment will not help to preven it, unless I'm overlooking something. In the smgrunlink case, there is no reason at all to force XID assignment, because if we abort or crash, we don't want to unlink anyway, and if we survive until we commit, we'll assign the XID during the inevitable catalog update. The only thing the forced XID assignment would buy is to be able to stick if (TransactionIdIsValid(GetCurrentTransactionIdIfAny ())) Assert(nrels == 0); into the various Record{Sub|}Transction{Commit|Abort} functions So unless I'm overlooking something, I believe for now it's best to ignore this issued, and to do a proper fix in the long run that removes *all* possible leakages. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| "Florian G. Pflug" <fgp@phlo.org> writes: >> One comment is that at the time we make an entry into smgr's >> pending-deletes list, I think we might not have acquired an XID yet > Hm.. I was just going to implement this, but I'm now wondering if > thats really worth it. Basically what you'd give up is the ability to Assert() that there are no deletable files if there's no XID, which seems to me to be an important cross-check ... although maybe making smgr do that turns this "cross-check" into a tautology ... hmm. I guess the case that's bothering me is where we reach commit with deletable files and no XID. But that should probably be an error condition anyway, ie, we should error out and turn it into an abort. On the abort side we'd consider it OK to have files and no XID. Seems reasonable to me. The only way we could make this more robust is if we could have WAL-before-data rule for file *creation*, but I think that's not possible given that we don't know what relfilenode number we will use until we've successfully created a file. So there will always be windows where a crash leaks unreferenced files. There's been some debate about having crash recovery search for and delete such files, but so far I've resisted it on the grounds that it sounds like data loss waiting to happen --- someday it'll delete a file you wished it'd kept. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >>> One comment is that at the time we make an entry into smgr's >>> pending-deletes list, I think we might not have acquired an XID yet > >> Hm.. I was just going to implement this, but I'm now wondering if >> thats really worth it. > > Basically what you'd give up is the ability to Assert() that there are > no deletable files if there's no XID, which seems to me to be an > important cross-check ... although maybe making smgr do that turns > this "cross-check" into a tautology ... hmm. I guess the case that's > bothering me is where we reach commit with deletable files and no XID. > But that should probably be an error condition anyway, ie, we should > error out and turn it into an abort. On the abort side we'd consider > it OK to have files and no XID. Seems reasonable to me. I've done that now, and it turned out nicely. There is an Assertion on "(nrels == 0) || xid assigned" in the COMMIT path, but not in the ABORT path. Seems reasonable and safe. And I'm quite tempted to not flush the XLOG at all during ABORT, and to only force synchronous commits if one of the to-be-deleted files is non-temporary. The last idea widens the leakage window quite a bit though, so I maybe I should rather resist that temptation... OTOH, it'd allow aynchronous commits for transactions that created temporary tables. > The only way we could make this more robust is if we could have > WAL-before-data rule for file *creation*, but I think that's not > possible given that we don't know what relfilenode number we will use > until we've successfully created a file. So there will always be > windows where a crash leaks unreferenced files. There's been some > debate about having crash recovery search for and delete such files, but > so far I've resisted it on the grounds that it sounds like data loss > waiting to happen --- someday it'll delete a file you wished it'd kept. It seems doable, but it's not pretty. One possible scheme would be to emit a record *after* chosing a name but *before* creating the file, and then a second record when the file is actually created successfully. Then, during replay we could remember a list of xids and filenames, and remove those files for which we either haven't seen a "created successfully" record, or no COMMIT record for the creating xid. With this scheme, I'd be natural to force XID assignment in smgrcreate, because we'd actually depend on logging the xid there. greetings, Florian Pflug ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| ||||
| "Florian G. Pflug" <fgp@phlo.org> writes: > And I'm quite tempted to not flush the XLOG at all during ABORT, and to > only force synchronous commits if one of the to-be-deleted files is > non-temporary. +1 on the first, but -1 on the second, because we'd have to track whether deleted files are temp or not ... it's very unclear that it'd be worth the trouble. > OTOH, it'd allow aynchronous commits for transactions that created > temporary tables. It'd be for xacts that *dropped* temp tables, no? I'm not sure that is a performance-critical path --- probably it more usually gets done after the client's already disconnected. >> The only way we could make this more robust is if we could have >> WAL-before-data rule for file *creation*, but I think that's not >> possible given that we don't know what relfilenode number we will use >> until we've successfully created a file. > It seems doable, but it's not pretty. One possible scheme would be to > emit a record *after* chosing a name but *before* creating the file, No, because the way you know the name is good is a successful open(O_CREAT). regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |