Improvement of procArray.xmin for VACUUM

Pgsql Patches forum, PostgreSQL category
VACUUM treats a dead tuple with an xmax less than every procArray.xmin as invisible, with its space ready to be reused. Right now, we set procArray.xmin at transaction start to the required SERIALIZABLE value. We currently don't update procArray.xmin when we know we are in a READ COMMITTED transaction, even though we already call GetSnapshotData(). This means that if a transaction completes that was active when our transaction started, we don't update our procArray.xmin during the next command of our multi-statement transaction to indicate that we no longer care about the completed transaction.

I have been thinking we could improve how quickly VACUUM can expire rows if we update procArray.xmin more frequently for non-SERIALIZABLE transactions.

The attached patch updates procArray.xmin in this manner. Here is an example of how the patch improves dead row reuse:

    Session #:
    1                               2                               3

    CREATE TABLE test(x int);
    INSERT INTO test VALUES (1);
    BEGIN;
    DELETE FROM test;
                                    BEGIN;
                                    SELECT 1;
    COMMIT;
                                                                    VACUUM VERBOSE test;
                                                                    (row not reused)
                                    SELECT 1;
                                    (At this point #2 doesn't see
                                     the test row anymore.  Patch
                                     updates procArray.xmin.)
                                                                    VACUUM VERBOSE test;
                                                                    (row reused with patch)
                                    COMMIT;
                                                                    VACUUM VERBOSE test;
                                                                    (normal row reuse)

What the patch does is recompute procArray.xmin during the second SELECT; the next VACUUM then uses the new value. The patch has debug code that prints the procArray.xmin assignments.

I am not sure I have the boolean argument correct in every call to GetTransactionSnapshot(). The set_procArray_xmin parameter should be true only when we are sure we aren't going to be resetting the session back to an earlier snapshot.

The major effect of the patch is to make the minimum procArray.xmin the earliest transaction currently in progress, rather than the earliest transaction that was in progress at transaction start.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
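The reuse rule in the post above can be illustrated with a small sketch. This is a toy model in Python, not PostgreSQL source: XIDs are plain integers with no wraparound, the `proc_xmins` dictionary is a hypothetical stand-in for the xmin each backend advertises, and the XID values (100, 101) are made up to match the three-session example.

```python
# Toy model only: XIDs are small integers with no wraparound, and
# proc_xmins is a hypothetical stand-in for the xmin each backend advertises.

def oldest_xmin(proc_xmins):
    """Return the smallest xmin advertised by any backend."""
    return min(proc_xmins.values())


def vacuum_can_reuse(tuple_xmax, proc_xmins):
    """A committed-deleted tuple is reusable once its xmax precedes every xmin."""
    return tuple_xmax < oldest_xmin(proc_xmins)


DELETER_XID = 100  # session 1, which ran DELETE FROM test (hypothetical value)

# Session 2 took its first snapshot while xid 100 was still running, so
# without the change it keeps advertising xmin = 100 until it commits.
before = vacuum_can_reuse(DELETER_XID, {"session2": 100})

# With the proposed change, session 2's second SELECT recomputes xmin after
# session 1 has committed; assume the new value is session 2's own xid, 101.
after = vacuum_can_reuse(DELETER_XID, {"session2": 101})

print(before, after)  # prints: False True
```

The only thing that changes between the two calls is the value session 2 advertises, which is the whole point of the proposal: the deleted row itself is identical in both cases.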
Bruce Momjian wrote:
> The attached patch updates procArray.xmin in this manner.  Here is an
> example of how the patch improves dead row reuse:

I don't think this really works. Consider what happens if I change session 2 this way:

>   Session #:
>   1                               2                               3
>
>   CREATE TABLE test(x int);
>   INSERT INTO test VALUES (1);
>   BEGIN;
>   DELETE FROM test;
>                                   BEGIN;
                                    DECLARE foo CURSOR FOR SELECT * FROM test;
>                                   SELECT 1;
>   COMMIT;
>                                                                   VACUUM VERBOSE test;
>                                                                   (row not reused)
>                                   SELECT 1;
                                    FETCH * FROM foo;
>
>                                   (At this point #2 doesn't see
>                                    the test row anymore.  Patch
>                                    updates procArray.xmin.)
>
>                                                                   VACUUM VERBOSE test;
>                                                                   (row reused with patch)
>                                   COMMIT;
>                                                                   VACUUM VERBOSE test;
>                                                                   (normal row reuse)

--
Alvaro Herrera                        http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
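The cursor counterexample above can be reduced to a few comparisons. The sketch below is a toy model with hypothetical XIDs; its visibility rule is deliberately simplified (any xid at or above a snapshot's xmin is treated as still running from that snapshot's point of view), which is an assumption of the sketch, not PostgreSQL's actual visibility logic.

```python
# Toy model only; all XIDs below are hypothetical.

def row_still_visible(deleter_xid, snapshot_xmin):
    """Simplified rule: the snapshot cannot see the deletion if the deleter
    was not yet finished (xid >= snapshot xmin) when the snapshot was taken."""
    return deleter_xid >= snapshot_xmin


DELETER_XID = 100            # session 1's DELETE FROM test
cursor_snapshot_xmin = 100   # DECLARE foo ... ran while xid 100 was running
fetch_snapshot_xmin = 101    # later statement, after xid 100 committed

# Advertising the newest snapshot's xmin, as in the modified example.
advertised_by_patch = fetch_snapshot_xmin
# A safe value is bounded by the oldest snapshot that is still in use.
safe_xmin = min(cursor_snapshot_xmin, fetch_snapshot_xmin)

vacuum_removes_row = DELETER_XID < advertised_by_patch
cursor_needs_row = row_still_visible(DELETER_XID, cursor_snapshot_xmin)

print(vacuum_removes_row, cursor_needs_row, safe_xmin)
```

In this model both `vacuum_removes_row` and `cursor_needs_row` come out true at the same time, which is the conflict: the open cursor still expects to return a row that VACUUM has been told it may reclaim.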
Bruce Momjian <bruce@momjian.us> writes:
> I have been thinking we could improve how quickly VACUUM can expire rows
> if we update procArray.xmin more frequently for non-SERIALIZABLE
> transactions.
> The attached patch updates procArray.xmin in this manner.

This patch is incredibly broken. Didn't you understand what I said about how we don't track which snapshots are still alive? You can't advance procArray.xmin past the xmin of the oldest live snapshot in the backend, and you can't assume that there are no live snapshots at the places where this patch assumes that. (Open cursors are one obvious counterexample, but I think there are more.)

To make intra-transaction advancing of xmin possible, we'd need to explicitly track all of the backend's live snapshots, probably by introducing a "snapshot cache" manager that gives out tracked refcounts as we do for some other structures like catcache entries. This might have some other advantages (I think most of the current CopySnapshot operations could be replaced by refcount increments) but it's a whole lot more complicated and invasive than what you've got here.

regards, tom lane
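To make the refcounting idea above concrete, here is a minimal sketch of a per-backend tracker. It is a toy model in Python: the class name `SnapshotRegistry`, its methods, and the choice to key snapshots by their xmin are all hypothetical simplifications, not the design of any actual PostgreSQL snapshot manager.

```python
class SnapshotRegistry:
    """Hypothetical per-backend tracker of live snapshots (toy model)."""

    def __init__(self):
        self._refcounts = {}  # snapshot xmin -> number of live references

    def acquire(self, xmin):
        """Register one more reference to a snapshot with the given xmin."""
        self._refcounts[xmin] = self._refcounts.get(xmin, 0) + 1
        return xmin

    def release(self, xmin):
        """Drop one reference; forget the snapshot when none remain."""
        remaining = self._refcounts[xmin] - 1
        if remaining:
            self._refcounts[xmin] = remaining
        else:
            del self._refcounts[xmin]

    def advertised_xmin(self):
        """Oldest xmin among live snapshots, or None when none are live."""
        return min(self._refcounts, default=None)


reg = SnapshotRegistry()
cursor = reg.acquire(100)               # an open cursor keeps its snapshot alive
stmt = reg.acquire(101)                 # snapshot for the current statement
reg.release(stmt)                       # the statement finishes
held_by_cursor = reg.advertised_xmin()  # the cursor still pins 100
reg.release(cursor)                     # cursor closed
idle = reg.advertised_xmin()            # nothing is live any more

print(held_by_cursor, idle)  # prints: 100 None
```

The `None` result stands in for "no xmin to advertise"; with explicit tracking, the backend can advance or clear its advertised value only when the registry says no older snapshot is still referenced.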
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I have been thinking we could improve how quickly VACUUM can expire rows
> > if we update procArray.xmin more frequently for non-SERIALIZABLE
> > transactions.
> > The attached patch updates procArray.xmin in this manner.
>
> This patch is incredibly broken.  Didn't you understand what I said
> about how we don't track which snapshots are still alive?  You can't
> advance procArray.xmin past the xmin of the oldest live snapshot in the
> backend, and you can't assume that there are no live snapshots at the
> places where this patch assumes that.  (Open cursors are one obvious
> counterexample, but I think there are more.)
>
> To make intra-transaction advancing of xmin possible, we'd need to
> explicitly track all of the backend's live snapshots, probably by
> introducing a "snapshot cache" manager that gives out tracked refcounts
> as we do for some other structures like catcache entries.  This might
> have some other advantages (I think most of the current CopySnapshot
> operations could be replaced by refcount increments) but it's a whole
> lot more complicated and invasive than what you've got here.

I updated the patch to save the MyProc->xid at the time the first cursor is created, and not allow the MyProc->xid to be set lower than that saved value in the current transaction. It added only a few more lines to the patch.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
Bruce Momjian wrote:
> Tom Lane wrote:
>> Bruce Momjian <bruce@momjian.us> writes:
>>> I have been thinking we could improve how quickly VACUUM can expire rows
>>> if we update procArray.xmin more frequently for non-SERIALIZABLE
>>> transactions.
>>> The attached patch updates procArray.xmin in this manner.
>> This patch is incredibly broken.  Didn't you understand what I said
>> about how we don't track which snapshots are still alive?  You can't
>> advance procArray.xmin past the xmin of the oldest live snapshot in the
>> backend, and you can't assume that there are no live snapshots at the
>> places where this patch assumes that.  (Open cursors are one obvious
>> counterexample, but I think there are more.)
>>
>> To make intra-transaction advancing of xmin possible, we'd need to
>> explicitly track all of the backend's live snapshots, probably by
>> introducing a "snapshot cache" manager that gives out tracked refcounts
>> as we do for some other structures like catcache entries.  This might
>> have some other advantages (I think most of the current CopySnapshot
>> operations could be replaced by refcount increments) but it's a whole
>> lot more complicated and invasive than what you've got here.
>
> I updated the patch to save the MyProc->xid at the time the first cursor
> is created, and not allow the MyProc->xid to be set lower than that
> saved value in the current transaction.  It added only a few more lines
> to the patch.

It seems to me a lot cleaner to do the reference counting like Tom suggested. Increase the refcount on CopySnapshot, and decrease it on FreeSnapshot. Assuming that all callers of CopySnapshot free the snapshot with FreeSnapshot when they're done with it.

BTW: I really like the idea of doing this. It's a relatively simple thing to do and gives some immediate benefit. And it opens the door for more tricks to vacuum more aggressively in the presence of long-running transactions. And it allows us to vacuum tuples that were inserted and deleted in the same transaction even while that transaction is still running, which helps some pathological cases where a transaction updates a counter column many times. We could also use it to free entries in the new combocids hash table earlier (not that it's a problem as it is, though).

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> It seems to me a lot cleaner to do the reference counting like Tom
> suggested. Increase the refcount on CopySnapshot, and decrease it on
> FreeSnapshot. Assuming that all callers of CopySnapshot free the
> snapshot with FreeSnapshot when they're done with it.

I don't believe we bother at the moment; which is one of the reasons it'd be a nontrivial patch. I do think it might be worth doing though. In the simple case where you're just issuing successive non-cursor commands within a READ COMMITTED transaction, a refcounted implementation would be able to recognize that there are *no* live snapshots between commands and therefore reset MyProc->xmin to 0 whenever the backend is idle.

OTOH, do we have any evidence that this is worth bothering with at all? I fear that the cases of long-running transactions that are problems in the real world wouldn't be helped much --- for instance, pg_dump wouldn't change behavior because it uses a serializable transaction. Also, at some point a long-running transaction becomes a bottleneck simply because its XID is itself the oldest thing visible in the ProcArray and is determining everyone's xmin. How much daylight is there really between "your xmin is old" and "your xid is old"?

regards, tom lane
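The "daylight" question above can be put in numbers with a small sketch. This is a toy model in Python with hypothetical XIDs and helper names; it assumes a new snapshot's xmin is the oldest running xid and that the global horizon is the minimum over all advertised xmins and all running xids.

```python
# Toy model only; XIDs and helper names are hypothetical.

def snapshot_xmin(running_xids, next_xid):
    """xmin of a new snapshot: the oldest running xid, or next_xid if none."""
    return min(running_xids, default=next_xid)


def global_horizon(advertised_xmins, running_xids, next_xid):
    """Oldest value VACUUM must respect across advertised xmins and xids."""
    candidates = list(advertised_xmins) + list(running_xids)
    return min(candidates, default=next_xid)


NEXT_XID = 300
running = {100, 250}  # xid 100 is the long-running transaction

# Another backend taking a snapshot now gets xmin = 100 either way.
other_backend_xmin = snapshot_xmin(running, NEXT_XID)

# Case 1: the long transaction still advertises the xmin (90) of its
# first snapshot, taken while an even older transaction was running.
pinned_by_xmin = global_horizon([90, other_backend_xmin], running, NEXT_XID)

# Case 2: its xmin has been cleared between commands, but its xid remains.
pinned_by_xid = global_horizon([other_backend_xmin], running, NEXT_XID)

print(pinned_by_xmin, pinned_by_xid)  # prints: 90 100
```

In this model the gain from clearing the long transaction's xmin is only the gap between 90 and 100, that is, the transactions that were already running when it started; everything after its own xid stays pinned.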
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
> > It seems to me a lot cleaner to do the reference counting like Tom
> > suggested. Increase the refcount on CopySnapshot, and decrease it on
> > FreeSnapshot. Assuming that all callers of CopySnapshot free the
> > snapshot with FreeSnapshot when they're done with it.
>
> I don't believe we bother at the moment; which is one of the reasons
> it'd be a nontrivial patch.  I do think it might be worth doing though.
> In the simple case where you're just issuing successive non-cursor
> commands within a READ COMMITTED transaction, a refcounted
> implementation would be able to recognize that there are *no* live
> snapshots between commands and therefore reset MyProc->xmin to 0
> whenever the backend is idle.

Attached is my current version of the patch. It no longer works now that I have tried to add reference counting for snapshots, but I will stop here now that Tom is considering redesigning the snapshot mechanism.

> OTOH, do we have any evidence that this is worth bothering with at all?
> I fear that the cases of long-running transactions that are problems
> in the real world wouldn't be helped much --- for instance, pg_dump
> wouldn't change behavior because it uses a serializable transaction.
> Also, at some point a long-running transaction becomes a bottleneck
> simply because its XID is itself the oldest thing visible in the
> ProcArray and is determining everyone's xmin.  How much daylight is
> there really between "your xmin is old" and "your xid is old"?

Well, interesting you mention that, because I have a second idea on how to improve things. We start with MyProc->xmin equal to our own xid, and then look for earlier transactions. It should be possible to skip considering our own xid for MyProc->xmin. This would obviously help VACUUM during long-running transactions. While our transaction is running, our xid isn't committed, so VACUUM isn't going to touch any of our rows, and if other transactions complete before a _statement_ in our multi-statement transaction starts, we can't see deleted rows from those transactions, so why keep the deleted rows around? Consider this case:

    Session #:
    1                               2                               3

                                    BEGIN;
                                    SELECT 1;
    CREATE TABLE test(x int);
    INSERT INTO test VALUES (1);
    DELETE FROM test;
                                    SELECT 1;
                                                                    VACUUM VERBOSE test;
                                                                    (row can be reused)
                                    COMMIT;
                                                                    VACUUM VERBOSE test;
                                                                    (normal row reuse)

As I understand it, in READ COMMITTED mode, we have to skip transactions in progress when our _statement_ starts, but anything committed before that we see, and we don't see dead rows created by them.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
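The second idea in the post above can be sketched as a change to how one backend computes the value it advertises. This is a toy model in Python with hypothetical XIDs and a hypothetical `skip_own_xid` flag; it models only the value session 2 would advertise under the proposal and makes no claim that VACUUM could safely ignore session 2's still-running xid.

```python
# Toy model only; the function, flag, and XIDs are hypothetical.

def advertised_xmin(own_xid, other_running_xids, next_xid, skip_own_xid):
    """Oldest xid this backend claims to care about for the new statement."""
    candidates = set(other_running_xids)
    if not skip_own_xid:
        candidates.add(own_xid)
    return min(candidates, default=next_xid)


SESSION2_XID = 100   # long-running READ COMMITTED transaction
DELETER_XID = 103    # session 1's autocommit DELETE, already committed
NEXT_XID = 104       # next xid to be assigned; nothing else is running

current = advertised_xmin(SESSION2_XID, [], NEXT_XID, skip_own_xid=False)
proposed = advertised_xmin(SESSION2_XID, [], NEXT_XID, skip_own_xid=True)

reusable_now = DELETER_XID < current        # 103 < 100
reusable_proposed = DELETER_XID < proposed  # 103 < 104

print(current, proposed, reusable_now, reusable_proposed)
# prints: 100 104 False True
```

With the session's own xid included, the advertised value can never pass 100 while the transaction is open; excluding it lets the value track whatever else is still running at the start of each statement.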
> OTOH, do we have any evidence that this is worth bothering with at all?
> I fear that the cases of long-running transactions that are problems
> in the real world wouldn't be helped much --- for instance, pg_dump
> wouldn't change behavior because it uses a serializable transaction.

Well I think this would be the same infrastructure we would need to do the other discussed improvement to address pg_dump's impact. That would require us to publish the youngest xmax of the live snapshots. Vacuum could deduce that that xid cannot possibly see any transactions between the youngest extant xmax and the oldest in-progress transaction.

--
  Gregory Stark
  EnterpriseDB   http://www.enterprisedb.com
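One reading of the idea above is that a tuple created and deleted entirely inside the "gap" between an old snapshot and all newer ones is invisible to every snapshot. The sketch below is a toy model in Python under strong assumptions: the `Snapshot` type and helper names are hypothetical, the creating and deleting transactions are both assumed committed, each snapshot is reduced to an (xmin, xmax) pair, and update chains are ignored entirely.

```python
from typing import NamedTuple


class Snapshot(NamedTuple):
    """Toy snapshot: xids below xmin are finished, xids >= xmax are unseen."""
    xmin: int
    xmax: int


def invisible_to(snapshot, creator_xid, deleter_xid):
    """True if the snapshot can never return this tuple version."""
    created_too_late = creator_xid >= snapshot.xmax
    deleted_early_enough = deleter_xid < snapshot.xmin
    return created_too_late or deleted_early_enough


def removable(snapshots, creator_xid, deleter_xid):
    """True if no live snapshot can see the tuple version."""
    return all(invisible_to(s, creator_xid, deleter_xid) for s in snapshots)


live = [
    Snapshot(xmin=100, xmax=105),  # long-running dump-style snapshot
    Snapshot(xmin=500, xmax=510),  # a recent snapshot
]

in_gap = removable(live, creator_xid=200, deleter_xid=300)
old_row = removable(live, creator_xid=50, deleter_xid=300)
xmin_only_rule = 300 < min(s.xmin for s in live)

print(in_gap, old_row, xmin_only_rule)  # prints: True False False
```

Under these assumptions the tuple created by xid 200 and deleted by xid 300 is invisible to both snapshots even though the xmin-only rule would keep it, while a row that existed before the old snapshot was taken must stay.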
Gregory Stark <stark@enterprisedb.com> writes:
> Well I think this would be the same infrastructure we would need to do
> the other discussed improvement to address pg_dump's impact. That
> would require us to publish the youngest xmax of the live
> snapshots. Vacuum could deduce that that xid cannot possibly see any
> transactions between the youngest extant xmax and the oldest
> in-progress transaction.

.... and do what with the knowledge? Not remove tuples, because any such tuples would be part of RECENTLY_DEAD update chains that that xact might be following now or in the future.

regards, tom lane
"Tom Lane" <tgl@sss.pgh.pa.us> writes:
> Gregory Stark <stark@enterprisedb.com> writes:
>> Well I think this would be the same infrastructure we would need to do
>> the other discussed improvement to address pg_dump's impact. That
>> would require us to publish the youngest xmax of the live
>> snapshots. Vacuum could deduce that that xid cannot possibly see any
>> transactions between the youngest extant xmax and the oldest
>> in-progress transaction.
>
> ... and do what with the knowledge?  Not remove tuples, because any such
> tuples would be part of RECENTLY_DEAD update chains that that xact might
> be following now or in the future.

Well that just means it might require extra work, not that it would be impossible.

Firstly, some tuples would not be part of a chain and could be cleaned up anyways. Others would be part of a HOT chain which might make it easier to clean them up.

But even for tuples that are part of a chain there may be solutions. We could truncate the tuple to just the MVCC information so subsequent transactions can find the head. Or we might be able to go back and edit the forward link to skip the dead intermediate tuples (and somehow deal with the race conditions...)

--
  Gregory Stark
  EnterpriseDB   http://www.enterprisedb.com