This is a discussion on WAL replay of truncate fails if the table was dropped within the pgsql Bugs forums, part of the PostgreSQL category; --> mdtruncate throws an error if the relation file doesn't exist. However, that's not an error condition if the relation ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| mdtruncate throws an error if the relation file doesn't exist. However, that's not an error condition if the relation was dropped later. Non-existent file should be treated the same as an already truncated file; we now end up with an unrecoverable database. This bug seems to be present from 8.0 onwards. Attached is a test case to reproduce it, along with a patch for CVS HEAD, and an adapted version of the patch for 8.0-8.2. Thanks to my colleague Dharmendra Goyal for finding this bug and constructing an initial test case. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > mdtruncate throws an error if the relation file doesn't exist. Interesting corner case. The proposed fix seems not very consistent with the way we handle comparable cases elsewhere, though. In general, md.c will cut some slack when InRecovery if a relation is shorter than expected, but not if it's not there at all. (This is, indeed, what justifies mdtruncate's response to file-too-short...) We handle dropped files during recovery by forced smgrcreate() in places like XLogOpenRelation. I'm inclined to think smgr_redo should force smgrcreate() before trying to truncate. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> mdtruncate throws an error if the relation file doesn't exist. > > Interesting corner case. The proposed fix seems not very consistent > with the way we handle comparable cases elsewhere, though. In general, > md.c will cut some slack when InRecovery if a relation is shorter than > expected, but not if it's not there at all. (This is, indeed, what > justifies mdtruncate's response to file-too-short...) We handle > dropped files during recovery by forced smgrcreate() in places like > XLogOpenRelation. I'm inclined to think smgr_redo should force > smgrcreate() before trying to truncate. I followed the example of the file-too-short case. Yeah, calling smgrcreate would work and I can see the justification for that as well. Interestingly, this bug isn't triggered unless there's an already empty or uninitialized page at the end of table. If vacuum removes the last tuple from the page, that will be WAL-logged and replay of that calls smgrcreate. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > Interestingly, this bug isn't triggered unless there's an already empty > or uninitialized page at the end of table. If vacuum removes the last > tuple from the page, that will be WAL-logged and replay of that calls > smgrcreate. Yeah, I tried other ways to provoke the failure and came to the same conclusion. The reproducer really is relying on the fact that vacuum's PageInit of an uninitialized page doesn't get WAL-logged. Which is a bit nervous-making. As far as I can think at the moment, it won't provoke any problem because the first subsequent WAL-logged touch of the page would be an INSERT with the INIT bit set; but it does mean that a warm-standby slave would be out of sync with the master for an indefinitely long period with respect to the on-disk contents of such a page. Does that matter? Note that we have to fix truncate replay anyway, since you could have the same failure if a checkpoint happened just before an ordinary vacuum's truncate. This PageInit behavior merely allows a simpler reproducer script with no race condition involved. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| On Fri, 2007-07-20 at 11:38 -0400, Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: > > Interestingly, this bug isn't triggered unless there's an already empty > > or uninitialized page at the end of table. If vacuum removes the last > > tuple from the page, that will be WAL-logged and replay of that calls > > smgrcreate. > > Yeah, I tried other ways to provoke the failure and came to the same > conclusion. The reproducer really is relying on the fact that vacuum's > PageInit of an uninitialized page doesn't get WAL-logged. Which is a > bit nervous-making. As far as I can think at the moment, it won't > provoke any problem because the first subsequent WAL-logged touch of > the page would be an INSERT with the INIT bit set; but it does mean > that a warm-standby slave would be out of sync with the master for an > indefinitely long period with respect to the on-disk contents of such a > page. Does that matter? If I understand this: the primary would be initialised yet the standby would remain uninitialised? I don't think that matters because the actual the data contents are still zero. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| ||||
| "Simon Riggs" <simon@2ndquadrant.com> writes: > If I understand this: the primary would be initialised yet the standby > would remain uninitialised? I don't think that matters because the > actual the data contents are still zero. From a logical perspective the page is "empty" either way. The only behavioral difference I can think of is that the initialized page is a candidate for insertion of new tuples, whereas on the slave it would not be a candidate until after another VACUUM. So the histories would diverge faster once the slave comes alive. As long as the slave is just following WAL records and not making any decisions of its own, I can't see a failure mode; but it looks like a potential weak spot for future extensions (particularly, trying to allow slave servers to execute queries). regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |