vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I seem to remember that we'd agreed that autovacuum should ignore any globally set statement_timeout, on the grounds that a poorly chosen setting could indefinitely prevent large tables from being vacuumed. But I do not see anything in autovacuum.c that resets the variable. Am I just being blind? (Quite possible, as I'm tired and under the weather.) The thing that brought this to mind was the idea that Mark Shuttleworth's open problem might be triggered in part by a statement timeout interrupting autovacuum at an inopportune point --- some logs he sent me offlist show that he is using statement_timeout ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > I seem to remember that we'd agreed that autovacuum should ignore any > globally set statement_timeout, on the grounds that a poorly chosen > setting could indefinitely prevent large tables from being vacuumed. > But I do not see anything in autovacuum.c that resets the variable. > Am I just being blind? (Quite possible, as I'm tired and under the > weather.) > > The thing that brought this to mind was the idea that Mark > Shuttleworth's open problem might be triggered in part by a statement > timeout interrupting autovacuum at an inopportune point --- some logs > he sent me offlist show that he is using statement_timeout ... statement_timeout interrupts seem to go through the PG_CATCH-block and clean up the entry from the vacuum cycle array as they should. But a SIGINT leading to a "terminating connection due to administrator command" error does not. After the recent change in CVS HEAD, CREATE DATABASE tries to kill(SIGINT) any autovacuum process in the template database. That seems very dangerous now, it could easily leave stale entries in the cycle id array. However, it doesn't explain the Mark Shuttleworth's problem because the 8.2 behavior is to throw an "source database is being accessed by other users" error instead of killing autovacuum. Maybe there's something else killing autovacuum processes? I think we need to add the xid of the vacuum transaction in the vacuum cycle array, and clean up orphaned entries in _bt_start_vacuum. We're going to have a hard time plugging every leak one-by-one otherwise. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Heikki Linnakangas <heikki@enterprisedb.com> writes: > statement_timeout interrupts seem to go through the PG_CATCH-block and > clean up the entry from the vacuum cycle array as they should. But a > SIGINT leading to a "terminating connection due to administrator > command" error does not. Hm, that's an interesting thought, but there are no "terminating connection" messages in Shuttleworth's logs either. So we still lack the right idea there. (BTW it would be SIGTERM not SIGINT.) > I think we need to add the xid of the vacuum transaction in the vacuum > cycle array, and clean up orphaned entries in _bt_start_vacuum. We're > going to have a hard time plugging every leak one-by-one otherwise. You're thinking too small --- what this thought actually suggests is that PG_CATCH can't be used to clean up shared memory at all, and I don't think we want to accept that. (I see several other places already where we assume we can do that. We could convert each one into an on_proc_exit cleanup operation, maybe, but that seems messy and not very scalable.) I'm thinking we may want to redesign elog(FATAL) processing so that we escape out to the outer level before calling proc_exit, thereby allowing CATCH blocks to run first. Note for the archives: I've argued for some time that SIGTERM'ing individual backends is an insufficiently tested code path to be exposed as standard functionality. Looks like that's still true. This is not a bug for database shutdown because we don't really care if we leave perfectly clean shared memory behind --- it's only a bug if you try to SIGTERM an individual vacuum process while leaving the system up. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| I wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> statement_timeout interrupts seem to go through the PG_CATCH-block and >> clean up the entry from the vacuum cycle array as they should. But a >> SIGINT leading to a "terminating connection due to administrator >> command" error does not. > Hm, that's an interesting thought, but there are no "terminating > connection" messages in Shuttleworth's logs either. So we still lack > the right idea there. (BTW it would be SIGTERM not SIGINT.) Hold it ... stop the presses ... the reason we saw no "terminating connection" messages was he was grepping his logs for lines containing ERROR. Once we look for FATAL too, there are a pile of 'em. I'm not 100% convinced that any are from autovacuum processes, but clearly *something* is throwing SIGTERM around with abandon in his test environment. So at this point your theory above looks like a plausible mechanism for the vacuum cycle array to slowly fill up and eventually make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to a repeat vacuum attempt). >> I think we need to add the xid of the vacuum transaction in the vacuum >> cycle array, and clean up orphaned entries in _bt_start_vacuum. We're >> going to have a hard time plugging every leak one-by-one otherwise. > You're thinking too small --- what this thought actually suggests is > that PG_CATCH can't be used to clean up shared memory at all, and I > don't think we want to accept that. (I see several other places already > where we assume we can do that. We could convert each one into an > on_proc_exit cleanup operation, maybe, but that seems messy and not very > scalable.) I'm thinking we may want to redesign elog(FATAL) processing > so that we escape out to the outer level before calling proc_exit, > thereby allowing CATCH blocks to run first. I was hoping we could do that just as an 8.3 change, but it's now starting to look like we might have to back-patch it, depending on how much we care about surviving random SIGTERM attempts. I'd like to wait for some report from Mark about what's causing all the SIGTERMs before we evaluate that. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Tom Lane wrote: > I wrote: > > Heikki Linnakangas <heikki@enterprisedb.com> writes: > >> statement_timeout interrupts seem to go through the PG_CATCH-block and > >> clean up the entry from the vacuum cycle array as they should. But a > >> SIGINT leading to a "terminating connection due to administrator > >> command" error does not. > > > Hm, that's an interesting thought, but there are no "terminating > > connection" messages in Shuttleworth's logs either. So we still lack > > the right idea there. (BTW it would be SIGTERM not SIGINT.) > > Hold it ... stop the presses ... the reason we saw no "terminating > connection" messages was he was grepping his logs for lines containing > ERROR. Once we look for FATAL too, there are a pile of 'em. I'm not > 100% convinced that any are from autovacuum processes, but clearly > *something* is throwing SIGTERM around with abandon in his test > environment. So at this point your theory above looks like a plausible > mechanism for the vacuum cycle array to slowly fill up and eventually > make _bt_start_vacuum fail (or, perhaps, fail sooner than that due to > a repeat vacuum attempt). Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE DATABASE; but what it does is send SIGINT, not SIGTERM. Also, it's not in 8.2. SIGINT does terminate the autovac process however. I haven't read the whole problem report completely, so I'm not sure this has something to do or not. -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Alvaro Herrera <alvherre@commandprompt.com> writes: > Hmmm, remember that DatabaseCancelAutovacuumActivity is called on CREATE > DATABASE; but what it does is send SIGINT, not SIGTERM. Also, it's not > in 8.2. SIGINT does terminate the autovac process however. > I haven't read the whole problem report completely, so I'm not sure this > has something to do or not. AFAICT, SIGINT should be okay, because it will lead to an ERROR not a FATAL elog; so control should fall out through the CATCH block before the autovacuum process quits. The problem is with FATAL elogs. Mark reports that the only FATAL lines in his logs are instances of FATAL: terminating connection due to administrator command FATAL: database "launchpad_ftest" does not exist and the latter presumably isn't coming out from within the btree vacuum code, so I don't see any other explanation for a FATAL exit than SIGTERM. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Tom Lane wrote: > I seem to remember that we'd agreed that autovacuum should ignore any > globally set statement_timeout, on the grounds that a poorly chosen > setting could indefinitely prevent large tables from being vacuumed. On a vaguely related matter, should programs such as pg_dump, vacuumdb, and reindexdb disable statement_timeout? -- Peter Eisentraut http://developer.postgresql.org/~petere/ ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote: > Tom Lane wrote: > > I seem to remember that we'd agreed that autovacuum should ignore any > > globally set statement_timeout, on the grounds that a poorly chosen > > setting could indefinitely prevent large tables from being vacuumed. > > On a vaguely related matter, should programs such as pg_dump, vacuumdb, > and reindexdb disable statement_timeout? Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore to that list as well (really, pg_dump(all) should output a command to disable statement_timeout). -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Jim C. Nasby wrote: > On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote: >> Tom Lane wrote: >>> I seem to remember that we'd agreed that autovacuum should ignore any >>> globally set statement_timeout, on the grounds that a poorly chosen >>> setting could indefinitely prevent large tables from being vacuumed. >> On a vaguely related matter, should programs such as pg_dump, vacuumdb, >> and reindexdb disable statement_timeout? > > Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore > to that list as well (really, pg_dump(all) should output a command to > disable statement_timeout). I don't know if that should be a default or not. It is certainly easy enough to disable it should you want to. Sincerely, Joshua D. Drake -- === The PostgreSQL Company: Command Prompt, Inc. === Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240 Providing the most comprehensive PostgreSQL solutions since 1997 http://www.commandprompt.com/ Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate PostgreSQL Replication: http://www.commandprompt.com/products/ ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| On Tue, Apr 17, 2007 at 12:51:51PM -0700, Joshua D. Drake wrote: > Jim C. Nasby wrote: > >On Sun, Apr 01, 2007 at 12:36:01AM +0200, Peter Eisentraut wrote: > >>Tom Lane wrote: > >>>I seem to remember that we'd agreed that autovacuum should ignore any > >>>globally set statement_timeout, on the grounds that a poorly chosen > >>>setting could indefinitely prevent large tables from being vacuumed. > >>On a vaguely related matter, should programs such as pg_dump, vacuumdb, > >>and reindexdb disable statement_timeout? > > > >Youch... yes, they should IMO. Add clusterdb, pg_dumpall and pg_restore > >to that list as well (really, pg_dump(all) should output a command to > >disable statement_timeout). > > I don't know if that should be a default or not. It is certainly easy > enough to disable it should you want to. How would you disable it for those command-line utilities? Or are you referring to disabling it via an ALTER ROLE SET ... for superusers? ISTM current behavior is a bit of a foot-gun. These are administrative shell commands that aren't going to be run by Joe-user. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell) ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| Thread Tools | |
| Display Modes | |
|
|