This is a discussion on Vacuum full crash within the pgsql Admins forums, part of the PostgreSQL category; --> Hello list an interrupted vacuum full has just caused a PG instance to restart and recover. Background: select version(); ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hello list an interrupted vacuum full has just caused a PG instance to restart and recover. Background: select version(); version ---------------------------------------------------------------------------------------------------------- PostgreSQL 8.3.1 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070626 (Red Hat 4.1.2-14) (1 row) I have a largish ( >1TB ) database which is kind of a data warehouse. Recently I had to do some major operations to some of the tables (update all rows in a table etc) which caused major bloat. To remove the bloat, I run VACUUM FULL VERBOSE on the bloated tables. Before the vacuum got finished, I had to abort it due to problems with my own laptop. When I hit ctrl+c to the vacuum, the PG instance went suddenly to recover mode. The logs showed this: 2008-03-29 22:25:15 EET [26841]: [1-1] ERROR: canceling statement due to user request 2008-03-29 22:25:15 EET [26841]: [2-1] STATEMENT: vacuum full verbose xyz ; 2008-03-29 22:25:15 EET [26841]: [1-1] ERROR: canceling statement due to user request 2008-03-29 22:25:15 EET [26841]: [2-1] STATEMENT: vacuum full verbose xyz ; 2008-03-29 22:25:15 EET [26841]: [3-1] PANIC: cannot abort transaction 3778747509, it was already committed 2008-03-29 22:25:15 EET [6476]: [4-1] LOG: server process (PID 26841) was terminated by signal 6: Aborted 2008-03-29 22:25:15 EET [6476]: [5-1] LOG: terminating any other active server processes 2008-03-29 22:25:15 EET [24814]: [48-1] WARNING: terminating connection because of crash of another server process 2008-03-29 22:25:15 EET [24814]: [49-1] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2008-03-29 22:25:15 EET [24814]: [50-1] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2008-03-29 22:25:15 EET [26841]: [3-1] PANIC: cannot abort transaction 3778747509, it was already committed 2008-03-29 22:25:15 EET [6476]: [4-1] LOG: server process (PID 26841) was terminated by signal 6: Aborted 2008-03-29 22:25:15 EET [6476]: [5-1] LOG: terminating any other active server processes 2008-03-29 22:25:15 EET [24814]: [48-1] WARNING: terminating connection because of crash of another server process 2008-03-29 22:25:15 EET [24814]: [49-1] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2008-03-29 22:25:15 EET [24814]: [50-1] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2008-03-29 22:25:15 EET [6476]: [6-1] LOG: archiver process (PID 6489) exited with exit code 1 2008-03-29 22:25:15 EET [18228]: [1-1] FATAL: the database system is in recovery mode 2008-03-29 22:25:16 EET [6476]: [7-1] LOG: all server processes terminated; reinitializing 2008-03-29 22:25:16 EET [18229]: [1-1] LOG: database system was interrupted; last known up at 2008-03-29 22:20:24 EET 2008-03-29 22:25:16 EET [18229]: [2-1] LOG: database system was not properly shut down; automatic recovery in progress 2008-03-29 22:25:16 EET [18229]: [3-1] LOG: redo starts at CB5/16399698 2008-03-29 22:25:22 EET [18229]: [4-1] LOG: unexpected pageaddr CB4/76FF6000 in log file 3253, segment 35, offset 16736256 2008-03-29 22:25:22 EET [18229]: [5-1] LOG: redo done at CB5/23FF4C80 2008-03-29 22:25:22 EET [18229]: [6-1] LOG: last completed transaction was at log time 2008-03-29 22:22:47.931231+02 2008-03-29 22:25:23 EET [18336]: [1-1] FATAL: the database system is in recovery mode 2008-03-29 22:25:27 EET [18337]: [1-1] FATAL: the database system is in recovery mode 2008-03-29 22:25:30 EET [18346]: [1-1] FATAL: the database system is in recovery mode 2008-03-29 22:25:41 EET [18424]: [1-1] FATAL: the database system is in recovery mode 2008-03-29 22:25:43 EET [18427]: [1-1] LOG: autovacuum launcher started 2008-03-29 22:25:43 EET [6476]: [8-1] LOG: database system is ready to accept connections Seems quite serious to me ("cannot abort a transaction that has already committed"), what can cause such behaviour? Regards Mikko |
| |||
| "Mikko Partio" <mpartio@gmail.com> writes: > 2008-03-29 22:25:15 EET [26841]: [3-1] PANIC: cannot abort transaction > 3778747509, it was already committed Yeah, this is a well-known problem with VACUUM FULL ... regards, tom lane -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin |
| |||
| On Sun, Mar 30, 2008 at 12:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Mikko Partio" <mpartio@gmail.com> writes: > > 2008-03-29 22:25:15 EET [26841]: [3-1] PANIC: cannot abort transaction > > 3778747509, it was already committed > > Yeah, this is a well-known problem with VACUUM FULL ... > > Ok, that's good to know. Is there a bugfix on the way or is it more like a 'feature'? Regards Mikko |
| |||
| "Mikko Partio" <mpartio@gmail.com> writes: > On Sun, Mar 30, 2008 at 12:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> "Mikko Partio" <mpartio@gmail.com> writes: >>> 2008-03-29 22:25:15 EET [26841]: [3-1] PANIC: cannot abort transaction >>> 3778747509, it was already committed >> >> Yeah, this is a well-known problem with VACUUM FULL ... > Ok, that's good to know. Is there a bugfix on the way or is it more like a > 'feature'? It would be good to fix it I suppose, but what with VACUUM FULL being on the edge of deprecation anyway, it's hard to muster enthusiasm for doing a (probably) large additional amount of work on it, I think most hackers would prefer to spend time on making it unnecessary than making it more bulletproof. Note that there is (AFAIK) no risk of loss of committed data from this bug. The fact that the PANIC kills all your other sessions is annoying, but in the scenarios where this is likely to happen you're trying to stop them all anyway ... regards, tom lane -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin |
| |||
| On Sun, Mar 30, 2008 at 6:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It would be good to fix it I suppose, but what with VACUUM FULL being on > the edge of deprecation anyway, it's hard to muster enthusiasm for doing > a (probably) large additional amount of work on it, I think most > hackers would prefer to spend time on making it unnecessary than making > it more bulletproof. > Interesting. What is planned substitute for VACUUM FULL? CLUSTER? Regards Mikko |
| ||||
| "Mikko Partio" <mpartio@gmail.com> writes: > Interesting. What is planned substitute for VACUUM FULL? CLUSTER? What's been discussed is something that works roughly like CLUSTER but doesn't bother to sort the data any particular way (ie, just do a seqscan not an indexscan in the data-copying phase). This would certainly dominate VACUUM FULL in contexts where you've removed a large fraction of the rows. In cases where you haven't, the large transient space requirement (2x the table size) would be objectionable, but it's not real clear why you need a VACUUM FULL rather than plain VACUUM in such cases anyway. A lot of the context for this arises from recent and planned improvements, notably (1) CLUSTER is MVCC-safe now, removing one of the core arguments for not using it for routine cleanup. (2) Planned fixes like the dead space map will substantially reduce the cost of running frequent plain vacuums. That together with autovacuum (which we could make more aggressive in consequence) should help to avoid getting into situations where VACUUM FULL is needed in the first place. We certainly couldn't have considered eliminating VACUUM FULL before this, but it's starting to seem a viable option to me. regards, tom lane -- Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-admin |
| Thread Tools | |
| Display Modes | |
|
|