segfault of autovacuum process during restore - coredumps included (pgsql Bugs forum, PostgreSQL category)

L.S.

Since I started to use v8.1 I've been seeing incidental segfaults of the autovacuum process, probably when it is kicking in at some particular point during the pg_restore of a database:

    <-2005-11-28 09:39:12 CET>LOG: autovacuum process (PID 5075) was terminated by signal 11
    <-2005-11-28 09:39:12 CET>LOG: terminating any other active server processes

and

    <-2005-11-28 10:09:21 CET>LOG: autovacuum process (PID 5230) was terminated by signal 11
    <-2005-11-28 10:09:21 CET>LOG: terminating any other active server processes

Triggering it twice this morning annoyed me enough to recompile with debug info/asserts. During a number of (20-30) additional 'tries' I was able to trigger the problem twice, resulting in the following coredumps:

    #0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
    1301 newsnap = (Snapshot) palloc(sizeof(SnapshotData) +
    (gdb) where
    #0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
    #1 0x0814673b in fmgr_sql (fcinfo=0xbfe9d740) at functions.c:319
    #2 0x081400bc in ExecMakeFunctionResult (fcache=0x8517ab8, econtext=0x8517f10, isNull=0xbfe9d9bb "\231=\b", isDone=0x0) at execQual.c:1096
    #3 0x081426d6 in ExecEvalExprSwitchContext (expression=0x8519b04, econtext=0x0, isNull=0xbfe9d9bb "\231=\b", isDone=0x0) at execQual.c:2865
    #4 0x08189e53 in evaluate_expr (expr=0x8517ab8, result_type=23) at clauses.c:2646
    #5 0x0818b8e9 in simplify_function (funcid=163843, result_type=23, args=0x85023c0, allow_inline=1 '\001', context=0xbfe9dbd0) at clauses.c:2260
    #6 0x0818bdea in eval_const_expressions_mutator (node=0x8502144, context=0xbfe9dbd0) at clauses.c:1305
    #7 0x0818a5bd in expression_tree_mutator (node=0x85015a0, mutator=0x818bc10 <eval_const_expressions_mutator>, context=0xbfe9dbd0) at clauses.c:3473
    #8 0x0818be4d in eval_const_expressions_mutator (node=0x8501694, context=0xbfe9dbd0) at clauses.c:1335
    #9 0x0818c431 in eval_const_expressions_mutator (node=0x8502304, context=0xbfe9dbd0) at clauses.c:2030
    #10 0x0818ca35 in eval_const_expressions (node=0x85016c0) at clauses.c:1211
    #11 0x0822fd63 in RelationGetIndexPredicate (relation=0x443361b8) at relcache.c:2790
    #12 0x080cb49f in BuildIndexInfo (index=0x443361b8) at index.c:900
    #13 0x080febd6 in analyze_rel (relid=164956, vacstmt=0x44217764) at analyze.c:257
    #14 0x0813728b in vacuum (vacstmt=0x44217764, relids=0x44219fb8) at vacuum.c:476
    #15 0x0819350f in autovacuum_do_vac_analyze (relids=0x4421b394, dovacuum=0 '\0', doanalyze=1 '\001', freeze=0 '\0') at autovacuum.c:900
    #16 0x08193ead in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:674
    #17 0x081941f6 in autovac_start () at autovacuum.c:170
    #18 0x08199f91 in ServerLoop () at postmaster.c:1268
    #19 0x0819b122 in PostmasterMain (argc=3, argv=0x833b8b8) at postmaster.c:943
    #20 0x0815bece in main (argc=3, argv=0x833b8b8) at main.c:256

and

    #0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
    1301 newsnap = (Snapshot) palloc(sizeof(SnapshotData) +
    (gdb) bt
    #0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
    #1 0x0814673b in fmgr_sql (fcinfo=0xbfd97090) at functions.c:319
    #2 0x081400bc in ExecMakeFunctionResult (fcache=0x8567a50, econtext=0x8567ea8, isNull=0xbfd9730b "", isDone=0x0) at execQual.c:1096
    #3 0x081426d6 in ExecEvalExprSwitchContext (expression=0x8569a9c, econtext=0x0, isNull=0xbfd9730b "", isDone=0x0) at execQual.c:2865
    #4 0x08189e53 in evaluate_expr (expr=0x8567a50, result_type=23) at clauses.c:2646
    #5 0x0818b8e9 in simplify_function (funcid=226356, result_type=23, args=0x8552278, allow_inline=1 '\001', context=0xbfd97520) at clauses.c:2260
    #6 0x0818bdea in eval_const_expressions_mutator (node=0x8551ffc, context=0xbfd97520) at clauses.c:1305
    #7 0x0818a5bd in expression_tree_mutator (node=0x853d450, mutator=0x818bc10 <eval_const_expressions_mutator>, context=0xbfd97520) at clauses.c:3473
    #8 0x0818be4d in eval_const_expressions_mutator (node=0x853d544, context=0xbfd97520) at clauses.c:1335
    #9 0x0818c431 in eval_const_expressions_mutator (node=0x85521bc, context=0xbfd97520) at clauses.c:2030
    #10 0x0818ca35 in eval_const_expressions (node=0x853d570) at clauses.c:1211
    #11 0x0822fd63 in RelationGetIndexPredicate (relation=0x44377818) at relcache.c:2790
    #12 0x080cb49f in BuildIndexInfo (index=0x44377818) at index.c:900
    #13 0x080febd6 in analyze_rel (relid=227469, vacstmt=0x44258764) at analyze.c:257
    #14 0x0813728b in vacuum (vacstmt=0x44258764, relids=0x4425afb8) at vacuum.c:476
    #15 0x0819350f in autovacuum_do_vac_analyze (relids=0x4425c394, dovacuum=0 '\0', doanalyze=1 '\001', freeze=0 '\0') at autovacuum.c:900
    #16 0x08193ead in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:674
    #17 0x081941f6 in autovac_start () at autovacuum.c:170
    #18 0x08199f91 in ServerLoop () at postmaster.c:1268
    #19 0x0819b122 in PostmasterMain (argc=3, argv=0x833b8b8) at postmaster.c:943
    #20 0x0815bece in main (argc=3, argv=0x833b8b8) at main.c:256

Both dumps are still available for additional info on variable-values, etc.

    db=# select version();
                                    version
    ------------------------------------------------------------------------
     PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3

--
Best,

Frank.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly

Frank van Vugt wrote:
> L.S.
>
> Since I started to use v8.1 I've been seeing incidental segfaults of the
> autovacuum process, probably when it is kicking in at some particular point
> during the pg_restore of a database:

The problem appears to be that autovacuum is forgetting to set a snapshot in cases where it's needed. My bug -- should be easy to fix. I'll apply a patch shortly.

Thanks for the report!

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Frank van Vugt wrote:
> Since I started to use v8.1 I've been seeing incidental segfaults of the
> autovacuum process, probably when it is kicking in at some particular point
> during the pg_restore of a database:

Hum, I'm unable to reproduce the problem here; could you give me the code for the functions in the functional indexes for tables 164956 or 227469? They must be non-trivial SQL functions AFAICT (I'm rather unable to figure out their shape based only on the backtrace.)

The attached patch should correct the problem, but I'd like to make sure it does ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Alvaro Herrera wrote:
> Frank van Vugt wrote:
>
> > Since I started to use v8.1 I've been seeing incidental segfaults of the
> > autovacuum process, probably when it is kicking in at some particular point
> > during the pg_restore of a database:

[...]

> The attached patch should correct the problem, but I'd like to make sure
> it does ...

I managed to make it fail, and the patch does indeed correct the problem. I'm applying to both HEAD and the 8.1 branch. If you could confirm that it fixes the problem for you, that'd be good.

Thanks,

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Alvaro Herrera <alvherre@commandprompt.com> writes:
> The attached patch should correct the problem, but I'd like to make sure
> it does ...

Rather than that, I'd suggest just setting ActiveSnapshot unconditionally after each of the StartTransactionCommand calls in autovacuum.c, ie make the code look just like vacuum.c:

    /* Begin a transaction for vacuuming this relation */
    StartTransactionCommand();
    /* functions in indexes may want a snapshot set */
    ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

This seems more future-proof. The patch as proposed is assuming a whole lot about where snapshots might or might not get used.

regards, tom lane

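The pattern suggested above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: the names ActiveSnapshot, CopySnapshot, GetTransactionSnapshot and StartTransactionCommand are borrowed from the thread, but every definition below is a simplified stand-in and not PostgreSQL source. The helpers run_index_predicate_function, analyze_without_snapshot, analyze_with_snapshot and snapshot_demo are hypothetical names made up for the illustration. In particular, the stand-in CopySnapshot returns NULL for a NULL argument so that the failure can be observed safely, whereas the backtraces show the backend crashing in CopySnapshot (snapshot=0x0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for backend types; NOT the real PostgreSQL definitions. */
typedef unsigned int TransactionId;

typedef struct SnapshotData
{
    TransactionId  xmin;
    TransactionId  xmax;
    unsigned int   xcnt;   /* number of entries in xip[] */
    TransactionId *xip;    /* in-progress transaction ids */
} SnapshotData;

typedef SnapshotData *Snapshot;

static SnapshotData TransactionSnapshotData = {100, 105, 0, NULL};
static Snapshot ActiveSnapshot = NULL;

/* Model assumption: a freshly started transaction has no active snapshot. */
static void StartTransactionCommand(void)
{
    ActiveSnapshot = NULL;
}

static Snapshot GetTransactionSnapshot(void)
{
    return &TransactionSnapshotData;
}

/* Stand-in copy routine. Unlike the backend function seen in the backtraces,
 * it returns NULL for a NULL argument instead of dereferencing it. */
static Snapshot CopySnapshot(Snapshot snapshot)
{
    Snapshot newsnap;

    if (snapshot == NULL)
        return NULL;

    /* Snapshot header and xid array share one allocation. */
    newsnap = malloc(sizeof(SnapshotData) + snapshot->xcnt * sizeof(TransactionId));
    if (newsnap == NULL)
        return NULL;

    memcpy(newsnap, snapshot, sizeof(SnapshotData));
    if (snapshot->xcnt > 0)
    {
        newsnap->xip = (TransactionId *) (newsnap + 1);
        memcpy(newsnap->xip, snapshot->xip, snapshot->xcnt * sizeof(TransactionId));
    }
    else
        newsnap->xip = NULL;
    return newsnap;
}

/* Hypothetical stand-in for running an SQL function from an index predicate:
 * it copies the active snapshot first. Returns 0 on success, -1 if there is none. */
static int run_index_predicate_function(void)
{
    Snapshot snap = CopySnapshot(ActiveSnapshot);

    if (snap == NULL)
        return -1;
    free(snap);
    return 0;
}

/* Transaction started, but no snapshot set before the analyze step. */
static int analyze_without_snapshot(void)
{
    StartTransactionCommand();
    return run_index_predicate_function();
}

/* Same sequence with the assignment suggested in the post above. */
static int analyze_with_snapshot(void)
{
    int rc;

    StartTransactionCommand();
    /* functions in indexes may want a snapshot set */
    ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
    rc = run_index_predicate_function();
    free(ActiveSnapshot);
    ActiveSnapshot = NULL;
    return rc;
}

/* Exercises both paths; returns 0 when they behave as described. */
int snapshot_demo(void)
{
    assert(analyze_without_snapshot() == -1);
    assert(analyze_with_snapshot() == 0);
    return 0;
}
```

Calling snapshot_demo() from a main() runs both sequences. In this model the only difference between them is the single assignment placed right after StartTransactionCommand(), which is why setting it unconditionally avoids having to reason about which later code paths need a snapshot.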
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > The attached patch should correct the problem, but I'd like to make sure
> > it does ...
> Rather than that, I'd suggest just setting ActiveSnapshot
> unconditionally after each of the StartTransactionCommand calls in
> autovacuum.c, ie make the code look just like vacuum.c:
>
> /* Begin a transaction for vacuuming this relation */
> StartTransactionCommand();
> /* functions in indexes may want a snapshot set */
> ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
>
> This seems more future-proof. The patch as proposed is assuming a whole
> lot about where snapshots might or might not get used.

Will try the patch tonight.

Tom, is your patch meant for the exact same location?

Also, don't we need a 'CommitTransactionCommand()' as well?

--
Best,

Frank.

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> Tom, is your patch meant for the exact same location?

No. There are two StartTransactionCommand calls in autovacuum.c, and what I'm suggesting is to add the ActiveSnapshot assignment after each one.

regards, tom lane

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > The attached patch should correct the problem, but I'd like to make sure
> > it does ...
>
> Rather than that, I'd suggest just setting ActiveSnapshot
> unconditionally after each of the StartTransactionCommand calls in
> autovacuum.c, ie make the code look just like vacuum.c:
>
> /* Begin a transaction for vacuuming this relation */
> StartTransactionCommand();
> /* functions in indexes may want a snapshot set */
> ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

Done.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Alvaro,

> Hum, I'm unable to reproduce the problem here; could you give me the
> code for the functions in the functional indexes for tables 164956 or
> 227469? They must be non-trivial SQL functions AFAICT (I'm rather
> unable to figure out their shape based only on the backtrace.)

Obviously the OIDs have changed. The backtrace frame

    #13 0x080febd6 in analyze_rel (relid=268480, vacstmt=0x4422f4a0) at analyze.c:257

is referring to a table 'purchaseorder_line', which has one functional index defined like this (taken from '\d' output):

    "purchaseorder_line_idx01" btree (salesorder_line_id) WHERE status_id <> pol_stat('POL_CANCELLED'::character varying) AND status_id <> pol_stat('POL_HANDLED'::character varying)

The pol_stat() function is a helper function used to refer to statuses by abbreviation instead of id, and is defined as:

    CREATE OR REPLACE FUNCTION pol_stat(varchar) RETURNS int
    LANGUAGE 'sql'
    IMMUTABLE
    STRICT
    SECURITY INVOKER
    AS 'SELECT id FROM purchaseorder_line_status WHERE abbreviation = $1';

If you were interested in some other relid, just let me know.

--
Best,

Frank.

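The backtraces in this thread pass through eval_const_expressions, simplify_function and evaluate_expr before reaching fmgr_sql, which suggests that the index predicate's pol_stat('...') calls are being pre-evaluated because the function is declared IMMUTABLE and its argument is a constant. The following is a minimal sketch of that kind of constant folding, assuming a toy expression tree. Node, lookup_status, fold_constants and folding_demo are hypothetical names for illustration only and do not correspond to the PostgreSQL implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy expression tree; a stand-in, not PostgreSQL's Node types. */
typedef enum { NODE_CONST, NODE_FUNC } NodeTag;

typedef struct Node
{
    NodeTag      tag;
    int          value;      /* valid when tag == NODE_CONST */
    int        (*fn)(int);   /* valid when tag == NODE_FUNC */
    bool         immutable;  /* may the function be evaluated ahead of time? */
    struct Node *arg;        /* single argument, for simplicity */
} Node;

static int calls = 0;        /* counts how often the function body really runs */

/* Hypothetical stand-in for a lookup helper such as pol_stat(). */
int lookup_status(int code)
{
    calls++;
    return code + 1000;
}

/* Fold in place: an immutable function whose argument is a constant is
 * executed right now and the call node is replaced by its result. */
void fold_constants(Node *node)
{
    if (node == NULL || node->tag != NODE_FUNC)
        return;

    fold_constants(node->arg);          /* simplify the argument first */

    if (node->immutable && node->arg != NULL && node->arg->tag == NODE_CONST)
    {
        /* The function body executes here, before any query is run. */
        int result = node->fn(node->arg->value);

        node->tag = NODE_CONST;
        node->value = result;
        node->fn = NULL;
        node->arg = NULL;
    }
}

/* Folds f(7) and checks that the call was replaced by a constant. */
int folding_demo(void)
{
    Node arg  = {NODE_CONST, 7, NULL, false, NULL};
    Node call = {NODE_FUNC, 0, lookup_status, true, &arg};

    fold_constants(&call);
    assert(call.tag == NODE_CONST);
    assert(call.value == 1007);
    return 0;
}
```

The point of the sketch is that the function body runs during simplification of the expression. If that body needs some piece of execution state (in the thread, an active snapshot), the state has to be in place before the predicate is prepared, not just before a query is executed.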
<TL>
> No. There are two StartTransactionCommand calls in autovacuum.c, and
> what I'm suggesting is to add the ActiveSnapshot assignment after each
> one.

<AH>
> Done.

I've changed autovacuum.c per this diff; I 'hope' I misinterpreted what needed to be done (see below):

======================================================
--- autovacuum.c_orig 2005-11-28 16:34:49.000000000 +0100
+++ autovacuum.c 2005-11-28 16:37:11.000000000 +0100
@@ -494,6 +494,9 @@
 	/* Start a transaction so our commands have one to play into. */
 	StartTransactionCommand();
 
+	/* Begin a transaction for vacuuming this database */
+	ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+
 	dbRel = heap_open(DatabaseRelationId, AccessShareLock);
 
 	/* Must use a table scan, since there's no syscache for pg_database */
@@ -555,6 +558,9 @@
 	/* Start a transaction so our commands have one to play into. */
 	StartTransactionCommand();
 
+	/* Begin a transaction for vacuuming this relation */
+	ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+
 	/*
 	 * StartTransactionCommand and CommitTransactionCommand will automatically
 	 * switch to other contexts. We need this one to keep the list of
======================================================

Both the first and second attempt to restore my development database failed..... ;(

Here's a new backtrace:

    #0 0x0825361c in CopySnapshot (snapshot=0x0) at tqual.c:1301
    1301 newsnap = (Snapshot) palloc(sizeof(SnapshotData) +
    (gdb) bt
    #0 0x0825361c in CopySnapshot (snapshot=0x0) at tqual.c:1301
    #1 0x0814673b in fmgr_sql (fcinfo=0xbfa79e20) at functions.c:319
    #2 0x081400bc in ExecMakeFunctionResult (fcache=0x84e3780, econtext=0x84e3bd8, isNull=0xbfa7a09b "<\b", isDone=0x0) at execQual.c:1096
    #3 0x081426d6 in ExecEvalExprSwitchContext (expression=0x850dfdc, econtext=0x0, isNull=0xbfa7a09b "<\b", isDone=0x0) at execQual.c:2865
    #4 0x08189e53 in evaluate_expr (expr=0x84e3780, result_type=23) at clauses.c:2646
    #5 0x0818b8e9 in simplify_function (funcid=267367, result_type=23, args=0x84ed6d0, allow_inline=1 '\001', context=0xbfa7a2b0) at clauses.c:2260
    #6 0x0818bdea in eval_const_expressions_mutator (node=0x84ed454, context=0xbfa7a2b0) at clauses.c:1305
    #7 0x0818a5bd in expression_tree_mutator (node=0x84e35a0, mutator=0x818bc10 <eval_const_expressions_mutator>, context=0xbfa7a2b0) at clauses.c:3473
    #8 0x0818be4d in eval_const_expressions_mutator (node=0x84e3694, context=0xbfa7a2b0) at clauses.c:1335
    #9 0x0818c431 in eval_const_expressions_mutator (node=0x84ed614, context=0xbfa7a2b0) at clauses.c:2030
    #10 0x0818ca35 in eval_const_expressions (node=0x84e36c0) at clauses.c:1211
    #11 0x0822fd83 in RelationGetIndexPredicate (relation=0x4433f40c) at relcache.c:2790
    #12 0x080cb49f in BuildIndexInfo (index=0x4433f40c) at index.c:900
    #13 0x080febd6 in analyze_rel (relid=268480, vacstmt=0x4422f4a0) at analyze.c:257
    #14 0x0813728b in vacuum (vacstmt=0x4422f4a0, relids=0x44228dcc) at vacuum.c:476
    #15 0x0819350f in autovacuum_do_vac_analyze (relids=0x4422e528, dovacuum=0 '\0', doanalyze=1 '\001', freeze=0 '\0') at autovacuum.c:906
    #16 0x08193ecd in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:680
    #17 0x08194216 in autovac_start () at autovacuum.c:170
    #18 0x08199fb1 in ServerLoop () at postmaster.c:1268
    #19 0x0819b142 in PostmasterMain (argc=3, argv=0x833b8a0) at postmaster.c:943
    #20 0x0815bece in main (argc=3, argv=0x833b8a0) at main.c:256

--
Best,

Frank.