Unix Technical Forum

segfault of autovacuum process during restore - coredumps included

  #1 (permalink)  
Old 04-10-2008, 10:30 AM
Frank van Vugt
 

L.S.

Since I started to use v8.1 I've been seeing incidental segfaults of the
autovacuum process, probably when it is kicking in at some particular point
during the pg_restore of a database:

<-2005-11-28 09:39:12 CET>LOG: autovacuum process (PID 5075) was terminated by signal 11
<-2005-11-28 09:39:12 CET>LOG: terminating any other active server processes

and

<-2005-11-28 10:09:21 CET>LOG: autovacuum process (PID 5230) was terminated by signal 11
<-2005-11-28 10:09:21 CET>LOG: terminating any other active server processes



Triggering it twice this morning annoyed me enough to recompile with debug
info/asserts. During a number (20-30) of additional 'tries' I was able
to trigger the problem twice, resulting in the following coredumps:

#0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
1301 newsnap = (Snapshot) palloc(sizeof(SnapshotData) +
(gdb) where
#0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
#1 0x0814673b in fmgr_sql (fcinfo=0xbfe9d740) at functions.c:319
#2 0x081400bc in ExecMakeFunctionResult (fcache=0x8517ab8,
econtext=0x8517f10, isNull=0xbfe9d9bb "\231=\b", isDone=0x0) at
execQual.c:1096
#3 0x081426d6 in ExecEvalExprSwitchContext (expression=0x8519b04,
econtext=0x0, isNull=0xbfe9d9bb "\231=\b", isDone=0x0) at execQual.c:2865
#4 0x08189e53 in evaluate_expr (expr=0x8517ab8, result_type=23) at
clauses.c:2646
#5 0x0818b8e9 in simplify_function (funcid=163843, result_type=23,
args=0x85023c0, allow_inline=1 '\001', context=0xbfe9dbd0) at clauses.c:2260
#6 0x0818bdea in eval_const_expressions_mutator (node=0x8502144,
context=0xbfe9dbd0) at clauses.c:1305
#7 0x0818a5bd in expression_tree_mutator (node=0x85015a0, mutator=0x818bc10
<eval_const_expressions_mutator>, context=0xbfe9dbd0) at clauses.c:3473
#8 0x0818be4d in eval_const_expressions_mutator (node=0x8501694,
context=0xbfe9dbd0) at clauses.c:1335
#9 0x0818c431 in eval_const_expressions_mutator (node=0x8502304,
context=0xbfe9dbd0) at clauses.c:2030
#10 0x0818ca35 in eval_const_expressions (node=0x85016c0) at clauses.c:1211
#11 0x0822fd63 in RelationGetIndexPredicate (relation=0x443361b8) at
relcache.c:2790
#12 0x080cb49f in BuildIndexInfo (index=0x443361b8) at index.c:900
#13 0x080febd6 in analyze_rel (relid=164956, vacstmt=0x44217764) at
analyze.c:257
#14 0x0813728b in vacuum (vacstmt=0x44217764, relids=0x44219fb8) at
vacuum.c:476
#15 0x0819350f in autovacuum_do_vac_analyze (relids=0x4421b394, dovacuum=0
'\0', doanalyze=1 '\001', freeze=0 '\0') at autovacuum.c:900
#16 0x08193ead in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:674
#17 0x081941f6 in autovac_start () at autovacuum.c:170
#18 0x08199f91 in ServerLoop () at postmaster.c:1268
#19 0x0819b122 in PostmasterMain (argc=3, argv=0x833b8b8) at postmaster.c:943
#20 0x0815bece in main (argc=3, argv=0x833b8b8) at main.c:256

and

#0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
1301 newsnap = (Snapshot) palloc(sizeof(SnapshotData) +
(gdb) bt
#0 0x082535fc in CopySnapshot (snapshot=0x0) at tqual.c:1301
#1 0x0814673b in fmgr_sql (fcinfo=0xbfd97090) at functions.c:319
#2 0x081400bc in ExecMakeFunctionResult (fcache=0x8567a50,
econtext=0x8567ea8, isNull=0xbfd9730b "", isDone=0x0) at execQual.c:1096
#3 0x081426d6 in ExecEvalExprSwitchContext (expression=0x8569a9c,
econtext=0x0, isNull=0xbfd9730b "", isDone=0x0) at execQual.c:2865
#4 0x08189e53 in evaluate_expr (expr=0x8567a50, result_type=23) at
clauses.c:2646
#5 0x0818b8e9 in simplify_function (funcid=226356, result_type=23,
args=0x8552278, allow_inline=1 '\001', context=0xbfd97520) at clauses.c:2260
#6 0x0818bdea in eval_const_expressions_mutator (node=0x8551ffc,
context=0xbfd97520) at clauses.c:1305
#7 0x0818a5bd in expression_tree_mutator (node=0x853d450, mutator=0x818bc10
<eval_const_expressions_mutator>, context=0xbfd97520) at clauses.c:3473
#8 0x0818be4d in eval_const_expressions_mutator (node=0x853d544,
context=0xbfd97520) at clauses.c:1335
#9 0x0818c431 in eval_const_expressions_mutator (node=0x85521bc,
context=0xbfd97520) at clauses.c:2030
#10 0x0818ca35 in eval_const_expressions (node=0x853d570) at clauses.c:1211
#11 0x0822fd63 in RelationGetIndexPredicate (relation=0x44377818) at
relcache.c:2790
#12 0x080cb49f in BuildIndexInfo (index=0x44377818) at index.c:900
#13 0x080febd6 in analyze_rel (relid=227469, vacstmt=0x44258764) at
analyze.c:257
#14 0x0813728b in vacuum (vacstmt=0x44258764, relids=0x4425afb8) at
vacuum.c:476
#15 0x0819350f in autovacuum_do_vac_analyze (relids=0x4425c394, dovacuum=0
'\0', doanalyze=1 '\001', freeze=0 '\0') at autovacuum.c:900
#16 0x08193ead in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:674
#17 0x081941f6 in autovac_start () at autovacuum.c:170
#18 0x08199f91 in ServerLoop () at postmaster.c:1268
#19 0x0819b122 in PostmasterMain (argc=3, argv=0x833b8b8) at postmaster.c:943
#20 0x0815bece in main (argc=3, argv=0x833b8b8) at main.c:256


Both dumps are still available for additional info on variable-values, etc.



db=# select version();
version
------------------------------------------------------------------------
PostgreSQL 8.1.0 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.4.3





--
Best,




Frank.

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

  #2 (permalink)  
Old 04-10-2008, 10:30 AM
Alvaro Herrera
 

Frank van Vugt wrote:
> L.S.
>
> Since I started to use v8.1 I've been seeing incidental segfaults of the
> autovacuum process, probably when it is kicking in at some particular point
> during the pg_restore of a database:


The problem appears to be that autovacuum is forgetting to set a
snapshot in cases where it's needed. My bug -- should be easy to fix.
I'll apply a patch shortly.
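
The failure mode described here can be modelled outside the server. What follows is a minimal, self-contained C sketch, not PostgreSQL code: `ToySnapshot`, `toy_active_snapshot`, `toy_copy_snapshot` and `toy_run_sql_function` are hypothetical stand-ins for the snapshot structure, `ActiveSnapshot`, `CopySnapshot` and the SQL-function executor seen in the backtrace. It assumes the copy routine reads a field of its argument to size the allocation, which would explain a fault on a NULL argument; the toy version checks for NULL and reports an error instead of crashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a snapshot: a header plus a variable-length array. */
typedef struct ToySnapshot
{
    unsigned int xmin;
    unsigned int xmax;
    size_t       xcnt;      /* number of entries in xip[] */
    unsigned int xip[];     /* in-progress transaction ids */
} ToySnapshot;

/* Stand-in for the global "currently active" snapshot; NULL means none set. */
static ToySnapshot *toy_active_snapshot = NULL;

/*
 * Copy a snapshot. The allocation size depends on src->xcnt, so an unchecked
 * version would dereference a null pointer when src == NULL. This toy
 * returns NULL in that case instead.
 */
static ToySnapshot *
toy_copy_snapshot(const ToySnapshot *src)
{
    size_t       size;
    ToySnapshot *copy;

    if (src == NULL)
        return NULL;

    size = sizeof(ToySnapshot) + src->xcnt * sizeof(unsigned int);
    copy = malloc(size);
    if (copy != NULL)
        memcpy(copy, src, size);
    return copy;
}

/*
 * Stand-in for running a read-only SQL function, which needs a copy of the
 * active snapshot. Returns 0 on success, -1 if no snapshot could be copied.
 */
static int
toy_run_sql_function(void)
{
    ToySnapshot *snap = toy_copy_snapshot(toy_active_snapshot);

    if (snap == NULL)
        return -1;

    /* Sanity check: the copy carries the same entry count as the original. */
    assert(snap->xcnt == toy_active_snapshot->xcnt);
    free(snap);
    return 0;
}
```

In this model, calling `toy_run_sql_function()` before anything has assigned `toy_active_snapshot` returns -1; once a caller points `toy_active_snapshot` at a valid `ToySnapshot`, the same call returns 0.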

Thanks for the report!

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

  #3 (permalink)  
Old 04-10-2008, 10:30 AM
Alvaro Herrera
 

Frank van Vugt wrote:

> Since I started to use v8.1 I've been seeing incidental segfaults of the
> autovacuum process, probably when it is kicking in at some particular point
> during the pg_restore of a database:


Hum, I'm unable to reproduce the problem here; could you give me the
code for the functions in the functional indexes for tables 164956 or
227469? They must be non-trivial SQL functions AFAICT (I'm rather
unable to figure out their shape based only on the backtrace.)

The attached patch should correct the problem, but I'd like to make sure
it does ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


  #4 (permalink)  
Old 04-10-2008, 10:30 AM
Alvaro Herrera
 

Alvaro Herrera wrote:
> Frank van Vugt wrote:
>
> > Since I started to use v8.1 I've been seeing incidental segfaults of the
> > autovacuum process, probably when it is kicking in at some particular point
> > during the pg_restore of a database:


[...]

> The attached patch should correct the problem, but I'd like to make sure
> it does ...


I managed to make it fail, and the patch does indeed correct the
problem. I'm applying to both HEAD and the 8.1 branch. If you could
confirm that it fixes the problem for you, that'd be good.

Thanks,

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

  #5 (permalink)  
Old 04-10-2008, 10:30 AM
Tom Lane
 

Alvaro Herrera <alvherre@commandprompt.com> writes:
> The attached patch should correct the problem, but I'd like to make sure
> it does ...


Rather than that, I'd suggest just setting ActiveSnapshot
unconditionally after each of the StartTransactionCommand calls in
autovacuum.c, ie make the code look just like vacuum.c:

/* Begin a transaction for vacuuming this relation */
StartTransactionCommand();
/* functions in indexes may want a snapshot set */
ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());

This seems more future-proof. The patch as proposed is assuming a whole
lot about where snapshots might or might not get used.
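
As a rough illustration of the "set it unconditionally" idea, here is a self-contained toy model in C. None of these names are PostgreSQL APIs: `toy_start_transaction`, `toy_commit_transaction`, `toy_get_transaction_snapshot`, `toy_begin_with_snapshot` and `toy_active` are hypothetical stand-ins, and the rule that starting or committing a transaction clears the active snapshot is an assumption of the toy. The point is only the shape: one helper pairs the transaction start with the snapshot assignment, so no caller has to reason about whether later code will need a snapshot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much simplified snapshot. */
typedef struct ToySnap
{
    unsigned int xid;   /* id of the transaction that took the snapshot */
} ToySnap;

static bool         toy_in_transaction = false;
static unsigned int toy_next_xid = 100;
static ToySnap      toy_snap_storage;
static ToySnap     *toy_active = NULL;   /* NULL: no snapshot set */

/* Stand-in for starting a transaction; note it does not set a snapshot. */
static void
toy_start_transaction(void)
{
    assert(!toy_in_transaction);
    toy_in_transaction = true;
    toy_next_xid++;
    toy_active = NULL;
}

/* Stand-in for committing; in this toy the snapshot ends with the transaction. */
static void
toy_commit_transaction(void)
{
    assert(toy_in_transaction);
    toy_in_transaction = false;
    toy_active = NULL;
}

/* Stand-in for obtaining the current transaction's snapshot. */
static ToySnap *
toy_get_transaction_snapshot(void)
{
    toy_snap_storage.xid = toy_next_xid;
    return &toy_snap_storage;
}

/*
 * The pattern under discussion: always set the active snapshot right after
 * starting a transaction, whether or not the caller expects to need it.
 */
static void
toy_begin_with_snapshot(void)
{
    toy_start_transaction();
    toy_active = toy_get_transaction_snapshot();
}
```

After a bare `toy_start_transaction()` the pointer `toy_active` is still NULL, whereas after `toy_begin_with_snapshot()` it is always valid until the matching `toy_commit_transaction()`. The cost of the unconditional variant in this model is one extra assignment per transaction, even when nothing ends up reading the snapshot.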

regards, tom lane

  #6 (permalink)  
Old 04-10-2008, 10:30 AM
Frank van Vugt
 

> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > The attached patch should correct the problem, but I'd like to make sure
> > it does ...


> Rather than that, I'd suggest just setting ActiveSnapshot
> unconditionally after each of the StartTransactionCommand calls in
> autovacuum.c, ie make the code look just like vacuum.c:
>
> /* Begin a transaction for vacuuming this relation */
> StartTransactionCommand();
> /* functions in indexes may want a snapshot set */
> ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
>
> This seems more future-proof. The patch as proposed is assuming a whole
> lot about where snapshots might or might not get used.


Will try the patch tonight.

Tom, is your patch meant for the exact same location? Also, don't we need a
'CommitTransactionCommand()' as well?





--
Best,




Frank.

  #7 (permalink)  
Old 04-10-2008, 10:30 AM
Tom Lane
 

Frank van Vugt <ftm.van.vugt@foxi.nl> writes:
> Tom, is your patch meant for the exact same location?


No. There are two StartTransactionCommand calls in autovacuum.c, and
what I'm suggesting is to add the ActiveSnapshot assignment after each
one.

regards, tom lane

  #8 (permalink)  
Old 04-10-2008, 10:30 AM
Alvaro Herrera
 

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > The attached patch should correct the problem, but I'd like to make sure
> > it does ...

>
> Rather than that, I'd suggest just setting ActiveSnapshot
> unconditionally after each of the StartTransactionCommand calls in
> autovacuum.c, ie make the code look just like vacuum.c:
>
> /* Begin a transaction for vacuuming this relation */
> StartTransactionCommand();
> /* functions in indexes may want a snapshot set */
> ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());


Done.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

  #9 (permalink)  
Old 04-10-2008, 10:30 AM
Frank van Vugt
 

Alvaro,

> Hum, I'm unable to reproduce the problem here; could you give me the
> code for the functions in the functional indexes for tables 164956 or
> 227469? They must be non-trivial SQL functions AFAICT (I'm rather
> unable to figure out their shape based only on the backtrace.)


Obviously the OIDs have changed, but the relid in frame 13 of the latest
backtrace:

#13 0x080febd6 in analyze_rel (relid=268480, vacstmt=0x4422f4a0) at
analyze.c:257

is referring to a table 'purchaseorder_line' which has one functional index
defined like this (taken from '\d' output):

"purchaseorder_line_idx01" btree (salesorder_line_id)
WHERE status_id <> pol_stat('POL_CANCELLED'::character varying)
AND status_id <> pol_stat('POL_HANDLED'::character varying)

The pol_stat() function is a helper function used to refer to statuses by
abbreviation instead of id and is defined as:

CREATE OR REPLACE FUNCTION pol_stat(varchar)
RETURNS int
LANGUAGE 'sql'
IMMUTABLE
STRICT
SECURITY INVOKER
AS 'SELECT id FROM purchaseorder_line_status WHERE abbreviation = $1';


If you were interested in some other relid, just let me know.



--
Best,




Frank.

  #10 (permalink)  
Old 04-10-2008, 10:30 AM
Frank van Vugt
 

<TL>
> No. There are two StartTransactionCommand calls in autovacuum.c, and
> what I'm suggesting is to add the ActiveSnapshot assignment after each
> one.


<AH>
> Done.


I've changed autovacuum.c per this diff; I 'hope' I misinterpreted what needed
to be done (see below):

======================================================
--- autovacuum.c_orig 2005-11-28 16:34:49.000000000 +0100
+++ autovacuum.c 2005-11-28 16:37:11.000000000 +0100
@@ -494,6 +494,9 @@
  /* Start a transaction so our commands have one to play into. */
  StartTransactionCommand();

+ /* Begin a transaction for vacuuming this database */
+ ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+
  dbRel = heap_open(DatabaseRelationId, AccessShareLock);

  /* Must use a table scan, since there's no syscache for pg_database */
@@ -555,6 +558,9 @@
  /* Start a transaction so our commands have one to play into. */
  StartTransactionCommand();

+ /* Begin a transaction for vacuuming this relation */
+ ActiveSnapshot = CopySnapshot(GetTransactionSnapshot());
+
  /*
   * StartTransactionCommand and CommitTransactionCommand will automatically
   * switch to other contexts. We need this one to keep the list of
======================================================

Both the first and second attempt to restore my development database
failed..... ;(


Here's a new backtrace:


#0 0x0825361c in CopySnapshot (snapshot=0x0) at tqual.c:1301
1301 newsnap = (Snapshot) palloc(sizeof(SnapshotData) +
(gdb) bt
#0 0x0825361c in CopySnapshot (snapshot=0x0) at tqual.c:1301
#1 0x0814673b in fmgr_sql (fcinfo=0xbfa79e20) at functions.c:319
#2 0x081400bc in ExecMakeFunctionResult (fcache=0x84e3780,
econtext=0x84e3bd8, isNull=0xbfa7a09b "<\b", isDone=0x0) at execQual.c:1096
#3 0x081426d6 in ExecEvalExprSwitchContext (expression=0x850dfdc,
econtext=0x0, isNull=0xbfa7a09b "<\b", isDone=0x0) at execQual.c:2865
#4 0x08189e53 in evaluate_expr (expr=0x84e3780, result_type=23) at
clauses.c:2646
#5 0x0818b8e9 in simplify_function (funcid=267367, result_type=23,
args=0x84ed6d0, allow_inline=1 '\001', context=0xbfa7a2b0) at clauses.c:2260
#6 0x0818bdea in eval_const_expressions_mutator (node=0x84ed454,
context=0xbfa7a2b0) at clauses.c:1305
#7 0x0818a5bd in expression_tree_mutator (node=0x84e35a0, mutator=0x818bc10
<eval_const_expressions_mutator>, context=0xbfa7a2b0) at clauses.c:3473
#8 0x0818be4d in eval_const_expressions_mutator (node=0x84e3694,
context=0xbfa7a2b0) at clauses.c:1335
#9 0x0818c431 in eval_const_expressions_mutator (node=0x84ed614,
context=0xbfa7a2b0) at clauses.c:2030
#10 0x0818ca35 in eval_const_expressions (node=0x84e36c0) at clauses.c:1211
#11 0x0822fd83 in RelationGetIndexPredicate (relation=0x4433f40c) at
relcache.c:2790
#12 0x080cb49f in BuildIndexInfo (index=0x4433f40c) at index.c:900
#13 0x080febd6 in analyze_rel (relid=268480, vacstmt=0x4422f4a0) at
analyze.c:257
#14 0x0813728b in vacuum (vacstmt=0x4422f4a0, relids=0x44228dcc) at
vacuum.c:476
#15 0x0819350f in autovacuum_do_vac_analyze (relids=0x4422e528, dovacuum=0
'\0', doanalyze=1 '\001', freeze=0 '\0') at autovacuum.c:906
#16 0x08193ecd in AutoVacMain (argc=0, argv=0x0) at autovacuum.c:680
#17 0x08194216 in autovac_start () at autovacuum.c:170
#18 0x08199fb1 in ServerLoop () at postmaster.c:1268
#19 0x0819b142 in PostmasterMain (argc=3, argv=0x833b8a0) at postmaster.c:943
#20 0x0815bece in main (argc=3, argv=0x833b8a0) at main.c:256








--
Best,




Frank.
