Unix Technical Forum

8.3.0 backend segfaults

This is a discussion on 8.3.0 backend segfaults within the pgsql Bugs forums, part of the PostgreSQL category; --> (Sorry if duplicates show up this is the third time ive posted this in the past 10 hours, Im ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Bugs

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 11:15 AM
Alex Hunsaker
 
Posts: n/a
Default 8.3.0 backend segfaults

(Sorry if duplicates show up this is the third time ive posted this in
the past 10 hours, Im assuming it got dropped because of the
attachments)

Problem: Apparently random segfaults apparently query agnostic, seem
to be more frequent when a pg_dump is running

The most frequent query it segfaults with is:
select w.worker_id, w.worker_id as printerid, w.worker, w.alias,
coalesce(w.alias, w.worker) as name, w.active, w.last_active,
w.last_deactive, round(extract(epoch from now()) - extract(epoch from
w.last_deactive)) as time_off from workers as w left join worker_vis
as wv on wv.worker_id = w.worker_id and wv.defunct = 0 and (
((wv.auth_id = ?) and (wv.auth_class = data_class('user_id'))) or
((wv.auth_id = ?) and (wv.auth_class = data_class('clinic_id'))))
where wv.worker_vis_id is not null and w.defunct = 0 order by
coalesce(w.alias, w.worker);

(sample arguments, they do not seem to make a difference)
167214, 13
167340, 16

But have seen other simpler queries such as (sorry no backtraces for
these... yet)
insert into audit_transaction (action, open_user_id, page_load_id,
user_id) values (?, ?, ?, ?); ARGS=suds, 509057, 15148365, 167217

If I select * from workers; and select * from workers_active;
everything looks good, no segfaults. I can also take a pg_dump if
virtually nothing is using the database. Otherwise pg_dump dies
randomly with an errors like:

Dumping the contents of table "clients_audit" failed: PQgetCopyData() failed.
Dumping the contents of table "file_data" failed: PQgetCopyData() failed.
Dumping the contents of table "workers_audit" failed: PQgetCopyData() failed.

(note these are probably 3 of the larger tables in the database, maybe
it has something to do with statement_timeout, i do have it set quite
low (3 min) ?)

If I manually prepare and execute the above queries they work fine.
prepare worker (bigint, bigint) as select w.worker_id, w.worker_id as
printerid, w.worker, w.alias, coalesce(w.alias, w.worker) as name,
w.active, w.last_active, w.last_deactive, round(extract(epoch from
now()) - extract(epoch from w.last_deactive)) as time_off from workers
as w left join worker_vis as wv on wv.worker_id = w.worker_id and
wv.defunct = 0 and ( ((wv.auth_id = ?) and (wv.auth_class =
data_class('user_id'))) or ((wv.auth_id = ?) and (wv.auth_class =
data_class('clinic_id')))) where wv.worker_vis_id is not null and
w.defunct = 0 order by coalesce(w.alias, w.worker);
execute worker (167214, 13);

Core dumps and binaries available if needed. I have about 6 core dumps
across 2 different servers. Which leads me to believe its either
postgres bug or a corrupt database (the other server is a pitr slave
for the master which i switched over to see if i could reproduce the
segfault on the master)

Nothing in dmesg, nothing in mcelog.
Raid controller is a 3ware 9550SX-12 with 6 70GB WD Raptor hard drives
in a raid 10, smart tests pass fine... everything looks good.

Work load is a web application where each page beings a transaction;
creates a temp table, does a few selects, inserts and updates and the
commits.

Postgresql 8.2.5 was working without any problems just prior to the upgrade.

\d workers
Table "public.workers"
Column | Type |
Modifiers
---------------+--------------------------+-----------------------------------------------------------------
worker_id | bigint | not null default
nextval('workers_worker_id_seq'::regclass)
date_created | timestamp with time zone | not null default now()
clientid | bigint | not null
worker | text | not null
alias | text |
job_type | smallint | not null
active | smallint | not null default 0
last_active | timestamp with time zone |
last_deactive | timestamp with time zone |
defunct | smallint | not null default 0
audit_class | integer |
audit_date | timestamp with time zone | not null default now()
audit_desc | text |
audit_id | bigint |
audit_seq | integer | not null default
nextval('audit_basic_audit_seq_seq'::regclass)
audit_table | character varying(64) |
audit_tid | bigint |
audit_type | character varying(32) |
audit_orig_id | integer |
path | text |
Indexes:
"workers_pkey" PRIMARY KEY, btree (worker_id) CLUSTER
Triggers:
workers_audit_aud BEFORE INSERT OR DELETE OR UPDATE ON workers FOR
EACH ROW EXECUTE PROCEDURE audit_table_go()

\d worker_vis
Table "public.worker_vis"
Column | Type |
Modifiers
---------------+--------------------------+-----------------------------------------------------------------
worker_vis_id | bigint | not null default
nextval('worker_vis_ruleid_seq'::regclass)
date_created | timestamp with time zone | not null default now()
worker_id | bigint |
auth_id | bigint |
auth_class | smallint |
defunct | smallint | not null default 0
audit_class | integer |
audit_date | timestamp with time zone | not null default now()
audit_desc | text |
audit_id | bigint |
audit_seq | integer | not null default
nextval('audit_basic_audit_seq_seq'::regclass)
audit_table | character varying(64) |
audit_tid | bigint |
audit_type | character varying(32) |
audit_orig_id | integer |
Indexes:
"worker_vis_pkey" PRIMARY KEY, btree (worker_vis_id)
"worker_vis_vis_idx" btree (worker_id, defunct, auth_id, auth_class) CLUSTER
Triggers:
worker_vis_audit_aud BEFORE INSERT OR DELETE OR UPDATE ON
worker_vis FOR EACH ROW EXECUTE PROCEDURE audit_table_go()

Version Information:

Following REL8.3_STABLE 2 versions tried both crash with a similar
backtrace (both died in CopySnapshot):

(currently running)
Author: teodor <teodor>
Date: Sun Mar 9 10:42:48 2008 +0000

Revert changes of CompareTSQ: it affects existing btree indexes.

(was running but also crashed)
Author: tgl <tgl>
Date: Sun Mar 2 00:10:28 2008 +0000

Venezuela Time now means UTC-4:30, not UTC-4:00. Adjust our treatment
of "VET" accordingly. Per bug #3997 from Aaron Mizrachi.

Memory: 8G ECC
Swap: 16GB (unused even at segfault time aka it does not seem to be
running out of memmory)
OS: Linux 2.6.24.3

select version();
PostgreSQL 8.3.0 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)

built with: --disable-nls --with-openssl --without-python
--without-tcl --without-krb5 --with-perl --enable-debug
--enable-integer-datetimes --with-ossp-uuid

postgresql.conf:
listen_addresses = '*'
port = 5432
max_connections = 500
superuser_reserved_connections = 3
ssl = on
shared_buffers = 1000MB
work_mem = 32MB
maintenance_work_mem = 256MB
max_stack_depth = 7MB
max_fsm_pages = 304800
max_fsm_relations = 3000
vacuum_cost_delay = 30
bgwriter_lru_maxpages = 500
wal_sync_method = fdatasync
wal_buffers = 256kB
checkpoint_segments = 8
checkpoint_timeout = 10min
archive_command = 'rsync -az -e "ssh" %p
psql-sync@10.10.10.1:standby/cur/wal/%f </dev/null'
random_page_cost = 2.6
effective_cache_size = 6GB
default_statistics_target = 101
constraint_exclusion = on
log_destination = 'syslog'
client_min_messages = warning
log_min_messages = warning
silent_mode = on
log_line_prefix = '[%m] %h:%d '
log_lock_waits = on
log_temp_files = 20MB
autovacuum_naptime = 3min
autovacuum_vacuum_threshold = 200
autovacuum_analyze_threshold = 200
statement_timeout = '3min'
datestyle = 'iso, mdy'
lc_messages = 'C'
lc_monetary = 'C'
lc_numeric = 'C'
lc_time = 'C'
backslash_quote = off
escape_string_warning = on
sql_inheritance = off
standard_conforming_strings = on

cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 246
stepping : 10
cpu MHz : 2000.000
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm
3dnowext 3dnow rep_good
bogomips : 3991.79
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 5
model name : AMD Opteron(tm) Processor 246
stepping : 10
cpu MHz : 2000.000
cache size : 1024 KB
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge
mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext lm
3dnowext 3dnow rep_good
bogomips : 3991.79
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp

Core was generated by `postgres: docsysweb CaduRx 192.168.134.1(49502) BIND '.
Program terminated with signal 11, Segmentation fault.
#0 0x00002b6ef7457f74 in memcpy () from /lib/libc.so.6
(gdb) bt
#0 0x00002b6ef7457f74 in memcpy () from /lib/libc.so.6
#1 0x000000000065fc88 in CopySnapshot (snapshot=0xa4e6e0) at tqual.c:1311
#2 0x000000000051cf34 in fmgr_sql (fcinfo=0x7fffb3f51160) at functions.c:299
#3 0x00000000005164cd in ExecMakeFunctionResult (fcache=0x10b4710,
econtext=0x10b4ba0, isNull=0x7fffb3f515af "", isDone=0x0) at
execQual.c:1351
#4 0x00000000005191ca in ExecEvalExprSwitchContext
(expression=0x2b6f2c5dc078, econtext=0x10, isNull=0x24c8500 <Address
0x24c8500 out of bounds>, isDone=0x24c84c0)
at execQual.c:3726
#5 0x00000000005690bd in evaluate_expr (expr=<value optimized out>,
result_type=23, result_typmod=-1) at clauses.c:3273
#6 0x0000000000569d21 in simplify_function (funcid=2214548,
result_type=23, result_typmod=-1, args=0xf8db40, allow_inline=1
'\001', context=0x7fffb3f51700) at clauses.c:2887
#7 0x000000000056a53a in eval_const_expressions_mutator
(node=0xf24d90, context=0x7fffb3f51700) at clauses.c:1795
#8 0x000000000056b185 in estimate_expression_value (root=<value
optimized out>, node=0x2b6f2c5dc078) at clauses.c:1719
#9 0x00000000005f8b86 in get_restriction_variable (root=0xf22d80,
args=<value optimized out>, varRelid=0, vardata=0x7fffb3f51810,
other=0x7fffb3f51848,
varonleft=0x7fffb3f5185f "\001\236") at selfuncs.c:3570
#10 0x00000000005fa562 in eqsel (fcinfo=<value optimized out>) at selfuncs.c:169
#11 0x000000000064540c in OidFunctionCall4 (functionId=<value
optimized out>, arg1=15871360, arg2=532, arg3=15879312, arg4=0) at
fmgr.c:1615
#12 0x000000000056d0c2 in restriction_selectivity (root=0xf22d80,
operator=532, args=0xf24c90, varRelid=0) at plancat.c:805
#13 0x000000000054e797 in clause_selectivity (root=0xf22d80,
clause=0xf24de0, varRelid=0, jointype=JOIN_INNER) at clausesel.c:639
#14 0x000000000054e0df in clauselist_selectivity (root=0xf22d80,
clauses=<value optimized out>, varRelid=0, jointype=JOIN_INNER) at
clausesel.c:123
#15 0x000000000054e448 in clause_selectivity (root=0xf22d80,
clause=0xf8bb20, varRelid=0, jointype=JOIN_INNER) at clausesel.c:576
#16 0x000000000054e72a in clause_selectivity (root=0xf22d80,
clause=<value optimized out>, varRelid=0, jointype=JOIN_INNER) at
clausesel.c:597
#17 0x000000000054e0df in clauselist_selectivity (root=0xf22d80,
clauses=<value optimized out>, varRelid=0, jointype=JOIN_INNER) at
clausesel.c:123
#18 0x000000000054f99f in set_baserel_size_estimates
(root=0x2b6f2c5dc078, rel=0xf26748) at costsize.c:2262
#19 0x000000000054d67c in set_rel_pathlist (root=0xf22d80,
rel=0xf26748, rti=2, rte=0xec7c30) at allpaths.c:215
#20 0x000000000054df02 in make_one_rel (root=0xf22d80,
joinlist=0xf26668) at allpaths.c:150
#21 0x000000000055f710 in query_planner (root=0xf22d80, tlist=<value
optimized out>, tuple_fraction=0, limit_tuples=-1,
cheapest_path=0x7fffb3f52220,
sorted_path=0x7fffb3f52218, num_groups=0x7fffb3f52228) at planmain.c:249
#22 0x0000000000560041 in grouping_planner (root=0xf22d80,
tuple_fraction=<value optimized out>) at planner.c:897
#23 0x00000000005610b8 in subquery_planner (glob=0xee11a0,
parse=0xee1230, level=0, tuple_fraction=0, subroot=0x7fffb3f523e8) at
planner.c:431
#24 0x00000000005614b1 in standard_planner (parse=0xee1230,
cursorOptions=0, boundParams=0x0) at planner.c:158
#25 0x00000000005a1c61 in pg_plan_query (querytree=0xee1230,
cursorOptions=0, boundParams=0x0) at postgres.c:681
#26 0x00000000005a1d13 in pg_plan_queries (querytrees=<value optimized
out>, cursorOptions=0, boundParams=0x0, needSnapshot=0 '\0') at
postgres.c:752
#27 0x0000000000633f93 in do_planning (querytrees=0xf22d50,
cursorOptions=0) at plancache.c:560
#28 0x000000000063436b in RevalidateCachedPlan (plansource=0xcd02f0,
useResOwner=0 '\0') at plancache.c:484
#29 0x00000000005a3b05 in PostgresMain (argc=4, argv=<value optimized
out>, username=0x8f3020 "docsysweb") at postgres.c:1605
#30 0x0000000000579d8f in ServerLoop () at postmaster.c:3207
#31 0x000000000057a85c in PostmasterMain (argc=3, argv=0x8ee2d0) at
postmaster.c:1029
#32 0x00000000005345ce in main (argc=3, argv=<value optimized out>) at
main.c:188
gdb) frame 28
#28 0x000000000063436b in RevalidateCachedPlan (plansource=0xcd02f0,
useResOwner=0 '\0') at plancache.c:484
484 slist = do_planning(slist,
plansource->cursor_options);
(gdb) print plansource->query_string
$1 = 0xf08ea0 "select w.worker_id, w.worker_id as printerid, w.worker,
w.alias, coalesce(w.alias, w.worker) as name, w.active, w.last_active,
w.last_deactive, round(extract(epoch from now()) - extract(epoch from
w.l"...

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 11:15 AM
Tom Lane
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

"Alex Hunsaker" <badalex@gmail.com> writes:
> Problem: Apparently random segfaults apparently query agnostic, seem
> to be more frequent when a pg_dump is running


Hmm, seems from the backtrace that we're trying to do a replan with an
invalid ActiveSnapshot. What sequence of operations is the connected
client performing, exactly? (I'm not worried about the details of the
SQL per se, but of how it's preparing/issuing queries.)

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 11:15 AM
Alex Hunsaker
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

On Tue, Mar 11, 2008 at 10:59 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Alex Hunsaker" <badalex@gmail.com> writes:
> > Problem: Apparently random segfaults apparently query agnostic, seem
> > to be more frequent when a pg_dump is running

>
> Hmm, seems from the backtrace that we're trying to do a replan with an
> invalid ActiveSnapshot. What sequence of operations is the connected
> client performing, exactly? (I'm not worried about the details of the
> SQL per se, but of how it's preparing/issuing queries.)
>
> regards, tom lane
>


Its through perl-dbi using ->prepare, ->execute.. is that what you
mean? Also this is with mod_perl and Apache:BI.

So a typical transacion looks like

connect to postgres;
begin;
do stuff here;
commit or rollback;
(next page load)
begin;
.....

so potentially a database connection/session is reused on the next
page load if that makes a difference.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-10-2008, 11:15 AM
Alex Hunsaker
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

FYI right now Im trying:

Author: tgl <tgl>
Date: Sat Feb 2 22:26:17 2008 +0000

Fix WaitOnLock() to ensure that the process's "waiting" flag is reset after
erroring out of a wait. We can use a PG_TRY block for this, but
add a comment
explaining why it'd be a bad idea to use it for any other state cleanup.

Back-patch to 8.2. Prior releases had the same issue, but only with respect
to the process title, which is likely to get reset almost immediately anyway
after the transaction aborts, so it seems not worth changing them. In 8.2
and HEAD, the pg_stat_activity "waiting" flag could remain set incorrectly
for a long time.

Which was just after 8.3 was stamped. Ill see if anything happens
tomorrow when the masses start using it again. I will also try to
find a way to reproduce it.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-10-2008, 11:15 AM
Alex Hunsaker
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

Oops we actually use DBI->prepare_cached() not DBI->prepare() which to
my understanding should be roughly equivalent to (because of
Apache:BI):

prepare query ....;
begin;
execute query;
commit;
(next page load)
begin;
execute query;
commit;

I can turn that off and only use DBI->prepare() as a test. Or heck
just cut DBI->prepare() out and just quote everything and send them
through using DBI->do() instead. That is if you think that could be
the culprit.

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-10-2008, 11:15 AM
Tom Lane
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

"Alex Hunsaker" <badalex@gmail.com> writes:
> I can turn that off and only use DBI->prepare() as a test. Or heck
> just cut DBI->prepare() out and just quote everything and send them
> through using DBI->do() instead. That is if you think that could be
> the culprit.


The path of control that's dying cannot be entered at all unless you're
using a prepared statement, so cutting out DBI->prepare should make the
problem go away.

Now personally, I am much more interested in *reproducing* the problem
than merely dodging it. I can understand if you just need a workaround
yesterday, but please see if you can get a reproducible test case ...

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-10-2008, 11:15 AM
Alex Hunsaker
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

On Wed, Mar 12, 2008 at 12:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Now personally, I am much more interested in *reproducing* the problem
> than merely dodging it. I can understand if you just need a workaround
> yesterday, but please see if you can get a reproducible test case ...
>
> regards, tom lane
>


Yesterday would be nice . Ill keep trying to get a test case. In
the mean time here is a new backtrace I just got (lord knows how... it
seems to be as soon as I stop trying to make it crash and look away,
thats when it does). This is 2 commits ( the one i reference in my
later email) after 8.3 was stamped. Happens to be the same query as
before...

Core was generated by `postgres: docsysweb CaduRx 192.168.134.1(54493) BIND '.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x000000000045a55e in heapgettup (scan=0xbd8bd0, dir=<value
optimized out>, nkeys=0, key=0x0) at heapam.c:431
#2 0x000000000045a96d in heap_getnext (scan=0xbd8bd0,
direction=14717080) at heapam.c:1259
#3 0x0000000000525478 in SeqNext (node=<value optimized out>) at
nodeSeqscan.c:92
#4 0x000000000051a74f in ExecScan (node=0xbd7ed0, accessMtd=0x5253c0
<SeqNext>) at execScan.c:103
#5 0x0000000000514380 in ExecProcNode (node=0xbd7ed0) at execProcnode.c:334
#6 0x00000000005135ba in ExecutorRun (queryDesc=<value optimized
out>, direction=ForwardScanDirection, count=1) at execMain.c:1233
#7 0x000000000051c82d in postquel_getnext (es=0xe09038,
fcache=0xd0e4c0) at functions.c:378
#8 0x000000000051ce6d in fmgr_sql (fcinfo=0x7fffe6d7af90) at functions.c:479
#9 0x00000000005163bd in ExecMakeFunctionResult (fcache=0xd0df50,
econtext=0xd0e3e0, isNull=0x7fffe6d7b3df "", isDone=0x0) at
execQual.c:1351
#10 0x00000000005190ba in ExecEvalExprSwitchContext
(expression=0x2ae1c608c658, econtext=0xe09098, isNull=0x32d <Address
0x32d out of bounds>, isDone=0x0) at execQual.c:3726
#11 0x0000000000568d4d in evaluate_expr (expr=<value optimized out>,
result_type=23, result_typmod=-1) at clauses.c:3273
#12 0x00000000005699b1 in simplify_function (funcid=2214548,
result_type=23, result_typmod=-1, args=0xcfbfa0, allow_inline=1
'\001', context=0x7fffe6d7b530) at clauses.c:2887
#13 0x000000000056a1ca in eval_const_expressions_mutator
(node=0xc34170, context=0x7fffe6d7b530) at clauses.c:1795
#14 0x000000000056ae15 in estimate_expression_value (root=<value
optimized out>, node=0x2ae1c608c658) at clauses.c:1719
#15 0x00000000005f8906 in get_restriction_variable (root=0xc32160,
args=<value optimized out>, varRelid=0, vardata=0x7fffe6d7b640,
other=0x7fffe6d7b678,
varonleft=0x7fffe6d7b68f "\001\236") at selfuncs.c:3570
#16 0x00000000005fa2e2 in eqsel (fcinfo=<value optimized out>) at selfuncs.c:169
#17 0x0000000000644f4c in OidFunctionCall4 (functionId=<value
optimized out>, arg1=12788064, arg2=532, arg3=12796016, arg4=0) at
fmgr.c:1615
#18 0x000000000056cd52 in restriction_selectivity (root=0xc32160,
operator=532, args=0xc34070, varRelid=0) at plancat.c:805
#19 0x000000000054e447 in clause_selectivity (root=0xc32160,
clause=0xc341c0, varRelid=0, jointype=JOIN_INNER) at clausesel.c:639
#20 0x000000000054dd8f in clauselist_selectivity (root=0xc32160,
clauses=<value optimized out>, varRelid=0, jointype=JOIN_INNER) at
clausesel.c:123
#21 0x000000000054e0f8 in clause_selectivity (root=0xc32160,
clause=0xcf9f80, varRelid=0, jointype=JOIN_INNER) at clausesel.c:576
#22 0x000000000054e3da in clause_selectivity (root=0xc32160,
clause=<value optimized out>, varRelid=0, jointype=JOIN_INNER) at
clausesel.c:597
#23 0x000000000054dd8f in clauselist_selectivity (root=0xc32160,
clauses=<value optimized out>, varRelid=0, jointype=JOIN_INNER) at
clausesel.c:123
#24 0x000000000054f64f in set_baserel_size_estimates
(root=0x2ae1c608c658, rel=0xc35b28) at costsize.c:2262
#25 0x000000000054d32c in set_rel_pathlist (root=0xc32160,
rel=0xc35b28, rti=2, rte=0xc1d110) at allpaths.c:215
#26 0x000000000054dbb2 in make_one_rel (root=0xc32160,
joinlist=0xc35a48) at allpaths.c:150
#27 0x000000000055f3a0 in query_planner (root=0xc32160, tlist=<value
optimized out>, tuple_fraction=0, limit_tuples=-1,
cheapest_path=0x7fffe6d7c050, sorted_path=0x7fffe6d7c048,
num_groups=0x7fffe6d7c058) at planmain.c:249
#28 0x000000000055fcd1 in grouping_planner (root=0xc32160,
tuple_fraction=<value optimized out>) at planner.c:897
#29 0x0000000000560d48 in subquery_planner (glob=0xe19b20,
parse=0xe19bb0, level=0, tuple_fraction=0, subroot=0x7fffe6d7c218) at
planner.c:431
#30 0x0000000000561141 in standard_planner (parse=0xe19bb0,
cursorOptions=0, boundParams=0x0) at planner.c:158
#31 0x00000000005a1971 in pg_plan_query (querytree=0xe19bb0,
cursorOptions=0, boundParams=0x0) at postgres.c:681
#32 0x00000000005a1a23 in pg_plan_queries (querytrees=<value optimized
out>, cursorOptions=0, boundParams=0x0, needSnapshot=0 '\0') at
postgres.c:752
#33 0x0000000000633cf3 in do_planning (querytrees=0xc32130,
cursorOptions=0) at plancache.c:560
#34 0x00000000006340cb in RevalidateCachedPlan (plansource=0xb58460,
useResOwner=0 '\0') at plancache.c:484
#35 0x00000000005a3815 in PostgresMain (argc=4, argv=<value optimized
out>, username=0x8f32f0 "docsysweb") at postgres.c:1605
#36 0x0000000000579a1f in ServerLoop () at postmaster.c:3207
#37 0x000000000057a4ec in PostmasterMain (argc=3, argv=0x8ee2d0) at
postmaster.c:1029
#38 0x000000000053441e in main (argc=3, argv=<value optimized out>) at
main.c:188

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-10-2008, 11:15 AM
Tom Lane
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

"Alex Hunsaker" <badalex@gmail.com> writes:
> the mean time here is a new backtrace I just got (lord knows how... it
> seems to be as soon as I stop trying to make it crash and look away,
> thats when it does).


Not sure that you are clear on what's happening here, but the train of
events is something like
- you prepare a statement
- somebody modifies the schema of one of the tables used in the
statement
- you try to execute the statement, and updating the prepared
plan crashes

If you say "none of my stuff is changing any schemas", then I'd guess
that the triggering event is a background autovacuum, which forces
replan just like an intentional schema change would. Does stopping
autovacuum make the problem go away?

regards, tom lane

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-10-2008, 11:15 AM
Greg Sabino Mullane
 
Posts: n/a
Default Re: 8.3.0 backend segfaults


-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> Work load is a web application where each page beings a transaction;
> creates a temp table, does a few selects, inserts and updates and the
> commits.


Are you sure you are calling DBI->connect *after* the Apache children
are created? Major problems like this can happen if not. The use of
prepare_cached() may be adding to the problem as well, especially if
you are using temp tables. In DBD::Pg, prepared statements are not
actually prepared (in most cases) until just before the first execute,
to account for late bindings and to be more efficient. Some related
DBD::Pg attribs to look at are pg_server_prepare and pg_prepare_now.


- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200803121121
http://biglumber.com/x/web?pk=2529DF...9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkfX9Q0ACgkQvJuQZxSWSsjEjACg6QNUdPIw5g czfTtFK3aUMh39
fUYAoLwRrFZ75z2Fbq7GDRYqgTlRsR9N
=ngbh
-----END PGP SIGNATURE-----



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-10-2008, 11:15 AM
Alex Hunsaker
 
Posts: n/a
Default Re: 8.3.0 backend segfaults

Hrm still no luck.

I created a snapshot of the database, moved it onto another server so
i could play with it...

Ive tried using just prepare on the console using the query that fails:
prepare worker (bigint, bigint) as select w.worker_id, w.worker_id as
printerid, w.worker, w.alias, coalesce(w.alias, w.worker) as name,
w.active, w.last_active, w.last_deactive, round(extract(epoch from
now()) - extract(epoch from w.last_deactive)) as time_off from workers
as w left join worker_vis as wv on wv.worker_id = w.worker_id and
wv.defunct = 0 and ( ((wv.auth_id = $1) and (wv.auth_class =
data_class('user_id'))) or ((wv.auth_id = $2) and (wv.auth_class =
data_class('clinic_id')))) where wv.worker_vis_id is not null and
w.defunct = 0 order by coalesce(w.alias, w.worker);

update workers set last_active = now();
vacuum analyze workers;

update worker_vis set worker_id = worker_id;
vacuum analyze worker_vis;

update data_classes set defunct = 0 where defunct = 0;
vacuum analyze data_classes;

execute wrk;

That works as expected. I also tried each of those updates/vacuums separately.

So now I'm trying the the "bad" query in the simple perl script i
posted before, Ive tried just one instance, and multiple instances...
I guess next ill try running all the sql each web session generates
before it crashes... unless anyone has any other bright ideas for me
to try. Perhaps my simple updates are not enough for analyze to
invalidate the query plan? Should I be doing inserts/deletes or just
more updates?

Below are the table counts and the definition of data_classes. That
should be everything the query uses, except for the actually data.
Which I'm more than willing to provide (privately) if anyone thinks
they have a great idea on how to reproduce it.

SELECT count(1) from workers;
count
-------
716
SELECT count(1) from worker_vis;
count
-------
577

SELECT count(1) from data_classes;
count
-------
75

\d data_classes
Table "public.data_classes"
Column | Type |
Modifiers
--------------+--------------------------+-------------------------------------------------------------------
data_class | integer | not null default
nextval('data_classes_data_class_seq'::regclass)
data_id | character varying(80) |
data_table | text |
date_created | timestamp with time zone | default now()
defunct | smallint | default 0
description | character varying(80) |
Indexes:
"data_classes_pkey" PRIMARY KEY, btree (data_class)
"data_class_data_id_idx" UNIQUE, btree (data_id)
"data_class_data_idx" btree (data_id) WHERE defunct = 0

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:00 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com