Hello,

I have a smallish postgres database that segfaults every time I try to access a certain row in a certain column.

xxx=# select file_id from dbfiles offset 632531 limit 1;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.

file_id is a varchar(40).

The database is a self-compiled 8.2.6 on an x86_64 linux 2.6 machine that was running well for about 8 months before it started exhibiting the problem a couple of days ago. I tried upgrading to 8.2.7 but the problem is still occurring.

I have recompiled the postgres server without -O2 and with -g and have captured a coredump. Here's a bt and the first section of a bt full. If you need more information, please don't hesitate to contact me.

Core was generated by `postgres: postgres xxx [local] SELECT '.
Program terminated with signal 11, Segmentation fault.
#0  0x000000000067c01b in pglz_decompress (source=0x2b3ab8060910, dest=0xa57744 "d") at pg_lzcompress.c:678
678             *bp = bp[-off];
(gdb) bt
#0  0x000000000067c01b in pglz_decompress (source=0x2b3ab8060910, dest=0xa57744 "d") at pg_lzcompress.c:678
#1  0x00000000004613b6 in heap_tuple_untoast_attr (attr=0x2b3ab8060910) at tuptoaster.c:128
#2  0x00000000006a3a19 in pg_detoast_datum (datum=0x2b3ab8060910) at fmgr.c:1973
#3  0x00000000004423d0 in printtup (slot=0xa3ff78, self=0xa3de00) at printtup.c:317
#4  0x000000000053d184 in ExecSelect (slot=0xa3ff78, dest=0xa3de00, estate=0xa3fe00) at execMain.c:1310
#5  0x000000000053cfea in ExecutePlan (estate=0xa3fe00, planstate=0xa40120, operation=CMD_SELECT, numberTuples=0, direction=ForwardScanDirection, dest=0xa3de00) at execMain.c:1236
#6  0x000000000053b9aa in ExecutorRun (queryDesc=0xa335d0, direction=ForwardScanDirection, count=0) at execMain.c:241
#7  0x00000000005f9f4c in PortalRunSelect (portal=0xa4b6c0, forward=1 '\001', count=0, dest=0xa3de00) at pquery.c:831
#8  0x00000000005f9bc3 in PortalRun (portal=0xa4b6c0, count=9223372036854775807, dest=0xa3de00, altdest=0xa3de00, completionTag=0x7fff01602ac0 "") at pquery.c:656
#9  0x00000000005f4737 in exec_simple_query (query_string=0xa05100 "select file_id from dbfiles offset 632531 limit 1;") at postgres.c:939
#10 0x00000000005f8376 in PostgresMain (argc=4, argv=0x9672c8, username=0x967290 "postgres") at postgres.c:3424
#11 0x00000000005c4318 in BackendRun (port=0x9634d0) at postmaster.c:2934
#12 0x00000000005c38c5 in BackendStartup (port=0x9634d0) at postmaster.c:2561
#13 0x00000000005c154a in ServerLoop () at postmaster.c:1214
#14 0x00000000005c0f70 in PostmasterMain (argc=3, argv=0x946230) at postmaster.c:966
#15 0x0000000000568d76 in main (argc=3, argv=0x946230) at main.c:188
(gdb) bt full
#0  0x000000000067c01b in pglz_decompress (source=0x2b3ab8060910, dest=0xa57744 "d") at pg_lzcompress.c:678
        dp = (const unsigned char *) 0x2b3ab80707ea "`"
        dend = (const unsigned char *) 0x2b3abff0c03d "6.21.163"
        bp = (unsigned char *) 0xa7b000 <Address 0xa7b000 out of bounds>
        ctrl = 3 '\003'
        ctrlc = 5
        len = 5
        off = 1633
        destsize = 0
        __func__ = "pglz_decompress"
--snip

Thanks in advance,
Karsten Desler

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs
| |||
Karsten Desler <kd@link11.de> writes:
> I have a smallish postgres database that segfaults every time I try to
> access a certain row in a certain column.

Looks like a corrupted-data issue to me. It might be interesting to dump the page with pg_filedump and see if there's any apparent pattern to the damage.

regards, tom lane
| |||
Karsten Desler <kd@link11.de> writes:
> I don't know much about the postgres architecture and I don't know if bounds
> checking on-disk values on a read makes a lot of sense since usually one
> should be able to assume that there are no randomly flipped bits; but it
> would've been nice to have a sensible log entry as to what really
> happened.

FWIW, there is code in CVS HEAD that detects simple cases of corrupt compressed data, though it's anyone's guess if it would've caught your example here.

> Anyway, for future reference: Assuming that this is the only corruption,
> can I just UPDATE (or DELETE and reINSERT) the offending entry (maybe with a
> following REINDEX/VACUUM?) or do I need to restore a backup?

If only the one row is clobbered, you should be able to just delete and re-insert it, assuming you can identify it in a way that doesn't crash in itself (ctid is probably about the safest). Not sure if an UPDATE would be safe.

My suspicion though is that you'll find that a large portion of that page is damaged; that's usually what we've seen in such cases in the past.

regards, tom lane
| |||
* Tom Lane wrote:
> Karsten Desler <kd@link11.de> writes:
> > I have a smallish postgres database that segfaults every time I try to
> > access a certain row in a certain column.
>
> Looks like a corrupted-data issue to me. It might be interesting to
> dump the page with pg_filedump and see if there's any apparent pattern
> to the damage.

Thanks, I'll try to play with pg_filedump later tonight.

I've never had problems with this (and many more) postgres servers regarding corruption of on-disk data structures and I'm perfectly fine with chalking it up to hardware problems.

I don't know much about the postgres architecture and I don't know if bounds checking on-disk values on a read makes a lot of sense since usually one should be able to assume that there are no randomly flipped bits; but it would've been nice to have a sensible log entry as to what really happened.

Anyway, for future reference: Assuming that this is the only corruption, can I just UPDATE (or DELETE and reINSERT) the offending entry (maybe with a following REINDEX/VACUUM?) or do I need to restore a backup? If possible, I'd prefer the UPDATE solution, of course, since it can be done without any downtime.

Keep up the good work.

Best regards,
Karsten Desler
| |||
Karsten Desler wrote:
>
> I don't know much about the postgres architecture and I don't know if bounds
> checking on-disk values on a read makes a lot of sense since usually one
> should be able to assume that there are no randomly flipped bits; but it
> would've been nice to have a sensible log entry as to what really
> happened.

I have attached a patch backported from HEAD to 8.2. You can try it. It has a small performance penalty, but it does not crash on corrupted data.

Zdenek
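For readers following the thread: the crashing statement in the backtrace, `*bp = bp[-off]`, copies a byte from `off` positions behind the output pointer, so a corrupted offset or length can send the decoder outside its buffer. Below is a minimal sketch of the kind of bounds checking discussed here. It assumes a made-up toy token format; it is not the pglz on-disk format and not the backported patch, and the name `toy_decompress` is hypothetical.

```c
#include <stddef.h>

/*
 * Toy LZ-style decoder, for illustration only (NOT the pglz format).
 *   Token 0x00: the next byte is a literal.
 *   Token 0x01: the next two bytes are (off, len); copy len bytes
 *               starting off bytes behind the current output position.
 * Returns the number of bytes written, or -1 if the input is corrupt.
 */
static long toy_decompress(const unsigned char *src, size_t srclen,
                           unsigned char *dst, size_t dstcap)
{
    const unsigned char *sp = src;
    const unsigned char *send = src + srclen;
    unsigned char *bp = dst;
    unsigned char *dend = dst + dstcap;

    while (sp < send) {
        unsigned char tag = *sp++;

        if (tag == 0x00) {
            /* Literal: need one input byte and one free output byte. */
            if (sp >= send || bp >= dend)
                return -1;
            *bp++ = *sp++;
        } else if (tag == 0x01) {
            size_t off, len;

            if (send - sp < 2)
                return -1;              /* truncated back-reference */
            off = sp[0];
            len = sp[1];
            sp += 2;

            /* The offset must point at data already written. */
            if (off == 0 || off > (size_t) (bp - dst))
                return -1;
            /* The copy must fit in the remaining output space. */
            if (len > (size_t) (dend - bp))
                return -1;

            /* Byte-wise copy, so overlapping runs behave as in LZ. */
            while (len-- > 0) {
                *bp = bp[-(ptrdiff_t) off];
                bp++;
            }
        } else {
            return -1;                  /* unknown token */
        }
    }
    return (long) (bp - dst);
}
```

With the input bytes `{0x00,'a', 0x00,'b', 0x01,2,4}` and a 16-byte output buffer, the function writes "ababab" and returns 6. Changing the offset byte from 2 to 200 makes it return -1 instead of reading before the start of the buffer. The extra comparisons on every back-reference are the sort of cost that a "small performance penalty" would refer to.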
| |||
Tom Lane wrote:
> My suspicion though is that you'll find that a large portion of that
> page is damaged; that's usually what we've seen in such cases in the
> past.

I think this can happen only if the corrupted region is smaller than the TOAST chunk size. Otherwise, the page header or the tuple header plus chunk id would be corrupted as well, and that should be reported in another place.

Zdenek
| ||||
* Zdenek Kotala wrote:
> Karsten Desler wrote:
> >
> >I don't know much about the postgres architecture and I don't know if
> >bounds
> >checking on-disk values on a read makes a lot of sense since usually one
> >should be able to assume that there are no randomly flipped bits; but it
> >would've been nice to have a sensible log entry as to what really
> >happened.
>
> I attached backported patch from head to 8.2. You can try it. It has small
> performance penalty, but it does not crash on corrupted data.

Thank you very much!

I have restored a backup of the corrupt postgres data files on a second server and I can confirm that postmaster no longer crashes with the patch applied.

Thanks,
Karsten