vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Apologies if my previous attempts to post this to the mailing list have actually succeeded, but I've seen no evidence of that... While doing some bugzilla testing, I ran into a data page corruption issue. The symptom was the usual "could not access status of transaction <bignum>". I tracked it down via the usual pg_filedump method and found what appears to my untrained eye to be data tuples from a different table residing in the (now corrupt) table in question. First, the relevant environmental details: System: - ASUS A8V motherboard - Athlon 64 3200 - 2G ECC memory (which is enabled. The kernel will panic in the face of an uncorrectable ECC error). - 2 250G Seagate ST3250823AS rev 3.03 in a mirrored configuration via Linux software RAID. OS: Debian Linux x86_64, 'unstable' branch. 'uname -a' yields: Linux kbrown-desktop 2.6.15-1-amd64-generic #2 Wed Jan 4 06:22:42 CET 2006 x86_64 GNU/Linux Filesystem: XFS with the default mount options. PostgreSQL: - version: 8.1.3. - The package is postgresql-8.1 version 8.1.3-2. - Autovacuum is enabled. I'm attaching my postgresql.conf. The context: I'm performing testing in preparation for migrating from a different bug tracking system to bugzilla. Most of this testing revolves around the migration tool, which reads directly from the other bug tracking system's database and inserts into the bugzilla database. The bugzilla database is hosted on PostgreSQL and has a lot of additional foreign keys that bugzilla does not have by default. All of the constraints are DEFERRABLE and INITIALLY DEFERRED. A test run consists of dropping all the constraints and all the tables, recreating all the tables and constraints, and importing a base set of data. The commands for doing all that were created via pg_dump. They are all executed in a single transaction and then committed. The migration run itself also executes in a single transaction and involves a savepoint prior to every insertion group (the name of the savepoint is based on the name of the operation, so the total number of outstanding savepoints at any one time is perhaps 4 or 5). Each migration run ends up executing perhaps 300,000 inserts. The meat of the problem: After the last run, a later select from one of the tables yielded the error. I narrowed it down to a single table (series_data) and tracked down the offending row via a binary search. Here's the end results: stagezilla=# select * from series_data; ERROR: could not access status of transaction 1685201011 DETAIL: could not open file "pg_clog/0647": No such file or directory stagezilla=# select * from series_data offset 9927 limit 1; series_id | series_date | series_value -----------+---------------------+-------------- 46 | 2006-01-06 00:00:00 | 0 (1 row) stagezilla=# select * from series_data offset 9928 limit 1; ERROR: could not access status of transaction 1685201011 DETAIL: could not open file "pg_clog/0647": No such file or directory stagezilla=# select ctid from series_data offset 9927 limit 1; ctid ---------- (72,136) (1 row) stagezilla=# select oid from pg_database where datname = 'stagezilla'; oid ------- 20518 (1 row) stagezilla=# select oid from pg_class where relname = 'series_data'; oid ------- 53294 (1 row) stagezilla=# \d series_data Table "public.series_data" Column | Type | Modifiers --------------+--------------------------------+----------- series_id | integer | not null series_date | timestamp(0) without time zone | not null series_value | integer | not null Indexes: "series_data_series_id_idx" UNIQUE, btree (series_id, series_date) Foreign-key constraints: "series_data_series_id_series_series_id_fk" FOREIGN KEY (series_id) REFERENCES series(series_id) ON UPDATE CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED The only file in pg_clog is 0000. After examining the output of pg_filedump, it became obvious this was a corrupt page issue. What bothers me is the way in which it's corrupt. The corrupt data looks supiciously like the data from different table, or perhaps from an index. In this case, from the short_desc field of the bugs table, which has a datatype of 'text' (I presume this means it's TOASTed?). The text data in the corrupt page exactly matches entries in that field of the bugs table. So: because the hardware is relatively new and is using components that are as reliable as I can make them given my somewhat limited budget (ECC memory, mirrored SATA disks), I have to consider the possibility that there's a bug here. Note that on the amd64 architecture, gcc (or some versions thereof) seems to have a code generation bug that can be eliminated by using the -fno-strict-aliasing switch. The version of gcc that seems to be in current use is 4.0.3. It's possible that this is causing problems, but to be honest this is the first time I've ever seen this happen. If you guys have any suggestions on what I might do to help track this one down, please let me know. I tarred up the entire 'stagezilla' database after shutting down PostgreSQL so that I can do additional examination of the datafiles if that proves useful. Additionally, if you'd like to see the output of pg_filedump, I'll be happy to post it here (or email it separately). I hesitate to attach it here because it's not clear the mailing list's spam filtering would let it pass. -- Kevin Brown kevin@sysexperts.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| Kevin Brown <kevin@sysexperts.com> writes: > After examining the output of pg_filedump, it became obvious this was > a corrupt page issue. What bothers me is the way in which it's > corrupt. The corrupt data looks supiciously like the data from > different table, or perhaps from an index. In this case, from the > short_desc field of the bugs table, which has a datatype of 'text' (I > presume this means it's TOASTed?). The text data in the corrupt page > exactly matches entries in that field of the bugs table. Substitution of a page from a different table seems unlikely to result in this error, as the tuple header layout is the same for all tables. A page from an index, though, maybe. Does the suspicious looking page have any pd_special space? We have in the past seen data-corruption errors that were clearly due to substitution of a chunk of entirely non-Postgres data for a table page. I suppose that those are symptoms of either kernel or device misfeasance .... it's hard to see how Postgres could cause that to happen. We've not seen enough of them to detect any pattern though. I rather wonder if what you've seen is the same thing but the substituted data happens to be from another Postgres file. > Additionally, if you'd like to see the output of pg_filedump, I'll be > happy to post it here (or email it separately). I hesitate to attach > it here because it's not clear the mailing list's spam filtering would > let it pass. You should at least show the page you think is corrupt. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Tom Lane wrote: > You should at least show the page you think is corrupt. I attempted to send this additional info to the list but I think the message got dropped on the floor by the mailing list software or by the spam filter. I'll put the files on a web server and post links to them here. -- Kevin Brown kevin@sysexperts.com ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| I wrote: > I attempted to send this additional info to the list but I think the > message got dropped on the floor by the mailing list software or by > the spam filter. > > I'll put the files on a web server and post links to them here. You can find them here: https://gazebo.sysexperts.com/~kevin/postgresql The files are bad-page-info.txt and bad-page.txt. -- Kevin Brown kevin@sysexperts.com ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Kevin Brown <kevin@sysexperts.com> writes: >> I'll put the files on a web server and post links to them here. > You can find them here: > https://gazebo.sysexperts.com/~kevin/postgresql AFAICT, the first half of page 73 is OK, but the second half clearly is trashed. In the raw-format dump it does look a whole lot like the second half has been replaced by the first half of a page of some index. It's fairly hard to see how that could happen inside Postgres. One can readily imagine bugs that might replace one whole page with another, but there aren't any operations that manipulate half-a-page. On the other hand, if your kernel uses 4K blocksize, this could be explained as substitution of one buffer for another at the kernel level. So my money is on a kernel bug. As I mentioned, we've once or twice before seen reports that looked like similar substitutions of single pages by the kernel. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Tom Lane wrote: > It's fairly hard to see how that could happen inside Postgres. One can > readily imagine bugs that might replace one whole page with another, > but there aren't any operations that manipulate half-a-page. On the > other hand, if your kernel uses 4K blocksize, this could be explained > as substitution of one buffer for another at the kernel level. So my > money is on a kernel bug. As I mentioned, we've once or twice before > seen reports that looked like similar substitutions of single pages by > the kernel. Yeah, I agree that's what's likely to have happened here. The kernel's page size is 4k, as is the default block size used by XFS. Tracking this one down any further is going to be nigh unto impossible, I think. There have been a pile of fixes to XFS in 2.6.16, so I'll start using that, at least... Thanks for looking at this. -- Kevin Brown kevin@sysexperts.com ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |