On Wed, Apr 9, 2008 at 4:47 PM, Mikko Partio <mpartio@gmail.com> wrote:

> Hello all
>
> my struggle with the database continues (see earlier thread titled "too
> many trigger records found for relation xyz").
>
> Today, I created yet another to table to the same database. Everything
> went ok, no errors or anything, but when I checked pg_tables -view I saw two
> tables with the same name. Instantly I queried pg_class and yes there was
> again two tables with same oid. I dropped the table before anything more
> serious could happen, but then postgres started to complain of "cache lookup
> failed for relation ...". I disconnected my psql session and tried to
> reconnect but failed to do so:
>
> 2008-04-09 16:39:25 EEST [18984]: [1-1] FATAL: could not open relation
> 1663/16386/544592: No such file or directory
>
> Indeed, there is no such file in that directory. I'm guessing that file is
> connected to the table I just dropped. Now, is there anything to do to get
> the database back online? I can still connect to other databases in the same
> instance

The cure was to create an 8K file 1663/16386/54459 with dd. The file in question was in fact the oid index on pg_class -- I had issued a REINDEX on pg_class just a moment before, and apparently something went wrong and the system lost track of the index. There were also two entries in pg_index for the index pg_class_oid_index. After I removed the other entry and reindexed pg_class and pg_index, everything seems to be working ok.

All the symptoms indicate that perhaps an xid wraparound had happened, but there is no such warning in the logs, and age(datfrozenxid) never went higher than about 250,000,000.

Does anybody have a clue what might have happened?

Regards

Mikko
| |||
On Tue, Apr 15, 2008 at 9:36 AM, Mikko Partio <mpartio@gmail.com> wrote:

> On Wed, Apr 9, 2008 at 4:47 PM, Mikko Partio <mpartio@gmail.com> wrote:
>
> > Hello all
> >
> > my struggle with the database continues (see earlier thread titled "too
> > many trigger records found for relation xyz").
> >
> > Today, I created yet another to table to the same database. Everything
> > went ok, no errors or anything, but when I checked pg_tables -view I saw two
> > tables with the same name. Instantly I queried pg_class and yes there was
> > again two tables with same oid. I dropped the table before anything more
> > serious could happen, but then postgres started to complain of "cache lookup
> > failed for relation ...". I disconnected my psql session and tried to
> > reconnect but failed to do so:
> >
> > 2008-04-09 16:39:25 EEST [18984]: [1-1] FATAL: could not open relation
> > 1663/16386/544592: No such file or directory
> >
> > Indeed, there is no such file in that directory. I'm guessing that file
> > is connected to the table I just dropped. Now, is there anything to do to
> > get the database back online? I can still connect to other databases in the
> > same instance
>
> The cure was to create file 1663/16386/54459 8K in size with dd. The file
> in question was in fact the oid index on pg_class -- I had issued a REINDEX
> on pg_class just a moment before and apparantly something went wrong and the
> system lost track of the index. There was also two entries in pg_index for
> index pg_class_oid_index. After I removed the other entry and reindexed
> pg_class and pg_index, everything seems to be working ok. All the symptoms
> indicate that perhaps a xid wraparound had happened, but there is no such
> warning in logs and age(datfrozenxid) went never higher than say
> 250,000,000. Does anybody have a clue what might have happened?

And now it has happened again.

A CLUSTER operation completed successfully on a table, but afterwards, when trying to access the table, I get the error

2008-04-17 13:05:30 EEST [8435]: [32-1] ERROR: could not open relation 1663/16386/359232: No such file or directory

It seems to me that VACUUM FULL, REINDEX and CLUSTER change the filename of a table and/or index and then fail to record the new name in the system catalogues. Is this a known deficiency, and what can I do to stop this behaviour?

Regards

Mikko
| |||
On Thu, Apr 17, 2008 at 3:38 PM, Mikko Partio <mpartio@gmail.com> wrote:

> 2008-04-17 13:05:30 EEST [8435]: [32-1] ERROR: could not open relation
> 1663/16386/359232: No such file or directory

Looks like a corrupt index to me. Did you try REINDEX on the table?

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
| |||
On Thu, Apr 17, 2008 at 1:36 PM, Pavan Deolasee <pavan.deolasee@gmail.com> wrote:

> On Thu, Apr 17, 2008 at 3:38 PM, Mikko Partio <mpartio@gmail.com> wrote:
>
> > 2008-04-17 13:05:30 EEST [8435]: [32-1] ERROR: could not open relation
> > 1663/16386/359232: No such file or directory
>
> Looks like a corrupt index to me. DId you try REINDEX on the table ?

Hi Pavan and thanks for your reply.

I tried to reindex the individual indexes in the table:

# reindex index xxx_idx;
ERROR: could not open relation 1663/16386/359232: No such file or directory

Since I thought the trouble may lie in the system catalogue indexes, I issued a REINDEX SYSTEM db, which went through with no errors. After that I tried to remove indexes from the table in question:

# drop index xxx_idx;
ERROR: could not read block 0 of relation 1663/16386/2673: read only 0 of 8192 bytes

Hmm.. this is a different oid

# select 2673::regclass;
         regclass
--------------------------
 pg_depend_depender_index
(1 row)

But I just reindexed it!

# reindex table pg_depend;
WARNING: could not remove relation 1663/16386/2673: No such file or directory
REINDEX

When I fire pg_dump to take a last minute backup I see this error:

pg_dump: Error message from server: ERROR: could not open relation 1663/16386/544529: No such file or directory
pg_dump: The command was: SELECT tgname, tgfoid: tgfname, tgtype, tgnargs, tgargs, tgenabled, tgisconstraint, tgconstrname, tgdeferrable, tgconstrrelid, tginitdeferred, tableoid, oid, tgconstrrelid: pg_catalog.pg_trigger t where tgrelid = '294134': tgconstraint = 0

# reindex table pg_catalog.pg_trigger;
WARNING: could not remove relation 1663/16386/544529: No such file or directory
REINDEX

Seems like the whole db is falling apart.

Regards

Mikko
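A note on reading the error messages in this thread: the three numbers in a path such as 1663/16386/2673 are the tablespace OID, the database OID and the relation's relfilenode (the name of the file on disk). The relfilenode can differ from the relation's OID, for example after the relation has been rewritten by CLUSTER, so a cast like 2673::regclass resolves an OID, not necessarily the file name. The following is only a minimal sketch in Python, not something used in the thread: the helper names parse_relation_path and lookup_query are hypothetical, and the generated query assumes the relation is still recorded in pg_class of the affected database.

```python
from typing import NamedTuple


class RelPath(NamedTuple):
    """The three components of a path as printed in the server errors."""
    tablespace_oid: int
    database_oid: int
    relfilenode: int


def parse_relation_path(path: str) -> RelPath:
    """Split 'tablespace/database/relfilenode' into its numeric parts.

    Hypothetical helper for illustration only.
    """
    parts = path.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three components, got {path!r}")
    return RelPath(*(int(p) for p in parts))


def lookup_query(path: str) -> str:
    """Build a catalogue query that maps the on-disk file name to a relation.

    Assumption: run it while connected to the database whose OID is the
    second component of the path.
    """
    rel = parse_relation_path(path)
    return (
        "SELECT oid::regclass AS relation, relkind "
        f"FROM pg_class WHERE relfilenode = {rel.relfilenode};"
    )


if __name__ == "__main__":
    # Path taken from the DROP INDEX error quoted above.
    print(parse_relation_path("1663/16386/2673"))
    print(lookup_query("1663/16386/2673"))
```

If the query returns no row for a file name that the server is still trying to open, that would be consistent with the catalogues and the files on disk having diverged, which is the symptom described in this thread.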
| |||
"Mikko Partio" <mpartio@gmail.com> writes:

> Seems like the whole db is falling apart.

I think you've got really serious filesystem-level problems. Have you tried running any hardware diagnostics? Are you sure you're using a stable kernel version?

regards, tom lane
| |||
On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mikko Partio" <mpartio@gmail.com> writes:
> > Seems like the whole db is falling apart.
>
> I think you've got really serious filesystem-level problems. Have you
> tried running any hardware diagnostics? Are you sure you're using a
> stable kernel version?

I ran fsck on the filesystem (gfs) -- no problems found. The disks are from a SAN, and the diagnostic programs say there's nothing wrong. I also have other db clusters running on different filesystems (also gfs), and I have never had any problems with them.

Regards

Mikko
| |||
"Mikko Partio" <mpartio@gmail.com> writes:

> On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think you've got really serious filesystem-level problems. Have you
>> tried running any hardware diagnostics? Are you sure you're using a
>> stable kernel version?

> I run fsck on the filesystem (gfs) -- no problems found. The disks are from
> a san and the diagnostic programs say there's nothing wrong. I also have
> other db clusters running on different filesystems (also gfs) and I have
> never had any problems with them.

Some RAM checks wouldn't be out of place either.

regards, tom lane
| |||
On Thu, Apr 17, 2008 at 6:59 PM, Mikko Partio <mpartio@gmail.com> wrote:

> On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > "Mikko Partio" <mpartio@gmail.com> writes:
> > > Seems like the whole db is falling apart.
> >
> > I think you've got really serious filesystem-level problems. Have you
> > tried running any hardware diagnostics? Are you sure you're using a
> > stable kernel version?
>
> I run fsck on the filesystem (gfs) -- no problems found. The disks are
> from a san and the diagnostic programs say there's nothing wrong. I also
> have other db clusters running on different filesystems (also gfs) and I
> have never had any problems with them.

Oh yeah, and the kernel version is 2.6.18-53.1.14.el5.

Regards

Mikko
| |||
On Thu, Apr 17, 2008 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Mikko Partio" <mpartio@gmail.com> writes:
> > On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I think you've got really serious filesystem-level problems. Have you
> >> tried running any hardware diagnostics? Are you sure you're using a
> >> stable kernel version?
>
> > I run fsck on the filesystem (gfs) -- no problems found. The disks are from
> > a san and the diagnostic programs say there's nothing wrong. I also have
> > other db clusters running on different filesystems (also gfs) and I have
> > never had any problems with them.
>
> Some RAM checks wouldn't be out of place either.

Hmm, didn't think of that, will do that asap (tomorrow). Thanks for your help.

Regards

Mikko
| ||||
On Thu, Apr 17, 2008 at 7:08 PM, Mikko Partio <mpartio@gmail.com> wrote:

> On Thu, Apr 17, 2008 at 7:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > "Mikko Partio" <mpartio@gmail.com> writes:
> > > On Thu, Apr 17, 2008 at 6:10 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > >> I think you've got really serious filesystem-level problems. Have you
> > >> tried running any hardware diagnostics? Are you sure you're using a
> > >> stable kernel version?
> >
> > > I run fsck on the filesystem (gfs) -- no problems found. The disks are from
> > > a san and the diagnostic programs say there's nothing wrong. I also have
> > > other db clusters running on different filesystems (also gfs) and I have
> > > never had any problems with them.
> >
> > Some RAM checks wouldn't be out of place either.

Memtest86+ has now been running for 20+ hours and no errors have been found. I was also unable to reproduce this problem, but it only happened after a few days of constant activity anyway, so I guess it's not so easy to replicate.

Any other pointers on where to look? Your help is much appreciated.

Regards

Mikko