This is a discussion on Index corruption within the pgsql Hackers forums, part of the PostgreSQL category; --> Jan Wieck <JanWieck@Yahoo.com> writes: > You're right ... forgot about that one. > However, transactions from different origins are ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Jan Wieck <JanWieck@Yahoo.com> writes: > You're right ... forgot about that one. > However, transactions from different origins are NEVER selected together > and it wouldn't make sense to compare their xid's anyway. So the index > might return index tuples for rows from another origin, but the > following qualifications against the log_origin in the heap tuple will > filter them out. No, that's not the point here. The point here is that if the xids that are simultaneously present in the index span more than a 2G-XID range, btree *will fail* because it will be dealing with keys that do not obey the transitive law. You do have a problem --- it doesn't explain Marc's troubles, but sl_log_1_idx2 is broken for multi master situations. All you need is masters with sufficiently different XID counters. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Ühel kenal päeval, R, 2006-06-30 kell 12:05, kirjutas Jan Wieck: > On 6/30/2006 11:55 AM, Tom Lane wrote: > > > Jan Wieck <JanWieck@Yahoo.com> writes: > >> On 6/30/2006 11:17 AM, Marko Kreen wrote: > >>> If the xxid-s come from different DB-s, then there can still be problems. > > > >> How so? They are allways part of a multi-key index having the > >> originating node ID first. > > > > Really? > > > > create table @NAMESPACE@.sl_log_1 ( > > log_origin int4, > > log_xid @NAMESPACE@.xxid, > > log_tableid int4, > > log_actionseq int8, > > log_cmdtype char, > > log_cmddata text > > ); > > create index sl_log_1_idx1 on @NAMESPACE@.sl_log_1 > > (log_origin, log_xid @NAMESPACE@.xxid_ops, log_actionseq); > > > > create index sl_log_1_idx2 on @NAMESPACE@.sl_log_1 > > (log_xid @NAMESPACE@.xxid_ops); > > You're right ... forgot about that one. And yes, there can be > transactions originating from multiple origins (masters) in the same > log. The thing is, the index is only there because in a single origin > situation (most installations are), the log_origin is allways the same. > The optimizer therefore sometimes didn't think using an index at all > would be good. > > However, transactions from different origins are NEVER selected together > and it wouldn't make sense to compare their xid's anyway. So the index > might return index tuples for rows from another origin, but the > following qualifications against the log_origin in the heap tuple will > filter them out. The problem was not only with returning too many rows from tuples, but as much returning too few. In case when you return too few rows some actions will just be left out from replication and thus will be missing from slaves. -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Thu, 2006-06-29 at 21:47 -0400, Tom Lane wrote: > One easy thing that would be worth trying is to build with > --enable-cassert and see if any Asserts get provoked during the > A couple other things to try, given that you can provoke the failure > fairly easily: > . . . > 1. In studying the code, it bothers me a bit that P_NEW is the same as > InvalidBlockNumber. The intended uses of P_NEW appear to be adequately > . . . > 2. I'm also eyeing this bit of code in hio.c: > > If someone else has just extended the relation, it's possible that this > will allow a process to get to the page before the intended extender has > . . . We tried all of these suggestions and still get the problem. Nothing interesting in the log file so I guess the Asserts did not fire. We are going to try experimenting with different kernels now. Unless anyone has any other suggestions. __ Marc -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (GNU/Linux) iD8DBQBEpZ+FUBr6u+c2wkERAtskAJ9M7Ywvh94ZzPfclfepcg CvHaUuQgCeLWAI uZ4llaF/FnPp659/B3Ftvjk= =Wgtw -----END PGP SIGNATURE----- |
| |||
| Marc Munro <marc@bloodnok.com> writes: > We tried all of these suggestions and still get the problem. Nothing > interesting in the log file so I guess the Asserts did not fire. Not surprising, it was a long shot that any of those things were really broken. But worth testing. > We are going to try experimenting with different kernels now. Unless > anyone has any other suggestions. Right at the moment I have no better ideas :-( regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| For the record, here are the results of our (ongoing) inevstigation into the index/heap corruption problems I reported a couple of weeks ago. We were able to trigger the problem with kernels 2.6.16, 2.6.17 and 2.6.18.rc1, with 2.6.16 seeming to be the most flaky. By replacing the NFS-mounted netapp with a fibre-channel SAN, we have eliminated the problem on all kernels. From this, it would seem to be an NFS bug introduced post 2.6.14, though we cannot rule out a postgres bug exposed by unusual timing issues. Our starting systems are: Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875 Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux (and others) kernel boot option: elevator=deadline 16 Gigs of RAM postgresql-8.0.8-1PGDG Bonded e1000/tg3 NICs with 8192 MTU. Slony 1.1.5 NetApp FAS270 OnTap 7.0.3 Mounted with the NFS options rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=60 0,tcp,noac Jumbo frames 8192 MTU. All postgres data and logs are stored on the netapp. All tests results were reproduced with postgres 8.0.8 __ Marc On Fri, 2006-06-30 at 23:20 -0400, Tom Lane wrote: > Marc Munro <marc@bloodnok.com> writes: > > We tried all of these suggestions and still get the problem. Nothing > > interesting in the log file so I guess the Asserts did not fire. > > Not surprising, it was a long shot that any of those things were really > broken. But worth testing. > > > We are going to try experimenting with different kernels now. Unless > > anyone has any other suggestions. > > Right at the moment I have no better ideas :-( > > regards, tom lane > -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.3 (GNU/Linux) iD8DBQBEvoe9UBr6u+c2wkERAuFkAJsFaiUXMhWrdkDIwW4Q17 GpkT5DswCdFwMU CCQ92Y13MmhQ274isuT7iws= =xQ0D -----END PGP SIGNATURE----- |
| ||||
| Marc Munro wrote: >For the record, here are the results of our (ongoing) inevstigation into >the index/heap corruption problems I reported a couple of weeks ago. > >We were able to trigger the problem with kernels 2.6.16, 2.6.17 and >2.6.18.rc1, with 2.6.16 seeming to be the most flaky. > >By replacing the NFS-mounted netapp with a fibre-channel SAN, we have >eliminated the problem on all kernels. > >From this, it would seem to be an NFS bug introduced post 2.6.14, though >we cannot rule out a postgres bug exposed by unusual timing issues. > >Our starting systems are: > >Sun v40z 4 x Dual Core AMD Opteron(tm) Processor 875 >Kernel 2.6.16.14 #8 SMP x86_64 x86_64 x86_64 GNU/Linux (and others) >kernel boot option: elevator=deadline >16 Gigs of RAM >postgresql-8.0.8-1PGDG >Bonded e1000/tg3 NICs with 8192 MTU. >Slony 1.1.5 > >NetApp FAS270 OnTap 7.0.3 >Mounted with the NFS options >rw,nfsvers=3,hard,rsize=32768,wsize=32768,timeo=6 00,tcp,noac >Jumbo frames 8192 MTU. > >All postgres data and logs are stored on the netapp. > >All tests results were reproduced with postgres 8.0.8 > >__ >Marc > >On Fri, 2006-06-30 at 23:20 -0400, Tom Lane wrote: > > >>Marc Munro <marc@bloodnok.com> writes: >> >> >>>We tried all of these suggestions and still get the problem. Nothing >>>interesting in the log file so I guess the Asserts did not fire. >>> >>> >>Not surprising, it was a long shot that any of those things were really >>broken. But worth testing. >> >> >> >>>We are going to try experimenting with different kernels now. Unless >>>anyone has any other suggestions. >>> >>> >>Right at the moment I have no better ideas :-( >> >> regards, tom lane >> >> > > > On a good stock day, some levity is justified. How are hackers like politicians? -- No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.1.394 / Virus Database: 268.10.1/390 - Release Date: 7/17/2006 ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |