Unix Technical Forum

Re: [PATCHES] [WIP] shared locks

04-11-2008, 03:36 AM
Tom Lane
 

Found another interesting thing while testing this. I got a core dump
from the Assert in GetMultiXactIdMembers, complaining that it was being
asked about a MultiXactId >= nextMXact. Sure enough, there was a
multixact on disk, left over from a previous core-dumped test, that was
larger than the nextMXact the current postmaster had started with.

My interpretation of this is that the MultiXact code is violating the
fundamental WAL rule, namely it is allowing data (multixact IDs in data
pages) to reach disk before the relevant WAL record (here the NEXTMULTI
record that should have advanced nextMXact) got to disk. It is very
easy for this to happen in the current system if the buffer page LSNs
aren't updated properly, because the bgwriter will be industriously
dumping dirty pages in the background.
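
The failure sequence described above can be reproduced in a toy model. This is only a sketch, not PostgreSQL code: ToyServer, PREFETCH, and the idea that one NEXTMULTI-style record covers a batch of IDs are assumptions made for illustration. It shows a data page reaching disk while the record that advanced the counter is still in memory, so the restarted server sees an ID at or beyond its counter.

```python
# Toy model, not PostgreSQL code: every class and field here is hypothetical.
PREFETCH = 8  # assumed number of IDs covered by one NEXTMULTI-style record


class ToyServer:
    def __init__(self, durable_wal, disk_pages):
        self.durable_wal = durable_wal      # WAL records that survived on disk
        self.disk_pages = disk_pages        # data pages that survived on disk
        self.wal_buffer = []                # WAL records still only in memory
        # Recovery: the counter restarts from the last durable record.
        self.next_mxact = max(durable_wal, default=1)
        self.logged_upto = self.next_mxact  # IDs below this are covered by WAL

    def new_multixact(self):
        if self.next_mxact >= self.logged_upto:
            # Log a new upper bound, but leave it in the in-memory buffer.
            self.logged_upto = self.next_mxact + PREFETCH
            self.wal_buffer.append(self.logged_upto)
        mxid = self.next_mxact
        self.next_mxact += 1
        return mxid

    def bgwriter_flush(self, page_no, mxid):
        # The page carries no LSN pointing at the logged record, so nothing
        # forces the WAL buffer to disk before the page is written.
        self.disk_pages[page_no] = mxid


def crash_and_restart(server):
    # A crash discards wal_buffer; only durable state reaches the new server.
    return ToyServer(list(server.durable_wal), dict(server.disk_pages))


first = ToyServer(durable_wal=[], disk_pages={})
mxid = first.new_multixact()
first.bgwriter_flush(page_no=0, mxid=mxid)
second = crash_and_restart(first)

# IDs found on disk that the restarted server believes it never handed out.
stale = [m for m in second.disk_pages.values() if m >= second.next_mxact]
print(stale)  # prints [1]
```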

AFAICS there isn't any very convenient way of propagating the true
location of the NEXTMULTI record into the page LSNs of the buffers that
heap_lock_tuple might stick relevant multi IDs into. What's probably
the easiest solution is for XLogPutNextMultiXactId to XLogFlush the
NEXTMULTI record before it returns. This is a mite annoying for
concurrency (because we'll have to hold MultiXactGenLock while flushing
xlog) but it should occur rarely enough to not be a huge deal.
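
A rough sketch of the flush-before-return idea, again as a toy model rather than the real code: ToyWal, ToyMultiXactGen, and PREFETCH are hypothetical, and a threading.Lock merely plays the role of MultiXactGenLock. The flush happens while the lock is held, which is the concurrency cost mentioned above, but only once per batch of IDs.

```python
# Toy model, not PostgreSQL code: names are hypothetical stand-ins.
import threading

PREFETCH = 8  # assumed number of IDs covered by one logged record


class ToyWal:
    def __init__(self):
        self.buffer = []    # records not yet on disk
        self.durable = []   # records that would survive a crash

    def insert(self, record):
        self.buffer.append(record)

    def flush(self):
        self.durable.extend(self.buffer)
        self.buffer.clear()


class ToyMultiXactGen:
    def __init__(self, wal, start=1):
        self.wal = wal
        self.lock = threading.Lock()   # plays the role of MultiXactGenLock
        self.next_mxact = start
        self.logged_upto = start

    def new_multixact(self):
        with self.lock:
            if self.next_mxact >= self.logged_upto:
                self.logged_upto = self.next_mxact + PREFETCH
                self.wal.insert(("NEXTMULTI", self.logged_upto))
                # Proposed fix: flush before returning, still holding the
                # lock, so no caller sees an ID that is not already covered
                # by a durable record.
                self.wal.flush()
            mxid = self.next_mxact
            self.next_mxact += 1
            return mxid


wal = ToyWal()
gen = ToyMultiXactGen(wal)
ids = [gen.new_multixact() for _ in range(PREFETCH + 1)]
recovered_next = max(upto for _, upto in wal.durable)
print(len(wal.durable), recovered_next, max(ids))  # prints 2 17 9
```

In this model nine IDs cost two flushes, and the counter a restart would recover (17) is always above every ID handed out.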

At this point you're probably wondering why OID generation hasn't got
exactly the same problem, seeing that you borrowed all this logic from
the OID generator. The answer is that it would have the same problem,
except that an OID can only get onto disk as part of a tuple insert or
update, and all such events generate xlog records that must follow any
relevant NEXTOID record. Those records *will* get into the page LSNs,
and so the WAL rule is enforced.
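
Why the page LSN protects the OID case can be sketched as follows. This is a toy model with hypothetical names (ToyWal, ToyPage, insert_tuple, write_page), assuming only that the log is sequential and that a page write first flushes the log up to the page's LSN.

```python
# Toy model, not PostgreSQL code: names are hypothetical stand-ins.
class ToyWal:
    def __init__(self):
        self.records = []       # all records; the record at index i has LSN i + 1
        self.flushed_lsn = 0    # records with LSN <= this are on disk

    def insert(self, record):
        self.records.append(record)
        return len(self.records)    # LSN of the record just inserted

    def flush_upto(self, lsn):
        self.flushed_lsn = max(self.flushed_lsn, lsn)

    def durable(self):
        return self.records[:self.flushed_lsn]


class ToyPage:
    def __init__(self):
        self.lsn = 0
        self.oids = []


def insert_tuple(wal, page, oid):
    # The insert logs its own record and stamps that record's LSN on the page.
    page.oids.append(oid)
    page.lsn = wal.insert(("INSERT", oid))


def write_page(wal, page, disk):
    # WAL rule: flush the log up to the page's LSN before writing the page.
    wal.flush_upto(page.lsn)
    disk.append(list(page.oids))


wal, page, disk = ToyWal(), ToyPage(), []
wal.insert(("NEXTOID", 100))       # LSN 1, still unflushed
insert_tuple(wal, page, oid=42)    # LSN 2, stamped on the page
write_page(wal, page, disk)
print(wal.durable())  # prints [('NEXTOID', 100), ('INSERT', 42)]
```

Because the insert record follows the NEXTOID record in the log, flushing up to the insert's LSN carries the NEXTOID record to disk with it.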

So the problem would go away if heap_lock_tuple were generating any xlog
record of its own, which it might be doing by the time the 2PC dust
settles.

Plan B would be to decide that a multi ID that's >= nextMXact isn't
worthy of an Assert failure, but ought to be treated as just a dead
multixact. I'm kind of inclined to do that anyway, because I am not
convinced that this code guarantees no wraparound of multi IDs.
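
Plan B might look roughly like this. The function name, its parameters, and the dictionary used as storage are hypothetical; the plain >= comparison also ignores wraparound, which this sketch does not attempt to handle.

```python
# Toy model, not PostgreSQL code: names and storage layout are hypothetical.
def get_multixact_members(mxid, next_mxact, members_by_id):
    """Return the member transaction IDs of mxid, or [] if it is not known."""
    if mxid >= next_mxact:
        # Plan B: an ID at or beyond the counter is treated as a dead
        # multixact with no members, rather than failing an assertion.
        return []
    return list(members_by_id.get(mxid, []))


members = {3: [101, 102]}
known = get_multixact_members(3, next_mxact=5, members_by_id=members)
future = get_multixact_members(9, next_mxact=5, members_by_id=members)
print(known)   # prints [101, 102]
print(future)  # prints []
```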

Thoughts?

regards, tom lane


All times are GMT.