Unix Technical Forum

Possible race in UnlockBuffers() and UnpinBuffer()

This is a discussion on Possible race in UnlockBuffers() and UnpinBuffer() within the pgsql Hackers forums, part of the PostgreSQL category; --> We have a wait-pin-to-1 mechanism in LockBufferForCleanup() like this: 1: bufHdr->wait_backend_pid = MyProcPid; 2: bufHdr->flags |= BM_PIN_COUNT_WAITER; 3: PinCountWaitBuf ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 01:59 AM
Qingqing Zhou
 
Posts: n/a
Default Possible race in UnlockBuffers() and UnpinBuffer()

We have a wait-pin-to-1 mechanism in LockBufferForCleanup() like this:

1: bufHdr->wait_backend_pid = MyProcPid;
2: bufHdr->flags |= BM_PIN_COUNT_WAITER;
3: PinCountWaitBuf = bufHdr;
4: UnlockBufHdr_NoHoldoff(bufHdr);
5: LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
6: /* Wait to be signaled by UnpinBuffer() */
7: ProcWaitForSignal();

Say if the waiter encounters an error on line 5 or gets a cancel signal on
line 6, it will abort current transaction and call UnlockBuffers() to cancel
the wait. However, a possible execution sequence involving another process
doing UnpinBuffer() may look like this:

unpinner: lockHdr(); read and reset flag; unlockHdr();
waiter: lockHdr(); reset flag; unlockHdr(); ProcCancelWaitForSignal();
unpinner: ProcSendSignal();

After this, the proc->sem will be bumped to 1 unexpectedly ... Since this
problem is rare, a possible fix is to put a critical section around line 1
to 7 and remove UnlockBuffers() accordingly.

Regards,
Qingqing








Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 02:00 AM
Tom Lane
 
Posts: n/a
Default Re: Possible race in UnlockBuffers() and UnpinBuffer()

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> ... However, a possible execution sequence involving another process
> doing UnpinBuffer() may look like this:


> unpinner: lockHdr(); read and reset flag; unlockHdr();
> waiter: lockHdr(); reset flag; unlockHdr(); ProcCancelWaitForSignal();
> unpinner: ProcSendSignal();


Hmm ... I remember having convinced myself this code was OK, but I guess
I was wrong.

> After this, the proc->sem will be bumped to 1 unexpectedly ... Since this
> problem is rare, a possible fix is to put a critical section around line 1
> to 7 and remove UnlockBuffers() accordingly.


No, that would make any attempt to control-C a VACUUM have a significant
probability for panicking the whole database.

I think a better fix might be to arrange for an extra PGSemaphoreUnlock
to not be a problem. This is already true in lwlock.c, and in the
pin-count-waiter too (it'll just cause an extra cycle around the loop).
We'd have to modify ProcSleep to loop until it sees that someone has
actually granted or denied the lock, but that does not seem too hard.
(First thought about it is to change MyProc->waitStatus to have three
states, "waiting/ok/denied".) ProcCancelWaitForSignal could then go
away entirely.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 02:01 AM
Qingqing Zhou
 
Posts: n/a
Default Re: Possible race in UnlockBuffers() and UnpinBuffer()


"Tom Lane" <tgl@sss.pgh.pa.us> wrote
> "Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> > After this, the proc->sem will be bumped to 1 unexpectedly ... Since

this
> > problem is rare, a possible fix is to put a critical section around line

1
> > to 7 and remove UnlockBuffers() accordingly.

>
> No, that would make any attempt to control-C a VACUUM have a significant
> probability for panicking the whole database.
>


Why panicking by control-C? Is that critical section can prevent any signal
processing?

Regards,
Qingqing


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 02:01 AM
Tom Lane
 
Posts: n/a
Default Re: Possible race in UnlockBuffers() and UnpinBuffer()

"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
>> "Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
>>> After this, the proc->sem will be bumped to 1 unexpectedly ... Since
>>> this problem is rare, a possible fix is to put a critical section
>>> around line 1 to 7 and remove UnlockBuffers() accordingly.

>>
>> No, that would make any attempt to control-C a VACUUM have a significant
>> probability for panicking the whole database.


> Why panicking by control-C?


Because the entire point of a critical section is that any error (eg
"Query cancelled") is turned into a panic. So a query-cancel attempt
while a vacuum is blocked here would either do nothing (bad) or panic
the database (worse).

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:21 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com