Unix Technical Forum

Re: database vacuum from cron hanging

Database Server Software > PostgreSQL > pgsql Hackers

#1
04-11-2008, 07:11 AM
Tom Lane
Re: database vacuum from cron hanging

"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> (gdb) p BufferDescriptors[781]
> $1 = {tag = {rnode = {spcNode = 1663, dbNode = 16385, relNode = 2666}, blockNum = 1}, flags = 70, usage_count = 5, refcount = 4294967294,
> wait_backend_pid = 748, buf_hdr_lock = 0 '\0', buf_id = 781, freeNext = -2, io_in_progress_lock = 1615, content_lock = 1616}


Whoa. refcount -2?
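
(Editor's note.) The printed value 4294967294 suggests refcount is an unsigned 32-bit counter, and an unsigned value that is decremented past zero wraps modulo 2^32 rather than going negative. The sketch below, which assumes that 32-bit unsigned layout, shows why the dump reads as "-2". The helper name refcount_as_signed is hypothetical and not part of PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>

/* Assuming refcount is an unsigned 32-bit counter (the printed value
 * suggests so), -2 converted to that type is 2^32 - 2. */
static_assert((uint32_t) -2 == UINT32_C(4294967294), "-2 wraps to 2^32 - 2");

/* Hypothetical helper: reinterpret the counter as a signed quantity in a
 * portable way, so an underflowed pin count shows up as a small negative
 * number instead of a huge positive one. */
static int64_t refcount_as_signed(uint32_t refcount)
{
    if (refcount <= INT32_MAX)
        return (int64_t) refcount;
    return (int64_t) refcount - ((int64_t) 1 << 32);
}
```

Applied to the value from the dump, refcount_as_signed(4294967294) gives -2: the buffer has been unpinned twice more than it was pinned.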

Well, now we have an idea what to look for anyway ... and the relNode
says this is pg_constraint_contypid_index. Again. There's got to be
some broken code affecting that index in particular ...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#2
04-11-2008, 07:11 AM
Tom Lane

I wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> (gdb) p BufferDescriptors[781]
>> $1 = {tag = {rnode = {spcNode = 1663, dbNode = 16385, relNode = 2666}, blockNum = 1}, flags = 70, usage_count = 5, refcount = 4294967294,
>> wait_backend_pid = 748, buf_hdr_lock = 0 '\0', buf_id = 781, freeNext = -2, io_in_progress_lock = 1615, content_lock = 1616}


> Whoa. refcount -2?


BTW, at this point your most helpful move would be to rebuild with
--enable-cassert (keeping --enable-debug) and go back to your current
test mix. The only place that decrements refcount has
Assert(refcount > 0), so it seems quite likely that you'll soon see
an assertion crash, and then getting a stack trace from that would
probably expose the culprit immediately. (Make sure you are running
the postmaster under ulimit -c unlimited so that you will get a core
dump file to trace...)

regards, tom lane

#3
04-11-2008, 07:12 AM
Tom Lane

I wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> (gdb) p BufferDescriptors[781]
>> $1 = {tag = {rnode = {spcNode = 1663, dbNode = 16385, relNode = 2666}, blockNum = 1}, flags = 70, usage_count = 5, refcount = 4294967294,
>> wait_backend_pid = 748, buf_hdr_lock = 0 '\0', buf_id = 781, freeNext = -2, io_in_progress_lock = 1615, content_lock = 1616}


> Whoa. refcount -2?


After meditating overnight, I have a theory. There seem to be two basic
categories of possible explanations for the above state:

1. Some path of control decrements refcount more times than it increments it.
2. Occasionally, an intended increment gets lost.

Yesterday I was thinking in terms of #1, but it really doesn't seem to
fit the observed facts very well. I don't see a reason why such a bug
would preferentially affect pg_constraint_contypid_index; also it seems
like it would be fairly easily repeatable by many people. The pin
tracking logic is all internal to individual backends and doesn't look
very vulnerable to, say, timing-related glitches.

On the other hand, it's not hard to concoct a plausible explanation
using #2: suppose that two backends wanting to pin the same buffer at
about the same time pick up the same original value of refcount, add
one, store back. This is not supposed to happen of course, but maybe
the compiler is optimizing some code in a way that gives this effect
(i.e., by reading refcount before the buffer header spinlock has been
acquired). Now we can account for pg_constraint_contypid_index being
hit: we know you use domains a lot, and that uncached catalog search in
GetDomainConstraints would result in a whole lot of concurrent accesses
to that particular index, so it would be a likely place for such a bug
to manifest. And we can account for you being the only one seeing it:
this theory makes it compiler- and platform-dependent.
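
(Editor's note.) Theory #2 can be illustrated with a deterministic replay of the bad interleaving: two backends both load refcount before either stores, so one increment is lost, and the later, perfectly matched unpins then drive the counter below zero. This is a toy model under that assumption (ToyBuffer, racy_double_pin and unpin are hypothetical names), not real concurrent code and not PostgreSQL source; in the correct code the load, add and store would all happen while the buffer header spinlock is held.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model (not PostgreSQL code) of a pin count whose read-modify-write
 * is effectively not protected by the header lock. */
typedef struct ToyBuffer {
    uint32_t refcount;
} ToyBuffer;

/* Replays one bad interleaving deterministically: backend A and backend B
 * both load refcount before either stores, so each writes back old + 1.
 * Two pins were taken, but the count rises by only one. */
static void racy_double_pin(ToyBuffer *buf)
{
    uint32_t seen_by_a = buf->refcount;   /* A loads */
    uint32_t seen_by_b = buf->refcount;   /* B loads the same old value */
    buf->refcount = seen_by_a + 1;        /* A stores old + 1 */
    buf->refcount = seen_by_b + 1;        /* B overwrites with old + 1 */
}

/* Each backend later releases the pin it believes it holds. With no
 * assertion checking, nothing stops the unsigned count from wrapping. */
static void unpin(ToyBuffer *buf)
{
    buf->refcount -= 1;
}
```

Starting from 0, two calls to racy_double_pin leave refcount at 2 even though four pins were taken; the four matching unpin calls then leave it at 4294967294, the same value seen in the gdb dump.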

Accordingly: what's the platform exactly? (CPU type, and OS just in
case.) What compiler was used? (If gcc, show "gcc -v" output.)
Also please show the output of "pg_config".

regards, tom lane

All times are GMT.