Unix Technical Forum

Improving deadlock error messages

This is a discussion on Improving deadlock error messages within the pgsql Hackers forums, part of the PostgreSQL category; --> The error message we currently produce when a deadlock occurs is pretty unfriendly: ERROR: deadlock detected DETAIL: Process 32068 ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-12-2008, 09:22 AM
Neil Conway
 
Posts: n/a
Default Improving deadlock error messages

The error message we currently produce when a deadlock occurs is pretty
unfriendly:

ERROR: deadlock detected
DETAIL: Process 32068 waits for AccessExclusiveLock on relation
16413 of database 16384; blocked by process 32064.
Process 32064 waits for AccessExclusiveLock on relation 16415
of database 16384; blocked by process 32068.

Users encounter this message relatively frequently -- and they often
depend on the information in the errdetail to track down the source of
the deadlock. Presenting numeric OIDs effectively forces the DBA to
resolve the OIDs to the appropriate relation and database names
manually.

I whipped up a quick patch to use names as well as OIDs for the
identifiers in the message, but on reflection the simple approach to
doing this is problematic: when we do syscache lookups to lookup the
identifier names, we might need to acquire an AccessShareLock on various
system catalogs (pg_class, pg_namespace, pg_database). This could fail
(e.g. because of a deadlock involving a system catalog), causing the
deadlock detector to infinitely recurse (albeit slowly).

We could fix this by first conditionally acquiring AccessShareLocks on
the necessary system catalogs. If any of those lock acquisitions fails,
we can just print the numeric OID instead. Since failing to acquire an
AccessShareLock on the system catalogs should not occur frequently in
practise, most users should see a more friendly error message.

Comments? Anyone see a better approach?

-Neil



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-12-2008, 09:22 AM
Tom Lane
 
Posts: n/a
Default Re: Improving deadlock error messages

Neil Conway <neilc@samurai.com> writes:
> I whipped up a quick patch to use names as well as OIDs for the
> identifiers in the message, but on reflection the simple approach to
> doing this is problematic: when we do syscache lookups to lookup the
> identifier names, we might need to acquire an AccessShareLock on various
> system catalogs (pg_class, pg_namespace, pg_database). This could fail
> (e.g. because of a deadlock involving a system catalog), causing the
> deadlock detector to infinitely recurse (albeit slowly).


Yup, that's exactly why it doesn't do that already.

> We could fix this by first conditionally acquiring AccessShareLocks on
> the necessary system catalogs.


I don't think you've thought of quite all of the failure cases. One
that's a bit pressing is that a deadlock isn't necessarily confined to
objects in your own database.

My take on this is that I'd much rather have unfriendly information
than none at all.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-12-2008, 09:22 AM
Gregory Stark
 
Posts: n/a
Default Re: Improving deadlock error messages

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> I don't think you've thought of quite all of the failure cases. One
> that's a bit pressing is that a deadlock isn't necessarily confined to
> objects in your own database.


We could do the syscache lookups for only the object we're waiting on and
store that information in the lock table for others to refer to. We would have
to do the syscache lookup either always or at the point where we first decide
we have to block.


--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com


---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-12-2008, 09:22 AM
Neil Conway
 
Posts: n/a
Default Re: Improving deadlock error messages

On Fri, 2007-04-20 at 02:55 -0400, Tom Lane wrote:
> I don't think you've thought of quite all of the failure cases. One
> that's a bit pressing is that a deadlock isn't necessarily confined to
> objects in your own database.


I'm not sure I follow. If we conditionally acquire the locks we need and
always fallback to just printing the numeric OIDs, ISTM we should be
able to avoid the infinite recursion problem. (If necessary, we can
always special-case objects outside our own database, although I'm not
sure that's necessary.)

-Neil



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-12-2008, 09:23 AM
Tom Lane
 
Posts: n/a
Default Re: Improving deadlock error messages

Neil Conway <neilc@samurai.com> writes:
> On Fri, 2007-04-20 at 02:55 -0400, Tom Lane wrote:
>> I don't think you've thought of quite all of the failure cases. One
>> that's a bit pressing is that a deadlock isn't necessarily confined to
>> objects in your own database.


> I'm not sure I follow. If we conditionally acquire the locks we need and
> always fallback to just printing the numeric OIDs, ISTM we should be
> able to avoid the infinite recursion problem. (If necessary, we can
> always special-case objects outside our own database, although I'm not
> sure that's necessary.)


Maybe so, but you're going to be writing quite a lot of duplicative
code, because the existing routines you might have been thinking of
using (lsyscache.c etc) don't behave that way.

The basic objection I've got to this is that it'll introduce a lot of
complexity and fragility into a seldom-taken error-recovery path,
which is almost by definition not well enough tested already.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-12-2008, 09:23 AM
Neil Conway
 
Posts: n/a
Default Re: Improving deadlock error messages

On Sat, 2007-04-21 at 02:38 -0400, Tom Lane wrote:
> Maybe so, but you're going to be writing quite a lot of duplicative
> code, because the existing routines you might have been thinking of
> using (lsyscache.c etc) don't behave that way.


Right, I'm envisioning doing a conditional LockAcquire and then
heap_open() / heap_getnext() by hand. That will be relatively slow, but
code that emits a deadlock error message is almost by definition not
performance critical.

BTW, another alternative would be to set a global variable instructing
LockAcquire() to not block waiting for a lock; instead, it would
longjmp(), a la elog(ERROR). You could even construct something similar
to PG_TRY():

PG_COND_LOCK();
{
/* do various things that might acquire lmgr locks */
}
PG_ACQUIRE_FAILED()
{
/* failed to acquire an lmgr lock */
}
PG_END_COND_LOCK();

The risk would be leaving the LockAcquire() call site in an inconsistent
state when we longjmp(), but since DeadLockReport() is going to
ereport(ERROR) anyway, it might be sufficiently safe. This scheme does
seem a bit fragile, though...

-Neil



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-12-2008, 09:23 AM
Neil Conway
 
Posts: n/a
Default Re: Improving deadlock error messages

On Sat, 2007-04-21 at 17:56 -0400, Neil Conway wrote:
> Right, I'm envisioning doing a conditional LockAcquire and then
> heap_open() / heap_getnext() by hand. That will be relatively slow, but
> code that emits a deadlock error message is almost by definition not
> performance critical.


.... although it turns out you'd need to conditionally lock a *lot* of
system catalogs to guarantee that you're not going to block on a lock at
some point. Needless to say, that approach would be pretty ugly and
fragile.

> BTW, another alternative would be to set a global variable instructing
> LockAcquire() to not block waiting for a lock; instead, it would
> longjmp(), a la elog(ERROR). You could even construct something similar
> to PG_TRY()


Attached is a very quick hack of a patch to do this.

-Neil



---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-12-2008, 09:24 AM
Neil Conway
 
Posts: n/a
Default Re: Improving deadlock error messages

On Sat, 2007-04-21 at 19:43 -0400, Neil Conway wrote:
> Attached is a very quick hack of a patch to do this.


Does anyone have any feedback on this approach? If people are satisfied
with this solution, I can get a cleaned up patch ready to apply shortly.

-Neil



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-12-2008, 09:24 AM
Tom Lane
 
Posts: n/a
Default Re: Improving deadlock error messages

Neil Conway <neilc@samurai.com> writes:
> On Sat, 2007-04-21 at 19:43 -0400, Neil Conway wrote:
>> Attached is a very quick hack of a patch to do this.


> Does anyone have any feedback on this approach? If people are satisfied
> with this solution, I can get a cleaned up patch ready to apply shortly.


I'm really still opposed to the entire concept. You're proposing to put
a lot of fragile-looking code into a seldom-exercised error path.
I fear bugs will survive a long time in there, and the net effect will be
that we get no information when we need it most. The numeric printouts
may be ugly, but they are reliable.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-12-2008, 09:24 AM
Neil Conway
 
Posts: n/a
Default Re: Improving deadlock error messages

On Mon, 2007-04-23 at 17:38 -0400, Tom Lane wrote:
> I'm really still opposed to the entire concept. You're proposing to put
> a lot of fragile-looking code into a seldom-exercised error path.


There's certainly not a "lot" of code: the patch just adds a few
syscache lookups, wrapped in a PG_LOCK_NOWAIT() block.

As for fragility, I think the important point is whether it's safe to
siglongjmp() out of LockAcquire(); the rest is just window dressing.

-Neil



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:48 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com