| The following bug has been logged online:

Bug reference: 3242
Logged by: Marcin Waldowski
Email address: M.Waldowski@sulechow.net
PostgreSQL version: 8.2.3 and 8.2.1
Operating system: Windows XP SP2
Description: FATAL: could not unlock semaphore: error code 298
Details:

Hello.

After running a performance test of our application for some time (50 concurrent database connections making lots of quick transactions with prepared statements), we found a problem in the PostgreSQL log: "could not unlock semaphore: error code 298". After that, connections hung, blocked on update operations.

We are investigating the problem now. What other information should we provide?

Log from 8.2.1
2007-04-19 08:52:11 FATAL: could not unlock semaphore: error code 298
2007-04-19 08:52:11 STATEMENT: COMMIT
2007-04-19 08:52:11 WARNING: AbortTransaction while in COMMIT state

Log from 8.2.3
2007-04-19 10:56:13 FATAL: could not unlock semaphore: error code 298
2007-04-19 10:56:13 STATEMENT: update sometable set a = a + $1, b = b + $2, c = c + $3 where id = $4

Regards, Marcin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| We've run some tests on Linux and it seems that this never happens on that platform, but there we use 8.1, not 8.2.

select version()
PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC i686-pc-linux-gnu-gcc (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)

Regards, Marcin |
| |||
| Hello.

I've done some analysis of the PostgreSQL code. It looks like void PGSemaphoreUnlock(PGSemaphore sema) from backend\port\win32_sema.c was executed one time more than needed.

Error code 298 means "Too many posts were made to a semaphore": http://msdn2.microsoft.com/en-us/library/ms681382.aspx (sorry for posting Microsoft links). Below is an example of when it happens: http://www.tech-archive.net/Archive/...4-02/0406.html

If I understand it correctly, it means that the function ReleaseSemaphore (http://msdn2.microsoft.com/en-us/library/ms685071.aspx), which is executed from PGSemaphoreUnlock, was executed one time more than needed.

I'm afraid that the problem may lie above win32_sema.c.

Regards, Marcin |
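The "too many posts" condition described in this post can be illustrated with a rough analogy. The sketch below is not PostgreSQL or Win32 code: it assumes that Python's `threading.BoundedSemaphore` is an acceptable stand-in for a semaphore with a maximum count of 1, and it only shows that one release more than the matching acquires is rejected.

```python
import threading

# Analogy only: BoundedSemaphore(1) stands in for a semaphore whose
# count may never exceed 1. It is not the Win32 object itself.
sema = threading.BoundedSemaphore(1)

sema.acquire()   # "lock":   count 1 -> 0
sema.release()   # "unlock": count 0 -> 1

try:
    sema.release()   # one unlock too many: count would exceed the bound
    over_release_detected = False
except ValueError:
    # Python reports the over-release as ValueError; Windows reports
    # the comparable condition as error code 298.
    over_release_detected = True

print("over-release detected:", over_release_detected)
```

In this model the extra release is refused and the count is left unchanged, which is the same shape as the failure reported in the log.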
| |||
| On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
> Hello.
>
> I've made some analysis of PostgreSQL code. It looks like void
> PGSemaphoreUnlock(PGSemaphore sema) from backend\port\win32_sema.c was
> executed one time more than needed.

Certainly looks that way.

I've looked at the code there, and can't find a clear problem. One way it could happen is if the actual PGSemaphoreUnlock() is called once more than needed.

CC:ing to hackers for this question:

Any chance that's happening? If this happens with SysV semaphores, will they error out, or just say it was done and do nothing? (Meaning: should we actually be ignoring this error on Windows?)

//Magnus |
| |||
| Magnus Hagander wrote:
> I've looked at the code there, and can't find a clear problem. One way it
> could happen is if the actual PGSemaphoreUnlock() is called once more than
> needed.
>
> CC:ing to hackers for this question:
>
> Any chance that's happening? If this happens with SysV semaphores, will
> they error out, or just say it was done and do nothing? (meaning should we
> actually be ignoring this error on windows?)

Hmm, PGSemaphoreUnlock() actually ignores this error and only logs that it happened. As I mentioned previously, after it happens other connections hang on update operations. What is strange is that we cannot reproduce this problem on Linux, but we can on Windows. What other information should we provide?

Regards, Marcin |
| |||
| On Fri, Apr 20, 2007 at 10:09:39AM +0200, Marcin Waldowski wrote:
> Magnus Hagander wrote:
> >I've looked at the code there, and can't find a clear problem. One way it
> >could happen is if the actual PGSemaphoreUnlock() is called once more than
> >needed.
> >
> >CC:ing to hackers for this question:
> >
> >Any chance that's happening? If this happens with SysV semaphores, will
> >they error out, or just say it was done and do nothing? (meaning should we
> >actually be ignoring this error on windows?)
>
> Hmm, PGSemaphoreUnlock() actually ignore this error, only log that it
> happens.

No. It does ereport(FATAL), which terminates the backend.

> As I mentioned previously after it happens others connections
> were hung on update operations. What is strange we cannot reproduce this
> problem on Linux. But we can do this on Windows. What another
> information should we provide?

Doesn't the postmaster restart all other backends due to the FATAL error? Are you saying that you can no longer make new connections to the server, or is the problem that the application doesn't like that the server kicked out all connections?

If you can produce a self-contained test case, that would certainly make debugging a lot easier. So if it's possible - but I realise that might not be easy for a problem like this :-)

//Magnus |
| |||
| Magnus Hagander wrote:
>> Hmm, PGSemaphoreUnlock() actually ignore this error, only log that it
>> happens.
>
> No. It does ereport(FATAL) which terminates the backend.

Oh, now I see, sorry. The exception was "FATAL: could not unlock semaphore"; after that, the rollback failed because of an I/O error during a write to the connection, which was caused by "Connection reset by peer: socket write error".

>> As I mentioned previously after it happens others connections
>> were hung on update operations. What is strange we cannot reproduce this
>> problem on Linux. But we can do this on Windows. What another
>> information should we provide?
>
> Doesn't the postmaster restart all other backends due to the FATAL error?
> Are you saying that you can no longer make new connections to the server,
> or is the problem coming from that the application doesn't like that the
> server kicked out all connections?

No, we are sure that it didn't do that. As I mentioned above, one connection was terminated, but the other ones were hung on update operations. In this state it was possible to create a new connection from PGAdmin and do some select and update operations. In addition, I can say that we use only read-committed transactions and that all operations are based on prepared statements, which are reused.

> If you can produce a self-contained test-case, that would certainly make
> debugging a lot easier. So if it's possible - but I realise that might not
> be easy for a problem like this :-)

Our test case is our application, but unfortunately I cannot send it to you. I will think about a test case, but I need to find time for writing it PostgreSQL. Please instruct me what to do.

Regards, Marcin |
| |||
| Marcin Waldowski wrote:
>> Doesn't the postmaster restart all other backends due to the FATAL
>> error?
>> Are you saying that you can no longer make new connections to the
>> server,
>> or is the problem coming from that the application doesn't like that the
>> server kicked out all connections?
>
> No, we are sure that he didn't do that. As I mentioned above one
> connection was terminated, but other ones were hung on update
> operations. In this state it was possible to create new connection
> from PGAdmin and do some select and update operations. In addition I
> can say that we use only read-commited transactions and all operations
> are based on prepared statemens which are reused.

It may mean that PGSemaphoreUnlock(PGSemaphore sema) was executed for an unintended sema "object". In that case, PGSemaphoreUnlock() for the unintended sema "object" failed, and PGSemaphoreUnlock() for the intended sema "object" *never* happened. That would explain why the other connections were hung on update operations.

I think this sounds like a reasonable possibility.

Regards, Marcin |
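The hypothesis in this post (an unlock aimed at the wrong semaphore fails, while the waiter on the intended one is never woken) can be sketched as a small thread experiment. This is a hedged illustration in Python, not PostgreSQL code; the names `intended` and `unintended` are hypothetical, and the 0.2 second timeout exists only so that the example terminates instead of hanging.

```python
import threading

intended = threading.Semaphore(0)           # a waiter sleeps on this one
unintended = threading.BoundedSemaphore(1)  # already at its maximum count

results = {}

def waiter():
    # Stand-in for a blocked connection. It gives up after a short
    # timeout so the example finishes; a real waiter would stay blocked.
    results["woken"] = intended.acquire(timeout=0.2)

t = threading.Thread(target=waiter)
t.start()

try:
    unintended.release()   # the unlock lands on the wrong object
    results["unlock_failed"] = False
except ValueError:
    results["unlock_failed"] = True

t.join()
print(results)
```

Under these assumptions both symptoms from the report appear together: the misdirected unlock is rejected, and the waiter on the intended semaphore is never released.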
| |||
| Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
>>> I've looked at the code there, and can't find a clear problem. One way it
>>> could happen is if the actual PGSemaphoreUnlock() is called once more than
>>> needed.

> CC:ing to hackers for this question:

> Any chance that's happening? If this happens with SysV semaphores, will
> they error out, or just say it was done and do nothing? (meaning should we
> actually be ignoring this error on windows?)

How is it possible for a semaphore to be unlocked "too many times"? It's supposed to be a running counter of the net V's minus P's, and yes it had better be able to count higher than one. Have we chosen the wrong Windows primitive to implement this?

regards, tom lane |
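The "running counter of the net V's minus P's" can be shown with an unbounded counting semaphore. The sketch below uses Python's `threading.Semaphore` as an assumed stand-in for the semantics Tom describes; it is not the SysV or Windows implementation.

```python
import threading

sema = threading.Semaphore(0)   # counting semaphore with no upper bound

# Three V operations (unlocks) arrive before any P (lock).
for _ in range(3):
    sema.release()

# Each P consumes one of the earlier V's without blocking.
acquired = [sema.acquire(blocking=False) for _ in range(3)]

# A fourth P has no matching V left; a blocking call would wait here.
fourth = sema.acquire(blocking=False)

print(acquired, fourth)
```

With this behaviour an unlock that arrives before the matching lock is remembered rather than rejected, which is why the counter has to be able to go above one.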
| ||||
| Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
>>>> I've looked at the code there, and can't find a clear problem. One way it
>>>> could happen is if the actual PGSemaphoreUnlock() is called once more than
>>>> needed.
>
>> CC:ing to hackers for this question:
>
>> Any chance that's happening? If this happens with SysV semaphores, will
>> they error out, or just say it was done and do nothing? (meaning should we
>> actually be ignoring this error on windows?)
>
> How is it possible for a semaphore to be unlocked "too many times"?
> It's supposed to be a running counter of the net V's minus P's, and
> yes it had better be able to count higher than one. Have we chosen
> the wrong Windows primitive to implement this?

No, it's definitely the right primitive. But we're creating it with a max count of 1.

Not sure if that's right or not, too tired to think straight about that right now, but here's a summary:

* Object is "signalled" when count > 0.
* We create with an initial count of 1.
* Calling WaitFor...() decreases the count. We call WaitFor...() in PGSemaphoreLock(). If count reaches zero, WaitFor...() will block.
* Calling ReleaseSemaphore() increases the count. If count leaves zero for 1, a blocking WaitFor...() is released. If count ends up >1 (or whatever the limit is set to), we get said error. We call ReleaseSemaphore() in PGSemaphoreUnlock().

So basically this says we've called PGSemaphoreUnlock() more times than we've called PGSemaphoreLock().

Should we be creating it with a higher maximum value, and that's it? (It sounds like it, but I'm not entirely sure.)

//Magnus |
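The counter rules in the summary above can be written out as a small single-threaded model. This is a sketch under stated assumptions: `SemaphoreModel` and `unlock_twice` are hypothetical names, the model only tracks the count and does not block or call any Windows API, and the maximum of 1000 is an arbitrary illustrative value rather than a recommended setting.

```python
ERROR_TOO_MANY_POSTS = 298  # Windows error code quoted in the thread


class SemaphoreModel:
    """Hypothetical model of the count rules listed in the summary."""

    def __init__(self, initial, maximum):
        self.count = initial
        self.maximum = maximum

    def wait(self):
        """Model of the wait call: False marks where the real call would block."""
        if self.count == 0:
            return False
        self.count -= 1
        return True

    def release(self):
        """Model of the release call: 0 on success, else the error code."""
        if self.count + 1 > self.maximum:
            return ERROR_TOO_MANY_POSTS   # count is left unchanged
        self.count += 1
        return 0


def unlock_twice(maximum):
    """Lock once, then unlock twice, returning both release results."""
    sema = SemaphoreModel(initial=1, maximum=maximum)
    sema.wait()                           # count 1 -> 0
    return [sema.release(), sema.release()]


print(unlock_twice(1))      # maximum of 1, as described in the post
print(unlock_twice(1000))   # arbitrary higher maximum
```

In this model the second unlock fails only when the maximum is 1. That matches the question raised at the end of the post, but it does not show whether the extra unlock is legitimate in PostgreSQL itself.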
|
|