#1 | 04-10-2008, 10:43 AM | Marcin Waldowski
BUG #3242: FATAL: could not unlock semaphore: error code 298


The following bug has been logged online:

Bug reference: 3242
Logged by: Marcin Waldowski
Email address: M.Waldowski@sulechow.net
PostgreSQL version: 8.2.3 and 8.2.1
Operating system: Windows XP SP2
Description: FATAL: could not unlock semaphore: error code 298
Details:

Hello.

After running a performance test of our application for some time (50
concurrent database connections making lots of quick transactions with
prepared statements), we found a problem in the PostgreSQL log: "could not
unlock semaphore: error code 298". After that, connections hung, blocked
on update operations.

We are investigating the problem now. What other information should we
provide?

Log from 8.2.1
2007-04-19 08:52:11 FATAL: could not unlock semaphore: error code 298
2007-04-19 08:52:11 STATEMENT: COMMIT
2007-04-19 08:52:11 WARNING: AbortTransaction while in COMMIT state

Log from 8.2.3
2007-04-19 10:56:13 FATAL: could not unlock semaphore: error code 298
2007-04-19 10:56:13 STATEMENT: update sometable set a = a + $1, b = b + $2,
c = c + $3 where id = $4

Regards, Marcin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

#2 | 04-10-2008, 10:43 AM | Marcin Waldowski
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

We've made some tests on Linux and it seems that it never happens on this
platform, but there we use 8.1, not 8.2.

select version()
PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC
i686-pc-linux-gnu-gcc (GCC) 3.4.5 (Gentoo 3.4.5, ssp-3.4.5-1.0, pie-8.7.9)

Regards, Marcin

#3 | 04-10-2008, 10:43 AM | Marcin Waldowski
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

Hello.

I've done some analysis of the PostgreSQL code. It looks like void
PGSemaphoreUnlock(PGSemaphore sema) from backend\port\win32_sema.c was
executed one time more than needed.

Error code 298 means "Too many posts were made to a semaphore":
http://msdn2.microsoft.com/en-us/library/ms681382.aspx (sorry for
posting Microsoft links)

Below is an example of when it happens:
http://www.tech-archive.net/Archive/...4-02/0406.html

If I understand it correctly, it means that the function ReleaseSemaphore
(http://msdn2.microsoft.com/en-us/library/ms685071.aspx), which is
called from PGSemaphoreUnlock, was executed one time more than needed.

I'm afraid that the problem may lie above win32_sema.c.

Regards, Marcin
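
A minimal sketch of the failure mode described in this post, written as a portable C model rather than real Win32 code. `ModelSem` and the `model_sem_*` functions are hypothetical names invented for illustration; only the value 298 (ERROR_TOO_MANY_POSTS) and the bounded-count behaviour of ReleaseSemaphore come from the Windows documentation linked above. The model assumes the count is left unchanged when a release is rejected.

```c
#include <assert.h>

/* Numeric value of the Windows error ERROR_TOO_MANY_POSTS. */
#define MODEL_ERROR_TOO_MANY_POSTS 298

/* Hypothetical stand-in for a Windows semaphore object. */
typedef struct {
    long count;     /* current count; the object is signalled while > 0 */
    long max_count; /* upper bound fixed when the semaphore is created */
} ModelSem;

static void model_sem_create(ModelSem *s, long initial, long max_count)
{
    assert(max_count > 0 && initial >= 0 && initial <= max_count);
    s->count = initial;
    s->max_count = max_count;
}

/*
 * Non-blocking acquire: decrements and returns 1 if the count is positive,
 * otherwise returns 0 (a real wait call would block at this point).
 */
static int model_sem_try_wait(ModelSem *s)
{
    if (s->count == 0)
        return 0;
    s->count--;
    return 1;
}

/*
 * Release: returns 0 on success, or 298 if the increment would push the
 * count past max_count. On failure the count is not modified.
 */
static int model_sem_release(ModelSem *s)
{
    if (s->count + 1 > s->max_count)
        return MODEL_ERROR_TOO_MANY_POSTS;
    s->count++;
    return 0;
}
```

With a semaphore set up as model_sem_create(&s, 1, 1), an immediate model_sem_release(&s) returns 298 because the count is already at its maximum. After one successful model_sem_try_wait(&s), a single release succeeds and the next one is rejected again.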

#4 | 04-10-2008, 10:43 AM | Magnus Hagander
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
> Hello.
>
> I've done some analysis of the PostgreSQL code. It looks like void
> PGSemaphoreUnlock(PGSemaphore sema) from backend\port\win32_sema.c was
> executed one time more than needed.


Certainly looks that way.

I've looked at the code there, and can't find a clear problem. One way it
could happen is if the actual PGSemaphoreUnlock() is called once more than
needed.

CC:ing to hackers for this question:

Any chance that's happening? If this happens with SysV semaphores, will
they error out, or just say it was done and do nothing? (meaning should we
actually be ignoring this error on Windows?)

//Magnus


#5 | 04-10-2008, 10:43 AM | Marcin Waldowski
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

Magnus Hagander wrote:
> I've looked at the code there, and can't find a clear problem. One way it
> could happen is if the actual PGSemaphoreUnlock() is called once more than
> needed.
>
> CC:ing to hackers for this question:
>
> Any chance that's happening? If this happens with SysV semaphores, will
> they error out, or just say it was done and do nothing? (meaning should we
> actually be ignoring this error on Windows?)
>


Hmm, PGSemaphoreUnlock() actually ignores this error and only logs that
it happens. As I mentioned previously, after it happens the other
connections hang on update operations. What is strange is that we cannot
reproduce this problem on Linux, but we can on Windows. What other
information should we provide?

Regards, Marcin

#6 | 04-10-2008, 10:44 AM | Magnus Hagander
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

On Fri, Apr 20, 2007 at 10:09:39AM +0200, Marcin Waldowski wrote:
> Magnus Hagander wrote:
> >I've looked at the code there, and can't find a clear problem. One way it
> >could happen is if the actual PGSemaphoreUnlock() is called once more than
> >needed.
> >
> >CC:ing to hackers for this question:
> >
> >Any chance that's happening? If this happens with SysV semaphores, will
> >they error out, or just say it was done and do nothing? (meaning should we
> >actually be ignoring this error on Windows?)
> >

>
> Hmm, PGSemaphoreUnlock() actually ignores this error and only logs that
> it happens.


No. It does ereport(FATAL) which terminates the backend.


> As I mentioned previously, after it happens the other connections
> hang on update operations. What is strange is that we cannot reproduce
> this problem on Linux, but we can on Windows. What other information
> should we provide?


Doesn't the postmaster restart all other backends due to the FATAL error?
Are you saying that you can no longer make new connections to the server,
or is the problem that the application doesn't like the server kicking
out all connections?

If you can produce a self-contained test case, that would certainly make
debugging a lot easier, though I realise that might not be easy for a
problem like this :-)

//Magnus


#7 | 04-10-2008, 10:44 AM | Marcin Waldowski
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

Magnus Hagander wrote:
>> Hmm, PGSemaphoreUnlock() actually ignores this error and only logs that
>> it happens.
>>

>
> No. It does ereport(FATAL) which terminates the backend.
>
>


Oh, now I see, sorry. Indeed, on this one connection we received the
exception "FATAL: could not unlock semaphore"; after that the rollback
failed because of an I/O error while writing to the connection, which was
caused by "Connection reset by peer: socket write error".

>> As I mentioned previously, after it happens the other connections
>> hang on update operations. What is strange is that we cannot reproduce
>> this problem on Linux, but we can on Windows. What other information
>> should we provide?
>>

>
> Doesn't the postmaster restart all other backends due to the FATAL error?
> Are you saying that you can no longer make new connections to the server,
> or is the problem that the application doesn't like the server kicking
> out all connections?
>


No, we are sure that it didn't do that. As I mentioned above, one
connection was terminated, but the other ones hung on update
operations. In this state it was possible to create a new connection from
pgAdmin and do some select and update operations. In addition, we use
only read-committed transactions, and all operations are based on
prepared statements which are reused.

> If you can produce a self-contained test case, that would certainly make
> debugging a lot easier, though I realise that might not be easy for a
> problem like this :-)
>


Our test case is our application, but unfortunately I cannot send it to
you. I will think about a test case, but I need to find time to write
it. I can reproduce the error and provide all the information you need
from PostgreSQL. Please tell me what to do.

Regards, Marcin



#8 | 04-10-2008, 10:44 AM | Marcin Waldowski
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

Marcin Waldowski wrote:
>>
>> Doesn't the postmaster restart all other backends due to the FATAL
>> error?
>> Are you saying that you can no longer make new connections to the
>> server,
>> or is the problem that the application doesn't like the server kicking
>> out all connections?
>>

>
> No, we are sure that it didn't do that. As I mentioned above, one
> connection was terminated, but the other ones hung on update
> operations. In this state it was possible to create a new connection
> from pgAdmin and do some select and update operations. In addition,
> we use only read-committed transactions, and all operations are based
> on prepared statements which are reused.


It may mean that PGSemaphoreUnlock(PGSemaphore sema) was executed for an
unintended sema "object". In that case PGSemaphoreUnlock() for the
unintended sema "object" failed, and PGSemaphoreUnlock() for the intended
sema "object" *never* happened. That would explain why the other
connections hung on update operations.

I think this is a reasonable possibility.

Regards, Marcin
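
The hypothesis in this post can be illustrated with a small C model. This is only a sketch of the reasoning, not PostgreSQL code: `Sem`, `sem_post_bounded`, `sem_would_block`, `wake_backend` and the two-semaphore scenario are assumptions made for illustration, and each semaphore is modelled as a counter with a maximum of 1.

```c
#include <assert.h>
#include <stdbool.h>

#define ERR_TOO_MANY_POSTS 298 /* Windows ERROR_TOO_MANY_POSTS */

/* Hypothetical model of one per-backend semaphore with a maximum count of 1. */
typedef struct {
    int count;
} Sem;

/* Returns 0 on success, 298 if the count is already at the maximum of 1. */
static int sem_post_bounded(Sem *s)
{
    assert(s->count == 0 || s->count == 1);
    if (s->count == 1)
        return ERR_TOO_MANY_POSTS;
    s->count = 1;
    return 0;
}

/* A waiter on this semaphore would block while the count is zero. */
static bool sem_would_block(const Sem *s)
{
    return s->count == 0;
}

/*
 * Wake the backend at index `target`. Passing the wrong index models the
 * hypothesis from the post: the unlock lands on an unintended semaphore.
 */
static int wake_backend(Sem sems[], int target)
{
    return sem_post_bounded(&sems[target]);
}
```

Suppose sems[0] belongs to a backend that is not waiting (count 1) and sems[1] to a backend that is blocked (count 0). A wake_backend(sems, 0) issued by mistake returns 298, while sem_would_block(&sems[1]) stays true, so the waiter that should have been woken never is.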

#9 | 04-10-2008, 10:44 AM | Tom Lane
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

Magnus Hagander <magnus@hagander.net> writes:
> On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
>>> I've looked at the code there, and can't find a clear problem. One way it
>>> could happen is if the actual PGSemaphoreUnlock() is called once more than
>>> needed.


> CC:ing to hackers for this question:


> Any chance that's happening? If this happens with SysV semaphores, will
> they error out, or just say it was done and do nothing? (meaning should we
> actually be ignoring this error on Windows?)


How is it possible for a semaphore to be unlocked "too many times"?
It's supposed to be a running counter of the net V's minus P's, and
yes it had better be able to count higher than one. Have we chosen
the wrong Windows primitive to implement this?

regards, tom lane

#10 | 04-10-2008, 10:44 AM | Magnus Hagander
Re: BUG #3242: FATAL: could not unlock semaphore: error code 298

Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
>> On Fri, Apr 20, 2007 at 09:20:05AM +0200, Marcin Waldowski wrote:
>>>> I've looked at the code there, and can't find a clear problem. One way it
>>>> could happen is if the actual PGSemaphoreUnlock() is called once more than
>>>> needed.

>
>> CC:ing to hackers for this question:

>
>> Any chance that's happening? If this happens with SysV semaphores, will
>> they error out, or just say it was done and do nothing? (meaning should we
>> actually be ignoring this error on Windows?)

>
> How is it possible for a semaphore to be unlocked "too many times"?
> It's supposed to be a running counter of the net V's minus P's, and
> yes it had better be able to count higher than one. Have we chosen
> the wrong Windows primitive to implement this?


No, it's definitely the right primitive. But we're creating it with a max
count of 1. Not sure if that's right or not, too tired to think straight
about that right now, but here's a summary:

* Object is "signalled" when count > 0.

* We create with an initial count of 1.

* Calling WaitFor...() decreases the count. We call WaitFor...() in
PGSemaphoreLock(). If the count is zero, WaitFor...() will block.

* Calling ReleaseSemaphore() increases the count. If the count goes from
zero to 1, a blocked WaitFor...() is released. If the count would end up >1
(or whatever the limit is set to), we get said error. We call
ReleaseSemaphore() in PGSemaphoreUnlock().


So basically this says we've called PGSemaphoreUnlock() more times than
we've called PGSemaphoreLock().


Should we be creating it with a higher maximum value, and that's it? (it
sounds like it, but I'm not entirely sure)

//Magnus
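
To make the effect of the maximum count concrete, here is a small sketch in C. It models the semaphore as a plain counter and is not the win32_sema.c implementation; `rejected_posts` and the larger limit used in the note below are hypothetical choices for illustration only.

```c
#include <assert.h>

/*
 * Apply `posts` consecutive unlocks (V operations), with no lock in between,
 * to a counter that starts at `initial` and may not exceed `max_count`.
 * Returns how many unlocks were rejected, i.e. how many times the modelled
 * ReleaseSemaphore() would have reported "too many posts".
 */
static int rejected_posts(long initial, long max_count, int posts)
{
    long count = initial;
    int rejected = 0;

    assert(initial >= 0 && initial <= max_count && posts >= 0);

    for (int i = 0; i < posts; i++) {
        if (count == max_count)
            rejected++;     /* the count stays unchanged on failure */
        else
            count++;
    }
    return rejected;
}
```

Under this model rejected_posts(0, 1, 2) is 1: with a maximum of 1, the second of two back-to-back unlocks fails even though the semaphore started at zero. With an arbitrary larger limit, rejected_posts(0, 16, 2) is 0. Starting from the initial count of 1 described in the summary, rejected_posts(1, 1, 1) is also 1, so a single unlock before any lock is already one too many.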
