Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Bugs

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-10-2008, 11:00 AM
Peter Koczan
 
Posts: n/a
Default BUG #3504: Some listening sessions never return from writing, problems ensue


The following bug has been logged online:

Bug reference: 3504
Logged by: Peter Koczan
Email address: pjkoczan@gmail.com
PostgreSQL version: 8.2.4
Operating system: CentOS 4.5 Linux (RHEL 4), kernel 2.6.9-55.ELsmp
Description: Some listening sessions never return from writing,
problems ensue
Details:

There is a problem where connections listening for asynchronous notifies
occasionally block for writing on ther server side and never finish,
resulting in connections that always have the status "notify interrupt".
Apparently, this causes no ill effects for the application that's listening
(i.e. it still updates somehow), but no data is written to the connection
when it should be. It also becomes a problem for restarting the server since
postgres cannot kill the connections when they're in that state. I either
have to explicitly kill the connections, kill the client apps, or reboot the
server. If I try to restart postgres, it kills all but the "notify
interrupt" connections, but it doesn't shut down so I can't restart it short
of rebooting.

Here are stack traces from listening sessions, the first is ok, the second
is blocked.

-----------------------------------------------------

OK:
[root@vero ~]# ps axvw | grep 30728
30728 ? Ss 0:00 0 3265 42526 29240 0.7 postgres: koczan
csdb ator.cs.wisc.edu(38753) idle

[root@vero ~]# gdb -p 30728
....Standard loading messages...
(gdb) bt
#0 0x003bc7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x001bac93 in __read_nocancel () from /lib/tls/libc.so.6
#2 0x00e81e2e in sock_read () from
/s/openssl-0.9.8e/lib/libcrypto.so.0.9.8e
#3 0x00000008 in ?? ()
#4 0x08c40068 in ?? ()
#5 0x00000005 in ?? ()
#6 0x00f160e8 in ?? () from /s/openssl-0.9.8e/lib/libcrypto.so.0.9.8e
#7 0x08c3a310 in ?? ()
#8 0x00000000 in ?? ()


(This stack trace is of the last 3 updates, and is farily representative of
what I've seen).
[root@vero ~]# strace -p 30728
read(8, 0x8c40068, 5) = ? ERESTARTSYS (To be restarted)
--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={1, 0}}, NULL) = 0
semop(1081358, 0xbfe20b9c, 1) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
semop(1081358, 0xbfe20b6c, 1) = 0
_llseek(6, 0, [6078464], SEEK_END) = 0
time(NULL) = 1185998905
_llseek(5, 6569984, [6569984], SEEK_SET) = 0
write(5, "^\320\1\0\1\0\0\0!\0\0\0\0@d\225\34\0\0\0\10\0\0\ 0(\372"..., 8192)
= 8192
fdatasync(5) = 0
semop(1048589, 0xbfe20adc, 1) = 0
semop(1048589, 0xbfe20b2c, 1) = 0
write(8, "\27\3\1\0 \221\363{!\235\355\317\223\260\253\tY\242\350"..., 90) =
90
sigreturn() = ? (mask now [])
read(8, 0x8c40068, 5) = ? ERESTARTSYS (To be restarted)
--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={1, 0}}, NULL) = 0
semop(1081358, 0xbfe20b9c, 1) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
semop(1081358, 0xbfe20b6c, 1) = 0
_llseek(6, 0, [6078464], SEEK_END) = 0
time(NULL) = 1185998905
write(5, "^\320\1\0\1\0\0\0!\0\0\0\0`d\225H\0\0\0req\0 \272\301\10"...,
8192) = 8192
fdatasync(5) = 0
semop(1212434, 0xbfe20adc, 1) = 0
write(8, "\27\3\1\0 \'\6\235\323\2\33\321\216R\370\277i\304\217h"..., 90) =
90
sigreturn() = ? (mask now [])
read(8, 0x8c40068, 5) = ? ERESTARTSYS (To be restarted)
--- SIGUSR2 (User defined signal 2) @ 0 (0) ---
select(0, NULL, NULL, NULL, {0, 1000}) = 0 (Timeout)
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={1, 0}}, NULL) = 0
semop(1081358, 0xbfe20b9c, 1) = 0
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0
_llseek(6, 0, [6078464], SEEK_END) = 0
time(NULL) = 1186000276
_llseek(5, 9453568, [9453568], SEEK_SET) = 0
write(5, "^\320\1\0\1\0\0\0!\0\0\0\0@\220\225r\4\0\0\0\0\0\ 0\0\0"..., 8192)
= 8192
fdatasync(5) = 0
semop(1081358, 0xbfe20adc, 1) = 0
write(8, "\27\3\1\0 \301\307_\375=\0032\243\212\215\362\3457\335"..., 90) =
90
sigreturn() = ? (mask now [])
read(8, 0x8c40068, 5) = ? ERESTARTSYS (To be restarted)
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
semop(1081358, 0xbfe20d2c, 1) = 0
semop(1081358, 0xbfe20d2c, 1) = 0
semop(1081358, 0xbfe20d2c, 1) = 0
sigreturn() = ? (mask now [])
read(8,

-----------------------------------------------------

Blocked process:
[root@vero ~]# ps axvw | grep 19583
19583 ? Ss 0:02 0 3265 42518 29368 0.7 postgres: craft
csdb zim.cs.wisc.edu(32777) notify interrupt

[root@vero ~]# gdb -p 19583
....Standard loading messages...
(gdb) bt
#0 0x003bc7a2 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x001bad13 in __write_nocancel () from /lib/tls/libc.so.6
#2 0x00e81ec4 in sock_write () from
/s/openssl-0.9.8e/lib/libcrypto.so.0.9.8e
#3 0x00000008 in ?? ()
#4 0x08c4484a in ?? ()
#5 0x00000038 in ?? ()
#6 0x00000030 in ?? ()
#7 0x00f160e8 in ?? () from /s/openssl-0.9.8e/lib/libcrypto.so.0.9.8e
#8 0x08c3a7f0 in ?? ()
#9 0x00000000 in ?? ()

(Stack trace from the same amount of time, same 3 updates)
[root@vero ~]# strace -p 19583
Process 19583 attached - interrupt to quit
write(8, "\224\337\252\27\3\1\0000\247W\250\2-\307;\204w\276\242"..., 56) =
? ERESTARTSYS (To be restarted)
--- SIGUSR1 (User defined signal 1) @ 0 (0) ---
sigreturn() = ? (mask now [USR2])
write(8, "\224\337\252\27\3\1\0000\247W\250\2-\307;\204w\276\242"..., 56

From my best guess, looks like something's weird with semop, but I don't
know.

Note: I'm compiling from source with PAM, krb5 (1.5.3), and openssl
(0.9.8e).

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-10-2008, 11:00 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing,problems ensue

Peter Koczan wrote:
> There is a problem where connections listening for asynchronous notifies
> occasionally block for writing on ther server side and never finish,
> resulting in connections that always have the status "notify interrupt".
> Apparently, this causes no ill effects for the application that's listening
> (i.e. it still updates somehow), but no data is written to the connection
> when it should be. It also becomes a problem for restarting the server since
> postgres cannot kill the connections when they're in that state. I either
> have to explicitly kill the connections, kill the client apps, or reboot the
> server. If I try to restart postgres, it kills all but the "notify
> interrupt" connections, but it doesn't shut down so I can't restart it short
> of rebooting.


Does the client read the async notifies? The write in the server will
block if the client doesn't keep up with reading the data.

> ...
>
> (This stack trace is of the last 3 updates, and is farily representative of
> what I've seen).
> [root@vero ~]# strace -p 30728


That's actually not a stack trace, but a system call trace.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-10-2008, 11:00 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing, problems ensue

1Heikki Linnakangas <heikki@enterprisedb.com> writes:
> Peter Koczan wrote:
>> There is a problem where connections listening for asynchronous notifies
>> occasionally block for writing on ther server side and never finish,


> Does the client read the async notifies? The write in the server will
> block if the client doesn't keep up with reading the data.


Loss of network connectivity to the client (eg, timeout of a NAT
mapping) seems like another plausible explanation.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-10-2008, 11:00 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing, problems ensue

Peter Koczan <pjkoczan@gmail.com> writes:
> Heikki Linnakangas wrote:
>> Does the client read the async notifies? The write in the server will
>> block if the client doesn't keep up with reading the data.
>>

> Well, the client updates it's GUI properly no matter whether the
> listening session is blocked or not (it's an ongoing tail of requests).
> Overtly from the client side, it's completely transparent. Is there any
> way I can tell definitively from the client side whether or not it's
> actually reading the async notifies?


netstat's Recv-Q column would probably reflect it if the client is
failing to consume messages. The send queue on the server side would be
worth checking too.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-10-2008, 11:00 AM
Peter Koczan
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing, problems ensue

I found something pretty interesting when running netstat's:

Before a series of 3 async notifies (second column is Recv-Q):

OK connections:
[koczan@ator] koczan $ netstat | grep vero
tcp 180 0 ator.cs.wisc.edu:32785
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 ator.cs.wisc.edu:32784
vero.cs.wisc.eduostgresqlESTABLISHED
[koczan@ator] koczan $ ssh newton netstat | grep vero
tcp 64260 0 newton.cs.wisc.edu:32778
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 newton.cs.wisc.edu:32777
vero.cs.wisc.eduostgresqlESTABLISHED

"notify interrupt" connections:
[koczan@ator] koczan $ ssh brian netstat | grep vero
tcp 0 0 brian.cs.wisc.edu:40365
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 77800 0 brian.cs.wisc.edu:40366
vero.cs.wisc.eduostgresqlESTABLISHED
[koczan@ator] koczan $ ssh zim netstat | grep vero
tcp 73402 0 zim.cs.wisc.edu:32777
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 zim.cs.wisc.edu:32776
vero.cs.wisc.eduostgresqlESTABLISHED

and after said notifies:

OK connections:
[koczan@ator] koczan $ netstat | grep vero
tcp 450 0 ator.cs.wisc.edu:32785
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 ator.cs.wisc.edu:32784
vero.cs.wisc.eduostgresqlESTABLISHED
[koczan@ator] koczan $ ssh newton netstat | grep vero
tcp 64260 0 newton.cs.wisc.edu:32778
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 newton.cs.wisc.edu:32777
vero.cs.wisc.eduostgresqlESTABLISHED

"notify interrupt" connections:
[koczan@ator] koczan $ ssh brian netstat | grep vero
tcp 0 0 brian.cs.wisc.edu:40365
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 77800 0 brian.cs.wisc.edu:40366
vero.cs.wisc.eduostgresqlESTABLISHED
[koczan@ator] koczan $ ssh zim netstat | grep vero
tcp 73402 0 zim.cs.wisc.edu:32777
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 zim.cs.wisc.edu:32776
vero.cs.wisc.eduostgresqlESTABLISHED

And on the server side things get a little more interesting (Send-Q is the
3rd column)...

OK:
[koczan@vero] ~ $ netstat | grep ator
tcp 0 0 vero.cs.wisc.eduostgresql ator.cs.wisc.edu:32785
ESTABLISHED
tcp 0 0 vero.cs.wisc.eduostgresql ator.cs.wisc.edu:32784
ESTABLISHED
[koczan@vero] ~ $ netstat | grep newton
tcp 0 0 vero.cs.wisc.eduostgresql newton.cs.wisc.edu:32778
ESTABLISHED
tcp 0 0 vero.cs.wisc.eduostgresql newton.cs.wisc.edu:32777
ESTABLISHED

"notify interrupt":
[koczan@vero] ~ $ netstat | grep brian
tcp 0 0 vero.cs.wisc.eduostgresql brian.cs.wisc.edu:40365
ESTABLISHED
tcp 0 13032 vero.cs.wisc.eduostgresql brian.cs.wisc.edu:40366
ESTABLISHED
[koczan@vero] ~ $ netstat | grep zim
tcp 0 13032 vero.cs.wisc.eduostgresql zim.cs.wisc.edu:32777
ESTABLISHED
tcp 0 0 vero.cs.wisc.eduostgresql zim.cs.wisc.edu:32776
ESTABLISHED

A quick perusal of the other "notify interrupt" connections shows 13032 in
the Send-Q column. They all got into this state at the same time.

Peter

P.S. Thanks for the help, I really appreciate it.

On 8/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Koczan <pjkoczan@gmail.com> writes:
> > Heikki Linnakangas wrote:
> >> Does the client read the async notifies? The write in the server will
> >> block if the client doesn't keep up with reading the data.
> >>

> > Well, the client updates it's GUI properly no matter whether the
> > listening session is blocked or not (it's an ongoing tail of requests).
> > Overtly from the client side, it's completely transparent. Is there any
> > way I can tell definitively from the client side whether or not it's
> > actually reading the async notifies?

>
> netstat's Recv-Q column would probably reflect it if the client is
> failing to consume messages. The send queue on the server side would be
> worth checking too.
>
> regards, tom lane
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-10-2008, 11:00 AM
Peter Koczan
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing, problems ensue

Whoops, a couple quick changes. This:
[koczan@ator] koczan $ ssh newton netstat | grep vero
tcp 64260 0 newton.cs.wisc.edu:32778
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 newton.cs.wisc.edu:32777
vero.cs.wisc.eduostgresqlESTABLISHED

should be this:
[koczan@ator] koczan $ ssh newton netstat | grep vero
tcp 64530 0 newton.cs.wisc.edu:32778
vero.cs.wisc.eduostgresqlESTABLISHED
tcp 0 0 newton.cs.wisc.edu:32777
vero.cs.wisc.eduostgresqlESTABLISHED

Also, the Send-Q column on the server side doesn't change, even after more
async notifies. They may not have happened at the same time, but it's a bit
perplexing that they all have the exact same amount of data in the queue.

Peter

On 8/2/07, Peter Koczan <pjkoczan@gmail.com> wrote:
>
> I found something pretty interesting when running netstat's:
>
> Before a series of 3 async notifies (second column is Recv-Q):
>
> OK connections:
> [koczan@ator] koczan $ netstat | grep vero
> tcp 180 0 ator.cs.wisc.edu:32785
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 0 0 ator.cs.wisc.edu:32784
> vero.cs.wisc.eduostgresql ESTABLISHED
> [koczan@ator] koczan $ ssh newton netstat | grep vero
> tcp 64260 0 newton.cs.wisc.edu:32778
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 0 0 newton.cs.wisc.edu:32777
> vero.cs.wisc.eduostgresql ESTABLISHED
>
> "notify interrupt" connections:
> [koczan@ator] koczan $ ssh brian netstat | grep vero
> tcp 0 0 brian.cs.wisc.edu:40365
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 77800 0 brian.cs.wisc.edu:40366
> vero.cs.wisc.eduostgresql ESTABLISHED
> [koczan@ator] koczan $ ssh zim netstat | grep vero
> tcp 73402 0 zim.cs.wisc.edu:32777
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 0 0 zim.cs.wisc.edu:32776
> vero.cs.wisc.eduostgresql ESTABLISHED
>
> and after said notifies:
>
> OK connections:
> [koczan@ator] koczan $ netstat | grep vero
> tcp 450 0 ator.cs.wisc.edu:32785
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 0 0 ator.cs.wisc.edu:32784
> vero.cs.wisc.eduostgresql ESTABLISHED
> [koczan@ator] koczan $ ssh newton netstat | grep vero
> tcp 64260 0 newton.cs.wisc.edu:32778
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 0 0 newton.cs.wisc.edu:32777
> vero.cs.wisc.eduostgresql ESTABLISHED
>
> "notify interrupt" connections:
> [koczan@ator] koczan $ ssh brian netstat | grep vero
> tcp 0 0 brian.cs.wisc.edu:40365
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 77800 0 brian.cs.wisc.edu:40366
> vero.cs.wisc.eduostgresql ESTABLISHED
> [koczan@ator] koczan $ ssh zim netstat | grep vero
> tcp 73402 0 zim.cs.wisc.edu:32777
> vero.cs.wisc.eduostgresql ESTABLISHED
> tcp 0 0 zim.cs.wisc.edu:32776
> vero.cs.wisc.eduostgresql ESTABLISHED
>
> And on the server side things get a little more interesting (Send-Q is the
> 3rd column)...
>
> OK:
> [koczan@vero] ~ $ netstat | grep ator
> tcp 0 0 vero.cs.wisc.eduostgresql ator.cs.wisc.edu:32785
> ESTABLISHED
> tcp 0 0 vero.cs.wisc.eduostgresql ator.cs.wisc.edu:32784
> ESTABLISHED
> [koczan@vero] ~ $ netstat | grep newton
> tcp 0 0 vero.cs.wisc.eduostgresql newton.cs.wisc.edu:32778
> ESTABLISHED
> tcp 0 0 vero.cs.wisc.eduostgresql newton.cs.wisc.edu:32777
> ESTABLISHED
>
> "notify interrupt":
> [koczan@vero] ~ $ netstat | grep brian
> tcp 0 0 vero.cs.wisc.eduostgresql brian.cs.wisc.edu:40365
> ESTABLISHED
> tcp 0 13032 vero.cs.wisc.eduostgresql brian.cs.wisc.edu:40366
> ESTABLISHED
> [koczan@vero] ~ $ netstat | grep zim
> tcp 0 13032 vero.cs.wisc.eduostgresql zim.cs.wisc.edu:32777
> ESTABLISHED
> tcp 0 0 vero.cs.wisc.eduostgresql zim.cs.wisc.edu:32776
> ESTABLISHED
>
> A quick perusal of the other "notify interrupt" connections shows 13032 in
> the Send-Q column. They all got into this state at the same time.
>
> Peter
>
> P.S. Thanks for the help, I really appreciate it.
>
> On 8/2/07, Tom Lane <tgl@sss.pgh.pa.us > wrote:
> >
> > Peter Koczan <pjkoczan@gmail.com > writes:
> > > Heikki Linnakangas wrote:
> > >> Does the client read the async notifies? The write in the server will
> > >> block if the client doesn't keep up with reading the data.
> > >>
> > > Well, the client updates it's GUI properly no matter whether the
> > > listening session is blocked or not (it's an ongoing tail of

> > requests).
> > > Overtly from the client side, it's completely transparent. Is there

> > any
> > > way I can tell definitively from the client side whether or not it's
> > > actually reading the async notifies?

> >
> > netstat's Recv-Q column would probably reflect it if the client is
> > failing to consume messages. The send queue on the server side would be
> > worth checking too.
> >
> > regards, tom lane
> >

>
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-10-2008, 11:00 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing,problems ensue

It looks to me that the server is trying to send stuff to the client,
but the client is not reading it for some reason. There's nothing in the
queues in the direction client->server.

What exactly is the client doing? How does it connect to the server,
with libpq? Is it possible that the client is just leaking the
connection, keeping it open but not using it for anything?

Peter Koczan wrote:
> Whoops, a couple quick changes. This:
> [koczan@ator] koczan $ ssh newton netstat | grep vero
> tcp 64260 0 newton.cs.wisc.edu:32778
> vero.cs.wisc.eduostgresqlESTABLISHED
> tcp 0 0 newton.cs.wisc.edu:32777
> vero.cs.wisc.eduostgresqlESTABLISHED
>
> should be this:
> [koczan@ator] koczan $ ssh newton netstat | grep vero
> tcp 64530 0 newton.cs.wisc.edu:32778
> vero.cs.wisc.eduostgresqlESTABLISHED
> tcp 0 0 newton.cs.wisc.edu:32777
> vero.cs.wisc.eduostgresqlESTABLISHED
>
> Also, the Send-Q column on the server side doesn't change, even after more
> async notifies. They may not have happened at the same time, but it's a bit
> perplexing that they all have the exact same amount of data in the queue.
>
> Peter
>
> On 8/2/07, Peter Koczan <pjkoczan@gmail.com> wrote:
>> I found something pretty interesting when running netstat's:
>>
>> Before a series of 3 async notifies (second column is Recv-Q):
>>
>> OK connections:
>> [koczan@ator] koczan $ netstat | grep vero
>> tcp 180 0 ator.cs.wisc.edu:32785
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 0 0 ator.cs.wisc.edu:32784
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> [koczan@ator] koczan $ ssh newton netstat | grep vero
>> tcp 64260 0 newton.cs.wisc.edu:32778
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 0 0 newton.cs.wisc.edu:32777
>> vero.cs.wisc.eduostgresql ESTABLISHED
>>
>> "notify interrupt" connections:
>> [koczan@ator] koczan $ ssh brian netstat | grep vero
>> tcp 0 0 brian.cs.wisc.edu:40365
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 77800 0 brian.cs.wisc.edu:40366
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> [koczan@ator] koczan $ ssh zim netstat | grep vero
>> tcp 73402 0 zim.cs.wisc.edu:32777
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 0 0 zim.cs.wisc.edu:32776
>> vero.cs.wisc.eduostgresql ESTABLISHED
>>
>> and after said notifies:
>>
>> OK connections:
>> [koczan@ator] koczan $ netstat | grep vero
>> tcp 450 0 ator.cs.wisc.edu:32785
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 0 0 ator.cs.wisc.edu:32784
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> [koczan@ator] koczan $ ssh newton netstat | grep vero
>> tcp 64260 0 newton.cs.wisc.edu:32778
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 0 0 newton.cs.wisc.edu:32777
>> vero.cs.wisc.eduostgresql ESTABLISHED
>>
>> "notify interrupt" connections:
>> [koczan@ator] koczan $ ssh brian netstat | grep vero
>> tcp 0 0 brian.cs.wisc.edu:40365
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 77800 0 brian.cs.wisc.edu:40366
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> [koczan@ator] koczan $ ssh zim netstat | grep vero
>> tcp 73402 0 zim.cs.wisc.edu:32777
>> vero.cs.wisc.eduostgresql ESTABLISHED
>> tcp 0 0 zim.cs.wisc.edu:32776
>> vero.cs.wisc.eduostgresql ESTABLISHED
>>
>> And on the server side things get a little more interesting (Send-Q is the
>> 3rd column)...
>>
>> OK:
>> [koczan@vero] ~ $ netstat | grep ator
>> tcp 0 0 vero.cs.wisc.eduostgresql ator.cs.wisc.edu:32785
>> ESTABLISHED
>> tcp 0 0 vero.cs.wisc.eduostgresql ator.cs.wisc.edu:32784
>> ESTABLISHED
>> [koczan@vero] ~ $ netstat | grep newton
>> tcp 0 0 vero.cs.wisc.eduostgresql newton.cs.wisc.edu:32778
>> ESTABLISHED
>> tcp 0 0 vero.cs.wisc.eduostgresql newton.cs.wisc.edu:32777
>> ESTABLISHED
>>
>> "notify interrupt":
>> [koczan@vero] ~ $ netstat | grep brian
>> tcp 0 0 vero.cs.wisc.eduostgresql brian.cs.wisc.edu:40365
>> ESTABLISHED
>> tcp 0 13032 vero.cs.wisc.eduostgresql brian.cs.wisc.edu:40366
>> ESTABLISHED
>> [koczan@vero] ~ $ netstat | grep zim
>> tcp 0 13032 vero.cs.wisc.eduostgresql zim.cs.wisc.edu:32777
>> ESTABLISHED
>> tcp 0 0 vero.cs.wisc.eduostgresql zim.cs.wisc.edu:32776
>> ESTABLISHED
>>
>> A quick perusal of the other "notify interrupt" connections shows 13032 in
>> the Send-Q column. They all got into this state at the same time.
>>
>> Peter
>>
>> P.S. Thanks for the help, I really appreciate it.
>>
>> On 8/2/07, Tom Lane <tgl@sss.pgh.pa.us > wrote:
>>> Peter Koczan <pjkoczan@gmail.com > writes:
>>>> Heikki Linnakangas wrote:
>>>>> Does the client read the async notifies? The write in the server will
>>>>> block if the client doesn't keep up with reading the data.
>>>>>
>>>> Well, the client updates it's GUI properly no matter whether the
>>>> listening session is blocked or not (it's an ongoing tail of
>>> requests).
>>>> Overtly from the client side, it's completely transparent. Is there
>>> any
>>>> way I can tell definitively from the client side whether or not it's
>>>> actually reading the async notifies?
>>> netstat's Recv-Q column would probably reflect it if the client is
>>> failing to consume messages. The send queue on the server side would be
>>> worth checking too.
>>>
>>> regards, tom lane
>>>

>>

>



--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-10-2008, 11:00 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing, problems ensue

"Peter Koczan" <pjkoczan@gmail.com> writes:
> A quick perusal of the other "notify interrupt" connections shows 13032 in
> the Send-Q column. They all got into this state at the same time.


That's far too much data to be a notify message, or even a small number
of notify messages. Is it possible that the client's failed to collect
hundreds of notify events?

But in any case this clearly lets the backend off the hook. The problem
is on the client side: it's not reading the data.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-10-2008, 11:00 AM
Peter Koczan
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing, problems ensue

Either it's not reading the notifies or it's not telling the server that
it's read them and flushing out the pipe.

This app uses Perl/DBD::Pg, and the code is basically:

while ($dbh->func('pg_notifies')) {
# do stuff
}

The corresponding C code in DBD::Pg appears to be:
/* ================================================== ================ */
SV * dbd_db_pg_notifies (dbh, imp_dbh)
SV *dbh;
imp_dbh_t *imp_dbh;
{
PGnotify *notify;
AV *ret;
SV *retsv;
int status;

if (dbis->debug >= 3) { (void)PerlIO_printf(DBILOGFP, "dbdpg:
dbd_db_pg_notifies\n"); }

status = PQconsumeInput(imp_dbh->conn);
if (0 == status) {
status = PQstatus(imp_dbh->conn);
pg_error(dbh, status, PQerrorMessage(imp_dbh->conn));
return &sv_undef;
}

notify = PQnotifies(imp_dbh->conn);

if (!notify)
return &sv_undef;

ret=newAV();

av_push(ret, newSVpv(notify->relname,0) );
av_push(ret, newSViv(notify->be_pid) );

#if PGLIBVERSION >= 70400
PQfreemem(notify);
#else
Safefree(notify);
#endif

retsv = newRV(sv_2mortal((SV*)ret));

return retsv;

} /* end of dbd_db_pg_notifies */


On 8/2/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> "Peter Koczan" <pjkoczan@gmail.com> writes:
> > A quick perusal of the other "notify interrupt" connections shows 13032

> in
> > the Send-Q column. They all got into this state at the same time.

>
> That's far too much data to be a notify message, or even a small number
> of notify messages. Is it possible that the client's failed to collect
> hundreds of notify events?
>
> But in any case this clearly lets the backend off the hook. The problem
> is on the client side: it's not reading the data.
>
> regards, tom lane
>


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-10-2008, 11:00 AM
Tom Lane
 
Posts: n/a
Default Re: BUG #3504: Some listening sessions never return from writing, problems ensue

"Peter Koczan" <pjkoczan@gmail.com> writes:
> This app uses Perl/DBD::Pg, and the code is basically:


> while ($dbh->func('pg_notifies')) {
> # do stuff
> }


If I'm not mistaken, that loop would process all the currently available
notify messages and then fall out. So the real question is what's
around this --- what conditions will make the program come back to this
loop periodically? Or not do so?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:29 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95