Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-29-2008, 08:31 PM
Tom Lane
 
Posts: n/a
Default Recent buildfarm failures involving statement_timeout

The buildfarm has shown several intermittent failures recently:

http://www.pgbuildfarm.org/cgi-bin/s...-20%2000:06:01
http://www.pgbuildfarm.org/cgi-bin/s...-26%2004:15:02
http://www.pgbuildfarm.org/cgi-bin/s...-27%2000:06:01

all with the same signature:

*** ./expected/prepared_xacts.out Sun Apr 27 02:06:33 2008
--- ./results/prepared_xacts.out Sun Apr 27 02:10:19 2008
***************
*** 161,166 ****
--- 161,167 ----
SELECT * FROM pxtest3;
ERROR: canceling statement due to statement timeout
reset statement_timeout;
+ ERROR: canceling statement due to statement timeout
-- Disconnect, we will continue testing in a different backend
\c -
-- There should still be two prepared transactions

which so far as I recall is not a failure we've seen in the past
(though we still don't have buildfarm infrastructure that makes it
easy to confirm or refute that opinion).

One explanation for this is that these buildfarm machines were so
heavily overloaded that it really did take more than 1 sec to execute
"reset statement_timeout". That's not very plausible, though,
especially not considering that both machines logged session times
(via log_disconnect) that don't seem very far out of line.

Another theory is that we broke something recently, but I see no obvious
candidates in the CVS logs --- and I've spent the evening running 488
cycles of the parallel regression tests here, with no error.

Anyone have a theory?

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-29-2008, 08:31 PM
Gregory Stark
 
Posts: n/a
Default Re: Recent buildfarm failures involving statement_timeout

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Another theory is that we broke something recently, but I see no obvious
> candidates in the CVS logs --- and I've spent the evening running 488
> cycles of the parallel regression tests here, with no error.


It looks to me like a psql bug rather than a server bug. Psql has some hairy
logic to try to remember whether it has handled an error return yet or not and
I had a devil of a time trying to keep it working with the concurrent psql
patch. One of the failure modes I saw a lot was handling an error twice like
this.

It's possible it's a race condition where it only happens if psql gets the
error at a certain point, such as while fetching the results as opposed to
while the execute is pending. However IIRC the logic without concurrent psql
is actually fairly straightforward and there are no windows like this. Unless
it's actually in libpq and not psql. Hm.

Perhaps Bruce's terminate patch wasn't completely reverted and there was a
flag somewhere which isn't being reset properly?

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's Slony Replication support!

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-29-2008, 08:31 PM
Andrew Dunstan
 
Posts: n/a
Default Re: Recent buildfarm failures involving statement_timeout



Tom Lane wrote:
> The buildfarm has shown several intermittent failures recently:
>
> http://www.pgbuildfarm.org/cgi-bin/s...-20%2000:06:01
> http://www.pgbuildfarm.org/cgi-bin/s...-26%2004:15:02
> http://www.pgbuildfarm.org/cgi-bin/s...-27%2000:06:01
>
> all with the same signature:
>
> *** ./expected/prepared_xacts.out Sun Apr 27 02:06:33 2008
> --- ./results/prepared_xacts.out Sun Apr 27 02:10:19 2008
> ***************
> *** 161,166 ****
> --- 161,167 ----
> SELECT * FROM pxtest3;
> ERROR: canceling statement due to statement timeout
> reset statement_timeout;
> + ERROR: canceling statement due to statement timeout
> -- Disconnect, we will continue testing in a different backend
> \c -
> -- There should still be two prepared transactions
>
> which so far as I recall is not a failure we've seen in the past
> (though we still don't have buildfarm infrastructure that makes it
> easy to confirm or refute that opinion).
>


pgbfprod=# select sysname, snapshot from build_status_log where branch =
'HEAD' and log_stage = 'check.log' and log_text ~ $$\+ ERROR: canceling
statement due to statement timeout$$;
sysname | snapshot
---------+---------------------
fennec | 2008-03-06 01:10:02
heron | 2008-03-11 18:06:01
heron | 2008-03-11 23:06:01
heron | 2008-04-20 00:06:01
cobra | 2008-04-26 04:15:02
heron | 2008-04-27 00:06:01
(6 rows)


So, nothing before March.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-29-2008, 08:31 PM
Tom Lane
 
Posts: n/a
Default Re: Recent buildfarm failures involving statement_timeout

Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Another theory is that we broke something recently, but I see no obvious
>> candidates in the CVS logs --- and I've spent the evening running 488
>> cycles of the parallel regression tests here, with no error.


> It looks to me like a psql bug rather than a server bug.


Interesting theory, but it doesn't fit the facts: the extra error is in
the server log too.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-29-2008, 08:31 PM
Tom Lane
 
Posts: n/a
Default Re: Recent buildfarm failures involving statement_timeout

Andrew Dunstan <andrew@dunslane.net> writes:
> pgbfprod=# select sysname, snapshot from build_status_log where branch =
> 'HEAD' and log_stage = 'check.log' and log_text ~ $$\+ ERROR: canceling
> statement due to statement timeout$$;
> sysname | snapshot
> ---------+---------------------
> fennec | 2008-03-06 01:10:02
> heron | 2008-03-11 18:06:01
> heron | 2008-03-11 23:06:01
> heron | 2008-04-20 00:06:01
> cobra | 2008-04-26 04:15:02
> heron | 2008-04-27 00:06:01
> (6 rows)


Hmm, were there any matches in non-HEAD tests?

The log_disconnect records in these tests all seem to indicate that the
"reset" really did take a full second --- the total session time is
2-seconds-plus rather than 1-second-plus as in a successful run.

Since prepared_xacts runs as one member of a 20-way parallel test,
I guess it's not out of the question that the machine might sometimes
be loaded to the point where that backend doesn't get to run for awhile.
This would be particularly true if the kernel scheduler is one that
penalizes processes that it's decided are "background" processes ---
which might explain why certain platforms seem to show this more than
others.

Also, I trolled the CVS logs from 2008-03-06 back to the 8.3 branch point
and couldn't find anything that seemed likely to correspond to a large
performance dropoff.

So I'm kind of inclined to write this off as a testing glitch and "fix"
it by bumping the timeout up to 2 seconds. But it would be more
comforting if we'd seen it in back branches, or could spot some reason
why the test would be more likely to fail this way than in the past.
The prepared_xacts test itself hasn't changed materially since it was
put in...

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-29-2008, 08:31 PM
Andrew Dunstan
 
Posts: n/a
Default Re: Recent buildfarm failures involving statement_timeout



Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>
>> pgbfprod=# select sysname, snapshot from build_status_log where branch =
>> 'HEAD' and log_stage = 'check.log' and log_text ~ $$\+ ERROR: canceling
>> statement due to statement timeout$$;
>> sysname | snapshot
>> ---------+---------------------
>> fennec | 2008-03-06 01:10:02
>> heron | 2008-03-11 18:06:01
>> heron | 2008-03-11 23:06:01
>> heron | 2008-04-20 00:06:01
>> cobra | 2008-04-26 04:15:02
>> heron | 2008-04-27 00:06:01
>> (6 rows)
>>

>
> Hmm, were there any matches in non-HEAD tests?
>


sysname | snapshot | branch

---------+---------------------+---------------
lionfish | 2007-04-19 09:30:27 | REL8_2_STABLE
lionfish | 2007-05-29 23:30:07 | REL8_1_STABLE
lionfish | 2007-09-22 23:30:07 | REL8_1_STABLE



cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-29-2008, 08:31 PM
Stefan Kaltenbrunner
 
Posts: n/a
Default Re: Recent buildfarm failures involving statement_timeout

Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>
>>> pgbfprod=# select sysname, snapshot from build_status_log where
>>> branch = 'HEAD' and log_stage = 'check.log' and log_text ~ $$\+
>>> ERROR: canceling statement due to statement timeout$$;
>>> sysname | snapshot ---------+---------------------
>>> fennec | 2008-03-06 01:10:02
>>> heron | 2008-03-11 18:06:01
>>> heron | 2008-03-11 23:06:01
>>> heron | 2008-04-20 00:06:01
>>> cobra | 2008-04-26 04:15:02
>>> heron | 2008-04-27 00:06:01
>>> (6 rows)
>>>

>>
>> Hmm, were there any matches in non-HEAD tests?
>>

>
> sysname | snapshot | branch
> ---------+---------------------+---------------
> lionfish | 2007-04-19 09:30:27 | REL8_2_STABLE
> lionfish | 2007-05-29 23:30:07 | REL8_1_STABLE
> lionfish | 2007-09-22 23:30:07 | REL8_1_STABLE


lionfish is(or rather was - powersupply got fried and I have no
replacement yet) one of the slowest machines(250MHZ mipsel,48MB RAM) on
the buildfarm (took about 5-6h for a complete run).


Stefan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-29-2008, 08:31 PM
Tom Lane
 
Posts: n/a
Default Re: Recent buildfarm failures involving statement_timeout

Stefan Kaltenbrunner <stefan@kaltenbrunner.cc> writes:
>> sysname | snapshot | branch
>> ---------+---------------------+---------------
>> lionfish | 2007-04-19 09:30:27 | REL8_2_STABLE
>> lionfish | 2007-05-29 23:30:07 | REL8_1_STABLE
>> lionfish | 2007-09-22 23:30:07 | REL8_1_STABLE


> lionfish is(or rather was - powersupply got fried and I have no
> replacement yet) one of the slowest machines(250MHZ mipsel,48MB RAM) on
> the buildfarm (took about 5-6h for a complete run).


Unless someone has a better idea, I suggest we just raise the timeouts
used in prepared_xacts from 1 sec to 2 sec. That should reduce the
recurrence rate to something vanishingly small if the problem is just
"it's too slow". If it's that something's actually getting stuck, then
we'll continue to see failures and will know that closer investigation
is needed.

regards, tom lane

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On



All times are GMT. The time now is 06:40 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
Search Engine Optimization by vBSEO 3.1.0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145