vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have a benchmark test which runs a query very slowly under certain circumstances. I used Ethereal to capture the packet traffic, and also enabled debug5 logging out of curiousity. While the slow query is in progress, there is no log or packet activity, but the cpu is busy. These packets are below; look for SLOW PACKETS HERE to skip all this explanatory drudge. This WHERE clause is fast under all conditions: ... AND (val.data = $2) AND ... This WHERE clause is fast as a simple query, but is excruciatingly slow as prepare / execute / fetch: ... AND (val.data > $2 AND val.data < $3) AND ... My test program is in Perl and uses DBI/DBD::Pg. Postgresql version is 8.0.3 on a dual core dual opteron with 2G of RAM. DBI is version 1.48. DBD::Pg is version 1.42. The OS is rPath Linux 2.6.15. The test runs each SQL statement three times, first as a simple query to preload caches, then as prepare / execute / fetch, and lastly as a simple query again. $sth = $dbh->prepare(sql_with_placeholders); $dbh->selectall_arrayref(sql_with_values_substituted) ; $sth->execute(@values); $sth->fetchall_arrayref(); $dbh->selectall_arrayref(sql_with_values_substituted) ; I captured packet traffic and tailed the log while these were running. Everything is fine except one query, which took 75 seconds to run, when the others took 3 milliseconds. During this 75 seconds, the postmaster log showed no activity, but top showed the postmaster quite busy. 75 seconds! That's an eternity. I can't imagine any circumstances where it makes sense. EXPLAIN ANALYZE doesn't show the slow timing because it requires values, not $n placeholders, and it is the prepare / execute operation which is so slow. I will be glad to send the log, the packet capture file, the test program itself, and anything else which helps. Here are the EXPLAIN statements in case it helps. EXPLAIN for the equality WHERE clause: felix=> explain analyze SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = 'x6ATArB_k1cgLp1mD5x2nzVVf2DQw4Lw1-Ow5NCzzs5Pupg6K' AND key.id = glue.key) AND (val.data = 357354306) AND val.id = glue.val AND glue.sid = sid.id; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop (cost=5.82..1119.29 rows=1 width=60) (actual time=2.271..36.184 rows=1 loops=1) -> Hash Join (cost=5.82..1116.27 rows=1 width=16) (actual time=2.079..35.976 rows=1 loops=1) Hash Cond: ("outer"."key" = "inner".id) -> Nested Loop (cost=0.00..1105.43 rows=1001 width=16) (actual time=0.315..31.820 rows=1000 loops=1) -> Index Scan using val_data_key on val (cost=0.00..6.01 rows=1 width=4) (actual time=0.119..0.123 rows=1 loops=1) Index Cond: (data = 357354306) -> Index Scan using glue_val_idx on glue (cost=0.00..702.58 rows=31747 width=16) (actual time=0.181..24.438 rows=1000 loops=1) Index Cond: ("outer".id = glue.val) -> Hash (cost=5.82..5.82 rows=1 width=4) (actual time=0.292..0.292 rows=0 loops=1) -> Index Scan using key_data_key on "key" (cost=0.00..5.82 rows=1 width=4) (actual time=0.266..0.271 rows=1 loops=1) Index Cond: (data = 'x6ATArB_k1cgLp1mD5x2nzVVf2DQw4Lw1-Ow5NCzzs5Pupg6K'::text) -> Index Scan using sid_pkey on sid (cost=0.00..3.01 rows=1 width=52) (actual time=0.179..0.183 rows=1 loops=1) Index Cond: ("outer".sid = sid.id) Total runtime: 37.880 ms (14 rows) EXPLAIN for the range WHERE clause: felix=> explain analyze SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = 'kOSkZ5iN6sz-KqGo51aTwqZnvCKQRUH2SZ8k' AND key.id = glue.key) AND (val.data > 183722006 AND val.data < 183722206) AND val.id = glue.val AND glue.sid = sid.id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5.82..1119.30 rows=1 width=60) (actual time=15.016..15.525 rows=1 loops=1) -> Hash Join (cost=5.82..1116.27 rows=1 width=16) (actual time=14.879..15.374 rows=1 loops=1) Hash Cond: ("outer"."key" = "inner".id) -> Nested Loop (cost=0.00..1105.43 rows=1001 width=16) (actual time=0.211..11.666 rows=1000 loops=1) -> Index Scan using val_data_key on val (cost=0.00..6.01 rows=1 width=4) (actual time=0.071..0.076 rows=1 loops=1) Index Cond: ((data > 183722006) AND (data < 183722206)) -> Index Scan using glue_val_idx on glue (cost=0.00..702.58 rows=31747 width=16) (actual time=0.124..4.311 rows=1000 loops=1) Index Cond: ("outer".id = glue.val) -> Hash (cost=5.82..5.82 rows=1 width=4) (actual time=0.119..0.119 rows=0 loops=1) -> Index Scan using key_data_key on "key" (cost=0.00..5.82 rows=1 width=4) (actual time=0.101..0.105 rows=1 loops=1) Index Cond: (data = 'kOSkZ5iN6sz-KqGo51aTwqZnvCKQRUH2SZ8k'::text) -> Index Scan using sid_pkey on sid (cost=0.00..3.01 rows=1 width=52) (actual time=0.123..0.127 rows=1 loops=1) Index Cond: ("outer".sid = sid.id) Total runtime: 15.620 ms (14 rows) Here are the relevant packets. The program had pauses inserted so I could take notes and more easily distinguish which packets went with which operations, thus the packet times are only relevant within each operation. The database is on a test machine which was doing nothing else during this test. First the simple query to preload the cache. It takes 120 milliseconds. This is the equality WHERE clause. Packet 1 Time 0.000000 Type: Simple query Query: SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = 'x6ATArB_k1cgLp1mD5x2nzVVf2DQw4Lw1-Ow5NCzzs5Pupg6K' AND key.id = glue.key) AND (val.data = 357354306) AND val.id = glue.val AND glue.sid = sid.id Packet 3 Time 0.120243 Type: Data row Data: 4551474B5473374B744B64475A52676D694B3841476F4152.. . Data: 3738 Data: 3339393435 Data: 3339393435303738 Then as prepare / execute / fetch. It takes 3 milliseconds. Packet 5 Time 29.986702 Type: Parse Query: SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = $1 AND key.id = glue.key) AND (val.data = $2) AND val.id = glue.val AND glue.sid = sid.id Packet 7 Time 29.987439 Type: Parse completion Packet 9 Time 29.987509 Type: Bind Data: 783641544172425F6B3163674C70316D443578326E7A5656.. . (x6ATArB_...) Data: 333537333534333036 (357354306) Packet 10 Time 29.989138 Type: Bind completion Type: Data row Data: 4551474B5473374B744B64475A52676D694B3841476F4152.. . Data: 3738 Data: 3339393435 Data: 3339393435303738 And finally a repeat as a simple query, for timing with cached data. This also takes 3 milliseconds. Packet 12 Time 70.139935 Type: Simple query Query: SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = 'x6ATArB_k1cgLp1mD5x2nzVVf2DQw4Lw1-Ow5NCzzs5Pupg6K' AND key.id = glue.key) AND (val.data = 357354306) AND val.id = glue.val AND glue.sid = sid.id Packet 13 Time 70.142515 Type: Data row Data: 4551474B5473374B744B64475A52676D694B3841476F4152.. . Data: 3738 Data: 3339393435 Data: 3339393435303738 So far so good. Then I switched to the range WHERE clause, and things don't look right. ================ SLOW PACKETS HERE ================ First, the simple query to preload the cache. It takes 117 milliseconds. Packet 18 Time 100.342829 Type: Simple query Query: SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = 'kOSkZ5iN6sz-KqGo51aTwqZnvCKQRUH2SZ8k' AND key.id = glue.key) AND (val.data > 183722006 AND val.data < 183722206) AND val.id = glue.val AND glue.sid = sid.id Packet 20 Time 100.460555 Type: Data row Data: 4E396538585A3338455A35574642786A5339347266515161.. . Data: 393637 Data: 3539373539 Data: 3539373539393637 Then the prepare / execute / fetch query, which takes 75 seconds. Packet 22 Time 119.563438 Type: Parse Query: SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = $1 AND key.id = glue.key) AND (val.data > $2 AND val.data < $3) AND val.id = glue.val AND glue.sid = sid.id Packet 24 Time 119.564242 Type: Parse completion Packet 26 Time 119.564306 Type: Bind Data: 6B4F536B5A35694E36737A2D4B71476F3531615477715A6E.. . Data: 313833373232303036 Data: 313833373232323036 Packet 28 Time 194.052914 Type: Bind completion Type: Data row Data: 4E396538585A3338455A35574642786A5339347266515161.. . Data: 393637 Data: 3539373539 Data: 3539373539393637 And finally the simple query again, which takes 3 milliseconds. Packet 30 Time 219.762523 Type: Simple query Query: SELECT sid.data, glue.key, glue.val, glue.sid FROM key, val, sid, glue WHERE (key.data = 'kOSkZ5iN6sz-KqGo51aTwqZnvCKQRUH2SZ8k' AND key.id = glue.key) AND (val.data > 183722006 AND val.data < 183722206) AND val.id = glue.val AND glue.sid = sid.id Packet 32 Time 219.765267 Type: Data row Data: 4E396538585A3338455A35574642786A5339347266515161.. . Data: 393637 Data: 3539373539 Data: 3539373539393637 -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| felix-accts-pgsql@crowfix.com writes: > EXPLAIN ANALYZE doesn't show the slow timing > because it requires values, not $n placeholders, To analyze the plan used for a parameterized query, try PREPARE foo(...) AS SELECT ... $n ... EXPLAIN ANALYZE EXECUTE foo(...) But I already know what you're going to find: the planner's estimates for the range query are not going to be very good when it has no idea what the range bounds are. This is a situation where it may be best to absorb the hit of re-planning each time instead of using a generic parameterized plan. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| On Sun, Apr 16, 2006 at 04:32:25PM -0400, Tom Lane wrote: > To analyze the plan used for a parameterized query, try > > PREPARE foo(...) AS SELECT ... $n ... > > EXPLAIN ANALYZE EXECUTE foo(...) > > But I already know what you're going to find: the planner's estimates > for the range query are not going to be very good when it has no idea > what the range bounds are. This is a situation where it may be best > to absorb the hit of re-planning each time instead of using a generic > parameterized plan. I will try this Monday, but isn't 75 seconds an awful long time? It almost seems like even the worst plan could find records faster than that, and if it were actually scanning everything sequentially, there would be a fair amount of variation, say 25 seconds, 50 seconds, 100 seconds. The most I have seen is a range of, say, 75-77. That just seems way too slow. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| ||||
| On Sun, Apr 16, 2006 at 04:32:25PM -0400, Tom Lane wrote: > felix-accts-pgsql@crowfix.com writes: > > EXPLAIN ANALYZE doesn't show the slow timing > > because it requires values, not $n placeholders, > > To analyze the plan used for a parameterized query, try > > PREPARE foo(...) AS SELECT ... $n ... > > EXPLAIN ANALYZE EXECUTE foo(...) > > But I already know what you're going to find: the planner's estimates > for the range query are not going to be very good when it has no idea > what the range bounds are. This is a situation where it may be best > to absorb the hit of re-planning each time instead of using a generic > parameterized plan. OK, here is the new explain analyze. I eliminated cache effects by dumping the tables and picking random values with an editor. felix=> PREPARE foo(TEXT, INT, INT) AS SELECT s.data, g.key, g.val, g.sid FROM key k, val_int v, sid s, glue_int g WHERE (k.data = $1 AND k.id = g.key) AND (v.data >= $2 AND v.data <= $3) AND v.id = g.val AND g.sid = s.id; PREPARE felix=> explain analyze execute foo('mthNQFrmVs3Q4bVruCxIAGy', 1973028023, 1973028223); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1380.11..404223.36 rows=499 width=60) (actual time=5785.012..77823.688 rows=1 loops=1) -> Hash Join (cost=1380.11..402713.38 rows=499 width=16) (actual time=5766.308..77804.969 rows=1 loops=1) Hash Cond: ("outer".val = "inner".id) -> Nested Loop (cost=0.00..400829.78 rows=99701 width=16) (actual time=115.154..77401.159 rows=100000 loops=1) -> Index Scan using key_data_key on "key" k (cost=0.00..5.82 rows=1 width=4) (actual time=0.125..0.132 rows=1 loops=1) Index Cond: (data = $1) -> Index Scan using glue_int_key_idx on glue_int g (cost=0.00..399577.70 rows=99701 width=16) (actual time=115.011..76570.366 rows=100000 loops=1) Index Cond: ("outer".id = g."key") -> Hash (cost=1378.86..1378.86 rows=500 width=4) (actual time=11.580..11.580 rows=0 loops=1) -> Index Scan using val_int_data_key on val_int v (cost=0.00..1378.86 rows=500 width=4) (actual time=11.556..11.561 rows=1 loops=1) Index Cond: ((data >= $2) AND (data <= $3)) -> Index Scan using sid_pkey on sid s (cost=0.00..3.01 rows=1 width=52) (actual time=18.682..18.687 rows=1 loops=1) Index Cond: ("outer".sid = s.id) Total runtime: 77823.897 ms (14 rows) A repeat shows it faster, from 77 seconds to 3. felix=> explain analyze execute foo('mthNQFrmVs3Q4bVruCxIAGy', 1973028023, 1973028223); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1380.11..404223.36 rows=499 width=60) (actual time=205.137..2931.899 rows=1 loops=1) -> Hash Join (cost=1380.11..402713.38 rows=499 width=16) (actual time=205.056..2931.803 rows=1 loops=1) Hash Cond: ("outer".val = "inner".id) -> Nested Loop (cost=0.00..400829.78 rows=99701 width=16) (actual time=0.148..2564.255 rows=100000 loops=1) -> Index Scan using key_data_key on "key" k (cost=0.00..5.82 rows=1 width=4) (actual time=0.031..0.039 rows=1 loops=1) Index Cond: (data = $1) -> Index Scan using glue_int_key_idx on glue_int g (cost=0.00..399577.70 rows=99701 width=16) (actual time=0.105..1808.068 rows=100000 loops=1) Index Cond: ("outer".id = g."key") -> Hash (cost=1378.86..1378.86 rows=500 width=4) (actual time=0.090..0.090 rows=0 loops=1) -> Index Scan using val_int_data_key on val_int v (cost=0.00..1378.86 rows=500 width=4) (actual time=0.074..0.080 rows=1 loops=1) Index Cond: ((data >= $2) AND (data <= $3)) -> Index Scan using sid_pkey on sid s (cost=0.00..3.01 rows=1 width=52) (actual time=0.061..0.066 rows=1 loops=1) Index Cond: ("outer".sid = s.id) Total runtime: 2932.013 ms (14 rows) And running it as a simple query shows it much faster, 72 ms. felix=> EXPLAIN ANALYZE SELECT s.data, g.key, g.val, g.sid FROM key k, val_int v, sid s, glue_int g WHERE (k.data = 'mthNQFrmVs3Q4bVruCxIAGy' AND k.id = g.key) AND (v.data >= 1973028023 AND v.data <= 1973028223) AND v.id = g.val AND g.sid = s.id; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5.82..1162.41 rows=1 width=60) (actual time=53.341..71.996 rows=1 loops=1) -> Hash Join (cost=5.82..1159.38 rows=1 width=16) (actual time=53.280..71.921 rows=1 loops=1) Hash Cond: ("outer"."key" = "inner".id) -> Nested Loop (cost=0.00..1148.54 rows=1001 width=16) (actual time=40.525..67.962 rows=1000 loops=1) -> Index Scan using val_int_data_key on val_int v (cost=0.00..6.01 rows=1 width=4) (actual time=0.068..0.072 rows=1 loops=1) Index Cond: ((data >= 1973028023) AND (data <= 1973028223)) -> Index Scan using glue_int_val_idx on glue_int g (cost=0.00..730.26 rows=32982 width=16) (actual time=40.441..60.218 rows=1000 loops=1) Index Cond: ("outer".id = g.val) -> Hash (cost=5.82..5.82 rows=1 width=4) (actual time=0.067..0.067 rows=0 loops=1) -> Index Scan using key_data_key on "key" k (cost=0.00..5.82 rows=1 width=4) (actual time=0.049..0.054 rows=1 loops=1) Index Cond: (data = 'mthNQFrmVs3Q4bVruCxIAGy'::text) -> Index Scan using sid_pkey on sid s (cost=0.00..3.01 rows=1 width=52) (actual time=0.048..0.052 rows=1 loops=1) Index Cond: ("outer".sid = s.id) Total runtime: 72.099 ms (14 rows) But the plan has also changed, from outer.val to outer.key. Since the key table is 100 times smaller than the val table, I can see why one executes so much faster. But is 77 seconds reasonable? It just seems incredibly slow. -- ... _._. ._ ._. . _._. ._. ___ .__ ._. . .__. ._ .. ._. Felix Finch: scarecrow repairman & rocket surgeon / felix@crowfix.com GPG = E987 4493 C860 246C 3B1E 6477 7838 76E9 182E 8151 ITAR license #4933 I've found a solution to Fermat's Last Theorem but I see I've run out of room o ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |