This is a discussion on Re: performance of bitmap scans in nested loop joins within the pgsql Hackers forums, part of the PostgreSQL category; --> "Sergey E. Koposov" <math@sai.msu.ru> writes: > On Fri, 29 Apr 2005, Tom Lane wrote: >> The latter is (or ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| "Sergey E. Koposov" <math@sai.msu.ru> writes: > On Fri, 29 Apr 2005, Tom Lane wrote: >> The latter is (or should be) doing slightly *less* work, so why is it >> taking almost twice as much time? Can you get gprof profiles of the >> two cases? > I've got them. I attached two files with a little bit truncated output of > gprof. I figured out part of the problem: I had made nodeBitmapIndexscan re-open the index on each call, thinking that that would save amrescan calls. But an amrescan is a whole lot cheaper than index open/close, so that was a bad tradeoff. This seems to account for about half of the slowdown you exhibited. I'm not sure where the other half went ... gprof isn't turning up any obvious candidates. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) |
| |||
| On Wed, 4 May 2005, Tom Lane wrote: > > I figured out part of the problem: I had made nodeBitmapIndexscan > re-open the index on each call, thinking that that would save amrescan > calls. But an amrescan is a whole lot cheaper than index open/close, > so that was a bad tradeoff. > > This seems to account for about half of the slowdown you exhibited. > I'm not sure where the other half went ... gprof isn't turning up > any obvious candidates. > Great!!! Thanks. Now I have done again the profiling. Really, now the bitmap index scan is faster and take 90 seconds instead of 120 seconds. But it still more than the sum of two simple index scans (26+38 secs). I also tried to help you, Tom, to find the exact location of the remaining problem. For that purpose I did the profiling of those queries: test=# explain analyze select * from q3c,q3c as q3cs where q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3; test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993); test=# explain analyze select * from q3c,q3c as q3cs where (q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993); And I coadded the "flat profiles" of first two (index scan) queries and compared it with the flat profile of bitmap scan: self seconds | self seconds Function of TWO index scans | of bitmap scan xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx xxx LWLockAcquire 51.29 46.29 LWLockRelease 40.29 38.65 _bt_compare 38.35 36.20 hash_seq_search 0.00 30.08 hash_search 26.90 29.32 AllocSetAlloc 10.73 27.23 btint8cmp 30.95 24.05 PinBuffer 23.25 22.56 hash_any 18.58 20.18 _bt_binsrch 19.73 19.09 UnpinBuffer 18.27 17.44 _bt_first 13.92 16.20 FunctionCall2 18.23 14.62 _bt_preprocess_keys 12.01 13.67 _bt_moveright 13.42 11.87 InstrStopNode 7.62 11.24 seg_alloc 0.00 11.13 ReadBuffer 11.42 10.68 InstrEndLoop 4.11 10.38 ExecProcNode 14.60 10.17 fmgr_info_cxt_security 13.87 9.96 _bt_search 10.41 9.80 AllocSetReset 9.70 9.47 AllocSetFree 4.00 7.75 _bt_getroot 7.54 7.35 LockBuffer 9.32 7.07 MemoryContextAlloc 3.22 6.93 ExecMakeFunctionResultNoSets 5.95 6.85 element_alloc 0.12 6.73 _bt_checkkeys 8.68 6.07 slot_deform_tuple 7.71 5.77 ExecReScan 2.89 5.67 MemoryContextAllocZero 0.00 5.43 ReleaseAndReadBuffer 6.20 5.00 ExecEvalVar 4.02 4.80 ReleaseBuffer 5.39 4.28 ExecScan 4.39 4.11 ExecutorRun 1.64 4.08 slot_getattr 4.80 4.00 _bt_relandgetbuf 4.31 3.70 MemoryContextCreate 0.00 3.54 The strongest difference is in hash_seq_search (30 seconds) and AllocSetAlloc -- 17 seconds. (which contribute a significant part into slowness of bitmap scan) (I remind that in my case bitmap scan is slower than 2 index scans by ~ 20-30seconds (34 seconds when postgres was compiled with profiling support, 25 seconds when postgres was compiled without profiling support)). Since I don't know exactly the normal sequence of function calls in postgres, I cannot judge whether those functions are true reason of the problem or not, but probably that information can help. In any case, again I have put all the full profiling results on the web http://lnfm1.sai.msu.ru/~math/public...idxscan1.gprof profiling of test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993); http://lnfm1.sai.msu.ru/~math/public...idxscan1.gprof profiling of test=# explain analyze select * from q3c,q3c as q3cs where q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3; http://lnfm1.sai.msu.ru/~math/public_misc/bitmap1.gprof profiling of test=# explain analyze select * from q3c,q3c as q3cs where (q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993); Here, this is just the new output of explain analyze (for the postgres compiled without profiling support) test=# explain analyze select * from q3c,q3c as q3cs where (q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3) OR (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=5832.03..190281569757.46 rows=1888909037091 width=96) (actual time=0.173..88500.469 rows=3000000 loops=1) -> Seq Scan on q3c q3cs (cost=0.00..60928.16 rows=3000016 width=48) (actual time=0.072..3750.724 rows=3000000 loops=1) -> Bitmap Heap Scan on q3c (cost=5832.03..43426.73 rows=666670 width=48) (actual time=0.021..0.022 rows=1 loops=3000000) Recheck Cond: (((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3))) OR ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993)))) -> BitmapOr (cost=5832.03..5832.03 rows=666670 width=0) (actual time=0.017..0.017 rows=0 loops=3000000) -> Bitmap Index Scan on ipix_idx (cost=0.00..2916.02 rows=333335 width=0) (actual time=0.008..0.008 rows=1 loops=3000000) Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3))) -> Bitmap Index Scan on ipix_idx (cost=0.00..2916.02 rows=333335 width=0) (actual time=0.006..0.006 rows=0 loops=3000000) Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993))) Total runtime: 90241.180 ms test=# EXPLAIN ANALYZE SELECT * FROM q3c,q3c as q3cs WHERE test-# (q3c.ipix>=q3cs.ipix-1000 AND q3c.ipix<=q3cs.ipix-993); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.01..49059519577.21 rows=1000010666695 width=96) (actual time=26391.542..26391.542 rows=0 loops=1) -> Seq Scan on q3c q3cs (cost=0.00..60928.16 rows=3000016 width=48) (actual time=0.059..3393.616 rows=3000000 loops=1) -> Index Scan using ipix_idx on q3c (cost=0.01..9686.37 rows=333335 width=48) (actual time=0.006..0.006 rows=0 loops=3000000) Index Cond: ((q3c.ipix >= ("outer".ipix - 1000)) AND (q3c.ipix <= ("outer".ipix - 993))) Total runtime: 26391.623 ms (5 rows) test=# explain analyze select * from q3c,q3c as q3cs where q3c.ipix>=q3cs.ipix-3 AND q3c.ipix<=q3cs.ipix+3; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.01..49059519577.21 rows=1000010666695 width=96) (actual time=0.108..37355.388 rows=3000000 loops=1) -> Seq Scan on q3c q3cs (cost=0.00..60928.16 rows=3000016 width=48) (actual time=0.039..3403.276 rows=3000000 loops=1) -> Index Scan using ipix_idx on q3c (cost=0.01..9686.37 rows=333335 width=48) (actual time=0.007..0.008 rows=1 loops=3000000) Index Cond: ((q3c.ipix >= ("outer".ipix - 3)) AND (q3c.ipix <= ("outer".ipix + 3))) Total runtime: 38588.672 ms (5 rows) With best regards, Sergey Koposov ---------------------------(end of broadcast)--------------------------- TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org |
| |||
| "Sergey E. Koposov" <math@sai.msu.ru> writes: > And I coadded the "flat profiles" of first two (index scan) queries and > compared it with the flat profile of bitmap scan: Thanks, I had been thinking of doing that same calculation but hadn't got round to it yet. It looks like the bitmap case is actually a little ahead on buffer access (as you'd expect) and btree work (which is surprising because it ought to be dead even; are these numbers very repeatable?). Where we are losing is mostly on the actual manipulation of the bitmaps (particularly hash_seq_search which is done in tbm_begin_iterate; and it looks like memory allocation for the bitmap hashtables is nontrivial too). I had already had a TODO item to look into speeding up hash_seq_search ... will see what I can find. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 6: Have you searched our list archives? http://archives.postgresql.org |
| ||||
| On Thu, 5 May 2005, Tom Lane wrote: > "Sergey E. Koposov" <math@sai.msu.ru> writes: > > And I coadded the "flat profiles" of first two (index scan) queries and > > compared it with the flat profile of bitmap scan: > > Thanks, I had been thinking of doing that same calculation but hadn't > got round to it yet. It looks like the bitmap case is actually a little > ahead on buffer access (as you'd expect) and btree work (which is > surprising because it ought to be dead even; are these numbers very > repeatable?). Where we are losing is mostly on the actual manipulation Yes, all those timings are rather stable and have been obtained on the very unloaded system (and in fully cached regime). With Best regards, Sergey Koposov ---------------------------(end of broadcast)--------------------------- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |