This is a discussion on Horribly slow query/ sequential scan within the Pgsql Performance forums, part of the PostgreSQL category; --> This is a query migrated from postgres. In postgres it runs about 10,000 times *slower* than on informix on ...
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| This is a query migrated from postgres. In postgres it runs about 10,000 times *slower* than on informix on somewhat newer hardware. The problem is entirely due to the planner. This PostgreSQL 8.1.4 on linux, 2 gigs of ram. The table: Table "reporting.bill_rpt_work" Column | Type | Modifiers ---------------+-----------------------+----------- report_id | integer | client_id | character varying(10) | contract_id | integer | not null rate | numeric | not null appid | character varying(10) | not null userid | text | not null collection_id | integer | not null client_name | character varying(60) | use_sius | integer | not null is_subscribed | integer | not null hits | numeric | not null sius | numeric | not null total_amnt | numeric | not null royalty_total | numeric | Indexes: "billrptw_ndx" UNIQUE, btree (report_id, client_id, contract_id, rate, appid, userid, collection_id) "billrpt_cntrct_ndx" btree (report_id, contract_id, client_id) "billrpt_collid_ndx" btree (report_id, collection_id, client_id, contract_id) Foreign-key constraints: "$1" FOREIGN KEY (report_id) REFERENCES billing_reports(report_id) "$2" FOREIGN KEY (client_id) REFERENCES "work".clients(client_id) The query: explain analyze select w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs, sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS from bill_rpt_work w, billing_reports b where w.report_id in (select b.report_id from billing_reports where b.report_s_date = '2006-09-30') and (w.client_id = '227400001' or w.client_id = '2274000010') group by 1,2,3 order by 1,2,3; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- ------------------------------ GroupAggregate (cost=298061335.44..298259321.62 rows=26 width=58) (actual time=372213.673..372213.689 rows=2 loops=1) -> Sort (cost=298061335.44..298083333.83 rows=8799354 width=58) (actual time=372213.489..372213.503 rows=37 loops=1) Sort Key: w.appid, w.rate, w.is_subscribed -> Nested Loop (cost=0.00..296121313.45 rows=8799354 width=58) (actual time=286628.486..372213.053 rows=37 loops=1) Join Filter: (subplan) -> Seq Scan on bill_rpt_work w (cost=0.00..85703.20 rows=11238 width=62) (actual time=1.239..1736.746 rows=61020 loops=1) Filter: (((client_id)::text = '227400001'::text) OR ((client_id)::text = '2274000010'::text)) -> Seq Scan on billing_reports b (cost=0.00..29.66 rows=1566 width=8) (actual time=0.001..0.879 rows=1566 loops=61020) SubPlan -> Result (cost=0.00..29.66 rows=1566 width=0) (actual time=0.000..0.002 rows=1 loops=95557320) One-Time Filter: ($1 = '2006-09-30'::date) -> Seq Scan on billing_reports (cost=0.00..29.66 rows=1566 width=0) (actual time=0.001..0.863 rows=1565 loops=61020) Total runtime: 372214.085 ms Informix uses report id/client id as an index, thus eliminating a huge number of rows. The table has 2280545 rows currently; slightly fewer when the above analyze was run. Informix has about 5 times as much data. select count(*) from bill_rpt_work where report_id in (select report_id from billing_reports where report_s_date = '2006-09-30') and (client_id = '227400001' or client_id = '2274000010'); count ------- 37 (1 row) So scanning everything seems particularly senseless. I had some success adding client id and report id to the initial select list, but that causes all sorts of problems in calling procedures that expect different data grouping. Any suggestion would be welcome because this is a horrible show stopper. Thanks, Greg Williamson DBA GlobeXplorer LLC ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| I don't think I understand the idea behind this query. Do you really need billing_reports twice? > The query: > explain analyze select > w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs, > sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits, > sum(w.sius) * w.rate AS BYIUS > from bill_rpt_work w, billing_reports b > where w.report_id in > (select b.report_id from billing_reports where b.report_s_date = > '2006-09-30') > and (w.client_id = '227400001' or w.client_id = '2274000010') > group by 1,2,3 > order by 1,2,3; Maybe this is the query you want instead? select w.appid, w.rate, w.is_subscribed, sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total, sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS from bill_rpt_work w where w.report_id in (select b.report_id from billing_reports b where b.report_s_date = '2006-09-30') and (w.client_id = '227400001' or w.client_id = '2274000010') group by 1,2,3 order by 1,2,3; /Dennis ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Voila ! You da man ! & other expressions of awe and appreciation ... HAving burdened others with my foolishness too often, I hesitate to ask, but could someone either point me to a reference or explain what the difference might be ... I can see it with the eyes but I am having trouble understanding what Informix might have been doing to my (bad ?) SQL to "fix" the query. Seeing a redundancy and eliminating it ? The explain analyze for "db"'s sql (slightly faster than Informix on an older Sun machine ... about 20%): GroupAggregate (cost=64.35..64.75 rows=8 width=58) (actual time=0.612..0.629 rows=2 loops=1) -> Sort (cost=64.35..64.37 rows=8 width=58) (actual time=0.463..0.476 rows=37 loops=1) Sort Key: w.appid, w.rate, w.is_subscribed -> Nested Loop (cost=8.11..64.23 rows=8 width=58) (actual time=0.130..0.211 rows=37 loops=1) Join Filter: ("inner".report_id = "outer".report_id) -> HashAggregate (cost=3.95..3.96 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1) -> Index Scan using billrpt_sdate_ndx on billing_reports b (cost=0.00..3.94 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=1) Index Cond: (report_s_date = '2006-09-30'::date) -> Bitmap Heap Scan on bill_rpt_work w (cost=4.17..59.92 rows=28 width=62) (actual time=0.084..0.111 rows=37 loops=1) Recheck Cond: (((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227400001'::text)) OR ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '2274000010'::text))) -> BitmapOr (cost=4.17..4.17 rows=28 width=0) (actual time=0.078..0.078 rows=0 loops=1) -> Bitmap Index Scan on billrptw_ndx (cost=0.00..2.08 rows=14 width=0) (actual time=0.053..0.053 rows=22 loops=1) Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227400001'::text)) -> Bitmap Index Scan on billrptw_ndx (cost=0.00..2.08 rows=14 width=0) (actual time=0.024..0.024 rows=15 loops=1) Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '2274000010'::text)) Total runtime: 6.110 ms (16 rows) Thanks again (and sorry for the top-posting but this particular interface is ungainly) G -----Original Message----- From: db@zigo.dhs.org [mailto:db@zigo.dhs.org] Sent: Tue 1/9/2007 4:35 AM To: Gregory S. Williamson Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] Horribly slow query/ sequential scan I don't think I understand the idea behind this query. Do you really need billing_reports twice? > The query: > explain analyze select > w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs, > sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits, > sum(w.sius) * w.rate AS BYIUS > from bill_rpt_work w, billing_reports b > where w.report_id in > (select b.report_id from billing_reports where b.report_s_date = > '2006-09-30') > and (w.client_id = '227400001' or w.client_id = '2274000010') > group by 1,2,3 > order by 1,2,3; Maybe this is the query you want instead? select w.appid, w.rate, w.is_subscribed, sum(w.hits) AS Hits, sum(w.sius) AS IUs, sum(w.total_amnt) AS Total, sum(w.hits) * w.rate AS ByHits, sum(w.sius) * w.rate AS BYIUS from bill_rpt_work w where w.report_id in (select b.report_id from billing_reports b where b.report_s_date = '2006-09-30') and (w.client_id = '227400001' or w.client_id = '2274000010') group by 1,2,3 order by 1,2,3; /Dennis ------------------------------------------------------- Click link below if it is SPAM gsw@globexplorer.com "https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45a38b1548991076418835&user= gsw@globexplorer.com&retrain=spam&template=history &history_page=1" !DSPAM:45a38b1548991076418835! ------------------------------------------------------- Voi ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| ||||
| "Gregory S. Williamson" <gsw@globexplorer.com> writes: > HAving burdened others with my foolishness too often, I hesitate to > ask, but could someone either point me to a reference or explain what > the difference might be ... I can see it with the eyes but I am having > trouble understanding what Informix might have been doing to my (bad > ?) SQL to "fix" the query. Me too. Does informix have anything EXPLAIN-like to show what it's doing? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |