Unix Technical Forum

Horribly slow query/ sequential scan

This is a discussion on Horribly slow query/ sequential scan within the Pgsql Performance forums, part of the PostgreSQL category; --> This is a query migrated from postgres. In postgres it runs about 10,000 times *slower* than on informix on ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 09:02 AM
Gregory S. Williamson
 
Posts: n/a
Default Horribly slow query/ sequential scan

This is a query migrated from postgres. In postgres it runs about 10,000 times *slower* than on informix on somewhat newer hardware. The problem is entirely due to the planner. This PostgreSQL 8.1.4 on linux, 2 gigs of ram.

The table:
Table "reporting.bill_rpt_work"
Column | Type | Modifiers
---------------+-----------------------+-----------
report_id | integer |
client_id | character varying(10) |
contract_id | integer | not null
rate | numeric | not null
appid | character varying(10) | not null
userid | text | not null
collection_id | integer | not null
client_name | character varying(60) |
use_sius | integer | not null
is_subscribed | integer | not null
hits | numeric | not null
sius | numeric | not null
total_amnt | numeric | not null
royalty_total | numeric |
Indexes:
"billrptw_ndx" UNIQUE, btree (report_id, client_id, contract_id, rate, appid, userid, collection_id)
"billrpt_cntrct_ndx" btree (report_id, contract_id, client_id)
"billrpt_collid_ndx" btree (report_id, collection_id, client_id, contract_id)
Foreign-key constraints:
"$1" FOREIGN KEY (report_id) REFERENCES billing_reports(report_id)
"$2" FOREIGN KEY (client_id) REFERENCES "work".clients(client_id)


The query:
explain analyze select
w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs,
sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w, billing_reports b
where w.report_id in
(select b.report_id from billing_reports where b.report_s_date = '2006-09-30')
and (w.client_id = '227400001' or w.client_id = '2274000010')
group by 1,2,3
order by 1,2,3;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
------------------------------
GroupAggregate (cost=298061335.44..298259321.62 rows=26 width=58) (actual time=372213.673..372213.689 rows=2 loops=1)
-> Sort (cost=298061335.44..298083333.83 rows=8799354 width=58) (actual time=372213.489..372213.503 rows=37 loops=1)
Sort Key: w.appid, w.rate, w.is_subscribed
-> Nested Loop (cost=0.00..296121313.45 rows=8799354 width=58) (actual time=286628.486..372213.053 rows=37 loops=1)
Join Filter: (subplan)
-> Seq Scan on bill_rpt_work w (cost=0.00..85703.20 rows=11238 width=62) (actual time=1.239..1736.746 rows=61020 loops=1)
Filter: (((client_id)::text = '227400001'::text) OR ((client_id)::text = '2274000010'::text))
-> Seq Scan on billing_reports b (cost=0.00..29.66 rows=1566 width=8) (actual time=0.001..0.879 rows=1566 loops=61020)
SubPlan
-> Result (cost=0.00..29.66 rows=1566 width=0) (actual time=0.000..0.002 rows=1 loops=95557320)
One-Time Filter: ($1 = '2006-09-30'::date)
-> Seq Scan on billing_reports (cost=0.00..29.66 rows=1566 width=0) (actual time=0.001..0.863 rows=1565 loops=61020)
Total runtime: 372214.085 ms


Informix uses report id/client id as an index, thus eliminating a huge number of rows. The table has 2280545 rows currently; slightly fewer when the above analyze was run. Informix has about 5 times as much data.

select count(*) from bill_rpt_work where report_id in (select report_id from billing_reports where report_s_date = '2006-09-30') and (client_id = '227400001' or client_id = '2274000010');
count
-------
37
(1 row)

So scanning everything seems particularly senseless.

I had some success adding client id and report id to the initial select list, but that causes all sorts of problems in calling procedures that expect different data grouping.

Any suggestion would be welcome because this is a horrible show stopper.

Thanks,

Greg Williamson
DBA
GlobeXplorer LLC




---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 09:02 AM
db@zigo.dhs.org
 
Posts: n/a
Default Re: Horribly slow query/ sequential scan

I don't think I understand the idea behind this query. Do you really need
billing_reports twice?

> The query:
> explain analyze select
> w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs,
> sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
> sum(w.sius) * w.rate AS BYIUS
> from bill_rpt_work w, billing_reports b
> where w.report_id in
> (select b.report_id from billing_reports where b.report_s_date =
> '2006-09-30')
> and (w.client_id = '227400001' or w.client_id = '2274000010')
> group by 1,2,3
> order by 1,2,3;


Maybe this is the query you want instead?

select w.appid,
w.rate,
w.is_subscribed,
sum(w.hits) AS Hits,
sum(w.sius) AS IUs,
sum(w.total_amnt) AS Total,
sum(w.hits) * w.rate AS ByHits,
sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w
where w.report_id in
(select b.report_id from billing_reports b where b.report_s_date =
'2006-09-30')
and (w.client_id = '227400001' or w.client_id = '2274000010')
group by 1,2,3
order by 1,2,3;

/Dennis


---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 09:03 AM
Gregory S. Williamson
 
Posts: n/a
Default Re: Horribly slow query/ sequential scan

Voila ! You da man !

& other expressions of awe and appreciation ...

HAving burdened others with my foolishness too often, I hesitate to ask, but could someone either point me to a reference or explain what the difference might be ... I can see it with the eyes but I am having trouble understanding what Informix might have been doing to my (bad ?) SQL to "fix" the query. Seeing a redundancy and eliminating it ?

The explain analyze for "db"'s sql (slightly faster than Informix on an older Sun machine ... about 20%):
GroupAggregate (cost=64.35..64.75 rows=8 width=58) (actual time=0.612..0.629 rows=2 loops=1)
-> Sort (cost=64.35..64.37 rows=8 width=58) (actual time=0.463..0.476 rows=37 loops=1)
Sort Key: w.appid, w.rate, w.is_subscribed
-> Nested Loop (cost=8.11..64.23 rows=8 width=58) (actual time=0.130..0.211 rows=37 loops=1)
Join Filter: ("inner".report_id = "outer".report_id)
-> HashAggregate (cost=3.95..3.96 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)
-> Index Scan using billrpt_sdate_ndx on billing_reports b (cost=0.00..3.94 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=1)
Index Cond: (report_s_date = '2006-09-30'::date)
-> Bitmap Heap Scan on bill_rpt_work w (cost=4.17..59.92 rows=28 width=62) (actual time=0.084..0.111 rows=37 loops=1)
Recheck Cond: (((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227400001'::text)) OR ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '2274000010'::text)))
-> BitmapOr (cost=4.17..4.17 rows=28 width=0) (actual time=0.078..0.078 rows=0 loops=1)
-> Bitmap Index Scan on billrptw_ndx (cost=0.00..2.08 rows=14 width=0) (actual time=0.053..0.053 rows=22 loops=1)
Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227400001'::text))
-> Bitmap Index Scan on billrptw_ndx (cost=0.00..2.08 rows=14 width=0) (actual time=0.024..0.024 rows=15 loops=1)
Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '2274000010'::text))
Total runtime: 6.110 ms
(16 rows)

Thanks again (and sorry for the top-posting but this particular interface is ungainly)

G

-----Original Message-----
From: db@zigo.dhs.org [mailto:db@zigo.dhs.org]
Sent: Tue 1/9/2007 4:35 AM
To: Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Horribly slow query/ sequential scan

I don't think I understand the idea behind this query. Do you really need
billing_reports twice?

> The query:
> explain analyze select
> w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs,
> sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
> sum(w.sius) * w.rate AS BYIUS
> from bill_rpt_work w, billing_reports b
> where w.report_id in
> (select b.report_id from billing_reports where b.report_s_date =
> '2006-09-30')
> and (w.client_id = '227400001' or w.client_id = '2274000010')
> group by 1,2,3
> order by 1,2,3;


Maybe this is the query you want instead?

select w.appid,
w.rate,
w.is_subscribed,
sum(w.hits) AS Hits,
sum(w.sius) AS IUs,
sum(w.total_amnt) AS Total,
sum(w.hits) * w.rate AS ByHits,
sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w
where w.report_id in
(select b.report_id from billing_reports b where b.report_s_date =
'2006-09-30')
and (w.client_id = '227400001' or w.client_id = '2274000010')
group by 1,2,3
order by 1,2,3;

/Dennis



-------------------------------------------------------
Click link below if it is SPAM gsw@globexplorer.com
"https://mailscanner.globexplorer.com/dspam/dspam.cgi?signatureID=45a38b1548991076418835&user= gsw@globexplorer.com&retrain=spam&template=history &history_page=1"
!DSPAM:45a38b1548991076418835!
-------------------------------------------------------




Voi

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 09:03 AM
Tom Lane
 
Posts: n/a
Default Re: Horribly slow query/ sequential scan

"Gregory S. Williamson" <gsw@globexplorer.com> writes:
> HAving burdened others with my foolishness too often, I hesitate to
> ask, but could someone either point me to a reference or explain what
> the difference might be ... I can see it with the eyes but I am having
> trouble understanding what Informix might have been doing to my (bad
> ?) SQL to "fix" the query.


Me too. Does informix have anything EXPLAIN-like to show what it's
doing?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:46 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com