Unix Technical Forum

Query/database optimization

This is a discussion on Query/database optimization within the Pgsql Performance forums, part of the PostgreSQL category; --> Hello, I have a query: explain analyze select tu.url_id, tu.url, coalesce(sd.recurse, 100), case when COALESCE(get_option('use_banner')::integer,0) = 0 then 0 ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 08:11 AM
Eugeny N Dzhurinsky
 
Posts: n/a
Default Query/database optimization

Hello, I have a query:

explain analyze select tu.url_id, tu.url, coalesce(sd.recurse, 100), case when
COALESCE(get_option('use_banner')::integer,0) = 0 then 0 else ts.use_banner
end as use_banner, ts.use_cookies, ts.use_robots, ts.includes, ts.excludes,
ts.track_domain, ts.task_id,get_available_pages(ts.task_id,ts.custo mer_id),
ts.redirects from task_url tu inner join task_scheduler ts on
tu.task_id=ts.task_id inner join (subscription s inner join subscription_dic
sd on sd.id=s.dict_id ) on s.customer_id=ts.customer_id inner join customer
c on c.customer_id=ts.customer_id AND c.active WHERE
get_available_pages(ts.task_id,ts.customer_id) > 0 AND
((get_option('expired_users')::integer = 0) OR (isfinite(last_login) AND
extract('day' from current_timestamp - last_login)::integer <=
coalesce(get_option('expired_users')::integer,100) )) AND ((s.status is null
AND ts.customer_id is null) OR s.status > 0) AND
(get_check_period(ts.task_id,ts.next_check) is null OR
(unix_timestamp(get_check_period(ts.task_id,ts.nex t_check)) -
unix_timestamp(timenow()) < 3600)) AND ts.status <> 1 AND ((ts.start_time <
current_time AND ts.stop_time > current_time) OR (ts.start_time is null AND
ts.stop_time is null)) AND tu.url_id = 1 AND ts.customer_id not in (select
distinct customer_id from task_scheduler where status = 1) order by
ts.next_check is not null, unix_timestamp(ts.next_check) -
unix_timestamp(timenow()) limit 10;

which produces this query plan:
Limit (cost=2874.98..2874.99 rows=2 width=88) (actual time=11800.535..11800.546 rows=3 loops=1)
-> Sort (cost=2874.98..2874.99 rows=2 width=88) (actual time=11800.529..11800.532 rows=3 loops=1)
Sort Key: (ts.next_check IS NOT NULL), (date_part('epoch'::text, ts.next_check) - date_part('epoch'::text, (timenow())::timestamp without time zone))
-> Nested Loop (cost=4.37..2874.97 rows=2 width=88) (actual time=10249.115..11800.486 rows=3 loops=1)
-> Nested Loop (cost=4.37..2868.87 rows=2 width=55) (actual time=10247.721..11796.303 rows=3 loops=1)
Join Filter: ("inner".id = "outer".dict_id)
-> Nested Loop (cost=2.03..2865.13 rows=2 width=55) (actual time=10247.649..11796.142 rows=3 loops=1)
Join Filter: ((("inner".status IS NULL) AND ("outer".customer_id IS NULL)) OR ("inner".status > 0))
-> Nested Loop (cost=2.03..2858.34 rows=2 width=55) (actual time=10247.583..11795.936 rows=3 loops=1)
-> Seq Scan on customer c (cost=0.00..195.71 rows=231 width=4) (actual time=0.082..154.344 rows=4161 loops=1)
Filter: (active AND isfinite(last_login) AND ((date_part('day'::text, (('now'::text)::timestamp(6) with time zone - (last_login)::timestamp with time zone)))::integer <= 150))
-> Index Scan using task_scheduler_icustomer_id on task_scheduler ts (cost=2.03..11.51 rows=1 width=51) (actual time=2.785..2.785 rows=0 loops=4161)
Index Cond: ("outer".customer_id = ts.customer_id)
Filter: ((get_available_pages(task_id, customer_id) > 0) AND ((get_check_period(task_id, next_check) IS NULL) OR ((date_part('epoch'::text, get_check_period(task_id, next_check)) - date_part('epoch'::text, (timenow())::timestamp without time zone)) < 3600::double precision)) AND (status <> 1) AND ((((start_time)::time with time zone < ('now'::text)::time(6) with time zone) AND ((stop_time)::time with time zone > ('now'::text)::time(6) with time zone)) OR ((start_time IS NULL) AND (stop_time IS NULL))) AND (NOT (hashed subplan)))
SubPlan
-> Unique (cost=2.02..2.03 rows=1 width=4) (actual time=0.617..0.631 rows=3 loops=1)
-> Sort (cost=2.02..2.03 rows=1 width=4) (actual time=0.613..0.617 rows=3 loops=1)
Sort Key: customer_id
-> Index Scan using task_scheduler_istatus on task_scheduler (cost=0.00..2.01 rows=1 width=4) (actual time=0.044..0.580 rows=3 loops=1)
Index Cond: (status = 1)
-> Index Scan using subscription_icustomer_id on subscription s (cost=0.00..3.38 rows=1 width=12) (actual time=0.035..0.041 rows=1 loops=3)
Index Cond: ("outer".customer_id = s.customer_id)
-> Materialize (cost=2.34..2.65 rows=31 width=8) (actual time=0.008..0.027 rows=6 loops=3)
-> Seq Scan on subscription_dic sd (cost=0.00..2.31 rows=31 width=8) (actual time=0.013..0.034 rows=6 loops=1)
-> Index Scan using task_url_storage_task_id on task_url tu (cost=0.00..3.03 rows=1 width=37) (actual time=0.028..0.045 rows=1 loops=3)
Index Cond: (tu.task_id = "outer".task_id)
Filter: (url_id = 1)
Total runtime: 11801.082 ms
(28 rows)


Do I need to optimize a query somehow, or it is related to database
configuration?

I'm running postgresql 8.0.0 on CentOS release 3.7

--
Eugene N Dzhurinsky

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 08:12 AM
Tom Lane
 
Posts: n/a
Default Re: Query/database optimization

Eugeny N Dzhurinsky <bofh@redwerk.com> writes:
> [slow query]


The bulk of your time seems to be going into this indexscan:

> -> Index Scan using task_scheduler_icustomer_id on task_scheduler ts (cost=2.03..11.51 rows=1 width=51) (actual time=2.785..2.785 rows=0 loops=4161)
> Index Cond: ("outer".customer_id = ts.customer_id)
> Filter: ((get_available_pages(task_id, customer_id) > 0) AND ((get_check_period(task_id, next_check) IS NULL) OR ((date_part('epoch'::text, get_check_period(task_id, next_check)) - date_part('epoch'::text, (timenow())::timestamp without time zone)) < 3600::double precision)) AND (status <> 1) AND ((((start_time)::time with time zone < ('now'::text)::time(6) with time zone) AND ((stop_time)::time with time zone > ('now'::text)::time(6) with time zone)) OR ((start_time IS NULL) AND (stop_time IS NULL))) AND (NOT (hashed subplan)))
> SubPlan
> -> Unique (cost=2.02..2.03 rows=1 width=4) (actual time=0.617..0.631 rows=3 loops=1)
> ...


I kinda doubt that the index search itself is that slow --- doubtless
the problem comes from having to evaluate that filter condition on a lot
of rows. How fast are those functions you're calling?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 08:12 AM
Eugeny N Dzhurinsky
 
Posts: n/a
Default Re: Query/database optimization

On Tue, Aug 01, 2006 at 11:15:11PM -0400, Tom Lane wrote:
> Eugeny N Dzhurinsky <bofh@redwerk.com> writes:
> > [slow query]

> The bulk of your time seems to be going into this indexscan:
> > -> Index Scan using task_scheduler_icustomer_id on task_scheduler ts (cost=2.03..11.51 rows=1 width=51) (actual time=2.785..2.785 rows=0 loops=4161)
> > Index Cond: ("outer".customer_id = ts.customer_id)
> > Filter: ((get_available_pages(task_id, customer_id) > 0) AND ((get_check_period(task_id, next_check) IS NULL) OR ((date_part('epoch'::text, get_check_period(task_id, next_check)) - date_part('epoch'::text, (timenow())::timestamp without time zone)) < 3600::double precision)) AND (status <> 1) AND ((((start_time)::time with time zone < ('now'::text)::time(6) with time zone) AND ((stop_time)::time with time zone > ('now'::text)::time(6) with time zone)) OR ((start_time IS NULL) AND (stop_time IS NULL))) AND (NOT (hashed subplan)))
> > SubPlan
> > -> Unique (cost=2.02..2.03 rows=1 width=4) (actual time=0.617..0.631 rows=3 loops=1)
> > ...

>
> I kinda doubt that the index search itself is that slow --- doubtless
> the problem comes from having to evaluate that filter condition on a lot
> of rows. How fast are those functions you're calling?


Well, not really fast, especially get_available_pages

there is special table with history of changes, and there is a view for latest
changes per task, and this function selects all records from a view for given
ID, then calculates sum of pages of tasks and then calculates number of
available pages as number of allowed pages deduct number of processed pages.

probably there is bottleneck in this view selection?

--
Eugene N Dzhurinsky

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:14 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com