Unix Technical Forum

Join the same row

This is a discussion on Join the same row within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi, I´m trying to optimize some selects between 2 tables and the best way I found was alter the ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 06:44 AM
Edison Azzi
 
Posts: n/a
Default Join the same row

Hi,

I´m trying to optimize some selects between 2 tables and the best way I
found was
alter the first table and add the fields of the 2nd table. I adjusted
the contents and
now a have only one table with all info that I need. Now resides my
problem, because
of legacy queries I decided to make a Rule that replace the 2nd table.

Until now all worked well, but I found when I make a join between de result
table and de Rule, even tought is the same row in the same table, the
optimizer
generete two access for the same row:
cta_pag is the table and ctapag_adm is the rule.

CREATE OR REPLACE RULE "_RETURN" AS
ON SELECT TO ctapag_adm DO INSTEAD SELECT cta_pag.nrlancto,
cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
FROM cta_pag
WHERE cta_pag.origem = 'A'::bpchar;

This is one of the legacy queries:

select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto and
p.nrlancto = 21861;

EXPLAIN:
Nested Loop (cost=0.00..11.49 rows=1 width=443) (actual
time=0.081..0.088 rows=1 loops=1)
-> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1
width=408) (actual time=0.044..0.046 rows=1 loops=1)
Index Cond: (nrlancto = 21861::numeric)
-> Index Scan using cta_pag_pk on cta_pag (cost=0.00..5.74 rows=1
width=35) (actual time=0.023..0.025 rows=1 loops=1)
Index Cond: (21861::numeric = nrlancto)
Filter: (origem = 'A'::bpchar)
Total runtime: 0.341 ms


Resulting in twice the time for accessing.

Acessing just on time the same row:

select * from cta_pag p where p.nrlancto = 21861

EXPLAIN:
Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1
width=408) (actual time=0.044..0.047 rows=1 loops=1)
Index Cond: (nrlancto = 21861::numeric)
Total runtime: 0.161 ms


Is there a way to force the optimizer to understand that is the same
row?

Thanks,
Edison


--
Edison Azzi
<edisonazzi (at ) terra ( dot ) com ( dot ) br>


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 06:44 AM
Richard Huxton
 
Posts: n/a
Default Re: Join the same row

Edison Azzi wrote:
> Hi,
>
> I´m trying to optimize some selects between 2 tables and the best way I
> found was
> alter the first table and add the fields of the 2nd table. I adjusted
> the contents and
> now a have only one table with all info that I need. Now resides my
> problem, because
> of legacy queries I decided to make a Rule that replace the 2nd table.
>
> Until now all worked well, but I found when I make a join between de result
> table and de Rule, even tought is the same row in the same table, the
> optimizer
> generete two access for the same row:
> cta_pag is the table and ctapag_adm is the rule.
>
> CREATE OR REPLACE RULE "_RETURN" AS
> ON SELECT TO ctapag_adm DO INSTEAD SELECT cta_pag.nrlancto,
> cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
> FROM cta_pag
> WHERE cta_pag.origem = 'A'::bpchar;
>
> This is one of the legacy queries:
>
> select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto and
> p.nrlancto = 21861;


OK - and you get a self-join (which is what you asked for, but you'd
like the planner to notice that it might not be necessary).

> Resulting in twice the time for accessing.
>
> Acessing just on time the same row:
>
> select * from cta_pag p where p.nrlancto = 21861


This isn't the same query though. Your rule has an additional condition
origem='A'. This means it wouldn't be correct to eliminate the self-join
even if the planner could.

> Is there a way to force the optimizer to understand that is the same
> row?


However, even if you removed the condition on origem, I don't think the
planner will notice that it can eliminate the join. It's just too
unusual a case for the planner to have a rule for it.

I might be wrong about the planner - I'm just another user. One of the
developers may correct me.
--
Richard Huxton
Archonet Ltd


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 06:44 AM
Richard Huxton
 
Posts: n/a
Default Re: Join the same row

Edison Azzi wrote:
> Richard Huxton escreveu:
>> However, even if you removed the condition on origem, I don't think
>> the planner will notice that it can eliminate the join. It's just too
>> unusual a case for the planner to have a rule for it.
>>
>> I might be wrong about the planner - I'm just another user. One of the
>> developers may correct me.

>
>
> You are rigth, the planner will not eliminate the join, see:
>
> select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
> p.nrlancto = 21861;
>
> EXPLAIN:
> Nested Loop (cost=0.00..11.48 rows=1 width=816)
> -> Index Scan using cta_pag_pk on cta_pag a (cost=0.00..5.74 rows=1
> width=408)
> Index Cond: (21861::numeric = nrlancto)
> -> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1
> width=408)
> Index Cond: (nrlancto = 21861::numeric)
>
>
> I know that this is too unusual case, but I hoped that the planner could
> deal
> with this condition. I´m trying to speed up without have to rewrite a
> bunch of
> queries. Now I'll have to think another way to work around this issue.


Is the performance really so bad? All the data is guaranteed to be
cached for the second index-scan.

--
Richard Huxton
Archonet Ltd


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 06:45 AM
Edison Azzi
 
Posts: n/a
Default Re: Join the same row

Richard Huxton escreveu:

> Edison Azzi wrote:
>
>> Hi,
>>
>> I´m trying to optimize some selects between 2 tables and the best way
>> I found was
>> alter the first table and add the fields of the 2nd table. I adjusted
>> the contents and
>> now a have only one table with all info that I need. Now resides my
>> problem, because
>> of legacy queries I decided to make a Rule that replace the 2nd table.
>>
>> Until now all worked well, but I found when I make a join between de
>> result
>> table and de Rule, even tought is the same row in the same table, the
>> optimizer
>> generete two access for the same row:
>> cta_pag is the table and ctapag_adm is the rule.
>>
>> CREATE OR REPLACE RULE "_RETURN" AS
>> ON SELECT TO ctapag_adm DO INSTEAD SELECT cta_pag.nrlancto,
>> cta_pag.codconta, cta_pag.frequencia, cta_pag.nrlanctopai
>> FROM cta_pag
>> WHERE cta_pag.origem = 'A'::bpchar;
>>
>> This is one of the legacy queries:
>>
>> select * from cta_pag p , ctapag_adm a where a.nrlancto= p.nrlancto
>> and p.nrlancto = 21861;

>
>
> OK - and you get a self-join (which is what you asked for, but you'd
> like the planner to notice that it might not be necessary).
>
>> Resulting in twice the time for accessing.
>>
>> Acessing just on time the same row:
>>
>> select * from cta_pag p where p.nrlancto = 21861

>
>
> This isn't the same query though. Your rule has an additional
> condition origem='A'. This means it wouldn't be correct to eliminate
> the self-join even if the planner could.
>
>> Is there a way to force the optimizer to understand that is the
>> same row?

>
>
> However, even if you removed the condition on origem, I don't think
> the planner will notice that it can eliminate the join. It's just too
> unusual a case for the planner to have a rule for it.
>
> I might be wrong about the planner - I'm just another user. One of the
> developers may correct me.



You are rigth, the planner will not eliminate the join, see:

select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
p.nrlancto = 21861;

EXPLAIN:
Nested Loop (cost=0.00..11.48 rows=1 width=816)
-> Index Scan using cta_pag_pk on cta_pag a (cost=0.00..5.74 rows=1
width=408)
Index Cond: (21861::numeric = nrlancto)
-> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1
width=408)
Index Cond: (nrlancto = 21861::numeric)


I know that this is too unusual case, but I hoped that the planner could
deal
with this condition. I´m trying to speed up without have to rewrite a
bunch of
queries. Now I'll have to think another way to work around this issue.

Thanks,

Edison.




---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 06:45 AM
Tom Lane
 
Posts: n/a
Default Re: Join the same row

Edison Azzi <edisonazzi@terra.com.br> writes:
> You are rigth, the planner will not eliminate the join, see:


> select * from cta_pag a, cta_pag p where a.nrlancto=p.nrlancto and
> p.nrlancto = 21861;


> EXPLAIN:
> Nested Loop (cost=0.00..11.48 rows=1 width=816)
> -> Index Scan using cta_pag_pk on cta_pag a (cost=0.00..5.74 rows=1
> width=408)
> Index Cond: (21861::numeric = nrlancto)
> -> Index Scan using cta_pag_pk on cta_pag p (cost=0.00..5.74 rows=1
> width=408)
> Index Cond: (nrlancto = 21861::numeric)


But do you care? That second fetch of the same row isn't going to cost
much of anything, since everything it needs to touch will have been
sucked into cache already. I don't really see the case for adding logic
to the planner to detect this particular flavor of badly-written query.

Notice that the planner *is* managing to propagate the constant
comparison to both relations.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:17 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com