Unix Technical Forum

Very large IN-clause is slow, but how to rewrite it?

This is a discussion on Very large IN-clause is slow, but how to rewrite it? within the pgsql Sql forums, part of the PostgreSQL category; --> I've been profiling a PG database / mix of applications and found that one statement which takes a very ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Sql

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 03:08 PM
Richard Jones
 
Posts: n/a
Default Very large IN-clause is slow, but how to rewrite it?

I've been profiling a PG database / mix of applications and found that
one statement which takes a very long time to execute is:

select e.keywordid, e.quantity, e.max_cpc, i.position
from bid3_events_impressions i, bid3_events e
where i.eventid = e.id and e.keywordid in ($1,$2,$3,$4,$5,$6,$7,
$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20, $21,$22,$23,$24,$25,$26,
$27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$3 9,$40,$41,$42,$43,$44,
$45,$46,$47,$48,
[... placeholders $49 thru $1908 omitted ...]
$1909,$1910,$1911,$1912,$1913,$1914,$1915,$1916,$1 917,$1918,$1919,$1920,
$1921,$1922,$1923,$1924,$1925,$1926,$1927,$1928,$1 929,$1930,$1931,$1932)
order by e.keywordid, e.creativeid, e.t

Needless to say this statement is being generated programatically.

The problem is that the code needs to execute selects of this sort on
various different number of keyword IDs quite frequently. I'm not
sure how to rewrite it. If I put the keyword IDs into a temporary
table then it is not at all clear that the overhead of doing each
individual INSERT to populate the table won't be just as slow (the
database is located across a network so there is a significant RTT,
and COPY isn't supported by my PG lib).

Has anyone got any suggestions?

Rich.

----------------------------------------------------------------------
explain select e.keywordid, e.quantity, e.max_cpc, i.position from bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid in (1,2,3,4,5,6,7,8,9,10) order by e.keywordid, e.creativeid, e.t; QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=15795.79..15796.57 rows=312 width=34)
Sort Key: e.keywordid, e.creativeid, e.t
-> Hash Join (cost=11623.58..15782.87 rows=312 width=34)
Hash Cond: ("outer".eventid = "inner".id)
-> Seq Scan on bid3_events_impressions i (cost=0.00..3471.78 rows=136878 width=8)
-> Hash (cost=11622.35..11622.35 rows=489 width=34)
-> Seq Scan on bid3_events e (cost=0.00..11622.35 rows=489 width=34)
Filter: ((keywordid = 1) OR (keywordid = 2) OR (keywordid = 3) OR (keywordid = 4) OR (keywordid = 5) OR (keywordid = 6) OR (keywordid = 7) OR (keywordid = 8) OR (keywordid = 9) OR (keywordid = 10))
(8 rows)


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 03:08 PM
Oleg Bartunov
 
Posts: n/a
Default Re: Very large IN-clause is slow, but how to rewrite it?

Richard,

contrib/intarray may help you.

On Sun, 25 Feb 2007, Richard Jones wrote:

> I've been profiling a PG database / mix of applications and found that
> one statement which takes a very long time to execute is:
>
> select e.keywordid, e.quantity, e.max_cpc, i.position
> from bid3_events_impressions i, bid3_events e
> where i.eventid = e.id and e.keywordid in ($1,$2,$3,$4,$5,$6,$7,
> $8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18,$19,$20, $21,$22,$23,$24,$25,$26,
> $27,$28,$29,$30,$31,$32,$33,$34,$35,$36,$37,$38,$3 9,$40,$41,$42,$43,$44,
> $45,$46,$47,$48,
> [... placeholders $49 thru $1908 omitted ...]
> $1909,$1910,$1911,$1912,$1913,$1914,$1915,$1916,$1 917,$1918,$1919,$1920,
> $1921,$1922,$1923,$1924,$1925,$1926,$1927,$1928,$1 929,$1930,$1931,$1932)
> order by e.keywordid, e.creativeid, e.t
>
> Needless to say this statement is being generated programatically.
>
> The problem is that the code needs to execute selects of this sort on
> various different number of keyword IDs quite frequently. I'm not
> sure how to rewrite it. If I put the keyword IDs into a temporary
> table then it is not at all clear that the overhead of doing each
> individual INSERT to populate the table won't be just as slow (the
> database is located across a network so there is a significant RTT,
> and COPY isn't supported by my PG lib).
>
> Has anyone got any suggestions?
>
> Rich.
>
> ----------------------------------------------------------------------
> explain select e.keywordid, e.quantity, e.max_cpc, i.position from bid3_events_impressions i, bid3_events e where i.eventid = e.id and e.keywordid in (1,2,3,4,5,6,7,8,9,10) order by e.keywordid, e.creativeid, e.t; QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Sort (cost=15795.79..15796.57 rows=312 width=34)
> Sort Key: e.keywordid, e.creativeid, e.t
> -> Hash Join (cost=11623.58..15782.87 rows=312 width=34)
> Hash Cond: ("outer".eventid = "inner".id)
> -> Seq Scan on bid3_events_impressions i (cost=0.00..3471.78 rows=136878 width=8)
> -> Hash (cost=11622.35..11622.35 rows=489 width=34)
> -> Seq Scan on bid3_events e (cost=0.00..11622.35 rows=489 width=34)
> Filter: ((keywordid = 1) OR (keywordid = 2) OR (keywordid = 3) OR (keywordid = 4) OR (keywordid = 5) OR (keywordid = 6) OR (keywordid = 7) OR (keywordid = 8) OR (keywordid = 9) OR (keywordid = 10))
> (8 rows)
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>


Regards,
Oleg
__________________________________________________ ___________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 03:08 PM
Tom Lane
 
Posts: n/a
Default Re: Very large IN-clause is slow, but how to rewrite it?

Richard Jones <rich@annexia.org> writes:
> I've been profiling a PG database / mix of applications and found that
> one statement which takes a very long time to execute is:


PG 8.2 does better with long IN-lists ... although if the list is so
long as to be fetching a significant fraction of the table, you'll still
have problems. In that case I'd advise putting the values into a temp
table, ANALYZEing same, and doing "WHERE foo IN (SELECT x FROM tmp_table)".

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 03:08 PM
Joe Conway
 
Posts: n/a
Default Re: Very large IN-clause is slow, but how to rewrite it?

Tom Lane wrote:
> Richard Jones <rich@annexia.org> writes:
>> I've been profiling a PG database / mix of applications and found that
>> one statement which takes a very long time to execute is:

>
> PG 8.2 does better with long IN-lists ... although if the list is so
> long as to be fetching a significant fraction of the table, you'll still
> have problems. In that case I'd advise putting the values into a temp
> table, ANALYZEing same, and doing "WHERE foo IN (SELECT x FROM tmp_table)".


If 8.2, what about
... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x))
?

It would be interesting to see how that compares performance-wise.

Joe


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 03:08 PM
Tom Lane
 
Posts: n/a
Default Re: Very large IN-clause is slow, but how to rewrite it?

Joe Conway <mail@joeconway.com> writes:
> If 8.2, what about
> ... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x))
> ?


Well, the OP wasn't using 8.2 --- judging from the selected plan, it had
to be 8.0 or older. But yeah, a values-list is an interesting
alternative on 8.2. I think actually you don't need all that much extra
notation; this seems to work:

WHERE foo IN (VALUES ($1),($2),($3),...)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 03:08 PM
Richard Jones
 
Posts: n/a
Default Re: Very large IN-clause is slow, but how to rewrite it?

On Sun, Feb 25, 2007 at 01:34:44PM -0500, Tom Lane wrote:
> Joe Conway <mail@joeconway.com> writes:
> > If 8.2, what about
> > ... WHERE foo IN (select x from (values (1$),(2$),...,(N$)) as t(x))
> > ?

>
> Well, the OP wasn't using 8.2 --- judging from the selected plan, it had
> to be 8.0 or older. But yeah, a values-list is an interesting
> alternative on 8.2. I think actually you don't need all that much extra
> notation; this seems to work:
>
> WHERE foo IN (VALUES ($1),($2),($3),...)


That's right, it is in fact PG 7.4.

I will try the values suggestion to see if that makes a difference.

Rich.

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 03:29 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com