Unix Technical Forum

Trivial function query optimized badly

This is a discussion on Trivial function query optimized badly within the Pgsql Performance forums, part of the PostgreSQL category; --> Well, once again I'm hosed because there's no way to tell the optimizer the cost for a user-defined function. ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 09:01 AM
Craig A. James
 
Posts: n/a
Default Trivial function query optimized badly

Well, once again I'm hosed because there's no way to tell the optimizer the cost for a user-defined function. I know this issue has already been raised (by me!) several times, but I have to remind everyone about this. I frequently must rewrite my SQL to work around this problem.

Here is the function definition:

CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text
AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles'
LANGUAGE 'C' STRICT IMMUTABLE;

Here is the bad optimization:

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from version where version.isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1);
QUERY PLAN
--------------------------------------------------------------------------------------------------------
Seq Scan on version (cost=0.00..23.41 rows=1 width=4) (actual time=1434.281..1540.253 rows=1 loops=1)
Filter: (isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O'::text, 1))
Total runtime: 1540.347 ms
(3 rows)

I've had to break it up into two separate queries. Ironically, for large databases, Postgres does the right thing -- it computes the function, then uses the index on the "isosmiles" column. It's blazingly fast and very satisfactory. But for small databases, it apparently decides to recompute the function once per row, making the query N times slower (N = number of rows) than it should be!

In this instance, there are 1000 rows, and factor of 10^4 is a pretty dramatic slowdown... To make it work, I had to call the function separately then use its result to do the select.


db=> explain analyze select cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1);
QUERY PLAN
------------------------------------------------------------------------------------
Result (cost=0.00..0.01 rows=1 width=0) (actual time=1.692..1.694 rows=1 loops=1)
Total runtime: 1.720 ms
(2 rows)

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from version where version.isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Index Scan using i_version_isosmiles on version (cost=0.00..5.80 rows=1 width=4) (actual time=0.114..0.117 rows=1 loops=1)
Index Cond: (isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O'::text)
Total runtime: 0.158 ms
(3 rows)

Craig


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 09:01 AM
Adam Rich
 
Posts: n/a
Default Re: Trivial function query optimized badly


Craig,
What version of postgres are you using? I just tested this on PG 8.1.2
and was unable to reproduce these results. I wrote a simple function
that returns the same text passed to it, after sleeping for 1 second.
I use it in a where clause, like your example below, and regardless of
the number of rows in the table, it still takes roughly 1 second,
indicating to me the function is only called once.

Is it possible that your function really isn't immutable? Would PG
realize this and fall back to treating it as VOLATILE ?



-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailtogsql-performance-owner@postgresql.org] On Behalf Of Craig A.
James
Sent: Wednesday, January 03, 2007 9:11 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Trivial function query optimized badly


Well, once again I'm hosed because there's no way to tell the optimizer
the cost for a user-defined function. I know this issue has already
been raised (by me!) several times, but I have to remind everyone about
this. I frequently must rewrite my SQL to work around this problem.

Here is the function definition:

CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text
AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles'
LANGUAGE 'C' STRICT IMMUTABLE;

Here is the bad optimization:

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from
version where version.isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O',
1);
QUERY PLAN

------------------------------------------------------------------------
--------------------------------
Seq Scan on version (cost=0.00..23.41 rows=1 width=4) (actual
time=1434.281..1540.253 rows=1 loops=1)
Filter: (isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O'::text, 1))
Total runtime: 1540.347 ms
(3 rows)

I've had to break it up into two separate queries. Ironically, for
large databases, Postgres does the right thing -- it computes the
function, then uses the index on the "isosmiles" column. It's blazingly
fast and very satisfactory. But for small databases, it apparently
decides to recompute the function once per row, making the query N times
slower (N = number of rows) than it should be!

In this instance, there are 1000 rows, and factor of 10^4 is a pretty
dramatic slowdown... To make it work, I had to call the function
separately then use its result to do the select.


db=> explain analyze select cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1);
QUERY PLAN

------------------------------------------------------------------------
------------
Result (cost=0.00..0.01 rows=1 width=0) (actual time=1.692..1.694
rows=1 loops=1)
Total runtime: 1.720 ms
(2 rows)

db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from
version where version.isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O';
QUERY PLAN

------------------------------------------------------------------------
-----------------------------------------------------
Index Scan using i_version_isosmiles on version (cost=0.00..5.80
rows=1 width=4) (actual time=0.114..0.117 rows=1 loops=1)
Index Cond: (isosmiles = 'Brc1ccc2nc(cn2c1)C(=O)O'::text)
Total runtime: 0.158 ms
(3 rows)

Craig


---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 09:01 AM
Craig A. James
 
Posts: n/a
Default Re: Trivial function query optimized badly

Adam Rich wrote:
> Craig,
> What version of postgres are you using? I just tested this on PG 8.1.2
> and was unable to reproduce these results. I wrote a simple function
> that returns the same text passed to it, after sleeping for 1 second.
> I use it in a where clause, like your example below, and regardless of
> the number of rows in the table, it still takes roughly 1 second,
> indicating to me the function is only called once.


Sorry, I forgot that critical piece of info: I'm using 8.1.4.

Your results would indicate that 8.1.2 creates a different plan than 8.1.4, or else there's some configuration parameter that's different between your installation and mine that causes a radically different plan to be used. I assume you vacuum/analyzed the table before you ran the query.

> Is it possible that your function really isn't immutable? Would PG
> realize this and fall back to treating it as VOLATILE ?


Now that you say this, this seems more like a bug with the definition of IMMUTABLE. The function should only be called once if it's given a constant string, right? So the fact that Postgres called it once per row is just wrong.

Craig


---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 09:01 AM
Tom Lane
 
Posts: n/a
Default Re: Trivial function query optimized badly

"Craig A. James" <cjames@modgraph-usa.com> writes:
> CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text
> AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles'
> LANGUAGE 'C' STRICT IMMUTABLE;


Umm ... this is a single-argument function.

> db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from version where version.isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1);


And this query is invoking some other, two-argument function; which
apparently hasn't been marked IMMUTABLE, else it'd have been folded
to a constant.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 09:01 AM
Craig A. James
 
Posts: n/a
Default Re: Trivial function query optimized badly

Tom Lane wrote:
> "Craig A. James" <cjames@modgraph-usa.com> writes:
>> CREATE OR REPLACE FUNCTION cansmiles(text) RETURNS text
>> AS '/usr/local/pgsql/lib/libchem.so', 'cansmiles'
>> LANGUAGE 'C' STRICT IMMUTABLE;

>
> Umm ... this is a single-argument function.
>
>> db=> explain analyze select version_id, 'Brc1ccc2nc(cn2c1)C(=O)O' from version where version.isosmiles = cansmiles('Brc1ccc2nc(cn2c1)C(=O)O', 1);

>
> And this query is invoking some other, two-argument function; which
> apparently hasn't been marked IMMUTABLE, else it'd have been folded
> to a constant.


Good catch, mystery solved. There are two definitions for this function, the first just a "wrapper" for the second with the latter parameter defaulting to "1". The second definition was missing the "IMMUTABLE" keyword.

Thanks!
Craig

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 10:11 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com