Unix Technical Forum

Re: [PERFORM] Functionscan estimates

This is a discussion on Re: [PERFORM] Functionscan estimates within the pgsql Hackers forums, part of the PostgreSQL category; --> People: (HACKERS: Please read this entire thread at http://archives.postgresql.org/pgsql...4/msg00179.php Sorry for crossing this over.) > > The larger point ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > pgsql Hackers

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-11-2008, 04:22 AM
Josh Berkus
 
Posts: n/a
Default Re: [PERFORM] Functionscan estimates

People:

(HACKERS: Please read this entire thread at
http://archives.postgresql.org/pgsql...4/msg00179.php
Sorry for crossing this over.)

> > The larger point is that writing an estimator for an SRF is frequently a
> > task about as difficult as writing the SRF itself

>
> True, although I think this doesn't necessarily kill the idea. If
> writing an estimator for a given SRF is too difficult, the user is no
> worse off than they are today. Hopefully there would be a fairly large
> class of SRFs for which writing an estimator would be relatively simple,
> and result in improved planner behavior.


For that matter, even supplying an estimate constant would be a vast
improvement over current functionality. I would suggest, in fact, that we
allow the use of either a constant number, or an estimator function, in that
column. Among other things, this would allow implementing the constant
number right now and the use of an estimating function later, in case we can
do the one but not the other for 8.1.

To be more sophisticated about the estimator function, it could take a subset
of the main functions arguments, based on $1 numbering, for example:
CREATE FUNCTION some_func ( INT, TEXT, TEXT, INT, INT ) ...
ALTER FUNCTION some_func WITH ESTIMATOR some_func_est( $4, $5 )

This would make writing estimators which would work for several functions
easier. Estimators would be a special type of functions which would take
any params and RETURN ESTIMATOR, which would be implicitly castable from some
general numeric type (like INT or FLOAT).

> > I don't foresee a whole lot of use of an estimator hook designed as
> > proposed here. In particular, if the API is such that we can only
> > use the estimator when all the function arguments are plan-time
> > constants, it's not going to be very helpful.


Actually, 95% of the time I use SRFs they are accepting constants and not row
references. And I use a lot of SRFs.

>
> Yes One approach might be to break the function's domain into pieces
> and have the estimator function calculate the estimated result set size
> for each piece. So, given a trivial function like:
>
> foo(int):
> if $1 < 10 then produce 100 rows
> else produce 10000 rows
>
> If the planner has encoded the distribution of input tuples to the
> function as a histogram, it could invoke the SRF's estimator function
> for the boundary values of each histogram bucket, and use that to get an
> idea of the function's likely result set size at runtime.
>
> And yes, the idea as sketched is totally unworkable For one thing,
> the difficulty of doing this grows rapidly as the number of arguments to
> the function increases. But perhaps there is some variant of this idea
> that might work...
>
> Another thought is that the estimator could provide information on the
> cost of evaluating the function, the number of tuples produced by the
> function, and even the distribution of those tuples.


Another possibility would be to support default values for all estimator
functions and have functions called in row context passed DEFAULT, thus
leaving it up to the estimator writer to supply median values for context
cases. Or to simply take the "first" values and use those.

While any of these possibilites aren't ideal, they are an improvement over the
current "flat 1000" estimate. As I said, even the ability to set a
per-function flat constant estimate would be an improvement.

> BTW, why is this on -performance? It should be on -hackers.


'cause I spend more time reading -performance, and I started the thread.
Crossed over now.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 11:31 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com