Unix Technical Forum

estimated rows vs. actual rows

This is a discussion on estimated rows vs. actual rows within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi, in the #postgresql-es channel someone shows me this: pgsql-7.4.5 + postgis --- begin context --- CREATE TABLE calles ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 11:07 AM
Jaime Casanova
 
Posts: n/a
Default estimated rows vs. actual rows

Hi,

in the #postgresql-es channel someone shows me this:

pgsql-7.4.5 + postgis

--- begin context ---

CREATE TABLE calles (
gid int4 NOT NULL DEFAULT nextval('public.callesstgo_gid_seq'::text),
nombre varchar,
inicio int4,
termino int4,
comuna varchar,
ciudad varchar,
region numeric,
pais varchar,
the_geom geometry,
id_comuna numeric,
CONSTRAINT callesstgo_pkey PRIMARY KEY (gid),
CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) =
'MULTILINESTRING'::text OR the_geom IS NULL),
CONSTRAINT enforce_srid_the_geom CHECK (srid(the_geom) = -1)
)
WITH OIDS;
*
CREATE INDEX idx_region_comunas ON calles USING btree
(id_comuna, region);

select count(*) from calles;
143902

--- end context ---
*
Ok . here is the problem (BTW, the database has been analyzed just
before this query was execured)

explain analyze
select * from calles where id_comuna = 92 and region=13;

QUERY PLAN Seq Scan on calles (cost=0.00..7876.53 rows=2610
width=279) (actual time=182.590..454.195 rows=4612 loops=1)
Filter: ((id_comuna = 92::numeric) AND (region = 13::numeric))
Total runtime: 456.876 ms


Why is this query using a seq scan rather than a index scan? i notice
the diff between the estimated rows and actual rows (almost 2000).

Can this affect the query plan? i think this is a problem of
statistics, am i right? if so, what can be done?

regards,
Jaime Casanova

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 11:07 AM
Jaime Casanova
 
Posts: n/a
Default Re: estimated rows vs. actual rows

On Sun, 13 Feb 2005 13:41:09 -0800, Josh Berkus <josh@agliodbs.com> wrote:
> Jaime,
>
> > Why is this query using a seq scan rather than a index scan?

>
> Because it thinks a seq scan will be faster.
>

I will suggest him to probe with seq scans disabled.

But, IMHO, if the table has 143902 and it thinks will retrieve 2610
(almost 1.81% of the total). it won't be faster with an index?

i know, i will suggest him to probe to be sure. just an opinion.

regards,
Jaime Casanova

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 11:07 AM
Josh Berkus
 
Posts: n/a
Default Re: estimated rows vs. actual rows

Jaime,

> Why is this query using a seq scan rather than a index scan?


Because it thinks a seq scan will be faster.

> i notice
> the diff between the estimated rows and actual rows (almost 2000).


Yes, ANALYZE, and possibly increasing the column stats, should help that.

> Can this affect the query plan? i think this is a problem of
> statistics, am i right? if so, what can be done?


Well, if the estimate was accurate, PG would be even *more* likely to use a
seq scan (more rows).

I think maybe you should establish whether a seq scan actually *is* faster?
Perhaps do SET enable_seqscan = false and then re-run the query a few times?

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 11:07 AM
Christopher Browne
 
Posts: n/a
Default Re: estimated rows vs. actual rows

After takin a swig o' Arrakan spice grog, systemguards@gmail.com (Jaime Casanova) belched out:
> On Sun, 13 Feb 2005 13:41:09 -0800, Josh Berkus <josh@agliodbs.com> wrote:
>> Jaime,
>>
>> > Why is this query using a seq scan rather than a index scan?

>>
>> Because it thinks a seq scan will be faster.
>>

> I will suggest him to probe with seq scans disabled.
>
> But, IMHO, if the table has 143902 and it thinks will retrieve 2610
> (almost 1.81% of the total). it won't be faster with an index?


If the 2610 rows are scattered widely enough, it may be cheaper to do
a seq scan.

After all, with a seq scan, you read each block of the table's pages
exactly once.

With an index scan, you read index pages _and_ table pages, and may do
and redo some of the pages.

It sounds as though it's worth forcing the matter and trying it both
ways and comparing them. Don't be surprised if the seq scan is in
fact faster...
--
select 'cbbrowne' || '@' || 'gmail.com';
http://cbbrowne.com/info/emacs.html
When aiming for the common denominator, be prepared for the occasional
division by zero.
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 11:07 AM
Tom Lane
 
Posts: n/a
Default Re: estimated rows vs. actual rows

Jaime Casanova <systemguards@gmail.com> writes:
> But, IMHO, if the table has 143902 and it thinks will retrieve 2610
> (almost 1.81% of the total). it won't be faster with an index?


That's almost one row in fifty. We don't know how wide the table is,
but it's certainly possible that there are order-of-a-hundred rows
on each page; in which case the indexscan is likely to hit every page.
Twice. Not in sequence. Only if the selected rows are pretty well
clustered in a small part of the table is this going to be a win
over a seqscan.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 11:07 AM
Mark Kirkwood
 
Posts: n/a
Default Re: estimated rows vs. actual rows

Jaime Casanova wrote:
>
> But, IMHO, if the table has 143902 and it thinks will retrieve 2610
> (almost 1.81% of the total). it won't be faster with an index?
>


Depends on how those 2610 rows are distributed amongst the 143902. The
worst case scenario is each one of them in its own page. In that case
you have to read 2610 *pages*, which is probably a significant
percentage of the table.

You can find out this information from the pg_stats view (particularly
the correlation column).


Mark

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 01:49 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com