Unix Technical Forum

Sequential scan on FK join

This is a discussion on Sequential scan on FK join within the Pgsql Performance forums, part of the PostgreSQL category; --> All, I can see why the query below is slow. The lead table is 34 million rows, and a ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 12:33 PM
Martin Nickel
 
Posts: n/a
Default Sequential scan on FK join

All,

I can see why the query below is slow. The lead table is 34 million rows,
and a sequential scan always takes 3+ minutes. Mailing_id is the PK for
mailing and is constrained as a foreign key (NULLS allowed) in lead.
There is an index on lead.mailing_id. I've just run VACUUM ANALYZE on
lead. I don't understand why it isn't being used.

Thanks for your help,
Martin Nickel

SELECT m.mailcode, l.lead_id
FROM mailing m
INNER JOIN lead l ON m.mailing_id = l.mailing_id
WHERE (m.maildate >= '2005-7-01'::date
AND m.maildate < '2005-8-01'::date)
-- takes 510,145 ms

EXPLAIN SELECT m.mailcode, l.lead_id
FROM mailing m
INNER JOIN lead l ON m.mailing_id = l.mailing_id
WHERE (m.maildate >= '2005-7-01'::date
AND m.maildate < '2005-8-01'::date)

Hash Join (cost=62.13..2001702.55 rows=2711552 width=20)
Hash Cond: ("outer".mailing_id = "inner".mailing_id)
-> Seq Scan on lead l (cost=0.00..1804198.60 rows=34065260 width=8)
-> Hash (cost=61.22..61.22 rows=362 width=20)
-> Index Scan using mailing_maildate_idx on mailing m (cost=0.00..61.22 rows=362 width=20)
Index Cond: ((maildate >= '2005-07-01'::date) AND (maildate < '2005-08-01'::date))

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 12:33 PM
Martin Nickel
 
Posts: n/a
Default Re: Sequential scan on FK join

Here's an EXPLAIN ANALYZE so you can see the actuals:

"Hash Join (cost=62.13..2001702.55 rows=2711552 width=20) (actual
time=40.659..244709.315 rows=2125270 loops=1)" " Hash Cond:
("outer".mailing_id = "inner".mailing_id)" " -> Seq Scan on lead l
(cost=0.00..1804198.60 rows=34065260 width=8) (actual
time=8.621..180281.094 rows=34060373 loops=1)" " -> Hash
(cost=61.22..61.22 rows=362 width=20) (actual time=28.718..28.718 rows=0
loops=1)" " -> Index Scan using mailing_maildate_idx on mailing m
(cost=0.00..61.22 rows=362 width=20) (actual time=16.571..27.793 rows=430
loops=1)" " Index Cond: ((maildate >= '2005-07-01'::date) AND
(maildate < '2005-08-01'::date))" "Total runtime: 248104.339 ms"


On Wed, 12 Oct 2005 15:40:24 -0500, Martin Nickel wrote:

> All,
>
> I can see why the query below is slow. The lead table is 34 million rows,
> and a sequential scan always takes 3+ minutes. Mailing_id is the PK for
> mailing and is constrained as a foreign key (NULLS allowed) in lead.
> There is an index on lead.mailing_id. I've just run VACUUM ANALYZE on
> lead. I don't understand why it isn't being used.
>
> Thanks for your help,
> Martin Nickel
>
> SELECT m.mailcode, l.lead_id
> FROM mailing m
> INNER JOIN lead l ON m.mailing_id = l.mailing_id
> WHERE (m.maildate >= '2005-7-01'::date
> AND m.maildate < '2005-8-01'::date)
> -- takes 510,145 ms
>
> EXPLAIN SELECT m.mailcode, l.lead_id
> FROM mailing m
> INNER JOIN lead l ON m.mailing_id = l.mailing_id
> WHERE (m.maildate >= '2005-7-01'::date
> AND m.maildate < '2005-8-01'::date)
>
> Hash Join (cost=62.13..2001702.55 rows=2711552 width=20)
> Hash Cond: ("outer".mailing_id = "inner".mailing_id)
> -> Seq Scan on lead l (cost=0.00..1804198.60 rows=34065260 width=8)
> -> Hash (cost=61.22..61.22 rows=362 width=20)
> -> Index Scan using mailing_maildate_idx on mailing m (cost=0.00..61.22 rows=362 width=20)
> Index Cond: ((maildate >= '2005-07-01'::date) AND (maildate < '2005-08-01'::date))


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 12:33 PM
Martin Nickel
 
Posts: n/a
Default Re: Sequential scan on FK join

.... and if I turn off enable_seqscan attempting to force an indexed join,
the execution time goes up. This makes no sense at all to me.

SELECT m.mailcode, l.lead_id
FROM mailing m
INNER JOIN lead l ON m.mailing_id = l.mailing_id
WHERE (m.maildate >= '2005-7-01'::date
AND m.maildate < '2005-8-01'::date)

Nested Loop (cost=0.00..15995420.56 rows=2711552 width=20)
-> Index Scan using mailing_maildate_idx on mailing m
(cost=0.00..61.22 rows=362 width=20)
Index Cond: ((maildate >= '2005-07-01'::date)
AND (maildate < '2005-08-01'::date))
-> Index Scan using lead_mailing_fk_idx2 on lead l
(cost=0.00..43870.96 rows=25209 width=8)
Index Cond: ("outer".mailing_id = l.mailing_id)

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 12:33 PM
Martin Nickel
 
Posts: n/a
Default Re: Sequential scan on FK join

Does anyone have any ideas on this? Right now I just have to say
"Postgres doesn't work". The query is running on a dedicated dual-Opteron
server with 4G memory and 3Ware SATA raid and runs in 2 minutes (best
case, no activity on the server). The same query on a wimpy single-P4 MS
SQL Server box runs in 1/3 the time. There is some kind of an issue here.

On Wed, 12 Oct 2005 16:16:17 -0500, Martin Nickel wrote:

> ... and if I turn off enable_seqscan attempting to force an indexed join,
> the execution time goes up. This makes no sense at all to me.
>
> SELECT m.mailcode, l.lead_id
> FROM mailing m
> INNER JOIN lead l ON m.mailing_id = l.mailing_id WHERE (m.maildate >=
> '2005-7-01'::date
> AND m.maildate < '2005-8-01'::date)
>
> Nested Loop (cost=0.00..15995420.56 rows=2711552 width=20)
> -> Index Scan using mailing_maildate_idx on mailing m
> (cost=0.00..61.22 rows=362 width=20)
> Index Cond: ((maildate >= '2005-07-01'::date)
> AND (maildate < '2005-08-01'::date))
> -> Index Scan using lead_mailing_fk_idx2 on lead l
> (cost=0.00..43870.96 rows=25209 width=8)
> Index Cond: ("outer".mailing_id = l.mailing_id)


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 12:33 PM
Richard Huxton
 
Posts: n/a
Default Re: Sequential scan on FK join

Martin Nickel wrote:
> EXPLAIN SELECT m.mailcode, l.lead_id
> FROM mailing m
> INNER JOIN lead l ON m.mailing_id = l.mailing_id
> WHERE (m.maildate >= '2005-7-01'::date
> AND m.maildate < '2005-8-01'::date)
>
> Hash Join (cost=62.13..2001702.55 rows=2711552 width=20)
> Hash Cond: ("outer".mailing_id = "inner".mailing_id)
> -> Seq Scan on lead l (cost=0.00..1804198.60 rows=34065260 width=8)
> -> Hash (cost=61.22..61.22 rows=362 width=20)
> -> Index Scan using mailing_maildate_idx on mailing m (cost=0.00..61.22 rows=362 width=20)
> Index Cond: ((maildate >= '2005-07-01'::date) AND (maildate < '2005-08-01'::date))


Well the reason *why* is that the planner expects 2.71 million rows to
be matched. If that was the case, then a seq-scan of 34 million rows
might well make sense. The output from EXPLAIN ANALYSE would show us
whether that estimate is correct - is it?

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 12:33 PM
Martin Nickel
 
Posts: n/a
Default Re: Sequential scan on FK join

Subject: Re: Sequential scan on FK join
From: Martin Nickel <martin@portant.com>
Newsgroups: pgsql.performance
Date: Wed, 12 Oct 2005 15:53:35 -0500

Richard, here's the EXPLAIN ANALYZE. I see your point re: the 2.7M
expected vs the 2 actual, but I've run ANALYZE on the lead table and it
hasn't changed the plan. Suggestions?

"Hash Join (cost=62.13..2001702.55 rows=2711552 width=20) (actual
time=40.659..244709.315 rows=2 125270 loops=1)" " Hash Cond:
("outer".mailing_id = "inner".mailing_id)" " -> Seq Scan on lead l
(cost=0.00..1804198.60 rows=34065260 width=8) (actual
time=8.621..180281.094 rows=34060373 loops=1)" " -> Hash
(cost=61.22..61.22 rows=362 width=20) (actual time=28.718..28.718 rows=0
loops=1)" " -> Index Scan using mailing_maildate_idx on mailing m
(cost=0.00..61.22 rows=362 width=20) (actual time=16.571..27.793 rows=430
loops=1)" " Index Cond: ((maildate >= '2005-07-01'::date) AND
(maildate < '2005-08-01'::date))" "Total runtime: 248104.339 ms"


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 12:33 PM
Richard Huxton
 
Posts: n/a
Default Re: Sequential scan on FK join

Martin Nickel wrote:
> Subject: Re: Sequential scan on FK join
> From: Martin Nickel <martin@portant.com>
> Newsgroups: pgsql.performance
> Date: Wed, 12 Oct 2005 15:53:35 -0500
>
> Richard, here's the EXPLAIN ANALYZE. I see your point re: the 2.7M
> expected vs the 2 actual, but I've run ANALYZE on the lead table and it
> hasn't changed the plan. Suggestions?
>
> Hash Join (cost=62.13..2001702.55 rows=2711552 width=20)
> (actual time=40.659..244709.315 rows=2 125270 loops=1)

^^^
Hmm - is that not just a formatting gap there? Is it not 2,125,270 rows
matching which would suggest PG is getting it more right than wrong.

Try issuing "SET enable_seqscan=false" before running the explain
analyse - that will force the planner to use any indexes it can find and
should show us whether the index would help.
--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 12:33 PM
Martin Nickel
 
Posts: n/a
Default Re: Sequential scan on FK join

When I turn of seqscan it does use the index - and it runs 20 to 30%
longer. Based on that, the planner is correctly choosing a sequential
scan - but that's just hard for me to comprehend. I'm joining on an int4
key, 2048 per index page - I guess that's a lot of reads - then the data
-page reads. Still, the 8-minute query time seems excessive.

On Mon, 17 Oct 2005 18:45:38 +0100, Richard Huxton wrote:

> Martin Nickel wrote:
>> Subject: Re: Sequential scan on FK join From: Martin Nickel
>> <martin@portant.com> Newsgroups: pgsql.performance
>> Date: Wed, 12 Oct 2005 15:53:35 -0500
>>
>> Richard, here's the EXPLAIN ANALYZE. I see your point re: the 2.7M
>> expected vs the 2 actual, but I've run ANALYZE on the lead table and it
>> hasn't changed the plan. Suggestions?
>>
>> Hash Join (cost=62.13..2001702.55 rows=2711552 width=20) (actual
>> time=40.659..244709.315 rows=2 125270 loops=1)

> ^^^
> Hmm - is that not just a formatting gap there? Is it not 2,125,270 rows
> matching which would suggest PG is getting it more right than wrong.
>
> Try issuing "SET enable_seqscan=false" before running the explain analyse
> - that will force the planner to use any indexes it can find and should
> show us whether the index would help. --
> Richard Huxton
> Archonet Ltd
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
> http://archives.postgresql.org


Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 12:34 PM
Richard Huxton
 
Posts: n/a
Default Re: Sequential scan on FK join

Martin Nickel wrote:
> When I turn of seqscan it does use the index - and it runs 20 to 30%
> longer. Based on that, the planner is correctly choosing a sequential
> scan - but that's just hard for me to comprehend. I'm joining on an int4
> key, 2048 per index page - I guess that's a lot of reads - then the data
> -page reads. Still, the 8-minute query time seems excessive.


You'll be getting (many) fewer than 2048 index entries per page. There's
a page header and various pointers involved too, and index pages aren't
going to be full. So - it needs to search the table on dates, fetch the
id's and then assemble them for the hash join. Of course, if you have
too many to join then all this will spill to disk slowing you further.

Now, you'd rather get down below 8 minutes. There are a number of options:
1. Make sure your disk i/o is being pushed to its limit
2. Look into increasing the sort memory for this one query "set
work_mem..." (see the runtime configuration section of the manual)
3. Actually - are you happy that your general configuration is OK?
4. Perhaps use a cursor - I'm guessing you want to process these
mailings in some way and only want them one at a time in any case.
5. Try the query one day at a time and see if the balance tips the
other way - you'll be dealing with substantially less data per query
which might match your system better. Of course, this may not be
practical for your applicaton.
6. If your lead table is updated only rarely, you could try a CLUSTER
on the table by mailing_id - that should speed the scan. Read the manual
for the cluster command first though.

--
Richard Huxton
Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 12:34 PM
Martin Nickel
 
Posts: n/a
Default Re: Sequential scan on FK join

On Tue, 18 Oct 2005 08:52:15 +0100, Richard Huxton wrote:

> Martin Nickel wrote:
>> When I turn of seqscan it does use the index - and it runs 20 to 30%
>> longer. Based on that, the planner is correctly choosing a sequential
>> scan - but that's just hard for me to comprehend. I'm joining on an
>> int4 key, 2048 per index page - I guess that's a lot of reads - then the
>> data -page reads. Still, the 8-minute query time seems excessive.

>
> You'll be getting (many) fewer than 2048 index entries per page. There's a
> page header and various pointers involved too, and index pages aren't
> going to be full. So - it needs to search the table on dates, fetch the
> id's and then assemble them for the hash join. Of course, if you have too
> many to join then all this will spill to disk slowing you further.
>
> Now, you'd rather get down below 8 minutes. There are a number of options:
> 1. Make sure your disk i/o is being pushed to its limit

We are completely peaked out on disk io. iostat frequently shows 60%
iowait time. This is quite an issue for us and I don't have any
great ideas. Data is on a 3ware sata raid at raid 10 across 4 disks. I
can barely even run vacuums on our largest table (lead) since it runs for
a day and a half and kills our online performance while running.

> 2. Look into increasing the sort memory for this one query "set
> work_mem..." (see the runtime configuration section of the manual)

I haven't tried this, and I will. Thanks for the idea.

> 3. Actually - are you happy that your general configuration is OK?

I'm not at all. Most of the configuration changes I've tried have made
almost no discernable difference. I'll post the relevant numbers in a
different post - possibly you'll have some suggestions.

> 4. Perhaps use a cursor - I'm guessing you want to process these
> mailings in some way and only want them one at a time in any case.

Where this first came up was in trying to get aggregate totals per
mailing. I gave up on that and created a nightly job to create a summary
table since Postgres wasn't up to the job in real time. Still, I
frequently need to do the join and limit it by other criteria - and it is
incredibly slow - even when the result set is smallish.

> 5. Try the query one day at a time and see if the balance tips the
> other way - you'll be dealing with substantially less data per query
> which might match your system better. Of course, this may not be
> practical for your applicaton.

It is not useful.

> 6. If your lead table is updated only rarely, you could try a CLUSTER
> on the table by mailing_id - that should speed the scan. Read the manual
> for the cluster command first though.

The lead table is one of the most volatle in our system. Each day we
insert tens or hundreds of thousands of rows, update almost that many, and
delete a few. It is growing, and could reach 100 million rows in 8 or 9
months. We're redesigning the data structure a little so lead is not
updated (updates are just too slow), but it will continue to have inserts
and deletes, and we'll have to join it with the associated table being
updated, which already promises to be a slow operation.

We're looking at 15K rpm scsi drives for a replacement raid array. We are
getting the place where it may be cheaper to convert to Oracle or DB2 than
to try and make Posgres work.

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 06:28 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com