Unix Technical Forum

partitioned table and ORDER BY indexed_field DESC LIMIT 1

This is a discussion on partitioned table and ORDER BY indexed_field DESC LIMIT 1 within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi. I just created partitioned table, n_traf, sliced by month (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:25 AM
Anton
 
Posts: n/a
Default partitioned table and ORDER BY indexed_field DESC LIMIT 1

Hi.

I just created partitioned table, n_traf, sliced by month
(n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
indexed by 'date_time' column.
Then I populate it (last value have date 2007-08-...) and do VACUUM
ANALYZE ON n_traf_y2007... all of it.

Now I try to select latest value (ORDER BY date_time LIMIT 1), but
Postgres produced the ugly plan:

=# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
QUERY PLAN
---------------------------------------------------------------------------------------------------------
Limit (cost=824637.69..824637.69 rows=1 width=32)
-> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
Sort Key: public.n_traf.date_time
-> Result (cost=0.00..100877.99 rows=5643499 width=32)
-> Append (cost=0.00..100877.99 rows=5643499 width=32)
-> Seq Scan on n_traf (cost=0.00..22.30
rows=1230 width=32)
-> Seq Scan on n_traf_y2007m01 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m02 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m03 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m04 n_traf
(cost=0.00..1.01 rows=1 width=32)
-> Seq Scan on n_traf_y2007m05 n_traf
(cost=0.00..9110.89 rows=509689 width=32)
-> Seq Scan on n_traf_y2007m06 n_traf
(cost=0.00..32003.89 rows=1790489 width=32)
-> Seq Scan on n_traf_y2007m07 n_traf
(cost=0.00..33881.10 rows=1895510 width=32)
-> Seq Scan on n_traf_y2007m08 n_traf
(cost=0.00..25702.70 rows=1437970 width=32)
-> Seq Scan on n_traf_y2007m09 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m10 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m11 n_traf
(cost=0.00..22.30 rows=1230 width=32)
-> Seq Scan on n_traf_y2007m12 n_traf
(cost=0.00..22.30 rows=1230 width=32)
(18 rows)


Why it no uses indexes at all?
-------------------------------------------

The simplier query goes fast, use index.
=# explain analyze SELECT * FROM n_traf_y2007m08 ORDER BY date_time
DESC LIMIT 1;

QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..0.03 rows=1 width=32) (actual time=0.156..0.158
rows=1 loops=1)
-> Index Scan Backward using n_traf_y2007m08_date_time_login_id on
n_traf_y2007m08 (cost=0.00..39489.48 rows=1437970 width=32) (actual
time=0.150..0.150 rows=1 loops=1)
Total runtime: 0.241 ms
(3 rows)

Table n_traf looks like this:
=# \d n_traf
Table "public.n_traf"
Column | Type | Modifiers
-------------+-----------------------------+--------------------
login_id | integer | not null
traftype_id | integer | not null
date_time | timestamp without time zone | not null
bytes_in | bigint | not null default 0
bytes_out | bigint | not null default 0
Indexes:
"n_traf_login_id_key" UNIQUE, btree (login_id, traftype_id, date_time)
"n_traf_date_time_login_id" btree (date_time, login_id)
Foreign-key constraints:
"n_traf_login_id_fkey" FOREIGN KEY (login_id) REFERENCES
n_logins(login_id) ON UPDATE CASCADE ON DELETE CASCADE
"n_traf_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES
n_traftypes(traftype_id) ON UPDATE CASCADE
Rules:
n_traf_insert_y2007m01 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-01-01'::date AND new.date_time <
'2007-02-01 00:00:00'::timestamp without time zone DO INSTEAD
INSERT INTO n_traf_y2007m01 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m02 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-02-01'::date AND new.date_time <
'2007-03-01 00:00:00'::timestamp without time zone DO INSTEAD
INSERT INTO n_traf_y2007m02 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m03 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-03-01'::date AND new.date_time <
'2007-04-01 00:00:00'::timestamp without time zone DO INSTEAD
INSERT INTO n_traf_y2007m03 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m04 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-04-01'::date AND new.date_time <
'2007-05-01 00:00:00'::timestamp without time zone DO INSTEAD
INSERT INTO n_traf_y2007m04 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m05 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-05-01'::date AND new.date_time <
'2007-06-01 00:00:00'::timestamp without time zone DO INSTEAD
INSERT INTO n_traf_y2007m05 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m06 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-06-01'::date AND new.date_time <
'2007-07-01 00:00:00'::timestamp without time zone DO INSTEAD
INSERT INTO n_traf_y2007m06 (login_id, traftype_id, date_time,
bytes_in, bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m07 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-07-01'::date AND new.date_time <
'2007-08-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT
INTO n_traf_y2007m07 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m08 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-08-01'::date AND new.date_time <
'2007-09-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT
INTO n_traf_y2007m08 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m09 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-09-01'::date AND new.date_time <
'2007-10-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT
INTO n_traf_y2007m09 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m10 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-10-01'::date AND new.date_time <
'2007-11-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT
INTO n_traf_y2007m10 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m11 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-11-01'::date AND new.date_time <
'2007-12-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT
INTO n_traf_y2007m11 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)
n_traf_insert_y2007m12 AS
ON INSERT TO n_traf
WHERE new.date_time >= '2007-12-01'::date AND new.date_time <
'2008-01-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT
INTO n_traf_y2007m12 (login_id, traftype_id, date_time, bytes_in,
bytes_out)
VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in,
new.bytes_out)


Tables n_traf_y2007m... looks like these

Table "public.n_traf_y2007m01"
Column | Type | Modifiers
-------------+-----------------------------+--------------------
login_id | integer | not null
traftype_id | integer | not null
date_time | timestamp without time zone | not null
bytes_in | bigint | not null default 0
bytes_out | bigint | not null default 0
Indexes:
"n_traf_y2007m01_date_time_login_id" btree (date_time, login_id)
Check constraints:
"n_traf_y2007m01_date_time_check" CHECK (date_time >=
'2007-01-01'::date AND date_time < '2007-02-01 00:00:00'::timestamp
without time zone)
Inherits: n_traf

Index "public.n_traf_y2007m01_date_time_login_id"
Column | Type
-----------+-----------------------------
date_time | timestamp without time zone
login_id | integer
btree, for table "public.n_traf_y2007m01"

Table "public.n_traf_y2007m02"
Column | Type | Modifiers
-------------+-----------------------------+--------------------
login_id | integer | not null
traftype_id | integer | not null
date_time | timestamp without time zone | not null
bytes_in | bigint | not null default 0
bytes_out | bigint | not null default 0
Indexes:
"n_traf_y2007m02_date_time_login_id" btree (date_time, login_id)
Check constraints:
"n_traf_y2007m02_date_time_check" CHECK (date_time >=
'2007-02-01'::date AND date_time < '2007-03-01 00:00:00'::timestamp
without time zone)
Inherits: n_traf
....

--
engineer

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:25 AM
Mikko Partio
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

On 8/24/07, Anton <anton200@gmail.com> wrote:
>
> Hi.
>
> I just created partitioned table, n_traf, sliced by month
> (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
> indexed by 'date_time' column.
> Then I populate it (last value have date 2007-08-...) and do VACUUM
> ANALYZE ON n_traf_y2007... all of it.
>
> Now I try to select latest value (ORDER BY date_time LIMIT 1), but
> Postgres produced the ugly plan:
>
> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> QUERY PLAN
>
> ---------------------------------------------------------------------------------------------------------
> Limit (cost=824637.69..824637.69 rows=1 width=32)
> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
> Sort Key: public.n_traf.date_time
> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
> -> Append (cost=0.00..100877.99 rows=5643499 width=32)
> -> Seq Scan on n_traf (cost=0.00..22.30
> rows=1230 width=32)
> -> Seq Scan on n_traf_y2007m01 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> -> Seq Scan on n_traf_y2007m02 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> -> Seq Scan on n_traf_y2007m03 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> -> Seq Scan on n_traf_y2007m04 n_traf
> (cost=0.00..1.01 rows=1 width=32)
> -> Seq Scan on n_traf_y2007m05 n_traf
> (cost=0.00..9110.89 rows=509689 width=32)
> -> Seq Scan on n_traf_y2007m06 n_traf
> (cost=0.00..32003.89 rows=1790489 width=32)
> -> Seq Scan on n_traf_y2007m07 n_traf
> (cost=0.00..33881.10 rows=1895510 width=32)
> -> Seq Scan on n_traf_y2007m08 n_traf
> (cost=0.00..25702.70 rows=1437970 width=32)
> -> Seq Scan on n_traf_y2007m09 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> -> Seq Scan on n_traf_y2007m10 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> -> Seq Scan on n_traf_y2007m11 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> -> Seq Scan on n_traf_y2007m12 n_traf
> (cost=0.00..22.30 rows=1230 width=32)
> (18 rows)
>
>
> Why it no uses indexes at all?
> -------------------------------------------





I'm no expert but I'd guess that the the planner doesn't know which
partition holds the latest time so it has to read them all.

Regards

MP

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:25 AM
Anton
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

> > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> > QUERY PLAN

> ---------------------------------------------------------------------------------------------------------
> > Limit (cost=824637.69..824637.69 rows=1 width=32)
> > -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
> > Sort Key: public.n_traf.date_time
> > -> Result (cost=0.00..100877.99 rows=5643499 width=32)
> > -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
> > -> Seq Scan on n_traf (cost=0.00..22.30
> > rows=1230 width=32)
> > -> Seq Scan on n_traf_y2007m01 n_traf
> > (cost=0.00..22.30 rows=1230 width=32)

....
> > -> Seq Scan on n_traf_y2007m12 n_traf
> > (cost=0.00..22.30 rows=1230 width=32)
> > (18 rows)
> >
> > Why it no uses indexes at all?
> > -------------------------------------------

> I'm no expert but I'd guess that the the planner doesn't know which
> partition holds the latest time so it has to read them all.


Agree. But why it not uses indexes when it reading them?

--
engineer

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:25 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

Anton wrote:
>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>> QUERY PLAN

>> ---------------------------------------------------------------------------------------------------------
>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
>>> Sort Key: public.n_traf.date_time
>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
>>> -> Seq Scan on n_traf (cost=0.00..22.30
>>> rows=1230 width=32)
>>> -> Seq Scan on n_traf_y2007m01 n_traf
>>> (cost=0.00..22.30 rows=1230 width=32)

> ...
>>> -> Seq Scan on n_traf_y2007m12 n_traf
>>> (cost=0.00..22.30 rows=1230 width=32)
>>> (18 rows)
>>>
>>> Why it no uses indexes at all?
>>> -------------------------------------------

>> I'm no expert but I'd guess that the the planner doesn't know which
>> partition holds the latest time so it has to read them all.

>
> Agree. But why it not uses indexes when it reading them?


The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
below the append node. Therefore it needs to fetch all rows from all the
tables, and the fastest way to do that is a seq scan.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 11:25 AM
Luke Lonergan
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESCLIMIT 1

We just fixed this - I'll post a patch, but I don't have time to verify
against HEAD.

- Luke


On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

> Anton wrote:
>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>> QUERY PLAN
>>> ----------------------------------------------------------------------------
>>> -----------------------------
>>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
>>>> Sort Key: public.n_traf.date_time
>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
>>>> -> Seq Scan on n_traf (cost=0.00..22.30
>>>> rows=1230 width=32)
>>>> -> Seq Scan on n_traf_y2007m01 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)

>> ...
>>>> -> Seq Scan on n_traf_y2007m12 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>> (18 rows)
>>>>
>>>> Why it no uses indexes at all?
>>>> -------------------------------------------
>>> I'm no expert but I'd guess that the the planner doesn't know which
>>> partition holds the latest time so it has to read them all.

>>
>> Agree. But why it not uses indexes when it reading them?

>
> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> below the append node. Therefore it needs to fetch all rows from all the
> tables, and the fastest way to do that is a seq scan.




---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 11:25 AM
Luke Lonergan
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESCLIMIT 1

Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
it?

This enables the use of index scan of a child table by recognizing sort
order of the append node. Kurt Harriman did the work.

- Luke

Index: cdb-pg/src/backend/optimizer/path/indxpath.c
================================================== =================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/indxpath.c,v
diff -u -N -r1.22 -r1.22.2.1
--- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21
-0000 1.22
+++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15
-0000 1.22.2.1
@@ -379,8 +379,51 @@
index_pathkeys = build_index_pathkeys(root, index,
ForwardScanDirection,
true);
- useful_pathkeys = truncate_useless_pathkeys(root, rel,
- index_pathkeys);
+ /*
+ * CDB: For appendrel child, pathkeys contain Var nodes in
terms
+ * of the child's baserel. Transform the pathkey list to refer
to
+ * columns of the appendrel.
+ */
+ if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
+ {
+ AppendRelInfo *appinfo = NULL;
+ RelOptInfo *appendrel = NULL;
+ ListCell *appcell;
+ CdbPathLocus notalocus;
+
+ /* Find the appendrel of which this baserel is a child. */
+ foreach(appcell, root->append_rel_list)
+ {
+ appinfo = (AppendRelInfo *)lfirst(appcell);
+ if (appinfo->child_relid == rel->relid)
+ break;
+ }
+ Assert(appinfo);
+ appendrel = find_base_rel(root, appinfo->parent_relid);
+
+ /*
+ * The pathkey list happens to have the same format as the
+ * partitioning key of a Hashed locus, so by disguising it
+ * we can use cdbpathlocus_pull_above_projection() to do
the
+ * transformation.
+ */
+ CdbPathLocus_MakeHashed(&notalocus, index_pathkeys);
+ notalocus =
+ cdbpathlocus_pull_above_projection(root,
+ notalocus,
+ rel->relids,
+ rel->reltargetlist,
+
appendrel->reltargetlist,
+ appendrel->relid);
+ if (CdbPathLocus_IsHashed(notalocus))
+ index_pathkeys = truncate_useless_pathkeys(root,
appendrel,
+
notalocus.partkey);
+ else
+ index_pathkeys = NULL;
+ }
+
+ useful_pathkeys = truncate_useless_pathkeys(root, rel,
+ index_pathkeys);
}
else
useful_pathkeys = NIL;
Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
================================================== =================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/path/pathkeys.c,v
diff -u -N -r1.18 -r1.18.2.1
--- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07
-0000 1.18
+++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15
-0000 1.18.2.1
@@ -1403,55 +1403,53 @@
{
PathKeyItem *item;
Expr *newexpr;
+ AttrNumber targetindex;

Assert(pathkey);

- /* Use constant expr if available. Will be at head of list. */
- if (CdbPathkeyEqualsConstant(pathkey))
+ /* Find an expr that we can rewrite to use the projected columns. */
+ item = cdbpullup_findPathKeyItemInTargetList(pathkey,
+ relids,
+ targetlist,
+ &targetindex); // OUT
+
+ /* If not found, see if the equiv class contains a constant expr. */
+ if (!item &&
+ CdbPathkeyEqualsConstant(pathkey))
{
item = (PathKeyItem *)linitial(pathkey);
newexpr = (Expr *)copyObject(item->key);
}

- /* New vars for old! */
- else
- {
- AttrNumber targetindex;
+ /* Fail if no usable expr. */
+ else if (!item)
+ return NULL;

- /* Find an expr that we can rewrite to use the projected columns.
*/
- item = cdbpullup_findPathKeyItemInTargetList(pathkey,
- relids,
- targetlist,
- &targetindex); // OUT
- if (!item)
- return NULL;
+ /* If found matching targetlist item, make a Var that references it. */
+ else if (targetindex > 0)
+ newexpr = (Expr *)cdbpullup_makeVar(newrelid,
+ targetindex,
+ newvarlist,
+ (Expr *)item->key);

- /* If found matching targetlist item, make a Var that references
it. */
- if (targetindex > 0)
- newexpr = (Expr *)cdbpullup_makeVar(newrelid,
- targetindex,
- newvarlist,
- (Expr *)item->key);
+ /* Replace expr's Var nodes with new ones referencing the targetlist.
*/
+ else
+ newexpr = cdbpullup_expr((Expr *)item->key,
+ targetlist,
+ newvarlist,
+ newrelid);

- /* Replace expr's Var nodes with new ones referencing the
targetlist. */
- else
- newexpr = cdbpullup_expr((Expr *)item->key,
- targetlist,
- newvarlist,
- newrelid);
+ /* Pull up RelabelType node too, unless tlist expr has right type. */
+ if (IsA(item->key, RelabelType))
+ {
+ RelabelType *oldrelabel = (RelabelType *)item->key;

- /* Pull up RelabelType node too, unless tlist expr has right type.
*/
- if (IsA(item->key, RelabelType))
- {
- RelabelType *oldrelabel = (RelabelType *)item->key;
-
- if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
- oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
- newexpr = (Expr *)makeRelabelType(newexpr,
- oldrelabel->resulttype,
- oldrelabel->resulttypmod,
-
oldrelabel->relabelformat);
- }
+ if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
+ oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
+ newexpr = (Expr *)makeRelabelType(newexpr,
+ oldrelabel->resulttype,
+ oldrelabel->resulttypmod,
+ oldrelabel->relabelformat);
}
Insist(newexpr);

Index: cdb-pg/src/backend/optimizer/util/pathnode.c
================================================== =================
RCS file:
/data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
er/util/pathnode.c,v
diff -u -N -r1.52.2.4 -r1.52.2.5
--- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44
-0000 1.52.2.4
+++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15
-0000 1.52.2.5
@@ -1563,7 +1563,15 @@
pathnode->path.rescannable = false;
}

- return pathnode;
+ /*
+ * CDB: If there is exactly one subpath, its ordering is preserved.
+ * Child rel's pathkey exprs are already expressed in terms of the
+ * columns of the parent appendrel. See find_usable_indexes().
+ */
+ if (list_length(subpaths) == 1)
+ pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
+
+ return pathnode;
}

/*


On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:

> Anton wrote:
>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>> QUERY PLAN
>>> ----------------------------------------------------------------------------
>>> -----------------------------
>>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
>>>> Sort Key: public.n_traf.date_time
>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
>>>> -> Seq Scan on n_traf (cost=0.00..22.30
>>>> rows=1230 width=32)
>>>> -> Seq Scan on n_traf_y2007m01 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)

>> ...
>>>> -> Seq Scan on n_traf_y2007m12 n_traf
>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>> (18 rows)
>>>>
>>>> Why it no uses indexes at all?
>>>> -------------------------------------------
>>> I'm no expert but I'd guess that the the planner doesn't know which
>>> partition holds the latest time so it has to read them all.

>>
>> Agree. But why it not uses indexes when it reading them?

>
> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> below the append node. Therefore it needs to fetch all rows from all the
> tables, and the fastest way to do that is a seq scan.




---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 11:26 AM
Tomas Tamosaitis
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESCLIMIT 1

Pn, 2007 08 24 14:53 +0600, Anton rašė:
> Hi.
>
> I just created partitioned table, n_traf, sliced by month
> (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are
> indexed by 'date_time' column.
> Then I populate it (last value have date 2007-08-...) and do VACUUM
> ANALYZE ON n_traf_y2007... all of it.
>
> Now I try to select latest value (ORDER BY date_time LIMIT 1), but
> Postgres produced the ugly plan:
>
> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> QUERY PL


can you test performance and send explain results of select like this :
select * from n_traf where date_time = (select max(date_time) from
n_traf);

i have similar problem with ~70M rows table (then using ordering), but
my table not partitioned.
I`m interesting how this select will works on partitioned table.


--
Pagarbiai,
Tomas Tamošaitis
Projektų Vadovas
Connecty
Skype://mazgis1009?add
Mob: +370 652 86127
e-pastas: tomas.tamosaitis@connecty.lt
web: www.connecty.lt


---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-19-2008, 11:26 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

Bruce, would you please add this to the 8.4 patch queue so we remember
to look at this later?

It didn't occur to me that we can do that in the degenerate case when
there's just a single node below the Append. A more general solution
would be to check if the pathkeys of all the child nodes match, and do a
"merge append" similar to a merge join.

Luke Lonergan wrote:
> Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> it?
>
> This enables the use of index scan of a child table by recognizing sort
> order of the append node. Kurt Harriman did the work.
>
> - Luke
>
> Index: cdb-pg/src/backend/optimizer/path/indxpath.c
> ================================================== =================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/indxpath.c,v
> diff -u -N -r1.22 -r1.22.2.1
> --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21
> -0000 1.22
> +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15
> -0000 1.22.2.1
> @@ -379,8 +379,51 @@
> index_pathkeys = build_index_pathkeys(root, index,
> ForwardScanDirection,
> true);
> - useful_pathkeys = truncate_useless_pathkeys(root, rel,
> - index_pathkeys);
> + /*
> + * CDB: For appendrel child, pathkeys contain Var nodes in
> terms
> + * of the child's baserel. Transform the pathkey list to refer
> to
> + * columns of the appendrel.
> + */
> + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL)
> + {
> + AppendRelInfo *appinfo = NULL;
> + RelOptInfo *appendrel = NULL;
> + ListCell *appcell;
> + CdbPathLocus notalocus;
> +
> + /* Find the appendrel of which this baserel is a child. */
> + foreach(appcell, root->append_rel_list)
> + {
> + appinfo = (AppendRelInfo *)lfirst(appcell);
> + if (appinfo->child_relid == rel->relid)
> + break;
> + }
> + Assert(appinfo);
> + appendrel = find_base_rel(root, appinfo->parent_relid);
> +
> + /*
> + * The pathkey list happens to have the same format as the
> + * partitioning key of a Hashed locus, so by disguising it
> + * we can use cdbpathlocus_pull_above_projection() to do
> the
> + * transformation.
> + */
> + CdbPathLocus_MakeHashed(&notalocus, index_pathkeys);
> + notalocus =
> + cdbpathlocus_pull_above_projection(root,
> + notalocus,
> + rel->relids,
> + rel->reltargetlist,
> +
> appendrel->reltargetlist,
> + appendrel->relid);
> + if (CdbPathLocus_IsHashed(notalocus))
> + index_pathkeys = truncate_useless_pathkeys(root,
> appendrel,
> +
> notalocus.partkey);
> + else
> + index_pathkeys = NULL;
> + }
> +
> + useful_pathkeys = truncate_useless_pathkeys(root, rel,
> + index_pathkeys);
> }
> else
> useful_pathkeys = NIL;
> Index: cdb-pg/src/backend/optimizer/path/pathkeys.c
> ================================================== =================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/path/pathkeys.c,v
> diff -u -N -r1.18 -r1.18.2.1
> --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07
> -0000 1.18
> +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15
> -0000 1.18.2.1
> @@ -1403,55 +1403,53 @@
> {
> PathKeyItem *item;
> Expr *newexpr;
> + AttrNumber targetindex;
>
> Assert(pathkey);
>
> - /* Use constant expr if available. Will be at head of list. */
> - if (CdbPathkeyEqualsConstant(pathkey))
> + /* Find an expr that we can rewrite to use the projected columns. */
> + item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> + relids,
> + targetlist,
> + &targetindex); // OUT
> +
> + /* If not found, see if the equiv class contains a constant expr. */
> + if (!item &&
> + CdbPathkeyEqualsConstant(pathkey))
> {
> item = (PathKeyItem *)linitial(pathkey);
> newexpr = (Expr *)copyObject(item->key);
> }
>
> - /* New vars for old! */
> - else
> - {
> - AttrNumber targetindex;
> + /* Fail if no usable expr. */
> + else if (!item)
> + return NULL;
>
> - /* Find an expr that we can rewrite to use the projected columns.
> */
> - item = cdbpullup_findPathKeyItemInTargetList(pathkey,
> - relids,
> - targetlist,
> - &targetindex); // OUT
> - if (!item)
> - return NULL;
> + /* If found matching targetlist item, make a Var that references it. */
> + else if (targetindex > 0)
> + newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> + targetindex,
> + newvarlist,
> + (Expr *)item->key);
>
> - /* If found matching targetlist item, make a Var that references
> it. */
> - if (targetindex > 0)
> - newexpr = (Expr *)cdbpullup_makeVar(newrelid,
> - targetindex,
> - newvarlist,
> - (Expr *)item->key);
> + /* Replace expr's Var nodes with new ones referencing the targetlist.
> */
> + else
> + newexpr = cdbpullup_expr((Expr *)item->key,
> + targetlist,
> + newvarlist,
> + newrelid);
>
> - /* Replace expr's Var nodes with new ones referencing the
> targetlist. */
> - else
> - newexpr = cdbpullup_expr((Expr *)item->key,
> - targetlist,
> - newvarlist,
> - newrelid);
> + /* Pull up RelabelType node too, unless tlist expr has right type. */
> + if (IsA(item->key, RelabelType))
> + {
> + RelabelType *oldrelabel = (RelabelType *)item->key;
>
> - /* Pull up RelabelType node too, unless tlist expr has right type.
> */
> - if (IsA(item->key, RelabelType))
> - {
> - RelabelType *oldrelabel = (RelabelType *)item->key;
> -
> - if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> - newexpr = (Expr *)makeRelabelType(newexpr,
> - oldrelabel->resulttype,
> - oldrelabel->resulttypmod,
> -
> oldrelabel->relabelformat);
> - }
> + if (oldrelabel->resulttype != exprType((Node *)newexpr) ||
> + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr))
> + newexpr = (Expr *)makeRelabelType(newexpr,
> + oldrelabel->resulttype,
> + oldrelabel->resulttypmod,
> + oldrelabel->relabelformat);
> }
> Insist(newexpr);
>
> Index: cdb-pg/src/backend/optimizer/util/pathnode.c
> ================================================== =================
> RCS file:
> /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz
> er/util/pathnode.c,v
> diff -u -N -r1.52.2.4 -r1.52.2.5
> --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44
> -0000 1.52.2.4
> +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15
> -0000 1.52.2.5
> @@ -1563,7 +1563,15 @@
> pathnode->path.rescannable = false;
> }
>
> - return pathnode;
> + /*
> + * CDB: If there is exactly one subpath, its ordering is preserved.
> + * Child rel's pathkey exprs are already expressed in terms of the
> + * columns of the parent appendrel. See find_usable_indexes().
> + */
> + if (list_length(subpaths) == 1)
> + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys;
> +
> + return pathnode;
> }
>
> /*
>
>
> On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>
>> Anton wrote:
>>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
>>>>> QUERY PLAN
>>>> ----------------------------------------------------------------------------
>>>> -----------------------------
>>>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
>>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
>>>>> Sort Key: public.n_traf.date_time
>>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
>>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
>>>>> -> Seq Scan on n_traf (cost=0.00..22.30
>>>>> rows=1230 width=32)
>>>>> -> Seq Scan on n_traf_y2007m01 n_traf
>>>>> (cost=0.00..22.30 rows=1230 width=32)
>>> ...
>>>>> -> Seq Scan on n_traf_y2007m12 n_traf
>>>>> (cost=0.00..22.30 rows=1230 width=32)
>>>>> (18 rows)
>>>>>
>>>>> Why it no uses indexes at all?
>>>>> -------------------------------------------
>>>> I'm no expert but I'd guess that the the planner doesn't know which
>>>> partition holds the latest time so it has to read them all.
>>> Agree. But why it not uses indexes when it reading them?

>> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
>> below the append node. Therefore it needs to fetch all rows from all the
>> tables, and the fastest way to do that is a seq scan.

>
>



--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-19-2008, 11:41 AM
Anton
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

I want ask about problem with partioned tables (it was discussed some
time ago, see below). Is it fixed somehow in 8.2.5 ?

2007/8/24, Luke Lonergan <llonergan@greenplum.com>:
> Below is a patch against 8.2.4 (more or less), Heikki can you take a look at
> it?
>
> This enables the use of index scan of a child table by recognizing sort
> order of the append node. Kurt Harriman did the work.

....
>
> On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote:
>
> > Anton wrote:
> >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1;
> >>>> QUERY PLAN
> >>> ----------------------------------------------------------------------------
> >>> -----------------------------
> >>>> Limit (cost=824637.69..824637.69 rows=1 width=32)
> >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32)
> >>>> Sort Key: public.n_traf.date_time
> >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32)
> >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32)
> >>>> -> Seq Scan on n_traf (cost=0.00..22.30
> >>>> rows=1230 width=32)
> >>>> -> Seq Scan on n_traf_y2007m01 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >> ...
> >>>> -> Seq Scan on n_traf_y2007m12 n_traf
> >>>> (cost=0.00..22.30 rows=1230 width=32)
> >>>> (18 rows)
> >>>>
> >>>> Why it no uses indexes at all?
> >>>> -------------------------------------------
> >>> I'm no expert but I'd guess that the the planner doesn't know which
> >>> partition holds the latest time so it has to read them all.
> >>
> >> Agree. But why it not uses indexes when it reading them?

> >
> > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..."
> > below the append node. Therefore it needs to fetch all rows from all the
> > tables, and the fastest way to do that is a seq scan.


--
engineer

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-19-2008, 11:41 AM
Tom Lane
 
Posts: n/a
Default Re: partitioned table and ORDER BY indexed_field DESC LIMIT 1

Anton <anton200@gmail.com> writes:
> I want ask about problem with partioned tables (it was discussed some
> time ago, see below). Is it fixed somehow in 8.2.5 ?


No. The patch you mention never was considered at all, since it
consisted of a selective quote from Greenplum source code. It would
not even compile in community Postgres, because it adds calls to half a
dozen Greenplum routines that we've never seen. Not to mention that
the base of the diff is Greenplum proprietary code, so the patch itself
wouldn't even apply successfully.

As to whether it would work if we had the full story ... well, not
having the full story, I don't want to opine.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump