This is a discussion on partitioned table and ORDER BY indexed_field DESC LIMIT 1 within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi. I just created partitioned table, n_traf, sliced by month (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi. I just created partitioned table, n_traf, sliced by month (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are indexed by 'date_time' column. Then I populate it (last value have date 2007-08-...) and do VACUUM ANALYZE ON n_traf_y2007... all of it. Now I try to select latest value (ORDER BY date_time LIMIT 1), but Postgres produced the ugly plan: =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; QUERY PLAN --------------------------------------------------------------------------------------------------------- Limit (cost=824637.69..824637.69 rows=1 width=32) -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) Sort Key: public.n_traf.date_time -> Result (cost=0.00..100877.99 rows=5643499 width=32) -> Append (cost=0.00..100877.99 rows=5643499 width=32) -> Seq Scan on n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m01 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m02 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m03 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m04 n_traf (cost=0.00..1.01 rows=1 width=32) -> Seq Scan on n_traf_y2007m05 n_traf (cost=0.00..9110.89 rows=509689 width=32) -> Seq Scan on n_traf_y2007m06 n_traf (cost=0.00..32003.89 rows=1790489 width=32) -> Seq Scan on n_traf_y2007m07 n_traf (cost=0.00..33881.10 rows=1895510 width=32) -> Seq Scan on n_traf_y2007m08 n_traf (cost=0.00..25702.70 rows=1437970 width=32) -> Seq Scan on n_traf_y2007m09 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m10 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m11 n_traf (cost=0.00..22.30 rows=1230 width=32) -> Seq Scan on n_traf_y2007m12 n_traf (cost=0.00..22.30 rows=1230 width=32) (18 rows) Why it no uses indexes at all? ------------------------------------------- The simplier query goes fast, use index. =# explain analyze SELECT * FROM n_traf_y2007m08 ORDER BY date_time DESC LIMIT 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=0.00..0.03 rows=1 width=32) (actual time=0.156..0.158 rows=1 loops=1) -> Index Scan Backward using n_traf_y2007m08_date_time_login_id on n_traf_y2007m08 (cost=0.00..39489.48 rows=1437970 width=32) (actual time=0.150..0.150 rows=1 loops=1) Total runtime: 0.241 ms (3 rows) Table n_traf looks like this: =# \d n_traf Table "public.n_traf" Column | Type | Modifiers -------------+-----------------------------+-------------------- login_id | integer | not null traftype_id | integer | not null date_time | timestamp without time zone | not null bytes_in | bigint | not null default 0 bytes_out | bigint | not null default 0 Indexes: "n_traf_login_id_key" UNIQUE, btree (login_id, traftype_id, date_time) "n_traf_date_time_login_id" btree (date_time, login_id) Foreign-key constraints: "n_traf_login_id_fkey" FOREIGN KEY (login_id) REFERENCES n_logins(login_id) ON UPDATE CASCADE ON DELETE CASCADE "n_traf_traftype_id_fkey" FOREIGN KEY (traftype_id) REFERENCES n_traftypes(traftype_id) ON UPDATE CASCADE Rules: n_traf_insert_y2007m01 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-01-01'::date AND new.date_time < '2007-02-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m01 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m02 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-02-01'::date AND new.date_time < '2007-03-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m02 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m03 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-03-01'::date AND new.date_time < '2007-04-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m03 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m04 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-04-01'::date AND new.date_time < '2007-05-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m04 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m05 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-05-01'::date AND new.date_time < '2007-06-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m05 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m06 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-06-01'::date AND new.date_time < '2007-07-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m06 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m07 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-07-01'::date AND new.date_time < '2007-08-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m07 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m08 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-08-01'::date AND new.date_time < '2007-09-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m08 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m09 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-09-01'::date AND new.date_time < '2007-10-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m09 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m10 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-10-01'::date AND new.date_time < '2007-11-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m10 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m11 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-11-01'::date AND new.date_time < '2007-12-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m11 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) n_traf_insert_y2007m12 AS ON INSERT TO n_traf WHERE new.date_time >= '2007-12-01'::date AND new.date_time < '2008-01-01 00:00:00'::timestamp without time zone DO INSTEAD INSERT INTO n_traf_y2007m12 (login_id, traftype_id, date_time, bytes_in, bytes_out) VALUES (new.login_id, new.traftype_id, new.date_time, new.bytes_in, new.bytes_out) Tables n_traf_y2007m... looks like these Table "public.n_traf_y2007m01" Column | Type | Modifiers -------------+-----------------------------+-------------------- login_id | integer | not null traftype_id | integer | not null date_time | timestamp without time zone | not null bytes_in | bigint | not null default 0 bytes_out | bigint | not null default 0 Indexes: "n_traf_y2007m01_date_time_login_id" btree (date_time, login_id) Check constraints: "n_traf_y2007m01_date_time_check" CHECK (date_time >= '2007-01-01'::date AND date_time < '2007-02-01 00:00:00'::timestamp without time zone) Inherits: n_traf Index "public.n_traf_y2007m01_date_time_login_id" Column | Type -----------+----------------------------- date_time | timestamp without time zone login_id | integer btree, for table "public.n_traf_y2007m01" Table "public.n_traf_y2007m02" Column | Type | Modifiers -------------+-----------------------------+-------------------- login_id | integer | not null traftype_id | integer | not null date_time | timestamp without time zone | not null bytes_in | bigint | not null default 0 bytes_out | bigint | not null default 0 Indexes: "n_traf_y2007m02_date_time_login_id" btree (date_time, login_id) Check constraints: "n_traf_y2007m02_date_time_check" CHECK (date_time >= '2007-02-01'::date AND date_time < '2007-03-01 00:00:00'::timestamp without time zone) Inherits: n_traf .... -- engineer ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On 8/24/07, Anton <anton200@gmail.com> wrote: > > Hi. > > I just created partitioned table, n_traf, sliced by month > (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are > indexed by 'date_time' column. > Then I populate it (last value have date 2007-08-...) and do VACUUM > ANALYZE ON n_traf_y2007... all of it. > > Now I try to select latest value (ORDER BY date_time LIMIT 1), but > Postgres produced the ugly plan: > > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > QUERY PLAN > > --------------------------------------------------------------------------------------------------------- > Limit (cost=824637.69..824637.69 rows=1 width=32) > -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > Sort Key: public.n_traf.date_time > -> Result (cost=0.00..100877.99 rows=5643499 width=32) > -> Append (cost=0.00..100877.99 rows=5643499 width=32) > -> Seq Scan on n_traf (cost=0.00..22.30 > rows=1230 width=32) > -> Seq Scan on n_traf_y2007m01 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m02 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m03 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m04 n_traf > (cost=0.00..1.01 rows=1 width=32) > -> Seq Scan on n_traf_y2007m05 n_traf > (cost=0.00..9110.89 rows=509689 width=32) > -> Seq Scan on n_traf_y2007m06 n_traf > (cost=0.00..32003.89 rows=1790489 width=32) > -> Seq Scan on n_traf_y2007m07 n_traf > (cost=0.00..33881.10 rows=1895510 width=32) > -> Seq Scan on n_traf_y2007m08 n_traf > (cost=0.00..25702.70 rows=1437970 width=32) > -> Seq Scan on n_traf_y2007m09 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m10 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m11 n_traf > (cost=0.00..22.30 rows=1230 width=32) > -> Seq Scan on n_traf_y2007m12 n_traf > (cost=0.00..22.30 rows=1230 width=32) > (18 rows) > > > Why it no uses indexes at all? > ------------------------------------------- I'm no expert but I'd guess that the the planner doesn't know which partition holds the latest time so it has to read them all. Regards MP |
| |||
| > > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > > QUERY PLAN > --------------------------------------------------------------------------------------------------------- > > Limit (cost=824637.69..824637.69 rows=1 width=32) > > -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > > Sort Key: public.n_traf.date_time > > -> Result (cost=0.00..100877.99 rows=5643499 width=32) > > -> Append (cost= 0.00..100877.99 rows=5643499 width=32) > > -> Seq Scan on n_traf (cost=0.00..22.30 > > rows=1230 width=32) > > -> Seq Scan on n_traf_y2007m01 n_traf > > (cost=0.00..22.30 rows=1230 width=32) .... > > -> Seq Scan on n_traf_y2007m12 n_traf > > (cost=0.00..22.30 rows=1230 width=32) > > (18 rows) > > > > Why it no uses indexes at all? > > ------------------------------------------- > I'm no expert but I'd guess that the the planner doesn't know which > partition holds the latest time so it has to read them all. Agree. But why it not uses indexes when it reading them? -- engineer ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings |
| |||
| Anton wrote: >>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>> QUERY PLAN >> --------------------------------------------------------------------------------------------------------- >>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>> Sort Key: public.n_traf.date_time >>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>> -> Seq Scan on n_traf (cost=0.00..22.30 >>> rows=1230 width=32) >>> -> Seq Scan on n_traf_y2007m01 n_traf >>> (cost=0.00..22.30 rows=1230 width=32) > ... >>> -> Seq Scan on n_traf_y2007m12 n_traf >>> (cost=0.00..22.30 rows=1230 width=32) >>> (18 rows) >>> >>> Why it no uses indexes at all? >>> ------------------------------------------- >> I'm no expert but I'd guess that the the planner doesn't know which >> partition holds the latest time so it has to read them all. > > Agree. But why it not uses indexes when it reading them? The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." below the append node. Therefore it needs to fetch all rows from all the tables, and the fastest way to do that is a seq scan. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| We just fixed this - I'll post a patch, but I don't have time to verify against HEAD. - Luke On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > Anton wrote: >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>>> QUERY PLAN >>> ---------------------------------------------------------------------------- >>> ----------------------------- >>>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>>> Sort Key: public.n_traf.date_time >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>>> -> Seq Scan on n_traf (cost=0.00..22.30 >>>> rows=1230 width=32) >>>> -> Seq Scan on n_traf_y2007m01 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >> ... >>>> -> Seq Scan on n_traf_y2007m12 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >>>> (18 rows) >>>> >>>> Why it no uses indexes at all? >>>> ------------------------------------------- >>> I'm no expert but I'd guess that the the planner doesn't know which >>> partition holds the latest time so it has to read them all. >> >> Agree. But why it not uses indexes when it reading them? > > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > below the append node. Therefore it needs to fetch all rows from all the > tables, and the fastest way to do that is a seq scan. ---------------------------(end of broadcast)--------------------------- TIP 4: Have you searched our list archives? http://archives.postgresql.org |
| |||
| Below is a patch against 8.2.4 (more or less), Heikki can you take a look at it? This enables the use of index scan of a child table by recognizing sort order of the append node. Kurt Harriman did the work. - Luke Index: cdb-pg/src/backend/optimizer/path/indxpath.c ================================================== ================= RCS file: /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz er/path/indxpath.c,v diff -u -N -r1.22 -r1.22.2.1 --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21 -0000 1.22 +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15 -0000 1.22.2.1 @@ -379,8 +379,51 @@ index_pathkeys = build_index_pathkeys(root, index, ForwardScanDirection, true); - useful_pathkeys = truncate_useless_pathkeys(root, rel, - index_pathkeys); + /* + * CDB: For appendrel child, pathkeys contain Var nodes in terms + * of the child's baserel. Transform the pathkey list to refer to + * columns of the appendrel. + */ + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL) + { + AppendRelInfo *appinfo = NULL; + RelOptInfo *appendrel = NULL; + ListCell *appcell; + CdbPathLocus notalocus; + + /* Find the appendrel of which this baserel is a child. */ + foreach(appcell, root->append_rel_list) + { + appinfo = (AppendRelInfo *)lfirst(appcell); + if (appinfo->child_relid == rel->relid) + break; + } + Assert(appinfo); + appendrel = find_base_rel(root, appinfo->parent_relid); + + /* + * The pathkey list happens to have the same format as the + * partitioning key of a Hashed locus, so by disguising it + * we can use cdbpathlocus_pull_above_projection() to do the + * transformation. + */ + CdbPathLocus_MakeHashed(¬alocus, index_pathkeys); + notalocus = + cdbpathlocus_pull_above_projection(root, + notalocus, + rel->relids, + rel->reltargetlist, + appendrel->reltargetlist, + appendrel->relid); + if (CdbPathLocus_IsHashed(notalocus)) + index_pathkeys = truncate_useless_pathkeys(root, appendrel, + notalocus.partkey); + else + index_pathkeys = NULL; + } + + useful_pathkeys = truncate_useless_pathkeys(root, rel, + index_pathkeys); } else useful_pathkeys = NIL; Index: cdb-pg/src/backend/optimizer/path/pathkeys.c ================================================== ================= RCS file: /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz er/path/pathkeys.c,v diff -u -N -r1.18 -r1.18.2.1 --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07 -0000 1.18 +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15 -0000 1.18.2.1 @@ -1403,55 +1403,53 @@ { PathKeyItem *item; Expr *newexpr; + AttrNumber targetindex; Assert(pathkey); - /* Use constant expr if available. Will be at head of list. */ - if (CdbPathkeyEqualsConstant(pathkey)) + /* Find an expr that we can rewrite to use the projected columns. */ + item = cdbpullup_findPathKeyItemInTargetList(pathkey, + relids, + targetlist, + &targetindex); // OUT + + /* If not found, see if the equiv class contains a constant expr. */ + if (!item && + CdbPathkeyEqualsConstant(pathkey)) { item = (PathKeyItem *)linitial(pathkey); newexpr = (Expr *)copyObject(item->key); } - /* New vars for old! */ - else - { - AttrNumber targetindex; + /* Fail if no usable expr. */ + else if (!item) + return NULL; - /* Find an expr that we can rewrite to use the projected columns. */ - item = cdbpullup_findPathKeyItemInTargetList(pathkey, - relids, - targetlist, - &targetindex); // OUT - if (!item) - return NULL; + /* If found matching targetlist item, make a Var that references it. */ + else if (targetindex > 0) + newexpr = (Expr *)cdbpullup_makeVar(newrelid, + targetindex, + newvarlist, + (Expr *)item->key); - /* If found matching targetlist item, make a Var that references it. */ - if (targetindex > 0) - newexpr = (Expr *)cdbpullup_makeVar(newrelid, - targetindex, - newvarlist, - (Expr *)item->key); + /* Replace expr's Var nodes with new ones referencing the targetlist. */ + else + newexpr = cdbpullup_expr((Expr *)item->key, + targetlist, + newvarlist, + newrelid); - /* Replace expr's Var nodes with new ones referencing the targetlist. */ - else - newexpr = cdbpullup_expr((Expr *)item->key, - targetlist, - newvarlist, - newrelid); + /* Pull up RelabelType node too, unless tlist expr has right type. */ + if (IsA(item->key, RelabelType)) + { + RelabelType *oldrelabel = (RelabelType *)item->key; - /* Pull up RelabelType node too, unless tlist expr has right type. */ - if (IsA(item->key, RelabelType)) - { - RelabelType *oldrelabel = (RelabelType *)item->key; - - if (oldrelabel->resulttype != exprType((Node *)newexpr) || - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) - newexpr = (Expr *)makeRelabelType(newexpr, - oldrelabel->resulttype, - oldrelabel->resulttypmod, - oldrelabel->relabelformat); - } + if (oldrelabel->resulttype != exprType((Node *)newexpr) || + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) + newexpr = (Expr *)makeRelabelType(newexpr, + oldrelabel->resulttype, + oldrelabel->resulttypmod, + oldrelabel->relabelformat); } Insist(newexpr); Index: cdb-pg/src/backend/optimizer/util/pathnode.c ================================================== ================= RCS file: /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz er/util/pathnode.c,v diff -u -N -r1.52.2.4 -r1.52.2.5 --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44 -0000 1.52.2.4 +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15 -0000 1.52.2.5 @@ -1563,7 +1563,15 @@ pathnode->path.rescannable = false; } - return pathnode; + /* + * CDB: If there is exactly one subpath, its ordering is preserved. + * Child rel's pathkey exprs are already expressed in terms of the + * columns of the parent appendrel. See find_usable_indexes(). + */ + if (list_length(subpaths) == 1) + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys; + + return pathnode; } /* On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > Anton wrote: >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>>> QUERY PLAN >>> ---------------------------------------------------------------------------- >>> ----------------------------- >>>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>>> Sort Key: public.n_traf.date_time >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>>> -> Seq Scan on n_traf (cost=0.00..22.30 >>>> rows=1230 width=32) >>>> -> Seq Scan on n_traf_y2007m01 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >> ... >>>> -> Seq Scan on n_traf_y2007m12 n_traf >>>> (cost=0.00..22.30 rows=1230 width=32) >>>> (18 rows) >>>> >>>> Why it no uses indexes at all? >>>> ------------------------------------------- >>> I'm no expert but I'd guess that the the planner doesn't know which >>> partition holds the latest time so it has to read them all. >> >> Agree. But why it not uses indexes when it reading them? > > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > below the append node. Therefore it needs to fetch all rows from all the > tables, and the fastest way to do that is a seq scan. ---------------------------(end of broadcast)--------------------------- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate |
| |||
| Pn, 2007 08 24 14:53 +0600, Anton rašė: > Hi. > > I just created partitioned table, n_traf, sliced by month > (n_traf_y2007m01, n_traf_y2007m02... and so on, see below). They are > indexed by 'date_time' column. > Then I populate it (last value have date 2007-08-...) and do VACUUM > ANALYZE ON n_traf_y2007... all of it. > > Now I try to select latest value (ORDER BY date_time LIMIT 1), but > Postgres produced the ugly plan: > > =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > QUERY PL can you test performance and send explain results of select like this : select * from n_traf where date_time = (select max(date_time) from n_traf); i have similar problem with ~70M rows table (then using ordering), but my table not partitioned. I`m interesting how this select will works on partitioned table. -- Pagarbiai, Tomas Tamošaitis Projektų Vadovas Connecty Skype://mazgis1009?add Mob: +370 652 86127 e-pastas: tomas.tamosaitis@connecty.lt web: www.connecty.lt ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Bruce, would you please add this to the 8.4 patch queue so we remember to look at this later? It didn't occur to me that we can do that in the degenerate case when there's just a single node below the Append. A more general solution would be to check if the pathkeys of all the child nodes match, and do a "merge append" similar to a merge join. Luke Lonergan wrote: > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at > it? > > This enables the use of index scan of a child table by recognizing sort > order of the append node. Kurt Harriman did the work. > > - Luke > > Index: cdb-pg/src/backend/optimizer/path/indxpath.c > ================================================== ================= > RCS file: > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > er/path/indxpath.c,v > diff -u -N -r1.22 -r1.22.2.1 > --- cdb-pg/src/backend/optimizer/path/indxpath.c 25 Apr 2007 22:07:21 > -0000 1.22 > +++ cdb-pg/src/backend/optimizer/path/indxpath.c 10 Aug 2007 03:41:15 > -0000 1.22.2.1 > @@ -379,8 +379,51 @@ > index_pathkeys = build_index_pathkeys(root, index, > ForwardScanDirection, > true); > - useful_pathkeys = truncate_useless_pathkeys(root, rel, > - index_pathkeys); > + /* > + * CDB: For appendrel child, pathkeys contain Var nodes in > terms > + * of the child's baserel. Transform the pathkey list to refer > to > + * columns of the appendrel. > + */ > + if (rel->reloptkind == RELOPT_OTHER_MEMBER_REL) > + { > + AppendRelInfo *appinfo = NULL; > + RelOptInfo *appendrel = NULL; > + ListCell *appcell; > + CdbPathLocus notalocus; > + > + /* Find the appendrel of which this baserel is a child. */ > + foreach(appcell, root->append_rel_list) > + { > + appinfo = (AppendRelInfo *)lfirst(appcell); > + if (appinfo->child_relid == rel->relid) > + break; > + } > + Assert(appinfo); > + appendrel = find_base_rel(root, appinfo->parent_relid); > + > + /* > + * The pathkey list happens to have the same format as the > + * partitioning key of a Hashed locus, so by disguising it > + * we can use cdbpathlocus_pull_above_projection() to do > the > + * transformation. > + */ > + CdbPathLocus_MakeHashed(¬alocus, index_pathkeys); > + notalocus = > + cdbpathlocus_pull_above_projection(root, > + notalocus, > + rel->relids, > + rel->reltargetlist, > + > appendrel->reltargetlist, > + appendrel->relid); > + if (CdbPathLocus_IsHashed(notalocus)) > + index_pathkeys = truncate_useless_pathkeys(root, > appendrel, > + > notalocus.partkey); > + else > + index_pathkeys = NULL; > + } > + > + useful_pathkeys = truncate_useless_pathkeys(root, rel, > + index_pathkeys); > } > else > useful_pathkeys = NIL; > Index: cdb-pg/src/backend/optimizer/path/pathkeys.c > ================================================== ================= > RCS file: > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > er/path/pathkeys.c,v > diff -u -N -r1.18 -r1.18.2.1 > --- cdb-pg/src/backend/optimizer/path/pathkeys.c 30 Apr 2007 05:44:07 > -0000 1.18 > +++ cdb-pg/src/backend/optimizer/path/pathkeys.c 10 Aug 2007 03:41:15 > -0000 1.18.2.1 > @@ -1403,55 +1403,53 @@ > { > PathKeyItem *item; > Expr *newexpr; > + AttrNumber targetindex; > > Assert(pathkey); > > - /* Use constant expr if available. Will be at head of list. */ > - if (CdbPathkeyEqualsConstant(pathkey)) > + /* Find an expr that we can rewrite to use the projected columns. */ > + item = cdbpullup_findPathKeyItemInTargetList(pathkey, > + relids, > + targetlist, > + &targetindex); // OUT > + > + /* If not found, see if the equiv class contains a constant expr. */ > + if (!item && > + CdbPathkeyEqualsConstant(pathkey)) > { > item = (PathKeyItem *)linitial(pathkey); > newexpr = (Expr *)copyObject(item->key); > } > > - /* New vars for old! */ > - else > - { > - AttrNumber targetindex; > + /* Fail if no usable expr. */ > + else if (!item) > + return NULL; > > - /* Find an expr that we can rewrite to use the projected columns. > */ > - item = cdbpullup_findPathKeyItemInTargetList(pathkey, > - relids, > - targetlist, > - &targetindex); // OUT > - if (!item) > - return NULL; > + /* If found matching targetlist item, make a Var that references it. */ > + else if (targetindex > 0) > + newexpr = (Expr *)cdbpullup_makeVar(newrelid, > + targetindex, > + newvarlist, > + (Expr *)item->key); > > - /* If found matching targetlist item, make a Var that references > it. */ > - if (targetindex > 0) > - newexpr = (Expr *)cdbpullup_makeVar(newrelid, > - targetindex, > - newvarlist, > - (Expr *)item->key); > + /* Replace expr's Var nodes with new ones referencing the targetlist. > */ > + else > + newexpr = cdbpullup_expr((Expr *)item->key, > + targetlist, > + newvarlist, > + newrelid); > > - /* Replace expr's Var nodes with new ones referencing the > targetlist. */ > - else > - newexpr = cdbpullup_expr((Expr *)item->key, > - targetlist, > - newvarlist, > - newrelid); > + /* Pull up RelabelType node too, unless tlist expr has right type. */ > + if (IsA(item->key, RelabelType)) > + { > + RelabelType *oldrelabel = (RelabelType *)item->key; > > - /* Pull up RelabelType node too, unless tlist expr has right type. > */ > - if (IsA(item->key, RelabelType)) > - { > - RelabelType *oldrelabel = (RelabelType *)item->key; > - > - if (oldrelabel->resulttype != exprType((Node *)newexpr) || > - oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > - newexpr = (Expr *)makeRelabelType(newexpr, > - oldrelabel->resulttype, > - oldrelabel->resulttypmod, > - > oldrelabel->relabelformat); > - } > + if (oldrelabel->resulttype != exprType((Node *)newexpr) || > + oldrelabel->resulttypmod != exprTypmod((Node *)newexpr)) > + newexpr = (Expr *)makeRelabelType(newexpr, > + oldrelabel->resulttype, > + oldrelabel->resulttypmod, > + oldrelabel->relabelformat); > } > Insist(newexpr); > > Index: cdb-pg/src/backend/optimizer/util/pathnode.c > ================================================== ================= > RCS file: > /data/FISHEYE_REPOSITORIES/greenplum/cvsroot/cdb2/cdb-pg/src/backend/optimiz > er/util/pathnode.c,v > diff -u -N -r1.52.2.4 -r1.52.2.5 > --- cdb-pg/src/backend/optimizer/util/pathnode.c 5 Aug 2007 23:06:44 > -0000 1.52.2.4 > +++ cdb-pg/src/backend/optimizer/util/pathnode.c 10 Aug 2007 03:41:15 > -0000 1.52.2.5 > @@ -1563,7 +1563,15 @@ > pathnode->path.rescannable = false; > } > > - return pathnode; > + /* > + * CDB: If there is exactly one subpath, its ordering is preserved. > + * Child rel's pathkey exprs are already expressed in terms of the > + * columns of the parent appendrel. See find_usable_indexes(). > + */ > + if (list_length(subpaths) == 1) > + pathnode->path.pathkeys = ((Path *)linitial(subpaths))->pathkeys; > + > + return pathnode; > } > > /* > > > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > >> Anton wrote: >>>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; >>>>> QUERY PLAN >>>> ---------------------------------------------------------------------------- >>>> ----------------------------- >>>>> Limit (cost=824637.69..824637.69 rows=1 width=32) >>>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) >>>>> Sort Key: public.n_traf.date_time >>>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) >>>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) >>>>> -> Seq Scan on n_traf (cost=0.00..22.30 >>>>> rows=1230 width=32) >>>>> -> Seq Scan on n_traf_y2007m01 n_traf >>>>> (cost=0.00..22.30 rows=1230 width=32) >>> ... >>>>> -> Seq Scan on n_traf_y2007m12 n_traf >>>>> (cost=0.00..22.30 rows=1230 width=32) >>>>> (18 rows) >>>>> >>>>> Why it no uses indexes at all? >>>>> ------------------------------------------- >>>> I'm no expert but I'd guess that the the planner doesn't know which >>>> partition holds the latest time so it has to read them all. >>> Agree. But why it not uses indexes when it reading them? >> The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." >> below the append node. Therefore it needs to fetch all rows from all the >> tables, and the fastest way to do that is a seq scan. > > -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| I want ask about problem with partioned tables (it was discussed some time ago, see below). Is it fixed somehow in 8.2.5 ? 2007/8/24, Luke Lonergan <llonergan@greenplum.com>: > Below is a patch against 8.2.4 (more or less), Heikki can you take a look at > it? > > This enables the use of index scan of a child table by recognizing sort > order of the append node. Kurt Harriman did the work. .... > > On 8/24/07 3:38 AM, "Heikki Linnakangas" <heikki@enterprisedb.com> wrote: > > > Anton wrote: > >>>> =# explain SELECT * FROM n_traf ORDER BY date_time DESC LIMIT 1; > >>>> QUERY PLAN > >>> ---------------------------------------------------------------------------- > >>> ----------------------------- > >>>> Limit (cost=824637.69..824637.69 rows=1 width=32) > >>>> -> Sort (cost=824637.69..838746.44 rows=5643499 width=32) > >>>> Sort Key: public.n_traf.date_time > >>>> -> Result (cost=0.00..100877.99 rows=5643499 width=32) > >>>> -> Append (cost= 0.00..100877.99 rows=5643499 width=32) > >>>> -> Seq Scan on n_traf (cost=0.00..22.30 > >>>> rows=1230 width=32) > >>>> -> Seq Scan on n_traf_y2007m01 n_traf > >>>> (cost=0.00..22.30 rows=1230 width=32) > >> ... > >>>> -> Seq Scan on n_traf_y2007m12 n_traf > >>>> (cost=0.00..22.30 rows=1230 width=32) > >>>> (18 rows) > >>>> > >>>> Why it no uses indexes at all? > >>>> ------------------------------------------- > >>> I'm no expert but I'd guess that the the planner doesn't know which > >>> partition holds the latest time so it has to read them all. > >> > >> Agree. But why it not uses indexes when it reading them? > > > > The planner isn't smart enough to push the "ORDER BY ... LIMIT ..." > > below the append node. Therefore it needs to fetch all rows from all the > > tables, and the fastest way to do that is a seq scan. -- engineer ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| Anton <anton200@gmail.com> writes: > I want ask about problem with partioned tables (it was discussed some > time ago, see below). Is it fixed somehow in 8.2.5 ? No. The patch you mention never was considered at all, since it consisted of a selective quote from Greenplum source code. It would not even compile in community Postgres, because it adds calls to half a dozen Greenplum routines that we've never seen. Not to mention that the base of the diff is Greenplum proprietary code, so the patch itself wouldn't even apply successfully. As to whether it would work if we had the full story ... well, not having the full story, I don't want to opine. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |