vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi List; Sorry if this is a dupe, my first post never showed up... I'm having some performance issues with a partitioned table. We have a VERY large table that we've partitioned by day. Currently we have 17 partitions - each partition table contains > 700million rows. One of the things we need to query is the min date from the master table - we may explore alternatives for this particular query, however even if we fix this query I think we have a fundamental issue with the use of indexes (actuallt the non-use) by the planner. Below is a sample of the DDL used to create our tables and an explain showing that the planner wants to do a sequential scan on each partition. We do have "constraint_elimination = on" set in the postgresql.conf file. I tried removing the index from the part_master table and got the same result Likewise the costs associated with the seq scans seem to be way off (yes I've run analyze on the master and all partition tables) - I ran the actual SQL statement below and killed it after about 15min. Thanks in advance for any help, advice, etc... Tables: ------------------------------------------ -- Master Table ------------------------------------------ CREATE TABLE part_master ( filename character varying(100), logdate date, ... -- about 50 more columns go here loghour date, url character varying(500), customer character varying(500) ); CREATE INDEX master_logdate ON part_master USING btree (logdate); ------------------------------------------ -- Partitions: ------------------------------------------ ------------------------------------------ -- part_20080319 ------------------------------------------ CREATE TABLE part_20080319 (CONSTRAINT part_20080319_logdate_check CHECK ((logdate = '2008-03-19'::date)) ) INHERITS (part_master); CREATE INDEX idx_part_20080319_customer ON part_20080319 USING btree (customer); CREATE INDEX idx_part_20080319_logdate ON part_20080319 USING btree (logdate); CREATE INDEX idx_part_20080319_loghour ON part_20080319 USING btree (loghour); ------------------------------------------ -- part_20080320 ------------------------------------------ CREATE TABLE part_20080320 (CONSTRAINT part_20080320_logdate_check CHECK ((logdate = '2008-03-20'::date)) ) INHERITS (part_master); CREATE INDEX idx_part_20080320_customer ON part_20080320 USING btree (customer); CREATE INDEX idx_part_20080320_logdate ON part_20080320 USING btree (logdate); CREATE INDEX idx_part_20080320_loghour ON part_20080320 USING btree (loghour); -- And so on, thru part_20080404 ------------------------------------------ -- explain plan ------------------------------------------ myDB=# explain SELECT min(logdate) FROM part_master; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=117070810.10..117070810.11 rows=1 width=4) -> Append (cost=0.00..114866502.48 rows=881723048 width=4) -> Seq Scan on part_master (cost=0.00..85596244.18 rows=679385718 width=4) -> Seq Scan on part_20080319 part (cost=0.00..212860.86 rows=1674986 width=4) -> Seq Scan on part_20080320 part (cost=0.00..1753802.51 rows=13782951 width=4) -> Seq Scan on part_20080321 part (cost=0.00..2061636.83 rows=15881283 width=4) -> Seq Scan on part_20080322 part (cost=0.00..1965144.71 rows=14936971 width=4) -> Seq Scan on part_20080323 part (cost=0.00..1614413.18 rows=12345618 width=4) -> Seq Scan on part_20080324 part (cost=0.00..1926520.22 rows=14741022 width=4) -> Seq Scan on part_20080325 part (cost=0.00..2356704.22 rows=18477622 width=4) -> Seq Scan on part_20080326 part (cost=0.00..1889267.71 rows=14512171 width=4) -> Seq Scan on part_20080327 part (cost=0.00..1622100.34 rows=12445034 width=4) -> Seq Scan on part_20080328 part (cost=0.00..1711779.49 rows=12885749 width=4) -> Seq Scan on part_20080329 part (cost=0.00..1568192.94 rows=11958394 width=4) -> Seq Scan on part_20080330 part (cost=0.00..1521204.64 rows=11676564 width=4) -> Seq Scan on part_20080331 part (cost=0.00..1587138.77 rows=12180377 width=4) -> Seq Scan on part_20080401 part (cost=0.00..2324352.82 rows=18211382 width=4) -> Seq Scan on part_20080402 part (cost=0.00..2891295.04 rows=6693804 width=4) -> Seq Scan on part_20080403 part (cost=0.00..1707327.48 rows=5748348 width=4) -> Seq Scan on part_20080404 part (cost=0.00..556516.54 rows=4185054 width=4) (20 rows) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| kevin kempter wrote: > Hi List; > > Sorry if this is a dupe, my first post never showed up... > > I'm having some performance issues with a partitioned table. We have a > VERY large table that we've partitioned by day. > Unfortunately, that is the defined behavior in this case. From 5.9.6 of the manual: "Constraint exclusion only works when the query's WHERE clause contains constants." [Where the constants are of course your partitioning column(s)] The best way around this depends mostly on what you're up to. You can get the min tablename from the catalogs, or you can keep a table of active partitions that your script which drops off old partitions and generates new ones can keep updated on the oldest/newest partition dates. Or some number of other solutions, whatever you find cleanest for your purposes. Paul -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| |||
| "kevin kempter" <kevin@kevinkempterllc.com> writes: > that the planner wants to do a sequential scan on each partition. We do have > "constraint_elimination = on" set in the postgresql.conf file. "constraint_exclusion" btw. > myDB=# explain SELECT min(logdate) FROM part_master; Er, yeah. Unfortunately this is just not a kind of query our planner knows how to optimize when dealing with a partitioned table... yet. There are several different pieces missing to make this work. There's some hope some of them might show up for 8.4 but no guarantees. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication support! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| ||||
| kevin kempter wrote: > One of the things we need to query is the min date from the master table > - we may explore alternatives for this particular query, however even if > we fix this query I think we have a fundamental issue with the use of > indexes (actuallt the non-use) by the planner. We had a similar requirement, so I've been using a function that loops over the child tables, and queries for the min date from each. If all you need is the date, you can try a function call. Here is a modified version of what I've been using: CREATE OR REPLACE function get_min_date() RETURNS DATE as $_$ DECLARE x RECORD; min_date DATE; min_date_tmp DATE; qry TEXT; BEGIN /* can also test MIN() aggregate, rather than ORDER BY/LIMIT */ FOR x IN EXECUTE 'select tablename from pg_tables where tablename like ''part_20%''' loop qry := 'SELECT logdate FROM '||x.tablename||' ORDER BY logdate LIMIT 1'; EXECUTE qry INTO min_date_tmp; IF (min_date IS NULL OR (min_date_tmp IS NOT NULL AND min_date_tmp<min_date)) THEN min_date := min_date_tmp; END IF; END LOOP; RETURN min_date; END; $_$ language plpgsql immutable; -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance |
| Thread Tools | |
| Display Modes | |
| |