vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Summary: depending on the value, the planner will sometimes choose a seq scan, sometimes an index scan. The former produces terrible performace, the latter great performance. The long story: we had a disk failure (NOT the disk the db was on) and the machine's system disk had to be rebuilt from the raid array and re-GRUB'ed. Now that the the system is back up we are seeing terrible performance (or more accurately, wildly varying performance). I've tried re-importing the data from the live system (this is new hardware under testing for the system) and re-initing the db cluster. A specific example is probably best. This 'connections' table has about 922K rows. The difference here is node_id's 28542 vs. 28560. Using 28542 causes an index scan, 28560 causes a seq scan: The details: logicops2=> explain SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------- Limit (cost=1.29..563.05 rows=203 width=116) -> Hash Left Join (cost=1.29..563.05 rows=203 width=116) Hash Cond: ("outer".connection_type_id = "inner".connection_type_id) -> Index Scan using c_connector_node_id, c_connectee_node_id on connections c (cost=0.00..558.72 rows=203 width=33) Index Cond: ((connector_node_id = 28542) OR (connectee_node_id = 28542)) Filter: (connection_type_id < 1000) -> Hash (cost=1.23..1.23 rows=23 width=83) -> Seq Scan on connection_types ct (cost=0.00..1.23 rows=23 width=83) (8 rows) Time: 0.935 ms logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28542 OR connectee_node_id = 28542 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; ....results... (12 rows) Time: 1.887 ms -vs- logicops2=> explain SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------- Limit (cost=1.29..686.09 rows=300 width=116) -> Hash Left Join (cost=1.29..24939.39 rows=10925 width=116) Hash Cond: ("outer".connection_type_id = "inner".connection_type_id) -> Seq Scan on connections c (cost=0.00..24774.23 rows=10925 width=33) Filter: (((connector_node_id = 28560) OR (connectee_node_id = 28560)) AND (connection_type_id < 1000)) -> Hash (cost=1.23..1.23 rows=23 width=83) -> Seq Scan on connection_types ct (cost=0.00..1.23 rows=23 width=83) (7 rows) Time: 0.704 ms logicops2=> SELECT * from connections AS c LEFT JOIN connection_types AS ct ON ct.connection_type_id = c.connection_type_id WHERE ( connector_node_id = 28560 OR connectee_node_id = 28560 ) AND ( c.connection_type_id < 1000 ) LIMIT 300; ....results... (7 rows) Time: 578.597 ms .... it may be relevant that one node_id has 15 times as many connections: logicops2=> select count(*) from connections where connector_node_id = 28542 OR connectee_node_id = 28542; count ------- 856 (1 row) Time: 1.424 ms logicops2=> select count(*) from connections where connector_node_id = 28560 OR connectee_node_id = 28560; count ------- 13500 (1 row) Time: 559.696 ms .... but that shouldn't make a difference to the planner, should it? Yes, I've vacuum analyzed. Also, I was wondering if someone could correct me on a bit of array syntax. I'd like to have a query pass back an array of ints to a function call. Something like this: logicops2=> select * from nodes2ancestors(array[(select node_id from nodes where node_type_id = 3)]::int[], 0); ERROR: more than one row returned by a subquery used as an expression Thanks for any help/pointers you guys can provide. I really appreciate it as I'm down to the wire on a project and this performance thing has really blindsided us. Bart ---------------------------(end of broadcast)--------------------------- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Bart Grantham <bart@logicworks.net> writes: > ... it may be relevant that one node_id has 15 times as many connections: > ... but that shouldn't make a difference to the planner, should it? Of course it should (and does). > Yes, I've vacuum analyzed. What you need to do is show EXPLAIN ANALYZE output. There's not much point in complaining that the planner's estimates don't match reality when you don't present the reality to compare with. BTW, this would be more on-topic on the pgsql-performance list. > I'd like to have a query pass back an array of ints to a > function call. Something like this: > logicops2=> select * from nodes2ancestors(array[(select node_id from > nodes where node_type_id = 3)]::int[], 0); I think it's "array(select node_id from nodes where node_type_id = 3)" .... no square brackets anywhere. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 7: don't forget to increase your free space map settings |