vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > Server) the leaf level of the narrowest index on the table is scanned, > following a linked list of leaf pages. Leaf pages can be pretty dense > under Sybase, because they do use prefix compression. A count(*) > on a table with 100 million rows is going to take a few minutes, but it > is going to be at least an order of magnitude faster than a data page > scan -- maybe two orders of magnitude faster. MS SQL server (pre 2005) is not an MVCC database, so it's not apples to apples with pg. Many of the people who wander on this list and complain about count(*) either come from one of those or some other non-MVCC database or worse, a flat-file xbase type system. A performance comparison between MS 2005 and pg would be much more interesting. Personally, I don't know what all the fuss is about [although I wouldn't complain about an optimization Merlin ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Merlin Moncure wrote: > > In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > > Server) the leaf level of the narrowest index on the table is scanned, > > following a linked list of leaf pages. Leaf pages can be pretty dense > > under Sybase, because they do use prefix compression. A count(*) > > on a table with 100 million rows is going to take a few minutes, but > it > > is going to be at least an order of magnitude faster than a data page > > scan -- maybe two orders of magnitude faster. > > MS SQL server (pre 2005) is not an MVCC database, so it's not apples to > apples with pg. Oh, also it was mentioned on pgsql-advocacy that InnoDB is MVCC. If that's the case, I wonder how do they do the count(*) thing? Is it fast? -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On 11/18/05, Merlin Moncure <merlin.moncure@rcsonline.com> wrote: > > In Sybase ASE (and I'm pretty sure the same is true in Microsoft SQL > > Server) the leaf level of the narrowest index on the table is scanned, > > following a linked list of leaf pages. Leaf pages can be pretty dense > > under Sybase, because they do use prefix compression. A count(*) > > on a table with 100 million rows is going to take a few minutes, but > it > > is going to be at least an order of magnitude faster than a data page > > scan -- maybe two orders of magnitude faster. > > MS SQL server (pre 2005) is not an MVCC database, so it's not apples to > apples with pg. Many of the people who wander on this list and complain > about count(*) either come from one of those or some other non-MVCC > database or worse, a flat-file xbase type system. A performance > comparison between MS 2005 and pg would be much more interesting. > Personally, I don't know what all the fuss is about [although I wouldn't > complain about an optimization count(*) WHERE 1 is indeed a corner case that few to no real applications should care about... If we were having to choose between improving that case and preserving the performance and maintainability of PG then I think the discussion would already be over. However, some great ideas have been proposed here which would not only help in that case but would otherwise be quite useful. *Inclusion of a 'MVCC inflight' bit in indexes which would allow skipping MVCC checks in clumps of an index scan which have no pending changes. This would further close the performance gap between PG and non-MVCC databases for some workloads. *Introduction of high performance table sampling, which would be useful in many applications (including counting where there is a where clause) as well as for testing and adhoc queries. and *a estimate_count() that provides the planner estimate, which would return right away and provide what is really needed most of the time people try to count(*) on a large table. So, while this is a silly case to optimize for it's one where it appears that the proposed solutions will make PG better all around. ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| On Fri, Nov 18, 2005 at 02:56:52PM -0500, Gregory Maxwell wrote: > However, some great ideas have been proposed here which would not only > help in that case but would otherwise be quite useful. > > *Inclusion of a 'MVCC inflight' bit in indexes which would allow > skipping MVCC checks in clumps of an index scan which have no pending > changes. This would further close the performance gap between PG and > non-MVCC databases for some workloads. > *Introduction of high performance table sampling, which would be > useful in many applications (including counting where there is a where > clause) as well as for testing and adhoc queries. > and > *a estimate_count() that provides the planner estimate, which would > return right away and provide what is really needed most of the time > people try to count(*) on a large table. What about Greg Stark's idea of combining Simon's idea of storing per-heap-block xmin/xmax with using that information in an index scan? ISTM that's the best of everything that's been presented: it allows for faster index scans without adding a lot of visibility overhead to the index heap, and it also allows VACUUM to hit only pages that need vacuuming. Presumably this could also be used as the on-disk backing for the FSM, or it could potentially replace the FSM. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On 11/21/05, Jim C. Nasby <jnasby@pervasive.com> wrote: > What about Greg Stark's idea of combining Simon's idea of storing > per-heap-block xmin/xmax with using that information in an index scan? > ISTM that's the best of everything that's been presented: it allows for > faster index scans without adding a lot of visibility overhead to the > index heap, and it also allows VACUUM to hit only pages that need > vacuuming. Presumably this could also be used as the on-disk backing for > the FSM, or it could potentially replace the FSM. This should be a big win all around, especially now since in memory bitmaps make it more likely that some classes of queries will be pure index. I still think it would be useful to have a estimated_count() which switches to whatever method is needed to get a reasonably accurate count quickly (stats when there are no wheres we can't predict, sampling otherwise if the involved tables are large, and a normal count in other cases.) ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| |||
| Gregory Maxwell wrote: > On 11/21/05, Jim C. Nasby <jnasby@pervasive.com> wrote: > > What about Greg Stark's idea of combining Simon's idea of storing > > per-heap-block xmin/xmax with using that information in an index scan? > > ISTM that's the best of everything that's been presented: it allows for > > faster index scans without adding a lot of visibility overhead to the > > index heap, and it also allows VACUUM to hit only pages that need > > vacuuming. Presumably this could also be used as the on-disk backing for > > the FSM, or it could potentially replace the FSM. > > This should be a big win all around, especially now since in memory > bitmaps make it more likely that some classes of queries will be pure > index. I still think it would be useful to have a estimated_count() > which switches to whatever method is needed to get a reasonably > accurate count quickly (stats when there are no wheres we can't > predict, sampling otherwise if the involved tables are large, and a > normal count in other cases.) Added to TODO: * Add estimated_count(*) to return an estimate of COUNT(*) This would use the planner ANALYZE statistatics to return an estimated count. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |
| ||||
| Jim C. Nasby wrote: > On Fri, Nov 18, 2005 at 02:56:52PM -0500, Gregory Maxwell wrote: > > However, some great ideas have been proposed here which would not only > > help in that case but would otherwise be quite useful. > > > > *Inclusion of a 'MVCC inflight' bit in indexes which would allow > > skipping MVCC checks in clumps of an index scan which have no pending > > changes. This would further close the performance gap between PG and > > non-MVCC databases for some workloads. > > *Introduction of high performance table sampling, which would be > > useful in many applications (including counting where there is a where > > clause) as well as for testing and adhoc queries. > > and > > *a estimate_count() that provides the planner estimate, which would > > return right away and provide what is really needed most of the time > > people try to count(*) on a large table. > > What about Greg Stark's idea of combining Simon's idea of storing > per-heap-block xmin/xmax with using that information in an index scan? > ISTM that's the best of everything that's been presented: it allows for > faster index scans without adding a lot of visibility overhead to the > index heap, and it also allows VACUUM to hit only pages that need > vacuuming. Presumably this could also be used as the on-disk backing for > the FSM, or it could potentially replace the FSM. Right, but xmin/xmax is too detailed. We just need a single bit to say all the rows in the heap page are visible to everyone. Seem my earlier posting. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073 ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| Thread Tools | |
| Display Modes | |
|
|