This is a discussion on count(*) performance improvement ideas within the pgsql Hackers forums, part of the PostgreSQL category; --> I am reading discussion about improving count(*) performance. I have also seen a TODO for this. Many people have ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I am reading discussion about improving count(*) performance. I have also seen a TODO for this. Many people have suggested TRIGGER based solution to the slow count(*) problem. I looked at the following link which presents the solution neatly. http://www.varlena.com/GeneralBits/120.php But how does that really work for SERIALIZABLE transactions ? If two concurrent transactions INSERT/DELETE rows from a table, the trigger execution of one of the transactions is bound to fail because of concurrent access. Even for READ COMMITTED transactions, the trigger execution would wait if the other transaction has executed the trigger on the same table. Well, I think the READ COMMITTED case can be handled with DEFERRED triggers, but that may require queuing up too many triggers if there are many inserts/deletes in a transaction. Running trigger for every insert/delete seems too expensive. I wonder if we can have a separate "counter" table (as suggested in the TRIGGER based solution) and track total number of tuples inserted and deleted in a transaction (and all the committed subtransactions). We then execute a single UPDATE at the end of the transaction. With HOT, updating the "counter" table should not be a big pain since all these updates can potentially be HOT updates. Also since the update of the "counter" table happens at the commit time, other transactions inserting/deleting from the same user table may need to wait for a very small period on the "counter" table tuple. This still doesn't solve the serializable transaction problem though. But I am sure we can figure out some solution for that case as well if we agree on the general approach. I am sure this must have been discussed before. So what are the objections ? Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > I am sure this must have been discussed before. Indeed. Apparently you didn't find the threads in which the idea of having transactions enter "delta" entries was discussed? Solves both the locking and the MVCC problems, at the cost that you need to make cleanup passes over the counter table fairly often. I don't see this as material for the core DB but it would make a great contrib module. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Tom Lane wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > > I am sure this must have been discussed before. > > Indeed. Apparently you didn't find the threads in which the idea of > having transactions enter "delta" entries was discussed? Solves both > the locking and the MVCC problems, at the cost that you need to make > cleanup passes over the counter table fairly often. > > I don't see this as material for the core DB but it would make a great > contrib module. The TODO list has lots of details on this. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Wed, Mar 12, 2008 at 9:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Pavan Deolasee" <pavan.deolasee@gmail.com> writes: > > I am sure this must have been discussed before. > > Indeed. Apparently you didn't find the threads in which the idea of > having transactions enter "delta" entries was discussed? Solves both > the locking and the MVCC problems, at the cost that you need to make > cleanup passes over the counter table fairly often. > Ok. I would surely look at those threads. Hopefully HOT will considerably solve the counter table cleanup issue. > I don't see this as material for the core DB but it would make a great > contrib module. > I guess we will need some backend hooks to make it really work, no ? At the minimum we need to track the "deltas" at the transaction level and the ability to do some extra processing at the commit time. May be I should first read those threads and I will find the answers. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Pavan Deolasee wrote: > I am reading discussion about improving count(*) performance. I have > also seen a TODO for this. > > Many people have suggested TRIGGER based solution to the slow count(*) > problem. I looked at the following link which presents the solution > neatly. > > http://www.varlena.com/GeneralBits/120.php > > But how does that really work for SERIALIZABLE transactions ? If > two concurrent transactions INSERT/DELETE rows from a table, > the trigger execution of one of the transactions is bound to fail > because of concurrent access. Even for READ COMMITTED transactions, > the trigger execution would wait if the other transaction has executed > the trigger on the same table. Well, I think the READ COMMITTED case > can be handled with DEFERRED triggers, but that may require queuing up > too many triggers if there are many inserts/deletes in a transaction. > > Running trigger for every insert/delete seems too expensive. I wonder > if we can have a separate "counter" table (as suggested in the TRIGGER > based solution) and track total number of tuples inserted and deleted > in a transaction (and all the committed subtransactions). We then > execute a single UPDATE at the end of the transaction. With HOT, > updating the "counter" table should not be a big pain since all these > updates can potentially be HOT updates. Also since the update of > the "counter" table happens at the commit time, other transactions > inserting/deleting from the same user table may need to wait for a > very small period on the "counter" table tuple. > > This still doesn't solve the serializable transaction problem > though. But I am sure we can figure out some solution for that case > as well if we agree on the general approach. > > I am sure this must have been discussed before. So what are the > objections If you are talking about automatically doing this for every table - I have an objection that the performance impact seems unwarranted against the gain. We are still talking about every insert or update updating some counter table, with the only mitigating factor being that the trigger would be coded deeper into PostgreSQL theoretically making it cheaper? You can already today create a trigger on insert that will append to a summary table of some sort, whose only purpose is to maintain counts. At the simplest, it is as somebody else suggested where you might have the other table only store the primary keys with foreign key references back to the main table for handling deletes and updates. Storing transaction numbers and such might allow the data to be reduced in terms of size (and therefore elapsed time to scan), but it seems complex. If this really is a problem that must be solved - I prefer the suggestion from the past of keeping track of live rows per block for a certain transaction range, and any that fall within this range can check off this block quickly with an exact count, then the exceptional blocks (the ones being changed) can be scanned to be sure. But, it's still pretty complicated to implement right and maintain, for what is probably limited gain. I don't personally buy into the need to do exact count(*) on a whole table quickly. I know people ask for it - but I find these same people either confused, or trying to use this functionality to accomplish some other end, under the assumption that because they can get counts faster from other databases, therefore PostgreSQL should do it as well. I sometimes wonder whether these people would even notice if PostgreSQL translated count(*) on the whole table to query reltuples. :-) Cheers, mark -- Mark Mielke <mark@mielke.cc> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Wed, Mar 12, 2008 at 9:14 PM, Mark Mielke <mark@mark.mielke.cc> wrote: > > If you are talking about automatically doing this for every table - I > have an objection that the performance impact seems unwarranted against > the gain. We are still talking about every insert or update updating > some counter table, with the only mitigating factor being that the > trigger would be coded deeper into PostgreSQL theoretically making it > cheaper? > No, I am not suggesting that. If you read proposal carefully, its one UPDATE per transaction. With HOT, I am hoping that the counter table may be completely cached in memory and won't bloat much. Also, we can always have a GUC (like pgstats) to control the overhead. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| Pavan Deolasee wrote: > On Wed, Mar 12, 2008 at 9:14 PM, Mark Mielke <mark@mark.mielke.cc> wrote: > >> If you are talking about automatically doing this for every table - I >> have an objection that the performance impact seems unwarranted against >> the gain. We are still talking about every insert or update updating >> some counter table, with the only mitigating factor being that the >> trigger would be coded deeper into PostgreSQL theoretically making it >> cheaper? >> > > No, I am not suggesting that. If you read proposal carefully, its one UPDATE > per transaction. With HOT, I am hoping that the counter table may be > completely cached in memory and won't bloat much. > > Also, we can always have a GUC (like pgstats) to control the overhead. > Fine - once per transaction instead of once per insert. Still, if there is overhead to this (updating a secondary summary table), does it really make sense to have it for every table? Most of my tables do not require count(*) on the whole table (actually - none of them do). For the same reason as I don't want oid, I don't think I would want "fast count" capabilities to impact my regular queries. Again, I don't think count(*) on the whole table is a particularly useful case. count(*) on particular subsets of the data may be, but of the whole table? If you can make a secondary summary table to be used for count(*) optional, great. If using HOT makes the secondary table more efficient, great. Cheers, mark -- Mark Mielke <mark@mielke.cc> |
| |||
| Mark Mielke <mark@mark.mielke.cc> writes: > Fine - once per transaction instead of once per insert. Still, if there > is overhead to this (updating a secondary summary table), does it really > make sense to have it for every table? We certainly wouldn't accept a patch that imposed this overhead on every table. One of the advantages of framing it as an explicit set of triggers is that then you have a natural way of indicating which table(s) you want the feature for (and are willing to pay the overhead to get it). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| |||
| On Wed, Mar 12, 2008 at 9:53 PM, Mark Mielke <mark@mark.mielke.cc> wrote: > > > > Fine - once per transaction instead of once per insert. Still, if there is > overhead to this (updating a secondary summary table), does it really make > sense to have it for every table? Most of my tables do not require count(*) > on the whole table (actually - none of them do). For the same reason as I > don't want oid, I don't think I would want "fast count" capabilities to > impact my regular queries. Again, I don't think count(*) on the whole table > is a particularly useful case. count(*) on particular subsets of the data > may be, but of the whole table? > ISTM that you are complaining because we never had an *fast* count(*) and adding that now comes at a cost. Had it been there from day one with the same overhead as we are talking about now, nobody would have complained :-) Anyways, your point is taken and it would be great if can make it configurable, if not table level then at least globally. Thanks, Pavan -- Pavan Deolasee EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers |
| ||||
| Pavan Deolasee wrote: > On Wed, Mar 12, 2008 at 9:53 PM, Mark Mielke <mark@mark.mielke.cc> wrote: > >> >> Fine - once per transaction instead of once per insert. Still, if there is >> overhead to this (updating a secondary summary table), does it really make >> sense to have it for every table? Most of my tables do not require count(*) >> on the whole table (actually - none of them do). For the same reason as I >> don't want oid, I don't think I would want "fast count" capabilities to >> impact my regular queries. Again, I don't think count(*) on the whole table >> is a particularly useful case. count(*) on particular subsets of the data >> may be, but of the whole table? >> >> > > ISTM that you are complaining because we never had an *fast* count(*) > and adding that now comes at a cost. Had it been there from day one with > the same overhead as we are talking about now, nobody would have > complained :-) > Quite possibly - but, similarly, if PostgreSQL was generally slower due to such things, I might not have chosen PostgreSQL as my favoured db. :-) I think there is a reason PostgreSQL doesn't come with a fast count(*), and it's that the developers leading up to this point shared the opinion that this feature was not critical. I somewhat disagree about my personal reaction, though. I complain about many things, even configurable things, such as LISTEN/NOTIFY. I occasionally wish I had such a function, but given it's implementation, I would choose to use a non-PostgreSQL mechanism for implementation if available. Luckily, it's also "only suffer the performance cost if you CHOOSE to use it." Seems like a historically valuable trend. > Anyways, your point is taken and it would be great if can make it configurable, > if not table level then at least globally If table level, I might even try it out. If global level, I will not be trying it out. I'm only one person - but perhaps this will help point you in a direction favourable to many? If you are REALLY feeling clever, the queries that I would find it to benefit the MOST on, would include WHERE conditions. Just like indexes with WHERE conditions. If you get to this point, I think you are reaching something that will have far more universal benefit to existing applications. CREATE COUNT INDEX blah ON blah WHERE ... I would use this in many places. Cheers, mark -- Mark Mielke <mark@mielke.cc> |