vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| > We are facing a* critical situation because of the performance of the > **database** .* Even a basic query like select count(*) from > bigger_table is taking about 4 minutes to return. Several other replies have mentioned that COUNT() requires a full table scan, but this point can't be emphasized enough: Don't do it! People who are migrating from other environments (Oracle or MySQL) are used to COUNT(), MIN() and MAX() returning almost instantaneously, certainly on indexed columns. But for reasons that have something to do with transactions, these operations are unbelievably slow in PostgreSQL. Here are the alternatives that I've learned. COUNT() -- There is no good substitute. What I do is create a new column, "ROW_NUM" with an auto-incrementing sequence. Every time I insert a row, it gets a new value. Unfortunately, this doesn't work if you ever delete a row. The alternative is a more complex pair of triggers, one for insert and one for delete, that maintains the count in a separate one-row table. It's a nuisance, but it's a lot faster than doing a full table scan for every COUNT(). MIN() and MAX() -- These are surprisingly slow, because they seem to do a full table scan EVEN ON AN INDEXED COLUMN! I don't understand why, but happily there is an effective substitute: select mycolumn from mytable order by mycolumn limit 1; -- same as MIN() select mycolumn from mytable order by mycolumn desc limit 1; -- same as MAX() For a large table, MIN or MAX can take 5-10 minutes, where the above "select..." replacements can return in one millisecond. You should carefully examine your entire application for COUNT, MIN, and MAX, and get rid of them EVERYWHERE. This may be the entire source of your problem. It was in my case. This is, in my humble opinion, the only serious flaw in PostgreSQL. I've been totally happy with it in every other way, and once I understood these shortcomings, my application is runs faster than ever on PostgreSQL. Craig ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Dnia 23-10-2005, nie o godzinie 09:31 -0700, Craig A. James napisał(a): <cut> > MIN() and MAX() -- These are surprisingly slow, because they seem to do a full table scan EVEN ON AN INDEXED COLUMN! In 8.1 this is no true, see the changelog. > I don't understand why, but happily there is an effective substitute: > > select mycolumn from mytable order by mycolumn limit 1; -- same as MIN() > > select mycolumn from mytable order by mycolumn desc limit 1; -- same as MAX() In 8.1 these queries are equivalent: select mycolumn from mytable order by mycolumn limit 1; select min(mycolumn) from mytable; -- Tomasz Rybak <bogomips@post.pl> ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| On Sun, Oct 23, 2005 at 09:31:44AM -0700, Craig A. James wrote: > COUNT() -- There is no good substitute. What I do is create a new column, > "ROW_NUM" with an auto-incrementing sequence. Every time I insert a row, > it gets a new value. Unfortunately, this doesn't work if you ever delete a > row. The alternative is a more complex pair of triggers, one for insert > and one for delete, that maintains the count in a separate one-row table. > It's a nuisance, but it's a lot faster than doing a full table scan for > every COUNT(). This will sometimes give you wrong results if your transactions ever roll back, for instance. The correct way to do it is to maintain a table of deltas, and insert a new positive record every time you insert rows, and a negative one every time you delete them (using a trigger, of course). Then you can query it for SUM(). (To keep the table small, run a SUM() in a cron job or such to combine the deltas.) There has, IIRC, been talks of supporting fast (index-only) scans on read-only (ie. archived) partitions of tables, but it doesn't look like this is coming in the immediate future. I guess others know more than me here :-) > MIN() and MAX() -- These are surprisingly slow, because they seem to do a > full table scan EVEN ON AN INDEXED COLUMN! I don't understand why, but > happily there is an effective substitute: They are slow because PostgreSQL has generalized aggregates, ie. MAX() gets fed exactly the same data as SUM() would. PostgreSQL 8.1 (soon-to-be released) can rewrite a MAX() or MIN() to an appropriate LIMIT form, though, which solves the problem. /* Steinar */ -- Homepage: http://www.sesse.net/ ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| "Craig A. James" <cjames@modgraph-usa.com> writes: > MIN() and MAX() -- These are surprisingly slow, because they seem to do a full table scan EVEN ON AN INDEXED COLUMN! I don't understand why, but happily there is an effective substitute: > select mycolumn from mytable order by mycolumn limit 1; -- same as MIN() > select mycolumn from mytable order by mycolumn desc limit 1; -- same as MAX() BTW, Postgres does know to do that for itself as of 8.1. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| ||||
| Hi Craig, Thank you very much for your response. It really covered a great point. Thank you, Kishore. On 10/23/05, Craig A. James <cjames@modgraph-usa.com> wrote: > > > > We are facing a* critical situation because of the performance of the > > **database** .* Even a basic query like select count(*) from > > bigger_table is taking about 4 minutes to return. > > Several other replies have mentioned that COUNT() requires a full table > scan, but this point can't be emphasized enough: Don't do it! People who are > migrating from other environments (Oracle or MySQL) are used to COUNT(), > MIN() and MAX() returning almost instantaneously, certainly on indexed > columns. But for reasons that have something to do with transactions, these > operations are unbelievably slow in PostgreSQL. > > Here are the alternatives that I've learned. > > COUNT() -- There is no good substitute. What I do is create a new column, > "ROW_NUM" with an auto-incrementing sequence. Every time I insert a row, it > gets a new value. Unfortunately, this doesn't work if you ever delete a row. > The alternative is a more complex pair of triggers, one for insert and one > for delete, that maintains the count in a separate one-row table. It's a > nuisance, but it's a lot faster than doing a full table scan for every > COUNT(). > > MIN() and MAX() -- These are surprisingly slow, because they seem to do a > full table scan EVEN ON AN INDEXED COLUMN! I don't understand why, but > happily there is an effective substitute: > > select mycolumn from mytable order by mycolumn limit 1; -- same as MIN() > > select mycolumn from mytable order by mycolumn desc limit 1; -- same as > MAX() > > For a large table, MIN or MAX can take 5-10 minutes, where the above > "select..." replacements can return in one millisecond. > > You should carefully examine your entire application for COUNT, MIN, and > MAX, and get rid of them EVERYWHERE. This may be the entire source of your > problem. It was in my case. > > This is, in my humble opinion, the only serious flaw in PostgreSQL. I've > been totally happy with it in every other way, and once I understood these > shortcomings, my application is runs faster than ever on PostgreSQL. > > Craig > |
| Thread Tools | |
| Display Modes | |
|
|