vBulletin Search Engine Optimization
| |||||||
| Register | FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| Hi, I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM and an IDE hard drive. My big table has around 9 million records. Is there a tuning parameter I can change to increase speed of selects? Clearly, there's already some buffering going on since selecting an indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 seconds on the second try (from pgsql). cowpea=> explain analyze select bs_fk from blast_result where si_fk=11843254; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using si_fk_index on blast_result (cost=0.00..22874.87 rows=58118 width=4) (actual time=112.249..17472.935 rows=50283 loops=1) Index Cond: (si_fk = 11843254) Total runtime: 17642.522 ms (3 rows) cowpea=> explain analyze select bs_fk from blast_result where si_fk=11843254; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------- Index Scan using si_fk_index on blast_result (cost=0.00..22874.87 rows=58118 width=4) (actual time=0.178..341.643 rows=50283 loops=1) Index Cond: (si_fk = 11843254) Total runtime: 505.011 ms (3 rows) cowpea=> Thanks, Tom -- Tom Laudeman twl8n@virginia.edu (434) 924-2456 http://www.people.virginia.edu/~twl8n/ http://laudeman.com/ ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote: > Hi, > > I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM > and an IDE hard drive. My big table has around 9 million records. > > Is there a tuning parameter I can change to increase speed of selects? > Clearly, there's already some buffering going on since selecting an > indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 > seconds on the second try (from pgsql). Your OS is probably buffering, 1GB of RAM holds a lot of data. You can try increasing the shared_buffers parameter, but if the delay is getting data from the disk, that won't really help you. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate. -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.1 (GNU/Linux) iD8DBQFE2ec+IB7bNG8LQkwRAr5FAJ49LhhMS1ezCkD43tVIho TBTb6s6gCfR6a6 2BhcNCxjOK9LN37p5lJbNbc= =ZCI8 -----END PGP SIGNATURE----- |
| |||
| On Wed, Aug 09, 2006 at 03:46:38PM +0200, Martijn van Oosterhout wrote: > On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote: > > Is there a tuning parameter I can change to increase speed of selects? > > Clearly, there's already some buffering going on since selecting an > > indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 > > seconds on the second try (from pgsql). > > Your OS is probably buffering, 1GB of RAM holds a lot of data. You can > try increasing the shared_buffers parameter, but if the delay is > getting data from the disk, that won't really help you. If most of your queries use the same index then clustering on that index might speed up initial (i.e., not-cached) queries by reducing the number of disk pages that need to be read. See the documentation for more information. http://www.postgresql.org/docs/8.1/i...l-cluster.html -- Michael Fuhr ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| What about creating views on areas of the table that are queried often? I don't know if you have access or the ability to find what type of trends the table has, in terms of queries, but if you create some views on frequently visited information, this could also help. Tom Laudeman wrote: > Hi, > > I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM > and an IDE hard drive. My big table has around 9 million records. > > Is there a tuning parameter I can change to increase speed of selects? > Clearly, there's already some buffering going on since selecting an > indexed ~50,000 records takes 17 seconds on the first try, and only > 0.5 seconds on the second try (from pgsql). > > cowpea=> explain analyze select bs_fk from blast_result where > si_fk=11843254; > QUERY > PLAN > -------------------------------------------------------------------------------------------------------------------------------------------- > > Index Scan using si_fk_index on blast_result (cost=0.00..22874.87 > rows=58118 width=4) (actual time=112.249..17472.935 rows=50283 loops=1) > Index Cond: (si_fk = 11843254) > Total runtime: 17642.522 ms > (3 rows) > > cowpea=> explain analyze select bs_fk from blast_result where > si_fk=11843254; > QUERY > PLAN > ---------------------------------------------------------------------------------------------------------------------------------------- > > Index Scan using si_fk_index on blast_result (cost=0.00..22874.87 > rows=58118 width=4) (actual time=0.178..341.643 rows=50283 loops=1) > Index Cond: (si_fk = 11843254) > Total runtime: 505.011 ms > (3 rows) > > cowpea=> > > > Thanks, > Tom > > ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| |||
| Louis, Views certainly help in managing complexity. They do nothing to improve query-speed. Querying a view gets rewritten to queries to the underlying tables on the fly. (as long as there are no materialized views, which are still on a the TODO list) -- GHUM Harald Massa persuadere et programmare Harald Armin Massa Reinsburgstraße 202b 70197 Stuttgart 0173/9409607 - Let's set so double the killer delete select all. |
| |||
| > Views certainly help in managing complexity. They do nothing to improve > query-speed. > > Querying a view gets rewritten to queries to the underlying tables on the > fly. > (as long as there are no materialized views, which are still on a the TODO > list) Would partial indexs on the most queried regions of the table help in query speed? Regards, Richard Broersma Jr. ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| I'm not so sure about that, when you create a view on a table - at least with Oracle - which is a subset(the trivial or 'proper' subset is the entire table view) of the information on a table, when a select is issued against a table, Oracle at least, determines if there is a view already on a the table which potentially has a smaller amount of information to process - as long as the view contains the proper constraints that meet your 'select' criteria, the RDBMS engine will have fewer records to process - which I'd say, certainly constitutes a time benefit, in terms of 'performance gain.' Hence my reasoning behind determining IF there is a subset of the 'big table' that is frequented, I'd create a view on this, assuming postgresql does this too? Maybe somebody else can answer that for the pgsql-general's general information? query-speed itself is going to be as fast/slow as your system is configured for, however my point was to shave some time off of a 1M+ record table, but implementing views of 'frequently' visisted/hit records meeting the same specifications. Harald Armin Massa wrote: > Louis, > > Views certainly help in managing complexity. They do nothing to > improve query-speed. > > Querying a view gets rewritten to queries to the underlying tables on > the fly. > (as long as there are no materialized views, which are still on a the > TODO list) > > -- > GHUM Harald Massa > persuadere et programmare > Harald Armin Massa > Reinsburgstraße 202b > 70197 Stuttgart > 0173/9409607 > - > Let's set so double the killer delete select all. ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |
| |||
| On Wed, 2006-08-09 at 14:58, louis gonzales wrote: > I'm not so sure about that, when you create a view on a table - at least > with Oracle - which is a subset(the trivial or 'proper' subset is the > entire table view) of the information on a table, when a select is > issued against a table, Oracle at least, determines if there is a view > already on a the table which potentially has a smaller amount of > information to process - as long as the view contains the proper > constraints that meet your 'select' criteria, the RDBMS engine will have > fewer records to process - which I'd say, certainly constitutes a time > benefit, in terms of 'performance gain.' > > Hence my reasoning behind determining IF there is a subset of the 'big > table' that is frequented, I'd create a view on this, assuming > postgresql does this too? Maybe somebody else can answer that for the > pgsql-general's general information? > > query-speed itself is going to be as fast/slow as your system is > configured for, however my point was to shave some time off of a 1M+ > record table, but implementing views of 'frequently' visisted/hit > records meeting the same specifications. There are basically two ways to do views. The simple way, is to have a view represent a query that gets run everytime you call it. The more complex way is to "materialize" the view data, and put it into a new table, and then update that table whenever the source table changes. PostgreSQL has native support for the first type. They're cheap and easy, and work for most of the things people need views for (i.e. hiding complexity). PostgreSQL is extensible, and therefore you can institute the second type (i.e. materialized views) on your own. Thanksfully, someone else has already done most of the work for us, by the name of Jonathan Gardner, and you can find his nifty guide by typing "materialized views postgresql" into google. Gardner's materialized views support several update methods depending on what you need from your mat views. It's also a danged fine tutorial on how to write some simple plpgsql functions. |
| |||
| Michael, Great suggestion. I've read about CLUSTER, but never had a chance to use it. The only problem is that this table with 9 million records has 5 or 6 indexes. It is hard to pick the most used, but I'll bet CLUSTER will make at least one of the queries run very fast, especially for an index with a small number of distinct values. The speed of the query is (as Michael implies) limited to the rate at which the disk can seek and read. I have done experiments with views and cursors; there was no improvement in speed. I've also tried only pulling back primary keys in the hope that a smaller amount of data would more quickly be read into memory. No speed increase. I have also raised all the usual memory limits, with the expected results (slight speed improvements). I'll try CLUSTER (I'm looking forward to that test), but if we really need speed, it will probably be necessary to create copies of the table, or copy portions of the table elsewhere (essentially creating materialized views, I suppose). I'm still trying to get my science compatriot here to tell me which index he most wants to improve, then I'll CLUSTER the table on that index. Thanks! Tom Michael Fuhr wrote: >On Wed, Aug 09, 2006 at 03:46:38PM +0200, Martijn van Oosterhout wrote: > > >>On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote: >> >> >>>Is there a tuning parameter I can change to increase speed of selects? >>>Clearly, there's already some buffering going on since selecting an >>>indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 >>>seconds on the second try (from pgsql). >>> >>> >>Your OS is probably buffering, 1GB of RAM holds a lot of data. You can >>try increasing the shared_buffers parameter, but if the delay is >>getting data from the disk, that won't really help you. >> >> > >If most of your queries use the same index then clustering on that >index might speed up initial (i.e., not-cached) queries by reducing >the number of disk pages that need to be read. See the documentation >for more information. > >http://www.postgresql.org/docs/8.1/i...l-cluster.html > > > -- Tom Laudeman twl8n@virginia.edu (434) 924-2456 http://www.people.virginia.edu/~twl8n/ http://laudeman.com/ |
| ||||
| On Wed, Aug 09, 2006 at 04:54:00PM -0400, Tom Laudeman wrote: > I'll try CLUSTER (I'm looking forward to that test), but if we really > need speed, it will probably be necessary to create copies of the table, > or copy portions of the table elsewhere (essentially creating > materialized views, I suppose). I'm still trying to get my science > compatriot here to tell me which index he most wants to improve, then > I'll CLUSTER the table on that index. If you enable statistics collection then you could use those statistics to see which indexes are used the most. Those indexes might be good candidates for clustering. http://www.postgresql.org/docs/8.1/i...ing-stats.html -- Michael Fuhr ---------------------------(end of broadcast)--------------------------- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match |