This is a discussion on Insert rate drops as table grows within the Pgsql General forums, part of the PostgreSQL category; --> I have this table and index: create table t(id int, hash int); create index idx_t on t(hash); The value ...
| |||||||
| FAQ | Members List | Calendar | Search | Today's Posts | Mark Forums Read |
| ||||
| I have this table and index: create table t(id int, hash int); create index idx_t on t(hash); The value of the hash column, which is indexed, is a pseudo-random number. I load the table and measure the time per insert. What I've observed is that inserts slow down as the table grows to 1,000,000 records. Observing the pg_stat* tables, I see that the data page reads per unit time stay steady, but that index page reads grow quickly, (shared_buffers was set to 2000). I'm guessing that this is because inserts will append to data pages, but there is a random probe to the btree to maintain the index. This is a test program, but I'm wondering about my application that will need row counts much beyond 1,000,000. Questions: - Am I reasoning about the data and index costs correctly? - In order to keep insert times from dropping too much, do I simply need to increase shared_buffers in order to accomodate more of the index? - Assuming I'm willing to buy enough RAM and set shmmax high enough, are there practical limits on how big shared_buffers can be that will limit how far I can pursue such a strategy? Jack Orenstein ---------------------------(end of broadcast)--------------------------- TIP 2: Don't 'kill -9' the postmaster |
| |||
| jao@geophile.com writes: > I have this table and index: > create table t(id int, hash int); > create index idx_t on t(hash); > The value of the hash column, which is indexed, is a pseudo-random > number. I load the table and measure the time per insert. > What I've observed is that inserts slow down as the table grows to > 1,000,000 records. Observing the pg_stat* tables, I see that the data > page reads per unit time stay steady, but that index page reads grow > quickly, (shared_buffers was set to 2000). Define "quickly" ... the expected behavior is that cost to insert into a btree index grows roughly as log(N). Are you seeing anything worse than that? shared_buffers of 2000 is generally considered too small for high-volume databases. Numbers like 10000-50000 are considered reasonable on modern hardware. It's possible that you could go larger without too much penalty if using the 8.1 buffer manager code, but I don't know if anyone has benchmarked that systematically. regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq |
| |||
| Quoting Tom Lane <tgl@sss.pgh.pa.us>: > jao@geophile.com writes: >> I have this table and index: >> create table t(id int, hash int); >> create index idx_t on t(hash); > >> The value of the hash column, which is indexed, is a pseudo-random >> number. I load the table and measure the time per insert. > >> What I've observed is that inserts slow down as the table grows to >> 1,000,000 records. Observing the pg_stat* tables, I see that the data >> page reads per unit time stay steady, but that index page reads grow >> quickly, (shared_buffers was set to 2000). > > Define "quickly" ... the expected behavior is that cost to insert into > a btree index grows roughly as log(N). Are you seeing anything worse > than that? No, that's not what I'm seeing. The index block reads start low, and rise quickly to an approximate plateau. I've placed my test program and results here: http://geophile.com/insert_slowdown. - InsertPerformance.java: The test program (using the 8.0 JDBC driver and a 7.4.8 database. The database and test are all running on my laptop). - block_reads.jpg: Graph of data and index block reads, as reported by the pgstat_ tables, sampled every 15 seconds, (for a load of 1,000,000 rows). - insert_rate_vs_inserts.jpg: Graph of insert rate as a function of #rows inserted. - insert_rate_vs_time.jpg: Graph of insert rate as a function of wall clock time. > > shared_buffers of 2000 is generally considered too small for high-volume > databases. Understood. I set the value low to quickly test the idea that the index cache hit rate was the issue. > Numbers like 10000-50000 are considered reasonable on modern > hardware. These values are OK for 7.4.8? I've been using 8000. I thought I remembered reading that 12000-15000 was the top end of what would be reasonable, but I don't have a reference, and I don't think I've ever heard a rationale for such limits. Jack Orenstein ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend |
| ||||
| [Sorry if this is a repeat. Thought I sent this but I didn't see it come through the mailing list.] Quoting Tom Lane <tgl@sss.pgh.pa.us>: > jao@geophile.com writes: >> I have this table and index: >> create table t(id int, hash int); >> create index idx_t on t(hash); > >> The value of the hash column, which is indexed, is a pseudo-random >> number. I load the table and measure the time per insert. > >> What I've observed is that inserts slow down as the table grows to >> 1,000,000 records. Observing the pg_stat* tables, I see that the data >> page reads per unit time stay steady, but that index page reads grow >> quickly, (shared_buffers was set to 2000). > > Define "quickly" ... the expected behavior is that cost to insert into > a btree index grows roughly as log(N). Are you seeing anything worse > than that? No, that's not what I'm seeing. The index block reads start low, and rise quickly to an approximate plateau. I've placed my test program and results here: http://geophile.com/insert_slowdown. - InsertPerformance.java: The test program (using the 8.0 JDBC driver and a 7.4.8 database. The database and test are all running on my laptop). - block_reads.jpg: Graph of data and index block reads, as reported by the pgstat_ tables, sampled every 15 seconds, (for a load of 1,000,000 rows). - insert_rate_vs_inserts.jpg: Graph of insert rate as a function of #rows inserted. - insert_rate_vs_time.jpg: Graph of insert rate as a function of wall clock time. > > shared_buffers of 2000 is generally considered too small for high-volume > databases. Understood. I set the value low to quickly test the idea that the index cache hit rate was the issue. > Numbers like 10000-50000 are considered reasonable on modern > hardware. These values are OK for 7.4.8? I've been using 8000. I thought I remembered reading that 12000-15000 was the top end of what would be reasonable, but I don't have a reference, and I don't think I've ever heard a rationale for such limits. Jack Orenstein ---------------------------(end of broadcast)--------------------------- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to majordomo@postgresql.org so that your message can get through to the mailing list cleanly |