Unix Technical Forum

Insert rate drops as table grows

This is a discussion on Insert rate drops as table grows within the Pgsql General forums, part of the PostgreSQL category; --> I have this table and index: create table t(id int, hash int); create index idx_t on t(hash); The value ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql General

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-09-2008, 08:09 AM
jao@geophile.com
 
Posts: n/a
Default Insert rate drops as table grows

I have this table and index:

create table t(id int, hash int);
create index idx_t on t(hash);

The value of the hash column, which is indexed, is a pseudo-random
number. I load the table and measure the time per insert.

What I've observed is that inserts slow down as the table grows to
1,000,000 records. Observing the pg_stat* tables, I see that the data
page reads per unit time stay steady, but that index page reads grow
quickly, (shared_buffers was set to 2000).

I'm guessing that this is because inserts will append to data pages,
but there is a random probe to the btree to maintain the index.

This is a test program, but I'm wondering about my application that
will need row counts much beyond 1,000,000.

Questions:

- Am I reasoning about the data and index costs correctly?

- In order to keep insert times from dropping too much, do I simply
need to increase shared_buffers in order to accomodate more of the
index?

- Assuming I'm willing to buy enough RAM and set shmmax high enough,
are there practical limits on how big shared_buffers can be that will
limit how far I can pursue such a strategy?


Jack Orenstein



---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-09-2008, 08:09 AM
Tom Lane
 
Posts: n/a
Default Re: Insert rate drops as table grows

jao@geophile.com writes:
> I have this table and index:
> create table t(id int, hash int);
> create index idx_t on t(hash);


> The value of the hash column, which is indexed, is a pseudo-random
> number. I load the table and measure the time per insert.


> What I've observed is that inserts slow down as the table grows to
> 1,000,000 records. Observing the pg_stat* tables, I see that the data
> page reads per unit time stay steady, but that index page reads grow
> quickly, (shared_buffers was set to 2000).


Define "quickly" ... the expected behavior is that cost to insert into
a btree index grows roughly as log(N). Are you seeing anything worse
than that?

shared_buffers of 2000 is generally considered too small for high-volume
databases. Numbers like 10000-50000 are considered reasonable on modern
hardware. It's possible that you could go larger without too much
penalty if using the 8.1 buffer manager code, but I don't know if anyone
has benchmarked that systematically.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-09-2008, 08:09 AM
jao@geophile.com
 
Posts: n/a
Default Re: Insert rate drops as table grows

Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> jao@geophile.com writes:
>> I have this table and index:
>> create table t(id int, hash int);
>> create index idx_t on t(hash);

>
>> The value of the hash column, which is indexed, is a pseudo-random
>> number. I load the table and measure the time per insert.

>
>> What I've observed is that inserts slow down as the table grows to
>> 1,000,000 records. Observing the pg_stat* tables, I see that the data
>> page reads per unit time stay steady, but that index page reads grow
>> quickly, (shared_buffers was set to 2000).

>
> Define "quickly" ... the expected behavior is that cost to insert into
> a btree index grows roughly as log(N). Are you seeing anything worse
> than that?


No, that's not what I'm seeing. The index block reads start low, and
rise quickly to an approximate plateau. I've placed my test program
and results here: http://geophile.com/insert_slowdown.

- InsertPerformance.java: The test program (using the 8.0 JDBC driver
and a 7.4.8 database. The database and test are all running on my
laptop).

- block_reads.jpg: Graph of data and index block reads, as reported by
the pgstat_ tables, sampled every 15 seconds, (for a load of 1,000,000
rows).

- insert_rate_vs_inserts.jpg: Graph of insert rate as a function of
#rows inserted.

- insert_rate_vs_time.jpg: Graph of insert rate as a function of wall
clock time.

>
> shared_buffers of 2000 is generally considered too small for high-volume
> databases.


Understood. I set the value low to quickly test the idea that the
index cache hit rate was the issue.

> Numbers like 10000-50000 are considered reasonable on modern
> hardware.


These values are OK for 7.4.8? I've been using 8000. I thought I
remembered reading that 12000-15000 was the top end of what would be
reasonable, but I don't have a reference, and I don't think I've ever
heard a rationale for such limits.

Jack Orenstein



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-09-2008, 08:10 AM
jao@geophile.com
 
Posts: n/a
Default Re: Insert rate drops as table grows

[Sorry if this is a repeat. Thought I sent this but I didn't see
it come through the mailing list.]

Quoting Tom Lane <tgl@sss.pgh.pa.us>:

> jao@geophile.com writes:
>> I have this table and index:
>> create table t(id int, hash int);
>> create index idx_t on t(hash);

>
>> The value of the hash column, which is indexed, is a pseudo-random
>> number. I load the table and measure the time per insert.

>
>> What I've observed is that inserts slow down as the table grows to
>> 1,000,000 records. Observing the pg_stat* tables, I see that the data
>> page reads per unit time stay steady, but that index page reads grow
>> quickly, (shared_buffers was set to 2000).

>
> Define "quickly" ... the expected behavior is that cost to insert into
> a btree index grows roughly as log(N). Are you seeing anything worse
> than that?


No, that's not what I'm seeing. The index block reads start low, and
rise quickly to an approximate plateau. I've placed my test program
and results here: http://geophile.com/insert_slowdown.

- InsertPerformance.java: The test program (using the 8.0 JDBC driver
and a 7.4.8 database. The database and test are all running on my
laptop).

- block_reads.jpg: Graph of data and index block reads, as reported by
the pgstat_ tables, sampled every 15 seconds, (for a load of 1,000,000
rows).

- insert_rate_vs_inserts.jpg: Graph of insert rate as a function of
#rows inserted.

- insert_rate_vs_time.jpg: Graph of insert rate as a function of wall
clock time.

>
> shared_buffers of 2000 is generally considered too small for high-volume
> databases.


Understood. I set the value low to quickly test the idea that the
index cache hit rate was the issue.

> Numbers like 10000-50000 are considered reasonable on modern
> hardware.


These values are OK for 7.4.8? I've been using 8000. I thought I
remembered reading that 12000-15000 was the top end of what would be
reasonable, but I don't have a reference, and I don't think I've ever
heard a rationale for such limits.

Jack Orenstein



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:52 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com