Unix Technical Forum

concurrent inserts into two separate tables are very slow

This is a discussion on concurrent inserts into two separate tables are very slow within the Pgsql Performance forums, part of the PostgreSQL category; --> Hi. Running postgres 8.2 on debian. I've noticed that concurrent inserts (archiving) of large batches data into two completely ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 11:43 AM
Sergei Shelukhin
 
Posts: n/a
Default concurrent inserts into two separate tables are very slow

Hi. Running postgres 8.2 on debian.
I've noticed that concurrent inserts (archiving) of large batches data
into two completely unrelated tables are many times slower than the
same inserts done in sequence.
Is there any way to speed them up apart from buying faster HDs/
changing RAID configuration?
Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 11:44 AM
Scott Marlowe
 
Posts: n/a
Default Re: concurrent inserts into two separate tables are very slow

On Jan 5, 2008 9:00 PM, Sergei Shelukhin <realgeek@gmail.com> wrote:
> Hi. Running postgres 8.2 on debian.
> I've noticed that concurrent inserts (archiving) of large batches data
> into two completely unrelated tables are many times slower than the
> same inserts done in sequence.
> Is there any way to speed them up apart from buying faster HDs/
> changing RAID configuration?


What method are you using to load these data? Got a short example
that illustrates what you're doing?

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 11:44 AM
Sergei Shelukhin
 
Posts: n/a
Default Re: concurrent inserts into two separate tables are veryslow

Scott Marlowe wrote:
> On Jan 5, 2008 9:00 PM, Sergei Shelukhin <realgeek@gmail.com> wrote:
>
>> Hi. Running postgres 8.2 on debian.
>> I've noticed that concurrent inserts (archiving) of large batches data
>> into two completely unrelated tables are many times slower than the
>> same inserts done in sequence.
>> Is there any way to speed them up apart from buying faster HDs/
>> changing RAID configuration?
>>

>
> What method are you using to load these data? Got a short example
> that illustrates what you're doing?
>
>

The basic structure is as follows: there are several tables with
transaction data that is stored for one month only.
The data comes from several sources in different formats and is pushed
in using a custom script.
It gets the source data and puts it into a table it creates (import
table) with the same schema as the main table; then it deletes the month
old data from the main table; it also searches for duplicates in the
main table using some specific criteria and deletes them too (to make
use of indexes 2nd temp table is created with id int column and it's
populated with one insert ... select query with the transaction ids of
data duplicate in main and import tables, after that delete from pages
where id in (select id from 2nd-temp-table) is called). Then it inserts
the remainder of the imports table into the main table.
There are several data load processes that function in the same manner
with different target tables.
When they are running in sequence, they take about 20 minutes to
complete on average. If, however, they are running in parallel, they can
take up to 3 hours... I was wondering if it's solely the HD bottleneck
case, given that there's plenty of CPU and RAM available and postgres is
configured to use it.


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 11:44 AM
Scott Marlowe
 
Posts: n/a
Default Re: concurrent inserts into two separate tables are very slow

On Jan 7, 2008 4:49 PM, Sergei Shelukhin <realgeek@gmail.com> wrote:
>
> Scott Marlowe wrote:
> > On Jan 5, 2008 9:00 PM, Sergei Shelukhin <realgeek@gmail.com> wrote:
> >
> >> Hi. Running postgres 8.2 on debian.
> >> I've noticed that concurrent inserts (archiving) of large batches data
> >> into two completely unrelated tables are many times slower than the
> >> same inserts done in sequence.
> >> Is there any way to speed them up apart from buying faster HDs/
> >> changing RAID configuration?
> >>

> >
> > What method are you using to load these data? Got a short example
> > that illustrates what you're doing?
> >
> >

> The basic structure is as follows: there are several tables with
> transaction data that is stored for one month only.
> The data comes from several sources in different formats and is pushed
> in using a custom script.
> It gets the source data and puts it into a table it creates (import
> table) with the same schema as the main table; then it deletes the month
> old data from the main table; it also searches for duplicates in the
> main table using some specific criteria and deletes them too (to make
> use of indexes 2nd temp table is created with id int column and it's
> populated with one insert ... select query with the transaction ids of
> data duplicate in main and import tables, after that delete from pages
> where id in (select id from 2nd-temp-table) is called). Then it inserts
> the remainder of the imports table into the main table.
> There are several data load processes that function in the same manner
> with different target tables.
> When they are running in sequence, they take about 20 minutes to
> complete on average. If, however, they are running in parallel, they can
> take up to 3 hours... I was wondering if it's solely the HD bottleneck
> case, given that there's plenty of CPU and RAM available and postgres is
> configured to use it.


Ahh, thanks for the more detailed explanation. Now I get what you're facing.

There are a few things you could do that would probably help. Doing
more than one might help.

1: Buy a decent battery backed caching RAID controller. This will
smooth out writes a lot. If you can't afford that...
2: Build a nice big RAID-10 array, say 8 to 14 discs.
3: Put pg_xlog on a physically separate drive from the rest of the database.
4: Put each table being inserted to on a separate physical hard drives.
5: Stop writing to multiple tables at once.
6: (Not recommended) run with fsync turned off.

Each of these things can help on their own. My personal preference
for heavily written databases is a good RAID controller with battery
backed caching on and a lot of discs in RAID-10 or RAID-6 (depending
on read versus write ratio and the need for storage space.) RAID-10
is normally better for performance, RAID-6 with large arrays is better
for maximizing your size while maintaining decent performance and
reliability. RAID-5 is right out.

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:32 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com