Unix Technical Forum

Re:

This is a discussion on Re: within the Pgsql Performance forums, part of the PostgreSQL category; --> Steven Rosenstein <srosenst@us.ibm.com> writes: > My question is, are there any advantages, drawbacks, or outright > restrictions to using ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 11:39 AM
Tom Lane
 
Posts: n/a
Default Re:

Steven Rosenstein <srosenst@us.ibm.com> writes:
> My question is, are there any advantages, drawbacks, or outright
> restrictions to using multiple simultaneous COPY commands to load data into
> the same table?


It will work; not sure about whether there is any performance benefit.
I vaguely recall someone having posted about doing this, so you might
check the archives.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 11:39 AM
Mischa Sandberg
 
Posts: n/a
Default Re: COPY vs INSERT

> Steven Rosenstein <srosenst@us.ibm.com> writes:
> > My question is, are there any advantages, drawbacks, or outright
> > restrictions to using multiple simultaneous COPY commands to load

> data into
> > the same table?


Do you mean, multiple COPY commands (connections) being putline'd from
the same thread (process)? I have indirect evidence that this may hurt.

Two copy commands from different threads/processes are fine, and can
help, if they alternate contention on some other resource (disk/CPU).

I'm basing this on being at the third generation of a COPY
implementation. The app loads about 1M objects/hour from 6 servers.
Each object is split across four tables.
The batch load opens four connections and firehoses records down each.
A batch is 10K objects.

COPY invokes all the same logic as INSERT on the server side
(rowexclusive locking, transaction log, updating indexes, rules).
The difference is that all the rows are inserted as a single
transaction. This reduces the number of fsync's on the xlog,
which may be a limiting factor for you. You'll want to crank
WAL_BUFFERS and CHECKPOINT_SEGMENTS to match, though.
One of my streams has 6K records; I run with WB=1000, CS=128.

The downside I found with multiple clients inserting large blocks of
rows was, that they serialized. I THINK that's because at some point
they all needed to lock the same portions of the same indexes. I'm still
working on how to avoid that, tuning the batch size and inserting into a
"queue" table with fewer indexes.

COPY (via putline) didn't do measurably better than INSERT until I
batched 40 newline-separate rows into one putline call, which improved
it 2-3:1. The suspect problem was stalling on the TCP stream; the driver
was flushing small packets. This may or may not be relevant to you;
depends on how much processing (waiting) your app does between posting
of rows.

In such a case, writing alternately to two TCP streams from the same
process increases the likelihood of a stall. I've never tested that
set-up; it would have been heading AWAY from the solution in my case.

Hope that helps.
--
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 11:40 AM
Mike Rylander
 
Posts: n/a
Default Re:

On 5/3/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Steven Rosenstein <srosenst@us.ibm.com> writes:
> > My question is, are there any advantages, drawbacks, or outright
> > restrictions to using multiple simultaneous COPY commands to load data into
> > the same table?

>
> It will work; not sure about whether there is any performance benefit.
> I vaguely recall someone having posted about doing this, so you might
> check the archives.
>


I may be one of Tom's vague "voices". The only issue would be that
you need to remove all you UNIQUE constraints before sending multiple
COPYs to the server. This includes the PRIMARY KEY constraint. To
the backend, COPY is just like INSERT and all constraints need to be
checked and this will block the commit of one of the COPY streams.

However, multiple COPYs may no be needed. I regularly load several
table totaling around 50M rows with a single COPY per table. I drop
(actually, this is during DB reload, so I don't yet create...) all
fkeys, constraints and indexes and the data loads in a matter of 5
minutes or so.

Hope that helps!

--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 11:40 AM
David Roussel
 
Posts: n/a
Default Re: COPY vs INSERT

> COPY invokes all the same logic as INSERT on the server side
> (rowexclusive locking, transaction log, updating indexes, rules).
> The difference is that all the rows are inserted as a single
> transaction. This reduces the number of fsync's on the xlog,
> which may be a limiting factor for you. You'll want to crank
> WAL_BUFFERS and CHECKPOINT_SEGMENTS to match, though.
> One of my streams has 6K records; I run with WB=1000, CS=128.


So what's the difference between a COPY and a batch of INSERT
statements. Also, surely, fsyncs only occur at the end of a
transaction, no need to fsync before a commit has been issued, right?

David

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 11:40 AM
Mischa Sandberg
 
Posts: n/a
Default Re: COPY vs INSERT

Quoting David Roussel <pgsql-performance@diroussel.xsmail.com>:

> > COPY invokes all the same logic as INSERT on the server side
> > (rowexclusive locking, transaction log, updating indexes, rules).
> > The difference is that all the rows are inserted as a single
> > transaction. This reduces the number of fsync's on the xlog,
> > which may be a limiting factor for you. You'll want to crank
> > WAL_BUFFERS and CHECKPOINT_SEGMENTS to match, though.
> > One of my streams has 6K records; I run with WB=1000, CS=128.

>
> So what's the difference between a COPY and a batch of INSERT
> statements. Also, surely, fsyncs only occur at the end of a
> transaction, no need to fsync before a commit has been issued,
> right?


Sorry, I was comparing granularities the other way araound. As far as
xlog is concerned, a COPY is ALWAYS one big txn, no matter how many
putline commands you use to feed the copy. With inserts, you can choose
whether to commit every row, every nth row, etc.

Copy makes better use of the TCP connection for transmission. COPY uses
the TCP connection like a one-way pipe. INSERT is like an RPC: the
sender has to wait until the insert's return status roundtrips.
--
Engineers think equations approximate reality.
Physicists think reality approximates the equations.
Mathematicians never make the connection.


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 11:40 AM
John A Meinel
 
Posts: n/a
Default Re: COPY vs INSERT

David Roussel wrote:
>>COPY invokes all the same logic as INSERT on the server side
>>(rowexclusive locking, transaction log, updating indexes, rules).
>>The difference is that all the rows are inserted as a single
>>transaction. This reduces the number of fsync's on the xlog,
>>which may be a limiting factor for you. You'll want to crank
>>WAL_BUFFERS and CHECKPOINT_SEGMENTS to match, though.
>>One of my streams has 6K records; I run with WB=1000, CS=128.

>
>
> So what's the difference between a COPY and a batch of INSERT
> statements. Also, surely, fsyncs only occur at the end of a
> transaction, no need to fsync before a commit has been issued, right?


I think COPY also has the advantage that for index updates it only grabs
the lock once, rather than grabbing and releasing for each row. But I
believe you are right that fsync only happens on COMMIT.

>
> David


John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.1 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org

iD8DBQFCeVmLJdeBCYSNAAMRAuWrAKCbyhxIXWmpXWuKbPGRhj yloIU7swCgxjPE
Gn0DQu9Wj5m+Mr3Wv5Zzokk=
=pkfD
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 11:40 AM
Christopher Petrilli
 
Posts: n/a
Default Re: COPY vs INSERT

On 5/4/05, Mischa Sandberg <mischa.sandberg@telus.net> wrote:
> Quoting David Roussel <pgsql-performance@diroussel.xsmail.com>:
>
> > > COPY invokes all the same logic as INSERT on the server side
> > > (rowexclusive locking, transaction log, updating indexes, rules).
> > > The difference is that all the rows are inserted as a single
> > > transaction. This reduces the number of fsync's on the xlog,
> > > which may be a limiting factor for you. You'll want to crank
> > > WAL_BUFFERS and CHECKPOINT_SEGMENTS to match, though.
> > > One of my streams has 6K records; I run with WB=1000, CS=128.

> >
> > So what's the difference between a COPY and a batch of INSERT
> > statements. Also, surely, fsyncs only occur at the end of a
> > transaction, no need to fsync before a commit has been issued,
> > right?

>
> Sorry, I was comparing granularities the other way araound. As far as
> xlog is concerned, a COPY is ALWAYS one big txn, no matter how many
> putline commands you use to feed the copy. With inserts, you can choose
> whether to commit every row, every nth row, etc.
>
> Copy makes better use of the TCP connection for transmission. COPY uses
> the TCP connection like a one-way pipe. INSERT is like an RPC: the
> sender has to wait until the insert's return status roundtrips.


I have found even greater performance increases by using COPY FROM
<filename> not COPY FROM STDIN. This allows the backend process to
directly read the file, rather than shoving it over a pipe (thereby
potentially hitting the CPU multiple times). My experience is that
this is anywhere from 5-10x faster than INSERT statements on the
whole, and sometimes 200x.

Chris

--
| Christopher Petrilli
| petrilli@gmail.com

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 11:40 AM
Kris Jurka
 
Posts: n/a
Default Re: COPY vs INSERT



On Wed, 4 May 2005, Mischa Sandberg wrote:

> Copy makes better use of the TCP connection for transmission. COPY uses
> the TCP connection like a one-way pipe. INSERT is like an RPC: the
> sender has to wait until the insert's return status roundtrips.


Not true. A client may send any number of Bind/Execute messages on a
prepared statement before a Sync message. So multiple inserts may be sent
in one network roundtrip. This is exactly how the JDBC driver
implements batch statements. There is some limit to the number of queries
in flight at any given moment because there is the potential to deadlock
if both sides of network buffers are filled up and each side is blocked
waiting on a write. The JDBC driver has conservatively selected 256 as
the maximum number of queries to send at once.

Kris Jurka

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 11:40 AM
Christopher Kings-Lynne
 
Posts: n/a
Default Re: COPY vs INSERT

> So what's the difference between a COPY and a batch of INSERT
> statements. Also, surely, fsyncs only occur at the end of a
> transaction, no need to fsync before a commit has been issued, right?


With COPY, the data being inserted itself does not have to pass through
the postgresql parser.

Chris

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 11:40 AM
Tom Lane
 
Posts: n/a
Default Re: COPY vs INSERT

Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
>> So what's the difference between a COPY and a batch of INSERT
>> statements. Also, surely, fsyncs only occur at the end of a
>> transaction, no need to fsync before a commit has been issued, right?


> With COPY, the data being inserted itself does not have to pass through
> the postgresql parser.


Also, there is a whole lot of one-time-per-statement overhead that can
be amortized across many rows instead of only one. Stuff like opening
the target table, looking up the per-column I/O conversion functions,
identifying trigger functions if any, yadda yadda. It's not *that*
expensive, but compared to an operation as small as inserting a single
row, it's significant.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 09:23 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com