Unix Technical Forum

A little COPY speedup

This is a discussion on A little COPY speedup within the Pgsql Patches forums, part of the PostgreSQL category; --> On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote: > I ran oprofile on a COPY FROM to get ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read

Reply

 

LinkBack Thread Tools Display Modes
  #11 (permalink)  
Old 04-18-2008, 10:38 AM
Simon Riggs
 
Posts: n/a
Default Re: A little COPY speedup

On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote:

> I ran oprofile on a COPY FROM to get an overview of where the CPU time
> is spent. To my amazement, the function at the top of the list was
> PageAddItem with 16% of samples.


Excellent.

I'm slightly worried though since that seems to have changed from 8.2,
which I oprofiled over Christmas. I can't recall what's changed though.
Was this just on one system? Is it possible there is an effect on one
system and not another?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #12 (permalink)  
Old 04-18-2008, 10:38 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: A little COPY speedup

Simon Riggs wrote:
> On Thu, 2007-03-01 at 17:01 +0000, Heikki Linnakangas wrote:
>
>> I ran oprofile on a COPY FROM to get an overview of where the CPU time
>> is spent. To my amazement, the function at the top of the list was
>> PageAddItem with 16% of samples.

>
> Excellent.
>
> I'm slightly worried though since that seems to have changed from 8.2,
> which I oprofiled over Christmas. I can't recall what's changed though.
> Was this just on one system? Is it possible there is an effect on one
> system and not another?


Well, there's one big change: your patch to suppress WAL logging on
tables created in the same transaction. I ran the test again, this time
creating the table in a separate transaction:

samples % image name app name
symbol name
5480 17.0366 postgres postgres
XLogInsert
3684 11.4531 postgres postgres
PageAddItem
3580 11.1298 libc-2.3.6.so postgres memcpy
2498 7.7660 postgres postgres DoCopy
1265 3.9327 postgres postgres
LWLockAcquire
1210 3.7617 postgres postgres
CopyReadLine
1042 3.2394 postgres postgres
LWLockRelease
1038 3.2270 postgres postgres
heap_formtuple
1033 3.2115 libc-2.3.6.so postgres
____strtol_l_internal
875 2.7203 postgres postgres hash_any

The profile will probably look somewhat different depending on your
data, encoding etc.

All the page locking related functions account for ~10% in total,
including the LWLockAcquire/Release, Pin/UnBuffer, hash_any and so on.
And then there's all the memcpying...

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #13 (permalink)  
Old 04-18-2008, 10:38 AM
Simon Riggs
 
Posts: n/a
Default Re: A little COPY speedup

On Fri, 2007-03-02 at 10:09 +0000, Heikki Linnakangas wrote:

> Well, there's one big change: your patch to suppress WAL logging on
> tables created in the same transaction.


OK, just checking thats what you meant.

> All the page locking related functions account for ~10% in total,
> including the LWLockAcquire/Release, Pin/UnBuffer, hash_any and so on.
> And then there's all the memcpying...


I think its a great spot that PageAddItem() was so bad. I realise I
didn't actually look at what it was doing, just looked at ways to avoid
doing it on each individual call to the block for each row.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #14 (permalink)  
Old 04-18-2008, 10:38 AM
Tom Lane
 
Posts: n/a
Default Re: A little COPY speedup

"Simon Riggs" <simon@2ndquadrant.com> writes:
> I'm slightly worried though since that seems to have changed from 8.2,
> which I oprofiled over Christmas.


If you were testing a case with wider rows than Heikki tested, you'd see
less impact --- the cost of the old way was O(N^2) in the number of
tuples that fit on a page, so the behavior gets rapidly worse as you get
down to smaller tuple sizes. (Come to think of it, the cmin/cmax
collapse would be a factor here too.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #15 (permalink)  
Old 04-18-2008, 10:38 AM
Gregory Stark
 
Posts: n/a
Default Re: A little COPY speedup

"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> I'm slightly worried though since that seems to have changed from 8.2,
>> which I oprofiled over Christmas.

>
> If you were testing a case with wider rows than Heikki tested, you'd see
> less impact --- the cost of the old way was O(N^2) in the number of
> tuples that fit on a page, so the behavior gets rapidly worse as you get
> down to smaller tuple sizes. (Come to think of it, the cmin/cmax
> collapse would be a factor here too.)


Or larger block sizes of course. A 32kb block would be 16x as bad which starts
to be pretty serious.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #16 (permalink)  
Old 04-18-2008, 10:38 AM
Simon Riggs
 
Posts: n/a
Default Re: A little COPY speedup

On Fri, 2007-03-02 at 16:25 +0000, Gregory Stark wrote:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>
> > "Simon Riggs" <simon@2ndquadrant.com> writes:
> >> I'm slightly worried though since that seems to have changed from 8.2,
> >> which I oprofiled over Christmas.

> >
> > If you were testing a case with wider rows than Heikki tested, you'd see
> > less impact --- the cost of the old way was O(N^2) in the number of
> > tuples that fit on a page, so the behavior gets rapidly worse as you get
> > down to smaller tuple sizes. (Come to think of it, the cmin/cmax
> > collapse would be a factor here too.)

>
> Or larger block sizes of course. A 32kb block would be 16x as bad which starts
> to be pretty serious.


Well, I was only using 8kb blocks.

But I think the message is clear: we need to profile lots of different
combinations. I was using a 2 col table with integer, char(100).

IIRC there are issues with delimiter handling when we have lots of
columns in the input on COPY FROM, and num of cols on COPY TO. I've not
looked at those recently though.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #17 (permalink)  
Old 04-18-2008, 10:38 AM
Andrew Dunstan
 
Posts: n/a
Default Re: A little COPY speedup

Simon Riggs wrote:
>
> IIRC there are issues with delimiter handling when we have lots of
> columns in the input on COPY FROM, and num of cols on COPY TO. I've not
> looked at those recently though.
>
>


What sort of issues? Anything that breaks on this has catastrophic
consequences.

cheers

andrew


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #18 (permalink)  
Old 04-18-2008, 10:38 AM
Simon Riggs
 
Posts: n/a
Default Re: A little COPY speedup

On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote:
> Simon Riggs wrote:
> >
> > IIRC there are issues with delimiter handling when we have lots of
> > columns in the input on COPY FROM, and num of cols on COPY TO. I've not
> > looked at those recently though.
> >
> >

>
> What sort of issues? Anything that breaks on this has catastrophic
> consequences.


We were talking about performance, not data integrity....

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #19 (permalink)  
Old 04-18-2008, 10:38 AM
Andrew Dunstan
 
Posts: n/a
Default Re: A little COPY speedup

Simon Riggs wrote:
> On Fri, 2007-03-02 at 11:58 -0500, Andrew Dunstan wrote:
>
>> Simon Riggs wrote:
>>
>>> IIRC there are issues with delimiter handling when we have lots of
>>> columns in the input on COPY FROM, and num of cols on COPY TO. I've not
>>> looked at those recently though.
>>>
>>>
>>>

>> What sort of issues? Anything that breaks on this has catastrophic
>> consequences.
>>

>
> We were talking about performance, not data integrity....
>
>


OK. I'm still curious to know what the issues are with delimiter handling.

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #20 (permalink)  
Old 04-18-2008, 10:39 AM
Simon Riggs
 
Posts: n/a
Default Re: A little COPY speedup

On Fri, 2007-03-02 at 12:09 -0500, Andrew Dunstan wrote:

> OK. I'm still curious to know what the issues are with delimiter handling.


Rumours only.

Feedback from someone else looking to the problem last year. IIRC there
was a feeling that if we didn't have to search for delimiters the COPY
FROM input parsing could be easier.

Vague recollection that the COPY TO uses the slower API for getting heap
attributes, but didn't seem to show up when I profiled few months back.

I'm not personally proposing to take those thoughts any further.

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com



---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:54 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com