Unix Technical Forum

Re: COPY FROM performance improvements

This is a discussion on Re: COPY FROM performance improvements within the Pgsql Patches forums, part of the PostgreSQL category; --> Luke Lonergan wrote: >Andrew, > > > >>You might like to look at running pgindent (see src/tools/pgindent) over >>the ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 12:33 AM
Andrew Dunstan
 
Posts: n/a
Default Re: COPY FROM performance improvements



Luke Lonergan wrote:

>Andrew,
>
>
>
>>You might like to look at running pgindent (see src/tools/pgindent) over
>>the file before cutting a patch. Since this is usually run over each
>>file just before a release, the only badness should be things from
>>recent patches.
>>
>>

>
>I've attached two patches, one gained from running pgindent against the
>current CVS tip copy.c (:-D) and one gained by running the COPY FROM perf
>improvements through the same. Nifty tool!
>
>Only formatting changes in these.
>
>
>
>

Luke,

Something strange has happened. I suspect that you've inadvertantly used
GNU indent or an unpatched BSD indent. pgindent needs a special patched
BSD indent to work according to the PG standards - see the README

cheers

andrew

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 12:42 AM
Mark Wong
 
Posts: n/a
Default Re: COPY FROM performance improvements

I just ran through a few tests with the v14 patch against 100GB of data
from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to
give a few details, I only loaded data and started a COPY in parallel
for each the data files:
http://www.testing.osdl.org/projects...lts/fast_copy/

Here's a visual of my disk layout, for those familiar with the database schema:
http://www.testing.osdl.org/projects...-010-dbt3.html

I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.

Let me know if you have any questions.

Mark

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 12:42 AM
Luke Lonergan
 
Posts: n/a
Default Re: COPY FROM performance improvements

Cool!

At what rate does your disk setup write sequential data, e.g.:
time dd if=/dev/zero of=bigfile bs=8k count=500000

(sized for 2x RAM on a system with 2GB)

BTW - the Compaq smartarray controllers are pretty broken on Linux from a
performance standpoint in our experience. We've had disastrously bad
results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
2.6.10, on the order of 20MB/s.

For comparison, the results on our dual opteron with a single LSI SCSI
controller with software RAID0 on a 2.6.10 kernel:

[llonergan@stinger4 dbfast]$ time dd if=/dev/zero of=bigfile bs=8k
count=500000
500000+0 records in
500000+0 records out

real 0m24.702s
user 0m0.077s
sys 0m8.794s

Which calculates out to about 161MB/s.

- Luke


On 7/21/05 2:55 PM, "Mark Wong" <markw@osdl.org> wrote:

> I just ran through a few tests with the v14 patch against 100GB of data
> from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to
> give a few details, I only loaded data and started a COPY in parallel
> for each the data files:
> http://www.testing.osdl.org/projects...lts/fast_copy/
>
> Here's a visual of my disk layout, for those familiar with the database
> schema:
> http://www.testing.osdl.org/projects...py/layout-dev4
> -010-dbt3.html
>
> I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
> attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
>
> Let me know if you have any questions.
>
> Mark
>




---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 12:42 AM
Joshua D. Drake
 
Posts: n/a
Default Re: COPY FROM performance improvements

Luke Lonergan wrote:
> Cool!
>
> At what rate does your disk setup write sequential data, e.g.:
> time dd if=/dev/zero of=bigfile bs=8k count=500000
>
> (sized for 2x RAM on a system with 2GB)
>
> BTW - the Compaq smartarray controllers are pretty broken on Linux from a
> performance standpoint in our experience. We've had disastrously bad
> results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
> 2.6.10, on the order of 20MB/s.


O.k. this strikes me as interesting, now we know that Compaq and Dell
are borked for Linux. Is there a name brand server (read Enterprise)
that actually does provide reasonable performance?

>
> For comparison, the results on our dual opteron with a single LSI SCSI
> controller with software RAID0 on a 2.6.10 kernel:
>
> [llonergan@stinger4 dbfast]$ time dd if=/dev/zero of=bigfile bs=8k
> count=500000
> 500000+0 records in
> 500000+0 records out
>
> real 0m24.702s
> user 0m0.077s
> sys 0m8.794s
>
> Which calculates out to about 161MB/s.
>
> - Luke
>
>
> On 7/21/05 2:55 PM, "Mark Wong" <markw@osdl.org> wrote:
>
>
>>I just ran through a few tests with the v14 patch against 100GB of data
>>from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to
>>give a few details, I only loaded data and started a COPY in parallel
>>for each the data files:
>>http://www.testing.osdl.org/projects...lts/fast_copy/
>>
>>Here's a visual of my disk layout, for those familiar with the database
>>schema:
>>http://www.testing.osdl.org/projects...py/layout-dev4
>>-010-dbt3.html
>>
>>I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
>>attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
>>
>>Let me know if you have any questions.
>>
>>Mark
>>

>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match



--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 12:43 AM
Luke Lonergan
 
Posts: n/a
Default Re: COPY FROM performance improvements

Joshua,

On 7/21/05 5:08 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:

> O.k. this strikes me as interesting, now we know that Compaq and Dell
> are borked for Linux. Is there a name brand server (read Enterprise)
> that actually does provide reasonable performance?


I think late model Dell (post the bad chipset problem, circa 2001-2?) and
IBM and Sun servers are fine because they all use simple SCSI adapters from
LSI or Adaptec.

The HP Smartarray is an aberration, they don't have good driver support for
Linux and as a consequence have some pretty bad problems with both
performance and stability. On Windows they perform quite well.

Also - there are very big issues with some SATA controllers and Linux we've
seen, particularly the Silicon Image, Highpoint other non-Intel controllers.
Not sure about Nvidia, but the only ones I trust now are 3Ware and the
others mentioned in earlier posts.

- Luke





---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 12:43 AM
Joshua D. Drake
 
Posts: n/a
Default Re: COPY FROM performance improvements


> I think late model Dell (post the bad chipset problem, circa 2001-2?) and
> IBM and Sun servers are fine because they all use simple SCSI adapters from
> LSI or Adaptec.


Well I know that isn't true at least not with ANY of the Dells my
customers have purchased in the last 18 months. They are still really,
really slow.

> Also - there are very big issues with some SATA controllers and Linux we've
> seen, particularly the Silicon Image, Highpoint other non-Intel controllers.
> Not sure about Nvidia, but the only ones I trust now are 3Ware and the
> others mentioned in earlier posts.


I have great success with Silicon Image as long as I am running them
with Linux software RAID. The LSI controllers are also really nice.

J



>
> - Luke
>
>
>



--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-18-2008, 12:43 AM
Luke Lonergan
 
Posts: n/a
Default Re: COPY FROM performance improvements

Joshua,

On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
> Well I know that isn't true at least not with ANY of the Dells my
> customers have purchased in the last 18 months. They are still really,
> really slow.


That's too bad, can you cite some model numbers? SCSI?

> I have great success with Silicon Image as long as I am running them
> with Linux software RAID. The LSI controllers are also really nice.


That's good to hear, I gave up on Silicon Image controllers on Linux about 1
year ago, which kernel are you using with success? Silicon Image
controllers are the most popular, so it's important to see them supported
well, though I'd rather see more SATA headers than 2 off of the built-in
chipsets.

- Luke



---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #8 (permalink)  
Old 04-18-2008, 12:43 AM
Andrew Dunstan
 
Posts: n/a
Default Re: COPY FROM performance improvements


this discussion belongs on -performance

cheers

andrew


Luke Lonergan said:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
>> Well I know that isn't true at least not with ANY of the Dells my
>> customers have purchased in the last 18 months. They are still really,
>> really slow.

>
> That's too bad, can you cite some model numbers? SCSI?
>
>> I have great success with Silicon Image as long as I am running them
>> with Linux software RAID. The LSI controllers are also really nice.

>
> That's good to hear, I gave up on Silicon Image controllers on Linux
> about 1 year ago, which kernel are you using with success? Silicon
> Image
> controllers are the most popular, so it's important to see them
> supported well, though I'd rather see more SATA headers than 2 off of
> the built-in chipsets.
>
> - Luke





---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #9 (permalink)  
Old 04-18-2008, 12:43 AM
Joshua D. Drake
 
Posts: n/a
Default Re: COPY FROM performance improvements

Luke Lonergan wrote:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
>
>>Well I know that isn't true at least not with ANY of the Dells my
>>customers have purchased in the last 18 months. They are still really,
>>really slow.

>
>
> That's too bad, can you cite some model numbers? SCSI?


Yeah I will get them and post, but yes they are all SCSI.

>
>
>>I have great success with Silicon Image as long as I am running them
>>with Linux software RAID. The LSI controllers are also really nice.

>
>
> That's good to hear, I gave up on Silicon Image controllers on Linux about 1
> year ago, which kernel are you using with success?


Any of the 2.6 kernels. ALso the laster 2.4 (+22 I believe) support it
pretty well as well.

Silicon Image
> controllers are the most popular, so it's important to see them supported
> well, though I'd rather see more SATA headers than 2 off of the built-in
> chipsets.
>
> - Luke
>



--
Your PostgreSQL solutions provider, Command Prompt, Inc.
24x7 support - 1.800.492.2240, programming, and consulting
Home of PostgreSQL Replicator, plPHP, plPerlNG and pgPHPToolkit
http://www.commandprompt.com / http://www.postgresql.org

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #10 (permalink)  
Old 04-18-2008, 12:43 AM
Patrick Welche
 
Posts: n/a
Default Re: COPY FROM performance improvements

On Thu, Jul 21, 2005 at 09:19:04PM -0700, Luke Lonergan wrote:
> Joshua,
>
> On 7/21/05 7:53 PM, "Joshua D. Drake" <jd@commandprompt.com> wrote:
> > Well I know that isn't true at least not with ANY of the Dells my
> > customers have purchased in the last 18 months. They are still really,
> > really slow.

>
> That's too bad, can you cite some model numbers? SCSI?


I would be interested too, given

http://www.netbsd.org/cgi-bin/query-...l?number=30531


Cheers,

Patrick

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 05:13 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com