Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 11:06 AM
Alex
 
Posts: n/a
Default How can I make use of both CPUs in a dual processor machine

Hi,
we just got a new dual processor machine and I wonder if there is a way
to utilize both processors.

Our DB server is basically fully dedicated to postgres. (its a dual amd
with 4gb mem.)

I have a batch job that periodically loads about 8 million records into
a table.
for this I drop the indices, truncate the table, use the copy to insert
the data, recreate the indices (4 indices), vacuum the table.

That is all done through a perl batch job.

While I am doing this, I noticed that only one CPU is really used.

So here are my questions:

Is there a way to utilize both CPUs

Is it possible to split up the import file and run 2 copy processes

Is it possible to create 2 indices at the same time

Would I actually gain anything from that, or is the bottleneck somewhere
else ?

(perl is a given here for the batch job)

If anyone has some experience or ideas... any hints or help on this
would be appreciated.

Thanks
Alex


---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 11:06 AM
John A Meinel
 
Posts: n/a
Default Re: How can I make use of both CPUs in a dual processor

Alex wrote:

> Hi,
> we just got a new dual processor machine and I wonder if there is a
> way to utilize both processors.
>
> Our DB server is basically fully dedicated to postgres. (its a dual
> amd with 4gb mem.)
>
> I have a batch job that periodically loads about 8 million records
> into a table.
> for this I drop the indices, truncate the table, use the copy to
> insert the data, recreate the indices (4 indices), vacuum the table.
>
> That is all done through a perl batch job.
>
> While I am doing this, I noticed that only one CPU is really used.
>
> So here are my questions:
>
> Is there a way to utilize both CPUs
>

For postgres, you get a max of 1 CPU per connection, so to use both, you
need 2 CPU's.

> Is it possible to split up the import file and run 2 copy processes
>
> Is it possible to create 2 indices at the same time
>

You'd want to be a little careful. Postgres uses work_mem for vacuum and
index creation, so if you have 2 processes doing it, just make sure you
aren't running out of RAM and going to swap.

> Would I actually gain anything from that, or is the bottleneck
> somewhere else ?
>

More likely, the bottleneck would be disk I/O. Simply because it is
almost always disk I/O. However, without knowing your configuration, how
much CPU is used during the operation, etc, it's hard to say.

> (perl is a given here for the batch job)
>
> If anyone has some experience or ideas... any hints or help on this
> would be appreciated.
>
> Thanks
> Alex
>

Sorry I wasn't a lot of help. You should probably post your postgres
version, and more information about how much CPU load there is while
your load is running.

John
=:->


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (Cygwin)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCCiLqJdeBCYSNAAMRAs57AJ0WrL7te9DXyN7wOckjgZ 9XUzHpmgCePB7R
eRHtE0baqplSoTel8EU8WKI=
=HcWm
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 11:06 AM
Alex
 
Posts: n/a
Default Re: How can I make use of both CPUs in a dual processor

Thanks John.

Well as I mentioned. I have a Dual AMD Opteron 64 2.4ghz, 15k rpm SCSI
Disks, 4GB of memory.
Disks are pretty fast and memory should be more than enough. Currently
we dont have many concurrent connections.

I run PG 8.0.1 on Fedora Core 3

When I now run the batch job, one CPU runs in the 80-90% the other in
5-10% max.






John A Meinel wrote:

> Alex wrote:
>
>> Hi,
>> we just got a new dual processor machine and I wonder if there is a
>> way to utilize both processors.
>>
>> Our DB server is basically fully dedicated to postgres. (its a dual
>> amd with 4gb mem.)
>>
>> I have a batch job that periodically loads about 8 million records
>> into a table.
>> for this I drop the indices, truncate the table, use the copy to
>> insert the data, recreate the indices (4 indices), vacuum the table.
>>
>> That is all done through a perl batch job.
>>
>> While I am doing this, I noticed that only one CPU is really used.
>>
>> So here are my questions:
>>
>> Is there a way to utilize both CPUs
>>

> For postgres, you get a max of 1 CPU per connection, so to use both,
> you need 2 CPU's.
>
>> Is it possible to split up the import file and run 2 copy processes
>>
>> Is it possible to create 2 indices at the same time
>>

> You'd want to be a little careful. Postgres uses work_mem for vacuum
> and index creation, so if you have 2 processes doing it, just make
> sure you aren't running out of RAM and going to swap.
>
>> Would I actually gain anything from that, or is the bottleneck
>> somewhere else ?
>>

> More likely, the bottleneck would be disk I/O. Simply because it is
> almost always disk I/O. However, without knowing your configuration,
> how much CPU is used during the operation, etc, it's hard to say.
>
>> (perl is a given here for the batch job)
>>
>> If anyone has some experience or ideas... any hints or help on this
>> would be appreciated.
>>
>> Thanks
>> Alex
>>

> Sorry I wasn't a lot of help. You should probably post your postgres
> version, and more information about how much CPU load there is while
> your load is running.
>
> John
> =:->
>




---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 11:06 AM
Iain
 
Posts: n/a
Default Re: How can I make use of both CPUs in a dual processor

You can wait for processes to finish as follows:

#launch 3 processes
sh -c './build_indexes1.sh' & PID1=$!
sh -c './build_indexes2.sh' & PID2=$!
sh -c './build_indexes3.sh' & PID3=$!
# then
wait $PID1
wait $PID2
wait $PID3
#continue

My feeling is that doing so should generally reduce the overall processing
time, but if there are contention problems then it could conceivably get
much worse.

regards
Iain
----- Original Message -----
From: "Alex" <alex@meerkatsoft.com>
To: "John A Meinel" <john@arbash-meinel.com>
Cc: <pgsql-performance@postgresql.org>
Sent: Thursday, February 10, 2005 12:00 AM
Subject: Re: [PERFORM] How can I make use of both CPUs in a dual processor


> Thanks John.
>
> Well as I mentioned. I have a Dual AMD Opteron 64 2.4ghz, 15k rpm SCSI
> Disks, 4GB of memory.
> Disks are pretty fast and memory should be more than enough. Currently we
> dont have many concurrent connections.
>
> I run PG 8.0.1 on Fedora Core 3
>
> When I now run the batch job, one CPU runs in the 80-90% the other in
> 5-10% max.
>
>
>
>
>
>
> John A Meinel wrote:
>
>> Alex wrote:
>>
>>> Hi,
>>> we just got a new dual processor machine and I wonder if there is a way
>>> to utilize both processors.
>>>
>>> Our DB server is basically fully dedicated to postgres. (its a dual amd
>>> with 4gb mem.)
>>>
>>> I have a batch job that periodically loads about 8 million records into
>>> a table.
>>> for this I drop the indices, truncate the table, use the copy to insert
>>> the data, recreate the indices (4 indices), vacuum the table.
>>>
>>> That is all done through a perl batch job.
>>>
>>> While I am doing this, I noticed that only one CPU is really used.
>>>
>>> So here are my questions:
>>>
>>> Is there a way to utilize both CPUs
>>>

>> For postgres, you get a max of 1 CPU per connection, so to use both, you
>> need 2 CPU's.
>>
>>> Is it possible to split up the import file and run 2 copy processes
>>>
>>> Is it possible to create 2 indices at the same time
>>>

>> You'd want to be a little careful. Postgres uses work_mem for vacuum and
>> index creation, so if you have 2 processes doing it, just make sure you
>> aren't running out of RAM and going to swap.
>>
>>> Would I actually gain anything from that, or is the bottleneck somewhere
>>> else ?
>>>

>> More likely, the bottleneck would be disk I/O. Simply because it is
>> almost always disk I/O. However, without knowing your configuration, how
>> much CPU is used during the operation, etc, it's hard to say.
>>
>>> (perl is a given here for the batch job)
>>>
>>> If anyone has some experience or ideas... any hints or help on this
>>> would be appreciated.
>>>
>>> Thanks
>>> Alex
>>>

>> Sorry I wasn't a lot of help. You should probably post your postgres
>> version, and more information about how much CPU load there is while your
>> load is running.
>>
>> John
>> =:->
>>

>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend



---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 11:06 AM
John A Meinel
 
Posts: n/a
Default Re: How can I make use of both CPUs in a dual processor

Alex wrote:

> Thanks John.
>
> Well as I mentioned. I have a Dual AMD Opteron 64 2.4ghz, 15k rpm
> SCSI Disks, 4GB of memory.
> Disks are pretty fast and memory should be more than enough. Currently
> we dont have many concurrent connections.
>

Well, you didn't mention Opteron before (it makes a difference against
Xeons).
How many disks and in what configuration?
Do you have pg_xlog on a separate set of disks?
Are your drives in RAID 10 (0+1) or RAID 5?

If you have enough disks the recommended configuration is at least a
RAID1 for the OS, RAID 10 for pg_xlog (4drives), and RAID 10 (the rest
of the drives) for the actual data.

If your dataset is read heavy, or you have more than 6 disks, you can
get away with RAID 5 for the actual data. But since you are talking
about loading 8million rows at once, it certainly sounds like you are
write heavy.

If you only have a few disks, it's still probably better to put pg_xlog
on it's own RAID1 (2-drive) mirror. pg_xlog is pretty much append only,
so if you dedicate a disk set to it, you eliminate a lot of seek times.

> I run PG 8.0.1 on Fedora Core 3
>
> When I now run the batch job, one CPU runs in the 80-90% the other in
> 5-10% max.



Anyway, it doesn't completely sound like you are CPU limited, but you
might be able to get a little bit more if you spawn another process.
Have you tried dropping the index, doing the copy, and then recreating
the 4-indexes in separate processes?

The simple test for this is to open 3-4 psql connections, have one of
them drop the indexes and do the copy, in the other connections you can
already have typed "CREATE INDEX ..." so when the copy is done and
committed to the database, you just go to the other terminals and hit enter.

Unfortunately you'll have to use wall clock time to see if this is faster.

Though I think you could do the same thing with a bash script. The
authentication should be in "trust" mode so that you don't take the time
to type your password.

#!/bin/bash
psql -h <host> -c "DROP INDEX ...; COPY FROM ..."

psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..." &
psql -h <host> -c "CREATE INDEX ..."


Now, I don't really know how to wait for all child processes in a bash
script (I could give you the python for it, but you're a perl guy). But
by not spawning the last INDEX, I'm hoping it takes longer than the
rest. Try to put the most difficult index there.

Then you could just run

time loadscript.sh

I'm sure you could do the equivalent in perl. Just open multiple
connections to the DB, and have them ready.

I'm guessing since you are on a dual processor machine, you won't get
much better performance above 2 connections.

You can also try doing 2 COPYs at the same time, but it seems like you
would have issues. Do you have any serial columns that you expect to be
in a certain order, or is all the information in the copy?

If the latter, try it, let us know what you get. I can't tell you the
perl for this, since I'm not a perl guy.

John
=:->


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.6 (Cygwin)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFCCikzJdeBCYSNAAMRAhf1AKC7JW14BGhCPvgqQVeLyN 1uBSxv4gCdHKvb
1iuEAd6MTnPKuIVeZR1nCWg=
=kJpx
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-18-2008, 11:06 AM
Alex
 
Posts: n/a
Default Re: How can I make use of both CPUs in a dual processor

Thanks for all the suggestions. It seems that creating indices, or even
import data using a copy is easy to implement. I also have some jobs
that create reports and want to try if I gain anything if i work reports
in parallel.

will give it a try in the next week and let you know the resuls.

Alex

John A Meinel wrote:

> Alex wrote:
>
>> Thanks John.
>>
>> Well as I mentioned. I have a Dual AMD Opteron 64 2.4ghz, 15k rpm
>> SCSI Disks, 4GB of memory.
>> Disks are pretty fast and memory should be more than enough.
>> Currently we dont have many concurrent connections.
>>

> Well, you didn't mention Opteron before (it makes a difference against
> Xeons).
> How many disks and in what configuration?
> Do you have pg_xlog on a separate set of disks?
> Are your drives in RAID 10 (0+1) or RAID 5?
>
> If you have enough disks the recommended configuration is at least a
> RAID1 for the OS, RAID 10 for pg_xlog (4drives), and RAID 10 (the rest
> of the drives) for the actual data.
>
> If your dataset is read heavy, or you have more than 6 disks, you can
> get away with RAID 5 for the actual data. But since you are talking
> about loading 8million rows at once, it certainly sounds like you are
> write heavy.
>
> If you only have a few disks, it's still probably better to put
> pg_xlog on it's own RAID1 (2-drive) mirror. pg_xlog is pretty much
> append only, so if you dedicate a disk set to it, you eliminate a lot
> of seek times.
>
>> I run PG 8.0.1 on Fedora Core 3
>>
>> When I now run the batch job, one CPU runs in the 80-90% the other in
>> 5-10% max.

>
>
>
> Anyway, it doesn't completely sound like you are CPU limited, but you
> might be able to get a little bit more if you spawn another process.
> Have you tried dropping the index, doing the copy, and then recreating
> the 4-indexes in separate processes?
>
> The simple test for this is to open 3-4 psql connections, have one of
> them drop the indexes and do the copy, in the other connections you
> can already have typed "CREATE INDEX ..." so when the copy is done and
> committed to the database, you just go to the other terminals and hit
> enter.
>
> Unfortunately you'll have to use wall clock time to see if this is
> faster.
>
> Though I think you could do the same thing with a bash script. The
> authentication should be in "trust" mode so that you don't take the
> time to type your password.
>
> #!/bin/bash
> psql -h <host> -c "DROP INDEX ...; COPY FROM ..."
>
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..." &
> psql -h <host> -c "CREATE INDEX ..."
>
>
> Now, I don't really know how to wait for all child processes in a bash
> script (I could give you the python for it, but you're a perl guy).
> But by not spawning the last INDEX, I'm hoping it takes longer than
> the rest. Try to put the most difficult index there.
>
> Then you could just run
>
> time loadscript.sh
>
> I'm sure you could do the equivalent in perl. Just open multiple
> connections to the DB, and have them ready.
>
> I'm guessing since you are on a dual processor machine, you won't get
> much better performance above 2 connections.
>
> You can also try doing 2 COPYs at the same time, but it seems like you
> would have issues. Do you have any serial columns that you expect to
> be in a certain order, or is all the information in the copy?
>
> If the latter, try it, let us know what you get. I can't tell you the
> perl for this, since I'm not a perl guy.
>
> John
> =:->
>




---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:40 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717