Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Performance

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-19-2008, 08:48 AM
Paul Lathrop
 
Posts: n/a
Default Defining performance.

Hello all,

I've been struggling with some performance questions regarding our
Postgres databases. Here's the background:

We run 4 ~25-30Gb databases which cache information from eBay. These
databases have had performance issues since before I joined the company.
The databases have gone through a number of iterations. Initially, they
were deployed as one huge database - performance was apparently
unacceptable. They were split, and tried on a variety of hardware
platforms. When I joined the company last year, the databases were
deployed on 12-disk RAID5 arrays on dual-proc AMD machines with 4Gb of
RAM, running Debian Woody and Postgres 7.2. These systems seemed to
suffer a gradually decreasing performance accompanied by a gradually
growing disk space usage. The DBA had come to the conclusion that the
VACUUM command did/does not work on these systems, because even after a
VACUUM FULL, the size of the database was continually increasing. So, as
things stand with the PG7.2 machines, vacuuming is run nightly, and
whenever the database size reaches 40Gb on disk (the point at which
performance has degraded below tolerance), the DBA exports the data,
deletes the database, and then imports the data, shrinking it to the
actual size of the dataset.

This process is time-consuming, costly, and the servers that we are
deployed on do not meet our stability requirements. So, after much
pushing, I was able to deploy a 12-disk RAID5 dual-proc AMD64 machine
with 16Gb of RAM running FreeBSD and Postgres 8.1.

The performance increase was immediate, obvious, and dramatic, as you
might expect from such a large boost in the underlying hardware.

This upgrade appears to have solved the VACUUM issue - regular VACUUM
commands now seem able to maintain the database size at a steady-state
(taking into account fluctuations associated with actual changes in the
dataset size!). We are now planning on moving the other three databases
to the new platform and hardware.

However, we still are suffering a gradual decrease in performance over
time - or so the application engineers claim. The DBA and I have been
banging our heads against this for a month.

Which brings me to the questions:

1) How does one define 'performance' anyway? Is it average time to
complete a query? If so, what kind of query? Is it some other metric?

2) I've combed the archives and seen evidence that people out there are
running much much larger databases on comparable hardware with decent
performance. Is this true, or is my dataset at about the limit of my
hardware?

3) Though this may seem irrelevant, since we are moving away from the
platform, it would still be good to know - was VACUUM actually
completely useless on PG7.2 or is there some other culprit on these
legacy machines?

4) Much of my reading of the PG docs and list archives seems to suggest
that much of performance tuning is done at the query level - you have to
know how to ask for information in an efficient way. To that end, I took
a look at some of the queries we get on a typical day. On average, 24
times per minute, our application causes a unique key violation. This
struck me as strange, but the VP of Engineering says this is a
performance ENHANCEMENT - the code doesn't bother checking for the
unique key because it depends on the database to enforce that. My
interpretation of reading the archives & docs seems to indicate that
this could be causing the constantly increasing database size... so now
that I've rambled about it, does an INSERT transaction that is rolled
back due to a unique key violation leave dead rows in the table? If so, why?

I really appreciate any information you guys can give me. I'm convinced
that PG is the best database for our needs, but I need to be able to get
this database performing well enough to convince the bigwigs.

Regards,
Paul Lathrop
Systems Administrator


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.3 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFFb2JTX6ecHn3cW4kRA6gJAKCm53oD/W4zzsSHugVeoRtGqLpzrACdFywG
qBp2AYzWJmJIBKe3UGegFeI=
=P83E
-----END PGP SIGNATURE-----

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-19-2008, 08:48 AM
Tobias Brox
 
Posts: n/a
Default Re: Defining performance.

[Paul Lathrop - Thu at 02:59:27PM -0800]
> growing disk space usage. The DBA had come to the conclusion that the
> VACUUM command did/does not work on these systems, because even after a
> VACUUM FULL, the size of the database was continually increasing. So, as
> things stand with the PG7.2 machines, vacuuming is run nightly, and
> whenever the database size reaches 40Gb on disk (the point at which
> performance has degraded below tolerance), the DBA exports the data,
> deletes the database, and then imports the data, shrinking it to the
> actual size of the dataset.


We found one reason why vacuuming didn't always work for us - we had
long running transactions - in addition to killing the vacuum, it did
really nasty things to the performance in general.

To check for those transactions, I think it's needed to turn on
stats_command_string in the config.

I use this query to check:

select * from pg_stat_activity where current_query<>'<IDLE>' order by
query_start ;

If you spot any "<IDLE> in transaction" with an old query_start
timestamp, then that's most probably the reason.

Long running transactions doesn't have to be idle ... check the pg_locks
view for the lowest transactionid and compare (through the pid) with the
pg_stat_activity view to find the actual backend.

> However, we still are suffering a gradual decrease in performance over
> time - or so the application engineers claim. The DBA and I have been
> banging our heads against this for a month.


We're having the same issues, so we do the dumping and restoring every
now and then to be sure everything is properly cleaned up. With 8.1.

> 1) How does one define 'performance' anyway? Is it average time to
> complete a query? If so, what kind of query? Is it some other metric?


We have the same kind of problem, and the project leader (I sometimes
refer him as the "bottleneck" ;-) is most concerned about iowait at our
cpu graphs. Anyway, we do have other measures:

- our applications does log the duration of each request towards the
application as well as each query towards the database. If the
request (this is web servers) is taking "too long" time, it's logged
as error instead of debug. If a significant number of such errors
is due to database calls taking too much time, then the performance
is bad. Unfortunately, we have no way to automate such checking.

- I've setting up two scripts pinging that pg_stat_activity view every
now and then, logging how much "gruff" it finds there. Those two
scripts are eventually to be merged. One is simply logging what it
finds, the other is a plugin system to the Munin graphing package.

I've thrown the scripts we use out here:

http://oppetid.no/~tobixen/pg_activity_log.txt
http://oppetid.no/~tobixen/pg_activity.munin.txt

(I had to rename them to .txt to get the web server to play along).

Those are very as-is, should certainly be modified a bit to fit to any
other production environment. :-)

The pg_activity_log dumps a single number indicating the "stress level"
of the database to a file. I think this stress number, when taking out
i.e. the 20% worst numbers from the file for each day, can indicate
something about the performance of the database server. However, I
haven't had the chance to discuss it with the bottleneck yet.


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-19-2008, 08:48 AM
Tom Lane
 
Posts: n/a
Default Re: Defining performance.

Paul Lathrop <plathrop@squaretrade.com> writes:
> ... When I joined the company last year, the databases were
> deployed on 12-disk RAID5 arrays on dual-proc AMD machines with 4Gb of
> RAM, running Debian Woody and Postgres 7.2. These systems seemed to
> suffer a gradually decreasing performance accompanied by a gradually
> growing disk space usage. The DBA had come to the conclusion that the
> VACUUM command did/does not work on these systems, because even after a
> VACUUM FULL, the size of the database was continually increasing.


The very first thing you need to do is get off 7.2.

After that, I'd recommend looking at *not* using VACUUM FULL. FULL is
actually counterproductive in a lot of scenarios, because it shrinks the
tables at the price of bloating the indexes. And 7.2's poor ability to
reuse index space turns that into a double whammy. Have you checked
into the relative sizes of tables and indexes and tracked the trend over
time?

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-19-2008, 08:48 AM
Scott Marlowe
 
Posts: n/a
Default Re: Defining performance.

On Thu, 2006-11-30 at 18:26, Tom Lane wrote:
> Paul Lathrop <plathrop@squaretrade.com> writes:
> > ... When I joined the company last year, the databases were
> > deployed on 12-disk RAID5 arrays on dual-proc AMD machines with 4Gb of
> > RAM, running Debian Woody and Postgres 7.2. These systems seemed to
> > suffer a gradually decreasing performance accompanied by a gradually
> > growing disk space usage. The DBA had come to the conclusion that the
> > VACUUM command did/does not work on these systems, because even after a
> > VACUUM FULL, the size of the database was continually increasing.

>
> The very first thing you need to do is get off 7.2.
>
> After that, I'd recommend looking at *not* using VACUUM FULL. FULL is
> actually counterproductive in a lot of scenarios, because it shrinks the
> tables at the price of bloating the indexes. And 7.2's poor ability to
> reuse index space turns that into a double whammy. Have you checked
> into the relative sizes of tables and indexes and tracked the trend over
> time?


And if you cant get off 7.2, look into scheduling some downtime to run
reindex on the bloated indexes.

In all honesty, a simple single processor workstation with a gig of ram
and a couple of good sized SATA drives and a modern linux distro can
probably outrun your 7.2 server if it's running on 8.1 / 8.2

It's that much faster now.

For the love of all that's holy, as well as your data, start planning
your migration now, and if you can, have it done by the end of next week
or so.

And backup every night religiously.

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-19-2008, 08:48 AM
Tobias Brox
 
Posts: n/a
Default Re: Defining performance.

[Jeff Davis - Thu at 04:57:54PM -0800]
> > We're having the same issues, so we do the dumping and restoring every
> > now and then to be sure everything is properly cleaned up. With 8.1.
> >

>
> What's causing that? Is it index bloat?
>
> I would think a REINDEX would avoid having to dump/restore, right? A
> CLUSTER might also be necessary, depending on what kind of performance
> degradation you're experiencing.
>
> Am I missing something?


Just as with Paul Lathrops case, the performance degradation is
something perceived by the application developers. We haven't had time
to actually verify reliably that the performance is actually beeing
degraded, neither that the reload beeing done helps (after we resolved
the pending transaction issue, anyway), nor look into what the possible
reasons of this percieved degradation could be.


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-19-2008, 08:48 AM
Jeff Davis
 
Posts: n/a
Default Re: Defining performance.

On Fri, 2006-12-01 at 01:05 +0100, Tobias Brox wrote:
> > However, we still are suffering a gradual decrease in performance over
> > time - or so the application engineers claim. The DBA and I have been
> > banging our heads against this for a month.

>
> We're having the same issues, so we do the dumping and restoring every
> now and then to be sure everything is properly cleaned up. With 8.1.
>


What's causing that? Is it index bloat?

I would think a REINDEX would avoid having to dump/restore, right? A
CLUSTER might also be necessary, depending on what kind of performance
degradation you're experiencing.

Am I missing something?

Regards,
Jeff Davis


---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-19-2008, 08:48 AM
Heikki Linnakangas
 
Posts: n/a
Default Re: Defining performance.

Paul Lathrop wrote:
> We run 4 ~25-30Gb databases which cache information from eBay. These
> databases have had performance issues since before I joined the company.
> The databases have gone through a number of iterations. Initially, they
> were deployed as one huge database - performance was apparently
> unacceptable. They were split, and tried on a variety of hardware
> platforms. When I joined the company last year, the databases were
> deployed on 12-disk RAID5 arrays on dual-proc AMD machines with 4Gb of
> RAM, running Debian Woody and Postgres 7.2.


Well, first of all you need to upgrade. 7.2 is old and not supported
anymore.

> These systems seemed to
> suffer a gradually decreasing performance accompanied by a gradually
> growing disk space usage. The DBA had come to the conclusion that the
> VACUUM command did/does not work on these systems, because even after a
> VACUUM FULL, the size of the database was continually increasing. So, as
> things stand with the PG7.2 machines, vacuuming is run nightly, and
> whenever the database size reaches 40Gb on disk (the point at which
> performance has degraded below tolerance), the DBA exports the data,
> deletes the database, and then imports the data, shrinking it to the
> actual size of the dataset.


Vacuum didn't reclaim empty index pages until 7.4, so you might be
suffering from index bloat. A nightly reindex would help with that.

> This process is time-consuming, costly, and the servers that we are
> deployed on do not meet our stability requirements. So, after much
> pushing, I was able to deploy a 12-disk RAID5 dual-proc AMD64 machine
> with 16Gb of RAM running FreeBSD and Postgres 8.1.


You should give 8.2 (now in beta stage) a try as well. There's some
significant performance enhancements, for example vacuums should run faster.

> 1) How does one define 'performance' anyway? Is it average time to
> complete a query? If so, what kind of query? Is it some other metric?


Performance is really an umbrella term that can mean a lot of things.
You'll have to come up with a metric that's most meaningful to you and
that you can easily measure. Some typical metrics are:

* Average response time to a query/queries
* Max or 90% percentile response time to a query
* throughput, transactions per second

You'll have to start measuring performance somehow. You might find out
that actually your performance is bad only during some hour of day for
example. Take a look at the log_min_duration_statement parameter in more
recent versions for starter.

> 2) I've combed the archives and seen evidence that people out there are
> running much much larger databases on comparable hardware with decent
> performance. Is this true, or is my dataset at about the limit of my
> hardware?


It depends on your load, really. A dataset of ~ 40 GB is certainly not
that big compared to what some people have.

> 3) Though this may seem irrelevant, since we are moving away from the
> platform, it would still be good to know - was VACUUM actually
> completely useless on PG7.2 or is there some other culprit on these
> legacy machines?


It's certainly better nowadays..

> 4) Much of my reading of the PG docs and list archives seems to suggest
> that much of performance tuning is done at the query level - you have to
> know how to ask for information in an efficient way. To that end, I took
> a look at some of the queries we get on a typical day. On average, 24
> times per minute, our application causes a unique key violation. This
> struck me as strange, but the VP of Engineering says this is a
> performance ENHANCEMENT - the code doesn't bother checking for the
> unique key because it depends on the database to enforce that. My
> interpretation of reading the archives & docs seems to indicate that
> this could be causing the constantly increasing database size... so now
> that I've rambled about it, does an INSERT transaction that is rolled
> back due to a unique key violation leave dead rows in the table? If so, why?


The way unique checking works in PostgreSQL is:

1. The row is inserted into heap.
2. The corresponding index tuple is inserted to index. While doing that,
we check that there's no duplicate key there already.

So yes, a unique key violation will leave the dead tuple in the heap,
and it will be removed by vacuum later on.

I think it's a valid and sane approach to leave the uniqueness check to
the database. Unless a very large proportion of your transactions abort
due to unique key violations, the dead rows won't be a problem. The
space will be reclaimed by vacuum.

In general, it's normal that there's some dead rows in the database. As
long as you vacuum regularly, the database size should eventually reach
a steady-state where it doesn't grow anymore, unless the real live
dataset size increases.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 08:20 PM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710