Unix Technical Forum

SEO

vBulletin Search Engine Optimization


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-17-2008, 11:38 PM
Alvaro Herrera
 
Posts: n/a
Default Re: Autovacuum integration


Hackers,

Here is a second attempt at autovacuum integration. Please have a look
at it. Note that this patch automatically creates three new files:

src/backend/postmaster/autovacuum.c
src/include/catalog/pg_autovacuum.h
src/include/postmaster/autovacuum.h


There are the following changes since the previous version:

- Xid wraparound is taken care of, with a database-wide VACUUM when the
limit is too close.

- Manually executing VACUUM or ANALYZE now updates the statistics for
autovacuum.

- EXEC_BACKEND case was relieved of a few bugs -- it now actually works.

- elog(ERROR) is now handled more gracefully; the error is emitted,
instead of a SEGV happening. The end result is the same, however:
a single error takes the process down, and the database will be
vacuumed again only after all other databases are processed.

- Adapted to the new cwd convention.

- My email address is changed, so that I won't be posting the patch
several times hopefully :-)


Note that I didn't make the autovacuum daemon more verbose. Running
postmaster -d2 (or server_log_messages or whatever the option is called,
to DEBUG2 or higher) shows some messages which are, I hope, what Matthew
was expecting. (It this is not the case please let me know.)

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).


---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-17-2008, 11:39 PM
Tom Lane
 
Posts: n/a
Default Re: Autovacuum integration

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Here is a second attempt at autovacuum integration.


A few comments:

* I strongly disagree with keeping updatable state in a catalog. In the
first place, that will cause autovac itself to create vacuumable trash.
In the second place, that means you can't modify the internal state kept
by autovacuum without forcing initdb; which is not a good situation,
considering how much change this code is likely to go through.
I think all the updatable state should be kept in the pgstats file.
(Memo to self: let's add a version number header to the pgstats file,
so that we can change its format without requiring an initdb to do so.)
pg_autovacuum should only contain user-settable parameters --- which is
still putting us at nontrivial risk for initdb constraints, but it's way
better than internal state. If you do that, then pg_autovacuum need
only contain entries for tables that have been given non-default
parameter settings (ie, those for which the user wants to override
global settings given in postgresql.conf).

* I'm pretty dubious about adding a syscache for pg_autovacuum.
Where exactly are the repeated fetches going to come from to justify
caching it?

* The signal processing needs re-work; you can't just randomly assign
signal numbers, you need to keep these things largely consistent with
the other subprocesses. In particular, if this thing is going to be
running transactions then it MUST honor SIGALRM (eg, for deadlock
timeout checks) and SIGUSR1 (sinval catchup interrupt), and I don't
think you get to ignore SIGTERM either (may get this from init).
I think it's a pretty bad idea to use SIGUSR2 for something other than
what regular backends use it for, too. (Consider the prospect that for
some reason your PID occurs in pg_listener.) It'd be a good idea to
honor SIGINT with the normal meaning (query cancel, ie, abort
transaction, which in this context would also imply process shutdown)
and use that to shut down the daemon at postmaster stop. In short,
the signal handling had better look a whole lot more like a normal
backend's.

* Speaking of shutdown, you can't stop the bgwriter until you know
that the autovac daemon is dead. In this respect too, it's much more
like a backend than like any of the other support processes. Signal it
when you signal the backends, and don't advance to the bgwriter kill
phase until autovac and all the backends are known gone.

* I see you have an autovac_init function to "annoy the user", but
shouldn't this be checked every time we are about to spawn an autovac
process?

* I don't see any special checks for shared catalogs, which means they
are probably going to be over-vacuumed; or possibly under-vacuumed if
you fail to track the update stats for them in a single place rather
than in each database. It'd probably be a good idea to nominate just
one database to be the place from which shared catalogs are vacuumed;
or treat them as a completely separate special case.

* I have no objection to adding extra entry points to vacuum.c to
simplify the calls to it.

* With respect to comments like:

+ /*
+ * We just skip a table not found on the stat hash. This is
+ * because whatever we do here, there won't be good statistics
+ * next time around, so we would do this same action again. It is
+ * tempting to issue an ANALYZE, but it won't work because ANALYZE
+ * doesn't update the stat system.
+ */

ISTM the entire point of "autovac integration" is to make sure the rest
of the system does what's needed to support autovac. If ANALYZE needs
to send something to the stats system, make it do so. (ISTM it'd be
reasonable for ANALYZE to make and report an estimate of the dead tuple
count, which after all is what you are hoping all the rest of the stats
machinery will derive for you...)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-17-2008, 11:39 PM
Alvaro Herrera
 
Posts: n/a
Default Re: Autovacuum integration

On Fri, Jul 08, 2005 at 03:56:25PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Here is a second attempt at autovacuum integration.

>
> A few comments:


Excellent, thanks. I'll be working on fixing all these.

> * I strongly disagree with keeping updatable state in a catalog. In the
> first place, that will cause autovac itself to create vacuumable trash.


Yes, I thought about that too. I think the argument that convinced me
to do otherwise was the possibility of messages being lost in the path
to stat collector. And I thought that it would be possible to change
tuples in-place, like vacuum does for pg_class. I don't have a problem
with changing it though.

The initdb-forcing point is a good one, didn't think of it.

> pg_autovacuum should only contain user-settable parameters --- which is
> still putting us at nontrivial risk for initdb constraints, but it's way
> better than internal state. If you do that, then pg_autovacuum need
> only contain entries for tables that have been given non-default
> parameter settings.


Ok. Do you think it's worth the trouble to have ALTER TABLE commands to
change autovac parameters?

> * The signal processing needs re-work; you can't just randomly assign
> signal numbers, you need to keep these things largely consistent with
> the other subprocesses.


I think I copied this from pgarch.c, which I can see was a really bad
idea. Will fix.

> * I see you have an autovac_init function to "annoy the user", but
> shouldn't this be checked every time we are about to spawn an autovac
> process?


You argued exactly the other way around to Matthew, before 8.0 freeze.
Personally I don't care either way.


> * I don't see any special checks for shared catalogs, which means they
> are probably going to be over-vacuumed; or possibly under-vacuumed if
> you fail to track the update stats for them in a single place rather
> than in each database. It'd probably be a good idea to nominate just
> one database to be the place from which shared catalogs are vacuumed;
> or treat them as a completely separate special case.


Yeah. The problem with this is that any particular database could be
absent from the stat hash. Personally I think we should make the stat
system aware of all databases, whether there has been activity in them
or not. That'd make this problem moot, and we could skip those tables
in any databases except template1.


Anything I omitted I agree with, so I'll fix things up accordingly.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
"The Gord often wonders why people threaten never to come back after they've
been told never to return" (www.actsofgord.com)

---------------------------(end of broadcast)---------------------------
TIP 6: explain analyze is your friend

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-17-2008, 11:39 PM
Tom Lane
 
Posts: n/a
Default Re: Autovacuum integration

Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> On Fri, Jul 08, 2005 at 03:56:25PM -0400, Tom Lane wrote:
>> pg_autovacuum should only contain user-settable parameters ---


> Ok. Do you think it's worth the trouble to have ALTER TABLE commands to
> change autovac parameters?


I think we'll want them eventually, but I don't mind if 8.1 ships
without 'em.

>> * I see you have an autovac_init function to "annoy the user", but
>> shouldn't this be checked every time we are about to spawn an autovac
>> process?


> You argued exactly the other way around to Matthew, before 8.0 freeze.


Did I? I've forgotten the reasoning if so. But it's a minor point.

> Yeah. The problem with this is that any particular database could be
> absent from the stat hash. Personally I think we should make the stat
> system aware of all databases, whether there has been activity in them
> or not. That'd make this problem moot, and we could skip those tables
> in any databases except template1.


Hm. There's nothing wrong with having CREATE DATABASE send a message to
pgstats, but that could get lost anyway. A probably more significant
point is that either postgres or template1 is subject to being dropped;
so hardwiring a single database to do it from is likely not gonna work.

It'd be okay to *consider* vacuuming the shared relations on every
cycle, so long as the stats were managed correctly (ie, only one set of
stats kept). This is probably easier to do in the context of the
stats-in-pgstats idea than before. We could have pgstats deliberately
create an entry for "database zero", and put the table stats for shared
relations there. The only real issue is that we have no place to keep
any nondefault parameters for these catalogs (no, I don't wish to invent
pg_shautovacuum to fix that ;-)) so they'd always be autovacuumed with
the global default parameters. This seems acceptable to me.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-17-2008, 11:39 PM
Matthew T. O'Connor
 
Posts: n/a
Default Re: Autovacuum integration

Tom Lane wrote:

>Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>
>
>>Ok. Do you think it's worth the trouble to have ALTER TABLE commands to
>>change autovac parameters?
>>
>>

>
>I think we'll want them eventually, but I don't mind if 8.1 ships
>without 'em.
>
>


It might be good to ship 8.1 without them since autovacuum might change
a fair amount between 8.1 and 8.2 (FSM integration, better formula for
vacuum decisions etc...)


>>Yeah. The problem with this is that any particular database could be
>>absent from the stat hash. Personally I think we should make the stat
>>system aware of all databases, whether there has been activity in them
>>or not. That'd make this problem moot, and we could skip those tables
>>in any databases except template1.
>>
>>

>
>Hm. There's nothing wrong with having CREATE DATABASE send a message to
>pgstats, but that could get lost anyway. A probably more significant
>point is that either postgres or template1 is subject to being dropped;
>so hardwiring a single database to do it from is likely not gonna work.
>
>It'd be okay to *consider* vacuuming the shared relations on every
>cycle, so long as the stats were managed correctly (ie, only one set of
>stats kept). This is probably easier to do in the context of the
>stats-in-pgstats idea than before. We could have pgstats deliberately
>create an entry for "database zero", and put the table stats for shared
>relations there. The only real issue is that we have no place to keep
>any nondefault parameters for these catalogs (no, I don't wish to invent
>pg_shautovacuum to fix that ;-)) so they'd always be autovacuumed with
>the global default parameters. This seems acceptable to me.
>


If memory serves from development discussion of the contrib version,
isn't there an issue that while a shared relation need only be vacuumed
in one database in order to reclaim space etc, the analyze statistics
need to be updated in each database. Does that ring a bell or am I way
off base here?



---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #6 (permalink)  
Old 04-17-2008, 11:39 PM
Tom Lane
 
Posts: n/a
Default Re: Autovacuum integration

"Matthew T. O'Connor" <matthew@zeut.net> writes:
> Tom Lane wrote:
>> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>>> Ok. Do you think it's worth the trouble to have ALTER TABLE commands to
>>> change autovac parameters?

>>
>> I think we'll want them eventually, but I don't mind if 8.1 ships
>> without 'em.
>>

> It might be good to ship 8.1 without them since autovacuum might change
> a fair amount between 8.1 and 8.2 (FSM integration, better formula for
> vacuum decisions etc...)


Yeah, I was thinking the same thing: the more support there is, the
harder it will be to change the parameter set. Leaving it as a catalog
only will send a suitable message that this is still a prototype ...

> If memory serves from development discussion of the contrib version,
> isn't there an issue that while a shared relation need only be vacuumed
> in one database in order to reclaim space etc, the analyze statistics
> need to be updated in each database. Does that ring a bell or am I way
> off base here?


That's a fair point, but I'm not sure how much we care about maintaining
stats for the shared relations. These days the planner should get it
reasonably right about the relation size anyway, and the availability of
pg_statistic data is only a secondary concern; especially since we have
unique indexes on most of the columns people might search on, and the
presence of such an index also clues the planner.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #7 (permalink)  
Old 04-17-2008, 11:40 PM
Alvaro Herrera
 
Posts: n/a
Default Re: Autovacuum integration

On Fri, Jul 08, 2005 at 03:56:25PM -0400, Tom Lane wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> > Here is a second attempt at autovacuum integration.

>
> A few comments:


Ok, here is an updated patch. Hopefully I have covered most your
more important observations. Particularly, I changed the shutdown
sequence per your comments, and the pg_autovacuum tuple is optional.

Additional comments:

> * I see you have an autovac_init function to "annoy the user", but
> shouldn't this be checked every time we are about to spawn an autovac
> process?


I didn't do anything about this (i.e. it only happens once). Note that
if we annoy the user because of this, the autovacuum process is
disabled "forever."

> * I don't see any special checks for shared catalogs, which means they
> are probably going to be over-vacuumed; or possibly under-vacuumed if
> you fail to track the update stats for them in a single place rather
> than in each database.


I'm still not doing anything special about shared relations. I think it
would be easy to treat them in a special way.

> * I have no objection to adding extra entry points to vacuum.c to
> simplify the calls to it.


I didn't do it, because it uglified the code. Rather, I added a relid
member to VacuumStmt.

> If ANALYZE needs to send something to the stats system, make it do
> so.


It does, as does VACUUM. I still think we should do something special
on TRUNCATE, maybe send a special message.


Note that I keep track of dead tuples directly in the stats for each
table, rather than keeping track of "last vacuum tuples", which was a
strange concept anyway. It came out being much simpler this way. The
only consideration is that it makes the vacuum case different from
analyze, but I don't see that as a problem.

Also, there are tables for which analyze refuses to run. I'm not sure
what to do about them. The problem is that since ANALYZE doesn't run to
completion, it doesn't emit the stat message, so we try to analyze it
the next time around. I considered sending special messages to the
stats, telling not to analyze the table in the future (vacuum still
works as expected). However I don't see how would we re-enable the
auto-analyze feature in case something happens to the table. There are
two cases: pg_statistics is one, and the other is tables that don't have
any analyzable columns (There is at least one table of this kind in the
regression test, comprising one "box" column.) This may turn out not to
be a problem, since ANALYZE will return very quickly in this case, but
it annoys me anyway.

Finally: I didn't do anything special about TOAST tables yet. I think
this is a separate problem.

--
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
Thou shalt study thy libraries and strive not to reinvent them without
cause, that thy code may be short and readable and thy days pleasant
and productive. (7th Commandment for C Programmers)


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 12:53 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
UnixAdminTalk.com

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691