Unix Technical Forum

more autovacuum fixes

This is a discussion on more autovacuum fixes within the Pgsql Patches forums, part of the PostgreSQL category; --> I attach a patch to fix some issues in the autovac process handling code. Mainly, instead of rechecking periodically ...


Go Back   Unix Technical Forum > Database Server Software > PostgreSQL > Pgsql Patches

Register FAQ Members List Calendar Search Today's Posts Mark Forums Read
  #1 (permalink)  
Old 04-18-2008, 10:15 AM
Alvaro Herrera
 
Posts: n/a
Default more autovacuum fixes

I attach a patch to fix some issues in the autovac process handling
code. Mainly, instead of rechecking periodically in the launcher
whether a worker was able to start or not, what we do is signal it from
the postmaster when it fails.

In order to do this I had to make the postmaster set a flag in shared
memory, much like the pmsignal.c code does. According to previous
discussions this is an acceptable thing for postmaster to do.

I must admit I did not try to fill shared memory with garbage and see if
it caused a postmaster crash. What I did test was creating a database,
making sure that the launcher was picking it up, and then delete the
database's directory. This correctly causes a worker to log errors when
trying to connect, and it is handled well -- i.e. other databases are
not starved. (One remaining problem I see is that a database in this
state, or similar, that fails to be vacuumed, *will* cause starvation if
in danger of Xid wraparound).

Another thing I noticed is that I can change the "autovacuum" parameter
in postgresql.conf and send SIGHUP to postmaster, which correctly
starts the launcher if it was previously disabled. To make the shutdown
case act accordingly, I made the launcher check the "start daemon" flag
after rereading the config file, and quit if it's off. I tried multiple
combinations of having it set and unset, changing
stats_collect_tuple_level and it works fine.

One problem with the patch is this (new code):

bn = (Backend *) malloc(sizeof(Backend));
! if (bn)
{
! bn->pid = StartAutoVacWorker();
! bn->is_autovacuum = true;
! /* we don't need a cancel key */

! if (bn->pid > 0)
! {
! /* FIXME -- unchecked memory allocation here */
! DLAddHead(BackendList, DLNewElem(bn));


If the palloc() inside DLNewElem fails, we will fail to report a "fork
failure" to the launcher. I am not sure how serious this is. One idea
that came to mind was using a PG_TRY block, sending the signal in the
CATCH block, and then rethrowing the exception. Is this acceptable?


All in all, this patch increases the reliability of autovacuum, so I
intend to apply it shortly unless there are objections. Further
improvements might come later as I become aware of possible fixes.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"Hay quien adquiere la mala costumbre de ser infeliz" (M. A. Evans)


---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #2 (permalink)  
Old 04-18-2008, 10:15 AM
Alvaro Herrera
 
Posts: n/a
Default Re: more autovacuum fixes

Alvaro Herrera wrote:

> One problem with the patch is this (new code):
>
> bn = (Backend *) malloc(sizeof(Backend));
> ! if (bn)
> {
> ! bn->pid = StartAutoVacWorker();
> ! bn->is_autovacuum = true;
> ! /* we don't need a cancel key */
>
> ! if (bn->pid > 0)
> ! {
> ! /* FIXME -- unchecked memory allocation here */
> ! DLAddHead(BackendList, DLNewElem(bn));
>
>
> If the palloc() inside DLNewElem fails, we will fail to report a "fork
> failure" to the launcher. I am not sure how serious this is. One idea
> that came to mind was using a PG_TRY block, sending the signal in the
> CATCH block, and then rethrowing the exception. Is this acceptable?


I noticed another problem: the worker may fail during BaseInit() or
InitProcess(). This is not where most problems will be (that would be
later, in InitPostgres(), which is when the worker connects to a DB) but
still could cause a starvation problem, I think. Maybe the PG_TRY block
is called for in there, as well as the postmaster code.

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"The ability to monopolize a planet is insignificant
next to the power of the source"

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #3 (permalink)  
Old 04-18-2008, 10:15 AM
Alvaro Herrera
 
Posts: n/a
Default Re: more autovacuum fixes

Alvaro Herrera wrote:

> One problem with the patch is this (new code):
>
> bn = (Backend *) malloc(sizeof(Backend));
> ! if (bn)
> {
> ! bn->pid = StartAutoVacWorker();
> ! bn->is_autovacuum = true;
> ! /* we don't need a cancel key */
>
> ! if (bn->pid > 0)
> ! {
> ! /* FIXME -- unchecked memory allocation here */
> ! DLAddHead(BackendList, DLNewElem(bn));
>
>
> If the palloc() inside DLNewElem fails, we will fail to report a "fork
> failure" to the launcher. I am not sure how serious this is.


Turns out that this problem is not serious at all, because if that
palloc() fails, the whole postmaster will exit with a FATAL out of
memory message.

The problems in the worker code after fork are still an issue though.

--
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
"MySQL is a toy compared to PostgreSQL." (Randal L. Schwartz)
(http://archives.postgresql.org/pgsql...7/msg00517.php)

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #4 (permalink)  
Old 04-18-2008, 10:15 AM
Alvaro Herrera
 
Posts: n/a
Default Re: more autovacuum fixes

Alvaro Herrera wrote:

> Turns out that this problem is not serious at all, because if that
> palloc() fails, the whole postmaster will exit with a FATAL out of
> memory message.
>
> The problems in the worker code after fork are still an issue though.


It _is_ still an issue -- and a very serious issue at that. If a worker
fails before getting its entry from the startingWorker pointer, then
(patched) autovacuum will no longer run any task.

Since it doesn't seem like it is possible to signal the launcher until
the worker has set up shared memory, I think we should leave the current
shenanigans in place. They are really ugly code and I'd love to get rid
of them, but currently I don't see any better mechanism to deal with
this failure scenario.

Thanks for your attention ;-)

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
Bob [Floyd] used to say that he was planning to get a Ph.D. by the "green
stamp method," namely by saving envelopes addressed to him as 'Dr. Floyd'.
After collecting 500 such letters, he mused, a university somewhere in
Arizona would probably grant him a degree. (Don Knuth)

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
  #5 (permalink)  
Old 04-18-2008, 10:15 AM
Alvaro Herrera
 
Posts: n/a
Default Re: more autovacuum fixes

Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>
> > Turns out that this problem is not serious at all, because if that
> > palloc() fails, the whole postmaster will exit with a FATAL out of
> > memory message.
> >
> > The problems in the worker code after fork are still an issue though.

>
> It _is_ still an issue -- and a very serious issue at that. If a worker
> fails before getting its entry from the startingWorker pointer, then
> (patched) autovacuum will no longer run any task.


I figured that I could keep the old check there for when the worker
failed, but still add the new signalling mechanism so that a fork()
failure (which I would think is the most likely of all) is taken care of
in a more timely manner.

I've also improved the rest of the code and comments a bit, and the new
code does seem better now. So I'll go ahead and commit it later today.

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
"Uno puede defenderse de los ataques; contra los elogios se esta indefenso"


---------------------------(end of broadcast)---------------------------
TIP 4: Have you searched our list archives?

http://archives.postgresql.org

Digg this Post!Add Post to del.icio.usBookmark Post in TechnoratiFurl this Post!
Reply With Quote
Reply


Thread Tools
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

vB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On
Forum Jump


All times are GMT. The time now is 04:24 AM.


Powered by vBulletin® Version 3.6.5
Copyright ©2000 - 2008, Jelsoft Enterprises Ltd.
SEO by vBSEO 3.2.0
www.UnixAdminTalk.com