Unix Technical Forum

file-locking and postmaster.pid

This is a discussion on file-locking and postmaster.pid within the pgsql Hackers forums, part of the PostgreSQL category.


#1
04-12-2008, 02:33 AM
Andreas Joseph Krogh
file-locking and postmaster.pid

Hi all.

I've experienced several times that PG has died somehow and the postmaster.pid
file still exists 'cause PG hasn't had the chance to delete it through a proper
shutdown. Upon start-up, after such an incident, PG tells me another PG is
running and that I either have to shut down the other instance, or delete the
postmaster.pid file if there really isn't an instance running. This seems
totally unnecessary to me. Why doesn't PG use file-locking to tell if another
PG is running or not? If PG holds an exclusive lock on the pid-file and the
process crashes, or shuts down, then the lock (which is process-based and
controlled by the kernel) will be removed and another PG which tries to start
up can detect that. Using the existence of the pid-file as the only evidence
gives too many false positives IMO.

I'm sure there's a good reason for having it the way it is, having so many
smart knowledgeable people working on this project. Could someone please
explain the rationale of the current solution to me?
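
To make the locking idea above concrete, here is a minimal shell sketch. It assumes the flock(1) utility from util-linux is installed; the LOCKFILE path and the acquire_lock/release_lock helper names are invented for the example, and none of this is how PostgreSQL itself behaves:

```shell
#!/bin/bash
# Sketch only: LOCKFILE is a hypothetical path, not a file PostgreSQL uses.
LOCKFILE="${LOCKFILE:-${TMPDIR:-/tmp}/pidlock-demo.$$}"

# Try to take an exclusive advisory lock without blocking.
# Returns 0 if this shell now holds the lock, non-zero otherwise.
acquire_lock() {
    exec 9>>"$LOCKFILE" || return 1   # fd 9 stays open in this shell
    flock -n 9                        # -n: fail at once instead of waiting
}

# Drop the lock by closing the descriptor. The kernel does the same on its
# own when the holding process exits or crashes.
release_lock() {
    exec 9>&-
}

if acquire_lock; then
    echo "lock acquired: no other instance holds it"
else
    echo "lock is held: another instance is running"
fi
```

The lock lives only as long as the descriptor is open, so a crash or hard reset cannot leave a stale lock behind the way it can leave a stale file behind.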

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc
------------------------+---------------------------------------------+
OfficeNet AS | The most difficult thing in the world is to |
Hoffsveien 17 | know how to do a thing and to watch |
PO. Box 425 Skøyen | somebody else doing it wrong, without |
0213 Oslo | comment. |
NORWAY | |
Phone : +47 22 13 01 00 | |
Direct: +47 22 13 10 03 | |
Mobile: +47 909 56 963 | |
------------------------+---------------------------------------------+

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

#2
04-12-2008, 02:33 AM
Tom Lane
Re: file-locking and postmaster.pid

Andreas Joseph Krogh <andreak@officenet.no> writes:
> I've experienced several times that PG has died somehow and the postmaster.pid
> file still exists 'cause PG hasn't had the ability to delete it upon proper
> shutdown. Upon start-up, after such an incident, PG tells me another PG is
> running and that I either have to shut down the other instance, or delete the
> postmaster.pid file if there really isn't an instance running. This seems
> totally unnecessary to me.


The postmaster does check to see whether the PID mentioned in the file
is still alive, so it's not that easy for the above to happen. If you
can provide details of a scenario where a failure is likely, we'd like
to know about it. Also, what PG version are you talking about?
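
A rough shell approximation of that liveness check is sketched below. It assumes the first line of the pidfile holds the PID, and the pid_is_alive helper name is invented for this example; the real check is done inside the postmaster and involves more than this:

```shell
#!/bin/bash
# Sketch only: succeed if the PID on line 1 of a pidfile names a live process
# that we are allowed to signal. pid_is_alive is a hypothetical helper.
pid_is_alive() {
    local pidfile=$1 pid
    [ -r "$pidfile" ] || return 1          # no readable pidfile
    pid=$(head -n 1 "$pidfile")
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # empty or not a plain number
    esac
    # Signal 0 delivers nothing; it only checks that the PID can be signalled.
    kill -0 "$pid" 2>/dev/null
}
```

It could be called as `pid_is_alive "$PGDATA/postmaster.pid" && echo "PID still in use"`. A live PID does not prove the process is a postmaster, since the number may have been reused by an unrelated process after a reboot. For an unprivileged user, `kill -0` also fails for processes owned by someone else.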

> Why doesn't PG use file-locking to tell if another
> PG is running or not?


Portability.

regards, tom lane

#3
04-12-2008, 02:33 AM
Andreas Joseph Krogh
Re: file-locking and postmaster.pid

On Tuesday 23 May 2006 17:54, Tom Lane wrote:
> Andreas Joseph Krogh <andreak@officenet.no> writes:
> > I've experienced several times that PG has died somehow and the
> > postmaster.pid file still exists 'cause PG hasn't had the ability to
> > delete it upon proper shutdown. Upon start-up, after such an incident,
> > PG tells me another PG is running and that I either have to shut down the
> > other instance, or delete the postmaster.pid file if there really isn't
> > an instance running. This seems totally unnecessary to me.
>
> The postmaster does check to see whether the PID mentioned in the file
> is still alive, so it's not that easy for the above to happen. If you
> can provide details of a scenario where a failure is likely, we'd like
> to know about it. Also, what PG version are you talking about?


I have experienced this with PG-8.1.3 and will provide details if I can make
it happen. Basically it has happened when I have had to "hard-reset" my
laptop due to some strange bugs in Linux which have made it hang.

> > Why doesn't PG use file-locking to tell if another
> > PG is running or not?
>
> Portability.


Ok.

#4
04-12-2008, 02:33 AM
Tom Lane
Re: file-locking and postmaster.pid

Andreas Joseph Krogh <andreak@officenet.no> writes:
> On Tuesday 23 May 2006 17:54, Tom Lane wrote:
>> The postmaster does check to see whether the PID mentioned in the file
>> is still alive, so it's not that easy for the above to happen. If you
>> can provide details of a scenario where a failure is likely, we'd like
>> to know about it. Also, what PG version are you talking about?


> I have experienced this with PG-8.1.3 and will provide details if I can make
> it happen. Basically it has happened when I have had to "hard-reset" my
> laptop due to some strange bugs in Linux which have made it hang.


If you're talking about a postmaster that's auto-started during the boot
sequence, then there is a risk depending on what start script you use.
The problem is that depending on what else runs during the system
startup, the PID assigned to the postmaster might be the same as in the
last boot cycle, or it might be different by one or two counts. The
postmaster disregards a pidfile containing its own PID, or its parent
process' PID, or a PID not belonging to a postgres-owned process.
That covers most cases but if your start script does something like

su -l postgres -c "pg_ctl start ..."

then you have a situation where not only the parent process (pg_ctl)
but also the grandparent (a shell) is postgres-owned, and if the pidfile
PID happens to match the grandparent then you lose. Solution is to
either not use pg_ctl here, or write "exec pg_ctl start ...", so that
there's only one postgres-owned process besides the postmaster itself.
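
The effect of exec can be checked without PostgreSQL at all. The sketch below uses plain sh as a stand-in for both the shell started by su and for pg_ctl (an assumption made purely for illustration) and counts how many distinct PIDs are involved with and without exec:

```shell
#!/bin/bash
# Each pipeline prints the outer shell's PID and the inner command's PID,
# then counts the distinct values. The inner sh stands in for pg_ctl.

# Without exec the outer shell stays alive and forks a child: two PIDs.
# (The trailing ':' matters because some shells exec the last command of a
# -c string on their own, which would hide the difference.)
without_exec=$(sh -c 'echo $$; sh -c "echo \$\$"; :' | sort -u | wc -l | tr -d ' ')

# With exec the outer shell is replaced by the command: one PID.
with_exec=$(sh -c 'echo $$; exec sh -c "echo \$\$"' | sort -u | wc -l | tr -d ' ')

echo "without exec: $without_exec process(es); with exec: $with_exec process(es)"
```

In the start script this corresponds to writing su -l postgres -c "exec pg_ctl start ...": the shell started by su is replaced by pg_ctl, so it no longer exists as a separate postgres-owned process whose PID could collide with the one in a stale pidfile.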

Initscripts published by PGDG itself and by Red Hat have gotten this
right for a while, but I suspect the word has not propagated to all
distros.

regards, tom lane

#5
04-12-2008, 02:33 AM
Adis Nezirovic
Re: file-locking and postmaster.pid

On Tue, May 23, 2006 at 05:23:16PM +0200, Andreas Joseph Krogh wrote:
> Hi all.
>
> I've experienced several times that PG has died somehow and the postmaster.pid
> file still exists 'cause PG hasn't had the ability to delete it upon proper
> shutdown. Upon start-up, after such an incident, PG tells me another PG is
> running and that I either have to shut down the other instance, or delete the
> postmaster.pid file if there really isn't an instance running. This seems
> totally unnecessary to me. Why doesn't PG use file-locking to tell if another
> PG is running or not? If PG holds an exclusive lock on the pid-file and the
> process crashes, or shuts down, then the lock (which is process-based and
> controlled by the kernel) will be removed and another PG which tries to start
> up can detect that. Using the existence of the pid-file as the only evidence
> gives too many false positives IMO.


Well, maybe you could tweak the postgres startup script, add a check for the
postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
delete the pid file on negative results.

i.e.

#!/bin/bash
# Look for a postmaster started from this exact binary path
PID=$(pgrep -f /usr/bin/postmaster)

if [[ -n $PID ]]; then
    echo "'$PID'"
    # postgres is already running
else
    echo "Postmaster is not running"
    # delete stale PID file
fi

#6
04-12-2008, 02:33 AM
Tom Lane
Re: file-locking and postmaster.pid

Adis Nezirovic <adis@linux.org.ba> writes:
> Well, maybe you could tweak the postgres startup script, add a check for the
> postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
> delete the pid file on negative results.


This is exactly what you should NOT do.

A start script that thinks it is smarter than the postmaster is almost
certainly wrong. It is certainly dangerous, too, because auto-deleting
that pidfile destroys the interlock against having two postmasters
running in the same data directory (which WILL corrupt your data,
quickly and irretrievably). All it takes to cause a problem is to
use the start script to start a postmaster, forgetting that you already
have one running ...

regards, tom lane

#7
04-12-2008, 02:33 AM
Adis Nezirovic
Re: file-locking and postmaster.pid

On Tue, May 23, 2006 at 01:36:41PM -0400, Tom Lane wrote:
> This is exactly what you should NOT do.
>
> A start script that thinks it is smarter than the postmaster is almost
> certainly wrong. It is certainly dangerous, too, because auto-deleting
> that pidfile destroys the interlock against having two postmasters
> running in the same data directory (which WILL corrupt your data,
> quickly and irretrievably). All it takes to cause a problem is to
> use the start script to start a postmaster, forgetting that you already
> have one running ...


I do agree with you that we should not play games with the postmaster.
Better to be safe than sorry. (So, manually deleting the pid file is the
only safe option.) I was just suggesting a (possibly dangerous)
workaround.

Btw, I do check for a running postmaster using the full path (I don't want to
kill every postmaster on the system); is this safe? Or could there be a
race condition?

#8
04-12-2008, 02:34 AM
Andreas Joseph Krogh
Re: file-locking and postmaster.pid

On Tuesday 23 May 2006 19:36, Tom Lane wrote:
> Adis Nezirovic <adis@linux.org.ba> writes:
> > Well, maybe you could tweak the postgres startup script, add a check for the
> > postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
> > delete the pid file on negative results.
>
> This is exactly what you should NOT do.
>
> A start script that thinks it is smarter than the postmaster is almost
> certainly wrong. It is certainly dangerous, too, because auto-deleting
> that pidfile destroys the interlock against having two postmasters
> running in the same data directory (which WILL corrupt your data,
> quickly and irretrievably). All it takes to cause a problem is to
> use the start script to start a postmaster, forgetting that you already
> have one running ...


My PG is not started with startup-scripts, but with this command:

pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

#9
04-12-2008, 02:34 AM
Andreas Joseph Krogh
Re: file-locking and postmaster.pid

On Wednesday 24 May 2006 11:36, Andreas Joseph Krogh wrote:
> On Tuesday 23 May 2006 19:36, Tom Lane wrote:
> > Adis Nezirovic <adis@linux.org.ba> writes:
> > > Well, maybe you could tweak the postgres startup script, add a check for
> > > the postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'),
> > > and delete the pid file on negative results.
> >
> > This is exactly what you should NOT do.
> >
> > A start script that thinks it is smarter than the postmaster is almost
> > certainly wrong. It is certainly dangerous, too, because auto-deleting
> > that pidfile destroys the interlock against having two postmasters
> > running in the same data directory (which WILL corrupt your data,
> > quickly and irretrievably). All it takes to cause a problem is to
> > use the start script to start a postmaster, forgetting that you already
> > have one running ...
>
> My PG is not started with startup-scripts, but with this command:
>
> pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start


... and manually after login, i.e. not at boot time.

#10
04-12-2008, 02:34 AM
Andrej Ricnik-Bay
Re: file-locking and postmaster.pid

On 5/24/06, Andreas Joseph Krogh <andreak@officenet.no> wrote:

> > My PG is not started with startup-scripts, but with this command:
> >
> > pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start
>
> ... and manually after login, i.e. not at boot time.

I'd suggest trying to fix your Linux install instead of mucking
about with Postgres, and this is really a pgsql-novice question,
not a -hackers thing.


Cheers,
Andrej


--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.

http://www.american.edu/econ/notes/htmlmail.htm
