This is a discussion on file-locking and postmaster.pid within the pgsql Hackers forums, part of the PostgreSQL category.
Hi all.

I've experienced several times that PG has died somehow and the postmaster.pid
file still exists 'cause PG hasn't had the ability to delete it upon proper
shutdown. Upon start-up, after such an incident, PG tells me another PG is
running and that I either have to shut down the other instance, or delete the
postmaster.pid file if there really isn't an instance running. This seems
totally unnecessary to me.

Why doesn't PG use file-locking to tell if another PG is running or not? If PG
holds an exclusive lock on the pid-file and the process crashes, or shuts down,
then the lock (which is process-based and controlled by the kernel) will be
removed and another PG which tries to start up can detect that. Using the
existence of the pid-file as the only evidence gives too many false positives
IMO.

I'm sure there's a good reason for having it the way it is, having so many
smart knowledgeable people working on this project. Could someone please
explain the rationale of the current solution to me?

--
Andreas Joseph Krogh <andreak@officenet.no>
Senior Software Developer / Manager
gpg public_key: http://dev.officenet.no/~andreak/public_key.asc

OfficeNet AS
Hoffsveien 17
PO. Box 425 Skøyen
0213 Oslo
NORWAY

Phone : +47 22 13 01 00
Direct: +47 22 13 10 03
Mobile: +47 909 56 963

The most difficult thing in the world is to know how to do a thing and to
watch somebody else doing it wrong, without comment.

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings
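For readers unfamiliar with the idea in the post above, here is a minimal sketch of kernel-managed locking from a shell script. It assumes the flock(1) utility from util-linux is installed, which is itself a Linux-specific assumption. The helper name try_lock and the temporary lock file are made up for illustration; nothing here is PostgreSQL code.

```shell
#!/bin/bash
# Sketch only: advisory locking with flock(1) from util-linux, which is not
# available on every platform. try_lock is a hypothetical helper.

# try_lock FILE
# Open FILE on descriptor 9 and request an exclusive, non-blocking lock.
# Returns 0 if the lock was obtained, non-zero if another holder exists.
try_lock() {
    exec 9>>"$1" || return 1
    flock -n 9
}

lockfile=$(mktemp)

if try_lock "$lockfile"; then
    echo "lock acquired; no other instance is running"
else
    echo "another instance holds the lock"
fi

# A second opener (a subshell stands in for a second server) is refused
# for as long as this shell keeps descriptor 9 open.
if ( try_lock "$lockfile" ); then
    echo "unexpected: second instance got the lock"
else
    echo "second instance refused"
fi
```

The lock goes away when the last descriptor referring to it is closed, which includes the holder being killed. One caveat with this style of lock: child processes that inherit the descriptor keep the lock alive after the parent is gone.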

Andreas Joseph Krogh <andreak@officenet.no> writes:
> I've experienced several times that PG has died somehow and the postmaster.pid
> file still exists 'cause PG hasn't had the ability to delete it upon proper
> shutdown. Upon start-up, after such an incident, PG tells me another PG is
> running and that I either have to shut down the other instance, or delete the
> postmaster.pid file if there really isn't an instance running. This seems
> totally unnecessary to me.

The postmaster does check to see whether the PID mentioned in the file
is still alive, so it's not that easy for the above to happen. If you
can provide details of a scenario where a failure is likely, we'd like
to know about it. Also, what PG version are you talking about?

> Why doesn't PG use file-locking to tell if another
> PG is running or not?

Portability.

regards, tom lane

On Tuesday 23 May 2006 17:54, Tom Lane wrote:
> Andreas Joseph Krogh <andreak@officenet.no> writes:
> > I've experienced several times that PG has died somehow and the
> > postmaster.pid file still exists 'cause PG hasn't had the ability to
> > delete it upon proper shutdown. Upon start-up, after such an incident,
> > PG tells me another PG is running and that I either have to shut down the
> > other instance, or delete the postmaster.pid file if there really isn't
> > an instance running. This seems totally unnecessary to me.
>
> The postmaster does check to see whether the PID mentioned in the file
> is still alive, so it's not that easy for the above to happen. If you
> can provide details of a scenario where a failure is likely, we'd like
> to know about it. Also, what PG version are you talking about?

I have experienced this with PG-8.1.3 and will provide details if I can make
it happen. Basically it has happened when I have had to "hard-reset" my
laptop due to some strange bugs in Linux which have made it hang.

> > Why doesn't PG use file-locking to tell if another
> > PG is running or not?
>
> Portability.

Ok.

--
Andreas Joseph Krogh <andreak@officenet.no>

Andreas Joseph Krogh <andreak@officenet.no> writes:
> On Tuesday 23 May 2006 17:54, Tom Lane wrote:
>> The postmaster does check to see whether the PID mentioned in the file
>> is still alive, so it's not that easy for the above to happen. If you
>> can provide details of a scenario where a failure is likely, we'd like
>> to know about it. Also, what PG version are you talking about?

> I have experienced this with PG-8.1.3 and will provide details if I can make
> it happen. Basically it has happened when I have had to "hard-reset" my
> laptop due to some strange bugs in Linux which have made it hang.

If you're talking about a postmaster that's auto-started during the boot
sequence, then there is a risk depending on what start script you use.
The problem is that depending on what else runs during the system
startup, the PID assigned to the postmaster might be the same as in the
last boot cycle, or it might be different by one or two counts. The
postmaster disregards a pidfile containing its own PID, or its parent
process's PID, or a PID not belonging to a postgres-owned process. That
covers most cases but if your start script does something like

    su -l postgres -c "pg_ctl start ..."

then you have a situation where not only the parent process (pg_ctl) but
also the grandparent (a shell) is postgres-owned, and if the pidfile PID
happens to match the grandparent then you lose. The solution is to either
not use pg_ctl here, or write "exec pg_ctl start ...", so that there's
only one postgres-owned process besides the postmaster itself.

Initscripts published by PGDG itself and by Red Hat have gotten this
right for a while, but I suspect the word has not propagated to all
distros.

regards, tom lane
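The rules described above can be approximated in a few lines of shell. This is a hypothetical sketch, not the postmaster's actual code. The function name pidfile_blocks_start is made up, and the sketch assumes it runs as the unprivileged postgres user, so that "kill -0" succeeds only for live processes that user is allowed to signal.

```shell
#!/bin/bash
# Hypothetical sketch of the rules described above, not the postmaster's code.
# Assumes it runs as the unprivileged postgres user, so that "kill -0"
# succeeds only for live processes owned by that same user.

# pidfile_blocks_start PID SELF PARENT
# Returns 0 if PID should be treated as a running postmaster (refuse to
# start), non-zero if the pidfile can be disregarded as stale.
pidfile_blocks_start() {
    local pid=$1 self=$2 parent=$3
    [ "$pid" = "$self" ] && return 1     # our own PID, reused after a reboot
    [ "$pid" = "$parent" ] && return 1   # our parent (e.g. pg_ctl)
    kill -0 "$pid" 2>/dev/null           # alive and signalable by us?
}

# The grandparent case: any other live process of the same user, such as the
# shell started by "su -l postgres -c ...", still looks like a postmaster.
sleep 30 &
stand_in=$!
if pidfile_blocks_start "$stand_in" "$$" "$PPID"; then
    echo "PID $stand_in is alive and ours: start-up would be refused"
fi
kill "$stand_in"
```

The background sleep stands in for the extra postgres-owned shell. Using "exec pg_ctl start ..." removes that extra process, so only the own-PID and parent-PID cases remain, and both are already handled.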

On Tue, May 23, 2006 at 05:23:16PM +0200, Andreas Joseph Krogh wrote:
> Hi all.
>
> I've experienced several times that PG has died somehow and the postmaster.pid
> file still exists 'cause PG hasn't had the ability to delete it upon proper
> shutdown. Upon start-up, after such an incident, PG tells me another PG is
> running and that I either have to shut down the other instance, or delete the
> postmaster.pid file if there really isn't an instance running. This seems
> totally unnecessary to me. Why doesn't PG use file-locking to tell if another
> PG is running or not? If PG holds an exclusive lock on the pid-file and the
> process crashes, or shuts down, then the lock (which is process-based and
> controlled by the kernel) will be removed and another PG which tries to start
> up can detect that. Using the existence of the pid-file as the only evidence
> gives too many false positives IMO.

Well, maybe you could tweak the postgres startup script: add a check for the
postmaster (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
delete the pid file on a negative result, i.e.:

    #!/bin/bash
    # Look for a running postmaster by its full path.
    PID=$(pgrep -f /usr/bin/postmaster)
    if [[ -n $PID ]]; then
        echo "'$PID'"                      # postgres is already running
    else
        echo "Postmaster is not running"   # delete stale PID file
    fi

Adis Nezirovic <adis@linux.org.ba> writes:
> Well, maybe you could tweak postgres startup script, add check for post
> master (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
> delete pid file on negative results.

This is exactly what you should NOT do.

A start script that thinks it is smarter than the postmaster is almost
certainly wrong. It is certainly dangerous, too, because auto-deleting
that pidfile destroys the interlock against having two postmasters
running in the same data directory (which WILL corrupt your data,
quickly and irretrievably). All it takes to cause a problem is to
use the start script to start a postmaster, forgetting that you already
have one running ...

regards, tom lane
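To illustrate why deleting the file defeats the interlock, here is a generic sketch in which a pidfile can only be claimed if it does not already exist. The helper claim_pidfile, the PIDs 1001 and 1002, and the temporary directory are all made up for the example. It is not meant to describe how the postmaster itself creates its file.

```shell
#!/bin/bash
# Generic interlock sketch. claim_pidfile is a hypothetical helper and this
# is not a description of the postmaster's implementation.

# claim_pidfile FILE PID
# Create FILE containing PID, but only if FILE does not exist yet.
# Returns non-zero if FILE is already there (someone holds the interlock).
claim_pidfile() {
    ( set -o noclobber; echo "$2" > "$1" ) 2>/dev/null
}

dir=$(mktemp -d)
pidfile="$dir/postmaster.pid"

claim_pidfile "$pidfile" 1001 && echo "first instance started"
claim_pidfile "$pidfile" 1002 || echo "second instance refused"

# A start script that guesses wrong and deletes the file removes the only
# thing that kept the second instance out:
rm -f "$pidfile"
claim_pidfile "$pidfile" 1002 && echo "second instance started next to the first"
```

The first instance is still "running" in this scenario when the file is removed, so both would now be working in the same directory.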

On Tue, May 23, 2006 at 01:36:41PM -0400, Tom Lane wrote:
> This is exactly what you should NOT do.
>
> A start script that thinks it is smarter than the postmaster is almost
> certainly wrong. It is certainly dangerous, too, because auto-deleting
> that pidfile destroys the interlock against having two postmasters
> running in the same data directory (which WILL corrupt your data,
> quickly and irretrievably). All it takes to cause a problem is to
> use the start script to start a postmaster, forgetting that you already
> have one running ...

I do agree with you that we should not play games with the postmaster.
Better to be safe than sorry. (So manually deleting the pid file is the
only safe option.) I was just suggesting a (possibly dangerous) workaround.

Btw, I do check for a running postmaster using the full path (I don't want
to kill every postmaster on the system). Is this safe, or could there be a
race condition?

On Tuesday 23 May 2006 19:36, Tom Lane wrote:
> Adis Nezirovic <adis@linux.org.ba> writes:
> > Well, maybe you could tweak postgres startup script, add check for post
> > master (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'), and
> > delete pid file on negative results.
>
> This is exactly what you should NOT do.
>
> A start script that thinks it is smarter than the postmaster is almost
> certainly wrong. It is certainly dangerous, too, because auto-deleting
> that pidfile destroys the interlock against having two postmasters
> running in the same data directory (which WILL corrupt your data,
> quickly and irretrievably). All it takes to cause a problem is to
> use the start script to start a postmaster, forgetting that you already
> have one running ...

My PG is not started with startup-scripts, but with this command:

    pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

--
Andreas Joseph Krogh <andreak@officenet.no>

On Wednesday 24 May 2006 11:36, Andreas Joseph Krogh wrote:
> On Tuesday 23 May 2006 19:36, Tom Lane wrote:
> > Adis Nezirovic <adis@linux.org.ba> writes:
> > > Well, maybe you could tweak postgres startup script, add check for post
> > > master (either 'pgrep postmaster' or 'ps -axu | grep [p]ostmaster'),
> > > and delete pid file on negative results.
> >
> > This is exactly what you should NOT do.
> >
> > A start script that thinks it is smarter than the postmaster is almost
> > certainly wrong. It is certainly dangerous, too, because auto-deleting
> > that pidfile destroys the interlock against having two postmasters
> > running in the same data directory (which WILL corrupt your data,
> > quickly and irretrievably). All it takes to cause a problem is to
> > use the start script to start a postmaster, forgetting that you already
> > have one running ...
>
> My PG is not started with startup-scripts, but with this command:
>
> pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start

... and manually after login, i.e. not at boot-time.

--
Andreas Joseph Krogh <andreak@officenet.no>

On 5/24/06, Andreas Joseph Krogh <andreak@officenet.no> wrote:
> > My PG is not started with startup-scripts, but with this command:
> >
> > pg_ctl -D $PGDATA -l $PGDIR/log/logfile-`date +%Y-%m-%d`.log start
>
> ... and manually after login, i.e. not at boot-time.

I'd suggest trying to fix your Linux install instead of mucking about
with Postgres, and this is really a pgsql-novice question, not a -hackers
thing.

Cheers,
Andrej

--
Please don't top post, and don't use HTML e-Mail :} Make your quotes concise.
http://www.american.edu/econ/notes/htmlmail.htm