How do I kill the nfs daemons? (Slackware Linux Support forums, Unix Operating Systems category)
| Here's the problem:

Machine 1 exports a directory via NFS (no samba or similar nonsense). On machine 2 the directory is mounted. Machine 1 goes down (hardware reset) without any nice unmounting or such.

Machine 2 now has two dead daemons running:

    22099 root 9 0 0 0 0 S 0.0 0.0 0:00.00 rpciod
    22100 root 9 0 0 0 0 S 0.0 0.0 0:00.00 lockd

These cannot be killed with "kill -ANYSIGNAL". /proc/mounts shows the bogus mount still open. Any access to the mounted directory and/or mount table creates further unkillable processes like this:

    3188 root 9 0 480 480 400 D 0.0 0.2 0:00.00 lsof

None of this can be touched by root in any way. This is problematic, as these processes count in the load average, which screws up load balancing: this machine ends up with a load of "n", where "n" is the number of these dead unkillable processes. All traffic is then routed to the only box on the network that didn't have that directory mounted, which causes massive overload on that one machine while everything else sits essentially idle.

Now this is an unusual situation, but I'm somewhat frustrated at the utter inability of the superuser to shoot down processes that have been sitting for hours unsuccessfully trying to connect to something that isn't there any more. In this particular case it was in part my own fault, of course: clearly there are timeouts in the NFS system that I didn't set properly, or otherwise the unsuccessful attempt to access a stale NFS handle would've crapped out long ago. But there's just something wrong on a deeper level when the sysadmin cannot stop a process.

Is there some super-duper-user trick that would allow me to terminate a process unconditionally? In the case of the "lsof" above, the process is sleeping uninterruptibly because it is waiting for I/O -- but what if I want to kill it anyway? How do I tell such a process to give up waiting for I/O?

In the case of rpciod and lockd, apparently SIGKILL and SIGHUP were just masked out somehow: they were utterly oblivious to any kill/skill/pkill ... ?? [slight puzzlement] |
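For anyone trying to see which processes are stuck like the "lsof" above, here is a minimal sketch that picks out processes in uninterruptible sleep (state "D") from `ps` output. The function name `filter_dstate` is made up for this example, and it assumes a procps-style `ps` that accepts `-eo pid,stat,comm`.

```shell
# Print processes in uninterruptible sleep (STAT starting with "D").
# Reads `ps`-style output on stdin with columns: PID STAT COMMAND,
# and skips the header row.
# `filter_dstate` is a hypothetical helper name, not a standard tool.
filter_dstate() {
    awk 'NR > 1 && $2 ~ /^D/ { print $1, $2, $3 }'
}

# Usage on a live system (assumes procps ps):
#   ps -eo pid,stat,comm | filter_dstate
```

Counting the lines it prints gives a rough idea of how much of the load average comes from stuck processes rather than real work.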
| |||
| On 25 Sep 2003 14:00:39 -0700, Big Bird <condor@biosys.net> wrote:

[Dying NFS mounts]

> /proc/mounts shows the bogus mount still open.

Can you try umounting it (maybe with 'umount -f')?

> Is there some super-duper-user trick that would allow me to terminate
> a process unconditionally?

'kill -9' usually does the job.

--
Simon <simon@no-dns-yet.org.uk> **** GPG: F4A23C69
"We demand rigidly defined areas of doubt and uncertainty." - Douglas Adams |
| |||
| I used to have the same problem, and I have been able to prevent it from happening since. If you use the default settings for /etc/exports (NFS server) and /etc/fstab (NFS client), the directories are mounted as "hard". Mount them as "soft" and your problems should go away.

Some examples (mercury is the NFS server and venus is the NFS client):

    root@mercury:~# cat /etc/exports
    /mnt/home *.sysk-net.lan(rw,secure,sync,no_wdelay,root_squash)
    /mnt/dl *.sysk-net.lan(rw,secure,sync,no_wdelay,no_root_squash)

    root@venus:~# cat /etc/fstab
    mercury:/mnt/home /home nfs soft,rw,bg,intr,rsize=8192,wsize=8192,timeo=20,retrans=3,retry=1
    mercury:/mnt/dl /mnt/dl nfs soft,rw,bg,intr,rsize=8192,wsize=8192,timeo=20,retrans=3,retry=1

Now what happens when the NFS server goes down (really sucks since /mnt/home is my /home on the client -- X totally stops)? For a while, I can't do anything on the client (except as root, since its home (/root) is locally mounted). But once the NFS server is back up, everything runs as normal, as if nothing ever happened.

So, as far as an immediate fix for killing those daemons, killall -9 usually works. But for a long-term fix, use soft instead of the hard default. Also, if you don't mount via /etc/fstab (i.e., you use the mount command directly), just use -o to pass the soft option.

Hope this helps.

Jason Campbell <syskill@sysk-net.com>

On Thu, 25 Sep 2003 14:00:39 -0700, Big Bird wrote:

> Here's the problem:
>
> Machine 1 exports a directory via nfs (no samba or similar nonsense).
>
> On machine 2 the directory is mounted.
>
> Machine 1 goes down (hardware reset) without any nice unmounting or
> such.
>
> blah... blah... blah... |
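The "use -o" suggestion above could look something like the sketch below. It only prints the mount command so it can be reviewed before running it as root. The helper name `soft_nfs_mount_cmd` is invented for this example; the host, paths, and options are the ones from the fstab lines in this post.

```shell
# Build (and print, not run) a mount command for a soft NFS mount.
# `soft_nfs_mount_cmd` is a hypothetical helper for illustration only.
soft_nfs_mount_cmd() {
    server_path=$1   # e.g. mercury:/mnt/dl
    mount_point=$2   # e.g. /mnt/dl
    # Same options as the fstab example: soft mount, timeo in tenths of
    # a second (20 = 2 s), give up after 3 retransmissions.
    opts="soft,rw,bg,intr,rsize=8192,wsize=8192,timeo=20,retrans=3,retry=1"
    printf 'mount -t nfs -o %s %s %s\n' "$opts" "$server_path" "$mount_point"
}

soft_nfs_mount_cmd mercury:/mnt/dl /mnt/dl
```

One thing to weigh with soft: once the retries run out, the client returns an I/O error to the application instead of waiting, so a program that doesn't check for errors can end up with incomplete writes.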
| |||
| In article <df160b8f.0309251300.3e729a82@posting.google.com>, Big Bird wrote:

> Machine 1 exports a directory via nfs (no samba or similar nonsense).
>
> On machine 2 the directory is mounted.
>
> Machine 1 goes down (hardware reset) without any nice unmounting or
> [snip]
> Now this is an unusual situation, but I'm somewhat frustrated at the

Not really. I've been there, done that, got the T-shirt. I have managed to resolve the problem by bringing machine 1 back up, then umount'ing on machine 2.

> process is sleeping uninterruptibly because it is waiting for I/O --
> but what if I want to kill it anyways? How do I tell such a process to
> give up waiting for I/O? In the case of rpciod and lockd, apparently

"Uninterruptible sleep" is, unfortunately, just that. I agree, it would seem beneficial to have a way to signal such processes and let them know that their wait is in vain, but AFAIK the gods of Unix have decreed that this not be so. You could search the LKML archives (Google has numerous sources) to see if the suggestion has come up before, and if so, what was said.

--
/dev/rob0 - preferred_email=i$((28*28+28))@softhome.net
or put "not-spam" or "/dev/rob0" in Subject header to reply |
| |||
| Simon <usenet@no-dns-yet.org.uk> wrote in message news:<slrnbn6r7g.a3d.usenet@dustpuppy.no-dns-yet.org.uk>...

> On 25 Sep 2003 14:00:39 -0700, Big Bird <condor@biosys.net> wrote:
> [Dying NFS mounts]
> > /proc/mounts shows the bogus mount still open.
>
> Can you try umounting it (maybe with 'umount -f')?

Of course I tried that. Gave me something like "device or resource busy".

> > Is there some super-duper-user trick that would allow me to terminate
> > a process unconditionally?
>
> 'kill -9' usually does the job.

Sure, usually. And the fact that it usually works is exactly what left me somewhat dumbfounded when (in this case) it didn't work. I had never thought I might need something like "a bigger gun than kill -9". Which is why I'm asking around...

(In the case at hand, I (hardware-)reset the machines in question, which was unproblematic since they were not actually doing anything because their load averages were jacked up way beyond the reasonable. But for future reference...) |
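For future reference, one thing that may be worth trying when 'umount -f' reports the mount as busy is a lazy unmount ('umount -l'), which detaches the mount point immediately and cleans up once it is no longer in use. This is a sketch under assumptions: it needs a kernel and util-linux new enough to support -l, and as far as I know it only keeps new accesses from hanging; it does not free processes that are already stuck. The `UMOUNT` variable and the function name are made up for this example so the logic can be dry-run.

```shell
# UMOUNT is a hypothetical knob for this sketch (not a standard variable);
# override it, e.g. UMOUNT=echo, to see what would be run.
: "${UMOUNT:=umount}"

# Try a forced unmount first, then fall back to a lazy unmount.
force_then_lazy_unmount() {
    mnt=$1
    # -f: force the unmount (meant for unreachable NFS servers)
    if $UMOUNT -f "$mnt"; then
        echo "forced unmount of $mnt succeeded"
    # -l: lazy unmount, detach now and clean up when no longer busy
    elif $UMOUNT -l "$mnt"; then
        echo "lazy unmount of $mnt succeeded"
    else
        echo "could not unmount $mnt" >&2
        return 1
    fi
}

# Usage (as root):
#   force_then_lazy_unmount /mnt/dl
```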
| ||||
| Please don't top-post. Thank you.

In article <pan.2003.09.25.23.58.26.669905@sysk-net.com>, Jason Campbell wrote:

> So, as far as an immediate fix for killing those daemons, killall -9
> usually works.

Absolutely not. This is not true. The process will still sit there awaiting its I/O, and all your SIGKILLs are queued for delivery when the application awakens from its uninterruptible sleep.

--
/dev/rob0 - preferred_email=i$((28*28+28))@softhome.net
or put "not-spam" or "/dev/rob0" in Subject header to reply |
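The "queued for delivery" part can be checked from /proc, at least on kernels that expose pending-signal masks in /proc/PID/status. A rough sketch, assuming the `SigPnd` and `ShdPnd` fields are present (older kernels may only have `SigPnd`); the function name `sigkill_pending` is invented here, and it takes the status file path as an argument so it can be tried on a saved copy.

```shell
# Return success if SIGKILL (signal 9, mask bit 0x100) is pending
# according to a /proc/PID/status-style file.
# `sigkill_pending` is a hypothetical helper name for this sketch.
sigkill_pending() {
    status_file=$1
    # SigPnd: pending for the thread; ShdPnd: pending for the whole process.
    for mask in $(awk '/^(SigPnd|ShdPnd):/ { print $2 }' "$status_file"); do
        if [ $(( 0x$mask & 0x100 )) -ne 0 ]; then
            return 0
        fi
    done
    return 1
}

# Usage against the stuck lsof from the first post:
#   sigkill_pending /proc/3188/status && echo "SIGKILL is waiting to be delivered"
```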