* System hanging in /proc/mounts almost every night... help!
@ 2008-06-03 19:39 Paul Smith
From: Paul Smith @ 2008-06-03 19:39 UTC
  To: autofs

Hi all.  I'm running Ubuntu 8.04 (kernel 2.6.24-17-generic, autofs
4.1.4+debian-2.1ubuntu2) on an Intel Pentium D 3GHz system with
hyperthreading (SMP kernel) and 1GB of RAM.

I'm using DHCP for networking and obtaining my automount maps via NIS.

For the last week or so, almost every morning when I come into work my
system is hung up in a strange way.  I can move my mouse but I never get
asked for my password to unlock my screen.  I can Ctrl-Alt-F1 etc. to get
back to a console, but after I type my username at the login prompt, I
never get asked for a password and then that console is locked up.  If I
have a console session already logged in from the day before, then I can
use it for a while but eventually some command will lock hard; can't ^C,
can't ^Z, can't kill -9, nothing.

If I try Ctrl-Alt-Del to reboot, the system starts to come down but then
hangs, hard, trying to bring down automount.  Reset just tries to reboot
again and hangs in the same place.  I have to power off/on the system
completely.  Bummer.

I did some debugging on this problem.  I logged in as root on every
console (F1-F6).  The next morning when the system was hung, I found a
command that hung (just "ls") and then I ran it in another console under
strace.

It turns out what's happening is it's opening /proc/mounts, which
succeeds, then trying to read(2) from it.  The read system call never
returns and there's no way to kill that process, at all, once it's in
that state.  Also I note the load on the system is very high: typically
over 7.  However, top shows no processes chewing CPU.  I also note that
there are some "duplicate" automount processes running (that is, more
than one for the same map).  After I reboot, of course, everything is
fine.
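
A high load average with an idle CPU is consistent with processes stuck
in uninterruptible sleep (state "D"), since Linux counts those tasks in
the load average along with runnable ones.  The following is a minimal
sketch, not a definitive diagnostic, of listing such processes by
reading /proc/<pid>/stat.  The function names parse_stat and
uninterruptible are hypothetical and chosen only for illustration.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    state = text[right + 1:].split()[0]
    return pid, comm, state


def uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    found = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or the entry was unreadable
        if state == "D":
            found.append((pid, comm))
    return sorted(found)
```

Calling uninterruptible() from a console that still works would show
which commands (for example "ls" or "automount") are blocked at that
moment.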

Last night I started all the consoles and in one of them I wrote a
little shell script that ran `date`, then did cat /proc/mounts, then
slept for 15 seconds, then did it again.  I sent the output to a file.
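
The loop described above can be sketched as follows.  This is an
illustrative reconstruction in Python, not the original shell script;
the names snapshot and watch, and the timestamp format, are
assumptions.  The one detail that matters is that the timestamp is
written and flushed before /proc/mounts is read, so the last timestamp
in the log marks the read that never returned.

```python
import time
from datetime import datetime


def snapshot(out, path="/proc/mounts"):
    """Write a timestamp, flush it, then append the contents of `path`."""
    out.write(datetime.now().strftime("%a %b %d %H:%M:%S %Y") + "\n")
    # Flush before reading: if the read below blocks forever, the
    # timestamp has already been handed to the kernel.
    out.flush()
    with open(path) as f:
        out.write(f.read())
    out.write("\n")
    out.flush()


def watch(logfile, path="/proc/mounts", interval=15, iterations=None):
    """Take a snapshot every `interval` seconds (forever if iterations is None)."""
    done = 0
    with open(logfile, "a") as out:
        while iterations is None or done < iterations:
            snapshot(out, path)
            done += 1
            if iterations is None or done < iterations:
                time.sleep(interval)
```

For example, watch("/root/mounts.log") would run until the read hangs;
the log path here is hypothetical.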

I found that the hang happened last night at ~22:51 EDT.  There was
nothing interesting in the messages log, but in syslog I find a lot of
messages right around that time trying to get to non-existent automount
files (this is caused by some bogosity in the Tracker utility in Gnome,
but it shouldn't cause the system to hang!):

Jun  2 22:51:29 psmithub automount[29241]: >> mount.nfs: access denied by server while mounting snap-dev01:/user/.Trash-10490
Jun  2 22:51:29 psmithub automount[29241]: mount(nfs): nfs: mount failure snap-dev01:/user/.Trash-10490 on /user/.Trash-10490
Jun  2 22:51:29 psmithub automount[29241]: failed to mount /user/.Trash-10490
Jun  2 22:51:29 psmithub automount[29342]: failed to mount /nfs/.Trash
Jun  2 22:51:29 psmithub automount[29343]: failed to mount /nfs/.Trash-10490
Jun  2 22:51:29 psmithub automount[29344]: failed to mount /mnt/.Trash
Jun  2 22:51:29 psmithub automount[29345]: failed to mount /mnt/.Trash-10490
Jun  2 22:51:29 psmithub automount[29346]: >> /sbin/showmount: can't get address for .Trash
Jun  2 22:51:29 psmithub automount[29346]: lookup(program): lookup for .Trash failed
Jun  2 22:51:29 psmithub automount[29346]: failed to mount /net/.Trash
Jun  2 22:51:29 psmithub automount[29353]: >> /sbin/showmount: can't get address for .Trash-10490
Jun  2 22:51:29 psmithub automount[29353]: lookup(program): lookup for .Trash-10490 failed
Jun  2 22:51:29 psmithub automount[29353]: failed to mount /net/.Trash-10490
Jun  2 22:51:34 psmithub automount[29212]: mount(nfs): nfs: mount failure snap-dev01:/tools on /opt/net/tools
Jun  2 22:51:34 psmithub automount[29212]: failed to mount /opt/net/tools
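
For anyone wanting to summarize messages like the ones above, here is
a small sketch that pulls the automount PID and the path out of each
"failed to mount" line.  The regular expression is based only on the
lines shown in this message, and the function name failed_mounts is
hypothetical.

```python
import re

# Matches e.g. "automount[29342]: failed to mount /nfs/.Trash"
FAILED = re.compile(r"automount\[(\d+)\]: failed to mount (\S+)")


def failed_mounts(lines):
    """Return (pid, path) for each 'failed to mount' syslog line."""
    hits = []
    for line in lines:
        m = FAILED.search(line)
        if m:
            hits.append((int(m.group(1)), m.group(2)))
    return hits
```

Feeding it the lines of the syslog file yields one (pid, path) pair per
failure, which makes it easy to see which automount process handled
each request.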

That's the last message of interest in the syslog.  Here's the end of
the shell script loop log:

Mon Jun  2 22:51:30 EDT 2008
rootfs / rootfs rw 0 0
none /sys sysfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
/dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 /dev/.static/dev ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /lib/modules/2.6.24-17-generic/volatile tmpfs rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
/dev/sda5 /home ext3 rw,relatime,data=ordered 0 0
securityfs /sys/kernel/security securityfs rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
automount(pid5466) /net autofs rw,relatime,fd=4,pgrp=5466,timeout=300,minproto=2,maxproto=4,indirect 0 0
automount(pid5367) /mnt autofs rw,relatime,fd=4,pgrp=5367,timeout=60,minproto=2,maxproto=4,indirect 0 0
automount(pid5404) /nfs autofs rw,relatime,fd=4,pgrp=5404,timeout=3600,minproto=2,maxproto=4,indirect 0 0
automount(pid5532) /user autofs rw,relatime,fd=4,pgrp=5532,timeout=300,minproto=2,maxproto=4,indirect 0 0
automount(pid5612) /export/autofs autofs rw,relatime,fd=4,pgrp=5612,timeout=60,minproto=2,maxproto=4,indirect 0 0
automount(pid5684) /opt/net autofs rw,relatime,fd=4,pgrp=5684,timeout=36000,minproto=2,maxproto=4,indirect 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0


Mon Jun  2 22:51:45 EDT 2008

Then it just hangs.
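
The last successful snapshot above can be picked apart with a short
parser.  This is a sketch under the assumption that every entry has the
six whitespace-separated fields seen in the listing; the function names
are hypothetical.  It extracts the autofs entries (mount point,
automount PID, timeout) and reports mount points that are listed more
than once (in the listing above: /, /var/run and /var/lock).

```python
import re
from collections import Counter

AUTOFS_DEVICE = re.compile(r"^automount\(pid(\d+)\)$")


def parse_mounts(text):
    """Return (device, mount_point, fstype, options) for each mount entry."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Real entries end in two numeric fields; this also skips the
        # timestamp lines that the logging loop writes between snapshots.
        if len(fields) != 6 or not (fields[4].isdigit() and fields[5].isdigit()):
            continue
        entries.append(tuple(fields[:4]))
    return entries


def autofs_mounts(entries):
    """Return (mount_point, pid, timeout) for each autofs entry."""
    result = []
    for device, mount_point, fstype, options in entries:
        m = AUTOFS_DEVICE.match(device)
        if fstype != "autofs" or not m:
            continue
        opts = dict(o.split("=", 1) for o in options.split(",") if "=" in o)
        timeout = opts.get("timeout")
        result.append((mount_point, int(m.group(1)),
                       int(timeout) if timeout is not None else None))
    return result


def duplicate_mount_points(entries):
    """Return the mount points that appear in more than one entry."""
    counts = Counter(entry[1] for entry in entries)
    return sorted(mp for mp, n in counts.items() if n > 1)
```

Comparing the PIDs returned by autofs_mounts with the automount
processes actually running is one way to spot the "duplicate" daemons
mentioned earlier.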

If anyone has any thoughts about this, including ways I could proceed to
debug it, I'm interested!
