* System hanging in /proc/mounts almost every night... help!
From: Paul Smith @ 2008-06-03 19:39 UTC
To: autofs
Hi all. I'm running Ubuntu 8.04 (kernel 2.6.24-17-generic, autofs
4.1.4+debian-2.1ubuntu2) on an Intel Pentium D 3GHz system with
hyperthreading (SMP kernel) and 1G RAM.
I'm using DHCP for networking and obtaining my automount maps via NIS.
For the last week or so, almost every morning when I come into work my
system is hung up in a strange way. I can move my mouse but I never get
asked for my password to unlock my screen. I can C-A-F1 etc. to get
back to a console but after I type my username at the login prompt, I
never get asked for a password and then that console is locked up. If I
have a console session already logged in from the day before, then I can
use it for a while but eventually some command will lock hard; can't ^C,
can't ^Z, can't kill -9, nothing.
If I try C-A-D to reboot, the system starts to come down but then
hangs, hard, trying to bring down automount. Reset just tries to reboot
again and hangs in the same place. I have to power off/on the system
completely. Bummer.
I did some debugging on this problem. I logged in as root on every
console (F1-F6). The next morning when the system was hung, I found a
command that hung (just "ls") and then I ran it in another console under
strace.
It turns out what's happening is it's opening /proc/mounts, which
succeeds, then trying to read(2) from it. The read system call never
returns and there's no way to kill that process, at all, once it's in
that state. Also I note the load on the system is very high: typically
over 7. However top shows no processes chewing CPU. I also note that
there are some "duplicate" automount processes running (that is, more
than one for the same map). After I reboot, of course, everything is
fine.
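A high load with idle CPUs is consistent with tasks stuck in uninterruptible sleep (state D), which Linux counts toward the load average. A minimal Python sketch for listing such tasks follows. The function names are made up for illustration, and it only reads the state field of /proc/<pid>/stat, so it reports the main thread of each process rather than every thread.

```python
import os


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    head, _, tail = text.rpartition(")")
    pid, _, comm = head.partition(" (")
    return int(pid), comm, tail.split()[0]


def uninterruptible_tasks(proc="/proc"):
    """List (pid, comm) for every process whose state field is D."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "mounts"
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except OSError:
            continue  # process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in uninterruptible_tasks():
        print(pid, comm)
```

Run on the hung system, this would be expected to show the stuck "ls" and login processes, and possibly the automount daemons themselves.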
Last night I started all the consoles and in one of them I wrote a
little shell script that ran `date`, then did cat /proc/mounts, then
slept for 15 seconds, then did it again. I sent the output to a file.
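The loop described above can be sketched in Python as follows. This is not the original shell script, only an approximation of it: the log file name and the function names are assumptions. The timestamp is written and flushed before the read, so that if the read of /proc/mounts blocks forever, the last line in the log still shows when the hang began.

```python
import sys
import time


def log_mounts_once(out, mounts_path="/proc/mounts"):
    """Write a timestamp, flush it, then append the contents of mounts_path."""
    out.write(time.strftime("%a %b %d %H:%M:%S %Z %Y") + "\n")
    out.flush()  # hand the timestamp to the OS before a read that may never return
    with open(mounts_path) as f:
        out.write(f.read())
    out.flush()


def main(log_path="mounts.log", interval=15):
    """Append a snapshot to log_path every `interval` seconds until killed."""
    with open(log_path, "a") as out:
        while True:
            log_mounts_once(out)
            time.sleep(interval)


if __name__ == "__main__":
    # Optional first argument: path of the log file (hypothetical default above).
    main(*sys.argv[1:2])
```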
I found that the hang happened last night at ~22:51 EDT. There was
nothing interesting in the messages log, but in syslog I find a lot of
messages right around that time trying to get to non-existent automount
files (this is caused by some bogosity in the Tracker utility in Gnome,
but it shouldn't cause the system to hang!):
Jun 2 22:51:29 psmithub automount[29241]: >> mount.nfs: access denied by server while mounting snap-dev01:/user/.Trash-10490
Jun 2 22:51:29 psmithub automount[29241]: mount(nfs): nfs: mount failure snap-dev01:/user/.Trash-10490 on /user/.Trash-10490
Jun 2 22:51:29 psmithub automount[29241]: failed to mount /user/.Trash-10490
Jun 2 22:51:29 psmithub automount[29342]: failed to mount /nfs/.Trash
Jun 2 22:51:29 psmithub automount[29343]: failed to mount /nfs/.Trash-10490
Jun 2 22:51:29 psmithub automount[29344]: failed to mount /mnt/.Trash
Jun 2 22:51:29 psmithub automount[29345]: failed to mount /mnt/.Trash-10490
Jun 2 22:51:29 psmithub automount[29346]: >> /sbin/showmount: can't get address for .Trash
Jun 2 22:51:29 psmithub automount[29346]: lookup(program): lookup for .Trash failed
Jun 2 22:51:29 psmithub automount[29346]: failed to mount /net/.Trash
Jun 2 22:51:29 psmithub automount[29353]: >> /sbin/showmount: can't get address for .Trash-10490
Jun 2 22:51:29 psmithub automount[29353]: lookup(program): lookup for .Trash-10490 failed
Jun 2 22:51:29 psmithub automount[29353]: failed to mount /net/.Trash-10490
Jun 2 22:51:34 psmithub automount[29212]: mount(nfs): nfs: mount failure snap-dev01:/tools on /opt/net/tools
Jun 2 22:51:34 psmithub automount[29212]: failed to mount /opt/net/tools
That's the last message of interest in the syslog. Here's the end of
the shell script loop log:
Mon Jun 2 22:51:30 EDT 2008
rootfs / rootfs rw 0 0
none /sys sysfs rw,nosuid,nodev,noexec 0 0
none /proc proc rw,nosuid,nodev,noexec 0 0
udev /dev tmpfs rw,relatime 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
/dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
/dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 /dev/.static/dev ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /lib/modules/2.6.24-17-generic/volatile tmpfs rw,relatime 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
devpts /dev/pts devpts rw,relatime 0 0
tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
/dev/sda5 /home ext3 rw,relatime,data=ordered 0 0
securityfs /sys/kernel/security securityfs rw,relatime 0 0
rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
automount(pid5466) /net autofs rw,relatime,fd=4,pgrp=5466,timeout=300,minproto=2,maxproto=4,indirect 0 0
automount(pid5367) /mnt autofs rw,relatime,fd=4,pgrp=5367,timeout=60,minproto=2,maxproto=4,indirect 0 0
automount(pid5404) /nfs autofs rw,relatime,fd=4,pgrp=5404,timeout=3600,minproto=2,maxproto=4,indirect 0 0
automount(pid5532) /user autofs rw,relatime,fd=4,pgrp=5532,timeout=300,minproto=2,maxproto=4,indirect 0 0
automount(pid5612) /export/autofs autofs rw,relatime,fd=4,pgrp=5612,timeout=60,minproto=2,maxproto=4,indirect 0 0
automount(pid5684) /opt/net autofs rw,relatime,fd=4,pgrp=5684,timeout=36000,minproto=2,maxproto=4,indirect 0 0
nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
Mon Jun 2 22:51:45 EDT 2008
Then it just hangs.
If anyone has any thoughts about this, including ways I could proceed to
debug it, I'm interested!
* Re: System hanging in /proc/mounts almost every night... help!
From: Jeff Moyer @ 2008-06-04 13:00 UTC
To: psmith; +Cc: autofs
Paul Smith <psmith@netezza.com> writes:
> Hi all. I'm running Ubuntu 8.04 (kernel 2.6.24-17-generic, autofs
> 4.1.4+debian-2.1ubuntu2) on an Intel Pentium D 3GHz system with
> hyperthreading (SMP kernel) and 1G RAM.
>
> I'm using DHCP for networking and obtaining my automount maps via NIS.
>
> For the last week or so, almost every morning when I come into work my
> system is hung up in a strange way. I can move my mouse but I never get
> asked for my password to unlock my screen. I can C-A-F1 etc. to get
> back to a console but after I type my username at the login prompt, I
> never get asked for a password and then that console is locked up. If I
> have a console session already logged in from the day before, then I can
> use it for a while but eventually some command will lock hard; can't ^C,
> can't ^Z, can't kill -9, nothing.
>
> If I try to C-A-D to reboot the system starts to come down but then
> hangs, hard, trying to bring down automount. Reset just tries to reboot
> again and hangs in the same place. I have to power off/on the system
> completely. Bummer.
>
> I did some debugging on this problem. I logged in as root on every
> console (F1-F6). The next morning when the system was hung, I found a
> command that hung (just "ls") and then I ran it in another console under
> strace.
>
> It turns out what's happening is it's opening /proc/mounts, which
> succeeds, then trying to read(2) from it. The read system call never
> returns and there's no way to kill that process, at all, once it's in
> that state. Also I note the load on the system is very high: typically
> over 7. However top shows no processes chewing CPU. I also note that
> there are some "duplicate" automount processes running (that is, more
> than one for the same map). After I reboot, of course, everything is
> fine.
>
> Last night I started all the consoles and in one of them I wrote a
> little shell script that ran `date`, then did cat /proc/mounts, then
> slept for 15 seconds, then did it again. I sent the output to a file.
>
> I found that the hang happened last night at ~22:51 EDT. There was
> nothing interesting in the messages log, but in syslog I find a lot of
> messages right around that time trying to get to non-existent automount
> files (this is caused by some bogosity in the Tracker utility in Gnome,
> but it shouldn't cause the system to hang!):
>
> Jun 2 22:51:29 psmithub automount[29241]: >> mount.nfs: access denied by server while mounting snap-dev01:/user/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29241]: mount(nfs): nfs: mount failure snap-dev01:/user/.Trash-10490 on /user/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29241]: failed to mount /user/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29342]: failed to mount /nfs/.Trash
> Jun 2 22:51:29 psmithub automount[29343]: failed to mount /nfs/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29344]: failed to mount /mnt/.Trash
> Jun 2 22:51:29 psmithub automount[29345]: failed to mount /mnt/.Trash-10490
> Jun 2 22:51:29 psmithub automount[29346]: >> /sbin/showmount: can't get address for .Trash
> Jun 2 22:51:29 psmithub automount[29346]: lookup(program): lookup for .Trash failed
> Jun 2 22:51:29 psmithub automount[29346]: failed to mount /net/.Trash
> Jun 2 22:51:29 psmithub automount[29353]: >> /sbin/showmount: can't get address for .Trash-10490
> Jun 2 22:51:29 psmithub automount[29353]: lookup(program): lookup for .Trash-10490 failed
> Jun 2 22:51:29 psmithub automount[29353]: failed to mount /net/.Trash-10490
> Jun 2 22:51:34 psmithub automount[29212]: mount(nfs): nfs: mount failure snap-dev01:/tools on /opt/net/tools
> Jun 2 22:51:34 psmithub automount[29212]: failed to mount /opt/net/tools
>
This .Trash madness has to end. It keeps autofs from expiring mounts in
other situations. *grumble*
My advice for further debugging is to enable autofs debug logging (see
http://people.redhat.com/jmoyer), and when hung, get the output from
sysrq-t. So, when you come in in the morning, issue the sysrq-t and
make sure you can capture the output somehow (serial console or
netconsole would be best).
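If the keyboard combination is awkward to reach, the same task dump can usually be requested by writing the character t to /proc/sysrq-trigger as root, with the output going to the kernel log. A small Python sketch, assuming the kernel was built with SysRq support (the function name is hypothetical):

```python
def trigger_sysrq(key="t", path="/proc/sysrq-trigger"):
    """Ask the kernel to perform the SysRq action for `key` (needs root)."""
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(path, "w") as f:
        f.write(key)


if __name__ == "__main__":
    trigger_sysrq("t")  # dump the state of all tasks to the kernel log
```

On a machine that is already hung, starting a new process may itself block, so the keyboard sequence or an already-running root shell is likely the safer route.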
More below...
> That's the last message of interest in the syslog. Here's the end of
> the shell script loop log:
>
> Mon Jun 2 22:51:30 EDT 2008
> rootfs / rootfs rw 0 0
> none /sys sysfs rw,nosuid,nodev,noexec 0 0
> none /proc proc rw,nosuid,nodev,noexec 0 0
> udev /dev tmpfs rw,relatime 0 0
> fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
> /dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 / ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
> /dev/disk/by-uuid/c7ada654-6e09-4400-ae85-c93e7fcd99d7 /dev/.static/dev ext3 rw,relatime,errors=remount-ro,data=ordered 0 0
> tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
> tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
> tmpfs /lib/modules/2.6.24-17-generic/volatile tmpfs rw,relatime 0 0
> tmpfs /dev/shm tmpfs rw,relatime 0 0
> devpts /dev/pts devpts rw,relatime 0 0
> tmpfs /var/run tmpfs rw,nosuid,nodev,noexec 0 0
> tmpfs /var/lock tmpfs rw,nosuid,nodev,noexec 0 0
> /dev/sda5 /home ext3 rw,relatime,data=ordered 0 0
> securityfs /sys/kernel/security securityfs rw,relatime 0 0
> rpc_pipefs /var/lib/nfs/rpc_pipefs rpc_pipefs rw,relatime 0 0
> automount(pid5466) /net autofs rw,relatime,fd=4,pgrp=5466,timeout=300,minproto=2,maxproto=4,indirect 0 0
> automount(pid5367) /mnt autofs rw,relatime,fd=4,pgrp=5367,timeout=60,minproto=2,maxproto=4,indirect 0 0
> automount(pid5404) /nfs autofs rw,relatime,fd=4,pgrp=5404,timeout=3600,minproto=2,maxproto=4,indirect 0 0
> automount(pid5532) /user autofs rw,relatime,fd=4,pgrp=5532,timeout=300,minproto=2,maxproto=4,indirect 0 0
> automount(pid5612) /export/autofs autofs rw,relatime,fd=4,pgrp=5612,timeout=60,minproto=2,maxproto=4,indirect 0 0
> automount(pid5684) /opt/net autofs rw,relatime,fd=4,pgrp=5684,timeout=36000,minproto=2,maxproto=4,indirect 0 0
> nfsd /proc/fs/nfsd nfsd rw,relatime 0 0
Well, I don't see the duplicate entry you mentioned above. It is
possible that there are multiple automount daemons for the same
mountpoint during a mount or expire event. That's just normal
operations.
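For checking a saved /proc/mounts snapshot for repeated autofs mount points, a rough Python sketch could look like this. The function name is invented for illustration, and the check covers only the mount table, not the process list.

```python
from collections import Counter


def duplicate_mountpoints(mounts_text, fstype="autofs"):
    """Return mount points that appear more than once for the given fstype."""
    counts = Counter()
    for line in mounts_text.splitlines():
        fields = line.split()
        # /proc/mounts fields: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[2] == fstype:
            counts[fields[1]] += 1
    return sorted(mp for mp, n in counts.items() if n > 1)


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        print(duplicate_mountpoints(f.read()))
```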
How about sending your ps listing and maybe a gdb backtrace of the daemon
(if your system will allow you to get that)?
Cheers,
Jeff
* Re: System hanging in /proc/mounts almost every night... help!
From: Paul Smith @ 2008-06-04 15:55 UTC
To: Jeff Moyer; +Cc: autofs
On Wed, 2008-06-04 at 09:00 -0400, Jeff Moyer wrote:
> This .Trash madness has to end. It keeps autofs from expiring mounts
> in other situations. *grumble*
I'm looking into this and it absolutely does not make any sense, you're
right. I've got a sample patch that will solve it that I'll propose to
the Gnome folks, but I think personally that my patch is just a quick
hack and something more comprehensive would be better (the types of
filesystems that are checked should be configurable, at least through
gconf or similar--my patch just hardcodes a check for "autofs").
Note this won't help the problem of non-expiring mounts: if you DO have
a .Trash file on the filesystem then, as you point out, this will keep
the partition mounted all the time. Gross.
FYI, it happened again last night and again, the last thing in the log
was the .Trash thing. However, I don't think that's the instigator per
se but rather a signpost. It appears that whenever any filesystem is
mounted or unmounted, that silly .Trash search is triggered; apparently
the gvfsd-trash applet has an inotify or dbus or something set up to
find out when a mount or unmount happens.
I think it's the mount or (more likely) unmount that is both causing the
problem, AND triggering the .Trash search.
Anyway, I get those annoying .Trash messages many times a day but it
never seems to cause any problem. But soon after I lock my system and
go home for the night, then it happens. Last night I left about 7pm,
and the problem occurred at 9:43pm.
I'll set up the debugging you asked for, thanks. Stay tuned for
tomorrow!
* Re: System hanging in /proc/mounts almost every night... help!
From: Paul Smith @ 2008-06-05 15:25 UTC
To: Jeff Moyer; +Cc: autofs
On Wed, 2008-06-04 at 09:00 -0400, Jeff Moyer wrote:
> My advice for further debugging is to enable autofs debug logging
> (see http://people.redhat.com/jmoyer), and when hung, get the output
> from sysrq-t.
Oh for...
and of course it didn't hang last night, for the first time. I do have
to say that I installed a new kernel from the Ubuntu repositories
yesterday, with some security etc. fixes. I didn't see any changes in
that update related to autofs or mounts but you never know.
I'll keep testing and see if it recurs.