* Umount call getting stuck, hanging nfs?
From: Mike Marion @ 2006-05-04 20:41 UTC
To: autofs
We're seeing hangs on some of our hosts, at one site only. They seem to
involve the same filer and even the same paths, but what I see is odd.
The kernel rpciod thread is even stuck in state D, seemingly because the
umount call is.
i.e.
root 20302 1.2 0.0 2468 584 ? D 12:01 2:39 /bin/umount //usr/local/projects/dsp/qdsp6
root 6270 0.0 0.0 0 0 ? D Apr28 3:17 [rpciod]
Unfortunately, once this happens, any new mounts fail. I can't even
stat the path above via df. Basically the whole NFS layer is stuck.
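Editorial sketch, not part of the original post: state D in the ps output
above means uninterruptible sleep. One way to list such processes without
ps is to read the state field from /proc/<pid>/stat. The function names
here are made up for illustration, and the parsing assumes the Linux
proc(5) layout where the command name is wrapped in parentheses.

```python
import os


def parse_stat(stat_text):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split around the first '(' and the last ')'.
    open_idx = stat_text.index("(")
    close_idx = stat_text.rindex(")")
    pid = int(stat_text[:open_idx])
    comm = stat_text[open_idx + 1:close_idx]
    state = stat_text[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    stuck = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            continue  # process exited, or its stat line was unreadable
        if state == "D":
            stuck.append((pid, comm))
    return stuck
```

Calling uninterruptible_processes() on an affected host would be expected
to include the umount and rpciod entries shown in the ps output.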
Using autofs-4.1.4 with
autofs-4.1.4-misc-fixes.patch
autofs-4.1.4-multi-parse-fix.patch
autofs-4.1.4-non-replicated-ping.patch
patches (slight possibility one of the above is missing, but I'm pretty
damn sure they're in there).
Mounts are TCP based so I can't even use a spoofed interface to force a
umount.
I'm also wondering why there's an extra / in the path on the umount
call, and whether something is wrong on the filer (NetApp) so that its
response to the umount tickles a bug there. I haven't found much online
yet, though.
Oh, and umount call shows socks in fd list that don't appear to exist
anymore:
:~# ls -l /proc/20302/fd
total 3
dr-x------ 2 root root 0 May 4 15:26 .
dr-xr-xr-x 3 root root 0 May 4 12:01 ..
lrwx------ 1 root root 64 May 4 15:26 0 -> /dev/null
l-wx------ 1 root root 64 May 4 15:26 1 -> pipe:[4528730]
l-wx------ 1 root root 64 May 4 15:26 2 -> pipe:[4528730]
:~ # socklist | grep 4528730
:~ #
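Editorial sketch, not part of the original post: the link targets in the
listing read pipe:[4528730], which on Linux denotes a pipe rather than a
socket (sockets show up as socket:[inode]). That would presumably explain
why grepping socklist output for the inode finds nothing. The helper names
are hypothetical; the code just classifies /proc/<pid>/fd link targets.

```python
import os
import re

# Anonymous kernel objects appear as "pipe:[inode]" or "socket:[inode]".
FD_TARGET = re.compile(r"^(pipe|socket):\[(\d+)\]$")


def classify_fd_target(target):
    """Return (kind, inode) for a /proc/<pid>/fd link target."""
    match = FD_TARGET.match(target)
    if match:
        return match.group(1), int(match.group(2))
    return "path", None  # regular file, device node, etc.


def describe_fds(pid):
    """Map each open fd number of `pid` to (kind, inode, target)."""
    fd_dir = "/proc/%d/fd" % pid
    result = {}
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir and readlink
        kind, inode = classify_fd_target(target)
        result[int(name)] = (kind, inode, target)
    return result
```

For the listing above, fds 1 and 2 would classify as ("pipe", 4528730)
and fd 0 as a plain path.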
The problem happens on hosts using the same autofs daemons with or
without direct maps enabled. I'm not really sure if it's technically an
autofs issue (unless there's a glitch in how it's calling umount and its
timing there) or an NFS layer issue.
SLES9-SP1, kernel 2.6.5-7.147-smp (from suse-9.2 updates) on
x86_64 hosts.
--
Mike Marion-Unix SysAdmin/Staff Engineer-http://www.qualcomm.com
Drew Carey: "Look, this is an odd question, but you're kind of cute and you're
pretty nice to me. Are you drunk? It's OK if you are." => Drew Carey Show.
* Re: Umount call getting stuck, hanging nfs?
From: Jeff Moyer @ 2006-05-04 21:20 UTC
To: Mike Marion; +Cc: autofs
==> Regarding [autofs] Umount call getting stuck, hanging nfs?; Mike Marion <mmarion@qualcomm.com> adds:
mmarion> Seeing some of our hosts in only one site having problems with
mmarion> hangs occurring. Seems to be to same filer and even same paths,
mmarion> but what I see is odd. The kernel rpciod thread is even stuck in
mmarion> state D, seemingly because the umount call is.
mmarion> i.e.
mmarion> root 20302 1.2 0.0 2468 584 ? D 12:01 2:39 /bin/umount //usr/local/projects/dsp/qdsp6
mmarion> root 6270 0.0 0.0 0 0 ? D Apr28 3:17 [rpciod]
mmarion> unfortunately, once this happens, any new mounts will fail. Can't
mmarion> even stat the path above via df. Basically the whole NFS layer is
mmarion> stuck.
mmarion> Using autofs-4.1.4 with
mmarion> autofs-4.1.4-misc-fixes.patch
mmarion> autofs-4.1.4-multi-parse-fix.patch
mmarion> autofs-4.1.4-non-replicated-ping.patch
mmarion> patches (slight possibility one of the above is missing, but I'm
mmarion> pretty damn sure they're in there).
mmarion> Mounts are TCP based so I can't even use a spoofed interface to
mmarion> force a umount.
mmarion> Wondering why the extra / in the path on the umount call as well.
mmarion> Also wondering if there's something in the filer (netapp) wrong
mmarion> that's giving some kind of response to the umount that's tickling
mmarion> a bug there. Not much I've found online yet though.
mmarion> Oh, and umount call shows socks in fd list that don't appear to
mmarion> exist anymore:
mmarion> :~# ls -l /proc/20302/fd
mmarion> total 3
mmarion> dr-x------ 2 root root 0 May 4 15:26 .
mmarion> dr-xr-xr-x 3 root root 0 May 4 12:01 ..
mmarion> lrwx------ 1 root root 64 May 4 15:26 0 -> /dev/null
mmarion> l-wx------ 1 root root 64 May 4 15:26 1 -> pipe:[4528730]
mmarion> l-wx------ 1 root root 64 May 4 15:26 2 -> pipe:[4528730]
mmarion> :~ # socklist | grep 4528730
mmarion> :~ #
mmarion> Problem happens on hosts using same autofs daemons with or without
mmarion> direct maps enabled. Not really sure if it's technically an
mmarion> autofs issue (unless there's a glitch in how it's calling umount
mmarion> and it's timing there) or an NFS layer issue.
mmarion> SLES9-SP1, kernel 2.6.5-7.147-smp (from suse-9.2 updates) on
mmarion> x86_64 hosts.
Really sounds like an NFS problem. I'd post to the NFS list, and they'll
likely ask for over-the-wire messages.
-Jeff
* Re: Umount call getting stuck, hanging nfs?
From: Mike Marion @ 2006-05-04 22:07 UTC
To: autofs
On Thu, May 04, 2006 at 05:20:05PM -0400, Jeff Moyer wrote:
> Really sounds like an NFS problem. I'd post to the NFS list, and they'll
> likely ask for over-the-wire messages.
I'll give that a shot. The only problems are that once a box is in this
state, it never talks to the NFS server again when stats or other
operations hit the stuck path(s), and it'd be hard to catch one in the
act of getting stuck.
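Editorial sketch, not part of the original post: one common way to catch
an intermittent hang "in the act" is to leave a ring-buffer packet capture
running, so the traffic leading up to the hang is still on disk afterwards.
The sketch only builds a tcpdump command line using its -C (rotate after N
million bytes) and -W (number of files to keep) options. The interface
name, sizes, and output prefix are assumptions, and "western" is the filer
name taken from the map entries quoted later in this thread. Filtering on
port 2049 captures NFS itself but not mount or portmapper traffic.

```python
import subprocess


def build_capture_command(server, interface="eth0", file_size_mb=50,
                          file_count=10, prefix="/var/tmp/nfs-hang"):
    """Build a tcpdump argv that keeps a rotating set of capture files."""
    return [
        "tcpdump",
        "-i", interface,          # capture interface (assumed name)
        "-s", "0",                # do not truncate packets
        "-C", str(file_size_mb),  # start a new file after this many MB
        "-W", str(file_count),    # keep at most this many files
        "-w", prefix + ".pcap",   # base name for the capture files
        "host", server, "and", "port", "2049",
    ]


def start_capture(server, **kwargs):
    """Start the capture in the background; the caller stops it later."""
    return subprocess.Popen(build_capture_command(server, **kwargs))
```

With the defaults this keeps roughly the last 500 MB of NFS traffic to
the filer, which may or may not be enough depending on load.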
--
Mike Marion-Unix SysAdmin/Staff Engineer-http://www.qualcomm.com
George: "You don't work in the rain? You're a mailman... Neither Rain, nor
sleet, nor sno-IT'S THE FIRST ONE!" ==> Seinfeld
* Re: Umount call getting stuck, hanging nfs?
From: Ian Kent @ 2006-05-05 1:59 UTC
To: Mike Marion; +Cc: autofs
On Thu, 4 May 2006, Mike Marion wrote:
> Seeing some of our hosts in only one site having problems with hangs
> occurring. Seems to be to same filer and even same paths, but what I
> see is odd. The kernel rpciod thread is even stuck in state D,
> seemingly because the umount call is.
I think that might be the other way around.
>
> i.e.
> root 20302 1.2 0.0 2468 584 ? D 12:01 2:39
> /bin/umount //usr/local/projects/dsp/qdsp6
>
> root 6270 0.0 0.0 0 0 ? D Apr28 3:17 [rpciod]
>
> unfortunately, once this happens, any new mounts will fail. Can't even
> stat the path above via df. Basically the whole NFS layer is stuck.
Tell us what the maps look like.
>
> Using autofs-4.1.4 with
> autofs-4.1.4-misc-fixes.patch
> autofs-4.1.4-multi-parse-fix.patch
> autofs-4.1.4-non-replicated-ping.patch
> patches (slight possibility one of the above is missing, but I'm pretty
> damn sure they're in there).
>
> Mounts are TCP based so I can't even use a spoofed interface to force a
> umount.
>
> Wondering why the extra / in the path on the umount call as well. Also
> wondering if there's something in the filer (netapp) wrong that's giving
> some kind of response to the umount that's tickling a bug there. Not
> much I've found online yet though.
The extra "/" will be a bug but it gets ignored by the kernel in this
case.
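Editorial sketch, not part of the original reply: a quick way to confirm
that the doubled leading slash makes no difference to path resolution on a
given host is to compare the device and inode numbers both spellings
resolve to. The function name is made up for illustration. Note that POSIX
leaves the meaning of exactly two leading slashes implementation-defined;
Linux treats them like a single one.

```python
import os


def same_object(path_a, path_b):
    """True if both paths resolve to the same device and inode."""
    stat_a = os.stat(path_a)
    stat_b = os.stat(path_b)
    return (stat_a.st_dev, stat_a.st_ino) == (stat_b.st_dev, stat_b.st_ino)
```

For example, same_object("/usr", "//usr") should be true on Linux. On a
host that is already hung, the stat call itself may block if the path is
on the stuck mount, so try it on a healthy path.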
>
> Oh, and umount call shows socks in fd list that don't appear to exist
> anymore:
> :~# ls -l /proc/20302/fd
> total 3
> dr-x------ 2 root root 0 May 4 15:26 .
> dr-xr-xr-x 3 root root 0 May 4 12:01 ..
> lrwx------ 1 root root 64 May 4 15:26 0 -> /dev/null
> l-wx------ 1 root root 64 May 4 15:26 1 -> pipe:[4528730]
> l-wx------ 1 root root 64 May 4 15:26 2 -> pipe:[4528730]
> :~ # socklist | grep 4528730
> :~ #
>
> Problem happens on hosts using same autofs daemons with or without
> direct maps enabled. Not really sure if it's technically an autofs
> issue (unless there's a glitch in how it's calling umount and it's
> timing there) or an NFS layer issue.
>
> SLES9-SP1, kernel 2.6.5-7.147-smp (from suse-9.2 updates) on
> x86_64 hosts.
>
* Re: Umount call getting stuck, hanging nfs?
From: Mike Marion @ 2006-05-08 23:34 UTC
To: Ian Kent; +Cc: autofs
On Fri, May 05, 2006 at 09:59:14AM +0800, Ian Kent wrote:
> I think that might be the other way around.
You mean rpciod is already hung before the umount, and the umount is
then hung due to the rpciod?
> > /bin/umount //usr/local/projects/dsp/qdsp6
> >
> > root 6270 0.0 0.0 0 0 ? D Apr28 3:17 [rpciod]
> >
> > unfortunately, once this happens, any new mounts will fail. Can't even
> > stat the path above via df. Basically the whole NFS layer is stuck.
>
> Tell us what the maps look like.
The noted path is like this:
/prj/dsp/qdsp6 -rw,acdirmin=1,acdirmax=5,acregmin=1,acregmax=5,rsize=32768,wsize=32768,noquota western:/vol/eng_aus_0004/qdsp6
And the rest of the file is very similar. The large number of options
came from trial and error with performance problems we were having, and
the mounts default to TCP. The odd thing is that we basically never
have mounts hanging off this /prj tree in San Diego, or in any other
office except one. And there it only hangs when talking to their 2
local NetApp filers.
--
Mike Marion-Unix SysAdmin/Staff Engineer-http://www.qualcomm.com
A nerd is someone whose life revolves around computers and technology.
A geek is someone whose life revolves around computers and technology...
and likes it! - Stolen from a /. post.
* Re: Umount call getting stuck, hanging nfs?
From: Mike Marion @ 2006-05-09 1:31 UTC
To: Ian Kent; +Cc: autofs
On Mon, May 08, 2006 at 04:34:38PM -0700, Mike Marion wrote:
> > Tell us what the maps look like.
>
> The noted path is like this:
> /prj/dsp/qdsp6 -rw,acdirmin=1,acdirmax=5,acregmin=1,acregmax=5,rsize=32768,wsize=32768,noquota
> western:/vol/eng_aus_0004/qdsp6
Oops.. that's the direct map entry and the host in question is using
program maps (the problem happens on hosts using direct maps or program
maps).
The path in question on that host looks like:
/usr/local/projects/dsp/qdsp6 -rw,noquota western:/vol/eng_aus_0004/qdsp6
Hmm.. I didn't realize the program map hosts weren't getting the other
options. Though like I said, it happens both with the entry above via
program maps and with the direct map entry shown in the quoted bit.
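Editorial sketch, not part of the original post: the difference between
the two entries can be made explicit by splitting each into key, options,
and location and diffing the option sets. This is a deliberately
simplified parser with made-up function names; it assumes a single-line
"key -options location" entry and ignores multi-mount entries, line
continuations, and other parts of the real autofs map syntax.

```python
def parse_map_entry(line):
    """Split 'key -opt1,opt2 location' into (key, options, location)."""
    fields = line.split()
    key = fields[0]
    options = set()
    rest = fields[1:]
    if rest and rest[0].startswith("-"):
        # Strip the leading '-' and split the comma-separated option list.
        options = set(rest[0][1:].split(","))
        rest = rest[1:]
    location = " ".join(rest)
    return key, options, location


def missing_options(reference_line, other_line):
    """Options present in reference_line but absent from other_line."""
    _, ref_opts, _ = parse_map_entry(reference_line)
    _, other_opts, _ = parse_map_entry(other_line)
    return sorted(ref_opts - other_opts)
```

Applied to the direct map entry and the program map entry from this
thread, missing_options reports the attribute cache settings plus rsize
and wsize as absent from the program map version.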
--
Mike Marion-Unix SysAdmin/Staff Engineer-http://www.qualcomm.com
Mayor: "Their little festival should pump some money into the local economy."
Cartman: "They're hippies!! They don't have any money!!" ==> South Park.
* Re: Umount call getting stuck, hanging nfs?
From: Ian Kent @ 2006-05-09 11:21 UTC
To: Mike Marion; +Cc: autofs
On Mon, 8 May 2006, Mike Marion wrote:
> On Fri, May 05, 2006 at 09:59:14AM +0800, Ian Kent wrote:
>
> > I think that might be the other way around.
>
> You mean rpciod is already hung before the umount, and the umount is
> then hung due to the rpciod?
My mistake. We don't actually know which way around it is.
>
> > > /bin/umount //usr/local/projects/dsp/qdsp6
> > >
> > > root 6270 0.0 0.0 0 0 ? D Apr28 3:17 [rpciod]
> > >
> > > unfortunately, once this happens, any new mounts will fail. Can't even
> > > stat the path above via df. Basically the whole NFS layer is stuck.
> >
> > Tell us what the maps look like.
>
> The noted path is like this:
> /prj/dsp/qdsp6 -rw,acdirmin=1,acdirmax=5,acregmin=1,acregmax=5,rsize=32768,wsize=32768,noquota
> western:/vol/eng_aus_0004/qdsp6
>
> And the rest of the file is very similar. The huge amount of options
> came from trial and error with performance problems we were having, and
> they're defaulting to tcp mounts. The odd thing is that we basically
> never have mounts hanging off this /prj tree in San Diego, or any other
> office except for one. And it's only hanging when talking to their 2
> local NetApp filers.
This does look a bit like an NFS problem.
Have you checked the versions of each of the subsystems involved and
compared working and non-working hosts?
Ian
Thread overview: 7+ messages
2006-05-04 20:41 Umount call getting stuck, hanging nfs? Mike Marion
2006-05-04 21:20 ` Jeff Moyer
2006-05-04 22:07 ` Mike Marion
2006-05-05 1:59 ` Ian Kent
2006-05-08 23:34 ` Mike Marion
2006-05-09 1:31 ` Mike Marion
2006-05-09 11:21 ` Ian Kent