From: Ian Kent <raven@themaw.net>
To: Jim Carter <jimc@math.ucla.edu>
Cc: autofs@linux.kernel.org
Subject: Re: clients suddenly start hanging (was: (no subject))
Date: Mon, 23 Jun 2008 12:46:07 +0800
Message-ID: <1214196368.3098.41.camel@raven.themaw.net>
In-Reply-To: <Pine.LNX.4.64.0806222024150.18218@xena.cft.ca.us>
On Sun, 2008-06-22 at 20:49 -0700, Jim Carter wrote:
> On Sat, 21 Jun 2008, Ian Kent wrote:
>
> > Ooops, I didn't pay enough attention when I read the pthread barrier man
> > page. That isn't actually an error return but now I'm wondering why I
> > haven't seen it in my test, very odd.
> >
> > Let me fix it and we'll try again.
> >
> > There are other problems but I need to know if this is a viable approach
> > before going further with it.
> >
> > Try this instead.
>
> OK!!! The test program has been running for 28 hours continuously, 32
> hours total, and is still going, having done 37300 mount-unmount cycles so
> far. There are normally 244 filesystems mounted from 125 different
> machines.
Sounds promising.
Using a pthread barrier is clearly the way to go here.
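The slip mentioned above comes from a documented POSIX behaviour: pthread_barrier_wait() returns PTHREAD_BARRIER_SERIAL_THREAD to exactly one (unspecified) waiter and 0 to all the others, so code that treats any non-zero return as a failure will misreport one thread per barrier cycle. Below is a minimal sketch of checking the return value correctly. barrier_wait_checked(), count_serial_waiters() and struct barrier_demo are hypothetical names for illustration, not code from the autofs patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WAITERS 16

/* Wrap pthread_barrier_wait(): PTHREAD_BARRIER_SERIAL_THREAD is a success
 * value handed to exactly one waiter, not an error.
 * Returns 1 for that serial waiter, 0 for the others, -1 on a real error. */
static int barrier_wait_checked(pthread_barrier_t *b)
{
    int rc = pthread_barrier_wait(b);

    if (rc == PTHREAD_BARRIER_SERIAL_THREAD)
        return 1;
    if (rc == 0)
        return 0;
    fprintf(stderr, "pthread_barrier_wait: %s\n", strerror(rc));
    return -1;
}

struct barrier_demo {
    pthread_barrier_t barrier;
    pthread_mutex_t lock;      /* protects the two counters below */
    int serial_count;
    int error_count;
};

static void *demo_waiter(void *arg)
{
    struct barrier_demo *d = arg;
    int r = barrier_wait_checked(&d->barrier);

    pthread_mutex_lock(&d->lock);
    if (r == 1)
        d->serial_count++;
    else if (r < 0)
        d->error_count++;
    pthread_mutex_unlock(&d->lock);
    return NULL;
}

/* Start nthreads threads that meet at one barrier and report how many saw
 * PTHREAD_BARRIER_SERIAL_THREAD (should be exactly 1). Returns -1 on bad
 * input, barrier init failure or a genuine wait error. */
static int count_serial_waiters(unsigned nthreads)
{
    pthread_t tids[MAX_WAITERS];
    struct barrier_demo d;
    unsigned i;
    int rc;

    if (nthreads == 0 || nthreads > MAX_WAITERS)
        return -1;
    memset(&d, 0, sizeof d);
    if (pthread_barrier_init(&d.barrier, NULL, nthreads) != 0)
        return -1;
    pthread_mutex_init(&d.lock, NULL);

    for (i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, demo_waiter, &d) != 0) {
            /* Threads already started are blocked on the barrier and
             * cannot be unwound, so this demo gives up loudly. */
            fprintf(stderr, "pthread_create failed\n");
            abort();
        }
    }
    for (i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    rc = pthread_barrier_destroy(&d.barrier); /* all waiters have returned */
    assert(rc == 0);
    (void)rc;
    pthread_mutex_destroy(&d.lock);

    return d.error_count ? -1 : d.serial_count;
}
```

For example, count_serial_waiters(4) should come back as 1: one of the four threads gets the serial value and the other three get 0, and none of them counts as an error.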
>
> There have been no hung processes, i.e. automount either mounts the
> filesystem or returns ENOENT in response to readdir(), within 120 secs.
> There have been no omitted unmounts, i.e. every mounted filesystem (that
> was unused) was unmounted within 1800 secs (the default timeout of 300 secs
> is used).
Mmmm... I wonder what's going on with that.
My test showed a problem with expires.
I'm fairly sure there was corruption of the control file handle and I'm
trying to fix that.
The kernel patches are meant to fix occasional incorrect ENOENT and
EBUSY returns, but this could also be something in the daemon. Let's see
how an updated version of revision 8 of this patch goes before we look
more deeply into this.
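The pass/fail criterion quoted above can be sketched as a probe: automount either mounts the filesystem or the directory read fails with ENOENT within 120 secs, and anything slower counts as hung. The probe runs opendir()/readdir() in a detached worker thread and waits on a condition variable with a deadline. This is a hypothetical illustration, not Jim's test program, whose source is not in this message. probe_dir(), probe_worker(), struct probe and the probe_result values are assumed names, ENOENT is accepted from either opendir() or readdir(), and the sketch only classifies the outcome; it takes no backtraces.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum probe_result { PROBE_OK, PROBE_ENOENT, PROBE_OTHER_ERROR, PROBE_HUNG };

/* Shared by caller and worker; whichever side drops the last reference
 * frees it, because a hung worker may outlive the caller. */
struct probe {
    pthread_mutex_t lock;
    pthread_cond_t done_cv;
    char *path;
    int done;   /* worker finished opendir/readdir */
    int err;    /* 0 on success, otherwise the errno seen */
    int refs;   /* 2 while both caller and worker hold it */
};

static void probe_free(struct probe *p)
{
    pthread_cond_destroy(&p->done_cv);
    pthread_mutex_destroy(&p->lock);
    free(p->path);
    free(p);
}

static void *probe_worker(void *arg)
{
    struct probe *p = arg;
    int err = 0, last;
    DIR *dir = opendir(p->path);   /* may block while automount works */

    if (dir == NULL) {
        err = errno;
    } else {
        errno = 0;
        if (readdir(dir) == NULL && errno != 0)
            err = errno;
        closedir(dir);
    }

    pthread_mutex_lock(&p->lock);
    p->err = err;
    p->done = 1;
    pthread_cond_signal(&p->done_cv);
    last = (--p->refs == 0);
    pthread_mutex_unlock(&p->lock);
    if (last)
        probe_free(p);
    return NULL;
}

/* Classify one probe of path, giving up after timeout_sec seconds. */
static enum probe_result probe_dir(const char *path, int timeout_sec)
{
    struct probe *p;
    struct timespec deadline;
    enum probe_result res;
    pthread_t tid;
    int rc = 0, last;

    assert(timeout_sec > 0);
    p = calloc(1, sizeof *p);
    if (p == NULL)
        return PROBE_OTHER_ERROR;
    p->path = strdup(path);
    if (p->path == NULL) {
        free(p);
        return PROBE_OTHER_ERROR;
    }
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->done_cv, NULL);
    p->refs = 2;

    if (pthread_create(&tid, NULL, probe_worker, p) != 0) {
        probe_free(p);
        return PROBE_OTHER_ERROR;
    }
    pthread_detach(tid);   /* a hung worker is simply abandoned */

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_sec;

    pthread_mutex_lock(&p->lock);
    /* rc stays 0 on spurious wakeups; any non-zero rc (normally
     * ETIMEDOUT) ends the wait. */
    while (!p->done && rc == 0)
        rc = pthread_cond_timedwait(&p->done_cv, &p->lock, &deadline);
    if (!p->done)
        res = PROBE_HUNG;
    else if (p->err == 0)
        res = PROBE_OK;
    else if (p->err == ENOENT)
        res = PROBE_ENOENT;
    else
        res = PROBE_OTHER_ERROR;
    last = (--p->refs == 0);
    pthread_mutex_unlock(&p->lock);
    if (last)
        probe_free(p);
    return res;
}
```

A harness would call something like probe_dir(path, 120) on each autofs-managed directory in turn. The worker is detached rather than cancelled because a thread blocked in the kernel on an autofs or NFS wait cannot be interrupted safely. The reference count is there so that whichever side finishes last frees the shared state.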
>
> There was one error reported. I ran the test program, and someone powered
> off a workstation whose filesystem I had mounted. The resulting NFS
> timeout(s) caused the program to think the test thread was hung, so it
> tried to produce a backtrace, but there was a bug and the trace was spoiled
> (you've seen these spoiled traces before in files I've sent in). I
> improved the trace procedure and attempted to restart. I did a forced
> umount by "kill -USR1 $PID", but automount said on syslog:
>
> Jun 21 15:58:47 bustamove automount[2880]: master.c:957: assertion failed:
> ap->state == ST_READY
Oh... that's not good. I haven't looked closely at the prune event
handling for quite some time. I expect I've broken it with other changes
since I last checked.
>
> And it didn't unmount anything. So I rebooted and started the test on a
> clean machine.
>
> There is a pattern of failure that may not be automount's fault. On
> almost exactly 0.1% of the attempted mounts, the readdir eventually fails
> with ENOENT. The test program leaves these filesystems alone for 1800
> secs, then tries again to mount and test them, which invariably succeeds. I
> don't see any pattern to the type of the machine: workstation, server,
> compute node, heavily loaded, totally idle, etc. But if multiple
> filesystems from one machine (submount) are unmounted and remounted at the
> "same" time (0.2 secs apart), if any one fails, there is a tendency for
> several others to also fail.
But we have to assume it's autofs, for now at least.
Ian