* autmount hangs occasionally on bind-mounts
@ 2010-09-27 5:55 Sebastian Hetze
2010-09-28 3:30 ` Ian Kent
0 siblings, 1 reply; 4+ messages in thread
From: Sebastian Hetze @ 2010-09-27 5:55 UTC (permalink / raw)
To: autofs
Hi *,
we are suffering from some sort of race condition that causes
automount to hang:
[351841.568061] INFO: task automount:22055 blocked for more than 120 seconds.
[351841.568689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[351841.569717] automount D b983e7f6 0 22055 1 0x00000000
[351841.570252] e0ca7ef4 00000082 f3c38000 b983e7f6 00013fde eaed6000 f63af880 f5037c00
[351841.571308] c0863320 c0863320 f30de480 f30de718 c5589320 00000002 b9841648 00013fde
[351841.572316] f30de718 f72ceff4 f72ceff0 ffffffff e0ca7f20 c059fd3e e0ca7f14 f30de480
[351841.573364] Call Trace:
[351841.573686] [<c059fd3e>] __mutex_lock_slowpath+0xbe/0x120
[351841.574130] [<c059fc60>] mutex_lock+0x20/0x40
[351841.574496] [<c0202732>] do_rmdir+0x52/0xe0
[351841.574878] [<c04b67ad>] ? sys_socketcall+0x1cd/0x2a0
[351841.575266] [<c0202820>] sys_rmdir+0x10/0x20
[351841.575781] [<c010968c>] syscall_call+0x7/0xb
The error occurs occasionally, sometimes after one day, sometimes after
one week. It occurs with all kinds of kernel versions and with all kinds
of automount versions. Currently we are running autofs 5.0.4 and kernel
2.6.31-22-generic-pae from Ubuntu.
I have tried to find out what is going wrong and have come
to suspect that this might be a systematic problem
with automount.
In the failing system we are running automount locally, so automount
uses bind-mounts to make file system branches accessible somewhere
else.
The blocking sys_rmdir happens on such a bind-mount.
It appears that the bind-mount can be unmounted regardless of how
many open files there are. Since the open files live on the
"real file system", there seems to be no notion of usage for
the bind mountpoint, either in automount or in the kernel.
So what I suspect happens is this: automount tries to umount
the branch after the timeout has passed, the umount succeeds
although there are open files, and automount proceeds with
removing the directory. Occasionally, before the sys_rmdir succeeds,
one of the processes with open files on
the branch accesses this file/directory, causing the kernel
to hang.
Is this a valid explanation? Can you accept this as a bug
and do something about it?
Best regards,
Sebastian
--
Sebastian Hetze
Linux Information Systems AG
Bundesallee 93, D-12161 Berlin
Fon: +49 30 818686-45, Fax: +49 30 818686-78
s.hetze@linux-ag.com, http://www.linux-ag.com
----------------------------------------------------------
Registered office: Putzbrunner Str. 71, 81739 München
District Court München: HRB 128 019
Executive Board: Rudolf Strobl
Supervisory Board: Michael Tarabochia (Chairman)
* Re: autmount hangs occasionally on bind-mounts
2010-09-27 5:55 autmount hangs occasionally on bind-mounts Sebastian Hetze
@ 2010-09-28 3:30 ` Ian Kent
2010-09-28 10:11 ` Sebastian Hetze
[not found] ` <20100928101145.92E17409000B@mail.linux-ag.de>
0 siblings, 2 replies; 4+ messages in thread
From: Ian Kent @ 2010-09-28 3:30 UTC (permalink / raw)
To: Sebastian Hetze; +Cc: autofs
On Mon, 2010-09-27 at 07:55 +0200, Sebastian Hetze wrote:
> Hi *,
>
> we are suffering from some sort of race condition that causes
> automount to hang:
>
> [351841.568061] INFO: task automount:22055 blocked for more than 120 seconds.
> [351841.568689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [351841.569717] automount D b983e7f6 0 22055 1 0x00000000
> [351841.570252] e0ca7ef4 00000082 f3c38000 b983e7f6 00013fde eaed6000 f63af880 f5037c00
> [351841.571308] c0863320 c0863320 f30de480 f30de718 c5589320 00000002 b9841648 00013fde
> [351841.572316] f30de718 f72ceff4 f72ceff0 ffffffff e0ca7f20 c059fd3e e0ca7f14 f30de480
> [351841.573364] Call Trace:
> [351841.573686] [<c059fd3e>] __mutex_lock_slowpath+0xbe/0x120
> [351841.574130] [<c059fc60>] mutex_lock+0x20/0x40
> [351841.574496] [<c0202732>] do_rmdir+0x52/0xe0
> [351841.574878] [<c04b67ad>] ? sys_socketcall+0x1cd/0x2a0
> [351841.575266] [<c0202820>] sys_rmdir+0x10/0x20
> [351841.575781] [<c010968c>] syscall_call+0x7/0xb
This is only half the story.
I think you'll find another process that is waiting on the expire via
autofs4_revalidate() and holds the mutex that the above process is
waiting on.
This is a known problem that has been present for years and cannot be
resolved using the current automount framework.
I don't know why we're suddenly seeing people get caught by it recently,
but we are.
Assuming you are seeing the problem I think you are, you should be able
to work around it by using the "browse" option on your autofs mounts.
This should work OK as long as your maps are not too large.
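
For readers unfamiliar with the option, a master map entry using it might look like the sketch below. The mount point /srv/auto, the map file /etc/auto.local and the map key are hypothetical and not taken from this thread. The "browse" pseudo-option and the bind fstype follow the descriptions in auto.master(5) and autofs(5), and should be checked against the man pages of the installed version.

```
# /etc/auto.master (hypothetical paths)
/srv/auto   /etc/auto.local   browse

# /etc/auto.local (hypothetical key; a local directory made available via a bind mount)
projects   -fstype=bind   :/export/projects
```

With browse enabled, automount pre-creates the mount point directories for the map keys, which is presumably why it keeps the daemon away from the rmdir path shown in the trace, and also why very large maps make the option expensive.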
>
> The error occurs occasionally, sometimes after one day, sometimes after
> one week. It occurs with all kinds of kernel versions and with all kinds
> of automount versions. Currently we are running autofs 5.0.4 and kernel
> 2.6.31-22-generic-pae from Ubuntu.
>
> I have tried to find out what is going wrong and have come
> to suspect that this might be a systematic problem
> with automount.
>
> In the failing system we are running automount locally, so automount
> uses bind-mounts to make file system branches accessible somewhere
> else.
>
> The blocking sys_rmdir happens on such a bind-mount.
>
> It appears that the bind-mount can be unmounted regardless of how
> many open files there are. Since the open files live on the
> "real file system", there seems to be no notion of usage for
> the bind mountpoint, either in automount or in the kernel.
Rubbish, open files elevate the reference count on certain kernel
objects within the mounted file system. If there is an open file within
a mounted filesystem the kernel knows about it.
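
The point about open files can be checked from user space. The following is a minimal sketch, assuming a Linux /proc file system, that lists which processes hold files open below a given mount point by reading the /proc/<pid>/fd symlinks, roughly what fuser or lsof report. The function name open_files_under is made up for illustration. It only approximates the kernel's own reference counting, and it misses other kinds of use such as a process whose current directory is inside the mount.

```python
import os


def open_files_under(mount_point, proc_root="/proc"):
    """Return (pid, path) pairs for open files located below mount_point."""
    # Resolve symlinks so the prefix matches the paths the kernel reports.
    prefix = os.path.realpath(mount_point).rstrip("/") + "/"
    hits = []
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target.startswith(prefix):
                hits.append((int(pid), target))
    return hits


if __name__ == "__main__":
    import sys
    for pid, path in open_files_under(sys.argv[1]):
        print(pid, path)
```

Run as root against the bind mount point to see all users; as an unprivileged user only your own processes are visible.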
>
> So what I suspect happens is this: automount tries to umount
> the branch after the timeout has passed, the umount succeeds
> although there are open files, and automount proceeds with
> removing the directory. Occasionally, before the sys_rmdir succeeds,
> one of the processes with open files on
> the branch accesses this file/directory, causing the kernel
> to hang.
No, this is a deadlock within the VFS which is caused by autofs being
sensitive to the VFS locking requirements of certain system calls.
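
The shape of the deadlock described in this reply can be illustrated with a toy user-space model. This is only a sketch under stated assumptions: two Python threads stand in for the path-walking process and the automount daemon, a threading.Lock stands in for the directory mutex, and timeouts are added so the demonstration terminates. None of the names below are kernel or autofs identifiers.

```python
import threading


def simulate_expire_deadlock(timeout=0.5):
    """Toy model: return what each side observed before giving up."""
    dir_mutex = threading.Lock()           # stands in for the directory mutex
    expire_done = threading.Event()        # set once the daemon finishes the expire
    walker_holds_mutex = threading.Event()
    seen = {}

    def walker():
        # Takes the mutex, then waits for the pending expire to complete.
        with dir_mutex:
            walker_holds_mutex.set()
            seen["walker_saw_expire_done"] = expire_done.wait(timeout)

    def daemon():
        # Needs the same mutex for its rmdir before it can report completion.
        walker_holds_mutex.wait()
        got = dir_mutex.acquire(timeout=timeout / 2)
        seen["daemon_got_mutex"] = got
        if got:
            dir_mutex.release()
            expire_done.set()

    threads = [threading.Thread(target=walker), threading.Thread(target=daemon)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return seen


if __name__ == "__main__":
    print(simulate_expire_deadlock())
```

Each side waits on something only the other can provide. In the kernel there are no timeouts, so both tasks stay in uninterruptible sleep until the hung-task detector reports them.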
>
> Is this a valid explanation? Can you accept this as a bug
> and do something about it?
I have tried to resolve this several times over the last few years
without success. But there is an effort underway now to implement new
VFS automounting support and I'm working on the autofs part of that.
Ian
* Re: autmount hangs occasionally on bind-mounts
2010-09-28 3:30 ` Ian Kent
@ 2010-09-28 10:11 ` Sebastian Hetze
[not found] ` <20100928101145.92E17409000B@mail.linux-ag.de>
1 sibling, 0 replies; 4+ messages in thread
From: Sebastian Hetze @ 2010-09-28 10:11 UTC (permalink / raw)
To: Ian Kent; +Cc: autofs, Sebastian Hetze
On Tue, Sep 28, 2010 at 11:30:55AM +0800, Ian Kent wrote:
> On Mon, 2010-09-27 at 07:55 +0200, Sebastian Hetze wrote:
> > Hi *,
> >
> > we are suffering from some sort of race condition that causes
> > automount to hang:
> >
> > [351841.568061] INFO: task automount:22055 blocked for more than 120 seconds.
> > [351841.568689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [351841.569717] automount D b983e7f6 0 22055 1 0x00000000
> > [351841.570252] e0ca7ef4 00000082 f3c38000 b983e7f6 00013fde eaed6000 f63af880 f5037c00
> > [351841.571308] c0863320 c0863320 f30de480 f30de718 c5589320 00000002 b9841648 00013fde
> > [351841.572316] f30de718 f72ceff4 f72ceff0 ffffffff e0ca7f20 c059fd3e e0ca7f14 f30de480
> > [351841.573364] Call Trace:
> > [351841.573686] [<c059fd3e>] __mutex_lock_slowpath+0xbe/0x120
> > [351841.574130] [<c059fc60>] mutex_lock+0x20/0x40
> > [351841.574496] [<c0202732>] do_rmdir+0x52/0xe0
> > [351841.574878] [<c04b67ad>] ? sys_socketcall+0x1cd/0x2a0
> > [351841.575266] [<c0202820>] sys_rmdir+0x10/0x20
> > [351841.575781] [<c010968c>] syscall_call+0x7/0xb
>
> This is only half the story.
>
> I think you'll find another process that is waiting on the expire via
> autofs4_revalidate() and holds the mutex that the above process is
> waiting on.
Actually, there is another blocked process:
[351961.584408] INFO: task install:22804 blocked for more than 120 seconds.
[351961.584913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[351961.585545] install D e268c4fc 0 22804 22798 0x00000000
[351961.586100] f442fed8 00000086 c02000b1 e268c4fc 00013fec f442fee8 e04efc00 00000000
[351961.587180] c0863320 c0863320 f3a19920 f3a19bb8 c55a9320 00000004 f442ff30 c1010000
[351961.588255] f3a19bb8 f72ceff4 f72ceff0 ffffffff f442ff04 c059fd3e f547be58 f3a19920
[351961.589550] Call Trace:
[351961.589864] [<c02000b1>] ? path_to_nameidata+0x31/0x50
[351961.590286] [<c059fd3e>] __mutex_lock_slowpath+0xbe/0x120
[351961.590793] [<c059fc60>] mutex_lock+0x20/0x40
[351961.591140] [<c01ffc4f>] lookup_create+0x1f/0xa0
[351961.591569] [<c020287c>] sys_mkdirat+0x4c/0x100
[351961.591996] [<c020e48a>] ? mntput_no_expire+0x1a/0xd0
[351961.592427] [<c0202950>] sys_mkdir+0x20/0x30
[351961.592912] [<c010968c>] syscall_call+0x7/0xb
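
To find every blocked task rather than only those the hung-task detector has logged, the state field of /proc/<pid>/task/<tid>/stat can be scanned for "D" (uninterruptible sleep). The following is a minimal sketch assuming a Linux /proc file system; the function names are made up for illustration.

```python
import os


def parse_stat(text):
    """Split a /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, _, right = text.rpartition(")")
    pid_str, _, comm = left.partition("(")
    return int(pid_str), comm, right.split()[0]


def blocked_tasks(proc_root="/proc"):
    """Return (tid, comm) for every thread currently in state D."""
    found = []
    try:
        pids = [d for d in os.listdir(proc_root) if d.isdigit()]
    except OSError:
        return found  # no procfs available
    for pid in pids:
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as fh:
                    _, comm, state = parse_stat(fh.read())
            except (OSError, ValueError, IndexError):
                continue  # thread vanished or line was unparsable
            if state == "D":
                found.append((int(tid), comm))
    return found


if __name__ == "__main__":
    for tid, comm in blocked_tasks():
        print(tid, comm)
```

Threads are scanned individually because automount is multi-threaded and only one of its threads may be stuck.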
>
> This is a known problem that has been present for years and cannot be
> resolved using the current automount framework.
>
> I don't know why we're suddenly seeing people get caught by it recently,
> but we are.
>
> Assuming you are seeing the problem I think you are, you should be able
> to work around it by using the "browse" option on your autofs mounts.
> This should work OK as long as your maps are not too large.
>
We will try this option.
Thanks for your explanation.
Can you point me to a kernel bug report number that I can track for
further development on that subject?
Best regards,
Sebastian
* Re: autmount hangs occasionally on bind-mounts
[not found] ` <20100928101145.92E17409000B@mail.linux-ag.de>
@ 2010-09-28 12:50 ` Ian Kent
0 siblings, 0 replies; 4+ messages in thread
From: Ian Kent @ 2010-09-28 12:50 UTC (permalink / raw)
To: Sebastian Hetze; +Cc: autofs
On Tue, 2010-09-28 at 12:11 +0200, Sebastian Hetze wrote:
> On Tue, Sep 28, 2010 at 11:30:55AM +0800, Ian Kent wrote:
> > On Mon, 2010-09-27 at 07:55 +0200, Sebastian Hetze wrote:
> > > Hi *,
> > >
> > > we are suffering from some sort of race condition that causes
> > > automount to hang:
> > >
> > > [351841.568061] INFO: task automount:22055 blocked for more than 120 seconds.
> > > [351841.568689] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > > [351841.569717] automount D b983e7f6 0 22055 1 0x00000000
> > > [351841.570252] e0ca7ef4 00000082 f3c38000 b983e7f6 00013fde eaed6000 f63af880 f5037c00
> > > [351841.571308] c0863320 c0863320 f30de480 f30de718 c5589320 00000002 b9841648 00013fde
> > > [351841.572316] f30de718 f72ceff4 f72ceff0 ffffffff e0ca7f20 c059fd3e e0ca7f14 f30de480
> > > [351841.573364] Call Trace:
> > > [351841.573686] [<c059fd3e>] __mutex_lock_slowpath+0xbe/0x120
> > > [351841.574130] [<c059fc60>] mutex_lock+0x20/0x40
> > > [351841.574496] [<c0202732>] do_rmdir+0x52/0xe0
> > > [351841.574878] [<c04b67ad>] ? sys_socketcall+0x1cd/0x2a0
> > > [351841.575266] [<c0202820>] sys_rmdir+0x10/0x20
> > > [351841.575781] [<c010968c>] syscall_call+0x7/0xb
> >
> > This is only half the story.
> >
> > I think you'll find another process that is waiting on the expire via
> > autofs4_revalidate() and holds the mutex that the above process is
> > waiting on.
>
> Actually, there is another blocked process:
While that does look a little like what I'd expect to see, I don't think
that is the process you're looking for.
>
> [351961.584408] INFO: task install:22804 blocked for more than 120 seconds.
> [351961.584913] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [351961.585545] install D e268c4fc 0 22804 22798 0x00000000
> [351961.586100] f442fed8 00000086 c02000b1 e268c4fc 00013fec f442fee8 e04efc00 00000000
> [351961.587180] c0863320 c0863320 f3a19920 f3a19bb8 c55a9320 00000004 f442ff30 c1010000
> [351961.588255] f3a19bb8 f72ceff4 f72ceff0 ffffffff f442ff04 c059fd3e f547be58 f3a19920
> [351961.589550] Call Trace:
> [351961.589864] [<c02000b1>] ? path_to_nameidata+0x31/0x50
> [351961.590286] [<c059fd3e>] __mutex_lock_slowpath+0xbe/0x120
> [351961.590793] [<c059fc60>] mutex_lock+0x20/0x40
> [351961.591140] [<c01ffc4f>] lookup_create+0x1f/0xa0
> [351961.591569] [<c020287c>] sys_mkdirat+0x4c/0x100
> [351961.591996] [<c020e48a>] ? mntput_no_expire+0x1a/0xd0
> [351961.592427] [<c0202950>] sys_mkdir+0x20/0x30
> [351961.592912] [<c010968c>] syscall_call+0x7/0xb
>
> >
> > This is a known problem that has been present for years and cannot be
> > resolved using the current automount framework.
> >
> > I don't know why we're suddenly seeing people get caught by it recently,
> > but we are.
> >
> > Assuming you are seeing the problem I think you are, you should be able
> > to work around it by using the "browse" option on your autofs mounts.
> > This should work OK as long as your maps are not too large.
> >
>
> We will try this option.
>
> Thanx for your explanation.
>
> Can you point me to an kernel bug report number that I can trace for
> further development on that subject?
I don't think there is one.
Keep an eye on the autofs mailing list, linux-fsdevel, or the
Linux kernel mailing list; the series will be posted to those lists.
It may not mention the deadlock issue, since the VFS automount
implementation is meant to address slightly different issues with autofs,
AFS, CIFS and NFS. But for autofs, a side effect of the implementation is
that the deadlock is gone.
Ian