* umount() and NFS races in 2.4.26
@ 2004-07-08 18:07 Thomas Moestl
2004-07-09 14:32 ` Marcelo Tosatti
2004-07-10 6:57 ` raven
0 siblings, 2 replies; 10+ messages in thread
From: Thomas Moestl @ 2004-07-08 18:07 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 4298 bytes --]
Hi,
after deploying an SMP machine at work, we started to experience Oopses
in file-system related code relatively frequently. Investigation
revealed that they were caused by references using junk pointers from
freed super blocks via dangling inodes from unmounted file systems;
Oopses would always be preceded by the warning
VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
on an unmount (unmount activity is high on this machine due to heavy use
of the automounter). The predecessor to this machine, a UP system with
otherwise almost identical configuration, never encountered such
problems, so I went looking for possible SMP races.
I believe that I have found two problems:
- The NFS async unlink code (fs/nfs/unlink.c) does keep a dentry for
later asynchronous processing, but the mount point is unbusied via
path_release() once sys_unlink() returns (fs/namei.c). While it
does a dget() on the dentry, this is insufficient to prevent an
umount(); when one would happen in the right time window, it seems
that it would initially get past the busy check:
if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
(fs/namespace.c, do_umount()), but invalidate_inodes() in kill_super()
(fs/super.c) would then fail because of the inode referenced from
the dentry needed for the async unlink (which cannot be removed
by shrink_dcache_parent() because the NFS code did dget() it).
Please note that this problem is only conjectured, as it turned out
not to be our culprit. It does not look completely trivial to fix;
I believe it might require some changes that would affect other FS
implementations. It is not a true SMP race; if it exists, it would
affect UP systems as well.
- There is an SMP race between shrink_dcache_parent() (fs/dcache.c),
called from kill_super(), and prune_dcache(), called via
shrink_dcache_memory() (which is called by kswapd), as follows:
shrink_dcache_parent() calls select_parent() both to prepare the LRU
list for prune_dcache() and to return the number of unused dcache
entries that can likely be removed in the next prune_dcache() run.
If select_parent() returns 0, shrink_dcache_parent() assumes that
its work is done and returns. Now, assume for simplicity that there
are only two remaining dcache entries: one for "foo/bar" and one for
the directory "foo/", which is referenced by the "foo/bar" entry.
Furthermore, prune_dcache() is currently running, called from kswapd,
and has decided to remove the "foo/bar" entry. To that end, it
calls prune_one_dentry(), which dentry_iput()s the "foo/bar" dentry.
This causes the dcache_lock to be dropped. Just now select_parent()
comes along and can obtain the dcache_lock. It now looks for unused
dentries; but there is only the "foo/" one left, which has a non-zero
d_count because "foo/bar" referenced it as its parent, and
prune_one_dentry() has not yet had a chance to dput() it because it
has to wait for the dcache_lock. Thus, select_parent() finds no
unused dentries, assumes that all is fine and does not purge any
more; the "foo/" entry can remain in the cache for much longer
because it may have DCACHE_REFERENCED set, so that the kswapd
prune_dcache() will leave it alone. The inode corresponding to the
dangling dcache entry will still be referenced, and end up dangling,
too. kill_super() will print the warning mentioned above.
When the dentry or inode is touched again later (for example in
another prune_dcache() run) we can end up accessing the super block
freed during the unmount, leading to an Oops.
This was partially verified by inspecting the dcache state via
/dev/kmem after the busy inodes warning had occurred (the directory
dentry was not busy any more, but still remained in the unused list).
In the attached patch, I have used a semaphore to serialize purging
accesses to the dentry_unused list. With a kernel so patched, the
problem seems to have disappeared. The patch is just a quick hack and
the semantics of the semaphore are not really well-defined, but maybe
it can serve as a starting point.
I have not checked whether any of these issues pertain to the 2.6 series
as well.
- Thomas
P.S: please CC me in replies, as I am not subscribed to this list.
[-- Attachment #2: dcachesem.diff --]
[-- Type: text/plain, Size: 1226 bytes --]
--- dcache.c.orig Wed Jun 16 00:22:03 2004
+++ dcache.c Wed Jun 16 01:00:47 2004
@@ -51,6 +51,7 @@
static unsigned int d_hash_shift;
static struct list_head *dentry_hashtable;
static LIST_HEAD(dentry_unused);
+struct semaphore dcache_lru_sem;
/* Statistics gathering. */
struct dentry_stat_t dentry_stat = {0, 0, 45, 0,};
@@ -381,6 +382,7 @@
struct list_head *tmp, *next;
struct dentry *dentry;
+ down(&dcache_lru_sem);
/*
* Pass one ... move the dentries for the specified
* superblock to the most recent end of the unused list.
@@ -416,6 +418,7 @@
goto repeat;
}
spin_unlock(&dcache_lock);
+ up(&dcache_lru_sem);
}
/*
@@ -548,8 +551,10 @@
{
int found;
+ down(&dcache_lru_sem);
while ((found = select_parent(parent)) != 0)
prune_dcache(found);
+ up(&dcache_lru_sem);
}
/*
@@ -581,9 +586,11 @@
if (!(gfp_mask & __GFP_FS))
return 0;
+ down(&dcache_lru_sem);
count = dentry_stat.nr_unused / priority;
prune_dcache(count);
+ up(&dcache_lru_sem);
return kmem_cache_shrink(dentry_cache);
}
@@ -1247,6 +1254,7 @@
d++;
i--;
} while (i);
+ sema_init(&dcache_lru_sem, 1);
}
static void init_buffer_head(void * foo, kmem_cache_t * cachep, unsigned long flags)
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: umount() and NFS races in 2.4.26
2004-07-08 18:07 Thomas Moestl
@ 2004-07-09 14:32 ` Marcelo Tosatti
2004-07-10 21:38 ` Trond Myklebust
2004-07-10 6:57 ` raven
1 sibling, 1 reply; 10+ messages in thread
From: Marcelo Tosatti @ 2004-07-09 14:32 UTC (permalink / raw)
To: Thomas Moestl; +Cc: linux-kernel, viro, Trond Myklebust
On Thu, Jul 08, 2004 at 08:07:09PM +0200, Thomas Moestl wrote:
> Hi,
Hi Thomas,
> after deploying an SMP machine at work, we started to experience Oopses
> in file-system related code relatively frequently. Investigation
> revealed that they were caused by references using junk pointers from
> freed super blocks via dangling inodes from unmounted file systems;
> Oopses would always be preceded by the warning
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
> on an unmount (unmount activity is high on this machine due to heavy use
> of the automounter). The predecessor to this machine, a UP system with
> otherwise almost identical configuration, never encountered such
> problems, so I went looking for possible SMP races.
>
> I believe that I have found two problems:
>
> - The NFS async unlink code (fs/nfs/unlink.c) does keep a dentry for
> later asynchronous processing, but the mount point is unbusied via
> path_release() once sys_unlink() returns (fs/namei.c). While it
> does a dget() on the dentry, this is insufficient to prevent an
> umount(); when one would happen in the right time window, it seems
> that it would initially get past the busy check:
> if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
> (fs/namespace.c, do_umount()), but invalidate_inodes() in kill_super()
> (fs/super.c) would then fail because of the inode referenced from
> the dentry needed for the async unlink (which cannot be removed
> by shrink_dcache_parent() because the NFS code did dget() it).
>
> Please note that this problem is only conjectured, as it turned out
> not to be our culprit. It does not look completely trivial to fix;
> I believe it might require some changes that would affect other FS
> implementations. It is not a true SMP race; if it exists, it would
> affect UP systems as well.
Trond?
> - There is an SMP race between shrink_dcache_parent() (fs/dcache.c),
> called from kill_super(), and prune_dcache(), called via
> shrink_dcache_memory() (which is called by kswapd), as follows:
> shrink_dcache_parent() calls select_parent() both to prepare the LRU
> list for prune_dcache() and to return the number of unused dcache
> entries that can likely be removed in the next prune_dcache() run.
> If select_parent() returns 0, shrink_dcache_parent() assumes that
> its work is done and returns. Now, assume for simplicity that there
> are only two remaining dcache entries: one for "foo/bar" and one for
> the directory "foo/", which is referenced by the "foo/bar" entry.
> Furthermore, prune_dcache() is currently running, called from kswapd,
> and has decided to remove the "foo/bar" entry. To that end, it
> calls prune_one_dentry(), which dentry_iput()s the "foo/bar" dentry.
> This causes the dcache_lock to be dropped. Just now select_parent()
> comes along and can obtain the dcache_lock. It now looks for unused
> dentries; but there is only the "foo/" one left, which has a non-zero
> d_count because "foo/bar" referenced it as its parent, and
> prune_one_dentry() has not yet had a chance to dput() it because it
> has to wait for the dcache_lock. Thus, select_parent() finds no
> unused dentries, assumes that all is fine and does not purge any
> more; the "foo/" entry can remain in the cache for much longer
> because it may have DCACHE_REFERENCED set, so that the kswapd
> prune_dcache() will leave it alone. The inode corresponding to the
> dangling dcache entry will still be referenced, and end up dangling,
> too. kill_super() will print the warning mentioned above.
> When the dentry or inode is touched again later (for example in
> another prune_dcache() run) we can end up accessing the super block
> freed during the unmount, leading to an Oops.
> This was partially verified by inspecting the dcache state via
> /dev/kmem after the busy inodes warning had occurred (the directory
> dentry was not busy any more, but still remained in the unused list).
>
> In the attached patch, I have used a semaphore to serialize purging
> accesses to the dentry_unused list. With a kernel so patched, the
> problem seems to have disappeared. The patch is just a quick hack and
> the semantics of the semaphore are not really well-defined, but maybe
> it can serve as a starting point.
This is a fastpath; adding a semaphore here is not a Good Thing from
the performance POV.
Maybe shrink_dcache_parent(), when called from kill_super(), could be
more picky and loop again when a used dentry is found? I feel that
something along these lines could make the problem go away without the
need for a slow sleep-lock.
Thanks for the detailed description of the problem.
* Re: umount() and NFS races in 2.4.26
@ 2004-07-09 15:00 James Pearson
0 siblings, 0 replies; 10+ messages in thread
From: James Pearson @ 2004-07-09 15:00 UTC (permalink / raw)
To: Thomas Moestl; +Cc: linux-kernel
Thomas Moestl:
>after deploying an SMP machine at work, we started to experience Oopses
>in file-system related code relatively frequently. Investigation
>revealed that they were caused by references using junk pointers from
>freed super blocks via dangling inodes from unmounted file systems;
>Oopses would always be preceded by the warning
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
>on an unmount (unmount activity is high on this machine due to heavy
>use of the automounter).
Are you using the latest autofs4 kernel patches?
I had a similar problem - see the thread at:
http://marc.theaimsgroup.com/?l=linux-nfs&m=108515468003933&w=2
The latest autofs4 patches fixed it for me - available from:
http://www.kernel.org/pub/linux/daemons/autofs/v4
James Pearson
* Re: umount() and NFS races in 2.4.26
2004-07-08 18:07 Thomas Moestl
2004-07-09 14:32 ` Marcelo Tosatti
@ 2004-07-10 6:57 ` raven
2004-07-10 18:19 ` Thomas Moestl
1 sibling, 1 reply; 10+ messages in thread
From: raven @ 2004-07-10 6:57 UTC (permalink / raw)
To: Thomas Moestl; +Cc: autofs mailing list, nfs, Kernel Mailing List
On Thu, 8 Jul 2004, Thomas Moestl wrote:
> Hi,
>
> after deploying an SMP machine at work, we started to experience Oopses
> in file-system related code relatively frequently. Investigation
> revealed that they were caused by references using junk pointers from
> freed super blocks via dangling inodes from unmounted file systems;
> Oopses would always be preceded by the warning
> VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
> on an unmount (unmount activity is high on this machine due to heavy use
> of the automounter). The predecessor to this machine, a UP system with
> otherwise almost identical configuration, never encountered such
> problems, so I went looking for possible SMP races.
This has been reported many times by users of autofs, especially people
with a lot of mount/umount activity.
As James pointed out my latest autofs4 patch resolved the issue for him.
However, on the NFS list Greg Banks pointed out that this may be hiding a
problem that exists in NFS. So it would be good if the NFS folk could
investigate this further.
Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
2.4 that seems to cause this problem. This is one of the things
addressed by my patch.
The autofs4 stuff can be found at
http://www.kernel.org/pub/linux/daemons/autofs/v4
Note that the patch for the kernel module will not apply cleanly to
2.4.27-rc3, as Marcelo has applied 2 hunks from that patch. Similarly,
if you choose to use the kernel module build kit, the mandatory and
optional kernel patches referred to in it have already been applied to
2.4.27-rc3.
Additionally, I recommend using the latest autofs, currently 4.1.3, as
well as the 4.1.3 patches found with the tarball at the location above.
* Re: umount() and NFS races in 2.4.26
2004-07-10 6:57 ` raven
@ 2004-07-10 18:19 ` Thomas Moestl
2004-07-10 19:25 ` raven
2004-07-11 10:16 ` raven
0 siblings, 2 replies; 10+ messages in thread
From: Thomas Moestl @ 2004-07-10 18:19 UTC (permalink / raw)
To: raven; +Cc: autofs mailing list, nfs, Kernel Mailing List
Hello,
On Sat, 2004/07/10 at 14:57:46 +0800, raven@themaw.net wrote:
> > Hi,
> >
> > after deploying an SMP machine at work, we started to experience Oopses
> > in file-system related code relatively frequently. Investigation
> > revealed that they were caused by references using junk pointers from
> > freed super blocks via dangling inodes from unmounted file systems;
> > Oopses would always be preceded by the warning
> > VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
> > on an unmount (unmount activity is high on this machine due to heavy use
> > of the automounter). The predecessor to this machine, a UP system with
> > otherwise almost identical configuration, never encountered such
> > problems, so I went looking for possible SMP races.
>
> This has been reported many times by users of autofs, especially people
> with a lot of mount/umount activity.
>
> As James pointed out my latest autofs4 patch resolved the issue for him.
> However, on the NFS list Greg Banks pointed out that this may be hiding a
> problem that exists in NFS. So it would be good if the NFS folk could
> investigate this further.
>
> > Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
> 2.4 that seems to cause this problem. This is one of the things
> addressed by my patch.
The system in question still uses autofs3. While I believe that the
waitq race is also present there (it could probably cause directory
lookups to hang, if I understand it correctly), I do not think that
any autofs3 code could cause exactly those symptoms that I have
observed. For that, it would have to obtain dentries of the file
systems that it has mounted, but the old code never does that.
Almost anything else autofs could do should only affect its own set of
dentries etc.
Thanks for the information, though, I'm going to upgrade to a patched
autofs4 at the next opportunity.
- Thomas
* Re: umount() and NFS races in 2.4.26
2004-07-10 18:19 ` Thomas Moestl
@ 2004-07-10 19:25 ` raven
2004-07-10 19:50 ` Thomas Moestl
2004-07-11 10:16 ` raven
1 sibling, 1 reply; 10+ messages in thread
From: raven @ 2004-07-10 19:25 UTC (permalink / raw)
To: Thomas Moestl; +Cc: autofs mailing list, nfs, Kernel Mailing List
On Sat, 10 Jul 2004, Thomas Moestl wrote:
> Hello,
>
> On Sat, 2004/07/10 at 14:57:46 +0800, raven@themaw.net wrote:
> > > Hi,
> > >
> > > after deploying an SMP machine at work, we started to experience Oopses
> > > in file-system related code relatively frequently. Investigation
> > > revealed that they were caused by references using junk pointers from
> > > freed super blocks via dangling inodes from unmounted file systems;
> > > Oopses would always be preceded by the warning
> > > VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
> > > on an unmount (unmount activity is high on this machine due to heavy use
> > > of the automounter). The predecessor to this machine, a UP system with
> > > otherwise almost identical configuration, never encountered such
> > > problems, so I went looking for possible SMP races.
> >
> > This has been reported many times by users of autofs, especially people
> > with a lot of mount/umount activity.
> >
> > As James pointed out my latest autofs4 patch resolved the issue for him.
> > However, on the NFS list Greg Banks pointed out that this may be hiding a
> > problem that exists in NFS. So it would be good if the NFS folk could
> > investigate this further.
> >
> > Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
> > 2.4 that seems to cause this problem. This is one of the things
> > addressed by my patch.
>
> The system in question still uses autofs3. While I believe that the
> waitq race is also present there (it could probably cause directory
> lookups to hang, if I understand it correctly), I do not think that
> any autofs3 code could cause exactly those symptoms that I have
> observed. For that, it would have to obtain dentries of the file
> systems that it has mounted, but the old code never does that.
All autofs has to do is not delete a directory before exiting for this
error to occur.
Ian
* Re: umount() and NFS races in 2.4.26
2004-07-10 19:25 ` raven
@ 2004-07-10 19:50 ` Thomas Moestl
0 siblings, 0 replies; 10+ messages in thread
From: Thomas Moestl @ 2004-07-10 19:50 UTC (permalink / raw)
To: raven; +Cc: autofs mailing list, nfs, Kernel Mailing List
On Sun, 2004/07/11 at 03:25:34 +0800, raven@themaw.net wrote:
> On Sat, 10 Jul 2004, Thomas Moestl wrote:
> > The system in question still uses autofs3. While I believe that the
> > waitq race is also present there (it could probably cause directory
> > lookups to hang, if I understand it correctly), I do not think that
> > any autofs3 code could cause exactly those symptoms that I have
> > observed. For that, it would have to obtain dentries of the file
> > systems that it has mounted, but the old code never does that.
>
> All autofs has to do is not delete a directory before exiting for this
> error to occur.
But in that case, the left-over dentry would be an autofs one, would
it not? In our case, the dentries were verified to belong to an NFS
mount that was unmounted by the automounter (that was one of the
symptoms I was referring to). The automounter itself was still
running, and the autofs still mounted.
- Thomas
* Re: umount() and NFS races in 2.4.26
2004-07-09 14:32 ` Marcelo Tosatti
@ 2004-07-10 21:38 ` Trond Myklebust
2004-07-11 0:32 ` viro
0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2004-07-10 21:38 UTC (permalink / raw)
To: Marcelo Tosatti; +Cc: Thomas Moestl, linux-kernel, Alexander Viro
On Fri, 2004-07-09 at 09:32, Marcelo Tosatti wrote:
> > - The NFS async unlink code (fs/nfs/unlink.c) does keep a dentry for
> > later asynchronous processing, but the mount point is unbusied via
> > path_release() once sys_unlink() returns (fs/namei.c). While it
> > does a dget() on the dentry, this is insufficient to prevent an
> > umount(); when one would happen in the right time window, it seems
> > that it would initially get past the busy check:
> > if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
> > (fs/namespace.c, do_umount()), but invalidate_inodes() in kill_super()
> > (fs/super.c) would then fail because of the inode referenced from
> > the dentry needed for the async unlink (which cannot be removed
> > by shrink_dcache_parent() because the NFS code did dget() it).
> >
> > Please note that this problem is only conjectured, as it turned out
> > not to be our culprit. It does not look completely trivial to fix;
> > I believe it might require some changes that would affect other FS
> > implementations. It is not a true SMP race; if it exists, it would
> > affect UP systems as well.
>
> Trond?
Known problem, but a fix is not trivial since the unlink() procedure
does not take a nameidata structure argument (which would be needed in
order to figure out which vfs_mount struct to mntget()).
If someone can figure out a way to fix it, then a patch would be
welcome, but I'm on holiday until the Linux Kernel Summit starts, so
don't expect any immediate action from me...
Cheers,
Trond
* Re: umount() and NFS races in 2.4.26
2004-07-10 21:38 ` Trond Myklebust
@ 2004-07-11 0:32 ` viro
0 siblings, 0 replies; 10+ messages in thread
From: viro @ 2004-07-11 0:32 UTC (permalink / raw)
To: Trond Myklebust; +Cc: Marcelo Tosatti, Thomas Moestl, linux-kernel
On Sat, Jul 10, 2004 at 04:38:48PM -0500, Trond Myklebust wrote:
> Known problem, but a fix is not trivial since the unlink() procedure
> does not take a nameidata structure argument (which would be needed in
> order to figure out which vfs_mount struct to mntget()).
Why the hell would you need a vfsmount for that? You want to duplicate
an active reference to superblock and you want it to be dropped later.
Which means atomic_inc(&sb->s_active) and kill_super(sb) resp.
NFS itself has a right to do that - we know that caller of ->unlink()
is holding a reference to vfsmount, so there's an active reference
that won't go away until nfs_unlink() is finished.
Filesystem code can hold active references to superblocks of given type,
provided that it takes care to drop them eventually on its own...
* Re: umount() and NFS races in 2.4.26
2004-07-10 18:19 ` Thomas Moestl
2004-07-10 19:25 ` raven
@ 2004-07-11 10:16 ` raven
1 sibling, 0 replies; 10+ messages in thread
From: raven @ 2004-07-11 10:16 UTC (permalink / raw)
To: Thomas Moestl; +Cc: autofs mailing list, nfs, Kernel Mailing List
On Sat, 10 Jul 2004, Thomas Moestl wrote:
> > Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
> > 2.4 that seems to cause this problem. This is one of the things
> > addressed by my patch.
>
> The system in question still uses autofs3. While I believe that the
> waitq race is also present there (it could probably cause directory
> lookups to hang, if I understand it correctly), I do not think that
> any autofs3 code could cause exactly those symptoms that I have
> observed. For that, it would have to obtain dentries of the file
> systems that it has mounted, but the old code never does that.
I don't think autofs v3 has this race. It uses the BKL everywhere to
serialise execution. Nevertheless, people seem to see the "busy inode"
messages sometimes.
Ian