public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* umount() and NFS races in 2.4.26
@ 2004-07-08 18:07 Thomas Moestl
  2004-07-09 14:32 ` Marcelo Tosatti
  2004-07-10  6:57 ` raven
  0 siblings, 2 replies; 10+ messages in thread
From: Thomas Moestl @ 2004-07-08 18:07 UTC (permalink / raw)
  To: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 4298 bytes --]

Hi,

after deploying an SMP machine at work, we started to experience Oopses
in file-system related code relatively frequently. Investigation
revealed that they were caused by references using junk pointers from
freed super blocks via dangling inodes from unmounted file systems;
Oopses would always be preceded by the warning
  VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
on an unmount (unmount activity is high on this machine due to heavy use
of the automounter). The predecessor to this machine, a UP system with
otherwise almost identical configuration, never encountered such
problems, so I went looking for possible SMP races.

I believe that I have found two problems:

- The NFS async unlink code (fs/nfs/unlink.c) keeps a dentry for
  later asynchronous processing, but the mount point is unbusied via
  path_release() once sys_unlink() returns (fs/namei.c). While it
  does a dget() on the dentry, this is insufficient to prevent an
  umount(); if one happened in the right time window, it seems
  that it would initially get past the busy check:
	if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
  (fs/namespace.c, do_umount()), but invalidate_inodes() in kill_super()
  (fs/super.c) would then fail because of the inode referenced from
  the dentry needed for the async unlink (which cannot be removed
  by shrink_dcache_parent() because the NFS code did dget() it).

  Please note that this problem is only conjectured, as it turned out
  not to be our culprit. It does not look completely trivial to fix;
  I believe it might require some changes that would affect other FS
  implementations. It is not a true SMP race; if it exists, it would
  affect UP systems as well.

- There is an SMP race between shrink_dcache_parent() (fs/dcache.c),
  called from kill_super(), and prune_dcache(), called via
  shrink_dcache_memory() (called by kswapd), as follows:
  shrink_dcache_parent() calls select_parent() to both prepare the LRU
  list for prune_dcache() and return the number of unused dcache
  entries that can likely be removed in the next prune_dcache() run.
  If select_parent() returns 0, shrink_dcache_parent() assumes that
  its work is done and returns. Now, assume for simplicity that there
  are only two remaining dcache entries: one for "foo/bar" and one for
  the directory "foo/", which is referenced by the "foo/bar" entry.
  Furthermore, prune_dcache() is currently running, called from kswapd,
  and has decided to remove the "foo/bar" entry. To that end, it
  calls prune_one_dentry(), which dentry_iput()s the "foo/bar" dentry.
  This causes the dcache_lock to be dropped. Just now select_parent()
  comes along and can obtain the dcache_lock. It now looks for unused
  dentries; but there is only the "foo/" one left, which has a non-zero
  d_count because "foo/bar" referenced it as parent, and
  prune_one_dentry() has not yet had a chance to dput() it because it
  has to wait for the dcache_lock. Thus, select_parent() finds no
  unused dentries, assumes that all is fine and does not purge any
  more; the "foo/" entry can remain in the cache for much longer
  because it may have DCACHE_REFERENCED set, so that the kswapd
  prune_dcache() will leave it alone. The inode corresponding to the
  dangling dcache entry will still be referenced, and end up dangling,
  too. kill_super() will print the warning mentioned above.
  When the dentry or inode is touched again later (for example in
  another prune_dcache() run) we can end up accessing the super block
  freed during the unmount, leading to an Oops.
  This was partially verified by inspecting the dcache state via
  /dev/kmem after the busy inodes warning had occurred (the directory
  dentry was not busy any more, but still remained in the unused list).

  In the attached patch, I have used a semaphore to serialize purging
  accesses to the dentry_unused list. With a kernel so patched, the
  problem seems to have disappeared. The patch is just a quick hack,
  and the semantics of the semaphore are not really well-defined; but
  maybe it can serve as a starting point.
  
I have not checked whether any of these issues pertain to the 2.6 series
as well.

	- Thomas

P.S.: please CC me in replies, as I am not subscribed to this list.


[-- Attachment #2: dcachesem.diff --]
[-- Type: text/plain, Size: 1226 bytes --]

--- dcache.c.orig	Wed Jun 16 00:22:03 2004
+++ dcache.c	Wed Jun 16 01:00:47 2004
@@ -51,6 +51,7 @@
 static unsigned int d_hash_shift;
 static struct list_head *dentry_hashtable;
 static LIST_HEAD(dentry_unused);
+struct semaphore dcache_lru_sem;
 
 /* Statistics gathering. */
 struct dentry_stat_t dentry_stat = {0, 0, 45, 0,};
@@ -381,6 +382,7 @@
 	struct list_head *tmp, *next;
 	struct dentry *dentry;
 
+	down(&dcache_lru_sem);
 	/*
 	 * Pass one ... move the dentries for the specified
 	 * superblock to the most recent end of the unused list.
@@ -416,6 +418,7 @@
 		goto repeat;
 	}
 	spin_unlock(&dcache_lock);
+	up(&dcache_lru_sem);
 }
 
 /*
@@ -548,8 +551,10 @@
 {
 	int found;
 
+	down(&dcache_lru_sem);
 	while ((found = select_parent(parent)) != 0)
 		prune_dcache(found);
+	up(&dcache_lru_sem);
 }
 
 /*
@@ -581,9 +586,11 @@
 	if (!(gfp_mask & __GFP_FS))
 		return 0;
 
+	down(&dcache_lru_sem);
 	count = dentry_stat.nr_unused / priority;
 
 	prune_dcache(count);
+	up(&dcache_lru_sem);
 	return kmem_cache_shrink(dentry_cache);
 }
 
@@ -1247,6 +1254,7 @@
 		d++;
 		i--;
 	} while (i);
+	sema_init(&dcache_lru_sem, 1);
 }
 
 static void init_buffer_head(void * foo, kmem_cache_t * cachep, unsigned long flags)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-08 18:07 Thomas Moestl
@ 2004-07-09 14:32 ` Marcelo Tosatti
  2004-07-10 21:38   ` Trond Myklebust
  2004-07-10  6:57 ` raven
  1 sibling, 1 reply; 10+ messages in thread
From: Marcelo Tosatti @ 2004-07-09 14:32 UTC (permalink / raw)
  To: Thomas Moestl; +Cc: linux-kernel, viro, Trond Myklebust

On Thu, Jul 08, 2004 at 08:07:09PM +0200, Thomas Moestl wrote:
> Hi,

Hi Thomas,

> after deploying an SMP machine at work, we started to experience Oopses
> in file-system related code relatively frequently. Investigation
> revealed that they were caused by references using junk pointers from
> freed super blocks via dangling inodes from unmounted file systems;
> Oopses would always be preceded by the warning
>   VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
> on an unmount (unmount activity is high on this machine due to heavy use
> of the automounter). The predecessor to this machine, a UP system with
> otherwise almost identical configuration, never encountered such
> problems, so I went looking for possible SMP races.
> 
> I believe that I have found two problems:
> 
> - The NFS async unlink code (fs/nfs/unlink.c) keeps a dentry for
>   later asynchronous processing, but the mount point is unbusied via
>   path_release() once sys_unlink() returns (fs/namei.c). While it
>   does a dget() on the dentry, this is insufficient to prevent an
>   umount(); if one happened in the right time window, it seems
>   that it would initially get past the busy check:
> 	if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
>   (fs/namespace.c, do_umount()), but invalidate_inodes() in kill_super()
>   (fs/super.c) would then fail because of the inode referenced from
>   the dentry needed for the async unlink (which cannot be removed
>   by shrink_dcache_parent() because the NFS code did dget() it).
> 
>   Please note that this problem is only conjectured, as it turned out
>   not to be our culprit. It does not look completely trivial to fix;
>   I believe it might require some changes that would affect other FS
>   implementations. It is not a true SMP race; if it exists, it would
>   affect UP systems as well.

Trond?

> - There is an SMP race between shrink_dcache_parent() (fs/dcache.c),
>   called from kill_super(), and prune_dcache(), called via
>   shrink_dcache_memory() (called by kswapd), as follows:
>   shrink_dcache_parent() calls select_parent() to both prepare the LRU
>   list for prune_dcache() and return the number of unused dcache
>   entries that can likely be removed in the next prune_dcache() run.
>   If select_parent() returns 0, shrink_dcache_parent() assumes that
>   its work is done and returns. Now, assume for simplicity that there
>   are only two remaining dcache entries: one for "foo/bar" and one for
>   the directory "foo/", which is referenced by the "foo/bar" entry.
>   Furthermore, prune_dcache() is currently running, called from kswapd,
>   and has decided to remove the "foo/bar" entry. To that end, it
>   calls prune_one_dentry(), which dentry_iput()s the "foo/bar" dentry.
>   This causes the dcache_lock to be dropped. Just now select_parent()
>   comes along and can obtain the dcache_lock. It now looks for unused
>   dentries; but there is only the "foo/" one left, which has a non-zero
>   d_count because "foo/bar" referenced it as parent, and
>   prune_one_dentry() has not yet had a chance to dput() it because it
>   has to wait for the dcache_lock. Thus, select_parent() finds no
>   unused dentries, assumes that all is fine and does not purge any
>   more; the "foo/" entry can remain in the cache for much longer
>   because it may have DCACHE_REFERENCED set, so that the kswapd
>   prune_dcache() will leave it alone. The inode corresponding to the
>   dangling dcache entry will still be referenced, and end up dangling,
>   too. kill_super() will print the warning mentioned above.
>   When the dentry or inode is touched again later (for example in
>   another prune_dcache() run) we can end up accessing the super block
>   freed during the unmount, leading to an Oops.
>   This was partially verified by inspecting the dcache state via
>   /dev/kmem after the busy inodes warning had occurred (the directory
>   dentry was not busy any more, but still remained in the unused list).
> 
>   In the attached patch, I have used a semaphore to serialize purging
>   accesses to the dentry_unused list. With a kernel so patched, the
>   problem seems to have disappeared. The patch is just a quick hack,
>   and the semantics of the semaphore are not really well-defined; but
>   maybe it can serve as a starting point.

This is a fast path; adding a semaphore here is not a Good Thing from
the performance POV.

Maybe shrink_dcache_parent(), when called from kill_super(), could be
more picky and loop again when a used dentry is found? I feel that
something along these lines could make the problem go away without the
need for a slow sleep-lock.

Thanks for the detailed description of the problem.

> I have not checked whether any of these issues pertain to the 2.6 series
> as well.
> 
> 	- Thomas
> 
> P.S.: please CC me in replies, as I am not subscribed to this list.
> 

> --- dcache.c.orig	Wed Jun 16 00:22:03 2004
> +++ dcache.c	Wed Jun 16 01:00:47 2004
> @@ -51,6 +51,7 @@
>  static unsigned int d_hash_shift;
>  static struct list_head *dentry_hashtable;
>  static LIST_HEAD(dentry_unused);
> +struct semaphore dcache_lru_sem;
>  
>  /* Statistics gathering. */
>  struct dentry_stat_t dentry_stat = {0, 0, 45, 0,};
> @@ -381,6 +382,7 @@
>  	struct list_head *tmp, *next;
>  	struct dentry *dentry;
>  
> +	down(&dcache_lru_sem);
>  	/*
>  	 * Pass one ... move the dentries for the specified
>  	 * superblock to the most recent end of the unused list.
> @@ -416,6 +418,7 @@
>  		goto repeat;
>  	}
>  	spin_unlock(&dcache_lock);
> +	up(&dcache_lru_sem);
>  }
>  
>  /*
> @@ -548,8 +551,10 @@
>  {
>  	int found;
>  
> +	down(&dcache_lru_sem);
>  	while ((found = select_parent(parent)) != 0)
>  		prune_dcache(found);
> +	up(&dcache_lru_sem);
>  }
>  
>  /*
> @@ -581,9 +586,11 @@
>  	if (!(gfp_mask & __GFP_FS))
>  		return 0;
>  
> +	down(&dcache_lru_sem);
>  	count = dentry_stat.nr_unused / priority;
>  
>  	prune_dcache(count);
> +	up(&dcache_lru_sem);
>  	return kmem_cache_shrink(dentry_cache);
>  }
>  
> @@ -1247,6 +1254,7 @@
>  		d++;
>  		i--;
>  	} while (i);
> +	sema_init(&dcache_lru_sem, 1);
>  }
>  
>  static void init_buffer_head(void * foo, kmem_cache_t * cachep, unsigned long flags)


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
@ 2004-07-09 15:00 James Pearson
  0 siblings, 0 replies; 10+ messages in thread
From: James Pearson @ 2004-07-09 15:00 UTC (permalink / raw)
  To: Thomas Moestl; +Cc: linux-kernel

Thomas Moestl:
 >after deploying an SMP machine at work, we started to experience Oopses
 >in file-system related code relatively frequently. Investigation
 >revealed that they were caused by references using junk pointers from
 >freed super blocks via dangling inodes from unmounted file systems;
 >Oopses would always be preceded by the warning
 > VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
 >on an unmount (unmount activity is high on this machine due to heavy
 >use of the automounter).

Are you using the latest autofs4 kernel patches?

I had a similar problem - see the thread at:

http://marc.theaimsgroup.com/?l=linux-nfs&m=108515468003933&w=2

The latest autofs4 patches fixed it for me - available from:

http://www.kernel.org/pub/linux/daemons/autofs/v4

James Pearson

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-08 18:07 Thomas Moestl
  2004-07-09 14:32 ` Marcelo Tosatti
@ 2004-07-10  6:57 ` raven
  2004-07-10 18:19   ` Thomas Moestl
  1 sibling, 1 reply; 10+ messages in thread
From: raven @ 2004-07-10  6:57 UTC (permalink / raw)
  To: Thomas Moestl; +Cc: autofs mailing list, nfs, Kernel Mailing List

On Thu, 8 Jul 2004, Thomas Moestl wrote:

> Hi,
> 
> after deploying an SMP machine at work, we started to experience Oopses
> in file-system related code relatively frequently. Investigation
> revealed that they were caused by references using junk pointers from
> freed super blocks via dangling inodes from unmounted file systems;
> Oopses would always be preceded by the warning
>   VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
> on an unmount (unmount activity is high on this machine due to heavy use
> of the automounter). The predecessor to this machine, a UP system with
> otherwise almost identical configuration, never encountered such
> problems, so I went looking for possible SMP races.

This has been reported many times by users of autofs, especially people
with a lot of mount/umount activity.

As James pointed out, my latest autofs4 patch resolved the issue for him.
However, on the NFS list Greg Banks pointed out that this may be hiding a
problem that exists in NFS. So it would be good if the NFS folk could
investigate this further.

Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
2.4, and it seems to cause this problem. This is one of the things
addressed by my patch.

The autofs4 stuff can be found at
http://www.kernel.org/pub/linux/daemons/autofs/v4

Note that the patch for the kernel module will not apply cleanly to
2.4.27-rc3, as Marcelo has applied 2 hunks from that patch. Similarly, if
you choose to use the kernel module build kit, the mandatory and optional
kernel patches referred to in it have already been applied to 2.4.27-rc3.

Additionally, I recommend using the latest autofs, currently 4.1.3, as well
as the 4.1.3 patches found with the tarball at the location above.

> 
> I believe that I have found two problems:
> 
> - The NFS async unlink code (fs/nfs/unlink.c) keeps a dentry for
>   later asynchronous processing, but the mount point is unbusied via
>   path_release() once sys_unlink() returns (fs/namei.c). While it
>   does a dget() on the dentry, this is insufficient to prevent an
>   umount(); if one happened in the right time window, it seems
>   that it would initially get past the busy check:
> 	if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
>   (fs/namespace.c, do_umount()), but invalidate_inodes() in kill_super()
>   (fs/super.c) would then fail because of the inode referenced from
>   the dentry needed for the async unlink (which cannot be removed
>   by shrink_dcache_parent() because the NFS code did dget() it).
> 
>   Please note that this problem is only conjectured, as it turned out
>   not to be our culprit. It does not look completely trivial to fix;
>   I believe it might require some changes that would affect other FS
>   implementations. It is not a true SMP race; if it exists, it would
>   affect UP systems as well.
> 
> - There is an SMP race between shrink_dcache_parent() (fs/dcache.c),
>   called from kill_super(), and prune_dcache(), called via
>   shrink_dcache_memory() (called by kswapd), as follows:
>   shrink_dcache_parent() calls select_parent() to both prepare the LRU
>   list for prune_dcache() and return the number of unused dcache
>   entries that can likely be removed in the next prune_dcache() run.
>   If select_parent() returns 0, shrink_dcache_parent() assumes that
>   its work is done and returns. Now, assume for simplicity that there
>   are only two remaining dcache entries: one for "foo/bar" and one for
>   the directory "foo/", which is referenced by the "foo/bar" entry.
>   Furthermore, prune_dcache() is currently running, called from kswapd,
>   and has decided to remove the "foo/bar" entry. To that end, it
>   calls prune_one_dentry(), which dentry_iput()s the "foo/bar" dentry.
>   This causes the dcache_lock to be dropped. Just now select_parent()
>   comes along and can obtain the dcache_lock. It now looks for unused
>   dentries; but there is only the "foo/" one left, which has a non-zero
>   d_count because "foo/bar" referenced it as parent, and
>   prune_one_dentry() has not yet had a chance to dput() it because it
>   has to wait for the dcache_lock. Thus, select_parent() finds no
>   unused dentries, assumes that all is fine and does not purge any
>   more; the "foo/" entry can remain in the cache for much longer
>   because it may have DCACHE_REFERENCED set, so that the kswapd
>   prune_dcache() will leave it alone. The inode corresponding to the
>   dangling dcache entry will still be referenced, and end up dangling,
>   too. kill_super() will print the warning mentioned above.
>   When the dentry or inode is touched again later (for example in
>   another prune_dcache() run) we can end up accessing the super block
>   freed during the unmount, leading to an Oops.
>   This was partially verified by inspecting the dcache state via
>   /dev/kmem after the busy inodes warning had occurred (the directory
>   dentry was not busy any more, but still remained in the unused list).
> 
>   In the attached patch, I have used a semaphore to serialize purging
>   accesses to the dentry_unused list. With a kernel so patched, the
>   problem seems to have disappeared. The patch is just a quick hack,
>   and the semantics of the semaphore are not really well-defined; but
>   maybe it can serve as a starting point.
>   
> I have not checked whether any of these issues pertain to the 2.6 series
> as well.
> 
> 	- Thomas
> 
> P.S.: please CC me in replies, as I am not subscribed to this list.
> 
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-10  6:57 ` raven
@ 2004-07-10 18:19   ` Thomas Moestl
  2004-07-10 19:25     ` raven
  2004-07-11 10:16     ` raven
  0 siblings, 2 replies; 10+ messages in thread
From: Thomas Moestl @ 2004-07-10 18:19 UTC (permalink / raw)
  To: raven; +Cc: autofs mailing list, nfs, Kernel Mailing List

Hello,

On Sat, 2004/07/10 at 14:57:46 +0800, raven@themaw.net wrote:
> > Hi,
> > 
> > after deploying an SMP machine at work, we started to experience Oopses
> > in file-system related code relatively frequently. Investigation
> > revealed that they were caused by references using junk pointers from
> > freed super blocks via dangling inodes from unmounted file systems;
> > Oopses would always be preceded by the warning
> >   VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
> > on an unmount (unmount activity is high on this machine due to heavy use
> > of the automounter). The predecessor to this machine, a UP system with
> > otherwise almost identical configuration, never encountered such
> > problems, so I went looking for possible SMP races.
> 
> This has been reported many times by users of autofs, especially people
> with a lot of mount/umount activity.
> 
> As James pointed out, my latest autofs4 patch resolved the issue for him.
> However, on the NFS list Greg Banks pointed out that this may be hiding a
> problem that exists in NFS. So it would be good if the NFS folk could
> investigate this further.
> 
> Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
> 2.4, and it seems to cause this problem. This is one of the things
> addressed by my patch.

The system in question still uses autofs3. While I believe that the
waitq race is also present there (it could probably cause directory
lookups to hang, if I understand it correctly), I do not think that
any autofs3 code could cause exactly those symptoms that I have
observed. For that, it would have to obtain dentries of the file
systems that it has mounted, but the old code never does that.

Almost anything else autofs could do should only affect its own set of
dentries etc.

Thanks for the information, though; I'm going to upgrade to a patched
autofs4 at the next opportunity.

	- Thomas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-10 18:19   ` Thomas Moestl
@ 2004-07-10 19:25     ` raven
  2004-07-10 19:50       ` Thomas Moestl
  2004-07-11 10:16     ` raven
  1 sibling, 1 reply; 10+ messages in thread
From: raven @ 2004-07-10 19:25 UTC (permalink / raw)
  To: Thomas Moestl; +Cc: autofs mailing list, nfs, Kernel Mailing List

On Sat, 10 Jul 2004, Thomas Moestl wrote:

> Hello,
> 
> On Sat, 2004/07/10 at 14:57:46 +0800, raven@themaw.net wrote:
> > > Hi,
> > > 
> > > after deploying an SMP machine at work, we started to experience Oopses
> > > in file-system related code relatively frequently. Investigation
> > > revealed that they were caused by references using junk pointers from
> > > freed super blocks via dangling inodes from unmounted file systems;
> > > Oopses would always be preceded by the warning
> > >   VFS: Busy inodes after unmount. Self-destruct in 5 seconds.  Have a nice day...
> > > on an unmount (unmount activity is high on this machine due to heavy use
> > > of the automounter). The predecessor to this machine, a UP system with
> > > otherwise almost identical configuration, never encountered such
> > > problems, so I went looking for possible SMP races.
> > 
> > This has been reported many times by users of autofs, especially people
> > with a lot of mount/umount activity.
> > 
> > As James pointed out, my latest autofs4 patch resolved the issue for him.
> > However, on the NFS list Greg Banks pointed out that this may be hiding a
> > problem that exists in NFS. So it would be good if the NFS folk could
> > investigate this further.
> > 
> > Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
> > 2.4, and it seems to cause this problem. This is one of the things
> > addressed by my patch.
> 
> The system in question still uses autofs3. While I believe that the
> waitq race is also present there (it could probably cause directory
> lookups to hang, if I understand it correctly), I do not think that
> any autofs3 code could cause exactly those symptoms that I have
> observed. For that, it would have to obtain dentries of the file
> systems that it has mounted, but the old code never does that.

All autofs has to do is not delete a directory before exiting for this 
error to occur.

Ian


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-10 19:25     ` raven
@ 2004-07-10 19:50       ` Thomas Moestl
  0 siblings, 0 replies; 10+ messages in thread
From: Thomas Moestl @ 2004-07-10 19:50 UTC (permalink / raw)
  To: raven; +Cc: autofs mailing list, nfs, Kernel Mailing List

On Sun, 2004/07/11 at 03:25:34 +0800, raven@themaw.net wrote:
> On Sat, 10 Jul 2004, Thomas Moestl wrote:
> > The system in question still uses autofs3. While I believe that the
> > waitq race is also present there (it could probably cause directory
> > lookups to hang, if I understand it correctly), I do not think that
> > any autofs3 code could cause exactly those symptoms that I have
> > observed. For that, it would have to obtain dentries of the file
> > systems that it has mounted, but the old code never does that.
> 
> All autofs has to do is not delete a directory before exiting for this 
> error to occur.

But in that case, the left-over dentry would be an autofs one, would
it not? In our case, the dentries were verified to belong to an NFS
mount that was unmounted by the automounter (that was one of the
symptoms I was referring to). The automounter itself was still
running, and the autofs still mounted.

	- Thomas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-09 14:32 ` Marcelo Tosatti
@ 2004-07-10 21:38   ` Trond Myklebust
  2004-07-11  0:32     ` viro
  0 siblings, 1 reply; 10+ messages in thread
From: Trond Myklebust @ 2004-07-10 21:38 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Thomas Moestl, linux-kernel, Alexander Viro

On Fri, 09/07/2004 at 09:32, Marcelo Tosatti wrote:
> > - The NFS async unlink code (fs/nfs/unlink.c) keeps a dentry for
> >   later asynchronous processing, but the mount point is unbusied via
> >   path_release() once sys_unlink() returns (fs/namei.c). While it
> >   does a dget() on the dentry, this is insufficient to prevent an
> >   umount(); if one happened in the right time window, it seems
> >   that it would initially get past the busy check:
> > 	if (atomic_read(&mnt->mnt_count) == 2 || flags & MNT_DETACH) {
> >   (fs/namespace.c, do_umount()), but invalidate_inodes() in kill_super()
> >   (fs/super.c) would then fail because of the inode referenced from
> >   the dentry needed for the async unlink (which cannot be removed
> >   by shrink_dcache_parent() because the NFS code did dget() it).
> > 
> >   Please note that this problem is only conjectured, as it turned out
> >   not to be our culprit. It does not look completely trivial to fix;
> >   I believe it might require some changes that would affect other FS
> >   implementations. It is not a true SMP race; if it exists, it would
> >   affect UP systems as well.
> 
> Trond?

Known problem, but a fix is not trivial since the unlink() procedure
does not take a nameidata structure argument (which would be needed in
order to figure out which vfs_mount struct to mntget()).

If someone can figure out a way to fix it, then a patch would be
welcome, but I'm on holiday until the Linux Kernel Summit starts, so
don't expect any immediate action from me...

Cheers,
 Trond

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-10 21:38   ` Trond Myklebust
@ 2004-07-11  0:32     ` viro
  0 siblings, 0 replies; 10+ messages in thread
From: viro @ 2004-07-11  0:32 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Marcelo Tosatti, Thomas Moestl, linux-kernel

On Sat, Jul 10, 2004 at 04:38:48PM -0500, Trond Myklebust wrote:
 
> Known problem, but a fix is not trivial since the unlink() procedure
> does not take a nameidata structure argument (which would be needed in
> order to figure out which vfs_mount struct to mntget()).

Why the hell would you need a vfsmount for that?  You want to duplicate
an active reference to superblock and you want it to be dropped later.
Which means atomic_inc(&sb->s_active) and kill_super(sb) resp.

NFS itself has a right to do that - we know that caller of ->unlink()
is holding a reference to vfsmount, so there's an active reference
that won't go away until nfs_unlink() is finished.

Filesystem code can hold active references to superblocks of given type,
provided that it takes care to drop them eventually on its own...

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: umount() and NFS races in 2.4.26
  2004-07-10 18:19   ` Thomas Moestl
  2004-07-10 19:25     ` raven
@ 2004-07-11 10:16     ` raven
  1 sibling, 0 replies; 10+ messages in thread
From: raven @ 2004-07-11 10:16 UTC (permalink / raw)
  To: Thomas Moestl; +Cc: autofs mailing list, nfs, Kernel Mailing List

On Sat, 10 Jul 2004, Thomas Moestl wrote:

> > Nevertheless, I'm sure there is a race in waitq.c of autofs4 in
> > 2.4, and it seems to cause this problem. This is one of the things
> > addressed by my patch.
> 
> The system in question still uses autofs3. While I believe that the
> waitq race is also present there (it could probably cause directory
> lookups to hang, if I understand it correctly), I do not think that
> any autofs3 code could cause exactly those symptoms that I have
> observed. For that, it would have to obtain dentries of the file
> systems that it has mounted, but the old code never does that.

I don't think autofs v3 has this race. It uses the BKL everywhere to
serialise execution. Nevertheless, people seem to see the "busy inode"
messages sometimes.

Ian


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2004-07-11 10:28 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-07-09 15:00 umount() and NFS races in 2.4.26 James Pearson
  -- strict thread matches above, loose matches on Subject: below --
2004-07-08 18:07 Thomas Moestl
2004-07-09 14:32 ` Marcelo Tosatti
2004-07-10 21:38   ` Trond Myklebust
2004-07-11  0:32     ` viro
2004-07-10  6:57 ` raven
2004-07-10 18:19   ` Thomas Moestl
2004-07-10 19:25     ` raven
2004-07-10 19:50       ` Thomas Moestl
2004-07-11 10:16     ` raven

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox