public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Fix dcache race during umount
       [not found] <20060403133804.27986.patches@notabene>
@ 2006-04-03  3:40 ` NeilBrown
  2006-04-03 18:12   ` Balbir Singh
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2006-04-03  3:40 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Jan Blunck, Kirill Korotaev, olh, bsingharora

I think we finally have consensus on this patch.  However please speak
up if we don't :-)

Patch is against 2.6.16-mm2

NeilBrown


### Comments for Changeset

The race is that the shrink_dcache_memory shrinker could get called
while a filesystem is being unmounted, and could try to prune
a dentry belonging to that filesystem.

If it does, then it will call in to iput on the inode while the
dentry is no longer able to be found by the umounting process.  If
iput takes a while, generic_shutdown_super could get all the way
though shrink_dcache_parent and shrink_dcache_anon and
invalidate_inodes without ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to
touch it (which some filesystems certainly do), it will lose.  The
promised "Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry
on someone else's dentry.  As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a
filesystem that is being unmounted, and unmount will not be able to
start until any such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it
is doing the prune on behalf of so that it can be careful of other
filesystems.  shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon
list out of the superblock, so it can get the s_anon list itself, and
can pass the superblock down to prune_dcache.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: olh@suse.de
Cc: bsingharora@gmail.com

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/dcache.c            |   41 ++++++++++++++++++++++++++++-------------
 ./fs/super.c             |    2 +-
 ./include/linux/dcache.h |    2 +-
 3 files changed, 30 insertions(+), 15 deletions(-)

diff ./fs/dcache.c~current~ ./fs/dcache.c
--- ./fs/dcache.c~current~	2006-04-03 12:21:49.000000000 +1000
+++ ./fs/dcache.c	2006-04-03 12:21:55.000000000 +1000
@@ -382,6 +382,8 @@ static inline void prune_one_dentry(stru
 /**
  * prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @sb: if given, ignore dentries for other superblocks
+ *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -392,7 +394,7 @@ static inline void prune_one_dentry(stru
  * all the dentries are in use.
  */
  
-static void prune_dcache(int count)
+static void prune_dcache(int count, struct super_block *sb)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
@@ -419,15 +421,27 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
- 			list_add(&dentry->d_lru, &dentry_unused);
- 			dentry_stat.nr_unused++;
- 			spin_unlock(&dentry->d_lock);
-			continue;
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    (!sb || dentry->d_sb == sb)) {
+			if (sb) {
+				prune_one_dentry(dentry);
+				continue;
+			}
+			/* Need to avoid race with generic_shutdown_super */
+			if (down_read_trylock(&dentry->d_sb->s_umount) &&
+			    dentry->d_sb->s_root != NULL) {
+				prune_one_dentry(dentry);
+				up_read(&dentry->d_sb->s_umount);
+				continue;
+			}
 		}
-		prune_one_dentry(dentry);
+		/* The dentry was recently referenced, or the filesystem
+		 * is being unmounted, so don't free it. */
+
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -630,7 +644,7 @@ void shrink_dcache_parent(struct dentry 
 	int found;
 
 	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found);
+		prune_dcache(found, parent->d_sb);
 }
 
 /**
@@ -643,9 +657,10 @@ void shrink_dcache_parent(struct dentry 
  * done under dcache_lock.
  *
  */
-void shrink_dcache_anon(struct hlist_head *head)
+void shrink_dcache_anon(struct super_block *sb)
 {
 	struct hlist_node *lp;
+	struct hlist_head *head = &sb->s_anon;
 	int found;
 	do {
 		found = 0;
@@ -668,7 +683,7 @@ void shrink_dcache_anon(struct hlist_hea
 			}
 		}
 		spin_unlock(&dcache_lock);
-		prune_dcache(found);
+		prune_dcache(found, sb);
 	} while(found);
 }
 
@@ -689,7 +704,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr);
+		prune_dcache(nr, NULL);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }

diff ./fs/super.c~current~ ./fs/super.c
--- ./fs/super.c~current~	2006-04-03 12:21:49.000000000 +1000
+++ ./fs/super.c	2006-04-03 12:21:55.000000000 +1000
@@ -231,7 +231,7 @@ void generic_shutdown_super(struct super
 	if (root) {
 		sb->s_root = NULL;
 		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_anon(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);

diff ./include/linux/dcache.h~current~ ./include/linux/dcache.h
--- ./include/linux/dcache.h~current~	2006-04-03 12:21:49.000000000 +1000
+++ ./include/linux/dcache.h	2006-04-03 12:21:55.000000000 +1000
@@ -217,7 +217,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
+extern void shrink_dcache_anon(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-03  3:40 ` [PATCH] Fix dcache race during umount NeilBrown
@ 2006-04-03 18:12   ` Balbir Singh
  2006-04-04  0:59     ` Neil Brown
  2006-04-06 14:05     ` [PATCH 2.6.17-rc1-mm1] BUG due to freed dentry in dcache race fix Balbir Singh
  0 siblings, 2 replies; 14+ messages in thread
From: Balbir Singh @ 2006-04-03 18:12 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev, olh,
	balbir

Hi, Neil,

>
> Cc: Jan Blunck <jblunck@suse.de>
> Cc: Kirill Korotaev <dev@openvz.org>
> Cc: olh@suse.de
> Cc: bsingharora@gmail.com

Could you please make this balbir@in.ibm.com

>
> Signed-off-by: Neil Brown <neilb@suse.de>
>
<snip>
-               /* If the dentry was recently referenced, don't free it. */
-               if (dentry->d_flags & DCACHE_REFERENCED) {
-                       dentry->d_flags &= ~DCACHE_REFERENCED;
-                       list_add(&dentry->d_lru, &dentry_unused);
-                       dentry_stat.nr_unused++;
-                       spin_unlock(&dentry->d_lock);
-                       continue;
+               if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+                   (!sb || dentry->d_sb == sb)) {

Comments for the condition please. Something like

/*
 * If the dentry is not DCACHED_REFERENCED, it is time to move it to LRU list,
 * provided the super block is NULL (which means we are trying to reclaim memory
 * or this dentry belongs to the same super block that we want to shrink.
 */

One side-effect of this check I see is

Earlier, all prune_dcache() calls would prune the dentry cache. This
condition will cause dentries belonging only those super blocks being
shrink'ed to be freed up. shrink_dcache_memory() will have to do the
additional work of freeing dentries (especially for file systems like
sysfs, procfs, etc). But the good thing is it should make the per
super block operations faster (like unmount). IMO, this is the correct
behaviour, but I am not sure of the side-effects.

+                       if (sb) {
+                               prune_one_dentry(dentry);
+                               continue;
+                       }
> +                       /* Need to avoid race with generic_shutdown_super */
> +                       if (down_read_trylock(&dentry->d_sb->s_umount) &&
> +                           dentry->d_sb->s_root != NULL) {

There is a probable bug here. What if down_read_trylock() succeeds and
dentry->d_sb->s_root == NULL? We still need to do an up_read before we
move on.
The comment would be better put as

/*
 * If we are able to acquire the umount semaphore, then the super
block cannot be unmounted
 * while we are pruning this dentry. This helps avoid a race condition
that is caused due to
 * intermediate reference counts held by the children of the dentry in
prune_one_dentry().
 * This leads to select_dcache_parent() ignoring those dentries,
leaving behind non-dput
 * dentries. The unmount happens before prune_one_dentry() can dput
the dentries.
 */

> +                               prune_one_dentry(dentry);
> +                               up_read(&dentry->d_sb->s_umount);
> +                               continue;
> +                       }
>                 }
<snip>

Looks good to go, except for the comments (especially the up_read() bug)

Balbir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-03 18:12   ` Balbir Singh
@ 2006-04-04  0:59     ` Neil Brown
  2006-04-04  5:02       ` Balbir Singh
  2006-04-06 14:05     ` [PATCH 2.6.17-rc1-mm1] BUG due to freed dentry in dcache race fix Balbir Singh
  1 sibling, 1 reply; 14+ messages in thread
From: Neil Brown @ 2006-04-04  0:59 UTC (permalink / raw)
  To: balbir; +Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev, olh

On Monday April 3, balbir@in.ibm.com wrote:
> Hi, Neil,
> 
> >
> > Cc: Jan Blunck <jblunck@suse.de>
> > Cc: Kirill Korotaev <dev@openvz.org>
> > Cc: olh@suse.de
> > Cc: bsingharora@gmail.com
> 
> Could you please make this balbir@in.ibm.com

Sure.

> 
> >
> > Signed-off-by: Neil Brown <neilb@suse.de>
> >
> <snip>
> -               /* If the dentry was recently referenced, don't free it. */
> -               if (dentry->d_flags & DCACHE_REFERENCED) {
> -                       dentry->d_flags &= ~DCACHE_REFERENCED;
> -                       list_add(&dentry->d_lru, &dentry_unused);
> -                       dentry_stat.nr_unused++;
> -                       spin_unlock(&dentry->d_lock);
> -                       continue;
> +               if (!(dentry->d_flags & DCACHE_REFERENCED) &&
> +                   (!sb || dentry->d_sb == sb)) {
> 
> Comments for the condition please. Something like
> 
> /*
>  * If the dentry is not DCACHED_REFERENCED, it is time to move it to LRU list,
>  * provided the super block is NULL (which means we are trying to reclaim memory
>  * or this dentry belongs to the same super block that we want to shrink.
>  */

Ok, thanks.  However it isn't time to "move it to the LRU list" but
rather time to "move it from the LRU list, out of the cache all
together, and through it away".

> 
> One side-effect of this check I see is
> 
> Earlier, all prune_dcache() calls would prune the dentry cache. This
> condition will cause dentries belonging only those super blocks being
> shrink'ed to be freed up. shrink_dcache_memory() will have to do the
> additional work of freeing dentries (especially for file systems like
> sysfs, procfs, etc). But the good thing is it should make the per
> super block operations faster (like unmount). IMO, this is the correct
> behaviour, but I am not sure of the side-effects.
> 

Hmm... yes, but there is a worse side-effect I think. If
shrink_dcache_memory finds a dentry that it cannot free, it will move
it to the head of the LRU, so unmount will not be able to find it so
easily, and will end up moving it back down to the tail.  I don't
think this can livelock, but it is unpleasant.

Rather than move these entries to the head of the list, I'd like to
leave them at the tail, and try to skip over entries that we might not
be able to free.



> +                       if (sb) {
> +                               prune_one_dentry(dentry);
> +                               continue;
> +                       }
> > +                       /* Need to avoid race with generic_shutdown_super */
> > +                       if (down_read_trylock(&dentry->d_sb->s_umount) &&
> > +                           dentry->d_sb->s_root != NULL) {
> 
> There is a probable bug here. What if down_read_trylock() succeeds and
> dentry->d_sb->s_root == NULL? We still need to do an up_read before we
> move on.

Oops, yes.  Thanks for that!

> The comment would be better put as
> 
> /*
>  * If we are able to acquire the umount semaphore, then the super
> block cannot be unmounted
>  * while we are pruning this dentry. This helps avoid a race condition
> that is caused due to
>  * intermediate reference counts held by the children of the dentry in
> prune_one_dentry().
>  * This leads to select_dcache_parent() ignoring those dentries,
> leaving behind non-dput
>  * dentries. The unmount happens before prune_one_dentry() can dput
> the dentries.
>  */

Well, I've beefed up the comment, but I would rather focus on the
prune_one_dentry holding an inode rather than holding a dentry.  I
think that problem is clearer and it comes to the same thing.

Patch to follow.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Fix dcache race during umount
       [not found] <20060404110351.27364.patches@notabene>
@ 2006-04-04  1:05 ` NeilBrown
  2006-04-04  5:11   ` Balbir Singh
  0 siblings, 1 reply; 14+ messages in thread
From: NeilBrown @ 2006-04-04  1:05 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Jan Blunck, Kirill Korotaev, Olaf Hering,
	Balbir Singh

Trying again after some useful comments from Balbir.
Note: the change to prune_dcache looks quite different now.

NeilBrown

### Comments for Changeset

The race is that the shrink_dcache_memory shrinker could get called
while a filesystem is being unmounted, and could try to prune
a dentry belonging to that filesystem.

If it does, then it will call in to iput on the inode while the
dentry is no longer able to be found by the umounting process.  If
iput takes a while, generic_shutdown_super could get all the way
though shrink_dcache_parent and shrink_dcache_anon and
invalidate_inodes without ever waiting on this particular inode.

Eventually the superblock gets freed anyway and if the iput tried to
touch it (which some filesystems certainly do), it will lose.  The
promised "Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry
on someone else's dentry.  As a down_read_trylock is used,
shrink_dcache_memory will no longer try to prune the dentry of a
filesystem that is being unmounted, and unmount will not be able to
start until any such active prune_one_dentry completes.

This requires that prune_dcache *knows* which filesystem (if any) it
is doing the prune on behalf of so that it can be careful of other
filesystems.  shrink_dcache_memory isn't called it on behalf of any
filesystem, and so is careful of everything.

shrink_dcache_anon is now passed a super_block rather than the s_anon
list out of the superblock, so it can get the s_anon list itself, and
can pass the superblock down to prune_dcache.

If prune_dcache finds a dentry that it cannot free, it leaves it where
it is (at the tail of the list) and exits, on the assumption that some
other thread will be removing that dentry soon.  To try to make sure
that some work gets done, a limited number of dnetries which are
untouchable are skipped over while choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./fs/dcache.c            |   62 ++++++++++++++++++++++++++++++++++++++++++-----
 ./fs/super.c             |    2 -
 ./include/linux/dcache.h |    2 -
 3 files changed, 58 insertions(+), 8 deletions(-)

diff ./fs/dcache.c~current~ ./fs/dcache.c
--- ./fs/dcache.c~current~	2006-04-04 08:31:25.000000000 +1000
+++ ./fs/dcache.c	2006-04-04 10:58:14.000000000 +1000
@@ -382,6 +382,8 @@ static inline void prune_one_dentry(stru
 /**
  * prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @sb: if given, ignore dentries for other superblocks
+ *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -392,7 +394,7 @@ static inline void prune_one_dentry(stru
  * all the dentries are in use.
  */
  
-static void prune_dcache(int count)
+static void prune_dcache(int count, struct super_block *sb)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
@@ -402,6 +404,19 @@ static void prune_dcache(int count)
 		cond_resched_lock(&dcache_lock);
 
 		tmp = dentry_unused.prev;
+		if (unlikely(sb)) {
+			/* Try to find a dentry for this sb, but don't try
+			 * too hard, if they aren't near the tail they will
+			 * be moved down again soon
+			 */
+			int skip = count;
+			while (skip &&
+			       tmp != &dentry_unused &&
+			       list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
+				skip--;
+				tmp = tmp->prev;
+			}
+		}
 		if (tmp == &dentry_unused)
 			break;
 		list_del_init(tmp);
@@ -427,7 +442,41 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		prune_one_dentry(dentry);
+		/*
+		 * If the dentry is not DCACHED_REFERENCED, it is time
+		 * to remove it from the dcache, provided the super block is
+		 * NULL (which means we are trying to reclaim memory)
+		 * or this dentry belongs to the same super block that
+		 * we want to shrink.
+		 */
+		/*
+		 * If this dentry is for "my" filesystem, then I can
+		 * prune it without taking the s_umount lock (I already hold it).
+		 */
+		if (sb && dentry->d_sb == sb) {
+			prune_one_dentry(dentry);
+			continue;
+		}
+		/* ...otherwise we need to be sure this filesystem isn't being
+		 * unmounted, otherwise we could race with generic_shutdown_super,
+		 * and end up holding a reference to an inode while the
+		 * filesystem is unmounted.
+		 * So we try to get s_umount, and make sure s_root isn't NULL
+		 */
+		if (down_read_trylock(&dentry->d_sb->s_umount)) {
+			if (dentry->d_sb->s_root != NULL) {
+				prune_one_dentry(dentry);
+				up_read(&dentry->d_sb->s_umount);
+				continue;
+			}
+			up_read(&dentry->d_sb->s_umount);
+		}
+		spin_unlock(&dentry->d_lock);
+		/* Cannot remove the first dentry, and it isn't appropriate
+		 * to move it to the head of the list, so give up, and try
+		 * later
+		 */
+		break;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -630,7 +679,7 @@ void shrink_dcache_parent(struct dentry 
 	int found;
 
 	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found);
+		prune_dcache(found, parent->d_sb);
 }
 
 /**
@@ -643,9 +692,10 @@ void shrink_dcache_parent(struct dentry 
  * done under dcache_lock.
  *
  */
-void shrink_dcache_anon(struct hlist_head *head)
+void shrink_dcache_anon(struct super_block *sb)
 {
 	struct hlist_node *lp;
+	struct hlist_head *head = &sb->s_anon;
 	int found;
 	do {
 		found = 0;
@@ -668,7 +718,7 @@ void shrink_dcache_anon(struct hlist_hea
 			}
 		}
 		spin_unlock(&dcache_lock);
-		prune_dcache(found);
+		prune_dcache(found, sb);
 	} while(found);
 }
 
@@ -689,7 +739,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr);
+		prune_dcache(nr, NULL);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }

diff ./fs/super.c~current~ ./fs/super.c
--- ./fs/super.c~current~	2006-04-04 08:31:25.000000000 +1000
+++ ./fs/super.c	2006-04-03 12:21:55.000000000 +1000
@@ -231,7 +231,7 @@ void generic_shutdown_super(struct super
 	if (root) {
 		sb->s_root = NULL;
 		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_anon(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);

diff ./include/linux/dcache.h~current~ ./include/linux/dcache.h
--- ./include/linux/dcache.h~current~	2006-04-04 08:31:25.000000000 +1000
+++ ./include/linux/dcache.h	2006-04-03 12:21:55.000000000 +1000
@@ -217,7 +217,7 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
+extern void shrink_dcache_anon(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-04  0:59     ` Neil Brown
@ 2006-04-04  5:02       ` Balbir Singh
  0 siblings, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2006-04-04  5:02 UTC (permalink / raw)
  To: Neil Brown; +Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev, olh

> >  * If the dentry is not DCACHED_REFERENCED, it is time to move it to LRU list,
> >  * provided the super block is NULL (which means we are trying to reclaim memory
> >  * or this dentry belongs to the same super block that we want to shrink.
> >  */
>
> Ok, thanks.  However it isn't time to "move it to the LRU list" but
> rather time to "move it from the LRU list, out of the cache all
> together, and through it away".

Oops, yes

>
> >
> > One side-effect of this check I see is
> >
> > Earlier, all prune_dcache() calls would prune the dentry cache. This
> > condition will cause dentries belonging only those super blocks being
> > shrink'ed to be freed up. shrink_dcache_memory() will have to do the
> > additional work of freeing dentries (especially for file systems like
> > sysfs, procfs, etc). But the good thing is it should make the per
> > super block operations faster (like unmount). IMO, this is the correct
> > behaviour, but I am not sure of the side-effects.
> >
>
> Hmm... yes, but there is a worse side-effect I think. If
> shrink_dcache_memory finds a dentry that it cannot free, it will move
> it to the head of the LRU, so unmount will not be able to find it so
> easily, and will end up moving it back down to the tail.  I don't
> think this can livelock, but it is unpleasant.
>
> Rather than move these entries to the head of the list, I'd like to
> leave them at the tail, and try to skip over entries that we might not
> be able to free.

Yes and you could use the first pass of dcache_shrink_sb() to do that for you.

Thanks,
Balbir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-04-04  1:05 ` [PATCH] Fix dcache race during umount NeilBrown
@ 2006-04-04  5:11   ` Balbir Singh
  0 siblings, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2006-04-04  5:11 UTC (permalink / raw)
  To: NeilBrown
  Cc: Andrew Morton, linux-kernel, Jan Blunck, Kirill Korotaev,
	Olaf Hering

> Trying again after some useful comments from Balbir.
> Note: the change to prune_dcache looks quite different now.
>
> NeilBrown
>
<snip>
>                 tmp = dentry_unused.prev;
> +               if (unlikely(sb)) {
> +                       /* Try to find a dentry for this sb, but don't try
> +                        * too hard, if they aren't near the tail they will
> +                        * be moved down again soon
> +                        */
> +                       int skip = count;
> +                       while (skip &&
> +                              tmp != &dentry_unused &&
> +                              list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
> +                               skip--;
> +                               tmp = tmp->prev;
> +                       }
> +               }

This code looks very similar to the first pass in dcache_shrink_sb().
I would prefer if we could re-use that code here, but that would
require creating a helper function by splitting the function.

Looks like a good solution to me.

Regards,
Balbir

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 2.6.17-rc1-mm1] BUG due to freed dentry in dcache race fix
  2006-04-03 18:12   ` Balbir Singh
  2006-04-04  0:59     ` Neil Brown
@ 2006-04-06 14:05     ` Balbir Singh
  1 sibling, 0 replies; 14+ messages in thread
From: Balbir Singh @ 2006-04-06 14:05 UTC (permalink / raw)
  To: NeilBrown, Andrew Morton; +Cc: linux-kernel, Jan Blunck, Kirill Korotaev, olh

Hi, Andrew,

Please apply this patch on top of

fix-dcache-race-during-umount.patch

we need to save a reference to the s_umount read write semaphore. The dentry
can be freed by prune_one_dentry(). Dereferencing dentry->d_sb->s_umount is
not safe after that point.

I hit an Oops while running 2.6.17-rc1-mm1

DMA free:3584kB min:68kB low:84kB high:100kB active:10448kB inactive:0kB presentOops: 0002 [#1]
PREEMPT SMP
last sysfs file: /devices/pci0000:00/0000:00:0a.0/power/state
Modules linked in: loop dm_mod ide_cd cdrom ohci_hcd usbcore serverworks generii
CPU:    1
EIP:    0060:[<c10824f1>]    Not tainted VLI
EFLAGS: 00010212   (2.6.17-rc1-mm1cpum #2)
EIP is at prune_dcache+0x91/0x1d0
eax: 6b6b6ba7   ebx: e45918e0   ecx: 00000001   edx: ffffffff
esi: e45918e8   edi: 00000058   ebp: e4cfcbe0   esp: e4cfcbbc
ds: 007b   es: 007b   ss: 0068
Process hackbench (pid: 11183, threadinfo=e4cfc000 task=e4d076b0)
Stack: <0>c12fb400 e4cfcbd0 c122f5ed c2288504 00000000 00000000 0000283c 000a0f
       c2259404 e4cfcbe8 c108266e e4cfcc28 c104fe9b 00000080 000000d0 0000000b
       00000021 00000000 e4cfc000 00000000 0000008c e4cfc000 00000080 00004db7
Call Trace:
 <c1003f9d> show_stack_log_lvl+0xad/0xe0   <c10041e7> show_registers+0x1c7/0x250
 <c10043aa> die+0x13a/0x330   <c1230f50> do_page_fault+0x2d0/0x750
 <c1003987> error_code+0x4f/0x54   <c108266e> shrink_dcache_memory+0x3e/0x50
 <c104fe9b> shrink_slab+0x17b/0x240   <c105077f> try_to_free_pages+0x1bf/0x2b0
 <c104b466> __alloc_pages+0x136/0x310   <c10635fc> cache_alloc_refill+0x40c/0x70
 <c1063b86> __kmalloc_track_caller+0xc6/0xf0   <c11d922f> __alloc_skb+0x5f/0x110
 <c11d5247> sock_alloc_send_skb+0x1a7/0x200   <c1227a2d> unix_stream_sendmsg+0x0
 <c11d1bb4> do_sock_write+0xb4/0xc0   <c11d2367> sock_aio_write+0x67/0x70
 <c1067809> do_sync_write+0xb9/0xf0   <c10682f1> vfs_write+0x181/0x190
 <c1068a07> sys_write+0x47/0x70   <c122f93f> sysenter_past_esp+0x54/0x75
Code: 0a 75 f3 85 c0 0f 88 fe 00 00 00 8b 4b 60 8b 41 38 85 c0 0f 84 de 00 00 0

Thanks,
Balbir

Signed-off-by: Balbir Singh <balbir@in.ibm.com>
---

 fs/dcache.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletion(-)

diff -puN fs/dcache.c~dcache_race_umount_sem_fix fs/dcache.c
--- linux-2.6.17/fs/dcache.c~dcache_race_umount_sem_fix	2006-04-06 17:11:41.000000000 +0530
+++ linux-2.6.17-balbir/fs/dcache.c	2006-04-06 17:17:02.000000000 +0530
@@ -464,9 +464,14 @@ static void prune_dcache(int count, stru
 		 * So we try to get s_umount, and make sure s_root isn't NULL
 		 */
 		if (down_read_trylock(&dentry->d_sb->s_umount)) {
+			/*
+			 * Save the semaphore reference, prune_one_dentry() can
+			 * free the dentry
+			 */
+			struct rw_semaphore *umnt_sem = &dentry->d_sb->s_umount;
 			if (dentry->d_sb->s_root != NULL) {
 				prune_one_dentry(dentry);
-				up_read(&dentry->d_sb->s_umount);
+				up_read(umnt_sem);
 				continue;
 			}
 			up_read(&dentry->d_sb->s_umount);
_

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH] Fix dcache race during umount
@ 2006-06-22 12:22 David Howells
  2006-06-22 14:27 ` David Howells
                   ` (3 more replies)
  0 siblings, 4 replies; 14+ messages in thread
From: David Howells @ 2006-06-22 12:22 UTC (permalink / raw)
  To: neilb, balbir, akpm, aviro
  Cc: jblunck, dev, olh, dhowells, linux-kernel, linux-fsdevel


Hi Neil,

I'd like to propose an alternative to your patch to fix the dcache race
between unmounting a filesystem and the memory shrinker.

In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
instead of shrink_dcache_anon(), and the latter function is discarded
completely since it's no longer used.

This allows for the possibility of a superblock with multiple dentry trees in
it, such as exist in the patches for NFS superblock sharing that I worked out.

Otherwise, the patches are much the same, with the critical bit being
prune_dcache() ignoring any dentry for which the superblock is being unmounted
if called from shrink_dcache_memory().

I feel that prune_dcache() should probably at some point be merged into its
two callers, since shrink_dcache_parent() and select_parent() can probably
then do a better job of eating a dentry subtree from the leaves inwards, but I
haven't attempted that with this patch.

David
---
The race is that the shrink_dcache_memory() shrinker could get called while a
filesystem is being unmounted, and could try to prune a dentry belonging to
that filesystem.

If it does, then it will call in to iput() on the inode while the dentry is no
longer able to be found by the umounting process.  If iput() takes a while,
generic_shutdown_super() could get all the way though shrink_dcache_parent()
and shrink_dcache_sb() and invalidate_inodes() without ever waiting on this
particular inode.

Eventually the superblock gets freed anyway and if the iput() tried to touch it
(which some filesystems certainly do), it will lose.  The promised
"Self-destruct in 5 seconds" doesn't lead to a nice day.

The race is closed by holding s_umount while calling prune_one_dentry() on
someone else's dentry.  As a down_read_trylock() is used,
shrink_dcache_memory() will no longer try to prune the dentry of a filesystem
that is being unmounted, and unmount will not be able to start until any such
active prune_one_dentry() completes.

This requires that prune_dcache *knows* which filesystem (if any) it is doing
the prune on behalf of so that it can be careful of other filesystems.
shrink_dcache_memory() isn't called it on behalf of any filesystem, and so is
careful of everything.

generic_shutdown_super() now calls shrink_dcache_sb() rather than
shrink_dcache_anon() so that superblocks with multiple dentry trees can be
dealt with.  shrink_dcache_anon() is discarded since nothing then uses it.

If prune_dcache() finds a dentry that it cannot free, it leaves it where it is
(at the tail of the list) and exits, on the assumption that some other thread
will be removing that dentry soon.  To try to make sure that some work gets
done, a limited number of dnetries which are untouchable are skipped over while
choosing the dentry to work on.

I believe this race was first found by Kirill Korotaev.

Cc: Jan Blunck <jblunck@suse.de>
Cc: Kirill Korotaev <dev@openvz.org>
Cc: Olaf Hering <olh@suse.de>
Cc: Neil Brown <neilb@suse.de>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-Off-By: David Howells <dhowells@redhat.com>
---
warthog>diffstat -p1 fix-dcache-race-during-umount-2.patch 
 fs/dcache.c            |  100 +++++++++++++++++++++++++++----------------------
 fs/super.c             |    2 
 include/linux/dcache.h |    1 
 3 files changed, 58 insertions(+), 45 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 940d188..c88e426 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -382,6 +382,8 @@ static inline void prune_one_dentry(stru
 /**
  * prune_dcache - shrink the dcache
  * @count: number of entries to try and free
+ * @sb: if given, ignore dentries for other superblocks
+ *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
  * more memory, or simply when we need to unmount
@@ -392,16 +394,29 @@ static inline void prune_one_dentry(stru
  * all the dentries are in use.
  */
  
-static void prune_dcache(int count)
+static void prune_dcache(int count, struct super_block *sb)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
 		struct dentry *dentry;
 		struct list_head *tmp;
+		struct rw_semaphore *s_umount;
 
 		cond_resched_lock(&dcache_lock);
 
 		tmp = dentry_unused.prev;
+		if (unlikely(sb)) {
+			/* Try to find a dentry for this sb, but don't try
+			 * too hard, if they aren't near the tail they will
+			 * be moved down again soon
+			 */
+			int skip = count;
+			while (skip && tmp != &dentry_unused &&
+			    list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
+				skip--;
+				tmp = tmp->prev;
+			}
+		}
 		if (tmp == &dentry_unused)
 			break;
 		list_del_init(tmp);
@@ -427,7 +442,45 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		prune_one_dentry(dentry);
+		/*
+		 * If the dentry is not DCACHED_REFERENCED, it is time
+		 * to remove it from the dcache, provided the super block is
+		 * NULL (which means we are trying to reclaim memory)
+		 * or this dentry belongs to the same super block that
+		 * we want to shrink.
+		 */
+		/*
+		 * If this dentry is for "my" filesystem, then I can prune it
+		 * without taking the s_umount lock (I already hold it).
+		 */
+		if (sb && dentry->d_sb == sb) {
+			prune_one_dentry(dentry);
+			continue;
+		}
+		/*
+		 * ...otherwise we need to be sure this filesystem isn't being
+		 * unmounted, otherwise we could race with
+		 * generic_shutdown_super(), and end up holding a reference to
+		 * an inode while the filesystem is unmounted.
+		 * So we try to get s_umount, and make sure s_root isn't NULL.
+		 * (Take a local copy of s_umount to avoid a use-after-free of
+		 * `dentry').
+		 */
+		s_umount = &dentry->d_sb->s_umount;
+		if (down_read_trylock(s_umount)) {
+			if (dentry->d_sb->s_root != NULL) {
+				prune_one_dentry(dentry);
+				up_read(s_umount);
+				continue;
+			}
+			up_read(s_umount);
+		}
+		spin_unlock(&dentry->d_lock);
+		/* Cannot remove the first dentry, and it isn't appropriate
+		 * to move it to the head of the list, so give up, and try
+		 * later
+		 */
+		break;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -630,46 +683,7 @@ void shrink_dcache_parent(struct dentry 
 	int found;
 
 	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found);
-}
-
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct hlist_head *head)
-{
-	struct hlist_node *lp;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/* 
-			 * move only zero ref count dentries to the end 
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found);
-	} while(found);
+		prune_dcache(found, parent->d_sb);
 }
 
 /*
@@ -689,7 +703,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr);
+		prune_dcache(nr, NULL);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }
diff --git a/fs/super.c b/fs/super.c
index 15f2afd..beb0c11 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -231,7 +231,7 @@ void generic_shutdown_super(struct super
 	if (root) {
 		sb->s_root = NULL;
 		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index 836325e..0dd1610 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 David Howells
@ 2006-06-22 14:27 ` David Howells
  2006-06-22 16:08 ` Jan Blunck
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2006-06-22 14:27 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, jblunck, dev, olh, linux-kernel,
	linux-fsdevel

David Howells <dhowells@redhat.com> wrote:

> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

On the other hand, I can make my patch just alter the effect of yours.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 David Howells
  2006-06-22 14:27 ` David Howells
@ 2006-06-22 16:08 ` Jan Blunck
  2006-06-22 16:44   ` Jan Blunck
  2006-06-23 13:28 ` Jan Blunck
  2006-06-24  5:23 ` Neil Brown
  3 siblings, 1 reply; 14+ messages in thread
From: Jan Blunck @ 2006-06-22 16:08 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, dev, olh, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1105 bytes --]

On Thu, Jun 22, David Howells wrote:

> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

I had a similar patch. But after calling shrink_dcache_sb() in
generic_shutdown_super() the call to shrink_dcache_parent() is not necessary
anymore. And you should also fix d_genocide() that it is putting unused
dentries to the LRU list.

> I feel that prune_dcache() should probably at some point be merged into its
> two callers, since shrink_dcache_parent() and select_parent() can probably
> then do a better job of eating a dentry subtree from the leaves inwards, but I
> haven't attempted that with this patch.

Hmm, yes. The problem is that we only have a valid reference to the root
dentry of the subtree that we shrink. So we have to get a reference for the
parent of each dentry that we prune before calling prune_one_dentry().

Jan

[-- Attachment #2: combined.diff --]
[-- Type: text/x-patch, Size: 5535 bytes --]

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -384,9 +384,8 @@ static inline void prune_one_dentry(stru
  * @count: number of entries to try and free
  *
  * Shrink the dcache. This is done when we need
- * more memory, or simply when we need to unmount
- * something (at which point we need to unuse
- * all dentries).
+ * more memory. When we need to unmount something
+ * we call shrink_dcache_sb().
  *
  * This function may fail to free any resources if
  * all the dentries are in use.
@@ -419,15 +418,26 @@ static void prune_dcache(int count)
 			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
-			list_add(&dentry->d_lru, &dentry_unused);
-			dentry_stat.nr_unused++;
-			spin_unlock(&dentry->d_lock);
-			continue;
+		/* If the dentry was recently referenced, or dentry's
+		 * filesystem is going to be unmounted, don't free it. */
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    down_read_trylock(&dentry->d_sb->s_umount)) {
+			struct super_block *sb = dentry->d_sb;
+
+			if (dentry->d_sb->s_root) {
+				prune_one_dentry(dentry);
+				up_read(&sb->s_umount);
+				continue;
+			}
+			up_read(&sb->s_umount);
 		}
-		prune_one_dentry(dentry);
+		/* Append it at the beginning of the list, because either it
+		 * was recently reference or the dentry's filesystem is
+		 * unmounted so shrink_dcache_sb() can find it faster. */
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -464,12 +474,10 @@ void shrink_dcache_sb(struct super_block
 	 * superblock to the most recent end of the unused list.
 	 */
 	spin_lock(&dcache_lock);
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
-		list_del(tmp);
-		list_add(tmp, &dentry_unused);
+		list_move(&dentry->d_lru, &dentry_unused);
 	}
 
 	/*
@@ -633,45 +641,6 @@ void shrink_dcache_parent(struct dentry 
 		prune_dcache(found);
 }
 
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct hlist_head *head)
-{
-	struct hlist_node *lp;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/*
-			 * move only zero ref count dentries to the end
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found);
-	} while(found);
-}
-
 /*
  * Scan `nr' dentries and return the number which remain.
  *
@@ -1604,19 +1573,38 @@ repeat:
 resume:
 	while (next != &this_parent->d_subdirs) {
 		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
+		struct dentry *dentry = list_entry(tmp, struct dentry,
+						   d_u.d_child);
 		next = tmp->next;
+
 		if (d_unhashed(dentry)||!dentry->d_inode)
 			continue;
+
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		/*
+		 * We can lower the reference count here:
+		 * - if the refcount is zero afterwards, the dentry hasn't got
+		 *   any children
+		 * - if the recount isn't zero afterwards, we visit the
+		 *   chrildren next
+		 * - because we always hold the dcache lock, nobody else can
+		 *   kill the unused dentries yet
+		 */
+		if (atomic_dec_and_test(&dentry->d_count)) {
+			list_add_tail(&dentry->d_lru, &dentry_unused);
+			dentry_stat.nr_unused++;
+		}
+
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
 			goto repeat;
 		}
-		atomic_dec(&dentry->d_count);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		atomic_dec(&this_parent->d_count);
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -230,8 +230,7 @@ void generic_shutdown_super(struct super
 
 	if (root) {
 		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 16:08 ` Jan Blunck
@ 2006-06-22 16:44   ` Jan Blunck
  0 siblings, 0 replies; 14+ messages in thread
From: Jan Blunck @ 2006-06-22 16:44 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, dev, olh, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 368 bytes --]

On Thu, Jun 22, Jan Blunck wrote:

> I had a similar patch. But after calling shrink_dcache_sb() in
> generic_shutdown_super() the call to shrink_dcache_parent() is not necessary
> anymore. And you should also fix d_genocide() that it is putting unused
> dentries to the LRU list.

And I even have a patch that does compile ... so please ignore the previous
one.

Jan

[-- Attachment #2: combined.diff --]
[-- Type: text/x-patch, Size: 6207 bytes --]

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -384,9 +384,8 @@ static inline void prune_one_dentry(stru
  * @count: number of entries to try and free
  *
  * Shrink the dcache. This is done when we need
- * more memory, or simply when we need to unmount
- * something (at which point we need to unuse
- * all dentries).
+ * more memory. When we need to unmount something
+ * we call shrink_dcache_sb().
  *
  * This function may fail to free any resources if
  * all the dentries are in use.
@@ -419,15 +418,26 @@ static void prune_dcache(int count)
  			spin_unlock(&dentry->d_lock);
 			continue;
 		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
- 			list_add(&dentry->d_lru, &dentry_unused);
- 			dentry_stat.nr_unused++;
- 			spin_unlock(&dentry->d_lock);
-			continue;
+		/* If the dentry was recently referenced, or dentry's
+		 * filesystem is going to be unmounted, don't free it. */
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    down_read_trylock(&dentry->d_sb->s_umount)) {
+			struct super_block *sb = dentry->d_sb;
+
+			if (dentry->d_sb->s_root) {
+				prune_one_dentry(dentry);
+				up_read(&sb->s_umount);
+				continue;
+			}
+			up_read(&sb->s_umount);
 		}
-		prune_one_dentry(dentry);
+		/* Append it at the beginning of the list, because either it
+		 * was recently reference or the dentry's filesystem is
+		 * unmounted so shrink_dcache_sb() can find it faster. */
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -456,32 +466,28 @@ static void prune_dcache(int count)
 
 void shrink_dcache_sb(struct super_block * sb)
 {
-	struct list_head *tmp, *next;
-	struct dentry *dentry;
+	struct dentry *dentry, *pos;
 
 	/*
 	 * Pass one ... move the dentries for the specified
 	 * superblock to the most recent end of the unused list.
 	 */
 	spin_lock(&dcache_lock);
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
-		list_del(tmp);
-		list_add(tmp, &dentry_unused);
+		list_move(&dentry->d_lru, &dentry_unused);
 	}
 
 	/*
 	 * Pass two ... free the dentries for this superblock.
 	 */
 repeat:
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
 		dentry_stat.nr_unused--;
-		list_del_init(tmp);
+		list_del_init(&dentry->d_lru);
 		spin_lock(&dentry->d_lock);
 		if (atomic_read(&dentry->d_count)) {
 			spin_unlock(&dentry->d_lock);
@@ -633,45 +639,6 @@ void shrink_dcache_parent(struct dentry 
 		prune_dcache(found);
 }
 
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct hlist_head *head)
-{
-	struct hlist_node *lp;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/* 
-			 * move only zero ref count dentries to the end 
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found);
-	} while(found);
-}
-
 /*
  * Scan `nr' dentries and return the number which remain.
  *
@@ -1604,19 +1571,38 @@ repeat:
 resume:
 	while (next != &this_parent->d_subdirs) {
 		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
+		struct dentry *dentry = list_entry(tmp, struct dentry,
+						   d_u.d_child);
 		next = tmp->next;
+
 		if (d_unhashed(dentry)||!dentry->d_inode)
 			continue;
+
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		/*
+		 * We can lower the reference count here:
+		 * - if the refcount is zero afterwards, the dentry hasn't got
+		 *   any children
+		 * - if the recount isn't zero afterwards, we visit the
+		 *   chrildren next
+		 * - because we always hold the dcache lock, nobody else can
+		 *   kill the unused dentries yet
+		 */
+		if (atomic_dec_and_test(&dentry->d_count)) {
+			list_add_tail(&dentry->d_lru, &dentry_unused);
+			dentry_stat.nr_unused++;
+		}
+
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
 			goto repeat;
 		}
-		atomic_dec(&dentry->d_count);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		atomic_dec(&this_parent->d_count);
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -230,8 +230,7 @@ void generic_shutdown_super(struct super
 
 	if (root) {
 		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_anon(&sb->s_anon);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct hlist_head *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 David Howells
  2006-06-22 14:27 ` David Howells
  2006-06-22 16:08 ` Jan Blunck
@ 2006-06-23 13:28 ` Jan Blunck
  2006-06-24  5:23 ` Neil Brown
  3 siblings, 0 replies; 14+ messages in thread
From: Jan Blunck @ 2006-06-23 13:28 UTC (permalink / raw)
  To: David Howells
  Cc: neilb, balbir, akpm, aviro, dev, olh, linux-kernel, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 545 bytes --]

On Thu, Jun 22, David Howells wrote:

> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

I've updated my patches to Linus git repository from today. Additionally, I've
changed shrink_dcache_parent() to not use prune_dcache() anymore.

Comments are welcome.

Regards,
	Jan

[-- Attachment #2: combined.diff --]
[-- Type: text/x-patch, Size: 10947 bytes --]

Index: linux-2.6/fs/dcache.c
===================================================================
--- linux-2.6.orig/fs/dcache.c
+++ linux-2.6/fs/dcache.c
@@ -387,101 +387,96 @@ static void prune_one_dentry(struct dent
  *         which are being unmounted.
  *
  * Shrink the dcache. This is done when we need
- * more memory, or simply when we need to unmount
- * something (at which point we need to unuse
- * all dentries).
+ * more memory. When we need to unmount something
+ * we call shrink_dcache_sb().
  *
  * This function may fail to free any resources if
  * all the dentries are in use.
  */
- 
-static void prune_dcache(int count, struct super_block *sb)
+static void prune_dcache(int count)
 {
 	spin_lock(&dcache_lock);
 	for (; count ; count--) {
 		struct dentry *dentry;
 		struct list_head *tmp;
-		struct rw_semaphore *s_umount;
 
 		cond_resched_lock(&dcache_lock);
 
 		tmp = dentry_unused.prev;
-		if (unlikely(sb)) {
-			/* Try to find a dentry for this sb, but don't try
-			 * too hard, if they aren't near the tail they will
-			 * be moved down again soon
-			 */
-			int skip = count;
-			while (skip && tmp != &dentry_unused &&
-			    list_entry(tmp, struct dentry, d_lru)->d_sb != sb) {
-				skip--;
-				tmp = tmp->prev;
-			}
-		}
 		if (tmp == &dentry_unused)
 			break;
-		list_del_init(tmp);
-		prefetch(dentry_unused.prev);
- 		dentry_stat.nr_unused--;
+		/*
+		 * We want to prefetch the predecessor of our predecessor since
+		 * our predecessor is already accessed in list_del_init().
+		 */
+		prefetch(tmp->prev->prev);
 		dentry = list_entry(tmp, struct dentry, d_lru);
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
 
- 		spin_lock(&dentry->d_lock);
+		spin_lock(&dentry->d_lock);
 		/*
 		 * We found an inuse dentry which was not removed from
-		 * dentry_unused because of laziness during lookup.  Do not free
+		 * dentry_unused because of laziness during lookup. Do not free
 		 * it - just keep it off the dentry_unused list.
 		 */
- 		if (atomic_read(&dentry->d_count)) {
- 			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		/* If the dentry was recently referenced, don't free it. */
-		if (dentry->d_flags & DCACHE_REFERENCED) {
-			dentry->d_flags &= ~DCACHE_REFERENCED;
- 			list_add(&dentry->d_lru, &dentry_unused);
- 			dentry_stat.nr_unused++;
- 			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		/*
-		 * If the dentry is not DCACHED_REFERENCED, it is time
-		 * to remove it from the dcache, provided the super block is
-		 * NULL (which means we are trying to reclaim memory)
-		 * or this dentry belongs to the same super block that
-		 * we want to shrink.
-		 */
-		/*
-		 * If this dentry is for "my" filesystem, then I can prune it
-		 * without taking the s_umount lock (I already hold it).
-		 */
-		if (sb && dentry->d_sb == sb) {
-			prune_one_dentry(dentry);
+		if (atomic_read(&dentry->d_count)) {
+			spin_unlock(&dentry->d_lock);
 			continue;
 		}
 		/*
-		 * ...otherwise we need to be sure this filesystem isn't being
-		 * unmounted, otherwise we could race with
-		 * generic_shutdown_super(), and end up holding a reference to
-		 * an inode while the filesystem is unmounted.
-		 * So we try to get s_umount, and make sure s_root isn't NULL.
-		 * (Take a local copy of s_umount to avoid a use-after-free of
-		 * `dentry').
+		 * If the dentry was recently referenced, or the dentry's
+		 * filesystem is going to be unmounted, don't free it.
 		 */
-		s_umount = &dentry->d_sb->s_umount;
-		if (down_read_trylock(s_umount)) {
-			if (dentry->d_sb->s_root != NULL) {
+		if (!(dentry->d_flags & DCACHE_REFERENCED) &&
+		    likely(down_read_trylock(&dentry->d_sb->s_umount))) {
+			struct super_block *sb = dentry->d_sb;
+
+			if (dentry->d_sb->s_root) {
 				prune_one_dentry(dentry);
-				up_read(s_umount);
+				up_read(&sb->s_umount);
 				continue;
 			}
-			up_read(s_umount);
+			up_read(&sb->s_umount);
 		}
-		spin_unlock(&dentry->d_lock);
-		/* Cannot remove the first dentry, and it isn't appropriate
-		 * to move it to the head of the list, so give up, and try
-		 * later
+		/*
+		 * Either the dentry was recently referenced or its filesystem
+		 * is going to be unmounted. Add the dentry to the beginning of
+		 * the LRU list so shrink_dcache_sb() can find it faster.
 		 */
-		break;
+		dentry->d_flags &= ~DCACHE_REFERENCED;
+		list_add(&dentry->d_lru, &dentry_unused);
+		dentry_stat.nr_unused++;
+		spin_unlock(&dentry->d_lock);
+	}
+	spin_unlock(&dcache_lock);
+}
+
+/*
+ * __shrink_dcache_sb - Shrink the dentries for a superblock
+ * @sb: superblock
+ * @list: list of dentries
+ *
+ * Prune a list of dentries of a superblock.
+ */
+static void __shrink_dcache_sb(struct super_block *sb, struct list_head *list)
+{
+	struct dentry *dentry, *pos;
+
+	spin_lock(&dcache_lock);
+repeat:
+	list_for_each_entry_safe(dentry, pos, list, d_lru) {
+		BUG_ON(dentry->d_sb != sb);
+		dentry_stat.nr_unused--;
+		list_del_init(&dentry->d_lru);
+		spin_lock(&dentry->d_lock);
+		if (atomic_read(&dentry->d_count)) {
+			spin_unlock(&dentry->d_lock);
+			continue;
+		}
+		prune_one_dentry(dentry);
+		cond_resched_lock(&dcache_lock);
+		goto repeat;
 	}
 	spin_unlock(&dcache_lock);
 }
@@ -505,47 +500,31 @@ static void prune_dcache(int count, stru
  *
  * Shrink the dcache for the specified super block. This
  * is used to free the dcache before unmounting a file
- * system
+ * system.
+ *
+ * This must be called with sb->s_umount locked.
  */
-
 void shrink_dcache_sb(struct super_block * sb)
 {
-	struct list_head *tmp, *next;
-	struct dentry *dentry;
+	struct dentry *dentry, *pos;
+	LIST_HEAD(list);
 
 	/*
 	 * Pass one ... move the dentries for the specified
-	 * superblock to the most recent end of the unused list.
+	 * superblock to the temporarily list.
 	 */
 	spin_lock(&dcache_lock);
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
+	list_for_each_entry_safe(dentry, pos, &dentry_unused, d_lru) {
 		if (dentry->d_sb != sb)
 			continue;
-		list_del(tmp);
-		list_add(tmp, &dentry_unused);
+		list_move(&dentry->d_lru, &list);
 	}
+	spin_unlock(&dcache_lock);
 
 	/*
 	 * Pass two ... free the dentries for this superblock.
 	 */
-repeat:
-	list_for_each_safe(tmp, next, &dentry_unused) {
-		dentry = list_entry(tmp, struct dentry, d_lru);
-		if (dentry->d_sb != sb)
-			continue;
-		dentry_stat.nr_unused--;
-		list_del_init(tmp);
-		spin_lock(&dentry->d_lock);
-		if (atomic_read(&dentry->d_count)) {
-			spin_unlock(&dentry->d_lock);
-			continue;
-		}
-		prune_one_dentry(dentry);
-		cond_resched_lock(&dcache_lock);
-		goto repeat;
-	}
-	spin_unlock(&dcache_lock);
+	__shrink_dcache_sb(sb, &list);
 }
 
 /*
@@ -614,7 +593,7 @@ positive:
  * drop the lock and return early due to latency
  * constraints.
  */
-static int select_parent(struct dentry * parent)
+static int select_parent(struct dentry * parent, struct list_head *list)
 {
 	struct dentry *this_parent = parent;
 	struct list_head *next;
@@ -638,7 +617,7 @@ resume:
 		 * of the unused list for prune_dcache
 		 */
 		if (!atomic_read(&dentry->d_count)) {
-			list_add(&dentry->d_lru, dentry_unused.prev);
+			list_add(&dentry->d_lru, list);
 			dentry_stat.nr_unused++;
 			found++;
 		}
@@ -681,50 +660,11 @@ out:
  
 void shrink_dcache_parent(struct dentry * parent)
 {
+	LIST_HEAD(list);
 	int found;
 
-	while ((found = select_parent(parent)) != 0)
-		prune_dcache(found, parent->d_sb);
-}
-
-/**
- * shrink_dcache_anon - further prune the cache
- * @head: head of d_hash list of dentries to prune
- *
- * Prune the dentries that are anonymous
- *
- * parsing d_hash list does not hlist_for_each_entry_rcu() as it
- * done under dcache_lock.
- *
- */
-void shrink_dcache_anon(struct super_block *sb)
-{
-	struct hlist_node *lp;
-	struct hlist_head *head = &sb->s_anon;
-	int found;
-	do {
-		found = 0;
-		spin_lock(&dcache_lock);
-		hlist_for_each(lp, head) {
-			struct dentry *this = hlist_entry(lp, struct dentry, d_hash);
-			if (!list_empty(&this->d_lru)) {
-				dentry_stat.nr_unused--;
-				list_del_init(&this->d_lru);
-			}
-
-			/* 
-			 * move only zero ref count dentries to the end 
-			 * of the unused list for prune_dcache
-			 */
-			if (!atomic_read(&this->d_count)) {
-				list_add_tail(&this->d_lru, &dentry_unused);
-				dentry_stat.nr_unused++;
-				found++;
-			}
-		}
-		spin_unlock(&dcache_lock);
-		prune_dcache(found, sb);
-	} while(found);
+	while ((found = select_parent(parent, &list)) != 0)
+		__shrink_dcache_sb(parent->d_sb, &list);
 }
 
 /*
@@ -744,7 +684,7 @@ static int shrink_dcache_memory(int nr, 
 	if (nr) {
 		if (!(gfp_mask & __GFP_FS))
 			return -1;
-		prune_dcache(nr, NULL);
+		prune_dcache(nr);
 	}
 	return (dentry_stat.nr_unused / 100) * sysctl_vfs_cache_pressure;
 }
@@ -1659,19 +1599,38 @@ repeat:
 resume:
 	while (next != &this_parent->d_subdirs) {
 		struct list_head *tmp = next;
-		struct dentry *dentry = list_entry(tmp, struct dentry, d_u.d_child);
+		struct dentry *dentry = list_entry(tmp, struct dentry,
+						   d_u.d_child);
 		next = tmp->next;
+
 		if (d_unhashed(dentry)||!dentry->d_inode)
 			continue;
+
+		if (!list_empty(&dentry->d_lru)) {
+			dentry_stat.nr_unused--;
+			list_del_init(&dentry->d_lru);
+		}
+		/*
+		 * We can lower the reference count here:
+		 * - if the refcount is zero afterwards, the dentry hasn't got
+		 *   any children
+		 * - if the recount isn't zero afterwards, we visit the
+		 *   chrildren next
+		 * - because we always hold the dcache lock, nobody else can
+		 *   kill the unused dentries yet
+		 */
+		if (atomic_dec_and_test(&dentry->d_count)) {
+			list_add_tail(&dentry->d_lru, &dentry_unused);
+			dentry_stat.nr_unused++;
+		}
+
 		if (!list_empty(&dentry->d_subdirs)) {
 			this_parent = dentry;
 			goto repeat;
 		}
-		atomic_dec(&dentry->d_count);
 	}
 	if (this_parent != root) {
 		next = this_parent->d_u.d_child.next;
-		atomic_dec(&this_parent->d_count);
 		this_parent = this_parent->d_parent;
 		goto resume;
 	}
Index: linux-2.6/fs/super.c
===================================================================
--- linux-2.6.orig/fs/super.c
+++ linux-2.6/fs/super.c
@@ -230,8 +230,7 @@ void generic_shutdown_super(struct super
 
 	if (root) {
 		sb->s_root = NULL;
-		shrink_dcache_parent(root);
-		shrink_dcache_anon(sb);
+		shrink_dcache_sb(sb);
 		dput(root);
 		fsync_super(sb);
 		lock_super(sb);
Index: linux-2.6/include/linux/dcache.h
===================================================================
--- linux-2.6.orig/include/linux/dcache.h
+++ linux-2.6/include/linux/dcache.h
@@ -217,7 +217,6 @@ extern struct dentry * d_alloc_anon(stru
 extern struct dentry * d_splice_alias(struct inode *, struct dentry *);
 extern void shrink_dcache_sb(struct super_block *);
 extern void shrink_dcache_parent(struct dentry *);
-extern void shrink_dcache_anon(struct super_block *);
 extern int d_invalidate(struct dentry *);
 
 /* only used at mount-time */

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-22 12:22 David Howells
                   ` (2 preceding siblings ...)
  2006-06-23 13:28 ` Jan Blunck
@ 2006-06-24  5:23 ` Neil Brown
  2006-06-24  9:16   ` David Howells
  3 siblings, 1 reply; 14+ messages in thread
From: Neil Brown @ 2006-06-24  5:23 UTC (permalink / raw)
  To: David Howells
  Cc: balbir, akpm, aviro, jblunck, dev, olh, linux-kernel,
	linux-fsdevel

On Thursday June 22, dhowells@redhat.com wrote:
> 
> Hi Neil,
> 
> I'd like to propose an alternative to your patch to fix the dcache race
> between unmounting a filesystem and the memory shrinker.
> 
> In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> instead of shrink_dcache_anon(), and the latter function is discarded
> completely since it's no longer used.

Is that a good idea?
I was under the impression that on large machines, the shear size of
the dentry_unused list makes scanning all of it under the dcache_lock
an unpleasant thing to do.

Do you not have easy access to the roots of all trees in your
super-block-sharing situation so that shrink_dcache_parent can be
called on them all?

I would have thought that we want to get rid of shrink_dcache_sb rather
than create more users of it ??

> 
> I feel that prune_dcache() should probably at some point be merged into its
> two callers, since shrink_dcache_parent() and select_parent() can probably
> then do a better job of eating a dentry subtree from the leaves inwards, but I
> haven't attempted that with this patch.
> 

I think this is true, prune_dcache is serving two masters and is
overly complex as a result.

NeilBrown

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] Fix dcache race during umount
  2006-06-24  5:23 ` Neil Brown
@ 2006-06-24  9:16   ` David Howells
  0 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2006-06-24  9:16 UTC (permalink / raw)
  To: Neil Brown
  Cc: David Howells, balbir, akpm, aviro, jblunck, dev, olh,
	linux-kernel, linux-fsdevel

Neil Brown <neilb@suse.de> wrote:

> > In my patch, generic_shutdown_super() is made to call shrink_dcache_sb()
> > instead of shrink_dcache_anon(), and the latter function is discarded
> > completely since it's no longer used.
> 
> Is that a good idea?

It depends on how often you expect unmounts to be happening, I suppose.

> Do you not have easy access to the roots of all trees in your
> super-block-sharing situation so that shrink_dcache_parent can be
> called on them all?

Well, all the roots are on the anon list, it's just that shrink_dcache_anon()
can't get rid of any root that's got children.

For unmounting specifically, we can do better as we can consume the dentry
trees directly.  That's not too difficult when we can unconditionally destroy
them from the leaves inwards.  That way we could probably avoid calling
shrink_dcache_parent() also - stick the tree at s_root on to the anon list
during unmount and have a single algorithm to wipe away the whole lot from
there.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2006-06-24  9:16 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20060403133804.27986.patches@notabene>
2006-04-03  3:40 ` [PATCH] Fix dcache race during umount NeilBrown
2006-04-03 18:12   ` Balbir Singh
2006-04-04  0:59     ` Neil Brown
2006-04-04  5:02       ` Balbir Singh
2006-04-06 14:05     ` [PATCH 2.6.17-rc1-mm1] BUG due to freed dentry in dcache race fix Balbir Singh
     [not found] <20060404110351.27364.patches@notabene>
2006-04-04  1:05 ` [PATCH] Fix dcache race during umount NeilBrown
2006-04-04  5:11   ` Balbir Singh
2006-06-22 12:22 David Howells
2006-06-22 14:27 ` David Howells
2006-06-22 16:08 ` Jan Blunck
2006-06-22 16:44   ` Jan Blunck
2006-06-23 13:28 ` Jan Blunck
2006-06-24  5:23 ` Neil Brown
2006-06-24  9:16   ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox