public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
* d_splice_alias() problem.
@ 2004-04-23 13:02 Nikita Danilov
  2004-04-23 15:40 ` Andreas Dilger
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Nikita Danilov @ 2004-04-23 13:02 UTC (permalink / raw)
  To: linux kernel mailing list, alexander viro, trond myklebust,
	neil brown

Hello,

for some time I am observing that during stress tests over NFS

   shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()

is called on inode with ->i_nlink == 0 which results in truncate and
file deletion. This is wrong in general (file system is re-entered), and
deadlock prone on some file systems.

After some debugging, I tracked problem down the to d_splice_alias()
failing to identify dentries when necessary.

Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
comes for this file. nfsd looks name up in the parent directory
(nfsd_unlink()->lookup_one_len()). File system back-end uses
d_splice_alias(), but it only works for directories and we end up with
second (this time connected) dentry D2.

D2 is successfully unlinked, file has ->i_nlink == 0, and ->i_count == 1
from D1, and when prune_dcache() hits D1 bad things happen.

It's hard to imagine how new name can be identified with one among
multiple anonymous dentries, which is necessary for
NFSEXP_NOSUBTREECHECK export to work reliably.

One possible work-around is to forcibly destroy all remaining
DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
sure that this is possible and solves all problems of having more
dentries than there are nlinks.

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-23 13:02 d_splice_alias() problem Nikita Danilov
@ 2004-04-23 15:40 ` Andreas Dilger
  2004-04-23 16:20   ` Nikita Danilov
  2004-04-23 23:49 ` Andrew Morton
  2004-04-30  4:54 ` Neil Brown
  2 siblings, 1 reply; 22+ messages in thread
From: Andreas Dilger @ 2004-04-23 15:40 UTC (permalink / raw)
  To: Nikita Danilov
  Cc: linux kernel mailing list, alexander viro, trond myklebust,
	neil brown

On Apr 23, 2004  17:02 +0400, Nikita Danilov wrote:
> Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
> DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
> comes for this file. nfsd looks name up in the parent directory
> (nfsd_unlink()->lookup_one_len()). File system back-end uses
> d_splice_alias(), but it only works for directories and we end up with
> second (this time connected) dentry D2.
> 
> It's hard to imagine how new name can be identified with one among
> multiple anonymous dentries, which is necessary for
> NFSEXP_NOSUBTREECHECK export to work reliably.
> 
> One possible work-around is to forcibly destroy all remaining
> DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
> sure that this is possible and solves all problems of having more
> dentries than there are nlinks.

We use a patch for Lustre which solves this problem.  When there is
a lookup-by-inum done on the server there is the possibility to get a
DISCONNECTED dentry as you say.  However, if we ever do another lookup
on this inode we verify that either this is a disconnected dentry and
return the existing dentry, or if it is a connected dentry we essentially
"rename" the disconnected dentry and connect it to the tree and return
that.  There can never be both connected and disconnected dentry aliases
on an inode at one time.

This is handled inside the ext3 lookup code, I'm not sure how easy/hard
it would be to make a generic VFS patch to do the same.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-23 15:40 ` Andreas Dilger
@ 2004-04-23 16:20   ` Nikita Danilov
  0 siblings, 0 replies; 22+ messages in thread
From: Nikita Danilov @ 2004-04-23 16:20 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: linux kernel mailing list, alexander viro, trond myklebust,
	neil brown

Andreas Dilger writes:
 > On Apr 23, 2004  17:02 +0400, Nikita Danilov wrote:
 > > Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
 > > DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
 > > comes for this file. nfsd looks name up in the parent directory
 > > (nfsd_unlink()->lookup_one_len()). File system back-end uses
 > > d_splice_alias(), but it only works for directories and we end up with
 > > second (this time connected) dentry D2.
 > > 
 > > It's hard to imagine how new name can be identified with one among
 > > multiple anonymous dentries, which is necessary for
 > > NFSEXP_NOSUBTREECHECK export to work reliably.
 > > 
 > > One possible work-around is to forcibly destroy all remaining
 > > DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
 > > sure that this is possible and solves all problems of having more
 > > dentries than there are nlinks.
 > 
 > We use a patch for Lustre which solves this problem.  When there is
 > a lookup-by-inum done on the server there is the possibility to get a
 > DISCONNECTED dentry as you say.  However, if we ever do another lookup
 > on this inode we verify that either this is a disconnected dentry and
 > return the existing dentry, or if it is a connected dentry we essentially
 > "rename" the disconnected dentry and connect it to the tree and return
 > that.  There can never be both connected and disconnected dentry aliases
 > on an inode at one time.

I am not sure I understand this description correctly, but it looks
pretty much like what d_splice_alias() is supposed to do according to
the comment on top of it.

What I missed is that inode can have no more than one disconnected
dentry even if it has multiple names. Hence when lookup-by-name happens
list we can d_move disconnected dentry in place of the named one. This
is no worse that what is done currently by d_find_alias() that
identifies disconnected dentry with arbitrary connected.

 > 
 > This is handled inside the ext3 lookup code, I'm not sure how easy/hard
 > it would be to make a generic VFS patch to do the same.
 > 
 > Cheers, Andreas

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-23 13:02 d_splice_alias() problem Nikita Danilov
  2004-04-23 15:40 ` Andreas Dilger
@ 2004-04-23 23:49 ` Andrew Morton
  2004-04-26 12:45   ` Nikita Danilov
  2004-04-30  4:54 ` Neil Brown
  2 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2004-04-23 23:49 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: linux-kernel, viro, trondmy, neilb

Nikita Danilov <Nikita@Namesys.COM> wrote:
>
> for some time I am observing that during stress tests over NFS
> 
>     shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
> 
>  is called on inode with ->i_nlink == 0 which results in truncate and
>  file deletion. This is wrong in general (file system is re-entered), and
>  deadlock prone on some file systems.

The filesystem is only reentered if the caller of __alloc_pages() passed in
__GFP_FS, in which case the bug is in the caller, not in shrink_slab().


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-23 23:49 ` Andrew Morton
@ 2004-04-26 12:45   ` Nikita Danilov
  0 siblings, 0 replies; 22+ messages in thread
From: Nikita Danilov @ 2004-04-26 12:45 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, viro, trondmy, neilb

Andrew Morton writes:
 > Nikita Danilov <Nikita@Namesys.COM> wrote:
 > >
 > > for some time I am observing that during stress tests over NFS
 > > 
 > >     shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
 > > 
 > >  is called on inode with ->i_nlink == 0 which results in truncate and
 > >  file deletion. This is wrong in general (file system is re-entered), and
 > >  deadlock prone on some file systems.
 > 
 > The filesystem is only reentered if the caller of __alloc_pages() passed in
 > __GFP_FS, in which case the bug is in the caller, not in shrink_slab().

Well, I always thought that the only file system IO that GFP_FS is
expected to do is one issued by ->writepage. Doing truncate from within
VM scanner looks... wrong. But that's not the point, actually. Current
d_splice_alias leads to the following problems with NFS (and may be
other remote file systems also):

 * there are more dentries than nlinks for a given file, as a result

 * file (not opened by user) is not truncated when its last name is
   removed. inode is pinned in the memory indefinitely by remaining
   disconnected dentries.

 * sequence "touch x; rm x" always creates _two_ dentries for "x": one
   disconnected (by ->decode_fh) and one connected (by lookup_one_len
   from nfs unlink request).

I think that d_splice_alias() should be changed to scan inode->i_dentry
list and d_move() any disconnected dentry found into new one.

 > 

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-23 13:02 d_splice_alias() problem Nikita Danilov
  2004-04-23 15:40 ` Andreas Dilger
  2004-04-23 23:49 ` Andrew Morton
@ 2004-04-30  4:54 ` Neil Brown
  2004-04-30  7:50   ` Greg Banks
                     ` (2 more replies)
  2 siblings, 3 replies; 22+ messages in thread
From: Neil Brown @ 2004-04-30  4:54 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: linux kernel mailing list, alexander viro, trond myklebust

On Friday April 23, Nikita@Namesys.COM wrote:
> Hello,
> 
> for some time I am observing that during stress tests over NFS
> 
>    shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
> 
> is called on inode with ->i_nlink == 0 which results in truncate and
> file deletion. This is wrong in general (file system is re-entered), and
> deadlock prone on some file systems.
> 
> After some debugging, I tracked problem down the to d_splice_alias()
> failing to identify dentries when necessary.
> 
> Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
> DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
> comes for this file. nfsd looks name up in the parent directory
> (nfsd_unlink()->lookup_one_len()). File system back-end uses
> d_splice_alias(), but it only works for directories and we end up with
> second (this time connected) dentry D2.
> 
> D2 is successfully unlinked, file has ->i_nlink == 0, and ->i_count == 1
> from D1, and when prune_dcache() hits D1 bad things happen.
> 
> It's hard to imagine how new name can be identified with one among
> multiple anonymous dentries, which is necessary for
> NFSEXP_NOSUBTREECHECK export to work reliably.
> 
> One possible work-around is to forcibly destroy all remaining
> DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
> sure that this is possible and solves all problems of having more
> dentries than there are nlinks.
> 
> Nikita.

If I understand you correctly, the main problem is that a disconnected
dentry can hold an inode active after the last link has been removed.
The file will not then be truncated and removed until memory pressure
flushes the disconnected dentry from the dcache.

This problem can be resolved by making sure that an inode never has
both a connected and a disconnected dentry.

This is already the case for directories (as they must only have one
dentry), but it is not the case for non-directories.

The following patch tries to address this.  It is a "technology
preview" in that the only testing I have done is that it compiles OK. 

Please consider reviewing it to see if it makes sense.

It:
 - changes d_alloc_anon to make sure that a new disconnected dentry is
   only allocated if there is currently no (hashed) dentry for the
   inode. (Previously this would noramlly be true, but a race was
   possible).
 - changes d_splice_alias to re-use a disconnected dentry on
   non-directories as well as directories.
 - splits most of d_find_alias out into a separate function to make
   the above easier.

I haven't fully thought through issues with unhashed, connected
dentries.
__d_find_alias won't return them so d_alloc_anon will never return
one, so it is possible to have an unhashed dentry and a disconnected
dentry at the same time, which probably isn't desirable.

Is it OK for d_alloc_anon to return an unaliased dentry, and then have
it possibly spliced back into the dentry tree??? I'm not sure.

Comments welcome.

NeilBrown


===========================================================

 ----------- Diffstat output ------------
 ./fs/dcache.c |   60 +++++++++++++++++++++++++++++-----------------------------
 1 files changed, 30 insertions(+), 30 deletions(-)

diff ./fs/dcache.c~current~ ./fs/dcache.c
--- ./fs/dcache.c~current~	2004-04-30 14:25:50.000000000 +1000
+++ ./fs/dcache.c	2004-04-30 14:39:20.000000000 +1000
@@ -281,12 +281,11 @@ struct dentry * dget_locked(struct dentr
  * any other hashed alias over that one.
  */
 
-struct dentry * d_find_alias(struct inode *inode)
+static struct dentry * __d_find_alias(struct inode *inode, int want_discon)
 {
 	struct list_head *head, *next, *tmp;
 	struct dentry *alias, *discon_alias=NULL;
 
-	spin_lock(&dcache_lock);
 	head = &inode->i_dentry;
 	next = inode->i_dentry.next;
 	while (next != head) {
@@ -297,19 +296,26 @@ struct dentry * d_find_alias(struct inod
  		if (!d_unhashed(alias)) {
 			if (alias->d_flags & DCACHE_DISCONNECTED)
 				discon_alias = alias;
-			else {
+			else if (!want_discon) {
 				__dget_locked(alias);
-				spin_unlock(&dcache_lock);
 				return alias;
 			}
 		}
 	}
 	if (discon_alias)
 		__dget_locked(discon_alias);
-	spin_unlock(&dcache_lock);
 	return discon_alias;
 }
 
+struct dentry * d_find_alias(struct inode *inode)
+{
+	struct dentry *de;
+	spin_lock(&dcache_lock);
+	de = __d_find_alias(inode, 0);
+	spin_unlock(&dcache_lock);
+	return de;
+}
+
 /*
  *	Try to kill dentries associated with this inode.
  * WARNING: you must own a reference to inode.
@@ -835,28 +841,22 @@ struct dentry * d_alloc_anon(struct inod
 	tmp->d_parent = tmp; /* make sure dput doesn't croak */
 	
 	spin_lock(&dcache_lock);
-	if (S_ISDIR(inode->i_mode) && !list_empty(&inode->i_dentry)) {
-		/* A directory can only have one dentry.
-		 * This (now) has one, so use it.
-		 */
-		res = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-		__dget_locked(res);
-	} else {
-		/* attach a disconnected dentry */
+
+	res = __d_find_alias(inode, 0);
+	if (!res) {
 		res = tmp;
 		tmp = NULL;
-		if (res) {
-			spin_lock(&res->d_lock);
-			res->d_sb = inode->i_sb;
-			res->d_parent = res;
-			res->d_inode = inode;
-			res->d_bucket = d_hash(res, res->d_name.hash);
-			res->d_flags |= DCACHE_DISCONNECTED;
-			res->d_vfs_flags &= ~DCACHE_UNHASHED;
-			list_add(&res->d_alias, &inode->i_dentry);
-			hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
-			spin_unlock(&res->d_lock);
-		}
+		spin_lock(&res->d_lock);
+		res->d_sb = inode->i_sb;
+		res->d_parent = res;
+		res->d_inode = inode;
+		res->d_bucket = d_hash(res, res->d_name.hash);
+		res->d_flags |= DCACHE_DISCONNECTED;
+		res->d_vfs_flags &= ~DCACHE_UNHASHED;
+		list_add(&res->d_alias, &inode->i_dentry);
+		hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
+		spin_unlock(&res->d_lock);
+
 		inode = NULL; /* don't drop reference */
 	}
 	spin_unlock(&dcache_lock);
@@ -878,7 +878,7 @@ struct dentry * d_alloc_anon(struct inod
  * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
  * and return it, else simply d_add the inode to the dentry and return NULL.
  *
- * This is (will be) needed in the lookup routine of any filesystem that is exportable
+ * This is needed in the lookup routine of any filesystem that is exportable
  * (via knfsd) so that we can build dcache paths to directories effectively.
  *
  * If a dentry was found and moved, then it is returned.  Otherwise NULL
@@ -889,11 +889,11 @@ struct dentry *d_splice_alias(struct ino
 {
 	struct dentry *new = NULL;
 
-	if (inode && S_ISDIR(inode->i_mode)) {
+	if (inode) {
 		spin_lock(&dcache_lock);
-		if (!list_empty(&inode->i_dentry)) {
-			new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-			__dget_locked(new);
+		new = __d_find_alias(inode, 1);
+		if (new) {
+			BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
 			spin_unlock(&dcache_lock);
 			security_d_instantiate(new, inode);
 			d_rehash(dentry);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-30  4:54 ` Neil Brown
@ 2004-04-30  7:50   ` Greg Banks
  2004-04-30 13:28   ` Nikita Danilov
  2004-05-03 12:02   ` Greg Banks
  2 siblings, 0 replies; 22+ messages in thread
From: Greg Banks @ 2004-04-30  7:50 UTC (permalink / raw)
  To: Neil Brown
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust

Neil Brown wrote:
> 
> If I understand you correctly, the main problem is that a disconnected
> dentry can hold an inode active after the last link has been removed.
> The file will not then be truncated and removed until memory pressure
> flushes the disconnected dentry from the dcache.
> 
> This problem can be resolved by making sure that an inode never has
> both a connected and a disconnected dentry.
> 
> This is already the case for directories (as they must only have one
> dentry), but it is not the case for non-directories.
> 
> The following patch tries to address this.  It is a "technology
> preview" in that the only testing I have done is that it compiles OK.
> 
> Please consider reviewing it to see if it makes sense.
> 
> It:
>  - changes d_alloc_anon to make sure that a new disconnected dentry is
>    only allocated if there is currently no (hashed) dentry for the
>    inode. (Previously this would noramlly be true, but a race was
>    possible).
>  - changes d_splice_alias to re-use a disconnected dentry on
>    non-directories as well as directories.
>  - splits most of d_find_alias out into a separate function to make
>    the above easier.
> 
> I haven't fully thought through issues with unhashed, connected
> dentries.
> __d_find_alias won't return them so d_alloc_anon will never return
> one, so it is possible to have an unhashed dentry and a disconnected
> dentry at the same time, which probably isn't desirable.
> 
> Is it OK for d_alloc_anon to return an unaliased dentry, and then have
> it possibly spliced back into the dentry tree??? I'm not sure.
> 
> Comments welcome.


I've been wrestling with a problem in this area for the last couple of 
days.  The symptoms are different but the ultimate cause appears to (also)
be a race condition somewhere under fh_verify() resulting in confused
dentry structures of some kind.  Eventually this results in __dget_locked()
being called on a dentry which has a zero reference count but is hashed
and not on the unused list, which spuriously decrements dentry_stat.nr_unused.
When this happens enough times dentry_stat.nr_unused drops below zero
and kswapd starts spinning trying to prune a near-infinite number of
dentries.

I just tried your patch and the problem remains.

What I'm getting from my debug setup is:

<4>kernel BUG at fs/dcache.c:289!
<4>nfsd[4981]: bugcheck! 0 [1]


   285	static inline struct dentry * __dget_locked(struct dentry *dentry)
   286	{
   287		atomic_inc(&dentry->d_count);
   288		if (atomic_read(&dentry->d_count) == 1) {
   289		    	BUG_ON(list_empty(&dentry->d_lru));   <-------------
   290			dentry_stat.nr_unused--;
   291			list_del_init(&dentry->d_lru);
   292			BUG_ON(dentry_stat.nr_unused < 0);
   293		}
   294		return dentry;
   295	}

[0]kdb> bt
Stack traceback for pid 4981
0xe00000300b1f0000     4981        1  1    0   R  0xe00000300b1f04f0 *nfsd
0xa0000001001a2250 __d_find_alias+0x250   <----- note: with your patch applied
        args (0xa000000100936acc, 0xe000003009059ba8, 0xa0000001001a23f0,
0x206)        kernel 0xa0000001001a2000 0xa0000001001a2380
0xa0000001001a23f0 d_find_alias+0x70
        args (0xe00000b07aff67b0, 0xa0000001008ec900, 0xa0000001001a4120,
0x287)        kernel 0xa0000001001a2380 0xa0000001001a2420
0xa0000001001a4120 d_alloc_anon+0x20
        args (0xe00000b07aff67b0, 0xe00000300b1f7ad0, 0xe00000300b1f7ac0,
0xa0000001003c4ee0, 0x207)
        kernel 0xa0000001001a4100 0xa0000001001a4440
0xa0000001003c4ee0 linvfs_get_dentry+0x120
        args (0xe00000b07aff67b0, 0xe00000307b299f18, 0xa000000100292860,
0xd1d)        kernel 0xa0000001003c4dc0 0xa0000001003c4f20
0xa000000100292860 find_exported_dentry+0xa0
        args (0xe000003008a9fa00, 0xe00000307b299f18, 0xe00000300b1f7ce0,
0xa000000100861340, 0xe00000b07aa6f180)
        kernel 0xa0000001002927c0 0xa000000100293920
0xa000000100294090 export_decode_fh+0xb0
        args (0xe000003008a9fa00, 0xe00000307b299f24, 0x4, 0x2, 0xa000000100861340)
        kernel 0xa000000100293fe0 0xa000000100294100
0xa0000001002993f0 fh_verify+0x910
        args (0xe000003016c0d000, 0xe00000307b299f08, 0x11270000, 0x44,
0xe00000307b299f14)
        kernel 0xa000000100298ae0 0xa000000100299960
0xa00000010029cb00 nfsd_open+0x40
        args (0xe000003016c0d000, 0xe00000307b299f08, 0x8000, 0x4,
0xe00000300b1f7d00)
        kernel 0xa00000010029cac0 0xa00000010029cea0
0xa00000010029d440 nfsd_read+0x40
        args (0xe000003016c0d000, 0xe00000307b299f08, 0xe00000300b1f7d10,
0xe00000307b299c38, 0x2)
        kernel 0xa00000010029d400 0xa00000010029dda0
0xa0000001002b02f0 nfsd3_proc_read+0x190
        args (0xe00000307b299f90, 0xe00000307b299b00, 0xe00000307b299f00,
0xe00000307b29a030, 0xe00000307b299f08)
        kernel 0xa0000001002b0160 0xa0000001002b0400
0xa000000100295120 nfsd_dispatch+0x280
        args (0xe000003016c0d000, 0xe00000306b888014, 0xa000000100ce7520,
0xe000003016c0d490, 0xa00000010093e0d0)
        kernel 0xa000000100294ea0 0xa000000100295320
0xa000000100721810 svc_process+0xff0
        args (0xe00000b07aa6e928, 0xe000003016c0d000, 0xe000003016c0d240,
0xe000003016c0d068, 0xa000000100ce7520)
        kernel 0xa000000100720820 0xa000000100721b60
0xa000000100294a60 nfsd+0x480
        args (0xe000003016c0d000, 0xfffeba2f, 0xfffeba2f, 0xe00000b07aa6e900,
0xa000000100b008a0)
        kernel 0xa0000001002945e0 0xa000000100294ea0
0xa00000010001ae60 kernel_thread_helper+0xe0
[...]

arg0 of the new d_find_alias() is the inode

[0]kdb> inode 0xe00000b07aff67b0 
struct inode at  0xe00000b07aff67b0
 i_ino = 137 i_count = 3 i_size 8589934592
 i_mode = 0100777  i_nlink = 1  i_rdev = 0x0
 i_hash.nxt = 0x0000000000000000 i_hash.pprev = 0xe00000307bcb6408
 i_list.nxt = 0xe000003008a9fac8 i_list.prv = 0xe000003008a9fac8
 i_dentry.nxt = 0xe000003009059b80 i_dentry.prv = 0xe000003009059b80
 i_sb = 0xe000003008a9fa00 i_op = 0xa0000001009422b0 i_data = 0xe00000b07aff68a8
nrpages = 73612
 i_fop= 0xa000000100941fe8 i_flock = 0x0000000000000000 i_mapping =
0xe00000b07aff68a8
 i_flags 0x0 i_state 0x1 [I_DIRTY_SYNC]  fs specific info @ 0xe00000b07aff6a10

Walk the dentry chain...only one entry

[0]kdb> dentry 0xe000003009059b80
Dentry at 0xe000003009059b80
 d_name.len = 16 d_name.name = 0xe000003016ca7010 <read_load_test.0>   <----- not
anonymous
 d_count = 1 d_flags = 0x4 d_inode = 0xe00000b07aff67b0
           ^           ^^^ DCACHE_DISCONNECTED
       count=0 before line 287
 d_parent = 0xe00000b0065d5480
 d_hash.nxt = 0x0000000000000000 d_hash.prv = 0xe00000307bbccbf0    <---- hashed
 d_lru.nxt = 0xe000003009059ba0 d_lru.prv = 0xe000003009059ba0      <---- not on
unused list
 d_child.nxt = 0xe00000b0065d54c0 d_child.prv = 0xe00000b0065d54c0
 d_subdirs.nxt = 0xe000003009059bc0 d_subdirs.prv = 0xe000003009059bc0
 d_alias.nxt = 0xe00000b07aff67d0 d_alias.prv = 0xe00000b07aff67d0
 d_op = 0x0000000000000000 d_sb = 0xe000003008a9fa00


>  /*
>   *     Try to kill dentries associated with this inode.
>   * WARNING: you must own a reference to inode.
> @@ -835,28 +841,22 @@ struct dentry * d_alloc_anon(struct inod
>         tmp->d_parent = tmp; /* make sure dput doesn't croak */
> 
>         spin_lock(&dcache_lock);
> -       if (S_ISDIR(inode->i_mode) && !list_empty(&inode->i_dentry)) {
> -               /* A directory can only have one dentry.
> -                * This (now) has one, so use it.
> -                */
> -               res = list_entry(inode->i_dentry.next, struct dentry, d_alias);
> -               __dget_locked(res);
> -       } else {
> -               /* attach a disconnected dentry */
> +
> +       res = __d_find_alias(inode, 0);
> +       if (!res) {
>                 res = tmp;

Yes this is an obvious (well, now) race condition. But not apparently the
whole story.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-30  4:54 ` Neil Brown
  2004-04-30  7:50   ` Greg Banks
@ 2004-04-30 13:28   ` Nikita Danilov
  2004-05-03 23:46     ` Neil Brown
  2004-05-03 12:02   ` Greg Banks
  2 siblings, 1 reply; 22+ messages in thread
From: Nikita Danilov @ 2004-04-30 13:28 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux kernel mailing list, alexander viro, trond myklebust

Neil Brown writes:
 > On Friday April 23, Nikita@Namesys.COM wrote:
 > > Hello,
 > > 
 > > for some time I am observing that during stress tests over NFS
 > > 
 > >    shrink_slab->...->prune_dcache()->prune_one_dentry()->...->iput()
 > > 
 > > is called on inode with ->i_nlink == 0 which results in truncate and
 > > file deletion. This is wrong in general (file system is re-entered), and
 > > deadlock prone on some file systems.
 > > 
 > > After some debugging, I tracked problem down the to d_splice_alias()
 > > failing to identify dentries when necessary.
 > > 
 > > Suppose we have an inode with ->i_nlink == 1. It's accessed over NFS and
 > > DCACHE_DISCONNECTED dentry D1 is created for it. Then, unlink request
 > > comes for this file. nfsd looks name up in the parent directory
 > > (nfsd_unlink()->lookup_one_len()). File system back-end uses
 > > d_splice_alias(), but it only works for directories and we end up with
 > > second (this time connected) dentry D2.
 > > 
 > > D2 is successfully unlinked, file has ->i_nlink == 0, and ->i_count == 1
 > > from D1, and when prune_dcache() hits D1 bad things happen.
 > > 
 > > It's hard to imagine how new name can be identified with one among
 > > multiple anonymous dentries, which is necessary for
 > > NFSEXP_NOSUBTREECHECK export to work reliably.
 > > 
 > > One possible work-around is to forcibly destroy all remaining
 > > DCACHE_DISCONNECTED dentries when ->i_nlink drops to zero, but I am not
 > > sure that this is possible and solves all problems of having more
 > > dentries than there are nlinks.
 > > 
 > > Nikita.
 > 
 > If I understand you correctly, the main problem is that a disconnected
 > dentry can hold an inode active after the last link has been removed.
 > The file will not then be truncated and removed until memory pressure
 > flushes the disconnected dentry from the dcache.
 > 
 > This problem can be resolved by making sure that an inode never has
 > both a connected and a disconnected dentry.
 > 
 > This is already the case for directories (as they must only have one
 > dentry), but it is not the case for non-directories.
 > 
 > The following patch tries to address this.  It is a "technology
 > preview" in that the only testing I have done is that it compiles OK. 

I have a test where such situation is reproducible and will give patch a
try.

Also, Al Viro pointed to me that it's not clear why DCACHE_DISCONNECTED
dentry is DCACHE_HASHED at all. If it were unhashed, last dput (done by
nfsd thread) would destroy it, truncating file if necessary.

 > 

Nikita.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-30  4:54 ` Neil Brown
  2004-04-30  7:50   ` Greg Banks
  2004-04-30 13:28   ` Nikita Danilov
@ 2004-05-03 12:02   ` Greg Banks
  2004-05-03 23:28     ` Neil Brown
  2 siblings, 1 reply; 22+ messages in thread
From: Greg Banks @ 2004-05-03 12:02 UTC (permalink / raw)
  To: Neil Brown
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust

Neil Brown wrote:
> 
> This problem can be resolved by making sure that an inode never has
> both a connected and a disconnected dentry.
> 
> This is already the case for directories (as they must only have one
> dentry), but it is not the case for non-directories.
> 
> The following patch tries to address this.  It is a "technology
> preview" in that the only testing I have done is that it compiles OK.
> 
> Please consider reviewing it to see if it makes sense.

It does, and it fixes one of the dcache bugs that was tripping my debug
code.  Here are a couple more.

*   Logic bug in d_splice_alias() forgets to clear the DCACHE_DISCONNECTED
    flag when a lookup connects a disconnected dentry.  Fix is (relative
    to Neil's patch):

--- linux.orig/fs/dcache.c	Mon May  3 21:46:30 2004
+++ linux/fs/dcache.c	Mon May  3 21:49:07 2004
@@ -894,6 +895,7 @@
 		new = __d_find_alias(inode, 1);
 		if (new) {
 			BUG_ON(!(new->d_flags & DCACHE_DISCONNECTED));
+			new->d_flags &= ~DCACHE_DISCONNECTED;
 			spin_unlock(&dcache_lock);
 			security_d_instantiate(new, inode);
 			d_rehash(dentry);


*   Dentry_stat.nr_unused can be be spuriously decremented when dput()
    races with __dget_unlocked().  Eventual result is nr_unused<0
    and kswapd loops.  This is the problem I mentioned earlier.  Note
    that this is not an NFS-specific problem.  Fix is:

--- linux.orig/fs/dcache.c	Mon May  3 21:46:30 2004
+++ linux/fs/dcache.c	Mon May  3 21:49:07 2004
@@ -255,8 +255,8 @@
 
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
-	atomic_inc(&dentry->d_count);
-	if (atomic_read(&dentry->d_count) == 1) {
+	if (atomic_inc(&dentry->d_count) == 1) {
+	    	BUG_ON(list_empty(&dentry->d_lru));
 		dentry_stat.nr_unused--;
 		list_del_init(&dentry->d_lru);
 	}
@@ -663,6 +663,7 @@
 		if (gfp_mask & __GFP_FS)
 			prune_dcache(nr);
 	}
+	BUG_ON(dentry_stat.nr_unused < 0);
 	return dentry_stat.nr_unused;
 }
 

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-03 12:02   ` Greg Banks
@ 2004-05-03 23:28     ` Neil Brown
  2004-05-04  0:05       ` Greg Banks
                         ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Neil Brown @ 2004-05-03 23:28 UTC (permalink / raw)
  To: Greg Banks
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust

On Monday May 3, gnb@melbourne.sgi.com wrote:
> Neil Brown wrote:
> > 
> > This problem can be resolved by making sure that an inode never has
> > both a connected and a disconnected dentry.
> > 
> > This is already the case for directories (as they must only have one
> > dentry), but it is not the case for non-directories.
> > 
> > The following patch tries to address this.  It is a "technology
> > preview" in that the only testing I have done is that it compiles OK.
> > 
> > Please consider reviewing it to see if it makes sense.
> 
> It does, and it fixes one of the dcache bugs that was tripping my debug
> code.  Here are a couple more.
> 
> *   Logic bug in d_splice_alias() forgets to clear the DCACHE_DISCONNECTED
>     flag when a lookup connects a disconnected dentry.  Fix is (relative
>     to Neil's patch):

I seem to recall that this was intentional.  When I first wrote the
code I wanted to leave the splicing code to just splice things, and
when the code that cared about disconnected-ness (in knfsd) discovered
that something really was connected, it would clear the bit
(find_exported_dentry does this). 
However if we want to ensure that there is at-most one disconnected
dentry for an inode, we do need to set the bit more aggressively.
I'll try and review the code with that in mind.  Thanks.

> 
> 
> *   Dentry_stat.nr_unused can be be spuriously decremented when dput()
>     races with __dget_unlocked().  Eventual result is nr_unused<0
>     and kswapd loops.  This is the problem I mentioned earlier.  Note
>     that this is not an NFS-specific problem.  Fix is:
> 
> --- linux.orig/fs/dcache.c	Mon May  3 21:46:30 2004
> +++ linux/fs/dcache.c	Mon May  3 21:49:07 2004
> @@ -255,8 +255,8 @@
>  
>  static inline struct dentry * __dget_locked(struct dentry *dentry)
>  {
> -	atomic_inc(&dentry->d_count);
> -	if (atomic_read(&dentry->d_count) == 1) {
> +	if (atomic_inc(&dentry->d_count) == 1) {

One problem with this is that (in include/asm-i386/atomic.h at least):
  static __inline__ void atomic_inc(atomic_t *v)

atomic_inc returns "void".

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-04-30 13:28   ` Nikita Danilov
@ 2004-05-03 23:46     ` Neil Brown
  0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2004-05-03 23:46 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: linux kernel mailing list, alexander viro, trond myklebust

On Friday April 30, Nikita@Namesys.COM wrote:
> 
> Also, Al Viro pointed to me that it's not clear why DCACHE_DISCONNECTED
> dentry is DCACHE_HASHED at all. If it were unhashed, last dput (done by
> nfsd thread) would destroy it, truncating file if necessary.
> 

This causes other problems (I vaguely remember).
It means that every NFS request on such a file would cause the dentry
to be created and then destroyed.  If the filesystem is keeping state
in the dentry, this gets lost.  I think some filesystems discard
preallocated space when the dentry is destroyed.  Even if not, there
is a performance hit. 

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-03 23:28     ` Neil Brown
@ 2004-05-04  0:05       ` Greg Banks
  2004-05-04  7:00       ` Greg Banks
  2004-05-10  3:27       ` Neil Brown
  2 siblings, 0 replies; 22+ messages in thread
From: Greg Banks @ 2004-05-04  0:05 UTC (permalink / raw)
  To: Neil Brown
  Cc: Greg Banks, Nikita Danilov, linux kernel mailing list,
	alexander viro, trond myklebust

On Tue, May 04, 2004 at 09:28:48AM +1000, Neil Brown wrote:
> On Monday May 3, gnb@melbourne.sgi.com wrote:
> > Neil Brown wrote:
> > > 
> > *   Dentry_stat.nr_unused can be be spuriously decremented when dput()
> >     races with __dget_unlocked().  Eventual result is nr_unused<0
> >     and kswapd loops.  This is the problem I mentioned earlier.  Note
> >     that this is not an NFS-specific problem.  Fix is:
> > 
> > --- linux.orig/fs/dcache.c	Mon May  3 21:46:30 2004
> > +++ linux/fs/dcache.c	Mon May  3 21:49:07 2004
> > @@ -255,8 +255,8 @@
> >  
> >  static inline struct dentry * __dget_locked(struct dentry *dentry)
> >  {
> > -	atomic_inc(&dentry->d_count);
> > -	if (atomic_read(&dentry->d_count) == 1) {
> > +	if (atomic_inc(&dentry->d_count) == 1) {
> 
> One problem with this is that (in include/asm-i386/atomic.h at least):
>   static __inline__ void atomic_inc(atomic_t *v)
> 
> atomic_inc returns "void".

Doh!  This is what comes from doing all coding and testing on minority
achitectures...I'll think about this a bit more.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-03 23:28     ` Neil Brown
  2004-05-04  0:05       ` Greg Banks
@ 2004-05-04  7:00       ` Greg Banks
  2004-05-04  9:46         ` viro
  2004-05-10  3:03         ` Neil Brown
  2004-05-10  3:27       ` Neil Brown
  2 siblings, 2 replies; 22+ messages in thread
From: Greg Banks @ 2004-05-04  7:00 UTC (permalink / raw)
  To: Neil Brown
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust

Neil Brown wrote:
> 
> > *   Dentry_stat.nr_unused can be be spuriously decremented when dput()
> >     races with __dget_unlocked().  Eventual result is nr_unused<0
> >     and kswapd loops.  This is the problem I mentioned earlier.  Note
> >     that this is not an NFS-specific problem.  Fix is:
> >
> > --- linux.orig/fs/dcache.c    Mon May  3 21:46:30 2004
> > +++ linux/fs/dcache.c Mon May  3 21:49:07 2004
> > @@ -255,8 +255,8 @@
> >
> >  static inline struct dentry * __dget_locked(struct dentry *dentry)
> >  {
> > -     atomic_inc(&dentry->d_count);
> > -     if (atomic_read(&dentry->d_count) == 1) {
> > +     if (atomic_inc(&dentry->d_count) == 1) {
> 
> One problem with this is that (in include/asm-i386/atomic.h at least):
>   static __inline__ void atomic_inc(atomic_t *v)

Ok, how about this...it's portable, and not racy, but may perturb the
logic slightly by also taking dentries off the unused list in the case
where they already had d_count>=1.  I'm not sure how significant that is.
In any case this also passes my tests.


--- linux.orig/fs/dcache.c	Mon May  3 21:46:30 2004
+++ linux/fs/dcache.c	Tue May  4 14:34:44 2004
@@ -256,7 +256,7 @@
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
 	atomic_inc(&dentry->d_count);
-	if (atomic_read(&dentry->d_count) == 1) {
+	if (!list_empty(&dentry->d_lru)) {
 		dentry_stat.nr_unused--;
 		list_del_init(&dentry->d_lru);
 	}
@@ -663,6 +663,7 @@
 		if (gfp_mask & __GFP_FS)
 			prune_dcache(nr);
 	}
+	BUG_ON(dentry_stat.nr_unused < 0);
 	return dentry_stat.nr_unused;
 }
 


Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-04  7:00       ` Greg Banks
@ 2004-05-04  9:46         ` viro
  2004-05-04 10:21           ` Greg Banks
  2004-05-05  0:11           ` Neil Brown
  2004-05-10  3:03         ` Neil Brown
  1 sibling, 2 replies; 22+ messages in thread
From: viro @ 2004-05-04  9:46 UTC (permalink / raw)
  To: Greg Banks
  Cc: Neil Brown, Nikita Danilov, linux kernel mailing list,
	trond myklebust

On Tue, May 04, 2004 at 05:00:15PM +1000, Greg Banks wrote:
 
> Ok, how about this...it's portable, and not racy, but may perturb the
> logic slightly by also taking dentries off the unused list in the case
> where they already had d_count>=1.  I'm not sure how significant that is.
> In any case this also passes my tests.
 
a) ask RCU folks to review - the current logics in dcache.c is extremely
brittle as it is.

b) after rereading that code, I _really_ don't like the crap with "hashed
but disconnected" nfsd is pulling off.  Neil, care to give detailed reasons
why you are doing that?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-04  9:46         ` viro
@ 2004-05-04 10:21           ` Greg Banks
  2004-05-05  0:11           ` Neil Brown
  1 sibling, 0 replies; 22+ messages in thread
From: Greg Banks @ 2004-05-04 10:21 UTC (permalink / raw)
  To: Dipankar Sarma
  Cc: viro, Neil Brown, Nikita Danilov, linux kernel mailing list,
	trond myklebust

G'day Dipankar,

I don't know if you've been following this thread on lkml, but Al Viro
wants an RCU expert's opinion on the following dcache patch.

Greg Banks wrote:
> 
> > > *   Dentry_stat.nr_unused can be be spuriously decremented when dput()
> > >     races with __dget_unlocked().  Eventual result is nr_unused<0
> > >     and kswapd loops.  This is the problem I mentioned earlier.  Note
> > >     that this is not an NFS-specific problem.  Fix is:
> > > [...first attempt elided...]
> Ok, how about this...it's portable, and not racy, but may perturb the
> logic slightly by also taking dentries off the unused list in the case
> where they already had d_count>=1.  I'm not sure how significant that is.
> In any case this also passes my tests.
> 
> --- linux.orig/fs/dcache.c      Mon May  3 21:46:30 2004
> +++ linux/fs/dcache.c   Tue May  4 14:34:44 2004
> @@ -256,7 +256,7 @@
>  static inline struct dentry * __dget_locked(struct dentry *dentry)
>  {
>         atomic_inc(&dentry->d_count);
> -       if (atomic_read(&dentry->d_count) == 1) {
> +       if (!list_empty(&dentry->d_lru)) {
>                 dentry_stat.nr_unused--;
>                 list_del_init(&dentry->d_lru);
>         }
> @@ -663,6 +663,7 @@
>                 if (gfp_mask & __GFP_FS)
>                         prune_dcache(nr);
>         }
> +       BUG_ON(dentry_stat.nr_unused < 0);
>         return dentry_stat.nr_unused;
>  }
> 

viro@parcelfarce.linux.theplanet.co.uk wrote:
> 
> a) ask RCU folks to review - the current logics in dcache.c is extremely
> brittle as it is.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-04  9:46         ` viro
  2004-05-04 10:21           ` Greg Banks
@ 2004-05-05  0:11           ` Neil Brown
  1 sibling, 0 replies; 22+ messages in thread
From: Neil Brown @ 2004-05-05  0:11 UTC (permalink / raw)
  To: viro; +Cc: Greg Banks, Nikita Danilov, linux kernel mailing list,
	trond myklebust

On Tuesday May 4, viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Tue, May 04, 2004 at 05:00:15PM +1000, Greg Banks wrote:
>  
> > Ok, how about this...it's portable, and not racy, but may perturb the
> > logic slightly by also taking dentries off the unused list in the case
> > where they already had d_count>=1.  I'm not sure how significant that is.
> > In any case this also passes my tests.
>  
> a) ask RCU folks to review - the current logics in dcache.c is extremely
> brittle as it is.
> 
> b) after rereading that code, I _really_ don't like the crap with "hashed
> but disconnected" nfsd is pulling off.  Neil, care to give detailed reasons
> why you are doing that?

As a quick answer, below is a copy of an email from Christoph Hellwig
from 18 months ago where says how glad he is to see it and wants to
know if it will stay.  Basically, without hashing the dentry, it will
get allocated and deallocated very frequently which has negative
consequences.  It would actually be nice to hold a "file" around
instead of just a "dentry", but a dentry is better than nothing.

NeilBrown


From: Neil Brown <neilb@cse.unsw.edu.au>
To: Christoph Hellwig <hch@sgi.com>
Cc:    linux-fsdevel@vger.kernel.org
Subject: Re: keeping nfsd dentries on unused_list
Date: Tue, 22 Oct 2002 09:15:54 +1000
Content-Type: text/plain; charset=us-ascii

On Monday October 21, hch@sgi.com wrote:
> Hi Neil,
> 
> since your export changes in 2.5.<early> always have a non-empty
> ->d_hash (linked into sb->s_anon) and thus are put on the unused
> list by dput instead of directly reclaiming it.
> 
> Is this behaviour intentional and will stay during 2.6?  Keeping
> those dentries alive will allow me to remove lots of code in XFS
> to keep inodes that are written to by nfsd in cache as the final
> iput will flush all delayed allocated space and thus decrease
> nfs write performance massively - but as long as the dentry is
> on the unused list after dput we still have an inode reference
> and thus the delalloc block don't need to be converted.

Yes.  That behaviour is intentional and will stay.

NeilBrown


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-04  7:00       ` Greg Banks
  2004-05-04  9:46         ` viro
@ 2004-05-10  3:03         ` Neil Brown
  2004-05-10  4:50           ` Greg Banks
  1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2004-05-10  3:03 UTC (permalink / raw)
  To: Greg Banks
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust, Dipankar Sarma

On Tuesday May 4, gnb@melbourne.sgi.com wrote:
> > >
> > > --- linux.orig/fs/dcache.c    Mon May  3 21:46:30 2004
> > > +++ linux/fs/dcache.c Mon May  3 21:49:07 2004
> > > @@ -255,8 +255,8 @@
> > >
> > >  static inline struct dentry * __dget_locked(struct dentry *dentry)
> > >  {
> > > -     atomic_inc(&dentry->d_count);
> > > -     if (atomic_read(&dentry->d_count) == 1) {
> > > +     if (atomic_inc(&dentry->d_count) == 1) {
> > 
> > One problem with this is that (in include/asm-i386/atomic.h at least):
> >   static __inline__ void atomic_inc(atomic_t *v)
> 
> Ok, how about this...it's portable, and not racy, but may perturb the
> logic slightly by also taking dentries off the unused list in the case
> where they already had d_count>=1.  I'm not sure how significant that is.
> In any case this also passes my tests.

I think this patch is good and needed.

I think the race can happen if:
       dentry->d_count == 1, not on list

   thread 1                       thread 2
   enter __dget_locked            enter dput
   atomic_inc(d_count) (now 2)
                                  atomic_dec_and_lock(d_count...) (now 1)
   if(atomic_read(d_count)==1 ....
         remove from list


This will remove it from the unused list when it isn't
on, and will decrement nr_unused which, as you say, is bad.

I don't think there can be any problem with taking dentries with a
non-zero d_count off the unused_list randomly (providing dcache_lock
is held of course).

The comment at the top of d_lookup says that it doesn't take dentries
off the unused_list, but instead leaves that task to assorted other
code.

prune_dcache and shrink_dcache_sb and select_parent will all quietly
remove such dentries from the unused list and there is no reason that
__dget_locked shouldn't aswell.  Possibly that patch should update the 
d_lookup comment to add __dget_locked to the set of functions that
clean up after it.

NeilBrown

> 
> 
> --- linux.orig/fs/dcache.c	Mon May  3 21:46:30 2004
> +++ linux/fs/dcache.c	Tue May  4 14:34:44 2004
> @@ -256,7 +256,7 @@
>  static inline struct dentry * __dget_locked(struct dentry *dentry)
>  {
>  	atomic_inc(&dentry->d_count);
> -	if (atomic_read(&dentry->d_count) == 1) {
> +	if (!list_empty(&dentry->d_lru)) {
>  		dentry_stat.nr_unused--;
>  		list_del_init(&dentry->d_lru);
>  	}
> @@ -663,6 +663,7 @@
>  		if (gfp_mask & __GFP_FS)
>  			prune_dcache(nr);
>  	}
> +	BUG_ON(dentry_stat.nr_unused < 0);
>  	return dentry_stat.nr_unused;
>  }
>  
> 
> 
> Greg.
> -- 
> Greg Banks, R&D Software Engineer, SGI Australian Software Group.
> I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-03 23:28     ` Neil Brown
  2004-05-04  0:05       ` Greg Banks
  2004-05-04  7:00       ` Greg Banks
@ 2004-05-10  3:27       ` Neil Brown
  2004-05-10 11:28         ` Greg Banks
  2 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2004-05-10  3:27 UTC (permalink / raw)
  To: Greg Banks, Nikita Danilov, linux kernel mailing list,
	alexander viro, trond myklebust

On Tuesday May 4, neilb@cse.unsw.edu.au wrote:
> On Monday May 3, gnb@melbourne.sgi.com wrote:
> > 
> > *   Logic bug in d_splice_alias() forgets to clear the DCACHE_DISCONNECTED
> >     flag when a lookup connects a disconnected dentry.  Fix is (relative
> >     to Neil's patch):
> 
> I seem to recall that this was intentional.  When I first wrote the
> code I wanted to leave the splicing code to just splice things, and
> when the code that cared about disconnected-ness (in knfsd) discovered
> that something really was connected, it would clear the bit
> (find_exported_dentry does this). 
> However if we want to ensure that there is at-most one disconnected
> dentry for an inode, we do need to set the bit more aggressively.
> I'll try and review the code with that in mind.  Thanks.
> 

Ok, I've reviewed the code and thought about it some more.

This was intentional and the patch to clear DCACHE_DISCONNECTED is not
needed and possibly wrong.

DCACHE_DISCONNECTED *doesn't* mean that the entries isn't connected,
but only that it might not be.
In knfsd, we build a patch from the bottom up, and the dentry at the
bottom is not "connected" until the top is finally connected to the
root.  If DCACHE_DISCONNECTED were to mean precisely that the dentry
isn't connected, then at that point were we finally connect a path we
would have to walk down the path (which could possibly branch)
clearing all the DCACHE_DISCONNECTED flags, and this clearly isn't a
good idea.
nfsd and exportfs should be the only bits of code that cares if
DCACHE_DISCONNECTED is set or not, and they will clear it if
appropriate.

A more significant measure is "IS_ROOT". When we call d_splice_alias,
it should look for an IS_ROOT alias to consider splicing in rather
than a DCACHE_DISCONNECTED alias, as a DCACHE_DISCONNECTED alias might
already be spliced into some other path (if it is for a file with
multiple links).

The follow patch should resolve all of this, and a few other things.
IT:

 - moves DCACHE_DISCONNECTED into d_vfs_flags and uses dcache_lock to
   make sure that setting the flag doesn't tread on some other flag
   in the same field.
 - introduces "d_get_alias" which is similar to "d_find_alias", but is
   less choosy: any alias will do.
 - uses d_get_alias in d_alloc_anon and d_splice_alias to try to use a
   pre-exisiting alias if possible.
 - Only splices in a pre-exisiting alias if it was IS_ROOT.  There can
   now only be one of these per inode, and if there is one, there will
   be no other alias.
 - introduces d_move_locked which does the same as d_move, but without
   getting dcache_lock first (it must already be held).  This is
   needed to be able to atomically test IS_ROOT, and then convert the
   dentry to non IS_ROOT before anyone else gets to test IS_ROOT.


Comments and testing results very welcome.

NeilBrown


 ----------- Diffstat output ------------
 ./fs/dcache.c            |  105 +++++++++++++++++++++++++++++++----------------
 ./fs/exportfs/expfs.c    |   17 ++++---
 ./fs/intermezzo/dir.c    |    4 +
 ./fs/nfsd/nfsfh.c        |    2 
 ./include/linux/dcache.h |    2 
 5 files changed, 86 insertions(+), 44 deletions(-)

diff ./fs/dcache.c~current~ ./fs/dcache.c
--- ./fs/dcache.c~current~	2004-05-10 13:26:54.000000000 +1000
+++ ./fs/dcache.c	2004-05-10 13:26:54.000000000 +1000
@@ -297,7 +297,7 @@ struct dentry * d_find_alias(struct inod
 		prefetch(next);
 		alias = list_entry(tmp, struct dentry, d_alias);
  		if (!d_unhashed(alias)) {
-			if (alias->d_flags & DCACHE_DISCONNECTED)
+			if (alias->d_vfs_flags & DCACHE_DISCONNECTED)
 				discon_alias = alias;
 			else {
 				__dget_locked(alias);
@@ -312,6 +312,39 @@ struct dentry * d_find_alias(struct inod
 	return discon_alias;
 }
 
+/**
+ * d_get_alias - grab any alias of inode
+ * @inode: inode in question
+ *
+ * Return a counted reference to an alias for @inode, or
+ * NULL if there are none.
+ *
+ * An inode can have one of:
+ *  - no aliases
+ *  - exactly one alias which IS_ROOT()
+ *  - one or more !IS_ROOT() aliases which might be hashed
+ * thus the returned alias might not be hashed, but will only be IS_ROOT
+ * if there are no other aliases.
+ */
+struct dentry *__d_get_alias(struct inode *inode)
+{
+	struct dentry *alias = NULL;
+	if (!list_empty(&inode->i_dentry)) {
+		alias = list_entry(inode->i_dentry.next, struct dentry, d_alias);
+		__dget_locked(alias);
+	}
+	return alias;
+}
+struct dentry *d_get_alias(struct inode *inode)
+{
+	struct dentry *alias = NULL;
+	spin_lock(&dcache_lock);
+	alias = __d_get_alias(inode);
+	spin_unlock(&dcache_lock);
+	return alias;
+}
+
+
 /*
  *	Try to kill dentries associated with this inode.
  * WARNING: you must own a reference to inode.
@@ -828,7 +861,7 @@ struct dentry * d_alloc_anon(struct inod
 	struct dentry *tmp;
 	struct dentry *res;
 
-	if ((res = d_find_alias(inode))) {
+	if ((res = d_get_alias(inode))) {
 		iput(inode);
 		return res;
 	}
@@ -840,28 +873,21 @@ struct dentry * d_alloc_anon(struct inod
 	tmp->d_parent = tmp; /* make sure dput doesn't croak */
 	
 	spin_lock(&dcache_lock);
-	if (S_ISDIR(inode->i_mode) && !list_empty(&inode->i_dentry)) {
-		/* A directory can only have one dentry.
-		 * This (now) has one, so use it.
-		 */
-		res = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-		__dget_locked(res);
-	} else {
-		/* attach a disconnected dentry */
+
+	res = __d_get_alias(inode);
+	if (!res) {
 		res = tmp;
 		tmp = NULL;
-		if (res) {
-			spin_lock(&res->d_lock);
-			res->d_sb = inode->i_sb;
-			res->d_parent = res;
-			res->d_inode = inode;
-			res->d_bucket = d_hash(res, res->d_name.hash);
-			res->d_flags |= DCACHE_DISCONNECTED;
-			res->d_vfs_flags &= ~DCACHE_UNHASHED;
-			list_add(&res->d_alias, &inode->i_dentry);
-			hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
-			spin_unlock(&res->d_lock);
-		}
+		spin_lock(&res->d_lock);
+		res->d_sb = inode->i_sb;
+		res->d_parent = res;
+		res->d_inode = inode;
+		res->d_bucket = d_hash(res, res->d_name.hash);
+		res->d_vfs_flags |= DCACHE_DISCONNECTED;
+		res->d_vfs_flags &= ~DCACHE_UNHASHED;
+		list_add(&res->d_alias, &inode->i_dentry);
+		hlist_add_head(&res->d_hash, &inode->i_sb->s_anon);
+		spin_unlock(&res->d_lock);
 		inode = NULL; /* don't drop reference */
 	}
 	spin_unlock(&dcache_lock);
@@ -879,30 +905,32 @@ struct dentry * d_alloc_anon(struct inod
  * @inode:  the inode which may have a disconnected dentry
  * @dentry: a negative dentry which we want to point to the inode.
  *
- * If inode is a directory and has a 'disconnected' dentry (i.e. IS_ROOT and
- * DCACHE_DISCONNECTED), then d_move that in place of the given dentry
- * and return it, else simply d_add the inode to the dentry and return NULL.
+ * If inode has a 'disconnected' dentry (i.e. IS_ROOT and DCACHE_DISCONNECTED),
+ * then d_move that in place of the given dentry and return it, else simply
+ * d_add the inode to the dentry and return NULL.
  *
- * This is (will be) needed in the lookup routine of any filesystem that is exportable
+ * This is needed in the lookup routine of any filesystem that is exportable
  * (via knfsd) so that we can build dcache paths to directories effectively.
  *
  * If a dentry was found and moved, then it is returned.  Otherwise NULL
  * is returned.  This matches the expected return value of ->lookup.
  *
  */
+static void d_move_locked(struct dentry * dentry, struct dentry * target);
+
 struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)
 {
 	struct dentry *new = NULL;
 
-	if (inode && S_ISDIR(inode->i_mode)) {
+	if (inode) {
 		spin_lock(&dcache_lock);
-		if (!list_empty(&inode->i_dentry)) {
-			new = list_entry(inode->i_dentry.next, struct dentry, d_alias);
-			__dget_locked(new);
+		new = __d_get_alias(inode);
+		if (new && IS_ROOT(new)) {
+			dentry->d_bucket = d_hash(dentry->d_parent,
+						  dentry->d_name.hash);
+			d_move_locked(new, dentry);
 			spin_unlock(&dcache_lock);
 			security_d_instantiate(new, inode);
-			d_rehash(dentry);
-			d_move(new, dentry);
 			iput(inode);
 		} else {
 			/* d_instantiate takes dcache_lock, so we do it by hand */
@@ -911,6 +939,7 @@ struct dentry *d_splice_alias(struct ino
 			spin_unlock(&dcache_lock);
 			security_d_instantiate(dentry, inode);
 			d_rehash(dentry);
+			if (new) dput(new);
 		}
 	} else
 		d_add(dentry, inode);
@@ -1185,12 +1214,11 @@ static inline void switch_names(struct d
  * dcache entries should not be moved in this way.
  */
 
-void d_move(struct dentry * dentry, struct dentry * target)
+static void d_move_locked(struct dentry * dentry, struct dentry * target)
 {
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	/*
 	 * XXXX: do we really need to take target->d_lock?
@@ -1243,6 +1271,15 @@ already_unhashed:
 	spin_unlock(&target->d_lock);
 	spin_unlock(&dentry->d_lock);
 	write_sequnlock(&rename_lock);
+}
+
+void d_move(struct dentry * dentry, struct dentry * target)
+{
+	if (!dentry->d_inode)
+		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
+
+	spin_lock(&dcache_lock);
+	d_move_locked(dentry, target);
 	spin_unlock(&dcache_lock);
 }
 

diff ./fs/exportfs/expfs.c~current~ ./fs/exportfs/expfs.c
--- ./fs/exportfs/expfs.c~current~	2004-05-10 13:26:54.000000000 +1000
+++ ./fs/exportfs/expfs.c	2004-05-10 13:26:54.000000000 +1000
@@ -68,7 +68,7 @@ find_exported_dentry(struct super_block 
 		goto err_out;
 	}
 	if (S_ISDIR(result->d_inode->i_mode) &&
-	    (result->d_flags & DCACHE_DISCONNECTED)) {
+	    (result->d_vfs_flags & DCACHE_DISCONNECTED)) {
 		/* it is an unconnected directory, we must connect it */
 		;
 	} else {
@@ -87,7 +87,6 @@ find_exported_dentry(struct super_block 
 			spin_unlock(&dcache_lock);
 			if (toput)
 				dput(toput);
-			toput = NULL;
 			if (dentry != result &&
 			    acceptable(context, dentry)) {
 				dput(result);
@@ -136,13 +135,13 @@ find_exported_dentry(struct super_block 
 	 * probably enough) without getting anywhere, we just give up
 	 */
 	noprogress= 0;
-	while (target_dir->d_flags & DCACHE_DISCONNECTED && noprogress++ < 10) {
+	while (target_dir->d_vfs_flags & DCACHE_DISCONNECTED && noprogress++ < 10) {
 		struct dentry *pd = target_dir;
 
 		dget(pd);
 		spin_lock(&pd->d_lock);
 		while (!IS_ROOT(pd) &&
-				(pd->d_parent->d_flags&DCACHE_DISCONNECTED)) {
+				(pd->d_parent->d_vfs_flags&DCACHE_DISCONNECTED)) {
 			struct dentry *parent = pd->d_parent;
 
 			dget(parent);
@@ -155,11 +154,15 @@ find_exported_dentry(struct super_block 
 
 		if (!IS_ROOT(pd)) {
 			/* must have found a connected parent - great */
-			pd->d_flags &= ~DCACHE_DISCONNECTED;
+			spin_lock(&dcache_lock);
+			pd->d_vfs_flags &= ~DCACHE_DISCONNECTED;
+			spin_unlock(&dcache_lock);
 			noprogress = 0;
 		} else if (pd == sb->s_root) {
 			printk(KERN_ERR "export: Eeek filesystem root is not connected, impossible\n");
-			pd->d_flags &= ~DCACHE_DISCONNECTED;
+			spin_lock(&dcache_lock);
+			pd->d_vfs_flags &= ~DCACHE_DISCONNECTED;
+			spin_unlock(&dcache_lock);
 			noprogress = 0;
 		} else {
 			/* we have hit the top of a disconnected path.  Try
@@ -227,7 +230,7 @@ find_exported_dentry(struct super_block 
 		dput(pd);
 	}
 
-	if (target_dir->d_flags & DCACHE_DISCONNECTED) {
+	if (target_dir->d_vfs_flags & DCACHE_DISCONNECTED) {
 		/* something went wrong - oh-well */
 		if (!err)
 			err = -ESTALE;

diff ./fs/intermezzo/dir.c~current~ ./fs/intermezzo/dir.c
--- ./fs/intermezzo/dir.c~current~	2004-05-10 13:26:54.000000000 +1000
+++ ./fs/intermezzo/dir.c	2004-05-10 13:26:54.000000000 +1000
@@ -161,7 +161,9 @@ struct dentry *presto_iget_ilookup(struc
         }
 
         d_instantiate(dentry, inode);
-        dentry->d_flags |= DCACHE_DISCONNECTED; /* NFS hack */
+        spin_lock(&dcache_lock);
+        dentry->d_vfs_flags |= DCACHE_DISCONNECTED; /* NFS hack */
+        spin_unlock(&dcache_lock);
 
         EXIT;
         return NULL;

diff ./fs/nfsd/nfsfh.c~current~ ./fs/nfsd/nfsfh.c
--- ./fs/nfsd/nfsfh.c~current~	2004-05-10 13:26:54.000000000 +1000
+++ ./fs/nfsd/nfsfh.c	2004-05-10 13:26:54.000000000 +1000
@@ -204,7 +204,7 @@ fh_verify(struct svc_rqst *rqstp, struct
 		}
 #ifdef NFSD_PARANOIA
 		if (S_ISDIR(dentry->d_inode->i_mode) &&
-		    (dentry->d_flags & DCACHE_DISCONNECTED)) {
+		    (dentry->d_vfs_flags & DCACHE_DISCONNECTED)) {
 			printk("nfsd: find_fh_dentry returned a DISCONNECTED directory: %s/%s\n",
 			       dentry->d_parent->d_name.name, dentry->d_name.name);
 		}

diff ./include/linux/dcache.h~current~ ./include/linux/dcache.h
--- ./include/linux/dcache.h~current~	2004-05-10 13:26:54.000000000 +1000
+++ ./include/linux/dcache.h	2004-05-10 13:26:54.000000000 +1000
@@ -146,7 +146,7 @@ d_iput:		no		no		yes
       * first endeavour to clear the bit either by discovering that it is
       * connected, or by performing lookup operations.   Any filesystem which
       * supports nfsd_operations MUST have a lookup function which, if it finds
-      * a directory inode with a DCACHE_DISCONNECTED dentry, will d_move
+      * an inode with an IS_ROOT, DCACHE_DISCONNECTED dentry, will d_move
       * that dentry into place and return that dentry rather than the passed one,
       * typically using d_splice_alias.
       */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-10  3:03         ` Neil Brown
@ 2004-05-10  4:50           ` Greg Banks
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Banks @ 2004-05-10  4:50 UTC (permalink / raw)
  To: Neil Brown
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust, Dipankar Sarma

Neil Brown wrote:
> 
> On Tuesday May 4, gnb@melbourne.sgi.com wrote:
> > > >
> > Ok, how about this...it's portable, and not racy, but may perturb the
> > logic slightly by also taking dentries off the unused list in the case
> > where they already had d_count>=1.  I'm not sure how significant that is.
> > In any case this also passes my tests.
> 
> I think this patch is good and needed.
> 
> I think the race can happen if:
>        dentry->d_count == 1, not on list
> 
>    thread 1                       thread 2
>    enter __dget_locked            enter dput
>    atomic_inc(d_count) (now 2)
>                                   atomic_dec_and_lock(d_count...) (now 1)
>    if(atomic_read(d_count)==1 ....
>          remove from list
> 
> This will remove it from the unused list when it isn't
> on, and will decrement nr_unused which, as you say, is bad.

This is precisely the race that my tracing indicated was happening
during my tests.

> [...]  Possibly that patch should update the
> d_lookup comment to add __dget_locked to the set of functions that
> clean up after it.

Ok, updated patch below.


--- linux.orig/fs/dcache.c	2004-05-10 13:38:09.000000000 +1000
+++ linux/fs/dcache.c	2004-05-10 14:45:44.000000000 +1000
@@ -257,7 +257,7 @@
 static inline struct dentry * __dget_locked(struct dentry *dentry)
 {
 	atomic_inc(&dentry->d_count);
-	if (atomic_read(&dentry->d_count) == 1) {
+	if (!list_empty(&dentry->d_lru)) {
 		dentry_stat.nr_unused--;
 		list_del_init(&dentry->d_lru);
 	}
@@ -940,8 +942,9 @@
  * lookup is going on.
  *
  * dentry_unused list is not updated even if lookup finds the required dentry
- * in there. It is updated in places such as prune_dcache, shrink_dcache_sb and
- * select_parent. This laziness saves lookup from dcache_lock acquisition.
+ * in there. It is updated in places such as prune_dcache, shrink_dcache_sb,
+ * select_parent and __dget_locked. This laziness saves lookup from dcache_lock
+ * acquisition.
  *
  * d_lookup() is protected against the concurrent renames in some unrelated
  * directory using the seqlockt_t rename_lock.


Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-10  3:27       ` Neil Brown
@ 2004-05-10 11:28         ` Greg Banks
  2004-05-13  5:58           ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Greg Banks @ 2004-05-10 11:28 UTC (permalink / raw)
  To: Neil Brown
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust

Neil Brown wrote:
> 
> > > *   Logic bug in d_splice_alias() forgets to clear the DCACHE_DISCONNECTED
> > >     flag when a lookup connects a disconnected dentry.  Fix is (relative
> > >     to Neil's patch):
> >
> Ok, I've reviewed the code and thought about it some more.
> 
> This was intentional and the patch to clear DCACHE_DISCONNECTED is not
> needed and possibly wrong.
> 
> DCACHE_DISCONNECTED *doesn't* mean that the entries isn't connected,
> but only that it might not be.

Agreed.  After reading your comments below the semantics of the flag
make sense: was once disconnected and may or may not still be depending
on lazy clearing by the expfs.c code).  My dentry state checking code
was wrong.

Of course I now have an issue with the misleading name ;-)

> In knfsd, we build a patch from the bottom up, and the dentry at the
> bottom is not "connected" until the top is finally connected to the
> root.  If DCACHE_DISCONNECTED were to mean precisely that the dentry
> isn't connected, then at that point were we finally connect a path we
> would have to walk down the path (which could possibly branch)
> clearing all the DCACHE_DISCONNECTED flags, and this clearly isn't a
> good idea.

Yes, with you so far.  Lazy clearing is good.

> nfsd and exportfs should be the only bits of code that cares if
> DCACHE_DISCONNECTED is set or not, and they will clear it if
> appropriate.

Ok, I've thought about this for a while and I see where you're coming
from: you don't want DCACHE_DISCONNECTED disappearing out from
underneath find_exported_dentry() as a side effect of either the lookups
it does or unrelated local lookups from other processes.  The cost
is that the flag potentially lingers when it could have been cleared
out earlier (as my patch tried to do), but this is better than subtle
logic failures down the track.

> A more significant measure is "IS_ROOT". When we call d_splice_alias,
> it should look for an IS_ROOT alias to consider splicing in rather
> than a DCACHE_DISCONNECTED alias, as a DCACHE_DISCONNECTED alias might
> already be spliced into some other path (if it is for a file with
> multiple links).
> 
> The follow patch should resolve all of this, and a few other things.
> IT:
> 
>  - moves DCACHE_DISCONNECTED into d_vfs_flags and uses dcache_lock to
>    make sure that setting the flag doesn't tread on some other flag
>    in the same field.

What I'm wondering is, do we still need DCACHE_DISCONNECTED at all?
Perhaps the uses of it could be replaced with combinations of checks
of IS_ROOT() and (d == d->d_sb->s_root) ?

>  - introduces "d_get_alias" which is similar to "d_find_alias", but is
>    less choosy: any alias will do.
>  - uses d_get_alias in d_alloc_anon and d_splice_alias to try to use a
>    pre-exisiting alias if possible.
>  - Only splices in a pre-exisiting alias if it was IS_ROOT.  There can
>    now only be one of these per inode, and if there is one, there will
>    be no other alias.
>  - introduces d_move_locked which does the same as d_move, but without
>    getting dcache_lock first (it must already be held).  This is
>    needed to be able to atomically test IS_ROOT, and then convert the
>    dentry to non IS_ROOT before anyone else gets to test IS_ROOT.
> 
> Comments and testing results very welcome.

The changes look good but I think the invariants you describe above
should go in a comment.

I will try to do some testing in the next couple of days.

> -void d_move(struct dentry * dentry, struct dentry * target)
> +static void d_move_locked(struct dentry * dentry, struct dentry * target)
>  {
>  	if (!dentry->d_inode)
>  		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
>[...]
> +
> +void d_move(struct dentry * dentry, struct dentry * target)
> +{
> +       if (!dentry->d_inode)
> +               printk(KERN_WARNING "VFS: moving negative dcache entry\n");
> +

Do you need two copies of this check in the same path?


Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-10 11:28         ` Greg Banks
@ 2004-05-13  5:58           ` Neil Brown
  2004-05-13  7:15             ` Greg Banks
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2004-05-13  5:58 UTC (permalink / raw)
  To: Greg Banks
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust


[I wrote this a few days ago but it looks like I forgot to post it...]

On Monday May 10, gnb@melbourne.sgi.com wrote:
> Neil Brown wrote:
> > 
> > > > *   Logic bug in d_splice_alias() forgets to clear the DCACHE_DISCONNECTED
> > > >     flag when a lookup connects a disconnected dentry.  Fix is (relative
> > > >     to Neil's patch):
> > >
> > Ok, I've reviewed the code and thought about it some more.
> > 
> > This was intentional and the patch to clear DCACHE_DISCONNECTED is not
> > needed and possibly wrong.
> > 
> > DCACHE_DISCONNECTED *doesn't* mean that the entries isn't connected,
> > but only that it might not be.
> 
> Agreed.  After reading your comments below the semantics of the flag
> make sense: was once disconnected and may or may not still be depending
> on lazy clearing by the expfs.c code).  My dentry state checking code
> was wrong.
> 
> Of course I now have an issue with the misleading name ;-)

Maybe:   DCACHE_NOT_KNOWN_TO_BE_CONNECTED.
Unfortunately the absence of the flags is stronger information that
it's presence and that makes it hard to name...

> 
> What I'm wondering is, do we still need DCACHE_DISCONNECTED at all?
> Perhaps the uses of it could be replaced with combinations of checks
> of IS_ROOT() and (d == d->d_sb->s_root) ?

It is still needed.
Suppose one thread creates a disconnected dentry, and then starts building
the path from the bottom up.
When it is half way up another request comes in for the same
filehandle.  The same dentry is found.  It is now not IS_ROOT, but
still DCACHE_DISCONNECTED.  Until it is fully connected the second
request shouldn't proceed, and without the DCACHE_DISCONNECTED flag it
is expensive to test for connected-ness.  !IS_ROOT certainly isn't
enough.

> 
> The changes look good but I think the invariants you describe above
> should go in a comment.

Good point.  Which comment I wonder...

> 
> I will try to do some testing in the next couple of days.
> 
> > -void d_move(struct dentry * dentry, struct dentry * target)
> > +static void d_move_locked(struct dentry * dentry, struct dentry * target)
> >  {
> >  	if (!dentry->d_inode)
> >  		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
> >[...]
> > +
> > +void d_move(struct dentry * dentry, struct dentry * target)
> > +{
> > +       if (!dentry->d_inode)
> > +               printk(KERN_WARNING "VFS: moving negative dcache entry\n");
> > +
> 
> Do you need two copies of this check in the same path?
> 

No.  cut-and-paste error I think.

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: d_splice_alias() problem.
  2004-05-13  5:58           ` Neil Brown
@ 2004-05-13  7:15             ` Greg Banks
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Banks @ 2004-05-13  7:15 UTC (permalink / raw)
  To: Neil Brown
  Cc: Nikita Danilov, linux kernel mailing list, alexander viro,
	trond myklebust

Neil Brown wrote:
> 
> > Of course I now have an issue with the misleading name ;-)
> 
> Maybe:   DCACHE_NOT_KNOWN_TO_BE_CONNECTED.
> Unfortunately the absence of the flags is stronger information that
> it's presence and that makes it hard to name...

;-)

> > What I'm wondering is, do we still need DCACHE_DISCONNECTED at all?
> > Perhaps the uses of it could be replaced with combinations of checks
> > of IS_ROOT() and (d == d->d_sb->s_root) ?
> 
> It is still needed.
> Suppose one thread creates a disconnected dentry, and then starts building
> the path from the bottom up.
> When it is half way up another request comes in for the same
> filehandle.  The same dentry is found.  It is now not IS_ROOT, but
> still DCACHE_DISCONNECTED.  Until it is fully connected the second
> request shouldn't proceed, and without the DCACHE_DISCONNECTED flag it
> is expensive to test for connected-ness.  !IS_ROOT certainly isn't
> enough.

Sure, IS_ROOT() only tells you whether a dentry is connected to its parent
and to tell if its connected to the fs root you need to traverse all the
parents.  But nfsd_acceptable() already does this to do permission checks
in the (default) subtree_check case.

So it seems that what the flag gives you is that when its absent and
nosubtree_check is in effect, you know you can short-circuit the connection
walk.  Fair enough.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2004-05-13  7:18 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2004-04-23 13:02 d_splice_alias() problem Nikita Danilov
2004-04-23 15:40 ` Andreas Dilger
2004-04-23 16:20   ` Nikita Danilov
2004-04-23 23:49 ` Andrew Morton
2004-04-26 12:45   ` Nikita Danilov
2004-04-30  4:54 ` Neil Brown
2004-04-30  7:50   ` Greg Banks
2004-04-30 13:28   ` Nikita Danilov
2004-05-03 23:46     ` Neil Brown
2004-05-03 12:02   ` Greg Banks
2004-05-03 23:28     ` Neil Brown
2004-05-04  0:05       ` Greg Banks
2004-05-04  7:00       ` Greg Banks
2004-05-04  9:46         ` viro
2004-05-04 10:21           ` Greg Banks
2004-05-05  0:11           ` Neil Brown
2004-05-10  3:03         ` Neil Brown
2004-05-10  4:50           ` Greg Banks
2004-05-10  3:27       ` Neil Brown
2004-05-10 11:28         ` Greg Banks
2004-05-13  5:58           ` Neil Brown
2004-05-13  7:15             ` Greg Banks

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox