linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* How to handle non-local renames?
@ 2006-09-17 20:43 Miklos Szeredi
  2006-09-18 16:38 ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-17 20:43 UTC (permalink / raw)
  To: linux-fsdevel

Consider the scenario where systems A and B share a filesystem mounted
on /mnt.

A: cd /mnt/a/b
B: mv /mnt/a/b /mnt/a/c/d
A: ls /mnt/a/c/d

So in third step the lookup returns a cached inode, to which a cached
dentry with positive refcount already refers.

I can basically think of two possibilities:

  1) instantiate new dentry 'd' with the inode
  2) move the old dentry 'b' to the new dentry 'd'

With 1, there's a problem of directory aliasing, which means the VFS
doesn't ensure any more that directory loops can't happen.  NFS looks
like it does this, and relies on the server to avoid loops.

2 suffers from locking problems, since lookup already holds a i_mutex
on /mnt/a/c so it can't acquire either the rename_mutex or i_mutex on
/mnt/a which would be needed to safely move the dentry.

Now if a new object is created at /mnt/a/b

B: touch /mnt/a/b
A: ls /mnt/a/b

In case of 1 the old dentry will have to be unhashed and replaced with
the new object.  This means that some things will no longer work for
processes using the old dentry as CWD.  In NFS we'd get a nice ESTALE
error.

Does somebody have any thoughts on this?  Could this be done in a
better way?

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-17 20:43 How to handle non-local renames? Miklos Szeredi
@ 2006-09-18 16:38 ` Trond Myklebust
  2006-09-18 18:00   ` Miklos Szeredi
  2006-09-19  8:24   ` Miklos Szeredi
  0 siblings, 2 replies; 25+ messages in thread
From: Trond Myklebust @ 2006-09-18 16:38 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: linux-fsdevel

On Sun, 2006-09-17 at 22:43 +0200, Miklos Szeredi wrote:
> Consider the scenario where systems A and B share a filesystem mounted
> on /mnt.
> 
> A: cd /mnt/a/b
> B: mv /mnt/a/b /mnt/a/c/d
> A: ls /mnt/a/c/d
> 
> So in third step the lookup returns a cached inode, to which a cached
> dentry with positive refcount already refers.
> 
> I can basically think of two possibilities:
> 
>   1) instantiate new dentry 'd' with the inode
>   2) move the old dentry 'b' to the new dentry 'd'
> 
> With 1, there's a problem of directory aliasing, which means the VFS
> doesn't ensure any more that directory loops can't happen.  NFS looks
> like it does this, and relies on the server to avoid loops.
> 
> 2 suffers from locking problems, since lookup already holds a i_mutex
> on /mnt/a/c so it can't acquire either the rename_mutex or i_mutex on
> /mnt/a which would be needed to safely move the dentry.

2 also suffers from the problem of loops. How do you fix it to deal with
the case of

A: cd /mnt/a/b
B: mv /mnt/a/b /mnt
B: mv /mnt/a /mnt/b
A: cd a

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-18 16:38 ` Trond Myklebust
@ 2006-09-18 18:00   ` Miklos Szeredi
  2006-09-19  8:24   ` Miklos Szeredi
  1 sibling, 0 replies; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-18 18:00 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-fsdevel

> > Consider the scenario where systems A and B share a filesystem mounted
> > on /mnt.
> > 
> > A: cd /mnt/a/b
> > B: mv /mnt/a/b /mnt/a/c/d
> > A: ls /mnt/a/c/d
> > 
> > So in third step the lookup returns a cached inode, to which a cached
> > dentry with positive refcount already refers.
> > 
> > I can basically think of two possibilities:
> > 
> >   1) instantiate new dentry 'd' with the inode
> >   2) move the old dentry 'b' to the new dentry 'd'
> > 
> > With 1, there's a problem of directory aliasing, which means the VFS
> > doesn't ensure any more that directory loops can't happen.  NFS looks
> > like it does this, and relies on the server to avoid loops.
> > 
> > 2 suffers from locking problems, since lookup already holds a i_mutex
> > on /mnt/a/c so it can't acquire either the rename_mutex or i_mutex on
> > /mnt/a which would be needed to safely move the dentry.
> 
> 2 also suffers from the problem of loops. How do you fix it to deal with
> the case of
> 
> A: cd /mnt/a/b
> B: mv /mnt/a/b /mnt
> B: mv /mnt/a /mnt/b
> A: cd a

Right, if the move would create a loop, it obviously can't be done,
but that could be checked (if the locking could be done).

The problem is that what NFS does is not enough for FUSE, since in the
FUSE case the "server" cannot be trusted to not allow loops.  So even
case 1 becomes much more difficult.  That is why I'm searching for an
alternative.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-18 16:38 ` Trond Myklebust
  2006-09-18 18:00   ` Miklos Szeredi
@ 2006-09-19  8:24   ` Miklos Szeredi
  2006-09-22  0:00     ` Trond Myklebust
  1 sibling, 1 reply; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-19  8:24 UTC (permalink / raw)
  To: trond.myklebust; +Cc: linux-fsdevel

> A: cd /mnt/a/b
> B: mv /mnt/a/b /mnt
> B: mv /mnt/a /mnt/b
> A: cd a

I think with this it's possible (though not easy) to deadlock NFS:

A: cd /mnt/a; sleep 10000 & cd b
B: mv /mnt/a/b /mnt/b
B: mv /mnt/a /mnt/b/a
A: mv /proc/`pidof sleep`/cwd/x x & rmdir /mnt/b/a

The mv will first lock a, then b (since it's CWD is /mnt/a/b, so
parents in the rename will be /mnt/a and /mnt/a/b), the rmdir will
first lock b, then a.  So you have a perfect abba deadlock.

Miklos



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-19  8:24   ` Miklos Szeredi
@ 2006-09-22  0:00     ` Trond Myklebust
  2006-09-22  6:22       ` Miklos Szeredi
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2006-09-22  0:00 UTC (permalink / raw)
  To: Miklos Szeredi, dhowells; +Cc: linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 823 bytes --]

On Tue, 2006-09-19 at 10:24 +0200, Miklos Szeredi wrote:
> > A: cd /mnt/a/b
> > B: mv /mnt/a/b /mnt
> > B: mv /mnt/a /mnt/b
> > A: cd a
> 
> I think with this it's possible (though not easy) to deadlock NFS:

Actually, it is dead(sic!) easy. See my testcase in

  http://bugzilla.kernel.org/show_bug.cgi?id=7178

Sigh...

I'm thinking something along the lines of the following patch on top of
David's d_materialise_dentry(), which is already in the 2.6.18-mm
kernels. Still untested (I'm working on that), but it attempts to do
what you were proposing:

        1) If instantiating a directory, check for aliases
        2) If an alias is found in the dcache, attempt to rename the
        existing dentry instead of instantiating the alias.
        3) Return ELOOP if the alias turns out to be a parent.

Cheers,
  Trond

[-- Attachment #2: linux-2.6.18-073-enforce_directory_uniqueness.dif --]
[-- Type: message/rfc822, Size: 5088 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: 
Message-ID: <1158883185.5535.6.camel@lade.trondhjem.org>

If the caller tries to instantiate a directory using an inode that already
has a dentry alias, then we attempt to rename the existing dentry instead
of instantiating a new one. Fail with an ELOOP error if the rename would
affect one of our parent directories.

This behaviour is needed in order to avoid issues such as

  http://bugzilla.kernel.org/show_bug.cgi?id=7178

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/dcache.c  |   79 +++++++++++++++++++++++++++++++++++++---------------------
 fs/nfs/dir.c |    7 ++++-
 2 files changed, 56 insertions(+), 30 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 17b392a..2bbf422 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1337,23 +1337,21 @@ static void switch_names(struct dentry *
  * deleted it.
  */
  
-/**
- * d_move - move a dentry
+/*
+ * d_move_locked - move a dentry
  * @dentry: entry to move
  * @target: new dentry
  *
  * Update the dcache to reflect the move of a file name. Negative
  * dcache entries should not be moved in this way.
  */
-
-void d_move(struct dentry * dentry, struct dentry * target)
+static void d_move_locked(struct dentry * dentry, struct dentry * target)
 {
 	struct hlist_head *list;
 
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	/*
 	 * XXXX: do we really need to take target->d_lock?
@@ -1404,10 +1402,39 @@ already_unhashed:
 	fsnotify_d_move(dentry);
 	spin_unlock(&dentry->d_lock);
 	write_sequnlock(&rename_lock);
+}
+
+/**
+ * d_move - move a dentry
+ * @dentry: entry to move
+ * @target: new dentry
+ *
+ * Update the dcache to reflect the move of a file name. Negative
+ * dcache entries should not be moved in this way.
+ */
+
+void d_move(struct dentry * dentry, struct dentry * target)
+{
+	spin_lock(&dcache_lock);
+	d_move_locked(dentry, target);
 	spin_unlock(&dcache_lock);
 }
 
 /*
+ * Reject any lookup loops due to malicious remote servers
+ */
+static int d_detect_loops(struct dentry *dentry, struct inode *inode)
+{
+	struct dentry *p;
+
+	for (p = dentry; p->d_parent != p; p = p->d_parent) {
+		if (p->d_parent->d_inode == inode)
+			return -ELOOP;
+	}
+	return 0;
+}
+
+/*
  * Prepare an anonymous dentry for life in the superblock's dentry tree as a
  * named dentry in place of the dentry to be replaced.
  */
@@ -1461,26 +1488,22 @@ struct dentry *d_materialise_unique(stru
 		goto found_lock;
 	}
 
-	/* See if a disconnected directory already exists as an anonymous root
-	 * that we should splice into the tree instead */
-	if (S_ISDIR(inode->i_mode) && (alias = __d_find_alias(inode, 1))) {
-		spin_lock(&alias->d_lock);
-
-		/* Is this a mountpoint that we could splice into our tree? */
-		if (IS_ROOT(alias))
-			goto connect_mountpoint;
-
-		if (alias->d_name.len == dentry->d_name.len &&
-		    alias->d_parent == dentry->d_parent &&
-		    memcmp(alias->d_name.name,
-			   dentry->d_name.name,
-			   dentry->d_name.len) == 0)
-			goto replace_with_alias;
-
-		spin_unlock(&alias->d_lock);
-
-		/* Doh! Seem to be aliasing directories for some reason... */
-		dput(alias);
+	if (S_ISDIR(inode->i_mode)) {
+		actual = ERR_PTR(d_detect_loops(dentry, inode));
+		if (IS_ERR(actual))
+			goto out_unlock;
+		/* Does an aliased dentry already exist? */
+		alias = __d_find_alias(inode, 0);
+		if (alias) {
+			actual = alias;
+			/* Is this an anonymous mountpoint that we could splice
+			 * into our tree? */
+			if (IS_ROOT(alias))
+				goto connect_mountpoint;
+			/* Nope, but we must(!) avoid directory aliasing */
+			d_move_locked(alias, dentry);
+			goto out_unlock;
+		}
 	}
 
 	/* Add a unique reference */
@@ -1495,6 +1518,7 @@ found_lock:
 found:
 	_d_rehash(actual);
 	spin_unlock(&actual->d_lock);
+out_unlock:
 	spin_unlock(&dcache_lock);
 
 	if (actual == dentry) {
@@ -1507,12 +1531,9 @@ found:
 
 	/* Convert the anonymous/root alias into an ordinary dentry */
 connect_mountpoint:
+	spin_lock(&alias->d_lock);
 	__d_materialise_dentry(dentry, alias);
-
-	/* Replace the candidate dentry with the alias in the tree */
-replace_with_alias:
 	__d_drop(alias);
-	actual = alias;
 	goto found;
 
 shouldnt_be_hashed:
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 3419c2d..aca90cb 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -933,8 +933,11 @@ static struct dentry *nfs_lookup(struct 
 
 no_entry:
 	res = d_materialise_unique(dentry, inode);
-	if (res != NULL)
+	if (res != NULL) {
+		if (IS_ERR(res))
+			goto out_unlock;
 		dentry = res;
+	}
 	nfs_renew_times(dentry);
 	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 out_unlock:
@@ -1130,6 +1133,8 @@ static struct dentry *nfs_readdir_lookup
 	alias = d_materialise_unique(dentry, inode);
 	if (alias != NULL) {
 		dput(dentry);
+		if (IS_ERR(alias))
+			return NULL;
 		dentry = alias;
 	}
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-22  0:00     ` Trond Myklebust
@ 2006-09-22  6:22       ` Miklos Szeredi
  2006-09-28 10:02         ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-22  6:22 UTC (permalink / raw)
  To: trond.myklebust; +Cc: dhowells, linux-fsdevel

> I'm thinking something along the lines of the following patch on top of
> David's d_materialise_dentry(), which is already in the 2.6.18-mm
> kernels. Still untested (I'm working on that), but it attempts to do
> what you were proposing:
> 
>         1) If instantiating a directory, check for aliases
>         2) If an alias is found in the dcache, attempt to rename the
>         existing dentry instead of instantiating the alias.
>         3) Return ELOOP if the alias turns out to be a parent.

What I'm still worried about is that by moving the alias across
directories without holding the rename mutex, you are violating a
premise in the cross directory rename locking rules:

  (2) if cross-directory rename holds the lock on filesystem, order will not
      change until rename acquires all locks.  (Proof: other cross-directory
      renames will be blocked on filesystem lock and we don't start changing
      the order until we had acquired all locks).

Possible solution:

  - if lookup is being called for last components in rename then
    s_vfs_rename_mutex is already held, we are OK.  But currently we
    don't know if lookup is called from rename.

  - otherwise trylock on s_vfs_rename_mutex, if succeeds then OK.  

  - if fails, then alias can't be moved, make the inode bad (which
    also unhashes it) and get a new inode, which will now be
    unaliased.

Also need to trylock i_mutex on alias->d_parent->i_inode after having
acquired s_vfs_rename_mutex for similar reasons.

However it's getting rather ugly, and I'd rather have something
simpler.

Miklos

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-22  6:22       ` Miklos Szeredi
@ 2006-09-28 10:02         ` Al Viro
  2006-09-28 10:03           ` Al Viro
                             ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Al Viro @ 2006-09-28 10:02 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: trond.myklebust, dhowells, linux-fsdevel

On Fri, Sep 22, 2006 at 08:22:19AM +0200, Miklos Szeredi wrote:
> > I'm thinking something along the lines of the following patch on top of
> > David's d_materialise_dentry(), which is already in the 2.6.18-mm
> > kernels. Still untested (I'm working on that), but it attempts to do
> > what you were proposing:
> > 
> >         1) If instantiating a directory, check for aliases
> >         2) If an alias is found in the dcache, attempt to rename the
> >         existing dentry instead of instantiating the alias.
> >         3) Return ELOOP if the alias turns out to be a parent.
> 
> What I'm still worried about is that by moving the alias across
> directories without holding the rename mutex, you are violating a
> premise in the cross directory rename locking rules:
> 
>   (2) if cross-directory rename holds the lock on filesystem, order will not
>       change until rename acquires all locks.  (Proof: other cross-directory
>       renames will be blocked on filesystem lock and we don't start changing
>       the order until we had acquired all locks).
> 
> Possible solution:
> 
>   - if lookup is being called for last components in rename then
>     s_vfs_rename_mutex is already held, we are OK.  But currently we
>     don't know if lookup is called from rename.
> 
>   - otherwise trylock on s_vfs_rename_mutex, if succeeds then OK.  
> 
>   - if fails, then alias can't be moved, make the inode bad (which
>     also unhashes it) and get a new inode, which will now be
>     unaliased.
> 
> Also need to trylock i_mutex on alias->d_parent->i_inode after having
> acquired s_vfs_rename_mutex for similar reasons.
> 
> However it's getting rather ugly, and I'd rather have something
> simpler.

Simpler: if alias and our dentry share the parent, it's locked and
we can rename without s_vfs_rename_mutex.  If they do not, walk the
tree under alias, unhash all non-directories and kill all directories.

We keep local renames, we keep renames within directory and cross-directory
rename on server ends up with invalidation when client notices it.

We don't _care_ if lookup() is not from rename.  That's OK.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 10:02         ` Al Viro
@ 2006-09-28 10:03           ` Al Viro
  2006-09-28 10:22             ` Miklos Szeredi
  2006-09-28 10:16           ` Miklos Szeredi
  2006-09-28 12:31           ` Trond Myklebust
  2 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2006-09-28 10:03 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: trond.myklebust, dhowells, linux-fsdevel

On Thu, Sep 28, 2006 at 11:02:23AM +0100, Al Viro wrote:
> On Fri, Sep 22, 2006 at 08:22:19AM +0200, Miklos Szeredi wrote:
> > > I'm thinking something along the lines of the following patch on top of
> > > David's d_materialise_dentry(), which is already in the 2.6.18-mm
> > > kernels. Still untested (I'm working on that), but it attempts to do
> > > what you were proposing:
> > > 
> > >         1) If instantiating a directory, check for aliases
> > >         2) If an alias is found in the dcache, attempt to rename the
> > >         existing dentry instead of instantiating the alias.
> > >         3) Return ELOOP if the alias turns out to be a parent.
> > 
> > What I'm still worried about is that by moving the alias across
> > directories without holding the rename mutex, you are violating a
> > premise in the cross directory rename locking rules:
> > 
> >   (2) if cross-directory rename holds the lock on filesystem, order will not
> >       change until rename acquires all locks.  (Proof: other cross-directory
> >       renames will be blocked on filesystem lock and we don't start changing
> >       the order until we had acquired all locks).
> > 
> > Possible solution:
> > 
> >   - if lookup is being called for last components in rename then
> >     s_vfs_rename_mutex is already held, we are OK.  But currently we
> >     don't know if lookup is called from rename.
> > 
> >   - otherwise trylock on s_vfs_rename_mutex, if succeeds then OK.  
> > 
> >   - if fails, then alias can't be moved, make the inode bad (which
> >     also unhashes it) and get a new inode, which will now be
> >     unaliased.
> > 
> > Also need to trylock i_mutex on alias->d_parent->i_inode after having
> > acquired s_vfs_rename_mutex for similar reasons.
> > 
> > However it's getting rather ugly, and I'd rather have something
> > simpler.
> 
> Simpler: if alias and our dentry share the parent, it's locked and
> we can rename without s_vfs_rename_mutex.  If they do not, walk the
> tree under alias, unhash all non-directories and kill all directories.
> 
> We keep local renames, we keep renames within directory and cross-directory
> rename on server ends up with invalidation when client notices it.
> 
> We don't _care_ if lookup() is not from rename.  That's OK.

Note that the only invariant we really need to provide is
	* no two dentries refer to the same directory inode.
As long as that is true, we are OK wrt VFS data structures.  We do need
to guarantee that anyway, FUSE or not.  NFS is deadlockable too in that
area.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 10:02         ` Al Viro
  2006-09-28 10:03           ` Al Viro
@ 2006-09-28 10:16           ` Miklos Szeredi
  2006-09-28 10:27             ` Al Viro
  2006-09-28 12:31           ` Trond Myklebust
  2 siblings, 1 reply; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-28 10:16 UTC (permalink / raw)
  To: viro; +Cc: trond.myklebust, dhowells, linux-fsdevel

> > > I'm thinking something along the lines of the following patch on top of
> > > David's d_materialise_dentry(), which is already in the 2.6.18-mm
> > > kernels. Still untested (I'm working on that), but it attempts to do
> > > what you were proposing:
> > > 
> > >         1) If instantiating a directory, check for aliases
> > >         2) If an alias is found in the dcache, attempt to rename the
> > >         existing dentry instead of instantiating the alias.
> > >         3) Return ELOOP if the alias turns out to be a parent.
> > 
> > What I'm still worried about is that by moving the alias across
> > directories without holding the rename mutex, you are violating a
> > premise in the cross directory rename locking rules:
> > 
> >   (2) if cross-directory rename holds the lock on filesystem, order will not
> >       change until rename acquires all locks.  (Proof: other cross-directory
> >       renames will be blocked on filesystem lock and we don't start changing
> >       the order until we had acquired all locks).
> > 
> > Possible solution:
> > 
> >   - if lookup is being called for last components in rename then
> >     s_vfs_rename_mutex is already held, we are OK.  But currently we
> >     don't know if lookup is called from rename.
> > 
> >   - otherwise trylock on s_vfs_rename_mutex, if succeeds then OK.  
> > 
> >   - if fails, then alias can't be moved, make the inode bad (which
> >     also unhashes it) and get a new inode, which will now be
> >     unaliased.
> > 
> > Also need to trylock i_mutex on alias->d_parent->i_inode after having
> > acquired s_vfs_rename_mutex for similar reasons.
> > 
> > However it's getting rather ugly, and I'd rather have something
> > simpler.
> 
> Simpler: if alias and our dentry share the parent, it's locked and
> we can rename without s_vfs_rename_mutex.  If they do not, walk the
> tree under alias, unhash all non-directories and kill all directories.

What do you mean by "killing directories"?  Unhashing the inode?

Are these OK, without holding i_mutex on alias and descendents?

> We keep local renames, we keep renames within directory and cross-directory
> rename on server ends up with invalidation when client notices it.
> 
> We don't _care_ if lookup() is not from rename.  That's OK.

Well, there's the strange case of shared parents and lookup moving the
source into the target, effectively completing the rename, and thus
making rename think the source and target were equal, although in fact
they weren't...

Miklos

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 10:03           ` Al Viro
@ 2006-09-28 10:22             ` Miklos Szeredi
  0 siblings, 0 replies; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-28 10:22 UTC (permalink / raw)
  To: viro; +Cc: trond.myklebust, dhowells, linux-fsdevel

> Note that the only invariant we really need to provide is
> 	* no two dentries refer to the same directory inode.

That's clear, FUSE already plays by the rules by refusing to create a
directory alias in lookup.  But that's not the nicest solution.

Miklos

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 10:16           ` Miklos Szeredi
@ 2006-09-28 10:27             ` Al Viro
  2006-09-28 10:53               ` Miklos Szeredi
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2006-09-28 10:27 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: trond.myklebust, dhowells, linux-fsdevel

On Thu, Sep 28, 2006 at 12:16:31PM +0200, Miklos Szeredi wrote:
> > Simpler: if alias and our dentry share the parent, it's locked and
> > we can rename without s_vfs_rename_mutex.  If they do not, walk the
> > tree under alias, unhash all non-directories and kill all directories.
> 
> What do you mean by "killing directories"?  Unhashing the inode?
> 
> Are these OK, without holding i_mutex on alias and descendents?

Only dcache locking is needed for unhashing dentries (which is what
we do to non-directories).  For directories, we should silently
unhash inodes and make sure that all future operations on those inodes
would fail.  Up to filesystem how you do that.  Additionally, make
sure that revalidation fails on those suckers.
 
> > We keep local renames, we keep renames within directory and cross-directory
> > rename on server ends up with invalidation when client notices it.
> > 
> > We don't _care_ if lookup() is not from rename.  That's OK.
> 
> Well, there's the strange case of shared parents and lookup moving the
> source into the target, effectively completing the rename, and thus
> making rename think the source and target were equal, although in fact
> they weren't...

Which is fine, since we'll get 0, i.e. rename successful.  Tolerable,
seeing that server _is_ playing games with us.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 10:27             ` Al Viro
@ 2006-09-28 10:53               ` Miklos Szeredi
  2006-09-28 11:21                 ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-28 10:53 UTC (permalink / raw)
  To: viro; +Cc: miklos, trond.myklebust, dhowells, linux-fsdevel

> > > Simpler: if alias and our dentry share the parent, it's locked and
> > > we can rename without s_vfs_rename_mutex.  If they do not, walk the
> > > tree under alias, unhash all non-directories and kill all directories.
> > 
> > What do you mean by "killing directories"?  Unhashing the inode?
> > 
> > Are these OK, without holding i_mutex on alias and descendents?
> 
> Only dcache locking is needed for unhashing dentries (which is what
> we do to non-directories).  For directories, we should silently
> unhash inodes and make sure that all future operations on those inodes
> would fail.  Up to filesystem how you do that.  Additionally, make
> sure that revalidation fails on those suckers.

OK, what about operations currently in progress on those directories
and files?  Is there no way these can cause trouble?

>  
> > > We keep local renames, we keep renames within directory and
> > > cross-directory rename on server ends up with invalidation when
> > > client notices it.
> > > 
> > > We don't _care_ if lookup() is not from rename.  That's OK.
> > 
> > Well, there's the strange case of shared parents and lookup moving the
> > source into the target, effectively completing the rename, and thus
> > making rename think the source and target were equal, although in fact
> > they weren't...
> 
> Which is fine, since we'll get 0, i.e. rename successful.  Tolerable,
> seeing that server _is_ playing games with us.

Not necessarily: it could have been two parallel renames doing the
same on two clients, one of them succeeding between the lookups on
source and target of the other.

But I'm not overly worried about this.

Miklos

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 10:53               ` Miklos Szeredi
@ 2006-09-28 11:21                 ` Al Viro
  2006-09-28 11:44                   ` Miklos Szeredi
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2006-09-28 11:21 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: trond.myklebust, dhowells, linux-fsdevel

On Thu, Sep 28, 2006 at 12:53:01PM +0200, Miklos Szeredi wrote:
> > > > Simpler: if alias and our dentry share the parent, it's locked and
> > > > we can rename without s_vfs_rename_mutex.  If they do not, walk the
> > > > tree under alias, unhash all non-directories and kill all directories.
> > > 
> > > What do you mean by "killing directories"?  Unhashing the inode?
> > > 
> > > Are these OK, without holding i_mutex on alias and descendents?
> > 
> > Only dcache locking is needed for unhashing dentries (which is what
> > we do to non-directories).  For directories, we should silently
> > unhash inodes and make sure that all future operations on those inodes
> > would fail.  Up to filesystem how you do that.  Additionally, make
> > sure that revalidation fails on those suckers.
> 
> OK, what about operations currently in progress on those directories
> and files?  Is there no way these can cause trouble?

We need variants of d_move() and d_rehash() that do not take dcache_lock,
so that such fs would use them under dcache_lock, then check if parent
is dead and do the same kind of unhashing/killing if it is.  I.e.
pretend that we'd won the race and killer lookup came later.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 11:21                 ` Al Viro
@ 2006-09-28 11:44                   ` Miklos Szeredi
  2006-09-28 11:54                     ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Miklos Szeredi @ 2006-09-28 11:44 UTC (permalink / raw)
  To: viro; +Cc: trond.myklebust, dhowells, linux-fsdevel

> > > > > Simpler: if alias and our dentry share the parent, it's locked and
> > > > > we can rename without s_vfs_rename_mutex.  If they do not, walk the
> > > > > tree under alias, unhash all non-directories and kill all directories.
> > > > 
> > > > What do you mean by "killing directories"?  Unhashing the inode?
> > > > 
> > > > Are these OK, without holding i_mutex on alias and descendents?
> > > 
> > > Only dcache locking is needed for unhashing dentries (which is what
> > > we do to non-directories).  For directories, we should silently
> > > unhash inodes and make sure that all future operations on those inodes
> > > would fail.  Up to filesystem how you do that.  Additionally, make
> > > sure that revalidation fails on those suckers.
> > 
> > OK, what about operations currently in progress on those directories
> > and files?  Is there no way these can cause trouble?
> 
> We need variants of d_move() and d_rehash() that do not take dcache_lock,
> so that such fs would use them under dcache_lock, then check if parent
> is dead and do the same kind of unhashing/killing if it is.  I.e.
> pretend that we'd won the race and killer lookup came later.

I mostly see how that would work for lookup, but in case of rename the
d_move() is done by the VFS.  So that seems to need surgery as well.

Miklos



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 11:44                   ` Miklos Szeredi
@ 2006-09-28 11:54                     ` Al Viro
  0 siblings, 0 replies; 25+ messages in thread
From: Al Viro @ 2006-09-28 11:54 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: trond.myklebust, dhowells, linux-fsdevel

On Thu, Sep 28, 2006 at 01:44:01PM +0200, Miklos Szeredi wrote:
> I mostly see how that would work for lookup, but in case of rename the
> d_move() is done by the VFS.  So that seems to need surgery as well.

See patches allowing to take it inside ->rename()...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 10:02         ` Al Viro
  2006-09-28 10:03           ` Al Viro
  2006-09-28 10:16           ` Miklos Szeredi
@ 2006-09-28 12:31           ` Trond Myklebust
  2006-09-28 12:42             ` Al Viro
  2 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2006-09-28 12:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, dhowells, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 1075 bytes --]

On Thu, 2006-09-28 at 11:02 +0100, Al Viro wrote:

> Simpler: if alias and our dentry share the parent, it's locked and
> we can rename without s_vfs_rename_mutex.  If they do not, walk the
> tree under alias, unhash all non-directories and kill all directories.
> 
> We keep local renames, we keep renames within directory and cross-directory
> rename on server ends up with invalidation when client notices it.
> 
> We don't _care_ if lookup() is not from rename.  That's OK.

You can't assume that you can always kill the subdirectories: they may
be in use. Even if no actual processes are using them, I may have
submounts.
Forcing processes to wait until the administrator gets round to
unmounting all subdirs of the old alias before they can access the new
one is not going to be acceptable to most users.

The attached patch is what I'm testing out right now. I've basically
added Miklos' suggestion of trying to grab the s_vfs_rename_mutex in
order to protect the cross-directory d_move() case, and then failing
with an EBUSY if that is not possible.

Cheers,
  Trond

[-- Attachment #2: linux-2.6.18-002-enforce_directory_uniqueness.dif --]
[-- Type: message/rfc822, Size: 5957 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: Fri, 22 Sep 2006 15:32:14 -0400
Message-ID: <1159446156.5439.14.camel@lade.trondhjem.org>

If the caller tries to instantiate a directory using an inode that already
has a dentry alias, then we attempt to rename the existing dentry instead
of instantiating a new one. Fail with an ELOOP error if the rename would
affect one of our parent directories.

This behaviour is needed in order to avoid issues such as

  http://bugzilla.kernel.org/show_bug.cgi?id=7178

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/dcache.c  |  106 ++++++++++++++++++++++++++++++++++++++--------------------
 fs/nfs/dir.c |    7 +++-
 2 files changed, 75 insertions(+), 38 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 17b392a..abe219f 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1337,23 +1337,21 @@ static void switch_names(struct dentry *
  * deleted it.
  */
  
-/**
- * d_move - move a dentry
+/*
+ * d_move_locked - move a dentry
  * @dentry: entry to move
  * @target: new dentry
  *
  * Update the dcache to reflect the move of a file name. Negative
  * dcache entries should not be moved in this way.
  */
-
-void d_move(struct dentry * dentry, struct dentry * target)
+static void d_move_locked(struct dentry * dentry, struct dentry * target)
 {
 	struct hlist_head *list;
 
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	/*
 	 * XXXX: do we really need to take target->d_lock?
@@ -1404,10 +1402,39 @@ already_unhashed:
 	fsnotify_d_move(dentry);
 	spin_unlock(&dentry->d_lock);
 	write_sequnlock(&rename_lock);
+}
+
+/**
+ * d_move - move a dentry
+ * @dentry: entry to move
+ * @target: new dentry
+ *
+ * Update the dcache to reflect the move of a file name. Negative
+ * dcache entries should not be moved in this way.
+ */
+
+void d_move(struct dentry * dentry, struct dentry * target)
+{
+	spin_lock(&dcache_lock);
+	d_move_locked(dentry, target);
 	spin_unlock(&dcache_lock);
 }
 
 /*
+ * Reject any lookup loops due to malicious remote servers
+ */
+static int d_detect_loops(struct dentry *dentry, struct inode *inode)
+{
+	struct dentry *p;
+
+	for (p = dentry; p->d_parent != p; p = p->d_parent) {
+		if (p->d_parent->d_inode == inode)
+			return -ELOOP;
+	}
+	return 0;
+}
+
+/*
  * Prepare an anonymous dentry for life in the superblock's dentry tree as a
  * named dentry in place of the dentry to be replaced.
  */
@@ -1449,7 +1476,7 @@ static void __d_materialise_dentry(struc
  */
 struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
 {
-	struct dentry *alias, *actual;
+	struct dentry *actual;
 
 	BUG_ON(!d_unhashed(dentry));
 
@@ -1461,26 +1488,40 @@ struct dentry *d_materialise_unique(stru
 		goto found_lock;
 	}
 
-	/* See if a disconnected directory already exists as an anonymous root
-	 * that we should splice into the tree instead */
-	if (S_ISDIR(inode->i_mode) && (alias = __d_find_alias(inode, 1))) {
-		spin_lock(&alias->d_lock);
-
-		/* Is this a mountpoint that we could splice into our tree? */
-		if (IS_ROOT(alias))
-			goto connect_mountpoint;
-
-		if (alias->d_name.len == dentry->d_name.len &&
-		    alias->d_parent == dentry->d_parent &&
-		    memcmp(alias->d_name.name,
-			   dentry->d_name.name,
-			   dentry->d_name.len) == 0)
-			goto replace_with_alias;
-
-		spin_unlock(&alias->d_lock);
-
-		/* Doh! Seem to be aliasing directories for some reason... */
-		dput(alias);
+	if (S_ISDIR(inode->i_mode)) {
+		struct dentry *alias;
+
+		actual = ERR_PTR(d_detect_loops(dentry, inode));
+		if (IS_ERR(actual))
+			goto out_unlock;
+		/* Does an aliased dentry already exist? */
+		alias = __d_find_alias(inode, 0);
+		if (alias) {
+			actual = alias;
+			/* Is this an anonymous mountpoint that we could splice
+			 * into our tree? */
+			if (IS_ROOT(alias)) {
+				spin_lock(&alias->d_lock);
+				__d_materialise_dentry(dentry, alias);
+				__d_drop(alias);
+				goto found;
+			}
+			/* Nope, but we must(!) avoid directory aliasing */
+			if (alias->d_parent == dentry->d_parent) {
+				d_move_locked(alias, dentry);
+				goto out_unlock;
+			}
+			if (mutex_trylock(&inode->i_sb->s_vfs_rename_mutex)) {
+				d_move_locked(alias, dentry);
+				spin_unlock(&dcache_lock);
+				mutex_unlock(&inode->i_sb->s_vfs_rename_mutex);
+				goto out_nolock;
+			}
+			spin_unlock(&dcache_lock);
+			dput(alias);
+			actual = ERR_PTR(-EBUSY);
+			goto out_nolock;
+		}
 	}
 
 	/* Add a unique reference */
@@ -1495,8 +1536,9 @@ found_lock:
 found:
 	_d_rehash(actual);
 	spin_unlock(&actual->d_lock);
+out_unlock:
 	spin_unlock(&dcache_lock);
-
+out_nolock:
 	if (actual == dentry) {
 		security_d_instantiate(dentry, inode);
 		return NULL;
@@ -1505,16 +1547,6 @@ found:
 	iput(inode);
 	return actual;
 
-	/* Convert the anonymous/root alias into an ordinary dentry */
-connect_mountpoint:
-	__d_materialise_dentry(dentry, alias);
-
-	/* Replace the candidate dentry with the alias in the tree */
-replace_with_alias:
-	__d_drop(alias);
-	actual = alias;
-	goto found;
-
 shouldnt_be_hashed:
 	spin_unlock(&dcache_lock);
 	BUG();
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 7432f1a..0f22a26 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -933,8 +933,11 @@ static struct dentry *nfs_lookup(struct 
 
 no_entry:
 	res = d_materialise_unique(dentry, inode);
-	if (res != NULL)
+	if (res != NULL) {
+		if (IS_ERR(res))
+			goto out_unlock;
 		dentry = res;
+	}
 	nfs_renew_times(dentry);
 	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 out_unlock:
@@ -1130,6 +1133,8 @@ static struct dentry *nfs_readdir_lookup
 	alias = d_materialise_unique(dentry, inode);
 	if (alias != NULL) {
 		dput(dentry);
+		if (IS_ERR(alias))
+			return NULL;
 		dentry = alias;
 	}
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 12:31           ` Trond Myklebust
@ 2006-09-28 12:42             ` Al Viro
  2006-09-28 12:57               ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2006-09-28 12:42 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Miklos Szeredi, dhowells, linux-fsdevel

On Thu, Sep 28, 2006 at 08:31:18AM -0400, Trond Myklebust wrote:
> > We don't _care_ if lookup() is not from rename.  That's OK.
> 
> You can't assume that you can always kill the subdirectories: they may
> be in use. Even if no actual processes are using them, I may have
> submounts.

So what?  Actual processes will get -ESTALE and STFU, submounts can
bloody well stay where they are; since we don't free dentries, WTF
would VFS care?  Mark inodes so that nothing would be done to those
directories, unhash them and be done with it.  umount -l will be
able to take them out just fine afterwards.  *Or* we could even
try and play with detaching them (needed on invalidation anyway).

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 12:42             ` Al Viro
@ 2006-09-28 12:57               ` Trond Myklebust
  2006-09-28 13:15                 ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2006-09-28 12:57 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, dhowells, linux-fsdevel

On Thu, 2006-09-28 at 13:42 +0100, Al Viro wrote:
> On Thu, Sep 28, 2006 at 08:31:18AM -0400, Trond Myklebust wrote:
> > > We don't _care_ if lookup() is not from rename.  That's OK.
> > 
> > You can't assume that you can always kill the subdirectories: they may
> > be in use. Even if no actual processes are using them, I may have
> > submounts.
> 
> So what?  Actual processes will get -ESTALE and STFU, submounts can
> bloody well stay where they are; since we don't free dentries, WTF
> would VFS care?  Mark inodes so that nothing would be done to those
> directories, unhash them and be done with it.  umount -l will be
> able to take them out just fine afterwards.  *Or* we could even
> try and play with detaching them (needed on invalidation anyway).

That would mean that I can in practice unmount directories on your
client simply by renaming a directory on my client. I can't see how that
would be acceptable either.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 12:57               ` Trond Myklebust
@ 2006-09-28 13:15                 ` Al Viro
  2006-09-28 13:31                   ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2006-09-28 13:15 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Miklos Szeredi, dhowells, linux-fsdevel

On Thu, Sep 28, 2006 at 08:57:08AM -0400, Trond Myklebust wrote:
> On Thu, 2006-09-28 at 13:42 +0100, Al Viro wrote:
> > On Thu, Sep 28, 2006 at 08:31:18AM -0400, Trond Myklebust wrote:
> > > > We don't _care_ if lookup() is not from rename.  That's OK.
> > > 
> > > You can't assume that you can always kill the subdirectories: they may
> > > be in use. Even if no actual processes are using them, I may have
> > > submounts.
> > 
> > So what?  Actual processes will get -ESTALE and STFU, submounts can
> > bloody well stay where they are; since we don't free dentries, WTF
> > would VFS care?  Mark inodes so that nothing would be done to those
> > directories, unhash them and be done with it.  umount -l will be
> > able to take them out just fine afterwards.  *Or* we could even
> > try and play with detaching them (needed on invalidation anyway).
> 
> That would mean that I can in practice unmount directories on your
> client simply by renaming a directory on my client. I can't see how that
> would be acceptable either.

And what are you going to do if I remove it and its ancestors on my client?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 13:15                 ` Al Viro
@ 2006-09-28 13:31                   ` Trond Myklebust
  2006-09-28 14:20                     ` Al Viro
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2006-09-28 13:31 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, dhowells, linux-fsdevel

On Thu, 2006-09-28 at 14:15 +0100, Al Viro wrote:
> > That would mean that I can in practice unmount directories on your
> > client simply by renaming a directory on my client. I can't see how that
> > would be acceptable either.
> 
> And what are you going to do if I remove it and its ancestors on my client?

If it is in a subdirectory to which you are not supposed to have write
access, then I'm going to yell bloody murder :-)
Ditto if the same thing happens when you rename a directory to which you
do have access, and that screws over my process in a subdir to which you
don't have access.

The point is that NFS has quite enough posix violations already on its
grubby little hands. As far as posix is concerned, renaming a parent is
not supposed to affect processes in the subdirectories. There is
currently no reason why we can't support that behaviour in the NFS
client too barring VFS quirks.
Furthermore, it would upset a lot of people to change the current
behaviour which does support remote rename, and has supported it for the
past 10 years at least. I'd therefore prefer to go for a workaround that
addresses the problem of the deadlocks instead of the useful
functionality.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 13:31                   ` Trond Myklebust
@ 2006-09-28 14:20                     ` Al Viro
  2006-09-28 18:24                       ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Al Viro @ 2006-09-28 14:20 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Miklos Szeredi, dhowells, linux-fsdevel

On Thu, Sep 28, 2006 at 09:31:59AM -0400, Trond Myklebust wrote:

> Furthermore, it would upset a lot of people to change the current
> behaviour which does support remote rename, and has supported it for the
> past 10 years at least. I'd therefore prefer to go for a workaround that
> addresses the problem of the deadlocks instead of the useful
> functionality.

OK...  I'll look into your variant again when I get some sleep - I'm
afraid that there are remaining holes, but right now I'm not in any
condition to verify that (or prove that there's none)...  Later tonight...

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 14:20                     ` Al Viro
@ 2006-09-28 18:24                       ` Trond Myklebust
  2006-09-28 20:50                         ` Randy Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2006-09-28 18:24 UTC (permalink / raw)
  To: Al Viro; +Cc: Miklos Szeredi, dhowells, linux-fsdevel

[-- Attachment #1: Type: text/plain, Size: 938 bytes --]

On Thu, 2006-09-28 at 15:20 +0100, Al Viro wrote:
> On Thu, Sep 28, 2006 at 09:31:59AM -0400, Trond Myklebust wrote:
> 
> > Furthermore, it would upset a lot of people to change the current
> > behaviour which does support remote rename, and has supported it for the
> > past 10 years at least. I'd therefore prefer to go for a workaround that
> > addresses the problem of the deadlocks instead of the useful
> > functionality.
> 
> OK...  I'll look into your variant again when I get some sleep - I'm
> afraid that there are remaining holes, but right now I'm not in any
> condition to verify that (or prove that there's none)...  Later tonight...

Argh... We are going to need to hold the i_mutex of the parent of the
alias too. If not, other processes that are in the middle of an rmdir
will break.

Please scrap the old patch. This one should follow all the VFS locking
rules, and error out when there is a conflict.

Cheers,
  Trond

[-- Attachment #2: linux-2.6.18-002-enforce_directory_uniqueness.dif --]
[-- Type: message/rfc822, Size: 6693 bytes --]

From: Trond Myklebust <Trond.Myklebust@netapp.com>
Subject: No Subject
Date: Fri, 22 Sep 2006 15:32:14 -0400
Message-ID: <1159467821.5482.9.camel@lade.trondhjem.org>

If the caller tries to instantiate a directory using an inode that already
has a dentry alias, then we attempt to rename the existing dentry instead
of instantiating a new one. Fail with an ELOOP error if the rename would
affect one of our parent directories.

This behaviour is needed in order to avoid issues such as

  http://bugzilla.kernel.org/show_bug.cgi?id=7178

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 fs/dcache.c  |  137 ++++++++++++++++++++++++++++++++++++++++++----------------
 fs/nfs/dir.c |    7 +++
 2 files changed, 106 insertions(+), 38 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 17b392a..d0e86d9 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1337,23 +1337,21 @@ static void switch_names(struct dentry *
  * deleted it.
  */
  
-/**
- * d_move - move a dentry
+/*
+ * d_move_locked - move a dentry
  * @dentry: entry to move
  * @target: new dentry
  *
  * Update the dcache to reflect the move of a file name. Negative
  * dcache entries should not be moved in this way.
  */
-
-void d_move(struct dentry * dentry, struct dentry * target)
+static void d_move_locked(struct dentry * dentry, struct dentry * target)
 {
 	struct hlist_head *list;
 
 	if (!dentry->d_inode)
 		printk(KERN_WARNING "VFS: moving negative dcache entry\n");
 
-	spin_lock(&dcache_lock);
 	write_seqlock(&rename_lock);
 	/*
 	 * XXXX: do we really need to take target->d_lock?
@@ -1404,10 +1402,84 @@ already_unhashed:
 	fsnotify_d_move(dentry);
 	spin_unlock(&dentry->d_lock);
 	write_sequnlock(&rename_lock);
+}
+
+/**
+ * d_move - move a dentry
+ * @dentry: entry to move
+ * @target: new dentry
+ *
+ * Update the dcache to reflect the move of a file name. Negative
+ * dcache entries should not be moved in this way.
+ */
+
+void d_move(struct dentry * dentry, struct dentry * target)
+{
+	spin_lock(&dcache_lock);
+	d_move_locked(dentry, target);
 	spin_unlock(&dcache_lock);
 }
 
 /*
+ * Helper that returns 1 if p1 is a parent of p2, else 0
+ */
+static int d_isparent(struct dentry *p1, struct dentry *p2)
+{
+	struct dentry *p;
+
+	for (p = p2; p->d_parent != p; p = p->d_parent) {
+		if (p->d_parent == p1)
+			return 1;
+	}
+	return 0;
+}
+
+/*
+ * This helper attempts to cope with remotely renamed directories
+ *
+ * It assumes that the caller is already holding
+ * dentry->d_parent->d_inode->i_mutex and the dcache_lock
+ *
+ * Note: If ever the locking in lock_rename() changes, then please
+ * remember to update this too...
+ *
+ * On return, dcache_lock will have been unlocked.
+ */
+static struct dentry *__d_unalias(struct dentry *dentry, struct dentry *alias)
+{
+	struct mutex *m1 = NULL, *m2 = NULL;
+	struct dentry *ret;
+
+	/* If alias and dentry share a parent, then no extra locks required */
+	if (alias->d_parent == dentry->d_parent)
+		goto out_unalias;
+
+	/* Check for loops */
+	ret = ERR_PTR(-ELOOP);
+	if (d_isparent(alias, dentry))
+		goto out_err;
+
+	/* See lock_rename() */
+	ret = ERR_PTR(-EBUSY);
+	if (!mutex_trylock(&dentry->d_sb->s_vfs_rename_mutex))
+		goto out_err;
+	m1 = &dentry->d_sb->s_vfs_rename_mutex;
+	if (!mutex_trylock(&alias->d_parent->d_inode->i_mutex))
+		goto out_err;
+	m2 = &alias->d_parent->d_inode->i_mutex;
+out_unalias:
+	d_move_locked(alias, dentry);
+	ret = alias;
+out_err:
+	spin_unlock(&dcache_lock);
+	if (m2)
+		mutex_unlock(m2);
+	if (m1)
+		mutex_unlock(m1);
+	return ret;
+}
+
+/*
  * Prepare an anonymous dentry for life in the superblock's dentry tree as a
  * named dentry in place of the dentry to be replaced.
  */
@@ -1449,7 +1521,7 @@ static void __d_materialise_dentry(struc
  */
 struct dentry *d_materialise_unique(struct dentry *dentry, struct inode *inode)
 {
-	struct dentry *alias, *actual;
+	struct dentry *actual;
 
 	BUG_ON(!d_unhashed(dentry));
 
@@ -1461,26 +1533,27 @@ struct dentry *d_materialise_unique(stru
 		goto found_lock;
 	}
 
-	/* See if a disconnected directory already exists as an anonymous root
-	 * that we should splice into the tree instead */
-	if (S_ISDIR(inode->i_mode) && (alias = __d_find_alias(inode, 1))) {
-		spin_lock(&alias->d_lock);
-
-		/* Is this a mountpoint that we could splice into our tree? */
-		if (IS_ROOT(alias))
-			goto connect_mountpoint;
-
-		if (alias->d_name.len == dentry->d_name.len &&
-		    alias->d_parent == dentry->d_parent &&
-		    memcmp(alias->d_name.name,
-			   dentry->d_name.name,
-			   dentry->d_name.len) == 0)
-			goto replace_with_alias;
-
-		spin_unlock(&alias->d_lock);
-
-		/* Doh! Seem to be aliasing directories for some reason... */
-		dput(alias);
+	if (S_ISDIR(inode->i_mode)) {
+		struct dentry *alias;
+
+		/* Does an aliased dentry already exist? */
+		alias = __d_find_alias(inode, 0);
+		if (alias) {
+			actual = alias;
+			/* Is this an anonymous mountpoint that we could splice
+			 * into our tree? */
+			if (IS_ROOT(alias)) {
+				spin_lock(&alias->d_lock);
+				__d_materialise_dentry(dentry, alias);
+				__d_drop(alias);
+				goto found;
+			}
+			/* Nope, but we must(!) avoid directory aliasing */
+			actual = __d_unalias(dentry, alias);
+			if (IS_ERR(actual))
+				dput(alias);
+			goto out_nolock;
+		}
 	}
 
 	/* Add a unique reference */
@@ -1496,7 +1569,7 @@ found:
 	_d_rehash(actual);
 	spin_unlock(&actual->d_lock);
 	spin_unlock(&dcache_lock);
-
+out_nolock:
 	if (actual == dentry) {
 		security_d_instantiate(dentry, inode);
 		return NULL;
@@ -1505,16 +1578,6 @@ found:
 	iput(inode);
 	return actual;
 
-	/* Convert the anonymous/root alias into an ordinary dentry */
-connect_mountpoint:
-	__d_materialise_dentry(dentry, alias);
-
-	/* Replace the candidate dentry with the alias in the tree */
-replace_with_alias:
-	__d_drop(alias);
-	actual = alias;
-	goto found;
-
 shouldnt_be_hashed:
 	spin_unlock(&dcache_lock);
 	BUG();
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 7432f1a..0f22a26 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -933,8 +933,11 @@ static struct dentry *nfs_lookup(struct 
 
 no_entry:
 	res = d_materialise_unique(dentry, inode);
-	if (res != NULL)
+	if (res != NULL) {
+		if (IS_ERR(res))
+			goto out_unlock;
 		dentry = res;
+	}
 	nfs_renew_times(dentry);
 	nfs_set_verifier(dentry, nfs_save_change_attribute(dir));
 out_unlock:
@@ -1130,6 +1133,8 @@ static struct dentry *nfs_readdir_lookup
 	alias = d_materialise_unique(dentry, inode);
 	if (alias != NULL) {
 		dput(dentry);
+		if (IS_ERR(alias))
+			return NULL;
 		dentry = alias;
 	}
 

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 18:24                       ` Trond Myklebust
@ 2006-09-28 20:50                         ` Randy Dunlap
  2006-09-29 14:55                           ` Trond Myklebust
  0 siblings, 1 reply; 25+ messages in thread
From: Randy Dunlap @ 2006-09-28 20:50 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Al Viro, Miklos Szeredi, dhowells, linux-fsdevel

On Thu, 28 Sep 2006 14:24:45 -0400 Trond Myklebust wrote:

> On Thu, 2006-09-28 at 15:20 +0100, Al Viro wrote:
> > On Thu, Sep 28, 2006 at 09:31:59AM -0400, Trond Myklebust wrote:
> > 
> > > Furthermore, it would upset a lot of people to change the current
> > > behaviour which does support remote rename, and has supported it for the
> > > past 10 years at least. I'd therefore prefer to go for a workaround that
> > > addresses the problem of the deadlocks instead of the useful
> > > functionality.
> > 
> > OK...  I'll look into your variant again when I get some sleep - I'm
> > afraid that there are remaining holes, but right now I'm not in any
> > condition to verify that (or prove that there's none)...  Later tonight...
> 
> Argh... We are going to need to hold the i_mutex of the parent of the
> alias too. If not, other processes that are in the middle of an rmdir
> will break.
> 
> Please scrap the old patch. This one should follow all the VFS locking
> rules, and error out when there is a conflict.

-/**
- * d_move - move a dentry
+/*

Why not leave this as kernel-doc?  (using "/**")

+ * d_move_locked - move a dentry
  * @dentry: entry to move
  * @target: new dentry

---
~Randy

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-28 20:50                         ` Randy Dunlap
@ 2006-09-29 14:55                           ` Trond Myklebust
  2006-09-29 16:03                             ` Randy Dunlap
  0 siblings, 1 reply; 25+ messages in thread
From: Trond Myklebust @ 2006-09-29 14:55 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Al Viro, Miklos Szeredi, dhowells, linux-fsdevel

On Thu, 2006-09-28 at 13:50 -0700, Randy Dunlap wrote:
> -/**
> - * d_move - move a dentry
> +/*
> 
> Why not leave this as kernel-doc?  (using "/**")
> 
> + * d_move_locked - move a dentry
>   * @dentry: entry to move
>   * @target: new dentry

d_move_locked() is a static function, and is only meant to be used in
fs/dcache.c. It is useful to document it internally, but why add it to
kernel-doc?

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: How to handle non-local renames?
  2006-09-29 14:55                           ` Trond Myklebust
@ 2006-09-29 16:03                             ` Randy Dunlap
  0 siblings, 0 replies; 25+ messages in thread
From: Randy Dunlap @ 2006-09-29 16:03 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: Al Viro, Miklos Szeredi, dhowells, linux-fsdevel

On Fri, 29 Sep 2006 10:55:55 -0400 Trond Myklebust wrote:

> On Thu, 2006-09-28 at 13:50 -0700, Randy Dunlap wrote:
> > -/**
> > - * d_move - move a dentry
> > +/*
> > 
> > Why not leave this as kernel-doc?  (using "/**")
> > 
> > + * d_move_locked - move a dentry
> >   * @dentry: entry to move
> >   * @target: new dentry
> 
> d_move_locked() is a static function, and is only meant to be used in
> fs/dcache.c. It is useful to document it internally, but why add it to
> kernel-doc?

OK, thanks.

---
~Randy

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2006-09-29 16:02 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-09-17 20:43 How to handle non-local renames? Miklos Szeredi
2006-09-18 16:38 ` Trond Myklebust
2006-09-18 18:00   ` Miklos Szeredi
2006-09-19  8:24   ` Miklos Szeredi
2006-09-22  0:00     ` Trond Myklebust
2006-09-22  6:22       ` Miklos Szeredi
2006-09-28 10:02         ` Al Viro
2006-09-28 10:03           ` Al Viro
2006-09-28 10:22             ` Miklos Szeredi
2006-09-28 10:16           ` Miklos Szeredi
2006-09-28 10:27             ` Al Viro
2006-09-28 10:53               ` Miklos Szeredi
2006-09-28 11:21                 ` Al Viro
2006-09-28 11:44                   ` Miklos Szeredi
2006-09-28 11:54                     ` Al Viro
2006-09-28 12:31           ` Trond Myklebust
2006-09-28 12:42             ` Al Viro
2006-09-28 12:57               ` Trond Myklebust
2006-09-28 13:15                 ` Al Viro
2006-09-28 13:31                   ` Trond Myklebust
2006-09-28 14:20                     ` Al Viro
2006-09-28 18:24                       ` Trond Myklebust
2006-09-28 20:50                         ` Randy Dunlap
2006-09-29 14:55                           ` Trond Myklebust
2006-09-29 16:03                             ` Randy Dunlap

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).