linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Allow file systems to manually d_move() inside of ->rename()
@ 2006-08-29 21:54 Mark Fasheh
  2006-08-30  0:35 ` Trond Myklebust
  0 siblings, 1 reply; 3+ messages in thread
From: Mark Fasheh @ 2006-08-29 21:54 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: linux-fsdevel, hch, viro

I'm currently working on removing the "dentry" vote from ocfs2. This is a
network message broadcasted to all mounted nodes on every unlink() or
rename(). Upon recieving the message, those nodes d_delete() the dentry in
question.

What I'm doing is replacing the broadcast mechanism with a cluster lock
which covers a set of dentries. Every node gets a read only lock, until an
unlink during which the unlinking node, will request an exclusive lock,
forcing the other nodes who care about that dentry to d_delete() it. The
effect is that we retain a very lightweight ->d_revalidate(), and at the
same time get to make large improvements to the average case performance
of the ocfs2 unlink and rename operations.

I have uncovered a very small race with rename and d_move() however. The
d_move() for a rename is after the ->rename() callback, outside the parent
directory cluster locks which normally protect the name creation/removal
mechanism. The worry is that after ocfs2_rename(), but before the d_move() a
different node can discover the new name (created during the rename) and
unlink it, forcing a d_delete(). d_move() it seems, unconditionally rehashes
the (renamed) dentry and so if it's done after a d_delete() we could get
some inconsitent state amongst the nodes.

My solution is to simply allow ocfs2 to do the d_move() inside of
ocfs2_rename() by introducing a FS_RENAME_DOES_D_MOVE flag. OCFS2 won't
actually change how or why d_move() is called during a rename, it just uses
this flag to change where the call is made.

For any interested parties, a preliminary ocfs2 patch is at

http://oss.oracle.com/~mfasheh/vote_removal/remove_dentry_vote_wip.patch

The interesting stuff is mostly in fs/ocfs2/dcache.[ch]

The ocfs2 patch isn't posted via e-mail because it's still a work in
progress. That said, it actually works quite well, and the only things I
have left to do are unrelated to rename - the patch needs to be split up
into smaller ones, and a pair of (minor) dlm related fixups are needed.
	--Mark


From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Allow file systems to manually d_move() inside of ->rename()

Some file systems want to be able to manually d_move() the dentries involved
in a rename. Introduce a flag which instructs vfs_rename_dir() and
vfs_rename_other() that it has already been handled internally.

OCFS2 uses this to protect that part of the entire operation with a cluster
lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

diff --git a/fs/namei.c b/fs/namei.c
index c784e8b..e5a8478 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode *
 		dput(new_dentry);
 	}
 	if (!error)
-		d_move(old_dentry,new_dentry);
+		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+			d_move(old_dentry,new_dentry);
 	return error;
 }
 
@@ -2377,7 +2378,7 @@ static int vfs_rename_other(struct inode
 		error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (!error) {
 		/* The following d_move() should become unconditional */
-		if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME))
+		if (!(old_dir->i_sb->s_type->fs_flags & (FS_ODD_RENAME|FS_RENAME_DOES_D_MOVE)))
 			d_move(old_dentry, new_dentry);
 	}
 	if (target)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e04a5cf..8e9a7ca 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -87,6 +87,7 @@ #define SEL_EX		4
 /* public flags for file_system_type */
 #define FS_REQUIRES_DEV 1 
 #define FS_BINARY_MOUNTDATA 2
+#define FS_RENAME_DOES_D_MOVE 4
 #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
 #define FS_ODD_RENAME	32768	/* Temporary stuff; will go away as soon
 				  * as nfs_rename() will be cleaned up

^ permalink raw reply related	[flat|nested] 3+ messages in thread

* Re: [PATCH] Allow file systems to manually d_move() inside of ->rename()
  2006-08-29 21:54 [PATCH] Allow file systems to manually d_move() inside of ->rename() Mark Fasheh
@ 2006-08-30  0:35 ` Trond Myklebust
  2006-08-30  2:17   ` Mark Fasheh
  0 siblings, 1 reply; 3+ messages in thread
From: Trond Myklebust @ 2006-08-30  0:35 UTC (permalink / raw)
  To: Mark Fasheh; +Cc: akpm, linux-kernel, linux-fsdevel, hch, viro

On Tue, 2006-08-29 at 14:54 -0700, Mark Fasheh wrote:
> diff --git a/fs/namei.c b/fs/namei.c
> index c784e8b..e5a8478 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode *
>  		dput(new_dentry);
>  	}
>  	if (!error)
> -		d_move(old_dentry,new_dentry);
> +		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
> +			d_move(old_dentry,new_dentry);
>  	return error;
>  }
>  
> @@ -2377,7 +2378,7 @@ static int vfs_rename_other(struct inode
>  		error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
>  	if (!error) {
>  		/* The following d_move() should become unconditional */
> -		if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME))
> +		if (!(old_dir->i_sb->s_type->fs_flags & (FS_ODD_RENAME|FS_RENAME_DOES_D_MOVE)))
>  			d_move(old_dentry, new_dentry);
>  	}
>  	if (target)
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index e04a5cf..8e9a7ca 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -87,6 +87,7 @@ #define SEL_EX		4
>  /* public flags for file_system_type */
>  #define FS_REQUIRES_DEV 1 
>  #define FS_BINARY_MOUNTDATA 2
> +#define FS_RENAME_DOES_D_MOVE 4
>  #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
>  #define FS_ODD_RENAME	32768	/* Temporary stuff; will go away as soon
>  				  * as nfs_rename() will be cleaned up
> -

Why have 2 synonyms for the FS_ODD_RENAME stuff? Just fix up the NFS
client to do the d_move() unconditionally, and add a check for
FS_ODD_RENAME to vfs_rename_dir().

Cheers,
  Trond


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] Allow file systems to manually d_move() inside of ->rename()
  2006-08-30  0:35 ` Trond Myklebust
@ 2006-08-30  2:17   ` Mark Fasheh
  0 siblings, 0 replies; 3+ messages in thread
From: Mark Fasheh @ 2006-08-30  2:17 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: akpm, linux-kernel, linux-fsdevel, hch, viro, chuck.lever

Hi Trond,

On Tue, Aug 29, 2006 at 08:35:52PM -0400, Trond Myklebust wrote:
> Why have 2 synonyms for the FS_ODD_RENAME stuff? Just fix up the NFS
> client to do the d_move() unconditionally, and add a check for
> FS_ODD_RENAME to vfs_rename_dir().
Though I read through some of the nfs code when developing my ocfs2 patch, I
wasn't sure that approach was desired.

Partly because I'm not experienced enough with nfs internals to have known
whether that was safe to do, and partly because the comment in
vfs_rename_other() seemed to indicate that FS_ODD_RENAME was a temporary
solution for NFS, whereas I was looking for something more permanent.

That all said, how does this look? I took the liberty of renaming the flag
to something a little more descriptive.

During an irc conversation, Chuck pointed out that perhaps a better solution
is to just internalize the d_move() to all file systems. It makes the patch
a bit more sweeping, but I'm willing to handle it if everybody agrees on
that approach.
	--Mark


From: Mark Fasheh <mark.fasheh@oracle.com>

[PATCH] Allow file systems to manually d_move() inside of ->rename()

Some file systems want to manually d_move() the dentries involved in a
rename. We can do this by making use of the FS_ODD_RENAME flag if we just
have nfs_rename() unconditionally do the d_move(). While there, we rename
the flag to be more descriptive.

OCFS2 uses this to protect that part of the rename operation with a cluster
lock.

Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

diff --git a/fs/namei.c b/fs/namei.c
index c784e8b..29418ec 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode *
 		dput(new_dentry);
 	}
 	if (!error)
-		d_move(old_dentry,new_dentry);
+		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
+			d_move(old_dentry,new_dentry);
 	return error;
 }
 
@@ -2376,8 +2377,7 @@ static int vfs_rename_other(struct inode
 	else
 		error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry);
 	if (!error) {
-		/* The following d_move() should become unconditional */
-		if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME))
+		if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE))
 			d_move(old_dentry, new_dentry);
 	}
 	if (target)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index 3ddda6f..8ead2b9 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -1623,8 +1623,7 @@ out:
 	if (rehash)
 		d_rehash(rehash);
 	if (!error) {
-		if (!S_ISDIR(old_inode->i_mode))
-			d_move(old_dentry, new_dentry);
+		d_move(old_dentry, new_dentry);
 		nfs_renew_times(new_dentry);
 		nfs_set_verifier(new_dentry, nfs_save_change_attribute(new_dir));
 	}
diff --git a/fs/nfs/super.c b/fs/nfs/super.c
index e8a9bee..6dd17db 100644
--- a/fs/nfs/super.c
+++ b/fs/nfs/super.c
@@ -120,7 +120,7 @@ static struct file_system_type nfs_fs_ty
 	.name		= "nfs",
 	.get_sb		= nfs_get_sb,
 	.kill_sb	= nfs_kill_super,
-	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags	= FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };
 
 struct file_system_type clone_nfs_fs_type = {
@@ -128,7 +128,7 @@ struct file_system_type clone_nfs_fs_typ
 	.name		= "nfs",
 	.get_sb		= nfs_clone_nfs_sb,
 	.kill_sb	= nfs_kill_super,
-	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags	= FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };
 
 static struct super_operations nfs_sops = {
@@ -156,7 +156,7 @@ static struct file_system_type nfs4_fs_t
 	.name		= "nfs4",
 	.get_sb		= nfs4_get_sb,
 	.kill_sb	= nfs4_kill_super,
-	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags	= FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };
 
 struct file_system_type clone_nfs4_fs_type = {
@@ -164,7 +164,7 @@ struct file_system_type clone_nfs4_fs_ty
 	.name		= "nfs4",
 	.get_sb		= nfs_clone_nfs4_sb,
 	.kill_sb	= nfs4_kill_super,
-	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags	= FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };
 
 struct file_system_type nfs_referral_nfs4_fs_type = {
@@ -172,7 +172,7 @@ struct file_system_type nfs_referral_nfs
 	.name		= "nfs4",
 	.get_sb		= nfs_referral_nfs4_sb,
 	.kill_sb	= nfs4_kill_super,
-	.fs_flags	= FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
+	.fs_flags	= FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA,
 };
 
 static struct super_operations nfs4_sops = {
diff --git a/include/linux/fs.h b/include/linux/fs.h
index e04a5cf..7b5f889 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -88,9 +88,10 @@ #define SEL_EX		4
 #define FS_REQUIRES_DEV 1 
 #define FS_BINARY_MOUNTDATA 2
 #define FS_REVAL_DOT	16384	/* Check the paths ".", ".." for staleness */
-#define FS_ODD_RENAME	32768	/* Temporary stuff; will go away as soon
-				  * as nfs_rename() will be cleaned up
-				  */
+#define FS_RENAME_DOES_D_MOVE	32768	/* FS will handle d_move()
+					 * during rename() internally.
+					 */
+
 /*
  * These are the fs-independent mount-flags: up to 32 flags are supported
  */

^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2006-08-30  2:17 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-08-29 21:54 [PATCH] Allow file systems to manually d_move() inside of ->rename() Mark Fasheh
2006-08-30  0:35 ` Trond Myklebust
2006-08-30  2:17   ` Mark Fasheh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).