* [PATCH] Allow file systems to manually d_move() inside of ->rename() @ 2006-08-29 21:54 Mark Fasheh 2006-08-30 0:35 ` Trond Myklebust 0 siblings, 1 reply; 3+ messages in thread From: Mark Fasheh @ 2006-08-29 21:54 UTC (permalink / raw) To: akpm, linux-kernel; +Cc: linux-fsdevel, hch, viro I'm currently working on removing the "dentry" vote from ocfs2. This is a network message broadcasted to all mounted nodes on every unlink() or rename(). Upon recieving the message, those nodes d_delete() the dentry in question. What I'm doing is replacing the broadcast mechanism with a cluster lock which covers a set of dentries. Every node gets a read only lock, until an unlink during which the unlinking node, will request an exclusive lock, forcing the other nodes who care about that dentry to d_delete() it. The effect is that we retain a very lightweight ->d_revalidate(), and at the same time get to make large improvements to the average case performance of the ocfs2 unlink and rename operations. I have uncovered a very small race with rename and d_move() however. The d_move() for a rename is after the ->rename() callback, outside the parent directory cluster locks which normally protect the name creation/removal mechanism. The worry is that after ocfs2_rename(), but before the d_move() a different node can discover the new name (created during the rename) and unlink it, forcing a d_delete(). d_move() it seems, unconditionally rehashes the (renamed) dentry and so if it's done after a d_delete() we could get some inconsitent state amongst the nodes. My solution is to simply allow ocfs2 to do the d_move() inside of ocfs2_rename() by introducing a FS_RENAME_DOES_D_MOVE flag. OCFS2 won't actually change how or why d_move() is called during a rename, it just uses this flag to change where the call is made. For any interested parties, a preliminary ocfs2 patch is at http://oss.oracle.com/~mfasheh/vote_removal/remove_dentry_vote_wip.patch The interesting stuff is mostly in fs/ocfs2/dcache.[ch] The ocfs2 patch isn't posted via e-mail because it's still a work in progress. That said, it actually works quite well, and the only things I have left to do are unrelated to rename - the patch needs to be split up into smaller ones, and a pair of (minor) dlm related fixups are needed. --Mark From: Mark Fasheh <mark.fasheh@oracle.com> [PATCH] Allow file systems to manually d_move() inside of ->rename() Some file systems want to be able to manually d_move() the dentries involved in a rename. Introduce a flag which instructs vfs_rename_dir() and vfs_rename_other() that it has already been handled internally. OCFS2 uses this to protect that part of the entire operation with a cluster lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> diff --git a/fs/namei.c b/fs/namei.c index c784e8b..e5a8478 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode * dput(new_dentry); } if (!error) - d_move(old_dentry,new_dentry); + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) + d_move(old_dentry,new_dentry); return error; } @@ -2377,7 +2378,7 @@ static int vfs_rename_other(struct inode error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry); if (!error) { /* The following d_move() should become unconditional */ - if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) + if (!(old_dir->i_sb->s_type->fs_flags & (FS_ODD_RENAME|FS_RENAME_DOES_D_MOVE))) d_move(old_dentry, new_dentry); } if (target) diff --git a/include/linux/fs.h b/include/linux/fs.h index e04a5cf..8e9a7ca 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -87,6 +87,7 @@ #define SEL_EX 4 /* public flags for file_system_type */ #define FS_REQUIRES_DEV 1 #define FS_BINARY_MOUNTDATA 2 +#define FS_RENAME_DOES_D_MOVE 4 #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */ #define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon * as nfs_rename() will be cleaned up ^ permalink raw reply related [flat|nested] 3+ messages in thread
* Re: [PATCH] Allow file systems to manually d_move() inside of ->rename() 2006-08-29 21:54 [PATCH] Allow file systems to manually d_move() inside of ->rename() Mark Fasheh @ 2006-08-30 0:35 ` Trond Myklebust 2006-08-30 2:17 ` Mark Fasheh 0 siblings, 1 reply; 3+ messages in thread From: Trond Myklebust @ 2006-08-30 0:35 UTC (permalink / raw) To: Mark Fasheh; +Cc: akpm, linux-kernel, linux-fsdevel, hch, viro On Tue, 2006-08-29 at 14:54 -0700, Mark Fasheh wrote: > diff --git a/fs/namei.c b/fs/namei.c > index c784e8b..e5a8478 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode * > dput(new_dentry); > } > if (!error) > - d_move(old_dentry,new_dentry); > + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) > + d_move(old_dentry,new_dentry); > return error; > } > > @@ -2377,7 +2378,7 @@ static int vfs_rename_other(struct inode > error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry); > if (!error) { > /* The following d_move() should become unconditional */ > - if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) > + if (!(old_dir->i_sb->s_type->fs_flags & (FS_ODD_RENAME|FS_RENAME_DOES_D_MOVE))) > d_move(old_dentry, new_dentry); > } > if (target) > diff --git a/include/linux/fs.h b/include/linux/fs.h > index e04a5cf..8e9a7ca 100644 > --- a/include/linux/fs.h > +++ b/include/linux/fs.h > @@ -87,6 +87,7 @@ #define SEL_EX 4 > /* public flags for file_system_type */ > #define FS_REQUIRES_DEV 1 > #define FS_BINARY_MOUNTDATA 2 > +#define FS_RENAME_DOES_D_MOVE 4 > #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */ > #define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon > * as nfs_rename() will be cleaned up > - Why have 2 synonyms for the FS_ODD_RENAME stuff? Just fix up the NFS client to do the d_move() unconditionally, and add a check for FS_ODD_RENAME to vfs_rename_dir(). Cheers, Trond ^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: [PATCH] Allow file systems to manually d_move() inside of ->rename() 2006-08-30 0:35 ` Trond Myklebust @ 2006-08-30 2:17 ` Mark Fasheh 0 siblings, 0 replies; 3+ messages in thread From: Mark Fasheh @ 2006-08-30 2:17 UTC (permalink / raw) To: Trond Myklebust; +Cc: akpm, linux-kernel, linux-fsdevel, hch, viro, chuck.lever Hi Trond, On Tue, Aug 29, 2006 at 08:35:52PM -0400, Trond Myklebust wrote: > Why have 2 synonyms for the FS_ODD_RENAME stuff? Just fix up the NFS > client to do the d_move() unconditionally, and add a check for > FS_ODD_RENAME to vfs_rename_dir(). Though I read through some of the nfs code when developing my ocfs2 patch, I wasn't sure that approach was desired. Partly because I'm not experienced enough with nfs internals to have known whether that was safe to do, and partly because the comment in vfs_rename_other() seemed to indicate that FS_ODD_RENAME was a temporary solution for NFS, whereas I was looking for something more permanent. That all said, how does this look? I took the liberty of renaming the flag to something a little more descriptive. During an irc conversation, Chuck pointed out that perhaps a better solution is to just internalize the d_move() to all file systems. It makes the patch a bit more sweeping, but I'm willing to handle it if everybody agrees on that approach. --Mark From: Mark Fasheh <mark.fasheh@oracle.com> [PATCH] Allow file systems to manually d_move() inside of ->rename() Some file systems want to manually d_move() the dentries involved in a rename. We can do this by making use of the FS_ODD_RENAME flag if we just have nfs_rename() unconditionally do the d_move(). While there, we rename the flag to be more descriptive. OCFS2 uses this to protect that part of the rename operation with a cluster lock. Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com> diff --git a/fs/namei.c b/fs/namei.c index c784e8b..29418ec 100644 --- a/fs/namei.c +++ b/fs/namei.c @@ -2353,7 +2353,8 @@ static int vfs_rename_dir(struct inode * dput(new_dentry); } if (!error) - d_move(old_dentry,new_dentry); + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) + d_move(old_dentry,new_dentry); return error; } @@ -2376,8 +2377,7 @@ static int vfs_rename_other(struct inode else error = old_dir->i_op->rename(old_dir, old_dentry, new_dir, new_dentry); if (!error) { - /* The following d_move() should become unconditional */ - if (!(old_dir->i_sb->s_type->fs_flags & FS_ODD_RENAME)) + if (!(old_dir->i_sb->s_type->fs_flags & FS_RENAME_DOES_D_MOVE)) d_move(old_dentry, new_dentry); } if (target) diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index 3ddda6f..8ead2b9 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1623,8 +1623,7 @@ out: if (rehash) d_rehash(rehash); if (!error) { - if (!S_ISDIR(old_inode->i_mode)) - d_move(old_dentry, new_dentry); + d_move(old_dentry, new_dentry); nfs_renew_times(new_dentry); nfs_set_verifier(new_dentry, nfs_save_change_attribute(new_dir)); } diff --git a/fs/nfs/super.c b/fs/nfs/super.c index e8a9bee..6dd17db 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -120,7 +120,7 @@ static struct file_system_type nfs_fs_ty .name = "nfs", .get_sb = nfs_get_sb, .kill_sb = nfs_kill_super, - .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, }; struct file_system_type clone_nfs_fs_type = { @@ -128,7 +128,7 @@ struct file_system_type clone_nfs_fs_typ .name = "nfs", .get_sb = nfs_clone_nfs_sb, .kill_sb = nfs_kill_super, - .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, }; static struct super_operations nfs_sops = { @@ -156,7 +156,7 @@ static struct file_system_type nfs4_fs_t .name = "nfs4", .get_sb = nfs4_get_sb, .kill_sb = nfs4_kill_super, - .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, }; struct file_system_type clone_nfs4_fs_type = { @@ -164,7 +164,7 @@ struct file_system_type clone_nfs4_fs_ty .name = "nfs4", .get_sb = nfs_clone_nfs4_sb, .kill_sb = nfs4_kill_super, - .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, }; struct file_system_type nfs_referral_nfs4_fs_type = { @@ -172,7 +172,7 @@ struct file_system_type nfs_referral_nfs .name = "nfs4", .get_sb = nfs_referral_nfs4_sb, .kill_sb = nfs4_kill_super, - .fs_flags = FS_ODD_RENAME|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, + .fs_flags = FS_RENAME_DOES_D_MOVE|FS_REVAL_DOT|FS_BINARY_MOUNTDATA, }; static struct super_operations nfs4_sops = { diff --git a/include/linux/fs.h b/include/linux/fs.h index e04a5cf..7b5f889 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -88,9 +88,10 @@ #define SEL_EX 4 #define FS_REQUIRES_DEV 1 #define FS_BINARY_MOUNTDATA 2 #define FS_REVAL_DOT 16384 /* Check the paths ".", ".." for staleness */ -#define FS_ODD_RENAME 32768 /* Temporary stuff; will go away as soon - * as nfs_rename() will be cleaned up - */ +#define FS_RENAME_DOES_D_MOVE 32768 /* FS will handle d_move() + * during rename() internally. + */ + /* * These are the fs-independent mount-flags: up to 32 flags are supported */ ^ permalink raw reply related [flat|nested] 3+ messages in thread
end of thread, other threads:[~2006-08-30 2:17 UTC | newest] Thread overview: 3+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2006-08-29 21:54 [PATCH] Allow file systems to manually d_move() inside of ->rename() Mark Fasheh 2006-08-30 0:35 ` Trond Myklebust 2006-08-30 2:17 ` Mark Fasheh
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).