Linux NFS development
 help / color / mirror / Atom feed
* [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
@ 2025-04-27 23:01 trondmy
  2025-04-28  1:53 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: trondmy @ 2025-04-27 23:01 UTC (permalink / raw)
  To: Anna Schumaker; +Cc: linux-nfs

From: Trond Myklebust <trond.myklebust@hammerspace.com>

The Linux client assumes that all filehandles are non-volatile for
renames within the same directory (otherwise sillyrename cannot work).
However, the existence of the Linux 'subtree_check' export option has
meant that nfs_rename() has always assumed it needs to flush writes
before attempting to rename.

Since NFSv4 does allow the client to query whether or not the server
exhibits this behaviour, and since knfsd does actually set the
appropriate flag when 'subtree_check' is enabled on an export, it
should be OK to optimise away the write flushing behaviour in the cases
where it is clearly not needed.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
---
 fs/nfs/client.c           |  2 ++
 fs/nfs/dir.c              | 15 ++++++++++++++-
 include/linux/nfs_fs_sb.h |  6 ++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 2115c1189c2d..6d63b958c4bb 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct fs_context *fc)
 		if (server->namelen == 0 || server->namelen > NFS2_MAXNAMLEN)
 			server->namelen = NFS2_MAXNAMLEN;
 	}
+	/* Linux 'subtree_check' borkenness mandates this setting */
+	server->fh_expire_type = NFS_FH_VOL_RENAME;
 
 	if (!(fattr->valid & NFS_ATTR_FATTR)) {
 		error = ctx->nfs_mod->rpc_ops->getattr(server, ctx->mntfh,
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
index bd23fc736b39..d0e0b435a843 100644
--- a/fs/nfs/dir.c
+++ b/fs/nfs/dir.c
@@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task, struct nfs_renamedata *data)
 	unblock_revalidate(new_dentry);
 }
 
+static bool nfs_rename_is_unsafe_cross_dir(struct dentry *old_dentry,
+					   struct dentry *new_dentry)
+{
+	struct nfs_server *server = NFS_SB(old_dentry->d_sb);
+
+	if (old_dentry->d_parent != new_dentry->d_parent)
+		return false;
+	if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
+		return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
+	return true;
+}
+
 /*
  * RENAME
  * FIXME: Some nfsds, like the Linux user space nfsd, may generate a
@@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
 
 	}
 
-	if (S_ISREG(old_inode->i_mode))
+	if (S_ISREG(old_inode->i_mode) &&
+	    nfs_rename_is_unsafe_cross_dir(old_dentry, new_dentry))
 		nfs_sync_inode(old_inode);
 	task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry,
 				must_unblock ? nfs_unblock_rename : NULL);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 71319637a84e..6732c7e04d19 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -236,6 +236,12 @@ struct nfs_server {
 	u32			acl_bitmask;	/* V4 bitmask representing the ACEs
 						   that are supported on this
 						   filesystem */
+	/* The following #defines numerically match the NFSv4 equivalents */
+#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1)
+#define NFS_FH_VOLATILE_ANY (0x2)
+#define NFS_FH_VOL_MIGRATION (0x4)
+#define NFS_FH_VOL_RENAME (0x8)
+#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY | NFS_FH_VOL_RENAME)
 	u32			fh_expire_type;	/* V4 bitmask representing file
 						   handle volatility type for
 						   this filesystem */
-- 
2.49.0


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
  2025-04-27 23:01 [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename() trondmy
@ 2025-04-28  1:53 ` kernel test robot
  2025-04-28  2:35 ` kernel test robot
  2025-04-29 16:14 ` Chuck Lever
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-04-28  1:53 UTC (permalink / raw)
  To: trondmy, Anna Schumaker; +Cc: llvm, oe-kbuild-all, linux-nfs

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on linus/master v6.15-rc3 next-20250424]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/trondmy-kernel-org/NFS-Avoid-flushing-data-while-holding-directory-locks-in-nfs_rename/20250428-070327
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link:    https://lore.kernel.org/r/a804c54445a3f028007763e05285d765afcab0f9.1745794273.git.trond.myklebust%40hammerspace.com
patch subject: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
config: i386-buildonly-randconfig-004-20250428 (https://download.01.org/0day-ci/archive/20250428/202504280902.aKsSfRfW-lkp@intel.com/config)
compiler: clang version 20.1.2 (https://github.com/llvm/llvm-project 58df0ef89dd64126512e4ee27b4ac3fd8ddf6247)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250428/202504280902.aKsSfRfW-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504280902.aKsSfRfW-lkp@intel.com/

All errors (new ones prefixed by >>):

>> fs/nfs/client.c:1109:10: error: no member named 'fh_expire_type' in 'struct nfs_server'
    1109 |         server->fh_expire_type = NFS_FH_VOL_RENAME;
         |         ~~~~~~  ^
>> fs/nfs/client.c:1109:27: error: use of undeclared identifier 'NFS_FH_VOL_RENAME'
    1109 |         server->fh_expire_type = NFS_FH_VOL_RENAME;
         |                                  ^
   2 errors generated.
--
>> fs/nfs/dir.c:2686:14: error: no member named 'fh_expire_type' in 'struct nfs_server'
    2686 |         if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
         |             ~~~~~~  ^
>> fs/nfs/dir.c:2686:31: error: use of undeclared identifier 'NFS_FH_RENAME_UNSAFE'
    2686 |         if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
         |                                      ^
   fs/nfs/dir.c:2687:20: error: no member named 'fh_expire_type' in 'struct nfs_server'
    2687 |                 return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
         |                          ~~~~~~  ^
>> fs/nfs/dir.c:2687:37: error: use of undeclared identifier 'NFS_FH_NOEXPIRE_WITH_OPEN'
    2687 |                 return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
         |                                                   ^
   4 errors generated.


vim +1109 fs/nfs/client.c

  1067	
  1068	/*
  1069	 * Create a version 2 or 3 volume record
  1070	 * - keyed on server and FSID
  1071	 */
  1072	struct nfs_server *nfs_create_server(struct fs_context *fc)
  1073	{
  1074		struct nfs_fs_context *ctx = nfs_fc2context(fc);
  1075		struct nfs_server *server;
  1076		struct nfs_fattr *fattr;
  1077		int error;
  1078	
  1079		server = nfs_alloc_server();
  1080		if (!server)
  1081			return ERR_PTR(-ENOMEM);
  1082	
  1083		server->cred = get_cred(fc->cred);
  1084	
  1085		error = -ENOMEM;
  1086		fattr = nfs_alloc_fattr();
  1087		if (fattr == NULL)
  1088			goto error;
  1089	
  1090		/* Get a client representation */
  1091		error = nfs_init_server(server, fc);
  1092		if (error < 0)
  1093			goto error;
  1094	
  1095		/* Probe the root fh to retrieve its FSID */
  1096		error = nfs_probe_fsinfo(server, ctx->mntfh, fattr);
  1097		if (error < 0)
  1098			goto error;
  1099		if (server->nfs_client->rpc_ops->version == 3) {
  1100			if (server->namelen == 0 || server->namelen > NFS3_MAXNAMLEN)
  1101				server->namelen = NFS3_MAXNAMLEN;
  1102			if (!(ctx->flags & NFS_MOUNT_NORDIRPLUS))
  1103				server->caps |= NFS_CAP_READDIRPLUS;
  1104		} else {
  1105			if (server->namelen == 0 || server->namelen > NFS2_MAXNAMLEN)
  1106				server->namelen = NFS2_MAXNAMLEN;
  1107		}
  1108		/* Linux 'subtree_check' borkenness mandates this setting */
> 1109		server->fh_expire_type = NFS_FH_VOL_RENAME;
  1110	
  1111		if (!(fattr->valid & NFS_ATTR_FATTR)) {
  1112			error = ctx->nfs_mod->rpc_ops->getattr(server, ctx->mntfh,
  1113							       fattr, NULL);
  1114			if (error < 0) {
  1115				dprintk("nfs_create_server: getattr error = %d\n", -error);
  1116				goto error;
  1117			}
  1118		}
  1119		memcpy(&server->fsid, &fattr->fsid, sizeof(server->fsid));
  1120	
  1121		dprintk("Server FSID: %llx:%llx\n",
  1122			(unsigned long long) server->fsid.major,
  1123			(unsigned long long) server->fsid.minor);
  1124	
  1125		nfs_server_insert_lists(server);
  1126		server->mount_time = jiffies;
  1127		nfs_free_fattr(fattr);
  1128		return server;
  1129	
  1130	error:
  1131		nfs_free_fattr(fattr);
  1132		nfs_free_server(server);
  1133		return ERR_PTR(error);
  1134	}
  1135	EXPORT_SYMBOL_GPL(nfs_create_server);
  1136	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
  2025-04-27 23:01 [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename() trondmy
  2025-04-28  1:53 ` kernel test robot
@ 2025-04-28  2:35 ` kernel test robot
  2025-04-29 16:14 ` Chuck Lever
  2 siblings, 0 replies; 7+ messages in thread
From: kernel test robot @ 2025-04-28  2:35 UTC (permalink / raw)
  To: trondmy, Anna Schumaker; +Cc: oe-kbuild-all, linux-nfs

Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on trondmy-nfs/linux-next]
[also build test ERROR on linus/master v6.15-rc3 next-20250424]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/trondmy-kernel-org/NFS-Avoid-flushing-data-while-holding-directory-locks-in-nfs_rename/20250428-070327
base:   git://git.linux-nfs.org/projects/trondmy/linux-nfs.git linux-next
patch link:    https://lore.kernel.org/r/a804c54445a3f028007763e05285d765afcab0f9.1745794273.git.trond.myklebust%40hammerspace.com
patch subject: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
config: sparc-randconfig-002-20250428 (https://download.01.org/0day-ci/archive/20250428/202504281017.yCgdaZD3-lkp@intel.com/config)
compiler: sparc-linux-gcc (GCC) 7.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250428/202504281017.yCgdaZD3-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202504281017.yCgdaZD3-lkp@intel.com/

All errors (new ones prefixed by >>):

   fs/nfs/client.c: In function 'nfs_create_server':
>> fs/nfs/client.c:1109:8: error: 'struct nfs_server' has no member named 'fh_expire_type'
     server->fh_expire_type = NFS_FH_VOL_RENAME;
           ^~
>> fs/nfs/client.c:1109:27: error: 'NFS_FH_VOL_RENAME' undeclared (first use in this function); did you mean 'NFS4_FH_VOL_RENAME'?
     server->fh_expire_type = NFS_FH_VOL_RENAME;
                              ^~~~~~~~~~~~~~~~~
                              NFS4_FH_VOL_RENAME
   fs/nfs/client.c:1109:27: note: each undeclared identifier is reported only once for each function it appears in
--
   fs/nfs/dir.c: In function 'nfs_rename_is_unsafe_cross_dir':
>> fs/nfs/dir.c:2686:12: error: 'struct nfs_server' has no member named 'fh_expire_type'
     if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
               ^~
>> fs/nfs/dir.c:2686:31: error: 'NFS_FH_RENAME_UNSAFE' undeclared (first use in this function); did you mean 'TASK_FREEZABLE_UNSAFE'?
     if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
                                  ^~~~~~~~~~~~~~~~~~~~
                                  TASK_FREEZABLE_UNSAFE
   fs/nfs/dir.c:2686:31: note: each undeclared identifier is reported only once for each function it appears in
   fs/nfs/dir.c:2687:18: error: 'struct nfs_server' has no member named 'fh_expire_type'
      return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
                     ^~
>> fs/nfs/dir.c:2687:37: error: 'NFS_FH_NOEXPIRE_WITH_OPEN' undeclared (first use in this function); did you mean 'NFS4_FH_NOEXPIRE_WITH_OPEN'?
      return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
                                        ^~~~~~~~~~~~~~~~~~~~~~~~~
                                        NFS4_FH_NOEXPIRE_WITH_OPEN


vim +1109 fs/nfs/client.c

  1067	
  1068	/*
  1069	 * Create a version 2 or 3 volume record
  1070	 * - keyed on server and FSID
  1071	 */
  1072	struct nfs_server *nfs_create_server(struct fs_context *fc)
  1073	{
  1074		struct nfs_fs_context *ctx = nfs_fc2context(fc);
  1075		struct nfs_server *server;
  1076		struct nfs_fattr *fattr;
  1077		int error;
  1078	
  1079		server = nfs_alloc_server();
  1080		if (!server)
  1081			return ERR_PTR(-ENOMEM);
  1082	
  1083		server->cred = get_cred(fc->cred);
  1084	
  1085		error = -ENOMEM;
  1086		fattr = nfs_alloc_fattr();
  1087		if (fattr == NULL)
  1088			goto error;
  1089	
  1090		/* Get a client representation */
  1091		error = nfs_init_server(server, fc);
  1092		if (error < 0)
  1093			goto error;
  1094	
  1095		/* Probe the root fh to retrieve its FSID */
  1096		error = nfs_probe_fsinfo(server, ctx->mntfh, fattr);
  1097		if (error < 0)
  1098			goto error;
  1099		if (server->nfs_client->rpc_ops->version == 3) {
  1100			if (server->namelen == 0 || server->namelen > NFS3_MAXNAMLEN)
  1101				server->namelen = NFS3_MAXNAMLEN;
  1102			if (!(ctx->flags & NFS_MOUNT_NORDIRPLUS))
  1103				server->caps |= NFS_CAP_READDIRPLUS;
  1104		} else {
  1105			if (server->namelen == 0 || server->namelen > NFS2_MAXNAMLEN)
  1106				server->namelen = NFS2_MAXNAMLEN;
  1107		}
  1108		/* Linux 'subtree_check' borkenness mandates this setting */
> 1109		server->fh_expire_type = NFS_FH_VOL_RENAME;
  1110	
  1111		if (!(fattr->valid & NFS_ATTR_FATTR)) {
  1112			error = ctx->nfs_mod->rpc_ops->getattr(server, ctx->mntfh,
  1113							       fattr, NULL);
  1114			if (error < 0) {
  1115				dprintk("nfs_create_server: getattr error = %d\n", -error);
  1116				goto error;
  1117			}
  1118		}
  1119		memcpy(&server->fsid, &fattr->fsid, sizeof(server->fsid));
  1120	
  1121		dprintk("Server FSID: %llx:%llx\n",
  1122			(unsigned long long) server->fsid.major,
  1123			(unsigned long long) server->fsid.minor);
  1124	
  1125		nfs_server_insert_lists(server);
  1126		server->mount_time = jiffies;
  1127		nfs_free_fattr(fattr);
  1128		return server;
  1129	
  1130	error:
  1131		nfs_free_fattr(fattr);
  1132		nfs_free_server(server);
  1133		return ERR_PTR(error);
  1134	}
  1135	EXPORT_SYMBOL_GPL(nfs_create_server);
  1136	

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
  2025-04-27 23:01 [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename() trondmy
  2025-04-28  1:53 ` kernel test robot
  2025-04-28  2:35 ` kernel test robot
@ 2025-04-29 16:14 ` Chuck Lever
  2025-04-29 16:54   ` Jeff Layton
  2025-04-29 23:22   ` Trond Myklebust
  2 siblings, 2 replies; 7+ messages in thread
From: Chuck Lever @ 2025-04-29 16:14 UTC (permalink / raw)
  To: trondmy, Anna Schumaker; +Cc: linux-nfs

On 4/27/25 7:01 PM, trondmy@kernel.org wrote:
> From: Trond Myklebust <trond.myklebust@hammerspace.com>
> 
> The Linux client assumes that all filehandles are non-volatile for
> renames within the same directory (otherwise sillyrename cannot work).
> However, the existence of the Linux 'subtree_check' export option has
> meant that nfs_rename() has always assumed it needs to flush writes
> before attempting to rename.
> 
> Since NFSv4 does allow the client to query whether or not the server
> exhibits this behaviour, and since knfsd does actually set the
> appropriate flag when 'subtree_check' is enabled on an export, it
> should be OK to optimise away the write flushing behaviour in the cases
> where it is clearly not needed.

No objection to the high level goal, seems like a sensible change.

So NFSv3 still has the flushing requirement, but NFSv4 can disable that
requirement as long as the server in question asserts FH4_VOLATILE_ANY
and FH4_VOL_RENAME, correct?

I'm wondering how confident we are that other server implementations
behave reasonably. Yes, they are broken if they don't behave, but there
is still risk of introducing a regression.


> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> ---
>  fs/nfs/client.c           |  2 ++
>  fs/nfs/dir.c              | 15 ++++++++++++++-
>  include/linux/nfs_fs_sb.h |  6 ++++++
>  3 files changed, 22 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 2115c1189c2d..6d63b958c4bb 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct fs_context *fc)
>  		if (server->namelen == 0 || server->namelen > NFS2_MAXNAMLEN)
>  			server->namelen = NFS2_MAXNAMLEN;
>  	}
> +	/* Linux 'subtree_check' borkenness mandates this setting */

Assuming you are going to respin this patch to deal with the build bot
warnings, can you make this comment more useful? "borkenness" sounds
like you are dealing with a server bug here, but that's not really
the case. subtree_check is working as designed: it requires the extra
flushing. The no_subtree_check case does not, IIUC.

It would be better to explain this need strictly in terms of file handle
volatility, no?


> +	server->fh_expire_type = NFS_FH_VOL_RENAME;
>  
>  	if (!(fattr->valid & NFS_ATTR_FATTR)) {
>  		error = ctx->nfs_mod->rpc_ops->getattr(server, ctx->mntfh,
> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> index bd23fc736b39..d0e0b435a843 100644
> --- a/fs/nfs/dir.c
> +++ b/fs/nfs/dir.c
> @@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task, struct nfs_renamedata *data)
>  	unblock_revalidate(new_dentry);
>  }
>  
> +static bool nfs_rename_is_unsafe_cross_dir(struct dentry *old_dentry,
> +					   struct dentry *new_dentry)
> +{
> +	struct nfs_server *server = NFS_SB(old_dentry->d_sb);
> +
> +	if (old_dentry->d_parent != new_dentry->d_parent)
> +		return false;
> +	if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
> +		return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
> +	return true;
> +}
> +
>  /*
>   * RENAME
>   * FIXME: Some nfsds, like the Linux user space nfsd, may generate a
> @@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
>  
>  	}
>  
> -	if (S_ISREG(old_inode->i_mode))
> +	if (S_ISREG(old_inode->i_mode) &&
> +	    nfs_rename_is_unsafe_cross_dir(old_dentry, new_dentry))
>  		nfs_sync_inode(old_inode);
>  	task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry,
>  				must_unblock ? nfs_unblock_rename : NULL);
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 71319637a84e..6732c7e04d19 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -236,6 +236,12 @@ struct nfs_server {
>  	u32			acl_bitmask;	/* V4 bitmask representing the ACEs
>  						   that are supported on this
>  						   filesystem */
> +	/* The following #defines numerically match the NFSv4 equivalents */
> +#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1)
> +#define NFS_FH_VOLATILE_ANY (0x2)
> +#define NFS_FH_VOL_MIGRATION (0x4)
> +#define NFS_FH_VOL_RENAME (0x8)
> +#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY | NFS_FH_VOL_RENAME)
>  	u32			fh_expire_type;	/* V4 bitmask representing file
>  						   handle volatility type for
>  						   this filesystem */


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
  2025-04-29 16:14 ` Chuck Lever
@ 2025-04-29 16:54   ` Jeff Layton
  2025-04-29 23:22   ` Trond Myklebust
  1 sibling, 0 replies; 7+ messages in thread
From: Jeff Layton @ 2025-04-29 16:54 UTC (permalink / raw)
  To: Chuck Lever, trondmy, Anna Schumaker; +Cc: linux-nfs

On Tue, 2025-04-29 at 12:14 -0400, Chuck Lever wrote:
> On 4/27/25 7:01 PM, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > The Linux client assumes that all filehandles are non-volatile for
> > renames within the same directory (otherwise sillyrename cannot work).
> > However, the existence of the Linux 'subtree_check' export option has
> > meant that nfs_rename() has always assumed it needs to flush writes
> > before attempting to rename.
> > 
> > Since NFSv4 does allow the client to query whether or not the server
> > exhibits this behaviour, and since knfsd does actually set the
> > appropriate flag when 'subtree_check' is enabled on an export, it
> > should be OK to optimise away the write flushing behaviour in the cases
> > where it is clearly not needed.
> 
> No objection to the high level goal, seems like a sensible change.
> 
> So NFSv3 still has the flushing requirement, but NFSv4 can disable that
> requirement as long as the server in question asserts FH4_VOLATILE_ANY
> and FH4_VOL_RENAME, correct?
> 

(Note that there is a v2 patch that was posted soon after this)

It's actually the reverse: if the server doesn't assert either
FH4_VOL_RENAME or FH4_VOLATILE_ANY, then you're able to disable that
requirement.

> I'm wondering how confident we are that other server implementations
> behave reasonably. Yes, they are broken if they don't behave, but there
> is still risk of introducing a regression.

> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/client.c           |  2 ++
> >  fs/nfs/dir.c              | 15 ++++++++++++++-
> >  include/linux/nfs_fs_sb.h |  6 ++++++
> >  3 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index 2115c1189c2d..6d63b958c4bb 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct fs_context *fc)
> >  		if (server->namelen == 0 || server->namelen > NFS2_MAXNAMLEN)
> >  			server->namelen = NFS2_MAXNAMLEN;
> >  	}
> > +	/* Linux 'subtree_check' borkenness mandates this setting */
> 
> Assuming you are going to respin this patch to deal with the build bot
> warnings, can you make this comment more useful? "borkenness" sounds
> like you are dealing with a server bug here, but that's not really
> the case. subtree_check is working as designed: it requires the extra
> flushing. The no_subtree_check case does not, IIUC.
> 
> It would be better to explain this need strictly in terms of file handle
> volatility, no?
> 
> 
> > +	server->fh_expire_type = NFS_FH_VOL_RENAME;
> >  
> >  	if (!(fattr->valid & NFS_ATTR_FATTR)) {
> >  		error = ctx->nfs_mod->rpc_ops->getattr(server, ctx->mntfh,
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index bd23fc736b39..d0e0b435a843 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task, struct nfs_renamedata *data)
> >  	unblock_revalidate(new_dentry);
> >  }
> >  
> > +static bool nfs_rename_is_unsafe_cross_dir(struct dentry *old_dentry,
> > +					   struct dentry *new_dentry)
> > +{
> > +	struct nfs_server *server = NFS_SB(old_dentry->d_sb);
> > +
> > +	if (old_dentry->d_parent != new_dentry->d_parent)
> > +		return false;
> > +	if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
> > +		return !(server->fh_expire_type & NFS_FH_NOEXPIRE_WITH_OPEN);
> > +	return true;
> > +}
> > +
> >  /*
> >   * RENAME
> >   * FIXME: Some nfsds, like the Linux user space nfsd, may generate a
> > @@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir,
> >  
> >  	}
> >  
> > -	if (S_ISREG(old_inode->i_mode))
> > +	if (S_ISREG(old_inode->i_mode) &&
> > +	    nfs_rename_is_unsafe_cross_dir(old_dentry, new_dentry))
> >  		nfs_sync_inode(old_inode);
> >  	task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry,
> >  				must_unblock ? nfs_unblock_rename : NULL);
> > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > index 71319637a84e..6732c7e04d19 100644
> > --- a/include/linux/nfs_fs_sb.h
> > +++ b/include/linux/nfs_fs_sb.h
> > @@ -236,6 +236,12 @@ struct nfs_server {
> >  	u32			acl_bitmask;	/* V4 bitmask representing the ACEs
> >  						   that are supported on this
> >  						   filesystem */
> > +	/* The following #defines numerically match the NFSv4 equivalents */
> > +#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1)
> > +#define NFS_FH_VOLATILE_ANY (0x2)
> > +#define NFS_FH_VOL_MIGRATION (0x4)
> > +#define NFS_FH_VOL_RENAME (0x8)
> > +#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY | NFS_FH_VOL_RENAME)
> >  	u32			fh_expire_type;	/* V4 bitmask representing file
> >  						   handle volatility type for
> >  						   this filesystem */
> 
> 

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
  2025-04-29 16:14 ` Chuck Lever
  2025-04-29 16:54   ` Jeff Layton
@ 2025-04-29 23:22   ` Trond Myklebust
  2025-04-30 13:28     ` Chuck Lever
  1 sibling, 1 reply; 7+ messages in thread
From: Trond Myklebust @ 2025-04-29 23:22 UTC (permalink / raw)
  To: anna@kernel.org, chuck.lever@oracle.com; +Cc: linux-nfs@vger.kernel.org

On Tue, 2025-04-29 at 12:14 -0400, Chuck Lever wrote:
> On 4/27/25 7:01 PM, trondmy@kernel.org wrote:
> > From: Trond Myklebust <trond.myklebust@hammerspace.com>
> > 
> > The Linux client assumes that all filehandles are non-volatile for
> > renames within the same directory (otherwise sillyrename cannot
> > work).
> > However, the existence of the Linux 'subtree_check' export option
> > has
> > meant that nfs_rename() has always assumed it needs to flush writes
> > before attempting to rename.
> > 
> > Since NFSv4 does allow the client to query whether or not the
> > server
> > exhibits this behaviour, and since knfsd does actually set the
> > appropriate flag when 'subtree_check' is enabled on an export, it
> > should be OK to optimise away the write flushing behaviour in the
> > cases
> > where it is clearly not needed.
> 
> No objection to the high level goal, seems like a sensible change.
> 
> So NFSv3 still has the flushing requirement, but NFSv4 can disable
> that
> requirement as long as the server in question asserts
> FH4_VOLATILE_ANY
> and FH4_VOL_RENAME, correct?
> 
> I'm wondering how confident we are that other server implementations
> behave reasonably. Yes, they are broken if they don't behave, but
> there
> is still risk of introducing a regression.

I'd prefer to deal with that through other means if it turns out that
we have to. The problem we have right now is that we are forcing a
writeback of the entire file while holding several directory mutexes
(as well as the filesystem-global s_vfs_rename_mutex). That's terrible
for performance and scalability.

> 
> 
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > ---
> >  fs/nfs/client.c           |  2 ++
> >  fs/nfs/dir.c              | 15 ++++++++++++++-
> >  include/linux/nfs_fs_sb.h |  6 ++++++
> >  3 files changed, 22 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> > index 2115c1189c2d..6d63b958c4bb 100644
> > --- a/fs/nfs/client.c
> > +++ b/fs/nfs/client.c
> > @@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct
> > fs_context *fc)
> >  		if (server->namelen == 0 || server->namelen >
> > NFS2_MAXNAMLEN)
> >  			server->namelen = NFS2_MAXNAMLEN;
> >  	}
> > +	/* Linux 'subtree_check' borkenness mandates this setting
> > */
> 
> Assuming you are going to respin this patch to deal with the build
> bot
> warnings, can you make this comment more useful? "borkenness" sounds
> like you are dealing with a server bug here, but that's not really
> the case. subtree_check is working as designed: it requires the extra
> flushing. The no_subtree_check case does not, IIUC.

subtree checking is working as designed, but that doesn't change the
fact that it is a violation of the NFSv3 protocol. You can't to change
the filehandle in mid flight in any stateless protocol, because that
will break applications.

> It would be better to explain this need strictly in terms of file
> handle
> volatility, no?
> 
> 
> > +	server->fh_expire_type = NFS_FH_VOL_RENAME;
> >  
> >  	if (!(fattr->valid & NFS_ATTR_FATTR)) {
> >  		error = ctx->nfs_mod->rpc_ops->getattr(server,
> > ctx->mntfh,
> > diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
> > index bd23fc736b39..d0e0b435a843 100644
> > --- a/fs/nfs/dir.c
> > +++ b/fs/nfs/dir.c
> > @@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task,
> > struct nfs_renamedata *data)
> >  	unblock_revalidate(new_dentry);
> >  }
> >  
> > +static bool nfs_rename_is_unsafe_cross_dir(struct dentry
> > *old_dentry,
> > +					   struct dentry
> > *new_dentry)
> > +{
> > +	struct nfs_server *server = NFS_SB(old_dentry->d_sb);
> > +
> > +	if (old_dentry->d_parent != new_dentry->d_parent)
> > +		return false;
> > +	if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
> > +		return !(server->fh_expire_type &
> > NFS_FH_NOEXPIRE_WITH_OPEN);
> > +	return true;
> > +}
> > +
> >  /*
> >   * RENAME
> >   * FIXME: Some nfsds, like the Linux user space nfsd, may generate
> > a
> > @@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap,
> > struct inode *old_dir,
> >  
> >  	}
> >  
> > -	if (S_ISREG(old_inode->i_mode))
> > +	if (S_ISREG(old_inode->i_mode) &&
> > +	    nfs_rename_is_unsafe_cross_dir(old_dentry,
> > new_dentry))
> >  		nfs_sync_inode(old_inode);
> >  	task = nfs_async_rename(old_dir, new_dir, old_dentry,
> > new_dentry,
> >  				must_unblock ? nfs_unblock_rename
> > : NULL);
> > diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> > index 71319637a84e..6732c7e04d19 100644
> > --- a/include/linux/nfs_fs_sb.h
> > +++ b/include/linux/nfs_fs_sb.h
> > @@ -236,6 +236,12 @@ struct nfs_server {
> >  	u32			acl_bitmask;	/* V4 bitmask
> > representing the ACEs
> >  						   that are
> > supported on this
> >  						   filesystem */
> > +	/* The following #defines numerically match the NFSv4
> > equivalents */
> > +#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1)
> > +#define NFS_FH_VOLATILE_ANY (0x2)
> > +#define NFS_FH_VOL_MIGRATION (0x4)
> > +#define NFS_FH_VOL_RENAME (0x8)
> > +#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY |
> > NFS_FH_VOL_RENAME)
> >  	u32			fh_expire_type;	/* V4
> > bitmask representing file
> >  						   handle
> > volatility type for
> >  						   this filesystem
> > */
> 

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename()
  2025-04-29 23:22   ` Trond Myklebust
@ 2025-04-30 13:28     ` Chuck Lever
  0 siblings, 0 replies; 7+ messages in thread
From: Chuck Lever @ 2025-04-30 13:28 UTC (permalink / raw)
  To: Trond Myklebust, anna@kernel.org; +Cc: linux-nfs@vger.kernel.org

On 4/29/25 7:22 PM, Trond Myklebust wrote:
> On Tue, 2025-04-29 at 12:14 -0400, Chuck Lever wrote:
>> On 4/27/25 7:01 PM, trondmy@kernel.org wrote:
>>> From: Trond Myklebust <trond.myklebust@hammerspace.com>
>>>
>>> The Linux client assumes that all filehandles are non-volatile for
>>> renames within the same directory (otherwise sillyrename cannot
>>> work).
>>> However, the existence of the Linux 'subtree_check' export option
>>> has
>>> meant that nfs_rename() has always assumed it needs to flush writes
>>> before attempting to rename.
>>>
>>> Since NFSv4 does allow the client to query whether or not the
>>> server
>>> exhibits this behaviour, and since knfsd does actually set the
>>> appropriate flag when 'subtree_check' is enabled on an export, it
>>> should be OK to optimise away the write flushing behaviour in the
>>> cases
>>> where it is clearly not needed.
>>
>> No objection to the high level goal, seems like a sensible change.
>>
>> So NFSv3 still has the flushing requirement, but NFSv4 can disable
>> that
>> requirement as long as the server in question asserts
>> FH4_VOLATILE_ANY
>> and FH4_VOL_RENAME, correct?
>>
>> I'm wondering how confident we are that other server implementations
>> behave reasonably. Yes, they are broken if they don't behave, but
>> there
>> is still risk of introducing a regression.
> 
> I'd prefer to deal with that through other means if it turns out that
> we have to. The problem we have right now is that we are forcing a
> writeback of the entire file while holding several directory mutexes
> (as well as the filesystem-global s_vfs_rename_mutex). That's terrible
> for performance and scalability.
> 
>>
>>
>>> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
>>> ---
>>>  fs/nfs/client.c           |  2 ++
>>>  fs/nfs/dir.c              | 15 ++++++++++++++-
>>>  include/linux/nfs_fs_sb.h |  6 ++++++
>>>  3 files changed, 22 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
>>> index 2115c1189c2d..6d63b958c4bb 100644
>>> --- a/fs/nfs/client.c
>>> +++ b/fs/nfs/client.c
>>> @@ -1105,6 +1105,8 @@ struct nfs_server *nfs_create_server(struct
>>> fs_context *fc)
>>>  		if (server->namelen == 0 || server->namelen >
>>> NFS2_MAXNAMLEN)
>>>  			server->namelen = NFS2_MAXNAMLEN;
>>>  	}
>>> +	/* Linux 'subtree_check' borkenness mandates this setting
>>> */
>>
>> Assuming you are going to respin this patch to deal with the build
>> bot
>> warnings, can you make this comment more useful? "borkenness" sounds
>> like you are dealing with a server bug here, but that's not really
>> the case. subtree_check is working as designed: it requires the extra
>> flushing. The no_subtree_check case does not, IIUC.
> 
> subtree checking is working as designed, but that doesn't change the
> fact that it is a violation of the NFSv3 protocol. You can't to change
> the filehandle in mid flight in any stateless protocol, because that
> will break applications.

My point is that the comment you add in this patch, although whimsical,
is not terribly illuminating. A more expansive comment (with the detail
you provide above) would be helpful.

But if subtree_check is not compliant with RFC 1813, perhaps we should
consider finally deprecating and removing it. At least exports(5) should
mention the spec compliance issue, but my copy of exports(5) does not.


>> It would be better to explain this need strictly in terms of file
>> handle
>> volatility, no?
>>
>>
>>> +	server->fh_expire_type = NFS_FH_VOL_RENAME;
>>>  
>>>  	if (!(fattr->valid & NFS_ATTR_FATTR)) {
>>>  		error = ctx->nfs_mod->rpc_ops->getattr(server,
>>> ctx->mntfh,
>>> diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c
>>> index bd23fc736b39..d0e0b435a843 100644
>>> --- a/fs/nfs/dir.c
>>> +++ b/fs/nfs/dir.c
>>> @@ -2676,6 +2676,18 @@ nfs_unblock_rename(struct rpc_task *task,
>>> struct nfs_renamedata *data)
>>>  	unblock_revalidate(new_dentry);
>>>  }
>>>  
>>> +static bool nfs_rename_is_unsafe_cross_dir(struct dentry
>>> *old_dentry,
>>> +					   struct dentry
>>> *new_dentry)
>>> +{
>>> +	struct nfs_server *server = NFS_SB(old_dentry->d_sb);
>>> +
>>> +	if (old_dentry->d_parent != new_dentry->d_parent)
>>> +		return false;
>>> +	if (server->fh_expire_type & NFS_FH_RENAME_UNSAFE)
>>> +		return !(server->fh_expire_type &
>>> NFS_FH_NOEXPIRE_WITH_OPEN);
>>> +	return true;
>>> +}
>>> +
>>>  /*
>>>   * RENAME
>>>   * FIXME: Some nfsds, like the Linux user space nfsd, may generate
>>> a
>>> @@ -2763,7 +2775,8 @@ int nfs_rename(struct mnt_idmap *idmap,
>>> struct inode *old_dir,
>>>  
>>>  	}
>>>  
>>> -	if (S_ISREG(old_inode->i_mode))
>>> +	if (S_ISREG(old_inode->i_mode) &&
>>> +	    nfs_rename_is_unsafe_cross_dir(old_dentry,
>>> new_dentry))
>>>  		nfs_sync_inode(old_inode);
>>>  	task = nfs_async_rename(old_dir, new_dir, old_dentry,
>>> new_dentry,
>>>  				must_unblock ? nfs_unblock_rename
>>> : NULL);
>>> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
>>> index 71319637a84e..6732c7e04d19 100644
>>> --- a/include/linux/nfs_fs_sb.h
>>> +++ b/include/linux/nfs_fs_sb.h
>>> @@ -236,6 +236,12 @@ struct nfs_server {
>>>  	u32			acl_bitmask;	/* V4 bitmask
>>> representing the ACEs
>>>  						   that are
>>> supported on this
>>>  						   filesystem */
>>> +	/* The following #defines numerically match the NFSv4
>>> equivalents */
>>> +#define NFS_FH_NOEXPIRE_WITH_OPEN (0x1)
>>> +#define NFS_FH_VOLATILE_ANY (0x2)
>>> +#define NFS_FH_VOL_MIGRATION (0x4)
>>> +#define NFS_FH_VOL_RENAME (0x8)
>>> +#define NFS_FH_RENAME_UNSAFE (NFS_FH_VOLATILE_ANY |
>>> NFS_FH_VOL_RENAME)
>>>  	u32			fh_expire_type;	/* V4
>>> bitmask representing file
>>>  						   handle
>>> volatility type for
>>>  						   this filesystem
>>> */
>>
> 


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2025-04-30 13:28 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-27 23:01 [PATCH] NFS: Avoid flushing data while holding directory locks in nfs_rename() trondmy
2025-04-28  1:53 ` kernel test robot
2025-04-28  2:35 ` kernel test robot
2025-04-29 16:14 ` Chuck Lever
2025-04-29 16:54   ` Jeff Layton
2025-04-29 23:22   ` Trond Myklebust
2025-04-30 13:28     ` Chuck Lever

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox