linux-fsdevel.vger.kernel.org archive mirror
* [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO
@ 2024-08-29  1:03 Mike Snitzer
  2024-08-29  1:03 ` [PATCH v14 01/25] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
                   ` (25 more replies)
  0 siblings, 26 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:03 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

These latest changes are available in my git tree here:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next

I _think_ I addressed all of v13's very helpful review comments.
Special thanks to Neil and Chuck for their time and help!

And hopefully I didn't miss anything in the changelog below.

Changes since v13:
- Extended the nn->nfsd_serv reference lifetime to be identical to the
  nfsd_file (until after localio's IO is complete), suggested by Neil.
  This is made easier by introducing a 'struct nfs_localio_ctx' that
  contains both the 'nfsd_file' and 'nfsd_net' associated with
  localio.

- Switched nfs_common's 'nfs_to' symbol management locking from using
  mutex to spinlock, suggested by Neil.

- Eliminated nfs_local_file_open() by folding it into
  nfs_local_open_fh(), suggested by Neil.

- Updated nfs_uuid_is_local() to take a reference on the net, which is
  dropped in nfs_local_disable(), suggested by Neil.

- Pushed saving/restoring of client's cred down from
  nfsd_open_local_fh() to nfsd_file_acquire_local(), suggested by
  Neil.

- Dropped the pNFS flexfiles-specific open file caching that caused
  lifetime issues (inability to unmount backing filesystem), noticed
  by Neil. Also removed nfsd_file dummy definition as a side-effect.

- Updated NFSD_LOCALIO in fs/nfsd/Kconfig to explicitly 'default n'
  and improve description, suggested by Chuck. Also made the same
  updates to NFS_LOCALIO in fs/nfs/Kconfig.

- Split out a separate preliminary patch that introduces
  nfsd_serv_try_get() and nfsd_serv_put() and the associated
  percpu_ref, suggested by Chuck.

- Moved rpcauth_map_clnt_to_svc_cred_local from net/sunrpc/auth.c to
  net/sunrpc/svcauth.c and renamed it to
  svcauth_map_clnt_to_svc_cred_local. Also added kdoc. Suggested by
  Chuck.

- Added Chuck's Acked-by to 2 patches.

- Incorporated Chuck's 6 patches that split up and improved the
  __fh_verify and nfsd_file_acquire_local patches.  Added
  fh_verify_local as Chuck suggested.  Used Neil's improved comment
  for localio's early return from check_nfsd_access.

- Revised the answer to FAQ 6 in localio.rst, hopefully for the
  better.

- Fixed issue Neil pointed out about nfs_local_disable() racing with
  nfsd_open_local_fh() by adding the use of a clp->cl_localio_lock
  (spinlock_t) and RCU to dereference clp->cl_nfssvc_net and
  clp->cl_nfssvc_dom.  The call to nfsd_open_local_fh() is covered by
  RCU.

- Split the patch "nfs_common: add NFS LOCALIO auxiliary protocol
  enablement" out to 3 separate patches.  The hope is that this
  reduces the review burden, since each patch header now explains
  things with more precision and detail.

All review appreciated, thanks!
Mike
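
Note on the nfsd_serv_try_get()/nfsd_serv_put() item in the changelog
above: the series builds these on a percpu_ref, whose code is not shown
in this message.  As a rough user-space analogy only, the sketch below
uses a plain C11 atomic counter to show the "try to take a reference,
fail once the object is going away" shape.  The names demo_serv,
demo_serv_try_get and demo_serv_put are hypothetical and are not the
kernel API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted service object. */
struct demo_serv {
	atomic_int refs;	/* 0 means the service is gone or shutting down */
};

/* Take a reference only if at least one reference is still held. */
bool demo_serv_try_get(struct demo_serv *s)
{
	int old = atomic_load(&s->refs);

	while (old > 0) {
		/* On failure the CAS reloads 'old', and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&s->refs, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
bool demo_serv_put(struct demo_serv *s)
{
	return atomic_fetch_sub(&s->refs, 1) == 1;
}
```

A caller would pair every successful demo_serv_try_get() with exactly
one demo_serv_put(), and treat a failed try_get as "fall back to the
normal path".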

Chuck Lever (2):
  NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
  NFSD: Short-circuit fh_verify tracepoints for LOCALIO

Mike Snitzer (11):
  nfs_common: factor out nfs_errtbl and nfs_stat_to_errno
  nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno
  nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
  nfsd: add nfsd_serv_try_get and nfsd_serv_put
  SUNRPC: remove call_allocate() BUG_ONs
  nfs_common: add NFS LOCALIO auxiliary protocol enablement
  nfs_common: introduce nfs_localio_ctx struct and interfaces
  nfsd: implement server support for NFS_LOCALIO_PROGRAM
  nfs: pass struct nfs_localio_ctx to nfs_init_pgio and nfs_init_commit
  nfs: implement client support for NFS_LOCALIO_PROGRAM
  nfs: add Documentation/filesystems/nfs/localio.rst

NeilBrown (5):
  NFSD: Handle @rqstp == NULL in check_nfsd_access()
  NFSD: Refactor nfsd_setuser_and_check_port()
  nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  nfsd: add nfsd_file_acquire_local()
  SUNRPC: replace program list with program array

Trond Myklebust (4):
  nfs: enable localio for non-pNFS IO
  pnfs/flexfiles: enable localio support
  nfs/localio: use dedicated workqueues for filesystem read and write
  nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst

Weston Andros Adamson (3):
  SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  nfsd: add localio support
  nfs: add localio support

 Documentation/filesystems/nfs/localio.rst | 276 ++++++++
 fs/Kconfig                                |   3 +
 fs/nfs/Kconfig                            |  17 +
 fs/nfs/Makefile                           |   1 +
 fs/nfs/client.c                           |  15 +-
 fs/nfs/filelayout/filelayout.c            |   6 +-
 fs/nfs/flexfilelayout/flexfilelayout.c    |  56 +-
 fs/nfs/flexfilelayout/flexfilelayoutdev.c |   6 +
 fs/nfs/inode.c                            |  57 +-
 fs/nfs/internal.h                         |  53 +-
 fs/nfs/localio.c                          | 789 ++++++++++++++++++++++
 fs/nfs/nfs2xdr.c                          |  70 +-
 fs/nfs/nfs3xdr.c                          | 108 +--
 fs/nfs/nfs4xdr.c                          |  84 +--
 fs/nfs/nfstrace.h                         |  61 ++
 fs/nfs/pagelist.c                         |  16 +-
 fs/nfs/pnfs_nfs.c                         |   2 +-
 fs/nfs/write.c                            |  12 +-
 fs/nfs_common/Makefile                    |   5 +
 fs/nfs_common/common.c                    | 134 ++++
 fs/nfs_common/nfslocalio.c                | 233 +++++++
 fs/nfsd/Kconfig                           |  17 +
 fs/nfsd/Makefile                          |   1 +
 fs/nfsd/export.c                          |  30 +-
 fs/nfsd/filecache.c                       |  98 ++-
 fs/nfsd/filecache.h                       |   4 +
 fs/nfsd/localio.c                         | 180 +++++
 fs/nfsd/lockd.c                           |   6 +-
 fs/nfsd/netns.h                           |   8 +-
 fs/nfsd/nfsctl.c                          |   2 +-
 fs/nfsd/nfsd.h                            |   6 +-
 fs/nfsd/nfsfh.c                           | 141 ++--
 fs/nfsd/nfsfh.h                           |   2 +
 fs/nfsd/nfssvc.c                          | 105 ++-
 fs/nfsd/trace.h                           |  21 +-
 fs/nfsd/vfs.h                             |   7 +
 include/linux/nfs.h                       |   9 +
 include/linux/nfs_common.h                |  17 +
 include/linux/nfs_fs_sb.h                 |  10 +
 include/linux/nfs_xdr.h                   |  20 +-
 include/linux/nfslocalio.h                |  69 ++
 include/linux/sunrpc/svc.h                |   7 +-
 include/linux/sunrpc/svcauth.h            |   5 +
 net/sunrpc/clnt.c                         |   6 -
 net/sunrpc/svc.c                          |  68 +-
 net/sunrpc/svc_xprt.c                     |   2 +-
 net/sunrpc/svcauth.c                      |  28 +
 net/sunrpc/svcauth_unix.c                 |   3 +-
 48 files changed, 2467 insertions(+), 409 deletions(-)
 create mode 100644 Documentation/filesystems/nfs/localio.rst
 create mode 100644 fs/nfs/localio.c
 create mode 100644 fs/nfs_common/common.c
 create mode 100644 fs/nfs_common/nfslocalio.c
 create mode 100644 fs/nfsd/localio.c
 create mode 100644 include/linux/nfs_common.h
 create mode 100644 include/linux/nfslocalio.h

-- 
2.44.0



* [PATCH v14 01/25] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
@ 2024-08-29  1:03 ` Mike Snitzer
  2024-08-29 14:17   ` Jeff Layton
  2024-08-29  1:03 ` [PATCH v14 02/25] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
                   ` (24 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:03 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
fs/nfs/nfs3xdr.c, so move it and nfs_errtbl to fs/nfs_common/common.c
and drop the two duplicate copies.

It will also be used by fs/nfsd/localio.c.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/Kconfig             |   1 +
 fs/nfs/nfs2xdr.c           |  70 +-----------------------
 fs/nfs/nfs3xdr.c           | 108 +++++++------------------------------
 fs/nfs/nfs4xdr.c           |   4 +-
 fs/nfs_common/Makefile     |   2 +
 fs/nfs_common/common.c     |  67 +++++++++++++++++++++++
 fs/nfsd/Kconfig            |   1 +
 include/linux/nfs_common.h |  16 ++++++
 8 files changed, 109 insertions(+), 160 deletions(-)
 create mode 100644 fs/nfs_common/common.c
 create mode 100644 include/linux/nfs_common.h

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 57249f040dfc..0eb20012792f 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -4,6 +4,7 @@ config NFS_FS
 	depends on INET && FILE_LOCKING && MULTIUSER
 	select LOCKD
 	select SUNRPC
+	select NFS_COMMON
 	select NFS_ACL_SUPPORT if NFS_V3_ACL
 	help
 	  Choose Y here if you want to access files residing on other
diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
index c19093814296..6e75c6c2d234 100644
--- a/fs/nfs/nfs2xdr.c
+++ b/fs/nfs/nfs2xdr.c
@@ -22,14 +22,12 @@
 #include <linux/nfs.h>
 #include <linux/nfs2.h>
 #include <linux/nfs_fs.h>
+#include <linux/nfs_common.h>
 #include "nfstrace.h"
 #include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_XDR
 
-/* Mapping from NFS error code to "errno" error code. */
-#define errno_NFSERR_IO		EIO
-
 /*
  * Declare the space requirements for NFS arguments and replies as
  * number of 32bit-words
@@ -64,8 +62,6 @@
 #define NFS_readdirres_sz	(1+NFS_pagepad_sz)
 #define NFS_statfsres_sz	(1+NFS_info_sz)
 
-static int nfs_stat_to_errno(enum nfs_stat);
-
 /*
  * Encode/decode NFSv2 basic data types
  *
@@ -1054,70 +1050,6 @@ static int nfs2_xdr_dec_statfsres(struct rpc_rqst *req, struct xdr_stream *xdr,
 	return nfs_stat_to_errno(status);
 }
 
-
-/*
- * We need to translate between nfs status return values and
- * the local errno values which may not be the same.
- */
-static const struct {
-	int stat;
-	int errno;
-} nfs_errtbl[] = {
-	{ NFS_OK,		0		},
-	{ NFSERR_PERM,		-EPERM		},
-	{ NFSERR_NOENT,		-ENOENT		},
-	{ NFSERR_IO,		-errno_NFSERR_IO},
-	{ NFSERR_NXIO,		-ENXIO		},
-/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
-	{ NFSERR_ACCES,		-EACCES		},
-	{ NFSERR_EXIST,		-EEXIST		},
-	{ NFSERR_XDEV,		-EXDEV		},
-	{ NFSERR_NODEV,		-ENODEV		},
-	{ NFSERR_NOTDIR,	-ENOTDIR	},
-	{ NFSERR_ISDIR,		-EISDIR		},
-	{ NFSERR_INVAL,		-EINVAL		},
-	{ NFSERR_FBIG,		-EFBIG		},
-	{ NFSERR_NOSPC,		-ENOSPC		},
-	{ NFSERR_ROFS,		-EROFS		},
-	{ NFSERR_MLINK,		-EMLINK		},
-	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
-	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
-	{ NFSERR_DQUOT,		-EDQUOT		},
-	{ NFSERR_STALE,		-ESTALE		},
-	{ NFSERR_REMOTE,	-EREMOTE	},
-#ifdef EWFLUSH
-	{ NFSERR_WFLUSH,	-EWFLUSH	},
-#endif
-	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
-	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
-	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
-	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
-	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
-	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
-	{ NFSERR_BADTYPE,	-EBADTYPE	},
-	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
-	{ -1,			-EIO		}
-};
-
-/**
- * nfs_stat_to_errno - convert an NFS status code to a local errno
- * @status: NFS status code to convert
- *
- * Returns a local errno value, or -EIO if the NFS status code is
- * not recognized.  This function is used jointly by NFSv2 and NFSv3.
- */
-static int nfs_stat_to_errno(enum nfs_stat status)
-{
-	int i;
-
-	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
-		if (nfs_errtbl[i].stat == (int)status)
-			return nfs_errtbl[i].errno;
-	}
-	dprintk("NFS: Unrecognized nfs status value: %u\n", status);
-	return nfs_errtbl[i].errno;
-}
-
 #define PROC(proc, argtype, restype, timer)				\
 [NFSPROC_##proc] = {							\
 	.p_proc	    =  NFSPROC_##proc,					\
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 60f032be805a..4ae01c10b7e2 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -21,14 +21,13 @@
 #include <linux/nfs3.h>
 #include <linux/nfs_fs.h>
 #include <linux/nfsacl.h>
+#include <linux/nfs_common.h>
+
 #include "nfstrace.h"
 #include "internal.h"
 
 #define NFSDBG_FACILITY		NFSDBG_XDR
 
-/* Mapping from NFS error code to "errno" error code. */
-#define errno_NFSERR_IO		EIO
-
 /*
  * Declare the space requirements for NFS arguments and replies as
  * number of 32bit-words
@@ -91,8 +90,6 @@
 				NFS3_pagepad_sz)
 #define ACL3_setaclres_sz	(1+NFS3_post_op_attr_sz)
 
-static int nfs3_stat_to_errno(enum nfs_stat);
-
 /*
  * Map file type to S_IFMT bits
  */
@@ -1406,7 +1403,7 @@ static int nfs3_xdr_dec_getattr3res(struct rpc_rqst *req,
 out:
 	return error;
 out_default:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1445,7 +1442,7 @@ static int nfs3_xdr_dec_setattr3res(struct rpc_rqst *req,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1495,7 +1492,7 @@ static int nfs3_xdr_dec_lookup3res(struct rpc_rqst *req,
 	error = decode_post_op_attr(xdr, result->dir_attr, userns);
 	if (unlikely(error))
 		goto out;
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1537,7 +1534,7 @@ static int nfs3_xdr_dec_access3res(struct rpc_rqst *req,
 out:
 	return error;
 out_default:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1578,7 +1575,7 @@ static int nfs3_xdr_dec_readlink3res(struct rpc_rqst *req,
 out:
 	return error;
 out_default:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1658,7 +1655,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1728,7 +1725,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1795,7 +1792,7 @@ static int nfs3_xdr_dec_create3res(struct rpc_rqst *req,
 	error = decode_wcc_data(xdr, result->dir_attr, userns);
 	if (unlikely(error))
 		goto out;
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1835,7 +1832,7 @@ static int nfs3_xdr_dec_remove3res(struct rpc_rqst *req,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1881,7 +1878,7 @@ static int nfs3_xdr_dec_rename3res(struct rpc_rqst *req,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -1926,7 +1923,7 @@ static int nfs3_xdr_dec_link3res(struct rpc_rqst *req, struct xdr_stream *xdr,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /**
@@ -2101,7 +2098,7 @@ static int nfs3_xdr_dec_readdir3res(struct rpc_rqst *req,
 	error = decode_post_op_attr(xdr, result->dir_attr, rpc_rqst_userns(req));
 	if (unlikely(error))
 		goto out;
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -2167,7 +2164,7 @@ static int nfs3_xdr_dec_fsstat3res(struct rpc_rqst *req,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -2243,7 +2240,7 @@ static int nfs3_xdr_dec_fsinfo3res(struct rpc_rqst *req,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -2304,7 +2301,7 @@ static int nfs3_xdr_dec_pathconf3res(struct rpc_rqst *req,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 /*
@@ -2350,7 +2347,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req,
 out:
 	return error;
 out_status:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 #ifdef CONFIG_NFS_V3_ACL
@@ -2416,7 +2413,7 @@ static int nfs3_xdr_dec_getacl3res(struct rpc_rqst *req,
 out:
 	return error;
 out_default:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req,
@@ -2435,76 +2432,11 @@ static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req,
 out:
 	return error;
 out_default:
-	return nfs3_stat_to_errno(status);
+	return nfs_stat_to_errno(status);
 }
 
 #endif  /* CONFIG_NFS_V3_ACL */
 
-
-/*
- * We need to translate between nfs status return values and
- * the local errno values which may not be the same.
- */
-static const struct {
-	int stat;
-	int errno;
-} nfs_errtbl[] = {
-	{ NFS_OK,		0		},
-	{ NFSERR_PERM,		-EPERM		},
-	{ NFSERR_NOENT,		-ENOENT		},
-	{ NFSERR_IO,		-errno_NFSERR_IO},
-	{ NFSERR_NXIO,		-ENXIO		},
-/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
-	{ NFSERR_ACCES,		-EACCES		},
-	{ NFSERR_EXIST,		-EEXIST		},
-	{ NFSERR_XDEV,		-EXDEV		},
-	{ NFSERR_NODEV,		-ENODEV		},
-	{ NFSERR_NOTDIR,	-ENOTDIR	},
-	{ NFSERR_ISDIR,		-EISDIR		},
-	{ NFSERR_INVAL,		-EINVAL		},
-	{ NFSERR_FBIG,		-EFBIG		},
-	{ NFSERR_NOSPC,		-ENOSPC		},
-	{ NFSERR_ROFS,		-EROFS		},
-	{ NFSERR_MLINK,		-EMLINK		},
-	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
-	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
-	{ NFSERR_DQUOT,		-EDQUOT		},
-	{ NFSERR_STALE,		-ESTALE		},
-	{ NFSERR_REMOTE,	-EREMOTE	},
-#ifdef EWFLUSH
-	{ NFSERR_WFLUSH,	-EWFLUSH	},
-#endif
-	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
-	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
-	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
-	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
-	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
-	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
-	{ NFSERR_BADTYPE,	-EBADTYPE	},
-	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
-	{ -1,			-EIO		}
-};
-
-/**
- * nfs3_stat_to_errno - convert an NFS status code to a local errno
- * @status: NFS status code to convert
- *
- * Returns a local errno value, or -EIO if the NFS status code is
- * not recognized.  This function is used jointly by NFSv2 and NFSv3.
- */
-static int nfs3_stat_to_errno(enum nfs_stat status)
-{
-	int i;
-
-	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
-		if (nfs_errtbl[i].stat == (int)status)
-			return nfs_errtbl[i].errno;
-	}
-	dprintk("NFS: Unrecognized nfs status value: %u\n", status);
-	return nfs_errtbl[i].errno;
-}
-
-
 #define PROC(proc, argtype, restype, timer)				\
 [NFS3PROC_##proc] = {							\
 	.p_proc      = NFS3PROC_##proc,					\
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 7704a4509676..b4091af1a60d 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -52,6 +52,7 @@
 #include <linux/nfs.h>
 #include <linux/nfs4.h>
 #include <linux/nfs_fs.h>
+#include <linux/nfs_common.h>
 
 #include "nfs4_fs.h"
 #include "nfs4trace.h"
@@ -63,9 +64,6 @@
 
 #define NFSDBG_FACILITY		NFSDBG_XDR
 
-/* Mapping from NFS error code to "errno" error code. */
-#define errno_NFSERR_IO		EIO
-
 struct compound_hdr;
 static int nfs4_stat_to_errno(int);
 static void encode_layoutget(struct xdr_stream *xdr,
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index 119c75ab9fd0..e58b01bb8dda 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -8,3 +8,5 @@ nfs_acl-objs := nfsacl.o
 
 obj-$(CONFIG_GRACE_PERIOD) += grace.o
 obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
+
+obj-$(CONFIG_NFS_COMMON) += common.o
diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c
new file mode 100644
index 000000000000..a4ee95da2174
--- /dev/null
+++ b/fs/nfs_common/common.c
@@ -0,0 +1,67 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/module.h>
+#include <linux/nfs_common.h>
+
+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ */
+static const struct {
+	int stat;
+	int errno;
+} nfs_errtbl[] = {
+	{ NFS_OK,		0		},
+	{ NFSERR_PERM,		-EPERM		},
+	{ NFSERR_NOENT,		-ENOENT		},
+	{ NFSERR_IO,		-errno_NFSERR_IO},
+	{ NFSERR_NXIO,		-ENXIO		},
+/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
+	{ NFSERR_ACCES,		-EACCES		},
+	{ NFSERR_EXIST,		-EEXIST		},
+	{ NFSERR_XDEV,		-EXDEV		},
+	{ NFSERR_NODEV,		-ENODEV		},
+	{ NFSERR_NOTDIR,	-ENOTDIR	},
+	{ NFSERR_ISDIR,		-EISDIR		},
+	{ NFSERR_INVAL,		-EINVAL		},
+	{ NFSERR_FBIG,		-EFBIG		},
+	{ NFSERR_NOSPC,		-ENOSPC		},
+	{ NFSERR_ROFS,		-EROFS		},
+	{ NFSERR_MLINK,		-EMLINK		},
+	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
+	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
+	{ NFSERR_DQUOT,		-EDQUOT		},
+	{ NFSERR_STALE,		-ESTALE		},
+	{ NFSERR_REMOTE,	-EREMOTE	},
+#ifdef EWFLUSH
+	{ NFSERR_WFLUSH,	-EWFLUSH	},
+#endif
+	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
+	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
+	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
+	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
+	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
+	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
+	{ NFSERR_BADTYPE,	-EBADTYPE	},
+	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
+	{ -1,			-EIO		}
+};
+
+/**
+ * nfs_stat_to_errno - convert an NFS status code to a local errno
+ * @status: NFS status code to convert
+ *
+ * Returns a local errno value, or -EIO if the NFS status code is
+ * not recognized.  This function is used jointly by NFSv2 and NFSv3.
+ */
+int nfs_stat_to_errno(enum nfs_stat status)
+{
+	int i;
+
+	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
+		if (nfs_errtbl[i].stat == (int)status)
+			return nfs_errtbl[i].errno;
+	}
+	return nfs_errtbl[i].errno;
+}
+EXPORT_SYMBOL_GPL(nfs_stat_to_errno);
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index ec2ab6429e00..c0bd1509ccd4 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -7,6 +7,7 @@ config NFSD
 	select LOCKD
 	select SUNRPC
 	select EXPORTFS
+	select NFS_COMMON
 	select NFS_ACL_SUPPORT if NFSD_V2_ACL
 	select NFS_ACL_SUPPORT if NFSD_V3_ACL
 	depends on MULTIUSER
diff --git a/include/linux/nfs_common.h b/include/linux/nfs_common.h
new file mode 100644
index 000000000000..3395c4a4d372
--- /dev/null
+++ b/include/linux/nfs_common.h
@@ -0,0 +1,16 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * This file contains constants and methods used by both NFS client and server.
+ */
+#ifndef _LINUX_NFS_COMMON_H
+#define _LINUX_NFS_COMMON_H
+
+#include <linux/errno.h>
+#include <uapi/linux/nfs.h>
+
+/* Mapping from NFS error code to "errno" error code. */
+#define errno_NFSERR_IO EIO
+
+int nfs_stat_to_errno(enum nfs_stat status);
+
+#endif /* _LINUX_NFS_COMMON_H */
-- 
2.44.0
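
Note on the lookup in the patch above: nfs_stat_to_errno() walks a
table that ends in a { -1, -EIO } sentinel, and the sentinel's errno
doubles as the result for unrecognized status codes.  The sketch below
is a minimal user-space version of that pattern.  The demo_* names and
the short status list are hypothetical, and the struct field is called
'err' because 'errno' is a macro in user-space C.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical status codes for illustration; not the real enum nfs_stat. */
enum demo_stat {
	DEMO_OK = 0,
	DEMO_ERR_PERM = 1,
	DEMO_ERR_NOENT = 2,
	DEMO_ERR_IO = 5,
	DEMO_ERR_STALE = 70,
};

/* The final { -1, -EIO } entry terminates the scan and is the default. */
static const struct {
	int stat;
	int err;
} demo_errtbl[] = {
	{ DEMO_OK,		0	},
	{ DEMO_ERR_PERM,	-EPERM	},
	{ DEMO_ERR_NOENT,	-ENOENT	},
	{ DEMO_ERR_IO,		-EIO	},
	{ DEMO_ERR_STALE,	-ESTALE	},
	{ -1,			-EIO	}
};

int demo_stat_to_errno(int status)
{
	size_t i;

	for (i = 0; demo_errtbl[i].stat != -1; i++) {
		if (demo_errtbl[i].stat == status)
			return demo_errtbl[i].err;
	}
	/* i now indexes the sentinel, whose errno is the fallback. */
	return demo_errtbl[i].err;
}
```

Keeping the default in the table means adding a mapping is a one-line
change and the function body never needs to be touched.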



* [PATCH v14 02/25] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
  2024-08-29  1:03 ` [PATCH v14 01/25] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
@ 2024-08-29  1:03 ` Mike Snitzer
  2024-08-29 14:17   ` Jeff Layton
  2024-08-29  1:03 ` [PATCH v14 03/25] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
                   ` (23 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:03 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be
used by fs/nfs/localio.c, so move it and its error table (renamed to
nfs4_errtbl) to fs/nfs_common/common.c.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/nfs4xdr.c           | 67 --------------------------------------
 fs/nfs_common/common.c     | 67 ++++++++++++++++++++++++++++++++++++++
 include/linux/nfs_common.h |  1 +
 3 files changed, 68 insertions(+), 67 deletions(-)

diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index b4091af1a60d..971305bdaecb 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -65,7 +65,6 @@
 #define NFSDBG_FACILITY		NFSDBG_XDR
 
 struct compound_hdr;
-static int nfs4_stat_to_errno(int);
 static void encode_layoutget(struct xdr_stream *xdr,
 			     const struct nfs4_layoutget_args *args,
 			     struct compound_hdr *hdr);
@@ -7619,72 +7618,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
 	return 0;
 }
 
-/*
- * We need to translate between nfs status return values and
- * the local errno values which may not be the same.
- */
-static struct {
-	int stat;
-	int errno;
-} nfs_errtbl[] = {
-	{ NFS4_OK,		0		},
-	{ NFS4ERR_PERM,		-EPERM		},
-	{ NFS4ERR_NOENT,	-ENOENT		},
-	{ NFS4ERR_IO,		-errno_NFSERR_IO},
-	{ NFS4ERR_NXIO,		-ENXIO		},
-	{ NFS4ERR_ACCESS,	-EACCES		},
-	{ NFS4ERR_EXIST,	-EEXIST		},
-	{ NFS4ERR_XDEV,		-EXDEV		},
-	{ NFS4ERR_NOTDIR,	-ENOTDIR	},
-	{ NFS4ERR_ISDIR,	-EISDIR		},
-	{ NFS4ERR_INVAL,	-EINVAL		},
-	{ NFS4ERR_FBIG,		-EFBIG		},
-	{ NFS4ERR_NOSPC,	-ENOSPC		},
-	{ NFS4ERR_ROFS,		-EROFS		},
-	{ NFS4ERR_MLINK,	-EMLINK		},
-	{ NFS4ERR_NAMETOOLONG,	-ENAMETOOLONG	},
-	{ NFS4ERR_NOTEMPTY,	-ENOTEMPTY	},
-	{ NFS4ERR_DQUOT,	-EDQUOT		},
-	{ NFS4ERR_STALE,	-ESTALE		},
-	{ NFS4ERR_BADHANDLE,	-EBADHANDLE	},
-	{ NFS4ERR_BAD_COOKIE,	-EBADCOOKIE	},
-	{ NFS4ERR_NOTSUPP,	-ENOTSUPP	},
-	{ NFS4ERR_TOOSMALL,	-ETOOSMALL	},
-	{ NFS4ERR_SERVERFAULT,	-EREMOTEIO	},
-	{ NFS4ERR_BADTYPE,	-EBADTYPE	},
-	{ NFS4ERR_LOCKED,	-EAGAIN		},
-	{ NFS4ERR_SYMLINK,	-ELOOP		},
-	{ NFS4ERR_OP_ILLEGAL,	-EOPNOTSUPP	},
-	{ NFS4ERR_DEADLOCK,	-EDEADLK	},
-	{ NFS4ERR_NOXATTR,	-ENODATA	},
-	{ NFS4ERR_XATTR2BIG,	-E2BIG		},
-	{ -1,			-EIO		}
-};
-
-/*
- * Convert an NFS error code to a local one.
- * This one is used jointly by NFSv2 and NFSv3.
- */
-static int
-nfs4_stat_to_errno(int stat)
-{
-	int i;
-	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
-		if (nfs_errtbl[i].stat == stat)
-			return nfs_errtbl[i].errno;
-	}
-	if (stat <= 10000 || stat > 10100) {
-		/* The server is looney tunes. */
-		return -EREMOTEIO;
-	}
-	/* If we cannot translate the error, the recovery routines should
-	 * handle it.
-	 * Note: remaining NFSv4 error codes have values > 10000, so should
-	 * not conflict with native Linux error codes.
-	 */
-	return -stat;
-}
-
 #ifdef CONFIG_NFS_V4_2
 #include "nfs42xdr.c"
 #endif /* CONFIG_NFS_V4_2 */
diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c
index a4ee95da2174..34a115176f97 100644
--- a/fs/nfs_common/common.c
+++ b/fs/nfs_common/common.c
@@ -2,6 +2,7 @@
 
 #include <linux/module.h>
 #include <linux/nfs_common.h>
+#include <linux/nfs4.h>
 
 /*
  * We need to translate between nfs status return values and
@@ -65,3 +66,69 @@ int nfs_stat_to_errno(enum nfs_stat status)
 	return nfs_errtbl[i].errno;
 }
 EXPORT_SYMBOL_GPL(nfs_stat_to_errno);
+
+/*
+ * We need to translate between nfs v4 status return values and
+ * the local errno values which may not be the same.
+ */
+static const struct {
+	int stat;
+	int errno;
+} nfs4_errtbl[] = {
+	{ NFS4_OK,		0		},
+	{ NFS4ERR_PERM,		-EPERM		},
+	{ NFS4ERR_NOENT,	-ENOENT		},
+	{ NFS4ERR_IO,		-errno_NFSERR_IO},
+	{ NFS4ERR_NXIO,		-ENXIO		},
+	{ NFS4ERR_ACCESS,	-EACCES		},
+	{ NFS4ERR_EXIST,	-EEXIST		},
+	{ NFS4ERR_XDEV,		-EXDEV		},
+	{ NFS4ERR_NOTDIR,	-ENOTDIR	},
+	{ NFS4ERR_ISDIR,	-EISDIR		},
+	{ NFS4ERR_INVAL,	-EINVAL		},
+	{ NFS4ERR_FBIG,		-EFBIG		},
+	{ NFS4ERR_NOSPC,	-ENOSPC		},
+	{ NFS4ERR_ROFS,		-EROFS		},
+	{ NFS4ERR_MLINK,	-EMLINK		},
+	{ NFS4ERR_NAMETOOLONG,	-ENAMETOOLONG	},
+	{ NFS4ERR_NOTEMPTY,	-ENOTEMPTY	},
+	{ NFS4ERR_DQUOT,	-EDQUOT		},
+	{ NFS4ERR_STALE,	-ESTALE		},
+	{ NFS4ERR_BADHANDLE,	-EBADHANDLE	},
+	{ NFS4ERR_BAD_COOKIE,	-EBADCOOKIE	},
+	{ NFS4ERR_NOTSUPP,	-ENOTSUPP	},
+	{ NFS4ERR_TOOSMALL,	-ETOOSMALL	},
+	{ NFS4ERR_SERVERFAULT,	-EREMOTEIO	},
+	{ NFS4ERR_BADTYPE,	-EBADTYPE	},
+	{ NFS4ERR_LOCKED,	-EAGAIN		},
+	{ NFS4ERR_SYMLINK,	-ELOOP		},
+	{ NFS4ERR_OP_ILLEGAL,	-EOPNOTSUPP	},
+	{ NFS4ERR_DEADLOCK,	-EDEADLK	},
+	{ NFS4ERR_NOXATTR,	-ENODATA	},
+	{ NFS4ERR_XATTR2BIG,	-E2BIG		},
+	{ -1,			-EIO		}
+};
+
+/*
+ * Convert an NFS error code to a local one.
+ * This one is used by NFSv4.
+ */
+int nfs4_stat_to_errno(int stat)
+{
+	int i;
+	for (i = 0; nfs4_errtbl[i].stat != -1; i++) {
+		if (nfs4_errtbl[i].stat == stat)
+			return nfs4_errtbl[i].errno;
+	}
+	if (stat <= 10000 || stat > 10100) {
+		/* The server is looney tunes. */
+		return -EREMOTEIO;
+	}
+	/* If we cannot translate the error, the recovery routines should
+	 * handle it.
+	 * Note: remaining NFSv4 error codes have values > 10000, so should
+	 * not conflict with native Linux error codes.
+	 */
+	return -stat;
+}
+EXPORT_SYMBOL_GPL(nfs4_stat_to_errno);
diff --git a/include/linux/nfs_common.h b/include/linux/nfs_common.h
index 3395c4a4d372..5fc02df88252 100644
--- a/include/linux/nfs_common.h
+++ b/include/linux/nfs_common.h
@@ -12,5 +12,6 @@
 #define errno_NFSERR_IO EIO
 
 int nfs_stat_to_errno(enum nfs_stat status);
+int nfs4_stat_to_errno(int stat);
 
 #endif /* _LINUX_NFS_COMMON_H */
-- 
2.44.0
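
Note on the fallback in nfs4_stat_to_errno() above: unlike the v2/v3
helper, a status that is missing from the table is not always turned
into a fixed errno.  Values outside the range (10000, 10100] are
reported as -EREMOTEIO, and anything else is passed through negated so
that recovery code can inspect it.  The sketch below is a minimal
user-space version of that logic.  The demo4_* names and the
two-entry table are hypothetical, and the numeric status values are
illustrative only.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* Linux value; fallback for other platforms */
#endif

/* Hypothetical status values for illustration. */
#define DEMO4_OK		0
#define DEMO4ERR_NOENT		2
#define DEMO4ERR_LOCKED		10012

static const struct {
	int stat;
	int err;
} demo4_errtbl[] = {
	{ DEMO4_OK,		0	},
	{ DEMO4ERR_NOENT,	-ENOENT	},
	{ DEMO4ERR_LOCKED,	-EAGAIN	},
	{ -1,			-EIO	}
};

int demo4_stat_to_errno(int stat)
{
	size_t i;

	for (i = 0; demo4_errtbl[i].stat != -1; i++) {
		if (demo4_errtbl[i].stat == stat)
			return demo4_errtbl[i].err;
	}
	/* Outside (10000, 10100]: not a plausible NFSv4 status. */
	if (stat <= 10000 || stat > 10100)
		return -EREMOTEIO;
	/* Unmapped but plausible: hand the negated status to the caller. */
	return -stat;
}
```

Note that the sentinel's -EIO is never returned here; it only ends the
scan.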



* [PATCH v14 03/25] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
  2024-08-29  1:03 ` [PATCH v14 01/25] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
  2024-08-29  1:03 ` [PATCH v14 02/25] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
@ 2024-08-29  1:03 ` Mike Snitzer
  2024-08-29 14:19   ` Jeff Layton
  2024-08-29  1:03 ` [PATCH v14 04/25] NFSD: Handle @rqstp == NULL in check_nfsd_access() Mike Snitzer
                   ` (22 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:03 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

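Move encode_opaque_fixed() and decode_opaque_fixed() to
include/linux/nfs_xdr.h as static inlines.  This eliminates the
duplicate definitions in fs/nfs/nfs4xdr.c and
fs/nfs/flexfilelayout/flexfilelayout.c and allows for additional
callers.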

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/flexfilelayout/flexfilelayout.c |  6 ------
 fs/nfs/nfs4xdr.c                       | 13 -------------
 include/linux/nfs_xdr.h                | 20 +++++++++++++++++++-
 3 files changed, 19 insertions(+), 20 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 39ba9f4208aa..d4d551ffea7b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2086,12 +2086,6 @@ static int ff_layout_encode_ioerr(struct xdr_stream *xdr,
 	return ff_layout_encode_ds_ioerr(xdr, &ff_args->errors);
 }
 
-static void
-encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
-	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
 static void
 ff_layout_encode_ff_iostat_head(struct xdr_stream *xdr,
 			    const nfs4_stateid *stateid,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 971305bdaecb..6bf2d44e5d4e 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -972,11 +972,6 @@ static __be32 *reserve_space(struct xdr_stream *xdr, size_t nbytes)
 	return p;
 }
 
-static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
-	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
 static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
 {
 	WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0);
@@ -4406,14 +4401,6 @@ static int decode_access(struct xdr_stream *xdr, u32 *supported, u32 *access)
 	return 0;
 }
 
-static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
-{
-	ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
-	if (unlikely(ret < 0))
-		return -EIO;
-	return 0;
-}
-
 static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
 {
 	return decode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 45623af3e7b8..5e93fbfb785a 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1853,6 +1853,24 @@ struct nfs_rpc_ops {
 	void	(*disable_swap)(struct inode *inode);
 };
 
+/*
+ * Helper functions used by NFS client and/or server
+ */
+static inline void encode_opaque_fixed(struct xdr_stream *xdr,
+				       const void *buf, size_t len)
+{
+	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static inline int decode_opaque_fixed(struct xdr_stream *xdr,
+				      void *buf, size_t len)
+{
+	ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+	if (unlikely(ret < 0))
+		return -EIO;
+	return 0;
+}
+
 /*
  * Function vectors etc. for the NFS client
  */
@@ -1866,4 +1884,4 @@ extern const struct rpc_version nfs_version4;
 extern const struct rpc_version nfsacl_version3;
 extern const struct rpc_program nfsacl_program;
 
-#endif
+#endif /* _LINUX_NFS_XDR_H */
-- 
2.44.0



* [PATCH v14 04/25] NFSD: Handle @rqstp == NULL in check_nfsd_access()
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (2 preceding siblings ...)
  2024-08-29  1:03 ` [PATCH v14 03/25] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
@ 2024-08-29  1:03 ` Mike Snitzer
  2024-08-29 14:20   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 05/25] NFSD: Refactor nfsd_setuser_and_check_port() Mike Snitzer
                   ` (21 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:03 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: NeilBrown <neilb@suse.de>

LOCALIO-initiated open operations are not running in an nfsd thread
and thus do not have an associated svc_rqst context.

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
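The NULL-rqstp short-circuit described above can be modeled in
userspace roughly as follows.  This is only a sketch: every model_*
name is a hypothetical stand-in, and the real function checks
transport security and a list of flavors rather than a single one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for nfs_ok / nfserr_wrongsec. */
enum model_status { MODEL_OK = 0, MODEL_WRONGSEC = 1 };

/* Hypothetical, trimmed-down request: only the presented flavor. */
struct model_rqst {
	int cred_flavor;
};

/* Hypothetical, trimmed-down export: a single accepted flavor. */
struct model_export {
	int allowed_flavor;
};

/*
 * Same shape as the patched check_nfsd_access(): a NULL request means
 * LOCALIO, so succeed before any field of rqstp is dereferenced.
 */
static enum model_status model_check_access(const struct model_export *exp,
					    const struct model_rqst *rqstp)
{
	if (!rqstp)
		return MODEL_OK;
	if (rqstp->cred_flavor == exp->allowed_flavor)
		return MODEL_OK;
	return MODEL_WRONGSEC;
}
```

The point of the ordering is that the NULL test comes first, so the
wire-only checks never see a missing request.
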
 fs/nfsd/export.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
index 7bb4f2075ac5..c82d8e3e0d4f 100644
--- a/fs/nfsd/export.c
+++ b/fs/nfsd/export.c
@@ -1074,10 +1074,30 @@ static struct svc_export *exp_find(struct cache_detail *cd,
 	return exp;
 }
 
+/**
+ * check_nfsd_access - check if access to export is allowed.
+ * @exp: svc_export that is being accessed.
+ * @rqstp: svc_rqst attempting to access @exp (will be NULL for LOCALIO).
+ *
+ * Return values:
+ *   %nfs_ok if access is granted, or
+ *   %nfserr_wrongsec if access is denied
+ */
 __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp)
 {
 	struct exp_flavor_info *f, *end = exp->ex_flavors + exp->ex_nflavors;
-	struct svc_xprt *xprt = rqstp->rq_xprt;
+	struct svc_xprt *xprt;
+
+	/*
+	 * If rqstp is NULL, this is a LOCALIO request which will only
+	 * ever use a filehandle/credential pair for which access has
+	 * been affirmed (by ACCESS or OPEN NFS requests) over the
+	 * wire. So there is no need for further checks here.
+	 */
+	if (!rqstp)
+		return nfs_ok;
+
+	xprt = rqstp->rq_xprt;
 
 	if (exp->ex_xprtsec_modes & NFSEXP_XPRTSEC_NONE) {
 		if (!test_bit(XPT_TLS_SESSION, &xprt->xpt_flags))
@@ -1098,17 +1118,17 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp)
 ok:
 	/* legacy gss-only clients are always OK: */
 	if (exp->ex_client == rqstp->rq_gssclient)
-		return 0;
+		return nfs_ok;
 	/* ip-address based client; check sec= export option: */
 	for (f = exp->ex_flavors; f < end; f++) {
 		if (f->pseudoflavor == rqstp->rq_cred.cr_flavor)
-			return 0;
+			return nfs_ok;
 	}
 	/* defaults in absence of sec= options: */
 	if (exp->ex_nflavors == 0) {
 		if (rqstp->rq_cred.cr_flavor == RPC_AUTH_NULL ||
 		    rqstp->rq_cred.cr_flavor == RPC_AUTH_UNIX)
-			return 0;
+			return nfs_ok;
 	}
 
 	/* If the compound op contains a spo_must_allowed op,
@@ -1118,7 +1138,7 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp)
 	 */
 
 	if (nfsd4_spo_must_allow(rqstp))
-		return 0;
+		return nfs_ok;
 
 denied:
 	return nfserr_wrongsec;
-- 
2.44.0



* [PATCH v14 05/25] NFSD: Refactor nfsd_setuser_and_check_port()
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (3 preceding siblings ...)
  2024-08-29  1:03 ` [PATCH v14 04/25] NFSD: Handle @rqstp == NULL in check_nfsd_access() Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 14:23   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
                   ` (20 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: NeilBrown <neilb@suse.de>

There are several places where __fh_verify unconditionally dereferences
rqstp to check that the connection is suitably secure.  They look at
rqstp->rq_xprt which is not meaningful in the target use case of
"localio" NFS in which the client talks directly to the local server.

Prepare these to always succeed when rqstp is NULL.

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
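A minimal userspace sketch of the refactored port check, assuming
simplified stand-in types (all model_* names are hypothetical and do
not exist in the kernel).  The credential and export are passed
explicitly, and the port check is skipped entirely when there is no
request:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_INSECURE_PORT 0x1	/* hypothetical export flag bit */
#define MODEL_AUTH_GSS      6	/* RPC_AUTH_GSS flavor number */

struct model_cred { int flavor; };
struct model_export { unsigned int flags; };
struct model_rqst { bool from_secure_port; };

/* Only called with a non-NULL rqstp, like the kernel helper. */
static bool model_port_ok(const struct model_rqst *rqstp,
			  const struct model_cred *cred,
			  const struct model_export *exp)
{
	if (exp->flags & MODEL_INSECURE_PORT)
		return true;
	/* GSS requests are not required to use low ports. */
	if (cred->flavor >= MODEL_AUTH_GSS)
		return true;
	return rqstp->from_secure_port;
}

/* Returns 0 when the caller may proceed, -1 when the port check fails. */
static int model_check_port(const struct model_rqst *rqstp,
			    const struct model_cred *cred,
			    const struct model_export *exp)
{
	if (rqstp && !model_port_ok(rqstp, cred, exp))
		return -1;
	return 0;
}
```
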
 fs/nfsd/nfsfh.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 50d23d56f403..4b964a71a504 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -87,23 +87,24 @@ nfsd_mode_check(struct dentry *dentry, umode_t requested)
 	return nfserr_wrong_type;
 }
 
-static bool nfsd_originating_port_ok(struct svc_rqst *rqstp, int flags)
+static bool nfsd_originating_port_ok(struct svc_rqst *rqstp,
+				     struct svc_cred *cred,
+				     struct svc_export *exp)
 {
-	if (flags & NFSEXP_INSECURE_PORT)
+	if (nfsexp_flags(cred, exp) & NFSEXP_INSECURE_PORT)
 		return true;
 	/* We don't require gss requests to use low ports: */
-	if (rqstp->rq_cred.cr_flavor >= RPC_AUTH_GSS)
+	if (cred->cr_flavor >= RPC_AUTH_GSS)
 		return true;
 	return test_bit(RQ_SECURE, &rqstp->rq_flags);
 }
 
 static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp,
+					  struct svc_cred *cred,
 					  struct svc_export *exp)
 {
-	int flags = nfsexp_flags(&rqstp->rq_cred, exp);
-
 	/* Check if the request originated from a secure port. */
-	if (!nfsd_originating_port_ok(rqstp, flags)) {
+	if (rqstp && !nfsd_originating_port_ok(rqstp, cred, exp)) {
 		RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]);
 		dprintk("nfsd: request from insecure port %s!\n",
 		        svc_print_addr(rqstp, buf, sizeof(buf)));
@@ -111,7 +112,7 @@ static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp,
 	}
 
 	/* Set user creds for this exportpoint */
-	return nfserrno(nfsd_setuser(&rqstp->rq_cred, exp));
+	return nfserrno(nfsd_setuser(cred, exp));
 }
 
 static inline __be32 check_pseudo_root(struct dentry *dentry,
@@ -219,7 +220,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 		put_cred(override_creds(new));
 		put_cred(new);
 	} else {
-		error = nfsd_setuser_and_check_port(rqstp, exp);
+		error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
 		if (error)
 			goto out;
 	}
@@ -358,7 +359,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 	if (error)
 		goto out;
 
-	error = nfsd_setuser_and_check_port(rqstp, exp);
+	error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
 	if (error)
 		goto out;
 
-- 
2.44.0



* [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (4 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 05/25] NFSD: Refactor nfsd_setuser_and_check_port() Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:45   ` [PATCH v14.5 " Mike Snitzer
  2024-08-29 14:28   ` [PATCH v14 " Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 07/25] NFSD: Short-circuit fh_verify tracepoints for LOCALIO Mike Snitzer
                   ` (19 subsequent siblings)
  25 siblings, 2 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Chuck Lever <chuck.lever@oracle.com>

Currently, fh_verify() makes some daring assumptions about which
version of file handle the caller wants, based on the things it can
find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
case sometimes has no svc_rqst context, so this logic won't work in
that case.

Instead, examine the passed-in file handle. Its .max_size field
should carry information to allow nfsd_set_fh_dentry() to initialize
the file handle appropriately.

lockd appears to be the only kernel consumer that does not set the
file handle .max_size during initialization.

write_filehandle() is the other question mark, as it looks possible
to specify a maxsize between NFS_FHSIZE and NFS3_FHSIZE here.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
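The idea of keying per-version behaviour off the handle's maximum
size, rather than off rqstp->rq_vers, can be sketched in userspace as
below.  The model_* names are hypothetical, and the sketch ignores the
export-operation flags and the V4ROOT checks that the real switch
also consults.

```c
#include <assert.h>
#include <stdbool.h>

/* Maximum file handle sizes in bytes for NFSv2, NFSv3 and NFSv4. */
#define MODEL_NFS_FHSIZE   32
#define MODEL_NFS3_FHSIZE  64
#define MODEL_NFS4_FHSIZE  128

/* Hypothetical, trimmed-down subset of the svc_fh behaviour flags. */
struct model_fh_flags {
	bool no_wcc;
	bool cookies64;
	bool uninitialized;
};

/* Choose per-version behaviour from the handle's maximum size alone. */
static struct model_fh_flags model_flags_for(unsigned int maxsize)
{
	struct model_fh_flags f = { false, false, false };

	switch (maxsize) {
	case MODEL_NFS4_FHSIZE:
	case MODEL_NFS3_FHSIZE:
		f.cookies64 = true;
		break;
	case MODEL_NFS_FHSIZE:
		f.no_wcc = true;
		break;
	case 0:
		/* The patch emits WARN_ONCE() here; the model records it. */
		f.uninitialized = true;
		break;
	}
	return f;
}
```

A size that matches none of the cases (the write_filehandle() question
raised above) falls through with no flags set, which is also what the
patched switch does.
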
 fs/nfsd/lockd.c |  6 ++++--
 fs/nfsd/nfsfh.c | 11 +++++++----
 2 files changed, 11 insertions(+), 6 deletions(-)

diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c
index 46a7f9b813e5..e636d2a1e664 100644
--- a/fs/nfsd/lockd.c
+++ b/fs/nfsd/lockd.c
@@ -32,8 +32,10 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp,
 	int		access;
 	struct svc_fh	fh;
 
-	/* must initialize before using! but maxsize doesn't matter */
-	fh_init(&fh,0);
+	if (rqstp->rq_vers == 4)
+		fh_init(&fh, NFS3_FHSIZE);
+	else
+		fh_init(&fh, NFS_FHSIZE);
 	fh.fh_handle.fh_size = f->size;
 	memcpy(&fh.fh_handle.fh_raw, f->data, f->size);
 	fh.fh_export = NULL;
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 4b964a71a504..77acc26e8b02 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -267,25 +267,28 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	fhp->fh_dentry = dentry;
 	fhp->fh_export = exp;
 
-	switch (rqstp->rq_vers) {
-	case 4:
+	switch (fhp->fh_maxsize) {
+	case NFS4_FHSIZE:
 		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR)
 			fhp->fh_no_atomic_attr = true;
 		fhp->fh_64bit_cookies = true;
 		break;
-	case 3:
+	case NFS3_FHSIZE:
 		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC)
 			fhp->fh_no_wcc = true;
 		fhp->fh_64bit_cookies = true;
 		if (exp->ex_flags & NFSEXP_V4ROOT)
 			goto out;
 		break;
-	case 2:
+	case NFS_FHSIZE:
 		fhp->fh_no_wcc = true;
 		if (EX_WGATHER(exp))
 			fhp->fh_use_wgather = true;
 		if (exp->ex_flags & NFSEXP_V4ROOT)
 			goto out;
+		break;
+	case 0:
+		WARN_ONCE(1, "Uninitialized file handle");
 	}
 
 	return 0;
-- 
2.44.0



* [PATCH v14 07/25] NFSD: Short-circuit fh_verify tracepoints for LOCALIO
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (5 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 14:33   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 08/25] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
                   ` (18 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Chuck Lever <chuck.lever@oracle.com>

LOCALIO will be able to call fh_verify() with a NULL rqstp. In this
case, the existing trace points need to be skipped because they
want to dereference the address fields in the passed-in rqstp.

Temporarily make these trace points conditional to avoid a seg
fault in this case. Putting the "rqstp != NULL" check in the trace
points themselves makes the check more efficient.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
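What a conditional trace point buys here can be illustrated with a
small userspace model.  This is an assumption-laden sketch, not the
tracing infrastructure: model_trace_fh_verify_err() and its counters
are hypothetical, and the guard simply mirrors
TP_CONDITION(rqstp != NULL && error).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical request carrying one field the trace event reads. */
struct model_rqst {
	unsigned int remotelen;
};

static int model_trace_hits;		/* number of events recorded */
static unsigned int model_last_len;	/* field captured by last event */

/*
 * The condition is evaluated before any field of rqstp is read, so a
 * NULL rqstp (the LOCALIO case) never reaches the dereference.
 */
static void model_trace_fh_verify_err(const struct model_rqst *rqstp,
				      int error)
{
	if (!(rqstp != NULL && error))
		return;
	model_last_len = rqstp->remotelen;
	model_trace_hits++;
}
```
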
 fs/nfsd/trace.h | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 77bbd23aa150..d22027e23761 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -193,7 +193,7 @@ TRACE_EVENT(nfsd_compound_encode_err,
 		{ S_IFIFO,		"FIFO" }, \
 		{ S_IFSOCK,		"SOCK" })
 
-TRACE_EVENT(nfsd_fh_verify,
+TRACE_EVENT_CONDITION(nfsd_fh_verify,
 	TP_PROTO(
 		const struct svc_rqst *rqstp,
 		const struct svc_fh *fhp,
@@ -201,6 +201,7 @@ TRACE_EVENT(nfsd_fh_verify,
 		int access
 	),
 	TP_ARGS(rqstp, fhp, type, access),
+	TP_CONDITION(rqstp != NULL),
 	TP_STRUCT__entry(
 		__field(unsigned int, netns_ino)
 		__sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
@@ -239,7 +240,7 @@ TRACE_EVENT_CONDITION(nfsd_fh_verify_err,
 		__be32 error
 	),
 	TP_ARGS(rqstp, fhp, type, access, error),
-	TP_CONDITION(error),
+	TP_CONDITION(rqstp != NULL && error),
 	TP_STRUCT__entry(
 		__field(unsigned int, netns_ino)
 		__sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
@@ -295,12 +296,13 @@ DECLARE_EVENT_CLASS(nfsd_fh_err_class,
 		  __entry->status)
 )
 
-#define DEFINE_NFSD_FH_ERR_EVENT(name)		\
-DEFINE_EVENT(nfsd_fh_err_class, nfsd_##name,	\
-	TP_PROTO(struct svc_rqst *rqstp,	\
-		 struct svc_fh	*fhp,		\
-		 int		status),	\
-	TP_ARGS(rqstp, fhp, status))
+#define DEFINE_NFSD_FH_ERR_EVENT(name)			\
+DEFINE_EVENT_CONDITION(nfsd_fh_err_class, nfsd_##name,	\
+	TP_PROTO(struct svc_rqst *rqstp,		\
+		 struct svc_fh	*fhp,			\
+		 int		status),		\
+	TP_ARGS(rqstp, fhp, status),			\
+	TP_CONDITION(rqstp != NULL))
 
 DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badexport);
 DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badhandle);
-- 
2.44.0



* [PATCH v14 08/25] nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (6 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 07/25] NFSD: Short-circuit fh_verify tracepoints for LOCALIO Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 14:39   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local() Mike Snitzer
                   ` (17 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: NeilBrown <neilb@suse.de>

__fh_verify() offers an interface like fh_verify() but doesn't require
a struct svc_rqst *; instead it takes the specific parts as explicit
required arguments.  So it is safe to call __fh_verify() with a NULL
rqstp, but the net, cred, and client args must not be NULL.

__fh_verify() does not use SVC_NET(), nor do the functions it calls.

Rather than using rqstp->rq_client, pass the client and gssclient
explicitly to __fh_verify() and then to nfsd_set_fh_dentry().

Lastly, 4 associated tracepoints are only used if rqstp is not NULL
(this is a stop-gap that should be properly fixed so localio also
benefits from the utility these tracepoints provide when debugging
fh_verify issues).

Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
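The split between a core function taking explicit arguments and a thin
wrapper that unpacks the request can be sketched as follows.  All
model_* names and the trimmed-down structs are hypothetical; the real
functions take more arguments and do far more work.

```c
#include <assert.h>
#include <stddef.h>

struct model_net { int id; };
struct model_cred { int uid; };

/* Hypothetical request bundling the pieces the core needs. */
struct model_rqst {
	struct model_net *net;
	struct model_cred cred;
};

/*
 * Core check: works only from its explicit arguments, so rqstp may be
 * NULL.  Returns the uid the check would run as, or -1 if a required
 * argument is missing.
 */
static int model_verify_core(const struct model_rqst *rqstp,
			     const struct model_net *net,
			     const struct model_cred *cred)
{
	(void)rqstp;	/* would only feed optional, wire-specific checks */
	if (!net || !cred)
		return -1;
	return cred->uid;
}

/* Wrapper for callers that do have a request: unpack and delegate. */
static int model_verify(const struct model_rqst *rqstp)
{
	return model_verify_core(rqstp, rqstp->net, &rqstp->cred);
}
```

Existing callers keep the short signature, while a caller without a
request supplies net and cred itself.
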
 fs/nfsd/nfsfh.c | 90 +++++++++++++++++++++++++++++--------------------
 1 file changed, 53 insertions(+), 37 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 77acc26e8b02..80c06e170e9a 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -142,7 +142,11 @@ static inline __be32 check_pseudo_root(struct dentry *dentry,
  * dentry.  On success, the results are used to set fh_export and
  * fh_dentry.
  */
-static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
+static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
+				 struct svc_cred *cred,
+				 struct auth_domain *client,
+				 struct auth_domain *gssclient,
+				 struct svc_fh *fhp)
 {
 	struct knfsd_fh	*fh = &fhp->fh_handle;
 	struct fid *fid = NULL;
@@ -184,8 +188,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	data_left -= len;
 	if (data_left < 0)
 		return error;
-	exp = rqst_exp_find(&rqstp->rq_chandle, SVC_NET(rqstp),
-			    rqstp->rq_client, rqstp->rq_gssclient,
+	exp = rqst_exp_find(rqstp ? &rqstp->rq_chandle : NULL,
+			    net, client, gssclient,
 			    fh->fh_fsid_type, fh->fh_fsid);
 	fid = (struct fid *)(fh->fh_fsid + len);
 
@@ -220,7 +224,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 		put_cred(override_creds(new));
 		put_cred(new);
 	} else {
-		error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
+		error = nfsd_setuser_and_check_port(rqstp, cred, exp);
 		if (error)
 			goto out;
 	}
@@ -297,43 +301,21 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	return error;
 }
 
-/**
- * fh_verify - filehandle lookup and access checking
- * @rqstp: pointer to current rpc request
- * @fhp: filehandle to be verified
- * @type: expected type of object pointed to by filehandle
- * @access: type of access needed to object
- *
- * Look up a dentry from the on-the-wire filehandle, check the client's
- * access to the export, and set the current task's credentials.
- *
- * Regardless of success or failure of fh_verify(), fh_put() should be
- * called on @fhp when the caller is finished with the filehandle.
- *
- * fh_verify() may be called multiple times on a given filehandle, for
- * example, when processing an NFSv4 compound.  The first call will look
- * up a dentry using the on-the-wire filehandle.  Subsequent calls will
- * skip the lookup and just perform the other checks and possibly change
- * the current task's credentials.
- *
- * @type specifies the type of object expected using one of the S_IF*
- * constants defined in include/linux/stat.h.  The caller may use zero
- * to indicate that it doesn't care, or a negative integer to indicate
- * that it expects something not of the given type.
- *
- * @access is formed from the NFSD_MAY_* constants defined in
- * fs/nfsd/vfs.h.
- */
-__be32
-fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
+static __be32
+__fh_verify(struct svc_rqst *rqstp,
+	    struct net *net, struct svc_cred *cred,
+	    struct auth_domain *client,
+	    struct auth_domain *gssclient,
+	    struct svc_fh *fhp, umode_t type, int access)
 {
-	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	struct svc_export *exp = NULL;
 	struct dentry	*dentry;
 	__be32		error;
 
 	if (!fhp->fh_dentry) {
-		error = nfsd_set_fh_dentry(rqstp, fhp);
+		error = nfsd_set_fh_dentry(rqstp, net, cred, client,
+					   gssclient, fhp);
 		if (error)
 			goto out;
 	}
@@ -362,7 +344,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 	if (error)
 		goto out;
 
-	error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
+	error = nfsd_setuser_and_check_port(rqstp, cred, exp);
 	if (error)
 		goto out;
 
@@ -392,7 +374,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 
 skip_pseudoflavor_check:
 	/* Finally, check access permissions. */
-	error = nfsd_permission(&rqstp->rq_cred, exp, dentry, access);
+	error = nfsd_permission(cred, exp, dentry, access);
 out:
 	trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error);
 	if (error == nfserr_stale)
@@ -400,6 +382,40 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
 	return error;
 }
 
+/**
+ * fh_verify - filehandle lookup and access checking
+ * @rqstp: pointer to current rpc request
+ * @fhp: filehandle to be verified
+ * @type: expected type of object pointed to by filehandle
+ * @access: type of access needed to object
+ *
+ * Look up a dentry from the on-the-wire filehandle, check the client's
+ * access to the export, and set the current task's credentials.
+ *
+ * Regardless of success or failure of fh_verify(), fh_put() should be
+ * called on @fhp when the caller is finished with the filehandle.
+ *
+ * fh_verify() may be called multiple times on a given filehandle, for
+ * example, when processing an NFSv4 compound.  The first call will look
+ * up a dentry using the on-the-wire filehandle.  Subsequent calls will
+ * skip the lookup and just perform the other checks and possibly change
+ * the current task's credentials.
+ *
+ * @type specifies the type of object expected using one of the S_IF*
+ * constants defined in include/linux/stat.h.  The caller may use zero
+ * to indicate that it doesn't care, or a negative integer to indicate
+ * that it expects something not of the given type.
+ *
+ * @access is formed from the NFSD_MAY_* constants defined in
+ * fs/nfsd/vfs.h.
+ */
+__be32
+fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
+{
+	return __fh_verify(rqstp, SVC_NET(rqstp), &rqstp->rq_cred,
+			   rqstp->rq_client, rqstp->rq_gssclient,
+			   fhp, type, access);
+}
 
 /*
  * Compose a file handle for an NFS reply.
-- 
2.44.0



* [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local()
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (7 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 08/25] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 14:49   ` Jeff Layton
  2024-08-29 15:47   ` Chuck Lever
  2024-08-29  1:04 ` [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
                   ` (16 subsequent siblings)
  25 siblings, 2 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: NeilBrown <neilb@suse.de>

nfsd_file_acquire_local() can be used to look up a file by filehandle
without having a struct svc_rqst.  This can be used by NFS LOCALIO to
allow the NFS client to bypass the NFS protocol to directly access a
file provided by the NFS server which is running in the same kernel.

In nfsd_file_do_acquire() care is taken to always use fh_verify() if
rqstp is not NULL (as is the case for non-LOCALIO callers).  Otherwise
the non-LOCALIO callers will not supply the correct and required
arguments to __fh_verify (e.g. gssclient isn't passed).

Introduce fh_verify_local() wrapper around __fh_verify to make it
clear that LOCALIO is the intended caller.

Also, use GC for nfsd_file returned by nfsd_file_acquire_local.  GC
offers performance improvements if/when a file is reopened before
launderette cleans it from the filecache's LRU.

Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC
Signed-off-by: NeilBrown <neilb@suse.de>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
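The save/restore of the caller's credentials around the acquire path
can be pictured with a small userspace model.  This is only a sketch
under stated assumptions: a single integer stands in for the task's
cred, and model_do_acquire() is a hypothetical stand-in for the path
that ends up switching identity.

```c
#include <assert.h>

/* Stand-in for the calling task's current credential. */
static int model_current_uid = 1000;

/* Stand-in for the acquire path, which switches identity as a side
 * effect and leaves it switched on return. */
static int model_do_acquire(int export_uid)
{
	model_current_uid = export_uid;
	return 0;
}

/*
 * Local entry point: remember the caller's identity, run the acquire
 * path, then restore the identity whatever the result was.
 */
static int model_acquire_local(int export_uid)
{
	int saved = model_current_uid;
	int ret = model_do_acquire(export_uid);

	model_current_uid = saved;
	return ret;
}
```

Without the restore, the client-side task would continue running with
the identity chosen for the export.
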
 fs/nfsd/filecache.c | 71 ++++++++++++++++++++++++++++++++++++++++-----
 fs/nfsd/filecache.h |  3 ++
 fs/nfsd/nfsfh.c     | 39 +++++++++++++++++++++++++
 fs/nfsd/nfsfh.h     |  2 ++
 4 files changed, 108 insertions(+), 7 deletions(-)

diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 9e9d246f993c..2dc72de31f61 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -982,12 +982,14 @@ nfsd_file_is_cached(struct inode *inode)
 }
 
 static __be32
-nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
+nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
+		     struct svc_cred *cred,
+		     struct auth_domain *client,
+		     struct svc_fh *fhp,
 		     unsigned int may_flags, struct file *file,
 		     struct nfsd_file **pnf, bool want_gc)
 {
 	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
-	struct net *net = SVC_NET(rqstp);
 	struct nfsd_file *new, *nf;
 	bool stale_retry = true;
 	bool open_retry = true;
@@ -996,8 +998,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	int ret;
 
 retry:
-	status = fh_verify(rqstp, fhp, S_IFREG,
-				may_flags|NFSD_MAY_OWNER_OVERRIDE);
+	if (rqstp) {
+		status = fh_verify(rqstp, fhp, S_IFREG,
+				   may_flags|NFSD_MAY_OWNER_OVERRIDE);
+	} else {
+		status = fh_verify_local(net, cred, client, fhp, S_IFREG,
+					 may_flags|NFSD_MAY_OWNER_OVERRIDE);
+	}
 	if (status != nfs_ok)
 		return status;
 	inode = d_inode(fhp->fh_dentry);
@@ -1143,7 +1150,8 @@ __be32
 nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		     unsigned int may_flags, struct nfsd_file **pnf)
 {
-	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true);
+	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
+				    fhp, may_flags, NULL, pnf, true);
 }
 
 /**
@@ -1167,7 +1175,55 @@ __be32
 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct nfsd_file **pnf)
 {
-	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false);
+	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
+				    fhp, may_flags, NULL, pnf, false);
+}
+
+/**
+ * nfsd_file_acquire_local - Get a struct nfsd_file with an open file for localio
+ * @net: The network namespace in which to perform a lookup
+ * @cred: the user credential with which to validate access
+ * @client: the auth_domain for LOCALIO lookup
+ * @fhp: the NFS filehandle of the file to be opened
+ * @may_flags: NFSD_MAY_ settings for the file
+ * @pnf: OUT: new or found "struct nfsd_file" object
+ *
+ * This file lookup interface provides access to a file given the
+ * filehandle and credential.  No connection-based authorisation
+ * is performed and in that way it is quite different to other
+ * file access mediated by nfsd.  It allows a kernel module such as the NFS
+ * client to reach across network and filesystem namespaces to access
+ * a file.  The security implications of this should be carefully
+ * considered before use.
+ *
+ * The nfsd_file object returned by this API is reference-counted
+ * and garbage-collected. The object is retained for a few
+ * seconds after the final nfsd_file_put() in case the caller
+ * wants to re-use it.
+ *
+ * Return values:
+ *   %nfs_ok - @pnf points to an nfsd_file with its reference
+ *   count boosted.
+ *
+ * On error, an nfsstat value in network byte order is returned.
+ */
+__be32
+nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
+			struct auth_domain *client, struct svc_fh *fhp,
+			unsigned int may_flags, struct nfsd_file **pnf)
+{
+	/*
+	 * Save creds before calling nfsd_file_do_acquire() (which calls
+	 * nfsd_setuser). Important because caller (LOCALIO) is from
+	 * client context.
+	 */
+	const struct cred *save_cred = get_current_cred();
+	__be32 beres;
+
+	beres = nfsd_file_do_acquire(NULL, net, cred, client,
+				     fhp, may_flags, NULL, pnf, true);
+	revert_creds(save_cred);
+	return beres;
 }
 
 /**
@@ -1193,7 +1249,8 @@ nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
 			 unsigned int may_flags, struct file *file,
 			 struct nfsd_file **pnf)
 {
-	return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false);
+	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
+				    fhp, may_flags, file, pnf, false);
 }
 
 /*
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 3fbec24eea6c..26ada78b8c1e 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -66,5 +66,8 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
 __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		  unsigned int may_flags, struct file *file,
 		  struct nfsd_file **nfp);
+__be32 nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
+			       struct auth_domain *client, struct svc_fh *fhp,
+			       unsigned int may_flags, struct nfsd_file **pnf);
 int nfsd_file_cache_stats_show(struct seq_file *m, void *v);
 #endif /* _FS_NFSD_FILECACHE_H */
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 80c06e170e9a..49468e478d23 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -301,6 +301,22 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
 	return error;
 }
 
+/**
+ * __fh_verify - filehandle lookup and access checking
+ * @rqstp: RPC transaction context, or NULL
+ * @net: net namespace in which to perform the export lookup
+ * @cred: RPC user credential
+ * @client: RPC auth domain
+ * @gssclient: RPC GSS auth domain, or NULL
+ * @fhp: filehandle to be verified
+ * @type: expected type of object pointed to by filehandle
+ * @access: type of access needed to object
+ *
+ * This internal API can be used by callers who do not have an RPC
+ * transaction context (ie are not running in an nfsd thread).
+ *
+ * See fh_verify() for further descriptions of @fhp, @type, and @access.
+ */
 static __be32
 __fh_verify(struct svc_rqst *rqstp,
 	    struct net *net, struct svc_cred *cred,
@@ -382,6 +398,29 @@ __fh_verify(struct svc_rqst *rqstp,
 	return error;
 }
 
+/**
+ * fh_verify_local - filehandle lookup and access checking
+ * @net: net namespace in which to perform the export lookup
+ * @cred: RPC user credential
+ * @client: RPC auth domain
+ * @fhp: filehandle to be verified
+ * @type: expected type of object pointed to by filehandle
+ * @access: type of access needed to object
+ *
+ * This API can be used by callers who do not have an RPC
+ * transaction context (ie are not running in an nfsd thread).
+ *
+ * See fh_verify() for further descriptions of @fhp, @type, and @access.
+ */
+__be32
+fh_verify_local(struct net *net, struct svc_cred *cred,
+		struct auth_domain *client, struct svc_fh *fhp,
+		umode_t type, int access)
+{
+	return __fh_verify(NULL, net, cred, client, NULL,
+			   fhp, type, access);
+}
+
 /**
  * fh_verify - filehandle lookup and access checking
  * @rqstp: pointer to current rpc request
diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
index 8d46e203d139..5b7394801dc4 100644
--- a/fs/nfsd/nfsfh.h
+++ b/fs/nfsd/nfsfh.h
@@ -217,6 +217,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
  * Function prototypes
  */
 __be32	fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
+__be32	fh_verify_local(struct net *, struct svc_cred *, struct auth_domain *,
+			struct svc_fh *, umode_t, int);
 __be32	fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
 __be32	fh_update(struct svc_fh *);
 void	fh_put(struct svc_fh *);
-- 
2.44.0



* [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (8 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local() Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 15:49   ` Chuck Lever
  2024-08-29 15:57   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 11/25] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
                   ` (15 subsequent siblings)
  25 siblings, 2 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.

A percpu_ref is used to implement the interlock between
nfsd_destroy_serv and any caller of nfsd_serv_try_get.

This interlock is needed to properly wait for the completion of client
initiated localio calls to nfsd (that are _not_ in the context of nfsd).
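
The interlock described above can be modelled in userspace roughly as
follows.  This is only a sketch under stated assumptions: struct
serv_ref, serv_try_get(), serv_put() and serv_kill() are hypothetical
names, and C11 atomics stand in for the kernel's percpu_ref, which is
implemented quite differently (per-CPU counters plus a confirm step).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define REF_DEAD (1UL << 31)	/* set once shutdown has begun */

/* Hypothetical stand-in for nn->nfsd_serv_ref; not the kernel type. */
struct serv_ref {
	atomic_ulong state;	/* REF_DEAD flag plus a count in the low bits */
	atomic_bool released;	/* models completion of nfsd_serv_free_done */
};

static void serv_ref_init(struct serv_ref *r)
{
	atomic_init(&r->state, 1UL);	/* initial reference owned by the service */
	atomic_init(&r->released, false);
}

/* Like nfsd_serv_try_get(): fails once shutdown has started. */
static bool serv_try_get(struct serv_ref *r)
{
	unsigned long old = atomic_load(&r->state);

	do {
		if (old & REF_DEAD)
			return false;
	} while (!atomic_compare_exchange_weak(&r->state, &old, old + 1));
	return true;
}

/* Like nfsd_serv_put(): the last put after a kill releases the service. */
static void serv_put(struct serv_ref *r)
{
	unsigned long old = atomic_fetch_sub(&r->state, 1UL);

	if ((old & ~REF_DEAD) == 1)
		atomic_store(&r->released, true);
}

/* Like the kill in nfsd_destroy_serv(): refuse new gets, drop the initial ref. */
static void serv_kill(struct serv_ref *r)
{
	atomic_fetch_or(&r->state, REF_DEAD);
	serv_put(r);
}
```

In this model a shutdown path would call serv_kill() and then wait
until 'released' becomes true, which only happens after every
successful serv_try_get() has been paired with a serv_put().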

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfsd/netns.h  |  8 +++++++-
 fs/nfsd/nfssvc.c | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 238fc4e56e53..e2d953f21dde 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -13,6 +13,7 @@
 #include <linux/filelock.h>
 #include <linux/nfs4.h>
 #include <linux/percpu_counter.h>
+#include <linux/percpu-refcount.h>
 #include <linux/siphash.h>
 #include <linux/sunrpc/stats.h>
 
@@ -139,7 +140,9 @@ struct nfsd_net {
 
 	struct svc_info nfsd_info;
 #define nfsd_serv nfsd_info.serv
-
+	struct percpu_ref nfsd_serv_ref;
+	struct completion nfsd_serv_confirm_done;
+	struct completion nfsd_serv_free_done;
 
 	/*
 	 * clientid and stateid data for construction of net unique COPY
@@ -221,6 +224,9 @@ struct nfsd_net {
 extern bool nfsd_support_version(int vers);
 extern unsigned int nfsd_net_id;
 
+bool nfsd_serv_try_get(struct nfsd_net *nn);
+void nfsd_serv_put(struct nfsd_net *nn);
+
 void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
 void nfsd_reset_write_verifier(struct nfsd_net *nn);
 #endif /* __NFSD_NETNS_H__ */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index defc430f912f..e43d440f9f0a 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -193,6 +193,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
 	return 0;
 }
 
+bool nfsd_serv_try_get(struct nfsd_net *nn)
+{
+	return percpu_ref_tryget_live(&nn->nfsd_serv_ref);
+}
+
+void nfsd_serv_put(struct nfsd_net *nn)
+{
+	percpu_ref_put(&nn->nfsd_serv_ref);
+}
+
+static void nfsd_serv_done(struct percpu_ref *ref)
+{
+	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+
+	complete(&nn->nfsd_serv_confirm_done);
+}
+
+static void nfsd_serv_free(struct percpu_ref *ref)
+{
+	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+
+	complete(&nn->nfsd_serv_free_done);
+}
+
 /*
  * Maximum number of nfsd processes
  */
@@ -392,6 +416,7 @@ static void nfsd_shutdown_net(struct net *net)
 		lockd_down(net);
 		nn->lockd_up = false;
 	}
+	percpu_ref_exit(&nn->nfsd_serv_ref);
 	nn->nfsd_net_up = false;
 	nfsd_shutdown_generic();
 }
@@ -471,6 +496,13 @@ void nfsd_destroy_serv(struct net *net)
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 	struct svc_serv *serv = nn->nfsd_serv;
 
+	lockdep_assert_held(&nfsd_mutex);
+
+	percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
+	wait_for_completion(&nn->nfsd_serv_confirm_done);
+	wait_for_completion(&nn->nfsd_serv_free_done);
+	/* percpu_ref_exit is called in nfsd_shutdown_net */
+
 	spin_lock(&nfsd_notifier_lock);
 	nn->nfsd_serv = NULL;
 	spin_unlock(&nfsd_notifier_lock);
@@ -595,6 +627,13 @@ int nfsd_create_serv(struct net *net)
 	if (nn->nfsd_serv)
 		return 0;
 
+	error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
+				0, GFP_KERNEL);
+	if (error)
+		return error;
+	init_completion(&nn->nfsd_serv_free_done);
+	init_completion(&nn->nfsd_serv_confirm_done);
+
 	if (nfsd_max_blksize == 0)
 		nfsd_max_blksize = nfsd_get_default_max_blksize();
 	nfsd_reset_versions(nn);
-- 
2.44.0



* [PATCH v14 11/25] SUNRPC: remove call_allocate() BUG_ONs
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (9 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 15:58   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
                   ` (14 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

Remove BUG_ON if p_arglen=0 to allow RPC with void arg.
Remove BUG_ON if p_replen=0 to allow RPC with void return.

The former was needed for the first revision of the LOCALIO protocol
which had an RPC that took a void arg:

    /* raw RFC 9562 UUID */
    typedef u8 uuid_t<UUID_SIZE>;

    program NFS_LOCALIO_PROGRAM {
        version LOCALIO_V1 {
            void
                NULL(void) = 0;

            uuid_t
                GETUUID(void) = 1;
        } = 1;
    } = 400122;

The latter is needed for the final revision of the LOCALIO protocol
which has a UUID_IS_LOCAL RPC which returns a void:

    /* raw RFC 9562 UUID */
    typedef u8 uuid_t<UUID_SIZE>;

    program NFS_LOCALIO_PROGRAM {
        version LOCALIO_V1 {
            void
                NULL(void) = 0;

            void
                UUID_IS_LOCAL(uuid_t) = 1;
        } = 1;
    } = 400122;

There is really no value in triggering a BUG_ON in response to either
of these previously unsupported conditions.

NeilBrown would like the entire 'if (proc->p_proc != 0)' branch
removed (not just the one BUG_ON that must be removed for LOCALIO's
immediate needs of returning void).

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 net/sunrpc/clnt.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index 09f29a95f2bc..00fe6df11ab7 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1893,12 +1893,6 @@ call_allocate(struct rpc_task *task)
 	if (req->rq_buffer)
 		return;
 
-	if (proc->p_proc != 0) {
-		BUG_ON(proc->p_arglen == 0);
-		if (proc->p_decode != NULL)
-			BUG_ON(proc->p_replen == 0);
-	}
-
 	/*
 	 * Calculate the size (in quads) of the RPC call
 	 * and reply headers, and convert both values
-- 
2.44.0



* [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (10 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 11/25] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 15:50   ` Chuck Lever
  2024-08-29 16:01   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 13/25] SUNRPC: replace program list with program array Mike Snitzer
                   ` (13 subsequent siblings)
  25 siblings, 2 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Weston Andros Adamson <dros@primarydata.com>

Add new function svcauth_map_clnt_to_svc_cred_local() which maps a
generic cred to a svc_cred suitable for use in nfsd.

This is needed by the localio code to map nfs client creds to nfs
server credentials.

Following from net/sunrpc/auth_unix.c:unx_marshal() it is clear that
->fsuid and ->fsgid must be used (rather than ->uid and ->gid).  In
addition, the uid and gid must be translated with from_kuid_munged()
and from_kgid_munged() so that the local client uses the correct uid
and gid when acting as the local server.
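
As an illustration of what a "munged" translation means, here is a
small userspace sketch.  It is not the kernel implementation: struct
id_extent and id_from_kernel_munged() are hypothetical names that model
one uid_map/gid_map extent, and the fallback value assumes the default
overflow id of 65534.

```c
#include <stddef.h>
#include <stdint.h>

#define OVERFLOW_ID 65534u	/* assumed default overflowuid/overflowgid */

/* Hypothetical model of one uid_map/gid_map line. */
struct id_extent {
	uint32_t ns_first;	/* first id as seen inside the namespace */
	uint32_t kernel_first;	/* first id as the kernel sees it */
	uint32_t count;		/* number of ids covered by this extent */
};

/*
 * Translate a kernel id into the namespace's view.  Unlike a plain
 * lookup, the "munged" variant never fails: an unmapped id is
 * reported as the overflow id.
 */
static uint32_t id_from_kernel_munged(const struct id_extent *map, size_t n,
				      uint32_t kid)
{
	for (size_t i = 0; i < n; i++) {
		if (kid >= map[i].kernel_first &&
		    kid - map[i].kernel_first < map[i].count)
			return map[i].ns_first + (kid - map[i].kernel_first);
	}
	return OVERFLOW_ID;
}
```

The point of the sketch is that the result is always a usable id, so
the server side never sees an invalid uid or gid.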

Suggested-by: NeilBrown <neilb@suse.de> # to approximate unx_marshal()
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 include/linux/sunrpc/svcauth.h |  5 +++++
 net/sunrpc/svcauth.c           | 28 ++++++++++++++++++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
index 63cf6fb26dcc..2e111153f7cd 100644
--- a/include/linux/sunrpc/svcauth.h
+++ b/include/linux/sunrpc/svcauth.h
@@ -14,6 +14,7 @@
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/sunrpc/cache.h>
 #include <linux/sunrpc/gss_api.h>
+#include <linux/sunrpc/clnt.h>
 #include <linux/hash.h>
 #include <linux/stringhash.h>
 #include <linux/cred.h>
@@ -157,6 +158,10 @@ extern enum svc_auth_status svc_set_client(struct svc_rqst *rqstp);
 extern int	svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops);
 extern void	svc_auth_unregister(rpc_authflavor_t flavor);
 
+extern void	svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
+						   const struct cred *,
+						   struct svc_cred *);
+
 extern struct auth_domain *unix_domain_find(char *name);
 extern void auth_domain_put(struct auth_domain *item);
 extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c
index 93d9e949e265..55b4d2874188 100644
--- a/net/sunrpc/svcauth.c
+++ b/net/sunrpc/svcauth.c
@@ -18,6 +18,7 @@
 #include <linux/sunrpc/svcauth.h>
 #include <linux/err.h>
 #include <linux/hash.h>
+#include <linux/user_namespace.h>
 
 #include <trace/events/sunrpc.h>
 
@@ -175,6 +176,33 @@ rpc_authflavor_t svc_auth_flavor(struct svc_rqst *rqstp)
 }
 EXPORT_SYMBOL_GPL(svc_auth_flavor);
 
+/**
+ * svcauth_map_clnt_to_svc_cred_local - maps a generic cred
+ * to a svc_cred suitable for use in nfsd.
+ * @clnt: rpc_clnt associated with nfs client
+ * @cred: generic cred associated with nfs client
+ * @svc: returned svc_cred that is suitable for use in nfsd
+ */
+void svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
+					const struct cred *cred,
+					struct svc_cred *svc)
+{
+	struct user_namespace *userns = clnt->cl_cred ?
+		clnt->cl_cred->user_ns : &init_user_ns;
+
+	memset(svc, 0, sizeof(struct svc_cred));
+
+	svc->cr_uid = KUIDT_INIT(from_kuid_munged(userns, cred->fsuid));
+	svc->cr_gid = KGIDT_INIT(from_kgid_munged(userns, cred->fsgid));
+	svc->cr_flavor = clnt->cl_auth->au_flavor;
+	if (cred->group_info)
+		svc->cr_group_info = get_group_info(cred->group_info);
+	/* These aren't relevant for local (network is bypassed) */
+	svc->cr_principal = NULL;
+	svc->cr_gss_mech = NULL;
+}
+EXPORT_SYMBOL_GPL(svcauth_map_clnt_to_svc_cred_local);
+
 /**************************************************
  * 'auth_domains' are stored in a hash table indexed by name.
  * When the last reference to an 'auth_domain' is dropped,
-- 
2.44.0



* [PATCH v14 13/25] SUNRPC: replace program list with program array
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (11 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 16:02   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
                   ` (12 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: NeilBrown <neil@brown.name>

A service created with svc_create_pooled() can be given a linked list of
programs and all of these will be served.

Using a linked list makes it cumbersome when there are several programs
that can be optionally selected with CONFIG settings.  Replace the list
with an array of programs plus a count (sv_programs and sv_nprogs).

After this patch is applied, API consumers must use only
svc_create_pooled() when creating an RPC service that listens for more
than one RPC program.
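
The shape of the change can be sketched in plain C as below.  This is
a reduced illustration, not the kernel code: struct prog, find_prog()
and the WITH_ACL macro are hypothetical stand-ins for struct
svc_program, the lookup loop and the CONFIG_NFSD_V*_ACL options.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for struct svc_program. */
struct prog {
	uint32_t number;	/* RPC program number */
	const char *name;
};

/* Optional entries are simply compiled in or out of the array. */
static const struct prog programs[] = {
	{ 100003, "nfsd" },	/* NFS_PROGRAM */
#ifdef WITH_ACL
	{ 100227, "nfsacl" },	/* NFS_ACL_PROGRAM */
#endif
};

#define NPROGS (sizeof(programs) / sizeof(programs[0]))

/* Return the matching program, or NULL if the number is not served. */
static const struct prog *find_prog(const struct prog *progs, size_t nprogs,
				    uint32_t number)
{
	for (size_t i = 0; i < nprogs; i++)
		if (progs[i].number == number)
			return &progs[i];
	return NULL;
}
```

With an array there is no pg_next pointer to thread through #ifdef
blocks; the count comes from the array size, as ARRAY_SIZE() does in
the kernel.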

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfsd/nfsctl.c           |  2 +-
 fs/nfsd/nfsd.h             |  2 +-
 fs/nfsd/nfssvc.c           | 38 ++++++++++-----------
 include/linux/sunrpc/svc.h |  7 ++--
 net/sunrpc/svc.c           | 68 ++++++++++++++++++++++----------------
 net/sunrpc/svc_xprt.c      |  2 +-
 net/sunrpc/svcauth_unix.c  |  3 +-
 7 files changed, 67 insertions(+), 55 deletions(-)

diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 1c9e5b4bcb0a..64c1b4d649bc 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2246,7 +2246,7 @@ static __net_init int nfsd_net_init(struct net *net)
 	if (retval)
 		goto out_repcache_error;
 	memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
-	nn->nfsd_svcstats.program = &nfsd_program;
+	nn->nfsd_svcstats.program = &nfsd_programs[0];
 	for (i = 0; i < sizeof(nn->nfsd_versions); i++)
 		nn->nfsd_versions[i] = nfsd_support_version(i);
 	for (i = 0; i < sizeof(nn->nfsd4_minorversions); i++)
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 4ccbf014a2c7..b0d3e82d6dcd 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -85,7 +85,7 @@ struct nfsd_genl_rqstp {
 	u32			rq_opnum[NFSD_MAX_OPS_PER_COMPOUND];
 };
 
-extern struct svc_program	nfsd_program;
+extern struct svc_program	nfsd_programs[];
 extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
 extern struct mutex		nfsd_mutex;
 extern spinlock_t		nfsd_drc_lock;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index e43d440f9f0a..c639fbe4d8c2 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -35,7 +35,6 @@
 #define NFSDDBG_FACILITY	NFSDDBG_SVC
 
 atomic_t			nfsd_th_cnt = ATOMIC_INIT(0);
-extern struct svc_program	nfsd_program;
 static int			nfsd(void *vrqstp);
 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
 static int			nfsd_acl_rpcbind_set(struct net *,
@@ -90,20 +89,9 @@ static const struct svc_version *nfsd_acl_version[] = {
 # endif
 };
 
-#define NFSD_ACL_MINVERS            2
+#define NFSD_ACL_MINVERS	2
 #define NFSD_ACL_NRVERS		ARRAY_SIZE(nfsd_acl_version)
 
-static struct svc_program	nfsd_acl_program = {
-	.pg_prog		= NFS_ACL_PROGRAM,
-	.pg_nvers		= NFSD_ACL_NRVERS,
-	.pg_vers		= nfsd_acl_version,
-	.pg_name		= "nfsacl",
-	.pg_class		= "nfsd",
-	.pg_authenticate	= &svc_set_client,
-	.pg_init_request	= nfsd_acl_init_request,
-	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
-};
-
 #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
 
 static const struct svc_version *nfsd_version[NFSD_MAXVERS+1] = {
@@ -116,18 +104,29 @@ static const struct svc_version *nfsd_version[NFSD_MAXVERS+1] = {
 #endif
 };
 
-struct svc_program		nfsd_program = {
-#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
-	.pg_next		= &nfsd_acl_program,
-#endif
+struct svc_program		nfsd_programs[] = {
+	{
 	.pg_prog		= NFS_PROGRAM,		/* program number */
 	.pg_nvers		= NFSD_MAXVERS+1,	/* nr of entries in nfsd_version */
 	.pg_vers		= nfsd_version,		/* version table */
 	.pg_name		= "nfsd",		/* program name */
 	.pg_class		= "nfsd",		/* authentication class */
-	.pg_authenticate	= &svc_set_client,	/* export authentication */
+	.pg_authenticate	= svc_set_client,	/* export authentication */
 	.pg_init_request	= nfsd_init_request,
 	.pg_rpcbind_set		= nfsd_rpcbind_set,
+	},
+#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
+	{
+	.pg_prog		= NFS_ACL_PROGRAM,
+	.pg_nvers		= NFSD_ACL_NRVERS,
+	.pg_vers		= nfsd_acl_version,
+	.pg_name		= "nfsacl",
+	.pg_class		= "nfsd",
+	.pg_authenticate	= svc_set_client,
+	.pg_init_request	= nfsd_acl_init_request,
+	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
+	},
+#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
 };
 
 bool nfsd_support_version(int vers)
@@ -637,7 +636,8 @@ int nfsd_create_serv(struct net *net)
 	if (nfsd_max_blksize == 0)
 		nfsd_max_blksize = nfsd_get_default_max_blksize();
 	nfsd_reset_versions(nn);
-	serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
+	serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs),
+				 &nn->nfsd_svcstats,
 				 nfsd_max_blksize, nfsd);
 	if (serv == NULL)
 		return -ENOMEM;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 437672bcaa22..c7ad2fb2a155 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -67,9 +67,10 @@ enum {
  * We currently do not support more than one RPC program per daemon.
  */
 struct svc_serv {
-	struct svc_program *	sv_program;	/* RPC program */
+	struct svc_program *	sv_programs;	/* RPC programs */
 	struct svc_stat *	sv_stats;	/* RPC statistics */
 	spinlock_t		sv_lock;
+	unsigned int		sv_nprogs;	/* Number of sv_programs */
 	unsigned int		sv_nrthreads;	/* # of server threads */
 	unsigned int		sv_maxconn;	/* max connections allowed or
 						 * '0' causing max to be based
@@ -357,10 +358,9 @@ struct svc_process_info {
 };
 
 /*
- * List of RPC programs on the same transport endpoint
+ * RPC program - an array of these can use the same transport endpoint
  */
 struct svc_program {
-	struct svc_program *	pg_next;	/* other programs (same xprt) */
 	u32			pg_prog;	/* program number */
 	unsigned int		pg_lovers;	/* lowest version */
 	unsigned int		pg_hivers;	/* highest version */
@@ -438,6 +438,7 @@ bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
 void		   svc_rqst_release_pages(struct svc_rqst *rqstp);
 void		   svc_exit_thread(struct svc_rqst *);
 struct svc_serv *  svc_create_pooled(struct svc_program *prog,
+				     unsigned int nprog,
 				     struct svc_stat *stats,
 				     unsigned int bufsize,
 				     int (*threadfn)(void *data));
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index ff6f3e35b36d..b33386d249c2 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -440,10 +440,11 @@ EXPORT_SYMBOL_GPL(svc_rpcb_cleanup);
 
 static int svc_uses_rpcbind(struct svc_serv *serv)
 {
-	struct svc_program	*progp;
-	unsigned int		i;
+	unsigned int		p, i;
+
+	for (p = 0; p < serv->sv_nprogs; p++) {
+		struct svc_program *progp = &serv->sv_programs[p];
 
-	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
 		for (i = 0; i < progp->pg_nvers; i++) {
 			if (progp->pg_vers[i] == NULL)
 				continue;
@@ -480,7 +481,7 @@ __svc_init_bc(struct svc_serv *serv)
  * Create an RPC service
  */
 static struct svc_serv *
-__svc_create(struct svc_program *prog, struct svc_stat *stats,
+__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats,
 	     unsigned int bufsize, int npools, int (*threadfn)(void *data))
 {
 	struct svc_serv	*serv;
@@ -491,7 +492,8 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
 	if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
 		return NULL;
 	serv->sv_name      = prog->pg_name;
-	serv->sv_program   = prog;
+	serv->sv_programs  = prog;
+	serv->sv_nprogs    = nprogs;
 	serv->sv_stats     = stats;
 	if (bufsize > RPCSVC_MAXPAYLOAD)
 		bufsize = RPCSVC_MAXPAYLOAD;
@@ -499,17 +501,18 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
 	serv->sv_max_mesg  = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
 	serv->sv_threadfn = threadfn;
 	xdrsize = 0;
-	while (prog) {
-		prog->pg_lovers = prog->pg_nvers-1;
-		for (vers=0; vers<prog->pg_nvers ; vers++)
-			if (prog->pg_vers[vers]) {
-				prog->pg_hivers = vers;
-				if (prog->pg_lovers > vers)
-					prog->pg_lovers = vers;
-				if (prog->pg_vers[vers]->vs_xdrsize > xdrsize)
-					xdrsize = prog->pg_vers[vers]->vs_xdrsize;
+	for (i = 0; i < nprogs; i++) {
+		struct svc_program *progp = &prog[i];
+
+		progp->pg_lovers = progp->pg_nvers-1;
+		for (vers = 0; vers < progp->pg_nvers ; vers++)
+			if (progp->pg_vers[vers]) {
+				progp->pg_hivers = vers;
+				if (progp->pg_lovers > vers)
+					progp->pg_lovers = vers;
+				if (progp->pg_vers[vers]->vs_xdrsize > xdrsize)
+					xdrsize = progp->pg_vers[vers]->vs_xdrsize;
 			}
-		prog = prog->pg_next;
 	}
 	serv->sv_xdrsize   = xdrsize;
 	INIT_LIST_HEAD(&serv->sv_tempsocks);
@@ -558,13 +561,14 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
 struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
 			    int (*threadfn)(void *data))
 {
-	return __svc_create(prog, NULL, bufsize, 1, threadfn);
+	return __svc_create(prog, 1, NULL, bufsize, 1, threadfn);
 }
 EXPORT_SYMBOL_GPL(svc_create);
 
 /**
  * svc_create_pooled - Create an RPC service with pooled threads
- * @prog: the RPC program the new service will handle
+ * @prog:  Array of RPC programs the new service will handle
+ * @nprogs: Number of programs in the array
  * @stats: the stats struct if desired
  * @bufsize: maximum message size for @prog
  * @threadfn: a function to service RPC requests for @prog
@@ -572,6 +576,7 @@ EXPORT_SYMBOL_GPL(svc_create);
  * Returns an instantiated struct svc_serv object or NULL.
  */
 struct svc_serv *svc_create_pooled(struct svc_program *prog,
+				   unsigned int nprogs,
 				   struct svc_stat *stats,
 				   unsigned int bufsize,
 				   int (*threadfn)(void *data))
@@ -579,7 +584,7 @@ struct svc_serv *svc_create_pooled(struct svc_program *prog,
 	struct svc_serv *serv;
 	unsigned int npools = svc_pool_map_get();
 
-	serv = __svc_create(prog, stats, bufsize, npools, threadfn);
+	serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn);
 	if (!serv)
 		goto out_err;
 	serv->sv_is_pooled = true;
@@ -602,16 +607,16 @@ svc_destroy(struct svc_serv **servp)
 
 	*servp = NULL;
 
-	dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
+	dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name);
 	timer_shutdown_sync(&serv->sv_temptimer);
 
 	/*
 	 * Remaining transports at this point are not expected.
 	 */
 	WARN_ONCE(!list_empty(&serv->sv_permsocks),
-		  "SVC: permsocks remain for %s\n", serv->sv_program->pg_name);
+		  "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name);
 	WARN_ONCE(!list_empty(&serv->sv_tempsocks),
-		  "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name);
+		  "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name);
 
 	cache_clean_deferred(serv);
 
@@ -1149,15 +1154,16 @@ int svc_register(const struct svc_serv *serv, struct net *net,
 		 const int family, const unsigned short proto,
 		 const unsigned short port)
 {
-	struct svc_program	*progp;
-	unsigned int		i;
+	unsigned int		p, i;
 	int			error = 0;
 
 	WARN_ON_ONCE(proto == 0 && port == 0);
 	if (proto == 0 && port == 0)
 		return -EINVAL;
 
-	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+	for (p = 0; p < serv->sv_nprogs; p++) {
+		struct svc_program *progp = &serv->sv_programs[p];
+
 		for (i = 0; i < progp->pg_nvers; i++) {
 
 			error = progp->pg_rpcbind_set(net, progp, i,
@@ -1209,13 +1215,14 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi
 static void svc_unregister(const struct svc_serv *serv, struct net *net)
 {
 	struct sighand_struct *sighand;
-	struct svc_program *progp;
 	unsigned long flags;
-	unsigned int i;
+	unsigned int p, i;
 
 	clear_thread_flag(TIF_SIGPENDING);
 
-	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+	for (p = 0; p < serv->sv_nprogs; p++) {
+		struct svc_program *progp = &serv->sv_programs[p];
+
 		for (i = 0; i < progp->pg_nvers; i++) {
 			if (progp->pg_vers[i] == NULL)
 				continue;
@@ -1321,7 +1328,7 @@ svc_process_common(struct svc_rqst *rqstp)
 	struct svc_process_info process;
 	enum svc_auth_status	auth_res;
 	unsigned int		aoffset;
-	int			rc;
+	int			pr, rc;
 	__be32			*p;
 
 	/* Will be turned off only when NFSv4 Sessions are used */
@@ -1345,9 +1352,12 @@ svc_process_common(struct svc_rqst *rqstp)
 	rqstp->rq_vers = be32_to_cpup(p++);
 	rqstp->rq_proc = be32_to_cpup(p);
 
-	for (progp = serv->sv_program; progp; progp = progp->pg_next)
+	for (pr = 0; pr < serv->sv_nprogs; pr++) {
+		progp = &serv->sv_programs[pr];
 		if (rqstp->rq_prog == progp->pg_prog)
 			break;
+		progp = NULL;	/* no match: keep the later NULL check working */
+	}
 
 	/*
 	 * Decode auth data, and add verifier to reply buffer.
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 53ebc719ff5a..43c57124de52 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -268,7 +268,7 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name,
 		spin_unlock(&svc_xprt_class_lock);
 		newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags);
 		if (IS_ERR(newxprt)) {
-			trace_svc_xprt_create_err(serv->sv_program->pg_name,
+			trace_svc_xprt_create_err(serv->sv_programs->pg_name,
 						  xcl->xcl_name, sap, len,
 						  newxprt);
 			module_put(xcl->xcl_owner);
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 04b45588ae6f..8ca98b146ec8 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -697,7 +697,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
 	rqstp->rq_auth_stat = rpc_autherr_badcred;
 	ipm = ip_map_cached_get(xprt);
 	if (ipm == NULL)
-		ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
+		ipm = __ip_map_lookup(sn->ip_map_cache,
+				      rqstp->rq_server->sv_programs->pg_class,
 				    &sin6->sin6_addr);
 
 	if (ipm == NULL)
-- 
2.44.0



* [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (12 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 13/25] SUNRPC: replace program list with program array Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 16:07   ` Jeff Layton
  2024-08-29 23:39   ` NeilBrown
  2024-08-29  1:04 ` [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces Mike Snitzer
                   ` (11 subsequent siblings)
  25 siblings, 2 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
client to generate a nonce (single-use UUID) and an associated
short-lived nfs_uuid_t struct, and to register it with nfs_common for
subsequent lookup and verification by the NFS server.  If the UUID
matches, the NFS server populates members in the nfs_uuid_t struct.

nfs_common's nfs_uuids list is the basis for localio enablement.  Each
nfs_uuid_t on it has members that point to nfsd memory for direct use
by the client (e.g. 'net' is the server's network namespace; through
it the client can access nn->nfsd_serv with proper rcu read access).
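
The handshake can be pictured with the following userspace sketch.  It
is a simplified, single-threaded model under stated assumptions: struct
uuid_entry, uuid_begin(), uuid_is_local() and uuid_end() are
hypothetical stand-ins for nfs_uuid_t and the nfs_uuid_*() interfaces,
and it omits the mutex, RCU and reference counting the real code uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical stand-in for nfs_uuid_t. */
struct uuid_entry {
	unsigned char uuid[UUID_LEN];
	const void *net;		/* filled in by the server on a match */
	struct uuid_entry *next;
};

static struct uuid_entry *uuid_list;	/* stand-in for the nfs_uuids list */

/* Client side: publish a fresh nonce before sending UUID_IS_LOCAL. */
static void uuid_begin(struct uuid_entry *e, const unsigned char *nonce)
{
	memcpy(e->uuid, nonce, UUID_LEN);
	e->net = NULL;
	e->next = uuid_list;
	uuid_list = e;
}

/* Server side: a match means client and server share this kernel. */
static bool uuid_is_local(const unsigned char *uuid, const void *server_net)
{
	for (struct uuid_entry *e = uuid_list; e; e = e->next) {
		if (memcmp(e->uuid, uuid, UUID_LEN) == 0) {
			e->net = server_net;
			return true;
		}
	}
	return false;
}

/* Client side: retire the nonce once the RPC has completed. */
static void uuid_end(struct uuid_entry *e)
{
	for (struct uuid_entry **p = &uuid_list; *p; p = &(*p)->next) {
		if (*p == e) {
			*p = e->next;
			return;
		}
	}
}
```

A remote server could never find the nonce in its own list, so only a
server sharing the client's kernel can fill in 'net'.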

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs_common/Makefile     |  3 ++
 fs/nfs_common/nfslocalio.c | 74 ++++++++++++++++++++++++++++++++++++++
 include/linux/nfslocalio.h | 31 ++++++++++++++++
 3 files changed, 108 insertions(+)
 create mode 100644 fs/nfs_common/nfslocalio.c
 create mode 100644 include/linux/nfslocalio.h

diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index e58b01bb8dda..a5e54809701e 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -6,6 +6,9 @@
 obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
 nfs_acl-objs := nfsacl.o
 
+obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
+nfs_localio-objs := nfslocalio.o
+
 obj-$(CONFIG_GRACE_PERIOD) += grace.o
 obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
 
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
new file mode 100644
index 000000000000..1a35a4a6dbe0
--- /dev/null
+++ b/fs/nfs_common/nfslocalio.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/rculist.h>
+#include <linux/nfslocalio.h>
+#include <net/netns/generic.h>
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NFS localio protocol bypass support");
+
+DEFINE_MUTEX(nfs_uuid_mutex);
+
+/*
+ * Global list of nfs_uuid_t instances, add/remove
+ * is protected by nfs_uuid_mutex.
+ * Reads are protected by RCU read lock (see below).
+ */
+LIST_HEAD(nfs_uuids);
+
+void nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
+{
+	nfs_uuid->net = NULL;
+	nfs_uuid->dom = NULL;
+	uuid_gen(&nfs_uuid->uuid);
+
+	mutex_lock(&nfs_uuid_mutex);
+	list_add_tail_rcu(&nfs_uuid->list, &nfs_uuids);
+	mutex_unlock(&nfs_uuid_mutex);
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_begin);
+
+void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
+{
+	mutex_lock(&nfs_uuid_mutex);
+	list_del_rcu(&nfs_uuid->list);
+	mutex_unlock(&nfs_uuid_mutex);
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_end);
+
+/* Must be called with RCU read lock held. */
+static nfs_uuid_t * nfs_uuid_lookup(const uuid_t *uuid)
+{
+	nfs_uuid_t *nfs_uuid;
+
+	list_for_each_entry_rcu(nfs_uuid, &nfs_uuids, list)
+		if (uuid_equal(&nfs_uuid->uuid, uuid))
+			return nfs_uuid;
+
+	return NULL;
+}
+
+bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *dom)
+{
+	bool is_local = false;
+	nfs_uuid_t *nfs_uuid;
+
+	rcu_read_lock();
+	nfs_uuid = nfs_uuid_lookup(uuid);
+	if (nfs_uuid) {
+		nfs_uuid->net = maybe_get_net(net);
+		if (nfs_uuid->net) {
+			is_local = true;
+			kref_get(&dom->ref);
+			nfs_uuid->dom = dom;
+		}
+	}
+	rcu_read_unlock();
+
+	return is_local;
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
new file mode 100644
index 000000000000..9735ae8d3e5e
--- /dev/null
+++ b/include/linux/nfslocalio.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#ifndef __LINUX_NFSLOCALIO_H
+#define __LINUX_NFSLOCALIO_H
+
+#include <linux/list.h>
+#include <linux/uuid.h>
+#include <linux/sunrpc/svcauth.h>
+#include <linux/nfs.h>
+#include <net/net_namespace.h>
+
+/*
+ * Useful to allow a client to negotiate whether localio
+ * is possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
+ */
+typedef struct {
+	uuid_t uuid;
+	struct list_head list;
+	struct net *net; /* nfsd's network namespace */
+	struct auth_domain *dom; /* auth_domain for localio */
+} nfs_uuid_t;
+
+void nfs_uuid_begin(nfs_uuid_t *);
+void nfs_uuid_end(nfs_uuid_t *);
+bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
+
+#endif  /* __LINUX_NFSLOCALIO_H */
-- 
2.44.0



* [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (13 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 16:40   ` Jeff Layton
  2024-08-30  5:46   ` NeilBrown
  2024-08-29  1:04 ` [PATCH v14 16/25] nfsd: add localio support Mike Snitzer
                   ` (10 subsequent siblings)
  25 siblings, 2 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

Introduce struct nfs_localio_ctx and the interfaces
nfs_localio_ctx_alloc() and nfs_localio_ctx_free().  The next commit
will introduce nfsd_open_local_fh() which returns a nfs_localio_ctx
structure.

Also, expose localio's required NFSD symbols to NFS client:
- Cache nfsd_open_local_fh symbol and other required NFSD symbols in a
  globally accessible 'nfs_to' nfs_to_nfsd_t struct.  Add interfaces
  get_nfs_to_nfsd_symbols() and put_nfs_to_nfsd_symbols() to allow
  each NFS client to take a reference on NFSD symbols.

- Apologies for the DEFINE_NFS_TO_NFSD_SYMBOL macro that makes
  defining get_##NFSD_SYMBOL() and put_##NFSD_SYMBOL() functions far
  simpler (and avoids cut-n-paste bugs, which is what motivated the
  development and use of a macro for this). But as C macros go it is a
  very simple one and there are many like it all over the kernel.

- Given the unique nature of NFS LOCALIO being an optional feature
  that when used requires NFS share access to NFSD memory: a unique
  bridging of NFSD resources to NFS (via nfs_common) is needed.  But
  that bridge must be dynamic, hence the use of symbol_request() and
  symbol_put().  Proposed ideas to accomplish the same without using
  symbol_{request,put} would be far more tedious to implement and
  very likely no easier to review.  Anyway: sorry NeilBrown...

- Despite the use of indirect function calls, caching these nfsd
  symbols for use by the client offers a ~10% performance win
  (compared to always doing get+call+put) for high IOPS workloads.

- Introduce nfsd_file_file() wrapper that provides access to
  nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
  client (as suggested by Jeff Layton).

- The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
  symbols prepares for the NFS client to use nfsd_file for localio.
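
The get/put caching described above can be illustrated with a small
userspace sketch.  It is not the code in this patch: every demo_* name
is hypothetical, demo_symbol_request() merely stands in for
symbol_request(), and the spinlock that protects the real 'nfs_to' is
left out to keep the example single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an nfsd function resolved at run time. */
static int demo_file_get(int x)
{
	return x + 1;
}

typedef int (*demo_file_get_t)(int);

/* Hypothetical cache, loosely modelled on the 'nfs_to' struct. */
struct demo_symbols {
	unsigned int ref;		/* users currently holding the symbols */
	demo_file_get_t file_get;	/* cached pointer, NULL when not held */
};

static struct demo_symbols demo_to;
static bool demo_provider_loaded = true;	/* false simulates nfsd absent */
static unsigned int demo_lookups;		/* counts simulated lookups */

/* Simulates symbol_request(): NULL if the providing module is absent. */
static demo_file_get_t demo_symbol_request(void)
{
	demo_lookups++;
	return demo_provider_loaded ? demo_file_get : NULL;
}

/* Take a reference; only the first user pays for the lookup. */
static bool demo_get_symbols(void)
{
	if (demo_to.ref > 0) {
		demo_to.ref++;
		return true;
	}
	demo_to.file_get = demo_symbol_request();
	if (!demo_to.file_get)
		return false;	/* ref stays 0 so a later get retries */
	demo_to.ref = 1;
	return true;
}

/* Drop a reference; the last user clears the cached pointer. */
static void demo_put_symbols(void)
{
	assert(demo_to.ref > 0);
	if (--demo_to.ref == 0)
		demo_to.file_get = NULL;
}
```

In this sketch a second demo_get_symbols() call only bumps the count,
so callers can use demo_to.file_get directly between get and put
without repeating the lookup on every call.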

Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs_common/nfslocalio.c | 159 +++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.c        |  25 ++++++
 fs/nfsd/filecache.h        |   1 +
 fs/nfsd/nfssvc.c           |   5 ++
 include/linux/nfslocalio.h |  38 +++++++++
 5 files changed, 228 insertions(+)

diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 1a35a4a6dbe0..cc30fdb0cb46 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -72,3 +72,162 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
 	return is_local;
 }
 EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
+
+/*
+ * The nfs localio code needs to call into nfsd using various symbols (below),
+ * but cannot be statically linked, because that will make the nfs module
+ * depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
+ * nfs_common module will only hold a reference on nfsd when localio is in use.
+ * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
+nfs_to_nfsd_t nfs_to;
+EXPORT_SYMBOL_GPL(nfs_to);
+
+/* Macro to define nfs_to get and put methods, avoids copy-n-paste bugs */
+#define DEFINE_NFS_TO_NFSD_SYMBOL(NFSD_SYMBOL)		\
+static nfs_to_##NFSD_SYMBOL##_t get_##NFSD_SYMBOL(void)	\
+{							\
+	return symbol_request(NFSD_SYMBOL);		\
+}							\
+static void put_##NFSD_SYMBOL(void)			\
+{							\
+	symbol_put(NFSD_SYMBOL);			\
+	nfs_to.NFSD_SYMBOL = NULL;			\
+}
+
+/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
+extern struct nfs_localio_ctx *
+nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
+		   const struct cred *, const struct nfs_fh *, const fmode_t);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
+
+/* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
+extern struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_get);
+
+/* The nfs localio code needs to call into nfsd to release the nfsd_file */
+extern void nfsd_file_put(struct nfsd_file *nf);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_put);
+
+/* The nfs localio code needs to call into nfsd to access the nf->nf_file */
+extern struct file * nfsd_file_file(struct nfsd_file *nf);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_file);
+
+/* The nfs localio code needs to call into nfsd to release nn->nfsd_serv */
+extern void nfsd_serv_put(struct nfsd_net *nn);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_serv_put);
+#undef DEFINE_NFS_TO_NFSD_SYMBOL
+
+static struct kmem_cache *nfs_localio_ctx_cache;
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void)
+{
+	return kmem_cache_alloc(nfs_localio_ctx_cache,
+				GFP_KERNEL | __GFP_ZERO);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_alloc);
+
+void nfs_localio_ctx_free(struct nfs_localio_ctx *localio)
+{
+	if (localio->nf)
+		nfs_to.nfsd_file_put(localio->nf);
+	if (localio->nn)
+		nfs_to.nfsd_serv_put(localio->nn);
+	kmem_cache_free(nfs_localio_ctx_cache, localio);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_free);
+
+bool get_nfs_to_nfsd_symbols(void)
+{
+	spin_lock(&nfs_to_nfsd_lock);
+
+	/* Only get symbols on first reference */
+	if (refcount_read(&nfs_to.ref) == 0)
+		refcount_set(&nfs_to.ref, 1);
+	else {
+		refcount_inc(&nfs_to.ref);
+		spin_unlock(&nfs_to_nfsd_lock);
+		return true;
+	}
+
+	nfs_to.nfsd_open_local_fh = get_nfsd_open_local_fh();
+	if (!nfs_to.nfsd_open_local_fh)
+		goto out_nfsd_open_local_fh;
+
+	nfs_to.nfsd_file_get = get_nfsd_file_get();
+	if (!nfs_to.nfsd_file_get)
+		goto out_nfsd_file_get;
+
+	nfs_to.nfsd_file_put = get_nfsd_file_put();
+	if (!nfs_to.nfsd_file_put)
+		goto out_nfsd_file_put;
+
+	nfs_to.nfsd_file_file = get_nfsd_file_file();
+	if (!nfs_to.nfsd_file_file)
+		goto out_nfsd_file_file;
+
+	nfs_to.nfsd_serv_put = get_nfsd_serv_put();
+	if (!nfs_to.nfsd_serv_put)
+		goto out_nfsd_serv_put;
+
+	spin_unlock(&nfs_to_nfsd_lock);
+	return true;
+
+out_nfsd_serv_put:
+	put_nfsd_file_file();
+out_nfsd_file_file:
+	put_nfsd_file_put();
+out_nfsd_file_put:
+	put_nfsd_file_get();
+out_nfsd_file_get:
+	put_nfsd_open_local_fh();
+out_nfsd_open_local_fh:
+	spin_unlock(&nfs_to_nfsd_lock);
+	return false;
+}
+EXPORT_SYMBOL_GPL(get_nfs_to_nfsd_symbols);
+
+void put_nfs_to_nfsd_symbols(void)
+{
+	spin_lock(&nfs_to_nfsd_lock);
+
+	if (!refcount_dec_and_test(&nfs_to.ref))
+		goto out;
+
+	put_nfsd_open_local_fh();
+	put_nfsd_file_get();
+	put_nfsd_file_put();
+	put_nfsd_file_file();
+	put_nfsd_serv_put();
+out:
+	spin_unlock(&nfs_to_nfsd_lock);
+}
+EXPORT_SYMBOL_GPL(put_nfs_to_nfsd_symbols);
+
+static int __init nfslocalio_init(void)
+{
+	refcount_set(&nfs_to.ref, 0);
+
+	nfs_to.nfsd_open_local_fh = NULL;
+	nfs_to.nfsd_file_get = NULL;
+	nfs_to.nfsd_file_put = NULL;
+	nfs_to.nfsd_file_file = NULL;
+	nfs_to.nfsd_serv_put = NULL;
+
+	nfs_localio_ctx_cache = KMEM_CACHE(nfs_localio_ctx, 0);
+	if (!nfs_localio_ctx_cache)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __exit nfslocalio_exit(void)
+{
+	kmem_cache_destroy(nfs_localio_ctx_cache);
+}
+
+module_init(nfslocalio_init);
+module_exit(nfslocalio_exit);
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2dc72de31f61..a83d469bca6b 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -39,6 +39,7 @@
 #include <linux/fsnotify.h>
 #include <linux/seq_file.h>
 #include <linux/rhashtable.h>
+#include <linux/nfslocalio.h>
 
 #include "vfs.h"
 #include "nfsd.h"
@@ -345,6 +346,10 @@ nfsd_file_get(struct nfsd_file *nf)
 		return nf;
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(nfsd_file_get);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_file_get_t __maybe_unused nfsd_file_get_typecheck = nfsd_file_get;
 
 /**
  * nfsd_file_put - put the reference to a nfsd_file
@@ -389,6 +394,26 @@ nfsd_file_put(struct nfsd_file *nf)
 	if (refcount_dec_and_test(&nf->nf_ref))
 		nfsd_file_free(nf);
 }
+EXPORT_SYMBOL_GPL(nfsd_file_put);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_file_put_t __maybe_unused nfsd_file_put_typecheck = nfsd_file_put;
+
+/**
+ * nfsd_file_file - get the backing file of an nfsd_file
+ * @nf: nfsd_file of which to access the backing file.
+ *
+ * Return backing file for @nf.
+ */
+struct file *
+nfsd_file_file(struct nfsd_file *nf)
+{
+	return nf->nf_file;
+}
+EXPORT_SYMBOL_GPL(nfsd_file_file);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_file_file_t __maybe_unused nfsd_file_file_typecheck = nfsd_file_file;
 
 static void
 nfsd_file_dispose_list(struct list_head *dispose)
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 26ada78b8c1e..6fbbb2e32e95 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -56,6 +56,7 @@ int nfsd_file_cache_start_net(struct net *net);
 void nfsd_file_cache_shutdown_net(struct net *net);
 void nfsd_file_put(struct nfsd_file *nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+struct file *nfsd_file_file(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 void nfsd_file_net_dispose(struct nfsd_net *nn);
 bool nfsd_file_is_cached(struct inode *inode);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c639fbe4d8c2..13c69aa40d1c 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
 #include <linux/sunrpc/svc_xprt.h>
 #include <linux/lockd/bind.h>
 #include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
 #include <linux/seq_file.h>
 #include <linux/inetdevice.h>
 #include <net/addrconf.h>
@@ -201,6 +202,10 @@ void nfsd_serv_put(struct nfsd_net *nn)
 {
 	percpu_ref_put(&nn->nfsd_serv_ref);
 }
+EXPORT_SYMBOL_GPL(nfsd_serv_put);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_serv_put_t __maybe_unused nfsd_serv_put_typecheck = nfsd_serv_put;
 
 static void nfsd_serv_done(struct percpu_ref *ref)
 {
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 9735ae8d3e5e..68f5b39f1940 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -7,6 +7,8 @@
 
 #include <linux/list.h>
 #include <linux/uuid.h>
+#include <linux/refcount.h>
+#include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/svcauth.h>
 #include <linux/nfs.h>
 #include <net/net_namespace.h>
@@ -28,4 +30,40 @@ void nfs_uuid_begin(nfs_uuid_t *);
 void nfs_uuid_end(nfs_uuid_t *);
 bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
 
+struct nfsd_file;
+struct nfsd_net;
+
+struct nfs_localio_ctx {
+	struct nfsd_file *nf;
+	struct nfsd_net *nn;
+};
+
+typedef struct nfs_localio_ctx *
+(*nfs_to_nfsd_open_local_fh_t)(struct net *, struct auth_domain *,
+			       struct rpc_clnt *, const struct cred *,
+			       const struct nfs_fh *, const fmode_t);
+typedef struct nfsd_file * (*nfs_to_nfsd_file_get_t)(struct nfsd_file *);
+typedef void (*nfs_to_nfsd_file_put_t)(struct nfsd_file *);
+typedef struct file * (*nfs_to_nfsd_file_file_t)(struct nfsd_file *);
+typedef unsigned int (*nfs_to_nfsd_net_id_value_t)(void);
+typedef void (*nfs_to_nfsd_serv_put_t)(struct nfsd_net *);
+
+typedef struct {
+	refcount_t			ref;
+	nfs_to_nfsd_open_local_fh_t	nfsd_open_local_fh;
+	nfs_to_nfsd_file_get_t		nfsd_file_get;
+	nfs_to_nfsd_file_put_t		nfsd_file_put;
+	nfs_to_nfsd_file_file_t		nfsd_file_file;
+	nfs_to_nfsd_net_id_value_t	nfsd_net_id_value;
+	nfs_to_nfsd_serv_put_t		nfsd_serv_put;
+} nfs_to_nfsd_t;
+
+extern nfs_to_nfsd_t nfs_to;
+
+bool get_nfs_to_nfsd_symbols(void);
+void put_nfs_to_nfsd_symbols(void);
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void);
+void nfs_localio_ctx_free(struct nfs_localio_ctx *);
+
 #endif  /* __LINUX_NFSLOCALIO_H */
-- 
2.44.0



* [PATCH v14 16/25] nfsd: add localio support
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (14 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 16:01   ` Chuck Lever
  2024-08-29 16:49   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
                   ` (9 subsequent siblings)
  25 siblings, 2 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Weston Andros Adamson <dros@primarydata.com>

Add server support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when both the client and server are
running on the same host.

If nfsd_open_local_fh() fails then the NFS client will both retry and
fall back to normal network-based read, write and commit operations if
localio is no longer supported.

Care is taken to ensure the same NFS security mechanisms are used
(authentication, etc) regardless of whether localio or regular NFS
access is used.  The auth_domain established as part of the traditional
NFS client access to the NFS server is also used for localio.  Store
auth_domain for localio in nfsd_uuid_t and transfer it to the client
if it is local to the server.

Relative to containers, localio gives the client access to the network
namespace the server has.  This is required to allow the client to
access the server's per-namespace nfsd_net struct.

CONFIG_NFSD_LOCALIO controls the server enablement for localio.
A later commit will add CONFIG_NFS_LOCALIO to allow the client
enablement.

This commit also introduces the use of nfsd's percpu_ref to interlock
nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
not destroyed while in use by nfsd_open_local_fh, and warrants a more
detailed explanation:

nfsd_open_local_fh uses nfsd_serv_try_get before opening its file
handle and then the reference must be dropped by the caller using
nfsd_serv_put (via nfs_localio_ctx_free).

This "interlock" relies heavily on nfsd_open_local_fh()'s
maybe_get_net() safely dealing with the possibility that the struct
net (and nfsd_net by association) may have been destroyed by
nfsd_destroy_serv() via nfsd_shutdown_net().

Verified to fix an easy-to-hit crash that would occur if an nfsd
instance running in a container, with a localio client mounted, is
shut down. Upon restart of the container and associated nfsd, the client
would go on to crash due to a NULL pointer dereference that occurred
because the nfs client's localio attempted nfsd_open_local_fh(), using
nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
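
As a rough illustration of the interlock idea (not the kernel
implementation), the sketch below models a try-get style reference in
single-threaded userspace C.  All demo_* names are hypothetical; the
real code uses nfsd's percpu_ref and must cope with concurrency, which
this model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, single-threaded model of a "live" reference counter. */
struct demo_serv_ref {
	unsigned long count;	/* outstanding users plus one initial ref */
	bool dying;		/* set once shutdown has begun */
	bool released;		/* set when the last reference is dropped */
};

static void demo_ref_init(struct demo_serv_ref *r)
{
	r->count = 1;
	r->dying = false;
	r->released = false;
}

/* Try-get: refuses new users once shutdown has begun. */
static bool demo_serv_try_get(struct demo_serv_ref *r)
{
	if (r->dying || r->count == 0)
		return false;
	r->count++;
	return true;
}

/* Put: the final put marks the object as safe to tear down. */
static void demo_serv_put(struct demo_serv_ref *r)
{
	assert(r->count > 0);
	if (--r->count == 0)
		r->released = true;
}

/* Shutdown: block new users, then drop the initial reference. */
static void demo_serv_kill(struct demo_serv_ref *r)
{
	r->dying = true;
	demo_serv_put(r);
}
```

The point of the model: a user that obtained a reference before
shutdown keeps the object alive until its put, while any try-get
issued after shutdown starts fails and the caller must give up on
localio for that request.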

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/Kconfig          |   3 ++
 fs/nfsd/Kconfig     |  16 +++++++
 fs/nfsd/Makefile    |   1 +
 fs/nfsd/filecache.c |   2 +-
 fs/nfsd/localio.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/trace.h     |   3 +-
 fs/nfsd/vfs.h       |   7 +++
 7 files changed, 135 insertions(+), 2 deletions(-)
 create mode 100644 fs/nfsd/localio.c

diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..1b8a5edbddff 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
 	tristate
 	select FS_POSIX_ACL
 
+config NFS_COMMON_LOCALIO_SUPPORT
+	bool
+
 config NFS_COMMON
 	bool
 	depends on NFSD || NFS_FS || LOCKD
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index c0bd1509ccd4..e6fa7eaa1db0 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -90,6 +90,22 @@ config NFSD_V4
 
 	  If unsure, say N.
 
+config NFSD_LOCALIO
+	bool "NFS server support for the LOCALIO auxiliary protocol"
+	depends on NFSD
+	select NFS_COMMON_LOCALIO_SUPPORT
+	default n
+	help
+	  Some NFS servers support an auxiliary NFS LOCALIO protocol
+	  that is not an official part of the NFS protocol.
+
+	  This option enables support for the LOCALIO protocol in the
+	  kernel's NFS server.  Enable this to permit local NFS clients
+	  to bypass the network when issuing reads and writes to the
+	  local NFS server.
+
+	  If unsure, say N.
+
 config NFSD_PNFS
 	bool
 
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index b8736a82e57c..78b421778a79 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
 nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
 nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
 nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
+nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a83d469bca6b..49f4aab3208a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -53,7 +53,7 @@
 #define NFSD_FILE_CACHE_UP		     (0)
 
 /* We only care about NFSD_MAY_READ/WRITE for this cache */
-#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
+#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
 
 static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
 static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
new file mode 100644
index 000000000000..4b65c66be129
--- /dev/null
+++ b/fs/nfsd/localio.c
@@ -0,0 +1,105 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS server support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/exportfs.h>
+#include <linux/sunrpc/svcauth.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfs.h>
+#include <linux/nfs_common.h>
+#include <linux/nfslocalio.h>
+#include <linux/string.h>
+
+#include "nfsd.h"
+#include "vfs.h"
+#include "netns.h"
+#include "filecache.h"
+
+/**
+ * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
+ *
+ * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
+ * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access
+ * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
+ * @cred: cred that the client established
+ * @nfs_fh: filehandle to lookup
+ * @fmode: fmode_t to use for open
+ *
+ * This function maps a local fh to a path on a local filesystem.
+ * This is useful when the nfs client has the local server mounted - it can
+ * avoid all the NFS overhead with reads, writes and commits.
+ *
+ * On successful return, returned nfs_localio_ctx will have its nfsd_file and
+ * nfsd_net members set. Caller is responsible for calling nfsd_file_put and
+ * nfsd_serv_put (via nfs_localio_ctx_free).
+ */
+struct nfs_localio_ctx *
+nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
+		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
+		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
+{
+	int mayflags = NFSD_MAY_LOCALIO;
+	int status = 0;
+	struct nfsd_net *nn;
+	struct svc_cred rq_cred;
+	struct svc_fh fh;
+	struct nfs_localio_ctx *localio;
+	__be32 beres;
+
+	if (nfs_fh->size > NFS4_FHSIZE)
+		return ERR_PTR(-EINVAL);
+
+	localio = nfs_localio_ctx_alloc();
+	if (!localio)
+		return ERR_PTR(-ENOMEM);
+
+	/*
+	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
+	 * But the server may already be shutting down, if so disallow new localio.
+	 */
+	nn = net_generic(cl_nfssvc_net, nfsd_net_id);
+	if (unlikely(!nfsd_serv_try_get(nn))) {
+		status = -ENXIO;
+		goto out_nfsd_serv;
+	}
+
+	/* nfs_fh -> svc_fh */
+	fh_init(&fh, NFS4_FHSIZE);
+	fh.fh_handle.fh_size = nfs_fh->size;
+	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
+
+	if (fmode & FMODE_READ)
+		mayflags |= NFSD_MAY_READ;
+	if (fmode & FMODE_WRITE)
+		mayflags |= NFSD_MAY_WRITE;
+
+	svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
+
+	beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, cl_nfssvc_dom,
+					&fh, mayflags, &localio->nf);
+	if (beres) {
+		status = nfs_stat_to_errno(be32_to_cpu(beres));
+		goto out_fh_put;
+	}
+	localio->nn = nn;
+
+out_fh_put:
+	fh_put(&fh);
+	if (rq_cred.cr_group_info)
+		put_group_info(rq_cred.cr_group_info);
+out_nfsd_serv:
+	if (status) {
+		nfs_localio_ctx_free(localio);
+		return ERR_PTR(status);
+	}
+	return localio;
+}
+EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index d22027e23761..82bcefcd1f21 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
 		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" },	\
 		{ NFSD_MAY_BYPASS_GSS,		"BYPASS_GSS" },		\
 		{ NFSD_MAY_READ_IF_EXEC,	"READ_IF_EXEC" },	\
-		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" })
+		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" },	\
+		{ NFSD_MAY_LOCALIO,		"LOCALIO" })
 
 TRACE_EVENT(nfsd_compound,
 	TP_PROTO(
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 01947561d375..e12310dd5f4c 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -33,6 +33,8 @@
 
 #define NFSD_MAY_64BIT_COOKIE		0x1000 /* 64 bit readdir cookies for >= NFSv3 */
 
+#define NFSD_MAY_LOCALIO		0x2000 /* for tracing, reflects when localio used */
+
 #define NFSD_MAY_CREATE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE)
 #define NFSD_MAY_REMOVE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
 
@@ -158,6 +160,11 @@ __be32		nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
 
 void		nfsd_filp_close(struct file *fp);
 
+struct nfs_localio_ctx *
+nfsd_open_local_fh(struct net *, struct auth_domain *,
+		   struct rpc_clnt *, const struct cred *,
+		   const struct nfs_fh *, const fmode_t);
+
 static inline int fh_want_write(struct svc_fh *fh)
 {
 	int ret;
-- 
2.44.0



* [PATCH v14 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (15 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 16/25] nfsd: add localio support Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29 16:50   ` Jeff Layton
  2024-08-29  1:04 ` [PATCH v14 18/25] nfs: pass struct nfs_localio_ctx to nfs_init_pgio and nfs_init_commit Mike Snitzer
                   ` (8 subsequent siblings)
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common.  The server expects this protocol to use
the same transport as NFS and NFSACL for its RPCs.  This protocol
isn't part of an IETF standard, nor does it need to be, considering it
is a Linux-to-Linux auxiliary RPC protocol that amounts to an
implementation detail.

The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes).  The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
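
To make the fixed versus variable distinction concrete, here is a
minimal userspace sketch of the two XDR opaque encodings.  The demo_*
helpers are hypothetical and are not the kernel's xdr_stream API; they
only show that a fixed-size opaque carries no length word, so a
16-byte UUID occupies 16 bytes on the wire rather than 20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UUID_SIZE 16

/* Round a byte count up to the XDR 4-byte unit. */
static size_t demo_xdr_pad(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Fixed-length opaque: raw bytes plus zero padding, no length word. */
static size_t demo_encode_opaque_fixed(uint8_t *out, const uint8_t *data,
				       size_t len)
{
	size_t padded = demo_xdr_pad(len);

	memcpy(out, data, len);
	memset(out + len, 0, padded - len);
	return padded;
}

/* Variable-length opaque: 4-byte big-endian length, then padded bytes. */
static size_t demo_encode_opaque_var(uint8_t *out, const uint8_t *data,
				     size_t len)
{
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	return 4 + demo_encode_opaque_fixed(out + 4, data, len);
}

/* Fixed-length decode: the receiver already knows the size. */
static int demo_decode_opaque_fixed(const uint8_t *in, size_t avail,
				    uint8_t *data, size_t len)
{
	if (avail < demo_xdr_pad(len))
		return -1;	/* short buffer */
	memcpy(data, in, len);
	return 0;
}
```

Because both sides agree on UUID_SIZE in advance, the decoder in this
sketch needs only a bounds check and a copy, with no length field to
parse or validate.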

The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
Linux Kernel Organization       400122  nfslocalio

Acked-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neil@brown.name>
Signed-off-by: NeilBrown <neil@brown.name>
---
 fs/nfsd/localio.c   | 75 +++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/nfsd.h      |  4 +++
 fs/nfsd/nfssvc.c    | 23 +++++++++++++-
 include/linux/nfs.h |  7 +++++
 4 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 4b65c66be129..a192bbe308df 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -13,12 +13,15 @@
 #include <linux/nfs.h>
 #include <linux/nfs_common.h>
 #include <linux/nfslocalio.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
 #include <linux/string.h>
 
 #include "nfsd.h"
 #include "vfs.h"
 #include "netns.h"
 #include "filecache.h"
+#include "cache.h"
 
 /**
  * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
@@ -103,3 +106,75 @@ EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
 
 /* Compile time type checking, not used by anything */
 static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
+
+/*
+ * UUID_IS_LOCAL XDR functions
+ */
+
+static __be32 localio_proc_null(struct svc_rqst *rqstp)
+{
+	return rpc_success;
+}
+
+struct localio_uuidarg {
+	uuid_t			uuid;
+};
+
+static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
+{
+	struct localio_uuidarg *argp = rqstp->rq_argp;
+
+	(void) nfs_uuid_is_local(&argp->uuid, SVC_NET(rqstp),
+				 rqstp->rq_client);
+
+	return rpc_success;
+}
+
+static bool localio_decode_uuidarg(struct svc_rqst *rqstp,
+				   struct xdr_stream *xdr)
+{
+	struct localio_uuidarg *argp = rqstp->rq_argp;
+	u8 uuid[UUID_SIZE];
+
+	if (decode_opaque_fixed(xdr, uuid, UUID_SIZE))
+		return false;
+	import_uuid(&argp->uuid, uuid);
+
+	return true;
+}
+
+static const struct svc_procedure localio_procedures1[] = {
+	[LOCALIOPROC_NULL] = {
+		.pc_func = localio_proc_null,
+		.pc_decode = nfssvc_decode_voidarg,
+		.pc_encode = nfssvc_encode_voidres,
+		.pc_argsize = sizeof(struct nfsd_voidargs),
+		.pc_ressize = sizeof(struct nfsd_voidres),
+		.pc_cachetype = RC_NOCACHE,
+		.pc_xdrressize = 0,
+		.pc_name = "NULL",
+	},
+	[LOCALIOPROC_UUID_IS_LOCAL] = {
+		.pc_func = localio_proc_uuid_is_local,
+		.pc_decode = localio_decode_uuidarg,
+		.pc_encode = nfssvc_encode_voidres,
+		.pc_argsize = sizeof(struct localio_uuidarg),
+		.pc_argzero = sizeof(struct localio_uuidarg),
+		.pc_ressize = sizeof(struct nfsd_voidres),
+		.pc_cachetype = RC_NOCACHE,
+		.pc_name = "UUID_IS_LOCAL",
+	},
+};
+
+#define LOCALIO_NR_PROCEDURES ARRAY_SIZE(localio_procedures1)
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+			      localio_count[LOCALIO_NR_PROCEDURES]);
+const struct svc_version localio_version1 = {
+	.vs_vers	= 1,
+	.vs_nproc	= LOCALIO_NR_PROCEDURES,
+	.vs_proc	= localio_procedures1,
+	.vs_dispatch	= nfsd_dispatch,
+	.vs_count	= localio_count,
+	.vs_xdrsize	= XDR_QUADLEN(UUID_SIZE),
+	.vs_hidden	= true,
+};
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index b0d3e82d6dcd..232a873dc53a 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -146,6 +146,10 @@ extern const struct svc_version nfsd_acl_version3;
 #endif
 #endif
 
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+extern const struct svc_version localio_version1;
+#endif
+
 struct nfsd_net;
 
 enum vers_op {NFSD_SET, NFSD_CLEAR, NFSD_TEST, NFSD_AVAIL };
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 13c69aa40d1c..eec4a9803c4a 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -80,6 +80,15 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
 unsigned long	nfsd_drc_max_mem;
 unsigned long	nfsd_drc_mem_used;
 
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+static const struct svc_version *localio_versions[] = {
+	[1] = &localio_version1,
+};
+
+#define NFSD_LOCALIO_NRVERS		ARRAY_SIZE(localio_versions)
+
+#endif /* CONFIG_NFSD_LOCALIO */
+
 #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
 static const struct svc_version *nfsd_acl_version[] = {
 # if defined(CONFIG_NFSD_V2_ACL)
@@ -128,6 +137,18 @@ struct svc_program		nfsd_programs[] = {
 	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
 	},
 #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+	{
+	.pg_prog		= NFS_LOCALIO_PROGRAM,
+	.pg_nvers		= NFSD_LOCALIO_NRVERS,
+	.pg_vers		= localio_versions,
+	.pg_name		= "nfslocalio",
+	.pg_class		= "nfsd",
+	.pg_authenticate	= svc_set_client,
+	.pg_init_request	= svc_generic_init_request,
+	.pg_rpcbind_set		= svc_generic_rpcbind_set,
+	}
+#endif /* IS_ENABLED(CONFIG_NFSD_LOCALIO) */
 };
 
 bool nfsd_support_version(int vers)
@@ -949,7 +970,7 @@ nfsd(void *vrqstp)
 }
 
 /**
- * nfsd_dispatch - Process an NFS or NFSACL Request
+ * nfsd_dispatch - Process an NFS or NFSACL or LOCALIO Request
  * @rqstp: incoming request
  *
  * This RPC dispatcher integrates the NFS server's duplicate reply cache.
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index ceb70a926b95..5ff1a5b3b00c 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -13,6 +13,13 @@
 #include <linux/crc32.h>
 #include <uapi/linux/nfs.h>
 
+/* The localio program is entirely private to Linux and is
+ * NOT part of the uapi.
+ */
+#define NFS_LOCALIO_PROGRAM		400122
+#define LOCALIOPROC_NULL		0
+#define LOCALIOPROC_UUID_IS_LOCAL	1
+
 /*
  * This is the kernel NFS client file handle representation
  */
-- 
2.44.0



* [PATCH v14 18/25] nfs: pass struct nfs_localio_ctx to nfs_init_pgio and nfs_init_commit
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (16 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:04 ` [PATCH v14 19/25] nfs: add localio support Mike Snitzer
                   ` (7 subsequent siblings)
  25 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

The nfs_localio_ctx will be passed, in future commits, by callers
that enable localio support (for both regular NFS and pNFS IO).

[Derived from patch authored by Weston Andros Adamson, but switched
 from passing struct file to struct nfs_localio_ctx]

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/filelayout/filelayout.c         | 6 +++---
 fs/nfs/flexfilelayout/flexfilelayout.c | 6 +++---
 fs/nfs/internal.h                      | 7 +++++--
 fs/nfs/pagelist.c                      | 6 ++++--
 fs/nfs/pnfs_nfs.c                      | 2 +-
 fs/nfs/write.c                         | 5 +++--
 6 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index b6e9aeaf4ce2..d39a1f58e18d 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -488,7 +488,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
 	/* Perform an asynchronous read to ds */
 	nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
 			  NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
-			  0, RPC_TASK_SOFTCONN);
+			  0, RPC_TASK_SOFTCONN, NULL);
 	return PNFS_ATTEMPTED;
 }
 
@@ -530,7 +530,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
 	/* Perform an asynchronous write */
 	nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
 			  NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
-			  sync, RPC_TASK_SOFTCONN);
+			  sync, RPC_TASK_SOFTCONN, NULL);
 	return PNFS_ATTEMPTED;
 }
 
@@ -1011,7 +1011,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
 		data->args.fh = fh;
 	return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
 				   &filelayout_commit_call_ops, how,
-				   RPC_TASK_SOFTCONN);
+				   RPC_TASK_SOFTCONN, NULL);
 out_err:
 	pnfs_generic_prepare_to_resend_writes(data);
 	pnfs_generic_commit_release(data);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index d4d551ffea7b..01ee52551a63 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1806,7 +1806,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
 	nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_read_call_ops_v3 :
 				      &ff_layout_read_call_ops_v4,
-			  0, RPC_TASK_SOFTCONN);
+			  0, RPC_TASK_SOFTCONN, NULL);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
@@ -1874,7 +1874,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
 	nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_write_call_ops_v3 :
 				      &ff_layout_write_call_ops_v4,
-			  sync, RPC_TASK_SOFTCONN);
+			  sync, RPC_TASK_SOFTCONN, NULL);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
@@ -1949,7 +1949,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
 	ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
 				   vers == 3 ? &ff_layout_commit_call_ops_v3 :
 					       &ff_layout_commit_call_ops_v4,
-				   how, RPC_TASK_SOFTCONN);
+				   how, RPC_TASK_SOFTCONN, NULL);
 	put_cred(ds_cred);
 	return ret;
 out_err:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 5902a9beca1f..d4ab74a61668 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -9,6 +9,7 @@
 #include <linux/crc32.h>
 #include <linux/sunrpc/addr.h>
 #include <linux/nfs_page.h>
+#include <linux/nfslocalio.h>
 #include <linux/wait_bit.h>
 
 #define NFS_SB_MASK (SB_RDONLY|SB_NOSUID|SB_NODEV|SB_NOEXEC|SB_SYNCHRONOUS)
@@ -308,7 +309,8 @@ void nfs_pgio_header_free(struct nfs_pgio_header *);
 int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
 int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
 		      const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
-		      const struct rpc_call_ops *call_ops, int how, int flags);
+		      const struct rpc_call_ops *call_ops, int how, int flags,
+		      struct nfs_localio_ctx *localio);
 void nfs_free_request(struct nfs_page *req);
 struct nfs_pgio_mirror *
 nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
@@ -528,7 +530,8 @@ extern int nfs_initiate_commit(struct rpc_clnt *clnt,
 			       struct nfs_commit_data *data,
 			       const struct nfs_rpc_ops *nfs_ops,
 			       const struct rpc_call_ops *call_ops,
-			       int how, int flags);
+			       int how, int flags,
+			       struct nfs_localio_ctx *localio);
 extern void nfs_init_commit(struct nfs_commit_data *data,
 			    struct list_head *head,
 			    struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 04124f226665..849db19451ff 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -731,7 +731,8 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
 
 int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
 		      const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
-		      const struct rpc_call_ops *call_ops, int how, int flags)
+		      const struct rpc_call_ops *call_ops, int how, int flags,
+		      struct nfs_localio_ctx *localio)
 {
 	struct rpc_task *task;
 	struct rpc_message msg = {
@@ -961,7 +962,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
 					NFS_PROTO(hdr->inode),
 					desc->pg_rpc_callops,
 					desc->pg_ioflags,
-					RPC_TASK_CRED_NOREF | task_flags);
+					RPC_TASK_CRED_NOREF | task_flags,
+					NULL);
 	}
 	return ret;
 }
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index a74ee69a2fa6..dbef837e871a 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -490,7 +490,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
 			nfs_initiate_commit(NFS_CLIENT(inode), data,
 					    NFS_PROTO(data->inode),
 					    data->mds_ops, how,
-					    RPC_TASK_CRED_NOREF);
+					    RPC_TASK_CRED_NOREF, NULL);
 		} else {
 			nfs_init_commit(data, NULL, data->lseg, cinfo);
 			initiate_commit(data, how);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index d074d0ceb4f0..4bd16473a953 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1663,7 +1663,8 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);
 int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
 			const struct nfs_rpc_ops *nfs_ops,
 			const struct rpc_call_ops *call_ops,
-			int how, int flags)
+			int how, int flags,
+			struct nfs_localio_ctx *localio)
 {
 	struct rpc_task *task;
 	int priority = flush_task_priority(how);
@@ -1809,7 +1810,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
 		task_flags = RPC_TASK_MOVEABLE;
 	return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
 				   data->mds_ops, how,
-				   RPC_TASK_CRED_NOREF | task_flags);
+				   RPC_TASK_CRED_NOREF | task_flags, NULL);
 }
 
 /*
-- 
2.44.0



* [PATCH v14 19/25] nfs: add localio support
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (17 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 18/25] nfs: pass struct nfs_localio_ctx to nfs_init_pgio and nfs_init_commit Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:04 ` [PATCH v14 20/25] nfs: enable localio for non-pNFS IO Mike Snitzer
                   ` (6 subsequent siblings)
  25 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Weston Andros Adamson <dros@primarydata.com>

Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.

nfs_local_probe() is stubbed out for now; later commits will enable the
client and server handshake via a Linux-only LOCALIO auxiliary RPC
protocol.

The client binds dynamically to the nfsd module (via the nfs_localio
module, which is part of nfs_common). Localio will only work if nfsd is
already loaded.

The "localio_enabled" nfs kernel module parameter can be used to
enable or disable localio support.

CONFIG_NFS_LOCALIO controls whether client support is built.

Lastly, localio uses an nfsd_file to initiate all IO.  To make proper
use of nfsd_file (and nfsd's filecache) its lifetime (duration before
nfsd_file_put is called) must extend until after commit, read and
write operations.  So rather than immediately call nfsd_file_put() in
nfs_local_open_fh(), nfsd_file_put() isn't called until
nfs_local_pgio_release() for read/write and not until
nfs_local_release_commit_data() for commit. The same applies to the
reference held on nfsd's nn->nfsd_serv. Both object lifetimes and
associated references are managed through calls to
nfs_localio_ctx_free().

Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
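A minimal userspace sketch of the lifetime rule described in the commit
message above, not the kernel code itself: one context object pins both
references at open time, and a single free routine drops them only after
the IO has run. All model_* names are hypothetical stand-ins (model_ref
for the nfsd_file and nn->nfsd_serv references, model_localio_ctx_free()
for nfs_localio_ctx_free()), and plain integer counters replace the
kernel's real reference counting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted object (nfsd_file or nfsd_serv). */
struct model_ref {
	int count;
};

/* Models struct nfs_localio_ctx: one handle that pins both objects. */
struct model_localio_ctx {
	struct model_ref *file;	/* stands in for the nfsd_file */
	struct model_ref *serv;	/* stands in for the nn->nfsd_serv reference */
};

/* Open: take both references up front and hand back one context. */
static struct model_localio_ctx *
model_localio_open(struct model_ref *file, struct model_ref *serv)
{
	struct model_localio_ctx *ctx = malloc(sizeof(*ctx));

	if (ctx == NULL)
		return NULL;
	file->count++;
	serv->count++;
	ctx->file = file;
	ctx->serv = serv;
	return ctx;
}

/* Free: the single place where both references are dropped. */
static void model_localio_ctx_free(struct model_localio_ctx *ctx)
{
	assert(ctx->file->count > 0 && ctx->serv->count > 0);
	ctx->file->count--;
	ctx->serv->count--;
	free(ctx);
}

/*
 * IO path: the references stay held while the IO runs and are dropped
 * only in the release step. Returns the file refcount seen during IO.
 */
static int model_do_io(struct model_localio_ctx *ctx)
{
	int held_during_io = ctx->file->count;

	/* ... the read, write or fsync would run here ... */
	model_localio_ctx_free(ctx);	/* release step */
	return held_during_io;
}
```

Opening against two zeroed counters and then calling model_do_io()
leaves both counters back at zero, while the returned value shows the
file reference was still held during the IO.
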
 fs/nfs/Kconfig            |  16 +
 fs/nfs/Makefile           |   1 +
 fs/nfs/client.c           |  11 +
 fs/nfs/internal.h         |  45 +++
 fs/nfs/localio.c          | 630 ++++++++++++++++++++++++++++++++++++++
 fs/nfs/nfstrace.h         |  61 ++++
 fs/nfs/pagelist.c         |   4 +
 fs/nfs/write.c            |   3 +
 include/linux/nfs.h       |   2 +
 include/linux/nfs_fs_sb.h |  10 +
 10 files changed, 783 insertions(+)
 create mode 100644 fs/nfs/localio.c

diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 0eb20012792f..9fe3b2709666 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -87,6 +87,22 @@ config NFS_V4
 
 	  If unsure, say Y.
 
+config NFS_LOCALIO
+	bool "NFS client support for the LOCALIO auxiliary protocol"
+	depends on NFS_FS
+	select NFS_COMMON_LOCALIO_SUPPORT
+	default n
+	help
+	  Some NFS servers support an auxiliary NFS LOCALIO protocol
+	  that is not an official part of the NFS protocol.
+
+	  This option enables support for the LOCALIO protocol in the
+	  kernel's NFS client.  Enable this to permit local NFS clients
+	  to bypass the network when issuing reads and writes to the
+	  local NFS server.
+
+	  If unsure, say N.
+
 config NFS_SWAP
 	bool "Provide swap over NFS support"
 	default n
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 5f6db37f461e..9fb2f2cac87e 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -13,6 +13,7 @@ nfs-y 			:= client.o dir.o file.o getroot.o inode.o super.o \
 nfs-$(CONFIG_ROOT_NFS)	+= nfsroot.o
 nfs-$(CONFIG_SYSCTL)	+= sysctl.o
 nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
+nfs-$(CONFIG_NFS_LOCALIO) += localio.o
 
 obj-$(CONFIG_NFS_V2) += nfsv2.o
 nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 8286edd6062d..b981c519a12d 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -178,6 +178,14 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
 	clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
 	clp->cl_net = get_net(cl_init->net);
 
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	seqlock_init(&clp->cl_boot_lock);
+	ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+	clp->cl_nfssvc_net = NULL;
+	clp->cl_nfssvc_dom = NULL;
+	spin_lock_init(&clp->cl_localio_lock);
+#endif /* CONFIG_NFS_LOCALIO */
+
 	clp->cl_principal = "*";
 	clp->cl_xprtsec = cl_init->xprtsec;
 	return clp;
@@ -233,6 +241,8 @@ static void pnfs_init_server(struct nfs_server *server)
  */
 void nfs_free_client(struct nfs_client *clp)
 {
+	nfs_local_disable(clp);
+
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
 		rpc_shutdown_client(clp->cl_rpcclient);
@@ -424,6 +434,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
 			list_add_tail(&new->cl_share_link,
 					&nn->nfs_client_list);
 			spin_unlock(&nn->nfs_client_lock);
+			nfs_local_probe(new);
 			return rpc_ops->init_client(new, cl_init);
 		}
 
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index d4ab74a61668..0716c90eaf9c 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -451,6 +451,51 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
 extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
 extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
 
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+/* localio.c */
+extern void nfs_local_disable(struct nfs_client *);
+extern void nfs_local_probe(struct nfs_client *);
+extern struct nfs_localio_ctx *nfs_local_open_fh(struct nfs_client *,
+						 const struct cred *,
+						 struct nfs_fh *,
+						 const fmode_t);
+extern int nfs_local_doio(struct nfs_client *,
+			  struct nfs_localio_ctx *,
+			  struct nfs_pgio_header *,
+			  const struct rpc_call_ops *);
+extern int nfs_local_commit(struct nfs_localio_ctx *,
+			    struct nfs_commit_data *,
+			    const struct rpc_call_ops *, int);
+extern bool nfs_server_is_local(const struct nfs_client *clp);
+
+#else
+static inline void nfs_local_disable(struct nfs_client *clp) {}
+static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline struct nfs_localio_ctx *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+		  struct nfs_fh *fh, const fmode_t mode)
+{
+	return NULL;
+}
+static inline int nfs_local_doio(struct nfs_client *clp,
+				 struct nfs_localio_ctx *localio,
+				 struct nfs_pgio_header *hdr,
+				 const struct rpc_call_ops *call_ops)
+{
+	return -EINVAL;
+}
+static inline int nfs_local_commit(struct nfs_localio_ctx *localio,
+				struct nfs_commit_data *data,
+				const struct rpc_call_ops *call_ops, int how)
+{
+	return -EINVAL;
+}
+static inline bool nfs_server_is_local(const struct nfs_client *clp)
+{
+	return false;
+}
+#endif /* CONFIG_NFS_LOCALIO */
+
 /* super.c */
 extern const struct super_operations nfs_sops;
 bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
new file mode 100644
index 000000000000..96e04328adb9
--- /dev/null
+++ b/fs/nfs/localio.c
@@ -0,0 +1,630 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS client support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/vfs.h>
+#include <linux/file.h>
+#include <linux/inet.h>
+#include <linux/sunrpc/addr.h>
+#include <linux/inetdevice.h>
+#include <net/addrconf.h>
+#include <linux/nfs_common.h>
+#include <linux/nfslocalio.h>
+#include <linux/module.h>
+#include <linux/bvec.h>
+
+#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
+
+#include "internal.h"
+#include "pnfs.h"
+#include "nfstrace.h"
+
+#define NFSDBG_FACILITY		NFSDBG_VFS
+
+struct nfs_local_kiocb {
+	struct kiocb		kiocb;
+	struct bio_vec		*bvec;
+	struct nfs_pgio_header	*hdr;
+	struct work_struct	work;
+	struct nfs_localio_ctx	*localio;
+};
+
+struct nfs_local_fsync_ctx {
+	struct nfs_localio_ctx	*localio;
+	struct nfs_commit_data	*data;
+	struct work_struct	work;
+	struct kref		kref;
+	struct completion	*done;
+};
+static void nfs_local_fsync_work(struct work_struct *work);
+
+static bool localio_enabled __read_mostly = true;
+module_param(localio_enabled, bool, 0644);
+
+bool nfs_server_is_local(const struct nfs_client *clp)
+{
+	return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
+		localio_enabled;
+}
+EXPORT_SYMBOL_GPL(nfs_server_is_local);
+
+/*
+ * nfs_local_enable - enable local i/o for an nfs_client
+ */
+static __maybe_unused void nfs_local_enable(struct nfs_client *clp,
+					    nfs_uuid_t *nfs_uuid)
+{
+	spin_lock(&clp->cl_localio_lock);
+
+	if (unlikely(!get_nfs_to_nfsd_symbols()))
+		goto out;
+	set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+	rcu_assign_pointer(clp->cl_nfssvc_net, nfs_uuid->net);
+	rcu_assign_pointer(clp->cl_nfssvc_dom, nfs_uuid->dom);
+	trace_nfs_local_enable(clp);
+out:
+	spin_unlock(&clp->cl_localio_lock);
+}
+
+/*
+ * nfs_local_disable - disable local i/o for an nfs_client
+ */
+void nfs_local_disable(struct nfs_client *clp)
+{
+	struct net *cl_nfssvc_net;
+	struct auth_domain *cl_nfssvc_dom;
+
+	spin_lock(&clp->cl_localio_lock);
+	if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
+		trace_nfs_local_disable(clp);
+		put_nfs_to_nfsd_symbols();
+
+		cl_nfssvc_net = rcu_dereference(clp->cl_nfssvc_net);
+		if (cl_nfssvc_net) {
+			put_net(cl_nfssvc_net);
+			RCU_INIT_POINTER(clp->cl_nfssvc_net, NULL);
+		}
+
+		cl_nfssvc_dom = rcu_dereference(clp->cl_nfssvc_dom);
+		if (cl_nfssvc_dom) {
+			auth_domain_put(cl_nfssvc_dom);
+			RCU_INIT_POINTER(clp->cl_nfssvc_dom, NULL);
+		}
+	}
+	spin_unlock(&clp->cl_localio_lock);
+}
+
+/*
+ * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ */
+void nfs_local_probe(struct nfs_client *clp)
+{
+}
+EXPORT_SYMBOL_GPL(nfs_local_probe);
+
+/*
+ * nfs_local_open_fh - open a local filehandle in terms of nfsd_file
+ *
+ * Returns a pointer to a struct nfs_localio_ctx or NULL
+ */
+struct nfs_localio_ctx *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+		  struct nfs_fh *fh, const fmode_t mode)
+{
+	struct net *cl_nfssvc_net;
+	struct auth_domain *cl_nfssvc_dom;
+	struct nfs_localio_ctx *localio;
+	int status;
+
+	if (!nfs_server_is_local(clp))
+		return NULL;
+	if (mode & ~(FMODE_READ | FMODE_WRITE))
+		return NULL;
+
+	rcu_read_lock();
+	cl_nfssvc_net = rcu_dereference(clp->cl_nfssvc_net);
+	cl_nfssvc_dom = rcu_dereference(clp->cl_nfssvc_dom);
+	if (unlikely(!cl_nfssvc_net || !cl_nfssvc_dom))
+		localio = ERR_PTR(-ENXIO);
+	else
+		localio = nfs_to.nfsd_open_local_fh(cl_nfssvc_net, cl_nfssvc_dom,
+						    clp->cl_rpcclient, cred, fh, mode);
+	rcu_read_unlock();
+	if (IS_ERR(localio)) {
+		status = PTR_ERR(localio);
+		trace_nfs_local_open_fh(fh, mode, status);
+		switch (status) {
+		case -ENOMEM:
+		case -ENXIO:
+		case -ENOENT:
+			nfs_local_disable(clp);
+		}
+		return NULL;
+	}
+	return localio;
+}
+EXPORT_SYMBOL_GPL(nfs_local_open_fh);
+
+static struct bio_vec *
+nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
+		unsigned int npages, gfp_t flags)
+{
+	struct bio_vec *bvec, *p;
+
+	bvec = kmalloc_array(npages, sizeof(*bvec), flags);
+	if (bvec != NULL) {
+		for (p = bvec; npages > 0; p++, pagevec++, npages--) {
+			p->bv_page = *pagevec;
+			p->bv_len = PAGE_SIZE;
+			p->bv_offset = 0;
+		}
+	}
+	return bvec;
+}
+
+static void
+nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
+{
+	kfree(iocb->bvec);
+	kfree(iocb);
+}
+
+static struct nfs_local_kiocb *
+nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
+		     struct nfs_localio_ctx *localio, gfp_t flags)
+{
+	struct nfs_local_kiocb *iocb;
+
+	iocb = kmalloc(sizeof(*iocb), flags);
+	if (iocb == NULL)
+		return NULL;
+	iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
+			hdr->page_array.npages, flags);
+	if (iocb->bvec == NULL) {
+		kfree(iocb);
+		return NULL;
+	}
+	init_sync_kiocb(&iocb->kiocb, nfs_to.nfsd_file_file(localio->nf));
+	iocb->kiocb.ki_pos = hdr->args.offset;
+	iocb->localio = localio;
+	iocb->hdr = hdr;
+	iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+	return iocb;
+}
+
+static void
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+{
+	struct nfs_pgio_header *hdr = iocb->hdr;
+
+	iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
+		      hdr->args.count + hdr->args.pgbase);
+	if (hdr->args.pgbase != 0)
+		iov_iter_advance(i, hdr->args.pgbase);
+}
+
+static void
+nfs_local_hdr_release(struct nfs_pgio_header *hdr,
+		const struct rpc_call_ops *call_ops)
+{
+	call_ops->rpc_call_done(&hdr->task, hdr);
+	call_ops->rpc_release(hdr);
+}
+
+static void
+nfs_local_pgio_init(struct nfs_pgio_header *hdr,
+		const struct rpc_call_ops *call_ops)
+{
+	hdr->task.tk_ops = call_ops;
+	if (!hdr->task.tk_start)
+		hdr->task.tk_start = ktime_get();
+}
+
+static void
+nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
+{
+	if (status >= 0) {
+		hdr->res.count = status;
+		hdr->res.op_status = NFS4_OK;
+		hdr->task.tk_status = 0;
+	} else {
+		hdr->res.op_status = nfs4_stat_to_errno(status);
+		hdr->task.tk_status = status;
+	}
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+	struct nfs_pgio_header *hdr = iocb->hdr;
+
+	nfs_localio_ctx_free(iocb->localio);
+	nfs_local_iocb_free(iocb);
+	nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+}
+
+static void
+nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
+{
+	struct nfs_pgio_header *hdr = iocb->hdr;
+	struct file *filp = iocb->kiocb.ki_filp;
+
+	nfs_local_pgio_done(hdr, status);
+
+	if (hdr->res.count != hdr->args.count ||
+	    hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
+		hdr->res.eof = true;
+
+	dprintk("%s: read %ld bytes eof %d.\n", __func__,
+			status > 0 ? status : 0, hdr->res.eof);
+}
+
+static int
+nfs_do_local_read(struct nfs_pgio_header *hdr,
+		  struct nfs_localio_ctx *localio,
+		  const struct rpc_call_ops *call_ops)
+{
+	struct file *filp = nfs_to.nfsd_file_file(localio->nf);
+	struct nfs_local_kiocb *iocb;
+	struct iov_iter iter;
+	ssize_t status;
+
+	dprintk("%s: vfs_read count=%u pos=%llu\n",
+		__func__, hdr->args.count, hdr->args.offset);
+
+	iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL);
+	if (iocb == NULL)
+		return -ENOMEM;
+	nfs_local_iter_init(&iter, iocb, READ);
+
+	nfs_local_pgio_init(hdr, call_ops);
+	hdr->res.eof = false;
+
+	status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+	WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+	nfs_local_read_done(iocb, status);
+	nfs_local_pgio_release(iocb);
+
+	return 0;
+}
+
+static void
+nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
+{
+	struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+	u32 *verf = (u32 *)verifier->data;
+	int seq = 0;
+
+	do {
+		read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
+		verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
+		verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
+	} while (need_seqretry(&clp->cl_boot_lock, seq));
+	done_seqretry(&clp->cl_boot_lock, seq);
+}
+
+static void
+nfs_reset_boot_verifier(struct inode *inode)
+{
+	struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+	write_seqlock(&clp->cl_boot_lock);
+	ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+	write_sequnlock(&clp->cl_boot_lock);
+}
+
+static void
+nfs_set_local_verifier(struct inode *inode,
+		struct nfs_writeverf *verf,
+		enum nfs3_stable_how how)
+{
+	nfs_copy_boot_verifier(&verf->verifier, inode);
+	verf->committed = how;
+}
+
+/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
+static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
+{
+	u32 request_mask = STATX_BASIC_STATS;
+
+	if (version == 4)
+		request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
+	return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
+}
+
+/* Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute() */
+static u64 __nfsd4_change_attribute(const struct kstat *stat,
+				    const struct inode *inode)
+{
+	u64 chattr;
+
+	if (stat->result_mask & STATX_CHANGE_COOKIE) {
+		chattr = stat->change_cookie;
+		if (S_ISREG(inode->i_mode) &&
+		    !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
+			chattr += (u64)stat->ctime.tv_sec << 30;
+			chattr += stat->ctime.tv_nsec;
+		}
+	} else {
+		chattr = time_to_chattr(&stat->ctime);
+	}
+	return chattr;
+}
+
+static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
+{
+	struct kstat stat;
+	struct file *filp = iocb->kiocb.ki_filp;
+	struct nfs_pgio_header *hdr = iocb->hdr;
+	struct nfs_fattr *fattr = hdr->res.fattr;
+	int version = NFS_PROTO(hdr->inode)->version;
+
+	if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
+		return;
+
+	fattr->valid = (NFS_ATTR_FATTR_FILEID |
+			NFS_ATTR_FATTR_CHANGE |
+			NFS_ATTR_FATTR_SIZE |
+			NFS_ATTR_FATTR_ATIME |
+			NFS_ATTR_FATTR_MTIME |
+			NFS_ATTR_FATTR_CTIME |
+			NFS_ATTR_FATTR_SPACE_USED);
+
+	fattr->fileid = stat.ino;
+	fattr->size = stat.size;
+	fattr->atime = stat.atime;
+	fattr->mtime = stat.mtime;
+	fattr->ctime = stat.ctime;
+	if (version == 4) {
+		fattr->change_attr =
+			__nfsd4_change_attribute(&stat, file_inode(filp));
+	} else
+		fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+	fattr->du.nfs3.used = stat.blocks << 9;
+}
+
+static void
+nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
+{
+	struct nfs_pgio_header *hdr = iocb->hdr;
+	struct inode *inode = hdr->inode;
+
+	dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
+
+	/* Handle short writes as if they are ENOSPC */
+	if (status > 0 && status < hdr->args.count) {
+		hdr->mds_offset += status;
+		hdr->args.offset += status;
+		hdr->args.pgbase += status;
+		hdr->args.count -= status;
+		nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
+		status = -ENOSPC;
+	}
+	if (status < 0)
+		nfs_reset_boot_verifier(inode);
+	else if (nfs_should_remove_suid(inode)) {
+		/* Deal with the suid/sgid bit corner case */
+		spin_lock(&inode->i_lock);
+		nfs_set_cache_invalid(inode, NFS_INO_INVALID_MODE);
+		spin_unlock(&inode->i_lock);
+	}
+	nfs_local_pgio_done(hdr, status);
+}
+
+static int
+nfs_do_local_write(struct nfs_pgio_header *hdr,
+		   struct nfs_localio_ctx *localio,
+		   const struct rpc_call_ops *call_ops)
+{
+	struct file *filp = nfs_to.nfsd_file_file(localio->nf);
+	struct nfs_local_kiocb *iocb;
+	struct iov_iter iter;
+	ssize_t status;
+
+	dprintk("%s: vfs_write count=%u pos=%llu %s\n",
+		__func__, hdr->args.count, hdr->args.offset,
+		(hdr->args.stable == NFS_UNSTABLE) ?  "unstable" : "stable");
+
+	iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO);
+	if (iocb == NULL)
+		return -ENOMEM;
+	nfs_local_iter_init(&iter, iocb, WRITE);
+
+	switch (hdr->args.stable) {
+	default:
+		break;
+	case NFS_DATA_SYNC:
+		iocb->kiocb.ki_flags |= IOCB_DSYNC;
+		break;
+	case NFS_FILE_SYNC:
+		iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
+	}
+	nfs_local_pgio_init(hdr, call_ops);
+
+	nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+
+	file_start_write(filp);
+	status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+	file_end_write(filp);
+	WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+	nfs_local_write_done(iocb, status);
+	nfs_local_vfs_getattr(iocb);
+	nfs_local_pgio_release(iocb);
+
+	return 0;
+}
+
+int nfs_local_doio(struct nfs_client *clp, struct nfs_localio_ctx *localio,
+		   struct nfs_pgio_header *hdr,
+		   const struct rpc_call_ops *call_ops)
+{
+	int status = 0;
+	struct file *filp = nfs_to.nfsd_file_file(localio->nf);
+
+	if (!hdr->args.count)
+		return 0;
+	/* Don't support filesystems without read_iter/write_iter */
+	if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
+		nfs_local_disable(clp);
+		status = -EAGAIN;
+		goto out;
+	}
+
+	switch (hdr->rw_mode) {
+	case FMODE_READ:
+		status = nfs_do_local_read(hdr, localio, call_ops);
+		break;
+	case FMODE_WRITE:
+		status = nfs_do_local_write(hdr, localio, call_ops);
+		break;
+	default:
+		dprintk("%s: invalid mode: %d\n", __func__,
+			hdr->rw_mode);
+		status = -EINVAL;
+	}
+out:
+	if (status != 0) {
+		nfs_localio_ctx_free(localio);
+		hdr->task.tk_status = status;
+		nfs_local_hdr_release(hdr, call_ops);
+	}
+	return status;
+}
+
+static void
+nfs_local_init_commit(struct nfs_commit_data *data,
+		const struct rpc_call_ops *call_ops)
+{
+	data->task.tk_ops = call_ops;
+}
+
+static int
+nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
+{
+	loff_t start = data->args.offset;
+	loff_t end = LLONG_MAX;
+
+	if (data->args.count > 0) {
+		end = start + data->args.count - 1;
+		if (end < start)
+			end = LLONG_MAX;
+	}
+
+	dprintk("%s: commit %llu - %llu\n", __func__, start, end);
+	return vfs_fsync_range(filp, start, end, 0);
+}
+
+static void
+nfs_local_commit_done(struct nfs_commit_data *data, int status)
+{
+	if (status >= 0) {
+		nfs_set_local_verifier(data->inode,
+				data->res.verf,
+				NFS_FILE_SYNC);
+		data->res.op_status = NFS4_OK;
+		data->task.tk_status = 0;
+	} else {
+		nfs_reset_boot_verifier(data->inode);
+		data->res.op_status = nfs4_stat_to_errno(status);
+		data->task.tk_status = status;
+	}
+}
+
+static void
+nfs_local_release_commit_data(struct nfs_localio_ctx *localio,
+		struct nfs_commit_data *data,
+		const struct rpc_call_ops *call_ops)
+{
+	nfs_localio_ctx_free(localio);
+	call_ops->rpc_call_done(&data->task, data);
+	call_ops->rpc_release(data);
+}
+
+static struct nfs_local_fsync_ctx *
+nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
+			  struct nfs_localio_ctx *localio, gfp_t flags)
+{
+	struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
+
+	if (ctx != NULL) {
+		ctx->localio = localio;
+		ctx->data = data;
+		INIT_WORK(&ctx->work, nfs_local_fsync_work);
+		kref_init(&ctx->kref);
+		ctx->done = NULL;
+	}
+	return ctx;
+}
+
+static void
+nfs_local_fsync_ctx_kref_free(struct kref *kref)
+{
+	kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
+}
+
+static void
+nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
+{
+	kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
+}
+
+static void
+nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
+{
+	nfs_local_release_commit_data(ctx->localio, ctx->data,
+				      ctx->data->task.tk_ops);
+	nfs_local_fsync_ctx_put(ctx);
+}
+
+static void
+nfs_local_fsync_work(struct work_struct *work)
+{
+	struct nfs_local_fsync_ctx *ctx;
+	int status;
+
+	ctx = container_of(work, struct nfs_local_fsync_ctx, work);
+
+	status = nfs_local_run_commit(nfs_to.nfsd_file_file(ctx->localio->nf),
+				      ctx->data);
+	nfs_local_commit_done(ctx->data, status);
+	if (ctx->done != NULL)
+		complete(ctx->done);
+	nfs_local_fsync_ctx_free(ctx);
+}
+
+int nfs_local_commit(struct nfs_localio_ctx *localio,
+		     struct nfs_commit_data *data,
+		     const struct rpc_call_ops *call_ops, int how)
+{
+	struct nfs_local_fsync_ctx *ctx;
+
+	ctx = nfs_local_fsync_ctx_alloc(data, localio, GFP_KERNEL);
+	if (!ctx) {
+		nfs_local_commit_done(data, -ENOMEM);
+		nfs_local_release_commit_data(localio, data, call_ops);
+		return -ENOMEM;
+	}
+
+	nfs_local_init_commit(data, call_ops);
+	kref_get(&ctx->kref);
+	if (how & FLUSH_SYNC) {
+		DECLARE_COMPLETION_ONSTACK(done);
+		ctx->done = &done;
+		queue_work(nfsiod_workqueue, &ctx->work);
+		wait_for_completion(&done);
+	} else
+		queue_work(nfsiod_workqueue, &ctx->work);
+	nfs_local_fsync_ctx_put(ctx);
+	return 0;
+}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 352fdaed4075..1eab98c277fa 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1685,6 +1685,67 @@ TRACE_EVENT(nfs_mount_path,
 	TP_printk("path='%s'", __get_str(path))
 );
 
+TRACE_EVENT(nfs_local_open_fh,
+		TP_PROTO(
+			const struct nfs_fh *fh,
+			fmode_t fmode,
+			int error
+		),
+
+		TP_ARGS(fh, fmode, error),
+
+		TP_STRUCT__entry(
+			__field(int, error)
+			__field(u32, fhandle)
+			__field(unsigned int, fmode)
+		),
+
+		TP_fast_assign(
+			__entry->error = error;
+			__entry->fhandle = nfs_fhandle_hash(fh);
+			__entry->fmode = (__force unsigned int)fmode;
+		),
+
+		TP_printk(
+			"error=%d fhandle=0x%08x mode=%s",
+			__entry->error,
+			__entry->fhandle,
+			show_fs_fmode_flags(__entry->fmode)
+		)
+);
+
+DECLARE_EVENT_CLASS(nfs_local_client_event,
+		TP_PROTO(
+			const struct nfs_client *clp
+		),
+
+		TP_ARGS(clp),
+
+		TP_STRUCT__entry(
+			__field(unsigned int, protocol)
+			__string(server, clp->cl_hostname)
+		),
+
+		TP_fast_assign(
+			__entry->protocol = clp->rpc_ops->version;
+			__assign_str(server);
+		),
+
+		TP_printk(
+			"server=%s NFSv%u", __get_str(server), __entry->protocol
+		)
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+	DEFINE_EVENT(nfs_local_client_event, name, \
+			TP_PROTO( \
+				const struct nfs_client *clp \
+			), \
+			TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
+
 DECLARE_EVENT_CLASS(nfs_xdr_event,
 		TP_PROTO(
 			const struct xdr_stream *xdr,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 849db19451ff..cf68c0a61b7d 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -762,6 +762,10 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
 		hdr->args.count,
 		(unsigned long long)hdr->args.offset);
 
+	if (localio)
+		return nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client,
+				      localio, hdr, call_ops);
+
 	task = rpc_run_task(&task_setup_data);
 	if (IS_ERR(task))
 		return PTR_ERR(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 4bd16473a953..8bbbe8dace3b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1693,6 +1693,9 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
 
 	dprintk("NFS: initiated commit call\n");
 
+	if (localio)
+		return nfs_local_commit(localio, data, call_ops, how);
+
 	task = rpc_run_task(&task_setup_data);
 	if (IS_ERR(task))
 		return PTR_ERR(task);
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 5ff1a5b3b00c..89ef8c5e98db 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -8,6 +8,8 @@
 #ifndef _LINUX_NFS_H
 #define _LINUX_NFS_H
 
+#include <linux/cred.h>
+#include <linux/sunrpc/auth.h>
 #include <linux/sunrpc/msg_prot.h>
 #include <linux/string.h>
 #include <linux/crc32.h>
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 1df86ab98c77..fc7982fc218c 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -8,6 +8,7 @@
 #include <linux/wait.h>
 #include <linux/nfs_xdr.h>
 #include <linux/sunrpc/xprt.h>
+#include <linux/nfslocalio.h>
 
 #include <linux/atomic.h>
 #include <linux/refcount.h>
@@ -49,6 +50,7 @@ struct nfs_client {
 #define NFS_CS_DS		7		/* - Server is a DS */
 #define NFS_CS_REUSEPORT	8		/* - reuse src port on reconnect */
 #define NFS_CS_PNFS		9		/* - Server used for pnfs */
+#define NFS_CS_LOCAL_IO		10		/* - client is local */
 	struct sockaddr_storage	cl_addr;	/* server identifier */
 	size_t			cl_addrlen;
 	char *			cl_hostname;	/* hostname of server */
@@ -125,6 +127,14 @@ struct nfs_client {
 	struct net		*cl_net;
 	struct list_head	pending_cb_stateids;
 	struct rcu_head		rcu;
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	struct timespec64	cl_nfssvc_boot;
+	seqlock_t		cl_boot_lock;
+	struct net __rcu *	cl_nfssvc_net;
+	struct auth_domain __rcu * cl_nfssvc_dom;
+	spinlock_t		cl_localio_lock;
+#endif /* CONFIG_NFS_LOCALIO */
 };
 
 /*
-- 
2.44.0


^ permalink raw reply related	[flat|nested] 75+ messages in thread

* [PATCH v14 20/25] nfs: enable localio for non-pNFS IO
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (18 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 19/25] nfs: add localio support Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:04 ` [PATCH v14 21/25] pnfs/flexfiles: enable localio support Mike Snitzer
                   ` (5 subsequent siblings)
  25 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Try a local open of the file being written to, and if it succeeds,
then use localio to issue IO.
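
As a rough illustration of the control flow described above (not the
kernel code itself), the sketch below models the decision in plain
userspace C: a local open either yields a context or NULL, and the
initiate step picks the local path only when a context exists.  All
names here (demo_ctx, demo_local_open, demo_issue_io) are hypothetical
stand-ins, not kernel symbols.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_localio_ctx. */
struct demo_ctx {
	int fd;
};

enum demo_path { DEMO_PATH_RPC, DEMO_PATH_LOCAL };

/* Returns a context if the "server" is local, NULL otherwise. */
static struct demo_ctx *demo_local_open(bool server_is_local,
					struct demo_ctx *storage)
{
	if (!server_is_local)
		return NULL;
	storage->fd = 3;	/* pretend the local open succeeded */
	return storage;
}

/* A non-NULL context selects the local path; NULL means use RPC. */
static enum demo_path demo_issue_io(const struct demo_ctx *localio)
{
	return localio ? DEMO_PATH_LOCAL : DEMO_PATH_RPC;
}
```

Passing the (possibly NULL) context straight through keeps the caller
free of special cases: the RPC path remains the default whenever the
local open does not succeed.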

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/pagelist.c | 8 +++++++-
 fs/nfs/write.c    | 6 +++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index cf68c0a61b7d..1ea5d079ab8c 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -958,6 +958,12 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
 	nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
 	ret = nfs_generic_pgio(desc, hdr);
 	if (ret == 0) {
+		struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
+
+		struct nfs_localio_ctx *localio =
+			nfs_local_open_fh(clp, hdr->cred,
+					  hdr->args.fh, hdr->args.context->mode);
+
 		if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
 			task_flags = RPC_TASK_MOVEABLE;
 		ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
@@ -967,7 +973,7 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
 					desc->pg_rpc_callops,
 					desc->pg_ioflags,
 					RPC_TASK_CRED_NOREF | task_flags,
-					NULL);
+					localio);
 	}
 	return ret;
 }
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 8bbbe8dace3b..b8e8d5d0bc47 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1795,6 +1795,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
 		struct nfs_commit_info *cinfo)
 {
 	struct nfs_commit_data	*data;
+	struct nfs_localio_ctx *localio;
 	unsigned short task_flags = 0;
 
 	/* another commit raced with us */
@@ -1811,9 +1812,12 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
 	nfs_init_commit(data, head, NULL, cinfo);
 	if (NFS_SERVER(inode)->nfs_client->cl_minorversion)
 		task_flags = RPC_TASK_MOVEABLE;
+
+	localio = nfs_local_open_fh(NFS_SERVER(inode)->nfs_client, data->cred,
+				    data->args.fh, data->context->mode);
 	return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
 				   data->mds_ops, how,
-				   RPC_TASK_CRED_NOREF | task_flags, NULL);
+				   RPC_TASK_CRED_NOREF | task_flags, localio);
 }
 
 /*
-- 
2.44.0



* [PATCH v14 21/25] pnfs/flexfiles: enable localio support
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (19 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 20/25] nfs: enable localio for non-pNFS IO Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:04 ` [PATCH v14 22/25] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
                   ` (4 subsequent siblings)
  25 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Trond Myklebust <trond.myklebust@hammerspace.com>

If the DS is local to this client, use localio to write the data.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs/flexfilelayout/flexfilelayout.c    | 50 +++++++++++++++++++++--
 fs/nfs/flexfilelayout/flexfilelayoutdev.c |  6 +++
 2 files changed, 52 insertions(+), 4 deletions(-)

diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 01ee52551a63..7f1249db57b4 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -11,6 +11,7 @@
 #include <linux/nfs_mount.h>
 #include <linux/nfs_page.h>
 #include <linux/module.h>
+#include <linux/file.h>
 #include <linux/sched/mm.h>
 
 #include <linux/sunrpc/metrics.h>
@@ -162,6 +163,21 @@ decode_name(struct xdr_stream *xdr, u32 *id)
 	return 0;
 }
 
+static struct nfs_localio_ctx *
+ff_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+		 struct nfs_fh *fh, fmode_t mode)
+{
+	if (mode & FMODE_WRITE) {
+		/*
+		 * Always request read and write access since this corresponds
+		 * to a rw layout.
+		 */
+		mode |= FMODE_READ;
+	}
+
+	return nfs_local_open_fh(clp, cred, fh, mode);
+}
+
 static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
 		const struct nfs4_ff_layout_mirror *m2)
 {
@@ -237,7 +253,7 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
 
 static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
 {
-	const struct cred	*cred;
+	const struct cred *cred;
 
 	ff_layout_remove_mirror(mirror);
 	kfree(mirror->fh_versions);
@@ -1756,6 +1772,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
 	struct pnfs_layout_segment *lseg = hdr->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct nfs_localio_ctx *localio;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	loff_t offset = hdr->args.offset;
@@ -1802,11 +1819,18 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
 	hdr->args.offset = offset;
 	hdr->mds_offset = offset;
 
+	/* Start IO accounting for local read */
+	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, FMODE_READ);
+	if (localio) {
+		hdr->task.tk_start = ktime_get();
+		ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
+	}
+
 	/* Perform an asynchronous read to ds */
 	nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_read_call_ops_v3 :
 				      &ff_layout_read_call_ops_v4,
-			  0, RPC_TASK_SOFTCONN, NULL);
+			  0, RPC_TASK_SOFTCONN, localio);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
@@ -1826,6 +1850,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
 	struct pnfs_layout_segment *lseg = hdr->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct nfs_localio_ctx *localio;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	loff_t offset = hdr->args.offset;
@@ -1870,11 +1895,19 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
 	 */
 	hdr->args.offset = offset;
 
+	/* Start IO accounting for local write */
+	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+				   FMODE_READ|FMODE_WRITE);
+	if (localio) {
+		hdr->task.tk_start = ktime_get();
+		ff_layout_write_record_layoutstats_start(&hdr->task, hdr);
+	}
+
 	/* Perform an asynchronous write */
 	nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
 			  vers == 3 ? &ff_layout_write_call_ops_v3 :
 				      &ff_layout_write_call_ops_v4,
-			  sync, RPC_TASK_SOFTCONN, NULL);
+			  sync, RPC_TASK_SOFTCONN, localio);
 	put_cred(ds_cred);
 	return PNFS_ATTEMPTED;
 
@@ -1908,6 +1941,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
 	struct pnfs_layout_segment *lseg = data->lseg;
 	struct nfs4_pnfs_ds *ds;
 	struct rpc_clnt *ds_clnt;
+	struct nfs_localio_ctx *localio;
 	struct nfs4_ff_layout_mirror *mirror;
 	const struct cred *ds_cred;
 	u32 idx;
@@ -1946,10 +1980,18 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
 	if (fh)
 		data->args.fh = fh;
 
+	/* Start IO accounting for local commit */
+	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+				   FMODE_READ|FMODE_WRITE);
+	if (localio) {
+		data->task.tk_start = ktime_get();
+		ff_layout_commit_record_layoutstats_start(&data->task, data);
+	}
+
 	ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
 				   vers == 3 ? &ff_layout_commit_call_ops_v3 :
 					       &ff_layout_commit_call_ops_v4,
-				   how, RPC_TASK_SOFTCONN, NULL);
+				   how, RPC_TASK_SOFTCONN, localio);
 	put_cred(ds_cred);
 	return ret;
 out_err:
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index e028f5a0ef5f..e58bedfb1dcc 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -395,6 +395,12 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
 
 	/* connect success, check rsize/wsize limit */
 	if (!status) {
+		/*
+		 * ds_clp is put in destroy_ds().
+		 * keep ds_clp even if DS is local, so that if local IO cannot
+		 * proceed somehow, we can fall back to NFS whenever we want.
+		 */
+		nfs_local_probe(ds->ds_clp);
 		max_payload =
 			nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
 				       NULL);
-- 
2.44.0



* [PATCH v14 22/25] nfs/localio: use dedicated workqueues for filesystem read and write
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (20 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 21/25] pnfs/flexfiles: enable localio support Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:04 ` [PATCH v14 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
                   ` (3 subsequent siblings)
  25 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Trond Myklebust <trond.myklebust@hammerspace.com>

For localio access, don't call filesystem read() and write() routines
directly.  This solves two problems:

1) localio writes need to use a normal (non-memreclaim) unbound
   workqueue.  This avoids imposing new requirements on how underlying
   filesystems process frontend IO, which would cause a large amount
   of work to update all filesystems.  Without this change, when XFS
   starts getting low on space, XFS flushes work on a non-memreclaim
   work queue, which causes a priority inversion problem:

00573 workqueue: WQ_MEM_RECLAIM writeback:wb_workfn is flushing !WQ_MEM_RECLAIM xfs-sync/vdc:xfs_flush_inodes_worker
00573 WARNING: CPU: 6 PID: 8525 at kernel/workqueue.c:3706 check_flush_dependency+0x2a4/0x328
00573 Modules linked in:
00573 CPU: 6 PID: 8525 Comm: kworker/u71:5 Not tainted 6.10.0-rc3-ktest-00032-g2b0a133403ab #18502
00573 Hardware name: linux,dummy-virt (DT)
00573 Workqueue: writeback wb_workfn (flush-0:33)
00573 pstate: 400010c5 (nZcv daIF -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00573 pc : check_flush_dependency+0x2a4/0x328
00573 lr : check_flush_dependency+0x2a4/0x328
00573 sp : ffff0000c5f06bb0
00573 x29: ffff0000c5f06bb0 x28: ffff0000c998a908 x27: 1fffe00019331521
00573 x26: ffff0000d0620900 x25: ffff0000c5f06ca0 x24: ffff8000828848c0
00573 x23: 1fffe00018be0d8e x22: ffff0000c1210000 x21: ffff0000c75fde00
00573 x20: ffff800080bfd258 x19: ffff0000cad63400 x18: ffff0000cd3a4810
00573 x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080508d98
00573 x14: 0000000000000000 x13: 204d49414c434552 x12: 1fffe0001b6eeab2
00573 x11: ffff60001b6eeab2 x10: dfff800000000000 x9 : ffff60001b6eeab3
00573 x8 : 0000000000000001 x7 : 00009fffe491154e x6 : ffff0000db775593
00573 x5 : ffff0000db775590 x4 : ffff0000db775590 x3 : 0000000000000000
00573 x2 : 0000000000000027 x1 : ffff600018be0d62 x0 : dfff800000000000
00573 Call trace:
00573  check_flush_dependency+0x2a4/0x328
00573  __flush_work+0x184/0x5c8
00573  flush_work+0x18/0x28
00573  xfs_flush_inodes+0x68/0x88
00573  xfs_file_buffered_write+0x128/0x6f0
00573  xfs_file_write_iter+0x358/0x448
00573  nfs_local_doio+0x854/0x1568
00573  nfs_initiate_pgio+0x214/0x418
00573  nfs_generic_pg_pgios+0x304/0x480
00573  nfs_pageio_doio+0xe8/0x240
00573  nfs_pageio_complete+0x160/0x480
00573  nfs_writepages+0x300/0x4f0
00573  do_writepages+0x12c/0x4a0
00573  __writeback_single_inode+0xd4/0xa68
00573  writeback_sb_inodes+0x470/0xcb0
00573  __writeback_inodes_wb+0xb0/0x1d0
00573  wb_writeback+0x594/0x808
00573  wb_workfn+0x5e8/0x9e0
00573  process_scheduled_works+0x53c/0xd90
00573  worker_thread+0x370/0x8c8
00573  kthread+0x258/0x2e8
00573  ret_from_fork+0x10/0x20

2) Some filesystem writeback routines can end up taking up a lot of
   stack space (particularly XFS).  Instead of risking a stack overrun
   due to the extra overhead from the NFS stack, we should just call
   these routines from a workqueue job.  Since we need to do this to
   address 1) above we're able to avoid possibly blowing the stack
   "for free".
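
The deferral pattern behind both points can be sketched in userspace C
as follows.  This is only a model under stated assumptions: demo_work,
demo_iocb, demo_queue_work and demo_run_queue are hypothetical
stand-ins for work_struct, nfs_local_kiocb, queue_work() and a worker
thread, and the "write" is faked.  It shows the submitter returning
without doing IO, and the handler recovering its enclosing request
from the embedded work item.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel's work_struct and container_of(). */
struct demo_work {
	void (*func)(struct demo_work *work);
	struct demo_work *next;
};

#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical analogue of struct nfs_local_kiocb. */
struct demo_iocb {
	long status;		/* result filled in by the worker */
	struct demo_work work;	/* embedded work item */
};

static struct demo_work *demo_queue_head;
static struct demo_work **demo_queue_tail = &demo_queue_head;

/* Submitter side: enqueue and return without doing the IO. */
static void demo_queue_work(struct demo_work *work,
			    void (*func)(struct demo_work *))
{
	work->func = func;
	work->next = NULL;
	*demo_queue_tail = work;
	demo_queue_tail = &work->next;
}

/* Worker side: runs later, on the worker's own stack. */
static void demo_call_write(struct demo_work *work)
{
	struct demo_iocb *iocb =
		demo_container_of(work, struct demo_iocb, work);

	iocb->status = 4096;	/* pretend 4096 bytes were written */
}

/* Drain the queue in FIFO order, as a worker thread would. */
static void demo_run_queue(void)
{
	while (demo_queue_head) {
		struct demo_work *work = demo_queue_head;

		demo_queue_head = work->next;
		if (!demo_queue_head)
			demo_queue_tail = &demo_queue_head;
		work->func(work);
	}
}
```

Because the handler starts from a fresh call chain, the filesystem's
write path no longer sits on top of the NFS writeback frames seen in
the trace above.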

Use of dedicated workqueues improves performance over using the
system_unbound_wq.

Also, the creds used to open the file are used to override_creds() in
both nfs_local_call_read() and nfs_local_call_write() -- otherwise the
workqueue worker could run with elevated capabilities (which the
caller may not have).

Lastly, care is taken to set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in
nfs_local_call_write() to avoid writeback deadlocks.

The PF_LOCAL_THROTTLE flag prevents deadlocks in balance_dirty_pages()
by causing writes to only be throttled against other writes to the
same bdi (it keeps the throttling local).  Normally all writes to
bdi(s) are throttled equally (after throughput factors are allowed
for).

The PF_MEMALLOC_NOIO flag prevents the lower filesystem IO from
causing memory reclaim to re-enter filesystems or IO devices and so
prevents deadlocks from occurring where IO that cleans pages is
waiting on IO to complete.
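
The flag handling amounts to a save/set/restore bracket around the
write.  A minimal userspace sketch of that bracket, assuming made-up
flag values (the DEMO_PF_* constants and demo_task_flags are
hypothetical; the real bits live in <linux/sched.h> and apply to
current->flags):

```c
/* Hypothetical flag values, for illustration only. */
#define DEMO_PF_LOCAL_THROTTLE	0x1u
#define DEMO_PF_MEMALLOC_NOIO	0x2u
#define DEMO_PF_OTHER		0x4u

/* Stand-in for current->flags. */
static unsigned int demo_task_flags = DEMO_PF_OTHER;

/* Flags observed while the write was in flight. */
static unsigned int demo_flags_during_write;

/* Fake write that just records which flags were active. */
static void demo_write(void)
{
	demo_flags_during_write = demo_task_flags;
}

static void demo_flagged_write(void)
{
	unsigned int old_flags = demo_task_flags;

	/* Set both flags for the duration of the write only. */
	demo_task_flags |= DEMO_PF_LOCAL_THROTTLE | DEMO_PF_MEMALLOC_NOIO;
	demo_write();
	/* Restore the saved value so later work sees the old flags. */
	demo_task_flags = old_flags;
}
```

Restoring the saved value matters here because a workqueue worker is
reused for unrelated work items afterwards.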

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de> # eliminated wait_for_completion
---
 fs/nfs/inode.c    | 57 +++++++++++++++++++++++------------
 fs/nfs/internal.h |  1 +
 fs/nfs/localio.c  | 75 ++++++++++++++++++++++++++++++++++-------------
 3 files changed, 93 insertions(+), 40 deletions(-)

diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index b4914a11c3c2..542c7d97b235 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2461,35 +2461,54 @@ static void nfs_destroy_inodecache(void)
 	kmem_cache_destroy(nfs_inode_cachep);
 }
 
+struct workqueue_struct *nfslocaliod_workqueue;
 struct workqueue_struct *nfsiod_workqueue;
 EXPORT_SYMBOL_GPL(nfsiod_workqueue);
 
 /*
- * start up the nfsiod workqueue
- */
-static int nfsiod_start(void)
-{
-	struct workqueue_struct *wq;
-	dprintk("RPC:       creating workqueue nfsiod\n");
-	wq = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
-	if (wq == NULL)
-		return -ENOMEM;
-	nfsiod_workqueue = wq;
-	return 0;
-}
-
-/*
- * Destroy the nfsiod workqueue
+ * Destroy the nfsiod workqueues
  */
 static void nfsiod_stop(void)
 {
 	struct workqueue_struct *wq;
 
 	wq = nfsiod_workqueue;
-	if (wq == NULL)
-		return;
-	nfsiod_workqueue = NULL;
-	destroy_workqueue(wq);
+	if (wq != NULL) {
+		nfsiod_workqueue = NULL;
+		destroy_workqueue(wq);
+	}
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	wq = nfslocaliod_workqueue;
+	if (wq != NULL) {
+		nfslocaliod_workqueue = NULL;
+		destroy_workqueue(wq);
+	}
+#endif /* CONFIG_NFS_LOCALIO */
+}
+
+/*
+ * Start the nfsiod workqueues
+ */
+static int nfsiod_start(void)
+{
+	dprintk("RPC:       creating workqueue nfsiod\n");
+	nfsiod_workqueue = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+	if (nfsiod_workqueue == NULL)
+		return -ENOMEM;
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	/*
+	 * localio writes need to use a normal (non-memreclaim) workqueue.
+	 * When we start getting low on space, XFS goes and calls flush_work() on
+	 * a non-memreclaim work queue, which causes a priority inversion problem.
+	 */
+	dprintk("RPC:       creating workqueue nfslocaliod\n");
+	nfslocaliod_workqueue = alloc_workqueue("nfslocaliod", WQ_UNBOUND, 0);
+	if (unlikely(nfslocaliod_workqueue == NULL)) {
+		nfsiod_stop();
+		return -ENOMEM;
+	}
+#endif /* CONFIG_NFS_LOCALIO */
+	return 0;
 }
 
 unsigned int nfs_net_id;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 0716c90eaf9c..cc7e6f466dc9 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -440,6 +440,7 @@ int nfs_check_flags(int);
 
 /* inode.c */
 extern struct workqueue_struct *nfsiod_workqueue;
+extern struct workqueue_struct *nfslocaliod_workqueue;
 extern struct inode *nfs_alloc_inode(struct super_block *sb);
 extern void nfs_free_inode(struct inode *);
 extern int nfs_write_inode(struct inode *, struct writeback_control *);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 96e04328adb9..fa598b99941a 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -268,15 +268,34 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
 			status > 0 ? status : 0, hdr->res.eof);
 }
 
+static void nfs_local_call_read(struct work_struct *work)
+{
+	struct nfs_local_kiocb *iocb =
+		container_of(work, struct nfs_local_kiocb, work);
+	struct file *filp = iocb->kiocb.ki_filp;
+	const struct cred *save_cred;
+	struct iov_iter iter;
+	ssize_t status;
+
+	save_cred = override_creds(filp->f_cred);
+
+	nfs_local_iter_init(&iter, iocb, READ);
+
+	status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+	WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+	nfs_local_read_done(iocb, status);
+	nfs_local_pgio_release(iocb);
+
+	revert_creds(save_cred);
+}
+
 static int
 nfs_do_local_read(struct nfs_pgio_header *hdr,
 		  struct nfs_localio_ctx *localio,
 		  const struct rpc_call_ops *call_ops)
 {
-	struct file *filp = nfs_to.nfsd_file_file(localio->nf);
 	struct nfs_local_kiocb *iocb;
-	struct iov_iter iter;
-	ssize_t status;
 
 	dprintk("%s: vfs_read count=%u pos=%llu\n",
 		__func__, hdr->args.count, hdr->args.offset);
@@ -284,16 +303,12 @@ nfs_do_local_read(struct nfs_pgio_header *hdr,
 	iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL);
 	if (iocb == NULL)
 		return -ENOMEM;
-	nfs_local_iter_init(&iter, iocb, READ);
 
 	nfs_local_pgio_init(hdr, call_ops);
 	hdr->res.eof = false;
 
-	status = filp->f_op->read_iter(&iocb->kiocb, &iter);
-	WARN_ON_ONCE(status == -EIOCBQUEUED);
-
-	nfs_local_read_done(iocb, status);
-	nfs_local_pgio_release(iocb);
+	INIT_WORK(&iocb->work, nfs_local_call_read);
+	queue_work(nfslocaliod_workqueue, &iocb->work);
 
 	return 0;
 }
@@ -421,15 +436,40 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
 	nfs_local_pgio_done(hdr, status);
 }
 
+static void nfs_local_call_write(struct work_struct *work)
+{
+	struct nfs_local_kiocb *iocb =
+		container_of(work, struct nfs_local_kiocb, work);
+	struct file *filp = iocb->kiocb.ki_filp;
+	unsigned long old_flags = current->flags;
+	const struct cred *save_cred;
+	struct iov_iter iter;
+	ssize_t status;
+
+	current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
+	save_cred = override_creds(filp->f_cred);
+
+	nfs_local_iter_init(&iter, iocb, WRITE);
+
+	file_start_write(filp);
+	status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+	file_end_write(filp);
+	WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+	nfs_local_write_done(iocb, status);
+	nfs_local_vfs_getattr(iocb);
+	nfs_local_pgio_release(iocb);
+
+	revert_creds(save_cred);
+	current->flags = old_flags;
+}
+
 static int
 nfs_do_local_write(struct nfs_pgio_header *hdr,
 		   struct nfs_localio_ctx *localio,
 		   const struct rpc_call_ops *call_ops)
 {
-	struct file *filp = nfs_to.nfsd_file_file(localio->nf);
 	struct nfs_local_kiocb *iocb;
-	struct iov_iter iter;
-	ssize_t status;
 
 	dprintk("%s: vfs_write count=%u pos=%llu %s\n",
 		__func__, hdr->args.count, hdr->args.offset,
@@ -438,7 +478,6 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
 	iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO);
 	if (iocb == NULL)
 		return -ENOMEM;
-	nfs_local_iter_init(&iter, iocb, WRITE);
 
 	switch (hdr->args.stable) {
 	default:
@@ -453,14 +492,8 @@ nfs_do_local_write(struct nfs_pgio_header *hdr,
 
 	nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
 
-	file_start_write(filp);
-	status = filp->f_op->write_iter(&iocb->kiocb, &iter);
-	file_end_write(filp);
-	WARN_ON_ONCE(status == -EIOCBQUEUED);
-
-	nfs_local_write_done(iocb, status);
-	nfs_local_vfs_getattr(iocb);
-	nfs_local_pgio_release(iocb);
+	INIT_WORK(&iocb->work, nfs_local_call_write);
+	queue_work(nfslocaliod_workqueue, &iocb->work);
 
 	return 0;
 }
-- 
2.44.0



* [PATCH v14 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (21 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 22/25] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:04 ` [PATCH v14 24/25] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
                   ` (2 subsequent siblings)
  25 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common for subsequent lookup and verification
by the NFS server.  If matched, the NFS server populates members in the
nfs_uuid_t struct.  The NFS client then transfers these nfs_uuid_t
struct member pointers to the nfs_client struct and cleans up the
nfs_uuid_t struct.  See: fs/nfs/localio.c:nfs_local_probe()
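
The handshake described above can be modelled in userspace C roughly
as follows.  This is a sketch, not the kernel implementation: the
single-slot table, the demo_* names and the dummy net/dom objects are
all assumptions made for illustration.  The point it shows is that
only a server sharing memory with the client can find the nonce and
fill in the members the client later checks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define DEMO_UUID_SIZE 16

/* Hypothetical analogue of nfs_uuid_t: a nonce plus server-filled members. */
struct demo_uuid {
	unsigned char uuid[DEMO_UUID_SIZE];
	const void *net;	/* set by the server on a match */
	const void *dom;	/* set by the server on a match */
};

/* Single-slot stand-in for the shared lookup table in nfs_common. */
static struct demo_uuid *demo_registered;

/* Client: publish the nonce so a same-host server can find it. */
static void demo_uuid_begin(struct demo_uuid *u, const unsigned char *nonce)
{
	memcpy(u->uuid, nonce, DEMO_UUID_SIZE);
	u->net = NULL;
	u->dom = NULL;
	demo_registered = u;
}

/* Client: withdraw the nonce once the probe is finished. */
static void demo_uuid_end(struct demo_uuid *u)
{
	if (demo_registered == u)
		demo_registered = NULL;
}

/* Server: handle UUID_IS_LOCAL by looking the received nonce up. */
static void demo_server_uuid_is_local(const unsigned char *received)
{
	static const int demo_net = 0, demo_dom = 0;

	if (demo_registered &&
	    !memcmp(demo_registered->uuid, received, DEMO_UUID_SIZE)) {
		demo_registered->net = &demo_net;
		demo_registered->dom = &demo_dom;
	}
}

/* Client: the server is local only if it filled in both members. */
static bool demo_is_local(const struct demo_uuid *u)
{
	return u->net && u->dom;
}
```

A server on another host would receive the same bytes over the wire
but have nothing to look them up in, so the members stay NULL and the
client keeps using RPC.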

This protocol isn't part of an IETF standard, nor does it need to be
considering it is a Linux-to-Linux auxiliary RPC protocol that amounts
to an implementation detail.

Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used (enforced by fs/nfs/localio.c:nfs_local_probe()).

The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes).  The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
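
To make the size difference concrete, here is a small userspace sketch
of the two XDR opaque encodings.  The demo_* helpers are written for
this example and are not the kernel's XDR routines; they follow the
usual XDR rules (4-byte alignment, big-endian length prefix for
variable-sized data).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_UUID_SIZE 16

/* Round a byte count up to whole 4-byte XDR words. */
static size_t demo_xdr_pad(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Fixed-size opaque: the bytes only, zero-padded to a 4-byte boundary. */
static size_t demo_encode_opaque_fixed(uint8_t *out, const uint8_t *data,
				       size_t len)
{
	size_t padded = demo_xdr_pad(len);

	memcpy(out, data, len);
	memset(out + len, 0, padded - len);
	return padded;
}

/* Variable-size opaque: a 4-byte big-endian length, then the padded bytes. */
static size_t demo_encode_opaque_var(uint8_t *out, const uint8_t *data,
				     size_t len)
{
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	return 4 + demo_encode_opaque_fixed(out + 4, data, len);
}
```

For a 16-byte UUID the fixed form occupies exactly four XDR words,
which is what p_arglen = XDR_QUADLEN(UUID_SIZE) expresses in the patch
below; the variable form would need a fifth word for the length.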

Having a nonce (single-use uuid) is better than using the same uuid
for the life of the server, and sending it proactively by the client
rather than reactively by the server is also safer.

[NeilBrown factored out and simplified a single localio protocol and
proposed making the uuid short-lived]

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
 fs/nfs/client.c  |   6 ++-
 fs/nfs/localio.c | 136 +++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 135 insertions(+), 7 deletions(-)

diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index b981c519a12d..6a4b605cc943 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -434,8 +434,10 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
 			list_add_tail(&new->cl_share_link,
 					&nn->nfs_client_list);
 			spin_unlock(&nn->nfs_client_lock);
-			nfs_local_probe(new);
-			return rpc_ops->init_client(new, cl_init);
+			new = rpc_ops->init_client(new, cl_init);
+			if (!IS_ERR(new))
+				nfs_local_probe(new);
+			return new;
 		}
 
 		spin_unlock(&nn->nfs_client_lock);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index fa598b99941a..40521da422f7 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -50,18 +50,77 @@ static void nfs_local_fsync_work(struct work_struct *work);
 static bool localio_enabled __read_mostly = true;
 module_param(localio_enabled, bool, 0644);
 
+static inline bool nfs_client_is_local(const struct nfs_client *clp)
+{
+	return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+}
+
 bool nfs_server_is_local(const struct nfs_client *clp)
 {
-	return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
-		localio_enabled;
+	return nfs_client_is_local(clp) && localio_enabled;
 }
 EXPORT_SYMBOL_GPL(nfs_server_is_local);
 
+/*
+ * UUID_IS_LOCAL XDR functions
+ */
+
+static void localio_xdr_enc_uuidargs(struct rpc_rqst *req,
+				     struct xdr_stream *xdr,
+				     const void *data)
+{
+	const u8 *uuid = data;
+
+	encode_opaque_fixed(xdr, uuid, UUID_SIZE);
+}
+
+static int localio_xdr_dec_uuidres(struct rpc_rqst *req,
+				   struct xdr_stream *xdr,
+				   void *result)
+{
+	/* void return */
+	return 0;
+}
+
+static const struct rpc_procinfo nfs_localio_procedures[] = {
+	[LOCALIOPROC_UUID_IS_LOCAL] = {
+		.p_proc = LOCALIOPROC_UUID_IS_LOCAL,
+		.p_encode = localio_xdr_enc_uuidargs,
+		.p_decode = localio_xdr_dec_uuidres,
+		.p_arglen = XDR_QUADLEN(UUID_SIZE),
+		.p_replen = 0,
+		.p_statidx = LOCALIOPROC_UUID_IS_LOCAL,
+		.p_name = "UUID_IS_LOCAL",
+	},
+};
+
+static unsigned int nfs_localio_counts[ARRAY_SIZE(nfs_localio_procedures)];
+static const struct rpc_version nfslocalio_version1 = {
+	.number			= 1,
+	.nrprocs		= ARRAY_SIZE(nfs_localio_procedures),
+	.procs			= nfs_localio_procedures,
+	.counts			= nfs_localio_counts,
+};
+
+static const struct rpc_version *nfslocalio_version[] = {
+	[1]			= &nfslocalio_version1,
+};
+
+extern const struct rpc_program nfslocalio_program;
+static struct rpc_stat		nfslocalio_rpcstat = { &nfslocalio_program };
+
+const struct rpc_program nfslocalio_program = {
+	.name			= "nfslocalio",
+	.number			= NFS_LOCALIO_PROGRAM,
+	.nrvers			= ARRAY_SIZE(nfslocalio_version),
+	.version		= nfslocalio_version,
+	.stats			= &nfslocalio_rpcstat,
+};
+
 /*
  * nfs_local_enable - enable local i/o for an nfs_client
  */
-static __maybe_unused void nfs_local_enable(struct nfs_client *clp,
-					    nfs_uuid_t *nfs_uuid)
+static void nfs_local_enable(struct nfs_client *clp, nfs_uuid_t *nfs_uuid)
 {
 	spin_lock(&clp->cl_localio_lock);
 
@@ -103,11 +162,77 @@ void nfs_local_disable(struct nfs_client *clp)
 	spin_unlock(&clp->cl_localio_lock);
 }
 
+/*
+ * nfs_init_localioclient - Initialise an NFS localio client connection
+ */
+static struct rpc_clnt *nfs_init_localioclient(struct nfs_client *clp)
+{
+	struct rpc_clnt *rpcclient_localio;
+
+	rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
+						 &nfslocalio_program, 1);
+
+	dprintk_rcu("%s: server (%s) %s NFS LOCALIO.\n",
+		__func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
+		(IS_ERR(rpcclient_localio) ? "does not support" : "supports"));
+
+	return rpcclient_localio;
+}
+
+static bool nfs_server_uuid_is_local(struct nfs_client *clp,
+				     nfs_uuid_t *nfs_uuid)
+{
+	u8 uuid[UUID_SIZE];
+	struct rpc_message msg = {
+		.rpc_argp = &uuid,
+	};
+	struct rpc_clnt *rpcclient_localio;
+	int status;
+
+	rpcclient_localio = nfs_init_localioclient(clp);
+	if (IS_ERR(rpcclient_localio))
+		return false;
+
+	export_uuid(uuid, &nfs_uuid->uuid);
+
+	msg.rpc_proc = &nfs_localio_procedures[LOCALIOPROC_UUID_IS_LOCAL];
+	status = rpc_call_sync(rpcclient_localio, &msg, 0);
+	dprintk("%s: NFS reply UUID_IS_LOCAL: status=%d\n",
+		__func__, status);
+	rpc_shutdown_client(rpcclient_localio);
+
+	/* Server is only local if it initialized required struct members */
+	if (status || !nfs_uuid->net || !nfs_uuid->dom)
+		return false;
+
+	return true;
+}
+
 /*
  * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ * - called after alloc_client and init_client (so cl_rpcclient exists)
+ * - this function is idempotent, it can be called for old or new clients
  */
 void nfs_local_probe(struct nfs_client *clp)
 {
+	nfs_uuid_t nfs_uuid;
+
+	/* Disallow localio if disabled via sysfs or AUTH_SYS isn't used */
+	if (!localio_enabled ||
+	    clp->cl_rpcclient->cl_auth->au_flavor != RPC_AUTH_UNIX) {
+		nfs_local_disable(clp);
+		return;
+	}
+
+	if (nfs_client_is_local(clp)) {
+		/* If already enabled, disable and re-enable */
+		nfs_local_disable(clp);
+	}
+
+	nfs_uuid_begin(&nfs_uuid);
+	if (nfs_server_uuid_is_local(clp, &nfs_uuid))
+		nfs_local_enable(clp, &nfs_uuid);
+	nfs_uuid_end(&nfs_uuid);
 }
 EXPORT_SYMBOL_GPL(nfs_local_probe);
 
@@ -146,7 +271,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 		case -ENOMEM:
 		case -ENXIO:
 		case -ENOENT:
-			nfs_local_disable(clp);
+			/* Revalidate localio, will disable if unsupported */
+			nfs_local_probe(clp);
 		}
 		return NULL;
 	}
-- 
2.44.0



* [PATCH v14 24/25] nfs: add Documentation/filesystems/nfs/localio.rst
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (22 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:04 ` [PATCH v14 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
  2024-08-29  1:42 ` [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
  25 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

This document gives an overview of the LOCALIO auxiliary RPC protocol
added to the Linux NFS client and server to allow them to reliably
handshake to determine if they are on the same host.

Once an NFS client and server handshake as "local", the client will
bypass the network RPC protocol for read, write and commit operations.
Due to this XDR and RPC bypass, these operations will complete faster.

Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 Documentation/filesystems/nfs/localio.rst | 199 ++++++++++++++++++++++
 1 file changed, 199 insertions(+)
 create mode 100644 Documentation/filesystems/nfs/localio.rst

diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..8cceb3db386a
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,199 @@
+===========
+NFS LOCALIO
+===========
+
+Overview
+========
+
+The LOCALIO auxiliary RPC protocol allows the Linux NFS client and
+server to reliably handshake to determine if they are on the same host.
+
+Once an NFS client and server handshake as "local", the client will
+bypass the network RPC protocol for read, write and commit operations.
+Due to this XDR and RPC bypass, these operations will complete faster.
+
+The LOCALIO auxiliary protocol's implementation, which uses the same
+connection as NFS traffic, follows the pattern established by the NFS
+ACL protocol extension.
+
+The LOCALIO auxiliary protocol is needed to allow robust discovery of
+clients local to their servers. In a private implementation that
+preceded use of this LOCALIO protocol, a fragile sockaddr network
+address based match against all local network interfaces was attempted.
+But unlike the LOCALIO protocol, the sockaddr-based matching didn't
+handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning. The ultimate use case this locality makes possible is for
+the client to open files and issue reads, writes and commits directly
+to the server without having to go over the network. The requirement
+is to perform these loopback NFS operations as efficiently as
+possible. This is particularly useful for container use cases
+(e.g. Kubernetes) where it is possible to run an IO job local to the
+server.
+
+The performance advantage realized from LOCALIO's ability to bypass
+using XDR and RPC for reads, writes and commits can be extreme, e.g.:
+
+fio for 20 secs with directio, qd of 8, 16 libaio threads:
+- With LOCALIO:
+  4K read:    IOPS=979k,  BW=3825MiB/s (4011MB/s)(74.7GiB/20002msec)
+  4K write:   IOPS=165k,  BW=646MiB/s  (678MB/s)(12.6GiB/20002msec)
+  128K read:  IOPS=402k,  BW=49.1GiB/s (52.7GB/s)(982GiB/20002msec)
+  128K write: IOPS=11.5k, BW=1433MiB/s (1503MB/s)(28.0GiB/20004msec)
+
+- Without LOCALIO:
+  4K read:    IOPS=79.2k, BW=309MiB/s  (324MB/s)(6188MiB/20003msec)
+  4K write:   IOPS=59.8k, BW=234MiB/s  (245MB/s)(4671MiB/20002msec)
+  128K read:  IOPS=33.9k, BW=4234MiB/s (4440MB/s)(82.7GiB/20004msec)
+  128K write: IOPS=11.5k, BW=1434MiB/s (1504MB/s)(28.0GiB/20011msec)
+
+fio for 20 secs with directio, qd of 8, 1 libaio thread:
+- With LOCALIO:
+  4K read:    IOPS=230k,  BW=898MiB/s  (941MB/s)(17.5GiB/20001msec)
+  4K write:   IOPS=22.6k, BW=88.3MiB/s (92.6MB/s)(1766MiB/20001msec)
+  128K read:  IOPS=38.8k, BW=4855MiB/s (5091MB/s)(94.8GiB/20001msec)
+  128K write: IOPS=11.4k, BW=1428MiB/s (1497MB/s)(27.9GiB/20001msec)
+
+- Without LOCALIO:
+  4K read:    IOPS=77.1k, BW=301MiB/s  (316MB/s)(6022MiB/20001msec)
+  4K write:   IOPS=32.8k, BW=128MiB/s  (135MB/s)(2566MiB/20001msec)
+  128K read:  IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec)
+  128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec)
+
+RPC
+===
+
+The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
+RPC method that allows the Linux NFS client to verify the local Linux
+NFS server can see the nonce (single-use UUID) the client generated and
+made available in nfs_common. This protocol isn't part of an IETF
+standard, nor does it need to be, considering it is a Linux-to-Linux
+auxiliary RPC protocol that amounts to an implementation detail.
+
+The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
+the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
+XDR methods are used instead of the less efficient variable sized
+methods.
+
+The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
+by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
+Linux Kernel Organization       400122  nfslocalio
+
+The LOCALIO protocol spec in rpcgen syntax is:
+
+/* raw RFC 9562 UUID */
+#define UUID_SIZE 16
+typedef u8 uuid_t<UUID_SIZE>;
+
+program NFS_LOCALIO_PROGRAM {
+    version LOCALIO_V1 {
+        void
+            NULL(void) = 0;
+
+        void
+            UUID_IS_LOCAL(uuid_t) = 1;
+    } = 1;
+} = 400122;
+
+LOCALIO uses the same transport connection as NFS traffic. As such,
+LOCALIO is not registered with rpcbind.
+
+NFS Common and Client/Server Handshake
+======================================
+
+fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client
+to generate a nonce (single-use UUID) and associated short-lived
+nfs_uuid_t struct, then register it with nfs_common for subsequent lookup
+and verification by the NFS server. On a match, the NFS server populates
+members in the nfs_uuid_t struct. The NFS client then transfers these
+nfs_uuid_t struct member pointers to the nfs_client struct and cleans up
+the nfs_uuid_t struct.  See: fs/nfs/localio.c:nfs_local_probe()
+
+nfs_common's nfs_uuids list is the basis for LOCALIO enablement. As such,
+it has members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace; through it the client can
+access nn->nfsd_serv with proper RCU read access). It is this client
+and server synchronization that enables advanced usage and lifetime of
+objects to span from the host kernel's nfsd to per-container knfsd
+instances that are connected to NFS clients running on the same local
+host.
+
+NFS Client issues IO instead of Server
+======================================
+
+Because LOCALIO is focused on protocol bypass to achieve improved IO
+performance, alternatives to the traditional NFS wire protocol (SUNRPC
+with XDR) must be provided to access the backing filesystem.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of select nfs server objects to allow a client local to a
+server to open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU. If nfsd_open_local_fh() finds that the client no longer
+sees valid nfsd objects (be it struct net or nn->nfsd_serv) it returns
+ENXIO to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if/when an nfsd instance running in a container
+reboots while a LOCALIO client is connected to it.
+
+Once the client has an open file pointer it will issue reads, writes and
+commits directly to the underlying local filesystem (normally done by
+the nfs server). As such, for these operations, the NFS client is
+issuing IO to the underlying local filesystem that it is sharing with
+the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
+fs/nfs/localio.c:nfs_local_commit().
+
+Security
+========
+
+LOCALIO is only supported when UNIX-style authentication (AUTH_UNIX, aka
+AUTH_SYS) is used.
+
+Care is taken to ensure the same NFS security mechanisms are used
+(authentication, etc) regardless of whether LOCALIO or regular NFS
+access is used. The auth_domain established as part of the traditional
+NFS client access to the NFS server is also used for LOCALIO.
+
+Relative to containers, LOCALIO gives the client access to the network
+namespace the server has. This is required to allow the client to access
+the server's per-namespace nfsd_net struct. With traditional NFS, the
+client is afforded this same level of access (albeit in terms of the NFS
+protocol via SUNRPC). No other namespaces (user, mount, etc) have been
+altered or purposely extended from the server to the client.
+
+Testing
+=======
+
+The LOCALIO auxiliary protocol and associated NFS LOCALIO read, write
+and commit access have proven stable against various test scenarios:
+
+- Client and server both on the same host.
+
+- All permutations of client and server support enablement for both
+  local and remote client and server.
+
+- Testing against NFS storage products that don't support the LOCALIO
+  protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2).
+  The container testing used podman-managed containers and included a
+  successful container stop/restart scenario.
+
+- Formalizing these test scenarios in terms of existing test
+  infrastructure is on-going. Initial regular coverage is provided in
+  terms of ktest running xfstests against a LOCALIO-enabled NFS loopback
+  mount configuration, and includes lockdep and KASAN coverage, see:
+  https://evilpiepirate.org/~testdashboard/ci?user=snitzer&branch=snitm-nfs-next
+  https://github.com/koverstreet/ktest
+
+- Various kdevops testing (in terms of "Chuck's BuildBot") has been
+  performed to regularly verify the LOCALIO changes haven't caused any
+  regressions to non-LOCALIO NFS use cases.
+
+- All of Hammerspace's various sanity tests pass with LOCALIO enabled
+  (this includes numerous pNFS and flexfiles tests).
-- 
2.44.0
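
As an illustration of the RPC section in the document added above:
UUID_IS_LOCAL carries a fixed 16-byte UUID, so the fixed-size opaque
encoding can omit the length word that the variable-size encoding needs.
The following is a minimal userspace sketch, not kernel code; the
function names (xdr_pad4, encode_opaque_fixed, encode_opaque_var) are
hypothetical and only model the XDR wire layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/* Round a byte count up to the next multiple of 4 (the XDR unit). */
static size_t xdr_pad4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Fixed-size opaque: the length is implied by the protocol, so only the
 * bytes (zero-padded to a 4-byte boundary) go on the wire.
 * Returns the number of bytes written to buf.
 */
static size_t encode_opaque_fixed(uint8_t *buf, const uint8_t *data, size_t len)
{
	size_t padded = xdr_pad4(len);

	memcpy(buf, data, len);
	memset(buf + len, 0, padded - len);
	return padded;
}

/*
 * Variable-size opaque: a 4-byte big-endian length word precedes the
 * (padded) data. Returns the number of bytes written to buf.
 */
static size_t encode_opaque_var(uint8_t *buf, const uint8_t *data, size_t len)
{
	buf[0] = (uint8_t)(len >> 24);
	buf[1] = (uint8_t)(len >> 16);
	buf[2] = (uint8_t)(len >> 8);
	buf[3] = (uint8_t)len;
	return 4 + encode_opaque_fixed(buf + 4, data, len);
}
```

For a 16-byte UUID the fixed form needs no padding and no length word,
so it saves the 4-byte length prefix that the variable form adds, and
the receiver does not have to validate a length.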

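The handshake described in "NFS Common and Client/Server Handshake"
above can be modelled roughly as follows. This is a userspace sketch
under stated assumptions: struct demo_uuid and the demo_* functions are
hypothetical stand-ins, a fixed array replaces the nfs_uuids list, and
there is no locking or RCU, which real kernel code would need.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16
#define MAX_PENDING 8

/* Hypothetical stand-in for nfs_uuid_t: a nonce plus server-filled state. */
struct demo_uuid {
	uint8_t uuid[UUID_SIZE];
	bool in_use;
	const void *server_net;	/* set by the "server" on a match */
};

/* Hypothetical stand-in for the shared nfs_uuids list. */
static struct demo_uuid pending[MAX_PENDING];

/* Client side: publish a nonce before sending UUID_IS_LOCAL. */
static struct demo_uuid *demo_uuid_register(const uint8_t *uuid)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (!pending[i].in_use) {
			memcpy(pending[i].uuid, uuid, UUID_SIZE);
			pending[i].server_net = NULL;
			pending[i].in_use = true;
			return &pending[i];
		}
	}
	return NULL;
}

/* Server side: look up the nonce that arrived over the wire. */
static bool demo_uuid_is_local(const uint8_t *uuid, const void *net)
{
	for (size_t i = 0; i < MAX_PENDING; i++) {
		if (pending[i].in_use &&
		    memcmp(pending[i].uuid, uuid, UUID_SIZE) == 0) {
			pending[i].server_net = net;
			return true;
		}
	}
	return false;
}

/* Client side: take the result and retire the single-use nonce. */
static const void *demo_uuid_finish(struct demo_uuid *u)
{
	const void *net = u->server_net;

	u->in_use = false;
	return net;
}
```

The point of the sketch is that only a server sharing memory with the
client can find the nonce, and that the nonce is retired as soon as the
client has copied out what the server filled in.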

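The ENXIO recovery path described in "NFS Client issues IO instead of
Server" above can be sketched like this. It is a userspace model only:
the "generation" counter is a hypothetical way to represent an nfsd
instance being restarted, and all demo_* names are made up for this
example. The document does not say whether the failing open is retried
or falls back to RPC, so the sketch simply returns the error after
re-probing.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical server state: generation changes when nfsd restarts. */
struct demo_server {
	int generation;
};

/* Hypothetical client state cached by a successful handshake. */
struct demo_client {
	bool localio;		/* handshake succeeded */
	int seen_generation;	/* server instance seen at probe time */
};

/* Stand-in for nfs_local_probe(): redo the handshake. */
static void demo_probe(struct demo_client *clp, const struct demo_server *srv)
{
	clp->seen_generation = srv->generation;
	clp->localio = true;
}

/* Stand-in for nfsd_open_local_fh(): fail if the cached objects are stale. */
static int demo_server_open(const struct demo_client *clp,
			    const struct demo_server *srv)
{
	if (!clp->localio || clp->seen_generation != srv->generation)
		return -ENXIO;
	return 0;
}

/* Stand-in for nfs_local_open_fh(): on -ENXIO, re-probe for later opens. */
static int demo_local_open(struct demo_client *clp,
			   const struct demo_server *srv)
{
	int ret = demo_server_open(clp, srv);

	if (ret == -ENXIO) {
		clp->localio = false;
		demo_probe(clp, srv);
	}
	return ret;
}
```

In this model the first open after a server restart fails with -ENXIO
and triggers a new probe, and the next open succeeds again.
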

* [PATCH v14 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (23 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 24/25] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
@ 2024-08-29  1:04 ` Mike Snitzer
  2024-08-29  1:47   ` [PATCH v14.5 " Mike Snitzer
  2024-08-29  1:42 ` [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:04 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Add a FAQ section to give answers to questions that have been raised
during review of the localio feature.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 Documentation/filesystems/nfs/localio.rst | 77 +++++++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
index 8cceb3db386a..4b6d63246479 100644
--- a/Documentation/filesystems/nfs/localio.rst
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -61,6 +61,83 @@ fio for 20 secs with directio, qd of 8, 1 libaio thread:
   128K read:  IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec)
   128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec)
 
+FAQ
+===
+
+1. What are the use cases for LOCALIO?
+
+   a. Workloads where the NFS client and server are on the same host
+      realize improved IO performance. In particular, it is common when
+      running containerised workloads for jobs to find themselves
+      running on the same host as the knfsd server being used for
+      storage.
+
+2. What are the requirements for LOCALIO?
+
+   a. Bypass use of the network RPC protocol as much as possible. This
+      includes bypassing XDR and RPC for open, read, write and commit
+      operations.
+   b. Allow client and server to autonomously discover if they are
+      running local to each other without making any assumptions about
+      the local network topology.
+   c. Support the use of containers by being compatible with relevant
+      namespaces (e.g. network, user, mount).
+   d. Support all versions of NFS. NFSv3 is of particular importance
+      because it has wide enterprise usage and pNFS flexfiles makes use
+      of it for the data path.
+
+3. Why doesn’t LOCALIO just compare IP addresses or hostnames when
+   deciding if the NFS client and server are co-located on the same
+   host?
+
+   Since one of the main use cases is containerised workloads, we cannot
+   assume that IP addresses will be shared between the client and
+   server. This sets up a requirement for a handshake protocol that
+   needs to go over the same connection as the NFS traffic in order to
+   identify that the client and the server really are running on the
+   same host. The handshake uses a secret that is sent over the wire,
+   and can be verified by both parties by comparing with a value stored
+   in shared kernel memory if they are truly co-located.
+
+4. Does LOCALIO improve pNFS flexfiles?
+
+   Yes, LOCALIO complements pNFS flexfiles by allowing it to take
+   advantage of NFS client and server locality.  A policy that initiates
+   client IO as close as possible to the server where the data is stored
+   naturally benefits from the data path optimization LOCALIO provides.
+
+5. Why not develop a new pNFS layout to enable LOCALIO?
+
+   A new pNFS layout could be developed, but doing so would put the
+   onus on the server to somehow discover that the client is co-located
+   when deciding to hand out the layout.
+   There is value in a simpler approach (as provided by LOCALIO) that
+   allows the NFS client to negotiate and leverage locality without
+   requiring more elaborate modeling and discovery of such locality in a
+   more centralized manner.
+
+6. Why is having the client perform a server-side file OPEN, without
+   using RPC, beneficial?  Is the benefit pNFS specific?
+
+   Avoiding the use of XDR and RPC for file opens is beneficial to
+   performance regardless of whether pNFS is used. However adding a
+   requirement to go over the wire to do an open and/or close ends up
+   negating any benefit of avoiding the wire for doing the I/O itself
+   when we’re dealing with small files. There is no benefit to replacing
+   the READ or WRITE with a new open and/or close operation that still
+   needs to go over the wire.
+
+7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)?
+
+   Strong authentication is usually tied to the connection itself. It
+   works by establishing a context that is cached by the server, and
+   that acts as the key for discovering the authorisation token, which
+   can then be passed to rpc.mountd to complete the authentication
+   process. On the other hand, in the case of AUTH_UNIX, the credential
+   that was passed over the wire is used directly as the key in the
+   upcall to rpc.mountd. This simplifies the authentication process, and
+   so makes AUTH_UNIX easier to support.
+
 RPC
 ===
 
-- 
2.44.0



* Re: [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO
  2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
                   ` (24 preceding siblings ...)
  2024-08-29  1:04 ` [PATCH v14 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
@ 2024-08-29  1:42 ` Mike Snitzer
  2024-08-29  1:50   ` Mike Snitzer
  25 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:42 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Wed, Aug 28, 2024 at 09:03:55PM -0400, Mike Snitzer wrote:
> These latest changes are available in my git tree here:
> https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next
> 
> I _think_ I addressed all of v13's very helpful review comments.
> Special thanks to Neil and Chuck for their time and help!
> 
> And hopefully I didn't miss anything in the changelog below.

As it happens, a last minute rebase that I did just before sending out
v14 caused me to send out 2 stale patches:
[PATCH v14 09/25] nfsd: add nfsd_file_acquire_local()
[PATCH v14 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst

I will reply to each patch with a correct v14.5 for each.

Sorry for the confusion.

Here is the incremental diff that shows what was missing in v14:

diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
index 4b6d63246479..5d652f637a97 100644
--- a/Documentation/filesystems/nfs/localio.rst
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -120,12 +120,13 @@ FAQ
    using RPC, beneficial?  Is the benefit pNFS specific?
 
    Avoiding the use of XDR and RPC for file opens is beneficial to
-   performance regardless of whether pNFS is used. However adding a
-   requirement to go over the wire to do an open and/or close ends up
-   negating any benefit of avoiding the wire for doing the I/O itself
-   when we’re dealing with small files. There is no benefit to replacing
-   the READ or WRITE with a new open and/or close operation that still
-   needs to go over the wire.
+   performance regardless of whether pNFS is used. Especially when
+   dealing with small files it's best to avoid going over the wire
+   whenever possible, otherwise it could reduce or even negate the
+   benefits of avoiding the wire for doing the small file I/O itself.
+   Given LOCALIO's requirements the current approach of having the
+   client perform a server-side file open, without using RPC, is ideal.
+   If in the future requirements change then we can adapt accordingly.
 
 7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)?
 
diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c
index e636d2a1e664..46a7f9b813e5 100644
--- a/fs/nfsd/lockd.c
+++ b/fs/nfsd/lockd.c
@@ -32,10 +32,8 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp,
 	int		access;
 	struct svc_fh	fh;
 
-	if (rqstp->rq_vers == 4)
-		fh_init(&fh, NFS3_FHSIZE);
-	else
-		fh_init(&fh, NFS_FHSIZE);
+	/* must initialize before using! but maxsize doesn't matter */
+	fh_init(&fh,0);
 	fh.fh_handle.fh_size = f->size;
 	memcpy(&fh.fh_handle.fh_raw, f->data, f->size);
 	fh.fh_export = NULL;
diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 49468e478d23..eca577cf3263 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -290,9 +290,6 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
 			fhp->fh_use_wgather = true;
 		if (exp->ex_flags & NFSEXP_V4ROOT)
 			goto out;
-		break;
-	case 0:
-		WARN_ONCE(1, "Uninitialized file handle");
 	}
 
 	return 0;


* [PATCH v14.5 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
  2024-08-29  1:04 ` [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
@ 2024-08-29  1:45   ` Mike Snitzer
  2024-08-29 16:52     ` Jeff Layton
  2024-08-29 14:28   ` [PATCH v14 " Jeff Layton
  1 sibling, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:45 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Chuck Lever <chuck.lever@oracle.com>

Currently, fh_verify() makes some daring assumptions about which
version of file handle the caller wants, based on the things it can
find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
case sometimes has no svc_rqst context, so this logic won't work in
that case.

Instead, examine the passed-in file handle. Its fh_maxsize field
should carry information to allow nfsd_set_fh_dentry() to initialize
the file handle appropriately.

The file handle used by lockd and the one created by write_filehandle
never need any of the version-specific fields (which affect things
like write and getattr requests and pre/post attributes).

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfsd/nfsfh.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
index 4b964a71a504..60c2395d7af7 100644
--- a/fs/nfsd/nfsfh.c
+++ b/fs/nfsd/nfsfh.c
@@ -267,20 +267,20 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
 	fhp->fh_dentry = dentry;
 	fhp->fh_export = exp;
 
-	switch (rqstp->rq_vers) {
-	case 4:
+	switch (fhp->fh_maxsize) {
+	case NFS4_FHSIZE:
 		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR)
 			fhp->fh_no_atomic_attr = true;
 		fhp->fh_64bit_cookies = true;
 		break;
-	case 3:
+	case NFS3_FHSIZE:
 		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC)
 			fhp->fh_no_wcc = true;
 		fhp->fh_64bit_cookies = true;
 		if (exp->ex_flags & NFSEXP_V4ROOT)
 			goto out;
 		break;
-	case 2:
+	case NFS_FHSIZE:
 		fhp->fh_no_wcc = true;
 		if (EX_WGATHER(exp))
 			fhp->fh_use_wgather = true;
-- 
2.44.0
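
To illustrate the idea in the patch above: the maximum file handle size
identifies the NFS version, so it can drive the same switch that used to
key on rqstp->rq_vers. The following is a simplified userspace sketch;
struct demo_fh and demo_set_fh_flags() are hypothetical, and the
export-dependent conditions from the real function are left out. The
three size constants are the kernel's values.

```c
#include <stdbool.h>

/* Maximum file handle sizes per NFS version, as defined by the kernel. */
#define NFS_FHSIZE	32	/* NFSv2 */
#define NFS3_FHSIZE	64	/* NFSv3 */
#define NFS4_FHSIZE	128	/* NFSv4 */

/* Hypothetical, trimmed-down stand-in for struct svc_fh. */
struct demo_fh {
	int maxsize;
	bool no_wcc;
	bool cookies64;
};

/* Pick version-specific behaviour from the handle's maximum size. */
static void demo_set_fh_flags(struct demo_fh *fhp)
{
	switch (fhp->maxsize) {
	case NFS4_FHSIZE:
	case NFS3_FHSIZE:
		fhp->cookies64 = true;
		break;
	case NFS_FHSIZE:
		fhp->no_wcc = true;
		break;
	default:
		/* Any other size: no version-specific flags are set. */
		break;
	}
}
```

A caller that has no svc_rqst, as in the LOCALIO case, only needs to
initialize the handle with the right maximum size to get the matching
behaviour.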



* [PATCH v14.5 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst
  2024-08-29  1:04 ` [PATCH v14 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
@ 2024-08-29  1:47   ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:47 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

From: Trond Myklebust <trond.myklebust@hammerspace.com>

Add a FAQ section to give answers to questions that have been raised
during review of the localio feature.

Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 Documentation/filesystems/nfs/localio.rst | 78 +++++++++++++++++++++++
 1 file changed, 78 insertions(+)

diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
index 8cceb3db386a..5d652f637a97 100644
--- a/Documentation/filesystems/nfs/localio.rst
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -61,6 +61,84 @@ fio for 20 secs with directio, qd of 8, 1 libaio thread:
   128K read:  IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec)
   128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec)
 
+FAQ
+===
+
+1. What are the use cases for LOCALIO?
+
+   a. Workloads where the NFS client and server are on the same host
+      realize improved IO performance. In particular, it is common when
+      running containerised workloads for jobs to find themselves
+      running on the same host as the knfsd server being used for
+      storage.
+
+2. What are the requirements for LOCALIO?
+
+   a. Bypass use of the network RPC protocol as much as possible. This
+      includes bypassing XDR and RPC for open, read, write and commit
+      operations.
+   b. Allow client and server to autonomously discover if they are
+      running local to each other without making any assumptions about
+      the local network topology.
+   c. Support the use of containers by being compatible with relevant
+      namespaces (e.g. network, user, mount).
+   d. Support all versions of NFS. NFSv3 is of particular importance
+      because it has wide enterprise usage and pNFS flexfiles makes use
+      of it for the data path.
+
+3. Why doesn’t LOCALIO just compare IP addresses or hostnames when
+   deciding if the NFS client and server are co-located on the same
+   host?
+
+   Since one of the main use cases is containerised workloads, we cannot
+   assume that IP addresses will be shared between the client and
+   server. This sets up a requirement for a handshake protocol that
+   needs to go over the same connection as the NFS traffic in order to
+   identify that the client and the server really are running on the
+   same host. The handshake uses a secret that is sent over the wire,
+   and can be verified by both parties by comparing with a value stored
+   in shared kernel memory if they are truly co-located.
+
+4. Does LOCALIO improve pNFS flexfiles?
+
+   Yes, LOCALIO complements pNFS flexfiles by allowing it to take
+   advantage of NFS client and server locality.  A policy that initiates
+   client IO as close as possible to the server where the data is stored
+   naturally benefits from the data path optimization LOCALIO provides.
+
+5. Why not develop a new pNFS layout to enable LOCALIO?
+
+   A new pNFS layout could be developed, but doing so would put the
+   onus on the server to somehow discover that the client is co-located
+   when deciding to hand out the layout.
+   There is value in a simpler approach (as provided by LOCALIO) that
+   allows the NFS client to negotiate and leverage locality without
+   requiring more elaborate modeling and discovery of such locality in a
+   more centralized manner.
+
+6. Why is having the client perform a server-side file OPEN, without
+   using RPC, beneficial?  Is the benefit pNFS specific?
+
+   Avoiding the use of XDR and RPC for file opens is beneficial to
+   performance regardless of whether pNFS is used. Especially when
+   dealing with small files it's best to avoid going over the wire
+   whenever possible, otherwise it could reduce or even negate the
+   benefits of avoiding the wire for doing the small file I/O itself.
+   Given LOCALIO's requirements the current approach of having the
+   client perform a server-side file open, without using RPC, is ideal.
+   If in the future requirements change then we can adapt accordingly.
+
+7. Why is LOCALIO only supported with UNIX Authentication (AUTH_UNIX)?
+
+   Strong authentication is usually tied to the connection itself. It
+   works by establishing a context that is cached by the server, and
+   that acts as the key for discovering the authorisation token, which
+   can then be passed to rpc.mountd to complete the authentication
+   process. On the other hand, in the case of AUTH_UNIX, the credential
+   that was passed over the wire is used directly as the key in the
+   upcall to rpc.mountd. This simplifies the authentication process, and
+   so makes AUTH_UNIX easier to support.
+
 RPC
 ===
 
-- 
2.44.0



* Re: [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO
  2024-08-29  1:42 ` [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
@ 2024-08-29  1:50   ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29  1:50 UTC (permalink / raw)
  To: linux-nfs
  Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Wed, Aug 28, 2024 at 09:42:53PM -0400, Mike Snitzer wrote:
> On Wed, Aug 28, 2024 at 09:03:55PM -0400, Mike Snitzer wrote:
> > These latest changes are available in my git tree here:
> > https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next
> > 
> > I _think_ I addressed all of v13's very helpful review comments.
> > Special thanks to Neil and Chuck for their time and help!
> > 
> > And hopefully I didn't miss anything in the changelog below.
> 
> As it happens, a last minute rebase that I did just before sending out
> v14 caused me to send out 2 stale patches:

I meant these were stale:

[PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
[PATCH v14 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst

But I've now sent v14.5 to fix each...

> Sorry for the confusion.

Again ;)


* Re: [PATCH v14 01/25] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno
  2024-08-29  1:03 ` [PATCH v14 01/25] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
@ 2024-08-29 14:17   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:17 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:03 -0400, Mike Snitzer wrote:
> Common nfs_stat_to_errno() is used by both fs/nfs/nfs2xdr.c and
> fs/nfs/nfs3xdr.c
> 
> Will also be used by fs/nfsd/localio.c
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfs/Kconfig             |   1 +
>  fs/nfs/nfs2xdr.c           |  70 +-----------------------
>  fs/nfs/nfs3xdr.c           | 108 +++++++------------------------------
>  fs/nfs/nfs4xdr.c           |   4 +-
>  fs/nfs_common/Makefile     |   2 +
>  fs/nfs_common/common.c     |  67 +++++++++++++++++++++++
>  fs/nfsd/Kconfig            |   1 +
>  include/linux/nfs_common.h |  16 ++++++
>  8 files changed, 109 insertions(+), 160 deletions(-)
>  create mode 100644 fs/nfs_common/common.c
>  create mode 100644 include/linux/nfs_common.h
> 
> diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
> index 57249f040dfc..0eb20012792f 100644
> --- a/fs/nfs/Kconfig
> +++ b/fs/nfs/Kconfig
> @@ -4,6 +4,7 @@ config NFS_FS
>  	depends on INET && FILE_LOCKING && MULTIUSER
>  	select LOCKD
>  	select SUNRPC
> +	select NFS_COMMON
>  	select NFS_ACL_SUPPORT if NFS_V3_ACL
>  	help
>  	  Choose Y here if you want to access files residing on other
> diff --git a/fs/nfs/nfs2xdr.c b/fs/nfs/nfs2xdr.c
> index c19093814296..6e75c6c2d234 100644
> --- a/fs/nfs/nfs2xdr.c
> +++ b/fs/nfs/nfs2xdr.c
> @@ -22,14 +22,12 @@
>  #include <linux/nfs.h>
>  #include <linux/nfs2.h>
>  #include <linux/nfs_fs.h>
> +#include <linux/nfs_common.h>
>  #include "nfstrace.h"
>  #include "internal.h"
>  
>  #define NFSDBG_FACILITY		NFSDBG_XDR
>  
> -/* Mapping from NFS error code to "errno" error code. */
> -#define errno_NFSERR_IO		EIO
> -
>  /*
>   * Declare the space requirements for NFS arguments and replies as
>   * number of 32bit-words
> @@ -64,8 +62,6 @@
>  #define NFS_readdirres_sz	(1+NFS_pagepad_sz)
>  #define NFS_statfsres_sz	(1+NFS_info_sz)
>  
> -static int nfs_stat_to_errno(enum nfs_stat);
> -
>  /*
>   * Encode/decode NFSv2 basic data types
>   *
> @@ -1054,70 +1050,6 @@ static int nfs2_xdr_dec_statfsres(struct rpc_rqst *req, struct xdr_stream *xdr,
>  	return nfs_stat_to_errno(status);
>  }
>  
> -
> -/*
> - * We need to translate between nfs status return values and
> - * the local errno values which may not be the same.
> - */
> -static const struct {
> -	int stat;
> -	int errno;
> -} nfs_errtbl[] = {
> -	{ NFS_OK,		0		},
> -	{ NFSERR_PERM,		-EPERM		},
> -	{ NFSERR_NOENT,		-ENOENT		},
> -	{ NFSERR_IO,		-errno_NFSERR_IO},
> -	{ NFSERR_NXIO,		-ENXIO		},
> -/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
> -	{ NFSERR_ACCES,		-EACCES		},
> -	{ NFSERR_EXIST,		-EEXIST		},
> -	{ NFSERR_XDEV,		-EXDEV		},
> -	{ NFSERR_NODEV,		-ENODEV		},
> -	{ NFSERR_NOTDIR,	-ENOTDIR	},
> -	{ NFSERR_ISDIR,		-EISDIR		},
> -	{ NFSERR_INVAL,		-EINVAL		},
> -	{ NFSERR_FBIG,		-EFBIG		},
> -	{ NFSERR_NOSPC,		-ENOSPC		},
> -	{ NFSERR_ROFS,		-EROFS		},
> -	{ NFSERR_MLINK,		-EMLINK		},
> -	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
> -	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
> -	{ NFSERR_DQUOT,		-EDQUOT		},
> -	{ NFSERR_STALE,		-ESTALE		},
> -	{ NFSERR_REMOTE,	-EREMOTE	},
> -#ifdef EWFLUSH
> -	{ NFSERR_WFLUSH,	-EWFLUSH	},
> -#endif
> -	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
> -	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
> -	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
> -	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
> -	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
> -	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
> -	{ NFSERR_BADTYPE,	-EBADTYPE	},
> -	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
> -	{ -1,			-EIO		}
> -};
> -
> -/**
> - * nfs_stat_to_errno - convert an NFS status code to a local errno
> - * @status: NFS status code to convert
> - *
> - * Returns a local errno value, or -EIO if the NFS status code is
> - * not recognized.  This function is used jointly by NFSv2 and NFSv3.
> - */
> -static int nfs_stat_to_errno(enum nfs_stat status)
> -{
> -	int i;
> -
> -	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
> -		if (nfs_errtbl[i].stat == (int)status)
> -			return nfs_errtbl[i].errno;
> -	}
> -	dprintk("NFS: Unrecognized nfs status value: %u\n", status);
> -	return nfs_errtbl[i].errno;
> -}
> -
>  #define PROC(proc, argtype, restype, timer)				\
>  [NFSPROC_##proc] = {							\
>  	.p_proc	    =  NFSPROC_##proc,					\
> diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
> index 60f032be805a..4ae01c10b7e2 100644
> --- a/fs/nfs/nfs3xdr.c
> +++ b/fs/nfs/nfs3xdr.c
> @@ -21,14 +21,13 @@
>  #include <linux/nfs3.h>
>  #include <linux/nfs_fs.h>
>  #include <linux/nfsacl.h>
> +#include <linux/nfs_common.h>
> +
>  #include "nfstrace.h"
>  #include "internal.h"
>  
>  #define NFSDBG_FACILITY		NFSDBG_XDR
>  
> -/* Mapping from NFS error code to "errno" error code. */
> -#define errno_NFSERR_IO		EIO
> -
>  /*
>   * Declare the space requirements for NFS arguments and replies as
>   * number of 32bit-words
> @@ -91,8 +90,6 @@
>  				NFS3_pagepad_sz)
>  #define ACL3_setaclres_sz	(1+NFS3_post_op_attr_sz)
>  
> -static int nfs3_stat_to_errno(enum nfs_stat);
> -
>  /*
>   * Map file type to S_IFMT bits
>   */
> @@ -1406,7 +1403,7 @@ static int nfs3_xdr_dec_getattr3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_default:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1445,7 +1442,7 @@ static int nfs3_xdr_dec_setattr3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1495,7 +1492,7 @@ static int nfs3_xdr_dec_lookup3res(struct rpc_rqst *req,
>  	error = decode_post_op_attr(xdr, result->dir_attr, userns);
>  	if (unlikely(error))
>  		goto out;
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1537,7 +1534,7 @@ static int nfs3_xdr_dec_access3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_default:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1578,7 +1575,7 @@ static int nfs3_xdr_dec_readlink3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_default:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1658,7 +1655,7 @@ static int nfs3_xdr_dec_read3res(struct rpc_rqst *req, struct xdr_stream *xdr,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1728,7 +1725,7 @@ static int nfs3_xdr_dec_write3res(struct rpc_rqst *req, struct xdr_stream *xdr,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1795,7 +1792,7 @@ static int nfs3_xdr_dec_create3res(struct rpc_rqst *req,
>  	error = decode_wcc_data(xdr, result->dir_attr, userns);
>  	if (unlikely(error))
>  		goto out;
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1835,7 +1832,7 @@ static int nfs3_xdr_dec_remove3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1881,7 +1878,7 @@ static int nfs3_xdr_dec_rename3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -1926,7 +1923,7 @@ static int nfs3_xdr_dec_link3res(struct rpc_rqst *req, struct xdr_stream *xdr,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /**
> @@ -2101,7 +2098,7 @@ static int nfs3_xdr_dec_readdir3res(struct rpc_rqst *req,
>  	error = decode_post_op_attr(xdr, result->dir_attr, rpc_rqst_userns(req));
>  	if (unlikely(error))
>  		goto out;
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -2167,7 +2164,7 @@ static int nfs3_xdr_dec_fsstat3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -2243,7 +2240,7 @@ static int nfs3_xdr_dec_fsinfo3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -2304,7 +2301,7 @@ static int nfs3_xdr_dec_pathconf3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  /*
> @@ -2350,7 +2347,7 @@ static int nfs3_xdr_dec_commit3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_status:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  #ifdef CONFIG_NFS_V3_ACL
> @@ -2416,7 +2413,7 @@ static int nfs3_xdr_dec_getacl3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_default:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req,
> @@ -2435,76 +2432,11 @@ static int nfs3_xdr_dec_setacl3res(struct rpc_rqst *req,
>  out:
>  	return error;
>  out_default:
> -	return nfs3_stat_to_errno(status);
> +	return nfs_stat_to_errno(status);
>  }
>  
>  #endif  /* CONFIG_NFS_V3_ACL */
>  
> -
> -/*
> - * We need to translate between nfs status return values and
> - * the local errno values which may not be the same.
> - */
> -static const struct {
> -	int stat;
> -	int errno;
> -} nfs_errtbl[] = {
> -	{ NFS_OK,		0		},
> -	{ NFSERR_PERM,		-EPERM		},
> -	{ NFSERR_NOENT,		-ENOENT		},
> -	{ NFSERR_IO,		-errno_NFSERR_IO},
> -	{ NFSERR_NXIO,		-ENXIO		},
> -/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
> -	{ NFSERR_ACCES,		-EACCES		},
> -	{ NFSERR_EXIST,		-EEXIST		},
> -	{ NFSERR_XDEV,		-EXDEV		},
> -	{ NFSERR_NODEV,		-ENODEV		},
> -	{ NFSERR_NOTDIR,	-ENOTDIR	},
> -	{ NFSERR_ISDIR,		-EISDIR		},
> -	{ NFSERR_INVAL,		-EINVAL		},
> -	{ NFSERR_FBIG,		-EFBIG		},
> -	{ NFSERR_NOSPC,		-ENOSPC		},
> -	{ NFSERR_ROFS,		-EROFS		},
> -	{ NFSERR_MLINK,		-EMLINK		},
> -	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
> -	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
> -	{ NFSERR_DQUOT,		-EDQUOT		},
> -	{ NFSERR_STALE,		-ESTALE		},
> -	{ NFSERR_REMOTE,	-EREMOTE	},
> -#ifdef EWFLUSH
> -	{ NFSERR_WFLUSH,	-EWFLUSH	},
> -#endif
> -	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
> -	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
> -	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
> -	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
> -	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
> -	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
> -	{ NFSERR_BADTYPE,	-EBADTYPE	},
> -	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
> -	{ -1,			-EIO		}
> -};
> -
> -/**
> - * nfs3_stat_to_errno - convert an NFS status code to a local errno
> - * @status: NFS status code to convert
> - *
> - * Returns a local errno value, or -EIO if the NFS status code is
> - * not recognized.  This function is used jointly by NFSv2 and NFSv3.
> - */
> -static int nfs3_stat_to_errno(enum nfs_stat status)
> -{
> -	int i;
> -
> -	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
> -		if (nfs_errtbl[i].stat == (int)status)
> -			return nfs_errtbl[i].errno;
> -	}
> -	dprintk("NFS: Unrecognized nfs status value: %u\n", status);
> -	return nfs_errtbl[i].errno;
> -}
> -
> -
>  #define PROC(proc, argtype, restype, timer)				\
>  [NFS3PROC_##proc] = {							\
>  	.p_proc      = NFS3PROC_##proc,					\
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 7704a4509676..b4091af1a60d 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -52,6 +52,7 @@
>  #include <linux/nfs.h>
>  #include <linux/nfs4.h>
>  #include <linux/nfs_fs.h>
> +#include <linux/nfs_common.h>
>  
>  #include "nfs4_fs.h"
>  #include "nfs4trace.h"
> @@ -63,9 +64,6 @@
>  
>  #define NFSDBG_FACILITY		NFSDBG_XDR
>  
> -/* Mapping from NFS error code to "errno" error code. */
> -#define errno_NFSERR_IO		EIO
> -
>  struct compound_hdr;
>  static int nfs4_stat_to_errno(int);
>  static void encode_layoutget(struct xdr_stream *xdr,
> diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
> index 119c75ab9fd0..e58b01bb8dda 100644
> --- a/fs/nfs_common/Makefile
> +++ b/fs/nfs_common/Makefile
> @@ -8,3 +8,5 @@ nfs_acl-objs := nfsacl.o
>  
>  obj-$(CONFIG_GRACE_PERIOD) += grace.o
>  obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
> +
> +obj-$(CONFIG_NFS_COMMON) += common.o
> diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c
> new file mode 100644
> index 000000000000..a4ee95da2174
> --- /dev/null
> +++ b/fs/nfs_common/common.c
> @@ -0,0 +1,67 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +
> +#include <linux/module.h>
> +#include <linux/nfs_common.h>
> +
> +/*
> + * We need to translate between nfs status return values and
> + * the local errno values which may not be the same.
> + */
> +static const struct {
> +	int stat;
> +	int errno;
> +} nfs_errtbl[] = {
> +	{ NFS_OK,		0		},
> +	{ NFSERR_PERM,		-EPERM		},
> +	{ NFSERR_NOENT,		-ENOENT		},
> +	{ NFSERR_IO,		-errno_NFSERR_IO},
> +	{ NFSERR_NXIO,		-ENXIO		},
> +/*	{ NFSERR_EAGAIN,	-EAGAIN		}, */
> +	{ NFSERR_ACCES,		-EACCES		},
> +	{ NFSERR_EXIST,		-EEXIST		},
> +	{ NFSERR_XDEV,		-EXDEV		},
> +	{ NFSERR_NODEV,		-ENODEV		},
> +	{ NFSERR_NOTDIR,	-ENOTDIR	},
> +	{ NFSERR_ISDIR,		-EISDIR		},
> +	{ NFSERR_INVAL,		-EINVAL		},
> +	{ NFSERR_FBIG,		-EFBIG		},
> +	{ NFSERR_NOSPC,		-ENOSPC		},
> +	{ NFSERR_ROFS,		-EROFS		},
> +	{ NFSERR_MLINK,		-EMLINK		},
> +	{ NFSERR_NAMETOOLONG,	-ENAMETOOLONG	},
> +	{ NFSERR_NOTEMPTY,	-ENOTEMPTY	},
> +	{ NFSERR_DQUOT,		-EDQUOT		},
> +	{ NFSERR_STALE,		-ESTALE		},
> +	{ NFSERR_REMOTE,	-EREMOTE	},
> +#ifdef EWFLUSH
> +	{ NFSERR_WFLUSH,	-EWFLUSH	},
> +#endif
> +	{ NFSERR_BADHANDLE,	-EBADHANDLE	},
> +	{ NFSERR_NOT_SYNC,	-ENOTSYNC	},
> +	{ NFSERR_BAD_COOKIE,	-EBADCOOKIE	},
> +	{ NFSERR_NOTSUPP,	-ENOTSUPP	},
> +	{ NFSERR_TOOSMALL,	-ETOOSMALL	},
> +	{ NFSERR_SERVERFAULT,	-EREMOTEIO	},
> +	{ NFSERR_BADTYPE,	-EBADTYPE	},
> +	{ NFSERR_JUKEBOX,	-EJUKEBOX	},
> +	{ -1,			-EIO		}
> +};
> +
> +/**
> + * nfs_stat_to_errno - convert an NFS status code to a local errno
> + * @status: NFS status code to convert
> + *
> + * Returns a local errno value, or -EIO if the NFS status code is
> + * not recognized.  This function is used jointly by NFSv2 and NFSv3.
> + */
> +int nfs_stat_to_errno(enum nfs_stat status)
> +{
> +	int i;
> +
> +	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
> +		if (nfs_errtbl[i].stat == (int)status)
> +			return nfs_errtbl[i].errno;
> +	}
> +	return nfs_errtbl[i].errno;
> +}
> +EXPORT_SYMBOL_GPL(nfs_stat_to_errno);
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index ec2ab6429e00..c0bd1509ccd4 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -7,6 +7,7 @@ config NFSD
>  	select LOCKD
>  	select SUNRPC
>  	select EXPORTFS
> +	select NFS_COMMON
>  	select NFS_ACL_SUPPORT if NFSD_V2_ACL
>  	select NFS_ACL_SUPPORT if NFSD_V3_ACL
>  	depends on MULTIUSER
> diff --git a/include/linux/nfs_common.h b/include/linux/nfs_common.h
> new file mode 100644
> index 000000000000..3395c4a4d372
> --- /dev/null
> +++ b/include/linux/nfs_common.h
> @@ -0,0 +1,16 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * This file contains constants and methods used by both NFS client and server.
> + */
> +#ifndef _LINUX_NFS_COMMON_H
> +#define _LINUX_NFS_COMMON_H
> +
> +#include <linux/errno.h>
> +#include <uapi/linux/nfs.h>
> +
> +/* Mapping from NFS error code to "errno" error code. */
> +#define errno_NFSERR_IO EIO
> +
> +int nfs_stat_to_errno(enum nfs_stat status);
> +
> +#endif /* _LINUX_NFS_COMMON_H */
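
For readers following the lookup logic outside the kernel tree, here is a
minimal userspace sketch of the sentinel-terminated table walk that
nfs_stat_to_errno() performs in the patch above. It is not the kernel code:
the demo_* names are hypothetical, only a handful of status codes are
included (values as I understand them from RFC 1813), and the struct field
is called errno_val because errno is a macro in userspace.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a few NFSv2/v3 status codes. */
enum demo_nfs_stat {
	DEMO_NFS_OK       = 0,
	DEMO_NFSERR_PERM  = 1,
	DEMO_NFSERR_NOENT = 2,
	DEMO_NFSERR_IO    = 5,
	DEMO_NFSERR_STALE = 70,
};

/* Table ends with a { -1, -EIO } sentinel, mirroring the patch. */
static const struct {
	int stat;
	int errno_val;
} demo_errtbl[] = {
	{ DEMO_NFS_OK,       0       },
	{ DEMO_NFSERR_PERM,  -EPERM  },
	{ DEMO_NFSERR_NOENT, -ENOENT },
	{ DEMO_NFSERR_IO,    -EIO    },
	{ DEMO_NFSERR_STALE, -ESTALE },
	{ -1,                -EIO    },
};

/* Walk the table; an unknown status falls through to the sentinel's -EIO. */
int demo_stat_to_errno(int status)
{
	size_t i;

	for (i = 0; demo_errtbl[i].stat != -1; i++) {
		if (demo_errtbl[i].stat == status)
			return demo_errtbl[i].errno_val;
	}
	/* The loop can only stop on the sentinel entry. */
	assert(demo_errtbl[i].stat == -1);
	return demo_errtbl[i].errno_val;
}
```

The sentinel doubles as the default: no separate "not found" branch is
needed, at the cost of a linear scan, which is reasonable for a table this
small.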

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 02/25] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno
  2024-08-29  1:03 ` [PATCH v14 02/25] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
@ 2024-08-29 14:17   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:17 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:03 -0400, Mike Snitzer wrote:
> Common nfs4_stat_to_errno() is used by fs/nfs/nfs4xdr.c and will be
> used by fs/nfs/localio.c
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfs/nfs4xdr.c           | 67 --------------------------------------
>  fs/nfs_common/common.c     | 67 ++++++++++++++++++++++++++++++++++++++
>  include/linux/nfs_common.h |  1 +
>  3 files changed, 68 insertions(+), 67 deletions(-)
> 
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index b4091af1a60d..971305bdaecb 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -65,7 +65,6 @@
>  #define NFSDBG_FACILITY		NFSDBG_XDR
>  
>  struct compound_hdr;
> -static int nfs4_stat_to_errno(int);
>  static void encode_layoutget(struct xdr_stream *xdr,
>  			     const struct nfs4_layoutget_args *args,
>  			     struct compound_hdr *hdr);
> @@ -7619,72 +7618,6 @@ int nfs4_decode_dirent(struct xdr_stream *xdr, struct nfs_entry *entry,
>  	return 0;
>  }
>  
> -/*
> - * We need to translate between nfs status return values and
> - * the local errno values which may not be the same.
> - */
> -static struct {
> -	int stat;
> -	int errno;
> -} nfs_errtbl[] = {
> -	{ NFS4_OK,		0		},
> -	{ NFS4ERR_PERM,		-EPERM		},
> -	{ NFS4ERR_NOENT,	-ENOENT		},
> -	{ NFS4ERR_IO,		-errno_NFSERR_IO},
> -	{ NFS4ERR_NXIO,		-ENXIO		},
> -	{ NFS4ERR_ACCESS,	-EACCES		},
> -	{ NFS4ERR_EXIST,	-EEXIST		},
> -	{ NFS4ERR_XDEV,		-EXDEV		},
> -	{ NFS4ERR_NOTDIR,	-ENOTDIR	},
> -	{ NFS4ERR_ISDIR,	-EISDIR		},
> -	{ NFS4ERR_INVAL,	-EINVAL		},
> -	{ NFS4ERR_FBIG,		-EFBIG		},
> -	{ NFS4ERR_NOSPC,	-ENOSPC		},
> -	{ NFS4ERR_ROFS,		-EROFS		},
> -	{ NFS4ERR_MLINK,	-EMLINK		},
> -	{ NFS4ERR_NAMETOOLONG,	-ENAMETOOLONG	},
> -	{ NFS4ERR_NOTEMPTY,	-ENOTEMPTY	},
> -	{ NFS4ERR_DQUOT,	-EDQUOT		},
> -	{ NFS4ERR_STALE,	-ESTALE		},
> -	{ NFS4ERR_BADHANDLE,	-EBADHANDLE	},
> -	{ NFS4ERR_BAD_COOKIE,	-EBADCOOKIE	},
> -	{ NFS4ERR_NOTSUPP,	-ENOTSUPP	},
> -	{ NFS4ERR_TOOSMALL,	-ETOOSMALL	},
> -	{ NFS4ERR_SERVERFAULT,	-EREMOTEIO	},
> -	{ NFS4ERR_BADTYPE,	-EBADTYPE	},
> -	{ NFS4ERR_LOCKED,	-EAGAIN		},
> -	{ NFS4ERR_SYMLINK,	-ELOOP		},
> -	{ NFS4ERR_OP_ILLEGAL,	-EOPNOTSUPP	},
> -	{ NFS4ERR_DEADLOCK,	-EDEADLK	},
> -	{ NFS4ERR_NOXATTR,	-ENODATA	},
> -	{ NFS4ERR_XATTR2BIG,	-E2BIG		},
> -	{ -1,			-EIO		}
> -};
> -
> -/*
> - * Convert an NFS error code to a local one.
> - * This one is used jointly by NFSv2 and NFSv3.
> - */
> -static int
> -nfs4_stat_to_errno(int stat)
> -{
> -	int i;
> -	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
> -		if (nfs_errtbl[i].stat == stat)
> -			return nfs_errtbl[i].errno;
> -	}
> -	if (stat <= 10000 || stat > 10100) {
> -		/* The server is looney tunes. */
> -		return -EREMOTEIO;
> -	}
> -	/* If we cannot translate the error, the recovery routines should
> -	 * handle it.
> -	 * Note: remaining NFSv4 error codes have values > 10000, so should
> -	 * not conflict with native Linux error codes.
> -	 */
> -	return -stat;
> -}
> -
>  #ifdef CONFIG_NFS_V4_2
>  #include "nfs42xdr.c"
>  #endif /* CONFIG_NFS_V4_2 */
> diff --git a/fs/nfs_common/common.c b/fs/nfs_common/common.c
> index a4ee95da2174..34a115176f97 100644
> --- a/fs/nfs_common/common.c
> +++ b/fs/nfs_common/common.c
> @@ -2,6 +2,7 @@
>  
>  #include <linux/module.h>
>  #include <linux/nfs_common.h>
> +#include <linux/nfs4.h>
>  
>  /*
>   * We need to translate between nfs status return values and
> @@ -65,3 +66,69 @@ int nfs_stat_to_errno(enum nfs_stat status)
>  	return nfs_errtbl[i].errno;
>  }
>  EXPORT_SYMBOL_GPL(nfs_stat_to_errno);
> +
> +/*
> + * We need to translate between nfs v4 status return values and
> + * the local errno values which may not be the same.
> + */
> +static const struct {
> +	int stat;
> +	int errno;
> +} nfs4_errtbl[] = {
> +	{ NFS4_OK,		0		},
> +	{ NFS4ERR_PERM,		-EPERM		},
> +	{ NFS4ERR_NOENT,	-ENOENT		},
> +	{ NFS4ERR_IO,		-errno_NFSERR_IO},
> +	{ NFS4ERR_NXIO,		-ENXIO		},
> +	{ NFS4ERR_ACCESS,	-EACCES		},
> +	{ NFS4ERR_EXIST,	-EEXIST		},
> +	{ NFS4ERR_XDEV,		-EXDEV		},
> +	{ NFS4ERR_NOTDIR,	-ENOTDIR	},
> +	{ NFS4ERR_ISDIR,	-EISDIR		},
> +	{ NFS4ERR_INVAL,	-EINVAL		},
> +	{ NFS4ERR_FBIG,		-EFBIG		},
> +	{ NFS4ERR_NOSPC,	-ENOSPC		},
> +	{ NFS4ERR_ROFS,		-EROFS		},
> +	{ NFS4ERR_MLINK,	-EMLINK		},
> +	{ NFS4ERR_NAMETOOLONG,	-ENAMETOOLONG	},
> +	{ NFS4ERR_NOTEMPTY,	-ENOTEMPTY	},
> +	{ NFS4ERR_DQUOT,	-EDQUOT		},
> +	{ NFS4ERR_STALE,	-ESTALE		},
> +	{ NFS4ERR_BADHANDLE,	-EBADHANDLE	},
> +	{ NFS4ERR_BAD_COOKIE,	-EBADCOOKIE	},
> +	{ NFS4ERR_NOTSUPP,	-ENOTSUPP	},
> +	{ NFS4ERR_TOOSMALL,	-ETOOSMALL	},
> +	{ NFS4ERR_SERVERFAULT,	-EREMOTEIO	},
> +	{ NFS4ERR_BADTYPE,	-EBADTYPE	},
> +	{ NFS4ERR_LOCKED,	-EAGAIN		},
> +	{ NFS4ERR_SYMLINK,	-ELOOP		},
> +	{ NFS4ERR_OP_ILLEGAL,	-EOPNOTSUPP	},
> +	{ NFS4ERR_DEADLOCK,	-EDEADLK	},
> +	{ NFS4ERR_NOXATTR,	-ENODATA	},
> +	{ NFS4ERR_XATTR2BIG,	-E2BIG		},
> +	{ -1,			-EIO		}
> +};
> +
> +/*
> + * Convert an NFS error code to a local one.
> + * This one is used by NFSv4.
> + */
> +int nfs4_stat_to_errno(int stat)
> +{
> +	int i;
> +	for (i = 0; nfs4_errtbl[i].stat != -1; i++) {
> +		if (nfs4_errtbl[i].stat == stat)
> +			return nfs4_errtbl[i].errno;
> +	}
> +	if (stat <= 10000 || stat > 10100) {
> +		/* The server is looney tunes. */
> +		return -EREMOTEIO;
> +	}
> +	/* If we cannot translate the error, the recovery routines should
> +	 * handle it.
> +	 * Note: remaining NFSv4 error codes have values > 10000, so should
> +	 * not conflict with native Linux error codes.
> +	 */
> +	return -stat;
> +}
> +EXPORT_SYMBOL_GPL(nfs4_stat_to_errno);
> diff --git a/include/linux/nfs_common.h b/include/linux/nfs_common.h
> index 3395c4a4d372..5fc02df88252 100644
> --- a/include/linux/nfs_common.h
> +++ b/include/linux/nfs_common.h
> @@ -12,5 +12,6 @@
>  #define errno_NFSERR_IO EIO
>  
>  int nfs_stat_to_errno(enum nfs_stat status);
> +int nfs4_stat_to_errno(int stat);
>  
>  #endif /* _LINUX_NFS_COMMON_H */
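
The v4 variant differs from the v2/v3 one in how it treats a status that is
not in the table, so a small userspace sketch of that fallback may help. As
before, this is an illustration under assumptions, not the kernel code: the
demo4_* names are hypothetical, the table is cut down to three entries, the
numeric values 10008 and 10012 are my reading of the NFSv4 error numbering,
and the EREMOTEIO fallback define is only there for non-Linux libcs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef EREMOTEIO
#define EREMOTEIO 121	/* assumption: Linux value, for platforms lacking it */
#endif

/* Cut-down, hypothetical table; the real one is in the patch above. */
static const struct {
	int stat;
	int errno_val;
} demo4_errtbl[] = {
	{ 0,     0       },	/* NFS4_OK */
	{ 2,     -ENOENT },	/* NFS4ERR_NOENT */
	{ 10012, -EAGAIN },	/* NFS4ERR_LOCKED (value assumed) */
	{ -1,    -EIO    },	/* sentinel */
};

int demo4_stat_to_errno(int stat)
{
	size_t i;

	for (i = 0; demo4_errtbl[i].stat != -1; i++) {
		if (demo4_errtbl[i].stat == stat)
			return demo4_errtbl[i].errno_val;
	}
	/* Outside the NFSv4-specific window: treat as a broken server. */
	if (stat <= 10000 || stat > 10100)
		return -EREMOTEIO;
	/* Inside the window: hand the negated code to the caller untranslated. */
	return -stat;
}
```

Note that the sentinel's -EIO is never returned here; an unmatched status
ends up as either -EREMOTEIO or the negated NFSv4 code, which is what lets
the client's recovery paths see errors the table does not list.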

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 03/25] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
  2024-08-29  1:03 ` [PATCH v14 03/25] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
@ 2024-08-29 14:19   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:19 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:03 -0400, Mike Snitzer wrote:
> Eliminates duplicate functions in various files to allow for
> additional callers.
> 
> Reviewed-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfs/flexfilelayout/flexfilelayout.c |  6 ------
>  fs/nfs/nfs4xdr.c                       | 13 -------------
>  include/linux/nfs_xdr.h                | 20 +++++++++++++++++++-
>  3 files changed, 19 insertions(+), 20 deletions(-)
> 
> diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
> index 39ba9f4208aa..d4d551ffea7b 100644
> --- a/fs/nfs/flexfilelayout/flexfilelayout.c
> +++ b/fs/nfs/flexfilelayout/flexfilelayout.c
> @@ -2086,12 +2086,6 @@ static int ff_layout_encode_ioerr(struct xdr_stream *xdr,
>  	return ff_layout_encode_ds_ioerr(xdr, &ff_args->errors);
>  }
>  
> -static void
> -encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
> -{
> -	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
> -}
> -
>  static void
>  ff_layout_encode_ff_iostat_head(struct xdr_stream *xdr,
>  			    const nfs4_stateid *stateid,
> diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
> index 971305bdaecb..6bf2d44e5d4e 100644
> --- a/fs/nfs/nfs4xdr.c
> +++ b/fs/nfs/nfs4xdr.c
> @@ -972,11 +972,6 @@ static __be32 *reserve_space(struct xdr_stream *xdr, size_t nbytes)
>  	return p;
>  }
>  
> -static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
> -{
> -	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
> -}
> -
>  static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
>  {
>  	WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0);
> @@ -4406,14 +4401,6 @@ static int decode_access(struct xdr_stream *xdr, u32 *supported, u32 *access)
>  	return 0;
>  }
>  
> -static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
> -{
> -	ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
> -	if (unlikely(ret < 0))
> -		return -EIO;
> -	return 0;
> -}
> -
>  static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
>  {
>  	return decode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE);
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index 45623af3e7b8..5e93fbfb785a 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1853,6 +1853,24 @@ struct nfs_rpc_ops {
>  	void	(*disable_swap)(struct inode *inode);
>  };
>  
> +/*
> + * Helper functions used by NFS client and/or server
> + */
> +static inline void encode_opaque_fixed(struct xdr_stream *xdr,
> +				       const void *buf, size_t len)
> +{
> +	WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
> +}
> +
> +static inline int decode_opaque_fixed(struct xdr_stream *xdr,
> +				      void *buf, size_t len)
> +{
> +	ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
> +	if (unlikely(ret < 0))
> +		return -EIO;
> +	return 0;
> +}
> +
>  /*
>   * Function vectors etc. for the NFS client
>   */
> @@ -1866,4 +1884,4 @@ extern const struct rpc_version nfs_version4;
>  extern const struct rpc_version nfsacl_version3;
>  extern const struct rpc_program nfsacl_program;
>  
> -#endif
> +#endif /* _LINUX_NFS_XDR_H */

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 04/25] NFSD: Handle @rqstp == NULL in check_nfsd_access()
  2024-08-29  1:03 ` [PATCH v14 04/25] NFSD: Handle @rqstp == NULL in check_nfsd_access() Mike Snitzer
@ 2024-08-29 14:20   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:20 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:03 -0400, Mike Snitzer wrote:
> From: NeilBrown <neilb@suse.de>
> 
> LOCALIO-initiated open operations are not running in an nfsd thread
> and thus do not have an associated svc_rqst context.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/export.c | 30 +++++++++++++++++++++++++-----
>  1 file changed, 25 insertions(+), 5 deletions(-)
> 
> diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c
> index 7bb4f2075ac5..c82d8e3e0d4f 100644
> --- a/fs/nfsd/export.c
> +++ b/fs/nfsd/export.c
> @@ -1074,10 +1074,30 @@ static struct svc_export *exp_find(struct cache_detail *cd,
>  	return exp;
>  }
>  
> +/**
> + * check_nfsd_access - check if access to export is allowed.
> + * @exp: svc_export that is being accessed.
> + * @rqstp: svc_rqst attempting to access @exp (will be NULL for LOCALIO).
> + *
> + * Return values:
> + *   %nfs_ok if access is granted, or
> + *   %nfserr_wrongsec if access is denied
> + */
>  __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp)
>  {
>  	struct exp_flavor_info *f, *end = exp->ex_flavors + exp->ex_nflavors;
> -	struct svc_xprt *xprt = rqstp->rq_xprt;
> +	struct svc_xprt *xprt;
> +
> +	/*
> +	 * If rqstp is NULL, this is a LOCALIO request which will only
> +	 * ever use a filehandle/credential pair for which access has
> +	 * been affirmed (by ACCESS or OPEN NFS requests) over the
> +	 * wire. So there is no need for further checks here.
> +	 */
> +	if (!rqstp)
> +		return nfs_ok;
> +
> +	xprt = rqstp->rq_xprt;
>  
>  	if (exp->ex_xprtsec_modes & NFSEXP_XPRTSEC_NONE) {
>  		if (!test_bit(XPT_TLS_SESSION, &xprt->xpt_flags))
> @@ -1098,17 +1118,17 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp)
>  ok:
>  	/* legacy gss-only clients are always OK: */
>  	if (exp->ex_client == rqstp->rq_gssclient)
> -		return 0;
> +		return nfs_ok;
>  	/* ip-address based client; check sec= export option: */
>  	for (f = exp->ex_flavors; f < end; f++) {
>  		if (f->pseudoflavor == rqstp->rq_cred.cr_flavor)
> -			return 0;
> +			return nfs_ok;
>  	}
>  	/* defaults in absence of sec= options: */
>  	if (exp->ex_nflavors == 0) {
>  		if (rqstp->rq_cred.cr_flavor == RPC_AUTH_NULL ||
>  		    rqstp->rq_cred.cr_flavor == RPC_AUTH_UNIX)
> -			return 0;
> +			return nfs_ok;
>  	}
>  
>  	/* If the compound op contains a spo_must_allowed op,
> @@ -1118,7 +1138,7 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp)
>  	 */
>  
>  	if (nfsd4_spo_must_allow(rqstp))
> -		return 0;
> +		return nfs_ok;
>  
>  denied:
>  	return nfserr_wrongsec;
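
The shape of the change above, sketched in userspace: check for the missing
request context before anything dereferences it, and only then read the
transport. Everything here is hypothetical (the demo_* types, the two
checks, the return codes); it is a much-simplified model of the idea, not
of check_nfsd_access() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, much-simplified stand-ins for svc_xprt/svc_rqst/svc_export. */
struct demo_xprt   { bool tls_session; };
struct demo_rqst   { struct demo_xprt *xprt; int cred_flavor; };
struct demo_export { bool require_tls; int allowed_flavor; };

enum { DEMO_OK = 0, DEMO_WRONGSEC = 1 };

int demo_check_access(const struct demo_export *exp,
		      const struct demo_rqst *rqstp)
{
	const struct demo_xprt *xprt;

	assert(exp != NULL);

	/* No request context (the LOCALIO case): nothing further to check. */
	if (!rqstp)
		return DEMO_OK;

	/* Only now is it safe to look at the transport. */
	xprt = rqstp->xprt;

	if (exp->require_tls && !xprt->tls_session)
		return DEMO_WRONGSEC;
	if (rqstp->cred_flavor != exp->allowed_flavor)
		return DEMO_WRONGSEC;
	return DEMO_OK;
}
```

Moving the xprt assignment below the NULL test is the important part; with
the original initializer-at-declaration form, the dereference would happen
before any check could run.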

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 05/25] NFSD: Refactor nfsd_setuser_and_check_port()
  2024-08-29  1:04 ` [PATCH v14 05/25] NFSD: Refactor nfsd_setuser_and_check_port() Mike Snitzer
@ 2024-08-29 14:23   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:23 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: NeilBrown <neilb@suse.de>
> 
> There are several places where __fh_verify unconditionally dereferences
> rqstp to check that the connection is suitably secure.  They look at
> rqstp->rq_xprt which is not meaningful in the target use case of
> "localio" NFS in which the client talks directly to the local server.
> 
> Prepare these to always succeed when rqstp is NULL.
> 
> Signed-off-by: NeilBrown <neilb@suse.de>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfsfh.c | 19 ++++++++++---------
>  1 file changed, 10 insertions(+), 9 deletions(-)
> 
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 50d23d56f403..4b964a71a504 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -87,23 +87,24 @@ nfsd_mode_check(struct dentry *dentry, umode_t requested)
>  	return nfserr_wrong_type;
>  }
>  
> -static bool nfsd_originating_port_ok(struct svc_rqst *rqstp, int flags)
> +static bool nfsd_originating_port_ok(struct svc_rqst *rqstp,
> +				     struct svc_cred *cred,
> +				     struct svc_export *exp)
>  {
> -	if (flags & NFSEXP_INSECURE_PORT)
> +	if (nfsexp_flags(cred, exp) & NFSEXP_INSECURE_PORT)
>  		return true;
>  	/* We don't require gss requests to use low ports: */
> -	if (rqstp->rq_cred.cr_flavor >= RPC_AUTH_GSS)
> +	if (cred->cr_flavor >= RPC_AUTH_GSS)
>  		return true;
>  	return test_bit(RQ_SECURE, &rqstp->rq_flags);
>  }
>  
>  static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp,
> +					  struct svc_cred *cred,
>  					  struct svc_export *exp)
>  {
> -	int flags = nfsexp_flags(&rqstp->rq_cred, exp);
> -
>  	/* Check if the request originated from a secure port. */
> -	if (!nfsd_originating_port_ok(rqstp, flags)) {
> +	if (rqstp && !nfsd_originating_port_ok(rqstp, cred, exp)) {
>  		RPC_IFDEBUG(char buf[RPC_MAX_ADDRBUFLEN]);
>  		dprintk("nfsd: request from insecure port %s!\n",
>  		        svc_print_addr(rqstp, buf, sizeof(buf)));
> @@ -111,7 +112,7 @@ static __be32 nfsd_setuser_and_check_port(struct svc_rqst *rqstp,
>  	}
>  
>  	/* Set user creds for this exportpoint */
> -	return nfserrno(nfsd_setuser(&rqstp->rq_cred, exp));
> +	return nfserrno(nfsd_setuser(cred, exp));
>  }
>  
>  static inline __be32 check_pseudo_root(struct dentry *dentry,
> @@ -219,7 +220,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  		put_cred(override_creds(new));
>  		put_cred(new);
>  	} else {
> -		error = nfsd_setuser_and_check_port(rqstp, exp);
> +		error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
>  		if (error)
>  			goto out;
>  	}
> @@ -358,7 +359,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
>  	if (error)
>  		goto out;
>  
> -	error = nfsd_setuser_and_check_port(rqstp, exp);
> +	error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
>  	if (error)
>  		goto out;
>  

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
  2024-08-29  1:04 ` [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
  2024-08-29  1:45   ` [PATCH v14.5 " Mike Snitzer
@ 2024-08-29 14:28   ` Jeff Layton
  2024-08-29 15:28     ` Mike Snitzer
  1 sibling, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:28 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Currently, fh_verify() makes some daring assumptions about which
> version of file handle the caller wants, based on the things it can
> find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
> case sometimes has no svc_rqst context, so this logic won't work in
> that case.
> 
> Instead, examine the passed-in file handle. Its .max_size field
> should carry information to allow nfsd_set_fh_dentry() to initialize
> the file handle appropriately.
> 
> lockd appears to be the only kernel consumer that does not set the
> file handle .max_size during initialization.
> 
> write_filehandle() is the other question mark, as it looks possible
> to specify a maxsize between NFS_FHSIZE and NFS3_FHSIZE here.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfsd/lockd.c |  6 ++++--
>  fs/nfsd/nfsfh.c | 11 +++++++----
>  2 files changed, 11 insertions(+), 6 deletions(-)
> 
> diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c
> index 46a7f9b813e5..e636d2a1e664 100644
> --- a/fs/nfsd/lockd.c
> +++ b/fs/nfsd/lockd.c
> @@ -32,8 +32,10 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp,
>  	int		access;
>  	struct svc_fh	fh;
>  
> -	/* must initialize before using! but maxsize doesn't matter */
> -	fh_init(&fh,0);
> +	if (rqstp->rq_vers == 4)
> +		fh_init(&fh, NFS3_FHSIZE);
> +	else
> +		fh_init(&fh, NFS_FHSIZE);
>  	fh.fh_handle.fh_size = f->size;
>  	memcpy(&fh.fh_handle.fh_raw, f->data, f->size);
>  	fh.fh_export = NULL;
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 4b964a71a504..77acc26e8b02 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -267,25 +267,28 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	fhp->fh_dentry = dentry;
>  	fhp->fh_export = exp;
>  
> -	switch (rqstp->rq_vers) {
> -	case 4:
> +	switch (fhp->fh_maxsize) {
> +	case NFS4_FHSIZE:
>  		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR)
>  			fhp->fh_no_atomic_attr = true;
>  		fhp->fh_64bit_cookies = true;
>  		break;
> -	case 3:
> +	case NFS3_FHSIZE:
>  		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC)
>  			fhp->fh_no_wcc = true;
>  		fhp->fh_64bit_cookies = true;
>  		if (exp->ex_flags & NFSEXP_V4ROOT)
>  			goto out;
>  		break;
> -	case 2:
> +	case NFS_FHSIZE:
>  		fhp->fh_no_wcc = true;
>  		if (EX_WGATHER(exp))
>  			fhp->fh_use_wgather = true;
>  		if (exp->ex_flags & NFSEXP_V4ROOT)
>  			goto out;
> +		break;
> +	case 0:
> +		WARN_ONCE(1, "Uninitialized file handle");
>  	}
>  
>  	return 0;

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 07/25] NFSD: Short-circuit fh_verify tracepoints for LOCALIO
  2024-08-29  1:04 ` [PATCH v14 07/25] NFSD: Short-circuit fh_verify tracepoints for LOCALIO Mike Snitzer
@ 2024-08-29 14:33   ` Jeff Layton
  2024-08-29 14:35     ` Chuck Lever
  0 siblings, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:33 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> LOCALIO will be able to call fh_verify() with a NULL rqstp. In this
> case, the existing trace points need to be skipped because they
> want to dereference the address fields in the passed-in rqstp.
> 
> Temporarily make these trace points conditional to avoid a seg
> fault in this case. Putting the "rqstp != NULL" check in the trace
> points themselves makes the check more efficient.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfsd/trace.h | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 77bbd23aa150..d22027e23761 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -193,7 +193,7 @@ TRACE_EVENT(nfsd_compound_encode_err,
>  		{ S_IFIFO,		"FIFO" }, \
>  		{ S_IFSOCK,		"SOCK" })
>  
> -TRACE_EVENT(nfsd_fh_verify,
> +TRACE_EVENT_CONDITION(nfsd_fh_verify,
>  	TP_PROTO(
>  		const struct svc_rqst *rqstp,
>  		const struct svc_fh *fhp,
> @@ -201,6 +201,7 @@ TRACE_EVENT(nfsd_fh_verify,
>  		int access
>  	),
>  	TP_ARGS(rqstp, fhp, type, access),
> +	TP_CONDITION(rqstp != NULL),
>  	TP_STRUCT__entry(
>  		__field(unsigned int, netns_ino)
>  		__sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
> @@ -239,7 +240,7 @@ TRACE_EVENT_CONDITION(nfsd_fh_verify_err,
>  		__be32 error
>  	),
>  	TP_ARGS(rqstp, fhp, type, access, error),
> -	TP_CONDITION(error),
> +	TP_CONDITION(rqstp != NULL && error),
>  	TP_STRUCT__entry(
>  		__field(unsigned int, netns_ino)
>  		__sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
> @@ -295,12 +296,13 @@ DECLARE_EVENT_CLASS(nfsd_fh_err_class,
>  		  __entry->status)
>  )
>  
> -#define DEFINE_NFSD_FH_ERR_EVENT(name)		\
> -DEFINE_EVENT(nfsd_fh_err_class, nfsd_##name,	\
> -	TP_PROTO(struct svc_rqst *rqstp,	\
> -		 struct svc_fh	*fhp,		\
> -		 int		status),	\
> -	TP_ARGS(rqstp, fhp, status))
> +#define DEFINE_NFSD_FH_ERR_EVENT(name)			\
> +DEFINE_EVENT_CONDITION(nfsd_fh_err_class, nfsd_##name,	\
> +	TP_PROTO(struct svc_rqst *rqstp,		\
> +		 struct svc_fh	*fhp,			\
> +		 int		status),		\
> +	TP_ARGS(rqstp, fhp, status),			\
> +	TP_CONDITION(rqstp != NULL))
>  
>  DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badexport);
>  DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badhandle);

A bit ugly. We really only want the rqstp here to get at the socket
structures. I'm still looking at the rest of the set, so I'll assume
that this gets cleaned up later.
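
For anyone following along, the effect of these TP_CONDITION() changes
can be pictured in plain userspace C. This is only a rough sketch, not
kernel code: struct fake_rqst, trace_fh_verify_err() and events_emitted
are made-up names for illustration, and the real tracing macros generate
considerably more than a single if statement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct svc_rqst; not the kernel type. */
struct fake_rqst {
	const char *remote_addr;
};

/* Counts how many events were actually emitted (illustration only). */
static int events_emitted;

/*
 * Mimics a conditional trace point: the condition is evaluated first,
 * and the body, which dereferences rqstp, runs only when it holds.
 */
static void trace_fh_verify_err(const struct fake_rqst *rqstp, int error)
{
	if (!(rqstp != NULL && error))
		return;
	printf("fh_verify_err: client=%s error=%d\n",
	       rqstp->remote_addr, error);
	events_emitted++;
}
```

The point is simply that the NULL check runs before anything touches
rqstp, so a caller passing NULL skips the event instead of faulting.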

Acked-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v14 07/25] NFSD: Short-circuit fh_verify tracepoints for LOCALIO
  2024-08-29 14:33   ` Jeff Layton
@ 2024-08-29 14:35     ` Chuck Lever
  0 siblings, 0 replies; 75+ messages in thread
From: Chuck Lever @ 2024-08-29 14:35 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Mike Snitzer, linux-nfs, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 10:33:18AM -0400, Jeff Layton wrote:
> On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > LOCALIO will be able to call fh_verify() with a NULL rqstp. In this
> > case, the existing trace points need to be skipped because they
> > want to dereference the address fields in the passed-in rqstp.
> > 
> > Temporarily make these trace points conditional to avoid a seg
> > fault in this case. Putting the "rqstp != NULL" check in the trace
> > points themselves makes the check more efficient.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfsd/trace.h | 18 ++++++++++--------
> >  1 file changed, 10 insertions(+), 8 deletions(-)
> > 
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index 77bbd23aa150..d22027e23761 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -193,7 +193,7 @@ TRACE_EVENT(nfsd_compound_encode_err,
> >  		{ S_IFIFO,		"FIFO" }, \
> >  		{ S_IFSOCK,		"SOCK" })
> >  
> > -TRACE_EVENT(nfsd_fh_verify,
> > +TRACE_EVENT_CONDITION(nfsd_fh_verify,
> >  	TP_PROTO(
> >  		const struct svc_rqst *rqstp,
> >  		const struct svc_fh *fhp,
> > @@ -201,6 +201,7 @@ TRACE_EVENT(nfsd_fh_verify,
> >  		int access
> >  	),
> >  	TP_ARGS(rqstp, fhp, type, access),
> > +	TP_CONDITION(rqstp != NULL),
> >  	TP_STRUCT__entry(
> >  		__field(unsigned int, netns_ino)
> >  		__sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
> > @@ -239,7 +240,7 @@ TRACE_EVENT_CONDITION(nfsd_fh_verify_err,
> >  		__be32 error
> >  	),
> >  	TP_ARGS(rqstp, fhp, type, access, error),
> > -	TP_CONDITION(error),
> > +	TP_CONDITION(rqstp != NULL && error),
> >  	TP_STRUCT__entry(
> >  		__field(unsigned int, netns_ino)
> >  		__sockaddr(server, rqstp->rq_xprt->xpt_remotelen)
> > @@ -295,12 +296,13 @@ DECLARE_EVENT_CLASS(nfsd_fh_err_class,
> >  		  __entry->status)
> >  )
> >  
> > -#define DEFINE_NFSD_FH_ERR_EVENT(name)		\
> > -DEFINE_EVENT(nfsd_fh_err_class, nfsd_##name,	\
> > -	TP_PROTO(struct svc_rqst *rqstp,	\
> > -		 struct svc_fh	*fhp,		\
> > -		 int		status),	\
> > -	TP_ARGS(rqstp, fhp, status))
> > +#define DEFINE_NFSD_FH_ERR_EVENT(name)			\
> > +DEFINE_EVENT_CONDITION(nfsd_fh_err_class, nfsd_##name,	\
> > +	TP_PROTO(struct svc_rqst *rqstp,		\
> > +		 struct svc_fh	*fhp,			\
> > +		 int		status),		\
> > +	TP_ARGS(rqstp, fhp, status),			\
> > +	TP_CONDITION(rqstp != NULL))
> >  
> >  DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badexport);
> >  DEFINE_NFSD_FH_ERR_EVENT(set_fh_dentry_badhandle);
> 
> A bit ugly. We really only want the rqstp here to get at the socket
> structures. I'm still looking at the rest of the set, so I'll assume
> that this gets cleaned up later.

No, it doesn't. We don't have a solution for how to trace
LOCALIO activity here yet.

-- 
Chuck Lever


* Re: [PATCH v14 08/25] nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  2024-08-29  1:04 ` [PATCH v14 08/25] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
@ 2024-08-29 14:39   ` Jeff Layton
  2024-08-29 15:35     ` Mike Snitzer
  0 siblings, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:39 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: NeilBrown <neilb@suse.de>
> 
> __fh_verify() offers an interface like fh_verify() but doesn't require
> a struct svc_rqst *, instead it also takes the specific parts as
> explicit required arguments.  So it is safe to call __fh_verify() with
> a NULL rqstp, but the net, cred, and client args must not be NULL.
> 
> __fh_verify() does not use SVC_NET(), nor do the functions it calls.
> 
> Rather than using rqstp->rq_client pass the client and gssclient
> explicitly to __fh_verify and then to nfsd_set_fh_dentry().
> 
> Lastly, 4 associated tracepoints are only used if rqstp is not NULL
> (this is a stop-gap that should be properly fixed so localio also
> benefits from the utility these tracepoints provide when debugging
> fh_verify issues).
> 

nit: this last paragraph doesn't apply anymore with the inclusion of
the previous patch

> Signed-off-by: NeilBrown <neilb@suse.de>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/nfsfh.c | 90 +++++++++++++++++++++++++++++--------------------
>  1 file changed, 53 insertions(+), 37 deletions(-)
> 
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 77acc26e8b02..80c06e170e9a 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -142,7 +142,11 @@ static inline __be32 check_pseudo_root(struct dentry *dentry,
>   * dentry.  On success, the results are used to set fh_export and
>   * fh_dentry.
>   */
> -static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> +static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
> +				 struct svc_cred *cred,
> +				 struct auth_domain *client,
> +				 struct auth_domain *gssclient,
> +				 struct svc_fh *fhp)
>  {
>  	struct knfsd_fh	*fh = &fhp->fh_handle;
>  	struct fid *fid = NULL;
> @@ -184,8 +188,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	data_left -= len;
>  	if (data_left < 0)
>  		return error;
> -	exp = rqst_exp_find(&rqstp->rq_chandle, SVC_NET(rqstp),
> -			    rqstp->rq_client, rqstp->rq_gssclient,
> +	exp = rqst_exp_find(rqstp ? &rqstp->rq_chandle : NULL,
> +			    net, client, gssclient,
>  			    fh->fh_fsid_type, fh->fh_fsid);
>  	fid = (struct fid *)(fh->fh_fsid + len);
>  
> @@ -220,7 +224,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  		put_cred(override_creds(new));
>  		put_cred(new);
>  	} else {
> -		error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
> +		error = nfsd_setuser_and_check_port(rqstp, cred, exp);
>  		if (error)
>  			goto out;
>  	}
> @@ -297,43 +301,21 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	return error;
>  }
>  
> -/**
> - * fh_verify - filehandle lookup and access checking
> - * @rqstp: pointer to current rpc request
> - * @fhp: filehandle to be verified
> - * @type: expected type of object pointed to by filehandle
> - * @access: type of access needed to object
> - *
> - * Look up a dentry from the on-the-wire filehandle, check the client's
> - * access to the export, and set the current task's credentials.
> - *
> - * Regardless of success or failure of fh_verify(), fh_put() should be
> - * called on @fhp when the caller is finished with the filehandle.
> - *
> - * fh_verify() may be called multiple times on a given filehandle, for
> - * example, when processing an NFSv4 compound.  The first call will look
> - * up a dentry using the on-the-wire filehandle.  Subsequent calls will
> - * skip the lookup and just perform the other checks and possibly change
> - * the current task's credentials.
> - *
> - * @type specifies the type of object expected using one of the S_IF*
> - * constants defined in include/linux/stat.h.  The caller may use zero
> - * to indicate that it doesn't care, or a negative integer to indicate
> - * that it expects something not of the given type.
> - *
> - * @access is formed from the NFSD_MAY_* constants defined in
> - * fs/nfsd/vfs.h.
> - */
> -__be32
> -fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> +static __be32
> +__fh_verify(struct svc_rqst *rqstp,
> +	    struct net *net, struct svc_cred *cred,
> +	    struct auth_domain *client,
> +	    struct auth_domain *gssclient,
> +	    struct svc_fh *fhp, umode_t type, int access)

I don't consider this a show-stopper, but it might be good to have a
kerneldoc header on this, just because it has so many parameters.
Having them clearly spelled out, and the rules around what must be set
when rqstp is NULL would make it less likely we'll break those
assumptions in the future.
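
As a rough illustration of the shape being discussed (a core function
with explicit arguments and documented NULL rules, plus thin wrappers),
here is a userspace sketch. None of it is nfsd code: the mini_* types
and functions are hypothetical, and the "access check" is reduced to a
uid comparison purely so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature types; none of these are kernel structures. */
struct mini_net  { int id; };
struct mini_cred { int uid; };
struct mini_rqst {
	struct mini_net  *net;
	struct mini_cred  cred;
};

/**
 * mini_verify_common - core check taking explicit arguments
 * @rqstp: request context, or NULL when the caller has none
 * @net: namespace to look up in; must not be NULL
 * @cred: credential to check; must not be NULL
 * @owner_uid: uid that is allowed to access the object
 *
 * Returns 0 on success, -1 when access is denied.
 */
static int mini_verify_common(const struct mini_rqst *rqstp,
			      const struct mini_net *net,
			      const struct mini_cred *cred, int owner_uid)
{
	(void)rqstp;	/* only optional, request-specific work would use it */
	assert(net != NULL && cred != NULL);
	return cred->uid == owner_uid ? 0 : -1;
}

/* Wrapper for callers that have a request: unpack it once, here. */
static int mini_verify(struct mini_rqst *rqstp, int owner_uid)
{
	return mini_verify_common(rqstp, rqstp->net, &rqstp->cred, owner_uid);
}

/* Wrapper for callers that have no request context at all. */
static int mini_verify_local(const struct mini_net *net,
			     const struct mini_cred *cred, int owner_uid)
{
	return mini_verify_common(NULL, net, cred, owner_uid);
}
```

Spelling out "must not be NULL" in the header comment and backing it
with an assertion keeps the rule visible to whoever touches the core
function next.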

>  {
> -	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>  	struct svc_export *exp = NULL;
>  	struct dentry	*dentry;
>  	__be32		error;
>  
>  	if (!fhp->fh_dentry) {
> -		error = nfsd_set_fh_dentry(rqstp, fhp);
> +		error = nfsd_set_fh_dentry(rqstp, net, cred, client,
> +					   gssclient, fhp);
>  		if (error)
>  			goto out;
>  	}
> @@ -362,7 +344,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
>  	if (error)
>  		goto out;
>  
> -	error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
> +	error = nfsd_setuser_and_check_port(rqstp, cred, exp);
>  	if (error)
>  		goto out;
>  
> @@ -392,7 +374,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
>  
>  skip_pseudoflavor_check:
>  	/* Finally, check access permissions. */
> -	error = nfsd_permission(&rqstp->rq_cred, exp, dentry, access);
> +	error = nfsd_permission(cred, exp, dentry, access);
>  out:
>  	trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error);
>  	if (error == nfserr_stale)
> @@ -400,6 +382,40 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
>  	return error;
>  }
>  
> +/**
> + * fh_verify - filehandle lookup and access checking
> + * @rqstp: pointer to current rpc request
> + * @fhp: filehandle to be verified
> + * @type: expected type of object pointed to by filehandle
> + * @access: type of access needed to object
> + *
> + * Look up a dentry from the on-the-wire filehandle, check the client's
> + * access to the export, and set the current task's credentials.
> + *
> + * Regardless of success or failure of fh_verify(), fh_put() should be
> + * called on @fhp when the caller is finished with the filehandle.
> + *
> + * fh_verify() may be called multiple times on a given filehandle, for
> + * example, when processing an NFSv4 compound.  The first call will look
> + * up a dentry using the on-the-wire filehandle.  Subsequent calls will
> + * skip the lookup and just perform the other checks and possibly change
> + * the current task's credentials.
> + *
> + * @type specifies the type of object expected using one of the S_IF*
> + * constants defined in include/linux/stat.h.  The caller may use zero
> + * to indicate that it doesn't care, or a negative integer to indicate
> + * that it expects something not of the given type.
> + *
> + * @access is formed from the NFSD_MAY_* constants defined in
> + * fs/nfsd/vfs.h.
> + */
> +__be32
> +fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> +{
> +	return __fh_verify(rqstp, SVC_NET(rqstp), &rqstp->rq_cred,
> +			   rqstp->rq_client, rqstp->rq_gssclient,
> +			   fhp, type, access);
> +}
>  
>  /*
>   * Compose a file handle for an NFS reply.

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local()
  2024-08-29  1:04 ` [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local() Mike Snitzer
@ 2024-08-29 14:49   ` Jeff Layton
  2024-08-29 15:47   ` Chuck Lever
  1 sibling, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 14:49 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: NeilBrown <neilb@suse.de>
> 
> nfsd_file_acquire_local() can be used to look up a file by filehandle
> without having a struct svc_rqst.  This can be used by NFS LOCALIO to
> allow the NFS client to bypass the NFS protocol to directly access a
> file provided by the NFS server which is running in the same kernel.
> 
> In nfsd_file_do_acquire() care is taken to always use fh_verify() if
> rqstp is not NULL (as is the case for non-LOCALIO callers).  Otherwise
> the non-LOCALIO callers will not supply the correct and required
> arguments to __fh_verify (e.g. gssclient isn't passed).
> 
> Introduce fh_verify_local() wrapper around __fh_verify to make it
> clear that LOCALIO is the intended caller.
> 
> Also, use GC for nfsd_file returned by nfsd_file_acquire_local.  GC
> offers performance improvements if/when a file is reopened before
> launderette cleans it from the filecache's LRU.
> 
> Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC
> Signed-off-by: NeilBrown <neilb@suse.de>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c | 71 ++++++++++++++++++++++++++++++++++++++++-----
>  fs/nfsd/filecache.h |  3 ++
>  fs/nfsd/nfsfh.c     | 39 +++++++++++++++++++++++++
>  fs/nfsd/nfsfh.h     |  2 ++
>  4 files changed, 108 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 9e9d246f993c..2dc72de31f61 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -982,12 +982,14 @@ nfsd_file_is_cached(struct inode *inode)
>  }
>  
>  static __be32
> -nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> +nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
> +		     struct svc_cred *cred,
> +		     struct auth_domain *client,
> +		     struct svc_fh *fhp,
>  		     unsigned int may_flags, struct file *file,
>  		     struct nfsd_file **pnf, bool want_gc)
>  {
>  	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
> -	struct net *net = SVC_NET(rqstp);
>  	struct nfsd_file *new, *nf;
>  	bool stale_retry = true;
>  	bool open_retry = true;
> @@ -996,8 +998,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	int ret;
>  
>  retry:
> -	status = fh_verify(rqstp, fhp, S_IFREG,
> -				may_flags|NFSD_MAY_OWNER_OVERRIDE);
> +	if (rqstp) {
> +		status = fh_verify(rqstp, fhp, S_IFREG,
> +				   may_flags|NFSD_MAY_OWNER_OVERRIDE);
> +	} else {
> +		status = fh_verify_local(net, cred, client, fhp, S_IFREG,
> +					 may_flags|NFSD_MAY_OWNER_OVERRIDE);
> +	}
>  	if (status != nfs_ok)
>  		return status;
>  	inode = d_inode(fhp->fh_dentry);
> @@ -1143,7 +1150,8 @@ __be32
>  nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		     unsigned int may_flags, struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true);
> +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> +				    fhp, may_flags, NULL, pnf, true);
>  }
>  
>  /**
> @@ -1167,7 +1175,55 @@ __be32
>  nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		  unsigned int may_flags, struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false);
> +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> +				    fhp, may_flags, NULL, pnf, false);
> +}
> +
> +/**
> + * nfsd_file_acquire_local - Get a struct nfsd_file with an open file for localio
> + * @net: The network namespace in which to perform a lookup
> + * @cred: the user credential with which to validate access
> + * @client: the auth_domain for LOCALIO lookup
> + * @fhp: the NFS filehandle of the file to be opened
> + * @may_flags: NFSD_MAY_ settings for the file
> + * @pnf: OUT: new or found "struct nfsd_file" object
> + *
> + * This file lookup interface provides access to a file given the
> + * filehandle and credential.  No connection-based authorisation
> + * is performed and in that way it is quite different to other
> + * file access mediated by nfsd.  It allows a kernel module such as the NFS
> + * client to reach across network and filesystem namespaces to access
> + * a file.  The security implications of this should be carefully
> + * considered before use.
> + *
> + * The nfsd_file object returned by this API is reference-counted
> + * and garbage-collected. The object is retained for a few
> + * seconds after the final nfsd_file_put() in case the caller
> + * wants to re-use it.
> + *
> + * Return values:
> + *   %nfs_ok - @pnf points to an nfsd_file with its reference
> + *   count boosted.
> + *
> + * On error, an nfsstat value in network byte order is returned.
> + */
> +__be32
> +nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
> +			struct auth_domain *client, struct svc_fh *fhp,
> +			unsigned int may_flags, struct nfsd_file **pnf)
> +{
> +	/*
> +	 * Save creds before calling nfsd_file_do_acquire() (which calls
> +	 * nfsd_setuser). Important because caller (LOCALIO) is from
> +	 * client context.
> +	 */
> +	const struct cred *save_cred = get_current_cred();
> +	__be32 beres;
> +
> +	beres = nfsd_file_do_acquire(NULL, net, cred, client,
> +				     fhp, may_flags, NULL, pnf, true);
> +	revert_creds(save_cred);
> +	return beres;
>  }
>  
>  /**
> @@ -1193,7 +1249,8 @@ nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  			 unsigned int may_flags, struct file *file,
>  			 struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false);
> +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> +				    fhp, may_flags, file, pnf, false);
>  }
>  
>  /*
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index 3fbec24eea6c..26ada78b8c1e 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -66,5 +66,8 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		  unsigned int may_flags, struct file *file,
>  		  struct nfsd_file **nfp);
> +__be32 nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
> +			       struct auth_domain *client, struct svc_fh *fhp,
> +			       unsigned int may_flags, struct nfsd_file **pnf);
>  int nfsd_file_cache_stats_show(struct seq_file *m, void *v);
>  #endif /* _FS_NFSD_FILECACHE_H */
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 80c06e170e9a..49468e478d23 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -301,6 +301,22 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
>  	return error;
>  }
>  
> +/**
> + * __fh_verify - filehandle lookup and access checking
> + * @rqstp: RPC transaction context, or NULL
> + * @net: net namespace in which to perform the export lookup
> + * @cred: RPC user credential
> + * @client: RPC auth domain
> + * @gssclient: RPC GSS auth domain, or NULL
> + * @fhp: filehandle to be verified
> + * @type: expected type of object pointed to by filehandle
> + * @access: type of access needed to object
> + *
> + * This internal API can be used by callers who do not have an RPC
> + * transaction context (ie are not running in an nfsd thread).
> + *
> + * See fh_verify() for further descriptions of @fhp, @type, and @access.
> + */
>  static __be32
>  __fh_verify(struct svc_rqst *rqstp,
>  	    struct net *net, struct svc_cred *cred,
> @@ -382,6 +398,29 @@ __fh_verify(struct svc_rqst *rqstp,
>  	return error;
>  }
>  
> +/**
> + * fh_verify_local - filehandle lookup and access checking
> + * @net: net namespace in which to perform the export lookup
> + * @cred: RPC user credential
> + * @client: RPC auth domain
> + * @fhp: filehandle to be verified
> + * @type: expected type of object pointed to by filehandle
> + * @access: type of access needed to object
> + *
> + * This API can be used by callers who do not have an RPC
> + * transaction context (ie are not running in an nfsd thread).
> + *
> + * See fh_verify() for further descriptions of @fhp, @type, and @access.
> + */
> +__be32
> +fh_verify_local(struct net *net, struct svc_cred *cred,
> +		struct auth_domain *client, struct svc_fh *fhp,
> +		umode_t type, int access)
> +{
> +	return __fh_verify(NULL, net, cred, client, NULL,
> +			   fhp, type, access);
> +}
> +
>  /**
>   * fh_verify - filehandle lookup and access checking
>   * @rqstp: pointer to current rpc request
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 8d46e203d139..5b7394801dc4 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -217,6 +217,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
>   * Function prototypes
>   */
>  __be32	fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
> +__be32	fh_verify_local(struct net *, struct svc_cred *, struct auth_domain *,
> +			struct svc_fh *, umode_t, int);
>  __be32	fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
>  __be32	fh_update(struct svc_fh *);
>  void	fh_put(struct svc_fh *);

Reviewed-by: Jeff Layton <jlayton@kernel.org>
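
The save/restore of the caller's credential around the lookup, as
described in the comment in nfsd_file_acquire_local(), follows a common
pattern that can be sketched in userspace C. This is an assumption-laden
toy, not the kernel credential API: current_cred, do_acquire() and
acquire_local() are hypothetical names, and a plain pointer stands in
for the task's credential.

```c
#include <assert.h>
#include <stddef.h>

struct mini_cred { int uid; };

/* Stand-in for the per-task credential pointer (illustration only). */
static const struct mini_cred *current_cred;

/* Credential the inner helper switches to while it works. */
static const struct mini_cred server_cred = { 0 };

/* Inner lookup that, like a server-side helper, changes credentials. */
static int do_acquire(int fh)
{
	current_cred = &server_cred;
	return fh >= 0 ? 0 : -1;
}

/* Save the caller's credential, do the lookup, then put it back. */
static int acquire_local(int fh)
{
	const struct mini_cred *saved = current_cred;
	int ret = do_acquire(fh);

	/* Restore on both success and failure paths. */
	current_cred = saved;
	return ret;
}
```

Whatever the inner helper does to the credential, the client-context
caller sees its own credential again once the call returns.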


* Re: [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
  2024-08-29 14:28   ` [PATCH v14 " Jeff Layton
@ 2024-08-29 15:28     ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 15:28 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 10:28:18AM -0400, Jeff Layton wrote:
> On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > Currently, fh_verify() makes some daring assumptions about which
> > version of file handle the caller wants, based on the things it can
> > find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
> > case sometimes has no svc_rqst context, so this logic won't work in
> > that case.
> > 
> > > Instead, examine the passed-in file handle. Its .max_size field
> > should carry information to allow nfsd_set_fh_dentry() to initialize
> > the file handle appropriately.
> > 
> > lockd appears to be the only kernel consumer that does not set the
> > file handle .max_size during initialization.
> > 
> > write_filehandle() is the other question mark, as it looks possible
> > to specify a maxsize between NFS_FHSIZE and NFS3_FHSIZE here.
> > 
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfsd/lockd.c |  6 ++++--
> >  fs/nfsd/nfsfh.c | 11 +++++++----
> >  2 files changed, 11 insertions(+), 6 deletions(-)
> > 
> > diff --git a/fs/nfsd/lockd.c b/fs/nfsd/lockd.c
> > index 46a7f9b813e5..e636d2a1e664 100644
> > --- a/fs/nfsd/lockd.c
> > +++ b/fs/nfsd/lockd.c
> > @@ -32,8 +32,10 @@ nlm_fopen(struct svc_rqst *rqstp, struct nfs_fh *f, struct file **filp,
> >  	int		access;
> >  	struct svc_fh	fh;
> >  
> > -	/* must initialize before using! but maxsize doesn't matter */
> > -	fh_init(&fh,0);
> > +	if (rqstp->rq_vers == 4)
> > +		fh_init(&fh, NFS3_FHSIZE);
> > +	else
> > +		fh_init(&fh, NFS_FHSIZE);
> >  	fh.fh_handle.fh_size = f->size;
> >  	memcpy(&fh.fh_handle.fh_raw, f->data, f->size);
> >  	fh.fh_export = NULL;
> > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > index 4b964a71a504..77acc26e8b02 100644
> > --- a/fs/nfsd/nfsfh.c
> > +++ b/fs/nfsd/nfsfh.c
> > @@ -267,25 +267,28 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> >  	fhp->fh_dentry = dentry;
> >  	fhp->fh_export = exp;
> >  
> > -	switch (rqstp->rq_vers) {
> > -	case 4:
> > +	switch (fhp->fh_maxsize) {
> > +	case NFS4_FHSIZE:
> >  		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR)
> >  			fhp->fh_no_atomic_attr = true;
> >  		fhp->fh_64bit_cookies = true;
> >  		break;
> > -	case 3:
> > +	case NFS3_FHSIZE:
> >  		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC)
> >  			fhp->fh_no_wcc = true;
> >  		fhp->fh_64bit_cookies = true;
> >  		if (exp->ex_flags & NFSEXP_V4ROOT)
> >  			goto out;
> >  		break;
> > -	case 2:
> > +	case NFS_FHSIZE:
> >  		fhp->fh_no_wcc = true;
> >  		if (EX_WGATHER(exp))
> >  			fhp->fh_use_wgather = true;
> >  		if (exp->ex_flags & NFSEXP_V4ROOT)
> >  			goto out;
> > +		break;
> > +	case 0:
> > +		WARN_ONCE(1, "Uninitialized file handle");
> >  	}
> >  
> >  	return 0;
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>

Thanks for the review!  But please note that you reviewed the stale
patch I mistakenly sent out, I replied to this patch with:

[PATCH v14.5 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()

Thanks.


* Re: [PATCH v14 08/25] nfsd: factor out __fh_verify to allow NULL rqstp to be passed
  2024-08-29 14:39   ` Jeff Layton
@ 2024-08-29 15:35     ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 15:35 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 10:39:33AM -0400, Jeff Layton wrote:
> On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > From: NeilBrown <neilb@suse.de>
> > 
> > __fh_verify() offers an interface like fh_verify() but doesn't require
> > a struct svc_rqst *, instead it also takes the specific parts as
> > explicit required arguments.  So it is safe to call __fh_verify() with
> > a NULL rqstp, but the net, cred, and client args must not be NULL.
> > 
> > > __fh_verify() does not use SVC_NET(), nor do the functions it calls.
> > 
> > Rather than using rqstp->rq_client pass the client and gssclient
> > explicitly to __fh_verify and then to nfsd_set_fh_dentry().
> > 
> > Lastly, 4 associated tracepoints are only used if rqstp is not NULL
> > (this is a stop-gap that should be properly fixed so localio also
> > benefits from the utility these tracepoints provide when debugging
> > fh_verify issues).
> > 
> 
> nit: this last paragraph doesn't apply anymore with the inclusion of
> the previous patch

I thought that too, but then I considered it further and it is still
applicable, just that the previous patch is the one dealing with it.
I think it is still worthwhile to mention the lack of fh_verify tracing
for localio in this header.

> 
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  fs/nfsd/nfsfh.c | 90 +++++++++++++++++++++++++++++--------------------
> >  1 file changed, 53 insertions(+), 37 deletions(-)
> > 
> > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > index 77acc26e8b02..80c06e170e9a 100644
> > --- a/fs/nfsd/nfsfh.c
> > +++ b/fs/nfsd/nfsfh.c
> > @@ -142,7 +142,11 @@ static inline __be32 check_pseudo_root(struct dentry *dentry,
> >   * dentry.  On success, the results are used to set fh_export and
> >   * fh_dentry.
> >   */
> > -static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> > +static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
> > +				 struct svc_cred *cred,
> > +				 struct auth_domain *client,
> > +				 struct auth_domain *gssclient,
> > +				 struct svc_fh *fhp)
> >  {
> >  	struct knfsd_fh	*fh = &fhp->fh_handle;
> >  	struct fid *fid = NULL;
> > @@ -184,8 +188,8 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> >  	data_left -= len;
> >  	if (data_left < 0)
> >  		return error;
> > -	exp = rqst_exp_find(&rqstp->rq_chandle, SVC_NET(rqstp),
> > -			    rqstp->rq_client, rqstp->rq_gssclient,
> > +	exp = rqst_exp_find(rqstp ? &rqstp->rq_chandle : NULL,
> > +			    net, client, gssclient,
> >  			    fh->fh_fsid_type, fh->fh_fsid);
> >  	fid = (struct fid *)(fh->fh_fsid + len);
> >  
> > @@ -220,7 +224,7 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> >  		put_cred(override_creds(new));
> >  		put_cred(new);
> >  	} else {
> > -		error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
> > +		error = nfsd_setuser_and_check_port(rqstp, cred, exp);
> >  		if (error)
> >  			goto out;
> >  	}
> > @@ -297,43 +301,21 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
> >  	return error;
> >  }
> >  
> > -/**
> > - * fh_verify - filehandle lookup and access checking
> > - * @rqstp: pointer to current rpc request
> > - * @fhp: filehandle to be verified
> > - * @type: expected type of object pointed to by filehandle
> > - * @access: type of access needed to object
> > - *
> > - * Look up a dentry from the on-the-wire filehandle, check the client's
> > - * access to the export, and set the current task's credentials.
> > - *
> > - * Regardless of success or failure of fh_verify(), fh_put() should be
> > - * called on @fhp when the caller is finished with the filehandle.
> > - *
> > - * fh_verify() may be called multiple times on a given filehandle, for
> > - * example, when processing an NFSv4 compound.  The first call will look
> > - * up a dentry using the on-the-wire filehandle.  Subsequent calls will
> > - * skip the lookup and just perform the other checks and possibly change
> > - * the current task's credentials.
> > - *
> > - * @type specifies the type of object expected using one of the S_IF*
> > - * constants defined in include/linux/stat.h.  The caller may use zero
> > - * to indicate that it doesn't care, or a negative integer to indicate
> > - * that it expects something not of the given type.
> > - *
> > - * @access is formed from the NFSD_MAY_* constants defined in
> > - * fs/nfsd/vfs.h.
> > - */
> > -__be32
> > -fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> > +static __be32
> > +__fh_verify(struct svc_rqst *rqstp,
> > +	    struct net *net, struct svc_cred *cred,
> > +	    struct auth_domain *client,
> > +	    struct auth_domain *gssclient,
> > +	    struct svc_fh *fhp, umode_t type, int access)
> 
> I don't consider this a show-stopper, but it might be good to have a
> kerneldoc header on this, just because it has so many parameters.
> Having them clearly spelled out, and the rules around what must be set
> when rqstp is NULL would make it less likely we'll break those
> assumptions in the future.

Yeah, it does get backfilled in the next patch (which you just
reviewed so I'm just telling you something you know, this is just for
the benefit of others I guess).

The sequencing of the changes between this and the acquire_local patch
could be improved though.  So if I need to do a v15 (I hope not!) I'll
clean it up ;)

Thanks,
Mike


> 
> >  {
> > -	struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
> > +	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> >  	struct svc_export *exp = NULL;
> >  	struct dentry	*dentry;
> >  	__be32		error;
> >  
> >  	if (!fhp->fh_dentry) {
> > -		error = nfsd_set_fh_dentry(rqstp, fhp);
> > +		error = nfsd_set_fh_dentry(rqstp, net, cred, client,
> > +					   gssclient, fhp);
> >  		if (error)
> >  			goto out;
> >  	}
> > @@ -362,7 +344,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> >  	if (error)
> >  		goto out;
> >  
> > -	error = nfsd_setuser_and_check_port(rqstp, &rqstp->rq_cred, exp);
> > +	error = nfsd_setuser_and_check_port(rqstp, cred, exp);
> >  	if (error)
> >  		goto out;
> >  
> > @@ -392,7 +374,7 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> >  
> >  skip_pseudoflavor_check:
> >  	/* Finally, check access permissions. */
> > -	error = nfsd_permission(&rqstp->rq_cred, exp, dentry, access);
> > +	error = nfsd_permission(cred, exp, dentry, access);
> >  out:
> >  	trace_nfsd_fh_verify_err(rqstp, fhp, type, access, error);
> >  	if (error == nfserr_stale)
> > @@ -400,6 +382,40 @@ fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> >  	return error;
> >  }
> >  
> > +/**
> > + * fh_verify - filehandle lookup and access checking
> > + * @rqstp: pointer to current rpc request
> > + * @fhp: filehandle to be verified
> > + * @type: expected type of object pointed to by filehandle
> > + * @access: type of access needed to object
> > + *
> > + * Look up a dentry from the on-the-wire filehandle, check the client's
> > + * access to the export, and set the current task's credentials.
> > + *
> > + * Regardless of success or failure of fh_verify(), fh_put() should be
> > + * called on @fhp when the caller is finished with the filehandle.
> > + *
> > + * fh_verify() may be called multiple times on a given filehandle, for
> > + * example, when processing an NFSv4 compound.  The first call will look
> > + * up a dentry using the on-the-wire filehandle.  Subsequent calls will
> > + * skip the lookup and just perform the other checks and possibly change
> > + * the current task's credentials.
> > + *
> > + * @type specifies the type of object expected using one of the S_IF*
> > + * constants defined in include/linux/stat.h.  The caller may use zero
> > + * to indicate that it doesn't care, or a negative integer to indicate
> > + * that it expects something not of the given type.
> > + *
> > + * @access is formed from the NFSD_MAY_* constants defined in
> > + * fs/nfsd/vfs.h.
> > + */
> > +__be32
> > +fh_verify(struct svc_rqst *rqstp, struct svc_fh *fhp, umode_t type, int access)
> > +{
> > +	return __fh_verify(rqstp, SVC_NET(rqstp), &rqstp->rq_cred,
> > +			   rqstp->rq_client, rqstp->rq_gssclient,
> > +			   fhp, type, access);
> > +}
> >  
> >  /*
> >   * Compose a file handle for an NFS reply.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local()
  2024-08-29  1:04 ` [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local() Mike Snitzer
  2024-08-29 14:49   ` Jeff Layton
@ 2024-08-29 15:47   ` Chuck Lever
  2024-08-29 15:59     ` Mike Snitzer
  1 sibling, 1 reply; 75+ messages in thread
From: Chuck Lever @ 2024-08-29 15:47 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Wed, Aug 28, 2024 at 09:04:04PM -0400, Mike Snitzer wrote:
> From: NeilBrown <neilb@suse.de>
> 
> nfsd_file_acquire_local() can be used to look up a file by filehandle
> without having a struct svc_rqst.  This can be used by NFS LOCALIO to
> allow the NFS client to bypass the NFS protocol to directly access a
> file provided by the NFS server which is running in the same kernel.
> 
> In nfsd_file_do_acquire() care is taken to always use fh_verify() if
> rqstp is not NULL (as is the case for non-LOCALIO callers).  Otherwise
> the non-LOCALIO callers will not supply the correct and required
> arguments to __fh_verify (e.g. gssclient isn't passed).
> 
> Introduce fh_verify_local() wrapper around __fh_verify to make it
> clear that LOCALIO is the intended caller.
> 
> Also, use GC for nfsd_file returned by nfsd_file_acquire_local.  GC
> offers performance improvements if/when a file is reopened before
> launderette cleans it from the filecache's LRU.
> 
> Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC
> Signed-off-by: NeilBrown <neilb@suse.de>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>  fs/nfsd/filecache.c | 71 ++++++++++++++++++++++++++++++++++++++++-----
>  fs/nfsd/filecache.h |  3 ++
>  fs/nfsd/nfsfh.c     | 39 +++++++++++++++++++++++++
>  fs/nfsd/nfsfh.h     |  2 ++
>  4 files changed, 108 insertions(+), 7 deletions(-)
> 
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 9e9d246f993c..2dc72de31f61 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -982,12 +982,14 @@ nfsd_file_is_cached(struct inode *inode)
>  }
>  
>  static __be32
> -nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> +nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
> +		     struct svc_cred *cred,
> +		     struct auth_domain *client,
> +		     struct svc_fh *fhp,
>  		     unsigned int may_flags, struct file *file,
>  		     struct nfsd_file **pnf, bool want_gc)
>  {
>  	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
> -	struct net *net = SVC_NET(rqstp);
>  	struct nfsd_file *new, *nf;
>  	bool stale_retry = true;
>  	bool open_retry = true;
> @@ -996,8 +998,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  	int ret;
>  
>  retry:
> -	status = fh_verify(rqstp, fhp, S_IFREG,
> -				may_flags|NFSD_MAY_OWNER_OVERRIDE);
> +	if (rqstp) {
> +		status = fh_verify(rqstp, fhp, S_IFREG,
> +				   may_flags|NFSD_MAY_OWNER_OVERRIDE);
> +	} else {
> +		status = fh_verify_local(net, cred, client, fhp, S_IFREG,
> +					 may_flags|NFSD_MAY_OWNER_OVERRIDE);
> +	}
>  	if (status != nfs_ok)
>  		return status;
>  	inode = d_inode(fhp->fh_dentry);
> @@ -1143,7 +1150,8 @@ __be32
>  nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		     unsigned int may_flags, struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true);
> +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> +				    fhp, may_flags, NULL, pnf, true);
>  }
>  
>  /**
> @@ -1167,7 +1175,55 @@ __be32
>  nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		  unsigned int may_flags, struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false);
> +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> +				    fhp, may_flags, NULL, pnf, false);
> +}
> +
> +/**
> + * nfsd_file_acquire_local - Get a struct nfsd_file with an open file for localio
> + * @net: The network namespace in which to perform a lookup
> + * @cred: the user credential with which to validate access
> + * @client: the auth_domain for LOCALIO lookup
> + * @fhp: the NFS filehandle of the file to be opened
> + * @may_flags: NFSD_MAY_ settings for the file
> + * @pnf: OUT: new or found "struct nfsd_file" object
> + *
> + * This file lookup interface provide access to a file given the
> + * filehandle and credential.  No connection-based authorisation
> + * is performed and in that way it is quite different to other
> + * file access mediated by nfsd.  It allows a kernel module such as the NFS
> + * client to reach across network and filesystem namespaces to access
> + * a file.  The security implications of this should be carefully
> + * considered before use.
> + *
> + * The nfsd_file object returned by this API is reference-counted
> + * and garbage-collected. The object is retained for a few
> + * seconds after the final nfsd_file_put() in case the caller
> + * wants to re-use it.
> + *
> + * Return values:
> + *   %nfs_ok - @pnf points to an nfsd_file with its reference
> + *   count boosted.
> + *
> + * On error, an nfsstat value in network byte order is returned.
> + */
> +__be32
> +nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
> +			struct auth_domain *client, struct svc_fh *fhp,
> +			unsigned int may_flags, struct nfsd_file **pnf)
> +{
> +	/*
> +	 * Save creds before calling nfsd_file_do_acquire() (which calls
> +	 * nfsd_setuser). Important because caller (LOCALIO) is from
> +	 * client context.
> +	 */
> +	const struct cred *save_cred = get_current_cred();
> +	__be32 beres;
> +
> +	beres = nfsd_file_do_acquire(NULL, net, cred, client,
> +				     fhp, may_flags, NULL, pnf, true);
> +	revert_creds(save_cred);
> +	return beres;
>  }
>  
>  /**
> @@ -1193,7 +1249,8 @@ nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  			 unsigned int may_flags, struct file *file,
>  			 struct nfsd_file **pnf)
>  {
> -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false);
> +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> +				    fhp, may_flags, file, pnf, false);
>  }
>  
>  /*
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index 3fbec24eea6c..26ada78b8c1e 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -66,5 +66,8 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
>  		  unsigned int may_flags, struct file *file,
>  		  struct nfsd_file **nfp);
> +__be32 nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
> +			       struct auth_domain *client, struct svc_fh *fhp,
> +			       unsigned int may_flags, struct nfsd_file **pnf);
>  int nfsd_file_cache_stats_show(struct seq_file *m, void *v);
>  #endif /* _FS_NFSD_FILECACHE_H */
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 80c06e170e9a..49468e478d23 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -301,6 +301,22 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
>  	return error;
>  }
>  
> +/**
> + * __fh_verify - filehandle lookup and access checking
> + * @rqstp: RPC transaction context, or NULL
> + * @net: net namespace in which to perform the export lookup
> + * @cred: RPC user credential
> + * @client: RPC auth domain
> + * @gssclient: RPC GSS auth domain, or NULL
> + * @fhp: filehandle to be verified
> + * @type: expected type of object pointed to by filehandle
> + * @access: type of access needed to object
> + *
> + * This internal API can be used by callers who do not have an RPC
> + * transaction context (ie are not running in an nfsd thread).

This paragraph is incorrect, since fh_verify(), which has a non-NULL
@rqstp, also uses this internal API. Another review isn't needed,
but you should perhaps drop this paragraph before submitting the
final version.
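
To illustrate the point for other readers: the shape of the code is one
internal helper shared by two thin wrappers, so the helper cannot be
described as being only for callers without an RPC context. The
following is a minimal user-space sketch of that shape, not the nfsd
code. Every name in it (model_rqst, model_verify_common, model_verify,
model_verify_local) is hypothetical, and the "access check" is reduced
to a uid comparison purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct svc_rqst. */
struct model_rqst {
	int net_id;	/* models SVC_NET(rqstp) */
	int cred_uid;	/* models rqstp->rq_cred */
};

/* Internal helper: serves both wrappers, so @rq may or may not be NULL. */
static int model_verify_common(const struct model_rqst *rq, int net_id,
			       int cred_uid, int owner_uid)
{
	(void)rq;	/* would only be consulted for per-request details */
	(void)net_id;
	return cred_uid == owner_uid ? 0 : -1;
}

/* RPC path: every argument is derived from the (non-NULL) request. */
int model_verify(const struct model_rqst *rq, int owner_uid)
{
	assert(rq != NULL);
	return model_verify_common(rq, rq->net_id, rq->cred_uid, owner_uid);
}

/* Local path: no request exists, so the caller supplies the pieces. */
int model_verify_local(int net_id, int cred_uid, int owner_uid)
{
	return model_verify_common(NULL, net_id, cred_uid, owner_uid);
}
```

In this model only the local wrapper passes NULL for the request, which
is why the "no RPC transaction context" wording fits the wrapper's
documentation rather than the shared helper's.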


> + *
> + * See fh_verify() for further descriptions of @fhp, @type, and @access.
> + */
>  static __be32
>  __fh_verify(struct svc_rqst *rqstp,
>  	    struct net *net, struct svc_cred *cred,
> @@ -382,6 +398,29 @@ __fh_verify(struct svc_rqst *rqstp,
>  	return error;
>  }
>  
> +/**
> + * fh_verify_local - filehandle lookup and access checking
> + * @net: net namespace in which to perform the export lookup
> + * @cred: RPC user credential
> + * @client: RPC auth domain
> + * @fhp: filehandle to be verified
> + * @type: expected type of object pointed to by filehandle
> + * @access: type of access needed to object
> + *
> + * This API can be used by callers who do not have an RPC
> + * transaction context (ie are not running in an nfsd thread).
> + *
> + * See fh_verify() for further descriptions of @fhp, @type, and @access.
> + */
> +__be32
> +fh_verify_local(struct net *net, struct svc_cred *cred,
> +		struct auth_domain *client, struct svc_fh *fhp,
> +		umode_t type, int access)

Yeah: Unneeded @rqstp parameter is gone. Clean.


> +{
> +	return __fh_verify(NULL, net, cred, client, NULL,
> +			   fhp, type, access);
> +}
> +
>  /**
>   * fh_verify - filehandle lookup and access checking
>   * @rqstp: pointer to current rpc request
> diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> index 8d46e203d139..5b7394801dc4 100644
> --- a/fs/nfsd/nfsfh.h
> +++ b/fs/nfsd/nfsfh.h
> @@ -217,6 +217,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
>   * Function prototypes
>   */
>  __be32	fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
> +__be32	fh_verify_local(struct net *, struct svc_cred *, struct auth_domain *,
> +			struct svc_fh *, umode_t, int);
>  __be32	fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
>  __be32	fh_update(struct svc_fh *);
>  void	fh_put(struct svc_fh *);
> -- 
> 2.44.0
> 

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put
  2024-08-29  1:04 ` [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
@ 2024-08-29 15:49   ` Chuck Lever
  2024-08-29 15:57   ` Jeff Layton
  1 sibling, 0 replies; 75+ messages in thread
From: Chuck Lever @ 2024-08-29 15:49 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Wed, Aug 28, 2024 at 09:04:05PM -0400, Mike Snitzer wrote:
> Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
> to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
> caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.
> 
> A percpu_ref is used to implement the interlock between
> nfsd_destroy_serv and any caller of nfsd_serv_try_get.
> 
> This interlock is needed to properly wait for the completion of client
> initiated localio calls to nfsd (that are _not_ in the context of nfsd).
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfsd/netns.h  |  8 +++++++-
>  fs/nfsd/nfssvc.c | 39 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 238fc4e56e53..e2d953f21dde 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -13,6 +13,7 @@
>  #include <linux/filelock.h>
>  #include <linux/nfs4.h>
>  #include <linux/percpu_counter.h>
> +#include <linux/percpu-refcount.h>
>  #include <linux/siphash.h>
>  #include <linux/sunrpc/stats.h>
>  
> @@ -139,7 +140,9 @@ struct nfsd_net {
>  
>  	struct svc_info nfsd_info;
>  #define nfsd_serv nfsd_info.serv
> -
> +	struct percpu_ref nfsd_serv_ref;
> +	struct completion nfsd_serv_confirm_done;
> +	struct completion nfsd_serv_free_done;
>  
>  	/*
>  	 * clientid and stateid data for construction of net unique COPY
> @@ -221,6 +224,9 @@ struct nfsd_net {
>  extern bool nfsd_support_version(int vers);
>  extern unsigned int nfsd_net_id;
>  
> +bool nfsd_serv_try_get(struct nfsd_net *nn);
> +void nfsd_serv_put(struct nfsd_net *nn);
> +
>  void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
>  void nfsd_reset_write_verifier(struct nfsd_net *nn);
>  #endif /* __NFSD_NETNS_H__ */
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index defc430f912f..e43d440f9f0a 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -193,6 +193,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
>  	return 0;
>  }
>  
> +bool nfsd_serv_try_get(struct nfsd_net *nn)
> +{
> +	return percpu_ref_tryget_live(&nn->nfsd_serv_ref);
> +}
> +
> +void nfsd_serv_put(struct nfsd_net *nn)
> +{
> +	percpu_ref_put(&nn->nfsd_serv_ref);
> +}
> +
> +static void nfsd_serv_done(struct percpu_ref *ref)
> +{
> +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> +
> +	complete(&nn->nfsd_serv_confirm_done);
> +}
> +
> +static void nfsd_serv_free(struct percpu_ref *ref)
> +{
> +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> +
> +	complete(&nn->nfsd_serv_free_done);
> +}
> +
>  /*
>   * Maximum number of nfsd processes
>   */
> @@ -392,6 +416,7 @@ static void nfsd_shutdown_net(struct net *net)
>  		lockd_down(net);
>  		nn->lockd_up = false;
>  	}
> +	percpu_ref_exit(&nn->nfsd_serv_ref);
>  	nn->nfsd_net_up = false;
>  	nfsd_shutdown_generic();
>  }
> @@ -471,6 +496,13 @@ void nfsd_destroy_serv(struct net *net)
>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>  	struct svc_serv *serv = nn->nfsd_serv;
>  
> +	lockdep_assert_held(&nfsd_mutex);
> +
> +	percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
> +	wait_for_completion(&nn->nfsd_serv_confirm_done);
> +	wait_for_completion(&nn->nfsd_serv_free_done);
> +	/* percpu_ref_exit is called in nfsd_shutdown_net */
> +
>  	spin_lock(&nfsd_notifier_lock);
>  	nn->nfsd_serv = NULL;
>  	spin_unlock(&nfsd_notifier_lock);
> @@ -595,6 +627,13 @@ int nfsd_create_serv(struct net *net)
>  	if (nn->nfsd_serv)
>  		return 0;
>  
> +	error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
> +				0, GFP_KERNEL);
> +	if (error)
> +		return error;
> +	init_completion(&nn->nfsd_serv_free_done);
> +	init_completion(&nn->nfsd_serv_confirm_done);
> +
>  	if (nfsd_max_blksize == 0)
>  		nfsd_max_blksize = nfsd_get_default_max_blksize();
>  	nfsd_reset_versions(nn);
> -- 
> 2.44.0
> 

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  2024-08-29  1:04 ` [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
@ 2024-08-29 15:50   ` Chuck Lever
  2024-08-29 16:01   ` Jeff Layton
  1 sibling, 0 replies; 75+ messages in thread
From: Chuck Lever @ 2024-08-29 15:50 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Wed, Aug 28, 2024 at 09:04:07PM -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <dros@primarydata.com>
> 
> Add new function svcauth_map_clnt_to_svc_cred_local which maps a
> generic cred to a svc_cred suitable for use in nfsd.
> 
> This is needed by the localio code to map nfs client creds to nfs
> server credentials.
> 
> Following from net/sunrpc/auth_unix.c:unx_marshal() it is clear that
> ->fsuid and ->fsgid must be used (rather than ->uid and ->gid).  In
> addition, these uid and gid must be translated with from_kuid_munged()
> so local client uses correct uid and gid when acting as local server.
> 
> Suggested-by: NeilBrown <neilb@suse.de> # to approximate unx_marshal()
> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  include/linux/sunrpc/svcauth.h |  5 +++++
>  net/sunrpc/svcauth.c           | 28 ++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
> index 63cf6fb26dcc..2e111153f7cd 100644
> --- a/include/linux/sunrpc/svcauth.h
> +++ b/include/linux/sunrpc/svcauth.h
> @@ -14,6 +14,7 @@
>  #include <linux/sunrpc/msg_prot.h>
>  #include <linux/sunrpc/cache.h>
>  #include <linux/sunrpc/gss_api.h>
> +#include <linux/sunrpc/clnt.h>
>  #include <linux/hash.h>
>  #include <linux/stringhash.h>
>  #include <linux/cred.h>
> @@ -157,6 +158,10 @@ extern enum svc_auth_status svc_set_client(struct svc_rqst *rqstp);
>  extern int	svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops);
>  extern void	svc_auth_unregister(rpc_authflavor_t flavor);
>  
> +extern void	svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
> +						   const struct cred *,
> +						   struct svc_cred *);
> +
>  extern struct auth_domain *unix_domain_find(char *name);
>  extern void auth_domain_put(struct auth_domain *item);
>  extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
> diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c
> index 93d9e949e265..55b4d2874188 100644
> --- a/net/sunrpc/svcauth.c
> +++ b/net/sunrpc/svcauth.c
> @@ -18,6 +18,7 @@
>  #include <linux/sunrpc/svcauth.h>
>  #include <linux/err.h>
>  #include <linux/hash.h>
> +#include <linux/user_namespace.h>
>  
>  #include <trace/events/sunrpc.h>
>  
> @@ -175,6 +176,33 @@ rpc_authflavor_t svc_auth_flavor(struct svc_rqst *rqstp)
>  }
>  EXPORT_SYMBOL_GPL(svc_auth_flavor);
>  
> +/**
> + * svcauth_map_clnt_to_svc_cred_local - maps a generic cred
> + * to a svc_cred suitable for use in nfsd.
> + * @clnt: rpc_clnt associated with nfs client
> + * @cred: generic cred associated with nfs client
> + * @svc: returned svc_cred that is suitable for use in nfsd
> + */
> +void svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
> +					const struct cred *cred,
> +					struct svc_cred *svc)
> +{
> +	struct user_namespace *userns = clnt->cl_cred ?
> +		clnt->cl_cred->user_ns : &init_user_ns;
> +
> +	memset(svc, 0, sizeof(struct svc_cred));
> +
> +	svc->cr_uid = KUIDT_INIT(from_kuid_munged(userns, cred->fsuid));
> +	svc->cr_gid = KGIDT_INIT(from_kgid_munged(userns, cred->fsgid));
> +	svc->cr_flavor = clnt->cl_auth->au_flavor;
> +	if (cred->group_info)
> +		svc->cr_group_info = get_group_info(cred->group_info);
> +	/* These aren't relevant for local (network is bypassed) */
> +	svc->cr_principal = NULL;
> +	svc->cr_gss_mech = NULL;
> +}
> +EXPORT_SYMBOL_GPL(svcauth_map_clnt_to_svc_cred_local);
> +
>  /**************************************************
>   * 'auth_domains' are stored in a hash table indexed by name.
>   * When the last reference to an 'auth_domain' is dropped,
> -- 
> 2.44.0
> 

Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
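
For readers unfamiliar with the "munged" translation mentioned in the
commit message, here is a rough user-space model of the idea: a kernel
ID is translated into a namespace ID if a mapping exists, and otherwise
falls back to the overflow ID instead of failing. This is only a sketch
under simplifying assumptions: a single mapping extent, plain integers
instead of kuid_t, and the default overflow uid of 65534 (tunable via
sysctl on a real system). The names model_idmap and
model_from_kuid_munged are hypothetical.

```c
#include <stdint.h>

#define MODEL_OVERFLOW_UID 65534u	/* assumed default overflow uid */

/*
 * Hypothetical one-extent ID map: namespace IDs [first, first + count)
 * correspond to kernel IDs [lower_first, lower_first + count).
 */
struct model_idmap {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Translate a kernel ID into the namespace, never reporting failure. */
uint32_t model_from_kuid_munged(const struct model_idmap *map, uint32_t kuid)
{
	if (kuid >= map->lower_first && kuid - map->lower_first < map->count)
		return map->first + (kuid - map->lower_first);
	return MODEL_OVERFLOW_UID;	/* unmapped: fall back, do not fail */
}
```

With a map of { first = 0, lower_first = 100000, count = 65536 }, kernel
ID 101000 translates to 1000, while an ID outside the extent comes back
as the overflow value.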

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put
  2024-08-29  1:04 ` [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
  2024-08-29 15:49   ` Chuck Lever
@ 2024-08-29 15:57   ` Jeff Layton
  2024-08-29 16:01     ` Mike Snitzer
  1 sibling, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 15:57 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
> to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
> caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.
> 
> A percpu_ref is used to implement the interlock between
> nfsd_destroy_serv and any caller of nfsd_serv_try_get.
> 
> This interlock is needed to properly wait for the completion of client
> initiated localio calls to nfsd (that are _not_ in the context of nfsd).
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfsd/netns.h  |  8 +++++++-
>  fs/nfsd/nfssvc.c | 39 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 46 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 238fc4e56e53..e2d953f21dde 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -13,6 +13,7 @@
>  #include <linux/filelock.h>
>  #include <linux/nfs4.h>
>  #include <linux/percpu_counter.h>
> +#include <linux/percpu-refcount.h>
>  #include <linux/siphash.h>
>  #include <linux/sunrpc/stats.h>
>  
> @@ -139,7 +140,9 @@ struct nfsd_net {
>  
>  	struct svc_info nfsd_info;
>  #define nfsd_serv nfsd_info.serv
> -
> +	struct percpu_ref nfsd_serv_ref;
> +	struct completion nfsd_serv_confirm_done;
> +	struct completion nfsd_serv_free_done;
>  
>  	/*
>  	 * clientid and stateid data for construction of net unique COPY
> @@ -221,6 +224,9 @@ struct nfsd_net {
>  extern bool nfsd_support_version(int vers);
>  extern unsigned int nfsd_net_id;
>  
> +bool nfsd_serv_try_get(struct nfsd_net *nn);
> +void nfsd_serv_put(struct nfsd_net *nn);
> +
>  void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
>  void nfsd_reset_write_verifier(struct nfsd_net *nn);
>  #endif /* __NFSD_NETNS_H__ */
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index defc430f912f..e43d440f9f0a 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -193,6 +193,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
>  	return 0;
>  }
>  
> +bool nfsd_serv_try_get(struct nfsd_net *nn)
> +{
> +	return percpu_ref_tryget_live(&nn->nfsd_serv_ref);
> +}
> +
> +void nfsd_serv_put(struct nfsd_net *nn)
> +{
> +	percpu_ref_put(&nn->nfsd_serv_ref);
> +}
> +
> +static void nfsd_serv_done(struct percpu_ref *ref)
> +{
> +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> +
> +	complete(&nn->nfsd_serv_confirm_done);
> +}
> +
> +static void nfsd_serv_free(struct percpu_ref *ref)
> +{
> +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> +
> +	complete(&nn->nfsd_serv_free_done);
> +}
> +
>  /*
>   * Maximum number of nfsd processes
>   */
> @@ -392,6 +416,7 @@ static void nfsd_shutdown_net(struct net *net)
>  		lockd_down(net);
>  		nn->lockd_up = false;
>  	}
> +	percpu_ref_exit(&nn->nfsd_serv_ref);
>  	nn->nfsd_net_up = false;
>  	nfsd_shutdown_generic();
>  }
> @@ -471,6 +496,13 @@ void nfsd_destroy_serv(struct net *net)
>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>  	struct svc_serv *serv = nn->nfsd_serv;
>  
> +	lockdep_assert_held(&nfsd_mutex);
> +
> +	percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
> +	wait_for_completion(&nn->nfsd_serv_confirm_done);
> +	wait_for_completion(&nn->nfsd_serv_free_done);
> +	/* percpu_ref_exit is called in nfsd_shutdown_net */
> +
>  	spin_lock(&nfsd_notifier_lock);
>  	nn->nfsd_serv = NULL;
>  	spin_unlock(&nfsd_notifier_lock);
> @@ -595,6 +627,13 @@ int nfsd_create_serv(struct net *net)
>  	if (nn->nfsd_serv)
>  		return 0;
>  
> +	error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
> +				0, GFP_KERNEL);
> +	if (error)
> +		return error;
> +	init_completion(&nn->nfsd_serv_free_done);
> +	init_completion(&nn->nfsd_serv_confirm_done);
> +
>  	if (nfsd_max_blksize == 0)
>  		nfsd_max_blksize = nfsd_get_default_max_blksize();
>  	nfsd_reset_versions(nn);

A little hard to review this one at this point in the series, as there
are no callers of get/put yet, but the concept seems reasonable.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
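
Since no callers of get/put exist at this point in the series, a small
model of the intended interlock may help other readers. What follows is
a single-threaded user-space sketch, not kernel code: it assumes only
the documented percpu_ref behaviour that killing the ref makes
tryget_live fail and drops the initial reference, and that the release
callback runs when the last reference is put. All names (model_ref,
model_try_get, model_put, model_kill) are hypothetical, and the
completions that nfsd_destroy_serv waits on are reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the try_get/put interlock. */
struct model_ref {
	long count;	/* starts at 1: the initial reference */
	bool dying;	/* set once destroy has killed the ref */
	bool released;	/* models the release callback having fired */
};

void model_ref_init(struct model_ref *r)
{
	r->count = 1;
	r->dying = false;
	r->released = false;
}

/* Models nfsd_serv_try_get(): fails once shutdown has begun. */
bool model_try_get(struct model_ref *r)
{
	if (r->dying)
		return false;
	r->count++;
	return true;
}

/* Models nfsd_serv_put(): the final put fires the release callback. */
void model_put(struct model_ref *r)
{
	assert(r->count > 0);
	if (--r->count == 0)
		r->released = true;
}

/* Start of destroy: refuse new users and drop the initial reference. */
void model_kill(struct model_ref *r)
{
	r->dying = true;
	model_put(r);
}
```

A caller that took a reference before model_kill() keeps "released"
false until its own model_put(), which corresponds to destroy waiting
for in-flight localio users before tearing down the server.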

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 11/25] SUNRPC: remove call_allocate() BUG_ONs
  2024-08-29  1:04 ` [PATCH v14 11/25] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
@ 2024-08-29 15:58   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 15:58 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> Remove BUG_ON if p_arglen=0 to allow RPC with void arg.
> Remove BUG_ON if p_replen=0 to allow RPC with void return.
> 
> The former was needed for the first revision of the LOCALIO protocol
> which had an RPC that took a void arg:
> 
>     /* raw RFC 9562 UUID */
>     typedef u8 uuid_t<UUID_SIZE>;
> 
>     program NFS_LOCALIO_PROGRAM {
>         version LOCALIO_V1 {
>             void
>                 NULL(void) = 0;
> 
>             uuid_t
>                 GETUUID(void) = 1;
>         } = 1;
>     } = 400122;
> 
> The latter is needed for the final revision of the LOCALIO protocol
> which has a UUID_IS_LOCAL RPC which returns a void:
> 
>     /* raw RFC 9562 UUID */
>     typedef u8 uuid_t<UUID_SIZE>;
> 
>     program NFS_LOCALIO_PROGRAM {
>         version LOCALIO_V1 {
>             void
>                 NULL(void) = 0;
> 
>             void
>                 UUID_IS_LOCAL(uuid_t) = 1;
>         } = 1;
>     } = 400122;
> 
> There is really no value in triggering a BUG_ON in response to either
> of these previously unsupported conditions.
> 
> NeilBrown would like the entire 'if (proc->p_proc != 0)' branch
> removed (not just the one BUG_ON that must be removed for LOCALIO's
> immediate needs of returning void).
> 
> Reviewed-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  net/sunrpc/clnt.c | 6 ------
>  1 file changed, 6 deletions(-)
> 
> diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
> index 09f29a95f2bc..00fe6df11ab7 100644
> --- a/net/sunrpc/clnt.c
> +++ b/net/sunrpc/clnt.c
> @@ -1893,12 +1893,6 @@ call_allocate(struct rpc_task *task)
>  	if (req->rq_buffer)
>  		return;
>  
> -	if (proc->p_proc != 0) {
> -		BUG_ON(proc->p_arglen == 0);
> -		if (proc->p_decode != NULL)
> -			BUG_ON(proc->p_replen == 0);
> -	}
> -
>  	/*
>  	 * Calculate the size (in quads) of the RPC call
>  	 * and reply headers, and convert both values

Yay! More unneeded BUG_ONs gone.

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local()
  2024-08-29 15:47   ` Chuck Lever
@ 2024-08-29 15:59     ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 15:59 UTC (permalink / raw)
  To: Chuck Lever
  Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 11:47:22AM -0400, Chuck Lever wrote:
> On Wed, Aug 28, 2024 at 09:04:04PM -0400, Mike Snitzer wrote:
> > From: NeilBrown <neilb@suse.de>
> > 
> > nfsd_file_acquire_local() can be used to look up a file by filehandle
> > without having a struct svc_rqst.  This can be used by NFS LOCALIO to
> > allow the NFS client to bypass the NFS protocol to directly access a
> > file provided by the NFS server which is running in the same kernel.
> > 
> > In nfsd_file_do_acquire() care is taken to always use fh_verify() if
> > rqstp is not NULL (as is the case for non-LOCALIO callers).  Otherwise
> > the non-LOCALIO callers will not supply the correct and required
> > arguments to __fh_verify (e.g. gssclient isn't passed).
> > 
> > Introduce fh_verify_local() wrapper around __fh_verify to make it
> > clear that LOCALIO is the intended caller.
> > 
> > Also, use GC for nfsd_file returned by nfsd_file_acquire_local.  GC
> > offers performance improvements if/when a file is reopened before
> > launderette cleans it from the filecache's LRU.
> > 
> > Suggested-by: Jeff Layton <jlayton@kernel.org> # use filecache's GC
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >  fs/nfsd/filecache.c | 71 ++++++++++++++++++++++++++++++++++++++++-----
> >  fs/nfsd/filecache.h |  3 ++
> >  fs/nfsd/nfsfh.c     | 39 +++++++++++++++++++++++++
> >  fs/nfsd/nfsfh.h     |  2 ++
> >  4 files changed, 108 insertions(+), 7 deletions(-)
> > 
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 9e9d246f993c..2dc72de31f61 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -982,12 +982,14 @@ nfsd_file_is_cached(struct inode *inode)
> >  }
> >  
> >  static __be32
> > -nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> > +nfsd_file_do_acquire(struct svc_rqst *rqstp, struct net *net,
> > +		     struct svc_cred *cred,
> > +		     struct auth_domain *client,
> > +		     struct svc_fh *fhp,
> >  		     unsigned int may_flags, struct file *file,
> >  		     struct nfsd_file **pnf, bool want_gc)
> >  {
> >  	unsigned char need = may_flags & NFSD_FILE_MAY_MASK;
> > -	struct net *net = SVC_NET(rqstp);
> >  	struct nfsd_file *new, *nf;
> >  	bool stale_retry = true;
> >  	bool open_retry = true;
> > @@ -996,8 +998,13 @@ nfsd_file_do_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  	int ret;
> >  
> >  retry:
> > -	status = fh_verify(rqstp, fhp, S_IFREG,
> > -				may_flags|NFSD_MAY_OWNER_OVERRIDE);
> > +	if (rqstp) {
> > +		status = fh_verify(rqstp, fhp, S_IFREG,
> > +				   may_flags|NFSD_MAY_OWNER_OVERRIDE);
> > +	} else {
> > +		status = fh_verify_local(net, cred, client, fhp, S_IFREG,
> > +					 may_flags|NFSD_MAY_OWNER_OVERRIDE);
> > +	}
> >  	if (status != nfs_ok)
> >  		return status;
> >  	inode = d_inode(fhp->fh_dentry);
> > @@ -1143,7 +1150,8 @@ __be32
> >  nfsd_file_acquire_gc(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		     unsigned int may_flags, struct nfsd_file **pnf)
> >  {
> > -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, true);
> > +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> > +				    fhp, may_flags, NULL, pnf, true);
> >  }
> >  
> >  /**
> > @@ -1167,7 +1175,55 @@ __be32
> >  nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		  unsigned int may_flags, struct nfsd_file **pnf)
> >  {
> > -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, NULL, pnf, false);
> > +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> > +				    fhp, may_flags, NULL, pnf, false);
> > +}
> > +
> > +/**
> > + * nfsd_file_acquire_local - Get a struct nfsd_file with an open file for localio
> > + * @net: The network namespace in which to perform a lookup
> > + * @cred: the user credential with which to validate access
> > + * @client: the auth_domain for LOCALIO lookup
> > + * @fhp: the NFS filehandle of the file to be opened
> > + * @may_flags: NFSD_MAY_ settings for the file
> > + * @pnf: OUT: new or found "struct nfsd_file" object
> > + *
> > + * This file lookup interface provides access to a file given the
> > + * filehandle and credential.  No connection-based authorisation
> > + * is performed and in that way it is quite different to other
> > + * file access mediated by nfsd.  It allows a kernel module such as the NFS
> > + * client to reach across network and filesystem namespaces to access
> > + * a file.  The security implications of this should be carefully
> > + * considered before use.
> > + *
> > + * The nfsd_file object returned by this API is reference-counted
> > + * and garbage-collected. The object is retained for a few
> > + * seconds after the final nfsd_file_put() in case the caller
> > + * wants to re-use it.
> > + *
> > + * Return values:
> > + *   %nfs_ok - @pnf points to an nfsd_file with its reference
> > + *   count boosted.
> > + *
> > + * On error, an nfsstat value in network byte order is returned.
> > + */
> > +__be32
> > +nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
> > +			struct auth_domain *client, struct svc_fh *fhp,
> > +			unsigned int may_flags, struct nfsd_file **pnf)
> > +{
> > +	/*
> > +	 * Save creds before calling nfsd_file_do_acquire() (which calls
> > +	 * nfsd_setuser). Important because caller (LOCALIO) is from
> > +	 * client context.
> > +	 */
> > +	const struct cred *save_cred = get_current_cred();
> > +	__be32 beres;
> > +
> > +	beres = nfsd_file_do_acquire(NULL, net, cred, client,
> > +				     fhp, may_flags, NULL, pnf, true);
> > +	revert_creds(save_cred);
> > +	return beres;
> >  }
> >  
> >  /**
> > @@ -1193,7 +1249,8 @@ nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  			 unsigned int may_flags, struct file *file,
> >  			 struct nfsd_file **pnf)
> >  {
> > -	return nfsd_file_do_acquire(rqstp, fhp, may_flags, file, pnf, false);
> > +	return nfsd_file_do_acquire(rqstp, SVC_NET(rqstp), NULL, NULL,
> > +				    fhp, may_flags, file, pnf, false);
> >  }
> >  
> >  /*
> > diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> > index 3fbec24eea6c..26ada78b8c1e 100644
> > --- a/fs/nfsd/filecache.h
> > +++ b/fs/nfsd/filecache.h
> > @@ -66,5 +66,8 @@ __be32 nfsd_file_acquire(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  __be32 nfsd_file_acquire_opened(struct svc_rqst *rqstp, struct svc_fh *fhp,
> >  		  unsigned int may_flags, struct file *file,
> >  		  struct nfsd_file **nfp);
> > +__be32 nfsd_file_acquire_local(struct net *net, struct svc_cred *cred,
> > +			       struct auth_domain *client, struct svc_fh *fhp,
> > +			       unsigned int may_flags, struct nfsd_file **pnf);
> >  int nfsd_file_cache_stats_show(struct seq_file *m, void *v);
> >  #endif /* _FS_NFSD_FILECACHE_H */
> > diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> > index 80c06e170e9a..49468e478d23 100644
> > --- a/fs/nfsd/nfsfh.c
> > +++ b/fs/nfsd/nfsfh.c
> > @@ -301,6 +301,22 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct net *net,
> >  	return error;
> >  }
> >  
> > +/**
> > + * __fh_verify - filehandle lookup and access checking
> > + * @rqstp: RPC transaction context, or NULL
> > + * @net: net namespace in which to perform the export lookup
> > + * @cred: RPC user credential
> > + * @client: RPC auth domain
> > + * @gssclient: RPC GSS auth domain, or NULL
> > + * @fhp: filehandle to be verified
> > + * @type: expected type of object pointed to by filehandle
> > + * @access: type of access needed to object
> > + *
> > + * This internal API can be used by callers who do not have an RPC
> > + * transaction context (ie are not running in an nfsd thread).
> 
> This paragraph is incorrect, since fh_verify(), which has a non-NULL
> @rqstp, also uses this internal API. Another review isn't needed,
> but you should perhaps drop this paragraph before submitting the
> final version.

OK, I reviewed this and thought the "can be" implied it was optional, so
it is still applicable to the localio use case.  But yeah, I will drop it.

> 
> 
> > + *
> > + * See fh_verify() for further descriptions of @fhp, @type, and @access.
> > + */
> >  static __be32
> >  __fh_verify(struct svc_rqst *rqstp,
> >  	    struct net *net, struct svc_cred *cred,
> > @@ -382,6 +398,29 @@ __fh_verify(struct svc_rqst *rqstp,
> >  	return error;
> >  }
> >  
> > +/**
> > + * fh_verify_local - filehandle lookup and access checking
> > + * @net: net namespace in which to perform the export lookup
> > + * @cred: RPC user credential
> > + * @client: RPC auth domain
> > + * @fhp: filehandle to be verified
> > + * @type: expected type of object pointed to by filehandle
> > + * @access: type of access needed to object
> > + *
> > + * This API can be used by callers who do not have an RPC
> > + * transaction context (ie are not running in an nfsd thread).
> > + *
> > + * See fh_verify() for further descriptions of @fhp, @type, and @access.
> > + */
> > +__be32
> > +fh_verify_local(struct net *net, struct svc_cred *cred,
> > +		struct auth_domain *client, struct svc_fh *fhp,
> > +		umode_t type, int access)
> 
> Yeah: Unneeded @rqstp parameter is gone. Clean.
> 

Yes

> > +{
> > +	return __fh_verify(NULL, net, cred, client, NULL,
> > +			   fhp, type, access);
> > +}
> > +
> >  /**
> >   * fh_verify - filehandle lookup and access checking
> >   * @rqstp: pointer to current rpc request
> > diff --git a/fs/nfsd/nfsfh.h b/fs/nfsd/nfsfh.h
> > index 8d46e203d139..5b7394801dc4 100644
> > --- a/fs/nfsd/nfsfh.h
> > +++ b/fs/nfsd/nfsfh.h
> > @@ -217,6 +217,8 @@ extern char * SVCFH_fmt(struct svc_fh *fhp);
> >   * Function prototypes
> >   */
> >  __be32	fh_verify(struct svc_rqst *, struct svc_fh *, umode_t, int);
> > +__be32	fh_verify_local(struct net *, struct svc_cred *, struct auth_domain *,
> > +			struct svc_fh *, umode_t, int);
> >  __be32	fh_compose(struct svc_fh *, struct svc_export *, struct dentry *, struct svc_fh *);
> >  __be32	fh_update(struct svc_fh *);
> >  void	fh_put(struct svc_fh *);
> > -- 
> > 2.44.0
> > 
> 
> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
> 
> -- 
> Chuck Lever

Thanks
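
To make the wrapper arrangement reviewed above easier to picture, here is a
minimal userspace sketch of the same shape: one internal helper that takes an
optional request context plus explicit parameters, and two thin public
wrappers. Every name in it (rpc_ctx, verify_common, verify_rpc, verify_local,
export_owner, traced_calls) is hypothetical and exists only for illustration;
it is not the nfsd code, and the "per-request work" is just an assumed
stand-in for whatever needs a live request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an RPC request context (not struct svc_rqst). */
struct rpc_ctx {
	int net;	/* index of the namespace the request arrived in */
	int uid;	/* caller identity carried by the request */
};

enum { VERIFY_OK = 0, VERIFY_DENIED = 1 };

/* Toy per-namespace "export table": the only uid allowed in each namespace. */
static const int export_owner[2] = { 1000, 2000 };

/* Counts calls that had a request context (stand-in for per-request work). */
static int traced_calls;

/* Internal helper: rq may be NULL, everything it needs is passed explicitly. */
static int verify_common(const struct rpc_ctx *rq, int net, int uid)
{
	assert(net >= 0 && net < 2);
	if (rq)
		traced_calls++;	/* only possible when a request exists */
	return uid == export_owner[net] ? VERIFY_OK : VERIFY_DENIED;
}

/* RPC path: derive the explicit parameters from the request. */
static int verify_rpc(const struct rpc_ctx *rq)
{
	return verify_common(rq, rq->net, rq->uid);
}

/* Local path: no request exists, so the caller supplies the parameters. */
static int verify_local(int net, int uid)
{
	return verify_common(NULL, net, uid);
}
```

The point of the shape is that the local entry point never has to fabricate a
request object; it only passes NULL and the values the helper actually uses.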


* Re: [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put
  2024-08-29 15:57   ` Jeff Layton
@ 2024-08-29 16:01     ` Mike Snitzer
  2024-08-29 16:04       ` Chuck Lever
  0 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 16:01 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 11:57:20AM -0400, Jeff Layton wrote:
> On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
> > to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
> > caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.
> > 
> > A percpu_ref is used to implement the interlock between
> > nfsd_destroy_serv and any caller of nfsd_serv_try_get.
> > 
> > This interlock is needed to properly wait for the completion of client
> > initiated localio calls to nfsd (that are _not_ in the context of nfsd).
> > 
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfsd/netns.h  |  8 +++++++-
> >  fs/nfsd/nfssvc.c | 39 +++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 46 insertions(+), 1 deletion(-)
> > 
> > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > index 238fc4e56e53..e2d953f21dde 100644
> > --- a/fs/nfsd/netns.h
> > +++ b/fs/nfsd/netns.h
> > @@ -13,6 +13,7 @@
> >  #include <linux/filelock.h>
> >  #include <linux/nfs4.h>
> >  #include <linux/percpu_counter.h>
> > +#include <linux/percpu-refcount.h>
> >  #include <linux/siphash.h>
> >  #include <linux/sunrpc/stats.h>
> >  
> > @@ -139,7 +140,9 @@ struct nfsd_net {
> >  
> >  	struct svc_info nfsd_info;
> >  #define nfsd_serv nfsd_info.serv
> > -
> > +	struct percpu_ref nfsd_serv_ref;
> > +	struct completion nfsd_serv_confirm_done;
> > +	struct completion nfsd_serv_free_done;
> >  
> >  	/*
> >  	 * clientid and stateid data for construction of net unique COPY
> > @@ -221,6 +224,9 @@ struct nfsd_net {
> >  extern bool nfsd_support_version(int vers);
> >  extern unsigned int nfsd_net_id;
> >  
> > +bool nfsd_serv_try_get(struct nfsd_net *nn);
> > +void nfsd_serv_put(struct nfsd_net *nn);
> > +
> >  void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
> >  void nfsd_reset_write_verifier(struct nfsd_net *nn);
> >  #endif /* __NFSD_NETNS_H__ */
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index defc430f912f..e43d440f9f0a 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -193,6 +193,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
> >  	return 0;
> >  }
> >  
> > +bool nfsd_serv_try_get(struct nfsd_net *nn)
> > +{
> > +	return percpu_ref_tryget_live(&nn->nfsd_serv_ref);
> > +}
> > +
> > +void nfsd_serv_put(struct nfsd_net *nn)
> > +{
> > +	percpu_ref_put(&nn->nfsd_serv_ref);
> > +}
> > +
> > +static void nfsd_serv_done(struct percpu_ref *ref)
> > +{
> > +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> > +
> > +	complete(&nn->nfsd_serv_confirm_done);
> > +}
> > +
> > +static void nfsd_serv_free(struct percpu_ref *ref)
> > +{
> > +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> > +
> > +	complete(&nn->nfsd_serv_free_done);
> > +}
> > +
> >  /*
> >   * Maximum number of nfsd processes
> >   */
> > @@ -392,6 +416,7 @@ static void nfsd_shutdown_net(struct net *net)
> >  		lockd_down(net);
> >  		nn->lockd_up = false;
> >  	}
> > +	percpu_ref_exit(&nn->nfsd_serv_ref);
> >  	nn->nfsd_net_up = false;
> >  	nfsd_shutdown_generic();
> >  }
> > @@ -471,6 +496,13 @@ void nfsd_destroy_serv(struct net *net)
> >  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> >  	struct svc_serv *serv = nn->nfsd_serv;
> >  
> > +	lockdep_assert_held(&nfsd_mutex);
> > +
> > +	percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
> > +	wait_for_completion(&nn->nfsd_serv_confirm_done);
> > +	wait_for_completion(&nn->nfsd_serv_free_done);
> > +	/* percpu_ref_exit is called in nfsd_shutdown_net */
> > +
> >  	spin_lock(&nfsd_notifier_lock);
> >  	nn->nfsd_serv = NULL;
> >  	spin_unlock(&nfsd_notifier_lock);
> > @@ -595,6 +627,13 @@ int nfsd_create_serv(struct net *net)
> >  	if (nn->nfsd_serv)
> >  		return 0;
> >  
> > +	error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
> > +				0, GFP_KERNEL);
> > +	if (error)
> > +		return error;
> > +	init_completion(&nn->nfsd_serv_free_done);
> > +	init_completion(&nn->nfsd_serv_confirm_done);
> > +
> >  	if (nfsd_max_blksize == 0)
> >  		nfsd_max_blksize = nfsd_get_default_max_blksize();
> >  	nfsd_reset_versions(nn);
> 
> A little hard to review this one at this point in the series, as there
> are no callers of get/put yet, but the concept seems reasonable.
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>

Thanks, yeah Chuck asked that I factor this interlock interface out to
a separate patch because it was a bit much buried in the next patch
that actually consumes it.
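
For readers who want a feel for the interlock before its callers appear, the
following is a minimal userspace model, not the kernel implementation. It
assumes a single atomic word in place of percpu_ref, a plain flag in place of
the completion that nfsd_destroy_serv() waits on, and hypothetical names
(serv_ref, serv_ref_tryget_live, serv_ref_put, serv_ref_kill).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Top bit marks the reference as killed; low bits hold the count. */
#define REF_DEAD (UINT64_C(1) << 63)

/* Hypothetical userspace stand-in for the percpu_ref in struct nfsd_net. */
struct serv_ref {
	_Atomic uint64_t state;
	bool released;	/* stand-in for completing nfsd_serv_free_done */
};

static void serv_ref_init(struct serv_ref *r)
{
	atomic_init(&r->state, 1);	/* initial reference owned by the server */
	r->released = false;
}

/* Succeeds only while the reference has not been killed. */
static bool serv_ref_tryget_live(struct serv_ref *r)
{
	uint64_t v = atomic_load(&r->state);

	do {
		if (v & REF_DEAD)
			return false;
	} while (!atomic_compare_exchange_weak(&r->state, &v, v + 1));
	return true;
}

/* Drops one reference; the last put after a kill triggers the release. */
static void serv_ref_put(struct serv_ref *r)
{
	uint64_t prev = atomic_fetch_sub(&r->state, 1);

	assert((prev & ~REF_DEAD) != 0);	/* never drop below zero */
	if (prev == (REF_DEAD | 1))
		r->released = true;
}

/* Shutdown: refuse new gets, then drop the initial reference. */
static void serv_ref_kill(struct serv_ref *r)
{
	atomic_fetch_or(&r->state, REF_DEAD);
	serv_ref_put(r);
}
```

In this model a caller that took a reference before the kill keeps the object
alive until its own put, which is the property the patch description asks of
nfsd_serv_try_get() and nfsd_serv_put().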


* Re: [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local
  2024-08-29  1:04 ` [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
  2024-08-29 15:50   ` Chuck Lever
@ 2024-08-29 16:01   ` Jeff Layton
  1 sibling, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 16:01 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <dros@primarydata.com>
> 
> Add new function svcauth_map_clnt_to_svc_cred_local which maps a
> generic cred to a svc_cred suitable for use in nfsd.
> 
> This is needed by the localio code to map nfs client creds to nfs
> server credentials.
> 
> Following from net/sunrpc/auth_unix.c:unx_marshal() it is clear that
> ->fsuid and ->fsgid must be used (rather than ->uid and ->gid).  In
> addition, these uid and gid must be translated with from_kuid_munged()
> so local client uses correct uid and gid when acting as local server.
> 
> Suggested-by: NeilBrown <neilb@suse.de> # to approximate unx_marshal()
> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  include/linux/sunrpc/svcauth.h |  5 +++++
>  net/sunrpc/svcauth.c           | 28 ++++++++++++++++++++++++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h
> index 63cf6fb26dcc..2e111153f7cd 100644
> --- a/include/linux/sunrpc/svcauth.h
> +++ b/include/linux/sunrpc/svcauth.h
> @@ -14,6 +14,7 @@
>  #include <linux/sunrpc/msg_prot.h>
>  #include <linux/sunrpc/cache.h>
>  #include <linux/sunrpc/gss_api.h>
> +#include <linux/sunrpc/clnt.h>
>  #include <linux/hash.h>
>  #include <linux/stringhash.h>
>  #include <linux/cred.h>
> @@ -157,6 +158,10 @@ extern enum svc_auth_status svc_set_client(struct svc_rqst *rqstp);
>  extern int	svc_auth_register(rpc_authflavor_t flavor, struct auth_ops *aops);
>  extern void	svc_auth_unregister(rpc_authflavor_t flavor);
>  
> +extern void	svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
> +						   const struct cred *,
> +						   struct svc_cred *);
> +
>  extern struct auth_domain *unix_domain_find(char *name);
>  extern void auth_domain_put(struct auth_domain *item);
>  extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *new);
> diff --git a/net/sunrpc/svcauth.c b/net/sunrpc/svcauth.c
> index 93d9e949e265..55b4d2874188 100644
> --- a/net/sunrpc/svcauth.c
> +++ b/net/sunrpc/svcauth.c
> @@ -18,6 +18,7 @@
>  #include <linux/sunrpc/svcauth.h>
>  #include <linux/err.h>
>  #include <linux/hash.h>
> +#include <linux/user_namespace.h>
>  
>  #include <trace/events/sunrpc.h>
>  
> @@ -175,6 +176,33 @@ rpc_authflavor_t svc_auth_flavor(struct svc_rqst *rqstp)
>  }
>  EXPORT_SYMBOL_GPL(svc_auth_flavor);
>  
> +/**
> + * svcauth_map_clnt_to_svc_cred_local - maps a generic cred
> + * to a svc_cred suitable for use in nfsd.
> + * @clnt: rpc_clnt associated with nfs client
> + * @cred: generic cred associated with nfs client
> + * @svc: returned svc_cred that is suitable for use in nfsd
> + */
> +void svcauth_map_clnt_to_svc_cred_local(struct rpc_clnt *clnt,
> +					const struct cred *cred,
> +					struct svc_cred *svc)
> +{
> +	struct user_namespace *userns = clnt->cl_cred ?
> +		clnt->cl_cred->user_ns : &init_user_ns;
> +
> +	memset(svc, 0, sizeof(struct svc_cred));
> +
> +	svc->cr_uid = KUIDT_INIT(from_kuid_munged(userns, cred->fsuid));
> +	svc->cr_gid = KGIDT_INIT(from_kgid_munged(userns, cred->fsgid));
> +	svc->cr_flavor = clnt->cl_auth->au_flavor;
> +	if (cred->group_info)
> +		svc->cr_group_info = get_group_info(cred->group_info);
> +	/* These aren't relevant for local (network is bypassed) */
> +	svc->cr_principal = NULL;
> +	svc->cr_gss_mech = NULL;
> +}
> +EXPORT_SYMBOL_GPL(svcauth_map_clnt_to_svc_cred_local);
> +
>  /**************************************************
>   * 'auth_domains' are stored in a hash table indexed by name.
>   * When the last reference to an 'auth_domain' is dropped,

This is where the magic happens. Took me a bit to understand, but since
we're working in kuid_t/kgid_t, we don't need to worry about further
idmapping.

Reviewed-by: Jeff Layton <jlayton@kernel.org>
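
As a rough illustration of what a "munged" ID translation does, here is a
small userspace sketch. It assumes a user namespace can be described by a list
of extents in the style of /proc/<pid>/uid_map, and that IDs with no mapping
become the overflow ID (65534 by default). The names id_extent, map_id_munged
and OVERFLOW_ID are hypothetical; this is not the kernel's from_kuid_munged().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed default overflow ID, used when no extent covers the input. */
#define OVERFLOW_ID 65534u

/* One mapping extent, in the style of a uid_map line. */
struct id_extent {
	uint32_t first;		/* first ID as seen inside the namespace */
	uint32_t lower_first;	/* first ID as seen by the kernel */
	uint32_t count;		/* number of consecutive IDs mapped */
};

/*
 * Translate a kernel-side ID into the namespace's view.  Never fails:
 * unmapped IDs are "munged" into OVERFLOW_ID instead of reporting an error.
 */
static uint32_t map_id_munged(const struct id_extent *map, size_t n,
			      uint32_t kid)
{
	for (size_t i = 0; i < n; i++) {
		if (kid >= map[i].lower_first &&
		    kid - map[i].lower_first < map[i].count)
			return map[i].first + (kid - map[i].lower_first);
	}
	return OVERFLOW_ID;
}

/* Convenience wrapper for the common single-extent case. */
static uint32_t map_id_single(struct id_extent ext, uint32_t kid)
{
	assert(ext.count > 0);
	return map_id_munged(&ext, 1, kid);
}
```

With a single extent of 0 100000 65536, kernel ID 100005 maps to 5 inside the
namespace, while kernel ID 5 has no mapping and comes back as 65534.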


* Re: [PATCH v14 16/25] nfsd: add localio support
  2024-08-29  1:04 ` [PATCH v14 16/25] nfsd: add localio support Mike Snitzer
@ 2024-08-29 16:01   ` Chuck Lever
  2024-08-29 16:15     ` Mike Snitzer
  2024-08-29 23:10     ` NeilBrown
  2024-08-29 16:49   ` Jeff Layton
  1 sibling, 2 replies; 75+ messages in thread
From: Chuck Lever @ 2024-08-29 16:01 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Wed, Aug 28, 2024 at 09:04:11PM -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <dros@primarydata.com>
> 
> Add server support for bypassing NFS for localhost reads, writes, and
> commits. This is only useful when both the client and server are
> running on the same host.
> 
> If nfsd_open_local_fh() fails then the NFS client will both retry and
> fall back to normal network-based read, write and commit operations if
> localio is no longer supported.
> 
> Care is taken to ensure the same NFS security mechanisms are used
> (authentication, etc) regardless of whether localio or regular NFS
> access is used.  The auth_domain established as part of the traditional
> NFS client access to the NFS server is also used for localio.  Store
> auth_domain for localio in nfsd_uuid_t and transfer it to the client
> if it is local to the server.
> 
> Relative to containers, localio gives the client access to the network
> namespace the server has.  This is required to allow the client to
> access the server's per-namespace nfsd_net struct.
> 
> CONFIG_NFSD_LOCALIO controls the server enablement for localio.
> A later commit will add CONFIG_NFS_LOCALIO to allow the client
> enablement.
> 
> This commit also introduces the use of nfsd's percpu_ref to interlock
> nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
> not destroyed while in use by nfsd_open_local_fh, and warrants a more
> detailed explanation:
> 
> nfsd_open_local_fh uses nfsd_serv_try_get before opening its file
> handle and then the reference must be dropped by the caller using
> nfsd_serv_put (via nfs_localio_ctx_free).
> 
> This "interlock" relies heavily on nfsd_open_local_fh()'s
> maybe_get_net() safely dealing with the possibility that the struct
> net (and nfsd_net by association) may have been destroyed by
> nfsd_destroy_serv() via nfsd_shutdown_net().
> 
> Verified to fix an easy-to-hit crash that would occur if an nfsd
> instance running in a container, with a localio client mounted, is
> shut down. Upon restart of the container and associated nfsd the client
> would go on to crash due to a NULL pointer dereference that occurred
> because the nfs client's localio attempted nfsd_open_local_fh(), using
> nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
> 
> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/Kconfig          |   3 ++
>  fs/nfsd/Kconfig     |  16 +++++++
>  fs/nfsd/Makefile    |   1 +
>  fs/nfsd/filecache.c |   2 +-
>  fs/nfsd/localio.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfsd/trace.h     |   3 +-
>  fs/nfsd/vfs.h       |   7 +++
>  7 files changed, 135 insertions(+), 2 deletions(-)
>  create mode 100644 fs/nfsd/localio.c
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index a46b0cbc4d8f..1b8a5edbddff 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
>  	tristate
>  	select FS_POSIX_ACL
>  
> +config NFS_COMMON_LOCALIO_SUPPORT
> +	bool
> +
>  config NFS_COMMON
>  	bool
>  	depends on NFSD || NFS_FS || LOCKD
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index c0bd1509ccd4..e6fa7eaa1db0 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -90,6 +90,22 @@ config NFSD_V4
>  
>  	  If unsure, say N.
>  
> +config NFSD_LOCALIO
> +	bool "NFS server support for the LOCALIO auxiliary protocol"
> +	depends on NFSD
> +	select NFS_COMMON_LOCALIO_SUPPORT
> +	default n
> +	help
> +	  Some NFS servers support an auxiliary NFS LOCALIO protocol
> +	  that is not an official part of the NFS protocol.
> +
> +	  This option enables support for the LOCALIO protocol in the
> +	  kernel's NFS server.  Enable this to permit local NFS clients
> +	  to bypass the network when issuing reads and writes to the
> +	  local NFS server.
> +
> +	  If unsure, say N.
> +
>  config NFSD_PNFS
>  	bool
>  
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index b8736a82e57c..78b421778a79 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
>  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
>  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
>  nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index a83d469bca6b..49f4aab3208a 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -53,7 +53,7 @@
>  #define NFSD_FILE_CACHE_UP		     (0)
>  
>  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> -#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
> +#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
>  
>  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
>  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> new file mode 100644
> index 000000000000..4b65c66be129
> --- /dev/null
> +++ b/fs/nfsd/localio.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * NFS server support for local clients to bypass network stack
> + *
> + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> + */
> +
> +#include <linux/exportfs.h>
> +#include <linux/sunrpc/svcauth.h>
> +#include <linux/sunrpc/clnt.h>
> +#include <linux/nfs.h>
> +#include <linux/nfs_common.h>
> +#include <linux/nfslocalio.h>
> +#include <linux/string.h>
> +
> +#include "nfsd.h"
> +#include "vfs.h"
> +#include "netns.h"
> +#include "filecache.h"
> +
> +/**
> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
> + *
> + * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
> + * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access
> + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
> + * @cred: cred that the client established
> + * @nfs_fh: filehandle to lookup
> + * @fmode: fmode_t to use for open
> + *
> + * This function maps a local fh to a path on a local filesystem.
> + * This is useful when the nfs client has the local server mounted - it can
> + * avoid all the NFS overhead with reads, writes and commits.
> + *
> + * On successful return, returned nfs_localio_ctx will have its nfsd_file and
> + * nfsd_net members set. Caller is responsible for calling nfsd_file_put and
> + * nfsd_serv_put (via nfs_localio_ctx_free).
> + */
> +struct nfs_localio_ctx *
> +nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
> +		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> +		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> +{
> +	int mayflags = NFSD_MAY_LOCALIO;
> +	int status = 0;
> +	struct nfsd_net *nn;
> +	struct svc_cred rq_cred;
> +	struct svc_fh fh;
> +	struct nfs_localio_ctx *localio;
> +	__be32 beres;
> +
> +	if (nfs_fh->size > NFS4_FHSIZE)
> +		return ERR_PTR(-EINVAL);
> +
> +	localio = nfs_localio_ctx_alloc();
> +	if (!localio)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
> +	 * But the server may already be shutting down, if so disallow new localio.
> +	 */
> +	nn = net_generic(cl_nfssvc_net, nfsd_net_id);
> +	if (unlikely(!nfsd_serv_try_get(nn))) {
> +		status = -ENXIO;
> +		goto out_nfsd_serv;
> +	}
> +
> +	/* nfs_fh -> svc_fh */
> +	fh_init(&fh, NFS4_FHSIZE);
> +	fh.fh_handle.fh_size = nfs_fh->size;
> +	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> +
> +	if (fmode & FMODE_READ)
> +		mayflags |= NFSD_MAY_READ;
> +	if (fmode & FMODE_WRITE)
> +		mayflags |= NFSD_MAY_WRITE;
> +
> +	svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
> +
> +	beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, cl_nfssvc_dom,
> +					&fh, mayflags, &localio->nf);
> +	if (beres) {
> +		status = nfs_stat_to_errno(be32_to_cpu(beres));
> +		goto out_fh_put;
> +	}
> +	localio->nn = nn;
> +
> +out_fh_put:
> +	fh_put(&fh);
> +	if (rq_cred.cr_group_info)
> +		put_group_info(rq_cred.cr_group_info);
> +out_nfsd_serv:
> +	if (status) {
> +		nfs_localio_ctx_free(localio);
> +		return ERR_PTR(status);
> +	}
> +	return localio;
> +}
> +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index d22027e23761..82bcefcd1f21 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
>  		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" },	\
>  		{ NFSD_MAY_BYPASS_GSS,		"BYPASS_GSS" },		\
>  		{ NFSD_MAY_READ_IF_EXEC,	"READ_IF_EXEC" },	\
> -		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" })
> +		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" },	\
> +		{ NFSD_MAY_LOCALIO,		"LOCALIO" })
>  
>  TRACE_EVENT(nfsd_compound,
>  	TP_PROTO(
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 01947561d375..e12310dd5f4c 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -33,6 +33,8 @@
>  
>  #define NFSD_MAY_64BIT_COOKIE		0x1000 /* 64 bit readdir cookies for >= NFSv3 */
>  
> +#define NFSD_MAY_LOCALIO		0x2000 /* for tracing, reflects when localio used */
> +
>  #define NFSD_MAY_CREATE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE)
>  #define NFSD_MAY_REMOVE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
>  
> @@ -158,6 +160,11 @@ __be32		nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
>  
>  void		nfsd_filp_close(struct file *fp);
>  
> +struct nfs_localio_ctx *
> +nfsd_open_local_fh(struct net *, struct auth_domain *,
> +		   struct rpc_clnt *, const struct cred *,
> +		   const struct nfs_fh *, const fmode_t);
> +
>  static inline int fh_want_write(struct svc_fh *fh)
>  {
>  	int ret;
> -- 
> 2.44.0
> 

Acked-by: Chuck Lever <chuck.lever@oracle.com>

I think I've looked at all the server-side changes now. I don't see
any issues that block merging this series.

Two follow-ups:

I haven't heard an answer to my question about how export options
that translate RPC user IDs might behave for LOCALIO operations
(eg. root_squash, all_squash). Test results, design points,
NEEDS_WORK, etc.

Someone should try out the trace points that we neutered in
fh_verify() before this set gets applied.


-- 
Chuck Lever


* Re: [PATCH v14 13/25] SUNRPC: replace program list with program array
  2024-08-29  1:04 ` [PATCH v14 13/25] SUNRPC: replace program list with program array Mike Snitzer
@ 2024-08-29 16:02   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 16:02 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: NeilBrown <neil@brown.name>
> 
> A service created with svc_create_pooled() can be given a linked list of
> programs and all of these will be served.
> 
> Using a linked list makes it cumbersome when there are several programs
> that can be optionally selected with CONFIG settings.
> 
> After this patch is applied, API consumers must use only
> svc_create_pooled() when creating an RPC service that listens for more
> than one RPC program.
> 
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: NeilBrown <neil@brown.name>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfsd/nfsctl.c           |  2 +-
>  fs/nfsd/nfsd.h             |  2 +-
>  fs/nfsd/nfssvc.c           | 38 ++++++++++-----------
>  include/linux/sunrpc/svc.h |  7 ++--
>  net/sunrpc/svc.c           | 68 ++++++++++++++++++++++----------------
>  net/sunrpc/svc_xprt.c      |  2 +-
>  net/sunrpc/svcauth_unix.c  |  3 +-
>  7 files changed, 67 insertions(+), 55 deletions(-)
> 
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 1c9e5b4bcb0a..64c1b4d649bc 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -2246,7 +2246,7 @@ static __net_init int nfsd_net_init(struct net *net)
>  	if (retval)
>  		goto out_repcache_error;
>  	memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
> -	nn->nfsd_svcstats.program = &nfsd_program;
> +	nn->nfsd_svcstats.program = &nfsd_programs[0];
>  	for (i = 0; i < sizeof(nn->nfsd_versions); i++)
>  		nn->nfsd_versions[i] = nfsd_support_version(i);
>  	for (i = 0; i < sizeof(nn->nfsd4_minorversions); i++)
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index 4ccbf014a2c7..b0d3e82d6dcd 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -85,7 +85,7 @@ struct nfsd_genl_rqstp {
>  	u32			rq_opnum[NFSD_MAX_OPS_PER_COMPOUND];
>  };
>  
> -extern struct svc_program	nfsd_program;
> +extern struct svc_program	nfsd_programs[];
>  extern const struct svc_version	nfsd_version2, nfsd_version3, nfsd_version4;
>  extern struct mutex		nfsd_mutex;
>  extern spinlock_t		nfsd_drc_lock;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index e43d440f9f0a..c639fbe4d8c2 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -35,7 +35,6 @@
>  #define NFSDDBG_FACILITY	NFSDDBG_SVC
>  
>  atomic_t			nfsd_th_cnt = ATOMIC_INIT(0);
> -extern struct svc_program	nfsd_program;
>  static int			nfsd(void *vrqstp);
>  #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
>  static int			nfsd_acl_rpcbind_set(struct net *,
> @@ -90,20 +89,9 @@ static const struct svc_version *nfsd_acl_version[] = {
>  # endif
>  };
>  
> -#define NFSD_ACL_MINVERS            2
> +#define NFSD_ACL_MINVERS	2
>  #define NFSD_ACL_NRVERS		ARRAY_SIZE(nfsd_acl_version)
>  
> -static struct svc_program	nfsd_acl_program = {
> -	.pg_prog		= NFS_ACL_PROGRAM,
> -	.pg_nvers		= NFSD_ACL_NRVERS,
> -	.pg_vers		= nfsd_acl_version,
> -	.pg_name		= "nfsacl",
> -	.pg_class		= "nfsd",
> -	.pg_authenticate	= &svc_set_client,
> -	.pg_init_request	= nfsd_acl_init_request,
> -	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
> -};
> -
>  #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
>  
>  static const struct svc_version *nfsd_version[NFSD_MAXVERS+1] = {
> @@ -116,18 +104,29 @@ static const struct svc_version *nfsd_version[NFSD_MAXVERS+1] = {
>  #endif
>  };
>  
> -struct svc_program		nfsd_program = {
> -#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> -	.pg_next		= &nfsd_acl_program,
> -#endif
> +struct svc_program		nfsd_programs[] = {
> +	{
>  	.pg_prog		= NFS_PROGRAM,		/* program number */
>  	.pg_nvers		= NFSD_MAXVERS+1,	/* nr of entries in nfsd_version */
>  	.pg_vers		= nfsd_version,		/* version table */
>  	.pg_name		= "nfsd",		/* program name */
>  	.pg_class		= "nfsd",		/* authentication class */
> -	.pg_authenticate	= &svc_set_client,	/* export authentication */
> +	.pg_authenticate	= svc_set_client,	/* export authentication */
>  	.pg_init_request	= nfsd_init_request,
>  	.pg_rpcbind_set		= nfsd_rpcbind_set,
> +	},
> +#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> +	{
> +	.pg_prog		= NFS_ACL_PROGRAM,
> +	.pg_nvers		= NFSD_ACL_NRVERS,
> +	.pg_vers		= nfsd_acl_version,
> +	.pg_name		= "nfsacl",
> +	.pg_class		= "nfsd",
> +	.pg_authenticate	= svc_set_client,
> +	.pg_init_request	= nfsd_acl_init_request,
> +	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
> +	},
> +#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
>  };
>  
>  bool nfsd_support_version(int vers)
> @@ -637,7 +636,8 @@ int nfsd_create_serv(struct net *net)
>  	if (nfsd_max_blksize == 0)
>  		nfsd_max_blksize = nfsd_get_default_max_blksize();
>  	nfsd_reset_versions(nn);
> -	serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
> +	serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs),
> +				 &nn->nfsd_svcstats,
>  				 nfsd_max_blksize, nfsd);
>  	if (serv == NULL)
>  		return -ENOMEM;
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 437672bcaa22..c7ad2fb2a155 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -67,9 +67,10 @@ enum {
>   * We currently do not support more than one RPC program per daemon.
>   */
>  struct svc_serv {
> -	struct svc_program *	sv_program;	/* RPC program */
> +	struct svc_program *	sv_programs;	/* RPC programs */
>  	struct svc_stat *	sv_stats;	/* RPC statistics */
>  	spinlock_t		sv_lock;
> +	unsigned int		sv_nprogs;	/* Number of sv_programs */
>  	unsigned int		sv_nrthreads;	/* # of server threads */
>  	unsigned int		sv_maxconn;	/* max connections allowed or
>  						 * '0' causing max to be based
> @@ -357,10 +358,9 @@ struct svc_process_info {
>  };
>  
>  /*
> - * List of RPC programs on the same transport endpoint
> + * RPC program - an array of these can use the same transport endpoint
>   */
>  struct svc_program {
> -	struct svc_program *	pg_next;	/* other programs (same xprt) */
>  	u32			pg_prog;	/* program number */
>  	unsigned int		pg_lovers;	/* lowest version */
>  	unsigned int		pg_hivers;	/* highest version */
> @@ -438,6 +438,7 @@ bool		   svc_rqst_replace_page(struct svc_rqst *rqstp,
>  void		   svc_rqst_release_pages(struct svc_rqst *rqstp);
>  void		   svc_exit_thread(struct svc_rqst *);
>  struct svc_serv *  svc_create_pooled(struct svc_program *prog,
> +				     unsigned int nprog,
>  				     struct svc_stat *stats,
>  				     unsigned int bufsize,
>  				     int (*threadfn)(void *data));
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index ff6f3e35b36d..b33386d249c2 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -440,10 +440,11 @@ EXPORT_SYMBOL_GPL(svc_rpcb_cleanup);
>  
>  static int svc_uses_rpcbind(struct svc_serv *serv)
>  {
> -	struct svc_program	*progp;
> -	unsigned int		i;
> +	unsigned int		p, i;
> +
> +	for (p = 0; p < serv->sv_nprogs; p++) {
> +		struct svc_program *progp = &serv->sv_programs[p];
>  
> -	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
>  		for (i = 0; i < progp->pg_nvers; i++) {
>  			if (progp->pg_vers[i] == NULL)
>  				continue;
> @@ -480,7 +481,7 @@ __svc_init_bc(struct svc_serv *serv)
>   * Create an RPC service
>   */
>  static struct svc_serv *
> -__svc_create(struct svc_program *prog, struct svc_stat *stats,
> +__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats,
>  	     unsigned int bufsize, int npools, int (*threadfn)(void *data))
>  {
>  	struct svc_serv	*serv;
> @@ -491,7 +492,8 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
>  	if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
>  		return NULL;
>  	serv->sv_name      = prog->pg_name;
> -	serv->sv_program   = prog;
> +	serv->sv_programs  = prog;
> +	serv->sv_nprogs    = nprogs;
>  	serv->sv_stats     = stats;
>  	if (bufsize > RPCSVC_MAXPAYLOAD)
>  		bufsize = RPCSVC_MAXPAYLOAD;
> @@ -499,17 +501,18 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
>  	serv->sv_max_mesg  = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
>  	serv->sv_threadfn = threadfn;
>  	xdrsize = 0;
> -	while (prog) {
> -		prog->pg_lovers = prog->pg_nvers-1;
> -		for (vers=0; vers<prog->pg_nvers ; vers++)
> -			if (prog->pg_vers[vers]) {
> -				prog->pg_hivers = vers;
> -				if (prog->pg_lovers > vers)
> -					prog->pg_lovers = vers;
> -				if (prog->pg_vers[vers]->vs_xdrsize > xdrsize)
> -					xdrsize = prog->pg_vers[vers]->vs_xdrsize;
> +	for (i = 0; i < nprogs; i++) {
> +		struct svc_program *progp = &prog[i];
> +
> +		progp->pg_lovers = progp->pg_nvers-1;
> +		for (vers = 0; vers < progp->pg_nvers ; vers++)
> +			if (progp->pg_vers[vers]) {
> +				progp->pg_hivers = vers;
> +				if (progp->pg_lovers > vers)
> +					progp->pg_lovers = vers;
> +				if (progp->pg_vers[vers]->vs_xdrsize > xdrsize)
> +					xdrsize = progp->pg_vers[vers]->vs_xdrsize;
>  			}
> -		prog = prog->pg_next;
>  	}
>  	serv->sv_xdrsize   = xdrsize;
>  	INIT_LIST_HEAD(&serv->sv_tempsocks);
> @@ -558,13 +561,14 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
>  struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
>  			    int (*threadfn)(void *data))
>  {
> -	return __svc_create(prog, NULL, bufsize, 1, threadfn);
> +	return __svc_create(prog, 1, NULL, bufsize, 1, threadfn);
>  }
>  EXPORT_SYMBOL_GPL(svc_create);
>  
>  /**
>   * svc_create_pooled - Create an RPC service with pooled threads
> - * @prog: the RPC program the new service will handle
> + * @prog:  Array of RPC programs the new service will handle
> + * @nprogs: Number of programs in the array
>   * @stats: the stats struct if desired
>   * @bufsize: maximum message size for @prog
>   * @threadfn: a function to service RPC requests for @prog
> @@ -572,6 +576,7 @@ EXPORT_SYMBOL_GPL(svc_create);
>   * Returns an instantiated struct svc_serv object or NULL.
>   */
>  struct svc_serv *svc_create_pooled(struct svc_program *prog,
> +				   unsigned int nprogs,
>  				   struct svc_stat *stats,
>  				   unsigned int bufsize,
>  				   int (*threadfn)(void *data))
> @@ -579,7 +584,7 @@ struct svc_serv *svc_create_pooled(struct svc_program *prog,
>  	struct svc_serv *serv;
>  	unsigned int npools = svc_pool_map_get();
>  
> -	serv = __svc_create(prog, stats, bufsize, npools, threadfn);
> +	serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn);
>  	if (!serv)
>  		goto out_err;
>  	serv->sv_is_pooled = true;
> @@ -602,16 +607,16 @@ svc_destroy(struct svc_serv **servp)
>  
>  	*servp = NULL;
>  
> -	dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
> +	dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name);
>  	timer_shutdown_sync(&serv->sv_temptimer);
>  
>  	/*
>  	 * Remaining transports at this point are not expected.
>  	 */
>  	WARN_ONCE(!list_empty(&serv->sv_permsocks),
> -		  "SVC: permsocks remain for %s\n", serv->sv_program->pg_name);
> +		  "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name);
>  	WARN_ONCE(!list_empty(&serv->sv_tempsocks),
> -		  "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name);
> +		  "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name);
>  
>  	cache_clean_deferred(serv);
>  
> @@ -1149,15 +1154,16 @@ int svc_register(const struct svc_serv *serv, struct net *net,
>  		 const int family, const unsigned short proto,
>  		 const unsigned short port)
>  {
> -	struct svc_program	*progp;
> -	unsigned int		i;
> +	unsigned int		p, i;
>  	int			error = 0;
>  
>  	WARN_ON_ONCE(proto == 0 && port == 0);
>  	if (proto == 0 && port == 0)
>  		return -EINVAL;
>  
> -	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> +	for (p = 0; p < serv->sv_nprogs; p++) {
> +		struct svc_program *progp = &serv->sv_programs[p];
> +
>  		for (i = 0; i < progp->pg_nvers; i++) {
>  
>  			error = progp->pg_rpcbind_set(net, progp, i,
> @@ -1209,13 +1215,14 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi
>  static void svc_unregister(const struct svc_serv *serv, struct net *net)
>  {
>  	struct sighand_struct *sighand;
> -	struct svc_program *progp;
>  	unsigned long flags;
> -	unsigned int i;
> +	unsigned int p, i;
>  
>  	clear_thread_flag(TIF_SIGPENDING);
>  
> -	for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> +	for (p = 0; p < serv->sv_nprogs; p++) {
> +		struct svc_program *progp = &serv->sv_programs[p];
> +
>  		for (i = 0; i < progp->pg_nvers; i++) {
>  			if (progp->pg_vers[i] == NULL)
>  				continue;
> @@ -1321,7 +1328,7 @@ svc_process_common(struct svc_rqst *rqstp)
>  	struct svc_process_info process;
>  	enum svc_auth_status	auth_res;
>  	unsigned int		aoffset;
> -	int			rc;
> +	int			pr, rc;
>  	__be32			*p;
>  
>  	/* Will be turned off only when NFSv4 Sessions are used */
> @@ -1345,9 +1352,12 @@ svc_process_common(struct svc_rqst *rqstp)
>  	rqstp->rq_vers = be32_to_cpup(p++);
>  	rqstp->rq_proc = be32_to_cpup(p);
>  
> -	for (progp = serv->sv_program; progp; progp = progp->pg_next)
> +	for (pr = 0; pr < serv->sv_nprogs; pr++) {
> +		progp = &serv->sv_programs[pr];
> +
>  		if (rqstp->rq_prog == progp->pg_prog)
>  			break;
> +	}
>  
>  	/*
>  	 * Decode auth data, and add verifier to reply buffer.
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 53ebc719ff5a..43c57124de52 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -268,7 +268,7 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name,
>  		spin_unlock(&svc_xprt_class_lock);
>  		newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags);
>  		if (IS_ERR(newxprt)) {
> -			trace_svc_xprt_create_err(serv->sv_program->pg_name,
> +			trace_svc_xprt_create_err(serv->sv_programs->pg_name,
>  						  xcl->xcl_name, sap, len,
>  						  newxprt);
>  			module_put(xcl->xcl_owner);
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 04b45588ae6f..8ca98b146ec8 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -697,7 +697,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
>  	rqstp->rq_auth_stat = rpc_autherr_badcred;
>  	ipm = ip_map_cached_get(xprt);
>  	if (ipm == NULL)
> -		ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
> +		ipm = __ip_map_lookup(sn->ip_map_cache,
> +				      rqstp->rq_server->sv_programs->pg_class,
>  				    &sin6->sin6_addr);
>  
>  	if (ipm == NULL)

Reviewed-by: Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread
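
For anyone skimming the diff above: a minimal userspace sketch of the array-based program lookup that replaces the old pg_next walk. The toy_* types and the helper name are hypothetical stand-ins, not kernel symbols, and only the fields needed for the lookup are modelled. One difference from the pg_next walk is that an index loop does not leave a NULL cursor behind when nothing matches, so the sketch returns NULL explicitly; how the patch itself handles that case is outside the hunk quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct svc_program; only the fields used here. */
struct toy_program {
	uint32_t	pg_prog;	/* program number */
	const char	*pg_name;	/* program name */
};

/* Hypothetical stand-in for struct svc_serv. */
struct toy_serv {
	struct toy_program	*sv_programs;	/* array of programs */
	unsigned int		sv_nprogs;	/* number of entries in the array */
};

/* Return the entry whose program number is @prog, or NULL if none matches. */
static struct toy_program *toy_find_program(struct toy_serv *serv, uint32_t prog)
{
	unsigned int i;

	for (i = 0; i < serv->sv_nprogs; i++) {
		if (serv->sv_programs[i].pg_prog == prog)
			return &serv->sv_programs[i];
	}
	return NULL;
}
```

A caller would build the array once, pass its length alongside it (as the patch does with ARRAY_SIZE(nfsd_programs)), and treat a NULL return as "program unavailable".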

* Re: [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put
  2024-08-29 16:01     ` Mike Snitzer
@ 2024-08-29 16:04       ` Chuck Lever
  0 siblings, 0 replies; 75+ messages in thread
From: Chuck Lever @ 2024-08-29 16:04 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jeff Layton, linux-nfs, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 12:01:12PM -0400, Mike Snitzer wrote:
> On Thu, Aug 29, 2024 at 11:57:20AM -0400, Jeff Layton wrote:
> > On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > > Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
> > > to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
> > > caller of nfsd_serv_try_get releases their reference using nfsd_serv_put.
> > > 
> > > A percpu_ref is used to implement the interlock between
> > > nfsd_destroy_serv and any caller of nfsd_serv_try_get.
> > > 
> > > This interlock is needed to properly wait for the completion of client
> > > initiated localio calls to nfsd (that are _not_ in the context of nfsd).
> > > 
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > >  fs/nfsd/netns.h  |  8 +++++++-
> > >  fs/nfsd/nfssvc.c | 39 +++++++++++++++++++++++++++++++++++++++
> > >  2 files changed, 46 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > > index 238fc4e56e53..e2d953f21dde 100644
> > > --- a/fs/nfsd/netns.h
> > > +++ b/fs/nfsd/netns.h
> > > @@ -13,6 +13,7 @@
> > >  #include <linux/filelock.h>
> > >  #include <linux/nfs4.h>
> > >  #include <linux/percpu_counter.h>
> > > +#include <linux/percpu-refcount.h>
> > >  #include <linux/siphash.h>
> > >  #include <linux/sunrpc/stats.h>
> > >  
> > > @@ -139,7 +140,9 @@ struct nfsd_net {
> > >  
> > >  	struct svc_info nfsd_info;
> > >  #define nfsd_serv nfsd_info.serv
> > > -
> > > +	struct percpu_ref nfsd_serv_ref;
> > > +	struct completion nfsd_serv_confirm_done;
> > > +	struct completion nfsd_serv_free_done;
> > >  
> > >  	/*
> > >  	 * clientid and stateid data for construction of net unique COPY
> > > @@ -221,6 +224,9 @@ struct nfsd_net {
> > >  extern bool nfsd_support_version(int vers);
> > >  extern unsigned int nfsd_net_id;
> > >  
> > > +bool nfsd_serv_try_get(struct nfsd_net *nn);
> > > +void nfsd_serv_put(struct nfsd_net *nn);
> > > +
> > >  void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
> > >  void nfsd_reset_write_verifier(struct nfsd_net *nn);
> > >  #endif /* __NFSD_NETNS_H__ */
> > > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > > index defc430f912f..e43d440f9f0a 100644
> > > --- a/fs/nfsd/nfssvc.c
> > > +++ b/fs/nfsd/nfssvc.c
> > > @@ -193,6 +193,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
> > >  	return 0;
> > >  }
> > >  
> > > +bool nfsd_serv_try_get(struct nfsd_net *nn)
> > > +{
> > > +	return percpu_ref_tryget_live(&nn->nfsd_serv_ref);
> > > +}
> > > +
> > > +void nfsd_serv_put(struct nfsd_net *nn)
> > > +{
> > > +	percpu_ref_put(&nn->nfsd_serv_ref);
> > > +}
> > > +
> > > +static void nfsd_serv_done(struct percpu_ref *ref)
> > > +{
> > > +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> > > +
> > > +	complete(&nn->nfsd_serv_confirm_done);
> > > +}
> > > +
> > > +static void nfsd_serv_free(struct percpu_ref *ref)
> > > +{
> > > +	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
> > > +
> > > +	complete(&nn->nfsd_serv_free_done);
> > > +}
> > > +
> > >  /*
> > >   * Maximum number of nfsd processes
> > >   */
> > > @@ -392,6 +416,7 @@ static void nfsd_shutdown_net(struct net *net)
> > >  		lockd_down(net);
> > >  		nn->lockd_up = false;
> > >  	}
> > > +	percpu_ref_exit(&nn->nfsd_serv_ref);
> > >  	nn->nfsd_net_up = false;
> > >  	nfsd_shutdown_generic();
> > >  }
> > > @@ -471,6 +496,13 @@ void nfsd_destroy_serv(struct net *net)
> > >  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > >  	struct svc_serv *serv = nn->nfsd_serv;
> > >  
> > > +	lockdep_assert_held(&nfsd_mutex);
> > > +
> > > +	percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
> > > +	wait_for_completion(&nn->nfsd_serv_confirm_done);
> > > +	wait_for_completion(&nn->nfsd_serv_free_done);
> > > +	/* percpu_ref_exit is called in nfsd_shutdown_net */
> > > +
> > >  	spin_lock(&nfsd_notifier_lock);
> > >  	nn->nfsd_serv = NULL;
> > >  	spin_unlock(&nfsd_notifier_lock);
> > > @@ -595,6 +627,13 @@ int nfsd_create_serv(struct net *net)
> > >  	if (nn->nfsd_serv)
> > >  		return 0;
> > >  
> > > +	error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
> > > +				0, GFP_KERNEL);
> > > +	if (error)
> > > +		return error;
> > > +	init_completion(&nn->nfsd_serv_free_done);
> > > +	init_completion(&nn->nfsd_serv_confirm_done);
> > > +
> > >  	if (nfsd_max_blksize == 0)
> > >  		nfsd_max_blksize = nfsd_get_default_max_blksize();
> > >  	nfsd_reset_versions(nn);
> > 
> > A little hard to review this one at this point in the series, as there
> > are no callers of get/put yet, but the concept seems reasonable.
> > 
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 
> Thanks, yeah Chuck asked that I factor this interlock interface out to
> a separate patch because it was a bit much buried in the next patch
> that actually consumes it.

Yes, and to add some rationale for it. I know folks don't like the
addition of new functions before their callers are introduced.

Thanks!

-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 75+ messages in thread
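
To make the interlock discussed above easier to picture, here is a deliberately simplified, single-threaded model of a killable reference count. It is only a sketch: the real percpu_ref is per-CPU and asynchronous, and the toy_* names are hypothetical. It does show the ordering the patch relies on: once kill is called no new reference can be taken, and the release step (where the patch completes nfsd_serv_free_done) only fires after the last outstanding put.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model of a killable reference count. */
struct toy_ref {
	unsigned long	count;		/* includes the initial reference */
	bool		dead;		/* set once kill has been called */
	bool		released;	/* set when the count reaches zero */
};

static void toy_ref_init(struct toy_ref *ref)
{
	ref->count = 1;
	ref->dead = false;
	ref->released = false;
}

/* Models nfsd_serv_try_get(): fails once shutdown has begun. */
static bool toy_ref_tryget_live(struct toy_ref *ref)
{
	if (ref->dead)
		return false;
	ref->count++;
	return true;
}

/* Models nfsd_serv_put(): the last put marks the ref as released. */
static void toy_ref_put(struct toy_ref *ref)
{
	assert(ref->count > 0);
	if (--ref->count == 0)
		ref->released = true;
}

/* Models the kill step: refuse new gets, then drop the initial reference. */
static void toy_ref_kill(struct toy_ref *ref)
{
	ref->dead = true;
	toy_ref_put(ref);
}
```

In this model a localio caller that took a reference before the kill keeps the object alive until its own put, which is the property nfsd_destroy_serv waits on.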

* Re: [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement
  2024-08-29  1:04 ` [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
@ 2024-08-29 16:07   ` Jeff Layton
  2024-08-29 16:22     ` Mike Snitzer
  2024-08-29 23:39   ` NeilBrown
  1 sibling, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 16:07 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
> client to generate a nonce (single-use UUID) and an associated
> short-lived nfs_uuid_t struct, then register it with nfs_common for
> subsequent lookup and verification by the NFS server. If the nonce
> matches, the NFS server populates members in the nfs_uuid_t struct.
> 
> nfs_common's nfs_uuids list is the basis for localio enablement; as
> such, it has members that point to nfsd memory for direct use by the
> client (e.g. 'net' is the server's network namespace; through it the
> client can access nn->nfsd_serv with proper rcu read access).
> 
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfs_common/Makefile     |  3 ++
>  fs/nfs_common/nfslocalio.c | 74 ++++++++++++++++++++++++++++++++++++++
>  include/linux/nfslocalio.h | 31 ++++++++++++++++
>  3 files changed, 108 insertions(+)
>  create mode 100644 fs/nfs_common/nfslocalio.c
>  create mode 100644 include/linux/nfslocalio.h
> 
> diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
> index e58b01bb8dda..a5e54809701e 100644
> --- a/fs/nfs_common/Makefile
> +++ b/fs/nfs_common/Makefile
> @@ -6,6 +6,9 @@
>  obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
>  nfs_acl-objs := nfsacl.o
>  
> +obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
> +nfs_localio-objs := nfslocalio.o
> +
>  obj-$(CONFIG_GRACE_PERIOD) += grace.o
>  obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
>  
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> new file mode 100644
> index 000000000000..1a35a4a6dbe0
> --- /dev/null
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -0,0 +1,74 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/rculist.h>
> +#include <linux/nfslocalio.h>
> +#include <net/netns/generic.h>
> +
> +MODULE_LICENSE("GPL");
> +MODULE_DESCRIPTION("NFS localio protocol bypass support");
> +
> +DEFINE_MUTEX(nfs_uuid_mutex);

Why a mutex here? AFAICT, you're just using this to protect the list. A
spinlock would probably be more efficient.

> +
> +/*
> + * Global list of nfs_uuid_t instances, add/remove
> + * is protected by nfs_uuid_mutex.
> + * Reads are protected by RCU read lock (see below).
> + */
> +LIST_HEAD(nfs_uuids);
> +
> +void nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
> +{
> +	nfs_uuid->net = NULL;
> +	nfs_uuid->dom = NULL;
> +	uuid_gen(&nfs_uuid->uuid);
> +
> +	mutex_lock(&nfs_uuid_mutex);
> +	list_add_tail_rcu(&nfs_uuid->list, &nfs_uuids);
> +	mutex_unlock(&nfs_uuid_mutex);
> +}
> +EXPORT_SYMBOL_GPL(nfs_uuid_begin);
> +
> +void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
> +{
> +	mutex_lock(&nfs_uuid_mutex);
> +	list_del_rcu(&nfs_uuid->list);
> +	mutex_unlock(&nfs_uuid_mutex);
> +}
> +EXPORT_SYMBOL_GPL(nfs_uuid_end);
> +
> +/* Must be called with RCU read lock held. */
> +static nfs_uuid_t * nfs_uuid_lookup(const uuid_t *uuid)
> +{
> +	nfs_uuid_t *nfs_uuid;
> +
> +	list_for_each_entry_rcu(nfs_uuid, &nfs_uuids, list)
> +		if (uuid_equal(&nfs_uuid->uuid, uuid))
> +			return nfs_uuid;
> +
> +	return NULL;
> +}
> +
> +bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *dom)
> +{
> +	bool is_local = false;
> +	nfs_uuid_t *nfs_uuid;
> +
> +	rcu_read_lock();
> +	nfs_uuid = nfs_uuid_lookup(uuid);
> +	if (nfs_uuid) {
> +		nfs_uuid->net = maybe_get_net(net);
> +		if (nfs_uuid->net) {
> +			is_local = true;
> +			kref_get(&dom->ref);
> +			nfs_uuid->dom = dom;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	return is_local;
> +}
> +EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> new file mode 100644
> index 000000000000..9735ae8d3e5e
> --- /dev/null
> +++ b/include/linux/nfslocalio.h
> @@ -0,0 +1,31 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> + */
> +#ifndef __LINUX_NFSLOCALIO_H
> +#define __LINUX_NFSLOCALIO_H
> +
> +#include <linux/list.h>
> +#include <linux/uuid.h>
> +#include <linux/sunrpc/svcauth.h>
> +#include <linux/nfs.h>
> +#include <net/net_namespace.h>
> +
> +/*
> + * Useful to allow a client to negotiate whether localio is
> + * possible with its server.
> + *
> + * See Documentation/filesystems/nfs/localio.rst for more detail.
> + */
> +typedef struct {
> +	uuid_t uuid;
> +	struct list_head list;
> +	struct net *net; /* nfsd's network namespace */
> +	struct auth_domain *dom; /* auth_domain for localio */
> +} nfs_uuid_t;
> +
> +void nfs_uuid_begin(nfs_uuid_t *);
> +void nfs_uuid_end(nfs_uuid_t *);
> +bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
> +
> +#endif  /* __LINUX_NFSLOCALIO_H */

-- 
Jeff Layton <jlayton@kernel.org>

^ permalink raw reply	[flat|nested] 75+ messages in thread
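
A rough userspace sketch of the nonce handshake described in the commit message above: the client publishes a nonce, the server looks it up, and on a match the server fills in the entry. All toy_* names are hypothetical. Locking is left out entirely; the patch guards add/remove with nfs_uuid_mutex and reads with RCU, and the sketch takes no position on whether that lock should be a mutex or a spinlock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_UUID_LEN 16

/* Hypothetical stand-in for nfs_uuid_t. */
struct toy_uuid {
	unsigned char	uuid[TOY_UUID_LEN];	/* the nonce */
	struct toy_uuid	*next;			/* registry linkage */
	const void	*net;			/* filled in by the server on match */
};

/* Registry head; a real implementation must serialize updates to it. */
static struct toy_uuid *toy_uuids;

/* Client side: publish a nonce so the server can look it up. */
static void toy_uuid_begin(struct toy_uuid *u, const unsigned char *nonce)
{
	memcpy(u->uuid, nonce, TOY_UUID_LEN);
	u->net = NULL;
	u->next = toy_uuids;
	toy_uuids = u;
}

/* Client side: withdraw the nonce once the handshake is over. */
static void toy_uuid_end(struct toy_uuid *u)
{
	struct toy_uuid **pp;

	for (pp = &toy_uuids; *pp; pp = &(*pp)->next) {
		if (*pp == u) {
			*pp = u->next;
			return;
		}
	}
}

/* Server side: a matching nonce means client and server share a kernel. */
static bool toy_uuid_is_local(const unsigned char *nonce, const void *net)
{
	struct toy_uuid *u;

	for (u = toy_uuids; u; u = u->next) {
		if (memcmp(u->uuid, nonce, TOY_UUID_LEN) == 0) {
			u->net = net;
			return true;
		}
	}
	return false;
}
```

The point of the design is that the registry lives in memory shared by client and server, so a successful lookup can only happen when both sides run on the same kernel.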

* Re: [PATCH v14 16/25] nfsd: add localio support
  2024-08-29 16:01   ` Chuck Lever
@ 2024-08-29 16:15     ` Mike Snitzer
  2024-08-29 23:10     ` NeilBrown
  1 sibling, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 16:15 UTC (permalink / raw)
  To: Chuck Lever
  Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 12:01:42PM -0400, Chuck Lever wrote:
> On Wed, Aug 28, 2024 at 09:04:11PM -0400, Mike Snitzer wrote:
> > From: Weston Andros Adamson <dros@primarydata.com>
> > 
> > Add server support for bypassing NFS for localhost reads, writes, and
> > commits. This is only useful when both the client and server are
> > running on the same host.
> > 
> > If nfsd_open_local_fh() fails then the NFS client will both retry and
> > > fall back to normal network-based read, write and commit operations if
> > localio is no longer supported.
> > 
> > Care is taken to ensure the same NFS security mechanisms are used
> > (authentication, etc) regardless of whether localio or regular NFS
> > access is used.  The auth_domain established as part of the traditional
> > NFS client access to the NFS server is also used for localio.  Store
> > > auth_domain for localio in nfs_uuid_t and transfer it to the client
> > if it is local to the server.
> > 
> > Relative to containers, localio gives the client access to the network
> > namespace the server has.  This is required to allow the client to
> > access the server's per-namespace nfsd_net struct.
> > 
> > CONFIG_NFSD_LOCALIO controls the server enablement for localio.
> > A later commit will add CONFIG_NFS_LOCALIO to allow the client
> > enablement.
> > 
> > This commit also introduces the use of nfsd's percpu_ref to interlock
> > nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
> > not destroyed while in use by nfsd_open_local_fh, and warrants a more
> > detailed explanation:
> > 
> > nfsd_open_local_fh uses nfsd_serv_try_get before opening its file
> > handle and then the reference must be dropped by the caller using
> > nfsd_serv_put (via nfs_localio_ctx_free).
> > 
> > > This "interlock" relies heavily on nfsd_open_local_fh()'s
> > maybe_get_net() safely dealing with the possibility that the struct
> > net (and nfsd_net by association) may have been destroyed by
> > nfsd_destroy_serv() via nfsd_shutdown_net().
> > 
> > Verified to fix an easy to hit crash that would occur if an nfsd
> > instance running in a container, with a localio client mounted, is
> > > shut down. Upon restart of the container and associated nfsd the client
> > > would go on to crash due to a NULL pointer dereference that occurred due
> > to the nfs client's localio attempting to nfsd_open_local_fh(), using
> > nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
> > 
> > Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/Kconfig          |   3 ++
> >  fs/nfsd/Kconfig     |  16 +++++++
> >  fs/nfsd/Makefile    |   1 +
> >  fs/nfsd/filecache.c |   2 +-
> >  fs/nfsd/localio.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/nfsd/trace.h     |   3 +-
> >  fs/nfsd/vfs.h       |   7 +++
> >  7 files changed, 135 insertions(+), 2 deletions(-)
> >  create mode 100644 fs/nfsd/localio.c
> > 
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index a46b0cbc4d8f..1b8a5edbddff 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
> >  	tristate
> >  	select FS_POSIX_ACL
> >  
> > +config NFS_COMMON_LOCALIO_SUPPORT
> > +	bool
> > +
> >  config NFS_COMMON
> >  	bool
> >  	depends on NFSD || NFS_FS || LOCKD
> > diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> > index c0bd1509ccd4..e6fa7eaa1db0 100644
> > --- a/fs/nfsd/Kconfig
> > +++ b/fs/nfsd/Kconfig
> > @@ -90,6 +90,22 @@ config NFSD_V4
> >  
> >  	  If unsure, say N.
> >  
> > +config NFSD_LOCALIO
> > +	bool "NFS server support for the LOCALIO auxiliary protocol"
> > +	depends on NFSD
> > +	select NFS_COMMON_LOCALIO_SUPPORT
> > +	default n
> > +	help
> > +	  Some NFS servers support an auxiliary NFS LOCALIO protocol
> > +	  that is not an official part of the NFS protocol.
> > +
> > +	  This option enables support for the LOCALIO protocol in the
> > +	  kernel's NFS server.  Enable this to permit local NFS clients
> > +	  to bypass the network when issuing reads and writes to the
> > +	  local NFS server.
> > +
> > +	  If unsure, say N.
> > +
> >  config NFSD_PNFS
> >  	bool
> >  
> > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > index b8736a82e57c..78b421778a79 100644
> > --- a/fs/nfsd/Makefile
> > +++ b/fs/nfsd/Makefile
> > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> >  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> > +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index a83d469bca6b..49f4aab3208a 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -53,7 +53,7 @@
> >  #define NFSD_FILE_CACHE_UP		     (0)
> >  
> >  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > -#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
> > +#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> >  
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > new file mode 100644
> > index 000000000000..4b65c66be129
> > --- /dev/null
> > +++ b/fs/nfsd/localio.c
> > @@ -0,0 +1,105 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NFS server support for local clients to bypass network stack
> > + *
> > + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +
> > +#include <linux/exportfs.h>
> > +#include <linux/sunrpc/svcauth.h>
> > +#include <linux/sunrpc/clnt.h>
> > +#include <linux/nfs.h>
> > +#include <linux/nfs_common.h>
> > +#include <linux/nfslocalio.h>
> > +#include <linux/string.h>
> > +
> > +#include "nfsd.h"
> > +#include "vfs.h"
> > +#include "netns.h"
> > +#include "filecache.h"
> > +
> > +/**
> > + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
> > + *
> > + * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
> > + * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access
> > + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
> > + * @cred: cred that the client established
> > + * @nfs_fh: filehandle to lookup
> > + * @fmode: fmode_t to use for open
> > + *
> > + * This function maps a local fh to a path on a local filesystem.
> > + * This is useful when the nfs client has the local server mounted - it can
> > + * avoid all the NFS overhead with reads, writes and commits.
> > + *
> > + * On successful return, returned nfs_localio_ctx will have its nfsd_file and
> > + * nfsd_net members set. Caller is responsible for calling nfsd_file_put and
> > + * nfsd_serv_put (via nfs_localio_ctx_free).
> > + */
> > +struct nfs_localio_ctx *
> > +nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
> > +		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> > +		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> > +{
> > +	int mayflags = NFSD_MAY_LOCALIO;
> > +	int status = 0;
> > +	struct nfsd_net *nn;
> > +	struct svc_cred rq_cred;
> > +	struct svc_fh fh;
> > +	struct nfs_localio_ctx *localio;
> > +	__be32 beres;
> > +
> > +	if (nfs_fh->size > NFS4_FHSIZE)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	localio = nfs_localio_ctx_alloc();
> > +	if (!localio)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	/*
> > +	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
> > +	 * But the server may already be shutting down, if so disallow new localio.
> > +	 */
> > +	nn = net_generic(cl_nfssvc_net, nfsd_net_id);
> > +	if (unlikely(!nfsd_serv_try_get(nn))) {
> > +		status = -ENXIO;
> > +		goto out_nfsd_serv;
> > +	}
> > +
> > +	/* nfs_fh -> svc_fh */
> > +	fh_init(&fh, NFS4_FHSIZE);
> > +	fh.fh_handle.fh_size = nfs_fh->size;
> > +	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > +
> > +	if (fmode & FMODE_READ)
> > +		mayflags |= NFSD_MAY_READ;
> > +	if (fmode & FMODE_WRITE)
> > +		mayflags |= NFSD_MAY_WRITE;
> > +
> > +	svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
> > +
> > +	beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, cl_nfssvc_dom,
> > +					&fh, mayflags, &localio->nf);
> > +	if (beres) {
> > +		status = nfs_stat_to_errno(be32_to_cpu(beres));
> > +		goto out_fh_put;
> > +	}
> > +	localio->nn = nn;
> > +
> > +out_fh_put:
> > +	fh_put(&fh);
> > +	if (rq_cred.cr_group_info)
> > +		put_group_info(rq_cred.cr_group_info);
> > +out_nfsd_serv:
> > +	if (status) {
> > +		nfs_localio_ctx_free(localio);
> > +		return ERR_PTR(status);
> > +	}
> > +	return localio;
> > +}
> > +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> > +
> > +/* Compile time type checking, not used by anything */
> > +static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index d22027e23761..82bcefcd1f21 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
> >  		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" },	\
> >  		{ NFSD_MAY_BYPASS_GSS,		"BYPASS_GSS" },		\
> >  		{ NFSD_MAY_READ_IF_EXEC,	"READ_IF_EXEC" },	\
> > -		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" })
> > +		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" },	\
> > +		{ NFSD_MAY_LOCALIO,		"LOCALIO" })
> >  
> >  TRACE_EVENT(nfsd_compound,
> >  	TP_PROTO(
> > diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> > index 01947561d375..e12310dd5f4c 100644
> > --- a/fs/nfsd/vfs.h
> > +++ b/fs/nfsd/vfs.h
> > @@ -33,6 +33,8 @@
> >  
> >  #define NFSD_MAY_64BIT_COOKIE		0x1000 /* 64 bit readdir cookies for >= NFSv3 */
> >  
> > +#define NFSD_MAY_LOCALIO		0x2000 /* for tracing, reflects when localio used */
> > +
> >  #define NFSD_MAY_CREATE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE)
> >  #define NFSD_MAY_REMOVE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
> >  
> > @@ -158,6 +160,11 @@ __be32		nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
> >  
> >  void		nfsd_filp_close(struct file *fp);
> >  
> > +struct nfs_localio_ctx *
> > +nfsd_open_local_fh(struct net *, struct auth_domain *,
> > +		   struct rpc_clnt *, const struct cred *,
> > +		   const struct nfs_fh *, const fmode_t);
> > +
> >  static inline int fh_want_write(struct svc_fh *fh)
> >  {
> >  	int ret;
> > -- 
> > 2.44.0
> > 
> 
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> 
> I think I've looked at all the server-side changes now. I don't see
> any issues that block merging this series.

OK, thanks!

> Two follow-ups:
> 
> I haven't heard an answer to my question about how export options
> that translate RPC user IDs might behave for LOCALIO operations
> (eg. root_squash, all_squash). Test results, design points,
> NEEDS_WORK, etc.
> 
> Someone should try out the trace points that we neutered in
> fh_verify() before this set gets applied.

I'll work on both as a prereq for posting the final.


* Re: [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement
  2024-08-29 16:07   ` Jeff Layton
@ 2024-08-29 16:22     ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 16:22 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 12:07:06PM -0400, Jeff Layton wrote:
> On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS
> > client to generate a nonce (single-use UUID) and associated
> > short-lived nfs_uuid_t struct, register it with nfs_common for
> > subsequent lookup and verification by the NFS server and if matched
> > the NFS server populates members in the nfs_uuid_t struct.
> > 
> > nfs_common's nfs_uuids list is the basis for localio enablement, as
> > such it has members that point to nfsd memory for direct use by the
> > client (e.g. 'net' is the server's network namespace, through it the
> > client can access nn->nfsd_serv with proper rcu read access).
> > 
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfs_common/Makefile     |  3 ++
> >  fs/nfs_common/nfslocalio.c | 74 ++++++++++++++++++++++++++++++++++++++
> >  include/linux/nfslocalio.h | 31 ++++++++++++++++
> >  3 files changed, 108 insertions(+)
> >  create mode 100644 fs/nfs_common/nfslocalio.c
> >  create mode 100644 include/linux/nfslocalio.h
> > 
> > diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
> > index e58b01bb8dda..a5e54809701e 100644
> > --- a/fs/nfs_common/Makefile
> > +++ b/fs/nfs_common/Makefile
> > @@ -6,6 +6,9 @@
> >  obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
> >  nfs_acl-objs := nfsacl.o
> >  
> > +obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
> > +nfs_localio-objs := nfslocalio.o
> > +
> >  obj-$(CONFIG_GRACE_PERIOD) += grace.o
> >  obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
> >  
> > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > new file mode 100644
> > index 000000000000..1a35a4a6dbe0
> > --- /dev/null
> > +++ b/fs/nfs_common/nfslocalio.c
> > @@ -0,0 +1,74 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +
> > +#include <linux/module.h>
> > +#include <linux/rculist.h>
> > +#include <linux/nfslocalio.h>
> > +#include <net/netns/generic.h>
> > +
> > +MODULE_LICENSE("GPL");
> > +MODULE_DESCRIPTION("NFS localio protocol bypass support");
> > +
> > +DEFINE_MUTEX(nfs_uuid_mutex);
> 
> Why a mutex here? AFAICT, you're just using this to protect the list. A
> spinlock would probably be more efficient.

Yeah, will do, I meant to revisit (when Neil suggested the same for
the lock that is added in 15/25).

Thanks.
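
For illustration, a minimal userspace sketch of the spinlock-protected
add/remove being discussed. It is not the kernel code: the C11
atomic_flag lock stands in for a kernel spinlock, the uuid_entry type
and the uuid_* helper names are hypothetical, and RCU-protected readers
are left out entirely.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for a kernel spinlock (assumption: C11 atomic_flag). */
static atomic_flag uuid_lock = ATOMIC_FLAG_INIT;

static void uuid_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&uuid_lock, memory_order_acquire))
		; /* spin: the critical section is only a few pointer updates */
}

static void uuid_lock_release(void)
{
	atomic_flag_clear_explicit(&uuid_lock, memory_order_release);
}

/* Hypothetical, simplified entry: the real nfs_uuid_t also carries net and dom. */
struct uuid_entry {
	unsigned char uuid[16];
	struct uuid_entry *prev, *next;
};

/* Circular list head, playing the role of LIST_HEAD(nfs_uuids). */
static struct uuid_entry uuid_head = { .prev = &uuid_head, .next = &uuid_head };

/* Add at the tail, as nfs_uuid_begin() does with list_add_tail_rcu(). */
static void uuid_begin(struct uuid_entry *e)
{
	uuid_lock_acquire();
	e->prev = uuid_head.prev;
	e->next = &uuid_head;
	uuid_head.prev->next = e;
	uuid_head.prev = e;
	uuid_lock_release();
}

/* Unlink the entry, as nfs_uuid_end() does with list_del_rcu(). */
static void uuid_end(struct uuid_entry *e)
{
	uuid_lock_acquire();
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = e;
	uuid_lock_release();
}

/* Count entries under the lock (the kernel code would walk under RCU instead). */
static size_t uuid_count(void)
{
	size_t n = 0;

	uuid_lock_acquire();
	for (struct uuid_entry *e = uuid_head.next; e != &uuid_head; e = e->next)
		n++;
	uuid_lock_release();
	return n;
}
```

The point of the sketch is only that neither critical section can
sleep, which is why a spinlock is sufficient here.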

> > +
> > +/*
> > + * Global list of nfs_uuid_t instances, add/remove
> > + * is protected by nfs_uuid_mutex.
> > + * Reads are protected by RCU read lock (see below).
> > + */
> > +LIST_HEAD(nfs_uuids);
> > +
> > +void nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
> > +{
> > +	nfs_uuid->net = NULL;
> > +	nfs_uuid->dom = NULL;
> > +	uuid_gen(&nfs_uuid->uuid);
> > +
> > +	mutex_lock(&nfs_uuid_mutex);
> > +	list_add_tail_rcu(&nfs_uuid->list, &nfs_uuids);
> > +	mutex_unlock(&nfs_uuid_mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_uuid_begin);
> > +
> > +void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
> > +{
> > +	mutex_lock(&nfs_uuid_mutex);
> > +	list_del_rcu(&nfs_uuid->list);
> > +	mutex_unlock(&nfs_uuid_mutex);
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_uuid_end);
> > +
> > +/* Must be called with RCU read lock held. */
> > +static nfs_uuid_t * nfs_uuid_lookup(const uuid_t *uuid)
> > +{
> > +	nfs_uuid_t *nfs_uuid;
> > +
> > +	list_for_each_entry_rcu(nfs_uuid, &nfs_uuids, list)
> > +		if (uuid_equal(&nfs_uuid->uuid, uuid))
> > +			return nfs_uuid;
> > +
> > +	return NULL;
> > +}
> > +
> > +bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *dom)
> > +{
> > +	bool is_local = false;
> > +	nfs_uuid_t *nfs_uuid;
> > +
> > +	rcu_read_lock();
> > +	nfs_uuid = nfs_uuid_lookup(uuid);
> > +	if (nfs_uuid) {
> > +		nfs_uuid->net = maybe_get_net(net);
> > +		if (nfs_uuid->net) {
> > +			is_local = true;
> > +			kref_get(&dom->ref);
> > +			nfs_uuid->dom = dom;
> > +		}
> > +	}
> > +	rcu_read_unlock();
> > +
> > +	return is_local;
> > +}
> > +EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> > diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> > new file mode 100644
> > index 000000000000..9735ae8d3e5e
> > --- /dev/null
> > +++ b/include/linux/nfslocalio.h
> > @@ -0,0 +1,31 @@
> > +/* SPDX-License-Identifier: GPL-2.0 */
> > +/*
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +#ifndef __LINUX_NFSLOCALIO_H
> > +#define __LINUX_NFSLOCALIO_H
> > +
> > +#include <linux/list.h>
> > +#include <linux/uuid.h>
> > +#include <linux/sunrpc/svcauth.h>
> > +#include <linux/nfs.h>
> > +#include <net/net_namespace.h>
> > +
> > +/*
> > + * Useful to allow a client to negotiate if localio
> > + * possible with its server.
> > + *
> > + * See Documentation/filesystems/nfs/localio.rst for more detail.
> > + */
> > +typedef struct {
> > +	uuid_t uuid;
> > +	struct list_head list;
> > +	struct net *net; /* nfsd's network namespace */
> > +	struct auth_domain *dom; /* auth_domain for localio */
> > +} nfs_uuid_t;
> > +
> > +void nfs_uuid_begin(nfs_uuid_t *);
> > +void nfs_uuid_end(nfs_uuid_t *);
> > +bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
> > +
> > +#endif  /* __LINUX_NFSLOCALIO_H */
> 
> -- 
> Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-29  1:04 ` [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces Mike Snitzer
@ 2024-08-29 16:40   ` Jeff Layton
  2024-08-29 16:52     ` Mike Snitzer
  2024-08-30  5:46   ` NeilBrown
  1 sibling, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 16:40 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> Introduce struct nfs_localio_ctx and the interfaces
> nfs_localio_ctx_alloc() and nfs_localio_ctx_free().  The next commit
> will introduce nfsd_open_local_fh() which returns a nfs_localio_ctx
> structure.
> 
> Also, expose localio's required NFSD symbols to NFS client:
> - Cache nfsd_open_local_fh symbol and other required NFSD symbols in a
>   globally accessible 'nfs_to' nfs_to_nfsd_t struct.  Add interfaces
>   get_nfs_to_nfsd_symbols() and put_nfs_to_nfsd_symbols() to allow
>   each NFS client to take a reference on NFSD symbols.
> 
> - Apologies for the DEFINE_NFS_TO_NFSD_SYMBOL macro that makes
>   defining get_##NFSD_SYMBOL() and put_##NFSD_SYMBOL() functions far
>   simpler (and avoids cut-n-paste bugs, which is what motivated the
>   development and use of a macro for this). But as C macros go it is a
>   very simple one and there are many like it all over the kernel.
> 
> - Given the unique nature of NFS LOCALIO being an optional feature
>   that when used requires NFS share access to NFSD memory: a unique
>   bridging of NFSD resources to NFS (via nfs_common) is needed.  But
>   that bridge must be dynamic, hence the use of symbol_request() and
>   symbol_put().  Proposed ideas to accomplish the same without using
>   symbol_{request,put} would be far more tedious to implement and
>   very likely no easier to review.  Anyway: sorry NeilBrown...
> 
> - Despite the use of indirect function calls, caching these nfsd
>   symbols for use by the client offers a ~10% performance win
>   (compared to always doing get+call+put) for high IOPS workloads.
> 
> - Introduce nfsd_file_file() wrapper that provides access to
>   nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
>   client (as suggested by Jeff Layton).
> 
> - The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
>   symbols prepares for the NFS client to use nfsd_file for localio.
> 
> Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
> Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfs_common/nfslocalio.c | 159 +++++++++++++++++++++++++++++++++++++
>  fs/nfsd/filecache.c        |  25 ++++++
>  fs/nfsd/filecache.h        |   1 +
>  fs/nfsd/nfssvc.c           |   5 ++
>  include/linux/nfslocalio.h |  38 +++++++++
>  5 files changed, 228 insertions(+)
> 
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 1a35a4a6dbe0..cc30fdb0cb46 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -72,3 +72,162 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
>  	return is_local;
>  }
>  EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> +
> +/*
> + * The nfs localio code needs to call into nfsd using various symbols (below),
> + * but cannot be statically linked, because that will make the nfs module
> + * depend on the nfsd module.
> + *
> + * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
> + * nfs_common module will only hold a reference on nfsd when localio is in use.
> + * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> + */
> +static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
> +nfs_to_nfsd_t nfs_to;
> +EXPORT_SYMBOL_GPL(nfs_to);
> +
> +/* Macro to define nfs_to get and put methods, avoids copy-n-paste bugs */
> +#define DEFINE_NFS_TO_NFSD_SYMBOL(NFSD_SYMBOL)		\
> +static nfs_to_##NFSD_SYMBOL##_t get_##NFSD_SYMBOL(void)	\
> +{							\
> +	return symbol_request(NFSD_SYMBOL);		\
> +}							\
> +static void put_##NFSD_SYMBOL(void)			\
> +{							\
> +	symbol_put(NFSD_SYMBOL);			\
> +	nfs_to.NFSD_SYMBOL = NULL;			\
> +}
> +
> +/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
> +extern struct nfs_localio_ctx *
> +nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
> +		   const struct cred *, const struct nfs_fh *, const fmode_t);
> +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
> +
> +/* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
> +extern struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_get);
> +
> +/* The nfs localio code needs to call into nfsd to release the nfsd_file */
> +extern void nfsd_file_put(struct nfsd_file *nf);
> +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_put);
> +
> +/* The nfs localio code needs to call into nfsd to access the nf->nf_file */
> +extern struct file * nfsd_file_file(struct nfsd_file *nf);
> +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_file);
> +
> +/* The nfs localio code needs to call into nfsd to release nn->nfsd_serv */
> +extern void nfsd_serv_put(struct nfsd_net *nn);
> +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_serv_put);
> +#undef DEFINE_NFS_TO_NFSD_SYMBOL
> +

I have the same concerns as Neil did with this patch in v13. An ops
structure that nfsd registers with nfs_common and that has pointers to
all of these functions would be a lot cleaner. I think it'll end up
being less code too.

In fact, for that I'd probably break my usual guideline of not
introducing new interfaces without callers, and just do a separate
patch that adds the ops structure and sets up the handling of the
pointer to it in nfs_common.
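
To make the suggestion concrete, here is a rough userspace sketch of
the ops-structure shape. Everything in it is hypothetical: the struct
nfsd_localio_ops name, the localio_register_ops() and
localio_unregister_ops() helpers, and the stub server functions are
illustrative only and do not come from the series. A real version
would also need locking or RCU around the pointer, and a module
reference so that nfsd cannot be unloaded while the ops are in use.

```c
#include <stddef.h>

/* Opaque to the client side, as in the patch. */
struct nfsd_file;
struct nfsd_net;

/* Hypothetical ops table that the server hands to the common module. */
struct nfsd_localio_ops {
	struct nfsd_file *(*file_get)(struct nfsd_file *nf);
	void (*file_put)(struct nfsd_file *nf);
	void (*serv_put)(struct nfsd_net *nn);
};

/* Lives in the common module; NULL while the server side is absent. */
static const struct nfsd_localio_ops *localio_ops;

/* Called by the server on load; returns -1 if ops are already registered. */
static int localio_register_ops(const struct nfsd_localio_ops *ops)
{
	if (localio_ops)
		return -1;
	localio_ops = ops;
	return 0;
}

/* Called by the server on unload. */
static void localio_unregister_ops(void)
{
	localio_ops = NULL;
}

/* Client-side wrappers: give up on localio when no server is registered. */
static struct nfsd_file *localio_file_get(struct nfsd_file *nf)
{
	const struct nfsd_localio_ops *ops = localio_ops;

	return ops ? ops->file_get(nf) : NULL;
}

static int localio_file_put(struct nfsd_file *nf)
{
	const struct nfsd_localio_ops *ops = localio_ops;

	if (!ops)
		return -1;
	ops->file_put(nf);
	return 0;
}

/* Stub "server" side, only so the sketch can be exercised. */
struct nfsd_file { int refs; };

static struct nfsd_file *stub_file_get(struct nfsd_file *nf)
{
	nf->refs++;
	return nf;
}

static void stub_file_put(struct nfsd_file *nf)
{
	nf->refs--;
}

static void stub_serv_put(struct nfsd_net *nn)
{
	(void)nn; /* nothing to release in the stub */
}

static const struct nfsd_localio_ops stub_ops = {
	.file_get = stub_file_get,
	.file_put = stub_file_put,
	.serv_put = stub_serv_put,
};
```

With this shape there is a single registration point and a single
pointer to check, instead of one symbol_request()/symbol_put() pair per
function.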

> +static struct kmem_cache *nfs_localio_ctx_cache;
> +
> +struct nfs_localio_ctx *nfs_localio_ctx_alloc(void)
> +{
> +	return kmem_cache_alloc(nfs_localio_ctx_cache,
> +				GFP_KERNEL | __GFP_ZERO);
> +}
> +EXPORT_SYMBOL_GPL(nfs_localio_ctx_alloc);
> +
> +void nfs_localio_ctx_free(struct nfs_localio_ctx *localio)
> +{
> +	if (localio->nf)
> +		nfs_to.nfsd_file_put(localio->nf);
> +	if (localio->nn)
> +		nfs_to.nfsd_serv_put(localio->nn);
> +	kmem_cache_free(nfs_localio_ctx_cache, localio);
> +}
> +EXPORT_SYMBOL_GPL(nfs_localio_ctx_free);
> +
> +bool get_nfs_to_nfsd_symbols(void)
> +{
> +	spin_lock(&nfs_to_nfsd_lock);
> +
> +	/* Only get symbols on first reference */
> +	if (refcount_read(&nfs_to.ref) == 0)
> +		refcount_set(&nfs_to.ref, 1);
> +	else {
> +		refcount_inc(&nfs_to.ref);
> +		spin_unlock(&nfs_to_nfsd_lock);
> +		return true;
> +	}
> +
> +	nfs_to.nfsd_open_local_fh = get_nfsd_open_local_fh();
> +	if (!nfs_to.nfsd_open_local_fh)
> +		goto out_nfsd_open_local_fh;
> +
> +	nfs_to.nfsd_file_get = get_nfsd_file_get();
> +	if (!nfs_to.nfsd_file_get)
> +		goto out_nfsd_file_get;
> +
> +	nfs_to.nfsd_file_put = get_nfsd_file_put();
> +	if (!nfs_to.nfsd_file_put)
> +		goto out_nfsd_file_put;
> +
> +	nfs_to.nfsd_file_file = get_nfsd_file_file();
> +	if (!nfs_to.nfsd_file_file)
> +		goto out_nfsd_file_file;
> +
> +	nfs_to.nfsd_serv_put = get_nfsd_serv_put();
> +	if (!nfs_to.nfsd_serv_put)
> +		goto out_nfsd_serv_put;
> +
> +	spin_unlock(&nfs_to_nfsd_lock);
> +	return true;
> +
> +out_nfsd_serv_put:
> +	put_nfsd_file_file();
> +out_nfsd_file_file:
> +	put_nfsd_file_put();
> +out_nfsd_file_put:
> +	put_nfsd_file_get();
> +out_nfsd_file_get:
> +	put_nfsd_open_local_fh();
> +out_nfsd_open_local_fh:
> +	spin_unlock(&nfs_to_nfsd_lock);
> +	return false;
> +}
> +EXPORT_SYMBOL_GPL(get_nfs_to_nfsd_symbols);
> +
> +void put_nfs_to_nfsd_symbols(void)
> +{
> +	spin_lock(&nfs_to_nfsd_lock);
> +
> +	if (!refcount_dec_and_test(&nfs_to.ref))
> +		goto out;
> +
> +	put_nfsd_open_local_fh();
> +	put_nfsd_file_get();
> +	put_nfsd_file_put();
> +	put_nfsd_file_file();
> +	put_nfsd_serv_put();
> +out:
> +	spin_unlock(&nfs_to_nfsd_lock);
> +}
> +EXPORT_SYMBOL_GPL(put_nfs_to_nfsd_symbols);
> +
> +static int __init nfslocalio_init(void)
> +{
> +	refcount_set(&nfs_to.ref, 0);
> +
> +	nfs_to.nfsd_open_local_fh = NULL;
> +	nfs_to.nfsd_file_get = NULL;
> +	nfs_to.nfsd_file_put = NULL;
> +	nfs_to.nfsd_file_file = NULL;
> +	nfs_to.nfsd_serv_put = NULL;
> +
> +	nfs_localio_ctx_cache = KMEM_CACHE(nfs_localio_ctx, 0);
> +	if (!nfs_localio_ctx_cache)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static void __exit nfslocalio_exit(void)
> +{
> +	kmem_cache_destroy(nfs_localio_ctx_cache);
> +}
> +
> +module_init(nfslocalio_init);
> +module_exit(nfslocalio_exit);
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 2dc72de31f61..a83d469bca6b 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -39,6 +39,7 @@
>  #include <linux/fsnotify.h>
>  #include <linux/seq_file.h>
>  #include <linux/rhashtable.h>
> +#include <linux/nfslocalio.h>
>  
>  #include "vfs.h"
>  #include "nfsd.h"
> @@ -345,6 +346,10 @@ nfsd_file_get(struct nfsd_file *nf)
>  		return nf;
>  	return NULL;
>  }
> +EXPORT_SYMBOL_GPL(nfsd_file_get);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_file_get_t __maybe_unused nfsd_file_get_typecheck = nfsd_file_get;
>  
>  /**
>   * nfsd_file_put - put the reference to a nfsd_file
> @@ -389,6 +394,26 @@ nfsd_file_put(struct nfsd_file *nf)
>  	if (refcount_dec_and_test(&nf->nf_ref))
>  		nfsd_file_free(nf);
>  }
> +EXPORT_SYMBOL_GPL(nfsd_file_put);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_file_put_t __maybe_unused nfsd_file_put_typecheck = nfsd_file_put;
> +
> +/**
> + * nfsd_file_file - get the backing file of an nfsd_file
> + * @nf: nfsd_file of which to access the backing file.
> + *
> + * Return backing file for @nf.
> + */
> +struct file *
> +nfsd_file_file(struct nfsd_file *nf)
> +{
> +	return nf->nf_file;
> +}
> +EXPORT_SYMBOL_GPL(nfsd_file_file);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_file_file_t __maybe_unused nfsd_file_file_typecheck = nfsd_file_file;
>  
>  static void
>  nfsd_file_dispose_list(struct list_head *dispose)
> diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
> index 26ada78b8c1e..6fbbb2e32e95 100644
> --- a/fs/nfsd/filecache.h
> +++ b/fs/nfsd/filecache.h
> @@ -56,6 +56,7 @@ int nfsd_file_cache_start_net(struct net *net);
>  void nfsd_file_cache_shutdown_net(struct net *net);
>  void nfsd_file_put(struct nfsd_file *nf);
>  struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> +struct file *nfsd_file_file(struct nfsd_file *nf);
>  void nfsd_file_close_inode_sync(struct inode *inode);
>  void nfsd_file_net_dispose(struct nfsd_net *nn);
>  bool nfsd_file_is_cached(struct inode *inode);
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index c639fbe4d8c2..13c69aa40d1c 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -19,6 +19,7 @@
>  #include <linux/sunrpc/svc_xprt.h>
>  #include <linux/lockd/bind.h>
>  #include <linux/nfsacl.h>
> +#include <linux/nfslocalio.h>
>  #include <linux/seq_file.h>
>  #include <linux/inetdevice.h>
>  #include <net/addrconf.h>
> @@ -201,6 +202,10 @@ void nfsd_serv_put(struct nfsd_net *nn)
>  {
>  	percpu_ref_put(&nn->nfsd_serv_ref);
>  }
> +EXPORT_SYMBOL_GPL(nfsd_serv_put);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_serv_put_t __maybe_unused nfsd_serv_put_typecheck = nfsd_serv_put;
>  
>  static void nfsd_serv_done(struct percpu_ref *ref)
>  {
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index 9735ae8d3e5e..68f5b39f1940 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -7,6 +7,8 @@
>  
>  #include <linux/list.h>
>  #include <linux/uuid.h>
> +#include <linux/refcount.h>
> +#include <linux/sunrpc/clnt.h>
>  #include <linux/sunrpc/svcauth.h>
>  #include <linux/nfs.h>
>  #include <net/net_namespace.h>
> @@ -28,4 +30,40 @@ void nfs_uuid_begin(nfs_uuid_t *);
>  void nfs_uuid_end(nfs_uuid_t *);
>  bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
>  
> +struct nfsd_file;
> +struct nfsd_net;
> +
> +struct nfs_localio_ctx {
> +	struct nfsd_file *nf;
> +	struct nfsd_net *nn;
> +};
> +
> +typedef struct nfs_localio_ctx *
> +(*nfs_to_nfsd_open_local_fh_t)(struct net *, struct auth_domain *,
> +			       struct rpc_clnt *, const struct cred *,
> +			       const struct nfs_fh *, const fmode_t);
> +typedef struct nfsd_file * (*nfs_to_nfsd_file_get_t)(struct nfsd_file *);
> +typedef void (*nfs_to_nfsd_file_put_t)(struct nfsd_file *);
> +typedef struct file * (*nfs_to_nfsd_file_file_t)(struct nfsd_file *);
> +typedef unsigned int (*nfs_to_nfsd_net_id_value_t)(void);
> +typedef void (*nfs_to_nfsd_serv_put_t)(struct nfsd_net *);
> +
> +typedef struct {
> +	refcount_t			ref;
> +	nfs_to_nfsd_open_local_fh_t	nfsd_open_local_fh;
> +	nfs_to_nfsd_file_get_t		nfsd_file_get;
> +	nfs_to_nfsd_file_put_t		nfsd_file_put;
> +	nfs_to_nfsd_file_file_t		nfsd_file_file;
> +	nfs_to_nfsd_net_id_value_t	nfsd_net_id_value;
> +	nfs_to_nfsd_serv_put_t		nfsd_serv_put;
> +} nfs_to_nfsd_t;
> +
> +extern nfs_to_nfsd_t nfs_to;
> +
> +bool get_nfs_to_nfsd_symbols(void);
> +void put_nfs_to_nfsd_symbols(void);
> +
> +struct nfs_localio_ctx *nfs_localio_ctx_alloc(void);
> +void nfs_localio_ctx_free(struct nfs_localio_ctx *);
> +
>  #endif  /* __LINUX_NFSLOCALIO_H */

-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v14 16/25] nfsd: add localio support
  2024-08-29  1:04 ` [PATCH v14 16/25] nfsd: add localio support Mike Snitzer
  2024-08-29 16:01   ` Chuck Lever
@ 2024-08-29 16:49   ` Jeff Layton
  2024-08-29 16:59     ` Mike Snitzer
  1 sibling, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 16:49 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <dros@primarydata.com>
> 
> Add server support for bypassing NFS for localhost reads, writes, and
> commits. This is only useful when both the client and server are
> running on the same host.
> 
> If nfsd_open_local_fh() fails then the NFS client will both retry and
> fall back to normal network-based read, write and commit operations if
> localio is no longer supported.
> 
> Care is taken to ensure the same NFS security mechanisms are used
> (authentication, etc) regardless of whether localio or regular NFS
> access is used.  The auth_domain established as part of the traditional
> NFS client access to the NFS server is also used for localio.  Store
> auth_domain for localio in nfs_uuid_t and transfer it to the client
> if it is local to the server.
> 
> Relative to containers, localio gives the client access to the network
> namespace the server has.  This is required to allow the client to
> access the server's per-namespace nfsd_net struct.
> 
> CONFIG_NFSD_LOCALIO controls the server enablement for localio.
> A later commit will add CONFIG_NFS_LOCALIO to allow the client
> enablement.

Do we need separate CONFIG options? Surely if you have one, you'll
always want the other.

> 
> This commit also introduces the use of nfsd's percpu_ref to interlock
> nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
> not destroyed while in use by nfsd_open_local_fh, and warrants a more
> detailed explanation:
> 
> nfsd_open_local_fh uses nfsd_serv_try_get before opening its file
> handle and then the reference must be dropped by the caller using
> nfsd_serv_put (via nfs_localio_ctx_free).
> 
> This "interlock" working relies heavily on nfsd_open_local_fh()'s
> maybe_get_net() safely dealing with the possibility that the struct
> net (and nfsd_net by association) may have been destroyed by
> nfsd_destroy_serv() via nfsd_shutdown_net().
> 
> Verified to fix an easy to hit crash that would occur if an nfsd
> instance running in a container, with a localio client mounted, is
> shut down. Upon restart of the container and associated nfsd the client
> would go on to crash due to NULL pointer dereference that occurred due
> to the nfs client's localio attempting to nfsd_open_local_fh(), using
> nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
> 

Maybe transplant a version of the above 4 paragraphs to the patch that
adds the percpu_ref handling?
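
As a reading aid for the interlock described in those paragraphs, a
small userspace sketch of the try-get/put pattern. It is an analogy
only: struct serv_ref and the serv_* helpers are hypothetical names, a
plain C11 atomic counter stands in for the kernel's percpu_ref, and the
assumption is that nfsd_serv_try_get() refuses new references once
shutdown has begun.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reference guarding nn->nfsd_serv. */
struct serv_ref {
	atomic_int count;	/* starts at 1: the server's own reference */
	atomic_bool dying;	/* set once shutdown has begun */
};

static void serv_ref_init(struct serv_ref *r)
{
	atomic_init(&r->count, 1);
	atomic_init(&r->dying, false);
}

/*
 * Take a reference only if the server is still live. A caller that
 * fails here must not touch the server and falls back to normal NFS.
 */
static bool serv_try_get(struct serv_ref *r)
{
	int old;

	if (atomic_load(&r->dying))
		return false;

	old = atomic_load(&r->count);
	while (old > 0) {
		/* On failure, old is reloaded with the current count. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Returns true when the last reference is gone and teardown may proceed. */
static bool serv_put(struct serv_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}

/* Shutdown: refuse new users, then drop the server's own reference. */
static bool serv_kill(struct serv_ref *r)
{
	atomic_store(&r->dying, true);
	return serv_put(r);
}
```

In the sketch, teardown is allowed only once serv_kill() or a later
serv_put() returns true, which mirrors the requirement that
nn->nfsd_serv is not destroyed while a localio open still holds a
reference.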


> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/Kconfig          |   3 ++
>  fs/nfsd/Kconfig     |  16 +++++++
>  fs/nfsd/Makefile    |   1 +
>  fs/nfsd/filecache.c |   2 +-
>  fs/nfsd/localio.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfsd/trace.h     |   3 +-
>  fs/nfsd/vfs.h       |   7 +++
>  7 files changed, 135 insertions(+), 2 deletions(-)
>  create mode 100644 fs/nfsd/localio.c
> 
> diff --git a/fs/Kconfig b/fs/Kconfig
> index a46b0cbc4d8f..1b8a5edbddff 100644
> --- a/fs/Kconfig
> +++ b/fs/Kconfig
> @@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
>  	tristate
>  	select FS_POSIX_ACL
>  
> +config NFS_COMMON_LOCALIO_SUPPORT
> +	bool
> +
>  config NFS_COMMON
>  	bool
>  	depends on NFSD || NFS_FS || LOCKD
> diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> index c0bd1509ccd4..e6fa7eaa1db0 100644
> --- a/fs/nfsd/Kconfig
> +++ b/fs/nfsd/Kconfig
> @@ -90,6 +90,22 @@ config NFSD_V4
>  
>  	  If unsure, say N.
>  
> +config NFSD_LOCALIO
> +	bool "NFS server support for the LOCALIO auxiliary protocol"
> +	depends on NFSD
> +	select NFS_COMMON_LOCALIO_SUPPORT
> +	default n
> +	help
> +	  Some NFS servers support an auxiliary NFS LOCALIO protocol
> +	  that is not an official part of the NFS protocol.
> +
> +	  This option enables support for the LOCALIO protocol in the
> +	  kernel's NFS server.  Enable this to permit local NFS clients
> +	  to bypass the network when issuing reads and writes to the
> +	  local NFS server.
> +
> +	  If unsure, say N.
> +
>  config NFSD_PNFS
>  	bool
>  
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index b8736a82e57c..78b421778a79 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
>  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
>  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
>  nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index a83d469bca6b..49f4aab3208a 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -53,7 +53,7 @@
>  #define NFSD_FILE_CACHE_UP		     (0)
>  
>  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> -#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
> +#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
>  
>  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
>  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> new file mode 100644
> index 000000000000..4b65c66be129
> --- /dev/null
> +++ b/fs/nfsd/localio.c
> @@ -0,0 +1,105 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * NFS server support for local clients to bypass network stack
> + *
> + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> + */
> +
> +#include <linux/exportfs.h>
> +#include <linux/sunrpc/svcauth.h>
> +#include <linux/sunrpc/clnt.h>
> +#include <linux/nfs.h>
> +#include <linux/nfs_common.h>
> +#include <linux/nfslocalio.h>
> +#include <linux/string.h>
> +
> +#include "nfsd.h"
> +#include "vfs.h"
> +#include "netns.h"
> +#include "filecache.h"
> +
> +/**
> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
> + *
> + * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
> + * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access
> + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
> + * @cred: cred that the client established
> + * @nfs_fh: filehandle to lookup
> + * @fmode: fmode_t to use for open
> + *
> + * This function maps a local fh to a path on a local filesystem.
> + * This is useful when the nfs client has the local server mounted - it can
> + * avoid all the NFS overhead with reads, writes and commits.
> + *
> + * On successful return, returned nfs_localio_ctx will have its nfsd_file and
> + * nfsd_net members set. Caller is responsible for calling nfsd_file_put and
> + * nfsd_serv_put (via nfs_localio_ctx_free).
> + */
> +struct nfs_localio_ctx *
> +nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
> +		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> +		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> +{
> +	int mayflags = NFSD_MAY_LOCALIO;
> +	int status = 0;
> +	struct nfsd_net *nn;
> +	struct svc_cred rq_cred;
> +	struct svc_fh fh;
> +	struct nfs_localio_ctx *localio;
> +	__be32 beres;
> +
> +	if (nfs_fh->size > NFS4_FHSIZE)
> +		return ERR_PTR(-EINVAL);
> +
> +	localio = nfs_localio_ctx_alloc();
> +	if (!localio)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
> +	 * But the server may already be shutting down, if so disallow new localio.
> +	 */
> +	nn = net_generic(cl_nfssvc_net, nfsd_net_id);
> +	if (unlikely(!nfsd_serv_try_get(nn))) {
> +		status = -ENXIO;
> +		goto out_nfsd_serv;
> +	}
> +
> +	/* nfs_fh -> svc_fh */
> +	fh_init(&fh, NFS4_FHSIZE);
> +	fh.fh_handle.fh_size = nfs_fh->size;
> +	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> +
> +	if (fmode & FMODE_READ)
> +		mayflags |= NFSD_MAY_READ;
> +	if (fmode & FMODE_WRITE)
> +		mayflags |= NFSD_MAY_WRITE;
> +
> +	svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
> +
> +	beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, cl_nfssvc_dom,
> +					&fh, mayflags, &localio->nf);
> +	if (beres) {
> +		status = nfs_stat_to_errno(be32_to_cpu(beres));
> +		goto out_fh_put;
> +	}
> +	localio->nn = nn;
> +
> +out_fh_put:
> +	fh_put(&fh);
> +	if (rq_cred.cr_group_info)
> +		put_group_info(rq_cred.cr_group_info);
> +out_nfsd_serv:
> +	if (status) {
> +		nfs_localio_ctx_free(localio);
> +		return ERR_PTR(status);
> +	}
> +	return localio;
> +}
> +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index d22027e23761..82bcefcd1f21 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
>  		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" },	\
>  		{ NFSD_MAY_BYPASS_GSS,		"BYPASS_GSS" },		\
>  		{ NFSD_MAY_READ_IF_EXEC,	"READ_IF_EXEC" },	\
> -		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" })
> +		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" },	\
> +		{ NFSD_MAY_LOCALIO,		"LOCALIO" })
>  
>  TRACE_EVENT(nfsd_compound,
>  	TP_PROTO(
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 01947561d375..e12310dd5f4c 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -33,6 +33,8 @@
>  
>  #define NFSD_MAY_64BIT_COOKIE		0x1000 /* 64 bit readdir cookies for >= NFSv3 */
>  
> +#define NFSD_MAY_LOCALIO		0x2000 /* for tracing, reflects when localio used */
> +
>  #define NFSD_MAY_CREATE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE)
>  #define NFSD_MAY_REMOVE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
>  
> @@ -158,6 +160,11 @@ __be32		nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
>  
>  void		nfsd_filp_close(struct file *fp);
>  
> +struct nfs_localio_ctx *
> +nfsd_open_local_fh(struct net *, struct auth_domain *,
> +		   struct rpc_clnt *, const struct cred *,
> +		   const struct nfs_fh *, const fmode_t);
> +
>  static inline int fh_want_write(struct svc_fh *fh)
>  {
>  	int ret;

Reviewed-by: Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v14 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM
  2024-08-29  1:04 ` [PATCH v14 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-08-29 16:50   ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 16:50 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
> RPC method that allows the Linux NFS client to verify the local Linux
> NFS server can see the nonce (single-use UUID) the client generated and
> made available in nfs_common.  The server expects this protocol to use
> the same transport as NFS and NFSACL for its RPCs.  This protocol
> isn't part of an IETF standard, nor does it need to be considering it
> is Linux-to-Linux auxiliary RPC protocol that amounts to an
> implementation detail.
> 
> The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
> the fixed UUID_SIZE (16 bytes).  The fixed size opaque encode and decode
> XDR methods are used instead of the less efficient variable sized
> methods.
> 
> The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
> by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
> Linux Kernel Organization       400122  nfslocalio
> 
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> [neilb: factored out and simplified single localio protocol]
> Co-developed-by: NeilBrown <neil@brown.name>
> Signed-off-by: NeilBrown <neil@brown.name>
> ---
>  fs/nfsd/localio.c   | 75 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/nfsd/nfsd.h      |  4 +++
>  fs/nfsd/nfssvc.c    | 23 +++++++++++++-
>  include/linux/nfs.h |  7 +++++
>  4 files changed, 108 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> index 4b65c66be129..a192bbe308df 100644
> --- a/fs/nfsd/localio.c
> +++ b/fs/nfsd/localio.c
> @@ -13,12 +13,15 @@
>  #include <linux/nfs.h>
>  #include <linux/nfs_common.h>
>  #include <linux/nfslocalio.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_xdr.h>
>  #include <linux/string.h>
>  
>  #include "nfsd.h"
>  #include "vfs.h"
>  #include "netns.h"
>  #include "filecache.h"
> +#include "cache.h"
>  
>  /**
>   * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
> @@ -103,3 +106,75 @@ EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
>  
>  /* Compile time type checking, not used by anything */
>  static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> +
> +/*
> + * UUID_IS_LOCAL XDR functions
> + */
> +
> +static __be32 localio_proc_null(struct svc_rqst *rqstp)
> +{
> +	return rpc_success;
> +}
> +
> +struct localio_uuidarg {
> +	uuid_t			uuid;
> +};
> +
> +static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
> +{
> +	struct localio_uuidarg *argp = rqstp->rq_argp;
> +
> +	(void) nfs_uuid_is_local(&argp->uuid, SVC_NET(rqstp),
> +				 rqstp->rq_client);
> +
> +	return rpc_success;
> +}
> +
> +static bool localio_decode_uuidarg(struct svc_rqst *rqstp,
> +				   struct xdr_stream *xdr)
> +{
> +	struct localio_uuidarg *argp = rqstp->rq_argp;
> +	u8 uuid[UUID_SIZE];
> +
> +	if (decode_opaque_fixed(xdr, uuid, UUID_SIZE))
> +		return false;
> +	import_uuid(&argp->uuid, uuid);
> +
> +	return true;
> +}
> +
> +static const struct svc_procedure localio_procedures1[] = {
> +	[LOCALIOPROC_NULL] = {
> +		.pc_func = localio_proc_null,
> +		.pc_decode = nfssvc_decode_voidarg,
> +		.pc_encode = nfssvc_encode_voidres,
> +		.pc_argsize = sizeof(struct nfsd_voidargs),
> +		.pc_ressize = sizeof(struct nfsd_voidres),
> +		.pc_cachetype = RC_NOCACHE,
> +		.pc_xdrressize = 0,
> +		.pc_name = "NULL",
> +	},
> +	[LOCALIOPROC_UUID_IS_LOCAL] = {
> +		.pc_func = localio_proc_uuid_is_local,
> +		.pc_decode = localio_decode_uuidarg,
> +		.pc_encode = nfssvc_encode_voidres,
> +		.pc_argsize = sizeof(struct localio_uuidarg),
> +		.pc_argzero = sizeof(struct localio_uuidarg),
> +		.pc_ressize = sizeof(struct nfsd_voidres),
> +		.pc_cachetype = RC_NOCACHE,
> +		.pc_name = "UUID_IS_LOCAL",
> +	},
> +};
> +
> +#define LOCALIO_NR_PROCEDURES ARRAY_SIZE(localio_procedures1)
> +static DEFINE_PER_CPU_ALIGNED(unsigned long,
> +			      localio_count[LOCALIO_NR_PROCEDURES]);
> +const struct svc_version localio_version1 = {
> +	.vs_vers	= 1,
> +	.vs_nproc	= LOCALIO_NR_PROCEDURES,
> +	.vs_proc	= localio_procedures1,
> +	.vs_dispatch	= nfsd_dispatch,
> +	.vs_count	= localio_count,
> +	.vs_xdrsize	= XDR_QUADLEN(UUID_SIZE),
> +	.vs_hidden	= true,
> +};
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index b0d3e82d6dcd..232a873dc53a 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -146,6 +146,10 @@ extern const struct svc_version nfsd_acl_version3;
>  #endif
>  #endif
>  
> +#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> +extern const struct svc_version localio_version1;
> +#endif
> +
>  struct nfsd_net;
>  
>  enum vers_op {NFSD_SET, NFSD_CLEAR, NFSD_TEST, NFSD_AVAIL };
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 13c69aa40d1c..eec4a9803c4a 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -80,6 +80,15 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
>  unsigned long	nfsd_drc_max_mem;
>  unsigned long	nfsd_drc_mem_used;
>  
> +#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> +static const struct svc_version *localio_versions[] = {
> +	[1] = &localio_version1,
> +};
> +
> +#define NFSD_LOCALIO_NRVERS		ARRAY_SIZE(localio_versions)
> +
> +#endif /* CONFIG_NFSD_LOCALIO */
> +
>  #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
>  static const struct svc_version *nfsd_acl_version[] = {
>  # if defined(CONFIG_NFSD_V2_ACL)
> @@ -128,6 +137,18 @@ struct svc_program		nfsd_programs[] = {
>  	.pg_rpcbind_set		= nfsd_acl_rpcbind_set,
>  	},
>  #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
> +#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> +	{
> +	.pg_prog		= NFS_LOCALIO_PROGRAM,
> +	.pg_nvers		= NFSD_LOCALIO_NRVERS,
> +	.pg_vers		= localio_versions,
> +	.pg_name		= "nfslocalio",
> +	.pg_class		= "nfsd",
> +	.pg_authenticate	= svc_set_client,
> +	.pg_init_request	= svc_generic_init_request,
> +	.pg_rpcbind_set		= svc_generic_rpcbind_set,
> +	}
> +#endif /* IS_ENABLED(CONFIG_NFSD_LOCALIO) */
>  };
>  
>  bool nfsd_support_version(int vers)
> @@ -949,7 +970,7 @@ nfsd(void *vrqstp)
>  }
>  
>  /**
> - * nfsd_dispatch - Process an NFS or NFSACL Request
> + * nfsd_dispatch - Process an NFS or NFSACL or LOCALIO Request
>   * @rqstp: incoming request
>   *
>   * This RPC dispatcher integrates the NFS server's duplicate reply cache.
> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
> index ceb70a926b95..5ff1a5b3b00c 100644
> --- a/include/linux/nfs.h
> +++ b/include/linux/nfs.h
> @@ -13,6 +13,13 @@
>  #include <linux/crc32.h>
>  #include <uapi/linux/nfs.h>
>  
> +/* The localio program is entirely private to Linux and is
> + * NOT part of the uapi.
> + */
> +#define NFS_LOCALIO_PROGRAM		400122
> +#define LOCALIOPROC_NULL		0
> +#define LOCALIOPROC_UUID_IS_LOCAL	1
> +
>  /*
>   * This is the kernel NFS client file handle representation
>   */

Reviewed-by: Jeff Layton <jlayton@kernel.org>
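
For readers following the XDR point in the commit message above: a fixed-size opaque carries no length word on the wire, only the bytes padded to a 4-byte boundary, which is why it is cheaper than the variable-sized form for a 16-byte UUID. A minimal userspace sketch, assuming hypothetical helper names (xdr_quadlen, encode_opaque_fixed_sketch, decode_opaque_fixed_sketch) that are not the kernel's actual XDR functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16	/* same value as the kernel's UUID_SIZE */

/* Round a byte count up to whole 4-byte XDR words. */
static size_t xdr_quadlen(size_t nbytes)
{
	return (nbytes + 3) / 4;
}

/*
 * Hypothetical encoder: copy the bytes and zero-pad to a 4-byte
 * boundary.  No length prefix is written because both peers already
 * know the size.  Returns the bytes written, or 0 if buf is too small.
 */
static size_t encode_opaque_fixed_sketch(uint8_t *buf, size_t buflen,
					 const uint8_t *data, size_t len)
{
	size_t padded = xdr_quadlen(len) * 4;

	assert(buf && data);
	if (buflen < padded)
		return 0;
	memcpy(buf, data, len);
	memset(buf + len, 0, padded - len);
	return padded;
}

/* Hypothetical decoder: returns 0 on success, -1 on a short buffer. */
static int decode_opaque_fixed_sketch(const uint8_t *buf, size_t buflen,
				      uint8_t *out, size_t len)
{
	assert(buf && out);
	if (buflen < xdr_quadlen(len) * 4)
		return -1;
	memcpy(out, buf, len);
	return 0;
}
```

With UUID_SIZE at 16 the payload is already word-aligned, so the argument occupies exactly four XDR words and needs no padding.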

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-29 16:40   ` Jeff Layton
@ 2024-08-29 16:52     ` Mike Snitzer
  2024-08-29 17:48       ` Jeff Layton
  0 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 16:52 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 12:40:27PM -0400, Jeff Layton wrote:
> On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > Introduce struct nfs_localio_ctx and the interfaces
> > nfs_localio_ctx_alloc() and nfs_localio_ctx_free().  The next commit
> > will introduce nfsd_open_local_fh() which returns a nfs_localio_ctx
> > structure.
> > 
> > Also, expose localio's required NFSD symbols to NFS client:
> > - Cache nfsd_open_local_fh symbol and other required NFSD symbols in a
> >   globally accessible 'nfs_to' nfs_to_nfsd_t struct.  Add interfaces
> >   get_nfs_to_nfsd_symbols() and put_nfs_to_nfsd_symbols() to allow
> >   each NFS client to take a reference on NFSD symbols.
> > 
> > - Apologies for the DEFINE_NFS_TO_NFSD_SYMBOL macro that makes
> >   defining get_##NFSD_SYMBOL() and put_##NFSD_SYMBOL() functions far
> >   simpler (and avoids cut-n-paste bugs, which is what motivated the
> >   development and use of a macro for this). But as C macros go it is a
> >   very simple one and there are many like it all over the kernel.
> > 
> > - Given the unique nature of NFS LOCALIO being an optional feature
> >   that when used requires NFS share access to NFSD memory: a unique
> >   bridging of NFSD resources to NFS (via nfs_common) is needed.  But
> >   that bridge must be dynamic, hence the use of symbol_request() and
> > >   symbol_put().  Proposed ideas to accomplish the same without using
> >   symbol_{request,put} would be far more tedious to implement and
> >   very likely no easier to review.  Anyway: sorry NeilBrown...
> > 
> > - Despite the use of indirect function calls, caching these nfsd
> >   symbols for use by the client offers a ~10% performance win
> >   (compared to always doing get+call+put) for high IOPS workloads.
> > 
> > - Introduce nfsd_file_file() wrapper that provides access to
> >   nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
> >   client (as suggested by Jeff Layton).
> > 
> > - The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
> >   symbols prepares for the NFS client to use nfsd_file for localio.
> > 
> > Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
> > Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/nfs_common/nfslocalio.c | 159 +++++++++++++++++++++++++++++++++++++
> >  fs/nfsd/filecache.c        |  25 ++++++
> >  fs/nfsd/filecache.h        |   1 +
> >  fs/nfsd/nfssvc.c           |   5 ++
> >  include/linux/nfslocalio.h |  38 +++++++++
> >  5 files changed, 228 insertions(+)
> > 
> > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > index 1a35a4a6dbe0..cc30fdb0cb46 100644
> > --- a/fs/nfs_common/nfslocalio.c
> > +++ b/fs/nfs_common/nfslocalio.c
> > @@ -72,3 +72,162 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
> >  	return is_local;
> >  }
> >  EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> > +
> > +/*
> > + * The nfs localio code needs to call into nfsd using various symbols (below),
> > + * but cannot be statically linked, because that will make the nfs module
> > + * depend on the nfsd module.
> > + *
> > + * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
> > + * nfs_common module will only hold a reference on nfsd when localio is in use.
> > + * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> > + */
> > +static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
> > +nfs_to_nfsd_t nfs_to;
> > +EXPORT_SYMBOL_GPL(nfs_to);
> > +
> > +/* Macro to define nfs_to get and put methods, avoids copy-n-paste bugs */
> > +#define DEFINE_NFS_TO_NFSD_SYMBOL(NFSD_SYMBOL)		\
> > +static nfs_to_##NFSD_SYMBOL##_t get_##NFSD_SYMBOL(void)	\
> > +{							\
> > +	return symbol_request(NFSD_SYMBOL);		\
> > +}							\
> > +static void put_##NFSD_SYMBOL(void)			\
> > +{							\
> > +	symbol_put(NFSD_SYMBOL);			\
> > +	nfs_to.NFSD_SYMBOL = NULL;			\
> > +}
> > +
> > +/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
> > +extern struct nfs_localio_ctx *
> > +nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
> > +		   const struct cred *, const struct nfs_fh *, const fmode_t);
> > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
> > +
> > +/* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
> > +extern struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_get);
> > +
> > +/* The nfs localio code needs to call into nfsd to release the nfsd_file */
> > +extern void nfsd_file_put(struct nfsd_file *nf);
> > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_put);
> > +
> > +/* The nfs localio code needs to call into nfsd to access the nf->nf_file */
> > +extern struct file * nfsd_file_file(struct nfsd_file *nf);
> > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_file);
> > +
> > +/* The nfs localio code needs to call into nfsd to release nn->nfsd_serv */
> > +extern void nfsd_serv_put(struct nfsd_net *nn);
> > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_serv_put);
> > +#undef DEFINE_NFS_TO_NFSD_SYMBOL
> > +
> 
> I have the same concerns as Neil did with this patch in v13. An ops
> structure that nfsd registers with nfs_common and that has pointers to
> all of these functions would be a lot cleaner. I think it'll end up
> being less code too.
> 
> In fact, for that I'd probably break my usual guideline of not
> introducing new interfaces without callers, and just do a separate
> patch that adds the ops structure and sets up the handling of the
> pointer to it in nfs_common.

OK, as much as it pains me to set aside proven code that I put a
decent amount of time into honing: I'll humor you guys and try to make
an ops structure workable. (We can always fall back to my approach if
I/we come up short.)

I'm just concerned about the optional use aspect.  There is the pain
point of how the NFS client comes to _know_ that NFSD is loaded.  Using
symbol_request() deals with that nicely.

I really don't want all calls in NFS client (or nfs_common) to have to
first check if nfs_common's 'nfs_to' ops structure is NULL or not.

But yeah, I'll put more time into it... ;)

Mike
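
To make the ops-structure idea under discussion concrete, here is a rough userspace sketch of a server registering a table of function pointers with a shared slot, and a client wrapper that confines the "is the server loaded?" NULL check to one place. Every name in it (localio_ops_sketch, localio_register_ops, localio_open, the demo_* functions) is hypothetical and not taken from this series. It also ignores the lifetime problem a real kernel implementation must solve, namely keeping the ops table and its module alive between the pointer load and the call (for example with RCU or a module reference).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical ops table: one pointer per function the client needs. */
struct localio_ops_sketch {
	int  (*open_local_fh)(int fh);
	void (*file_put)(int handle);
};

/* Shared slot standing in for 'nfs_to'; NULL means no server registered. */
static _Atomic(const struct localio_ops_sketch *) localio_ops;

/* Called by the server side when it loads. */
static void localio_register_ops(const struct localio_ops_sketch *ops)
{
	assert(ops && ops->open_local_fh && ops->file_put);
	atomic_store(&localio_ops, ops);
}

/* Called by the server side when it unloads. */
static void localio_unregister_ops(void)
{
	atomic_store(&localio_ops, NULL);
}

/*
 * Client-side wrapper: the only place that checks for a missing
 * server.  Returns -1 so the caller can fall back to the network path.
 */
static int localio_open(int fh)
{
	const struct localio_ops_sketch *ops = atomic_load(&localio_ops);

	if (!ops)
		return -1;
	return ops->open_local_fh(fh);
}

/* Stand-in "server" implementation used only for demonstration. */
static int demo_open(int fh)
{
	return fh + 100;
}

static void demo_put(int handle)
{
	(void)handle;
}

static const struct localio_ops_sketch demo_server_ops = {
	.open_local_fh	= demo_open,
	.file_put	= demo_put,
};
```

The design choice illustrated is that callers go through localio_open() rather than dereferencing the table directly, so the NULL test does not have to be repeated at every call site.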

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14.5 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry()
  2024-08-29  1:45   ` [PATCH v14.5 " Mike Snitzer
@ 2024-08-29 16:52     ` Jeff Layton
  0 siblings, 0 replies; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 16:52 UTC (permalink / raw)
  To: Mike Snitzer, linux-nfs
  Cc: Chuck Lever, Anna Schumaker, Trond Myklebust, NeilBrown,
	linux-fsdevel

On Wed, 2024-08-28 at 21:45 -0400, Mike Snitzer wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Currently, fh_verify() makes some daring assumptions about which
> version of file handle the caller wants, based on the things it can
> find in the passed-in rqstp. The about-to-be-introduced LOCALIO use
> case sometimes has no svc_rqst context, so this logic won't work in
> that case.
> 
> Instead, examine the passed-in file handle. Its .max_size field
> should carry information to allow nfsd_set_fh_dentry() to initialize
> the file handle appropriately.
> 
> The file handle used by lockd and the one created by write_filehandle
> never need any of the version-specific fields (which affect things
> like write and getattr requests and pre/post attributes).
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
>  fs/nfsd/nfsfh.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/nfsd/nfsfh.c b/fs/nfsd/nfsfh.c
> index 4b964a71a504..60c2395d7af7 100644
> --- a/fs/nfsd/nfsfh.c
> +++ b/fs/nfsd/nfsfh.c
> @@ -267,20 +267,20 @@ static __be32 nfsd_set_fh_dentry(struct svc_rqst *rqstp, struct svc_fh *fhp)
>  	fhp->fh_dentry = dentry;
>  	fhp->fh_export = exp;
>  
> -	switch (rqstp->rq_vers) {
> -	case 4:
> +	switch (fhp->fh_maxsize) {
> +	case NFS4_FHSIZE:
>  		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOATOMIC_ATTR)
>  			fhp->fh_no_atomic_attr = true;
>  		fhp->fh_64bit_cookies = true;
>  		break;
> -	case 3:
> +	case NFS3_FHSIZE:
>  		if (dentry->d_sb->s_export_op->flags & EXPORT_OP_NOWCC)
>  			fhp->fh_no_wcc = true;
>  		fhp->fh_64bit_cookies = true;
>  		if (exp->ex_flags & NFSEXP_V4ROOT)
>  			goto out;
>  		break;
> -	case 2:
> +	case NFS_FHSIZE:
>  		fhp->fh_no_wcc = true;
>  		if (EX_WGATHER(exp))
>  			fhp->fh_use_wgather = true;

Reviewed-by: Jeff Layton <jlayton@kernel.org>
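
The idea in the patch above, deriving version-specific behaviour from the file handle's maximum size rather than from the request's RPC version, can be sketched in isolation. This is a simplified userspace illustration: the struct and function names are hypothetical, the export-dependent checks (EXPORT_OP_NOWCC, EXPORT_OP_NOATOMIC_ATTR, NFSEXP_V4ROOT, write gathering) are left out, and only the unconditional flag assignments from the diff are modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Maximum file handle sizes, in bytes, per NFS version. */
#define NFS_FHSIZE	32	/* NFSv2 */
#define NFS3_FHSIZE	64	/* NFSv3 */
#define NFS4_FHSIZE	128	/* NFSv4 */

/* Hypothetical subset of the per-handle flags set in the patch. */
struct fh_flags_sketch {
	bool no_wcc;		/* skip weak cache consistency data */
	bool use_64bit_cookies;	/* readdir cookies are 64 bits wide */
};

/* Pick flags from the handle's maximum size; no request context needed. */
static struct fh_flags_sketch fh_flags_from_maxsize(int maxsize)
{
	struct fh_flags_sketch flags = { false, false };

	assert(maxsize > 0);
	switch (maxsize) {
	case NFS4_FHSIZE:
	case NFS3_FHSIZE:
		flags.use_64bit_cookies = true;
		break;
	case NFS_FHSIZE:
		flags.no_wcc = true;
		break;
	default:
		/* Unknown size: leave every flag clear. */
		break;
	}
	return flags;
}
```

Because the three maximum sizes are distinct, the size works as a version tag even when no svc_rqst is available, which is the situation LOCALIO creates.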

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 16/25] nfsd: add localio support
  2024-08-29 16:49   ` Jeff Layton
@ 2024-08-29 16:59     ` Mike Snitzer
  2024-08-29 17:18       ` Chuck Lever
  0 siblings, 1 reply; 75+ messages in thread
From: Mike Snitzer @ 2024-08-29 16:59 UTC (permalink / raw)
  To: Jeff Layton
  Cc: linux-nfs, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 12:49:23PM -0400, Jeff Layton wrote:
> On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > From: Weston Andros Adamson <dros@primarydata.com>
> > 
> > Add server support for bypassing NFS for localhost reads, writes, and
> > commits. This is only useful when both the client and server are
> > running on the same host.
> > 
> > If nfsd_open_local_fh() fails then the NFS client will both retry and
> > fallback to normal network-based read, write and commit operations if
> > localio is no longer supported.
> > 
> > Care is taken to ensure the same NFS security mechanisms are used
> > (authentication, etc) regardless of whether localio or regular NFS
> > access is used.  The auth_domain established as part of the traditional
> > NFS client access to the NFS server is also used for localio.  Store
> > auth_domain for localio in nfsd_uuid_t and transfer it to the client
> > if it is local to the server.
> > 
> > Relative to containers, localio gives the client access to the network
> > namespace the server has.  This is required to allow the client to
> > access the server's per-namespace nfsd_net struct.
> > 
> > CONFIG_NFSD_LOCALIO controls the server enablement for localio.
> > A later commit will add CONFIG_NFS_LOCALIO to allow the client
> > enablement.
> 
> Do we need separate CONFIG options? Surely if you have one, you'll
> always want the other.

We used to have 4 (2 for each)... yeah I hear you.  It's fiddly but I
can look at making it a single one with more feeling.  Same as the
nfs_to ops work I just committed to: worst case we keep what we have
with the 2 CONFIG options, but 1 option _should_ be doable.

> > This commit also introduces the use of nfsd's percpu_ref to interlock
> > nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
> > not destroyed while in use by nfsd_open_local_fh, and warrants a more
> > detailed explanation:
> > 
> > nfsd_open_local_fh uses nfsd_serv_try_get before opening its file
> > handle and then the reference must be dropped by the caller using
> > nfsd_serv_put (via nfs_localio_ctx_free).
> > 
> > This "interlock" working relies heavily on nfsd_open_local_fh()'s
> > maybe_get_net() safely dealing with the possibility that the struct
> > net (and nfsd_net by association) may have been destroyed by
> > nfsd_destroy_serv() via nfsd_shutdown_net().

This ^ 3rd paragraph is no longer applicable; the use of a proper
long-term ref on the 'nfsd_net', coupled with the use of RCU, makes it so.

> > 
> > Verified to fix an easy to hit crash that would occur if an nfsd
> > instance running in a container, with a localio client mounted, is
> > shutdown. Upon restart of the container and associated nfsd the client
> > would go on to crash due to NULL pointer dereference that occurred due
> > to the nfs client's localio attempting to nfsd_open_local_fh(), using
> > nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
> > 
> 
> Maybe transplant a version of the above 4 paragraphs to the patch that
> adds the percpu_ref handling?

I think this is best mentioned where the use of nfsd_serv_{try_get,put}
meets the road.  Hopefully you're cool with the 3 paragraphs staying
in this header? ;)

> > Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/Kconfig          |   3 ++
> >  fs/nfsd/Kconfig     |  16 +++++++
> >  fs/nfsd/Makefile    |   1 +
> >  fs/nfsd/filecache.c |   2 +-
> >  fs/nfsd/localio.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/nfsd/trace.h     |   3 +-
> >  fs/nfsd/vfs.h       |   7 +++
> >  7 files changed, 135 insertions(+), 2 deletions(-)
> >  create mode 100644 fs/nfsd/localio.c
> > 
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index a46b0cbc4d8f..1b8a5edbddff 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
> >  	tristate
> >  	select FS_POSIX_ACL
> >  
> > +config NFS_COMMON_LOCALIO_SUPPORT
> > +	bool
> > +
> >  config NFS_COMMON
> >  	bool
> >  	depends on NFSD || NFS_FS || LOCKD
> > diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> > index c0bd1509ccd4..e6fa7eaa1db0 100644
> > --- a/fs/nfsd/Kconfig
> > +++ b/fs/nfsd/Kconfig
> > @@ -90,6 +90,22 @@ config NFSD_V4
> >  
> >  	  If unsure, say N.
> >  
> > +config NFSD_LOCALIO
> > +	bool "NFS server support for the LOCALIO auxiliary protocol"
> > +	depends on NFSD
> > +	select NFS_COMMON_LOCALIO_SUPPORT
> > +	default n
> > +	help
> > +	  Some NFS servers support an auxiliary NFS LOCALIO protocol
> > +	  that is not an official part of the NFS protocol.
> > +
> > +	  This option enables support for the LOCALIO protocol in the
> > +	  kernel's NFS server.  Enable this to permit local NFS clients
> > +	  to bypass the network when issuing reads and writes to the
> > +	  local NFS server.
> > +
> > +	  If unsure, say N.
> > +
> >  config NFSD_PNFS
> >  	bool
> >  
> > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > index b8736a82e57c..78b421778a79 100644
> > --- a/fs/nfsd/Makefile
> > +++ b/fs/nfsd/Makefile
> > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> >  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> > +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index a83d469bca6b..49f4aab3208a 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -53,7 +53,7 @@
> >  #define NFSD_FILE_CACHE_UP		     (0)
> >  
> >  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > -#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
> > +#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> >  
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > new file mode 100644
> > index 000000000000..4b65c66be129
> > --- /dev/null
> > +++ b/fs/nfsd/localio.c
> > @@ -0,0 +1,105 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NFS server support for local clients to bypass network stack
> > + *
> > + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +
> > +#include <linux/exportfs.h>
> > +#include <linux/sunrpc/svcauth.h>
> > +#include <linux/sunrpc/clnt.h>
> > +#include <linux/nfs.h>
> > +#include <linux/nfs_common.h>
> > +#include <linux/nfslocalio.h>
> > +#include <linux/string.h>
> > +
> > +#include "nfsd.h"
> > +#include "vfs.h"
> > +#include "netns.h"
> > +#include "filecache.h"
> > +
> > +/**
> > + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
> > + *
> > + * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
> > + * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access
> > + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
> > + * @cred: cred that the client established
> > + * @nfs_fh: filehandle to lookup
> > + * @fmode: fmode_t to use for open
> > + *
> > + * This function maps a local fh to a path on a local filesystem.
> > + * This is useful when the nfs client has the local server mounted - it can
> > + * avoid all the NFS overhead with reads, writes and commits.
> > + *
> > + * On successful return, returned nfs_localio_ctx will have its nfsd_file and
> > + * nfsd_net members set. Caller is responsible for calling nfsd_file_put and
> > + * nfsd_serv_put (via nfs_localio_ctx_free).
> > + */
> > +struct nfs_localio_ctx *
> > +nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
> > +		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> > +		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> > +{
> > +	int mayflags = NFSD_MAY_LOCALIO;
> > +	int status = 0;
> > +	struct nfsd_net *nn;
> > +	struct svc_cred rq_cred;
> > +	struct svc_fh fh;
> > +	struct nfs_localio_ctx *localio;
> > +	__be32 beres;
> > +
> > +	if (nfs_fh->size > NFS4_FHSIZE)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	localio = nfs_localio_ctx_alloc();
> > +	if (!localio)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	/*
> > +	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
> > +	 * But the server may already be shutting down, if so disallow new localio.
> > +	 */
> > +	nn = net_generic(cl_nfssvc_net, nfsd_net_id);
> > +	if (unlikely(!nfsd_serv_try_get(nn))) {
> > +		status = -ENXIO;
> > +		goto out_nfsd_serv;
> > +	}
> > +
> > +	/* nfs_fh -> svc_fh */
> > +	fh_init(&fh, NFS4_FHSIZE);
> > +	fh.fh_handle.fh_size = nfs_fh->size;
> > +	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > +
> > +	if (fmode & FMODE_READ)
> > +		mayflags |= NFSD_MAY_READ;
> > +	if (fmode & FMODE_WRITE)
> > +		mayflags |= NFSD_MAY_WRITE;
> > +
> > +	svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
> > +
> > +	beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, cl_nfssvc_dom,
> > +					&fh, mayflags, &localio->nf);
> > +	if (beres) {
> > +		status = nfs_stat_to_errno(be32_to_cpu(beres));
> > +		goto out_fh_put;
> > +	}
> > +	localio->nn = nn;
> > +
> > +out_fh_put:
> > +	fh_put(&fh);
> > +	if (rq_cred.cr_group_info)
> > +		put_group_info(rq_cred.cr_group_info);
> > +out_nfsd_serv:
> > +	if (status) {
> > +		nfs_localio_ctx_free(localio);
> > +		return ERR_PTR(status);
> > +	}
> > +	return localio;
> > +}
> > +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> > +
> > +/* Compile time type checking, not used by anything */
> > +static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index d22027e23761..82bcefcd1f21 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
> >  		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" },	\
> >  		{ NFSD_MAY_BYPASS_GSS,		"BYPASS_GSS" },		\
> >  		{ NFSD_MAY_READ_IF_EXEC,	"READ_IF_EXEC" },	\
> > -		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" })
> > +		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" },	\
> > +		{ NFSD_MAY_LOCALIO,		"LOCALIO" })
> >  
> >  TRACE_EVENT(nfsd_compound,
> >  	TP_PROTO(
> > diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> > index 01947561d375..e12310dd5f4c 100644
> > --- a/fs/nfsd/vfs.h
> > +++ b/fs/nfsd/vfs.h
> > @@ -33,6 +33,8 @@
> >  
> >  #define NFSD_MAY_64BIT_COOKIE		0x1000 /* 64 bit readdir cookies for >= NFSv3 */
> >  
> > +#define NFSD_MAY_LOCALIO		0x2000 /* for tracing, reflects when localio used */
> > +
> >  #define NFSD_MAY_CREATE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE)
> >  #define NFSD_MAY_REMOVE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
> >  
> > @@ -158,6 +160,11 @@ __be32		nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
> >  
> >  void		nfsd_filp_close(struct file *fp);
> >  
> > +struct nfs_localio_ctx *
> > +nfsd_open_local_fh(struct net *, struct auth_domain *,
> > +		   struct rpc_clnt *, const struct cred *,
> > +		   const struct nfs_fh *, const fmode_t);
> > +
> >  static inline int fh_want_write(struct svc_fh *fh)
> >  {
> >  	int ret;
> 
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> 

Thanks,
Mike
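
The try_get/put interlock described in the commit message (nfsd_serv_try_get before the open, nfsd_serv_put once the I/O is done, and shutdown refusing new users) can be illustrated with a much simpler stand-in for percpu_ref. The sketch below uses a single C11 atomic word rather than per-CPU counters, and all names (serv_ref_sketch, serv_try_get, serv_put, serv_kill) are hypothetical; it shows the semantics only, not the kernel's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SERV_DYING 0x80000000u	/* top bit: shutdown has started */

/* Low bits count active users; the top bit marks a dying server. */
struct serv_ref_sketch {
	atomic_uint state;
};

static void serv_ref_init(struct serv_ref_sketch *r)
{
	atomic_init(&r->state, 0);
}

/* Take a reference unless shutdown has begun. */
static bool serv_try_get(struct serv_ref_sketch *r)
{
	unsigned int old = atomic_load(&r->state);

	while (!(old & SERV_DYING)) {
		/* On failure 'old' is reloaded and the dying bit rechecked. */
		if (atomic_compare_exchange_weak(&r->state, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; true means the last user of a dying server left. */
static bool serv_put(struct serv_ref_sketch *r)
{
	unsigned int old = atomic_fetch_sub(&r->state, 1);

	assert((old & ~SERV_DYING) != 0);	/* must hold a reference */
	return old - 1 == SERV_DYING;
}

/* Begin shutdown; true means there were no users, so teardown may proceed. */
static bool serv_kill(struct serv_ref_sketch *r)
{
	return atomic_fetch_or(&r->state, SERV_DYING) == 0;
}
```

A localio open would call serv_try_get() first and give up with an error (the patch uses -ENXIO) when it returns false; whichever of serv_kill() or the final serv_put() returns true is the point at which teardown can safely complete.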

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 16/25] nfsd: add localio support
  2024-08-29 16:59     ` Mike Snitzer
@ 2024-08-29 17:18       ` Chuck Lever
  0 siblings, 0 replies; 75+ messages in thread
From: Chuck Lever @ 2024-08-29 17:18 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jeff Layton, linux-nfs, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, Aug 29, 2024 at 12:59:20PM -0400, Mike Snitzer wrote:
> On Thu, Aug 29, 2024 at 12:49:23PM -0400, Jeff Layton wrote:
> > On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > > From: Weston Andros Adamson <dros@primarydata.com>
> > > 
> > > Add server support for bypassing NFS for localhost reads, writes, and
> > > commits. This is only useful when both the client and server are
> > > running on the same host.
> > > 
> > > If nfsd_open_local_fh() fails then the NFS client will both retry and
> > > fallback to normal network-based read, write and commit operations if
> > > localio is no longer supported.
> > > 
> > > Care is taken to ensure the same NFS security mechanisms are used
> > > (authentication, etc) regardless of whether localio or regular NFS
> > > access is used.  The auth_domain established as part of the traditional
> > > NFS client access to the NFS server is also used for localio.  Store
> > > auth_domain for localio in nfsd_uuid_t and transfer it to the client
> > > if it is local to the server.
> > > 
> > > Relative to containers, localio gives the client access to the network
> > > namespace the server has.  This is required to allow the client to
> > > access the server's per-namespace nfsd_net struct.
> > > 
> > > CONFIG_NFSD_LOCALIO controls the server enablement for localio.
> > > A later commit will add CONFIG_NFS_LOCALIO to allow the client
> > > enablement.
> > 
> > Do we need separate CONFIG options? Surely if you have one, you'll
> > always want the other.
> 
> We used to have 4 (2 for each)... yeah I hear you.  It's fiddly but I
> can look at making it a single one with more feeling.  Same as the
> nfs_to ops work I just committed to: worst case we keep what we have
> with the 2 CONFIG options, but 1 option _should_ be doable.

I also had Jeff's question but it didn't boil up out of my
subconsciousness into my typing fingers. Seems like having a single
Kconfig option would make this slightly easier for downstream
consumers (i.e., Linux distros).


-- 
Chuck Lever

^ permalink raw reply	[flat|nested] 75+ messages in thread

* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-29 16:52     ` Mike Snitzer
@ 2024-08-29 17:48       ` Jeff Layton
  2024-08-30  4:36         ` NeilBrown
  0 siblings, 1 reply; 75+ messages in thread
From: Jeff Layton @ 2024-08-29 17:48 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-nfs, Chuck Lever, Anna Schumaker, Trond Myklebust,
	NeilBrown, linux-fsdevel

On Thu, 2024-08-29 at 12:52 -0400, Mike Snitzer wrote:
> On Thu, Aug 29, 2024 at 12:40:27PM -0400, Jeff Layton wrote:
> > On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > > Introduce struct nfs_localio_ctx and the interfaces
> > > nfs_localio_ctx_alloc() and nfs_localio_ctx_free().  The next commit
> > > will introduce nfsd_open_local_fh() which returns a nfs_localio_ctx
> > > structure.
> > > 
> > > Also, expose localio's required NFSD symbols to NFS client:
> > > - Cache nfsd_open_local_fh symbol and other required NFSD symbols in a
> > >   globally accessible 'nfs_to' nfs_to_nfsd_t struct.  Add interfaces
> > >   get_nfs_to_nfsd_symbols() and put_nfs_to_nfsd_symbols() to allow
> > >   each NFS client to take a reference on NFSD symbols.
> > > 
> > > - Apologies for the DEFINE_NFS_TO_NFSD_SYMBOL macro that makes
> > >   defining get_##NFSD_SYMBOL() and put_##NFSD_SYMBOL() functions far
> > >   simpler (and avoids cut-n-paste bugs, which is what motivated the
> > >   development and use of a macro for this). But as C macros go it is a
> > >   very simple one and there are many like it all over the kernel.
> > > 
> > > - Given the unique nature of NFS LOCALIO being an optional feature
> > >   that when used requires NFS share access to NFSD memory: a unique
> > >   bridging of NFSD resources to NFS (via nfs_common) is needed.  But
> > >   that bridge must be dynamic, hence the use of symbol_request() and
> > > >   symbol_put().  Proposed ideas to accomplish the same without using
> > >   symbol_{request,put} would be far more tedious to implement and
> > >   very likely no easier to review.  Anyway: sorry NeilBrown...
> > > 
> > > - Despite the use of indirect function calls, caching these nfsd
> > >   symbols for use by the client offers a ~10% performance win
> > >   (compared to always doing get+call+put) for high IOPS workloads.
> > > 
> > > - Introduce nfsd_file_file() wrapper that provides access to
> > >   nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
> > >   client (as suggested by Jeff Layton).
> > > 
> > > - The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
> > >   symbols prepares for the NFS client to use nfsd_file for localio.
> > > 
> > > Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
> > > Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > >  fs/nfs_common/nfslocalio.c | 159 +++++++++++++++++++++++++++++++++++++
> > >  fs/nfsd/filecache.c        |  25 ++++++
> > >  fs/nfsd/filecache.h        |   1 +
> > >  fs/nfsd/nfssvc.c           |   5 ++
> > >  include/linux/nfslocalio.h |  38 +++++++++
> > >  5 files changed, 228 insertions(+)
> > > 
> > > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > > index 1a35a4a6dbe0..cc30fdb0cb46 100644
> > > --- a/fs/nfs_common/nfslocalio.c
> > > +++ b/fs/nfs_common/nfslocalio.c
> > > @@ -72,3 +72,162 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
> > >  	return is_local;
> > >  }
> > >  EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> > > +
> > > +/*
> > > + * The nfs localio code needs to call into nfsd using various symbols (below),
> > > + * but cannot be statically linked, because that will make the nfs module
> > > + * depend on the nfsd module.
> > > + *
> > > + * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
> > > + * nfs_common module will only hold a reference on nfsd when localio is in use.
> > > + * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> > > + */
> > > +static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
> > > +nfs_to_nfsd_t nfs_to;
> > > +EXPORT_SYMBOL_GPL(nfs_to);
> > > +
> > > +/* Macro to define nfs_to get and put methods, avoids copy-n-paste bugs */
> > > +#define DEFINE_NFS_TO_NFSD_SYMBOL(NFSD_SYMBOL)		\
> > > +static nfs_to_##NFSD_SYMBOL##_t get_##NFSD_SYMBOL(void)	\
> > > +{							\
> > > +	return symbol_request(NFSD_SYMBOL);		\
> > > +}							\
> > > +static void put_##NFSD_SYMBOL(void)			\
> > > +{							\
> > > +	symbol_put(NFSD_SYMBOL);			\
> > > +	nfs_to.NFSD_SYMBOL = NULL;			\
> > > +}
> > > +
> > > +/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
> > > +extern struct nfs_localio_ctx *
> > > +nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
> > > +		   const struct cred *, const struct nfs_fh *, const fmode_t);
> > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
> > > +
> > > +/* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
> > > +extern struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_get);
> > > +
> > > +/* The nfs localio code needs to call into nfsd to release the nfsd_file */
> > > +extern void nfsd_file_put(struct nfsd_file *nf);
> > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_put);
> > > +
> > > +/* The nfs localio code needs to call into nfsd to access the nf->nf_file */
> > > +extern struct file * nfsd_file_file(struct nfsd_file *nf);
> > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_file);
> > > +
> > > +/* The nfs localio code needs to call into nfsd to release nn->nfsd_serv */
> > > +extern void nfsd_serv_put(struct nfsd_net *nn);
> > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_serv_put);
> > > +#undef DEFINE_NFS_TO_NFSD_SYMBOL
> > > +
> > 
> > I have the same concerns as Neil did with this patch in v13. An ops
> > structure that nfsd registers with nfs_common and that has pointers to
> > all of these functions would be a lot cleaner. I think it'll end up
> > being less code too.
> > 
> > In fact, for that I'd probably break my usual guideline of not
> > introducing new interfaces without callers, and just do a separate
> > patch that adds the ops structure and sets up the handling of the
> > pointer to it in nfs_common.
> 
> OK, as much as it pains me to set aside proven code that I put a
> decent amount of time to honing: I'll humor you guys and try to make
> an ops structure workable. (we can always fall back to my approach if
> I/we come up short).
> 
> I'm just concerned about the optional use aspect.  There is the pain
> point of how does NFS client come to _know_ NFSD loaded?  Using
> symbol_request() deals with that nicely.
> 

Have a pointer to a struct nfsd_localio_ops or something in the
nfs_common module. That's initially set to NULL. Then, have a static
structure of that type in nfsd.ko, and have its __init routine set the
pointer in nfs_common to point to the right structure. The __exit
routine will later set it to NULL.

> I really don't want all calls in NFS client (or nfs_common) to have to
> first check if nfs_common's 'nfs_to' ops structure is NULL or not.

Neil seems to think that's not necessary:

"If nfs/localio holds an auth_domain, then it implicitly holds a
reference to the nfsd module and the functions cannot disappear."

That'll need to be clearly documented though as it's a subtle thing to
rely on for this.
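
To make the registration idea concrete, here is a minimal userspace
sketch of it. It is only a model under stated assumptions: the struct
members, the register/unregister helpers and the demo callbacks are
hypothetical names invented for illustration, not anything in the
posted series, and the sketch omits the synchronization and module
reference that a real kernel version would need around the pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ops table; member names are illustrative only. */
struct nfsd_localio_ops {
	int (*open_local_fh)(int fh);
	void (*file_put)(int handle);
};

/* "nfs_common" side: NULL means nfsd has not registered itself. */
static const struct nfsd_localio_ops *nfs_to_ops;

static void nfs_common_register_ops(const struct nfsd_localio_ops *ops)
{
	nfs_to_ops = ops;
}

static void nfs_common_unregister_ops(void)
{
	nfs_to_ops = NULL;
}

/* "nfsd" side: a static ops structure plus stand-in callbacks. */
static int put_count;

static int demo_open_local_fh(int fh)
{
	return fh + 100;	/* pretend handle derived from the fh */
}

static void demo_file_put(int handle)
{
	(void)handle;
	put_count++;
}

static const struct nfsd_localio_ops nfsd_ops = {
	.open_local_fh	= demo_open_local_fh,
	.file_put	= demo_file_put,
};

/* Stand-ins for nfsd's __init and __exit routines. */
static void nfsd_module_init(void)
{
	nfs_common_register_ops(&nfsd_ops);
}

static void nfsd_module_exit(void)
{
	nfs_common_unregister_ops();
}

/* "nfs" client side: returns -1 when no server ops are registered. */
static int client_try_localio(int fh)
{
	const struct nfsd_localio_ops *ops = nfs_to_ops;
	int handle;

	if (!ops)
		return -1;
	handle = ops->open_local_fh(fh);
	ops->file_put(handle);
	return handle;
}
```

In this model the client only finds the ops table after
nfsd_module_init() has run, which is the "how does the client know
NFSD is loaded" question answered by a NULL check at one place.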
-- 
Jeff Layton <jlayton@kernel.org>


* Re: [PATCH v14 16/25] nfsd: add localio support
  2024-08-29 16:01   ` Chuck Lever
  2024-08-29 16:15     ` Mike Snitzer
@ 2024-08-29 23:10     ` NeilBrown
  1 sibling, 0 replies; 75+ messages in thread
From: NeilBrown @ 2024-08-29 23:10 UTC (permalink / raw)
  To: Chuck Lever
  Cc: Mike Snitzer, linux-nfs, Jeff Layton, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, 30 Aug 2024, Chuck Lever wrote:
> On Wed, Aug 28, 2024 at 09:04:11PM -0400, Mike Snitzer wrote:
> > From: Weston Andros Adamson <dros@primarydata.com>
> > 
> > Add server support for bypassing NFS for localhost reads, writes, and
> > commits. This is only useful when both the client and server are
> > running on the same host.
> > 
> > If nfsd_open_local_fh() fails then the NFS client will both retry and
> > fall back to normal network-based read, write and commit operations if
> > localio is no longer supported.
> > 
> > Care is taken to ensure the same NFS security mechanisms are used
> > (authentication, etc) regardless of whether localio or regular NFS
> > access is used.  The auth_domain established as part of the traditional
> > NFS client access to the NFS server is also used for localio.  Store
> > auth_domain for localio in nfsd_uuid_t and transfer it to the client
> > if it is local to the server.
> > 
> > Relative to containers, localio gives the client access to the network
> > namespace the server has.  This is required to allow the client to
> > access the server's per-namespace nfsd_net struct.
> > 
> > CONFIG_NFSD_LOCALIO controls the server enablement for localio.
> > A later commit will add CONFIG_NFS_LOCALIO to allow the client
> > enablement.
> > 
> > This commit also introduces the use of nfsd's percpu_ref to interlock
> > nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
> > not destroyed while in use by nfsd_open_local_fh, and warrants a more
> > detailed explanation:
> > 
> > nfsd_open_local_fh uses nfsd_serv_try_get before opening its file
> > handle and then the reference must be dropped by the caller using
> > nfsd_serv_put (via nfs_localio_ctx_free).
> > 
> > This "interlock" working relies heavily on nfsd_open_local_fh()'s
> > maybe_get_net() safely dealing with the possibility that the struct
> > net (and nfsd_net by association) may have been destroyed by
> > nfsd_destroy_serv() via nfsd_shutdown_net().
> > 
> > Verified to fix an easy to hit crash that would occur if an nfsd
> > instance running in a container, with a localio client mounted, is
> > shut down. Upon restart of the container and associated nfsd the client
> > would go on to crash due to NULL pointer dereference that occurred due
> > to the nfs client's localio attempting to nfsd_open_local_fh(), using
> > nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
> > 
> > Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Co-developed-by: Mike Snitzer <snitzer@kernel.org>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> >  fs/Kconfig          |   3 ++
> >  fs/nfsd/Kconfig     |  16 +++++++
> >  fs/nfsd/Makefile    |   1 +
> >  fs/nfsd/filecache.c |   2 +-
> >  fs/nfsd/localio.c   | 105 ++++++++++++++++++++++++++++++++++++++++++++
> >  fs/nfsd/trace.h     |   3 +-
> >  fs/nfsd/vfs.h       |   7 +++
> >  7 files changed, 135 insertions(+), 2 deletions(-)
> >  create mode 100644 fs/nfsd/localio.c
> > 
> > diff --git a/fs/Kconfig b/fs/Kconfig
> > index a46b0cbc4d8f..1b8a5edbddff 100644
> > --- a/fs/Kconfig
> > +++ b/fs/Kconfig
> > @@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
> >  	tristate
> >  	select FS_POSIX_ACL
> >  
> > +config NFS_COMMON_LOCALIO_SUPPORT
> > +	bool
> > +
> >  config NFS_COMMON
> >  	bool
> >  	depends on NFSD || NFS_FS || LOCKD
> > diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
> > index c0bd1509ccd4..e6fa7eaa1db0 100644
> > --- a/fs/nfsd/Kconfig
> > +++ b/fs/nfsd/Kconfig
> > @@ -90,6 +90,22 @@ config NFSD_V4
> >  
> >  	  If unsure, say N.
> >  
> > +config NFSD_LOCALIO
> > +	bool "NFS server support for the LOCALIO auxiliary protocol"
> > +	depends on NFSD
> > +	select NFS_COMMON_LOCALIO_SUPPORT
> > +	default n
> > +	help
> > +	  Some NFS servers support an auxiliary NFS LOCALIO protocol
> > +	  that is not an official part of the NFS protocol.
> > +
> > +	  This option enables support for the LOCALIO protocol in the
> > +	  kernel's NFS server.  Enable this to permit local NFS clients
> > +	  to bypass the network when issuing reads and writes to the
> > +	  local NFS server.
> > +
> > +	  If unsure, say N.
> > +
> >  config NFSD_PNFS
> >  	bool
> >  
> > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > index b8736a82e57c..78b421778a79 100644
> > --- a/fs/nfsd/Makefile
> > +++ b/fs/nfsd/Makefile
> > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> >  nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> >  nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> > +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index a83d469bca6b..49f4aab3208a 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -53,7 +53,7 @@
> >  #define NFSD_FILE_CACHE_UP		     (0)
> >  
> >  /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > -#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE)
> > +#define NFSD_FILE_MAY_MASK	(NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> >  
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> >  static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > new file mode 100644
> > index 000000000000..4b65c66be129
> > --- /dev/null
> > +++ b/fs/nfsd/localio.c
> > @@ -0,0 +1,105 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NFS server support for local clients to bypass network stack
> > + *
> > + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +
> > +#include <linux/exportfs.h>
> > +#include <linux/sunrpc/svcauth.h>
> > +#include <linux/sunrpc/clnt.h>
> > +#include <linux/nfs.h>
> > +#include <linux/nfs_common.h>
> > +#include <linux/nfslocalio.h>
> > +#include <linux/string.h>
> > +
> > +#include "nfsd.h"
> > +#include "vfs.h"
> > +#include "netns.h"
> > +#include "filecache.h"
> > +
> > +/**
> > + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
> > + *
> > + * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
> > + * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access
> > + * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
> > + * @cred: cred that the client established
> > + * @nfs_fh: filehandle to lookup
> > + * @fmode: fmode_t to use for open
> > + *
> > + * This function maps a local fh to a path on a local filesystem.
> > + * This is useful when the nfs client has the local server mounted - it can
> > + * avoid all the NFS overhead with reads, writes and commits.
> > + *
> > + * On successful return, returned nfs_localio_ctx will have its nfsd_file and
> > + * nfsd_net members set. Caller is responsible for calling nfsd_file_put and
> > + * nfsd_serv_put (via nfs_localio_ctx_free).
> > + */
> > +struct nfs_localio_ctx *
> > +nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
> > +		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> > +		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> > +{
> > +	int mayflags = NFSD_MAY_LOCALIO;
> > +	int status = 0;
> > +	struct nfsd_net *nn;
> > +	struct svc_cred rq_cred;
> > +	struct svc_fh fh;
> > +	struct nfs_localio_ctx *localio;
> > +	__be32 beres;
> > +
> > +	if (nfs_fh->size > NFS4_FHSIZE)
> > +		return ERR_PTR(-EINVAL);
> > +
> > +	localio = nfs_localio_ctx_alloc();
> > +	if (!localio)
> > +		return ERR_PTR(-ENOMEM);
> > +
> > +	/*
> > +	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
> > +	 * But the server may already be shutting down, if so disallow new localio.
> > +	 */
> > +	nn = net_generic(cl_nfssvc_net, nfsd_net_id);
> > +	if (unlikely(!nfsd_serv_try_get(nn))) {
> > +		status = -ENXIO;
> > +		goto out_nfsd_serv;
> > +	}
> > +
> > +	/* nfs_fh -> svc_fh */
> > +	fh_init(&fh, NFS4_FHSIZE);
> > +	fh.fh_handle.fh_size = nfs_fh->size;
> > +	memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > +
> > +	if (fmode & FMODE_READ)
> > +		mayflags |= NFSD_MAY_READ;
> > +	if (fmode & FMODE_WRITE)
> > +		mayflags |= NFSD_MAY_WRITE;
> > +
> > +	svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
> > +
> > +	beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, cl_nfssvc_dom,
> > +					&fh, mayflags, &localio->nf);
> > +	if (beres) {
> > +		status = nfs_stat_to_errno(be32_to_cpu(beres));
> > +		goto out_fh_put;
> > +	}
> > +	localio->nn = nn;
> > +
> > +out_fh_put:
> > +	fh_put(&fh);
> > +	if (rq_cred.cr_group_info)
> > +		put_group_info(rq_cred.cr_group_info);
> > +out_nfsd_serv:
> > +	if (status) {
> > +		nfs_localio_ctx_free(localio);
> > +		return ERR_PTR(status);
> > +	}
> > +	return localio;
> > +}
> > +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> > +
> > +/* Compile time type checking, not used by anything */
> > +static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index d22027e23761..82bcefcd1f21 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
> >  		{ NFSD_MAY_NOT_BREAK_LEASE,	"NOT_BREAK_LEASE" },	\
> >  		{ NFSD_MAY_BYPASS_GSS,		"BYPASS_GSS" },		\
> >  		{ NFSD_MAY_READ_IF_EXEC,	"READ_IF_EXEC" },	\
> > -		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" })
> > +		{ NFSD_MAY_64BIT_COOKIE,	"64BIT_COOKIE" },	\
> > +		{ NFSD_MAY_LOCALIO,		"LOCALIO" })
> >  
> >  TRACE_EVENT(nfsd_compound,
> >  	TP_PROTO(
> > diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> > index 01947561d375..e12310dd5f4c 100644
> > --- a/fs/nfsd/vfs.h
> > +++ b/fs/nfsd/vfs.h
> > @@ -33,6 +33,8 @@
> >  
> >  #define NFSD_MAY_64BIT_COOKIE		0x1000 /* 64 bit readdir cookies for >= NFSv3 */
> >  
> > +#define NFSD_MAY_LOCALIO		0x2000 /* for tracing, reflects when localio used */
> > +
> >  #define NFSD_MAY_CREATE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE)
> >  #define NFSD_MAY_REMOVE		(NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
> >  
> > @@ -158,6 +160,11 @@ __be32		nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
> >  
> >  void		nfsd_filp_close(struct file *fp);
> >  
> > +struct nfs_localio_ctx *
> > +nfsd_open_local_fh(struct net *, struct auth_domain *,
> > +		   struct rpc_clnt *, const struct cred *,
> > +		   const struct nfs_fh *, const fmode_t);
> > +
> >  static inline int fh_want_write(struct svc_fh *fh)
> >  {
> >  	int ret;
> > -- 
> > 2.44.0
> > 
> 
> Acked-by: Chuck Lever <chuck.lever@oracle.com>
> 
> I think I've looked at all the server-side changes now. I don't see
> any issues that block merging this series.
> 
> Two follow-ups:
> 
> I haven't heard an answer to my question about how export options
> that translate RPC user IDs might behave for LOCALIO operations
> (eg. root_squash, all_squash). Test results, design points,
> NEEDS_WORK, etc.

Export options that translate user IDs are managed by nfsd_setuser()
which is called by nfsd_setuser_and_check_port() which is called by
__fh_verify().  So they get handled exactly the same way for LOCALIO as
they do for WIRE-IO.

NeilBrown


> 
> Someone should try out the trace points that we neutered in
> fh_verify() before this set gets applied.
> 
> 
> -- 
> Chuck Lever
> 



* Re: [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement
  2024-08-29  1:04 ` [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
  2024-08-29 16:07   ` Jeff Layton
@ 2024-08-29 23:39   ` NeilBrown
  2024-08-30  1:45     ` Mike Snitzer
  1 sibling, 1 reply; 75+ messages in thread
From: NeilBrown @ 2024-08-29 23:39 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-nfs, Jeff Layton, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Thu, 29 Aug 2024, Mike Snitzer wrote:

> +
> +bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *dom)
> +{
> +	bool is_local = false;
> +	nfs_uuid_t *nfs_uuid;
> +
> +	rcu_read_lock();
> +	nfs_uuid = nfs_uuid_lookup(uuid);
> +	if (nfs_uuid) {
> +		nfs_uuid->net = maybe_get_net(net);

I know I said it looked wrong to be getting a ref for the domain but not
the net - and it did.  But that doesn't mean the fix was to get a ref
for the net and to hold it indefinitely.

This ref is now held until the client happens to notice that localio
doesn't work any more (because nfsd_serv_try_get() fails).  So the
shutdown of a net namespace will be delayed indefinitely if the NFS
filesystem isn't being actively used.

I would prefer that there were a way for the net namespace to reach back
into the client and disconnect itself.  Probably this would be a
linked-list in struct nfsd_net which linked list_heads in struct
nfs_client.  This list would need to be protected by a spinlock -
probably global in nfs_common so client could remove itself and server
could remove all clients after clearing their net pointers.
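
As a rough userspace model of that list (a sketch only, not the patch
itself): the struct names, the helper names and the atomic_flag
spinlock below are all assumptions made for illustration. A real
version would use list_head and a kernel spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for one global spinlock in nfs_common. */
static atomic_flag localio_list_lock = ATOMIC_FLAG_INIT;

static void localio_lock_acquire(void)
{
	while (atomic_flag_test_and_set_explicit(&localio_list_lock,
						 memory_order_acquire))
		;	/* spin */
}

static void localio_lock_release(void)
{
	atomic_flag_clear_explicit(&localio_list_lock, memory_order_release);
}

struct server_net;

/* Hypothetical per-client state: net is NULL once disconnected. */
struct client {
	struct server_net *net;
	struct client *prev, *next;
};

/* Hypothetical per-namespace server state: head of the client list. */
struct server_net {
	struct client *clients;
};

/* Client starts using localio against this server namespace. */
static void client_attach(struct client *c, struct server_net *sn)
{
	localio_lock_acquire();
	c->net = sn;
	c->prev = NULL;
	c->next = sn->clients;
	if (sn->clients)
		sn->clients->prev = c;
	sn->clients = c;
	localio_lock_release();
}

/* Client removes itself; a no-op if the server already disconnected it. */
static void client_detach(struct client *c)
{
	localio_lock_acquire();
	if (c->net) {
		if (c->prev)
			c->prev->next = c->next;
		else
			c->net->clients = c->next;
		if (c->next)
			c->next->prev = c->prev;
		c->net = NULL;
		c->prev = c->next = NULL;
	}
	localio_lock_release();
}

/* Server side: clear every client's net pointer, then drop it from the list. */
static void server_net_shutdown(struct server_net *sn)
{
	localio_lock_acquire();
	while (sn->clients) {
		struct client *c = sn->clients;

		sn->clients = c->next;
		c->net = NULL;
		c->prev = c->next = NULL;
	}
	localio_lock_release();
}
```

The point of the single lock is that either side can run first:
whichever loses the race finds net already NULL and does nothing.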

It is probably best if I explain all of what I am thinking as a patch.

Stay tuned.

NeilBrown


* Re: [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement
  2024-08-29 23:39   ` NeilBrown
@ 2024-08-30  1:45     ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-30  1:45 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-nfs, Jeff Layton, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, Aug 30, 2024 at 09:39:10AM +1000, NeilBrown wrote:
> On Thu, 29 Aug 2024, Mike Snitzer wrote:
> 
> > +
> > +bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *dom)
> > +{
> > +	bool is_local = false;
> > +	nfs_uuid_t *nfs_uuid;
> > +
> > +	rcu_read_lock();
> > +	nfs_uuid = nfs_uuid_lookup(uuid);
> > +	if (nfs_uuid) {
> > +		nfs_uuid->net = maybe_get_net(net);
> 
> I know I said it looked wrong to be getting a ref for the domain but not
> the net - and it did.  But that doesn't mean the fix was to get a ref
> for the net and to hold it indefinitely.
>
> This ref is now held until the client happens to notice that localio
> doesn't work any more (because nfsd_serv_try_get() fails).  So the
> shutdown of a net namespace will be delayed indefinitely if the NFS
> filesystem isn't being actively used.
> 
> I would prefer that there were a way for the net namespace to reach back
> into the client and disconnect itself.  Probably this would be a
> linked-list in struct nfsd_net which linked list_heads in struct
> nfs_client.  This list would need to be protected by a spinlock -
> probably global in nfs_common so client could remove itself and server
> could remove all clients after clearing their net pointers.
> 
> It is probably best if I explain all of what I am thinking as a patch.
> 
> Stay tuned.

OK, a mechanism to have the net namespace disconnect itself sounds neat.

Or alternatively we could do what I was doing:

        /* Not running in nfsd context, must safely get reference on nfsd_serv */
        cl_nfssvc_net = maybe_get_net(cl_nfssvc_net);
        if (!cl_nfssvc_net)
                return -ENXIO;

        nn = net_generic(cl_nfssvc_net, nfsd_net_id);

        /* The server may already be shutting down, disallow new localio */
        if (unlikely(!nfsd_serv_try_get(nn))) {

But only if maybe_get_net() will always fail safely...

I feel like we talked about the relative safety of maybe_get_net()
before (but I'm coming up short searching my email):

static inline struct net *maybe_get_net(struct net *net)
{
        /* Used when we know struct net exists but we
         * aren't guaranteed a previous reference count
         * exists.  If the reference count is zero this
         * function fails and returns NULL.
         */
        if (!refcount_inc_not_zero(&net->ns.count))
                net = NULL;
        return net;
}

So you have doubts the struct net will always still exist because I
didn't take a reference? (from fs/nfsd/localio.c):

static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
{
        struct localio_uuidarg *argp = rqstp->rq_argp;

        (void) nfs_uuid_is_local(&argp->uuid, SVC_NET(rqstp),
                                 rqstp->rq_client);

        return rpc_success;
}

I think that's a fair concern (despite it working fine in practice
with destructive container testing, I cannot say there won't ever be a
use-after-free bug).
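
For reference, the increment-if-not-zero pattern that maybe_get_net()
relies on can be modelled in userspace roughly as below. This is a
simplified sketch with C11 atomics and hypothetical names (struct obj,
obj_get_not_zero, obj_put); it is not the kernel's refcount_t
implementation and it leaves out saturation handling.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object standing in for struct net. */
struct obj {
	atomic_int count;
};

/*
 * Take a reference only if the count is still non-zero.
 * The caller must guarantee the memory behind 'o' is still valid;
 * the check does nothing to protect against 'o' having been freed.
 */
static bool obj_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->count);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the last reference went away. */
static bool obj_put(struct obj *o)
{
	return atomic_fetch_sub(&o->count, 1) == 1;
}
```

The sketch shows why the helper fails safely on a zero count, and
also why that is not enough on its own: it still dereferences the
object, so something else has to keep the memory around.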

So all said: consider me staying tuned ;)

Thanks


* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-29 17:48       ` Jeff Layton
@ 2024-08-30  4:36         ` NeilBrown
  2024-08-30  5:01           ` Mike Snitzer
  0 siblings, 1 reply; 75+ messages in thread
From: NeilBrown @ 2024-08-30  4:36 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Mike Snitzer, linux-nfs, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, 30 Aug 2024, Jeff Layton wrote:
> On Thu, 2024-08-29 at 12:52 -0400, Mike Snitzer wrote:
> > On Thu, Aug 29, 2024 at 12:40:27PM -0400, Jeff Layton wrote:
> > > On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > > > Introduce struct nfs_localio_ctx and the interfaces
> > > > nfs_localio_ctx_alloc() and nfs_localio_ctx_free().  The next commit
> > > > will introduce nfsd_open_local_fh() which returns a nfs_localio_ctx
> > > > structure.
> > > > 
> > > > Also, expose localio's required NFSD symbols to NFS client:
> > > > - Cache nfsd_open_local_fh symbol and other required NFSD symbols in a
> > > >   globally accessible 'nfs_to' nfs_to_nfsd_t struct.  Add interfaces
> > > >   get_nfs_to_nfsd_symbols() and put_nfs_to_nfsd_symbols() to allow
> > > >   each NFS client to take a reference on NFSD symbols.
> > > > 
> > > > - Apologies for the DEFINE_NFS_TO_NFSD_SYMBOL macro that makes
> > > >   defining get_##NFSD_SYMBOL() and put_##NFSD_SYMBOL() functions far
> > > >   simpler (and avoids cut-n-paste bugs, which is what motivated the
> > > >   development and use of a macro for this). But as C macros go it is a
> > > >   very simple one and there are many like it all over the kernel.
> > > > 
> > > > - Given the unique nature of NFS LOCALIO being an optional feature
> > > >   that when used requires NFS share access to NFSD memory: a unique
> > > >   bridging of NFSD resources to NFS (via nfs_common) is needed.  But
> > > >   that bridge must be dynamic, hence the use of symbol_request() and
> > > > >   symbol_put().  Proposed ideas to accomplish the same without using
> > > >   symbol_{request,put} would be far more tedious to implement and
> > > >   very likely no easier to review.  Anyway: sorry NeilBrown...
> > > > 
> > > > - Despite the use of indirect function calls, caching these nfsd
> > > >   symbols for use by the client offers a ~10% performance win
> > > >   (compared to always doing get+call+put) for high IOPS workloads.
> > > > 
> > > > - Introduce nfsd_file_file() wrapper that provides access to
> > > >   nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
> > > >   client (as suggested by Jeff Layton).
> > > > 
> > > > - The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
> > > >   symbols prepares for the NFS client to use nfsd_file for localio.
> > > > 
> > > > Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
> > > > Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
> > > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > > ---
> > > >  fs/nfs_common/nfslocalio.c | 159 +++++++++++++++++++++++++++++++++++++
> > > >  fs/nfsd/filecache.c        |  25 ++++++
> > > >  fs/nfsd/filecache.h        |   1 +
> > > >  fs/nfsd/nfssvc.c           |   5 ++
> > > >  include/linux/nfslocalio.h |  38 +++++++++
> > > >  5 files changed, 228 insertions(+)
> > > > 
> > > > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > > > index 1a35a4a6dbe0..cc30fdb0cb46 100644
> > > > --- a/fs/nfs_common/nfslocalio.c
> > > > +++ b/fs/nfs_common/nfslocalio.c
> > > > @@ -72,3 +72,162 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
> > > >  	return is_local;
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> > > > +
> > > > +/*
> > > > + * The nfs localio code needs to call into nfsd using various symbols (below),
> > > > + * but cannot be statically linked, because that will make the nfs module
> > > > + * depend on the nfsd module.
> > > > + *
> > > > + * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
> > > > + * nfs_common module will only hold a reference on nfsd when localio is in use.
> > > > + * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> > > > + */
> > > > +static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
> > > > +nfs_to_nfsd_t nfs_to;
> > > > +EXPORT_SYMBOL_GPL(nfs_to);
> > > > +
> > > > +/* Macro to define nfs_to get and put methods, avoids copy-n-paste bugs */
> > > > +#define DEFINE_NFS_TO_NFSD_SYMBOL(NFSD_SYMBOL)		\
> > > > +static nfs_to_##NFSD_SYMBOL##_t get_##NFSD_SYMBOL(void)	\
> > > > +{							\
> > > > +	return symbol_request(NFSD_SYMBOL);		\
> > > > +}							\
> > > > +static void put_##NFSD_SYMBOL(void)			\
> > > > +{							\
> > > > +	symbol_put(NFSD_SYMBOL);			\
> > > > +	nfs_to.NFSD_SYMBOL = NULL;			\
> > > > +}
> > > > +
> > > > +/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
> > > > +extern struct nfs_localio_ctx *
> > > > +nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
> > > > +		   const struct cred *, const struct nfs_fh *, const fmode_t);
> > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
> > > > +
> > > > +/* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
> > > > +extern struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_get);
> > > > +
> > > > +/* The nfs localio code needs to call into nfsd to release the nfsd_file */
> > > > +extern void nfsd_file_put(struct nfsd_file *nf);
> > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_put);
> > > > +
> > > > +/* The nfs localio code needs to call into nfsd to access the nf->nf_file */
> > > > +extern struct file * nfsd_file_file(struct nfsd_file *nf);
> > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_file);
> > > > +
> > > > +/* The nfs localio code needs to call into nfsd to release nn->nfsd_serv */
> > > > +extern void nfsd_serv_put(struct nfsd_net *nn);
> > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_serv_put);
> > > > +#undef DEFINE_NFS_TO_NFSD_SYMBOL
> > > > +
> > > 
> > > I have the same concerns as Neil did with this patch in v13. An ops
> > > structure that nfsd registers with nfs_common and that has pointers to
> > > all of these functions would be a lot cleaner. I think it'll end up
> > > being less code too.
> > > 
> > > In fact, for that I'd probably break my usual guideline of not
> > > introducing new interfaces without callers, and just do a separate
> > > patch that adds the ops structure and sets up the handling of the
> > > pointer to it in nfs_common.
> > 
> > OK, as much as it pains me to set aside proven code that I put a
> > decent amount of time to honing: I'll humor you guys and try to make
> > an ops structure workable. (we can always fall back to my approach if
> > I/we come up short).
> > 
> > I'm just concerned about the optional use aspect.  There is the pain
> > point of how does NFS client come to _know_ NFSD loaded?  Using
> > symbol_request() deals with that nicely.
> > 
> 
> Have a pointer to a struct nfsd_localio_ops or something in the
> nfs_common module. That's initially set to NULL. Then, have a static
> structure of that type in nfsd.ko, and have its __init routine set the
> pointer in nfs_common to point to the right structure. The __exit
> routine will later set it to NULL.
> 
> > I really don't want all calls in NFS client (or nfs_common) to have to
> > first check if nfs_common's 'nfs_to' ops structure is NULL or not.
> 
> Neil seems to think that's not necessary:
> 
> "If nfs/localio holds an auth_domain, then it implicitly holds a
> reference to the nfsd module and the functions cannot disappear."

On reflection that isn't quite right, but it is the sort of approach
that I think we need to take.
There are several things that the NFS client needs to hold on to.

1/ It needs a reference to the nfsd module (or symbols in the module).
   I think this can be held long term but we need a clear mechanism for
   it to be dropped.
2/ It needs a reference to the nfsd_serv which it gets through the
   'struct net' pointer.  I've posted patches to handle that better.
3/ It needs a reference to an auth_domain.  This can safely be a long
   term reference.  It can already be invalidated and the code to free
   it is in sunrpc which nfs already pins.  Any delay in freeing it only
   wastes memory (not much), it doesn't impact anything else.
4/ It needs a reference to the nfsd_file and/or file.  This is currently
   done only while the ref to the nfsd_serv is held, so I think there is
   no problem there.

So possibly we could take a reference to the nfsd module whenever we
store a net in nfs_uuid, and drop the ref whenever we clear that.

That means we cannot call nfsd_open_local_fh() without first getting a
ref on the nfsd_serv which my latest code doesn't do.  That is easily
fixed.  I'll send a patch for consideration...
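
As a rough illustration of that rule (module reference taken when a net is
stored in nfs_uuid, dropped when it is cleared), here is a minimal userspace
sketch. It is not kernel code: every fake_* name is hypothetical, and
fake_module_get() only models the behaviour of try_module_get().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for kernel objects; not the real definitions. */
struct fake_module { int refs; bool live; };
struct fake_net { int id; };
struct fake_uuid { struct fake_net *net; struct fake_module *owner; };

/* Models try_module_get(): fails once the module is going away. */
static bool fake_module_get(struct fake_module *m)
{
	if (!m->live)
		return false;
	m->refs++;
	return true;
}

static void fake_module_put(struct fake_module *m)
{
	assert(m->refs > 0);
	m->refs--;
}

/* Store a net only if the module reference could be taken. */
static bool fake_uuid_set_net(struct fake_uuid *u, struct fake_net *net,
			      struct fake_module *m)
{
	if (u->net || !fake_module_get(m))
		return false;
	u->net = net;
	u->owner = m;
	return true;
}

/* Clearing the net is the single place the module reference is dropped. */
static void fake_uuid_clear_net(struct fake_uuid *u)
{
	if (!u->net)
		return;
	u->net = NULL;
	fake_module_put(u->owner);
	u->owner = NULL;
}
```

Pairing the get and put with the set and clear of the net pointer gives the
"clear mechanism for it to be dropped" asked for in point 1/.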

Thanks,
NeilBrown


* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-30  4:36         ` NeilBrown
@ 2024-08-30  5:01           ` Mike Snitzer
  2024-08-30  5:08             ` Mike Snitzer
                               ` (2 more replies)
  0 siblings, 3 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-30  5:01 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, linux-nfs, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, Aug 30, 2024 at 02:36:13PM +1000, NeilBrown wrote:
> On Fri, 30 Aug 2024, Jeff Layton wrote:
> > On Thu, 2024-08-29 at 12:52 -0400, Mike Snitzer wrote:
> > > On Thu, Aug 29, 2024 at 12:40:27PM -0400, Jeff Layton wrote:
> > > > On Wed, 2024-08-28 at 21:04 -0400, Mike Snitzer wrote:
> > > > > Introduce struct nfs_localio_ctx and the interfaces
> > > > > nfs_localio_ctx_alloc() and nfs_localio_ctx_free().  The next commit
> > > > > will introduce nfsd_open_local_fh() which returns a nfs_localio_ctx
> > > > > structure.
> > > > > 
> > > > > Also, expose localio's required NFSD symbols to NFS client:
> > > > > - Cache nfsd_open_local_fh symbol and other required NFSD symbols in a
> > > > >   globally accessible 'nfs_to' nfs_to_nfsd_t struct.  Add interfaces
> > > > >   get_nfs_to_nfsd_symbols() and put_nfs_to_nfsd_symbols() to allow
> > > > >   each NFS client to take a reference on NFSD symbols.
> > > > > 
> > > > > - Apologies for the DEFINE_NFS_TO_NFSD_SYMBOL macro that makes
> > > > >   defining get_##NFSD_SYMBOL() and put_##NFSD_SYMBOL() functions far
> > > > >   simpler (and avoids cut-n-paste bugs, which is what motivated the
> > > > >   development and use of a macro for this). But as C macros go it is a
> > > > >   very simple one and there are many like it all over the kernel.
> > > > > 
> > > > > - Given the unique nature of NFS LOCALIO being an optional feature
> > > > >   that when used requires NFS share access to NFSD memory: a unique
> > > > >   bridging of NFSD resources to NFS (via nfs_common) is needed.  But
> > > > >   that bridge must be dynamic, hence the use of symbol_request() and
> > > > > >   symbol_put().  Proposed ideas to accomplish the same without using
> > > > >   symbol_{request,put} would be far more tedious to implement and
> > > > >   very likely no easier to review.  Anyway: sorry NeilBrown...
> > > > > 
> > > > > - Despite the use of indirect function calls, caching these nfsd
> > > > >   symbols for use by the client offers a ~10% performance win
> > > > >   (compared to always doing get+call+put) for high IOPS workloads.
> > > > > 
> > > > > - Introduce nfsd_file_file() wrapper that provides access to
> > > > >   nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
> > > > >   client (as suggested by Jeff Layton).
> > > > > 
> > > > > - The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
> > > > >   symbols prepares for the NFS client to use nfsd_file for localio.
> > > > > 
> > > > > Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
> > > > > Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
> > > > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > > > ---
> > > > >  fs/nfs_common/nfslocalio.c | 159 +++++++++++++++++++++++++++++++++++++
> > > > >  fs/nfsd/filecache.c        |  25 ++++++
> > > > >  fs/nfsd/filecache.h        |   1 +
> > > > >  fs/nfsd/nfssvc.c           |   5 ++
> > > > >  include/linux/nfslocalio.h |  38 +++++++++
> > > > >  5 files changed, 228 insertions(+)
> > > > > 
> > > > > diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> > > > > index 1a35a4a6dbe0..cc30fdb0cb46 100644
> > > > > --- a/fs/nfs_common/nfslocalio.c
> > > > > +++ b/fs/nfs_common/nfslocalio.c
> > > > > @@ -72,3 +72,162 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
> > > > >  	return is_local;
> > > > >  }
> > > > >  EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
> > > > > +
> > > > > +/*
> > > > > + * The nfs localio code needs to call into nfsd using various symbols (below),
> > > > > + * but cannot be statically linked, because that will make the nfs module
> > > > > + * depend on the nfsd module.
> > > > > + *
> > > > > + * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
> > > > > + * nfs_common module will only hold a reference on nfsd when localio is in use.
> > > > > + * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> > > > > + */
> > > > > +static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
> > > > > +nfs_to_nfsd_t nfs_to;
> > > > > +EXPORT_SYMBOL_GPL(nfs_to);
> > > > > +
> > > > > +/* Macro to define nfs_to get and put methods, avoids copy-n-paste bugs */
> > > > > +#define DEFINE_NFS_TO_NFSD_SYMBOL(NFSD_SYMBOL)		\
> > > > > +static nfs_to_##NFSD_SYMBOL##_t get_##NFSD_SYMBOL(void)	\
> > > > > +{							\
> > > > > +	return symbol_request(NFSD_SYMBOL);		\
> > > > > +}							\
> > > > > +static void put_##NFSD_SYMBOL(void)			\
> > > > > +{							\
> > > > > +	symbol_put(NFSD_SYMBOL);			\
> > > > > +	nfs_to.NFSD_SYMBOL = NULL;			\
> > > > > +}
> > > > > +
> > > > > +/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
> > > > > +extern struct nfs_localio_ctx *
> > > > > +nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
> > > > > +		   const struct cred *, const struct nfs_fh *, const fmode_t);
> > > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
> > > > > +
> > > > > +/* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
> > > > > +extern struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
> > > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_get);
> > > > > +
> > > > > +/* The nfs localio code needs to call into nfsd to release the nfsd_file */
> > > > > +extern void nfsd_file_put(struct nfsd_file *nf);
> > > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_put);
> > > > > +
> > > > > +/* The nfs localio code needs to call into nfsd to access the nf->nf_file */
> > > > > +extern struct file * nfsd_file_file(struct nfsd_file *nf);
> > > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_file);
> > > > > +
> > > > > +/* The nfs localio code needs to call into nfsd to release nn->nfsd_serv */
> > > > > +extern void nfsd_serv_put(struct nfsd_net *nn);
> > > > > +DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_serv_put);
> > > > > +#undef DEFINE_NFS_TO_NFSD_SYMBOL
> > > > > +
> > > > 
> > > > I have the same concerns as Neil did with this patch in v13. An ops
> > > > structure that nfsd registers with nfs_common and that has pointers to
> > > > all of these functions would be a lot cleaner. I think it'll end up
> > > > being less code too.
> > > > 
> > > > In fact, for that I'd probably break my usual guideline of not
> > > > introducing new interfaces without callers, and just do a separate
> > > > patch that adds the ops structure and sets up the handling of the
> > > > pointer to it in nfs_common.
> > > 
> > > OK, as much as it pains me to set aside proven code that I put a
> > > decent amount of time to honing: I'll humor you guys and try to make
> > > an ops structure workable. (we can always fall back to my approach if
> > > I/we come up short).
> > > 
> > > I'm just concerned about the optional use aspect.  There is the pain
> > > point of how does NFS client come to _know_ NFSD loaded?  Using
> > > symbol_request() deals with that nicely.
> > > 
> > 
> > Have a pointer to a struct nfsd_localio_ops or something in the
> > nfs_common module. That's initially set to NULL. Then, have a static
> > structure of that type in nfsd.ko, and have its __init routine set the
> > pointer in nfs_common to point to the right structure. The __exit
> > routine will later set it to NULL.
> > 
> > > I really don't want all calls in NFS client (or nfs_common) to have to
> > > first check if nfs_common's 'nfs_to' ops structure is NULL or not.
> > 
> > Neil seems to think that's not necessary:
> > 
> > "If nfs/localio holds an auth_domain, then it implicitly holds a
> > reference to the nfsd module and the functions cannot disappear."
> 
> On reflection that isn't quite right, but it is the sort of approach
> that I think we need to take.
> There are several things that the NFS client needs to hold on to.
> 
> 1/ It needs a reference to the nfsd module (or symbols in the module).
>    I think this can be held long term but we need a clear mechanism for
>    it to be dropped.
> 2/ It needs a reference to the nfsd_serv which it gets through the
>    'struct net' pointer.  I've posted patches to handle that better.
> 3/ It needs a reference to an auth_domain.  This can safely be a long
>    term reference.  It can already be invalidated and the code to free
>    it is in sunrpc which nfs already pins.  Any delay in freeing it only
>    wastes memory (not much), it doesn't impact anything else.
> 4/ It needs a reference to the nfsd_file and/or file.  This is currently
>    done only while the ref to the nfsd_serv is held, so I think there is
>    no problem there.
> 
> So possibly we could take a reference to the nfsd module whenever we
> store a net in nfs_uuid, and drop the ref whenever we clear that.
> 
> That means we cannot call nfsd_open_local_fh() without first getting a
> ref on the nfsd_serv which my latest code doesn't do.  That is easily
> fixed.  I'll send a patch for consideration...

I already implemented 2 different versions today, meant for v15.

First is a relaxed version of the v14 code (less code, only using
symbol_request on nfsd_open_local_fh).

Second is much more relaxed, because it leverages your original
assumption that the auth_domain ref is sufficient.

I'll reply twice to this mail, once with each respective patch.

Maybe I save you some time...


* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-30  5:01           ` Mike Snitzer
@ 2024-08-30  5:08             ` Mike Snitzer
  2024-08-30  5:12             ` Mike Snitzer
  2024-08-30  5:34             ` NeilBrown
  2 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-30  5:08 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, linux-nfs, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, Aug 30, 2024 at 01:01:55AM -0400, Mike Snitzer wrote:
> On Fri, Aug 30, 2024 at 02:36:13PM +1000, NeilBrown wrote:
> > On Fri, 30 Aug 2024, Jeff Layton wrote:
> > > 
> > > Have a pointer to a struct nfsd_localio_ops or something in the
> > > nfs_common module. That's initially set to NULL. Then, have a static
> > > structure of that type in nfsd.ko, and have its __init routine set the
> > > pointer in nfs_common to point to the right structure. The __exit
> > > routine will later set it to NULL.
> > > 
> > > > I really don't want all calls in NFS client (or nfs_common) to have to
> > > > first check if nfs_common's 'nfs_to' ops structure is NULL or not.
> > > 
> > > Neil seems to think that's not necessary:
> > > 
> > > "If nfs/localio holds an auth_domain, then it implicitly holds a
> > > reference to the nfsd module and the functions cannot disappear."
> > 
> > On reflection that isn't quite right, but it is the sort of approach
> > that I think we need to take.
> > > There are several things that the NFS client needs to hold on to.
> > 
> > 1/ It needs a reference to the nfsd module (or symbols in the module).
> >    I think this can be held long term but we need a clear mechanism for
> >    it to be dropped.
> > 2/ It needs a reference to the nfsd_serv which it gets through the
> >    'struct net' pointer.  I've posted patches to handle that better.
> > 3/ It needs a reference to an auth_domain.  This can safely be a long
> >    term reference.  It can already be invalidated and the code to free
> >    it is in sunrpc which nfs already pins.  Any delay in freeing it only
> >    wastes memory (not much), it doesn't impact anything else.
> > 4/ It needs a reference to the nfsd_file and/or file.  This is currently
> >    done only while the ref to the nfsd_serv is held, so I think there is
> >    no problem there.
> > 
> > So possibly we could take a reference to the nfsd module whenever we
> > > store a net in nfs_uuid, and drop the ref whenever we clear that.
> > 
> > That means we cannot call nfsd_open_local_fh() without first getting a
> > ref on the nfsd_serv which my latest code doesn't do.  That is easily
> > fixed.  I'll send a patch for consideration...
> 
> I already implemented 2 different versions today, meant for v15.
> 
> First is a relaxed version of the v14 code (less code, only using
> symbol_request on nfsd_open_local_fh).

This is the corresponding code that is needed in fs/nfsd/localio.c

+static const struct nfsd_localio_operations nfsd_localio_ops = {
+       .nfsd_open_local_fh = nfsd_open_local_fh,
+       .nfsd_file_get = nfsd_file_get,
+       .nfsd_file_put = nfsd_file_put,
+       .nfsd_file_file = nfsd_file_file,
+       .nfsd_serv_put = nfsd_serv_put,
+};
+
+void init_nfs_to_nfsd_localio_ops(void)
+{
+       memcpy(&nfs_to, &nfsd_localio_ops, sizeof(nfsd_localio_ops));
+}
+EXPORT_SYMBOL_GPL(init_nfs_to_nfsd_localio_ops);
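
The nfs_common side of this option takes the symbol on the first get and
drops it on the last put. A minimal userspace sketch of that first-get /
last-put shape follows. Every demo_* name is hypothetical, locking is
omitted, and demo_symbol_request() only models symbol_request(). One choice
here is an assumption rather than something taken from the patch: a failed
lookup leaves the count at zero, so a later get retries the lookup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical ops table standing in for nfsd_localio_operations. */
struct demo_ops {
	int (*open_fh)(int fh);
	void (*file_put)(int fh);
};

static int demo_open_fh(int fh) { return fh + 1; }
static void demo_file_put(int fh) { (void)fh; }

/* Provider side: may or may not be "loaded". */
static bool provider_loaded;
static int provider_pins;
static const struct demo_ops provider_ops = {
	.open_fh = demo_open_fh,
	.file_put = demo_file_put,
};

/* Models symbol_request(): NULL if absent, otherwise pins the provider. */
static const struct demo_ops *demo_symbol_request(void)
{
	if (!provider_loaded)
		return NULL;
	provider_pins++;
	return &provider_ops;
}

static void demo_symbol_put(void)
{
	assert(provider_pins > 0);
	provider_pins--;
}

/* Consumer side: cached copy of the ops plus a use count. */
static struct demo_ops demo_to;
static unsigned int demo_to_refs;

static bool demo_get_ops(void)
{
	const struct demo_ops *ops;

	if (demo_to_refs > 0) {		/* not the first user: just count */
		demo_to_refs++;
		return true;
	}
	ops = demo_symbol_request();
	if (!ops)
		return false;		/* count stays at zero on failure */
	memcpy(&demo_to, ops, sizeof(demo_to));
	demo_to_refs = 1;
	return true;
}

static void demo_put_ops(void)
{
	assert(demo_to_refs > 0);
	if (--demo_to_refs > 0)
		return;
	/* Last user: forget the cached pointers, then unpin the provider. */
	memset(&demo_to, 0, sizeof(demo_to));
	demo_symbol_put();
}
```

However many clients call demo_get_ops(), the provider is pinned once, and
the cached pointers are only valid between the first get and the last put.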

From: Mike Snitzer <snitzer@kernel.org>
Date: Wed, 28 Aug 2024 17:04:44 -0500
Subject: [PATCH v15.option1] nfs_common: introduce nfs_localio_ctx struct and interfaces

Introduce struct nfs_localio_ctx (which has nfsd_file and nfsd_net
members) and the interfaces nfs_localio_ctx_alloc() and
nfs_localio_ctx_free().  The next commit will introduce
nfsd_open_local_fh() which returns a nfs_localio_ctx structure.

Also, expose localio's required NFSD symbols to NFS client:
- Make nfsd_open_local_fh() symbol and other required NFSD symbols
  available to NFS in a global 'nfs_to' nfsd_localio_operations
  struct.  Add interfaces get_nfs_to_nfsd_localio_ops() and
  put_nfs_to_nfsd_localio_ops() to allow each NFS client to take a
  reference on NFSD (indirectly through nfs_common).

- Given the unique nature of NFS LOCALIO being an optional feature
  that when used requires NFS share access to NFSD memory: a unique
  bridging of NFSD resources to NFS (via nfs_common) is needed.  But
  that bridge must be dynamic, hence the use of symbol_request() and
  symbol_put() of the one init_nfs_to_nfsd_localio_ops() symbol which
  nfs_common's LOCALIO code uses as a bellwether for both if NFSD is
  available and supports LOCALIO.

- Use of a refcount_t (managed by {get,put}_nfs_to_nfsd_localio_ops)
  is required to ensure NFS doesn't become disjoint from NFSD's
  availability in the face of LOCALIO support being toggled on/off
  coupled with the NFSD module (possibly) being unloaded.

- Introduce nfsd_file_file() wrapper that provides access to
  nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
  client (as suggested by Jeff Layton).

- The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
  symbols prepares for the NFS client to use nfsd_file for localio.

Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
Suggested-by: NeilBrown <neilb@suse.de> # nfsd_localio_operations
Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs_common/nfslocalio.c | 107 +++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.c        |  16 ++++++
 fs/nfsd/filecache.h        |   1 +
 fs/nfsd/nfssvc.c           |   2 +
 include/linux/nfslocalio.h |  47 ++++++++++++++++
 5 files changed, 173 insertions(+)

diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 0b2e17c2068f..5f12610a877c 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -5,6 +5,7 @@
 
 #include <linux/module.h>
 #include <linux/rculist.h>
+#include <linux/refcount.h>
 #include <linux/nfslocalio.h>
 #include <net/netns/generic.h>
 
@@ -72,3 +73,109 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
 	return is_local;
 }
 EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
+
+/*
+ * The nfs localio code needs to call into nfsd using various symbols (below),
+ * but cannot be statically linked, because that will make the nfs module
+ * depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
+ * nfs_common module will only hold a reference on nfsd when localio is in use.
+ * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
+static refcount_t nfs_to_ref;
+struct nfsd_localio_operations nfs_to;
+EXPORT_SYMBOL_GPL(nfs_to);
+
+bool get_nfs_to_nfsd_localio_ops(void)
+{
+	bool ret = false;
+	init_nfs_to_nfsd_localio_ops_t init_nfs_to;
+
+	spin_lock(&nfs_to_nfsd_lock);
+
+	/* Only get nfsd_localio_operations on first reference */
+	if (refcount_read(&nfs_to_ref) == 0)
+		refcount_set(&nfs_to_ref, 1);
+	else {
+		refcount_inc(&nfs_to_ref);
+		ret = true;
+		goto out;
+	}
+
+	/*
+	 * If NFSD isn't available LOCALIO isn't possible.
+	 * Use init_nfs_to_nfsd_localio_ops symbol as the bellwether,
+	 * if available then nfs_common has NFSD module reference
+	 * on NFS's behalf and can initialize global 'nfs_to'.
+	 */
+	init_nfs_to = symbol_request(init_nfs_to_nfsd_localio_ops);
+	if (init_nfs_to) {
+		init_nfs_to();
+		if (WARN_ON_ONCE(!nfs_to.nfsd_open_local_fh)) {
+			symbol_put(init_nfs_to_nfsd_localio_ops);
+			goto out;
+		}
+		ret = true;
+	}
+out:
+	spin_unlock(&nfs_to_nfsd_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(get_nfs_to_nfsd_localio_ops);
+
+void put_nfs_to_nfsd_localio_ops(void)
+{
+	spin_lock(&nfs_to_nfsd_lock);
+
+	if (!refcount_dec_and_test(&nfs_to_ref))
+		goto out;
+
+	symbol_put(init_nfs_to_nfsd_localio_ops);
+	memset(&nfs_to, 0, sizeof(nfs_to));
+out:
+	spin_unlock(&nfs_to_nfsd_lock);
+}
+EXPORT_SYMBOL_GPL(put_nfs_to_nfsd_localio_ops);
+
+/*
+ * nfs_localio_ctx cache and alloc/free interfaces.
+ */
+static struct kmem_cache *nfs_localio_ctx_cache;
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void)
+{
+	return kmem_cache_alloc(nfs_localio_ctx_cache,
+				GFP_KERNEL | __GFP_ZERO);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_alloc);
+
+void nfs_localio_ctx_free(struct nfs_localio_ctx *localio)
+{
+	if (localio->nf)
+		nfs_to.nfsd_file_put(localio->nf);
+	if (localio->nn)
+		nfs_to.nfsd_serv_put(localio->nn);
+	kmem_cache_free(nfs_localio_ctx_cache, localio);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_free);
+
+static int __init nfslocalio_init(void)
+{
+	refcount_set(&nfs_to_ref, 0);
+
+	nfs_localio_ctx_cache = KMEM_CACHE(nfs_localio_ctx, 0);
+	if (!nfs_localio_ctx_cache)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __exit nfslocalio_exit(void)
+{
+	kmem_cache_destroy(nfs_localio_ctx_cache);
+}
+
+module_init(nfslocalio_init);
+module_exit(nfslocalio_exit);
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2dc72de31f61..1a26f5812578 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -39,6 +39,7 @@
 #include <linux/fsnotify.h>
 #include <linux/seq_file.h>
 #include <linux/rhashtable.h>
+#include <linux/nfslocalio.h>
 
 #include "vfs.h"
 #include "nfsd.h"
@@ -345,6 +346,7 @@ nfsd_file_get(struct nfsd_file *nf)
 		return nf;
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(nfsd_file_get);
 
 /**
  * nfsd_file_put - put the reference to a nfsd_file
@@ -389,6 +391,20 @@ nfsd_file_put(struct nfsd_file *nf)
 	if (refcount_dec_and_test(&nf->nf_ref))
 		nfsd_file_free(nf);
 }
+EXPORT_SYMBOL_GPL(nfsd_file_put);
+
+/**
+ * nfsd_file_file - get the backing file of an nfsd_file
+ * @nf: nfsd_file of which to access the backing file.
+ *
+ * Return backing file for @nf.
+ */
+struct file *
+nfsd_file_file(struct nfsd_file *nf)
+{
+	return nf->nf_file;
+}
+EXPORT_SYMBOL_GPL(nfsd_file_file);
 
 static void
 nfsd_file_dispose_list(struct list_head *dispose)
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 26ada78b8c1e..6fbbb2e32e95 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -56,6 +56,7 @@ int nfsd_file_cache_start_net(struct net *net);
 void nfsd_file_cache_shutdown_net(struct net *net);
 void nfsd_file_put(struct nfsd_file *nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+struct file *nfsd_file_file(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 void nfsd_file_net_dispose(struct nfsd_net *nn);
 bool nfsd_file_is_cached(struct inode *inode);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c639fbe4d8c2..7b9119b8dd1b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
 #include <linux/sunrpc/svc_xprt.h>
 #include <linux/lockd/bind.h>
 #include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
 #include <linux/seq_file.h>
 #include <linux/inetdevice.h>
 #include <net/addrconf.h>
@@ -201,6 +202,7 @@ void nfsd_serv_put(struct nfsd_net *nn)
 {
 	percpu_ref_put(&nn->nfsd_serv_ref);
 }
+EXPORT_SYMBOL_GPL(nfsd_serv_put);
 
 static void nfsd_serv_done(struct percpu_ref *ref)
 {
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 9735ae8d3e5e..6cc870122e2a 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -7,6 +7,7 @@
 
 #include <linux/list.h>
 #include <linux/uuid.h>
+#include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/svcauth.h>
 #include <linux/nfs.h>
 #include <net/net_namespace.h>
@@ -28,4 +29,50 @@ void nfs_uuid_begin(nfs_uuid_t *);
 void nfs_uuid_end(nfs_uuid_t *);
 bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
 
+struct nfsd_file;
+struct nfsd_net;
+
+struct nfs_localio_ctx {
+	struct nfsd_file *nf;
+	struct nfsd_net *nn;
+};
+
+/* localio needs to map filehandle -> struct nfsd_file */
+typedef struct nfs_localio_ctx *
+(*nfs_to_nfsd_open_local_fh_t)(struct net *, struct auth_domain *,
+			       struct rpc_clnt *, const struct cred *,
+			       const struct nfs_fh *, const fmode_t);
+
+extern struct nfs_localio_ctx *
+nfsd_open_local_fh(struct net *, struct auth_domain *,
+		   struct rpc_clnt *, const struct cred *,
+		   const struct nfs_fh *, const fmode_t);
+
+/* localio needs to acquire an nfsd_file */
+typedef struct nfsd_file * (*nfs_to_nfsd_file_get_t)(struct nfsd_file *);
+/* localio needs to release an nfsd_file */
+typedef void (*nfs_to_nfsd_file_put_t)(struct nfsd_file *);
+/* localio needs to access the nf->nf_file */
+typedef struct file * (*nfs_to_nfsd_file_file_t)(struct nfsd_file *);
+/* localio needs to release nn->nfsd_serv */
+typedef void (*nfs_to_nfsd_serv_put_t)(struct nfsd_net *);
+
+struct nfsd_localio_operations {
+	nfs_to_nfsd_open_local_fh_t	nfsd_open_local_fh;
+	nfs_to_nfsd_file_get_t		nfsd_file_get;
+	nfs_to_nfsd_file_put_t		nfsd_file_put;
+	nfs_to_nfsd_file_file_t		nfsd_file_file;
+	nfs_to_nfsd_serv_put_t		nfsd_serv_put;
+} ____cacheline_aligned;
+
+extern struct nfsd_localio_operations nfs_to;
+
+typedef void (*init_nfs_to_nfsd_localio_ops_t)(void);
+extern void init_nfs_to_nfsd_localio_ops(void);
+bool get_nfs_to_nfsd_localio_ops(void);
+void put_nfs_to_nfsd_localio_ops(void);
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void);
+void nfs_localio_ctx_free(struct nfs_localio_ctx *);
+
 #endif  /* __LINUX_NFSLOCALIO_H */
-- 
2.44.0



* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-30  5:01           ` Mike Snitzer
  2024-08-30  5:08             ` Mike Snitzer
@ 2024-08-30  5:12             ` Mike Snitzer
  2024-08-30  5:34             ` NeilBrown
  2 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-30  5:12 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, linux-nfs, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, Aug 30, 2024 at 01:01:55AM -0400, Mike Snitzer wrote:
> On Fri, Aug 30, 2024 at 02:36:13PM +1000, NeilBrown wrote:
> > On Fri, 30 Aug 2024, Jeff Layton wrote:
> > > 
> > > Have a pointer to a struct nfsd_localio_ops or something in the
> > > nfs_common module. That's initially set to NULL. Then, have a static
> > > structure of that type in nfsd.ko, and have its __init routine set the
> > > pointer in nfs_common to point to the right structure. The __exit
> > > routine will later set it to NULL.
> > > 
> > > > I really don't want all calls in NFS client (or nfs_common) to have to
> > > > first check if nfs_common's 'nfs_to' ops structure is NULL or not.
> > > 
> > > Neil seems to think that's not necessary:
> > > 
> > > "If nfs/localio holds an auth_domain, then it implicitly holds a
> > > reference to the nfsd module and the functions cannot disappear."
> > 
> > On reflection that isn't quite right, but it is the sort of approach
> > that I think we need to take.
> > > There are several things that the NFS client needs to hold on to.
> > 
> > 1/ It needs a reference to the nfsd module (or symbols in the module).
> >    I think this can be held long term but we need a clear mechanism for
> >    it to be dropped.
> > 2/ It needs a reference to the nfsd_serv which it gets through the
> >    'struct net' pointer.  I've posted patches to handle that better.
> > 3/ It needs a reference to an auth_domain.  This can safely be a long
> >    term reference.  It can already be invalidated and the code to free
> >    it is in sunrpc which nfs already pins.  Any delay in freeing it only
> >    wastes memory (not much), it doesn't impact anything else.
> > 4/ It needs a reference to the nfsd_file and/or file.  This is currently
> >    done only while the ref to the nfsd_serv is held, so I think there is
> >    no problem there.
> > 
> > So possibly we could take a reference to the nfsd module whenever we
> > > store a net in nfs_uuid, and drop the ref whenever we clear that.
> > 
> > That means we cannot call nfsd_open_local_fh() without first getting a
> > ref on the nfsd_serv which my latest code doesn't do.  That is easily
> > fixed.  I'll send a patch for consideration...
> 
> I already implemented 2 different versions today, meant for v15.
> 
> First is a relaxed version of the v14 code (less code, only using
> symbol_request on nfsd_open_local_fh).
> 
> Second is much more relaxed, because it leverages your original
> assumption that the auth_domain ref is sufficient.

Corresponding code needed in fs/nfsd/localio.c:

static const struct nfsd_localio_operations nfsd_localio_ops = {
        .nfsd_open_local_fh = nfsd_open_local_fh,
        .nfsd_file_get = nfsd_file_get,
        .nfsd_file_put = nfsd_file_put,
        .nfsd_file_file = nfsd_file_file,
        .nfsd_serv_put = nfsd_serv_put,
};

void nfsd_localio_ops_init(void)
{
        memcpy(&nfs_to, &nfsd_localio_ops, sizeof(nfsd_localio_ops));
}
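
For comparison, the pointer-based registration Jeff describes in the quoted
text above (a NULL pointer in nfs_common that nfsd's init sets and its exit
clears) might look roughly like this in a userspace model. All sketch_*
names are hypothetical. The NULL check in the caller is the per-call cost
that the memcpy() into 'nfs_to' is meant to avoid.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down ops table with a single operation. */
struct sketch_ops {
	int (*file_file)(int nf);
};

/* Lives on the shared (nfs_common-like) side; NULL until registered. */
static const struct sketch_ops *sketch_to;

static int sketch_file_file(int nf) { return nf * 2; }

/* Static table owned by the provider (nfsd-like) side. */
static const struct sketch_ops sketch_nfsd_ops = {
	.file_file = sketch_file_file,
};

/* Provider init publishes the table; provider exit withdraws it. */
static void sketch_provider_init(void) { sketch_to = &sketch_nfsd_ops; }
static void sketch_provider_exit(void) { sketch_to = NULL; }

/* Caller must cope with no provider being registered; -1 means "absent". */
static int sketch_call_file_file(int nf)
{
	const struct sketch_ops *ops = sketch_to;

	if (!ops)
		return -1;
	return ops->file_file(nf);
}
```

This model says nothing about a caller racing with sketch_provider_exit();
in the kernel that window is what the module and nfsd_serv references
discussed earlier in the thread have to close.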

From: Mike Snitzer <snitzer@kernel.org>
Date: Wed, 28 Aug 2024 17:04:44 -0500
Subject: [PATCH v15.option2] nfs_common: introduce nfs_localio_ctx struct and interfaces

Introduce struct nfs_localio_ctx (which has nfsd_file and nfsd_net
members) and the interfaces nfs_localio_ctx_alloc() and
nfs_localio_ctx_free().  The next commit will introduce
nfsd_open_local_fh() which returns a nfs_localio_ctx structure.

Also, expose localio's required NFSD symbols to NFS client:
- Make nfsd_open_local_fh() symbol and other required NFSD symbols
  available to NFS in a global 'nfs_to' nfsd_localio_operations
  struct. The next commit will also introduce nfsd_localio_ops_init()
  that init_nfsd() will call to initialize 'nfs_to'.

- Introduce nfsd_file_file() wrapper that provides access to
  nfsd_file's backing file.  Keeps nfsd_file structure opaque to NFS
  client (as suggested by Jeff Layton).

- The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
  symbols prepares for the NFS client to use nfsd_file for localio.

Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
Suggested-by: NeilBrown <neilb@suse.de> # nfsd_localio_operations
Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
 fs/nfs_common/nfslocalio.c | 62 ++++++++++++++++++++++++++++++++++++++
 fs/nfsd/filecache.c        | 16 ++++++++++
 fs/nfsd/filecache.h        |  1 +
 fs/nfsd/nfssvc.c           |  2 ++
 include/linux/nfslocalio.h | 43 ++++++++++++++++++++++++++
 5 files changed, 124 insertions(+)

diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 0b2e17c2068f..175064e37a75 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -72,3 +72,65 @@ bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *
 	return is_local;
 }
 EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
+
+/*
+ * The NFS LOCALIO code needs to call into NFSD using various symbols,
+ * but cannot be statically linked, because that will make the NFS
+ * module always depend on the NFSD module.
+ *
+ * 'nfs_to' provides NFS access to NFSD functions needed for LOCALIO,
+ * its lifetime is tightly coupled to the NFSD module and will always
+ * be available to NFS LOCALIO because any successful client<->server
+ * LOCALIO handshake results in a reference taken on an auth_domain,
+ * so NFS implicitly holds a reference to the NFSD module and its
+ * functions in the 'nfs_to' nfsd_localio_operations cannot disappear.
+ *
+ * If the last NFS client using LOCALIO disconnects (and its reference
+ * on NFSD dropped) then NFSD could be unloaded, resulting in 'nfs_to'
+ * functions being invalid pointers. But if NFSD isn't loaded then NFS
+ * will not be able to handshake with NFSD and will have no cause to
+ * try to call 'nfs_to' function pointers. If/when NFSD is reloaded it
+ * will reinitialize the 'nfs_to' function pointers and make LOCALIO
+ * possible.
+ */
+struct nfsd_localio_operations nfs_to;
+EXPORT_SYMBOL_GPL(nfs_to);
+
+/*
+ * nfs_localio_ctx cache and alloc/free interfaces.
+ */
+static struct kmem_cache *nfs_localio_ctx_cache;
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void)
+{
+	return kmem_cache_alloc(nfs_localio_ctx_cache,
+				GFP_KERNEL | __GFP_ZERO);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_alloc);
+
+void nfs_localio_ctx_free(struct nfs_localio_ctx *localio)
+{
+	if (localio->nf)
+		nfs_to.nfsd_file_put(localio->nf);
+	if (localio->nn)
+		nfs_to.nfsd_serv_put(localio->nn);
+	kmem_cache_free(nfs_localio_ctx_cache, localio);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_free);
+
+static int __init nfslocalio_init(void)
+{
+	nfs_localio_ctx_cache = KMEM_CACHE(nfs_localio_ctx, 0);
+	if (!nfs_localio_ctx_cache)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void __exit nfslocalio_exit(void)
+{
+	kmem_cache_destroy(nfs_localio_ctx_cache);
+}
+
+module_init(nfslocalio_init);
+module_exit(nfslocalio_exit);
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2dc72de31f61..1a26f5812578 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -39,6 +39,7 @@
 #include <linux/fsnotify.h>
 #include <linux/seq_file.h>
 #include <linux/rhashtable.h>
+#include <linux/nfslocalio.h>
 
 #include "vfs.h"
 #include "nfsd.h"
@@ -345,6 +346,7 @@ nfsd_file_get(struct nfsd_file *nf)
 		return nf;
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(nfsd_file_get);
 
 /**
  * nfsd_file_put - put the reference to a nfsd_file
@@ -389,6 +391,20 @@ nfsd_file_put(struct nfsd_file *nf)
 	if (refcount_dec_and_test(&nf->nf_ref))
 		nfsd_file_free(nf);
 }
+EXPORT_SYMBOL_GPL(nfsd_file_put);
+
+/**
+ * nfsd_file_file - get the backing file of an nfsd_file
+ * @nf: nfsd_file of which to access the backing file.
+ *
+ * Return backing file for @nf.
+ */
+struct file *
+nfsd_file_file(struct nfsd_file *nf)
+{
+	return nf->nf_file;
+}
+EXPORT_SYMBOL_GPL(nfsd_file_file);
 
 static void
 nfsd_file_dispose_list(struct list_head *dispose)
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 26ada78b8c1e..6fbbb2e32e95 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -56,6 +56,7 @@ int nfsd_file_cache_start_net(struct net *net);
 void nfsd_file_cache_shutdown_net(struct net *net);
 void nfsd_file_put(struct nfsd_file *nf);
 struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+struct file *nfsd_file_file(struct nfsd_file *nf);
 void nfsd_file_close_inode_sync(struct inode *inode);
 void nfsd_file_net_dispose(struct nfsd_net *nn);
 bool nfsd_file_is_cached(struct inode *inode);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c639fbe4d8c2..7b9119b8dd1b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
 #include <linux/sunrpc/svc_xprt.h>
 #include <linux/lockd/bind.h>
 #include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
 #include <linux/seq_file.h>
 #include <linux/inetdevice.h>
 #include <net/addrconf.h>
@@ -201,6 +202,7 @@ void nfsd_serv_put(struct nfsd_net *nn)
 {
 	percpu_ref_put(&nn->nfsd_serv_ref);
 }
+EXPORT_SYMBOL_GPL(nfsd_serv_put);
 
 static void nfsd_serv_done(struct percpu_ref *ref)
 {
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 9735ae8d3e5e..fdb1f278afb6 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -7,6 +7,7 @@
 
 #include <linux/list.h>
 #include <linux/uuid.h>
+#include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/svcauth.h>
 #include <linux/nfs.h>
 #include <net/net_namespace.h>
@@ -28,4 +29,46 @@ void nfs_uuid_begin(nfs_uuid_t *);
 void nfs_uuid_end(nfs_uuid_t *);
 bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
 
+struct nfsd_file;
+struct nfsd_net;
+
+struct nfs_localio_ctx {
+	struct nfsd_file *nf;
+	struct nfsd_net *nn;
+};
+
+/* localio needs to map filehandle -> struct nfsd_file */
+typedef struct nfs_localio_ctx *
+(*nfs_to_nfsd_open_local_fh_t)(struct net *, struct auth_domain *,
+			       struct rpc_clnt *, const struct cred *,
+			       const struct nfs_fh *, const fmode_t);
+
+extern struct nfs_localio_ctx *
+nfsd_open_local_fh(struct net *, struct auth_domain *,
+		   struct rpc_clnt *, const struct cred *,
+		   const struct nfs_fh *, const fmode_t);
+
+/* localio needs to acquire an nfsd_file */
+typedef struct nfsd_file * (*nfs_to_nfsd_file_get_t)(struct nfsd_file *);
+/* localio needs to release an nfsd_file */
+typedef void (*nfs_to_nfsd_file_put_t)(struct nfsd_file *);
+/* localio needs to access the nf->nf_file */
+typedef struct file * (*nfs_to_nfsd_file_file_t)(struct nfsd_file *);
+/* localio needs to release nn->nfsd_serv */
+typedef void (*nfs_to_nfsd_serv_put_t)(struct nfsd_net *);
+
+struct nfsd_localio_operations {
+	nfs_to_nfsd_open_local_fh_t	nfsd_open_local_fh;
+	nfs_to_nfsd_file_get_t		nfsd_file_get;
+	nfs_to_nfsd_file_put_t		nfsd_file_put;
+	nfs_to_nfsd_file_file_t		nfsd_file_file;
+	nfs_to_nfsd_serv_put_t		nfsd_serv_put;
+} ____cacheline_aligned;
+
+extern void nfsd_localio_ops_init(void);
+extern struct nfsd_localio_operations nfs_to;
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void);
+void nfs_localio_ctx_free(struct nfs_localio_ctx *);
+
 #endif  /* __LINUX_NFSLOCALIO_H */
-- 
2.44.0



* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-30  5:01           ` Mike Snitzer
  2024-08-30  5:08             ` Mike Snitzer
  2024-08-30  5:12             ` Mike Snitzer
@ 2024-08-30  5:34             ` NeilBrown
  2024-08-30  6:02               ` Mike Snitzer
  2 siblings, 1 reply; 75+ messages in thread
From: NeilBrown @ 2024-08-30  5:34 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Jeff Layton, linux-nfs, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, 30 Aug 2024, Mike Snitzer wrote:
> On Fri, Aug 30, 2024 at 02:36:13PM +1000, NeilBrown wrote:
> > On Fri, 30 Aug 2024, Jeff Layton wrote:

> > > Have a pointer to a struct nfsd_localio_ops or something in the
> > > nfs_common module. That's initially set to NULL. Then, have a static
> > > structure of that type in nfsd.ko, and have its __init routine set the
> > > pointer in nfs_common to point to the right structure. The __exit
> > > routine will later set it to NULL.
> > > 
> > > > I really don't want all calls in NFS client (or nfs_common) to have to
> > > > first check if nfs_common's 'nfs_to' ops structure is NULL or not.
> > > 
> > > Neil seems to think that's not necessary:
> > > 
> > > "If nfs/localio holds an auth_domain, then it implicitly holds a
> > > reference to the nfsd module and the functions cannot disappear."
> > 
> > On reflection that isn't quite right, but it is the sort of approach
> > that I think we need to take.
> > There are several things that the NFS client needs to hold on to.
> > 
> > 1/ It needs a reference to the nfsd module (or symbols in the module).
> >    I think this can be held long term but we need a clear mechanism for
> >    it to be dropped.
> > 2/ It needs a reference to the nfsd_serv which it gets through the
> >    'struct net' pointer.  I've posted patches to handle that better.
> > 3/ It needs a reference to an auth_domain.  This can safely be a long
> >    term reference.  It can already be invalidated and the code to free
> >    it is in sunrpc which nfs already pins.  Any delay in freeing it only
> >    wastes memory (not much), it doesn't impact anything else.
> > 4/ It needs a reference to the nfsd_file and/or file.  This is currently
> >    done only while the ref to the nfsd_serv is held, so I think there is
> >    no problem there.
> > 
> > So possibly we could take a reference to the nfsd module whenever we
> > store a net in nfs_uuid, and drop the ref whenever we clear that.
> > 
> > That means we cannot call nfsd_open_local_fh() without first getting a
> > ref on the nfsd_serv which my latest code doesn't do.  That is easily
> > fixed.  I'll send a patch for consideration...
> 
> I already implemented 2 different versions today, meant for v15.
> 
> First is a relaxed version of the v14 code (less code, only using
> symbol_request on nfsd_open_local_fh).
> 
> Second is much more relaxed, because it leverages your original
> assumption that the auth_domain ref is sufficient.
> 
> I'll reply twice to this mail, once with each respective patch.

Thanks... Unfortunately auth_domain isn't sufficient.

This is my version.  It should be folded back into one or more earlier
patches.  I think it is simpler.

It is against your v15 but with my 6 nfs_uuid patches replacing your
equivalents.

Thanks,
NeilBrown

diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 55622084d5c2..18b7554ec516 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -235,8 +235,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
 	if (mode & ~(FMODE_READ | FMODE_WRITE))
 		return NULL;
 
-	localio = nfs_to.nfsd_open_local_fh(&clp->cl_uuid,
-					    clp->cl_rpcclient, cred, fh, mode);
+	localio = nfs_open_local_fh(&clp->cl_uuid,
+				    clp->cl_rpcclient, cred, fh, mode);
 	if (IS_ERR(localio)) {
 		status = PTR_ERR(localio);
 		trace_nfs_local_open_fh(fh, mode, status);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index 8545ee75f756..cd9733eb3e4f 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -54,8 +54,11 @@ static nfs_uuid_t * nfs_uuid_lookup(const uuid_t *uuid)
 	return NULL;
 }
 
+struct module *nfsd_mod;
+
 void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
-		       struct net *net, struct auth_domain *dom)
+		       struct net *net, struct auth_domain *dom,
+		       struct module *mod)
 {
 	nfs_uuid_t *nfs_uuid;
 
@@ -70,6 +73,9 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
 		 */
 		list_move(&nfs_uuid->list, list);
 		nfs_uuid->net = net;
+
+		__module_get(mod);
+		nfsd_mod = mod;
 	}
 	spin_unlock(&nfs_uuid_lock);
 }
@@ -77,8 +83,10 @@ EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
 
 static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
 {
-	if (nfs_uuid->net)
+	if (nfs_uuid->net) {
 		put_net(nfs_uuid->net);
+		module_put(nfsd_mod);
+	}
 	nfs_uuid->net = NULL;
 	if (nfs_uuid->dom)
 		auth_domain_put(nfs_uuid->dom);
@@ -107,6 +115,26 @@ void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
 }
 EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
 
+struct nfs_localio_ctx *nfs_open_local_fh(nfs_uuid_t *uuid,
+		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
+		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
+{
+	struct nfs_localio_ctx *localio;
+
+	rcu_read_lock();
+	if (!READ_ONCE(uuid->net)) {
+		rcu_read_unlock();
+		return ERR_PTR(-ENXIO);
+	}
+	localio = nfs_to.nfsd_open_local_fh(uuid, rpc_clnt, cred,
+					    nfs_fh, fmode);
+	rcu_read_unlock();
+	if (IS_ERR(localio))
+		nfs_to.nfsd_serv_put(localio->nn);
+	return localio;
+}
+EXPORT_SYMBOL_GPL(nfs_open_local_fh);
+
 /*
  * The nfs localio code needs to call into nfsd using various symbols (below),
  * but cannot be statically linked, because that will make the nfs module
@@ -135,7 +163,8 @@ static void put_##NFSD_SYMBOL(void)			\
 /* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
 extern struct nfs_localio_ctx *
 nfsd_open_local_fh(nfs_uuid_t *, struct rpc_clnt *,
-		   const struct cred *, const struct nfs_fh *, const fmode_t);
+		   const struct cred *, const struct nfs_fh *, const fmode_t)
+	__must_hold(rcu);
 DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
 
 /* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 491bf5017d34..d50e54406914 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -45,6 +45,7 @@ struct nfs_localio_ctx *
 nfsd_open_local_fh(nfs_uuid_t *uuid,
 		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
 		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
+	__must_hold(rcu)
 {
 	int mayflags = NFSD_MAY_LOCALIO;
 	int status = 0;
@@ -58,10 +59,6 @@ nfsd_open_local_fh(nfs_uuid_t *uuid,
 	if (nfs_fh->size > NFS4_FHSIZE)
 		return ERR_PTR(-EINVAL);
 
-	localio = nfs_localio_ctx_alloc();
-	if (!localio)
-		return ERR_PTR(-ENOMEM);
-
 	/*
 	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
 	 * But the server may already be shutting down, if so disallow new localio.
@@ -69,17 +66,22 @@ nfsd_open_local_fh(nfs_uuid_t *uuid,
 	 * uuid->net is not NULL, then nfsd_serv_try_get() is safe and if that succeeds
 	 * we will have an implied reference to the net.
 	 */
-	rcu_read_lock();
 	net = READ_ONCE(uuid->net);
 	if (net)
 		nn = net_generic(net, nfsd_net_id);
-	if (unlikely(!nn || !nfsd_serv_try_get(nn))) {
-		rcu_read_unlock();
-		status = -ENXIO;
-		goto out_nfsd_serv;
-	}
+	if (unlikely(!nn || !nfsd_serv_try_get(nn)))
+		return ERR_PTR(-ENXIO);
+
+	/* Drop the rcu lock for alloc and nfsd_file_acquire_local() */
 	rcu_read_unlock();
 
+	localio = nfs_localio_ctx_alloc();
+	if (!localio) {
+		localio = ERR_PTR(-ENOMEM);
+		nfsd_serv_put(nn);
+		goto out_localio;
+	}
+
 	/* nfs_fh -> svc_fh */
 	fh_init(&fh, NFS4_FHSIZE);
 	fh.fh_handle.fh_size = nfs_fh->size;
@@ -104,11 +106,13 @@ nfsd_open_local_fh(nfs_uuid_t *uuid,
 	fh_put(&fh);
 	if (rq_cred.cr_group_info)
 		put_group_info(rq_cred.cr_group_info);
-out_nfsd_serv:
+
 	if (status) {
 		nfs_localio_ctx_free(localio);
-		return ERR_PTR(status);
+		localio = ERR_PTR(status);
 	}
+out_localio:
+	rcu_read_lock();
 	return localio;
 }
 EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
@@ -136,7 +140,7 @@ static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	nfs_uuid_is_local(&argp->uuid, &nn->local_clients,
-			  net, rqstp->rq_client);
+			  net, rqstp->rq_client, THIS_MODULE);
 
 	return rpc_success;
 }
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 2ecceb8b9d3d..c73633120997 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -164,7 +164,8 @@ void		nfsd_filp_close(struct file *fp);
 struct nfs_localio_ctx *
 nfsd_open_local_fh(nfs_uuid_t *,
 		   struct rpc_clnt *, const struct cred *,
-		   const struct nfs_fh *, const fmode_t);
+		   const struct nfs_fh *, const fmode_t)
+	__must_hold(rcu);
 
 static inline int fh_want_write(struct svc_fh *fh)
 {
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index e196f716a2f5..303e82e75b9e 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -29,7 +29,7 @@ typedef struct {
 void nfs_uuid_begin(nfs_uuid_t *);
 void nfs_uuid_end(nfs_uuid_t *);
 void nfs_uuid_is_local(const uuid_t *, struct list_head *,
-		       struct net *, struct auth_domain *);
+		       struct net *, struct auth_domain *, struct module *);
 void nfs_uuid_invalidate_clients(struct list_head *list);
 void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
 
@@ -69,4 +69,8 @@ void put_nfs_to_nfsd_symbols(void);
 struct nfs_localio_ctx *nfs_localio_ctx_alloc(void);
 void nfs_localio_ctx_free(struct nfs_localio_ctx *);
 
+struct nfs_localio_ctx *nfs_open_local_fh(nfs_uuid_t *uuid,
+		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
+		   const struct nfs_fh *nfs_fh, const fmode_t fmode);
+
 #endif  /* __LINUX_NFSLOCALIO_H */
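
For readers less familiar with the error-pointer convention that nfsd_open_local_fh() relies on in the patch above, here is a minimal userspace sketch. The three helpers are simplified re-implementations of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(); struct ctx, open_ctx() and use_ctx() are hypothetical names invented for illustration and do not appear in the series. The point it shows is that a function returning a pointer reports failure as ERR_PTR(-errno), and that the caller must test with IS_ERR() before touching any field.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified userspace copies of the kernel helpers (assumption: 4095 errnos). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error values occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical context object, standing in for something like nfs_localio_ctx. */
struct ctx {
	int id;
};

static struct ctx the_ctx = { .id = 7 };

/* Hypothetical opener: fails with -ENXIO when the "server" is down. */
static struct ctx *open_ctx(int server_up)
{
	if (!server_up)
		return ERR_PTR(-ENXIO);	/* an error pointer, never a bare int */
	return &the_ctx;
}

/* Caller: check IS_ERR() first, and only dereference on success. */
static int use_ctx(int server_up)
{
	struct ctx *c = open_ctx(server_up);

	if (IS_ERR(c))
		return (int)PTR_ERR(c);	/* do not touch c->id here */
	return c->id;
}
```

With these definitions, use_ctx(1) yields the stored id and use_ctx(0) yields -ENXIO; an error pointer is only ever decoded with PTR_ERR(), never dereferenced.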


* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-29  1:04 ` [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces Mike Snitzer
  2024-08-29 16:40   ` Jeff Layton
@ 2024-08-30  5:46   ` NeilBrown
  2024-08-30  5:56     ` Mike Snitzer
  1 sibling, 1 reply; 75+ messages in thread
From: NeilBrown @ 2024-08-30  5:46 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: linux-nfs, Jeff Layton, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Thu, 29 Aug 2024, Mike Snitzer wrote:
> +
> +struct nfs_localio_ctx {
> +	struct nfsd_file *nf;
> +	struct nfsd_net *nn;
> +};

struct nfsd_file contains "struct net *nf_net" which is initialised
early.
So this structure is redundant.

Instead of exporting nfsd_file_put() to nfs-localio, export
nfsd_file_local_put() (or whatever) which does both the nfsd_serv_put()
and the nfsd_file_put().

NeilBrown
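
The suggestion above can be sketched in userspace C, purely as an illustration and not as the eventual kernel code. All mock_* names are hypothetical stand-ins: plain integer counters replace the real refcount_t and percpu_ref, and mock_net conflates struct net with its nfsd_net. The sketch only shows the shape of the idea: because the file object already records which net it belongs to, a single put helper can release both references and no separate context struct is needed.

```c
/* Hypothetical userspace stand-ins; these are not the kernel definitions. */
struct mock_net {
	int serv_refs;			/* models the nfsd_serv reference */
};

struct mock_nfsd_file {
	int nf_ref;			/* models nf->nf_ref */
	struct mock_net *nf_net;	/* models nf->nf_net, set at creation */
};

/* Models nfsd_serv_put(): drop one server reference. */
static void mock_serv_put(struct mock_net *net)
{
	net->serv_refs--;
}

/* Models nfsd_file_put(): drop one file reference. */
static void mock_file_put(struct mock_nfsd_file *nf)
{
	nf->nf_ref--;
}

/*
 * Hypothetical combined helper, in the spirit of nfsd_file_local_put().
 * The net is read before the file reference is dropped, because in real
 * code the final put may free the file object.
 */
static void mock_file_local_put(struct mock_nfsd_file *nf)
{
	struct mock_net *net = nf->nf_net;

	mock_file_put(nf);
	mock_serv_put(net);
}
```

The ordering mirrors nfs_localio_ctx_free() in the v14 patch, which puts the nfsd_file before the nfsd_serv reference.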


* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-30  5:46   ` NeilBrown
@ 2024-08-30  5:56     ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-30  5:56 UTC (permalink / raw)
  To: NeilBrown
  Cc: linux-nfs, Jeff Layton, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, Aug 30, 2024 at 03:46:31PM +1000, NeilBrown wrote:
> On Thu, 29 Aug 2024, Mike Snitzer wrote:
> > +
> > +struct nfs_localio_ctx {
> > +	struct nfsd_file *nf;
> > +	struct nfsd_net *nn;
> > +};
> 
> struct nfsd_file contains "struct net *nf_net" which is initialised
> early.
> So this structure is redundant.

Oof, unwinding returning nfs_localio_ctx and going back to nfsd_file it is.
 
> Instead of exporting nfsd_file_put() to nfs-localio, export
> nfsd_file_local_put() (or whatever) which does both the nfsd_serv_put()
> and the nfsd_file_put().

OK, no more.. I have to catch up! ;)


* Re: [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
  2024-08-30  5:34             ` NeilBrown
@ 2024-08-30  6:02               ` Mike Snitzer
  0 siblings, 0 replies; 75+ messages in thread
From: Mike Snitzer @ 2024-08-30  6:02 UTC (permalink / raw)
  To: NeilBrown
  Cc: Jeff Layton, linux-nfs, Chuck Lever, Anna Schumaker,
	Trond Myklebust, linux-fsdevel

On Fri, Aug 30, 2024 at 03:34:02PM +1000, NeilBrown wrote:
> On Fri, 30 Aug 2024, Mike Snitzer wrote:
> > On Fri, Aug 30, 2024 at 02:36:13PM +1000, NeilBrown wrote:
> > > On Fri, 30 Aug 2024, Jeff Layton wrote:
> 
> > > > Have a pointer to a struct nfsd_localio_ops or something in the
> > > > nfs_common module. That's initially set to NULL. Then, have a static
> > > > structure of that type in nfsd.ko, and have its __init routine set the
> > > > pointer in nfs_common to point to the right structure. The __exit
> > > > routine will later set it to NULL.
> > > > 
> > > > > I really don't want all calls in NFS client (or nfs_common) to have to
> > > > > first check if nfs_common's 'nfs_to' ops structure is NULL or not.
> > > > 
> > > > Neil seems to think that's not necessary:
> > > > 
> > > > "If nfs/localio holds an auth_domain, then it implicitly holds a
> > > > reference to the nfsd module and the functions cannot disappear."
> > > 
> > > On reflection that isn't quite right, but it is the sort of approach
> > > that I think we need to take.
> > > There are several things that the NFS client needs to hold on to.
> > > 
> > > 1/ It needs a reference to the nfsd module (or symbols in the module).
> > >    I think this can be held long term but we need a clear mechanism for
> > >    it to be dropped.
> > > 2/ It needs a reference to the nfsd_serv which it gets through the
> > >    'struct net' pointer.  I've posted patches to handle that better.
> > > 3/ It needs a reference to an auth_domain.  This can safely be a long
> > >    term reference.  It can already be invalidated and the code to free
> > >    it is in sunrpc which nfs already pins.  Any delay in freeing it only
> > >    wastes memory (not much), it doesn't impact anything else.
> > > 4/ It needs a reference to the nfsd_file and/or file.  This is currently
> > >    done only while the ref to the nfsd_serv is held, so I think there is
> > >    no problem there.
> > > 
> > > So possibly we could take a reference to the nfsd module whenever we
> > > store a net in nfs_uuid, and drop the ref whenever we clear that.
> > > 
> > > That means we cannot call nfsd_open_local_fh() without first getting a
> > > ref on the nfsd_serv which my latest code doesn't do.  That is easily
> > > fixed.  I'll send a patch for consideration...
> > 
> > I already implemented 2 different versions today, meant for v15.
> > 
> > First is a relaxed version of the v14 code (less code, only using
> > symbol_request on nfsd_open_local_fh).
> > 
> > Second is much more relaxed, because it leverages your original
> > assumption that the auth_domain ref is sufficient.
> > 
> > I'll reply twice to this mail, once with each respective patch.
> 
> Thanks... Unfortunately auth_domain isn't sufficient.
> 
> This is my version.  It should be folded back into one or more earlier
> patches.  I think it is simpler.
> 
> It is against your v15 but with my 6 nfs_uuid patches replacing your
> equivalents.
> 
> Thanks,
> NeilBrown

Looks good!  But I noticed you are still using the v14
DEFINE_NFS_TO_NFSD_SYMBOL (which just implies that nfs_to is getting set
up using symbol_request), so your refcounting via __module_get is
redundant.  But I see your intent, and I can combine what you provided
below with the v15.option2 that I emailed earlier (leaning on your
__module_get rather than the insufficient auth_domain ref).

Thanks.
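
For context, the alternative from the quoted discussion (an ops pointer held in nfs_common that nfsd sets in its init routine and clears on exit) can be sketched in plain userspace C. Every name here is hypothetical, and the sketch deliberately has no locking, RCU, or module refcounting; the thread's point is that real code must hold a reference that keeps the provider alive across the call.

```c
#include <stddef.h>

/* Hypothetical ops table, standing in for something like nfsd_localio_operations. */
struct localio_ops {
	int (*open_local)(int fh);
};

/* Lives in the "common" layer; NULL until a provider registers. */
static const struct localio_ops *registered_ops;

/* Provider side (would be nfsd.ko in the discussion). */
static int provider_open_local(int fh)
{
	return fh + 1;	/* arbitrary result so the call is observable */
}

static const struct localio_ops provider_ops = {
	.open_local = provider_open_local,
};

/* Models the provider's __init routine: publish the ops table. */
static void provider_init(void)
{
	registered_ops = &provider_ops;
}

/* Models the provider's __exit routine: withdraw the ops table. */
static void provider_exit(void)
{
	registered_ops = NULL;
}

/* Consumer side: must cope with the provider being absent. */
static int consumer_open(int fh)
{
	const struct localio_ops *ops = registered_ops;

	if (!ops)
		return -1;	/* provider not loaded */
	return ops->open_local(fh);
}
```

The NULL check in consumer_open() is the per-call cost that the thread wants to avoid; holding a module reference for as long as a net is stored is one way to keep the table pinned without checking on every call.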

> 
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 55622084d5c2..18b7554ec516 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -235,8 +235,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
>  	if (mode & ~(FMODE_READ | FMODE_WRITE))
>  		return NULL;
>  
> -	localio = nfs_to.nfsd_open_local_fh(&clp->cl_uuid,
> -					    clp->cl_rpcclient, cred, fh, mode);
> +	localio = nfs_open_local_fh(&clp->cl_uuid,
> +				    clp->cl_rpcclient, cred, fh, mode);
>  	if (IS_ERR(localio)) {
>  		status = PTR_ERR(localio);
>  		trace_nfs_local_open_fh(fh, mode, status);
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index 8545ee75f756..cd9733eb3e4f 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -54,8 +54,11 @@ static nfs_uuid_t * nfs_uuid_lookup(const uuid_t *uuid)
>  	return NULL;
>  }
>  
> +struct module *nfsd_mod;
> +
>  void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
> -		       struct net *net, struct auth_domain *dom)
> +		       struct net *net, struct auth_domain *dom,
> +		       struct module *mod)
>  {
>  	nfs_uuid_t *nfs_uuid;
>  
> @@ -70,6 +73,9 @@ void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
>  		 */
>  		list_move(&nfs_uuid->list, list);
>  		nfs_uuid->net = net;
> +
> +		__module_get(mod);
> +		nfsd_mod = mod;
>  	}
>  	spin_unlock(&nfs_uuid_lock);
>  }
> @@ -77,8 +83,10 @@ EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
>  
>  static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
>  {
> -	if (nfs_uuid->net)
> +	if (nfs_uuid->net) {
>  		put_net(nfs_uuid->net);
> +		module_put(nfsd_mod);
> +	}
>  	nfs_uuid->net = NULL;
>  	if (nfs_uuid->dom)
>  		auth_domain_put(nfs_uuid->dom);
> @@ -107,6 +115,26 @@ void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
>  }
>  EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
>  
> +struct nfs_localio_ctx *nfs_open_local_fh(nfs_uuid_t *uuid,
> +		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> +		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> +{
> +	struct nfs_localio_ctx *localio;
> +
> +	rcu_read_lock();
> +	if (!READ_ONCE(uuid->net)) {
> +		rcu_read_unlock();
> +		return ERR_PTR(-ENXIO);
> +	}
> +	localio = nfs_to.nfsd_open_local_fh(uuid, rpc_clnt, cred,
> +					    nfs_fh, fmode);
> +	rcu_read_unlock();
> +	if (IS_ERR(localio))
> +		nfs_to.nfsd_serv_put(localio->nn);
> +	return localio;
> +}
> +EXPORT_SYMBOL_GPL(nfs_open_local_fh);
> +
>  /*
>   * The nfs localio code needs to call into nfsd using various symbols (below),
>   * but cannot be statically linked, because that will make the nfs module
> @@ -135,7 +163,8 @@ static void put_##NFSD_SYMBOL(void)			\
>  /* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
>  extern struct nfs_localio_ctx *
>  nfsd_open_local_fh(nfs_uuid_t *, struct rpc_clnt *,
> -		   const struct cred *, const struct nfs_fh *, const fmode_t);
> +		   const struct cred *, const struct nfs_fh *, const fmode_t)
> +	__must_hold(rcu);
>  DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
>  
>  /* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> index 491bf5017d34..d50e54406914 100644
> --- a/fs/nfsd/localio.c
> +++ b/fs/nfsd/localio.c
> @@ -45,6 +45,7 @@ struct nfs_localio_ctx *
>  nfsd_open_local_fh(nfs_uuid_t *uuid,
>  		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
>  		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
> +	__must_hold(rcu)
>  {
>  	int mayflags = NFSD_MAY_LOCALIO;
>  	int status = 0;
> @@ -58,10 +59,6 @@ nfsd_open_local_fh(nfs_uuid_t *uuid,
>  	if (nfs_fh->size > NFS4_FHSIZE)
>  		return ERR_PTR(-EINVAL);
>  
> -	localio = nfs_localio_ctx_alloc();
> -	if (!localio)
> -		return ERR_PTR(-ENOMEM);
> -
>  	/*
>  	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
>  	 * But the server may already be shutting down, if so disallow new localio.
> @@ -69,17 +66,22 @@ nfsd_open_local_fh(nfs_uuid_t *uuid,
>  	 * uuid->net is not NULL, then nfsd_serv_try_get() is safe and if that succeeds
>  	 * we will have an implied reference to the net.
>  	 */
> -	rcu_read_lock();
>  	net = READ_ONCE(uuid->net);
>  	if (net)
>  		nn = net_generic(net, nfsd_net_id);
> -	if (unlikely(!nn || !nfsd_serv_try_get(nn))) {
> -		rcu_read_unlock();
> -		status = -ENXIO;
> -		goto out_nfsd_serv;
> -	}
> +	if (unlikely(!nn || !nfsd_serv_try_get(nn)))
> +		return ERR_PTR(-ENXIO);
> +
> +	/* Drop the rcu lock for alloc and nfsd_file_acquire_local() */
>  	rcu_read_unlock();
>  
> +	localio = nfs_localio_ctx_alloc();
> +	if (!localio) {
> +		localio = ERR_PTR(-ENOMEM);
> +		nfsd_serv_put(nn);
> +		goto out_localio;
> +	}
> +
>  	/* nfs_fh -> svc_fh */
>  	fh_init(&fh, NFS4_FHSIZE);
>  	fh.fh_handle.fh_size = nfs_fh->size;
> @@ -104,11 +106,13 @@ nfsd_open_local_fh(nfs_uuid_t *uuid,
>  	fh_put(&fh);
>  	if (rq_cred.cr_group_info)
>  		put_group_info(rq_cred.cr_group_info);
> -out_nfsd_serv:
> +
>  	if (status) {
>  		nfs_localio_ctx_free(localio);
> -		return ERR_PTR(status);
> +		localio = ERR_PTR(status);
>  	}
> +out_localio:
> +	rcu_read_lock();
>  	return localio;
>  }
>  EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> @@ -136,7 +140,7 @@ static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
>  	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
>  
>  	nfs_uuid_is_local(&argp->uuid, &nn->local_clients,
> -			  net, rqstp->rq_client);
> +			  net, rqstp->rq_client, THIS_MODULE);
>  
>  	return rpc_success;
>  }
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 2ecceb8b9d3d..c73633120997 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -164,7 +164,8 @@ void		nfsd_filp_close(struct file *fp);
>  struct nfs_localio_ctx *
>  nfsd_open_local_fh(nfs_uuid_t *,
>  		   struct rpc_clnt *, const struct cred *,
> -		   const struct nfs_fh *, const fmode_t);
> +		   const struct nfs_fh *, const fmode_t)
> +	__must_hold(rcu);
>  
>  static inline int fh_want_write(struct svc_fh *fh)
>  {
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index e196f716a2f5..303e82e75b9e 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -29,7 +29,7 @@ typedef struct {
>  void nfs_uuid_begin(nfs_uuid_t *);
>  void nfs_uuid_end(nfs_uuid_t *);
>  void nfs_uuid_is_local(const uuid_t *, struct list_head *,
> -		       struct net *, struct auth_domain *);
> +		       struct net *, struct auth_domain *, struct module *);
>  void nfs_uuid_invalidate_clients(struct list_head *list);
>  void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
>  
> @@ -69,4 +69,8 @@ void put_nfs_to_nfsd_symbols(void);
>  struct nfs_localio_ctx *nfs_localio_ctx_alloc(void);
>  void nfs_localio_ctx_free(struct nfs_localio_ctx *);
>  
> +struct nfs_localio_ctx *nfs_open_local_fh(nfs_uuid_t *uuid,
> +		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
> +		   const struct nfs_fh *nfs_fh, const fmode_t fmode);
> +
>  #endif  /* __LINUX_NFSLOCALIO_H */
> 


end of thread, other threads:[~2024-08-30  6:02 UTC | newest]

Thread overview: 75+ messages
2024-08-29  1:03 [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
2024-08-29  1:03 ` [PATCH v14 01/25] nfs_common: factor out nfs_errtbl and nfs_stat_to_errno Mike Snitzer
2024-08-29 14:17   ` Jeff Layton
2024-08-29  1:03 ` [PATCH v14 02/25] nfs_common: factor out nfs4_errtbl and nfs4_stat_to_errno Mike Snitzer
2024-08-29 14:17   ` Jeff Layton
2024-08-29  1:03 ` [PATCH v14 03/25] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-08-29 14:19   ` Jeff Layton
2024-08-29  1:03 ` [PATCH v14 04/25] NFSD: Handle @rqstp == NULL in check_nfsd_access() Mike Snitzer
2024-08-29 14:20   ` Jeff Layton
2024-08-29  1:04 ` [PATCH v14 05/25] NFSD: Refactor nfsd_setuser_and_check_port() Mike Snitzer
2024-08-29 14:23   ` Jeff Layton
2024-08-29  1:04 ` [PATCH v14 06/25] NFSD: Avoid using rqstp->rq_vers in nfsd_set_fh_dentry() Mike Snitzer
2024-08-29  1:45   ` [PATCH v14.5 " Mike Snitzer
2024-08-29 16:52     ` Jeff Layton
2024-08-29 14:28   ` [PATCH v14 " Jeff Layton
2024-08-29 15:28     ` Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 07/25] NFSD: Short-circuit fh_verify tracepoints for LOCALIO Mike Snitzer
2024-08-29 14:33   ` Jeff Layton
2024-08-29 14:35     ` Chuck Lever
2024-08-29  1:04 ` [PATCH v14 08/25] nfsd: factor out __fh_verify to allow NULL rqstp to be passed Mike Snitzer
2024-08-29 14:39   ` Jeff Layton
2024-08-29 15:35     ` Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 09/25] nfsd: add nfsd_file_acquire_local() Mike Snitzer
2024-08-29 14:49   ` Jeff Layton
2024-08-29 15:47   ` Chuck Lever
2024-08-29 15:59     ` Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 10/25] nfsd: add nfsd_serv_try_get and nfsd_serv_put Mike Snitzer
2024-08-29 15:49   ` Chuck Lever
2024-08-29 15:57   ` Jeff Layton
2024-08-29 16:01     ` Mike Snitzer
2024-08-29 16:04       ` Chuck Lever
2024-08-29  1:04 ` [PATCH v14 11/25] SUNRPC: remove call_allocate() BUG_ONs Mike Snitzer
2024-08-29 15:58   ` Jeff Layton
2024-08-29  1:04 ` [PATCH v14 12/25] SUNRPC: add svcauth_map_clnt_to_svc_cred_local Mike Snitzer
2024-08-29 15:50   ` Chuck Lever
2024-08-29 16:01   ` Jeff Layton
2024-08-29  1:04 ` [PATCH v14 13/25] SUNRPC: replace program list with program array Mike Snitzer
2024-08-29 16:02   ` Jeff Layton
2024-08-29  1:04 ` [PATCH v14 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-08-29 16:07   ` Jeff Layton
2024-08-29 16:22     ` Mike Snitzer
2024-08-29 23:39   ` NeilBrown
2024-08-30  1:45     ` Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces Mike Snitzer
2024-08-29 16:40   ` Jeff Layton
2024-08-29 16:52     ` Mike Snitzer
2024-08-29 17:48       ` Jeff Layton
2024-08-30  4:36         ` NeilBrown
2024-08-30  5:01           ` Mike Snitzer
2024-08-30  5:08             ` Mike Snitzer
2024-08-30  5:12             ` Mike Snitzer
2024-08-30  5:34             ` NeilBrown
2024-08-30  6:02               ` Mike Snitzer
2024-08-30  5:46   ` NeilBrown
2024-08-30  5:56     ` Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 16/25] nfsd: add localio support Mike Snitzer
2024-08-29 16:01   ` Chuck Lever
2024-08-29 16:15     ` Mike Snitzer
2024-08-29 23:10     ` NeilBrown
2024-08-29 16:49   ` Jeff Layton
2024-08-29 16:59     ` Mike Snitzer
2024-08-29 17:18       ` Chuck Lever
2024-08-29  1:04 ` [PATCH v14 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-08-29 16:50   ` Jeff Layton
2024-08-29  1:04 ` [PATCH v14 18/25] nfs: pass struct nfs_localio_ctx to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 19/25] nfs: add localio support Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 20/25] nfs: enable localio for non-pNFS IO Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 21/25] pnfs/flexfiles: enable localio support Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 22/25] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 24/25] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-29  1:04 ` [PATCH v14 25/25] nfs: add FAQ section to Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-08-29  1:47   ` [PATCH v14.5 " Mike Snitzer
2024-08-29  1:42 ` [PATCH v14 00/25] nfs/nfsd: add support for LOCALIO Mike Snitzer
2024-08-29  1:50   ` Mike Snitzer
