* [PATCH v9 00/19] nfs/nfsd: add support for localio
@ 2024-06-28 21:10 Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 01/19] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
` (19 more replies)
0 siblings, 20 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Hi,
I'd prefer to see these changes land upstream for 6.11 if possible.
They are gated behind Kconfig options, so they pose no risk if
disabled. And even with localio enabled, it has proven to work well
under increased testing.
I worked with Kent Overstreet to enable testing integration with ktest
running xfstests; the dashboard is here:
https://evilpiepirate.org/~testdashboard/ci?branch=snitm-nfs
(It is running far more xfstests tests than is usual for NFS; it would
be good to reconcile that with the listing provided here:
https://wiki.linux-nfs.org/wiki/index.php/Xfstests )
Changes since v8:
- Fixed xfstests generic/355 (clear suid after write) as a side-effect
of dropping the "nfs/localio: use dedicated workqueues for
filesystem read and write" patch (XFS is looking at the security
context of the task... which is really odd!)
- Refactored and fixed nfs_local_vfs_getattr() to support NFS v4 as
requested by Neil.
- Fixed potential for localio file opens to prevent nfsd from shutting
down (as caught by Jeff's helpful review) by switching to using
percpu_ref_tryget_live (and renamed nfsd_serv_get to
nfsd_serv_try_get).
- Removed all dprintk() from fs/nfsd/localio.c
- Removed one dprintk() from fs/nfs/localio.c, left the others because
the nfs client maintainers don't seem to object to them (they do
require explicit enablement, after all).
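To illustrate the percpu_ref_tryget_live() interlock mentioned above, here
is a simplified userspace model in plain C: an atomic counter plus a
"dying" flag stand in for the kernel's percpu_ref, and the function names
(serv_try_get, serv_put, serv_kill) are illustrative, not the actual
nfsd interfaces. The point is that once shutdown begins, new localio file
opens fail instead of pinning the server:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Simplified model of the nfsd_serv_try_get()/percpu_ref_tryget_live()
 * pattern: once the server starts shutting down ("dying"), new localio
 * opens must fail so shutdown is never blocked by fresh references.
 */
struct serv_ref {
	atomic_long count;	/* active references */
	atomic_bool dying;	/* set by the shutdown path */
};

static bool serv_try_get(struct serv_ref *ref)
{
	if (atomic_load(&ref->dying))
		return false;		/* shutdown started: refuse */
	atomic_fetch_add(&ref->count, 1);
	/* re-check: shutdown may have raced with our increment */
	if (atomic_load(&ref->dying)) {
		atomic_fetch_sub(&ref->count, 1);
		return false;
	}
	return true;
}

static void serv_put(struct serv_ref *ref)
{
	atomic_fetch_sub(&ref->count, 1);
}

static void serv_kill(struct serv_ref *ref)
{
	atomic_store(&ref->dying, true);	/* later try_gets fail */
}
```

The real percpu_ref additionally amortizes the counter per-CPU; the model
only captures the live/dying semantics that matter for the shutdown race.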
TODO:
- Hopefully get a favorable response to this patch from XFS engineers:
https://marc.info/?l=linux-xfs&m=171959152810706&w=2
(otherwise, will need to revisit using dedicated workqueue patch)
All review and comments are welcome!
Thanks,
Mike
My git tree is here:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/
This v9 is available as both branch nfs-localio-for-6.11 (which always
tracks the latest version) and branch nfs-localio-for-6.11.v9
Mike Snitzer (11):
nfs_common: add NFS LOCALIO auxiliary protocol enablement
nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4
nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg
nfs: implement client support for NFS_LOCALIO_PROGRAM
nfsd: add "localio" support
nfsd/localio: manage netns reference in nfsd_open_local_fh
nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh
nfsd: add Kconfig options to allow localio to be enabled
nfsd: implement server support for NFS_LOCALIO_PROGRAM
nfs: add Documentation/filesystems/nfs/localio.rst
NeilBrown (1):
SUNRPC: replace program list with program array
Trond Myklebust (2):
nfs: enable localio for non-pNFS I/O
pnfs/flexfiles: Enable localio for flexfiles I/O
Weston Andros Adamson (5):
nfs: pass nfs_client to nfs_initiate_pgio
nfs: pass descriptor thru nfs_initiate_pgio path
nfs: pass struct file to nfs_init_pgio and nfs_init_commit
sunrpc: add rpcauth_map_to_svc_cred_local
nfs: add "localio" support
Documentation/filesystems/nfs/localio.rst | 135 ++++
fs/Kconfig | 3 +
fs/nfs/Kconfig | 14 +
fs/nfs/Makefile | 1 +
fs/nfs/blocklayout/blocklayout.c | 6 +-
fs/nfs/client.c | 15 +-
fs/nfs/filelayout/filelayout.c | 16 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 131 +++-
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 +
fs/nfs/inode.c | 4 +
fs/nfs/internal.h | 60 +-
fs/nfs/localio.c | 827 ++++++++++++++++++++++
fs/nfs/nfs4xdr.c | 13 -
fs/nfs/nfstrace.h | 61 ++
fs/nfs/pagelist.c | 32 +-
fs/nfs/pnfs.c | 24 +-
fs/nfs/pnfs.h | 6 +-
fs/nfs/pnfs_nfs.c | 2 +-
fs/nfs/write.c | 13 +-
fs/nfs_common/Makefile | 3 +
fs/nfs_common/nfslocalio.c | 74 ++
fs/nfsd/Kconfig | 14 +
fs/nfsd/Makefile | 1 +
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 319 +++++++++
fs/nfsd/netns.h | 12 +-
fs/nfsd/nfsctl.c | 2 +-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 116 ++-
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 9 +
include/linux/nfs.h | 9 +
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 10 +
include/linux/nfs_xdr.h | 20 +-
include/linux/nfslocalio.h | 41 ++
include/linux/sunrpc/auth.h | 4 +
include/linux/sunrpc/svc.h | 7 +-
net/sunrpc/auth.c | 15 +
net/sunrpc/clnt.c | 1 -
net/sunrpc/svc.c | 68 +-
net/sunrpc/svc_xprt.c | 2 +-
net/sunrpc/svcauth_unix.c | 3 +-
44 files changed, 1975 insertions(+), 135 deletions(-)
create mode 100644 Documentation/filesystems/nfs/localio.rst
create mode 100644 fs/nfs/localio.c
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 fs/nfsd/localio.c
create mode 100644 include/linux/nfslocalio.h
--
2.44.0
^ permalink raw reply [flat|nested] 44+ messages in thread
* [PATCH v9 01/19] nfs: pass nfs_client to nfs_initiate_pgio
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 02/19] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
` (18 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
The nfs_client is needed for localio support: without it, there is no
way to disable localio on the client if localio is attempted but fails.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/filelayout/filelayout.c | 4 ++--
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ++++--
fs/nfs/internal.h | 5 +++--
fs/nfs/pagelist.c | 10 ++++++----
4 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 29d84dc66ca3..43e16e9e0176 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -486,7 +486,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -528,7 +528,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 24188af56d5b..327f1a5c9fbe 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1803,7 +1803,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
0, RPC_TASK_SOFTCONN);
@@ -1871,7 +1872,8 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
sync, RPC_TASK_SOFTCONN);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..a9c0c29f7804 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,8 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6efb5068c116..d9b795c538cd 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,8 +844,9 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}
-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
@@ -855,7 +856,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
.rpc_cred = cred,
};
struct rpc_task_setup task_setup_data = {
- .rpc_client = clnt,
+ .rpc_client = rpc_clnt,
.task = &hdr->task,
.rpc_message = &msg,
.callback_ops = call_ops,
@@ -1070,7 +1071,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
+ ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
NFS_PROTO(hdr->inode),
--
2.44.0
* [PATCH v9 02/19] nfs: pass descriptor thru nfs_initiate_pgio path
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 01/19] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 03/19] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
` (17 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
This is needed for localio support.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/blocklayout/blocklayout.c | 6 ++++--
fs/nfs/filelayout/filelayout.c | 10 ++++++----
fs/nfs/flexfilelayout/flexfilelayout.c | 10 ++++++----
fs/nfs/internal.h | 6 +++---
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/pnfs.c | 24 +++++++++++++-----------
fs/nfs/pnfs.h | 6 ++++--
7 files changed, 40 insertions(+), 28 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 6be13e0ec170..6a61ddd1835f 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -227,7 +227,8 @@ bl_end_par_io_read(void *data)
}
static enum pnfs_try_status
-bl_read_pagelist(struct nfs_pgio_header *header)
+bl_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
@@ -372,7 +373,8 @@ static void bl_end_par_io_write(void *data)
}
static enum pnfs_try_status
-bl_write_pagelist(struct nfs_pgio_header *header, int sync)
+bl_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header, int sync)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 43e16e9e0176..f9b600c4a2b5 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -447,7 +447,8 @@ static const struct rpc_call_ops filelayout_commit_call_ops = {
};
static enum pnfs_try_status
-filelayout_read_pagelist(struct nfs_pgio_header *hdr)
+filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -486,7 +487,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -494,7 +495,8 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
/* Perform async writes. */
static enum pnfs_try_status
-filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -528,7 +530,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 327f1a5c9fbe..22c0e8014afb 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1751,7 +1751,8 @@ static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
};
static enum pnfs_try_status
-ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
+ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1803,7 +1804,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
@@ -1822,7 +1823,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
/* Perform async writes. */
static enum pnfs_try_status
-ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1872,7 +1874,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index a9c0c29f7804..f6e56fdd8bc2 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,9 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
- struct nfs_pgio_header *hdr, const struct cred *cred,
- const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
+ struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
+ const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d9b795c538cd..3786d767e2ff 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,7 +844,8 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}
-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
+ struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
@@ -1071,7 +1072,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ ret = nfs_initiate_pgio(desc,
+ NFS_SERVER(hdr->inode)->nfs_client,
NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b5834728f31b..c9015179b72c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2885,10 +2885,11 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
}
static enum pnfs_try_status
-pnfs_try_to_write_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg,
- int how)
+pnfs_try_to_write_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg,
+ int how)
{
struct inode *inode = hdr->inode;
enum pnfs_try_status trypnfs;
@@ -2898,7 +2899,7 @@ pnfs_try_to_write_data(struct nfs_pgio_header *hdr,
dprintk("%s: Writing ino:%lu %u@%llu (how %d)\n", __func__,
inode->i_ino, hdr->args.count, hdr->args.offset, how);
- trypnfs = nfss->pnfs_curr_ld->write_pagelist(hdr, how);
+ trypnfs = nfss->pnfs_curr_ld->write_pagelist(desc, hdr, how);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -2913,7 +2914,7 @@ pnfs_do_write(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- trypnfs = pnfs_try_to_write_data(hdr, call_ops, lseg, how);
+ trypnfs = pnfs_try_to_write_data(desc, hdr, call_ops, lseg, how);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_write_through_mds(desc, hdr);
@@ -3012,9 +3013,10 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
* Call the appropriate parallel I/O subsystem read function.
*/
static enum pnfs_try_status
-pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg)
+pnfs_try_to_read_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg)
{
struct inode *inode = hdr->inode;
struct nfs_server *nfss = NFS_SERVER(inode);
@@ -3025,7 +3027,7 @@ pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
dprintk("%s: Reading ino:%lu %u@%llu\n",
__func__, inode->i_ino, hdr->args.count, hdr->args.offset);
- trypnfs = nfss->pnfs_curr_ld->read_pagelist(hdr);
+ trypnfs = nfss->pnfs_curr_ld->read_pagelist(desc, hdr);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_READ);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -3058,7 +3060,7 @@ pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
+ trypnfs = pnfs_try_to_read_data(desc, hdr, call_ops, lseg);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_read_through_mds(desc, hdr);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index fa5beeaaf5da..92acb837cfa6 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -157,8 +157,10 @@ struct pnfs_layoutdriver_type {
* Return PNFS_ATTEMPTED to indicate the layout code has attempted
* I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
*/
- enum pnfs_try_status (*read_pagelist)(struct nfs_pgio_header *);
- enum pnfs_try_status (*write_pagelist)(struct nfs_pgio_header *, int);
+ enum pnfs_try_status (*read_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *);
+ enum pnfs_try_status (*write_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *, int);
void (*free_deviceid_node) (struct nfs4_deviceid_node *);
struct nfs4_deviceid_node * (*alloc_deviceid_node)
--
2.44.0
* [PATCH v9 03/19] nfs: pass struct file to nfs_init_pgio and nfs_init_commit
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 01/19] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 02/19] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 04/19] sunrpc: add rpcauth_map_to_svc_cred_local Mike Snitzer
` (16 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
This is needed for localio support.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/filelayout/filelayout.c | 6 +++---
fs/nfs/flexfilelayout/flexfilelayout.c | 6 +++---
fs/nfs/internal.h | 6 ++++--
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/pnfs_nfs.c | 2 +-
fs/nfs/write.c | 5 +++--
6 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index f9b600c4a2b5..b9e5e7bd15ca 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -489,7 +489,7 @@ filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}
@@ -532,7 +532,7 @@ filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}
@@ -1013,7 +1013,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
data->args.fh = fh;
return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
&filelayout_commit_call_ops, how,
- RPC_TASK_SOFTCONN);
+ RPC_TASK_SOFTCONN, NULL);
out_err:
pnfs_generic_prepare_to_resend_writes(data);
pnfs_generic_commit_release(data);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 22c0e8014afb..3ea07446f05a 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1808,7 +1808,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1878,7 +1878,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1953,7 +1953,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_commit_call_ops_v3 :
&ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN);
+ how, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f6e56fdd8bc2..958c8de072e2 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -309,7 +309,8 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags);
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
@@ -529,7 +530,8 @@ extern int nfs_initiate_commit(struct rpc_clnt *clnt,
struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags);
+ int how, int flags,
+ struct file *localio);
extern void nfs_init_commit(struct nfs_commit_data *data,
struct list_head *head,
struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 3786d767e2ff..57d62db3be5b 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -848,7 +848,8 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags)
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
struct rpc_message msg = {
@@ -1080,7 +1081,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
NFS_PROTO(hdr->inode),
desc->pg_rpc_callops,
desc->pg_ioflags,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags,
+ NULL);
}
return ret;
}
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 88e061bd711b..ecfde2649cf3 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -537,7 +537,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
nfs_initiate_commit(NFS_CLIENT(inode), data,
NFS_PROTO(data->inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF);
+ RPC_TASK_CRED_NOREF, NULL);
} else {
nfs_init_commit(data, NULL, data->lseg, cinfo);
initiate_commit(data, how);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2329cbb0e446..267bed2a4ceb 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1670,7 +1670,8 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);
int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags)
+ int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
int priority = flush_task_priority(how);
@@ -1816,7 +1817,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
task_flags = RPC_TASK_MOVEABLE;
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags, NULL);
}
/*
--
2.44.0
* [PATCH v9 04/19] sunrpc: add rpcauth_map_to_svc_cred_local
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (2 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 03/19] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 05/19] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
` (15 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
Add a new function, rpcauth_map_to_svc_cred_local, which maps a generic
cred to a svc_cred suitable for use in nfsd.
This is needed by the localio code to map nfs client creds to nfs
server credentials.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
include/linux/sunrpc/auth.h | 4 ++++
net/sunrpc/auth.c | 15 +++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 61e58327b1aa..872f594a924c 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -11,6 +11,7 @@
#define _LINUX_SUNRPC_AUTH_H
#include <linux/sunrpc/sched.h>
+#include <linux/sunrpc/svcauth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/xdr.h>
@@ -184,6 +185,9 @@ int rpcauth_uptodatecred(struct rpc_task *);
int rpcauth_init_credcache(struct rpc_auth *);
void rpcauth_destroy_credcache(struct rpc_auth *);
void rpcauth_clear_credcache(struct rpc_cred_cache *);
+void rpcauth_map_to_svc_cred_local(struct rpc_auth *,
+ const struct cred *,
+ struct svc_cred *);
char * rpcauth_stringify_acceptor(struct rpc_cred *);
static inline
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 04534ea537c8..00f12ca779c5 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -308,6 +308,21 @@ rpcauth_init_credcache(struct rpc_auth *auth)
}
EXPORT_SYMBOL_GPL(rpcauth_init_credcache);
+void
+rpcauth_map_to_svc_cred_local(struct rpc_auth *auth, const struct cred *cred,
+ struct svc_cred *svc)
+{
+ svc->cr_uid = cred->uid;
+ svc->cr_gid = cred->gid;
+ svc->cr_flavor = auth->au_flavor;
+ if (cred->group_info)
+ svc->cr_group_info = get_group_info(cred->group_info);
+ /* These aren't relevant for local (network is bypassed) */
+ svc->cr_principal = NULL;
+ svc->cr_gss_mech = NULL;
+}
+EXPORT_SYMBOL_GPL(rpcauth_map_to_svc_cred_local);
+
char *
rpcauth_stringify_acceptor(struct rpc_cred *cred)
{
--
2.44.0
* [PATCH v9 05/19] nfs_common: add NFS LOCALIO auxiliary protocol enablement
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (3 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 04/19] sunrpc: add rpcauth_map_to_svc_cred_local Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 06/19] nfs: add "localio" support Mike Snitzer
` (14 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Localio adds a global nfsd_uuids list to nfs_common that nfsd uses to
register, and the nfs client to identify, local nfsd instances.
nfsd_uuids is protected by the nfsd_mutex or the RCU read lock. The
list is composed of nfsd_uuid_t instances that nfsd creates and
manages (one per network namespace).
nfsd_uuid_is_local() will be used to search all local nfsd instances
for the nfsd uuid specified by the client.
This commit also adds all the nfs_client members required to implement
the entire localio feature (which depends on the LOCALIO protocol).
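As a rough sketch of the register-then-lookup pattern described above,
here is a self-contained userspace model in plain C. A single pthread
mutex stands in for both nfsd_mutex (writers) and the kernel's RCU
read-side protection (readers), and the entry layout and helper names
are illustrative only, not the actual nfs_common interfaces:

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model of the global nfsd_uuids list: nfsd registers one
 * entry per network namespace; the client looks up a server-provided
 * uuid to decide whether that server is local.  In the kernel, add and
 * remove are serialized by nfsd_mutex and lookups take only the RCU
 * read lock; here one mutex covers both roles.
 */
struct nfsd_uuid_entry {
	char uuid[37];			/* uuid string + NUL */
	struct nfsd_uuid_entry *next;
};

static struct nfsd_uuid_entry *nfsd_uuids;
static pthread_mutex_t uuids_lock = PTHREAD_MUTEX_INITIALIZER;

static void nfsd_uuid_register(const char *uuid)
{
	struct nfsd_uuid_entry *e = calloc(1, sizeof(*e));

	if (!e)
		return;
	strncpy(e->uuid, uuid, sizeof(e->uuid) - 1);
	pthread_mutex_lock(&uuids_lock);
	e->next = nfsd_uuids;		/* kernel: list_add_rcu() */
	nfsd_uuids = e;
	pthread_mutex_unlock(&uuids_lock);
}

static bool nfsd_uuid_is_local(const char *uuid)
{
	bool found = false;

	pthread_mutex_lock(&uuids_lock);	/* kernel: rcu_read_lock() */
	for (struct nfsd_uuid_entry *e = nfsd_uuids; e; e = e->next) {
		if (strcmp(e->uuid, uuid) == 0) {
			found = true;
			break;
		}
	}
	pthread_mutex_unlock(&uuids_lock);
	return found;
}
```

The mutex on the read side is the simplification: the whole reason the
kernel list is RCU-protected is so the client's per-I/O lookup takes no
lock at all.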
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/client.c | 8 +++++
fs/nfs_common/Makefile | 3 ++
fs/nfs_common/nfslocalio.c | 74 ++++++++++++++++++++++++++++++++++++++
fs/nfsd/netns.h | 4 +++
fs/nfsd/nfssvc.c | 12 ++++++-
include/linux/nfs_fs_sb.h | 9 +++++
include/linux/nfslocalio.h | 39 ++++++++++++++++++++
7 files changed, 148 insertions(+), 1 deletion(-)
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 include/linux/nfslocalio.h
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index de77848ae654..bcdf8d42cbc7 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -178,6 +178,14 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
clp->cl_net = get_net(cl_init->net);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ seqlock_init(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ clp->nfsd_open_local_fh = NULL;
+ clp->cl_nfssvc_net = NULL;
+#endif /* CONFIG_NFS_LOCALIO */
+
clp->cl_principal = "*";
clp->cl_xprtsec = cl_init->xprtsec;
return clp;
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index 119c75ab9fd0..d81623b76aba 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -6,5 +6,8 @@
obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
nfs_acl-objs := nfsacl.o
+obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
+nfs_localio-objs := nfslocalio.o
+
obj-$(CONFIG_GRACE_PERIOD) += grace.o
obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
new file mode 100644
index 000000000000..a234aa92950f
--- /dev/null
+++ b/fs/nfs_common/nfslocalio.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/rculist.h>
+#include <linux/nfslocalio.h>
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NFS localio protocol bypass support");
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ * Reads are protected by RCU read lock (see below).
+ */
+LIST_HEAD(nfsd_uuids);
+EXPORT_SYMBOL(nfsd_uuids);
+
+/* Must be called with RCU read lock held. */
+static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid,
+ struct net **netp)
+{
+ nfsd_uuid_t *nfsd_uuid;
+
+ list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
+ if (uuid_equal(&nfsd_uuid->uuid, uuid)) {
+ *netp = nfsd_uuid->net;
+ return &nfsd_uuid->uuid;
+ }
+
+ return &uuid_null;
+}
+
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp)
+{
+ bool is_local;
+ const uuid_t *nfsd_uuid;
+
+ rcu_read_lock();
+ nfsd_uuid = nfsd_uuid_lookup(uuid, netp);
+ is_local = !uuid_is_null(nfsd_uuid);
+ rcu_read_unlock();
+
+ return is_local;
+}
+EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
+
+/*
+ * The nfs localio code needs to call into nfsd to do the filehandle -> struct path
+ * mapping, but cannot be statically linked, because that would make the
+ * nfs module depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
+ * nfs_common module will only hold a reference on nfsd when localio is in use.
+ * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+
+extern int nfsd_open_local_fh(struct net *, struct rpc_clnt *rpc_clnt,
+ const struct cred *cred, const struct nfs_fh *nfs_fh,
+ const fmode_t fmode, struct file **pfilp);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void)
+{
+ return symbol_request(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(get_nfsd_open_local_fh);
+
+void put_nfsd_open_local_fh(void)
+{
+ symbol_put(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(put_nfsd_open_local_fh);
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 14ec15656320..0c5a1d97e4ac 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -15,6 +15,7 @@
#include <linux/percpu_counter.h>
#include <linux/siphash.h>
#include <linux/sunrpc/stats.h>
+#include <linux/nfslocalio.h>
/* Hash tables for nfs4_clientid state */
#define CLIENT_HASH_BITS 4
@@ -213,6 +214,9 @@ struct nfsd_net {
/* last time an admin-revoke happened for NFSv4.0 */
time64_t nfs40_last_revoke;
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ nfsd_uuid_t nfsd_uuid;
+#endif
};
/* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 9edb4f7c4cc2..1222a0a33fe1 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
#include <linux/sunrpc/svc_xprt.h>
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
#include <linux/seq_file.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
@@ -427,6 +428,10 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#ifdef CONFIG_NFSD_V4_2_INTER_SSC
nfsd4_ssc_init_umount_work(nn);
+#endif
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
return 0;
@@ -456,6 +461,9 @@ static void nfsd_shutdown_net(struct net *net)
lockd_down(net);
nn->lockd_up = false;
}
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ list_del_rcu(&nn->nfsd_uuid.list);
+#endif
nn->nfsd_net_up = false;
nfsd_shutdown_generic();
}
@@ -802,7 +810,9 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
strscpy(nn->nfsd_name, scope ? scope : utsname()->nodename,
sizeof(nn->nfsd_name));
-
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ uuid_gen(&nn->nfsd_uuid.uuid);
+#endif
error = nfsd_create_serv(net);
if (error)
goto out;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 92de074e63b9..e58e706a6503 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -8,6 +8,7 @@
#include <linux/wait.h>
#include <linux/nfs_xdr.h>
#include <linux/sunrpc/xprt.h>
+#include <linux/nfslocalio.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
@@ -125,6 +126,14 @@ struct nfs_client {
struct net *cl_net;
struct list_head pending_cb_stateids;
struct rcu_head rcu;
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ struct timespec64 cl_nfssvc_boot;
+ seqlock_t cl_boot_lock;
+ struct rpc_clnt * cl_rpcclient_localio;
+ struct net * cl_nfssvc_net;
+ nfs_to_nfsd_open_t nfsd_open_local_fh;
+#endif /* CONFIG_NFS_LOCALIO */
};
/*
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
new file mode 100644
index 000000000000..c9592ad0afe2
--- /dev/null
+++ b/include/linux/nfslocalio.h
@@ -0,0 +1,39 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#ifndef __LINUX_NFSLOCALIO_H
+#define __LINUX_NFSLOCALIO_H
+
+#include <linux/list.h>
+#include <linux/uuid.h>
+#include <linux/nfs.h>
+#include <net/net_namespace.h>
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ */
+extern struct list_head nfsd_uuids;
+
+/*
+ * Each nfsd instance has an nfsd_uuid_t that is accessible through the
+ * global nfsd_uuids list. Useful to allow a client to negotiate whether
+ * localio is possible with its server.
+ */
+typedef struct {
+ uuid_t uuid;
+ struct list_head list;
+ struct net *net; /* nfsd's network namespace */
+} nfsd_uuid_t;
+
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp);
+
+typedef int (*nfs_to_nfsd_open_t)(struct net *, struct rpc_clnt *,
+ const struct cred *, const struct nfs_fh *,
+ const fmode_t, struct file **);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
+void put_nfsd_open_local_fh(void);
+
+#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0
* [PATCH v9 06/19] nfs: add "localio" support
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (4 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 05/19] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
` (13 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.
nfs_local_probe() is stubbed out; later commits will enable the client
and server handshake via a Linux-only LOCALIO auxiliary RPC protocol.
The client binds dynamically to the nfsd module (via the nfs_localio
module, which is part of nfs_common), so localio will only work if nfsd
is already loaded.
The "localio_enabled" nfs kernel module parameter can be used to enable
or disable localio support at runtime.
CONFIG_NFS_LOCALIO controls the client enablement.
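Because localio_enabled is declared with module_param(..., 0644) in code that is built into the nfs module, it should be togglable at runtime through sysfs or pinned at load time; a sketch (paths assume the client module is named nfs):

```shell
# Disable localio at runtime (affects subsequent I/O decisions)
echo N > /sys/module/nfs/parameters/localio_enabled

# Re-enable it
echo Y > /sys/module/nfs/parameters/localio_enabled

# Or set it persistently for future module loads
echo "options nfs localio_enabled=0" > /etc/modprobe.d/nfs-localio.conf
```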
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/Kconfig | 3 +
fs/nfs/Kconfig | 14 +
fs/nfs/Makefile | 1 +
fs/nfs/client.c | 3 +
fs/nfs/inode.c | 4 +
fs/nfs/internal.h | 51 +++
fs/nfs/localio.c | 654 ++++++++++++++++++++++++++++++++++++++
fs/nfs/nfstrace.h | 61 ++++
fs/nfs/pagelist.c | 3 +
fs/nfs/write.c | 3 +
include/linux/nfs.h | 2 +
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 1 +
13 files changed, 802 insertions(+)
create mode 100644 fs/nfs/localio.c
diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..170083ff2a51 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
tristate
select FS_POSIX_ACL
+config NFS_COMMON_LOCALIO_SUPPORT
+ tristate
+
config NFS_COMMON
bool
depends on NFSD || NFS_FS || LOCKD
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 57249f040dfc..311ae8bc587f 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -86,6 +86,20 @@ config NFS_V4
If unsure, say Y.
+config NFS_LOCALIO
+ tristate "NFS client support for the LOCALIO auxiliary protocol"
+ depends on NFS_V3 || NFS_V4
+ select NFS_COMMON_LOCALIO_SUPPORT
+ help
+ Some NFS servers support an auxiliary NFS LOCALIO protocol
+ that is not an official part of the NFS version 3 or 4 protocol.
+
+ This option enables support for the LOCALIO protocol in the
+ kernel's NFS client. Enable this to bypass using the NFS
+ protocol when issuing reads, writes and commits to the server.
+
+ If unsure, say N.
+
config NFS_SWAP
bool "Provide swap over NFS support"
default n
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 5f6db37f461e..9fb2f2cac87e 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -13,6 +13,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
+nfs-$(CONFIG_NFS_LOCALIO) += localio.o
obj-$(CONFIG_NFS_V2) += nfsv2.o
nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index bcdf8d42cbc7..1300c388f971 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -241,6 +241,8 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
+ nfs_local_disable(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
@@ -432,6 +434,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
+ nfs_local_probe(new);
return rpc_ops->init_client(new, cl_init);
}
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index acef52ecb1bb..f9923cbf6058 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -39,6 +39,7 @@
#include <linux/slab.h>
#include <linux/compat.h>
#include <linux/freezer.h>
+#include <linux/file.h>
#include <linux/uaccess.h>
#include <linux/iversion.h>
@@ -1053,6 +1054,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
ctx->lock_context.open_context = ctx;
INIT_LIST_HEAD(&ctx->list);
ctx->mdsthreshold = NULL;
+ ctx->local_filp = NULL;
return ctx;
}
EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
@@ -1084,6 +1086,8 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
nfs_sb_deactive(sb);
put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
kfree(ctx->mdsthreshold);
+ if (!IS_ERR_OR_NULL(ctx->local_filp))
+ fput(ctx->local_filp);
kfree_rcu(ctx, rcu_head);
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 958c8de072e2..d352040e3232 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -451,6 +451,57 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+/* localio.c */
+extern void nfs_local_disable(struct nfs_client *);
+extern void nfs_local_probe(struct nfs_client *);
+extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
+ struct nfs_fh *, const fmode_t);
+extern struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx);
+extern int nfs_local_doio(struct nfs_client *, struct file *,
+ struct nfs_pgio_header *,
+ const struct rpc_call_ops *);
+extern int nfs_local_commit(struct file *, struct nfs_commit_data *,
+ const struct rpc_call_ops *, int);
+extern bool nfs_server_is_local(const struct nfs_client *clp);
+
+#else
+static inline void nfs_local_disable(struct nfs_client *clp) {}
+static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline struct file *nfs_local_open_fh(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ const fmode_t mode)
+{
+ return ERR_PTR(-EINVAL);
+}
+static inline struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx)
+{
+ return NULL;
+}
+static inline int nfs_local_doio(struct nfs_client *clp, struct file *filep,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ return -EINVAL;
+}
+static inline int nfs_local_commit(struct file *filep, struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ return -EINVAL;
+}
+static inline bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return false;
+}
+#endif /* CONFIG_NFS_LOCALIO */
+
/* super.c */
extern const struct super_operations nfs_sops;
bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
new file mode 100644
index 000000000000..0f7d6d55087b
--- /dev/null
+++ b/fs/nfs/localio.c
@@ -0,0 +1,654 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS client support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/vfs.h>
+#include <linux/file.h>
+#include <linux/inet.h>
+#include <linux/sunrpc/addr.h>
+#include <linux/inetdevice.h>
+#include <net/addrconf.h>
+#include <linux/bvec.h>
+
+#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
+
+#include "internal.h"
+#include "pnfs.h"
+#include "nfstrace.h"
+
+#define NFSDBG_FACILITY NFSDBG_VFS
+
+struct nfs_local_kiocb {
+ struct kiocb kiocb;
+ struct bio_vec *bvec;
+ struct nfs_pgio_header *hdr;
+ struct work_struct work;
+};
+
+struct nfs_local_fsync_ctx {
+ struct file *filp;
+ struct nfs_commit_data *data;
+ struct work_struct work;
+ struct kref kref;
+ struct completion *done;
+};
+static void nfs_local_fsync_work(struct work_struct *work);
+
+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ */
+static struct {
+ __u32 stat;
+ int errno;
+} nfs_errtbl[] = {
+ { NFS4_OK, 0 },
+ { NFS4ERR_PERM, -EPERM },
+ { NFS4ERR_NOENT, -ENOENT },
+ { NFS4ERR_IO, -EIO },
+ { NFS4ERR_NXIO, -ENXIO },
+ { NFS4ERR_FBIG, -E2BIG },
+ { NFS4ERR_STALE, -EBADF },
+ { NFS4ERR_ACCESS, -EACCES },
+ { NFS4ERR_EXIST, -EEXIST },
+ { NFS4ERR_XDEV, -EXDEV },
+ { NFS4ERR_MLINK, -EMLINK },
+ { NFS4ERR_NOTDIR, -ENOTDIR },
+ { NFS4ERR_ISDIR, -EISDIR },
+ { NFS4ERR_INVAL, -EINVAL },
+ { NFS4ERR_FBIG, -EFBIG },
+ { NFS4ERR_NOSPC, -ENOSPC },
+ { NFS4ERR_ROFS, -EROFS },
+ { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFS4ERR_NOTEMPTY, -ENOTEMPTY },
+ { NFS4ERR_DQUOT, -EDQUOT },
+ { NFS4ERR_STALE, -ESTALE },
+ { NFS4ERR_STALE, -EOPENSTALE },
+ { NFS4ERR_DELAY, -ETIMEDOUT },
+ { NFS4ERR_DELAY, -ERESTARTSYS },
+ { NFS4ERR_DELAY, -EAGAIN },
+ { NFS4ERR_DELAY, -ENOMEM },
+ { NFS4ERR_IO, -ETXTBSY },
+ { NFS4ERR_IO, -EBUSY },
+ { NFS4ERR_BADHANDLE, -EBADHANDLE },
+ { NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFS4ERR_NOTSUPP, -EOPNOTSUPP },
+ { NFS4ERR_TOOSMALL, -ETOOSMALL },
+ { NFS4ERR_SERVERFAULT, -ESERVERFAULT },
+ { NFS4ERR_SERVERFAULT, -ENFILE },
+ { NFS4ERR_IO, -EREMOTEIO },
+ { NFS4ERR_IO, -EUCLEAN },
+ { NFS4ERR_PERM, -ENOKEY },
+ { NFS4ERR_BADTYPE, -EBADTYPE },
+ { NFS4ERR_SYMLINK, -ELOOP },
+ { NFS4ERR_DEADLOCK, -EDEADLK },
+};
+
+/*
+ * Convert a local errno into an NFS status code.
+ * This mapping is used for both NFSv3 and NFSv4 replies.
+ */
+static __u32
+nfs4errno(int errno)
+{
+ unsigned int i;
+ for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
+ if (nfs_errtbl[i].errno == errno)
+ return nfs_errtbl[i].stat;
+ }
+ /* If we cannot translate the error, the recovery routines should
+ * handle it.
+ * Note: remaining NFSv4 error codes have values > 10000, so should
+ * not conflict with native Linux error codes.
+ */
+ return NFS4ERR_SERVERFAULT;
+}
+
+static bool localio_enabled __read_mostly = true;
+module_param(localio_enabled, bool, 0644);
+
+bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
+ localio_enabled;
+}
+EXPORT_SYMBOL_GPL(nfs_server_is_local);
+
+/*
+ * nfs_local_enable - enable local i/o for an nfs_client
+ */
+static __maybe_unused void nfs_local_enable(struct nfs_client *clp,
+ struct net *net)
+{
+ if (READ_ONCE(clp->nfsd_open_local_fh)) {
+ set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ clp->cl_nfssvc_net = net;
+ trace_nfs_local_enable(clp);
+ }
+}
+
+/*
+ * nfs_local_disable - disable local i/o for an nfs_client
+ */
+void nfs_local_disable(struct nfs_client *clp)
+{
+ if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
+ trace_nfs_local_disable(clp);
+ clp->cl_nfssvc_net = NULL;
+ }
+}
+
+/*
+ * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ */
+void nfs_local_probe(struct nfs_client *clp)
+{
+}
+EXPORT_SYMBOL_GPL(nfs_local_probe);
+
+/*
+ * nfs_local_open_fh - open a local filehandle
+ *
+ * Returns a pointer to a struct file or an ERR_PTR
+ */
+struct file *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, const fmode_t mode)
+{
+ struct file *filp;
+ int status;
+
+ if (mode & ~(FMODE_READ | FMODE_WRITE))
+ return ERR_PTR(-EINVAL);
+
+ status = clp->nfsd_open_local_fh(clp->cl_nfssvc_net, clp->cl_rpcclient,
+ cred, fh, mode, &filp);
+ if (status < 0) {
+ trace_nfs_local_open_fh(fh, mode, status);
+ switch (status) {
+ case -ENXIO:
+ nfs_local_disable(clp);
+ fallthrough;
+ case -ETIMEDOUT:
+ status = -EAGAIN;
+ }
+ filp = ERR_PTR(status);
+ }
+ return filp;
+}
+EXPORT_SYMBOL_GPL(nfs_local_open_fh);
+
+static struct bio_vec *
+nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
+ unsigned int npages, gfp_t flags)
+{
+ struct bio_vec *bvec, *p;
+
+ bvec = kmalloc_array(npages, sizeof(*bvec), flags);
+ if (bvec != NULL) {
+ for (p = bvec; npages > 0; p++, pagevec++, npages--) {
+ p->bv_page = *pagevec;
+ p->bv_len = PAGE_SIZE;
+ p->bv_offset = 0;
+ }
+ }
+ return bvec;
+}
+
+static void
+nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
+{
+ kfree(iocb->bvec);
+ kfree(iocb);
+}
+
+static struct nfs_local_kiocb *
+nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_kiocb *iocb;
+
+ iocb = kmalloc(sizeof(*iocb), flags);
+ if (iocb == NULL)
+ return NULL;
+ iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
+ hdr->page_array.npages, flags);
+ if (iocb->bvec == NULL) {
+ kfree(iocb);
+ return NULL;
+ }
+ init_sync_kiocb(&iocb->kiocb, filp);
+ iocb->kiocb.ki_pos = hdr->args.offset;
+ iocb->hdr = hdr;
+ iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+ return iocb;
+}
+
+static void
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
+ hdr->args.count + hdr->args.pgbase);
+ if (hdr->args.pgbase != 0)
+ iov_iter_advance(i, hdr->args.pgbase);
+}
+
+static void
+nfs_local_hdr_release(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ call_ops->rpc_call_done(&hdr->task, hdr);
+ call_ops->rpc_release(hdr);
+}
+
+static void
+nfs_local_pgio_init(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ hdr->task.tk_ops = call_ops;
+ if (!hdr->task.tk_start)
+ hdr->task.tk_start = ktime_get();
+}
+
+static void
+nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
+{
+ if (status >= 0) {
+ hdr->res.count = status;
+ hdr->res.op_status = NFS4_OK;
+ hdr->task.tk_status = 0;
+ } else {
+ hdr->res.op_status = nfs4errno(status);
+ hdr->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ fput(iocb->kiocb.ki_filp);
+ nfs_local_iocb_free(iocb);
+ nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+}
+
+static void
+nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct file *filp = iocb->kiocb.ki_filp;
+
+ nfs_local_pgio_done(hdr, status);
+
+ if (hdr->res.count != hdr->args.count ||
+ hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
+ hdr->res.eof = true;
+
+ dprintk("%s: read %ld bytes eof %d.\n", __func__,
+ status > 0 ? status : 0, hdr->res.eof);
+}
+
+static int
+nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_read count=%u pos=%llu\n",
+ __func__, hdr->args.count, hdr->args.offset);
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, READ);
+
+ nfs_local_pgio_init(hdr, call_ops);
+ hdr->res.eof = false;
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+
+ return 0;
+}
+
+static void
+nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ u32 *verf = (u32 *)verifier->data;
+ int seq = 0;
+
+ do {
+ read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
+ verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
+ verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
+ } while (need_seqretry(&clp->cl_boot_lock, seq));
+ done_seqretry(&clp->cl_boot_lock, seq);
+}
+
+static void
+nfs_reset_boot_verifier(struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+ write_seqlock(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ write_sequnlock(&clp->cl_boot_lock);
+}
+
+static void
+nfs_set_local_verifier(struct inode *inode,
+ struct nfs_writeverf *verf,
+ enum nfs3_stable_how how)
+{
+ nfs_copy_boot_verifier(&verf->verifier, inode);
+ verf->committed = how;
+}
+
+static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
+{
+ struct kstat stat;
+ struct file *filp = iocb->kiocb.ki_filp;
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct nfs_fattr *fattr = hdr->res.fattr;
+
+ if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
+ STATX_INO |
+ STATX_ATIME |
+ STATX_MTIME |
+ STATX_CTIME |
+ STATX_SIZE |
+ STATX_BLOCKS,
+ AT_STATX_SYNC_AS_STAT))
+ return;
+
+ fattr->valid = (NFS_ATTR_FATTR_FILEID |
+ NFS_ATTR_FATTR_CHANGE |
+ NFS_ATTR_FATTR_SIZE |
+ NFS_ATTR_FATTR_ATIME |
+ NFS_ATTR_FATTR_MTIME |
+ NFS_ATTR_FATTR_CTIME |
+ NFS_ATTR_FATTR_SPACE_USED);
+
+ fattr->fileid = stat.ino;
+ fattr->size = stat.size;
+ fattr->atime = stat.atime;
+ fattr->mtime = stat.mtime;
+ fattr->ctime = stat.ctime;
+ fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ fattr->du.nfs3.used = stat.blocks << 9;
+}
+
+static void
+nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
+
+ /* Handle short writes as if they are ENOSPC */
+ if (status > 0 && status < hdr->args.count) {
+ hdr->mds_offset += status;
+ hdr->args.offset += status;
+ hdr->args.pgbase += status;
+ hdr->args.count -= status;
+ nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
+ status = -ENOSPC;
+ }
+ if (status < 0)
+ nfs_reset_boot_verifier(hdr->inode);
+ nfs_local_pgio_done(hdr, status);
+}
+
+static int
+nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_write count=%u pos=%llu %s\n",
+ __func__, hdr->args.count, hdr->args.offset,
+ (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ switch (hdr->args.stable) {
+ default:
+ break;
+ case NFS_DATA_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC;
+ break;
+ case NFS_FILE_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
+ }
+ nfs_local_pgio_init(hdr, call_ops);
+
+ nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_write_done(iocb, status);
+ nfs_local_vfs_getattr(iocb);
+ nfs_local_pgio_release(iocb);
+
+ return 0;
+}
+
+static struct file *
+nfs_local_file_open_cached(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ struct file *filp = ctx->local_filp;
+
+ if (!filp) {
+ struct file *new = nfs_local_open_fh(clp, cred, fh, ctx->mode);
+ if (IS_ERR_OR_NULL(new))
+ return NULL;
+ /* try to put this one in the slot */
+ filp = cmpxchg(&ctx->local_filp, NULL, new);
+ if (filp != NULL)
+ fput(new);
+ else
+ filp = new;
+ }
+ return get_file(filp);
+}
+
+struct file *
+nfs_local_file_open(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ return nfs_local_file_open_cached(clp, cred, fh, ctx);
+}
+
+int
+nfs_local_doio(struct nfs_client *clp, struct file *filp,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ int status = 0;
+
+ if (!hdr->args.count)
+ goto out_fput;
+ /* Don't support filesystems without read_iter/write_iter */
+ if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
+ nfs_local_disable(clp);
+ status = -EAGAIN;
+ goto out_fput;
+ }
+
+ switch (hdr->rw_mode) {
+ case FMODE_READ:
+ status = nfs_do_local_read(hdr, filp, call_ops);
+ break;
+ case FMODE_WRITE:
+ status = nfs_do_local_write(hdr, filp, call_ops);
+ break;
+ default:
+ dprintk("%s: invalid mode: %d\n", __func__,
+ hdr->rw_mode);
+ status = -EINVAL;
+ }
+out_fput:
+ if (status != 0) {
+ fput(filp);
+ hdr->task.tk_status = status;
+ nfs_local_hdr_release(hdr, call_ops);
+ }
+ return status;
+}
+
+static void
+nfs_local_init_commit(struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ data->task.tk_ops = call_ops;
+}
+
+static int
+nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
+{
+ loff_t start = data->args.offset;
+ loff_t end = LLONG_MAX;
+
+ if (data->args.count > 0) {
+ end = start + data->args.count - 1;
+ if (end < start)
+ end = LLONG_MAX;
+ }
+
+ dprintk("%s: commit %llu - %llu\n", __func__, start, end);
+ return vfs_fsync_range(filp, start, end, 0);
+}
+
+static void
+nfs_local_commit_done(struct nfs_commit_data *data, int status)
+{
+ if (status >= 0) {
+ nfs_set_local_verifier(data->inode,
+ data->res.verf,
+ NFS_FILE_SYNC);
+ data->res.op_status = NFS4_OK;
+ data->task.tk_status = 0;
+ } else {
+ nfs_reset_boot_verifier(data->inode);
+ data->res.op_status = nfs4errno(status);
+ data->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_release_commit_data(struct file *filp,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ fput(filp);
+ call_ops->rpc_call_done(&data->task, data);
+ call_ops->rpc_release(data);
+}
+
+static struct nfs_local_fsync_ctx *
+nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
+
+ if (ctx != NULL) {
+ ctx->filp = filp;
+ ctx->data = data;
+ INIT_WORK(&ctx->work, nfs_local_fsync_work);
+ kref_init(&ctx->kref);
+ ctx->done = NULL;
+ }
+ return ctx;
+}
+
+static void
+nfs_local_fsync_ctx_kref_free(struct kref *kref)
+{
+ kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
+}
+
+static void
+nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
+{
+ kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
+}
+
+static void
+nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
+{
+ nfs_local_release_commit_data(ctx->filp, ctx->data,
+ ctx->data->task.tk_ops);
+ nfs_local_fsync_ctx_put(ctx);
+}
+
+static void
+nfs_local_fsync_work(struct work_struct *work)
+{
+ struct nfs_local_fsync_ctx *ctx;
+ int status;
+
+ ctx = container_of(work, struct nfs_local_fsync_ctx, work);
+
+ status = nfs_local_run_commit(ctx->filp, ctx->data);
+ nfs_local_commit_done(ctx->data, status);
+ if (ctx->done != NULL)
+ complete(ctx->done);
+ nfs_local_fsync_ctx_free(ctx);
+}
+
+int
+nfs_local_commit(struct file *filp, struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ struct nfs_local_fsync_ctx *ctx;
+
+ ctx = nfs_local_fsync_ctx_alloc(data, filp, GFP_KERNEL);
+ if (!ctx) {
+ nfs_local_commit_done(data, -ENOMEM);
+ nfs_local_release_commit_data(filp, data, call_ops);
+ return -ENOMEM;
+ }
+
+ nfs_local_init_commit(data, call_ops);
+ kref_get(&ctx->kref);
+ if (how & FLUSH_SYNC) {
+ DECLARE_COMPLETION_ONSTACK(done);
+ ctx->done = &done;
+ queue_work(nfsiod_workqueue, &ctx->work);
+ wait_for_completion(&done);
+ } else
+ queue_work(nfsiod_workqueue, &ctx->work);
+ nfs_local_fsync_ctx_put(ctx);
+ return 0;
+}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 1e710654af11..95a2c19a9172 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1681,6 +1681,67 @@ TRACE_EVENT(nfs_mount_path,
TP_printk("path='%s'", __get_str(path))
);
+TRACE_EVENT(nfs_local_open_fh,
+ TP_PROTO(
+ const struct nfs_fh *fh,
+ fmode_t fmode,
+ int error
+ ),
+
+ TP_ARGS(fh, fmode, error),
+
+ TP_STRUCT__entry(
+ __field(int, error)
+ __field(u32, fhandle)
+ __field(unsigned int, fmode)
+ ),
+
+ TP_fast_assign(
+ __entry->error = error;
+ __entry->fhandle = nfs_fhandle_hash(fh);
+ __entry->fmode = (__force unsigned int)fmode;
+ ),
+
+ TP_printk(
+ "error=%d fhandle=0x%08x mode=%s",
+ __entry->error,
+ __entry->fhandle,
+ show_fs_fmode_flags(__entry->fmode)
+ )
+);
+
+DECLARE_EVENT_CLASS(nfs_local_client_event,
+ TP_PROTO(
+ const struct nfs_client *clp
+ ),
+
+ TP_ARGS(clp),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, protocol)
+ __string(server, clp->cl_hostname)
+ ),
+
+ TP_fast_assign(
+ __entry->protocol = clp->rpc_ops->version;
+ __assign_str(server);
+ ),
+
+ TP_printk(
+ "server=%s NFSv%u", __get_str(server), __entry->protocol
+ )
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+ DEFINE_EVENT(nfs_local_client_event, name, \
+ TP_PROTO( \
+ const struct nfs_client *clp \
+ ), \
+ TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
+
DECLARE_EVENT_CLASS(nfs_xdr_event,
TP_PROTO(
const struct xdr_stream *xdr,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 57d62db3be5b..b08420b8e664 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -879,6 +879,9 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
hdr->args.count,
(unsigned long long)hdr->args.offset);
+ if (localio)
+ return nfs_local_doio(clp, localio, hdr, call_ops);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 267bed2a4ceb..b29b0fd5431f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1700,6 +1700,9 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
dprintk("NFS: initiated commit call\n");
+ if (localio)
+ return nfs_local_commit(localio, data, call_ops, how);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index ceb70a926b95..64ed672a0b34 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -8,6 +8,8 @@
#ifndef _LINUX_NFS_H
#define _LINUX_NFS_H
+#include <linux/cred.h>
+#include <linux/sunrpc/auth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/string.h>
#include <linux/crc32.h>
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 039898d70954..a0bb947fdd1d 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -96,6 +96,8 @@ struct nfs_open_context {
struct list_head list;
struct nfs4_threshold *mdsthreshold;
struct rcu_head rcu_head;
+
+ struct file *local_filp;
};
struct nfs_open_dir_context {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index e58e706a6503..4290c550a049 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -50,6 +50,7 @@ struct nfs_client {
#define NFS_CS_DS 7 /* - Server is a DS */
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
+#define NFS_CS_LOCAL_IO 10 /* - client is local */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
--
2.44.0
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (5 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 06/19] nfs: add "localio" support Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-29 15:50 ` Chuck Lever
2024-06-28 21:10 ` [PATCH v9 08/19] nfs: enable localio for non-pNFS I/O Mike Snitzer
` (12 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
This is nfs-localio code which blurs the boundary between server and
client...
The change_attr is used by NFS to detect if a file might have changed.
This code is used to get the attributes after a write request. NFS
uses a GETATTR request to the server at other times. The change_attr
must be consistent between the two paths, otherwise comparisons
between them are meaningless.
So nfs_localio_vfs_getattr() should use the same change_attr as the
one that would be used if the NFS GETATTR request were made. For
NFSv3, that is nfs_timespec_to_change_attr() as was already
implemented. For NFSv4 it is something different (as implemented in
this commit).
[above header derived from linux-nfs message Neil sent on this topic]
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 48 +++++++++++++++++++++++++++++++++++++++---------
1 file changed, 39 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 0f7d6d55087b..fe96f05ba8ca 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -364,21 +364,47 @@ nfs_set_local_verifier(struct inode *inode,
verf->committed = how;
}
+/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
+static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
+{
+ u32 request_mask = STATX_BASIC_STATS;
+
+ if (version == 4)
+ request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
+ return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
+}
+
+/*
+ * Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute(),
+ * FIXME: factor out to common code.
+ */
+static u64 __nfsd4_change_attribute(const struct kstat *stat,
+ const struct inode *inode)
+{
+ u64 chattr;
+
+ if (stat->result_mask & STATX_CHANGE_COOKIE) {
+ chattr = stat->change_cookie;
+ if (S_ISREG(inode->i_mode) &&
+ !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
+ chattr += (u64)stat->ctime.tv_sec << 30;
+ chattr += stat->ctime.tv_nsec;
+ }
+ } else {
+ chattr = time_to_chattr(&stat->ctime);
+ }
+ return chattr;
+}
+
static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
{
struct kstat stat;
struct file *filp = iocb->kiocb.ki_filp;
struct nfs_pgio_header *hdr = iocb->hdr;
struct nfs_fattr *fattr = hdr->res.fattr;
+ int version = NFS_PROTO(hdr->inode)->version;
- if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
- STATX_INO |
- STATX_ATIME |
- STATX_MTIME |
- STATX_CTIME |
- STATX_SIZE |
- STATX_BLOCKS,
- AT_STATX_SYNC_AS_STAT))
+ if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
return;
fattr->valid = (NFS_ATTR_FATTR_FILEID |
@@ -394,7 +420,11 @@ static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
fattr->atime = stat.atime;
fattr->mtime = stat.mtime;
fattr->ctime = stat.ctime;
- fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ if (version == 4) {
+ fattr->change_attr =
+ __nfsd4_change_attribute(&stat, file_inode(filp));
+ } else
+ fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
fattr->du.nfs3.used = stat.blocks << 9;
}
--
2.44.0
* [PATCH v9 08/19] nfs: enable localio for non-pNFS I/O
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (6 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 09/19] pnfs/flexfiles: Enable localio for flexfiles I/O Mike Snitzer
` (11 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Try a local open of the file we're writing to, and if it succeeds, then
do local I/O.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/pagelist.c | 19 ++++++++++---------
fs/nfs/write.c | 7 ++++++-
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index b08420b8e664..3ee78da5ebc4 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1063,6 +1063,7 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
{
struct nfs_pgio_header *hdr;
+ struct file *filp;
int ret;
unsigned short task_flags = 0;
@@ -1074,18 +1075,18 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0) {
+ struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
+
+ filp = nfs_local_file_open(clp, hdr->cred, hdr->args.fh,
+ hdr->args.context);
+
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(desc,
- NFS_SERVER(hdr->inode)->nfs_client,
- NFS_CLIENT(hdr->inode),
- hdr,
- hdr->cred,
- NFS_PROTO(hdr->inode),
- desc->pg_rpc_callops,
- desc->pg_ioflags,
+ ret = nfs_initiate_pgio(desc, clp, NFS_CLIENT(hdr->inode),
+ hdr, hdr->cred, NFS_PROTO(hdr->inode),
+ desc->pg_rpc_callops, desc->pg_ioflags,
RPC_TASK_CRED_NOREF | task_flags,
- NULL);
+ filp);
}
return ret;
}
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b29b0fd5431f..b2c06b8b88cd 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1802,6 +1802,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
struct nfs_commit_info *cinfo)
{
struct nfs_commit_data *data;
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ struct file *filp;
unsigned short task_flags = 0;
/* another commit raced with us */
@@ -1818,9 +1820,12 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
nfs_init_commit(data, head, NULL, cinfo);
if (NFS_SERVER(inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
+
+ filp = nfs_local_file_open(clp, data->cred, data->args.fh,
+ data->context);
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags, NULL);
+ RPC_TASK_CRED_NOREF | task_flags, filp);
}
/*
--
2.44.0
* [PATCH v9 09/19] pnfs/flexfiles: Enable localio for flexfiles I/O
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (7 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 08/19] nfs: enable localio for non-pNFS I/O Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 10/19] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
` (10 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If the DS is local to this client, then we should be able to use local
I/O to write the data.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 113 ++++++++++++++++++++--
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 ++
3 files changed, 112 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 3ea07446f05a..ec6aaa110a7b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -11,6 +11,7 @@
#include <linux/nfs_mount.h>
#include <linux/nfs_page.h>
#include <linux/module.h>
+#include <linux/file.h>
#include <linux/sched/mm.h>
#include <linux/sunrpc/metrics.h>
@@ -162,6 +163,52 @@ decode_name(struct xdr_stream *xdr, u32 *id)
return 0;
}
+static struct file *
+ff_local_open_fh(struct pnfs_layout_segment *lseg,
+ u32 ds_idx,
+ struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ fmode_t mode)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct file *filp, *new, __rcu **pfile;
+
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ if (mode & FMODE_WRITE) {
+ /*
+ * Always request read and write access since this corresponds
+ * to a rw layout.
+ */
+ mode |= FMODE_READ;
+ pfile = &mirror->rw_file;
+ } else
+ pfile = &mirror->ro_file;
+
+ new = NULL;
+ rcu_read_lock();
+ filp = rcu_dereference(*pfile);
+ if (!filp) {
+ rcu_read_unlock();
+ new = nfs_local_open_fh(clp, cred, fh, mode);
+ if (IS_ERR(new))
+ return NULL;
+ rcu_read_lock();
+ /* try to swap in the pointer */
+ filp = cmpxchg(pfile, NULL, new);
+ if (!filp) {
+ filp = new;
+ new = NULL;
+ }
+ }
+ filp = get_file_rcu(&filp);
+ rcu_read_unlock();
+ if (new)
+ fput(new);
+ return filp;
+}
+
static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
const struct nfs4_ff_layout_mirror *m2)
{
@@ -237,8 +284,15 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
+ struct file *filp;
const struct cred *cred;
+ filp = rcu_access_pointer(mirror->ro_file);
+ if (filp)
+ fput(filp);
+ filp = rcu_access_pointer(mirror->rw_file);
+ if (filp)
+ fput(filp);
ff_layout_remove_mirror(mirror);
kfree(mirror->fh_versions);
cred = rcu_access_pointer(mirror->ro_cred);
@@ -414,6 +468,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
struct nfs4_ff_layout_mirror *mirror;
struct cred *kcred;
const struct cred __rcu *cred;
+ const struct cred __rcu *old;
kuid_t uid;
kgid_t gid;
u32 ds_count, fh_count, id;
@@ -513,13 +568,26 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
+ struct file *filp;
+
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ old = xchg(&mirror->ro_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->ro_cred, old);
+ /* drop file if creds changed */
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->ro_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
} else {
- cred = xchg(&mirror->rw_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ old = xchg(&mirror->rw_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->rw_cred, old);
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->rw_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
@@ -1757,6 +1825,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1803,12 +1872,20 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
hdr->args.offset = offset;
hdr->mds_offset = offset;
+ /* Start IO accounting for local read */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN, NULL);
+ 0, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1829,6 +1906,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1873,12 +1951,20 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
*/
hdr->args.offset = offset;
+ /* Start IO accounting for local write */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_write_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN, NULL);
+ sync, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1912,6 +1998,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
struct pnfs_layout_segment *lseg = data->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
u32 idx;
@@ -1950,10 +2037,18 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
if (fh)
data->args.fh = fh;
+ /* Start IO accounting for local commit */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ data->task.tk_start = ktime_get();
+ ff_layout_commit_record_layoutstats_start(&data->task, data);
+ }
+
ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
- vers == 3 ? &ff_layout_commit_call_ops_v3 :
- &ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN, NULL);
+ vers == 3 ? &ff_layout_commit_call_ops_v3 :
+ &ff_layout_commit_call_ops_v4,
+ how, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index f84b3fb0dddd..8e042df5a2c9 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -82,7 +82,9 @@ struct nfs4_ff_layout_mirror {
struct nfs_fh *fh_versions;
nfs4_stateid stateid;
const struct cred __rcu *ro_cred;
+ struct file __rcu *ro_file;
const struct cred __rcu *rw_cred;
+ struct file __rcu *rw_file;
refcount_t ref;
spinlock_t lock;
unsigned long flags;
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index e028f5a0ef5f..e58bedfb1dcc 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -395,6 +395,12 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
/* connect success, check rsize/wsize limit */
if (!status) {
+ /*
+ * ds_clp is put in destroy_ds().
+ * keep ds_clp even if DS is local, so that if local IO cannot
+ * proceed somehow, we can fall back to NFS whenever we want.
+ */
+ nfs_local_probe(ds->ds_clp);
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
--
2.44.0
* [PATCH v9 10/19] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (8 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 09/19] pnfs/flexfiles: Enable localio for flexfiles I/O Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 11/19] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg Mike Snitzer
` (9 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Eliminates duplicate functions in various files to allow for
additional callers.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ------
fs/nfs/nfs4xdr.c | 13 -------------
include/linux/nfs_xdr.h | 20 +++++++++++++++++++-
3 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index ec6aaa110a7b..8b9096ad0663 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2185,12 +2185,6 @@ static int ff_layout_encode_ioerr(struct xdr_stream *xdr,
return ff_layout_encode_ds_ioerr(xdr, &ff_args->errors);
}
-static void
-encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void
ff_layout_encode_ff_iostat_head(struct xdr_stream *xdr,
const nfs4_stateid *stateid,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 1416099dfcd1..ede431ee0ef0 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -968,11 +968,6 @@ static __be32 *reserve_space(struct xdr_stream *xdr, size_t nbytes)
return p;
}
-static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
{
WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0);
@@ -4352,14 +4347,6 @@ static int decode_access(struct xdr_stream *xdr, u32 *supported, u32 *access)
return 0;
}
-static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
-{
- ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
- if (unlikely(ret < 0))
- return -EIO;
- return 0;
-}
-
static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
{
return decode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index d09b9773b20c..bb460af0ea1f 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1820,6 +1820,24 @@ struct nfs_rpc_ops {
void (*disable_swap)(struct inode *inode);
};
+/*
+ * Helper functions used by NFS client and/or server
+ */
+static inline void encode_opaque_fixed(struct xdr_stream *xdr,
+ const void *buf, size_t len)
+{
+ WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static inline int decode_opaque_fixed(struct xdr_stream *xdr,
+ void *buf, size_t len)
+{
+ ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+ if (unlikely(ret < 0))
+ return -EIO;
+ return 0;
+}
+
/*
* Function vectors etc. for the NFS client
*/
@@ -1833,4 +1851,4 @@ extern const struct rpc_version nfs_version4;
extern const struct rpc_version nfsacl_version3;
extern const struct rpc_program nfsacl_program;
-#endif
+#endif /* _LINUX_NFS_XDR_H */
--
2.44.0
* [PATCH v9 11/19] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (9 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 10/19] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 12/19] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
` (8 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
This is needed for the LOCALIO protocol's GETUUID RPC which takes a
void arg. The LOCALIO protocol spec in rpcgen syntax is:
/* raw RFC 9562 UUID */
typedef u8 uuid_t<UUID_SIZE>;
program NFS_LOCALIO_PROGRAM {
version LOCALIO_V1 {
void
NULL(void) = 0;
uuid_t
GETUUID(void) = 1;
} = 1;
} = 400122;
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
net/sunrpc/clnt.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index cfd1b1bf7e35..2d7f96103f08 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1894,7 +1894,6 @@ call_allocate(struct rpc_task *task)
return;
if (proc->p_proc != 0) {
- BUG_ON(proc->p_arglen == 0);
if (proc->p_decode != NULL)
BUG_ON(proc->p_replen == 0);
}
--
2.44.0
* [PATCH v9 12/19] nfs: implement client support for NFS_LOCALIO_PROGRAM
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (10 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 11/19] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 13/19] nfsd: add "localio" support Mike Snitzer
` (7 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
LOCALIOPROC_GETUUID allows a client to discover the server's uuid.
nfs_local_probe() will retrieve the server's uuid via the LOCALIO
protocol and verify that the server with that uuid is known to be
local. This ensures the client and server 1) both support localio
and 2) are local to each other.
All the knowledge of the LOCALIO RPC protocol is in fs/nfs/localio.c
which implements just a single version (1) that is used independently
of what NFS version is used.
Get nfsd_open_local_fh and store it in the rpc_client during client
creation; put the symbol in nfs_local_disable(), which is also
called during client destruction.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neil@brown.name>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/client.c | 6 +-
fs/nfs/localio.c | 153 ++++++++++++++++++++++++++++++++++++++++++--
include/linux/nfs.h | 7 ++
3 files changed, 159 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 1300c388f971..6faa9fdc444d 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -434,8 +434,10 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
- nfs_local_probe(new);
- return rpc_ops->init_client(new, cl_init);
+ new = rpc_ops->init_client(new, cl_init);
+ if (!IS_ERR(new))
+ nfs_local_probe(new);
+ return new;
}
spin_unlock(&nn->nfs_client_lock);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index fe96f05ba8ca..1f583891f92b 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -15,6 +15,7 @@
#include <linux/sunrpc/addr.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
+#include <linux/nfslocalio.h>
#include <linux/module.h>
#include <linux/bvec.h>
@@ -117,18 +118,76 @@ nfs4errno(int errno)
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
+static inline bool nfs_client_is_local(const struct nfs_client *clp)
+{
+ return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+}
+
bool nfs_server_is_local(const struct nfs_client *clp)
{
- return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
- localio_enabled;
+ return nfs_client_is_local(clp) && localio_enabled;
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
+/*
+ * GETUUID XDR functions
+ */
+
+static void localio_xdr_enc_getuuidargs(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ const void *data)
+{
+ /* void function */
+}
+
+static int localio_xdr_dec_getuuidres(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ void *result)
+{
+ u8 *uuid = result;
+
+ return decode_opaque_fixed(xdr, uuid, UUID_SIZE);
+}
+
+static const struct rpc_procinfo nfs_localio_procedures[] = {
+ [LOCALIOPROC_GETUUID] = {
+ .p_proc = LOCALIOPROC_GETUUID,
+ .p_encode = localio_xdr_enc_getuuidargs,
+ .p_decode = localio_xdr_dec_getuuidres,
+ .p_arglen = 0,
+ .p_replen = XDR_QUADLEN(UUID_SIZE),
+ .p_statidx = LOCALIOPROC_GETUUID,
+ .p_name = "GETUUID",
+ },
+};
+
+static unsigned int nfs_localio_counts[ARRAY_SIZE(nfs_localio_procedures)];
+const struct rpc_version nfslocalio_version1 = {
+ .number = 1,
+ .nrprocs = ARRAY_SIZE(nfs_localio_procedures),
+ .procs = nfs_localio_procedures,
+ .counts = nfs_localio_counts,
+};
+
+static const struct rpc_version *nfslocalio_version[] = {
+ [1] = &nfslocalio_version1,
+};
+
+extern const struct rpc_program nfslocalio_program;
+static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program };
+
+const struct rpc_program nfslocalio_program = {
+ .name = "nfslocalio",
+ .number = NFS_LOCALIO_PROGRAM,
+ .nrvers = ARRAY_SIZE(nfslocalio_version),
+ .version = nfslocalio_version,
+ .stats = &nfslocalio_rpcstat,
+};
+
/*
* nfs_local_enable - enable local i/o for an nfs_client
*/
-static __maybe_unused void nfs_local_enable(struct nfs_client *clp,
- struct net *net)
+static void nfs_local_enable(struct nfs_client *clp, struct net *net)
{
if (READ_ONCE(clp->nfsd_open_local_fh)) {
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
@@ -144,15 +203,98 @@ void nfs_local_disable(struct nfs_client *clp)
{
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
+ put_nfsd_open_local_fh();
+ clp->nfsd_open_local_fh = NULL;
+ if (!IS_ERR(clp->cl_rpcclient_localio)) {
+ rpc_shutdown_client(clp->cl_rpcclient_localio);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ }
clp->cl_nfssvc_net = NULL;
}
}
+/*
+ * nfs_init_localioclient - Initialise an NFS localio client connection
+ */
+static void nfs_init_localioclient(struct nfs_client *clp)
+{
+ if (unlikely(!IS_ERR(clp->cl_rpcclient_localio)))
+ goto out;
+ clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
+ &nfslocalio_program, 1);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ goto out;
+ /* No errors! Assume that localio is supported */
+ clp->nfsd_open_local_fh = get_nfsd_open_local_fh();
+ if (!clp->nfsd_open_local_fh) {
+ rpc_shutdown_client(clp->cl_rpcclient_localio);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ }
+out:
+ dprintk_rcu("%s: server (%s) %s NFS LOCALIO, nfsd_open_local_fh is %s.\n",
+ __func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
+ (IS_ERR(clp->cl_rpcclient_localio) ? "does not support" : "supports"),
+ (clp->nfsd_open_local_fh ? "set" : "not set"));
+}
+
+static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
+{
+ u8 uuid[UUID_SIZE];
+ struct rpc_message msg = {
+ .rpc_resp = &uuid,
+ };
+ int status;
+
+ nfs_init_localioclient(clp);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ return false;
+
+ msg.rpc_proc = &nfs_localio_procedures[LOCALIOPROC_GETUUID];
+ status = rpc_call_sync(clp->cl_rpcclient_localio, &msg, 0);
+ dprintk("%s: NFS reply getuuid: status=%d uuid=%pU\n",
+ __func__, status, uuid);
+ if (status)
+ return false;
+
+ import_uuid(nfsd_uuid, uuid);
+
+ return true;
+}
+
/*
* nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ * - called after alloc_client and init_client (so cl_rpcclient exists)
+ * - this function is idempotent, it can be called for old or new clients
*/
void nfs_local_probe(struct nfs_client *clp)
{
+ uuid_t uuid;
+ struct net *net = NULL;
+
+ if (!localio_enabled || clp->cl_rpcclient->cl_vers == 2)
+ goto unsupported;
+
+ if (nfs_client_is_local(clp)) {
+ /* If already enabled, disable and re-enable */
+ nfs_local_disable(clp);
+ }
+
+ /*
+ * Retrieve server's uuid via LOCALIO protocol and verify the
+ * server with that uuid is known to be local. This ensures
+ * client and server 1: support localio 2: are local to each other
+ * by verifying client's nfsd, with specified uuid, is local.
+ */
+ if (!nfs_local_server_getuuid(clp, &uuid) ||
+ !nfsd_uuid_is_local(&uuid, &net))
+ goto unsupported;
+
+ nfs_local_enable(clp, net);
+ return;
+
+unsupported:
+ /* localio not supported */
+ nfs_local_disable(clp);
}
EXPORT_SYMBOL_GPL(nfs_local_probe);
@@ -177,7 +319,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
trace_nfs_local_open_fh(fh, mode, status);
switch (status) {
case -ENXIO:
- nfs_local_disable(clp);
+ /* Revalidate localio, will disable if unsupported */
+ nfs_local_probe(clp);
fallthrough;
case -ETIMEDOUT:
status = -EAGAIN;
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 64ed672a0b34..036f6b0ed94d 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -15,6 +15,13 @@
#include <linux/crc32.h>
#include <uapi/linux/nfs.h>
+/* The localio program is entirely private to Linux and is
+ * NOT part of the uapi.
+ */
+#define NFS_LOCALIO_PROGRAM 400122
+#define LOCALIOPROC_NULL 0
+#define LOCALIOPROC_GETUUID 1
+
/*
* This is the kernel NFS client file handle representation
*/
--
2.44.0
* [PATCH v9 13/19] nfsd: add "localio" support
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (11 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 12/19] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-06-28 21:10 ` Mike Snitzer
2024-06-29 22:18 ` Chuck Lever
2024-06-28 21:11 ` [PATCH v9 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh Mike Snitzer
` (6 subsequent siblings)
19 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:10 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Pass the stored cl_nfssvc_net from the client to the server as
first argument to nfsd_open_local_fh() to ensure the proper network
namespace is used for localio.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/Makefile | 1 +
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 239 ++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfssvc.c | 1 +
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 9 ++
6 files changed, 253 insertions(+), 2 deletions(-)
create mode 100644 fs/nfsd/localio.c
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index b8736a82e57c..78b421778a79 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
+nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ad9083ca144b..99631fa56662 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -52,7 +52,7 @@
#define NFSD_FILE_CACHE_UP (0)
/* We only care about NFSD_MAY_READ/WRITE for this cache */
-#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
new file mode 100644
index 000000000000..759a5cb79652
--- /dev/null
+++ b/fs/nfsd/localio.c
@@ -0,0 +1,239 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS server support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/exportfs.h>
+#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfs.h>
+#include <linux/string.h>
+
+#include "nfsd.h"
+#include "vfs.h"
+#include "netns.h"
+#include "filecache.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_FH
+
+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
+ * all compiled nfs objects if it were in include/linux/nfs.h
+ */
+static const struct {
+ int stat;
+ int errno;
+} nfs_common_errtbl[] = {
+ { NFS_OK, 0 },
+ { NFSERR_PERM, -EPERM },
+ { NFSERR_NOENT, -ENOENT },
+ { NFSERR_IO, -EIO },
+ { NFSERR_NXIO, -ENXIO },
+/* { NFSERR_EAGAIN, -EAGAIN }, */
+ { NFSERR_ACCES, -EACCES },
+ { NFSERR_EXIST, -EEXIST },
+ { NFSERR_XDEV, -EXDEV },
+ { NFSERR_NODEV, -ENODEV },
+ { NFSERR_NOTDIR, -ENOTDIR },
+ { NFSERR_ISDIR, -EISDIR },
+ { NFSERR_INVAL, -EINVAL },
+ { NFSERR_FBIG, -EFBIG },
+ { NFSERR_NOSPC, -ENOSPC },
+ { NFSERR_ROFS, -EROFS },
+ { NFSERR_MLINK, -EMLINK },
+ { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFSERR_NOTEMPTY, -ENOTEMPTY },
+ { NFSERR_DQUOT, -EDQUOT },
+ { NFSERR_STALE, -ESTALE },
+ { NFSERR_REMOTE, -EREMOTE },
+#ifdef EWFLUSH
+ { NFSERR_WFLUSH, -EWFLUSH },
+#endif
+ { NFSERR_BADHANDLE, -EBADHANDLE },
+ { NFSERR_NOT_SYNC, -ENOTSYNC },
+ { NFSERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFSERR_NOTSUPP, -ENOTSUPP },
+ { NFSERR_TOOSMALL, -ETOOSMALL },
+ { NFSERR_SERVERFAULT, -EREMOTEIO },
+ { NFSERR_BADTYPE, -EBADTYPE },
+ { NFSERR_JUKEBOX, -EJUKEBOX },
+ { -1, -EIO }
+};
+
+/**
+ * nfs_stat_to_errno - convert an NFS status code to a local errno
+ * @status: NFS status code to convert
+ *
+ * Returns a local errno value, or -EIO if the NFS status code is
+ * not recognized. nfsd_file_acquire() returns an nfsstat that
+ * needs to be translated to an errno before being returned to a
+ * local client application.
+ */
+static int nfs_stat_to_errno(enum nfs_stat status)
+{
+ int i;
+
+ for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
+ if (nfs_common_errtbl[i].stat == (int)status)
+ return nfs_common_errtbl[i].errno;
+ }
+ return nfs_common_errtbl[i].errno;
+}
+
+static void
+nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
+{
+ if (rqstp->rq_client)
+ auth_domain_put(rqstp->rq_client);
+ if (rqstp->rq_cred.cr_group_info)
+ put_group_info(rqstp->rq_cred.cr_group_info);
+ /* rpcauth_map_to_svc_cred_local() clears cr_principal */
+ WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
+ kfree(rqstp->rq_xprt);
+ kfree(rqstp);
+}
+
+static struct svc_rqst *
+nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
+ const struct cred *cred)
+{
+ struct svc_rqst *rqstp;
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ int status;
+
+ /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
+ if (unlikely(!READ_ONCE(nn->nfsd_serv)))
+ return ERR_PTR(-ENXIO);
+
+ rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
+ if (!rqstp)
+ return ERR_PTR(-ENOMEM);
+
+ rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
+ if (!rqstp->rq_xprt) {
+ status = -ENOMEM;
+ goto out_err;
+ }
+
+ rqstp->rq_xprt->xpt_net = net;
+ __set_bit(RQ_SECURE, &rqstp->rq_flags);
+ rqstp->rq_proc = 1;
+ rqstp->rq_vers = 3;
+ rqstp->rq_prot = IPPROTO_TCP;
+ rqstp->rq_server = nn->nfsd_serv;
+
+ /* Note: we're connecting to ourself, so source addr == peer addr */
+ rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
+ (struct sockaddr *)&rqstp->rq_addr,
+ sizeof(rqstp->rq_addr));
+
+ rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
+
+ /*
+ * set up enough for svcauth_unix_set_client to be able to wait
+ * for the cache downcall. Note that we do _not_ want to allow the
+ * request to be deferred for later revisit since this rqst and xprt
+ * are not set up to run inside of the normal svc_rqst engine.
+ */
+ INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
+ kref_init(&rqstp->rq_xprt->xpt_ref);
+ spin_lock_init(&rqstp->rq_xprt->xpt_lock);
+ rqstp->rq_chandle.thread_wait = 5 * HZ;
+
+ status = svcauth_unix_set_client(rqstp);
+ switch (status) {
+ case SVC_OK:
+ break;
+ case SVC_DENIED:
+ status = -ENXIO;
+ goto out_err;
+ default:
+ status = -ETIMEDOUT;
+ goto out_err;
+ }
+
+ return rqstp;
+
+out_err:
+ nfsd_local_fakerqst_destroy(rqstp);
+ return ERR_PTR(status);
+}
+
+/*
+ * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
+ *
+ * This function maps a local fh to a path on a local filesystem.
+ * This is useful when the nfs client has the local server mounted - it can
+ * avoid all the NFS overhead with reads, writes and commits.
+ *
+ * on successful return, caller is responsible for calling path_put. Also
+ * note that this is called from nfs.ko via find_symbol() to avoid an explicit
+ * dependency on knfsd. So, there is no forward declaration in a header file
+ * for it.
+ */
+int nfsd_open_local_fh(struct net *net,
+ struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp)
+{
+ const struct cred *save_cred;
+ struct svc_rqst *rqstp;
+ struct svc_fh fh;
+ struct nfsd_file *nf;
+ int status = 0;
+ int mayflags = NFSD_MAY_LOCALIO;
+ __be32 beres;
+
+ /* Save creds before calling into nfsd */
+ save_cred = get_current_cred();
+
+ rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
+ if (IS_ERR(rqstp)) {
+ status = PTR_ERR(rqstp);
+ goto out_revertcred;
+ }
+
+ /* nfs_fh -> svc_fh */
+ if (nfs_fh->size > NFS4_FHSIZE) {
+ status = -EINVAL;
+ goto out;
+ }
+ fh_init(&fh, NFS4_FHSIZE);
+ fh.fh_handle.fh_size = nfs_fh->size;
+ memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
+
+ if (fmode & FMODE_READ)
+ mayflags |= NFSD_MAY_READ;
+ if (fmode & FMODE_WRITE)
+ mayflags |= NFSD_MAY_WRITE;
+
+ beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
+ if (beres) {
+ status = nfs_stat_to_errno(be32_to_cpu(beres));
+ goto out_fh_put;
+ }
+
+ *pfilp = get_file(nf->nf_file);
+
+ nfsd_file_put(nf);
+out_fh_put:
+ fh_put(&fh);
+
+out:
+ nfsd_local_fakerqst_destroy(rqstp);
+out_revertcred:
+ revert_creds(save_cred);
+ return status;
+}
+EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 1222a0a33fe1..a477d2c5088a 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -431,6 +431,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#endif
#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ nn->nfsd_uuid.net = net;
list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 77bbd23aa150..9c0610fdd11c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
{ NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
{ NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
{ NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
- { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
+ { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
+ { NFSD_MAY_LOCALIO, "LOCALIO" })
TRACE_EVENT(nfsd_compound,
TP_PROTO(
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 57cd70062048..5146f0c81752 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -33,6 +33,8 @@
#define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
+#define NFSD_MAY_LOCALIO 0x2000
+
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
@@ -158,6 +160,13 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
void nfsd_filp_close(struct file *fp);
+int nfsd_open_local_fh(struct net *net,
+ struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp);
+
static inline int fh_want_write(struct svc_fh *fh)
{
int ret;
--
2.44.0
* [PATCH v9 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (12 preceding siblings ...)
2024-06-28 21:10 ` [PATCH v9 13/19] nfsd: add "localio" support Mike Snitzer
@ 2024-06-28 21:11 ` Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 15/19] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh Mike Snitzer
` (5 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:11 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Use maybe_get_net() and put_net() in nfsd_open_local_fh().
Also refactor nfsd_open_local_fh() slightly.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 40 +++++++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 759a5cb79652..8799ad3ac536 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -101,16 +101,11 @@ nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
static struct svc_rqst *
nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
- const struct cred *cred)
+ const struct cred *cred, struct svc_serv *serv)
{
struct svc_rqst *rqstp;
- struct nfsd_net *nn = net_generic(net, nfsd_net_id);
int status;
- /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
- if (unlikely(!READ_ONCE(nn->nfsd_serv)))
- return ERR_PTR(-ENXIO);
-
rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
if (!rqstp)
return ERR_PTR(-ENOMEM);
@@ -120,13 +115,13 @@ nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
status = -ENOMEM;
goto out_err;
}
-
rqstp->rq_xprt->xpt_net = net;
+
__set_bit(RQ_SECURE, &rqstp->rq_flags);
rqstp->rq_proc = 1;
rqstp->rq_vers = 3;
rqstp->rq_prot = IPPROTO_TCP;
- rqstp->rq_server = nn->nfsd_serv;
+ rqstp->rq_server = serv;
/* Note: we're connecting to ourself, so source addr == peer addr */
rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
@@ -184,28 +179,41 @@ int nfsd_open_local_fh(struct net *net,
const fmode_t fmode,
struct file **pfilp)
{
+ struct nfsd_net *nn;
const struct cred *save_cred;
struct svc_rqst *rqstp;
struct svc_fh fh;
struct nfsd_file *nf;
int status = 0;
int mayflags = NFSD_MAY_LOCALIO;
+ struct svc_serv *serv;
__be32 beres;
+ if (nfs_fh->size > NFS4_FHSIZE)
+ return -EINVAL;
+
+ /* Not running in nfsd context, must safely get reference on nfsd_serv */
+ net = maybe_get_net(net);
+ if (!net)
+ return -ENXIO;
+ nn = net_generic(net, nfsd_net_id);
+
+ serv = READ_ONCE(nn->nfsd_serv);
+ if (unlikely(!serv)) {
+ status = -ENXIO;
+ goto out_net;
+ }
+
/* Save creds before calling into nfsd */
save_cred = get_current_cred();
- rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
+ rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred, serv);
if (IS_ERR(rqstp)) {
status = PTR_ERR(rqstp);
goto out_revertcred;
}
/* nfs_fh -> svc_fh */
- if (nfs_fh->size > NFS4_FHSIZE) {
- status = -EINVAL;
- goto out;
- }
fh_init(&fh, NFS4_FHSIZE);
fh.fh_handle.fh_size = nfs_fh->size;
memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
@@ -220,17 +228,15 @@ int nfsd_open_local_fh(struct net *net,
status = nfs_stat_to_errno(be32_to_cpu(beres));
goto out_fh_put;
}
-
*pfilp = get_file(nf->nf_file);
-
nfsd_file_put(nf);
out_fh_put:
fh_put(&fh);
-
-out:
nfsd_local_fakerqst_destroy(rqstp);
out_revertcred:
revert_creds(save_cred);
+out_net:
+ put_net(net);
return status;
}
EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
--
2.44.0
* [PATCH v9 15/19] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (13 preceding siblings ...)
2024-06-28 21:11 ` [PATCH v9 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh Mike Snitzer
@ 2024-06-28 21:11 ` Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 16/19] nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
` (4 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:11 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
client initiated localio calls to nfsd (that are _not_ in the context
of nfsd) are complete.
nfsd_open_local_fh is updated to call nfsd_serv_try_get before
opening its file handle, and then to drop the reference using
nfsd_serv_put at the end of nfsd_open_local_fh.
This "interlock" relies heavily on nfsd_open_local_fh()'s
maybe_get_net() safely dealing with the possibility that the struct
net (and the nfsd_net by association) may have been destroyed by
nfsd_destroy_serv() via nfsd_shutdown_net().
Verified to fix an easy-to-hit crash that would occur if an nfsd
instance running in a container, with a localio client mounted, was
shut down. Upon restart of the container and its associated nfsd, the
client would crash with a NULL pointer dereference because the nfs
client's localio attempted nfsd_open_local_fh() using nn->nfsd_serv
without holding a proper reference on nn->nfsd_serv.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 8 ++++----
fs/nfsd/netns.h | 8 +++++++-
fs/nfsd/nfssvc.c | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 8799ad3ac536..ef8467056827 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -186,7 +186,6 @@ int nfsd_open_local_fh(struct net *net,
struct nfsd_file *nf;
int status = 0;
int mayflags = NFSD_MAY_LOCALIO;
- struct svc_serv *serv;
__be32 beres;
if (nfs_fh->size > NFS4_FHSIZE)
@@ -198,8 +197,8 @@ int nfsd_open_local_fh(struct net *net,
return -ENXIO;
nn = net_generic(net, nfsd_net_id);
- serv = READ_ONCE(nn->nfsd_serv);
- if (unlikely(!serv)) {
+ /* The server may already be shutting down, disallow new localio */
+ if (unlikely(!nfsd_serv_try_get(nn))) {
status = -ENXIO;
goto out_net;
}
@@ -207,7 +206,7 @@ int nfsd_open_local_fh(struct net *net,
/* Save creds before calling into nfsd */
save_cred = get_current_cred();
- rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred, serv);
+ rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred, nn->nfsd_serv);
if (IS_ERR(rqstp)) {
status = PTR_ERR(rqstp);
goto out_revertcred;
@@ -235,6 +234,7 @@ int nfsd_open_local_fh(struct net *net,
nfsd_local_fakerqst_destroy(rqstp);
out_revertcred:
revert_creds(save_cred);
+ nfsd_serv_put(nn);
out_net:
put_net(net);
return status;
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 0c5a1d97e4ac..443b003fd2ec 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -13,6 +13,7 @@
#include <linux/filelock.h>
#include <linux/nfs4.h>
#include <linux/percpu_counter.h>
+#include <linux/percpu-refcount.h>
#include <linux/siphash.h>
#include <linux/sunrpc/stats.h>
#include <linux/nfslocalio.h>
@@ -140,7 +141,9 @@ struct nfsd_net {
struct svc_info nfsd_info;
#define nfsd_serv nfsd_info.serv
-
+ struct percpu_ref nfsd_serv_ref;
+ struct completion nfsd_serv_confirm_done;
+ struct completion nfsd_serv_free_done;
/*
* clientid and stateid data for construction of net unique COPY
@@ -225,6 +228,9 @@ struct nfsd_net {
extern bool nfsd_support_version(int vers);
extern void nfsd_netns_free_versions(struct nfsd_net *nn);
+bool nfsd_serv_try_get(struct nfsd_net *nn);
+void nfsd_serv_put(struct nfsd_net *nn);
+
extern unsigned int nfsd_net_id;
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index a477d2c5088a..11fb209b46bf 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -258,6 +258,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
return 0;
}
+bool nfsd_serv_try_get(struct nfsd_net *nn)
+{
+ return percpu_ref_tryget_live(&nn->nfsd_serv_ref);
+}
+
+void nfsd_serv_put(struct nfsd_net *nn)
+{
+ percpu_ref_put(&nn->nfsd_serv_ref);
+}
+
+static void nfsd_serv_done(struct percpu_ref *ref)
+{
+ struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+
+ complete(&nn->nfsd_serv_confirm_done);
+}
+
+static void nfsd_serv_free(struct percpu_ref *ref)
+{
+ struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+
+ complete(&nn->nfsd_serv_free_done);
+}
+
/*
* Maximum number of nfsd processes
*/
@@ -462,6 +486,7 @@ static void nfsd_shutdown_net(struct net *net)
lockd_down(net);
nn->lockd_up = false;
}
+ percpu_ref_exit(&nn->nfsd_serv_ref);
#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
list_del_rcu(&nn->nfsd_uuid.list);
#endif
@@ -544,6 +569,13 @@ void nfsd_destroy_serv(struct net *net)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct svc_serv *serv = nn->nfsd_serv;
+ lockdep_assert_held(&nfsd_mutex);
+
+ percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
+ wait_for_completion(&nn->nfsd_serv_confirm_done);
+ wait_for_completion(&nn->nfsd_serv_free_done);
+ /* percpu_ref_exit is called in nfsd_shutdown_net */
+
spin_lock(&nfsd_notifier_lock);
nn->nfsd_serv = NULL;
spin_unlock(&nfsd_notifier_lock);
@@ -666,6 +698,13 @@ int nfsd_create_serv(struct net *net)
if (nn->nfsd_serv)
return 0;
+ error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
+ 0, GFP_KERNEL);
+ if (error)
+ return error;
+ init_completion(&nn->nfsd_serv_free_done);
+ init_completion(&nn->nfsd_serv_confirm_done);
+
if (nfsd_max_blksize == 0)
nfsd_max_blksize = nfsd_get_default_max_blksize();
nfsd_reset_versions(nn);
--
2.44.0
* [PATCH v9 16/19] nfsd: add Kconfig options to allow localio to be enabled
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (14 preceding siblings ...)
2024-06-28 21:11 ` [PATCH v9 15/19] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh Mike Snitzer
@ 2024-06-28 21:11 ` Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 17/19] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
` (3 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:11 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
CONFIG_NFSD_LOCALIO controls server enablement of localio. An earlier
commit added CONFIG_NFS_LOCALIO to allow client enablement.
While it doesn't make sense, in terms of actually using LOCALIO, to
have one enabled without the other, it is useful to allow a mixed
configuration for testing purposes. The same control could arguably
be achieved by exposing a discrete "localio_enabled" module_param in
the server (nfsd.ko), like the one already available in the client
(nfs.ko).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/Kconfig | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index ec2ab6429e00..a36ff66c7430 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -89,6 +89,20 @@ config NFSD_V4
If unsure, say N.
+config NFSD_LOCALIO
+ tristate "NFS server support for the LOCALIO auxiliary protocol"
+ depends on NFSD || NFSD_V4
+ select NFS_COMMON_LOCALIO_SUPPORT
+ help
+ Some NFS servers support an auxiliary NFS LOCALIO protocol
+ that is not an official part of the NFS version 3 or 4 protocol.
+
+ This option enables support for the LOCALIO protocol in the
+ kernel's NFS server. Enable this to bypass using the NFS
+ protocol when issuing reads, writes and commits to the server.
+
+ If unsure, say N.
+
config NFSD_PNFS
bool
--
2.44.0
* [PATCH v9 17/19] nfsd: implement server support for NFS_LOCALIO_PROGRAM
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (15 preceding siblings ...)
2024-06-28 21:11 ` [PATCH v9 16/19] nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
@ 2024-06-28 21:11 ` Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 18/19] SUNRPC: replace program list with program array Mike Snitzer
` (2 subsequent siblings)
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:11 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
LOCALIOPROC_GETUUID encodes the server's uuid_t in terms of the fixed
UUID_SIZE (16). The fixed size opaque encode and decode XDR methods
are used instead of the less efficient variable sized methods.
Aside from a bit of code in nfssvc.c, all knowledge of the LOCALIO
RPC protocol lives in fs/nfsd/localio.c, which implements a single
version (1) that is used independently of which NFS version is in use.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neil@brown.name>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfsd/localio.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfssvc.c | 29 ++++++++++++++++++-
2 files changed, 102 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index ef8467056827..3b52391a7bde 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -11,12 +11,15 @@
#include <linux/sunrpc/svcauth_gss.h>
#include <linux/sunrpc/clnt.h>
#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
#include <linux/string.h>
#include "nfsd.h"
#include "vfs.h"
#include "netns.h"
#include "filecache.h"
+#include "cache.h"
#define NFSDDBG_FACILITY NFSDDBG_FH
@@ -243,3 +246,74 @@ EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
/* Compile time type checking, not used by anything */
static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
+
+/*
+ * GETUUID XDR encode functions
+ */
+
+static __be32 localio_proc_null(struct svc_rqst *rqstp)
+{
+ return rpc_success;
+}
+
+struct localio_getuuidres {
+ uuid_t uuid;
+};
+
+static __be32 localio_proc_getuuid(struct svc_rqst *rqstp)
+{
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ struct localio_getuuidres *resp = rqstp->rq_resp;
+
+ uuid_copy(&resp->uuid, &nn->nfsd_uuid.uuid);
+
+ return rpc_success;
+}
+
+static bool localio_encode_getuuidres(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr)
+{
+ struct localio_getuuidres *resp = rqstp->rq_resp;
+ u8 uuid[UUID_SIZE];
+
+ export_uuid(uuid, &resp->uuid);
+ encode_opaque_fixed(xdr, uuid, UUID_SIZE);
+
+ return true;
+}
+
+static const struct svc_procedure localio_procedures1[] = {
+ [LOCALIOPROC_NULL] = {
+ .pc_func = localio_proc_null,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = 0,
+ .pc_name = "NULL",
+ },
+ [LOCALIOPROC_GETUUID] = {
+ .pc_func = localio_proc_getuuid,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = localio_encode_getuuidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct localio_getuuidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = XDR_QUADLEN(UUID_SIZE),
+ .pc_name = "GETUUID",
+ },
+};
+
+#define LOCALIO_NR_PROCEDURES ARRAY_SIZE(localio_procedures1)
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+ localio_count[LOCALIO_NR_PROCEDURES]);
+const struct svc_version localio_version1 = {
+ .vs_vers = 1,
+ .vs_nproc = LOCALIO_NR_PROCEDURES,
+ .vs_proc = localio_procedures1,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_count = localio_count,
+ .vs_xdrsize = XDR_QUADLEN(UUID_SIZE),
+ .vs_hidden = true,
+};
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 11fb209b46bf..6cc6a1971e21 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -81,6 +81,26 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_mem_used;
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+extern const struct svc_version localio_version1;
+static const struct svc_version *localio_versions[] = {
+ [1] = &localio_version1,
+};
+
+#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions)
+
+static struct svc_program nfsd_localio_program = {
+ .pg_prog = NFS_LOCALIO_PROGRAM,
+ .pg_nvers = NFSD_LOCALIO_NRVERS,
+ .pg_vers = localio_versions,
+ .pg_name = "nfslocalio",
+ .pg_class = "nfsd",
+ .pg_authenticate = &svc_set_client,
+ .pg_init_request = svc_generic_init_request,
+ .pg_rpcbind_set = svc_generic_rpcbind_set,
+};
+#endif /* CONFIG_NFSD_LOCALIO */
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
# if defined(CONFIG_NFSD_V2_ACL)
@@ -95,6 +115,9 @@ static const struct svc_version *nfsd_acl_version[] = {
#define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)
static struct svc_program nfsd_acl_program = {
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_LOCALIO */
.pg_prog = NFS_ACL_PROGRAM,
.pg_nvers = NFSD_ACL_NRVERS,
.pg_vers = nfsd_acl_version,
@@ -123,6 +146,10 @@ static const struct svc_version *nfsd_version[] = {
struct svc_program nfsd_program = {
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
.pg_next = &nfsd_acl_program,
+#else
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_LOCALIO */
#endif
.pg_prog = NFS_PROGRAM, /* program number */
.pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
@@ -1014,7 +1041,7 @@ nfsd(void *vrqstp)
}
/**
- * nfsd_dispatch - Process an NFS or NFSACL Request
+ * nfsd_dispatch - Process an NFS or NFSACL or LOCALIO Request
* @rqstp: incoming request
*
* This RPC dispatcher integrates the NFS server's duplicate reply cache.
--
2.44.0
* [PATCH v9 18/19] SUNRPC: replace program list with program array
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (16 preceding siblings ...)
2024-06-28 21:11 ` [PATCH v9 17/19] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-06-28 21:11 ` Mike Snitzer
2024-06-29 16:00 ` Chuck Lever
2024-06-28 21:11 ` [PATCH v9 19/19] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-06-29 15:36 ` [PATCH v9 00/19] nfs/nfsd: add support for localio Chuck Lever III
19 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:11 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: NeilBrown <neil@brown.name>
A service created with svc_create_pooled() can be given a linked list of
programs and all of these will be served.
Using a linked list makes it cumbersome when there are several programs
that can be optionally selected with CONFIG settings.
So change to use an array with explicit size. svc_create() is always
passed a single program. svc_create_pooled() now must be used for
multiple programs.
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/nfsctl.c | 2 +-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 69 ++++++++++++++++++--------------------
include/linux/sunrpc/svc.h | 7 ++--
net/sunrpc/svc.c | 68 +++++++++++++++++++++----------------
net/sunrpc/svc_xprt.c | 2 +-
net/sunrpc/svcauth_unix.c | 3 +-
7 files changed, 80 insertions(+), 73 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e5d2cc74ef77..6fb92bb61c6d 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2265,7 +2265,7 @@ static __net_init int nfsd_net_init(struct net *net)
if (retval)
goto out_repcache_error;
memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
- nn->nfsd_svcstats.program = &nfsd_program;
+ nn->nfsd_svcstats.program = &nfsd_programs[0];
nn->nfsd_versions = NULL;
nn->nfsd4_minorversions = NULL;
nfsd4_init_leases_net(nn);
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index cec8697b1cd6..c3f7c5957950 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -80,7 +80,7 @@ struct nfsd_genl_rqstp {
u32 rq_opnum[NFSD_MAX_OPS_PER_COMPOUND];
};
-extern struct svc_program nfsd_program;
+extern struct svc_program nfsd_programs[];
extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4;
extern struct mutex nfsd_mutex;
extern spinlock_t nfsd_drc_lock;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 6cc6a1971e21..ef2532303ece 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -36,7 +36,6 @@
#define NFSDDBG_FACILITY NFSDDBG_SVC
atomic_t nfsd_th_cnt = ATOMIC_INIT(0);
-extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static int nfsd_acl_rpcbind_set(struct net *,
@@ -89,16 +88,6 @@ static const struct svc_version *localio_versions[] = {
#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions)
-static struct svc_program nfsd_localio_program = {
- .pg_prog = NFS_LOCALIO_PROGRAM,
- .pg_nvers = NFSD_LOCALIO_NRVERS,
- .pg_vers = localio_versions,
- .pg_name = "nfslocalio",
- .pg_class = "nfsd",
- .pg_authenticate = &svc_set_client,
- .pg_init_request = svc_generic_init_request,
- .pg_rpcbind_set = svc_generic_rpcbind_set,
-};
#endif /* CONFIG_NFSD_LOCALIO */
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
@@ -111,23 +100,9 @@ static const struct svc_version *nfsd_acl_version[] = {
# endif
};
-#define NFSD_ACL_MINVERS 2
+#define NFSD_ACL_MINVERS 2
#define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)
-static struct svc_program nfsd_acl_program = {
-#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
- .pg_next = &nfsd_localio_program,
-#endif /* CONFIG_NFSD_LOCALIO */
- .pg_prog = NFS_ACL_PROGRAM,
- .pg_nvers = NFSD_ACL_NRVERS,
- .pg_vers = nfsd_acl_version,
- .pg_name = "nfsacl",
- .pg_class = "nfsd",
- .pg_authenticate = &svc_set_client,
- .pg_init_request = nfsd_acl_init_request,
- .pg_rpcbind_set = nfsd_acl_rpcbind_set,
-};
-
#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
static const struct svc_version *nfsd_version[] = {
@@ -140,25 +115,44 @@ static const struct svc_version *nfsd_version[] = {
#endif
};
-#define NFSD_MINVERS 2
+#define NFSD_MINVERS 2
#define NFSD_NRVERS ARRAY_SIZE(nfsd_version)
-struct svc_program nfsd_program = {
-#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
- .pg_next = &nfsd_acl_program,
-#else
-#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
- .pg_next = &nfsd_localio_program,
-#endif /* CONFIG_NFSD_LOCALIO */
-#endif
+struct svc_program nfsd_programs[] = {
+ {
.pg_prog = NFS_PROGRAM, /* program number */
.pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
.pg_vers = nfsd_version, /* version table */
.pg_name = "nfsd", /* program name */
.pg_class = "nfsd", /* authentication class */
- .pg_authenticate = &svc_set_client, /* export authentication */
+ .pg_authenticate = svc_set_client, /* export authentication */
.pg_init_request = nfsd_init_request,
.pg_rpcbind_set = nfsd_rpcbind_set,
+ },
+#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
+ {
+ .pg_prog = NFS_ACL_PROGRAM,
+ .pg_nvers = NFSD_ACL_NRVERS,
+ .pg_vers = nfsd_acl_version,
+ .pg_name = "nfsacl",
+ .pg_class = "nfsd",
+ .pg_authenticate = svc_set_client,
+ .pg_init_request = nfsd_acl_init_request,
+ .pg_rpcbind_set = nfsd_acl_rpcbind_set,
+ },
+#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ {
+ .pg_prog = NFS_LOCALIO_PROGRAM,
+ .pg_nvers = NFSD_LOCALIO_NRVERS,
+ .pg_vers = localio_versions,
+ .pg_name = "nfslocalio",
+ .pg_class = "nfsd",
+ .pg_authenticate = svc_set_client,
+ .pg_init_request = svc_generic_init_request,
+ .pg_rpcbind_set = svc_generic_rpcbind_set,
+ }
+#endif /* IS_ENABLED(CONFIG_NFSD_LOCALIO) */
};
bool nfsd_support_version(int vers)
@@ -735,7 +729,8 @@ int nfsd_create_serv(struct net *net)
if (nfsd_max_blksize == 0)
nfsd_max_blksize = nfsd_get_default_max_blksize();
nfsd_reset_versions(nn);
- serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
+ serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs),
+ &nn->nfsd_svcstats,
nfsd_max_blksize, nfsd);
if (serv == NULL)
return -ENOMEM;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index a7d0406b9ef5..7c86b1696398 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -66,9 +66,10 @@ enum {
* We currently do not support more than one RPC program per daemon.
*/
struct svc_serv {
- struct svc_program * sv_program; /* RPC program */
+ struct svc_program * sv_programs; /* RPC programs */
struct svc_stat * sv_stats; /* RPC statistics */
spinlock_t sv_lock;
+ unsigned int sv_nprogs; /* Number of sv_programs */
unsigned int sv_nrthreads; /* # of server threads */
unsigned int sv_maxconn; /* max connections allowed or
* '0' causing max to be based
@@ -329,10 +330,9 @@ struct svc_process_info {
};
/*
- * List of RPC programs on the same transport endpoint
+ * RPC program - an array of these can use the same transport endpoint
*/
struct svc_program {
- struct svc_program * pg_next; /* other programs (same xprt) */
u32 pg_prog; /* program number */
unsigned int pg_lovers; /* lowest version */
unsigned int pg_hivers; /* highest version */
@@ -414,6 +414,7 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp);
void svc_rqst_free(struct svc_rqst *);
void svc_exit_thread(struct svc_rqst *);
struct svc_serv * svc_create_pooled(struct svc_program *prog,
+ unsigned int nprog,
struct svc_stat *stats,
unsigned int bufsize,
int (*threadfn)(void *data));
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 965a27806bfd..d9f348aa0672 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -440,10 +440,11 @@ EXPORT_SYMBOL_GPL(svc_rpcb_cleanup);
static int svc_uses_rpcbind(struct svc_serv *serv)
{
- struct svc_program *progp;
- unsigned int i;
+ unsigned int p, i;
+
+ for (p = 0; p < serv->sv_nprogs; p++) {
+ struct svc_program *progp = &serv->sv_programs[p];
- for (progp = serv->sv_program; progp; progp = progp->pg_next) {
for (i = 0; i < progp->pg_nvers; i++) {
if (progp->pg_vers[i] == NULL)
continue;
@@ -480,7 +481,7 @@ __svc_init_bc(struct svc_serv *serv)
* Create an RPC service
*/
static struct svc_serv *
-__svc_create(struct svc_program *prog, struct svc_stat *stats,
+__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats,
unsigned int bufsize, int npools, int (*threadfn)(void *data))
{
struct svc_serv *serv;
@@ -491,7 +492,8 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
return NULL;
serv->sv_name = prog->pg_name;
- serv->sv_program = prog;
+ serv->sv_programs = prog;
+ serv->sv_nprogs = nprogs;
serv->sv_stats = stats;
if (bufsize > RPCSVC_MAXPAYLOAD)
bufsize = RPCSVC_MAXPAYLOAD;
@@ -499,17 +501,18 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
serv->sv_threadfn = threadfn;
xdrsize = 0;
- while (prog) {
- prog->pg_lovers = prog->pg_nvers-1;
- for (vers=0; vers<prog->pg_nvers ; vers++)
- if (prog->pg_vers[vers]) {
- prog->pg_hivers = vers;
- if (prog->pg_lovers > vers)
- prog->pg_lovers = vers;
- if (prog->pg_vers[vers]->vs_xdrsize > xdrsize)
- xdrsize = prog->pg_vers[vers]->vs_xdrsize;
+ for (i = 0; i < nprogs; i++) {
+ struct svc_program *progp = &prog[i];
+
+ progp->pg_lovers = progp->pg_nvers-1;
+ for (vers = 0; vers < progp->pg_nvers ; vers++)
+ if (progp->pg_vers[vers]) {
+ progp->pg_hivers = vers;
+ if (progp->pg_lovers > vers)
+ progp->pg_lovers = vers;
+ if (progp->pg_vers[vers]->vs_xdrsize > xdrsize)
+ xdrsize = progp->pg_vers[vers]->vs_xdrsize;
}
- prog = prog->pg_next;
}
serv->sv_xdrsize = xdrsize;
INIT_LIST_HEAD(&serv->sv_tempsocks);
@@ -558,13 +561,14 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
int (*threadfn)(void *data))
{
- return __svc_create(prog, NULL, bufsize, 1, threadfn);
+ return __svc_create(prog, 1, NULL, bufsize, 1, threadfn);
}
EXPORT_SYMBOL_GPL(svc_create);
/**
* svc_create_pooled - Create an RPC service with pooled threads
- * @prog: the RPC program the new service will handle
+ * @prog: Array of RPC programs the new service will handle
+ * @nprogs: Number of programs in the array
* @stats: the stats struct if desired
* @bufsize: maximum message size for @prog
* @threadfn: a function to service RPC requests for @prog
@@ -572,6 +576,7 @@ EXPORT_SYMBOL_GPL(svc_create);
* Returns an instantiated struct svc_serv object or NULL.
*/
struct svc_serv *svc_create_pooled(struct svc_program *prog,
+ unsigned int nprogs,
struct svc_stat *stats,
unsigned int bufsize,
int (*threadfn)(void *data))
@@ -579,7 +584,7 @@ struct svc_serv *svc_create_pooled(struct svc_program *prog,
struct svc_serv *serv;
unsigned int npools = svc_pool_map_get();
- serv = __svc_create(prog, stats, bufsize, npools, threadfn);
+ serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn);
if (!serv)
goto out_err;
serv->sv_is_pooled = true;
@@ -602,16 +607,16 @@ svc_destroy(struct svc_serv **servp)
*servp = NULL;
- dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
+ dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name);
timer_shutdown_sync(&serv->sv_temptimer);
/*
* Remaining transports at this point are not expected.
*/
WARN_ONCE(!list_empty(&serv->sv_permsocks),
- "SVC: permsocks remain for %s\n", serv->sv_program->pg_name);
+ "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name);
WARN_ONCE(!list_empty(&serv->sv_tempsocks),
- "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name);
+ "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name);
cache_clean_deferred(serv);
@@ -1156,15 +1161,16 @@ int svc_register(const struct svc_serv *serv, struct net *net,
const int family, const unsigned short proto,
const unsigned short port)
{
- struct svc_program *progp;
- unsigned int i;
+ unsigned int p, i;
int error = 0;
WARN_ON_ONCE(proto == 0 && port == 0);
if (proto == 0 && port == 0)
return -EINVAL;
- for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+ for (p = 0; p < serv->sv_nprogs; p++) {
+ struct svc_program *progp = &serv->sv_programs[p];
+
for (i = 0; i < progp->pg_nvers; i++) {
error = progp->pg_rpcbind_set(net, progp, i,
@@ -1216,13 +1222,14 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi
static void svc_unregister(const struct svc_serv *serv, struct net *net)
{
struct sighand_struct *sighand;
- struct svc_program *progp;
unsigned long flags;
- unsigned int i;
+ unsigned int p, i;
clear_thread_flag(TIF_SIGPENDING);
- for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+ for (p = 0; p < serv->sv_nprogs; p++) {
+ struct svc_program *progp = &serv->sv_programs[p];
+
for (i = 0; i < progp->pg_nvers; i++) {
if (progp->pg_vers[i] == NULL)
continue;
@@ -1328,7 +1335,7 @@ svc_process_common(struct svc_rqst *rqstp)
struct svc_process_info process;
enum svc_auth_status auth_res;
unsigned int aoffset;
- int rc;
+ int pr, rc;
__be32 *p;
/* Will be turned off only when NFSv4 Sessions are used */
@@ -1352,9 +1359,12 @@ svc_process_common(struct svc_rqst *rqstp)
rqstp->rq_vers = be32_to_cpup(p++);
rqstp->rq_proc = be32_to_cpup(p);
- for (progp = serv->sv_program; progp; progp = progp->pg_next)
+ for (pr = 0; pr < serv->sv_nprogs; pr++) {
+ progp = &serv->sv_programs[pr];
+
if (rqstp->rq_prog == progp->pg_prog)
break;
+ }
/*
* Decode auth data, and add verifier to reply buffer.
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d3735ab3e6d1..16634afdf253 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -268,7 +268,7 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name,
spin_unlock(&svc_xprt_class_lock);
newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags);
if (IS_ERR(newxprt)) {
- trace_svc_xprt_create_err(serv->sv_program->pg_name,
+ trace_svc_xprt_create_err(serv->sv_programs->pg_name,
xcl->xcl_name, sap, len,
newxprt);
module_put(xcl->xcl_owner);
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 04b45588ae6f..8ca98b146ec8 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -697,7 +697,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
rqstp->rq_auth_stat = rpc_autherr_badcred;
ipm = ip_map_cached_get(xprt);
if (ipm == NULL)
- ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
+ ipm = __ip_map_lookup(sn->ip_map_cache,
+ rqstp->rq_server->sv_programs->pg_class,
&sin6->sin6_addr);
if (ipm == NULL)
--
2.44.0
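For readers skimming the diff above: a minimal userspace sketch, in plain C, of how program dispatch changes with this patch — the service walks a fixed array instead of chasing a pg_next pointer. The `struct prog` and `find_prog()` names here are simplified stand-ins for illustration, not the kernel's `struct svc_program` API.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct svc_program (illustrative only). */
struct prog {
	unsigned int pg_prog;	/* RPC program number */
	const char  *pg_name;
};

/*
 * After the patch: programs live in an array with an explicit count,
 * so the dispatch loop indexes into it (cf. svc_process_common()).
 */
static const struct prog *find_prog(const struct prog *progs,
				    unsigned int nprogs,
				    unsigned int number)
{
	for (unsigned int p = 0; p < nprogs; p++)
		if (progs[p].pg_prog == number)
			return &progs[p];
	return NULL;
}
```

The array form also lets optional programs (ACL, LOCALIO) be included with plain `#ifdef`-guarded initializers rather than conditionally stitched `pg_next` links.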
^ permalink raw reply related [flat|nested] 44+ messages in thread
* [PATCH v9 19/19] nfs: add Documentation/filesystems/nfs/localio.rst
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (17 preceding siblings ...)
2024-06-28 21:11 ` [PATCH v9 18/19] SUNRPC: replace program list with program array Mike Snitzer
@ 2024-06-28 21:11 ` Mike Snitzer
2024-06-29 15:36 ` [PATCH v9 00/19] nfs/nfsd: add support for localio Chuck Lever III
19 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2024-06-28 21:11 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
This document gives an overview of the LOCALIO auxiliary RPC protocol
added to the Linux NFS client and server (both v3 and v4) to allow a
client and server to reliably handshake to determine if they are on the
same host. The LOCALIO auxiliary protocol's implementation, which uses
the same connection as NFS traffic, follows the pattern established by
the NFS ACL protocol extension.
The robust handshake between local client and server is just the
beginning; the ultimate use case this locality makes possible is that
the client can issue reads, writes and commits directly to the server
without going over the network. This is particularly useful for
container use cases (e.g. Kubernetes) where it is possible to run an IO
job local to the server.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
Documentation/filesystems/nfs/localio.rst | 135 ++++++++++++++++++++++
include/linux/nfslocalio.h | 2 +
2 files changed, 137 insertions(+)
create mode 100644 Documentation/filesystems/nfs/localio.rst
diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..7f211e3fc34c
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,135 @@
+===========
+NFS localio
+===========
+
+This document gives an overview of the LOCALIO auxiliary RPC protocol
+added to the Linux NFS client and server (both v3 and v4) to allow a
+client and server to reliably handshake to determine if they are on the
+same host. The LOCALIO auxiliary protocol's implementation, which uses
+the same connection as NFS traffic, follows the pattern established by
+the NFS ACL protocol extension.
+
+The LOCALIO auxiliary protocol is needed to allow robust discovery of
+clients local to their servers. In a private implementation that
+preceded use of this LOCALIO protocol, a fragile sockaddr network
+address based match against all local network interfaces was attempted.
+But unlike the LOCALIO protocol, the sockaddr-based matching didn't
+handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning; the ultimate use case this locality makes possible is that
+the client can issue reads, writes and commits directly to the server
+without going over the network. This is particularly useful for
+container use cases (e.g. Kubernetes) where it is possible to run an IO
+job local to the server.
+
+The performance advantage realized from localio's ability to bypass
+using XDR and RPC for reads, writes and commits can be extreme, e.g.:
+fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8:
+- With localio:
+ read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
+- Without localio:
+ read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
+
+RPC
+---
+
+The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
+method that allows the Linux NFS client to retrieve a Linux NFS server's
+uuid. This protocol isn't part of an IETF standard, nor does it need to
+be, considering it is a Linux-to-Linux auxiliary RPC protocol that
+amounts to an implementation detail.
+
+The GETUUID method encodes the server's uuid_t in terms of the fixed
+UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR
+methods are used instead of the less efficient variable sized methods.
+
+The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
+by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
+Linux Kernel Organization 400122 nfslocalio
+
+The LOCALIO protocol spec in rpcgen syntax is:
+
+/* raw RFC 9562 UUID */
+#define UUID_SIZE 16
+typedef u8 uuid_t<UUID_SIZE>;
+
+program NFS_LOCALIO_PROGRAM {
+ version LOCALIO_V1 {
+ void
+ NULL(void) = 0;
+
+ uuid_t
+ GETUUID(void) = 1;
+ } = 1;
+} = 400122;
+
+LOCALIO uses the same transport connection as NFS traffic. As such,
+LOCALIO is not registered with rpcbind.
+
+Once an NFS client and server handshake as "local", the client will
+bypass the network RPC protocol for read, write and commit operations.
+Due to this XDR and RPC bypass, these operations will operate faster.
+
+NFS Common and Server
+---------------------
+
+Localio is used by nfsd to add access to a global nfsd_uuids list in
+nfs_common that is used to register and then identify local nfsd
+instances.
+
+nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
+composed of nfsd_uuid_t instances that are managed as nfsd creates them
+(per network namespace).
+
+nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
+nfsd for the client specified nfsd uuid.
+
+The nfsd_uuids list is the basis for localio enablement; as such it has
+members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace, through which the client
+can access nn->nfsd_serv with proper RCU read access). It is this
+client and server synchronization that enables advanced usage and
+lifetime of objects to span from the host kernel's nfsd to
+per-container knfsd instances that are connected to NFS clients running
+on the same local host.
+
+NFS Client
+----------
+
+fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
+the LOCALIO protocol and check if the server with that uuid is known to
+be local. This ensures that client and server (1) support localio and
+(2) are local to each other.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of nfsd_uuid_t struct to allow a client local to a server to
+open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU. If nfsd_open_local_fh() finds that the client no longer
+sees valid nfsd objects (be it struct net or nn->nfsd_serv) it returns
+ENXIO to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if an nfsd instance running in a container reboots
+while a localio client is connected to it.
+
+Testing
+-------
+
+The LOCALIO auxiliary protocol and associated NFS localio read, write
+and commit access have proven stable against various test scenarios but
+these have not yet been formalized in any testsuite:
+
+- Client and server both on localhost (for both v3 and v4.2).
+
+- Various permutations of client and server support enablement for
+ both local and remote client and server. Testing against NFS storage
+ products that don't support the LOCALIO protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2).
+  The container testing was in terms of podman-managed containers and
+  includes container stop/restart scenarios.
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index c9592ad0afe2..a9722e18b527 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -20,6 +20,8 @@ extern struct list_head nfsd_uuids;
* Each nfsd instance has an nfsd_uuid_t that is accessible through the
* global nfsd_uuids list. Useful to allow a client to negotiate if localio
* possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
*/
typedef struct {
uuid_t uuid;
--
2.44.0
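As background to the document's note that GETUUID uses the fixed-size opaque XDR form: a self-contained userspace sketch of the RFC 4506 encoding rules (these helpers are illustrative only, not the kernel's xdr_stream API), showing why a fixed-size uuid_t avoids the 4-byte length word that the variable form carries on the wire.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/*
 * XDR fixed-length opaque (RFC 4506, sec. 4.9): just the bytes, padded
 * to a 4-byte boundary, with no length word on the wire.  Returns the
 * number of bytes emitted.
 */
static size_t encode_opaque_fixed(uint8_t *out, const uint8_t *data,
				  size_t len)
{
	size_t padded = (len + 3) & ~(size_t)3;

	memcpy(out, data, len);
	memset(out + len, 0, padded - len);
	return padded;
}

/*
 * XDR variable-length opaque (sec. 4.10): a 4-byte big-endian length
 * word followed by the padded bytes.
 */
static size_t encode_opaque(uint8_t *out, const uint8_t *data, size_t len)
{
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	return 4 + encode_opaque_fixed(out + 4, data, len);
}
```

For a 16-byte uuid_t the fixed form is 16 bytes on the wire versus 20 for the variable form, and the decoder never has to validate a length field.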

* Re: [PATCH v9 00/19] nfs/nfsd: add support for localio
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (18 preceding siblings ...)
2024-06-28 21:11 ` [PATCH v9 19/19] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
@ 2024-06-29 15:36 ` Chuck Lever III
2024-06-29 16:03 ` Mike Snitzer
19 siblings, 1 reply; 44+ messages in thread
From: Chuck Lever III @ 2024-06-29 15:36 UTC (permalink / raw)
To: Mike Snitzer
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
> On Jun 28, 2024, at 5:10 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> Hi,
>
> I'd prefer to see these changes land upstream for 6.11 if possible.
> They are adequately Kconfig'd to certainly pose no risk if disabled.
> And even if localio enabled it has proven to work well with increased
> testing.
Can v10 split this series into an NFS client part and an NFS
server part? I will need to get the NFSD changes into nfsd-next
in the next week or so to land in v6.11.
> Worked with Kent Overstreet to enable testing integration with ktest
> running xfstests, the dashboard is here:
> https://evilpiepirate.org/~testdashboard/ci?branch=snitm-nfs
> (it is running way more xfstests tests than is usual for nfs, would be
> good to reconcile that with the listing provided here:
> https://wiki.linux-nfs.org/wiki/index.php/Xfstests )
Actually, we're using kdevops for NFSD CI testing. Any possibility
that we can get some help setting that up? (It runs xfstests and
several other workflows).
--
Chuck Lever
* Re: [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4
2024-06-28 21:10 ` [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
@ 2024-06-29 15:50 ` Chuck Lever
2024-06-30 22:01 ` NeilBrown
0 siblings, 1 reply; 44+ messages in thread
From: Chuck Lever @ 2024-06-29 15:50 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Fri, Jun 28, 2024 at 05:10:53PM -0400, Mike Snitzer wrote:
> This is nfs-localio code which blurs the boundary between server and
> client...
>
> The change_attr is used by NFS to detect if a file might have changed.
> This code is used to get the attributes after a write request. NFS
> uses a GETATTR request to the server at other times. The change_attr
> should be consistent between the two else comparisons will be
> meaningless.
>
> So nfs_localio_vfs_getattr() should use the same change_attr as the
> one that would be used if the NFS GETATTR request were made. For
> NFSv3, that is nfs_timespec_to_change_attr() as was already
> implemented. For NFSv4 it is something different (as implemented in
> this commit).
>
> [above header derived from linux-nfs message Neil sent on this topic]
Instead of this note, I recommend:
Message-Id: <171918165963.14261.959545364150864599@noble.neil.brown.name>
> Suggested-by: NeilBrown <neil@brown.name>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/localio.c | 48 +++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 39 insertions(+), 9 deletions(-)
>
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 0f7d6d55087b..fe96f05ba8ca 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -364,21 +364,47 @@ nfs_set_local_verifier(struct inode *inode,
> verf->committed = how;
> }
>
> +/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
> +static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
> +{
> + u32 request_mask = STATX_BASIC_STATS;
> +
> + if (version == 4)
> + request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
> + return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
> +}
> +
> +/*
> + * Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute(),
> + * FIXME: factor out to common code.
> + */
> +static u64 __nfsd4_change_attribute(const struct kstat *stat,
> + const struct inode *inode)
> +{
> + u64 chattr;
> +
> + if (stat->result_mask & STATX_CHANGE_COOKIE) {
> + chattr = stat->change_cookie;
> + if (S_ISREG(inode->i_mode) &&
> + !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
> + chattr += (u64)stat->ctime.tv_sec << 30;
> + chattr += stat->ctime.tv_nsec;
> + }
> + } else {
> + chattr = time_to_chattr(&stat->ctime);
> + }
> + return chattr;
> +}
> +
> static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
> {
> struct kstat stat;
> struct file *filp = iocb->kiocb.ki_filp;
> struct nfs_pgio_header *hdr = iocb->hdr;
> struct nfs_fattr *fattr = hdr->res.fattr;
> + int version = NFS_PROTO(hdr->inode)->version;
>
> - if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
> - STATX_INO |
> - STATX_ATIME |
> - STATX_MTIME |
> - STATX_CTIME |
> - STATX_SIZE |
> - STATX_BLOCKS,
> - AT_STATX_SYNC_AS_STAT))
> + if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
> return;
>
> fattr->valid = (NFS_ATTR_FATTR_FILEID |
> @@ -394,7 +420,11 @@ static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
> fattr->atime = stat.atime;
> fattr->mtime = stat.mtime;
> fattr->ctime = stat.ctime;
> - fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> + if (version == 4) {
> + fattr->change_attr =
> + __nfsd4_change_attribute(&stat, file_inode(filp));
> + } else
> + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> fattr->du.nfs3.used = stat.blocks << 9;
> }
>
> --
> 2.44.0
>
--
Chuck Lever
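The change-attribute logic quoted in this patch can be rendered as a standalone userspace sketch; the `struct ts` type and the `change_attr()` helper below are stand-ins for illustration rather than the kernel's struct kstat interface. It shows the three cases: a monotonic change cookie is used as-is, a non-monotonic cookie on a regular file gets ctime folded in so the value still advances, and without any cookie the whole value is derived from ctime (what time_to_chattr() computes).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for struct timespec64 (illustrative only). */
struct ts {
	int64_t tv_sec;
	int64_t tv_nsec;
};

/* Mirrors the __nfsd4_change_attribute() logic quoted above. */
static uint64_t change_attr(bool has_cookie, uint64_t cookie,
			    bool is_reg, bool monotonic, struct ts ctime)
{
	uint64_t chattr;

	if (has_cookie) {
		chattr = cookie;
		if (is_reg && !monotonic) {
			/* fold ctime in so the value still moves forward */
			chattr += (uint64_t)ctime.tv_sec << 30;
			chattr += (uint64_t)ctime.tv_nsec;
		}
	} else {
		/* no STATX_CHANGE_COOKIE: derive everything from ctime */
		chattr = ((uint64_t)ctime.tv_sec << 30) +
			 (uint64_t)ctime.tv_nsec;
	}
	return chattr;
}
```

The point of Neil's fix is that the localio client must produce the same value the server's GETATTR path would, or change-attribute comparisons on the client become meaningless.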
* Re: [PATCH v9 18/19] SUNRPC: replace program list with program array
2024-06-28 21:11 ` [PATCH v9 18/19] SUNRPC: replace program list with program array Mike Snitzer
@ 2024-06-29 16:00 ` Chuck Lever
2024-06-30 21:57 ` NeilBrown
0 siblings, 1 reply; 44+ messages in thread
From: Chuck Lever @ 2024-06-29 16:00 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Fri, Jun 28, 2024 at 05:11:04PM -0400, Mike Snitzer wrote:
> From: NeilBrown <neil@brown.name>
>
> A service created with svc_create_pooled() can be given a linked list of
> programs and all of these will be served.
>
> Using a linked list makes it cumbersome when there are several programs
> that can be optionally selected with CONFIG settings.
>
> So change to use an array with explicit size. svc_create() is always
> passed a single program. svc_create_pooled() now must be used for
> multiple programs.
Instead of this last sentence, it might be more clear to say:
> After this patch is applied, API consumers must use only
> svc_create_pooled() when creating an RPC service that listens for
> more than one RPC program.
I like the idea of replacing these static linked lists.
> Signed-off-by: NeilBrown <neil@brown.name>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/nfsctl.c | 2 +-
> fs/nfsd/nfsd.h | 2 +-
> fs/nfsd/nfssvc.c | 69 ++++++++++++++++++--------------------
> include/linux/sunrpc/svc.h | 7 ++--
> net/sunrpc/svc.c | 68 +++++++++++++++++++++----------------
> net/sunrpc/svc_xprt.c | 2 +-
> net/sunrpc/svcauth_unix.c | 3 +-
> 7 files changed, 80 insertions(+), 73 deletions(-)
>
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index e5d2cc74ef77..6fb92bb61c6d 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -2265,7 +2265,7 @@ static __net_init int nfsd_net_init(struct net *net)
> if (retval)
> goto out_repcache_error;
> memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
> - nn->nfsd_svcstats.program = &nfsd_program;
> + nn->nfsd_svcstats.program = &nfsd_programs[0];
> nn->nfsd_versions = NULL;
> nn->nfsd4_minorversions = NULL;
> nfsd4_init_leases_net(nn);
> diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> index cec8697b1cd6..c3f7c5957950 100644
> --- a/fs/nfsd/nfsd.h
> +++ b/fs/nfsd/nfsd.h
> @@ -80,7 +80,7 @@ struct nfsd_genl_rqstp {
> u32 rq_opnum[NFSD_MAX_OPS_PER_COMPOUND];
> };
>
> -extern struct svc_program nfsd_program;
> +extern struct svc_program nfsd_programs[];
> extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4;
> extern struct mutex nfsd_mutex;
> extern spinlock_t nfsd_drc_lock;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 6cc6a1971e21..ef2532303ece 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -36,7 +36,6 @@
> #define NFSDDBG_FACILITY NFSDDBG_SVC
>
> atomic_t nfsd_th_cnt = ATOMIC_INIT(0);
> -extern struct svc_program nfsd_program;
> static int nfsd(void *vrqstp);
> #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> static int nfsd_acl_rpcbind_set(struct net *,
> @@ -89,16 +88,6 @@ static const struct svc_version *localio_versions[] = {
>
> #define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions)
>
> -static struct svc_program nfsd_localio_program = {
> - .pg_prog = NFS_LOCALIO_PROGRAM,
> - .pg_nvers = NFSD_LOCALIO_NRVERS,
> - .pg_vers = localio_versions,
> - .pg_name = "nfslocalio",
> - .pg_class = "nfsd",
> - .pg_authenticate = &svc_set_client,
> - .pg_init_request = svc_generic_init_request,
> - .pg_rpcbind_set = svc_generic_rpcbind_set,
> -};
> #endif /* CONFIG_NFSD_LOCALIO */
>
> #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> @@ -111,23 +100,9 @@ static const struct svc_version *nfsd_acl_version[] = {
> # endif
> };
>
> -#define NFSD_ACL_MINVERS 2
> +#define NFSD_ACL_MINVERS 2
> #define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)
>
> -static struct svc_program nfsd_acl_program = {
> -#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> - .pg_next = &nfsd_localio_program,
> -#endif /* CONFIG_NFSD_LOCALIO */
> - .pg_prog = NFS_ACL_PROGRAM,
> - .pg_nvers = NFSD_ACL_NRVERS,
> - .pg_vers = nfsd_acl_version,
> - .pg_name = "nfsacl",
> - .pg_class = "nfsd",
> - .pg_authenticate = &svc_set_client,
> - .pg_init_request = nfsd_acl_init_request,
> - .pg_rpcbind_set = nfsd_acl_rpcbind_set,
> -};
> -
> #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
>
> static const struct svc_version *nfsd_version[] = {
> @@ -140,25 +115,44 @@ static const struct svc_version *nfsd_version[] = {
> #endif
> };
>
> -#define NFSD_MINVERS 2
> +#define NFSD_MINVERS 2
> #define NFSD_NRVERS ARRAY_SIZE(nfsd_version)
>
> -struct svc_program nfsd_program = {
> -#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> - .pg_next = &nfsd_acl_program,
> -#else
> -#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> - .pg_next = &nfsd_localio_program,
> -#endif /* CONFIG_NFSD_LOCALIO */
> -#endif
> +struct svc_program nfsd_programs[] = {
> + {
> .pg_prog = NFS_PROGRAM, /* program number */
> .pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
> .pg_vers = nfsd_version, /* version table */
> .pg_name = "nfsd", /* program name */
> .pg_class = "nfsd", /* authentication class */
> - .pg_authenticate = &svc_set_client, /* export authentication */
> + .pg_authenticate = svc_set_client, /* export authentication */
> .pg_init_request = nfsd_init_request,
> .pg_rpcbind_set = nfsd_rpcbind_set,
> + },
> +#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> + {
> + .pg_prog = NFS_ACL_PROGRAM,
> + .pg_nvers = NFSD_ACL_NRVERS,
> + .pg_vers = nfsd_acl_version,
> + .pg_name = "nfsacl",
> + .pg_class = "nfsd",
> + .pg_authenticate = svc_set_client,
> + .pg_init_request = nfsd_acl_init_request,
> + .pg_rpcbind_set = nfsd_acl_rpcbind_set,
> + },
> +#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
> +#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> + {
> + .pg_prog = NFS_LOCALIO_PROGRAM,
> + .pg_nvers = NFSD_LOCALIO_NRVERS,
> + .pg_vers = localio_versions,
> + .pg_name = "nfslocalio",
> + .pg_class = "nfsd",
> + .pg_authenticate = svc_set_client,
> + .pg_init_request = svc_generic_init_request,
> + .pg_rpcbind_set = svc_generic_rpcbind_set,
> + }
> +#endif /* IS_ENABLED(CONFIG_NFSD_LOCALIO) */
> };
>
> bool nfsd_support_version(int vers)
> @@ -735,7 +729,8 @@ int nfsd_create_serv(struct net *net)
> if (nfsd_max_blksize == 0)
> nfsd_max_blksize = nfsd_get_default_max_blksize();
> nfsd_reset_versions(nn);
> - serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
> + serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs),
> + &nn->nfsd_svcstats,
> nfsd_max_blksize, nfsd);
> if (serv == NULL)
> return -ENOMEM;
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index a7d0406b9ef5..7c86b1696398 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -66,9 +66,10 @@ enum {
> * We currently do not support more than one RPC program per daemon.
> */
> struct svc_serv {
> - struct svc_program * sv_program; /* RPC program */
> + struct svc_program * sv_programs; /* RPC programs */
> struct svc_stat * sv_stats; /* RPC statistics */
> spinlock_t sv_lock;
> + unsigned int sv_nprogs; /* Number of sv_programs */
> unsigned int sv_nrthreads; /* # of server threads */
> unsigned int sv_maxconn; /* max connections allowed or
> * '0' causing max to be based
> @@ -329,10 +330,9 @@ struct svc_process_info {
> };
>
> /*
> - * List of RPC programs on the same transport endpoint
> + * RPC program - an array of these can use the same transport endpoint
> */
> struct svc_program {
> - struct svc_program * pg_next; /* other programs (same xprt) */
> u32 pg_prog; /* program number */
> unsigned int pg_lovers; /* lowest version */
> unsigned int pg_hivers; /* highest version */
> @@ -414,6 +414,7 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp);
> void svc_rqst_free(struct svc_rqst *);
> void svc_exit_thread(struct svc_rqst *);
> struct svc_serv * svc_create_pooled(struct svc_program *prog,
> + unsigned int nprog,
> struct svc_stat *stats,
> unsigned int bufsize,
> int (*threadfn)(void *data));
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 965a27806bfd..d9f348aa0672 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -440,10 +440,11 @@ EXPORT_SYMBOL_GPL(svc_rpcb_cleanup);
>
> static int svc_uses_rpcbind(struct svc_serv *serv)
> {
> - struct svc_program *progp;
> - unsigned int i;
> + unsigned int p, i;
> +
> + for (p = 0; p < serv->sv_nprogs; p++) {
> + struct svc_program *progp = &serv->sv_programs[p];
>
> - for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> for (i = 0; i < progp->pg_nvers; i++) {
> if (progp->pg_vers[i] == NULL)
> continue;
> @@ -480,7 +481,7 @@ __svc_init_bc(struct svc_serv *serv)
> * Create an RPC service
> */
> static struct svc_serv *
> -__svc_create(struct svc_program *prog, struct svc_stat *stats,
> +__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats,
> unsigned int bufsize, int npools, int (*threadfn)(void *data))
> {
> struct svc_serv *serv;
> @@ -491,7 +492,8 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
> if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
> return NULL;
> serv->sv_name = prog->pg_name;
> - serv->sv_program = prog;
> + serv->sv_programs = prog;
> + serv->sv_nprogs = nprogs;
> serv->sv_stats = stats;
> if (bufsize > RPCSVC_MAXPAYLOAD)
> bufsize = RPCSVC_MAXPAYLOAD;
> @@ -499,17 +501,18 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
> serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
> serv->sv_threadfn = threadfn;
> xdrsize = 0;
> - while (prog) {
> - prog->pg_lovers = prog->pg_nvers-1;
> - for (vers=0; vers<prog->pg_nvers ; vers++)
> - if (prog->pg_vers[vers]) {
> - prog->pg_hivers = vers;
> - if (prog->pg_lovers > vers)
> - prog->pg_lovers = vers;
> - if (prog->pg_vers[vers]->vs_xdrsize > xdrsize)
> - xdrsize = prog->pg_vers[vers]->vs_xdrsize;
> + for (i = 0; i < nprogs; i++) {
> + struct svc_program *progp = &prog[i];
> +
> + progp->pg_lovers = progp->pg_nvers-1;
> + for (vers = 0; vers < progp->pg_nvers ; vers++)
> + if (progp->pg_vers[vers]) {
> + progp->pg_hivers = vers;
> + if (progp->pg_lovers > vers)
> + progp->pg_lovers = vers;
> + if (progp->pg_vers[vers]->vs_xdrsize > xdrsize)
> + xdrsize = progp->pg_vers[vers]->vs_xdrsize;
> }
> - prog = prog->pg_next;
> }
> serv->sv_xdrsize = xdrsize;
> INIT_LIST_HEAD(&serv->sv_tempsocks);
> @@ -558,13 +561,14 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
> struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
> int (*threadfn)(void *data))
> {
> - return __svc_create(prog, NULL, bufsize, 1, threadfn);
> + return __svc_create(prog, 1, NULL, bufsize, 1, threadfn);
> }
> EXPORT_SYMBOL_GPL(svc_create);
>
> /**
> * svc_create_pooled - Create an RPC service with pooled threads
> - * @prog: the RPC program the new service will handle
> + * @prog: Array of RPC programs the new service will handle
> + * @nprogs: Number of programs in the array
> * @stats: the stats struct if desired
> * @bufsize: maximum message size for @prog
> * @threadfn: a function to service RPC requests for @prog
> @@ -572,6 +576,7 @@ EXPORT_SYMBOL_GPL(svc_create);
> * Returns an instantiated struct svc_serv object or NULL.
> */
> struct svc_serv *svc_create_pooled(struct svc_program *prog,
> + unsigned int nprogs,
> struct svc_stat *stats,
> unsigned int bufsize,
> int (*threadfn)(void *data))
> @@ -579,7 +584,7 @@ struct svc_serv *svc_create_pooled(struct svc_program *prog,
> struct svc_serv *serv;
> unsigned int npools = svc_pool_map_get();
>
> - serv = __svc_create(prog, stats, bufsize, npools, threadfn);
> + serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn);
> if (!serv)
> goto out_err;
> serv->sv_is_pooled = true;
> @@ -602,16 +607,16 @@ svc_destroy(struct svc_serv **servp)
>
> *servp = NULL;
>
> - dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
> + dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name);
> timer_shutdown_sync(&serv->sv_temptimer);
>
> /*
> * Remaining transports at this point are not expected.
> */
> WARN_ONCE(!list_empty(&serv->sv_permsocks),
> - "SVC: permsocks remain for %s\n", serv->sv_program->pg_name);
> + "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name);
> WARN_ONCE(!list_empty(&serv->sv_tempsocks),
> - "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name);
> + "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name);
>
> cache_clean_deferred(serv);
>
> @@ -1156,15 +1161,16 @@ int svc_register(const struct svc_serv *serv, struct net *net,
> const int family, const unsigned short proto,
> const unsigned short port)
> {
> - struct svc_program *progp;
> - unsigned int i;
> + unsigned int p, i;
> int error = 0;
>
> WARN_ON_ONCE(proto == 0 && port == 0);
> if (proto == 0 && port == 0)
> return -EINVAL;
>
> - for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> + for (p = 0; p < serv->sv_nprogs; p++) {
> + struct svc_program *progp = &serv->sv_programs[p];
> +
> for (i = 0; i < progp->pg_nvers; i++) {
>
> error = progp->pg_rpcbind_set(net, progp, i,
> @@ -1216,13 +1222,14 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi
> static void svc_unregister(const struct svc_serv *serv, struct net *net)
> {
> struct sighand_struct *sighand;
> - struct svc_program *progp;
> unsigned long flags;
> - unsigned int i;
> + unsigned int p, i;
>
> clear_thread_flag(TIF_SIGPENDING);
>
> - for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> + for (p = 0; p < serv->sv_nprogs; p++) {
> + struct svc_program *progp = &serv->sv_programs[p];
> +
> for (i = 0; i < progp->pg_nvers; i++) {
> if (progp->pg_vers[i] == NULL)
> continue;
> @@ -1328,7 +1335,7 @@ svc_process_common(struct svc_rqst *rqstp)
> struct svc_process_info process;
> enum svc_auth_status auth_res;
> unsigned int aoffset;
> - int rc;
> + int pr, rc;
> __be32 *p;
>
> /* Will be turned off only when NFSv4 Sessions are used */
> @@ -1352,9 +1359,12 @@ svc_process_common(struct svc_rqst *rqstp)
> rqstp->rq_vers = be32_to_cpup(p++);
> rqstp->rq_proc = be32_to_cpup(p);
>
> - for (progp = serv->sv_program; progp; progp = progp->pg_next)
> + for (pr = 0; pr < serv->sv_nprogs; pr++) {
> + progp = &serv->sv_programs[pr];
> +
> if (rqstp->rq_prog == progp->pg_prog)
> break;
> + }
>
> /*
> * Decode auth data, and add verifier to reply buffer.
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index d3735ab3e6d1..16634afdf253 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -268,7 +268,7 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name,
> spin_unlock(&svc_xprt_class_lock);
> newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags);
> if (IS_ERR(newxprt)) {
> - trace_svc_xprt_create_err(serv->sv_program->pg_name,
> + trace_svc_xprt_create_err(serv->sv_programs->pg_name,
> xcl->xcl_name, sap, len,
> newxprt);
> module_put(xcl->xcl_owner);
> diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> index 04b45588ae6f..8ca98b146ec8 100644
> --- a/net/sunrpc/svcauth_unix.c
> +++ b/net/sunrpc/svcauth_unix.c
> @@ -697,7 +697,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> rqstp->rq_auth_stat = rpc_autherr_badcred;
> ipm = ip_map_cached_get(xprt);
> if (ipm == NULL)
> - ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
> + ipm = __ip_map_lookup(sn->ip_map_cache,
> + rqstp->rq_server->sv_programs->pg_class,
> &sin6->sin6_addr);
>
> if (ipm == NULL)
> --
> 2.44.0
>
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v9 00/19] nfs/nfsd: add support for localio
2024-06-29 15:36 ` [PATCH v9 00/19] nfs/nfsd: add support for localio Chuck Lever III
@ 2024-06-29 16:03 ` Mike Snitzer
2024-06-29 17:01 ` Chuck Lever
0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2024-06-29 16:03 UTC (permalink / raw)
To: Chuck Lever III
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
On Sat, Jun 29, 2024 at 03:36:10PM +0000, Chuck Lever III wrote:
>
>
> > On Jun 28, 2024, at 5:10 PM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > Hi,
> >
> > I'd prefer to see these changes land upstream for 6.11 if possible.
> > They are adequately Kconfig'd to certainly pose no risk if disabled.
> > And even if localio enabled it has proven to work well with increased
> > testing.
>
> Can v10 split this series into an NFS client part and an NFS
> server part? I will need to get the NFSD changes into nfsd-next
> in the next week or so to land in v6.11.
I forgot to mention this as a v9 improvement: I did split the series,
but left it as one patchset.
Patches 1-12 are NFS client, Patches 13-19 are NFSD.
Here is the diffstat for NFS (patches 1 - 12):
fs/Kconfig | 3
fs/nfs/Kconfig | 14
fs/nfs/Makefile | 1
fs/nfs/blocklayout/blocklayout.c | 6
fs/nfs/client.c | 15
fs/nfs/filelayout/filelayout.c | 16
fs/nfs/flexfilelayout/flexfilelayout.c | 131 ++++
fs/nfs/flexfilelayout/flexfilelayout.h | 2
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6
fs/nfs/inode.c | 4
fs/nfs/internal.h | 60 ++
fs/nfs/localio.c | 827 ++++++++++++++++++++++++++++++
fs/nfs/nfs4xdr.c | 13
fs/nfs/nfstrace.h | 61 ++
fs/nfs/pagelist.c | 32 -
fs/nfs/pnfs.c | 24
fs/nfs/pnfs.h | 6
fs/nfs/pnfs_nfs.c | 2
fs/nfs/write.c | 13
fs/nfs_common/Makefile | 3
fs/nfs_common/nfslocalio.c | 74 ++
fs/nfsd/netns.h | 4
fs/nfsd/nfssvc.c | 12
include/linux/nfs.h | 9
include/linux/nfs_fs.h | 2
include/linux/nfs_fs_sb.h | 10
include/linux/nfs_xdr.h | 20
include/linux/nfslocalio.h | 39 +
include/linux/sunrpc/auth.h | 4
net/sunrpc/auth.c | 15
net/sunrpc/clnt.c | 1
31 files changed, 1354 insertions(+), 75 deletions(-)
Unfortunately there are the fs/nfsd/netns.h and fs/nfsd/nfssvc.c
changes that anchor everything (patch 5).
I suppose we could invert the order, such that NFSD comes before NFS
changes. But then the NFS tree will need to be rebased on NFSD tree.
Diffstat for NFSD (patches 13 - 19):
Documentation/filesystems/nfs/localio.rst | 135 ++++++++++++
fs/nfsd/Kconfig | 14 +
fs/nfsd/Makefile | 1
fs/nfsd/filecache.c | 2
fs/nfsd/localio.c | 319 ++++++++++++++++++++++++++++++
fs/nfsd/netns.h | 8
fs/nfsd/nfsctl.c | 2
fs/nfsd/nfsd.h | 2
fs/nfsd/nfssvc.c | 104 +++++++--
fs/nfsd/trace.h | 3
fs/nfsd/vfs.h | 9
include/linux/nfslocalio.h | 2
include/linux/sunrpc/svc.h | 7
net/sunrpc/svc.c | 68 +++---
net/sunrpc/svc_xprt.c | 2
net/sunrpc/svcauth_unix.c | 3
16 files changed, 621 insertions(+), 60 deletions(-)
Happy to work it however you think is best.
> > Worked with Kent Overstreet to enable testing integration with ktest
> > running xfstests, the dashboard is here:
> > https://evilpiepirate.org/~testdashboard/ci?branch=snitm-nfs
> > (it is running way more xfstests tests than is usual for nfs, would be
> > good to reconcile that with the listing provided here:
> > https://wiki.linux-nfs.org/wiki/index.php/Xfstests )
>
> Actually, we're using kdevops for NFSD CI testing. Any possibility
> that we can get some help setting that up? (It runs xfstests and
> several other workflows).
Sure, I can get with you off-list if that's best? I just need some
pointers/access to help get it going.
Thanks,
Mike
* Re: [PATCH v9 00/19] nfs/nfsd: add support for localio
2024-06-29 16:03 ` Mike Snitzer
@ 2024-06-29 17:01 ` Chuck Lever
2024-06-29 19:10 ` Mike Snitzer
0 siblings, 1 reply; 44+ messages in thread
From: Chuck Lever @ 2024-06-29 17:01 UTC (permalink / raw)
To: Mike Snitzer
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
On Sat, Jun 29, 2024 at 12:03:50PM -0400, Mike Snitzer wrote:
> On Sat, Jun 29, 2024 at 03:36:10PM +0000, Chuck Lever III wrote:
> >
> >
> > > On Jun 28, 2024, at 5:10 PM, Mike Snitzer <snitzer@kernel.org> wrote:
> > >
> > > Hi,
> > >
> > > I'd prefer to see these changes land upstream for 6.11 if possible.
> > > They are adequately Kconfig'd to certainly pose no risk if disabled.
> > > And even if localio enabled it has proven to work well with increased
> > > testing.
> >
> > Can v10 split this series into an NFS client part and an NFS
> > server part? I will need to get the NFSD changes into nfsd-next
> > in the next week or so to land in v6.11.
>
> I forgot to mention this as a v9 improvement: I did split the series,
> but left it as one patchset.
>
> Patches 1-12 are NFS client, Patches 13-19 are NFSD.
I didn't notice that because my MUA displayed the patches completely
out of order. Apologies!
> Here is the diffstat for NFS (patches 1 - 12):
>
> fs/Kconfig | 3
> fs/nfs/Kconfig | 14
> fs/nfs/Makefile | 1
> fs/nfs/blocklayout/blocklayout.c | 6
> fs/nfs/client.c | 15
> fs/nfs/filelayout/filelayout.c | 16
> fs/nfs/flexfilelayout/flexfilelayout.c | 131 ++++
> fs/nfs/flexfilelayout/flexfilelayout.h | 2
> fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6
> fs/nfs/inode.c | 4
> fs/nfs/internal.h | 60 ++
> fs/nfs/localio.c | 827 ++++++++++++++++++++++++++++++
> fs/nfs/nfs4xdr.c | 13
> fs/nfs/nfstrace.h | 61 ++
> fs/nfs/pagelist.c | 32 -
> fs/nfs/pnfs.c | 24
> fs/nfs/pnfs.h | 6
> fs/nfs/pnfs_nfs.c | 2
> fs/nfs/write.c | 13
> fs/nfs_common/Makefile | 3
> fs/nfs_common/nfslocalio.c | 74 ++
> fs/nfsd/netns.h | 4
> fs/nfsd/nfssvc.c | 12
> include/linux/nfs.h | 9
> include/linux/nfs_fs.h | 2
> include/linux/nfs_fs_sb.h | 10
> include/linux/nfs_xdr.h | 20
> include/linux/nfslocalio.h | 39 +
> include/linux/sunrpc/auth.h | 4
> net/sunrpc/auth.c | 15
> net/sunrpc/clnt.c | 1
> 31 files changed, 1354 insertions(+), 75 deletions(-)
>
> Unfortunately there are the fs/nfsd/netns.h and fs/nfsd/nfssvc.c
> changes that anchor everything (patch 5).
I /did/ notice that.
> I suppose we could invert the order, such that NFSD comes before NFS
> changes. But then the NFS tree will need to be rebased on NFSD tree.
Alternately, I can take the NFSD-related patches in 6.11, and the
client changes can go in 6.12. My impression (could be wrong) was
that the NFSD patches were nearly ready but the client side was
still churning a little.
Or we might decide that it's not worth the trouble. Anna offered to
take the whole series, or I can. If Anna takes it, I'll send
Acked-by for the NFSD patches.
> Diffstat for NFSD (patches 13 - 19):
>
> Documentation/filesystems/nfs/localio.rst | 135 ++++++++++++
> fs/nfsd/Kconfig | 14 +
> fs/nfsd/Makefile | 1
> fs/nfsd/filecache.c | 2
> fs/nfsd/localio.c | 319 ++++++++++++++++++++++++++++++
> fs/nfsd/netns.h | 8
> fs/nfsd/nfsctl.c | 2
> fs/nfsd/nfsd.h | 2
> fs/nfsd/nfssvc.c | 104 +++++++--
> fs/nfsd/trace.h | 3
> fs/nfsd/vfs.h | 9
> include/linux/nfslocalio.h | 2
> include/linux/sunrpc/svc.h | 7
> net/sunrpc/svc.c | 68 +++---
> net/sunrpc/svc_xprt.c | 2
> net/sunrpc/svcauth_unix.c | 3
> 16 files changed, 621 insertions(+), 60 deletions(-)
>
> Happy to work it however you think is best.
>
> > > Worked with Kent Overstreet to enable testing integration with ktest
> > > running xfstests, the dashboard is here:
> > > https://evilpiepirate.org/~testdashboard/ci?branch=snitm-nfs
> > > (it is running way more xfstests tests than is usual for nfs, would be
> > > good to reconcile that with the listing provided here:
> > > https://wiki.linux-nfs.org/wiki/index.php/Xfstests )
> >
> > Actually, we're using kdevops for NFSD CI testing. Any possibility
> > that we can get some help setting that up? (It runs xfstests and
> > several other workflows).
>
> Sure, I can get with you off-list if that's best? I just need some
> pointers/access to help get it going.
Yes, off-list wfm.
Come to think of it, it might just work to point my test systems to
your git branch and let it rip, if there are no new tests. I will
try that.
--
Chuck Lever
* Re: [PATCH v9 00/19] nfs/nfsd: add support for localio
2024-06-29 17:01 ` Chuck Lever
@ 2024-06-29 19:10 ` Mike Snitzer
2024-06-29 20:31 ` Chuck Lever III
0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2024-06-29 19:10 UTC (permalink / raw)
To: Chuck Lever
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
On Sat, Jun 29, 2024 at 01:01:57PM -0400, Chuck Lever wrote:
> On Sat, Jun 29, 2024 at 12:03:50PM -0400, Mike Snitzer wrote:
> > On Sat, Jun 29, 2024 at 03:36:10PM +0000, Chuck Lever III wrote:
> > >
> > >
> > > > On Jun 28, 2024, at 5:10 PM, Mike Snitzer <snitzer@kernel.org> wrote:
> > > >
> > > > Hi,
> > > >
> > > > I'd prefer to see these changes land upstream for 6.11 if possible.
> > > > They are adequately Kconfig'd to certainly pose no risk if disabled.
> > > > And even if localio enabled it has proven to work well with increased
> > > > testing.
> > >
> > > Can v10 split this series into an NFS client part and an NFS
> > > server part? I will need to get the NFSD changes into nfsd-next
> > > in the next week or so to land in v6.11.
> >
> > I forgot to mention this as a v9 improvement: I did split the series,
> > but left it as one patchset.
> >
> > Patches 1-12 are NFS client, Patches 13-19 are NFSD.
>
> I didn't notice that because my MUA displayed the patches completely
> out of order. Apologies!
>
> > Here is the diffstat for NFS (patches 1 - 12):
> >
> > fs/Kconfig | 3
> > fs/nfs/Kconfig | 14
> > fs/nfs/Makefile | 1
> > fs/nfs/blocklayout/blocklayout.c | 6
> > fs/nfs/client.c | 15
> > fs/nfs/filelayout/filelayout.c | 16
> > fs/nfs/flexfilelayout/flexfilelayout.c | 131 ++++
> > fs/nfs/flexfilelayout/flexfilelayout.h | 2
> > fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6
> > fs/nfs/inode.c | 4
> > fs/nfs/internal.h | 60 ++
> > fs/nfs/localio.c | 827 ++++++++++++++++++++++++++++++
> > fs/nfs/nfs4xdr.c | 13
> > fs/nfs/nfstrace.h | 61 ++
> > fs/nfs/pagelist.c | 32 -
> > fs/nfs/pnfs.c | 24
> > fs/nfs/pnfs.h | 6
> > fs/nfs/pnfs_nfs.c | 2
> > fs/nfs/write.c | 13
> > fs/nfs_common/Makefile | 3
> > fs/nfs_common/nfslocalio.c | 74 ++
> > fs/nfsd/netns.h | 4
> > fs/nfsd/nfssvc.c | 12
> > include/linux/nfs.h | 9
> > include/linux/nfs_fs.h | 2
> > include/linux/nfs_fs_sb.h | 10
> > include/linux/nfs_xdr.h | 20
> > include/linux/nfslocalio.h | 39 +
> > include/linux/sunrpc/auth.h | 4
> > net/sunrpc/auth.c | 15
> > net/sunrpc/clnt.c | 1
> > 31 files changed, 1354 insertions(+), 75 deletions(-)
> >
> > Unfortunately there are the fs/nfsd/netns.h and fs/nfsd/nfssvc.c
> > changes that anchor everything (patch 5).
>
> I /did/ notice that.
>
>
> > I suppose we could invert the order, such that NFSD comes before NFS
> > changes. But then the NFS tree will need to be rebased on NFSD tree.
>
> Alternately, I can take the NFSD-related patches in 6.11, and the
> client changes can go in 6.12. My impression (could be wrong) was
> that the NFSD patches were nearly ready but the client side was
> still churning a little.
I'm "done" with both afaik. Only thing that needs settling is that
XFS RFC patch I posted.
> Or we might decide that it's not worth the trouble. Anna offered to
> take the whole series, or I can. If Anna takes it, I'll send
> Acked-by for the NFSD patches.
Probably best to have it all go through the same tree. Just get proper
Acked-by:s where needed.
I would say it is more client heavy (in terms of code foot-print) so
maybe it does make more sense to go through NFS. Anna is handling the
6.11 merge for NFS so let's just work on getting proper Acked-by.
If you, Jeff and Neil could do a final review and provide Acked-by (or
conditional Acked-by if I fold your suggestions in for v10) I'll add
all your final feedback and Acked-by:s or Reviewed-by:s so Anna will
be able to simply pick it up once the NFS client side is also
reviewed.
> > Diffstat for NFSD (patches 13 - 19):
> >
> > Documentation/filesystems/nfs/localio.rst | 135 ++++++++++++
> > fs/nfsd/Kconfig | 14 +
> > fs/nfsd/Makefile | 1
> > fs/nfsd/filecache.c | 2
> > fs/nfsd/localio.c | 319 ++++++++++++++++++++++++++++++
> > fs/nfsd/netns.h | 8
> > fs/nfsd/nfsctl.c | 2
> > fs/nfsd/nfsd.h | 2
> > fs/nfsd/nfssvc.c | 104 +++++++--
> > fs/nfsd/trace.h | 3
> > fs/nfsd/vfs.h | 9
> > include/linux/nfslocalio.h | 2
> > include/linux/sunrpc/svc.h | 7
> > net/sunrpc/svc.c | 68 +++---
> > net/sunrpc/svc_xprt.c | 2
> > net/sunrpc/svcauth_unix.c | 3
> > 16 files changed, 621 insertions(+), 60 deletions(-)
> >
> > Happy to work it however you think is best.
> >
> > > > Worked with Kent Overstreet to enable testing integration with ktest
> > > > running xfstests, the dashboard is here:
> > > > https://evilpiepirate.org/~testdashboard/ci?branch=snitm-nfs
> > > > (it is running way more xfstests tests than is usual for nfs, would be
> > > > good to reconcile that with the listing provided here:
> > > > https://wiki.linux-nfs.org/wiki/index.php/Xfstests )
> > >
> > > Actually, we're using kdevops for NFSD CI testing. Any possibility
> > > that we can get some help setting that up? (It runs xfstests and
> > > several other workflows).
> >
> > Sure, I can get with you off-list if that's best? I just need some
> > pointers/access to help get it going.
>
> Yes, off-list wfm.
>
> Come to think of it, it might just work to point my test systems to
> your git branch and let it rip, if there are no new tests. I will
> try that.
Right, no new tests added to xfstests, it was purely configuration to
get xfstests running on single host in loopback mode (NFS client
mounting export from knfsd on same host).
Would be great if you could point your kdevops at my localio-for-6.11
branch. You just need to make sure to enable these in your Kconfig:
CONFIG_NFSD_LOCALIO=y
CONFIG_NFS_LOCALIO=y
CONFIG_NFS_COMMON_LOCALIO_SUPPORT=y
(either of the NFS or NFSD options will select
CONFIG_NFS_COMMON_LOCALIO_SUPPORT)
* Re: [PATCH v9 00/19] nfs/nfsd: add support for localio
2024-06-29 19:10 ` Mike Snitzer
@ 2024-06-29 20:31 ` Chuck Lever III
0 siblings, 0 replies; 44+ messages in thread
From: Chuck Lever III @ 2024-06-29 20:31 UTC (permalink / raw)
To: Mike Snitzer, Anna Schumaker
Cc: Linux NFS Mailing List, Jeff Layton, Trond Myklebust, Neil Brown,
snitzer@hammerspace.com
> On Jun 29, 2024, at 3:10 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Sat, Jun 29, 2024 at 01:01:57PM -0400, Chuck Lever wrote:
>> On Sat, Jun 29, 2024 at 12:03:50PM -0400, Mike Snitzer wrote:
>>> On Sat, Jun 29, 2024 at 03:36:10PM +0000, Chuck Lever III wrote:
>>>>
>>>>
>>>>> On Jun 28, 2024, at 5:10 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I'd prefer to see these changes land upstream for 6.11 if possible.
>>>>> They are adequately Kconfig'd to certainly pose no risk if disabled.
>>>>> And even if localio enabled it has proven to work well with increased
>>>>> testing.
>>>>
>>>> Can v10 split this series into an NFS client part and an NFS
>>>> server part? I will need to get the NFSD changes into nfsd-next
>>>> in the next week or so to land in v6.11.
>>>
>>> I forgot to mention this as a v9 improvement: I did split the series,
>>> but left it as one patchset.
>>>
>>> Patches 1-12 are NFS client, Patches 13-19 are NFSD.
>>
>> I didn't notice that because my MUA displayed the patches completely
>> out of order. Apologies!
>>
>>> Here is the diffstat for NFS (patches 1 - 12):
>>>
>>> fs/Kconfig | 3
>>> fs/nfs/Kconfig | 14
>>> fs/nfs/Makefile | 1
>>> fs/nfs/blocklayout/blocklayout.c | 6
>>> fs/nfs/client.c | 15
>>> fs/nfs/filelayout/filelayout.c | 16
>>> fs/nfs/flexfilelayout/flexfilelayout.c | 131 ++++
>>> fs/nfs/flexfilelayout/flexfilelayout.h | 2
>>> fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6
>>> fs/nfs/inode.c | 4
>>> fs/nfs/internal.h | 60 ++
>>> fs/nfs/localio.c | 827 ++++++++++++++++++++++++++++++
>>> fs/nfs/nfs4xdr.c | 13
>>> fs/nfs/nfstrace.h | 61 ++
>>> fs/nfs/pagelist.c | 32 -
>>> fs/nfs/pnfs.c | 24
>>> fs/nfs/pnfs.h | 6
>>> fs/nfs/pnfs_nfs.c | 2
>>> fs/nfs/write.c | 13
>>> fs/nfs_common/Makefile | 3
>>> fs/nfs_common/nfslocalio.c | 74 ++
>>> fs/nfsd/netns.h | 4
>>> fs/nfsd/nfssvc.c | 12
>>> include/linux/nfs.h | 9
>>> include/linux/nfs_fs.h | 2
>>> include/linux/nfs_fs_sb.h | 10
>>> include/linux/nfs_xdr.h | 20
>>> include/linux/nfslocalio.h | 39 +
>>> include/linux/sunrpc/auth.h | 4
>>> net/sunrpc/auth.c | 15
>>> net/sunrpc/clnt.c | 1
>>> 31 files changed, 1354 insertions(+), 75 deletions(-)
>>>
>>> Unfortunately there are the fs/nfsd/netns.h and fs/nfsd/nfssvc.c
>>> changes that anchor everything (patch 5).
>>
>> I /did/ notice that.
>>
>>
>>> I suppose we could invert the order, such that NFSD comes before NFS
>>> changes. But then the NFS tree will need to be rebased on NFSD tree.
>>
>> Alternately, I can take the NFSD-related patches in 6.11, and the
>> client changes can go in 6.12. My impression (could be wrong) was
>> that the NFSD patches were nearly ready but the client side was
>> still churning a little.
>
> I'm "done" with both afaik. Only thing that needs settling is that
> XFS RFC patch I posted.
>
>> Or we might decide that it's not worth the trouble. Anna offered to
>> take the whole series, or I can. If Anna takes it, I'll send
>> Acked-by for the NFSD patches.
>
> Probably best to have it all go through the same tree. Just get proper
> Acked-by:s where needed.
>
> I would say it is more client heavy (in terms of code foot-print) so
> maybe it does make more sense to go through NFS. Anna is handling the
> 6.11 merge for NFS so let's just work on getting proper Acked-by.
>
> If you, Jeff and Neil could do a final review and provide Acked-by (or
> conditional Acked-by if I fold your suggestions in for v10) I'll add
> all your final feedback and Acked-by:s or Reviewed-by:s so Anna will
> be able to simply pick it up once the NFS client side is also
> reviewed.
Anna suggested this should soak in linux-next until v6.12.
I don't have a strong preference, though v6.12 seems like
a safer goal if you haven't seen any client-side review yet.
>>> Diffstat for NFSD (patches 13 - 19):
>>>
>>> Documentation/filesystems/nfs/localio.rst | 135 ++++++++++++
>>> fs/nfsd/Kconfig | 14 +
>>> fs/nfsd/Makefile | 1
>>> fs/nfsd/filecache.c | 2
>>> fs/nfsd/localio.c | 319 ++++++++++++++++++++++++++++++
>>> fs/nfsd/netns.h | 8
>>> fs/nfsd/nfsctl.c | 2
>>> fs/nfsd/nfsd.h | 2
>>> fs/nfsd/nfssvc.c | 104 +++++++--
>>> fs/nfsd/trace.h | 3
>>> fs/nfsd/vfs.h | 9
>>> include/linux/nfslocalio.h | 2
>>> include/linux/sunrpc/svc.h | 7
>>> net/sunrpc/svc.c | 68 +++---
>>> net/sunrpc/svc_xprt.c | 2
>>> net/sunrpc/svcauth_unix.c | 3
>>> 16 files changed, 621 insertions(+), 60 deletions(-)
>>>
>>> Happy to work it however you think is best.
>>>
>>>>> Worked with Kent Overstreet to enable testing integration with ktest
>>>>> running xfstests, the dashboard is here:
>>>>> https://evilpiepirate.org/~testdashboard/ci?branch=snitm-nfs
>>>>> (it is running way more xfstests tests than is usual for nfs, would be
>>>>> good to reconcile that with the listing provided here:
>>>>> https://wiki.linux-nfs.org/wiki/index.php/Xfstests )
>>>>
>>>> Actually, we're using kdevops for NFSD CI testing. Any possibility
>>>> that we can get some help setting that up? (It runs xfstests and
>>>> several other workflows).
>>>
>>> Sure, I can get with you off-list if that's best? I just need some
>>> pointers/access to help get it going.
>>
>> Yes, off-list wfm.
>>
>> Come to think of it, it might just work to point my test systems to
>> your git branch and let it rip, if there are no new tests. I will
>> try that.
>
> Right, no new tests added to xfstests, it was purely configuration to
> get xfstests running on single host in loopback mode (NFS client
> mounting export from knfsd on same host).
>
> Would be great if you could point your kdevops at my localio-for-6.11
> branch. You just need to make sure to enable these in your Kconfig:
>
> CONFIG_NFSD_LOCALIO=y
> CONFIG_NFS_LOCALIO=y
> CONFIG_NFS_COMMON_LOCALIO_SUPPORT=y
>
> (either of the NFS or NFSD options will select
> CONFIG_NFS_COMMON_LOCALIO_SUPPORT)
I'm running the first set right now. We don't have a public
dashboard yet, but I can set up a MailNotifier for you.
You don't have any metrics that show whether (and how many)
local read and write operations are happening; so I can
tell if tests pass or fail, but not if local I/O is going
on. The usual approach is to hook that kind of client
metric into /proc/self/mountstats.
--
Chuck Lever
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-28 21:10 ` [PATCH v9 13/19] nfsd: add "localio" support Mike Snitzer
@ 2024-06-29 22:18 ` Chuck Lever
2024-06-30 14:49 ` Chuck Lever
` (2 more replies)
0 siblings, 3 replies; 44+ messages in thread
From: Chuck Lever @ 2024-06-29 22:18 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Sorry, I guess I expected to have more time to learn about these
patches before writing review comments. But if you want them to go
in soon, I had better look more closely at them now.
On Fri, Jun 28, 2024 at 05:10:59PM -0400, Mike Snitzer wrote:
> Pass the stored cl_nfssvc_net from the client to the server as
This is the only mention of cl_nfssvc_net I can find in this
patch. I'm not sure what it is. Patch description should maybe
provide some context.
> first argument to nfsd_open_local_fh() to ensure the proper network
> namespace is used for localio.
Can the patch description say something about the distinct mount
namespaces -- if the local application is running in a different
container than the NFS server, are we using only the network
namespaces for authorizing the file access? And is that OK to do?
If yes, patch description should explain that NFS local I/O ignores
the boundaries of mount namespaces and why that is OK to do.
> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/Makefile | 1 +
> fs/nfsd/filecache.c | 2 +-
> fs/nfsd/localio.c | 239 ++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfssvc.c | 1 +
> fs/nfsd/trace.h | 3 +-
> fs/nfsd/vfs.h | 9 ++
> 6 files changed, 253 insertions(+), 2 deletions(-)
> create mode 100644 fs/nfsd/localio.c
>
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index b8736a82e57c..78b421778a79 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index ad9083ca144b..99631fa56662 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -52,7 +52,7 @@
> #define NFSD_FILE_CACHE_UP (0)
>
> /* We only care about NFSD_MAY_READ/WRITE for this cache */
> -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
>
> static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> new file mode 100644
> index 000000000000..759a5cb79652
> --- /dev/null
> +++ b/fs/nfsd/localio.c
> @@ -0,0 +1,239 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * NFS server support for local clients to bypass network stack
> + *
> + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> + */
> +
> +#include <linux/exportfs.h>
> +#include <linux/sunrpc/svcauth_gss.h>
> +#include <linux/sunrpc/clnt.h>
> +#include <linux/nfs.h>
> +#include <linux/string.h>
> +
> +#include "nfsd.h"
> +#include "vfs.h"
> +#include "netns.h"
> +#include "filecache.h"
> +
> +#define NFSDDBG_FACILITY NFSDDBG_FH
With no more dprintk() call sites in this patch, you no longer need
this macro definition.
> +/*
> + * We need to translate between nfs status return values and
> + * the local errno values which may not be the same.
> + * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
> + * all compiled nfs objects if it were in include/linux/nfs.h
> + */
> +static const struct {
> + int stat;
> + int errno;
> +} nfs_common_errtbl[] = {
> + { NFS_OK, 0 },
> + { NFSERR_PERM, -EPERM },
> + { NFSERR_NOENT, -ENOENT },
> + { NFSERR_IO, -EIO },
> + { NFSERR_NXIO, -ENXIO },
> +/* { NFSERR_EAGAIN, -EAGAIN }, */
> + { NFSERR_ACCES, -EACCES },
> + { NFSERR_EXIST, -EEXIST },
> + { NFSERR_XDEV, -EXDEV },
> + { NFSERR_NODEV, -ENODEV },
> + { NFSERR_NOTDIR, -ENOTDIR },
> + { NFSERR_ISDIR, -EISDIR },
> + { NFSERR_INVAL, -EINVAL },
> + { NFSERR_FBIG, -EFBIG },
> + { NFSERR_NOSPC, -ENOSPC },
> + { NFSERR_ROFS, -EROFS },
> + { NFSERR_MLINK, -EMLINK },
> + { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
> + { NFSERR_NOTEMPTY, -ENOTEMPTY },
> + { NFSERR_DQUOT, -EDQUOT },
> + { NFSERR_STALE, -ESTALE },
> + { NFSERR_REMOTE, -EREMOTE },
> +#ifdef EWFLUSH
> + { NFSERR_WFLUSH, -EWFLUSH },
> +#endif
> + { NFSERR_BADHANDLE, -EBADHANDLE },
> + { NFSERR_NOT_SYNC, -ENOTSYNC },
> + { NFSERR_BAD_COOKIE, -EBADCOOKIE },
> + { NFSERR_NOTSUPP, -ENOTSUPP },
> + { NFSERR_TOOSMALL, -ETOOSMALL },
> + { NFSERR_SERVERFAULT, -EREMOTEIO },
> + { NFSERR_BADTYPE, -EBADTYPE },
> + { NFSERR_JUKEBOX, -EJUKEBOX },
> + { -1, -EIO }
> +};
> +
> +/**
> + * nfs_stat_to_errno - convert an NFS status code to a local errno
> + * @status: NFS status code to convert
> + *
> + * Returns a local errno value, or -EIO if the NFS status code is
> + * not recognized. nfsd_file_acquire() returns an nfsstat that
> + * needs to be translated to an errno before being returned to a
> + * local client application.
> + */
> +static int nfs_stat_to_errno(enum nfs_stat status)
> +{
> + int i;
> +
> + for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
> + if (nfs_common_errtbl[i].stat == (int)status)
> + return nfs_common_errtbl[i].errno;
> + }
> + return nfs_common_errtbl[i].errno;
> +}
> +
> +static void
> +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> +{
> + if (rqstp->rq_client)
> + auth_domain_put(rqstp->rq_client);
> + if (rqstp->rq_cred.cr_group_info)
> + put_group_info(rqstp->rq_cred.cr_group_info);
> + /* rpcauth_map_to_svc_cred_local() clears cr_principal */
> + WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
> + kfree(rqstp->rq_xprt);
> + kfree(rqstp);
> +}
> +
> +static struct svc_rqst *
> +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> + const struct cred *cred)
> +{
> + struct svc_rqst *rqstp;
> + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> + int status;
> +
> + /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
> + if (unlikely(!READ_ONCE(nn->nfsd_serv)))
> + return ERR_PTR(-ENXIO);
> +
> + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> + if (!rqstp)
> + return ERR_PTR(-ENOMEM);
> +
> + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> + if (!rqstp->rq_xprt) {
> + status = -ENOMEM;
> + goto out_err;
> + }
struct svc_rqst is pretty big (like, bigger than a couple of pages).
What happens if this allocation fails?
And how often does it occur -- does that add significant overhead?
> +
> + rqstp->rq_xprt->xpt_net = net;
> + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> + rqstp->rq_proc = 1;
> + rqstp->rq_vers = 3;
IMO these need to be symbolic constants, not integers. Or, at least
there needs to be some documenting comments that explain these are
fake and why that's OK to do. Or, are there better choices?
> + rqstp->rq_prot = IPPROTO_TCP;
> + rqstp->rq_server = nn->nfsd_serv;
> +
> + /* Note: we're connecting to ourself, so source addr == peer addr */
> + rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
> + (struct sockaddr *)&rqstp->rq_addr,
> + sizeof(rqstp->rq_addr));
> +
> + rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
> +
> + /*
> + * set up enough for svcauth_unix_set_client to be able to wait
> + * for the cache downcall. Note that we do _not_ want to allow the
> + * request to be deferred for later revisit since this rqst and xprt
> + * are not set up to run inside of the normal svc_rqst engine.
> + */
> + INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> + kref_init(&rqstp->rq_xprt->xpt_ref);
> + spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> + rqstp->rq_chandle.thread_wait = 5 * HZ;
> +
> + status = svcauth_unix_set_client(rqstp);
> + switch (status) {
> + case SVC_OK:
> + break;
> + case SVC_DENIED:
> + status = -ENXIO;
> + goto out_err;
> + default:
> + status = -ETIMEDOUT;
> + goto out_err;
> + }
Interesting. Why would svcauth_unix_set_client fail for a local I/O
request? Wouldn't it only be because the local application is trying
to open a file it doesn't have permission to?
> + return rqstp;
> +
> +out_err:
> + nfsd_local_fakerqst_destroy(rqstp);
> + return ERR_PTR(status);
> +}
> +
> +/*
> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
> + *
> + * This function maps a local fh to a path on a local filesystem.
> + * This is useful when the nfs client has the local server mounted - it can
> + * avoid all the NFS overhead with reads, writes and commits.
Hm. It just occurred to me that there won't be a two-phase commit
here, and possibly no flush-on-close, either? Can someone help
explain whether/how the writeback semantics are different for NFS
local I/O?
> + *
> + * on successful return, caller is responsible for calling path_put. Also
> + * note that this is called from nfs.ko via find_symbol() to avoid an explicit
> + * dependency on knfsd. So, there is no forward declaration in a header file
> + * for it.
Yet I see a declaration added below in fs/nfsd/vfs.h. Is this
comment out of date? Or perhaps you mean there's no declaration
that is shared with the client code?
> + */
> +int nfsd_open_local_fh(struct net *net,
I've been asking that new NFSD code use genuine full-blooded kdoc
comments for new functions. Since this is a global (EXPORTED)
function, please make this a genuine kdoc comment.
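[For reference, a kdoc header for this function might look like the following; the parameter descriptions are guesses from the signature and surrounding discussion, not authoritative:]

```c
/**
 * nfsd_open_local_fh - open a local file handle, bypassing the network
 * @net: network namespace in which this nfsd instance is running
 * @rpc_clnt: RPC client used to map the client credential
 * @cred: credential of the local application
 * @nfs_fh: file handle of the file to open
 * @fmode: FMODE_READ and/or FMODE_WRITE
 * @pfilp: OUT: opened struct file; caller must fput() it
 *
 * Return: %0 on success, or a negative errno on failure.
 */
```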
> + struct rpc_clnt *rpc_clnt,
> + const struct cred *cred,
> + const struct nfs_fh *nfs_fh,
> + const fmode_t fmode,
> + struct file **pfilp)
> +{
> + const struct cred *save_cred;
> + struct svc_rqst *rqstp;
> + struct svc_fh fh;
> + struct nfsd_file *nf;
> + int status = 0;
> + int mayflags = NFSD_MAY_LOCALIO;
> + __be32 beres;
Nit: I've been asking that new NFSD code use reverse-christmas tree
format for variable declarations.
> +
> + /* Save creds before calling into nfsd */
> + save_cred = get_current_cred();
> +
> + rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
> + if (IS_ERR(rqstp)) {
> + status = PTR_ERR(rqstp);
> + goto out_revertcred;
> + }
It might be nicer if you had a small pool of svc threads pre-created
for this purpose instead of manufacturing one of these and then
tearing it down for every local open call.
Maybe even better if you had an internal transport on which to queue
these open requests... because this is all pretty bespoke.
> +
> + /* nfs_fh -> svc_fh */
> + if (nfs_fh->size > NFS4_FHSIZE) {
> + status = -EINVAL;
> + goto out;
> + }
> + fh_init(&fh, NFS4_FHSIZE);
> + fh.fh_handle.fh_size = nfs_fh->size;
> + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> +
> + if (fmode & FMODE_READ)
> + mayflags |= NFSD_MAY_READ;
> + if (fmode & FMODE_WRITE)
> + mayflags |= NFSD_MAY_WRITE;
> +
> + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> + if (beres) {
> + status = nfs_stat_to_errno(be32_to_cpu(beres));
> + goto out_fh_put;
> + }
So I'm wondering whether just calling fh_verify() and then
nfsd_open_verified() would be simpler and/or good enough here. Is
there a strong reason to use the file cache for locally opened
files? Jeff, any thoughts? Will there be writeback ramifications for
doing this? Maybe we need a comment here explaining how these files
are garbage collected (just an fput by the local I/O client, I would
guess).
> +
> + *pfilp = get_file(nf->nf_file);
> +
> + nfsd_file_put(nf);
> +out_fh_put:
> + fh_put(&fh);
> +
> +out:
> + nfsd_local_fakerqst_destroy(rqstp);
> +out_revertcred:
> + revert_creds(save_cred);
> + return status;
> +}
> +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 1222a0a33fe1..a477d2c5088a 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -431,6 +431,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
> #endif
> #if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> INIT_LIST_HEAD(&nn->nfsd_uuid.list);
> + nn->nfsd_uuid.net = net;
> list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
> #endif
> nn->nfsd_net_up = true;
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 77bbd23aa150..9c0610fdd11c 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
> { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
> { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
> { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
> - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
> + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
> + { NFSD_MAY_LOCALIO, "LOCALIO" })
>
> TRACE_EVENT(nfsd_compound,
> TP_PROTO(
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 57cd70062048..5146f0c81752 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -33,6 +33,8 @@
>
> #define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
>
> +#define NFSD_MAY_LOCALIO 0x2000
> +
> #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
> #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
>
> @@ -158,6 +160,13 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
>
> void nfsd_filp_close(struct file *fp);
>
> +int nfsd_open_local_fh(struct net *net,
> + struct rpc_clnt *rpc_clnt,
> + const struct cred *cred,
> + const struct nfs_fh *nfs_fh,
> + const fmode_t fmode,
> + struct file **pfilp);
> +
> static inline int fh_want_write(struct svc_fh *fh)
> {
> int ret;
> --
> 2.44.0
>
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-29 22:18 ` Chuck Lever
@ 2024-06-30 14:49 ` Chuck Lever
2024-06-30 19:44 ` Mike Snitzer
2024-06-30 19:51 ` Jeff Layton
2024-06-30 22:22 ` NeilBrown
2 siblings, 1 reply; 44+ messages in thread
From: Chuck Lever @ 2024-06-30 14:49 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > +
> > + /* nfs_fh -> svc_fh */
> > + if (nfs_fh->size > NFS4_FHSIZE) {
> > + status = -EINVAL;
> > + goto out;
> > + }
> > + fh_init(&fh, NFS4_FHSIZE);
> > + fh.fh_handle.fh_size = nfs_fh->size;
> > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > +
> > + if (fmode & FMODE_READ)
> > + mayflags |= NFSD_MAY_READ;
> > + if (fmode & FMODE_WRITE)
> > + mayflags |= NFSD_MAY_WRITE;
> > +
> > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > + if (beres) {
> > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > + goto out_fh_put;
> > + }
>
> So I'm wondering whether just calling fh_verify() and then
> nfsd_open_verified() would be simpler and/or good enough here. Is
> there a strong reason to use the file cache for locally opened
> files? Jeff, any thoughts?
> Will there be writeback ramifications for
> doing this? Maybe we need a comment here explaining how these files
> are garbage collected (just an fput by the local I/O client, I would
> guess).
OK, going back to this: since right here we immediately call

	nfsd_file_put(nf);

there are no writeback ramifications nor any need to comment about
garbage collection. But this still seems like a lot of (possibly
unnecessary) overhead for simply obtaining a struct file.
> > +
> > + *pfilp = get_file(nf->nf_file);
> > +
> > + nfsd_file_put(nf);
> > +out_fh_put:
> > + fh_put(&fh);
> > +
> > +out:
> > + nfsd_local_fakerqst_destroy(rqstp);
> > +out_revertcred:
> > + revert_creds(save_cred);
> > + return status;
> > +}
> > +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
--
Chuck Lever
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 14:49 ` Chuck Lever
@ 2024-06-30 19:44 ` Mike Snitzer
2024-06-30 19:52 ` Jeff Layton
0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2024-06-30 19:44 UTC (permalink / raw)
To: Chuck Lever
Cc: linux-nfs, Jeff Layton, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > +
> > > + /* nfs_fh -> svc_fh */
> > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > + status = -EINVAL;
> > > + goto out;
> > > + }
> > > + fh_init(&fh, NFS4_FHSIZE);
> > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > +
> > > + if (fmode & FMODE_READ)
> > > + mayflags |= NFSD_MAY_READ;
> > > + if (fmode & FMODE_WRITE)
> > > + mayflags |= NFSD_MAY_WRITE;
> > > +
> > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > + if (beres) {
> > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > + goto out_fh_put;
> > > + }
> >
> > So I'm wondering whether just calling fh_verify() and then
> > nfsd_open_verified() would be simpler and/or good enough here. Is
> > there a strong reason to use the file cache for locally opened
> > files? Jeff, any thoughts?
>
> > Will there be writeback ramifications for
> > doing this? Maybe we need a comment here explaining how these files
> > are garbage collected (just an fput by the local I/O client, I would
> > guess).
>
> OK, going back to this: Since right here we immediately call
>
> nfsd_file_put(nf);
>
> There are no writeback ramifications nor any need to comment about
> garbage collection. But this still seems like a lot of (possibly
> unnecessary) overhead for simply obtaining a struct file.
Easy enough change, probably best to avoid the filecache but would like
to verify with Jeff before switching:
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 1d6508aa931e..85ebf63789fb 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
 	const struct cred *save_cred;
 	struct svc_rqst *rqstp;
 	struct svc_fh fh;
-	struct nfsd_file *nf;
 	__be32 beres;
 
 	if (nfs_fh->size > NFS4_FHSIZE)
@@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
 	if (fmode & FMODE_WRITE)
 		mayflags |= NFSD_MAY_WRITE;
 
-	beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
+	beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
 	if (beres) {
 		status = nfs_stat_to_errno(be32_to_cpu(beres));
 		goto out_fh_put;
 	}
-	*pfilp = get_file(nf->nf_file);
-	nfsd_file_put(nf);
+	status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
 out_fh_put:
 	fh_put(&fh);
 	nfsd_local_fakerqst_destroy(rqstp);
Mike
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-29 22:18 ` Chuck Lever
2024-06-30 14:49 ` Chuck Lever
@ 2024-06-30 19:51 ` Jeff Layton
2024-06-30 22:22 ` NeilBrown
2 siblings, 0 replies; 44+ messages in thread
From: Jeff Layton @ 2024-06-30 19:51 UTC (permalink / raw)
To: Chuck Lever, Mike Snitzer
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, NeilBrown, snitzer
On Sat, 2024-06-29 at 18:18 -0400, Chuck Lever wrote:
> Sorry, I guess I expected to have more time to learn about these
> patches before writing review comments. But if you want them to go
> in soon, I had better look more closely at them now.
>
>
> On Fri, Jun 28, 2024 at 05:10:59PM -0400, Mike Snitzer wrote:
> > Pass the stored cl_nfssvc_net from the client to the server as
>
> This is the only mention of cl_nfssvc_net I can find in this
> patch. I'm not sure what it is. Patch description should maybe
> provide some context.
>
>
> > first argument to nfsd_open_local_fh() to ensure the proper network
> > namespace is used for localio.
>
> Can the patch description say something about the distinct mount
> namespaces -- if the local application is running in a different
> container than the NFS server, are we using only the network
> namespaces for authorizing the file access? And is that OK to do?
> If yes, patch description should explain that NFS local I/O ignores
> the boundaries of mount namespaces and why that is OK to do.
>
>
> > Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> > Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> > Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfsd/Makefile | 1 +
> > fs/nfsd/filecache.c | 2 +-
> > fs/nfsd/localio.c | 239 ++++++++++++++++++++++++++++++++++++++++++++
> > fs/nfsd/nfssvc.c | 1 +
> > fs/nfsd/trace.h | 3 +-
> > fs/nfsd/vfs.h | 9 ++
> > 6 files changed, 253 insertions(+), 2 deletions(-)
> > create mode 100644 fs/nfsd/localio.c
> >
> > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > index b8736a82e57c..78b421778a79 100644
> > --- a/fs/nfsd/Makefile
> > +++ b/fs/nfsd/Makefile
> > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> > nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> > nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> > nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> > +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index ad9083ca144b..99631fa56662 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -52,7 +52,7 @@
> > #define NFSD_FILE_CACHE_UP (0)
> >
> > /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> > +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> >
> > static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> > static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > new file mode 100644
> > index 000000000000..759a5cb79652
> > --- /dev/null
> > +++ b/fs/nfsd/localio.c
> > @@ -0,0 +1,239 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NFS server support for local clients to bypass network stack
> > + *
> > + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +
> > +#include <linux/exportfs.h>
> > +#include <linux/sunrpc/svcauth_gss.h>
> > +#include <linux/sunrpc/clnt.h>
> > +#include <linux/nfs.h>
> > +#include <linux/string.h>
> > +
> > +#include "nfsd.h"
> > +#include "vfs.h"
> > +#include "netns.h"
> > +#include "filecache.h"
> > +
> > +#define NFSDDBG_FACILITY NFSDDBG_FH
>
> With no more dprintk() call sites in this patch, you no longer need
> this macro definition.
>
>
> > +/*
> > + * We need to translate between nfs status return values and
> > + * the local errno values which may not be the same.
> > + * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
> > + * all compiled nfs objects if it were in include/linux/nfs.h
> > + */
> > +static const struct {
> > + int stat;
> > + int errno;
> > +} nfs_common_errtbl[] = {
> > + { NFS_OK, 0 },
> > + { NFSERR_PERM, -EPERM },
> > + { NFSERR_NOENT, -ENOENT },
> > + { NFSERR_IO, -EIO },
> > + { NFSERR_NXIO, -ENXIO },
> > +/* { NFSERR_EAGAIN, -EAGAIN }, */
> > + { NFSERR_ACCES, -EACCES },
> > + { NFSERR_EXIST, -EEXIST },
> > + { NFSERR_XDEV, -EXDEV },
> > + { NFSERR_NODEV, -ENODEV },
> > + { NFSERR_NOTDIR, -ENOTDIR },
> > + { NFSERR_ISDIR, -EISDIR },
> > + { NFSERR_INVAL, -EINVAL },
> > + { NFSERR_FBIG, -EFBIG },
> > + { NFSERR_NOSPC, -ENOSPC },
> > + { NFSERR_ROFS, -EROFS },
> > + { NFSERR_MLINK, -EMLINK },
> > + { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
> > + { NFSERR_NOTEMPTY, -ENOTEMPTY },
> > + { NFSERR_DQUOT, -EDQUOT },
> > + { NFSERR_STALE, -ESTALE },
> > + { NFSERR_REMOTE, -EREMOTE },
> > +#ifdef EWFLUSH
> > + { NFSERR_WFLUSH, -EWFLUSH },
> > +#endif
> > + { NFSERR_BADHANDLE, -EBADHANDLE },
> > + { NFSERR_NOT_SYNC, -ENOTSYNC },
> > + { NFSERR_BAD_COOKIE, -EBADCOOKIE },
> > + { NFSERR_NOTSUPP, -ENOTSUPP },
> > + { NFSERR_TOOSMALL, -ETOOSMALL },
> > + { NFSERR_SERVERFAULT, -EREMOTEIO },
> > + { NFSERR_BADTYPE, -EBADTYPE },
> > + { NFSERR_JUKEBOX, -EJUKEBOX },
> > + { -1, -EIO }
> > +};
> > +
> > +/**
> > + * nfs_stat_to_errno - convert an NFS status code to a local errno
> > + * @status: NFS status code to convert
> > + *
> > + * Returns a local errno value, or -EIO if the NFS status code is
> > + * not recognized. nfsd_file_acquire() returns an nfsstat that
> > + * needs to be translated to an errno before being returned to a
> > + * local client application.
> > + */
> > +static int nfs_stat_to_errno(enum nfs_stat status)
> > +{
> > + int i;
> > +
> > + for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
> > + if (nfs_common_errtbl[i].stat == (int)status)
> > + return nfs_common_errtbl[i].errno;
> > + }
> > + return nfs_common_errtbl[i].errno;
> > +}
> > +
> > +static void
> > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > +{
> > + if (rqstp->rq_client)
> > + auth_domain_put(rqstp->rq_client);
> > + if (rqstp->rq_cred.cr_group_info)
> > + put_group_info(rqstp->rq_cred.cr_group_info);
> > + /* rpcauth_map_to_svc_cred_local() clears cr_principal */
> > + WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
> > + kfree(rqstp->rq_xprt);
> > + kfree(rqstp);
> > +}
> > +
> > +static struct svc_rqst *
> > +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> > + const struct cred *cred)
> > +{
> > + struct svc_rqst *rqstp;
> > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > + int status;
> > +
> > + /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
> > + if (unlikely(!READ_ONCE(nn->nfsd_serv)))
> > + return ERR_PTR(-ENXIO);
> > +
> > + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > + if (!rqstp)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > + if (!rqstp->rq_xprt) {
> > + status = -ENOMEM;
> > + goto out_err;
> > + }
>
> struct svc_rqst is pretty big (like, bigger than a couple of pages).
> What happens if this allocation fails?
>
> And how often does it occur -- does that add significant overhead?
>
>
> > +
> > + rqstp->rq_xprt->xpt_net = net;
> > + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> > + rqstp->rq_proc = 1;
> > + rqstp->rq_vers = 3;
>
> IMO these need to be symbolic constants, not integers. Or, at least
> there needs to be some documenting comments that explain these are
> fake and why that's OK to do. Or, are there better choices?
>
>
> > + rqstp->rq_prot = IPPROTO_TCP;
> > + rqstp->rq_server = nn->nfsd_serv;
> > +
> > + /* Note: we're connecting to ourself, so source addr == peer addr */
> > + rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
> > + (struct sockaddr *)&rqstp->rq_addr,
> > + sizeof(rqstp->rq_addr));
> > +
> > + rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
> > +
> > + /*
> > + * set up enough for svcauth_unix_set_client to be able to wait
> > + * for the cache downcall. Note that we do _not_ want to allow the
> > + * request to be deferred for later revisit since this rqst and xprt
> > + * are not set up to run inside of the normal svc_rqst engine.
> > + */
> > + INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> > + kref_init(&rqstp->rq_xprt->xpt_ref);
> > + spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> > + rqstp->rq_chandle.thread_wait = 5 * HZ;
> > +
> > + status = svcauth_unix_set_client(rqstp);
> > + switch (status) {
> > + case SVC_OK:
> > + break;
> > + case SVC_DENIED:
> > + status = -ENXIO;
> > + goto out_err;
> > + default:
> > + status = -ETIMEDOUT;
> > + goto out_err;
> > + }
>
> Interesting. Why would svcauth_unix_set_client fail for a local I/O
> request? Wouldn't it only be because the local application is trying
> to open a file it doesn't have permission to?
>
I'd think so, yes. This case is not exactly like doing I/O on a local
fs since we don't have a persistent open file. nfsd has to do the
permission check on every READ/WRITE op, and I think we have to follow
suit here, particularly in the case of flexfiles DS traffic, since
fencing requires it.
>
> > + return rqstp;
> > +
> > +out_err:
> > + nfsd_local_fakerqst_destroy(rqstp);
> > + return ERR_PTR(status);
> > +}
> > +
> > +/*
> > + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
> > + *
> > + * This function maps a local fh to a path on a local filesystem.
> > + * This is useful when the nfs client has the local server mounted - it can
> > + * avoid all the NFS overhead with reads, writes and commits.
>
> Hm. It just occurred to me that there won't be a two-phase commit
> here, and possibly no flush-on-close, either? Can someone help
> explain whether/how the writeback semantics are different for NFS
> local I/O?
>
A localio request is just a replacement for a READ or WRITE RPC. We
don't have flush on close or a COMMIT in the WRITE RPC case either. The
client has to follow up with a COMMIT RPC (or the localio equivalent).
>
> > + *
> > + * on successful return, caller is responsible for calling path_put. Also
> > + * note that this is called from nfs.ko via find_symbol() to avoid an explicit
> > + * dependency on knfsd. So, there is no forward declaration in a header file
> > + * for it.
>
> Yet I see a declaration added below in fs/nfsd/vfs.h. Is this
> comment out of date? Or perhaps you mean there's no declaration
> that is shared with the client code?
>
>
> > + */
> > +int nfsd_open_local_fh(struct net *net,
>
> I've been asking that new NFSD code use genuine full-blooded kdoc
> comments for new functions. Since this is a global (EXPORTED)
> function, please make this a genuine kdoc comment.
>
>
> > + struct rpc_clnt *rpc_clnt,
> > + const struct cred *cred,
> > + const struct nfs_fh *nfs_fh,
> > + const fmode_t fmode,
> > + struct file **pfilp)
> > +{
> > + const struct cred *save_cred;
> > + struct svc_rqst *rqstp;
> > + struct svc_fh fh;
> > + struct nfsd_file *nf;
> > + int status = 0;
> > + int mayflags = NFSD_MAY_LOCALIO;
> > + __be32 beres;
>
> Nit: I've been asking that new NFSD code use reverse-christmas tree
> format for variable declarations.
>
>
> > +
> > + /* Save creds before calling into nfsd */
> > + save_cred = get_current_cred();
> > +
> > + rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
> > + if (IS_ERR(rqstp)) {
> > + status = PTR_ERR(rqstp);
> > + goto out_revertcred;
> > + }
>
> It might be nicer if you had a small pool of svc threads pre-created
> for this purpose instead of manufacturing one of these and then
> tearing it down for every local open call.
>
> Maybe even better if you had an internal transport on which to queue
> these open requests... because this is all pretty bespoke.
>
>
> > +
> > + /* nfs_fh -> svc_fh */
> > + if (nfs_fh->size > NFS4_FHSIZE) {
> > + status = -EINVAL;
> > + goto out;
> > + }
> > + fh_init(&fh, NFS4_FHSIZE);
> > + fh.fh_handle.fh_size = nfs_fh->size;
> > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > +
> > + if (fmode & FMODE_READ)
> > + mayflags |= NFSD_MAY_READ;
> > + if (fmode & FMODE_WRITE)
> > + mayflags |= NFSD_MAY_WRITE;
> > +
> > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > + if (beres) {
> > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > + goto out_fh_put;
> > + }
>
> So I'm wondering whether just calling fh_verify() and then
> nfsd_open_verified() would be simpler and/or good enough here. Is
> there a strong reason to use the file cache for locally opened
> files? Jeff, any thoughts? Will there be writeback ramifications for
> doing this? Maybe we need a comment here explaining how these files
> are garbage collected (just an fput by the local I/O client, I would
> guess).
>
Yes, I think you do want to use the filecache here. The whole point of
the filecache is to optimize away the open and close in v3 calls.
Optimizing those away in the context of a localio op seems even more
valuable, since you're not dealing with network latency.
>
> > +
> > + *pfilp = get_file(nf->nf_file);
> > +
> > + nfsd_file_put(nf);
> > +out_fh_put:
> > + fh_put(&fh);
> > +
> > +out:
> > + nfsd_local_fakerqst_destroy(rqstp);
> > +out_revertcred:
> > + revert_creds(save_cred);
> > + return status;
> > +}
> > +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> > +
> > +/* Compile time type checking, not used by anything */
> > +static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 1222a0a33fe1..a477d2c5088a 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -431,6 +431,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
> > #endif
> > #if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> > INIT_LIST_HEAD(&nn->nfsd_uuid.list);
> > + nn->nfsd_uuid.net = net;
> > list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
> > #endif
> > nn->nfsd_net_up = true;
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index 77bbd23aa150..9c0610fdd11c 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
> > { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
> > { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
> > { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
> > - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
> > + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
> > + { NFSD_MAY_LOCALIO, "LOCALIO" })
> >
> > TRACE_EVENT(nfsd_compound,
> > TP_PROTO(
> > diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> > index 57cd70062048..5146f0c81752 100644
> > --- a/fs/nfsd/vfs.h
> > +++ b/fs/nfsd/vfs.h
> > @@ -33,6 +33,8 @@
> >
> > #define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
> >
> > +#define NFSD_MAY_LOCALIO 0x2000
> > +
> > #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
> > #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
> >
> > @@ -158,6 +160,13 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
> >
> > void nfsd_filp_close(struct file *fp);
> >
> > +int nfsd_open_local_fh(struct net *net,
> > + struct rpc_clnt *rpc_clnt,
> > + const struct cred *cred,
> > + const struct nfs_fh *nfs_fh,
> > + const fmode_t fmode,
> > + struct file **pfilp);
> > +
> > static inline int fh_want_write(struct svc_fh *fh)
> > {
> > int ret;
> > --
> > 2.44.0
> >
> >
>
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 19:44 ` Mike Snitzer
@ 2024-06-30 19:52 ` Jeff Layton
2024-06-30 19:55 ` Chuck Lever
0 siblings, 1 reply; 44+ messages in thread
From: Jeff Layton @ 2024-06-30 19:52 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever
Cc: linux-nfs, Anna Schumaker, Trond Myklebust, NeilBrown, snitzer
On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > +
> > > > + /* nfs_fh -> svc_fh */
> > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > + status = -EINVAL;
> > > > + goto out;
> > > > + }
> > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > +
> > > > + if (fmode & FMODE_READ)
> > > > + mayflags |= NFSD_MAY_READ;
> > > > + if (fmode & FMODE_WRITE)
> > > > + mayflags |= NFSD_MAY_WRITE;
> > > > +
> > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > + if (beres) {
> > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > + goto out_fh_put;
> > > > + }
> > >
> > > So I'm wondering whether just calling fh_verify() and then
> > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > there a strong reason to use the file cache for locally opened
> > > files? Jeff, any thoughts?
> >
> > > Will there be writeback ramifications for
> > > doing this? Maybe we need a comment here explaining how these files
> > > are garbage collected (just an fput by the local I/O client, I would
> > > guess).
> >
> > OK, going back to this: Since right here we immediately call
> >
> > nfsd_file_put(nf);
> >
> > There are no writeback ramifications nor any need to comment about
> > garbage collection. But this still seems like a lot of (possibly
> > unnecessary) overhead for simply obtaining a struct file.
>
> Easy enough change, probably best to avoid the filecache but would like
> to verify with Jeff before switching:
>
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> index 1d6508aa931e..85ebf63789fb 100644
> --- a/fs/nfsd/localio.c
> +++ b/fs/nfsd/localio.c
> @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> const struct cred *save_cred;
> struct svc_rqst *rqstp;
> struct svc_fh fh;
> - struct nfsd_file *nf;
> __be32 beres;
>
> if (nfs_fh->size > NFS4_FHSIZE)
> @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> if (fmode & FMODE_WRITE)
> mayflags |= NFSD_MAY_WRITE;
>
> - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> if (beres) {
> status = nfs_stat_to_errno(be32_to_cpu(beres));
> goto out_fh_put;
> }
> - *pfilp = get_file(nf->nf_file);
> - nfsd_file_put(nf);
> + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> out_fh_put:
> fh_put(&fh);
> nfsd_local_fakerqst_destroy(rqstp);
>
My suggestion would be to _not_ do this. I think you do want to use the
filecache (mostly for performance reasons).
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 19:52 ` Jeff Layton
@ 2024-06-30 19:55 ` Chuck Lever
2024-06-30 19:59 ` Jeff Layton
0 siblings, 1 reply; 44+ messages in thread
From: Chuck Lever @ 2024-06-30 19:55 UTC (permalink / raw)
To: Jeff Layton
Cc: Mike Snitzer, linux-nfs, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > +
> > > > > + /* nfs_fh -> svc_fh */
> > > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > + status = -EINVAL;
> > > > > + goto out;
> > > > > + }
> > > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > +
> > > > > + if (fmode & FMODE_READ)
> > > > > + mayflags |= NFSD_MAY_READ;
> > > > > + if (fmode & FMODE_WRITE)
> > > > > + mayflags |= NFSD_MAY_WRITE;
> > > > > +
> > > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > + if (beres) {
> > > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > + goto out_fh_put;
> > > > > + }
> > > >
> > > > So I'm wondering whether just calling fh_verify() and then
> > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > there a strong reason to use the file cache for locally opened
> > > > files? Jeff, any thoughts?
> > >
> > > > Will there be writeback ramifications for
> > > > doing this? Maybe we need a comment here explaining how these files
> > > > are garbage collected (just an fput by the local I/O client, I would
> > > > guess).
> > >
> > > OK, going back to this: Since right here we immediately call
> > >
> > > nfsd_file_put(nf);
> > >
> > > There are no writeback ramifications nor any need to comment about
> > > garbage collection. But this still seems like a lot of (possibly
> > > unnecessary) overhead for simply obtaining a struct file.
> >
> > Easy enough change, probably best to avoid the filecache but would like
> > to verify with Jeff before switching:
> >
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > index 1d6508aa931e..85ebf63789fb 100644
> > --- a/fs/nfsd/localio.c
> > +++ b/fs/nfsd/localio.c
> > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > const struct cred *save_cred;
> > struct svc_rqst *rqstp;
> > struct svc_fh fh;
> > - struct nfsd_file *nf;
> > __be32 beres;
> >
> > if (nfs_fh->size > NFS4_FHSIZE)
> > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > if (fmode & FMODE_WRITE)
> > mayflags |= NFSD_MAY_WRITE;
> >
> > - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > if (beres) {
> > status = nfs_stat_to_errno(be32_to_cpu(beres));
> > goto out_fh_put;
> > }
> > - *pfilp = get_file(nf->nf_file);
> > - nfsd_file_put(nf);
> > + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > out_fh_put:
> > fh_put(&fh);
> > nfsd_local_fakerqst_destroy(rqstp);
> >
>
> My suggestion would be to _not_ do this. I think you do want to use the
> filecache (mostly for performance reasons).
But look carefully:
-- we're not calling nfsd_file_acquire_gc() here
-- we're immediately calling nfsd_file_put() on the returned nf
There's nothing left in the file cache when nfsd_open_local_fh()
returns. Each call here will do a full file open and a full close.
--
Chuck Lever
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 19:55 ` Chuck Lever
@ 2024-06-30 19:59 ` Jeff Layton
2024-06-30 20:15 ` Chuck Lever
2024-06-30 21:54 ` NeilBrown
0 siblings, 2 replies; 44+ messages in thread
From: Jeff Layton @ 2024-06-30 19:59 UTC (permalink / raw)
To: Chuck Lever
Cc: Mike Snitzer, linux-nfs, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > +
> > > > > > + /* nfs_fh -> svc_fh */
> > > > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > + status = -EINVAL;
> > > > > > + goto out;
> > > > > > + }
> > > > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > +
> > > > > > + if (fmode & FMODE_READ)
> > > > > > + mayflags |= NFSD_MAY_READ;
> > > > > > + if (fmode & FMODE_WRITE)
> > > > > > + mayflags |= NFSD_MAY_WRITE;
> > > > > > +
> > > > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > + if (beres) {
> > > > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > + goto out_fh_put;
> > > > > > + }
> > > > >
> > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > > there a strong reason to use the file cache for locally opened
> > > > > files? Jeff, any thoughts?
> > > >
> > > > > Will there be writeback ramifications for
> > > > > doing this? Maybe we need a comment here explaining how these files
> > > > > are garbage collected (just an fput by the local I/O client, I would
> > > > > guess).
> > > >
> > > > OK, going back to this: Since right here we immediately call
> > > >
> > > > nfsd_file_put(nf);
> > > >
> > > > There are no writeback ramifications nor any need to comment about
> > > > garbage collection. But this still seems like a lot of (possibly
> > > > unnecessary) overhead for simply obtaining a struct file.
> > >
> > > Easy enough change, probably best to avoid the filecache but would like
> > > to verify with Jeff before switching:
> > >
> > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > index 1d6508aa931e..85ebf63789fb 100644
> > > --- a/fs/nfsd/localio.c
> > > +++ b/fs/nfsd/localio.c
> > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > const struct cred *save_cred;
> > > struct svc_rqst *rqstp;
> > > struct svc_fh fh;
> > > - struct nfsd_file *nf;
> > > __be32 beres;
> > >
> > > if (nfs_fh->size > NFS4_FHSIZE)
> > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > if (fmode & FMODE_WRITE)
> > > mayflags |= NFSD_MAY_WRITE;
> > >
> > > - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > if (beres) {
> > > status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > goto out_fh_put;
> > > }
> > > - *pfilp = get_file(nf->nf_file);
> > > - nfsd_file_put(nf);
> > > + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > out_fh_put:
> > > fh_put(&fh);
> > > nfsd_local_fakerqst_destroy(rqstp);
> > >
> >
> > My suggestion would be to _not_ do this. I think you do want to use the
> > filecache (mostly for performance reasons).
>
> But look carefully:
>
> -- we're not calling nfsd_file_acquire_gc() here
>
> -- we're immediately calling nfsd_file_put() on the returned nf
>
> There's nothing left in the file cache when nfsd_open_local_fh()
> returns. Each call here will do a full file open and a full close.
>
>
Good point. This should be calling nfsd_file_acquire_gc(), IMO.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 19:59 ` Jeff Layton
@ 2024-06-30 20:15 ` Chuck Lever
2024-06-30 21:07 ` Jeff Layton
2024-06-30 21:54 ` NeilBrown
1 sibling, 1 reply; 44+ messages in thread
From: Chuck Lever @ 2024-06-30 20:15 UTC (permalink / raw)
To: Jeff Layton
Cc: Mike Snitzer, linux-nfs, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Sun, Jun 30, 2024 at 03:59:58PM -0400, Jeff Layton wrote:
> On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> > On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > > +
> > > > > > > + /* nfs_fh -> svc_fh */
> > > > > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > > + status = -EINVAL;
> > > > > > > + goto out;
> > > > > > > + }
> > > > > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > > +
> > > > > > > + if (fmode & FMODE_READ)
> > > > > > > + mayflags |= NFSD_MAY_READ;
> > > > > > > + if (fmode & FMODE_WRITE)
> > > > > > > + mayflags |= NFSD_MAY_WRITE;
> > > > > > > +
> > > > > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > > + if (beres) {
> > > > > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > > + goto out_fh_put;
> > > > > > > + }
> > > > > >
> > > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > > > there a strong reason to use the file cache for locally opened
> > > > > > files? Jeff, any thoughts?
> > > > >
> > > > > > Will there be writeback ramifications for
> > > > > > doing this? Maybe we need a comment here explaining how these files
> > > > > > are garbage collected (just an fput by the local I/O client, I would
> > > > > > guess).
> > > > >
> > > > > OK, going back to this: Since right here we immediately call
> > > > >
> > > > > nfsd_file_put(nf);
> > > > >
> > > > > There are no writeback ramifications nor any need to comment about
> > > > > garbage collection. But this still seems like a lot of (possibly
> > > > > unnecessary) overhead for simply obtaining a struct file.
> > > >
> > > > Easy enough change, probably best to avoid the filecache but would like
> > > > to verify with Jeff before switching:
> > > >
> > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > index 1d6508aa931e..85ebf63789fb 100644
> > > > --- a/fs/nfsd/localio.c
> > > > +++ b/fs/nfsd/localio.c
> > > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > const struct cred *save_cred;
> > > > struct svc_rqst *rqstp;
> > > > struct svc_fh fh;
> > > > - struct nfsd_file *nf;
> > > > __be32 beres;
> > > >
> > > > if (nfs_fh->size > NFS4_FHSIZE)
> > > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > if (fmode & FMODE_WRITE)
> > > > mayflags |= NFSD_MAY_WRITE;
> > > >
> > > > - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > > if (beres) {
> > > > status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > goto out_fh_put;
> > > > }
> > > > - *pfilp = get_file(nf->nf_file);
> > > > - nfsd_file_put(nf);
> > > > + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > > out_fh_put:
> > > > fh_put(&fh);
> > > > nfsd_local_fakerqst_destroy(rqstp);
> > > >
> > >
> > > My suggestion would be to _not_ do this. I think you do want to use the
> > > filecache (mostly for performance reasons).
> >
> > But look carefully:
> >
> > -- we're not calling nfsd_file_acquire_gc() here
> >
> > -- we're immediately calling nfsd_file_put() on the returned nf
> >
> > There's nothing left in the file cache when nfsd_open_local_fh()
> > returns. Each call here will do a full file open and a full close.
>
> Good point. This should be calling nfsd_file_acquire_gc(), IMO.
So that goes to my point yesterday about writeback ramifications.
If these open files linger in the file cache, then when will they
get written back to storage and by whom? Is it going to be an nfsd
thread writing them back as part of garbage collection?
So, you're saying that the local I/O client will always behave like
NFSv3 in this regard, and open/read/close, open/write/close instead
of hanging on to the open file? That seems... suboptimal... and not
expected for a local file. That needs to be documented in the
LOCALIO design doc.
I'm also concerned about local applications closing a file but
having an open file handle linger in the file cache -- that can
prevent other accesses to the file until the GC ejects that open
file, as we've seen in the field.
IMHO nfsd_file_acquire_gc() is going to have some unwanted side
effects.
--
Chuck Lever
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 20:15 ` Chuck Lever
@ 2024-06-30 21:07 ` Jeff Layton
2024-06-30 21:56 ` NeilBrown
0 siblings, 1 reply; 44+ messages in thread
From: Jeff Layton @ 2024-06-30 21:07 UTC (permalink / raw)
To: Chuck Lever
Cc: Mike Snitzer, linux-nfs, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
On Sun, 2024-06-30 at 16:15 -0400, Chuck Lever wrote:
> On Sun, Jun 30, 2024 at 03:59:58PM -0400, Jeff Layton wrote:
> > On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> > > On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > > > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > > > +
> > > > > > > > + /* nfs_fh -> svc_fh */
> > > > > > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > > > + status = -EINVAL;
> > > > > > > > + goto out;
> > > > > > > > + }
> > > > > > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > > > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > > > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > > > +
> > > > > > > > + if (fmode & FMODE_READ)
> > > > > > > > + mayflags |= NFSD_MAY_READ;
> > > > > > > > + if (fmode & FMODE_WRITE)
> > > > > > > > + mayflags |= NFSD_MAY_WRITE;
> > > > > > > > +
> > > > > > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > > > + if (beres) {
> > > > > > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > > > + goto out_fh_put;
> > > > > > > > + }
> > > > > > >
> > > > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > > > > there a strong reason to use the file cache for locally opened
> > > > > > > files? Jeff, any thoughts?
> > > > > >
> > > > > > > Will there be writeback ramifications for
> > > > > > > doing this? Maybe we need a comment here explaining how these files
> > > > > > > are garbage collected (just an fput by the local I/O client, I would
> > > > > > > guess).
> > > > > >
> > > > > > OK, going back to this: Since right here we immediately call
> > > > > >
> > > > > > nfsd_file_put(nf);
> > > > > >
> > > > > > There are no writeback ramifications nor any need to comment about
> > > > > > garbage collection. But this still seems like a lot of (possibly
> > > > > > unnecessary) overhead for simply obtaining a struct file.
> > > > >
> > > > > Easy enough change, probably best to avoid the filecache but would like
> > > > > to verify with Jeff before switching:
> > > > >
> > > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > > index 1d6508aa931e..85ebf63789fb 100644
> > > > > --- a/fs/nfsd/localio.c
> > > > > +++ b/fs/nfsd/localio.c
> > > > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > const struct cred *save_cred;
> > > > > struct svc_rqst *rqstp;
> > > > > struct svc_fh fh;
> > > > > - struct nfsd_file *nf;
> > > > > __be32 beres;
> > > > >
> > > > > if (nfs_fh->size > NFS4_FHSIZE)
> > > > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > if (fmode & FMODE_WRITE)
> > > > > mayflags |= NFSD_MAY_WRITE;
> > > > >
> > > > > - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > > > if (beres) {
> > > > > status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > goto out_fh_put;
> > > > > }
> > > > > - *pfilp = get_file(nf->nf_file);
> > > > > - nfsd_file_put(nf);
> > > > > + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > > > out_fh_put:
> > > > > fh_put(&fh);
> > > > > nfsd_local_fakerqst_destroy(rqstp);
> > > > >
> > > >
> > > > My suggestion would be to _not_ do this. I think you do want to use the
> > > > filecache (mostly for performance reasons).
> > >
> > > But look carefully:
> > >
> > > -- we're not calling nfsd_file_acquire_gc() here
> > >
> > > -- we're immediately calling nfsd_file_put() on the returned nf
> > >
> > > There's nothing left in the file cache when nfsd_open_local_fh()
> > > returns. Each call here will do a full file open and a full close.
> >
> > Good point. This should be calling nfsd_file_acquire_gc(), IMO.
>
> So that goes to my point yesterday about writeback ramifications.
>
> If these open files linger in the file cache, then when will they
> get written back to storage and by whom? Is it going to be an nfsd
> thread writing them back as part of garbage collection?
>
Usually the client is issuing regular COMMITs. If that doesn't happen,
then the flusher threads should get the rest.
Side note: I don't guess COMMIT goes over the localio path yet, does
it? Maybe it should. It would be nice to not tie up an nfsd thread with
writeback.
> So, you're saying that the local I/O client will always behave like
> NFSv3 in this regard, and open/read/close, open/write/close instead
> of hanging on to the open file? That seems... suboptimal... and not
> expected for a local file. That needs to be documented in the
> LOCALIO design doc.
>
I imagine so, which is why I suggest using the filecache. If we get one
READ or WRITE for the file via localio, we're pretty likely to get
more. Why not amortize that file open over several operations?
> I'm also concerned about local applications closing a file but
> having an open file handle linger in the file cache -- that can
> prevent other accesses to the file until the GC ejects that open
> file, as we've seen in the field.
>
> IMHO nfsd_file_acquire_gc() is going to have some unwanted side
> effects.
>
Typically, the client issues COMMIT calls when the client-side fd is
closed (for CTO). While I think we do need to be able to deal with
flushing files with dirty data that are left "hanging", that shouldn't
be the common case. Most of the time, the client is going to be issuing
regular COMMITs so that it can clean its pages.
IOW, I don't see how localio is any different than the case of normal
v3 IO in this respect.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 19:59 ` Jeff Layton
2024-06-30 20:15 ` Chuck Lever
@ 2024-06-30 21:54 ` NeilBrown
2024-07-01 1:29 ` NeilBrown
1 sibling, 1 reply; 44+ messages in thread
From: NeilBrown @ 2024-06-30 21:54 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Mike Snitzer, linux-nfs, Anna Schumaker,
Trond Myklebust, snitzer
On Mon, 01 Jul 2024, Jeff Layton wrote:
> On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> > On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > > +
> > > > > > > + /* nfs_fh -> svc_fh */
> > > > > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > > + status = -EINVAL;
> > > > > > > + goto out;
> > > > > > > + }
> > > > > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > > +
> > > > > > > + if (fmode & FMODE_READ)
> > > > > > > + mayflags |= NFSD_MAY_READ;
> > > > > > > + if (fmode & FMODE_WRITE)
> > > > > > > + mayflags |= NFSD_MAY_WRITE;
> > > > > > > +
> > > > > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > > + if (beres) {
> > > > > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > > + goto out_fh_put;
> > > > > > > + }
> > > > > >
> > > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > > > there a strong reason to use the file cache for locally opened
> > > > > > files? Jeff, any thoughts?
> > > > >
> > > > > > Will there be writeback ramifications for
> > > > > > doing this? Maybe we need a comment here explaining how these files
> > > > > > are garbage collected (just an fput by the local I/O client, I would
> > > > > > guess).
> > > > >
> > > > > OK, going back to this: Since right here we immediately call
> > > > >
> > > > > nfsd_file_put(nf);
> > > > >
> > > > > There are no writeback ramifications nor any need to comment about
> > > > > garbage collection. But this still seems like a lot of (possibly
> > > > > unnecessary) overhead for simply obtaining a struct file.
> > > >
> > > > Easy enough change, probably best to avoid the filecache but would like
> > > > to verify with Jeff before switching:
> > > >
> > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > index 1d6508aa931e..85ebf63789fb 100644
> > > > --- a/fs/nfsd/localio.c
> > > > +++ b/fs/nfsd/localio.c
> > > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > const struct cred *save_cred;
> > > > struct svc_rqst *rqstp;
> > > > struct svc_fh fh;
> > > > - struct nfsd_file *nf;
> > > > __be32 beres;
> > > >
> > > > if (nfs_fh->size > NFS4_FHSIZE)
> > > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > if (fmode & FMODE_WRITE)
> > > > mayflags |= NFSD_MAY_WRITE;
> > > >
> > > > - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > > if (beres) {
> > > > status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > goto out_fh_put;
> > > > }
> > > > - *pfilp = get_file(nf->nf_file);
> > > > - nfsd_file_put(nf);
> > > > + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > > out_fh_put:
> > > > fh_put(&fh);
> > > > nfsd_local_fakerqst_destroy(rqstp);
> > > >
> > >
> > > My suggestion would be to _not_ do this. I think you do want to use the
> > > filecache (mostly for performance reasons).
> >
> > But look carefully:
> >
> > -- we're not calling nfsd_file_acquire_gc() here
> >
> > -- we're immediately calling nfsd_file_put() on the returned nf
> >
> > There's nothing left in the file cache when nfsd_open_local_fh()
> > returns. Each call here will do a full file open and a full close.
> >
> >
>
> Good point. This should be calling nfsd_file_acquire_gc(), IMO.
Or the client could do a v4 style acquire, and not call nfsd_file_put()
until it was done with the file. I don't see a specific problem with
_gc, but avoiding the heuristic it implies seems best where possible.
NeilBrown
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 21:07 ` Jeff Layton
@ 2024-06-30 21:56 ` NeilBrown
0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2024-06-30 21:56 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Mike Snitzer, linux-nfs, Anna Schumaker,
Trond Myklebust, snitzer
On Mon, 01 Jul 2024, Jeff Layton wrote:
> On Sun, 2024-06-30 at 16:15 -0400, Chuck Lever wrote:
> > On Sun, Jun 30, 2024 at 03:59:58PM -0400, Jeff Layton wrote:
> > > On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> > > > On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > > > > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > > > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > > > > +
> > > > > > > > > + /* nfs_fh -> svc_fh */
> > > > > > > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > > > > + status = -EINVAL;
> > > > > > > > > + goto out;
> > > > > > > > > + }
> > > > > > > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > > > > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > > > > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > > > > +
> > > > > > > > > + if (fmode & FMODE_READ)
> > > > > > > > > + mayflags |= NFSD_MAY_READ;
> > > > > > > > > + if (fmode & FMODE_WRITE)
> > > > > > > > > + mayflags |= NFSD_MAY_WRITE;
> > > > > > > > > +
> > > > > > > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > > > > + if (beres) {
> > > > > > > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > > > > + goto out_fh_put;
> > > > > > > > > + }
> > > > > > > >
> > > > > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > > > > > there a strong reason to use the file cache for locally opened
> > > > > > > > files? Jeff, any thoughts?
> > > > > > >
> > > > > > > > Will there be writeback ramifications for
> > > > > > > > doing this? Maybe we need a comment here explaining how these files
> > > > > > > > are garbage collected (just an fput by the local I/O client, I would
> > > > > > > > guess).
> > > > > > >
> > > > > > > OK, going back to this: Since right here we immediately call
> > > > > > >
> > > > > > > nfsd_file_put(nf);
> > > > > > >
> > > > > > > There are no writeback ramifications nor any need to comment about
> > > > > > > garbage collection. But this still seems like a lot of (possibly
> > > > > > > unnecessary) overhead for simply obtaining a struct file.
> > > > > >
> > > > > > Easy enough change, probably best to avoid the filecache but would like
> > > > > > to verify with Jeff before switching:
> > > > > >
> > > > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > > > index 1d6508aa931e..85ebf63789fb 100644
> > > > > > --- a/fs/nfsd/localio.c
> > > > > > +++ b/fs/nfsd/localio.c
> > > > > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > > const struct cred *save_cred;
> > > > > > struct svc_rqst *rqstp;
> > > > > > struct svc_fh fh;
> > > > > > - struct nfsd_file *nf;
> > > > > > __be32 beres;
> > > > > >
> > > > > > if (nfs_fh->size > NFS4_FHSIZE)
> > > > > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > > if (fmode & FMODE_WRITE)
> > > > > > mayflags |= NFSD_MAY_WRITE;
> > > > > >
> > > > > > - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > > > > if (beres) {
> > > > > > status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > goto out_fh_put;
> > > > > > }
> > > > > > - *pfilp = get_file(nf->nf_file);
> > > > > > - nfsd_file_put(nf);
> > > > > > + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > > > > out_fh_put:
> > > > > > fh_put(&fh);
> > > > > > nfsd_local_fakerqst_destroy(rqstp);
> > > > > >
> > > > >
> > > > > My suggestion would be to _not_ do this. I think you do want to use the
> > > > > filecache (mostly for performance reasons).
> > > >
> > > > But look carefully:
> > > >
> > > > -- we're not calling nfsd_file_acquire_gc() here
> > > >
> > > > -- we're immediately calling nfsd_file_put() on the returned nf
> > > >
> > > > There's nothing left in the file cache when nfsd_open_local_fh()
> > > > returns. Each call here will do a full file open and a full close.
> > >
> > > Good point. This should be calling nfsd_file_acquire_gc(), IMO.
> >
> > So that goes to my point yesterday about writeback ramifications.
> >
> > If these open files linger in the file cache, then when will they
> > get written back to storage and by whom? Is it going to be an nfsd
> > thread writing them back as part of garbage collection?
> >
>
> Usually the client is issuing regular COMMITs. If that doesn't happen,
> then the flusher threads should get the rest.
>
> Side note: I don't guess COMMIT goes over the localio path yet, does
> it? Maybe it should. It would be nice to not tie up an nfsd thread with
> writeback.
The documentation certainly claims that COMMIT uses the localio path. I
haven't double checked the code but I'd be very surprised if it didn't.
NeilBrown
>
> > So, you're saying that the local I/O client will always behave like
> > NFSv3 in this regard, and open/read/close, open/write/close instead
> > of hanging on to the open file? That seems... suboptimal... and not
> > expected for a local file. That needs to be documented in the
> > LOCALIO design doc.
> >
>
> I imagine so, which is why I suggest using the filecache. If we get one
> READ or WRITE for the file via localio, we're pretty likely to get
> more. Why not amortize that file open over several operations?
>
> > I'm also concerned about local applications closing a file but
> > having an open file handle linger in the file cache -- that can
> > prevent other accesses to the file until the GC ejects that open
> > file, as we've seen in the field.
> >
> > IMHO nfsd_file_acquire_gc() is going to have some unwanted side
> > effects.
> >
>
> Typically, the client issues COMMIT calls when the client-side fd is
> closed (for CTO). While I think we do need to be able to deal with
> flushing files with dirty data that are left "hanging", that shouldn't
> be the common case. Most of the time, the client is going to be issuing
> regular COMMITs so that it can clean its pages.
>
> IOW, I don't see how localio is any different than the case of normal
> v3 IO in this respect.
> --
> Jeff Layton <jlayton@kernel.org>
>
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v9 18/19] SUNRPC: replace program list with program array
2024-06-29 16:00 ` Chuck Lever
@ 2024-06-30 21:57 ` NeilBrown
0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2024-06-30 21:57 UTC (permalink / raw)
To: Chuck Lever
Cc: Mike Snitzer, linux-nfs, Jeff Layton, Anna Schumaker,
Trond Myklebust, snitzer
On Sun, 30 Jun 2024, Chuck Lever wrote:
> On Fri, Jun 28, 2024 at 05:11:04PM -0400, Mike Snitzer wrote:
> > From: NeilBrown <neil@brown.name>
> >
> > A service created with svc_create_pooled() can be given a linked list of
> > programs and all of these will be served.
> >
> > Using a linked list makes it cumbersome when there are several programs
> > that can be optionally selected with CONFIG settings.
> >
> > So change to use an array with explicit size. svc_create() is always
> > passed a single program. svc_create_pooled() now must be used for
> > multiple programs.
>
> Instead of this last sentence, it might be more clear to say:
>
> > After this patch is applied, API consumers must use only
> > svc_create_pooled() when creating an RPC service that listens for
> > more than one RPC program.
Thanks - that's a much clearer way to say it.
NeilBrown
>
> I like the idea of replacing these static linked lists.
>
>
> > Signed-off-by: NeilBrown <neil@brown.name>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfsd/nfsctl.c | 2 +-
> > fs/nfsd/nfsd.h | 2 +-
> > fs/nfsd/nfssvc.c | 69 ++++++++++++++++++--------------------
> > include/linux/sunrpc/svc.h | 7 ++--
> > net/sunrpc/svc.c | 68 +++++++++++++++++++++----------------
> > net/sunrpc/svc_xprt.c | 2 +-
> > net/sunrpc/svcauth_unix.c | 3 +-
> > 7 files changed, 80 insertions(+), 73 deletions(-)
> >
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index e5d2cc74ef77..6fb92bb61c6d 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -2265,7 +2265,7 @@ static __net_init int nfsd_net_init(struct net *net)
> > if (retval)
> > goto out_repcache_error;
> > memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
> > - nn->nfsd_svcstats.program = &nfsd_program;
> > + nn->nfsd_svcstats.program = &nfsd_programs[0];
> > nn->nfsd_versions = NULL;
> > nn->nfsd4_minorversions = NULL;
> > nfsd4_init_leases_net(nn);
> > diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
> > index cec8697b1cd6..c3f7c5957950 100644
> > --- a/fs/nfsd/nfsd.h
> > +++ b/fs/nfsd/nfsd.h
> > @@ -80,7 +80,7 @@ struct nfsd_genl_rqstp {
> > u32 rq_opnum[NFSD_MAX_OPS_PER_COMPOUND];
> > };
> >
> > -extern struct svc_program nfsd_program;
> > +extern struct svc_program nfsd_programs[];
> > extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4;
> > extern struct mutex nfsd_mutex;
> > extern spinlock_t nfsd_drc_lock;
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 6cc6a1971e21..ef2532303ece 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -36,7 +36,6 @@
> > #define NFSDDBG_FACILITY NFSDDBG_SVC
> >
> > atomic_t nfsd_th_cnt = ATOMIC_INIT(0);
> > -extern struct svc_program nfsd_program;
> > static int nfsd(void *vrqstp);
> > #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > static int nfsd_acl_rpcbind_set(struct net *,
> > @@ -89,16 +88,6 @@ static const struct svc_version *localio_versions[] = {
> >
> > #define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions)
> >
> > -static struct svc_program nfsd_localio_program = {
> > - .pg_prog = NFS_LOCALIO_PROGRAM,
> > - .pg_nvers = NFSD_LOCALIO_NRVERS,
> > - .pg_vers = localio_versions,
> > - .pg_name = "nfslocalio",
> > - .pg_class = "nfsd",
> > - .pg_authenticate = &svc_set_client,
> > - .pg_init_request = svc_generic_init_request,
> > - .pg_rpcbind_set = svc_generic_rpcbind_set,
> > -};
> > #endif /* CONFIG_NFSD_LOCALIO */
> >
> > #if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > @@ -111,23 +100,9 @@ static const struct svc_version *nfsd_acl_version[] = {
> > # endif
> > };
> >
> > -#define NFSD_ACL_MINVERS 2
> > +#define NFSD_ACL_MINVERS 2
> > #define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)
> >
> > -static struct svc_program nfsd_acl_program = {
> > -#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> > - .pg_next = &nfsd_localio_program,
> > -#endif /* CONFIG_NFSD_LOCALIO */
> > - .pg_prog = NFS_ACL_PROGRAM,
> > - .pg_nvers = NFSD_ACL_NRVERS,
> > - .pg_vers = nfsd_acl_version,
> > - .pg_name = "nfsacl",
> > - .pg_class = "nfsd",
> > - .pg_authenticate = &svc_set_client,
> > - .pg_init_request = nfsd_acl_init_request,
> > - .pg_rpcbind_set = nfsd_acl_rpcbind_set,
> > -};
> > -
> > #endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
> >
> > static const struct svc_version *nfsd_version[] = {
> > @@ -140,25 +115,44 @@ static const struct svc_version *nfsd_version[] = {
> > #endif
> > };
> >
> > -#define NFSD_MINVERS 2
> > +#define NFSD_MINVERS 2
> > #define NFSD_NRVERS ARRAY_SIZE(nfsd_version)
> >
> > -struct svc_program nfsd_program = {
> > -#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > - .pg_next = &nfsd_acl_program,
> > -#else
> > -#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> > - .pg_next = &nfsd_localio_program,
> > -#endif /* CONFIG_NFSD_LOCALIO */
> > -#endif
> > +struct svc_program nfsd_programs[] = {
> > + {
> > .pg_prog = NFS_PROGRAM, /* program number */
> > .pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
> > .pg_vers = nfsd_version, /* version table */
> > .pg_name = "nfsd", /* program name */
> > .pg_class = "nfsd", /* authentication class */
> > - .pg_authenticate = &svc_set_client, /* export authentication */
> > + .pg_authenticate = svc_set_client, /* export authentication */
> > .pg_init_request = nfsd_init_request,
> > .pg_rpcbind_set = nfsd_rpcbind_set,
> > + },
> > +#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
> > + {
> > + .pg_prog = NFS_ACL_PROGRAM,
> > + .pg_nvers = NFSD_ACL_NRVERS,
> > + .pg_vers = nfsd_acl_version,
> > + .pg_name = "nfsacl",
> > + .pg_class = "nfsd",
> > + .pg_authenticate = svc_set_client,
> > + .pg_init_request = nfsd_acl_init_request,
> > + .pg_rpcbind_set = nfsd_acl_rpcbind_set,
> > + },
> > +#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
> > +#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> > + {
> > + .pg_prog = NFS_LOCALIO_PROGRAM,
> > + .pg_nvers = NFSD_LOCALIO_NRVERS,
> > + .pg_vers = localio_versions,
> > + .pg_name = "nfslocalio",
> > + .pg_class = "nfsd",
> > + .pg_authenticate = svc_set_client,
> > + .pg_init_request = svc_generic_init_request,
> > + .pg_rpcbind_set = svc_generic_rpcbind_set,
> > + }
> > +#endif /* IS_ENABLED(CONFIG_NFSD_LOCALIO) */
> > };
> >
> > bool nfsd_support_version(int vers)
> > @@ -735,7 +729,8 @@ int nfsd_create_serv(struct net *net)
> > if (nfsd_max_blksize == 0)
> > nfsd_max_blksize = nfsd_get_default_max_blksize();
> > nfsd_reset_versions(nn);
> > - serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
> > + serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs),
> > + &nn->nfsd_svcstats,
> > nfsd_max_blksize, nfsd);
> > if (serv == NULL)
> > return -ENOMEM;
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index a7d0406b9ef5..7c86b1696398 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -66,9 +66,10 @@ enum {
> > * We currently do not support more than one RPC program per daemon.
> > */
> > struct svc_serv {
> > - struct svc_program * sv_program; /* RPC program */
> > + struct svc_program * sv_programs; /* RPC programs */
> > struct svc_stat * sv_stats; /* RPC statistics */
> > spinlock_t sv_lock;
> > + unsigned int sv_nprogs; /* Number of sv_programs */
> > unsigned int sv_nrthreads; /* # of server threads */
> > unsigned int sv_maxconn; /* max connections allowed or
> > * '0' causing max to be based
> > @@ -329,10 +330,9 @@ struct svc_process_info {
> > };
> >
> > /*
> > - * List of RPC programs on the same transport endpoint
> > + * RPC program - an array of these can use the same transport endpoint
> > */
> > struct svc_program {
> > - struct svc_program * pg_next; /* other programs (same xprt) */
> > u32 pg_prog; /* program number */
> > unsigned int pg_lovers; /* lowest version */
> > unsigned int pg_hivers; /* highest version */
> > @@ -414,6 +414,7 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp);
> > void svc_rqst_free(struct svc_rqst *);
> > void svc_exit_thread(struct svc_rqst *);
> > struct svc_serv * svc_create_pooled(struct svc_program *prog,
> > + unsigned int nprog,
> > struct svc_stat *stats,
> > unsigned int bufsize,
> > int (*threadfn)(void *data));
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 965a27806bfd..d9f348aa0672 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -440,10 +440,11 @@ EXPORT_SYMBOL_GPL(svc_rpcb_cleanup);
> >
> > static int svc_uses_rpcbind(struct svc_serv *serv)
> > {
> > - struct svc_program *progp;
> > - unsigned int i;
> > + unsigned int p, i;
> > +
> > + for (p = 0; p < serv->sv_nprogs; p++) {
> > + struct svc_program *progp = &serv->sv_programs[p];
> >
> > - for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> > for (i = 0; i < progp->pg_nvers; i++) {
> > if (progp->pg_vers[i] == NULL)
> > continue;
> > @@ -480,7 +481,7 @@ __svc_init_bc(struct svc_serv *serv)
> > * Create an RPC service
> > */
> > static struct svc_serv *
> > -__svc_create(struct svc_program *prog, struct svc_stat *stats,
> > +__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats,
> > unsigned int bufsize, int npools, int (*threadfn)(void *data))
> > {
> > struct svc_serv *serv;
> > @@ -491,7 +492,8 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
> > if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
> > return NULL;
> > serv->sv_name = prog->pg_name;
> > - serv->sv_program = prog;
> > + serv->sv_programs = prog;
> > + serv->sv_nprogs = nprogs;
> > serv->sv_stats = stats;
> > if (bufsize > RPCSVC_MAXPAYLOAD)
> > bufsize = RPCSVC_MAXPAYLOAD;
> > @@ -499,17 +501,18 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
> > serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
> > serv->sv_threadfn = threadfn;
> > xdrsize = 0;
> > - while (prog) {
> > - prog->pg_lovers = prog->pg_nvers-1;
> > - for (vers=0; vers<prog->pg_nvers ; vers++)
> > - if (prog->pg_vers[vers]) {
> > - prog->pg_hivers = vers;
> > - if (prog->pg_lovers > vers)
> > - prog->pg_lovers = vers;
> > - if (prog->pg_vers[vers]->vs_xdrsize > xdrsize)
> > - xdrsize = prog->pg_vers[vers]->vs_xdrsize;
> > + for (i = 0; i < nprogs; i++) {
> > + struct svc_program *progp = &prog[i];
> > +
> > + progp->pg_lovers = progp->pg_nvers-1;
> > + for (vers = 0; vers < progp->pg_nvers ; vers++)
> > + if (progp->pg_vers[vers]) {
> > + progp->pg_hivers = vers;
> > + if (progp->pg_lovers > vers)
> > + progp->pg_lovers = vers;
> > + if (progp->pg_vers[vers]->vs_xdrsize > xdrsize)
> > + xdrsize = progp->pg_vers[vers]->vs_xdrsize;
> > }
> > - prog = prog->pg_next;
> > }
> > serv->sv_xdrsize = xdrsize;
> > INIT_LIST_HEAD(&serv->sv_tempsocks);
> > @@ -558,13 +561,14 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
> > struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
> > int (*threadfn)(void *data))
> > {
> > - return __svc_create(prog, NULL, bufsize, 1, threadfn);
> > + return __svc_create(prog, 1, NULL, bufsize, 1, threadfn);
> > }
> > EXPORT_SYMBOL_GPL(svc_create);
> >
> > /**
> > * svc_create_pooled - Create an RPC service with pooled threads
> > - * @prog: the RPC program the new service will handle
> > + * @prog: Array of RPC programs the new service will handle
> > + * @nprogs: Number of programs in the array
> > * @stats: the stats struct if desired
> > * @bufsize: maximum message size for @prog
> > * @threadfn: a function to service RPC requests for @prog
> > @@ -572,6 +576,7 @@ EXPORT_SYMBOL_GPL(svc_create);
> > * Returns an instantiated struct svc_serv object or NULL.
> > */
> > struct svc_serv *svc_create_pooled(struct svc_program *prog,
> > + unsigned int nprogs,
> > struct svc_stat *stats,
> > unsigned int bufsize,
> > int (*threadfn)(void *data))
> > @@ -579,7 +584,7 @@ struct svc_serv *svc_create_pooled(struct svc_program *prog,
> > struct svc_serv *serv;
> > unsigned int npools = svc_pool_map_get();
> >
> > - serv = __svc_create(prog, stats, bufsize, npools, threadfn);
> > + serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn);
> > if (!serv)
> > goto out_err;
> > serv->sv_is_pooled = true;
> > @@ -602,16 +607,16 @@ svc_destroy(struct svc_serv **servp)
> >
> > *servp = NULL;
> >
> > - dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
> > + dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name);
> > timer_shutdown_sync(&serv->sv_temptimer);
> >
> > /*
> > * Remaining transports at this point are not expected.
> > */
> > WARN_ONCE(!list_empty(&serv->sv_permsocks),
> > - "SVC: permsocks remain for %s\n", serv->sv_program->pg_name);
> > + "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name);
> > WARN_ONCE(!list_empty(&serv->sv_tempsocks),
> > - "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name);
> > + "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name);
> >
> > cache_clean_deferred(serv);
> >
> > @@ -1156,15 +1161,16 @@ int svc_register(const struct svc_serv *serv, struct net *net,
> > const int family, const unsigned short proto,
> > const unsigned short port)
> > {
> > - struct svc_program *progp;
> > - unsigned int i;
> > + unsigned int p, i;
> > int error = 0;
> >
> > WARN_ON_ONCE(proto == 0 && port == 0);
> > if (proto == 0 && port == 0)
> > return -EINVAL;
> >
> > - for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> > + for (p = 0; p < serv->sv_nprogs; p++) {
> > + struct svc_program *progp = &serv->sv_programs[p];
> > +
> > for (i = 0; i < progp->pg_nvers; i++) {
> >
> > error = progp->pg_rpcbind_set(net, progp, i,
> > @@ -1216,13 +1222,14 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi
> > static void svc_unregister(const struct svc_serv *serv, struct net *net)
> > {
> > struct sighand_struct *sighand;
> > - struct svc_program *progp;
> > unsigned long flags;
> > - unsigned int i;
> > + unsigned int p, i;
> >
> > clear_thread_flag(TIF_SIGPENDING);
> >
> > - for (progp = serv->sv_program; progp; progp = progp->pg_next) {
> > + for (p = 0; p < serv->sv_nprogs; p++) {
> > + struct svc_program *progp = &serv->sv_programs[p];
> > +
> > for (i = 0; i < progp->pg_nvers; i++) {
> > if (progp->pg_vers[i] == NULL)
> > continue;
> > @@ -1328,7 +1335,7 @@ svc_process_common(struct svc_rqst *rqstp)
> > struct svc_process_info process;
> > enum svc_auth_status auth_res;
> > unsigned int aoffset;
> > - int rc;
> > + int pr, rc;
> > __be32 *p;
> >
> > /* Will be turned off only when NFSv4 Sessions are used */
> > @@ -1352,9 +1359,12 @@ svc_process_common(struct svc_rqst *rqstp)
> > rqstp->rq_vers = be32_to_cpup(p++);
> > rqstp->rq_proc = be32_to_cpup(p);
> >
> > - for (progp = serv->sv_program; progp; progp = progp->pg_next)
> > + for (pr = 0; pr < serv->sv_nprogs; pr++) {
> > + progp = &serv->sv_programs[pr];
> > +
> > if (rqstp->rq_prog == progp->pg_prog)
> > break;
> > + }
> >
> > /*
> > * Decode auth data, and add verifier to reply buffer.
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index d3735ab3e6d1..16634afdf253 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -268,7 +268,7 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name,
> > spin_unlock(&svc_xprt_class_lock);
> > newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags);
> > if (IS_ERR(newxprt)) {
> > - trace_svc_xprt_create_err(serv->sv_program->pg_name,
> > + trace_svc_xprt_create_err(serv->sv_programs->pg_name,
> > xcl->xcl_name, sap, len,
> > newxprt);
> > module_put(xcl->xcl_owner);
> > diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
> > index 04b45588ae6f..8ca98b146ec8 100644
> > --- a/net/sunrpc/svcauth_unix.c
> > +++ b/net/sunrpc/svcauth_unix.c
> > @@ -697,7 +697,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
> > rqstp->rq_auth_stat = rpc_autherr_badcred;
> > ipm = ip_map_cached_get(xprt);
> > if (ipm == NULL)
> > - ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
> > + ipm = __ip_map_lookup(sn->ip_map_cache,
> > + rqstp->rq_server->sv_programs->pg_class,
> > &sin6->sin6_addr);
> >
> > if (ipm == NULL)
> > --
> > 2.44.0
> >
> >
>
> --
> Chuck Lever
>
* Re: [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4
2024-06-29 15:50 ` Chuck Lever
@ 2024-06-30 22:01 ` NeilBrown
2024-06-30 22:23 ` Chuck Lever
0 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2024-06-30 22:01 UTC (permalink / raw)
To: Chuck Lever
Cc: Mike Snitzer, linux-nfs, Jeff Layton, Anna Schumaker,
Trond Myklebust, snitzer
On Sun, 30 Jun 2024, Chuck Lever wrote:
> On Fri, Jun 28, 2024 at 05:10:53PM -0400, Mike Snitzer wrote:
> > This is nfs-localio code which blurs the boundary between server and
> > client...
> >
> > The change_attr is used by NFS to detect if a file might have changed.
> > This code is used to get the attributes after a write request. NFS
> > uses a GETATTR request to the server at other times. The change_attr
> > should be consistent between the two else comparisons will be
> > meaningless.
> >
> > So nfs_localio_vfs_getattr() should use the same change_attr as the
> > one that would be used if the NFS GETATTR request were made. For
> > NFSv3, that is nfs_timespec_to_change_attr() as was already
> > implemented. For NFSv4 it is something different (as implemented in
> > this commit).
> >
> > [above header derived from linux-nfs message Neil sent on this topic]
>
> Instead of this note, I recommend:
>
> Message-Id: <171918165963.14261.959545364150864599@noble.neil.brown.name>
Linus would not be impressed. He likes links that you can click on and
follow.
So
Link: https://lore.kernel.org/171918165963.14261.959545364150864599@noble.neil.brown.name
is preferred (at least I think that is the current state of the
conversation; see
https://lore.kernel.org/all/CAHk-=wiD9du3fBHuLYzwUSdNgY+hxMZEWNZpqJXy-=wD2wafdg@mail.gmail.com/)
NeilBrown
>
>
> > Suggested-by: NeilBrown <neil@brown.name>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfs/localio.c | 48 +++++++++++++++++++++++++++++++++++++++---------
> > 1 file changed, 39 insertions(+), 9 deletions(-)
> >
> > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > index 0f7d6d55087b..fe96f05ba8ca 100644
> > --- a/fs/nfs/localio.c
> > +++ b/fs/nfs/localio.c
> > @@ -364,21 +364,47 @@ nfs_set_local_verifier(struct inode *inode,
> > verf->committed = how;
> > }
> >
> > +/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
> > +static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
> > +{
> > + u32 request_mask = STATX_BASIC_STATS;
> > +
> > + if (version == 4)
> > + request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
> > + return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
> > +}
> > +
> > +/*
> > + * Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute(),
> > + * FIXME: factor out to common code.
> > + */
> > +static u64 __nfsd4_change_attribute(const struct kstat *stat,
> > + const struct inode *inode)
> > +{
> > + u64 chattr;
> > +
> > + if (stat->result_mask & STATX_CHANGE_COOKIE) {
> > + chattr = stat->change_cookie;
> > + if (S_ISREG(inode->i_mode) &&
> > + !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
> > + chattr += (u64)stat->ctime.tv_sec << 30;
> > + chattr += stat->ctime.tv_nsec;
> > + }
> > + } else {
> > + chattr = time_to_chattr(&stat->ctime);
> > + }
> > + return chattr;
> > +}
> > +
> > static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
> > {
> > struct kstat stat;
> > struct file *filp = iocb->kiocb.ki_filp;
> > struct nfs_pgio_header *hdr = iocb->hdr;
> > struct nfs_fattr *fattr = hdr->res.fattr;
> > + int version = NFS_PROTO(hdr->inode)->version;
> >
> > - if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
> > - STATX_INO |
> > - STATX_ATIME |
> > - STATX_MTIME |
> > - STATX_CTIME |
> > - STATX_SIZE |
> > - STATX_BLOCKS,
> > - AT_STATX_SYNC_AS_STAT))
> > + if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
> > return;
> >
> > fattr->valid = (NFS_ATTR_FATTR_FILEID |
> > @@ -394,7 +420,11 @@ static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
> > fattr->atime = stat.atime;
> > fattr->mtime = stat.mtime;
> > fattr->ctime = stat.ctime;
> > - fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> > + if (version == 4) {
> > + fattr->change_attr =
> > + __nfsd4_change_attribute(&stat, file_inode(filp));
> > + } else
> > + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> > fattr->du.nfs3.used = stat.blocks << 9;
> > }
> >
> > --
> > 2.44.0
> >
>
> --
> Chuck Lever
>
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-29 22:18 ` Chuck Lever
2024-06-30 14:49 ` Chuck Lever
2024-06-30 19:51 ` Jeff Layton
@ 2024-06-30 22:22 ` NeilBrown
2024-06-30 22:34 ` Chuck Lever
2 siblings, 1 reply; 44+ messages in thread
From: NeilBrown @ 2024-06-30 22:22 UTC (permalink / raw)
To: Chuck Lever
Cc: Mike Snitzer, linux-nfs, Jeff Layton, Anna Schumaker,
Trond Myklebust, snitzer
On Sun, 30 Jun 2024, Chuck Lever wrote:
> Sorry, I guess I expected to have more time to learn about these
> patches before writing review comments. But if you want them to go
> in soon, I had better look more closely at them now.
>
>
> On Fri, Jun 28, 2024 at 05:10:59PM -0400, Mike Snitzer wrote:
> > Pass the stored cl_nfssvc_net from the client to the server as
>
> This is the only mention of cl_nfssvc_net I can find in this
> patch. I'm not sure what it is. Patch description should maybe
> provide some context.
>
>
> > first argument to nfsd_open_local_fh() to ensure the proper network
> > namespace is used for localio.
>
> Can the patch description say something about the distinct mount
> namespaces -- if the local application is running in a different
> container than the NFS server, are we using only the network
> namespaces for authorizing the file access? And is that OK to do?
> If yes, patch description should explain that NFS local I/O ignores
> the boundaries of mount namespaces and why that is OK to do.
>
>
> > Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> > Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> > Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfsd/Makefile | 1 +
> > fs/nfsd/filecache.c | 2 +-
> > fs/nfsd/localio.c | 239 ++++++++++++++++++++++++++++++++++++++++++++
> > fs/nfsd/nfssvc.c | 1 +
> > fs/nfsd/trace.h | 3 +-
> > fs/nfsd/vfs.h | 9 ++
> > 6 files changed, 253 insertions(+), 2 deletions(-)
> > create mode 100644 fs/nfsd/localio.c
> >
> > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > index b8736a82e57c..78b421778a79 100644
> > --- a/fs/nfsd/Makefile
> > +++ b/fs/nfsd/Makefile
> > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> > nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> > nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> > nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> > +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index ad9083ca144b..99631fa56662 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -52,7 +52,7 @@
> > #define NFSD_FILE_CACHE_UP (0)
> >
> > /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> > +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> >
> > static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> > static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > new file mode 100644
> > index 000000000000..759a5cb79652
> > --- /dev/null
> > +++ b/fs/nfsd/localio.c
> > @@ -0,0 +1,239 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * NFS server support for local clients to bypass network stack
> > + *
> > + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > + */
> > +
> > +#include <linux/exportfs.h>
> > +#include <linux/sunrpc/svcauth_gss.h>
> > +#include <linux/sunrpc/clnt.h>
> > +#include <linux/nfs.h>
> > +#include <linux/string.h>
> > +
> > +#include "nfsd.h"
> > +#include "vfs.h"
> > +#include "netns.h"
> > +#include "filecache.h"
> > +
> > +#define NFSDDBG_FACILITY NFSDDBG_FH
>
> With no more dprintk() call sites in this patch, you no longer need
> this macro definition.
>
>
> > +/*
> > + * We need to translate between nfs status return values and
> > + * the local errno values which may not be the same.
> > + * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
> > + * all compiled nfs objects if it were in include/linux/nfs.h
> > + */
> > +static const struct {
> > + int stat;
> > + int errno;
> > +} nfs_common_errtbl[] = {
> > + { NFS_OK, 0 },
> > + { NFSERR_PERM, -EPERM },
> > + { NFSERR_NOENT, -ENOENT },
> > + { NFSERR_IO, -EIO },
> > + { NFSERR_NXIO, -ENXIO },
> > +/* { NFSERR_EAGAIN, -EAGAIN }, */
> > + { NFSERR_ACCES, -EACCES },
> > + { NFSERR_EXIST, -EEXIST },
> > + { NFSERR_XDEV, -EXDEV },
> > + { NFSERR_NODEV, -ENODEV },
> > + { NFSERR_NOTDIR, -ENOTDIR },
> > + { NFSERR_ISDIR, -EISDIR },
> > + { NFSERR_INVAL, -EINVAL },
> > + { NFSERR_FBIG, -EFBIG },
> > + { NFSERR_NOSPC, -ENOSPC },
> > + { NFSERR_ROFS, -EROFS },
> > + { NFSERR_MLINK, -EMLINK },
> > + { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
> > + { NFSERR_NOTEMPTY, -ENOTEMPTY },
> > + { NFSERR_DQUOT, -EDQUOT },
> > + { NFSERR_STALE, -ESTALE },
> > + { NFSERR_REMOTE, -EREMOTE },
> > +#ifdef EWFLUSH
> > + { NFSERR_WFLUSH, -EWFLUSH },
> > +#endif
> > + { NFSERR_BADHANDLE, -EBADHANDLE },
> > + { NFSERR_NOT_SYNC, -ENOTSYNC },
> > + { NFSERR_BAD_COOKIE, -EBADCOOKIE },
> > + { NFSERR_NOTSUPP, -ENOTSUPP },
> > + { NFSERR_TOOSMALL, -ETOOSMALL },
> > + { NFSERR_SERVERFAULT, -EREMOTEIO },
> > + { NFSERR_BADTYPE, -EBADTYPE },
> > + { NFSERR_JUKEBOX, -EJUKEBOX },
> > + { -1, -EIO }
> > +};
> > +
> > +/**
> > + * nfs_stat_to_errno - convert an NFS status code to a local errno
> > + * @status: NFS status code to convert
> > + *
> > + * Returns a local errno value, or -EIO if the NFS status code is
> > + * not recognized. nfsd_file_acquire() returns an nfsstat that
> > + * needs to be translated to an errno before being returned to a
> > + * local client application.
> > + */
> > +static int nfs_stat_to_errno(enum nfs_stat status)
> > +{
> > + int i;
> > +
> > + for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
> > + if (nfs_common_errtbl[i].stat == (int)status)
> > + return nfs_common_errtbl[i].errno;
> > + }
> > + return nfs_common_errtbl[i].errno;
> > +}
> > +
> > +static void
> > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > +{
> > + if (rqstp->rq_client)
> > + auth_domain_put(rqstp->rq_client);
> > + if (rqstp->rq_cred.cr_group_info)
> > + put_group_info(rqstp->rq_cred.cr_group_info);
> > + /* rpcauth_map_to_svc_cred_local() clears cr_principal */
> > + WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
> > + kfree(rqstp->rq_xprt);
> > + kfree(rqstp);
> > +}
> > +
> > +static struct svc_rqst *
> > +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> > + const struct cred *cred)
> > +{
> > + struct svc_rqst *rqstp;
> > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > + int status;
> > +
> > + /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
> > + if (unlikely(!READ_ONCE(nn->nfsd_serv)))
> > + return ERR_PTR(-ENXIO);
> > +
> > + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > + if (!rqstp)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > + if (!rqstp->rq_xprt) {
> > + status = -ENOMEM;
> > + goto out_err;
> > + }
>
> struct svc_rqst is pretty big (like, bigger than a couple of pages).
> What happens if this allocation fails?
>
> And how often does it occur -- does that add significant overhead?
>
>
> > +
> > + rqstp->rq_xprt->xpt_net = net;
> > + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> > + rqstp->rq_proc = 1;
> > + rqstp->rq_vers = 3;
>
> IMO these need to be symbolic constants, not integers. Or, at least
> there needs to be some documenting comments that explain these are
> fake and why that's OK to do. Or, are there better choices?
>
>
> > + rqstp->rq_prot = IPPROTO_TCP;
> > + rqstp->rq_server = nn->nfsd_serv;
> > +
> > + /* Note: we're connecting to ourself, so source addr == peer addr */
> > + rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
> > + (struct sockaddr *)&rqstp->rq_addr,
> > + sizeof(rqstp->rq_addr));
> > +
> > + rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
> > +
> > + /*
> > + * set up enough for svcauth_unix_set_client to be able to wait
> > + * for the cache downcall. Note that we do _not_ want to allow the
> > + * request to be deferred for later revisit since this rqst and xprt
> > + * are not set up to run inside of the normal svc_rqst engine.
> > + */
> > + INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> > + kref_init(&rqstp->rq_xprt->xpt_ref);
> > + spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> > + rqstp->rq_chandle.thread_wait = 5 * HZ;
> > +
> > + status = svcauth_unix_set_client(rqstp);
> > + switch (status) {
> > + case SVC_OK:
> > + break;
> > + case SVC_DENIED:
> > + status = -ENXIO;
> > + goto out_err;
> > + default:
> > + status = -ETIMEDOUT;
> > + goto out_err;
> > + }
>
> Interesting. Why would svcauth_unix_set_client fail for a local I/O
> request? Wouldn't it only be because the local application is trying
> to open a file it doesn't have permission to?
>
I'm beginning to think this section of code is of the sort where you
need to be twice as clever when debugging as you were when writing. It
is trying to get the client to use interfaces written for server-side
actions, and it isn't a good fit.
I think that instead we should modify fh_verify() so that it takes
explicit net, rq_vers, rq_cred, rq_client as well as the rqstp, and
the localio client passes in a NULL rqstp.
Getting the rq_client is an interesting challenge.
The above code (if I'm reading it correctly) gets the server-side
address of the IP connection, and passes that through to the sunrpc code
as though it is the client address. So as long as the server is
exporting to itself, and as long as no address translation is happening
on the path, this works. It feels messy though - and fragile.
I would rather we had some rq_client (struct auth_domain) that was
dedicated to localio. The client should be able to access it based on
the fact that it could retrieve the server UUID using the LOCALIO RPC
protocol.
I'm not sure what exactly this would look like, but the
'struct auth_domain *' should be something that can be accessed
directly, not looked up in a cache.
I can try to knock up a patch to allow fh_verify (and nfsd_file_acquire)
without an rqstp. I won't try the auth_domain change until I hear what
others think.
NeilBrown
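
For concreteness, the fh_verify() variant being floated above might look
roughly like the following. This is a hypothetical sketch only — the
parameter set and name are guesses at what "explicit net, rq_vers,
rq_cred, rq_client as well as the rqstp" could mean, not code from any
posted patch:

```
/* Hypothetical: fh_verify() with the request context passed explicitly,
 * so a localio caller can supply a NULL rqstp. */
__be32 fh_verify_local(struct net *net, int vers,
		       struct svc_cred *cred, struct auth_domain *client,
		       struct svc_rqst *rqstp,	/* may be NULL for localio */
		       struct svc_fh *fhp, umode_t type, int access);
```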
^ permalink raw reply [flat|nested] 44+ messages in thread
* Re: [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4
2024-06-30 22:01 ` NeilBrown
@ 2024-06-30 22:23 ` Chuck Lever
0 siblings, 0 replies; 44+ messages in thread
From: Chuck Lever @ 2024-06-30 22:23 UTC (permalink / raw)
To: NeilBrown
Cc: Mike Snitzer, linux-nfs, Jeff Layton, Anna Schumaker,
Trond Myklebust, snitzer
On Mon, Jul 01, 2024 at 08:01:30AM +1000, NeilBrown wrote:
> On Sun, 30 Jun 2024, Chuck Lever wrote:
> > On Fri, Jun 28, 2024 at 05:10:53PM -0400, Mike Snitzer wrote:
> > > This is nfs-localio code which blurs the boundary between server and
> > > client...
> > >
> > > The change_attr is used by NFS to detect if a file might have changed.
> > > This code is used to get the attributes after a write request. NFS
> > > uses a GETATTR request to the server at other times. The change_attr
> > > should be consistent between the two else comparisons will be
> > > meaningless.
> > >
> > > So nfs_localio_vfs_getattr() should use the same change_attr as the
> > > one that would be used if the NFS GETATTR request were made. For
> > > NFSv3, that is nfs_timespec_to_change_attr() as was already
> > > implemented. For NFSv4 it is something different (as implemented in
> > > this commit).
> > >
> > > [above header derived from linux-nfs message Neil sent on this topic]
> >
> > Instead of this note, I recommend:
> >
> > Message-Id: <171918165963.14261.959545364150864599@noble.neil.brown.name>
>
> Linus would not be impressed. He likes links that you can click on and
> follow.
I've read email that suggests he doesn't like those either. Another
day ending in "y", I guess.
> So
> Link: https://lore.kernel.org/171918165963.14261.959545364150864599@noble.neil.brown.name
>
> is preferred (at least I think that is the current state of the
> conversation).
>
> see https://lore.kernel.org/all/CAHk-=wiD9du3fBHuLYzwUSdNgY+hxMZEWNZpqJXy-=wD2wafdg@mail.gmail.com/
As I read it, this refers to using Message-Id: to link to a patch
submission. I'm linking to a discussion thread, not to a patch.
Just to be clear.
> NeilBrown
>
> >
> >
> > > Suggested-by: NeilBrown <neil@brown.name>
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > > fs/nfs/localio.c | 48 +++++++++++++++++++++++++++++++++++++++---------
> > > 1 file changed, 39 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> > > index 0f7d6d55087b..fe96f05ba8ca 100644
> > > --- a/fs/nfs/localio.c
> > > +++ b/fs/nfs/localio.c
> > > @@ -364,21 +364,47 @@ nfs_set_local_verifier(struct inode *inode,
> > > verf->committed = how;
> > > }
> > >
> > > +/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
> > > +static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
> > > +{
> > > + u32 request_mask = STATX_BASIC_STATS;
> > > +
> > > + if (version == 4)
> > > + request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
> > > + return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
> > > +}
> > > +
> > > +/*
> > > + * Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute(),
> > > + * FIXME: factor out to common code.
> > > + */
> > > +static u64 __nfsd4_change_attribute(const struct kstat *stat,
> > > + const struct inode *inode)
> > > +{
> > > + u64 chattr;
> > > +
> > > + if (stat->result_mask & STATX_CHANGE_COOKIE) {
> > > + chattr = stat->change_cookie;
> > > + if (S_ISREG(inode->i_mode) &&
> > > + !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
> > > + chattr += (u64)stat->ctime.tv_sec << 30;
> > > + chattr += stat->ctime.tv_nsec;
> > > + }
> > > + } else {
> > > + chattr = time_to_chattr(&stat->ctime);
> > > + }
> > > + return chattr;
> > > +}
> > > +
> > > static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
> > > {
> > > struct kstat stat;
> > > struct file *filp = iocb->kiocb.ki_filp;
> > > struct nfs_pgio_header *hdr = iocb->hdr;
> > > struct nfs_fattr *fattr = hdr->res.fattr;
> > > + int version = NFS_PROTO(hdr->inode)->version;
> > >
> > > - if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
> > > - STATX_INO |
> > > - STATX_ATIME |
> > > - STATX_MTIME |
> > > - STATX_CTIME |
> > > - STATX_SIZE |
> > > - STATX_BLOCKS,
> > > - AT_STATX_SYNC_AS_STAT))
> > > + if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
> > > return;
> > >
> > > fattr->valid = (NFS_ATTR_FATTR_FILEID |
> > > @@ -394,7 +420,11 @@ static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
> > > fattr->atime = stat.atime;
> > > fattr->mtime = stat.mtime;
> > > fattr->ctime = stat.ctime;
> > > - fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> > > + if (version == 4) {
> > > + fattr->change_attr =
> > > + __nfsd4_change_attribute(&stat, file_inode(filp));
> > > + } else
> > > + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> > > fattr->du.nfs3.used = stat.blocks << 9;
> > > }
> > >
> > > --
> > > 2.44.0
> > >
> >
> > --
> > Chuck Lever
> >
>
--
Chuck Lever
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 22:22 ` NeilBrown
@ 2024-06-30 22:34 ` Chuck Lever
0 siblings, 0 replies; 44+ messages in thread
From: Chuck Lever @ 2024-06-30 22:34 UTC (permalink / raw)
To: NeilBrown
Cc: Mike Snitzer, linux-nfs, Jeff Layton, Anna Schumaker,
Trond Myklebust, snitzer
On Mon, Jul 01, 2024 at 08:22:56AM +1000, NeilBrown wrote:
> On Sun, 30 Jun 2024, Chuck Lever wrote:
> > Sorry, I guess I expected to have more time to learn about these
> > patches before writing review comments. But if you want them to go
> > in soon, I had better look more closely at them now.
> >
> >
> > On Fri, Jun 28, 2024 at 05:10:59PM -0400, Mike Snitzer wrote:
> > > Pass the stored cl_nfssvc_net from the client to the server as
> >
> > This is the only mention of cl_nfssvc_net I can find in this
> > patch. I'm not sure what it is. Patch description should maybe
> > provide some context.
> >
> >
> > > first argument to nfsd_open_local_fh() to ensure the proper network
> > > namespace is used for localio.
> >
> > Can the patch description say something about the distinct mount
> > namespaces -- if the local application is running in a different
> > container than the NFS server, are we using only the network
> > namespaces for authorizing the file access? And is that OK to do?
> > If yes, patch description should explain that NFS local I/O ignores
> > the boundaries of mount namespaces and why that is OK to do.
> >
> >
> > > Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> > > Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> > > Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> > > Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > > fs/nfsd/Makefile | 1 +
> > > fs/nfsd/filecache.c | 2 +-
> > > fs/nfsd/localio.c | 239 ++++++++++++++++++++++++++++++++++++++++++++
> > > fs/nfsd/nfssvc.c | 1 +
> > > fs/nfsd/trace.h | 3 +-
> > > fs/nfsd/vfs.h | 9 ++
> > > 6 files changed, 253 insertions(+), 2 deletions(-)
> > > create mode 100644 fs/nfsd/localio.c
> > >
> > > diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> > > index b8736a82e57c..78b421778a79 100644
> > > --- a/fs/nfsd/Makefile
> > > +++ b/fs/nfsd/Makefile
> > > @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> > > nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> > > nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> > > nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> > > +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index ad9083ca144b..99631fa56662 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -52,7 +52,7 @@
> > > #define NFSD_FILE_CACHE_UP (0)
> > >
> > > /* We only care about NFSD_MAY_READ/WRITE for this cache */
> > > -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> > > +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
> > >
> > > static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> > > static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > new file mode 100644
> > > index 000000000000..759a5cb79652
> > > --- /dev/null
> > > +++ b/fs/nfsd/localio.c
> > > @@ -0,0 +1,239 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * NFS server support for local clients to bypass network stack
> > > + *
> > > + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> > > + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> > > + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> > > + */
> > > +
> > > +#include <linux/exportfs.h>
> > > +#include <linux/sunrpc/svcauth_gss.h>
> > > +#include <linux/sunrpc/clnt.h>
> > > +#include <linux/nfs.h>
> > > +#include <linux/string.h>
> > > +
> > > +#include "nfsd.h"
> > > +#include "vfs.h"
> > > +#include "netns.h"
> > > +#include "filecache.h"
> > > +
> > > +#define NFSDDBG_FACILITY NFSDDBG_FH
> >
> > With no more dprintk() call sites in this patch, you no longer need
> > this macro definition.
> >
> >
> > > +/*
> > > + * We need to translate between nfs status return values and
> > > + * the local errno values which may not be the same.
> > > + * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
> > > + * all compiled nfs objects if it were in include/linux/nfs.h
> > > + */
> > > +static const struct {
> > > + int stat;
> > > + int errno;
> > > +} nfs_common_errtbl[] = {
> > > + { NFS_OK, 0 },
> > > + { NFSERR_PERM, -EPERM },
> > > + { NFSERR_NOENT, -ENOENT },
> > > + { NFSERR_IO, -EIO },
> > > + { NFSERR_NXIO, -ENXIO },
> > > +/* { NFSERR_EAGAIN, -EAGAIN }, */
> > > + { NFSERR_ACCES, -EACCES },
> > > + { NFSERR_EXIST, -EEXIST },
> > > + { NFSERR_XDEV, -EXDEV },
> > > + { NFSERR_NODEV, -ENODEV },
> > > + { NFSERR_NOTDIR, -ENOTDIR },
> > > + { NFSERR_ISDIR, -EISDIR },
> > > + { NFSERR_INVAL, -EINVAL },
> > > + { NFSERR_FBIG, -EFBIG },
> > > + { NFSERR_NOSPC, -ENOSPC },
> > > + { NFSERR_ROFS, -EROFS },
> > > + { NFSERR_MLINK, -EMLINK },
> > > + { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
> > > + { NFSERR_NOTEMPTY, -ENOTEMPTY },
> > > + { NFSERR_DQUOT, -EDQUOT },
> > > + { NFSERR_STALE, -ESTALE },
> > > + { NFSERR_REMOTE, -EREMOTE },
> > > +#ifdef EWFLUSH
> > > + { NFSERR_WFLUSH, -EWFLUSH },
> > > +#endif
> > > + { NFSERR_BADHANDLE, -EBADHANDLE },
> > > + { NFSERR_NOT_SYNC, -ENOTSYNC },
> > > + { NFSERR_BAD_COOKIE, -EBADCOOKIE },
> > > + { NFSERR_NOTSUPP, -ENOTSUPP },
> > > + { NFSERR_TOOSMALL, -ETOOSMALL },
> > > + { NFSERR_SERVERFAULT, -EREMOTEIO },
> > > + { NFSERR_BADTYPE, -EBADTYPE },
> > > + { NFSERR_JUKEBOX, -EJUKEBOX },
> > > + { -1, -EIO }
> > > +};
> > > +
> > > +/**
> > > + * nfs_stat_to_errno - convert an NFS status code to a local errno
> > > + * @status: NFS status code to convert
> > > + *
> > > + * Returns a local errno value, or -EIO if the NFS status code is
> > > + * not recognized. nfsd_file_acquire() returns an nfsstat that
> > > + * needs to be translated to an errno before being returned to a
> > > + * local client application.
> > > + */
> > > +static int nfs_stat_to_errno(enum nfs_stat status)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
> > > + if (nfs_common_errtbl[i].stat == (int)status)
> > > + return nfs_common_errtbl[i].errno;
> > > + }
> > > + return nfs_common_errtbl[i].errno;
> > > +}
> > > +
> > > +static void
> > > +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> > > +{
> > > + if (rqstp->rq_client)
> > > + auth_domain_put(rqstp->rq_client);
> > > + if (rqstp->rq_cred.cr_group_info)
> > > + put_group_info(rqstp->rq_cred.cr_group_info);
> > > + /* rpcauth_map_to_svc_cred_local() clears cr_principal */
> > > + WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
> > > + kfree(rqstp->rq_xprt);
> > > + kfree(rqstp);
> > > +}
> > > +
> > > +static struct svc_rqst *
> > > +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> > > + const struct cred *cred)
> > > +{
> > > + struct svc_rqst *rqstp;
> > > + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > > + int status;
> > > +
> > > + /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
> > > + if (unlikely(!READ_ONCE(nn->nfsd_serv)))
> > > + return ERR_PTR(-ENXIO);
> > > +
> > > + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> > > + if (!rqstp)
> > > + return ERR_PTR(-ENOMEM);
> > > +
> > > + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> > > + if (!rqstp->rq_xprt) {
> > > + status = -ENOMEM;
> > > + goto out_err;
> > > + }
> >
> > struct svc_rqst is pretty big (like, bigger than a couple of pages).
> > What happens if this allocation fails?
> >
> > And how often does it occur -- does that add significant overhead?
> >
> >
> > > +
> > > + rqstp->rq_xprt->xpt_net = net;
> > > + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> > > + rqstp->rq_proc = 1;
> > > + rqstp->rq_vers = 3;
> >
> > IMO these need to be symbolic constants, not integers. Or, at least
> > there needs to be some documenting comments that explain these are
> > fake and why that's OK to do. Or, are there better choices?
> >
> >
> > > + rqstp->rq_prot = IPPROTO_TCP;
> > > + rqstp->rq_server = nn->nfsd_serv;
> > > +
> > > + /* Note: we're connecting to ourself, so source addr == peer addr */
> > > + rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
> > > + (struct sockaddr *)&rqstp->rq_addr,
> > > + sizeof(rqstp->rq_addr));
> > > +
> > > + rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
> > > +
> > > + /*
> > > + * set up enough for svcauth_unix_set_client to be able to wait
> > > + * for the cache downcall. Note that we do _not_ want to allow the
> > > + * request to be deferred for later revisit since this rqst and xprt
> > > + * are not set up to run inside of the normal svc_rqst engine.
> > > + */
> > > + INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> > > + kref_init(&rqstp->rq_xprt->xpt_ref);
> > > + spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> > > + rqstp->rq_chandle.thread_wait = 5 * HZ;
> > > +
> > > + status = svcauth_unix_set_client(rqstp);
> > > + switch (status) {
> > > + case SVC_OK:
> > > + break;
> > > + case SVC_DENIED:
> > > + status = -ENXIO;
> > > + goto out_err;
> > > + default:
> > > + status = -ETIMEDOUT;
> > > + goto out_err;
> > > + }
> >
> > Interesting. Why would svcauth_unix_set_client fail for a local I/O
> > request? Wouldn't it only be because the local application is trying
> > to open a file it doesn't have permission to?
> >
>
> I'm beginning to think this section of code is of the sort where you
> need to be twice as clever when debugging as you were when writing. It
> is trying to get the client to use interfaces written for server-side
> actions, and it isn't a good fit.
>
> I think that instead we should modify fh_verify() so that it takes
> explicit net, rq_vers, rq_cred, rq_client as well as the rqstp, and
> the localio client passes in a NULL rqstp.
Nit: I'd rather provide a new fh_verify-like API -- changing the
synopsis of fh_verify() itself will result in a lot of code churn
for only a single call site.
> Getting the rq_client is an interesting challenge.
> The above code (if I'm reading it correctly) gets the server-side
> address of the IP connection, and passes that through to the sunrpc code
> as though it is the client address. So as long as the server is
> exporting to itself, and as long as no address translation is happening
> on the path, this works. It feels messy though - and fragile.
>
> I would rather we had some rq_client (struct auth_domain) that was
> dedicated to localio. The client should be able to access it based on
> the fact that it could retrieve the server UUID using the LOCALIO RPC
> protocol.
>
> I'm not sure what exactly this would look like, but the
> 'struct auth_domain *' should be something that can be accessed
> directly, not looked up in a cache.
I'd like to mitigate the possibility of having to wait for a
possible cache upcall, and reduce or remove the need for a phony
svc_rqst. It sounds like you are on that path.
Further, this needs to be clearly documented -- it's bypassing
(or perhaps augmenting) the export's usual IP address-based
authorization mechanism, so there are security considerations.
> I can try to knock up a patch to allow fh_verify (and nfsd_file_acquire)
> without an rqstp. I won't try the auth_domain change until I hear what
> others think.
--
Chuck Lever
* Re: [PATCH v9 13/19] nfsd: add "localio" support
2024-06-30 21:54 ` NeilBrown
@ 2024-07-01 1:29 ` NeilBrown
0 siblings, 0 replies; 44+ messages in thread
From: NeilBrown @ 2024-07-01 1:29 UTC (permalink / raw)
To: Jeff Layton
Cc: Chuck Lever, Mike Snitzer, linux-nfs, Anna Schumaker,
Trond Myklebust, snitzer
On Mon, 01 Jul 2024, NeilBrown wrote:
> On Mon, 01 Jul 2024, Jeff Layton wrote:
> > On Sun, 2024-06-30 at 15:55 -0400, Chuck Lever wrote:
> > > On Sun, Jun 30, 2024 at 03:52:51PM -0400, Jeff Layton wrote:
> > > > On Sun, 2024-06-30 at 15:44 -0400, Mike Snitzer wrote:
> > > > > On Sun, Jun 30, 2024 at 10:49:51AM -0400, Chuck Lever wrote:
> > > > > > On Sat, Jun 29, 2024 at 06:18:42PM -0400, Chuck Lever wrote:
> > > > > > > > +
> > > > > > > > + /* nfs_fh -> svc_fh */
> > > > > > > > + if (nfs_fh->size > NFS4_FHSIZE) {
> > > > > > > > + status = -EINVAL;
> > > > > > > > + goto out;
> > > > > > > > + }
> > > > > > > > + fh_init(&fh, NFS4_FHSIZE);
> > > > > > > > + fh.fh_handle.fh_size = nfs_fh->size;
> > > > > > > > + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> > > > > > > > +
> > > > > > > > + if (fmode & FMODE_READ)
> > > > > > > > + mayflags |= NFSD_MAY_READ;
> > > > > > > > + if (fmode & FMODE_WRITE)
> > > > > > > > + mayflags |= NFSD_MAY_WRITE;
> > > > > > > > +
> > > > > > > > + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > > > > + if (beres) {
> > > > > > > > + status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > > > > + goto out_fh_put;
> > > > > > > > + }
> > > > > > >
> > > > > > > So I'm wondering whether just calling fh_verify() and then
> > > > > > > nfsd_open_verified() would be simpler and/or good enough here. Is
> > > > > > > there a strong reason to use the file cache for locally opened
> > > > > > > files? Jeff, any thoughts?
> > > > > >
> > > > > > > Will there be writeback ramifications for
> > > > > > > doing this? Maybe we need a comment here explaining how these files
> > > > > > > are garbage collected (just an fput by the local I/O client, I would
> > > > > > > guess).
> > > > > >
> > > > > > OK, going back to this: Since right here we immediately call
> > > > > >
> > > > > > nfsd_file_put(nf);
> > > > > >
> > > > > > There are no writeback ramifications nor any need to comment about
> > > > > > garbage collection. But this still seems like a lot of (possibly
> > > > > > unnecessary) overhead for simply obtaining a struct file.
> > > > >
> > > > > Easy enough change, probably best to avoid the filecache but would like
> > > > > to verify with Jeff before switching:
> > > > >
> > > > > diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> > > > > index 1d6508aa931e..85ebf63789fb 100644
> > > > > --- a/fs/nfsd/localio.c
> > > > > +++ b/fs/nfsd/localio.c
> > > > > @@ -197,7 +197,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > const struct cred *save_cred;
> > > > > struct svc_rqst *rqstp;
> > > > > struct svc_fh fh;
> > > > > - struct nfsd_file *nf;
> > > > > __be32 beres;
> > > > >
> > > > > if (nfs_fh->size > NFS4_FHSIZE)
> > > > > @@ -235,13 +234,12 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
> > > > > if (fmode & FMODE_WRITE)
> > > > > mayflags |= NFSD_MAY_WRITE;
> > > > >
> > > > > - beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> > > > > + beres = fh_verify(rqstp, &fh, S_IFREG, mayflags);
> > > > > if (beres) {
> > > > > status = nfs_stat_to_errno(be32_to_cpu(beres));
> > > > > goto out_fh_put;
> > > > > }
> > > > > - *pfilp = get_file(nf->nf_file);
> > > > > - nfsd_file_put(nf);
> > > > > + status = nfsd_open_verified(rqstp, &fh, mayflags, pfilp);
> > > > > out_fh_put:
> > > > > fh_put(&fh);
> > > > > nfsd_local_fakerqst_destroy(rqstp);
> > > > >
> > > >
> > > > My suggestion would be to _not_ do this. I think you do want to use the
> > > > filecache (mostly for performance reasons).
> > >
> > > But look carefully:
> > >
> > > -- we're not calling nfsd_file_acquire_gc() here
> > >
> > > -- we're immediately calling nfsd_file_put() on the returned nf
> > >
> > > There's nothing left in the file cache when nfsd_open_local_fh()
> > > returns. Each call here will do a full file open and a full close.
> > >
> > >
> >
> > Good point. This should be calling nfsd_file_acquire_gc(), IMO.
>
> Or the client could do a v4 style acquire, and not call nfsd_file_put()
> until it was done with the file. I don't see a specific problem with
> _gc, but avoiding the heuristic it implies seems best where possible.
>
I'm now wondering if this matters at all.
For NFSv4 the client still calls OPEN and CLOSE over the wire, so the
file will be in the cache whenever it is open so the current code is
fine.
For NFSv3 the client will only do the lookup once on the first IO
request. The struct file is stored in a client data structure and used
subsequently without any interaction with nfsd.
So if the client opens the same file multiple times we might get extra
lookups on the server, but I'm not at all sure that justifies any
complexity.
So my current inclination is to leave this code as is.
NeilBrown
end of thread, other threads: [~2024-07-01 1:29 UTC | newest]
Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-28 21:10 [PATCH v9 00/19] nfs/nfsd: add support for localio Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 01/19] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 02/19] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 03/19] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 04/19] sunrpc: add rpcauth_map_to_svc_cred_local Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 05/19] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 06/19] nfs: add "localio" support Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 07/19] nfs/localio: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
2024-06-29 15:50 ` Chuck Lever
2024-06-30 22:01 ` NeilBrown
2024-06-30 22:23 ` Chuck Lever
2024-06-28 21:10 ` [PATCH v9 08/19] nfs: enable localio for non-pNFS I/O Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 09/19] pnfs/flexfiles: Enable localio for flexfiles I/O Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 10/19] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 11/19] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 12/19] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-06-28 21:10 ` [PATCH v9 13/19] nfsd: add "localio" support Mike Snitzer
2024-06-29 22:18 ` Chuck Lever
2024-06-30 14:49 ` Chuck Lever
2024-06-30 19:44 ` Mike Snitzer
2024-06-30 19:52 ` Jeff Layton
2024-06-30 19:55 ` Chuck Lever
2024-06-30 19:59 ` Jeff Layton
2024-06-30 20:15 ` Chuck Lever
2024-06-30 21:07 ` Jeff Layton
2024-06-30 21:56 ` NeilBrown
2024-06-30 21:54 ` NeilBrown
2024-07-01 1:29 ` NeilBrown
2024-06-30 19:51 ` Jeff Layton
2024-06-30 22:22 ` NeilBrown
2024-06-30 22:34 ` Chuck Lever
2024-06-28 21:11 ` [PATCH v9 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 15/19] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 16/19] nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 17/19] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-06-28 21:11 ` [PATCH v9 18/19] SUNRPC: replace program list with program array Mike Snitzer
2024-06-29 16:00 ` Chuck Lever
2024-06-30 21:57 ` NeilBrown
2024-06-28 21:11 ` [PATCH v9 19/19] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-06-29 15:36 ` [PATCH v9 00/19] nfs/nfsd: add support for localio Chuck Lever III
2024-06-29 16:03 ` Mike Snitzer
2024-06-29 17:01 ` Chuck Lever
2024-06-29 19:10 ` Mike Snitzer
2024-06-29 20:31 ` Chuck Lever III