* [PATCH v5 00/19] nfs/nfsd: add support for localio
From: Mike Snitzer @ 2024-06-18 20:19 UTC
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
Hi,
This v5 is rebased on Chuck's nfsd-next (which only required one
adjustment in patch 15 to account for new code that dereferences
nn->nfsd_serv). The only other change is patch 19, which adds
Documentation/filesystems/nfs/localio.rst.
My git tree is here:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/
This v5 is available as both branch nfs-localio-for-6.11 (which always
tracks the latest version) and nfs-localio-for-6.11.v5.
Branches nfs-localio-for-6.11.v[1234] are also available.
To see the changes from v4 to v5 please do:
git remote add snitzer git://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git
git remote update snitzer
git diff snitzer/nfs-localio-for-6.11.v4 snitzer/nfs-localio-for-6.11.v5
[NOTE: there will be noise due to nfsd-next causing the base kernel to
move from v6.10-rc2 to v6.10-rc3]
All review and comments are welcome!
Thanks,
Mike
Mike Snitzer (11):
nfs_common: add NFS LOCALIO protocol extension enablement
nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM
nfsd: implement v3 and v4 server support for NFS_LOCALIO_PROGRAM
nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h
nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common
nfs/nfsd: ensure localio server always uses its network namespace
nfsd/localio: manage netns reference in nfsd_open_local_fh
nfsd: prepare to use SRCU to dereference nn->nfsd_serv
nfsd: use SRCU to dereference nn->nfsd_serv
nfsd/localio: use SRCU to dereference nn->nfsd_serv in nfsd_open_local_fh
nfs: add Documentation/filesystems/nfs/localio.rst
Trond Myklebust (3):
NFS: Enable localio for non-pNFS I/O
pnfs/flexfiles: Enable localio for flexfiles I/O
nfs/localio: use dedicated workqueues for filesystem read and write
Weston Andros Adamson (5):
nfs: pass nfs_client to nfs_initiate_pgio
nfs: pass descriptor thru nfs_initiate_pgio path
nfs: pass struct file to nfs_init_pgio and nfs_init_commit
sunrpc: add rpcauth_map_to_svc_cred_local
nfs/nfsd: add "localio" support
Documentation/filesystems/nfs/localio.rst | 101 +++
fs/Kconfig | 3 +
fs/nfs/Kconfig | 30 +
fs/nfs/Makefile | 1 +
fs/nfs/blocklayout/blocklayout.c | 6 +-
fs/nfs/client.c | 15 +-
fs/nfs/filelayout/filelayout.c | 16 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 131 +++-
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 +
fs/nfs/inode.c | 61 +-
fs/nfs/internal.h | 88 ++-
fs/nfs/localio.c | 850 ++++++++++++++++++++++
fs/nfs/nfs3_fs.h | 1 +
fs/nfs/nfs3client.c | 25 +
fs/nfs/nfs3proc.c | 3 +
fs/nfs/nfs3xdr.c | 58 ++
fs/nfs/nfs4_fs.h | 2 +
fs/nfs/nfs4client.c | 23 +
fs/nfs/nfs4proc.c | 3 +
fs/nfs/nfs4xdr.c | 65 +-
fs/nfs/nfstrace.h | 61 ++
fs/nfs/pagelist.c | 32 +-
fs/nfs/pnfs.c | 24 +-
fs/nfs/pnfs.h | 6 +-
fs/nfs/pnfs_nfs.c | 2 +-
fs/nfs/write.c | 13 +-
fs/nfs_common/Makefile | 3 +
fs/nfs_common/nfslocalio.c | 71 ++
fs/nfsd/Kconfig | 30 +
fs/nfsd/Makefile | 1 +
fs/nfsd/filecache.c | 15 +-
fs/nfsd/localio.c | 398 ++++++++++
fs/nfsd/netns.h | 16 +-
fs/nfsd/nfs4state.c | 25 +-
fs/nfsd/nfsctl.c | 28 +-
fs/nfsd/nfsd.h | 11 +
fs/nfsd/nfssvc.c | 182 ++++-
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 9 +
fs/nfsd/xdr.h | 6 +
include/linux/nfs.h | 2 +
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 9 +
include/linux/nfs_xdr.h | 31 +-
include/linux/nfslocalio.h | 41 ++
include/linux/sunrpc/auth.h | 4 +
include/uapi/linux/nfs.h | 4 +
net/sunrpc/auth.c | 15 +
49 files changed, 2388 insertions(+), 146 deletions(-)
create mode 100644 Documentation/filesystems/nfs/localio.rst
create mode 100644 fs/nfs/localio.c
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 fs/nfsd/localio.c
create mode 100644 include/linux/nfslocalio.h
--
2.44.0
* [PATCH v5 01/19] nfs: pass nfs_client to nfs_initiate_pgio
From: Mike Snitzer @ 2024-06-18 20:19 UTC
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
The nfs_client is needed for localio support. Otherwise it won't be
possible to disable localio if it is attempted but fails.
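The reasoning above can be illustrated with a minimal user-space sketch.
Everything in it is an assumption made for illustration: struct client,
its localio_enabled flag, initiate_io() and the two callback helpers are
hypothetical names, not code from this series. It only shows why the I/O
initiation path needs a handle on the client: a failed local attempt has
to be recorded somewhere that later I/O will see.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nfs_client; the flag is an assumption. */
struct client {
	bool localio_enabled;
};

/* Hypothetical local I/O attempt: 0 on success, negative on failure. */
typedef int (*local_io_fn)(void);

enum io_path { IO_LOCAL, IO_RPC };

/* Demo callbacks standing in for a local I/O attempt. */
static int local_io_ok(void)    { return 0; }
static int local_io_fails(void) { return -1; }

/*
 * Try the local path first. If it fails, clear the per-client flag so
 * that subsequent calls go straight to the RPC path. Without the client
 * pointer there would be nowhere to record the failure.
 */
static enum io_path initiate_io(struct client *clp, local_io_fn try_local)
{
	assert(try_local != NULL);

	if (clp && clp->localio_enabled) {
		if (try_local() == 0)
			return IO_LOCAL;
		clp->localio_enabled = false;
	}
	return IO_RPC;
}
```

With a client whose flag starts out true, a call using local_io_fails
returns IO_RPC and clears the flag, so a later call using local_io_ok
also returns IO_RPC.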
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/filelayout/filelayout.c | 4 ++--
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ++++--
fs/nfs/internal.h | 5 +++--
fs/nfs/pagelist.c | 10 ++++++----
4 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 29d84dc66ca3..43e16e9e0176 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -486,7 +486,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -528,7 +528,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 24188af56d5b..327f1a5c9fbe 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1803,7 +1803,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
0, RPC_TASK_SOFTCONN);
@@ -1871,7 +1872,8 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
sync, RPC_TASK_SOFTCONN);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..a9c0c29f7804 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,8 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 6efb5068c116..d9b795c538cd 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,8 +844,9 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}
-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
@@ -855,7 +856,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
.rpc_cred = cred,
};
struct rpc_task_setup task_setup_data = {
- .rpc_client = clnt,
+ .rpc_client = rpc_clnt,
.task = &hdr->task,
.rpc_message = &msg,
.callback_ops = call_ops,
@@ -1070,7 +1071,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
+ ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
NFS_PROTO(hdr->inode),
--
2.44.0
* [PATCH v5 02/19] nfs: pass descriptor thru nfs_initiate_pgio path
From: Mike Snitzer @ 2024-06-18 20:19 UTC
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
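Pass the nfs_pageio_descriptor down through the pNFS read_pagelist and
write_pagelist hooks to nfs_initiate_pgio. This is needed for localio
support.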
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/blocklayout/blocklayout.c | 6 ++++--
fs/nfs/filelayout/filelayout.c | 10 ++++++----
fs/nfs/flexfilelayout/flexfilelayout.c | 10 ++++++----
fs/nfs/internal.h | 6 +++---
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/pnfs.c | 24 +++++++++++++-----------
fs/nfs/pnfs.h | 6 ++++--
7 files changed, 40 insertions(+), 28 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 6be13e0ec170..6a61ddd1835f 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -227,7 +227,8 @@ bl_end_par_io_read(void *data)
}
static enum pnfs_try_status
-bl_read_pagelist(struct nfs_pgio_header *header)
+bl_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
@@ -372,7 +373,8 @@ static void bl_end_par_io_write(void *data)
}
static enum pnfs_try_status
-bl_write_pagelist(struct nfs_pgio_header *header, int sync)
+bl_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header, int sync)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 43e16e9e0176..f9b600c4a2b5 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -447,7 +447,8 @@ static const struct rpc_call_ops filelayout_commit_call_ops = {
};
static enum pnfs_try_status
-filelayout_read_pagelist(struct nfs_pgio_header *hdr)
+filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -486,7 +487,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -494,7 +495,8 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
/* Perform async writes. */
static enum pnfs_try_status
-filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -528,7 +530,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 327f1a5c9fbe..22c0e8014afb 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1751,7 +1751,8 @@ static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
};
static enum pnfs_try_status
-ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
+ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1803,7 +1804,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
@@ -1822,7 +1823,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
/* Perform async writes. */
static enum pnfs_try_status
-ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1872,7 +1874,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index a9c0c29f7804..f6e56fdd8bc2 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,9 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
- struct nfs_pgio_header *hdr, const struct cred *cred,
- const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
+ struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
+ const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d9b795c538cd..3786d767e2ff 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,7 +844,8 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}
-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
+ struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
@@ -1071,7 +1072,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ ret = nfs_initiate_pgio(desc,
+ NFS_SERVER(hdr->inode)->nfs_client,
NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b5834728f31b..c9015179b72c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2885,10 +2885,11 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
}
static enum pnfs_try_status
-pnfs_try_to_write_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg,
- int how)
+pnfs_try_to_write_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg,
+ int how)
{
struct inode *inode = hdr->inode;
enum pnfs_try_status trypnfs;
@@ -2898,7 +2899,7 @@ pnfs_try_to_write_data(struct nfs_pgio_header *hdr,
dprintk("%s: Writing ino:%lu %u@%llu (how %d)\n", __func__,
inode->i_ino, hdr->args.count, hdr->args.offset, how);
- trypnfs = nfss->pnfs_curr_ld->write_pagelist(hdr, how);
+ trypnfs = nfss->pnfs_curr_ld->write_pagelist(desc, hdr, how);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -2913,7 +2914,7 @@ pnfs_do_write(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- trypnfs = pnfs_try_to_write_data(hdr, call_ops, lseg, how);
+ trypnfs = pnfs_try_to_write_data(desc, hdr, call_ops, lseg, how);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_write_through_mds(desc, hdr);
@@ -3012,9 +3013,10 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
* Call the appropriate parallel I/O subsystem read function.
*/
static enum pnfs_try_status
-pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg)
+pnfs_try_to_read_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg)
{
struct inode *inode = hdr->inode;
struct nfs_server *nfss = NFS_SERVER(inode);
@@ -3025,7 +3027,7 @@ pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
dprintk("%s: Reading ino:%lu %u@%llu\n",
__func__, inode->i_ino, hdr->args.count, hdr->args.offset);
- trypnfs = nfss->pnfs_curr_ld->read_pagelist(hdr);
+ trypnfs = nfss->pnfs_curr_ld->read_pagelist(desc, hdr);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_READ);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -3058,7 +3060,7 @@ pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
+ trypnfs = pnfs_try_to_read_data(desc, hdr, call_ops, lseg);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_read_through_mds(desc, hdr);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index fa5beeaaf5da..92acb837cfa6 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -157,8 +157,10 @@ struct pnfs_layoutdriver_type {
* Return PNFS_ATTEMPTED to indicate the layout code has attempted
* I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
*/
- enum pnfs_try_status (*read_pagelist)(struct nfs_pgio_header *);
- enum pnfs_try_status (*write_pagelist)(struct nfs_pgio_header *, int);
+ enum pnfs_try_status (*read_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *);
+ enum pnfs_try_status (*write_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *, int);
void (*free_deviceid_node) (struct nfs4_deviceid_node *);
struct nfs4_deviceid_node * (*alloc_deviceid_node)
--
2.44.0
* [PATCH v5 03/19] nfs: pass struct file to nfs_init_pgio and nfs_init_commit
From: Mike Snitzer @ 2024-06-18 20:19 UTC
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
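Add a struct file *localio argument to nfs_initiate_pgio and
nfs_initiate_commit; all existing callers pass NULL for now. This is
needed for localio support.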
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/filelayout/filelayout.c | 6 +++---
fs/nfs/flexfilelayout/flexfilelayout.c | 6 +++---
fs/nfs/internal.h | 6 ++++--
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/pnfs_nfs.c | 2 +-
fs/nfs/write.c | 5 +++--
6 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index f9b600c4a2b5..b9e5e7bd15ca 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -489,7 +489,7 @@ filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}
@@ -532,7 +532,7 @@ filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}
@@ -1013,7 +1013,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
data->args.fh = fh;
return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
&filelayout_commit_call_ops, how,
- RPC_TASK_SOFTCONN);
+ RPC_TASK_SOFTCONN, NULL);
out_err:
pnfs_generic_prepare_to_resend_writes(data);
pnfs_generic_commit_release(data);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 22c0e8014afb..3ea07446f05a 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1808,7 +1808,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1878,7 +1878,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1953,7 +1953,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_commit_call_ops_v3 :
&ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN);
+ how, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f6e56fdd8bc2..958c8de072e2 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -309,7 +309,8 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags);
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
@@ -529,7 +530,8 @@ extern int nfs_initiate_commit(struct rpc_clnt *clnt,
struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags);
+ int how, int flags,
+ struct file *localio);
extern void nfs_init_commit(struct nfs_commit_data *data,
struct list_head *head,
struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 3786d767e2ff..57d62db3be5b 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -848,7 +848,8 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags)
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
struct rpc_message msg = {
@@ -1080,7 +1081,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
NFS_PROTO(hdr->inode),
desc->pg_rpc_callops,
desc->pg_ioflags,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags,
+ NULL);
}
return ret;
}
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 88e061bd711b..ecfde2649cf3 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -537,7 +537,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
nfs_initiate_commit(NFS_CLIENT(inode), data,
NFS_PROTO(data->inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF);
+ RPC_TASK_CRED_NOREF, NULL);
} else {
nfs_init_commit(data, NULL, data->lseg, cinfo);
initiate_commit(data, how);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2329cbb0e446..267bed2a4ceb 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1670,7 +1670,8 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);
int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags)
+ int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
int priority = flush_task_priority(how);
@@ -1816,7 +1817,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
task_flags = RPC_TASK_MOVEABLE;
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags, NULL);
}
/*
--
2.44.0
* [PATCH v5 04/19] sunrpc: add rpcauth_map_to_svc_cred_local
From: Mike Snitzer @ 2024-06-18 20:19 UTC
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
Add a new function, rpcauth_map_to_svc_cred_local, which maps a generic
cred to a svc_cred suitable for use in nfsd.
This is needed by the localio code to map nfs client creds to nfs
server credentials.
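As a rough illustration of the mapping, here is a user-space sketch that
mirrors the field assignments made in the diff below. The struct
definitions are simplified stand-ins (assumptions, not the kernel's real
layouts), get_group_info() is reduced to a plain counter increment, and
map_to_svc_cred_local() is a hypothetical name for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; assumptions for illustration. */
struct group_info { int usage; };	/* reference count only */
struct cred {
	unsigned int uid, gid;
	struct group_info *group_info;
};
struct rpc_auth { unsigned int au_flavor; };
struct svc_cred {
	unsigned int cr_uid, cr_gid, cr_flavor;
	struct group_info *cr_group_info;
	const char *cr_principal;
	void *cr_gss_mech;
};

/* Take a reference on the group list (plain increment in this sketch). */
static struct group_info *get_group_info(struct group_info *gi)
{
	gi->usage++;
	return gi;
}

/* Copy identity from the client-side cred into the server-side cred. */
static void map_to_svc_cred_local(const struct rpc_auth *auth,
				  const struct cred *cred,
				  struct svc_cred *svc)
{
	assert(auth && cred && svc);

	svc->cr_uid = cred->uid;
	svc->cr_gid = cred->gid;
	svc->cr_flavor = auth->au_flavor;
	/* The server cred holds its own reference on the group list. */
	if (cred->group_info)
		svc->cr_group_info = get_group_info(cred->group_info);
	/* Not relevant when the network is bypassed. */
	svc->cr_principal = NULL;
	svc->cr_gss_mech = NULL;
}
```

Note that, as in the patch, cr_group_info is left untouched when the
source cred has no group_info, so the caller is expected to start from
an initialized svc_cred.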
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
include/linux/sunrpc/auth.h | 4 ++++
net/sunrpc/auth.c | 15 +++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 61e58327b1aa..872f594a924c 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -11,6 +11,7 @@
#define _LINUX_SUNRPC_AUTH_H
#include <linux/sunrpc/sched.h>
+#include <linux/sunrpc/svcauth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/xdr.h>
@@ -184,6 +185,9 @@ int rpcauth_uptodatecred(struct rpc_task *);
int rpcauth_init_credcache(struct rpc_auth *);
void rpcauth_destroy_credcache(struct rpc_auth *);
void rpcauth_clear_credcache(struct rpc_cred_cache *);
+void rpcauth_map_to_svc_cred_local(struct rpc_auth *,
+ const struct cred *,
+ struct svc_cred *);
char * rpcauth_stringify_acceptor(struct rpc_cred *);
static inline
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 04534ea537c8..00f12ca779c5 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -308,6 +308,21 @@ rpcauth_init_credcache(struct rpc_auth *auth)
}
EXPORT_SYMBOL_GPL(rpcauth_init_credcache);
+void
+rpcauth_map_to_svc_cred_local(struct rpc_auth *auth, const struct cred *cred,
+ struct svc_cred *svc)
+{
+ svc->cr_uid = cred->uid;
+ svc->cr_gid = cred->gid;
+ svc->cr_flavor = auth->au_flavor;
+ if (cred->group_info)
+ svc->cr_group_info = get_group_info(cred->group_info);
+ /* These aren't relevant for local (network is bypassed) */
+ svc->cr_principal = NULL;
+ svc->cr_gss_mech = NULL;
+}
+EXPORT_SYMBOL_GPL(rpcauth_map_to_svc_cred_local);
+
char *
rpcauth_stringify_acceptor(struct rpc_cred *cred)
{
--
2.44.0
* [PATCH v5 05/19] nfs_common: add NFS LOCALIO protocol extension enablement
From: Mike Snitzer @ 2024-06-18 20:19 UTC
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
The first use is in nfsd, to add access to a global nfsd_uuids list that
will be used to identify local nfsd instances.
nfsd_uuids is protected by nfsd_mutex or the RCU read lock. The list is
composed of nfsd_uuid_t instances that are managed as nfsd creates
them (one per network namespace).
nfsd_uuid_is_local() will be used to search all local nfsd instances for
the client-specified nfsd UUID.
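The lookup described above can be sketched in user space as follows.
This is not the code from fs/nfs_common/nfslocalio.c: struct uuid_entry,
uuid_register() and uuid_is_local() are hypothetical names, the list is
a plain singly linked list, and the nfsd_mutex/RCU protection is omitted,
so the sketch is only valid single-threaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16

/* Hypothetical stand-in for nfsd_uuid_t: one entry per server instance. */
struct uuid_entry {
	unsigned char uuid[UUID_LEN];
	struct uuid_entry *next;
};

/* Global list of known local server UUIDs (no locking in this sketch). */
static struct uuid_entry *uuid_list;

/* Add an entry at the head of the list, as a server instance would. */
static void uuid_register(struct uuid_entry *e)
{
	assert(e != NULL);
	e->next = uuid_list;
	uuid_list = e;
}

/* Return true if the client-supplied UUID matches a registered entry. */
static bool uuid_is_local(const unsigned char *uuid)
{
	const struct uuid_entry *e;

	assert(uuid != NULL);
	for (e = uuid_list; e; e = e->next)
		if (memcmp(e->uuid, uuid, UUID_LEN) == 0)
			return true;
	return false;
}
```

A client that received a UUID from its server would pass it to
uuid_is_local(); a match means the server instance lives on the same
host.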
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/Kconfig | 3 +++
fs/nfs/Kconfig | 30 +++++++++++++++++++++++++++
fs/nfs_common/Makefile | 3 +++
fs/nfs_common/nfslocalio.c | 42 ++++++++++++++++++++++++++++++++++++++
fs/nfsd/Kconfig | 30 +++++++++++++++++++++++++++
fs/nfsd/netns.h | 4 ++++
fs/nfsd/nfssvc.c | 12 ++++++++++-
include/linux/nfslocalio.h | 29 ++++++++++++++++++++++++++
8 files changed, 152 insertions(+), 1 deletion(-)
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 include/linux/nfslocalio.h
diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..170083ff2a51 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
tristate
select FS_POSIX_ACL
+config NFS_COMMON_LOCALIO_SUPPORT
+ tristate
+
config NFS_COMMON
bool
depends on NFSD || NFS_FS || LOCKD
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 57249f040dfc..70ff4f7a1a22 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -1,10 +1,16 @@
# SPDX-License-Identifier: GPL-2.0-only
+
+config NFS_LOCALIO
+ tristate
+
config NFS_FS
tristate "NFS client support"
depends on INET && FILE_LOCKING && MULTIUSER
select LOCKD
select SUNRPC
select NFS_ACL_SUPPORT if NFS_V3_ACL
+ select NFS_LOCALIO if NFS_V3_LOCALIO || NFS_V4_LOCALIO
+ select NFS_COMMON_LOCALIO_SUPPORT if NFS_LOCALIO
help
Choose Y here if you want to access files residing on other
computers using Sun's Network File System protocol. To compile
@@ -72,6 +78,18 @@ config NFS_V3_ACL
If unsure, say N.
+config NFS_V3_LOCALIO
+ bool "NFS client support for the NFSv3 LOCALIO protocol extension"
+ depends on NFS_V3
+ help
+ Some NFS servers support an auxiliary NFSv3 LOCALIO protocol
+ that is not an official part of the NFS version 3 protocol.
+
+ This option enables support for version 3 of the LOCALIO
+ protocol in the kernel's NFS client.
+
+ If unsure, say N.
+
config NFS_V4
tristate "NFS client support for NFS version 4"
depends on NFS_FS
@@ -86,6 +104,18 @@ config NFS_V4
If unsure, say Y.
+config NFS_V4_LOCALIO
+ bool "NFS client support for the NFSv4 LOCALIO protocol extension"
+ depends on NFS_V4
+ help
+ Some NFS servers support an auxiliary NFSv4 LOCALIO protocol
+ that is not an official part of the NFS version 4 protocol.
+
+ This option enables support for version 4 of the LOCALIO
+ protocol in the kernel's NFS client.
+
+ If unsure, say N.
+
config NFS_SWAP
bool "Provide swap over NFS support"
default n
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index 119c75ab9fd0..d81623b76aba 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -6,5 +6,8 @@
obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
nfs_acl-objs := nfsacl.o
+obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
+nfs_localio-objs := nfslocalio.o
+
obj-$(CONFIG_GRACE_PERIOD) += grace.o
obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
new file mode 100644
index 000000000000..f214cc6754a1
--- /dev/null
+++ b/fs/nfs_common/nfslocalio.c
@@ -0,0 +1,42 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/rculist.h>
+#include <linux/nfslocalio.h>
+
+MODULE_LICENSE("GPL");
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ * Reads are protected by the RCU read lock (see below).
+ */
+LIST_HEAD(nfsd_uuids);
+EXPORT_SYMBOL(nfsd_uuids);
+
+/* Must be called with RCU read lock held. */
+static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid)
+{
+ nfsd_uuid_t *nfsd_uuid;
+
+ list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
+ if (uuid_equal(&nfsd_uuid->uuid, uuid))
+ return &nfsd_uuid->uuid;
+
+ return &uuid_null;
+}
+
+bool nfsd_uuid_is_local(const uuid_t *uuid)
+{
+ const uuid_t *nfsd_uuid;
+
+ rcu_read_lock();
+ nfsd_uuid = nfsd_uuid_lookup(uuid);
+ rcu_read_unlock();
+
+ return !uuid_is_null(nfsd_uuid);
+}
+EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index ec2ab6429e00..edae34a7b7e5 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -1,4 +1,8 @@
# SPDX-License-Identifier: GPL-2.0-only
+
+config NFSD_LOCALIO
+ tristate
+
config NFSD
tristate "NFS server support"
depends on INET
@@ -9,6 +13,8 @@ config NFSD
select EXPORTFS
select NFS_ACL_SUPPORT if NFSD_V2_ACL
select NFS_ACL_SUPPORT if NFSD_V3_ACL
+ select NFSD_LOCALIO if NFSD_V3_LOCALIO || NFSD_V4_LOCALIO
+ select NFS_COMMON_LOCALIO_SUPPORT if NFSD_LOCALIO
depends on MULTIUSER
help
Choose Y here if you want to allow other computers to access
@@ -69,6 +75,18 @@ config NFSD_V3_ACL
If unsure, say N.
+config NFSD_V3_LOCALIO
+ bool "NFS server support for the NFSv3 LOCALIO protocol extension"
+ depends on NFSD
+ help
+ Some NFS servers support an auxiliary NFSv3 LOCALIO protocol
+ that is not an official part of the NFS version 3 protocol.
+
+ This option enables support for version 3 of the LOCALIO
+ protocol in the kernel's NFS server.
+
+ If unsure, say N.
+
config NFSD_V4
bool "NFS server support for NFS version 4"
depends on NFSD && PROC_FS
@@ -89,6 +107,18 @@ config NFSD_V4
If unsure, say N.
+config NFSD_V4_LOCALIO
+ bool "NFS server support for the NFSv4 LOCALIO protocol extension"
+ depends on NFSD_V4
+ help
+ Some NFS servers support an auxiliary NFSv4 LOCALIO protocol
+ that is not an official part of the NFS version 4 protocol.
+
+ This option enables support for version 4 of the LOCALIO
+ protocol in the kernel's NFS server.
+
+ If unsure, say N.
+
config NFSD_PNFS
bool
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 14ec15656320..0c5a1d97e4ac 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -15,6 +15,7 @@
#include <linux/percpu_counter.h>
#include <linux/siphash.h>
#include <linux/sunrpc/stats.h>
+#include <linux/nfslocalio.h>
/* Hash tables for nfs4_clientid state */
#define CLIENT_HASH_BITS 4
@@ -213,6 +214,9 @@ struct nfsd_net {
/* last time an admin-revoke happened for NFSv4.0 */
time64_t nfs40_last_revoke;
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ nfsd_uuid_t nfsd_uuid;
+#endif
};
/* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 9edb4f7c4cc2..1222a0a33fe1 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
#include <linux/sunrpc/svc_xprt.h>
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
#include <linux/seq_file.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
@@ -427,6 +428,10 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#ifdef CONFIG_NFSD_V4_2_INTER_SSC
nfsd4_ssc_init_umount_work(nn);
+#endif
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
return 0;
@@ -456,6 +461,9 @@ static void nfsd_shutdown_net(struct net *net)
lockd_down(net);
nn->lockd_up = false;
}
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ list_del_rcu(&nn->nfsd_uuid.list);
+#endif
nn->nfsd_net_up = false;
nfsd_shutdown_generic();
}
@@ -802,7 +810,9 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
strscpy(nn->nfsd_name, scope ? scope : utsname()->nodename,
sizeof(nn->nfsd_name));
-
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ uuid_gen(&nn->nfsd_uuid.uuid);
+#endif
error = nfsd_create_serv(net);
if (error)
goto out;
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
new file mode 100644
index 000000000000..d0bbacd0adcf
--- /dev/null
+++ b/include/linux/nfslocalio.h
@@ -0,0 +1,29 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#ifndef __LINUX_NFSLOCALIO_H
+#define __LINUX_NFSLOCALIO_H
+
+#include <linux/list.h>
+#include <linux/uuid.h>
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ */
+extern struct list_head nfsd_uuids;
+
+/*
+ * Each nfsd instance has an nfsd_uuid_t that is accessible through the
+ * global nfsd_uuids list. Useful to allow a client to negotiate whether
+ * localio is possible with its server.
+ */
+typedef struct {
+ uuid_t uuid;
+ struct list_head list;
+} nfsd_uuid_t;
+
+bool nfsd_uuid_is_local(const uuid_t *uuid);
+
+#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0
* [PATCH v5 06/19] nfs/nfsd: add "localio" support
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (4 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 05/19] nfs_common: add NFS LOCALIO protocol extension enablement Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 21:28 ` Jeff Layton
2024-06-18 20:19 ` [PATCH v5 07/19] NFS: Enable localio for non-pNFS I/O Mike Snitzer
` (13 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.
nfs_local_probe() is stubbed out; later commits will enable the client
and server handshake via a LOCALIO protocol extension.
This has dynamic binding with the nfsd module. Localio will only work
if nfsd is already loaded.
The "localio_enabled" nfs kernel module parameter can be used to
enable or disable localio support.
Also, tracepoints were added for nfs_local_open_fh, nfs_local_enable
and nfs_local_disable.
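The dynamic binding mentioned above follows a "bind on first use, release on
last put" shape. A minimal single-threaded userspace sketch of that shape is
below. All demo_* names are hypothetical, demo_symbol_request() and
demo_symbol_put() merely imitate the role that symbol_request() and
symbol_put() play in the patch, and the spinlock and atomic_t used by the real
code are deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*demo_open_fn)(int fh);

/* Hypothetical provider function; stands in for nfsd_open_local_fh(). */
static int demo_provider_open(int fh)
{
	return fh + 100;
}

/* Simulates whether the providing module is currently loaded. */
static bool demo_provider_loaded;
/* References held on the provider (symbol_request/symbol_put analogue). */
static int demo_provider_refs;

static demo_open_fn demo_symbol_request(void)
{
	if (!demo_provider_loaded)
		return NULL;
	demo_provider_refs++;
	return demo_provider_open;
}

static void demo_symbol_put(void)
{
	assert(demo_provider_refs > 0);
	demo_provider_refs--;
}

struct demo_open_ctx {
	demo_open_fn open_f;	/* NULL until the first successful bind */
	int refcount;		/* number of users relying on open_f */
};

static struct demo_open_ctx demo_ctx;

/* Take a user reference, binding to the provider on first use. */
static bool demo_get_lookup_ctx(void)
{
	if (demo_ctx.open_f == NULL) {
		demo_open_fn fn = demo_symbol_request();

		if (fn == NULL)
			return false;	/* provider absent: feature stays off */
		demo_ctx.open_f = fn;
	}
	demo_ctx.refcount++;
	return true;
}

/* Drop a user reference; the last one also releases the provider. */
static void demo_put_lookup_ctx(void)
{
	assert(demo_ctx.refcount > 0);
	if (--demo_ctx.refcount == 0) {
		demo_ctx.open_f = NULL;
		demo_symbol_put();
	}
}
```

However many users take a reference, only one provider reference is held, and
it is dropped when the last user goes away. That is why the nfs module pins
nfsd only while localio is actually in use.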
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/Makefile | 1 +
fs/nfs/client.c | 7 +
fs/nfs/inode.c | 5 +
fs/nfs/internal.h | 53 +++
fs/nfs/localio.c | 797 ++++++++++++++++++++++++++++++++++++++
fs/nfs/nfstrace.h | 61 +++
fs/nfs/pagelist.c | 3 +
fs/nfs/write.c | 3 +
fs/nfsd/Makefile | 1 +
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 243 ++++++++++++
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 8 +
include/linux/nfs.h | 6 +
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 5 +
include/linux/nfs_xdr.h | 1 +
17 files changed, 1199 insertions(+), 2 deletions(-)
create mode 100644 fs/nfs/localio.c
create mode 100644 fs/nfsd/localio.c
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 5f6db37f461e..9fb2f2cac87e 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -13,6 +13,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
+nfs-$(CONFIG_NFS_LOCALIO) += localio.o
obj-$(CONFIG_NFS_V2) += nfsv2.o
nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index de77848ae654..9170e6036fd2 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -178,6 +178,10 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
clp->cl_net = get_net(cl_init->net);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ seqlock_init(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+#endif
clp->cl_principal = "*";
clp->cl_xprtsec = cl_init->xprtsec;
return clp;
@@ -233,6 +237,8 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
+ nfs_local_disable(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
@@ -424,6 +430,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
+ nfs_local_probe(new);
return rpc_ops->init_client(new, cl_init);
}
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index acef52ecb1bb..4f88b860494f 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -39,6 +39,7 @@
#include <linux/slab.h>
#include <linux/compat.h>
#include <linux/freezer.h>
+#include <linux/file.h>
#include <linux/uaccess.h>
#include <linux/iversion.h>
@@ -1053,6 +1054,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
ctx->lock_context.open_context = ctx;
INIT_LIST_HEAD(&ctx->list);
ctx->mdsthreshold = NULL;
+ ctx->local_filp = NULL;
return ctx;
}
EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
@@ -1084,6 +1086,8 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
nfs_sb_deactive(sb);
put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
kfree(ctx->mdsthreshold);
+ if (!IS_ERR_OR_NULL(ctx->local_filp))
+ fput(ctx->local_filp);
kfree_rcu(ctx, rcu_head);
}
@@ -2495,6 +2499,7 @@ static int __init init_nfs_fs(void)
if (err)
goto out1;
+ nfs_local_init();
err = register_nfs_fs();
if (err)
goto out0;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 958c8de072e2..c933421eb6af 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -451,6 +451,59 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+/* localio.c */
+extern void nfs_local_init(void);
+extern void nfs_local_disable(struct nfs_client *);
+extern void nfs_local_probe(struct nfs_client *);
+extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
+ struct nfs_fh *, const fmode_t);
+extern struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx);
+extern int nfs_local_doio(struct nfs_client *, struct file *,
+ struct nfs_pgio_header *,
+ const struct rpc_call_ops *);
+extern int nfs_local_commit(struct file *, struct nfs_commit_data *,
+ const struct rpc_call_ops *, int);
+extern bool nfs_server_is_local(const struct nfs_client *clp);
+
+#else
+static inline void nfs_local_init(void) {}
+static inline void nfs_local_disable(struct nfs_client *clp) {}
+static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline struct file *nfs_local_open_fh(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ const fmode_t mode)
+{
+ return ERR_PTR(-EINVAL);
+}
+static inline struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx)
+{
+ return NULL;
+}
+static inline int nfs_local_doio(struct nfs_client *clp, struct file *filep,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ return -EINVAL;
+}
+static inline int nfs_local_commit(struct file *filep, struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ return -EINVAL;
+}
+static inline bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return false;
+}
+#endif /* CONFIG_NFS_LOCALIO */
+
/* super.c */
extern const struct super_operations nfs_sops;
bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
new file mode 100644
index 000000000000..286cd0ded1b6
--- /dev/null
+++ b/fs/nfs/localio.c
@@ -0,0 +1,797 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS client support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/vfs.h>
+#include <linux/file.h>
+#include <linux/inet.h>
+#include <linux/sunrpc/addr.h>
+#include <linux/inetdevice.h>
+#include <net/addrconf.h>
+#include <linux/module.h>
+#include <linux/bvec.h>
+
+#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
+
+#include "internal.h"
+#include "pnfs.h"
+#include "nfstrace.h"
+
+#define NFSDBG_FACILITY NFSDBG_VFS
+
+extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh, const fmode_t fmode,
+ struct file **pfilp);
+/*
+ * The localio code needs to call into nfsd to do the filehandle -> struct path
+ * mapping, but cannot be statically linked, because that will make the nfs
+ * module depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module. This way the nfs module
+ * will only hold a reference on nfsd when it's actually in use. This also
+ * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+
+struct nfs_local_open_ctx {
+ spinlock_t lock;
+ nfs_to_nfsd_open_t open_f;
+ atomic_t refcount;
+};
+
+struct nfs_local_kiocb {
+ struct kiocb kiocb;
+ struct bio_vec *bvec;
+ struct nfs_pgio_header *hdr;
+ struct work_struct work;
+};
+
+struct nfs_local_fsync_ctx {
+ struct file *filp;
+ struct nfs_commit_data *data;
+ struct work_struct work;
+ struct kref kref;
+ struct completion *done;
+};
+static void nfs_local_fsync_work(struct work_struct *work);
+
+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ */
+static struct {
+ __u32 stat;
+ int errno;
+} nfs_errtbl[] = {
+ { NFS4_OK, 0 },
+ { NFS4ERR_PERM, -EPERM },
+ { NFS4ERR_NOENT, -ENOENT },
+ { NFS4ERR_IO, -EIO },
+ { NFS4ERR_NXIO, -ENXIO },
+ { NFS4ERR_FBIG, -E2BIG },
+ { NFS4ERR_STALE, -EBADF },
+ { NFS4ERR_ACCESS, -EACCES },
+ { NFS4ERR_EXIST, -EEXIST },
+ { NFS4ERR_XDEV, -EXDEV },
+ { NFS4ERR_MLINK, -EMLINK },
+ { NFS4ERR_NOTDIR, -ENOTDIR },
+ { NFS4ERR_ISDIR, -EISDIR },
+ { NFS4ERR_INVAL, -EINVAL },
+ { NFS4ERR_FBIG, -EFBIG },
+ { NFS4ERR_NOSPC, -ENOSPC },
+ { NFS4ERR_ROFS, -EROFS },
+ { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFS4ERR_NOTEMPTY, -ENOTEMPTY },
+ { NFS4ERR_DQUOT, -EDQUOT },
+ { NFS4ERR_STALE, -ESTALE },
+ { NFS4ERR_STALE, -EOPENSTALE },
+ { NFS4ERR_DELAY, -ETIMEDOUT },
+ { NFS4ERR_DELAY, -ERESTARTSYS },
+ { NFS4ERR_DELAY, -EAGAIN },
+ { NFS4ERR_DELAY, -ENOMEM },
+ { NFS4ERR_IO, -ETXTBSY },
+ { NFS4ERR_IO, -EBUSY },
+ { NFS4ERR_BADHANDLE, -EBADHANDLE },
+ { NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFS4ERR_NOTSUPP, -EOPNOTSUPP },
+ { NFS4ERR_TOOSMALL, -ETOOSMALL },
+ { NFS4ERR_SERVERFAULT, -ESERVERFAULT },
+ { NFS4ERR_SERVERFAULT, -ENFILE },
+ { NFS4ERR_IO, -EREMOTEIO },
+ { NFS4ERR_IO, -EUCLEAN },
+ { NFS4ERR_PERM, -ENOKEY },
+ { NFS4ERR_BADTYPE, -EBADTYPE },
+ { NFS4ERR_SYMLINK, -ELOOP },
+ { NFS4ERR_DEADLOCK, -EDEADLK },
+};
+
+/*
+ * Convert a local (negative) errno value to an NFS status code.
+ * Unmapped values are reported as NFS4ERR_SERVERFAULT.
+ */
+static __u32
+nfs4errno(int errno)
+{
+ unsigned int i;
+ for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
+ if (nfs_errtbl[i].errno == errno)
+ return nfs_errtbl[i].stat;
+ }
+ /* If we cannot translate the error, the recovery routines should
+ * handle it.
+ * Note: remaining NFSv4 error codes have values > 10000, so should
+ * not conflict with native Linux error codes.
+ */
+ return NFS4ERR_SERVERFAULT;
+}
+
+static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
+
+static bool localio_enabled __read_mostly = true;
+module_param(localio_enabled, bool, 0644);
+
+bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
+ localio_enabled;
+}
+EXPORT_SYMBOL_GPL(nfs_server_is_local);
+
+void
+nfs_local_init(void)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+
+ ctx->open_f = NULL;
+ spin_lock_init(&ctx->lock);
+ atomic_set(&ctx->refcount, 0);
+}
+
+static bool
+nfs_local_get_lookup_ctx(void)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+ nfs_to_nfsd_open_t fn = NULL;
+
+ spin_lock(&ctx->lock);
+ if (ctx->open_f == NULL) {
+ spin_unlock(&ctx->lock);
+
+ fn = symbol_request(nfsd_open_local_fh);
+ if (!fn)
+ return false;
+
+ spin_lock(&ctx->lock);
+ /* catch race */
+ if (ctx->open_f == NULL) {
+ ctx->open_f = fn;
+ fn = NULL;
+ }
+ }
+ atomic_inc(&ctx->refcount);
+ spin_unlock(&ctx->lock);
+ if (fn)
+ symbol_put(nfsd_open_local_fh);
+ return true;
+}
+
+static void
+nfs_local_put_lookup_ctx(void)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+ nfs_to_nfsd_open_t fn;
+
+ if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
+ fn = ctx->open_f;
+ ctx->open_f = NULL;
+ spin_unlock(&ctx->lock);
+ if (fn)
+ symbol_put(nfsd_open_local_fh);
+ }
+}
+
+/*
+ * nfs_local_enable - attempt to enable local i/o for an nfs_client
+ */
+static void nfs_local_enable(struct nfs_client *clp)
+{
+ if (nfs_local_get_lookup_ctx()) {
+ set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ trace_nfs_local_enable(clp);
+ }
+}
+
+/*
+ * nfs_local_disable - disable local i/o for an nfs_client
+ */
+void
+nfs_local_disable(struct nfs_client *clp)
+{
+ if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
+ trace_nfs_local_disable(clp);
+ nfs_local_put_lookup_ctx();
+ }
+}
+
+/*
+ * nfs_local_probe - probe local i/o support for an nfs_client
+ */
+void
+nfs_local_probe(struct nfs_client *clp)
+{
+ bool enable = false;
+
+ if (enable)
+ nfs_local_enable(clp);
+}
+EXPORT_SYMBOL_GPL(nfs_local_probe);
+
+/*
+ * nfs_local_open_fh - open a local filehandle
+ *
+ * Returns a pointer to a struct file or an ERR_PTR
+ */
+struct file *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, const fmode_t mode)
+{
+ struct nfs_local_open_ctx *ctx = &__local_open_ctx;
+ struct file *filp;
+ int status;
+
+ if (mode & ~(FMODE_READ | FMODE_WRITE))
+ return ERR_PTR(-EINVAL);
+
+ status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
+ if (status < 0) {
+ dprintk("%s: open local file failed error=%d\n",
+ __func__, status);
+ trace_nfs_local_open_fh(fh, mode, status);
+ switch (status) {
+ case -ENXIO:
+ nfs_local_disable(clp);
+ fallthrough;
+ case -ETIMEDOUT:
+ status = -EAGAIN;
+ }
+ filp = ERR_PTR(status);
+ }
+ return filp;
+}
+EXPORT_SYMBOL_GPL(nfs_local_open_fh);
+
+static struct bio_vec *
+nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
+ unsigned int npages, gfp_t flags)
+{
+ struct bio_vec *bvec, *p;
+
+ bvec = kmalloc_array(npages, sizeof(*bvec), flags);
+ if (bvec != NULL) {
+ for (p = bvec; npages > 0; p++, pagevec++, npages--) {
+ p->bv_page = *pagevec;
+ p->bv_len = PAGE_SIZE;
+ p->bv_offset = 0;
+ }
+ }
+ return bvec;
+}
+
+static void
+nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
+{
+ kfree(iocb->bvec);
+ kfree(iocb);
+}
+
+static struct nfs_local_kiocb *
+nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_kiocb *iocb;
+
+ iocb = kmalloc(sizeof(*iocb), flags);
+ if (iocb == NULL)
+ return NULL;
+ iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
+ hdr->page_array.npages, flags);
+ if (iocb->bvec == NULL) {
+ kfree(iocb);
+ return NULL;
+ }
+ init_sync_kiocb(&iocb->kiocb, filp);
+ iocb->kiocb.ki_pos = hdr->args.offset;
+ iocb->hdr = hdr;
+ /* FIXME: NFS_IOHDR_ODIRECT isn't ever set */
+ if (test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
+ iocb->kiocb.ki_flags |= IOCB_DIRECT|IOCB_DSYNC;
+ iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+ return iocb;
+}
+
+static void
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ if (hdr->args.pgbase != 0) {
+ iov_iter_bvec(i, dir, iocb->bvec,
+ hdr->page_array.npages,
+ hdr->args.count + hdr->args.pgbase);
+ iov_iter_advance(i, hdr->args.pgbase);
+ } else
+ iov_iter_bvec(i, dir, iocb->bvec,
+ hdr->page_array.npages, hdr->args.count);
+}
+
+static void
+nfs_local_hdr_release(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ call_ops->rpc_call_done(&hdr->task, hdr);
+ call_ops->rpc_release(hdr);
+}
+
+static void
+nfs_local_pgio_init(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ hdr->task.tk_ops = call_ops;
+ if (!hdr->task.tk_start)
+ hdr->task.tk_start = ktime_get();
+}
+
+static void
+nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
+{
+ if (status >= 0) {
+ hdr->res.count = status;
+ hdr->res.op_status = NFS4_OK;
+ hdr->task.tk_status = 0;
+ } else {
+ hdr->res.op_status = nfs4errno(status);
+ hdr->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ fput(iocb->kiocb.ki_filp);
+ nfs_local_iocb_free(iocb);
+ nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+}
+
+static void
+nfs_local_read_aio_complete_work(struct work_struct *work)
+{
+ struct nfs_local_kiocb *iocb = container_of(work,
+ struct nfs_local_kiocb, work);
+
+ nfs_local_pgio_release(iocb);
+}
+
+/*
+ * Complete the I/O from iocb->kiocb.ki_complete()
+ *
+ * Note that this function can be called from a bottom half context,
+ * hence we need to queue the fput() etc to a workqueue
+ */
+static void
+nfs_local_pgio_complete(struct nfs_local_kiocb *iocb)
+{
+ queue_work(nfsiod_workqueue, &iocb->work);
+}
+
+static void
+nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct file *filp = iocb->kiocb.ki_filp;
+
+ nfs_local_pgio_done(hdr, status);
+
+ if (hdr->res.count != hdr->args.count ||
+ hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
+ hdr->res.eof = true;
+
+ dprintk("%s: read %ld bytes eof %d.\n", __func__,
+ status > 0 ? status : 0, hdr->res.eof);
+}
+
+static void
+nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
+{
+ struct nfs_local_kiocb *iocb = container_of(kiocb,
+ struct nfs_local_kiocb, kiocb);
+
+ nfs_local_read_done(iocb, ret);
+ nfs_local_pgio_complete(iocb);
+}
+
+static int
+nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_read count=%u pos=%llu\n",
+ __func__, hdr->args.count, hdr->args.offset);
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, READ);
+
+ nfs_local_pgio_init(hdr, call_ops);
+ hdr->res.eof = false;
+
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ INIT_WORK(&iocb->work, nfs_local_read_aio_complete_work);
+ iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+ }
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+ }
+ return 0;
+}
+
+static void
+nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ u32 *verf = (u32 *)verifier->data;
+ int seq = 0;
+
+ do {
+ read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
+ verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
+ verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
+ } while (need_seqretry(&clp->cl_boot_lock, seq));
+ done_seqretry(&clp->cl_boot_lock, seq);
+}
+
+static void
+nfs_reset_boot_verifier(struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+ write_seqlock(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ write_sequnlock(&clp->cl_boot_lock);
+}
+
+static void
+nfs_set_local_verifier(struct inode *inode,
+ struct nfs_writeverf *verf,
+ enum nfs3_stable_how how)
+{
+
+ nfs_copy_boot_verifier(&verf->verifier, inode);
+ verf->committed = how;
+}
+
+static void
+nfs_get_vfs_attr(struct file *filp, struct nfs_fattr *fattr)
+{
+ struct kstat stat;
+
+ if (fattr != NULL && vfs_getattr(&filp->f_path, &stat,
+ STATX_INO |
+ STATX_ATIME |
+ STATX_MTIME |
+ STATX_CTIME |
+ STATX_SIZE |
+ STATX_BLOCKS,
+ AT_STATX_SYNC_AS_STAT) == 0) {
+ fattr->valid = NFS_ATTR_FATTR_FILEID |
+ NFS_ATTR_FATTR_CHANGE |
+ NFS_ATTR_FATTR_SIZE |
+ NFS_ATTR_FATTR_ATIME |
+ NFS_ATTR_FATTR_MTIME |
+ NFS_ATTR_FATTR_CTIME |
+ NFS_ATTR_FATTR_SPACE_USED;
+ fattr->fileid = stat.ino;
+ fattr->size = stat.size;
+ fattr->atime = stat.atime;
+ fattr->mtime = stat.mtime;
+ fattr->ctime = stat.ctime;
+ fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ fattr->du.nfs3.used = stat.blocks << 9;
+ }
+}
+
+static void
+nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
+
+ /* Handle short writes as if they are ENOSPC */
+ if (status > 0 && status < hdr->args.count) {
+ hdr->mds_offset += status;
+ hdr->args.offset += status;
+ hdr->args.pgbase += status;
+ hdr->args.count -= status;
+ nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
+ status = -ENOSPC;
+ }
+ if (status < 0)
+ nfs_reset_boot_verifier(hdr->inode);
+ nfs_local_pgio_done(hdr, status);
+}
+
+static void
+nfs_local_write_aio_complete_work(struct work_struct *work)
+{
+ struct nfs_local_kiocb *iocb = container_of(work,
+ struct nfs_local_kiocb, work);
+
+ nfs_get_vfs_attr(iocb->kiocb.ki_filp, iocb->hdr->res.fattr);
+ nfs_local_pgio_release(iocb);
+}
+
+static void
+nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
+{
+ struct nfs_local_kiocb *iocb = container_of(kiocb,
+ struct nfs_local_kiocb, kiocb);
+
+ nfs_local_write_done(iocb, ret);
+ nfs_local_pgio_complete(iocb);
+}
+
+static int
+nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_write count=%u pos=%llu %s\n",
+ __func__, hdr->args.count, hdr->args.offset,
+ (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ switch (hdr->args.stable) {
+ default:
+ break;
+ case NFS_DATA_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC;
+ break;
+ case NFS_FILE_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
+ }
+ nfs_local_pgio_init(hdr, call_ops);
+
+ if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+ INIT_WORK(&iocb->work, nfs_local_write_aio_complete_work);
+ iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+ }
+
+ nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_write_done(iocb, status);
+ nfs_get_vfs_attr(filp, hdr->res.fattr);
+ nfs_local_pgio_release(iocb);
+ }
+ return 0;
+}
+
+static struct file *
+nfs_local_file_open_cached(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ struct file *filp = ctx->local_filp;
+
+ if (!filp) {
+ struct file *new = nfs_local_open_fh(clp, cred, fh, ctx->mode);
+ if (IS_ERR_OR_NULL(new))
+ return NULL;
+ /* try to put this one in the slot */
+ filp = cmpxchg(&ctx->local_filp, NULL, new);
+ if (filp != NULL)
+ fput(new);
+ else
+ filp = new;
+ }
+ return get_file(filp);
+}
+
+struct file *
+nfs_local_file_open(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ return nfs_local_file_open_cached(clp, cred, fh, ctx);
+}
+
+int
+nfs_local_doio(struct nfs_client *clp, struct file *filp,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ int status = 0;
+
+ if (!hdr->args.count)
+ goto out_fput;
+ /* Don't support filesystems without read_iter/write_iter */
+ if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
+ nfs_local_disable(clp);
+ status = -EAGAIN;
+ goto out_fput;
+ }
+
+ switch (hdr->rw_mode) {
+ case FMODE_READ:
+ status = nfs_do_local_read(hdr, filp, call_ops);
+ break;
+ case FMODE_WRITE:
+ status = nfs_do_local_write(hdr, filp, call_ops);
+ break;
+ default:
+ dprintk("%s: invalid mode: %d\n", __func__,
+ hdr->rw_mode);
+ status = -EINVAL;
+ }
+out_fput:
+ if (status != 0) {
+ fput(filp);
+ hdr->task.tk_status = status;
+ nfs_local_hdr_release(hdr, call_ops);
+ }
+ return status;
+}
+
+static void
+nfs_local_init_commit(struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ data->task.tk_ops = call_ops;
+}
+
+static int
+nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
+{
+ loff_t start = data->args.offset;
+ loff_t end = LLONG_MAX;
+
+ if (data->args.count > 0) {
+ end = start + data->args.count - 1;
+ if (end < start)
+ end = LLONG_MAX;
+ }
+
+ dprintk("%s: commit %llu - %llu\n", __func__, start, end);
+ return vfs_fsync_range(filp, start, end, 0);
+}
+
+static void
+nfs_local_commit_done(struct nfs_commit_data *data, int status)
+{
+ if (status >= 0) {
+ nfs_set_local_verifier(data->inode,
+ data->res.verf,
+ NFS_FILE_SYNC);
+ data->res.op_status = NFS4_OK;
+ data->task.tk_status = 0;
+ } else {
+ nfs_reset_boot_verifier(data->inode);
+ data->res.op_status = nfs4errno(status);
+ data->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_release_commit_data(struct file *filp,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ fput(filp);
+ call_ops->rpc_call_done(&data->task, data);
+ call_ops->rpc_release(data);
+}
+
+static struct nfs_local_fsync_ctx *
+nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
+
+ if (ctx != NULL) {
+ ctx->filp = filp;
+ ctx->data = data;
+ INIT_WORK(&ctx->work, nfs_local_fsync_work);
+ kref_init(&ctx->kref);
+ ctx->done = NULL;
+ }
+ return ctx;
+}
+
+static void
+nfs_local_fsync_ctx_kref_free(struct kref *kref)
+{
+ kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
+}
+
+static void
+nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
+{
+ kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
+}
+
+static void
+nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
+{
+ nfs_local_release_commit_data(ctx->filp, ctx->data,
+ ctx->data->task.tk_ops);
+ nfs_local_fsync_ctx_put(ctx);
+}
+
+static void
+nfs_local_fsync_work(struct work_struct *work)
+{
+ struct nfs_local_fsync_ctx *ctx;
+ int status;
+
+ ctx = container_of(work, struct nfs_local_fsync_ctx, work);
+
+ status = nfs_local_run_commit(ctx->filp, ctx->data);
+ nfs_local_commit_done(ctx->data, status);
+ if (ctx->done != NULL)
+ complete(ctx->done);
+ nfs_local_fsync_ctx_free(ctx);
+}
+
+int
+nfs_local_commit(struct file *filp, struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ struct nfs_local_fsync_ctx *ctx;
+
+ ctx = nfs_local_fsync_ctx_alloc(data, filp, GFP_KERNEL);
+ if (!ctx) {
+ nfs_local_commit_done(data, -ENOMEM);
+ nfs_local_release_commit_data(filp, data, call_ops);
+ return -ENOMEM;
+ }
+
+ nfs_local_init_commit(data, call_ops);
+ kref_get(&ctx->kref);
+ if (how & FLUSH_SYNC) {
+ DECLARE_COMPLETION_ONSTACK(done);
+ ctx->done = &done;
+ queue_work(nfsiod_workqueue, &ctx->work);
+ wait_for_completion(&done);
+ } else
+ queue_work(nfsiod_workqueue, &ctx->work);
+ nfs_local_fsync_ctx_put(ctx);
+ return 0;
+}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 1e710654af11..95a2c19a9172 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1681,6 +1681,67 @@ TRACE_EVENT(nfs_mount_path,
TP_printk("path='%s'", __get_str(path))
);
+TRACE_EVENT(nfs_local_open_fh,
+ TP_PROTO(
+ const struct nfs_fh *fh,
+ fmode_t fmode,
+ int error
+ ),
+
+ TP_ARGS(fh, fmode, error),
+
+ TP_STRUCT__entry(
+ __field(int, error)
+ __field(u32, fhandle)
+ __field(unsigned int, fmode)
+ ),
+
+ TP_fast_assign(
+ __entry->error = error;
+ __entry->fhandle = nfs_fhandle_hash(fh);
+ __entry->fmode = (__force unsigned int)fmode;
+ ),
+
+ TP_printk(
+ "error=%d fhandle=0x%08x mode=%s",
+ __entry->error,
+ __entry->fhandle,
+ show_fs_fmode_flags(__entry->fmode)
+ )
+);
+
+DECLARE_EVENT_CLASS(nfs_local_client_event,
+ TP_PROTO(
+ const struct nfs_client *clp
+ ),
+
+ TP_ARGS(clp),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, protocol)
+ __string(server, clp->cl_hostname)
+ ),
+
+ TP_fast_assign(
+ __entry->protocol = clp->rpc_ops->version;
+ __assign_str(server);
+ ),
+
+ TP_printk(
+ "server=%s NFSv%u", __get_str(server), __entry->protocol
+ )
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+ DEFINE_EVENT(nfs_local_client_event, name, \
+ TP_PROTO( \
+ const struct nfs_client *clp \
+ ), \
+ TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
+
DECLARE_EVENT_CLASS(nfs_xdr_event,
TP_PROTO(
const struct xdr_stream *xdr,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 57d62db3be5b..b08420b8e664 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -879,6 +879,9 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
hdr->args.count,
(unsigned long long)hdr->args.offset);
+ if (localio)
+ return nfs_local_doio(clp, localio, hdr, call_ops);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 267bed2a4ceb..b29b0fd5431f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1700,6 +1700,9 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
dprintk("NFS: initiated commit call\n");
+ if (localio)
+ return nfs_local_commit(localio, data, call_ops, how);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index b8736a82e57c..78b421778a79 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
+nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ad9083ca144b..99631fa56662 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -52,7 +52,7 @@
#define NFSD_FILE_CACHE_UP (0)
/* We only care about NFSD_MAY_READ/WRITE for this cache */
-#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
new file mode 100644
index 000000000000..6e2918e76f49
--- /dev/null
+++ b/fs/nfsd/localio.c
@@ -0,0 +1,243 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS server support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/exportfs.h>
+#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfs.h>
+#include <linux/string.h>
+
+#include "nfsd.h"
+#include "vfs.h"
+#include "netns.h"
+#include "filecache.h"
+
+#define NFSDDBG_FACILITY NFSDDBG_FH
+
+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
+ * all compiled nfs objects if it were in include/linux/nfs.h
+ */
+static const struct {
+ int stat;
+ int errno;
+} nfs_common_errtbl[] = {
+ { NFS_OK, 0 },
+ { NFSERR_PERM, -EPERM },
+ { NFSERR_NOENT, -ENOENT },
+ { NFSERR_IO, -EIO },
+ { NFSERR_NXIO, -ENXIO },
+/* { NFSERR_EAGAIN, -EAGAIN }, */
+ { NFSERR_ACCES, -EACCES },
+ { NFSERR_EXIST, -EEXIST },
+ { NFSERR_XDEV, -EXDEV },
+ { NFSERR_NODEV, -ENODEV },
+ { NFSERR_NOTDIR, -ENOTDIR },
+ { NFSERR_ISDIR, -EISDIR },
+ { NFSERR_INVAL, -EINVAL },
+ { NFSERR_FBIG, -EFBIG },
+ { NFSERR_NOSPC, -ENOSPC },
+ { NFSERR_ROFS, -EROFS },
+ { NFSERR_MLINK, -EMLINK },
+ { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFSERR_NOTEMPTY, -ENOTEMPTY },
+ { NFSERR_DQUOT, -EDQUOT },
+ { NFSERR_STALE, -ESTALE },
+ { NFSERR_REMOTE, -EREMOTE },
+#ifdef EWFLUSH
+ { NFSERR_WFLUSH, -EWFLUSH },
+#endif
+ { NFSERR_BADHANDLE, -EBADHANDLE },
+ { NFSERR_NOT_SYNC, -ENOTSYNC },
+ { NFSERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFSERR_NOTSUPP, -ENOTSUPP },
+ { NFSERR_TOOSMALL, -ETOOSMALL },
+ { NFSERR_SERVERFAULT, -EREMOTEIO },
+ { NFSERR_BADTYPE, -EBADTYPE },
+ { NFSERR_JUKEBOX, -EJUKEBOX },
+ { -1, -EIO }
+};
+
+/**
+ * nfs_stat_to_errno - convert an NFS status code to a local errno
+ * @status: NFS status code to convert
+ *
+ * Returns a local errno value, or -EIO if the NFS status code is
+ * not recognized. This function is used jointly by NFSv2 and NFSv3.
+ */
+static inline int nfs_stat_to_errno(enum nfs_stat status)
+{
+ int i;
+
+ for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
+ if (nfs_common_errtbl[i].stat == (int)status)
+ return nfs_common_errtbl[i].errno;
+ }
+ return nfs_common_errtbl[i].errno;
+}
+
+static void
+nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
+{
+ if (rqstp->rq_client)
+ auth_domain_put(rqstp->rq_client);
+ if (rqstp->rq_cred.cr_group_info)
+ put_group_info(rqstp->rq_cred.cr_group_info);
+ /* rpcauth_map_to_svc_cred_local() clears cr_principal */
+ WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
+ kfree(rqstp->rq_xprt);
+ kfree(rqstp);
+}
+
+static struct svc_rqst *
+nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
+{
+ struct svc_rqst *rqstp;
+ struct net *net = rpc_net_ns(rpc_clnt);
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ int status;
+
+ /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
+ if (unlikely(!READ_ONCE(nn->nfsd_serv))) {
+ dprintk("%s: localio denied. Server not running\n", __func__);
+ return ERR_PTR(-ENXIO);
+ }
+
+ rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
+ if (!rqstp)
+ return ERR_PTR(-ENOMEM);
+
+ rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
+ if (!rqstp->rq_xprt) {
+ status = -ENOMEM;
+ goto out_err;
+ }
+
+ rqstp->rq_xprt->xpt_net = net;
+ __set_bit(RQ_SECURE, &rqstp->rq_flags);
+ rqstp->rq_proc = 1;
+ rqstp->rq_vers = 3;
+ rqstp->rq_prot = IPPROTO_TCP;
+ rqstp->rq_server = nn->nfsd_serv;
+
+ /* Note: we're connecting to ourself, so source addr == peer addr */
+ rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
+ (struct sockaddr *)&rqstp->rq_addr,
+ sizeof(rqstp->rq_addr));
+
+ rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
+
+ /*
+ * set up enough for svcauth_unix_set_client to be able to wait
+ * for the cache downcall. Note that we do _not_ want to allow the
+ * request to be deferred for later revisit since this rqst and xprt
+ * are not set up to run inside of the normal svc_rqst engine.
+ */
+ INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
+ kref_init(&rqstp->rq_xprt->xpt_ref);
+ spin_lock_init(&rqstp->rq_xprt->xpt_lock);
+ rqstp->rq_chandle.thread_wait = 5 * HZ;
+
+ status = svcauth_unix_set_client(rqstp);
+ switch (status) {
+ case SVC_OK:
+ break;
+ case SVC_DENIED:
+ status = -ENXIO;
+ dprintk("%s: client %pISpc denied localio access\n",
+ __func__, (struct sockaddr *)&rqstp->rq_addr);
+ goto out_err;
+ default:
+ status = -ETIMEDOUT;
+ dprintk("%s: client %pISpc temporarily denied localio access\n",
+ __func__, (struct sockaddr *)&rqstp->rq_addr);
+ goto out_err;
+ }
+
+ return rqstp;
+
+out_err:
+ nfsd_local_fakerqst_destroy(rqstp);
+ return ERR_PTR(status);
+}
+
+/*
+ * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
+ *
+ * This function maps a local fh to a path on a local filesystem.
+ * This is useful when the nfs client has the local server mounted - it can
+ * avoid all the NFS overhead with reads, writes and commits.
+ *
+ * On successful return, caller is responsible for calling fput() on *pfilp. Also
+ * note that this is called from nfs.ko via find_symbol() to avoid an explicit
+ * dependency on knfsd. So, there is no forward declaration in a header file
+ * for it.
+ */
+int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp)
+{
+ const struct cred *save_cred;
+ struct svc_rqst *rqstp;
+ struct svc_fh fh;
+ struct nfsd_file *nf;
+ int status = 0;
+ int mayflags = NFSD_MAY_LOCALIO;
+ __be32 beres;
+
+ /* Save creds before calling into nfsd */
+ save_cred = get_current_cred();
+
+ rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
+ if (IS_ERR(rqstp)) {
+ status = PTR_ERR(rqstp);
+ goto out_revertcred;
+ }
+
+ /* nfs_fh -> svc_fh */
+ if (nfs_fh->size > NFS4_FHSIZE) {
+ status = -EINVAL;
+ goto out;
+ }
+ fh_init(&fh, NFS4_FHSIZE);
+ fh.fh_handle.fh_size = nfs_fh->size;
+ memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
+
+ if (fmode & FMODE_READ)
+ mayflags |= NFSD_MAY_READ;
+ if (fmode & FMODE_WRITE)
+ mayflags |= NFSD_MAY_WRITE;
+
+ beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
+ if (beres) {
+ status = nfs_stat_to_errno(be32_to_cpu(beres));
+ dprintk("%s: fh_verify failed %d\n", __func__, status);
+ goto out_fh_put;
+ }
+
+ *pfilp = get_file(nf->nf_file);
+
+ nfsd_file_put(nf);
+out_fh_put:
+ fh_put(&fh);
+
+out:
+ nfsd_local_fakerqst_destroy(rqstp);
+out_revertcred:
+ revert_creds(save_cred);
+ return status;
+}
+EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 77bbd23aa150..9c0610fdd11c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
{ NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
{ NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
{ NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
- { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
+ { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
+ { NFSD_MAY_LOCALIO, "LOCALIO" })
TRACE_EVENT(nfsd_compound,
TP_PROTO(
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 57cd70062048..91c50649a8c7 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -36,6 +36,8 @@
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
+#define NFSD_MAY_LOCALIO 0x800000
+
struct nfsd_file;
/*
@@ -158,6 +160,12 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
void nfsd_filp_close(struct file *fp);
+int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp);
+
static inline int fh_want_write(struct svc_fh *fh)
{
int ret;
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index ceb70a926b95..2dacfe9742c6 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -8,6 +8,8 @@
#ifndef _LINUX_NFS_H
#define _LINUX_NFS_H
+#include <linux/cred.h>
+#include <linux/sunrpc/auth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/string.h>
#include <linux/crc32.h>
@@ -46,6 +48,10 @@ enum nfs3_stable_how {
NFS_INVALID_STABLE_HOW = -1
};
+typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
+ const struct nfs_fh *, const fmode_t,
+ struct file **);
+
#ifdef CONFIG_CRC32
/**
* nfs_fhandle_hash - calculate the crc32 hash for the filehandle
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 039898d70954..a0bb947fdd1d 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -96,6 +96,8 @@ struct nfs_open_context {
struct list_head list;
struct nfs4_threshold *mdsthreshold;
struct rcu_head rcu_head;
+
+ struct file *local_filp;
};
struct nfs_open_dir_context {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 92de074e63b9..00fe469bc72e 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -49,6 +49,7 @@ struct nfs_client {
#define NFS_CS_DS 7 /* - Server is a DS */
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
+#define NFS_CS_LOCAL_IO 10 /* - client is local */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
@@ -125,6 +126,10 @@ struct nfs_client {
struct net *cl_net;
struct list_head pending_cb_stateids;
struct rcu_head rcu;
+
+ /* localio */
+ struct timespec64 cl_nfssvc_boot;
+ seqlock_t cl_boot_lock;
};
/*
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index d09b9773b20c..764513a61601 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1605,6 +1605,7 @@ enum {
NFS_IOHDR_RESEND_PNFS,
NFS_IOHDR_RESEND_MDS,
NFS_IOHDR_UNSTABLE_WRITES,
+ NFS_IOHDR_ODIRECT,
};
struct nfs_io_completion;
--
2.44.0
* [PATCH v5 07/19] NFS: Enable localio for non-pNFS I/O
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Try a local open of the file being read, written or committed, and if
it succeeds, do local I/O instead of sending an RPC.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/pagelist.c | 19 ++++++++++---------
fs/nfs/write.c | 7 ++++++-
2 files changed, 16 insertions(+), 10 deletions(-)
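Reviewer note (below the "---", so not part of the commit message): the
control flow this patch wires up can be modelled in userspace C roughly
as follows. This is only a sketch under stated assumptions:
`struct file_stub`, `try_local_open()`, `initiate_io()` and the
`server_is_local` flag are hypothetical stand-ins, not kernel
interfaces. Only the shape mirrors nfs_local_file_open() and the
`if (localio)` checks in nfs_initiate_pgio()/nfs_initiate_commit(): a
NULL file means "use the normal RPC path".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; not a kernel type. */
struct file_stub {
	int fd;
};

enum io_path {
	IO_PATH_RPC,	/* fall back to rpc_run_task() */
	IO_PATH_LOCAL,	/* bypass the network stack */
};

/*
 * Models nfs_local_file_open(): return NULL when the server is not
 * local, so that the caller falls back to the RPC path.
 */
static struct file_stub *try_local_open(bool server_is_local,
					struct file_stub *backing)
{
	if (!server_is_local)
		return NULL;
	return backing;
}

/*
 * Models the check added to nfs_initiate_pgio() and
 * nfs_initiate_commit(): a non-NULL file selects local I/O,
 * otherwise an RPC task would be started.
 */
static enum io_path initiate_io(struct file_stub *localio)
{
	if (localio)
		return IO_PATH_LOCAL;
	return IO_PATH_RPC;
}
```

The point of returning NULL rather than an error pointer is that callers
can pass the result straight through as the last argument without any
extra branching.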
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index b08420b8e664..3ee78da5ebc4 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1063,6 +1063,7 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
{
struct nfs_pgio_header *hdr;
+ struct file *filp;
int ret;
unsigned short task_flags = 0;
@@ -1074,18 +1075,18 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0) {
+ struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
+
+ filp = nfs_local_file_open(clp, hdr->cred, hdr->args.fh,
+ hdr->args.context);
+
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(desc,
- NFS_SERVER(hdr->inode)->nfs_client,
- NFS_CLIENT(hdr->inode),
- hdr,
- hdr->cred,
- NFS_PROTO(hdr->inode),
- desc->pg_rpc_callops,
- desc->pg_ioflags,
+ ret = nfs_initiate_pgio(desc, clp, NFS_CLIENT(hdr->inode),
+ hdr, hdr->cred, NFS_PROTO(hdr->inode),
+ desc->pg_rpc_callops, desc->pg_ioflags,
RPC_TASK_CRED_NOREF | task_flags,
- NULL);
+ filp);
}
return ret;
}
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b29b0fd5431f..b2c06b8b88cd 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1802,6 +1802,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
struct nfs_commit_info *cinfo)
{
struct nfs_commit_data *data;
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ struct file *filp;
unsigned short task_flags = 0;
/* another commit raced with us */
@@ -1818,9 +1820,12 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
nfs_init_commit(data, head, NULL, cinfo);
if (NFS_SERVER(inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
+
+ filp = nfs_local_file_open(clp, data->cred, data->args.fh,
+ data->context);
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags, NULL);
+ RPC_TASK_CRED_NOREF | task_flags, filp);
}
/*
--
2.44.0
* [PATCH v5 08/19] pnfs/flexfiles: Enable localio for flexfiles I/O
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If the DS is local to this client, use local I/O to read, write and
commit the data instead of sending RPCs to the DS.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 113 ++++++++++++++++++++--
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 ++
3 files changed, 112 insertions(+), 9 deletions(-)
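Reviewer note (below the "---", so not part of the commit message):
ff_local_open_fh() caches the opened file per mirror and publishes it
with cmpxchg(). A userspace model of that "install if empty, otherwise
use the winner's" pattern, written with C11 atomics, might look like the
sketch below. `struct obj`, `obj_new()`, `obj_put()` and `cache_get()`
are hypothetical names for illustration. The sketch deliberately leaves
out what RCU and get_file_rcu() provide in the patch, namely protection
against the cached object being cleared and freed while a reader is
taking its reference.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for struct file. */
struct obj {
	atomic_int refs;
};

static struct obj *obj_new(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		atomic_init(&o->refs, 1);	/* reference owned by the cache slot */
	return o;
}

static void obj_put(struct obj *o)
{
	/* atomic_fetch_sub() returns the previous value */
	if (atomic_fetch_sub(&o->refs, 1) == 1)
		free(o);
}

/*
 * Return the cached object with an extra reference, installing a new
 * one if the slot is empty. If another thread wins the race, the
 * loser's object is dropped and the winner's is used.
 */
static struct obj *cache_get(_Atomic(struct obj *) *slot)
{
	struct obj *cur = atomic_load(slot);
	struct obj *expected = NULL;
	struct obj *new = NULL;

	if (!cur) {
		new = obj_new();
		if (!new)
			return NULL;
		/* try to swap in the pointer */
		if (atomic_compare_exchange_strong(slot, &expected, new)) {
			cur = new;	/* ours is now the cached object */
			new = NULL;
		} else {
			cur = expected;	/* lost the race, use the winner's */
		}
	}
	atomic_fetch_add(&cur->refs, 1);	/* caller's reference */
	if (new)
		obj_put(new);
	return cur;
}
```

As in the patch, the slot keeps its own reference, so whoever clears the
slot (here nobody; in the patch ff_layout_free_mirror() or the
cred-change path) is responsible for the final put.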
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 3ea07446f05a..ec6aaa110a7b 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -11,6 +11,7 @@
#include <linux/nfs_mount.h>
#include <linux/nfs_page.h>
#include <linux/module.h>
+#include <linux/file.h>
#include <linux/sched/mm.h>
#include <linux/sunrpc/metrics.h>
@@ -162,6 +163,52 @@ decode_name(struct xdr_stream *xdr, u32 *id)
return 0;
}
+static struct file *
+ff_local_open_fh(struct pnfs_layout_segment *lseg,
+ u32 ds_idx,
+ struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ fmode_t mode)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct file *filp, *new, __rcu **pfile;
+
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ if (mode & FMODE_WRITE) {
+ /*
+ * Always request read and write access since this corresponds
+ * to a rw layout.
+ */
+ mode |= FMODE_READ;
+ pfile = &mirror->rw_file;
+ } else
+ pfile = &mirror->ro_file;
+
+ new = NULL;
+ rcu_read_lock();
+ filp = rcu_dereference(*pfile);
+ if (!filp) {
+ rcu_read_unlock();
+ new = nfs_local_open_fh(clp, cred, fh, mode);
+ if (IS_ERR(new))
+ return NULL;
+ rcu_read_lock();
+ /* try to swap in the pointer */
+ filp = cmpxchg(pfile, NULL, new);
+ if (!filp) {
+ filp = new;
+ new = NULL;
+ }
+ }
+ filp = get_file_rcu(&filp);
+ rcu_read_unlock();
+ if (new)
+ fput(new);
+ return filp;
+}
+
static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
const struct nfs4_ff_layout_mirror *m2)
{
@@ -237,8 +284,15 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
+ struct file *filp;
const struct cred *cred;
+ filp = rcu_access_pointer(mirror->ro_file);
+ if (filp)
+ fput(filp);
+ filp = rcu_access_pointer(mirror->rw_file);
+ if (filp)
+ fput(filp);
ff_layout_remove_mirror(mirror);
kfree(mirror->fh_versions);
cred = rcu_access_pointer(mirror->ro_cred);
@@ -414,6 +468,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
struct nfs4_ff_layout_mirror *mirror;
struct cred *kcred;
const struct cred __rcu *cred;
+ const struct cred __rcu *old;
kuid_t uid;
kgid_t gid;
u32 ds_count, fh_count, id;
@@ -513,13 +568,26 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
+ struct file *filp;
+
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ old = xchg(&mirror->ro_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->ro_cred, old);
+ /* drop file if creds changed */
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->ro_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
} else {
- cred = xchg(&mirror->rw_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ old = xchg(&mirror->rw_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->rw_cred, old);
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->rw_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
@@ -1757,6 +1825,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1803,12 +1872,20 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
hdr->args.offset = offset;
hdr->mds_offset = offset;
+ /* Start IO accounting for local read */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN, NULL);
+ 0, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1829,6 +1906,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1873,12 +1951,20 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
*/
hdr->args.offset = offset;
+ /* Start IO accounting for local write */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_write_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN, NULL);
+ sync, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1912,6 +1998,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
struct pnfs_layout_segment *lseg = data->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
u32 idx;
@@ -1950,10 +2037,18 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
if (fh)
data->args.fh = fh;
+ /* Start IO accounting for local commit */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ data->task.tk_start = ktime_get();
+ ff_layout_commit_record_layoutstats_start(&data->task, data);
+ }
+
ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
- vers == 3 ? &ff_layout_commit_call_ops_v3 :
- &ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN, NULL);
+ vers == 3 ? &ff_layout_commit_call_ops_v3 :
+ &ff_layout_commit_call_ops_v4,
+ how, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index f84b3fb0dddd..8e042df5a2c9 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -82,7 +82,9 @@ struct nfs4_ff_layout_mirror {
struct nfs_fh *fh_versions;
nfs4_stateid stateid;
const struct cred __rcu *ro_cred;
+ struct file __rcu *ro_file;
const struct cred __rcu *rw_cred;
+ struct file __rcu *rw_file;
refcount_t ref;
spinlock_t lock;
unsigned long flags;
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index e028f5a0ef5f..e58bedfb1dcc 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -395,6 +395,12 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
/* connect success, check rsize/wsize limit */
if (!status) {
+ /*
+ * ds_clp is put in destroy_ds().
+ * keep ds_clp even if DS is local, so that if local IO cannot
+ * proceed somehow, we can fall back to NFS whenever we want.
+ */
+ nfs_local_probe(ds->ds_clp);
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
--
2.44.0
* [PATCH v5 09/19] nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
LOCALIOPROC_GETUUID allows the client to discover the server's uuid.
nfs_local_probe() retrieves the server's uuid via the LOCALIO protocol
and verifies that the server with that uuid is known to be local. This
ensures that client and server 1: support localio and 2: are local to
each other.
While doing so, factor out nfs_init_localioclient() so that it is shared
by nfs3client.c and nfs4client.c.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/client.c | 8 ++--
fs/nfs/internal.h | 19 +++++++++
fs/nfs/localio.c | 87 +++++++++++++++++++++++++++++++++++----
fs/nfs/nfs3_fs.h | 1 +
fs/nfs/nfs3client.c | 25 +++++++++++
fs/nfs/nfs3proc.c | 3 ++
fs/nfs/nfs3xdr.c | 67 ++++++++++++++++++++++++++++++
fs/nfs/nfs4_fs.h | 2 +
fs/nfs/nfs4client.c | 23 +++++++++++
fs/nfs/nfs4proc.c | 3 ++
fs/nfs/nfs4xdr.c | 52 +++++++++++++++++++++++
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfs_xdr.h | 10 +++++
include/uapi/linux/nfs.h | 4 ++
14 files changed, 293 insertions(+), 12 deletions(-)
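Reviewer note (below the "---", so not part of the commit message): the
probe decision described above reduces to "GETUUID succeeded and the
returned uuid belongs to a server on this host". The userspace sketch
below models only that decision. `struct uuid_stub`,
`uuid_is_known_local()` and `probe_should_enable()` are hypothetical
names, and the flat array of local uuids is an assumption made for
illustration; how the kernel actually tracks local server uuids is not
shown in this part of the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN 16	/* same size as the kernel's UUID_SIZE */

/* Hypothetical stand-in for uuid_t. */
struct uuid_stub {
	unsigned char b[UUID_LEN];
};

/* True if @candidate matches one of the @nr_local uuids in @local. */
static bool uuid_is_known_local(const struct uuid_stub *candidate,
				const struct uuid_stub *local,
				size_t nr_local)
{
	size_t i;

	for (i = 0; i < nr_local; i++) {
		if (memcmp(candidate->b, local[i].b, UUID_LEN) == 0)
			return true;
	}
	return false;
}

/*
 * Models the probe decision: localio is only enabled if the GETUUID
 * call succeeded (server supports LOCALIO) and the uuid it returned
 * is known locally (server is on the same host).
 */
static bool probe_should_enable(bool getuuid_ok,
				const struct uuid_stub *server,
				const struct uuid_stub *local,
				size_t nr_local)
{
	if (!getuuid_ok)
		return false;
	return uuid_is_known_local(server, local, nr_local);
}
```

A failed GETUUID and an unknown uuid both lead to the same outcome:
localio stays disabled and I/O continues over the normal RPC path.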
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 9170e6036fd2..7044b8b3b332 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -170,7 +170,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
}
INIT_LIST_HEAD(&clp->cl_superblocks);
- clp->cl_rpcclient = ERR_PTR(-EINVAL);
+ clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
clp->cl_flags = cl_init->init_flags;
clp->cl_proto = cl_init->proto;
@@ -430,8 +430,10 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
- nfs_local_probe(new);
- return rpc_ops->init_client(new, cl_init);
+ new = rpc_ops->init_client(new, cl_init);
+ if (!IS_ERR(new))
+ nfs_local_probe(new);
+ return new;
}
spin_unlock(&nn->nfs_client_lock);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index c933421eb6af..fb2fb59e7ed0 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -452,6 +452,25 @@ extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+/*
+ * Initialise an NFS localio client connection.
+ * Inlined here to allow nfs[34]client.c to share this code.
+ */
+static __always_inline void
+nfs_init_localioclient(struct nfs_client *clp,
+ const struct rpc_program *program, u32 vers)
+{
+ if (unlikely(!IS_ERR(clp->cl_rpcclient_localio)))
+ goto out;
+ clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
+ program, vers);
+out:
+ dfprintk_rcu(CLIENT, "%s: server (%s) %s NFSv%u LOCALIO\n", __func__,
+ rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
+ (IS_ERR(clp->cl_rpcclient_localio) ?
+ "does not support" : "supports"), vers);
+}
+
/* localio.c */
extern void nfs_local_init(void);
extern void nfs_local_disable(struct nfs_client *);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 286cd0ded1b6..54c41933173c 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -15,6 +15,7 @@
#include <linux/sunrpc/addr.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
+#include <linux/nfslocalio.h>
#include <linux/module.h>
#include <linux/bvec.h>
@@ -139,10 +140,14 @@ static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
+static inline bool nfs_client_is_local(const struct nfs_client *clp)
+{
+ return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+}
+
bool nfs_server_is_local(const struct nfs_client *clp)
{
- return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
- localio_enabled;
+ return nfs_client_is_local(clp) && localio_enabled;
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
@@ -219,19 +224,82 @@ nfs_local_disable(struct nfs_client *clp)
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
nfs_local_put_lookup_ctx();
+ if (!IS_ERR(clp->cl_rpcclient_localio)) {
+ rpc_shutdown_client(clp->cl_rpcclient_localio);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ }
}
}
+static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
+{
+ u8 uuid[UUID_SIZE];
+ struct nfs_getuuidres res = {
+ uuid,
+ };
+ struct rpc_message msg = {
+ .rpc_resp = &res,
+ };
+ int status;
+
+ clp->rpc_ops->init_localioclient(clp);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ return false;
+
+ dprintk("%s: NFS issuing getuuid\n", __func__);
+ msg.rpc_proc = &clp->cl_rpcclient_localio->cl_procinfo[LOCALIOPROC_GETUUID];
+ status = rpc_call_sync(clp->cl_rpcclient_localio, &msg, 0);
+ dprintk("%s: NFS reply getuuid: status=%d uuid=%pU\n",
+ __func__, status, res.uuid);
+ if (status)
+ return false;
+
+ import_uuid(nfsd_uuid, res.uuid);
+
+ return true;
+}
+
/*
- * nfs_local_probe - probe local i/o support for an nfs_client
+ * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ * - called after alloc_client and init_client (so cl_rpcclient exists)
+ * - this function is idempotent, it can be called for old or new clients
*/
-void
-nfs_local_probe(struct nfs_client *clp)
+void nfs_local_probe(struct nfs_client *clp)
{
- bool enable = false;
+ uuid_t uuid;
- if (enable)
- nfs_local_enable(clp);
+ if (!localio_enabled)
+ goto unsupported;
+
+ if (nfs_client_is_local(clp)) {
+ /* If already enabled, disable and re-enable */
+ nfs_local_disable(clp);
+ }
+
+ switch (clp->cl_rpcclient->cl_vers) {
+ case 3:
+ case 4:
+ /*
+ * Retrieve the server's uuid via the LOCALIO protocol and verify
+ * that an nfsd with that uuid is known to be local. This ensures
+ * the client and server 1: both support localio and 2: are local
+ * to each other.
+ */
+ if (!nfs_local_server_getuuid(clp, &uuid) ||
+ !nfsd_uuid_is_local(&uuid))
+ goto unsupported;
+ break;
+ default:
+ goto unsupported;
+ }
+
+ dprintk("%s: detected local server.\n", __func__);
+ nfs_local_enable(clp);
+ return;
+
+unsupported:
+ /* localio not supported */
+ nfs_local_disable(clp);
}
EXPORT_SYMBOL_GPL(nfs_local_probe);
@@ -258,7 +326,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
trace_nfs_local_open_fh(fh, mode, status);
switch (status) {
case -ENXIO:
- nfs_local_disable(clp);
+ /* Revalidate localio, will disable if unsupported */
+ nfs_local_probe(clp);
fallthrough;
case -ETIMEDOUT:
status = -EAGAIN;
diff --git a/fs/nfs/nfs3_fs.h b/fs/nfs/nfs3_fs.h
index b333ea119ef5..efdf2b6519e9 100644
--- a/fs/nfs/nfs3_fs.h
+++ b/fs/nfs/nfs3_fs.h
@@ -30,6 +30,7 @@ static inline int nfs3_proc_setacls(struct inode *inode, struct posix_acl *acl,
struct nfs_server *nfs3_create_server(struct fs_context *);
struct nfs_server *nfs3_clone_server(struct nfs_server *, struct nfs_fh *,
struct nfs_fattr *, rpc_authflavor_t);
+void nfs3_init_localioclient(struct nfs_client *);
/* nfs3super.c */
extern struct nfs_subversion nfs_v3;
diff --git a/fs/nfs/nfs3client.c b/fs/nfs/nfs3client.c
index b0c8a39c2bbd..123e7c1fd339 100644
--- a/fs/nfs/nfs3client.c
+++ b/fs/nfs/nfs3client.c
@@ -7,6 +7,8 @@
#include "netns.h"
#include "sysfs.h"
+#define NFSDBG_FACILITY NFSDBG_CLIENT
+
#ifdef CONFIG_NFS_V3_ACL
static struct rpc_stat nfsacl_rpcstat = { &nfsacl_program };
static const struct rpc_version *nfsacl_version[] = {
@@ -130,3 +132,26 @@ struct nfs_client *nfs3_set_ds_client(struct nfs_server *mds_srv,
return clp;
}
EXPORT_SYMBOL_GPL(nfs3_set_ds_client);
+
+#if defined(CONFIG_NFS_V3_LOCALIO)
+static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program3 };
+static const struct rpc_version *nfslocalio_version[] = {
+ [3] = &nfslocalio_version3,
+};
+
+const struct rpc_program nfslocalio_program3 = {
+ .name = "nfslocalio",
+ .number = NFS_LOCALIO_PROGRAM,
+ .nrvers = ARRAY_SIZE(nfslocalio_version),
+ .version = nfslocalio_version,
+ .stats = &nfslocalio_rpcstat,
+};
+
+/*
+ * Initialise an NFSv3 localio client connection
+ */
+void nfs3_init_localioclient(struct nfs_client *clp)
+{
+ nfs_init_localioclient(clp, &nfslocalio_program3, 3);
+}
+#endif /* CONFIG_NFS_V3_LOCALIO */
diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 74bda639a7cf..40b6e4d1e7be 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -1067,4 +1067,7 @@ const struct nfs_rpc_ops nfs_v3_clientops = {
.free_client = nfs_free_client,
.create_server = nfs3_create_server,
.clone_server = nfs3_clone_server,
+#if defined(CONFIG_NFS_V3_LOCALIO)
+ .init_localioclient = nfs3_init_localioclient,
+#endif
};
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index 60f032be805a..d2a17ecd12b8 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2579,3 +2579,70 @@ const struct rpc_version nfsacl_version3 = {
.counts = nfs3_acl_counts,
};
#endif /* CONFIG_NFS_V3_ACL */
+
+#if defined(CONFIG_NFS_V3_LOCALIO)
+
+#define LOCALIO3_getuuidres_sz (1+XDR_QUADLEN(UUID_SIZE))
+
+static void nfs3_xdr_enc_getuuidargs(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ const void *data)
+{
+ /* void function */
+}
+
+// FIXME: factor out from fs/nfs/nfs4xdr.c
+static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
+{
+ ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+ if (unlikely(ret < 0))
+ return -EIO;
+ return 0;
+}
+
+static inline int nfs3_decode_getuuidresok(struct xdr_stream *xdr,
+ struct nfs_getuuidres *result)
+{
+ return decode_opaque_fixed(xdr, result->uuid, UUID_SIZE);
+}
+
+static int nfs3_xdr_dec_getuuidres(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ void *result)
+{
+ enum nfs_stat status;
+ int error;
+
+ error = decode_nfsstat3(xdr, &status);
+ if (unlikely(error))
+ goto out;
+ if (status != NFS3_OK)
+ goto out_default;
+ error = nfs3_decode_getuuidresok(xdr, result);
+out:
+ return error;
+out_default:
+ return nfs3_stat_to_errno(status);
+}
+
+static const struct rpc_procinfo nfs3_localio_procedures[] = {
+ [LOCALIOPROC_GETUUID] = {
+ .p_proc = LOCALIOPROC_GETUUID,
+ .p_encode = nfs3_xdr_enc_getuuidargs,
+ .p_decode = nfs3_xdr_dec_getuuidres,
+ .p_arglen = 1,
+ .p_replen = LOCALIO3_getuuidres_sz,
+ .p_timer = 0,
+ .p_name = "GETUUID",
+ },
+};
+
+static unsigned int nfs3_localio_counts[ARRAY_SIZE(nfs3_localio_procedures)];
+const struct rpc_version nfslocalio_version3 = {
+ .number = 3,
+ .nrprocs = ARRAY_SIZE(nfs3_localio_procedures),
+ .procs = nfs3_localio_procedures,
+ .counts = nfs3_localio_counts,
+};
+
+#endif /* CONFIG_NFS_V3_LOCALIO */
diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 7024230f0d1d..a0a41917dec2 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -538,6 +538,8 @@ extern int nfs4_proc_commit(struct file *dst, __u64 offset, __u32 count, struct
extern const nfs4_stateid zero_stateid;
extern const nfs4_stateid invalid_stateid;
+extern void nfs4_init_localioclient(struct nfs_client *);
+
/* nfs4super.c */
struct nfs_mount_info;
extern struct nfs_subversion nfs_v4;
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 84573df5cf5a..d2f634aa1e1b 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -1384,3 +1384,26 @@ int nfs4_update_server(struct nfs_server *server, const char *hostname,
return nfs_probe_server(server, NFS_FH(d_inode(server->super->s_root)));
}
+
+#if defined(CONFIG_NFS_V4_LOCALIO)
+static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program4 };
+static const struct rpc_version *nfslocalio_version[] = {
+ [4] = &nfslocalio_version4,
+};
+
+const struct rpc_program nfslocalio_program4 = {
+ .name = "nfslocalio",
+ .number = NFS_LOCALIO_PROGRAM,
+ .nrvers = ARRAY_SIZE(nfslocalio_version),
+ .version = nfslocalio_version,
+ .stats = &nfslocalio_rpcstat,
+};
+
+/*
+ * Initialise an NFSv4 localio client connection
+ */
+void nfs4_init_localioclient(struct nfs_client *clp)
+{
+ nfs_init_localioclient(clp, &nfslocalio_program4, 4);
+}
+#endif /* CONFIG_NFS_V4_LOCALIO */
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index c93c12063b3a..060bc8dbee61 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -10745,6 +10745,9 @@ const struct nfs_rpc_ops nfs_v4_clientops = {
.discover_trunking = nfs4_discover_trunking,
.enable_swap = nfs4_enable_swap,
.disable_swap = nfs4_disable_swap,
+#if defined(CONFIG_NFS_V4_LOCALIO)
+ .init_localioclient = nfs4_init_localioclient,
+#endif
};
static const struct xattr_handler nfs4_xattr_nfs4_acl_handler = {
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 1416099dfcd1..d3b4fa3245f0 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -7728,3 +7728,55 @@ const struct rpc_version nfs_version4 = {
.procs = nfs4_procedures,
.counts = nfs_version4_counts,
};
+
+#if defined(CONFIG_NFS_V4_LOCALIO)
+
+#define LOCALIO4_getuuidres_sz (op_decode_hdr_maxsz+XDR_QUADLEN(UUID_SIZE))
+
+static void nfs4_xdr_enc_getuuidargs(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ const void *data)
+{
+ /* void function */
+}
+
+static inline int nfs4_decode_getuuidresok(struct xdr_stream *xdr,
+ struct nfs_getuuidres *result)
+{
+ return decode_opaque_fixed(xdr, result->uuid, UUID_SIZE);
+}
+
+static int nfs4_xdr_dec_getuuidres(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ void *result)
+{
+ // FIXME: need proper handling that isn't abusing nfs_opnum4
+ int error = decode_op_hdr(xdr, LOCALIOPROC_GETUUID);
+ if (unlikely(error))
+ goto out;
+ error = nfs4_decode_getuuidresok(xdr, result);
+out:
+ return error;
+}
+
+static const struct rpc_procinfo nfs4_localio_procedures[] = {
+ [LOCALIOPROC_GETUUID] = {
+ .p_proc = LOCALIOPROC_GETUUID,
+ .p_encode = nfs4_xdr_enc_getuuidargs,
+ .p_decode = nfs4_xdr_dec_getuuidres,
+ .p_arglen = 1,
+ .p_replen = LOCALIO4_getuuidres_sz,
+ .p_statidx = LOCALIOPROC_GETUUID,
+ .p_name = "GETUUID",
+ },
+};
+
+static unsigned int nfs4_localio_counts[ARRAY_SIZE(nfs4_localio_procedures)];
+const struct rpc_version nfslocalio_version4 = {
+ .number = 4,
+ .nrprocs = ARRAY_SIZE(nfs4_localio_procedures),
+ .procs = nfs4_localio_procedures,
+ .counts = nfs4_localio_counts,
+};
+
+#endif /* CONFIG_NFS_V4_LOCALIO */
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 00fe469bc72e..efcdb4d8e9de 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -130,6 +130,7 @@ struct nfs_client {
/* localio */
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
+ struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
};
/*
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 764513a61601..2a438f4c2d6d 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1002,6 +1002,10 @@ struct nfs3_getaclres {
struct posix_acl * acl_default;
};
+struct nfs_getuuidres {
+ __u8 * uuid;
+};
+
#if IS_ENABLED(CONFIG_NFS_V4)
typedef u64 clientid4;
@@ -1819,6 +1823,7 @@ struct nfs_rpc_ops {
int (*discover_trunking)(struct nfs_server *, struct nfs_fh *);
void (*enable_swap)(struct inode *inode);
void (*disable_swap)(struct inode *inode);
+ void (*init_localioclient)(struct nfs_client *);
};
/*
@@ -1834,4 +1839,9 @@ extern const struct rpc_version nfs_version4;
extern const struct rpc_version nfsacl_version3;
extern const struct rpc_program nfsacl_program;
+extern const struct rpc_version nfslocalio_version3;
+extern const struct rpc_program nfslocalio_program3;
+extern const struct rpc_version nfslocalio_version4;
+extern const struct rpc_program nfslocalio_program4;
+
#endif
diff --git a/include/uapi/linux/nfs.h b/include/uapi/linux/nfs.h
index f356f2ba3814..e72f5564bdc0 100644
--- a/include/uapi/linux/nfs.h
+++ b/include/uapi/linux/nfs.h
@@ -33,6 +33,10 @@
#define NFS_MNT_VERSION 1
#define NFS_MNT3_VERSION 3
+#define NFS_LOCALIO_PROGRAM 100229
+#define LOCALIOPROC_NULL 0
+#define LOCALIOPROC_GETUUID 1
+
#define NFS_PIPE_DIRNAME "nfs"
/*
--
2.44.0
* [PATCH v5 10/19] nfsd: implement v3 and v4 server support for NFS_LOCALIO_PROGRAM
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (8 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 09/19] nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 11/19] nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h Mike Snitzer
` (9 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
LOCALIOPROC_GETUUID encodes the server's uuid_t as a fixed-size opaque
of UUID_SIZE (16) bytes. The fixed-size opaque XDR encode and decode
methods are used instead of the less efficient variable-sized methods.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
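To illustrate the fixed-size vs variable-sized point in the commit
message, here is a minimal userspace sketch of the two XDR opaque
layouts. It does not use the kernel's xdr_stream API; QUADLEN,
xdr_put_opaque_fixed and xdr_put_opaque_var are hypothetical names made
up for this example (QUADLEN is meant to mirror XDR_QUADLEN), and the
output buffer is assumed to be large enough.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16
/* Number of 4-byte XDR words needed to hold n bytes (mirrors XDR_QUADLEN). */
#define QUADLEN(n) (((n) + 3) >> 2)

/* Fixed-size opaque: the data plus zero padding, with no length word. */
static size_t xdr_put_opaque_fixed(uint8_t *out, const void *buf, size_t len)
{
	size_t padded = (size_t)QUADLEN(len) << 2;

	memcpy(out, buf, len);
	memset(out + len, 0, padded - len);
	return padded;
}

/* Variable-sized opaque: big-endian 32-bit length, then the padded data. */
static size_t xdr_put_opaque_var(uint8_t *out, const void *buf, size_t len)
{
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	return 4 + xdr_put_opaque_fixed(out + 4, buf, len);
}
```

For a 16-byte uuid the fixed form occupies 4 XDR words and the variable
form 5, which is why the reply sizes in this series are the status (or
op header) plus XDR_QUADLEN(UUID_SIZE) with no extra length word.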
fs/nfsd/localio.c | 148 ++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 11 ++++
fs/nfsd/nfssvc.c | 80 ++++++++++++++++++++++++-
fs/nfsd/xdr.h | 6 ++
4 files changed, 244 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 6e2918e76f49..bb84e165dbe1 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -17,6 +17,9 @@
#include "vfs.h"
#include "netns.h"
#include "filecache.h"
+#include "cache.h"
+#include "xdr3.h"
+#include "xdr4.h"
#define NFSDDBG_FACILITY NFSDDBG_FH
@@ -241,3 +244,148 @@ EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
/* Compile time type checking, not used by anything */
static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
+
+/*
+ * GETUUID XDR encode functions
+ */
+
+static __be32 nfsd_proc_null(struct svc_rqst *rqstp)
+{
+ return rpc_success;
+}
+
+static __be32 nfsd_proc_getuuid(struct svc_rqst *rqstp)
+{
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ struct nfsd_getuuidres *resp = rqstp->rq_resp;
+
+ uuid_copy(&resp->uuid, &nn->nfsd_uuid.uuid);
+ resp->status = nfs_ok;
+
+ return rpc_success;
+}
+
+#define NFS_getuuid_sz XDR_QUADLEN(UUID_SIZE)
+
+static inline void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
+{
+ WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static void encode_uuid(struct xdr_stream *xdr, uuid_t *src_uuid)
+{
+ u8 uuid[UUID_SIZE];
+
+ export_uuid(uuid, src_uuid);
+ encode_opaque_fixed(xdr, uuid, UUID_SIZE);
+ dprintk("%s: uuid=%pU\n", __func__, uuid);
+}
+
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+static bool nfs3svc_encode_getuuidres(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr)
+{
+ struct nfsd_getuuidres *resp = rqstp->rq_resp;
+
+ if (!svcxdr_encode_nfsstat3(xdr, resp->status))
+ return false;
+ if (resp->status == nfs_ok)
+ encode_uuid(xdr, &resp->uuid);
+
+ return true;
+}
+
+#define ST 1 /* status */
+#define NFS3_filename_sz (1+(NFS3_MAXNAMLEN>>2))
+
+static const struct svc_procedure nfsd_localio_procedures3[2] = {
+ [LOCALIOPROC_NULL] = {
+ .pc_func = nfsd_proc_null,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = 1,
+ .pc_name = "NULL",
+ },
+ [LOCALIOPROC_GETUUID] = {
+ .pc_func = nfsd_proc_getuuid,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfs3svc_encode_getuuidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_getuuidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = 1+NFS_getuuid_sz,
+ .pc_name = "GETUUID",
+ },
+};
+
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+ nfsd_localio_count3[ARRAY_SIZE(nfsd_localio_procedures3)]);
+const struct svc_version nfsd_localio_version3 = {
+ .vs_vers = 3,
+ .vs_nproc = 2,
+ .vs_proc = nfsd_localio_procedures3,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_count = nfsd_localio_count3,
+ .vs_xdrsize = NFS3_SVC_XDRSIZE,
+};
+#endif /* CONFIG_NFSD_V3_LOCALIO */
+
+#if defined(CONFIG_NFSD_V4_LOCALIO)
+static bool nfs4svc_encode_getuuidres(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr)
+{
+ struct nfsd_getuuidres *resp = rqstp->rq_resp;
+ __be32 *p;
+
+ p = xdr_reserve_space(xdr, 8);
+ if (!p)
+ return 0;
+ *p++ = cpu_to_be32(LOCALIOPROC_GETUUID);
+ *p++ = resp->status;
+
+ if (resp->status == nfs_ok)
+ encode_uuid(xdr, &resp->uuid);
+
+ return 1;
+}
+
+static const struct svc_procedure nfsd_localio_procedures4[2] = {
+ [LOCALIOPROC_NULL] = {
+ .pc_func = nfsd_proc_null,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = 1,
+ .pc_name = "NULL",
+ },
+ [LOCALIOPROC_GETUUID] = {
+ .pc_func = nfsd_proc_getuuid,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfs4svc_encode_getuuidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_getuuidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = 2+NFS_getuuid_sz,
+ .pc_name = "GETUUID",
+ },
+};
+
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+ nfsd_localio_count4[ARRAY_SIZE(nfsd_localio_procedures4)]);
+const struct svc_version nfsd_localio_version4 = {
+ .vs_vers = 4,
+ .vs_nproc = 2,
+ .vs_proc = nfsd_localio_procedures4,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_count = nfsd_localio_count4,
+ .vs_xdrsize = NFS4_SVC_XDRSIZE,
+ .vs_rpcb_optnl = true,
+ .vs_need_cong_ctrl = true,
+
+};
+#endif /* CONFIG_NFSD_V4_LOCALIO */
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index cec8697b1cd6..4f51f95df294 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -143,6 +143,17 @@ extern const struct svc_version nfsd_acl_version3;
#endif
#endif
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+extern const struct svc_version nfsd_localio_version3;
+#else
+#define nfsd_localio_version3 NULL
+#endif
+#if defined(CONFIG_NFSD_V4_LOCALIO)
+extern const struct svc_version nfsd_localio_version4;
+#else
+#define nfsd_localio_version4 NULL
+#endif
+
struct nfsd_net;
enum vers_op {NFSD_SET, NFSD_CLEAR, NFSD_TEST, NFSD_AVAIL };
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 1222a0a33fe1..a81be9b39399 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -38,6 +38,16 @@
atomic_t nfsd_th_cnt = ATOMIC_INIT(0);
extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+static int nfsd_localio_rpcbind_set(struct net *,
+ const struct svc_program *,
+ u32, int,
+ unsigned short,
+ unsigned short);
+static __be32 nfsd_localio_init_request(struct svc_rqst *,
+ const struct svc_program *,
+ struct svc_process_info *);
+#endif /* CONFIG_NFSD_LOCALIO */
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static int nfsd_acl_rpcbind_set(struct net *,
const struct svc_program *,
@@ -81,6 +91,31 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_mem_used;
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+static const struct svc_version *nfsd_localio_version[] = {
+#if defined(CONFIG_NFSD_V3_LOCALIO)
+ [3] = &nfsd_localio_version3,
+#endif
+#if defined(CONFIG_NFSD_V4_LOCALIO)
+ [4] = &nfsd_localio_version4,
+#endif
+};
+
+#define NFSD_LOCALIO_MINVERS 3
+#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(nfsd_localio_version)
+
+static struct svc_program nfsd_localio_program = {
+ .pg_prog = NFS_LOCALIO_PROGRAM,
+ .pg_nvers = NFSD_LOCALIO_NRVERS,
+ .pg_vers = nfsd_localio_version,
+ .pg_name = "nfslocalio",
+ .pg_class = "nfsd",
+ .pg_authenticate = &svc_set_client,
+ .pg_init_request = nfsd_localio_init_request,
+ .pg_rpcbind_set = nfsd_localio_rpcbind_set,
+};
+#endif /* CONFIG_NFSD_LOCALIO */
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
# if defined(CONFIG_NFSD_V2_ACL)
@@ -95,6 +130,9 @@ static const struct svc_version *nfsd_acl_version[] = {
#define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)
static struct svc_program nfsd_acl_program = {
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_LOCALIO */
.pg_prog = NFS_ACL_PROGRAM,
.pg_nvers = NFSD_ACL_NRVERS,
.pg_vers = nfsd_acl_version,
@@ -123,6 +161,10 @@ static const struct svc_version *nfsd_version[] = {
struct svc_program nfsd_program = {
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
.pg_next = &nfsd_acl_program,
+#else
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_LOCALIO */
#endif
.pg_prog = NFS_PROGRAM, /* program number */
.pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
@@ -832,6 +874,42 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
return error;
}
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+static bool
+nfsd_support_localio_version(int vers)
+{
+ if (vers >= NFSD_LOCALIO_MINVERS && vers < NFSD_LOCALIO_NRVERS)
+ return nfsd_localio_version[vers] != NULL;
+ return false;
+}
+
+static int
+nfsd_localio_rpcbind_set(struct net *net, const struct svc_program *progp,
+ u32 version, int family, unsigned short proto,
+ unsigned short port)
+{
+ if (!nfsd_support_localio_version(version) ||
+ !nfsd_vers(net_generic(net, nfsd_net_id), version, NFSD_TEST))
+ return 0;
+ return svc_generic_rpcbind_set(net, progp, version, family,
+ proto, port);
+}
+
+static __be32
+nfsd_localio_init_request(struct svc_rqst *rqstp,
+ const struct svc_program *progp,
+ struct svc_process_info *ret)
+{
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+
+ if (likely(nfsd_support_localio_version(rqstp->rq_vers) &&
+ nfsd_vers(nn, rqstp->rq_vers, NFSD_TEST)))
+ return svc_generic_init_request(rqstp, progp, ret);
+
+ return rpc_prog_unavail;
+}
+#endif /* CONFIG_NFSD_LOCALIO */
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static bool
nfsd_support_acl_version(int vers)
@@ -974,7 +1052,7 @@ nfsd(void *vrqstp)
}
/**
- * nfsd_dispatch - Process an NFS or NFSACL Request
+ * nfsd_dispatch - Process an NFS or NFSACL or NFSLOCALIO Request
* @rqstp: incoming request
*
* This RPC dispatcher integrates the NFS server's duplicate reply cache.
diff --git a/fs/nfsd/xdr.h b/fs/nfsd/xdr.h
index 852f71580bd0..5714469af597 100644
--- a/fs/nfsd/xdr.h
+++ b/fs/nfsd/xdr.h
@@ -5,6 +5,7 @@
#define LINUX_NFSD_H
#include <linux/vfs.h>
+#include <linux/uuid.h>
#include "nfsd.h"
#include "nfsfh.h"
@@ -123,6 +124,11 @@ struct nfsd_statfsres {
struct kstatfs stats;
};
+struct nfsd_getuuidres {
+ __be32 status;
+ uuid_t uuid;
+};
+
/*
* Storage requirements for XDR arguments and results.
*/
--
2.44.0
* [PATCH v5 11/19] nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (9 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 10/19] nfsd: implement v3 and v4 server " Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 12/19] nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common Mike Snitzer
` (8 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
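Eliminate the duplicate copies of encode_opaque_fixed() and
decode_opaque_fixed() in the flexfiles layout driver, the NFSv3 and
NFSv4 client XDR code, and nfsd's localio.c by defining them once in
include/linux/nfs_xdr.h.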
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
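For readers unfamiliar with the decode helper being consolidated, the
following userspace sketch shows its shape: any failure from the
underlying stream decode is collapsed into -EIO. struct mini_stream and
mini_decode_opaque_fixed() are hypothetical stand-ins for struct
xdr_stream and xdr_stream_decode_opaque_fixed(); only the wrapper's
error mapping is taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, minimal stand-in for the kernel's struct xdr_stream. */
struct mini_stream {
	const uint8_t *p;	/* next unread byte */
	const uint8_t *end;	/* one past the last valid byte */
};

/*
 * Stand-in for the stream decoder: returns the object size on success
 * or a negative errno if the buffer is too short.
 */
static long mini_decode_opaque_fixed(struct mini_stream *s, void *buf,
				     size_t len)
{
	size_t padded = (len + 3) & ~(size_t)3;

	if ((size_t)(s->end - s->p) < padded)
		return -EBADMSG;
	memcpy(buf, s->p, len);
	s->p += padded;
	return (long)len;
}

/* Same shape as the shared helper: every failure becomes -EIO. */
static inline int decode_opaque_fixed(struct mini_stream *s, void *buf,
				      size_t len)
{
	long ret = mini_decode_opaque_fixed(s, buf, len);

	if (ret < 0)
		return -EIO;
	return 0;
}
```

Callers then only have to distinguish zero from non-zero, which is what
makes a single shared definition practical for both client and server.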
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ------
fs/nfs/nfs3xdr.c | 9 ---------
fs/nfs/nfs4xdr.c | 13 -------------
fs/nfsd/localio.c | 7 ++-----
include/linux/nfs_xdr.h | 20 +++++++++++++++++++-
5 files changed, 21 insertions(+), 34 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index ec6aaa110a7b..8b9096ad0663 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2185,12 +2185,6 @@ static int ff_layout_encode_ioerr(struct xdr_stream *xdr,
return ff_layout_encode_ds_ioerr(xdr, &ff_args->errors);
}
-static void
-encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void
ff_layout_encode_ff_iostat_head(struct xdr_stream *xdr,
const nfs4_stateid *stateid,
diff --git a/fs/nfs/nfs3xdr.c b/fs/nfs/nfs3xdr.c
index d2a17ecd12b8..95a2fb0733ae 100644
--- a/fs/nfs/nfs3xdr.c
+++ b/fs/nfs/nfs3xdr.c
@@ -2591,15 +2591,6 @@ static void nfs3_xdr_enc_getuuidargs(struct rpc_rqst *req,
/* void function */
}
-// FIXME: factor out from fs/nfs/nfs4xdr.c
-static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
-{
- ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
- if (unlikely(ret < 0))
- return -EIO;
- return 0;
-}
-
static inline int nfs3_decode_getuuidresok(struct xdr_stream *xdr,
struct nfs_getuuidres *result)
{
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index d3b4fa3245f0..6b35b1d7d7ce 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -968,11 +968,6 @@ static __be32 *reserve_space(struct xdr_stream *xdr, size_t nbytes)
return p;
}
-static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
{
WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0);
@@ -4352,14 +4347,6 @@ static int decode_access(struct xdr_stream *xdr, u32 *supported, u32 *access)
return 0;
}
-static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
-{
- ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
- if (unlikely(ret < 0))
- return -EIO;
- return 0;
-}
-
static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
{
return decode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index bb84e165dbe1..7ecd72406dc0 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -11,6 +11,8 @@
#include <linux/sunrpc/svcauth_gss.h>
#include <linux/sunrpc/clnt.h>
#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
#include <linux/string.h>
#include "nfsd.h"
@@ -267,11 +269,6 @@ static __be32 nfsd_proc_getuuid(struct svc_rqst *rqstp)
#define NFS_getuuid_sz XDR_QUADLEN(UUID_SIZE)
-static inline void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void encode_uuid(struct xdr_stream *xdr, uuid_t *src_uuid)
{
u8 uuid[UUID_SIZE];
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index 2a438f4c2d6d..daa4115f6be6 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1826,6 +1826,24 @@ struct nfs_rpc_ops {
void (*init_localioclient)(struct nfs_client *);
};
+/*
+ * Helper functions used by NFS client and/or server
+ */
+static inline void encode_opaque_fixed(struct xdr_stream *xdr,
+ const void *buf, size_t len)
+{
+ WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static inline int decode_opaque_fixed(struct xdr_stream *xdr,
+ void *buf, size_t len)
+{
+ ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+ if (unlikely(ret < 0))
+ return -EIO;
+ return 0;
+}
+
/*
* Function vectors etc. for the NFS client
*/
@@ -1844,4 +1862,4 @@ extern const struct rpc_program nfslocalio_program3;
extern const struct rpc_version nfslocalio_version4;
extern const struct rpc_program nfslocalio_program4;
-#endif
+#endif /* _LINUX_NFS_XDR_H */
--
2.44.0
* [PATCH v5 12/19] nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (10 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 11/19] nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 21:32 ` Jeff Layton
2024-06-18 20:19 ` [PATCH v5 13/19] nfs/nfsd: ensure localio server always uses its network namespace Mike Snitzer
` (7 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
Get the nfsd_open_local_fh symbol and store it in the nfs_client when
the localio RPC client is initialized; put the symbol during
nfs_local_disable, which is also called during client destruction.
This eliminates the need for nfs_local_open_ctx and the extra locking
and refcounting work in fs/nfs/localio.c.
It also means the reference to the nfsd_open_local_fh symbol is
managed by the nfs_common module instead of the nfs client modules.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
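The ownership change described above can be sketched in userspace
roughly as follows. Every name here (open_fn_t, provider_refs,
get_open_fn, put_open_fn, struct client, ...) is hypothetical and only
models the idea: each client takes one reference on the provider's
function when it is set up and drops it when localio is disabled. The
sketch is single-threaded and is not the code in nfs_common.

```c
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from the patch. */
typedef int (*open_fn_t)(int fh);

static int provider_refs;	/* models the provider module's refcount */
static int provider_loaded = 1;	/* models "is the provider loaded?" */

static int provider_open(int fh)
{
	return fh + 1;
}

/* Models a symbol "get": take a reference, or return NULL. */
static open_fn_t get_open_fn(void)
{
	if (!provider_loaded)
		return NULL;
	provider_refs++;
	return provider_open;
}

/* Models the matching symbol "put". */
static void put_open_fn(void)
{
	provider_refs--;
}

struct client {
	open_fn_t open_fn;	/* per-client copy of the function pointer */
};

static void client_init(struct client *c)
{
	c->open_fn = get_open_fn();
}

/* Safe to call repeatedly: the reference is dropped at most once. */
static void client_disable(struct client *c)
{
	if (c->open_fn) {
		c->open_fn = NULL;
		put_open_fn();
	}
}
```

Keeping the pointer in the client means the hot path only tests a field
instead of taking a global lock, at the cost of one reference per
client rather than one shared reference.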
fs/nfs/client.c | 1 +
fs/nfs/inode.c | 1 -
fs/nfs/internal.h | 18 +++++---
fs/nfs/localio.c | 86 +++-----------------------------------
fs/nfs_common/nfslocalio.c | 26 ++++++++++++
include/linux/nfs.h | 4 --
include/linux/nfs_fs_sb.h | 2 +
include/linux/nfslocalio.h | 8 ++++
8 files changed, 54 insertions(+), 92 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 7044b8b3b332..cbabcdf3d785 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -171,6 +171,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
INIT_LIST_HEAD(&clp->cl_superblocks);
clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ clp->nfsd_open_local_fh = NULL;
clp->cl_flags = cl_init->init_flags;
clp->cl_proto = cl_init->proto;
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index 4f88b860494f..f9923cbf6058 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2499,7 +2499,6 @@ static int __init init_nfs_fs(void)
if (err)
goto out1;
- nfs_local_init();
err = register_nfs_fs();
if (err)
goto out0;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index fb2fb59e7ed0..d30a2e63063c 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -464,15 +464,22 @@ nfs_init_localioclient(struct nfs_client *clp,
goto out;
clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
program, vers);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ goto out;
+ /* No errors! Assume that localio is supported */
+ clp->nfsd_open_local_fh = get_nfsd_open_local_fh();
+ if (!clp->nfsd_open_local_fh) {
+ rpc_shutdown_client(clp->cl_rpcclient_localio);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ }
out:
- dfprintk_rcu(CLIENT, "%s: server (%s) %s NFSv%u LOCALIO\n", __func__,
- rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
- (IS_ERR(clp->cl_rpcclient_localio) ?
- "does not support" : "supports"), vers);
+ dfprintk_rcu(CLIENT, "%s: server (%s) %s NFSv%u LOCALIO, nfsd_open_local_fh is %s.\n",
+ __func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
+ (IS_ERR(clp->cl_rpcclient_localio) ? "does not support" : "supports"), vers,
+ (clp->nfsd_open_local_fh ? "set" : "not set"));
}
/* localio.c */
-extern void nfs_local_init(void);
extern void nfs_local_disable(struct nfs_client *);
extern void nfs_local_probe(struct nfs_client *);
extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
@@ -489,7 +496,6 @@ extern int nfs_local_commit(struct file *, struct nfs_commit_data *,
extern bool nfs_server_is_local(const struct nfs_client *clp);
#else
-static inline void nfs_local_init(void) {}
static inline void nfs_local_disable(struct nfs_client *clp) {}
static inline void nfs_local_probe(struct nfs_client *clp) {}
static inline struct file *nfs_local_open_fh(struct nfs_client *clp,
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 54c41933173c..ddd17549812e 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -29,26 +29,6 @@
#define NFSDBG_FACILITY NFSDBG_VFS
-extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
- const struct cred *cred,
- const struct nfs_fh *nfs_fh, const fmode_t fmode,
- struct file **pfilp);
-/*
- * The localio code needs to call into nfsd to do the filehandle -> struct path
- * mapping, but cannot be statically linked, because that will make the nfs
- * module depend on the nfsd module.
- *
- * Instead, do dynamic linking to the nfsd module. This way the nfs module
- * will only hold a reference on nfsd when it's actually in use. This also
- * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
- */
-
-struct nfs_local_open_ctx {
- spinlock_t lock;
- nfs_to_nfsd_open_t open_f;
- atomic_t refcount;
-};
-
struct nfs_local_kiocb {
struct kiocb kiocb;
struct bio_vec *bvec;
@@ -135,8 +115,6 @@ nfs4errno(int errno)
return NFS4ERR_SERVERFAULT;
}
-static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
-
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
@@ -151,65 +129,12 @@ bool nfs_server_is_local(const struct nfs_client *clp)
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
-void
-nfs_local_init(void)
-{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
-
- ctx->open_f = NULL;
- spin_lock_init(&ctx->lock);
- atomic_set(&ctx->refcount, 0);
-}
-
-static bool
-nfs_local_get_lookup_ctx(void)
-{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
- nfs_to_nfsd_open_t fn = NULL;
-
- spin_lock(&ctx->lock);
- if (ctx->open_f == NULL) {
- spin_unlock(&ctx->lock);
-
- fn = symbol_request(nfsd_open_local_fh);
- if (!fn)
- return false;
-
- spin_lock(&ctx->lock);
- /* catch race */
- if (ctx->open_f == NULL) {
- ctx->open_f = fn;
- fn = NULL;
- }
- }
- atomic_inc(&ctx->refcount);
- spin_unlock(&ctx->lock);
- if (fn)
- symbol_put(nfsd_open_local_fh);
- return true;
-}
-
-static void
-nfs_local_put_lookup_ctx(void)
-{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
- nfs_to_nfsd_open_t fn;
-
- if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
- fn = ctx->open_f;
- ctx->open_f = NULL;
- spin_unlock(&ctx->lock);
- if (fn)
- symbol_put(nfsd_open_local_fh);
- }
-}
-
/*
* nfs_local_enable - attempt to enable local i/o for an nfs_client
*/
static void nfs_local_enable(struct nfs_client *clp)
{
- if (nfs_local_get_lookup_ctx()) {
+ if (READ_ONCE(clp->nfsd_open_local_fh)) {
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
trace_nfs_local_enable(clp);
}
@@ -218,12 +143,12 @@ static void nfs_local_enable(struct nfs_client *clp)
/*
* nfs_local_disable - disable local i/o for an nfs_client
*/
-void
-nfs_local_disable(struct nfs_client *clp)
+void nfs_local_disable(struct nfs_client *clp)
{
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
- nfs_local_put_lookup_ctx();
+ put_nfsd_open_local_fh();
+ clp->nfsd_open_local_fh = NULL;
if (!IS_ERR(clp->cl_rpcclient_localio)) {
rpc_shutdown_client(clp->cl_rpcclient_localio);
clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
@@ -312,14 +237,13 @@ struct file *
nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, const fmode_t mode)
{
- struct nfs_local_open_ctx *ctx = &__local_open_ctx;
struct file *filp;
int status;
if (mode & ~(FMODE_READ | FMODE_WRITE))
return ERR_PTR(-EINVAL);
- status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
+ status = clp->nfsd_open_local_fh(clp->cl_rpcclient, cred, fh, mode, &filp);
if (status < 0) {
dprintk("%s: open local file failed error=%d\n",
__func__, status);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index f214cc6754a1..c454c4100976 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -40,3 +40,29 @@ bool nfsd_uuid_is_local(const uuid_t *uuid)
return !uuid_is_null(nfsd_uuid);
}
EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
+
+/*
+ * The nfs localio code needs to call into nfsd to do the filehandle -> struct path
+ * mapping, but cannot be statically linked, because that will make the nfs module
+ * depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
+ * nfs_common module will only hold a reference on nfsd when localio is in use.
+ * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+
+extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+ const struct cred *cred, const struct nfs_fh *nfs_fh,
+ const fmode_t fmode, struct file **pfilp);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void)
+{
+ return symbol_request(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(get_nfsd_open_local_fh);
+
+void put_nfsd_open_local_fh(void)
+{
+ symbol_put(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(put_nfsd_open_local_fh);
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 2dacfe9742c6..64ed672a0b34 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -48,10 +48,6 @@ enum nfs3_stable_how {
NFS_INVALID_STABLE_HOW = -1
};
-typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
- const struct nfs_fh *, const fmode_t,
- struct file **);
-
#ifdef CONFIG_CRC32
/**
* nfs_fhandle_hash - calculate the crc32 hash for the filehandle
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index efcdb4d8e9de..f5760b05ec87 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -8,6 +8,7 @@
#include <linux/wait.h>
#include <linux/nfs_xdr.h>
#include <linux/sunrpc/xprt.h>
+#include <linux/nfslocalio.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
@@ -131,6 +132,7 @@ struct nfs_client {
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
+ nfs_to_nfsd_open_t nfsd_open_local_fh;
};
/*
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index d0bbacd0adcf..b8df1b9f248d 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -7,6 +7,7 @@
#include <linux/list.h>
#include <linux/uuid.h>
+#include <linux/nfs.h>
/*
* Global list of nfsd_uuid_t instances, add/remove
@@ -26,4 +27,11 @@ typedef struct {
bool nfsd_uuid_is_local(const uuid_t *uuid);
+typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
+ const struct nfs_fh *, const fmode_t,
+ struct file **);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
+void put_nfsd_open_local_fh(void);
+
#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0
* [PATCH v5 13/19] nfs/nfsd: ensure localio server always uses its network namespace
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (11 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 12/19] nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 21:36 ` Jeff Layton
2024-06-18 20:19 ` [PATCH v5 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh Mike Snitzer
` (6 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
Pass the stored cl_nfssvc_net from the client to the server as first
argument to nfsd_open_local_fh() to ensure the proper network
namespace is used for localio.
Before this commit, the nfs_client's network namespace was used (as
extracted from the client's cl_rpcclient), which cannot work properly
if the client and server are in different network namespaces.
Elected to not rename the nfsd_uuid_t structure despite it growing a
non-uuid member. Can revisit later.
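To illustrate the lookup change, here is a minimal userspace sketch (not
kernel code) of a UUID lookup that also reports the owning namespace
through an out-parameter, in the way nfsd_uuid_is_local() gains a
struct net ** argument in this patch. The names model_net,
model_uuid_entry and model_uuid_is_local are hypothetical stand-ins, and
a plain array replaces the RCU-protected list the real code walks.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct net and nfsd_uuid_t. */
struct model_net { int id; };

struct model_uuid_entry {
	unsigned char uuid[16];
	struct model_net *net;	/* namespace the server instance runs in */
};

/*
 * Return true if @uuid is registered; on a match, store the server's
 * namespace in *netp. *netp is left untouched when nothing matches,
 * so callers should initialise it (the patch initialises it to NULL).
 */
static bool model_uuid_is_local(const struct model_uuid_entry *tbl, size_t n,
				const unsigned char uuid[16],
				struct model_net **netp)
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(tbl[i].uuid, uuid, 16) == 0) {
			*netp = tbl[i].net;
			return true;
		}
	}
	return false;
}
```

The caller keeps the returned pointer and hands it back to the server on
each open, so the server no longer derives the namespace from the
client's RPC handle.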
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/client.c | 1 +
fs/nfs/localio.c | 12 ++++++++----
fs/nfs_common/nfslocalio.c | 15 +++++++++------
fs/nfsd/localio.c | 9 +++++----
fs/nfsd/nfssvc.c | 1 +
fs/nfsd/vfs.h | 3 ++-
include/linux/nfs_fs_sb.h | 1 +
include/linux/nfslocalio.h | 10 ++++++----
8 files changed, 33 insertions(+), 19 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index cbabcdf3d785..40077ad08ccb 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -171,6 +171,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
INIT_LIST_HEAD(&clp->cl_superblocks);
clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ clp->cl_nfssvc_net = NULL;
clp->nfsd_open_local_fh = NULL;
clp->cl_flags = cl_init->init_flags;
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index ddd17549812e..d41130f5a84d 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -132,10 +132,11 @@ EXPORT_SYMBOL_GPL(nfs_server_is_local);
/*
* nfs_local_enable - attempt to enable local i/o for an nfs_client
*/
-static void nfs_local_enable(struct nfs_client *clp)
+static void nfs_local_enable(struct nfs_client *clp, struct net *net)
{
if (READ_ONCE(clp->nfsd_open_local_fh)) {
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ clp->cl_nfssvc_net = net;
trace_nfs_local_enable(clp);
}
}
@@ -153,6 +154,7 @@ void nfs_local_disable(struct nfs_client *clp)
rpc_shutdown_client(clp->cl_rpcclient_localio);
clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
}
+ clp->cl_nfssvc_net = NULL;
}
}
@@ -192,6 +194,7 @@ static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
void nfs_local_probe(struct nfs_client *clp)
{
uuid_t uuid;
+ struct net *net = NULL;
if (!localio_enabled)
goto unsupported;
@@ -211,7 +214,7 @@ void nfs_local_probe(struct nfs_client *clp)
* by verifying client's nfsd, with specified uuid, is local.
*/
if (!nfs_local_server_getuuid(clp, &uuid) ||
- !nfsd_uuid_is_local(&uuid))
+ !nfsd_uuid_is_local(&uuid, &net))
goto unsupported;
break;
default:
@@ -219,7 +222,7 @@ void nfs_local_probe(struct nfs_client *clp)
}
dprintk("%s: detected local server.\n", __func__);
- nfs_local_enable(clp);
+ nfs_local_enable(clp, net);
return;
unsupported:
@@ -243,7 +246,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
if (mode & ~(FMODE_READ | FMODE_WRITE))
return ERR_PTR(-EINVAL);
- status = clp->nfsd_open_local_fh(clp->cl_rpcclient, cred, fh, mode, &filp);
+ status = clp->nfsd_open_local_fh(clp->cl_nfssvc_net, clp->cl_rpcclient,
+ cred, fh, mode, &filp);
if (status < 0) {
dprintk("%s: open local file failed error=%d\n",
__func__, status);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index c454c4100976..086e09b3ec38 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -12,29 +12,32 @@ MODULE_LICENSE("GPL");
/*
* Global list of nfsd_uuid_t instances, add/remove
* is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
- * Reads are protected RCU read lock (see below).
+ * Reads are protected by RCU read lock (see below).
*/
LIST_HEAD(nfsd_uuids);
EXPORT_SYMBOL(nfsd_uuids);
/* Must be called with RCU read lock held. */
-static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid)
+static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid,
+ struct net **netp)
{
nfsd_uuid_t *nfsd_uuid;
list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
- if (uuid_equal(&nfsd_uuid->uuid, uuid))
+ if (uuid_equal(&nfsd_uuid->uuid, uuid)) {
+ *netp = nfsd_uuid->net;
return &nfsd_uuid->uuid;
+ }
return &uuid_null;
}
-bool nfsd_uuid_is_local(const uuid_t *uuid)
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp)
{
const uuid_t *nfsd_uuid;
rcu_read_lock();
- nfsd_uuid = nfsd_uuid_lookup(uuid);
+ nfsd_uuid = nfsd_uuid_lookup(uuid, netp);
rcu_read_unlock();
return !uuid_is_null(nfsd_uuid);
@@ -51,7 +54,7 @@ EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
* This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
*/
-extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+extern int nfsd_open_local_fh(struct net *, struct rpc_clnt *rpc_clnt,
const struct cred *cred, const struct nfs_fh *nfs_fh,
const fmode_t fmode, struct file **pfilp);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 7ecd72406dc0..34678bfed579 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -103,10 +103,10 @@ nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
}
static struct svc_rqst *
-nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
+nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
+ const struct cred *cred)
{
struct svc_rqst *rqstp;
- struct net *net = rpc_net_ns(rpc_clnt);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
int status;
@@ -186,7 +186,8 @@ nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
* dependency on knfsd. So, there is no forward declaration in a header file
* for it.
*/
-int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+int nfsd_open_local_fh(struct net *net,
+ struct rpc_clnt *rpc_clnt,
const struct cred *cred,
const struct nfs_fh *nfs_fh,
const fmode_t fmode,
@@ -203,7 +204,7 @@ int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
/* Save creds before calling into nfsd */
save_cred = get_current_cred();
- rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
+ rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
if (IS_ERR(rqstp)) {
status = PTR_ERR(rqstp);
goto out_revertcred;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index a81be9b39399..48bfd3c6d619 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -473,6 +473,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#endif
#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ nn->nfsd_uuid.net = net;
list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 91c50649a8c7..af07bb146e81 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -160,7 +160,8 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
void nfsd_filp_close(struct file *fp);
-int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
+int nfsd_open_local_fh(struct net *net,
+ struct rpc_clnt *rpc_clnt,
const struct cred *cred,
const struct nfs_fh *nfs_fh,
const fmode_t fmode,
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index f5760b05ec87..f47ea512eb0a 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -132,6 +132,7 @@ struct nfs_client {
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
+ struct net * cl_nfssvc_net;
nfs_to_nfsd_open_t nfsd_open_local_fh;
};
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index b8df1b9f248d..c9592ad0afe2 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -8,6 +8,7 @@
#include <linux/list.h>
#include <linux/uuid.h>
#include <linux/nfs.h>
+#include <net/net_namespace.h>
/*
* Global list of nfsd_uuid_t instances, add/remove
@@ -23,13 +24,14 @@ extern struct list_head nfsd_uuids;
typedef struct {
uuid_t uuid;
struct list_head list;
+ struct net *net; /* nfsd's network namespace */
} nfsd_uuid_t;
-bool nfsd_uuid_is_local(const uuid_t *uuid);
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp);
-typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
- const struct nfs_fh *, const fmode_t,
- struct file **);
+typedef int (*nfs_to_nfsd_open_t)(struct net *, struct rpc_clnt *,
+ const struct cred *, const struct nfs_fh *,
+ const fmode_t, struct file **);
nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
void put_nfsd_open_local_fh(void);
--
2.44.0
* [PATCH v5 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (12 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 13/19] nfs/nfsd: ensure localio server always uses its network namespace Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 15/19] nfsd: prepare to use SRCU to dereference nn->nfsd_serv Mike Snitzer
` (5 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
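Use maybe_get_net() and put_net() in nfsd_open_local_fh() so that a
reference on the server's network namespace is held while it is used.
Also refactor nfsd_open_local_fh() slightly: the filehandle size check
now happens first, and the nn->nfsd_serv check moves here from
nfsd_local_fakerqst_create(), which now takes the svc_serv as an
argument.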
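The control flow can be sketched in userspace as follows. This is a
model under stated assumptions, not the kernel implementation:
model_ns, model_maybe_get, model_put and model_open are hypothetical
names, and the "take a reference only if the count is still non-zero"
behaviour is what maybe_get_net() is assumed to provide.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical refcounted object standing in for struct net. */
struct model_ns {
	atomic_int count;	/* 0 means teardown has begun */
	bool serv_running;	/* stands in for nn->nfsd_serv != NULL */
};

/* Take a reference only if the count is still non-zero. */
static struct model_ns *model_maybe_get(struct model_ns *ns)
{
	int old = atomic_load(&ns->count);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&ns->count, &old, old + 1))
			return ns;
	}
	return NULL;
}

static void model_put(struct model_ns *ns)
{
	atomic_fetch_sub(&ns->count, 1);
}

/* Mirrors the exit paths of the refactored nfsd_open_local_fh(). */
static int model_open(struct model_ns *ns)
{
	int status = 0;

	ns = model_maybe_get(ns);
	if (!ns)
		return -ENXIO;	/* namespace going away: nothing to drop */

	if (!ns->serv_running) {
		status = -ENXIO;	/* server not running */
		goto out_put;
	}
	/* ... the real function builds the fake rqst and opens the file ... */
out_put:
	model_put(ns);
	return status;
}
```

The point of the single out_put label is that every path taken after a
successful get drops exactly one reference, while the early return
drops none.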
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 45 ++++++++++++++++++++++++++-------------------
1 file changed, 26 insertions(+), 19 deletions(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 34678bfed579..cdf8e115b33e 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -104,18 +104,11 @@ nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
static struct svc_rqst *
nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
- const struct cred *cred)
+ const struct cred *cred, struct svc_serv *serv)
{
struct svc_rqst *rqstp;
- struct nfsd_net *nn = net_generic(net, nfsd_net_id);
int status;
- /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
- if (unlikely(!READ_ONCE(nn->nfsd_serv))) {
- dprintk("%s: localio denied. Server not running\n", __func__);
- return ERR_PTR(-ENXIO);
- }
-
rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
if (!rqstp)
return ERR_PTR(-ENOMEM);
@@ -125,13 +118,13 @@ nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
status = -ENOMEM;
goto out_err;
}
-
rqstp->rq_xprt->xpt_net = net;
+
__set_bit(RQ_SECURE, &rqstp->rq_flags);
rqstp->rq_proc = 1;
rqstp->rq_vers = 3;
rqstp->rq_prot = IPPROTO_TCP;
- rqstp->rq_server = nn->nfsd_serv;
+ rqstp->rq_server = serv;
/* Note: we're connecting to ourself, so source addr == peer addr */
rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
@@ -193,28 +186,44 @@ int nfsd_open_local_fh(struct net *net,
const fmode_t fmode,
struct file **pfilp)
{
+ struct nfsd_net *nn;
const struct cred *save_cred;
struct svc_rqst *rqstp;
struct svc_fh fh;
struct nfsd_file *nf;
int status = 0;
int mayflags = NFSD_MAY_LOCALIO;
+ struct svc_serv *serv;
__be32 beres;
+ if (nfs_fh->size > NFS4_FHSIZE)
+ return -EINVAL;
+
+ /* Not running in nfsd context, must safely get reference on nfsd_serv */
+ net = maybe_get_net(net);
+ if (!net) {
+ dprintk("%s: localio denied. Server netns not available\n", __func__);
+ return -ENXIO;
+ }
+ nn = net_generic(net, nfsd_net_id);
+
+ serv = READ_ONCE(nn->nfsd_serv);
+ if (unlikely(!serv)) {
+ dprintk("%s: localio denied. Server not running\n", __func__);
+ status = -ENXIO;
+ goto out_net;
+ }
+
/* Save creds before calling into nfsd */
save_cred = get_current_cred();
- rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
+ rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred, serv);
if (IS_ERR(rqstp)) {
status = PTR_ERR(rqstp);
goto out_revertcred;
}
/* nfs_fh -> svc_fh */
- if (nfs_fh->size > NFS4_FHSIZE) {
- status = -EINVAL;
- goto out;
- }
fh_init(&fh, NFS4_FHSIZE);
fh.fh_handle.fh_size = nfs_fh->size;
memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
@@ -230,17 +239,15 @@ int nfsd_open_local_fh(struct net *net,
dprintk("%s: fh_verify failed %d\n", __func__, status);
goto out_fh_put;
}
-
*pfilp = get_file(nf->nf_file);
-
nfsd_file_put(nf);
out_fh_put:
fh_put(&fh);
-
-out:
nfsd_local_fakerqst_destroy(rqstp);
out_revertcred:
revert_creds(save_cred);
+out_net:
+ put_net(net);
return status;
}
EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
--
2.44.0
* [PATCH v5 15/19] nfsd: prepare to use SRCU to dereference nn->nfsd_serv
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (13 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 16/19] nfsd: " Mike Snitzer
` (4 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
The next commit switches the nfsd_serv member of struct nfsd_net over
to a void pointer (void __rcu *). Prepare for this by assigning
nn->nfsd_serv to a struct svc_serv pointer that is then dereferenced.
This eliminates what would otherwise be numerous void pointer
dereferences after the next commit.
All nfsd code was audited so that methods that hold nfsd_mutex will
continue to directly dereference nn->nfsd_serv.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/nfsctl.c | 21 +++++++++++++--------
fs/nfsd/nfssvc.c | 34 ++++++++++++++++++----------------
2 files changed, 31 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index e5d2cc74ef77..1bddbbf7418e 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -687,10 +687,11 @@ static ssize_t write_versions(struct file *file, char *buf, size_t size)
static ssize_t __write_ports_names(char *buf, struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv = nn->nfsd_serv;
- if (nn->nfsd_serv == NULL)
+ if (serv == NULL)
return 0;
- return svc_xprt_names(nn->nfsd_serv, buf, SIMPLE_TRANSACTION_LIMIT);
+ return svc_xprt_names(serv, buf, SIMPLE_TRANSACTION_LIMIT);
}
/*
@@ -717,7 +718,7 @@ static ssize_t __write_ports_addfd(char *buf, struct net *net, const struct cred
serv = nn->nfsd_serv;
err = svc_addsock(serv, net, fd, buf, SIMPLE_TRANSACTION_LIMIT, cred);
- if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
+ if (!serv->sv_nrthreads && list_empty(&serv->sv_permsocks))
nfsd_destroy_serv(net);
return err;
@@ -765,7 +766,7 @@ static ssize_t __write_ports_addxprt(char *buf, struct net *net, const struct cr
svc_xprt_put(xprt);
}
out_err:
- if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
+ if (!serv->sv_nrthreads && list_empty(&serv->sv_permsocks))
nfsd_destroy_serv(net);
return err;
@@ -1674,6 +1675,7 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
int *nthreads, count = 0, nrpools, i, ret = -EOPNOTSUPP, rem;
struct net *net = genl_info_net(info);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv;
const struct nlattr *attr;
const char *scope = NULL;
@@ -1708,7 +1710,8 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
info->attrs[NFSD_A_SERVER_LEASETIME] ||
info->attrs[NFSD_A_SERVER_SCOPE]) {
ret = -EBUSY;
- if (nn->nfsd_serv && nn->nfsd_serv->sv_nrthreads)
+ serv = nn->nfsd_serv;
+ if (serv && serv->sv_nrthreads)
goto out_unlock;
ret = -EINVAL;
@@ -1757,6 +1760,7 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
{
struct net *net = genl_info_net(info);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv;
void *hdr;
int err;
@@ -1781,11 +1785,12 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
if (err)
goto err_unlock;
- if (nn->nfsd_serv) {
+ serv = nn->nfsd_serv;
+ if (serv) {
int i;
for (i = 0; i < nfsd_nrpools(net); ++i) {
- struct svc_pool *sp = &nn->nfsd_serv->sv_pools[i];
+ struct svc_pool *sp = &serv->sv_pools[i];
err = nla_put_u32(skb, NFSD_A_SERVER_THREADS,
atomic_read(&sp->sp_nrthreads));
@@ -2103,7 +2108,7 @@ int nfsd_nl_listener_set_doit(struct sk_buff *skb, struct genl_info *info)
err = ret;
}
- if (!serv->sv_nrthreads && list_empty(&nn->nfsd_serv->sv_permsocks))
+ if (!serv->sv_nrthreads && list_empty(&serv->sv_permsocks))
nfsd_destroy_serv(net);
out_unlock_mtx:
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 48bfd3c6d619..bfc58001dd9a 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -309,10 +309,12 @@ int nfsd_nrthreads(struct net *net)
{
int rv = 0;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv;
mutex_lock(&nfsd_mutex);
- if (nn->nfsd_serv)
- rv = nn->nfsd_serv->sv_nrthreads;
+ serv = nn->nfsd_serv;
+ if (serv)
+ rv = serv->sv_nrthreads;
mutex_unlock(&nfsd_mutex);
return rv;
}
@@ -321,16 +323,17 @@ static int nfsd_init_socks(struct net *net, const struct cred *cred)
{
int error;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv = nn->nfsd_serv;
- if (!list_empty(&nn->nfsd_serv->sv_permsocks))
+ if (!list_empty(&serv->sv_permsocks))
return 0;
- error = svc_xprt_create(nn->nfsd_serv, "udp", net, PF_INET, NFS_PORT,
+ error = svc_xprt_create(serv, "udp", net, PF_INET, NFS_PORT,
SVC_SOCK_DEFAULTS, cred);
if (error < 0)
return error;
- error = svc_xprt_create(nn->nfsd_serv, "tcp", net, PF_INET, NFS_PORT,
+ error = svc_xprt_create(serv, "tcp", net, PF_INET, NFS_PORT,
SVC_SOCK_DEFAULTS, cred);
if (error < 0)
return error;
@@ -742,11 +745,12 @@ int nfsd_create_serv(struct net *net)
int nfsd_nrpools(struct net *net)
{
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv = nn->nfsd_serv;
- if (nn->nfsd_serv == NULL)
+ if (serv == NULL)
return 0;
else
- return nn->nfsd_serv->sv_nrpools;
+ return serv->sv_nrpools;
}
int nfsd_get_nrthreads(int n, int *nthreads, struct net *net)
@@ -780,14 +784,15 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
int tot = 0;
int err = 0;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_serv *serv = nn->nfsd_serv;
lockdep_assert_held(&nfsd_mutex);
- if (nn->nfsd_serv == NULL || n <= 0)
+ if (serv == NULL || n <= 0)
return 0;
- if (n > nn->nfsd_serv->sv_nrpools)
- n = nn->nfsd_serv->sv_nrpools;
+ if (n > serv->sv_nrpools)
+ n = serv->sv_nrpools;
/* enforce a global maximum number of threads */
tot = 0;
@@ -810,18 +815,15 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
/* apply the new numbers */
for (i = 0; i < n; i++) {
- err = svc_set_num_threads(nn->nfsd_serv,
- &nn->nfsd_serv->sv_pools[i],
+ err = svc_set_num_threads(serv, &serv->sv_pools[i],
nthreads[i]);
if (err)
goto out;
}
/* Anything undefined in array is considered to be 0 */
- for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
- err = svc_set_num_threads(nn->nfsd_serv,
- &nn->nfsd_serv->sv_pools[i],
- 0);
+ for (i = n; i < serv->sv_nrpools; ++i) {
+ err = svc_set_num_threads(serv, &serv->sv_pools[i], 0);
if (err)
goto out;
}
--
2.44.0
* [PATCH v5 16/19] nfsd: use SRCU to dereference nn->nfsd_serv
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (14 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 15/19] nfsd: prepare to use SRCU to dereference nn->nfsd_serv Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-19 12:39 ` Jeff Layton
2024-06-18 20:19 ` [PATCH v5 17/19] nfsd/localio: use SRCU to dereference nn->nfsd_serv in nfsd_open_local_fh Mike Snitzer
` (3 subsequent siblings)
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
Introduce nfsd_serv_get, nfsd_serv_put and nfsd_serv_sync and update
the nfsd code to prevent nfsd_destroy_serv from destroying
nn->nfsd_serv until all nfsd code is done with it (particularly the
localio code that doesn't run in the context of nfsd's svc threads,
nor does it take the nfsd_mutex).
Commit 83d5e5b0af90 ("dm: optimize use SRCU and RCU") provided a
familiar, well-worn pattern for how to implement this.
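The caller-side discipline this introduces can be sketched in userspace
as below. The stubs only model control flow: model_serv_get and
model_serv_put are hypothetical stand-ins for nfsd_serv_get() and
nfsd_serv_put(), a plain counter replaces the SRCU read section, and
nothing here is thread-safe. The NULL check is an assumption of the
sketch; several callers in this patch dereference the result directly.

```c
#include <stddef.h>

/* Hypothetical stand-ins; the real types live in fs/nfsd and sunrpc. */
struct model_serv { unsigned int sv_nrthreads; };

struct model_nn {
	struct model_serv *serv;	/* NULL once the server is destroyed */
	int readers;			/* models open SRCU read sections */
};

/* Stub for nfsd_serv_get(): enter a read section, return the pointer. */
static struct model_serv *model_serv_get(struct model_nn *nn, int *idx)
{
	*idx = nn->readers++;
	return nn->serv;
}

/* Stub for nfsd_serv_put(): leave the read section. */
static void model_serv_put(struct model_nn *nn, int idx)
{
	(void)idx;
	nn->readers--;
}

/* Caller pattern: get, check for NULL, use, put on every exit path. */
static unsigned int model_nrthreads(struct model_nn *nn)
{
	int idx;
	unsigned int rv = 0;
	struct model_serv *serv = model_serv_get(nn, &idx);

	if (serv)
		rv = serv->sv_nrthreads;
	model_serv_put(nn, idx);
	return rv;
}
```

In the real code the destroy path would wait for all such read sections
to finish (nfsd_serv_sync() in this patch) before freeing the svc_serv.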
Suggested-by: NeilBrown <neilb@suse.de>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/filecache.c | 13 ++++++++---
fs/nfsd/netns.h | 12 ++++++++--
fs/nfsd/nfs4state.c | 25 ++++++++++++++-------
fs/nfsd/nfsctl.c | 7 ++++--
fs/nfsd/nfssvc.c | 55 ++++++++++++++++++++++++++++++++++++---------
5 files changed, 87 insertions(+), 25 deletions(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 99631fa56662..474b3a3af3fb 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -413,12 +413,15 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
struct nfsd_file *nf = list_first_entry(dispose,
struct nfsd_file, nf_lru);
struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
+ int srcu_idx;
+ struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
struct nfsd_fcache_disposal *l = nn->fcache_disposal;
spin_lock(&l->lock);
list_move_tail(&nf->nf_lru, &l->freeme);
spin_unlock(&l->lock);
- svc_wake_up(nn->nfsd_serv);
+ svc_wake_up(serv);
+ nfsd_serv_put(nn, srcu_idx);
}
}
@@ -443,11 +446,15 @@ void nfsd_file_net_dispose(struct nfsd_net *nn)
for (i = 0; i < 8 && !list_empty(&l->freeme); i++)
list_move(l->freeme.next, &dispose);
spin_unlock(&l->lock);
- if (!list_empty(&l->freeme))
+ if (!list_empty(&l->freeme)) {
+ int srcu_idx;
+ struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
/* Wake up another thread to share the work
* *before* doing any actual disposing.
*/
- svc_wake_up(nn->nfsd_serv);
+ svc_wake_up(serv);
+ nfsd_serv_put(nn, srcu_idx);
+ }
nfsd_file_dispose_list(&dispose);
}
}
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 0c5a1d97e4ac..0eebcc03bcd3 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -139,8 +139,12 @@ struct nfsd_net {
u32 clverifier_counter;
struct svc_info nfsd_info;
-#define nfsd_serv nfsd_info.serv
-
+ /*
+ * The current 'nfsd_serv' at nfsd_info.serv
+ * Use nfsd_serv_get() or take nfsd_mutex to dereference.
+ */
+ void __rcu *nfsd_serv;
+ struct srcu_struct nfsd_serv_srcu;
/*
* clientid and stateid data for construction of net unique COPY
@@ -225,6 +229,10 @@ struct nfsd_net {
extern bool nfsd_support_version(int vers);
extern void nfsd_netns_free_versions(struct nfsd_net *nn);
+extern struct svc_serv *nfsd_serv_get(struct nfsd_net *nn, int *srcu_idx);
+extern void nfsd_serv_put(struct nfsd_net *nn, int srcu_idx);
+extern void nfsd_serv_sync(struct nfsd_net *nn);
+
extern unsigned int nfsd_net_id;
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a20c2c9d7d45..8876810e569d 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1919,6 +1919,8 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn
u32 num = ca->maxreqs;
unsigned long avail, total_avail;
unsigned int scale_factor;
+ int srcu_idx;
+ struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
spin_lock(&nfsd_drc_lock);
if (nfsd_drc_max_mem > nfsd_drc_mem_used)
@@ -1940,7 +1942,7 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn
* Give the client one slot even if that would require
* over-allocation--it is better than failure.
*/
- scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
+ scale_factor = max_t(unsigned int, 8, serv->sv_nrthreads);
avail = clamp_t(unsigned long, avail, slotsize,
total_avail/scale_factor);
@@ -1949,6 +1951,8 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn
nfsd_drc_mem_used += num * slotsize;
spin_unlock(&nfsd_drc_lock);
+ nfsd_serv_put(nn, srcu_idx);
+
return num;
}
@@ -3702,12 +3706,16 @@ nfsd4_replay_create_session(struct nfsd4_create_session *cr_ses,
static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
{
- u32 maxrpc = nn->nfsd_serv->sv_max_mesg;
+ int srcu_idx;
+ struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
+ u32 maxrpc = serv->sv_max_mesg;
+ __be32 status = nfs_ok;
- if (ca->maxreq_sz < NFSD_MIN_REQ_HDR_SEQ_SZ)
- return nfserr_toosmall;
- if (ca->maxresp_sz < NFSD_MIN_RESP_HDR_SEQ_SZ)
- return nfserr_toosmall;
+ if (ca->maxreq_sz < NFSD_MIN_REQ_HDR_SEQ_SZ ||
+ ca->maxresp_sz < NFSD_MIN_RESP_HDR_SEQ_SZ) {
+ status = nfserr_toosmall;
+ goto out;
+ }
ca->headerpadsz = 0;
ca->maxreq_sz = min_t(u32, ca->maxreq_sz, maxrpc);
ca->maxresp_sz = min_t(u32, ca->maxresp_sz, maxrpc);
@@ -3726,8 +3734,9 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
* accounting is soft and provides no guarantees either way.
*/
ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
-
- return nfs_ok;
+out:
+ nfsd_serv_put(nn, srcu_idx);
+ return status;
}
/*
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 1bddbbf7418e..2d4c29c25c6a 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -1569,10 +1569,12 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
{
struct nfsd_net *nn = net_generic(sock_net(skb->sk), nfsd_net_id);
int i, ret, rqstp_index = 0;
+ int srcu_idx;
+ struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
rcu_read_lock();
- for (i = 0; i < nn->nfsd_serv->sv_nrpools; i++) {
+ for (i = 0; i < serv->sv_nrpools; i++) {
struct svc_rqst *rqstp;
if (i < cb->args[0]) /* already consumed */
@@ -1580,7 +1582,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
rqstp_index = 0;
list_for_each_entry_rcu(rqstp,
- &nn->nfsd_serv->sv_pools[i].sp_all_threads,
+ &serv->sv_pools[i].sp_all_threads,
rq_all) {
struct nfsd_genl_rqstp genl_rqstp;
unsigned int status_counter;
@@ -1645,6 +1647,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
ret = skb->len;
out:
rcu_read_unlock();
+ nfsd_serv_put(nn, srcu_idx);
return ret;
}
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index bfc58001dd9a..f84530f95eb8 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -300,6 +300,26 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
return 0;
}
+struct svc_serv *nfsd_serv_get(struct nfsd_net *nn, int *srcu_idx)
+ __acquires(nn->nfsd_serv_srcu)
+{
+ *srcu_idx = srcu_read_lock(&nn->nfsd_serv_srcu);
+
+ return srcu_dereference(nn->nfsd_serv, &nn->nfsd_serv_srcu);
+}
+
+void nfsd_serv_put(struct nfsd_net *nn, int srcu_idx)
+ __releases(nn->nfsd_serv_srcu)
+{
+ srcu_read_unlock(&nn->nfsd_serv_srcu, srcu_idx);
+}
+
+void nfsd_serv_sync(struct nfsd_net *nn)
+{
+ synchronize_srcu(&nn->nfsd_serv_srcu);
+ synchronize_rcu_expedited();
+}
+
/*
* Maximum number of nfsd processes
*/
@@ -507,6 +527,7 @@ static void nfsd_shutdown_net(struct net *net)
lockd_down(net);
nn->lockd_up = false;
}
+ cleanup_srcu_struct(&nn->nfsd_serv_srcu);
#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
list_del_rcu(&nn->nfsd_uuid.list);
#endif
@@ -514,6 +535,7 @@ static void nfsd_shutdown_net(struct net *net)
nfsd_shutdown_generic();
}
+// FIXME: eliminate nfsd_notifier_lock
static DEFINE_SPINLOCK(nfsd_notifier_lock);
static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
void *ptr)
@@ -523,20 +545,22 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
struct net *net = dev_net(dev);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct sockaddr_in sin;
+ int srcu_idx;
+ struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
- if (event != NETDEV_DOWN || !nn->nfsd_serv)
+ if (event != NETDEV_DOWN || !serv)
goto out;
spin_lock(&nfsd_notifier_lock);
- if (nn->nfsd_serv) {
+ if (serv) {
dprintk("nfsd_inetaddr_event: removed %pI4\n", &ifa->ifa_local);
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = ifa->ifa_local;
- svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin);
+ svc_age_temp_xprts_now(serv, (struct sockaddr *)&sin);
}
spin_unlock(&nfsd_notifier_lock);
-
out:
+ nfsd_serv_put(nn, srcu_idx);
return NOTIFY_DONE;
}
@@ -553,22 +577,24 @@ static int nfsd_inet6addr_event(struct notifier_block *this,
struct net *net = dev_net(dev);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct sockaddr_in6 sin6;
+ int srcu_idx;
+ struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
- if (event != NETDEV_DOWN || !nn->nfsd_serv)
+ if (event != NETDEV_DOWN || !serv)
goto out;
spin_lock(&nfsd_notifier_lock);
- if (nn->nfsd_serv) {
+ if (serv) {
dprintk("nfsd_inet6addr_event: removed %pI6\n", &ifa->addr);
sin6.sin6_family = AF_INET6;
sin6.sin6_addr = ifa->addr;
if (ipv6_addr_type(&sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
sin6.sin6_scope_id = ifa->idev->dev->ifindex;
- svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin6);
+ svc_age_temp_xprts_now(serv, (struct sockaddr *)&sin6);
}
spin_unlock(&nfsd_notifier_lock);
-
out:
+ nfsd_serv_put(nn, srcu_idx);
return NOTIFY_DONE;
}
@@ -589,9 +615,12 @@ void nfsd_destroy_serv(struct net *net)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct svc_serv *serv = nn->nfsd_serv;
+ lockdep_assert_held(&nfsd_mutex);
+
spin_lock(&nfsd_notifier_lock);
- nn->nfsd_serv = NULL;
+ rcu_assign_pointer(nn->nfsd_serv, NULL);
spin_unlock(&nfsd_notifier_lock);
+ nfsd_serv_sync(nn);
/* check if the notifier still has clients */
if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
@@ -711,6 +740,10 @@ int nfsd_create_serv(struct net *net)
if (nn->nfsd_serv)
return 0;
+ error = init_srcu_struct(&nn->nfsd_serv_srcu);
+ if (error)
+ return error;
+
if (nfsd_max_blksize == 0)
nfsd_max_blksize = nfsd_get_default_max_blksize();
nfsd_reset_versions(nn);
@@ -727,8 +760,10 @@ int nfsd_create_serv(struct net *net)
}
spin_lock(&nfsd_notifier_lock);
nn->nfsd_info.mutex = &nfsd_mutex;
- nn->nfsd_serv = serv;
+ nn->nfsd_info.serv = serv;
+ rcu_assign_pointer(nn->nfsd_serv, nn->nfsd_info.serv);
spin_unlock(&nfsd_notifier_lock);
+ nfsd_serv_sync(nn);
set_max_drc();
/* check if the notifier is already set */
--
2.44.0
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v5 17/19] nfsd/localio: use SRCU to dereference nn->nfsd_serv in nfsd_open_local_fh
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (15 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 16/19] nfsd: " Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 18/19] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
` (2 subsequent siblings)
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
Use nfsd_serv_get to SRCU dereference nn->nfsd_serv and pass the
resulting svc_serv to nfsd_local_fakerqst_create, open the file handle
and then drop the reference using nfsd_serv_put at the end of
nfsd_open_local_fh.
Verified to fix an easy-to-hit crash that would occur if an nfsd
instance running in a container, with a localio client mounted, is
shut down. Upon restart of the container and associated nfsd the client
would go on to crash due to a NULL pointer dereference that occurred
because the nfs client's localio called nfsd_open_local_fh(), using
nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
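
The get/put discipline described above can be illustrated in userspace.
What follows is only a rough sketch, not the kernel code: it swaps SRCU
for a plain C11 atomic reader count, and every name in it (struct serv,
struct net_state, serv_get, serv_put, serv_unpublish) is hypothetical.
It shows why a reader that brackets its access with get/put cannot see
the server freed underneath it, and why a reader must check for NULL.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct svc_serv and struct nfsd_net. */
struct serv {
	int max_mesg;
};

struct net_state {
	_Atomic(struct serv *) serv;	/* published server, or NULL */
	atomic_int readers;		/* readers inside a get/put section */
};

/* Publish an initial server pointer with no readers active. */
void net_state_init(struct net_state *ns, struct serv *s)
{
	atomic_init(&ns->serv, s);
	atomic_init(&ns->readers, 0);
}

/* Enter the read side first, then load the pointer; may return NULL. */
struct serv *serv_get(struct net_state *ns)
{
	atomic_fetch_add(&ns->readers, 1);
	return atomic_load(&ns->serv);
}

/* Leave the read side; call exactly once per serv_get(), even on NULL. */
void serv_put(struct net_state *ns)
{
	atomic_fetch_sub(&ns->readers, 1);
}

/*
 * Unpublish the server, then wait until every reader that might still
 * hold the old pointer has called serv_put(). Only after that may the
 * caller free the returned object.
 */
struct serv *serv_unpublish(struct net_state *ns)
{
	struct serv *old = atomic_exchange(&ns->serv, NULL);

	while (atomic_load(&ns->readers) != 0)
		;	/* busy-wait here; synchronize_srcu() sleeps instead */
	return old;
}
```

As in nfsd_open_local_fh, a caller pairs every serv_get() with a
serv_put() on all exit paths and treats a NULL result as "server not
running".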
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index cdf8e115b33e..d1d9fbaab82e 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -193,6 +193,7 @@ int nfsd_open_local_fh(struct net *net,
struct nfsd_file *nf;
int status = 0;
int mayflags = NFSD_MAY_LOCALIO;
+ int srcu_idx;
struct svc_serv *serv;
__be32 beres;
@@ -207,7 +208,7 @@ int nfsd_open_local_fh(struct net *net,
}
nn = net_generic(net, nfsd_net_id);
- serv = READ_ONCE(nn->nfsd_serv);
+ serv = nfsd_serv_get(nn, &srcu_idx);
if (unlikely(!serv)) {
dprintk("%s: localio denied. Server not running\n", __func__);
status = -ENXIO;
@@ -247,6 +248,7 @@ int nfsd_open_local_fh(struct net *net,
out_revertcred:
revert_creds(save_cred);
out_net:
+ nfsd_serv_put(nn, srcu_idx);
put_net(net);
return status;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v5 18/19] nfs/localio: use dedicated workqueues for filesystem read and write
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (16 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 17/19] nfsd/localio: use SRCU to dereference nn->nfsd_serv in nfsd_open_local_fh Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-06-19 5:49 ` [PATCH v5 00/19] nfs/nfsd: add support for localio Christoph Hellwig
19 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
For localio access, don't call filesystem read() and write() routines
directly.
Some filesystem writeback routines can end up taking up a lot of stack
space (particularly xfs). Instead of risking running over due to the
extra overhead from the NFS stack, we should just call these routines
from a workqueue job.
Use of dedicated workqueues improves performance over using the
system_unbound_wq. Localio is motivated by the promise of improved
performance, so it makes little sense to yield it back.
But further analysis of the latest stack depth requirements would be
useful. It'd be nice to root cause and fix the latest stack hogs,
because using workqueues at all can cause a loss in performance due to
context switches.
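
The offload-and-wait pattern described above can be sketched in
userspace. This is only an analogy under stated assumptions: it uses a
POSIX thread in place of a kernel workqueue, and pthread_join() in
place of wait_for_completion(). The names io_job, run_on_worker and
sum_to are hypothetical. The point it shows is that the deep call chain
runs on the worker's own stack while the submitter blocks.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical job descriptor, loosely analogous to nfs_local_io_args. */
struct io_job {
	long (*fn)(void *arg);	/* call chain to run off the caller's stack */
	void *arg;
	long status;		/* result, valid once the worker is joined */
};

/* Worker entry point: executes on the worker thread's own stack. */
static void *io_job_run(void *data)
{
	struct io_job *job = data;

	job->status = job->fn(job->arg);
	return NULL;
}

/*
 * Run fn(arg) on a separate thread and block until it finishes.
 * Returns 0 and stores the result in *status, or a pthread error code.
 */
int run_on_worker(long (*fn)(void *), void *arg, long *status)
{
	struct io_job job = { .fn = fn, .arg = arg, .status = 0 };
	pthread_t tid;
	int err;

	err = pthread_create(&tid, NULL, io_job_run, &job);
	if (err)
		return err;
	err = pthread_join(tid, NULL);	/* stands in for wait_for_completion() */
	if (err)
		return err;
	*status = job.status;
	return 0;
}

/* Example payload: sum the integers 1..n, where arg points to a long n. */
long sum_to(void *arg)
{
	long n = *(long *)arg;
	long total = 0;

	for (long i = 1; i <= n; i++)
		total += i;
	return total;
}
```

Because the submitter waits for the job, the descriptor can live on the
submitter's stack, which mirrors the on-stack work item used in this
patch. The cost is the same one noted above: every call pays for a
handoff to another thread.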
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/inode.c | 57 +++++++++++++++++---------
fs/nfs/internal.h | 1 +
fs/nfs/localio.c | 102 +++++++++++++++++++++++++++++++++++-----------
3 files changed, 118 insertions(+), 42 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f9923cbf6058..aac8c5302503 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2394,35 +2394,54 @@ static void nfs_destroy_inodecache(void)
kmem_cache_destroy(nfs_inode_cachep);
}
+struct workqueue_struct *nfslocaliod_workqueue;
struct workqueue_struct *nfsiod_workqueue;
EXPORT_SYMBOL_GPL(nfsiod_workqueue);
/*
- * start up the nfsiod workqueue
- */
-static int nfsiod_start(void)
-{
- struct workqueue_struct *wq;
- dprintk("RPC: creating workqueue nfsiod\n");
- wq = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
- if (wq == NULL)
- return -ENOMEM;
- nfsiod_workqueue = wq;
- return 0;
-}
-
-/*
- * Destroy the nfsiod workqueue
+ * Destroy the nfsiod workqueues
*/
static void nfsiod_stop(void)
{
struct workqueue_struct *wq;
wq = nfsiod_workqueue;
- if (wq == NULL)
- return;
- nfsiod_workqueue = NULL;
- destroy_workqueue(wq);
+ if (wq != NULL) {
+ nfsiod_workqueue = NULL;
+ destroy_workqueue(wq);
+ }
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ wq = nfslocaliod_workqueue;
+ if (wq != NULL) {
+ nfslocaliod_workqueue = NULL;
+ destroy_workqueue(wq);
+ }
+#endif /* CONFIG_NFS_LOCALIO */
+}
+
+/*
+ * Start the nfsiod workqueues
+ */
+static int nfsiod_start(void)
+{
+ dprintk("RPC: creating workqueue nfsiod\n");
+ nfsiod_workqueue = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+ if (nfsiod_workqueue == NULL)
+ return -ENOMEM;
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ /*
+ * localio writes need to use a normal (non-memreclaim) workqueue.
+ * When we start getting low on space, XFS goes and calls flush_work() on
+ * a non-memreclaim work queue, which causes a priority inversion problem.
+ */
+ dprintk("RPC: creating workqueue nfslocaliod\n");
+ nfslocaliod_workqueue = alloc_workqueue("nfslocaliod", WQ_UNBOUND, 0);
+ if (unlikely(nfslocaliod_workqueue == NULL)) {
+ nfsiod_stop();
+ return -ENOMEM;
+ }
+#endif /* CONFIG_NFS_LOCALIO */
+ return 0;
}
unsigned int nfs_net_id;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index d30a2e63063c..404524cd4d4a 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -440,6 +440,7 @@ int nfs_check_flags(int);
/* inode.c */
extern struct workqueue_struct *nfsiod_workqueue;
+extern struct workqueue_struct *nfslocaliod_workqueue;
extern struct inode *nfs_alloc_inode(struct super_block *sb);
extern void nfs_free_inode(struct inode *);
extern int nfs_write_inode(struct inode *, struct writeback_control *);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index d41130f5a84d..27fc941d9dfa 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -45,6 +45,12 @@ struct nfs_local_fsync_ctx {
};
static void nfs_local_fsync_work(struct work_struct *work);
+struct nfs_local_io_args {
+ struct nfs_local_kiocb *iocb;
+ struct work_struct work;
+ struct completion *done;
+};
+
/*
* We need to translate between nfs status return values and
* the local errno values which may not be the same.
@@ -417,21 +423,38 @@ nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
nfs_local_pgio_complete(iocb);
}
-static int
-nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
- const struct rpc_call_ops *call_ops)
+static void nfs_local_call_read(struct work_struct *work)
{
- struct nfs_local_kiocb *iocb;
+ struct nfs_local_io_args *args =
+ container_of(work, struct nfs_local_io_args, work);
+ struct nfs_local_kiocb *iocb = args->iocb;
+ struct file *filp = iocb->kiocb.ki_filp;
struct iov_iter iter;
ssize_t status;
+ nfs_local_iter_init(&iter, iocb, READ);
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+ }
+ complete(args->done);
+}
+
+static int nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_io_args args;
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct nfs_local_kiocb *iocb;
+
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);
iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
if (iocb == NULL)
return -ENOMEM;
- nfs_local_iter_init(&iter, iocb, READ);
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
@@ -441,11 +464,18 @@ nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
}
- status = filp->f_op->read_iter(&iocb->kiocb, &iter);
- if (status != -EIOCBQUEUED) {
- nfs_local_read_done(iocb, status);
- nfs_local_pgio_release(iocb);
- }
+ /*
+ * Don't call filesystem read() routines directly.
+ * In order to avoid issues with stack overflow,
+ * call the read routines from a workqueue job.
+ */
+ args.iocb = iocb;
+ args.done = &done;
+ INIT_WORK_ONSTACK(&args.work, nfs_local_call_read);
+ queue_work(nfslocaliod_workqueue, &args.work);
+ wait_for_completion(&done);
+ destroy_work_on_stack(&args.work);
+
return 0;
}
@@ -555,14 +585,35 @@ nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
nfs_local_pgio_complete(iocb);
}
-static int
-nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
- const struct rpc_call_ops *call_ops)
+static void nfs_local_call_write(struct work_struct *work)
{
- struct nfs_local_kiocb *iocb;
+ struct nfs_local_io_args *args =
+ container_of(work, struct nfs_local_io_args, work);
+ struct nfs_local_kiocb *iocb = args->iocb;
+ struct file *filp = iocb->kiocb.ki_filp;
struct iov_iter iter;
ssize_t status;
+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ if (status != -EIOCBQUEUED) {
+ nfs_local_write_done(iocb, status);
+ nfs_get_vfs_attr(filp, iocb->hdr->res.fattr);
+ nfs_local_pgio_release(iocb);
+ }
+ complete(args->done);
+}
+
+static int nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_io_args args;
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct nfs_local_kiocb *iocb;
+
dprintk("%s: vfs_write count=%u pos=%llu %s\n",
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
@@ -570,7 +621,6 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
if (iocb == NULL)
return -ENOMEM;
- nfs_local_iter_init(&iter, iocb, WRITE);
switch (hdr->args.stable) {
default:
@@ -590,14 +640,20 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
- file_start_write(filp);
- status = filp->f_op->write_iter(&iocb->kiocb, &iter);
- file_end_write(filp);
- if (status != -EIOCBQUEUED) {
- nfs_local_write_done(iocb, status);
- nfs_get_vfs_attr(filp, hdr->res.fattr);
- nfs_local_pgio_release(iocb);
- }
+ /*
+ * Don't call filesystem write() routines directly.
+ * Some filesystem writeback routines can end up taking up a lot of
+ * stack (particularly xfs). Instead of risking running over due to
+ * the extra overhead from the NFS stack, call these write routines
+ * from a workqueue job.
+ */
+ args.iocb = iocb;
+ args.done = &done;
+ INIT_WORK_ONSTACK(&args.work, nfs_local_call_write);
+ queue_work(nfslocaliod_workqueue, &args.work);
+ wait_for_completion(&done);
+ destroy_work_on_stack(&args.work);
+
return 0;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (17 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 18/19] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
@ 2024-06-18 20:19 ` Mike Snitzer
2024-06-18 21:46 ` Chuck Lever
2024-06-19 5:49 ` [PATCH v5 00/19] nfs/nfsd: add support for localio Christoph Hellwig
19 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-06-18 20:19 UTC (permalink / raw)
To: linux-nfs; +Cc: Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
This document gives an overview of the LOCALIO protocol extension
added to the Linux NFS client and server (both v3 and v4) to allow a
client and server to reliably handshake to determine if they are on
the same host. The LOCALIO protocol extension follows the well-worn
pattern established by the ACL protocol extension.
The robust handshake between local client and server is just the
beginning. The ultimate use case this locality makes possible is that
the client can issue reads, writes and commits directly to the server
without having to go over the network.
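
The locality check at the heart of the handshake can be sketched as
follows. This is a simplified illustration, not the kernel
implementation: the kernel keeps nfsd_uuid_t instances on a list, while
this sketch assumes a flat array, and the names local_uuid and
uuid_is_local are hypothetical. The 16-byte size matches the UUID_SIZE
given in the documentation added by this patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_SIZE 16	/* fixed opaque size used by GETUUID */

/* Hypothetical registry entry for one local server instance. */
struct local_uuid {
	unsigned char b[UUID_SIZE];
};

/* Return true if the uuid reported by the server matches a local entry. */
bool uuid_is_local(const struct local_uuid *table, size_t n,
		   const unsigned char uuid[UUID_SIZE])
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(table[i].b, uuid, UUID_SIZE) == 0)
			return true;
	}
	return false;
}
```

A client would fetch the server's uuid over the LOCALIO RPC and only
attempt local I/O when such a lookup succeeds.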
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
include/linux/nfslocalio.h | 2 +
2 files changed, 103 insertions(+)
create mode 100644 Documentation/filesystems/nfs/localio.rst
diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..4b4595037a7f
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,101 @@
+===========
+NFS localio
+===========
+
+This document gives an overview of the LOCALIO protocol extension added
+to the Linux NFS client and server (both v3 and v4) to allow a client
+and server to reliably handshake to determine if they are on the same
+host. The LOCALIO protocol extension follows the well-worn pattern
+established by the ACL protocol extension.
+
+The LOCALIO protocol extension is needed to allow robust discovery of
+clients local to their servers. Prior to this extension a fragile
+sockaddr network address based match against all local network
+interfaces was attempted. But unlike the LOCALIO protocol extension,
+the sockaddr-based matching didn't handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning. The ultimate use case this locality makes possible is that
+the client can issue reads, writes and commits directly to the server
+without having to go over the network. This is particularly useful for
+container use cases (e.g. Kubernetes) where it is possible to run an IO
+job local to the server.
+
+The performance advantage realized from localio's ability to bypass
+using XDR and RPC for reads, writes and commits can be extreme, e.g.:
+fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
+- With localio:
+ read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
+- Without localio:
+ read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
+
+RPC
+---
+
+The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
+the client to retrieve a server's uuid. LOCALIOPROC_GETUUID encodes the
+server's uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed
+size opaque encode and decode XDR methods are used instead of the less
+efficient variable sized methods.
+
+NFS Common and Server
+---------------------
+
+First use is in nfsd, to add access to a global nfsd_uuids list in
+nfs_common that is used to register and then identify local nfsd
+instances.
+
+nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
+composed of nfsd_uuid_t instances that are managed as nfsd creates them
+(per network namespace).
+
+nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
+nfsd for the client specified nfsd uuid.
+
+The nfsd_uuids list is the basis for localio enablement, as such it has
+members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace, through it the client can
+access nn->nfsd_serv with proper rcu read access). It is this client
+and server synchronization that enables advanced usage and lifetime of
+objects to span from the host kernel's nfsd to per-container knfsd
+instances that are connected to nfs clients running on the same local
+host.
+
+NFS Client
+----------
+
+fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
+LOCALIO protocol and check if the server with that uuid is known to be
+local. This ensures that client and server (1) support localio and
+(2) are local to each other.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of nfsd_uuid_t struct to allow a client local to a server to
+open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU. If nfsd_open_local_fh() finds that the client no longer
+sees valid nfsd objects (be it struct net or nn->nfsd_serv) it returns
+ENXIO to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if/when an nfsd instance running in a container were
+to reboot while a localio client is connected to it.
+
+Testing
+-------
+
+The LOCALIO protocol extension and associated NFS localio read, write
+and commit access have proven stable against various test scenarios:
+
+- Client and server both on localhost (for both v3 and v4.2).
+
+- Various permutations of client and server support enablement for
+ both local and remote client and server. Testing against NFS storage
+ products that don't support the LOCALIO protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2).
+ The container testing was in terms of podman managed containers and
+ includes container stop/restart scenario.
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index c9592ad0afe2..a9722e18b527 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -20,6 +20,8 @@ extern struct list_head nfsd_uuids;
* Each nfsd instance has an nfsd_uuid_t that is accessible through the
* global nfsd_uuids list. Useful to allow a client to negotiate if localio
* possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
*/
typedef struct {
uuid_t uuid;
--
2.44.0
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH v5 06/19] nfs/nfsd: add "localio" support
2024-06-18 20:19 ` [PATCH v5 06/19] nfs/nfsd: add "localio" support Mike Snitzer
@ 2024-06-18 21:28 ` Jeff Layton
0 siblings, 0 replies; 45+ messages in thread
From: Jeff Layton @ 2024-06-18 21:28 UTC (permalink / raw)
To: Mike Snitzer, linux-nfs; +Cc: Chuck Lever, Trond Myklebust, NeilBrown, snitzer
On Tue, 2024-06-18 at 16:19 -0400, Mike Snitzer wrote:
> From: Weston Andros Adamson <dros@primarydata.com>
>
> Add client support for bypassing NFS for localhost reads, writes, and
> commits. This is only useful when the client and the server are
> running on the same host.
>
> nfs_local_probe() is stubbed out, later commits will enable client and
> server handshake via a LOCALIO protocol extension.
>
> This has dynamic binding with the nfsd module. Localio will only work
> if nfsd is already loaded.
>
> The "localio_enabled" nfs kernel module parameter can be used to
> disable and enable the ability to use localio support.
>
> Also, tracepoints were added for nfs_local_open_fh, nfs_local_enable
> and nfs_local_disable.
>
> Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
> Signed-off-by: Peng Tao <tao.peng@primarydata.com>
> Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/Makefile | 1 +
> fs/nfs/client.c | 7 +
> fs/nfs/inode.c | 5 +
> fs/nfs/internal.h | 53 +++
> fs/nfs/localio.c | 797 ++++++++++++++++++++++++++++++++++++++
> fs/nfs/nfstrace.h | 61 +++
> fs/nfs/pagelist.c | 3 +
> fs/nfs/write.c | 3 +
> fs/nfsd/Makefile | 1 +
> fs/nfsd/filecache.c | 2 +-
> fs/nfsd/localio.c | 243 ++++++++++++
> fs/nfsd/trace.h | 3 +-
> fs/nfsd/vfs.h | 8 +
> include/linux/nfs.h | 6 +
> include/linux/nfs_fs.h | 2 +
> include/linux/nfs_fs_sb.h | 5 +
> include/linux/nfs_xdr.h | 1 +
> 17 files changed, 1199 insertions(+), 2 deletions(-)
> create mode 100644 fs/nfs/localio.c
> create mode 100644 fs/nfsd/localio.c
>
> diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
> index 5f6db37f461e..9fb2f2cac87e 100644
> --- a/fs/nfs/Makefile
> +++ b/fs/nfs/Makefile
> @@ -13,6 +13,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
> nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
> nfs-$(CONFIG_SYSCTL) += sysctl.o
> nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
> +nfs-$(CONFIG_NFS_LOCALIO) += localio.o
>
> obj-$(CONFIG_NFS_V2) += nfsv2.o
> nfsv2-y := nfs2super.o proc.o nfs2xdr.o
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index de77848ae654..9170e6036fd2 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -178,6 +178,10 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
> clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
> clp->cl_net = get_net(cl_init->net);
>
> +#if IS_ENABLED(CONFIG_NFS_LOCALIO)
> + seqlock_init(&clp->cl_boot_lock);
> + ktime_get_real_ts64(&clp->cl_nfssvc_boot);
> +#endif
> clp->cl_principal = "*";
> clp->cl_xprtsec = cl_init->xprtsec;
> return clp;
> @@ -233,6 +237,8 @@ static void pnfs_init_server(struct nfs_server *server)
> */
> void nfs_free_client(struct nfs_client *clp)
> {
> + nfs_local_disable(clp);
> +
> /* -EIO all pending I/O */
> if (!IS_ERR(clp->cl_rpcclient))
> rpc_shutdown_client(clp->cl_rpcclient);
> @@ -424,6 +430,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
> list_add_tail(&new->cl_share_link,
> &nn->nfs_client_list);
> spin_unlock(&nn->nfs_client_lock);
> + nfs_local_probe(new);
> return rpc_ops->init_client(new, cl_init);
> }
>
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index acef52ecb1bb..4f88b860494f 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -39,6 +39,7 @@
> #include <linux/slab.h>
> #include <linux/compat.h>
> #include <linux/freezer.h>
> +#include <linux/file.h>
> #include <linux/uaccess.h>
> #include <linux/iversion.h>
>
> @@ -1053,6 +1054,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
> ctx->lock_context.open_context = ctx;
> INIT_LIST_HEAD(&ctx->list);
> ctx->mdsthreshold = NULL;
> + ctx->local_filp = NULL;
> return ctx;
> }
> EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
> @@ -1084,6 +1086,8 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
> nfs_sb_deactive(sb);
> put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
> kfree(ctx->mdsthreshold);
> + if (!IS_ERR_OR_NULL(ctx->local_filp))
> + fput(ctx->local_filp);
> kfree_rcu(ctx, rcu_head);
> }
>
> @@ -2495,6 +2499,7 @@ static int __init init_nfs_fs(void)
> if (err)
> goto out1;
>
> + nfs_local_init();
> err = register_nfs_fs();
> if (err)
> goto out0;
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index 958c8de072e2..c933421eb6af 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -451,6 +451,59 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
> extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
> extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
>
> +#if IS_ENABLED(CONFIG_NFS_LOCALIO)
> +/* localio.c */
> +extern void nfs_local_init(void);
> +extern void nfs_local_disable(struct nfs_client *);
> +extern void nfs_local_probe(struct nfs_client *);
> +extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
> + struct nfs_fh *, const fmode_t);
> +extern struct file *nfs_local_file_open(struct nfs_client *clp,
> + const struct cred *cred,
> + struct nfs_fh *fh,
> + struct nfs_open_context *ctx);
> +extern int nfs_local_doio(struct nfs_client *, struct file *,
> + struct nfs_pgio_header *,
> + const struct rpc_call_ops *);
> +extern int nfs_local_commit(struct file *, struct nfs_commit_data *,
> + const struct rpc_call_ops *, int);
> +extern bool nfs_server_is_local(const struct nfs_client *clp);
> +
> +#else
> +static inline void nfs_local_init(void) {}
> +static inline void nfs_local_disable(struct nfs_client *clp) {}
> +static inline void nfs_local_probe(struct nfs_client *clp) {}
> +static inline struct file *nfs_local_open_fh(struct nfs_client *clp,
> + const struct cred *cred,
> + struct nfs_fh *fh,
> + const fmode_t mode)
> +{
> + return ERR_PTR(-EINVAL);
> +}
> +static inline struct file *nfs_local_file_open(struct nfs_client *clp,
> + const struct cred *cred,
> + struct nfs_fh *fh,
> + struct nfs_open_context *ctx)
> +{
> + return NULL;
> +}
> +static inline int nfs_local_doio(struct nfs_client *clp, struct file *filep,
> + struct nfs_pgio_header *hdr,
> + const struct rpc_call_ops *call_ops)
> +{
> + return -EINVAL;
> +}
> +static inline int nfs_local_commit(struct file *filep, struct nfs_commit_data *data,
> + const struct rpc_call_ops *call_ops, int how)
> +{
> + return -EINVAL;
> +}
> +static inline bool nfs_server_is_local(const struct nfs_client *clp)
> +{
> + return false;
> +}
> +#endif /* CONFIG_NFS_LOCALIO */
> +
> /* super.c */
> extern const struct super_operations nfs_sops;
> bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> new file mode 100644
> index 000000000000..286cd0ded1b6
> --- /dev/null
> +++ b/fs/nfs/localio.c
> @@ -0,0 +1,797 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * NFS client support for local clients to bypass network stack
> + *
> + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> + */
> +
> +#include <linux/module.h>
> +#include <linux/errno.h>
> +#include <linux/vfs.h>
> +#include <linux/file.h>
> +#include <linux/inet.h>
> +#include <linux/sunrpc/addr.h>
> +#include <linux/inetdevice.h>
> +#include <net/addrconf.h>
> +#include <linux/module.h>
> +#include <linux/bvec.h>
> +
> +#include <linux/nfs.h>
> +#include <linux/nfs_fs.h>
> +#include <linux/nfs_xdr.h>
> +
> +#include "internal.h"
> +#include "pnfs.h"
> +#include "nfstrace.h"
> +
> +#define NFSDBG_FACILITY NFSDBG_VFS
> +
> +extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> + const struct cred *cred,
> + const struct nfs_fh *nfs_fh, const fmode_t fmode,
> + struct file **pfilp);
> +/*
> + * The localio code needs to call into nfsd to do the filehandle -> struct path
> + * mapping, but cannot be statically linked, because that will make the nfs
> + * module depend on the nfsd module.
> + *
> + * Instead, do dynamic linking to the nfsd module. This way the nfs module
> + * will only hold a reference on nfsd when it's actually in use. This also
> + * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> + */
> +
> +struct nfs_local_open_ctx {
> + spinlock_t lock;
> + nfs_to_nfsd_open_t open_f;
> + atomic_t refcount;
> +};
> +
> +struct nfs_local_kiocb {
> + struct kiocb kiocb;
> + struct bio_vec *bvec;
> + struct nfs_pgio_header *hdr;
> + struct work_struct work;
> +};
> +
> +struct nfs_local_fsync_ctx {
> + struct file *filp;
> + struct nfs_commit_data *data;
> + struct work_struct work;
> + struct kref kref;
> + struct completion *done;
> +};
> +static void nfs_local_fsync_work(struct work_struct *work);
> +
> +/*
> + * We need to translate between nfs status return values and
> + * the local errno values which may not be the same.
> + */
> +static struct {
> + __u32 stat;
> + int errno;
> +} nfs_errtbl[] = {
> + { NFS4_OK, 0 },
> + { NFS4ERR_PERM, -EPERM },
> + { NFS4ERR_NOENT, -ENOENT },
> + { NFS4ERR_IO, -EIO },
> + { NFS4ERR_NXIO, -ENXIO },
> + { NFS4ERR_FBIG, -E2BIG },
> + { NFS4ERR_STALE, -EBADF },
> + { NFS4ERR_ACCESS, -EACCES },
> + { NFS4ERR_EXIST, -EEXIST },
> + { NFS4ERR_XDEV, -EXDEV },
> + { NFS4ERR_MLINK, -EMLINK },
> + { NFS4ERR_NOTDIR, -ENOTDIR },
> + { NFS4ERR_ISDIR, -EISDIR },
> + { NFS4ERR_INVAL, -EINVAL },
> + { NFS4ERR_FBIG, -EFBIG },
> + { NFS4ERR_NOSPC, -ENOSPC },
> + { NFS4ERR_ROFS, -EROFS },
> + { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG },
> + { NFS4ERR_NOTEMPTY, -ENOTEMPTY },
> + { NFS4ERR_DQUOT, -EDQUOT },
> + { NFS4ERR_STALE, -ESTALE },
> + { NFS4ERR_STALE, -EOPENSTALE },
> + { NFS4ERR_DELAY, -ETIMEDOUT },
> + { NFS4ERR_DELAY, -ERESTARTSYS },
> + { NFS4ERR_DELAY, -EAGAIN },
> + { NFS4ERR_DELAY, -ENOMEM },
> + { NFS4ERR_IO, -ETXTBSY },
> + { NFS4ERR_IO, -EBUSY },
> + { NFS4ERR_BADHANDLE, -EBADHANDLE },
> + { NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
> + { NFS4ERR_NOTSUPP, -EOPNOTSUPP },
> + { NFS4ERR_TOOSMALL, -ETOOSMALL },
> + { NFS4ERR_SERVERFAULT, -ESERVERFAULT },
> + { NFS4ERR_SERVERFAULT, -ENFILE },
> + { NFS4ERR_IO, -EREMOTEIO },
> + { NFS4ERR_IO, -EUCLEAN },
> + { NFS4ERR_PERM, -ENOKEY },
> + { NFS4ERR_BADTYPE, -EBADTYPE },
> + { NFS4ERR_SYMLINK, -ELOOP },
> + { NFS4ERR_DEADLOCK, -EDEADLK },
> +};
> +
> +/*
> + * Convert a local errno value to an NFSv4 status code.
> + * Values with no table entry map to NFS4ERR_SERVERFAULT.
> + */
> +static __u32
> +nfs4errno(int errno)
> +{
> + unsigned int i;
> + for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
> + if (nfs_errtbl[i].errno == errno)
> + return nfs_errtbl[i].stat;
> + }
> + /* If we cannot translate the error, the recovery routines should
> + * handle it.
> + * Note: remaining NFSv4 error codes have values > 10000, so should
> + * not conflict with native Linux error codes.
> + */
> + return NFS4ERR_SERVERFAULT;
> +}
> +
> +static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
> +
> +static bool localio_enabled __read_mostly = true;
> +module_param(localio_enabled, bool, 0644);
> +
> +bool nfs_server_is_local(const struct nfs_client *clp)
> +{
> + return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
> + localio_enabled;
> +}
> +EXPORT_SYMBOL_GPL(nfs_server_is_local);
> +
> +void
> +nfs_local_init(void)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> +
> + ctx->open_f = NULL;
> + spin_lock_init(&ctx->lock);
> + atomic_set(&ctx->refcount, 0);
> +}
> +
> +static bool
> +nfs_local_get_lookup_ctx(void)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> + nfs_to_nfsd_open_t fn = NULL;
> +
> + spin_lock(&ctx->lock);
> + if (ctx->open_f == NULL) {
> + spin_unlock(&ctx->lock);
> +
> + fn = symbol_request(nfsd_open_local_fh);
> + if (!fn)
> + return false;
> +
> + spin_lock(&ctx->lock);
> + /* catch race */
> + if (ctx->open_f == NULL) {
> + ctx->open_f = fn;
> + fn = NULL;
> + }
> + }
> + atomic_inc(&ctx->refcount);
> + spin_unlock(&ctx->lock);
> + if (fn)
> + symbol_put(nfsd_open_local_fh);
> + return true;
> +}
> +
> +static void
> +nfs_local_put_lookup_ctx(void)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> + nfs_to_nfsd_open_t fn;
> +
> + if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
> + fn = ctx->open_f;
> + ctx->open_f = NULL;
> + spin_unlock(&ctx->lock);
> + if (fn)
> + symbol_put(nfsd_open_local_fh);
> + }
> +}
> +
> +/*
> + * nfs_local_enable - attempt to enable local i/o for an nfs_client
> + */
> +static void nfs_local_enable(struct nfs_client *clp)
> +{
> + if (nfs_local_get_lookup_ctx()) {
> + set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> + trace_nfs_local_enable(clp);
> + }
> +}
> +
> +/*
> + * nfs_local_disable - disable local i/o for an nfs_client
> + */
> +void
> +nfs_local_disable(struct nfs_client *clp)
> +{
> + if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> + trace_nfs_local_disable(clp);
> + nfs_local_put_lookup_ctx();
> + }
> +}
> +
> +/*
> + * nfs_local_probe - probe local i/o support for an nfs_client
> + */
> +void
> +nfs_local_probe(struct nfs_client *clp)
> +{
> + bool enable = false;
> +
> + if (enable)
> + nfs_local_enable(clp);
> +}
> +EXPORT_SYMBOL_GPL(nfs_local_probe);
> +
> +/*
> + * nfs_local_open_fh - open a local filehandle
> + *
> + * Returns a pointer to a struct file or an ERR_PTR
> + */
> +struct file *
> +nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> + struct nfs_fh *fh, const fmode_t mode)
> +{
> + struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> + struct file *filp;
> + int status;
> +
> + if (mode & ~(FMODE_READ | FMODE_WRITE))
> + return ERR_PTR(-EINVAL);
> +
> + status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
> + if (status < 0) {
> + dprintk("%s: open local file failed error=%d\n",
> + __func__, status);
> + trace_nfs_local_open_fh(fh, mode, status);
> + switch (status) {
> + case -ENXIO:
> + nfs_local_disable(clp);
> + fallthrough;
> + case -ETIMEDOUT:
> + status = -EAGAIN;
> + }
> + filp = ERR_PTR(status);
> + }
> + return filp;
> +}
> +EXPORT_SYMBOL_GPL(nfs_local_open_fh);
> +
> +static struct bio_vec *
> +nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
> + unsigned int npages, gfp_t flags)
> +{
> + struct bio_vec *bvec, *p;
> +
> + bvec = kmalloc_array(npages, sizeof(*bvec), flags);
> + if (bvec != NULL) {
> + for (p = bvec; npages > 0; p++, pagevec++, npages--) {
> + p->bv_page = *pagevec;
> + p->bv_len = PAGE_SIZE;
> + p->bv_offset = 0;
> + }
> + }
> + return bvec;
> +}
> +
> +static void
> +nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
> +{
> + kfree(iocb->bvec);
> + kfree(iocb);
> +}
> +
> +static struct nfs_local_kiocb *
> +nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, struct file *filp,
> + gfp_t flags)
> +{
> + struct nfs_local_kiocb *iocb;
> +
> + iocb = kmalloc(sizeof(*iocb), flags);
> + if (iocb == NULL)
> + return NULL;
> + iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
> + hdr->page_array.npages, flags);
> + if (iocb->bvec == NULL) {
> + kfree(iocb);
> + return NULL;
> + }
> + init_sync_kiocb(&iocb->kiocb, filp);
> + iocb->kiocb.ki_pos = hdr->args.offset;
> + iocb->hdr = hdr;
> + /* FIXME: NFS_IOHDR_ODIRECT isn't ever set */
> + if (test_bit(NFS_IOHDR_ODIRECT, &hdr->flags))
> + iocb->kiocb.ki_flags |= IOCB_DIRECT|IOCB_DSYNC;
> + iocb->kiocb.ki_flags &= ~IOCB_APPEND;
> + return iocb;
> +}
> +
> +static void
> +nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> +
> + if (hdr->args.pgbase != 0) {
> + iov_iter_bvec(i, dir, iocb->bvec,
> + hdr->page_array.npages,
> + hdr->args.count + hdr->args.pgbase);
> + iov_iter_advance(i, hdr->args.pgbase);
> + } else
> + iov_iter_bvec(i, dir, iocb->bvec,
> + hdr->page_array.npages, hdr->args.count);
> +}
> +
> +static void
> +nfs_local_hdr_release(struct nfs_pgio_header *hdr,
> + const struct rpc_call_ops *call_ops)
> +{
> + call_ops->rpc_call_done(&hdr->task, hdr);
> + call_ops->rpc_release(hdr);
> +}
> +
> +static void
> +nfs_local_pgio_init(struct nfs_pgio_header *hdr,
> + const struct rpc_call_ops *call_ops)
> +{
> + hdr->task.tk_ops = call_ops;
> + if (!hdr->task.tk_start)
> + hdr->task.tk_start = ktime_get();
> +}
> +
> +static void
> +nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
> +{
> + if (status >= 0) {
> + hdr->res.count = status;
> + hdr->res.op_status = NFS4_OK;
> + hdr->task.tk_status = 0;
> + } else {
> + hdr->res.op_status = nfs4errno(status);
> + hdr->task.tk_status = status;
> + }
> +}
> +
> +static void
> +nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> +
> + fput(iocb->kiocb.ki_filp);
> + nfs_local_iocb_free(iocb);
> + nfs_local_hdr_release(hdr, hdr->task.tk_ops);
> +}
> +
> +static void
> +nfs_local_read_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb = container_of(work,
> + struct nfs_local_kiocb, work);
> +
> + nfs_local_pgio_release(iocb);
> +}
> +
> +/*
> + * Complete the I/O from iocb->kiocb.ki_complete()
> + *
> + * Note that this function can be called from a bottom half context,
> + * hence we need to queue the fput() etc to a workqueue
> + */
> +static void
> +nfs_local_pgio_complete(struct nfs_local_kiocb *iocb)
> +{
> + queue_work(nfsiod_workqueue, &iocb->work);
> +}
> +
> +static void
> +nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> + struct file *filp = iocb->kiocb.ki_filp;
> +
> + nfs_local_pgio_done(hdr, status);
> +
> + if (hdr->res.count != hdr->args.count ||
> + hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
> + hdr->res.eof = true;
> +
> + dprintk("%s: read %ld bytes eof %d.\n", __func__,
> + status > 0 ? status : 0, hdr->res.eof);
> +}
> +
> +static void
> +nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb = container_of(kiocb,
> + struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_read_done(iocb, ret);
> + nfs_local_pgio_complete(iocb);
> +}
> +
> +static int
> +nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
> + const struct rpc_call_ops *call_ops)
> +{
> + struct nfs_local_kiocb *iocb;
> + struct iov_iter iter;
> + ssize_t status;
> +
> + dprintk("%s: vfs_read count=%u pos=%llu\n",
> + __func__, hdr->args.count, hdr->args.offset);
> +
> + iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
> + if (iocb == NULL)
> + return -ENOMEM;
> + nfs_local_iter_init(&iter, iocb, READ);
> +
> + nfs_local_pgio_init(hdr, call_ops);
> + hdr->res.eof = false;
> +
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + INIT_WORK(&iocb->work, nfs_local_read_aio_complete_work);
> + iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
> + }
> +
> + status = filp->f_op->read_iter(&iocb->kiocb, &iter);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_read_done(iocb, status);
> + nfs_local_pgio_release(iocb);
> + }
> + return 0;
> +}
> +
> +static void
> +nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
> +{
> + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
> + u32 *verf = (u32 *)verifier->data;
> + int seq = 0;
> +
> + do {
> + read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
> + verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
> + verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
> + } while (need_seqretry(&clp->cl_boot_lock, seq));
> + done_seqretry(&clp->cl_boot_lock, seq);
> +}
> +
> +static void
> +nfs_reset_boot_verifier(struct inode *inode)
> +{
> + struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
> +
> + write_seqlock(&clp->cl_boot_lock);
> + ktime_get_real_ts64(&clp->cl_nfssvc_boot);
> + write_sequnlock(&clp->cl_boot_lock);
> +}
> +
> +static void
> +nfs_set_local_verifier(struct inode *inode,
> + struct nfs_writeverf *verf,
> + enum nfs3_stable_how how)
> +{
> +
> + nfs_copy_boot_verifier(&verf->verifier, inode);
> + verf->committed = how;
> +}
> +
> +static void
> +nfs_get_vfs_attr(struct file *filp, struct nfs_fattr *fattr)
> +{
> + struct kstat stat;
> +
> + if (fattr != NULL && vfs_getattr(&filp->f_path, &stat,
> + STATX_INO |
> + STATX_ATIME |
> + STATX_MTIME |
> + STATX_CTIME |
> + STATX_SIZE |
> + STATX_BLOCKS,
> + AT_STATX_SYNC_AS_STAT) == 0) {
> + fattr->valid = NFS_ATTR_FATTR_FILEID |
> + NFS_ATTR_FATTR_CHANGE |
> + NFS_ATTR_FATTR_SIZE |
> + NFS_ATTR_FATTR_ATIME |
> + NFS_ATTR_FATTR_MTIME |
> + NFS_ATTR_FATTR_CTIME |
> + NFS_ATTR_FATTR_SPACE_USED;
> + fattr->fileid = stat.ino;
> + fattr->size = stat.size;
> + fattr->atime = stat.atime;
> + fattr->mtime = stat.mtime;
> + fattr->ctime = stat.ctime;
> + fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
> + fattr->du.nfs3.used = stat.blocks << 9;
> + }
> +}
> +
> +static void
> +nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
> +{
> + struct nfs_pgio_header *hdr = iocb->hdr;
> +
> + dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
> +
> + /* Handle short writes as if they are ENOSPC */
> + if (status > 0 && status < hdr->args.count) {
> + hdr->mds_offset += status;
> + hdr->args.offset += status;
> + hdr->args.pgbase += status;
> + hdr->args.count -= status;
> + nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
> + status = -ENOSPC;
> + }
> + if (status < 0)
> + nfs_reset_boot_verifier(hdr->inode);
> + nfs_local_pgio_done(hdr, status);
> +}
> +
> +static void
> +nfs_local_write_aio_complete_work(struct work_struct *work)
> +{
> + struct nfs_local_kiocb *iocb = container_of(work,
> + struct nfs_local_kiocb, work);
> +
> + nfs_get_vfs_attr(iocb->kiocb.ki_filp, iocb->hdr->res.fattr);
> + nfs_local_pgio_release(iocb);
> +}
> +
> +static void
> +nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
> +{
> + struct nfs_local_kiocb *iocb = container_of(kiocb,
> + struct nfs_local_kiocb, kiocb);
> +
> + nfs_local_write_done(iocb, ret);
> + nfs_local_pgio_complete(iocb);
> +}
> +
> +static int
> +nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
> + const struct rpc_call_ops *call_ops)
> +{
> + struct nfs_local_kiocb *iocb;
> + struct iov_iter iter;
> + ssize_t status;
> +
> + dprintk("%s: vfs_write count=%u pos=%llu %s\n",
> + __func__, hdr->args.count, hdr->args.offset,
> + (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
> +
> + iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
> + if (iocb == NULL)
> + return -ENOMEM;
> + nfs_local_iter_init(&iter, iocb, WRITE);
> +
> + switch (hdr->args.stable) {
> + default:
> + break;
> + case NFS_DATA_SYNC:
> + iocb->kiocb.ki_flags |= IOCB_DSYNC;
> + break;
> + case NFS_FILE_SYNC:
> + iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
> + }
> + nfs_local_pgio_init(hdr, call_ops);
> +
> + if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
> + INIT_WORK(&iocb->work, nfs_local_write_aio_complete_work);
> + iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
> + }
> +
> + nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
> +
> + file_start_write(filp);
> + status = filp->f_op->write_iter(&iocb->kiocb, &iter);
> + file_end_write(filp);
> + if (status != -EIOCBQUEUED) {
> + nfs_local_write_done(iocb, status);
> + nfs_get_vfs_attr(filp, hdr->res.fattr);
> + nfs_local_pgio_release(iocb);
> + }
> + return 0;
> +}
> +
> +static struct file *
> +nfs_local_file_open_cached(struct nfs_client *clp, const struct cred *cred,
> + struct nfs_fh *fh, struct nfs_open_context *ctx)
> +{
> + struct file *filp = ctx->local_filp;
> +
> + if (!filp) {
> + struct file *new = nfs_local_open_fh(clp, cred, fh, ctx->mode);
> + if (IS_ERR_OR_NULL(new))
> + return NULL;
> + /* try to put this one in the slot */
> + filp = cmpxchg(&ctx->local_filp, NULL, new);
> + if (filp != NULL)
> + fput(new);
> + else
> + filp = new;
> + }
> + return get_file(filp);
> +}
> +
> +struct file *
> +nfs_local_file_open(struct nfs_client *clp, const struct cred *cred,
> + struct nfs_fh *fh, struct nfs_open_context *ctx)
> +{
> + if (!nfs_server_is_local(clp))
> + return NULL;
> + return nfs_local_file_open_cached(clp, cred, fh, ctx);
> +}
> +
> +int
> +nfs_local_doio(struct nfs_client *clp, struct file *filp,
> + struct nfs_pgio_header *hdr,
> + const struct rpc_call_ops *call_ops)
> +{
> + int status = 0;
> +
> + if (!hdr->args.count)
> + goto out_fput;
> + /* Don't support filesystems without read_iter/write_iter */
> + if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
> + nfs_local_disable(clp);
> + status = -EAGAIN;
> + goto out_fput;
> + }
> +
> + switch (hdr->rw_mode) {
> + case FMODE_READ:
> + status = nfs_do_local_read(hdr, filp, call_ops);
> + break;
> + case FMODE_WRITE:
> + status = nfs_do_local_write(hdr, filp, call_ops);
> + break;
> + default:
> + dprintk("%s: invalid mode: %d\n", __func__,
> + hdr->rw_mode);
> + status = -EINVAL;
> + }
> +out_fput:
> + if (status != 0) {
> + fput(filp);
> + hdr->task.tk_status = status;
> + nfs_local_hdr_release(hdr, call_ops);
> + }
> + return status;
> +}
> +
> +static void
> +nfs_local_init_commit(struct nfs_commit_data *data,
> + const struct rpc_call_ops *call_ops)
> +{
> + data->task.tk_ops = call_ops;
> +}
> +
> +static int
> +nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
> +{
> + loff_t start = data->args.offset;
> + loff_t end = LLONG_MAX;
> +
> + if (data->args.count > 0) {
> + end = start + data->args.count - 1;
> + if (end < start)
> + end = LLONG_MAX;
> + }
> +
> + dprintk("%s: commit %llu - %llu\n", __func__, start, end);
> + return vfs_fsync_range(filp, start, end, 0);
> +}
> +
> +static void
> +nfs_local_commit_done(struct nfs_commit_data *data, int status)
> +{
> + if (status >= 0) {
> + nfs_set_local_verifier(data->inode,
> + data->res.verf,
> + NFS_FILE_SYNC);
> + data->res.op_status = NFS4_OK;
> + data->task.tk_status = 0;
> + } else {
> + nfs_reset_boot_verifier(data->inode);
> + data->res.op_status = nfs4errno(status);
> + data->task.tk_status = status;
> + }
> +}
> +
> +static void
> +nfs_local_release_commit_data(struct file *filp,
> + struct nfs_commit_data *data,
> + const struct rpc_call_ops *call_ops)
> +{
> + fput(filp);
> + call_ops->rpc_call_done(&data->task, data);
> + call_ops->rpc_release(data);
> +}
> +
> +static struct nfs_local_fsync_ctx *
> +nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
> + gfp_t flags)
> +{
> + struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
> +
> + if (ctx != NULL) {
> + ctx->filp = filp;
> + ctx->data = data;
> + INIT_WORK(&ctx->work, nfs_local_fsync_work);
> + kref_init(&ctx->kref);
> + ctx->done = NULL;
> + }
> + return ctx;
> +}
> +
> +static void
> +nfs_local_fsync_ctx_kref_free(struct kref *kref)
> +{
> + kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
> +}
> +
> +static void
> +nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
> +{
> + kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
> +}
> +
> +static void
> +nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
> +{
> + nfs_local_release_commit_data(ctx->filp, ctx->data,
> + ctx->data->task.tk_ops);
> + nfs_local_fsync_ctx_put(ctx);
> +}
> +
> +static void
> +nfs_local_fsync_work(struct work_struct *work)
> +{
> + struct nfs_local_fsync_ctx *ctx;
> + int status;
> +
> + ctx = container_of(work, struct nfs_local_fsync_ctx, work);
> +
> + status = nfs_local_run_commit(ctx->filp, ctx->data);
> + nfs_local_commit_done(ctx->data, status);
> + if (ctx->done != NULL)
> + complete(ctx->done);
> + nfs_local_fsync_ctx_free(ctx);
> +}
> +
> +int
> +nfs_local_commit(struct file *filp, struct nfs_commit_data *data,
> + const struct rpc_call_ops *call_ops, int how)
> +{
> + struct nfs_local_fsync_ctx *ctx;
> +
> + ctx = nfs_local_fsync_ctx_alloc(data, filp, GFP_KERNEL);
> + if (!ctx) {
> + nfs_local_commit_done(data, -ENOMEM);
> + nfs_local_release_commit_data(filp, data, call_ops);
> + return -ENOMEM;
> + }
> +
> + nfs_local_init_commit(data, call_ops);
> + kref_get(&ctx->kref);
> + if (how & FLUSH_SYNC) {
> + DECLARE_COMPLETION_ONSTACK(done);
> + ctx->done = &done;
> + queue_work(nfsiod_workqueue, &ctx->work);
> + wait_for_completion(&done);
> + } else
> + queue_work(nfsiod_workqueue, &ctx->work);
> + nfs_local_fsync_ctx_put(ctx);
> + return 0;
> +}
> diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
> index 1e710654af11..95a2c19a9172 100644
> --- a/fs/nfs/nfstrace.h
> +++ b/fs/nfs/nfstrace.h
> @@ -1681,6 +1681,67 @@ TRACE_EVENT(nfs_mount_path,
> TP_printk("path='%s'", __get_str(path))
> );
>
> +TRACE_EVENT(nfs_local_open_fh,
> + TP_PROTO(
> + const struct nfs_fh *fh,
> + fmode_t fmode,
> + int error
> + ),
> +
> + TP_ARGS(fh, fmode, error),
> +
> + TP_STRUCT__entry(
> + __field(int, error)
> + __field(u32, fhandle)
> + __field(unsigned int, fmode)
> + ),
> +
> + TP_fast_assign(
> + __entry->error = error;
> + __entry->fhandle = nfs_fhandle_hash(fh);
> + __entry->fmode = (__force unsigned int)fmode;
> + ),
> +
> + TP_printk(
> + "error=%d fhandle=0x%08x mode=%s",
> + __entry->error,
> + __entry->fhandle,
> + show_fs_fmode_flags(__entry->fmode)
> + )
> +);
> +
> +DECLARE_EVENT_CLASS(nfs_local_client_event,
> + TP_PROTO(
> + const struct nfs_client *clp
> + ),
> +
> + TP_ARGS(clp),
> +
> + TP_STRUCT__entry(
> + __field(unsigned int, protocol)
> + __string(server, clp->cl_hostname)
> + ),
> +
> + TP_fast_assign(
> + __entry->protocol = clp->rpc_ops->version;
> + __assign_str(server);
> + ),
> +
> + TP_printk(
> + "server=%s NFSv%u", __get_str(server), __entry->protocol
> + )
> +);
> +
> +#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
> + DEFINE_EVENT(nfs_local_client_event, name, \
> + TP_PROTO( \
> + const struct nfs_client *clp \
> + ), \
> + TP_ARGS(clp))
> +
> +DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
> +DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
> +
> DECLARE_EVENT_CLASS(nfs_xdr_event,
> TP_PROTO(
> const struct xdr_stream *xdr,
> diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
> index 57d62db3be5b..b08420b8e664 100644
> --- a/fs/nfs/pagelist.c
> +++ b/fs/nfs/pagelist.c
> @@ -879,6 +879,9 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
> hdr->args.count,
> (unsigned long long)hdr->args.offset);
>
> + if (localio)
> + return nfs_local_doio(clp, localio, hdr, call_ops);
> +
> task = rpc_run_task(&task_setup_data);
> if (IS_ERR(task))
> return PTR_ERR(task);
> diff --git a/fs/nfs/write.c b/fs/nfs/write.c
> index 267bed2a4ceb..b29b0fd5431f 100644
> --- a/fs/nfs/write.c
> +++ b/fs/nfs/write.c
> @@ -1700,6 +1700,9 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
>
> dprintk("NFS: initiated commit call\n");
>
> + if (localio)
> + return nfs_local_commit(localio, data, call_ops, how);
> +
> task = rpc_run_task(&task_setup_data);
> if (IS_ERR(task))
> return PTR_ERR(task);
> diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
> index b8736a82e57c..78b421778a79 100644
> --- a/fs/nfsd/Makefile
> +++ b/fs/nfsd/Makefile
> @@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
> nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
> nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
> nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
> +nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index ad9083ca144b..99631fa56662 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -52,7 +52,7 @@
> #define NFSD_FILE_CACHE_UP (0)
>
> /* We only care about NFSD_MAY_READ/WRITE for this cache */
> -#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
> +#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
>
> static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
> static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> new file mode 100644
> index 000000000000..6e2918e76f49
> --- /dev/null
> +++ b/fs/nfsd/localio.c
> @@ -0,0 +1,243 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * NFS server support for local clients to bypass network stack
> + *
> + * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
> + * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
> + * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
> + */
> +
> +#include <linux/exportfs.h>
> +#include <linux/sunrpc/svcauth_gss.h>
> +#include <linux/sunrpc/clnt.h>
> +#include <linux/nfs.h>
> +#include <linux/string.h>
> +
> +#include "nfsd.h"
> +#include "vfs.h"
> +#include "netns.h"
> +#include "filecache.h"
> +
> +#define NFSDDBG_FACILITY NFSDDBG_FH
> +
> +/*
> + * We need to translate between nfs status return values and
> + * the local errno values which may not be the same.
> + * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
> + * all compiled nfs objects if it were in include/linux/nfs.h
> + */
> +static const struct {
> + int stat;
> + int errno;
> +} nfs_common_errtbl[] = {
> + { NFS_OK, 0 },
> + { NFSERR_PERM, -EPERM },
> + { NFSERR_NOENT, -ENOENT },
> + { NFSERR_IO, -EIO },
> + { NFSERR_NXIO, -ENXIO },
> +/* { NFSERR_EAGAIN, -EAGAIN }, */
> + { NFSERR_ACCES, -EACCES },
> + { NFSERR_EXIST, -EEXIST },
> + { NFSERR_XDEV, -EXDEV },
> + { NFSERR_NODEV, -ENODEV },
> + { NFSERR_NOTDIR, -ENOTDIR },
> + { NFSERR_ISDIR, -EISDIR },
> + { NFSERR_INVAL, -EINVAL },
> + { NFSERR_FBIG, -EFBIG },
> + { NFSERR_NOSPC, -ENOSPC },
> + { NFSERR_ROFS, -EROFS },
> + { NFSERR_MLINK, -EMLINK },
> + { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
> + { NFSERR_NOTEMPTY, -ENOTEMPTY },
> + { NFSERR_DQUOT, -EDQUOT },
> + { NFSERR_STALE, -ESTALE },
> + { NFSERR_REMOTE, -EREMOTE },
> +#ifdef EWFLUSH
> + { NFSERR_WFLUSH, -EWFLUSH },
> +#endif
> + { NFSERR_BADHANDLE, -EBADHANDLE },
> + { NFSERR_NOT_SYNC, -ENOTSYNC },
> + { NFSERR_BAD_COOKIE, -EBADCOOKIE },
> + { NFSERR_NOTSUPP, -ENOTSUPP },
> + { NFSERR_TOOSMALL, -ETOOSMALL },
> + { NFSERR_SERVERFAULT, -EREMOTEIO },
> + { NFSERR_BADTYPE, -EBADTYPE },
> + { NFSERR_JUKEBOX, -EJUKEBOX },
> + { -1, -EIO }
> +};
> +
> +/**
> + * nfs_stat_to_errno - convert an NFS status code to a local errno
> + * @status: NFS status code to convert
> + *
> + * Returns a local errno value, or -EIO if the NFS status code is
> + * not recognized. This function is used jointly by NFSv2 and NFSv3.
> + */
> +static inline int nfs_stat_to_errno(enum nfs_stat status)
> +{
> + int i;
> +
> + for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
> + if (nfs_common_errtbl[i].stat == (int)status)
> + return nfs_common_errtbl[i].errno;
> + }
> + return nfs_common_errtbl[i].errno;
> +}
> +
Honestly, this is a little large for an inline. It wouldn't hurt to
just make this non-static and only have the table in one place.
Consider that a nit though, I don't feel strongly about it.
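To illustrate the shape I have in mind, here is a rough userspace sketch, not
a proposed patch. All demo_* names are made up, the status values mirror only
a handful of the NFSv3 codes, and where a shared copy would actually live in
the tree (and how it would be exported) is an assumption left open here:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of NFS status codes, for illustration only. */
enum demo_nfs_stat {
	DEMO_NFS_OK       = 0,
	DEMO_NFSERR_PERM  = 1,
	DEMO_NFSERR_NOENT = 2,
	DEMO_NFSERR_IO    = 5,
	DEMO_NFSERR_ACCES = 13,
	DEMO_NFSERR_STALE = 70,
};

/* Single copy of the table, private to the one file that owns it. */
static const struct {
	int stat;
	int err;
} demo_errtbl[] = {
	{ DEMO_NFS_OK,       0 },
	{ DEMO_NFSERR_PERM,  -EPERM },
	{ DEMO_NFSERR_NOENT, -ENOENT },
	{ DEMO_NFSERR_IO,    -EIO },
	{ DEMO_NFSERR_ACCES, -EACCES },
	{ DEMO_NFSERR_STALE, -ESTALE },
};

/*
 * Non-static and non-inline: callers in other files would see only a
 * prototype, so the loop and the table are emitted exactly once.
 * Unrecognized status codes fall back to -EIO, as in the quoted code.
 */
int demo_nfs_stat_to_errno(int status)
{
	size_t i;

	for (i = 0; i < sizeof(demo_errtbl) / sizeof(demo_errtbl[0]); i++) {
		if (demo_errtbl[i].stat == status)
			return demo_errtbl[i].err;
	}
	return -EIO;
}
```

The point is only that callers share one out-of-line lookup instead of each
carrying its own copy of the table.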
> +static void
> +nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> +{
> + if (rqstp->rq_client)
> + auth_domain_put(rqstp->rq_client);
> + if (rqstp->rq_cred.cr_group_info)
> + put_group_info(rqstp->rq_cred.cr_group_info);
> + /* rpcauth_map_to_svc_cred_local() clears cr_principal */
> + WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
> + kfree(rqstp->rq_xprt);
> + kfree(rqstp);
> +}
> +
> +static struct svc_rqst *
> +nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> +{
> + struct svc_rqst *rqstp;
> + struct net *net = rpc_net_ns(rpc_clnt);
> + struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> + int status;
> +
> + /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
> + if (unlikely(!READ_ONCE(nn->nfsd_serv))) {
> + dprintk("%s: localio denied. Server not running\n", __func__);
> + return ERR_PTR(-ENXIO);
> + }
> +
> + rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
> + if (!rqstp)
> + return ERR_PTR(-ENOMEM);
> +
> + rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
> + if (!rqstp->rq_xprt) {
> + status = -ENOMEM;
> + goto out_err;
> + }
> +
> + rqstp->rq_xprt->xpt_net = net;
> + __set_bit(RQ_SECURE, &rqstp->rq_flags);
> + rqstp->rq_proc = 1;
> + rqstp->rq_vers = 3;
> + rqstp->rq_prot = IPPROTO_TCP;
> + rqstp->rq_server = nn->nfsd_serv;
> +
> + /* Note: we're connecting to ourself, so source addr == peer addr */
> + rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
> + (struct sockaddr *)&rqstp->rq_addr,
> + sizeof(rqstp->rq_addr));
> +
> + rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
> +
> + /*
> + * set up enough for svcauth_unix_set_client to be able to wait
> + * for the cache downcall. Note that we do _not_ want to allow the
> + * request to be deferred for later revisit since this rqst and xprt
> + * are not set up to run inside of the normal svc_rqst engine.
> + */
> + INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
> + kref_init(&rqstp->rq_xprt->xpt_ref);
> + spin_lock_init(&rqstp->rq_xprt->xpt_lock);
> + rqstp->rq_chandle.thread_wait = 5 * HZ;
> +
> + status = svcauth_unix_set_client(rqstp);
> + switch (status) {
> + case SVC_OK:
> + break;
> + case SVC_DENIED:
> + status = -ENXIO;
> + dprintk("%s: client %pISpc denied localio access\n",
> + __func__, (struct sockaddr *)&rqstp->rq_addr);
> + goto out_err;
> + default:
> + status = -ETIMEDOUT;
> + dprintk("%s: client %pISpc temporarily denied localio access\n",
> + __func__, (struct sockaddr *)&rqstp->rq_addr);
> + goto out_err;
> + }
> +
> + return rqstp;
> +
> +out_err:
> + nfsd_local_fakerqst_destroy(rqstp);
> + return ERR_PTR(status);
> +}
> +
> +/*
> + * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
> + *
> + * This function maps a local fh to a path on a local filesystem.
> + * This is useful when the nfs client has the local server mounted - it can
> + * avoid all the NFS overhead with reads, writes and commits.
> + *
> + * On successful return, the caller holds a reference on *pfilp and must
> + * release it with fput(). Also note that this is called from nfs.ko via
> + * symbol_request() to avoid an explicit dependency on knfsd. So, there is
> + * no forward declaration in a header file for it.
> + */
> +int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> + const struct cred *cred,
> + const struct nfs_fh *nfs_fh,
> + const fmode_t fmode,
> + struct file **pfilp)
> +{
> + const struct cred *save_cred;
> + struct svc_rqst *rqstp;
> + struct svc_fh fh;
> + struct nfsd_file *nf;
> + int status = 0;
> + int mayflags = NFSD_MAY_LOCALIO;
> + __be32 beres;
> +
> + /* Save creds before calling into nfsd */
> + save_cred = get_current_cred();
> +
> + rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
> + if (IS_ERR(rqstp)) {
> + status = PTR_ERR(rqstp);
> + goto out_revertcred;
> + }
> +
> + /* nfs_fh -> svc_fh */
> + if (nfs_fh->size > NFS4_FHSIZE) {
> + status = -EINVAL;
> + goto out;
> + }
> + fh_init(&fh, NFS4_FHSIZE);
> + fh.fh_handle.fh_size = nfs_fh->size;
> + memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
> +
> + if (fmode & FMODE_READ)
> + mayflags |= NFSD_MAY_READ;
> + if (fmode & FMODE_WRITE)
> + mayflags |= NFSD_MAY_WRITE;
> +
> + beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
> + if (beres) {
> + status = nfs_stat_to_errno(be32_to_cpu(beres));
> + dprintk("%s: fh_verify failed %d\n", __func__, status);
> + goto out_fh_put;
> + }
> +
> + *pfilp = get_file(nf->nf_file);
> +
> + nfsd_file_put(nf);
> +out_fh_put:
> + fh_put(&fh);
> +
> +out:
> + nfsd_local_fakerqst_destroy(rqstp);
> +out_revertcred:
> + revert_creds(save_cred);
> + return status;
> +}
> +EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
> +
> +/* Compile time type checking, not used by anything */
> +static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 77bbd23aa150..9c0610fdd11c 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
> { NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
> { NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
> { NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
> - { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
> + { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
> + { NFSD_MAY_LOCALIO, "LOCALIO" })
>
> TRACE_EVENT(nfsd_compound,
> TP_PROTO(
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 57cd70062048..91c50649a8c7 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -36,6 +36,8 @@
> #define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
> #define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
>
> +#define NFSD_MAY_LOCALIO 0x800000
> +
> struct nfsd_file;
>
> /*
> @@ -158,6 +160,12 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
>
> void nfsd_filp_close(struct file *fp);
>
> +int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> + const struct cred *cred,
> + const struct nfs_fh *nfs_fh,
> + const fmode_t fmode,
> + struct file **pfilp);
> +
> static inline int fh_want_write(struct svc_fh *fh)
> {
> int ret;
> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
> index ceb70a926b95..2dacfe9742c6 100644
> --- a/include/linux/nfs.h
> +++ b/include/linux/nfs.h
> @@ -8,6 +8,8 @@
> #ifndef _LINUX_NFS_H
> #define _LINUX_NFS_H
>
> +#include <linux/cred.h>
> +#include <linux/sunrpc/auth.h>
> #include <linux/sunrpc/msg_prot.h>
> #include <linux/string.h>
> #include <linux/crc32.h>
> @@ -46,6 +48,10 @@ enum nfs3_stable_how {
> NFS_INVALID_STABLE_HOW = -1
> };
>
> +typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
> + const struct nfs_fh *, const fmode_t,
> + struct file **);
> +
> #ifdef CONFIG_CRC32
> /**
> * nfs_fhandle_hash - calculate the crc32 hash for the filehandle
> diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
> index 039898d70954..a0bb947fdd1d 100644
> --- a/include/linux/nfs_fs.h
> +++ b/include/linux/nfs_fs.h
> @@ -96,6 +96,8 @@ struct nfs_open_context {
> struct list_head list;
> struct nfs4_threshold *mdsthreshold;
> struct rcu_head rcu_head;
> +
> + struct file *local_filp;
> };
>
> struct nfs_open_dir_context {
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index 92de074e63b9..00fe469bc72e 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -49,6 +49,7 @@ struct nfs_client {
> #define NFS_CS_DS 7 /* - Server is a DS */
> #define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
> #define NFS_CS_PNFS 9 /* - Server used for pnfs */
> +#define NFS_CS_LOCAL_IO 10 /* - client is local */
> struct sockaddr_storage cl_addr; /* server identifier */
> size_t cl_addrlen;
> char * cl_hostname; /* hostname of server */
> @@ -125,6 +126,10 @@ struct nfs_client {
> struct net *cl_net;
> struct list_head pending_cb_stateids;
> struct rcu_head rcu;
> +
> + /* localio */
> + struct timespec64 cl_nfssvc_boot;
> + seqlock_t cl_boot_lock;
> };
>
> /*
> diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
> index d09b9773b20c..764513a61601 100644
> --- a/include/linux/nfs_xdr.h
> +++ b/include/linux/nfs_xdr.h
> @@ -1605,6 +1605,7 @@ enum {
> NFS_IOHDR_RESEND_PNFS,
> NFS_IOHDR_RESEND_MDS,
> NFS_IOHDR_UNSTABLE_WRITES,
> + NFS_IOHDR_ODIRECT,
> };
>
> struct nfs_io_completion;
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v5 12/19] nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common
2024-06-18 20:19 ` [PATCH v5 12/19] nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common Mike Snitzer
@ 2024-06-18 21:32 ` Jeff Layton
0 siblings, 0 replies; 45+ messages in thread
From: Jeff Layton @ 2024-06-18 21:32 UTC (permalink / raw)
To: Mike Snitzer, linux-nfs; +Cc: Chuck Lever, Trond Myklebust, NeilBrown, snitzer
On Tue, 2024-06-18 at 16:19 -0400, Mike Snitzer wrote:
> Get nfsd_open_local_fh and store it in rpc_client during client
> creation, put the symbol during nfs_local_disable -- which is also
> called during client destruction.
>
> Eliminates the need for nfs_local_open_ctx and extra locking and
> refcounting work in fs/nfs/localio.c
>
> Also makes it so the reference to the nfsd_open_local_fh symbol is
> managed by the nfs_common module instead of the nfs client modules.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/client.c | 1 +
> fs/nfs/inode.c | 1 -
> fs/nfs/internal.h | 18 +++++---
> fs/nfs/localio.c | 86 +++-----------------------------------
> fs/nfs_common/nfslocalio.c | 26 ++++++++++++
> include/linux/nfs.h | 4 --
> include/linux/nfs_fs_sb.h | 2 +
> include/linux/nfslocalio.h | 8 ++++
> 8 files changed, 54 insertions(+), 92 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index 7044b8b3b332..cbabcdf3d785 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -171,6 +171,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
>
> INIT_LIST_HEAD(&clp->cl_superblocks);
> clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
> + clp->nfsd_open_local_fh = NULL;
>
> clp->cl_flags = cl_init->init_flags;
> clp->cl_proto = cl_init->proto;
> diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
> index 4f88b860494f..f9923cbf6058 100644
> --- a/fs/nfs/inode.c
> +++ b/fs/nfs/inode.c
> @@ -2499,7 +2499,6 @@ static int __init init_nfs_fs(void)
> if (err)
> goto out1;
>
> - nfs_local_init();
> err = register_nfs_fs();
> if (err)
> goto out0;
> diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
> index fb2fb59e7ed0..d30a2e63063c 100644
> --- a/fs/nfs/internal.h
> +++ b/fs/nfs/internal.h
> @@ -464,15 +464,22 @@ nfs_init_localioclient(struct nfs_client *clp,
> goto out;
> clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
> program, vers);
> + if (IS_ERR(clp->cl_rpcclient_localio))
> + goto out;
> + /* No errors! Assume that localio is supported */
> + clp->nfsd_open_local_fh = get_nfsd_open_local_fh();
> + if (!clp->nfsd_open_local_fh) {
> + rpc_shutdown_client(clp->cl_rpcclient_localio);
> + clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
> + }
> out:
> - dfprintk_rcu(CLIENT, "%s: server (%s) %s NFSv%u LOCALIO\n", __func__,
> - rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
> - (IS_ERR(clp->cl_rpcclient_localio) ?
> - "does not support" : "supports"), vers);
> + dfprintk_rcu(CLIENT, "%s: server (%s) %s NFSv%u LOCALIO, nfsd_open_local_fh is %s.\n",
> + __func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
> + (IS_ERR(clp->cl_rpcclient_localio) ? "does not support" : "supports"), vers,
> + (clp->nfsd_open_local_fh ? "set" : "not set"));
> }
>
> /* localio.c */
> -extern void nfs_local_init(void);
> extern void nfs_local_disable(struct nfs_client *);
> extern void nfs_local_probe(struct nfs_client *);
> extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
> @@ -489,7 +496,6 @@ extern int nfs_local_commit(struct file *, struct nfs_commit_data *,
> extern bool nfs_server_is_local(const struct nfs_client *clp);
>
> #else
> -static inline void nfs_local_init(void) {}
> static inline void nfs_local_disable(struct nfs_client *clp) {}
> static inline void nfs_local_probe(struct nfs_client *clp) {}
> static inline struct file *nfs_local_open_fh(struct nfs_client *clp,
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index 54c41933173c..ddd17549812e 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -29,26 +29,6 @@
>
> #define NFSDBG_FACILITY NFSDBG_VFS
>
> -extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> - const struct cred *cred,
> - const struct nfs_fh *nfs_fh, const fmode_t fmode,
> - struct file **pfilp);
> -/*
> - * The localio code needs to call into nfsd to do the filehandle -> struct path
> - * mapping, but cannot be statically linked, because that will make the nfs
> - * module depend on the nfsd module.
> - *
> - * Instead, do dynamic linking to the nfsd module. This way the nfs module
> - * will only hold a reference on nfsd when it's actually in use. This also
> - * allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> - */
> -
> -struct nfs_local_open_ctx {
> - spinlock_t lock;
> - nfs_to_nfsd_open_t open_f;
> - atomic_t refcount;
> -};
> -
> struct nfs_local_kiocb {
> struct kiocb kiocb;
> struct bio_vec *bvec;
> @@ -135,8 +115,6 @@ nfs4errno(int errno)
> return NFS4ERR_SERVERFAULT;
> }
>
> -static struct nfs_local_open_ctx __local_open_ctx __read_mostly;
> -
> static bool localio_enabled __read_mostly = true;
> module_param(localio_enabled, bool, 0644);
>
> @@ -151,65 +129,12 @@ bool nfs_server_is_local(const struct nfs_client *clp)
> }
> EXPORT_SYMBOL_GPL(nfs_server_is_local);
>
> -void
> -nfs_local_init(void)
> -{
> - struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> -
> - ctx->open_f = NULL;
> - spin_lock_init(&ctx->lock);
> - atomic_set(&ctx->refcount, 0);
> -}
> -
> -static bool
> -nfs_local_get_lookup_ctx(void)
> -{
> - struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> - nfs_to_nfsd_open_t fn = NULL;
> -
> - spin_lock(&ctx->lock);
> - if (ctx->open_f == NULL) {
> - spin_unlock(&ctx->lock);
> -
> - fn = symbol_request(nfsd_open_local_fh);
> - if (!fn)
> - return false;
> -
> - spin_lock(&ctx->lock);
> - /* catch race */
> - if (ctx->open_f == NULL) {
> - ctx->open_f = fn;
> - fn = NULL;
> - }
> - }
> - atomic_inc(&ctx->refcount);
> - spin_unlock(&ctx->lock);
> - if (fn)
> - symbol_put(nfsd_open_local_fh);
> - return true;
> -}
> -
> -static void
> -nfs_local_put_lookup_ctx(void)
> -{
> - struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> - nfs_to_nfsd_open_t fn;
> -
> - if (atomic_dec_and_lock(&ctx->refcount, &ctx->lock)) {
> - fn = ctx->open_f;
> - ctx->open_f = NULL;
> - spin_unlock(&ctx->lock);
> - if (fn)
> - symbol_put(nfsd_open_local_fh);
> - }
> -}
> -
It seems like the new nfs_common infrastructure should be added earlier
in the series so you don't need to rip out the code above.
> /*
> * nfs_local_enable - attempt to enable local i/o for an nfs_client
> */
> static void nfs_local_enable(struct nfs_client *clp)
> {
> - if (nfs_local_get_lookup_ctx()) {
> + if (READ_ONCE(clp->nfsd_open_local_fh)) {
> set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> trace_nfs_local_enable(clp);
> }
> @@ -218,12 +143,12 @@ static void nfs_local_enable(struct nfs_client *clp)
> /*
> * nfs_local_disable - disable local i/o for an nfs_client
> */
> -void
> -nfs_local_disable(struct nfs_client *clp)
> +void nfs_local_disable(struct nfs_client *clp)
> {
> if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
> trace_nfs_local_disable(clp);
> - nfs_local_put_lookup_ctx();
> + put_nfsd_open_local_fh();
> + clp->nfsd_open_local_fh = NULL;
> if (!IS_ERR(clp->cl_rpcclient_localio)) {
> rpc_shutdown_client(clp->cl_rpcclient_localio);
> clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
> @@ -312,14 +237,13 @@ struct file *
> nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> struct nfs_fh *fh, const fmode_t mode)
> {
> - struct nfs_local_open_ctx *ctx = &__local_open_ctx;
> struct file *filp;
> int status;
>
> if (mode & ~(FMODE_READ | FMODE_WRITE))
> return ERR_PTR(-EINVAL);
>
> - status = ctx->open_f(clp->cl_rpcclient, cred, fh, mode, &filp);
> + status = clp->nfsd_open_local_fh(clp->cl_rpcclient, cred, fh, mode, &filp);
> if (status < 0) {
> dprintk("%s: open local file failed error=%d\n",
> __func__, status);
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index f214cc6754a1..c454c4100976 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -40,3 +40,29 @@ bool nfsd_uuid_is_local(const uuid_t *uuid)
> return !uuid_is_null(nfsd_uuid);
> }
> EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
> +
> +/*
> + * The nfs localio code needs to call into nfsd to do the filehandle -> struct path
> + * mapping, but cannot be statically linked, because that will make the nfs module
> + * depend on the nfsd module.
> + *
> + * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
> + * nfs_common module will only hold a reference on nfsd when localio is in use.
> + * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> + */
> +
> +extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> + const struct cred *cred, const struct nfs_fh *nfs_fh,
> + const fmode_t fmode, struct file **pfilp);
> +
> +nfs_to_nfsd_open_t get_nfsd_open_local_fh(void)
> +{
> + return symbol_request(nfsd_open_local_fh);
> +}
> +EXPORT_SYMBOL_GPL(get_nfsd_open_local_fh);
> +
> +void put_nfsd_open_local_fh(void)
> +{
> + symbol_put(nfsd_open_local_fh);
> +}
> +EXPORT_SYMBOL_GPL(put_nfsd_open_local_fh);
> diff --git a/include/linux/nfs.h b/include/linux/nfs.h
> index 2dacfe9742c6..64ed672a0b34 100644
> --- a/include/linux/nfs.h
> +++ b/include/linux/nfs.h
> @@ -48,10 +48,6 @@ enum nfs3_stable_how {
> NFS_INVALID_STABLE_HOW = -1
> };
>
> -typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
> - const struct nfs_fh *, const fmode_t,
> - struct file **);
> -
> #ifdef CONFIG_CRC32
> /**
> * nfs_fhandle_hash - calculate the crc32 hash for the filehandle
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index efcdb4d8e9de..f5760b05ec87 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -8,6 +8,7 @@
> #include <linux/wait.h>
> #include <linux/nfs_xdr.h>
> #include <linux/sunrpc/xprt.h>
> +#include <linux/nfslocalio.h>
>
> #include <linux/atomic.h>
> #include <linux/refcount.h>
> @@ -131,6 +132,7 @@ struct nfs_client {
> struct timespec64 cl_nfssvc_boot;
> seqlock_t cl_boot_lock;
> struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
> + nfs_to_nfsd_open_t nfsd_open_local_fh;
> };
>
> /*
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index d0bbacd0adcf..b8df1b9f248d 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -7,6 +7,7 @@
>
> #include <linux/list.h>
> #include <linux/uuid.h>
> +#include <linux/nfs.h>
>
> /*
> * Global list of nfsd_uuid_t instances, add/remove
> @@ -26,4 +27,11 @@ typedef struct {
>
> bool nfsd_uuid_is_local(const uuid_t *uuid);
>
> +typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
> + const struct nfs_fh *, const fmode_t,
> + struct file **);
> +
> +nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
> +void put_nfsd_open_local_fh(void);
> +
> #endif /* __LINUX_NFSLOCALIO_H */
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v5 13/19] nfs/nfsd: ensure localio server always uses its network namespace
2024-06-18 20:19 ` [PATCH v5 13/19] nfs/nfsd: ensure localio server always uses its network namespace Mike Snitzer
@ 2024-06-18 21:36 ` Jeff Layton
0 siblings, 0 replies; 45+ messages in thread
From: Jeff Layton @ 2024-06-18 21:36 UTC (permalink / raw)
To: Mike Snitzer, linux-nfs; +Cc: Chuck Lever, Trond Myklebust, NeilBrown, snitzer
On Tue, 2024-06-18 at 16:19 -0400, Mike Snitzer wrote:
> Pass the stored cl_nfssvc_net from the client to the server as first
> argument to nfsd_open_local_fh() to ensure the proper network
> namespace is used for localio.
>
> Otherwise, before this commit, the nfs_client's network namespace was
> used (as extracted from the client's cl_rpcclient). This is clearly
> not going to allow proper functionality if the client and server
> happen to have disjoint network namespaces.
>
> Elected to not rename the nfsd_uuid_t structure despite it growing a
> non-uuid member. Can revisit later.
>
I think this too needs to be introduced earlier in the series. Prior to
this point, someone could have LOCALIO enabled in their kernel, no? If
so, then that seems like it may be broken.
I think as a goal, we should ensure that we don't allow someone to turn
on LOCALIO until all of the necessary pieces are in place. Otherwise,
things may be "funny" during a bisect.
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfs/client.c | 1 +
> fs/nfs/localio.c | 12 ++++++++----
> fs/nfs_common/nfslocalio.c | 15 +++++++++------
> fs/nfsd/localio.c | 9 +++++----
> fs/nfsd/nfssvc.c | 1 +
> fs/nfsd/vfs.h | 3 ++-
> include/linux/nfs_fs_sb.h | 1 +
> include/linux/nfslocalio.h | 10 ++++++----
> 8 files changed, 33 insertions(+), 19 deletions(-)
>
> diff --git a/fs/nfs/client.c b/fs/nfs/client.c
> index cbabcdf3d785..40077ad08ccb 100644
> --- a/fs/nfs/client.c
> +++ b/fs/nfs/client.c
> @@ -171,6 +171,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
>
> INIT_LIST_HEAD(&clp->cl_superblocks);
> clp->cl_rpcclient = clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
> + clp->cl_nfssvc_net = NULL;
> clp->nfsd_open_local_fh = NULL;
>
> clp->cl_flags = cl_init->init_flags;
> diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
> index ddd17549812e..d41130f5a84d 100644
> --- a/fs/nfs/localio.c
> +++ b/fs/nfs/localio.c
> @@ -132,10 +132,11 @@ EXPORT_SYMBOL_GPL(nfs_server_is_local);
> /*
> * nfs_local_enable - attempt to enable local i/o for an nfs_client
> */
> -static void nfs_local_enable(struct nfs_client *clp)
> +static void nfs_local_enable(struct nfs_client *clp, struct net *net)
> {
> if (READ_ONCE(clp->nfsd_open_local_fh)) {
> set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
> + clp->cl_nfssvc_net = net;
> trace_nfs_local_enable(clp);
> }
> }
> @@ -153,6 +154,7 @@ void nfs_local_disable(struct nfs_client *clp)
> rpc_shutdown_client(clp->cl_rpcclient_localio);
> clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
> }
> + clp->cl_nfssvc_net = NULL;
> }
> }
>
> @@ -192,6 +194,7 @@ static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
> void nfs_local_probe(struct nfs_client *clp)
> {
> uuid_t uuid;
> + struct net *net = NULL;
>
> if (!localio_enabled)
> goto unsupported;
> @@ -211,7 +214,7 @@ void nfs_local_probe(struct nfs_client *clp)
> * by verifying client's nfsd, with specified uuid, is local.
> */
> if (!nfs_local_server_getuuid(clp, &uuid) ||
> - !nfsd_uuid_is_local(&uuid))
> + !nfsd_uuid_is_local(&uuid, &net))
> goto unsupported;
> break;
> default:
> @@ -219,7 +222,7 @@ void nfs_local_probe(struct nfs_client *clp)
> }
>
> dprintk("%s: detected local server.\n", __func__);
> - nfs_local_enable(clp);
> + nfs_local_enable(clp, net);
> return;
>
> unsupported:
> @@ -243,7 +246,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
> if (mode & ~(FMODE_READ | FMODE_WRITE))
> return ERR_PTR(-EINVAL);
>
> - status = clp->nfsd_open_local_fh(clp->cl_rpcclient, cred, fh, mode, &filp);
> + status = clp->nfsd_open_local_fh(clp->cl_nfssvc_net, clp->cl_rpcclient,
> + cred, fh, mode, &filp);
> if (status < 0) {
> dprintk("%s: open local file failed error=%d\n",
> __func__, status);
> diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
> index c454c4100976..086e09b3ec38 100644
> --- a/fs/nfs_common/nfslocalio.c
> +++ b/fs/nfs_common/nfslocalio.c
> @@ -12,29 +12,32 @@ MODULE_LICENSE("GPL");
> /*
> * Global list of nfsd_uuid_t instances, add/remove
> * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
> - * Reads are protected RCU read lock (see below).
> + * Reads are protected by RCU read lock (see below).
> */
> LIST_HEAD(nfsd_uuids);
> EXPORT_SYMBOL(nfsd_uuids);
>
> /* Must be called with RCU read lock held. */
> -static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid)
> +static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid,
> + struct net **netp)
> {
> nfsd_uuid_t *nfsd_uuid;
>
> list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
> - if (uuid_equal(&nfsd_uuid->uuid, uuid))
> + if (uuid_equal(&nfsd_uuid->uuid, uuid)) {
> + *netp = nfsd_uuid->net;
> return &nfsd_uuid->uuid;
> + }
>
> return &uuid_null;
> }
>
> -bool nfsd_uuid_is_local(const uuid_t *uuid)
> +bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp)
> {
> const uuid_t *nfsd_uuid;
>
> rcu_read_lock();
> - nfsd_uuid = nfsd_uuid_lookup(uuid);
> + nfsd_uuid = nfsd_uuid_lookup(uuid, netp);
> rcu_read_unlock();
>
> return !uuid_is_null(nfsd_uuid);
> @@ -51,7 +54,7 @@ EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
> * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
> */
>
> -extern int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +extern int nfsd_open_local_fh(struct net *, struct rpc_clnt *rpc_clnt,
> const struct cred *cred, const struct nfs_fh *nfs_fh,
> const fmode_t fmode, struct file **pfilp);
>
> diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
> index 7ecd72406dc0..34678bfed579 100644
> --- a/fs/nfsd/localio.c
> +++ b/fs/nfsd/localio.c
> @@ -103,10 +103,10 @@ nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
> }
>
> static struct svc_rqst *
> -nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> +nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
> + const struct cred *cred)
> {
> struct svc_rqst *rqstp;
> - struct net *net = rpc_net_ns(rpc_clnt);
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> int status;
>
> @@ -186,7 +186,8 @@ nfsd_local_fakerqst_create(struct rpc_clnt *rpc_clnt, const struct cred *cred)
> * dependency on knfsd. So, there is no forward declaration in a header file
> * for it.
> */
> -int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +int nfsd_open_local_fh(struct net *net,
> + struct rpc_clnt *rpc_clnt,
> const struct cred *cred,
> const struct nfs_fh *nfs_fh,
> const fmode_t fmode,
> @@ -203,7 +204,7 @@ int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> /* Save creds before calling into nfsd */
> save_cred = get_current_cred();
>
> - rqstp = nfsd_local_fakerqst_create(rpc_clnt, cred);
> + rqstp = nfsd_local_fakerqst_create(net, rpc_clnt, cred);
> if (IS_ERR(rqstp)) {
> status = PTR_ERR(rqstp);
> goto out_revertcred;
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index a81be9b39399..48bfd3c6d619 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -473,6 +473,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
> #endif
> #if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> INIT_LIST_HEAD(&nn->nfsd_uuid.list);
> + nn->nfsd_uuid.net = net;
> list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
> #endif
> nn->nfsd_net_up = true;
> diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
> index 91c50649a8c7..af07bb146e81 100644
> --- a/fs/nfsd/vfs.h
> +++ b/fs/nfsd/vfs.h
> @@ -160,7 +160,8 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
>
> void nfsd_filp_close(struct file *fp);
>
> -int nfsd_open_local_fh(struct rpc_clnt *rpc_clnt,
> +int nfsd_open_local_fh(struct net *net,
> + struct rpc_clnt *rpc_clnt,
> const struct cred *cred,
> const struct nfs_fh *nfs_fh,
> const fmode_t fmode,
> diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
> index f5760b05ec87..f47ea512eb0a 100644
> --- a/include/linux/nfs_fs_sb.h
> +++ b/include/linux/nfs_fs_sb.h
> @@ -132,6 +132,7 @@ struct nfs_client {
> struct timespec64 cl_nfssvc_boot;
> seqlock_t cl_boot_lock;
> struct rpc_clnt * cl_rpcclient_localio; /* localio RPC client handle */
> + struct net * cl_nfssvc_net;
> nfs_to_nfsd_open_t nfsd_open_local_fh;
> };
>
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index b8df1b9f248d..c9592ad0afe2 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -8,6 +8,7 @@
> #include <linux/list.h>
> #include <linux/uuid.h>
> #include <linux/nfs.h>
> +#include <net/net_namespace.h>
>
> /*
> * Global list of nfsd_uuid_t instances, add/remove
> @@ -23,13 +24,14 @@ extern struct list_head nfsd_uuids;
> typedef struct {
> uuid_t uuid;
> struct list_head list;
> + struct net *net; /* nfsd's network namespace */
> } nfsd_uuid_t;
>
> -bool nfsd_uuid_is_local(const uuid_t *uuid);
> +bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp);
>
> -typedef int (*nfs_to_nfsd_open_t)(struct rpc_clnt *, const struct cred *,
> - const struct nfs_fh *, const fmode_t,
> - struct file **);
> +typedef int (*nfs_to_nfsd_open_t)(struct net *, struct rpc_clnt *,
> + const struct cred *, const struct nfs_fh *,
> + const fmode_t, struct file **);
>
> nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
> void put_nfsd_open_local_fh(void);
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst
2024-06-18 20:19 ` [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
@ 2024-06-18 21:46 ` Chuck Lever
2024-06-19 5:47 ` NeilBrown
0 siblings, 1 reply; 45+ messages in thread
From: Chuck Lever @ 2024-06-18 21:46 UTC (permalink / raw)
To: Mike Snitzer; +Cc: linux-nfs, Jeff Layton, Trond Myklebust, NeilBrown, snitzer
On Tue, Jun 18, 2024 at 04:19:49PM -0400, Mike Snitzer wrote:
> This document gives an overview of the LOCALIO protocol extension
> added to the Linux NFS client and server (both v3 and v4) to allow a
> client and server to reliably handshake to determine if they are on
> the same host. The LOCALIO protocol extension follows the well-worn
> pattern established by the ACL protocol extension.
>
> The robust handshake between local client and server is just the
> beginning, the ultimate use-case this locality makes possible is the
> client is able to issue reads, writes and commits directly to the
> server without having to go over the network.
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
> include/linux/nfslocalio.h | 2 +
> 2 files changed, 103 insertions(+)
> create mode 100644 Documentation/filesystems/nfs/localio.rst
>
> diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> new file mode 100644
> index 000000000000..4b4595037a7f
> --- /dev/null
> +++ b/Documentation/filesystems/nfs/localio.rst
> @@ -0,0 +1,101 @@
> +===========
> +NFS localio
> +===========
> +
> +This document gives an overview of the LOCALIO protocol extension added
> +to the Linux NFS client and server (both v3 and v4) to allow a client
> +and server to reliably handshake to determine if they are on the same
> +host. The LOCALIO protocol extension follows the well-worn pattern
> +established by the ACL protocol extension.
> +
> +The LOCALIO protocol extension is needed to allow robust discovery of
> +clients local to their servers. Prior to this extension a fragile
> +sockaddr network address based match against all local network
> +interfaces was attempted. But unlike the LOCALIO protocol extension,
> +the sockaddr-based matching didn't handle use of iptables or containers.
> +
> +The robust handshake between local client and server is just the
> +beginning, the ultimate use-case this locality makes possible is the
> +client is able to issue reads, writes and commits directly to the server
> +without having to go over the network. This is particularly useful for
> +container usecases (e.g. kubernetes) where it is possible to run an IO
> +job local to the server.
> +
> +The performance advantage realized from localio's ability to bypass
> +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> +- With localio:
> + read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> +- Without localio:
> + read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> +
> +RPC
> +---
> +
> +The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
> +the client to retrieve a server's uuid. LOCALIOPROC_GETUUID encodes the
> +server's uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed
> +size opaque encode and decode XDR methods are used instead of the less
> +efficient variable sized methods.
I'm reading between the lines ("well-worn pattern established by
the [NFS]ACL protocol"). I'm guessing that the client and server
will exchange this protocol on the same connection as NFS traffic?
The use of the term "extension" in this Document might be atypical.
An /extension/ means that the base RPC program (NFS in this case)
is somehow modified. However, if LOCALIO is a distinct RPC program
then this isn't an extension of the NFS protocol, per se.
A protocol spec needs to include:
o The RPC program and version number
o A description of each of its procedures, along with an XDR definition
of its arguments and results
o Any related constants or bit mask values
And any details about a fixed destination port, or that
implementations should expect this RPC program to appear on the same
connection or transport as some other RPC program.
If this is a real extension of the NFS protocol, then I think the
usual rules apply of requiring standards action before we can merge
a Linux implementation of the extension. But I don't think that's
what you're doing...? That needs to be made more clear.
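To make the "XDR definition of its arguments and results" point concrete, here is a rough userspace sketch of what the quoted text implies for the uuid payload on the wire: a fixed-size opaque of UUID_SIZE (16) bytes, compared with the variable-sized form. The helper names (xdr_pad_len, xdr_put_opaque_fixed, xdr_put_opaque_var, xdr_get_opaque_fixed) are made up for illustration and are not the kernel's XDR helpers. Whether the real GETUUID reply carries anything besides the uuid is not stated here, so the sketch covers only the uuid bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/* A 16-byte uuid is already 4-byte aligned, so XDR adds no padding to it. */
static_assert(UUID_SIZE % 4 == 0, "uuid needs no XDR padding");

/* XDR pads opaque data with zero bytes up to a multiple of four bytes. */
static size_t xdr_pad_len(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Fixed-size opaque: the bytes themselves, then padding. No length word. */
static size_t xdr_put_opaque_fixed(uint8_t *out, const uint8_t *data, size_t len)
{
	size_t padded = xdr_pad_len(len);

	memcpy(out, data, len);
	memset(out + len, 0, padded - len);
	return padded;
}

/* Variable-size opaque: a 4-byte big-endian length, then the fixed form. */
static size_t xdr_put_opaque_var(uint8_t *out, const uint8_t *data, uint32_t len)
{
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	return 4 + xdr_put_opaque_fixed(out + 4, data, len);
}

/* Decode a fixed-size opaque; returns the number of wire bytes consumed. */
static size_t xdr_get_opaque_fixed(uint8_t *data, const uint8_t *in, size_t len)
{
	memcpy(data, in, len);
	return xdr_pad_len(len);
}
```

With these helpers the uuid occupies 16 bytes on the wire in the fixed form and 20 bytes in the variable form, which is the kind of detail a protocol description would be expected to spell out.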
> +
> +NFS Common and Server
> +---------------------
> +
> +First use is in nfsd, to add access to a global nfsd_uuids list in
> +nfs_common that is used to register and then identify local nfsd
> +instances.
> +
> +nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
> +composed of nfsd_uuid_t instances that are managed as nfsd creates them
> +(per network namespace).
> +
> +nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
> +nfsd for the client specified nfsd uuid.
> +
> +The nfsd_uuids list is the basis for localio enablement, as such it has
> +members that point to nfsd memory for direct use by the client
> +(e.g. 'net' is the server's network namespace, through it the client can
> +access nn->nfsd_serv with proper rcu read access). It is this client
> +and server synchronization that enables advanced usage and lifetime of
> +objects to span from the host kernel's nfsd to per-container knfsd
> +instances that are connected to nfs clients running on the same local
> +host.
> +
> +NFS Client
> +----------
> +
> +fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
> +LOCALIO protocol and check if the server with that uuid is known to be
> +local. This ensures client and server 1: support localio 2: are local
> +to each other.
> +
> +See fs/nfs/localio.c:nfs_local_open_fh() and
> +fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
> +focused use of nfsd_uuid_t struct to allow a client local to a server to
> +open a file pointer without needing to go over the network.
> +
> +The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
> +server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
> +both the nfsd network namespace and the associated nn->nfsd_serv in
> +terms of RCU. If nfsd_open_local_fh() finds that client no longer sees
> +valid nfsd objects (be it struct net or nn->nfsd_serv) it returns ENXIO
> +to nfs_local_open_fh() and the client will try to reestablish the
> +LOCALIO resources needed by calling nfs_local_probe() again. This
> +recovery is needed if/when an nfsd instance running in a container were
> +to reboot while a localio client is connected to it.
> +
> +Testing
> +-------
> +
> +The LOCALIO protocol extension and associated NFS localio read, write
> +and commit access have proven stable against various test scenarios:
> +
> +- Client and server both on localhost (for both v3 and v4.2).
> +
> +- Various permutations of client and server support enablement for
> + both local and remote client and server. Testing against NFS storage
> + products that don't support the LOCALIO protocol was also performed.
> +
> +- Client on host, server within a container (for both v3 and v4.2)
> + The container testing was in terms of podman managed containers and
> + includes container stop/restart scenario.
This isn't what I meant by a section on testing.
I meant "How would I go about testing this myself? What tests are
publicly available or part of existing NFS test suites we commonly
use?"
So, this Documentation needs a recipe for setting up a client/server
with LOCALIO and some details about how it can be tested.
What you wrote is appropriate for the series cover letter.
> diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
> index c9592ad0afe2..a9722e18b527 100644
> --- a/include/linux/nfslocalio.h
> +++ b/include/linux/nfslocalio.h
> @@ -20,6 +20,8 @@ extern struct list_head nfsd_uuids;
> * Each nfsd instance has an nfsd_uuid_t that is accessible through the
> * global nfsd_uuids list. Useful to allow a client to negotiate if localio
> * possible with its server.
> + *
> + * See Documentation/filesystems/nfs/localio.rst for more detail.
> */
> typedef struct {
> uuid_t uuid;
> --
> 2.44.0
>
--
Chuck Lever
* Re: [PATCH v5 05/19] nfs_common: add NFS LOCALIO protocol extension enablement
2024-06-18 20:19 ` [PATCH v5 05/19] nfs_common: add NFS LOCALIO protocol extension enablement Mike Snitzer
@ 2024-06-19 5:04 ` NeilBrown
0 siblings, 0 replies; 45+ messages in thread
From: NeilBrown @ 2024-06-19 5:04 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Chuck Lever, Trond Myklebust, snitzer
On Wed, 19 Jun 2024, Mike Snitzer wrote:
> First use is in nfsd, to add access to a global nfsd_uuids list that
> will be used to identify local nfsd instances.
>
> nfsd_uuids is protected by nfsd_mutex or RCU read lock. List is
> composed of nfsd_uuid_t instances that are managed as nfsd creates
> them (per network namespace).
>
> nfsd_uuid_is_local() will be used to search all local nfsd for the
> client specified nfsd uuid.
>
> +
> +bool nfsd_uuid_is_local(const uuid_t *uuid)
> +{
> + const uuid_t *nfsd_uuid;
> +
> + rcu_read_lock();
> + nfsd_uuid = nfsd_uuid_lookup(uuid);
> + rcu_read_unlock();
> +
> + return !uuid_is_null(nfsd_uuid);
This uuid_is_null() test needs to be inside rcu_read_lock()ed region, or
it could deref a freed pointer.
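A minimal user-space sketch of the reordering, for illustration only.
The rcu_read_lock()/rcu_read_unlock() stubs, the uuid_t type and the
known_uuids table are stand-ins invented here so the fragment compiles
outside the kernel; it also assumes nfsd_uuid_lookup() returns a
pointer to a null uuid on a miss, which the quoted hunk implies but
does not show. The point is only that the result is computed before
the read-side section ends:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-ins for the kernel primitives; they only track nesting depth. */
static int rcu_depth;
static void rcu_read_lock(void)   { rcu_depth++; }
static void rcu_read_unlock(void) { rcu_depth--; }

/* Simplified stand-in for the kernel's uuid_t. */
typedef struct { unsigned char b[16]; } uuid_t;
static const uuid_t uuid_null = { { 0 } };

static bool uuid_equal(const uuid_t *a, const uuid_t *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}

static bool uuid_is_null(const uuid_t *u)
{
	return uuid_equal(u, &uuid_null);
}

/* Hypothetical table standing in for the RCU-protected nfsd_uuids list. */
static uuid_t known_uuids[4];
static size_t nr_known_uuids;

/* The returned pointer is only safe to use inside the read-side section. */
static const uuid_t *nfsd_uuid_lookup(const uuid_t *uuid)
{
	for (size_t i = 0; i < nr_known_uuids; i++)
		if (uuid_equal(&known_uuids[i], uuid))
			return &known_uuids[i];
	return &uuid_null;
}

bool nfsd_uuid_is_local(const uuid_t *uuid)
{
	bool is_local;

	rcu_read_lock();
	/* Dereference the looked-up entry before leaving the section. */
	is_local = !uuid_is_null(nfsd_uuid_lookup(uuid));
	rcu_read_unlock();

	return is_local;
}
```

Only the boolean leaves the critical section, so nothing touches the
list entry once a concurrent removal is free to release it.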
But this seems to be a good place in the series to propose a bigger
change.
I think that every fs that is communicating with a localio server should
be registered with nfs_common even if that server isn't presently local.
On each IO it should check if the server is actually local, and act
accordingly. This might mean an extra pointer-deref but if that is
deemed a problem I'm sure we can find a solution.
Imagine an NFS server cluster where the server instances can migrate
around the cluster. Imagine there are also client side mounts on nodes
in this cluster. At any point a server might migrate onto a node which
is also a client of that server. We have customers who do this and for
that reason we make sure that loop-back mounts work and don't hit memory
deadlocks.
It would be good if these configurations could set the uuid for each
server instance and have it follow the server when it is migrated. So
if the server suddenly becomes local, we get to bypass the network code
for all IO.
Each uuid registered with nfs_common could be possibly linked to a
server, and to zero or more struct nfs_clients. When the server
registration changes, we walk the list of clients and tell them about
the change.
We could certainly add that functionality later if there seems to be too
much change already, but I think it would be good to add at some stage,
and maybe now is the right time.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 09/19] nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM
2024-06-18 20:19 ` [PATCH v5 09/19] nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-06-19 5:30 ` NeilBrown
2024-06-19 13:18 ` Chuck Lever III
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-06-19 5:30 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Chuck Lever, Trond Myklebust, snitzer
On Wed, 19 Jun 2024, Mike Snitzer wrote:
> LOCALIOPROC_GETUUID allows client to discover server's uuid.
>
> nfs_local_probe() will retrieve server's uuid via LOCALIO protocol and
> verify that the server with that uuid is known to be local. This ensures
> client and server 1: support localio 2: are local to each other.
>
> While doing so, factor out nfs_init_localioclient() so it is used by
> both nfs3client.c and nfs4client.c
>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
..
>
> +#define NFS_LOCALIO_PROGRAM 100229
According to RFC5531, this number is reserved for "metad".
It might be best not to use it.
That RFC says that assigning numbers isn't a job for IETF standard-track
and handed the job over to IANA.
IANA...
https://www.iana.org/assignments/sun-rpc-numbers/sun-rpc-numbers.xhtml
thinks SUN rpc numbers are obsolete.
So maybe nobody cares.
I would feel most comfortable allocating a number from the range:
0x20000000 - 0x3fffffff Defined by local administrator
(some blocks assigned here)
and maybe make it configurable by a module parameter just to be on the
safe side (overkill??)
We could try registering with lanana.org (The Linux Assigned Names And
Numbers Authority) but I wouldn't be surprised if that went nowhere.
While this might not matter in practice, I think we should appear to be
doing the right thing.
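As a rough sketch of what that could look like (plain C, not the
actual patch): the range bounds are the ones quoted above, while the
macro names, the default value and localio_set_program() are all
hypothetical. The setter stands in for whatever validation a module
parameter handler would do; no specific program number is being
proposed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* "Defined by local administrator" range; macro names are made up. */
#define RPC_PROG_LOCAL_ADMIN_FIRST 0x20000000u
#define RPC_PROG_LOCAL_ADMIN_LAST  0x3fffffffu

/* Hypothetical default; the thread does not settle on a value. */
static uint32_t localio_program = RPC_PROG_LOCAL_ADMIN_FIRST;

static bool rpc_prog_is_local_admin(uint32_t prog)
{
	return prog >= RPC_PROG_LOCAL_ADMIN_FIRST &&
	       prog <= RPC_PROG_LOCAL_ADMIN_LAST;
}

/* Accept an override only if it stays inside the local-admin range. */
static int localio_set_program(uint32_t prog)
{
	if (!rpc_prog_is_local_admin(prog))
		return -1;
	localio_program = prog;
	return 0;
}
```

With a check like this, 100229 would be rejected as an override, since
it falls in the block below 0x20000000.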
NeilBrown
> +#define LOCALIOPROC_NULL 0
> +#define LOCALIOPROC_GETUUID 1
> +
> #define NFS_PIPE_DIRNAME "nfs"
>
> /*
> --
> 2.44.0
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst
2024-06-18 21:46 ` Chuck Lever
@ 2024-06-19 5:47 ` NeilBrown
2024-06-19 18:27 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: NeilBrown @ 2024-06-19 5:47 UTC (permalink / raw)
To: Chuck Lever
Cc: Mike Snitzer, linux-nfs, Jeff Layton, Trond Myklebust, snitzer
On Wed, 19 Jun 2024, Chuck Lever wrote:
> On Tue, Jun 18, 2024 at 04:19:49PM -0400, Mike Snitzer wrote:
> > This document gives an overview of the LOCALIO protocol extension
> > added to the Linux NFS client and server (both v3 and v4) to allow a
> > client and server to reliably handshake to determine if they are on
> > the same host. The LOCALIO protocol extension follows the well-worn
> > pattern established by the ACL protocol extension.
> >
> > The robust handshake between local client and server is just the
> > beginning, the ultimate use-case this locality makes possible is the
> > client is able to issue reads, writes and commits directly to the
> > server without having to go over the network.
> >
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
> > include/linux/nfslocalio.h | 2 +
> > 2 files changed, 103 insertions(+)
> > create mode 100644 Documentation/filesystems/nfs/localio.rst
> >
> > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > new file mode 100644
> > index 000000000000..4b4595037a7f
> > --- /dev/null
> > +++ b/Documentation/filesystems/nfs/localio.rst
> > @@ -0,0 +1,101 @@
> > +===========
> > +NFS localio
> > +===========
> > +
> > +This document gives an overview of the LOCALIO protocol extension added
> > +to the Linux NFS client and server (both v3 and v4) to allow a client
> > +and server to reliably handshake to determine if they are on the same
> > +host. The LOCALIO protocol extension follows the well-worn pattern
> > +established by the ACL protocol extension.
> > +
> > +The LOCALIO protocol extension is needed to allow robust discovery of
> > +clients local to their servers. Prior to this extension a fragile
> > +sockaddr network address based match against all local network
> > +interfaces was attempted. But unlike the LOCALIO protocol extension,
> > +the sockaddr-based matching didn't handle use of iptables or containers.
> > +
> > +The robust handshake between local client and server is just the
> > +beginning, the ultimate use-case this locality makes possible is the
> > +client is able to issue reads, writes and commits directly to the server
> > +without having to go over the network. This is particularly useful for
> > +container usecases (e.g. kubernetes) where it is possible to run an IO
> > +job local to the server.
> > +
> > +The performance advantage realized from localio's ability to bypass
> > +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> > +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> > +- With localio:
> > + read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > +- Without localio:
> > + read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > +
> > +RPC
> > +---
> > +
> > +The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
> > +the client to retrieve a server's uuid. LOCALIOPROC_GETUUID encodes the
> > +server's uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed
> > +size opaque encode and decode XDR methods are used instead of the less
> > +efficient variable sized methods.
>
> I'm reading between the lines ("well-worn pattern established by
> the [NFS]ACL protocol"). I'm guessing that the client and server
> will exchange this protocol on the same connection as NFS traffic?
>
> The use of the term "extension" in this Document might be atypical.
> An /extension/ means that the base RPC program (NFS in this case)
> is somehow modified. However, if LOCALIO is a distinct RPC program
> then this isn't an extension of the NFS protocol, per se.
>
> A protocol spec needs to include:
>
> o The RPC program and version number
>
> o A description of each of its procedures, along with an XDR definition
> of its arguments and results
>
> o Any related constants or bit mask values
Note that providing this information in the format of a ".x" file as
understood by rpcgen is a good approach.
It isn't clear to me why you implement both v3 and v4 of the LOCALIO
program. I don't see how they relate to the NFS protocol version. Just
implement v1 which simply returns the UUID.
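To make the "simply returns the UUID" part concrete, here is a small
user-space sketch of how a fixed-size opaque is laid out in XDR: the
raw bytes, zero-padded to a multiple of four, with no length word in
front. The function names are invented for this example and are not
the helpers from the series; the 16-byte UUID_SIZE is the value given
in the quoted documentation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/* XDR pads every item to a multiple of four bytes. */
static size_t xdr_pad4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/*
 * Encode a fixed-size opaque: the raw bytes followed by zero padding,
 * with no length word in front. Returns the number of bytes written,
 * or 0 if the buffer is too small.
 */
static size_t xdr_encode_opaque_fixed(uint8_t *buf, size_t buflen,
				      const uint8_t *data, size_t len)
{
	size_t padded = xdr_pad4(len);

	if (buflen < padded)
		return 0;
	memcpy(buf, data, len);
	memset(buf + len, 0, padded - len);
	return padded;
}

/* Hypothetical v1 GETUUID reply body: just the 16-byte uuid. */
static size_t localio_encode_getuuid_res(uint8_t *buf, size_t buflen,
					 const uint8_t uuid[UUID_SIZE])
{
	return xdr_encode_opaque_fixed(buf, buflen, uuid, UUID_SIZE);
}
```

Because 16 is already a multiple of four, the reply body is exactly
16 bytes; a variable-length opaque would add a four-byte length word.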
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
` (18 preceding siblings ...)
2024-06-18 20:19 ` [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
@ 2024-06-19 5:49 ` Christoph Hellwig
2024-06-19 7:10 ` NeilBrown
2024-06-19 14:02 ` Trond Myklebust
19 siblings, 2 replies; 45+ messages in thread
From: Christoph Hellwig @ 2024-06-19 5:49 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Chuck Lever, Trond Myklebust, NeilBrown,
snitzer
What happened to the requirement that all protocol extensions added
to Linux need to be standardized in IETF RFCs?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 5:49 ` [PATCH v5 00/19] nfs/nfsd: add support for localio Christoph Hellwig
@ 2024-06-19 7:10 ` NeilBrown
2024-06-19 7:15 ` Christoph Hellwig
` (2 more replies)
2024-06-19 14:02 ` Trond Myklebust
1 sibling, 3 replies; 45+ messages in thread
From: NeilBrown @ 2024-06-19 7:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, linux-nfs, Jeff Layton, Chuck Lever,
Trond Myklebust, snitzer
On Wed, 19 Jun 2024, Christoph Hellwig wrote:
> What happened to the requirement that all protocol extensions added
> to Linux need to be standardized in IETF RFCs?
>
>
Is that requirement documented somewhere? Not that I doubt it, but it
would be nice to know where it is explicit. I couldn't quickly find
anything in Documentation/
Can we get by without the LOCALIO protocol?
For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
is explicitly documented as being usable to determine if two servers are
the same.
For NFSv4.0 ... I don't think we should encourage that to be used.
For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
server_owner4. If krb5 was used there would probably be a server
identity in there that could be used.
I think the server could theoretically return an AUTH_SYS verifier in
each RPC reply and that could be used to identify the server. I'm not
sure that is a good idea though.
Going through the IETF process for something that is entirely private to
Linux seems a bit more than should be necessary..
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 7:10 ` NeilBrown
@ 2024-06-19 7:15 ` Christoph Hellwig
2024-06-19 10:09 ` Jeff Layton
2024-06-19 17:57 ` Mike Snitzer
2 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2024-06-19 7:15 UTC (permalink / raw)
To: NeilBrown
Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Jeff Layton,
Chuck Lever, Trond Myklebust, snitzer
On Wed, Jun 19, 2024 at 05:10:10PM +1000, NeilBrown wrote:
> Is that requirement documented somewhere?
Trond has responded with that policy to various in progress features
in the past for the client. I think it also is a generally very useful
policy. (Note that we ignore it with the NFSv3 side band protocols,
but that is ancient past)
> Not that I doubt it, but it
> would be nice to know where it is explicit. I couldn't quickly find
> anything in Documentation/
Agreed.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 7:10 ` NeilBrown
2024-06-19 7:15 ` Christoph Hellwig
@ 2024-06-19 10:09 ` Jeff Layton
2024-06-19 21:09 ` NeilBrown
2024-06-19 17:57 ` Mike Snitzer
2 siblings, 1 reply; 45+ messages in thread
From: Jeff Layton @ 2024-06-19 10:09 UTC (permalink / raw)
To: NeilBrown, Christoph Hellwig
Cc: Mike Snitzer, linux-nfs, Chuck Lever, Trond Myklebust, snitzer
On Wed, 2024-06-19 at 17:10 +1000, NeilBrown wrote:
> On Wed, 19 Jun 2024, Christoph Hellwig wrote:
> > What happened to the requirement that all protocol extensions added
> > to Linux need to be standardized in IETF RFCs?
> >
> >
>
> Is that requirement documented somewhere? Not that I doubt it, but it
> would be nice to know where it is explicit. I couldn't quickly find
> anything in Documentation/
>
> Can we get by without the LOCALIO protocol?
>
> For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
> is explicitly documented as being usable to determine if two servers are
> the same.
>
> For NFSv4.0 ... I don't think we should encourage that to be used.
>
> For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
> 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
> server_owner4. If krb5 was used there would probably be a server
> identity in there that could be used.
> I think the server could theoretically return an AUTH_SYS verifier in
> each RPC reply and that could be used to identify the server. I'm not
> sure that is a good idea though.
>
My idea for v3 was that the localio client could do an O_TMPFILE create
on the exported fs and write some random junk to it (a uuid or
something). Construct the filehandle for that and then the client could
try to issue a READ for that filehandle via the NFS server. If it finds
that filehandle and the contents are correct then you're on the same
host. Then you just close the file and it should clean itself up.
This is a little less straightforward and efficient than the localio
protocol that Mike is proposing, but requires no protocol extensions.
> Going through the IETF process for something that is entirely private to
> Linux seems a bit more than should be necessary..
>
Agreed. Given that this is our own protocol extension and we don't have
any expectation of other clients or servers implementing this, I don't
see the point. I do agree that trying to avoid program number conflicts
is a good thing though.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 16/19] nfsd: use SRCU to dereference nn->nfsd_serv
2024-06-18 20:19 ` [PATCH v5 16/19] nfsd: " Mike Snitzer
@ 2024-06-19 12:39 ` Jeff Layton
2024-06-19 17:26 ` Mike Snitzer
0 siblings, 1 reply; 45+ messages in thread
From: Jeff Layton @ 2024-06-19 12:39 UTC (permalink / raw)
To: Mike Snitzer, linux-nfs; +Cc: Chuck Lever, Trond Myklebust, NeilBrown, snitzer
On Tue, 2024-06-18 at 16:19 -0400, Mike Snitzer wrote:
> Introduce nfsd_serv_get, nfsd_serv_put and nfsd_serv_sync and update
> the nfsd code to prevent nfsd_destroy_serv from destroying
> nn->nfsd_serv until all nfsd code is done with it (particularly the
> localio code that doesn't run in the context of nfsd's svc threads,
> nor does it take the nfsd_mutex).
>
> Commit 83d5e5b0af90 ("dm: optimize use SRCU and RCU") provided a
> familiar well-worn pattern for how to implement this.
>
> Suggested-by: NeilBrown <neilb@suse.de>
> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> ---
> fs/nfsd/filecache.c | 13 ++++++++---
> fs/nfsd/netns.h | 12 ++++++++--
> fs/nfsd/nfs4state.c | 25 ++++++++++++++-------
> fs/nfsd/nfsctl.c | 7 ++++--
> fs/nfsd/nfssvc.c | 55 ++++++++++++++++++++++++++++++++++++---------
> 5 files changed, 87 insertions(+), 25 deletions(-)
>
> diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> index 99631fa56662..474b3a3af3fb 100644
> --- a/fs/nfsd/filecache.c
> +++ b/fs/nfsd/filecache.c
> @@ -413,12 +413,15 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> struct nfsd_file *nf = list_first_entry(dispose,
> struct nfsd_file, nf_lru);
> struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> + int srcu_idx;
> + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
> struct nfsd_fcache_disposal *l = nn->fcache_disposal;
>
> spin_lock(&l->lock);
> list_move_tail(&nf->nf_lru, &l->freeme);
> spin_unlock(&l->lock);
> - svc_wake_up(nn->nfsd_serv);
> + svc_wake_up(serv);
> + nfsd_serv_put(nn, srcu_idx);
> }
> }
>
> @@ -443,11 +446,15 @@ void nfsd_file_net_dispose(struct nfsd_net *nn)
> for (i = 0; i < 8 && !list_empty(&l->freeme); i++)
> list_move(l->freeme.next, &dispose);
> spin_unlock(&l->lock);
> - if (!list_empty(&l->freeme))
> + if (!list_empty(&l->freeme)) {
> + int srcu_idx;
> + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
> /* Wake up another thread to share the work
> * *before* doing any actual disposing.
> */
> - svc_wake_up(nn->nfsd_serv);
> + svc_wake_up(serv);
> + nfsd_serv_put(nn, srcu_idx);
> + }
> nfsd_file_dispose_list(&dispose);
> }
> }
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 0c5a1d97e4ac..0eebcc03bcd3 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -139,8 +139,12 @@ struct nfsd_net {
> u32 clverifier_counter;
>
> struct svc_info nfsd_info;
> -#define nfsd_serv nfsd_info.serv
> -
> + /*
> + * The current 'nfsd_serv' at nfsd_info.serv
> + * Use nfsd_serv_get() or take nfsd_mutex to dereference.
> + */
> + void __rcu *nfsd_serv;
I don't understand why you need a void pointer here. This should only
ever hold a pointer to the serv or NULL. It seems like this would work just
as well:
struct svc_serv __rcu *nfsd_serv;
> + struct srcu_struct nfsd_serv_srcu;
>
> /*
> * clientid and stateid data for construction of net unique COPY
> @@ -225,6 +229,10 @@ struct nfsd_net {
> extern bool nfsd_support_version(int vers);
> extern void nfsd_netns_free_versions(struct nfsd_net *nn);
>
> +extern struct svc_serv *nfsd_serv_get(struct nfsd_net *nn, int *srcu_idx);
> +extern void nfsd_serv_put(struct nfsd_net *nn, int srcu_idx);
> +extern void nfsd_serv_sync(struct nfsd_net *nn);
> +
> extern unsigned int nfsd_net_id;
>
> void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index a20c2c9d7d45..8876810e569d 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1919,6 +1919,8 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn
> u32 num = ca->maxreqs;
> unsigned long avail, total_avail;
> unsigned int scale_factor;
> + int srcu_idx;
> + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
>
> spin_lock(&nfsd_drc_lock);
> if (nfsd_drc_max_mem > nfsd_drc_mem_used)
> @@ -1940,7 +1942,7 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn
> * Give the client one slot even if that would require
> * over-allocation--it is better than failure.
> */
> - scale_factor = max_t(unsigned int, 8, nn->nfsd_serv->sv_nrthreads);
> + scale_factor = max_t(unsigned int, 8, serv->sv_nrthreads);
>
> avail = clamp_t(unsigned long, avail, slotsize,
> total_avail/scale_factor);
> @@ -1949,6 +1951,8 @@ static u32 nfsd4_get_drc_mem(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn
> nfsd_drc_mem_used += num * slotsize;
> spin_unlock(&nfsd_drc_lock);
>
> + nfsd_serv_put(nn, srcu_idx);
> +
> return num;
> }
>
> @@ -3702,12 +3706,16 @@ nfsd4_replay_create_session(struct nfsd4_create_session *cr_ses,
>
> static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfsd_net *nn)
> {
> - u32 maxrpc = nn->nfsd_serv->sv_max_mesg;
> + int srcu_idx;
> + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
> + u32 maxrpc = serv->sv_max_mesg;
> + __be32 status = nfs_ok;
>
> - if (ca->maxreq_sz < NFSD_MIN_REQ_HDR_SEQ_SZ)
> - return nfserr_toosmall;
> - if (ca->maxresp_sz < NFSD_MIN_RESP_HDR_SEQ_SZ)
> - return nfserr_toosmall;
> + if (ca->maxreq_sz < NFSD_MIN_REQ_HDR_SEQ_SZ ||
> + ca->maxresp_sz < NFSD_MIN_RESP_HDR_SEQ_SZ) {
> + status = nfserr_toosmall;
> + goto out;
> + }
> ca->headerpadsz = 0;
> ca->maxreq_sz = min_t(u32, ca->maxreq_sz, maxrpc);
> ca->maxresp_sz = min_t(u32, ca->maxresp_sz, maxrpc);
> @@ -3726,8 +3734,9 @@ static __be32 check_forechannel_attrs(struct nfsd4_channel_attrs *ca, struct nfs
> * accounting is soft and provides no guarantees either way.
> */
> ca->maxreqs = nfsd4_get_drc_mem(ca, nn);
> -
> - return nfs_ok;
> +out:
> + nfsd_serv_put(nn, srcu_idx);
> + return status;
> }
>
> /*
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 1bddbbf7418e..2d4c29c25c6a 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -1569,10 +1569,12 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> {
> struct nfsd_net *nn = net_generic(sock_net(skb->sk), nfsd_net_id);
> int i, ret, rqstp_index = 0;
> + int srcu_idx;
> + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
>
> rcu_read_lock();
>
> - for (i = 0; i < nn->nfsd_serv->sv_nrpools; i++) {
> + for (i = 0; i < serv->sv_nrpools; i++) {
> struct svc_rqst *rqstp;
>
> if (i < cb->args[0]) /* already consumed */
> @@ -1580,7 +1582,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
>
> rqstp_index = 0;
> list_for_each_entry_rcu(rqstp,
> - &nn->nfsd_serv->sv_pools[i].sp_all_threads,
> + &serv->sv_pools[i].sp_all_threads,
> rq_all) {
> struct nfsd_genl_rqstp genl_rqstp;
> unsigned int status_counter;
> @@ -1645,6 +1647,7 @@ int nfsd_nl_rpc_status_get_dumpit(struct sk_buff *skb,
> ret = skb->len;
> out:
> rcu_read_unlock();
> + nfsd_serv_put(nn, srcu_idx);
>
> return ret;
> }
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index bfc58001dd9a..f84530f95eb8 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -300,6 +300,26 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
> return 0;
> }
>
> +struct svc_serv *nfsd_serv_get(struct nfsd_net *nn, int *srcu_idx)
> + __acquires(nn->nfsd_serv_srcu)
> +{
> + *srcu_idx = srcu_read_lock(&nn->nfsd_serv_srcu);
> +
> + return srcu_dereference(nn->nfsd_serv, &nn->nfsd_serv_srcu);
> +}
> +
> +void nfsd_serv_put(struct nfsd_net *nn, int srcu_idx)
> + __releases(nn->nfsd_serv_srcu)
> +{
> + srcu_read_unlock(&nn->nfsd_serv_srcu, srcu_idx);
> +}
> +
> +void nfsd_serv_sync(struct nfsd_net *nn)
> +{
> + synchronize_srcu(&nn->nfsd_serv_srcu);
> + synchronize_rcu_expedited();
> +}
> +
> /*
> * Maximum number of nfsd processes
> */
> @@ -507,6 +527,7 @@ static void nfsd_shutdown_net(struct net *net)
> lockd_down(net);
> nn->lockd_up = false;
> }
> + cleanup_srcu_struct(&nn->nfsd_serv_srcu);
> #if IS_ENABLED(CONFIG_NFSD_LOCALIO)
> list_del_rcu(&nn->nfsd_uuid.list);
> #endif
> @@ -514,6 +535,7 @@ static void nfsd_shutdown_net(struct net *net)
> nfsd_shutdown_generic();
> }
>
> +// FIXME: eliminate nfsd_notifier_lock
> static DEFINE_SPINLOCK(nfsd_notifier_lock);
> static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
> void *ptr)
> @@ -523,20 +545,22 @@ static int nfsd_inetaddr_event(struct notifier_block *this, unsigned long event,
> struct net *net = dev_net(dev);
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> struct sockaddr_in sin;
> + int srcu_idx;
> + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
>
> - if (event != NETDEV_DOWN || !nn->nfsd_serv)
> + if (event != NETDEV_DOWN || !serv)
> goto out;
>
> spin_lock(&nfsd_notifier_lock);
> - if (nn->nfsd_serv) {
> + if (serv) {
> dprintk("nfsd_inetaddr_event: removed %pI4\n", &ifa->ifa_local);
> sin.sin_family = AF_INET;
> sin.sin_addr.s_addr = ifa->ifa_local;
> - svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin);
> + svc_age_temp_xprts_now(serv, (struct sockaddr *)&sin);
> }
> spin_unlock(&nfsd_notifier_lock);
> -
> out:
> + nfsd_serv_put(nn, srcu_idx);
> return NOTIFY_DONE;
> }
>
> @@ -553,22 +577,24 @@ static int nfsd_inet6addr_event(struct notifier_block *this,
> struct net *net = dev_net(dev);
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> struct sockaddr_in6 sin6;
> + int srcu_idx;
> + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
>
> - if (event != NETDEV_DOWN || !nn->nfsd_serv)
> + if (event != NETDEV_DOWN || !serv)
> goto out;
>
> spin_lock(&nfsd_notifier_lock);
> - if (nn->nfsd_serv) {
> + if (serv) {
> dprintk("nfsd_inet6addr_event: removed %pI6\n", &ifa->addr);
> sin6.sin6_family = AF_INET6;
> sin6.sin6_addr = ifa->addr;
> if (ipv6_addr_type(&sin6.sin6_addr) & IPV6_ADDR_LINKLOCAL)
> sin6.sin6_scope_id = ifa->idev->dev->ifindex;
> - svc_age_temp_xprts_now(nn->nfsd_serv, (struct sockaddr *)&sin6);
> + svc_age_temp_xprts_now(serv, (struct sockaddr *)&sin6);
> }
> spin_unlock(&nfsd_notifier_lock);
> -
> out:
> + nfsd_serv_put(nn, srcu_idx);
> return NOTIFY_DONE;
> }
>
> @@ -589,9 +615,12 @@ void nfsd_destroy_serv(struct net *net)
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> struct svc_serv *serv = nn->nfsd_serv;
>
> + lockdep_assert_held(&nfsd_mutex);
> +
> spin_lock(&nfsd_notifier_lock);
> - nn->nfsd_serv = NULL;
> + rcu_assign_pointer(nn->nfsd_serv, NULL);
> spin_unlock(&nfsd_notifier_lock);
> + nfsd_serv_sync(nn);
>
> /* check if the notifier still has clients */
> if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
> @@ -711,6 +740,10 @@ int nfsd_create_serv(struct net *net)
> if (nn->nfsd_serv)
> return 0;
>
> + error = init_srcu_struct(&nn->nfsd_serv_srcu);
> + if (error)
> + return error;
> +
> if (nfsd_max_blksize == 0)
> nfsd_max_blksize = nfsd_get_default_max_blksize();
> nfsd_reset_versions(nn);
> @@ -727,8 +760,10 @@ int nfsd_create_serv(struct net *net)
> }
> spin_lock(&nfsd_notifier_lock);
> nn->nfsd_info.mutex = &nfsd_mutex;
> - nn->nfsd_serv = serv;
> + nn->nfsd_info.serv = serv;
> + rcu_assign_pointer(nn->nfsd_serv, nn->nfsd_info.serv);
> spin_unlock(&nfsd_notifier_lock);
> + nfsd_serv_sync(nn);
I get why you're doing the synchronize on destroy, but why on the
create? You're not tearing anything down here, so I don't see the need
to ensure synchronization.
>
> set_max_drc();
> /* check if the notifier is already set */
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 09/19] nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM
2024-06-19 5:30 ` NeilBrown
@ 2024-06-19 13:18 ` Chuck Lever III
0 siblings, 0 replies; 45+ messages in thread
From: Chuck Lever III @ 2024-06-19 13:18 UTC (permalink / raw)
To: Neil Brown, Mike Snitzer
Cc: Linux NFS Mailing List, Jeff Layton, Trond Myklebust,
snitzer@hammerspace.com
> On Jun 19, 2024, at 1:30 AM, NeilBrown <neilb@suse.de> wrote:
>
> On Wed, 19 Jun 2024, Mike Snitzer wrote:
>> LOCALIOPROC_GETUUID allows client to discover server's uuid.
>>
>> nfs_local_probe() will retrieve server's uuid via LOCALIO protocol and
>> verify that the server with that uuid is known to be local. This ensures
>> client and server 1: support localio 2: are local to each other.
>>
>> While doing so, factor out nfs_init_localioclient() so it is used by
>> both nfs3client.c and nfs4client.c
>>
>> Signed-off-by: Mike Snitzer <snitzer@kernel.org>
>> ---
> ..
>>
>> +#define NFS_LOCALIO_PROGRAM 100229
>
> According to RFC5531, this number is reserved for "metad".
> It might be best not to use it.
I agree.
> That RFC says that assigning numbers isn't a job for IETF standard-track
> and handed the job over to IANA.
>
> IANA...
> https://www.iana.org/assignments/sun-rpc-numbers/sun-rpc-numbers.xhtml
> thinks SUN rpc numbers are obsolete.
>
> So maybe nobody cares.
Since this is Linux-to-Linux, interop is not a concern.
But there is value in squatting on the program number to
ensure no-one else will use it (even by another Linux-
only consumer).
Mike, IMO you should look into reserving the program
number properly.
> I would feel most comfortable allocating a number from the range:
>
> 0x20000000 - 0x3fffffff Defined by local administrator
> (some blocks assigned here)
>
> and maybe make it configurable by a module parameter just to be on the
> safe side (overkill??)
>
> We could try registering with lanana.org (The Linux Assigned Names And
> Numbers Authority) but I wouldn't be surprised if that went nowhere.
>
> While this might not matter in practice, I think we should appear to be
> doing the right thing.
>
> NeilBrown
>
>
>> +#define LOCALIOPROC_NULL 0
>> +#define LOCALIOPROC_GETUUID 1
>> +
>> #define NFS_PIPE_DIRNAME "nfs"
>>
>> /*
>> --
>> 2.44.0
>>
>>
>
--
Chuck Lever
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 5:49 ` [PATCH v5 00/19] nfs/nfsd: add support for localio Christoph Hellwig
2024-06-19 7:10 ` NeilBrown
@ 2024-06-19 14:02 ` Trond Myklebust
1 sibling, 0 replies; 45+ messages in thread
From: Trond Myklebust @ 2024-06-19 14:02 UTC (permalink / raw)
To: snitzer@kernel.org, hch@infradead.org
Cc: Mike Snitzer, linux-nfs@vger.kernel.org, jlayton@kernel.org,
neilb@suse.de, chuck.lever@oracle.com
On Tue, 2024-06-18 at 22:49 -0700, Christoph Hellwig wrote:
> What happened to the requirement that all protocol extensions added
> to Linux need to be standardized in IETF RFCs?
>
The point of the side band protocol here is literally just to discover
if the server on the other end of the connection is me, myself and I.
IOW: did the IP + port that was used to set up a connection end up,
through the magic of routing, connecting to a knfsd service that is
running on the same machine as the client.
The only requirement for interoperability with other servers is that we
don't break them when probing. Hence the side band protocol, which uses
the fact that it is an RPC program with a value that will be ignored by
all other servers except the Linux servers that implement it.
Otherwise, the protocol is private to the Linux client and knfsd.
So, if the consensus is that this still needs to go through the IETF,
then fine, we can do that, and register the side band program name with
IANA.
If there is a better way to determine that we're talking to our own
server (which may be running in a container with its own network
namespace) then I'm all ears.
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 16/19] nfsd: use SRCU to dereference nn->nfsd_serv
2024-06-19 12:39 ` Jeff Layton
@ 2024-06-19 17:26 ` Mike Snitzer
0 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-19 17:26 UTC (permalink / raw)
To: Jeff Layton; +Cc: linux-nfs, Chuck Lever, Trond Myklebust, NeilBrown, snitzer
On Wed, Jun 19, 2024 at 08:39:46AM -0400, Jeff Layton wrote:
> On Tue, 2024-06-18 at 16:19 -0400, Mike Snitzer wrote:
> > Introduce nfsd_serv_get, nfsd_serv_put and nfsd_serv_sync and update
> > the nfsd code to prevent nfsd_destroy_serv from destroying
> > nn->nfsd_serv until all nfsd code is done with it (particularly the
> > localio code that doesn't run in the context of nfsd's svc threads,
> > nor does it take the nfsd_mutex).
> >
> > Commit 83d5e5b0af90 ("dm: optimize use SRCU and RCU") provided a
> > familiar well-worn pattern for how to implement this.
> >
> > Suggested-by: NeilBrown <neilb@suse.de>
> > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > ---
> > fs/nfsd/filecache.c | 13 ++++++++---
> > fs/nfsd/netns.h | 12 ++++++++--
> > fs/nfsd/nfs4state.c | 25 ++++++++++++++-------
> > fs/nfsd/nfsctl.c | 7 ++++--
> > fs/nfsd/nfssvc.c | 55 ++++++++++++++++++++++++++++++++++++---------
> > 5 files changed, 87 insertions(+), 25 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 99631fa56662..474b3a3af3fb 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -413,12 +413,15 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> > struct nfsd_file *nf = list_first_entry(dispose,
> > struct nfsd_file, nf_lru);
> > struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> > + int srcu_idx;
> > + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
> > struct nfsd_fcache_disposal *l = nn->fcache_disposal;
> >
> > spin_lock(&l->lock);
> > list_move_tail(&nf->nf_lru, &l->freeme);
> > spin_unlock(&l->lock);
> > - svc_wake_up(nn->nfsd_serv);
> > + svc_wake_up(serv);
> > + nfsd_serv_put(nn, srcu_idx);
> > }
> > }
> >
> > @@ -443,11 +446,15 @@ void nfsd_file_net_dispose(struct nfsd_net *nn)
> > for (i = 0; i < 8 && !list_empty(&l->freeme); i++)
> > list_move(l->freeme.next, &dispose);
> > spin_unlock(&l->lock);
> > - if (!list_empty(&l->freeme))
> > + if (!list_empty(&l->freeme)) {
> > + int srcu_idx;
> > + struct svc_serv *serv = nfsd_serv_get(nn, &srcu_idx);
> > /* Wake up another thread to share the work
> > * *before* doing any actual disposing.
> > */
> > - svc_wake_up(nn->nfsd_serv);
> > + svc_wake_up(serv);
> > + nfsd_serv_put(nn, srcu_idx);
> > + }
> > nfsd_file_dispose_list(&dispose);
> > }
> > }
> > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > index 0c5a1d97e4ac..0eebcc03bcd3 100644
> > --- a/fs/nfsd/netns.h
> > +++ b/fs/nfsd/netns.h
> > @@ -139,8 +139,12 @@ struct nfsd_net {
> > u32 clverifier_counter;
> >
> > struct svc_info nfsd_info;
> > -#define nfsd_serv nfsd_info.serv
> > -
> > + /*
> > + * The current 'nfsd_serv' at nfsd_info.serv
> > + * Use nfsd_serv_get() or take nfsd_mutex to dereference.
> > + */
> > + void __rcu *nfsd_serv;
>
> I don't understand why you need a void pointer here. This should only
> ever hold a pointer to the serv or NULL. It seems like this would work
> just as well:
>
> struct svc_serv __rcu *nfsd_serv;
>
It is defensive: it future-proofs us against new code being introduced
that dereferences nn->nfsd_serv without proper use of nfsd_serv_get().
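To illustrate the defensive argument, here is a minimal userspace sketch,
not the nfsd code. All names (opaque_net, opaque_serv_get, and so on) are
hypothetical, and a plain counter stands in for the SRCU read-side
section. The point is only that an untyped member cannot be dereferenced
directly, so callers are pushed through the accessor:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the nfsd structures. */
struct opaque_serv {
	int nrthreads;
};

struct opaque_net {
	void *serv;	/* untyped, so nn->serv->nrthreads does not compile */
	int readers;	/* toy counter standing in for an SRCU read section */
};

/* The accessor is the only place that hands out a typed pointer. */
static struct opaque_serv *opaque_serv_get(struct opaque_net *nn)
{
	nn->readers++;
	return nn->serv;	/* void * converts implicitly in C */
}

static void opaque_serv_put(struct opaque_net *nn)
{
	assert(nn->readers > 0);
	nn->readers--;
}

/* Example caller: returns the thread count, or -1 if no serv is set. */
static int opaque_nrthreads(struct opaque_net *nn)
{
	struct opaque_serv *serv = opaque_serv_get(nn);
	int ret = serv ? serv->nrthreads : -1;

	opaque_serv_put(nn);
	return ret;
}
```

With a typed member, a caller could skip the get/put pair and still
compile cleanly; with the untyped member, that shortcut needs an explicit
cast, which is easier to spot in review.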
> > @@ -589,9 +615,12 @@ void nfsd_destroy_serv(struct net *net)
> > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > struct svc_serv *serv = nn->nfsd_serv;
> >
> > + lockdep_assert_held(&nfsd_mutex);
> > +
> > spin_lock(&nfsd_notifier_lock);
> > - nn->nfsd_serv = NULL;
> > + rcu_assign_pointer(nn->nfsd_serv, NULL);
> > spin_unlock(&nfsd_notifier_lock);
> > + nfsd_serv_sync(nn);
> >
> > /* check if the notifier still has clients */
> > if (atomic_dec_return(&nfsd_notifier_refcount) == 0) {
> > @@ -711,6 +740,10 @@ int nfsd_create_serv(struct net *net)
> > if (nn->nfsd_serv)
> > return 0;
> >
> > + error = init_srcu_struct(&nn->nfsd_serv_srcu);
> > + if (error)
> > + return error;
> > +
> > if (nfsd_max_blksize == 0)
> > nfsd_max_blksize = nfsd_get_default_max_blksize();
> > nfsd_reset_versions(nn);
> > @@ -727,8 +760,10 @@ int nfsd_create_serv(struct net *net)
> > }
> > spin_lock(&nfsd_notifier_lock);
> > nn->nfsd_info.mutex = &nfsd_mutex;
> > - nn->nfsd_serv = serv;
> > + nn->nfsd_info.serv = serv;
> > + rcu_assign_pointer(nn->nfsd_serv, nn->nfsd_info.serv);
> > spin_unlock(&nfsd_notifier_lock);
> > + nfsd_serv_sync(nn);
>
> I get why you're doing the synchronize on destroy, but why on the
> create? You're not tearing anything down here, so I don't see the need
> to ensure synchronization.
Yeah, it isn't needed. Fixed, thanks.
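A toy userspace model of why the wait belongs only on the destroy path.
This is a sketch under stated assumptions: the gp_* names are
hypothetical, a single counter replaces SRCU, and the model is more
conservative than real SRCU (which only waits for readers that started
before the update):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct gp_serv {
	int unused;
};

struct gp_net {
	struct gp_serv *serv;	/* published pointer, NULL when torn down */
	int readers;		/* readers currently between get and put */
};

static struct gp_serv *gp_get(struct gp_net *nn)
{
	nn->readers++;
	return nn->serv;
}

static void gp_put(struct gp_net *nn)
{
	assert(nn->readers > 0);
	nn->readers--;
}

/* Create path: nothing is freed, so there is no reader to wait for. */
static void gp_publish(struct gp_net *nn, struct gp_serv *serv)
{
	nn->serv = serv;
}

/* Destroy path, step 1: hide the pointer from new readers. */
static struct gp_serv *gp_unpublish(struct gp_net *nn)
{
	struct gp_serv *old = nn->serv;

	nn->serv = NULL;
	return old;
}

/*
 * Destroy path, step 2: the old serv may be freed only once this holds.
 * The kernel would block (for example in synchronize_srcu()) rather
 * than poll a counter.
 */
static bool gp_safe_to_free(const struct gp_net *nn)
{
	return nn->readers == 0;
}
```

A reader that called gp_get() before gp_unpublish() still holds the old
pointer, which is why teardown has to wait; after gp_publish() there is
no old object that anyone could still be using.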
Mike
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 7:10 ` NeilBrown
2024-06-19 7:15 ` Christoph Hellwig
2024-06-19 10:09 ` Jeff Layton
@ 2024-06-19 17:57 ` Mike Snitzer
2024-06-19 18:04 ` Chuck Lever III
2024-06-20 5:18 ` Christoph Hellwig
2 siblings, 2 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-19 17:57 UTC (permalink / raw)
To: NeilBrown
Cc: Christoph Hellwig, linux-nfs, Jeff Layton, Chuck Lever,
Trond Myklebust, snitzer
On Wed, Jun 19, 2024 at 05:10:10PM +1000, NeilBrown wrote:
> On Wed, 19 Jun 2024, Christoph Hellwig wrote:
> > What happened to the requirement that all protocol extensions added
> > to Linux need to be standardized in IETF RFCs?
> >
> >
>
> Is that requirement documented somewhere? Not that I doubt it, but it
> would be nice to know where it is explicit. I couldn't quickly find
> anything in Documentation/
>
> Can we get by without the LOCALIO protocol?
>
> For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
> is explicitly documented as being usable to determine if two servers are
> the same.
My first approach was to (ab)use EXCHANGE_ID. It worked, but it
required exporting a symbol to query the hash table local to
nfs4state, etc. It wasn't very clean... could it have been made
clean? I guess... but in the end I elected to solve both v3 and v4.x in
the same way using the LOCALIO protocol.
> For NFSv4.0 ... I don't think we should encourage that to be used.
>
> For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
> 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
> server_owner4. If krb5 was used there would probably be a server
> identity in there that could be used.
> I think the server could theoretically return an AUTH_SYS verifier in
> each RPC reply and that could be used to identify the server. I'm not
> sure that is a good idea though.
>
> Going through the IETF process for something that is entirely private to
> Linux seems a bit more than should be necessary..
I have to believe Christoph didn't appreciate this LOCALIO protocol is
an entirely private implementation detail to Linux (that allows client
and server handshake). I've clarified that in Documentation (for v6).
Mike
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 17:57 ` Mike Snitzer
@ 2024-06-19 18:04 ` Chuck Lever III
2024-06-19 18:13 ` Mike Snitzer
2024-06-20 5:18 ` Christoph Hellwig
1 sibling, 1 reply; 45+ messages in thread
From: Chuck Lever III @ 2024-06-19 18:04 UTC (permalink / raw)
To: Mike Snitzer
Cc: Neil Brown, Christoph Hellwig, Linux NFS Mailing List,
Jeff Layton, Trond Myklebust, snitzer@hammerspace.com
> On Jun 19, 2024, at 1:57 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Wed, Jun 19, 2024 at 05:10:10PM +1000, NeilBrown wrote:
>> On Wed, 19 Jun 2024, Christoph Hellwig wrote:
>>> What happened to the requirement that all protocol extensions added
>>> to Linux need to be standardized in IETF RFCs?
>>>
>>>
>>
>> Is that requirement documented somewhere? Not that I doubt it, but it
>> would be nice to know where it is explicit. I couldn't quickly find
>> anything in Documentation/
>>
>> Can we get by without the LOCALIO protocol?
>>
>> For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
>> is explicitly documented as being usable to determine if two servers are
>> the same.
>
> My first approach was to (ab)use EXCHANGE_ID. It worked, but it
> required exporting a symbol to query the hash table local to
> nfs4state, etc. It wasn't very clean.. could it have been made
> clean?: I guess... but in the end I elected to solve both v3 and v4.x in
> the same way using LOCALIO protocol.
>
>> For NFSv4.0 ... I don't think we should encourage that to be used.
>>
>> For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
>> 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
>> server_owner4. If krb5 was used there would probably be a server
>> identity in there that could be used.
>> I think the server could theoretically return an AUTH_SYS verifier in
>> each RPC reply and that could be used to identify the server. I'm not
>> sure that is a good idea though.
>>
>> Going through the IETF process for something that is entirely private to
>> Linux seems a bit more than should be necessary..
>
> I have to believe Christoph didn't appreciate this LOCALIO protocol is
> an entirely private implementation detail to Linux (that allows client
> and server handshake). I've clarified that in Documentation (for v6).
Even though this is a private protocol, you don't want some
other NFS implementation re-using that RPC program number
for its own purposes.
I think registering the RPC program number and name with
IANA is going to save everyone some potential headaches
and won't be an arduous process.
--
Chuck Lever
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 18:04 ` Chuck Lever III
@ 2024-06-19 18:13 ` Mike Snitzer
2024-06-19 18:22 ` Chuck Lever III
0 siblings, 1 reply; 45+ messages in thread
From: Mike Snitzer @ 2024-06-19 18:13 UTC (permalink / raw)
To: Chuck Lever III
Cc: Neil Brown, Christoph Hellwig, Linux NFS Mailing List,
Jeff Layton, Trond Myklebust, snitzer@hammerspace.com
On Wed, Jun 19, 2024 at 06:04:46PM +0000, Chuck Lever III wrote:
>
>
> > On Jun 19, 2024, at 1:57 PM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > On Wed, Jun 19, 2024 at 05:10:10PM +1000, NeilBrown wrote:
> >> On Wed, 19 Jun 2024, Christoph Hellwig wrote:
> >>> What happened to the requirement that all protocol extensions added
> >>> to Linux need to be standardized in IETF RFCs?
> >>>
> >>>
> >>
> >> Is that requirement documented somewhere? Not that I doubt it, but it
> >> would be nice to know where it is explicit. I couldn't quickly find
> >> anything in Documentation/
> >>
> >> Can we get by without the LOCALIO protocol?
> >>
> >> For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
> >> is explicitly documented as being usable to determine if two servers are
> >> the same.
> >
> > My first approach was to (ab)use EXCHANGE_ID. It worked, but it
> > required exporting a symbol to query the hash table local to
> > nfs4state, etc. It wasn't very clean.. could it have been made
> > clean?: I guess... but in the end I elected to solve both v3 and v4.x in
> > the same way using LOCALIO protocol.
> >
> >> For NFSv4.0 ... I don't think we should encourage that to be used.
> >>
> >> For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
> >> 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
> >> server_owner4. If krb5 was used there would probably be a server
> >> identity in there that could be used.
> >> I think the server could theoretically return an AUTH_SYS verifier in
> >> each RPC reply and that could be used to identify the server. I'm not
> >> sure that is a good idea though.
> >>
> >> Going through the IETF process for something that is entirely private to
> >> Linux seems a bit more than should be necessary..
> >
> > I have to believe Christoph didn't appreciate this LOCALIO protocol is
> > an entirely private implementation detail to Linux (that allows client
> > and server handshake). I've clarified that in Documentation (for v6).
>
> Even though this is a private protocol, you don't want some
> other NFS implementation re-using that RPC program number
> for its own purposes.
>
> I think registering the RPC program number and name with
> IANA is going to save everyone some potential headaches
> and won't be an arduous process.
I fully agree; I will work on it. If you have hints for the best place
to start I'd welcome any help getting the process started.
In v6 I switch to using RPC program number 0x20000002.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 18:13 ` Mike Snitzer
@ 2024-06-19 18:22 ` Chuck Lever III
0 siblings, 0 replies; 45+ messages in thread
From: Chuck Lever III @ 2024-06-19 18:22 UTC (permalink / raw)
To: Mike Snitzer
Cc: Neil Brown, Christoph Hellwig, Linux NFS Mailing List,
Jeff Layton, Trond Myklebust, snitzer@hammerspace.com
> On Jun 19, 2024, at 2:13 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Wed, Jun 19, 2024 at 06:04:46PM +0000, Chuck Lever III wrote:
>>
>>
>>> On Jun 19, 2024, at 1:57 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>>>
>>> On Wed, Jun 19, 2024 at 05:10:10PM +1000, NeilBrown wrote:
>>>> On Wed, 19 Jun 2024, Christoph Hellwig wrote:
>>>>> What happened to the requirement that all protocol extensions added
>>>>> to Linux need to be standardized in IETF RFCs?
>>>>>
>>>>>
>>>>
>>>> Is that requirement documented somewhere? Not that I doubt it, but it
>>>> would be nice to know where it is explicit. I couldn't quickly find
>>>> anything in Documentation/
>>>>
>>>> Can we get by without the LOCALIO protocol?
>>>>
>>>> For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
>>>> is explicitly documented as being usable to determine if two servers are
>>>> the same.
>>>
>>> My first approach was to (ab)use EXCHANGE_ID. It worked, but it
>>> required exporting a symbol to query the hash table local to
>>> nfs4state, etc. It wasn't very clean.. could it have been made
>>> clean?: I guess... but in the end I elected to solve both v3 and v4.x in
>>> the same way using LOCALIO protocol.
>>>
>>>> For NFSv4.0 ... I don't think we should encourage that to be used.
>>>>
>>>> For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
>>>> 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
>>>> server_owner4. If krb5 was used there would probably be a server
>>>> identity in there that could be used.
>>>> I think the server could theoretically return an AUTH_SYS verifier in
>>>> each RPC reply and that could be used to identify the server. I'm not
>>>> sure that is a good idea though.
>>>>
>>>> Going through the IETF process for something that is entirely private to
>>>> Linux seems a bit more than should be necessary..
>>>
>>> I have to believe Christoph didn't appreciate this LOCALIO protocol is
>>> an entirely private implementation detail to Linux (that allows client
>>> and server handshake). I've clarified that in Documentation (for v6).
>>
>> Even though this is a private protocol, you don't want some
>> other NFS implementation re-using that RPC program number
>> for its own purposes.
>>
>> I think registering the RPC program number and name with
>> IANA is going to save everyone some potential headaches
>> and won't be an arduous process.
>
> I fully agree, I will work on it. If you have hints for the best place
> to start I'd welcome any help getting the process started.
See Appendix B of RFC 5531.
https://www.rfc-editor.org/rfc/rfc5531.html
> In v6 I switch to using rpc program number 0x20000002
"Specific numbers cannot be requested. Numbers are
assigned on a First Come First Served basis." You
can use whatever you like until one is assigned,
knowing that the risk is it is almost certainly
not going to be the same value that IANA will give
you.
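For context on where 0x20000002 sits, a small sketch of the program
number ranges. The boundaries below are believed to match RFC 5531
section 8.3 but should be checked against the RFC; the enum and function
names are hypothetical, and everything above the transient range is
lumped together as "other":

```c
#include <assert.h>
#include <stdint.h>

enum rpc_prog_range {
	RPC_PROG_RESERVED,	/* 0x00000000 */
	RPC_PROG_IANA,		/* 0x00000001 - 0x1fffffff, assigned by IANA */
	RPC_PROG_LOCAL,		/* 0x20000000 - 0x3fffffff, locally defined */
	RPC_PROG_TRANSIENT,	/* 0x40000000 - 0x5fffffff */
	RPC_PROG_OTHER		/* everything above, not broken down here */
};

static enum rpc_prog_range rpc_prog_classify(uint32_t prog)
{
	if (prog == 0)
		return RPC_PROG_RESERVED;
	if (prog <= 0x1fffffffu)
		return RPC_PROG_IANA;
	if (prog <= 0x3fffffffu)
		return RPC_PROG_LOCAL;
	if (prog <= 0x5fffffffu)
		return RPC_PROG_TRANSIENT;
	return RPC_PROG_OTHER;
}
```

Under these assumed boundaries 0x20000002 falls in the locally defined
range, while NFS itself (program 100003) is in the IANA-assigned range.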
--
Chuck Lever
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst
2024-06-19 5:47 ` NeilBrown
@ 2024-06-19 18:27 ` Mike Snitzer
0 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-19 18:27 UTC (permalink / raw)
To: NeilBrown; +Cc: Chuck Lever, linux-nfs, Jeff Layton, Trond Myklebust, snitzer
On Wed, Jun 19, 2024 at 03:47:05PM +1000, NeilBrown wrote:
> On Wed, 19 Jun 2024, Chuck Lever wrote:
> > On Tue, Jun 18, 2024 at 04:19:49PM -0400, Mike Snitzer wrote:
> > > This document gives an overview of the LOCALIO protocol extension
> > > added to the Linux NFS client and server (both v3 and v4) to allow a
> > > client and server to reliably handshake to determine if they are on
> > > the same host. The LOCALIO protocol extension follows the well-worn
> > > pattern established by the ACL protocol extension.
> > >
> > > The robust handshake between local client and server is just the
> > > beginning, the ultimate use-case this locality makes possible is the
> > > client is able to issue reads, writes and commits directly to the
> > > server without having to go over the network.
> > >
> > > Signed-off-by: Mike Snitzer <snitzer@kernel.org>
> > > ---
> > > Documentation/filesystems/nfs/localio.rst | 101 ++++++++++++++++++++++
> > > include/linux/nfslocalio.h | 2 +
> > > 2 files changed, 103 insertions(+)
> > > create mode 100644 Documentation/filesystems/nfs/localio.rst
> > >
> > > diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
> > > new file mode 100644
> > > index 000000000000..4b4595037a7f
> > > --- /dev/null
> > > +++ b/Documentation/filesystems/nfs/localio.rst
> > > @@ -0,0 +1,101 @@
> > > +===========
> > > +NFS localio
> > > +===========
> > > +
> > > +This document gives an overview of the LOCALIO protocol extension added
> > > +to the Linux NFS client and server (both v3 and v4) to allow a client
> > > +and server to reliably handshake to determine if they are on the same
> > > +host. The LOCALIO protocol extension follows the well-worn pattern
> > > +established by the ACL protocol extension.
> > > +
> > > +The LOCALIO protocol extension is needed to allow robust discovery of
> > > +clients local to their servers. Prior to this extension a fragile
> > > +sockaddr network address based match against all local network
> > > +interfaces was attempted. But unlike the LOCALIO protocol extension,
> > > +the sockaddr-based matching didn't handle use of iptables or containers.
> > > +
> > > +The robust handshake between local client and server is just the
> > > +beginning, the ultimate use-case this locality makes possible is the
> > > +client is able to issue reads, writes and commits directly to the server
> > > +without having to go over the network. This is particularly useful for
> > > +container usecases (e.g. kubernetes) where it is possible to run an IO
> > > +job local to the server.
> > > +
> > > +The performance advantage realized from localio's ability to bypass
> > > +using XDR and RPC for reads, writes and commits can be extreme, e.g.:
> > > +fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
> > > +- With localio:
> > > + read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
> > > +- Without localio:
> > > + read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
> > > +
> > > +RPC
> > > +---
> > > +
> > > +The LOCALIO RPC protocol consists of a single "GETUUID" RPC that allows
> > > +the client to retrieve a server's uuid. LOCALIOPROC_GETUUID encodes the
> > > +server's uuid_t in terms of the fixed UUID_SIZE (16 bytes). The fixed
> > > +size opaque encode and decode XDR methods are used instead of the less
> > > +efficient variable sized methods.
> >
> > I'm reading between the lines ("well-worn pattern established by
> > the [NFS]ACL protocol"). I'm guessing that the client and server
> > will exchange this protocol on the same connection as NFS traffic?
> >
> > The use of the term "extension" in this Document might be atypical.
> > An /extension/ means that the base RPC program (NFS in this case)
> > is somehow modified. However, if LOCALIO is a distinct RPC program
> > then this isn't an extension of the NFS protocol, per se.
> >
> > A protocol spec needs to include:
> >
> > o The RPC program and version number
> >
> > o A description of each of its procedures, along with an XDR definition
> > of its arguments and results
> >
> > o Any related constants or bit mask values
>
> Note that providing this information in the format of a ".x" file as
> understood by rpcgen is a good approach.
I've approximated that in an update for v6, but I'm sure it'll leave
you and Chuck wanting ;)
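As a rough illustration of the fixed-size versus variable-size opaque
encoding that the quoted document mentions, here is a userspace sketch.
It follows the XDR rules as commonly described (4-byte alignment, a
big-endian length word only for variable-length opaque); the toy_* names
are hypothetical and this is not the kernel's nfs_xdr.h helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_UUID_SIZE 16

/* XDR pads opaque data up to a multiple of 4 bytes. */
static size_t xdr_padded_len(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Fixed-length opaque: the bytes plus zero padding, no length word. */
static size_t toy_encode_opaque_fixed(uint8_t *out, const uint8_t *data,
				      size_t len)
{
	size_t padded = xdr_padded_len(len);

	memcpy(out, data, len);
	memset(out + len, 0, padded - len);
	return padded;
}

/* Variable-length opaque: 4-byte big-endian length, then the bytes. */
static size_t toy_encode_opaque_var(uint8_t *out, const uint8_t *data,
				    uint32_t len)
{
	out[0] = (uint8_t)(len >> 24);
	out[1] = (uint8_t)(len >> 16);
	out[2] = (uint8_t)(len >> 8);
	out[3] = (uint8_t)len;
	return 4 + toy_encode_opaque_fixed(out + 4, data, len);
}
```

For a 16-byte uuid the fixed form occupies 16 bytes on the wire and the
variable form 20, and the decoder of the fixed form has no length to
validate.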
> It isn't clear to me why you implement both v3 and v4 of the LOCALIO
> program. I don't see how they relate to the NFS protocol version. Just
> implement v1 which simply returns the UUID.
Yeah, I'd love to pull it out to be standalone but in practice the
pattern I followed from NFS ACL (to use rpc_bind_new_program) took me
down the path of implementing it for both v3 and v4. It did help to
put the endpoints to action by leveraging what NFS already provides
for encoding status though.
Would be nice to avoid it but it isn't immediately clear to me how.
Can be done as followup work but it'd take me some time to sort it
out -- might be you could cut through it more easily?
Only having a single LOCALIO protocol version would allow for
nfs_init_localioclient() to not need 'vers' to be specified. And it'd
remove the need for the .init_localioclient hook I added (as well as
the use of __always_inline to share nfs_init_localioclient between
fs/nfs/nfs[34]client.c)
Mike
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 10:09 ` Jeff Layton
@ 2024-06-19 21:09 ` NeilBrown
2024-06-19 22:28 ` Jeff Layton
` (2 more replies)
0 siblings, 3 replies; 45+ messages in thread
From: NeilBrown @ 2024-06-19 21:09 UTC (permalink / raw)
To: Jeff Layton
Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever,
Trond Myklebust, snitzer
On Wed, 19 Jun 2024, Jeff Layton wrote:
> On Wed, 2024-06-19 at 17:10 +1000, NeilBrown wrote:
> > On Wed, 19 Jun 2024, Christoph Hellwig wrote:
> > > What happened to the requirement that all protocol extensions added
> > > to Linux need to be standardized in IETF RFCs?
> > >
> > >
> >
> > Is that requirement documented somewhere? Not that I doubt it, but it
> > would be nice to know where it is explicit. I couldn't quickly find
> > anything in Documentation/
> >
> > Can we get by without the LOCALIO protocol?
> >
> > For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
> > is explicitly documented as being usable to determine if two servers are
> > the same.
> >
> > For NFSv4.0 ... I don't think we should encourage that to be used.
> >
> > For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
> > 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
> > server_owner4. If krb5 was used there would probably be a server
> > identity in there that could be used.
> > I think the server could theoretically return an AUTH_SYS verifier in
> > each RPC reply and that could be used to identify the server. I'm not
> > sure that is a good idea though.
> >
>
> My idea for v3 was that the localio client could do an O_TMPFILE create
> on the exported fs and write some random junk to it (a uuid or
> something). Construct the filehandle for that and then the client could
> try to issue a READ for that filehandle via the NFS server. If it finds
> that filehandle and the contents are correct then you're on the same
> host. Then you just close the file and it should clean itself up.
I can't see how this would work, but maybe I don't have a good enough
imagination.
The high-level view of the proposed protocol is:
- client asks remote server to identify itself.
- server returns an identity
- client uses local-sideband to ask each local server if it has the
given identity.
I don't see where an O_TMPFILE could fit into this, or how a different
high-level approach would be any better.
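The third step of that high-level view can be sketched in userspace as a
simple lookup. This is only an illustration: struct local_server,
find_local_server() and the 16-byte identity size are assumptions made
for the example, not the interface from the patches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SERVER_ID_SIZE 16

/* Hypothetical record for one server instance on this host. */
struct local_server {
	uint8_t id[SERVER_ID_SIZE];
	const char *name;
};

/*
 * Given the identity returned by the remote server, ask each local
 * server whether it owns that identity. Returns the match or NULL.
 */
static const struct local_server *
find_local_server(const struct local_server *servers, size_t n,
		  const uint8_t remote_id[SERVER_ID_SIZE])
{
	for (size_t i = 0; i < n; i++) {
		if (memcmp(servers[i].id, remote_id, SERVER_ID_SIZE) == 0)
			return &servers[i];
	}
	return NULL;
}
```

A NULL result means the server is not local (or is not one we know
about), and the client carries on with ordinary network I/O.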
For NFSv3 the client could ask with a new Program or Version or
Procedure, or all three. Or it could ask with a new file-handle or path
name. I imagine using a webnfs (rfc2054) multi-component lookup on the
public filehandle for "/linux/config/server-id" and getting back a
filehandle which encodes the server ID somehow. All these seem credible
options and it is not clear that any one is better than any other.
For NFSv4.1 I think that LOCALIO looks a lot like trunking and so using
exactly the same mechanism to determine if two servers are the same is a
good idea.
But then LOCALIO also looks a lot like a new pNFS/DS protocol so maybe
we should specify that protocol and use GETDEVICELIST or GETDEVICEINFO
to find the identity of the server.
>
> This is a little less straightforward and efficient than the localio
> protocol that Mike is proposing, but requires no protocol extensions.
I think that if we use anything other than the server-id in the
EXCHANGE_ID response, then we are defining a new protocol as it is a new
request which we expect existing servers to ignore or fail, even though
they have never been tested to ignore/fail that particular request.
Of all the options I would guess that a new version for an existing
protocol would be safest as that is the most likely to have been tested.
A new RPC program is probably conceptually simplest. A little hack in
LOOKUPv3 to detect the public filehandle etc is probably the easiest to
code, and a new pnfs/ds protocol is probably the cleanest overall
except that it doesn't support NFSv3.
My purpose in all this is not to replace Mike's LOCALIO protocol, but to
explore the solution space to ensure there is nothing that is obviously
better. As yet, I don't think there is.
NeilBrown
>
> > Going through the IETF process for something that is entirely private to
> > Linux seems a bit more than should be necessary..
> >
>
> Agreed. Given that this is our own protocol extension and we don't have
> any expectation of other clients or servers implementing this, I don't
> see the point. I do agree that trying to avoid program number conflicts
> is a good thing though.
> --
> Jeff Layton <jlayton@kernel.org>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 21:09 ` NeilBrown
@ 2024-06-19 22:28 ` Jeff Layton
2024-06-19 22:46 ` Mike Snitzer
2024-06-20 5:16 ` Christoph Hellwig
2 siblings, 0 replies; 45+ messages in thread
From: Jeff Layton @ 2024-06-19 22:28 UTC (permalink / raw)
To: NeilBrown
Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever,
Trond Myklebust, snitzer
On Thu, 2024-06-20 at 07:09 +1000, NeilBrown wrote:
> On Wed, 19 Jun 2024, Jeff Layton wrote:
> > On Wed, 2024-06-19 at 17:10 +1000, NeilBrown wrote:
> > > On Wed, 19 Jun 2024, Christoph Hellwig wrote:
> > > > What happened to the requirement that all protocol extensions added
> > > > to Linux need to be standardized in IETF RFCs?
> > > >
> > > >
> > >
> > > Is that requirement documented somewhere? Not that I doubt it, but it
> > > would be nice to know where it is explicit. I couldn't quickly find
> > > anything in Documentation/
> > >
> > > Can we get by without the LOCALIO protocol?
> > >
> > > For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
> > > is explicitly documented as being usable to determine if two servers are
> > > the same.
> > >
> > > For NFSv4.0 ... I don't think we should encourage that to be used.
> > >
> > > For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
> > > 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
> > > server_owner4. If krb5 was used there would probably be a server
> > > identity in there that could be used.
> > > I think the server could theoretically return an AUTH_SYS verifier in
> > > each RPC reply and that could be used to identify the server. I'm not
> > > sure that is a good idea though.
> > >
> >
> > My idea for v3 was that the localio client could do an O_TMPFILE create
> > on the exported fs and write some random junk to it (a uuid or
> > something). Construct the filehandle for that and then the client could
> > try to issue a READ for that filehandle via the NFS server. If it finds
> > that filehandle and the contents are correct then you're on the same
> > host. Then you just close the file and it should clean itself up.
>
> I can't see how this would work, but maybe I don't have a good enough
> imagination.
>
Maybe I didn't explain it well:
Basically the idea was to create a "unique" file in a filesystem
exported by the local nfsd, and then see if it's accessible at the
expected filehandle via a v3 READ and has the expected contents. If it
is then you can assume localio is possible. O_TMPFILE would just make
it simple to clean up the file after you were done, and would avoid
adding spurious entries to the exported directory tree.
The problem with that method is that it's hard to make it work well
with containers. You'd need to be able to predict which net namespace's
server you were talking to, or somehow make the file be exported by all
of them. That alone makes it difficult to implement.
> The high-level view of the proposed protocol is:
> - client asks remote server to identify itself.
> - server returns an identity
> - client uses local-sideband to ask each local server if it has the
> given identity.
>
> I don't see where an O_TMPFILE could fit into this, or how a different
> high-level approach would be any better.
>
> For NFSv3 the client could ask with a new Program or Version or
> Procedure, or all three. Or it could ask with a new file-handle or path
> name. I imagine using a webnfs (rfc2054) multi-component lookup on the
> public filehandle for "/linux/config/server-id" and getting back a
> filehandle which encodes the server ID somehow. All these seem credible
> options and it is not clear that any one is better than any other.
>
> For NFSv4.1 I think that LOCALIO looks a lot like trunking and so using
> exactly the same mechanism to determine if two servers are the same is a
> good idea.
> But then LOCALIO also looks a lot like a new pNFS/DS protocol so maybe
> we should specify that protocol and use GETDEVICELIST or GETDEVICEINFO
> to find the identity of the server.
>
> >
> > This is a little less straightforward and efficient than the localio
> > protocol that Mike is proposing, but requires no protocol extensions.
>
> I think that if we use anything other than the server-id in the
> EXCHANGE_ID response, then we are defining a new protocol as it is a new
> request which we expect existing servers to ignore or fail, even though
> they have never been tested to ignore/fail that particular request.
>
> Of all the options I would guess that a new version for an existing
> protocol would be safest as that is the most likely to have been tested.
> A new RPC program is probably conceptually simplest. A little hack in
> LOOKUPv3 to detect the public filehandle etc is probably the easiest to
> code, and a new pnfs/ds protocol is probably the cleanest overall
> except that it doesn't support NFSv3.
>
> My purpose in all this is not to replace Mike's LOCALIO protocol, but to
> explore the solution space to ensure there is nothing that is obviously
> better. As yet, I don't think there is.
>
>
Agreed. Thanks for laying out some alternatives! It's good to consider
other possibilities.
> >
> > > Going through the IETF process for something that is entirely private to
> > > Linux seems a bit more than should be necessary..
> > >
> >
> > Agreed. Given that this is our own protocol extension and we don't have
> > any expectation of other clients or servers implementing this, I don't
> > see the point. I do agree that trying to avoid program number conflicts
> > is a good thing though.
> > --
> > Jeff Layton <jlayton@kernel.org>
> >
>
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 21:09 ` NeilBrown
2024-06-19 22:28 ` Jeff Layton
@ 2024-06-19 22:46 ` Mike Snitzer
2024-06-20 5:16 ` Christoph Hellwig
2 siblings, 0 replies; 45+ messages in thread
From: Mike Snitzer @ 2024-06-19 22:46 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, Christoph Hellwig, linux-nfs, Chuck Lever,
Trond Myklebust, snitzer
On Thu, Jun 20, 2024 at 07:09:23AM +1000, NeilBrown wrote:
> On Wed, 19 Jun 2024, Jeff Layton wrote:
> > On Wed, 2024-06-19 at 17:10 +1000, NeilBrown wrote:
> > > On Wed, 19 Jun 2024, Christoph Hellwig wrote:
> > > > What happened to the requirement that all protocol extensions added
> > > > to Linux need to be standardized in IETF RFCs?
> > > >
> > > >
> > >
> > > Is that requirement documented somewhere? Not that I doubt it, but it
> > > would be nice to know where it is explicit. I couldn't quickly find
> > > anything in Documentation/
> > >
> > > Can we get by without the LOCALIO protocol?
> > >
> > > For NFSv4.1 we could use the server_owner4 returned by EXCHANGE_ID. It
> > > is explicitly documented as being usable to determine if two servers are
> > > the same.
> > >
> > > For NFSv4.0 ... I don't think we should encourage that to be used.
> > >
> > > For NFSv3 it is harder. I'm not as ready to deprecate it as I am for
> > > 4.0. There is nothing in NFSv3 or MOUNT or NLM that is comparable to
> > > server_owner4. If krb5 was used there would probably be a server
> > > identity in there that could be used.
> > > I think the server could theoretically return an AUTH_SYS verifier in
> > > each RPC reply and that could be used to identify the server. I'm not
> > > sure that is a good idea though.
> > >
> >
> > My idea for v3 was that the localio client could do an O_TMPFILE create
> > on the exported fs and write some random junk to it (a uuid or
> > something). Construct the filehandle for that and then the client could
> > try to issue a READ for that filehandle via the NFS server. If it finds
> > that filehandle and the contents are correct then you're on the same
> > host. Then you just close the file and it should clean itself up.
>
> I can't see how this would work, but maybe I don't have a good enough
> imagination.
>
> The high-level view of the proposed protocol is:
> - client asks remote server to identify itself.
> - server returns an identity
> - client uses local-sideband to ask each local server if it has the
> given identity.
>
> I don't see where an O_TMPFILE could fit into this, or how a different
> high-level approach would be any better.
>
> For NFSv3 the client could ask with a new Program or Version or
> Procedure, or all three. Or it could ask with a new file-handle or path
> name. I imagine using a webnfs (rfc2054) multi-component lookup on the
> public filehandle for "/linux/config/server-id" and getting back a
> filehandle which encodes the server ID somehow. All these seem credible
> options and it is not clear that any one is better than any other.
>
> For NFSv4.1 I think that LOCALIO looks a lot like trunking and so using
> exactly the same mechanism to determine if two servers are the same is a
> good idea.
> But then LOCALIO also looks a lot like a new pNFS/DS protocol so maybe
> we should specify that protocol and use GETDEVICELIST or GETDEVICEINFO
> to find the identity of the server.
Easy enough to switch the RPC call used. If either GETDEVICELIST or
GETDEVICEINFO can convey a UUID it sounds fine to me. But for v4
EXCHANGE_ID already exists.
> > This is a little less straightforward and efficient than the localio
> > protocol that Mike is proposing, but requires no protocol extensions.
>
> I think that if we use anything other than the server-id in the
> EXCHANGE_ID response, then we are defining a new protocol as it is a new
> request which we expect existing servers to ignore or fail, even though
> they have never been tested to ignore/fail that particular request.
>
> Of all the options I would guess that a new version for an existing
> protocol would be safest as that is the most likely to have been tested.
> A new RPC program is probably conceptually simplest. A little hack in
> LOOKUPv3 to detect the public filehandle etc is probably the easiest to
> code, and a new pnfs/ds protocol is probably the cleanest overall
> except that it doesn't support NFSv3.
NFSv3 support is pretty important. So when faced with no options for
v3, I decided to implement LOCALIO (with Trond's encouragement) and
just have both NFS versions use it.
I _can_ frame the v4 support in terms of EXCHANGE_ID (I had already
implemented that before writing LOCALIO; the patches aren't on the
internet, but I can unearth that work if needed). But I'd still maintain
the nfsd_uuids list, and have nfs_localio's nfsd_uuid_is_local() look up
the UUID that was embedded in the v4 EXCHANGE_ID payload...
But yes, we'd still need LOCALIO's GETUUID RPC for v3. So EXCHANGE_ID
really doesn't buy much (because we'd still need an IANA-registered
RPC program number).
> My purpose in all this is not to replace Mike's LOCALIO protocol, but to
> explore the solution space to ensure there is nothing that is obviously
> better. As yet, I don't think there is.
Thanks, I really appreciate your professionalism and attention to
detail. Pleasure working with you again Neil!
Mike
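For readers following the thread, the handshake discussed above (the client asks the server for an identity, then checks that identity against a host-local list) can be modelled in a few lines of userspace Python. This is only a sketch under stated assumptions: nfsd_uuids, nfsd_uuid_is_local() and GETUUID are the names used in this thread, while LocalServerRegistry, Server and client_can_use_localio are hypothetical and do not correspond to kernel code.

```python
import uuid


class LocalServerRegistry:
    """Hypothetical stand-in for the host-local nfsd_uuids list."""

    def __init__(self):
        self._uuids = set()

    def register(self, server_uuid):
        # A server on this host publishes its identity here when it starts.
        self._uuids.add(server_uuid)

    def is_local(self, server_uuid):
        # Plays the role the thread assigns to nfsd_uuid_is_local().
        return server_uuid in self._uuids


class Server:
    """Hypothetical server that owns one UUID for its lifetime."""

    def __init__(self, registry=None):
        self.uuid = uuid.uuid4()
        if registry is not None:
            registry.register(self.uuid)

    def getuuid(self):
        # Stands in for the GETUUID RPC; a real client would send this
        # over the wire rather than call a method.
        return self.uuid


def client_can_use_localio(server, registry):
    """Return True if the server we are talking to runs on this host."""
    return registry.is_local(server.getuuid())
```

A server registered with the same registry as the client is reported as local; a server constructed without a registry (modelling a different host) is not. The identity could equally arrive via EXCHANGE_ID for v4; only the getuuid() step would change.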
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 21:09 ` NeilBrown
2024-06-19 22:28 ` Jeff Layton
2024-06-19 22:46 ` Mike Snitzer
@ 2024-06-20 5:16 ` Christoph Hellwig
2 siblings, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2024-06-20 5:16 UTC (permalink / raw)
To: NeilBrown
Cc: Jeff Layton, Christoph Hellwig, Mike Snitzer, linux-nfs,
Chuck Lever, Trond Myklebust, snitzer
On Thu, Jun 20, 2024 at 07:09:23AM +1000, NeilBrown wrote:
> I don't see where an O_TMPFILE could fit into this, or how a different
> high-level approach would be any better.
... especially given that O_TMPFILE requires an open but unlinked
file, which v3 can't really support at all, and even for v4 would
require quite a bit of work (although it would be very useful).
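To make the "open but unlinked" point concrete, here is a small userspace sketch. It does not use O_TMPFILE itself (that needs filesystem support); instead it assumes the same state can be approximated by creating a file and unlinking it while the descriptor is still held. The helper name demo_open_unlinked and the payload are hypothetical.

```python
import os
import tempfile


def demo_open_unlinked(payload=b"random-junk"):
    """Create a file, unlink it while open, and read it back via the fd."""
    with tempfile.TemporaryDirectory() as directory:
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.unlink(path)                   # no name refers to the file now
            os.write(fd, payload)             # still writable through the fd
            nlink = os.fstat(fd).st_nlink     # typically 0 on local filesystems
            exists = os.path.exists(path)     # the path no longer resolves
            os.lseek(fd, 0, os.SEEK_SET)
            data = os.read(fd, len(payload))  # data is reachable only via fd
        finally:
            os.close(fd)                      # last reference: file is freed
    return nlink, exists, data
```

The file stays alive only while some descriptor holds it open. An NFSv3 server keeps no open-file state on behalf of clients, so it has nothing with which to pin such a file between requests.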
* Re: [PATCH v5 00/19] nfs/nfsd: add support for localio
2024-06-19 17:57 ` Mike Snitzer
2024-06-19 18:04 ` Chuck Lever III
@ 2024-06-20 5:18 ` Christoph Hellwig
1 sibling, 0 replies; 45+ messages in thread
From: Christoph Hellwig @ 2024-06-20 5:18 UTC (permalink / raw)
To: Mike Snitzer
Cc: NeilBrown, Christoph Hellwig, linux-nfs, Jeff Layton, Chuck Lever,
Trond Myklebust, snitzer
On Wed, Jun 19, 2024 at 01:57:00PM -0400, Mike Snitzer wrote:
> > Going through the IETF process for something that is entirely private to
> > Linux seems a bit more than should be necessary..
>
> I have to believe Christoph didn't appreciate this LOCALIO protocol is
> an entirely private implementation detail to Linux (that allows client
> and server handshake). I've clarified that in Documentation (for v6).
Well, it still has XDR and code point registrations.
Thread overview: 45+ messages
2024-06-18 20:19 [PATCH v5 00/19] nfs/nfsd: add support for localio Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 01/19] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 02/19] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 03/19] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 04/19] sunrpc: add rpcauth_map_to_svc_cred_local Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 05/19] nfs_common: add NFS LOCALIO protocol extension enablement Mike Snitzer
2024-06-19 5:04 ` NeilBrown
2024-06-18 20:19 ` [PATCH v5 06/19] nfs/nfsd: add "localio" support Mike Snitzer
2024-06-18 21:28 ` Jeff Layton
2024-06-18 20:19 ` [PATCH v5 07/19] NFS: Enable localio for non-pNFS I/O Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 08/19] pnfs/flexfiles: Enable localio for flexfiles I/O Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 09/19] nfs: implement v3 and v4 client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-06-19 5:30 ` NeilBrown
2024-06-19 13:18 ` Chuck Lever III
2024-06-18 20:19 ` [PATCH v5 10/19] nfsd: implement v3 and v4 server " Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 11/19] nfs/nfsd: consolidate {encode,decode}_opaque_fixed in nfs_xdr.h Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 12/19] nfs/localio: move managing nfsd_open_local_fh symbol to nfs_common Mike Snitzer
2024-06-18 21:32 ` Jeff Layton
2024-06-18 20:19 ` [PATCH v5 13/19] nfs/nfsd: ensure localio server always uses its network namespace Mike Snitzer
2024-06-18 21:36 ` Jeff Layton
2024-06-18 20:19 ` [PATCH v5 14/19] nfsd/localio: manage netns reference in nfsd_open_local_fh Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 15/19] nfsd: prepare to use SRCU to dereference nn->nfsd_serv Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 16/19] nfsd: " Mike Snitzer
2024-06-19 12:39 ` Jeff Layton
2024-06-19 17:26 ` Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 17/19] nfsd/localio: use SRCU to dereference nn->nfsd_serv in nfsd_open_local_fh Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 18/19] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-06-18 20:19 ` [PATCH v5 19/19] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-06-18 21:46 ` Chuck Lever
2024-06-19 5:47 ` NeilBrown
2024-06-19 18:27 ` Mike Snitzer
2024-06-19 5:49 ` [PATCH v5 00/19] nfs/nfsd: add support for localio Christoph Hellwig
2024-06-19 7:10 ` NeilBrown
2024-06-19 7:15 ` Christoph Hellwig
2024-06-19 10:09 ` Jeff Layton
2024-06-19 21:09 ` NeilBrown
2024-06-19 22:28 ` Jeff Layton
2024-06-19 22:46 ` Mike Snitzer
2024-06-20 5:16 ` Christoph Hellwig
2024-06-19 17:57 ` Mike Snitzer
2024-06-19 18:04 ` Chuck Lever III
2024-06-19 18:13 ` Mike Snitzer
2024-06-19 18:22 ` Chuck Lever III
2024-06-20 5:18 ` Christoph Hellwig
2024-06-19 14:02 ` Trond Myklebust