* [PATCH v11 00/20] nfs/nfsd: add support for localio
@ 2024-07-02 16:28 Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 01/20] SUNRPC: add rpcauth_map_to_svc_cred_local Mike Snitzer
` (21 more replies)
0 siblings, 22 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Hi,
There seems to be consensus that these changes are worthwhile and have
been extensively iterated on. I really appreciate the coordinated
development and review to this point.
I'd very much like these changes to land upstream as-is (unless review
teases out some show-stopper). These changes have been tested fairly
extensively (via xfstests) at this point.
Can we now please provide formal review tags and merge these changes
through the NFS client tree for 6.11?
Changes since v10:
- Now that XFS will _not_ be patched with "xfs: enable WQ_MEM_RECLAIM
on m_sync_workqueue", reintroduce "nfs/localio: use dedicated workqueues for
filesystem read and write" (patch 18). Also fixed it so that it passes
xfstests generic/355.
FYI:
- I do not intend to rebase this series on top of NeilBrown's partial
exploration of simplifying away the need for a "fake" svc_rqst
(noble goals, and I am happy to help those changes land upstream as an
incremental improvement):
https://marc.info/?l=linux-nfs&m=171980269529965&w=2
- In addition, tweaks to use nfsd_file_acquire_gc() instead of
nfsd_file_acquire() aren't a priority. Each incremental change
carries with it the potential for regression, and we need to lock
down and stop churning. Happy to explore this as an incremental
improvement and optimization.
All review and comments are welcome!
Thanks,
Mike
My git tree is here:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/
This v11 is available as both branch nfs-localio-for-6.11 (which always
tracks the latest) and branch nfs-localio-for-6.11.v11
Mike Snitzer (10):
nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
nfs_common: add NFS LOCALIO auxiliary protocol enablement
nfsd: add Kconfig options to allow localio to be enabled
nfsd: manage netns reference in nfsd_open_local_fh
nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh
nfsd: implement server support for NFS_LOCALIO_PROGRAM
nfs: fix nfs_localio_vfs_getattr() to properly support v4
SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg
nfs: implement client support for NFS_LOCALIO_PROGRAM
nfs: add Documentation/filesystems/nfs/localio.rst
NeilBrown (1):
SUNRPC: replace program list with program array
Trond Myklebust (3):
nfs: enable localio for non-pNFS I/O
pnfs/flexfiles: enable localio for flexfiles I/O
nfs/localio: use dedicated workqueues for filesystem read and write
Weston Andros Adamson (6):
SUNRPC: add rpcauth_map_to_svc_cred_local
nfsd: add "localio" support
nfs: pass nfs_client to nfs_initiate_pgio
nfs: pass descriptor thru nfs_initiate_pgio path
nfs: pass struct file to nfs_init_pgio and nfs_init_commit
nfs: add "localio" support
Documentation/filesystems/nfs/localio.rst | 135 ++++
fs/Kconfig | 3 +
fs/nfs/Kconfig | 14 +
fs/nfs/Makefile | 1 +
fs/nfs/blocklayout/blocklayout.c | 6 +-
fs/nfs/client.c | 15 +-
fs/nfs/filelayout/filelayout.c | 16 +-
fs/nfs/flexfilelayout/flexfilelayout.c | 131 +++-
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 +
fs/nfs/inode.c | 61 +-
fs/nfs/internal.h | 61 +-
fs/nfs/localio.c | 891 ++++++++++++++++++++++
fs/nfs/nfs4xdr.c | 13 -
fs/nfs/nfstrace.h | 61 ++
fs/nfs/pagelist.c | 32 +-
fs/nfs/pnfs.c | 24 +-
fs/nfs/pnfs.h | 6 +-
fs/nfs/pnfs_nfs.c | 2 +-
fs/nfs/write.c | 13 +-
fs/nfs_common/Makefile | 3 +
fs/nfs_common/nfslocalio.c | 74 ++
fs/nfsd/Kconfig | 14 +
fs/nfsd/Makefile | 1 +
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 329 ++++++++
fs/nfsd/netns.h | 12 +-
fs/nfsd/nfsctl.c | 2 +-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 116 ++-
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 9 +
include/linux/nfs.h | 9 +
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 10 +
include/linux/nfs_xdr.h | 20 +-
include/linux/nfslocalio.h | 42 +
include/linux/sunrpc/auth.h | 4 +
include/linux/sunrpc/svc.h | 7 +-
net/sunrpc/auth.c | 15 +
net/sunrpc/clnt.c | 1 -
net/sunrpc/svc.c | 68 +-
net/sunrpc/svc_xprt.c | 2 +-
net/sunrpc/svcauth_unix.c | 3 +-
44 files changed, 2089 insertions(+), 154 deletions(-)
create mode 100644 Documentation/filesystems/nfs/localio.rst
create mode 100644 fs/nfs/localio.c
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 fs/nfsd/localio.c
create mode 100644 include/linux/nfslocalio.h
--
2.44.0
^ permalink raw reply [flat|nested] 77+ messages in thread
* [PATCH v11 01/20] SUNRPC: add rpcauth_map_to_svc_cred_local
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 02/20] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
` (20 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
Add a new function, rpcauth_map_to_svc_cred_local(), which maps a
generic cred to a svc_cred suitable for use in nfsd.
This is needed by the localio code to map nfs client creds to nfs
server credentials.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
include/linux/sunrpc/auth.h | 4 ++++
net/sunrpc/auth.c | 15 +++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/include/linux/sunrpc/auth.h b/include/linux/sunrpc/auth.h
index 61e58327b1aa..872f594a924c 100644
--- a/include/linux/sunrpc/auth.h
+++ b/include/linux/sunrpc/auth.h
@@ -11,6 +11,7 @@
#define _LINUX_SUNRPC_AUTH_H
#include <linux/sunrpc/sched.h>
+#include <linux/sunrpc/svcauth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/sunrpc/xdr.h>
@@ -184,6 +185,9 @@ int rpcauth_uptodatecred(struct rpc_task *);
int rpcauth_init_credcache(struct rpc_auth *);
void rpcauth_destroy_credcache(struct rpc_auth *);
void rpcauth_clear_credcache(struct rpc_cred_cache *);
+void rpcauth_map_to_svc_cred_local(struct rpc_auth *,
+ const struct cred *,
+ struct svc_cred *);
char * rpcauth_stringify_acceptor(struct rpc_cred *);
static inline
diff --git a/net/sunrpc/auth.c b/net/sunrpc/auth.c
index 04534ea537c8..00f12ca779c5 100644
--- a/net/sunrpc/auth.c
+++ b/net/sunrpc/auth.c
@@ -308,6 +308,21 @@ rpcauth_init_credcache(struct rpc_auth *auth)
}
EXPORT_SYMBOL_GPL(rpcauth_init_credcache);
+void
+rpcauth_map_to_svc_cred_local(struct rpc_auth *auth, const struct cred *cred,
+ struct svc_cred *svc)
+{
+ svc->cr_uid = cred->uid;
+ svc->cr_gid = cred->gid;
+ svc->cr_flavor = auth->au_flavor;
+ if (cred->group_info)
+ svc->cr_group_info = get_group_info(cred->group_info);
+ /* These aren't relevant for local (network is bypassed) */
+ svc->cr_principal = NULL;
+ svc->cr_gss_mech = NULL;
+}
+EXPORT_SYMBOL_GPL(rpcauth_map_to_svc_cred_local);
+
char *
rpcauth_stringify_acceptor(struct rpc_cred *cred)
{
--
2.44.0
* [PATCH v11 02/20] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 01/20] SUNRPC: add rpcauth_map_to_svc_cred_local Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 03/20] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
` (19 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Eliminates duplicate functions in various files to allow for
additional callers.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ------
fs/nfs/nfs4xdr.c | 13 -------------
include/linux/nfs_xdr.h | 20 +++++++++++++++++++-
3 files changed, 19 insertions(+), 20 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 24188af56d5b..4a9106fa8220 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -2086,12 +2086,6 @@ static int ff_layout_encode_ioerr(struct xdr_stream *xdr,
return ff_layout_encode_ds_ioerr(xdr, &ff_args->errors);
}
-static void
-encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void
ff_layout_encode_ff_iostat_head(struct xdr_stream *xdr,
const nfs4_stateid *stateid,
diff --git a/fs/nfs/nfs4xdr.c b/fs/nfs/nfs4xdr.c
index 1416099dfcd1..ede431ee0ef0 100644
--- a/fs/nfs/nfs4xdr.c
+++ b/fs/nfs/nfs4xdr.c
@@ -968,11 +968,6 @@ static __be32 *reserve_space(struct xdr_stream *xdr, size_t nbytes)
return p;
}
-static void encode_opaque_fixed(struct xdr_stream *xdr, const void *buf, size_t len)
-{
- WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
-}
-
static void encode_string(struct xdr_stream *xdr, unsigned int len, const char *str)
{
WARN_ON_ONCE(xdr_stream_encode_opaque(xdr, str, len) < 0);
@@ -4352,14 +4347,6 @@ static int decode_access(struct xdr_stream *xdr, u32 *supported, u32 *access)
return 0;
}
-static int decode_opaque_fixed(struct xdr_stream *xdr, void *buf, size_t len)
-{
- ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
- if (unlikely(ret < 0))
- return -EIO;
- return 0;
-}
-
static int decode_stateid(struct xdr_stream *xdr, nfs4_stateid *stateid)
{
return decode_opaque_fixed(xdr, stateid, NFS4_STATEID_SIZE);
diff --git a/include/linux/nfs_xdr.h b/include/linux/nfs_xdr.h
index d09b9773b20c..bb460af0ea1f 100644
--- a/include/linux/nfs_xdr.h
+++ b/include/linux/nfs_xdr.h
@@ -1820,6 +1820,24 @@ struct nfs_rpc_ops {
void (*disable_swap)(struct inode *inode);
};
+/*
+ * Helper functions used by NFS client and/or server
+ */
+static inline void encode_opaque_fixed(struct xdr_stream *xdr,
+ const void *buf, size_t len)
+{
+ WARN_ON_ONCE(xdr_stream_encode_opaque_fixed(xdr, buf, len) < 0);
+}
+
+static inline int decode_opaque_fixed(struct xdr_stream *xdr,
+ void *buf, size_t len)
+{
+ ssize_t ret = xdr_stream_decode_opaque_fixed(xdr, buf, len);
+ if (unlikely(ret < 0))
+ return -EIO;
+ return 0;
+}
+
/*
* Function vectors etc. for the NFS client
*/
@@ -1833,4 +1851,4 @@ extern const struct rpc_version nfs_version4;
extern const struct rpc_version nfsacl_version3;
extern const struct rpc_program nfsacl_program;
-#endif
+#endif /* _LINUX_NFS_XDR_H */
--
2.44.0
* [PATCH v11 03/20] nfs_common: add NFS LOCALIO auxiliary protocol enablement
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 01/20] SUNRPC: add rpcauth_map_to_svc_cred_local Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 02/20] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 04/20] nfsd: add "localio" support Mike Snitzer
` (18 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Localio adds a global nfsd_uuids list in nfs_common that is used to
register and then identify local nfsd instances.
nfsd_uuids is protected by the nfsd_mutex or the RCU read lock. The
list is composed of nfsd_uuid_t instances that are managed as nfsd
creates them (per network namespace).
nfsd_uuid_is_local() will be used to search all local nfsd instances
for the client-specified nfsd uuid.
This commit also adds all the nfs_client members required to implement
the entire localio feature (which depends on the LOCALIO protocol).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/client.c | 8 +++++
fs/nfs_common/Makefile | 3 ++
fs/nfs_common/nfslocalio.c | 74 ++++++++++++++++++++++++++++++++++++++
fs/nfsd/netns.h | 4 +++
fs/nfsd/nfssvc.c | 12 ++++++-
include/linux/nfs_fs_sb.h | 9 +++++
include/linux/nfslocalio.h | 40 +++++++++++++++++++++
7 files changed, 149 insertions(+), 1 deletion(-)
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 include/linux/nfslocalio.h
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index de77848ae654..bcdf8d42cbc7 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -178,6 +178,14 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
clp->cl_net = get_net(cl_init->net);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ seqlock_init(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ clp->nfsd_open_local_fh = NULL;
+ clp->cl_nfssvc_net = NULL;
+#endif /* CONFIG_NFS_LOCALIO */
+
clp->cl_principal = "*";
clp->cl_xprtsec = cl_init->xprtsec;
return clp;
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index 119c75ab9fd0..d81623b76aba 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -6,5 +6,8 @@
obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
nfs_acl-objs := nfsacl.o
+obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
+nfs_localio-objs := nfslocalio.o
+
obj-$(CONFIG_GRACE_PERIOD) += grace.o
obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
new file mode 100644
index 000000000000..a234aa92950f
--- /dev/null
+++ b/fs/nfs_common/nfslocalio.c
@@ -0,0 +1,74 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/rculist.h>
+#include <linux/nfslocalio.h>
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NFS localio protocol bypass support");
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ * Reads are protected by RCU read lock (see below).
+ */
+LIST_HEAD(nfsd_uuids);
+EXPORT_SYMBOL(nfsd_uuids);
+
+/* Must be called with RCU read lock held. */
+static const uuid_t * nfsd_uuid_lookup(const uuid_t *uuid,
+ struct net **netp)
+{
+ nfsd_uuid_t *nfsd_uuid;
+
+ list_for_each_entry_rcu(nfsd_uuid, &nfsd_uuids, list)
+ if (uuid_equal(&nfsd_uuid->uuid, uuid)) {
+ *netp = nfsd_uuid->net;
+ return &nfsd_uuid->uuid;
+ }
+
+ return &uuid_null;
+}
+
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp)
+{
+ bool is_local;
+ const uuid_t *nfsd_uuid;
+
+ rcu_read_lock();
+ nfsd_uuid = nfsd_uuid_lookup(uuid, netp);
+ is_local = !uuid_is_null(nfsd_uuid);
+ rcu_read_unlock();
+
+ return is_local;
+}
+EXPORT_SYMBOL_GPL(nfsd_uuid_is_local);
+
+/*
+ * The nfs localio code needs to call into nfsd to do the filehandle -> struct path
+ * mapping, but cannot be statically linked, because that will make the nfs module
+ * depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
+ * nfs_common module will only hold a reference on nfsd when localio is in use.
+ * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+
+extern int nfsd_open_local_fh(struct net *, struct rpc_clnt *rpc_clnt,
+ const struct cred *cred, const struct nfs_fh *nfs_fh,
+ const fmode_t fmode, struct file **pfilp);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void)
+{
+ return symbol_request(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(get_nfsd_open_local_fh);
+
+void put_nfsd_open_local_fh(void)
+{
+ symbol_put(nfsd_open_local_fh);
+}
+EXPORT_SYMBOL_GPL(put_nfsd_open_local_fh);
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 14ec15656320..0c5a1d97e4ac 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -15,6 +15,7 @@
#include <linux/percpu_counter.h>
#include <linux/siphash.h>
#include <linux/sunrpc/stats.h>
+#include <linux/nfslocalio.h>
/* Hash tables for nfs4_clientid state */
#define CLIENT_HASH_BITS 4
@@ -213,6 +214,9 @@ struct nfsd_net {
/* last time an admin-revoke happened for NFSv4.0 */
time64_t nfs40_last_revoke;
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ nfsd_uuid_t nfsd_uuid;
+#endif
};
/* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 0bc8eaa5e009..402d436cbd24 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
#include <linux/sunrpc/svc_xprt.h>
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
#include <linux/seq_file.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
@@ -427,6 +428,10 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#ifdef CONFIG_NFSD_V4_2_INTER_SSC
nfsd4_ssc_init_umount_work(nn);
+#endif
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
return 0;
@@ -456,6 +461,9 @@ static void nfsd_shutdown_net(struct net *net)
lockd_down(net);
nn->lockd_up = false;
}
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ list_del_rcu(&nn->nfsd_uuid.list);
+#endif
nn->nfsd_net_up = false;
nfsd_shutdown_generic();
}
@@ -808,7 +816,9 @@ nfsd_svc(int n, int *nthreads, struct net *net, const struct cred *cred, const c
strscpy(nn->nfsd_name, scope ? scope : utsname()->nodename,
sizeof(nn->nfsd_name));
-
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ uuid_gen(&nn->nfsd_uuid.uuid);
+#endif
error = nfsd_create_serv(net);
if (error)
goto out;
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 92de074e63b9..e58e706a6503 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -8,6 +8,7 @@
#include <linux/wait.h>
#include <linux/nfs_xdr.h>
#include <linux/sunrpc/xprt.h>
+#include <linux/nfslocalio.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
@@ -125,6 +126,14 @@ struct nfs_client {
struct net *cl_net;
struct list_head pending_cb_stateids;
struct rcu_head rcu;
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ struct timespec64 cl_nfssvc_boot;
+ seqlock_t cl_boot_lock;
+ struct rpc_clnt * cl_rpcclient_localio;
+ struct net * cl_nfssvc_net;
+ nfs_to_nfsd_open_t nfsd_open_local_fh;
+#endif /* CONFIG_NFS_LOCALIO */
};
/*
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
new file mode 100644
index 000000000000..22443d2089eb
--- /dev/null
+++ b/include/linux/nfslocalio.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#ifndef __LINUX_NFSLOCALIO_H
+#define __LINUX_NFSLOCALIO_H
+
+#include <linux/list.h>
+#include <linux/uuid.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfs.h>
+#include <net/net_namespace.h>
+
+/*
+ * Global list of nfsd_uuid_t instances, add/remove
+ * is protected by fs/nfsd/nfssvc.c:nfsd_mutex.
+ */
+extern struct list_head nfsd_uuids;
+
+/*
+ * Each nfsd instance has an nfsd_uuid_t that is accessible through the
+ * global nfsd_uuids list. Useful to allow a client to negotiate if localio
+ * possible with its server.
+ */
+typedef struct {
+ uuid_t uuid;
+ struct list_head list;
+ struct net *net; /* nfsd's network namespace */
+} nfsd_uuid_t;
+
+bool nfsd_uuid_is_local(const uuid_t *uuid, struct net **netp);
+
+typedef int (*nfs_to_nfsd_open_t)(struct net *, struct rpc_clnt *,
+ const struct cred *, const struct nfs_fh *,
+ const fmode_t, struct file **);
+
+nfs_to_nfsd_open_t get_nfsd_open_local_fh(void);
+void put_nfsd_open_local_fh(void);
+
+#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0
* [PATCH v11 04/20] nfsd: add "localio" support
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (2 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 03/20] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 05/20] nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
` (17 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
Add server support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when both the client and server are
running on the same host.
If nfsd_open_local_fh() fails (e.g. due to allocation failure in
nfsd_local_fakerqst_create) then the NFS client will retry, and will
fall back to normal network-based read, write and commit operations if
localio is no longer supported.
To ensure the server's network namespace is used for localio (to allow
for access to the proper 'struct nfsd_net') the NFS client code will
pass the server's 'struct net' (stored as cl_nfssvc_net in 'struct
nfs_client') as first argument to nfsd_open_local_fh().
It is expected that both the client and server are using the same
mount namespace.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/Makefile | 1 +
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 248 ++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfssvc.c | 1 +
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 9 ++
6 files changed, 262 insertions(+), 2 deletions(-)
create mode 100644 fs/nfsd/localio.c
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index b8736a82e57c..78b421778a79 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
+nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index ad9083ca144b..99631fa56662 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -52,7 +52,7 @@
#define NFSD_FILE_CACHE_UP (0)
/* We only care about NFSD_MAY_READ/WRITE for this cache */
-#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
new file mode 100644
index 000000000000..2eedeaeab533
--- /dev/null
+++ b/fs/nfsd/localio.c
@@ -0,0 +1,248 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS server support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/exportfs.h>
+#include <linux/sunrpc/svcauth_gss.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfs.h>
+#include <linux/string.h>
+
+#include "nfsd.h"
+#include "vfs.h"
+#include "netns.h"
+#include "filecache.h"
+
+/*
+ * We need to translate between nfs status return values and
+ * the local errno values which may not be the same.
+ * - duplicated from fs/nfs/nfs2xdr.c to avoid needless bloat of
+ * all compiled nfs objects if it were in include/linux/nfs.h
+ */
+static const struct {
+ int stat;
+ int errno;
+} nfs_common_errtbl[] = {
+ { NFS_OK, 0 },
+ { NFSERR_PERM, -EPERM },
+ { NFSERR_NOENT, -ENOENT },
+ { NFSERR_IO, -EIO },
+ { NFSERR_NXIO, -ENXIO },
+/* { NFSERR_EAGAIN, -EAGAIN }, */
+ { NFSERR_ACCES, -EACCES },
+ { NFSERR_EXIST, -EEXIST },
+ { NFSERR_XDEV, -EXDEV },
+ { NFSERR_NODEV, -ENODEV },
+ { NFSERR_NOTDIR, -ENOTDIR },
+ { NFSERR_ISDIR, -EISDIR },
+ { NFSERR_INVAL, -EINVAL },
+ { NFSERR_FBIG, -EFBIG },
+ { NFSERR_NOSPC, -ENOSPC },
+ { NFSERR_ROFS, -EROFS },
+ { NFSERR_MLINK, -EMLINK },
+ { NFSERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFSERR_NOTEMPTY, -ENOTEMPTY },
+ { NFSERR_DQUOT, -EDQUOT },
+ { NFSERR_STALE, -ESTALE },
+ { NFSERR_REMOTE, -EREMOTE },
+#ifdef EWFLUSH
+ { NFSERR_WFLUSH, -EWFLUSH },
+#endif
+ { NFSERR_BADHANDLE, -EBADHANDLE },
+ { NFSERR_NOT_SYNC, -ENOTSYNC },
+ { NFSERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFSERR_NOTSUPP, -ENOTSUPP },
+ { NFSERR_TOOSMALL, -ETOOSMALL },
+ { NFSERR_SERVERFAULT, -EREMOTEIO },
+ { NFSERR_BADTYPE, -EBADTYPE },
+ { NFSERR_JUKEBOX, -EJUKEBOX },
+ { -1, -EIO }
+};
+
+/**
+ * nfs_stat_to_errno - convert an NFS status code to a local errno
+ * @status: NFS status code to convert
+ *
+ * Returns a local errno value, or -EIO if the NFS status code is
+ * not recognized. nfsd_file_acquire() returns an nfsstat that
+ * needs to be translated to an errno before being returned to a
+ * local client application.
+ */
+static int nfs_stat_to_errno(enum nfs_stat status)
+{
+ int i;
+
+ for (i = 0; nfs_common_errtbl[i].stat != -1; i++) {
+ if (nfs_common_errtbl[i].stat == (int)status)
+ return nfs_common_errtbl[i].errno;
+ }
+ return nfs_common_errtbl[i].errno;
+}
+
+static void
+nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
+{
+ if (rqstp->rq_client)
+ auth_domain_put(rqstp->rq_client);
+ if (rqstp->rq_cred.cr_group_info)
+ put_group_info(rqstp->rq_cred.cr_group_info);
+ /* rpcauth_map_to_svc_cred_local() clears cr_principal */
+ WARN_ON_ONCE(rqstp->rq_cred.cr_principal != NULL);
+ kfree(rqstp->rq_xprt);
+ kfree(rqstp);
+}
+
+static struct svc_rqst *
+nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
+ const struct cred *cred)
+{
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ struct svc_rqst *rqstp;
+ int status;
+
+ /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
+ if (unlikely(!READ_ONCE(nn->nfsd_serv)))
+ return ERR_PTR(-ENXIO);
+
+ rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
+ if (!rqstp)
+ return ERR_PTR(-ENOMEM);
+
+ rqstp->rq_xprt = kzalloc(sizeof(*rqstp->rq_xprt), GFP_KERNEL);
+ if (!rqstp->rq_xprt) {
+ status = -ENOMEM;
+ goto out_err;
+ }
+
+ rqstp->rq_xprt->xpt_net = net;
+ __set_bit(RQ_SECURE, &rqstp->rq_flags);
+ rqstp->rq_server = nn->nfsd_serv;
+ /*
+ * These constants aren't actively used in this fake svc_rqst,
+ * which bypasses SUNRPC, but they must pass negative checks.
+ */
+ rqstp->rq_proc = 1;
+ rqstp->rq_vers = 3;
+ rqstp->rq_prot = IPPROTO_TCP;
+
+ /* Note: we're connecting to ourself, so source addr == peer addr */
+ rqstp->rq_addrlen = rpc_peeraddr(rpc_clnt,
+ (struct sockaddr *)&rqstp->rq_addr,
+ sizeof(rqstp->rq_addr));
+
+ rpcauth_map_to_svc_cred_local(rpc_clnt->cl_auth, cred, &rqstp->rq_cred);
+
+ /*
+ * set up enough for svcauth_unix_set_client to be able to wait
+ * for the cache downcall. Note that we do _not_ want to allow the
+ * request to be deferred for later revisit since this rqst and xprt
+ * are not set up to run inside of the normal svc_rqst engine.
+ */
+ INIT_LIST_HEAD(&rqstp->rq_xprt->xpt_deferred);
+ kref_init(&rqstp->rq_xprt->xpt_ref);
+ spin_lock_init(&rqstp->rq_xprt->xpt_lock);
+ rqstp->rq_chandle.thread_wait = 5 * HZ;
+
+ status = svcauth_unix_set_client(rqstp);
+ switch (status) {
+ case SVC_OK:
+ break;
+ case SVC_DENIED:
+ status = -ENXIO;
+ goto out_err;
+ default:
+ status = -ETIMEDOUT;
+ goto out_err;
+ }
+
+ return rqstp;
+
+out_err:
+ nfsd_local_fakerqst_destroy(rqstp);
+ return ERR_PTR(status);
+}
+
+/**
+ * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to @file
+ *
+ * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
+ * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
+ * @cred: cred that the client established
+ * @nfs_fh: filehandle to lookup
+ * @fmode: fmode_t to use for open
+ * @pfilp: returned file pointer that maps to @nfs_fh
+ *
+ * This function maps a local fh to a path on a local filesystem.
+ * This is useful when the nfs client has the local server mounted - it can
+ * avoid all the NFS overhead with reads, writes and commits.
+ *
+ * On successful return, caller is responsible for calling path_put. Also
+ * note that this is called from nfs.ko via find_symbol() to avoid an explicit
+ * dependency on knfsd. So, there is no forward declaration in a header file
+ * for it that is shared with the client.
+ */
+int nfsd_open_local_fh(struct net *cl_nfssvc_net,
+ struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp)
+{
+ int mayflags = NFSD_MAY_LOCALIO;
+ int status = 0;
+ const struct cred *save_cred;
+ struct svc_rqst *rqstp;
+ struct svc_fh fh;
+ struct nfsd_file *nf;
+ __be32 beres;
+
+ /* Save creds before calling into nfsd */
+ save_cred = get_current_cred();
+
+ rqstp = nfsd_local_fakerqst_create(cl_nfssvc_net, rpc_clnt, cred);
+ if (IS_ERR(rqstp)) {
+ status = PTR_ERR(rqstp);
+ goto out_revertcred;
+ }
+
+ /* nfs_fh -> svc_fh */
+ if (nfs_fh->size > NFS4_FHSIZE) {
+ status = -EINVAL;
+ goto out;
+ }
+ fh_init(&fh, NFS4_FHSIZE);
+ fh.fh_handle.fh_size = nfs_fh->size;
+ memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
+
+ if (fmode & FMODE_READ)
+ mayflags |= NFSD_MAY_READ;
+ if (fmode & FMODE_WRITE)
+ mayflags |= NFSD_MAY_WRITE;
+
+ beres = nfsd_file_acquire(rqstp, &fh, mayflags, &nf);
+ if (beres) {
+ status = nfs_stat_to_errno(be32_to_cpu(beres));
+ goto out_fh_put;
+ }
+
+ *pfilp = get_file(nf->nf_file);
+
+ nfsd_file_put(nf);
+out_fh_put:
+ fh_put(&fh);
+
+out:
+ nfsd_local_fakerqst_destroy(rqstp);
+out_revertcred:
+ revert_creds(save_cred);
+ return status;
+}
+EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 402d436cbd24..5c99ba9abb03 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -431,6 +431,7 @@ static int nfsd_startup_net(struct net *net, const struct cred *cred)
#endif
#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
INIT_LIST_HEAD(&nn->nfsd_uuid.list);
+ nn->nfsd_uuid.net = net;
list_add_tail_rcu(&nn->nfsd_uuid.list, &nfsd_uuids);
#endif
nn->nfsd_net_up = true;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 77bbd23aa150..9c0610fdd11c 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
{ NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
{ NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
{ NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
- { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
+ { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
+ { NFSD_MAY_LOCALIO, "LOCALIO" })
TRACE_EVENT(nfsd_compound,
TP_PROTO(
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 57cd70062048..5146f0c81752 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -33,6 +33,8 @@
#define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
+#define NFSD_MAY_LOCALIO 0x2000
+
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
@@ -158,6 +160,13 @@ __be32 nfsd_permission(struct svc_rqst *, struct svc_export *,
void nfsd_filp_close(struct file *fp);
+int nfsd_open_local_fh(struct net *net,
+ struct rpc_clnt *rpc_clnt,
+ const struct cred *cred,
+ const struct nfs_fh *nfs_fh,
+ const fmode_t fmode,
+ struct file **pfilp);
+
static inline int fh_want_write(struct svc_fh *fh)
{
int ret;
--
2.44.0
* [PATCH v11 05/20] nfsd: add Kconfig options to allow localio to be enabled
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (3 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 04/20] nfsd: add "localio" support Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 06/20] nfsd: manage netns reference in nfsd_open_local_fh Mike Snitzer
` (16 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
CONFIG_NFSD_LOCALIO controls server-side enablement of localio.
A later commit will add CONFIG_NFS_LOCALIO to allow client-side
enablement.
While it is true that, in terms of actually using LOCALIO, it doesn't
make sense to have one without the other, it is useful to allow a
mixed configuration for testing purposes. The same control could
potentially be achieved by exposing a discrete "localio_enabled"
module_param in the server (nfsd.ko), like the one already available
in the client (nfs.ko).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/Kconfig | 3 +++
fs/nfsd/Kconfig | 14 ++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..170083ff2a51 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
tristate
select FS_POSIX_ACL
+config NFS_COMMON_LOCALIO_SUPPORT
+ tristate
+
config NFS_COMMON
bool
depends on NFSD || NFS_FS || LOCKD
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index ec2ab6429e00..a36ff66c7430 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -89,6 +89,20 @@ config NFSD_V4
If unsure, say N.
+config NFSD_LOCALIO
+ tristate "NFS server support for the LOCALIO auxiliary protocol"
+ depends on NFSD || NFSD_V4
+ select NFS_COMMON_LOCALIO_SUPPORT
+ help
+ Some NFS servers support an auxiliary NFS LOCALIO protocol
+ that is not an official part of the NFS version 3 or 4 protocol.
+
+ This option enables support for the LOCALIO protocol in the
+ kernel's NFS server. Enable this to bypass using the NFS
+ protocol when issuing reads, writes and commits to the server.
+
+ If unsure, say N.
+
config NFSD_PNFS
bool
--
2.44.0
* [PATCH v11 06/20] nfsd: manage netns reference in nfsd_open_local_fh
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (4 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 05/20] nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 07/20] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh Mike Snitzer
` (15 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Use maybe_get_net() and put_net() in nfsd_open_local_fh().
Also refactor nfsd_open_local_fh() slightly.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 40 +++++++++++++++++++++++-----------------
1 file changed, 23 insertions(+), 17 deletions(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 2eedeaeab533..0f81c340acc5 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -99,16 +99,11 @@ nfsd_local_fakerqst_destroy(struct svc_rqst *rqstp)
static struct svc_rqst *
nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
- const struct cred *cred)
+ const struct cred *cred, struct svc_serv *serv)
{
- struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct svc_rqst *rqstp;
int status;
- /* FIXME: not running in nfsd context, must get reference on nfsd_serv */
- if (unlikely(!READ_ONCE(nn->nfsd_serv)))
- return ERR_PTR(-ENXIO);
-
rqstp = kzalloc(sizeof(*rqstp), GFP_KERNEL);
if (!rqstp)
return ERR_PTR(-ENOMEM);
@@ -118,10 +113,10 @@ nfsd_local_fakerqst_create(struct net *net, struct rpc_clnt *rpc_clnt,
status = -ENOMEM;
goto out_err;
}
-
rqstp->rq_xprt->xpt_net = net;
+
__set_bit(RQ_SECURE, &rqstp->rq_flags);
- rqstp->rq_server = nn->nfsd_serv;
+ rqstp->rq_server = serv;
/*
* These constants aren't actively used in this fake svc_rqst,
* which bypasses SUNRPC, but they must pass negative checks.
@@ -195,26 +190,39 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
{
int mayflags = NFSD_MAY_LOCALIO;
int status = 0;
+ struct nfsd_net *nn;
const struct cred *save_cred;
struct svc_rqst *rqstp;
struct svc_fh fh;
struct nfsd_file *nf;
+ struct svc_serv *serv;
__be32 beres;
+ if (nfs_fh->size > NFS4_FHSIZE)
+ return -EINVAL;
+
+ /* Not running in nfsd context, must safely get reference on nfsd_serv */
+ cl_nfssvc_net = maybe_get_net(cl_nfssvc_net);
+ if (!cl_nfssvc_net)
+ return -ENXIO;
+ nn = net_generic(cl_nfssvc_net, nfsd_net_id);
+
+ serv = READ_ONCE(nn->nfsd_serv);
+ if (unlikely(!serv)) {
+ status = -ENXIO;
+ goto out_net;
+ }
+
/* Save creds before calling into nfsd */
save_cred = get_current_cred();
- rqstp = nfsd_local_fakerqst_create(cl_nfssvc_net, rpc_clnt, cred);
+ rqstp = nfsd_local_fakerqst_create(cl_nfssvc_net, rpc_clnt, cred, serv);
if (IS_ERR(rqstp)) {
status = PTR_ERR(rqstp);
goto out_revertcred;
}
/* nfs_fh -> svc_fh */
- if (nfs_fh->size > NFS4_FHSIZE) {
- status = -EINVAL;
- goto out;
- }
fh_init(&fh, NFS4_FHSIZE);
fh.fh_handle.fh_size = nfs_fh->size;
memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
@@ -229,17 +237,15 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
status = nfs_stat_to_errno(be32_to_cpu(beres));
goto out_fh_put;
}
-
*pfilp = get_file(nf->nf_file);
-
nfsd_file_put(nf);
out_fh_put:
fh_put(&fh);
-
-out:
nfsd_local_fakerqst_destroy(rqstp);
out_revertcred:
revert_creds(save_cred);
+out_net:
+ put_net(cl_nfssvc_net);
return status;
}
EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
--
2.44.0
* [PATCH v11 07/20] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (5 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 06/20] nfsd: manage netns reference in nfsd_open_local_fh Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 08/20] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
` (14 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
Introduce nfsd_serv_try_get and nfsd_serv_put and update the nfsd code
to prevent nfsd_destroy_serv from destroying nn->nfsd_serv until any
client initiated localio calls to nfsd (that are _not_ in the context
of nfsd) are complete.
nfsd_open_local_fh is updated to call nfsd_serv_try_get before opening
its file handle and to drop the reference with nfsd_serv_put at the
end of nfsd_open_local_fh.
This interlock relies heavily on nfsd_open_local_fh()'s
maybe_get_net() safely dealing with the possibility that the struct
net (and nfsd_net by association) may have been destroyed by
nfsd_destroy_serv() via nfsd_shutdown_net().
Verified to fix an easy-to-hit crash that would occur if an nfsd
instance running in a container, with a localio client mounted, is
shut down. Upon restart of the container and its associated nfsd, the
client would go on to crash due to a NULL pointer dereference that
occurred because the NFS client's localio code attempted
nfsd_open_local_fh(), using nn->nfsd_serv, without holding a proper
reference on nn->nfsd_serv.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/localio.c | 9 +++++----
fs/nfsd/netns.h | 8 +++++++-
fs/nfsd/nfssvc.c | 39 +++++++++++++++++++++++++++++++++++++++
3 files changed, 51 insertions(+), 5 deletions(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 0f81c340acc5..2e609ada7e19 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -195,7 +195,6 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
struct svc_rqst *rqstp;
struct svc_fh fh;
struct nfsd_file *nf;
- struct svc_serv *serv;
__be32 beres;
if (nfs_fh->size > NFS4_FHSIZE)
@@ -207,8 +206,8 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
return -ENXIO;
nn = net_generic(cl_nfssvc_net, nfsd_net_id);
- serv = READ_ONCE(nn->nfsd_serv);
- if (unlikely(!serv)) {
+ /* The server may already be shutting down, disallow new localio */
+ if (unlikely(!nfsd_serv_try_get(nn))) {
status = -ENXIO;
goto out_net;
}
@@ -216,7 +215,8 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
/* Save creds before calling into nfsd */
save_cred = get_current_cred();
- rqstp = nfsd_local_fakerqst_create(cl_nfssvc_net, rpc_clnt, cred, serv);
+ rqstp = nfsd_local_fakerqst_create(cl_nfssvc_net, rpc_clnt,
+ cred, nn->nfsd_serv);
if (IS_ERR(rqstp)) {
status = PTR_ERR(rqstp);
goto out_revertcred;
@@ -244,6 +244,7 @@ int nfsd_open_local_fh(struct net *cl_nfssvc_net,
nfsd_local_fakerqst_destroy(rqstp);
out_revertcred:
revert_creds(save_cred);
+ nfsd_serv_put(nn);
out_net:
put_net(cl_nfssvc_net);
return status;
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 0c5a1d97e4ac..443b003fd2ec 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -13,6 +13,7 @@
#include <linux/filelock.h>
#include <linux/nfs4.h>
#include <linux/percpu_counter.h>
+#include <linux/percpu-refcount.h>
#include <linux/siphash.h>
#include <linux/sunrpc/stats.h>
#include <linux/nfslocalio.h>
@@ -140,7 +141,9 @@ struct nfsd_net {
struct svc_info nfsd_info;
#define nfsd_serv nfsd_info.serv
-
+ struct percpu_ref nfsd_serv_ref;
+ struct completion nfsd_serv_confirm_done;
+ struct completion nfsd_serv_free_done;
/*
* clientid and stateid data for construction of net unique COPY
@@ -225,6 +228,9 @@ struct nfsd_net {
extern bool nfsd_support_version(int vers);
extern void nfsd_netns_free_versions(struct nfsd_net *nn);
+bool nfsd_serv_try_get(struct nfsd_net *nn);
+void nfsd_serv_put(struct nfsd_net *nn);
+
extern unsigned int nfsd_net_id;
void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 5c99ba9abb03..90922c0586d5 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -258,6 +258,30 @@ int nfsd_minorversion(struct nfsd_net *nn, u32 minorversion, enum vers_op change
return 0;
}
+bool nfsd_serv_try_get(struct nfsd_net *nn)
+{
+ return percpu_ref_tryget_live(&nn->nfsd_serv_ref);
+}
+
+void nfsd_serv_put(struct nfsd_net *nn)
+{
+ percpu_ref_put(&nn->nfsd_serv_ref);
+}
+
+static void nfsd_serv_done(struct percpu_ref *ref)
+{
+ struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+
+ complete(&nn->nfsd_serv_confirm_done);
+}
+
+static void nfsd_serv_free(struct percpu_ref *ref)
+{
+ struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+
+ complete(&nn->nfsd_serv_free_done);
+}
+
/*
* Maximum number of nfsd processes
*/
@@ -462,6 +486,7 @@ static void nfsd_shutdown_net(struct net *net)
lockd_down(net);
nn->lockd_up = false;
}
+ percpu_ref_exit(&nn->nfsd_serv_ref);
#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
list_del_rcu(&nn->nfsd_uuid.list);
#endif
@@ -544,6 +569,13 @@ void nfsd_destroy_serv(struct net *net)
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
struct svc_serv *serv = nn->nfsd_serv;
+ lockdep_assert_held(&nfsd_mutex);
+
+ percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
+ wait_for_completion(&nn->nfsd_serv_confirm_done);
+ wait_for_completion(&nn->nfsd_serv_free_done);
+ /* percpu_ref_exit is called in nfsd_shutdown_net */
+
spin_lock(&nfsd_notifier_lock);
nn->nfsd_serv = NULL;
spin_unlock(&nfsd_notifier_lock);
@@ -666,6 +698,13 @@ int nfsd_create_serv(struct net *net)
if (nn->nfsd_serv)
return 0;
+ error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
+ 0, GFP_KERNEL);
+ if (error)
+ return error;
+ init_completion(&nn->nfsd_serv_free_done);
+ init_completion(&nn->nfsd_serv_confirm_done);
+
if (nfsd_max_blksize == 0)
nfsd_max_blksize = nfsd_get_default_max_blksize();
nfsd_reset_versions(nn);
--
2.44.0
* [PATCH v11 08/20] nfsd: implement server support for NFS_LOCALIO_PROGRAM
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (6 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 07/20] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 09/20] SUNRPC: replace program list with program array Mike Snitzer
` (13 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
LOCALIOPROC_GETUUID encodes the server's uuid_t in terms of the fixed
UUID_SIZE (16). The fixed-size opaque encode and decode XDR methods
are used instead of the less efficient variable-sized methods.
Aside from a bit of code in nfssvc.c, all knowledge of the LOCALIO
RPC protocol lives in fs/nfsd/localio.c, which implements just a
single version (1) that is used independently of the NFS version in
use.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neil@brown.name>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfsd/localio.c | 74 +++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfssvc.c | 29 +++++++++++++++++-
include/linux/nfs.h | 7 +++++
3 files changed, 109 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 2e609ada7e19..1d6508aa931e 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -11,12 +11,15 @@
#include <linux/sunrpc/svcauth_gss.h>
#include <linux/sunrpc/clnt.h>
#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
#include <linux/string.h>
#include "nfsd.h"
#include "vfs.h"
#include "netns.h"
#include "filecache.h"
+#include "cache.h"
/*
* We need to translate between nfs status return values and
@@ -253,3 +256,74 @@ EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
/* Compile time type checking, not used by anything */
static nfs_to_nfsd_open_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
+
+/*
+ * GETUUID XDR encode functions
+ */
+
+static __be32 localio_proc_null(struct svc_rqst *rqstp)
+{
+ return rpc_success;
+}
+
+struct localio_getuuidres {
+ uuid_t uuid;
+};
+
+static __be32 localio_proc_getuuid(struct svc_rqst *rqstp)
+{
+ struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
+ struct localio_getuuidres *resp = rqstp->rq_resp;
+
+ uuid_copy(&resp->uuid, &nn->nfsd_uuid.uuid);
+
+ return rpc_success;
+}
+
+static bool localio_encode_getuuidres(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr)
+{
+ struct localio_getuuidres *resp = rqstp->rq_resp;
+ u8 uuid[UUID_SIZE];
+
+ export_uuid(uuid, &resp->uuid);
+ encode_opaque_fixed(xdr, uuid, UUID_SIZE);
+
+ return true;
+}
+
+static const struct svc_procedure localio_procedures1[] = {
+ [LOCALIOPROC_NULL] = {
+ .pc_func = localio_proc_null,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = 0,
+ .pc_name = "NULL",
+ },
+ [LOCALIOPROC_GETUUID] = {
+ .pc_func = localio_proc_getuuid,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = localio_encode_getuuidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct localio_getuuidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = XDR_QUADLEN(UUID_SIZE),
+ .pc_name = "GETUUID",
+ },
+};
+
+#define LOCALIO_NR_PROCEDURES ARRAY_SIZE(localio_procedures1)
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+ localio_count[LOCALIO_NR_PROCEDURES]);
+const struct svc_version localio_version1 = {
+ .vs_vers = 1,
+ .vs_nproc = LOCALIO_NR_PROCEDURES,
+ .vs_proc = localio_procedures1,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_count = localio_count,
+ .vs_xdrsize = XDR_QUADLEN(UUID_SIZE),
+ .vs_hidden = true,
+};
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 90922c0586d5..3e528d242966 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -81,6 +81,26 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_mem_used;
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+extern const struct svc_version localio_version1;
+static const struct svc_version *localio_versions[] = {
+ [1] = &localio_version1,
+};
+
+#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions)
+
+static struct svc_program nfsd_localio_program = {
+ .pg_prog = NFS_LOCALIO_PROGRAM,
+ .pg_nvers = NFSD_LOCALIO_NRVERS,
+ .pg_vers = localio_versions,
+ .pg_name = "nfslocalio",
+ .pg_class = "nfsd",
+ .pg_authenticate = &svc_set_client,
+ .pg_init_request = svc_generic_init_request,
+ .pg_rpcbind_set = svc_generic_rpcbind_set,
+};
+#endif /* CONFIG_NFSD_LOCALIO */
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
# if defined(CONFIG_NFSD_V2_ACL)
@@ -95,6 +115,9 @@ static const struct svc_version *nfsd_acl_version[] = {
#define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)
static struct svc_program nfsd_acl_program = {
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_LOCALIO */
.pg_prog = NFS_ACL_PROGRAM,
.pg_nvers = NFSD_ACL_NRVERS,
.pg_vers = nfsd_acl_version,
@@ -123,6 +146,10 @@ static const struct svc_version *nfsd_version[] = {
struct svc_program nfsd_program = {
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
.pg_next = &nfsd_acl_program,
+#else
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ .pg_next = &nfsd_localio_program,
+#endif /* CONFIG_NFSD_LOCALIO */
#endif
.pg_prog = NFS_PROGRAM, /* program number */
.pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
@@ -1020,7 +1047,7 @@ nfsd(void *vrqstp)
}
/**
- * nfsd_dispatch - Process an NFS or NFSACL Request
+ * nfsd_dispatch - Process an NFS or NFSACL or LOCALIO Request
* @rqstp: incoming request
*
* This RPC dispatcher integrates the NFS server's duplicate reply cache.
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index ceb70a926b95..b1e00349f3ed 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -13,6 +13,13 @@
#include <linux/crc32.h>
#include <uapi/linux/nfs.h>
+/* The localio program is entirely private to Linux and is
+ * NOT part of the uapi.
+ */
+#define NFS_LOCALIO_PROGRAM 400122
+#define LOCALIOPROC_NULL 0
+#define LOCALIOPROC_GETUUID 1
+
/*
* This is the kernel NFS client file handle representation
*/
--
2.44.0
* [PATCH v11 09/20] SUNRPC: replace program list with program array
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (7 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 08/20] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 10/20] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
` (12 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: NeilBrown <neil@brown.name>
A service created with svc_create_pooled() can be given a linked list
of programs, and all of these will be served.
Using a linked list makes it cumbersome to handle several programs
that can each be optionally selected with CONFIG settings.
After this patch is applied, API consumers must use only
svc_create_pooled() when creating an RPC service that listens for more
than one RPC program.
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfsd/nfsctl.c | 2 +-
fs/nfsd/nfsd.h | 2 +-
fs/nfsd/nfssvc.c | 69 ++++++++++++++++++--------------------
include/linux/sunrpc/svc.h | 7 ++--
net/sunrpc/svc.c | 68 +++++++++++++++++++++----------------
net/sunrpc/svc_xprt.c | 2 +-
net/sunrpc/svcauth_unix.c | 3 +-
7 files changed, 80 insertions(+), 73 deletions(-)
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 9e0ea6fc2aa3..e4636e260cac 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2239,7 +2239,7 @@ static __net_init int nfsd_net_init(struct net *net)
if (retval)
goto out_repcache_error;
memset(&nn->nfsd_svcstats, 0, sizeof(nn->nfsd_svcstats));
- nn->nfsd_svcstats.program = &nfsd_program;
+ nn->nfsd_svcstats.program = &nfsd_programs[0];
nn->nfsd_versions = NULL;
nn->nfsd4_minorversions = NULL;
nn->nfsd_info.mutex = &nfsd_mutex;
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index cec8697b1cd6..c3f7c5957950 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -80,7 +80,7 @@ struct nfsd_genl_rqstp {
u32 rq_opnum[NFSD_MAX_OPS_PER_COMPOUND];
};
-extern struct svc_program nfsd_program;
+extern struct svc_program nfsd_programs[];
extern const struct svc_version nfsd_version2, nfsd_version3, nfsd_version4;
extern struct mutex nfsd_mutex;
extern spinlock_t nfsd_drc_lock;
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 3e528d242966..bee834ec468c 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -36,7 +36,6 @@
#define NFSDDBG_FACILITY NFSDDBG_SVC
atomic_t nfsd_th_cnt = ATOMIC_INIT(0);
-extern struct svc_program nfsd_program;
static int nfsd(void *vrqstp);
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static int nfsd_acl_rpcbind_set(struct net *,
@@ -89,16 +88,6 @@ static const struct svc_version *localio_versions[] = {
#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions)
-static struct svc_program nfsd_localio_program = {
- .pg_prog = NFS_LOCALIO_PROGRAM,
- .pg_nvers = NFSD_LOCALIO_NRVERS,
- .pg_vers = localio_versions,
- .pg_name = "nfslocalio",
- .pg_class = "nfsd",
- .pg_authenticate = &svc_set_client,
- .pg_init_request = svc_generic_init_request,
- .pg_rpcbind_set = svc_generic_rpcbind_set,
-};
#endif /* CONFIG_NFSD_LOCALIO */
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
@@ -111,23 +100,9 @@ static const struct svc_version *nfsd_acl_version[] = {
# endif
};
-#define NFSD_ACL_MINVERS 2
+#define NFSD_ACL_MINVERS 2
#define NFSD_ACL_NRVERS ARRAY_SIZE(nfsd_acl_version)
-static struct svc_program nfsd_acl_program = {
-#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
- .pg_next = &nfsd_localio_program,
-#endif /* CONFIG_NFSD_LOCALIO */
- .pg_prog = NFS_ACL_PROGRAM,
- .pg_nvers = NFSD_ACL_NRVERS,
- .pg_vers = nfsd_acl_version,
- .pg_name = "nfsacl",
- .pg_class = "nfsd",
- .pg_authenticate = &svc_set_client,
- .pg_init_request = nfsd_acl_init_request,
- .pg_rpcbind_set = nfsd_acl_rpcbind_set,
-};
-
#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
static const struct svc_version *nfsd_version[] = {
@@ -140,25 +115,44 @@ static const struct svc_version *nfsd_version[] = {
#endif
};
-#define NFSD_MINVERS 2
+#define NFSD_MINVERS 2
#define NFSD_NRVERS ARRAY_SIZE(nfsd_version)
-struct svc_program nfsd_program = {
-#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
- .pg_next = &nfsd_acl_program,
-#else
-#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
- .pg_next = &nfsd_localio_program,
-#endif /* CONFIG_NFSD_LOCALIO */
-#endif
+struct svc_program nfsd_programs[] = {
+ {
.pg_prog = NFS_PROGRAM, /* program number */
.pg_nvers = NFSD_NRVERS, /* nr of entries in nfsd_version */
.pg_vers = nfsd_version, /* version table */
.pg_name = "nfsd", /* program name */
.pg_class = "nfsd", /* authentication class */
- .pg_authenticate = &svc_set_client, /* export authentication */
+ .pg_authenticate = svc_set_client, /* export authentication */
.pg_init_request = nfsd_init_request,
.pg_rpcbind_set = nfsd_rpcbind_set,
+ },
+#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
+ {
+ .pg_prog = NFS_ACL_PROGRAM,
+ .pg_nvers = NFSD_ACL_NRVERS,
+ .pg_vers = nfsd_acl_version,
+ .pg_name = "nfsacl",
+ .pg_class = "nfsd",
+ .pg_authenticate = svc_set_client,
+ .pg_init_request = nfsd_acl_init_request,
+ .pg_rpcbind_set = nfsd_acl_rpcbind_set,
+ },
+#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ {
+ .pg_prog = NFS_LOCALIO_PROGRAM,
+ .pg_nvers = NFSD_LOCALIO_NRVERS,
+ .pg_vers = localio_versions,
+ .pg_name = "nfslocalio",
+ .pg_class = "nfsd",
+ .pg_authenticate = svc_set_client,
+ .pg_init_request = svc_generic_init_request,
+ .pg_rpcbind_set = svc_generic_rpcbind_set,
+ }
+#endif /* IS_ENABLED(CONFIG_NFSD_LOCALIO) */
};
bool nfsd_support_version(int vers)
@@ -735,7 +729,8 @@ int nfsd_create_serv(struct net *net)
if (nfsd_max_blksize == 0)
nfsd_max_blksize = nfsd_get_default_max_blksize();
nfsd_reset_versions(nn);
- serv = svc_create_pooled(&nfsd_program, &nn->nfsd_svcstats,
+ serv = svc_create_pooled(nfsd_programs, ARRAY_SIZE(nfsd_programs),
+ &nn->nfsd_svcstats,
nfsd_max_blksize, nfsd);
if (serv == NULL)
return -ENOMEM;
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index a7d0406b9ef5..7c86b1696398 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -66,9 +66,10 @@ enum {
* We currently do not support more than one RPC program per daemon.
*/
struct svc_serv {
- struct svc_program * sv_program; /* RPC program */
+ struct svc_program * sv_programs; /* RPC programs */
struct svc_stat * sv_stats; /* RPC statistics */
spinlock_t sv_lock;
+ unsigned int sv_nprogs; /* Number of sv_programs */
unsigned int sv_nrthreads; /* # of server threads */
unsigned int sv_maxconn; /* max connections allowed or
* '0' causing max to be based
@@ -329,10 +330,9 @@ struct svc_process_info {
};
/*
- * List of RPC programs on the same transport endpoint
+ * RPC program - an array of these can use the same transport endpoint
*/
struct svc_program {
- struct svc_program * pg_next; /* other programs (same xprt) */
u32 pg_prog; /* program number */
unsigned int pg_lovers; /* lowest version */
unsigned int pg_hivers; /* highest version */
@@ -414,6 +414,7 @@ void svc_rqst_release_pages(struct svc_rqst *rqstp);
void svc_rqst_free(struct svc_rqst *);
void svc_exit_thread(struct svc_rqst *);
struct svc_serv * svc_create_pooled(struct svc_program *prog,
+ unsigned int nprog,
struct svc_stat *stats,
unsigned int bufsize,
int (*threadfn)(void *data));
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index e03f14024e47..b2ba0fbbfdc9 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -440,10 +440,11 @@ EXPORT_SYMBOL_GPL(svc_rpcb_cleanup);
static int svc_uses_rpcbind(struct svc_serv *serv)
{
- struct svc_program *progp;
- unsigned int i;
+ unsigned int p, i;
+
+ for (p = 0; p < serv->sv_nprogs; p++) {
+ struct svc_program *progp = &serv->sv_programs[p];
- for (progp = serv->sv_program; progp; progp = progp->pg_next) {
for (i = 0; i < progp->pg_nvers; i++) {
if (progp->pg_vers[i] == NULL)
continue;
@@ -480,7 +481,7 @@ __svc_init_bc(struct svc_serv *serv)
* Create an RPC service
*/
static struct svc_serv *
-__svc_create(struct svc_program *prog, struct svc_stat *stats,
+__svc_create(struct svc_program *prog, int nprogs, struct svc_stat *stats,
unsigned int bufsize, int npools, int (*threadfn)(void *data))
{
struct svc_serv *serv;
@@ -491,7 +492,8 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
if (!(serv = kzalloc(sizeof(*serv), GFP_KERNEL)))
return NULL;
serv->sv_name = prog->pg_name;
- serv->sv_program = prog;
+ serv->sv_programs = prog;
+ serv->sv_nprogs = nprogs;
serv->sv_stats = stats;
if (bufsize > RPCSVC_MAXPAYLOAD)
bufsize = RPCSVC_MAXPAYLOAD;
@@ -499,17 +501,18 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
serv->sv_max_mesg = roundup(serv->sv_max_payload + PAGE_SIZE, PAGE_SIZE);
serv->sv_threadfn = threadfn;
xdrsize = 0;
- while (prog) {
- prog->pg_lovers = prog->pg_nvers-1;
- for (vers=0; vers<prog->pg_nvers ; vers++)
- if (prog->pg_vers[vers]) {
- prog->pg_hivers = vers;
- if (prog->pg_lovers > vers)
- prog->pg_lovers = vers;
- if (prog->pg_vers[vers]->vs_xdrsize > xdrsize)
- xdrsize = prog->pg_vers[vers]->vs_xdrsize;
+ for (i = 0; i < nprogs; i++) {
+ struct svc_program *progp = &prog[i];
+
+ progp->pg_lovers = progp->pg_nvers-1;
+ for (vers = 0; vers < progp->pg_nvers ; vers++)
+ if (progp->pg_vers[vers]) {
+ progp->pg_hivers = vers;
+ if (progp->pg_lovers > vers)
+ progp->pg_lovers = vers;
+ if (progp->pg_vers[vers]->vs_xdrsize > xdrsize)
+ xdrsize = progp->pg_vers[vers]->vs_xdrsize;
}
- prog = prog->pg_next;
}
serv->sv_xdrsize = xdrsize;
INIT_LIST_HEAD(&serv->sv_tempsocks);
@@ -558,13 +561,14 @@ __svc_create(struct svc_program *prog, struct svc_stat *stats,
struct svc_serv *svc_create(struct svc_program *prog, unsigned int bufsize,
int (*threadfn)(void *data))
{
- return __svc_create(prog, NULL, bufsize, 1, threadfn);
+ return __svc_create(prog, 1, NULL, bufsize, 1, threadfn);
}
EXPORT_SYMBOL_GPL(svc_create);
/**
* svc_create_pooled - Create an RPC service with pooled threads
- * @prog: the RPC program the new service will handle
+ * @prog: Array of RPC programs the new service will handle
+ * @nprogs: Number of programs in the array
* @stats: the stats struct if desired
* @bufsize: maximum message size for @prog
* @threadfn: a function to service RPC requests for @prog
@@ -572,6 +576,7 @@ EXPORT_SYMBOL_GPL(svc_create);
* Returns an instantiated struct svc_serv object or NULL.
*/
struct svc_serv *svc_create_pooled(struct svc_program *prog,
+ unsigned int nprogs,
struct svc_stat *stats,
unsigned int bufsize,
int (*threadfn)(void *data))
@@ -579,7 +584,7 @@ struct svc_serv *svc_create_pooled(struct svc_program *prog,
struct svc_serv *serv;
unsigned int npools = svc_pool_map_get();
- serv = __svc_create(prog, stats, bufsize, npools, threadfn);
+ serv = __svc_create(prog, nprogs, stats, bufsize, npools, threadfn);
if (!serv)
goto out_err;
serv->sv_is_pooled = true;
@@ -602,16 +607,16 @@ svc_destroy(struct svc_serv **servp)
*servp = NULL;
- dprintk("svc: svc_destroy(%s)\n", serv->sv_program->pg_name);
+ dprintk("svc: svc_destroy(%s)\n", serv->sv_programs->pg_name);
timer_shutdown_sync(&serv->sv_temptimer);
/*
* Remaining transports at this point are not expected.
*/
WARN_ONCE(!list_empty(&serv->sv_permsocks),
- "SVC: permsocks remain for %s\n", serv->sv_program->pg_name);
+ "SVC: permsocks remain for %s\n", serv->sv_programs->pg_name);
WARN_ONCE(!list_empty(&serv->sv_tempsocks),
- "SVC: tempsocks remain for %s\n", serv->sv_program->pg_name);
+ "SVC: tempsocks remain for %s\n", serv->sv_programs->pg_name);
cache_clean_deferred(serv);
@@ -1156,15 +1161,16 @@ int svc_register(const struct svc_serv *serv, struct net *net,
const int family, const unsigned short proto,
const unsigned short port)
{
- struct svc_program *progp;
- unsigned int i;
+ unsigned int p, i;
int error = 0;
WARN_ON_ONCE(proto == 0 && port == 0);
if (proto == 0 && port == 0)
return -EINVAL;
- for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+ for (p = 0; p < serv->sv_nprogs; p++) {
+ struct svc_program *progp = &serv->sv_programs[p];
+
for (i = 0; i < progp->pg_nvers; i++) {
error = progp->pg_rpcbind_set(net, progp, i,
@@ -1216,13 +1222,14 @@ static void __svc_unregister(struct net *net, const u32 program, const u32 versi
static void svc_unregister(const struct svc_serv *serv, struct net *net)
{
struct sighand_struct *sighand;
- struct svc_program *progp;
unsigned long flags;
- unsigned int i;
+ unsigned int p, i;
clear_thread_flag(TIF_SIGPENDING);
- for (progp = serv->sv_program; progp; progp = progp->pg_next) {
+ for (p = 0; p < serv->sv_nprogs; p++) {
+ struct svc_program *progp = &serv->sv_programs[p];
+
for (i = 0; i < progp->pg_nvers; i++) {
if (progp->pg_vers[i] == NULL)
continue;
@@ -1328,7 +1335,7 @@ svc_process_common(struct svc_rqst *rqstp)
struct svc_process_info process;
enum svc_auth_status auth_res;
unsigned int aoffset;
- int rc;
+ int pr, rc;
__be32 *p;
/* Will be turned off only when NFSv4 Sessions are used */
@@ -1352,9 +1359,12 @@ svc_process_common(struct svc_rqst *rqstp)
rqstp->rq_vers = be32_to_cpup(p++);
rqstp->rq_proc = be32_to_cpup(p);
- for (progp = serv->sv_program; progp; progp = progp->pg_next)
+ for (pr = 0; pr < serv->sv_nprogs; pr++) {
+ progp = &serv->sv_programs[pr];
+
if (rqstp->rq_prog == progp->pg_prog)
break;
+ }
/*
* Decode auth data, and add verifier to reply buffer.
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index d3735ab3e6d1..16634afdf253 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -268,7 +268,7 @@ static int _svc_xprt_create(struct svc_serv *serv, const char *xprt_name,
spin_unlock(&svc_xprt_class_lock);
newxprt = xcl->xcl_ops->xpo_create(serv, net, sap, len, flags);
if (IS_ERR(newxprt)) {
- trace_svc_xprt_create_err(serv->sv_program->pg_name,
+ trace_svc_xprt_create_err(serv->sv_programs->pg_name,
xcl->xcl_name, sap, len,
newxprt);
module_put(xcl->xcl_owner);
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c
index 04b45588ae6f..8ca98b146ec8 100644
--- a/net/sunrpc/svcauth_unix.c
+++ b/net/sunrpc/svcauth_unix.c
@@ -697,7 +697,8 @@ svcauth_unix_set_client(struct svc_rqst *rqstp)
rqstp->rq_auth_stat = rpc_autherr_badcred;
ipm = ip_map_cached_get(xprt);
if (ipm == NULL)
- ipm = __ip_map_lookup(sn->ip_map_cache, rqstp->rq_server->sv_program->pg_class,
+ ipm = __ip_map_lookup(sn->ip_map_cache,
+ rqstp->rq_server->sv_programs->pg_class,
&sin6->sin6_addr);
if (ipm == NULL)
--
2.44.0
* [PATCH v11 10/20] nfs: pass nfs_client to nfs_initiate_pgio
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (8 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 09/20] SUNRPC: replace program list with program array Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 11/20] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
` (11 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
The nfs_client is needed for localio support. Otherwise it won't be
possible to disable localio if it is attempted but fails.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/filelayout/filelayout.c | 4 ++--
fs/nfs/flexfilelayout/flexfilelayout.c | 6 ++++--
fs/nfs/internal.h | 5 +++--
fs/nfs/pagelist.c | 10 ++++++----
4 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 29d84dc66ca3..43e16e9e0176 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -486,7 +486,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -528,7 +528,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 4a9106fa8220..0784aac0be47 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1803,7 +1803,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
0, RPC_TASK_SOFTCONN);
@@ -1871,7 +1872,8 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds_clnt, hdr, ds_cred, ds->ds_clp->rpc_ops,
+ nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
sync, RPC_TASK_SOFTCONN);
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 9f0f4534744b..a9c0c29f7804 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,8 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 040b6b79c75e..d35b2b30a404 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,8 +844,9 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}
-int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
- const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+ struct nfs_pgio_header *hdr, const struct cred *cred,
+ const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
{
struct rpc_task *task;
@@ -855,7 +856,7 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
.rpc_cred = cred,
};
struct rpc_task_setup task_setup_data = {
- .rpc_client = clnt,
+ .rpc_client = rpc_clnt,
.task = &hdr->task,
.rpc_message = &msg,
.callback_ops = call_ops,
@@ -1070,7 +1071,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_CLIENT(hdr->inode),
+ ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
NFS_PROTO(hdr->inode),
--
2.44.0
* [PATCH v11 11/20] nfs: pass descriptor thru nfs_initiate_pgio path
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (9 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 10/20] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 12/20] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
` (10 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
This is needed for localio support.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/blocklayout/blocklayout.c | 6 ++++--
fs/nfs/filelayout/filelayout.c | 10 ++++++----
fs/nfs/flexfilelayout/flexfilelayout.c | 10 ++++++----
fs/nfs/internal.h | 6 +++---
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/pnfs.c | 24 +++++++++++++-----------
fs/nfs/pnfs.h | 6 ++++--
7 files changed, 40 insertions(+), 28 deletions(-)
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 6be13e0ec170..6a61ddd1835f 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -227,7 +227,8 @@ bl_end_par_io_read(void *data)
}
static enum pnfs_try_status
-bl_read_pagelist(struct nfs_pgio_header *header)
+bl_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
@@ -372,7 +373,8 @@ static void bl_end_par_io_write(void *data)
}
static enum pnfs_try_status
-bl_write_pagelist(struct nfs_pgio_header *header, int sync)
+bl_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *header, int sync)
{
struct pnfs_block_layout *bl = BLK_LSEG2EXT(header->lseg);
struct pnfs_block_dev_map map = { .start = NFS4_MAX_UINT64 };
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index 43e16e9e0176..f9b600c4a2b5 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -447,7 +447,8 @@ static const struct rpc_call_ops filelayout_commit_call_ops = {
};
static enum pnfs_try_status
-filelayout_read_pagelist(struct nfs_pgio_header *hdr)
+filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -486,7 +487,7 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
0, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
@@ -494,7 +495,8 @@ filelayout_read_pagelist(struct nfs_pgio_header *hdr)
/* Perform async writes. */
static enum pnfs_try_status
-filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -528,7 +530,7 @@ filelayout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = filelayout_get_dserver_offset(lseg, offset);
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, hdr->cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
sync, RPC_TASK_SOFTCONN);
return PNFS_ATTEMPTED;
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 0784aac0be47..3f0554fc9c31 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1751,7 +1751,8 @@ static const struct rpc_call_ops ff_layout_commit_call_ops_v4 = {
};
static enum pnfs_try_status
-ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
+ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1803,7 +1804,7 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
hdr->mds_offset = offset;
/* Perform an asynchronous read to ds */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
@@ -1822,7 +1823,8 @@ ff_layout_read_pagelist(struct nfs_pgio_header *hdr)
/* Perform async writes. */
static enum pnfs_try_status
-ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
+ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr, int sync)
{
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
@@ -1872,7 +1874,7 @@ ff_layout_write_pagelist(struct nfs_pgio_header *hdr, int sync)
hdr->args.offset = offset;
/* Perform an asynchronous write */
- nfs_initiate_pgio(ds->ds_clp, ds_clnt, hdr, ds_cred,
+ nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index a9c0c29f7804..f6e56fdd8bc2 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -306,9 +306,9 @@ extern const struct nfs_pageio_ops nfs_pgio_rw_ops;
struct nfs_pgio_header *nfs_pgio_header_alloc(const struct nfs_rw_ops *);
void nfs_pgio_header_free(struct nfs_pgio_header *);
int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
- struct nfs_pgio_header *hdr, const struct cred *cred,
- const struct nfs_rpc_ops *rpc_ops,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
+ struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
+ const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index d35b2b30a404..7f881314d973 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -844,7 +844,8 @@ static void nfs_pgio_prepare(struct rpc_task *task, void *calldata)
rpc_exit(task, err);
}
-int nfs_initiate_pgio(struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
+int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
+ struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
const struct rpc_call_ops *call_ops, int how, int flags)
@@ -1071,7 +1072,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
if (ret == 0) {
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(NFS_SERVER(hdr->inode)->nfs_client,
+ ret = nfs_initiate_pgio(desc,
+ NFS_SERVER(hdr->inode)->nfs_client,
NFS_CLIENT(hdr->inode),
hdr,
hdr->cred,
diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index b5834728f31b..c9015179b72c 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2885,10 +2885,11 @@ pnfs_write_through_mds(struct nfs_pageio_descriptor *desc,
}
static enum pnfs_try_status
-pnfs_try_to_write_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg,
- int how)
+pnfs_try_to_write_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg,
+ int how)
{
struct inode *inode = hdr->inode;
enum pnfs_try_status trypnfs;
@@ -2898,7 +2899,7 @@ pnfs_try_to_write_data(struct nfs_pgio_header *hdr,
dprintk("%s: Writing ino:%lu %u@%llu (how %d)\n", __func__,
inode->i_ino, hdr->args.count, hdr->args.offset, how);
- trypnfs = nfss->pnfs_curr_ld->write_pagelist(hdr, how);
+ trypnfs = nfss->pnfs_curr_ld->write_pagelist(desc, hdr, how);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_WRITE);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -2913,7 +2914,7 @@ pnfs_do_write(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- trypnfs = pnfs_try_to_write_data(hdr, call_ops, lseg, how);
+ trypnfs = pnfs_try_to_write_data(desc, hdr, call_ops, lseg, how);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_write_through_mds(desc, hdr);
@@ -3012,9 +3013,10 @@ pnfs_read_through_mds(struct nfs_pageio_descriptor *desc,
* Call the appropriate parallel I/O subsystem read function.
*/
static enum pnfs_try_status
-pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
- const struct rpc_call_ops *call_ops,
- struct pnfs_layout_segment *lseg)
+pnfs_try_to_read_data(struct nfs_pageio_descriptor *desc,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops,
+ struct pnfs_layout_segment *lseg)
{
struct inode *inode = hdr->inode;
struct nfs_server *nfss = NFS_SERVER(inode);
@@ -3025,7 +3027,7 @@ pnfs_try_to_read_data(struct nfs_pgio_header *hdr,
dprintk("%s: Reading ino:%lu %u@%llu\n",
__func__, inode->i_ino, hdr->args.count, hdr->args.offset);
- trypnfs = nfss->pnfs_curr_ld->read_pagelist(hdr);
+ trypnfs = nfss->pnfs_curr_ld->read_pagelist(desc, hdr);
if (trypnfs != PNFS_NOT_ATTEMPTED)
nfs_inc_stats(inode, NFSIOS_PNFS_READ);
dprintk("%s End (trypnfs:%d)\n", __func__, trypnfs);
@@ -3058,7 +3060,7 @@ pnfs_do_read(struct nfs_pageio_descriptor *desc, struct nfs_pgio_header *hdr)
struct pnfs_layout_segment *lseg = desc->pg_lseg;
enum pnfs_try_status trypnfs;
- trypnfs = pnfs_try_to_read_data(hdr, call_ops, lseg);
+ trypnfs = pnfs_try_to_read_data(desc, hdr, call_ops, lseg);
switch (trypnfs) {
case PNFS_NOT_ATTEMPTED:
pnfs_read_through_mds(desc, hdr);
diff --git a/fs/nfs/pnfs.h b/fs/nfs/pnfs.h
index fa5beeaaf5da..92acb837cfa6 100644
--- a/fs/nfs/pnfs.h
+++ b/fs/nfs/pnfs.h
@@ -157,8 +157,10 @@ struct pnfs_layoutdriver_type {
* Return PNFS_ATTEMPTED to indicate the layout code has attempted
* I/O, else return PNFS_NOT_ATTEMPTED to fall back to normal NFS
*/
- enum pnfs_try_status (*read_pagelist)(struct nfs_pgio_header *);
- enum pnfs_try_status (*write_pagelist)(struct nfs_pgio_header *, int);
+ enum pnfs_try_status (*read_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *);
+ enum pnfs_try_status (*write_pagelist)(struct nfs_pageio_descriptor *,
+ struct nfs_pgio_header *, int);
void (*free_deviceid_node) (struct nfs4_deviceid_node *);
struct nfs4_deviceid_node * (*alloc_deviceid_node)
--
2.44.0
* [PATCH v11 12/20] nfs: pass struct file to nfs_init_pgio and nfs_init_commit
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (10 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 11/20] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 13/20] nfs: add "localio" support Mike Snitzer
` (9 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
This is needed for localio support.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/filelayout/filelayout.c | 6 +++---
fs/nfs/flexfilelayout/flexfilelayout.c | 6 +++---
fs/nfs/internal.h | 6 ++++--
fs/nfs/pagelist.c | 6 ++++--
fs/nfs/pnfs_nfs.c | 2 +-
fs/nfs/write.c | 5 +++--
6 files changed, 18 insertions(+), 13 deletions(-)
diff --git a/fs/nfs/filelayout/filelayout.c b/fs/nfs/filelayout/filelayout.c
index f9b600c4a2b5..b9e5e7bd15ca 100644
--- a/fs/nfs/filelayout/filelayout.c
+++ b/fs/nfs/filelayout/filelayout.c
@@ -489,7 +489,7 @@ filelayout_read_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_read_call_ops,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}
@@ -532,7 +532,7 @@ filelayout_write_pagelist(struct nfs_pageio_descriptor *desc,
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, hdr->cred,
NFS_PROTO(hdr->inode), &filelayout_write_call_ops,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
return PNFS_ATTEMPTED;
}
@@ -1013,7 +1013,7 @@ static int filelayout_initiate_commit(struct nfs_commit_data *data, int how)
data->args.fh = fh;
return nfs_initiate_commit(ds_clnt, data, NFS_PROTO(data->inode),
&filelayout_commit_call_ops, how,
- RPC_TASK_SOFTCONN);
+ RPC_TASK_SOFTCONN, NULL);
out_err:
pnfs_generic_prepare_to_resend_writes(data);
pnfs_generic_commit_release(data);
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 3f0554fc9c31..58f20cebf0c6 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -1808,7 +1808,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN);
+ 0, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1878,7 +1878,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN);
+ sync, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1953,7 +1953,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_commit_call_ops_v3 :
&ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN);
+ how, RPC_TASK_SOFTCONN, NULL);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index f6e56fdd8bc2..958c8de072e2 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -309,7 +309,8 @@ int nfs_generic_pgio(struct nfs_pageio_descriptor *, struct nfs_pgio_header *);
int nfs_initiate_pgio(struct nfs_pageio_descriptor *, struct nfs_client *clp,
struct rpc_clnt *rpc_clnt, struct nfs_pgio_header *hdr,
const struct cred *cred, const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags);
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio);
void nfs_free_request(struct nfs_page *req);
struct nfs_pgio_mirror *
nfs_pgio_current_mirror(struct nfs_pageio_descriptor *desc);
@@ -529,7 +530,8 @@ extern int nfs_initiate_commit(struct rpc_clnt *clnt,
struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags);
+ int how, int flags,
+ struct file *localio);
extern void nfs_init_commit(struct nfs_commit_data *data,
struct list_head *head,
struct pnfs_layout_segment *lseg,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 7f881314d973..727d3b80e897 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -848,7 +848,8 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
struct nfs_client *clp, struct rpc_clnt *rpc_clnt,
struct nfs_pgio_header *hdr, const struct cred *cred,
const struct nfs_rpc_ops *rpc_ops,
- const struct rpc_call_ops *call_ops, int how, int flags)
+ const struct rpc_call_ops *call_ops, int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
struct rpc_message msg = {
@@ -1080,7 +1081,8 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
NFS_PROTO(hdr->inode),
desc->pg_rpc_callops,
desc->pg_ioflags,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags,
+ NULL);
}
return ret;
}
diff --git a/fs/nfs/pnfs_nfs.c b/fs/nfs/pnfs_nfs.c
index 88e061bd711b..ecfde2649cf3 100644
--- a/fs/nfs/pnfs_nfs.c
+++ b/fs/nfs/pnfs_nfs.c
@@ -537,7 +537,7 @@ pnfs_generic_commit_pagelist(struct inode *inode, struct list_head *mds_pages,
nfs_initiate_commit(NFS_CLIENT(inode), data,
NFS_PROTO(data->inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF);
+ RPC_TASK_CRED_NOREF, NULL);
} else {
nfs_init_commit(data, NULL, data->lseg, cinfo);
initiate_commit(data, how);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 2329cbb0e446..267bed2a4ceb 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1670,7 +1670,8 @@ EXPORT_SYMBOL_GPL(nfs_commitdata_release);
int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
const struct nfs_rpc_ops *nfs_ops,
const struct rpc_call_ops *call_ops,
- int how, int flags)
+ int how, int flags,
+ struct file *localio)
{
struct rpc_task *task;
int priority = flush_task_priority(how);
@@ -1816,7 +1817,7 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
task_flags = RPC_TASK_MOVEABLE;
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags);
+ RPC_TASK_CRED_NOREF | task_flags, NULL);
}
/*
--
2.44.0
* [PATCH v11 13/20] nfs: add "localio" support
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (11 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 12/20] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 14/20] nfs: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
` (8 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Weston Andros Adamson <dros@primarydata.com>
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.
nfs_local_probe() is stubbed out; later commits will enable the client
and server handshake via a Linux-only LOCALIO auxiliary RPC protocol.
The client binds dynamically to the nfsd module (via the nfs_localio
module, which is part of nfs_common). Localio only works if nfsd is
already loaded.
The "localio_enabled" nfs kernel module parameter can be used to
enable or disable localio support.
CONFIG_NFS_LOCALIO controls the client enablement.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/Kconfig | 14 +
fs/nfs/Makefile | 1 +
fs/nfs/client.c | 3 +
fs/nfs/inode.c | 4 +
fs/nfs/internal.h | 51 +++
fs/nfs/localio.c | 661 ++++++++++++++++++++++++++++++++++++++
fs/nfs/nfstrace.h | 61 ++++
fs/nfs/pagelist.c | 3 +
fs/nfs/write.c | 3 +
include/linux/nfs.h | 2 +
include/linux/nfs_fs.h | 2 +
include/linux/nfs_fs_sb.h | 1 +
12 files changed, 806 insertions(+)
create mode 100644 fs/nfs/localio.c
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 57249f040dfc..311ae8bc587f 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -86,6 +86,20 @@ config NFS_V4
If unsure, say Y.
+config NFS_LOCALIO
+ tristate "NFS client support for the LOCALIO auxiliary protocol"
+ depends on NFS_V3 || NFS_V4
+ select NFS_COMMON_LOCALIO_SUPPORT
+ help
+ Some NFS servers support an auxiliary NFS LOCALIO protocol
+ that is not an official part of the NFS version 3 or 4 protocol.
+
+ This option enables support for the LOCALIO protocol in the
+ kernel's NFS client. Enable this to bypass using the NFS
+ protocol when issuing reads, writes and commits to the server.
+
+ If unsure, say N.
+
config NFS_SWAP
bool "Provide swap over NFS support"
default n
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 5f6db37f461e..9fb2f2cac87e 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -13,6 +13,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
+nfs-$(CONFIG_NFS_LOCALIO) += localio.o
obj-$(CONFIG_NFS_V2) += nfsv2.o
nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index bcdf8d42cbc7..1300c388f971 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -241,6 +241,8 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
+ nfs_local_disable(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
@@ -432,6 +434,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
+ nfs_local_probe(new);
return rpc_ops->init_client(new, cl_init);
}
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index acef52ecb1bb..f9923cbf6058 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -39,6 +39,7 @@
#include <linux/slab.h>
#include <linux/compat.h>
#include <linux/freezer.h>
+#include <linux/file.h>
#include <linux/uaccess.h>
#include <linux/iversion.h>
@@ -1053,6 +1054,7 @@ struct nfs_open_context *alloc_nfs_open_context(struct dentry *dentry,
ctx->lock_context.open_context = ctx;
INIT_LIST_HEAD(&ctx->list);
ctx->mdsthreshold = NULL;
+ ctx->local_filp = NULL;
return ctx;
}
EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
@@ -1084,6 +1086,8 @@ static void __put_nfs_open_context(struct nfs_open_context *ctx, int is_sync)
nfs_sb_deactive(sb);
put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
kfree(ctx->mdsthreshold);
+ if (!IS_ERR_OR_NULL(ctx->local_filp))
+ fput(ctx->local_filp);
kfree_rcu(ctx, rcu_head);
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index 958c8de072e2..d352040e3232 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -451,6 +451,57 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+/* localio.c */
+extern void nfs_local_disable(struct nfs_client *);
+extern void nfs_local_probe(struct nfs_client *);
+extern struct file *nfs_local_open_fh(struct nfs_client *, const struct cred *,
+ struct nfs_fh *, const fmode_t);
+extern struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx);
+extern int nfs_local_doio(struct nfs_client *, struct file *,
+ struct nfs_pgio_header *,
+ const struct rpc_call_ops *);
+extern int nfs_local_commit(struct file *, struct nfs_commit_data *,
+ const struct rpc_call_ops *, int);
+extern bool nfs_server_is_local(const struct nfs_client *clp);
+
+#else
+static inline void nfs_local_disable(struct nfs_client *clp) {}
+static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline struct file *nfs_local_open_fh(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ const fmode_t mode)
+{
+ return ERR_PTR(-EINVAL);
+}
+static inline struct file *nfs_local_file_open(struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ struct nfs_open_context *ctx)
+{
+ return NULL;
+}
+static inline int nfs_local_doio(struct nfs_client *clp, struct file *filep,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ return -EINVAL;
+}
+static inline int nfs_local_commit(struct file *filep, struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ return -EINVAL;
+}
+static inline bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return false;
+}
+#endif /* CONFIG_NFS_LOCALIO */
+
/* super.c */
extern const struct super_operations nfs_sops;
bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
new file mode 100644
index 000000000000..5fd286e92df4
--- /dev/null
+++ b/fs/nfs/localio.c
@@ -0,0 +1,661 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS client support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/vfs.h>
+#include <linux/file.h>
+#include <linux/inet.h>
+#include <linux/sunrpc/addr.h>
+#include <linux/inetdevice.h>
+#include <net/addrconf.h>
+#include <linux/bvec.h>
+
+#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
+
+#include "internal.h"
+#include "pnfs.h"
+#include "nfstrace.h"
+
+#define NFSDBG_FACILITY NFSDBG_VFS
+
+struct nfs_local_kiocb {
+ struct kiocb kiocb;
+ struct bio_vec *bvec;
+ struct nfs_pgio_header *hdr;
+ struct work_struct work;
+};
+
+struct nfs_local_fsync_ctx {
+ struct file *filp;
+ struct nfs_commit_data *data;
+ struct work_struct work;
+ struct kref kref;
+ struct completion *done;
+};
+static void nfs_local_fsync_work(struct work_struct *work);
+
+/*
+ * We need to translate between NFS status return values and
+ * the local errno values, which may not be the same.
+ */
+static struct {
+ __u32 stat;
+ int errno;
+} nfs_errtbl[] = {
+ { NFS4_OK, 0 },
+ { NFS4ERR_PERM, -EPERM },
+ { NFS4ERR_NOENT, -ENOENT },
+ { NFS4ERR_IO, -EIO },
+ { NFS4ERR_NXIO, -ENXIO },
+ { NFS4ERR_FBIG, -E2BIG },
+ { NFS4ERR_STALE, -EBADF },
+ { NFS4ERR_ACCESS, -EACCES },
+ { NFS4ERR_EXIST, -EEXIST },
+ { NFS4ERR_XDEV, -EXDEV },
+ { NFS4ERR_MLINK, -EMLINK },
+ { NFS4ERR_NOTDIR, -ENOTDIR },
+ { NFS4ERR_ISDIR, -EISDIR },
+ { NFS4ERR_INVAL, -EINVAL },
+ { NFS4ERR_FBIG, -EFBIG },
+ { NFS4ERR_NOSPC, -ENOSPC },
+ { NFS4ERR_ROFS, -EROFS },
+ { NFS4ERR_NAMETOOLONG, -ENAMETOOLONG },
+ { NFS4ERR_NOTEMPTY, -ENOTEMPTY },
+ { NFS4ERR_DQUOT, -EDQUOT },
+ { NFS4ERR_STALE, -ESTALE },
+ { NFS4ERR_STALE, -EOPENSTALE },
+ { NFS4ERR_DELAY, -ETIMEDOUT },
+ { NFS4ERR_DELAY, -ERESTARTSYS },
+ { NFS4ERR_DELAY, -EAGAIN },
+ { NFS4ERR_DELAY, -ENOMEM },
+ { NFS4ERR_IO, -ETXTBSY },
+ { NFS4ERR_IO, -EBUSY },
+ { NFS4ERR_BADHANDLE, -EBADHANDLE },
+ { NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
+ { NFS4ERR_NOTSUPP, -EOPNOTSUPP },
+ { NFS4ERR_TOOSMALL, -ETOOSMALL },
+ { NFS4ERR_SERVERFAULT, -ESERVERFAULT },
+ { NFS4ERR_SERVERFAULT, -ENFILE },
+ { NFS4ERR_IO, -EREMOTEIO },
+ { NFS4ERR_IO, -EUCLEAN },
+ { NFS4ERR_PERM, -ENOKEY },
+ { NFS4ERR_BADTYPE, -EBADTYPE },
+ { NFS4ERR_SYMLINK, -ELOOP },
+ { NFS4ERR_DEADLOCK, -EDEADLK },
+};
+
+/*
+ * Convert a local errno to an NFS status code, for use in both
+ * NFSv3 and NFSv4 localio replies.
+ */
+static __u32
+nfs4errno(int errno)
+{
+ unsigned int i;
+ for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
+ if (nfs_errtbl[i].errno == errno)
+ return nfs_errtbl[i].stat;
+ }
+ /* If we cannot translate the error, the recovery routines should
+ * handle it.
+ * Note: remaining NFSv4 error codes have values > 10000, so should
+ * not conflict with native Linux error codes.
+ */
+ return NFS4ERR_SERVERFAULT;
+}
+
+static bool localio_enabled __read_mostly = true;
+module_param(localio_enabled, bool, 0644);
+
+bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
+ localio_enabled;
+}
+EXPORT_SYMBOL_GPL(nfs_server_is_local);
+
+/*
+ * nfs_local_enable - enable local i/o for an nfs_client
+ */
+static __maybe_unused void nfs_local_enable(struct nfs_client *clp,
+ struct net *net)
+{
+ if (READ_ONCE(clp->nfsd_open_local_fh)) {
+ set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ clp->cl_nfssvc_net = net;
+ trace_nfs_local_enable(clp);
+ }
+}
+
+/*
+ * nfs_local_disable - disable local i/o for an nfs_client
+ */
+void nfs_local_disable(struct nfs_client *clp)
+{
+ if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
+ trace_nfs_local_disable(clp);
+ clp->cl_nfssvc_net = NULL;
+ }
+}
+
+/*
+ * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ */
+void nfs_local_probe(struct nfs_client *clp)
+{
+}
+EXPORT_SYMBOL_GPL(nfs_local_probe);
+
+/*
+ * nfs_local_open_fh - open a local filehandle
+ *
+ * Returns a pointer to a struct file or an ERR_PTR
+ */
+struct file *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, const fmode_t mode)
+{
+ struct file *filp;
+ int status;
+
+ if (mode & ~(FMODE_READ | FMODE_WRITE))
+ return ERR_PTR(-EINVAL);
+
+ status = clp->nfsd_open_local_fh(clp->cl_nfssvc_net, clp->cl_rpcclient,
+ cred, fh, mode, &filp);
+ if (status < 0) {
+ trace_nfs_local_open_fh(fh, mode, status);
+ switch (status) {
+ case -ENXIO:
+ nfs_local_disable(clp);
+ fallthrough;
+ case -ETIMEDOUT:
+ status = -EAGAIN;
+ }
+ filp = ERR_PTR(status);
+ }
+ return filp;
+}
+EXPORT_SYMBOL_GPL(nfs_local_open_fh);
+
+static struct bio_vec *
+nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
+ unsigned int npages, gfp_t flags)
+{
+ struct bio_vec *bvec, *p;
+
+ bvec = kmalloc_array(npages, sizeof(*bvec), flags);
+ if (bvec != NULL) {
+ for (p = bvec; npages > 0; p++, pagevec++, npages--) {
+ p->bv_page = *pagevec;
+ p->bv_len = PAGE_SIZE;
+ p->bv_offset = 0;
+ }
+ }
+ return bvec;
+}
+
+static void
+nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
+{
+ kfree(iocb->bvec);
+ kfree(iocb);
+}
+
+static struct nfs_local_kiocb *
+nfs_local_iocb_alloc(struct nfs_pgio_header *hdr, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_kiocb *iocb;
+
+ iocb = kmalloc(sizeof(*iocb), flags);
+ if (iocb == NULL)
+ return NULL;
+ iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
+ hdr->page_array.npages, flags);
+ if (iocb->bvec == NULL) {
+ kfree(iocb);
+ return NULL;
+ }
+ init_sync_kiocb(&iocb->kiocb, filp);
+ iocb->kiocb.ki_pos = hdr->args.offset;
+ iocb->hdr = hdr;
+ iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+ return iocb;
+}
+
+static void
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
+ hdr->args.count + hdr->args.pgbase);
+ if (hdr->args.pgbase != 0)
+ iov_iter_advance(i, hdr->args.pgbase);
+}
+
+static void
+nfs_local_hdr_release(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ call_ops->rpc_call_done(&hdr->task, hdr);
+ call_ops->rpc_release(hdr);
+}
+
+static void
+nfs_local_pgio_init(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ hdr->task.tk_ops = call_ops;
+ if (!hdr->task.tk_start)
+ hdr->task.tk_start = ktime_get();
+}
+
+static void
+nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
+{
+ if (status >= 0) {
+ hdr->res.count = status;
+ hdr->res.op_status = NFS4_OK;
+ hdr->task.tk_status = 0;
+ } else {
+ hdr->res.op_status = nfs4errno(status);
+ hdr->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ fput(iocb->kiocb.ki_filp);
+ nfs_local_iocb_free(iocb);
+ nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+}
+
+static void
+nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct file *filp = iocb->kiocb.ki_filp;
+
+ nfs_local_pgio_done(hdr, status);
+
+ if (hdr->res.count != hdr->args.count ||
+ hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
+ hdr->res.eof = true;
+
+ dprintk("%s: read %ld bytes eof %d.\n", __func__,
+ status > 0 ? status : 0, hdr->res.eof);
+}
+
+static int
+nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_read count=%u pos=%llu\n",
+ __func__, hdr->args.count, hdr->args.offset);
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, READ);
+
+ nfs_local_pgio_init(hdr, call_ops);
+ hdr->res.eof = false;
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+
+ return 0;
+}
+
+static void
+nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ u32 *verf = (u32 *)verifier->data;
+ int seq = 0;
+
+ do {
+ read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
+ verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
+ verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
+ } while (need_seqretry(&clp->cl_boot_lock, seq));
+ done_seqretry(&clp->cl_boot_lock, seq);
+}
+
+static void
+nfs_reset_boot_verifier(struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+ write_seqlock(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ write_sequnlock(&clp->cl_boot_lock);
+}
+
+static void
+nfs_set_local_verifier(struct inode *inode,
+ struct nfs_writeverf *verf,
+ enum nfs3_stable_how how)
+{
+ nfs_copy_boot_verifier(&verf->verifier, inode);
+ verf->committed = how;
+}
+
+static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
+{
+ struct kstat stat;
+ struct file *filp = iocb->kiocb.ki_filp;
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct nfs_fattr *fattr = hdr->res.fattr;
+
+ if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
+ STATX_INO |
+ STATX_ATIME |
+ STATX_MTIME |
+ STATX_CTIME |
+ STATX_SIZE |
+ STATX_BLOCKS,
+ AT_STATX_SYNC_AS_STAT))
+ return;
+
+ fattr->valid = (NFS_ATTR_FATTR_FILEID |
+ NFS_ATTR_FATTR_CHANGE |
+ NFS_ATTR_FATTR_SIZE |
+ NFS_ATTR_FATTR_ATIME |
+ NFS_ATTR_FATTR_MTIME |
+ NFS_ATTR_FATTR_CTIME |
+ NFS_ATTR_FATTR_SPACE_USED);
+
+ fattr->fileid = stat.ino;
+ fattr->size = stat.size;
+ fattr->atime = stat.atime;
+ fattr->mtime = stat.mtime;
+ fattr->ctime = stat.ctime;
+ fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ fattr->du.nfs3.used = stat.blocks << 9;
+}
+
+static void
+nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct inode *inode = hdr->inode;
+
+ dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
+
+ /* Handle short writes as if they are ENOSPC */
+ if (status > 0 && status < hdr->args.count) {
+ hdr->mds_offset += status;
+ hdr->args.offset += status;
+ hdr->args.pgbase += status;
+ hdr->args.count -= status;
+ nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
+ status = -ENOSPC;
+ }
+ if (status < 0)
+ nfs_reset_boot_verifier(inode);
+ else if (nfs_should_remove_suid(inode)) {
+ /* Deal with the suid/sgid bit corner case */
+ spin_lock(&inode->i_lock);
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_MODE);
+ spin_unlock(&inode->i_lock);
+ }
+ nfs_local_pgio_done(hdr, status);
+}
+
+static int
+nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_write count=%u pos=%llu %s\n",
+ __func__, hdr->args.count, hdr->args.offset,
+ (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
+
+ iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ switch (hdr->args.stable) {
+ default:
+ break;
+ case NFS_DATA_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC;
+ break;
+ case NFS_FILE_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
+ }
+ nfs_local_pgio_init(hdr, call_ops);
+
+ nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_write_done(iocb, status);
+ nfs_local_vfs_getattr(iocb);
+ nfs_local_pgio_release(iocb);
+
+ return 0;
+}
+
+static struct file *
+nfs_local_file_open_cached(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ struct file *filp = ctx->local_filp;
+
+ if (!filp) {
+ struct file *new = nfs_local_open_fh(clp, cred, fh, ctx->mode);
+ if (IS_ERR_OR_NULL(new))
+ return NULL;
+ /* try to put this one in the slot */
+ filp = cmpxchg(&ctx->local_filp, NULL, new);
+ if (filp != NULL)
+ fput(new);
+ else
+ filp = new;
+ }
+ return get_file(filp);
+}
+
+struct file *
+nfs_local_file_open(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, struct nfs_open_context *ctx)
+{
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ return nfs_local_file_open_cached(clp, cred, fh, ctx);
+}
+
+int
+nfs_local_doio(struct nfs_client *clp, struct file *filp,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ int status = 0;
+
+ if (!hdr->args.count)
+ goto out_fput;
+ /* Don't support filesystems without read_iter/write_iter */
+ if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
+ nfs_local_disable(clp);
+ status = -EAGAIN;
+ goto out_fput;
+ }
+
+ switch (hdr->rw_mode) {
+ case FMODE_READ:
+ status = nfs_do_local_read(hdr, filp, call_ops);
+ break;
+ case FMODE_WRITE:
+ status = nfs_do_local_write(hdr, filp, call_ops);
+ break;
+ default:
+ dprintk("%s: invalid mode: %d\n", __func__,
+ hdr->rw_mode);
+ status = -EINVAL;
+ }
+out_fput:
+ if (status != 0) {
+ fput(filp);
+ hdr->task.tk_status = status;
+ nfs_local_hdr_release(hdr, call_ops);
+ }
+ return status;
+}
+
+static void
+nfs_local_init_commit(struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ data->task.tk_ops = call_ops;
+}
+
+static int
+nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
+{
+ loff_t start = data->args.offset;
+ loff_t end = LLONG_MAX;
+
+ if (data->args.count > 0) {
+ end = start + data->args.count - 1;
+ if (end < start)
+ end = LLONG_MAX;
+ }
+
+ dprintk("%s: commit %llu - %llu\n", __func__, start, end);
+ return vfs_fsync_range(filp, start, end, 0);
+}
+
+static void
+nfs_local_commit_done(struct nfs_commit_data *data, int status)
+{
+ if (status >= 0) {
+ nfs_set_local_verifier(data->inode,
+ data->res.verf,
+ NFS_FILE_SYNC);
+ data->res.op_status = NFS4_OK;
+ data->task.tk_status = 0;
+ } else {
+ nfs_reset_boot_verifier(data->inode);
+ data->res.op_status = nfs4errno(status);
+ data->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_release_commit_data(struct file *filp,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ fput(filp);
+ call_ops->rpc_call_done(&data->task, data);
+ call_ops->rpc_release(data);
+}
+
+static struct nfs_local_fsync_ctx *
+nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data, struct file *filp,
+ gfp_t flags)
+{
+ struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
+
+ if (ctx != NULL) {
+ ctx->filp = filp;
+ ctx->data = data;
+ INIT_WORK(&ctx->work, nfs_local_fsync_work);
+ kref_init(&ctx->kref);
+ ctx->done = NULL;
+ }
+ return ctx;
+}
+
+static void
+nfs_local_fsync_ctx_kref_free(struct kref *kref)
+{
+ kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
+}
+
+static void
+nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
+{
+ kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
+}
+
+static void
+nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
+{
+ nfs_local_release_commit_data(ctx->filp, ctx->data,
+ ctx->data->task.tk_ops);
+ nfs_local_fsync_ctx_put(ctx);
+}
+
+static void
+nfs_local_fsync_work(struct work_struct *work)
+{
+ struct nfs_local_fsync_ctx *ctx;
+ int status;
+
+ ctx = container_of(work, struct nfs_local_fsync_ctx, work);
+
+ status = nfs_local_run_commit(ctx->filp, ctx->data);
+ nfs_local_commit_done(ctx->data, status);
+ if (ctx->done != NULL)
+ complete(ctx->done);
+ nfs_local_fsync_ctx_free(ctx);
+}
+
+int
+nfs_local_commit(struct file *filp, struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ struct nfs_local_fsync_ctx *ctx;
+
+ ctx = nfs_local_fsync_ctx_alloc(data, filp, GFP_KERNEL);
+ if (!ctx) {
+ nfs_local_commit_done(data, -ENOMEM);
+ nfs_local_release_commit_data(filp, data, call_ops);
+ return -ENOMEM;
+ }
+
+ nfs_local_init_commit(data, call_ops);
+ kref_get(&ctx->kref);
+ if (how & FLUSH_SYNC) {
+ DECLARE_COMPLETION_ONSTACK(done);
+ ctx->done = &done;
+ queue_work(nfsiod_workqueue, &ctx->work);
+ wait_for_completion(&done);
+ } else
+ queue_work(nfsiod_workqueue, &ctx->work);
+ nfs_local_fsync_ctx_put(ctx);
+ return 0;
+}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 1e710654af11..95a2c19a9172 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1681,6 +1681,67 @@ TRACE_EVENT(nfs_mount_path,
TP_printk("path='%s'", __get_str(path))
);
+TRACE_EVENT(nfs_local_open_fh,
+ TP_PROTO(
+ const struct nfs_fh *fh,
+ fmode_t fmode,
+ int error
+ ),
+
+ TP_ARGS(fh, fmode, error),
+
+ TP_STRUCT__entry(
+ __field(int, error)
+ __field(u32, fhandle)
+ __field(unsigned int, fmode)
+ ),
+
+ TP_fast_assign(
+ __entry->error = error;
+ __entry->fhandle = nfs_fhandle_hash(fh);
+ __entry->fmode = (__force unsigned int)fmode;
+ ),
+
+ TP_printk(
+ "error=%d fhandle=0x%08x mode=%s",
+ __entry->error,
+ __entry->fhandle,
+ show_fs_fmode_flags(__entry->fmode)
+ )
+);
+
+DECLARE_EVENT_CLASS(nfs_local_client_event,
+ TP_PROTO(
+ const struct nfs_client *clp
+ ),
+
+ TP_ARGS(clp),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, protocol)
+ __string(server, clp->cl_hostname)
+ ),
+
+ TP_fast_assign(
+ __entry->protocol = clp->rpc_ops->version;
+ __assign_str(server);
+ ),
+
+ TP_printk(
+ "server=%s NFSv%u", __get_str(server), __entry->protocol
+ )
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+ DEFINE_EVENT(nfs_local_client_event, name, \
+ TP_PROTO( \
+ const struct nfs_client *clp \
+ ), \
+ TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
+
DECLARE_EVENT_CLASS(nfs_xdr_event,
TP_PROTO(
const struct xdr_stream *xdr,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 727d3b80e897..7b7dbbefee03 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -879,6 +879,9 @@ int nfs_initiate_pgio(struct nfs_pageio_descriptor *desc,
hdr->args.count,
(unsigned long long)hdr->args.offset);
+ if (localio)
+ return nfs_local_doio(clp, localio, hdr, call_ops);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 267bed2a4ceb..b29b0fd5431f 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1700,6 +1700,9 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
dprintk("NFS: initiated commit call\n");
+ if (localio)
+ return nfs_local_commit(localio, data, call_ops, how);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index b1e00349f3ed..036f6b0ed94d 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -8,6 +8,8 @@
#ifndef _LINUX_NFS_H
#define _LINUX_NFS_H
+#include <linux/cred.h>
+#include <linux/sunrpc/auth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/string.h>
#include <linux/crc32.h>
diff --git a/include/linux/nfs_fs.h b/include/linux/nfs_fs.h
index 039898d70954..a0bb947fdd1d 100644
--- a/include/linux/nfs_fs.h
+++ b/include/linux/nfs_fs.h
@@ -96,6 +96,8 @@ struct nfs_open_context {
struct list_head list;
struct nfs4_threshold *mdsthreshold;
struct rcu_head rcu_head;
+
+ struct file *local_filp;
};
struct nfs_open_dir_context {
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index e58e706a6503..4290c550a049 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -50,6 +50,7 @@ struct nfs_client {
#define NFS_CS_DS 7 /* - Server is a DS */
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
+#define NFS_CS_LOCAL_IO 10 /* - client is local */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH v11 14/20] nfs: fix nfs_localio_vfs_getattr() to properly support v4
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (12 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 13/20] nfs: add "localio" support Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 15/20] nfs: enable localio for non-pNFS I/O Mike Snitzer
` (7 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
This is nfs-localio code which blurs the boundary between server and
client...
The change_attr is used by NFS to detect if a file might have changed.
This code is used to get the attributes after a write request. NFS
uses a GETATTR request to the server at other times. The change_attr
should be consistent between the two, otherwise comparisons will be
meaningless.
So nfs_localio_vfs_getattr() should use the same change_attr as the
one that would be used if the NFS GETATTR request were made. For
NFSv3, that is nfs_timespec_to_change_attr() as was already
implemented. For NFSv4 it is something different (as implemented in
this commit).
Message-Id: <171918165963.14261.959545364150864599@noble.neil.brown.name>
Suggested-by: NeilBrown <neil@brown.name>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/localio.c | 48 +++++++++++++++++++++++++++++++++++++++---------
1 file changed, 39 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 5fd286e92df4..efa01d732206 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -364,21 +364,47 @@ nfs_set_local_verifier(struct inode *inode,
verf->committed = how;
}
+/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
+static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
+{
+ u32 request_mask = STATX_BASIC_STATS;
+
+ if (version == 4)
+ request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
+ return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
+}
+
+/*
+ * Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute(),
+ * FIXME: factor out to common code.
+ */
+static u64 __nfsd4_change_attribute(const struct kstat *stat,
+ const struct inode *inode)
+{
+ u64 chattr;
+
+ if (stat->result_mask & STATX_CHANGE_COOKIE) {
+ chattr = stat->change_cookie;
+ if (S_ISREG(inode->i_mode) &&
+ !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
+ chattr += (u64)stat->ctime.tv_sec << 30;
+ chattr += stat->ctime.tv_nsec;
+ }
+ } else {
+ chattr = time_to_chattr(&stat->ctime);
+ }
+ return chattr;
+}
+
static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
{
struct kstat stat;
struct file *filp = iocb->kiocb.ki_filp;
struct nfs_pgio_header *hdr = iocb->hdr;
struct nfs_fattr *fattr = hdr->res.fattr;
+ int version = NFS_PROTO(hdr->inode)->version;
- if (unlikely(!fattr) || vfs_getattr(&filp->f_path, &stat,
- STATX_INO |
- STATX_ATIME |
- STATX_MTIME |
- STATX_CTIME |
- STATX_SIZE |
- STATX_BLOCKS,
- AT_STATX_SYNC_AS_STAT))
+ if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
return;
fattr->valid = (NFS_ATTR_FATTR_FILEID |
@@ -394,7 +420,11 @@ static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
fattr->atime = stat.atime;
fattr->mtime = stat.mtime;
fattr->ctime = stat.ctime;
- fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ if (version == 4) {
+ fattr->change_attr =
+ __nfsd4_change_attribute(&stat, file_inode(filp));
+ } else
+ fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
fattr->du.nfs3.used = stat.blocks << 9;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH v11 15/20] nfs: enable localio for non-pNFS I/O
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (13 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 14/20] nfs: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 16/20] pnfs/flexfiles: enable localio for flexfiles I/O Mike Snitzer
` (6 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
Try a local open of the file being read or written and, if it succeeds,
perform the I/O locally instead of over the RPC transport.
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/pagelist.c | 19 ++++++++++---------
fs/nfs/write.c | 7 ++++++-
2 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 7b7dbbefee03..031027983c16 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -1063,6 +1063,7 @@ EXPORT_SYMBOL_GPL(nfs_generic_pgio);
static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
{
struct nfs_pgio_header *hdr;
+ struct file *filp;
int ret;
unsigned short task_flags = 0;
@@ -1074,18 +1075,18 @@ static int nfs_generic_pg_pgios(struct nfs_pageio_descriptor *desc)
nfs_pgheader_init(desc, hdr, nfs_pgio_header_free);
ret = nfs_generic_pgio(desc, hdr);
if (ret == 0) {
+ struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
+
+ filp = nfs_local_file_open(clp, hdr->cred, hdr->args.fh,
+ hdr->args.context);
+
if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
- ret = nfs_initiate_pgio(desc,
- NFS_SERVER(hdr->inode)->nfs_client,
- NFS_CLIENT(hdr->inode),
- hdr,
- hdr->cred,
- NFS_PROTO(hdr->inode),
- desc->pg_rpc_callops,
- desc->pg_ioflags,
+ ret = nfs_initiate_pgio(desc, clp, NFS_CLIENT(hdr->inode),
+ hdr, hdr->cred, NFS_PROTO(hdr->inode),
+ desc->pg_rpc_callops, desc->pg_ioflags,
RPC_TASK_CRED_NOREF | task_flags,
- NULL);
+ filp);
}
return ret;
}
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index b29b0fd5431f..b2c06b8b88cd 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1802,6 +1802,8 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
struct nfs_commit_info *cinfo)
{
struct nfs_commit_data *data;
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ struct file *filp;
unsigned short task_flags = 0;
/* another commit raced with us */
@@ -1818,9 +1820,12 @@ nfs_commit_list(struct inode *inode, struct list_head *head, int how,
nfs_init_commit(data, head, NULL, cinfo);
if (NFS_SERVER(inode)->nfs_client->cl_minorversion)
task_flags = RPC_TASK_MOVEABLE;
+
+ filp = nfs_local_file_open(clp, data->cred, data->args.fh,
+ data->context);
return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
data->mds_ops, how,
- RPC_TASK_CRED_NOREF | task_flags, NULL);
+ RPC_TASK_CRED_NOREF | task_flags, filp);
}
/*
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH v11 16/20] pnfs/flexfiles: enable localio for flexfiles I/O
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (14 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 15/20] nfs: enable localio for non-pNFS I/O Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 17/20] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg Mike Snitzer
` (5 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
If the DS is local to this client, then we should be able to use local
I/O to read and write the data.
Signed-off-by: Peng Tao <tao.peng@primarydata.com>
Signed-off-by: Lance Shelton <lance.shelton@hammerspace.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/flexfilelayout/flexfilelayout.c | 113 ++++++++++++++++++++--
fs/nfs/flexfilelayout/flexfilelayout.h | 2 +
fs/nfs/flexfilelayout/flexfilelayoutdev.c | 6 ++
3 files changed, 112 insertions(+), 9 deletions(-)
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.c b/fs/nfs/flexfilelayout/flexfilelayout.c
index 58f20cebf0c6..8b9096ad0663 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.c
+++ b/fs/nfs/flexfilelayout/flexfilelayout.c
@@ -11,6 +11,7 @@
#include <linux/nfs_mount.h>
#include <linux/nfs_page.h>
#include <linux/module.h>
+#include <linux/file.h>
#include <linux/sched/mm.h>
#include <linux/sunrpc/metrics.h>
@@ -162,6 +163,52 @@ decode_name(struct xdr_stream *xdr, u32 *id)
return 0;
}
+static struct file *
+ff_local_open_fh(struct pnfs_layout_segment *lseg,
+ u32 ds_idx,
+ struct nfs_client *clp,
+ const struct cred *cred,
+ struct nfs_fh *fh,
+ fmode_t mode)
+{
+ struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
+ struct file *filp, *new, __rcu **pfile;
+
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ if (mode & FMODE_WRITE) {
+ /*
+ * Always request read and write access since this corresponds
+ * to a rw layout.
+ */
+ mode |= FMODE_READ;
+ pfile = &mirror->rw_file;
+ } else
+ pfile = &mirror->ro_file;
+
+ new = NULL;
+ rcu_read_lock();
+ filp = rcu_dereference(*pfile);
+ if (!filp) {
+ rcu_read_unlock();
+ new = nfs_local_open_fh(clp, cred, fh, mode);
+ if (IS_ERR(new))
+ return NULL;
+ rcu_read_lock();
+ /* try to swap in the pointer */
+ filp = cmpxchg(pfile, NULL, new);
+ if (!filp) {
+ filp = new;
+ new = NULL;
+ }
+ }
+ filp = get_file_rcu(&filp);
+ rcu_read_unlock();
+ if (new)
+ fput(new);
+ return filp;
+}
+
static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
const struct nfs4_ff_layout_mirror *m2)
{
@@ -237,8 +284,15 @@ static struct nfs4_ff_layout_mirror *ff_layout_alloc_mirror(gfp_t gfp_flags)
static void ff_layout_free_mirror(struct nfs4_ff_layout_mirror *mirror)
{
+ struct file *filp;
const struct cred *cred;
+ filp = rcu_access_pointer(mirror->ro_file);
+ if (filp)
+ fput(filp);
+ filp = rcu_access_pointer(mirror->rw_file);
+ if (filp)
+ fput(filp);
ff_layout_remove_mirror(mirror);
kfree(mirror->fh_versions);
cred = rcu_access_pointer(mirror->ro_cred);
@@ -414,6 +468,7 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
struct nfs4_ff_layout_mirror *mirror;
struct cred *kcred;
const struct cred __rcu *cred;
+ const struct cred __rcu *old;
kuid_t uid;
kgid_t gid;
u32 ds_count, fh_count, id;
@@ -513,13 +568,26 @@ ff_layout_alloc_lseg(struct pnfs_layout_hdr *lh,
mirror = ff_layout_add_mirror(lh, fls->mirror_array[i]);
if (mirror != fls->mirror_array[i]) {
+ struct file *filp;
+
/* swap cred ptrs so free_mirror will clean up old */
if (lgr->range.iomode == IOMODE_READ) {
- cred = xchg(&mirror->ro_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->ro_cred, cred);
+ old = xchg(&mirror->ro_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->ro_cred, old);
+ /* drop file if creds changed */
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->ro_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
} else {
- cred = xchg(&mirror->rw_cred, cred);
- rcu_assign_pointer(fls->mirror_array[i]->rw_cred, cred);
+ old = xchg(&mirror->rw_cred, cred);
+ rcu_assign_pointer(fls->mirror_array[i]->rw_cred, old);
+ if (old != cred) {
+ filp = rcu_dereference_protected(xchg(&mirror->rw_file, NULL), 1);
+ if (filp)
+ fput(filp);
+ }
}
ff_layout_free_mirror(fls->mirror_array[i]);
fls->mirror_array[i] = mirror;
@@ -1757,6 +1825,7 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1803,12 +1872,20 @@ ff_layout_read_pagelist(struct nfs_pageio_descriptor *desc,
hdr->args.offset = offset;
hdr->mds_offset = offset;
+ /* Start IO accounting for local read */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous read to ds */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_read_call_ops_v3 :
&ff_layout_read_call_ops_v4,
- 0, RPC_TASK_SOFTCONN, NULL);
+ 0, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1829,6 +1906,7 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
struct pnfs_layout_segment *lseg = hdr->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
loff_t offset = hdr->args.offset;
@@ -1873,12 +1951,20 @@ ff_layout_write_pagelist(struct nfs_pageio_descriptor *desc,
*/
hdr->args.offset = offset;
+ /* Start IO accounting for local write */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ hdr->task.tk_start = ktime_get();
+ ff_layout_write_record_layoutstats_start(&hdr->task, hdr);
+ }
+
/* Perform an asynchronous write */
nfs_initiate_pgio(desc, ds->ds_clp, ds_clnt, hdr, ds_cred,
ds->ds_clp->rpc_ops,
vers == 3 ? &ff_layout_write_call_ops_v3 :
&ff_layout_write_call_ops_v4,
- sync, RPC_TASK_SOFTCONN, NULL);
+ sync, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return PNFS_ATTEMPTED;
@@ -1912,6 +1998,7 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
struct pnfs_layout_segment *lseg = data->lseg;
struct nfs4_pnfs_ds *ds;
struct rpc_clnt *ds_clnt;
+ struct file *filp;
struct nfs4_ff_layout_mirror *mirror;
const struct cred *ds_cred;
u32 idx;
@@ -1950,10 +2037,18 @@ static int ff_layout_initiate_commit(struct nfs_commit_data *data, int how)
if (fh)
data->args.fh = fh;
+ /* Start IO accounting for local commit */
+ filp = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
+ FMODE_READ|FMODE_WRITE);
+ if (filp) {
+ data->task.tk_start = ktime_get();
+ ff_layout_commit_record_layoutstats_start(&data->task, data);
+ }
+
ret = nfs_initiate_commit(ds_clnt, data, ds->ds_clp->rpc_ops,
- vers == 3 ? &ff_layout_commit_call_ops_v3 :
- &ff_layout_commit_call_ops_v4,
- how, RPC_TASK_SOFTCONN, NULL);
+ vers == 3 ? &ff_layout_commit_call_ops_v3 :
+ &ff_layout_commit_call_ops_v4,
+ how, RPC_TASK_SOFTCONN, filp);
put_cred(ds_cred);
return ret;
out_err:
diff --git a/fs/nfs/flexfilelayout/flexfilelayout.h b/fs/nfs/flexfilelayout/flexfilelayout.h
index f84b3fb0dddd..8e042df5a2c9 100644
--- a/fs/nfs/flexfilelayout/flexfilelayout.h
+++ b/fs/nfs/flexfilelayout/flexfilelayout.h
@@ -82,7 +82,9 @@ struct nfs4_ff_layout_mirror {
struct nfs_fh *fh_versions;
nfs4_stateid stateid;
const struct cred __rcu *ro_cred;
+ struct file __rcu *ro_file;
const struct cred __rcu *rw_cred;
+ struct file __rcu *rw_file;
refcount_t ref;
spinlock_t lock;
unsigned long flags;
diff --git a/fs/nfs/flexfilelayout/flexfilelayoutdev.c b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
index e028f5a0ef5f..e58bedfb1dcc 100644
--- a/fs/nfs/flexfilelayout/flexfilelayoutdev.c
+++ b/fs/nfs/flexfilelayout/flexfilelayoutdev.c
@@ -395,6 +395,12 @@ nfs4_ff_layout_prepare_ds(struct pnfs_layout_segment *lseg,
/* connect success, check rsize/wsize limit */
if (!status) {
+ /*
+ * ds_clp is put in destroy_ds().
+ * keep ds_clp even if DS is local, so that if local IO cannot
+ * proceed somehow, we can fall back to NFS whenever we want.
+ */
+ nfs_local_probe(ds->ds_clp);
max_payload =
nfs_block_size(rpc_max_payload(ds->ds_clp->cl_rpcclient),
NULL);
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH v11 17/20] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (15 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 16/20] pnfs/flexfiles: enable localio for flexfiles I/O Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 18/20] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
` (4 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
This is needed for the LOCALIO protocol's GETUUID RPC which takes a
void arg. The LOCALIO protocol spec in rpcgen syntax is:
/* raw RFC 9562 UUID */
typedef u8 uuid_t<UUID_SIZE>;
program NFS_LOCALIO_PROGRAM {
version LOCALIO_V1 {
void
NULL(void) = 0;
uuid_t
GETUUID(void) = 1;
} = 1;
} = 400122;
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
net/sunrpc/clnt.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c
index cfd1b1bf7e35..2d7f96103f08 100644
--- a/net/sunrpc/clnt.c
+++ b/net/sunrpc/clnt.c
@@ -1894,7 +1894,6 @@ call_allocate(struct rpc_task *task)
return;
if (proc->p_proc != 0) {
- BUG_ON(proc->p_arglen == 0);
if (proc->p_decode != NULL)
BUG_ON(proc->p_replen == 0);
}
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH v11 18/20] nfs/localio: use dedicated workqueues for filesystem read and write
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (16 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 17/20] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 19/20] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
` (3 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
From: Trond Myklebust <trond.myklebust@hammerspace.com>
For localio access, don't call filesystem read() and write() routines
directly. This solves two problems:
1) localio writes need to use a normal (non-memreclaim) unbound
workqueue. This avoids imposing new requirements on how underlying
filesystems process frontend IO, which would otherwise require a
large amount of work across every filesystem. Without this change,
when XFS starts getting low on space, it flushes work on a
non-memreclaim workqueue, which causes a priority inversion problem:
00573 workqueue: WQ_MEM_RECLAIM writeback:wb_workfn is flushing !WQ_MEM_RECLAIM xfs-sync/vdc:xfs_flush_inodes_worker
00573 WARNING: CPU: 6 PID: 8525 at kernel/workqueue.c:3706 check_flush_dependency+0x2a4/0x328
00573 Modules linked in:
00573 CPU: 6 PID: 8525 Comm: kworker/u71:5 Not tainted 6.10.0-rc3-ktest-00032-g2b0a133403ab #18502
00573 Hardware name: linux,dummy-virt (DT)
00573 Workqueue: writeback wb_workfn (flush-0:33)
00573 pstate: 400010c5 (nZcv daIF -PAN -UAO -TCO -DIT +SSBS BTYPE=--)
00573 pc : check_flush_dependency+0x2a4/0x328
00573 lr : check_flush_dependency+0x2a4/0x328
00573 sp : ffff0000c5f06bb0
00573 x29: ffff0000c5f06bb0 x28: ffff0000c998a908 x27: 1fffe00019331521
00573 x26: ffff0000d0620900 x25: ffff0000c5f06ca0 x24: ffff8000828848c0
00573 x23: 1fffe00018be0d8e x22: ffff0000c1210000 x21: ffff0000c75fde00
00573 x20: ffff800080bfd258 x19: ffff0000cad63400 x18: ffff0000cd3a4810
00573 x17: 0000000000000000 x16: 0000000000000000 x15: ffff800080508d98
00573 x14: 0000000000000000 x13: 204d49414c434552 x12: 1fffe0001b6eeab2
00573 x11: ffff60001b6eeab2 x10: dfff800000000000 x9 : ffff60001b6eeab3
00573 x8 : 0000000000000001 x7 : 00009fffe491154e x6 : ffff0000db775593
00573 x5 : ffff0000db775590 x4 : ffff0000db775590 x3 : 0000000000000000
00573 x2 : 0000000000000027 x1 : ffff600018be0d62 x0 : dfff800000000000
00573 Call trace:
00573 check_flush_dependency+0x2a4/0x328
00573 __flush_work+0x184/0x5c8
00573 flush_work+0x18/0x28
00573 xfs_flush_inodes+0x68/0x88
00573 xfs_file_buffered_write+0x128/0x6f0
00573 xfs_file_write_iter+0x358/0x448
00573 nfs_local_doio+0x854/0x1568
00573 nfs_initiate_pgio+0x214/0x418
00573 nfs_generic_pg_pgios+0x304/0x480
00573 nfs_pageio_doio+0xe8/0x240
00573 nfs_pageio_complete+0x160/0x480
00573 nfs_writepages+0x300/0x4f0
00573 do_writepages+0x12c/0x4a0
00573 __writeback_single_inode+0xd4/0xa68
00573 writeback_sb_inodes+0x470/0xcb0
00573 __writeback_inodes_wb+0xb0/0x1d0
00573 wb_writeback+0x594/0x808
00573 wb_workfn+0x5e8/0x9e0
00573 process_scheduled_works+0x53c/0xd90
00573 worker_thread+0x370/0x8c8
00573 kthread+0x258/0x2e8
00573 ret_from_fork+0x10/0x20
2) Some filesystem writeback routines can end up taking up a lot of
stack space (particularly XFS). Instead of risking overrunning the
stack due to the extra overhead from the NFS stack, we should just
call these routines from a workqueue job. Since we need a workqueue
to address 1) above anyway, we avoid possibly blowing the stack
"for free".
Use of dedicated workqueues improves performance over using the
system_unbound_wq.
Also, the creds of the client task are used to override_creds() in
both nfs_local_call_read() and nfs_local_call_write() -- otherwise the
workqueue could have elevated capabilities (which the caller may not).
Lastly, care is taken to set PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO in
both nfs_do_local_read() and nfs_do_local_write().
Dave Chinner detailed the need for these flags with:
"PF_LOCAL_THROTTLE prevents deadlocks in balance_dirty_pages() by
lifting the dirty ratio for this thread a little, hence giving it
priority over the upper filesystem. i.e. the upper filesystem will
throttle incoming writes first, then the back end IO submission
thread can still submit new front end IOs to the lower filesystem
and they won't block in balance_dirty_pages() because the lower
filesystem has a higher limit. Hence the lower filesystem can always
drain the dirty pages on the upper filesystem, and the system won't
deadlock in balance_dirty_pages().
The PF_MEMALLOC_NOIO flag prevents the lower filesystem IO from
causing memory reclaim to re-enter filesystems or IO devices and so
prevents deadlocks from occurring where IO that cleans pages is
waiting on IO to complete."
Message-Id: <ZoHuXHMEuMrem73H@dread.disaster.area>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/inode.c | 57 ++++++++++++++++---------
fs/nfs/internal.h | 1 +
fs/nfs/localio.c | 103 +++++++++++++++++++++++++++++++++++-----------
3 files changed, 119 insertions(+), 42 deletions(-)
diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c
index f9923cbf6058..aac8c5302503 100644
--- a/fs/nfs/inode.c
+++ b/fs/nfs/inode.c
@@ -2394,35 +2394,54 @@ static void nfs_destroy_inodecache(void)
kmem_cache_destroy(nfs_inode_cachep);
}
+struct workqueue_struct *nfslocaliod_workqueue;
struct workqueue_struct *nfsiod_workqueue;
EXPORT_SYMBOL_GPL(nfsiod_workqueue);
/*
- * start up the nfsiod workqueue
- */
-static int nfsiod_start(void)
-{
- struct workqueue_struct *wq;
- dprintk("RPC: creating workqueue nfsiod\n");
- wq = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
- if (wq == NULL)
- return -ENOMEM;
- nfsiod_workqueue = wq;
- return 0;
-}
-
-/*
- * Destroy the nfsiod workqueue
+ * Destroy the nfsiod workqueues
*/
static void nfsiod_stop(void)
{
struct workqueue_struct *wq;
wq = nfsiod_workqueue;
- if (wq == NULL)
- return;
- nfsiod_workqueue = NULL;
- destroy_workqueue(wq);
+ if (wq != NULL) {
+ nfsiod_workqueue = NULL;
+ destroy_workqueue(wq);
+ }
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ wq = nfslocaliod_workqueue;
+ if (wq != NULL) {
+ nfslocaliod_workqueue = NULL;
+ destroy_workqueue(wq);
+ }
+#endif /* CONFIG_NFS_LOCALIO */
+}
+
+/*
+ * Start the nfsiod workqueues
+ */
+static int nfsiod_start(void)
+{
+ dprintk("RPC: creating workqueue nfsiod\n");
+ nfsiod_workqueue = alloc_workqueue("nfsiod", WQ_MEM_RECLAIM | WQ_UNBOUND, 0);
+ if (nfsiod_workqueue == NULL)
+ return -ENOMEM;
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ /*
+ * localio writes need to use a normal (non-memreclaim) workqueue.
+ * When we start getting low on space, XFS goes and calls flush_work() on
+ * a non-memreclaim work queue, which causes a priority inversion problem.
+ */
+ dprintk("RPC: creating workqueue nfslocaliod\n");
+ nfslocaliod_workqueue = alloc_workqueue("nfslocaliod", WQ_UNBOUND, 0);
+ if (unlikely(nfslocaliod_workqueue == NULL)) {
+ nfsiod_stop();
+ return -ENOMEM;
+ }
+#endif /* CONFIG_NFS_LOCALIO */
+ return 0;
}
unsigned int nfs_net_id;
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index d352040e3232..9251a357d097 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -440,6 +440,7 @@ int nfs_check_flags(int);
/* inode.c */
extern struct workqueue_struct *nfsiod_workqueue;
+extern struct workqueue_struct *nfslocaliod_workqueue;
extern struct inode *nfs_alloc_inode(struct super_block *sb);
extern void nfs_free_inode(struct inode *);
extern int nfs_write_inode(struct inode *, struct writeback_control *);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index efa01d732206..7039a181ff89 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -44,6 +44,13 @@ struct nfs_local_fsync_ctx {
};
static void nfs_local_fsync_work(struct work_struct *work);
+struct nfs_local_io_args {
+ struct nfs_local_kiocb *iocb;
+ const struct cred *cred;
+ struct work_struct work;
+ struct completion *done;
+};
+
/*
* We need to translate between nfs status return values and
* the local errno values which may not be the same.
@@ -301,30 +308,55 @@ nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
status > 0 ? status : 0, hdr->res.eof);
}
-static int
-nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
- const struct rpc_call_ops *call_ops)
+static void nfs_local_call_read(struct work_struct *work)
{
- struct nfs_local_kiocb *iocb;
+ struct nfs_local_io_args *args =
+ container_of(work, struct nfs_local_io_args, work);
+ struct nfs_local_kiocb *iocb = args->iocb;
+ struct file *filp = iocb->kiocb.ki_filp;
+ const struct cred *save_cred;
struct iov_iter iter;
ssize_t status;
+ current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
+ save_cred = override_creds(args->cred);
+
+ nfs_local_iter_init(&iter, iocb, READ);
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+
+ revert_creds(save_cred);
+ complete(args->done);
+}
+
+static int nfs_do_local_read(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_io_args args;
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct nfs_local_kiocb *iocb;
+
dprintk("%s: vfs_read count=%u pos=%llu\n",
__func__, hdr->args.count, hdr->args.offset);
iocb = nfs_local_iocb_alloc(hdr, filp, GFP_KERNEL);
if (iocb == NULL)
return -ENOMEM;
- nfs_local_iter_init(&iter, iocb, READ);
nfs_local_pgio_init(hdr, call_ops);
hdr->res.eof = false;
- status = filp->f_op->read_iter(&iocb->kiocb, &iter);
- WARN_ON_ONCE(status == -EIOCBQUEUED);
-
- nfs_local_read_done(iocb, status);
- nfs_local_pgio_release(iocb);
+ args.iocb = iocb;
+ args.done = &done;
+ args.cred = current_cred();
+ INIT_WORK_ONSTACK(&args.work, nfs_local_call_read);
+ queue_work(nfslocaliod_workqueue, &args.work);
+ wait_for_completion(&done);
+ destroy_work_on_stack(&args.work);
return 0;
}
@@ -456,14 +488,41 @@ nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
nfs_local_pgio_done(hdr, status);
}
-static int
-nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
- const struct rpc_call_ops *call_ops)
+static void nfs_local_call_write(struct work_struct *work)
{
- struct nfs_local_kiocb *iocb;
+ struct nfs_local_io_args *args =
+ container_of(work, struct nfs_local_io_args, work);
+ struct nfs_local_kiocb *iocb = args->iocb;
+ struct file *filp = iocb->kiocb.ki_filp;
+ const struct cred *save_cred;
struct iov_iter iter;
ssize_t status;
+ current->flags |= PF_LOCAL_THROTTLE | PF_MEMALLOC_NOIO;
+ save_cred = override_creds(args->cred);
+
+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_write_done(iocb, status);
+ nfs_local_vfs_getattr(iocb);
+ nfs_local_pgio_release(iocb);
+
+ revert_creds(save_cred);
+ complete(args->done);
+}
+
+static int nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
+ const struct rpc_call_ops *call_ops)
+{
+ struct nfs_local_io_args args;
+ DECLARE_COMPLETION_ONSTACK(done);
+ struct nfs_local_kiocb *iocb;
+
dprintk("%s: vfs_write count=%u pos=%llu %s\n",
__func__, hdr->args.count, hdr->args.offset,
(hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
@@ -471,7 +530,6 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
iocb = nfs_local_iocb_alloc(hdr, filp, GFP_NOIO);
if (iocb == NULL)
return -ENOMEM;
- nfs_local_iter_init(&iter, iocb, WRITE);
switch (hdr->args.stable) {
default:
@@ -486,14 +544,13 @@ nfs_do_local_write(struct nfs_pgio_header *hdr, struct file *filp,
nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
- file_start_write(filp);
- status = filp->f_op->write_iter(&iocb->kiocb, &iter);
- file_end_write(filp);
- WARN_ON_ONCE(status == -EIOCBQUEUED);
-
- nfs_local_write_done(iocb, status);
- nfs_local_vfs_getattr(iocb);
- nfs_local_pgio_release(iocb);
+ args.iocb = iocb;
+ args.done = &done;
+ args.cred = current_cred();
+ INIT_WORK_ONSTACK(&args.work, nfs_local_call_write);
+ queue_work(nfslocaliod_workqueue, &args.work);
+ wait_for_completion(&done);
+ destroy_work_on_stack(&args.work);
return 0;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH v11 19/20] nfs: implement client support for NFS_LOCALIO_PROGRAM
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (17 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 18/20] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 20/20] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
` (2 subsequent siblings)
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
LOCALIOPROC_GETUUID allows a client to discover the server's uuid.
nfs_local_probe() will retrieve the server's uuid via the LOCALIO
protocol and verify that the server with that uuid is known to be
local. This ensures client and server 1) both support localio and
2) are local to each other.
All the knowledge of the LOCALIO RPC protocol is in fs/nfs/localio.c
which implements just a single version (1) that is used independently
of what NFS version is used.
Get nfsd_open_local_fh and store it in the rpc_client during client
creation; put the symbol during nfs_local_disable() -- which is also
called during client destruction.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neil@brown.name>
Signed-off-by: NeilBrown <neil@brown.name>
---
fs/nfs/client.c | 6 +-
fs/nfs/localio.c | 153 +++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 152 insertions(+), 7 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 1300c388f971..6faa9fdc444d 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -434,8 +434,10 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
- nfs_local_probe(new);
- return rpc_ops->init_client(new, cl_init);
+ new = rpc_ops->init_client(new, cl_init);
+ if (!IS_ERR(new))
+ nfs_local_probe(new);
+ return new;
}
spin_unlock(&nn->nfs_client_lock);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 7039a181ff89..08d8f661ebe9 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -15,6 +15,7 @@
#include <linux/sunrpc/addr.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
+#include <linux/nfslocalio.h>
#include <linux/module.h>
#include <linux/bvec.h>
@@ -124,18 +125,76 @@ nfs4errno(int errno)
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
+static inline bool nfs_client_is_local(const struct nfs_client *clp)
+{
+ return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+}
+
bool nfs_server_is_local(const struct nfs_client *clp)
{
- return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
- localio_enabled;
+ return nfs_client_is_local(clp) && localio_enabled;
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
+/*
+ * GETUUID XDR functions
+ */
+
+static void localio_xdr_enc_getuuidargs(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ const void *data)
+{
+ /* void function */
+}
+
+static int localio_xdr_dec_getuuidres(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ void *result)
+{
+ u8 *uuid = result;
+
+ return decode_opaque_fixed(xdr, uuid, UUID_SIZE);
+}
+
+static const struct rpc_procinfo nfs_localio_procedures[] = {
+ [LOCALIOPROC_GETUUID] = {
+ .p_proc = LOCALIOPROC_GETUUID,
+ .p_encode = localio_xdr_enc_getuuidargs,
+ .p_decode = localio_xdr_dec_getuuidres,
+ .p_arglen = 0,
+ .p_replen = XDR_QUADLEN(UUID_SIZE),
+ .p_statidx = LOCALIOPROC_GETUUID,
+ .p_name = "GETUUID",
+ },
+};
+
+static unsigned int nfs_localio_counts[ARRAY_SIZE(nfs_localio_procedures)];
+const struct rpc_version nfslocalio_version1 = {
+ .number = 1,
+ .nrprocs = ARRAY_SIZE(nfs_localio_procedures),
+ .procs = nfs_localio_procedures,
+ .counts = nfs_localio_counts,
+};
+
+static const struct rpc_version *nfslocalio_version[] = {
+ [1] = &nfslocalio_version1,
+};
+
+extern const struct rpc_program nfslocalio_program;
+static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program };
+
+const struct rpc_program nfslocalio_program = {
+ .name = "nfslocalio",
+ .number = NFS_LOCALIO_PROGRAM,
+ .nrvers = ARRAY_SIZE(nfslocalio_version),
+ .version = nfslocalio_version,
+ .stats = &nfslocalio_rpcstat,
+};
+
/*
* nfs_local_enable - enable local i/o for an nfs_client
*/
-static __maybe_unused void nfs_local_enable(struct nfs_client *clp,
- struct net *net)
+static void nfs_local_enable(struct nfs_client *clp, struct net *net)
{
if (READ_ONCE(clp->nfsd_open_local_fh)) {
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
@@ -151,15 +210,98 @@ void nfs_local_disable(struct nfs_client *clp)
{
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
+ put_nfsd_open_local_fh();
+ clp->nfsd_open_local_fh = NULL;
+ if (!IS_ERR(clp->cl_rpcclient_localio)) {
+ rpc_shutdown_client(clp->cl_rpcclient_localio);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ }
clp->cl_nfssvc_net = NULL;
}
}
+/*
+ * nfs_init_localioclient - Initialise an NFS localio client connection
+ */
+static void nfs_init_localioclient(struct nfs_client *clp)
+{
+ if (unlikely(!IS_ERR(clp->cl_rpcclient_localio)))
+ goto out;
+ clp->cl_rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
+ &nfslocalio_program, 1);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ goto out;
+ /* No errors! Assume that localio is supported */
+ clp->nfsd_open_local_fh = get_nfsd_open_local_fh();
+ if (!clp->nfsd_open_local_fh) {
+ rpc_shutdown_client(clp->cl_rpcclient_localio);
+ clp->cl_rpcclient_localio = ERR_PTR(-EINVAL);
+ }
+out:
+ dprintk_rcu("%s: server (%s) %s NFS LOCALIO, nfsd_open_local_fh is %s.\n",
+ __func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
+ (IS_ERR(clp->cl_rpcclient_localio) ? "does not support" : "supports"),
+ (clp->nfsd_open_local_fh ? "set" : "not set"));
+}
+
+static bool nfs_local_server_getuuid(struct nfs_client *clp, uuid_t *nfsd_uuid)
+{
+ u8 uuid[UUID_SIZE];
+ struct rpc_message msg = {
+ .rpc_resp = &uuid,
+ };
+ int status;
+
+ nfs_init_localioclient(clp);
+ if (IS_ERR(clp->cl_rpcclient_localio))
+ return false;
+
+ msg.rpc_proc = &nfs_localio_procedures[LOCALIOPROC_GETUUID];
+ status = rpc_call_sync(clp->cl_rpcclient_localio, &msg, 0);
+ dprintk("%s: NFS reply getuuid: status=%d uuid=%pU\n",
+ __func__, status, uuid);
+ if (status)
+ return false;
+
+ import_uuid(nfsd_uuid, uuid);
+
+ return true;
+}
+
/*
* nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ * - called after alloc_client and init_client (so cl_rpcclient exists)
+ * - this function is idempotent, it can be called for old or new clients
*/
void nfs_local_probe(struct nfs_client *clp)
{
+ uuid_t uuid;
+ struct net *net = NULL;
+
+ if (!localio_enabled || clp->cl_rpcclient->cl_vers == 2)
+ goto unsupported;
+
+ if (nfs_client_is_local(clp)) {
+ /* If already enabled, disable and re-enable */
+ nfs_local_disable(clp);
+ }
+
+ /*
+ * Retrieve server's uuid via LOCALIO protocol and verify the
+ * server with that uuid is known to be local. This ensures
+ * client and server 1: support localio 2: are local to each other
+ * by verifying client's nfsd, with specified uuid, is local.
+ */
+ if (!nfs_local_server_getuuid(clp, &uuid) ||
+ !nfsd_uuid_is_local(&uuid, &net))
+ goto unsupported;
+
+ nfs_local_enable(clp, net);
+ return;
+
+unsupported:
+ /* localio not supported */
+ nfs_local_disable(clp);
}
EXPORT_SYMBOL_GPL(nfs_local_probe);
@@ -184,7 +326,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
trace_nfs_local_open_fh(fh, mode, status);
switch (status) {
case -ENXIO:
- nfs_local_disable(clp);
+ /* Revalidate localio, will disable if unsupported */
+ nfs_local_probe(clp);
fallthrough;
case -ETIMEDOUT:
status = -EAGAIN;
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* [PATCH v11 20/20] nfs: add Documentation/filesystems/nfs/localio.rst
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (18 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 19/20] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
@ 2024-07-02 16:28 ` Mike Snitzer
2024-07-02 18:06 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Chuck Lever III
2024-07-03 15:16 ` Christoph Hellwig
21 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 16:28 UTC (permalink / raw)
To: linux-nfs
Cc: Jeff Layton, Chuck Lever, Anna Schumaker, Trond Myklebust,
NeilBrown, snitzer
This document gives an overview of the LOCALIO auxiliary RPC protocol
added to the Linux NFS client and server (both v3 and v4) to allow a
client and server to reliably handshake to determine if they are on the
same host. The LOCALIO auxiliary protocol's implementation, which uses
the same connection as NFS traffic, follows the pattern established by
the NFS ACL protocol extension.
The robust handshake between local client and server is just the
beginning; the ultimate use case this locality makes possible is that
the client can issue reads, writes and commits directly to the server
without going over the network. This is particularly useful for
container use cases (e.g. Kubernetes) where it is possible to run an
IO job local to the server.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
Documentation/filesystems/nfs/localio.rst | 135 ++++++++++++++++++++++
include/linux/nfslocalio.h | 2 +
2 files changed, 137 insertions(+)
create mode 100644 Documentation/filesystems/nfs/localio.rst
diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst
new file mode 100644
index 000000000000..7f211e3fc34c
--- /dev/null
+++ b/Documentation/filesystems/nfs/localio.rst
@@ -0,0 +1,135 @@
+===========
+NFS localio
+===========
+
+This document gives an overview of the LOCALIO auxiliary RPC protocol
+added to the Linux NFS client and server (both v3 and v4) to allow a
+client and server to reliably handshake to determine if they are on the
+same host. The LOCALIO auxiliary protocol's implementation, which uses
+the same connection as NFS traffic, follows the pattern established by
+the NFS ACL protocol extension.
+
+The LOCALIO auxiliary protocol is needed to allow robust discovery of
+clients local to their servers. In a private implementation that
+preceded use of this LOCALIO protocol, a fragile sockaddr network
+address based match against all local network interfaces was attempted.
+But unlike the LOCALIO protocol, the sockaddr-based matching didn't
+handle use of iptables or containers.
+
+The robust handshake between local client and server is just the
+beginning; the ultimate use case this locality makes possible is that
+the client can issue reads, writes and commits directly to the server
+without going over the network. This is particularly useful for
+container use cases (e.g. Kubernetes) where it is possible to run an
+IO job local to the server.
+
+The performance advantage realized from localio's ability to bypass
+XDR and RPC for reads, writes and commits can be extreme, e.g.:
+fio for 20 secs with 24 libaio threads, 64k directio reads, qd of 8,
+- With localio:
+ read: IOPS=691k, BW=42.2GiB/s (45.3GB/s)(843GiB/20002msec)
+- Without localio:
+ read: IOPS=15.7k, BW=984MiB/s (1032MB/s)(19.2GiB/20013msec)
+
+RPC
+---
+
+The LOCALIO auxiliary RPC protocol consists of a single "GETUUID" RPC
+method that allows the Linux NFS client to retrieve a Linux NFS server's
+uuid. This protocol isn't part of an IETF standard, nor does it need to
+be, considering it is a Linux-to-Linux auxiliary RPC protocol that
+amounts to an implementation detail.
+
+The GETUUID method encodes the server's uuid_t in terms of the fixed
+UUID_SIZE (16 bytes). The fixed size opaque encode and decode XDR
+methods are used instead of the less efficient variable sized methods.
+
+The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
+by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
+Linux Kernel Organization 400122 nfslocalio
+
+The LOCALIO protocol spec in rpcgen syntax is:
+
+/* raw RFC 9562 UUID */
+#define UUID_SIZE 16
+typedef u8 uuid_t<UUID_SIZE>;
+
+program NFS_LOCALIO_PROGRAM {
+ version LOCALIO_V1 {
+ void
+ NULL(void) = 0;
+
+ uuid_t
+ GETUUID(void) = 1;
+ } = 1;
+} = 400122;
+
+LOCALIO uses the same transport connection as NFS traffic. As such,
+LOCALIO is not registered with rpcbind.
+
+Once an NFS client and server handshake as "local", the client will
+bypass the network RPC protocol for read, write and commit operations.
+Due to this XDR and RPC bypass, these operations will operate faster.
+
+NFS Common and Server
+---------------------
+
+Localio is used by nfsd to add access to a global nfsd_uuids list in
+nfs_common that is used to register and then identify local nfsd
+instances.
+
+nfsd_uuids is protected by the nfsd_mutex or RCU read lock and is
+composed of nfsd_uuid_t instances that are managed as nfsd creates them
+(per network namespace).
+
+nfsd_uuid_is_local() and nfsd_uuid_lookup() are used to search all local
+nfsd instances for the client-specified nfsd uuid.
+
+The nfsd_uuids list is the basis for localio enablement; as such it has
+members that point to nfsd memory for direct use by the client
+(e.g. 'net' is the server's network namespace; through it the client can
+access nn->nfsd_serv with proper rcu read access). It is this client
+and server synchronization that enables advanced usage and lifetime of
+objects to span from the host kernel's nfsd to per-container knfsd
+instances that are connected to nfs clients running on the same local
+host.
+
+NFS Client
+----------
+
+fs/nfs/localio.c:nfs_local_probe() will retrieve a server's uuid via
+the LOCALIO protocol and check if the server with that uuid is known
+to be local. This ensures client and server 1) support localio and
+2) are local to each other.
+
+See fs/nfs/localio.c:nfs_local_open_fh() and
+fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes
+focused use of nfsd_uuid_t struct to allow a client local to a server to
+open a file pointer without needing to go over the network.
+
+The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the
+server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access
+both the nfsd network namespace and the associated nn->nfsd_serv in
+terms of RCU. If nfsd_open_local_fh() finds that the client no longer
+sees valid nfsd objects (be it struct net or nn->nfsd_serv) it returns
+ENXIO to nfs_local_open_fh() and the client will try to reestablish the
+LOCALIO resources needed by calling nfs_local_probe() again. This
+recovery is needed if/when an nfsd instance running in a container
+reboots while a localio client is connected to it.
+
+Testing
+-------
+
+The LOCALIO auxiliary protocol and the associated NFS localio read,
+write and commit access have proven stable across various test
+scenarios, but these have not yet been formalized in any testsuite:
+
+- Client and server both on localhost (for both v3 and v4.2).
+
+- Various permutations of client and server localio support
+  enablement, both local and remote. Testing against NFS storage
+  products that don't support the LOCALIO protocol was also performed.
+
+- Client on host, server within a container (for both v3 and v4.2).
+  The container testing used podman-managed containers and includes
+  a container stop/restart scenario.
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 22443d2089eb..e8e3117abb5f 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -21,6 +21,8 @@ extern struct list_head nfsd_uuids;
* Each nfsd instance has an nfsd_uuid_t that is accessible through the
* global nfsd_uuids list. Useful to allow a client to negotiate if localio
* possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
*/
typedef struct {
uuid_t uuid;
--
2.44.0
^ permalink raw reply related [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (19 preceding siblings ...)
2024-07-02 16:28 ` [PATCH v11 20/20] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
@ 2024-07-02 18:06 ` Chuck Lever III
2024-07-02 18:32 ` Mike Snitzer
2024-07-03 5:04 ` Christoph Hellwig
2024-07-03 15:16 ` Christoph Hellwig
21 siblings, 2 replies; 77+ messages in thread
From: Chuck Lever III @ 2024-07-02 18:06 UTC (permalink / raw)
To: Mike Snitzer
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
> On Jul 2, 2024, at 12:28 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> Hi,
>
> There seems to be consensus that these changes worthwhile and
> extensively iterated on.
I don't see a public consensus about "extensively iterated
on". The folks you talk to privately might believe that,
though.
> I'd very much like these changes to land upstream as-is (unless review
> teases out some show-stopper). These changes have been tested fairly
> extensively (via xfstests) at this point.
>
> Can we now please provide formal review tags and merge these changes
> through the NFS client tree for 6.11?
Contributors don't get to determine the kernel release where
their code lands; maintainers make that decision. You've stated
your preference, and we are trying to accommodate. But frankly,
the (server) changes don't stand up to close inspection yet.
One of the client maintainers has had years to live with this
work. But the server maintainers had their first look at this
just a few weeks ago, and this is not the only thing any of us
have on our plates at the moment. So you need to be patient.
> FYI:
> - I do not intend to rebase this series ontop of NeilBrown's partial
> exploration of simplifying away the need for a "fake" svc_rqst
> (noble goals and happy to help those changes land upstream as an
> incremental improvement):
> https://marc.info/?l=linux-nfs&m=171980269529965&w=2
Sorry, rebasing is going to be a requirement.
Again, as with the dprintk stuff, this is code that would get
reverted or replaced as soon as we merge. We don't knowingly
merge that kind of code; we fix it first.
To make it official, for v11 of this series:
Nacked-by: Chuck Lever <chuck.lever@oracle.com>
I'll be much more ready to consider an Acked-by: once the
"fake svc_rqst" code has been replaced.
> - In addition, tweaks to use nfsd_file_acquire_gc() instead of
> nfsd_file_acquire() aren't a priority.
The discussion has moved well beyond that now... IIUC the
preferred approach might be to hold the file open until the
local app is done with it. However, I'm still not convinced
there's a benefit to using the NFSD file cache vs. a plain
dentry_open().
Neil's clean-up might not need to add a new nfsd_file_acquire()
API if we go with plain dentry_open().
There are still interesting choices to make here before it
is merged, so IMO the choices around nfsd_file_acquire()
remain a priority for merge-readiness.
--
Chuck Lever
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-02 18:06 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Chuck Lever III
@ 2024-07-02 18:32 ` Mike Snitzer
2024-07-02 20:10 ` Chuck Lever III
2024-07-03 0:52 ` NeilBrown
2024-07-03 5:04 ` Christoph Hellwig
1 sibling, 2 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-02 18:32 UTC (permalink / raw)
To: Chuck Lever III
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote:
>
>
> > On Jul 2, 2024, at 12:28 PM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > Hi,
> >
> > There seems to be consensus that these changes worthwhile and
> > extensively iterated on.
>
> I don't see a public consensus about "extensively iterated
> on". The folks you talk to privately might believe that,
> though.
>
>
> > I'd very much like these changes to land upstream as-is (unless review
> > teases out some show-stopper). These changes have been tested fairly
> > extensively (via xfstests) at this point.
> >
> > Can we now please provide formal review tags and merge these changes
> > through the NFS client tree for 6.11?
>
> Contributors don't get to determine the kernel release where
> their code lands; maintainers make that decision. You've stated
> your preference, and we are trying to accommodate. But frankly,
> the (server) changes don't stand up to close inspection yet.
>
> One of the client maintainers has had years to live with this
> work. But the server maintainers had their first look at this
> just a few weeks ago, and this is not the only thing any of us
> have on our plates at the moment. So you need to be patient.
>
>
> > FYI:
> > - I do not intend to rebase this series ontop of NeilBrown's partial
> > exploration of simplifying away the need for a "fake" svc_rqst
> > (noble goals and happy to help those changes land upstream as an
> > incremental improvement):
> > https://marc.info/?l=linux-nfs&m=171980269529965&w=2
>
> Sorry, rebasing is going to be a requirement.
What? You're imposing a rebase on completely unfinished and untested
code? Any idea when Neil will post v2? Or am I supposed to take his
partial first pass and fix it?
> Again, as with the dprintk stuff, this is code that would get
> reverted or replaced as soon as we merge. We don't knowingly
> merge that kind of code; we fix it first.
Nice rule, except there is merit in tested code landing without it
having to see last minute academic changes. These aren't dprintk,
these are disruptive changes that aren't fully formed. If they were
fully formed I wouldn't be resisting them.
> To make it official, for v11 of this series:
>
> Nacked-by: Chuck Lever <chuck.lever@oracle.com>
Thanks for that.
> I'll be much more ready to consider an Acked-by: once the
> "fake svc_rqst" code has been replaced.
If Neil completes his work I'll rebase. But last time I rebased to
his simplification of the localio protocol (to use array and not
lists, nice changes, appreciated but it took serious work on my part
to fold them in): the code immediately BUG_ON()'d in sunrpc trivially.
So please be considerate of my time and the requirement for code to
actually work.
I'm fine with these changes not landing for 6.11 if warranted. I just
seriously question the arbitrary nature of what constitutes necessary
change to allow inclusion.
> > - In addition, tweaks to use nfsd_file_acquire_gc() instead of
> > nfsd_file_acquire() aren't a priority.
>
> The discussion has moved well beyond that now... IIUC the
> preferred approach might be to hold the file open until the
> local app is done with it. However, I'm still not convinced
> there's a benefit to using the NFSD file cache vs. a plain
> dentry_open().
Saving an nfs_file to open_context, etc. All incremental improvement
(that needs time to stick the landing).
Why do you think it appropriate to cause upheaval on code that has
clearly drawn a line in the sand in terms of established fitness?
Eliding allocation of things and micro-optimizing can come later. But
I guess I'll just have to agree to disagree with this approach.
Really feels like I'll be forced to keep both pieces when it breaks in
the near-term.
By all means layer on new improvements. But this refusal to establish
a baseline out of fear that we _might_ change it: I don't even know
where to begin with that.
> Neil's clean-up might not need add a new nfsd_file_acquire()
> API if we go with plain dentry_open().
>
> There are still interesting choices to make here before it
> is merged, so IMO the choices around nfsd_file_acquire()
> remain a priority for merge-readiness.
Maybe Neil will post a fully working v12 rebased on his changes.
Mike
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-02 18:32 ` Mike Snitzer
@ 2024-07-02 20:10 ` Chuck Lever III
2024-07-03 0:57 ` Mike Snitzer
2024-07-03 0:52 ` NeilBrown
1 sibling, 1 reply; 77+ messages in thread
From: Chuck Lever III @ 2024-07-02 20:10 UTC (permalink / raw)
To: Mike Snitzer
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
> On Jul 2, 2024, at 2:32 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote:
>>
>>
>>> On Jul 2, 2024, at 12:28 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>>>
>>> Hi,
>>>
>>> There seems to be consensus that these changes worthwhile and
>>> extensively iterated on.
>>
>> I don't see a public consensus about "extensively iterated
>> on". The folks you talk to privately might believe that,
>> though.
>>
>>
>>> I'd very much like these changes to land upstream as-is (unless review
>>> teases out some show-stopper). These changes have been tested fairly
>>> extensively (via xfstests) at this point.
>>>
>>> Can we now please provide formal review tags and merge these changes
>>> through the NFS client tree for 6.11?
>>
>> Contributors don't get to determine the kernel release where
>> their code lands; maintainers make that decision. You've stated
>> your preference, and we are trying to accommodate. But frankly,
>> the (server) changes don't stand up to close inspection yet.
>>
>> One of the client maintainers has had years to live with this
>> work. But the server maintainers had their first look at this
>> just a few weeks ago, and this is not the only thing any of us
>> have on our plates at the moment. So you need to be patient.
>>
>>
>>> FYI:
>>> - I do not intend to rebase this series ontop of NeilBrown's partial
>>> exploration of simplifying away the need for a "fake" svc_rqst
>>> (noble goals and happy to help those changes land upstream as an
>>> incremental improvement):
>>> https://marc.info/?l=linux-nfs&m=171980269529965&w=2
>>
>> Sorry, rebasing is going to be a requirement.
>
> What? You're imposing a rebase on completely unfinished and untested
> code? Any idea when Neil will post v2? Or am I supposed to take his
> partial first pass and fix it?
Don't be ridiculous. Wait for Neil to post a working version.
>> Again, as with the dprintk stuff, this is code that would get
>> reverted or replaced as soon as we merge. We don't knowingly
>> merge that kind of code; we fix it first.
>
> Nice rule, except there is merit in tested code landing without it
> having to see last minute academic changes. These aren't dprintk,
> these are disruptive changes that aren't fully formed. If they were
> fully formed I wouldn't be resisting them.
It's your server patch that isn't fully formed. Allocating
a fake svc_rqst outside of an svc thread context and adding
a work-around to avoid the cache lookup deferral is nothing
but a hacky smelly prototype. It's not merge-ready or -worthy.
> If Neil completes his work I'll rebase. But last time I rebased to
> his simplification of the localio protocol (to use array and not
> lists, nice changes, appreciated but it took serious work on my part
> to fold them in): the code immediately BUG_ON()'d in sunrpc trivially.
You should be very grateful that Neil is writing your code
for you. He's already contributed much more than you have
any reason to expect from someone who is not employed by
Hammerspace.
And quite frankly, it is not reasonable to expect anyone's
freshly written code to be completely free of bugs. I'm
sorry it took you a little while to find the problem, but
it will become easier when you become more familiar with
the code base.
> So please be considerate of my time and the requirement for code to
> actually work.
I'll be considerate when you are considerate of our time and
stop patch bombing the list with tiny incremental changes,
demanding we "get the review done and merge it" before it
is ready.
Honestly, the work is proceeding quite unremarkably for a
new feature. The problem seems to be that you don't
understand why we're asking for (actually quite small)
changes before merging, and we're asking you to do that
work. Why are we asking you to do it?
It's because you are asking for /our/ time. But we don't
work for Hammerspace and do not have any particular interest
in localIO and have no real way to test the facility yet
(no, running fstests does not count as a full test).
It's your responsibility to get this code put together,
it's got to be your time and effort. You are getting paid
to deal with this. None of the rest of us are. No-one else
is asking for this feature.
>>> - In addition, tweaks to use nfsd_file_acquire_gc() instead of
>>> nfsd_file_acquire() aren't a priority.
>>
>> The discussion has moved well beyond that now... IIUC the
>> preferred approach might be to hold the file open until the
>> local app is done with it. However, I'm still not convinced
>> there's a benefit to using the NFSD file cache vs. a plain
>> dentry_open().
>
> Saving an nfs_file to open_context, etc. All incremental improvement
> (that needs time to stick the landing).
You are still missing the point. The phony svc_rqst is being
passed into nfsd_file_acquire(). Either we have to fix
nfsd_file_acquire() (as Neil did) or replace its use with
fh_verify() / dentry_open().
This is not about garbage collection, and hasn't been for
a while. It's about replacing unmergable prototype code.
And sticking the landing? If a few good fstests results
are supposed to be good enough for us to merge your code
as it exists now, why aren't they good enough to verify
your code is OK to merge after a rebase?
--
Chuck Lever
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-02 18:32 ` Mike Snitzer
2024-07-02 20:10 ` Chuck Lever III
@ 2024-07-03 0:52 ` NeilBrown
2024-07-03 1:13 ` Mike Snitzer
1 sibling, 1 reply; 77+ messages in thread
From: NeilBrown @ 2024-07-03 0:52 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever III, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, snitzer@hammerspace.com
On Wed, 03 Jul 2024, Mike Snitzer wrote:
>
> Maybe Neil will post a fully working v12 rebased on his changes.
Maybe I will, but it won't be before Friday.
I too wonder about the unusual expectation of haste, and what its real
source is.
NeilBrown
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-02 20:10 ` Chuck Lever III
@ 2024-07-03 0:57 ` Mike Snitzer
0 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 0:57 UTC (permalink / raw)
To: Chuck Lever III
Cc: Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown
I am an upstream Linux kernel maintainer too. My ideals and approach
are different but they are my own ;)
The first localio RFC (that made it to list as v2) was posted on June
11. I have tried to work well with you and everyone willing to help
and engage. So for it to come to this exchange is unfortunate.
Development with excess rebases is just soul-sucking. My v11's 0th
header certainly conveyed exhaustion in that aspect of how things have
gone as this series has evolved.
I clearly upset you by suggesting v11 suitable to merge for 6.11. I
really wasn't trying to be pushy. I didn't think it controversial,
but I concede not giving you much to work with if/when you disagreed.
Sorry about painting you into a corner.
v11 is a solid basis to develop upon further. I am all for iterating
further, am aware it is my burden to carry, and am hopeful we can get
localio staged in linux-next early during the 6.12 development window.
Let it soak (Anna's instinct was solid).
However, I'm hopeful to avoid the hell of frequently rebasing on top
of refactored code that optimizes the approaches this v11 baseline
provides.
So I'd like to propose that I carry the v11 baseline in my git tree:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next
And any changes (e.g. Neil's promising refactor to avoid needing
"fake" svc_rqst) can be based on 'nfs-localio-for-next' with
standalone incremental commits that can possibly get folded via a
final rebase once we're happy with the end result of the changes?
Thanks,
Mike
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 0:52 ` NeilBrown
@ 2024-07-03 1:13 ` Mike Snitzer
0 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 1:13 UTC (permalink / raw)
To: NeilBrown
Cc: Chuck Lever III, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust
On Wed, Jul 03, 2024 at 10:52:20AM +1000, NeilBrown wrote:
> On Wed, 03 Jul 2024, Mike Snitzer wrote:
> >
> > Maybe Neil will post a fully working v12 rebased on his changes.
>
> Maybe I will, but it won't be before Friday.
No problem! I can also just run with the first patchset you provided.
But I'm hopeful you're OK with us doing incremental changes to this
"v11" baseline?:
https://git.kernel.org/pub/scm/linux/kernel/git/snitzer/linux.git/log/?h=nfs-localio-for-next
Please see the other reply I just sent for more context on why I hope
this works for you. Happy to do a final rebase once the code is
settled.
> I too wonder about the unusual expectation of haste, and what its real
> source is.
A desire to settle the approach is all, to let development settle and
ultimately move on to developing something else in NFS.
Mike
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-02 18:06 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Chuck Lever III
2024-07-02 18:32 ` Mike Snitzer
@ 2024-07-03 5:04 ` Christoph Hellwig
2024-07-03 8:52 ` Mike Snitzer
1 sibling, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-03 5:04 UTC (permalink / raw)
To: Chuck Lever III
Cc: Mike Snitzer, Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, snitzer@hammerspace.com
On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote:
> To make it official, for v11 of this series:
>
> Nacked-by: Chuck Lever <chuck.lever@oracle.com>
We've also not even looked into tackling the whole memory reclaim
recursion problem that has historically made local loopback network
file systems an unsupported configuration. We have an ongoing
discussion on the XFS list that really needs to go to fsdevel and mm
to make any progress first. I see absolutely no chance of solving
that in this merge window. I'm also a bit surprised and shocked by
the rush here.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 5:04 ` Christoph Hellwig
@ 2024-07-03 8:52 ` Mike Snitzer
2024-07-03 14:16 ` Christoph Hellwig
2024-07-03 15:26 ` Chuck Lever III
0 siblings, 2 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 8:52 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chuck Lever III, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Tue, Jul 02, 2024 at 10:04:56PM -0700, Christoph Hellwig wrote:
> On Tue, Jul 02, 2024 at 06:06:09PM +0000, Chuck Lever III wrote:
> > To make it official, for v11 of this series:
> >
> > Nacked-by: Chuck Lever <chuck.lever@oracle.com>
>
> We've also not even looked into tackling the whole memory reclaim
> recursion problem that have historically made local loopback
> network file system an unsupported configuration. We've have an
> ongoing discussion on the XFS list that really needs to to fsdevel
> and mm to make any progress first. I see absolutely not chance to
> solved that in this merge window. I'm also a bit surprised and
> shocked by the rush here.
linux-nfs and linux-xfs both received those emails; there isn't some
secret game here:
https://marc.info/?l=linux-nfs&m=171976530216518&w=2
https://marc.info/?l=linux-xfs&m=171976530416523&w=2
I pivoted away from that (after Dave's helpful response) and back to
what I provided in almost every revision of this patchset (with
different header and code revisions), most recently patch 18 of
this v11 series: https://marc.info/?l=linux-nfs&m=171993773109538&w=2
And if spending the past 2 months discussing and developing in the
open is rushing things, I clearly need to slow down...
If only I had reason to think others were considering merging these
changes: https://marc.info/?l=linux-nfs&m=171942776105165&w=2
Ultimately I simply wanted to keep momentum up, I'm sure you can
relate to having a vision for phasing changes in without missing a
cycle. But happy to just continue working it into the 6.12
development window.
I'll be sure to cc linux-fsdevel on future revision(s).
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 8:52 ` Mike Snitzer
@ 2024-07-03 14:16 ` Christoph Hellwig
2024-07-03 15:11 ` Mike Snitzer
2024-07-03 15:26 ` Chuck Lever III
1 sibling, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-03 14:16 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Chuck Lever III, Linux NFS Mailing List,
Jeff Layton, Anna Schumaker, Trond Myklebust, Neil Brown,
Dave Chinner
On Wed, Jul 03, 2024 at 04:52:34AM -0400, Mike Snitzer wrote:
> Ultimately I simply wanted to keep momentum up, I'm sure you can
> relate to having a vision for phasing changes in without missing a
> cycle. But happy to just continue working it into the 6.12
> development window.
It just feels really rushed to have something with cross-subsystem
communication going in past -rc6 in a US holiday week. Sometimes
not rushing things too much will lead to much better results.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 14:16 ` Christoph Hellwig
@ 2024-07-03 15:11 ` Mike Snitzer
2024-07-03 15:18 ` Christoph Hellwig
0 siblings, 1 reply; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 15:11 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chuck Lever III, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Wed, Jul 03, 2024 at 07:16:31AM -0700, Christoph Hellwig wrote:
> On Wed, Jul 03, 2024 at 04:52:34AM -0400, Mike Snitzer wrote:
> > Ultimately I simply wanted to keep momentum up, I'm sure you can
> > relate to having a vision for phasing changes in without missing a
> > cycle. But happy to just continue working it into the 6.12
> > development window.
>
> It just feels really rushed to have something with cross-subsystem
> communication going in past -rc6 in a US holiday week. Sometimes
> not rushing things too much will lead to much better results.
Yes, I knew it to be very tight given the holiday. I should've just
yielded to the reality of the calendar and there being some extra
changes needed (remove "fake" svc_rqst in fs/nfsd/localio.c -- I was
hopeful that could be done incrementally after merge but I digress).
I'll welcome any help you might offer to optimize localio as much as
possible (doesn't need to be in the near-term, whenever you might
have time to look). Its current approach of using synchronous
buffered read_iter and write_iter, with active waiting, should be
improved.
But Dave's idea to go full RMW to be page aligned will complicate a
forecasted NFS roadmap item to allow for: "do-not-cache capabilities,
so that the NFS server can turn off the buffer caching of files on
clients (force O_DIRECT-like writing/reading)". But even that seems a
catch-22 given the NFS client doesn't enforce DIO alignment.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
` (20 preceding siblings ...)
2024-07-02 18:06 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Chuck Lever III
@ 2024-07-03 15:16 ` Christoph Hellwig
2024-07-03 15:28 ` Mike Snitzer
21 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-03 15:16 UTC (permalink / raw)
To: Mike Snitzer
Cc: linux-nfs, Jeff Layton, Chuck Lever, Anna Schumaker,
Trond Myklebust, NeilBrown, snitzer
I've started looking a bit at the code, and the architectural model
confuses me more than a bit.
A first thing that would be very helpful is an actual problem statement.
The only mention of a concrete use case is about containers, implying
that this is about a client in one container/namespace with the
server or servers in another container/namespace. Is that the main
use case, or are there others?
I kinda deduce from that that the client and server probably do not
have the same view and access permissions to the underlying file
systems? As this would defeat the use of NFS, I suspect that is the
case, but it should probably be stated clearly somewhere.
Going from there I don't understand why we need multiple layers of
server bypass. The normal way to do this in NFSv4 is to use pNFS
layout.
I.e. you add a pnfs localio layout that just does local reads
and writes for the I/O path. We'd still need a way to find a good
in-kernel way to get the file structure, but compared to the two
separate layers of bypasses in the current code it should be
significantly simpler.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:11 ` Mike Snitzer
@ 2024-07-03 15:18 ` Christoph Hellwig
2024-07-03 15:24 ` Chuck Lever III
0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-03 15:18 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Chuck Lever III, Linux NFS Mailing List,
Jeff Layton, Anna Schumaker, Trond Myklebust, Neil Brown,
Dave Chinner
On Wed, Jul 03, 2024 at 11:11:51AM -0400, Mike Snitzer wrote:
> Will welcome any help you might offer to optimize localio as much as
> possible (doesn't need to be in near-term, whenever you might have
> time to look). Its current approach to use synchronous buffered
> read_iter and write_iter, with active waiting, should be improved.
>
> But Dave's idea to go full RMW to be page aligned will complicate a
> forecasted NFS roadmap item to allow for: "do-not-cache capabilities,
> so that the NFS server can turn off the buffer caching of files on
> clients (force O_DIRECT-like writing/reading)". But even that seems a
> catch-22 given the NFS client doesn't enforce DIO alignment.
As I just wrote in another mail I've now looked at the architecture,
and either I'm missing some unstated requirements, or the whole architecture
seems very overcomplicated and suboptimal. If localio actually just was
a pNFS layout type you could trivially do asynchronous direct I/O from
the layout driver, and bypass a lot of the complexity. The actual way
to find the file struct still would be nasty, but I'll try to think of
something good for that.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:18 ` Christoph Hellwig
@ 2024-07-03 15:24 ` Chuck Lever III
2024-07-03 15:29 ` Christoph Hellwig
2024-07-03 15:36 ` Mike Snitzer
0 siblings, 2 replies; 77+ messages in thread
From: Chuck Lever III @ 2024-07-03 15:24 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Linux NFS Mailing List, Jeff Layton, Anna Schumaker,
Trond Myklebust, Neil Brown, Dave Chinner
> On Jul 3, 2024, at 11:18 AM, Christoph Hellwig <hch@infradead.org> wrote:
> The actual way
> to find the file struct still would be nasty, but I'll try to think of
> something good for that.
It is that very code that I've asked to be replaced before this
series can be merged. We have a set of patches for improving
that aspect that Neil is working on now.
When Mike presented LOCALIO to me at LSF, my initial suggestion
was to use pNFS. I think Jeff had the same reaction. IMO the
design document should, as part of the problem statement,
explain why a pNFS-only solution is not workable.
I'm also concerned about applications in one container being
able to reach around existing mount namespace silos into the
NFS server container's file systems. Obviously the NFS protocol
has its own authorization that would grant permission for that
access, but via the network.
--
Chuck Lever
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 8:52 ` Mike Snitzer
2024-07-03 14:16 ` Christoph Hellwig
@ 2024-07-03 15:26 ` Chuck Lever III
2024-07-03 15:37 ` Mike Snitzer
1 sibling, 1 reply; 77+ messages in thread
From: Chuck Lever III @ 2024-07-03 15:26 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
> On Jul 3, 2024, at 4:52 AM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> And if spending the past 2 months discussing and developing in the
> open is rushing things, I clearly need to slow down...
>
> If only I had reason to think others were considering merging these
> changes: https://marc.info/?l=linux-nfs&m=171942776105165&w=2
There is no mention of a particular kernel release in that
email, nor is there a promise that we could hit the v6.11
merge window.
In particular, I was asking how the series should be split
up, since it modifies two separately maintained subsystems.
I apologize for not asking this more clearly.
--
Chuck Lever
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:16 ` Christoph Hellwig
@ 2024-07-03 15:28 ` Mike Snitzer
2024-07-04 5:49 ` Christoph Hellwig
0 siblings, 1 reply; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 15:28 UTC (permalink / raw)
To: Christoph Hellwig
Cc: linux-nfs, Jeff Layton, Chuck Lever, Anna Schumaker,
Trond Myklebust, NeilBrown, snitzer
On Wed, Jul 03, 2024 at 08:16:00AM -0700, Christoph Hellwig wrote:
> I've stated looking a bit at the code, and the architectural model
> confuses me more than a bit.
>
> A first thing that would be very helpful is an actual problem statement.
> The only mention of a concrete use case is about containers, implying
> that this about a client in one container/namespace with the server
> or the servers in another containers/namespace. Is that the main use
> case, are there others?
Containers is a significant usecase, but also any client that might
need to access local storage efficiently (e.g. GPU service running NFS
client that needs to access NVMe on same host).
> I kinda deduce from that that the client and server probably do not
> have the same view and access permissions to the underlying file
> systems? As this would defeat the use of NFS I suspect that is the
> case, but it should probably be stated clearly somewhere.
I can tighten that up in the Documentation.
> Going from there I don't understand why we need multiple layers of
> server bypass. The normal way to do this in NFSv4 is to use pNFS
> layout.
>
> I.e. you add a pnfs localio layout that just does local reads
> and writes for the I/O path. We'd still need a way to find a good
> in-kernel way to get the file structure, but compared to the two
> separate layers of bypasses in the current code it should be
> significantly simpler.
Using a pNFS layout isn't viable because NFSv3 is very much in the mix
(flexfiles layout) for Hammerspace.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:24 ` Chuck Lever III
@ 2024-07-03 15:29 ` Christoph Hellwig
2024-07-03 15:36 ` Mike Snitzer
1 sibling, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-03 15:29 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Mike Snitzer, Linux NFS Mailing List,
Jeff Layton, Anna Schumaker, Trond Myklebust, Neil Brown,
Dave Chinner
On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote:
> I'm also concerned about applications in one container being
> able to reach around existing mount namespace silos into the
> NFS server container's file systems. Obviously the NFS protocol
> has its own authorization that would grant permission for that
> access, but via the network.
Yes. One good way I can think of is to use SCM_RIGHTS to duplicate a file
descriptor over a unix socket. For that we'd need a way to actually
create that unix socket first, and I also don't think we currently have
support for using that in-kernel, but it's a well-known way to hand file
descriptors to other processes. A big plus would be that this would
even work with non-kernel servers (or even clients for that matter)
as long as they run on the same kernel (including non-Linux kernels).
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:24 ` Chuck Lever III
2024-07-03 15:29 ` Christoph Hellwig
@ 2024-07-03 15:36 ` Mike Snitzer
2024-07-03 17:06 ` Jeff Layton
` (2 more replies)
1 sibling, 3 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 15:36 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote:
>
>
> > On Jul 3, 2024, at 11:18 AM, Christoph Hellwig <hch@infradead.org> wrote:
> > The actual way
> > to find the file struct still would be nasty, but I'll try to think of
> > something good for that.
>
> It is that very code that I've asked to be replaced before this
> series can be merged. We have a set of patches for improving
> that aspect that Neil is working on now.
>
> When Mike presented LOCALIO to me at LSF, my initial suggestion
> was to use pNFS. I think Jeff had the same reaction.
No, Jeff suggested using an O_TMPFILE-based scheme for the localio
handshake. But he had the benefit of knowing NFSv3 is important for the
intended localio use case, so I'm not aware of him having pNFS design
ideas.
> IMO the design document should, as part of the problem statement,
> explain why a pNFS-only solution is not workable.
Sure, I can add that.
I explained the NFSv3 requirement when we discussed at LSF.
> I'm also concerned about applications in one container being
> able to reach around existing mount namespace silos into the
> NFS server container's file systems. Obviously the NFS protocol
> has its own authorization that would grant permission for that
> access, but via the network.
Jeff also had concerns there (as did I), but we arrived at the view that
since NFS can already do this over the network, doing it with localio is
ultimately OK. That said, localio isn't taking special action to escape
mount namespaces (that I'm aware of), and in practice there are no
requirements to do so.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:26 ` Chuck Lever III
@ 2024-07-03 15:37 ` Mike Snitzer
0 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 15:37 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Wed, Jul 03, 2024 at 03:26:30PM +0000, Chuck Lever III wrote:
>
>
> > On Jul 3, 2024, at 4:52 AM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > And if spending the past 2 months discussing and developing in the
> > open is rushing things, I clearly need to slow down...
> >
> > If only I had reason to think others were considering merging these
> > changes: https://marc.info/?l=linux-nfs&m=171942776105165&w=2
>
> There is no mention of a particular kernel release in that
> email, nor is there a promise that we could hit the v6.11
> merge window.
>
> In particular, I was asking how the series should be split
> up, since it modifies two separately maintained subsystems.
>
> I apologize for not asking this more clearly.
No problem, I just got ahead of myself. Not your fault.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:36 ` Mike Snitzer
@ 2024-07-03 17:06 ` Jeff Layton
2024-07-04 6:00 ` Christoph Hellwig
2024-07-03 17:19 ` Chuck Lever III
2024-07-04 6:01 ` Christoph Hellwig
2 siblings, 1 reply; 77+ messages in thread
From: Jeff Layton @ 2024-07-03 17:06 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever III
Cc: Christoph Hellwig, Linux NFS Mailing List, Anna Schumaker,
Trond Myklebust, Neil Brown, Dave Chinner
On Wed, 2024-07-03 at 11:36 -0400, Mike Snitzer wrote:
> On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote:
> >
> >
> > > On Jul 3, 2024, at 11:18 AM, Christoph Hellwig
> > > <hch@infradead.org> wrote:
> > > The actual way
> > > to find the file struct still would be nasty, but I'll try to
> > > think of
> > > something good for that.
> >
> > It is that very code that I've asked to be replaced before this
> > series can be merged. We have a set of patches for improving
> > that aspect that Neil is working on now.
> >
> > When Mike presented LOCALIO to me at LSF, my initial suggestion
> > was to use pNFS. I think Jeff had the same reaction.
>
> No, Jeff suggested using an O_TMPFILE-based scheme for the localio
> handshake. But he had the benefit of knowing NFSv3 is important for the
> intended localio use case, so I'm not aware of him having pNFS design
> ideas.
>
The other problem with doing this is that if a server is running in a
container, how is it to know that the client is in a different container
on the same host, and hence that it can give out a localio layout? We'd
still need some way to detect that anyway, which would probably look a
lot like the localio protocol.
> > IMO the design document should, as part of the problem statement,
> > explain why a pNFS-only solution is not workable.
>
> Sure, I can add that.
>
> I explained the NFSv3 requirement when we discussed at LSF.
>
> > I'm also concerned about applications in one container being
> > able to reach around existing mount namespace silos into the
> > NFS server container's file systems. Obviously the NFS protocol
> > has its own authorization that would grant permission for that
> > access, but via the network.
>
> Jeff also had concerns there (as did I), but we arrived at the view
> that since NFS can already do this over the network, doing it with
> localio is ultimately OK. That said, localio isn't taking special
> action to escape mount namespaces (that I'm aware of), and in practice
> there are no requirements to do so.
The one thing I think we need to ensure is that an unauthorized NFS
client on the same kernel can't use this to bypass export permission
checks.
IOW, suppose we have a client and server on the same host. The server
allows the client to access some of its exports, but not all. The rest
are restricted to only certain IP addresses.
Can the client use its localio access to bypass that since it's not
going across the network anymore? Maybe by using open_by_handle_at on
the NFS share on a guessed filehandle? I think we need to ensure that
that isn't possible.
I wonder if it's also worthwhile to gate localio access on an export
option, just out of an abundance of caution.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:36 ` Mike Snitzer
2024-07-03 17:06 ` Jeff Layton
@ 2024-07-03 17:19 ` Chuck Lever III
2024-07-03 19:04 ` Mike Snitzer
2024-07-03 21:35 ` NeilBrown
2024-07-04 6:01 ` Christoph Hellwig
2 siblings, 2 replies; 77+ messages in thread
From: Chuck Lever III @ 2024-07-03 17:19 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
> On Jul 3, 2024, at 11:36 AM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote:
>
>> IMO the design document should, as part of the problem statement,
>> explain why a pNFS-only solution is not workable.
>
> Sure, I can add that.
>
> I explained the NFSv3 requirement when we discussed at LSF.
You explained it to me in a private conversation, although
there was a lot of "I don't know yet" in that discussion.
It needs to be (re)explained in a public forum because
reviewers keep bringing this question up.
I hope to see more than just "NFSv3 is in the mix". There
needs to be some explanation of why it is necessary to
support NFSv3 without the use of pNFS flexfile.
--
Chuck Lever
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 17:19 ` Chuck Lever III
@ 2024-07-03 19:04 ` Mike Snitzer
2024-07-04 5:55 ` Christoph Hellwig
2024-07-03 21:35 ` NeilBrown
1 sibling, 1 reply; 77+ messages in thread
From: Mike Snitzer @ 2024-07-03 19:04 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Linux NFS Mailing List, Jeff Layton,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Wed, Jul 03, 2024 at 05:19:06PM +0000, Chuck Lever III wrote:
>
>
> > On Jul 3, 2024, at 11:36 AM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote:
> >
> >> IMO the design document should, as part of the problem statement,
> >> explain why a pNFS-only solution is not workable.
> >
> > Sure, I can add that.
> >
> > I explained the NFSv3 requirement when we discussed at LSF.
>
> You explained it to me in a private conversation, although
> there was a lot of "I don't know yet" in that discussion.
Those "I don't know yet" were in response to you asking why a pNFS
layout (like the block layout) is not a viable way to achieve localio.
The answer to that is: someone could try that, but there is no
interest from me or my employer in resorting to the block layout with
centralized mapping of which client and DS are local so that the pNFS
MDS could hand out such pNFS block layouts.
That added MDS complexity can be avoided if the client and server have
autonomy to negotiate more performant access without a centralized
arbiter (hence the "localio" handshake).
> It needs to be (re)explained in a public forum because
> reviewers keep bringing this question up.
Sure.
> I hope to see more than just "NFSv3 is in the mix". There
> needs to be some explanation of why it is necessary to
> support NFSv3 without the use of pNFS flexfile.
That's a loaded question; I'm not sure why you're leading with the idea
that it is invalid to decouple localio (leveraging client and server
locality) from pNFS.
NFS can realize benefits from localio being completely decoupled from
flexfiles and pNFS. There are clear benefits with container use-cases
that don't use pNFS at all.
It just so happens that flexfiles ushers in the use of NFSv3. Once the
client gets a flexfiles layout that points to an NFSv3 DS, the client
IO is issued in terms of NFSv3. If the client happens to be on the
same host as the server, then using localio is a win.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 17:19 ` Chuck Lever III
2024-07-03 19:04 ` Mike Snitzer
@ 2024-07-03 21:35 ` NeilBrown
1 sibling, 0 replies; 77+ messages in thread
From: NeilBrown @ 2024-07-03 21:35 UTC (permalink / raw)
To: Chuck Lever III
Cc: Mike Snitzer, Christoph Hellwig, Linux NFS Mailing List,
Jeff Layton, Anna Schumaker, Trond Myklebust, Dave Chinner
On Thu, 04 Jul 2024, Chuck Lever III wrote:
>
>
> > On Jul 3, 2024, at 11:36 AM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > On Wed, Jul 03, 2024 at 03:24:18PM +0000, Chuck Lever III wrote:
> >
> >> IMO the design document should, as part of the problem statement,
> >> explain why a pNFS-only solution is not workable.
> >
> > Sure, I can add that.
> >
> > I explained the NFSv3 requirement when we discussed at LSF.
>
> You explained it to me in a private conversation, although
> there was a lot of "I don't know yet" in that discussion.
>
> It needs to be (re)explained in a public forum because
> reviewers keep bringing this question up.
>
> I hope to see more than just "NFSv3 is in the mix". There
> needs to be some explanation of why it is necessary to
> support NFSv3 without the use of pNFS flexfile.
>
My perspective is "of course NFSv3".
The core idea is to accelerate loop-back NFS and unless we have decided
to deprecate NFSv3 (as I think we have decided to deprecate NFSv2), then
NFSv3 support should be on the table.
If v3 support turns out to be particularly burdensome, then it's not a
"must have" for me, but it isn't at all clear to me that a pNFS approach
would have fewer problems - only different problems.
Just my 2c worth.
NeilBrown
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:28 ` Mike Snitzer
@ 2024-07-04 5:49 ` Christoph Hellwig
0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-04 5:49 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, linux-nfs, Jeff Layton, Chuck Lever,
Anna Schumaker, Trond Myklebust, NeilBrown, snitzer
On Wed, Jul 03, 2024 at 11:28:55AM -0400, Mike Snitzer wrote:
> Containers is a significant usecase, but also any client that might
> need to access local storage efficiently (e.g. GPU service running NFS
> client that needs to access NVMe on same host).
Please explain that concretely in terms of who talks to whom, using
the actual Linux and/or NFS entities. The last sentence just sounds
like an AI-generated marketing whitepaper.
> I can tighten that up in the Documentation.
Please write up a coherent document for the use case and circle it
around. It's kinda pointless to do code review if we don't have a
problem statement and use case.
> Using pNFS layout isn't viable because NFSv3 is very much in the mix
> (flexfiles layout) for Hammerspace.
Again, why and how? We have a codebase that works entirely inside the
Linux kernel and requires new code to be merged. If we can't ask
people to use the current protocol (where "current" means a 14-year-old
RFC!), we have a problem.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 19:04 ` Mike Snitzer
@ 2024-07-04 5:55 ` Christoph Hellwig
0 siblings, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-04 5:55 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever III, Christoph Hellwig, Linux NFS Mailing List,
Jeff Layton, Anna Schumaker, Trond Myklebust, Neil Brown,
Dave Chinner
On Wed, Jul 03, 2024 at 03:04:05PM -0400, Mike Snitzer wrote:
> The answer to that is: someone(s) could try that, but there is no
> interest from me or my employer to resort to using block layout with
> centralized mapping of which client and DS are local so that the pNFS
> MDS could handout such pNFS block layouts.
Where did block layout suddenly come from?
> That added MDS complexity can be avoided if the client and server have
> autonomy to negotiate more performant access without a centralized
> arbiter (hence the "localio" handshake).
Doing a localio layout would actually be a lot simpler than the current
mess, so that argument goes the other way around.
> NFS can realize benefits from localio being completely decoupled from
> flexfiles and pNFS.
How about actually listing the benefits?
> There are clear benefits with container use-cases
> that don't use pNFS at all.
Well, the point would be to make them use pNFS, because pNFS is the
well known and proven way to bypass the main server in NFS.
> Just so happens that flexfiles ushers in the use of NFSv3. Once the
> client gets a flexfiles layout that points to an NFSv3 DS: the client
> IO is issued in terms of NFSv3. If the client happens to be on the
> same host as the server then using localio is a win.
I have no idea where flexfiles comes in here and why it matters. The
Linux server does not even support flexfiles layouts.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 17:06 ` Jeff Layton
@ 2024-07-04 6:00 ` Christoph Hellwig
2024-07-04 18:31 ` Mike Snitzer
0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-04 6:00 UTC (permalink / raw)
To: Jeff Layton
Cc: Mike Snitzer, Chuck Lever III, Christoph Hellwig,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Neil Brown, Dave Chinner
On Wed, Jul 03, 2024 at 01:06:51PM -0400, Jeff Layton wrote:
> The other problem with doing this is that if a server is running in a
> container, how is it to know that the client is in a different container
> on the same host, and hence that it can give out a localio layout? We'd
> still need some way to detect that anyway, which would probably look a
> lot like the localio protocol.
We'll need some way to detect that client and server are capable
of the bypass. And by the looks of it, that's actually the hard and
complicated part, and we'll need that for any scheme.
And then we need a way to bypass the server for I/O, which currently is
rather complex in the patchset and would be almost trivial with a new
pNFS layout.
> Can the client use its localio access to bypass that since it's not
> going across the network anymore? Maybe by using open_by_handle_at on
> the NFS share on a guessed filehandle? I think we need to ensure that
> that isn't possible.
If a file system is shared by containers and users in containers have
the capability to use open_by_handle_at the security model is already
broken without NFS or localio involved.
> I wonder if it's also worthwhile to gate localio access on an export
> option, just out of an abundance of caution.
export and mount option. We're talking about a non-standard side-band
protocol here; there is no way that should be done without explicit
opt-in from both sides.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-03 15:36 ` Mike Snitzer
2024-07-03 17:06 ` Jeff Layton
2024-07-03 17:19 ` Chuck Lever III
@ 2024-07-04 6:01 ` Christoph Hellwig
2024-07-04 10:13 ` Jeff Layton
2 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-04 6:01 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever III, Christoph Hellwig, Linux NFS Mailing List,
Jeff Layton, Anna Schumaker, Trond Myklebust, Neil Brown,
Dave Chinner
On Wed, Jul 03, 2024 at 11:36:00AM -0400, Mike Snitzer wrote:
> > When Mike presented LOCALIO to me at LSF, my initial suggestion
> > was to use pNFS. I think Jeff had the same reaction.
>
> No, Jeff suggested using an O_TMPFILE-based scheme for the localio
> handshake. But he had the benefit of knowing NFSv3 is important for the
> intended localio use case, so I'm not aware of him having pNFS design
> ideas.
How does O_TMPFILE fit in here? NFS doesn't even support O_TMPFILE.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-04 6:01 ` Christoph Hellwig
@ 2024-07-04 10:13 ` Jeff Layton
0 siblings, 0 replies; 77+ messages in thread
From: Jeff Layton @ 2024-07-04 10:13 UTC (permalink / raw)
To: Christoph Hellwig, Mike Snitzer
Cc: Chuck Lever III, Linux NFS Mailing List, Anna Schumaker,
Trond Myklebust, Neil Brown, Dave Chinner
On Wed, 2024-07-03 at 23:01 -0700, Christoph Hellwig wrote:
> On Wed, Jul 03, 2024 at 11:36:00AM -0400, Mike Snitzer wrote:
> > > When Mike presented LOCALIO to me at LSF, my initial suggestion
> > > was to use pNFS. I think Jeff had the same reaction.
> >
> > No, Jeff suggested using an O_TMPFILE-based scheme for the localio
> > handshake. But he had the benefit of knowing NFSv3 is important for the
> > intended localio use case, so I'm not aware of him having pNFS design
> > ideas.
>
> How does O_TMPFILE fit in here? NFS doesn't even support O_TMPFILE.
>
At LSF we were tossing around ideas about how to detect whether the
client and server were on the same host. My thinking was to have a
common fs (maybe even a tmpfs) that was exported by all of the servers
on the host and accessible by all of the containers on the host.
The client would then do an O_TMPFILE open in that tmpfs, write some
data to it (uuids or something) and determine the filehandle. Then it
could issue a v3 READ against the NFS server for that filehandle and if
it worked and the contents were as expected you could be sure you're on
the same host. The client could then just close the file and it would
be cleaned up.
The problem of course is that this requires having a fs that is
commonly accessible between all of the containers, which is a bit more
setup than is ideal.
The localio protocol (particularly with Neil's suggested improvements)
is really a better scheme I think.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-04 6:00 ` Christoph Hellwig
@ 2024-07-04 18:31 ` Mike Snitzer
2024-07-05 5:18 ` Christoph Hellwig
0 siblings, 1 reply; 77+ messages in thread
From: Mike Snitzer @ 2024-07-04 18:31 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Jeff Layton, Chuck Lever III, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Wed, Jul 03, 2024 at 11:00:27PM -0700, Christoph Hellwig wrote:
> On Wed, Jul 03, 2024 at 01:06:51PM -0400, Jeff Layton wrote:
> > The other problem with doing this is that if a server is running in a
> > container, how is it to know that the client is in different container
> > on the same host, and hence that it can give out a localio layout? We'd
> > still need some way to detect that anyway, which would probably look a
> > lot like the localio protocol.
(Note: Jeff's message above was him stating flaws in the O_TMPFILE
idea we discussed at LSF; his idea wasn't pursued for many reasons,
and Jeff has stated he believes localio is better.)
> We'll need some way to detect that client and server are capable
> of the bypass. And from all it looks that's actually the hard and
> complicated part, and we'll need that for any scheme.
Yes, hence the localio protocol, which has wide buy-in; to the point
that I ran with registering an NFS program number with iana.org for the
effort. My work on the localio protocol was born out of the requirement
to support NFSv3.
Neil's proposed refinement to add a localio auth_domain to the
nfsd_net and his proposed risk-averse handshake within the localio
protocol will both improve security.
> And then we need a way to bypass the server for I/O, which currently is
> rather complex in the patchset and would be almost trivial with a new
> pNFS layout.
A new layout misses the entire point of having localio work for
both NFSv3 and NFSv4. NFSv3 is ubiquitous.
And in this localio series, flexfiles is trained to use localio.
(Which you apparently don't recognize or care about because nfsd
doesn't have flexfiles server support).
> > Can the client use its localio access to bypass that since it's not
> > going across the network anymore? Maybe by using open_by_handle_at on
> > the NFS share on a guessed filehandle? I think we need to ensure that
> > that isn't possible.
>
> If a file system is shared by containers and users in containers have
> the capability to use open_by_handle_at the security model is already
> broken without NFS or localio involved.
Containers deployed by things like podman.io and kubernetes are
perfectly happy to allow containers permission to drive knfsd threads
in the host kernel. That this is foreign to you is odd.
An NFS client that happens to be on the host should work perfectly
fine too (if it has adequate permissions).
> > I wonder if it's also worthwhile to gate localio access on an export
> > option, just out of an abundance of caution.
>
> export and mount option. We're speaking a non-standard side band
> protocol here, there is no way that should be done without explicit
> opt-in from both sides.
That is already provided by existing controls: Kconfig options that
default to N, and the ability to disable the use of localio entirely
even if it is enabled in the Kconfig:
echo N > /sys/module/nfs/parameters/localio_enabled
And then on top of that you have to loopback NFS mount.
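As a sketch, an admin script honoring those controls might look like the
following; note the parameter path only exists on kernels carrying the
localio patches, so it probes before writing:

```shell
#!/bin/sh
# Disable localio at runtime if this kernel exposes the knob.
param=/sys/module/nfs/parameters/localio_enabled
if [ -e "$param" ]; then
	echo N > "$param"
	printf 'localio now: %s\n' "$(cat "$param")"
else
	printf 'localio knob absent on this kernel\n'
fi
```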
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-04 18:31 ` Mike Snitzer
@ 2024-07-05 5:18 ` Christoph Hellwig
2024-07-05 13:35 ` Chuck Lever III
2024-07-05 22:08 ` NeilBrown
0 siblings, 2 replies; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-05 5:18 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Neil Brown, Dave Chinner
On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote:
> Some new layout misses the entire point of having localio work for
> NFSv3 and NFSv4. NFSv3 is very ubiquitous.
I'm getting tired of bringing up this "oh NFSv3" again and again without
any explanation of why it matters for communication inside the
same Linux kernel instance, with a kernel that obviously requires
patching. Why is running an obsolete protocol inside the same OS
instance required? Maybe it is, but if so it needs a very good
explanation.
> And in this localio series, flexfiles is trained to use localio.
> (Which you apparently don't recognize or care about because nfsd
> doesn't have flexfiles server support).
And you fail to explain why it matters. You are trying to sell this
code; you had better have an explanation for why it's complicated and
convoluted as hell. So far we are running in circles, but there has been
no clear explanation of use cases.
> > > Can the client use its localio access to bypass that since it's not
> > > going across the network anymore? Maybe by using open_by_handle_at on
> > > the NFS share on a guessed filehandle? I think we need to ensure that
> > > that isn't possible.
> >
> > If a file system is shared by containers and users in containers have
> > the capability to use open_by_handle_at the security model is already
> > broken without NFS or localio involved.
>
> Containers deployed by things like podman.io and kubernetes are
> perfectly happy to allow containers permission to drive knfsd threads
> in the host kernel. That this is foreign to you is odd.
>
> An NFS client that happens to be on the host should work perfectly
> fine too (if it has adequate permissions).
Can you please stop the personal attacks? I am just stating the fact
that IF the containers using the NFS mount have access to the exported
file systems and the privileges to use open by handle, there is nothing
nfsd can do about security, as the container has full access to the file
system anyway. That's a fact, and how you deploy the various containers
is completely irrelevant. Also, in case you didn't notice it last time,
this is about the _client_ containers, as stated by me and the original
poster I replied to.
> > > I wonder if it's also worthwhile to gate localio access on an export
> > > option, just out of an abundance of caution.
> >
> > export and mount option. We're speaking a non-standard side band
> > protocol here, there is no way that should be done without explicit
> > opt-in from both sides.
>
> That is already provided by existing controls. With both Kconfig
> options that default to N, and the ability to disable the use of
> localio entirely even if enabled in the Kconfig:
> echo N > /sys/module/nfs/parameters/localio_enabled
And all of that is global and not per-mount or nfsd instance, which
doesn't exactly scale to a multi-tenant container hosting setup.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 5:18 ` Christoph Hellwig
@ 2024-07-05 13:35 ` Chuck Lever III
2024-07-05 13:39 ` Christoph Hellwig
2024-07-05 14:15 ` Mike Snitzer
2024-07-05 22:08 ` NeilBrown
1 sibling, 2 replies; 77+ messages in thread
From: Chuck Lever III @ 2024-07-05 13:35 UTC (permalink / raw)
To: Christoph Hellwig, Mike Snitzer
Cc: Jeff Layton, Linux NFS Mailing List, Anna Schumaker,
Trond Myklebust, Neil Brown, Dave Chinner
> On Jul 5, 2024, at 1:18 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote:
>> Some new layout misses the entire point of having localio work for
>> NFSv3 and NFSv4. NFSv3 is very ubiquitous.
>
> I'm getting tired of bringing up this "oh NFSv3" again and again without
> any explanation of why it matters for communication inside the
> same Linux kernel instance, with a kernel that obviously requires
> patching. Why is running an obsolete protocol inside the same OS
> instance required? Maybe it is, but if so it needs a very good
> explanation.
I agree: I think the requirement for NFSv3 in this situation
needs a clear justification. Both peers are recent vintage
Linux kernels; both peers can use NFSv4.x, there's no
explicit need for backwards compatibility in the use cases
that have been provided so far.
Generally I do agree with Neil's "why not NFSv3, we still
support it" argument. But with NFSv4, you get better locking
semantics, delegation, pNFS (possibly), and proper protocol
extensibility. There are really strong reasons to restrict
this facility to NFSv4.
--
Chuck Lever
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 13:35 ` Chuck Lever III
@ 2024-07-05 13:39 ` Christoph Hellwig
2024-07-05 14:15 ` Mike Snitzer
1 sibling, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-05 13:39 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Neil Brown, Dave Chinner
On Fri, Jul 05, 2024 at 01:35:18PM +0000, Chuck Lever III wrote:
> I agree: I think the requirement for NFSv3 in this situation
> needs a clear justification. Both peers are recent vintage
> Linux kernels; both peers can use NFSv4.x, there's no
> explicit need for backwards compatibility in the use cases
> that have been provided so far.
More importantly both peers are in fact the exact same Linux kernel
instance. Which is the important point here - we are doing a bypass
for a kernel talking to itself, although a kernel suffering from
multiple personality (dis)order where the different sides might
expose very different system views.
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 13:35 ` Chuck Lever III
2024-07-05 13:39 ` Christoph Hellwig
@ 2024-07-05 14:15 ` Mike Snitzer
2024-07-05 14:18 ` Christoph Hellwig
1 sibling, 1 reply; 77+ messages in thread
From: Mike Snitzer @ 2024-07-05 14:15 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Fri, Jul 05, 2024 at 01:35:18PM +0000, Chuck Lever III wrote:
>
>
> > On Jul 5, 2024, at 1:18 AM, Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote:
> >> Some new layout misses the entire point of having localio work for
> >> NFSv3 and NFSv4. NFSv3 is very ubiquitous.
> >
> > I'm getting tired of bringing up this "oh NFSv3" again and again without
> > any explanation of why that matters for communication inside the
> > same Linux kernel instance with a kernel that obviously requires
> > patching. Why is running an obsolete protocol inside the same OS
> > instance required? Maybe it is, but if so it needs a very good
> > explanation.
>
> I agree: I think the requirement for NFSv3 in this situation
> needs a clear justification. Both peers are recent vintage
> Linux kernels; both peers can use NFSv4.x, there's no
> explicit need for backwards compatibility in the use cases
> that have been provided so far.
>
> Generally I do agree with Neil's "why not NFSv3, we still
> support it" argument. But with NFSv4, you get better locking
> semantics, delegation, pNFS (possibly), and proper protocol
> extensibility. There are really strong reasons to restrict
> this facility to NFSv4.
NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on
the same host.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 14:15 ` Mike Snitzer
@ 2024-07-05 14:18 ` Christoph Hellwig
2024-07-05 14:36 ` Mike Snitzer
0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-05 14:18 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever III, Christoph Hellwig, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Neil Brown, Dave Chinner
On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote:
> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on
> the same host.
That doesn't really bring us any further. Why is it required?
I think we'll just need to stop this discussion until we have reasonable
documentation of the use cases and assumptions, because without that
we'll get hung up in endless loops.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 14:18 ` Christoph Hellwig
@ 2024-07-05 14:36 ` Mike Snitzer
2024-07-05 14:59 ` Chuck Lever III
2024-07-05 18:59 ` Jeff Layton
0 siblings, 2 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-05 14:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chuck Lever III, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote:
> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote:
> > NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on
> > the same host.
>
> That doesn't really bring us any further. Why is it required?
>
> I think we'll just need to stop this discussion until we have reasonable
> documentation of the use cases and assumptions, because without that
> we'll get hung up in endless loops.
It _really_ isn't material to the core capability that localio provides.
localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
containers).
Hammerspace needs localio to work with NFSv3 to assist with its "data
movers" that run on the host (using nfs and nfsd).
Please just remove yourself from the conversation if you cannot make
sense of this. If you'd like to be involved, put the work in to
understand the code and be professional.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 14:36 ` Mike Snitzer
@ 2024-07-05 14:59 ` Chuck Lever III
2024-07-06 3:58 ` Mike Snitzer
2024-07-05 18:59 ` Jeff Layton
1 sibling, 1 reply; 77+ messages in thread
From: Chuck Lever III @ 2024-07-05 14:59 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
> On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote:
>> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote:
>>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on
>>> the same host.
>>
>> That doesn't really bring us any further. Why is it required?
>>
>> I think we'll just need to stop this discussion until we have reasonable
>> documentation of the use cases and assumptions, because without that
>> we'll get hung up in endless loops.
>
> It _really_ isn't material to the core capability that localio provides.
> localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
> containers).
>
> Hammerspace needs localio to work with NFSv3 to assist with its "data
> movers" that run on the host (using nfs and nfsd).
>
> Please just remove yourself from the conversation if you cannot make
> sense of this. If you'd like to be involved, put the work in to
> understand the code and be professional.
Sorry, I can't make sense of this either, and I find the
personal attack here completely inappropriate (and a bit
hypocritical, to be honest).
I have nothing else to contribute that you won't either
dismiss or treat as a personal attack, so I can't continue
this conversation.
--
Chuck Lever
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 14:36 ` Mike Snitzer
2024-07-05 14:59 ` Chuck Lever III
@ 2024-07-05 18:59 ` Jeff Layton
1 sibling, 0 replies; 77+ messages in thread
From: Jeff Layton @ 2024-07-05 18:59 UTC (permalink / raw)
To: Mike Snitzer, Christoph Hellwig
Cc: Chuck Lever III, Linux NFS Mailing List, Anna Schumaker,
Trond Myklebust, Neil Brown, Dave Chinner
On Fri, 2024-07-05 at 10:36 -0400, Mike Snitzer wrote:
> On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote:
> > On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote:
> > > NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3
> > > knfsd on
> > > the same host.
> >
> > That doesn't really bring us any further. Why is it required?
> >
> > I think we'll just need to stop this discussion until we have
> > reasonable documentation of the use cases and assumptions, because
> > without that we'll get hung up in endless loops.
>
> It _really_ isn't material to the core capability that localio
> provides.
> localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
> containers).
>
> Hammerspace needs localio to work with NFSv3 to assist with its "data
> movers" that run on the host (using nfs and nfsd).
>
> Please just remove yourself from the conversation if you cannot make
> sense of this. If you'd like to be involved, put the work in to
> understand the code and be professional.
I disagree wholeheartedly with this statement. Christoph has raised a
very valid point. You have _not_ articulated why v3 access is important
here.
I'm aware of why it is (at least to HS), and I think there are other
valid reasons to keep v3 in the mix (as Neil has pointed out). But,
that info should be in the cover letter and changelogs. Not everyone
has insight into this, and tbqh, my understanding could be wrong.
Let's do please try to keep the discussion civil.
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 5:18 ` Christoph Hellwig
2024-07-05 13:35 ` Chuck Lever III
@ 2024-07-05 22:08 ` NeilBrown
2024-07-06 6:02 ` Christoph Hellwig
1 sibling, 1 reply; 77+ messages in thread
From: NeilBrown @ 2024-07-05 22:08 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, Christoph Hellwig, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Fri, 05 Jul 2024, Christoph Hellwig wrote:
> On Thu, Jul 04, 2024 at 02:31:46PM -0400, Mike Snitzer wrote:
> > Some new layout misses the entire point of having localio work for
> > NFSv3 and NFSv4. NFSv3 is very ubiquitous.
>
> I'm getting tired of bringing up this "oh NFSv3" again and again without
> any explanation of why that matters for communication inside the
> same Linux kernel instance with a kernel that obviously requires
> patching. Why is running an obsolete protocol inside the same OS
> instance required? Maybe it is, but if so it needs a very good
> explanation.
I would like to see a good explanation for why NOT NFSv3.
I don't think NFSv3 is obsolete. The first dictionary definition is "No longer in
use," which certainly doesn't apply.
I think "deprecated" is a more relevant term. I believe that NFSv2 has
been deprecated. I believe that NFSv4.0 should be deprecated. But I
don't see any reason to consider NFSv3 to be deprecated.
>
> > And in this localio series, flexfiles is trained to use localio.
> > (Which you apparently don't recognize or care about because nfsd
> > doesn't have flexfiles server support).
>
> And you fail to explain why it matters. You are trying to sell this
> code, you better have an explanation why it's complicated and convoluted
> as hell. So far we are running in circles but there has been no clear
> explanation of use cases.
Please avoid sweeping statements like "complicated and convoluted"
without backing them up with specifics.
I don't particularly want to defend the current localio protocol, and I
certainly see a number of points which can and must be improved. But it
isn't clear to me that the big picture is either complicated or
convoluted. Please provide details.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 14:59 ` Chuck Lever III
@ 2024-07-06 3:58 ` Mike Snitzer
2024-07-06 5:52 ` NeilBrown
` (2 more replies)
0 siblings, 3 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-06 3:58 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Fri, Jul 05, 2024 at 02:59:31PM +0000, Chuck Lever III wrote:
>
>
> > On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> > On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote:
> >> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote:
> >>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on
> >>> the same host.
> >>
> >> That doesn't really bring us any further. Why is it required?
> >>
> >> I think we'll just need to stop this discussion until we have reasonable
> >> documentation of the use cases and assumptions, because without that
> >> we'll get hung up in endless loops.
> >
> > It _really_ isn't material to the core capability that localio provides.
> > localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
> > containers).
> >
> > Hammerspace needs localio to work with NFSv3 to assist with its "data
> > movers" that run on the host (using nfs and nfsd).
> >
> > Please just remove yourself from the conversation if you cannot make
> > sense of this. If you'd like to be involved, put the work in to
> > understand the code and be professional.
>
> Sorry, I can't make sense of this either, and I find the
> personal attack here completely inappropriate (and a bit
> hypocritical, to be honest).
Hi Chuck,
I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to
Christoph, who has taken to feigning an inability to understand localio
yet is perfectly OK with flexing like he is an authority on the topic.
He rallied to your Nacked-By with his chest puffed up and has
proceeded to baselessly shit-talk (did you miss his emails while we
slept last night?). Yes, let's condone and encourage more of that!?
No, I won't abide such toxicity. But thankfully Neil has since called
for him to stop. Alas...
Earlier today I answered the question about "why NFSv3?" in simple
terms. You and Christoph rejected it. I'm _not_ being evasive.
There isn't a lot to it: "More efficient NFS in containers" _is_ the
answer.
But hopefully this email settles "why NFSv3?". If not, please help me
(or others) further your understanding by reframing your NFSv3 concern
in terms other than "why NFSv3?". It's getting a bit like having to answer
"why is water wet?"
Why NFSv3?
----------
The localio feature improves IO performance when using NFSv3 and NFSv4
with containers. Hammerspace has immediate need for the NFSv3
support, because its "data movers" use NFSv3, but NFSv4 support is
expected to be useful in the future.
Just because Hammerspace is very invested in pNFS doesn't mean all
aspects are framed in terms of it.
General statement:
------------------
I wrote maybe ~30% of the entire localio code as it stands at "v11"
and that was focused primarily on adding NFSv4 support and developing
the localio protocol, hooking it into NFS's client initialization and
teardown along with the server (and vice-versa, nfsd lifetime due to
container applications: tearing down nfsd in container while nfs
client actively connected from host). Neil helped refine the localio
protocol part, and he has also looked critically at many aspects and
has a great list of improvements that are needed. Jeff provided
top-notch review of my initial use of SRCU and later the percpu refcnt
for interlocking with the client and server.
My point: others wrote the majority of localio (years ago). I'm just
trying to shepherd it upstream in an acceptable form. And yes,
localio supporting both NFSv3 and NFSv4 is important to me,
Hammerspace and anyone who'd like more efficient IO with both NFSv3
and NFSv4 in containers.
Answering "Why NFSv3?" with questions:
--------------------------------------
1) Why wasn't general NFS localio bypass controversial 3 weeks ago?
Instead (given all inputs, NFSv3 support requirement being one of
them) the use of a "localio protocol" got broad consensus and buy-in
from you, Jeff, and Neil.
I _thought_ we all agreed localio was a worthwhile _auxiliary_
addition to Linux's NFS client and server (to give them awareness of
each other through nfs_common) regardless of NFS protocol version.
That is why I registered a localio RPC program number with IANA (at
your suggestion, you were cc'd when I applied for it, and you are
named on IANA.org along with Trond and myself for the program number
IANA assigned):
https://www.iana.org/assignments/rpc-program-numbers/rpc-program-numbers.txt
$ cat rpc-program-numbers.txt | egrep 'Snitzer|Myklebust|Lever'
Linux Kernel Organization 400122 nfslocalio [Mike_Snitzer][Trond_Myklebust][Chuck_Lever]
[Chuck_Lever] Chuck Lever mailto:chuck.lever&oracle.com 2024-06-20
[Mike_Snitzer] Mike Snitzer mailto:snitzer&kernel.org 2024-06-20
[Trond_Myklebust] Trond Myklebust mailto:trondmy&hammerspace.com 2024-06-20
2) If we're introducing a general NFS localio bypass feature _and_
NFSv3 is important to the stakeholder proposing the feature _and_
NFSv3 support is easily implemented and supported: then why do you
have such concern about localio supporting NFSv3?
3) Why do you think NFSv3 unworthy? Is this just a bellwether for
broader opposition to flexfiles (and its encouraging more use of
NFSv3)? Flexfiles is at the heart of NFSv3 use at Hammerspace. I've
come to understand from you and Christoph that the lack of flexfiles
support in NFSD helps fuel dislike for flexfiles. That's a lot for me
to unpack, and pretty far removed from "why NFSv3?", so I'd need more
context than I have if Hammerspace's use of flexfiles is what is
fueling your challenge of localio's NFSv3 support.
...
Reiterating and then expanding on my email above:
localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
containers).
Hammerspace needs localio to work with NFSv3 to assist with its
"data movers" that run on the host (using nfs [on host] and nfsd
[within container]).
Now you can ask why _that_ is... but it really is pretty disjoint from
the simple matter of ensuring localio supports both NFSv3 and NFSv4.
I've shared that Hammerspace's "data movers" use NFSv3 currently, in
the future they could use NFSv4 as needed. Hence the desire to
support localio with both NFSv3 and NFSv4. [when I picked up the
localio code NFSv4 wasn't even supported yet].
I _hope_ I've now answered "why NFSv3?" clearly.
> I have nothing else to contribute that you won't either
> dismiss or treat as a personal attack, so I can't continue
> this conversation.
That isn't even a little bit fair... but I'm not taking the bait.
Neil has been wonderful to work with and I look forward to all future
work with him (localio and beyond). I am not trying to do anything
out of line with this feature. I am and have been actively working
with you, Neil and Jeff for over a month now. I've adapted and
learned, _with_ your and others' help, to the best of my ability.
I'm trying here, maybe you could say "I'm trying too hard". Well I
just started a new job with Hammerspace after working for Red Hat for
the past 15 years (much of my time spent as the upstream Linux DM
maintainer -- but you know this). I am a capable engineer and I've
proposed the upstreaming of a localio feature that would do well to
land upstream. I've done so in a proficient way all things
considered, always happy to learn new things and improve. I need to
work with you. Hopefully well, and hopefully I can earn your respect,
please just know I'm merely trying to improve NFS.
Hammerspace would like to get all its Linux kernel NFS innovation
upstream. And I'm trying to do that. localio is my first task and
I've been working on it with focus for the past 2 months since joining
Hammerspace. But you basically know all this, I said all of it to you
at LSF.
So if you know all these things (I _know_ you do), why are you
treating me in this way? I feel like I'm caught in the middle of some
much bigger divide than anything I've been involved with, caused or
made privy to.
Guess the messenger gets shot sometimes.
Mike
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 3:58 ` Mike Snitzer
@ 2024-07-06 5:52 ` NeilBrown
2024-07-06 13:05 ` "why NFSv3?" [was: Re: [PATCH v11 00/20] nfs/nfsd: add support for localio] Mike Snitzer
2024-07-06 5:58 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Christoph Hellwig
2024-07-06 16:58 ` Chuck Lever III
2 siblings, 1 reply; 77+ messages in thread
From: NeilBrown @ 2024-07-06 5:52 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever III, Christoph Hellwig, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Sat, 06 Jul 2024, Mike Snitzer wrote:
> On Fri, Jul 05, 2024 at 02:59:31PM +0000, Chuck Lever III wrote:
> >
> >
> > > On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote:
> > >
> > > On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote:
> > >> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote:
> > >>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on
> > >>> the same host.
> > >>
> > >> That doesn't really bring us any further. Why is it required?
> > >>
> > >> I think we'll just need to stop this discussion until we have reasonable
> > >> documentation of the use cases and assumptions, because without that
> > >> we'll get hung up in endless loops.
> > >
> > > It _really_ isn't material to the core capability that localio provides.
> > > localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
> > > containers).
> > >
> > > Hammerspace needs localio to work with NFSv3 to assist with its "data
> > > movers" that run on the host (using nfs and nfsd).
> > >
> > > Please just remove yourself from the conversation if you cannot make
> > > sense of this. If you'd like to be involved, put the work in to
> > > understand the code and be professional.
> >
> > Sorry, I can't make sense of this either, and I find the
> > personal attack here completely inappropriate (and a bit
> > hypocritical, to be honest).
>
> Hi Chuck,
>
> I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to
> Christoph, who has taken to feigning an inability to understand localio
> yet is perfectly OK with flexing like he is an authority on the topic.
Ad Hominem doesn't achieve anything useful. Please stick with technical
arguments. (They are the only ones I understand).
NeilBrown
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 3:58 ` Mike Snitzer
2024-07-06 5:52 ` NeilBrown
@ 2024-07-06 5:58 ` Christoph Hellwig
2024-07-06 13:12 ` Mike Snitzer
2024-07-06 16:58 ` Chuck Lever III
2 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-06 5:58 UTC (permalink / raw)
To: Mike Snitzer
Cc: Chuck Lever III, Christoph Hellwig, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Neil Brown, Dave Chinner
On Fri, Jul 05, 2024 at 11:58:56PM -0400, Mike Snitzer wrote:
> I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to
> Christoph. Who has taken to feign incapable of understanding localio
> yet is perfectly OK with flexing like he is an authority on the topic.
Hi Mike,
please take a few days off and relax, and then write an actual use case
and requirements document. I'm out of this thread for now, but I'd
appreciate if you'd just restart, assuming no one is acting in bad
faith and try to explain what you are doing and why without getting
upset.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-05 22:08 ` NeilBrown
@ 2024-07-06 6:02 ` Christoph Hellwig
2024-07-06 6:37 ` NeilBrown
0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-06 6:02 UTC (permalink / raw)
To: NeilBrown
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Sat, Jul 06, 2024 at 08:08:07AM +1000, NeilBrown wrote:
> I would like to see a good explanation for why NOT NFSv3.
> I don't think NFSv3 is obsolete. The first dictionary is "No longer in
> use." which certainly doesn't apply.
> I think "deprecated" is a more relevant term. I believe that NFSv2 has
> been deprecated. I believe that NFSv4.0 should be deprecated. But I
> don't see any reason to consider NFSv3 to be deprecated.
The obvious answer is that NFSv4.1/2 (which is really the same thing)
is the only version of NFS under development and open for new features
at the protocol level. So from the standardization perspective NFSv3
is obsolete.
But the more important point is that NFSv4 has a built-in way to bypass
the server for I/O namely pNFS. And bypassing the server by directly
going to a local file system is the textbook example for what pNFS
does. So we'll need a really good argument for why we need to reinvent
a different scheme for bypassing the server for I/O. Maybe there is
a really good killer argument for doing that, but it needs to be clearly
stated and defended instead of assumed.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 6:02 ` Christoph Hellwig
@ 2024-07-06 6:37 ` NeilBrown
2024-07-06 6:42 ` Christoph Hellwig
0 siblings, 1 reply; 77+ messages in thread
From: NeilBrown @ 2024-07-06 6:37 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Sat, 06 Jul 2024, Christoph Hellwig wrote:
> On Sat, Jul 06, 2024 at 08:08:07AM +1000, NeilBrown wrote:
> > I would like to see a good explanation for why NOT NFSv3.
> > I don't think NFSv3 is obsolete. The first dictionary definition is "No longer in
> > use," which certainly doesn't apply.
> > I think "deprecated" is a more relevant term. I believe that NFSv2 has
> > been deprecated. I believe that NFSv4.0 should be deprecated. But I
> > don't see any reason to consider NFSv3 to be deprecated.
>
> The obvious answer is that NFSv4.1/2 (which is really the same thing)
> is the only version of NFS under development and open for new features
> at the protocol level. So from the standardization perspective NFSv3
> is obsolete.
RFC-1813 is certainly obsolete from a standardization perspective - it
isn't even an IETF standard - only informational. It can't be extended
with any hope of interoperability between implementations.
But we don't want interoperability between implementations. We want to
enhance the internal workings of one particular implementation. I don't
see that the standards status affects that choice.
>
> But the more important point is that NFSv4 has a built-in way to bypass
> the server for I/O namely pNFS. And bypassing the server by directly
> going to a local file system is the textbook example for what pNFS
> does. So we'll need a really good argument for why we need to reinvent
> a different scheme for bypassing the server for I/O. Maybe there is
> a really good killer argument for doing that, but it needs to be clearly
> stated and defended instead of assumed.
Could you provide a reference to the text book - or RFC - that describes
a pNFS DS protocol that completely bypasses the network, allowing the
client and MDS to determine if they are the same host and to potentially
do zero-copy IO?
If not, I will find it hard to understand your claim that it is "the
textbook example".
Also, neither you nor I are in a position to assert what is needed for a
change to get accepted. That is up to the people with M: in front of
their email address. I believe that any counsel that either of us gives
will be considered with genuine interest, but making demands seems out of
place.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 6:37 ` NeilBrown
@ 2024-07-06 6:42 ` Christoph Hellwig
2024-07-06 17:15 ` Chuck Lever III
2024-07-08 4:03 ` NeilBrown
0 siblings, 2 replies; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-06 6:42 UTC (permalink / raw)
To: NeilBrown
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Sat, Jul 06, 2024 at 04:37:22PM +1000, NeilBrown wrote:
> > a different scheme for bypassing the server for I/O. Maybe there is
> > a really good killer argument for doing that, but it needs to be clearly
> > stated and defended instead of assumed.
>
> Could you provide a reference to the text book - or RFC - that describes
> a pNFS DS protocol that completely bypasses the network, allowing the
> client and MDS to determine if they are the same host and to potentially
> do zero-copy IO.
I did not say that we have the exact same functionality available and
there is no work to do at all, just that it is the standard way to bypass
the server.
RFC 5662, RFC 5663 and RFC 8154 specify layouts that completely bypass
the network and require the client and server to find out that they talk
to the same storage device, and directly perform zero-copy I/O.
They do not require to be on the same host, though.
> If not, I will find it hard to understand your claim that it is "the
> textbook example".
pNFS is all about handing out grants to bypass the server for I/O.
That is exactly what localio is doing.
^ permalink raw reply [flat|nested] 77+ messages in thread
* "why NFSv3?" [was: Re: [PATCH v11 00/20] nfs/nfsd: add support for localio]
2024-07-06 5:52 ` NeilBrown
@ 2024-07-06 13:05 ` Mike Snitzer
0 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-06 13:05 UTC (permalink / raw)
To: NeilBrown
Cc: Chuck Lever III, Christoph Hellwig, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
Earlier yesterday I answered the question about "why NFSv3?" in simple
terms. Chuck and Christoph rejected it. I'm _not_ being evasive.
There isn't a lot to it: "More efficient NFS in containers" _is_ the
answer.
But hopefully this email settles "why NFSv3?". If not, please help me
(or others) further your understanding by reframing your NFSv3 concern
in terms other than "why NFSv3?".
Why NFSv3?
----------
The localio feature improves IO performance when using NFSv3 and NFSv4
with containers. Hammerspace has immediate need for the NFSv3
support, because its "data movers" use NFSv3, but NFSv4 support is
expected to be useful in the future.
Why not use pNFS?
-----------------
Just because Hammerspace is very invested in pNFS doesn't mean all
aspects are framed in terms of it.
The complexity of a pNFS-based approach to localio makes it
inferior to the proposed solution: an autonomous NFS client and
server rendezvous that allows for network bypass. There is no need for
pNFS, and by not using it the localio feature is accessible for
general NFSv3 and NFSv4 use.
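For what it's worth, the rendezvous itself is conceptually simple. Here is a
rough user-space sketch of the idea (all names here are hypothetical
stand-ins for the in-kernel nfs_common interfaces, not the actual
implementation):

```python
import uuid

# Hypothetical stand-in for nfs_common: a registry, visible to both the
# in-kernel NFS client and nfsd, of UUIDs for nfsd instances running on
# this host.
local_nfsd_uuids = set()

def nfsd_start():
    """Server side: publish a per-instance UUID at startup."""
    nfsd_uuid = uuid.uuid4()
    local_nfsd_uuids.add(nfsd_uuid)
    return nfsd_uuid

def client_connect(server_reported_uuid):
    """Client side: after connecting, ask the server (via the auxiliary
    LOCALIO RPC program) for its UUID and check it against the local
    registry. A match means the server is this same host, so READ,
    WRITE and COMMIT can bypass the network."""
    if server_reported_uuid in local_nfsd_uuids:
        return "localio"   # direct in-kernel I/O path
    return "rpc"           # normal over-the-wire path

# Same-host mount: the UUID the "server" reports is in our registry.
print(client_connect(nfsd_start()))  # -> localio
# Remote mount: an unknown UUID falls back to ordinary RPC.
print(client_connect(uuid.uuid4()))  # -> rpc
```

Note that the check is protocol-agnostic, which is why the bypass can work
identically for NFSv3 and NFSv4 mounts.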
Answering "Why NFSv3?" with questions:
--------------------------------------
1) Why wasn't general NFS localio bypass controversial 3 weeks ago?
Instead (given all inputs, NFSv3 support requirement being one of
them) the use of a "localio protocol" got broad consensus and buy-in
from Chuck, Jeff, and Neil.
I _thought_ we all agreed localio was a worthwhile _auxiliary_
addition to Linux's NFS client and server (to give them awareness of
each other through nfs_common) regardless of NFS protocol version.
That is why I registered a localio RPC program number with IANA, at
Chuck's suggestion, see:
https://www.iana.org/assignments/rpc-program-numbers/rpc-program-numbers.txt
$ cat rpc-program-numbers.txt | egrep 'Snitzer|Myklebust|Lever'
Linux Kernel Organization 400122 nfslocalio [Mike_Snitzer][Trond_Myklebust][Chuck_Lever]
[Chuck_Lever] Chuck Lever mailto:chuck.lever&oracle.com 2024-06-20
[Mike_Snitzer] Mike Snitzer mailto:snitzer&kernel.org 2024-06-20
[Trond_Myklebust] Trond Myklebust mailto:trondmy&hammerspace.com 2024-06-20
2) If we're introducing a general NFS localio bypass feature _and_
NFSv3 is important to the stakeholder proposing the feature _and_
NFSv3 support is easily implemented and supported: then why do you
have such concern about localio supporting NFSv3?
3) Why do you think NFSv3 unworthy? Is this just a bellwether for
broader opposition to flexfiles (and its encouraging more use of
NFSv3)? Flexfiles is at the heart of NFSv3 use at Hammerspace. I've
come to understand from Chuck and Christoph that the lack of flexfiles
support in NFSD helps fuel dislike for flexfiles. That's a lot for me
to unpack, and pretty far removed from "why NFSv3?", so I'd need more
context than I have if Hammerspace's use of flexfiles is what is
fueling your challenge of localio's NFSv3 support.
...
Reiterating and then expanding on my earlier succinct answer:
localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
containers).
Hammerspace needs localio to work with NFSv3 to assist with its
"data movers" that run on the host (using nfs [on host] and nfsd
[within container]).
Now you can ask why _that_ is... but it really is pretty disjoint from
the simple matter of ensuring localio supports both NFSv3 and NFSv4.
I've shared that Hammerspace's "data movers" use NFSv3 currently, in
the future they could use NFSv4 as needed. Hence the desire to
support localio with both NFSv3 and NFSv4. [when I picked up the
localio code NFSv4 wasn't even supported yet].
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 5:58 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Christoph Hellwig
@ 2024-07-06 13:12 ` Mike Snitzer
2024-07-08 9:46 ` Christoph Hellwig
0 siblings, 1 reply; 77+ messages in thread
From: Mike Snitzer @ 2024-07-06 13:12 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chuck Lever III, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Fri, Jul 05, 2024 at 10:58:18PM -0700, Christoph Hellwig wrote:
> On Fri, Jul 05, 2024 at 11:58:56PM -0400, Mike Snitzer wrote:
> > I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to
> > Christoph, who has taken to feigning being incapable of understanding
> > localio, yet is perfectly OK with flexing like he is an authority on the topic.
>
> Hi Mike,
>
> please take a few days off and relax, and then write an actual use case
> and requirements document. I'm out of this thread for now, but I'd
appreciate it if you'd just restart, assuming no one is acting in bad
faith, and try to explain what you are doing and why without getting
> upset.
If you'd like reasonable people to not get upset with you, you might
try treating them with respect rather than gaslighting them.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 3:58 ` Mike Snitzer
2024-07-06 5:52 ` NeilBrown
2024-07-06 5:58 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Christoph Hellwig
@ 2024-07-06 16:58 ` Chuck Lever III
2024-07-07 0:42 ` Mike Snitzer
2 siblings, 1 reply; 77+ messages in thread
From: Chuck Lever III @ 2024-07-06 16:58 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
> On Jul 5, 2024, at 11:58 PM, Mike Snitzer <snitzer@kernel.org> wrote:
>
> On Fri, Jul 05, 2024 at 02:59:31PM +0000, Chuck Lever III wrote:
>>
>>
>>> On Jul 5, 2024, at 10:36 AM, Mike Snitzer <snitzer@kernel.org> wrote:
>>>
>>> On Fri, Jul 05, 2024 at 07:18:29AM -0700, Christoph Hellwig wrote:
>>>> On Fri, Jul 05, 2024 at 10:15:46AM -0400, Mike Snitzer wrote:
>>>>> NFSv3 is needed because NFSv3 is used to initiate IO to NFSv3 knfsd on
>>>>> the same host.
>>>>
> That doesn't really bring us any further. Why is it required?
>>>>
>>>> I think we'll just need to stop this discussion until we have reasonable
>>>> documentation of the use cases and assumptions, because without that
> we'll get hung up in endless loops.
>>>
>>> It _really_ isn't material to the core capability that localio provides.
>>> localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
>>> containers).
>>>
>>> Hammerspace needs localio to work with NFSv3 to assist with its "data
>>> movers" that run on the host (using nfs and nfsd).
>>>
>>> Please just remove yourself from the conversation if you cannot make
>>> sense of this. If you'd like to be involved, put the work in to
>>> understand the code and be professional.
>>
>> Sorry, I can't make sense of this either, and I find the
>> personal attack here completely inappropriate (and a bit
>> hypocritical, to be honest).
>
> Hi Chuck,
>
> I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to
> Christoph, who has taken to feigning being incapable of understanding
> localio, yet is perfectly OK with flexing like he is an authority on the topic.
Well let's try a reality test.
Christoph has authored an IETF RFC on pNFS. He's also contributed
the pNFS SCSI (and now NVMe) implementation in the Linux server
and client. He seems to know the code well enough to offer an
informed opinion.
> He rallied to your Nacked-By with his chest puffed up and has
> proceeded to baselessly shit-talk (did you miss his emails while we
> slept last night?). Yes, let's condone and encourage more of that!?
> No, I won't abide such toxicity. But thankfully Neil has since called
> for him to stop. Alas...
>
> Earlier today I answered the question about "why NFSv3?" in simple
> terms. You and Christoph rejected it. I'm _not_ being evasive.
> There isn't a lot to it: "More efficient NFS in containers" _is_ the
> answer.
>
> But hopefully this email settles "why NFSv3?". If not, please help me
> (or others) further your understanding by reframing your NFSv3 concern
> in terms other than "why NFSv3?". It's getting a bit like having to
> answer "why is water wet?"
>
> Why NFSv3?
> ----------
>
> The localio feature improves IO performance when using NFSv3 and NFSv4
> with containers. Hammerspace has immediate need for the NFSv3
> support, because its "data movers" use NFSv3, but NFSv4 support is
> expected to be useful in the future.
>
> Just because Hammerspace is very invested in pNFS doesn't mean all
> aspects are framed in terms of it.
>
> General statement:
> ------------------
>
> I wrote maybe ~30% of the entire localio code as it stands at "v11"
> and that was focused primarily on adding NFSv4 support and developing
> the localio protocol, hooking it into NFS's client initialization and
> teardown along with the server (and vice-versa, nfsd lifetime due to
> container applications: tearing down nfsd in container while nfs
> client actively connected from host). Neil helped refine the localio
> protocol part, and he has also looked critically at many aspects and
> has a great list of improvements that are needed. Jeff provided
> top-notch review of my initial use of SRCU and later the percpu refcnt
> for interlocking with the client and server.
>
> My point: others wrote the majority of localio (years ago). I'm just
> trying to shepherd it upstream in an acceptable form. And yes,
> localio supporting both NFSv3 and NFSv4 is important to me,
> Hammerspace and anyone who'd like more efficient IO with both NFSv3
> and NFSv4 in containers.
>
> Answering "Why NFSv3?" with questions:
> --------------------------------------
>
> 1) Why wasn't general NFS localio bypass controversial 3 weeks ago?
> Instead (given all inputs, NFSv3 support requirement being one of
> them) the use of a "localio protocol" got broad consensus and buy-in
> from you, Jeff, and Neil.
>
> I _thought_ we all agreed localio was a worthwhile _auxiliary_
> addition to Linux's NFS client and server (to give them awareness of
> each other through nfs_common) regardless of NFS protocol version.
> That is why I registered a localio RPC program number with IANA (at
> your suggestion, you were cc'd when I applied for it, and you are
> named on IANA.org along with Trond and myself for the program number
> IANA assigned):
> https://www.iana.org/assignments/rpc-program-numbers/rpc-program-numbers.txt
>
> $ grep -E 'Snitzer|Myklebust|Lever' rpc-program-numbers.txt
> Linux Kernel Organization 400122 nfslocalio [Mike_Snitzer][Trond_Myklebust][Chuck_Lever]
> [Chuck_Lever] Chuck Lever mailto:chuck.lever&oracle.com 2024-06-20
> [Mike_Snitzer] Mike Snitzer mailto:snitzer&kernel.org 2024-06-20
> [Trond_Myklebust] Trond Myklebust mailto:trondmy&hammerspace.com 2024-06-20
>
> 2) If we're introducing a general NFS localio bypass feature _and_
> NFSv3 is important to the stakeholder proposing the feature _and_
> NFSv3 support is easily implemented and supported: then why do you
> have such concern about localio supporting NFSv3?
>
> 3) Why do you think NFSv3 is unworthy? Is this just a bellwether for
> broader opposition to flexfiles (and its encouraging more use of
> NFSv3)? Flexfiles is at the heart of NFSv3 use at Hammerspace. I've
> come to understand from you and Christoph that the lack of flexfiles
> support in NFSD helps fuel dislike for flexfiles. That's a lot for me
> to unpack, and pretty far removed from "why NFSv3?", so I'd need more
> context than I have if Hammerspace's use of flexfiles is what is
> fueling your challenge of localio's NFSv3 support.
>
> ...
>
> Reiterating and then expanding on my email above:
>
> localio supporting NFSv3 is beneficial for NFSv3 users (NFSv3 in
> containers).
>
> Hammerspace needs localio to work with NFSv3 to assist with its
> "data movers" that run on the host (using nfs [on host] and nfsd
> [within container]).
>
> Now you can ask why _that_ is... but it really is pretty disjoint from
> the simple matter of ensuring localio supports both NFSv3 and NFSv4.
>
> I've shared that Hammerspace's "data movers" use NFSv3 currently; in
> the future they could use NFSv4 as needed. Hence the desire to
> support localio with both NFSv3 and NFSv4. [when I picked up the
> localio code NFSv4 wasn't even supported yet].
>
> I _hope_ I've now answered "why NFSv3?" clearly.
>
>> I have nothing else to contribute that you won't either
>> dismiss or treat as a personal attack, so I can't continue
>> this conversation.
>
> That isn't even a little bit fair... but I'm not taking the bait.
>
> Neil has been wonderful to work with and I look forward to all future
> work with him (localio and beyond). I am not trying to do anything
> out of line with this feature. I am and have been actively working
> with you, Neil and Jeff for over a month now. I've adapted and
> learned, _with_ your and others' help, to the best of my ability.
>
> I'm trying here, maybe you could say "I'm trying too hard". Well I
> just started a new job with Hammerspace after working for Red Hat for
> the past 15 years (much of my time spent as the upstream Linux DM
> maintainer -- but you know this). I am a capable engineer and I've
> proposed the upstreaming of a localio feature that would do well to
> land upstream. I've done so in a proficient way all things
> considered, always happy to learn new things and improve. I need to
> work with you. Hopefully well, and hopefully I can earn your respect,
> please just know I'm merely trying to improve NFS.
>
> Hammerspace would like to get all its Linux kernel NFS innovation
> upstream. And I'm trying to do that. localio is my first task and
> I've been working on it with focus for the past 2 months since joining
> Hammerspace. But you basically know all this, I said all of it to you
> at LSF.
>
> So if you know all these things (I _know_ you do), why are you
> treating me in this way? I feel like I'm caught in the middle of some
> much bigger divide than anything I've been involved with, caused or
> made privy to.
Yes, NFS is larger (much larger) and much older than the
Linux implementation of it. I'm sorry your new colleagues
have not seen fit to help you fit yourself into this
picture.
> Guess the messenger gets shot sometimes.
This is exactly the problem: your attitude of victimhood.
You act like our questions are personal attacks on you.
Answering "I don't know" or "I need to think about it" or
"Let me ask someone" or "Can you explain that further" is
perfectly fine. Acknowledging the areas where you need to
learn more is a quintessential part of being a professional
software engineer.
---
I have no strong feelings one way or another about flexfiles.
And, I remain a full-throated advocate of the NFSv3 support
in the Linux NFS stack.
If you get an impression from me, come talk to me first
to confirm it. Don't go talk to your colleagues; in
particular don't talk to the ones who like to spread a lot
of weird ideas about me. You're getting bad information.
---
Our question isn't "Why NFSv3?" It's: Can your design
document explain in detail how the one existing application
(the data mover) will use and benefit from loopback
acceleration? It needs to explain why the data mover
does not use possibly more suitable solutions like
NFSv4.2 COPY. Why are we going to the effort of adding
this side car instead of using facilities that are
already available?
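The server-side copy facility referenced here is, on Linux, reachable from ordinary userspace code via copy_file_range(); on an NFSv4.2 mount the kernel client can turn it into an NFSv4.2 COPY, so the data need not pass through the application at all. A minimal sketch, with hypothetical paths that a data mover would point at its NFS mounts:

```python
import os

def server_side_copy(src_path, dst_path):
    # On an NFSv4.2 mount, the kernel NFS client can service
    # copy_file_range() with the NFSv4.2 COPY operation, moving the
    # bytes server-side instead of through this process.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        remaining = os.fstat(src.fileno()).st_size
        while remaining > 0:
            n = os.copy_file_range(src.fileno(), dst.fileno(), remaining)
            if n == 0:
                break  # nothing more to copy
            remaining -= n
```

The same call works on local filesystems too (the kernel falls back to an in-kernel copy), so one code path serves both placements.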
We're not asking for a one sentence email. A one sentence
email is not "a response in simple terms". It is a petulant
dismissal of our request for more info.
We're asking for a real problem statement and use case,
in detail, in the design document, not in email.
(Go read the requests in this thread again. Honest, that's
all we're asking for).
---
But finally: We've asked repeatedly for very typical
changes and answers, and though sometimes we're met
with a positive response, other times we get a defensive
response, or "I don't feel like it" or "that's busy work"
or "you're wasting my time." That doesn't sound like the
spirit of co-operation that I would like to see from a
regular contributor, nor do I expect it from someone who
is also a Linux kernel maintainer who really ought to
know better.
So heed Christoph's excellent advice: go eat a Snickers.
Calm down. Breathe. None of the rest of us are anywhere
near as upset about this as you are right now.
--
Chuck Lever
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 6:42 ` Christoph Hellwig
@ 2024-07-06 17:15 ` Chuck Lever III
2024-07-08 4:10 ` NeilBrown
2024-07-08 9:40 ` Christoph Hellwig
2024-07-08 4:03 ` NeilBrown
1 sibling, 2 replies; 77+ messages in thread
From: Chuck Lever III @ 2024-07-06 17:15 UTC (permalink / raw)
To: Christoph Hellwig, Neil Brown
Cc: Mike Snitzer, Jeff Layton, Linux NFS Mailing List, Anna Schumaker,
Trond Myklebust, Dave Chinner
> On Jul 6, 2024, at 2:42 AM, Christoph Hellwig <hch@infradead.org> wrote:
>
> On Sat, Jul 06, 2024 at 04:37:22PM +1000, NeilBrown wrote:
>>> a different scheme for bypassing the server for I/O. Maybe there is
>>> a really good killer argument for doing that, but it needs to be clearly
>>> stated and defended instead of assumed.
>>
>> Could you provide a reference to the text book - or RFC - that describes
>> a pNFS DS protocol that completely bypasses the network, allowing the
>> client and MDS to determine if they are the same host and to potentially
>> do zero-copy IO.
>
> I did not say that we have the exact same functionality available and
> there is no work to do at all, just that it is the standard way to bypass
> the server.
>
> RFC 5662, RFC 5663 and RFC 8154 specify layouts that completely bypass
> the network and require the client and server to find out that they talk
> to the same storage device, and directly perform zero-copy I/O.
> They do not require them to be on the same host, though.
>
>> If not, I will find it hard to understand your claim that it is "the
>> text book example".
>
> pNFS is all about handing out grants to bypass the server for I/O.
> That is exactly what localio is doing.
In particular, Neil, a pNFS block/SCSI layout provides the
client with a set of device IDs. If the client is on the
same storage fabric as those devices, it can then access
those devices directly using SCSI commands rather than
going on the network [RFC8154].
This is equivalent to a loopback acceleration mechanism. If
the client and server are on the same host, then there are
natural ways to expose the devices to both peers, and the
existing pNFS protocol and SCSI Persistent Reservation
provide strong access authorization.
Both the Linux NFS client and server implement RFC 8154
well enough that this could be an alternative or even a
better solution than LOCALIO. The server stores an XFS
file system on the devices, and hands out layouts with
the device ID and LBAs of the extents where file content
is located.
The fly in this ointment is the need for NFSv3 support.
In an earlier email Mike mentioned that Hammerspace isn't
interested in providing a centrally managed directory of
block devices that could be utilized by the MDS to simply
inform the client of local devices. I don't think that's
the only possible solution for discovering the locality of
storage devices.
--
Chuck Lever
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 16:58 ` Chuck Lever III
@ 2024-07-07 0:42 ` Mike Snitzer
0 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-07 0:42 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
Chuck,
I think we can both agree there is no real benefit to us trading
punches and looking like fools more than we already have.
I say that not to impugn you or your position, but we look foolish.
I will pull my punches entirely (I really am, I could blow up
everything and that'll really be career limiting, heh). My aim is to
say my peace in this reply and hopefully we can set this unfortunate
exchange to one side.
I'll learn what I can from it, maybe you will, maybe others will...
On Sat, Jul 06, 2024 at 04:58:50PM +0000, Chuck Lever III wrote:
>
>
> > On Jul 5, 2024, at 11:58 PM, Mike Snitzer <snitzer@kernel.org> wrote:
> >
> >
> > Hi Chuck,
> >
> > I'm out-gunned with this good-cop/bad-cop dynamic. I was replying to
> > Christoph, who has taken to feigning being incapable of understanding
> > localio, yet is perfectly OK with flexing like he is an authority on the topic.
>
> Well let's try a reality test.
Perception is reality, so your reality is different than mine.
Neil, Christoph and yourself all took my last "Who has taken to feign"
sentence above as some ad-hominem attack. It wasn't; Christoph was
acting like the localio code is incomprehensible, for effect.
> Christoph has authored an IETF RFC on pNFS. He's also contributed
> the pNFS SCSI (and now NVMe) implementation in the Linux server
> and client. He seems to know the code well enough to offer an
> informed opinion.
I am _not_ questioning (and _never_ have questioned) Christoph's
extensive contributions, experience or intelligence.
I am calling out that Christoph didn't actually review the localio
code but proceeded to make extreme baseless negative judgments of it.
And you've glossed over that entirely and made it about "Christoph is
Christoph, who are you again?". Well aware who he is but I'm saying
baseless negative judgments are also a very regular occurrence from
him when he decides he just wants to blow up what I am doing (could be
he has done this to others, I have no idea). It isn't about the
technical details in the moment he does this; if he says it, it must be
true, and the aim is to taint my work. (Sad, because I thought that
dynamic died when I finally relented in our feud over NVMe vs DM
multipath that spanned 2018 to 2021.)
But damnit, it is happening all over again, now in the context of
localio. And you're buying it because he is parroting your concerns
about "why can't pNFS be used instead!?"
Neil won't get an answer after having called Christoph out on this
(unless Christoph now wants to try to make a liar out of me by
fabricating something well after his moment to do so has passed):
https://marc.info/?l=linux-nfs&m=172021727813076&w=2
If Christoph could arrest his propensity to do harm where it's not
warranted, I'd take every piece of critical technical feedback very
seriously and adjust the code as needed. When it is about the code,
_actually_ about the code... Christoph rules.
He knows this, happy to say it (again): I respect his technical
ability, etc. I do _not_ respect him making blanket statements about
code without saying _why_ he arrived at his judgment.
<snip many words from me answering "why NFSv3?">
> > Hammerspace would like to get all its Linux kernel NFS innovation
> > upstream. And I'm trying to do that. localio is my first task and
> > I've been working on it with focus for the past 2 months since joining
> > Hammerspace. But you basically know all this, I said all of it to you
> > at LSF.
> >
> > So if you know all these things (I _know_ you do), why are you
> > treating me in this way? I feel like I'm caught in the middle of some
> > much bigger divide than anything I've been involved with, caused or
> > made privy to.
>
> Yes, NFS is larger (much larger) and much older than the
> Linux implementation of it. I'm sorry your new colleagues
> have not seen fit to help you fit yourself into this
> picture.
See, a non-sequitur that takes shots at Hammerspace isn't professional.
Over the numerous iterations of localio there have been a handful of
times you have taken to name drop "Hammerspace" with a negative
connotation, simmering with contempt. Like I and others should feel
shame by association.
They haven't done anything wrong here. Trond hasn't done anything
wrong here. Whatever your grievance with Hammerspace, when interacting
with me please know I don't have any context for it. It is immaterial
to me and I don't need to know. If/when you have individual technical
problems with something Hammerspace is doing let's just work it
through without it needing to be this awkward elephant.
> > Guess the messenger gets shot sometimes.
>
> This is exactly the problem: your attitude of victimhood.
Statements like that show you're going out of your way to make an
enemy of me with no _real_ basis. But let me be clear: I have
absolutely been a victim of bullying and serious psychological attacks
over this past week, particularly repeat gaslighting by both you and
Christoph. That you're so unaware of how you have spun a false
narrative and are now tag-teaming with Christoph to attack me further
has been apparent for all to see. Hopefully linux-nfs is low traffic ;)
You have obviously formed an unfavorable opinion of me, apparent
turning point was my v11 patch header and ensuing negative exchange. I
own being too pushy when seeking progress on localio v11's inclusion
for 6.11; I apologized for that.
Hopefully we can actually get past all of this.
> You act like our questions are personal attacks on you.
No, I act like your personal attacks are personal attacks. Questions
and technical issues are always fair game. I pushed back on _one_
of your NFSD requirements (tracepoints), sorry if that was some
indication that I'm a malcontent... but I'm not. I can have a
technical opinion though, so I may make it known at times.
> Answering "I don't know" or "I need to think about it" or
> "Let me ask someone" or "Can you explain that further" is
> perfectly fine. Acknowledging the areas where you need to
> learn more is a quintessential part of being a professional
> software engineer.
I'm not afraid to admit when I don't know something (I've said it to
you when we met at LSF). I'm in no way an expert in NFS, you are. I
respect your command of NFS and welcome learning from you (like I have
to this point and hope to in the future).
> ---
>
> I have no strong feelings one way or another about flexfiles.
>
> And, I remain a full-throated advocate of the NFSv3 support
> in the Linux NFS stack.
>
> If you get an impression from me, come talk to me first
> to confirm it. Don't go talk to your colleagues; in
> particular don't talk to the ones who like to spread a lot
> of weird ideas about me. You're getting bad information.
I haven't ever heard from anyone at Hammerspace about you. And I
haven't sought insight about you either. AFAIK you aren't on anyone's
mind within Hammerspace (other than me given I'm actively dealing with
this unfortunate situation).
> ---
>
> Our question isn't "Why NFSv3?" It's: Can your design
> document explain in detail how the one existing application
> (the data mover) will use and benefit from loopback
> acceleration? It needs to explain why the data mover
> does not use possibly more suitable solutions like
> NFSv4.2 COPY. Why are we going to the effort of adding
> this side car instead of using facilities that are
> already available?
>
> We're not asking for a one sentence email. A one sentence
> email is not "a response in simple terms". It is a petulant
> dismissal of our request for more info.
Well I in no way intended it to be petulant. I was trying to reduce
the attack surface. And if you'd trust me at my word that'd go a long
way.
Try to take a step back and trust me when I tell you something.
You're welcome to be unfulfilled by an answer and to seek clarity, but I
promise you I'm not evasive or leaving information on the floor when
I answer questions. If I don't know something I seek the details out.
But you and Christoph are making a simple line of development
(localio) into some referendum on design choices in layers you have no
charter to concern yourself with (Hammerspace's data movers). You
still cannot accept that localio is devoid of pNFS use case
requirements (as required by Hammerspace) but still see fit to
reengineer the feature in terms of pNFS.
Hammerspace simply wants to optimize an NFS client on a host
connecting to an NFSD running in a container on the same host. It is
doing that in service to its distributed namespace that it is hosting
in disparate NFSD instances. But the "data movers" are a sideband
remapping/reshaping/rebalancing service. Completely disjoint from the
distributed pNFS namespace that the primary (flexfiles) clients are
accessing.
> We're asking for a real problem statement and use case,
> in detail, in the design document, not in email.
>
> (Go read the requests in this thread again. Honest, that's
> all we're asking for).
I'll consult with Trond to try to see how he suggests appeasing your
request more than I already have. But you really are departing
heavily from the narrow scope that localio covers.
> ---
>
> But finally: We've asked repeatedly for very typical
> changes and answers, and though sometimes we're met
> with a positive response, other times we get a defensive
> response, or "I don't feel like it" or "that's busy work"
> or "you're wasting my time." That doesn't sound like the
> spirit of co-operation that I would like to see from a
> regular contributor, nor do I expect it from someone who
> is also a Linux kernel maintainer who really ought to
> know better.
Projecting attributes and actions onto me doesn't make them true. I
haven't done or said most of the things you just asserted there.
I pushed back on tracepoints, but in the end I removed all the
dprintk()s from the NFSD code.
> So heed Christoph's excellent advice: go eat a Snickers.
> Calm down. Breathe. None of the rest of us are anywhere
> near as upset about this as you are right now.
Your words say otherwise; they have become quite negatively charged
in the last week, but please let's just move on. I'm serious, no
lasting harm done from my vantage point, I can move on if you can.
Thanks,
Mike
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 6:42 ` Christoph Hellwig
2024-07-06 17:15 ` Chuck Lever III
@ 2024-07-08 4:03 ` NeilBrown
2024-07-08 9:37 ` Christoph Hellwig
1 sibling, 1 reply; 77+ messages in thread
From: NeilBrown @ 2024-07-08 4:03 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Sat, 06 Jul 2024, Christoph Hellwig wrote:
> On Sat, Jul 06, 2024 at 04:37:22PM +1000, NeilBrown wrote:
> > > a different scheme for bypassing the server for I/O. Maybe there is
> > > a really good killer argument for doing that, but it needs to be clearly
> > > stated and defended instead of assumed.
> >
> > Could you provide a reference to the text book - or RFC - that describes
> > a pNFS DS protocol that completely bypasses the network, allowing the
> > client and MDS to determine if they are the same host and to potentially
> > do zero-copy IO.
>
> I did not say that we have the exact same functionality available and
> there is no work to do at all, just that it is the standard way to bypass
> the server.
Sometimes what you don't say is important. As you acknowledge there is
work to do. Understanding how much work is involved is critical to
understanding that possible direction.
>
> RFC 5662, RFC 5663 and RFC 8154 specify layouts that completely bypass
> the network and require the client and server to find out that they talk
> to the same storage device, and directly perform zero-copy I/O.
> They do not require them to be on the same host, though.
Thanks.
>
> > If not, I will find it hard to understand your claim that it is "the
> > text book example".
>
> pNFS is all about handing out grants to bypass the server for I/O.
> That is exactly what localio is doing.
Yes, there is clearly an alignment.
But pNFS is about handing out grants using standardised protocols that
support interoperability between distinct nodes, and possibly distinct
implementations. localio doesn't need any of that. It all exists in a
single implementation on a single node. So in that sense there can be
expected to be different priorities.
Why should we pay the costs of pNFS when implementing localio? That
question can only be answered if we have a good understanding of the
costs and benefits. And that requires having a concrete proposal for
the "pNFS" option - if only a detailed sketch.
Just because pNFS could be part of the answer (which I don't dispute)
that doesn't necessarily mean it should be part of the answer.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 17:15 ` Chuck Lever III
@ 2024-07-08 4:10 ` NeilBrown
2024-07-08 14:41 ` Chuck Lever III
2024-07-08 9:40 ` Christoph Hellwig
1 sibling, 1 reply; 77+ messages in thread
From: NeilBrown @ 2024-07-08 4:10 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Sun, 07 Jul 2024, Chuck Lever III wrote:
>
> Both the Linux NFS client and server implement RFC 8154
> well enough that this could be an alternative or even a
> better solution than LOCALIO. The server stores an XFS
> file system on the devices, and hands out layouts with
> the device ID and LBAs of the extents where file content
> is located.
>
> The fly in this ointment is the need for NFSv3 support.
Another fly in this ointment is that only XFS currently implements that
.map_blocks export_operation, so only it could be used as a server-side
filesystem.
Maybe that would not be a barrier to Mike, but it does make it a lot
less interesting to me (not that I have a particular use case in mind,
but I just find "local bypass for NFSv4.1+ on XFS" less interesting than
"local bypass for NFS on Linux").
But my interest isn't a requirement of course.
>
> In an earlier email Mike mentioned that Hammerspace isn't
> interested in providing a centrally managed directory of
> block devices that could be utilized by the MDS to simply
> inform the client of local devices. I don't think that's
> the only possible solution for discovering the locality of
> storage devices.
Could you sketch out an alternate solution so that it can be assessed
objectively?
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-08 4:03 ` NeilBrown
@ 2024-07-08 9:37 ` Christoph Hellwig
2024-07-10 0:10 ` NeilBrown
0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-08 9:37 UTC (permalink / raw)
To: NeilBrown
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Mon, Jul 08, 2024 at 02:03:02PM +1000, NeilBrown wrote:
> > I did not say that we have the exact same functionality available and
> > there is no work to do at all, just that it is the standard way to bypass
> > the server.
>
> Sometimes what you don't say is important. As you acknowledge there is
> work to do. Understanding how much work is involved is critical to
> understanding that possible direction.
Of course there is. I've never said we don't need to do any work,
I'm just asking why we are not using the existing infrastructure to do
it.
> But pNFS is about handing out grants using standardised protocols that
> support interoperability between distinct nodes, and possibly distinct
> implementations. localio doesn't need any of that. It all exists in a
> single implementation on a single node. So in that sense there can be
> expected to be different priorities.
>
> Why should we pay the costs of pNFS when implementing localio?
Why do you think we pay a cost for it? From all I can tell it makes
the job simpler, especially if we want to do things like bypassing
the second page cache.
> That
> question can only be answered if we have a good understanding of the
> costs and benefits. And that requires having a concrete proposal for
> the "pNFS" option - if only a detailed sketch.
I sketched the very sketchy sketch earlier - add a new localio
layout that does local file I/O. The I/O side of that is pretty
trivial, and maybe I can find some time to write draft code. The file
open side is just as horrible as in the current localio proposal,
and I could just reuse that for now, although I think the concept
of opening the file in the client context is fundamentally wrong
no matter how we skin the cat.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 17:15 ` Chuck Lever III
2024-07-08 4:10 ` NeilBrown
@ 2024-07-08 9:40 ` Christoph Hellwig
1 sibling, 0 replies; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-08 9:40 UTC (permalink / raw)
To: Chuck Lever III
Cc: Christoph Hellwig, Neil Brown, Mike Snitzer, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Sat, Jul 06, 2024 at 05:15:08PM +0000, Chuck Lever III wrote:
> In an earlier email Mike mentioned that Hammerspace isn't
> interested in providing a centrally managed directory of
> block devices that could be utilized by the MDS to simply
> inform the client of local devices. I don't think that's
> the only possible solution for discovering the locality of
> storage devices.
Btw, no matter what Hammerspace feels (and why that matters for Linux),
the block layout is not a good idea for bypassing the network between
supposedly isolated containers. It completely bypasses the NFS
security model, which is a spectacularly bad idea if the clients aren't
fully trusted. I've mentioned this before, but I absolutely do not
advocate for using the block layout as a network bypass here.
The concept to do local file I/O from the client in cases where we
can do it is absolutely sensible. I just don't think doing it in
a magic unmanaged layer is a good idea, and figuring out how to
pass the opened file from nfsd to the client without risking security
problems and creating painful layering violations needs to be solved.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-06 13:12 ` Mike Snitzer
@ 2024-07-08 9:46 ` Christoph Hellwig
2024-07-08 12:55 ` Mike Snitzer
0 siblings, 1 reply; 77+ messages in thread
From: Christoph Hellwig @ 2024-07-08 9:46 UTC (permalink / raw)
To: Mike Snitzer
Cc: Christoph Hellwig, Chuck Lever III, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Neil Brown, Dave Chinner
On Sat, Jul 06, 2024 at 09:12:44AM -0400, Mike Snitzer wrote:
> If you'd like reasonable people to not get upset with you, you might
> try treating them with respect rather than gaslighting them.
Let me just repeat the calm-down offer. I'll do my side of it by
not replying to these threads for this week; after that
I'll forget all the accusations you're throwing at me, and we restart
a purely technical discussion, ok?
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-08 9:46 ` Christoph Hellwig
@ 2024-07-08 12:55 ` Mike Snitzer
0 siblings, 0 replies; 77+ messages in thread
From: Mike Snitzer @ 2024-07-08 12:55 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Chuck Lever III, Jeff Layton, Linux NFS Mailing List,
Anna Schumaker, Trond Myklebust, Neil Brown, Dave Chinner
On Mon, Jul 08, 2024 at 02:46:28AM -0700, Christoph Hellwig wrote:
> On Sat, Jul 06, 2024 at 09:12:44AM -0400, Mike Snitzer wrote:
> > If you'd like reasonable people to not get upset with you, you might
> > try treating them with respect rather than gaslighting them.
>
> Let me just repeat the calm-down offer. I'll do my side of it by
> not replying to these threads for this week; after that
> I'll forget all the accusations you're throwing at me, and we restart
> a purely technical discussion, ok?
OK
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-08 4:10 ` NeilBrown
@ 2024-07-08 14:41 ` Chuck Lever III
0 siblings, 0 replies; 77+ messages in thread
From: Chuck Lever III @ 2024-07-08 14:41 UTC (permalink / raw)
To: Neil Brown
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
> On Jul 8, 2024, at 12:10 AM, NeilBrown <neilb@suse.de> wrote:
>
> On Sun, 07 Jul 2024, Chuck Lever III wrote:
>>
>> Both the Linux NFS client and server implement RFC 8154
>> well enough that this could be an alternative or even a
>> better solution than LOCALIO. The server stores an XFS
>> file system on the devices, and hands out layouts with
>> the device ID and LBAs of the extents where file content
>> is located.
>>
>> The fly in this ointment is the need for NFSv3 support.
>
> Another fly in this ointment is that only XFS currently implements that
> .map_blocks export_operation, so only it could be used as a server-side
> filesystem.
I agree that limiting loopback acceleration only to
XFS exports is an artificial and undesirable
constraint.
> Maybe that would not be a barrier to Mike, but it does make it a lot
> less interesting to me (not that I have a particular use case in mind,
> but I just find "local bypass for NFSv4.1+ on XFS" less interesting than
> "local bypass for NFS on Linux").
>
> But my interest isn't a requirement of course.
My focus for now is ensuring that whatever is merged
into NFSD can be audited, in a straightforward and
rigorous way, to give us confidence that it is secure
and respects the existing policies and boundaries that
are configured via export authorization and local
namespace settings.
The pNFS-based approaches have the advantage that a lot
of that auditing has already been done, and by a wider
set of reviewers than just the Linux community. That's
my only interest in possibly excluding NFSv3; I have no
other quibble with the high-level goal of supporting
NFSv3 for loopback acceleration.
Thanks everyone for a spirited and frank discussion.
>> In an earlier email Mike mentioned that Hammerspace isn't
>> interested in providing a centrally managed directory of
>> block devices that could be utilized by the MDS to simply
>> inform the client of local devices. I don't think that's
>> the only possible solution for discovering the locality of
>> storage devices.
>
> Could you sketch out an alternate solution so that it can be assessed
> objectively?
I'll give it some thought.
--
Chuck Lever
^ permalink raw reply [flat|nested] 77+ messages in thread
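[As a purely conceptual aside on the RFC 8154 block-layout idea discussed
above: the layout a server hands out is essentially a set of extents that
map ranges of a file to (device ID, LBA) ranges on a block device the
client can reach directly. A toy sketch follows; the names, byte-addressed
LBAs, and flat list structure are invented for clarity and are not the
on-the-wire XDR encoding or any kernel API.]

```python
# Toy model of a pNFS block-layout extent mapping (RFC 8154 concept only).
# All names are illustrative; real extents are sector-addressed and
# XDR-encoded, and carry a state (read/write/invalid/none-data).
from dataclasses import dataclass

@dataclass
class BlockExtent:
    device_id: int    # identifies a block device the client has discovered
    file_offset: int  # byte offset within the file
    lba: int          # starting address on the device (bytes, for simplicity)
    length: int       # extent length in bytes

def resolve(extents, offset):
    """Return the (device_id, device_address) backing a file offset,
    or None if the offset falls outside every extent (a hole, or a
    range not covered by this layout)."""
    for ext in extents:
        if ext.file_offset <= offset < ext.file_offset + ext.length:
            return ext.device_id, ext.lba + (offset - ext.file_offset)
    return None

# A file whose first 8 KiB and next 4 KiB live in two separate extents.
layout = [
    BlockExtent(device_id=1, file_offset=0,    lba=4096,  length=8192),
    BlockExtent(device_id=1, file_offset=8192, lba=65536, length=4096),
]

print(resolve(layout, 10000))  # offset 10000 falls in the second extent
```

With such a mapping in hand, a client holding the layout can read the
extent bytes straight off the device, bypassing the server's data path;
the security concern raised in this thread is precisely that nothing in
the block path re-checks NFS export policy on those raw device reads.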
* Re: [PATCH v11 00/20] nfs/nfsd: add support for localio
2024-07-08 9:37 ` Christoph Hellwig
@ 2024-07-10 0:10 ` NeilBrown
0 siblings, 0 replies; 77+ messages in thread
From: NeilBrown @ 2024-07-10 0:10 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Christoph Hellwig, Mike Snitzer, Jeff Layton, Chuck Lever III,
Linux NFS Mailing List, Anna Schumaker, Trond Myklebust,
Dave Chinner
On Mon, 08 Jul 2024, Christoph Hellwig wrote:
> On Mon, Jul 08, 2024 at 02:03:02PM +1000, NeilBrown wrote:
> > > I did not say that we have the exact same functionality available and
> > > there is no work to do at all, just that it is the standard way to bypass
> > > the server.
> >
> > Sometimes what you don't say is important. As you acknowledge there is
> > work to do. Understanding how much work is involved is critical to
> > understanding that possible direction.
>
> Of course there is. I've never said we don't need to do any work,
> I'm just asking why we are not using the existing infrastructure to do
> it.
>
> > But pNFS is about handing out grants using standardised protocols that
> > support interoperability between distinct nodes, and possibly distinct
> > implementations. localio doesn't need any of that. It all exists in a
> > single implementation on a single node. So in that sense there can be
> > expected to be different priorities.
> >
> > Why should we pay the costs of pNFS when implementing localio?
>
> Why do you think we pay a cost for it? From all I can tell it makes
> the job simpler, especially if we want to do things like bypassing
> the second page cache.
>
> > That
> > question can only be answered if we have a good understanding of the
> > costs and benefits. And that requires having a concrete proposal for
> > the "pNFS" option - if only a detailed sketch.
>
> I sketched the very sketchy sketch earlier - add a new localio
> layout that does local file I/O. The I/O side of that is pretty
> trivial, and maybe I can find some time to write draft code. The file
> open side is just as horrible as in the current localio proposal,
> and I could just reuse that for now, although I think the concept
> of opening the file in the client context is fundamentally wrong
> no matter how we skin the cat.
>
I had been assuming that you were proposing a new pNFS layout type with
associated protocol extension, an RFC describing them, and navigation of
the IETF standards process. These are (some of) the costs I was
thinking of. Of course the IETF requires demonstration of
interoperability between multiple implementations, and as that is
impossible for localio, we would fail before we started.
But I now suspect that I guessed your intention wrongly (I'm rubbish at
guessing other people's intentions). Your use of the word
"infrastructure" above and the sketchy sketch you provide (thanks) seem
to indicate you are only suggesting that we re-use some of the pnfs
abstractions and interfaces already implemented in the Linux NFS client
and server.
Is that what you mean?
If it is, then it isn't immediately clear to me that this would have to
be NFSv4 only. The different versions already share code where that
makes sense. Moving the pnfs code from the nfsv4 module to a new pnfs
module which nfsv3 also depends on might make sense.
I'm keen to know what you are really thinking.
Thanks,
NeilBrown
^ permalink raw reply [flat|nested] 77+ messages in thread
end of thread, other threads:[~2024-07-10 0:10 UTC | newest]
Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-02 16:28 [PATCH v11 00/20] nfs/nfsd: add support for localio Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 01/20] SUNRPC: add rpcauth_map_to_svc_cred_local Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 02/20] nfs: factor out {encode,decode}_opaque_fixed to nfs_xdr.h Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 03/20] nfs_common: add NFS LOCALIO auxiliary protocol enablement Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 04/20] nfsd: add "localio" support Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 05/20] nfsd: add Kconfig options to allow localio to be enabled Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 06/20] nfsd: manage netns reference in nfsd_open_local_fh Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 07/20] nfsd: use percpu_ref to interlock nfsd_destroy_serv and nfsd_open_local_fh Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 08/20] nfsd: implement server support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 09/20] SUNRPC: replace program list with program array Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 10/20] nfs: pass nfs_client to nfs_initiate_pgio Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 11/20] nfs: pass descriptor thru nfs_initiate_pgio path Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 12/20] nfs: pass struct file to nfs_init_pgio and nfs_init_commit Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 13/20] nfs: add "localio" support Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 14/20] nfs: fix nfs_localio_vfs_getattr() to properly support v4 Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 15/20] nfs: enable localio for non-pNFS I/O Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 16/20] pnfs/flexfiles: enable localio for flexfiles I/O Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 17/20] SUNRPC: remove call_allocate() BUG_ON if p_arglen=0 to allow RPC with void arg Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 18/20] nfs/localio: use dedicated workqueues for filesystem read and write Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 19/20] nfs: implement client support for NFS_LOCALIO_PROGRAM Mike Snitzer
2024-07-02 16:28 ` [PATCH v11 20/20] nfs: add Documentation/filesystems/nfs/localio.rst Mike Snitzer
2024-07-02 18:06 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Chuck Lever III
2024-07-02 18:32 ` Mike Snitzer
2024-07-02 20:10 ` Chuck Lever III
2024-07-03 0:57 ` Mike Snitzer
2024-07-03 0:52 ` NeilBrown
2024-07-03 1:13 ` Mike Snitzer
2024-07-03 5:04 ` Christoph Hellwig
2024-07-03 8:52 ` Mike Snitzer
2024-07-03 14:16 ` Christoph Hellwig
2024-07-03 15:11 ` Mike Snitzer
2024-07-03 15:18 ` Christoph Hellwig
2024-07-03 15:24 ` Chuck Lever III
2024-07-03 15:29 ` Christoph Hellwig
2024-07-03 15:36 ` Mike Snitzer
2024-07-03 17:06 ` Jeff Layton
2024-07-04 6:00 ` Christoph Hellwig
2024-07-04 18:31 ` Mike Snitzer
2024-07-05 5:18 ` Christoph Hellwig
2024-07-05 13:35 ` Chuck Lever III
2024-07-05 13:39 ` Christoph Hellwig
2024-07-05 14:15 ` Mike Snitzer
2024-07-05 14:18 ` Christoph Hellwig
2024-07-05 14:36 ` Mike Snitzer
2024-07-05 14:59 ` Chuck Lever III
2024-07-06 3:58 ` Mike Snitzer
2024-07-06 5:52 ` NeilBrown
2024-07-06 13:05 ` "why NFSv3?" [was: Re: [PATCH v11 00/20] nfs/nfsd: add support for localio] Mike Snitzer
2024-07-06 5:58 ` [PATCH v11 00/20] nfs/nfsd: add support for localio Christoph Hellwig
2024-07-06 13:12 ` Mike Snitzer
2024-07-08 9:46 ` Christoph Hellwig
2024-07-08 12:55 ` Mike Snitzer
2024-07-06 16:58 ` Chuck Lever III
2024-07-07 0:42 ` Mike Snitzer
2024-07-05 18:59 ` Jeff Layton
2024-07-05 22:08 ` NeilBrown
2024-07-06 6:02 ` Christoph Hellwig
2024-07-06 6:37 ` NeilBrown
2024-07-06 6:42 ` Christoph Hellwig
2024-07-06 17:15 ` Chuck Lever III
2024-07-08 4:10 ` NeilBrown
2024-07-08 14:41 ` Chuck Lever III
2024-07-08 9:40 ` Christoph Hellwig
2024-07-08 4:03 ` NeilBrown
2024-07-08 9:37 ` Christoph Hellwig
2024-07-10 0:10 ` NeilBrown
2024-07-03 17:19 ` Chuck Lever III
2024-07-03 19:04 ` Mike Snitzer
2024-07-04 5:55 ` Christoph Hellwig
2024-07-03 21:35 ` NeilBrown
2024-07-04 6:01 ` Christoph Hellwig
2024-07-04 10:13 ` Jeff Layton
2024-07-03 15:26 ` Chuck Lever III
2024-07-03 15:37 ` Mike Snitzer
2024-07-03 15:16 ` Christoph Hellwig
2024-07-03 15:28 ` Mike Snitzer
2024-07-04 5:49 ` Christoph Hellwig
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox