* [PATCH v14-plus 00/25] Address netns refcount issues for localio
@ 2024-08-30 2:20 NeilBrown
2024-08-30 2:20 ` [PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement NeilBrown
` (7 more replies)
0 siblings, 8 replies; 10+ messages in thread
From: NeilBrown @ 2024-08-30 2:20 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever, Jeff Layton; +Cc: linux-nfs
Following are revised versions of 6 patches from the v14 localio series.
The issue addressed is net namespace refcounting.
We don't want to keep a long-term counted reference in the client
because that prevents a server container from completely shutting down.
So we avoid taking a reference at all and rely on the per-cpu reference
to the server being sufficient to keep the net-ns active. This involves
allowing the net-ns exit code to iterate all active clients and clear
their ->net pointers (which they need to find the per-cpu-refcount for
the nfs_serv).
So:
- embed nfs_uuid_t in nfs_client. This provides a list_head that can
be used to find the client. It does add the actual uuid to nfs_client
so it is bigger than needed. If that is really a problem we can find
a fix.
- When the nfs server confirms that the uuid is local, it moves the
nfs_uuid_t onto a per-net-ns list.
- When the net-ns is shutting down - in a "pre_exit" handler, all these
nfS_uuid_t have their ->net cleared. There is an rcu_synchronize()
call between pre_exit() handlers and exit() handlers so and caller
that sees ->net as not NULL can safely check the ->counter
- We now pass the nfs_uuid_t to nfsd_open_local_fh() so it can safely
look at ->net in a private rcu_read_lock() section.
I have compile tested this code but nothing more.
Thanks,
NeilBrown
[PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol
[PATCH 15/25] nfs_common: introduce nfs_localio_ctx struct and
[PATCH 16/25] nfsd: add localio support
[PATCH 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM
[PATCH 19/25] nfs: add localio support
[PATCH 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
@ 2024-08-30 2:20 ` NeilBrown
2024-08-30 2:20 ` [PATCH 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces NeilBrown
` (6 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2024-08-30 2:20 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever, Jeff Layton; +Cc: linux-nfs
From: Mike Snitzer <snitzer@kernel.org>
fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client
to generate a nonce (single-use UUID) and associated nfs_uuid_t struct,
register it with nfs_common for subsequent lookup and verification by
the NFS server and if matched the NFS server populates members in the
nfs_uuid_t struct.
When the server populates the structure, it is moved onto a private list
allowing the server to invalidate the ->net pointer when the name space
is shut down. A maybe_get_net() call under rcu_read_lock() can be used
to stablish the net pointer active use.
nfs_common's nfs_uuids list is the basis for localio enablement, as
such it has members that point to nfsd memory for direct use by the
client (e.g. 'net' is the server's network namespace, through it the
client can access nn->nfsd_serv with proper rcu read access).
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/Makefile | 3 ++
fs/nfs_common/nfslocalio.c | 108 +++++++++++++++++++++++++++++++++++++
include/linux/nfslocalio.h | 34 ++++++++++++
3 files changed, 145 insertions(+)
create mode 100644 fs/nfs_common/nfslocalio.c
create mode 100644 include/linux/nfslocalio.h
diff --git a/fs/nfs_common/Makefile b/fs/nfs_common/Makefile
index e58b01bb8dda..a5e54809701e 100644
--- a/fs/nfs_common/Makefile
+++ b/fs/nfs_common/Makefile
@@ -6,6 +6,9 @@
obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
nfs_acl-objs := nfsacl.o
+obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
+nfs_localio-objs := nfslocalio.o
+
obj-$(CONFIG_GRACE_PERIOD) += grace.o
obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
new file mode 100644
index 000000000000..f78cf99e2547
--- /dev/null
+++ b/fs/nfs_common/nfslocalio.c
@@ -0,0 +1,108 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/rculist.h>
+#include <linux/nfslocalio.h>
+#include <net/netns/generic.h>
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("NFS localio protocol bypass support");
+
+static DEFINE_SPINLOCK(nfs_uuid_lock);
+
+/*
+ * Global list of nfs_uuid_t instances, add/remove
+ * is protected by nfs_uuid_lock.
+ * Reads are protected by RCU read lock (see below).
+ */
+LIST_HEAD(nfs_uuids);
+
+void nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
+{
+ nfs_uuid->net = NULL;
+ nfs_uuid->dom = NULL;
+ uuid_gen(&nfs_uuid->uuid);
+
+ spin_lock(&nfs_uuid_lock);
+ list_add_tail_rcu(&nfs_uuid->list, &nfs_uuids);
+ spin_unlock(&nfs_uuid_lock);
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_begin);
+
+void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
+{
+ if (nfs_uuid->net == NULL) {
+ spin_lock(&nfs_uuid_lock);
+ list_del_init(&nfs_uuid->list);
+ spin_unlock(&nfs_uuid_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_end);
+
+/* Must be called with RCU read lock held. */
+static nfs_uuid_t * nfs_uuid_lookup(const uuid_t *uuid)
+{
+ nfs_uuid_t *nfs_uuid;
+
+ list_for_each_entry(nfs_uuid, &nfs_uuids, list)
+ if (uuid_equal(&nfs_uuid->uuid, uuid))
+ return nfs_uuid;
+
+ return NULL;
+}
+
+void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
+ struct net *net, struct auth_domain *dom)
+{
+ nfs_uuid_t *nfs_uuid;
+
+ spin_lock(&nfs_uuid_lock);
+ nfs_uuid = nfs_uuid_lookup(uuid);
+ if (nfs_uuid) {
+ kref_get(&dom->ref);
+ nfs_uuid->dom = dom;
+ /* We don't hold a ref on the net, but instead put
+ * ourselves on a list so the net pointer can be
+ * invalidated.
+ */
+ list_move(&nfs_uuid->list, list);
+ nfs_uuid->net = net;
+ }
+ spin_unlock(&nfs_uuid_lock);
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
+
+static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
+{
+ if (nfs_uuid->net)
+ put_net(nfs_uuid->net);
+ nfs_uuid->net = NULL;
+ if (nfs_uuid->dom)
+ auth_domain_put(nfs_uuid->dom);
+ nfs_uuid->dom = NULL;
+ list_del_init(&nfs_uuid->list);
+}
+
+void nfs_uuid_invalidate_clients(struct list_head *list)
+{
+ nfs_uuid_t *nfs_uuid, *tmp;
+
+ spin_lock(&nfs_uuid_lock);
+ list_for_each_entry_safe(nfs_uuid, tmp, list, list)
+ nfs_uuid_put_locked(nfs_uuid);
+ spin_unlock(&nfs_uuid_lock);
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_clients);
+
+void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
+{
+ if (nfs_uuid->net) {
+ spin_lock(&nfs_uuid_lock);
+ nfs_uuid_put_locked(nfs_uuid);
+ spin_unlock(&nfs_uuid_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
new file mode 100644
index 000000000000..a8216a777b40
--- /dev/null
+++ b/include/linux/nfslocalio.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#ifndef __LINUX_NFSLOCALIO_H
+#define __LINUX_NFSLOCALIO_H
+
+#include <linux/list.h>
+#include <linux/uuid.h>
+#include <linux/sunrpc/svcauth.h>
+#include <linux/nfs.h>
+#include <net/net_namespace.h>
+
+/*
+ * Useful to allow a client to negotiate if localio
+ * possible with its server.
+ *
+ * See Documentation/filesystems/nfs/localio.rst for more detail.
+ */
+typedef struct {
+ uuid_t uuid;
+ struct list_head list;
+ struct net *net; /* nfsd's network namespace */
+ struct auth_domain *dom; /* auth_domain for localio */
+} nfs_uuid_t;
+
+void nfs_uuid_begin(nfs_uuid_t *);
+void nfs_uuid_end(nfs_uuid_t *);
+void nfs_uuid_is_local(const uuid_t *, struct list_head *,
+ struct net *, struct auth_domain *);
+void nfs_uuid_invalidate_clients(struct list_head *list);
+void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
+
+#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
2024-08-30 2:20 ` [PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement NeilBrown
@ 2024-08-30 2:20 ` NeilBrown
2024-08-30 2:20 ` [PATCH 16/25] nfsd: add localio support NeilBrown
` (5 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2024-08-30 2:20 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever, Jeff Layton; +Cc: linux-nfs
From: Mike Snitzer <snitzer@kernel.org>
Introduce struct nfs_localio_ctx and the interfaces
nfs_localio_ctx_alloc() and nfs_localio_ctx_free(). The next commit
will introduce nfsd_open_local_fh() which returns a nfs_localio_ctx
structure.
Also, expose localio's required NFSD symbols to NFS client:
- Cache nfsd_open_local_fh symbol and other required NFSD symbols in a
globally accessible 'nfs_to' nfs_to_nfsd_t struct. Add interfaces
get_nfs_to_nfsd_symbols() and put_nfs_to_nfsd_symbols() to allow
each NFS client to take a reference on NFSD symbols.
- Apologies for the DEFINE_NFS_TO_NFSD_SYMBOL macro that makes
defining get_##NFSD_SYMBOL() and put_##NFSD_SYMBOL() functions far
simpler (and avoids cut-n-paste bugs, which is what motivated the
development and use of a macro for this). But as C macros go it is a
very simple one and there are many like it all over the kernel.
- Given the unique nature of NFS LOCALIO being an optional feature
that when used requires NFS share access to NFSD memory: a unique
bridging of NFSD resources to NFS (via nfs_common) is needed. But
that bridge must be dynamic, hence the use of symbol_request() and
symbol_put(). Proposed ideas to accomolish the same without using
symbol_{request,put} would be far more tedious to implement and
very likely no easier to review. Anyway: sorry NeilBrown...
- Despite the use of indirect function calls, caching these nfsd
symbols for use by the client offers a ~10% performance win
(compared to always doing get+call+put) for high IOPS workloads.
- Introduce nfsd_file_file() wrapper that provides access to
nfsd_file's backing file. Keeps nfsd_file structure opaque to NFS
client (as suggested by Jeff Layton).
- The addition of nfsd_file_get, nfsd_file_put and nfsd_file_file
symbols prepares for the NFS client to use nfsd_file for localio.
Suggested-by: Trond Myklebust <trond.myklebust@hammerspace.com> # nfs_to
Suggested-by: Jeff Layton <jlayton@kernel.org> # nfsd_file_file
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs_common/nfslocalio.c | 159 +++++++++++++++++++++++++++++++++++++
fs/nfsd/filecache.c | 25 ++++++
fs/nfsd/filecache.h | 1 +
fs/nfsd/nfssvc.c | 5 ++
include/linux/nfslocalio.h | 38 +++++++++
5 files changed, 228 insertions(+)
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index f78cf99e2547..8545ee75f756 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -106,3 +106,162 @@ void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
}
}
EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
+
+/*
+ * The nfs localio code needs to call into nfsd using various symbols (below),
+ * but cannot be statically linked, because that will make the nfs module
+ * depend on the nfsd module.
+ *
+ * Instead, do dynamic linking to the nfsd module (via nfs_common module). The
+ * nfs_common module will only hold a reference on nfsd when localio is in use.
+ * This allows some sanity checking, like giving up on localio if nfsd isn't loaded.
+ */
+static DEFINE_SPINLOCK(nfs_to_nfsd_lock);
+nfs_to_nfsd_t nfs_to;
+EXPORT_SYMBOL_GPL(nfs_to);
+
+/* Macro to define nfs_to get and put methods, avoids copy-n-paste bugs */
+#define DEFINE_NFS_TO_NFSD_SYMBOL(NFSD_SYMBOL) \
+static nfs_to_##NFSD_SYMBOL##_t get_##NFSD_SYMBOL(void) \
+{ \
+ return symbol_request(NFSD_SYMBOL); \
+} \
+static void put_##NFSD_SYMBOL(void) \
+{ \
+ symbol_put(NFSD_SYMBOL); \
+ nfs_to.NFSD_SYMBOL = NULL; \
+}
+
+/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
+extern struct nfs_localio_ctx *
+nfsd_open_local_fh(nfs_uuid_t *, struct rpc_clnt *,
+ const struct cred *, const struct nfs_fh *, const fmode_t);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
+
+/* The nfs localio code needs to call into nfsd to acquire the nfsd_file */
+extern struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_get);
+
+/* The nfs localio code needs to call into nfsd to release the nfsd_file */
+extern void nfsd_file_put(struct nfsd_file *nf);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_put);
+
+/* The nfs localio code needs to call into nfsd to access the nf->nf_file */
+extern struct file * nfsd_file_file(struct nfsd_file *nf);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_file_file);
+
+/* The nfs localio code needs to call into nfsd to release nn->nfsd_serv */
+extern void nfsd_serv_put(struct nfsd_net *nn);
+DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_serv_put);
+#undef DEFINE_NFS_TO_NFSD_SYMBOL
+
+static struct kmem_cache *nfs_localio_ctx_cache;
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void)
+{
+ return kmem_cache_alloc(nfs_localio_ctx_cache,
+ GFP_KERNEL | __GFP_ZERO);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_alloc);
+
+void nfs_localio_ctx_free(struct nfs_localio_ctx *localio)
+{
+ if (localio->nf)
+ nfs_to.nfsd_file_put(localio->nf);
+ if (localio->nn)
+ nfs_to.nfsd_serv_put(localio->nn);
+ kmem_cache_free(nfs_localio_ctx_cache, localio);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_ctx_free);
+
+bool get_nfs_to_nfsd_symbols(void)
+{
+ spin_lock(&nfs_to_nfsd_lock);
+
+ /* Only get symbols on first reference */
+ if (refcount_read(&nfs_to.ref) == 0)
+ refcount_set(&nfs_to.ref, 1);
+ else {
+ refcount_inc(&nfs_to.ref);
+ spin_unlock(&nfs_to_nfsd_lock);
+ return true;
+ }
+
+ nfs_to.nfsd_open_local_fh = get_nfsd_open_local_fh();
+ if (!nfs_to.nfsd_open_local_fh)
+ goto out_nfsd_open_local_fh;
+
+ nfs_to.nfsd_file_get = get_nfsd_file_get();
+ if (!nfs_to.nfsd_file_get)
+ goto out_nfsd_file_get;
+
+ nfs_to.nfsd_file_put = get_nfsd_file_put();
+ if (!nfs_to.nfsd_file_put)
+ goto out_nfsd_file_put;
+
+ nfs_to.nfsd_file_file = get_nfsd_file_file();
+ if (!nfs_to.nfsd_file_file)
+ goto out_nfsd_file_file;
+
+ nfs_to.nfsd_serv_put = get_nfsd_serv_put();
+ if (!nfs_to.nfsd_serv_put)
+ goto out_nfsd_serv_put;
+
+ spin_unlock(&nfs_to_nfsd_lock);
+ return true;
+
+out_nfsd_serv_put:
+ put_nfsd_file_file();
+out_nfsd_file_file:
+ put_nfsd_file_put();
+out_nfsd_file_put:
+ put_nfsd_file_get();
+out_nfsd_file_get:
+ put_nfsd_open_local_fh();
+out_nfsd_open_local_fh:
+ spin_unlock(&nfs_to_nfsd_lock);
+ return false;
+}
+EXPORT_SYMBOL_GPL(get_nfs_to_nfsd_symbols);
+
+void put_nfs_to_nfsd_symbols(void)
+{
+ spin_lock(&nfs_to_nfsd_lock);
+
+ if (!refcount_dec_and_test(&nfs_to.ref))
+ goto out;
+
+ put_nfsd_open_local_fh();
+ put_nfsd_file_get();
+ put_nfsd_file_put();
+ put_nfsd_file_file();
+ put_nfsd_serv_put();
+out:
+ spin_unlock(&nfs_to_nfsd_lock);
+}
+EXPORT_SYMBOL_GPL(put_nfs_to_nfsd_symbols);
+
+static int __init nfslocalio_init(void)
+{
+ refcount_set(&nfs_to.ref, 0);
+
+ nfs_to.nfsd_open_local_fh = NULL;
+ nfs_to.nfsd_file_get = NULL;
+ nfs_to.nfsd_file_put = NULL;
+ nfs_to.nfsd_file_file = NULL;
+ nfs_to.nfsd_serv_put = NULL;
+
+ nfs_localio_ctx_cache = KMEM_CACHE(nfs_localio_ctx, 0);
+ if (!nfs_localio_ctx_cache)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void __exit nfslocalio_exit(void)
+{
+ kmem_cache_destroy(nfs_localio_ctx_cache);
+}
+
+module_init(nfslocalio_init);
+module_exit(nfslocalio_exit);
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index 2dc72de31f61..a83d469bca6b 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -39,6 +39,7 @@
#include <linux/fsnotify.h>
#include <linux/seq_file.h>
#include <linux/rhashtable.h>
+#include <linux/nfslocalio.h>
#include "vfs.h"
#include "nfsd.h"
@@ -345,6 +346,10 @@ nfsd_file_get(struct nfsd_file *nf)
return nf;
return NULL;
}
+EXPORT_SYMBOL_GPL(nfsd_file_get);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_file_get_t __maybe_unused nfsd_file_get_typecheck = nfsd_file_get;
/**
* nfsd_file_put - put the reference to a nfsd_file
@@ -389,6 +394,26 @@ nfsd_file_put(struct nfsd_file *nf)
if (refcount_dec_and_test(&nf->nf_ref))
nfsd_file_free(nf);
}
+EXPORT_SYMBOL_GPL(nfsd_file_put);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_file_put_t __maybe_unused nfsd_file_put_typecheck = nfsd_file_put;
+
+/**
+ * nfsd_file_file - get the backing file of an nfsd_file
+ * @nf: nfsd_file of which to access the backing file.
+ *
+ * Return backing file for @nf.
+ */
+struct file *
+nfsd_file_file(struct nfsd_file *nf)
+{
+ return nf->nf_file;
+}
+EXPORT_SYMBOL_GPL(nfsd_file_file);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_file_file_t __maybe_unused nfsd_file_file_typecheck = nfsd_file_file;
static void
nfsd_file_dispose_list(struct list_head *dispose)
diff --git a/fs/nfsd/filecache.h b/fs/nfsd/filecache.h
index 26ada78b8c1e..6fbbb2e32e95 100644
--- a/fs/nfsd/filecache.h
+++ b/fs/nfsd/filecache.h
@@ -56,6 +56,7 @@ int nfsd_file_cache_start_net(struct net *net);
void nfsd_file_cache_shutdown_net(struct net *net);
void nfsd_file_put(struct nfsd_file *nf);
struct nfsd_file *nfsd_file_get(struct nfsd_file *nf);
+struct file *nfsd_file_file(struct nfsd_file *nf);
void nfsd_file_close_inode_sync(struct inode *inode);
void nfsd_file_net_dispose(struct nfsd_net *nn);
bool nfsd_file_is_cached(struct inode *inode);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index c639fbe4d8c2..13c69aa40d1c 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -19,6 +19,7 @@
#include <linux/sunrpc/svc_xprt.h>
#include <linux/lockd/bind.h>
#include <linux/nfsacl.h>
+#include <linux/nfslocalio.h>
#include <linux/seq_file.h>
#include <linux/inetdevice.h>
#include <net/addrconf.h>
@@ -201,6 +202,10 @@ void nfsd_serv_put(struct nfsd_net *nn)
{
percpu_ref_put(&nn->nfsd_serv_ref);
}
+EXPORT_SYMBOL_GPL(nfsd_serv_put);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_serv_put_t __maybe_unused nfsd_serv_put_typecheck = nfsd_serv_put;
static void nfsd_serv_done(struct percpu_ref *ref)
{
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index a8216a777b40..e196f716a2f5 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -7,6 +7,8 @@
#include <linux/list.h>
#include <linux/uuid.h>
+#include <linux/refcount.h>
+#include <linux/sunrpc/clnt.h>
#include <linux/sunrpc/svcauth.h>
#include <linux/nfs.h>
#include <net/net_namespace.h>
@@ -31,4 +33,40 @@ void nfs_uuid_is_local(const uuid_t *, struct list_head *,
void nfs_uuid_invalidate_clients(struct list_head *list);
void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
+struct nfsd_file;
+struct nfsd_net;
+
+struct nfs_localio_ctx {
+ struct nfsd_file *nf;
+ struct nfsd_net *nn;
+};
+
+typedef struct nfs_localio_ctx *
+(*nfs_to_nfsd_open_local_fh_t)(nfs_uuid_t *,
+ struct rpc_clnt *, const struct cred *,
+ const struct nfs_fh *, const fmode_t);
+typedef struct nfsd_file * (*nfs_to_nfsd_file_get_t)(struct nfsd_file *);
+typedef void (*nfs_to_nfsd_file_put_t)(struct nfsd_file *);
+typedef struct file * (*nfs_to_nfsd_file_file_t)(struct nfsd_file *);
+typedef unsigned int (*nfs_to_nfsd_net_id_value_t)(void);
+typedef void (*nfs_to_nfsd_serv_put_t)(struct nfsd_net *);
+
+typedef struct {
+ refcount_t ref;
+ nfs_to_nfsd_open_local_fh_t nfsd_open_local_fh;
+ nfs_to_nfsd_file_get_t nfsd_file_get;
+ nfs_to_nfsd_file_put_t nfsd_file_put;
+ nfs_to_nfsd_file_file_t nfsd_file_file;
+ nfs_to_nfsd_net_id_value_t nfsd_net_id_value;
+ nfs_to_nfsd_serv_put_t nfsd_serv_put;
+} nfs_to_nfsd_t;
+
+extern nfs_to_nfsd_t nfs_to;
+
+bool get_nfs_to_nfsd_symbols(void);
+void put_nfs_to_nfsd_symbols(void);
+
+struct nfs_localio_ctx *nfs_localio_ctx_alloc(void);
+void nfs_localio_ctx_free(struct nfs_localio_ctx *);
+
#endif /* __LINUX_NFSLOCALIO_H */
--
2.44.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 16/25] nfsd: add localio support
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
2024-08-30 2:20 ` [PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement NeilBrown
2024-08-30 2:20 ` [PATCH 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces NeilBrown
@ 2024-08-30 2:20 ` NeilBrown
2024-08-30 2:20 ` [PATCH 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM NeilBrown
` (4 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2024-08-30 2:20 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever, Jeff Layton; +Cc: linux-nfs
From: Weston Andros Adamson <dros@primarydata.com>
Add server support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when both the client and server are
running on the same host.
If nfsd_open_local_fh() fails then the NFS client will both retry and
fallback to normal network-based read, write and commit operations if
localio is no longer supported.
Care is taken to ensure the same NFS security mechanisms are used
(authentication, etc) regardless of whether localio or regular NFS
access is used. The auth_domain established as part of the traditional
NFS client access to the NFS server is also used for localio. Store
auth_domain for localio in nfsd_uuid_t and transfer it to the client
if it is local to the server.
Relative to containers, localio gives the client access to the network
namespace the server has. This is required to allow the client to
access the server's per-namespace nfsd_net struct.
CONFIG_NFSD_LOCALIO controls the server enablement for localio.
A later commit will add CONFIG_NFS_LOCALIO to allow the client
enablement.
This commit also introduces the use of nfsd's percpu_ref to interlock
nfsd_destroy_serv and nfsd_open_local_fh, to ensure nn->nfsd_serv is
not destroyed while in use by nfsd_open_local_fh, and warrants a more
detailed explanation:
nfsd_open_local_fh uses nfsd_serv_try_get before opening its file
handle and then the reference must be dropped by the caller using
nfsd_serv_put (via nfs_localio_ctx_free).
Verified to fix an easy to hit crash that would occur if an nfsd
instance running in a container, with a localio client mounted, is
shutdown. Upon restart of the container and associated nfsd the client
would go on to crash due to NULL pointer dereference that occurred due
to the nfs client's localio attempting to nfsd_open_local_fh(), using
nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/Kconfig | 3 ++
fs/nfsd/Kconfig | 16 +++++++
fs/nfsd/Makefile | 1 +
fs/nfsd/filecache.c | 2 +-
fs/nfsd/localio.c | 114 ++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/trace.h | 3 +-
fs/nfsd/vfs.h | 8 ++++
7 files changed, 145 insertions(+), 2 deletions(-)
create mode 100644 fs/nfsd/localio.c
diff --git a/fs/Kconfig b/fs/Kconfig
index a46b0cbc4d8f..1b8a5edbddff 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -377,6 +377,9 @@ config NFS_ACL_SUPPORT
tristate
select FS_POSIX_ACL
+config NFS_COMMON_LOCALIO_SUPPORT
+ bool
+
config NFS_COMMON
bool
depends on NFSD || NFS_FS || LOCKD
diff --git a/fs/nfsd/Kconfig b/fs/nfsd/Kconfig
index c0bd1509ccd4..e6fa7eaa1db0 100644
--- a/fs/nfsd/Kconfig
+++ b/fs/nfsd/Kconfig
@@ -90,6 +90,22 @@ config NFSD_V4
If unsure, say N.
+config NFSD_LOCALIO
+ bool "NFS server support for the LOCALIO auxiliary protocol"
+ depends on NFSD
+ select NFS_COMMON_LOCALIO_SUPPORT
+ default n
+ help
+ Some NFS servers support an auxiliary NFS LOCALIO protocol
+ that is not an official part of the NFS protocol.
+
+ This option enables support for the LOCALIO protocol in the
+ kernel's NFS server. Enable this to permit local NFS clients
+ to bypass the network when issuing reads and writes to the
+ local NFS server.
+
+ If unsure, say N.
+
config NFSD_PNFS
bool
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index b8736a82e57c..78b421778a79 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -23,3 +23,4 @@ nfsd-$(CONFIG_NFSD_PNFS) += nfs4layouts.o
nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
+nfsd-$(CONFIG_NFSD_LOCALIO) += localio.o
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index a83d469bca6b..49f4aab3208a 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -53,7 +53,7 @@
#define NFSD_FILE_CACHE_UP (0)
/* We only care about NFSD_MAY_READ/WRITE for this cache */
-#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE)
+#define NFSD_FILE_MAY_MASK (NFSD_MAY_READ|NFSD_MAY_WRITE|NFSD_MAY_LOCALIO)
static DEFINE_PER_CPU(unsigned long, nfsd_file_cache_hits);
static DEFINE_PER_CPU(unsigned long, nfsd_file_acquisitions);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
new file mode 100644
index 000000000000..637402eecb61
--- /dev/null
+++ b/fs/nfsd/localio.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS server support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/exportfs.h>
+#include <linux/sunrpc/svcauth.h>
+#include <linux/sunrpc/clnt.h>
+#include <linux/nfs.h>
+#include <linux/nfs_common.h>
+#include <linux/nfslocalio.h>
+#include <linux/string.h>
+
+#include "nfsd.h"
+#include "vfs.h"
+#include "netns.h"
+#include "filecache.h"
+
+/**
+ * nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
+ *
+ * @cl_nfssvc_net: the 'struct net' to use to get the proper nfsd_net
+ * @cl_nfssvc_dom: the 'struct auth_domain' required for localio access
+ * @rpc_clnt: rpc_clnt that the client established, used for sockaddr and cred
+ * @cred: cred that the client established
+ * @nfs_fh: filehandle to lookup
+ * @fmode: fmode_t to use for open
+ *
+ * This function maps a local fh to a path on a local filesystem.
+ * This is useful when the nfs client has the local server mounted - it can
+ * avoid all the NFS overhead with reads, writes and commits.
+ *
+ * On successful return, returned nfs_localio_ctx will have its nfsd_file and
+ * nfsd_net members set. Caller is responsible for calling nfsd_file_put and
+ * nfsd_serv_put (via nfs_localio_ctx_free).
+ */
+struct nfs_localio_ctx *
+nfsd_open_local_fh(nfs_uuid_t *uuid,
+ struct rpc_clnt *rpc_clnt, const struct cred *cred,
+ const struct nfs_fh *nfs_fh, const fmode_t fmode)
+{
+ int mayflags = NFSD_MAY_LOCALIO;
+ int status = 0;
+ struct nfsd_net *nn = NULL;
+ struct net *net;
+ struct svc_cred rq_cred;
+ struct svc_fh fh;
+ struct nfs_localio_ctx *localio;
+ __be32 beres;
+
+ if (nfs_fh->size > NFS4_FHSIZE)
+ return ERR_PTR(-EINVAL);
+
+ localio = nfs_localio_ctx_alloc();
+ if (!localio)
+ return ERR_PTR(-ENOMEM);
+
+ /*
+ * Not running in nfsd context, so must safely get reference on nfsd_serv.
+ * But the server may already be shutting down, if so disallow new localio.
+ * uuid->net is NOT a counted reference, but rcu_read_lock() ensures that if
+ * uuid->net is not NULL, then nfsd_serv_try_get() is safe and if that succeeds
+ * we will have an implied reference to the net.
+ */
+ rcu_read_lock();
+ net = READ_ONCE(uuid->net);
+ if (net)
+ nn = net_generic(net, nfsd_net_id);
+ if (unlikely(!nn || !nfsd_serv_try_get(nn))) {
+ rcu_read_unlock();
+ status = -ENXIO;
+ goto out_nfsd_serv;
+ }
+ rcu_read_unlock();
+
+ /* nfs_fh -> svc_fh */
+ fh_init(&fh, NFS4_FHSIZE);
+ fh.fh_handle.fh_size = nfs_fh->size;
+ memcpy(fh.fh_handle.fh_raw, nfs_fh->data, nfs_fh->size);
+
+ if (fmode & FMODE_READ)
+ mayflags |= NFSD_MAY_READ;
+ if (fmode & FMODE_WRITE)
+ mayflags |= NFSD_MAY_WRITE;
+
+ svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
+
+ beres = nfsd_file_acquire_local(uuid->net, &rq_cred, uuid->dom,
+ &fh, mayflags, &localio->nf);
+ if (beres) {
+ status = nfs_stat_to_errno(be32_to_cpu(beres));
+ goto out_fh_put;
+ }
+ localio->nn = nn;
+
+out_fh_put:
+ fh_put(&fh);
+ if (rq_cred.cr_group_info)
+ put_group_info(rq_cred.cr_group_info);
+out_nfsd_serv:
+ if (status) {
+ nfs_localio_ctx_free(localio);
+ return ERR_PTR(status);
+ }
+ return localio;
+}
+EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
+
+/* Compile time type checking, not used by anything */
+static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index d22027e23761..82bcefcd1f21 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -86,7 +86,8 @@ DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
{ NFSD_MAY_NOT_BREAK_LEASE, "NOT_BREAK_LEASE" }, \
{ NFSD_MAY_BYPASS_GSS, "BYPASS_GSS" }, \
{ NFSD_MAY_READ_IF_EXEC, "READ_IF_EXEC" }, \
- { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" })
+ { NFSD_MAY_64BIT_COOKIE, "64BIT_COOKIE" }, \
+ { NFSD_MAY_LOCALIO, "LOCALIO" })
TRACE_EVENT(nfsd_compound,
TP_PROTO(
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index 01947561d375..2ecceb8b9d3d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -8,6 +8,7 @@
#include <linux/fs.h>
#include <linux/posix_acl.h>
+#include <linux/nfslocalio.h>
#include "nfsfh.h"
#include "nfsd.h"
@@ -33,6 +34,8 @@
#define NFSD_MAY_64BIT_COOKIE 0x1000 /* 64 bit readdir cookies for >= NFSv3 */
+#define NFSD_MAY_LOCALIO 0x2000 /* for tracing, reflects when localio used */
+
#define NFSD_MAY_CREATE (NFSD_MAY_EXEC|NFSD_MAY_WRITE)
#define NFSD_MAY_REMOVE (NFSD_MAY_EXEC|NFSD_MAY_WRITE|NFSD_MAY_TRUNC)
@@ -158,6 +161,11 @@ __be32 nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
void nfsd_filp_close(struct file *fp);
+struct nfs_localio_ctx *
+nfsd_open_local_fh(nfs_uuid_t *,
+ struct rpc_clnt *, const struct cred *,
+ const struct nfs_fh *, const fmode_t);
+
static inline int fh_want_write(struct svc_fh *fh)
{
int ret;
--
2.44.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
` (2 preceding siblings ...)
2024-08-30 2:20 ` [PATCH 16/25] nfsd: add localio support NeilBrown
@ 2024-08-30 2:20 ` NeilBrown
2024-08-30 2:20 ` [PATCH 19/25] nfs: add localio support NeilBrown
` (3 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2024-08-30 2:20 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever, Jeff Layton; +Cc: linux-nfs
From: Mike Snitzer <snitzer@kernel.org>
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common. The server expects this protocol to use
the same transport as NFS and NFSACL for its RPCs. This protocol
isn't part of an IETF standard, nor does it need to be considering it
is Linux-to-Linux auxiliary RPC protocol that amounts to an
implementation detail.
The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned
by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ):
Linux Kernel Organization 400122 nfslocalio
After a successful handshake the client will hold a non-counted
reference to the server's network namespace. On namespace shutdown
these non-counted references will be invalidated.
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
[neilb: factored out and simplified single localio protocol]
Co-developed-by: NeilBrown <neil@brown.name>
Signed-off-by: NeilBrown <neil@brown.name>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/localio.c | 77 +++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/netns.h | 2 ++
fs/nfsd/nfsctl.c | 16 ++++++++++
fs/nfsd/nfsd.h | 4 +++
fs/nfsd/nfssvc.c | 23 +++++++++++++-
include/linux/nfs.h | 7 +++++
6 files changed, 128 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index 637402eecb61..491bf5017d34 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -13,12 +13,15 @@
#include <linux/nfs.h>
#include <linux/nfs_common.h>
#include <linux/nfslocalio.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
#include <linux/string.h>
#include "nfsd.h"
#include "vfs.h"
#include "netns.h"
#include "filecache.h"
+#include "cache.h"
/**
* nfsd_open_local_fh - lookup a local filehandle @nfs_fh and map to nfsd_file
@@ -112,3 +115,77 @@ EXPORT_SYMBOL_GPL(nfsd_open_local_fh);
/* Compile time type checking, not used by anything */
static nfs_to_nfsd_open_local_fh_t __maybe_unused nfsd_open_local_fh_typecheck = nfsd_open_local_fh;
+
+/*
+ * UUID_IS_LOCAL XDR functions
+ */
+
+static __be32 localio_proc_null(struct svc_rqst *rqstp)
+{
+ return rpc_success;
+}
+
+struct localio_uuidarg {
+ uuid_t uuid;
+};
+
+static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
+{
+ struct localio_uuidarg *argp = rqstp->rq_argp;
+ struct net *net = SVC_NET(rqstp);
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+ nfs_uuid_is_local(&argp->uuid, &nn->local_clients,
+ net, rqstp->rq_client);
+
+ return rpc_success;
+}
+
+static bool localio_decode_uuidarg(struct svc_rqst *rqstp,
+ struct xdr_stream *xdr)
+{
+ struct localio_uuidarg *argp = rqstp->rq_argp;
+ u8 uuid[UUID_SIZE];
+
+ if (decode_opaque_fixed(xdr, uuid, UUID_SIZE))
+ return false;
+ import_uuid(&argp->uuid, uuid);
+
+ return true;
+}
+
+static const struct svc_procedure localio_procedures1[] = {
+ [LOCALIOPROC_NULL] = {
+ .pc_func = localio_proc_null,
+ .pc_decode = nfssvc_decode_voidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct nfsd_voidargs),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_xdrressize = 0,
+ .pc_name = "NULL",
+ },
+ [LOCALIOPROC_UUID_IS_LOCAL] = {
+ .pc_func = localio_proc_uuid_is_local,
+ .pc_decode = localio_decode_uuidarg,
+ .pc_encode = nfssvc_encode_voidres,
+ .pc_argsize = sizeof(struct localio_uuidarg),
+ .pc_argzero = sizeof(struct localio_uuidarg),
+ .pc_ressize = sizeof(struct nfsd_voidres),
+ .pc_cachetype = RC_NOCACHE,
+ .pc_name = "UUID_IS_LOCAL",
+ },
+};
+
+#define LOCALIO_NR_PROCEDURES ARRAY_SIZE(localio_procedures1)
+static DEFINE_PER_CPU_ALIGNED(unsigned long,
+ localio_count[LOCALIO_NR_PROCEDURES]);
+const struct svc_version localio_version1 = {
+ .vs_vers = 1,
+ .vs_nproc = LOCALIO_NR_PROCEDURES,
+ .vs_proc = localio_procedures1,
+ .vs_dispatch = nfsd_dispatch,
+ .vs_count = localio_count,
+ .vs_xdrsize = XDR_QUADLEN(UUID_SIZE),
+ .vs_hidden = true,
+};
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index e2d953f21dde..9c65db8d3f44 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -216,6 +216,8 @@ struct nfsd_net {
/* last time an admin-revoke happened for NFSv4.0 */
time64_t nfs40_last_revoke;
+ /* Local clients to be invalidated when net is shut down */
+ struct list_head local_clients;
};
/* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 64c1b4d649bc..01e383d692ab 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -18,6 +18,7 @@
#include <linux/sunrpc/svc.h>
#include <linux/module.h>
#include <linux/fsnotify.h>
+#include <linux/nfslocalio.h>
#include "idmap.h"
#include "nfsd.h"
@@ -2257,6 +2258,7 @@ static __net_init int nfsd_net_init(struct net *net)
get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key));
seqlock_init(&nn->writeverf_lock);
nfsd_proc_stat_init(net);
+ INIT_LIST_HEAD(&nn->local_clients);
return 0;
@@ -2268,6 +2270,19 @@ static __net_init int nfsd_net_init(struct net *net)
return retval;
}
+/**
+ * nfsd_net_pre_exit - Disconnect localio clients from net namespace
+ * @net: a network namespace that is about to be destroyed
+ *
+ * This invalidated ->net pointers held by localio clients
+ * while they can still safely access nn->counter.
+ */
+static __net_exit void nfsd_net_pre_exit(struct net *net)
+{
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+ nfs_uuid_invalidate_clients(&nn->local_clients);
+}
/**
* nfsd_net_exit - Release the nfsd_net portion of a net namespace
* @net: a network namespace that is about to be destroyed
@@ -2285,6 +2300,7 @@ static __net_exit void nfsd_net_exit(struct net *net)
static struct pernet_operations nfsd_net_ops = {
.init = nfsd_net_init,
+ .pre_exit = nfsd_net_pre_exit,
.exit = nfsd_net_exit,
.id = &nfsd_net_id,
.size = sizeof(struct nfsd_net),
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index b0d3e82d6dcd..232a873dc53a 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -146,6 +146,10 @@ extern const struct svc_version nfsd_acl_version3;
#endif
#endif
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+extern const struct svc_version localio_version1;
+#endif
+
struct nfsd_net;
enum vers_op {NFSD_SET, NFSD_CLEAR, NFSD_TEST, NFSD_AVAIL };
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 13c69aa40d1c..eec4a9803c4a 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -80,6 +80,15 @@ DEFINE_SPINLOCK(nfsd_drc_lock);
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_mem_used;
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+static const struct svc_version *localio_versions[] = {
+ [1] = &localio_version1,
+};
+
+#define NFSD_LOCALIO_NRVERS ARRAY_SIZE(localio_versions)
+
+#endif /* CONFIG_NFSD_LOCALIO */
+
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static const struct svc_version *nfsd_acl_version[] = {
# if defined(CONFIG_NFSD_V2_ACL)
@@ -128,6 +137,18 @@ struct svc_program nfsd_programs[] = {
.pg_rpcbind_set = nfsd_acl_rpcbind_set,
},
#endif /* defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL) */
+#if IS_ENABLED(CONFIG_NFSD_LOCALIO)
+ {
+ .pg_prog = NFS_LOCALIO_PROGRAM,
+ .pg_nvers = NFSD_LOCALIO_NRVERS,
+ .pg_vers = localio_versions,
+ .pg_name = "nfslocalio",
+ .pg_class = "nfsd",
+ .pg_authenticate = svc_set_client,
+ .pg_init_request = svc_generic_init_request,
+ .pg_rpcbind_set = svc_generic_rpcbind_set,
+ }
+#endif /* IS_ENABLED(CONFIG_NFSD_LOCALIO) */
};
bool nfsd_support_version(int vers)
@@ -949,7 +970,7 @@ nfsd(void *vrqstp)
}
/**
- * nfsd_dispatch - Process an NFS or NFSACL Request
+ * nfsd_dispatch - Process an NFS or NFSACL or LOCALIO Request
* @rqstp: incoming request
*
* This RPC dispatcher integrates the NFS server's duplicate reply cache.
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index ceb70a926b95..5ff1a5b3b00c 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -13,6 +13,13 @@
#include <linux/crc32.h>
#include <uapi/linux/nfs.h>
+/* The localio program is entirely private to Linux and is
+ * NOT part of the uapi.
+ */
+#define NFS_LOCALIO_PROGRAM 400122
+#define LOCALIOPROC_NULL 0
+#define LOCALIOPROC_UUID_IS_LOCAL 1
+
/*
* This is the kernel NFS client file handle representation
*/
--
2.44.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 19/25] nfs: add localio support
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
` (3 preceding siblings ...)
2024-08-30 2:20 ` [PATCH 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM NeilBrown
@ 2024-08-30 2:20 ` NeilBrown
2024-08-30 2:20 ` [PATCH 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM NeilBrown
` (2 subsequent siblings)
7 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2024-08-30 2:20 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever, Jeff Layton; +Cc: linux-nfs
From: Weston Andros Adamson <dros@primarydata.com>
Add client support for bypassing NFS for localhost reads, writes, and
commits. This is only useful when the client and the server are
running on the same host.
nfs_local_probe() is stubbed out, later commits will enable client and
server handshake via a Linux-only LOCALIO auxiliary RPC protocol.
This has dynamic binding with the nfsd module (via nfs_localio module
which is part of nfs_common). Localio will only work if nfsd is
already loaded.
The "localio_enabled" nfs kernel module parameter can be used to
disable and enable the ability to use localio support.
CONFIG_NFS_LOCALIO controls the client enablement.
Lastly, localio uses an nfsd_file to initiate all IO. To make proper
use of nfsd_file (and nfsd's filecache) its lifetime (duration before
nfsd_file_put is called) must extend until after commit, read and
write operations. So rather than immediately call nfsd_file_put() in
nfs_local_open_fh(), nfsd_file_put() isn't called until
nfs_local_pgio_release() for read/write and not until
nfs_local_release_commit_data() for commit. The same applies to the
reference held on nfsd's nn->nfsd_serv. Both object lifetimes and
associated references are managed through calls to
nfs_localio_ctx_free().
Signed-off-by: Weston Andros Adamson <dros@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
Co-developed-by: Mike Snitzer <snitzer@kernel.org>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
---
fs/nfs/Kconfig | 16 +
fs/nfs/Makefile | 1 +
fs/nfs/client.c | 10 +
fs/nfs/internal.h | 45 +++
fs/nfs/localio.c | 605 ++++++++++++++++++++++++++++++++++++++
fs/nfs/nfstrace.h | 61 ++++
fs/nfs/pagelist.c | 4 +
fs/nfs/write.c | 3 +
include/linux/nfs.h | 2 +
include/linux/nfs_fs_sb.h | 9 +
10 files changed, 756 insertions(+)
create mode 100644 fs/nfs/localio.c
diff --git a/fs/nfs/Kconfig b/fs/nfs/Kconfig
index 0eb20012792f..9fe3b2709666 100644
--- a/fs/nfs/Kconfig
+++ b/fs/nfs/Kconfig
@@ -87,6 +87,22 @@ config NFS_V4
If unsure, say Y.
+config NFS_LOCALIO
+ bool "NFS client support for the LOCALIO auxiliary protocol"
+ depends on NFS_FS
+ select NFS_COMMON_LOCALIO_SUPPORT
+ default n
+ help
+ Some NFS servers support an auxiliary NFS LOCALIO protocol
+ that is not an official part of the NFS protocol.
+
+ This option enables support for the LOCALIO protocol in the
+ kernel's NFS client. Enable this to permit local NFS clients
+ to bypass the network when issuing reads and writes to the
+ local NFS server.
+
+ If unsure, say N.
+
config NFS_SWAP
bool "Provide swap over NFS support"
default n
diff --git a/fs/nfs/Makefile b/fs/nfs/Makefile
index 5f6db37f461e..9fb2f2cac87e 100644
--- a/fs/nfs/Makefile
+++ b/fs/nfs/Makefile
@@ -13,6 +13,7 @@ nfs-y := client.o dir.o file.o getroot.o inode.o super.o \
nfs-$(CONFIG_ROOT_NFS) += nfsroot.o
nfs-$(CONFIG_SYSCTL) += sysctl.o
nfs-$(CONFIG_NFS_FSCACHE) += fscache.o
+nfs-$(CONFIG_NFS_LOCALIO) += localio.o
obj-$(CONFIG_NFS_V2) += nfsv2.o
nfsv2-y := nfs2super.o proc.o nfs2xdr.o
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 8286edd6062d..a890319d4fab 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -178,6 +178,13 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
clp->cl_max_connect = cl_init->max_connect ? cl_init->max_connect : 1;
clp->cl_net = get_net(cl_init->net);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ seqlock_init(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ clp->cl_uuid.net = NULL;
+ spin_lock_init(&clp->cl_localio_lock);
+#endif /* CONFIG_NFS_LOCALIO */
+
clp->cl_principal = "*";
clp->cl_xprtsec = cl_init->xprtsec;
return clp;
@@ -233,6 +240,8 @@ static void pnfs_init_server(struct nfs_server *server)
*/
void nfs_free_client(struct nfs_client *clp)
{
+ nfs_local_disable(clp);
+
/* -EIO all pending I/O */
if (!IS_ERR(clp->cl_rpcclient))
rpc_shutdown_client(clp->cl_rpcclient);
@@ -424,6 +433,7 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
+ nfs_local_probe(new);
return rpc_ops->init_client(new, cl_init);
}
diff --git a/fs/nfs/internal.h b/fs/nfs/internal.h
index d4ab74a61668..0716c90eaf9c 100644
--- a/fs/nfs/internal.h
+++ b/fs/nfs/internal.h
@@ -451,6 +451,51 @@ extern void nfs_set_cache_invalid(struct inode *inode, unsigned long flags);
extern bool nfs_check_cache_invalid(struct inode *, unsigned long);
extern int nfs_wait_bit_killable(struct wait_bit_key *key, int mode);
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+/* localio.c */
+extern void nfs_local_disable(struct nfs_client *);
+extern void nfs_local_probe(struct nfs_client *);
+extern struct nfs_localio_ctx *nfs_local_open_fh(struct nfs_client *,
+ const struct cred *,
+ struct nfs_fh *,
+ const fmode_t);
+extern int nfs_local_doio(struct nfs_client *,
+ struct nfs_localio_ctx *,
+ struct nfs_pgio_header *,
+ const struct rpc_call_ops *);
+extern int nfs_local_commit(struct nfs_localio_ctx *,
+ struct nfs_commit_data *,
+ const struct rpc_call_ops *, int);
+extern bool nfs_server_is_local(const struct nfs_client *clp);
+
+#else
+static inline void nfs_local_disable(struct nfs_client *clp) {}
+static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline struct nfs_localio_ctx *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, const fmode_t mode)
+{
+ return NULL;
+}
+static inline int nfs_local_doio(struct nfs_client *clp,
+ struct nfs_localio_ctx *localio,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ return -EINVAL;
+}
+static inline int nfs_local_commit(struct nfs_localio_ctx *localio,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ return -EINVAL;
+}
+static inline bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return false;
+}
+#endif /* CONFIG_NFS_LOCALIO */
+
/* super.c */
extern const struct super_operations nfs_sops;
bool nfs_auth_info_match(const struct nfs_auth_info *, rpc_authflavor_t);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
new file mode 100644
index 000000000000..f391badfb999
--- /dev/null
+++ b/fs/nfs/localio.c
@@ -0,0 +1,605 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * NFS client support for local clients to bypass network stack
+ *
+ * Copyright (C) 2014 Weston Andros Adamson <dros@primarydata.com>
+ * Copyright (C) 2019 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/vfs.h>
+#include <linux/file.h>
+#include <linux/inet.h>
+#include <linux/sunrpc/addr.h>
+#include <linux/inetdevice.h>
+#include <net/addrconf.h>
+#include <linux/nfs_common.h>
+#include <linux/nfslocalio.h>
+#include <linux/module.h>
+#include <linux/bvec.h>
+
+#include <linux/nfs.h>
+#include <linux/nfs_fs.h>
+#include <linux/nfs_xdr.h>
+
+#include "internal.h"
+#include "pnfs.h"
+#include "nfstrace.h"
+
+#define NFSDBG_FACILITY NFSDBG_VFS
+
+struct nfs_local_kiocb {
+ struct kiocb kiocb;
+ struct bio_vec *bvec;
+ struct nfs_pgio_header *hdr;
+ struct work_struct work;
+ struct nfs_localio_ctx *localio;
+};
+
+struct nfs_local_fsync_ctx {
+ struct nfs_localio_ctx *localio;
+ struct nfs_commit_data *data;
+ struct work_struct work;
+ struct kref kref;
+ struct completion *done;
+};
+static void nfs_local_fsync_work(struct work_struct *work);
+
+static bool localio_enabled __read_mostly = true;
+module_param(localio_enabled, bool, 0644);
+
+bool nfs_server_is_local(const struct nfs_client *clp)
+{
+ return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
+ localio_enabled;
+}
+EXPORT_SYMBOL_GPL(nfs_server_is_local);
+
+/*
+ * nfs_local_enable - enable local i/o for an nfs_client
+ */
+static __maybe_unused void nfs_local_enable(struct nfs_client *clp)
+{
+ spin_lock(&clp->cl_localio_lock);
+
+ if (unlikely(!get_nfs_to_nfsd_symbols()))
+ goto out;
+ set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+ trace_nfs_local_enable(clp);
+out:
+ spin_unlock(&clp->cl_localio_lock);
+}
+
+/*
+ * nfs_local_disable - disable local i/o for an nfs_client
+ */
+void nfs_local_disable(struct nfs_client *clp)
+{
+ spin_lock(&clp->cl_localio_lock);
+ if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
+ trace_nfs_local_disable(clp);
+ put_nfs_to_nfsd_symbols();
+
+ nfs_uuid_invalidate_one_client(&clp->cl_uuid);
+ }
+ spin_unlock(&clp->cl_localio_lock);
+}
+
+/*
+ * nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ */
+void nfs_local_probe(struct nfs_client *clp)
+{
+}
+EXPORT_SYMBOL_GPL(nfs_local_probe);
+
+/*
+ * nfs_local_open_fh - open a local filehandle in terms of nfsd_file
+ *
+ * Returns a pointer to a struct nfs_localio_ctx or NULL
+ */
+struct nfs_localio_ctx *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ struct nfs_fh *fh, const fmode_t mode)
+{
+ struct nfs_localio_ctx *localio;
+ int status;
+
+ if (!nfs_server_is_local(clp))
+ return NULL;
+ if (mode & ~(FMODE_READ | FMODE_WRITE))
+ return NULL;
+
+ localio = nfs_to.nfsd_open_local_fh(&clp->cl_uuid,
+ clp->cl_rpcclient, cred, fh, mode);
+ if (IS_ERR(localio)) {
+ status = PTR_ERR(localio);
+ trace_nfs_local_open_fh(fh, mode, status);
+ switch (status) {
+ case -ENOMEM:
+ case -ENXIO:
+ case -ENOENT:
+ nfs_local_disable(clp);
+ }
+ return NULL;
+ }
+ return localio;
+}
+EXPORT_SYMBOL_GPL(nfs_local_open_fh);
+
+static struct bio_vec *
+nfs_bvec_alloc_and_import_pagevec(struct page **pagevec,
+ unsigned int npages, gfp_t flags)
+{
+ struct bio_vec *bvec, *p;
+
+ bvec = kmalloc_array(npages, sizeof(*bvec), flags);
+ if (bvec != NULL) {
+ for (p = bvec; npages > 0; p++, pagevec++, npages--) {
+ p->bv_page = *pagevec;
+ p->bv_len = PAGE_SIZE;
+ p->bv_offset = 0;
+ }
+ }
+ return bvec;
+}
+
+static void
+nfs_local_iocb_free(struct nfs_local_kiocb *iocb)
+{
+ kfree(iocb->bvec);
+ kfree(iocb);
+}
+
+static struct nfs_local_kiocb *
+nfs_local_iocb_alloc(struct nfs_pgio_header *hdr,
+ struct nfs_localio_ctx *localio, gfp_t flags)
+{
+ struct nfs_local_kiocb *iocb;
+
+ iocb = kmalloc(sizeof(*iocb), flags);
+ if (iocb == NULL)
+ return NULL;
+ iocb->bvec = nfs_bvec_alloc_and_import_pagevec(hdr->page_array.pagevec,
+ hdr->page_array.npages, flags);
+ if (iocb->bvec == NULL) {
+ kfree(iocb);
+ return NULL;
+ }
+ init_sync_kiocb(&iocb->kiocb, nfs_to.nfsd_file_file(localio->nf));
+ iocb->kiocb.ki_pos = hdr->args.offset;
+ iocb->localio = localio;
+ iocb->hdr = hdr;
+ iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+ return iocb;
+}
+
+static void
+nfs_local_iter_init(struct iov_iter *i, struct nfs_local_kiocb *iocb, int dir)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ iov_iter_bvec(i, dir, iocb->bvec, hdr->page_array.npages,
+ hdr->args.count + hdr->args.pgbase);
+ if (hdr->args.pgbase != 0)
+ iov_iter_advance(i, hdr->args.pgbase);
+}
+
+static void
+nfs_local_hdr_release(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ call_ops->rpc_call_done(&hdr->task, hdr);
+ call_ops->rpc_release(hdr);
+}
+
+static void
+nfs_local_pgio_init(struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ hdr->task.tk_ops = call_ops;
+ if (!hdr->task.tk_start)
+ hdr->task.tk_start = ktime_get();
+}
+
+static void
+nfs_local_pgio_done(struct nfs_pgio_header *hdr, long status)
+{
+ if (status >= 0) {
+ hdr->res.count = status;
+ hdr->res.op_status = NFS4_OK;
+ hdr->task.tk_status = 0;
+ } else {
+ hdr->res.op_status = nfs4_stat_to_errno(status);
+ hdr->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_pgio_release(struct nfs_local_kiocb *iocb)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+
+ nfs_localio_ctx_free(iocb->localio);
+ nfs_local_iocb_free(iocb);
+ nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+}
+
+static void
+nfs_local_read_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct file *filp = iocb->kiocb.ki_filp;
+
+ nfs_local_pgio_done(hdr, status);
+
+ if (hdr->res.count != hdr->args.count ||
+ hdr->args.offset + hdr->res.count >= i_size_read(file_inode(filp)))
+ hdr->res.eof = true;
+
+ dprintk("%s: read %ld bytes eof %d.\n", __func__,
+ status > 0 ? status : 0, hdr->res.eof);
+}
+
+static int
+nfs_do_local_read(struct nfs_pgio_header *hdr,
+ struct nfs_localio_ctx *localio,
+ const struct rpc_call_ops *call_ops)
+{
+ struct file *filp = nfs_to.nfsd_file_file(localio->nf);
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_read count=%u pos=%llu\n",
+ __func__, hdr->args.count, hdr->args.offset);
+
+ iocb = nfs_local_iocb_alloc(hdr, localio, GFP_KERNEL);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, READ);
+
+ nfs_local_pgio_init(hdr, call_ops);
+ hdr->res.eof = false;
+
+ status = filp->f_op->read_iter(&iocb->kiocb, &iter);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_read_done(iocb, status);
+ nfs_local_pgio_release(iocb);
+
+ return 0;
+}
+
+static void
+nfs_copy_boot_verifier(struct nfs_write_verifier *verifier, struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+ u32 *verf = (u32 *)verifier->data;
+ int seq = 0;
+
+ do {
+ read_seqbegin_or_lock(&clp->cl_boot_lock, &seq);
+ verf[0] = (u32)clp->cl_nfssvc_boot.tv_sec;
+ verf[1] = (u32)clp->cl_nfssvc_boot.tv_nsec;
+ } while (need_seqretry(&clp->cl_boot_lock, seq));
+ done_seqretry(&clp->cl_boot_lock, seq);
+}
+
+static void
+nfs_reset_boot_verifier(struct inode *inode)
+{
+ struct nfs_client *clp = NFS_SERVER(inode)->nfs_client;
+
+ write_seqlock(&clp->cl_boot_lock);
+ ktime_get_real_ts64(&clp->cl_nfssvc_boot);
+ write_sequnlock(&clp->cl_boot_lock);
+}
+
+static void
+nfs_set_local_verifier(struct inode *inode,
+ struct nfs_writeverf *verf,
+ enum nfs3_stable_how how)
+{
+ nfs_copy_boot_verifier(&verf->verifier, inode);
+ verf->committed = how;
+}
+
+/* Factored out from fs/nfsd/vfs.h:fh_getattr() */
+static int __vfs_getattr(struct path *p, struct kstat *stat, int version)
+{
+ u32 request_mask = STATX_BASIC_STATS;
+
+ if (version == 4)
+ request_mask |= (STATX_BTIME | STATX_CHANGE_COOKIE);
+ return vfs_getattr(p, stat, request_mask, AT_STATX_SYNC_AS_STAT);
+}
+
+/* Copied from fs/nfsd/nfsfh.c:nfsd4_change_attribute() */
+static u64 __nfsd4_change_attribute(const struct kstat *stat,
+ const struct inode *inode)
+{
+ u64 chattr;
+
+ if (stat->result_mask & STATX_CHANGE_COOKIE) {
+ chattr = stat->change_cookie;
+ if (S_ISREG(inode->i_mode) &&
+ !(stat->attributes & STATX_ATTR_CHANGE_MONOTONIC)) {
+ chattr += (u64)stat->ctime.tv_sec << 30;
+ chattr += stat->ctime.tv_nsec;
+ }
+ } else {
+ chattr = time_to_chattr(&stat->ctime);
+ }
+ return chattr;
+}
+
+static void nfs_local_vfs_getattr(struct nfs_local_kiocb *iocb)
+{
+ struct kstat stat;
+ struct file *filp = iocb->kiocb.ki_filp;
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct nfs_fattr *fattr = hdr->res.fattr;
+ int version = NFS_PROTO(hdr->inode)->version;
+
+ if (unlikely(!fattr) || __vfs_getattr(&filp->f_path, &stat, version))
+ return;
+
+ fattr->valid = (NFS_ATTR_FATTR_FILEID |
+ NFS_ATTR_FATTR_CHANGE |
+ NFS_ATTR_FATTR_SIZE |
+ NFS_ATTR_FATTR_ATIME |
+ NFS_ATTR_FATTR_MTIME |
+ NFS_ATTR_FATTR_CTIME |
+ NFS_ATTR_FATTR_SPACE_USED);
+
+ fattr->fileid = stat.ino;
+ fattr->size = stat.size;
+ fattr->atime = stat.atime;
+ fattr->mtime = stat.mtime;
+ fattr->ctime = stat.ctime;
+ if (version == 4) {
+ fattr->change_attr =
+ __nfsd4_change_attribute(&stat, file_inode(filp));
+ } else
+ fattr->change_attr = nfs_timespec_to_change_attr(&fattr->ctime);
+ fattr->du.nfs3.used = stat.blocks << 9;
+}
+
+static void
+nfs_local_write_done(struct nfs_local_kiocb *iocb, long status)
+{
+ struct nfs_pgio_header *hdr = iocb->hdr;
+ struct inode *inode = hdr->inode;
+
+ dprintk("%s: wrote %ld bytes.\n", __func__, status > 0 ? status : 0);
+
+ /* Handle short writes as if they are ENOSPC */
+ if (status > 0 && status < hdr->args.count) {
+ hdr->mds_offset += status;
+ hdr->args.offset += status;
+ hdr->args.pgbase += status;
+ hdr->args.count -= status;
+ nfs_set_pgio_error(hdr, -ENOSPC, hdr->args.offset);
+ status = -ENOSPC;
+ }
+ if (status < 0)
+ nfs_reset_boot_verifier(inode);
+ else if (nfs_should_remove_suid(inode)) {
+ /* Deal with the suid/sgid bit corner case */
+ spin_lock(&inode->i_lock);
+ nfs_set_cache_invalid(inode, NFS_INO_INVALID_MODE);
+ spin_unlock(&inode->i_lock);
+ }
+ nfs_local_pgio_done(hdr, status);
+}
+
+static int
+nfs_do_local_write(struct nfs_pgio_header *hdr,
+ struct nfs_localio_ctx *localio,
+ const struct rpc_call_ops *call_ops)
+{
+ struct file *filp = nfs_to.nfsd_file_file(localio->nf);
+ struct nfs_local_kiocb *iocb;
+ struct iov_iter iter;
+ ssize_t status;
+
+ dprintk("%s: vfs_write count=%u pos=%llu %s\n",
+ __func__, hdr->args.count, hdr->args.offset,
+ (hdr->args.stable == NFS_UNSTABLE) ? "unstable" : "stable");
+
+ iocb = nfs_local_iocb_alloc(hdr, localio, GFP_NOIO);
+ if (iocb == NULL)
+ return -ENOMEM;
+ nfs_local_iter_init(&iter, iocb, WRITE);
+
+ switch (hdr->args.stable) {
+ default:
+ break;
+ case NFS_DATA_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC;
+ break;
+ case NFS_FILE_SYNC:
+ iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
+ }
+ nfs_local_pgio_init(hdr, call_ops);
+
+ nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+
+ file_start_write(filp);
+ status = filp->f_op->write_iter(&iocb->kiocb, &iter);
+ file_end_write(filp);
+ WARN_ON_ONCE(status == -EIOCBQUEUED);
+
+ nfs_local_write_done(iocb, status);
+ nfs_local_vfs_getattr(iocb);
+ nfs_local_pgio_release(iocb);
+
+ return 0;
+}
+
+int nfs_local_doio(struct nfs_client *clp, struct nfs_localio_ctx *localio,
+ struct nfs_pgio_header *hdr,
+ const struct rpc_call_ops *call_ops)
+{
+ int status = 0;
+ struct file *filp = nfs_to.nfsd_file_file(localio->nf);
+
+ if (!hdr->args.count)
+ return 0;
+ /* Don't support filesystems without read_iter/write_iter */
+ if (!filp->f_op->read_iter || !filp->f_op->write_iter) {
+ nfs_local_disable(clp);
+ status = -EAGAIN;
+ goto out;
+ }
+
+ switch (hdr->rw_mode) {
+ case FMODE_READ:
+ status = nfs_do_local_read(hdr, localio, call_ops);
+ break;
+ case FMODE_WRITE:
+ status = nfs_do_local_write(hdr, localio, call_ops);
+ break;
+ default:
+ dprintk("%s: invalid mode: %d\n", __func__,
+ hdr->rw_mode);
+ status = -EINVAL;
+ }
+out:
+ if (status != 0) {
+ nfs_localio_ctx_free(localio);
+ hdr->task.tk_status = status;
+ nfs_local_hdr_release(hdr, call_ops);
+ }
+ return status;
+}
+
+static void
+nfs_local_init_commit(struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ data->task.tk_ops = call_ops;
+}
+
+static int
+nfs_local_run_commit(struct file *filp, struct nfs_commit_data *data)
+{
+ loff_t start = data->args.offset;
+ loff_t end = LLONG_MAX;
+
+ if (data->args.count > 0) {
+ end = start + data->args.count - 1;
+ if (end < start)
+ end = LLONG_MAX;
+ }
+
+ dprintk("%s: commit %llu - %llu\n", __func__, start, end);
+ return vfs_fsync_range(filp, start, end, 0);
+}
+
+static void
+nfs_local_commit_done(struct nfs_commit_data *data, int status)
+{
+ if (status >= 0) {
+ nfs_set_local_verifier(data->inode,
+ data->res.verf,
+ NFS_FILE_SYNC);
+ data->res.op_status = NFS4_OK;
+ data->task.tk_status = 0;
+ } else {
+ nfs_reset_boot_verifier(data->inode);
+ data->res.op_status = nfs4_stat_to_errno(status);
+ data->task.tk_status = status;
+ }
+}
+
+static void
+nfs_local_release_commit_data(struct nfs_localio_ctx *localio,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops)
+{
+ nfs_localio_ctx_free(localio);
+ call_ops->rpc_call_done(&data->task, data);
+ call_ops->rpc_release(data);
+}
+
+static struct nfs_local_fsync_ctx *
+nfs_local_fsync_ctx_alloc(struct nfs_commit_data *data,
+ struct nfs_localio_ctx *localio, gfp_t flags)
+{
+ struct nfs_local_fsync_ctx *ctx = kmalloc(sizeof(*ctx), flags);
+
+ if (ctx != NULL) {
+ ctx->localio = localio;
+ ctx->data = data;
+ INIT_WORK(&ctx->work, nfs_local_fsync_work);
+ kref_init(&ctx->kref);
+ ctx->done = NULL;
+ }
+ return ctx;
+}
+
+static void
+nfs_local_fsync_ctx_kref_free(struct kref *kref)
+{
+ kfree(container_of(kref, struct nfs_local_fsync_ctx, kref));
+}
+
+static void
+nfs_local_fsync_ctx_put(struct nfs_local_fsync_ctx *ctx)
+{
+ kref_put(&ctx->kref, nfs_local_fsync_ctx_kref_free);
+}
+
+static void
+nfs_local_fsync_ctx_free(struct nfs_local_fsync_ctx *ctx)
+{
+ nfs_local_release_commit_data(ctx->localio, ctx->data,
+ ctx->data->task.tk_ops);
+ nfs_local_fsync_ctx_put(ctx);
+}
+
+static void
+nfs_local_fsync_work(struct work_struct *work)
+{
+ struct nfs_local_fsync_ctx *ctx;
+ int status;
+
+ ctx = container_of(work, struct nfs_local_fsync_ctx, work);
+
+ status = nfs_local_run_commit(nfs_to.nfsd_file_file(ctx->localio->nf),
+ ctx->data);
+ nfs_local_commit_done(ctx->data, status);
+ if (ctx->done != NULL)
+ complete(ctx->done);
+ nfs_local_fsync_ctx_free(ctx);
+}
+
+int nfs_local_commit(struct nfs_localio_ctx *localio,
+ struct nfs_commit_data *data,
+ const struct rpc_call_ops *call_ops, int how)
+{
+ struct nfs_local_fsync_ctx *ctx;
+
+ ctx = nfs_local_fsync_ctx_alloc(data, localio, GFP_KERNEL);
+ if (!ctx) {
+ nfs_local_commit_done(data, -ENOMEM);
+ nfs_local_release_commit_data(localio, data, call_ops);
+ return -ENOMEM;
+ }
+
+ nfs_local_init_commit(data, call_ops);
+ kref_get(&ctx->kref);
+ if (how & FLUSH_SYNC) {
+ DECLARE_COMPLETION_ONSTACK(done);
+ ctx->done = &done;
+ queue_work(nfsiod_workqueue, &ctx->work);
+ wait_for_completion(&done);
+ } else
+ queue_work(nfsiod_workqueue, &ctx->work);
+ nfs_local_fsync_ctx_put(ctx);
+ return 0;
+}
diff --git a/fs/nfs/nfstrace.h b/fs/nfs/nfstrace.h
index 352fdaed4075..1eab98c277fa 100644
--- a/fs/nfs/nfstrace.h
+++ b/fs/nfs/nfstrace.h
@@ -1685,6 +1685,67 @@ TRACE_EVENT(nfs_mount_path,
TP_printk("path='%s'", __get_str(path))
);
+TRACE_EVENT(nfs_local_open_fh,
+ TP_PROTO(
+ const struct nfs_fh *fh,
+ fmode_t fmode,
+ int error
+ ),
+
+ TP_ARGS(fh, fmode, error),
+
+ TP_STRUCT__entry(
+ __field(int, error)
+ __field(u32, fhandle)
+ __field(unsigned int, fmode)
+ ),
+
+ TP_fast_assign(
+ __entry->error = error;
+ __entry->fhandle = nfs_fhandle_hash(fh);
+ __entry->fmode = (__force unsigned int)fmode;
+ ),
+
+ TP_printk(
+ "error=%d fhandle=0x%08x mode=%s",
+ __entry->error,
+ __entry->fhandle,
+ show_fs_fmode_flags(__entry->fmode)
+ )
+);
+
+DECLARE_EVENT_CLASS(nfs_local_client_event,
+ TP_PROTO(
+ const struct nfs_client *clp
+ ),
+
+ TP_ARGS(clp),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, protocol)
+ __string(server, clp->cl_hostname)
+ ),
+
+ TP_fast_assign(
+ __entry->protocol = clp->rpc_ops->version;
+ __assign_str(server);
+ ),
+
+ TP_printk(
+ "server=%s NFSv%u", __get_str(server), __entry->protocol
+ )
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+ DEFINE_EVENT(nfs_local_client_event, name, \
+ TP_PROTO( \
+ const struct nfs_client *clp \
+ ), \
+ TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
+
DECLARE_EVENT_CLASS(nfs_xdr_event,
TP_PROTO(
const struct xdr_stream *xdr,
diff --git a/fs/nfs/pagelist.c b/fs/nfs/pagelist.c
index 849db19451ff..cf68c0a61b7d 100644
--- a/fs/nfs/pagelist.c
+++ b/fs/nfs/pagelist.c
@@ -762,6 +762,10 @@ int nfs_initiate_pgio(struct rpc_clnt *clnt, struct nfs_pgio_header *hdr,
hdr->args.count,
(unsigned long long)hdr->args.offset);
+ if (localio)
+ return nfs_local_doio(NFS_SERVER(hdr->inode)->nfs_client,
+ localio, hdr, call_ops);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/fs/nfs/write.c b/fs/nfs/write.c
index 4bd16473a953..8bbbe8dace3b 100644
--- a/fs/nfs/write.c
+++ b/fs/nfs/write.c
@@ -1693,6 +1693,9 @@ int nfs_initiate_commit(struct rpc_clnt *clnt, struct nfs_commit_data *data,
dprintk("NFS: initiated commit call\n");
+ if (localio)
+ return nfs_local_commit(localio, data, call_ops, how);
+
task = rpc_run_task(&task_setup_data);
if (IS_ERR(task))
return PTR_ERR(task);
diff --git a/include/linux/nfs.h b/include/linux/nfs.h
index 5ff1a5b3b00c..89ef8c5e98db 100644
--- a/include/linux/nfs.h
+++ b/include/linux/nfs.h
@@ -8,6 +8,8 @@
#ifndef _LINUX_NFS_H
#define _LINUX_NFS_H
+#include <linux/cred.h>
+#include <linux/sunrpc/auth.h>
#include <linux/sunrpc/msg_prot.h>
#include <linux/string.h>
#include <linux/crc32.h>
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index 1df86ab98c77..b43e3e067e44 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -8,6 +8,7 @@
#include <linux/wait.h>
#include <linux/nfs_xdr.h>
#include <linux/sunrpc/xprt.h>
+#include <linux/nfslocalio.h>
#include <linux/atomic.h>
#include <linux/refcount.h>
@@ -49,6 +50,7 @@ struct nfs_client {
#define NFS_CS_DS 7 /* - Server is a DS */
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
+#define NFS_CS_LOCAL_IO 10 /* - client is local */
struct sockaddr_storage cl_addr; /* server identifier */
size_t cl_addrlen;
char * cl_hostname; /* hostname of server */
@@ -125,6 +127,13 @@ struct nfs_client {
struct net *cl_net;
struct list_head pending_cb_stateids;
struct rcu_head rcu;
+
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+ struct timespec64 cl_nfssvc_boot;
+ seqlock_t cl_boot_lock;
+ nfs_uuid_t cl_uuid;
+ spinlock_t cl_localio_lock;
+#endif /* CONFIG_NFS_LOCALIO */
};
/*
--
2.44.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
` (4 preceding siblings ...)
2024-08-30 2:20 ` [PATCH 19/25] nfs: add localio support NeilBrown
@ 2024-08-30 2:20 ` NeilBrown
2024-08-30 3:46 ` [PATCH v14-plus 00/25] Address netns refcount issues for localio Mike Snitzer
2024-08-30 7:47 ` Mike Snitzer
7 siblings, 0 replies; 10+ messages in thread
From: NeilBrown @ 2024-08-30 2:20 UTC (permalink / raw)
To: Mike Snitzer, Chuck Lever, Jeff Layton; +Cc: linux-nfs
From: Mike Snitzer <snitzer@kernel.org>
The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL"
RPC method that allows the Linux NFS client to verify the local Linux
NFS server can see the nonce (single-use UUID) the client generated and
made available in nfs_common for subsequent lookup and verification
by the NFS server. If matched, the NFS server populates members in the
nfs_uuid_t struct. The NFS client then transfers these nfs_uuid_t
struct member pointers to the nfs_client struct and cleans up the
nfs_uuid_t struct. See: fs/nfs/localio.c:nfs_local_probe()
This protocol isn't part of an IETF standard, nor does it need to be
considering it is Linux-to-Linux auxiliary RPC protocol that amounts
to an implementation detail.
Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
AUTH_SYS) is used (enforced by fs/nfs/localio.c:nfs_local_probe()).
The UUID_IS_LOCAL method encodes the client generated uuid_t in terms of
the fixed UUID_SIZE (16 bytes). The fixed size opaque encode and decode
XDR methods are used instead of the less efficient variable sized
methods.
Having a nonce (single-use uuid) is better than using the same uuid
for the life of the server, and sending it proactively by client
rather than reactively by the server is also safer.
[NeilBrown factored out and simplified a single localio protocol and
proposed making the uuid short-lived]
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Co-developed-by: NeilBrown <neilb@suse.de>
Signed-off-by: NeilBrown <neilb@suse.de>
---
fs/nfs/client.c | 6 ++-
fs/nfs/localio.c | 132 +++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 132 insertions(+), 6 deletions(-)
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index a890319d4fab..4903f5ff5758 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -433,8 +433,10 @@ struct nfs_client *nfs_get_client(const struct nfs_client_initdata *cl_init)
list_add_tail(&new->cl_share_link,
&nn->nfs_client_list);
spin_unlock(&nn->nfs_client_lock);
- nfs_local_probe(new);
- return rpc_ops->init_client(new, cl_init);
+ new = rpc_ops->init_client(new, cl_init);
+ if (!IS_ERR(new))
+ nfs_local_probe(new);
+ return new;
}
spin_unlock(&nn->nfs_client_lock);
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 9d8a1436f3c6..55622084d5c2 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -50,17 +50,77 @@ static void nfs_local_fsync_work(struct work_struct *work);
static bool localio_enabled __read_mostly = true;
module_param(localio_enabled, bool, 0644);
+static inline bool nfs_client_is_local(const struct nfs_client *clp)
+{
+ return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+}
+
bool nfs_server_is_local(const struct nfs_client *clp)
{
- return test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags) != 0 &&
- localio_enabled;
+ return nfs_client_is_local(clp) && localio_enabled;
}
EXPORT_SYMBOL_GPL(nfs_server_is_local);
+/*
+ * UUID_IS_LOCAL XDR functions
+ */
+
+static void localio_xdr_enc_uuidargs(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ const void *data)
+{
+ const u8 *uuid = data;
+
+ encode_opaque_fixed(xdr, uuid, UUID_SIZE);
+}
+
+static int localio_xdr_dec_uuidres(struct rpc_rqst *req,
+ struct xdr_stream *xdr,
+ void *result)
+{
+ /* void return */
+ return 0;
+}
+
+static const struct rpc_procinfo nfs_localio_procedures[] = {
+ [LOCALIOPROC_UUID_IS_LOCAL] = {
+ .p_proc = LOCALIOPROC_UUID_IS_LOCAL,
+ .p_encode = localio_xdr_enc_uuidargs,
+ .p_decode = localio_xdr_dec_uuidres,
+ .p_arglen = XDR_QUADLEN(UUID_SIZE),
+ .p_replen = 0,
+ .p_statidx = LOCALIOPROC_UUID_IS_LOCAL,
+ .p_name = "UUID_IS_LOCAL",
+ },
+};
+
+static unsigned int nfs_localio_counts[ARRAY_SIZE(nfs_localio_procedures)];
+static const struct rpc_version nfslocalio_version1 = {
+ .number = 1,
+ .nrprocs = ARRAY_SIZE(nfs_localio_procedures),
+ .procs = nfs_localio_procedures,
+ .counts = nfs_localio_counts,
+};
+
+static const struct rpc_version *nfslocalio_version[] = {
+ [1] = &nfslocalio_version1,
+};
+
+extern const struct rpc_program nfslocalio_program;
+static struct rpc_stat nfslocalio_rpcstat = { &nfslocalio_program };
+
+const struct rpc_program nfslocalio_program = {
+ .name = "nfslocalio",
+ .number = NFS_LOCALIO_PROGRAM,
+ .nrvers = ARRAY_SIZE(nfslocalio_version),
+ .version = nfslocalio_version,
+ .stats = &nfslocalio_rpcstat,
+};
+
/*
* nfs_local_enable - enable local i/o for an nfs_client
*/
-static __maybe_unused void nfs_local_enable(struct nfs_client *clp)
+static void nfs_local_enable(struct nfs_client *clp)
{
spin_lock(&clp->cl_localio_lock);
@@ -87,11 +147,74 @@ void nfs_local_disable(struct nfs_client *clp)
spin_unlock(&clp->cl_localio_lock);
}
+/*
+ * nfs_init_localioclient - Initialise an NFS localio client connection
+ */
+static struct rpc_clnt *nfs_init_localioclient(struct nfs_client *clp)
+{
+ struct rpc_clnt *rpcclient_localio;
+
+ rpcclient_localio = rpc_bind_new_program(clp->cl_rpcclient,
+ &nfslocalio_program, 1);
+
+ dprintk_rcu("%s: server (%s) %s NFS LOCALIO.\n",
+ __func__, rpc_peeraddr2str(clp->cl_rpcclient, RPC_DISPLAY_ADDR),
+ (IS_ERR(rpcclient_localio) ? "does not support" : "supports"));
+
+ return rpcclient_localio;
+}
+
+static bool nfs_server_uuid_is_local(struct nfs_client *clp)
+{
+ u8 uuid[UUID_SIZE];
+ struct rpc_message msg = {
+ .rpc_argp = &uuid,
+ };
+ struct rpc_clnt *rpcclient_localio;
+ int status;
+
+ rpcclient_localio = nfs_init_localioclient(clp);
+ if (IS_ERR(rpcclient_localio))
+ return false;
+
+ export_uuid(uuid, &clp->cl_uuid.uuid);
+
+ msg.rpc_proc = &nfs_localio_procedures[LOCALIOPROC_UUID_IS_LOCAL];
+ status = rpc_call_sync(rpcclient_localio, &msg, 0);
+ dprintk("%s: NFS reply UUID_IS_LOCAL: status=%d\n",
+ __func__, status);
+ rpc_shutdown_client(rpcclient_localio);
+
+ /* Server is only local if it initialized required struct members */
+ if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom)
+ return false;
+
+ return true;
+}
+
/*
* nfs_local_probe - probe local i/o support for an nfs_server and nfs_client
+ * - called after alloc_client and init_client (so cl_rpcclient exists)
+ * - this function is idempotent, it can be called for old or new clients
*/
void nfs_local_probe(struct nfs_client *clp)
{
+ /* Disallow localio if disabled via sysfs or AUTH_SYS isn't used */
+ if (!localio_enabled ||
+ clp->cl_rpcclient->cl_auth->au_flavor != RPC_AUTH_UNIX) {
+ nfs_local_disable(clp);
+ return;
+ }
+
+ if (nfs_client_is_local(clp)) {
+ /* If already enabled, disable and re-enable */
+ nfs_local_disable(clp);
+ }
+
+ nfs_uuid_begin(&clp->cl_uuid);
+ if (nfs_server_uuid_is_local(clp))
+ nfs_local_enable(clp);
+ nfs_uuid_end(&clp->cl_uuid);
}
EXPORT_SYMBOL_GPL(nfs_local_probe);
@@ -121,7 +244,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
case -ENOMEM:
case -ENXIO:
case -ENOENT:
- nfs_local_disable(clp);
+ /* Revalidate localio, will disable if unsupported */
+ nfs_local_probe(clp);
}
return NULL;
}
--
2.44.0
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v14-plus 00/25] Address netns refcount issues for localio
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
` (5 preceding siblings ...)
2024-08-30 2:20 ` [PATCH 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM NeilBrown
@ 2024-08-30 3:46 ` Mike Snitzer
2024-08-30 7:47 ` Mike Snitzer
7 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2024-08-30 3:46 UTC (permalink / raw)
To: NeilBrown; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Fri, Aug 30, 2024 at 12:20:13PM +1000, NeilBrown wrote:
> Following are revised versions of 6 patches from the v14 localio series.
>
> The issue addressed is net namespace refcounting.
>
> We don't want to keep a long-term counted reference in the client
> because that prevents a server container from completely shutting down.
>
> So we avoid taking a reference at all and rely on the per-cpu reference
> to the server being sufficient to keep the net-ns active. This involves
> allowing the net-ns exit code to iterate all active clients and clear
> their ->net pointers (which they need to find the per-cpu-refcount for
> the nfs_serv).
>
> So:
> - embed nfs_uuid_t in nfs_client. This provides a list_head that can
> be used to find the client. It does add the actual uuid to nfs_client
> so it is bigger than needed. If that is really a problem we can find
> a fix.
>
> - When the nfs server confirms that the uuid is local, it moves the
> nfs_uuid_t onto a per-net-ns list.
>
> - When the net-ns is shutting down - in a "pre_exit" handler, all these
> nfS_uuid_t have their ->net cleared. There is an rcu_synchronize()
> call between pre_exit() handlers and exit() handlers so and caller
> that sees ->net as not NULL can safely check the ->counter
>
> - We now pass the nfs_uuid_t to nfsd_open_local_fh() so it can safely
> look at ->net in a private rcu_read_lock() section.
>
> I have compile tested this code but nothing more.
Wow, nicely done. I will go over it in much more detail and test.
For the benefit of others, here is a single incremental diff relative
to v14:
diff --git a/fs/nfs/client.c b/fs/nfs/client.c
index 6a4b605cc943..4903f5ff5758 100644
--- a/fs/nfs/client.c
+++ b/fs/nfs/client.c
@@ -181,8 +181,7 @@ struct nfs_client *nfs_alloc_client(const struct nfs_client_initdata *cl_init)
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
seqlock_init(&clp->cl_boot_lock);
ktime_get_real_ts64(&clp->cl_nfssvc_boot);
- clp->cl_nfssvc_net = NULL;
- clp->cl_nfssvc_dom = NULL;
+ clp->cl_uuid.net = NULL;
spin_lock_init(&clp->cl_localio_lock);
#endif /* CONFIG_NFS_LOCALIO */
diff --git a/fs/nfs/localio.c b/fs/nfs/localio.c
index 40521da422f7..55622084d5c2 100644
--- a/fs/nfs/localio.c
+++ b/fs/nfs/localio.c
@@ -120,15 +120,13 @@ const struct rpc_program nfslocalio_program = {
/*
* nfs_local_enable - enable local i/o for an nfs_client
*/
-static void nfs_local_enable(struct nfs_client *clp, nfs_uuid_t *nfs_uuid)
+static void nfs_local_enable(struct nfs_client *clp)
{
spin_lock(&clp->cl_localio_lock);
if (unlikely(!get_nfs_to_nfsd_symbols()))
goto out;
set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
- rcu_assign_pointer(clp->cl_nfssvc_net, nfs_uuid->net);
- rcu_assign_pointer(clp->cl_nfssvc_dom, nfs_uuid->dom);
trace_nfs_local_enable(clp);
out:
spin_unlock(&clp->cl_localio_lock);
@@ -139,25 +137,12 @@ static void nfs_local_enable(struct nfs_client *clp, nfs_uuid_t *nfs_uuid)
*/
void nfs_local_disable(struct nfs_client *clp)
{
- struct net *cl_nfssvc_net;
- struct auth_domain *cl_nfssvc_dom;
-
spin_lock(&clp->cl_localio_lock);
if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
trace_nfs_local_disable(clp);
put_nfs_to_nfsd_symbols();
- cl_nfssvc_net = rcu_dereference(clp->cl_nfssvc_net);
- if (cl_nfssvc_net) {
- put_net(cl_nfssvc_net);
- RCU_INIT_POINTER(clp->cl_nfssvc_net, NULL);
- }
-
- cl_nfssvc_dom = rcu_dereference(clp->cl_nfssvc_dom);
- if (cl_nfssvc_dom) {
- auth_domain_put(cl_nfssvc_dom);
- RCU_INIT_POINTER(clp->cl_nfssvc_dom, NULL);
- }
+ nfs_uuid_invalidate_one_client(&clp->cl_uuid);
}
spin_unlock(&clp->cl_localio_lock);
}
@@ -179,8 +164,7 @@ static struct rpc_clnt *nfs_init_localioclient(struct nfs_client *clp)
return rpcclient_localio;
}
-static bool nfs_server_uuid_is_local(struct nfs_client *clp,
- nfs_uuid_t *nfs_uuid)
+static bool nfs_server_uuid_is_local(struct nfs_client *clp)
{
u8 uuid[UUID_SIZE];
struct rpc_message msg = {
@@ -193,7 +177,7 @@ static bool nfs_server_uuid_is_local(struct nfs_client *clp,
if (IS_ERR(rpcclient_localio))
return false;
- export_uuid(uuid, &nfs_uuid->uuid);
+ export_uuid(uuid, &clp->cl_uuid.uuid);
msg.rpc_proc = &nfs_localio_procedures[LOCALIOPROC_UUID_IS_LOCAL];
status = rpc_call_sync(rpcclient_localio, &msg, 0);
@@ -202,7 +186,7 @@ static bool nfs_server_uuid_is_local(struct nfs_client *clp,
rpc_shutdown_client(rpcclient_localio);
/* Server is only local if it initialized required struct members */
- if (status || !nfs_uuid->net || !nfs_uuid->dom)
+ if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom)
return false;
return true;
@@ -215,8 +199,6 @@ static bool nfs_server_uuid_is_local(struct nfs_client *clp,
*/
void nfs_local_probe(struct nfs_client *clp)
{
- nfs_uuid_t nfs_uuid;
-
/* Disallow localio if disabled via sysfs or AUTH_SYS isn't used */
if (!localio_enabled ||
clp->cl_rpcclient->cl_auth->au_flavor != RPC_AUTH_UNIX) {
@@ -229,10 +211,10 @@ void nfs_local_probe(struct nfs_client *clp)
nfs_local_disable(clp);
}
- nfs_uuid_begin(&nfs_uuid);
- if (nfs_server_uuid_is_local(clp, &nfs_uuid))
- nfs_local_enable(clp, &nfs_uuid);
- nfs_uuid_end(&nfs_uuid);
+ nfs_uuid_begin(&clp->cl_uuid);
+ if (nfs_server_uuid_is_local(clp))
+ nfs_local_enable(clp);
+ nfs_uuid_end(&clp->cl_uuid);
}
EXPORT_SYMBOL_GPL(nfs_local_probe);
@@ -245,8 +227,6 @@ struct nfs_localio_ctx *
nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
struct nfs_fh *fh, const fmode_t mode)
{
- struct net *cl_nfssvc_net;
- struct auth_domain *cl_nfssvc_dom;
struct nfs_localio_ctx *localio;
int status;
@@ -255,15 +235,8 @@ nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
if (mode & ~(FMODE_READ | FMODE_WRITE))
return NULL;
- rcu_read_lock();
- cl_nfssvc_net = rcu_dereference(clp->cl_nfssvc_net);
- cl_nfssvc_dom = rcu_dereference(clp->cl_nfssvc_dom);
- if (unlikely(!cl_nfssvc_net || !cl_nfssvc_dom))
- localio = ERR_PTR(-ENXIO);
- else
- localio = nfs_to.nfsd_open_local_fh(cl_nfssvc_net, cl_nfssvc_dom,
- clp->cl_rpcclient, cred, fh, mode);
- rcu_read_unlock();
+ localio = nfs_to.nfsd_open_local_fh(&clp->cl_uuid,
+ clp->cl_rpcclient, cred, fh, mode);
if (IS_ERR(localio)) {
status = PTR_ERR(localio);
trace_nfs_local_open_fh(fh, mode, status);
diff --git a/fs/nfs_common/nfslocalio.c b/fs/nfs_common/nfslocalio.c
index cc30fdb0cb46..8545ee75f756 100644
--- a/fs/nfs_common/nfslocalio.c
+++ b/fs/nfs_common/nfslocalio.c
@@ -11,11 +11,11 @@
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("NFS localio protocol bypass support");
-DEFINE_MUTEX(nfs_uuid_mutex);
+static DEFINE_SPINLOCK(nfs_uuid_lock);
/*
* Global list of nfs_uuid_t instances, add/remove
- * is protected by nfs_uuid_mutex.
+ * is protected by nfs_uuid_lock.
* Reads are protected by RCU read lock (see below).
*/
LIST_HEAD(nfs_uuids);
@@ -26,17 +26,19 @@ void nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
nfs_uuid->dom = NULL;
uuid_gen(&nfs_uuid->uuid);
- mutex_lock(&nfs_uuid_mutex);
+ spin_lock(&nfs_uuid_lock);
list_add_tail_rcu(&nfs_uuid->list, &nfs_uuids);
- mutex_unlock(&nfs_uuid_mutex);
+ spin_unlock(&nfs_uuid_lock);
}
EXPORT_SYMBOL_GPL(nfs_uuid_begin);
void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
{
- mutex_lock(&nfs_uuid_mutex);
- list_del_rcu(&nfs_uuid->list);
- mutex_unlock(&nfs_uuid_mutex);
+ if (nfs_uuid->net == NULL) {
+ spin_lock(&nfs_uuid_lock);
+ list_del_init(&nfs_uuid->list);
+ spin_unlock(&nfs_uuid_lock);
+ }
}
EXPORT_SYMBOL_GPL(nfs_uuid_end);
@@ -45,34 +47,66 @@ static nfs_uuid_t * nfs_uuid_lookup(const uuid_t *uuid)
{
nfs_uuid_t *nfs_uuid;
- list_for_each_entry_rcu(nfs_uuid, &nfs_uuids, list)
+ list_for_each_entry(nfs_uuid, &nfs_uuids, list)
if (uuid_equal(&nfs_uuid->uuid, uuid))
return nfs_uuid;
return NULL;
}
-bool nfs_uuid_is_local(const uuid_t *uuid, struct net *net, struct auth_domain *dom)
+void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
+ struct net *net, struct auth_domain *dom)
{
- bool is_local = false;
nfs_uuid_t *nfs_uuid;
- rcu_read_lock();
+ spin_lock(&nfs_uuid_lock);
nfs_uuid = nfs_uuid_lookup(uuid);
if (nfs_uuid) {
- nfs_uuid->net = maybe_get_net(net);
- if (nfs_uuid->net) {
- is_local = true;
- kref_get(&dom->ref);
- nfs_uuid->dom = dom;
- }
+ kref_get(&dom->ref);
+ nfs_uuid->dom = dom;
+ /* We don't hold a ref on the net, but instead put
+ * ourselves on a list so the net pointer can be
+ * invalidated.
+ */
+ list_move(&nfs_uuid->list, list);
+ nfs_uuid->net = net;
}
- rcu_read_unlock();
-
- return is_local;
+ spin_unlock(&nfs_uuid_lock);
}
EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
+static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
+{
+ if (nfs_uuid->net)
+ put_net(nfs_uuid->net);
+ nfs_uuid->net = NULL;
+ if (nfs_uuid->dom)
+ auth_domain_put(nfs_uuid->dom);
+ nfs_uuid->dom = NULL;
+ list_del_init(&nfs_uuid->list);
+}
+
+void nfs_uuid_invalidate_clients(struct list_head *list)
+{
+ nfs_uuid_t *nfs_uuid, *tmp;
+
+ spin_lock(&nfs_uuid_lock);
+ list_for_each_entry_safe(nfs_uuid, tmp, list, list)
+ nfs_uuid_put_locked(nfs_uuid);
+ spin_unlock(&nfs_uuid_lock);
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_clients);
+
+void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
+{
+ if (nfs_uuid->net) {
+ spin_lock(&nfs_uuid_lock);
+ nfs_uuid_put_locked(nfs_uuid);
+ spin_unlock(&nfs_uuid_lock);
+ }
+}
+EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
+
/*
* The nfs localio code needs to call into nfsd using various symbols (below),
* but cannot be statically linked, because that will make the nfs module
@@ -100,7 +134,7 @@ static void put_##NFSD_SYMBOL(void) \
/* The nfs localio code needs to call into nfsd to map filehandle -> struct nfsd_file */
extern struct nfs_localio_ctx *
-nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
+nfsd_open_local_fh(nfs_uuid_t *, struct rpc_clnt *,
const struct cred *, const struct nfs_fh *, const fmode_t);
DEFINE_NFS_TO_NFSD_SYMBOL(nfsd_open_local_fh);
diff --git a/fs/nfsd/localio.c b/fs/nfsd/localio.c
index a192bbe308df..491bf5017d34 100644
--- a/fs/nfsd/localio.c
+++ b/fs/nfsd/localio.c
@@ -42,13 +42,14 @@
* nfsd_serv_put (via nfs_localio_ctx_free).
*/
struct nfs_localio_ctx *
-nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
+nfsd_open_local_fh(nfs_uuid_t *uuid,
struct rpc_clnt *rpc_clnt, const struct cred *cred,
const struct nfs_fh *nfs_fh, const fmode_t fmode)
{
int mayflags = NFSD_MAY_LOCALIO;
int status = 0;
- struct nfsd_net *nn;
+ struct nfsd_net *nn = NULL;
+ struct net *net;
struct svc_cred rq_cred;
struct svc_fh fh;
struct nfs_localio_ctx *localio;
@@ -64,12 +65,20 @@ nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
/*
* Not running in nfsd context, so must safely get reference on nfsd_serv.
* But the server may already be shutting down, if so disallow new localio.
+ * uuid->net is NOT a counted reference, but rcu_read_lock() ensures that if
+ * uuid->net is not NULL, then nfsd_serv_try_get() is safe and if that succeeds
+ * we will have an implied reference to the net.
*/
- nn = net_generic(cl_nfssvc_net, nfsd_net_id);
- if (unlikely(!nfsd_serv_try_get(nn))) {
+ rcu_read_lock();
+ net = READ_ONCE(uuid->net);
+ if (net)
+ nn = net_generic(net, nfsd_net_id);
+ if (unlikely(!nn || !nfsd_serv_try_get(nn))) {
+ rcu_read_unlock();
status = -ENXIO;
goto out_nfsd_serv;
}
+ rcu_read_unlock();
/* nfs_fh -> svc_fh */
fh_init(&fh, NFS4_FHSIZE);
@@ -83,7 +92,7 @@ nfsd_open_local_fh(struct net *cl_nfssvc_net, struct auth_domain *cl_nfssvc_dom,
svcauth_map_clnt_to_svc_cred_local(rpc_clnt, cred, &rq_cred);
- beres = nfsd_file_acquire_local(cl_nfssvc_net, &rq_cred, cl_nfssvc_dom,
+ beres = nfsd_file_acquire_local(uuid->net, &rq_cred, uuid->dom,
&fh, mayflags, &localio->nf);
if (beres) {
status = nfs_stat_to_errno(be32_to_cpu(beres));
@@ -123,9 +132,11 @@ struct localio_uuidarg {
static __be32 localio_proc_uuid_is_local(struct svc_rqst *rqstp)
{
struct localio_uuidarg *argp = rqstp->rq_argp;
+ struct net *net = SVC_NET(rqstp);
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- (void) nfs_uuid_is_local(&argp->uuid, SVC_NET(rqstp),
- rqstp->rq_client);
+ nfs_uuid_is_local(&argp->uuid, &nn->local_clients,
+ net, rqstp->rq_client);
return rpc_success;
}
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index e2d953f21dde..9c65db8d3f44 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -216,6 +216,8 @@ struct nfsd_net {
/* last time an admin-revoke happened for NFSv4.0 */
time64_t nfs40_last_revoke;
+ /* Local clients to be invalidated when net is shut down */
+ struct list_head local_clients;
};
/* Simple check to find out if a given net was properly initialized */
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 64c1b4d649bc..01e383d692ab 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -18,6 +18,7 @@
#include <linux/sunrpc/svc.h>
#include <linux/module.h>
#include <linux/fsnotify.h>
+#include <linux/nfslocalio.h>
#include "idmap.h"
#include "nfsd.h"
@@ -2257,6 +2258,7 @@ static __net_init int nfsd_net_init(struct net *net)
get_random_bytes(&nn->siphash_key, sizeof(nn->siphash_key));
seqlock_init(&nn->writeverf_lock);
nfsd_proc_stat_init(net);
+ INIT_LIST_HEAD(&nn->local_clients);
return 0;
@@ -2268,6 +2270,19 @@ static __net_init int nfsd_net_init(struct net *net)
return retval;
}
+/**
+ * nfsd_net_pre_exit - Disconnect localio clients from net namespace
+ * @net: a network namespace that is about to be destroyed
+ *
+ * This invalidated ->net pointers held by localio clients
+ * while they can still safely access nn->counter.
+ */
+static __net_exit void nfsd_net_pre_exit(struct net *net)
+{
+ struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+
+ nfs_uuid_invalidate_clients(&nn->local_clients);
+}
/**
* nfsd_net_exit - Release the nfsd_net portion of a net namespace
* @net: a network namespace that is about to be destroyed
@@ -2285,6 +2300,7 @@ static __net_exit void nfsd_net_exit(struct net *net)
static struct pernet_operations nfsd_net_ops = {
.init = nfsd_net_init,
+ .pre_exit = nfsd_net_pre_exit,
.exit = nfsd_net_exit,
.id = &nfsd_net_id,
.size = sizeof(struct nfsd_net),
diff --git a/fs/nfsd/vfs.h b/fs/nfsd/vfs.h
index e12310dd5f4c..2ecceb8b9d3d 100644
--- a/fs/nfsd/vfs.h
+++ b/fs/nfsd/vfs.h
@@ -8,6 +8,7 @@
#include <linux/fs.h>
#include <linux/posix_acl.h>
+#include <linux/nfslocalio.h>
#include "nfsfh.h"
#include "nfsd.h"
@@ -161,7 +162,7 @@ __be32 nfsd_permission(struct svc_cred *cred, struct svc_export *exp,
void nfsd_filp_close(struct file *fp);
struct nfs_localio_ctx *
-nfsd_open_local_fh(struct net *, struct auth_domain *,
+nfsd_open_local_fh(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
const struct nfs_fh *, const fmode_t);
diff --git a/include/linux/nfs_fs_sb.h b/include/linux/nfs_fs_sb.h
index fc7982fc218c..b43e3e067e44 100644
--- a/include/linux/nfs_fs_sb.h
+++ b/include/linux/nfs_fs_sb.h
@@ -131,8 +131,7 @@ struct nfs_client {
#if IS_ENABLED(CONFIG_NFS_LOCALIO)
struct timespec64 cl_nfssvc_boot;
seqlock_t cl_boot_lock;
- struct net __rcu * cl_nfssvc_net;
- struct auth_domain __rcu * cl_nfssvc_dom;
+ nfs_uuid_t cl_uuid;
spinlock_t cl_localio_lock;
#endif /* CONFIG_NFS_LOCALIO */
};
diff --git a/include/linux/nfslocalio.h b/include/linux/nfslocalio.h
index 68f5b39f1940..e196f716a2f5 100644
--- a/include/linux/nfslocalio.h
+++ b/include/linux/nfslocalio.h
@@ -28,7 +28,10 @@ typedef struct {
void nfs_uuid_begin(nfs_uuid_t *);
void nfs_uuid_end(nfs_uuid_t *);
-bool nfs_uuid_is_local(const uuid_t *, struct net *, struct auth_domain *);
+void nfs_uuid_is_local(const uuid_t *, struct list_head *,
+ struct net *, struct auth_domain *);
+void nfs_uuid_invalidate_clients(struct list_head *list);
+void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
struct nfsd_file;
struct nfsd_net;
@@ -39,7 +42,7 @@ struct nfs_localio_ctx {
};
typedef struct nfs_localio_ctx *
-(*nfs_to_nfsd_open_local_fh_t)(struct net *, struct auth_domain *,
+(*nfs_to_nfsd_open_local_fh_t)(nfs_uuid_t *,
struct rpc_clnt *, const struct cred *,
const struct nfs_fh *, const fmode_t);
typedef struct nfsd_file * (*nfs_to_nfsd_file_get_t)(struct nfsd_file *);
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v14-plus 00/25] Address netns refcount issues for localio
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
` (6 preceding siblings ...)
2024-08-30 3:46 ` [PATCH v14-plus 00/25] Address netns refcount issues for localio Mike Snitzer
@ 2024-08-30 7:47 ` Mike Snitzer
2024-08-30 13:56 ` Mike Snitzer
7 siblings, 1 reply; 10+ messages in thread
From: Mike Snitzer @ 2024-08-30 7:47 UTC (permalink / raw)
To: NeilBrown; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Fri, Aug 30, 2024 at 12:20:13PM +1000, NeilBrown wrote:
> Following are revised versions of 6 patches from the v14 localio series.
>
> The issue addressed is net namespace refcounting.
>
> We don't want to keep a long-term counted reference in the client
> because that prevents a server container from completely shutting down.
>
> So we avoid taking a reference at all and rely on the per-cpu reference
> to the server being sufficient to keep the net-ns active. This involves
> allowing the net-ns exit code to iterate all active clients and clear
> their ->net pointers (which they need to find the per-cpu-refcount for
> the nfs_serv).
>
> So:
> - embed nfs_uuid_t in nfs_client. This provides a list_head that can
> be used to find the client. It does add the actual uuid to nfs_client
> so it is bigger than needed. If that is really a problem we can find
> a fix.
>
> - When the nfs server confirms that the uuid is local, it moves the
> nfs_uuid_t onto a per-net-ns list.
>
> - When the net-ns is shutting down - in a "pre_exit" handler, all these
> nfS_uuid_t have their ->net cleared. There is an rcu_synchronize()
> call between pre_exit() handlers and exit() handlers so and caller
> that sees ->net as not NULL can safely check the ->counter
>
> - We now pass the nfs_uuid_t to nfsd_open_local_fh() so it can safely
> look at ->net in a private rcu_read_lock() section.
>
> I have compile tested this code but nothing more.
>
> Thanks,
> NeilBrown
>
> [PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol
> [PATCH 15/25] nfs_common: introduce nfs_localio_ctx struct and
> [PATCH 16/25] nfsd: add localio support
> [PATCH 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM
> [PATCH 19/25] nfs: add localio support
> [PATCH 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM
Hey Neil,
I attempted to test the kernel with your changes but it crashed with:
[ 55.422564] list_add corruption. next is NULL.
[ 55.423523] ------------[ cut here ]------------
[ 55.424423] kernel BUG at lib/list_debug.c:27!
[ 55.425291] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 55.426367] CPU: 29 UID: 0 PID: 5251 Comm: nfsd Kdump: loaded Not tainted 6.11.0-rc4.snitm+ #147
[ 55.427991] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-4.module+el8.9.0+19570+14a90618 04/01/2014
[ 55.429697] RIP: 0010:__list_add_valid_or_report+0x55/0xa0
[ 55.430741] Code: 4c 39 cf 74 4f b8 01 00 00 00 5d c3 cc cc cc cc 48 c7 c7 98 1d a5 82 e8 d9 6d 93 ff 0f 0b 48 c7 c7 c0 1d a5 82 e8 cb 6d 93 ff <0f> 0b 4c 89 c1 48 c7 c7 e8 1d a5 82 e8 ba 6d 93 ff 0f 0b 48 89 d1
[ 55.434167] RSP: 0018:ffff8881441a7d50 EFLAGS: 00010296
[ 55.435141] RAX: 0000000000000022 RBX: ffff888107b50370 RCX: 0000000000000000
[ 55.436455] RDX: ffff888473caf800 RSI: ffff888473ca18c0 RDI: ffff888473ca18c0
[ 55.437770] RBP: ffff8881441a7d50 R08: 0000000000000022 R09: ffff8881441a7be8
[ 55.439098] R10: ffff8881441a7be0 R11: ffffffff8333f328 R12: ffff888107b50380
[ 55.440419] R13: ffff888103b15080 R14: ffff888107bb5d00 R15: 0000000000000000
[ 55.441737] FS: 0000000000000000(0000) GS:ffff888473c80000(0000) knlGS:0000000000000000
[ 55.443228] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 55.444297] CR2: 000055958fab3488 CR3: 000000010615e000 CR4: 0000000000350ef0
[ 55.445615] Call Trace:
[ 55.446090] <TASK>
[ 55.446498] ? show_regs+0x6d/0x80
[ 55.447149] ? die+0x3c/0xa0
[ 55.447698] ? do_trap+0xcf/0xf0
[ 55.448316] ? do_error_trap+0x75/0xa0
[ 55.449026] ? __list_add_valid_or_report+0x55/0xa0
[ 55.449938] ? exc_invalid_op+0x57/0x80
[ 55.450660] ? __list_add_valid_or_report+0x55/0xa0
[ 55.451646] ? asm_exc_invalid_op+0x1f/0x30
[ 55.452438] ? __list_add_valid_or_report+0x55/0xa0
[ 55.453355] nfs_uuid_is_local+0xba/0x110
[ 55.454115] localio_proc_uuid_is_local+0x64/0x80 [nfsd]
[ 55.455145] nfsd_dispatch+0xc2/0x210 [nfsd]
[ 55.455977] svc_process_common+0x2e6/0x6e0
[ 55.456761] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[ 55.457697] svc_process+0x13e/0x1e0
[ 55.458377] svc_recv+0x89e/0xa70
[ 55.459012] ? __pfx_nfsd+0x10/0x10 [nfsd]
[ 55.459806] nfsd+0xa5/0x100 [nfsd]
[ 55.460486] kthread+0xe5/0x120
[ 55.461090] ? __pfx_kthread+0x10/0x10
[ 55.461801] ret_from_fork+0x3d/0x60
[ 55.462476] ? __pfx_kthread+0x10/0x10
[ 55.463184] ret_from_fork_asm+0x1a/0x30
[ 55.463923] </TASK>
I'll triple check my melding of your changes and mine in ~7 hours.. I
may have missed something.
Note this is _not_ with your other incremental patch (that uses
__module_get) -- only because I didn't get to that yet.
Mike
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v14-plus 00/25] Address netns refcount issues for localio
2024-08-30 7:47 ` Mike Snitzer
@ 2024-08-30 13:56 ` Mike Snitzer
0 siblings, 0 replies; 10+ messages in thread
From: Mike Snitzer @ 2024-08-30 13:56 UTC (permalink / raw)
To: NeilBrown; +Cc: Chuck Lever, Jeff Layton, linux-nfs
On Fri, Aug 30, 2024 at 03:47:31AM -0400, Mike Snitzer wrote:
> On Fri, Aug 30, 2024 at 12:20:13PM +1000, NeilBrown wrote:
> > Following are revised versions of 6 patches from the v14 localio series.
> >
> > The issue addressed is net namespace refcounting.
> >
> > We don't want to keep a long-term counted reference in the client
> > because that prevents a server container from completely shutting down.
> >
> > So we avoid taking a reference at all and rely on the per-cpu reference
> > to the server being sufficient to keep the net-ns active. This involves
> > allowing the net-ns exit code to iterate all active clients and clear
> > their ->net pointers (which they need to find the per-cpu-refcount for
> > the nfs_serv).
> >
> > So:
> > - embed nfs_uuid_t in nfs_client. This provides a list_head that can
> > be used to find the client. It does add the actual uuid to nfs_client
> > so it is bigger than needed. If that is really a problem we can find
> > a fix.
> >
> > - When the nfs server confirms that the uuid is local, it moves the
> > nfs_uuid_t onto a per-net-ns list.
> >
> > - When the net-ns is shutting down - in a "pre_exit" handler, all these
> > nfS_uuid_t have their ->net cleared. There is an rcu_synchronize()
> > call between pre_exit() handlers and exit() handlers so and caller
> > that sees ->net as not NULL can safely check the ->counter
> >
> > - We now pass the nfs_uuid_t to nfsd_open_local_fh() so it can safely
> > look at ->net in a private rcu_read_lock() section.
> >
> > I have compile tested this code but nothing more.
> >
> > Thanks,
> > NeilBrown
> >
> > [PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol
> > [PATCH 15/25] nfs_common: introduce nfs_localio_ctx struct and
> > [PATCH 16/25] nfsd: add localio support
> > [PATCH 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM
> > [PATCH 19/25] nfs: add localio support
> > [PATCH 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM
>
> Hey Neil,
>
> I attempted to test the kernel with your changes but it crashed with:
>
> [ 55.422564] list_add corruption. next is NULL.
> [ 55.423523] ------------[ cut here ]------------
> [ 55.424423] kernel BUG at lib/list_debug.c:27!
...
> I'll triple check my melding of your changes and mine in ~7 hours.. I
> may have missed something.
Good news, I was just missing your fs/nfsd/nfsctl.c changes.. works
well with those ;)
Seeing a very slight drop in performance (with my well-worn smoke test
that does 20 secs of aio directio 128K reads with 24 threads on my
testbed). But I mean slight (~2-3%). Not worried about it,
correctness trumps performance, but I think reclaiming that
performance will be easy (by avoiding the need for KMEM_CACHE
nfs_localio_ctx_{alloc,free}).
> Note this is _not_ with your other incremental patch (that uses
> __module_get) -- only because I didn't get to that yet.
Moving on to incorporating your nfsd __module_get now. Then on to the
work of switching from nfs_localio_ctx back to returning nfsd_file.
Mike
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-08-30 13:56 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-30 2:20 [PATCH v14-plus 00/25] Address netns refcount issues for localio NeilBrown
2024-08-30 2:20 ` [PATCH 14/25] nfs_common: add NFS LOCALIO auxiliary protocol enablement NeilBrown
2024-08-30 2:20 ` [PATCH 15/25] nfs_common: introduce nfs_localio_ctx struct and interfaces NeilBrown
2024-08-30 2:20 ` [PATCH 16/25] nfsd: add localio support NeilBrown
2024-08-30 2:20 ` [PATCH 17/25] nfsd: implement server support for NFS_LOCALIO_PROGRAM NeilBrown
2024-08-30 2:20 ` [PATCH 19/25] nfs: add localio support NeilBrown
2024-08-30 2:20 ` [PATCH 23/25] nfs: implement client support for NFS_LOCALIO_PROGRAM NeilBrown
2024-08-30 3:46 ` [PATCH v14-plus 00/25] Address netns refcount issues for localio Mike Snitzer
2024-08-30 7:47 ` Mike Snitzer
2024-08-30 13:56 ` Mike Snitzer
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.