* [RFC PATCH 0/2] NFSD: add a setting to disable splice reads
@ 2025-03-08 20:14 cel
2025-03-08 20:14 ` [RFC PATCH 1/2] NFSD: Add /sys/kernel/debug/nfsd cel
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: cel @ 2025-03-08 20:14 UTC
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Trond Myklebust, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
The usual policy for kernel-user space APIs is that once a public
API appears, it is difficult to change or remove, because backwards
compatibility with user space software must be maintained.
This series introduces /sys/kernel/debug/nfsd/ where we can
hopefully place ephemeral and undocumented settings for testing NFSD
features that are to be added or deprecated without the worry of
having to support an administrative API forever without change,
amen.
As a first consumer of this user-kernel API, the series adds a
simple disable-splice-read setting, which can force all NFS READ
operations to use vfs_iter_read() rather than page splicing to fill
data content for an NFS reply.
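The gate described above can be modelled in user space. The sketch below is not the kernel code: every `model_*` name and the `enum model_auth_flavor` values are hypothetical stand-ins for illustration, and it assumes that the Kerberos integrity and privacy flavors reject splicing, which the quoted diff context suggests but does not show in full.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the RPC auth flavors the server checks. */
enum model_auth_flavor {
	MODEL_AUTH_SYS,
	MODEL_AUTH_KRB5,
	MODEL_AUTH_KRB5I,	/* Kerberos with integrity */
	MODEL_AUTH_KRB5P,	/* Kerberos with privacy */
};

/* Models the global toggle added by the series; false by default. */
static bool model_disable_splice_read;

/* Models the debugfs "set" handler: any non-zero value disables splice. */
static void model_dsr_set(unsigned long long val)
{
	model_disable_splice_read = val > 0;
}

/* Models the debugfs "get" handler: always reports 0 or 1. */
static unsigned long long model_dsr_get(void)
{
	return model_disable_splice_read ? 1 : 0;
}

/* Models the per-request decision between splice and iov iter reads. */
static bool model_read_splice_ok(enum model_auth_flavor flavor)
{
	/* The new setting overrides everything else. */
	if (model_disable_splice_read)
		return false;

	switch (flavor) {
	case MODEL_AUTH_KRB5I:
	case MODEL_AUTH_KRB5P:
		/* Assumption: these flavors never use page splicing. */
		return false;
	default:
		return true;
	}
}
```

With the toggle clear, only the flavor decides; once it is set, every request in this model falls back to the iov iter path.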
The splice read path is the default on most file systems, so it gets
most of the test experience. The purpose of this new setting is to
enable test runners to force the use of the iov iter path. We are
also interested in comparing the performance of the splice and iter
paths, as a prelude to potentially removing page splicing. This new
setting makes it easy to benchmark either read mode without having
to rebuild the kernel.
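For benchmarking, a test harness might flip the setting between runs. The following is a minimal user-space sketch, assuming debugfs is mounted at /sys/kernel/debug and the caller has permission to write there; `dsr_write()`, `dsr_read()` and `DSR_PATH` are hypothetical helper names, not part of the series.

```c
#include <stdio.h>

/* Path taken from the patch; assumes debugfs is mounted in the usual place. */
#define DSR_PATH "/sys/kernel/debug/nfsd/disable-splice-read"

/* Write 1 (disable splice reads) or 0 (allow them) to a settings file.
 * Returns 0 on success, -1 on failure. */
static int dsr_write(const char *path, int disable)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;
	ok = fprintf(f, "%d\n", disable ? 1 : 0) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/* Read the current value back. Returns 0 on success, -1 on failure. */
static int dsr_read(const char *path, unsigned long long *val)
{
	FILE *f = fopen(path, "r");
	int n;

	if (!f)
		return -1;
	n = fscanf(f, "%llu", val);
	fclose(f);
	return n == 1 ? 0 : -1;
}
```

A harness could call `dsr_write(DSR_PATH, 1)` before an iov iter run and `dsr_write(DSR_PATH, 0)` before a splice run, checking the result with `dsr_read()` each time.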
We have an eye on a few other consumers, such as uncached I/O and
increasing the maximum r/wsize, for which /sys/kernel/debug/nfsd
might be suitable while their performance impact is studied before
a concrete administrative interface is agreed upon.
Opinions and code review are welcome, as always.
Chuck Lever (2):
NFSD: Add /sys/kernel/debug/nfsd
NFSD: Add experimental setting to disable the use of splice read
fs/nfsd/Makefile | 1 +
fs/nfsd/debugfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfsctl.c | 4 ++++
fs/nfsd/nfsd.h | 10 ++++++++++
fs/nfsd/vfs.c | 4 ++++
5 files changed, 66 insertions(+)
create mode 100644 fs/nfsd/debugfs.c
--
2.48.1
* [RFC PATCH 1/2] NFSD: Add /sys/kernel/debug/nfsd
From: cel @ 2025-03-08 20:14 UTC
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Trond Myklebust, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Create a small sandbox under /sys/kernel/debug for experimental NFS
server feature settings. There is no API/ABI compatibility guarantee
for these settings.
The only documentation for such settings, if any documentation exists,
is in the kernel source code.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/Makefile | 1 +
fs/nfsd/debugfs.c | 18 ++++++++++++++++++
fs/nfsd/nfsctl.c | 4 ++++
fs/nfsd/nfsd.h | 8 ++++++++
4 files changed, 31 insertions(+)
create mode 100644 fs/nfsd/debugfs.c
diff --git a/fs/nfsd/Makefile b/fs/nfsd/Makefile
index 2f687619f65b..55744bb786c9 100644
--- a/fs/nfsd/Makefile
+++ b/fs/nfsd/Makefile
@@ -24,6 +24,7 @@ nfsd-$(CONFIG_NFSD_BLOCKLAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_SCSILAYOUT) += blocklayout.o blocklayoutxdr.o
nfsd-$(CONFIG_NFSD_FLEXFILELAYOUT) += flexfilelayout.o flexfilelayoutxdr.o
nfsd-$(CONFIG_NFS_LOCALIO) += localio.o
+nfsd-$(CONFIG_DEBUG_FS) += debugfs.o
.PHONY: xdrgen
diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
new file mode 100644
index 000000000000..e913268d9c2d
--- /dev/null
+++ b/fs/nfsd/debugfs.c
@@ -0,0 +1,18 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include <linux/debugfs.h>
+
+#include "nfsd.h"
+
+static struct dentry *nfsd_top_dir __read_mostly;
+
+void nfsd_debugfs_exit(void)
+{
+	debugfs_remove_recursive(nfsd_top_dir);
+	nfsd_top_dir = NULL;
+}
+
+void nfsd_debugfs_init(void)
+{
+	nfsd_top_dir = debugfs_create_dir("nfsd", NULL);
+}
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index ce2a71e4904c..1919eafe500a 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -2276,6 +2276,8 @@ static int __init init_nfsd(void)
 {
 	int retval;
 
+	nfsd_debugfs_init();
+
 	retval = nfsd4_init_slabs();
 	if (retval)
 		return retval;
@@ -2323,6 +2325,7 @@ static int __init init_nfsd(void)
 	nfsd4_exit_pnfs();
 out_free_slabs:
 	nfsd4_free_slabs();
+	nfsd_debugfs_exit();
 	return retval;
 }
@@ -2339,6 +2342,7 @@ static void __exit exit_nfsd(void)
 	nfsd_lockd_shutdown();
 	nfsd4_free_slabs();
 	nfsd4_exit_pnfs();
+	nfsd_debugfs_exit();
 }
MODULE_AUTHOR("Olaf Kirch <okir@monad.swb.de>");
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index e2997f0ffbc5..8a53ddab5df0 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -156,6 +156,14 @@ void nfsd_reset_versions(struct nfsd_net *nn);
int nfsd_create_serv(struct net *net);
void nfsd_destroy_serv(struct net *net);
+#ifdef CONFIG_DEBUG_FS
+void nfsd_debugfs_init(void);
+void nfsd_debugfs_exit(void);
+#else
+static inline void nfsd_debugfs_init(void) {}
+static inline void nfsd_debugfs_exit(void) {}
+#endif
+
extern int nfsd_max_blksize;
static inline int nfsd_v4client(struct svc_rqst *rq)
--
2.48.1
* [RFC PATCH 2/2] NFSD: Add experimental setting to disable the use of splice read
From: cel @ 2025-03-08 20:14 UTC
To: Neil Brown, Jeff Layton, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Trond Myklebust, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
NFSD currently has two separate code paths for handling read
requests. One uses page splicing; the other is a traditional read
based on an iov iterator.
Because most Linux file systems support splice read, the latter
does not get nearly the same test experience as splice reads.
To force the use of vectored reads for testing and benchmarking,
introduce the ability to disable splice reads for all NFS READ
operations.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/debugfs.c | 29 +++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 2 ++
fs/nfsd/vfs.c | 4 ++++
3 files changed, 35 insertions(+)
diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index e913268d9c2d..894938fea97b 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -6,6 +6,32 @@
static struct dentry *nfsd_top_dir __read_mostly;
+/*
+ * /sys/kernel/debug/nfsd/disable-splice-read
+ *
+ * Contents:
+ * %0: NFS READ is allowed to use page splicing
+ * %1: NFS READ uses only iov iter read
+ *
+ * The default value of this setting is zero (page splicing is
+ * allowed). This setting is effective for all NFS versions, all
+ * exports, and in all NFSD net namespaces.
+ */
+
+static int nfsd_dsr_get(void *data, u64 *val)
+{
+	*val = nfsd_disable_splice_read ? 1 : 0;
+	return 0;
+}
+
+static int nfsd_dsr_set(void *data, u64 val)
+{
+	nfsd_disable_splice_read = (val > 0) ? true : false;
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_dsr_fops, nfsd_dsr_get, nfsd_dsr_set, "%llu\n");
+
 void nfsd_debugfs_exit(void)
 {
 	debugfs_remove_recursive(nfsd_top_dir);
@@ -15,4 +41,7 @@ void nfsd_debugfs_exit(void)
 void nfsd_debugfs_init(void)
 {
 	nfsd_top_dir = debugfs_create_dir("nfsd", NULL);
+
+	debugfs_create_file("disable-splice-read", S_IWUSR | S_IRUGO,
+			    nfsd_top_dir, NULL, &nfsd_dsr_fops);
 }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 8a53ddab5df0..232aee06223d 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -164,6 +164,8 @@ static inline void nfsd_debugfs_init(void) {}
static inline void nfsd_debugfs_exit(void) {}
#endif
+extern bool nfsd_disable_splice_read __read_mostly;
+
extern int nfsd_max_blksize;
static inline int nfsd_v4client(struct svc_rqst *rq)
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 29cb7b812d71..30b0b192f1fa 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -47,6 +47,8 @@
#define NFSDDBG_FACILITY NFSDDBG_FILEOP
+bool nfsd_disable_splice_read __read_mostly;
+
/**
* nfserrno - Map Linux errnos to NFS errnos
* @errno: POSIX(-ish) error code to be mapped
@@ -1237,6 +1239,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp, struct nfsd_file *nf,
  */
 bool nfsd_read_splice_ok(struct svc_rqst *rqstp)
 {
+	if (nfsd_disable_splice_read)
+		return false;
 	switch (svc_auth_flavor(rqstp)) {
 	case RPC_AUTH_GSS_KRB5I:
 	case RPC_AUTH_GSS_KRB5P:
--
2.48.1
* Re: [RFC PATCH 0/2] NFSD: add a setting to disable splice reads
From: Jeff Layton @ 2025-03-10 12:08 UTC
To: cel, Neil Brown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, Trond Myklebust, Chuck Lever
On Sat, 2025-03-08 at 15:14 -0500, cel@kernel.org wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The usual policy for kernel-user space APIs is that once a public
> API appears, it is difficult to change or remove, in order to
> maintain backwards compatibility with user space software.
>
> This series introduces /sys/kernel/debug/nfsd/ where we can
> hopefully place ephemeral and undocumented settings for testing NFSD
> features that are to be added or deprecated without the worry of
> having to support an administrative API forever without change,
> amen.
>
> As a first consumer of this user-kernel API, the series adds a
> simple disable-splice-read setting, which can force all NFS READ
> operations to use vfs_iter_read() rather than page splicing to fill
> data content for an NFS reply.
>
> The splice read path is the default on most file systems, so it gets
> most of the test experience. The purpose of this new setting is to
> enable test runners to force the use of the iov iter path. We are
> also interested in comparing the performance of the splice and iter
> paths, as a prelude to potentially removing page splicing. This new
> setting makes it easy to benchmark either read mode without having
> to rebuild the kernel.
>
> We have an eye on a few other consumers, such as uncached I/O and
> increasing the maximum r/wsize, for which /sys/kernel/debug/nfsd
> might be suitable while their performance impact is studied before
> a concrete administrative interface is agreed upon.
>
> Opinions and code review are welcome, as always.
>
> Chuck Lever (2):
> NFSD: Add /sys/kernel/debug/nfsd
> NFSD: Add experimental setting to disable the use of splice read
>
> fs/nfsd/Makefile | 1 +
> fs/nfsd/debugfs.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
> fs/nfsd/nfsctl.c | 4 ++++
> fs/nfsd/nfsd.h | 10 ++++++++++
> fs/nfsd/vfs.c | 4 ++++
> 5 files changed, 66 insertions(+)
> create mode 100644 fs/nfsd/debugfs.c
>
Looks good to me:
Reviewed-by: Jeff Layton <jlayton@kernel.org>