From: Chuck Lever <cel@kernel.org>
To: Christoph Hellwig <hch@lst.de>, Mike Snitzer <snitzer@kernel.org>
Cc: <linux-nfs@vger.kernel.org>, Chuck Lever <chuck.lever@oracle.com>
Subject: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
Date: Fri, 19 Dec 2025 09:11:04 -0500
Message-ID: <20251219141105.1247093-2-cel@kernel.org>
In-Reply-To: <20251219141105.1247093-1-cel@kernel.org>
From: Chuck Lever <chuck.lever@oracle.com>

On NFS servers with fast network links but slow storage, clients can
generate WRITE requests faster than the server can flush payloads to
durable storage. This can push the server into memory exhaustion as
dirty pages accumulate across hundreds of concurrent NFSD threads.

The existing dirty page throttling (balance_dirty_pages()) uses
per-task accounting with default ratelimits that allow each thread
to dirty ~32 pages before throttling occurs. With many NFSD threads,
this allows significant dirty page accumulation before any
throttling kicks in.

Add a debugfs control to enable aggressive write throttling for
NFSD:

  /sys/kernel/debug/nfsd/write_throttle

When set to 1, NFSD write operations reduce nr_dirtied_pause to
force balance_dirty_pages() to be called more frequently. This uses
the same page-size-adjusted limit that
balance_dirty_pages_ratelimited_flags() applies when
wb->dirty_exceeded is true. On systems with 4KB pages that limit is
8 pages rather than ~32, so throttling is invoked 4x more frequently.

The setting defaults to 0 (normal throttling) and can be changed at
runtime without restarting NFSD.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
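For reference, the limit that this patch installs can be worked out in
user space. The sketch below evaluates the same expression as the
NFSD_AGGRESSIVE_DIRTY_LIMIT macro, with PAGE_SHIFT passed in as a
parameter. The helper names aggressive_dirty_limit() and print_limits()
are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Same expression as the patch's NFSD_AGGRESSIVE_DIRTY_LIMIT macro,
 * but with PAGE_SHIFT supplied by the caller (hypothetical helper).
 */
static unsigned int aggressive_dirty_limit(unsigned int page_shift)
{
	/* The expression only makes sense for pages of 1KB or larger. */
	assert(page_shift >= 10 && page_shift - 10 < 32);
	return 32u >> (page_shift - 10);
}

/* Print the resulting limit, in pages, for a few common page sizes. */
static void print_limits(void)
{
	static const unsigned int shifts[] = { 12, 14, 16 };

	for (unsigned int i = 0; i < sizeof(shifts) / sizeof(shifts[0]); i++)
		printf("PAGE_SHIFT=%u (%luKB pages): limit=%u pages\n",
		       shifts[i], (1ul << shifts[i]) / 1024,
		       aggressive_dirty_limit(shifts[i]));
}
```

With 4KB pages the expression evaluates to 8 pages. With 64KB pages it
evaluates to 0, so it may be worth checking how a zero value behaves
when it is stored in nr_dirtied_pause.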
fs/nfsd/debugfs.c | 33 +++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 9 +++++++++
fs/nfsd/vfs.c | 17 +++++++++++++++++
3 files changed, 59 insertions(+)
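To illustrate the "4x more frequent" claim, here is a deliberately
simplified user-space model of the per-task check: a counter is bumped
for every dirtied page, and the throttling path is entered once the
counter reaches the pause value. This is only a sketch. It assumes the
pause value stays fixed for the whole write and it ignores the per-CPU
ratelimits and any recalculation the kernel may perform inside
balance_dirty_pages(). The function name count_throttle_calls() is
hypothetical.

```c
#include <stdio.h>

/*
 * Toy model, not kernel code: count how many times a task that dirties
 * `pages` pages would enter the throttling path when the per-task
 * threshold is `pause` pages and the counter resets after each entry.
 */
static unsigned int count_throttle_calls(unsigned int pages,
					 unsigned int pause)
{
	unsigned int nr_dirtied = 0;
	unsigned int calls = 0;

	for (unsigned int i = 0; i < pages; i++) {
		nr_dirtied++;			/* one more page dirtied */
		if (nr_dirtied >= pause) {	/* threshold reached */
			calls++;
			nr_dirtied = 0;		/* counter starts over */
		}
	}
	return calls;
}
```

Under this model, a 1MB WRITE on a 4KB-page system (256 pages) enters
the throttling path 8 times with a threshold of 32 and 32 times with a
threshold of 8.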
diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index 7f44689e0a53..f3d9e957cc5c 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -122,6 +122,36 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_write_fops, nfsd_io_cache_write_get,
nfsd_io_cache_write_set, "%llu\n");
+/*
+ * /sys/kernel/debug/nfsd/write_throttle
+ *
+ * Contents:
+ * %0: Normal throttling (default)
+ * %1: Aggressive throttling for NFSD writes
+ *
+ * When set to 1, NFSD write operations are throttled more aggressively
+ * to prevent memory exhaustion when fast network clients overwhelm slow
+ * storage. This is useful when the server has limited memory or slow disks.
+ *
+ * This setting takes immediate effect for all NFS versions, all exports,
+ * and in all NFSD net namespaces.
+ */
+
+static int nfsd_write_throttle_get(void *data, u64 *val)
+{
+ *val = nfsd_aggressive_write_throttle ? 1 : 0;
+ return 0;
+}
+
+static int nfsd_write_throttle_set(void *data, u64 val)
+{
+ nfsd_aggressive_write_throttle = (val > 0);
+ return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_write_throttle_fops, nfsd_write_throttle_get,
+ nfsd_write_throttle_set, "%llu\n");
+
void nfsd_debugfs_exit(void)
{
debugfs_remove_recursive(nfsd_top_dir);
@@ -140,4 +170,7 @@ void nfsd_debugfs_init(void)
debugfs_create_file("io_cache_write", 0644, nfsd_top_dir, NULL,
&nfsd_io_cache_write_fops);
+
+ debugfs_create_file("write_throttle", 0644, nfsd_top_dir, NULL,
+ &nfsd_write_throttle_fops);
}
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index b0283213a8f5..16a259839768 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -165,6 +165,15 @@ enum {
extern u64 nfsd_io_cache_read __read_mostly;
extern u64 nfsd_io_cache_write __read_mostly;
+extern bool nfsd_aggressive_write_throttle __read_mostly;
+
+/*
+ * Aggressive write throttling reduces nr_dirtied_pause to force more
+ * frequent calls to balance_dirty_pages(). This uses the same page-size
+ * adjusted formula as balance_dirty_pages_ratelimited_flags() when
+ * wb->dirty_exceeded is true (see mm/page-writeback.c:2066).
+ */
+#define NFSD_AGGRESSIVE_DIRTY_LIMIT (32 >> (PAGE_SHIFT - 10))
extern int nfsd_max_blksize;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 168d3ccc8155..33805b9ac7e4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -51,6 +51,7 @@
bool nfsd_disable_splice_read __read_mostly;
u64 nfsd_io_cache_read __read_mostly = NFSD_IO_BUFFERED;
u64 nfsd_io_cache_write __read_mostly = NFSD_IO_BUFFERED;
+bool nfsd_aggressive_write_throttle __read_mostly;
/**
* nfserrno - Map Linux errnos to NFS errnos
@@ -1420,6 +1421,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int pflags = current->flags;
bool restore_flags = false;
unsigned int nvecs;
+ int saved_nr_dirtied_pause = 0;
+ bool throttle_adjusted = false;
trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
@@ -1441,6 +1444,18 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
exp = fhp->fh_export;
+ /*
+ * If aggressive write throttling is enabled, reduce the per-task
+ * dirty page limit to throttle NFSD writes more aggressively.
+ * This helps prevent memory exhaustion when fast network clients
+ * overwhelm slow storage.
+ */
+ if (nfsd_aggressive_write_throttle) {
+ saved_nr_dirtied_pause = current->nr_dirtied_pause;
+ current->nr_dirtied_pause = NFSD_AGGRESSIVE_DIRTY_LIMIT;
+ throttle_adjusted = true;
+ }
+
if (!EX_ISSYNC(exp))
stable = NFS_UNSTABLE;
init_sync_kiocb(&kiocb, file);
@@ -1505,6 +1520,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
trace_nfsd_write_err(rqstp, fhp, offset, host_err);
nfserr = nfserrno(host_err);
}
+ if (throttle_adjusted)
+ current->nr_dirtied_pause = saved_nr_dirtied_pause;
if (restore_flags)
current_restore_flags(pflags, PF_LOCAL_THROTTLE);
return nfserr;
--
2.52.0
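
For testing, the new control can be flipped from a small user-space
helper. The sketch below assumes a kernel with this RFC applied, debugfs
mounted at /sys/kernel/debug, and root privileges. The function names
throttle_set() and throttle_get() are made up for illustration; the
path is passed in so the helpers can also be exercised against an
ordinary file.

```c
#include <stdio.h>

/*
 * Write 1 or 0 to the control file at `path`, for example
 * "/sys/kernel/debug/nfsd/write_throttle". Returns 0 on success and
 * -1 on failure. Hypothetical helper, not part of the patch.
 */
static int throttle_set(const char *path, int enable)
{
	FILE *f = fopen(path, "w");
	int ret = 0;

	if (!f)
		return -1;
	if (fprintf(f, "%d\n", enable ? 1 : 0) < 0)
		ret = -1;
	if (fclose(f) != 0)
		ret = -1;
	return ret;
}

/*
 * Read the current setting back. Returns 0 or 1, or -1 if the file
 * cannot be opened or parsed.
 */
static int throttle_get(const char *path)
{
	unsigned long long val;
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", &val) == 1);
	fclose(f);
	if (!ok)
		return -1;
	return val ? 1 : 0;
}
```

Calling throttle_set("/sys/kernel/debug/nfsd/write_throttle", 1) would
enable aggressive throttling, and passing 0 would restore the default.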