linux-nfs.vger.kernel.org archive mirror
* [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs
@ 2025-12-19 14:11 Chuck Lever
  2025-12-19 14:11 ` [RFC PATCH 1/2] NFSD: Add aggressive write throttling control Chuck Lever
  2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
  0 siblings, 2 replies; 3+ messages in thread
From: Chuck Lever @ 2025-12-19 14:11 UTC (permalink / raw)
  To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Following up on

  https://lore.kernel.org/linux-nfs/99dd427d-a16e-4494-a4b1-ff65488181ee@oracle.com/

Client workloads that are write-intensive can sometimes trigger an
NFSD meltdown (thrashing, livelocking, or becoming unresponsive).
This can happen when clients present NFSD with more UNSTABLE WRITEs
than can fit in the server's physical memory, and the system simply
can't get those dirty pages onto persistent storage fast enough.

In those cases, it makes sense to slow those clients down until the
backlog can be cleared out. NFSD might do this by delaying the
responses to UNSTABLE WRITEs, which in turn leaves unprocessed
ingress WRITEs on the transport queue longer, and thus closes down
the ingress congestion window on the network connection. This
applies direct backpressure on the noisy clients.

NFSD might already be doing this to some extent, but it can be
argued that it is not going far enough.

These two patches fall squarely in the "crazy ideas" category, but
I hope they serve as conversation starters.

Chuck Lever (2):
  NFSD: Add aggressive write throttling control
  NFSD: Add asynchronous write throttling support

 fs/nfsd/debugfs.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/nfsd.h    | 10 +++++++
 fs/nfsd/vfs.c     | 34 ++++++++++++++++++++++++
 3 files changed, 111 insertions(+)

-- 
2.52.0



* [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
  2025-12-19 14:11 [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs Chuck Lever
@ 2025-12-19 14:11 ` Chuck Lever
  2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
  1 sibling, 0 replies; 3+ messages in thread
From: Chuck Lever @ 2025-12-19 14:11 UTC (permalink / raw)
  To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

On NFS servers with fast network links but slow storage, clients can
generate WRITE requests faster than the server can flush payloads to
durable storage. This can push the server into memory exhaustion as
dirty pages accumulate across hundreds of concurrent NFSD threads.

The existing dirty page throttling (balance_dirty_pages()) uses
per-task accounting with default ratelimits that allow each thread
to dirty about 32 pages (on systems with 4KB pages) before
balance_dirty_pages() is consulted at all. With many NFSD threads,
this allows significant dirty page accumulation before any
throttling kicks in.

Add a debugfs control to enable aggressive write throttling for
NFSD:

  /sys/kernel/debug/nfsd/write_throttle

When set to 1, NFSD write operations reduce nr_dirtied_pause to
force balance_dirty_pages() to be called more frequently. This uses
the same page-size-adjusted limit that
balance_dirty_pages_ratelimited_flags() applies when
wb->dirty_exceeded is true, providing 4x more frequent throttling on
systems with 4KB pages.

The setting defaults to 0 (normal throttling) and can be changed at
runtime without restarting NFSD.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/debugfs.c | 33 +++++++++++++++++++++++++++++++++
 fs/nfsd/nfsd.h    |  9 +++++++++
 fs/nfsd/vfs.c     | 17 +++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index 7f44689e0a53..f3d9e957cc5c 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -122,6 +122,36 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
 DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_write_fops, nfsd_io_cache_write_get,
 			 nfsd_io_cache_write_set, "%llu\n");
 
+/*
+ * /sys/kernel/debug/nfsd/write_throttle
+ *
+ * Contents:
+ *   %0: Normal throttling (default)
+ *   %1: Aggressive throttling for NFSD writes
+ *
+ * When set to 1, NFSD write operations are throttled more aggressively
+ * to prevent memory exhaustion when fast network clients overwhelm slow
+ * storage. This is useful when the server has limited memory or slow disks.
+ *
+ * This setting takes immediate effect for all NFS versions, all exports,
+ * and in all NFSD net namespaces.
+ */
+
+static int nfsd_write_throttle_get(void *data, u64 *val)
+{
+	*val = nfsd_aggressive_write_throttle ? 1 : 0;
+	return 0;
+}
+
+static int nfsd_write_throttle_set(void *data, u64 val)
+{
+	nfsd_aggressive_write_throttle = (val > 0);
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_write_throttle_fops, nfsd_write_throttle_get,
+			 nfsd_write_throttle_set, "%llu\n");
+
 void nfsd_debugfs_exit(void)
 {
 	debugfs_remove_recursive(nfsd_top_dir);
@@ -140,4 +170,7 @@ void nfsd_debugfs_init(void)
 
 	debugfs_create_file("io_cache_write", 0644, nfsd_top_dir, NULL,
 			    &nfsd_io_cache_write_fops);
+
+	debugfs_create_file("write_throttle", 0644, nfsd_top_dir, NULL,
+			    &nfsd_write_throttle_fops);
 }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index b0283213a8f5..16a259839768 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -165,6 +165,15 @@ enum {
 
 extern u64 nfsd_io_cache_read __read_mostly;
 extern u64 nfsd_io_cache_write __read_mostly;
+extern bool nfsd_aggressive_write_throttle __read_mostly;
+
+/*
+ * Aggressive write throttling reduces nr_dirtied_pause to force more
+ * frequent calls to balance_dirty_pages(). This uses the same page-size
+ * adjusted formula as balance_dirty_pages_ratelimited_flags() when
+ * wb->dirty_exceeded is true (see mm/page-writeback.c:2066).
+ */
+#define NFSD_AGGRESSIVE_DIRTY_LIMIT	(32 >> (PAGE_SHIFT - 10))
 
 extern int nfsd_max_blksize;
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 168d3ccc8155..33805b9ac7e4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -51,6 +51,7 @@
 bool nfsd_disable_splice_read __read_mostly;
 u64 nfsd_io_cache_read __read_mostly = NFSD_IO_BUFFERED;
 u64 nfsd_io_cache_write __read_mostly = NFSD_IO_BUFFERED;
+bool nfsd_aggressive_write_throttle __read_mostly;
 
 /**
  * nfserrno - Map Linux errnos to NFS errnos
@@ -1420,6 +1421,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	unsigned int		pflags = current->flags;
 	bool			restore_flags = false;
 	unsigned int		nvecs;
+	int			saved_nr_dirtied_pause = 0;
+	bool			throttle_adjusted = false;
 
 	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 
@@ -1441,6 +1444,18 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	exp = fhp->fh_export;
 
+	/*
+	 * If aggressive write throttling is enabled, reduce the per-task
+	 * dirty page limit to throttle NFSD writes more aggressively.
+	 * This helps prevent memory exhaustion when fast network clients
+	 * overwhelm slow storage.
+	 */
+	if (nfsd_aggressive_write_throttle) {
+		saved_nr_dirtied_pause = current->nr_dirtied_pause;
+		current->nr_dirtied_pause = NFSD_AGGRESSIVE_DIRTY_LIMIT;
+		throttle_adjusted = true;
+	}
+
 	if (!EX_ISSYNC(exp))
 		stable = NFS_UNSTABLE;
 	init_sync_kiocb(&kiocb, file);
@@ -1505,6 +1520,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		trace_nfsd_write_err(rqstp, fhp, offset, host_err);
 		nfserr = nfserrno(host_err);
 	}
+	if (throttle_adjusted)
+		current->nr_dirtied_pause = saved_nr_dirtied_pause;
 	if (restore_flags)
 		current_restore_flags(pflags, PF_LOCAL_THROTTLE);
 	return nfserr;
-- 
2.52.0



* [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
  2025-12-19 14:11 [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs Chuck Lever
  2025-12-19 14:11 ` [RFC PATCH 1/2] NFSD: Add aggressive write throttling control Chuck Lever
@ 2025-12-19 14:11 ` Chuck Lever
  1 sibling, 0 replies; 3+ messages in thread
From: Chuck Lever @ 2025-12-19 14:11 UTC (permalink / raw)
  To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

When memory pressure occurs during buffered writes, the traditional
approach is for balance_dirty_pages() to put the writing thread to
sleep until dirty pages are flushed. For NFSD, this means server
threads block waiting for I/O, reducing overall server throughput.

Add support for asynchronous write throttling using the BDP_ASYNC
flag to balance_dirty_pages_ratelimited_flags(). When enabled via:

  /sys/kernel/debug/nfsd/write_async_throttle

NFSD checks for dirty page pressure before attempting buffered
writes. If balance_dirty_pages_ratelimited_flags() returns -EAGAIN
(indicating that the thread would otherwise be put to sleep until
writeback catches up), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX
for NFSv3) to the client instead of blocking.

This allows clients to back off and retry rather than having server
threads tied up waiting for writeback. The setting defaults to 0
(synchronous throttling) and can be combined with write_throttle for
layered throttling strategies.

Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
automatically disabled for NFSv2 requests regardless of the setting.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/debugfs.c | 34 ++++++++++++++++++++++++++++++++++
 fs/nfsd/nfsd.h    |  1 +
 fs/nfsd/vfs.c     | 17 +++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index f3d9e957cc5c..f2cce37589ce 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -152,6 +152,37 @@ static int nfsd_write_throttle_set(void *data, u64 val)
 DEFINE_DEBUGFS_ATTRIBUTE(nfsd_write_throttle_fops, nfsd_write_throttle_get,
 			 nfsd_write_throttle_set, "%llu\n");
 
+/*
+ * /sys/kernel/debug/nfsd/write_async_throttle
+ *
+ * Contents:
+ *   %0: Synchronous throttling (default) - writes sleep in balance_dirty_pages()
+ *   %1: Asynchronous throttling - return NFS4ERR_DELAY when memory is tight
+ *
+ * When set to 1, NFSD uses BDP_ASYNC mode which returns -EAGAIN from
+ * balance_dirty_pages_ratelimited_flags() instead of sleeping. This allows
+ * NFSD to return NFS4ERR_DELAY (or NFSERR_JUKEBOX for NFSv3), letting
+ * clients back off and retry rather than having NFSD threads blocked.
+ *
+ * This setting takes immediate effect for all NFS versions, all exports,
+ * and in all NFSD net namespaces.
+ */
+
+static int nfsd_async_throttle_get(void *data, u64 *val)
+{
+	*val = nfsd_async_write_throttle ? 1 : 0;
+	return 0;
+}
+
+static int nfsd_async_throttle_set(void *data, u64 val)
+{
+	nfsd_async_write_throttle = (val > 0);
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_async_throttle_fops, nfsd_async_throttle_get,
+			 nfsd_async_throttle_set, "%llu\n");
+
 void nfsd_debugfs_exit(void)
 {
 	debugfs_remove_recursive(nfsd_top_dir);
@@ -173,4 +204,7 @@ void nfsd_debugfs_init(void)
 
 	debugfs_create_file("write_throttle", 0644, nfsd_top_dir, NULL,
 			    &nfsd_write_throttle_fops);
+
+	debugfs_create_file("write_async_throttle", 0644, nfsd_top_dir, NULL,
+			    &nfsd_async_throttle_fops);
 }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 16a259839768..ea61db58ef95 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -166,6 +166,7 @@ enum {
 extern u64 nfsd_io_cache_read __read_mostly;
 extern u64 nfsd_io_cache_write __read_mostly;
 extern bool nfsd_aggressive_write_throttle __read_mostly;
+extern bool nfsd_async_write_throttle __read_mostly;
 
 /*
  * Aggressive write throttling reduces nr_dirtied_pause to force more
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 33805b9ac7e4..0fcfd29e843d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -52,6 +52,7 @@ bool nfsd_disable_splice_read __read_mostly;
 u64 nfsd_io_cache_read __read_mostly = NFSD_IO_BUFFERED;
 u64 nfsd_io_cache_write __read_mostly = NFSD_IO_BUFFERED;
 bool nfsd_aggressive_write_throttle __read_mostly;
+bool nfsd_async_write_throttle __read_mostly;
 
 /**
  * nfserrno - Map Linux errnos to NFS errnos
@@ -1473,6 +1474,22 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		}
 	}
 
+	/*
+	 * If async throttling is enabled, check memory pressure
+	 * before attempting buffered writes. Return -EAGAIN if
+	 * the system is low on memory, allowing NFSD to return
+	 * an NFS error code asking the client to retry later.
+	 *
+	 * Skip this for NFSv2 since it lacks NFSERR_JUKEBOX.
+	 */
+	if (nfsd_async_write_throttle && rqstp->rq_vers >= 3) {
+		host_err =
+			balance_dirty_pages_ratelimited_flags(file->f_mapping,
+							      BDP_ASYNC);
+		if (host_err == -EAGAIN)
+			goto out_nfserr;
+	}
+
 	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
 
 	since = READ_ONCE(file->f_wb_err);
-- 
2.52.0


