* [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs
From: Chuck Lever @ 2025-12-19 14:11 UTC
To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

Following up on

https://lore.kernel.org/linux-nfs/99dd427d-a16e-4494-a4b1-ff65488181ee@oracle.com/

Client workloads that are write-intensive can sometimes trigger an
NFSD meltdown (thrashing, livelocking, or becoming unresponsive).
This can happen when clients present NFSD with more UNSTABLE WRITEs
than can fit in the server's physical memory, and the system simply
can't get those dirty pages onto persistent storage fast enough.

In those cases, it makes sense to slow those clients down until the
backlog can be cleared out. NFSD might do this by delaying the
responses to UNSTABLE WRITEs, which in turn leaves unprocessed
ingress WRITEs on the transport queue longer, and thus closes down
the ingress congestion window on the network connection. This
applies direct backpressure on the noisy clients.

NFSD might already be doing this to some extent, but it can be
argued that it is not going far enough.

These two patches fall squarely in the "crazy ideas" category, but
I hope they serve as conversation starters.

Chuck Lever (2):
  NFSD: Add aggressive write throttling control
  NFSD: Add asynchronous write throttling support

 fs/nfsd/debugfs.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++
 fs/nfsd/nfsd.h    | 10 +++++++
 fs/nfsd/vfs.c     | 34 ++++++++++++++++++++++++
 3 files changed, 111 insertions(+)

-- 
2.52.0

* [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
From: Chuck Lever @ 2025-12-19 14:11 UTC
To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

On NFS servers with fast network links but slow storage, clients can
generate WRITE requests faster than the server can flush payloads to
durable storage. This can push the server into memory exhaustion as
dirty pages accumulate across hundreds of concurrent NFSD threads.

The existing dirty page throttling (balance_dirty_pages()) uses
per-task accounting with default ratelimits that allow each thread
to dirty ~32 pages before throttling occurs. With many NFSD threads,
this allows significant dirty page accumulation before any
throttling kicks in.

Add a debugfs control to enable aggressive write throttling for NFSD:

  /sys/kernel/debug/nfsd/write_throttle

When set to 1, NFSD write operations reduce nr_dirtied_pause to
force balance_dirty_pages() to be called more frequently. This uses
the same page-size-adjusted limit that
balance_dirty_pages_ratelimited_flags() applies when
wb->dirty_exceeded is true, providing 4x more frequent throttling on
systems with 4KB pages.

The setting defaults to 0 (normal throttling) and can be changed at
runtime without restarting NFSD.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/debugfs.c | 33 +++++++++++++++++++++++++++++++++
 fs/nfsd/nfsd.h    |  9 +++++++++
 fs/nfsd/vfs.c     | 17 +++++++++++++++++
 3 files changed, 59 insertions(+)

diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index 7f44689e0a53..f3d9e957cc5c 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -122,6 +122,36 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
 DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_write_fops, nfsd_io_cache_write_get,
 			 nfsd_io_cache_write_set, "%llu\n");
 
+/*
+ * /sys/kernel/debug/nfsd/write_throttle
+ *
+ * Contents:
+ *   %0: Normal throttling (default)
+ *   %1: Aggressive throttling for NFSD writes
+ *
+ * When set to 1, NFSD write operations are throttled more aggressively
+ * to prevent memory exhaustion when fast network clients overwhelm slow
+ * storage. This is useful when the server has limited memory or slow disks.
+ *
+ * This setting takes immediate effect for all NFS versions, all exports,
+ * and in all NFSD net namespaces.
+ */
+
+static int nfsd_write_throttle_get(void *data, u64 *val)
+{
+	*val = nfsd_aggressive_write_throttle ? 1 : 0;
+	return 0;
+}
+
+static int nfsd_write_throttle_set(void *data, u64 val)
+{
+	nfsd_aggressive_write_throttle = (val > 0);
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_write_throttle_fops, nfsd_write_throttle_get,
+			 nfsd_write_throttle_set, "%llu\n");
+
 void nfsd_debugfs_exit(void)
 {
 	debugfs_remove_recursive(nfsd_top_dir);
@@ -140,4 +170,7 @@ void nfsd_debugfs_init(void)
 
 	debugfs_create_file("io_cache_write", 0644, nfsd_top_dir, NULL,
 			    &nfsd_io_cache_write_fops);
+
+	debugfs_create_file("write_throttle", 0644, nfsd_top_dir, NULL,
+			    &nfsd_write_throttle_fops);
 }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index b0283213a8f5..16a259839768 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -165,6 +165,15 @@ enum {
 
 extern u64 nfsd_io_cache_read __read_mostly;
 extern u64 nfsd_io_cache_write __read_mostly;
+extern bool nfsd_aggressive_write_throttle __read_mostly;
+
+/*
+ * Aggressive write throttling reduces nr_dirtied_pause to force more
+ * frequent calls to balance_dirty_pages(). This uses the same page-size
+ * adjusted formula as balance_dirty_pages_ratelimited_flags() when
+ * wb->dirty_exceeded is true (see mm/page-writeback.c:2066).
+ */
+#define NFSD_AGGRESSIVE_DIRTY_LIMIT	(32 >> (PAGE_SHIFT - 10))
 
 extern int nfsd_max_blksize;
 
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 168d3ccc8155..33805b9ac7e4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -51,6 +51,7 @@
 bool nfsd_disable_splice_read __read_mostly;
 u64 nfsd_io_cache_read __read_mostly = NFSD_IO_BUFFERED;
 u64 nfsd_io_cache_write __read_mostly = NFSD_IO_BUFFERED;
+bool nfsd_aggressive_write_throttle __read_mostly;
 
 /**
  * nfserrno - Map Linux errnos to NFS errnos
@@ -1420,6 +1421,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 	unsigned int		pflags = current->flags;
 	bool			restore_flags = false;
 	unsigned int		nvecs;
+	int			saved_nr_dirtied_pause = 0;
+	bool			throttle_adjusted = false;
 
 	trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
 
@@ -1441,6 +1444,18 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 
 	exp = fhp->fh_export;
 
+	/*
+	 * If aggressive write throttling is enabled, reduce the per-task
+	 * dirty page limit to throttle NFSD writes more aggressively.
+	 * This helps prevent memory exhaustion when fast network clients
+	 * overwhelm slow storage.
+	 */
+	if (nfsd_aggressive_write_throttle) {
+		saved_nr_dirtied_pause = current->nr_dirtied_pause;
+		current->nr_dirtied_pause = NFSD_AGGRESSIVE_DIRTY_LIMIT;
+		throttle_adjusted = true;
+	}
+
 	if (!EX_ISSYNC(exp))
 		stable = NFS_UNSTABLE;
 	init_sync_kiocb(&kiocb, file);
@@ -1505,6 +1520,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		trace_nfsd_write_err(rqstp, fhp, offset, host_err);
 		nfserr = nfserrno(host_err);
 	}
+	if (throttle_adjusted)
+		current->nr_dirtied_pause = saved_nr_dirtied_pause;
 	if (restore_flags)
 		current_restore_flags(pflags, PF_LOCAL_THROTTLE);
 	return nfserr;
-- 
2.52.0

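As a worked example of the arithmetic behind NFSD_AGGRESSIVE_DIRTY_LIMIT in the patch above, the userspace sketch below evaluates 32 >> (PAGE_SHIFT - 10) for different page sizes. The helper names aggressive_dirty_limit() and throttle_speedup() are hypothetical and not part of the patch, and the 32-page baseline is simply the "~32 pages" figure from the commit message, taken here as an assumption rather than read from a running kernel.

```c
#include <assert.h>

/* Baseline from the commit message: each thread may dirty ~32 pages
 * before balance_dirty_pages() runs. Assumed, not measured. */
#define DEFAULT_DIRTY_PAGES 32

/* Hypothetical helper: the same expression as NFSD_AGGRESSIVE_DIRTY_LIMIT,
 * with PAGE_SHIFT passed in so page sizes can be compared side by side. */
static int aggressive_dirty_limit(int page_shift)
{
	assert(page_shift >= 10 && page_shift < 41);
	return 32 >> (page_shift - 10);
}

/* Hypothetical helper: how many times more often the throttling check
 * would run relative to the 32-page baseline. Returns 0 when the limit
 * itself is 0, where a ratio is not meaningful. */
static int throttle_speedup(int page_shift)
{
	int limit = aggressive_dirty_limit(page_shift);

	return limit ? DEFAULT_DIRTY_PAGES / limit : 0;
}
```

With 4KB pages (PAGE_SHIFT of 12) the limit works out to 8 pages, which is where the "4x more frequent" figure in the commit message comes from. With 64KB pages (PAGE_SHIFT of 16) the same expression evaluates to 0, which may be worth keeping in mind when testing on such systems.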
* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
From: Christoph Hellwig @ 2026-01-07 7:55 UTC
To: Chuck Lever
Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever, linux-mm, linux-fsdevel

On Fri, Dec 19, 2025 at 09:11:04AM -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> On NFS servers with fast network links but slow storage, clients can
> generate WRITE requests faster than the server can flush payloads to
> durable storage. This can push the server into memory exhaustion as
> dirty pages accumulate across hundreds of concurrent NFSD threads.
>
> The existing dirty page throttling (balance_dirty_pages()) uses
> per-task accounting with default ratelimits that allow each thread
> to dirty ~32 pages before throttling occurs. With many NFSD threads,
> this allows significant dirty page accumulation before any
> throttling kicks in.

What makes NFSD so special here vs say a userspace process with a bunch
of threads? Also what is the actual problem we're trying to solve?

I kinda hate having this stuff in NFSD when there's nothing specific
about nfs serving here.

* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
From: Chuck Lever @ 2026-01-07 14:36 UTC
To: Christoph Hellwig
Cc: Mike Snitzer, linux-nfs, Chuck Lever, linux-mm, linux-fsdevel

On 1/7/26 2:55 AM, Christoph Hellwig wrote:
> On Fri, Dec 19, 2025 at 09:11:04AM -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> On NFS servers with fast network links but slow storage, clients can
>> generate WRITE requests faster than the server can flush payloads to
>> durable storage. This can push the server into memory exhaustion as
>> dirty pages accumulate across hundreds of concurrent NFSD threads.
>>
>> The existing dirty page throttling (balance_dirty_pages()) uses
>> per-task accounting with default ratelimits that allow each thread
>> to dirty ~32 pages before throttling occurs. With many NFSD threads,
>> this allows significant dirty page accumulation before any
>> throttling kicks in.
>
> What makes NFSD so special here vs say a userspace process with a bunch
> of threads? Also what is the actual problem we're trying to solve?

The problem, as I see it, is that the system is not providing enough
backpressure to slow down noisy clients, allowing them to overwhelm
the server's memory with UNSTABLE WRITE traffic.

This is the same issue, IMO, that Mike's direct I/O is attempting to
address. Our implementation of UNSTABLE WRITE is a denial-of-service
vector.

> I kinda hate having this stuff in NFSD when there's nothing specific
> about nfs serving here.

Don't worry too much about that, these patches are obviously not in
any kind of merge-able shape yet. We do need to understand the
metabolism of UNSTABLE WRITEs, in particular, to get a clear picture
of what needs to be controlled to make the server autonomously stable.

-- 
Chuck Lever

* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
From: Christoph Hellwig @ 2026-01-07 14:42 UTC
To: Chuck Lever
Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever, linux-mm, linux-fsdevel

On Wed, Jan 07, 2026 at 09:36:39AM -0500, Chuck Lever wrote:
> > What makes NFSD so special here vs say a userspace process with a bunch
> > of threads? Also what is the actual problem we're trying to solve?
>
> The problem, as I see it, is that the system is not providing enough
> backpressure to slow down noisy clients, allowing them to overwhelm
> the server's memory with UNSTABLE WRITE traffic.
>
> This is the same issue, IMO, that Mike's direct I/O is attempting to
> address. Our implementation of UNSTABLE WRITE is a denial-of-service
> vector.

But how is this different from Samba or a userspace NFS server?

* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
From: Chuck Lever @ 2026-01-07 14:49 UTC
To: Christoph Hellwig
Cc: Mike Snitzer, linux-nfs, Chuck Lever, linux-mm, linux-fsdevel

On 1/7/26 9:42 AM, Christoph Hellwig wrote:
> On Wed, Jan 07, 2026 at 09:36:39AM -0500, Chuck Lever wrote:
>>> What makes NFSD so special here vs say a userspace process with a bunch
>>> of threads? Also what is the actual problem we're trying to solve?
>>
>> The problem, as I see it, is that the system is not providing enough
>> backpressure to slow down noisy clients, allowing them to overwhelm
>> the server's memory with UNSTABLE WRITE traffic.
>>
>> This is the same issue, IMO, that Mike's direct I/O is attempting to
>> address. Our implementation of UNSTABLE WRITE is a denial-of-service
>> vector.
>
> But how is this different from Samba or a userspace NFS server?

Well it might not be different. But at this point I don't think we
know enough about the problem to say one way or another. I'm just
trying to gather more experimental evidence about what is happening.

-- 
Chuck Lever

* [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
From: Chuck Lever @ 2025-12-19 14:11 UTC
To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever

From: Chuck Lever <chuck.lever@oracle.com>

When memory pressure occurs during buffered writes, the traditional
approach is for balance_dirty_pages() to put the writing thread to
sleep until dirty pages are flushed. For NFSD, this means server
threads block waiting for I/O, reducing overall server throughput.

Add support for asynchronous write throttling using the BDP_ASYNC
flag to balance_dirty_pages_ratelimited_flags(). When enabled via:

  /sys/kernel/debug/nfsd/write_async_throttle

NFSD checks memory pressure before attempting buffered writes. If
balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
memory exhaustion), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for
NFSv3) to the client instead of blocking.

This allows clients to back off and retry rather than having server
threads tied up waiting for writeback. The setting defaults to 0
(synchronous throttling) and can be combined with write_throttle for
layered throttling strategies.

Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
automatically disabled for NFSv2 requests regardless of the setting.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/nfsd/debugfs.c | 34 ++++++++++++++++++++++++++++++++++
 fs/nfsd/nfsd.h    |  1 +
 fs/nfsd/vfs.c     | 17 +++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index f3d9e957cc5c..f2cce37589ce 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -152,6 +152,37 @@ static int nfsd_write_throttle_set(void *data, u64 val)
 DEFINE_DEBUGFS_ATTRIBUTE(nfsd_write_throttle_fops, nfsd_write_throttle_get,
 			 nfsd_write_throttle_set, "%llu\n");
 
+/*
+ * /sys/kernel/debug/nfsd/write_async_throttle
+ *
+ * Contents:
+ *   %0: Synchronous throttling (default) - writes sleep in balance_dirty_pages()
+ *   %1: Asynchronous throttling - return NFS4ERR_DELAY when memory is tight
+ *
+ * When set to 1, NFSD uses BDP_ASYNC mode which returns -EAGAIN from
+ * balance_dirty_pages_ratelimited_flags() instead of sleeping. This allows
+ * NFSD to return NFS4ERR_DELAY (or NFSERR_JUKEBOX for NFSv3), letting
+ * clients back off and retry rather than having NFSD threads blocked.
+ *
+ * This setting takes immediate effect for all NFS versions, all exports,
+ * and in all NFSD net namespaces.
+ */
+
+static int nfsd_async_throttle_get(void *data, u64 *val)
+{
+	*val = nfsd_async_write_throttle ? 1 : 0;
+	return 0;
+}
+
+static int nfsd_async_throttle_set(void *data, u64 val)
+{
+	nfsd_async_write_throttle = (val > 0);
+	return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_async_throttle_fops, nfsd_async_throttle_get,
+			 nfsd_async_throttle_set, "%llu\n");
+
 void nfsd_debugfs_exit(void)
 {
 	debugfs_remove_recursive(nfsd_top_dir);
@@ -173,4 +204,7 @@ void nfsd_debugfs_init(void)
 
 	debugfs_create_file("write_throttle", 0644, nfsd_top_dir, NULL,
 			    &nfsd_write_throttle_fops);
+
+	debugfs_create_file("write_async_throttle", 0644, nfsd_top_dir, NULL,
+			    &nfsd_async_throttle_fops);
 }
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 16a259839768..ea61db58ef95 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -166,6 +166,7 @@ enum {
 extern u64 nfsd_io_cache_read __read_mostly;
 extern u64 nfsd_io_cache_write __read_mostly;
 extern bool nfsd_aggressive_write_throttle __read_mostly;
+extern bool nfsd_async_write_throttle __read_mostly;
 
 /*
  * Aggressive write throttling reduces nr_dirtied_pause to force more
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 33805b9ac7e4..0fcfd29e843d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -52,6 +52,7 @@ bool nfsd_disable_splice_read __read_mostly;
 u64 nfsd_io_cache_read __read_mostly = NFSD_IO_BUFFERED;
 u64 nfsd_io_cache_write __read_mostly = NFSD_IO_BUFFERED;
 bool nfsd_aggressive_write_throttle __read_mostly;
+bool nfsd_async_write_throttle __read_mostly;
 
 /**
  * nfserrno - Map Linux errnos to NFS errnos
@@ -1473,6 +1474,22 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
 		}
 	}
 
+	/*
+	 * If async throttling is enabled, check memory pressure
+	 * before attempting buffered writes. Return -EAGAIN if
+	 * the system is low on memory, allowing NFSD to return
+	 * an NFS error code asking the client to retry later.
+	 *
+	 * Skip this for NFSv2 since it lacks NFSERR_JUKEBOX.
+	 */
+	if (nfsd_async_write_throttle && rqstp->rq_vers >= 3) {
+		host_err =
+			balance_dirty_pages_ratelimited_flags(file->f_mapping,
+							      BDP_ASYNC);
+		if (host_err == -EAGAIN)
+			break;
+	}
+
 	nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
 
 	since = READ_ONCE(file->f_wb_err);
-- 
2.52.0

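The decision this patch describes can be modeled in userspace roughly as follows. This is a sketch under stated assumptions, not kernel code: struct model_request, model_dirty_check() and model_write_status() are hypothetical stand-ins for struct svc_rqst, the BDP_ASYNC call to balance_dirty_pages_ratelimited_flags(), and the relevant part of nfsd_vfs_write(). The memory_tight flag is an invented input that replaces the kernel's real dirty-page accounting. The value 10008 is the on-the-wire number shared by NFS3ERR_JUKEBOX and NFS4ERR_DELAY in the NFSv3 and NFSv4 specifications.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NFS_OK		0
#define MODEL_NFSERR_JUKEBOX	10008	/* NFS3ERR_JUKEBOX and NFS4ERR_DELAY */

/* Hypothetical stand-in for the one svc_rqst field used here. */
struct model_request {
	unsigned int vers;	/* NFS protocol version: 2, 3, or 4 */
};

/* Stand-in for the BDP_ASYNC check: -EAGAIN when memory is tight. */
static int model_dirty_check(bool memory_tight)
{
	return memory_tight ? -EAGAIN : 0;
}

/*
 * Model of the decision: with async throttling on and NFSv3 or later,
 * a tight-memory check ends the write early with a retry status.
 * Otherwise the request falls through to the ordinary buffered write
 * path, and *blocked reports whether that path would have to sleep.
 */
static int model_write_status(const struct model_request *rq,
			      bool async_throttle, bool memory_tight,
			      bool *blocked)
{
	*blocked = false;
	if (async_throttle && rq->vers >= 3) {
		if (model_dirty_check(memory_tight) == -EAGAIN)
			return MODEL_NFSERR_JUKEBOX;
	}
	*blocked = memory_tight;
	return MODEL_NFS_OK;
}
```

In this model an NFSv2 request under memory pressure still takes the blocking path even when the knob is set, mirroring the NFSv2 note in the commit message.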
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
From: Christoph Hellwig @ 2026-01-07 8:00 UTC
To: Chuck Lever; +Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever

On Fri, Dec 19, 2025 at 09:11:05AM -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> When memory pressure occurs during buffered writes, the traditional
> approach is for balance_dirty_pages() to put the writing thread to
> sleep until dirty pages are flushed. For NFSD, this means server
> threads block waiting for I/O, reducing overall server throughput.
>
> Add support for asynchronous write throttling using the BDP_ASYNC
> flag to balance_dirty_pages_ratelimited_flags(). When enabled via:
>
>   /sys/kernel/debug/nfsd/write_async_throttle

Let me reiterate that I really, really hate all these magic
debugfs-enabled features. Either they are genuinely useful (think this
would be such a thing) and they should be enabled unconditionally, or
they are tradeoffs and should have a proper tunable not hidden in
debugfs.

> NFSD checks memory pressure before attempting buffered writes. If
> balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
> memory exhaustion), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for
> NFSv3) to the client instead of blocking.
>
> This allows clients to back off and retry rather than having server
> threads tied up waiting for writeback. The setting defaults to 0
> (synchronous throttling) and can be combined with write_throttle for
> layered throttling strategies.
>
> Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
> automatically disabled for NFSv2 requests regardless of the setting.

This all seems very useful to me. But it really needs to show numbers
on how it helps.

> + * Contents:
> + *   %0: Synchronous throttling (default) - writes sleep in balance_dirty_pages()

Overly long line.

* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
From: Chuck Lever @ 2026-01-07 14:42 UTC
To: Christoph Hellwig; +Cc: Mike Snitzer, linux-nfs, Chuck Lever

On 1/7/26 3:00 AM, Christoph Hellwig wrote:
> On Fri, Dec 19, 2025 at 09:11:05AM -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> When memory pressure occurs during buffered writes, the traditional
>> approach is for balance_dirty_pages() to put the writing thread to
>> sleep until dirty pages are flushed. For NFSD, this means server
>> threads block waiting for I/O, reducing overall server throughput.
>>
>> Add support for asynchronous write throttling using the BDP_ASYNC
>> flag to balance_dirty_pages_ratelimited_flags(). When enabled via:
>>
>>   /sys/kernel/debug/nfsd/write_async_throttle
>
> Let me reiterate that I really, really hate all these magic
> debugfs-enabled features. Either they are genuinely useful (think this
> would be such a thing) and they should be enabled unconditionally, or
> they are tradeoffs and should have a proper tunable not hidden in
> debugfs.

The use of debugfs here is because we don't yet have a coherent design
in mind -- this new facility is entirely experimental, and we need a
way to enable and disable it to make good comparisons, without making
immutable changes to the actual NFSD administrative interface.

"The RFC sign out front should have told ya."

But I agree, in the long term I most prefer no new administrative
controls -- it should just work if at all possible.

>> NFSD checks memory pressure before attempting buffered writes. If
>> balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
>> memory exhaustion), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for
>> NFSv3) to the client instead of blocking.
>>
>> This allows clients to back off and retry rather than having server
>> threads tied up waiting for writeback. The setting defaults to 0
>> (synchronous throttling) and can be combined with write_throttle for
>> layered throttling strategies.
>>
>> Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
>> automatically disabled for NFSv2 requests regardless of the setting.
>
> This all seems very useful to me. But it really needs to show numbers
> on how it helps.

Well if I can get this into operational shape, perhaps J. Flynn would
be interested in trying it out for us.

I'm happy to run with this one and drop (or postpone) 1/2, if that is
your assessment.

>> + * Contents:
>> + *   %0: Synchronous throttling (default) - writes sleep in balance_dirty_pages()
>
> Overly long line.
>

-- 
Chuck Lever

* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
From: Christoph Hellwig @ 2026-01-07 16:25 UTC
To: Chuck Lever; +Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever

On Wed, Jan 07, 2026 at 09:42:58AM -0500, Chuck Lever wrote:
> I'm happy to run with this one and drop (or postpone) 1/2, if that is
> your assessment.

I don't really understand what exactly patch 1 is aiming for.

Not stalling nfsd threads when congested makes total sense on the
other hand.

* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
From: Mike Snitzer @ 2026-01-07 19:40 UTC
To: Chuck Lever; +Cc: Christoph Hellwig, linux-nfs, Chuck Lever, jonathan.flynn

On Wed, Jan 07, 2026 at 09:42:58AM -0500, Chuck Lever wrote:
> On 1/7/26 3:00 AM, Christoph Hellwig wrote:
> > On Fri, Dec 19, 2025 at 09:11:05AM -0500, Chuck Lever wrote:
> >> From: Chuck Lever <chuck.lever@oracle.com>
> >>
> >> When memory pressure occurs during buffered writes, the traditional
> >> approach is for balance_dirty_pages() to put the writing thread to
> >> sleep until dirty pages are flushed. For NFSD, this means server
> >> threads block waiting for I/O, reducing overall server throughput.
> >>
> >> Add support for asynchronous write throttling using the BDP_ASYNC
> >> flag to balance_dirty_pages_ratelimited_flags(). When enabled via:
> >>
> >>   /sys/kernel/debug/nfsd/write_async_throttle
> >
> > Let me reiterate that I really, really hate all these magic
> > debugfs-enabled features. Either they are genuinely useful (think this
> > would be such a thing) and they should be enabled unconditionally, or
> > they are tradeoffs and should have a proper tunable not hidden in
> > debugfs.
>
> The use of debugfs here is because we don't yet have a coherent design
> in mind -- this new facility is entirely experimental, and we need a
> way to enable and disable it to make good comparisons, without making
> immutable changes to the actual NFSD administrative interface.
>
> "The RFC sign out front should have told ya."
>
> But I agree, in the long term I most prefer no new administrative
> controls -- it should just work if at all possible.
>
>
> >> NFSD checks memory pressure before attempting buffered writes. If
> >> balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
> >> memory exhaustion), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for
> >> NFSv3) to the client instead of blocking.
> >>
> >> This allows clients to back off and retry rather than having server
> >> threads tied up waiting for writeback. The setting defaults to 0
> >> (synchronous throttling) and can be combined with write_throttle for
> >> layered throttling strategies.
> >>
> >> Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
> >> automatically disabled for NFSv2 requests regardless of the setting.
> >
> > This all seems very useful to me. But it really needs to show numbers
> > on how it helps.
>
> Well if I can get this into operational shape, perhaps J. Flynn would
> be interested in trying it out for us.
>
> I'm happy to run with this one and drop (or postpone) 1/2, if that is
> your assessment.

Probably a good start. Definitely looks useful and worth measuring to
see if buffered IO improves.

I can include it in a test kernel for Jon Flynn once you're happy with
the patch and would like further testing (fyi I've rebased to latest
6.18-stable but Jon hasn't done baseline testing of it yet, so we
could kill 2 birds once ready).

Thanks,
Mike

ps. Jon, for further context see Chuck's original 2/2 patch:
https://lore.kernel.org/linux-nfs/20251219141105.1247093-3-cel@kernel.org/

And his cover letter:
https://lore.kernel.org/linux-nfs/20251219141105.1247093-1-cel@kernel.org/

Also patch 1/2, but consensus seems to be "focus on 2/2 first":
https://lore.kernel.org/linux-nfs/20251219141105.1247093-2-cel@kernel.org/