* [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs
@ 2025-12-19 14:11 Chuck Lever
2025-12-19 14:11 ` [RFC PATCH 1/2] NFSD: Add aggressive write throttling control Chuck Lever
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
0 siblings, 2 replies; 15+ messages in thread
From: Chuck Lever @ 2025-12-19 14:11 UTC (permalink / raw)
To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
Following up on
https://lore.kernel.org/linux-nfs/99dd427d-a16e-4494-a4b1-ff65488181ee@oracle.com/
Client workloads that are write-intensive can sometimes trigger an
NFSD meltdown (thrashing, livelocking, or becoming unresponsive).
This can happen when clients present NFSD with more UNSTABLE WRITEs
than can fit in the server's physical memory, and the system simply
can't get those dirty pages onto persistent storage fast enough.
In those cases, it makes sense to slow those clients down until the
backlog can be cleared out. NFSD might do this by delaying the
responses to UNSTABLE WRITEs, which in turn leaves unprocessed
ingress WRITEs on the transport queue longer, and thus closes down
the ingress congestion window on the network connection. This
applies direct backpressure on the noisy clients.
NFSD might already be doing this to some extent, but it can be
argued that it is not going far enough.
These two patches fall squarely in the "crazy ideas" category, but
I hope they serve as conversation starters.
Chuck Lever (2):
NFSD: Add aggressive write throttling control
NFSD: Add asynchronous write throttling support
fs/nfsd/debugfs.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 10 +++++++
fs/nfsd/vfs.c | 34 ++++++++++++++++++++++++
3 files changed, 111 insertions(+)
--
2.52.0
* [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
2025-12-19 14:11 [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs Chuck Lever
@ 2025-12-19 14:11 ` Chuck Lever
2026-01-07 7:55 ` Christoph Hellwig
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
1 sibling, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2025-12-19 14:11 UTC (permalink / raw)
To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
On NFS servers with fast network links but slow storage, clients can
generate WRITE requests faster than the server can flush payloads to
durable storage. This can push the server into memory exhaustion as
dirty pages accumulate across hundreds of concurrent NFSD threads.
The existing dirty page throttling (balance_dirty_pages()) uses
per-task accounting with default ratelimits that allow each thread
to dirty ~32 pages before throttling occurs. With many NFSD threads,
this allows significant dirty page accumulation before any
throttling kicks in.
Add a debugfs control to enable aggressive write throttling for
NFSD:
/sys/kernel/debug/nfsd/write_throttle
When set to 1, NFSD write operations reduce nr_dirtied_pause to
force balance_dirty_pages() to be called more frequently. This uses
the same page-size-adjusted limit that
balance_dirty_pages_ratelimited_flags() applies when
wb->dirty_exceeded is true: 8 pages instead of ~32 on systems with
4KB pages, so throttling is evaluated 4x more often.
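The 4x figure can be checked by hand. The following is a minimal userspace sketch, not kernel code: aggressive_dirty_limit(), show_limit() and DEFAULT_PAUSE_PAGES are illustrative names that do not appear in the patch, and the 32-page default is simply the figure quoted above, taken here as applying to 4 KiB pages. The helper evaluates the patch's expression 32 >> (PAGE_SHIFT - 10) for a given page shift.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: the ~32-page default quoted in the commit message (4 KiB pages). */
#define DEFAULT_PAUSE_PAGES 32u

/*
 * Hypothetical helper mirroring NFSD_AGGRESSIVE_DIRTY_LIMIT from the
 * patch: 32 >> (PAGE_SHIFT - 10), i.e. 32 KiB expressed in pages.
 */
unsigned int aggressive_dirty_limit(unsigned int page_shift)
{
	/* Keep the shift count well defined for a 32-bit unsigned int. */
	assert(page_shift >= 10 && page_shift < 42);
	return 32u >> (page_shift - 10);
}

/* Print the limit, in pages, for one page size. */
void show_limit(unsigned int page_shift)
{
	printf("PAGE_SHIFT=%u (%u KiB pages): limit=%u pages\n",
	       page_shift, 1u << (page_shift - 10),
	       aggressive_dirty_limit(page_shift));
}
```

Calling show_limit(12) reports a limit of 8 pages, a quarter of the 32-page default, which is where the 4x comes from. The same expression gives 2 pages for 16 KiB pages and 0 for 64 KiB pages; how the writeback code treats a pause value of 0 is not discussed in this posting.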
The setting defaults to 0 (normal throttling) and can be changed at
runtime without restarting NFSD.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/debugfs.c | 33 +++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 9 +++++++++
fs/nfsd/vfs.c | 17 +++++++++++++++++
3 files changed, 59 insertions(+)
diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index 7f44689e0a53..f3d9e957cc5c 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -122,6 +122,36 @@ static int nfsd_io_cache_write_set(void *data, u64 val)
DEFINE_DEBUGFS_ATTRIBUTE(nfsd_io_cache_write_fops, nfsd_io_cache_write_get,
nfsd_io_cache_write_set, "%llu\n");
+/*
+ * /sys/kernel/debug/nfsd/write_throttle
+ *
+ * Contents:
+ * %0: Normal throttling (default)
+ * %1: Aggressive throttling for NFSD writes
+ *
+ * When set to 1, NFSD write operations are throttled more aggressively
+ * to prevent memory exhaustion when fast network clients overwhelm slow
+ * storage. This is useful when the server has limited memory or slow disks.
+ *
+ * This setting takes immediate effect for all NFS versions, all exports,
+ * and in all NFSD net namespaces.
+ */
+
+static int nfsd_write_throttle_get(void *data, u64 *val)
+{
+ *val = nfsd_aggressive_write_throttle ? 1 : 0;
+ return 0;
+}
+
+static int nfsd_write_throttle_set(void *data, u64 val)
+{
+ nfsd_aggressive_write_throttle = (val > 0);
+ return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_write_throttle_fops, nfsd_write_throttle_get,
+ nfsd_write_throttle_set, "%llu\n");
+
void nfsd_debugfs_exit(void)
{
debugfs_remove_recursive(nfsd_top_dir);
@@ -140,4 +170,7 @@ void nfsd_debugfs_init(void)
debugfs_create_file("io_cache_write", 0644, nfsd_top_dir, NULL,
&nfsd_io_cache_write_fops);
+
+ debugfs_create_file("write_throttle", 0644, nfsd_top_dir, NULL,
+ &nfsd_write_throttle_fops);
}
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index b0283213a8f5..16a259839768 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -165,6 +165,15 @@ enum {
extern u64 nfsd_io_cache_read __read_mostly;
extern u64 nfsd_io_cache_write __read_mostly;
+extern bool nfsd_aggressive_write_throttle __read_mostly;
+
+/*
+ * Aggressive write throttling reduces nr_dirtied_pause to force more
+ * frequent calls to balance_dirty_pages(). This uses the same page-size
+ * adjusted formula as balance_dirty_pages_ratelimited_flags() when
+ * wb->dirty_exceeded is true (see mm/page-writeback.c:2066).
+ */
+#define NFSD_AGGRESSIVE_DIRTY_LIMIT (32 >> (PAGE_SHIFT - 10))
extern int nfsd_max_blksize;
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 168d3ccc8155..33805b9ac7e4 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -51,6 +51,7 @@
bool nfsd_disable_splice_read __read_mostly;
u64 nfsd_io_cache_read __read_mostly = NFSD_IO_BUFFERED;
u64 nfsd_io_cache_write __read_mostly = NFSD_IO_BUFFERED;
+bool nfsd_aggressive_write_throttle __read_mostly;
/**
* nfserrno - Map Linux errnos to NFS errnos
@@ -1420,6 +1421,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
unsigned int pflags = current->flags;
bool restore_flags = false;
unsigned int nvecs;
+ int saved_nr_dirtied_pause = 0;
+ bool throttle_adjusted = false;
trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
@@ -1441,6 +1444,18 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
exp = fhp->fh_export;
+ /*
+ * If aggressive write throttling is enabled, reduce the per-task
+ * dirty page limit to throttle NFSD writes more aggressively.
+ * This helps prevent memory exhaustion when fast network clients
+ * overwhelm slow storage.
+ */
+ if (nfsd_aggressive_write_throttle) {
+ saved_nr_dirtied_pause = current->nr_dirtied_pause;
+ current->nr_dirtied_pause = NFSD_AGGRESSIVE_DIRTY_LIMIT;
+ throttle_adjusted = true;
+ }
+
if (!EX_ISSYNC(exp))
stable = NFS_UNSTABLE;
init_sync_kiocb(&kiocb, file);
@@ -1505,6 +1520,8 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
trace_nfsd_write_err(rqstp, fhp, offset, host_err);
nfserr = nfserrno(host_err);
}
+ if (throttle_adjusted)
+ current->nr_dirtied_pause = saved_nr_dirtied_pause;
if (restore_flags)
current_restore_flags(pflags, PF_LOCAL_THROTTLE);
return nfserr;
--
2.52.0
* [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2025-12-19 14:11 [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs Chuck Lever
2025-12-19 14:11 ` [RFC PATCH 1/2] NFSD: Add aggressive write throttling control Chuck Lever
@ 2025-12-19 14:11 ` Chuck Lever
2025-12-20 15:34 ` kernel test robot
` (4 more replies)
1 sibling, 5 replies; 15+ messages in thread
From: Chuck Lever @ 2025-12-19 14:11 UTC (permalink / raw)
To: Christoph Hellwig, Mike Snitzer; +Cc: linux-nfs, Chuck Lever
From: Chuck Lever <chuck.lever@oracle.com>
When memory pressure occurs during buffered writes, the traditional
approach is for balance_dirty_pages() to put the writing thread to
sleep until dirty pages are flushed. For NFSD, this means server
threads block waiting for I/O, reducing overall server throughput.
Add support for asynchronous write throttling using the BDP_ASYNC
flag to balance_dirty_pages_ratelimited_flags(). When enabled via:
/sys/kernel/debug/nfsd/write_async_throttle
NFSD checks memory pressure before attempting buffered writes. If
balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
that the caller would otherwise have to wait for writeback), NFSD
returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for NFSv3) to the client
instead of blocking.
This allows clients to back off and retry rather than having server
threads tied up waiting for writeback. The setting defaults to 0
(synchronous throttling) and can be combined with write_throttle for
layered throttling strategies.
Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
automatically disabled for NFSv2 requests regardless of the setting.
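The per-version behaviour described above can be summarised in a small userspace model. This is a sketch under stated assumptions, not the patch itself: should_check_async(), retry_status_for_version() and NO_RETRY_STATUS are hypothetical names, and the patch relies on NFSD's existing errno translation rather than an explicit table. The status values are the ones assigned in the NFSv3 and NFSv4 protocol specifications, where the NFSv3 code is spelled NFS3ERR_JUKEBOX.

```c
#include <assert.h>
#include <stdbool.h>

#define NFS3ERR_JUKEBOX	10008	/* NFSv3 "try again later" status */
#define NFS4ERR_DELAY	10008	/* NFSv4 equivalent, same numeric value */
#define NO_RETRY_STATUS	(-1)	/* hypothetical sentinel: no such status */

/* Mirrors the patch's guard: setting enabled and NFS version >= 3. */
bool should_check_async(bool async_throttle, unsigned int nfs_vers)
{
	return async_throttle && nfs_vers >= 3;
}

/* Status a server could send to ask the client to retry later. */
int retry_status_for_version(unsigned int nfs_vers)
{
	switch (nfs_vers) {
	case 3:
		return NFS3ERR_JUKEBOX;
	case 4:
		return NFS4ERR_DELAY;
	default:
		/* NFSv2 defines no comparable status. */
		return NO_RETRY_STATUS;
	}
}
```

Because NFSv2 has nothing to return, the guard skips the asynchronous check for it and those requests keep the blocking behaviour.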
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
fs/nfsd/debugfs.c | 34 ++++++++++++++++++++++++++++++++++
fs/nfsd/nfsd.h | 1 +
fs/nfsd/vfs.c | 17 +++++++++++++++++
3 files changed, 52 insertions(+)
diff --git a/fs/nfsd/debugfs.c b/fs/nfsd/debugfs.c
index f3d9e957cc5c..f2cce37589ce 100644
--- a/fs/nfsd/debugfs.c
+++ b/fs/nfsd/debugfs.c
@@ -152,6 +152,37 @@ static int nfsd_write_throttle_set(void *data, u64 val)
DEFINE_DEBUGFS_ATTRIBUTE(nfsd_write_throttle_fops, nfsd_write_throttle_get,
nfsd_write_throttle_set, "%llu\n");
+/*
+ * /sys/kernel/debug/nfsd/write_async_throttle
+ *
+ * Contents:
+ * %0: Synchronous throttling (default) - writes sleep in balance_dirty_pages()
+ * %1: Asynchronous throttling - return NFS4ERR_DELAY when memory is tight
+ *
+ * When set to 1, NFSD uses BDP_ASYNC mode which returns -EAGAIN from
+ * balance_dirty_pages_ratelimited_flags() instead of sleeping. This allows
+ * NFSD to return NFS4ERR_DELAY (or NFSERR_JUKEBOX for NFSv3), letting
+ * clients back off and retry rather than having NFSD threads blocked.
+ *
+ * This setting takes immediate effect for all NFS versions, all exports,
+ * and in all NFSD net namespaces.
+ */
+
+static int nfsd_async_throttle_get(void *data, u64 *val)
+{
+ *val = nfsd_async_write_throttle ? 1 : 0;
+ return 0;
+}
+
+static int nfsd_async_throttle_set(void *data, u64 val)
+{
+ nfsd_async_write_throttle = (val > 0);
+ return 0;
+}
+
+DEFINE_DEBUGFS_ATTRIBUTE(nfsd_async_throttle_fops, nfsd_async_throttle_get,
+ nfsd_async_throttle_set, "%llu\n");
+
void nfsd_debugfs_exit(void)
{
debugfs_remove_recursive(nfsd_top_dir);
@@ -173,4 +204,7 @@ void nfsd_debugfs_init(void)
debugfs_create_file("write_throttle", 0644, nfsd_top_dir, NULL,
&nfsd_write_throttle_fops);
+
+ debugfs_create_file("write_async_throttle", 0644, nfsd_top_dir, NULL,
+ &nfsd_async_throttle_fops);
}
diff --git a/fs/nfsd/nfsd.h b/fs/nfsd/nfsd.h
index 16a259839768..ea61db58ef95 100644
--- a/fs/nfsd/nfsd.h
+++ b/fs/nfsd/nfsd.h
@@ -166,6 +166,7 @@ enum {
extern u64 nfsd_io_cache_read __read_mostly;
extern u64 nfsd_io_cache_write __read_mostly;
extern bool nfsd_aggressive_write_throttle __read_mostly;
+extern bool nfsd_async_write_throttle __read_mostly;
/*
* Aggressive write throttling reduces nr_dirtied_pause to force more
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 33805b9ac7e4..0fcfd29e843d 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -52,6 +52,7 @@ bool nfsd_disable_splice_read __read_mostly;
u64 nfsd_io_cache_read __read_mostly = NFSD_IO_BUFFERED;
u64 nfsd_io_cache_write __read_mostly = NFSD_IO_BUFFERED;
bool nfsd_aggressive_write_throttle __read_mostly;
+bool nfsd_async_write_throttle __read_mostly;
/**
* nfserrno - Map Linux errnos to NFS errnos
@@ -1473,6 +1474,22 @@ nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
}
}
+ /*
+ * If async throttling is enabled, check memory pressure
+ * before attempting buffered writes. Return -EAGAIN if
+ * the system is low on memory, allowing NFSD to return
+ * an NFS error code asking the client to retry later.
+ *
+ * Skip this for NFSv2 since it lacks NFSERR_JUKEBOX.
+ */
+ if (nfsd_async_write_throttle && rqstp->rq_vers >= 3) {
+ host_err =
+ balance_dirty_pages_ratelimited_flags(file->f_mapping,
+ BDP_ASYNC);
+ if (host_err == -EAGAIN)
+ break;
+ }
+
nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
since = READ_ONCE(file->f_wb_err);
--
2.52.0
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
@ 2025-12-20 15:34 ` kernel test robot
2025-12-21 5:41 ` kernel test robot
` (3 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-12-20 15:34 UTC (permalink / raw)
To: Chuck Lever; +Cc: llvm, oe-kbuild-all
Hi Chuck,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on next-20251219]
[cannot apply to linus/master v6.16-rc1]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Chuck-Lever/NFSD-Add-aggressive-write-throttling-control/20251219-221859
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20251219141105.1247093-3-cel%40kernel.org
patch subject: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20251220/202512201657.c3KKm6Kh-lkp@intel.com/config)
compiler: clang version 20.1.8 (https://github.com/llvm/llvm-project 87f0227cb60147a26a1eeb4fb06e3b505e9c7261)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251220/202512201657.c3KKm6Kh-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512201657.c3KKm6Kh-lkp@intel.com/
All errors (new ones prefixed by >>):
>> fs/nfsd/vfs.c:1490:4: error: 'break' statement not in loop or switch statement
1490 | break;
| ^
1 error generated.
vim +/break +1490 fs/nfsd/vfs.c
1389
1390 /**
1391 * nfsd_vfs_write - write data to an already-open file
1392 * @rqstp: RPC execution context
1393 * @fhp: File handle of file to write into
1394 * @nf: An open file matching @fhp
1395 * @offset: Byte offset of start
1396 * @payload: xdr_buf containing the write payload
1397 * @cnt: IN: number of bytes to write, OUT: number of bytes actually written
1398 * @stable: An NFS stable_how value
1399 * @verf: NFS WRITE verifier
1400 *
1401 * Upon return, caller must invoke fh_put on @fhp.
1402 *
1403 * Return values:
1404 * An nfsstat value in network byte order.
1405 */
1406 __be32
1407 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
1408 struct nfsd_file *nf, loff_t offset,
1409 const struct xdr_buf *payload, unsigned long *cnt,
1410 int stable, __be32 *verf)
1411 {
1412 struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
1413 struct file *file = nf->nf_file;
1414 struct super_block *sb = file_inode(file)->i_sb;
1415 struct kiocb kiocb;
1416 struct svc_export *exp;
1417 struct iov_iter iter;
1418 errseq_t since;
1419 __be32 nfserr;
1420 int host_err;
1421 unsigned long exp_op_flags = 0;
1422 unsigned int pflags = current->flags;
1423 bool restore_flags = false;
1424 unsigned int nvecs;
1425 int saved_nr_dirtied_pause = 0;
1426 bool throttle_adjusted = false;
1427
1428 trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
1429
1430 if (sb->s_export_op)
1431 exp_op_flags = sb->s_export_op->flags;
1432
1433 if (test_bit(RQ_LOCAL, &rqstp->rq_flags) &&
1434 !(exp_op_flags & EXPORT_OP_REMOTE_FS)) {
1435 /*
1436 * We want throttling in balance_dirty_pages()
1437 * and shrink_inactive_list() to only consider
1438 * the backingdev we are writing to, so that nfs to
1439 * localhost doesn't cause nfsd to lock up due to all
1440 * the client's dirty pages or its congested queue.
1441 */
1442 current->flags |= PF_LOCAL_THROTTLE;
1443 restore_flags = true;
1444 }
1445
1446 exp = fhp->fh_export;
1447
1448 /*
1449 * If aggressive write throttling is enabled, reduce the per-task
1450 * dirty page limit to throttle NFSD writes more aggressively.
1451 * This helps prevent memory exhaustion when fast network clients
1452 * overwhelm slow storage.
1453 */
1454 if (nfsd_aggressive_write_throttle) {
1455 saved_nr_dirtied_pause = current->nr_dirtied_pause;
1456 current->nr_dirtied_pause = NFSD_AGGRESSIVE_DIRTY_LIMIT;
1457 throttle_adjusted = true;
1458 }
1459
1460 if (!EX_ISSYNC(exp))
1461 stable = NFS_UNSTABLE;
1462 init_sync_kiocb(&kiocb, file);
1463 kiocb.ki_pos = offset;
1464 if (likely(!fhp->fh_use_wgather)) {
1465 switch (stable) {
1466 case NFS_FILE_SYNC:
1467 /* persist data and timestamps */
1468 kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC;
1469 break;
1470 case NFS_DATA_SYNC:
1471 /* persist data only */
1472 kiocb.ki_flags |= IOCB_DSYNC;
1473 break;
1474 }
1475 }
1476
1477 /*
1478 * If async throttling is enabled, check memory pressure
1479 * before attempting buffered writes. Return -EAGAIN if
1480 * the system is low on memory, allowing NFSD to return
1481 * an NFS error code asking the client to retry later.
1482 *
1483 * Skip this for NFSv2 since it lacks NFSERR_JUKEBOX.
1484 */
1485 if (nfsd_async_write_throttle && rqstp->rq_vers >= 3) {
1486 host_err =
1487 balance_dirty_pages_ratelimited_flags(file->f_mapping,
1488 BDP_ASYNC);
1489 if (host_err == -EAGAIN)
> 1490 break;
1491 }
1492
1493 nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
1494
1495 since = READ_ONCE(file->f_wb_err);
1496 if (verf)
1497 nfsd_copy_write_verifier(verf, nn);
1498
1499 switch (nfsd_io_cache_write) {
1500 case NFSD_IO_DIRECT:
1501 host_err = nfsd_direct_write(rqstp, fhp, nf, nvecs,
1502 cnt, &kiocb);
1503 break;
1504 case NFSD_IO_DONTCACHE:
1505 if (file->f_op->fop_flags & FOP_DONTCACHE)
1506 kiocb.ki_flags |= IOCB_DONTCACHE;
1507 fallthrough;
1508 case NFSD_IO_BUFFERED:
1509 iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
1510 host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
1511 if (host_err < 0)
1512 break;
1513 *cnt = host_err;
1514 break;
1515 }
1516 if (host_err < 0) {
1517 commit_reset_write_verifier(nn, rqstp, host_err);
1518 goto out_nfserr;
1519 }
1520 nfsd_stats_io_write_add(nn, exp, *cnt);
1521 fsnotify_modify(file);
1522 host_err = filemap_check_wb_err(file->f_mapping, since);
1523 if (host_err < 0)
1524 goto out_nfserr;
1525
1526 if (stable && fhp->fh_use_wgather) {
1527 host_err = wait_for_concurrent_writes(file);
1528 if (host_err < 0)
1529 commit_reset_write_verifier(nn, rqstp, host_err);
1530 }
1531
1532 out_nfserr:
1533 if (host_err >= 0) {
1534 trace_nfsd_write_io_done(rqstp, fhp, offset, *cnt);
1535 nfserr = nfs_ok;
1536 } else {
1537 trace_nfsd_write_err(rqstp, fhp, offset, host_err);
1538 nfserr = nfserrno(host_err);
1539 }
1540 if (throttle_adjusted)
1541 current->nr_dirtied_pause = saved_nr_dirtied_pause;
1542 if (restore_flags)
1543 current_restore_flags(pflags, PF_LOCAL_THROTTLE);
1544 return nfserr;
1545 }
1546
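The compiler rejects the break at line 1490 because the enclosing if is not inside a loop or switch. One possible repair, offered only as a guess at the intent, is to jump to the function's existing out_nfserr label so that the common exit path still runs. The userspace model below illustrates that control flow; model_vfs_write(), struct write_model and the out label are hypothetical stand-ins, bdp_result stands in for the return value of balance_dirty_pages_ratelimited_flags(), and whether a real fix should also touch the write verifier is left open.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Records which steps of the modelled function actually ran. */
struct write_model {
	bool wrote_data;	/* reached the write step */
	bool state_restored;	/* ran the common exit path */
};

int model_vfs_write(bool async_throttle, unsigned int nfs_vers,
		    int bdp_result, struct write_model *m)
{
	int host_err = 0;

	m->wrote_data = false;
	m->state_restored = false;

	if (async_throttle && nfs_vers >= 3) {
		host_err = bdp_result;
		if (host_err == -EAGAIN)
			goto out;	/* 'break' is invalid here */
	}

	m->wrote_data = true;	/* stands in for the actual write */
	host_err = 0;

out:
	/* Stands in for restoring nr_dirtied_pause and PF_LOCAL_THROTTLE. */
	m->state_restored = true;
	return host_err;
}
```

In this model the early exit skips the write but still restores the per-task state, which is the property the saved_nr_dirtied_pause and restore_flags handling at the end of nfsd_vfs_write() depends on.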
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
2025-12-20 15:34 ` kernel test robot
@ 2025-12-21 5:41 ` kernel test robot
2025-12-22 18:06 ` kernel test robot
` (2 subsequent siblings)
4 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-12-21 5:41 UTC (permalink / raw)
To: Chuck Lever; +Cc: oe-kbuild-all
Hi Chuck,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on linus/master v6.19-rc1 next-20251219]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Chuck-Lever/NFSD-Add-aggressive-write-throttling-control/20251219-221859
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20251219141105.1247093-3-cel%40kernel.org
patch subject: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
config: x86_64-rhel-9.4 (https://download.01.org/0day-ci/archive/20251221/202512210637.Fz6bpRxI-lkp@intel.com/config)
compiler: gcc-14 (Debian 14.2.0-19) 14.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251221/202512210637.Fz6bpRxI-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512210637.Fz6bpRxI-lkp@intel.com/
All errors (new ones prefixed by >>):
fs/nfsd/vfs.c: In function 'nfsd_vfs_write':
>> fs/nfsd/vfs.c:1490:25: error: break statement not within loop or switch
1490 | break;
| ^~~~~
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
2025-12-20 15:34 ` kernel test robot
2025-12-21 5:41 ` kernel test robot
@ 2025-12-22 18:06 ` kernel test robot
2025-12-22 23:47 ` kernel test robot
2026-01-07 8:00 ` Christoph Hellwig
4 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-12-22 18:06 UTC (permalink / raw)
To: Chuck Lever; +Cc: oe-kbuild-all
Hi Chuck,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on linus/master v6.19-rc2 next-20251219]
[cannot apply to hch-configfs/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Chuck-Lever/NFSD-Add-aggressive-write-throttling-control/20251219-221859
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20251219141105.1247093-3-cel%40kernel.org
patch subject: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
config: parisc-randconfig-001-20251223 (https://download.01.org/0day-ci/archive/20251223/202512230126.gowu7NIP-lkp@intel.com/config)
compiler: hppa-linux-gcc (GCC) 8.5.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251223/202512230126.gowu7NIP-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512230126.gowu7NIP-lkp@intel.com/
All errors (new ones prefixed by >>):
fs/nfsd/vfs.c: In function 'nfsd_vfs_write':
>> fs/nfsd/vfs.c:1490:4: error: break statement not within loop or switch
break;
^~~~~
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
` (2 preceding siblings ...)
2025-12-22 18:06 ` kernel test robot
@ 2025-12-22 23:47 ` kernel test robot
2026-01-07 8:00 ` Christoph Hellwig
4 siblings, 0 replies; 15+ messages in thread
From: kernel test robot @ 2025-12-22 23:47 UTC (permalink / raw)
To: Chuck Lever; +Cc: llvm, oe-kbuild-all
Hi Chuck,
[This is a private test report for your RFC patch.]
kernel test robot noticed the following build errors:
[auto build test ERROR on brauner-vfs/vfs.all]
[also build test ERROR on linus/master v6.19-rc2 next-20251219]
[cannot apply to hch-configfs/for-next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Chuck-Lever/NFSD-Add-aggressive-write-throttling-control/20251219-221859
base: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/r/20251219141105.1247093-3-cel%40kernel.org
patch subject: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
config: loongarch-defconfig (https://download.01.org/0day-ci/archive/20251223/202512230750.htK4fXlz-lkp@intel.com/config)
compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251223/202512230750.htK4fXlz-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202512230750.htK4fXlz-lkp@intel.com/
All errors (new ones prefixed by >>):
>> fs/nfsd/vfs.c:1490:4: error: 'break' statement not in loop or switch statement
1490 | break;
| ^
1 error generated.
vim +/break +1490 fs/nfsd/vfs.c
1389
1390 /**
1391 * nfsd_vfs_write - write data to an already-open file
1392 * @rqstp: RPC execution context
1393 * @fhp: File handle of file to write into
1394 * @nf: An open file matching @fhp
1395 * @offset: Byte offset of start
1396 * @payload: xdr_buf containing the write payload
1397 * @cnt: IN: number of bytes to write, OUT: number of bytes actually written
1398 * @stable: An NFS stable_how value
1399 * @verf: NFS WRITE verifier
1400 *
1401 * Upon return, caller must invoke fh_put on @fhp.
1402 *
1403 * Return values:
1404 * An nfsstat value in network byte order.
1405 */
1406 __be32
1407 nfsd_vfs_write(struct svc_rqst *rqstp, struct svc_fh *fhp,
1408 struct nfsd_file *nf, loff_t offset,
1409 const struct xdr_buf *payload, unsigned long *cnt,
1410 int stable, __be32 *verf)
1411 {
1412 struct nfsd_net *nn = net_generic(SVC_NET(rqstp), nfsd_net_id);
1413 struct file *file = nf->nf_file;
1414 struct super_block *sb = file_inode(file)->i_sb;
1415 struct kiocb kiocb;
1416 struct svc_export *exp;
1417 struct iov_iter iter;
1418 errseq_t since;
1419 __be32 nfserr;
1420 int host_err;
1421 unsigned long exp_op_flags = 0;
1422 unsigned int pflags = current->flags;
1423 bool restore_flags = false;
1424 unsigned int nvecs;
1425 int saved_nr_dirtied_pause = 0;
1426 bool throttle_adjusted = false;
1427
1428 trace_nfsd_write_opened(rqstp, fhp, offset, *cnt);
1429
1430 if (sb->s_export_op)
1431 exp_op_flags = sb->s_export_op->flags;
1432
1433 if (test_bit(RQ_LOCAL, &rqstp->rq_flags) &&
1434 !(exp_op_flags & EXPORT_OP_REMOTE_FS)) {
1435 /*
1436 * We want throttling in balance_dirty_pages()
1437 * and shrink_inactive_list() to only consider
1438 * the backingdev we are writing to, so that nfs to
1439 * localhost doesn't cause nfsd to lock up due to all
1440 * the client's dirty pages or its congested queue.
1441 */
1442 current->flags |= PF_LOCAL_THROTTLE;
1443 restore_flags = true;
1444 }
1445
1446 exp = fhp->fh_export;
1447
1448 /*
1449 * If aggressive write throttling is enabled, reduce the per-task
1450 * dirty page limit to throttle NFSD writes more aggressively.
1451 * This helps prevent memory exhaustion when fast network clients
1452 * overwhelm slow storage.
1453 */
1454 if (nfsd_aggressive_write_throttle) {
1455 saved_nr_dirtied_pause = current->nr_dirtied_pause;
1456 current->nr_dirtied_pause = NFSD_AGGRESSIVE_DIRTY_LIMIT;
1457 throttle_adjusted = true;
1458 }
1459
1460 if (!EX_ISSYNC(exp))
1461 stable = NFS_UNSTABLE;
1462 init_sync_kiocb(&kiocb, file);
1463 kiocb.ki_pos = offset;
1464 if (likely(!fhp->fh_use_wgather)) {
1465 switch (stable) {
1466 case NFS_FILE_SYNC:
1467 /* persist data and timestamps */
1468 kiocb.ki_flags |= IOCB_DSYNC | IOCB_SYNC;
1469 break;
1470 case NFS_DATA_SYNC:
1471 /* persist data only */
1472 kiocb.ki_flags |= IOCB_DSYNC;
1473 break;
1474 }
1475 }
1476
1477 /*
1478 * If async throttling is enabled, check memory pressure
1479 * before attempting buffered writes. Return -EAGAIN if
1480 * the system is low on memory, allowing NFSD to return
1481 * an NFS error code asking the client to retry later.
1482 *
1483 * Skip this for NFSv2 since it lacks NFSERR_JUKEBOX.
1484 */
1485 if (nfsd_async_write_throttle && rqstp->rq_vers >= 3) {
1486 host_err =
1487 balance_dirty_pages_ratelimited_flags(file->f_mapping,
1488 BDP_ASYNC);
1489 if (host_err == -EAGAIN)
> 1490 break;
1491 }
1492
1493 nvecs = xdr_buf_to_bvec(rqstp->rq_bvec, rqstp->rq_maxpages, payload);
1494
1495 since = READ_ONCE(file->f_wb_err);
1496 if (verf)
1497 nfsd_copy_write_verifier(verf, nn);
1498
1499 switch (nfsd_io_cache_write) {
1500 case NFSD_IO_DIRECT:
1501 host_err = nfsd_direct_write(rqstp, fhp, nf, nvecs,
1502 cnt, &kiocb);
1503 break;
1504 case NFSD_IO_DONTCACHE:
1505 if (file->f_op->fop_flags & FOP_DONTCACHE)
1506 kiocb.ki_flags |= IOCB_DONTCACHE;
1507 fallthrough;
1508 case NFSD_IO_BUFFERED:
1509 iov_iter_bvec(&iter, ITER_SOURCE, rqstp->rq_bvec, nvecs, *cnt);
1510 host_err = vfs_iocb_iter_write(file, &kiocb, &iter);
1511 if (host_err < 0)
1512 break;
1513 *cnt = host_err;
1514 break;
1515 }
1516 if (host_err < 0) {
1517 commit_reset_write_verifier(nn, rqstp, host_err);
1518 goto out_nfserr;
1519 }
1520 nfsd_stats_io_write_add(nn, exp, *cnt);
1521 fsnotify_modify(file);
1522 host_err = filemap_check_wb_err(file->f_mapping, since);
1523 if (host_err < 0)
1524 goto out_nfserr;
1525
1526 if (stable && fhp->fh_use_wgather) {
1527 host_err = wait_for_concurrent_writes(file);
1528 if (host_err < 0)
1529 commit_reset_write_verifier(nn, rqstp, host_err);
1530 }
1531
1532 out_nfserr:
1533 if (host_err >= 0) {
1534 trace_nfsd_write_io_done(rqstp, fhp, offset, *cnt);
1535 nfserr = nfs_ok;
1536 } else {
1537 trace_nfsd_write_err(rqstp, fhp, offset, host_err);
1538 nfserr = nfserrno(host_err);
1539 }
1540 if (throttle_adjusted)
1541 current->nr_dirtied_pause = saved_nr_dirtied_pause;
1542 if (restore_flags)
1543 current_restore_flags(pflags, PF_LOCAL_THROTTLE);
1544 return nfserr;
1545 }
1546
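The error above arises because the -EAGAIN check at line 1490 is not inside any loop or switch. One plausible shape for a fix, sketched here as a userspace model and not as the actual patch, is to jump to the function's cleanup label so that saved per-task state is still restored. All names below (task_model, check_pressure, model_write) are hypothetical stand-ins; whether the real function should also reset the write verifier on this path is left to the patch author.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for per-task state that must be restored on exit. */
struct task_model {
	int nr_dirtied_pause;
};

/* Hypothetical stand-in for the non-blocking check: returns 0 or -EAGAIN. */
int check_pressure(bool under_pressure)
{
	return under_pressure ? -EAGAIN : 0;
}

/* Returns the number of bytes "written", or a negative errno. */
int model_write(struct task_model *task, bool under_pressure, int len)
{
	int saved = task->nr_dirtied_pause;
	int err;

	task->nr_dirtied_pause = 1;	/* temporarily tightened limit */

	err = check_pressure(under_pressure);
	if (err == -EAGAIN)
		goto out;	/* 'break' is invalid here: no loop or switch */

	err = len;		/* pretend the write succeeded */
out:
	task->nr_dirtied_pause = saved;	/* cleanup runs on every path */
	return err;
}
```

In this model, a call made under pressure returns -EAGAIN and leaves nr_dirtied_pause at its original value, which mirrors what the out_nfserr label does for the real function's saved state.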
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
2025-12-19 14:11 ` [RFC PATCH 1/2] NFSD: Add aggressive write throttling control Chuck Lever
@ 2026-01-07 7:55 ` Christoph Hellwig
2026-01-07 14:36 ` Chuck Lever
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2026-01-07 7:55 UTC (permalink / raw)
To: Chuck Lever
Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever, linux-mm,
linux-fsdevel
On Fri, Dec 19, 2025 at 09:11:04AM -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> On NFS servers with fast network links but slow storage, clients can
> generate WRITE requests faster than the server can flush payloads to
> durable storage. This can push the server into memory exhaustion as
> dirty pages accumulate across hundreds of concurrent NFSD threads.
>
> The existing dirty page throttling (balance_dirty_pages()) uses
> per-task accounting with default ratelimits that allow each thread
> to dirty ~32 pages before throttling occurs. With many NFSD threads,
> this allows significant dirty page accumulation before any
> throttling kicks in.
What makes NFSD so special here vs say a userspace process with a bunch
of threads? Also what is the actual problem we're trying to solve?
I kinda hate having this stuff in NFSD when there's nothing specific
about nfs serving here.
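The scaling argument in the quoted patch description can be made concrete with a little arithmetic. The sketch below takes the ~32-page per-task allowance from that description and assumes 4 KiB pages; the function name and any thread counts fed to it are illustrative only, not measurements.

```c
#define PAGES_PER_TASK	32UL	/* approximate allowance cited in the patch description */
#define PAGE_SIZE_BYTES	4096UL	/* assumption: 4 KiB pages */

/*
 * Upper bound on the bytes that nr_threads writers can dirty before
 * any one of them reaches its own per-task ratelimit.
 */
unsigned long unthrottled_dirty_bytes(unsigned long nr_threads)
{
	return nr_threads * PAGES_PER_TASK * PAGE_SIZE_BYTES;
}
```

Under these assumptions, 256 threads can dirty 32 MiB before any per-task throttling applies. The same bound holds for any multi-threaded writer, which is the comparison the question above is drawing.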
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
` (3 preceding siblings ...)
2025-12-22 23:47 ` kernel test robot
@ 2026-01-07 8:00 ` Christoph Hellwig
2026-01-07 14:42 ` Chuck Lever
4 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2026-01-07 8:00 UTC (permalink / raw)
To: Chuck Lever; +Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever
On Fri, Dec 19, 2025 at 09:11:05AM -0500, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> When memory pressure occurs during buffered writes, the traditional
> approach is for balance_dirty_pages() to put the writing thread to
> sleep until dirty pages are flushed. For NFSD, this means server
> threads block waiting for I/O, reducing overall server throughput.
>
> Add support for asynchronous write throttling using the BDP_ASYNC
> flag to balance_dirty_pages_ratelimited_flags(). When enabled via:
>
> /sys/kernel/debug/nfsd/write_async_throttle
Let me reiterate that I really, really hate all these magic
debugfs-enabled features. Either they are genuinely useful (I think this
would be such a thing) and they should be enabled unconditionally, or they
are tradeoffs and should have a proper tunable not hidden in debugfs.
> NFSD checks memory pressure before attempting buffered writes. If
> balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
> memory exhaustion), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for
> NFSv3) to the client instead of blocking.
>
> This allows clients to back off and retry rather than having server
> threads tied up waiting for writeback. The setting defaults to 0
> (synchronous throttling) and can be combined with write_throttle for
> layered throttling strategies.
>
> Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
> automatically disabled for NFSv2 requests regardless of the setting.
This all seems very useful to me. But it really needs to show numbers
on how it helps.
> + * Contents:
> + * %0: Synchronous throttling (default) - writes sleep in balance_dirty_pages()
Overly long line.
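The behaviour described in the quoted commit message (answer with a retry status instead of sleeping, and skip NFSv2) can be sketched as a small userspace model. The status values are the protocol-defined ones; the function names and the idea of passing the check result in as host_err are assumptions made for illustration, not the patch's code.

```c
#include <errno.h>
#include <stdbool.h>

#define NFS_OK			0
#define NFS3ERR_JUKEBOX		10008	/* RFC 1813 */
#define NFS4ERR_DELAY		10008	/* RFC 7530 */

/* Async throttling only applies to versions that can say "retry later". */
bool async_throttle_applies(bool enabled, unsigned int vers)
{
	return enabled && vers >= 3;
}

/*
 * Map the result of a (hypothetical) non-blocking dirty-page check to a
 * protocol status in host byte order. NFS_OK means "go ahead and write".
 */
int throttle_status(bool enabled, unsigned int vers, int host_err)
{
	if (!async_throttle_applies(enabled, vers) || host_err != -EAGAIN)
		return NFS_OK;
	return vers == 3 ? NFS3ERR_JUKEBOX : NFS4ERR_DELAY;
}
```

NFSv2 requests always fall through to the ordinary blocking path in this model, since that protocol version has no equivalent retry status to return.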
* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
2026-01-07 7:55 ` Christoph Hellwig
@ 2026-01-07 14:36 ` Chuck Lever
2026-01-07 14:42 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Chuck Lever @ 2026-01-07 14:36 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, linux-nfs, Chuck Lever, linux-mm, linux-fsdevel
On 1/7/26 2:55 AM, Christoph Hellwig wrote:
> On Fri, Dec 19, 2025 at 09:11:04AM -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> On NFS servers with fast network links but slow storage, clients can
>> generate WRITE requests faster than the server can flush payloads to
>> durable storage. This can push the server into memory exhaustion as
>> dirty pages accumulate across hundreds of concurrent NFSD threads.
>>
>> The existing dirty page throttling (balance_dirty_pages()) uses
>> per-task accounting with default ratelimits that allow each thread
>> to dirty ~32 pages before throttling occurs. With many NFSD threads,
>> this allows significant dirty page accumulation before any
>> throttling kicks in.
>
> What makes NFSD so special here vs say a userspace process with a bunch
> of threads? Also what is the actual problem we're trying to solve?
The problem, as I see it, is that the system is not providing enough
backpressure to slow down noisy clients, allowing them to overwhelm
the server's memory with UNSTABLE WRITE traffic.
This is the same issue, IMO, that Mike's direct I/O is attempting to
address. Our implementation of UNSTABLE WRITE is a denial-of-service
vector.
> I kinda hate having this stuff in NFSD when there's nothing specific
> about nfs serving here.
Don't worry too much about that, these patches are obviously not in any
kind of merge-able shape yet. We do need to understand the metabolism of
UNSTABLE WRITEs, in particular, to get a clear picture of what needs to
be controlled to make the server autonomously stable.
--
Chuck Lever
* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
2026-01-07 14:36 ` Chuck Lever
@ 2026-01-07 14:42 ` Christoph Hellwig
2026-01-07 14:49 ` Chuck Lever
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2026-01-07 14:42 UTC (permalink / raw)
To: Chuck Lever
Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever, linux-mm,
linux-fsdevel
On Wed, Jan 07, 2026 at 09:36:39AM -0500, Chuck Lever wrote:
> > What makes NFSD so special here vs say a userspace process with a bunch
> > of threads? Also what is the actual problem we're trying to solve?
>
> The problem, as I see it, is that the system is not providing enough
> backpressure to slow down noisy clients, allowing them to overwhelm
> the server's memory with UNSTABLE WRITE traffic.
>
> This is the same issue, IMO, that Mike's direct I/O is attempting to
> address. Our implementation of UNSTABLE WRITE is a denial-of-service
> vector.
But how is this different from Samba or a userspace NFS server?
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2026-01-07 8:00 ` Christoph Hellwig
@ 2026-01-07 14:42 ` Chuck Lever
2026-01-07 16:25 ` Christoph Hellwig
2026-01-07 19:40 ` Mike Snitzer
0 siblings, 2 replies; 15+ messages in thread
From: Chuck Lever @ 2026-01-07 14:42 UTC (permalink / raw)
To: Christoph Hellwig; +Cc: Mike Snitzer, linux-nfs, Chuck Lever
On 1/7/26 3:00 AM, Christoph Hellwig wrote:
> On Fri, Dec 19, 2025 at 09:11:05AM -0500, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> When memory pressure occurs during buffered writes, the traditional
>> approach is for balance_dirty_pages() to put the writing thread to
>> sleep until dirty pages are flushed. For NFSD, this means server
>> threads block waiting for I/O, reducing overall server throughput.
>>
>> Add support for asynchronous write throttling using the BDP_ASYNC
>> flag to balance_dirty_pages_ratelimited_flags(). When enabled via:
>>
>> /sys/kernel/debug/nfsd/write_async_throttle
>
> Let me reiterate that I really, really hate all these magic
> debugfs-enabled features. Either they are genuinely useful (I think this
> would be such a thing) and they should be enabled unconditionally, or they
> are tradeoffs and should have a proper tunable not hidden in debugfs.
The use of debugfs here is because we don't yet have a coherent design
in mind -- this new facility is entirely experimental, and we need a
way to enable and disable it to make good comparisons, without making
immutable changes to the actual NFSD administrative interface.
"The RFC sign out front should have told ya."
But I agree, in the long term I most prefer no new administrative
controls -- it should just work if at all possible.
>> NFSD checks memory pressure before attempting buffered writes. If
>> balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
>> memory exhaustion), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for
>> NFSv3) to the client instead of blocking.
>>
>> This allows clients to back off and retry rather than having server
>> threads tied up waiting for writeback. The setting defaults to 0
>> (synchronous throttling) and can be combined with write_throttle for
>> layered throttling strategies.
>>
>> Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
>> automatically disabled for NFSv2 requests regardless of the setting.
>
> This all seems very useful to me. But it really needs to show numbers
> on how it helps.
Well if I can get this into operational shape, perhaps J. Flynn would
be interested in trying it out for us.
I'm happy to run with this one and drop (or postpone) 1/2, if that is
your assessment.
>> + * Contents:
>> + * %0: Synchronous throttling (default) - writes sleep in balance_dirty_pages()
>
> Overly long line.
>
--
Chuck Lever
* Re: [RFC PATCH 1/2] NFSD: Add aggressive write throttling control
2026-01-07 14:42 ` Christoph Hellwig
@ 2026-01-07 14:49 ` Chuck Lever
0 siblings, 0 replies; 15+ messages in thread
From: Chuck Lever @ 2026-01-07 14:49 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Mike Snitzer, linux-nfs, Chuck Lever, linux-mm, linux-fsdevel
On 1/7/26 9:42 AM, Christoph Hellwig wrote:
> On Wed, Jan 07, 2026 at 09:36:39AM -0500, Chuck Lever wrote:
>>> What makes NFSD so special here vs say a userspace process with a bunch
>>> of threads? Also what is the actual problem we're trying to solve?
>>
>> The problem, as I see it, is that the system is not providing enough
>> backpressure to slow down noisy clients, allowing them to overwhelm
>> the server's memory with UNSTABLE WRITE traffic.
>>
>> This is the same issue, IMO, that Mike's direct I/O is attempting to
>> address. Our implementation of UNSTABLE WRITE is a denial-of-service
>> vector.
>
> But how is this different from Samba or a userspace NFS server?
Well it might not be different. But at this point I don't think we know
enough about the problem to say one way or another. I'm just trying to
gather more experimental evidence about what is happening.
--
Chuck Lever
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2026-01-07 14:42 ` Chuck Lever
@ 2026-01-07 16:25 ` Christoph Hellwig
2026-01-07 19:40 ` Mike Snitzer
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2026-01-07 16:25 UTC (permalink / raw)
To: Chuck Lever; +Cc: Christoph Hellwig, Mike Snitzer, linux-nfs, Chuck Lever
On Wed, Jan 07, 2026 at 09:42:58AM -0500, Chuck Lever wrote:
> I'm happy to run with this one and drop (or postpone) 1/2, if that is
> your assessment.
I don't really understand what exactly patch 1 is aiming for. Not
stalling nfsd threads when congested makes total sense on the other
hand.
* Re: [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support
2026-01-07 14:42 ` Chuck Lever
2026-01-07 16:25 ` Christoph Hellwig
@ 2026-01-07 19:40 ` Mike Snitzer
1 sibling, 0 replies; 15+ messages in thread
From: Mike Snitzer @ 2026-01-07 19:40 UTC (permalink / raw)
To: Chuck Lever; +Cc: Christoph Hellwig, linux-nfs, Chuck Lever, jonathan.flynn
On Wed, Jan 07, 2026 at 09:42:58AM -0500, Chuck Lever wrote:
> On 1/7/26 3:00 AM, Christoph Hellwig wrote:
> > On Fri, Dec 19, 2025 at 09:11:05AM -0500, Chuck Lever wrote:
> >> From: Chuck Lever <chuck.lever@oracle.com>
> >>
> >> When memory pressure occurs during buffered writes, the traditional
> >> approach is for balance_dirty_pages() to put the writing thread to
> >> sleep until dirty pages are flushed. For NFSD, this means server
> >> threads block waiting for I/O, reducing overall server throughput.
> >>
> >> Add support for asynchronous write throttling using the BDP_ASYNC
> >> flag to balance_dirty_pages_ratelimited_flags(). When enabled via:
> >>
> >> /sys/kernel/debug/nfsd/write_async_throttle
> >
> > Let me reiterate that I really, really hate all these magic
> > debugfs-enabled features. Either they are genuinely useful (I think this
> > would be such a thing) and they should be enabled unconditionally, or they
> > are tradeoffs and should have a proper tunable not hidden in debugfs.
>
> The use of debugfs here is because we don't yet have a coherent design
> in mind -- this new facility is entirely experimental, and we need a
> way to enable and disable it to make good comparisons, without making
> immutable changes to the actual NFSD administrative interface.
>
> "The RFC sign out front should have told ya."
>
> But I agree, in the long term I most prefer no new administrative
> controls -- it should just work if at all possible.
>
>
> >> NFSD checks memory pressure before attempting buffered writes. If
> >> balance_dirty_pages_ratelimited_flags() returns -EAGAIN (indicating
> >> memory exhaustion), NFSD returns NFS4ERR_DELAY (or NFSERR_JUKEBOX for
> >> NFSv3) to the client instead of blocking.
> >>
> >> This allows clients to back off and retry rather than having server
> >> threads tied up waiting for writeback. The setting defaults to 0
> >> (synchronous throttling) and can be combined with write_throttle for
> >> layered throttling strategies.
> >>
> >> Note: NFSv2 does not support NFSERR_JUKEBOX, so async throttling is
> >> automatically disabled for NFSv2 requests regardless of the setting.
> >
> > This all seems very useful to me. But it really needs to show numbers
> > on how it helps.
>
> Well if I can get this into operational shape, perhaps J. Flynn would
> be interested in trying it out for us.
>
> I'm happy to run with this one and drop (or postpone) 1/2, if that is
> your assessment.
Probably a good start. Definitely looks useful and worth measuring to
see if buffered IO improves.
I can include it in a test kernel for Jon Flynn once you're happy with
the patch and would like further testing (fyi I've rebased to latest
6.18-stable but Jon hasn't done baseline testing of it yet, so we
could kill 2 birds once ready).
Thanks,
Mike
ps. Jon, for further context see Chuck's original 2/2 patch:
https://lore.kernel.org/linux-nfs/20251219141105.1247093-3-cel@kernel.org/
And his cover letter:
https://lore.kernel.org/linux-nfs/20251219141105.1247093-1-cel@kernel.org/
Also patch 1/2, but consensus seems to be "focus on 2/2 first":
https://lore.kernel.org/linux-nfs/20251219141105.1247093-2-cel@kernel.org/
Thread overview: 15+ messages
2025-12-19 14:11 [RFC PATCH 0/2] NFSD: Rate-limiting unstable WRITEs Chuck Lever
2025-12-19 14:11 ` [RFC PATCH 1/2] NFSD: Add aggressive write throttling control Chuck Lever
2026-01-07 7:55 ` Christoph Hellwig
2026-01-07 14:36 ` Chuck Lever
2026-01-07 14:42 ` Christoph Hellwig
2026-01-07 14:49 ` Chuck Lever
2025-12-19 14:11 ` [RFC PATCH 2/2] NFSD: Add asynchronous write throttling support Chuck Lever
2025-12-20 15:34 ` kernel test robot
2025-12-21 5:41 ` kernel test robot
2025-12-22 18:06 ` kernel test robot
2025-12-22 23:47 ` kernel test robot
2026-01-07 8:00 ` Christoph Hellwig
2026-01-07 14:42 ` Chuck Lever
2026-01-07 16:25 ` Christoph Hellwig
2026-01-07 19:40 ` Mike Snitzer