* [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool
@ 2025-12-12 22:39 Jeff Layton
From: Jeff Layton @ 2025-12-12 22:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel, Jeff Layton
This patchset changes nfsd to dynamically size its threadpool as
needed. The main user-visible change is the addition of new controls
that allow the admin to set a minimum number of threads.
When the minimum is set to a non-zero value, the traditional "threads"
setting is interpreted as a maximum number of threads instead of a
static count. The server will start the minimum number of threads, and
then ramp up the thread count as needed. When the server is idle, it
will gradually ramp down the thread count.
This control scheme should allow us to sanely switch between kernels
that do and do not support dynamic threading. In the case where dynamic
threading is not supported, the user will just get the static maximum
number of threads.
The series is based on a set of draft patches by Neil. There are a
number of changes from his work:
1/ his original series was based around a new setting that defined a
maximum number of threads. This one instead adds a control to define a
minimum number of threads.
2/ if the thread's wait doesn't hit the timeout, then svc_recv() will
not return -ETIMEDOUT and the thread will stay running. Timeouts only
happen when a thread wakes up without finding work to do.
3/ the printks in his original patches have been changed to tracepoints
So far this is only lightly tested, but it seems to work well. I
still need to do some benchmarking to see whether this affects
performance, so I'm posting this as an RFC for now.
Does this approach look sane to everyone?
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Jeff Layton (6):
sunrpc: split svc_set_num_threads() into two functions
sunrpc: remove special handling of NULL pool from svc_start/stop_kthreads()
sunrpc: track the max number of requested threads in a pool
sunrpc: introduce the concept of a minimum number of threads per pool
nfsd: adjust number of running nfsd threads based on activity
nfsd: add controls to set the minimum number of threads per pool
Documentation/netlink/specs/nfsd.yaml | 5 +
fs/lockd/svc.c | 4 +-
fs/nfs/callback.c | 8 +-
fs/nfsd/netlink.c | 5 +-
fs/nfsd/netns.h | 6 ++
fs/nfsd/nfsctl.c | 50 ++++++++++
fs/nfsd/nfssvc.c | 56 ++++++++---
fs/nfsd/trace.h | 54 ++++++++++
include/linux/sunrpc/svc.h | 11 +-
include/linux/sunrpc/svcsock.h | 2 +-
include/uapi/linux/nfsd_netlink.h | 1 +
net/sunrpc/svc.c | 182 +++++++++++++++++++---------------
net/sunrpc/svc_xprt.c | 45 +++++++--
13 files changed, 315 insertions(+), 114 deletions(-)
---
base-commit: b6cf9ca7e7555f7f079bb062e3cd99a501e0d611
change-id: 20251212-nfsd-dynathread-9f7a31172005
Best regards,
--
Jeff Layton <jlayton@kernel.org>
* [PATCH RFC 1/6] sunrpc: split svc_set_num_threads() into two functions
From: Jeff Layton @ 2025-12-12 22:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel, Jeff Layton
svc_set_num_threads() sets the number of running threads for a given
pool. If the pool argument is NULL, however, it distributes the threads
evenly among all of the pools.
These divergent codepaths complicate the move to dynamic threading.
Simplify the API by splitting these two cases into different helpers:
Add a new svc_set_pool_threads() function that sets the number of
threads in a single, given pool. Modify svc_set_num_threads() to
distribute the threads evenly between all of the pools and then call
svc_set_pool_threads() for each.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/lockd/svc.c | 4 ++--
fs/nfs/callback.c | 8 +++----
fs/nfsd/nfssvc.c | 21 ++++++++----------
include/linux/sunrpc/svc.h | 3 ++-
net/sunrpc/svc.c | 54 +++++++++++++++++++++++++++++++++++++---------
5 files changed, 61 insertions(+), 29 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index d68afa196535a8785bab2931c2b14f03a1174ef9..fbf132b4e08d11a91784c21ee0209fd7c149fd9d 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -340,7 +340,7 @@ static int lockd_get(void)
return -ENOMEM;
}
- error = svc_set_num_threads(serv, NULL, 1);
+ error = svc_set_num_threads(serv, 1);
if (error < 0) {
svc_destroy(&serv);
return error;
@@ -368,7 +368,7 @@ static void lockd_put(void)
unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
#endif
- svc_set_num_threads(nlmsvc_serv, NULL, 0);
+ svc_set_num_threads(nlmsvc_serv, 0);
timer_delete_sync(&nlmsvc_retry);
svc_destroy(&nlmsvc_serv);
dprintk("lockd_down: service destroyed\n");
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index c8b837006bb27277ab34fe516f1b63992fee9b7f..44b35b7f8dc022f1d8c069eaf2f7d334c93f77fc 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -119,9 +119,9 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt,
if (serv->sv_nrthreads == nrservs)
return 0;
- ret = svc_set_num_threads(serv, NULL, nrservs);
+ ret = svc_set_num_threads(serv, nrservs);
if (ret) {
- svc_set_num_threads(serv, NULL, 0);
+ svc_set_num_threads(serv, 0);
return ret;
}
dprintk("nfs_callback_up: service started\n");
@@ -242,7 +242,7 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
cb_info->users++;
err_net:
if (!cb_info->users) {
- svc_set_num_threads(cb_info->serv, NULL, 0);
+ svc_set_num_threads(cb_info->serv, 0);
svc_destroy(&cb_info->serv);
}
err_create:
@@ -268,7 +268,7 @@ void nfs_callback_down(int minorversion, struct net *net)
nfs_callback_down_net(minorversion, serv, net);
cb_info->users--;
if (cb_info->users == 0) {
- svc_set_num_threads(serv, NULL, 0);
+ svc_set_num_threads(serv, 0);
dprintk("nfs_callback_down: service destroyed\n");
svc_destroy(&cb_info->serv);
}
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 93f7435cafd2362d9ddb28815277c824067cb370..aafec8ff588b85b0e26d40b76ef00953dc6472b4 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
}
/* Kill outstanding nfsd threads */
- svc_set_num_threads(serv, NULL, 0);
+ svc_set_num_threads(serv, 0);
nfsd_destroy_serv(net);
mutex_unlock(&nfsd_mutex);
}
@@ -702,12 +702,9 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
if (nn->nfsd_serv == NULL || n <= 0)
return 0;
- /*
- * Special case: When n == 1, pass in NULL for the pool, so that the
- * change is distributed equally among them.
- */
+ /* Special case: When n == 1, distribute threads equally among pools. */
if (n == 1)
- return svc_set_num_threads(nn->nfsd_serv, NULL, nthreads[0]);
+ return svc_set_num_threads(nn->nfsd_serv, nthreads[0]);
if (n > nn->nfsd_serv->sv_nrpools)
n = nn->nfsd_serv->sv_nrpools;
@@ -733,18 +730,18 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
/* apply the new numbers */
for (i = 0; i < n; i++) {
- err = svc_set_num_threads(nn->nfsd_serv,
- &nn->nfsd_serv->sv_pools[i],
- nthreads[i]);
+ err = svc_set_pool_threads(nn->nfsd_serv,
+ &nn->nfsd_serv->sv_pools[i],
+ nthreads[i]);
if (err)
goto out;
}
/* Anything undefined in array is considered to be 0 */
for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
- err = svc_set_num_threads(nn->nfsd_serv,
- &nn->nfsd_serv->sv_pools[i],
- 0);
+ err = svc_set_pool_threads(nn->nfsd_serv,
+ &nn->nfsd_serv->sv_pools[i],
+ 0);
if (err)
goto out;
}
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 5506d20857c318774cd223272d4b0022cc19ffb8..dd5fbbf8b3d39df6c17a7624edf344557fffd32c 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -446,7 +446,8 @@ struct svc_serv * svc_create_pooled(struct svc_program *prog,
struct svc_stat *stats,
unsigned int bufsize,
int (*threadfn)(void *data));
-int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
+int svc_set_pool_threads(struct svc_serv *, struct svc_pool *, int);
+int svc_set_num_threads(struct svc_serv *, int);
int svc_pool_stats_open(struct svc_info *si, struct file *file);
void svc_process(struct svc_rqst *rqstp);
void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 4704dce7284eccc9e2bc64cf22947666facfa86a..3fe5a7f8e57e3fa3837265ec06884b357d5373ff 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -856,15 +856,12 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
}
/**
- * svc_set_num_threads - adjust number of threads per RPC service
+ * svc_set_pool_threads - adjust number of threads per pool
* @serv: RPC service to adjust
- * @pool: Specific pool from which to choose threads, or NULL
+ * @pool: Specific pool from which to choose threads
* @nrservs: New number of threads for @serv (0 or less means kill all threads)
*
- * Create or destroy threads to make the number of threads for @serv the
- * given number. If @pool is non-NULL, change only threads in that pool;
- * otherwise, round-robin between all pools for @serv. @serv's
- * sv_nrthreads is adjusted for each thread created or destroyed.
+ * Create or destroy threads in @pool to bring it to @nrservs.
*
* Caller must ensure mutual exclusion between this and server startup or
* shutdown.
@@ -873,12 +870,12 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
* starting a thread.
*/
int
-svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
+svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
if (!pool)
- nrservs -= serv->sv_nrthreads;
- else
- nrservs -= pool->sp_nrthreads;
+ return -EINVAL;
+
+ nrservs -= pool->sp_nrthreads;
if (nrservs > 0)
return svc_start_kthreads(serv, pool, nrservs);
@@ -886,6 +883,43 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
return svc_stop_kthreads(serv, pool, nrservs);
return 0;
}
+EXPORT_SYMBOL_GPL(svc_set_pool_threads);
+
+/**
+ * svc_set_num_threads - adjust number of threads in serv
+ * @serv: RPC service to adjust
+ * @nrservs: New number of threads for @serv (0 or less means kill all threads)
+ *
+ * Create or destroy threads in @serv to bring it to @nrservs. If there
+ * are multiple pools then the new threads or victims will be distributed
+ * evenly among them.
+ *
+ * Caller must ensure mutual exclusion between this and server startup or
+ * shutdown.
+ *
+ * Returns zero on success or a negative errno if an error occurred while
+ * starting a thread.
+ */
+int
+svc_set_num_threads(struct svc_serv *serv, int nrservs)
+{
+ int base = nrservs / serv->sv_nrpools;
+ int remain = nrservs % serv->sv_nrpools;
+ int i, err;
+
+ for (i = 0; i < serv->sv_nrpools; ++i) {
+ int threads = base;
+
+ if (remain) {
+ ++threads;
+ --remain;
+ }
+ err = svc_set_pool_threads(serv, &serv->sv_pools[i], threads);
+ if (err)
+ break;
+ }
+ return err;
+}
EXPORT_SYMBOL_GPL(svc_set_num_threads);
/**
--
2.52.0
* [PATCH RFC 2/6] sunrpc: remove special handling of NULL pool from svc_start/stop_kthreads()
From: Jeff Layton @ 2025-12-12 22:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel, Jeff Layton
Now that svc_set_num_threads() handles distributing the threads among
the available pools, remove the special handling of a NULL pool pointer
from svc_start_kthreads() and svc_stop_kthreads().
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
net/sunrpc/svc.c | 53 +++++++----------------------------------------------
1 file changed, 7 insertions(+), 46 deletions(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 3fe5a7f8e57e3fa3837265ec06884b357d5373ff..3484c587a108e6f34e5c23edaf8f3a3c169c9e4a 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -763,53 +763,19 @@ void svc_pool_wake_idle_thread(struct svc_pool *pool)
}
EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread);
-static struct svc_pool *
-svc_pool_next(struct svc_serv *serv, struct svc_pool *pool, unsigned int *state)
-{
- return pool ? pool : &serv->sv_pools[(*state)++ % serv->sv_nrpools];
-}
-
-static struct svc_pool *
-svc_pool_victim(struct svc_serv *serv, struct svc_pool *target_pool,
- unsigned int *state)
-{
- struct svc_pool *pool;
- unsigned int i;
-
- pool = target_pool;
-
- if (!pool) {
- for (i = 0; i < serv->sv_nrpools; i++) {
- pool = &serv->sv_pools[--(*state) % serv->sv_nrpools];
- if (pool->sp_nrthreads)
- break;
- }
- }
-
- if (pool && pool->sp_nrthreads) {
- set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
- set_bit(SP_NEED_VICTIM, &pool->sp_flags);
- return pool;
- }
- return NULL;
-}
-
static int
svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
struct svc_rqst *rqstp;
struct task_struct *task;
- struct svc_pool *chosen_pool;
- unsigned int state = serv->sv_nrthreads-1;
int node;
int err;
do {
nrservs--;
- chosen_pool = svc_pool_next(serv, pool, &state);
- node = svc_pool_map_get_node(chosen_pool->sp_id);
+ node = svc_pool_map_get_node(pool->sp_id);
- rqstp = svc_prepare_thread(serv, chosen_pool, node);
+ rqstp = svc_prepare_thread(serv, pool, node);
if (!rqstp)
return -ENOMEM;
task = kthread_create_on_node(serv->sv_threadfn, rqstp,
@@ -821,7 +787,7 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
rqstp->rq_task = task;
if (serv->sv_nrpools > 1)
- svc_pool_map_set_cpumask(task, chosen_pool->sp_id);
+ svc_pool_map_set_cpumask(task, pool->sp_id);
svc_sock_update_bufs(serv);
wake_up_process(task);
@@ -840,16 +806,11 @@ svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
static int
svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
{
- unsigned int state = serv->sv_nrthreads-1;
- struct svc_pool *victim;
-
do {
- victim = svc_pool_victim(serv, pool, &state);
- if (!victim)
- break;
- svc_pool_wake_idle_thread(victim);
- wait_on_bit(&victim->sp_flags, SP_VICTIM_REMAINS,
- TASK_IDLE);
+ set_bit(SP_VICTIM_REMAINS, &pool->sp_flags);
+ set_bit(SP_NEED_VICTIM, &pool->sp_flags);
+ svc_pool_wake_idle_thread(pool);
+ wait_on_bit(&pool->sp_flags, SP_VICTIM_REMAINS, TASK_IDLE);
nrservs++;
} while (nrservs < 0);
return 0;
--
2.52.0
* [PATCH RFC 3/6] sunrpc: track the max number of requested threads in a pool
From: Jeff Layton @ 2025-12-12 22:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel, Jeff Layton
The kernel currently tracks the number of threads running in a pool in
the "sp_nrthreads" field. In the future, where threads are dynamically
spun up and down, it'll be necessary to keep track of the maximum number
of requested threads separately from the actual number running.
Add a pool->sp_nrthrmax field to track this. When userland changes
the number of threads in a pool, update that value accordingly.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
include/linux/sunrpc/svc.h | 3 ++-
net/sunrpc/svc.c | 1 +
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index dd5fbbf8b3d39df6c17a7624edf344557fffd32c..ee9260ca908c907f4373f4cfa471b272bc7bcc8c 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -35,8 +35,9 @@
*/
struct svc_pool {
unsigned int sp_id; /* pool id; also node id on NUMA */
+ unsigned int sp_nrthreads; /* # of threads currently running in pool */
+ unsigned int sp_nrthrmax; /* Max requested number of threads in pool */
struct lwq sp_xprts; /* pending transports */
- unsigned int sp_nrthreads; /* # of threads in pool */
struct list_head sp_all_threads; /* all server threads */
struct llist_head sp_idle_threads; /* idle server threads */
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 3484c587a108e6f34e5c23edaf8f3a3c169c9e4a..8cd45f62ef74af6e0826b8f13cc903b0962af5e0 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -836,6 +836,7 @@ svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
if (!pool)
return -EINVAL;
+ pool->sp_nrthrmax = nrservs;
nrservs -= pool->sp_nrthreads;
if (nrservs > 0)
--
2.52.0
* [PATCH RFC 4/6] sunrpc: introduce the concept of a minimum number of threads per pool
From: Jeff Layton @ 2025-12-12 22:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel, Jeff Layton
Add a new pool->sp_nrthrmin field to track the minimum number of threads
in a pool. Add min_threads parameters to both svc_set_num_threads() and
svc_set_pool_threads(). If min_threads is non-zero, then have
svc_set_num_threads() ensure that the number of running threads is
between the min and the max.
For now, min_threads is always 0, but a later patch will pass the
proper value through from nfsd.
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/lockd/svc.c | 4 ++--
fs/nfs/callback.c | 8 ++++----
fs/nfsd/nfssvc.c | 8 ++++----
include/linux/sunrpc/svc.h | 7 ++++---
net/sunrpc/svc.c | 21 ++++++++++++++++++---
5 files changed, 32 insertions(+), 16 deletions(-)
diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
index fbf132b4e08d11a91784c21ee0209fd7c149fd9d..7899205314391415dfb698ab58fe97efc426d928 100644
--- a/fs/lockd/svc.c
+++ b/fs/lockd/svc.c
@@ -340,7 +340,7 @@ static int lockd_get(void)
return -ENOMEM;
}
- error = svc_set_num_threads(serv, 1);
+ error = svc_set_num_threads(serv, 1, 0);
if (error < 0) {
svc_destroy(&serv);
return error;
@@ -368,7 +368,7 @@ static void lockd_put(void)
unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
#endif
- svc_set_num_threads(nlmsvc_serv, 0);
+ svc_set_num_threads(nlmsvc_serv, 0, 0);
timer_delete_sync(&nlmsvc_retry);
svc_destroy(&nlmsvc_serv);
dprintk("lockd_down: service destroyed\n");
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 44b35b7f8dc022f1d8c069eaf2f7d334c93f77fc..32bbc0e688ff3988e4ba50eeb36b4808cec07c87 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -119,9 +119,9 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt,
if (serv->sv_nrthreads == nrservs)
return 0;
- ret = svc_set_num_threads(serv, nrservs);
+ ret = svc_set_num_threads(serv, nrservs, 0);
if (ret) {
- svc_set_num_threads(serv, 0);
+ svc_set_num_threads(serv, 0, 0);
return ret;
}
dprintk("nfs_callback_up: service started\n");
@@ -242,7 +242,7 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
cb_info->users++;
err_net:
if (!cb_info->users) {
- svc_set_num_threads(cb_info->serv, 0);
+ svc_set_num_threads(cb_info->serv, 0, 0);
svc_destroy(&cb_info->serv);
}
err_create:
@@ -268,7 +268,7 @@ void nfs_callback_down(int minorversion, struct net *net)
nfs_callback_down_net(minorversion, serv, net);
cb_info->users--;
if (cb_info->users == 0) {
- svc_set_num_threads(serv, 0);
+ svc_set_num_threads(serv, 0, 0);
dprintk("nfs_callback_down: service destroyed\n");
svc_destroy(&cb_info->serv);
}
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index aafec8ff588b85b0e26d40b76ef00953dc6472b4..993ed338764b0ccd7bdfb76bd6fbb5dc6ab4022d 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
}
/* Kill outstanding nfsd threads */
- svc_set_num_threads(serv, 0);
+ svc_set_num_threads(serv, 0, 0);
nfsd_destroy_serv(net);
mutex_unlock(&nfsd_mutex);
}
@@ -704,7 +704,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
/* Special case: When n == 1, distribute threads equally among pools. */
if (n == 1)
- return svc_set_num_threads(nn->nfsd_serv, nthreads[0]);
+ return svc_set_num_threads(nn->nfsd_serv, nthreads[0], 0);
if (n > nn->nfsd_serv->sv_nrpools)
n = nn->nfsd_serv->sv_nrpools;
@@ -732,7 +732,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
for (i = 0; i < n; i++) {
err = svc_set_pool_threads(nn->nfsd_serv,
&nn->nfsd_serv->sv_pools[i],
- nthreads[i]);
+ nthreads[i], 0);
if (err)
goto out;
}
@@ -741,7 +741,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
err = svc_set_pool_threads(nn->nfsd_serv,
&nn->nfsd_serv->sv_pools[i],
- 0);
+ 0, 0);
if (err)
goto out;
}
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index ee9260ca908c907f4373f4cfa471b272bc7bcc8c..35bd3247764ae8dc5dcdfffeea36f7cfefd13372 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -36,6 +36,7 @@
struct svc_pool {
unsigned int sp_id; /* pool id; also node id on NUMA */
unsigned int sp_nrthreads; /* # of threads currently running in pool */
+ unsigned int sp_nrthrmin; /* Min number of threads to run per pool */
unsigned int sp_nrthrmax; /* Max requested number of threads in pool */
struct lwq sp_xprts; /* pending transports */
struct list_head sp_all_threads; /* all server threads */
@@ -72,7 +73,7 @@ struct svc_serv {
struct svc_stat * sv_stats; /* RPC statistics */
spinlock_t sv_lock;
unsigned int sv_nprogs; /* Number of sv_programs */
- unsigned int sv_nrthreads; /* # of server threads */
+ unsigned int sv_nrthreads; /* # of running server threads */
unsigned int sv_max_payload; /* datagram payload size */
unsigned int sv_max_mesg; /* max_payload + 1 page for overheads */
unsigned int sv_xdrsize; /* XDR buffer size */
@@ -447,8 +448,8 @@ struct svc_serv * svc_create_pooled(struct svc_program *prog,
struct svc_stat *stats,
unsigned int bufsize,
int (*threadfn)(void *data));
-int svc_set_pool_threads(struct svc_serv *, struct svc_pool *, int);
-int svc_set_num_threads(struct svc_serv *, int);
+int svc_set_pool_threads(struct svc_serv *, struct svc_pool *, int, unsigned int);
+int svc_set_num_threads(struct svc_serv *, int, unsigned int);
int svc_pool_stats_open(struct svc_info *si, struct file *file);
void svc_process(struct svc_rqst *rqstp);
void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp);
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 8cd45f62ef74af6e0826b8f13cc903b0962af5e0..dc818158f8529b62dcf96c91bd9a9d4ab21df91f 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -821,6 +821,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
* @serv: RPC service to adjust
* @pool: Specific pool from which to choose threads
* @nrservs: New number of threads for @serv (0 or less means kill all threads)
+ * @min_threads: minimum number of threads per pool (0 means set to same as nrservs)
*
* Create or destroy threads in @pool to bring it to @nrservs.
*
@@ -831,12 +832,22 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
* starting a thread.
*/
int
-svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
+svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs,
+ unsigned int min_threads)
{
if (!pool)
return -EINVAL;
pool->sp_nrthrmax = nrservs;
+ if (min_threads) {
+ if (pool->sp_nrthreads > nrservs) {
+ // fallthrough to update nrservs
+ } else if (pool->sp_nrthreads < min_threads) {
+ nrservs = min_threads;
+ } else {
+ return 0;
+ }
+ }
nrservs -= pool->sp_nrthreads;
if (nrservs > 0)
@@ -851,6 +862,7 @@ EXPORT_SYMBOL_GPL(svc_set_pool_threads);
* svc_set_num_threads - adjust number of threads in serv
* @serv: RPC service to adjust
* @nrservs: New number of threads for @serv (0 or less means kill all threads)
+ * @min_threads: minimum number of threads per pool (0 means set to same as nrservs)
*
* Create or destroy threads in @serv to bring it to @nrservs. If there
* are multiple pools then the new threads or victims will be distributed
@@ -863,20 +875,23 @@ EXPORT_SYMBOL_GPL(svc_set_pool_threads);
* starting a thread.
*/
int
-svc_set_num_threads(struct svc_serv *serv, int nrservs)
+svc_set_num_threads(struct svc_serv *serv, int nrservs, unsigned int min_threads)
{
int base = nrservs / serv->sv_nrpools;
int remain = nrservs % serv->sv_nrpools;
int i, err;
for (i = 0; i < serv->sv_nrpools; ++i) {
+ struct svc_pool *pool = &serv->sv_pools[i];
int threads = base;
if (remain) {
++threads;
--remain;
}
- err = svc_set_pool_threads(serv, &serv->sv_pools[i], threads);
+
+ pool->sp_nrthrmin = min_threads;
+ err = svc_set_pool_threads(serv, pool, threads, min_threads);
if (err)
break;
}
--
2.52.0
* [PATCH RFC 5/6] nfsd: adjust number of running nfsd threads based on activity
From: Jeff Layton @ 2025-12-12 22:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel, Jeff Layton
This patch is based on a draft patch by Neil:
svc_recv() is changed to return a status. This can be:
-ETIMEDOUT - waited for 5 seconds and found nothing to do. This is
boring. Also there are more actual threads than really
needed.
-EBUSY - I did something, but there is more stuff to do and no one
idle who I can wake up to do it.
BTW I successful set a flag: SP_TASK_STARTING. You better
clear it.
0 - just minding my own business, nothing to see here.
nfsd() is changed to pay attention to this status. In the case of
-ETIMEDOUT, if the service mutex can be taken (trylock), the thread
becomes and RQ_VICTIM so that it will exit. In the case of -EBUSY, if
the actual number of threads is below the calculated maximum, a new
thread is started. SP_TASK_STARTING is cleared.
To support the above, some code is split out of svc_start_kthreads()
into svc_new_thread().
I think we want memory pressure to be able to push a thread into
returning -ETIMEDOUT. That can come later.
Signed-off-by: NeilBrown <neil@brown.name>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
fs/nfsd/nfssvc.c | 35 ++++++++++++++++++++-
fs/nfsd/trace.h | 35 +++++++++++++++++++++
include/linux/sunrpc/svc.h | 2 ++
include/linux/sunrpc/svcsock.h | 2 +-
net/sunrpc/svc.c | 69 ++++++++++++++++++++++++------------------
net/sunrpc/svc_xprt.c | 45 ++++++++++++++++++++++-----
6 files changed, 148 insertions(+), 40 deletions(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 993ed338764b0ccd7bdfb76bd6fbb5dc6ab4022d..26c3a6cb1f400f1b757d26f6ba77e27deb7e8ee2 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -896,9 +896,11 @@ static int
nfsd(void *vrqstp)
{
struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
+ struct svc_pool *pool = rqstp->rq_pool;
struct svc_xprt *perm_sock = list_entry(rqstp->rq_server->sv_permsocks.next, typeof(struct svc_xprt), xpt_list);
struct net *net = perm_sock->xpt_net;
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ bool have_mutex = false;
/* At this point, the thread shares current->fs
* with the init process. We need to create files with the
@@ -916,7 +918,36 @@ nfsd(void *vrqstp)
* The main request loop
*/
while (!svc_thread_should_stop(rqstp)) {
- svc_recv(rqstp);
+ switch (svc_recv(rqstp)) {
+ case -ETIMEDOUT: /* Nothing to do */
+ if (mutex_trylock(&nfsd_mutex)) {
+ if (pool->sp_nrthreads > pool->sp_nrthrmin) {
+ trace_nfsd_dynthread_kill(net, pool);
+ set_bit(RQ_VICTIM, &rqstp->rq_flags);
+ have_mutex = true;
+ } else
+ mutex_unlock(&nfsd_mutex);
+ } else {
+ trace_nfsd_dynthread_trylock_fail(net, pool);
+ }
+ break;
+ case -EBUSY: /* Too much to do */
+ if (pool->sp_nrthreads < pool->sp_nrthrmax &&
+ mutex_trylock(&nfsd_mutex)) {
+ // check no idle threads?
+ if (pool->sp_nrthreads < pool->sp_nrthrmax) {
+ trace_nfsd_dynthread_start(net, pool);
+ svc_new_thread(rqstp->rq_server, pool);
+ }
+ mutex_unlock(&nfsd_mutex);
+ } else {
+ trace_nfsd_dynthread_trylock_fail(net, pool);
+ }
+ clear_bit(SP_TASK_STARTING, &pool->sp_flags);
+ break;
+ default:
+ break;
+ }
nfsd_file_net_dispose(nn);
}
@@ -924,6 +955,8 @@ nfsd(void *vrqstp)
/* Release the thread */
svc_exit_thread(rqstp);
+ if (have_mutex)
+ mutex_unlock(&nfsd_mutex);
return 0;
}
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 5ae2a611e57f4b4e51a4d9eb6e0fccb66ad8d288..8885fd9bead98ebf55379d68ab9c3701981a5150 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -91,6 +91,41 @@ DEFINE_EVENT(nfsd_xdr_err_class, nfsd_##name##_err, \
DEFINE_NFSD_XDR_ERR_EVENT(garbage_args);
DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
+DECLARE_EVENT_CLASS(nfsd_dynthread_class,
+ TP_PROTO(
+ const struct net *net,
+ const struct svc_pool *pool
+ ),
+ TP_ARGS(net, pool),
+ TP_STRUCT__entry(
+ __field(unsigned int, netns_ino)
+ __field(unsigned int, pool_id)
+ __field(unsigned int, nrthreads)
+ __field(unsigned int, nrthrmin)
+ __field(unsigned int, nrthrmax)
+ ),
+ TP_fast_assign(
+ __entry->netns_ino = net->ns.inum;
+ __entry->pool_id = pool->sp_id;
+ __entry->nrthreads = pool->sp_nrthreads;
+ __entry->nrthrmin = pool->sp_nrthrmin;
+ __entry->nrthrmax = pool->sp_nrthrmax;
+ ),
+ TP_printk("pool=%u nrthreads=%u nrthrmin=%u nrthrmax=%u",
+ __entry->pool_id, __entry->nrthreads,
+ __entry->nrthrmin, __entry->nrthrmax
+ )
+);
+
+#define DEFINE_NFSD_DYNTHREAD_EVENT(name) \
+DEFINE_EVENT(nfsd_dynthread_class, nfsd_dynthread_##name, \
+ TP_PROTO(const struct net *net, const struct svc_pool *pool), \
+ TP_ARGS(net, pool))
+
+DEFINE_NFSD_DYNTHREAD_EVENT(start);
+DEFINE_NFSD_DYNTHREAD_EVENT(kill);
+DEFINE_NFSD_DYNTHREAD_EVENT(trylock_fail);
+
#define show_nfsd_may_flags(x) \
__print_flags(x, "|", \
{ NFSD_MAY_EXEC, "EXEC" }, \
diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
index 35bd3247764ae8dc5dcdfffeea36f7cfefd13372..f47e19c9bd9466986438766e9ab7b4c71cda1ba6 100644
--- a/include/linux/sunrpc/svc.h
+++ b/include/linux/sunrpc/svc.h
@@ -55,6 +55,7 @@ enum {
SP_TASK_PENDING, /* still work to do even if no xprt is queued */
SP_NEED_VICTIM, /* One thread needs to agree to exit */
SP_VICTIM_REMAINS, /* One thread needs to actually exit */
+ SP_TASK_STARTING, /* Task has started but not added to idle yet */
};
@@ -442,6 +443,7 @@ struct svc_serv *svc_create(struct svc_program *, unsigned int,
bool svc_rqst_replace_page(struct svc_rqst *rqstp,
struct page *page);
void svc_rqst_release_pages(struct svc_rqst *rqstp);
+int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool);
void svc_exit_thread(struct svc_rqst *);
struct svc_serv * svc_create_pooled(struct svc_program *prog,
unsigned int nprog,
diff --git a/include/linux/sunrpc/svcsock.h b/include/linux/sunrpc/svcsock.h
index de37069aba90899be19b1090e6e90e509a3cf530..5c87d3fedd33e7edf5ade32e60523cae7e9ebaba 100644
--- a/include/linux/sunrpc/svcsock.h
+++ b/include/linux/sunrpc/svcsock.h
@@ -61,7 +61,7 @@ static inline u32 svc_sock_final_rec(struct svc_sock *svsk)
/*
* Function prototypes.
*/
-void svc_recv(struct svc_rqst *rqstp);
+int svc_recv(struct svc_rqst *rqstp);
void svc_send(struct svc_rqst *rqstp);
int svc_addsock(struct svc_serv *serv, struct net *net,
const int fd, char *name_return, const size_t len,
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index dc818158f8529b62dcf96c91bd9a9d4ab21df91f..9fca2dd340037f82baa4936766ebe0e38c3f0d85 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -714,9 +714,6 @@ svc_prepare_thread(struct svc_serv *serv, struct svc_pool *pool, int node)
rqstp->rq_err = -EAGAIN; /* No error yet */
- serv->sv_nrthreads += 1;
- pool->sp_nrthreads += 1;
-
/* Protected by whatever lock the service uses when calling
* svc_set_num_threads()
*/
@@ -763,45 +760,57 @@ void svc_pool_wake_idle_thread(struct svc_pool *pool)
}
EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread);
-static int
-svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
+int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool)
{
struct svc_rqst *rqstp;
struct task_struct *task;
int node;
int err;
- do {
- nrservs--;
- node = svc_pool_map_get_node(pool->sp_id);
-
- rqstp = svc_prepare_thread(serv, pool, node);
- if (!rqstp)
- return -ENOMEM;
- task = kthread_create_on_node(serv->sv_threadfn, rqstp,
- node, "%s", serv->sv_name);
- if (IS_ERR(task)) {
- svc_exit_thread(rqstp);
- return PTR_ERR(task);
- }
+ node = svc_pool_map_get_node(pool->sp_id);
- rqstp->rq_task = task;
- if (serv->sv_nrpools > 1)
- svc_pool_map_set_cpumask(task, pool->sp_id);
+ rqstp = svc_prepare_thread(serv, pool, node);
+ if (!rqstp)
+ return -ENOMEM;
+ set_bit(SP_TASK_STARTING, &pool->sp_flags);
+ task = kthread_create_on_node(serv->sv_threadfn, rqstp,
+ node, "%s", serv->sv_name);
+ if (IS_ERR(task)) {
+ clear_bit(SP_TASK_STARTING, &pool->sp_flags);
+ svc_exit_thread(rqstp);
+ return PTR_ERR(task);
+ }
- svc_sock_update_bufs(serv);
- wake_up_process(task);
+ serv->sv_nrthreads += 1;
+ pool->sp_nrthreads += 1;
- wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN);
- err = rqstp->rq_err;
- if (err) {
- svc_exit_thread(rqstp);
- return err;
- }
- } while (nrservs > 0);
+ rqstp->rq_task = task;
+ if (serv->sv_nrpools > 1)
+ svc_pool_map_set_cpumask(task, pool->sp_id);
+ svc_sock_update_bufs(serv);
+ wake_up_process(task);
+
+ wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN);
+ err = rqstp->rq_err;
+ if (err) {
+ svc_exit_thread(rqstp);
+ return err;
+ }
return 0;
}
+EXPORT_SYMBOL_GPL(svc_new_thread);
+
+static int
+svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
+{
+ int err = 0;
+
+ while (!err && nrservs--)
+ err = svc_new_thread(serv, pool);
+
+ return err;
+}
static int
svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 6973184ff6675211b4338fac80105894e9c8d4df..9612334300c8dae38720a0f5c61c0f505432ec2f 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -714,15 +714,22 @@ svc_thread_should_sleep(struct svc_rqst *rqstp)
return true;
}
-static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
+static bool nfsd_schedule_timeout(long timeout)
+{
+ return schedule_timeout(timeout) == 0;
+}
+
+static bool svc_thread_wait_for_work(struct svc_rqst *rqstp)
{
struct svc_pool *pool = rqstp->rq_pool;
+ bool did_timeout = false;
if (svc_thread_should_sleep(rqstp)) {
set_current_state(TASK_IDLE | TASK_FREEZABLE);
llist_add(&rqstp->rq_idle, &pool->sp_idle_threads);
+ clear_bit(SP_TASK_STARTING, &pool->sp_flags);
if (likely(svc_thread_should_sleep(rqstp)))
- schedule();
+ did_timeout = nfsd_schedule_timeout(5 * HZ);
while (!llist_del_first_this(&pool->sp_idle_threads,
&rqstp->rq_idle)) {
@@ -734,7 +741,10 @@ static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
* for this new work. This thread can safely sleep
* until woken again.
*/
- schedule();
+ if (did_timeout)
+ did_timeout = nfsd_schedule_timeout(HZ);
+ else
+ did_timeout = nfsd_schedule_timeout(5 * HZ);
set_current_state(TASK_IDLE | TASK_FREEZABLE);
}
__set_current_state(TASK_RUNNING);
@@ -742,6 +752,7 @@ static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
cond_resched();
}
try_to_freeze();
+ return did_timeout;
}
static void svc_add_new_temp_xprt(struct svc_serv *serv, struct svc_xprt *newxpt)
@@ -825,6 +836,8 @@ static void svc_handle_xprt(struct svc_rqst *rqstp, struct svc_xprt *xprt)
static void svc_thread_wake_next(struct svc_rqst *rqstp)
{
+ clear_bit(SP_TASK_STARTING, &rqstp->rq_pool->sp_flags);
+
if (!svc_thread_should_sleep(rqstp))
/* More work pending after I dequeued some,
* wake another worker
@@ -839,21 +852,31 @@ static void svc_thread_wake_next(struct svc_rqst *rqstp)
* This code is carefully organised not to touch any cachelines in
* the shared svc_serv structure, only cachelines in the local
* svc_pool.
+ *
+ * Returns -ETIMEDOUT if idle for an extended period
+ * -EBUSY if there is more work to do than available threads
+ * 0 otherwise.
*/
-void svc_recv(struct svc_rqst *rqstp)
+int svc_recv(struct svc_rqst *rqstp)
{
struct svc_pool *pool = rqstp->rq_pool;
+ bool did_wait;
+ int ret = 0;
if (!svc_alloc_arg(rqstp))
- return;
+ return ret;
+
+ did_wait = svc_thread_wait_for_work(rqstp);
- svc_thread_wait_for_work(rqstp);
+ if (did_wait && svc_thread_should_sleep(rqstp) &&
+ pool->sp_nrthrmin && (pool->sp_nrthreads > pool->sp_nrthrmin))
+ ret = -ETIMEDOUT;
clear_bit(SP_TASK_PENDING, &pool->sp_flags);
if (svc_thread_should_stop(rqstp)) {
svc_thread_wake_next(rqstp);
- return;
+ return ret;
}
rqstp->rq_xprt = svc_xprt_dequeue(pool);
@@ -867,8 +890,13 @@ void svc_recv(struct svc_rqst *rqstp)
*/
if (pool->sp_idle_threads.first)
rqstp->rq_chandle.thread_wait = 5 * HZ;
- else
+ else {
rqstp->rq_chandle.thread_wait = 1 * HZ;
+ if (!did_wait &&
+ !test_and_set_bit(SP_TASK_STARTING,
+ &pool->sp_flags))
+ ret = -EBUSY;
+ }
trace_svc_xprt_dequeue(rqstp);
svc_handle_xprt(rqstp, xprt);
@@ -887,6 +915,7 @@ void svc_recv(struct svc_rqst *rqstp)
}
}
#endif
+ return ret;
}
EXPORT_SYMBOL_GPL(svc_recv);
--
2.52.0
* [PATCH RFC 6/6] nfsd: add controls to set the minimum number of threads per pool
2025-12-12 22:39 [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool Jeff Layton
` (4 preceding siblings ...)
2025-12-12 22:39 ` [PATCH RFC 5/6] nfsd: adjust number of running nfsd threads based on activity Jeff Layton
@ 2025-12-12 22:39 ` Jeff Layton
2025-12-13 21:10 ` Chuck Lever
2025-12-13 19:34 ` [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool Chuck Lever
6 siblings, 1 reply; 17+ messages in thread
From: Jeff Layton @ 2025-12-12 22:39 UTC (permalink / raw)
To: Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey,
Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel, Jeff Layton
Add a new "min_threads" variable to the nfsd_net, along with the
corresponding nfsdfs and netlink interfaces to set that value from
userland. Pass that value to svc_set_pool_threads() and
svc_set_num_threads().
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
Documentation/netlink/specs/nfsd.yaml | 5 ++++
fs/nfsd/netlink.c | 5 ++--
fs/nfsd/netns.h | 6 +++++
fs/nfsd/nfsctl.c | 50 +++++++++++++++++++++++++++++++++++
fs/nfsd/nfssvc.c | 8 +++---
fs/nfsd/trace.h | 19 +++++++++++++
include/uapi/linux/nfsd_netlink.h | 1 +
7 files changed, 88 insertions(+), 6 deletions(-)
diff --git a/Documentation/netlink/specs/nfsd.yaml b/Documentation/netlink/specs/nfsd.yaml
index 100363029e82aed87295e34a008ab771a95d508c..badb2fe57c9859c6932c621a589da694782b0272 100644
--- a/Documentation/netlink/specs/nfsd.yaml
+++ b/Documentation/netlink/specs/nfsd.yaml
@@ -78,6 +78,9 @@ attribute-sets:
-
name: scope
type: string
+ -
+ name: min-threads
+ type: u32
-
name: version
attributes:
@@ -159,6 +162,7 @@ operations:
- gracetime
- leasetime
- scope
+ - min-threads
-
name: threads-get
doc: get the number of running threads
@@ -170,6 +174,7 @@ operations:
- gracetime
- leasetime
- scope
+ - min-threads
-
name: version-set
doc: set nfs enabled versions
diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
index ac51a44e1065ec3f1d88165f70a831a828b58394..887525964451e640304371e33aa4f415b4ff2848 100644
--- a/fs/nfsd/netlink.c
+++ b/fs/nfsd/netlink.c
@@ -24,11 +24,12 @@ const struct nla_policy nfsd_version_nl_policy[NFSD_A_VERSION_ENABLED + 1] = {
};
/* NFSD_CMD_THREADS_SET - do */
-static const struct nla_policy nfsd_threads_set_nl_policy[NFSD_A_SERVER_SCOPE + 1] = {
+static const struct nla_policy nfsd_threads_set_nl_policy[NFSD_A_SERVER_MIN_THREADS + 1] = {
[NFSD_A_SERVER_THREADS] = { .type = NLA_U32, },
[NFSD_A_SERVER_GRACETIME] = { .type = NLA_U32, },
[NFSD_A_SERVER_LEASETIME] = { .type = NLA_U32, },
[NFSD_A_SERVER_SCOPE] = { .type = NLA_NUL_STRING, },
+ [NFSD_A_SERVER_MIN_THREADS] = { .type = NLA_U32, },
};
/* NFSD_CMD_VERSION_SET - do */
@@ -57,7 +58,7 @@ static const struct genl_split_ops nfsd_nl_ops[] = {
.cmd = NFSD_CMD_THREADS_SET,
.doit = nfsd_nl_threads_set_doit,
.policy = nfsd_threads_set_nl_policy,
- .maxattr = NFSD_A_SERVER_SCOPE,
+ .maxattr = NFSD_A_SERVER_MIN_THREADS,
.flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
},
{
diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
index 3e2d0fde80a7ce434ef2cce9f1666c2bd16ab2eb..1c3449810eaefea8167ddd284af7bd66cac7e211 100644
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ -128,6 +128,12 @@ struct nfsd_net {
seqlock_t writeverf_lock;
unsigned char writeverf[8];
+ /*
+ * Minimum number of threads to run per pool. If 0 then the
+ * min == max requested number of threads.
+ */
+ unsigned int min_threads;
+
u32 clientid_base;
u32 clientid_counter;
u32 clverifier_counter;
diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
index 206534fccf36a992026669fee6533adff1062c36..a5401015e62499d07150cde8822f1e7dd0515dfe 100644
--- a/fs/nfsd/nfsctl.c
+++ b/fs/nfsd/nfsctl.c
@@ -48,6 +48,7 @@ enum {
NFSD_Versions,
NFSD_Ports,
NFSD_MaxBlkSize,
+ NFSD_MinThreads,
NFSD_Filecache,
NFSD_Leasetime,
NFSD_Gracetime,
@@ -67,6 +68,7 @@ static ssize_t write_pool_threads(struct file *file, char *buf, size_t size);
static ssize_t write_versions(struct file *file, char *buf, size_t size);
static ssize_t write_ports(struct file *file, char *buf, size_t size);
static ssize_t write_maxblksize(struct file *file, char *buf, size_t size);
+static ssize_t write_minthreads(struct file *file, char *buf, size_t size);
#ifdef CONFIG_NFSD_V4
static ssize_t write_leasetime(struct file *file, char *buf, size_t size);
static ssize_t write_gracetime(struct file *file, char *buf, size_t size);
@@ -85,6 +87,7 @@ static ssize_t (*const write_op[])(struct file *, char *, size_t) = {
[NFSD_Versions] = write_versions,
[NFSD_Ports] = write_ports,
[NFSD_MaxBlkSize] = write_maxblksize,
+ [NFSD_MinThreads] = write_minthreads,
#ifdef CONFIG_NFSD_V4
[NFSD_Leasetime] = write_leasetime,
[NFSD_Gracetime] = write_gracetime,
@@ -899,6 +902,46 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
nfsd_max_blksize);
}
+/*
+ * write_minthreads - Set or report the current min number of threads
+ *
+ * Input:
+ * buf: ignored
+ * size: zero
+ * OR
+ *
+ * Input:
+ * buf: C string containing an unsigned
+ * integer value representing the new
+ * min number of threads
+ * size: non-zero length of C string in @buf
+ * Output:
+ * On success: passed-in buffer filled with '\n'-terminated C string
+ * containing numeric value of min_threads setting
+ * for this net namespace;
+ * return code is the size in bytes of the string
+ * On error: return code is zero or a negative errno value
+ */
+static ssize_t write_minthreads(struct file *file, char *buf, size_t size)
+{
+ char *mesg = buf;
+ struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
+ unsigned int minthreads = nn->min_threads;
+
+ if (size > 0) {
+ int rv = get_uint(&mesg, &minthreads);
+
+ if (rv)
+ return rv;
+ trace_nfsd_ctl_minthreads(netns(file), minthreads);
+ mutex_lock(&nfsd_mutex);
+ nn->min_threads = minthreads;
+ mutex_unlock(&nfsd_mutex);
+ }
+
+ return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", minthreads);
+}
+
#ifdef CONFIG_NFSD_V4
static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
time64_t *time, struct nfsd_net *nn)
@@ -1292,6 +1335,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
[NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
[NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
[NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
+ [NFSD_MinThreads] = {"min_threads", &transaction_ops, S_IWUSR|S_IRUGO},
[NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
#ifdef CONFIG_NFSD_V4
[NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
@@ -1636,6 +1680,10 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
scope = nla_data(attr);
}
+ attr = info->attrs[NFSD_A_SERVER_MIN_THREADS];
+ if (attr)
+ nn->min_threads = nla_get_u32(attr);
+
ret = nfsd_svc(nrpools, nthreads, net, get_current_cred(), scope);
if (ret > 0)
ret = 0;
@@ -1675,6 +1723,8 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
nn->nfsd4_grace) ||
nla_put_u32(skb, NFSD_A_SERVER_LEASETIME,
nn->nfsd4_lease) ||
+ nla_put_u32(skb, NFSD_A_SERVER_MIN_THREADS,
+ nn->min_threads) ||
nla_put_string(skb, NFSD_A_SERVER_SCOPE,
nn->nfsd_name);
if (err)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 26c3a6cb1f400f1b757d26f6ba77e27deb7e8ee2..d6120dd843ac1b6a42f0ef331700f4d6d70d8c38 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
}
/* Kill outstanding nfsd threads */
- svc_set_num_threads(serv, 0, 0);
+ svc_set_num_threads(serv, 0, nn->min_threads);
nfsd_destroy_serv(net);
mutex_unlock(&nfsd_mutex);
}
@@ -704,7 +704,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
/* Special case: When n == 1, distribute threads equally among pools. */
if (n == 1)
- return svc_set_num_threads(nn->nfsd_serv, nthreads[0], 0);
+ return svc_set_num_threads(nn->nfsd_serv, nthreads[0], nn->min_threads);
if (n > nn->nfsd_serv->sv_nrpools)
n = nn->nfsd_serv->sv_nrpools;
@@ -732,7 +732,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
for (i = 0; i < n; i++) {
err = svc_set_pool_threads(nn->nfsd_serv,
&nn->nfsd_serv->sv_pools[i],
- nthreads[i], 0);
+ nthreads[i], nn->min_threads);
if (err)
goto out;
}
@@ -741,7 +741,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
err = svc_set_pool_threads(nn->nfsd_serv,
&nn->nfsd_serv->sv_pools[i],
- 0, 0);
+ 0, nn->min_threads);
if (err)
goto out;
}
diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
index 8885fd9bead98ebf55379d68ab9c3701981a5150..d1d0b0dd054588a8c20e3386356dfa4e9632b8e0 100644
--- a/fs/nfsd/trace.h
+++ b/fs/nfsd/trace.h
@@ -2164,6 +2164,25 @@ TRACE_EVENT(nfsd_ctl_maxblksize,
)
);
+TRACE_EVENT(nfsd_ctl_minthreads,
+ TP_PROTO(
+ const struct net *net,
+ int minthreads
+ ),
+ TP_ARGS(net, minthreads),
+ TP_STRUCT__entry(
+ __field(unsigned int, netns_ino)
+ __field(int, minthreads)
+ ),
+ TP_fast_assign(
+ __entry->netns_ino = net->ns.inum;
+ __entry->minthreads = minthreads;
+ ),
+ TP_printk("minthreads=%d",
+ __entry->minthreads
+ )
+);
+
TRACE_EVENT(nfsd_ctl_time,
TP_PROTO(
const struct net *net,
diff --git a/include/uapi/linux/nfsd_netlink.h b/include/uapi/linux/nfsd_netlink.h
index e157e2009ea8c1ef805301261d536c82677821ef..e9efbc9e63d83ed25fcd790b7a877c0023638f15 100644
--- a/include/uapi/linux/nfsd_netlink.h
+++ b/include/uapi/linux/nfsd_netlink.h
@@ -35,6 +35,7 @@ enum {
NFSD_A_SERVER_GRACETIME,
NFSD_A_SERVER_LEASETIME,
NFSD_A_SERVER_SCOPE,
+ NFSD_A_SERVER_MIN_THREADS,
__NFSD_A_SERVER_MAX,
NFSD_A_SERVER_MAX = (__NFSD_A_SERVER_MAX - 1)
--
2.52.0
* Re: [PATCH RFC 1/6] sunrpc: split svc_set_num_threads() into two functions
2025-12-12 22:39 ` [PATCH RFC 1/6] sunrpc: split svc_set_num_threads() into two functions Jeff Layton
@ 2025-12-13 19:29 ` Chuck Lever
2025-12-13 21:55 ` Jeff Layton
0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2025-12-13 19:29 UTC (permalink / raw)
To: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> svc_set_num_threads() will set the number of running threads for a given
> pool. If the pool argument is set to NULL however, it will distribute
> the threads among all of the pools evenly.
>
> These divergent codepaths complicate the move to dynamic threading.
> Simplify the API by splitting these two cases into different helpers:
>
> Add a new svc_set_pool_threads() function that sets the number of
> threads in a single, given pool. Modify svc_set_num_threads() to
> distribute the threads evenly between all of the pools and then call
> svc_set_pool_threads() for each.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/lockd/svc.c | 4 ++--
> fs/nfs/callback.c | 8 +++----
> fs/nfsd/nfssvc.c | 21 ++++++++----------
> include/linux/sunrpc/svc.h | 3 ++-
> net/sunrpc/svc.c | 54 +++++++++++++++++++++++++++++++++++++---------
> 5 files changed, 61 insertions(+), 29 deletions(-)
>
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index
> d68afa196535a8785bab2931c2b14f03a1174ef9..fbf132b4e08d11a91784c21ee0209fd7c149fd9d
> 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -340,7 +340,7 @@ static int lockd_get(void)
> return -ENOMEM;
> }
>
> - error = svc_set_num_threads(serv, NULL, 1);
> + error = svc_set_num_threads(serv, 1);
> if (error < 0) {
> svc_destroy(&serv);
> return error;
> @@ -368,7 +368,7 @@ static void lockd_put(void)
> unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
> #endif
>
> - svc_set_num_threads(nlmsvc_serv, NULL, 0);
> + svc_set_num_threads(nlmsvc_serv, 0);
> timer_delete_sync(&nlmsvc_retry);
> svc_destroy(&nlmsvc_serv);
> dprintk("lockd_down: service destroyed\n");
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index
> c8b837006bb27277ab34fe516f1b63992fee9b7f..44b35b7f8dc022f1d8c069eaf2f7d334c93f77fc
> 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -119,9 +119,9 @@ static int nfs_callback_start_svc(int minorversion,
> struct rpc_xprt *xprt,
> if (serv->sv_nrthreads == nrservs)
> return 0;
>
> - ret = svc_set_num_threads(serv, NULL, nrservs);
> + ret = svc_set_num_threads(serv, nrservs);
> if (ret) {
> - svc_set_num_threads(serv, NULL, 0);
> + svc_set_num_threads(serv, 0);
> return ret;
> }
> dprintk("nfs_callback_up: service started\n");
> @@ -242,7 +242,7 @@ int nfs_callback_up(u32 minorversion, struct
> rpc_xprt *xprt)
> cb_info->users++;
> err_net:
> if (!cb_info->users) {
> - svc_set_num_threads(cb_info->serv, NULL, 0);
> + svc_set_num_threads(cb_info->serv, 0);
> svc_destroy(&cb_info->serv);
> }
> err_create:
> @@ -268,7 +268,7 @@ void nfs_callback_down(int minorversion, struct net
> *net)
> nfs_callback_down_net(minorversion, serv, net);
> cb_info->users--;
> if (cb_info->users == 0) {
> - svc_set_num_threads(serv, NULL, 0);
> + svc_set_num_threads(serv, 0);
> dprintk("nfs_callback_down: service destroyed\n");
> svc_destroy(&cb_info->serv);
> }
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index
> 93f7435cafd2362d9ddb28815277c824067cb370..aafec8ff588b85b0e26d40b76ef00953dc6472b4
> 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
> }
>
> /* Kill outstanding nfsd threads */
> - svc_set_num_threads(serv, NULL, 0);
> + svc_set_num_threads(serv, 0);
> nfsd_destroy_serv(net);
> mutex_unlock(&nfsd_mutex);
> }
> @@ -702,12 +702,9 @@ int nfsd_set_nrthreads(int n, int *nthreads,
> struct net *net)
> if (nn->nfsd_serv == NULL || n <= 0)
> return 0;
>
> - /*
> - * Special case: When n == 1, pass in NULL for the pool, so that the
> - * change is distributed equally among them.
> - */
> + /* Special case: When n == 1, distribute threads equally among pools. */
> if (n == 1)
> - return svc_set_num_threads(nn->nfsd_serv, NULL, nthreads[0]);
> + return svc_set_num_threads(nn->nfsd_serv, nthreads[0]);
>
> if (n > nn->nfsd_serv->sv_nrpools)
> n = nn->nfsd_serv->sv_nrpools;
> @@ -733,18 +730,18 @@ int nfsd_set_nrthreads(int n, int *nthreads,
> struct net *net)
>
> /* apply the new numbers */
> for (i = 0; i < n; i++) {
> - err = svc_set_num_threads(nn->nfsd_serv,
> - &nn->nfsd_serv->sv_pools[i],
> - nthreads[i]);
> + err = svc_set_pool_threads(nn->nfsd_serv,
> + &nn->nfsd_serv->sv_pools[i],
> + nthreads[i]);
> if (err)
> goto out;
> }
>
> /* Anything undefined in array is considered to be 0 */
> for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
> - err = svc_set_num_threads(nn->nfsd_serv,
> - &nn->nfsd_serv->sv_pools[i],
> - 0);
> + err = svc_set_pool_threads(nn->nfsd_serv,
> + &nn->nfsd_serv->sv_pools[i],
> + 0);
> if (err)
> goto out;
> }
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index
> 5506d20857c318774cd223272d4b0022cc19ffb8..dd5fbbf8b3d39df6c17a7624edf344557fffd32c
> 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -446,7 +446,8 @@ struct svc_serv * svc_create_pooled(struct
> svc_program *prog,
> struct svc_stat *stats,
> unsigned int bufsize,
> int (*threadfn)(void *data));
> -int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
> +int svc_set_pool_threads(struct svc_serv *, struct svc_pool *,
> int);
> +int svc_set_num_threads(struct svc_serv *, int);
> int svc_pool_stats_open(struct svc_info *si, struct file *file);
> void svc_process(struct svc_rqst *rqstp);
> void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index
> 4704dce7284eccc9e2bc64cf22947666facfa86a..3fe5a7f8e57e3fa3837265ec06884b357d5373ff
> 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -856,15 +856,12 @@ svc_stop_kthreads(struct svc_serv *serv, struct
> svc_pool *pool, int nrservs)
> }
>
> /**
> - * svc_set_num_threads - adjust number of threads per RPC service
> + * svc_set_pool_threads - adjust number of threads per pool
> * @serv: RPC service to adjust
> - * @pool: Specific pool from which to choose threads, or NULL
> + * @pool: Specific pool from which to choose threads
> * @nrservs: New number of threads for @serv (0 or less means kill all
> threads)
> *
> - * Create or destroy threads to make the number of threads for @serv
> the
> - * given number. If @pool is non-NULL, change only threads in that
> pool;
> - * otherwise, round-robin between all pools for @serv. @serv's
> - * sv_nrthreads is adjusted for each thread created or destroyed.
> + * Create or destroy threads in @pool to bring it to @nrservs.
> *
> * Caller must ensure mutual exclusion between this and server startup
> or
> * shutdown.
> @@ -873,12 +870,12 @@ svc_stop_kthreads(struct svc_serv *serv, struct
> svc_pool *pool, int nrservs)
> * starting a thread.
> */
> int
> -svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int
> nrservs)
> +svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int
> nrservs)
> {
> if (!pool)
> - nrservs -= serv->sv_nrthreads;
> - else
> - nrservs -= pool->sp_nrthreads;
> + return -EINVAL;
> +
> + nrservs -= pool->sp_nrthreads;
>
> if (nrservs > 0)
> return svc_start_kthreads(serv, pool, nrservs);
> @@ -886,6 +883,43 @@ svc_set_num_threads(struct svc_serv *serv, struct
> svc_pool *pool, int nrservs)
> return svc_stop_kthreads(serv, pool, nrservs);
> return 0;
> }
> +EXPORT_SYMBOL_GPL(svc_set_pool_threads);
> +
> +/**
> + * svc_set_num_threads - adjust number of threads in serv
> + * @serv: RPC service to adjust
> + * @nrservs: New number of threads for @serv (0 or less means kill all
> threads)
> + *
> + * Create or destroy threads in @serv to bring it to @nrservs. If there
> + * are multiple pools then the new threads or victims will be
> distributed
> + * evenly among them.
> + *
> + * Caller must ensure mutual exclusion between this and server startup
> or
> + * shutdown.
> + *
> + * Returns zero on success or a negative errno if an error occurred
> while
> + * starting a thread.
> + */
> +int
> +svc_set_num_threads(struct svc_serv *serv, int nrservs)
> +{
> + int base = nrservs / serv->sv_nrpools;
> + int remain = nrservs % serv->sv_nrpools;
> + int i, err;
> +
> + for (i = 0; i < serv->sv_nrpools; ++i) {
If sv_nrpools happens to be zero, then the loop doesn't
execute at all, and err is left containing stack garbage.
Is sv_nrpools guaranteed to be non-zero? If not then err
needs to be initialized before the loop runs. I see that
nfsd_set_nrthreads() in fs/nfsd/nfssvc.c has "int err = 0"
for a similar loop pattern.
> + int threads = base;
> +
> + if (remain) {
> + ++threads;
> + --remain;
> + }
> + err = svc_set_pool_threads(serv, &serv->sv_pools[i], threads);
> + if (err)
> + break;
> + }
> + return err;
> +}
> EXPORT_SYMBOL_GPL(svc_set_num_threads);
>
> /**
>
> --
> 2.52.0
--
Chuck Lever
* Re: [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool
2025-12-12 22:39 [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool Jeff Layton
` (5 preceding siblings ...)
2025-12-12 22:39 ` [PATCH RFC 6/6] nfsd: add controls to set the minimum number of threads per pool Jeff Layton
@ 2025-12-13 19:34 ` Chuck Lever
2025-12-13 21:35 ` Jeff Layton
6 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2025-12-13 19:34 UTC (permalink / raw)
To: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> This patchset changes nfsd to dynamically size its threadpool as
> needed. The main user-visible change is the addition of new controls
> that allow the admin to set a minimum number of threads.
>
> When the minimum is set to a non-zero value, the traditional "threads"
> setting is interpreted as a maximum number of threads instead of a
> static count. The server will start the minimum number of threads, and
> then ramp up the thread count as needed. When the server is idle, it
> will gradually ramp down the thread count.
>
> This control scheme should allow us to sanely switch between kernels
> that do and do not support dynamic threading. In the case where dynamic
> threading is not supported, the user will just get the static maximum
> number of threads.
An important consideration!
> The series is based on a set of draft patches by Neil. There are a
> number of changes from his work:
>
> 1/ his original series was based around a new setting that defined a
> maximum number of threads. This one instead adds a control to define a
> minimum number of threads.
My concern is whether one or more clients can force this mechanism
to continue creating threads until resource exhaustion causes a
denial of service.
I'm not convinced that setting a minimum number of threads is all
that interesting. Can you elaborate on why you chose that design?
--
Chuck Lever
* Re: [PATCH RFC 4/6] sunrpc: introduce the concept of a minimum number of threads per pool
2025-12-12 22:39 ` [PATCH RFC 4/6] sunrpc: introduce the concept of a minimum number of threads per pool Jeff Layton
@ 2025-12-13 20:19 ` Chuck Lever
2025-12-13 21:38 ` Jeff Layton
0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2025-12-13 20:19 UTC (permalink / raw)
To: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> Add a new pool->sp_nrthrmin field to track the minimum number of threads
> in a pool. Add min_threads parameters to both svc_set_num_threads() and
> svc_set_pool_threads(). If min_threads is non-zero, then have
> svc_set_num_threads() ensure that the number of running threads is
> between the min and the max.
>
> For now, the min_threads is always 0, but a later patch will pass the
> proper value through from nfsd.
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/lockd/svc.c | 4 ++--
> fs/nfs/callback.c | 8 ++++----
> fs/nfsd/nfssvc.c | 8 ++++----
> include/linux/sunrpc/svc.h | 7 ++++---
> net/sunrpc/svc.c | 21 ++++++++++++++++++---
> 5 files changed, 32 insertions(+), 16 deletions(-)
>
> diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> index
> fbf132b4e08d11a91784c21ee0209fd7c149fd9d..7899205314391415dfb698ab58fe97efc426d928
> 100644
> --- a/fs/lockd/svc.c
> +++ b/fs/lockd/svc.c
> @@ -340,7 +340,7 @@ static int lockd_get(void)
> return -ENOMEM;
> }
>
> - error = svc_set_num_threads(serv, 1);
> + error = svc_set_num_threads(serv, 1, 0);
> if (error < 0) {
> svc_destroy(&serv);
> return error;
> @@ -368,7 +368,7 @@ static void lockd_put(void)
> unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
> #endif
>
> - svc_set_num_threads(nlmsvc_serv, 0);
> + svc_set_num_threads(nlmsvc_serv, 0, 0);
> timer_delete_sync(&nlmsvc_retry);
> svc_destroy(&nlmsvc_serv);
> dprintk("lockd_down: service destroyed\n");
> diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> index
> 44b35b7f8dc022f1d8c069eaf2f7d334c93f77fc..32bbc0e688ff3988e4ba50eeb36b4808cec07c87
> 100644
> --- a/fs/nfs/callback.c
> +++ b/fs/nfs/callback.c
> @@ -119,9 +119,9 @@ static int nfs_callback_start_svc(int minorversion,
> struct rpc_xprt *xprt,
> if (serv->sv_nrthreads == nrservs)
> return 0;
>
> - ret = svc_set_num_threads(serv, nrservs);
> + ret = svc_set_num_threads(serv, nrservs, 0);
> if (ret) {
> - svc_set_num_threads(serv, 0);
> + svc_set_num_threads(serv, 0, 0);
> return ret;
> }
> dprintk("nfs_callback_up: service started\n");
> @@ -242,7 +242,7 @@ int nfs_callback_up(u32 minorversion, struct
> rpc_xprt *xprt)
> cb_info->users++;
> err_net:
> if (!cb_info->users) {
> - svc_set_num_threads(cb_info->serv, 0);
> + svc_set_num_threads(cb_info->serv, 0, 0);
> svc_destroy(&cb_info->serv);
> }
> err_create:
> @@ -268,7 +268,7 @@ void nfs_callback_down(int minorversion, struct net
> *net)
> nfs_callback_down_net(minorversion, serv, net);
> cb_info->users--;
> if (cb_info->users == 0) {
> - svc_set_num_threads(serv, 0);
> + svc_set_num_threads(serv, 0, 0);
> dprintk("nfs_callback_down: service destroyed\n");
> svc_destroy(&cb_info->serv);
> }
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index
> aafec8ff588b85b0e26d40b76ef00953dc6472b4..993ed338764b0ccd7bdfb76bd6fbb5dc6ab4022d
> 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
> }
>
> /* Kill outstanding nfsd threads */
> - svc_set_num_threads(serv, 0);
> + svc_set_num_threads(serv, 0, 0);
> nfsd_destroy_serv(net);
> mutex_unlock(&nfsd_mutex);
> }
> @@ -704,7 +704,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> net *net)
>
> /* Special case: When n == 1, distribute threads equally among pools. */
> if (n == 1)
> - return svc_set_num_threads(nn->nfsd_serv, nthreads[0]);
> + return svc_set_num_threads(nn->nfsd_serv, nthreads[0], 0);
>
> if (n > nn->nfsd_serv->sv_nrpools)
> n = nn->nfsd_serv->sv_nrpools;
> @@ -732,7 +732,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> net *net)
> for (i = 0; i < n; i++) {
> err = svc_set_pool_threads(nn->nfsd_serv,
> &nn->nfsd_serv->sv_pools[i],
> - nthreads[i]);
> + nthreads[i], 0);
> if (err)
> goto out;
> }
> @@ -741,7 +741,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> net *net)
> for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
> err = svc_set_pool_threads(nn->nfsd_serv,
> &nn->nfsd_serv->sv_pools[i],
> - 0);
> + 0, 0);
> if (err)
> goto out;
> }
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index ee9260ca908c907f4373f4cfa471b272bc7bcc8c..35bd3247764ae8dc5dcdfffeea36f7cfefd13372 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -36,6 +36,7 @@
> struct svc_pool {
> unsigned int sp_id; /* pool id; also node id on NUMA */
> unsigned int sp_nrthreads; /* # of threads currently running in pool
> */
> + unsigned int sp_nrthrmin; /* Min number of threads to run per pool */
> unsigned int sp_nrthrmax; /* Max requested number of threads in pool
> */
> struct lwq sp_xprts; /* pending transports */
> struct list_head sp_all_threads; /* all server threads */
> @@ -72,7 +73,7 @@ struct svc_serv {
> struct svc_stat * sv_stats; /* RPC statistics */
> spinlock_t sv_lock;
> unsigned int sv_nprogs; /* Number of sv_programs */
> - unsigned int sv_nrthreads; /* # of server threads */
> + unsigned int sv_nrthreads; /* # of running server threads */
> unsigned int sv_max_payload; /* datagram payload size */
> unsigned int sv_max_mesg; /* max_payload + 1 page for overheads */
> unsigned int sv_xdrsize; /* XDR buffer size */
> @@ -447,8 +448,8 @@ struct svc_serv * svc_create_pooled(struct
> svc_program *prog,
> struct svc_stat *stats,
> unsigned int bufsize,
> int (*threadfn)(void *data));
> -int svc_set_pool_threads(struct svc_serv *, struct svc_pool *,
> int);
> -int svc_set_num_threads(struct svc_serv *, int);
> +int svc_set_pool_threads(struct svc_serv *, struct svc_pool *,
> int, unsigned int);
> +int svc_set_num_threads(struct svc_serv *, int, unsigned int);
> int svc_pool_stats_open(struct svc_info *si, struct file *file);
> void svc_process(struct svc_rqst *rqstp);
> void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp);
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index 8cd45f62ef74af6e0826b8f13cc903b0962af5e0..dc818158f8529b62dcf96c91bd9a9d4ab21df91f 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -821,6 +821,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct
> svc_pool *pool, int nrservs)
> * @serv: RPC service to adjust
> * @pool: Specific pool from which to choose threads
> * @nrservs: New number of threads for @serv (0 or less means kill all
> threads)
> + * @min_threads: minimum number of threads per pool (0 means set to
> same as nrservs)
> *
> * Create or destroy threads in @pool to bring it to @nrservs.
> *
> @@ -831,12 +832,22 @@ svc_stop_kthreads(struct svc_serv *serv, struct
> svc_pool *pool, int nrservs)
> * starting a thread.
> */
> int
> -svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int
> nrservs)
> +svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int
> nrservs,
> + unsigned int min_threads)
> {
> if (!pool)
> return -EINVAL;
>
> pool->sp_nrthrmax = nrservs;
> + if (min_threads) {
> + if (pool->sp_nrthreads > nrservs) {
> + // fallthrough to update nrservs
> + } else if (pool->sp_nrthreads < min_threads) {
> + nrservs = min_threads;
Nit: The mixed sign types here give me hives. Can you think of
a reason nrservs is a signed int rather than unsigned?
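For what it's worth, the clamping decision reduces to something like this userspace model (not the kernel code; the function name and the explicit casts are my own, added to sidestep the signed/unsigned mix noted above):

```c
#include <assert.h>

/*
 * Userspace model of the proposed svc_set_pool_threads() clamping
 * (illustrative only, not kernel code). Returns the delta to apply
 * to the pool: > 0 means start that many threads, < 0 means stop
 * that many, 0 means leave the pool alone.
 */
int pool_thread_delta(unsigned int sp_nrthreads, int nrservs,
		      unsigned int min_threads)
{
	if (min_threads) {
		if (sp_nrthreads > (unsigned int)nrservs) {
			/* over the max: fall through and shrink */
		} else if (sp_nrthreads < min_threads) {
			nrservs = min_threads;	/* below the floor: grow to it */
		} else {
			return 0;		/* within [min, max]: nothing to do */
		}
	}
	return nrservs - (int)sp_nrthreads;
}
```

The casts make the comparisons well-defined for the non-negative values the callers actually pass.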
> + } else {
> + return 0;
> + }
> + }
> nrservs -= pool->sp_nrthreads;
>
> if (nrservs > 0)
> @@ -851,6 +862,7 @@ EXPORT_SYMBOL_GPL(svc_set_pool_threads);
> * svc_set_num_threads - adjust number of threads in serv
> * @serv: RPC service to adjust
> * @nrservs: New number of threads for @serv (0 or less means kill all
> threads)
> + * @min_threads: minimum number of threads per pool (0 means set to
> same as nrservs)
> *
> * Create or destroy threads in @serv to bring it to @nrservs. If there
> * are multiple pools then the new threads or victims will be
> distributed
> @@ -863,20 +875,23 @@ EXPORT_SYMBOL_GPL(svc_set_pool_threads);
> * starting a thread.
> */
> int
> -svc_set_num_threads(struct svc_serv *serv, int nrservs)
> +svc_set_num_threads(struct svc_serv *serv, int nrservs, unsigned int
> min_threads)
> {
> int base = nrservs / serv->sv_nrpools;
> int remain = nrservs % serv->sv_nrpools;
> int i, err;
>
> for (i = 0; i < serv->sv_nrpools; ++i) {
> + struct svc_pool *pool = &serv->sv_pools[i];
> int threads = base;
>
> if (remain) {
> ++threads;
> --remain;
> }
> - err = svc_set_pool_threads(serv, &serv->sv_pools[i], threads);
> +
> + pool->sp_nrthrmin = min_threads;
> + err = svc_set_pool_threads(serv, pool, threads, min_threads);
> if (err)
> break;
> }
>
> --
> 2.52.0
--
Chuck Lever
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 5/6] nfsd: adjust number of running nfsd threads based on activity
2025-12-12 22:39 ` [PATCH RFC 5/6] nfsd: adjust number of running nfsd threads based on activity Jeff Layton
@ 2025-12-13 20:54 ` Chuck Lever
2025-12-13 21:43 ` Jeff Layton
0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2025-12-13 20:54 UTC (permalink / raw)
To: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> This patch is based on a draft patch by Neil:
>
> svc_recv() is changed to return a status. This can be:
>
> -ETIMEDOUT - waited for 5 seconds and found nothing to do. This is
> boring. Also there are more actual threads than really
> needed.
> -EBUSY - I did something, but there is more stuff to do and no one
> idle who I can wake up to do it.
> BTW I successfully set a flag: SP_TASK_STARTING. You better
> clear it.
> 0 - just minding my own business, nothing to see here.
>
> nfsd() is changed to pay attention to this status. In the case of
> -ETIMEDOUT, if the service mutex can be taken (trylock), the thread
> becomes and RQ_VICTIM so that it will exit. In the case of -EBUSY, if
> the actual number of threads is below the calculated maximum, a new
> thread is started. SP_TASK_STARTING is cleared.
Jeff, since you reworked things to be based on a minimum rather
than a maximum count, is this paragraph now stale?
> To support the above, some code is split out of svc_start_kthreads()
> into svc_new_thread().
>
> I think we want memory pressure to be able to push a thread into
> returning -ETIMEDOUT. That can come later.
>
> Signed-off-by: NeilBrown <neil@brown.name>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> fs/nfsd/nfssvc.c | 35 ++++++++++++++++++++-
> fs/nfsd/trace.h | 35 +++++++++++++++++++++
> include/linux/sunrpc/svc.h | 2 ++
> include/linux/sunrpc/svcsock.h | 2 +-
> net/sunrpc/svc.c | 69 ++++++++++++++++++++++++------------------
> net/sunrpc/svc_xprt.c | 45 ++++++++++++++++++++++-----
> 6 files changed, 148 insertions(+), 40 deletions(-)
>
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 993ed338764b0ccd7bdfb76bd6fbb5dc6ab4022d..26c3a6cb1f400f1b757d26f6ba77e27deb7e8ee2 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -896,9 +896,11 @@ static int
> nfsd(void *vrqstp)
> {
> struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
> + struct svc_pool *pool = rqstp->rq_pool;
> struct svc_xprt *perm_sock =
> list_entry(rqstp->rq_server->sv_permsocks.next, typeof(struct
> svc_xprt), xpt_list);
> struct net *net = perm_sock->xpt_net;
> struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> + bool have_mutex = false;
>
> /* At this point, the thread shares current->fs
> * with the init process. We need to create files with the
> @@ -916,7 +918,36 @@ nfsd(void *vrqstp)
> * The main request loop
> */
> while (!svc_thread_should_stop(rqstp)) {
> - svc_recv(rqstp);
> + switch (svc_recv(rqstp)) {
> + case -ETIMEDOUT: /* Nothing to do */
> + if (mutex_trylock(&nfsd_mutex)) {
> + if (pool->sp_nrthreads > pool->sp_nrthrmin) {
> + trace_nfsd_dynthread_kill(net, pool);
> + set_bit(RQ_VICTIM, &rqstp->rq_flags);
> + have_mutex = true;
> + } else
> + mutex_unlock(&nfsd_mutex);
> + } else {
> + trace_nfsd_dynthread_trylock_fail(net, pool);
> + }
> + break;
> + case -EBUSY: /* Too much to do */
> + if (pool->sp_nrthreads < pool->sp_nrthrmax &&
> + mutex_trylock(&nfsd_mutex)) {
> + // check no idle threads?
Can this comment be clarified? It reads like a note-to-self,
suggesting that something is unfinished.
> + if (pool->sp_nrthreads < pool->sp_nrthrmax) {
> + trace_nfsd_dynthread_start(net, pool);
> + svc_new_thread(rqstp->rq_server, pool);
> + }
> + mutex_unlock(&nfsd_mutex);
> + } else {
> + trace_nfsd_dynthread_trylock_fail(net, pool);
> + }
> + clear_bit(SP_TASK_STARTING, &pool->sp_flags);
> + break;
> + default:
> + break;
> + }
> nfsd_file_net_dispose(nn);
> }
>
> @@ -924,6 +955,8 @@ nfsd(void *vrqstp)
>
> /* Release the thread */
> svc_exit_thread(rqstp);
> + if (have_mutex)
> + mutex_unlock(&nfsd_mutex);
> return 0;
> }
>
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 5ae2a611e57f4b4e51a4d9eb6e0fccb66ad8d288..8885fd9bead98ebf55379d68ab9c3701981a5150 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -91,6 +91,41 @@ DEFINE_EVENT(nfsd_xdr_err_class, nfsd_##name##_err, \
> DEFINE_NFSD_XDR_ERR_EVENT(garbage_args);
> DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
>
> +DECLARE_EVENT_CLASS(nfsd_dynthread_class,
> + TP_PROTO(
> + const struct net *net,
> + const struct svc_pool *pool
> + ),
> + TP_ARGS(net, pool),
> + TP_STRUCT__entry(
> + __field(unsigned int, netns_ino)
> + __field(unsigned int, pool_id)
> + __field(unsigned int, nrthreads)
> + __field(unsigned int, nrthrmin)
> + __field(unsigned int, nrthrmax)
> + ),
> + TP_fast_assign(
> + __entry->netns_ino = net->ns.inum;
> + __entry->pool_id = pool->sp_id;
> + __entry->nrthreads = pool->sp_nrthreads;
> + __entry->nrthrmin = pool->sp_nrthrmin;
> + __entry->nrthrmax = pool->sp_nrthrmax;
> + ),
> + TP_printk("pool=%u nrthreads=%u nrthrmin=%u nrthrmax=%u",
> + __entry->pool_id, __entry->nrthreads,
> + __entry->nrthrmin, __entry->nrthrmax
> + )
> +);
> +
> +#define DEFINE_NFSD_DYNTHREAD_EVENT(name) \
> +DEFINE_EVENT(nfsd_dynthread_class, nfsd_dynthread_##name, \
> + TP_PROTO(const struct net *net, const struct svc_pool *pool), \
> + TP_ARGS(net, pool))
> +
> +DEFINE_NFSD_DYNTHREAD_EVENT(start);
> +DEFINE_NFSD_DYNTHREAD_EVENT(kill);
> +DEFINE_NFSD_DYNTHREAD_EVENT(trylock_fail);
> +
> #define show_nfsd_may_flags(x) \
> __print_flags(x, "|", \
> { NFSD_MAY_EXEC, "EXEC" }, \
> diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> index 35bd3247764ae8dc5dcdfffeea36f7cfefd13372..f47e19c9bd9466986438766e9ab7b4c71cda1ba6 100644
> --- a/include/linux/sunrpc/svc.h
> +++ b/include/linux/sunrpc/svc.h
> @@ -55,6 +55,7 @@ enum {
> SP_TASK_PENDING, /* still work to do even if no xprt is queued */
> SP_NEED_VICTIM, /* One thread needs to agree to exit */
> SP_VICTIM_REMAINS, /* One thread needs to actually exit */
> + SP_TASK_STARTING, /* Task has started but not added to idle yet */
> };
>
>
> @@ -442,6 +443,7 @@ struct svc_serv *svc_create(struct svc_program *,
> unsigned int,
> bool svc_rqst_replace_page(struct svc_rqst *rqstp,
> struct page *page);
> void svc_rqst_release_pages(struct svc_rqst *rqstp);
> +int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool);
> void svc_exit_thread(struct svc_rqst *);
> struct svc_serv * svc_create_pooled(struct svc_program *prog,
> unsigned int nprog,
> diff --git a/include/linux/sunrpc/svcsock.h
> b/include/linux/sunrpc/svcsock.h
> index de37069aba90899be19b1090e6e90e509a3cf530..5c87d3fedd33e7edf5ade32e60523cae7e9ebaba 100644
> --- a/include/linux/sunrpc/svcsock.h
> +++ b/include/linux/sunrpc/svcsock.h
> @@ -61,7 +61,7 @@ static inline u32 svc_sock_final_rec(struct svc_sock
> *svsk)
> /*
> * Function prototypes.
> */
> -void svc_recv(struct svc_rqst *rqstp);
> +int svc_recv(struct svc_rqst *rqstp);
> void svc_send(struct svc_rqst *rqstp);
> int svc_addsock(struct svc_serv *serv, struct net *net,
> const int fd, char *name_return, const size_t len,
> diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> index dc818158f8529b62dcf96c91bd9a9d4ab21df91f..9fca2dd340037f82baa4936766ebe0e38c3f0d85 100644
> --- a/net/sunrpc/svc.c
> +++ b/net/sunrpc/svc.c
> @@ -714,9 +714,6 @@ svc_prepare_thread(struct svc_serv *serv, struct
> svc_pool *pool, int node)
>
> rqstp->rq_err = -EAGAIN; /* No error yet */
>
> - serv->sv_nrthreads += 1;
> - pool->sp_nrthreads += 1;
> -
> /* Protected by whatever lock the service uses when calling
> * svc_set_num_threads()
> */
> @@ -763,45 +760,57 @@ void svc_pool_wake_idle_thread(struct svc_pool *pool)
> }
> EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread);
>
> -static int
> -svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> +int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool)
This is now an exported function; it should get a kdoc comment.
> {
> struct svc_rqst *rqstp;
> struct task_struct *task;
> int node;
> int err;
>
> - do {
> - nrservs--;
> - node = svc_pool_map_get_node(pool->sp_id);
> -
> - rqstp = svc_prepare_thread(serv, pool, node);
> - if (!rqstp)
> - return -ENOMEM;
> - task = kthread_create_on_node(serv->sv_threadfn, rqstp,
> - node, "%s", serv->sv_name);
> - if (IS_ERR(task)) {
> - svc_exit_thread(rqstp);
> - return PTR_ERR(task);
> - }
> + node = svc_pool_map_get_node(pool->sp_id);
>
> - rqstp->rq_task = task;
> - if (serv->sv_nrpools > 1)
> - svc_pool_map_set_cpumask(task, pool->sp_id);
> + rqstp = svc_prepare_thread(serv, pool, node);
> + if (!rqstp)
> + return -ENOMEM;
> + set_bit(SP_TASK_STARTING, &pool->sp_flags);
> + task = kthread_create_on_node(serv->sv_threadfn, rqstp,
> + node, "%s", serv->sv_name);
> + if (IS_ERR(task)) {
> + clear_bit(SP_TASK_STARTING, &pool->sp_flags);
> + svc_exit_thread(rqstp);
svc_exit_thread() decrements serv->sv_nrthreads and pool->sp_nrthreads
but this call site hasn't incremented them yet. Perhaps this error
flow needs a simpler clean-up than calling svc_exit_thread().
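A toy userspace model of the bookkeeping (not the kernel code; names mirror the fields only for readability) shows the imbalance:

```c
#include <assert.h>

/*
 * Toy model of the thread-count bookkeeping (illustrative only).
 * With this patch, the counters are bumped only after
 * kthread_create_on_node() succeeds, so an error path that runs the
 * full exit routine performs a decrement with no matching increment.
 */
struct counts {
	int sv_nrthreads;	/* per-serv count */
	int sp_nrthreads;	/* per-pool count */
};

void thread_created(struct counts *c)	/* after a successful create */
{
	c->sv_nrthreads += 1;
	c->sp_nrthreads += 1;
}

void thread_exited(struct counts *c)	/* full exit path */
{
	c->sv_nrthreads -= 1;
	c->sp_nrthreads -= 1;
}
```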
> + return PTR_ERR(task);
> + }
>
> - svc_sock_update_bufs(serv);
> - wake_up_process(task);
> + serv->sv_nrthreads += 1;
> + pool->sp_nrthreads += 1;
>
> - wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN);
> - err = rqstp->rq_err;
> - if (err) {
> - svc_exit_thread(rqstp);
> - return err;
> - }
> - } while (nrservs > 0);
> + rqstp->rq_task = task;
> + if (serv->sv_nrpools > 1)
> + svc_pool_map_set_cpumask(task, pool->sp_id);
>
> + svc_sock_update_bufs(serv);
> + wake_up_process(task);
> +
> + wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN);
> + err = rqstp->rq_err;
> + if (err) {
> + svc_exit_thread(rqstp);
> + return err;
> + }
> return 0;
> }
> +EXPORT_SYMBOL_GPL(svc_new_thread);
> +
> +static int
> +svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> +{
> + int err = 0;
> +
> + while (!err && nrservs--)
> + err = svc_new_thread(serv, pool);
> +
> + return err;
> +}
>
> static int
> svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int
> nrservs)
> diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> index 6973184ff6675211b4338fac80105894e9c8d4df..9612334300c8dae38720a0f5c61c0f505432ec2f 100644
> --- a/net/sunrpc/svc_xprt.c
> +++ b/net/sunrpc/svc_xprt.c
> @@ -714,15 +714,22 @@ svc_thread_should_sleep(struct svc_rqst *rqstp)
> return true;
> }
>
> -static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
> +static bool nfsd_schedule_timeout(long timeout)
Perhaps svc_schedule_timeout() is a more appropriate name for
a function that resides in net/sunrpc/svc_xprt.c.
> +{
> + return schedule_timeout(timeout) == 0;
> +}
> +
> +static bool svc_thread_wait_for_work(struct svc_rqst *rqstp)
> {
> struct svc_pool *pool = rqstp->rq_pool;
> + bool did_timeout = false;
>
> if (svc_thread_should_sleep(rqstp)) {
> set_current_state(TASK_IDLE | TASK_FREEZABLE);
> llist_add(&rqstp->rq_idle, &pool->sp_idle_threads);
> + clear_bit(SP_TASK_STARTING, &pool->sp_flags);
> if (likely(svc_thread_should_sleep(rqstp)))
> - schedule();
> + did_timeout = nfsd_schedule_timeout(5 * HZ);
>
> while (!llist_del_first_this(&pool->sp_idle_threads,
> &rqstp->rq_idle)) {
> @@ -734,7 +741,10 @@ static void svc_thread_wait_for_work(struct
> svc_rqst *rqstp)
> * for this new work. This thread can safely sleep
> * until woken again.
> */
> - schedule();
> + if (did_timeout)
> + did_timeout = nfsd_schedule_timeout(HZ);
> + else
> + did_timeout = nfsd_schedule_timeout(5 * HZ);
> set_current_state(TASK_IDLE | TASK_FREEZABLE);
> }
> __set_current_state(TASK_RUNNING);
> @@ -742,6 +752,7 @@ static void svc_thread_wait_for_work(struct
> svc_rqst *rqstp)
> cond_resched();
> }
> try_to_freeze();
> + return did_timeout;
> }
>
> static void svc_add_new_temp_xprt(struct svc_serv *serv, struct
> svc_xprt *newxpt)
> @@ -825,6 +836,8 @@ static void svc_handle_xprt(struct svc_rqst *rqstp,
> struct svc_xprt *xprt)
>
> static void svc_thread_wake_next(struct svc_rqst *rqstp)
> {
> + clear_bit(SP_TASK_STARTING, &rqstp->rq_pool->sp_flags);
> +
> if (!svc_thread_should_sleep(rqstp))
> /* More work pending after I dequeued some,
> * wake another worker
> @@ -839,21 +852,31 @@ static void svc_thread_wake_next(struct svc_rqst *rqstp)
> * This code is carefully organised not to touch any cachelines in
> * the shared svc_serv structure, only cachelines in the local
> * svc_pool.
> + *
> + * Returns -ETIMEDOUT if idle for an extended period
> + * -EBUSY if there is more work to do than available threads
> + * 0 otherwise.
> */
> -void svc_recv(struct svc_rqst *rqstp)
> +int svc_recv(struct svc_rqst *rqstp)
> {
> struct svc_pool *pool = rqstp->rq_pool;
> + bool did_wait;
> + int ret = 0;
>
> if (!svc_alloc_arg(rqstp))
> - return;
> + return ret;
> +
> + did_wait = svc_thread_wait_for_work(rqstp);
>
> - svc_thread_wait_for_work(rqstp);
> + if (did_wait && svc_thread_should_sleep(rqstp) &&
> + pool->sp_nrthrmin && (pool->sp_nrthreads > pool->sp_nrthrmin))
> + ret = -ETIMEDOUT;
>
> clear_bit(SP_TASK_PENDING, &pool->sp_flags);
>
> if (svc_thread_should_stop(rqstp)) {
> svc_thread_wake_next(rqstp);
> - return;
> + return ret;
> }
>
> rqstp->rq_xprt = svc_xprt_dequeue(pool);
> @@ -867,8 +890,13 @@ void svc_recv(struct svc_rqst *rqstp)
> */
> if (pool->sp_idle_threads.first)
> rqstp->rq_chandle.thread_wait = 5 * HZ;
> - else
> + else {
> rqstp->rq_chandle.thread_wait = 1 * HZ;
> + if (!did_wait &&
> + !test_and_set_bit(SP_TASK_STARTING,
> + &pool->sp_flags))
> + ret = -EBUSY;
> + }
>
> trace_svc_xprt_dequeue(rqstp);
> svc_handle_xprt(rqstp, xprt);
> @@ -887,6 +915,7 @@ void svc_recv(struct svc_rqst *rqstp)
> }
> }
> #endif
> + return ret;
> }
> EXPORT_SYMBOL_GPL(svc_recv);
>
>
> --
> 2.52.0
The extensive use of atomic bit ops here is a little worrying.
Those can be costly -- and the sp_flags field is going to get
poked at by more and more threads as the pool's thread count
increases.
--
Chuck Lever
* Re: [PATCH RFC 6/6] nfsd: add controls to set the minimum number of threads per pool
2025-12-12 22:39 ` [PATCH RFC 6/6] nfsd: add controls to set the minimum number of threads per pool Jeff Layton
@ 2025-12-13 21:10 ` Chuck Lever
2025-12-13 21:47 ` Jeff Layton
0 siblings, 1 reply; 17+ messages in thread
From: Chuck Lever @ 2025-12-13 21:10 UTC (permalink / raw)
To: Jeff Layton, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> Add a new "min_threads" variable to the nfsd_net, along with the
> corresponding nfsdfs and netlink interfaces to set that value from
> userland. Pass that value to svc_set_pool_threads() and
> svc_set_num_threads().
>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> Documentation/netlink/specs/nfsd.yaml | 5 ++++
> fs/nfsd/netlink.c | 5 ++--
> fs/nfsd/netns.h | 6 +++++
> fs/nfsd/nfsctl.c | 50 +++++++++++++++++++++++++++++++++++
> fs/nfsd/nfssvc.c | 8 +++---
> fs/nfsd/trace.h | 19 +++++++++++++
> include/uapi/linux/nfsd_netlink.h | 1 +
> 7 files changed, 88 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/netlink/specs/nfsd.yaml
> b/Documentation/netlink/specs/nfsd.yaml
> index 100363029e82aed87295e34a008ab771a95d508c..badb2fe57c9859c6932c621a589da694782b0272 100644
> --- a/Documentation/netlink/specs/nfsd.yaml
> +++ b/Documentation/netlink/specs/nfsd.yaml
> @@ -78,6 +78,9 @@ attribute-sets:
> -
> name: scope
> type: string
> + -
> + name: min-threads
> + type: u32
> -
> name: version
> attributes:
> @@ -159,6 +162,7 @@ operations:
> - gracetime
> - leasetime
> - scope
> + - min-threads
> -
> name: threads-get
> doc: get the number of running threads
> @@ -170,6 +174,7 @@ operations:
> - gracetime
> - leasetime
> - scope
> + - min-threads
> -
> name: version-set
> doc: set nfs enabled versions
> diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
> index ac51a44e1065ec3f1d88165f70a831a828b58394..887525964451e640304371e33aa4f415b4ff2848 100644
> --- a/fs/nfsd/netlink.c
> +++ b/fs/nfsd/netlink.c
> @@ -24,11 +24,12 @@ const struct nla_policy
> nfsd_version_nl_policy[NFSD_A_VERSION_ENABLED + 1] = {
> };
>
> /* NFSD_CMD_THREADS_SET - do */
> -static const struct nla_policy
> nfsd_threads_set_nl_policy[NFSD_A_SERVER_SCOPE + 1] = {
> +static const struct nla_policy
> nfsd_threads_set_nl_policy[NFSD_A_SERVER_MIN_THREADS + 1] = {
> [NFSD_A_SERVER_THREADS] = { .type = NLA_U32, },
> [NFSD_A_SERVER_GRACETIME] = { .type = NLA_U32, },
> [NFSD_A_SERVER_LEASETIME] = { .type = NLA_U32, },
> [NFSD_A_SERVER_SCOPE] = { .type = NLA_NUL_STRING, },
> + [NFSD_A_SERVER_MIN_THREADS] = { .type = NLA_U32, },
> };
>
> /* NFSD_CMD_VERSION_SET - do */
> @@ -57,7 +58,7 @@ static const struct genl_split_ops nfsd_nl_ops[] = {
> .cmd = NFSD_CMD_THREADS_SET,
> .doit = nfsd_nl_threads_set_doit,
> .policy = nfsd_threads_set_nl_policy,
> - .maxattr = NFSD_A_SERVER_SCOPE,
> + .maxattr = NFSD_A_SERVER_MIN_THREADS,
> .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
> },
> {
> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> index 3e2d0fde80a7ce434ef2cce9f1666c2bd16ab2eb..1c3449810eaefea8167ddd284af7bd66cac7e211 100644
> --- a/fs/nfsd/netns.h
> +++ b/fs/nfsd/netns.h
> @@ -128,6 +128,12 @@ struct nfsd_net {
> seqlock_t writeverf_lock;
> unsigned char writeverf[8];
>
> + /*
> + * Minimum number of threads to run per pool. If 0 then the
> + * min == max requested number of threads.
> + */
> + unsigned int min_threads;
> +
> u32 clientid_base;
> u32 clientid_counter;
> u32 clverifier_counter;
> diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> index 206534fccf36a992026669fee6533adff1062c36..a5401015e62499d07150cde8822f1e7dd0515dfe 100644
> --- a/fs/nfsd/nfsctl.c
> +++ b/fs/nfsd/nfsctl.c
> @@ -48,6 +48,7 @@ enum {
> NFSD_Versions,
> NFSD_Ports,
> NFSD_MaxBlkSize,
> + NFSD_MinThreads,
> NFSD_Filecache,
> NFSD_Leasetime,
> NFSD_Gracetime,
> @@ -67,6 +68,7 @@ static ssize_t write_pool_threads(struct file *file,
> char *buf, size_t size);
> static ssize_t write_versions(struct file *file, char *buf, size_t
> size);
> static ssize_t write_ports(struct file *file, char *buf, size_t size);
> static ssize_t write_maxblksize(struct file *file, char *buf, size_t
> size);
> +static ssize_t write_minthreads(struct file *file, char *buf, size_t
> size);
> #ifdef CONFIG_NFSD_V4
> static ssize_t write_leasetime(struct file *file, char *buf, size_t
> size);
> static ssize_t write_gracetime(struct file *file, char *buf, size_t
> size);
> @@ -85,6 +87,7 @@ static ssize_t (*const write_op[])(struct file *,
> char *, size_t) = {
> [NFSD_Versions] = write_versions,
> [NFSD_Ports] = write_ports,
> [NFSD_MaxBlkSize] = write_maxblksize,
> + [NFSD_MinThreads] = write_minthreads,
> #ifdef CONFIG_NFSD_V4
> [NFSD_Leasetime] = write_leasetime,
> [NFSD_Gracetime] = write_gracetime,
> @@ -899,6 +902,46 @@ static ssize_t write_maxblksize(struct file *file,
> char *buf, size_t size)
> nfsd_max_blksize);
> }
>
> +/*
> + * write_minthreads - Set or report the current min number of threads
> + *
> + * Input:
> + * buf: ignored
> + * size: zero
> + * OR
> + *
> + * Input:
> + * buf: C string containing an unsigned
> + * integer value representing the new
> + * max number of threads
s/max number of threads/min number of threads
> + * size: non-zero length of C string in @buf
> + * Output:
> + * On success: passed-in buffer filled with '\n'-terminated C string
> + * containing numeric value of min_threads setting
> + * for this net namespace;
> + * return code is the size in bytes of the string
> + * On error: return code is zero or a negative errno value
> + */
> +static ssize_t write_minthreads(struct file *file, char *buf, size_t
> size)
> +{
> + char *mesg = buf;
> + struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
> + unsigned int minthreads = nn->min_threads;
> +
> + if (size > 0) {
What if @size is a very large number?
> + int rv = get_uint(&mesg, &minthreads);
> +
> + if (rv)
> + return rv;
> + trace_nfsd_ctl_minthreads(netns(file), minthreads);
> + mutex_lock(&nfsd_mutex);
> + nn->min_threads = minthreads;
> + mutex_unlock(&nfsd_mutex);
> + }
> +
> + return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", minthreads);
> +}
> +
> #ifdef CONFIG_NFSD_V4
> static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t
> size,
> time64_t *time, struct nfsd_net *nn)
> @@ -1292,6 +1335,7 @@ static int nfsd_fill_super(struct super_block
> *sb, struct fs_context *fc)
> [NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
> [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
> [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops,
> S_IWUSR|S_IRUGO},
> + [NFSD_MinThreads] = {"min_threads", &transaction_ops,
> S_IWUSR|S_IRUGO},
> [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops,
> S_IRUGO},
> #ifdef CONFIG_NFSD_V4
> [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops,
> S_IWUSR|S_IRUSR},
> @@ -1636,6 +1680,10 @@ int nfsd_nl_threads_set_doit(struct sk_buff
> *skb, struct genl_info *info)
> scope = nla_data(attr);
> }
>
> + attr = info->attrs[NFSD_A_SERVER_MIN_THREADS];
> + if (attr)
> + nn->min_threads = nla_get_u32(attr);
> +
> ret = nfsd_svc(nrpools, nthreads, net, get_current_cred(), scope);
> if (ret > 0)
> ret = 0;
> @@ -1675,6 +1723,8 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb,
> struct genl_info *info)
> nn->nfsd4_grace) ||
> nla_put_u32(skb, NFSD_A_SERVER_LEASETIME,
> nn->nfsd4_lease) ||
> + nla_put_u32(skb, NFSD_A_SERVER_MIN_THREADS,
> + nn->min_threads) ||
> nla_put_string(skb, NFSD_A_SERVER_SCOPE,
> nn->nfsd_name);
> if (err)
> diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> index 26c3a6cb1f400f1b757d26f6ba77e27deb7e8ee2..d6120dd843ac1b6a42f0ef331700f4d6d70d8c38 100644
> --- a/fs/nfsd/nfssvc.c
> +++ b/fs/nfsd/nfssvc.c
> @@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
> }
>
> /* Kill outstanding nfsd threads */
> - svc_set_num_threads(serv, 0, 0);
> + svc_set_num_threads(serv, 0, nn->min_threads);
Seems like this could actually /start/ threads during NFSD shutdown.
At the very least it needs an explanatory comment.
> nfsd_destroy_serv(net);
> mutex_unlock(&nfsd_mutex);
> }
> @@ -704,7 +704,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> net *net)
>
> /* Special case: When n == 1, distribute threads equally among pools. */
> if (n == 1)
> - return svc_set_num_threads(nn->nfsd_serv, nthreads[0], 0);
> + return svc_set_num_threads(nn->nfsd_serv, nthreads[0], nn->min_threads);
>
> if (n > nn->nfsd_serv->sv_nrpools)
> n = nn->nfsd_serv->sv_nrpools;
> @@ -732,7 +732,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> net *net)
> for (i = 0; i < n; i++) {
> err = svc_set_pool_threads(nn->nfsd_serv,
> &nn->nfsd_serv->sv_pools[i],
> - nthreads[i], 0);
> + nthreads[i], nn->min_threads);
> if (err)
> goto out;
> }
> @@ -741,7 +741,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> net *net)
> for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
> err = svc_set_pool_threads(nn->nfsd_serv,
> &nn->nfsd_serv->sv_pools[i],
> - 0, 0);
> + 0, nn->min_threads);
> if (err)
> goto out;
> }
> diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> index 8885fd9bead98ebf55379d68ab9c3701981a5150..d1d0b0dd054588a8c20e3386356dfa4e9632b8e0 100644
> --- a/fs/nfsd/trace.h
> +++ b/fs/nfsd/trace.h
> @@ -2164,6 +2164,25 @@ TRACE_EVENT(nfsd_ctl_maxblksize,
> )
> );
>
> +TRACE_EVENT(nfsd_ctl_minthreads,
> + TP_PROTO(
> + const struct net *net,
> + int minthreads
> + ),
> + TP_ARGS(net, minthreads),
> + TP_STRUCT__entry(
> + __field(unsigned int, netns_ino)
> + __field(int, minthreads)
> + ),
> + TP_fast_assign(
> + __entry->netns_ino = net->ns.inum;
> + __entry->minthreads = minthreads;
> + ),
> + TP_printk("minthreads=%d",
> + __entry->minthreads
> + )
> +);
> +
> TRACE_EVENT(nfsd_ctl_time,
> TP_PROTO(
> const struct net *net,
> diff --git a/include/uapi/linux/nfsd_netlink.h
> b/include/uapi/linux/nfsd_netlink.h
> index e157e2009ea8c1ef805301261d536c82677821ef..e9efbc9e63d83ed25fcd790b7a877c0023638f15 100644
> --- a/include/uapi/linux/nfsd_netlink.h
> +++ b/include/uapi/linux/nfsd_netlink.h
> @@ -35,6 +35,7 @@ enum {
> NFSD_A_SERVER_GRACETIME,
> NFSD_A_SERVER_LEASETIME,
> NFSD_A_SERVER_SCOPE,
> + NFSD_A_SERVER_MIN_THREADS,
>
> __NFSD_A_SERVER_MAX,
> NFSD_A_SERVER_MAX = (__NFSD_A_SERVER_MAX - 1)
>
> --
> 2.52.0
Thanks, Neil and Jeff, for pulling all this together. It's good to
have something we can noodle on now.
--
Chuck Lever
* Re: [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool
2025-12-13 19:34 ` [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool Chuck Lever
@ 2025-12-13 21:35 ` Jeff Layton
0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-12-13 21:35 UTC (permalink / raw)
To: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Sat, 2025-12-13 at 14:34 -0500, Chuck Lever wrote:
>
> On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> > This patchset changes nfsd to dynamically size its threadpool as
> > needed. The main user-visible change is the addition of new controls
> > that allow the admin to set a minimum number of threads.
> >
> > When the minimum is set to a non-zero value, the traditional "threads"
> > setting is interpreted as a maximum number of threads instead of a
> > static count. The server will start the minimum number of threads, and
> > then ramp up the thread count as needed. When the server is idle, it
> > will gradually ramp down the thread count.
> >
> > This control scheme should allow us to sanely switch between kernels
> > that do and do not support dynamic threading. In the case where dynamic
> > threading is not supported, the user will just get the static maximum
> > number of threads.
>
> An important consideration!
>
>
> > The series is based on a set of draft patches by Neil. There are a
> > number of changes from his work:
> >
> > 1/ his original series was based around a new setting that defined a
> > maximum number of threads. This one instead adds a control to define a
> > minimum number of threads.
>
> My concern is whether one or more clients can force this mechanism
> to continue creating threads until resource exhaustion causes a
> denial of service.
>
The old "threads" setting is repurposed as a maximum when "min-threads"
is set. If someone sets "threads" high enough that they can drive the
machine into resource exhaustion, then that's an administrative error
IMO.
> I'm not convinced that setting a minimum number of threads is all
> that interesting. Can you elaborate on why you chose that design?
>
The main reason to do dynamic threading is that NFS activity can be
spotty. Servers often have periods where they are very busy and other
times where they are idle.
Today, admins usually just set "threads" to the maximum number that
they think they will ever need to deal with this. This is a waste of
resources when nfsd is idle, of course. For a dedicated NFS server that
isn't doing anything else, that's usually considered acceptable.
So, I see the dynamic threading as mostly useful for machines that are
running nfsd as a "side job", e.g., a compute-heavy server that runs
nfsd in order to make its results available to other hosts. In those
cases, it makes sense to allow the thread count to ramp down when no
one is accessing nfsd so that those resources can be used for other
things.
With that in mind, it makes sense to repurpose "threads" as a maximum,
since that reflects the reality for most people today. So, the new
control should have the effect of setting a minimum number of threads.
For my own testing, I've mostly set min-threads to 1. We could
certainly convert this into a "dynamic-threading" bool setting and just
hardcode the minimum to 1 or some other value in that case, but I think
it makes sense to allow the flexibility to set the value higher, at
least until we have a better feeling for how this affects performance.
Thanks for taking a look!
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 4/6] sunrpc: introduce the concept of a minimum number of threads per pool
2025-12-13 20:19 ` Chuck Lever
@ 2025-12-13 21:38 ` Jeff Layton
0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-12-13 21:38 UTC (permalink / raw)
To: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Sat, 2025-12-13 at 15:19 -0500, Chuck Lever wrote:
>
> On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> > Add a new pool->sp_nrthrmin field to track the minimum number of threads
> > in a pool. Add min_threads parameters to both svc_set_num_threads() and
> > svc_set_pool_threads(). If min_threads is non-zero, then have
> > svc_set_num_threads() ensure that the number of running threads is
> > between the min and the max.
> >
> > For now, the min_threads is always 0, but a later patch will pass the
> > proper value through from nfsd.
> >
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/lockd/svc.c | 4 ++--
> > fs/nfs/callback.c | 8 ++++----
> > fs/nfsd/nfssvc.c | 8 ++++----
> > include/linux/sunrpc/svc.h | 7 ++++---
> > net/sunrpc/svc.c | 21 ++++++++++++++++++---
> > 5 files changed, 32 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index
> > fbf132b4e08d11a91784c21ee0209fd7c149fd9d..7899205314391415dfb698ab58fe97efc426d928
> > 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -340,7 +340,7 @@ static int lockd_get(void)
> > return -ENOMEM;
> > }
> >
> > - error = svc_set_num_threads(serv, 1);
> > + error = svc_set_num_threads(serv, 1, 0);
> > if (error < 0) {
> > svc_destroy(&serv);
> > return error;
> > @@ -368,7 +368,7 @@ static void lockd_put(void)
> > unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
> > #endif
> >
> > - svc_set_num_threads(nlmsvc_serv, 0);
> > + svc_set_num_threads(nlmsvc_serv, 0, 0);
> > timer_delete_sync(&nlmsvc_retry);
> > svc_destroy(&nlmsvc_serv);
> > dprintk("lockd_down: service destroyed\n");
> > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > index
> > 44b35b7f8dc022f1d8c069eaf2f7d334c93f77fc..32bbc0e688ff3988e4ba50eeb36b4808cec07c87
> > 100644
> > --- a/fs/nfs/callback.c
> > +++ b/fs/nfs/callback.c
> > @@ -119,9 +119,9 @@ static int nfs_callback_start_svc(int minorversion,
> > struct rpc_xprt *xprt,
> > if (serv->sv_nrthreads == nrservs)
> > return 0;
> >
> > - ret = svc_set_num_threads(serv, nrservs);
> > + ret = svc_set_num_threads(serv, nrservs, 0);
> > if (ret) {
> > - svc_set_num_threads(serv, 0);
> > + svc_set_num_threads(serv, 0, 0);
> > return ret;
> > }
> > dprintk("nfs_callback_up: service started\n");
> > @@ -242,7 +242,7 @@ int nfs_callback_up(u32 minorversion, struct
> > rpc_xprt *xprt)
> > cb_info->users++;
> > err_net:
> > if (!cb_info->users) {
> > - svc_set_num_threads(cb_info->serv, 0);
> > + svc_set_num_threads(cb_info->serv, 0, 0);
> > svc_destroy(&cb_info->serv);
> > }
> > err_create:
> > @@ -268,7 +268,7 @@ void nfs_callback_down(int minorversion, struct net
> > *net)
> > nfs_callback_down_net(minorversion, serv, net);
> > cb_info->users--;
> > if (cb_info->users == 0) {
> > - svc_set_num_threads(serv, 0);
> > + svc_set_num_threads(serv, 0, 0);
> > dprintk("nfs_callback_down: service destroyed\n");
> > svc_destroy(&cb_info->serv);
> > }
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index
> > aafec8ff588b85b0e26d40b76ef00953dc6472b4..993ed338764b0ccd7bdfb76bd6fbb5dc6ab4022d
> > 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
> > }
> >
> > /* Kill outstanding nfsd threads */
> > - svc_set_num_threads(serv, 0);
> > + svc_set_num_threads(serv, 0, 0);
> > nfsd_destroy_serv(net);
> > mutex_unlock(&nfsd_mutex);
> > }
> > @@ -704,7 +704,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> > net *net)
> >
> > /* Special case: When n == 1, distribute threads equally among pools. */
> > if (n == 1)
> > - return svc_set_num_threads(nn->nfsd_serv, nthreads[0]);
> > + return svc_set_num_threads(nn->nfsd_serv, nthreads[0], 0);
> >
> > if (n > nn->nfsd_serv->sv_nrpools)
> > n = nn->nfsd_serv->sv_nrpools;
> > @@ -732,7 +732,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> > net *net)
> > for (i = 0; i < n; i++) {
> > err = svc_set_pool_threads(nn->nfsd_serv,
> > &nn->nfsd_serv->sv_pools[i],
> > - nthreads[i]);
> > + nthreads[i], 0);
> > if (err)
> > goto out;
> > }
> > @@ -741,7 +741,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct
> > net *net)
> > for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
> > err = svc_set_pool_threads(nn->nfsd_serv,
> > &nn->nfsd_serv->sv_pools[i],
> > - 0);
> > + 0, 0);
> > if (err)
> > goto out;
> > }
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index
> > ee9260ca908c907f4373f4cfa471b272bc7bcc8c..35bd3247764ae8dc5dcdfffeea36f7cfefd13372
> > 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -36,6 +36,7 @@
> > struct svc_pool {
> > unsigned int sp_id; /* pool id; also node id on NUMA */
> > unsigned int sp_nrthreads; /* # of threads currently running in pool
> > */
> > + unsigned int sp_nrthrmin; /* Min number of threads to run per pool */
> > unsigned int sp_nrthrmax; /* Max requested number of threads in pool
> > */
> > struct lwq sp_xprts; /* pending transports */
> > struct list_head sp_all_threads; /* all server threads */
> > @@ -72,7 +73,7 @@ struct svc_serv {
> > struct svc_stat * sv_stats; /* RPC statistics */
> > spinlock_t sv_lock;
> > unsigned int sv_nprogs; /* Number of sv_programs */
> > - unsigned int sv_nrthreads; /* # of server threads */
> > + unsigned int sv_nrthreads; /* # of running server threads */
> > unsigned int sv_max_payload; /* datagram payload size */
> > unsigned int sv_max_mesg; /* max_payload + 1 page for overheads */
> > unsigned int sv_xdrsize; /* XDR buffer size */
> > @@ -447,8 +448,8 @@ struct svc_serv * svc_create_pooled(struct
> > svc_program *prog,
> > struct svc_stat *stats,
> > unsigned int bufsize,
> > int (*threadfn)(void *data));
> > -int svc_set_pool_threads(struct svc_serv *, struct svc_pool *,
> > int);
> > -int svc_set_num_threads(struct svc_serv *, int);
> > +int svc_set_pool_threads(struct svc_serv *, struct svc_pool *,
> > int, unsigned int);
> > +int svc_set_num_threads(struct svc_serv *, int, unsigned int);
> > int svc_pool_stats_open(struct svc_info *si, struct file *file);
> > void svc_process(struct svc_rqst *rqstp);
> > void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp);
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index
> > 8cd45f62ef74af6e0826b8f13cc903b0962af5e0..dc818158f8529b62dcf96c91bd9a9d4ab21df91f
> > 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -821,6 +821,7 @@ svc_stop_kthreads(struct svc_serv *serv, struct
> > svc_pool *pool, int nrservs)
> > * @serv: RPC service to adjust
> > * @pool: Specific pool from which to choose threads
> > * @nrservs: New number of threads for @serv (0 or less means kill all
> > threads)
> > + * @min_threads: minimum number of threads per pool (0 means set to
> > same as nrservs)
> > *
> > * Create or destroy threads in @pool to bring it to @nrservs.
> > *
> > @@ -831,12 +832,22 @@ svc_stop_kthreads(struct svc_serv *serv, struct
> > svc_pool *pool, int nrservs)
> > * starting a thread.
> > */
> > int
> > -svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int
> > nrservs)
> > +svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int
> > nrservs,
> > + unsigned int min_threads)
> > {
> > if (!pool)
> > return -EINVAL;
> >
> > pool->sp_nrthrmax = nrservs;
> > + if (min_threads) {
> > + if (pool->sp_nrthreads > nrservs) {
> > + // fallthrough to update nrservs
> > + } else if (pool->sp_nrthreads < min_threads) {
> > + nrservs = min_threads;
>
> Nit: The mixed sign types here gives me hives. Can you think of
> a reason nrservs is a signed int rather than unsigned?
>
>
> > + } else {
> > + return 0;
> > + }
> > + }
> > nrservs -= pool->sp_nrthreads;
> >
> > if (nrservs > 0)
Because of this ^^^. We could certainly make the argument unsigned and
use a signed value internally in this function. I'll plan to do that on
the next respin.
> > @@ -851,6 +862,7 @@ EXPORT_SYMBOL_GPL(svc_set_pool_threads);
> > * svc_set_num_threads - adjust number of threads in serv
> > * @serv: RPC service to adjust
> > * @nrservs: New number of threads for @serv (0 or less means kill all
> > threads)
> > + * @min_threads: minimum number of threads per pool (0 means set to
> > same as nrservs)
> > *
> > * Create or destroy threads in @serv to bring it to @nrservs. If there
> > * are multiple pools then the new threads or victims will be
> > distributed
> > @@ -863,20 +875,23 @@ EXPORT_SYMBOL_GPL(svc_set_pool_threads);
> > * starting a thread.
> > */
> > int
> > -svc_set_num_threads(struct svc_serv *serv, int nrservs)
> > +svc_set_num_threads(struct svc_serv *serv, int nrservs, unsigned int
> > min_threads)
> > {
> > int base = nrservs / serv->sv_nrpools;
> > int remain = nrservs % serv->sv_nrpools;
> > int i, err;
> >
> > for (i = 0; i < serv->sv_nrpools; ++i) {
> > + struct svc_pool *pool = &serv->sv_pools[i];
> > int threads = base;
> >
> > if (remain) {
> > ++threads;
> > --remain;
> > }
> > - err = svc_set_pool_threads(serv, &serv->sv_pools[i], threads);
> > +
> > + pool->sp_nrthrmin = min_threads;
> > + err = svc_set_pool_threads(serv, pool, threads, min_threads);
> > if (err)
> > break;
> > }
> >
> > --
> > 2.52.0
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 5/6] nfsd: adjust number of running nfsd threads based on activity
2025-12-13 20:54 ` Chuck Lever
@ 2025-12-13 21:43 ` Jeff Layton
0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-12-13 21:43 UTC (permalink / raw)
To: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Sat, 2025-12-13 at 15:54 -0500, Chuck Lever wrote:
>
> On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> > This patch is based on a draft patch by Neil:
> >
> > svc_recv() is changed to return a status. This can be:
> >
> > -ETIMEDOUT - waited for 5 seconds and found nothing to do. This is
> > boring. Also there are more actual threads than really
> > needed.
> > -EBUSY - I did something, but there is more stuff to do and no one
> > idle who I can wake up to do it.
> > BTW I successfully set a flag: SP_TASK_STARTING. You had better
> > clear it.
> > 0 - just minding my own business, nothing to see here.
> >
> > nfsd() is changed to pay attention to this status. In the case of
> > -ETIMEDOUT, if the service mutex can be taken (trylock), the thread
> > becomes an RQ_VICTIM so that it will exit. In the case of -EBUSY, if
> > the actual number of threads is below the calculated maximum, a new
> > thread is started. SP_TASK_STARTING is cleared.
>
> Jeff, since you reworked things to be based on a minimum rather
> than a maximum count, is this paragraph now stale?
>
>
Yes, it is. Will fix.
> > To support the above, some code is split out of svc_start_kthreads()
> > into svc_new_thread().
> >
> > I think we want memory pressure to be able to push a thread into
> > returning -ETIMEDOUT. That can come later.
> >
> > Signed-off-by: NeilBrown <neil@brown.name>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/nfsd/nfssvc.c | 35 ++++++++++++++++++++-
> > fs/nfsd/trace.h | 35 +++++++++++++++++++++
> > include/linux/sunrpc/svc.h | 2 ++
> > include/linux/sunrpc/svcsock.h | 2 +-
> > net/sunrpc/svc.c | 69 ++++++++++++++++++++++++------------------
> > net/sunrpc/svc_xprt.c | 45 ++++++++++++++++++++++-----
> > 6 files changed, 148 insertions(+), 40 deletions(-)
> >
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index
> > 993ed338764b0ccd7bdfb76bd6fbb5dc6ab4022d..26c3a6cb1f400f1b757d26f6ba77e27deb7e8ee2
> > 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -896,9 +896,11 @@ static int
> > nfsd(void *vrqstp)
> > {
> > struct svc_rqst *rqstp = (struct svc_rqst *) vrqstp;
> > + struct svc_pool *pool = rqstp->rq_pool;
> > struct svc_xprt *perm_sock =
> > list_entry(rqstp->rq_server->sv_permsocks.next, typeof(struct
> > svc_xprt), xpt_list);
> > struct net *net = perm_sock->xpt_net;
> > struct nfsd_net *nn = net_generic(net, nfsd_net_id);
> > + bool have_mutex = false;
> >
> > /* At this point, the thread shares current->fs
> > * with the init process. We need to create files with the
> > @@ -916,7 +918,36 @@ nfsd(void *vrqstp)
> > * The main request loop
> > */
> > while (!svc_thread_should_stop(rqstp)) {
> > - svc_recv(rqstp);
> > + switch (svc_recv(rqstp)) {
> > + case -ETIMEDOUT: /* Nothing to do */
> > + if (mutex_trylock(&nfsd_mutex)) {
> > + if (pool->sp_nrthreads > pool->sp_nrthrmin) {
> > + trace_nfsd_dynthread_kill(net, pool);
> > + set_bit(RQ_VICTIM, &rqstp->rq_flags);
> > + have_mutex = true;
> > + } else
> > + mutex_unlock(&nfsd_mutex);
> > + } else {
> > + trace_nfsd_dynthread_trylock_fail(net, pool);
> > + }
> > + break;
> > + case -EBUSY: /* Too much to do */
> > + if (pool->sp_nrthreads < pool->sp_nrthrmax &&
> > + mutex_trylock(&nfsd_mutex)) {
> > + // check no idle threads?
>
> Can this comment be clarified? It looks like a note-to-self, that maybe
> something is unfinished.
>
That's leftover from Neil's original patch. I'm not sure what his
thinking was there. I'll plan to remove it.
>
> > + if (pool->sp_nrthreads < pool->sp_nrthrmax) {
> > + trace_nfsd_dynthread_start(net, pool);
> > + svc_new_thread(rqstp->rq_server, pool);
> > + }
> > + mutex_unlock(&nfsd_mutex);
> > + } else {
> > + trace_nfsd_dynthread_trylock_fail(net, pool);
> > + }
> > + clear_bit(SP_TASK_STARTING, &pool->sp_flags);
> > + break;
> > + default:
> > + break;
> > + }
> > nfsd_file_net_dispose(nn);
> > }
> >
> > @@ -924,6 +955,8 @@ nfsd(void *vrqstp)
> >
> > /* Release the thread */
> > svc_exit_thread(rqstp);
> > + if (have_mutex)
> > + mutex_unlock(&nfsd_mutex);
> > return 0;
> > }
> >
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index
> > 5ae2a611e57f4b4e51a4d9eb6e0fccb66ad8d288..8885fd9bead98ebf55379d68ab9c3701981a5150
> > 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -91,6 +91,41 @@ DEFINE_EVENT(nfsd_xdr_err_class, nfsd_##name##_err, \
> > DEFINE_NFSD_XDR_ERR_EVENT(garbage_args);
> > DEFINE_NFSD_XDR_ERR_EVENT(cant_encode);
> >
> > +DECLARE_EVENT_CLASS(nfsd_dynthread_class,
> > + TP_PROTO(
> > + const struct net *net,
> > + const struct svc_pool *pool
> > + ),
> > + TP_ARGS(net, pool),
> > + TP_STRUCT__entry(
> > + __field(unsigned int, netns_ino)
> > + __field(unsigned int, pool_id)
> > + __field(unsigned int, nrthreads)
> > + __field(unsigned int, nrthrmin)
> > + __field(unsigned int, nrthrmax)
> > + ),
> > + TP_fast_assign(
> > + __entry->netns_ino = net->ns.inum;
> > + __entry->pool_id = pool->sp_id;
> > + __entry->nrthreads = pool->sp_nrthreads;
> > + __entry->nrthrmin = pool->sp_nrthrmin;
> > + __entry->nrthrmax = pool->sp_nrthrmax;
> > + ),
> > + TP_printk("pool=%u nrthreads=%u nrthrmin=%u nrthrmax=%u",
> > + __entry->pool_id, __entry->nrthreads,
> > + __entry->nrthrmin, __entry->nrthrmax
> > + )
> > +);
> > +
> > +#define DEFINE_NFSD_DYNTHREAD_EVENT(name) \
> > +DEFINE_EVENT(nfsd_dynthread_class, nfsd_dynthread_##name, \
> > + TP_PROTO(const struct net *net, const struct svc_pool *pool), \
> > + TP_ARGS(net, pool))
> > +
> > +DEFINE_NFSD_DYNTHREAD_EVENT(start);
> > +DEFINE_NFSD_DYNTHREAD_EVENT(kill);
> > +DEFINE_NFSD_DYNTHREAD_EVENT(trylock_fail);
> > +
> > #define show_nfsd_may_flags(x) \
> > __print_flags(x, "|", \
> > { NFSD_MAY_EXEC, "EXEC" }, \
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index
> > 35bd3247764ae8dc5dcdfffeea36f7cfefd13372..f47e19c9bd9466986438766e9ab7b4c71cda1ba6
> > 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -55,6 +55,7 @@ enum {
> > SP_TASK_PENDING, /* still work to do even if no xprt is queued */
> > SP_NEED_VICTIM, /* One thread needs to agree to exit */
> > SP_VICTIM_REMAINS, /* One thread needs to actually exit */
> > + SP_TASK_STARTING, /* Task has started but not added to idle yet */
> > };
> >
> >
> > @@ -442,6 +443,7 @@ struct svc_serv *svc_create(struct svc_program *,
> > unsigned int,
> > bool svc_rqst_replace_page(struct svc_rqst *rqstp,
> > struct page *page);
> > void svc_rqst_release_pages(struct svc_rqst *rqstp);
> > +int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool);
> > void svc_exit_thread(struct svc_rqst *);
> > struct svc_serv * svc_create_pooled(struct svc_program *prog,
> > unsigned int nprog,
> > diff --git a/include/linux/sunrpc/svcsock.h
> > b/include/linux/sunrpc/svcsock.h
> > index
> > de37069aba90899be19b1090e6e90e509a3cf530..5c87d3fedd33e7edf5ade32e60523cae7e9ebaba
> > 100644
> > --- a/include/linux/sunrpc/svcsock.h
> > +++ b/include/linux/sunrpc/svcsock.h
> > @@ -61,7 +61,7 @@ static inline u32 svc_sock_final_rec(struct svc_sock
> > *svsk)
> > /*
> > * Function prototypes.
> > */
> > -void svc_recv(struct svc_rqst *rqstp);
> > +int svc_recv(struct svc_rqst *rqstp);
> > void svc_send(struct svc_rqst *rqstp);
> > int svc_addsock(struct svc_serv *serv, struct net *net,
> > const int fd, char *name_return, const size_t len,
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index
> > dc818158f8529b62dcf96c91bd9a9d4ab21df91f..9fca2dd340037f82baa4936766ebe0e38c3f0d85
> > 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -714,9 +714,6 @@ svc_prepare_thread(struct svc_serv *serv, struct
> > svc_pool *pool, int node)
> >
> > rqstp->rq_err = -EAGAIN; /* No error yet */
> >
> > - serv->sv_nrthreads += 1;
> > - pool->sp_nrthreads += 1;
> > -
> > /* Protected by whatever lock the service uses when calling
> > * svc_set_num_threads()
> > */
> > @@ -763,45 +760,57 @@ void svc_pool_wake_idle_thread(struct svc_pool *pool)
> > }
> > EXPORT_SYMBOL_GPL(svc_pool_wake_idle_thread);
> >
> > -static int
> > -svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > +int svc_new_thread(struct svc_serv *serv, struct svc_pool *pool)
>
> Is now an exported function, should get a kdoc comment.
>
ACK.
>
> > {
> > struct svc_rqst *rqstp;
> > struct task_struct *task;
> > int node;
> > int err;
> >
> > - do {
> > - nrservs--;
> > - node = svc_pool_map_get_node(pool->sp_id);
> > -
> > - rqstp = svc_prepare_thread(serv, pool, node);
> > - if (!rqstp)
> > - return -ENOMEM;
> > - task = kthread_create_on_node(serv->sv_threadfn, rqstp,
> > - node, "%s", serv->sv_name);
> > - if (IS_ERR(task)) {
> > - svc_exit_thread(rqstp);
> > - return PTR_ERR(task);
> > - }
> > + node = svc_pool_map_get_node(pool->sp_id);
> >
> > - rqstp->rq_task = task;
> > - if (serv->sv_nrpools > 1)
> > - svc_pool_map_set_cpumask(task, pool->sp_id);
> > + rqstp = svc_prepare_thread(serv, pool, node);
> > + if (!rqstp)
> > + return -ENOMEM;
> > + set_bit(SP_TASK_STARTING, &pool->sp_flags);
> > + task = kthread_create_on_node(serv->sv_threadfn, rqstp,
> > + node, "%s", serv->sv_name);
> > + if (IS_ERR(task)) {
> > + clear_bit(SP_TASK_STARTING, &pool->sp_flags);
> > + svc_exit_thread(rqstp);
>
> svc_exit_thread() decrements serv->sv_nrthreads and pool->sp_nrthreads
> but this call site hasn't incremented them yet. Perhaps this error
> flow needs a simpler clean-up than calling svc_exit_thread().
>
ACK. I'll give that a harder look.
>
> > + return PTR_ERR(task);
> > + }
> >
> > - svc_sock_update_bufs(serv);
> > - wake_up_process(task);
> > + serv->sv_nrthreads += 1;
> > + pool->sp_nrthreads += 1;
> >
> > - wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN);
> > - err = rqstp->rq_err;
> > - if (err) {
> > - svc_exit_thread(rqstp);
> > - return err;
> > - }
> > - } while (nrservs > 0);
> > + rqstp->rq_task = task;
> > + if (serv->sv_nrpools > 1)
> > + svc_pool_map_set_cpumask(task, pool->sp_id);
> >
> > + svc_sock_update_bufs(serv);
> > + wake_up_process(task);
> > +
> > + wait_var_event(&rqstp->rq_err, rqstp->rq_err != -EAGAIN);
> > + err = rqstp->rq_err;
> > + if (err) {
> > + svc_exit_thread(rqstp);
> > + return err;
> > + }
> > return 0;
> > }
> > +EXPORT_SYMBOL_GPL(svc_new_thread);
> > +
> > +static int
> > +svc_start_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > +{
> > + int err = 0;
> > +
> > + while (!err && nrservs--)
> > + err = svc_new_thread(serv, pool);
> > +
> > + return err;
> > +}
> >
> > static int
> > svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int
> > nrservs)
> > diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
> > index
> > 6973184ff6675211b4338fac80105894e9c8d4df..9612334300c8dae38720a0f5c61c0f505432ec2f
> > 100644
> > --- a/net/sunrpc/svc_xprt.c
> > +++ b/net/sunrpc/svc_xprt.c
> > @@ -714,15 +714,22 @@ svc_thread_should_sleep(struct svc_rqst *rqstp)
> > return true;
> > }
> >
> > -static void svc_thread_wait_for_work(struct svc_rqst *rqstp)
> > +static bool nfsd_schedule_timeout(long timeout)
>
> Perhaps svc_schedule_timeout() is a more appropriate name for
> a function that resides in net/sunrpc/svc_xprt.c.
>
Sounds good.
>
> > +{
> > + return schedule_timeout(timeout) == 0;
> > +}
> > +
> > +static bool svc_thread_wait_for_work(struct svc_rqst *rqstp)
> > {
> > struct svc_pool *pool = rqstp->rq_pool;
> > + bool did_timeout = false;
> >
> > if (svc_thread_should_sleep(rqstp)) {
> > set_current_state(TASK_IDLE | TASK_FREEZABLE);
> > llist_add(&rqstp->rq_idle, &pool->sp_idle_threads);
> > + clear_bit(SP_TASK_STARTING, &pool->sp_flags);
> > if (likely(svc_thread_should_sleep(rqstp)))
> > - schedule();
> > + did_timeout = nfsd_schedule_timeout(5 * HZ);
> >
> > while (!llist_del_first_this(&pool->sp_idle_threads,
> > &rqstp->rq_idle)) {
> > @@ -734,7 +741,10 @@ static void svc_thread_wait_for_work(struct
> > svc_rqst *rqstp)
> > * for this new work. This thread can safely sleep
> > * until woken again.
> > */
> > - schedule();
> > + if (did_timeout)
> > + did_timeout = nfsd_schedule_timeout(HZ);
> > + else
> > + did_timeout = nfsd_schedule_timeout(5 * HZ);
> > set_current_state(TASK_IDLE | TASK_FREEZABLE);
> > }
> > __set_current_state(TASK_RUNNING);
> > @@ -742,6 +752,7 @@ static void svc_thread_wait_for_work(struct
> > svc_rqst *rqstp)
> > cond_resched();
> > }
> > try_to_freeze();
> > + return did_timeout;
> > }
> >
> > static void svc_add_new_temp_xprt(struct svc_serv *serv, struct
> > svc_xprt *newxpt)
> > @@ -825,6 +836,8 @@ static void svc_handle_xprt(struct svc_rqst *rqstp,
> > struct svc_xprt *xprt)
> >
> > static void svc_thread_wake_next(struct svc_rqst *rqstp)
> > {
> > + clear_bit(SP_TASK_STARTING, &rqstp->rq_pool->sp_flags);
> > +
> > if (!svc_thread_should_sleep(rqstp))
> > /* More work pending after I dequeued some,
> > * wake another worker
> > @@ -839,21 +852,31 @@ static void svc_thread_wake_next(struct svc_rqst *rqstp)
> > * This code is carefully organised not to touch any cachelines in
> > * the shared svc_serv structure, only cachelines in the local
> > * svc_pool.
> > + *
> > + * Returns -ETIMEDOUT if idle for an extended period
> > * -EBUSY if there is more work to do than available threads
> > + * 0 otherwise.
> > */
> > -void svc_recv(struct svc_rqst *rqstp)
> > +int svc_recv(struct svc_rqst *rqstp)
> > {
> > struct svc_pool *pool = rqstp->rq_pool;
> > + bool did_wait;
> > + int ret = 0;
> >
> > if (!svc_alloc_arg(rqstp))
> > - return;
> > + return ret;
> > +
> > + did_wait = svc_thread_wait_for_work(rqstp);
> >
> > - svc_thread_wait_for_work(rqstp);
> > + if (did_wait && svc_thread_should_sleep(rqstp) &&
> > + pool->sp_nrthrmin && (pool->sp_nrthreads > pool->sp_nrthrmin))
> > + ret = -ETIMEDOUT;
> >
> > clear_bit(SP_TASK_PENDING, &pool->sp_flags);
> >
> > if (svc_thread_should_stop(rqstp)) {
> > svc_thread_wake_next(rqstp);
> > - return;
> > + return ret;
> > }
> >
> > rqstp->rq_xprt = svc_xprt_dequeue(pool);
> > @@ -867,8 +890,13 @@ void svc_recv(struct svc_rqst *rqstp)
> > */
> > if (pool->sp_idle_threads.first)
> > rqstp->rq_chandle.thread_wait = 5 * HZ;
> > - else
> > + else {
> > rqstp->rq_chandle.thread_wait = 1 * HZ;
> > + if (!did_wait &&
> > + !test_and_set_bit(SP_TASK_STARTING,
> > + &pool->sp_flags))
> > + ret = -EBUSY;
> > + }
> >
> > trace_svc_xprt_dequeue(rqstp);
> > svc_handle_xprt(rqstp, xprt);
> > @@ -887,6 +915,7 @@ void svc_recv(struct svc_rqst *rqstp)
> > }
> > }
> > #endif
> > + return ret;
> > }
> > EXPORT_SYMBOL_GPL(svc_recv);
> >
> >
> > --
> > 2.52.0
>
> The extensive use of atomic bit ops here is a little worrying.
> Those can be costly -- and the sp_flags field is going to get
> poked at by more and more threads as the pool's thread count
> increases.
>
The current way that threading works is dependent on this today. We
could consider a spinlock and non-atomic bitops, but that might be
even worse. I'll have to think about that.
Thanks for the review!
--
Jeff Layton <jlayton@kernel.org>
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH RFC 6/6] nfsd: add controls to set the minimum number of threads per pool
2025-12-13 21:10 ` Chuck Lever
@ 2025-12-13 21:47 ` Jeff Layton
0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-12-13 21:47 UTC (permalink / raw)
To: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Sat, 2025-12-13 at 16:10 -0500, Chuck Lever wrote:
>
> On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> > Add a new "min_threads" variable to the nfsd_net, along with the
> > corresponding nfsdfs and netlink interfaces to set that value from
> > userland. Pass that value to svc_set_pool_threads() and
> > svc_set_num_threads().
> >
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > Documentation/netlink/specs/nfsd.yaml | 5 ++++
> > fs/nfsd/netlink.c | 5 ++--
> > fs/nfsd/netns.h | 6 +++++
> > fs/nfsd/nfsctl.c | 50 +++++++++++++++++++++++++++++++++++
> > fs/nfsd/nfssvc.c | 8 +++---
> > fs/nfsd/trace.h | 19 +++++++++++++
> > include/uapi/linux/nfsd_netlink.h | 1 +
> > 7 files changed, 88 insertions(+), 6 deletions(-)
> >
> > diff --git a/Documentation/netlink/specs/nfsd.yaml
> > b/Documentation/netlink/specs/nfsd.yaml
> > index
> > 100363029e82aed87295e34a008ab771a95d508c..badb2fe57c9859c6932c621a589da694782b0272
> > 100644
> > --- a/Documentation/netlink/specs/nfsd.yaml
> > +++ b/Documentation/netlink/specs/nfsd.yaml
> > @@ -78,6 +78,9 @@ attribute-sets:
> > -
> > name: scope
> > type: string
> > + -
> > + name: min-threads
> > + type: u32
> > -
> > name: version
> > attributes:
> > @@ -159,6 +162,7 @@ operations:
> > - gracetime
> > - leasetime
> > - scope
> > + - min-threads
> > -
> > name: threads-get
> > doc: get the number of running threads
> > @@ -170,6 +174,7 @@ operations:
> > - gracetime
> > - leasetime
> > - scope
> > + - min-threads
> > -
> > name: version-set
> > doc: set nfs enabled versions
> > diff --git a/fs/nfsd/netlink.c b/fs/nfsd/netlink.c
> > index
> > ac51a44e1065ec3f1d88165f70a831a828b58394..887525964451e640304371e33aa4f415b4ff2848
> > 100644
> > --- a/fs/nfsd/netlink.c
> > +++ b/fs/nfsd/netlink.c
> > @@ -24,11 +24,12 @@ const struct nla_policy
> > nfsd_version_nl_policy[NFSD_A_VERSION_ENABLED + 1] = {
> > };
> >
> > /* NFSD_CMD_THREADS_SET - do */
> > -static const struct nla_policy
> > nfsd_threads_set_nl_policy[NFSD_A_SERVER_SCOPE + 1] = {
> > +static const struct nla_policy
> > nfsd_threads_set_nl_policy[NFSD_A_SERVER_MIN_THREADS + 1] = {
> > [NFSD_A_SERVER_THREADS] = { .type = NLA_U32, },
> > [NFSD_A_SERVER_GRACETIME] = { .type = NLA_U32, },
> > [NFSD_A_SERVER_LEASETIME] = { .type = NLA_U32, },
> > [NFSD_A_SERVER_SCOPE] = { .type = NLA_NUL_STRING, },
> > + [NFSD_A_SERVER_MIN_THREADS] = { .type = NLA_U32, },
> > };
> >
> > /* NFSD_CMD_VERSION_SET - do */
> > @@ -57,7 +58,7 @@ static const struct genl_split_ops nfsd_nl_ops[] = {
> > .cmd = NFSD_CMD_THREADS_SET,
> > .doit = nfsd_nl_threads_set_doit,
> > .policy = nfsd_threads_set_nl_policy,
> > - .maxattr = NFSD_A_SERVER_SCOPE,
> > + .maxattr = NFSD_A_SERVER_MIN_THREADS,
> > .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
> > },
> > {
> > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > index
> > 3e2d0fde80a7ce434ef2cce9f1666c2bd16ab2eb..1c3449810eaefea8167ddd284af7bd66cac7e211
> > 100644
> > --- a/fs/nfsd/netns.h
> > +++ b/fs/nfsd/netns.h
> > @@ -128,6 +128,12 @@ struct nfsd_net {
> > seqlock_t writeverf_lock;
> > unsigned char writeverf[8];
> >
> > + /*
> > + * Minimum number of threads to run per pool. If 0 then the
> > + * min == max requested number of threads.
> > + */
> > + unsigned int min_threads;
> > +
> > u32 clientid_base;
> > u32 clientid_counter;
> > u32 clverifier_counter;
> > diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c
> > index
> > 206534fccf36a992026669fee6533adff1062c36..a5401015e62499d07150cde8822f1e7dd0515dfe
> > 100644
> > --- a/fs/nfsd/nfsctl.c
> > +++ b/fs/nfsd/nfsctl.c
> > @@ -48,6 +48,7 @@ enum {
> > NFSD_Versions,
> > NFSD_Ports,
> > NFSD_MaxBlkSize,
> > + NFSD_MinThreads,
> > NFSD_Filecache,
> > NFSD_Leasetime,
> > NFSD_Gracetime,
> > @@ -67,6 +68,7 @@ static ssize_t write_pool_threads(struct file *file,
> > char *buf, size_t size);
> > static ssize_t write_versions(struct file *file, char *buf, size_t
> > size);
> > static ssize_t write_ports(struct file *file, char *buf, size_t size);
> > static ssize_t write_maxblksize(struct file *file, char *buf, size_t
> > size);
> > +static ssize_t write_minthreads(struct file *file, char *buf, size_t
> > size);
> > #ifdef CONFIG_NFSD_V4
> > static ssize_t write_leasetime(struct file *file, char *buf, size_t
> > size);
> > static ssize_t write_gracetime(struct file *file, char *buf, size_t
> > size);
> > @@ -85,6 +87,7 @@ static ssize_t (*const write_op[])(struct file *,
> > char *, size_t) = {
> > [NFSD_Versions] = write_versions,
> > [NFSD_Ports] = write_ports,
> > [NFSD_MaxBlkSize] = write_maxblksize,
> > + [NFSD_MinThreads] = write_minthreads,
> > #ifdef CONFIG_NFSD_V4
> > [NFSD_Leasetime] = write_leasetime,
> > [NFSD_Gracetime] = write_gracetime,
> > @@ -899,6 +902,46 @@ static ssize_t write_maxblksize(struct file *file, char *buf, size_t size)
> > nfsd_max_blksize);
> > }
> >
> > +/*
> > + * write_minthreads - Set or report the current min number of threads
> > + *
> > + * Input:
> > + * buf: ignored
> > + * size: zero
> > + * OR
> > + *
> > + * Input:
> > + * buf: C string containing an unsigned
> > + * integer value representing the new
> > + * max number of threads
>
> s/max number of threads/min number of threads
>
Will fix.
>
> > + * size: non-zero length of C string in @buf
> > + * Output:
> > + * On success: passed-in buffer filled with '\n'-terminated C string
> > + * containing numeric value of min_threads setting
> > + * for this net namespace;
> > + * return code is the size in bytes of the string
> > + * On error: return code is zero or a negative errno value
> > + */
> > +static ssize_t write_minthreads(struct file *file, char *buf, size_t size)
> > +{
> > + char *mesg = buf;
> > + struct nfsd_net *nn = net_generic(netns(file), nfsd_net_id);
> > + unsigned int minthreads = nn->min_threads;
> > +
> > + if (size > 0) {
>
> What if @size is a very large number?
>
Then it'll basically be ignored and you'll get a static "threads" count
of threads.
>
> > + int rv = get_uint(&mesg, &minthreads);
> > +
> > + if (rv)
> > + return rv;
> > + trace_nfsd_ctl_minthreads(netns(file), minthreads);
> > + mutex_lock(&nfsd_mutex);
> > + nn->min_threads = minthreads;
> > + mutex_unlock(&nfsd_mutex);
> > + }
> > +
> > + return scnprintf(buf, SIMPLE_TRANSACTION_LIMIT, "%u\n", minthreads);
> > +}
> > +
> > #ifdef CONFIG_NFSD_V4
> > static ssize_t __nfsd4_write_time(struct file *file, char *buf, size_t size,
> > 				  time64_t *time, struct nfsd_net *nn)
> > @@ -1292,6 +1335,7 @@ static int nfsd_fill_super(struct super_block *sb, struct fs_context *fc)
> > [NFSD_Versions] = {"versions", &transaction_ops, S_IWUSR|S_IRUSR},
> > [NFSD_Ports] = {"portlist", &transaction_ops, S_IWUSR|S_IRUGO},
> > [NFSD_MaxBlkSize] = {"max_block_size", &transaction_ops, S_IWUSR|S_IRUGO},
> > + [NFSD_MinThreads] = {"min_threads", &transaction_ops, S_IWUSR|S_IRUGO},
> > [NFSD_Filecache] = {"filecache", &nfsd_file_cache_stats_fops, S_IRUGO},
> > #ifdef CONFIG_NFSD_V4
> > [NFSD_Leasetime] = {"nfsv4leasetime", &transaction_ops, S_IWUSR|S_IRUSR},
> > @@ -1636,6 +1680,10 @@ int nfsd_nl_threads_set_doit(struct sk_buff *skb, struct genl_info *info)
> > scope = nla_data(attr);
> > }
> >
> > + attr = info->attrs[NFSD_A_SERVER_MIN_THREADS];
> > + if (attr)
> > + nn->min_threads = nla_get_u32(attr);
> > +
> > ret = nfsd_svc(nrpools, nthreads, net, get_current_cred(), scope);
> > if (ret > 0)
> > ret = 0;
> > @@ -1675,6 +1723,8 @@ int nfsd_nl_threads_get_doit(struct sk_buff *skb, struct genl_info *info)
> > nn->nfsd4_grace) ||
> > nla_put_u32(skb, NFSD_A_SERVER_LEASETIME,
> > nn->nfsd4_lease) ||
> > + nla_put_u32(skb, NFSD_A_SERVER_MIN_THREADS,
> > + nn->min_threads) ||
> > nla_put_string(skb, NFSD_A_SERVER_SCOPE,
> > nn->nfsd_name);
> > if (err)
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 26c3a6cb1f400f1b757d26f6ba77e27deb7e8ee2..d6120dd843ac1b6a42f0ef331700f4d6d70d8c38 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
> > }
> >
> > /* Kill outstanding nfsd threads */
> > - svc_set_num_threads(serv, 0, 0);
> > + svc_set_num_threads(serv, 0, nn->min_threads);
>
> Seems like this could actually /start/ threads during NFSD shutdown.
> At the very least it needs an explanatory comment.
>
I can add a comment, but if min-threads > threads, then min-threads
will have no effect.
>
> > nfsd_destroy_serv(net);
> > mutex_unlock(&nfsd_mutex);
> > }
> > @@ -704,7 +704,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> >
> > /* Special case: When n == 1, distribute threads equally among pools. */
> > if (n == 1)
> > - return svc_set_num_threads(nn->nfsd_serv, nthreads[0], 0);
> > + return svc_set_num_threads(nn->nfsd_serv, nthreads[0], nn->min_threads);
> >
> > if (n > nn->nfsd_serv->sv_nrpools)
> > n = nn->nfsd_serv->sv_nrpools;
> > @@ -732,7 +732,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> > for (i = 0; i < n; i++) {
> > err = svc_set_pool_threads(nn->nfsd_serv,
> > &nn->nfsd_serv->sv_pools[i],
> > - nthreads[i], 0);
> > + nthreads[i], nn->min_threads);
> > if (err)
> > goto out;
> > }
> > @@ -741,7 +741,7 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> > for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
> > err = svc_set_pool_threads(nn->nfsd_serv,
> > &nn->nfsd_serv->sv_pools[i],
> > - 0, 0);
> > + 0, nn->min_threads);
> > if (err)
> > goto out;
> > }
> > diff --git a/fs/nfsd/trace.h b/fs/nfsd/trace.h
> > index 8885fd9bead98ebf55379d68ab9c3701981a5150..d1d0b0dd054588a8c20e3386356dfa4e9632b8e0 100644
> > --- a/fs/nfsd/trace.h
> > +++ b/fs/nfsd/trace.h
> > @@ -2164,6 +2164,25 @@ TRACE_EVENT(nfsd_ctl_maxblksize,
> > )
> > );
> >
> > +TRACE_EVENT(nfsd_ctl_minthreads,
> > + TP_PROTO(
> > + const struct net *net,
> > + int minthreads
> > + ),
> > + TP_ARGS(net, minthreads),
> > + TP_STRUCT__entry(
> > + __field(unsigned int, netns_ino)
> > + __field(int, minthreads)
> > + ),
> > + TP_fast_assign(
> > + __entry->netns_ino = net->ns.inum;
> > + __entry->minthreads = minthreads;
> > + ),
> > + TP_printk("minthreads=%d",
> > + __entry->minthreads
> > + )
> > +);
> > +
> > TRACE_EVENT(nfsd_ctl_time,
> > TP_PROTO(
> > const struct net *net,
> > diff --git a/include/uapi/linux/nfsd_netlink.h b/include/uapi/linux/nfsd_netlink.h
> > index e157e2009ea8c1ef805301261d536c82677821ef..e9efbc9e63d83ed25fcd790b7a877c0023638f15 100644
> > --- a/include/uapi/linux/nfsd_netlink.h
> > +++ b/include/uapi/linux/nfsd_netlink.h
> > @@ -35,6 +35,7 @@ enum {
> > NFSD_A_SERVER_GRACETIME,
> > NFSD_A_SERVER_LEASETIME,
> > NFSD_A_SERVER_SCOPE,
> > + NFSD_A_SERVER_MIN_THREADS,
> >
> > __NFSD_A_SERVER_MAX,
> > NFSD_A_SERVER_MAX = (__NFSD_A_SERVER_MAX - 1)
> >
> > --
> > 2.52.0
>
> Thanks, Neil and Jeff, for pulling all this together. It's good to
> have something we can noodle on now.
>
Definitely. Neil's patches were a great start to this.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH RFC 1/6] sunrpc: split svc_set_num_threads() into two functions
2025-12-13 19:29 ` Chuck Lever
@ 2025-12-13 21:55 ` Jeff Layton
0 siblings, 0 replies; 17+ messages in thread
From: Jeff Layton @ 2025-12-13 21:55 UTC (permalink / raw)
To: Chuck Lever, Chuck Lever, NeilBrown, Olga Kornievskaia, Dai Ngo,
Tom Talpey, Trond Myklebust, Anna Schumaker
Cc: linux-nfs, linux-kernel
On Sat, 2025-12-13 at 14:29 -0500, Chuck Lever wrote:
>
> On Fri, Dec 12, 2025, at 5:39 PM, Jeff Layton wrote:
> > svc_set_num_threads() will set the number of running threads for a given
> > pool. If the pool argument is set to NULL however, it will distribute
> > the threads among all of the pools evenly.
> >
> > These divergent codepaths complicate the move to dynamic threading.
> > Simplify the API by splitting these two cases into different helpers:
> >
> > Add a new svc_set_pool_threads() function that sets the number of
> > threads in a single, given pool. Modify svc_set_num_threads() to
> > distribute the threads evenly between all of the pools and then call
> > svc_set_pool_threads() for each.
> >
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > fs/lockd/svc.c | 4 ++--
> > fs/nfs/callback.c | 8 +++----
> > fs/nfsd/nfssvc.c | 21 ++++++++----------
> > include/linux/sunrpc/svc.h | 3 ++-
> > net/sunrpc/svc.c | 54 +++++++++++++++++++++++++++++++++++++---------
> > 5 files changed, 61 insertions(+), 29 deletions(-)
> >
> > diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c
> > index d68afa196535a8785bab2931c2b14f03a1174ef9..fbf132b4e08d11a91784c21ee0209fd7c149fd9d 100644
> > --- a/fs/lockd/svc.c
> > +++ b/fs/lockd/svc.c
> > @@ -340,7 +340,7 @@ static int lockd_get(void)
> > return -ENOMEM;
> > }
> >
> > - error = svc_set_num_threads(serv, NULL, 1);
> > + error = svc_set_num_threads(serv, 1);
> > if (error < 0) {
> > svc_destroy(&serv);
> > return error;
> > @@ -368,7 +368,7 @@ static void lockd_put(void)
> > unregister_inet6addr_notifier(&lockd_inet6addr_notifier);
> > #endif
> >
> > - svc_set_num_threads(nlmsvc_serv, NULL, 0);
> > + svc_set_num_threads(nlmsvc_serv, 0);
> > timer_delete_sync(&nlmsvc_retry);
> > svc_destroy(&nlmsvc_serv);
> > dprintk("lockd_down: service destroyed\n");
> > diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
> > index c8b837006bb27277ab34fe516f1b63992fee9b7f..44b35b7f8dc022f1d8c069eaf2f7d334c93f77fc 100644
> > --- a/fs/nfs/callback.c
> > +++ b/fs/nfs/callback.c
> > @@ -119,9 +119,9 @@ static int nfs_callback_start_svc(int minorversion, struct rpc_xprt *xprt,
> > if (serv->sv_nrthreads == nrservs)
> > return 0;
> >
> > - ret = svc_set_num_threads(serv, NULL, nrservs);
> > + ret = svc_set_num_threads(serv, nrservs);
> > if (ret) {
> > - svc_set_num_threads(serv, NULL, 0);
> > + svc_set_num_threads(serv, 0);
> > return ret;
> > }
> > dprintk("nfs_callback_up: service started\n");
> > @@ -242,7 +242,7 @@ int nfs_callback_up(u32 minorversion, struct rpc_xprt *xprt)
> > cb_info->users++;
> > err_net:
> > if (!cb_info->users) {
> > - svc_set_num_threads(cb_info->serv, NULL, 0);
> > + svc_set_num_threads(cb_info->serv, 0);
> > svc_destroy(&cb_info->serv);
> > }
> > err_create:
> > @@ -268,7 +268,7 @@ void nfs_callback_down(int minorversion, struct net *net)
> > nfs_callback_down_net(minorversion, serv, net);
> > cb_info->users--;
> > if (cb_info->users == 0) {
> > - svc_set_num_threads(serv, NULL, 0);
> > + svc_set_num_threads(serv, 0);
> > dprintk("nfs_callback_down: service destroyed\n");
> > svc_destroy(&cb_info->serv);
> > }
> > diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
> > index 93f7435cafd2362d9ddb28815277c824067cb370..aafec8ff588b85b0e26d40b76ef00953dc6472b4 100644
> > --- a/fs/nfsd/nfssvc.c
> > +++ b/fs/nfsd/nfssvc.c
> > @@ -594,7 +594,7 @@ void nfsd_shutdown_threads(struct net *net)
> > }
> >
> > /* Kill outstanding nfsd threads */
> > - svc_set_num_threads(serv, NULL, 0);
> > + svc_set_num_threads(serv, 0);
> > nfsd_destroy_serv(net);
> > mutex_unlock(&nfsd_mutex);
> > }
> > @@ -702,12 +702,9 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> > if (nn->nfsd_serv == NULL || n <= 0)
> > return 0;
> >
> > - /*
> > - * Special case: When n == 1, pass in NULL for the pool, so that the
> > - * change is distributed equally among them.
> > - */
> > + /* Special case: When n == 1, distribute threads equally among pools. */
> > if (n == 1)
> > - return svc_set_num_threads(nn->nfsd_serv, NULL, nthreads[0]);
> > + return svc_set_num_threads(nn->nfsd_serv, nthreads[0]);
> >
> > if (n > nn->nfsd_serv->sv_nrpools)
> > n = nn->nfsd_serv->sv_nrpools;
> > @@ -733,18 +730,18 @@ int nfsd_set_nrthreads(int n, int *nthreads, struct net *net)
> >
> > /* apply the new numbers */
> > for (i = 0; i < n; i++) {
> > - err = svc_set_num_threads(nn->nfsd_serv,
> > - &nn->nfsd_serv->sv_pools[i],
> > - nthreads[i]);
> > + err = svc_set_pool_threads(nn->nfsd_serv,
> > + &nn->nfsd_serv->sv_pools[i],
> > + nthreads[i]);
> > if (err)
> > goto out;
> > }
> >
> > /* Anything undefined in array is considered to be 0 */
> > for (i = n; i < nn->nfsd_serv->sv_nrpools; ++i) {
> > - err = svc_set_num_threads(nn->nfsd_serv,
> > - &nn->nfsd_serv->sv_pools[i],
> > - 0);
> > + err = svc_set_pool_threads(nn->nfsd_serv,
> > + &nn->nfsd_serv->sv_pools[i],
> > + 0);
> > if (err)
> > goto out;
> > }
> > diff --git a/include/linux/sunrpc/svc.h b/include/linux/sunrpc/svc.h
> > index 5506d20857c318774cd223272d4b0022cc19ffb8..dd5fbbf8b3d39df6c17a7624edf344557fffd32c 100644
> > --- a/include/linux/sunrpc/svc.h
> > +++ b/include/linux/sunrpc/svc.h
> > @@ -446,7 +446,8 @@ struct svc_serv * svc_create_pooled(struct svc_program *prog,
> > struct svc_stat *stats,
> > unsigned int bufsize,
> > int (*threadfn)(void *data));
> > -int svc_set_num_threads(struct svc_serv *, struct svc_pool *, int);
> > +int svc_set_pool_threads(struct svc_serv *, struct svc_pool *, int);
> > +int svc_set_num_threads(struct svc_serv *, int);
> > int svc_pool_stats_open(struct svc_info *si, struct file *file);
> > void svc_process(struct svc_rqst *rqstp);
> > void svc_process_bc(struct rpc_rqst *req, struct svc_rqst *rqstp);
> > diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
> > index 4704dce7284eccc9e2bc64cf22947666facfa86a..3fe5a7f8e57e3fa3837265ec06884b357d5373ff 100644
> > --- a/net/sunrpc/svc.c
> > +++ b/net/sunrpc/svc.c
> > @@ -856,15 +856,12 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > }
> >
> > /**
> > - * svc_set_num_threads - adjust number of threads per RPC service
> > + * svc_set_pool_threads - adjust number of threads per pool
> > * @serv: RPC service to adjust
> > - * @pool: Specific pool from which to choose threads, or NULL
> > + * @pool: Specific pool from which to choose threads
> > * @nrservs: New number of threads for @serv (0 or less means kill all threads)
> > *
> > - * Create or destroy threads to make the number of threads for @serv the
> > - * given number. If @pool is non-NULL, change only threads in that pool;
> > - * otherwise, round-robin between all pools for @serv. @serv's
> > - * sv_nrthreads is adjusted for each thread created or destroyed.
> > + * Create or destroy threads in @pool to bring it to @nrservs.
> > *
> > * Caller must ensure mutual exclusion between this and server startup or
> > * shutdown.
> > @@ -873,12 +870,12 @@ svc_stop_kthreads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > * starting a thread.
> > */
> > int
> > -svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > +svc_set_pool_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > {
> > if (!pool)
> > - nrservs -= serv->sv_nrthreads;
> > - else
> > - nrservs -= pool->sp_nrthreads;
> > + return -EINVAL;
> > +
> > + nrservs -= pool->sp_nrthreads;
> >
> > if (nrservs > 0)
> > return svc_start_kthreads(serv, pool, nrservs);
> > @@ -886,6 +883,43 @@ svc_set_num_threads(struct svc_serv *serv, struct svc_pool *pool, int nrservs)
> > return svc_stop_kthreads(serv, pool, nrservs);
> > return 0;
> > }
> > +EXPORT_SYMBOL_GPL(svc_set_pool_threads);
> > +
> > +/**
> > + * svc_set_num_threads - adjust number of threads in serv
> > + * @serv: RPC service to adjust
> > + * @nrservs: New number of threads for @serv (0 or less means kill all threads)
> > + *
> > + * Create or destroy threads in @serv to bring it to @nrservs. If there
> > + * are multiple pools then the new threads or victims will be distributed
> > + * evenly among them.
> > + *
> > + * Caller must ensure mutual exclusion between this and server startup or
> > + * shutdown.
> > + *
> > + * Returns zero on success or a negative errno if an error occurred while
> > + * starting a thread.
> > + */
> > +int
> > +svc_set_num_threads(struct svc_serv *serv, int nrservs)
> > +{
> > + int base = nrservs / serv->sv_nrpools;
> > + int remain = nrservs % serv->sv_nrpools;
> > + int i, err;
> > +
> > + for (i = 0; i < serv->sv_nrpools; ++i) {
>
> If sv_nrpools happens to be zero, then the loop doesn't
> execute at all, and err is left containing stack garbage.
> Is sv_nrpools guaranteed to be non-zero? If not then err
> needs to be initialized before the loop runs. I see that
> nfsd_set_nrthreads() in fs/nfsd/nfssvc.c has "int err = 0"
> for a similar loop pattern.
>
sv_nrpools should always be non-zero. There are many places in the rpc
layer that depend on having at least one pool. From __svc_create:
	serv->sv_nrpools = npools;
	serv->sv_pools =
		kcalloc(serv->sv_nrpools, sizeof(struct svc_pool),
			GFP_KERNEL);
	if (!serv->sv_pools) {
		kfree(serv);
		return NULL;
	}
That said, kcalloc() returns ZERO_SIZE_PTR rather than NULL when "n" is 0,
so creation would quietly succeed with an empty pool array in that case.
None of the in-tree callers ever pass in 0 for npools, but that makes an
explicit check in __svc_create() worth having. I can add one.
>
> > + int threads = base;
> > +
> > + if (remain) {
> > + ++threads;
> > + --remain;
> > + }
> > + err = svc_set_pool_threads(serv, &serv->sv_pools[i], threads);
> > + if (err)
> > + break;
> > + }
> > + return err;
> > +}
> > EXPORT_SYMBOL_GPL(svc_set_num_threads);
> >
> > /**
> >
> > --
> > 2.52.0
--
Jeff Layton <jlayton@kernel.org>
Thread overview: 17+ messages
2025-12-12 22:39 [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool Jeff Layton
2025-12-12 22:39 ` [PATCH RFC 1/6] sunrpc: split svc_set_num_threads() into two functions Jeff Layton
2025-12-13 19:29 ` Chuck Lever
2025-12-13 21:55 ` Jeff Layton
2025-12-12 22:39 ` [PATCH RFC 2/6] sunrpc: remove special handling of NULL pool from svc_start/stop_kthreads() Jeff Layton
2025-12-12 22:39 ` [PATCH RFC 3/6] sunrpc: track the max number of requested threads in a pool Jeff Layton
2025-12-12 22:39 ` [PATCH RFC 4/6] sunrpc: introduce the concept of a minimum number of threads per pool Jeff Layton
2025-12-13 20:19 ` Chuck Lever
2025-12-13 21:38 ` Jeff Layton
2025-12-12 22:39 ` [PATCH RFC 5/6] nfsd: adjust number of running nfsd threads based on activity Jeff Layton
2025-12-13 20:54 ` Chuck Lever
2025-12-13 21:43 ` Jeff Layton
2025-12-12 22:39 ` [PATCH RFC 6/6] nfsd: add controls to set the minimum number of threads per pool Jeff Layton
2025-12-13 21:10 ` Chuck Lever
2025-12-13 21:47 ` Jeff Layton
2025-12-13 19:34 ` [PATCH RFC 0/6] nfsd: allow for a dynamically-sized threadpool Chuck Lever
2025-12-13 21:35 ` Jeff Layton