* [PATCH RFC] sunrpc: hardcode pool_mode to pernode, remove other modes
@ 2026-04-23 14:39 Jeff Layton
From: Jeff Layton @ 2026-04-23 14:39 UTC (permalink / raw)
To: Trond Myklebust, Anna Schumaker, Chuck Lever, NeilBrown,
Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel, Jeff Layton
The SVC_POOL_AUTO/GLOBAL/PERCPU/PERNODE pool mode selection machinery
was added when NUMA was new and the right default was unclear. Today,
pernode is the right choice everywhere:
- On multi-NUMA hosts, it gives one pool per node with proper thread
affinity and NUMA-local memory allocation.
- On single-node hosts, pernode degenerates to exactly one pool,
identical to the old "global" mode -- svc_pool_for_cpu() short-
circuits when sv_nrpools <= 1, no CPU affinity is set, and memory
is allocated from the single node.
The percpu mode (one pool per CPU) created excessive pools relative to
the number of threads most deployments run, and was only auto-selected
in a narrow case (single node, >2 CPUs).
Remove the SVC_POOL_* enum, mode selection heuristic,
svc_pool_map_init_percpu(), and all mode-based switch statements.
Simplify pool map functions to always use the pernode path.
The module parameter and netlink interfaces are preserved for backward
compatibility:
- Writing "pernode" succeeds; any other value returns -EINVAL
- Reading always returns "pernode"
- Writing to the module parameter emits a deprecation notice
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
In hindsight, I wish I had proposed this before adding the pool-mode
settings to the new nfsd netlink interfaces.
If this is too radical as a single step, we could just switch the
default to "pernode", add a warning and leave the interfaces alone for
now. Or maybe do that and go ahead and remove the percpu setting?
Thoughts?
---
net/sunrpc/svc.c | 231 +++++++------------------------------------------------
1 file changed, 29 insertions(+), 202 deletions(-)
diff --git a/net/sunrpc/svc.c b/net/sunrpc/svc.c
index 576fa42e7abf..8ecdd95c4867 100644
--- a/net/sunrpc/svc.c
+++ b/net/sunrpc/svc.c
@@ -38,19 +38,6 @@
static void svc_unregister(const struct svc_serv *serv, struct net *net);
-#define SVC_POOL_DEFAULT SVC_POOL_GLOBAL
-
-/*
- * Mode for mapping cpus to pools.
- */
-enum {
- SVC_POOL_AUTO = -1, /* choose one of the others */
- SVC_POOL_GLOBAL, /* no mapping, just a single global pool
- * (legacy & UP mode) */
- SVC_POOL_PERCPU, /* one pool per cpu */
- SVC_POOL_PERNODE /* one pool per numa node */
-};
-
/*
* Structure for mapping cpus to pools and vice versa.
* Setup once during sunrpc initialisation.
@@ -58,62 +45,20 @@ enum {
struct svc_pool_map {
int count; /* How many svc_servs use us */
- int mode; /* Note: int not enum to avoid
- * warnings about "enumeration value
- * not handled in switch" */
unsigned int npools;
- unsigned int *pool_to; /* maps pool id to cpu or node */
- unsigned int *to_pool; /* maps cpu or node to pool id */
+ unsigned int *pool_to; /* maps pool id to node */
+ unsigned int *to_pool; /* maps node to pool id */
};
-static struct svc_pool_map svc_pool_map = {
- .mode = SVC_POOL_DEFAULT
-};
+static struct svc_pool_map svc_pool_map;
static DEFINE_MUTEX(svc_pool_map_mutex);/* protects svc_pool_map.count only */
-static int
-__param_set_pool_mode(const char *val, struct svc_pool_map *m)
-{
- int err, mode;
-
- mutex_lock(&svc_pool_map_mutex);
-
- err = 0;
- if (!strncmp(val, "auto", 4))
- mode = SVC_POOL_AUTO;
- else if (!strncmp(val, "global", 6))
- mode = SVC_POOL_GLOBAL;
- else if (!strncmp(val, "percpu", 6))
- mode = SVC_POOL_PERCPU;
- else if (!strncmp(val, "pernode", 7))
- mode = SVC_POOL_PERNODE;
- else
- err = -EINVAL;
-
- if (err)
- goto out;
-
- if (m->count == 0)
- m->mode = mode;
- else if (mode != m->mode)
- err = -EBUSY;
-out:
- mutex_unlock(&svc_pool_map_mutex);
- return err;
-}
-
-static int
-param_set_pool_mode(const char *val, const struct kernel_param *kp)
-{
- struct svc_pool_map *m = kp->arg;
-
- return __param_set_pool_mode(val, m);
-}
-
int sunrpc_set_pool_mode(const char *val)
{
- return __param_set_pool_mode(val, &svc_pool_map);
+ if (strncmp(val, "pernode", 7))
+ return -EINVAL;
+ return 0;
}
EXPORT_SYMBOL(sunrpc_set_pool_mode);
@@ -122,84 +67,32 @@ EXPORT_SYMBOL(sunrpc_set_pool_mode);
* @buf: where to write the current pool_mode
* @size: size of @buf
*
- * Grab the current pool_mode from the svc_pool_map and write
- * the resulting string to @buf. Returns the number of characters
+ * Write the pool_mode string to @buf. Returns the number of characters
* written to @buf (a'la snprintf()).
*/
int
sunrpc_get_pool_mode(char *buf, size_t size)
{
- struct svc_pool_map *m = &svc_pool_map;
-
- switch (m->mode)
- {
- case SVC_POOL_AUTO:
- return snprintf(buf, size, "auto");
- case SVC_POOL_GLOBAL:
- return snprintf(buf, size, "global");
- case SVC_POOL_PERCPU:
- return snprintf(buf, size, "percpu");
- case SVC_POOL_PERNODE:
- return snprintf(buf, size, "pernode");
- default:
- return snprintf(buf, size, "%d", m->mode);
- }
+ return snprintf(buf, size, "pernode");
}
EXPORT_SYMBOL(sunrpc_get_pool_mode);
static int
-param_get_pool_mode(char *buf, const struct kernel_param *kp)
+param_set_pool_mode(const char *val, const struct kernel_param *kp)
{
- char str[16];
- int len;
-
- len = sunrpc_get_pool_mode(str, ARRAY_SIZE(str));
-
- /* Ensure we have room for newline and NUL */
- len = min_t(int, len, ARRAY_SIZE(str) - 2);
-
- /* tack on the newline */
- str[len] = '\n';
- str[len + 1] = '\0';
-
- return sysfs_emit(buf, "%s", str);
+ pr_notice("sunrpc: the pool_mode parameter is deprecated and will be removed in a future release.\n");
+ return sunrpc_set_pool_mode(val);
}
-module_param_call(pool_mode, param_set_pool_mode, param_get_pool_mode,
- &svc_pool_map, 0644);
-
-/*
- * Detect best pool mapping mode heuristically,
- * according to the machine's topology.
- */
static int
-svc_pool_map_choose_mode(void)
+param_get_pool_mode(char *buf, const struct kernel_param *kp)
{
- unsigned int node;
-
- if (nr_online_nodes > 1) {
- /*
- * Actually have multiple NUMA nodes,
- * so split pools on NUMA node boundaries
- */
- return SVC_POOL_PERNODE;
- }
-
- node = first_online_node;
- if (nr_cpus_node(node) > 2) {
- /*
- * Non-trivial SMP, or CONFIG_NUMA on
- * non-NUMA hardware, e.g. with a generic
- * x86_64 kernel on Xeons. In this case we
- * want to divide the pools on cpu boundaries.
- */
- return SVC_POOL_PERCPU;
- }
-
- /* default: one global pool */
- return SVC_POOL_GLOBAL;
+ return sysfs_emit(buf, "pernode\n");
}
+module_param_call(pool_mode, param_set_pool_mode, param_get_pool_mode,
+ NULL, 0644);
+
/*
* Allocate the to_pool[] and pool_to[] arrays.
* Returns 0 on success or an errno.
@@ -224,35 +117,7 @@ svc_pool_map_alloc_arrays(struct svc_pool_map *m, unsigned int maxpools)
}
/*
- * Initialise the pool map for SVC_POOL_PERCPU mode.
- * Returns number of pools or <0 on error.
- */
-static int
-svc_pool_map_init_percpu(struct svc_pool_map *m)
-{
- unsigned int maxpools = nr_cpu_ids;
- unsigned int pidx = 0;
- unsigned int cpu;
- int err;
-
- err = svc_pool_map_alloc_arrays(m, maxpools);
- if (err)
- return err;
-
- for_each_online_cpu(cpu) {
- BUG_ON(pidx >= maxpools);
- m->to_pool[cpu] = pidx;
- m->pool_to[pidx] = cpu;
- pidx++;
- }
- /* cpus brought online later all get mapped to pool0, sorry */
-
- return pidx;
-};
-
-
-/*
- * Initialise the pool map for SVC_POOL_PERNODE mode.
+ * Initialise the pool map for one pool per NUMA node.
* Returns number of pools or <0 on error.
*/
static int
@@ -284,14 +149,13 @@ svc_pool_map_init_pernode(struct svc_pool_map *m)
* Add a reference to the global map of cpus to pools (and
* vice versa) if pools are in use.
* Initialise the map if we're the first user.
- * Returns the number of pools. If this is '1', no reference
- * was taken.
+ * Returns the number of pools.
*/
static unsigned int
svc_pool_map_get(void)
{
struct svc_pool_map *m = &svc_pool_map;
- int npools = -1;
+ int npools;
mutex_lock(&svc_pool_map_mutex);
if (m->count++) {
@@ -299,23 +163,9 @@ svc_pool_map_get(void)
return m->npools;
}
- if (m->mode == SVC_POOL_AUTO)
- m->mode = svc_pool_map_choose_mode();
-
- switch (m->mode) {
- case SVC_POOL_PERCPU:
- npools = svc_pool_map_init_percpu(m);
- break;
- case SVC_POOL_PERNODE:
- npools = svc_pool_map_init_pernode(m);
- break;
- }
-
- if (npools <= 0) {
- /* default, or memory allocation failure */
+ npools = svc_pool_map_init_pernode(m);
+ if (npools <= 0)
npools = 1;
- m->mode = SVC_POOL_GLOBAL;
- }
m->npools = npools;
mutex_unlock(&svc_pool_map_mutex);
return npools;
@@ -346,14 +196,11 @@ static int svc_pool_map_get_node(unsigned int pidx)
{
const struct svc_pool_map *m = &svc_pool_map;
- if (m->count) {
- if (m->mode == SVC_POOL_PERCPU)
- return cpu_to_node(m->pool_to[pidx]);
- if (m->mode == SVC_POOL_PERNODE)
- return m->pool_to[pidx];
- }
+ if (m->count)
+ return m->pool_to[pidx];
return numa_mem_id();
}
+
/*
* Set the given thread's cpus_allowed mask so that it
* will only run on cpus in the given pool.
@@ -372,27 +219,15 @@ svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx)
if (m->count == 0)
return;
- switch (m->mode) {
- case SVC_POOL_PERCPU:
- {
- set_cpus_allowed_ptr(task, cpumask_of(node));
- break;
- }
- case SVC_POOL_PERNODE:
- {
- set_cpus_allowed_ptr(task, cpumask_of_node(node));
- break;
- }
- }
+ set_cpus_allowed_ptr(task, cpumask_of_node(node));
}
/**
* svc_pool_for_cpu - Select pool to run a thread on this cpu
* @serv: An RPC service
*
- * Use the active CPU and the svc_pool_map's mode setting to
- * select the svc thread pool to use. Once initialized, the
- * svc_pool_map does not change.
+ * Use the active CPU and the svc_pool_map to select the svc thread
+ * pool to use. Once initialized, the svc_pool_map does not change.
*
* Return value:
* A pointer to an svc_pool
@@ -400,20 +235,12 @@ svc_pool_map_set_cpumask(struct task_struct *task, unsigned int pidx)
struct svc_pool *svc_pool_for_cpu(struct svc_serv *serv)
{
struct svc_pool_map *m = &svc_pool_map;
- int cpu = raw_smp_processor_id();
- unsigned int pidx = 0;
+ unsigned int pidx;
if (serv->sv_nrpools <= 1)
return serv->sv_pools;
- switch (m->mode) {
- case SVC_POOL_PERCPU:
- pidx = m->to_pool[cpu];
- break;
- case SVC_POOL_PERNODE:
- pidx = m->to_pool[cpu_to_node(cpu)];
- break;
- }
+ pidx = m->to_pool[cpu_to_node(raw_smp_processor_id())];
return &serv->sv_pools[pidx % serv->sv_nrpools];
}
---
base-commit: 2e68039281932e6dc37718a1ea7cbb8e2cda42e6
change-id: 20260423-sunrpc-pool-mode-3e6b56320dc4
Best regards,
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH RFC] sunrpc: hardcode pool_mode to pernode, remove other modes
From: Chuck Lever @ 2026-04-23 16:45 UTC (permalink / raw)
To: Jeff Layton, Trond Myklebust, Anna Schumaker, Chuck Lever,
NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel
On Thu, Apr 23, 2026, at 10:39 AM, Jeff Layton wrote:
> The SVC_POOL_AUTO/GLOBAL/PERCPU/PERNODE pool mode selection machinery
> was added when NUMA was new and the right default was unclear. Today,
> pernode is the right choice everywhere:
>
> - On multi-NUMA hosts, it gives one pool per node with proper thread
> affinity and NUMA-local memory allocation.
> - On single-node hosts, pernode degenerates to exactly one pool,
> identical to the old "global" mode -- svc_pool_for_cpu() short-
> circuits when sv_nrpools <= 1, no CPU affinity is set, and memory
> is allocated from the single node.
>
> The percpu mode (one pool per CPU) created excessive pools relative to
> the number of threads most deployments run, and was only auto-selected
> in a narrow case (single node, >2 CPUs).
>
> Remove the SVC_POOL_* enum, mode selection heuristic,
> svc_pool_map_init_percpu(), and all mode-based switch statements.
> Simplify pool map functions to always use the pernode path.
>
> The module parameter and netlink interfaces are preserved for backward
> compatibility:
> - Writing "pernode" succeeds; any other value returns -EINVAL
> - Reading always returns "pernode"
> - Writing to the module parameter emits a deprecation notice
>
> Assisted-by: Claude:claude-opus-4-6
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
> In hindsight, I wish I had proposed this before adding the pool-mode
> settings to the new nfsd netlink interfaces.
>
> If this is too radical as a single step, we could just switch the
> default to "pernode", add a warning and leave the interfaces alone for
> now. Or maybe do that and go ahead and remove the percpu setting?
>
> Thoughts?
Generally, I think the end goal of making "per-node" the default
and only setting is correct. My comments below are about how we
get there.
The main concern is not perturbing any configuration that today
happens to set the module parameter.
The proposed patch correctly preserves the shape of the user/kernel
interfaces (same module parameter name and perms, same netlink command
and attribute, same exported symbol names and signatures). The concern
is that the accepted input set has narrowed from four strings to one
and the setters reject the legacy three with -EINVAL. For the module
parameter that path runs at module load, so existing modprobe.d configs
written any time in the last ~19 years will cause load-time parameter
errors. Some might categorize that as a regression.
The commit message documents that non-"pernode" writes now return
-EINVAL. The historically correct approach for this kind of obsoleted
tuning knob is to accept-and-ignore the old values (plus the pr_notice)
rather than hard-fail.
Or, put another way, the proposed patch implements something slightly
different than true backwards compatibility. Userspace that previously
set pool_mode=global/percpu/auto now gets -EINVAL, which technically
speaking is a behavioral narrowing, not "backward compatibility."
I might go even further and suggest that perhaps for v7.2:
- Change the behavior of "auto" to be per-node
- Add a deprecation warning that is emitted when setting other modes,
but don't warn when the value is specifically the only accepted one.
Then wait a few more cycles before removal of percpu and global.
Other notes:
o The existing pernode path assumes every online node has both CPUs and
some memory. nr_online_nodes on some platforms (e.g., certain ARM64,
CXL) counts memoryless or CPU-less nodes; for_each_node_with_cpus()
vs. for_each_online_node() matters here. See
svc_pool_map_init_pernode().
o Recommend review of some history on this topic:
https://lore.kernel.org/linux-nfs/f027319a-378d-4b91-a418-c45218fb7a21@oracle.com/
--
Chuck Lever
* Re: [PATCH RFC] sunrpc: hardcode pool_mode to pernode, remove other modes
From: Jeff Layton @ 2026-04-24 0:26 UTC (permalink / raw)
To: Chuck Lever, Trond Myklebust, Anna Schumaker, Chuck Lever,
NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel
On Thu, 2026-04-23 at 12:45 -0400, Chuck Lever wrote:
> On Thu, Apr 23, 2026, at 10:39 AM, Jeff Layton wrote:
> > The SVC_POOL_AUTO/GLOBAL/PERCPU/PERNODE pool mode selection machinery
> > was added when NUMA was new and the right default was unclear. Today,
> > pernode is the right choice everywhere:
> >
> > - On multi-NUMA hosts, it gives one pool per node with proper thread
> > affinity and NUMA-local memory allocation.
> > - On single-node hosts, pernode degenerates to exactly one pool,
> > identical to the old "global" mode -- svc_pool_for_cpu() short-
> > circuits when sv_nrpools <= 1, no CPU affinity is set, and memory
> > is allocated from the single node.
> >
> > The percpu mode (one pool per CPU) created excessive pools relative to
> > the number of threads most deployments run, and was only auto-selected
> > in a narrow case (single node, >2 CPUs).
> >
> > Remove the SVC_POOL_* enum, mode selection heuristic,
> > svc_pool_map_init_percpu(), and all mode-based switch statements.
> > Simplify pool map functions to always use the pernode path.
> >
> > The module parameter and netlink interfaces are preserved for backward
> > compatibility:
> > - Writing "pernode" succeeds; any other value returns -EINVAL
> > - Reading always returns "pernode"
> > - Writing to the module parameter emits a deprecation notice
> >
> > Assisted-by: Claude:claude-opus-4-6
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > In hindsight, I wish I had proposed this before adding the pool-mode
> > settings to the new nfsd netlink interfaces.
> >
> > If this is too radical as a single step, we could just switch the
> > default to "pernode", add a warning and leave the interfaces alone for
> > now. Or maybe do that and go ahead and remove the percpu setting?
> >
> > Thoughts?
>
> Generally, I think the end goal of making "per-node" the default
> and only setting is correct. My comments below are about how we
> get there.
>
> The main concern is not perturbing any configuration that today
> happens to set the module parameter.
>
> The proposed patch correctly preserves the shape of the user/kernel
> interfaces (same module parameter name and perms, same netlink command
> and attribute, same exported symbol names and signatures). The concern
> is that the accepted input set has narrowed from four strings to one
> and the setters reject the legacy three with -EINVAL. For the module
> parameter that path runs at module load, so existing modprobe.d configs
> written any time in the last ~19 years will cause load-time parameter
> errors. Some might categorize that as a regression.
>
> The commit message documents that non-"pernode" writes now return
> -EINVAL. The historically correct approach for this kind of obsoleted
> tuning knob is to accept-and-ignore the old values (plus the pr_notice)
> rather than hard-fail.
>
> Or, put another way, the proposed patch implements something slightly
> different than true backwards compatibility. Userspace that previously
> set pool_mode=global/percpu/auto now gets -EINVAL, which technically
> speaking is a behavioral narrowing, not "backward compatibility."
>
> I might go even further and suggest that perhaps for v7.2:
>
> - Change the behavior of "auto" to be per-node
That's already the case if you're on a NUMA box. The problem is that
the default is not "auto" but "global". We could change that (easily),
but I'd argue that "auto" mode just devolves into "pernode" for two
reasons:
- "pernode" is functionally identical to "global" when there is only a
single NUMA node
- "percpu" mode is basically useless
If we want to "go slow" on this, then what I'd probably suggest is that
we just change the default to "pernode" initially.
Single-node boxes that have this set to global today should see no real
change (functionally identical). Boxes with multiple NUMA nodes that
don't set it now will start using pernode mode, but that's probably a
good thing in most cases.
> - Add a deprecation warning that is emitted when setting other modes,
> but don't warn when the value is specifically the only accepted one.
>
We probably do still want to warn in that case. Once we remove the
option, the module load can fail, so we probably want to discourage
people from keeping the setting.
> Then wait a few more cycles before removal of percpu and global.
>
That's fair. We can stage this in slowly.
I'm terrible about following up with the later phases of long term
deprecation though. Suggestions on how to make sure we don't drop the
ball on finishing the change?
> Other notes:
>
> o The existing pernode path assumes every online node has both CPUs and
> some memory. nr_online_nodes on some platforms (e.g., certain ARM64,
> CXL) counts memoryless or CPU-less nodes; for_each_node_with_cpus()
> vs. for_each_online_node() matters here. See
> svc_pool_map_init_pernode().
>
Good point. We should probably change that too.
> o Recommend review of some history on this topic:
> https://lore.kernel.org/linux-nfs/f027319a-378d-4b91-a418-c45218fb7a21@oracle.com/
>
Thanks. I'll read up.
--
Jeff Layton <jlayton@kernel.org>
* Re: [PATCH RFC] sunrpc: hardcode pool_mode to pernode, remove other modes
From: Chuck Lever @ 2026-04-24 14:02 UTC (permalink / raw)
To: Jeff Layton, Trond Myklebust, Anna Schumaker, Chuck Lever,
NeilBrown, Olga Kornievskaia, Dai Ngo, Tom Talpey
Cc: linux-nfs, linux-kernel
On Thu, Apr 23, 2026, at 8:26 PM, Jeff Layton wrote:
> On Thu, 2026-04-23 at 12:45 -0400, Chuck Lever wrote:
>> On Thu, Apr 23, 2026, at 10:39 AM, Jeff Layton wrote:
>> > The SVC_POOL_AUTO/GLOBAL/PERCPU/PERNODE pool mode selection machinery
>> > was added when NUMA was new and the right default was unclear. Today,
>> > pernode is the right choice everywhere:
>> >
>> > - On multi-NUMA hosts, it gives one pool per node with proper thread
>> > affinity and NUMA-local memory allocation.
>> > - On single-node hosts, pernode degenerates to exactly one pool,
>> > identical to the old "global" mode -- svc_pool_for_cpu() short-
>> > circuits when sv_nrpools <= 1, no CPU affinity is set, and memory
>> > is allocated from the single node.
>> >
>> > The percpu mode (one pool per CPU) created excessive pools relative to
>> > the number of threads most deployments run, and was only auto-selected
>> > in a narrow case (single node, >2 CPUs).
>> >
>> > Remove the SVC_POOL_* enum, mode selection heuristic,
>> > svc_pool_map_init_percpu(), and all mode-based switch statements.
>> > Simplify pool map functions to always use the pernode path.
>> >
>> > The module parameter and netlink interfaces are preserved for backward
>> > compatibility:
>> > - Writing "pernode" succeeds; any other value returns -EINVAL
>> > - Reading always returns "pernode"
>> > - Writing to the module parameter emits a deprecation notice
>> >
>> > Assisted-by: Claude:claude-opus-4-6
>> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
>> > ---
>> > In hindsight, I wish I had proposed this before adding the pool-mode
>> > settings to the new nfsd netlink interfaces.
>> >
>> > If this is too radical as a single step, we could just switch the
>> > default to "pernode", add a warning and leave the interfaces alone for
>> > now. Or maybe do that and go ahead and remove the percpu setting?
>> >
>> > Thoughts?
>>
>> Generally, I think the end goal of making "per-node" the default
>> and only setting is correct. My comments below are about how we
>> get there.
>>
>> The main concern is not perturbing any configuration that today
>> happens to set the module parameter.
>>
>> The proposed patch correctly preserves the shape of the user/kernel
>> interfaces (same module parameter name and perms, same netlink command
>> and attribute, same exported symbol names and signatures). The concern
>> is that the accepted input set has narrowed from four strings to one
>> and the setters reject the legacy three with -EINVAL. For the module
>> parameter that path runs at module load, so existing modprobe.d configs
>> written any time in the last ~19 years will cause load-time parameter
>> errors. Some might categorize that as a regression.
>>
>> The commit message documents that non-"pernode" writes now return
>> -EINVAL. The historically correct approach for this kind of obsoleted
>> tuning knob is to accept-and-ignore the old values (plus the pr_notice)
>> rather than hard-fail.
>>
>> Or, put another way, the proposed patch implements something slightly
>> different than true backwards compatibility. Userspace that previously
>> set pool_mode=global/percpu/auto now gets -EINVAL, which technically
>> speaking is a behavioral narrowing, not "backward compatibility."
>>
>> I might go even further and suggest that perhaps for v7.2:
>>
>> - Change the behavior of "auto" to be per-node
>
> That's already the case if you're on a NUMA box. The problem is that
> the default is not "auto" but "global". We could change that (easily),
> but I'd argue that "auto" mode just devolves into "pernode" for two
> reasons:
>
> - "pernode" is functionally identical to "global" when there is only a
> single NUMA node
>
> - "percpu" mode is basically useless
>
> If we want to "go slow" on this, then what I'd probably suggest is that
> we just change the default to "pernode" initially.
>
> Single-node boxes that have this set to global today should see no real
> change (functionally identical). Boxes with multiple NUMA nodes that
> don't set it now will start using pernode mode, but that's probably a
> good thing in most cases.
OK, there seems to be a difference between "auto" and "default". That's
a little whacky.
>> - Add a deprecation warning that is emitted when setting other modes,
>> but don't warn when the value is specifically the only accepted one.
>>
>
> We probably do still want to warn in that case. Once we remove the
> option, the module load can fail, so we probably want to discourage
> people from keeping the setting.
I'm saying that setting a deprecated pool mode should not cause
module loading to fail at all. Failing to load will be more painful
than loading and having different thread affinity behavior.
--
Chuck Lever