Linux NFS development
From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <cel@kernel.org>,
	Trond Myklebust <trondmy@kernel.org>,
	Anna Schumaker <anna@kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neil@brown.name>,
	Olga Kornievskaia <okorniev@redhat.com>,
	Dai Ngo <Dai.Ngo@oracle.com>, Tom Talpey <tom@talpey.com>
Cc: linux-nfs@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC] sunrpc: hardcode pool_mode to pernode, remove other modes
Date: Thu, 23 Apr 2026 20:26:26 -0400
Message-ID: <55bfc1442ee1a80723520b714fc5f358d11c4c38.camel@kernel.org>
In-Reply-To: <f61f49b8-c9c7-4a19-ba03-7f5967fcced8@app.fastmail.com>

On Thu, 2026-04-23 at 12:45 -0400, Chuck Lever wrote:
> On Thu, Apr 23, 2026, at 10:39 AM, Jeff Layton wrote:
> > The SVC_POOL_AUTO/GLOBAL/PERCPU/PERNODE pool mode selection machinery
> > was added when NUMA was new and the right default was unclear.  Today,
> > pernode is the right choice everywhere:
> > 
> > - On multi-NUMA hosts, it gives one pool per node with proper thread
> >   affinity and NUMA-local memory allocation.
> > - On single-node hosts, pernode degenerates to exactly one pool,
> >   identical to the old "global" mode -- svc_pool_for_cpu() short-
> >   circuits when sv_nrpools <= 1, no CPU affinity is set, and memory
> >   is allocated from the single node.
> > 
> > The percpu mode (one pool per CPU) created excessive pools relative to
> > the number of threads most deployments run, and was only auto-selected
> > in a narrow case (single node, >2 CPUs).
> > 
> > Remove the SVC_POOL_* enum, mode selection heuristic,
> > svc_pool_map_init_percpu(), and all mode-based switch statements.
> > Simplify pool map functions to always use the pernode path.
> > 
> > The module parameter and netlink interfaces are preserved for backward
> > compatibility:
> > - Writing "pernode" succeeds; any other value returns -EINVAL
> > - Reading always returns "pernode"
> > - Writing to the module parameter emits a deprecation notice
> > 
> > Assisted-by: Claude:claude-opus-4-6
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > In hindsight, I wish I had proposed this before adding the pool-mode
> > settings to the new nfsd netlink interfaces.
> > 
> > If this is too radical as a single step, we could just switch the
> > default to "pernode", add a warning, and leave the interfaces alone
> > for now. Or maybe do that and also go ahead and remove the percpu
> > setting?
> > 
> > Thoughts?
> 
> Generally, I think the end goal of making "per-node" the default
> and only setting is correct. My comments below are about how we
> get there.
> 
> The main concern is not perturbing any configuration that today
> happens to set the module parameter.
> 
> The proposed patch correctly preserves the shape of the user/kernel
> interfaces (same module parameter name and perms, same netlink command
> and attribute, same exported symbol names and signatures). The concern
> is that the accepted input set has narrowed from four strings to one
> and the setters reject the legacy three with -EINVAL. For the module
> parameter that path runs at module load, so existing modprobe.d configs
> written any time in the last ~19 years will cause load-time parameter
> errors. Some might categorize that as a regression.
>                                       
> The commit message documents that non-"pernode" writes now return
> -EINVAL. The historically correct approach for this kind of obsoleted
> tuning knob is to accept-and-ignore the old values (plus the pr_notice)
> rather than hard-fail.
> 
> Or, put another way, the proposed patch implements something slightly
> different from true backward compatibility. Userspace that previously
> set pool_mode=global/percpu/auto now gets -EINVAL, which technically
> speaking is a behavioral narrowing, not "backward compatibility."
> 
> I might go even further and suggest that perhaps for v7.2:
> 
> - Change the behavior of "auto" to be per-node

That's already the case if you're on a NUMA box. The problem is that
the default is not "auto" but "global". We could change that (easily),
but I'd argue that "auto" mode just devolves into "pernode" for two
reasons:

- "pernode" is functionally identical to "global" when there is only a
single NUMA node

- "percpu" mode is basically useless

If we want to "go slow" on this, then what I'd probably suggest is that
we just change the default to "pernode" initially.

Single-node boxes that have this set to global today should see no real
change (the two modes are functionally identical there). Boxes with
multiple NUMA nodes that don't set it now will start using pernode mode,
but that's probably a good thing in most cases.
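To illustrate that equivalence (a standalone sketch with made-up struct
and helper names, not the actual net/sunrpc code): with a single pool,
the per-CPU lookup short-circuits to pool 0, which is why pernode and
global are indistinguishable on a one-node box.

```c
#include <assert.h>

/* Standalone sketch (hypothetical names, not the real net/sunrpc
 * code) of the short-circuit described above: with one pool, every
 * CPU maps to pool 0, so "pernode" on a single-node box behaves
 * exactly like the old "global" mode. */
struct pool { int id; };

struct serv {
	unsigned int nrpools;
	struct pool *pools;	/* array of nrpools entries */
	const int *cpu_to_node;	/* hypothetical cpu -> node map */
};

static struct pool *pool_for_cpu(struct serv *serv, int cpu)
{
	if (serv->nrpools <= 1)	/* single pool: short-circuit */
		return &serv->pools[0];
	return &serv->pools[serv->cpu_to_node[cpu]];
}
```

With nrpools == 1 the cpu_to_node map is never even consulted, which is
the "no CPU affinity is set" half of the argument above.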


> - Add a deprecation warning that is emitted when setting other modes,
>   but don't warn when the value is specifically the only accepted one.
> 

We probably do still want to warn in that case. Once we remove the
option entirely, module load could fail, so we want to discourage
people from keeping the setting.
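Concretely, what I'd have in mind is something like this (a standalone
sketch of just the string handling, with a hypothetical helper name and
a stand-in errno constant; not the real module parameter setter):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MY_EINVAL 22	/* stand-in for the kernel's EINVAL */

/* Standalone sketch of the accept-and-warn approach discussed above
 * (hypothetical helper, not the actual sunrpc param_set routine):
 * every historical value parses successfully but is ignored, and any
 * write -- including "pernode" itself -- draws a deprecation notice,
 * since the whole parameter is slated for removal. */
static int set_pool_mode(const char *val)
{
	printf("sunrpc: pool_mode is deprecated and will be removed\n");

	if (!strcmp(val, "pernode") || !strcmp(val, "auto") ||
	    !strcmp(val, "global") || !strcmp(val, "percpu"))
		return 0;	/* accept (and ignore) legacy values */

	return -MY_EINVAL;	/* still reject unknown strings */
}
```

That keeps old modprobe.d configs loading while still nudging people to
drop the setting before it disappears.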

> Then wait a few more cycles before removal of percpu and global.
> 

That's fair. We can stage this in slowly.

I'm terrible about following up with the later phases of long term
deprecation though. Suggestions on how to make sure we don't drop the
ball on finishing the change?

> Other notes:
> 
> o The existing pernode path assumes every online node has both CPUs and
>   some memory. nr_online_nodes on some platforms (e.g., certain ARM64,
>   CXL) counts memoryless or CPU-less nodes; for_each_node_with_cpus()
>   vs. for_each_online_node() matters here. See
>   svc_pool_map_init_pernode().
> 

Good point. We should probably change that too.
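Roughly, the sizing fix would look like this (a standalone sketch of
the counting logic only, with plain arrays standing in for the kernel's
node masks; the real code would iterate with for_each_node_with_cpus()
instead of for_each_online_node()):

```c
#include <assert.h>

/* Standalone sketch of the sizing question above: count pools from
 * "online nodes that have CPUs" rather than all online nodes. On
 * platforms with memoryless or CPU-less nodes (certain ARM64, CXL),
 * the two counts differ, and the pernode init wants the former.
 * An int array stands in for the kernel's node/CPU masks. */
static unsigned int count_pools(const int *node_has_cpus,
				int nr_online_nodes)
{
	unsigned int pools = 0;

	for (int n = 0; n < nr_online_nodes; n++)
		if (node_has_cpus[n])	/* skip CPU-less nodes */
			pools++;

	return pools ? pools : 1;	/* always at least one pool */
}
```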

> o Recommend review of some history on this topic:
>   https://lore.kernel.org/linux-nfs/f027319a-378d-4b91-a418-c45218fb7a21@oracle.com/
> 

Thanks. I'll read up.
-- 
Jeff Layton <jlayton@kernel.org>

Thread overview: 4+ messages
2026-04-23 14:39 [PATCH RFC] sunrpc: hardcode pool_mode to pernode, remove other modes Jeff Layton
2026-04-23 16:45 ` Chuck Lever
2026-04-24  0:26   ` Jeff Layton [this message]
2026-04-24 14:02     ` Chuck Lever
