public inbox for linux-nfs@vger.kernel.org
From: Tom Talpey <tom@talpey.com>
To: Chuck Lever III <chuck.lever@oracle.com>,
	Jeff Layton <jlayton@kernel.org>, Neil Brown <neilb@suse.de>
Cc: Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
	Olga Kornievskaia <kolga@netapp.com>,
	Dai Ngo <dai.ngo@oracle.com>, Steve Dickson <steved@redhat.com>
Subject: Re: [PATCH 13/14] nfsd: introduce concept of a maximum number of threads.
Date: Tue, 16 Jul 2024 14:49:59 -0400	[thread overview]
Message-ID: <05ab9c05-5d5d-4e51-9e38-7df1c2e60c28@talpey.com> (raw)
In-Reply-To: <B8450A75-EB10-4FED-A0AF-7EA7EA370055@oracle.com>

On 7/16/2024 9:31 AM, Chuck Lever III wrote:
> 
> 
>> On Jul 16, 2024, at 7:00 AM, Jeff Layton <jlayton@kernel.org> wrote:
>>
>> On Tue, 2024-07-16 at 13:21 +1000, NeilBrown wrote:
>>> On Tue, 16 Jul 2024, Jeff Layton wrote:
>>>> On Mon, 2024-07-15 at 17:14 +1000, NeilBrown wrote:
>>>>> A future patch will allow the number of threads in each nfsd pool to
>>>>> vary dynamically.
>>>>> The lower bound will be the number explicitly requested via
>>>>> /proc/fs/nfsd/threads or /proc/fs/nfsd/pool_threads.
>>>>>
>>>>> The upper bound can be set in each net-namespace by writing
>>>>> /proc/fs/nfsd/max_threads.  This upper bound applies across all pools,
>>>>> there is no per-pool upper limit.
>>>>>
>>>>> If no upper bound is set, then one is calculated.  A global upper limit
>>>>> is chosen based on amount of memory.  This limit only affects dynamic
>>>>> changes. Static configuration can always override it.
>>>>>
>>>>> We track how many threads are configured in each net namespace,
>>>>> whether via the max or the min.  We also track how many net
>>>>> namespaces have nfsd configured with only a min, not a max.
>>>>>
>>>>> The difference between the calculated max and the total allocation is
>>>>> available to be shared among those namespaces which don't have a maximum
>>>>> configured.  Within a namespace, the available share is distributed
>>>>> equally across all pools.
>>>>>
>>>>> In the common case there is one namespace and one pool.  A small number
>>>>> of threads are configured as a minimum and no maximum is set.  In this
>>>>> case the effective maximum will be based directly on total
>>>>> memory: approximately 8 threads per gigabyte.
>>>>>
>>>>
>>>>
>>>> Some of this may come across as bikeshedding, but I'd probably prefer
>>>> that this work a bit differently:
>>>>
>>>> 1/ I don't think we should enable this universally -- at least not
>>>> initially. What I'd prefer to see is a new pool_mode for the dynamic
>>>> threadpools (maybe call it "dynamic"). That gives us a clear opt-in
>>>> mechanism. Later once we're convinced it's safe, we can make "dynamic"
>>>> the default instead of "global".
>>>>
>>>> 2/ Rather than specifying a max_threads value separately, why not allow
>>>> the old threads/pool_threads interface to set the max and just have a
>>>> reasonable minimum setting (like the current default of 8). Since we're
>>>> growing the threadpool dynamically, I don't see why we need to have a
>>>> real configurable minimum.
>>>>
>>>> 3/ the dynamic pool-mode should probably be layered on top of the
>>>> pernode pool mode. IOW, in a NUMA configuration, we should split the
>>>> threads across NUMA nodes.
>>>
>>> Maybe we should start by discussing the goal.  What do we want
>>> configuration to look like when we finish?
>>>
>>> I think we want it to be transparent.  Sysadmin does nothing, and it all
>>> works perfectly.  Or as close to that as we can get.
>>>
>>
>> That's a nice eventual goal, but what do we do if we make this change
>> and it's not behaving for them? We need some way for them to revert to
>> traditional behavior if the new mode isn't working well.
> 
> As Steve pointed out (privately) there are likely to be cases
> where the dynamic thread count adjustment creates too many
> threads or somehow triggers a DoS. Admins want the ability to
> disable new features that cause trouble, and it is impossible
> for us to say truthfully that we have predicted every
> misbehavior.
> 
> So +1 for having a mechanism for getting back the traditional
> behavior, at least until we have confidence it is not going
> to have troubling side-effects.

+1 on a configurable maximum as well, but I'll add a concern about
the NUMA node thing.

Not all CPU cores are created equal any more: there are "performance"
and "efficiency" (Atom) cores, and the difference between them can be
large. There are also NUMA nodes with no CPUs at all, memory-only ones
for example. Then CXL scrambles the topology again.

Let's not forget that these nfsd threads call into the filesystems,
which may desire very different NUMA affinities: the nfsd protocol
side may prefer to be near the network adapter, while the filesystem
side may prefer to be near the storage. And RDMA can bypass
memory-copy costs.

Thread count alone addresses only a fraction of these concerns.

> Yes, in a perfect world, fully autonomous thread count
> adjustment would be amazing. Let's aim for that, but take
> baby steps to get there.

Amazing indeed, and just as unlikely to be perfect. Caution is good.

Tom.


Thread overview: 37+ messages
2024-07-15  7:14 [PATCH 00/14 RFC] support automatic changes to nfsd thread count NeilBrown
2024-07-15  7:14 ` [PATCH 01/14] lockd: discard nlmsvc_timeout NeilBrown
2024-07-15  7:14 ` [PATCH 02/14] SUNRPC: make various functions static, or not exported NeilBrown
2024-07-15  7:14 ` [PATCH 03/14] nfsd: move nfsd_pool_stats_open into nfsctl.c NeilBrown
2024-07-15  7:14 ` [PATCH 04/14] nfsd: don't allocate the versions array NeilBrown
2024-08-02 21:34   ` Mike Snitzer
2024-08-02 23:04     ` NeilBrown
2024-08-05  4:55       ` NeilBrown
2024-07-15  7:14 ` [PATCH 05/14] sunrpc: change sp_nrthreads from atomic_t to unsigned int NeilBrown
2024-07-15 14:12   ` Jeff Layton
2024-07-15 14:33     ` Jeff Layton
2024-07-16  1:33     ` NeilBrown
2024-07-24 19:36       ` Chuck Lever
2024-07-15  7:14 ` [PATCH 06/14] sunrpc: don't take ->sv_lock when updating ->sv_nrthreads NeilBrown
2024-07-15  7:14 ` [PATCH 07/14] Change unshare_fs_struct() to never fail NeilBrown
2024-07-15 14:39   ` Jeff Layton
2024-07-16  1:48     ` NeilBrown
2024-07-15  7:14 ` [PATCH 08/14] SUNRPC: move nrthreads counting to start/stop threads NeilBrown
2024-07-15  7:14 ` [PATCH 09/14] nfsd: return hard failure for OP_SETCLIENTID when there are too many clients NeilBrown
2024-07-15 15:21   ` Jeff Layton
2024-07-15  7:14 ` [PATCH 10/14] nfs: dynamically adjust per-client DRC slot limits NeilBrown
2024-07-15  7:14 ` [PATCH 11/14] nfsd: don't use sv_nrthreads in connection limiting calculations NeilBrown
2024-07-15 15:52   ` Jeff Layton
2024-07-16  2:04     ` NeilBrown
2024-07-15  7:14 ` [PATCH 12/14] sunrpc: introduce possibility that requested number of threads is different from actual NeilBrown
2024-07-15 16:00   ` Jeff Layton
2024-07-15  7:14 ` [PATCH 13/14] nfsd: introduce concept of a maximum number of threads NeilBrown
2024-07-15 17:06   ` Jeff Layton
2024-07-16  3:21     ` NeilBrown
2024-07-16 11:00       ` Jeff Layton
2024-07-16 13:31         ` Chuck Lever III
2024-07-16 18:49           ` Tom Talpey [this message]
2024-07-17 15:24             ` Chuck Lever III
2024-07-15  7:14 ` [PATCH 14/14] nfsd: adjust number of running nfsd threads NeilBrown
2024-07-15 17:29 ` [PATCH 00/14 RFC] support automatic changes to nfsd thread count Jeff Layton
2024-07-24 19:43 ` Chuck Lever III
2024-07-24 21:25   ` NeilBrown
