From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Josef Bacik <josef@toxicpanda.com>,
Linux NFS Mailing List <linux-nfs@vger.kernel.org>,
"kernel-team@fb.com" <kernel-team@fb.com>
Subject: Re: [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net
Date: Fri, 26 Jan 2024 10:35:54 -0500 [thread overview]
Message-ID: <90bb9cf7608084ca2f321fd8112d758624816766.camel@kernel.org> (raw)
In-Reply-To: <94064421-2EBA-45E7-8ECD-5BBCA6BB081C@oracle.com>
On Fri, 2024-01-26 at 15:16 +0000, Chuck Lever III wrote:
>
> > On Jan 26, 2024, at 10:03 AM, Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Fri, 2024-01-26 at 14:27 +0000, Chuck Lever III wrote:
> > >
> > > > On Jan 26, 2024, at 9:08 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > >
> > > > On Fri, 2024-01-26 at 13:48 +0000, Chuck Lever III wrote:
> > > > >
> > > > > > On Jan 26, 2024, at 8:01 AM, Jeff Layton <jlayton@kernel.org> wrote:
> > > > > >
> > > > > > On Thu, 2024-01-25 at 16:56 -0500, Josef Bacik wrote:
> > > > > > > On Thu, Jan 25, 2024 at 04:01:27PM -0500, Chuck Lever wrote:
> > > > > > > > On Thu, Jan 25, 2024 at 02:53:20PM -0500, Josef Bacik wrote:
> > > > > > > > > This is the last global stat, move it into nfsd_net and adjust all the
> > > > > > > > > users to use that variant instead of the global one.
> > > > > > > >
> > > > > > > > Hm. I thought nfsd threads were a global resource -- they service
> > > > > > > > all network namespaces. So, shouldn't the same thread count be
> > > > > > > > surfaced to all containers? Won't they all see all of the nfsd
> > > > > > > > processes?
> > > > > >
> > > > > > Each container is going to start /proc/fs/nfsd/threads number of threads
> > > > > > regardless. I hadn't actually grokked that they just get tossed onto the
> > > > > > pile of threads that service requests.
> > > > > >
> > > > > > Is is possible for one container to start a small number of threads but
> > > > > > have its client load be such that it spills over and ends up stealing
> > > > > > threads from other containers?
> > > > >
> > > > > I haven't seen any code that manages resources based on namespace,
> > > > > except in filecache.c to restrict writeback per namespace.
> > > > >
> > > > > My impression is that any nfsd thread can serve any namespace. I'm
> > > > > not sure it is currently meaningful for a particular net namespace to
> > > > > "create" more threads.
> > > > >
> > > > > If someone would like that level of control, we could implement a
> > > > > cgroup mechanism and have one or more separate svc_pools per net
> > > > > namespace, maybe? </hand wave>
> > > > >
> > > >
> > > > AFAICT, the total number of threads on the system will be the sum of the
> > > > threads started in each of the containers. They do just go into a big
> > > > pile, and whichever one wakes up will service the request, so the
> > > > threads aren't associated with the netns, per-se. The svc_rqst's however
> > > > _are_ per-netns. So, I don't see anything that ensures that a container
> > > > doesn't exceed the number of threads it started on its own behalf.
> > > >
> > > > <hand wave>
> > > > I'm not sure we'd need to tie this in to cgroups.
> > >
> > > Not a need, but cgroups are typically the way that such
> > > resource constraints are managed, that's all.
> > >
> > >
> > > > Now that Josef is
> > > > moving some of these key structures to be per-net, it should be fairly
> > > > simple to have nfsd() just look at the th_cnt and the thread count in
> > > > the current namespace, and just enqueue the RPC rather than doing it?
> > > > </hand wave>
> > > >
> > > > OTOH, maybe I'm overly concerned here.
> > > >
> > > >
> > > > >
> > > > > > > I don't think we want the network namespaces seeing how many threads exist in
> > > > > > > the entire system right?
> > > > >
> > > > > If someone in a non-init net namespace does a "pgrep -c nfsd" don't
> > > > > they see the total nfsd thread count for the host?
> > > > >
> > > >
> > > > Yes, they're kernel threads and they aren't associated with a particular
> > > > pid namespace.
> > > >
> > > > >
> > > > > > > Additionally it appears that we can have multiple threads per network namespace,
> > > > > > > so it's not like this will just show 1 for each individual nn, it'll show
> > > > > > > however many threads have been configured for that nfsd in that network
> > > > > > > namespace.
> > > > >
> > > > > I've never tried this, so I'm speculating. But it seems like for
> > > > > now, because all nfsd threads can serve all namespaces, they should
> > > > > all see the global thread count stat.
> > > > >
> > > > > Then later we can refine it.
> > > > >
> > > >
> > > > I don't think that info is particularly useful though, and it certainly
> > > > breaks expectations wrt container isolation.
> > > >
> > > > Look at it this way:
> > > >
> > > > Suppose I have access to a container and I spin up nfsd with a
> > > > particular number of threads. I now want to know "did I spin up enough
> > > > threads?"
> > >
> > > It makes sense to me for a container to ask for one or more
> > > NFSD listeners. But I'm not yet clear on what it means for
> > > a container to try to alter the NFSD thread count, which is
> > > currently global.
> > >
> >
> > write_threads sets the number of nfsd threads for the svc_serv, which is
> > a per-net object, so serv->sv_nrthreads is only counting the threads for
> > its namespace.
>
> OK. I missed the part where struct svc_serv is per-net, and
> each svc_serv has its own thread pool. Got it.
>
>
> > Each container that starts an nfsd will contribute "threads" number of
> > nfsds to the pile. When you set "threads" to a different number, the
> > total pile is grown or reduced accordingly. In fact, I'm not even sure
> > we keep a systemwide total.
> >
> > What _isn't_ done (unless I'm missing something) is any sort of
> > limitation on the use of those threads. So you could have a container
> > that starts one nfsd thread, but its clients can still have many more
> > RPCs in flight, up to the total number of nfsd's running on the host.
> > Limiting that might be possible, but I'm not yet convinced that it's a
> > good idea.
> >
> > It might be interesting to gather some metrics though. How often do the
> > number of RPCs in flight exceed the number of threads that we started in
> > a namespace? Might also be good to gather a high-water mark too?
>
> I had a bunch of trace points queued up for this purpose,
> and all that got thrown out by Neil's recent work fixing
> up our pool thread scheduler.
>
> I'm not sure that queued requests are all that meaningful
> or even quantifiable. A more meaningful metric is round
> trip time measured on clients.
>
I wouldn't try to queue anything just yet, but knowing when a particular
container is exceeding the number of threads it started seems like a
helpful metric. That could be done in a later patch though. I wouldn't
block this work on that.
>
> > > > By making this per-namespace as Josef suggests it should be
> > > > fairly simple to tell whether my clients are regularly overrunning the
> > > > threads I started. With this info as global, I have no idea what netns
> > > > the RPCs being counted are against. I can't do anything with that info.
> > >
> > > That's true. I'm just not sure we need to do anything
> > > significant about it as part of this patch series.
> > >
> > > I'm not at all advocating leaving it like this. I agree it
> > > needs updating.
> > >
> > > For now, just fill in that thread count stat with the global
> > > thread count, and later when we have a better concept for
> > > what it means to "adjust the nfsd thread count from inside a
> > > container" we can come back and make it make sense.
> >
> > It would be good to step back and decide how we think this should work.
> > What would this look like if we were designing it today? Then we can
> > decide how to get there.
> >
> > Personally, I think keeping the view inside the container as isolated as
> > possible is the right thing to do.
>
> I agree. I would like to get Josef's patches in quickly too.
>
> Should that metric report svc_serv::sv_nrthreads instead of
> the global thread count?
>
No. The sv_nrthreads is just the same value that /proc/fs/nfsd/threads
shows. The _metric_ should show the th_cnt for that particular
namespace.
It would also be nice to know what the high-water mark for th_cnt is.
IOW, if my containers clients exceeded the number of threads I've
started, then it would be helpful to know by how much.
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2024-01-26 15:35 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-01-25 19:53 [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Josef Bacik
2024-01-25 19:53 ` [PATCH v2 01/13] sunrpc: don't change ->sv_stats if it doesn't exist Josef Bacik
2024-01-25 19:53 ` [PATCH v2 02/13] nfs: stop setting ->pg_stats for unused stats Josef Bacik
2024-01-25 19:53 ` [PATCH v2 03/13] sunrpc: pass in the sv_stats struct through svc_create* Josef Bacik
2024-01-25 20:56 ` Chuck Lever
2024-01-25 21:56 ` Josef Bacik
2024-01-25 19:53 ` [PATCH v2 04/13] sunrpc: remove ->pg_stats from svc_program Josef Bacik
2024-01-25 19:53 ` [PATCH v2 05/13] sunrpc: add a struct rpc_stats arg to rpc_create_args Josef Bacik
2024-01-25 20:53 ` Chuck Lever
2024-01-25 21:54 ` Josef Bacik
2024-01-25 22:30 ` Jeff Layton
2024-01-26 13:49 ` Chuck Lever III
2024-01-25 19:53 ` [PATCH v2 06/13] sunrpc: use the struct net as the svc proc private Josef Bacik
2024-01-25 19:53 ` [PATCH v2 07/13] nfsd: rename NFSD_NET_* to NFSD_STATS_* Josef Bacik
2024-01-25 19:53 ` [PATCH v2 08/13] nfsd: expose /proc/net/sunrpc/nfsd in net namespaces Josef Bacik
2024-01-25 19:53 ` [PATCH v2 09/13] nfsd: make all of the nfsd stats per-network namespace Josef Bacik
2024-01-25 19:53 ` [PATCH v2 10/13] nfsd: move th_cnt into nfsd_net Josef Bacik
2024-01-25 21:01 ` Chuck Lever
2024-01-25 21:56 ` Josef Bacik
2024-01-26 13:01 ` Jeff Layton
2024-01-26 13:48 ` Chuck Lever III
2024-01-26 14:08 ` Jeff Layton
2024-01-26 14:27 ` Chuck Lever III
2024-01-26 15:03 ` Jeff Layton
2024-01-26 15:16 ` Chuck Lever III
2024-01-26 15:35 ` Jeff Layton [this message]
2024-01-25 19:53 ` [PATCH v2 11/13] nfsd: make svc_stat per-network namespace instead of global Josef Bacik
2024-01-25 19:53 ` [PATCH v2 12/13] nfs: expose /proc/net/sunrpc/nfs in net namespaces Josef Bacik
2024-01-25 19:53 ` [PATCH v2 13/13] nfs: make the rpc_stat per net namespace Josef Bacik
2024-01-26 13:12 ` [PATCH v2 00/13] Make nfs and nfsd stats visible in network ns Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=90bb9cf7608084ca2f321fd8112d758624816766.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=josef@toxicpanda.com \
--cc=kernel-team@fb.com \
--cc=linux-nfs@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox