From: Jeff Layton <jlayton@kernel.org>
To: Chuck Lever <chuck.lever@oracle.com>, NeilBrown <neilb@suse.de>
Cc: Al Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-nfs@vger.kernel.org
Subject: Re: [PATCH/RFC] core/nfsd: allow kernel threads to use task_work.
Date: Thu, 30 Nov 2023 12:50:14 -0500
Message-ID: <68b6743f8c095177f5c99876627861f0fbf48edc.camel@kernel.org>
In-Reply-To: <ZWYIj7K0KPQFCCdf@tissot.1015granger.net>
On Tue, 2023-11-28 at 10:34 -0500, Chuck Lever wrote:
> On Tue, Nov 28, 2023 at 01:57:30PM +1100, NeilBrown wrote:
> >
> > (trimmed cc...)
> >
> > On Tue, 28 Nov 2023, Chuck Lever wrote:
> > > On Tue, Nov 28, 2023 at 11:16:06AM +1100, NeilBrown wrote:
> > > > On Tue, 28 Nov 2023, Chuck Lever wrote:
> > > > > On Tue, Nov 28, 2023 at 09:05:21AM +1100, NeilBrown wrote:
> > > > > >
> > > > > > I have evidence from a customer site of 256 nfsd threads adding files to
> > > > > > delayed_fput_lists nearly twice as fast as they are retired by a single
> > > > > > work-queue thread running delayed_fput(). As you might imagine this
> > > > > > does not end well (20 million files in the queue at the time a snapshot
> > > > > > was taken for analysis).
> > > > > >
> > > > > > While this might point to a problem with the filesystem not handling the
> > > > > > final close efficiently, such problems should only hurt throughput, not
> > > > > > lead to memory exhaustion.
> > > > >
> > > > > I have this patch queued for v6.8:
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/commit/?h=nfsd-next&id=c42661ffa58acfeaf73b932dec1e6f04ce8a98c0
> > > > >
> > > >
> > > > Thanks....
> > > > I think that change is good, but I don't think it addresses the problem
> > > > mentioned in the description, and it is not directly relevant to the
> > > > problem I saw ... though it is complicated.
> > > >
> > > > The problem "workqueue ... hogged cpu..." probably means that
> > > > nfsd_file_dispose_list() needs a cond_resched() call in the loop.
> > > > That will stop it from hogging the CPU whether it is tied to one CPU or
> > > > free to roam.
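
For the record, this is the kind of change I assume you mean -- an
untested sketch, with the loop body paraphrased from memory rather
than copied from fs/nfsd/filecache.c:

static void
nfsd_file_dispose_list(struct list_head *dispose)
{
	struct nfsd_file *nf;

	while (!list_empty(dispose)) {
		nf = list_first_entry(dispose, struct nfsd_file, nf_lru);
		list_del_init(&nf->nf_lru);
		nfsd_file_free(nf);
		cond_resched();	/* yield between frees so a long list
				 * doesn't monopolise this CPU */
	}
}
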
> > > >
> > > > Also that work is calling filp_close() which primarily calls
> > > > filp_flush().
> > > > It also calls fput() but that does minimal work. If there is much work
> > > > to do then that is offloaded to another work-item. *That* is the
> > > > workitem that I had problems with.
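
For context, the deferral being described is roughly the following,
paraphrased from fs/file_table.c; the field names and details may not
match any particular kernel version exactly:

void fput(struct file *file)
{
	if (atomic_long_dec_and_test(&file->f_count)) {
		struct task_struct *task = current;

		if (likely(!in_interrupt() && !(task->flags & PF_KTHREAD))) {
			/* ordinary tasks run __fput() from task_work on
			 * return to userspace */
			init_task_work(&file->f_rcuhead, ____fput);
			if (!task_work_add(task, &file->f_rcuhead, TWA_RESUME))
				return;
		}
		/* kernel threads (nfsd included) fall back to the single
		 * global delayed_fput work item */
		if (llist_add(&file->f_llist, &delayed_fput_list))
			schedule_delayed_work(&delayed_fput_work, 1);
	}
}
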
> > > >
> > > > The problem I saw was with an older kernel which didn't have the nfsd
> > > > file cache and so was probably calling filp_close() more often.
> > >
> > > Without the file cache, the filp_close() should be handled directly
> > > by the nfsd thread handling the RPC, IIRC.
> >
> > Yes - but __fput() is handled by a workqueue.
> >
> > >
> > >
> > > > So maybe
> > > > my patch isn't so important now. Particularly as nfsd now isn't closing
> > > > most files in-task but instead offloads that to another task. So the
> > > > final fput will not be handled by the nfsd task either.
> > > >
> > > > But I think there is room for improvement. Gathering lots of files
> > > > together into a list and closing them sequentially is not going to be as
> > > > efficient as closing them in parallel.
> > >
> > > I believe the file cache passes the filps to the work queue one at
> >
> > nfsd_file_close_inode() does. nfsd_file_gc() and nfsd_file_lru_scan()
> > can pass multiple.
> >
> > > a time, but I don't think there's anything that forces the work
> > > queue to handle each flush/close completely before proceeding to the
> > > next.
> >
> > Parallelism with workqueues is controlled by the work items (struct
> > work_struct). Two different work items can run in parallel. But any
> > given work item can never run in parallel with itself.
> >
> > The only work items queued on nfsd_filecache_wq are from
> > nn->fcache_disposal->work.
> > There is one of these for each network namespace. So in any given
> > network namespace, all work on nfsd_filecache_wq is fully serialised.
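
Right -- to illustrate the point, the per-namespace disposal state looks
roughly like this (names only loosely mirror the real ones in
fs/nfsd/filecache.c):

/* One work_struct per net namespace means the workqueue can never run
 * two disposals for the same namespace concurrently, however many CPUs
 * it has available, so everything on 'freeme' is closed sequentially. */
struct nfsd_fcache_disposal {
	struct work_struct	work;	/* the only item ever queued */
	spinlock_t		lock;
	struct list_head	freeme;	/* files waiting to be closed */
};
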
>
> OIC, it's that specific case you are concerned with. The per-
> namespace laundrette was added by:
>
> 9542e6a643fc ("nfsd: Containerise filecache laundrette")
>
> Its purpose was to confine the close backlog to each container.
>
> Seems like it would be better if there was a struct work_struct
> in each struct nfsd_file. That wouldn't add real backpressure to
> nfsd threads, but it would enable file closes to run in parallel.
>
I like this idea. That seems a lot simpler than all of this weirdo
queueing of delayed closes that we do.
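
Something along these lines, perhaps -- completely untested, and the
helper and field names below are invented for illustration:

struct nfsd_file {
	...
	struct work_struct	nf_put_work;	/* new */
};

static void
nfsd_file_delayed_put(struct work_struct *work)
{
	struct nfsd_file *nf = container_of(work, struct nfsd_file,
					    nf_put_work);

	nfsd_file_free(nf);
}

/* instead of adding nf to a shared dispose list: */
static void
nfsd_file_queue_put(struct nfsd_file *nf)
{
	INIT_WORK(&nf->nf_put_work, nfsd_file_delayed_put);
	queue_work(nfsd_filecache_wq, &nf->nf_put_work);
}

Each nfsd_file then carries its own work item, so the closes can run
concurrently up to the workqueue's max_active limit instead of being
serialised behind one per-namespace work item.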
--
Jeff Layton <jlayton@kernel.org>
Thread overview: 31+ messages
2023-11-27 22:05 [PATCH/RFC] core/nfsd: allow kernel threads to use task_work NeilBrown
2023-11-27 22:30 ` Al Viro
2023-11-27 22:43 ` NeilBrown
2023-11-27 22:59 ` Chuck Lever
2023-11-28 0:16 ` NeilBrown
2023-11-28 1:37 ` Chuck Lever
2023-11-28 2:57 ` NeilBrown
2023-11-28 15:34 ` Chuck Lever
2023-11-30 17:50 ` Jeff Layton [this message]
2023-11-28 13:51 ` Christian Brauner
2023-11-28 14:15 ` Jeff Layton
2023-11-28 15:22 ` Chuck Lever
2023-11-28 23:31 ` NeilBrown
2023-11-28 23:20 ` NeilBrown
2023-11-29 11:43 ` Christian Brauner
2023-12-04 1:30 ` NeilBrown
2023-11-29 14:04 ` Chuck Lever
2023-11-30 17:47 ` Jeff Layton
2023-11-30 18:07 ` Chuck Lever
2023-11-30 18:33 ` Jeff Layton
2023-11-28 11:24 ` Christian Brauner
2023-11-28 13:52 ` Oleg Nesterov
2023-11-28 15:33 ` Christian Brauner
2023-11-28 16:59 ` Oleg Nesterov
2023-11-28 17:29 ` Oleg Nesterov
2023-11-28 23:40 ` NeilBrown
2023-11-29 11:38 ` Christian Brauner
2023-11-28 14:01 ` Oleg Nesterov
2023-11-28 14:20 ` Oleg Nesterov
2023-11-29 0:14 ` NeilBrown
2023-11-29 7:55 ` Oleg Nesterov