From: Ian Kent <raven@themaw.net>
To: Al Viro <viro@zeniv.linux.org.uk>
Cc: Brian Foster <bfoster@redhat.com>, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH] namei: clear nd->root.mnt before O_CREAT unlazy
Date: Fri, 07 Jan 2022 15:04:46 +0800 [thread overview]
Message-ID: <b14cd1790c18e2be0ab6e5cfce91dee5611ceb9d.camel@themaw.net> (raw)
In-Reply-To: <YdfVG56XZnkePk7c@zeniv-ca.linux.org.uk>
On Fri, 2022-01-07 at 05:52 +0000, Al Viro wrote:
> On Fri, Jan 07, 2022 at 07:46:10AM +0800, Ian Kent wrote:
> > On Wed, 2022-01-05 at 13:02 -0500, Brian Foster wrote:
> > > The unlazy sequence of an rcuwalk lookup occurs a bit earlier than
> > > normal for O_CREAT lookups (i.e. in open_last_lookups()). The
> > > create
> > > logic here historically invoked complete_walk(), which clears the
> > > nd->root.mnt pointer when appropriate before the unlazy. This
> > > changed in commit 72287417abd1 ("open_last_lookups(): don't abuse
> > > complete_walk() when all we want is unlazy"), which refactored the
> > > create path to invoke unlazy_walk() and not consider nd->root.mnt.
> > >
> > > This tweak negatively impacts performance on a concurrent
> > > open(O_CREAT) workload to multiple independent mounts beneath the
> > > root directory. This attributes to increased spinlock contention on
> > > the root dentry via legitimize_root(), to the point where the
> > > spinlock becomes the primary bottleneck over the directory inode
> > > rwsem of the individual submounts. For example, the completion rate
> > > of a 32k thread aim7 create/close benchmark that repeatedly passes
> > > O_CREAT to open preexisting files drops from over 700k "jobs per
> > > minute" to 30, increasing the overall test time from a few minutes
> > > to over an hour.
> > >
> > > A similar, more simplified test to create a set of opener tasks
> > > across a set of submounts can demonstrate the problem more quickly.
> > > For example, consider sets of 100 open/close tasks each running
> > > against 64 independent filesystem mounts (i.e. 6400 tasks total),
> > > with each task completing 10k iterations before it exits. On an
> > > 80xcpu box running v5.16.0-rc2, this test completes in 50-55s. With
> > > this patch applied, the same test completes in 10-15s.
> > >
> > > This is not the most realistic workload in the world as it factors
> > > out inode allocation in the filesystem. The contention can also be
> > > avoided by more selective use of O_CREAT or via use of relative
> > > pathnames. That said, this regression appears to be an
> > > unintentional
> > > side effect of code cleanup and might be unexpected for users.
> > > Restore original behavior prior to commit 72287417abd1 by factoring
> > > the rcu logic from complete_walk() into a new helper and invoke
> > > that
> > > from both places.
> > >
> > > Signed-off-by: Brian Foster <bfoster@redhat.com>
> > > ---
> > > fs/namei.c | 37 +++++++++++++++++++++----------------
> > > 1 file changed, 21 insertions(+), 16 deletions(-)
> > >
> > > diff --git a/fs/namei.c b/fs/namei.c
> > > index 1f9d2187c765..b32fcbc99929 100644
> > > --- a/fs/namei.c
> > > +++ b/fs/namei.c
> > > @@ -856,6 +856,22 @@ static inline int d_revalidate(struct dentry
> > > *dentry, unsigned int flags)
> > > return 1;
> > > }
> > >
> > > +static inline bool complete_walk_rcu(struct nameidata *nd)
> > > +{
> > > + if (nd->flags & LOOKUP_RCU) {
> > > + /*
> > > + * We don't want to zero nd->root for scoped-
> > > lookups
> > > or
> > > + * externally-managed nd->root.
> > > + */
> > > + if (!(nd->state & ND_ROOT_PRESET))
> > > + if (!(nd->flags & LOOKUP_IS_SCOPED))
> > > + nd->root.mnt = NULL;
> > > + nd->flags &= ~LOOKUP_CACHED;
> > > + return try_to_unlazy(nd);
> > > + }
> > > + return true;
> > > +}
> > > +
> > > /**
> > > * complete_walk - successful completion of path walk
> > > * @nd: pointer nameidata
> > > @@ -871,18 +887,8 @@ static int complete_walk(struct nameidata *nd)
> > > struct dentry *dentry = nd->path.dentry;
> > > int status;
> > >
> > > - if (nd->flags & LOOKUP_RCU) {
> > > - /*
> > > - * We don't want to zero nd->root for scoped-
> > > lookups
> > > or
> > > - * externally-managed nd->root.
> > > - */
> > > - if (!(nd->state & ND_ROOT_PRESET))
> > > - if (!(nd->flags & LOOKUP_IS_SCOPED))
> > > - nd->root.mnt = NULL;
> > > - nd->flags &= ~LOOKUP_CACHED;
> > > - if (!try_to_unlazy(nd))
> > > - return -ECHILD;
> > > - }
> > > + if (!complete_walk_rcu(nd))
> > > + return -ECHILD;
> > >
> > > if (unlikely(nd->flags & LOOKUP_IS_SCOPED)) {
> > > /*
> > > @@ -3325,10 +3331,9 @@ static const char *open_last_lookups(struct
> > > nameidata *nd,
> > > BUG_ON(nd->flags & LOOKUP_RCU);
> > > } else {
> > > /* create side of things */
> > > - if (nd->flags & LOOKUP_RCU) {
> > > - if (!try_to_unlazy(nd))
> > > - return ERR_PTR(-ECHILD);
> > > - }
> > > + if (!complete_walk_rcu(nd))
> > > + return ERR_PTR(-ECHILD);
> > > +
> > > audit_inode(nd->name, dir, AUDIT_INODE_PARENT);
> > > /* trailing slashes? */
> > > if (unlikely(nd->last.name[nd->last.len]))
> >
> > Looks good, assuming Al is ok with the re-factoring.
> > Reviewed-by: Ian Kent <raven@themaw.net>
>
> Ummm.... Mind resending that? I'm still digging myself from under
> the huge pile of mail, and this seems to have been lost in process...
Brain wrote and sent the patch, I'm sure he'll resent it.
The re-factor I mention is just pulling out the needed bits from
complete_walk() rather than open coding it or reverting the original
change, commit 72287417abd1 ("open_last_lookups(): don't abuse
complete_walk() when all we want is unlazy").
Ian
next prev parent reply other threads:[~2022-01-07 7:04 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-05 18:02 [PATCH] namei: clear nd->root.mnt before O_CREAT unlazy Brian Foster
2022-01-06 23:46 ` Ian Kent
2022-01-07 5:52 ` Al Viro
2022-01-07 7:04 ` Ian Kent [this message]
2022-01-07 7:22 ` Al Viro
2022-01-07 7:10 ` Al Viro
2022-01-07 17:32 ` Brian Foster
2022-01-07 17:51 ` Al Viro
2022-01-07 18:42 ` Brian Foster
2022-01-08 3:26 ` Ian Kent
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=b14cd1790c18e2be0ab6e5cfce91dee5611ceb9d.camel@themaw.net \
--to=raven@themaw.net \
--cc=bfoster@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).