From: Mateusz Guzik <mjguzik@gmail.com>
To: Jeff Layton <jlayton@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	 Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Josef Bacik <josef@toxicpanda.com>,
	 linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] fs: try an opportunistic lookup for O_CREAT opens too
Date: Tue, 6 Aug 2024 22:03:46 +0200
Message-ID: <CAGudoHGZVBw3h_pHDaaSMeDgf3q_qn4wmkfOoG6y-CKN9sZLVQ@mail.gmail.com>
In-Reply-To: <44862ec7c85cdc19529e26f47176d0ecfc90d888.camel@kernel.org>

On Tue, Aug 6, 2024 at 9:26 PM Jeff Layton <jlayton@kernel.org> wrote:
>
> On Tue, 2024-08-06 at 12:11 -0700, Andi Kleen wrote:
> > Mateusz Guzik <mjguzik@gmail.com> writes:
> > >
> > > I would bench with that myself, but I temporarily don't have handy
> > > access to bigger hw. Even so, the below is completely optional and
> > > perhaps more of a suggestion for the future :)
> > >
> > > I hacked up the test case based on tests/open1.c.
> >
> > Don't you need two test cases? One where the file exists and one
> > where it doesn't. The "doesn't exist" case will likely be slower
> > than before because it will do the lookup twice, and it will
> > likely be slower even single-threaded.
> >
> > I assume the penalty will also depend on the number of entries
> > in the path.
> >
> > Those all seem to be important considerations in judging the
> > benefits of the patch.
> >
>
> Definitely.
>
> FWIW, I did test a single-threaded (bespoke) test case that did a bunch
> of O_CREAT opens, closes and then unlinks. I didn't measure any
> discernible difference with this patch. My conclusion from that was
> that the extra lockless lookup should be cheap.
>
> That said, this could show a difference if you have rather long hash
> chains that need to be walked completely, and you have to actually do
> the create every time. In practice though, time spent under the
> inode_lock and doing the create tends to dominate in that case, so I
> *think* this should still be worthwhile.
>
> I'll plan to add a test like that to will_it_scale unless Mateusz beats
> me to it. A long soak in linux-next is probably also justified with
> this patch.

Well, if we are really going there, I have to point out a few things
concerning single-threaded performance on the path from entering the
kernel to getting back with a file descriptor.

The headline is that there is a lot of single-threaded perf left on
the table, and unless a change costs a significant number of cycles it
is going to be hard to measure a meaningful difference.

Before I get to the vfs layer, there is a significant loss in the
memory allocator because of memcg -- it takes several irq off/on trips
for every allocation (and one allocation is needed to grab the struct
file *). I have a plan for what to do about it (handle the charging
with a cpu-local cmpxchg, note: no lock prefix), which I'm trying to
get around to; a rough sketch of the idea is below. Apart from that,
you may note the allocator fast path performs a 16-byte cmpxchg, which
is again dog slow and executes twice per open (once for the file
object, another time for the namei buffer). Someone(tm) should patch
it up; I have some vague ideas, but no idea when I can take a serious
stab at it.
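
To illustrate the shape of the memcg idea, here is a rough sketch --
this is not the actual memcg code, the function names are made up and
the logic is simplified; it only assumes a per-cpu memcg_stock with an
nr_pages field along the lines of mm/memcontrol.c:

  /* today (simplified): every charge pays for an irq off/on trip */
  static bool consume_stock_today(unsigned int nr_pages)
  {
          unsigned long flags;
          bool ret = false;

          local_irq_save(flags);
          if (this_cpu_read(memcg_stock.nr_pages) >= nr_pages) {
                  this_cpu_sub(memcg_stock.nr_pages, nr_pages);
                  ret = true;
          }
          local_irq_restore(flags);
          return ret;
  }

  /* the idea: a cpu-local cmpxchg (no lock prefix), no irq toggling */
  static bool consume_stock_local_cmpxchg(unsigned int nr_pages)
  {
          unsigned int old;

          do {
                  old = this_cpu_read(memcg_stock.nr_pages);
                  if (old < nr_pages)
                          return false;   /* take the slow path */
          } while (this_cpu_cmpxchg(memcg_stock.nr_pages, old,
                                    old - nr_pages) != old);
          return true;
  }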

As for vfs, just from an atomics perspective:
- opening a file results in a spurious ref/unref cycle on the dentry;
I posted a patch [1] to sort it out (maybe Al will write his own
instead). Single-threaded it gives about +5% to the open/close rate.
- there is an avoidable smp_mb in do_dentry_open; I posted a patch [2]
and it got acked. I did not bother benchmarking it.
- someone sneaked atomic refcount management into "struct filename"
to appease io_uring vs audit. This adds another atomic to every single
path lookup. I have an idea what to do there.
- opening for write (which you do with O_CREAT) calls
mnt_get_write_access, which contains another smp_mb. This can probably
be sorted out the same way percpu semaphores got taken care of.
- __legitimize_mnt has yet another smp_mb to synchronize against
unmount. This *probably* can be sorted out with synchronize_rcu, but
some care is needed to not induce massive delays on unmount.
- credential management is also done with atomics (so that's a get +
put for the open/close cycle). $elsewhere I implemented a scheme where
the local count is cached in the thread itself and only rolled up on
credential change (rough sketch after this list). This whacks the
problem and is something I'm planning on proposing at some point.
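
A rough sketch of that credential scheme (made-up names, not actual
kernel code; the real thing has to handle rolling the cached count
into the shared counter on credential change, thread exit, and puts
that the local count cannot cover):

  struct cred_cache {
          const struct cred *cred;  /* creds the cached count applies to */
          long nr_refs;             /* refs held by this thread only */
  };

  /*
   * Fast path: taking a ref on our own current->cred is a plain
   * thread-local increment, no atomic. Anything else falls back to
   * the regular atomic get.
   */
  static inline const struct cred *get_current_cred_cached(struct cred_cache *cc)
  {
          if (likely(cc->cred == current->cred)) {
                  cc->nr_refs++;
                  return cc->cred;
          }
          return get_cred(current->cred);
  }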

and more

That is to say, I completely agree this does add extra work, but the
kernel is not in a shape where the true single-threaded impact can be
sensibly measured. I do find it prudent to validate that nothing
happened to somehow floor the scalability of the case which does need
to create files, though; something along the lines of the sketch below
would do.
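
A rough sketch of such a test (plain userspace C, patterned after
will_it_scale's tests/open1.c; the file name and iteration count are
made up, and for the scalability run each worker would need its own
file in its own directory):

  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define TESTFILE "/tmp/open_creat_unlink.tmp"

  int main(void)
  {
          unsigned long iters = 1000000;

          /* unlink after every close so each O_CREAT open really has
           * to create the file instead of finding it in place */
          while (iters--) {
                  int fd = open(TESTFILE, O_CREAT | O_RDWR, 0600);

                  if (fd == -1) {
                          perror("open");
                          exit(1);
                  }
                  close(fd);
                  unlink(TESTFILE);
          }
          return 0;
  }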

[1] https://lore.kernel.org/linux-fsdevel/20240806163256.882140-1-mjguzik@gmail.com/
[2] https://lore.kernel.org/linux-fsdevel/20240806172846.886570-1-mjguzik@gmail.com/

-- 
Mateusz Guzik <mjguzik gmail.com>
