From: Jeff Layton <jlayton@kernel.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Josef Bacik <josef@toxicpanda.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 3/4] lockref: rework CMPXCHG_LOOP to handle contention better
Date: Sat, 03 Aug 2024 07:32:05 -0400
Message-ID: <808181ffe87d83f8cb36ebb4afbf6cd90778c763.camel@kernel.org>
In-Reply-To: <CAGudoHFn5Fu2JMJSnqrtEERQhbYmFLB7xR58iXeGJ9_n7oxw8Q@mail.gmail.com>
On Sat, 2024-08-03 at 13:21 +0200, Mateusz Guzik wrote:
> On Sat, Aug 3, 2024 at 12:59 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Sat, 2024-08-03 at 11:09 +0200, Mateusz Guzik wrote:
> > > On Sat, Aug 3, 2024 at 6:44 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 02, 2024 at 05:45:04PM -0400, Jeff Layton wrote:
> > > > > In a later patch, we want to change the open(..., O_CREAT) codepath to
> > > > > avoid taking the inode->i_rwsem for write when the dentry already exists.
> > > > > When we tested that initially, performance degraded significantly
> > > > > due to contention on the parent's d_lockref spinlock.
> > > > >
> > > > > There are two problems with lockrefs today: First, once any task
> > > > > takes the spinlock, all concurrent tasks end up taking it as well,
> > > > > which is much more costly than a single cmpxchg operation. Second,
> > > > > once any task fails the cmpxchg 100 times, it falls back to the
> > > > > spinlock. The upshot is that even moderate contention can cause a
> > > > > fallback to serialized spinlocking, which worsens performance.
> > > > >
> > > > > This patch changes CMPXCHG_LOOP in 2 ways:
> > > > >
> > > > > First, change the loop to spin instead of falling back to a locked
> > > > > codepath when the spinlock is held. Once the lock is released, allow the
> > > > > task to continue trying its cmpxchg loop as before instead of taking the
> > > > > lock. Second, don't allow the cmpxchg loop to give up after 100 retries.
> > > > > Just continue infinitely.
> > > > >
> > > > > This greatly reduces contention on the lockref when there are large
> > > > > numbers of concurrent increments and decrements occurring.
> > > > >
> > > >
> > > > This was already tried by me and it unfortunately can reduce performance.
> > > >
> > >
> > > Oh wait, I misread the patch based on what I tried there. Spinning
> > > indefinitely waiting for the lock to be free is a no-go, as it loses
> > > the forward progress guarantee (the lock can end up being held
> > > continuously). Spinning only up to an arbitrary bound wins in some
> > > tests and loses in others.
> > >
> >
> > I'm a little confused about the forward progress guarantee here. Does
> > that exist today at all? ISTM that falling back to spin_lock() after a
> > certain number of retries doesn't guarantee any forward progress. You
> > can still just end up spinning on the lock forever once that happens,
> > no?
> >
>
> There is the implicit assumption that everyone holds locks for a
> finite time. I agree there are no guarantees otherwise if that's what
> you meant.
>
> In this case, since spinlocks are queued, a constant stream of lock
> holders will make the lock appear taken indefinitely even if they all
> hold it for a short period.
>
> Stock lockref gives up on atomics immediately, and queueing up for the
> lock guarantees it will eventually get to change the ref.
>
> Lockref as proposed in this patch won't be able to do anything as long
> as the lock trading is taking place.
>
Got it, thanks. The spinning here is very simplistic, so I can see how
one task could continually get shuffled to the back of the queue.
> > > Either way, as described below, chances are decent that:
> > > 1. there is an easy way to not lockref_get/put on the parent if the
> > > file is already there, dodging the problem
> > > .. and even if that's not true
> > > 2. lockref can be ditched in favor of atomics. apart from some minor
> > > refactoring this all looks perfectly doable and I have a wip. I will
> > > try to find the time next week to sort it out
> > >
> >
> > Like I said in the earlier mail, I don't think we can stay in RCU mode
> > because of the audit_inode call. I'm definitely interested in your WIP
> > though!
> >
>
> Well, audit may be hackable so that it works in RCU most of the time,
> but that's not something I'm interested in.
Audit is not my favorite area of the kernel to work in either. I don't
see a good way to make it RCU-friendly, but I haven't looked too hard
yet either. It would be nice to be able to do at least some of the
auditing under RCU or a spinlock.
>
> sorting out the lockref situation would definitely help other stuff
> (notably opening the same file RO).
>
Indeed. It's clear that the current implementation is a real
scalability problem in a lot of situations.
> Anyhow, one idea is to temporarily disable atomic ops with a flag in
> the counter; a fallback plan is to loosen lockref so that it can do
> transitions other than 0->1->2 with atomics, even if the lock is held.
>
> I have not looked at this in over a month, so I'm going to need to
> refresh my memory on the details. I do remember there was some stuff
> to massage first.
>
> Anyhow, I expect a working WIP some time in the upcoming week.
>
Great, I'll stay tuned.
Thanks!
--
Jeff Layton <jlayton@kernel.org>
Thread overview:
2024-08-02 21:45 [PATCH RFC 0/4] fs: try an opportunistic lookup for O_CREAT opens too Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 1/4] fs: remove comment about d_rcu_to_refcount Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 2/4] fs: add a kerneldoc header over lookup_fast Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 3/4] lockref: rework CMPXCHG_LOOP to handle contention better Jeff Layton
2024-08-03 4:44 ` Mateusz Guzik
2024-08-03 9:09 ` Mateusz Guzik
2024-08-03 10:59 ` Jeff Layton
2024-08-03 11:21 ` Mateusz Guzik
2024-08-03 11:32 ` Jeff Layton [this message]
2024-08-05 11:44 ` Christian Brauner
2024-08-05 12:52 ` Jeff Layton
2024-08-06 11:36 ` Christian Brauner
2024-08-03 10:55 ` Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 4/4] fs: try an opportunistic lookup for O_CREAT opens too Jeff Layton
2024-08-05 10:46 ` [PATCH RFC 0/4] " Christian Brauner
2024-08-05 11:55 ` Jeff Layton