From: Jeff Layton <jlayton@kernel.org>
To: Mateusz Guzik <mjguzik@gmail.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Andrew Morton <akpm@linux-foundation.org>,
Josef Bacik <josef@toxicpanda.com>,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH RFC 3/4] lockref: rework CMPXCHG_LOOP to handle contention better
Date: Sat, 03 Aug 2024 07:32:05 -0400 [thread overview]
Message-ID: <808181ffe87d83f8cb36ebb4afbf6cd90778c763.camel@kernel.org> (raw)
In-Reply-To: <CAGudoHFn5Fu2JMJSnqrtEERQhbYmFLB7xR58iXeGJ9_n7oxw8Q@mail.gmail.com>
On Sat, 2024-08-03 at 13:21 +0200, Mateusz Guzik wrote:
> On Sat, Aug 3, 2024 at 12:59 PM Jeff Layton <jlayton@kernel.org> wrote:
> >
> > On Sat, 2024-08-03 at 11:09 +0200, Mateusz Guzik wrote:
> > > On Sat, Aug 3, 2024 at 6:44 AM Mateusz Guzik <mjguzik@gmail.com> wrote:
> > > >
> > > > On Fri, Aug 02, 2024 at 05:45:04PM -0400, Jeff Layton wrote:
> > > > > In a later patch, we want to change the open(..., O_CREAT) codepath to
> > > > > avoid taking the inode->i_rwsem for write when the dentry already exists.
> > > > > When we tested that initially, the performance devolved significantly
> > > > > due to contention for the parent's d_lockref spinlock.
> > > > >
> > > > > There are two problems with lockrefs today: First, once any concurrent
> > > > > task takes the spinlock, they all end up taking the spinlock, which is
> > > > > much more costly than a single cmpxchg operation. The second problem is
> > > > > that once any task fails to cmpxchg 100 times, it falls back to the
> > > > > spinlock. The upshot there is that even moderate contention can cause a
> > > > > fallback to serialized spinlocking, which worsens performance.
> > > > >
> > > > > This patch changes CMPXCHG_LOOP in 2 ways:
> > > > >
> > > > > First, change the loop to spin instead of falling back to a locked
> > > > > codepath when the spinlock is held. Once the lock is released, allow the
> > > > > task to continue trying its cmpxchg loop as before instead of taking the
> > > > > lock. Second, don't allow the cmpxchg loop to give up after 100 retries.
> > > > > Just continue infinitely.
> > > > >
> > > > > This greatly reduces contention on the lockref when there are large
> > > > > numbers of concurrent increments and decrements occurring.
> > > > >
> > > >
> > > > This was already tried by me and it unfortunately can reduce performance.
> > > >
> > >
> > > Oh wait I misread the patch based on what I tried there. Spinning
> > > indefinitely waiting for the lock to be free is a no-go as it loses
> > > the forward progress guarantee (and it is possible to get the lock
> > > being continuously held). Only spinning up to an arbitrary point wins
> > > some in some tests and loses in others.
> > >
> >
> > I'm a little confused about the forward progress guarantee here. Does
> > that exist today at all? ISTM that falling back to spin_lock() after a
> > certain number of retries doesn't guarantee any forward progress. You
> > can still just end up spinning on the lock forever once that happens,
> > no?
> >
>
> There is the implicit assumption that everyone holds locks for a
> finite time. I agree there are no guarantees otherwise if that's what
> you meant.
>
> In this case, since spinlocks are queued, a constant stream of lock
> holders will make the lock appear taken indefinitely even if they all
> hold it for a short period.
>
> Stock lockref will give up atomics immediately and make sure to change
> the ref thanks to queueing up.
>
> Lockref as proposed in this patch wont be able to do anything as long
> as the lock trading is taking place.
>
Got it, thanks. This spinning is very simplistic, so I could see that
you could have one task continually getting shuffled to the end of the
queue.
> > > Either way, as described below, chances are decent that:
> > > 1. there is an easy way to not lockref_get/put on the parent if the
> > > file is already there, dodging the problem
> > > .. and even if that's not true
> > > 2. lockref can be ditched in favor of atomics. apart from some minor
> > > refactoring this all looks perfectly doable and I have a wip. I will
> > > try to find the time next week to sort it out
> > >
> >
> > Like I said in the earlier mail, I don't think we can stay in RCU mode
> > because of the audit_inode call. I'm definitely interested in your WIP
> > though!
> >
>
> well audit may be hackable so that it works in rcu most of the time,
> but that's not something i'm interested in
Audit not my favorite area of the kernel to work in either. I don't see
a good way to make it rcu-friendly, but I haven't looked too hard yet
either. It would be nice to be able to do some of the auditing under
rcu or spinlock.
>
> sorting out the lockref situation would definitely help other stuff
> (notably opening the same file RO).
>
Indeed. It's clear that the current implementation is a real
scalability problem in a lot of situations.
> anyhow one idea is to temporarily disable atomic ops with a flag in
> the counter, a fallback plan is to loosen lockref so that it can do
> transitions other than 0->1->2 with atomics, even if the lock is held.
>
> I have not looked at this in over a month, I'm going to need to
> refresh my memory on the details, I do remember there was some stuff
> to massage first.
>
> Anyhow, I expect a working WIP some time in the upcoming week.
>
Great, I'll stay tuned.
Thanks!
--
Jeff Layton <jlayton@kernel.org>
next prev parent reply other threads:[~2024-08-03 11:32 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-02 21:45 [PATCH RFC 0/4] fs: try an opportunistic lookup for O_CREAT opens too Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 1/4] fs: remove comment about d_rcu_to_refcount Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 2/4] fs: add a kerneldoc header over lookup_fast Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 3/4] lockref: rework CMPXCHG_LOOP to handle contention better Jeff Layton
2024-08-03 4:44 ` Mateusz Guzik
2024-08-03 9:09 ` Mateusz Guzik
2024-08-03 10:59 ` Jeff Layton
2024-08-03 11:21 ` Mateusz Guzik
2024-08-03 11:32 ` Jeff Layton [this message]
2024-08-05 11:44 ` Christian Brauner
2024-08-05 12:52 ` Jeff Layton
2024-08-06 11:36 ` Christian Brauner
2024-08-03 10:55 ` Jeff Layton
2024-08-02 21:45 ` [PATCH RFC 4/4] fs: try an opportunistic lookup for O_CREAT opens too Jeff Layton
2024-08-05 10:46 ` [PATCH RFC 0/4] " Christian Brauner
2024-08-05 11:55 ` Jeff Layton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=808181ffe87d83f8cb36ebb4afbf6cd90778c763.camel@kernel.org \
--to=jlayton@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=jack@suse.cz \
--cc=josef@toxicpanda.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mjguzik@gmail.com \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.