From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-fsdevel@vger.kernel.org, linux-block@vger.kernel.org,
linux-cachefs@redhat.com, dhowells@redhat.com,
gfs2@lists.linux.dev, dm-devel@lists.linux.dev,
linux-security-module@vger.kernel.org, selinux@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep
Date: Thu, 7 Dec 2023 15:41:56 +1100 [thread overview]
Message-ID: <ZXFNFKai3ILLFdnR@dread.disaster.area> (raw)
In-Reply-To: <20231207041650.3tzzmv2jfrr5vppl@moria.home.lan>
On Wed, Dec 06, 2023 at 11:16:50PM -0500, Kent Overstreet wrote:
> On Wed, Dec 06, 2023 at 05:05:39PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@redhat.com>
> >
> > hash-bl nests spinlocks inside the bit locks. This causes problems
> > for CONFIG_PREEMPT_RT which converts spin locks to sleeping locks,
> > and we're not allowed to sleep while holding a spinning lock.
> >
> > Further, lockdep does not support bit locks, so we lose lockdep
> > coverage of the inode hash table with the hash-bl conversion.
> >
> > To enable these configs to work, add an external per-chain spinlock
> > to the hlist_bl_head() and add helpers to use this instead of the
> > bit spinlock when preempt_rt or lockdep are enabled.
> >
> > This converts all users of hlist-bl to use the external spinlock in
> > these situations, so we also gain lockdep coverage of things like
> > the dentry cache hash table with this change.
> >
> > Signed-off-by: Dave Chinner <dchinner@redhat.com>
>
> Sleepable bit locks can be done with wait_on_bit(), is that worth
> considering for PREEMPT_RT? Or are the other features of real locks
> important there?
I think wait_on_bit() is not scalable. It hashes down to one of 256
shared struct wait_queue_heads which have thundering herd
behaviours, and it requires the locker to always run
prepare_to_wait() and finish_wait(). This means there is at least
one spinlock_irqsave()/unlock pair needed, sometimes two, just to
get an uncontended sleeping bit lock.
So as a fast path operation that requires lock scalability, it's
going to be better to use a straight spinlock that doesn't require
irq safety as it's far less expensive than a sleeping bit lock.
Whether CONFIG_PREEMPT_RT changes that equation at all is not at all
clear to me, and so I'll leave that consideration to RT people if
they see a need to address it. In the mean time, we need to use an
external spinlock for lockdep validation so it really doesn't make
any sense at all to add a third locking variant with completely
different semantics just for PREEMPT_RT...
-Dave.
--
Dave Chinner
david@fromorbit.com
next prev parent reply other threads:[~2023-12-07 4:42 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-06 6:05 [PATCH 0/11] vfs: inode cache scalability improvements Dave Chinner
2023-12-06 6:05 ` [PATCH 01/11] lib/dlock-list: Distributed and lock-protected lists Dave Chinner
2023-12-07 2:23 ` Al Viro
2023-12-06 6:05 ` [PATCH 02/11] vfs: Remove unnecessary list_for_each_entry_safe() variants Dave Chinner
2023-12-07 2:26 ` Al Viro
2023-12-07 4:18 ` Kent Overstreet
2023-12-06 6:05 ` [PATCH 03/11] vfs: Use dlock list for superblock's inode list Dave Chinner
2023-12-07 2:40 ` Al Viro
2023-12-07 4:59 ` Dave Chinner
2023-12-07 5:03 ` Kent Overstreet
2023-12-06 6:05 ` [PATCH 04/11] lib/dlock-list: Make sibling CPUs share the same linked list Dave Chinner
2023-12-07 4:31 ` Kent Overstreet
2023-12-07 5:42 ` Kent Overstreet
2023-12-07 6:25 ` Dave Chinner
2023-12-07 6:49 ` Al Viro
2023-12-06 6:05 ` [PATCH 05/11] selinux: use dlist for isec inode list Dave Chinner
2023-12-06 21:52 ` Paul Moore
2023-12-06 23:04 ` Dave Chinner
2023-12-07 0:36 ` Paul Moore
2023-12-06 6:05 ` [PATCH 06/11] vfs: factor out inode hash head calculation Dave Chinner
2023-12-07 3:02 ` Al Viro
2023-12-06 6:05 ` [PATCH 07/11] hlist-bl: add hlist_bl_fake() Dave Chinner
2023-12-07 3:05 ` Al Viro
2023-12-06 6:05 ` [PATCH 08/11] vfs: inode cache conversion to hash-bl Dave Chinner
2023-12-07 4:58 ` Kent Overstreet
2023-12-07 6:03 ` Dave Chinner
2023-12-07 6:42 ` Al Viro
2023-12-06 6:05 ` [PATCH 09/11] hash-bl: explicitly initialise hash-bl heads Dave Chinner
2023-12-07 3:15 ` Al Viro
2023-12-06 6:05 ` [PATCH 10/11] list_bl: don't use bit locks for PREEMPT_RT or lockdep Dave Chinner
2023-12-07 4:16 ` Kent Overstreet
2023-12-07 4:41 ` Dave Chinner [this message]
2023-12-06 6:05 ` [PATCH 11/11] hlist-bl: introduced nested locking for dm-snap Dave Chinner
2023-12-07 17:08 ` [PATCH 0/11] vfs: inode cache scalability improvements Kent Overstreet
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZXFNFKai3ILLFdnR@dread.disaster.area \
--to=david@fromorbit.com \
--cc=dhowells@redhat.com \
--cc=dm-devel@lists.linux.dev \
--cc=gfs2@lists.linux.dev \
--cc=kent.overstreet@linux.dev \
--cc=linux-block@vger.kernel.org \
--cc=linux-cachefs@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-security-module@vger.kernel.org \
--cc=selinux@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.