Re: [PATCH 2/2] block/blk-mq: convert blk_mq_cpuhp_lock to raw_spinlock for RT

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Ming Lei <ming.lei@redhat.com>
To: "Ionut Nechita (WindRiver)" <djiony2011@gmail.com>
Cc: axboe@kernel.dk, gregkh@linuxfoundation.org,
	muchun.song@linux.dev, sashal@kernel.org,
	linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	stable@vger.kernel.org,
	Ionut Nechita <ionut.nechita@windriver.com>
Subject: Re: [PATCH 2/2] block/blk-mq: convert blk_mq_cpuhp_lock to raw_spinlock for RT
Date: Sat, 20 Dec 2025 20:47:40 +0800	[thread overview]
Message-ID: <aUaa7IbGko82Dn8Z@fedora> (raw)
In-Reply-To: <20251220110241.8435-3-ionut.nechita@windriver.com>

On Sat, Dec 20, 2025 at 01:02:41PM +0200, Ionut Nechita (WindRiver) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
> 
> Commit 58bf93580fec ("blk-mq: move cpuhp callback registering out of
> q->sysfs_lock") introduced a global mutex blk_mq_cpuhp_lock to avoid
> lockdep warnings between sysfs_lock and CPU hotplug lock.
> 
> On RT kernels (CONFIG_PREEMPT_RT), regular mutexes are converted to
> rt_mutex (sleeping locks). When block layer operations need to acquire
> blk_mq_cpuhp_lock, IRQ threads processing I/O completions may sleep,
> causing additional contention on top of the queue_lock issue from
> commit 679b1874eba7 ("block: fix ordering between checking
> QUEUE_FLAG_QUIESCED request adding").
> 
> Test case (MegaRAID 12GSAS with 8 MSI-X vectors on RT kernel):
> - v6.6.68-rt with queue_lock fix: 640 MB/s (queue_lock fixed)
> - v6.6.69-rt: still exhibits contention due to cpuhp_lock mutex
> 
> The functions protected by blk_mq_cpuhp_lock only perform fast,
> non-sleeping operations:
> - hlist_unhashed() checks
> - cpuhp_state_add_instance_nocalls() - just hlist manipulation
> - cpuhp_state_remove_instance_nocalls() - just hlist manipulation
> - INIT_HLIST_NODE() initialization
> 
> The _nocalls variants do not invoke state callbacks and only manipulate
> data structures, making them safe to call under raw_spinlock.
> 
> Convert blk_mq_cpuhp_lock from mutex to raw_spinlock to prevent it from
> becoming a sleeping lock in RT kernel. This eliminates the contention
> bottleneck while maintaining the lockdep fix's original intent.

What is the contention bottleneck? blk_mq_cpuhp_lock is only acquired in
slow code path, and it isn't required in fast io path.

> 
> Fixes: 58bf93580fec ("blk-mq: move cpuhp callback registering out of q->sysfs_lock")

With the 1st patch, the perf becomes 640MB/s, same with before regression.

So can you share what you try to fix with this patch?

> Cc: stable@vger.kernel.org
> Signed-off-by: Ionut Nechita <ionut.nechita@windriver.com>
> ---
>  block/blk-mq.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5fb8da4958d0..3982e24b1081 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -43,7 +43,7 @@
>  
>  static DEFINE_PER_CPU(struct llist_head, blk_cpu_done);
>  static DEFINE_PER_CPU(call_single_data_t, blk_cpu_csd);
> -static DEFINE_MUTEX(blk_mq_cpuhp_lock);
> +static DEFINE_RAW_SPINLOCK(blk_mq_cpuhp_lock);
>  
>  static void blk_mq_insert_request(struct request *rq, blk_insert_t flags);
>  static void blk_mq_request_bypass_insert(struct request *rq,
> @@ -3641,9 +3641,9 @@ static void __blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
>  
>  static void blk_mq_remove_cpuhp(struct blk_mq_hw_ctx *hctx)
>  {
> -	mutex_lock(&blk_mq_cpuhp_lock);
> +	raw_spin_lock(&blk_mq_cpuhp_lock);
>  	__blk_mq_remove_cpuhp(hctx);
> -	mutex_unlock(&blk_mq_cpuhp_lock);
> +	raw_spin_unlock(&blk_mq_cpuhp_lock);

__blk_mq_remove_cpuhp()
	->cpuhp_state_remove_instance_nocalls()
		->__cpuhp_state_remove_instance
			->cpus_read_lock
				->percpu_down_read
					->percpu_down_read_internal
						->might_sleep()


Thanks,
Ming

next prev parent reply	other threads:[~2025-12-20 12:48 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-20 11:02 [PATCH 0/2] block/blk-mq: fix RT kernel performance regressions Ionut Nechita (WindRiver)
2025-12-20 11:02 ` [PATCH 1/2] block/blk-mq: fix RT kernel regression with queue_lock in hot path Ionut Nechita (WindRiver)
2025-12-20 11:02 ` [PATCH 2/2] block/blk-mq: convert blk_mq_cpuhp_lock to raw_spinlock for RT Ionut Nechita (WindRiver)
2025-12-20 12:47   ` Ming Lei [this message]
2025-12-20 20:58     ` [PATCH 0/2] block/blk-mq: fix RT kernel performance regressions Ionut Nechita (WindRiver)
2025-12-20 16:00 ` [syzbot ci] " syzbot ci
     [not found] <20251220105448.8065-1-ionut.nechita@windriver.com>
2025-12-20 10:54 ` [PATCH 2/2] block/blk-mq: convert blk_mq_cpuhp_lock to raw_spinlock for RT Ionut Nechita (WindRiver)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aUaa7IbGko82Dn8Z@fedora \
    --to=ming.lei@redhat.com \
    --cc=axboe@kernel.dk \
    --cc=djiony2011@gmail.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=ionut.nechita@windriver.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=muchun.song@linux.dev \
    --cc=sashal@kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.