From: "Ionut Nechita (Wind River)" <ionut.nechita@windriver.com>
To: axboe@kernel.dk, linux-block@vger.kernel.org
Cc: bigeasy@linutronix.de, bvanassche@acm.org, clrkwllms@kernel.org,
rostedt@goodmis.org, ming.lei@redhat.com, muchun.song@linux.dev,
mkhalfella@purestorage.com, chris.friesen@windriver.com,
linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev,
linux-rt-users@vger.kernel.org, stable@vger.kernel.org,
ionut_n2001@yahoo.com, sunlightlinux@gmail.com
Subject: [PATCH v6 0/1] block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT
Date: Wed, 6 May 2026 09:56:11 +0300
Message-ID: <cover.1778048987.git.ionut.nechita@windriver.com>
Hi Jens,
This is v6 of the fix for the RT kernel performance regression caused by
commit 6bda857bcbb86 ("block: fix ordering between checking
QUEUE_FLAG_QUIESCED request adding").
Changes since v5 (Mar 3):
- Rewrote the memory-ordering comments per Bart Van Assche's review.
The previous wording incorrectly described smp_mb__after_atomic() as
ordering against "subsequent loads in blk_mq_run_hw_queue()". The
comments now describe the actual reader/writer pairing: writer-side
smp_mb__after_atomic() in blk_mq_quiesce_queue_nowait() and
blk_mq_unquiesce_queue() pairs with reader-side smp_rmb() in
blk_mq_run_hw_queue() so the re-check observes the latest
quiesce_depth value.
- Rebased on top of linux-next (next-20260505).
- No functional / code-generation changes.
Changes since v4 (Feb 13):
- Rebased on top of linux-next (next-20260302)
- No code changes
Changes since v3 (Feb 11):
- Rebased on top of axboe/for-7.0/block
- Fixed Fixes tag commit hash to match upstream (6bda857bcbb86)
- Added Reviewed-by from Sebastian Andrzej Siewior
- No code changes
Changes since v2 (Feb 10):
- Replaced raw_spinlock_t quiesce_sync_lock with atomic_t for
quiesce_depth, as suggested by Sebastian Andrzej Siewior
- Eliminated QUEUE_FLAG_QUIESCED entirely; blk_queue_quiesced() now
checks atomic_read(&q->quiesce_depth) > 0
- Used atomic_dec_if_positive() in blk_mq_unquiesce_queue() to avoid a
race between the WARN check and the decrement
- Removed the unrelated blk_mq_run_hw_queues() async=true change
- Removed the blk-mq-debugfs.c QUIESCED flag entry
- Used smp_mb__after_atomic() / smp_rmb() for memory ordering instead
of any spinlock in the hot path
Changes since v1 (RESEND, Jan 9):
- Rebased on top of axboe/for-7.0/block
- No code changes
The problem: on PREEMPT_RT kernels, the spinlock_t queue_lock acquisition
added to blk_mq_run_hw_queue() converts to a sleeping rt_mutex, causing
all IRQ threads (one per MSI-X vector) to serialize. On megaraid_sas with
128 MSI-X vectors and 120 hw queues, throughput drops from 640 MB/s to
153 MB/s (roughly a 76% reduction).
The fix converts quiesce_depth to atomic_t, which serves as both the
depth tracker and the quiesce indicator (depth > 0 means quiesced).
This eliminates QUEUE_FLAG_QUIESCED and removes the need for any lock
in the hot path. Memory ordering is ensured by smp_mb__after_atomic()
on the writer side (after modifying quiesce_depth) paired with
smp_rmb() on the reader side (before re-checking quiesce state in
blk_mq_run_hw_queue()).
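The counter semantics described above can be sketched in user space with
C11 atomics. This is only an illustration, not the patch itself: struct
queue_model, quiesce(), unquiesce(), queue_quiesced() and
dec_if_positive() are hypothetical names, and sequentially consistent C11
operations stand in for the kernel's atomic_t helpers and explicit
barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the request queue; only the counter is modeled. */
struct queue_model {
	atomic_int quiesce_depth;
};

/* Depth > 0 means quiesced, so no separate flag bit is needed. */
static bool queue_quiesced(struct queue_model *q)
{
	return atomic_load(&q->quiesce_depth) > 0;
}

static void quiesce(struct queue_model *q)
{
	/* seq_cst read-modify-write; assumed analogue of inc + full barrier. */
	atomic_fetch_add(&q->quiesce_depth, 1);
}

/*
 * Decrement only when the counter is positive. Returns the old value
 * minus one; a negative return means the counter was left untouched.
 * Modeled on the documented behaviour of atomic_dec_if_positive().
 */
static int dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0 &&
	       !atomic_compare_exchange_weak(v, &old, old - 1))
		; /* a failed CAS reloads 'old'; retry with the new value */
	return old - 1;
}

/* Returns true when this call dropped the depth to zero. */
static bool unquiesce(struct queue_model *q)
{
	int depth = dec_if_positive(&q->quiesce_depth);

	if (depth < 0) {
		/* Check and decrement are one atomic step, so no race here. */
		fprintf(stderr, "unbalanced unquiesce\n");
		return false;
	}
	return depth == 0;
}
```

In this model, two quiesce() calls followed by two unquiesce() calls leave
the queue unquiesced only after the second unquiesce(), and a third
unquiesce() is rejected without driving the counter negative.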
Link: https://lore.kernel.org/linux-block/20260303073744.20585-1-ionut.nechita@windriver.com/
Ionut Nechita (1):
block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention
on RT
block/blk-core.c | 1 +
block/blk-mq-debugfs.c | 1 -
block/blk-mq.c | 53 +++++++++++++++++++++---------------------
include/linux/blkdev.h | 9 ++++---
4 files changed, 34 insertions(+), 30 deletions(-)
--
2.54.0