From: "Ionut Nechita (Wind River)" <ionut.nechita@windriver.com>
To: axboe@kernel.dk, linux-block@vger.kernel.org
Cc: bigeasy@linutronix.de, bvanassche@acm.org, clrkwllms@kernel.org,
	rostedt@goodmis.org, ming.lei@redhat.com, muchun.song@linux.dev,
	mkhalfella@purestorage.com, chris.friesen@windriver.com,
	linux-kernel@vger.kernel.org, linux-rt-devel@lists.linux.dev,
	linux-rt-users@vger.kernel.org, stable@vger.kernel.org,
	ionut_n2001@yahoo.com, sunlightlinux@gmail.com
Subject: [PATCH v6 0/1] block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT
Date: Wed,  6 May 2026 09:56:11 +0300
Message-ID: <cover.1778048987.git.ionut.nechita@windriver.com>

Hi Jens,

This is v6 of the fix for the RT kernel performance regression caused by
commit 6bda857bcbb86 ("block: fix ordering between checking
QUEUE_FLAG_QUIESCED request adding").

Changes since v5 (Mar 3):
- Rewrote the memory-ordering comments per Bart Van Assche's review.
  The previous wording incorrectly described smp_mb__after_atomic() as
  ordering against "subsequent loads in blk_mq_run_hw_queue()". The
  comments now describe the actual reader/writer pairing: writer-side
  smp_mb__after_atomic() in blk_mq_quiesce_queue_nowait() and
  blk_mq_unquiesce_queue() pairs with reader-side smp_rmb() in
  blk_mq_run_hw_queue() so the re-check observes the latest
  quiesce_depth value.
- Rebased on top of linux-next (next-20260505).
- No functional / code-generation changes.

Changes since v4 (Feb 13):
- Rebased on top of linux-next (next-20260302)
- No code changes

Changes since v3 (Feb 11):
- Rebased on top of axboe/for-7.0/block
- Fixed Fixes tag commit hash to match upstream (6bda857bcbb86)
- Added Reviewed-by from Sebastian Andrzej Siewior
- No code changes

Changes since v2 (Feb 10):
- Replaced raw_spinlock_t quiesce_sync_lock with atomic_t for
  quiesce_depth, as suggested by Sebastian Andrzej Siewior
- Eliminated QUEUE_FLAG_QUIESCED entirely; blk_queue_quiesced() now
  checks atomic_read(&q->quiesce_depth) > 0
- Used atomic_dec_if_positive() in blk_mq_unquiesce_queue() to avoid a
  race between the WARN check and the decrement
- Removed the unrelated blk_mq_run_hw_queues() async=true change
- Removed blk-mq-debugfs.c QUIESCED flag entry
- Used smp_mb__after_atomic() / smp_rmb() for memory ordering instead
  of any spinlock in the hot path
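
To illustrate the counter semantics listed above, here is a minimal
userspace sketch using C11 atomics. It is not the code from the patch:
struct queue_model and every model_*() name are hypothetical stand-ins,
and model_dec_if_positive() only mimics the documented behaviour of the
kernel's atomic_dec_if_positive() (decrement only if the old value is
positive, return the old value minus one either way).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant request_queue state. */
struct queue_model {
	atomic_int quiesce_depth;	/* depth > 0 means "quiesced" */
};

/* Models blk_queue_quiesced(): the depth doubles as the flag. */
static bool model_queue_quiesced(struct queue_model *q)
{
	return atomic_load(&q->quiesce_depth) > 0;
}

/* Models the quiesce side: record one more nested quiesce request. */
static void model_quiesce(struct queue_model *q)
{
	int prev = atomic_fetch_add(&q->quiesce_depth, 1);

	assert(prev >= 0);	/* the depth is never expected to be negative */
}

/*
 * Mimics atomic_dec_if_positive(): decrement only when the old value is
 * positive. Returns the old value minus one in both cases, so a negative
 * result means the counter was left untouched.
 */
static int model_dec_if_positive(atomic_int *v)
{
	int old = atomic_load(v);

	while (old > 0 && !atomic_compare_exchange_weak(v, &old, old - 1))
		;	/* a failed CAS reloaded 'old'; re-test and retry */
	return old - 1;
}

/*
 * Models the unquiesce side. Check and decrement are one atomic step:
 *   < 0  unbalanced call (where the kernel would WARN), depth unchanged
 *   == 0 last unquiesce, the queue may run again
 *   > 0  still quiesced by an outer caller
 */
static int model_unquiesce(struct queue_model *q)
{
	return model_dec_if_positive(&q->quiesce_depth);
}
```

Because the positivity check and the decrement happen in a single
compare-and-swap loop, two concurrent unbalanced callers cannot both
pass a separate "depth > 0" test and drive the counter negative.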

Changes since v1 (RESEND, Jan 9):
- Rebased on top of axboe/for-7.0/block
- No code changes

The problem: on PREEMPT_RT kernels, spinlock_t becomes a sleeping lock
based on rt_mutex, so the queue_lock acquisition that commit
6bda857bcbb86 added to blk_mq_run_hw_queue() makes all IRQ threads (one
per MSI-X vector) serialize on that lock. On megaraid_sas with 128
MSI-X vectors and 120 hw queues, throughput drops from 640 MB/s to
153 MB/s.

The fix converts quiesce_depth to atomic_t, which serves as both the
depth tracker and the quiesce indicator (depth > 0 means quiesced).
This eliminates QUEUE_FLAG_QUIESCED and removes the need for any lock
in the hot path. Memory ordering is ensured by smp_mb__after_atomic()
on the writer side (after modifying quiesce_depth) paired with
smp_rmb() on the reader side (before re-checking quiesce state in
blk_mq_run_hw_queue()).
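
The writer/reader pairing described above can be sketched in userspace
with C11 atomics. This is an illustration under stated assumptions, not
the patch itself: struct rq_model and the model_*() functions are
hypothetical, a seq_cst fence stands in for smp_mb__after_atomic(), and
an acquire fence stands in for smp_rmb(). The C11 fences only
approximate the kernel barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the state the run path looks at. */
struct rq_model {
	atomic_int quiesce_depth;	/* depth > 0 means "quiesced" */
	atomic_bool has_pending;	/* stands in for "requests are queued" */
};

/* Writer side: change the depth, then issue a full barrier. */
static void model_quiesce_writer(struct rq_model *q)
{
	atomic_fetch_add_explicit(&q->quiesce_depth, 1, memory_order_relaxed);
	/* Stand-in for smp_mb__after_atomic(). */
	atomic_thread_fence(memory_order_seq_cst);
}

static void model_unquiesce_writer(struct rq_model *q)
{
	int prev = atomic_fetch_sub_explicit(&q->quiesce_depth, 1,
					     memory_order_relaxed);

	assert(prev > 0);	/* this model assumes balanced calls */
	/* Stand-in for smp_mb__after_atomic(). */
	atomic_thread_fence(memory_order_seq_cst);
}

/* The queue needs running if it is not quiesced and has pending work. */
static bool model_need_run(struct rq_model *q)
{
	return atomic_load_explicit(&q->quiesce_depth,
				    memory_order_relaxed) == 0 &&
	       atomic_load_explicit(&q->has_pending, memory_order_relaxed);
}

/*
 * Reader side, modelled on the re-check in blk_mq_run_hw_queue(): a
 * lockless first check and, if that says "do not run", a read barrier
 * followed by a second check. Returns true where the real code would
 * go on to dispatch.
 */
static bool model_run_hw_queue(struct rq_model *q)
{
	bool need_run = model_need_run(q);

	if (!need_run) {
		/* Stand-in for smp_rmb() before the re-check. */
		atomic_thread_fence(memory_order_acquire);
		need_run = model_need_run(q);
		if (!need_run)
			return false;
	}
	return true;
}
```

The fast path takes no lock, so on PREEMPT_RT nothing in it can sleep
or serialize IRQ threads. Running this single-threaded only checks the
control flow; it says nothing about ordering on weakly ordered hardware.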

Link: https://lore.kernel.org/linux-block/20260303073744.20585-1-ionut.nechita@windriver.com/

Ionut Nechita (1):
  block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention
    on RT

 block/blk-core.c       |  1 +
 block/blk-mq-debugfs.c |  1 -
 block/blk-mq.c         | 53 +++++++++++++++++++++---------------------
 include/linux/blkdev.h |  9 ++++---
 4 files changed, 34 insertions(+), 30 deletions(-)

--
2.54.0


Thread overview: 7+ messages
2026-05-06  6:56 Ionut Nechita (Wind River) [this message]
2026-05-06  6:56 ` [PATCH v6 1/1] block/blk-mq: use atomic_t for quiesce_depth to avoid lock contention on RT Ionut Nechita (Wind River)
2026-05-06  7:14   ` Bart Van Assche
2026-05-06  7:47     ` Sebastian Andrzej Siewior
2026-05-06  9:43       ` Bart Van Assche
2026-05-07  7:45         ` Sebastian Andrzej Siewior
2026-05-07 10:41           ` Bart Van Assche
