linux-block.vger.kernel.org archive mirror
* [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop
@ 2025-08-14  8:24 Nilay Shroff
  2025-08-14  8:24 ` [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio() Nilay Shroff
                   ` (4 more replies)
  0 siblings, 5 replies; 25+ messages in thread
From: Nilay Shroff @ 2025-08-14  8:24 UTC (permalink / raw)
  To: linux-block
  Cc: axboe, ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce

Hi,

This patchset replaces the use of a static key in the I/O path (the
rq_qos_xxx() helpers) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED).
This change eliminates a potential deadlock introduced by the use of
static keys in the blk-rq-qos infrastructure, as reported by lockdep
during blktests block/005 [1].

The original static key approach was introduced to avoid unnecessary
dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
blk-iolatency) is configured. While efficient, enabling a static key at
runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
becomes problematic if the queue is already frozen — causing a reverse
dependency on ->freeze_lock. This results in a lockdep splat indicating
a potential deadlock.

To resolve this, we now gate q->rq_qos access with a q->queue_flags
bitop (QUEUE_FLAG_QOS_ENABLED), avoiding the static key and the associated
locking altogether.
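
With the flag in place, each rq_qos fast-path helper checks
QUEUE_FLAG_QOS_ENABLED before dereferencing q->rq_qos, along these
lines (rq_qos_issue() shown as a representative example; see patch 3
for the full set):

	static inline void rq_qos_issue(struct request_queue *q, struct request *rq)
	{
		/* cheap bit test on q->queue_flags gates the q->rq_qos access */
		if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
				q->rq_qos)
			__rq_qos_issue(q->rq_qos, rq);
	}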

I compared the static key and atomic bitop implementations using the
ftrace function graph tracer over ~50 invocations of rq_qos_issue(),
while ensuring blk-wbt/blk-iolatency were disabled (i.e., no QoS
functionality). For an easier comparison, I made rq_qos_issue() noinline.
The comparison was made on a PowerPC machine.

Static Key disabled (QoS is not configured):
5d0: 00 00 00 60     nop    # patched in by static key framework
5d4: 20 00 80 4e     blr    # return (branch to link register)

Only a nop and blr (branch to link register) are executed — very lightweight.

Atomic bitop (QoS is not configured):
5d0: 20 00 23 e9     ld      r9,32(r3)     # load q->queue_flags
5d4: 00 80 29 71     andi.   r9,r9,32768   # check QUEUE_FLAG_QOS_ENABLED (bit 15)
5d8: 20 00 82 4d     beqlr                 # return if bit not set

This performs an ld and andi. before returning. Slightly more work,
but q->queue_flags is typically hot in cache during I/O submission.

With Static Key (disabled):
Duration (us): min=0.668 max=0.816 avg≈0.750

With atomic bitop QUEUE_FLAG_QOS_ENABLED (bit not set):
Duration (us): min=0.684 max=0.834 avg≈0.759

As expected, both versions have nearly identical cost. The added latency
from the extra ld and andi. is ~9ns (avg 0.759 us vs. 0.750 us).

There are three patches in the series:
- The first patch is a minor optimization that skips the redundant
  q->rq_qos check in rq_qos_done_bio().
- The second patch fixes a subtle issue in rq_qos_del(), ensuring that
  the block_rq_qos static key is decremented when a QoS policy is
  detached from the queue.
- The third patch replaces the block_rq_qos static key with the atomic
  flag QUEUE_FLAG_QOS_ENABLED, which breaks the cpu_hotplug_lock
  dependency on freeze_lock and thereby fixes the lockdep splat.

As usual, feedback and review comments are welcome!

[1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/

Changes from v2:
  - Added a change to skip the q->rq_qos check in rq_qos_done_bio().
    This is now part of the first patch in this series. (Yu Kuai)
  - Added a separate patch to ensure block_rq_qos is decremented when
    detaching a QoS policy in rq_qos_del(). This is now the second 
    patch in this series. (Yu Kuai)
  - Folded the third patch from v2 into the new third patch, as patch
    ordering changed (the second patch from v2 is now the third patch
    in v3).

Link to v2: https://lore.kernel.org/all/20250805171749.3448694-1-nilay@linux.ibm.com/

Changes from v1:
  - For debugging I had made rq_qos_issue() noinline in my local
    workspace, but it inadvertently slipped into the posted patchset.
    Reverted that and made rq_qos_issue() inline again, as before.
  - Added Reported-by and Closes tags to the first patch, which I
    missed adding in the first version.

Link to v1: https://lore.kernel.org/all/20250804122125.3271397-1-nilay@linux.ibm.com/

Nilay Shroff (3):
  block: skip q->rq_qos check in rq_qos_done_bio()
  block: decrement block_rq_qos static key in rq_qos_del()
  block: avoid cpu_hotplug_lock depedency on freeze_lock

 block/blk-mq-debugfs.c |  1 +
 block/blk-rq-qos.c     |  8 +++----
 block/blk-rq-qos.h     | 48 +++++++++++++++++++++++++++---------------
 include/linux/blkdev.h |  1 +
 4 files changed, 37 insertions(+), 21 deletions(-)

-- 
2.50.1


^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio()
  2025-08-14  8:24 [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
@ 2025-08-14  8:24 ` Nilay Shroff
  2025-08-14  8:59   ` Yu Kuai
  2025-08-14 11:12   ` Ming Lei
  2025-08-14  8:24 ` [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del() Nilay Shroff
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread
From: Nilay Shroff @ 2025-08-14  8:24 UTC (permalink / raw)
  To: linux-block
  Cc: axboe, ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce

If a bio has BIO_QOS_THROTTLED or BIO_QOS_MERGED set,
it implicitly guarantees that q->rq_qos is present.
Avoid re-checking q->rq_qos in this case and call
__rq_qos_done_bio() directly as a minor optimization.

Suggested-by: Yu Kuai <yukuai1@huaweicloud.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-rq-qos.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 39749f4066fb..28125fc49eff 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -142,8 +142,14 @@ static inline void rq_qos_done_bio(struct bio *bio)
 	    bio->bi_bdev && (bio_flagged(bio, BIO_QOS_THROTTLED) ||
 			     bio_flagged(bio, BIO_QOS_MERGED))) {
 		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-		if (q->rq_qos)
-			__rq_qos_done_bio(q->rq_qos, bio);
+
+		/*
+		 * If a bio has BIO_QOS_xxx set, it implicitly implies that
+		 * q->rq_qos is present. So, we skip re-checking q->rq_qos
+		 * here as an extra optimization and directly call
+		 * __rq_qos_done_bio().
+		 */
+		__rq_qos_done_bio(q->rq_qos, bio);
 	}
 }
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del()
  2025-08-14  8:24 [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
  2025-08-14  8:24 ` [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio() Nilay Shroff
@ 2025-08-14  8:24 ` Nilay Shroff
  2025-08-14  9:14   ` Yu Kuai
  2025-08-14 11:33   ` Ming Lei
  2025-08-14  8:24 ` [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock Nilay Shroff
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 25+ messages in thread
From: Nilay Shroff @ 2025-08-14  8:24 UTC (permalink / raw)
  To: linux-block
  Cc: axboe, ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce

rq_qos_add() increments the block_rq_qos static key when a QoS
policy is attached. When a QoS policy is removed via rq_qos_del(),
we must symmetrically decrement the static key. If this removal drops
the last QoS policy from the queue (q->rq_qos becomes NULL), the
static branch can be disabled and the jump label patched to a NOP,
avoiding overhead on the hot path.

This change ensures rq_qos_add()/rq_qos_del() keep the
block_rq_qos static key balanced and prevents leaving the branch
permanently enabled after the last policy is removed.

Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-rq-qos.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 848591fb3c57..b1e24bb85ad2 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -374,6 +374,7 @@ void rq_qos_del(struct rq_qos *rqos)
 	for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
 		if (*cur == rqos) {
 			*cur = rqos->next;
+			static_branch_dec(&block_rq_qos);
 			break;
 		}
 	}
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14  8:24 [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
  2025-08-14  8:24 ` [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio() Nilay Shroff
  2025-08-14  8:24 ` [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del() Nilay Shroff
@ 2025-08-14  8:24 ` Nilay Shroff
  2025-08-14  9:21   ` Yu Kuai
                     ` (2 more replies)
  2025-08-21 12:19 ` [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
  2025-08-21 13:11 ` Jens Axboe
  4 siblings, 3 replies; 25+ messages in thread
From: Nilay Shroff @ 2025-08-14  8:24 UTC (permalink / raw)
  To: linux-block
  Cc: axboe, ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce

A recent lockdep[1] splat observed while running blktest block/005
reveals a potential deadlock caused by the cpu_hotplug_lock dependency
on ->freeze_lock. This dependency was introduced by commit 033b667a823e
("block: blk-rq-qos: guard rq-qos helpers by static key").

That change added a static key to avoid fetching q->rq_qos when
neither blk-wbt nor blk-iolatency is configured. The static key
dynamically patches kernel text to a NOP when disabled, eliminating
overhead of fetching q->rq_qos in the I/O hot path. However, enabling
a static key at runtime requires acquiring both cpu_hotplug_lock and
jump_label_mutex. When this happens after the queue has already been
frozen (i.e., while holding ->freeze_lock), it creates a locking
dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
potential deadlock reported by lockdep [1].

To resolve this, replace the static key mechanism with q->queue_flags:
QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
otherwise, the access is skipped.

Since q->queue_flags is commonly accessed in IO hotpath and resides in
the first cacheline of struct request_queue, checking it imposes minimal
overhead while eliminating the deadlock risk.

This change avoids the lockdep splat without introducing performance
regressions.

[1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/

Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
 block/blk-mq-debugfs.c |  1 +
 block/blk-rq-qos.c     |  9 ++++---
 block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
 include/linux/blkdev.h |  1 +
 4 files changed, 37 insertions(+), 28 deletions(-)

diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7ed3e71f2fc0..32c65efdda46 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
 	QUEUE_FLAG_NAME(SQ_SCHED),
 	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
 	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
+	QUEUE_FLAG_NAME(QOS_ENABLED),
 };
 #undef QUEUE_FLAG_NAME
 
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index b1e24bb85ad2..654478dfbc20 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -2,8 +2,6 @@
 
 #include "blk-rq-qos.h"
 
-__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
-
 /*
  * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
  * false if 'v' + 1 would be bigger than 'below'.
@@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
 		struct rq_qos *rqos = q->rq_qos;
 		q->rq_qos = rqos->next;
 		rqos->ops->exit(rqos);
-		static_branch_dec(&block_rq_qos);
 	}
+	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
 	mutex_unlock(&q->rq_qos_mutex);
 }
 
@@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 		goto ebusy;
 	rqos->next = q->rq_qos;
 	q->rq_qos = rqos;
-	static_branch_inc(&block_rq_qos);
+	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
 
 	blk_mq_unfreeze_queue(q, memflags);
 
@@ -374,10 +372,11 @@ void rq_qos_del(struct rq_qos *rqos)
 	for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
 		if (*cur == rqos) {
 			*cur = rqos->next;
-			static_branch_dec(&block_rq_qos);
 			break;
 		}
 	}
+	if (!q->rq_qos)
+		blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
 	blk_mq_unfreeze_queue(q, memflags);
 
 	mutex_lock(&q->debugfs_mutex);
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 28125fc49eff..1fe22000a379 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -12,7 +12,6 @@
 #include "blk-mq-debugfs.h"
 
 struct blk_mq_debugfs_attr;
-extern struct static_key_false block_rq_qos;
 
 enum rq_qos_id {
 	RQ_QOS_WBT,
@@ -113,49 +112,55 @@ void __rq_qos_queue_depth_changed(struct rq_qos *rqos);
 
 static inline void rq_qos_cleanup(struct request_queue *q, struct bio *bio)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos)
 		__rq_qos_cleanup(q->rq_qos, bio);
 }
 
 static inline void rq_qos_done(struct request_queue *q, struct request *rq)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos &&
-	    !blk_rq_is_passthrough(rq))
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos && !blk_rq_is_passthrough(rq))
 		__rq_qos_done(q->rq_qos, rq);
 }
 
 static inline void rq_qos_issue(struct request_queue *q, struct request *rq)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos)
 		__rq_qos_issue(q->rq_qos, rq);
 }
 
 static inline void rq_qos_requeue(struct request_queue *q, struct request *rq)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos)
 		__rq_qos_requeue(q->rq_qos, rq);
 }
 
 static inline void rq_qos_done_bio(struct bio *bio)
 {
-	if (static_branch_unlikely(&block_rq_qos) &&
-	    bio->bi_bdev && (bio_flagged(bio, BIO_QOS_THROTTLED) ||
-			     bio_flagged(bio, BIO_QOS_MERGED))) {
-		struct request_queue *q = bdev_get_queue(bio->bi_bdev);
-
-		/*
-		 * If a bio has BIO_QOS_xxx set, it implicitly implies that
-		 * q->rq_qos is present. So, we skip re-checking q->rq_qos
-		 * here as an extra optimization and directly call
-		 * __rq_qos_done_bio().
-		 */
-		__rq_qos_done_bio(q->rq_qos, bio);
-	}
+	struct request_queue *q;
+
+	if (!bio->bi_bdev || (!bio_flagged(bio, BIO_QOS_THROTTLED) &&
+			     !bio_flagged(bio, BIO_QOS_MERGED)))
+		return;
+
+	q = bdev_get_queue(bio->bi_bdev);
+
+	/*
+	 * If a bio has BIO_QOS_xxx set, it implicitly implies that
+	 * q->rq_qos is present. So, we skip re-checking q->rq_qos
+	 * here as an extra optimization and directly call
+	 * __rq_qos_done_bio().
+	 */
+	__rq_qos_done_bio(q->rq_qos, bio);
 }
 
 static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos) {
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos) {
 		bio_set_flag(bio, BIO_QOS_THROTTLED);
 		__rq_qos_throttle(q->rq_qos, bio);
 	}
@@ -164,14 +169,16 @@ static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
 static inline void rq_qos_track(struct request_queue *q, struct request *rq,
 				struct bio *bio)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos)
 		__rq_qos_track(q->rq_qos, rq, bio);
 }
 
 static inline void rq_qos_merge(struct request_queue *q, struct request *rq,
 				struct bio *bio)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos) {
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos) {
 		bio_set_flag(bio, BIO_QOS_MERGED);
 		__rq_qos_merge(q->rq_qos, rq, bio);
 	}
@@ -179,7 +186,8 @@ static inline void rq_qos_merge(struct request_queue *q, struct request *rq,
 
 static inline void rq_qos_queue_depth_changed(struct request_queue *q)
 {
-	if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+	if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+			q->rq_qos)
 		__rq_qos_queue_depth_changed(q->rq_qos);
 }
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 95886b404b16..fe1797bbec42 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -656,6 +656,7 @@ enum {
 	QUEUE_FLAG_SQ_SCHED,		/* single queue style io dispatch */
 	QUEUE_FLAG_DISABLE_WBT_DEF,	/* for sched to disable/enable wbt */
 	QUEUE_FLAG_NO_ELV_SWITCH,	/* can't switch elevator any more */
+	QUEUE_FLAG_QOS_ENABLED,		/* qos is enabled */
 	QUEUE_FLAG_MAX
 };
 
-- 
2.50.1


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio()
  2025-08-14  8:24 ` [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio() Nilay Shroff
@ 2025-08-14  8:59   ` Yu Kuai
  2025-08-14 11:12   ` Ming Lei
  1 sibling, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-08-14  8:59 UTC (permalink / raw)
  To: Nilay Shroff, linux-block
  Cc: axboe, ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce,
	yukuai (C)



On 2025/08/14 16:24, Nilay Shroff wrote:
> If a bio has BIO_QOS_THROTTLED or BIO_QOS_MERGED set,
> it implicitly guarantees that q->rq_qos is present.
> Avoid re-checking q->rq_qos in this case and call
> __rq_qos_done_bio() directly as a minor optimization.
> 
> Suggested-by : Yu Kuai<yukuai1@huaweicloud.com>
> Signed-off-by: Nilay Shroff<nilay@linux.ibm.com>
> ---
>   block/blk-rq-qos.h | 10 ++++++++--
>   1 file changed, 8 insertions(+), 2 deletions(-)
LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del()
  2025-08-14  8:24 ` [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del() Nilay Shroff
@ 2025-08-14  9:14   ` Yu Kuai
  2025-08-14 11:33   ` Ming Lei
  1 sibling, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-08-14  9:14 UTC (permalink / raw)
  To: Nilay Shroff, linux-block
  Cc: axboe, ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce,
	yukuai (C)

Hi,

On 2025/08/14 16:24, Nilay Shroff wrote:
> rq_qos_add() increments the block_rq_qos static key when a QoS
> policy is attached. When a QoS policy is removed via rq_qos_del(),
> we must symmetrically decrement the static key. If this removal drops
> the last QoS policy from the queue (q->rq_qos becomes NULL), the
> static branch can be disabled and the jump label patched to a NOP,
> avoiding overhead on the hot path.
> 
> This change ensures rq_qos_add()/rq_qos_del() keep the
> block_rq_qos static key balanced and prevents leaving the branch
> permanently enabled after the last policy is removed.
> 

BTW, this problem is unlikely to happen in real user cases, because
rq_qos_del() is only called from the error path, where activating a
cgroup policy failed with -ENOMEM.
> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
>   block/blk-rq-qos.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> index 848591fb3c57..b1e24bb85ad2 100644
> --- a/block/blk-rq-qos.c
> +++ b/block/blk-rq-qos.c
> @@ -374,6 +374,7 @@ void rq_qos_del(struct rq_qos *rqos)
>   	for (cur = &q->rq_qos; *cur; cur = &(*cur)->next) {
>   		if (*cur == rqos) {
>   			*cur = rqos->next;
> +			static_branch_dec(&block_rq_qos);
>   			break;
>   		}
>   	}
> 

Reviewed-by: Yu Kuai <yukuai3@huawei.com>
Thanks


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14  8:24 ` [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock Nilay Shroff
@ 2025-08-14  9:21   ` Yu Kuai
  2025-08-14 12:44   ` Ming Lei
  2025-08-16  1:59   ` Ming Lei
  2 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-08-14  9:21 UTC (permalink / raw)
  To: Nilay Shroff, linux-block
  Cc: axboe, ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce,
	yukuai (C)

On 2025/08/14 16:24, Nilay Shroff wrote:
> A recent lockdep[1] splat observed while running blktest block/005
> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
> ("block: blk-rq-qos: guard rq-qos helpers by static key").
> 
> That change added a static key to avoid fetching q->rq_qos when
> neither blk-wbt nor blk-iolatency is configured. The static key
> dynamically patches kernel text to a NOP when disabled, eliminating
> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
> a static key at runtime requires acquiring both cpu_hotplug_lock and
> jump_label_mutex. When this happens after the queue has already been
> frozen (i.e., while holding ->freeze_lock), it creates a locking
> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
> potential deadlock reported by lockdep [1].
> 
> To resolve this, replace the static key mechanism with q->queue_flags:
> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
> otherwise, the access is skipped.
> 
> Since q->queue_flags is commonly accessed in IO hotpath and resides in
> the first cacheline of struct request_queue, checking it imposes minimal
> overhead while eliminating the deadlock risk.
> 
> This change avoids the lockdep splat without introducing performance
> regressions.
> 
> [1]https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> 
> Reported-by: Shinichiro Kawasaki<shinichiro.kawasaki@wdc.com>
> Closes:https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> Tested-by: Shin'ichiro Kawasaki<shinichiro.kawasaki@wdc.com>
> Signed-off-by: Nilay Shroff<nilay@linux.ibm.com>
> ---
>   block/blk-mq-debugfs.c |  1 +
>   block/blk-rq-qos.c     |  9 ++++---
>   block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>   include/linux/blkdev.h |  1 +
>   4 files changed, 37 insertions(+), 28 deletions(-)

LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio()
  2025-08-14  8:24 ` [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio() Nilay Shroff
  2025-08-14  8:59   ` Yu Kuai
@ 2025-08-14 11:12   ` Ming Lei
  1 sibling, 0 replies; 25+ messages in thread
From: Ming Lei @ 2025-08-14 11:12 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

On Thu, Aug 14, 2025 at 01:54:57PM +0530, Nilay Shroff wrote:
> If a bio has BIO_QOS_THROTTLED or BIO_QOS_MERGED set,
> it implicitly guarantees that q->rq_qos is present.
> Avoid re-checking q->rq_qos in this case and call
> __rq_qos_done_bio() directly as a minor optimization.
> 
> Suggested-by : Yu Kuai <yukuai1@huaweicloud.com>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del()
  2025-08-14  8:24 ` [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del() Nilay Shroff
  2025-08-14  9:14   ` Yu Kuai
@ 2025-08-14 11:33   ` Ming Lei
  1 sibling, 0 replies; 25+ messages in thread
From: Ming Lei @ 2025-08-14 11:33 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

On Thu, Aug 14, 2025 at 01:54:58PM +0530, Nilay Shroff wrote:
> rq_qos_add() increments the block_rq_qos static key when a QoS
> policy is attached. When a QoS policy is removed via rq_qos_del(),
> we must symmetrically decrement the static key. If this removal drops
> the last QoS policy from the queue (q->rq_qos becomes NULL), the
> static branch can be disabled and the jump label patched to a NOP,
> avoiding overhead on the hot path.
> 
> This change ensures rq_qos_add()/rq_qos_del() keep the
> block_rq_qos static key balanced and prevents leaving the branch
> permanently enabled after the last policy is removed.
> 
> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>

Reviewed-by: Ming Lei <ming.lei@redhat.com>

Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14  8:24 ` [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock Nilay Shroff
  2025-08-14  9:21   ` Yu Kuai
@ 2025-08-14 12:44   ` Ming Lei
  2025-08-14 12:57     ` Nilay Shroff
  2025-08-16  1:59   ` Ming Lei
  2 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2025-08-14 12:44 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
> A recent lockdep[1] splat observed while running blktest block/005
> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
> ("block: blk-rq-qos: guard rq-qos helpers by static key").
> 
> That change added a static key to avoid fetching q->rq_qos when
> neither blk-wbt nor blk-iolatency is configured. The static key
> dynamically patches kernel text to a NOP when disabled, eliminating
> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
> a static key at runtime requires acquiring both cpu_hotplug_lock and
> jump_label_mutex. When this happens after the queue has already been
> frozen (i.e., while holding ->freeze_lock), it creates a locking
> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
> potential deadlock reported by lockdep [1].
> 
> To resolve this, replace the static key mechanism with q->queue_flags:
> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
> otherwise, the access is skipped.
> 
> Since q->queue_flags is commonly accessed in IO hotpath and resides in
> the first cacheline of struct request_queue, checking it imposes minimal
> overhead while eliminating the deadlock risk.
> 
> This change avoids the lockdep splat without introducing performance
> regressions.
> 
> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> 
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> ---
>  block/blk-mq-debugfs.c |  1 +
>  block/blk-rq-qos.c     |  9 ++++---
>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>  include/linux/blkdev.h |  1 +
>  4 files changed, 37 insertions(+), 28 deletions(-)
> 
> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> index 7ed3e71f2fc0..32c65efdda46 100644
> --- a/block/blk-mq-debugfs.c
> +++ b/block/blk-mq-debugfs.c
> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>  	QUEUE_FLAG_NAME(SQ_SCHED),
>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>  };
>  #undef QUEUE_FLAG_NAME
>  
> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> index b1e24bb85ad2..654478dfbc20 100644
> --- a/block/blk-rq-qos.c
> +++ b/block/blk-rq-qos.c
> @@ -2,8 +2,6 @@
>  
>  #include "blk-rq-qos.h"
>  
> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
> -
>  /*
>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>   * false if 'v' + 1 would be bigger than 'below'.
> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>  		struct rq_qos *rqos = q->rq_qos;
>  		q->rq_qos = rqos->next;
>  		rqos->ops->exit(rqos);
> -		static_branch_dec(&block_rq_qos);
>  	}
> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>  	mutex_unlock(&q->rq_qos_mutex);
>  }
>  
> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>  		goto ebusy;
>  	rqos->next = q->rq_qos;
>  	q->rq_qos = rqos;
> -	static_branch_inc(&block_rq_qos);
> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);

One stupid question: can we simply move static_branch_inc(&block_rq_qos)
out of queue freeze in rq_qos_add()?

What matters is just the 1st static_branch_inc() which switches the counter
from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
needn't queue freeze protection.



Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14 12:44   ` Ming Lei
@ 2025-08-14 12:57     ` Nilay Shroff
  2025-08-14 13:38       ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Nilay Shroff @ 2025-08-14 12:57 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce



On 8/14/25 6:14 PM, Ming Lei wrote:
> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
>> A recent lockdep[1] splat observed while running blktest block/005
>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
>>
>> That change added a static key to avoid fetching q->rq_qos when
>> neither blk-wbt nor blk-iolatency is configured. The static key
>> dynamically patches kernel text to a NOP when disabled, eliminating
>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
>> a static key at runtime requires acquiring both cpu_hotplug_lock and
>> jump_label_mutex. When this happens after the queue has already been
>> frozen (i.e., while holding ->freeze_lock), it creates a locking
>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
>> potential deadlock reported by lockdep [1].
>>
>> To resolve this, replace the static key mechanism with q->queue_flags:
>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
>> otherwise, the access is skipped.
>>
>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
>> the first cacheline of struct request_queue, checking it imposes minimal
>> overhead while eliminating the deadlock risk.
>>
>> This change avoids the lockdep splat without introducing performance
>> regressions.
>>
>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>
>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>> ---
>>  block/blk-mq-debugfs.c |  1 +
>>  block/blk-rq-qos.c     |  9 ++++---
>>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>>  include/linux/blkdev.h |  1 +
>>  4 files changed, 37 insertions(+), 28 deletions(-)
>>
>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>> index 7ed3e71f2fc0..32c65efdda46 100644
>> --- a/block/blk-mq-debugfs.c
>> +++ b/block/blk-mq-debugfs.c
>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>>  	QUEUE_FLAG_NAME(SQ_SCHED),
>>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>>  };
>>  #undef QUEUE_FLAG_NAME
>>  
>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>> index b1e24bb85ad2..654478dfbc20 100644
>> --- a/block/blk-rq-qos.c
>> +++ b/block/blk-rq-qos.c
>> @@ -2,8 +2,6 @@
>>  
>>  #include "blk-rq-qos.h"
>>  
>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
>> -
>>  /*
>>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>>   * false if 'v' + 1 would be bigger than 'below'.
>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>>  		struct rq_qos *rqos = q->rq_qos;
>>  		q->rq_qos = rqos->next;
>>  		rqos->ops->exit(rqos);
>> -		static_branch_dec(&block_rq_qos);
>>  	}
>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>>  	mutex_unlock(&q->rq_qos_mutex);
>>  }
>>  
>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>>  		goto ebusy;
>>  	rqos->next = q->rq_qos;
>>  	q->rq_qos = rqos;
>> -	static_branch_inc(&block_rq_qos);
>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> 
> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
> out of queue freeze in rq_qos_add()?
> 
> What matters is just the 1st static_branch_inc() which switches the counter
> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
> needn't queue freeze protection.
> 
I thought about it earlier, but that won't work because we have
code paths that freeze the queue before reaching rq_qos_add().
For instance:

We have the following code paths from which we invoke the
rq_qos_add() API with the queue already frozen:

ioc_qos_write()
 -> blkg_conf_open_bdev_frozen() => freezes queue
 -> blk_iocost_init()
   -> rq_qos_add()

queue_wb_lat_store()  => freezes queue
 -> wbt_init()
  -> rq_qos_add() 

And yes, we do have a code path leading to rq_qos_exit()
which freezes the queue first:

__del_gendisk()  => freezes queue 
  -> rq_qos_exit()   

And similarly for rq_qos_delete():
ioc_qos_write()
 -> blkg_conf_open_bdev_frozen() => freezes queue
 -> blk_iocost_init()
  -> rq_qos_del() 

So even if we move static_branch_inc() out of the queue freeze
in rq_qos_add(), it would still cause the observed lockdep
splat.

Thanks,
--Nilay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14 12:57     ` Nilay Shroff
@ 2025-08-14 13:38       ` Ming Lei
  2025-08-14 14:31         ` Nilay Shroff
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2025-08-14 13:38 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
> 
> 
> On 8/14/25 6:14 PM, Ming Lei wrote:
> > On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
> >> A recent lockdep[1] splat observed while running blktest block/005
> >> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
> >> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
> >> ("block: blk-rq-qos: guard rq-qos helpers by static key").
> >>
> >> That change added a static key to avoid fetching q->rq_qos when
> >> neither blk-wbt nor blk-iolatency is configured. The static key
> >> dynamically patches kernel text to a NOP when disabled, eliminating
> >> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
> >> a static key at runtime requires acquiring both cpu_hotplug_lock and
> >> jump_label_mutex. When this happens after the queue has already been
> >> frozen (i.e., while holding ->freeze_lock), it creates a locking
> >> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
> >> potential deadlock reported by lockdep [1].
> >>
> >> To resolve this, replace the static key mechanism with q->queue_flags:
> >> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
> >> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
> >> otherwise, the access is skipped.
> >>
> >> Since q->queue_flags is commonly accessed in IO hotpath and resides in
> >> the first cacheline of struct request_queue, checking it imposes minimal
> >> overhead while eliminating the deadlock risk.
> >>
> >> This change avoids the lockdep splat without introducing performance
> >> regressions.
> >>
> >> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> >>
> >> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> >> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> >> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> >> ---
> >>  block/blk-mq-debugfs.c |  1 +
> >>  block/blk-rq-qos.c     |  9 ++++---
> >>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
> >>  include/linux/blkdev.h |  1 +
> >>  4 files changed, 37 insertions(+), 28 deletions(-)
> >>
> >> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> >> index 7ed3e71f2fc0..32c65efdda46 100644
> >> --- a/block/blk-mq-debugfs.c
> >> +++ b/block/blk-mq-debugfs.c
> >> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
> >>  	QUEUE_FLAG_NAME(SQ_SCHED),
> >>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
> >>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
> >> +	QUEUE_FLAG_NAME(QOS_ENABLED),
> >>  };
> >>  #undef QUEUE_FLAG_NAME
> >>  
> >> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> >> index b1e24bb85ad2..654478dfbc20 100644
> >> --- a/block/blk-rq-qos.c
> >> +++ b/block/blk-rq-qos.c
> >> @@ -2,8 +2,6 @@
> >>  
> >>  #include "blk-rq-qos.h"
> >>  
> >> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
> >> -
> >>  /*
> >>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
> >>   * false if 'v' + 1 would be bigger than 'below'.
> >> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
> >>  		struct rq_qos *rqos = q->rq_qos;
> >>  		q->rq_qos = rqos->next;
> >>  		rqos->ops->exit(rqos);
> >> -		static_branch_dec(&block_rq_qos);
> >>  	}
> >> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
> >>  	mutex_unlock(&q->rq_qos_mutex);
> >>  }
> >>  
> >> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
> >>  		goto ebusy;
> >>  	rqos->next = q->rq_qos;
> >>  	q->rq_qos = rqos;
> >> -	static_branch_inc(&block_rq_qos);
> >> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> > 
> > One stupid question: can we simply move static_branch_inc(&block_rq_qos)
> > out of queue freeze in rq_qos_add()?
> > 
> > What matters is just the 1st static_branch_inc() which switches the counter
> > from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
> > paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
> > needn't queue freeze protection.
> > 
> I thought about it earlier but that won't work because we have 
> code paths freezing queue before it reaches upto rq_qos_add(),
> For instance:
> 
> We have following code paths from where we invoke
> rq_qos_add() APIs with queue already frozen:
> 
> ioc_qos_write()
>  -> blkg_conf_open_bdev_frozen() => freezes queue
>  -> blk_iocost_init()
>    -> rq_qos_add()
> 
> queue_wb_lat_store()  => freezes queue
>  -> wbt_init()
>   -> rq_qos_add() 

The above two shouldn't be hard to solve; for example, add a helper
rq_qos_prep_add() for increasing the static branch counter.
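
Something along these lines, as a rough sketch (rq_qos_prep_add() is
only a suggested name here, not an existing helper):

	/*
	 * Called by rq_qos_add() callers before they freeze the queue,
	 * so the static key is flipped outside ->freeze_lock.
	 */
	static inline void rq_qos_prep_add(void)
	{
		static_branch_inc(&block_rq_qos);
	}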

> 
> And yes we do have code path leading to rq_qos_exit() 
> which freezes queue first:
> 
> __del_gendisk()  => freezes queue 
>   -> rq_qos_exit()   
> 
> And similarly for rq_qos_delete():
> ioc_qos_write()
>  -> blkg_conf_open_bdev_frozen() => freezes queue
>  -> blk_iocost_init()
>   -> rq_qos_del() 
> 
> So even if we move static_branch_inc() out of freeze queue 
> in rq_qos_add(), it'd still cause the observed lockdep 
> splat.

rqos won't be called for passthrough requests, so rq_qos_exit()
may be moved to the end of __del_gendisk(), after the freeze lock map
is released.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14 13:38       ` Ming Lei
@ 2025-08-14 14:31         ` Nilay Shroff
  2025-08-15  0:13           ` Ming Lei
  0 siblings, 1 reply; 25+ messages in thread
From: Nilay Shroff @ 2025-08-14 14:31 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce



On 8/14/25 7:08 PM, Ming Lei wrote:
> On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
>>
>>
>> On 8/14/25 6:14 PM, Ming Lei wrote:
>>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
>>>> A recent lockdep[1] splat observed while running blktest block/005
>>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
>>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
>>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
>>>>
>>>> That change added a static key to avoid fetching q->rq_qos when
>>>> neither blk-wbt nor blk-iolatency is configured. The static key
>>>> dynamically patches kernel text to a NOP when disabled, eliminating
>>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
>>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
>>>> jump_label_mutex. When this happens after the queue has already been
>>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
>>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
>>>> potential deadlock reported by lockdep [1].
>>>>
>>>> To resolve this, replace the static key mechanism with q->queue_flags:
>>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
>>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
>>>> otherwise, the access is skipped.
>>>>
>>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
>>>> the first cacheline of struct request_queue, checking it imposes minimal
>>>> overhead while eliminating the deadlock risk.
>>>>
>>>> This change avoids the lockdep splat without introducing performance
>>>> regressions.
>>>>
>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>
>>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
>>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>>>> ---
>>>>  block/blk-mq-debugfs.c |  1 +
>>>>  block/blk-rq-qos.c     |  9 ++++---
>>>>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>>>>  include/linux/blkdev.h |  1 +
>>>>  4 files changed, 37 insertions(+), 28 deletions(-)
>>>>
>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>>> index 7ed3e71f2fc0..32c65efdda46 100644
>>>> --- a/block/blk-mq-debugfs.c
>>>> +++ b/block/blk-mq-debugfs.c
>>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>>>>  	QUEUE_FLAG_NAME(SQ_SCHED),
>>>>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>>>>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
>>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>>>>  };
>>>>  #undef QUEUE_FLAG_NAME
>>>>  
>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>> index b1e24bb85ad2..654478dfbc20 100644
>>>> --- a/block/blk-rq-qos.c
>>>> +++ b/block/blk-rq-qos.c
>>>> @@ -2,8 +2,6 @@
>>>>  
>>>>  #include "blk-rq-qos.h"
>>>>  
>>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
>>>> -
>>>>  /*
>>>>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>>>>   * false if 'v' + 1 would be bigger than 'below'.
>>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>>>>  		struct rq_qos *rqos = q->rq_qos;
>>>>  		q->rq_qos = rqos->next;
>>>>  		rqos->ops->exit(rqos);
>>>> -		static_branch_dec(&block_rq_qos);
>>>>  	}
>>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>>>>  	mutex_unlock(&q->rq_qos_mutex);
>>>>  }
>>>>  
>>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>>>>  		goto ebusy;
>>>>  	rqos->next = q->rq_qos;
>>>>  	q->rq_qos = rqos;
>>>> -	static_branch_inc(&block_rq_qos);
>>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>
>>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
>>> out of queue freeze in rq_qos_add()?
>>>
>>> What matters is just the 1st static_branch_inc() which switches the counter
>>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
>>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
>>> needn't queue freeze protection.
>>>
>> I thought about it earlier but that won't work because we have 
>> code paths freezing queue before it reaches upto rq_qos_add(),
>> For instance:
>>
>> We have following code paths from where we invoke
>> rq_qos_add() APIs with queue already frozen:
>>
>> ioc_qos_write()
>>  -> blkg_conf_open_bdev_frozen() => freezes queue
>>  -> blk_iocost_init()
>>    -> rq_qos_add()
>>
>> queue_wb_lat_store()  => freezes queue
>>  -> wbt_init()
>>   -> rq_qos_add() 
> 
> The above two shouldn't be hard to solve, such as, add helper
> rq_qos_prep_add() for increasing the static branch counter.
> 
Yes, but then it means that IOs already in flight would take a hit in
the hotpath: in the hotpath those IOs would evaluate the static key
branch to true and then fetch q->rq_qos (which probably would not be in
the first cacheline). So are we okay to take that hit in the IO
hotpath?

Thanks,
--Nilay

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14 14:31         ` Nilay Shroff
@ 2025-08-15  0:13           ` Ming Lei
  2025-08-15  1:04             ` Yu Kuai
  2025-08-15  9:43             ` Nilay Shroff
  0 siblings, 2 replies; 25+ messages in thread
From: Ming Lei @ 2025-08-15  0:13 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
> 
> 
> On 8/14/25 7:08 PM, Ming Lei wrote:
> > On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
> >>
> >>
> >> On 8/14/25 6:14 PM, Ming Lei wrote:
> >>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
> >>>> A recent lockdep[1] splat observed while running blktest block/005
> >>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
> >>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
> >>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
> >>>>
> >>>> That change added a static key to avoid fetching q->rq_qos when
> >>>> neither blk-wbt nor blk-iolatency is configured. The static key
> >>>> dynamically patches kernel text to a NOP when disabled, eliminating
> >>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
> >>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
> >>>> jump_label_mutex. When this happens after the queue has already been
> >>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
> >>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
> >>>> potential deadlock reported by lockdep [1].
> >>>>
> >>>> To resolve this, replace the static key mechanism with q->queue_flags:
> >>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
> >>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
> >>>> otherwise, the access is skipped.
> >>>>
> >>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
> >>>> the first cacheline of struct request_queue, checking it imposes minimal
> >>>> overhead while eliminating the deadlock risk.
> >>>>
> >>>> This change avoids the lockdep splat without introducing performance
> >>>> regressions.
> >>>>
> >>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> >>>>
> >>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> >>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> >>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> >>>> ---
> >>>>  block/blk-mq-debugfs.c |  1 +
> >>>>  block/blk-rq-qos.c     |  9 ++++---
> >>>>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
> >>>>  include/linux/blkdev.h |  1 +
> >>>>  4 files changed, 37 insertions(+), 28 deletions(-)
> >>>>
> >>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> >>>> index 7ed3e71f2fc0..32c65efdda46 100644
> >>>> --- a/block/blk-mq-debugfs.c
> >>>> +++ b/block/blk-mq-debugfs.c
> >>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
> >>>>  	QUEUE_FLAG_NAME(SQ_SCHED),
> >>>>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
> >>>>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
> >>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
> >>>>  };
> >>>>  #undef QUEUE_FLAG_NAME
> >>>>  
> >>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> >>>> index b1e24bb85ad2..654478dfbc20 100644
> >>>> --- a/block/blk-rq-qos.c
> >>>> +++ b/block/blk-rq-qos.c
> >>>> @@ -2,8 +2,6 @@
> >>>>  
> >>>>  #include "blk-rq-qos.h"
> >>>>  
> >>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
> >>>> -
> >>>>  /*
> >>>>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
> >>>>   * false if 'v' + 1 would be bigger than 'below'.
> >>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
> >>>>  		struct rq_qos *rqos = q->rq_qos;
> >>>>  		q->rq_qos = rqos->next;
> >>>>  		rqos->ops->exit(rqos);
> >>>> -		static_branch_dec(&block_rq_qos);
> >>>>  	}
> >>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
> >>>>  	mutex_unlock(&q->rq_qos_mutex);
> >>>>  }
> >>>>  
> >>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
> >>>>  		goto ebusy;
> >>>>  	rqos->next = q->rq_qos;
> >>>>  	q->rq_qos = rqos;
> >>>> -	static_branch_inc(&block_rq_qos);
> >>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> >>>
> >>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
> >>> out of queue freeze in rq_qos_add()?
> >>>
> >>> What matters is just the 1st static_branch_inc() which switches the counter
> >>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
> >>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
> >>> needn't queue freeze protection.
> >>>
> >> I thought about it earlier but that won't work because we have 
> >> code paths freezing queue before it reaches upto rq_qos_add(),
> >> For instance:
> >>
> >> We have following code paths from where we invoke
> >> rq_qos_add() APIs with queue already frozen:
> >>
> >> ioc_qos_write()
> >>  -> blkg_conf_open_bdev_frozen() => freezes queue
> >>  -> blk_iocost_init()
> >>    -> rq_qos_add()
> >>
> >> queue_wb_lat_store()  => freezes queue
> >>  -> wbt_init()
> >>   -> rq_qos_add() 
> > 
> > The above two shouldn't be hard to solve, such as, add helper
> > rq_qos_prep_add() for increasing the static branch counter.
> > 
> Yes but then it means that IOs which would be in flight 
> would take a hit in hotpath: In hotpath those IOs
> would evaluate static key branch to true and then fetch 
> q->rq_qos (which probably would not be in the first
> cacheline). So are we okay to take hat hit in IO 
> hotpath?

But that is what the in-tree code is doing, isn't it?

The `static branch` is only evaluated if at least one rqos is added.


Thanks, 
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-15  0:13           ` Ming Lei
@ 2025-08-15  1:04             ` Yu Kuai
  2025-08-15  7:59               ` Ming Lei
  2025-08-15  9:43             ` Nilay Shroff
  1 sibling, 1 reply; 25+ messages in thread
From: Yu Kuai @ 2025-08-15  1:04 UTC (permalink / raw)
  To: Ming Lei, Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce, yukuai (C)

Hi,

On 2025/08/15 8:13, Ming Lei wrote:
> On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
>>
>>
>> On 8/14/25 7:08 PM, Ming Lei wrote:
>>> On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 8/14/25 6:14 PM, Ming Lei wrote:
>>>>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
>>>>>> A recent lockdep[1] splat observed while running blktest block/005
>>>>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
>>>>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
>>>>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
>>>>>>
>>>>>> That change added a static key to avoid fetching q->rq_qos when
>>>>>> neither blk-wbt nor blk-iolatency is configured. The static key
>>>>>> dynamically patches kernel text to a NOP when disabled, eliminating
>>>>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
>>>>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
>>>>>> jump_label_mutex. When this happens after the queue has already been
>>>>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
>>>>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
>>>>>> potential deadlock reported by lockdep [1].
>>>>>>
>>>>>> To resolve this, replace the static key mechanism with q->queue_flags:
>>>>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
>>>>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
>>>>>> otherwise, the access is skipped.
>>>>>>
>>>>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
>>>>>> the first cacheline of struct request_queue, checking it imposes minimal
>>>>>> overhead while eliminating the deadlock risk.
>>>>>>
>>>>>> This change avoids the lockdep splat without introducing performance
>>>>>> regressions.
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>
>>>>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
>>>>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>>>>>> ---
>>>>>>   block/blk-mq-debugfs.c |  1 +
>>>>>>   block/blk-rq-qos.c     |  9 ++++---
>>>>>>   block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>>>>>>   include/linux/blkdev.h |  1 +
>>>>>>   4 files changed, 37 insertions(+), 28 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>>>>> index 7ed3e71f2fc0..32c65efdda46 100644
>>>>>> --- a/block/blk-mq-debugfs.c
>>>>>> +++ b/block/blk-mq-debugfs.c
>>>>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>>>>>>   	QUEUE_FLAG_NAME(SQ_SCHED),
>>>>>>   	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>>>>>>   	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
>>>>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>>>>>>   };
>>>>>>   #undef QUEUE_FLAG_NAME
>>>>>>   
>>>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>>>> index b1e24bb85ad2..654478dfbc20 100644
>>>>>> --- a/block/blk-rq-qos.c
>>>>>> +++ b/block/blk-rq-qos.c
>>>>>> @@ -2,8 +2,6 @@
>>>>>>   
>>>>>>   #include "blk-rq-qos.h"
>>>>>>   
>>>>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
>>>>>> -
>>>>>>   /*
>>>>>>    * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>>>>>>    * false if 'v' + 1 would be bigger than 'below'.
>>>>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>>>>>>   		struct rq_qos *rqos = q->rq_qos;
>>>>>>   		q->rq_qos = rqos->next;
>>>>>>   		rqos->ops->exit(rqos);
>>>>>> -		static_branch_dec(&block_rq_qos);
>>>>>>   	}
>>>>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>   	mutex_unlock(&q->rq_qos_mutex);
>>>>>>   }
>>>>>>   
>>>>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>>>>>>   		goto ebusy;
>>>>>>   	rqos->next = q->rq_qos;
>>>>>>   	q->rq_qos = rqos;
>>>>>> -	static_branch_inc(&block_rq_qos);
>>>>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>
>>>>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
>>>>> out of queue freeze in rq_qos_add()?
>>>>>
>>>>> What matters is just the 1st static_branch_inc() which switches the counter
>>>>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
>>>>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
>>>>> needn't queue freeze protection.
>>>>>
>>>> I thought about it earlier but that won't work because we have
>>>> code paths freezing queue before it reaches upto rq_qos_add(),
>>>> For instance:
>>>>
>>>> We have following code paths from where we invoke
>>>> rq_qos_add() APIs with queue already frozen:
>>>>
>>>> ioc_qos_write()
>>>>   -> blkg_conf_open_bdev_frozen() => freezes queue
>>>>   -> blk_iocost_init()
>>>>     -> rq_qos_add()
>>>>
>>>> queue_wb_lat_store()  => freezes queue
>>>>   -> wbt_init()
>>>>    -> rq_qos_add()
>>>
>>> The above two shouldn't be hard to solve, such as, add helper
>>> rq_qos_prep_add() for increasing the static branch counter.
>>>
I thought about this; we'd need some return value to know whether the rq_qos
was really added, and I feel the code would get much more complex. We'd need
at least two different APIs: one for the cgroup-based policies
(iocost/iolatency) and one for the pure rq_qos policy (wbt).

>> Yes but then it means that IOs which would be in flight
>> would take a hit in hotpath: In hotpath those IOs
>> would evaluate static key branch to true and then fetch
>> q->rq_qos (which probably would not be in the first
>> cacheline). So are we okay to take hat hit in IO
>> hotpath?

I don't quite understand: do you mean the window after the static branch
counter is increased but before the queue is frozen? I think that is not
the hot path.
> 
> But it is that in-tree code is doing, isn't it?
> 
> `static branch` is only evaluated iff at least one rqos is added.
> 
And yes.

Thanks,
Kuai


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-15  1:04             ` Yu Kuai
@ 2025-08-15  7:59               ` Ming Lei
  2025-08-15  8:39                 ` Yu Kuai
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2025-08-15  7:59 UTC (permalink / raw)
  To: Yu Kuai
  Cc: Nilay Shroff, linux-block, axboe, hch, shinichiro.kawasaki, kch,
	gjoyce, yukuai (C)

On Fri, Aug 15, 2025 at 09:04:53AM +0800, Yu Kuai wrote:
> Hi,
> 
> 在 2025/08/15 8:13, Ming Lei 写道:
> > On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
> > > 
> > > 
> > > On 8/14/25 7:08 PM, Ming Lei wrote:
> > > > On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
> > > > > 
> > > > > 
> > > > > On 8/14/25 6:14 PM, Ming Lei wrote:
> > > > > > On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
> > > > > > > A recent lockdep[1] splat observed while running blktest block/005
> > > > > > > reveals a potential deadlock caused by the cpu_hotplug_lock dependency
> > > > > > > on ->freeze_lock. This dependency was introduced by commit 033b667a823e
> > > > > > > ("block: blk-rq-qos: guard rq-qos helpers by static key").
> > > > > > > 
> > > > > > > That change added a static key to avoid fetching q->rq_qos when
> > > > > > > neither blk-wbt nor blk-iolatency is configured. The static key
> > > > > > > dynamically patches kernel text to a NOP when disabled, eliminating
> > > > > > > overhead of fetching q->rq_qos in the I/O hot path. However, enabling
> > > > > > > a static key at runtime requires acquiring both cpu_hotplug_lock and
> > > > > > > jump_label_mutex. When this happens after the queue has already been
> > > > > > > frozen (i.e., while holding ->freeze_lock), it creates a locking
> > > > > > > dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
> > > > > > > potential deadlock reported by lockdep [1].
> > > > > > > 
> > > > > > > To resolve this, replace the static key mechanism with q->queue_flags:
> > > > > > > QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
> > > > > > > accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
> > > > > > > otherwise, the access is skipped.
> > > > > > > 
> > > > > > > Since q->queue_flags is commonly accessed in IO hotpath and resides in
> > > > > > > the first cacheline of struct request_queue, checking it imposes minimal
> > > > > > > overhead while eliminating the deadlock risk.
> > > > > > > 
> > > > > > > This change avoids the lockdep splat without introducing performance
> > > > > > > regressions.
> > > > > > > 
> > > > > > > [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> > > > > > > 
> > > > > > > Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > > > > > > Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> > > > > > > Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> > > > > > > Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> > > > > > > Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> > > > > > > ---
> > > > > > >   block/blk-mq-debugfs.c |  1 +
> > > > > > >   block/blk-rq-qos.c     |  9 ++++---
> > > > > > >   block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
> > > > > > >   include/linux/blkdev.h |  1 +
> > > > > > >   4 files changed, 37 insertions(+), 28 deletions(-)
> > > > > > > 
> > > > > > > diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> > > > > > > index 7ed3e71f2fc0..32c65efdda46 100644
> > > > > > > --- a/block/blk-mq-debugfs.c
> > > > > > > +++ b/block/blk-mq-debugfs.c
> > > > > > > @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
> > > > > > >   	QUEUE_FLAG_NAME(SQ_SCHED),
> > > > > > >   	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
> > > > > > >   	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
> > > > > > > +	QUEUE_FLAG_NAME(QOS_ENABLED),
> > > > > > >   };
> > > > > > >   #undef QUEUE_FLAG_NAME
> > > > > > > diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> > > > > > > index b1e24bb85ad2..654478dfbc20 100644
> > > > > > > --- a/block/blk-rq-qos.c
> > > > > > > +++ b/block/blk-rq-qos.c
> > > > > > > @@ -2,8 +2,6 @@
> > > > > > >   #include "blk-rq-qos.h"
> > > > > > > -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
> > > > > > > -
> > > > > > >   /*
> > > > > > >    * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
> > > > > > >    * false if 'v' + 1 would be bigger than 'below'.
> > > > > > > @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
> > > > > > >   		struct rq_qos *rqos = q->rq_qos;
> > > > > > >   		q->rq_qos = rqos->next;
> > > > > > >   		rqos->ops->exit(rqos);
> > > > > > > -		static_branch_dec(&block_rq_qos);
> > > > > > >   	}
> > > > > > > +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
> > > > > > >   	mutex_unlock(&q->rq_qos_mutex);
> > > > > > >   }
> > > > > > > @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
> > > > > > >   		goto ebusy;
> > > > > > >   	rqos->next = q->rq_qos;
> > > > > > >   	q->rq_qos = rqos;
> > > > > > > -	static_branch_inc(&block_rq_qos);
> > > > > > > +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> > > > > > 
> > > > > > One stupid question: can we simply move static_branch_inc(&block_rq_qos)
> > > > > > out of queue freeze in rq_qos_add()?
> > > > > > 
> > > > > > What matters is just the 1st static_branch_inc() which switches the counter
> > > > > > from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
> > > > > > paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
> > > > > > needn't queue freeze protection.
> > > > > > 
> > > > > I thought about it earlier but that won't work because we have
> > > > > code paths freezing queue before it reaches upto rq_qos_add(),
> > > > > For instance:
> > > > > 
> > > > > We have following code paths from where we invoke
> > > > > rq_qos_add() APIs with queue already frozen:
> > > > > 
> > > > > ioc_qos_write()
> > > > >   -> blkg_conf_open_bdev_frozen() => freezes queue
> > > > >   -> blk_iocost_init()
> > > > >     -> rq_qos_add()
> > > > > 
> > > > > queue_wb_lat_store()  => freezes queue
> > > > >   -> wbt_init()
> > > > >    -> rq_qos_add()
> > > > 
> > > > The above two shouldn't be hard to solve, such as, add helper
> > > > rq_qos_prep_add() for increasing the static branch counter.
> > > > 
> I thought about this, we'll need some return value to know if rq_qos
> is really added and I feel code will be much complex. We'll need at
> least two different APIs for cgroup based policy iocost/iolatency and
> pure rq_qos policy wbt.

Yes, but not too bad, such as:


diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 5bfd70311359..05b13235ebb3 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -3227,6 +3227,8 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 
 	blkg_conf_init(&ctx, input);
 
+	rq_qos_prep_add();
+
 	memflags = blkg_conf_open_bdev_frozen(&ctx);
 	if (IS_ERR_VALUE(memflags)) {
 		ret = memflags;
@@ -3344,7 +3346,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 	if (enable)
 		wbt_disable_default(disk);
 	else
-		wbt_enable_default(disk);
+		wbt_enable_default(disk, false);
 
 	blk_mq_unquiesce_queue(disk->queue);
 
@@ -3356,6 +3358,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
 	ret = -EINVAL;
 err:
 	blkg_conf_exit_frozen(&ctx, memflags);
+	rq_qos_prep_del();
 	return ret;
 }
 
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 848591fb3c57..27047f661e3f 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -319,7 +319,7 @@ void rq_qos_exit(struct request_queue *q)
 		struct rq_qos *rqos = q->rq_qos;
 		q->rq_qos = rqos->next;
 		rqos->ops->exit(rqos);
-		static_branch_dec(&block_rq_qos);
+		rq_qos_prep_del();
 	}
 	mutex_unlock(&q->rq_qos_mutex);
 }
@@ -346,7 +346,6 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
 		goto ebusy;
 	rqos->next = q->rq_qos;
 	q->rq_qos = rqos;
-	static_branch_inc(&block_rq_qos);
 
 	blk_mq_unfreeze_queue(q, memflags);
 
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 39749f4066fb..38572a7eb2b7 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -179,4 +179,14 @@ static inline void rq_qos_queue_depth_changed(struct request_queue *q)
 
 void rq_qos_exit(struct request_queue *);
 
+static inline void rq_qos_prep_add(void)
+{
+	static_branch_inc(&block_rq_qos);
+}
+
+static inline void rq_qos_prep_del(void)
+{
+	static_branch_dec(&block_rq_qos);
+}
+
 #endif
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 396cded255ea..3801f35f206f 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -613,6 +613,7 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	ssize_t ret;
 	s64 val;
 	unsigned int memflags;
+	bool rqos_added = false;
 
 	ret = queue_var_store64(&val, page);
 	if (ret < 0)
@@ -620,6 +621,7 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 	if (val < -1)
 		return -EINVAL;
 
+	rq_qos_prep_add();
 	memflags = blk_mq_freeze_queue(q);
 
 	rqos = wbt_rq_qos(q);
@@ -627,6 +629,7 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 		ret = wbt_init(disk);
 		if (ret)
 			goto out;
+		rqos_added = true;
 	}
 
 	ret = count;
@@ -653,6 +656,9 @@ static ssize_t queue_wb_lat_store(struct gendisk *disk, const char *page,
 out:
 	blk_mq_unfreeze_queue(q, memflags);
 
+	if (!rqos_added)
+		rq_qos_prep_del();
+
 	return ret;
 }
 
@@ -903,7 +909,7 @@ int blk_register_queue(struct gendisk *disk)
 
 	if (queue_is_mq(q))
 		elevator_set_default(q);
-	wbt_enable_default(disk);
+	wbt_enable_default(disk, true);
 
 	blk_queue_flag_set(QUEUE_FLAG_REGISTERED, q);
 
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a50d4cd55f41..9dc5926adfcc 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -698,7 +698,7 @@ static void wbt_requeue(struct rq_qos *rqos, struct request *rq)
 /*
  * Enable wbt if defaults are configured that way
  */
-void wbt_enable_default(struct gendisk *disk)
+void wbt_enable_default(struct gendisk *disk, bool track_rqos_add)
 {
 	struct request_queue *q = disk->queue;
 	struct rq_qos *rqos;
@@ -723,8 +723,18 @@ void wbt_enable_default(struct gendisk *disk)
 	if (!blk_queue_registered(q))
 		return;
 
-	if (queue_is_mq(q) && enable)
-		wbt_init(disk);
+	if (queue_is_mq(q) && enable) {
+		if (track_rqos_add) {
+			int ret;
+
+			rq_qos_prep_add();
+			ret = wbt_init(disk);
+			if (ret)
+				rq_qos_prep_del();
+		} else {
+			wbt_init(disk);
+		}
+	}
 }
 EXPORT_SYMBOL_GPL(wbt_enable_default);
 
diff --git a/block/blk-wbt.h b/block/blk-wbt.h
index e5fc653b9b76..fea0e54d7c1f 100644
--- a/block/blk-wbt.h
+++ b/block/blk-wbt.h
@@ -6,7 +6,7 @@
 
 int wbt_init(struct gendisk *disk);
 void wbt_disable_default(struct gendisk *disk);
-void wbt_enable_default(struct gendisk *disk);
+void wbt_enable_default(struct gendisk *disk, bool track_rqos_add);
 
 u64 wbt_get_min_lat(struct request_queue *q);
 void wbt_set_min_lat(struct request_queue *q, u64 val);
diff --git a/block/elevator.c b/block/elevator.c
index fe96c6f4753c..cc2ca78ecb2b 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -647,7 +647,7 @@ static int elevator_change_done(struct request_queue *q,
 		blk_mq_free_sched_tags(ctx->old->et, q->tag_set);
 		kobject_put(&ctx->old->kobj);
 		if (enable_wbt)
-			wbt_enable_default(q->disk);
+			wbt_enable_default(q->disk, true);
 	}
 	if (ctx->new) {
 		ret = elv_register_queue(q, ctx->new, !ctx->no_uevent);
diff --git a/block/genhd.c b/block/genhd.c
index c26733f6324b..2dd9d4ec66e4 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -762,8 +762,6 @@ static void __del_gendisk(struct gendisk *disk)
 	if (queue_is_mq(q))
 		blk_mq_cancel_work_sync(q);
 
-	rq_qos_exit(q);
-
 	/*
 	 * If the disk does not own the queue, allow using passthrough requests
 	 * again.  Else leave the queue frozen to fail all I/O.
@@ -775,6 +773,7 @@ static void __del_gendisk(struct gendisk *disk)
 
 	if (start_drain)
 		blk_unfreeze_release_lock(q);
+	rq_qos_exit(q);
 }
 
 static void disable_elv_switch(struct request_queue *q)


Thanks,
Ming


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-15  7:59               ` Ming Lei
@ 2025-08-15  8:39                 ` Yu Kuai
  0 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-08-15  8:39 UTC (permalink / raw)
  To: Ming Lei, Yu Kuai
  Cc: Nilay Shroff, linux-block, axboe, hch, shinichiro.kawasaki, kch,
	gjoyce, yukuai (C)

Hi,

在 2025/08/15 15:59, Ming Lei 写道:
> On Fri, Aug 15, 2025 at 09:04:53AM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2025/08/15 8:13, Ming Lei 写道:
>>> On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 8/14/25 7:08 PM, Ming Lei wrote:
>>>>> On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
>>>>>>
>>>>>>
>>>>>> On 8/14/25 6:14 PM, Ming Lei wrote:
>>>>>>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
>>>>>>>> A recent lockdep[1] splat observed while running blktest block/005
>>>>>>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
>>>>>>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
>>>>>>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
>>>>>>>>
>>>>>>>> That change added a static key to avoid fetching q->rq_qos when
>>>>>>>> neither blk-wbt nor blk-iolatency is configured. The static key
>>>>>>>> dynamically patches kernel text to a NOP when disabled, eliminating
>>>>>>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
>>>>>>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
>>>>>>>> jump_label_mutex. When this happens after the queue has already been
>>>>>>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
>>>>>>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
>>>>>>>> potential deadlock reported by lockdep [1].
>>>>>>>>
>>>>>>>> To resolve this, replace the static key mechanism with q->queue_flags:
>>>>>>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
>>>>>>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
>>>>>>>> otherwise, the access is skipped.
>>>>>>>>
>>>>>>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
>>>>>>>> the first cacheline of struct request_queue, checking it imposes minimal
>>>>>>>> overhead while eliminating the deadlock risk.
>>>>>>>>
>>>>>>>> This change avoids the lockdep splat without introducing performance
>>>>>>>> regressions.
>>>>>>>>
>>>>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>>>
>>>>>>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
>>>>>>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>>>>>>>> ---
>>>>>>>>    block/blk-mq-debugfs.c |  1 +
>>>>>>>>    block/blk-rq-qos.c     |  9 ++++---
>>>>>>>>    block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>>>>>>>>    include/linux/blkdev.h |  1 +
>>>>>>>>    4 files changed, 37 insertions(+), 28 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>>>>>>> index 7ed3e71f2fc0..32c65efdda46 100644
>>>>>>>> --- a/block/blk-mq-debugfs.c
>>>>>>>> +++ b/block/blk-mq-debugfs.c
>>>>>>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>>>>>>>>    	QUEUE_FLAG_NAME(SQ_SCHED),
>>>>>>>>    	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>>>>>>>>    	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
>>>>>>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>>>>>>>>    };
>>>>>>>>    #undef QUEUE_FLAG_NAME
>>>>>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>>>>>> index b1e24bb85ad2..654478dfbc20 100644
>>>>>>>> --- a/block/blk-rq-qos.c
>>>>>>>> +++ b/block/blk-rq-qos.c
>>>>>>>> @@ -2,8 +2,6 @@
>>>>>>>>    #include "blk-rq-qos.h"
>>>>>>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
>>>>>>>> -
>>>>>>>>    /*
>>>>>>>>     * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>>>>>>>>     * false if 'v' + 1 would be bigger than 'below'.
>>>>>>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>>>>>>>>    		struct rq_qos *rqos = q->rq_qos;
>>>>>>>>    		q->rq_qos = rqos->next;
>>>>>>>>    		rqos->ops->exit(rqos);
>>>>>>>> -		static_branch_dec(&block_rq_qos);
>>>>>>>>    	}
>>>>>>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>>>    	mutex_unlock(&q->rq_qos_mutex);
>>>>>>>>    }
>>>>>>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>>>>>>>>    		goto ebusy;
>>>>>>>>    	rqos->next = q->rq_qos;
>>>>>>>>    	q->rq_qos = rqos;
>>>>>>>> -	static_branch_inc(&block_rq_qos);
>>>>>>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>>
>>>>>>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
>>>>>>> out of queue freeze in rq_qos_add()?
>>>>>>>
>>>>>>> What matters is just the 1st static_branch_inc() which switches the counter
>>>>>>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
>>>>>>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
>>>>>>> needn't queue freeze protection.
>>>>>>>
>>>>>> I thought about it earlier but that won't work because we have
>>>>>> code paths freezing queue before it reaches upto rq_qos_add(),
>>>>>> For instance:
>>>>>>
>>>>>> We have following code paths from where we invoke
>>>>>> rq_qos_add() APIs with queue already frozen:
>>>>>>
>>>>>> ioc_qos_write()
>>>>>>    -> blkg_conf_open_bdev_frozen() => freezes queue
>>>>>>    -> blk_iocost_init()
>>>>>>      -> rq_qos_add()
>>>>>>
>>>>>> queue_wb_lat_store()  => freezes queue
>>>>>>    -> wbt_init()
>>>>>>     -> rq_qos_add()
>>>>>
>>>>> The above two shouldn't be hard to solve, such as, add helper
>>>>> rq_qos_prep_add() for increasing the static branch counter.
>>>>>
>> I thought about this, we'll need some return value to know if rq_qos
>> is really added and I feel code will be much complex. We'll need at
>> least two different APIs for cgroup based policy iocost/iolatency and
>> pure rq_qos policy wbt.
> 
> Yes, but not too bad, such as:
> 
> 
> diff --git a/block/blk-iocost.c b/block/blk-iocost.c
> index 5bfd70311359..05b13235ebb3 100644
> --- a/block/blk-iocost.c
> +++ b/block/blk-iocost.c
> @@ -3227,6 +3227,8 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
>   
>   	blkg_conf_init(&ctx, input);
>   
> +	rq_qos_prep_add();
> +
>   	memflags = blkg_conf_open_bdev_frozen(&ctx);
>   	if (IS_ERR_VALUE(memflags)) {
>   		ret = memflags;
> @@ -3344,7 +3346,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
>   	if (enable)
>   		wbt_disable_default(disk);
>   	else
> -		wbt_enable_default(disk);
> +		wbt_enable_default(disk, false);
>   
>   	blk_mq_unquiesce_queue(disk->queue);
>   
> @@ -3356,6 +3358,7 @@ static ssize_t ioc_qos_write(struct kernfs_open_file *of, char *input,
>   	ret = -EINVAL;
>   err:
>   	blkg_conf_exit_frozen(&ctx, memflags);
> +	rq_qos_prep_del();
>   	return ret;
>   }

This is not enough for iocost:

1) ioc_qos_write() can be called with iocost already registered; we need
to call rq_qos_prep_del() in the success branch as well.
2) ioc_cost_model_write() has to get the same treatment.

>   
> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> index 848591fb3c57..27047f661e3f 100644
> --- a/block/blk-rq-qos.c
> +++ b/block/blk-rq-qos.c
> @@ -319,7 +319,7 @@ void rq_qos_exit(struct request_queue *q)
>   		struct rq_qos *rqos = q->rq_qos;
>   		q->rq_qos = rqos->next;
>   		rqos->ops->exit(rqos);
> -		static_branch_dec(&block_rq_qos);
> +		rq_qos_prep_del();
>   	}
>   	mutex_unlock(&q->rq_qos_mutex);
>   }
> @@ -346,7 +346,6 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>   		goto ebusy;
>   	rqos->next = q->rq_qos;
>   	q->rq_qos = rqos;
> -	static_branch_inc(&block_rq_qos);
>   
>   	blk_mq_unfreeze_queue(q, memflags);
>   
> diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
> index 39749f4066fb..38572a7eb2b7 100644
> --- a/block/blk-rq-qos.h
> +++ b/block/blk-rq-qos.h
> @@ -179,4 +179,14 @@ static inline void rq_qos_queue_depth_changed(struct request_queue *q)
>   
>   void rq_qos_exit(struct request_queue *);
>   
> +static inline void rq_qos_prep_add(void)
> +{
> +	static_branch_inc(&block_rq_qos);
> +}
> +
> +static inline void rq_qos_prep_del(void)
> +{
> +	static_branch_dec(&block_rq_qos);
> +}
> +
>   #endif

I wonder whether we can simplify this. We already have a disk-level lock,
rq_qos_mutex, which is held before freezing the queue for iocost and iolatency,
and we can grab it before freezing the queue for wbt as well. Then we can just
do the static_branch_inc/dec with rq_qos_mutex held, since everything is
serialized.

+static inline bool rq_qos_prep_add(struct request_queue *q, enum rq_qos_id id)
+{
+       lockdep_assert_held(&q->rq_qos_mutex);
+
+       if (!rq_qos_id(q, id)) {
+               static_branch_inc(&block_rq_qos);
+               return true;
+       }
+
+       return false;
+}
+
+/* paired with rq_qos_prep_add */
+static inline void rq_qos_prep_del(struct request_queue *q, enum rq_qos_id id,
+                                  bool prepared)
+{
+       lockdep_assert_held(&q->rq_qos_mutex);
+
+       if (prepared && !rq_qos_id(q, id))
+               static_branch_dec(&block_rq_qos);
+}
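
To make the pairing concrete, here is a hypothetical caller flow for the
helpers sketched above. It assumes wbt_init() is reworked so that the caller
holds q->rq_qos_mutex across the whole sequence (as suggested above); the
function name and flow are illustrative, not code from the posted series:

static int wbt_lat_store_sketch(struct gendisk *disk)
{
        struct request_queue *q = disk->queue;
        unsigned int memflags;
        bool prepared;
        int ret;

        mutex_lock(&q->rq_qos_mutex);
        /* Flip the static branch before the queue is frozen. */
        prepared = rq_qos_prep_add(q, RQ_QOS_WBT);

        memflags = blk_mq_freeze_queue(q);
        ret = wbt_init(disk);
        blk_mq_unfreeze_queue(q, memflags);

        /* Undo the increment if no RQ_QOS_WBT policy ended up attached. */
        rq_qos_prep_del(q, RQ_QOS_WBT, prepared);
        mutex_unlock(&q->rq_qos_mutex);

        return ret;
}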
+


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-15  0:13           ` Ming Lei
  2025-08-15  1:04             ` Yu Kuai
@ 2025-08-15  9:43             ` Nilay Shroff
  2025-08-15 13:24               ` Ming Lei
  1 sibling, 1 reply; 25+ messages in thread
From: Nilay Shroff @ 2025-08-15  9:43 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce



On 8/15/25 5:43 AM, Ming Lei wrote:
> On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
>>
>>
>> On 8/14/25 7:08 PM, Ming Lei wrote:
>>> On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 8/14/25 6:14 PM, Ming Lei wrote:
>>>>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
>>>>>> A recent lockdep[1] splat observed while running blktest block/005
>>>>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
>>>>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
>>>>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
>>>>>>
>>>>>> That change added a static key to avoid fetching q->rq_qos when
>>>>>> neither blk-wbt nor blk-iolatency is configured. The static key
>>>>>> dynamically patches kernel text to a NOP when disabled, eliminating
>>>>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
>>>>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
>>>>>> jump_label_mutex. When this happens after the queue has already been
>>>>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
>>>>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
>>>>>> potential deadlock reported by lockdep [1].
>>>>>>
>>>>>> To resolve this, replace the static key mechanism with q->queue_flags:
>>>>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
>>>>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
>>>>>> otherwise, the access is skipped.
>>>>>>
>>>>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
>>>>>> the first cacheline of struct request_queue, checking it imposes minimal
>>>>>> overhead while eliminating the deadlock risk.
>>>>>>
>>>>>> This change avoids the lockdep splat without introducing performance
>>>>>> regressions.
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>
>>>>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
>>>>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>>>>>> ---
>>>>>>  block/blk-mq-debugfs.c |  1 +
>>>>>>  block/blk-rq-qos.c     |  9 ++++---
>>>>>>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>>>>>>  include/linux/blkdev.h |  1 +
>>>>>>  4 files changed, 37 insertions(+), 28 deletions(-)
>>>>>>
>>>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>>>>> index 7ed3e71f2fc0..32c65efdda46 100644
>>>>>> --- a/block/blk-mq-debugfs.c
>>>>>> +++ b/block/blk-mq-debugfs.c
>>>>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>>>>>>  	QUEUE_FLAG_NAME(SQ_SCHED),
>>>>>>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>>>>>>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
>>>>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>>>>>>  };
>>>>>>  #undef QUEUE_FLAG_NAME
>>>>>>  
>>>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>>>> index b1e24bb85ad2..654478dfbc20 100644
>>>>>> --- a/block/blk-rq-qos.c
>>>>>> +++ b/block/blk-rq-qos.c
>>>>>> @@ -2,8 +2,6 @@
>>>>>>  
>>>>>>  #include "blk-rq-qos.h"
>>>>>>  
>>>>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
>>>>>> -
>>>>>>  /*
>>>>>>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>>>>>>   * false if 'v' + 1 would be bigger than 'below'.
>>>>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>>>>>>  		struct rq_qos *rqos = q->rq_qos;
>>>>>>  		q->rq_qos = rqos->next;
>>>>>>  		rqos->ops->exit(rqos);
>>>>>> -		static_branch_dec(&block_rq_qos);
>>>>>>  	}
>>>>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>  	mutex_unlock(&q->rq_qos_mutex);
>>>>>>  }
>>>>>>  
>>>>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>>>>>>  		goto ebusy;
>>>>>>  	rqos->next = q->rq_qos;
>>>>>>  	q->rq_qos = rqos;
>>>>>> -	static_branch_inc(&block_rq_qos);
>>>>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>
>>>>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
>>>>> out of queue freeze in rq_qos_add()?
>>>>>
>>>>> What matters is just the 1st static_branch_inc() which switches the counter
>>>>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
>>>>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
>>>>> needn't queue freeze protection.
>>>>>
>>>> I thought about it earlier but that won't work because we have 
>>>> code paths freezing queue before it reaches upto rq_qos_add(),
>>>> For instance:
>>>>
>>>> We have following code paths from where we invoke
>>>> rq_qos_add() APIs with queue already frozen:
>>>>
>>>> ioc_qos_write()
>>>>  -> blkg_conf_open_bdev_frozen() => freezes queue
>>>>  -> blk_iocost_init()
>>>>    -> rq_qos_add()
>>>>
>>>> queue_wb_lat_store()  => freezes queue
>>>>  -> wbt_init()
>>>>   -> rq_qos_add() 
>>>
>>> The above two shouldn't be hard to solve, such as, add helper
>>> rq_qos_prep_add() for increasing the static branch counter.
>>>
>> Yes but then it means that IOs which would be in flight 
>> would take a hit in hotpath: In hotpath those IOs
>> would evaluate static key branch to true and then fetch 
>> q->rq_qos (which probably would not be in the first
>> cacheline). So are we okay to take hat hit in IO 
>> hotpath?
> 
> But it is that in-tree code is doing, isn't it?
> 
> `static branch` is only evaluated iff at least one rqos is added.
> 
In the current in-tree implementation, the static branch is evaluated
only if at least one rq_qos is added.

Per your suggested change, we would increment the static branch key before
freezing the queue (and before attaching the QoS policy to the request queue).
This means that any I/O already in flight would see the static branch key
as enabled and would proceed to fetch q->rq_qos, even though q->rq_qos would
still be NULL at that point since the QoS policy hasn’t yet been attached.
This results in a performance penalty due to the additional q->rq_qos fetch.

In contrast, the current tree avoids this penalty. The existing sequence is:
- Freeze the queue.
- Attach the QoS policy to the queue (q->rq_qos becomes non-NULL).
- Increment the static branch key.
- Unfreeze the queue.

With this ordering, if the hotpath finds the static branch key enabled, it is
guaranteed that q->rq_qos is non-NULL. Thus, we either:
- Skip evaluating the static branch key (and q->rq_qos) entirely, or
- If the static branch key is enabled, also have a valid q->rq_qos.

In summary, it appears that your proposed ordering introduces a window where the
static branch key is enabled but q->rq_qos is still NULL, incurring unnecessary
fetch overhead in the I/O hotpath.
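
For reference, this is the in-tree ordering being described, condensed from
the rq_qos_add() hunk quoted earlier in the thread (surrounding context
trimmed):

        memflags = blk_mq_freeze_queue(q);      /* 1. freeze the queue    */

        if (rq_qos_id(q, rqos->id))
                goto ebusy;
        rqos->next = q->rq_qos;
        q->rq_qos = rqos;                       /* 2. attach the policy   */
        static_branch_inc(&block_rq_qos);       /* 3. enable the branch   */

        blk_mq_unfreeze_queue(q, memflags);     /* 4. unfreeze the queue  */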

Thanks,
--Nilay



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-15  9:43             ` Nilay Shroff
@ 2025-08-15 13:24               ` Ming Lei
  2025-08-15 18:33                 ` Nilay Shroff
  0 siblings, 1 reply; 25+ messages in thread
From: Ming Lei @ 2025-08-15 13:24 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

On Fri, Aug 15, 2025 at 03:13:19PM +0530, Nilay Shroff wrote:
> 
> 
> On 8/15/25 5:43 AM, Ming Lei wrote:
> > On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
> >>
> >>
> >> On 8/14/25 7:08 PM, Ming Lei wrote:
> >>> On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
> >>>>
> >>>>
> >>>> On 8/14/25 6:14 PM, Ming Lei wrote:
> >>>>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
> >>>>>> A recent lockdep[1] splat observed while running blktest block/005
> >>>>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
> >>>>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
> >>>>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
> >>>>>>
> >>>>>> That change added a static key to avoid fetching q->rq_qos when
> >>>>>> neither blk-wbt nor blk-iolatency is configured. The static key
> >>>>>> dynamically patches kernel text to a NOP when disabled, eliminating
> >>>>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
> >>>>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
> >>>>>> jump_label_mutex. When this happens after the queue has already been
> >>>>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
> >>>>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
> >>>>>> potential deadlock reported by lockdep [1].
> >>>>>>
> >>>>>> To resolve this, replace the static key mechanism with q->queue_flags:
> >>>>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
> >>>>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
> >>>>>> otherwise, the access is skipped.
> >>>>>>
> >>>>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
> >>>>>> the first cacheline of struct request_queue, checking it imposes minimal
> >>>>>> overhead while eliminating the deadlock risk.
> >>>>>>
> >>>>>> This change avoids the lockdep splat without introducing performance
> >>>>>> regressions.
> >>>>>>
> >>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> >>>>>>
> >>>>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>>>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> >>>>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> >>>>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> >>>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
> >>>>>> ---
> >>>>>>  block/blk-mq-debugfs.c |  1 +
> >>>>>>  block/blk-rq-qos.c     |  9 ++++---
> >>>>>>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
> >>>>>>  include/linux/blkdev.h |  1 +
> >>>>>>  4 files changed, 37 insertions(+), 28 deletions(-)
> >>>>>>
> >>>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
> >>>>>> index 7ed3e71f2fc0..32c65efdda46 100644
> >>>>>> --- a/block/blk-mq-debugfs.c
> >>>>>> +++ b/block/blk-mq-debugfs.c
> >>>>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
> >>>>>>  	QUEUE_FLAG_NAME(SQ_SCHED),
> >>>>>>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
> >>>>>>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
> >>>>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
> >>>>>>  };
> >>>>>>  #undef QUEUE_FLAG_NAME
> >>>>>>  
> >>>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> >>>>>> index b1e24bb85ad2..654478dfbc20 100644
> >>>>>> --- a/block/blk-rq-qos.c
> >>>>>> +++ b/block/blk-rq-qos.c
> >>>>>> @@ -2,8 +2,6 @@
> >>>>>>  
> >>>>>>  #include "blk-rq-qos.h"
> >>>>>>  
> >>>>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
> >>>>>> -
> >>>>>>  /*
> >>>>>>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
> >>>>>>   * false if 'v' + 1 would be bigger than 'below'.
> >>>>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
> >>>>>>  		struct rq_qos *rqos = q->rq_qos;
> >>>>>>  		q->rq_qos = rqos->next;
> >>>>>>  		rqos->ops->exit(rqos);
> >>>>>> -		static_branch_dec(&block_rq_qos);
> >>>>>>  	}
> >>>>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
> >>>>>>  	mutex_unlock(&q->rq_qos_mutex);
> >>>>>>  }
> >>>>>>  
> >>>>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
> >>>>>>  		goto ebusy;
> >>>>>>  	rqos->next = q->rq_qos;
> >>>>>>  	q->rq_qos = rqos;
> >>>>>> -	static_branch_inc(&block_rq_qos);
> >>>>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> >>>>>
> >>>>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
> >>>>> out of queue freeze in rq_qos_add()?
> >>>>>
> >>>>> What matters is just the 1st static_branch_inc() which switches the counter
> >>>>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
> >>>>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
> >>>>> needn't queue freeze protection.
> >>>>>
> >>>> I thought about it earlier but that won't work because we have 
> >>>> code paths freezing queue before it reaches upto rq_qos_add(),
> >>>> For instance:
> >>>>
> >>>> We have following code paths from where we invoke
> >>>> rq_qos_add() APIs with queue already frozen:
> >>>>
> >>>> ioc_qos_write()
> >>>>  -> blkg_conf_open_bdev_frozen() => freezes queue
> >>>>  -> blk_iocost_init()
> >>>>    -> rq_qos_add()
> >>>>
> >>>> queue_wb_lat_store()  => freezes queue
> >>>>  -> wbt_init()
> >>>>   -> rq_qos_add() 
> >>>
> >>> The above two shouldn't be hard to solve, such as, add helper
> >>> rq_qos_prep_add() for increasing the static branch counter.
> >>>
> >> Yes but then it means that IOs which would be in flight 
> >> would take a hit in hotpath: In hotpath those IOs
> >> would evaluate static key branch to true and then fetch 
> >> q->rq_qos (which probably would not be in the first
> >> cacheline). So are we okay to take hat hit in IO 
> >> hotpath?
> > 
> > But it is that in-tree code is doing, isn't it?
> > 
> > `static branch` is only evaluated iff at least one rqos is added.
> > 
> In the current in-tree implementation, the static branch is evaluated
> only if at least one rq_qos is added.
> 
> Per you suggested change, we would increment the static branch key before
> freezing the queue (and before attaching the QoS policy to the request queue).
> This means that any I/O already in flight would see the static branch key
> as enabled and would proceed to fetch q->rq_qos, even though q->rq_qos would
> still be NULL at that point since the QoS policy hasn’t yet been attached.
> This results in a performance penalty due to the additional q->rq_qos fetch.
> 
> In contrast, the current tree avoids this penalty. The existing sequence is:
> - Freeze the queue.
> - Attach the QoS policy to the queue (q->rq_qos becomes non-NULL).
> - Increment the static branch key.
> - Unfreeze the queue.
> 
> With this ordering, if the hotpath finds the static branch key enabled, it is
> guaranteed that q->rq_qos is non-NULL. Thus, we either:
> - Skip evaluating the static branch key (and q->rq_qos) entirely, or
> - If the static branch key is enabled, also have a valid q->rq_qos.
> 
> In summary, it appears that your proposed ordering introduces a window where the
> static branch key is enabled but q->rq_qos is still NULL, incurring unnecessary
> fetch overhead in the I/O hotpath.

Yes, but the window is pretty small, so the extra overhead isn't something
that matters. More importantly, it shows correct static_branch_*() usage, which
is supposed to be callable safely without holding a subsystem lock.


Thanks,
Ming


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-15 13:24               ` Ming Lei
@ 2025-08-15 18:33                 ` Nilay Shroff
  2025-08-16  1:01                   ` Yu Kuai
  0 siblings, 1 reply; 25+ messages in thread
From: Nilay Shroff @ 2025-08-15 18:33 UTC (permalink / raw)
  To: Ming Lei; +Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce



On 8/15/25 6:54 PM, Ming Lei wrote:
> On Fri, Aug 15, 2025 at 03:13:19PM +0530, Nilay Shroff wrote:
>>
>>
>> On 8/15/25 5:43 AM, Ming Lei wrote:
>>> On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 8/14/25 7:08 PM, Ming Lei wrote:
>>>>> On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
>>>>>>
>>>>>>
>>>>>> On 8/14/25 6:14 PM, Ming Lei wrote:
>>>>>>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
>>>>>>>> A recent lockdep[1] splat observed while running blktest block/005
>>>>>>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
>>>>>>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
>>>>>>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
>>>>>>>>
>>>>>>>> That change added a static key to avoid fetching q->rq_qos when
>>>>>>>> neither blk-wbt nor blk-iolatency is configured. The static key
>>>>>>>> dynamically patches kernel text to a NOP when disabled, eliminating
>>>>>>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
>>>>>>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
>>>>>>>> jump_label_mutex. When this happens after the queue has already been
>>>>>>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
>>>>>>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
>>>>>>>> potential deadlock reported by lockdep [1].
>>>>>>>>
>>>>>>>> To resolve this, replace the static key mechanism with q->queue_flags:
>>>>>>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
>>>>>>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
>>>>>>>> otherwise, the access is skipped.
>>>>>>>>
>>>>>>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
>>>>>>>> the first cacheline of struct request_queue, checking it imposes minimal
>>>>>>>> overhead while eliminating the deadlock risk.
>>>>>>>>
>>>>>>>> This change avoids the lockdep splat without introducing performance
>>>>>>>> regressions.
>>>>>>>>
>>>>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>>>
>>>>>>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
>>>>>>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>>>>>>>> ---
>>>>>>>>  block/blk-mq-debugfs.c |  1 +
>>>>>>>>  block/blk-rq-qos.c     |  9 ++++---
>>>>>>>>  block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>>>>>>>>  include/linux/blkdev.h |  1 +
>>>>>>>>  4 files changed, 37 insertions(+), 28 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>>>>>>> index 7ed3e71f2fc0..32c65efdda46 100644
>>>>>>>> --- a/block/blk-mq-debugfs.c
>>>>>>>> +++ b/block/blk-mq-debugfs.c
>>>>>>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>>>>>>>>  	QUEUE_FLAG_NAME(SQ_SCHED),
>>>>>>>>  	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>>>>>>>>  	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
>>>>>>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>>>>>>>>  };
>>>>>>>>  #undef QUEUE_FLAG_NAME
>>>>>>>>  
>>>>>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>>>>>> index b1e24bb85ad2..654478dfbc20 100644
>>>>>>>> --- a/block/blk-rq-qos.c
>>>>>>>> +++ b/block/blk-rq-qos.c
>>>>>>>> @@ -2,8 +2,6 @@
>>>>>>>>  
>>>>>>>>  #include "blk-rq-qos.h"
>>>>>>>>  
>>>>>>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
>>>>>>>> -
>>>>>>>>  /*
>>>>>>>>   * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>>>>>>>>   * false if 'v' + 1 would be bigger than 'below'.
>>>>>>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>>>>>>>>  		struct rq_qos *rqos = q->rq_qos;
>>>>>>>>  		q->rq_qos = rqos->next;
>>>>>>>>  		rqos->ops->exit(rqos);
>>>>>>>> -		static_branch_dec(&block_rq_qos);
>>>>>>>>  	}
>>>>>>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>>>  	mutex_unlock(&q->rq_qos_mutex);
>>>>>>>>  }
>>>>>>>>  
>>>>>>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>>>>>>>>  		goto ebusy;
>>>>>>>>  	rqos->next = q->rq_qos;
>>>>>>>>  	q->rq_qos = rqos;
>>>>>>>> -	static_branch_inc(&block_rq_qos);
>>>>>>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>>
>>>>>>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
>>>>>>> out of queue freeze in rq_qos_add()?
>>>>>>>
>>>>>>> What matters is just the 1st static_branch_inc() which switches the counter
>>>>>>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
>>>>>>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
>>>>>>> needn't queue freeze protection.
>>>>>>>
>>>>>> I thought about it earlier but that won't work because we have 
>>>>>> code paths freezing queue before it reaches upto rq_qos_add(),
>>>>>> For instance:
>>>>>>
>>>>>> We have following code paths from where we invoke
>>>>>> rq_qos_add() APIs with queue already frozen:
>>>>>>
>>>>>> ioc_qos_write()
>>>>>>  -> blkg_conf_open_bdev_frozen() => freezes queue
>>>>>>  -> blk_iocost_init()
>>>>>>    -> rq_qos_add()
>>>>>>
>>>>>> queue_wb_lat_store()  => freezes queue
>>>>>>  -> wbt_init()
>>>>>>   -> rq_qos_add() 
>>>>>
>>>>> The above two shouldn't be hard to solve, such as, add helper
>>>>> rq_qos_prep_add() for increasing the static branch counter.
>>>>>
>>>> Yes but then it means that IOs which would be in flight 
>>>> would take a hit in hotpath: In hotpath those IOs
>>>> would evaluate static key branch to true and then fetch 
>>>> q->rq_qos (which probably would not be in the first
>>>> cacheline). So are we okay to take hat hit in IO 
>>>> hotpath?
>>>
>>> But it is that in-tree code is doing, isn't it?
>>>
>>> `static branch` is only evaluated iff at least one rqos is added.
>>>
>> In the current in-tree implementation, the static branch is evaluated
>> only if at least one rq_qos is added.
>>
>> Per you suggested change, we would increment the static branch key before
>> freezing the queue (and before attaching the QoS policy to the request queue).
>> This means that any I/O already in flight would see the static branch key
>> as enabled and would proceed to fetch q->rq_qos, even though q->rq_qos would
>> still be NULL at that point since the QoS policy hasn’t yet been attached.
>> This results in a performance penalty due to the additional q->rq_qos fetch.
>>
>> In contrast, the current tree avoids this penalty. The existing sequence is:
>> - Freeze the queue.
>> - Attach the QoS policy to the queue (q->rq_qos becomes non-NULL).
>> - Increment the static branch key.
>> - Unfreeze the queue.
>>
>> With this ordering, if the hotpath finds the static branch key enabled, it is
>> guaranteed that q->rq_qos is non-NULL. Thus, we either:
>> - Skip evaluating the static branch key (and q->rq_qos) entirely, or
>> - If the static branch key is enabled, also have a valid q->rq_qos.
>>
>> In summary, it appears that your proposed ordering introduces a window where the
>> static branch key is enabled but q->rq_qos is still NULL, incurring unnecessary
>> fetch overhead in the I/O hotpath.
> 
> Yes, but the window is pretty small, so the extra overhead isn't something
> matters. More importantly it shows correct static_branch_*() usage, which is
> supposed to be called safely without subsystem lock.
> 
I see your point about static_branch_*() usage being independent of the subsystem
lock, but in this case that “small window” still sits directly in the I/O hotpath
and will be hit by all in-flight requests during queue freeze. The current ordering
is intentionally structured so that:

1. The branch is only enabled after q->rq_qos is guaranteed non-NULL.
2. Any hotpath evaluation of the static key implies a valid pointer
   dereference with no wasted cache miss.

Even if the window is short, we’d still pay for unnecessary q->rq_qos loads and
possible cacheline misses for every inflight I/O during that period. In practice,
that’s why this patch replaces the static key with a queue_flags bit: it’s lock-
free to update, eliminates the deadlock risk, and preserves the “no penalty until
active” property without depending on lock ordering subtlety. 
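
For illustration, a minimal sketch of the fast-path gate that replaces the
static key, modelled on the description above (the actual helpers are in the
block/blk-rq-qos.h hunk of this patch and may differ in detail):

static inline void rq_qos_issue(struct request_queue *q, struct request *rq)
{
        /*
         * q->queue_flags lives in the first cacheline of struct
         * request_queue, so this check stays cheap when no rq-qos
         * policy is attached.
         */
        if (test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags) && q->rq_qos)
                __rq_qos_issue(q->rq_qos, rq);
}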

Having said that, I’m not opposed to reworking the patch per your proposal if
we all agree the minor hotpath cost is acceptable, but I wanted to make sure the
trade-off is clear.

Thanks,
--Nilay


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-15 18:33                 ` Nilay Shroff
@ 2025-08-16  1:01                   ` Yu Kuai
  0 siblings, 0 replies; 25+ messages in thread
From: Yu Kuai @ 2025-08-16  1:01 UTC (permalink / raw)
  To: Nilay Shroff, Ming Lei
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

Hi,

在 2025/8/16 2:33, Nilay Shroff 写道:
>
> On 8/15/25 6:54 PM, Ming Lei wrote:
>> On Fri, Aug 15, 2025 at 03:13:19PM +0530, Nilay Shroff wrote:
>>>
>>> On 8/15/25 5:43 AM, Ming Lei wrote:
>>>> On Thu, Aug 14, 2025 at 08:01:11PM +0530, Nilay Shroff wrote:
>>>>>
>>>>> On 8/14/25 7:08 PM, Ming Lei wrote:
>>>>>> On Thu, Aug 14, 2025 at 06:27:08PM +0530, Nilay Shroff wrote:
>>>>>>>
>>>>>>> On 8/14/25 6:14 PM, Ming Lei wrote:
>>>>>>>> On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
>>>>>>>>> A recent lockdep[1] splat observed while running blktest block/005
>>>>>>>>> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
>>>>>>>>> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
>>>>>>>>> ("block: blk-rq-qos: guard rq-qos helpers by static key").
>>>>>>>>>
>>>>>>>>> That change added a static key to avoid fetching q->rq_qos when
>>>>>>>>> neither blk-wbt nor blk-iolatency is configured. The static key
>>>>>>>>> dynamically patches kernel text to a NOP when disabled, eliminating
>>>>>>>>> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
>>>>>>>>> a static key at runtime requires acquiring both cpu_hotplug_lock and
>>>>>>>>> jump_label_mutex. When this happens after the queue has already been
>>>>>>>>> frozen (i.e., while holding ->freeze_lock), it creates a locking
>>>>>>>>> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
>>>>>>>>> potential deadlock reported by lockdep [1].
>>>>>>>>>
>>>>>>>>> To resolve this, replace the static key mechanism with q->queue_flags:
>>>>>>>>> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
>>>>>>>>> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
>>>>>>>>> otherwise, the access is skipped.
>>>>>>>>>
>>>>>>>>> Since q->queue_flags is commonly accessed in IO hotpath and resides in
>>>>>>>>> the first cacheline of struct request_queue, checking it imposes minimal
>>>>>>>>> overhead while eliminating the deadlock risk.
>>>>>>>>>
>>>>>>>>> This change avoids the lockdep splat without introducing performance
>>>>>>>>> regressions.
>>>>>>>>>
>>>>>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>>>>
>>>>>>>>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>>>>> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>>>> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
>>>>>>>>> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
>>>>>>>>> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
>>>>>>>>> ---
>>>>>>>>>   block/blk-mq-debugfs.c |  1 +
>>>>>>>>>   block/blk-rq-qos.c     |  9 ++++---
>>>>>>>>>   block/blk-rq-qos.h     | 54 ++++++++++++++++++++++++------------------
>>>>>>>>>   include/linux/blkdev.h |  1 +
>>>>>>>>>   4 files changed, 37 insertions(+), 28 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
>>>>>>>>> index 7ed3e71f2fc0..32c65efdda46 100644
>>>>>>>>> --- a/block/blk-mq-debugfs.c
>>>>>>>>> +++ b/block/blk-mq-debugfs.c
>>>>>>>>> @@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
>>>>>>>>>   	QUEUE_FLAG_NAME(SQ_SCHED),
>>>>>>>>>   	QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
>>>>>>>>>   	QUEUE_FLAG_NAME(NO_ELV_SWITCH),
>>>>>>>>> +	QUEUE_FLAG_NAME(QOS_ENABLED),
>>>>>>>>>   };
>>>>>>>>>   #undef QUEUE_FLAG_NAME
>>>>>>>>>   
>>>>>>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>>>>>>> index b1e24bb85ad2..654478dfbc20 100644
>>>>>>>>> --- a/block/blk-rq-qos.c
>>>>>>>>> +++ b/block/blk-rq-qos.c
>>>>>>>>> @@ -2,8 +2,6 @@
>>>>>>>>>   
>>>>>>>>>   #include "blk-rq-qos.h"
>>>>>>>>>   
>>>>>>>>> -__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
>>>>>>>>> -
>>>>>>>>>   /*
>>>>>>>>>    * Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
>>>>>>>>>    * false if 'v' + 1 would be bigger than 'below'.
>>>>>>>>> @@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
>>>>>>>>>   		struct rq_qos *rqos = q->rq_qos;
>>>>>>>>>   		q->rq_qos = rqos->next;
>>>>>>>>>   		rqos->ops->exit(rqos);
>>>>>>>>> -		static_branch_dec(&block_rq_qos);
>>>>>>>>>   	}
>>>>>>>>> +	blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>>>>   	mutex_unlock(&q->rq_qos_mutex);
>>>>>>>>>   }
>>>>>>>>>   
>>>>>>>>> @@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
>>>>>>>>>   		goto ebusy;
>>>>>>>>>   	rqos->next = q->rq_qos;
>>>>>>>>>   	q->rq_qos = rqos;
>>>>>>>>> -	static_branch_inc(&block_rq_qos);
>>>>>>>>> +	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>>>> One stupid question: can we simply move static_branch_inc(&block_rq_qos)
>>>>>>>> out of queue freeze in rq_qos_add()?
>>>>>>>>
>>>>>>>> What matters is just the 1st static_branch_inc() which switches the counter
>>>>>>>> from 0 to 1, when blk_mq_freeze_queue() guarantees that all in-progress code
>>>>>>>> paths observe q->rq_qos as NULL. That means static_branch_inc(&block_rq_qos)
>>>>>>>> needn't queue freeze protection.
>>>>>>>>
>>>>>>> I thought about it earlier but that won't work because we have
>>>>>>> code paths freezing queue before it reaches upto rq_qos_add(),
>>>>>>> For instance:
>>>>>>>
>>>>>>> We have following code paths from where we invoke
>>>>>>> rq_qos_add() APIs with queue already frozen:
>>>>>>>
>>>>>>> ioc_qos_write()
>>>>>>>   -> blkg_conf_open_bdev_frozen() => freezes queue
>>>>>>>   -> blk_iocost_init()
>>>>>>>     -> rq_qos_add()
>>>>>>>
>>>>>>> queue_wb_lat_store()  => freezes queue
>>>>>>>   -> wbt_init()
>>>>>>>    -> rq_qos_add()
>>>>>> The above two shouldn't be hard to solve, such as, add helper
>>>>>> rq_qos_prep_add() for increasing the static branch counter.
>>>>>>
>>>>> Yes but then it means that IOs which would be in flight
>>>>> would take a hit in hotpath: In hotpath those IOs
>>>>> would evaluate static key branch to true and then fetch
>>>>> q->rq_qos (which probably would not be in the first
>>>>> cacheline). So are we okay to take hat hit in IO
>>>>> hotpath?
>>>> But it is that in-tree code is doing, isn't it?
>>>>
>>>> `static branch` is only evaluated iff at least one rqos is added.
>>>>
>>> In the current in-tree implementation, the static branch is evaluated
>>> only if at least one rq_qos is added.
>>>
>>> Per you suggested change, we would increment the static branch key before
>>> freezing the queue (and before attaching the QoS policy to the request queue).
>>> This means that any I/O already in flight would see the static branch key
>>> as enabled and would proceed to fetch q->rq_qos, even though q->rq_qos would
>>> still be NULL at that point since the QoS policy hasn’t yet been attached.
>>> This results in a performance penalty due to the additional q->rq_qos fetch.
>>>
>>> In contrast, the current tree avoids this penalty. The existing sequence is:
>>> - Freeze the queue.
>>> - Attach the QoS policy to the queue (q->rq_qos becomes non-NULL).
>>> - Increment the static branch key.
>>> - Unfreeze the queue.
>>>
>>> With this ordering, if the hotpath finds the static branch key enabled, it is
>>> guaranteed that q->rq_qos is non-NULL. Thus, we either:
>>> - Skip evaluating the static branch key (and q->rq_qos) entirely, or
>>> - If the static branch key is enabled, also have a valid q->rq_qos.
>>>
>>> In summary, it appears that your proposed ordering introduces a window where the
>>> static branch key is enabled but q->rq_qos is still NULL, incurring unnecessary
>>> fetch overhead in the I/O hotpath.
>> Yes, but the window is pretty small, so the extra overhead isn't something that
>> matters. More importantly, it shows correct static_branch_*() usage, which is
>> supposed to be safe to call without holding any subsystem lock.
>>
> I see your point about static_branch_*() usage being independent of the subsystem
> lock, but in this case that “small window” still sits directly in the I/O hotpath
> and will be hit by all in-flight requests during queue freeze. The current ordering
> is intentionally structured so that:
>
> 1. The branch is only enabled after q->rq_qos is guaranteed non-NULL.
> 2. Any hotpath evaluation of the static key implies a valid pointer
>     dereference with no wasted cache miss.
> Even if the window is short, we’d still pay for unnecessary q->rq_qos loads and
> possible cacheline misses for every inflight I/O during that period. In practice,
> that’s why this patch replaces the static key with a queue_flags bit: it’s lock-
> free to update, eliminates the deadlock risk, and preserves the “no penalty until
> active” property without depending on lock ordering subtlety.
>
> Having said that, I’m not opposed to reworking the patch per your proposal if
> all agree the minor hotpath cost is acceptable, but wanted to make sure the
> trade-off is clear.
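For reference, the two fast-path gates being compared look roughly as follows
(assumed shapes, with illustrative _statickey/_queueflag suffixes; the real
helpers live in block/blk-rq-qos.h and may differ in detail):

/* Pre-series shape: one global static key in front of the q->rq_qos load. */
static inline void rq_qos_issue_statickey(struct request *rq)
{
	if (static_branch_unlikely(&block_rq_qos) && rq->q->rq_qos)
		__rq_qos_issue(rq->q->rq_qos, rq);
}

/* With this series: a per-queue flag in front of the q->rq_qos load. */
static inline void rq_qos_issue_queueflag(struct request *rq)
{
	if (test_bit(QUEUE_FLAG_QOS_ENABLED, &rq->q->queue_flags) &&
	    rq->q->rq_qos)
		__rq_qos_issue(rq->q->rq_qos, rq);
}

Either way the q->rq_qos load only happens once the gate is open; the ordering
discussed above decides whether an open gate can ever observe a NULL q->rq_qos.
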

I'm good with this set, you already have my review :)

BTW, this set has another advantage: the flag is per disk. The static
branch can only optimize the case where no disk in the system has rq_qos
enabled; once one disk enables it, every disk that doesn't have rq_qos
enabled still suffers the additional cacheline load.
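
A small illustration of that per-disk property (function name assumed, not
code from the series):

/*
 * With the old scheme, attaching an rqos anywhere did
 * static_branch_inc(&block_rq_qos) and flipped one system-wide branch;
 * with this series only the owning queue's flag changes, so other
 * queues keep short-circuiting on their own queue_flags word.
 */
static void qos_enable_is_per_queue(struct request_queue *q_with_qos)
{
	blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q_with_qos);
}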


Thanks

Kuai

>
> Thanks,
> --Nilay
>
>


* Re: [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
  2025-08-14  8:24 ` [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock Nilay Shroff
  2025-08-14  9:21   ` Yu Kuai
  2025-08-14 12:44   ` Ming Lei
@ 2025-08-16  1:59   ` Ming Lei
  2 siblings, 0 replies; 25+ messages in thread
From: Ming Lei @ 2025-08-16  1:59 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: linux-block, axboe, yukuai1, hch, shinichiro.kawasaki, kch,
	gjoyce

On Thu, Aug 14, 2025 at 01:54:59PM +0530, Nilay Shroff wrote:
> A recent lockdep[1] splat observed while running blktest block/005
> reveals a potential deadlock caused by the cpu_hotplug_lock dependency
> on ->freeze_lock. This dependency was introduced by commit 033b667a823e
> ("block: blk-rq-qos: guard rq-qos helpers by static key").
> 
> That change added a static key to avoid fetching q->rq_qos when
> neither blk-wbt nor blk-iolatency is configured. The static key
> dynamically patches kernel text to a NOP when disabled, eliminating
> overhead of fetching q->rq_qos in the I/O hot path. However, enabling
> a static key at runtime requires acquiring both cpu_hotplug_lock and
> jump_label_mutex. When this happens after the queue has already been
> frozen (i.e., while holding ->freeze_lock), it creates a locking
> dependency from cpu_hotplug_lock to ->freeze_lock, which leads to a
> potential deadlock reported by lockdep [1].
> 
> To resolve this, replace the static key mechanism with q->queue_flags:
> QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
> accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
> otherwise, the access is skipped.
> 
> Since q->queue_flags is commonly accessed in IO hotpath and resides in
> the first cacheline of struct request_queue, checking it imposes minimal
> overhead while eliminating the deadlock risk.
> 
> This change avoids the lockdep splat without introducing performance
> regressions.
> 
> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> 
> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Closes: https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>

It is hard to use a static branch correctly in this case from a locking
viewpoint, and most distributions likely enable at least one rqos anyway, so
the static branch won't optimize the typical case:

Reviewed-by: Ming Lei <ming.lei@redhat.com>


Thanks,
Ming



* Re: [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop
  2025-08-14  8:24 [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
                   ` (2 preceding siblings ...)
  2025-08-14  8:24 ` [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock Nilay Shroff
@ 2025-08-21 12:19 ` Nilay Shroff
  2025-08-21 13:11   ` Jens Axboe
  2025-08-21 13:11 ` Jens Axboe
  4 siblings, 1 reply; 25+ messages in thread
From: Nilay Shroff @ 2025-08-21 12:19 UTC (permalink / raw)
  To: Jens Axboe
  Cc: ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce,
	linux-block@vger.kernel.org

Hi Jens,

Just a gentle ping on this patchset. There have been a couple of recent
reports[1][2] highlighting this issue. Could you please consider pulling it?

[1] https://lore.kernel.org/all/4f2bd603-dbd9-434c-9dfe-f2d4f1becd82@linux.ibm.com/
[2] https://lore.kernel.org/all/36e552a1-8e8e-4b6f-894b-e7e04e17196e@linux.ibm.com/

Thanks,
--Nilay

On 8/14/25 1:54 PM, Nilay Shroff wrote:
> Hi,
> 
> This patchset replaces the use of a static key in the I/O path (rq_qos_
> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
> is made to eliminate a potential deadlock introduced by the use of static
> keys in the blk-rq-qos infrastructure, as reported by lockdep during
> blktests block/005[1].
> 
> The original static key approach was introduced to avoid unnecessary
> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
> blk-iolatency) is configured. While efficient, enabling a static key at
> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
> becomes problematic if the queue is already frozen — causing a reverse
> dependency on ->freeze_lock. This results in a lockdep splat indicating
> a potential deadlock.
> 
> To resolve this, we now gate q->rq_qos access with a q->queue_flags
> bitop (QUEUE_FLAG_QOS_ENABLED), avoiding the static key and the associated
> locking altogether.
> 
> I compared both static key and atomic bitop implementations using ftrace
> function graph tracer over ~50 invocations of rq_qos_issue() while ensuring
> blk-wbt/blk-iolatency were disabled (i.e., no QoS functionality). For
> easy comparision, I made rq_qos_issue() noinline. The comparision was
> made on PowerPC machine.
> 
> Static Key disabled (QoS is not configured):
> 5d0: 00 00 00 60     nop    # patched in by static key framework
> 5d4: 20 00 80 4e     blr    # return (branch to link register)
> 
> Only a nop and blr (branch to link register) are executed — very lightweight.
> 
> atomic bitop (QoS is not configured):
> 5d0: 20 00 23 e9     ld      r9,32(r3)     # load q->queue_flags
> 5d4: 00 80 29 71     andi.   r9,r9,32768   # check QUEUE_FLAG_QOS_ENABLED (bit 15)
> 5d8: 20 00 82 4d     beqlr                 # return if bit not set
> 
> This performs an ld and andi. before returning. Slightly more work,
> but q->queue_flags is typically hot in cache during I/O submission.
> 
> With Static Key (disabled):
> Duration (us): min=0.668 max=0.816 avg≈0.750
> 
> With atomic bitop QUEUE_FLAG_QOS_ENABLED (bit not set):
> Duration (us): min=0.684 max=0.834 avg≈0.759
> 
> As expected, both versions are almost similar in cost. The added latency
> from an extra ld and andi. is in the range of ~9ns.
> 
> There're three patches in the series:
> - First patch is a minor optimization which skips evaluating q->rq_qos
>   check in the re_qos_done_bio() as it's not needed.
> - Second patch fixes a subtle issue in rq_qos_del() to ensure that
>   we decrement the block_rq_qos static key in rq_qos_del() when a 
>   QoS policy is detached from the queue.
> - Third patch replaces usage of block_rq_qos static key with atomic flag
>   QUEUE_FLAG_QOS_ENABLED and thus helps break cpu_hotplug_lock depedency
>   on freeze_lock and that eventually help to fix the lockdep splat.
> 
> As usual, feedback and review comments are welcome!
> 
> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
> 
> Changes from v2:
>   - Added a change to skip the q->rq_qos check in rq_qos_done_bio().
>     This is now part of the first patch in this series. (Yu Kuai)
>   - Added a separate patch to ensure block_rq_qos is decremented when
>     detaching a QoS policy in rq_qos_del(). This is now the second 
>     patch in this series. (Yu Kuai)
>   - Folded the third patch from v2 into the new third patch, as patch
>     ordering changed (the second patch from v2 is now the third patch
>     in v3).
> 
> Link to v2: https://lore.kernel.org/all/20250805171749.3448694-1-nilay@linux.ibm.com/
> 
> Changes from v1:
>   - For debugging I made rq_qos_issue() noinline in my local workspace,
>     but then inadvertently it slipped through the patchset upstream. So
>     reverted it and made rq_qos_issue() inline again as earlier.
>   - Added Reported-by and Closes tags in the first patch, which I
>     obviously missed to add in the first version.
> 
> Link to v1: https://lore.kernel.org/all/20250804122125.3271397-1-nilay@linux.ibm.com/
> 
> Nilay Shroff (3):
>   block: skip q->rq_qos check in rq_qos_done_bio()
>   block: decrement block_rq_qos static key in rq_qos_del()
>   block: avoid cpu_hotplug_lock depedency on freeze_lock
> 
>  block/blk-mq-debugfs.c |  1 +
>  block/blk-rq-qos.c     |  8 +++----
>  block/blk-rq-qos.h     | 48 +++++++++++++++++++++++++++---------------
>  include/linux/blkdev.h |  1 +
>  4 files changed, 37 insertions(+), 21 deletions(-)
> 



* Re: [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop
  2025-08-14  8:24 [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
                   ` (3 preceding siblings ...)
  2025-08-21 12:19 ` [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
@ 2025-08-21 13:11 ` Jens Axboe
  4 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2025-08-21 13:11 UTC (permalink / raw)
  To: linux-block, Nilay Shroff
  Cc: ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce


On Thu, 14 Aug 2025 13:54:56 +0530, Nilay Shroff wrote:
> This patchset replaces the use of a static key in the I/O path (rq_qos_
> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
> is made to eliminate a potential deadlock introduced by the use of static
> keys in the blk-rq-qos infrastructure, as reported by lockdep during
> blktests block/005[1].
> 
> The original static key approach was introduced to avoid unnecessary
> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
> blk-iolatency) is configured. While efficient, enabling a static key at
> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
> becomes problematic if the queue is already frozen — causing a reverse
> dependency on ->freeze_lock. This results in a lockdep splat indicating
> a potential deadlock.
> 
> [...]

Applied, thanks!

[1/3] block: skip q->rq_qos check in rq_qos_done_bio()
      commit: 275332877e2fa9d6efa7402b1e897f6c6ee695bb
[2/3] block: decrement block_rq_qos static key in rq_qos_del()
      commit: ade1beea1c27657712aa8f594226d461639382ff
[3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock
      commit: 370ac285f23aecae40600851fb4a1a9e75e50973

Best regards,
-- 
Jens Axboe





* Re: [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop
  2025-08-21 12:19 ` [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
@ 2025-08-21 13:11   ` Jens Axboe
  0 siblings, 0 replies; 25+ messages in thread
From: Jens Axboe @ 2025-08-21 13:11 UTC (permalink / raw)
  To: Nilay Shroff
  Cc: ming.lei, yukuai1, hch, shinichiro.kawasaki, kch, gjoyce,
	linux-block@vger.kernel.org

On 8/21/25 6:19 AM, Nilay Shroff wrote:
> Hi Jens,
> 
> Just a gentle ping on this patchset. There have been a couple of recent
> reports[1][2] highlighting this issue. Could you please consider pulling it?
> 
> [1] https://lore.kernel.org/all/4f2bd603-dbd9-434c-9dfe-f2d4f1becd82@linux.ibm.com/
> [2] https://lore.kernel.org/all/36e552a1-8e8e-4b6f-894b-e7e04e17196e@linux.ibm.com/

Was waiting for discussion to settle down, but looks like there's
agreement to go with this approach. I've queued it up, thanks.

-- 
Jens Axboe



end of thread, other threads:[~2025-08-21 13:11 UTC | newest]

Thread overview: 25+ messages
2025-08-14  8:24 [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
2025-08-14  8:24 ` [PATCHv3 1/3] block: skip q->rq_qos check in rq_qos_done_bio() Nilay Shroff
2025-08-14  8:59   ` Yu Kuai
2025-08-14 11:12   ` Ming Lei
2025-08-14  8:24 ` [PATCHv3 2/3] block: decrement block_rq_qos static key in rq_qos_del() Nilay Shroff
2025-08-14  9:14   ` Yu Kuai
2025-08-14 11:33   ` Ming Lei
2025-08-14  8:24 ` [PATCHv3 3/3] block: avoid cpu_hotplug_lock depedency on freeze_lock Nilay Shroff
2025-08-14  9:21   ` Yu Kuai
2025-08-14 12:44   ` Ming Lei
2025-08-14 12:57     ` Nilay Shroff
2025-08-14 13:38       ` Ming Lei
2025-08-14 14:31         ` Nilay Shroff
2025-08-15  0:13           ` Ming Lei
2025-08-15  1:04             ` Yu Kuai
2025-08-15  7:59               ` Ming Lei
2025-08-15  8:39                 ` Yu Kuai
2025-08-15  9:43             ` Nilay Shroff
2025-08-15 13:24               ` Ming Lei
2025-08-15 18:33                 ` Nilay Shroff
2025-08-16  1:01                   ` Yu Kuai
2025-08-16  1:59   ` Ming Lei
2025-08-21 12:19 ` [PATCHv3 0/3] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
2025-08-21 13:11   ` Jens Axboe
2025-08-21 13:11 ` Jens Axboe
