* [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
@ 2025-08-04 12:21 Nilay Shroff
2025-08-04 12:21 ` [PATCH 1/2] block: avoid cpu_hotplug_lock dependency on freeze_lock Nilay Shroff
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-08-04 12:21 UTC (permalink / raw)
To: linux-block; +Cc: axboe, kch, shinichiro.kawasaki, hch, ming.lei, gjoyce
This patchset replaces the use of a static key in the I/O path (rq_qos_
xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
is made to eliminate a potential deadlock introduced by the use of static
keys in the blk-rq-qos infrastructure, as reported by lockdep during
blktests block/005[1].
The original static key approach was introduced to avoid unnecessary
dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
blk-iolatency) is configured. While efficient, enabling a static key at
runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
becomes problematic if the queue is already frozen — causing a reverse
dependency on ->freeze_lock. This results in a lockdep splat indicating
a potential deadlock.
To resolve this, we now gate q->rq_qos access with a q->queue_flags
bitop (QUEUE_FLAG_QOS_ENABLED), avoiding the static key and the associated
locking altogether.
I compared both static key and atomic bitop implementations using ftrace
function graph tracer over ~50 invocations of rq_qos_issue() while ensuring
blk-wbt/blk-iolatency were disabled (i.e., no QoS functionality). For
ease of comparison, I made rq_qos_issue() noinline. The comparison was
made on a PowerPC machine.
Static key (disabled: QoS is not configured):
5d0: 00 00 00 60 nop # patched in by static key framework (not taken)
5d4: 20 00 80 4e blr # return (branch to link register)
Only a nop and blr (branch to link register) are executed — very lightweight.
atomic bitop (QoS is not configured):
5d0: 20 00 23 e9 ld r9,32(r3) # load q->queue_flags
5d4: 00 80 29 71 andi. r9,r9,32768 # check QUEUE_FLAG_QOS_ENABLED (bit 15)
5d8: 20 00 82 4d beqlr # return if bit not set
This performs an ld and an andi. before returning. Slightly more work,
but q->queue_flags is typically hot in the cache during I/O submission.
With Static Key (disabled):
Duration (us): min=0.668 max=0.816 avg≈0.750
With atomic bitop QUEUE_FLAG_QOS_ENABLED (bit not set):
Duration (us): min=0.684 max=0.834 avg≈0.759
As expected, both versions are very close in cost. The added latency
from the extra ld and andi. is on the order of ~9 ns.
There are two patches in this series. The first patch replaces the static
key with QUEUE_FLAG_QOS_ENABLED. The second patch ensures that
QUEUE_FLAG_QOS_ENABLED is cleared when the queue no longer has any
associated rq_qos policies.
As usual, feedback and review comments are welcome!
[1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
Nilay Shroff (2):
block: avoid cpu_hotplug_lock dependency on freeze_lock
block: clear QUEUE_FLAG_QOS_ENABLED in rq_qos_del()
block/blk-mq-debugfs.c | 1 +
block/blk-rq-qos.c | 8 ++++----
block/blk-rq-qos.h | 45 +++++++++++++++++++++++++-----------------
include/linux/blkdev.h | 1 +
4 files changed, 33 insertions(+), 22 deletions(-)
--
2.50.1
^ permalink raw reply [flat|nested] 16+ messages in thread
* [PATCH 1/2] block: avoid cpu_hotplug_lock dependency on freeze_lock
2025-08-04 12:21 [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
@ 2025-08-04 12:21 ` Nilay Shroff
2025-08-04 12:21 ` [PATCH 2/2] block: clear QUEUE_FLAG_QOS_ENABLED in rq_qos_del() Nilay Shroff
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-08-04 12:21 UTC (permalink / raw)
To: linux-block; +Cc: axboe, kch, shinichiro.kawasaki, hch, ming.lei, gjoyce
A recent lockdep splat[1] observed while running blktests block/005
reveals a potential deadlock caused by the cpu_hotplug_lock dependency
on ->freeze_lock. This dependency was introduced by commit 033b667a823e
("block: blk-rq-qos: guard rq-qos helpers by static key").
That change added a static key to avoid fetching q->rq_qos when neither
blk-wbt nor blk-iolatency is configured. The static key dynamically
patches kernel text to a NOP when disabled, eliminating overhead of
fetching q->rq_qos in the I/O hot path. However, enabling a static key
at runtime requires acquiring both cpu_hotplug_lock and jump_label_mutex.
When this happens after the queue has already been frozen (i.e., while
holding ->freeze_lock), it creates a locking dependency from cpu_hotplug_
lock/jump_label_mutex to ->freeze_lock, which leads to a potential deadlock
reported by lockdep[1].
To resolve this, replace the static key mechanism with q->queue_flags:
QUEUE_FLAG_QOS_ENABLED. This flag is evaluated in the fast path before
accessing q->rq_qos. If the flag is set, we proceed to fetch q->rq_qos;
otherwise, the access is skipped.
Since q->queue_flags is commonly accessed in the I/O hot path and resides
in the first cacheline of struct request_queue, checking it imposes minimal
overhead while eliminating the deadlock risk.
This change avoids the lockdep splat without introducing performance
regressions.
[1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
Fixes: 033b667a823e ("block: blk-rq-qos: guard rq-qos helpers by static key")
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-mq-debugfs.c | 1 +
block/blk-rq-qos.c | 6 ++----
block/blk-rq-qos.h | 45 +++++++++++++++++++++++++-----------------
include/linux/blkdev.h | 1 +
4 files changed, 31 insertions(+), 22 deletions(-)
diff --git a/block/blk-mq-debugfs.c b/block/blk-mq-debugfs.c
index 7ed3e71f2fc0..32c65efdda46 100644
--- a/block/blk-mq-debugfs.c
+++ b/block/blk-mq-debugfs.c
@@ -95,6 +95,7 @@ static const char *const blk_queue_flag_name[] = {
QUEUE_FLAG_NAME(SQ_SCHED),
QUEUE_FLAG_NAME(DISABLE_WBT_DEF),
QUEUE_FLAG_NAME(NO_ELV_SWITCH),
+ QUEUE_FLAG_NAME(QOS_ENABLED),
};
#undef QUEUE_FLAG_NAME
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 848591fb3c57..460c04715321 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -2,8 +2,6 @@
#include "blk-rq-qos.h"
-__read_mostly DEFINE_STATIC_KEY_FALSE(block_rq_qos);
-
/*
* Increment 'v', if 'v' is below 'below'. Returns true if we succeeded,
* false if 'v' + 1 would be bigger than 'below'.
@@ -319,8 +317,8 @@ void rq_qos_exit(struct request_queue *q)
struct rq_qos *rqos = q->rq_qos;
q->rq_qos = rqos->next;
rqos->ops->exit(rqos);
- static_branch_dec(&block_rq_qos);
}
+ blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
mutex_unlock(&q->rq_qos_mutex);
}
@@ -346,7 +344,7 @@ int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
goto ebusy;
rqos->next = q->rq_qos;
q->rq_qos = rqos;
- static_branch_inc(&block_rq_qos);
+ blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
blk_mq_unfreeze_queue(q, memflags);
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 39749f4066fb..0eeb5beb1f4d 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -12,7 +12,6 @@
#include "blk-mq-debugfs.h"
struct blk_mq_debugfs_attr;
-extern struct static_key_false block_rq_qos;
enum rq_qos_id {
RQ_QOS_WBT,
@@ -113,43 +112,50 @@ void __rq_qos_queue_depth_changed(struct rq_qos *rqos);
static inline void rq_qos_cleanup(struct request_queue *q, struct bio *bio)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos)
__rq_qos_cleanup(q->rq_qos, bio);
}
static inline void rq_qos_done(struct request_queue *q, struct request *rq)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos &&
- !blk_rq_is_passthrough(rq))
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos && !blk_rq_is_passthrough(rq))
__rq_qos_done(q->rq_qos, rq);
}
-static inline void rq_qos_issue(struct request_queue *q, struct request *rq)
+static inline void rq_qos_issue(struct request_queue *q, struct request *rq)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos)
__rq_qos_issue(q->rq_qos, rq);
}
static inline void rq_qos_requeue(struct request_queue *q, struct request *rq)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos)
__rq_qos_requeue(q->rq_qos, rq);
}
static inline void rq_qos_done_bio(struct bio *bio)
{
- if (static_branch_unlikely(&block_rq_qos) &&
- bio->bi_bdev && (bio_flagged(bio, BIO_QOS_THROTTLED) ||
- bio_flagged(bio, BIO_QOS_MERGED))) {
- struct request_queue *q = bdev_get_queue(bio->bi_bdev);
- if (q->rq_qos)
- __rq_qos_done_bio(q->rq_qos, bio);
- }
+ struct request_queue *q;
+
+ if (!bio->bi_bdev || (!bio_flagged(bio, BIO_QOS_THROTTLED) &&
+ !bio_flagged(bio, BIO_QOS_MERGED)))
+ return;
+
+ q = bdev_get_queue(bio->bi_bdev);
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos)
+ __rq_qos_done_bio(q->rq_qos, bio);
}
static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos) {
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos) {
bio_set_flag(bio, BIO_QOS_THROTTLED);
__rq_qos_throttle(q->rq_qos, bio);
}
@@ -158,14 +164,16 @@ static inline void rq_qos_throttle(struct request_queue *q, struct bio *bio)
static inline void rq_qos_track(struct request_queue *q, struct request *rq,
struct bio *bio)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos)
__rq_qos_track(q->rq_qos, rq, bio);
}
static inline void rq_qos_merge(struct request_queue *q, struct request *rq,
struct bio *bio)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos) {
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos) {
bio_set_flag(bio, BIO_QOS_MERGED);
__rq_qos_merge(q->rq_qos, rq, bio);
}
@@ -173,7 +181,8 @@ static inline void rq_qos_merge(struct request_queue *q, struct request *rq,
static inline void rq_qos_queue_depth_changed(struct request_queue *q)
{
- if (static_branch_unlikely(&block_rq_qos) && q->rq_qos)
+ if (unlikely(test_bit(QUEUE_FLAG_QOS_ENABLED, &q->queue_flags)) &&
+ q->rq_qos)
__rq_qos_queue_depth_changed(q->rq_qos);
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 95886b404b16..fe1797bbec42 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -656,6 +656,7 @@ enum {
QUEUE_FLAG_SQ_SCHED, /* single queue style io dispatch */
QUEUE_FLAG_DISABLE_WBT_DEF, /* for sched to disable/enable wbt */
QUEUE_FLAG_NO_ELV_SWITCH, /* can't switch elevator any more */
+ QUEUE_FLAG_QOS_ENABLED, /* qos is enabled */
QUEUE_FLAG_MAX
};
--
2.50.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [PATCH 2/2] block: clear QUEUE_FLAG_QOS_ENABLED in rq_qos_del()
2025-08-04 12:21 [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
2025-08-04 12:21 ` [PATCH 1/2] block: avoid cpu_hotplug_lock dependency on freeze_lock Nilay Shroff
@ 2025-08-04 12:21 ` Nilay Shroff
2025-08-04 13:42 ` [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Ming Lei
2025-08-05 9:28 ` Yu Kuai
3 siblings, 0 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-08-04 12:21 UTC (permalink / raw)
To: linux-block; +Cc: axboe, kch, shinichiro.kawasaki, hch, ming.lei, gjoyce
When a QoS policy is removed via rq_qos_del(), and it happens to be the
last policy on the request queue, q->rq_qos becomes NULL. In this
case, the QUEUE_FLAG_QOS_ENABLED bit should also be cleared to reflect
that no QoS hooks remain active.
This patch ensures that the QUEUE_FLAG_QOS_ENABLED flag is cleared when
the queue no longer has any associated rq_qos policies. Failing to do so
would leave the flag set and cause needless checks of a now-NULL
q->rq_qos pointer in the I/O path.
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
---
block/blk-rq-qos.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 460c04715321..654478dfbc20 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -375,6 +375,8 @@ void rq_qos_del(struct rq_qos *rqos)
break;
}
}
+ if (!q->rq_qos)
+ blk_queue_flag_clear(QUEUE_FLAG_QOS_ENABLED, q);
blk_mq_unfreeze_queue(q, memflags);
mutex_lock(&q->debugfs_mutex);
--
2.50.1
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-04 12:21 [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
2025-08-04 12:21 ` [PATCH 1/2] block: avoid cpu_hotplug_lock dependency on freeze_lock Nilay Shroff
2025-08-04 12:21 ` [PATCH 2/2] block: clear QUEUE_FLAG_QOS_ENABLED in rq_qos_del() Nilay Shroff
@ 2025-08-04 13:42 ` Ming Lei
2025-08-05 4:58 ` Nilay Shroff
2025-08-05 9:28 ` Yu Kuai
3 siblings, 1 reply; 16+ messages in thread
From: Ming Lei @ 2025-08-04 13:42 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-block, axboe, kch, shinichiro.kawasaki, hch, gjoyce
On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
> This patchset replaces the use of a static key in the I/O path (rq_qos_
> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
> is made to eliminate a potential deadlock introduced by the use of static
> keys in the blk-rq-qos infrastructure, as reported by lockdep during
> blktests block/005[1].
>
> [...]
Another approach is to call memalloc_noio_save() in cpu hotplug code...
Thanks,
Ming
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-04 13:42 ` [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Ming Lei
@ 2025-08-05 4:58 ` Nilay Shroff
2025-08-05 12:44 ` Ming Lei
2025-08-06 1:28 ` Jens Axboe
0 siblings, 2 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-08-05 4:58 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block, axboe, kch, shinichiro.kawasaki, hch, gjoyce
On 8/4/25 7:12 PM, Ming Lei wrote:
> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>> This patchset replaces the use of a static key in the I/O path (rq_qos_
>> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
>> is made to eliminate a potential deadlock introduced by the use of static
>> keys in the blk-rq-qos infrastructure, as reported by lockdep during
>> blktests block/005[1].
>>
>> [...]
>
>
> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>
Yes, that would help fix this. However, per the general usage of the
GFP_NOIO scope in the kernel, it is used when performing memory allocations
in a context where I/O must not be initiated, because doing so could cause
deadlocks or recursion.
So we typically use GFP_NOIO in code paths that are already doing I/O, such as:
- block layer context: during request submission
- filesystem writeback, or swap-out
- memory reclaim or writeback triggered by memory pressure
The cpu hotplug code may not be running in any of the above contexts. So,
IMO, adding memalloc_noio_save() in the cpu hotplug code would not be a
good idea, would it?
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-04 12:21 [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
` (2 preceding siblings ...)
2025-08-04 13:42 ` [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Ming Lei
@ 2025-08-05 9:28 ` Yu Kuai
2025-08-05 12:14 ` Nilay Shroff
3 siblings, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-08-05 9:28 UTC (permalink / raw)
To: Nilay Shroff, linux-block
Cc: axboe, kch, shinichiro.kawasaki, hch, ming.lei, gjoyce,
yukuai (C)
Hi,
On 2025/08/04 20:21, Nilay Shroff wrote:
> This patchset replaces the use of a static key in the I/O path (rq_qos_
> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
> is made to eliminate a potential deadlock introduced by the use of static
> keys in the blk-rq-qos infrastructure, as reported by lockdep during
> blktests block/005[1].
>
> The original static key approach was introduced to avoid unnecessary
> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
> blk-iolatency) is configured. While efficient, enabling a static key at
> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
> becomes problematic if the queue is already frozen — causing a reverse
> dependency on ->freeze_lock. This results in a lockdep splat indicating
> a potential deadlock.
Take a look at the report, the static key is from:
elevator_change_done
wbt_init
And it looks like the queue is not frozen in this context; am I missing
something?
However, wbt_init() from queue_wb_lat_store() is indeed called under
blk_mq_freeze_queue(), so I understand the deadlock there; however, I'm
still confused about the report.
And for the deadlock: for blk-iocost and blk-iolatency,
rq_qos_add() is called from the cgroupfs path, where the queue is not
frozen. Would it be better to fix this in wbt, by calling rq_qos_add()
first and setting rwb->enable_state to WBT_STATE_OFF_DEFAULT in wbt_init(),
and later changing it to WBT_STATE_ON_DEFAULT while the queue is frozen?
Thanks,
Kuai
> [...]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-05 9:28 ` Yu Kuai
@ 2025-08-05 12:14 ` Nilay Shroff
0 siblings, 0 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-08-05 12:14 UTC (permalink / raw)
To: Yu Kuai, linux-block
Cc: axboe, kch, shinichiro.kawasaki, hch, ming.lei, gjoyce,
yukuai (C)
On 8/5/25 2:58 PM, Yu Kuai wrote:
> Hi,
>
> On 2025/08/04 20:21, Nilay Shroff wrote:
>> This patchset replaces the use of a static key in the I/O path (rq_qos_
>> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED).
>> [...]
>
> Take a look at the report, the static key is from:
>
> elevator_change_done
> wbt_init
>
> And looks like the queue is not frozen in this context, am I missing
> something?
We freeze the queue in rq_qos_add() before we increment the static key.
>
> However, wbt_init() from queue_wb_lat_store() is indeed under
> blk_mq_freeze_queue(), I understand the deadlock here, however, I'm
> still confused about the report.
>
If you follow the report, you should notice that thread#0 is blocked on
cpu_hotplug_lock after freezing the queue, or in other words after it
acquired ->freeze_lock (as mentioned above, we freeze the queue in
rq_qos_add() first and then increment the static key). Then thread#1
blocks on ->fs_reclaim (after it acquired cpu_hotplug_lock). And the last
thread, thread#3 in this report, waits for the queue to be unfrozen (the
queue has been frozen by thread#1). So this creates a cpu_hotplug_lock
dependency on ->freeze_lock. Hope this helps clarify your doubt.
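For readers following along, the chain above can be sketched like this (a rough summary of the dependency cycle, not the literal lockdep output):

```
thread#0: holds ->freeze_lock (queue frozen) -> waits on cpu_hotplug_lock
thread#1: holds cpu_hotplug_lock             -> waits on fs_reclaim
thread#3: in fs_reclaim (memory reclaim)     -> waits for the queue to be
                                                unfrozen (->freeze_lock)

->freeze_lock -> cpu_hotplug_lock -> fs_reclaim -> ->freeze_lock  (cycle)
```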
> And for the deadlock, looks like blk-iocost and blk-iolatency, that
> rq_qos_add is called from cgroupfs path, where queue is not freezed,
We have the following code paths (including blk-iocost) from which we
invoke rq_qos_xxx() APIs with the queue already frozen:
ioc_qos_write()
-> blkg_conf_open_bdev_frozen() => freezes queue
-> blk_iocost_init()
-> rq_qos_add() => again freezes queue
-> static_branch_inc() => acquires cpu_hotplug_lock
queue_wb_lat_store() => freezes queue
-> wbt_init()
-> rq_qos_add() => again freezes queue
-> static_branch_inc => acquires cpu_hotplug_lock
__del_gendisk() => freezes queue
-> rq_qos_exit()
-> static_branch_dec() => acquires cpu_hotplug_lock
ioc_qos_write()
-> blkg_conf_open_bdev_frozen() => freezes queue
-> blk_iocost_init()
-> rq_qos_del()
Ideally we should decrement the static key in rq_qos_del(), but that
was missed. So the second patch in the series handles this case, albeit
using atomic bitops.
> is it better to fix it from wbt, by calling rq_qos_add() first and
> set rwb->enable_state to WBT_STATE_OFF_DEFAULT in wbt_init(), later
> change this to WBT_STATE_ON_DEFAULT while queue is freezed.
>
Hmm, as shown above, other than wbt_init() we have multiple code paths
from which we call the rq_qos_xxx() APIs. So wbt is not the only path
we need to handle.
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-05 4:58 ` Nilay Shroff
@ 2025-08-05 12:44 ` Ming Lei
2025-08-05 17:05 ` Nilay Shroff
2025-08-06 1:28 ` Jens Axboe
1 sibling, 1 reply; 16+ messages in thread
From: Ming Lei @ 2025-08-05 12:44 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-block, axboe, kch, shinichiro.kawasaki, hch, gjoyce
On Tue, Aug 05, 2025 at 10:28:14AM +0530, Nilay Shroff wrote:
>
>
> On 8/4/25 7:12 PM, Ming Lei wrote:
> > On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
> >> This patchset replaces the use of a static key in the I/O path (rq_qos_
> >> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED).
> >> [...]
> >
> >
> > Another approach is to call memalloc_noio_save() in cpu hotplug code...
> >
> Yes that would help fix this. However per the general usage of GFP_NOIO scope in
> kernel, it is used when we're performing memory allocations in a context where I/O
> must not be initiated, because doing so could cause deadlocks or recursion.
>
> So we typically, use GFP_NOIO in a code path that is already doing I/O, such as:
> - In block layer context: during request submission
> - Filesystem writeback, or swap-out.
> - Memory reclaim or writeback triggered by memory pressure.
If you grep blk_mq_freeze_queue, you will see the above list is far from
enough, :-)
>
> The cpu hotplug code may not be running in any of the above context. So
> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
> a good idea, isn't it?
The reasoning (A -> B) looks correct, but the condition A is obviously not.
Thanks,
Ming
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-05 12:44 ` Ming Lei
@ 2025-08-05 17:05 ` Nilay Shroff
2025-08-06 7:21 ` Ming Lei
0 siblings, 1 reply; 16+ messages in thread
From: Nilay Shroff @ 2025-08-05 17:05 UTC (permalink / raw)
To: Ming Lei; +Cc: linux-block, axboe, kch, shinichiro.kawasaki, hch, gjoyce
On 8/5/25 6:14 PM, Ming Lei wrote:
> On Tue, Aug 05, 2025 at 10:28:14AM +0530, Nilay Shroff wrote:
>>
>>
>> On 8/4/25 7:12 PM, Ming Lei wrote:
>>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>>> [...]
>>>
>>>
>>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>>
>> Yes that would help fix this. However per the general usage of GFP_NOIO scope in
>> kernel, it is used when we're performing memory allocations in a context where I/O
>> must not be initiated, because doing so could cause deadlocks or recursion.
>>
>> So we typically, use GFP_NOIO in a code path that is already doing I/O, such as:
>> - In block layer context: during request submission
>> - Filesystem writeback, or swap-out.
>> - Memory reclaim or writeback triggered by memory pressure.
>
> If you grep blk_mq_freeze_queue, you will see the above list is far from
> enough, :-)
>
Yes, you're right :-) I didn't cover all the cases, only a subset.
>>
>> The cpu hotplug code may not be running in any of the above context. So
>> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
>> a good idea, isn't it?
>
> The reasoning(A -> B) looks correct, but the condition A is obviously not.
>
Regarding the use of memalloc_noio_save() in the CPU hotplug code:
notably, this issue isn't limited to the CPU hotplug subsystem itself.
In reality, cpu_hotplug_lock is widely used across various kernel
subsystems, not just in CPU hotplug-specific paths. Several code paths
outside the hotplug core acquire cpu_hotplug_lock and subsequently
perform memory allocations using GFP_KERNEL. You can observe this by
grepping for cpu_hotplug_lock throughout the kernel. This means that
adding memalloc_noio_save() solely within the CPU hotplug code wouldn't
address the broader problem.
I also experimented with placing memalloc_noio_save() in the CPU hotplug
path, and as expected, I still encountered a lockdep splat, indicating
that the root cause lies deeper in the general locking and allocation
order around cpu_hotplug_lock and memory reclaim. Please see the new
lockdep splat observed below (after adding memalloc_noio_save() to the
CPU hotplug code):
======================================================
WARNING: possible circular locking dependency detected
6.16.0+ #14 Not tainted
------------------------------------------------------
check/4628 is trying to acquire lock:
c0000000027b30c8 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_inc+0x24/0x50
but task is already holding lock:
c0000000cb825d28 (&q->q_usage_counter(io)#18){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (&q->q_usage_counter(io)#18){++++}-{0:0}:
__lock_acquire+0x6b4/0x103c
lock_acquire.part.0+0xd0/0x26c
blk_alloc_queue+0x3ac/0x3e8
blk_mq_alloc_queue+0x88/0x11c
__blk_mq_alloc_disk+0x34/0xd8
nvme_alloc_ns+0xdc/0x6ac [nvme_core]
nvme_scan_ns+0x234/0x2d4 [nvme_core]
async_run_entry_fn+0x60/0x1cc
process_one_work+0x2ac/0x7e4
worker_thread+0x238/0x460
kthread+0x158/0x188
start_kernel_thread+0x14/0x18
-> #2 (fs_reclaim){+.+.}-{0:0}:
__lock_acquire+0x6b4/0x103c
lock_acquire.part.0+0xd0/0x26c
fs_reclaim_acquire+0xe0/0x120
__kmalloc_cache_noprof+0x78/0x5d0
jump_label_add_module+0x1b0/0x528
jump_label_module_notify+0xb0/0x114
notifier_call_chain+0xac/0x248
blocking_notifier_call_chain_robust+0x88/0x134
load_module+0x938/0xba0
init_module_from_file+0xb4/0x108
idempotent_init_module+0x26c/0x358
sys_finit_module+0x98/0x140
system_call_exception+0x134/0x360
system_call_vectored_common+0x15c/0x2ec
-> #1 (jump_label_mutex){+.+.}-{4:4}:
__lock_acquire+0x6b4/0x103c
lock_acquire.part.0+0xd0/0x26c
__mutex_lock+0xf0/0xf60
jump_label_init+0x74/0x194
early_init_devtree+0x110/0x534
early_setup+0xc4/0x2a0
start_here_multiplatform+0x84/0xa0
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
check_prev_add+0x170/0x1248
validate_chain+0x7f0/0xba8
__lock_acquire+0x6b4/0x103c
lock_acquire.part.0+0xd0/0x26c
cpus_read_lock+0x6c/0x18c
static_key_slow_inc+0x24/0x50
rq_qos_add+0x108/0x1c0
wbt_init+0x17c/0x234
elevator_change_done+0x228/0x2ac
elv_iosched_store+0x144/0x1f0
queue_attr_store+0x12c/0x164
sysfs_kf_write+0x74/0xc4
kernfs_fop_write_iter+0x1a8/0x2a4
vfs_write+0x45c/0x65c
ksys_write+0x84/0x140
system_call_exception+0x134/0x360
system_call_vectored_common+0x15c/0x2ec
other info that might help us debug this:
Chain exists of:
cpu_hotplug_lock --> fs_reclaim --> &q->q_usage_counter(io)#18
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&q->q_usage_counter(io)#18);
lock(fs_reclaim);
lock(&q->q_usage_counter(io)#18);
rlock(cpu_hotplug_lock);
*** DEADLOCK ***
7 locks held by check/4628:
#0: c0000000b6a92418 (sb_writers#3){.+.+}-{0:0}, at: ksys_write+0x84/0x140
#1: c0000000b79f5488 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x164/0x2a4
#2: c000000009aef2b8 (kn->active#58){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x170/0x2a4
#3: c0000000cc512190 (&set->update_nr_hwq_lock){++++}-{4:4}, at: elv_iosched_store+0x124/0x1f0
#4: c0000000cb825f30 (&q->rq_qos_mutex){+.+.}-{4:4}, at: wbt_init+0x160/0x234
#5: c0000000cb825d28 (&q->q_usage_counter(io)#18){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
#6: c0000000cb825d60 (&q->q_usage_counter(queue)#15){+.+.}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
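[Editor's note: the splat reduces to a cycle in the lock-order graph. As a hedged illustration (not lockdep's actual validator), a toy cycle check over the chain lockdep printed shows why the final cpus_read_lock() is flagged.]

```python
# Toy lock-order cycle check. Edge (A, B) means "A was held while B was
# acquired"; lockdep reports a possible deadlock when such edges form a cycle.
# This is a simplified illustration, not lockdep's real algorithm.

def find_cycle(edges):
    """Return True if the directed dependency graph contains a cycle."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    WHITE, GRAY, BLACK = 0, 1, 2          # unvisited / on DFS stack / done
    color = {}
    for start in graph:
        if color.get(start, WHITE) != WHITE:
            continue
        color[start] = GRAY
        stack = [(start, iter(graph.get(start, [])))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                c = color.get(nxt, WHITE)
                if c == GRAY:             # back edge: dependency cycle
                    return True
                if c == WHITE:
                    color[nxt] = GRAY
                    stack.append((nxt, iter(graph.get(nxt, []))))
                    advanced = True
                    break
            if not advanced:
                color[node] = BLACK
                stack.pop()
    return False

# The chain from the splat, plus the new edge taken by static_key_slow_inc()
# while the queue is frozen:
edges = [
    ("cpu_hotplug_lock", "fs_reclaim"),
    ("fs_reclaim", "q_usage_counter(io)"),
    ("q_usage_counter(io)", "cpu_hotplug_lock"),
]
print(find_cycle(edges))  # True: the three edges close a loop
```

Dropping any one edge, for example by not taking cpu_hotplug_lock under a frozen queue, breaks the cycle, which is what replacing the static key achieves.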
Thanks,
--Nilay
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-05 4:58 ` Nilay Shroff
2025-08-05 12:44 ` Ming Lei
@ 2025-08-06 1:28 ` Jens Axboe
2025-08-06 1:44 ` Yu Kuai
2025-08-06 5:13 ` Nilay Shroff
1 sibling, 2 replies; 16+ messages in thread
From: Jens Axboe @ 2025-08-06 1:28 UTC (permalink / raw)
To: Nilay Shroff, Ming Lei; +Cc: linux-block, kch, shinichiro.kawasaki, hch, gjoyce
On 8/4/25 10:58 PM, Nilay Shroff wrote:
>
>
> On 8/4/25 7:12 PM, Ming Lei wrote:
>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>> [...]
>>
>>
>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>
> Yes that would help fix this. However per the general usage of GFP_NOIO scope in
> kernel, it is used when we're performing memory allocations in a context where I/O
> must not be initiated, because doing so could cause deadlocks or recursion.
>
> So we typically, use GFP_NOIO in a code path that is already doing I/O, such as:
> - In block layer context: during request submission
> - Filesystem writeback, or swap-out.
> - Memory reclaim or writeback triggered by memory pressure.
>
> The cpu hotplug code may not be running in any of the above context. So
> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
> a good idea, isn't it?
Please heed Ming's advice; moving this from a static key to an atomic
queue-flag op is pointless, and we may as well kill it at that point.
I see v2 is out now with the exact same approach.
--
Jens Axboe
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-06 1:28 ` Jens Axboe
@ 2025-08-06 1:44 ` Yu Kuai
2025-08-13 11:20 ` Nilay Shroff
2025-08-06 5:13 ` Nilay Shroff
1 sibling, 1 reply; 16+ messages in thread
From: Yu Kuai @ 2025-08-06 1:44 UTC (permalink / raw)
To: Jens Axboe, Nilay Shroff, Ming Lei
Cc: linux-block, kch, shinichiro.kawasaki, hch, gjoyce, yukuai (C)
Hi,
On 2025/08/06 9:28, Jens Axboe wrote:
> On 8/4/25 10:58 PM, Nilay Shroff wrote:
>>
>>
>> On 8/4/25 7:12 PM, Ming Lei wrote:
>>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>>> [...]
>>>
>>>
>>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>>
>> Yes that would help fix this. However per the general usage of GFP_NOIO scope in
>> kernel, it is used when we're performing memory allocations in a context where I/O
>> must not be initiated, because doing so could cause deadlocks or recursion.
>>
>> So we typically, use GFP_NOIO in a code path that is already doing I/O, such as:
>> - In block layer context: during request submission
>> - Filesystem writeback, or swap-out.
>> - Memory reclaim or writeback triggered by memory pressure.
>>
>> The cpu hotplug code may not be running in any of the above context. So
>> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
>> a good idea, isn't it?
>
> Please heed Ming's advice, moving this from a static key to an atomic
> queue flags ops is pointless, may as well kill it at that point.
Nilay already tested this and replied that it's a dead end :(
I don't quite understand why it's pointless; if rq_qos is never enabled,
an atomic queue_flag is still a minor optimization, isn't it?
Thanks,
Kuai
>
> I see v2 is out now with the exact same approach.
>
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-06 1:28 ` Jens Axboe
2025-08-06 1:44 ` Yu Kuai
@ 2025-08-06 5:13 ` Nilay Shroff
1 sibling, 0 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-08-06 5:13 UTC (permalink / raw)
To: Jens Axboe, Ming Lei; +Cc: linux-block, kch, shinichiro.kawasaki, hch, gjoyce
On 8/6/25 6:58 AM, Jens Axboe wrote:
> On 8/4/25 10:58 PM, Nilay Shroff wrote:
>>
>>
>> On 8/4/25 7:12 PM, Ming Lei wrote:
>>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>>> [...]
>>>
>>>
>>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>>
>> Yes that would help fix this. However per the general usage of GFP_NOIO scope in
>> kernel, it is used when we're performing memory allocations in a context where I/O
>> must not be initiated, because doing so could cause deadlocks or recursion.
>>
>> So we typically, use GFP_NOIO in a code path that is already doing I/O, such as:
>> - In block layer context: during request submission
>> - Filesystem writeback, or swap-out.
>> - Memory reclaim or writeback triggered by memory pressure.
>>
>> The cpu hotplug code may not be running in any of the above context. So
>> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
>> a good idea, isn't it?
>
> Please heed Ming's advice, moving this from a static key to an atomic
> queue flags ops is pointless, may as well kill it at that point.
>
Yes, I agree, and personally I like the static key very much, as it's lightweight.
I also liked the way you used it in the I/O hot path so that we avoid the cost
of fetching q->rq_qos when it's not needed.
Having said that, I also tried Ming's suggestion, but it didn't work out because
cpu_hotplug_lock is widely used across various kernel subsystems, not just in
CPU hotplug-specific paths. Several code paths outside the hotplug core acquire
cpu_hotplug_lock and subsequently perform memory allocations using GFP_KERNEL.
So switching the CPU hotplug code to GFP_NOIO may not help. You might have
missed my reply to Ming's suggestion; you can refer to it here:
https://lore.kernel.org/all/897eaaa4-31c7-4661-b5d4-3e2bef1fca1e@linux.ibm.com/#t
> I see v2 is out now with the exact same approach.
>
Yes, I sent out v2 just to fix minor things in the original patch, as
outlined in the v2 changelog.
Thanks,
--Nilay
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-05 17:05 ` Nilay Shroff
@ 2025-08-06 7:21 ` Ming Lei
0 siblings, 0 replies; 16+ messages in thread
From: Ming Lei @ 2025-08-06 7:21 UTC (permalink / raw)
To: Nilay Shroff; +Cc: linux-block, axboe, kch, shinichiro.kawasaki, hch, gjoyce
On Tue, Aug 05, 2025 at 10:35:38PM +0530, Nilay Shroff wrote:
>
>
> On 8/5/25 6:14 PM, Ming Lei wrote:
> > On Tue, Aug 05, 2025 at 10:28:14AM +0530, Nilay Shroff wrote:
> >>
> >>
> >> On 8/4/25 7:12 PM, Ming Lei wrote:
> >>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
> >>>> [...]
> >>>
> >>>
> >>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
> >>>
> >> Yes that would help fix this. However per the general usage of GFP_NOIO scope in
> >> kernel, it is used when we're performing memory allocations in a context where I/O
> >> must not be initiated, because doing so could cause deadlocks or recursion.
> >>
> >> So we typically, use GFP_NOIO in a code path that is already doing I/O, such as:
> >> - In block layer context: during request submission
> >> - Filesystem writeback, or swap-out.
> >> - Memory reclaim or writeback triggered by memory pressure.
> >
> > If you grep blk_mq_freeze_queue, you will see the above list is far from
> > enough, :-)
> >
> Yes you were correct:-) I didn't cover all cases but only a subset.
>
> >>
> >> The cpu hotplug code may not be running in any of the above context. So
> >> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
> >> a good idea, isn't it?
> >
> > The reasoning(A -> B) looks correct, but the condition A is obviously not.
> >
> Regarding the use of memalloc_noio_save() in CPU hotplug code:
> Notably this issue isn't limited to the CPU hotplug subsystem itself.
> In reality, the cpu_hotplug_lock is widely used across various kernel
> subsystems—not just in CPU hotplug-specific paths. There are several
> code paths outside of the hotplug core that acquire cpu_hotplug_lock
> and subsequently perform memory allocations using GFP_KERNEL.
>
> You can observe this by grepping for usages of cpu_hotplug_lock throughout
> the kernel. This means that adding memalloc_noio_save() solely within the
> CPU hotplug code wouldn't address the broader problem.
>
> I also experimented with placing memalloc_noio_save() in CPU hotplug path,
> and as expected, I still encountered a lockdep splat—indicating that the
> root cause lies deeper in the general locking and allocation order around
> cpu_hotplug_lock and memory reclaim behavior. Please see below the new
> lockdep splat observed (after adding memalloc_noio_save() in CPU hotplug
> code):
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.16.0+ #14 Not tainted
> ------------------------------------------------------
> check/4628 is trying to acquire lock:
> c0000000027b30c8 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_inc+0x24/0x50
>
> but task is already holding lock:
> c0000000cb825d28 (&q->q_usage_counter(io)#18){++++}-{0:0}, at: blk_mq_freeze_queue_nomemsave+0x28/0x40
>
> which lock already depends on the new lock.
>
Technically it can be addressed by adding memalloc_noio_save() to
cpus_read_lock(), but that is a tree-wide change, so it becomes harder
to move in that direction.
Now I no longer object to the approach in this patchset.
Thanks,
Ming
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-06 1:44 ` Yu Kuai
@ 2025-08-13 11:20 ` Nilay Shroff
2025-08-13 12:16 ` Jens Axboe
0 siblings, 1 reply; 16+ messages in thread
From: Nilay Shroff @ 2025-08-13 11:20 UTC (permalink / raw)
To: Yu Kuai, Jens Axboe, Ming Lei
Cc: linux-block, kch, shinichiro.kawasaki, hch, gjoyce, yukuai (C)
Hi Jens,
On 8/6/25 7:14 AM, Yu Kuai wrote:
> Hi,
>
> On 2025/08/06 9:28, Jens Axboe wrote:
>> On 8/4/25 10:58 PM, Nilay Shroff wrote:
>>>
>>>
>>> On 8/4/25 7:12 PM, Ming Lei wrote:
>>>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>>>> This patchset replaces the use of a static key in the I/O path (rq_qos_
>>>>> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
>>>>> is made to eliminate a potential deadlock introduced by the use of static
>>>>> keys in the blk-rq-qos infrastructure, as reported by lockdep during
>>>>> blktests block/005[1].
>>>>>
>>>>> The original static key approach was introduced to avoid unnecessary
>>>>> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
>>>>> blk-iolatency) is configured. While efficient, enabling a static key at
>>>>> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
>>>>> becomes problematic if the queue is already frozen — causing a reverse
>>>>> dependency on ->freeze_lock. This results in a lockdep splat indicating
>>>>> a potential deadlock.
>>>>>
>>>>> To resolve this, we now gate q->rq_qos access with a q->queue_flags
>>>>> bitop (QUEUE_FLAG_QOS_ENABLED), avoiding the static key and the associated
>>>>> locking altogether.
>>>>>
>>>>> I compared both static key and atomic bitop implementations using ftrace
>>>>> function graph tracer over ~50 invocations of rq_qos_issue() while ensuring
>>>>> blk-wbt/blk-iolatency were disabled (i.e., no QoS functionality). For
>>>>> easy comparision, I made rq_qos_issue() noinline. The comparision was
>>>>> made on PowerPC machine.
>>>>>
>>>>> Static Key (disabled : QoS is not configured):
>>>>> 5d0: 00 00 00 60 nop # patched in by static key framework (not taken)
>>>>> 5d4: 20 00 80 4e blr # return (branch to link register)
>>>>>
>>>>> Only a nop and blr (branch to link register) are executed — very lightweight.
>>>>>
>>>>> atomic bitop (QoS is not configured):
>>>>> 5d0: 20 00 23 e9 ld r9,32(r3) # load q->queue_flags
>>>>> 5d4: 00 80 29 71 andi. r9,r9,32768 # check QUEUE_FLAG_QOS_ENABLED (bit 15)
>>>>> 5d8: 20 00 82 4d beqlr # return if bit not set
>>>>>
>>>>> This performs an ld and and andi. before returning. Slightly more work,
>>>>> but q->queue_flags is typically hot in cache during I/O submission.
>>>>>
>>>>> With Static Key (disabled):
>>>>> Duration (us): min=0.668 max=0.816 avg≈0.750
>>>>>
>>>>> With atomic bitop QUEUE_FLAG_QOS_ENABLED (bit not set):
>>>>> Duration (us): min=0.684 max=0.834 avg≈0.759
>>>>>
>>>>> As expected, both versions are almost similar in cost. The added latency
>>>>> from an extra ld and andi. is in the range of ~9ns.
>>>>>
>>>>> There are two patches in the series. The first patch replaces the static key
>>>>> with QUEUE_FLAG_QOS_ENABLED. The second patch ensures that we disable
>>>>> the QUEUE_FLAG_QOS_ENABLED when the queue no longer has any associated
>>>>> rq_qos policies.
>>>>>
>>>>> As usual, feedback and review comments are welcome!
>>>>>
>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>
>>>>
>>>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>>>
>>> Yes, that would help fix this. However, per the general usage of the GFP_NOIO
>>> scope in the kernel, it is used for memory allocations in a context where I/O
>>> must not be initiated, because doing so could cause deadlocks or recursion.
>>>
>>> So we typically use GFP_NOIO in a code path that is already doing I/O, such as:
>>> - In block layer context: during request submission
>>> - Filesystem writeback, or swap-out.
>>> - Memory reclaim or writeback triggered by memory pressure.
>>>
>>> The cpu hotplug code may not be running in any of the above contexts. So,
>>> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
>>> a good idea, would it?
>>
>> Please heed Ming's advice, moving this from a static key to an atomic
>> queue flags ops is pointless, may as well kill it at that point.
>
> Nilay already tested and replied this is a dead end :(
>
> I don't quite understand why it's pointless, if rq_qos is never enabled,
> an atomic queue_flag is still a minor optimization, isn't it?
>
>>
>> I see v2 is out now with the exact same approach.
>>
As mentioned earlier, I tried Ming's original recommendation, but it didn’t
resolve the issue. In a separate thread, Ming agreed that using an atomic queue
flag is a reasonable approach and would avoid the lockdep problem while still
keeping a minor fast-path optimization.
That leaves us with two options:
- Use an atomic queue flag, or
- Remove the static key entirely.
So before I send v3, do you prefer the atomic queue flag approach, or would you rather
see the static key removed altogether? My preference is for the atomic queue flag,
as it maintains a lightweight check without the static key’s locking concerns.
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-13 11:20 ` Nilay Shroff
@ 2025-08-13 12:16 ` Jens Axboe
2025-08-13 15:01 ` Nilay Shroff
0 siblings, 1 reply; 16+ messages in thread
From: Jens Axboe @ 2025-08-13 12:16 UTC (permalink / raw)
To: Nilay Shroff, Yu Kuai, Ming Lei
Cc: linux-block, kch, shinichiro.kawasaki, hch, gjoyce, yukuai (C)
On 8/13/25 5:20 AM, Nilay Shroff wrote:
> Hi Jens,
>
> On 8/6/25 7:14 AM, Yu Kuai wrote:
>> Hi,
>>
>> On 2025/08/06 9:28, Jens Axboe wrote:
>>> On 8/4/25 10:58 PM, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 8/4/25 7:12 PM, Ming Lei wrote:
>>>>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>>>>> This patchset replaces the use of a static key in the I/O path (rq_qos_
>>>>>> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
>>>>>> is made to eliminate a potential deadlock introduced by the use of static
>>>>>> keys in the blk-rq-qos infrastructure, as reported by lockdep during
>>>>>> blktests block/005[1].
>>>>>>
>>>>>> The original static key approach was introduced to avoid unnecessary
>>>>>> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
>>>>>> blk-iolatency) is configured. While efficient, enabling a static key at
>>>>>> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
>>>>>> becomes problematic if the queue is already frozen — causing a reverse
>>>>>> dependency on ->freeze_lock. This results in a lockdep splat indicating
>>>>>> a potential deadlock.
>>>>>>
>>>>>> To resolve this, we now gate q->rq_qos access with a q->queue_flags
>>>>>> bitop (QUEUE_FLAG_QOS_ENABLED), avoiding the static key and the associated
>>>>>> locking altogether.
>>>>>>
>>>>>> I compared both the static key and atomic bitop implementations using the
>>>>>> ftrace function graph tracer over ~50 invocations of rq_qos_issue(), while
>>>>>> ensuring blk-wbt/blk-iolatency were disabled (i.e., no QoS functionality). For
>>>>>> an easy comparison, I made rq_qos_issue() noinline. The comparison was
>>>>>> made on a PowerPC machine.
>>>>>>
>>>>>> Static Key (disabled : QoS is not configured):
>>>>>> 5d0: 00 00 00 60 nop # patched in by static key framework (not taken)
>>>>>> 5d4: 20 00 80 4e blr # return (branch to link register)
>>>>>>
>>>>>> Only a nop and blr (branch to link register) are executed — very lightweight.
>>>>>>
>>>>>> atomic bitop (QoS is not configured):
>>>>>> 5d0: 20 00 23 e9 ld r9,32(r3) # load q->queue_flags
>>>>>> 5d4: 00 80 29 71 andi. r9,r9,32768 # check QUEUE_FLAG_QOS_ENABLED (bit 15)
>>>>>> 5d8: 20 00 82 4d beqlr # return if bit not set
>>>>>>
>>>>>> This performs an ld and an andi. before returning. Slightly more work,
>>>>>> but q->queue_flags is typically hot in cache during I/O submission.
>>>>>>
>>>>>> With Static Key (disabled):
>>>>>> Duration (us): min=0.668 max=0.816 avg≈0.750
>>>>>>
>>>>>> With atomic bitop QUEUE_FLAG_QOS_ENABLED (bit not set):
>>>>>> Duration (us): min=0.684 max=0.834 avg≈0.759
>>>>>>
>>>>>> As expected, both versions are almost similar in cost. The added latency
>>>>>> from an extra ld and andi. is in the range of ~9ns.
>>>>>>
>>>>>> There are two patches in the series. The first patch replaces the static key
>>>>>> with QUEUE_FLAG_QOS_ENABLED. The second patch ensures that we disable
>>>>>> the QUEUE_FLAG_QOS_ENABLED when the queue no longer has any associated
>>>>>> rq_qos policies.
>>>>>>
>>>>>> As usual, feedback and review comments are welcome!
>>>>>>
>>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>
>>>>>
>>>>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>>>>
>>>> Yes, that would help fix this. However, per the general usage of the GFP_NOIO
>>>> scope in the kernel, it is used for memory allocations in a context where I/O
>>>> must not be initiated, because doing so could cause deadlocks or recursion.
>>>>
>>>> So we typically use GFP_NOIO in a code path that is already doing I/O, such as:
>>>> - In block layer context: during request submission
>>>> - Filesystem writeback, or swap-out.
>>>> - Memory reclaim or writeback triggered by memory pressure.
>>>>
>>>> The cpu hotplug code may not be running in any of the above contexts. So,
>>>> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
>>>> a good idea, would it?
>>>
>>> Please heed Ming's advice, moving this from a static key to an atomic
>>> queue flags ops is pointless, may as well kill it at that point.
>>
>> Nilay already tested and replied this is a dead end :(
>>
>> I don't quite understand why it's pointless, if rq_qos is never enabled,
>> an atomic queue_flag is still a minor optimization, isn't it?
>>
>>>
>>> I see v2 is out now with the exact same approach.
>>>
> As mentioned earlier, I tried Ming's original recommendation, but it didn't
> resolve the issue. In a separate thread, Ming agreed that using an atomic queue
> flag is a reasonable approach and would avoid the lockdep problem while still
> keeping a minor fast-path optimization.
>
> That leaves us with two options:
> - Use an atomic queue flag, or
> - Remove the static key entirely.
>
> So before I send v3, do you prefer the atomic queue flag approach, or
> would you rather see the static key removed altogether? My preference
> is for the atomic queue flag, as it maintains a lightweight check
> without the static key's locking concerns.
Atomic test is still going to be better than pointless calls into
rq-qos, so that's still a win. Hence retaining it is better than simply
killing it off entirely.
I wonder if it makes sense to combine with IS_ENABLED() as well. Though
with how distros enable everything under the sun, probably not going to
be that useful.
--
Jens Axboe
* Re: [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop
2025-08-13 12:16 ` Jens Axboe
@ 2025-08-13 15:01 ` Nilay Shroff
0 siblings, 0 replies; 16+ messages in thread
From: Nilay Shroff @ 2025-08-13 15:01 UTC (permalink / raw)
To: Jens Axboe, Yu Kuai, Ming Lei
Cc: linux-block, kch, shinichiro.kawasaki, hch, gjoyce, yukuai (C)
On 8/13/25 5:46 PM, Jens Axboe wrote:
> On 8/13/25 5:20 AM, Nilay Shroff wrote:
>> Hi Jens,
>>
>> On 8/6/25 7:14 AM, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2025/08/06 9:28, Jens Axboe wrote:
>>>> On 8/4/25 10:58 PM, Nilay Shroff wrote:
>>>>>
>>>>>
>>>>> On 8/4/25 7:12 PM, Ming Lei wrote:
>>>>>> On Mon, Aug 04, 2025 at 05:51:09PM +0530, Nilay Shroff wrote:
>>>>>>> This patchset replaces the use of a static key in the I/O path (rq_qos_
>>>>>>> xxx()) with an atomic queue flag (QUEUE_FLAG_QOS_ENABLED). This change
>>>>>>> is made to eliminate a potential deadlock introduced by the use of static
>>>>>>> keys in the blk-rq-qos infrastructure, as reported by lockdep during
>>>>>>> blktests block/005[1].
>>>>>>>
>>>>>>> The original static key approach was introduced to avoid unnecessary
>>>>>>> dereferencing of q->rq_qos when no blk-rq-qos module (e.g., blk-wbt or
>>>>>>> blk-iolatency) is configured. While efficient, enabling a static key at
>>>>>>> runtime requires taking cpu_hotplug_lock and jump_label_mutex, which
>>>>>>> becomes problematic if the queue is already frozen — causing a reverse
>>>>>>> dependency on ->freeze_lock. This results in a lockdep splat indicating
>>>>>>> a potential deadlock.
>>>>>>>
>>>>>>> To resolve this, we now gate q->rq_qos access with a q->queue_flags
>>>>>>> bitop (QUEUE_FLAG_QOS_ENABLED), avoiding the static key and the associated
>>>>>>> locking altogether.
>>>>>>>
>>>>>>> I compared both the static key and atomic bitop implementations using the
>>>>>>> ftrace function graph tracer over ~50 invocations of rq_qos_issue(), while
>>>>>>> ensuring blk-wbt/blk-iolatency were disabled (i.e., no QoS functionality). For
>>>>>>> an easy comparison, I made rq_qos_issue() noinline. The comparison was
>>>>>>> made on a PowerPC machine.
>>>>>>>
>>>>>>> Static Key (disabled : QoS is not configured):
>>>>>>> 5d0: 00 00 00 60 nop # patched in by static key framework (not taken)
>>>>>>> 5d4: 20 00 80 4e blr # return (branch to link register)
>>>>>>>
>>>>>>> Only a nop and blr (branch to link register) are executed — very lightweight.
>>>>>>>
>>>>>>> atomic bitop (QoS is not configured):
>>>>>>> 5d0: 20 00 23 e9 ld r9,32(r3) # load q->queue_flags
>>>>>>> 5d4: 00 80 29 71 andi. r9,r9,32768 # check QUEUE_FLAG_QOS_ENABLED (bit 15)
>>>>>>> 5d8: 20 00 82 4d beqlr # return if bit not set
>>>>>>>
>>>>>>> This performs an ld and an andi. before returning. Slightly more work,
>>>>>>> but q->queue_flags is typically hot in cache during I/O submission.
>>>>>>>
>>>>>>> With Static Key (disabled):
>>>>>>> Duration (us): min=0.668 max=0.816 avg≈0.750
>>>>>>>
>>>>>>> With atomic bitop QUEUE_FLAG_QOS_ENABLED (bit not set):
>>>>>>> Duration (us): min=0.684 max=0.834 avg≈0.759
>>>>>>>
>>>>>>> As expected, both versions are almost similar in cost. The added latency
>>>>>>> from an extra ld and andi. is in the range of ~9ns.
>>>>>>>
>>>>>>> There are two patches in the series. The first patch replaces the static key
>>>>>>> with QUEUE_FLAG_QOS_ENABLED. The second patch ensures that we disable
>>>>>>> the QUEUE_FLAG_QOS_ENABLED when the queue no longer has any associated
>>>>>>> rq_qos policies.
>>>>>>>
>>>>>>> As usual, feedback and review comments are welcome!
>>>>>>>
>>>>>>> [1] https://lore.kernel.org/linux-block/4fdm37so3o4xricdgfosgmohn63aa7wj3ua4e5vpihoamwg3ui@fq42f5q5t5ic/
>>>>>>
>>>>>>
>>>>>> Another approach is to call memalloc_noio_save() in cpu hotplug code...
>>>>>>
>>>>> Yes, that would help fix this. However, per the general usage of the GFP_NOIO
>>>>> scope in the kernel, it is used for memory allocations in a context where I/O
>>>>> must not be initiated, because doing so could cause deadlocks or recursion.
>>>>>
>>>>> So we typically use GFP_NOIO in a code path that is already doing I/O, such as:
>>>>> - In block layer context: during request submission
>>>>> - Filesystem writeback, or swap-out.
>>>>> - Memory reclaim or writeback triggered by memory pressure.
>>>>>
>>>>> The cpu hotplug code may not be running in any of the above contexts. So,
>>>>> IMO, adding memalloc_noio_save() in the cpu hotplug code would not be
>>>>> a good idea, would it?
>>>>
>>>> Please heed Ming's advice, moving this from a static key to an atomic
>>>> queue flags ops is pointless, may as well kill it at that point.
>>>
>>> Nilay already tested and replied this is a dead end :(
>>>
>>> I don't quite understand why it's pointless, if rq_qos is never enabled,
>>> an atomic queue_flag is still a minor optimization, isn't it?
>>>
>>>>
>>>> I see v2 is out now with the exact same approach.
>>>>
>> As mentioned earlier, I tried Ming's original recommendation, but it didn't
>> resolve the issue. In a separate thread, Ming agreed that using an atomic queue
>> flag is a reasonable approach and would avoid the lockdep problem while still
>> keeping a minor fast-path optimization.
>>
>> That leaves us with two options:
>> - Use an atomic queue flag, or
>> - Remove the static key entirely.
>>
>> So before I send v3, do you prefer the atomic queue flag approach, or
>> would you rather see the static key removed altogether? My preference
>> is for the atomic queue flag, as it maintains a lightweight check
>> without the static key's locking concerns.
>
> Atomic test is still going to be better than pointless calls into
> rq-qos, so that's still a win. Hence retaining it is better than simply
> killing it off entirely.
>
> I wonder if it makes sense to combine with IS_ENABLED() as well. Though
> with how distros enable everything under the sun, probably not going to
> be that useful.
>
Yes, agreed: in my RHEL distro, CONFIG_BLK_WBT, CONFIG_BLK_CGROUP_IOCOST,
and CONFIG_BLK_CGROUP_IOLATENCY are all enabled by default, so IS_ENABLED()
may not be that helpful. I'll send out v3 with some minor changes (per
review comments) using the atomic queue flag now.
Thanks,
--Nilay
end of thread, other threads:[~2025-08-13 15:07 UTC | newest]
Thread overview: 16+ messages
2025-08-04 12:21 [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Nilay Shroff
2025-08-04 12:21 ` [PATCH 1/2] block: avoid cpu_hotplug_lock depedency on freeze_lock Nilay Shroff
2025-08-04 12:21 ` [PATCH 2/2] block: clear QUEUE_FLAG_QOS_ENABLED in rq_qos_del() Nilay Shroff
2025-08-04 13:42 ` [PATCH 0/2] block: blk-rq-qos: replace static key with atomic bitop Ming Lei
2025-08-05 4:58 ` Nilay Shroff
2025-08-05 12:44 ` Ming Lei
2025-08-05 17:05 ` Nilay Shroff
2025-08-06 7:21 ` Ming Lei
2025-08-06 1:28 ` Jens Axboe
2025-08-06 1:44 ` Yu Kuai
2025-08-13 11:20 ` Nilay Shroff
2025-08-13 12:16 ` Jens Axboe
2025-08-13 15:01 ` Nilay Shroff
2025-08-06 5:13 ` Nilay Shroff
2025-08-05 9:28 ` Yu Kuai
2025-08-05 12:14 ` Nilay Shroff