* [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-16 4:10 [PATCH RESEND 0/5] block/blk-rq-qos: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
@ 2025-11-16 4:10 ` Yu Kuai
2025-11-17 10:10 ` Nilay Shroff
` (2 more replies)
2025-11-16 4:10 ` [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
` (3 subsequent siblings)
4 siblings, 3 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-16 4:10 UTC (permalink / raw)
To: axboe, linux-block, linux-kernel, tj, nilay, ming.lei; +Cc: yukuai
queue should not be freezed under rq_qos_mutex, see example index
commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
parameters"), which means current implementation of rq_qos_add() is
problematic. Add a new helper and prepare to fix this problem in
following patches.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-rq-qos.c | 27 +++++++++++++++++++++++++++
block/blk-rq-qos.h | 2 ++
2 files changed, 29 insertions(+)
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 654478dfbc20..353397d7e126 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -322,6 +322,33 @@ void rq_qos_exit(struct request_queue *q)
mutex_unlock(&q->rq_qos_mutex);
}
+int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
+ enum rq_qos_id id, const struct rq_qos_ops *ops)
+{
+ struct request_queue *q = disk->queue;
+
+ WARN_ON_ONCE(q->mq_freeze_depth == 0);
+ lockdep_assert_held(&q->rq_qos_mutex);
+
+ if (rq_qos_id(q, id))
+ return -EBUSY;
+
+ rqos->disk = disk;
+ rqos->id = id;
+ rqos->ops = ops;
+ rqos->next = q->rq_qos;
+ q->rq_qos = rqos;
+ blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
+
+ if (rqos->ops->debugfs_attrs) {
+ mutex_lock(&q->debugfs_mutex);
+ blk_mq_debugfs_register_rqos(rqos);
+ mutex_unlock(&q->debugfs_mutex);
+ }
+
+ return 0;
+}
+
int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
const struct rq_qos_ops *ops)
{
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index b538f2c0febc..4a7fec01600b 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -87,6 +87,8 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
const struct rq_qos_ops *ops);
+int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
+ enum rq_qos_id id, const struct rq_qos_ops *ops);
void rq_qos_del(struct rq_qos *rqos);
typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-16 4:10 ` [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed() Yu Kuai
@ 2025-11-17 10:10 ` Nilay Shroff
2025-11-17 11:01 ` Ming Lei
2025-11-17 23:32 ` Bart Van Assche
2 siblings, 0 replies; 19+ messages in thread
From: Nilay Shroff @ 2025-11-17 10:10 UTC (permalink / raw)
To: Yu Kuai, axboe, linux-block, linux-kernel, tj, ming.lei
On 11/16/25 9:40 AM, Yu Kuai wrote:
> queue should not be freezed under rq_qos_mutex, see example index
> commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
> parameters"), which means current implementation of rq_qos_add() is
> problematic. Add a new helper and prepare to fix this problem in
> following patches.
>
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-16 4:10 ` [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed() Yu Kuai
2025-11-17 10:10 ` Nilay Shroff
@ 2025-11-17 11:01 ` Ming Lei
2025-11-17 11:13 ` Nilay Shroff
2025-11-17 23:32 ` Bart Van Assche
2 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-11-17 11:01 UTC (permalink / raw)
To: Yu Kuai; +Cc: axboe, linux-block, linux-kernel, tj, nilay
On Sun, Nov 16, 2025 at 12:10:20PM +0800, Yu Kuai wrote:
> queue should not be freezed under rq_qos_mutex, see example index
> commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
> parameters"), which means current implementation of rq_qos_add() is
> problematic. Add a new helper and prepare to fix this problem in
> following patches.
>
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> ---
> block/blk-rq-qos.c | 27 +++++++++++++++++++++++++++
> block/blk-rq-qos.h | 2 ++
> 2 files changed, 29 insertions(+)
>
> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> index 654478dfbc20..353397d7e126 100644
> --- a/block/blk-rq-qos.c
> +++ b/block/blk-rq-qos.c
> @@ -322,6 +322,33 @@ void rq_qos_exit(struct request_queue *q)
> mutex_unlock(&q->rq_qos_mutex);
> }
>
> +int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
> + enum rq_qos_id id, const struct rq_qos_ops *ops)
> +{
> + struct request_queue *q = disk->queue;
> +
> + WARN_ON_ONCE(q->mq_freeze_depth == 0);
> + lockdep_assert_held(&q->rq_qos_mutex);
> +
> + if (rq_qos_id(q, id))
> + return -EBUSY;
> +
> + rqos->disk = disk;
> + rqos->id = id;
> + rqos->ops = ops;
> + rqos->next = q->rq_qos;
> + q->rq_qos = rqos;
> + blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> +
> + if (rqos->ops->debugfs_attrs) {
> + mutex_lock(&q->debugfs_mutex);
> + blk_mq_debugfs_register_rqos(rqos);
> + mutex_unlock(&q->debugfs_mutex);
> + }
It will cause more lockdep splat to let q->debugfs_mutex depend on queue freeze,
Also blk_mq_debugfs_register_rqos() does _not_ require queue to be frozen,
and it should be fine to move blk_mq_debugfs_register_rqos() out of queue
freeze.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-17 11:01 ` Ming Lei
@ 2025-11-17 11:13 ` Nilay Shroff
2025-11-17 11:30 ` Ming Lei
0 siblings, 1 reply; 19+ messages in thread
From: Nilay Shroff @ 2025-11-17 11:13 UTC (permalink / raw)
To: Ming Lei, Yu Kuai; +Cc: axboe, linux-block, linux-kernel, tj
On 11/17/25 4:31 PM, Ming Lei wrote:
> On Sun, Nov 16, 2025 at 12:10:20PM +0800, Yu Kuai wrote:
>> queue should not be freezed under rq_qos_mutex, see example index
>> commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
>> parameters"), which means current implementation of rq_qos_add() is
>> problematic. Add a new helper and prepare to fix this problem in
>> following patches.
>>
>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>> ---
>> block/blk-rq-qos.c | 27 +++++++++++++++++++++++++++
>> block/blk-rq-qos.h | 2 ++
>> 2 files changed, 29 insertions(+)
>>
>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>> index 654478dfbc20..353397d7e126 100644
>> --- a/block/blk-rq-qos.c
>> +++ b/block/blk-rq-qos.c
>> @@ -322,6 +322,33 @@ void rq_qos_exit(struct request_queue *q)
>> mutex_unlock(&q->rq_qos_mutex);
>> }
>>
>> +int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
>> + enum rq_qos_id id, const struct rq_qos_ops *ops)
>> +{
>> + struct request_queue *q = disk->queue;
>> +
>> + WARN_ON_ONCE(q->mq_freeze_depth == 0);
>> + lockdep_assert_held(&q->rq_qos_mutex);
>> +
>> + if (rq_qos_id(q, id))
>> + return -EBUSY;
>> +
>> + rqos->disk = disk;
>> + rqos->id = id;
>> + rqos->ops = ops;
>> + rqos->next = q->rq_qos;
>> + q->rq_qos = rqos;
>> + blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>> +
>> + if (rqos->ops->debugfs_attrs) {
>> + mutex_lock(&q->debugfs_mutex);
>> + blk_mq_debugfs_register_rqos(rqos);
>> + mutex_unlock(&q->debugfs_mutex);
>> + }
>
> It will cause more lockdep splat to let q->debugfs_mutex depend on queue freeze,
>
I think we already have that ->debugfs_mutex dependency on ->freeze_lock.
for instance,
ioc_qos_write => freeze-queue
blk_iocost_init
rq_qos_add
and also,
queue_wb_lat_store => freeze-queue
wbt_init
rq_qos_add
> Also blk_mq_debugfs_register_rqos() does _not_ require queue to be frozen,
> and it should be fine to move blk_mq_debugfs_register_rqos() out of queue
> freeze.
>
Yes correct, but I thought this pacthset is meant only to address incorrect
locking order between ->rq_qos_mutex and ->freeze_lock. So do you suggest
also refactoring code to avoid ->debugfs_mutex dependency on ->freeze_lock?
If yes then shouldn't that be handled in a separate patchset?
Thanks,
--Nilay
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-17 11:13 ` Nilay Shroff
@ 2025-11-17 11:30 ` Ming Lei
2025-11-17 11:39 ` Yu Kuai
0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-11-17 11:30 UTC (permalink / raw)
To: Nilay Shroff; +Cc: Yu Kuai, axboe, linux-block, linux-kernel, tj
On Mon, Nov 17, 2025 at 04:43:11PM +0530, Nilay Shroff wrote:
>
>
> On 11/17/25 4:31 PM, Ming Lei wrote:
> > On Sun, Nov 16, 2025 at 12:10:20PM +0800, Yu Kuai wrote:
> >> queue should not be freezed under rq_qos_mutex, see example index
> >> commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
> >> parameters"), which means current implementation of rq_qos_add() is
> >> problematic. Add a new helper and prepare to fix this problem in
> >> following patches.
> >>
> >> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> >> ---
> >> block/blk-rq-qos.c | 27 +++++++++++++++++++++++++++
> >> block/blk-rq-qos.h | 2 ++
> >> 2 files changed, 29 insertions(+)
> >>
> >> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> >> index 654478dfbc20..353397d7e126 100644
> >> --- a/block/blk-rq-qos.c
> >> +++ b/block/blk-rq-qos.c
> >> @@ -322,6 +322,33 @@ void rq_qos_exit(struct request_queue *q)
> >> mutex_unlock(&q->rq_qos_mutex);
> >> }
> >>
> >> +int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
> >> + enum rq_qos_id id, const struct rq_qos_ops *ops)
> >> +{
> >> + struct request_queue *q = disk->queue;
> >> +
> >> + WARN_ON_ONCE(q->mq_freeze_depth == 0);
> >> + lockdep_assert_held(&q->rq_qos_mutex);
> >> +
> >> + if (rq_qos_id(q, id))
> >> + return -EBUSY;
> >> +
> >> + rqos->disk = disk;
> >> + rqos->id = id;
> >> + rqos->ops = ops;
> >> + rqos->next = q->rq_qos;
> >> + q->rq_qos = rqos;
> >> + blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> >> +
> >> + if (rqos->ops->debugfs_attrs) {
> >> + mutex_lock(&q->debugfs_mutex);
> >> + blk_mq_debugfs_register_rqos(rqos);
> >> + mutex_unlock(&q->debugfs_mutex);
> >> + }
> >
> > It will cause more lockdep splat to let q->debugfs_mutex depend on queue freeze,
> >
> I think we already have that ->debugfs_mutex dependency on ->freeze_lock.
> for instance,
> ioc_qos_write => freeze-queue
> blk_iocost_init
> rq_qos_add
Why is queue freeze needed in above code path?
Also blk_iolatency_init()/rq_qos_add() doesn't freeze queue.
>
> and also,
> queue_wb_lat_store => freeze-queue
> wbt_init
> rq_qos_add
Not all wbt_enable_default()/wbt_init() is called with queue frozen, but Kuai's
patchset changes all to freeze queue before registering debugfs entry, people will
complain new warning.
>
> > Also blk_mq_debugfs_register_rqos() does _not_ require queue to be frozen,
> > and it should be fine to move blk_mq_debugfs_register_rqos() out of queue
> > freeze.
> >
> Yes correct, but I thought this pacthset is meant only to address incorrect
> locking order between ->rq_qos_mutex and ->freeze_lock. So do you suggest
> also refactoring code to avoid ->debugfs_mutex dependency on ->freeze_lock?
> If yes then shouldn't that be handled in a separate patchset?
It is fine to fix in that way, but at least regression shouldn't be caused.
More importantly we shouldn't add new unnecessary dependency on queue freeze.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-17 11:30 ` Ming Lei
@ 2025-11-17 11:39 ` Yu Kuai
2025-11-17 11:54 ` Ming Lei
0 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-11-17 11:39 UTC (permalink / raw)
To: Ming Lei, Nilay Shroff; +Cc: axboe, linux-block, linux-kernel, tj, Yu Kuai
Hi,
在 2025/11/17 19:30, Ming Lei 写道:
> On Mon, Nov 17, 2025 at 04:43:11PM +0530, Nilay Shroff wrote:
>>
>> On 11/17/25 4:31 PM, Ming Lei wrote:
>>> On Sun, Nov 16, 2025 at 12:10:20PM +0800, Yu Kuai wrote:
>>>> queue should not be freezed under rq_qos_mutex, see example index
>>>> commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
>>>> parameters"), which means current implementation of rq_qos_add() is
>>>> problematic. Add a new helper and prepare to fix this problem in
>>>> following patches.
>>>>
>>>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>>>> ---
>>>> block/blk-rq-qos.c | 27 +++++++++++++++++++++++++++
>>>> block/blk-rq-qos.h | 2 ++
>>>> 2 files changed, 29 insertions(+)
>>>>
>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>> index 654478dfbc20..353397d7e126 100644
>>>> --- a/block/blk-rq-qos.c
>>>> +++ b/block/blk-rq-qos.c
>>>> @@ -322,6 +322,33 @@ void rq_qos_exit(struct request_queue *q)
>>>> mutex_unlock(&q->rq_qos_mutex);
>>>> }
>>>>
>>>> +int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
>>>> + enum rq_qos_id id, const struct rq_qos_ops *ops)
>>>> +{
>>>> + struct request_queue *q = disk->queue;
>>>> +
>>>> + WARN_ON_ONCE(q->mq_freeze_depth == 0);
>>>> + lockdep_assert_held(&q->rq_qos_mutex);
>>>> +
>>>> + if (rq_qos_id(q, id))
>>>> + return -EBUSY;
>>>> +
>>>> + rqos->disk = disk;
>>>> + rqos->id = id;
>>>> + rqos->ops = ops;
>>>> + rqos->next = q->rq_qos;
>>>> + q->rq_qos = rqos;
>>>> + blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>> +
>>>> + if (rqos->ops->debugfs_attrs) {
>>>> + mutex_lock(&q->debugfs_mutex);
>>>> + blk_mq_debugfs_register_rqos(rqos);
>>>> + mutex_unlock(&q->debugfs_mutex);
>>>> + }
>>> It will cause more lockdep splat to let q->debugfs_mutex depend on queue freeze,
>>>
>> I think we already have that ->debugfs_mutex dependency on ->freeze_lock.
>> for instance,
>> ioc_qos_write => freeze-queue
>> blk_iocost_init
>> rq_qos_add
> Why is queue freeze needed in above code path?
>
> Also blk_iolatency_init()/rq_qos_add() doesn't freeze queue.
I don't quite understand, rq_qos_add() always require queue freeze, prevent
deference q->rq_qos from IO path concurrently.
>
>> and also,
>> queue_wb_lat_store => freeze-queue
>> wbt_init
>> rq_qos_add
> Not all wbt_enable_default()/wbt_init() is called with queue frozen, but Kuai's
> patchset changes all to freeze queue before registering debugfs entry, people will
> complain new warning.
Yes, but the same as above, rq_qos_add() from wbt_init() will always freeze queue
before this set, so I don't understand why is there new warning?
>
>>> Also blk_mq_debugfs_register_rqos() does _not_ require queue to be frozen,
>>> and it should be fine to move blk_mq_debugfs_register_rqos() out of queue
>>> freeze.
>>>
>> Yes correct, but I thought this pacthset is meant only to address incorrect
>> locking order between ->rq_qos_mutex and ->freeze_lock. So do you suggest
>> also refactoring code to avoid ->debugfs_mutex dependency on ->freeze_lock?
>> If yes then shouldn't that be handled in a separate patchset?
> It is fine to fix in that way, but at least regression shouldn't be caused.
>
> More importantly we shouldn't add new unnecessary dependency on queue freeze.
This is correct, I'll work on the v2 set to move debugfs_mutex outside of freeze
queue, however, as you suggested before we should we should fix this incorrect
lock order first. How about I make them in a single set?
>
> Thanks,
> Ming
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-17 11:39 ` Yu Kuai
@ 2025-11-17 11:54 ` Ming Lei
2025-11-17 11:59 ` Yu Kuai
0 siblings, 1 reply; 19+ messages in thread
From: Ming Lei @ 2025-11-17 11:54 UTC (permalink / raw)
To: Yu Kuai; +Cc: Nilay Shroff, axboe, linux-block, linux-kernel, tj
On Mon, Nov 17, 2025 at 07:39:57PM +0800, Yu Kuai wrote:
> Hi,
>
> 在 2025/11/17 19:30, Ming Lei 写道:
> > On Mon, Nov 17, 2025 at 04:43:11PM +0530, Nilay Shroff wrote:
> >>
> >> On 11/17/25 4:31 PM, Ming Lei wrote:
> >>> On Sun, Nov 16, 2025 at 12:10:20PM +0800, Yu Kuai wrote:
> >>>> queue should not be freezed under rq_qos_mutex, see example index
> >>>> commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
> >>>> parameters"), which means current implementation of rq_qos_add() is
> >>>> problematic. Add a new helper and prepare to fix this problem in
> >>>> following patches.
> >>>>
> >>>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
> >>>> ---
> >>>> block/blk-rq-qos.c | 27 +++++++++++++++++++++++++++
> >>>> block/blk-rq-qos.h | 2 ++
> >>>> 2 files changed, 29 insertions(+)
> >>>>
> >>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
> >>>> index 654478dfbc20..353397d7e126 100644
> >>>> --- a/block/blk-rq-qos.c
> >>>> +++ b/block/blk-rq-qos.c
> >>>> @@ -322,6 +322,33 @@ void rq_qos_exit(struct request_queue *q)
> >>>> mutex_unlock(&q->rq_qos_mutex);
> >>>> }
> >>>>
> >>>> +int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
> >>>> + enum rq_qos_id id, const struct rq_qos_ops *ops)
> >>>> +{
> >>>> + struct request_queue *q = disk->queue;
> >>>> +
> >>>> + WARN_ON_ONCE(q->mq_freeze_depth == 0);
> >>>> + lockdep_assert_held(&q->rq_qos_mutex);
> >>>> +
> >>>> + if (rq_qos_id(q, id))
> >>>> + return -EBUSY;
> >>>> +
> >>>> + rqos->disk = disk;
> >>>> + rqos->id = id;
> >>>> + rqos->ops = ops;
> >>>> + rqos->next = q->rq_qos;
> >>>> + q->rq_qos = rqos;
> >>>> + blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
> >>>> +
> >>>> + if (rqos->ops->debugfs_attrs) {
> >>>> + mutex_lock(&q->debugfs_mutex);
> >>>> + blk_mq_debugfs_register_rqos(rqos);
> >>>> + mutex_unlock(&q->debugfs_mutex);
> >>>> + }
> >>> It will cause more lockdep splat to let q->debugfs_mutex depend on queue freeze,
> >>>
> >> I think we already have that ->debugfs_mutex dependency on ->freeze_lock.
> >> for instance,
> >> ioc_qos_write => freeze-queue
> >> blk_iocost_init
> >> rq_qos_add
> > Why is queue freeze needed in above code path?
> >
> > Also blk_iolatency_init()/rq_qos_add() doesn't freeze queue.
>
> I don't quite understand, rq_qos_add() always require queue freeze, prevent
> deference q->rq_qos from IO path concurrently.
>
> >
> >> and also,
> >> queue_wb_lat_store => freeze-queue
> >> wbt_init
> >> rq_qos_add
> > Not all wbt_enable_default()/wbt_init() is called with queue frozen, but Kuai's
> > patchset changes all to freeze queue before registering debugfs entry, people will
> > complain new warning.
>
> Yes, but the same as above, rq_qos_add() from wbt_init() will always freeze queue
> before this set, so I don't understand why is there new warning?
The in-tree rq_qos_add() registers debugfs after queue is unfreeze, but
your patchset basically moves queue freeze/unfreeze to callsite of rq_qos_add(),
then debugfs register is always done with queue frozen.
Dependency between queue freeze and q->debugfs_mutex is introduced in some
code paths, such as, elevator switch, blk_iolatency_init, ..., this way
will trigger warning because it isn't strange to run into memory
allocation in debugfs_create_*().
>
> >
> >>> Also blk_mq_debugfs_register_rqos() does _not_ require queue to be frozen,
> >>> and it should be fine to move blk_mq_debugfs_register_rqos() out of queue
> >>> freeze.
> >>>
> >> Yes correct, but I thought this pacthset is meant only to address incorrect
> >> locking order between ->rq_qos_mutex and ->freeze_lock. So do you suggest
> >> also refactoring code to avoid ->debugfs_mutex dependency on ->freeze_lock?
> >> If yes then shouldn't that be handled in a separate patchset?
> > It is fine to fix in that way, but at least regression shouldn't be caused.
> >
> > More importantly we shouldn't add new unnecessary dependency on queue freeze.
>
> This is correct, I'll work on the v2 set to move debugfs_mutex outside of freeze
> queue, however, as you suggested before we should we should fix this incorrect
> lock order first. How about I make them in a single set?
That is fine, but patches for moving debugfs_mutex should be put before
this patchset, which is always friendly for 'git bisect'.
Thanks,
Ming
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-17 11:54 ` Ming Lei
@ 2025-11-17 11:59 ` Yu Kuai
0 siblings, 0 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-17 11:59 UTC (permalink / raw)
To: Ming Lei; +Cc: Nilay Shroff, axboe, linux-block, linux-kernel, tj, Yu Kuai
Hi,
在 2025/11/17 19:54, Ming Lei 写道:
> On Mon, Nov 17, 2025 at 07:39:57PM +0800, Yu Kuai wrote:
>> Hi,
>>
>> 在 2025/11/17 19:30, Ming Lei 写道:
>>> On Mon, Nov 17, 2025 at 04:43:11PM +0530, Nilay Shroff wrote:
>>>> On 11/17/25 4:31 PM, Ming Lei wrote:
>>>>> On Sun, Nov 16, 2025 at 12:10:20PM +0800, Yu Kuai wrote:
>>>>>> queue should not be freezed under rq_qos_mutex, see example index
>>>>>> commit 9730763f4756 ("block: correct locking order for protecting blk-wbt
>>>>>> parameters"), which means current implementation of rq_qos_add() is
>>>>>> problematic. Add a new helper and prepare to fix this problem in
>>>>>> following patches.
>>>>>>
>>>>>> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
>>>>>> ---
>>>>>> block/blk-rq-qos.c | 27 +++++++++++++++++++++++++++
>>>>>> block/blk-rq-qos.h | 2 ++
>>>>>> 2 files changed, 29 insertions(+)
>>>>>>
>>>>>> diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
>>>>>> index 654478dfbc20..353397d7e126 100644
>>>>>> --- a/block/blk-rq-qos.c
>>>>>> +++ b/block/blk-rq-qos.c
>>>>>> @@ -322,6 +322,33 @@ void rq_qos_exit(struct request_queue *q)
>>>>>> mutex_unlock(&q->rq_qos_mutex);
>>>>>> }
>>>>>>
>>>>>> +int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
>>>>>> + enum rq_qos_id id, const struct rq_qos_ops *ops)
>>>>>> +{
>>>>>> + struct request_queue *q = disk->queue;
>>>>>> +
>>>>>> + WARN_ON_ONCE(q->mq_freeze_depth == 0);
>>>>>> + lockdep_assert_held(&q->rq_qos_mutex);
>>>>>> +
>>>>>> + if (rq_qos_id(q, id))
>>>>>> + return -EBUSY;
>>>>>> +
>>>>>> + rqos->disk = disk;
>>>>>> + rqos->id = id;
>>>>>> + rqos->ops = ops;
>>>>>> + rqos->next = q->rq_qos;
>>>>>> + q->rq_qos = rqos;
>>>>>> + blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
>>>>>> +
>>>>>> + if (rqos->ops->debugfs_attrs) {
>>>>>> + mutex_lock(&q->debugfs_mutex);
>>>>>> + blk_mq_debugfs_register_rqos(rqos);
>>>>>> + mutex_unlock(&q->debugfs_mutex);
>>>>>> + }
>>>>> It will cause more lockdep splat to let q->debugfs_mutex depend on queue freeze,
>>>>>
>>>> I think we already have that ->debugfs_mutex dependency on ->freeze_lock.
>>>> for instance,
>>>> ioc_qos_write => freeze-queue
>>>> blk_iocost_init
>>>> rq_qos_add
>>> Why is queue freeze needed in above code path?
>>>
>>> Also blk_iolatency_init()/rq_qos_add() doesn't freeze queue.
>> I don't quite understand, rq_qos_add() always require queue freeze, prevent
>> deference q->rq_qos from IO path concurrently.
>>
>>>> and also,
>>>> queue_wb_lat_store => freeze-queue
>>>> wbt_init
>>>> rq_qos_add
>>> Not all wbt_enable_default()/wbt_init() is called with queue frozen, but Kuai's
>>> patchset changes all to freeze queue before registering debugfs entry, people will
>>> complain new warning.
>> Yes, but the same as above, rq_qos_add() from wbt_init() will always freeze queue
>> before this set, so I don't understand why is there new warning?
> The in-tree rq_qos_add() registers debugfs after queue is unfreeze, but
> your patchset basically moves queue freeze/unfreeze to callsite of rq_qos_add(),
> then debugfs register is always done with queue frozen.
>
> Dependency between queue freeze and q->debugfs_mutex is introduced in some
> code paths, such as, elevator switch, blk_iolatency_init, ..., this way
> will trigger warning because it isn't strange to run into memory
> allocation in debugfs_create_*().
Yes, I realized I do misunderstand in previous email.
>
>>>>> Also blk_mq_debugfs_register_rqos() does _not_ require queue to be frozen,
>>>>> and it should be fine to move blk_mq_debugfs_register_rqos() out of queue
>>>>> freeze.
>>>>>
>>>> Yes correct, but I thought this pacthset is meant only to address incorrect
>>>> locking order between ->rq_qos_mutex and ->freeze_lock. So do you suggest
>>>> also refactoring code to avoid ->debugfs_mutex dependency on ->freeze_lock?
>>>> If yes then shouldn't that be handled in a separate patchset?
>>> It is fine to fix in that way, but at least regression shouldn't be caused.
>>>
>>> More importantly we shouldn't add new unnecessary dependency on queue freeze.
>> This is correct, I'll work on the v2 set to move debugfs_mutex outside of freeze
>> queue, however, as you suggested before we should we should fix this incorrect
>> lock order first. How about I make them in a single set?
> That is fine, but patches for moving debugfs_mutex should be put before
> this patchset, which is always friendly for 'git bisect'.
Sounds good :)
Thanks,
Kuai
>
>
> Thanks,
> Ming
>
>
^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed()
2025-11-16 4:10 ` [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed() Yu Kuai
2025-11-17 10:10 ` Nilay Shroff
2025-11-17 11:01 ` Ming Lei
@ 2025-11-17 23:32 ` Bart Van Assche
2 siblings, 0 replies; 19+ messages in thread
From: Bart Van Assche @ 2025-11-17 23:32 UTC (permalink / raw)
To: Yu Kuai, axboe, linux-block, linux-kernel, tj, nilay, ming.lei
On 11/15/25 8:10 PM, Yu Kuai wrote:
> queue should not be freezed under rq_qos_mutex, see example index
^^^^^ ^^^^^^^ ^^^^^^^^^^^^^
A queue frozen also
> +int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
> + enum rq_qos_id id, const struct rq_qos_ops *ops)
> +{
In this patch and also in the following patches, please fix the name of
this function and change it into "rq_qos_add_frozen()".
Thanks,
Bart.
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-16 4:10 [PATCH RESEND 0/5] block/blk-rq-qos: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
2025-11-16 4:10 ` [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed() Yu Kuai
@ 2025-11-16 4:10 ` Yu Kuai
2025-11-17 10:11 ` Nilay Shroff
2025-11-19 6:55 ` kernel test robot
2025-11-16 4:10 ` [PATCH RESEND 3/5] blk-iocost: " Yu Kuai
` (2 subsequent siblings)
4 siblings, 2 replies; 19+ messages in thread
From: Yu Kuai @ 2025-11-16 4:10 UTC (permalink / raw)
To: axboe, linux-block, linux-kernel, tj, nilay, ming.lei; +Cc: yukuai
wbt_init() can be called from sysfs attribute and wbt_enable_default(),
however the lock order are inversely.
- queue_wb_lat_store() freeze queue first, and then wbt_init() hold
rq_qos_mutex. In this case queue will be freezed again inside
rq_qos_add(), however, in this case freeze queue recursivly is
inoperative;
- wbt_enable_default() from elevator switch will hold rq_qos_mutex
first, and then rq_qos_add() will freeze queue;
Fix this problem by converting to use new helper rq_qos_add_freezed() in
wbt_init(), and for wbt_enable_default(), freeze queue before calling
wbt_init().
Fixes: a13bd91be223 ("block/rq_qos: protect rq_qos apis with a new lock")
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-wbt.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index eb8037bae0bd..a784f6d338b4 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -724,8 +724,12 @@ void wbt_enable_default(struct gendisk *disk)
if (!blk_queue_registered(q))
return;
- if (queue_is_mq(q) && enable)
+ if (queue_is_mq(q) && enable) {
+ unsigned int memflags = blk_mq_freeze_queue(q);
+
wbt_init(disk);
+ blk_mq_unfreeze_queue(q, memflags);
+ }
}
EXPORT_SYMBOL_GPL(wbt_enable_default);
@@ -922,7 +926,7 @@ int wbt_init(struct gendisk *disk)
* Assign rwb and add the stats callback.
*/
mutex_lock(&q->rq_qos_mutex);
- ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
+ ret = rq_qos_add_freezed(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
mutex_unlock(&q->rq_qos_mutex);
if (ret)
goto err_free;
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-16 4:10 ` [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
@ 2025-11-17 10:11 ` Nilay Shroff
2025-11-19 6:55 ` kernel test robot
1 sibling, 0 replies; 19+ messages in thread
From: Nilay Shroff @ 2025-11-17 10:11 UTC (permalink / raw)
To: Yu Kuai, axboe, linux-block, linux-kernel, tj, ming.lei
On 11/16/25 9:40 AM, Yu Kuai wrote:
> wbt_init() can be called from sysfs attribute and wbt_enable_default(),
> however the lock order are inversely.
>
> - queue_wb_lat_store() freeze queue first, and then wbt_init() hold
> rq_qos_mutex. In this case queue will be freezed again inside
> rq_qos_add(), however, in this case freeze queue recursivly is
> inoperative;
> - wbt_enable_default() from elevator switch will hold rq_qos_mutex
> first, and then rq_qos_add() will freeze queue;
>
> Fix this problem by converting to use new helper rq_qos_add_freezed() in
> wbt_init(), and for wbt_enable_default(), freeze queue before calling
> wbt_init().
>
> Fixes: a13bd91be223 ("block/rq_qos: protect rq_qos apis with a new lock")
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
^ permalink raw reply [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-16 4:10 ` [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
2025-11-17 10:11 ` Nilay Shroff
@ 2025-11-19 6:55 ` kernel test robot
1 sibling, 0 replies; 19+ messages in thread
From: kernel test robot @ 2025-11-19 6:55 UTC (permalink / raw)
To: Yu Kuai
Cc: oe-lkp, lkp, linux-block, axboe, linux-kernel, tj, nilay,
ming.lei, yukuai, oliver.sang
Hello,
kernel test robot noticed "WARNING:possible_circular_locking_dependency_detected" on:
commit: 9b76049c7ab17a3352a58ee216f444769e216c5c ("[PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue")
url: https://github.com/intel-lab-lkp/linux/commits/Yu-Kuai/block-blk-rq-qos-add-a-new-helper-rq_qos_add_freezed/20251116-121353
base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux.git for-next
patch link: https://lore.kernel.org/all/20251116041024.120500-3-yukuai@fnnas.com/
patch subject: [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue
in testcase: boot
config: x86_64-rhel-9.4-kselftests
compiler: gcc-14
test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 32G
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202511191340.643afc3a-lkp@intel.com
[ 36.217408][ T108] WARNING: possible circular locking dependency detected
[ 36.218277][ T108] 6.18.0-rc5-00238-g9b76049c7ab1 #1 Not tainted
[ 36.219067][ T108] ------------------------------------------------------
[ 36.219956][ T108] udevd/108 is trying to acquire lock:
[ 36.220622][ T108] ffff88813b4b6a40 (&q->debugfs_mutex){+.+.}-{4:4}, at: rq_qos_add_freezed (block/blk-rq-qos.c:345)
[ 36.221938][ T108]
[ 36.221938][ T108] but task is already holding lock:
[ 36.222851][ T108] ffff88813b4b63e0 (&q->rq_qos_mutex){+.+.}-{4:4}, at: wbt_init (block/blk-wbt.c:929)
[ 36.223964][ T108]
[ 36.223964][ T108] which lock already depends on the new lock.
[ 36.223964][ T108]
[ 36.225282][ T108]
[ 36.225282][ T108] the existing dependency chain (in reverse order) is:
[ 36.226380][ T108]
[ 36.226380][ T108] -> #4 (&q->rq_qos_mutex){+.+.}-{4:4}:
[ 36.228833][ T108] __lock_acquire (kernel/locking/lockdep.c:5237 (discriminator 1))
[ 36.230959][ T108] lock_acquire (include/linux/preempt.h:471 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) kernel/locking/lockdep.c:5831 (discriminator 2))
[ 36.233086][ T108] __mutex_lock (arch/x86/include/asm/jump_label.h:36 include/trace/events/lock.h:95 kernel/locking/mutex.c:600 kernel/locking/mutex.c:760)
[ 36.234852][ T108] wbt_init (block/blk-wbt.c:929)
[ 36.236509][ T108] wbt_enable_default (include/linux/blk-mq.h:960 block/blk-wbt.c:731)
[ 36.238663][ T108] blk_register_queue (block/blk-sysfs.c:948)
[ 36.240650][ T108] __add_disk (block/genhd.c:528)
[ 36.242626][ T108] add_disk_fwnode (block/genhd.c:598)
[ 36.244660][ T108] sr_probe (drivers/scsi/sr.c:703) sr_mod
[ 36.246799][ T108] really_probe (drivers/base/dd.c:581 drivers/base/dd.c:659)
[ 36.248845][ T108] __driver_probe_device (drivers/base/dd.c:801)
[ 36.250799][ T108] driver_probe_device (drivers/base/dd.c:831)
[ 36.252778][ T108] __driver_attach (drivers/base/dd.c:1218)
[ 36.254752][ T108] bus_for_each_dev (drivers/base/bus.c:369)
[ 36.256776][ T108] bus_add_driver (drivers/base/bus.c:678)
[ 36.258650][ T108] driver_register (drivers/base/driver.c:249)
[ 36.261124][ T108] init_sr (drivers/scsi/sr.c:152) sr_mod
[ 36.262938][ T108] do_one_initcall (init/main.c:1283)
[ 36.264802][ T108] do_init_module (kernel/module/main.c:3039)
[ 36.266560][ T108] load_module (kernel/module/main.c:3509)
[ 36.268367][ T108] init_module_from_file (kernel/module/main.c:3701)
[ 36.270150][ T108] idempotent_init_module (kernel/module/main.c:3713)
[ 36.272050][ T108] __x64_sys_finit_module (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) kernel/module/main.c:3736 (discriminator 1) kernel/module/main.c:3723 (discriminator 1) kernel/module/main.c:3723 (discriminator 1))
[ 36.273921][ T108] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 36.275672][ T108] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 36.277584][ T108]
[ 36.277584][ T108] -> #3 (&q->q_usage_counter(io)){++++}-{0:0}:
[ 36.280765][ T108] __lock_acquire (kernel/locking/lockdep.c:5237 (discriminator 1))
[ 36.282560][ T108] lock_acquire (include/linux/preempt.h:471 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) kernel/locking/lockdep.c:5831 (discriminator 2))
[ 36.284435][ T108] blk_alloc_queue (block/blk-core.c:462 (discriminator 1))
[ 36.286153][ T108] blk_mq_alloc_queue (block/blk-mq.c:4405)
[ 36.287937][ T108] scsi_alloc_sdev (drivers/scsi/scsi_scan.c:339)
[ 36.289536][ T108] scsi_probe_and_add_lun (drivers/scsi/scsi_scan.c:1211)
[ 36.291280][ T108] __scsi_add_device (drivers/scsi/scsi_scan.c:1625)
[ 36.292952][ T108] ata_scsi_scan_host (drivers/ata/libata-scsi.c:4577 (discriminator 1)) libata
[ 36.295269][ T108] async_run_entry_fn (arch/x86/include/asm/jump_label.h:36 kernel/async.c:131)
[ 36.297162][ T108] process_one_work (arch/x86/include/asm/jump_label.h:36 include/trace/events/workqueue.h:110 kernel/workqueue.c:3268)
[ 36.299000][ T108] worker_thread (kernel/workqueue.c:3340 (discriminator 2) kernel/workqueue.c:3427 (discriminator 2))
[ 36.300695][ T108] kthread (kernel/kthread.c:463)
[ 36.302155][ T108] ret_from_fork (arch/x86/kernel/process.c:164)
[ 36.303743][ T108] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
[ 36.305346][ T108]
[ 36.305346][ T108] -> #2 (fs_reclaim){+.+.}-{0:0}:
[ 36.307985][ T108] __lock_acquire (kernel/locking/lockdep.c:5237 (discriminator 1))
[ 36.309621][ T108] lock_acquire (include/linux/preempt.h:471 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) kernel/locking/lockdep.c:5831 (discriminator 2))
[ 36.311256][ T108] fs_reclaim_acquire (mm/page_alloc.c:4270 mm/page_alloc.c:4283)
[ 36.312913][ T108] kmem_cache_alloc_lru_noprof (include/linux/sched/mm.h:319 mm/slub.c:4925 mm/slub.c:5260 mm/slub.c:5303)
[ 36.314550][ T108] __d_alloc (fs/dcache.c:1692)
[ 36.316022][ T108] d_alloc_parallel (fs/dcache.c:2549)
[ 36.317503][ T108] __lookup_slow (fs/namei.c:1801)
[ 36.319003][ T108] simple_start_creating (fs/libfs.c:2304 (discriminator 1))
[ 36.320529][ T108] debugfs_start_creating+0x4f/0xe0
[ 36.322219][ T108] debugfs_create_dir (fs/debugfs/inode.c:374 (discriminator 1) fs/debugfs/inode.c:581 (discriminator 1))
[ 36.323776][ T108] pinctrl_init (drivers/pinctrl/core.c:2028 (discriminator 1) drivers/pinctrl/core.c:2420 (discriminator 1))
[ 36.325323][ T108] do_one_initcall (init/main.c:1283)
[ 36.326866][ T108] do_initcalls (init/main.c:1344 (discriminator 3) init/main.c:1361 (discriminator 3))
[ 36.328322][ T108] kernel_init_freeable (init/main.c:1597)
[ 36.329944][ T108] kernel_init (init/main.c:1485)
[ 36.331399][ T108] ret_from_fork (arch/x86/kernel/process.c:164)
[ 36.332850][ T108] ret_from_fork_asm (arch/x86/entry/entry_64.S:255)
[ 36.334399][ T108]
[ 36.334399][ T108] -> #1 (&sb->s_type->i_mutex_key#3){+.+.}-{4:4}:
[ 36.337887][ T108] __lock_acquire (kernel/locking/lockdep.c:5237 (discriminator 1))
[ 36.339409][ T108] lock_acquire (include/linux/preempt.h:471 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) kernel/locking/lockdep.c:5831 (discriminator 2))
[ 36.341061][ T108] down_write (arch/x86/include/asm/preempt.h:80 (discriminator 10) kernel/locking/rwsem.c:1315 (discriminator 10) kernel/locking/rwsem.c:1326 (discriminator 10) kernel/locking/rwsem.c:1591 (discriminator 10))
[ 36.342598][ T108] simple_start_creating (fs/libfs.c:2299)
[ 36.344192][ T108] debugfs_start_creating+0x4f/0xe0
[ 36.345959][ T108] debugfs_create_dir (fs/debugfs/inode.c:374 (discriminator 1) fs/debugfs/inode.c:581 (discriminator 1))
[ 36.347535][ T108] blk_register_queue (block/blk-sysfs.c:928 (discriminator 1))
[ 36.349145][ T108] __add_disk (block/genhd.c:528)
[ 36.350605][ T108] add_disk_fwnode (block/genhd.c:598)
[ 36.352129][ T108] sr_probe (drivers/scsi/sr.c:703) sr_mod
[ 36.353727][ T108] really_probe (drivers/base/dd.c:581 drivers/base/dd.c:659)
[ 36.355225][ T108] __driver_probe_device (drivers/base/dd.c:801)
[ 36.356818][ T108] driver_probe_device (drivers/base/dd.c:831)
[ 36.358406][ T108] __driver_attach (drivers/base/dd.c:1218)
[ 36.359984][ T108] bus_for_each_dev (drivers/base/bus.c:369)
[ 36.361462][ T108] bus_add_driver (drivers/base/bus.c:678)
[ 36.362950][ T108] driver_register (drivers/base/driver.c:249)
[ 36.364384][ T108] init_sr (drivers/scsi/sr.c:152) sr_mod
[ 36.365900][ T108] do_one_initcall (init/main.c:1283)
[ 36.367417][ T108] do_init_module (kernel/module/main.c:3039)
[ 36.368926][ T108] load_module (kernel/module/main.c:3509)
[ 36.370434][ T108] init_module_from_file (kernel/module/main.c:3701)
[ 36.372090][ T108] idempotent_init_module (kernel/module/main.c:3713)
[ 36.373729][ T108] __x64_sys_finit_module (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) kernel/module/main.c:3736 (discriminator 1) kernel/module/main.c:3723 (discriminator 1) kernel/module/main.c:3723 (discriminator 1))
[ 36.375279][ T108] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 36.376796][ T108] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 36.378301][ T108]
[ 36.378301][ T108] -> #0 (&q->debugfs_mutex){+.+.}-{4:4}:
[ 36.381043][ T108] check_prev_add (kernel/locking/lockdep.c:3166 (discriminator 2))
[ 36.382551][ T108] validate_chain (kernel/locking/lockdep.c:3285 kernel/locking/lockdep.c:3908)
[ 36.384101][ T108] __lock_acquire (kernel/locking/lockdep.c:5237 (discriminator 1))
[ 36.385608][ T108] lock_acquire (include/linux/preempt.h:471 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) include/trace/events/lock.h:24 (discriminator 2) kernel/locking/lockdep.c:5831 (discriminator 2))
[ 36.387119][ T108] __mutex_lock (arch/x86/include/asm/jump_label.h:36 include/trace/events/lock.h:95 kernel/locking/mutex.c:600 kernel/locking/mutex.c:760)
[ 36.388585][ T108] rq_qos_add_freezed (block/blk-rq-qos.c:345)
[ 36.390240][ T108] wbt_init (block/blk-wbt.c:930)
[ 36.391782][ T108] wbt_enable_default (include/linux/blk-mq.h:960 block/blk-wbt.c:731)
[ 36.393333][ T108] blk_register_queue (block/blk-sysfs.c:948)
[ 36.394867][ T108] __add_disk (block/genhd.c:528)
[ 36.396410][ T108] add_disk_fwnode (block/genhd.c:598)
[ 36.398009][ T108] sr_probe (drivers/scsi/sr.c:703) sr_mod
[ 36.399514][ T108] really_probe (drivers/base/dd.c:581 drivers/base/dd.c:659)
[ 36.401035][ T108] __driver_probe_device (drivers/base/dd.c:801)
[ 36.402511][ T108] driver_probe_device (drivers/base/dd.c:831)
[ 36.403990][ T108] __driver_attach (drivers/base/dd.c:1218)
[ 36.405471][ T108] bus_for_each_dev (drivers/base/bus.c:369)
[ 36.407062][ T108] bus_add_driver (drivers/base/bus.c:678)
[ 36.408549][ T108] driver_register (drivers/base/driver.c:249)
[ 36.410081][ T108] init_sr (drivers/scsi/sr.c:152) sr_mod
[ 36.411625][ T108] do_one_initcall (init/main.c:1283)
[ 36.413166][ T108] do_init_module (kernel/module/main.c:3039)
[ 36.414682][ T108] load_module (kernel/module/main.c:3509)
[ 36.416196][ T108] init_module_from_file (kernel/module/main.c:3701)
[ 36.417822][ T108] idempotent_init_module (kernel/module/main.c:3713)
[ 36.419453][ T108] __x64_sys_finit_module (include/linux/file.h:62 (discriminator 1) include/linux/file.h:83 (discriminator 1) kernel/module/main.c:3736 (discriminator 1) kernel/module/main.c:3723 (discriminator 1) kernel/module/main.c:3723 (discriminator 1))
[ 36.421144][ T108] do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1))
[ 36.422741][ T108] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
[ 36.424468][ T108]
[ 36.424468][ T108] other info that might help us debug this:
[ 36.424468][ T108]
[ 36.428405][ T108] Chain exists of:
[ 36.428405][ T108] &q->debugfs_mutex --> &q->q_usage_counter(io) --> &q->rq_qos_mutex
[ 36.428405][ T108]
[ 36.432967][ T108] Possible unsafe locking scenario:
[ 36.432967][ T108]
[ 36.435485][ T108] CPU0 CPU1
[ 36.437074][ T108] ---- ----
[ 36.438592][ T108] lock(&q->rq_qos_mutex);
[ 36.440094][ T108] lock(&q->q_usage_counter(io));
[ 36.441920][ T108] lock(&q->rq_qos_mutex);
[ 36.443715][ T108] lock(&q->debugfs_mutex);
[ 36.445174][ T108]
[ 36.445174][ T108] *** DEADLOCK ***
[ 36.445174][ T108]
[ 36.448803][ T108] 6 locks held by udevd/108:
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20251119/202511191340.643afc3a-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH RESEND 3/5] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-16 4:10 [PATCH RESEND 0/5] block/blk-rq-qos: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
2025-11-16 4:10 ` [PATCH RESEND 1/5] block/blk-rq-qos: add a new helper rq_qos_add_freezed() Yu Kuai
2025-11-16 4:10 ` [PATCH RESEND 2/5] blk-wbt: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
@ 2025-11-16 4:10 ` Yu Kuai
2025-11-17 10:11 ` Nilay Shroff
2025-11-16 4:10 ` [PATCH RESEND 4/5] blk-iolatency: " Yu Kuai
2025-11-16 4:10 ` [PATCH RESEND 5/5] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai
4 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-11-16 4:10 UTC (permalink / raw)
To: axboe, linux-block, linux-kernel, tj, nilay, ming.lei; +Cc: yukuai
Like wbt, rq_qos_add() can be called from two path and the lock order
are inversely:
- From ioc_qos_write(), queue is already freezed before rq_qos_add();
- From ioc_cost_model_write(), rq_qos_add() is called directly;
Fix this problem by converting to use blkg_conf_open_bdev_frozen()
from ioc_cost_model_write(), then since all rq_qos_add() callers
already freeze queue, convert to use rq_qos_add_freezed.
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-iocost.c | 17 ++++++++---------
1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 5bfd70311359..233c9749bfc9 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2927,7 +2927,7 @@ static int blk_iocost_init(struct gendisk *disk)
* called before policy activation completion, can't assume that the
* target bio has an iocg associated and need to test for NULL iocg.
*/
- ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
+ ret = rq_qos_add_freezed(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
if (ret)
goto err_free_ioc;
@@ -3410,7 +3410,7 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
{
struct blkg_conf_ctx ctx;
struct request_queue *q;
- unsigned int memflags;
+ unsigned long memflags;
struct ioc *ioc;
u64 u[NR_I_LCOEFS];
bool user;
@@ -3419,9 +3419,11 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
blkg_conf_init(&ctx, input);
- ret = blkg_conf_open_bdev(&ctx);
- if (ret)
+ memflags = blkg_conf_open_bdev_frozen(&ctx);
+ if (IS_ERR_VALUE(memflags)) {
+ ret = memflags;
goto err;
+ }
body = ctx.body;
q = bdev_get_queue(ctx.bdev);
@@ -3438,7 +3440,6 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
ioc = q_to_ioc(q);
}
- memflags = blk_mq_freeze_queue(q);
blk_mq_quiesce_queue(q);
spin_lock_irq(&ioc->lock);
@@ -3490,20 +3491,18 @@ static ssize_t ioc_cost_model_write(struct kernfs_open_file *of, char *input,
spin_unlock_irq(&ioc->lock);
blk_mq_unquiesce_queue(q);
- blk_mq_unfreeze_queue(q, memflags);
- blkg_conf_exit(&ctx);
+ blkg_conf_exit_frozen(&ctx, memflags);
return nbytes;
einval:
spin_unlock_irq(&ioc->lock);
blk_mq_unquiesce_queue(q);
- blk_mq_unfreeze_queue(q, memflags);
ret = -EINVAL;
err:
- blkg_conf_exit(&ctx);
+ blkg_conf_exit_frozen(&ctx, memflags);
return ret;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 3/5] blk-iocost: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-16 4:10 ` [PATCH RESEND 3/5] blk-iocost: " Yu Kuai
@ 2025-11-17 10:11 ` Nilay Shroff
0 siblings, 0 replies; 19+ messages in thread
From: Nilay Shroff @ 2025-11-17 10:11 UTC (permalink / raw)
To: Yu Kuai, axboe, linux-block, linux-kernel, tj, ming.lei
On 11/16/25 9:40 AM, Yu Kuai wrote:
> Like wbt, rq_qos_add() can be called from two path and the lock order
> are inversely:
>
> - From ioc_qos_write(), queue is already freezed before rq_qos_add();
> - From ioc_cost_model_write(), rq_qos_add() is called directly;
>
> Fix this problem by converting to use blkg_conf_open_bdev_frozen()
> from ioc_cost_model_write(), then since all rq_qos_add() callers
> already freeze queue, convert to use rq_qos_add_freezed.
>
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH RESEND 4/5] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-16 4:10 [PATCH RESEND 0/5] block/blk-rq-qos: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
` (2 preceding siblings ...)
2025-11-16 4:10 ` [PATCH RESEND 3/5] blk-iocost: " Yu Kuai
@ 2025-11-16 4:10 ` Yu Kuai
2025-11-17 10:12 ` Nilay Shroff
2025-11-16 4:10 ` [PATCH RESEND 5/5] block/blk-rq-qos: cleanup rq_qos_add() Yu Kuai
4 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-11-16 4:10 UTC (permalink / raw)
To: axboe, linux-block, linux-kernel, tj, nilay, ming.lei; +Cc: yukuai
Currently blk-iolatency will hold rq_qos_mutex first and then call
rq_qos_add() to freeze queue.
Fix this problem by converting to use blkg_conf_open_bdev_frozen()
from iolatency_set_limit(), and convert to use rq_qos_add_freezed().
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-iolatency.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 45bd18f68541..1565352b176d 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -764,8 +764,8 @@ static int blk_iolatency_init(struct gendisk *disk)
if (!blkiolat)
return -ENOMEM;
- ret = rq_qos_add(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
- &blkcg_iolatency_ops);
+ ret = rq_qos_add_freezed(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
+ &blkcg_iolatency_ops);
if (ret)
goto err_free;
ret = blkcg_activate_policy(disk, &blkcg_policy_iolatency);
@@ -831,16 +831,19 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
struct blkcg_gq *blkg;
struct blkg_conf_ctx ctx;
struct iolatency_grp *iolat;
+ unsigned long memflags;
char *p, *tok;
u64 lat_val = 0;
u64 oldval;
- int ret;
+ int ret = 0;
blkg_conf_init(&ctx, buf);
- ret = blkg_conf_open_bdev(&ctx);
- if (ret)
+ memflags = blkg_conf_open_bdev_frozen(&ctx);
+ if (IS_ERR_VALUE(memflags)) {
+ ret = memflags;
goto out;
+ }
/*
* blk_iolatency_init() may fail after rq_qos_add() succeeds which can
@@ -890,7 +893,7 @@ static ssize_t iolatency_set_limit(struct kernfs_open_file *of, char *buf,
iolatency_clear_scaling(blkg);
ret = 0;
out:
- blkg_conf_exit(&ctx);
+ blkg_conf_exit_frozen(&ctx, memflags);
return ret ?: nbytes;
}
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread* Re: [PATCH RESEND 4/5] blk-iolatency: fix incorrect lock order for rq_qos_mutex and freeze queue
2025-11-16 4:10 ` [PATCH RESEND 4/5] blk-iolatency: " Yu Kuai
@ 2025-11-17 10:12 ` Nilay Shroff
0 siblings, 0 replies; 19+ messages in thread
From: Nilay Shroff @ 2025-11-17 10:12 UTC (permalink / raw)
To: Yu Kuai, axboe, linux-block, linux-kernel, tj, ming.lei
On 11/16/25 9:40 AM, Yu Kuai wrote:
> Currently blk-iolatency will hold rq_qos_mutex first and then call
> rq_qos_add() to freeze queue.
>
> Fix this problem by converting to use blkg_conf_open_bdev_frozen()
> from iolatency_set_limit(), and convert to use rq_qos_add_freezed().
>
> Signed-off-by: Yu Kuai <yukuai@fnnas.com>
Looks good to me:
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH RESEND 5/5] block/blk-rq-qos: cleanup rq_qos_add()
2025-11-16 4:10 [PATCH RESEND 0/5] block/blk-rq-qos: fix incorrect lock order for rq_qos_mutex and freeze queue Yu Kuai
` (3 preceding siblings ...)
2025-11-16 4:10 ` [PATCH RESEND 4/5] blk-iolatency: " Yu Kuai
@ 2025-11-16 4:10 ` Yu Kuai
2025-11-17 10:13 ` Nilay Shroff
4 siblings, 1 reply; 19+ messages in thread
From: Yu Kuai @ 2025-11-16 4:10 UTC (permalink / raw)
To: axboe, linux-block, linux-kernel, tj, nilay, ming.lei; +Cc: yukuai
Now that there is no caller of rq_qos_add(), remove it, and also rename
rq_qos_add_freezed() back to rq_qos_add().
Signed-off-by: Yu Kuai <yukuai@fnnas.com>
---
block/blk-iocost.c | 2 +-
block/blk-iolatency.c | 4 ++--
block/blk-rq-qos.c | 42 ++----------------------------------------
block/blk-rq-qos.h | 6 ++----
block/blk-wbt.c | 2 +-
5 files changed, 8 insertions(+), 48 deletions(-)
diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 233c9749bfc9..0948f628386f 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -2927,7 +2927,7 @@ static int blk_iocost_init(struct gendisk *disk)
* called before policy activation completion, can't assume that the
* target bio has an iocg associated and need to test for NULL iocg.
*/
- ret = rq_qos_add_freezed(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
+ ret = rq_qos_add(&ioc->rqos, disk, RQ_QOS_COST, &ioc_rqos_ops);
if (ret)
goto err_free_ioc;
diff --git a/block/blk-iolatency.c b/block/blk-iolatency.c
index 1565352b176d..5b18125e21c9 100644
--- a/block/blk-iolatency.c
+++ b/block/blk-iolatency.c
@@ -764,8 +764,8 @@ static int blk_iolatency_init(struct gendisk *disk)
if (!blkiolat)
return -ENOMEM;
- ret = rq_qos_add_freezed(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
- &blkcg_iolatency_ops);
+ ret = rq_qos_add(&blkiolat->rqos, disk, RQ_QOS_LATENCY,
+ &blkcg_iolatency_ops);
if (ret)
goto err_free;
ret = blkcg_activate_policy(disk, &blkcg_policy_iolatency);
diff --git a/block/blk-rq-qos.c b/block/blk-rq-qos.c
index 353397d7e126..3a49af00b738 100644
--- a/block/blk-rq-qos.c
+++ b/block/blk-rq-qos.c
@@ -322,8 +322,8 @@ void rq_qos_exit(struct request_queue *q)
mutex_unlock(&q->rq_qos_mutex);
}
-int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
- enum rq_qos_id id, const struct rq_qos_ops *ops)
+int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk,
+ enum rq_qos_id id, const struct rq_qos_ops *ops)
{
struct request_queue *q = disk->queue;
@@ -349,44 +349,6 @@ int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
return 0;
}
-int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
- const struct rq_qos_ops *ops)
-{
- struct request_queue *q = disk->queue;
- unsigned int memflags;
-
- lockdep_assert_held(&q->rq_qos_mutex);
-
- rqos->disk = disk;
- rqos->id = id;
- rqos->ops = ops;
-
- /*
- * No IO can be in-flight when adding rqos, so freeze queue, which
- * is fine since we only support rq_qos for blk-mq queue.
- */
- memflags = blk_mq_freeze_queue(q);
-
- if (rq_qos_id(q, rqos->id))
- goto ebusy;
- rqos->next = q->rq_qos;
- q->rq_qos = rqos;
- blk_queue_flag_set(QUEUE_FLAG_QOS_ENABLED, q);
-
- blk_mq_unfreeze_queue(q, memflags);
-
- if (rqos->ops->debugfs_attrs) {
- mutex_lock(&q->debugfs_mutex);
- blk_mq_debugfs_register_rqos(rqos);
- mutex_unlock(&q->debugfs_mutex);
- }
-
- return 0;
-ebusy:
- blk_mq_unfreeze_queue(q, memflags);
- return -EBUSY;
-}
-
void rq_qos_del(struct rq_qos *rqos)
{
struct request_queue *q = rqos->disk->queue;
diff --git a/block/blk-rq-qos.h b/block/blk-rq-qos.h
index 4a7fec01600b..8bbf178c16b0 100644
--- a/block/blk-rq-qos.h
+++ b/block/blk-rq-qos.h
@@ -85,10 +85,8 @@ static inline void rq_wait_init(struct rq_wait *rq_wait)
init_waitqueue_head(&rq_wait->wait);
}
-int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk, enum rq_qos_id id,
- const struct rq_qos_ops *ops);
-int rq_qos_add_freezed(struct rq_qos *rqos, struct gendisk *disk,
- enum rq_qos_id id, const struct rq_qos_ops *ops);
+int rq_qos_add(struct rq_qos *rqos, struct gendisk *disk,
+ enum rq_qos_id id, const struct rq_qos_ops *ops);
void rq_qos_del(struct rq_qos *rqos);
typedef bool (acquire_inflight_cb_t)(struct rq_wait *rqw, void *private_data);
diff --git a/block/blk-wbt.c b/block/blk-wbt.c
index a784f6d338b4..d7f1e6ba1790 100644
--- a/block/blk-wbt.c
+++ b/block/blk-wbt.c
@@ -926,7 +926,7 @@ int wbt_init(struct gendisk *disk)
* Assign rwb and add the stats callback.
*/
mutex_lock(&q->rq_qos_mutex);
- ret = rq_qos_add_freezed(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
+ ret = rq_qos_add(&rwb->rqos, disk, RQ_QOS_WBT, &wbt_rqos_ops);
mutex_unlock(&q->rq_qos_mutex);
if (ret)
goto err_free;
--
2.51.0
^ permalink raw reply related [flat|nested] 19+ messages in thread