Linux block layer
From: Nilay Shroff <nilay@linux.ibm.com>
To: Yu Kuai <yukuai1@huaweicloud.com>, ming.lei@redhat.com, axboe@kernel.dk
Cc: linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
	yi.zhang@huawei.com, yangerkun@huawei.com,
	johnny.chenyi@huawei.com, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH for-6.18/block 04/10] blk-mq: convert to serialize updating nr_requests with update_nr_hwq_lock
Date: Tue, 9 Sep 2025 15:41:30 +0530
Message-ID: <7fe7bfd3-d6c0-4485-aaa1-2c1629cb1784@linux.ibm.com>
In-Reply-To: <7544e26c-502a-75fc-7147-721a98bb0e80@huaweicloud.com>



On 9/9/25 3:06 PM, Yu Kuai wrote:
> Hi,
> 
> On 2025/09/09 17:26, Nilay Shroff wrote:
>>
>>
>> On 9/9/25 12:46 PM, Yu Kuai wrote:
>>> Hi,
>>>
>>> On 2025/09/09 14:52, Nilay Shroff wrote:
>>>>
>>>>
>>>> On 9/9/25 12:08 PM, Yu Kuai wrote:
>>>>> Hi,
>>>>>
>>>>> On 2025/09/09 14:29, Nilay Shroff wrote:
>>>>>>
>>>>>>
>>>>>> On 9/8/25 11:45 AM, Yu Kuai wrote:
>>>>>>> From: Yu Kuai <yukuai3@huawei.com>
>>>>>>>
>>>>>>> request_queue->nr_requests can be changed by:
>>>>>>>
>>>>>>> a) switching elevator by updating nr_hw_queues
>>>>>>> b) switching elevator via the elevator sysfs attribute
>>>>>>> c) configuring the queue sysfs attribute nr_requests
>>>>>>>
>>>>>>> Current lock order is:
>>>>>>>
>>>>>>> 1) update_nr_hwq_lock, case a,b
>>>>>>> 2) freeze_queue
>>>>>>> 3) elevator_lock, case a,b,c
>>>>>>>
>>>>>>> Updating nr_requests is already serialized by elevator_lock. However,
>>>>>>> in case c) we have to allocate new sched_tags if nr_requests grows,
>>>>>>> and doing this with elevator_lock held and the queue frozen risks
>>>>>>> deadlock.
>>>>>>>
>>>>>>> Hence use update_nr_hwq_lock instead, which makes it possible to
>>>>>>> allocate memory if tags grow while also preventing nr_requests from
>>>>>>> being changed concurrently.
>>>>>>>
>>>>>>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>>>>>>> ---
>>>>>>>     block/blk-sysfs.c | 12 +++++++++---
>>>>>>>     1 file changed, 9 insertions(+), 3 deletions(-)
>>>>>>>
>>>>>>> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
>>>>>>> index f99519f7a820..7ea15bf68b4b 100644
>>>>>>> --- a/block/blk-sysfs.c
>>>>>>> +++ b/block/blk-sysfs.c
>>>>>>> @@ -68,13 +68,14 @@ queue_requests_store(struct gendisk *disk, const char *page, size_t count)
>>>>>>>         int ret, err;
>>>>>>>         unsigned int memflags;
>>>>>>>         struct request_queue *q = disk->queue;
>>>>>>> +    struct blk_mq_tag_set *set = q->tag_set;
>>>>>>>           ret = queue_var_store(&nr, page, count);
>>>>>>>         if (ret < 0)
>>>>>>>             return ret;
>>>>>>>     -    memflags = blk_mq_freeze_queue(q);
>>>>>>> -    mutex_lock(&q->elevator_lock);
>>>>>>> +    /* serialize updating nr_requests with switching elevator */
>>>>>>> +    down_write(&set->update_nr_hwq_lock);
>>>>>>>     
>>>>>> For serializing update of nr_requests with switching elevator,
>>>>>> we should use disable_elv_switch(). So with this change we
>>>>>> don't need to acquire ->update_nr_hwq_lock in writer context
>>>>>> while running blk_mq_update_nr_requests but instead it can run
>>>>>> acquiring ->update_nr_hwq_lock in the reader context.
>>>>>>
>>>>>> So the code flow should be,
>>>>>>
>>>>>> disable_elv_switch  => this would set QUEUE_FLAG_NO_ELV_SWITCH
>>>>>> ...
>>>>>> down_read ->update_nr_hwq_lock
>>>>>> acquire ->freeze_lock
>>>>>> acquire ->elevator_lock;
>>>>>> ...
>>>>>> ...
>>>>>> release ->elevator_lock;
>>>>>> release ->freeze_lock
>>>>>>
>>>>>> clear QUEUE_FLAG_NO_ELV_SWITCH
>>>>>> up_read ->update_nr_hwq_lock
>>>>>>
>>>>>
>>>>> Yes, this makes sense. However, there is also an implied condition that
>>>>> we should serialize queue_requests_store() with itself. What if a
>>>>> concurrent caller succeeds in disable_elv_switch() before the
>>>>> down_read(), like this?
>>>>>
>>>>> t1:
>>>>> disable_elv_switch
>>>>>           t2:
>>>>>           disable_elv_switch
>>>>>
>>>>> down_read    down_read
>>>>>
>>>> I believe this is already protected with the kernfs internal
>>>> mutex locks. So you shouldn't be able to run two sysfs store
>>>> operations concurrently on the same attribute file.
>>>>
>>>
>>> There really is no such internal lock, the call stack is:
>>>
>>> kernfs_fop_write_iter
>>>   sysfs_kf_write
>>>    queue_attr_store
>>>
>>> There is only a file-level mutex, kernfs_open_file->lock, taken in the
>>> top-level function kernfs_fop_write_iter(). However, this lock is not the
>>> same one if we open the same attribute file from different contexts.
>>>
>> Oh yes this lock only protects if the same fd is being written
>> concurrently. However if we open the same sysfs file from different process
>> contexts then fd would be different and so this lock wouldn't protect
>> the simultaneous update of sysfs attribute. Having said that,
>> looking through the code again it seems that q->nr_requests update
>> is protected with ->elevator_lock (including both the elevator switch
>> code and your proposed changes in this patchset). So my question is
>> do we really need to synchronize nr_requests update code with elevator
>> switch code? In other words, what if we don't use
>> ->update_nr_hwq_lock at all in queue_requests_store?
> 
> 1) lock update_nr_hwq_lock, then no one can change nr_requests
> 2) check the input nr_requests
> 3) if it grows, allocate memory
> 
> The main idea here is that we can check whether nr_requests grows and then
> allocate memory, without worrying that nr_requests can change after the
> memory allocation.
> 
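For reference, the three steps quoted above can be modelled roughly in
userspace as below. This is only a sketch, not kernel code: struct
model_queue, model_queue_init() and model_requests_store() are hypothetical
names, a pthread mutex stands in for taking update_nr_hwq_lock for write,
and a plain int array stands in for sched_tags.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_queue {
	pthread_mutex_t update_lock;	/* stands in for update_nr_hwq_lock (write side) */
	unsigned int nr_requests;	/* current queue depth */
	unsigned int tags_capacity;	/* number of slots allocated in tags[] */
	int *tags;			/* stands in for sched_tags */
};

int model_queue_init(struct model_queue *q, unsigned int nr)
{
	int err;

	if (nr == 0)
		return -EINVAL;
	err = pthread_mutex_init(&q->update_lock, NULL);
	if (err)
		return -err;
	q->tags = calloc(nr, sizeof(*q->tags));
	if (!q->tags)
		return -ENOMEM;
	q->nr_requests = nr;
	q->tags_capacity = nr;
	return 0;
}

int model_requests_store(struct model_queue *q, unsigned int nr)
{
	int ret = 0;

	if (nr == 0)
		return -EINVAL;

	/* 1) take the lock: nobody else can change nr_requests from here on */
	pthread_mutex_lock(&q->update_lock);

	/* 2) check the input against the current value */
	if (nr == q->nr_requests)
		goto out;

	/* 3) if it grows past what is allocated, allocate memory */
	if (nr > q->tags_capacity) {
		int *tags = realloc(q->tags, nr * sizeof(*tags));

		if (!tags) {
			ret = -ENOMEM;
			goto out;
		}
		q->tags = tags;
		q->tags_capacity = nr;
	}
	q->nr_requests = nr;
out:
	/* the depth never exceeds the allocated capacity in this model */
	assert(q->nr_requests <= q->tags_capacity);
	pthread_mutex_unlock(&q->update_lock);
	return ret;
}
```

Because the comparison in step 2 and the allocation in step 3 happen under
the same lock, the value that was checked cannot change before the
allocation is used. Keeping the larger allocation when the depth shrinks is
a simplification of this model, not a statement about what blk-mq does.
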
If nr_requests changes after memory allocation we're still good because
eventually we'd only have one consistent value of nr_requests. For 
instance, if process A is updating nr_requests to 128 and sched switch
is updating nr_requests to 256 simultaneously then we'd either see 
nr_requests set to 128 or 256 in the end depending on who runs last.
We wouldn't get into a situation where we find some inconsistent update
to nr_requests, would we?
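
A minimal userspace sketch of that argument, under the assumption that every
write to nr_requests happens with ->elevator_lock held. The names lww_queue,
lww_set_nr_requests() and lww_final_value() are made up for illustration, a
pthread mutex stands in for ->elevator_lock, and the two updates are applied
one after the other in a chosen order rather than raced from real threads.
It does not model the tag allocation at all.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical model: one value guarded by one lock. */
struct lww_queue {
	pthread_mutex_t elevator_lock;	/* stands in for q->elevator_lock */
	unsigned int nr_requests;
};

/* Every writer updates nr_requests only while holding the lock. */
void lww_set_nr_requests(struct lww_queue *q, unsigned int nr)
{
	pthread_mutex_lock(&q->elevator_lock);
	q->nr_requests = nr;
	pthread_mutex_unlock(&q->elevator_lock);
}

/*
 * Apply two updates in the given order, as the lock would serialize
 * them, and return the value left behind.
 */
unsigned int lww_final_value(unsigned int first, unsigned int second)
{
	struct lww_queue q = {
		.elevator_lock = PTHREAD_MUTEX_INITIALIZER,
		.nr_requests = 64,
	};

	lww_set_nr_requests(&q, first);
	lww_set_nr_requests(&q, second);

	/* whoever ran last determines the final value */
	assert(q.nr_requests == second);
	return q.nr_requests;
}
```

With the sysfs write (128) and the sched switch (256) as the two writers,
the only possible orders are lww_final_value(128, 256) and
lww_final_value(256, 128), which leave 256 and 128 respectively.
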

> BTW, I think this sysfs attr is really a slow path, and it's fine to
> grab the write lock.
> 
Yep you're right. But I think we should avoid locks if possible.

Thanks,
--Nilay

