From: Jens Axboe <axboe@kernel.dk>
To: Yu Kuai <yukuai1@huaweicloud.com>,
Li Nan <linan666@huaweicloud.com>, Ming Lei <ming.lei@redhat.com>
Cc: jianchao.w.wang@oracle.com, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org, yangerkun@huawei.com,
yi.zhang@huawei.com, "yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [PATCH] blk-mq: check kobject state_in_sysfs before deleting in blk_mq_unregister_hctx
Date: Thu, 28 Aug 2025 19:20:30 -0600
Message-ID: <6424f720-9eaa-4642-9186-c0a148995e02@kernel.dk>
In-Reply-To: <5adb469d-9e4b-e2d9-a77c-a1a4d11a49d5@huaweicloud.com>
On 8/28/25 7:09 PM, Yu Kuai wrote:
> Hi,
>
> On 2025/08/29 1:23, Jens Axboe wrote:
>> On 8/28/25 3:28 AM, Li Nan wrote:
>>>
>>>
>>> On 2025/8/27 16:10, Ming Lei wrote:
>>>> On Wed, Aug 27, 2025 at 11:22:06AM +0800, Li Nan wrote:
>>>>>
>>>>>
>>>>> On 2025/8/27 9:35, Ming Lei wrote:
>>>>>> On Wed, Aug 27, 2025 at 09:04:45AM +0800, Yu Kuai wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On 2025/08/27 8:58, Ming Lei wrote:
>>>>>>>> On Tue, Aug 26, 2025 at 04:48:54PM +0800, linan666@huaweicloud.com wrote:
>>>>>>>>> From: Li Nan <linan122@huawei.com>
>>>>>>>>>
>>>>>>>>> In __blk_mq_update_nr_hw_queues() the return value of
>>>>>>>>> blk_mq_sysfs_register_hctxs() is not checked. If sysfs creation for hctx
>>>>>>>>
>>>>>>>> Looks we should check its return value and handle the failure in both
>>>>>>>> the call site and blk_mq_sysfs_register_hctxs().
>>>>>>>
>>>>>>> In __blk_mq_update_nr_hw_queues(), the old hctxs are already
>>>>>>> unregistered and the function returns void; we failed to register the
>>>>>>> new hctxs because of a memory allocation failure. I really don't know
>>>>>>> how to handle the failure here. Do you have any suggestions?
>>>>>>
>>>>>> It is out of memory, so I think it is fine to do whatever it takes to
>>>>>> leave the queue state intact instead of making it `partially workable`, such as:
>>>>>>
>>>>>> - try updating nr_hw_queues to 1
>>>>>>
>>>>>> - if it still fails, delete the disk and mark the queue as dead if a disk is attached
>>>>>>
>>>>>
>>>>> If we ignore these non-critical sysfs creation failures, the disk remains
>>>>> usable with no loss of functionality. Deleting the disk seems to escalate
>>>>> the error?
>>>>
>>>> Ignoring the sysfs registration failure is more of a workaround. If the
>>>> issue needs to be fixed this way, you have to document it.
>>>>
>>>> An OOM usually means that the system isn't usable any more. But this is
>>>> a NOIO allocation, and the typical use case is error recovery in nvme pci,
>>>> so there may simply not be enough pages for NOIO allocations. Is that
>>>> the reason for ignoring the sysfs registration failure in
>>>> blk_mq_update_nr_hw_queues()?
>>>>
>>>> But NVMe has been pretty fragile in this area, using non-owner queue
>>>> freeze and calling blk_mq_update_nr_hw_queues() on a frozen queue, so is
>>>> it really necessary to take this case into account?
>>>
>>> I agree with your points about NOIO and NVMe.
>>>
>>> I hit this issue in null_blk during fuzz testing with memory-fault
>>> injection. Changing the number of hardware queues under OOM is
>>> extremely rare in real-world usage. So I think adding a workaround and
>>> documenting it is sufficient. What do you think?
>>
>> Working around it is fine, as it isn't a situation we really need to
>> worry about. But let's please not do it by poking at kobject internals.
>>
>
> It is already used in some places, like sysfs_slab_unlink().
>
> Or would we prefer adding a new hctx->state flag like BLK_MQ_S_REGISTERED?

If it's already used in a few spots, then I guess we should just be
using it as well rather than have a state around it. So I guess it's
fine. I'll just grab the patch.
--
Jens Axboe
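For readers following the thread, a minimal userspace model of the guard being accepted may help: teardown skips any hctx whose kobject never reached sysfs, so a registration failure that the caller ignored does not turn into a bogus delete later. This is a sketch under stated assumptions, not the kernel code: `mock_kobject`, `mock_hctx`, `register_hctxs()`, `unregister_hctxs()` and the `bad_deletes` counter are hypothetical names, only the `state_in_sysfs` bit mirrors a real `struct kobject` field, and the actual change to blk_mq_unregister_hctx() may be structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct kobject, keeping only the one bit this
 * thread is about (the real struct kobject has a state_in_sysfs bitfield). */
struct mock_kobject {
	unsigned int state_in_sysfs : 1;
};

/* Hypothetical stand-in for struct blk_mq_hw_ctx. */
struct mock_hctx {
	struct mock_kobject kobj;
};

/* Counts deletes of objects that never reached sysfs; in this model it
 * stands in for whatever goes wrong when a never-added object is removed. */
static int bad_deletes;

/* Models kobject_add(): the bit is set only when the add succeeds. */
static int mock_kobject_add(struct mock_kobject *kobj, bool inject_failure)
{
	assert(!kobj->state_in_sysfs); /* adding twice would be a caller bug */
	if (inject_failure)
		return -1; /* e.g. a failed NOIO allocation */
	kobj->state_in_sysfs = 1;
	return 0;
}

/* Models kobject_del(): deleting an object that was never added is an error. */
static void mock_kobject_del(struct mock_kobject *kobj)
{
	if (!kobj->state_in_sysfs) {
		bad_deletes++;
		return;
	}
	kobj->state_in_sysfs = 0;
}

/* Registers every hctx and, like the caller discussed in the thread,
 * ignores individual failures. Pass fail_idx >= n to inject none. */
static void register_hctxs(struct mock_hctx *hctxs, size_t n, size_t fail_idx)
{
	for (size_t i = 0; i < n; i++)
		(void)mock_kobject_add(&hctxs[i].kobj, i == fail_idx);
}

/* Tears down every hctx. With guarded == true it checks state_in_sysfs
 * first, which is the check this thread settles on. */
static void unregister_hctxs(struct mock_hctx *hctxs, size_t n, bool guarded)
{
	for (size_t i = 0; i < n; i++) {
		if (guarded && !hctxs[i].kobj.state_in_sysfs)
			continue; /* never made it into sysfs: nothing to remove */
		mock_kobject_del(&hctxs[i].kobj);
	}
}
```

With one failure injected at index 2, the guarded teardown removes only the three hctxs that were registered, while the unguarded one also attempts to delete the missing one. The alternative Yu Kuai raised, a dedicated BLK_MQ_S_REGISTERED bit in hctx->state, would replace the `state_in_sysfs` read with a blk-mq-owned flag; reusing the existing kobject bit avoids adding new state, at the cost of reading a kobject field directly.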