From: Nilay Shroff <nilay@linux.ibm.com>
To: yukuai@fygo.io, Tejun Heo <tj@kernel.org>,
Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>
Cc: Zheng Qixing <zhengqixing@huawei.com>,
Christoph Hellwig <hch@lst.de>,
Tang Yizhou <yizhou.tang@shopee.com>,
Ming Lei <ming.lei@redhat.com>,
cgroups@vger.kernel.org, linux-block@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction
Date: Fri, 26 Jun 2026 11:42:29 +0530 [thread overview]
Message-ID: <866644be-3b07-49f9-9c5d-e0f94ad1c793@linux.ibm.com> (raw)
In-Reply-To: <749d17be-ff9f-4f70-a948-0133f02eec93@fygo.io>
On 6/26/26 7:22 AM, yu kuai wrote:
> Hi,
>
> 在 2026/6/26 9:50, yu kuai 写道:
>> Hi,
>>
>> 在 2026/6/25 23:08, Nilay Shroff 写道:
>>> On 6/24/26 12:16 PM, Yu Kuai wrote:
>>>> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
>>>> index 7baccfb690fe..f7e788a7fe95 100644
>>>> --- a/block/blk-cgroup.c
>>>> +++ b/block/blk-cgroup.c
>>>> @@ -1563,10 +1563,12 @@ int blkcg_activate_policy(struct gendisk
>>>> *disk, const struct blkcg_policy *pol)
>>>> if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
>>>> return -EINVAL;
>>>> if (queue_is_mq(q))
>>>> memflags = blk_mq_freeze_queue(q);
>>>> +
>>>> + mutex_lock(&q->blkcg_mutex);
>>>> retry:
>>>> spin_lock_irq(&q->queue_lock);
>>>> /* blkg_list is pushed at the head, reverse walk to
>>>> initialize parents first */
>>>> list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
>>>> @@ -1625,10 +1627,11 @@ int blkcg_activate_policy(struct gendisk
>>>> *disk, const struct blkcg_policy *pol)
>>>> __set_bit(pol->plid, q->blkcg_pols);
>>>> ret = 0;
>>>> spin_unlock_irq(&q->queue_lock);
>>>> out:
>>>> + mutex_unlock(&q->blkcg_mutex);
>>>> if (queue_is_mq(q))
>>>> blk_mq_unfreeze_queue(q, memflags);
>>>> if (pinned_blkg)
>>>> blkg_put(pinned_blkg);
>>>> if (pd_prealloc)
>>> If the policy allocation fails, we jump to the lable enomem: and
>>> teardown pds.
>>> But I see this path still only acquires ->queue_lock. Don't we also need
>>> to protect it with ->blkcg_mutex?
>> Yes, I agree we should protect it as well.
>
> Just take a closer look at the code, the enomem is already protected by
> blkcg_mutex :)
>
Oh yes, but the ->blkcg_mutex is never released if we jump to enomem.
So that may potentially cause deadlock. We need to release ->blkcg_mutex
once blkcg_policy_teardown_pds() returns. Or may be refactor code (or add
comment) so that it's easy to realize or spot the ->blkcg_mutex is acquired
and then released around blkcg_policy_teardown_pds().
>>
>>> Moreover I still see race between blkg insertion in blkg_create() which
>>> still doesn't use ->blkcg_mutex and so list traversal in
>>> bfq_end_wr_async()
>>> may still race with blkg_create(), isn't it? I remember you once told
>>> this will be handled in another series but I couldn't find that yet.
>> This is the set:
>>
>> [PATCH 0/8] blk-cgroup: remove queue_lock nesting from blkcg paths - Yu
>> Kuai <https://lore.kernel.org/all/cover.1780621988.git.yukuai@fygo.io/>
>>
>> Noted that set just make sure queue_lock is not nested under other atomic
>> context, and that set do not acquire blkcg_mutex for blkg_create() yet. Howerver,
>> with that set it'll be easy to convert all queue_lock to blkcg_mtuex for blkg
>> protection, and together with lots of blk-cgroup code cleanups.
>>
Okay, so are you planning to send a follow-up patchset that replaces ->queue_lock
with ->blkcg_mutex for protecting the blkg_list? If so, I'd still prefer acquiring
->blkcg_mutex around blkg_create() in this patchset. That would address the race
between blkg_create() and the blkg_list traversal in bfq_end_wr_async(), while the
subsequent series can focus on cleaning up and removing the remaining ->queue_lock
usage for blkg protection.
Thanks,
--Nilay
next prev parent reply other threads:[~2026-06-26 6:13 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-24 6:46 [PATCH 1/2] md/linear: add fault-tolerant mode for unraid-like setups Yu Kuai
2026-06-24 6:46 ` [PATCH 2/2] ext4: add unraid mount option for single-disk-per-group mode Yu Kuai
2026-06-24 6:46 ` [PATCH v2 0/4] blk-cgroup: fix blkg list and policy data races Yu Kuai
2026-06-24 6:46 ` [PATCH v2 1/4] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
2026-06-24 6:46 ` [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction Yu Kuai
2026-06-25 15:08 ` Nilay Shroff
2026-06-26 1:50 ` yu kuai
2026-06-26 1:52 ` yu kuai
2026-06-26 6:12 ` Nilay Shroff [this message]
2026-06-27 4:13 ` yu kuai
2026-06-24 6:46 ` [PATCH v2 3/4] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
2026-06-24 6:46 ` [PATCH v2 4/4] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
2026-06-24 6:55 ` [PATCH 1/2] md/linear: add fault-tolerant mode for unraid-like setups yu kuai
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=866644be-3b07-49f9-9c5d-e0f94ad1c793@linux.ibm.com \
--to=nilay@linux.ibm.com \
--cc=axboe@kernel.dk \
--cc=cgroups@vger.kernel.org \
--cc=hch@lst.de \
--cc=josef@toxicpanda.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=ming.lei@redhat.com \
--cc=tj@kernel.org \
--cc=yizhou.tang@shopee.com \
--cc=yukuai@fygo.io \
--cc=zhengqixing@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox