Re: [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction

Linux cgroups development
 help / color / mirror / Atom feed

From: Nilay Shroff <nilay@linux.ibm.com>
To: yukuai@fygo.io, Tejun Heo <tj@kernel.org>,
	Josef Bacik <josef@toxicpanda.com>, Jens Axboe <axboe@kernel.dk>
Cc: Zheng Qixing <zhengqixing@huawei.com>,
	Christoph Hellwig <hch@lst.de>,
	Tang Yizhou <yizhou.tang@shopee.com>,
	Ming Lei <ming.lei@redhat.com>,
	cgroups@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction
Date: Mon, 29 Jun 2026 11:03:28 +0530	[thread overview]
Message-ID: <c44fcd50-6341-4617-b6cb-cad3c65a7a75@linux.ibm.com> (raw)
In-Reply-To: <5f5a4729-0c90-4f6f-b97e-cab9d0289dc2@fygo.io>

On 6/27/26 9:43 AM, yu kuai wrote:
> Hi,
> 
> 在 2026/6/26 14:12, Nilay Shroff 写道:
>> On 6/26/26 7:22 AM, yu kuai wrote:
>>> Hi,
>>>
>>> 在 2026/6/26 9:50, yu kuai 写道:
>>>> Hi,
>>>>
>>>> 在 2026/6/25 23:08, Nilay Shroff 写道:
>>>>> On 6/24/26 12:16 PM, Yu Kuai wrote:
>>>>>> diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
>>>>>> index 7baccfb690fe..f7e788a7fe95 100644
>>>>>> --- a/block/blk-cgroup.c
>>>>>> +++ b/block/blk-cgroup.c
>>>>>> @@ -1563,10 +1563,12 @@ int blkcg_activate_policy(struct gendisk
>>>>>> *disk, const struct blkcg_policy *pol)
>>>>>>          if (WARN_ON_ONCE(!pol->pd_alloc_fn || !pol->pd_free_fn))
>>>>>>              return -EINVAL;
>>>>>>            if (queue_is_mq(q))
>>>>>>              memflags = blk_mq_freeze_queue(q);
>>>>>> +
>>>>>> +    mutex_lock(&q->blkcg_mutex);
>>>>>>      retry:
>>>>>>          spin_lock_irq(&q->queue_lock);
>>>>>>            /* blkg_list is pushed at the head, reverse walk to
>>>>>> initialize parents first */
>>>>>>          list_for_each_entry_reverse(blkg, &q->blkg_list, q_node) {
>>>>>> @@ -1625,10 +1627,11 @@ int blkcg_activate_policy(struct gendisk
>>>>>> *disk, const struct blkcg_policy *pol)
>>>>>>          __set_bit(pol->plid, q->blkcg_pols);
>>>>>>          ret = 0;
>>>>>>            spin_unlock_irq(&q->queue_lock);
>>>>>>      out:
>>>>>> +    mutex_unlock(&q->blkcg_mutex);
>>>>>>          if (queue_is_mq(q))
>>>>>>              blk_mq_unfreeze_queue(q, memflags);
>>>>>>          if (pinned_blkg)
>>>>>>              blkg_put(pinned_blkg);
>>>>>>          if (pd_prealloc)
>>>>> If the policy allocation fails, we jump to the lable enomem: and
>>>>> teardown pds.
>>>>> But I see this path still only acquires ->queue_lock. Don't we also
>>>>> need
>>>>> to protect it with ->blkcg_mutex?
>>>> Yes, I agree we should protect it as well.
>>>
>>> Just take a closer look at the code, the enomem is already protected by
>>> blkcg_mutex :)
>>>
>>
>> Oh yes, but the ->blkcg_mutex is never released if we jump to enomem.
>> So that may potentially cause deadlock. We need to release ->blkcg_mutex
>> once blkcg_policy_teardown_pds() returns. Or may be refactor code (or add
>> comment) so that it's easy to realize or spot the ->blkcg_mutex is
>> acquired
>> and then released around blkcg_policy_teardown_pds().
> 
> the enomem will goto out at last, and blkcg_mutex do released. The code is
> a bit hacky.
> 
>>
>>>>
>>>>> Moreover I still see race between blkg insertion in blkg_create()
>>>>> which
>>>>> still doesn't use ->blkcg_mutex and so list traversal in
>>>>> bfq_end_wr_async()
>>>>> may still race with blkg_create(), isn't it? I remember you once told
>>>>> this will be handled in another series but I couldn't find that yet.
>>>> This is the set:
>>>>
>>>> [PATCH 0/8] blk-cgroup: remove queue_lock nesting from blkcg paths - Yu
>>>> Kuai <https://lore.kernel.org/all/cover.1780621988.git.yukuai@fygo.io/>
>>>>
>>>> Noted that set just make sure queue_lock is not nested under other
>>>> atomic
>>>> context, and that set do not acquire blkcg_mutex for blkg_create()
>>>> yet. Howerver,
>>>> with that set it'll be easy to convert all queue_lock to blkcg_mtuex
>>>> for blkg
>>>> protection, and together with lots of blk-cgroup code cleanups.
>>>>
>>
>> Okay, so are you planning to send a follow-up patchset that replaces
>> ->queue_lock
>> with ->blkcg_mutex for protecting the blkg_list? If so, I'd still
>> prefer acquiring
>> ->blkcg_mutex around blkg_create() in this patchset. That would
>> address the race
>> between blkg_create() and the blkg_list traversal in
>> bfq_end_wr_async(), while the
>> subsequent series can focus on cleaning up and removing the remaining
>> ->queue_lock
>> usage for blkg protection.
> 
> Yes, there is a follow-up patchset. When this set was posted, blkg_create is still
> called with queue_lock held, so I can't do that. However, not that the other set
> is already applied, I can hold blkcg_mutex for blkg_create() now.
> 

If you already have a follow-up patchset that replaces ->queue_lock with
->blkcg_mutex for protecting blkg_list, then I think it might make sense
to send that out first (if you haven't already) and hold off this series for
now. That way, the blkg_create() race can be addressed first, and this
series can build on top of those changes.

BTW, if that follow-up series is merged first, I suspect the first two patches in
this series may no longer be necessary, leaving only patches 3/4 and 4/4.

Thanks,
--Nilay

next prev parent reply	other threads:[~2026-06-29  5:34 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-24  6:46 [PATCH 1/2] md/linear: add fault-tolerant mode for unraid-like setups Yu Kuai
2026-06-24  6:46 ` [PATCH 2/2] ext4: add unraid mount option for single-disk-per-group mode Yu Kuai
2026-06-24  6:46 ` [PATCH v2 0/4] blk-cgroup: fix blkg list and policy data races Yu Kuai
2026-06-24  6:46 ` [PATCH v2 1/4] blk-cgroup: protect q->blkg_list iteration in blkg_destroy_all() with blkcg_mutex Yu Kuai
2026-06-24  6:46 ` [PATCH v2 2/4] blk-cgroup: fix race between policy activation and blkg destruction Yu Kuai
2026-06-25 15:08   ` Nilay Shroff
2026-06-26  1:50     ` yu kuai
2026-06-26  1:52       ` yu kuai
2026-06-26  6:12         ` Nilay Shroff
2026-06-27  4:13           ` yu kuai
2026-06-29  5:33             ` Nilay Shroff [this message]
2026-06-29  9:03               ` yu kuai
2026-06-24  6:46 ` [PATCH v2 3/4] blk-cgroup: skip dying blkg in blkcg_activate_policy() Yu Kuai
2026-06-24  6:46 ` [PATCH v2 4/4] blk-cgroup: factor policy pd teardown loop into helper Yu Kuai
2026-06-24  6:55 ` [PATCH 1/2] md/linear: add fault-tolerant mode for unraid-like setups yu kuai

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c44fcd50-6341-4617-b6cb-cad3c65a7a75@linux.ibm.com \
    --to=nilay@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=cgroups@vger.kernel.org \
    --cc=hch@lst.de \
    --cc=josef@toxicpanda.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=tj@kernel.org \
    --cc=yizhou.tang@shopee.com \
    --cc=yukuai@fygo.io \
    --cc=zhengqixing@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox