From: Chen Ridong <chenridong@huaweicloud.com>
To: Hillf Danton <hdanton@sina.com>
Cc: Michal Koutny <mkoutny@suse.com>,
tj@kernel.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org, lujialin4@huawei.com,
chenridong@huawei.com, gaoyingjie@uniontech.com
Subject: Re: [PATCH v2 -next] cgroup: remove offline draining in root destruction to avoid hung_tasks
Date: Fri, 15 Aug 2025 18:28:53 +0800
Message-ID: <afc95938-0eb5-427b-a2dd-a7eccf54d891@huaweicloud.com>
In-Reply-To: <20250815100213.4599-1-hdanton@sina.com>
On 2025/8/15 18:02, Hillf Danton wrote:
> On Fri, 15 Aug 2025 15:29:56 +0800 Chen Ridong wrote:
>> On 2025/8/15 10:40, Hillf Danton wrote:
>>> On Fri, Jul 25, 2025 at 09:42:05AM +0800, Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>>> On Tue, Jul 22, 2025 at 11:27:33AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
>>>>>> CPU0                                CPU1
>>>>>> mount perf_event                    umount net_prio
>>>>>> cgroup1_get_tree                    cgroup_kill_sb
>>>>>> rebind_subsystems                   // root destruction enqueues
>>>>>>                                     // cgroup_destroy_wq
>>>>>> // kill all perf_event css
>>>>>> // one perf_event css A is dying
>>>>>> // css A offline enqueues cgroup_destroy_wq
>>>>>> // root destruction will be executed first
>>>>>>                                     css_free_rwork_fn
>>>>>>                                     cgroup_destroy_root
>>>>>>                                     cgroup_lock_and_drain_offline
>>>>>>                                     // some perf descendants are dying
>>>>>>                                     // cgroup_destroy_wq max_active = 1
>>>>>>                                     // waiting for css A to die
>>>>>>
>>>>>> Problem scenario:
>>>>>> 1. CPU0 mounts perf_event (rebind_subsystems)
>>>>>> 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
>>>>>> 3. A dying perf_event CSS gets queued for offline after root destruction
>>>>>> 4. Root destruction waits for offline completion, but offline work is
>>>>>> blocked behind root destruction in cgroup_destroy_wq (max_active=1)
>>>>>
>>>>> What's concerning me is why umount of the net_prio hierarchy waits for
>>>>> draining of the default hierarchy? (Where you then run into conflict with
>>>>> perf_event that's implicit_on_dfl.)
>>>>>
>>> /*
>>> * cgroup destruction makes heavy use of work items and there can be a lot
>>> * of concurrent destructions. Use a separate workqueue so that cgroup
>>> * destruction work items don't end up filling up max_active of system_wq
>>> * which may lead to deadlock.
>>> */
>>>
>>> If the task hang can be reliably reproduced, it is the right time to cut
>>> max_active off for cgroup_destroy_wq, according to its comment.
>>
>> Hi Danton,
>>
>> Thank you for your feedback.
>>
>> While modifying max_active could be a viable solution, I’m unsure whether it might introduce other
>> side effects. Instead, I’ve proposed an alternative approach in v3 of the patch, which I believe
>> addresses the issue more comprehensively.
>>
> Given your reproducer [1], it is simple to test with max_active cut.
>
> Frankly, I do not think v3 is a correct fix, because it leaves the root cause
> intact. Nor is it cgroup specific, even given the high concurrency in destruction.
>
> [1] https://lore.kernel.org/lkml/39e05402-40c7-4631-a87b-8e3747ceddc6@huaweicloud.com/
Hi Danton,
Thank you for your reply.
To clarify, when you mentioned "cut max_active off", did you mean setting max_active of
cgroup_destroy_wq to 1?
Note that cgroup_destroy_wq is already created with max_active = 1:

```
cgroup_destroy_wq = alloc_workqueue("cgroup_destroy", 0, 1);
```
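
To make the ordering problem concrete, here is a minimal, hypothetical kernel-module sketch (my own illustration, not the real cgroup code; names such as demo_destroy_wq, root_destroy_fn and css_offline_fn are made up) of the head-of-line blocking that an ordered (max_active = 1) workqueue produces: the first work item waits for a completion that only the second work item can signal, but the second item sits behind the first on the same workqueue, so the queue stalls, just like root destruction waiting for css A's offline.

```
#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

static struct workqueue_struct *demo_destroy_wq;	/* stand-in for cgroup_destroy_wq */
static DECLARE_COMPLETION(offline_done);		/* stand-in for "css A went offline" */

/* Models cgroup_destroy_root() blocking in cgroup_lock_and_drain_offline(). */
static void root_destroy_fn(struct work_struct *work)
{
	wait_for_completion(&offline_done);	/* never completes */
}

/* Models css A's offline work, which would signal the waiter. */
static void css_offline_fn(struct work_struct *work)
{
	complete(&offline_done);		/* never runs: stuck behind root_destroy_fn */
}

static DECLARE_WORK(root_destroy_work, root_destroy_fn);
static DECLARE_WORK(css_offline_work, css_offline_fn);

static int __init demo_init(void)
{
	demo_destroy_wq = alloc_workqueue("demo_destroy", 0, 1);	/* max_active = 1 */
	if (!demo_destroy_wq)
		return -ENOMEM;
	queue_work(demo_destroy_wq, &root_destroy_work);	/* runs first and blocks */
	queue_work(demo_destroy_wq, &css_offline_work);		/* can never start */
	return 0;
}
module_init(demo_init);

MODULE_LICENSE("GPL");
```

With both items on the same ordered queue, the hang does not depend on timing: once the waiter is queued ahead of its waker, the stall is guaranteed.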
The v3 change prevents a subsystem root's destruction from being blocked by the offline events of
an unrelated subsystem. Since root destruction should only proceed after all of the root's own
descendants have been destroyed, it should not need to be blocked by children's offline events at
all. My testing with the reproducer confirms that this fixes the issue I encountered.
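
For completeness, the shape of the change I am describing is roughly the following; this is only my sketch of the idea, not the literal v3 hunk, so please refer to the v3 posting for the exact diff. The point is to stop draining offline csses of the default hierarchy inside cgroup_destroy_root() and simply take cgroup_mutex instead:

```
In cgroup_destroy_root():

-	cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
+	cgroup_lock();
```

Root destruction then serializes only on cgroup_mutex and no longer waits for offline work that may
be queued behind it on the same cgroup_destroy_wq.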
--
Best regards,
Ridong