From: Chen Ridong <chenridong@huaweicloud.com>
To: tj@kernel.org, hannes@cmpxchg.org, mkoutny@suse.com, lizefan@huawei.com
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
lujialin4@huawei.com, chenridong@huawei.com,
gaoyingjie@uniontech.com
Subject: Re: [PATCH -next] cgroup: remove offline draining in root destruction to avoid hung_tasks
Date: Tue, 22 Jul 2025 19:14:01 +0800
Message-ID: <be35ab6e-6670-414c-ab3b-c86a690c6cef@huaweicloud.com>
In-Reply-To: <20250722092444.4108989-1-chenridong@huaweicloud.com>
On 2025/7/22 17:24, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> A hung task can occur during LTP cgroup testing when repeatedly
> mounting/unmounting perf_event and net_prio controllers with
> systemd.unified_cgroup_hierarchy=1. The hang manifests in
> cgroup_lock_and_drain_offline() during root destruction.
>
> Call Trace:
> cgroup_lock_and_drain_offline+0x14c/0x1e8
> cgroup_destroy_root+0x3c/0x2c0
> css_free_rwork_fn+0x248/0x338
> process_one_work+0x16c/0x3b8
> worker_thread+0x22c/0x3b0
> kthread+0xec/0x100
> ret_from_fork+0x10/0x20
>
> Root Cause:
>
> CPU0                              CPU1
> mount perf_event                  umount net_prio
> cgroup1_get_tree                  cgroup_kill_sb
> rebind_subsystems                 // root destruction enqueues
>                                   // cgroup_destroy_wq
> // kill all perf_event css
> // one perf_event css A is dying
> // css A offline enqueues cgroup_destroy_wq
> // root destruction will be executed first
>                                   css_free_rwork_fn
>                                   cgroup_destroy_root
>                                   cgroup_lock_and_drain_offline
>                                   // some perf descendants are dying
>                                   // cgroup_destroy_wq max_active = 1
>                                   // waiting for css A to die
>
> Problem scenario:
> 1. CPU0 mounts perf_event (rebind_subsystems)
> 2. CPU1 unmounts net_prio (cgroup_kill_sb), queuing root destruction work
> 3. A dying perf_event CSS gets queued for offline after root destruction
> 4. Root destruction waits for offline completion, but offline work is
> blocked behind root destruction in cgroup_destroy_wq (max_active=1)
>
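The failure mode above is generic to ordered workqueues: a work item that
blocks on a later-queued item on the same max_active = 1 workqueue can never
be woken, because the second item cannot start until the first one returns.
Below is a minimal, self-contained module sketch of that pattern, for
illustration only: demo_wq, first_fn, second_fn and the other names are
invented, and only the alloc_workqueue(..., 0, 1) ordering behaviour mirrors
cgroup_destroy_wq, which is created with the same parameters in
kernel/cgroup/cgroup.c.

#include <linux/module.h>
#include <linux/workqueue.h>
#include <linux/completion.h>

static struct workqueue_struct *demo_wq;
static DECLARE_COMPLETION(second_done);

static void second_fn(struct work_struct *work)
{
	/* Would unblock first_fn(), but never runs before it returns. */
	complete(&second_done);
}

static void first_fn(struct work_struct *work)
{
	/*
	 * Deadlocks: with max_active = 1, second_work cannot start until
	 * this callback returns.  Same shape as cgroup_destroy_root()
	 * waiting in cgroup_lock_and_drain_offline() for a css offline
	 * work item queued behind it on cgroup_destroy_wq.
	 */
	wait_for_completion(&second_done);
}

static DECLARE_WORK(first_work, first_fn);
static DECLARE_WORK(second_work, second_fn);

static int __init demo_init(void)
{
	/* max_active = 1, as for cgroup_destroy_wq */
	demo_wq = alloc_workqueue("demo_wq", 0, 1);
	if (!demo_wq)
		return -ENOMEM;

	queue_work(demo_wq, &first_work);	/* runs, then blocks */
	queue_work(demo_wq, &second_work);	/* stuck behind first_work */
	return 0;
}
module_init(demo_init);

MODULE_LICENSE("GPL");

Loading such a module would leave the demo_wq worker stalled in first_fn()
indefinitely, and the hung task detector would be expected to flag it, much
like the trace at the top of the patch.
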
> Solution:
> Move cgroup_lock_and_drain_offline() to the start of unmount operations.
> This ensures:
> 1. cgroup_lock_and_drain_offline() will not be called within
> cgroup_destroy_wq context.
> 2. No new dying csses for the subsystem being unmounted can appear in
> cgrp_dfl_root between unmount start and subsystem rebinding.
>
> Fixes: 334c3679ec4b ("cgroup: reimplement rebind_subsystems() using cgroup_apply_control() and friends")
> Reported-by: Gao Yingjie <gaoyingjie@uniontech.com>
> Signed-off-by: Chen Ridong <chenridong@huawei.com>
> ---
> kernel/cgroup/cgroup.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
> index 312c6a8b55bb..7a71410b350e 100644
> --- a/kernel/cgroup/cgroup.c
> +++ b/kernel/cgroup/cgroup.c
> @@ -1346,8 +1346,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
>
> trace_cgroup_destroy_root(root);
>
> - cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
> -
> + cgroup_lock();
> BUG_ON(atomic_read(&root->nr_cgrps));
> BUG_ON(!list_empty(&cgrp->self.children));
>
> @@ -2336,6 +2335,7 @@ static void cgroup_kill_sb(struct super_block *sb)
> *
> * And don't kill the default root.
> */
> + cgroup_lock_and_drain_offline(&cgrp_dfl_root.cgrp);
> if (list_empty(&root->cgrp.self.children) && root != &cgrp_dfl_root &&
> !percpu_ref_is_dying(&root->cgrp.self.refcnt))
> percpu_ref_kill(&root->cgrp.self.refcnt);
Sorry, this patch is a mistake; I will send a new one.
Best regards,
Ridong