From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
Bert Karwatzki <spasswolf@web.de>,
Michal Koutny <mkoutny@suse.com>,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] cgroup: Wait for dying tasks to leave on rmdir
Date: Tue, 24 Mar 2026 09:21:47 +0100 [thread overview]
Message-ID: <20260324082147.9ysLN_6x@linutronix.de> (raw)
In-Reply-To: <acGavAFVTfggKIKy@slm.duckdns.org>
On 2026-03-23 09:55:40 [-1000], Tejun Heo wrote:
> Hello,
Hi,
> > Then I added my RCU patch. This led to a problem already during boot up
> > (didn't manage to get to the test suite).
>
> Is that the patch to move cgroup_task_dead() to delayed_put_task_struct()? I
> don't think we can delay populated state update till usage count reaches
> zero. e.g. bpf_task_acquire() can be used by arbitrary bpf programs and will
> pin the usage count indefinitely delaying populated state update. Similar to
> delaying the event to free path, you can construct a deadlock scenario too.
Okay, then. I expected it to be a limited window within a bpf program or
sched_ext.
> > systemd-1 places modprobe-1044 in a cgroup, then destroys the cgroup.
> > It hangs in cgroup_drain_dying() because nr_populated_csets is still 1.
> > modprobe-1044 is still there in Z so the cgroup removal didn't get there
> > yet. That irq_work was quicker than RCU in this case. This can be
> > reproduced without RCU by
>
> Isn't this the exact scenario? systemd is the one who should reap and drop
> the usage count but it's waiting for rmdir() to finish which can't finish
> due to the usage count which hasn't been reaped by systemd? We can't
> interlock these two. They have to make progress independently.
But nobody is holding it back. For some reason systemd-1 did not reap
modprobe-1044 first but went for the rmdir() first. I noticed it with
RCU first, but it was also there after delaying the cleanup by one
second without RCU.
> > - irq_work_queue(this_cpu_ptr(&cgrp_dead_tasks_iwork));
> > + schedule_delayed_work(this_cpu_ptr(&cgrp_delayed_tasks_iwork), HZ);
> >
> > So there is always a one-second delay. If I give up waiting after 10 secs
> > then it boots eventually and there are no zombies around. The test_core
> > seems to complete…
> >
> > With the irq_work as-is, cgroup_dead() happens on the HZ
> > tick. test_core then complains just with
> > | not ok 7 test_cgcore_populated
>
> The test is assuming that waitpid() success guarantees cgroup !populated
> event. While before all these changes, that held, it wasn't intentional and
> the test just picked up on arbitrary ordering. I'll just remove that
> particular test.
Okay, thanks.
> Thanks.
Sebastian
Thread overview: 4+ messages
2026-03-23 3:58 [PATCH] cgroup: Wait for dying tasks to leave on rmdir Tejun Heo
2026-03-23 11:32 ` Sebastian Andrzej Siewior
2026-03-23 19:55 ` Tejun Heo
2026-03-24 8:21 ` Sebastian Andrzej Siewior [this message]