From: Tejun Heo <tj@kernel.org>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
Bert Karwatzki <spasswolf@web.de>,
Michal Koutny <mkoutny@suse.com>,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH] cgroup: Wait for dying tasks to leave on rmdir
Date: Mon, 23 Mar 2026 09:55:40 -1000 [thread overview]
Message-ID: <acGavAFVTfggKIKy@slm.duckdns.org> (raw)
In-Reply-To: <20260323113252.xsuwQA3z@linutronix.de>
Hello,
On Mon, Mar 23, 2026 at 12:32:52PM +0100, Sebastian Andrzej Siewior wrote:
...
> I saw instances on PREEMPT_RT where the above cgroup_is_populated()
> reported true due to cgrp->nr_populated_csets = 1, the following
> iterator returned NULL but in that time do_cgroup_task_dead() saw no
> waiter and continued without a wake_up and then the following schedule()
> hung.
Ah, right, false->true is protected by cgroup_mutex but true->false is only
by css_set_lock. It should check populated again with css_set_lock held and
then do prepare_to_wait().
> There is no serialisation between this wait/ check and latter wake. An
> alternative would be to check and prepare_to_wait() under css_set_lock.
Yeap.
> > + finish_wait(&cgrp->dying_populated_waitq, &wait);
> > + mutex_lock(&cgroup_mutex);
> > + goto retry;
> > +}
>
> Then I added my RCU patch. This led to a problem already during boot up
> (didn't manage to get to the test suite).
Is that the patch to move cgroup_task_dead() to delayed_put_task_struct()? I
don't think we can delay populated state update till usage count reaches
zero. e.g. bpf_task_acquire() can be used by arbitrary bpf programs and will
pin the usage count indefinitely delaying populated state update. Similar to
delaying the event to free path, you can construct a deadlock scenario too.
> systemd-1 places modprobe-1044 in a cgroup, then destroys the cgroup.
> It hangs in cgroup_drain_dying() because nr_populated_csets is still 1.
> modprobe-1044 is still there in Z so the cgroup removal didn't get there
> yet. That irq_work was quicker than RCU in this case. This can be
> reproduced without RCU by
Isn't this the exact scenario? systemd is the one who should reap and drop
the usage count but it's waiting for rmdir() to finish which can't finish
due to the usage count which hasn't been reapted by systemd? We can't
interlock these two. They have to make progress independently.
> - irq_work_queue(this_cpu_ptr(&cgrp_dead_tasks_iwork));
> + schedule_delayed_work(this_cpu_ptr(&cgrp_delayed_tasks_iwork), HZ);
>
> So there is always a one second delay. If I give up waiting after 10secs
> then it boots eventually and there are no zombies around. The test_core
> seems to complete…
>
> Having the irq_work as-is, then the "cgroup_dead()" happens on the HZ
> tick. test_core then complains just with
> | not ok 7 test_cgcore_populated
The test is assuming that waitpid() success guarantees cgroup !populated
event. While before all these changes, that held, it wasn't intentional and
the test just picked up on arbitrary ordering. I'll just remove that
particular test.
Thanks.
--
tejun
next prev parent reply other threads:[~2026-03-23 19:55 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-23 3:58 [PATCH] cgroup: Wait for dying tasks to leave on rmdir Tejun Heo
2026-03-23 11:32 ` Sebastian Andrzej Siewior
2026-03-23 19:55 ` Tejun Heo [this message]
2026-03-24 8:21 ` Sebastian Andrzej Siewior
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=acGavAFVTfggKIKy@slm.duckdns.org \
--to=tj@kernel.org \
--cc=bigeasy@linutronix.de \
--cc=cgroups@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mkoutny@suse.com \
--cc=oliver.sang@intel.com \
--cc=spasswolf@web.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox