From: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
To: Tejun Heo <tj@kernel.org>
Cc: cgroups@vger.kernel.org, linux-kernel@vger.kernel.org,
Bert Karwatzki <spasswolf@web.de>,
Michal Koutny <mkoutny@suse.com>,
Johannes Weiner <hannes@cmpxchg.org>,
kernel test robot <oliver.sang@intel.com>
Subject: Re: [PATCH v2] cgroup: Wait for dying tasks to leave on rmdir
Date: Tue, 24 Mar 2026 10:04:02 +0100
Message-ID: <20260324090402.k7NkNcEp@linutronix.de>
In-Reply-To: <20260323200205.1063629-1-tj@kernel.org>

On 2026-03-23 10:02:05 [-1000], Tejun Heo wrote:
> a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") hid PF_EXITING
> tasks from cgroup.procs so that systemd doesn't see tasks that have already
> been reaped via waitpid(). However, the populated counter (nr_populated_csets)
> is only decremented when the task later passes through cgroup_task_dead() in
> finish_task_switch(). This means cgroup.procs can appear empty while the
> cgroup is still populated, causing rmdir to fail with -EBUSY.
>
> Fix this by making cgroup_rmdir() wait for dying tasks to fully leave. If the
> cgroup is populated but all remaining tasks have PF_EXITING set (the task
> iterator returns none due to the existing filter), wait for a kick from
> cgroup_task_dead() and retry. The wait is brief as tasks are removed from the
> cgroup's css_set between PF_EXITING assertion in do_exit() and
> cgroup_task_dead() in finish_task_switch().
>
> v2: cgroup_is_populated() true to false transition happens under css_set_lock
> not cgroup_mutex, so retest under css_set_lock before sleeping to avoid
> missed wakeups (Sebastian).
>
> Fixes: a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup")
> Reported-by: kernel test robot <oliver.sang@intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202603222104.2c81684e-lkp@intel.com
> Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> Signed-off-by: Tejun Heo <tj@kernel.org>
> Cc: Bert Karwatzki <spasswolf@web.de>
> Cc: Michal Koutny <mkoutny@suse.com>
> Cc: cgroups@vger.kernel.org
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>

As mentioned in the other email, if I replace

- irq_work_queue(this_cpu_ptr(&cgrp_dead_tasks_iwork));
+ schedule_delayed_work(this_cpu_ptr(&cgrp_delayed_tasks_iwork), 1 * HZ);

then I hang at boot because it does rmdir() on a cgroup with a task in
state Z. That might suggest a race in which systemd missed a task.

But this fixes the other issue.

Sebastian
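
For readers following the v2 note in the quoted changelog (retest under
css_set_lock before sleeping to avoid missed wakeups), here is a minimal
userspace sketch of that general pattern. It is not the kernel code: the
pthread mutex stands in for css_set_lock, the integer for the populated
counter, and the condition variable for the kick from cgroup_task_dead().
All names in it (demo_lock, demo_kick, demo_populated, dying_task,
wait_until_empty, run_demo) are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins, not kernel symbols:
 * demo_lock      ~ the lock under which "populated" changes
 * demo_populated ~ the populated counter
 * demo_kick      ~ the wakeup sent when a dying task has fully left */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t demo_kick = PTHREAD_COND_INITIALIZER;
static int demo_populated;

/* Exit path: drop the count and send the kick while holding the lock. */
static void *dying_task(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&demo_lock);
	demo_populated--;
	pthread_cond_broadcast(&demo_kick);
	pthread_mutex_unlock(&demo_lock);
	return NULL;
}

/* Waiter: the condition is retested under the same lock that guards the
 * transition, so a kick sent before we sleep cannot be missed. */
static void wait_until_empty(void)
{
	pthread_mutex_lock(&demo_lock);
	while (demo_populated > 0)
		pthread_cond_wait(&demo_kick, &demo_lock);
	pthread_mutex_unlock(&demo_lock);
}

/* Runs one dying task against one waiter; returns the final count. */
static int run_demo(void)
{
	pthread_t t;
	int ret;

	demo_populated = 1;
	ret = pthread_create(&t, NULL, dying_task, NULL);
	assert(ret == 0);
	wait_until_empty();
	pthread_join(t, NULL);
	return demo_populated;
}
```

Calling run_demo() from a main() built with -pthread returns once the
waiter has seen the count reach zero. If the waiter checked the count
without holding demo_lock, the decrement and the kick could both land
between the check and the sleep, and the waiter would block forever.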