From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
Andrea Righi <arighi@nvidia.com>,
Changwoo Min <changwoo@igalia.com>
Cc: Cheng-Yang Chou <yphbchou0911@gmail.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH 1/2] sched_ext: Include exiting tasks in cgroup iter
Date: Mon, 27 Apr 2026 14:16:34 -1000 [thread overview]
Message-ID: <20260428001635.3293997-2-tj@kernel.org> (raw)
In-Reply-To: <20260428001635.3293997-1-tj@kernel.org>
a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks so cgroup.procs stays consistent
with waitpid() visibility. Unfortunately, this broke scx_task_iter.
scx_task_iter walks either scx_tasks (global) or a cgroup subtree via
css_task_iter() and the two modes are expected to cover the same set of
tasks. After the above change the cgroup-scoped mode silently skips tasks
past exit_signals() that are still on scx_tasks.
scx_sub_enable_workfn()'s abort path is one of the symptoms: an exiting
SCX_TASK_SUB_INIT task can race past the cgroup iter leaking
__scx_init_task() state. Other iterations share the same gap.
Add CSS_TASK_ITER_WITH_DEAD to opt out of the skip and use it from
scx_task_iter().
Fixes: b0e4c2f8a0f0 ("sched_ext: Implement cgroup subtree iteration for scx_task_iter")
Reported-by: Cheng-Yang Chou <yphbchou0911@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
---
include/linux/cgroup.h | 1 +
kernel/cgroup/cgroup.c | 8 +++++---
kernel/sched/ext.c | 6 ++++--
3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e52160e85af4..f6d037a30fd8 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -53,6 +53,7 @@ struct kernel_clone_args;
enum css_task_iter_flags {
CSS_TASK_ITER_PROCS = (1U << 0), /* walk only threadgroup leaders */
CSS_TASK_ITER_THREADED = (1U << 1), /* walk all threaded css_sets in the domain */
+ CSS_TASK_ITER_WITH_DEAD = (1U << 2), /* include exiting tasks */
CSS_TASK_ITER_SKIPPED = (1U << 16), /* internal flags */
};
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 1f084ee71443..e51ce4cd3739 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -5059,10 +5059,12 @@ static void css_task_iter_advance(struct css_task_iter *it)
task = list_entry(it->task_pos, struct task_struct, cg_list);
/*
- * Hide tasks that are exiting but not yet removed. Keep zombie
- * leaders with live threads visible.
+ * Hide tasks that are exiting but not yet removed by default. Keep
+ * zombie leaders with live threads visible. Usages that need to walk
+ * every existing task can opt out via CSS_TASK_ITER_WITH_DEAD.
*/
- if ((task->flags & PF_EXITING) && !atomic_read(&task->signal->live))
+ if (!(it->flags & CSS_TASK_ITER_WITH_DEAD) &&
+ (task->flags & PF_EXITING) && !atomic_read(&task->signal->live))
goto repeat;
if (it->flags & CSS_TASK_ITER_PROCS) {
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 9eda20e5fdb8..cf43be8ac1aa 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -766,7 +766,8 @@ static void scx_task_iter_start(struct scx_task_iter *iter, struct cgroup *cgrp)
lockdep_assert_held(&cgroup_mutex);
iter->cgrp = cgrp;
iter->css_pos = css_next_descendant_pre(NULL, &iter->cgrp->self);
- css_task_iter_start(iter->css_pos, 0, &iter->css_iter);
+ css_task_iter_start(iter->css_pos, CSS_TASK_ITER_WITH_DEAD,
+ &iter->css_iter);
return;
}
#endif
@@ -866,7 +867,8 @@ static struct task_struct *scx_task_iter_next(struct scx_task_iter *iter)
iter->css_pos = css_next_descendant_pre(iter->css_pos,
&iter->cgrp->self);
if (iter->css_pos)
- css_task_iter_start(iter->css_pos, 0, &iter->css_iter);
+ css_task_iter_start(iter->css_pos, CSS_TASK_ITER_WITH_DEAD,
+ &iter->css_iter);
}
return NULL;
}
--
2.53.0
next prev parent reply other threads:[~2026-04-28 0:16 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 0:16 [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks Tejun Heo
2026-04-28 0:16 ` Tejun Heo [this message]
2026-04-28 0:16 ` [PATCH 2/2] sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked() Tejun Heo
2026-05-04 19:10 ` [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260428001635.3293997-2-tj@kernel.org \
--to=tj@kernel.org \
--cc=arighi@nvidia.com \
--cc=changwoo@igalia.com \
--cc=emil@etsalapatis.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sched-ext@lists.linux.dev \
--cc=void@manifault.com \
--cc=yphbchou0911@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.