From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
Andrea Righi <arighi@nvidia.com>,
Changwoo Min <changwoo@igalia.com>
Cc: Cheng-Yang Chou <yphbchou0911@gmail.com>,
Emil Tsalapatis <emil@etsalapatis.com>,
sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCH 2/2] sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked()
Date: Mon, 27 Apr 2026 14:16:35 -1000 [thread overview]
Message-ID: <20260428001635.3293997-3-tj@kernel.org> (raw)
In-Reply-To: <20260428001635.3293997-1-tj@kernel.org>
scx_task_iter's cgroup-scoped mode can return tasks whose
sched_ext_dead() has already completed: cgroup_task_dead() removes
from cset->tasks after sched_ext_dead() in finish_task_switch() and is
irq-work deferred on PREEMPT_RT. The global mode is fine -
sched_ext_dead() removes from scx_tasks via list_del_init() first.
Callers (sub-sched enable prep/abort/apply, scx_sub_disable(),
scx_fail_parent()) assume returned tasks are still on @sch and trip
WARN_ON_ONCE() or operate on torn-down state otherwise.
Set %SCX_TASK_OFF_TASKS in sched_ext_dead() under @p's rq lock and
have scx_task_iter_next_locked() skip flagged tasks under the same
lock. Setter and reader serialize on the per-task rq lock - no race.
Signed-off-by: Tejun Heo <tj@kernel.org>
---
include/linux/sched/ext.h | 1 +
kernel/sched/ext.c | 33 +++++++++++++++++++++++++--------
2 files changed, 26 insertions(+), 8 deletions(-)
diff --git a/include/linux/sched/ext.h b/include/linux/sched/ext.h
index 1a3af2ea2a79..adb9a4de068a 100644
--- a/include/linux/sched/ext.h
+++ b/include/linux/sched/ext.h
@@ -101,6 +101,7 @@ enum scx_ent_flags {
SCX_TASK_DEQD_FOR_SLEEP = 1 << 3, /* last dequeue was for SLEEP */
SCX_TASK_SUB_INIT = 1 << 4, /* task being initialized for a sub sched */
SCX_TASK_IMMED = 1 << 5, /* task is on local DSQ with %SCX_ENQ_IMMED */
+ SCX_TASK_OFF_TASKS = 1 << 6, /* removed from scx_tasks by sched_ext_dead() */
/*
* Bits 8 and 9 are used to carry task state:
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index cf43be8ac1aa..6c3c40499404 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -928,16 +928,27 @@ static struct task_struct *scx_task_iter_next_locked(struct scx_task_iter *iter)
*
* Test for idle_sched_class as only init_tasks are on it.
*/
- if (p->sched_class != &idle_sched_class)
- break;
- }
- if (!p)
- return NULL;
+ if (p->sched_class == &idle_sched_class)
+ continue;
- iter->rq = task_rq_lock(p, &iter->rf);
- iter->locked_task = p;
+ iter->rq = task_rq_lock(p, &iter->rf);
+ iter->locked_task = p;
- return p;
+ /*
+ * cgroup_task_dead() removes the dead tasks from cset->tasks
+ * after sched_ext_dead() and cgroup iteration may see tasks
+ * which already finished sched_ext_dead(). %SCX_TASK_OFF_TASKS
+ * is set by sched_ext_dead() under @p's rq lock. Test it to
+ * avoid visiting tasks which are already dead from SCX POV.
+ */
+ if (p->scx.flags & SCX_TASK_OFF_TASKS) {
+ __scx_task_iter_rq_unlock(iter);
+ continue;
+ }
+
+ return p;
+ }
+ return NULL;
}
/**
@@ -3816,6 +3827,11 @@ void sched_ext_dead(struct task_struct *p)
/*
* @p is off scx_tasks and wholly ours. scx_root_enable()'s READY ->
* ENABLED transitions can't race us. Disable ops for @p.
+ *
+ * %SCX_TASK_OFF_TASKS synchronizes against cgroup task iteration - see
+ * scx_task_iter_next_locked(). NONE tasks need no marking: cgroup
+ * iteration is only used from sub-sched paths, which require root
+ * enabled. Root enable transitions every live task to at least READY.
*/
if (scx_get_task_state(p) != SCX_TASK_NONE) {
struct rq_flags rf;
@@ -3823,6 +3839,7 @@ void sched_ext_dead(struct task_struct *p)
rq = task_rq_lock(p, &rf);
scx_disable_and_exit_task(scx_task_sched(p), p);
+ p->scx.flags |= SCX_TASK_OFF_TASKS;
task_rq_unlock(rq, p, &rf);
}
}
--
2.53.0
next prev parent reply other threads:[~2026-04-28 0:16 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-04-28 0:16 [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks Tejun Heo
2026-04-28 0:16 ` [PATCH 1/2] sched_ext: Include exiting tasks in cgroup iter Tejun Heo
2026-04-28 0:16 ` Tejun Heo [this message]
2026-05-04 19:10 ` [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260428001635.3293997-3-tj@kernel.org \
--to=tj@kernel.org \
--cc=arighi@nvidia.com \
--cc=changwoo@igalia.com \
--cc=emil@etsalapatis.com \
--cc=linux-kernel@vger.kernel.org \
--cc=sched-ext@lists.linux.dev \
--cc=void@manifault.com \
--cc=yphbchou0911@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.