public inbox for linux-kernel@vger.kernel.org
* [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks
@ 2026-04-28  0:16 Tejun Heo
  2026-04-28  0:16 ` [PATCH 1/2] sched_ext: Include exiting tasks in cgroup iter Tejun Heo
  2026-04-28  0:16 ` [PATCH 2/2] sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked() Tejun Heo
  0 siblings, 2 replies; 3+ messages in thread
From: Tejun Heo @ 2026-04-28  0:16 UTC
  To: David Vernet, Andrea Righi, Changwoo Min
  Cc: Cheng-Yang Chou, Emil Tsalapatis, sched-ext, linux-kernel

Hello,

a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks. That broke scx_task_iter's
cgroup-scoped mode: it now silently skips tasks that are still on
scx_tasks but past exit_signals(), so the abort path in
scx_sub_enable_workfn() can miss SCX_TASK_SUB_INIT-marked exiting tasks
and leak __scx_init_task() state.
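
As a rough illustration of the failure mode above, here is a userspace
toy model, not kernel code. Every name in it (toy_task, toy_iter_next,
toy_abort_and_count_leaks) is hypothetical, and the two booleans only
stand in for "past exit_signals()" and "SCX_TASK_SUB_INIT-marked". It
shows how an abort path that relies on a filtering iterator never
visits exiting tasks and so leaves their init state behind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a task; not the kernel's task_struct. */
struct toy_task {
	bool exiting;	/* models "past exit_signals()" */
	bool sub_init;	/* models per-task init state that abort must undo */
};

/*
 * Model of a cgroup-scoped walk. When @skip_exiting is true, exiting
 * tasks are filtered out, which is the behavior the cover letter
 * attributes to the iterator after a72f73c4dd9b.
 */
static struct toy_task *toy_iter_next(struct toy_task *tasks, size_t nr,
				      size_t *pos, bool skip_exiting)
{
	assert(pos != NULL);

	while (*pos < nr) {
		struct toy_task *p = &tasks[(*pos)++];

		if (skip_exiting && p->exiting)
			continue;
		return p;
	}
	return NULL;
}

/*
 * Model of an abort path: undo init state on every task the iterator
 * returns, then count the tasks whose state was never undone.
 */
static size_t toy_abort_and_count_leaks(struct toy_task *tasks, size_t nr,
					bool skip_exiting)
{
	size_t pos = 0, leaked = 0;
	struct toy_task *p;

	while ((p = toy_iter_next(tasks, nr, &pos, skip_exiting)))
		p->sub_init = false;

	for (size_t i = 0; i < nr; i++)
		if (tasks[i].sub_init)
			leaked++;
	return leaked;
}
```

In this model, a task that is both exiting and marked is counted as
leaked whenever skip_exiting is true, and is cleaned up when the walk
includes exiting tasks.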

Restoring iter coverage exposes a separate latent issue: cgroup
iteration can return tasks whose sched_ext_dead() has already torn down
their per-task SCX state (cgroup_task_dead() runs after sched_ext_dead()
in finish_task_switch() and is irq-work deferred on PREEMPT_RT). Callers
trip WARN_ON_ONCE() / fail assertions when they see such a task.

This pair fixes both:

 0001 sched_ext: Include exiting tasks in cgroup iter
      Adds CSS_TASK_ITER_WITH_DEAD; scx_task_iter opts in.

 0002 sched_ext: Skip past-sched_ext_dead() tasks in
      scx_task_iter_next_locked()
      Adds SCX_TASK_OFF_TASKS, set in sched_ext_dead() under the rq
      lock; scx_task_iter_next_locked() skips flagged tasks under the
      same lock.
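
The locking pattern described for 0002 can be sketched in userspace as
follows. This is a single-threaded toy model under stated assumptions,
not the patch itself: toy_rq is a boolean standing in for the rq lock,
TOY_OFF_TASKS models SCX_TASK_OFF_TASKS with an arbitrary value, and
all toy_* names are hypothetical. In this model the iterator returns
with the lock held and the caller drops it. The point is only that the
flag is written and read under the same lock, so the iterator never
hands out a task whose state is already torn down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_OFF_TASKS	0x1u	/* models SCX_TASK_OFF_TASKS; value is arbitrary */

/* Toy lock for a single-threaded model; asserts catch misuse. */
struct toy_rq {
	bool locked;
};

struct toy_task {
	struct toy_rq *rq;
	unsigned int flags;
	bool scx_state_valid;	/* false once per-task state is torn down */
};

static void toy_rq_lock(struct toy_rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void toy_rq_unlock(struct toy_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Models sched_ext_dead(): flag and teardown happen under the rq lock. */
static void toy_sched_ext_dead(struct toy_task *p)
{
	toy_rq_lock(p->rq);
	p->flags |= TOY_OFF_TASKS;
	p->scx_state_valid = false;
	toy_rq_unlock(p->rq);
}

/*
 * Models scx_task_iter_next_locked(): take the task's rq lock, skip the
 * task if it is flagged, otherwise return it with the lock still held.
 */
static struct toy_task *toy_iter_next_locked(struct toy_task *tasks,
					     size_t nr, size_t *pos)
{
	while (*pos < nr) {
		struct toy_task *p = &tasks[(*pos)++];

		toy_rq_lock(p->rq);
		if (p->flags & TOY_OFF_TASKS) {
			toy_rq_unlock(p->rq);
			continue;
		}
		return p;	/* caller unlocks in this model */
	}
	return NULL;
}
```

A caller would loop on toy_iter_next_locked(), use the returned task,
and call toy_rq_unlock(p->rq) before asking for the next one. Any task
that already went through toy_sched_ext_dead() is never returned.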

Verified with a stress harness that runs a 4-deep nested sub-sched
hierarchy with continuous fork/switch workers and random sub-sched
restarts at 5s intervals. Baseline (without the patches) wedged a
192-CPU bare-metal box in 66s and oopsed a 24-thread bare-metal box at
227s. The patched kernel ran clean for 30min on both, plus an 8-vCPU
vng VM, with 0 WARN/BUG/lockdep reports across ~1000 sub-restarts.

Based on sched_ext/for-7.1-fixes (deb7b2f93d01).

 include/linux/cgroup.h    |  1 +
 include/linux/sched/ext.h |  1 +
 kernel/cgroup/cgroup.c    |  8 +++++---
 kernel/sched/ext.c        | 39 +++++++++++++++++++++++++++++----------
 4 files changed, 36 insertions(+), 13 deletions(-)

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git iter-include-dead-v1

Thanks.

--
tejun
