From: Tejun Heo <tj@kernel.org>
To: David Vernet <void@manifault.com>,
	Andrea Righi <arighi@nvidia.com>,
	Changwoo Min <changwoo@igalia.com>
Cc: Cheng-Yang Chou <yphbchou0911@gmail.com>,
	Emil Tsalapatis <emil@etsalapatis.com>,
	sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org
Subject: [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks
Date: Mon, 27 Apr 2026 14:16:33 -1000
Message-ID: <20260428001635.3293997-1-tj@kernel.org>

Hello,

a72f73c4dd9b ("cgroup: Don't expose dead tasks in cgroup") made
css_task_iter_advance() skip exiting tasks. That broke scx_task_iter's
cgroup-scoped mode: it now silently skips tasks that are still on
scx_tasks but past exit_signals(), so the abort path in
scx_sub_enable_workfn() can miss SCX_TASK_SUB_INIT-marked exiting tasks
and leak __scx_init_task() state.
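
As a rough illustration of the failure mode above, here is a small
userspace model, not kernel code. Every name in it (toy_task, toy_iter,
TOY_ITER_WITH_DEAD, toy_count_sub_init) is hypothetical and only stands
in for the corresponding kernel concept. It assumes the iterator filters
exiting tasks unless the caller opts in, and shows how a cleanup walk
then misses an exiting task that still carries init state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical opt-in flag; models the idea behind CSS_TASK_ITER_WITH_DEAD. */
#define TOY_ITER_WITH_DEAD 0x1u

struct toy_task {
	int pid;
	bool exiting;	/* models a task that is past exit_signals() */
	bool sub_init;	/* models an SCX_TASK_SUB_INIT-style mark */
};

struct toy_iter {
	const struct toy_task *tasks;
	size_t nr;
	size_t pos;
	unsigned int flags;
};

/* Return the next visible task, or NULL when the walk is done. */
static const struct toy_task *toy_iter_next(struct toy_iter *it)
{
	while (it->pos < it->nr) {
		const struct toy_task *t = &it->tasks[it->pos++];

		/* Exiting tasks are hidden unless the caller opted in. */
		if (t->exiting && !(it->flags & TOY_ITER_WITH_DEAD))
			continue;
		return t;
	}
	return NULL;
}

/* Count the marked tasks that an abort-style cleanup walk can reach. */
static size_t toy_count_sub_init(const struct toy_task *tasks, size_t nr,
				 unsigned int flags)
{
	struct toy_iter it = { .tasks = tasks, .nr = nr, .pos = 0, .flags = flags };
	const struct toy_task *t;
	size_t found = 0;

	while ((t = toy_iter_next(&it)) != NULL)
		if (t->sub_init)
			found++;
	return found;
}
```

With two marked tasks, one of them exiting, a walk with flags == 0
reaches only one of them, while a walk with TOY_ITER_WITH_DEAD reaches
both. The missed task is the analogue of the leaked state described
above.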

Restoring iter coverage exposes a separate latent issue: cgroup
iteration can return tasks whose sched_ext_dead() has already torn down
their per-task SCX state (cgroup_task_dead() runs after sched_ext_dead()
in finish_task_switch() and is irq-work deferred on PREEMPT_RT). Callers
trip WARN_ON_ONCE() / fail assertions when they see such a task.

This pair fixes both:

 0001 sched_ext: Include exiting tasks in cgroup iter
      Adds CSS_TASK_ITER_WITH_DEAD; scx_task_iter opts in.

 0002 sched_ext: Skip past-sched_ext_dead() tasks in
      scx_task_iter_next_locked()
      Adds SCX_TASK_OFF_TASKS, set in sched_ext_dead() under the rq
      lock; scx_task_iter_next_locked() skips flagged tasks under the
      same lock.
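
The locking idea in 0002 can be sketched in userspace as follows. This
is a single-threaded model under stated assumptions, not the patch
itself: mock_rq, mock_task, MOCK_TASK_OFF_TASKS and the mock_*()
helpers are all hypothetical stand-ins, the "lock" is a plain flag
checked with assert(), and the caller drops the lock after each
returned task, which may differ from how the real iterator manages it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag; models the role of SCX_TASK_OFF_TASKS. */
#define MOCK_TASK_OFF_TASKS 0x1u

struct mock_rq {
	bool locked;	/* stand-in for the rq lock */
};

struct mock_task {
	int pid;
	unsigned int flags;
	bool scx_state_valid;	/* false once per-task state is torn down */
	struct mock_rq *rq;
};

static void mock_rq_lock(struct mock_rq *rq)
{
	assert(!rq->locked);
	rq->locked = true;
}

static void mock_rq_unlock(struct mock_rq *rq)
{
	assert(rq->locked);
	rq->locked = false;
}

/* Models sched_ext_dead(): teardown and flag update happen under the lock. */
static void mock_sched_ext_dead(struct mock_task *t)
{
	mock_rq_lock(t->rq);
	t->scx_state_valid = false;
	t->flags |= MOCK_TASK_OFF_TASKS;
	mock_rq_unlock(t->rq);
}

/*
 * Models scx_task_iter_next_locked(): returns the next task that has not
 * been torn down, with its rq locked, or NULL at the end of the walk.
 * The flag is tested under the same lock that the teardown path holds.
 */
static struct mock_task *mock_iter_next_locked(struct mock_task *tasks,
					       size_t nr, size_t *pos)
{
	while (*pos < nr) {
		struct mock_task *t = &tasks[(*pos)++];

		mock_rq_lock(t->rq);
		if (t->flags & MOCK_TASK_OFF_TASKS) {
			mock_rq_unlock(t->rq);
			continue;	/* state already gone; skip */
		}
		return t;	/* caller unlocks t->rq when done */
	}
	return NULL;
}
```

Because the flag is set and tested under the same lock, a caller in
this model never sees a task whose state has been torn down: either the
teardown has not happened yet, or the task is skipped.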

Verified with a stress harness that runs a 4-deep nested sub-sched
hierarchy with continuous fork/switch workers and random sub-sched
restarts at 5s intervals. Baseline (without the patches) wedged a
192-CPU bare-metal box in 66s and oopsed a 24-thread bare-metal box at
227s. The patched kernel ran clean for 30min on both machines and on an
8-vCPU vng instance, with 0 WARN/BUG/lockdep reports across ~1000
sub-sched restarts.

Based on sched_ext/for-7.1-fixes (deb7b2f93d01).

 include/linux/cgroup.h    |  1 +
 include/linux/sched/ext.h |  1 +
 kernel/cgroup/cgroup.c    |  8 +++++---
 kernel/sched/ext.c        | 39 +++++++++++++++++++++++++++++----------
 4 files changed, 36 insertions(+), 13 deletions(-)

Git tree: git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git iter-include-dead-v1

Thanks.

--
tejun

Thread overview (4+ messages):
2026-04-28  0:16 Tejun Heo [this message]
2026-04-28  0:16 ` [PATCH 1/2] sched_ext: Include exiting tasks in cgroup iter Tejun Heo
2026-04-28  0:16 ` [PATCH 2/2] sched_ext: Skip past-sched_ext_dead() tasks in scx_task_iter_next_locked() Tejun Heo
2026-05-04 19:10 ` [PATCHSET sched_ext/for-7.1-fixes] sched_ext: Fix cgroup iter coverage of in-do_exit tasks Tejun Heo
