* WARNING at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
@ 2026-06-01 12:41 Matt Fleming
2026-06-01 19:22 ` [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task() Tejun Heo
0 siblings, 1 reply; 4+ messages in thread
From: Matt Fleming @ 2026-06-01 12:41 UTC (permalink / raw)
To: Tejun Heo, David Vernet
Cc: Andrea Righi, Changwoo Min, sched-ext, kernel-team, Matt Fleming
Hi,

We're hitting a WARN_ON_ONCE in scx_cgroup_move_task() while running
scx_lavd. The stack is:

  scx_cgroup_move_task+0xa8/0xb0
  sched_move_task+0x134/0x290
  cpu_cgroup_attach+0x39/0x70
  cgroup_migrate_execute+0x37d/0x450
  cgroup_update_dfl_csses+0x1e3/0x270
  cgroup_subtree_control_write+0x3e7/0x440

The trigger is systemd's user@UID.service startup. The user manager
writes +cpu +memory +pids to its own subtree_control. Because +cpu is
already inherited, the cpu task_group doesn't change -- but
find_css_set() returns a css_set whose subsys[cpu] pointer differs
from the source (different css object, same tg_cgrp()). That sets
the cpu bit in mgctx->ss_mask, so both cpu_cgroup_can_attach() and
cpu_cgroup_attach() run.

scx_cgroup_can_attach() checks "from == to" in tg_cgrp() and skips the
task, leaving cgrp_moving_from NULL. But cpu_cgroup_attach() calls
sched_move_task() unconditionally, so scx_cgroup_move_task() fires and
WARNs on the NULL.

I managed to reproduce by cycling systemctl stop/start
user@UID.service. The hit rate varies with css_set cache state.

Any thoughts on the best way to resolve this?

Thanks,
Matt

^ permalink raw reply [flat|nested] 4+ messages in thread
* [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
  2026-06-01 12:41 WARNING at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0 Matt Fleming
@ 2026-06-01 19:22 ` Tejun Heo
  2026-06-01 20:19   ` Andrea Righi
  2026-06-02 21:35   ` Tejun Heo
  0 siblings, 2 replies; 4+ messages in thread
From: Tejun Heo @ 2026-06-01 19:22 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Emil Tsalapatis, linux-kernel, Matt Fleming, kernel-team

A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
its own subtree_control while a sched_ext scheduler is loaded:

  WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
   scx_cgroup_move_task+0xa8/0xb0
   sched_move_task+0x134/0x290
   cpu_cgroup_attach+0x39/0x70
   cgroup_migrate_execute+0x37d/0x450
   cgroup_update_dfl_csses+0x1e3/0x270
   cgroup_subtree_control_write+0x3e7/0x440

scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu
cgroup changes. It can still be NULL when scx_cgroup_move_task() runs,
through this sequence:

  Step                               Result
  ---------------------------------  ----------------------------------
  1. cpu enabled on cgroup G         cpu css = A
  2. cpu toggled off then on for G   A killed, B created (same cgroup)
  3. an exiting task keeps A alive   migration skips it, A now stale
  4. +memory migrates G              stale A vs current B pulls cpu in
  5. cpu attach runs for all tasks   hits a live, cpu-unchanged task
  6. scx_cgroup_move_task() on it    cgrp_moving_from NULL -> WARN

The mismatch is that scx_cgroup_can_attach() keys on cgroup identity
while migration drives the move on css identity, so a NULL cgrp_moving_from
here is a legitimate css-only migration, not a missing prep.

The call is already gated on cgrp_moving_from, so just drop the warning.
ops.cgroup_prep_move() and ops.cgroup_move() stay paired.

Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/all/20260601124156.2205704-1-mfleming@cloudflare.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 012ca8b..a1f7698 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4293,11 +4293,13 @@ void scx_cgroup_move_task(struct task_struct *p)
 		return;
 
 	/*
-	 * @p must have ops.cgroup_prep_move() called on it and thus
-	 * cgrp_moving_from set.
+	 * scx_cgroup_can_attach() sets cgrp_moving_from only when the task's
+	 * cgroup changes. Migration keys off css rather than cgroup identity,
+	 * so it can hand an unchanged-cgroup task here with cgrp_moving_from
+	 * NULL. Nothing to report to the BPF scheduler then, so skip it and
+	 * keep prep_move and move paired.
 	 */
-	if (SCX_HAS_OP(sch, cgroup_move) &&
-	    !WARN_ON_ONCE(!p->scx.cgrp_moving_from))
+	if (SCX_HAS_OP(sch, cgroup_move) && p->scx.cgrp_moving_from)
 		SCX_CALL_OP_TASK(sch, cgroup_move, task_rq(p),
 				 p, p->scx.cgrp_moving_from,
 				 tg_cgrp(task_group(p)));

^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
  2026-06-01 19:22 ` [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task() Tejun Heo
@ 2026-06-01 20:19   ` Andrea Righi
  2026-06-02 21:35   ` Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Andrea Righi @ 2026-06-01 20:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis,
	linux-kernel, Matt Fleming, kernel-team

Hello,

On Mon, Jun 01, 2026 at 09:22:37AM -1000, Tejun Heo wrote:
> A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
> its own subtree_control while a sched_ext scheduler is loaded:
>
>   WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
>    scx_cgroup_move_task+0xa8/0xb0
>    sched_move_task+0x134/0x290
>    cpu_cgroup_attach+0x39/0x70
>    cgroup_migrate_execute+0x37d/0x450
>    cgroup_update_dfl_csses+0x1e3/0x270
>    cgroup_subtree_control_write+0x3e7/0x440
>
> scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu
> cgroup changes. It can still be NULL when scx_cgroup_move_task() runs,
> through this sequence:
>
>   Step                               Result
>   ---------------------------------  ----------------------------------
>   1. cpu enabled on cgroup G         cpu css = A
>   2. cpu toggled off then on for G   A killed, B created (same cgroup)
>   3. an exiting task keeps A alive   migration skips it, A now stale
>   4. +memory migrates G              stale A vs current B pulls cpu in
>   5. cpu attach runs for all tasks   hits a live, cpu-unchanged task
>   6. scx_cgroup_move_task() on it    cgrp_moving_from NULL -> WARN
>
> The mismatch is that scx_cgroup_can_attach() keys on cgroup identity
> while migration drives the move on css identity, so a NULL cgrp_moving_from
> here is a legitimate css-only migration, not a missing prep.
>
> The call is already gated on cgrp_moving_from, so just drop the warning.
> ops.cgroup_prep_move() and ops.cgroup_move() stay paired.
>
> Fixes: 819513666966 ("sched_ext: Add cgroup support")
> Cc: stable@vger.kernel.org # v6.12+
> Reported-by: Matt Fleming <mfleming@cloudflare.com>
> Closes: https://lore.kernel.org/all/20260601124156.2205704-1-mfleming@cloudflare.com/
> Signed-off-by: Tejun Heo <tj@kernel.org>

Makes sense to me.

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

> ---
>  kernel/sched/ext.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 012ca8b..a1f7698 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -4293,11 +4293,13 @@ void scx_cgroup_move_task(struct task_struct *p)
>  		return;
>
>  	/*
> -	 * @p must have ops.cgroup_prep_move() called on it and thus
> -	 * cgrp_moving_from set.
> +	 * scx_cgroup_can_attach() sets cgrp_moving_from only when the task's
> +	 * cgroup changes. Migration keys off css rather than cgroup identity,
> +	 * so it can hand an unchanged-cgroup task here with cgrp_moving_from
> +	 * NULL. Nothing to report to the BPF scheduler then, so skip it and
> +	 * keep prep_move and move paired.
>  	 */
> -	if (SCX_HAS_OP(sch, cgroup_move) &&
> -	    !WARN_ON_ONCE(!p->scx.cgrp_moving_from))
> +	if (SCX_HAS_OP(sch, cgroup_move) && p->scx.cgrp_moving_from)
>  		SCX_CALL_OP_TASK(sch, cgroup_move, task_rq(p),
>  				 p, p->scx.cgrp_moving_from,
>  				 tg_cgrp(task_group(p)));

^ permalink raw reply [flat|nested] 4+ messages in thread
* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
  2026-06-01 19:22 ` [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task() Tejun Heo
  2026-06-01 20:19   ` Andrea Righi
@ 2026-06-02 21:35   ` Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2026-06-02 21:35 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Emil Tsalapatis, linux-kernel, Matt Fleming, kernel-team

Applied to sched_ext/for-7.1-fixes.

Thanks.

--
tejun

^ permalink raw reply [flat|nested] 4+ messages in thread