* WARNING at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
@ 2026-06-01 12:41 Matt Fleming
From: Matt Fleming @ 2026-06-01 12:41 UTC (permalink / raw)
  To: Tejun Heo, David Vernet
  Cc: Andrea Righi, Changwoo Min, sched-ext, kernel-team, Matt Fleming

Hi,

We're hitting a WARN_ON_ONCE in scx_cgroup_move_task() while running
scx_lavd. The stack is:

    scx_cgroup_move_task+0xa8/0xb0
    sched_move_task+0x134/0x290
    cpu_cgroup_attach+0x39/0x70
    cgroup_migrate_execute+0x37d/0x450
    cgroup_update_dfl_csses+0x1e3/0x270
    cgroup_subtree_control_write+0x3e7/0x440

The trigger is systemd's user@UID.service startup. The user manager
writes "+cpu +memory +pids" to its own subtree_control. Because cpu is
already inherited, the cpu task_group doesn't change. However,
find_css_set() returns a css_set whose subsys[cpu] pointer differs
from the source (a different css object with the same tg_cgrp()). That
sets the cpu bit in mgctx->ss_mask, so both cpu_cgroup_can_attach() and
cpu_cgroup_attach() run.
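A minimal userspace model of the pointer comparison described above may help. It is not kernel code. It assumes, as the report says, that the cpu bit lands in the migration mask whenever the source and destination css_sets hold different cpu css pointers, even when both css objects belong to the same cgroup. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code. A css is per-controller state that
 * points back at the cgroup it belongs to. Re-enabling a controller on the
 * same cgroup yields a new css object for that same cgroup.
 */
struct model_cgroup { int id; };
struct model_css { const struct model_cgroup *cgroup; };

enum { MODEL_SS_CPU, MODEL_SS_MEMORY, MODEL_SS_COUNT };

struct model_css_set { const struct model_css *subsys[MODEL_SS_COUNT]; };

/* Set bit i whenever the css pointers differ (pointer identity only). */
static unsigned int model_ss_mask(const struct model_css_set *from,
				  const struct model_css_set *to)
{
	unsigned int mask = 0;

	for (int i = 0; i < MODEL_SS_COUNT; i++)
		if (from->subsys[i] != to->subsys[i])
			mask |= 1u << i;
	return mask;
}

/* Cgroup identity behind two css objects, i.e. "same tg_cgrp()". */
static bool model_same_cgroup(const struct model_css *a,
			      const struct model_css *b)
{
	return a->cgroup == b->cgroup;
}
```

In this model, two distinct cpu css objects backed by one cgroup produce a mask with the cpu bit set while model_same_cgroup() still reports no cgroup change, which is the combination that reaches cpu_cgroup_attach().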

scx_cgroup_can_attach() checks "from == to" in tg_cgrp() and skips the
task, leaving cgrp_moving_from NULL. But cpu_cgroup_attach() calls
sched_move_task() unconditionally, so scx_cgroup_move_task() fires and
WARNs on the NULL.
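The mismatch between the two hooks can be sketched the same way. This is a hypothetical model: model_can_attach() stands in for the cgroup-identity check in scx_cgroup_can_attach(), model_move_task_old() stands in for the pre-fix gate in scx_cgroup_move_task(), and a counter replaces WARN_ON_ONCE(). The model always counts a prep call when the cgroup changes, which simplifies the real op handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not kernel APIs. */
struct model_cgroup { int id; };
struct model_task {
	const struct model_cgroup *cgrp_moving_from; /* NULL unless prep ran */
};

static const bool model_has_cgroup_move = true;
static int model_prep_calls, model_move_calls, model_warn_count;

/* Like WARN_ON_ONCE(): evaluates to cond, reports only the first hit. */
static bool model_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		model_warn_count++;
	}
	return cond;
}

/* can_attach side: keyed on cgroup identity, skips unchanged cgroups. */
static void model_can_attach(struct model_task *p,
			     const struct model_cgroup *from,
			     const struct model_cgroup *to)
{
	if (from == to)
		return; /* cgrp_moving_from stays NULL */
	model_prep_calls++;
	p->cgrp_moving_from = from;
}

/* Pre-fix move side: reached for every task because the css changed. */
static void model_move_task_old(struct model_task *p)
{
	if (model_has_cgroup_move &&
	    !model_warn_on_once(!p->cgrp_moving_from))
		model_move_calls++;
}
```

A task whose cgroup is unchanged is skipped by model_can_attach() but still reaches model_move_task_old(), so the warning fires once while the move op is (correctly) not called.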

I can reproduce it by repeatedly running systemctl stop/start on
user@UID.service. The hit rate varies with the css_set cache state.

Any thoughts on the best way to resolve this?

Thanks,
Matt


* [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
@ 2026-06-01 19:22 ` Tejun Heo
From: Tejun Heo @ 2026-06-01 19:22 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Emil Tsalapatis, linux-kernel, Matt Fleming, kernel-team

A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
its own subtree_control while a sched_ext scheduler is loaded:

  WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0
   scx_cgroup_move_task+0xa8/0xb0
   sched_move_task+0x134/0x290
   cpu_cgroup_attach+0x39/0x70
   cgroup_migrate_execute+0x37d/0x450
   cgroup_update_dfl_csses+0x1e3/0x270
   cgroup_subtree_control_write+0x3e7/0x440

scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu
cgroup changes. It can still be NULL when scx_cgroup_move_task() runs,
through this sequence:

  Step                               Result
  ---------------------------------  ----------------------------------
  1. cpu enabled on cgroup G         cpu css = A
  2. cpu toggled off then on for G   A killed, B created (same cgroup)
  3. an exiting task keeps A alive   migration skips it, A now stale
  4. +memory migrates G              stale A vs current B pulls cpu in
  5. cpu attach runs for all tasks   hits a live, cpu-unchanged task
  6. scx_cgroup_move_task() on it    cgrp_moving_from NULL -> WARN

The mismatch is that scx_cgroup_can_attach() keys on cgroup identity
while migration drives the move on css identity, so a NULL cgrp_moving_from
here is a legitimate css-only migration, not a missing prep.

The call is already gated on cgrp_moving_from, so just drop the warning.
ops.cgroup_prep_move() and ops.cgroup_move() stay paired.
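Under the same kind of hypothetical model, the gate in this patch treats a NULL cgrp_moving_from as "nothing to report to the BPF scheduler". The sketch below checks that prep and move calls stay paired when one task changes cgroup and another only changes css. All model_* names are illustrative and not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; names are illustrative, not kernel APIs. */
struct model_cgroup { int id; };
struct model_task {
	const struct model_cgroup *cgrp_moving_from; /* NULL unless prep ran */
};

static const bool model_has_cgroup_move = true;
static int model_prep_calls, model_move_calls;

/* can_attach side: only a real cgroup change arms cgrp_moving_from. */
static void model_can_attach(struct model_task *p,
			     const struct model_cgroup *from,
			     const struct model_cgroup *to)
{
	if (from == to)
		return;
	model_prep_calls++;
	p->cgrp_moving_from = from;
}

/* Post-fix gate: a NULL cgrp_moving_from is a css-only move, skip quietly. */
static void model_move_task_new(struct model_task *p)
{
	if (model_has_cgroup_move && p->cgrp_moving_from)
		model_move_calls++;
}
```

Because the move call is gated on the same field that the prep path sets, every counted move corresponds to exactly one counted prep, and the css-only task is skipped without a warning.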

Fixes: 819513666966 ("sched_ext: Add cgroup support")
Cc: stable@vger.kernel.org # v6.12+
Reported-by: Matt Fleming <mfleming@cloudflare.com>
Closes: https://lore.kernel.org/all/20260601124156.2205704-1-mfleming@cloudflare.com/
Signed-off-by: Tejun Heo <tj@kernel.org>
---
 kernel/sched/ext.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 012ca8b..a1f7698 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -4293,11 +4293,13 @@ void scx_cgroup_move_task(struct task_struct *p)
 		return;
 
 	/*
-	 * @p must have ops.cgroup_prep_move() called on it and thus
-	 * cgrp_moving_from set.
+	 * scx_cgroup_can_attach() sets cgrp_moving_from only when the task's
+	 * cgroup changes. Migration keys off css rather than cgroup identity,
+	 * so it can hand an unchanged-cgroup task here with cgrp_moving_from
+	 * NULL. Nothing to report to the BPF scheduler then, so skip it and
+	 * keep prep_move and move paired.
 	 */
-	if (SCX_HAS_OP(sch, cgroup_move) &&
-	    !WARN_ON_ONCE(!p->scx.cgrp_moving_from))
+	if (SCX_HAS_OP(sch, cgroup_move) && p->scx.cgrp_moving_from)
 		SCX_CALL_OP_TASK(sch, cgroup_move, task_rq(p),
 				 p, p->scx.cgrp_moving_from,
 				 tg_cgrp(task_group(p)));


* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
@ 2026-06-01 20:19   ` Andrea Righi
From: Andrea Righi @ 2026-06-01 20:19 UTC (permalink / raw)
  To: Tejun Heo
  Cc: David Vernet, Changwoo Min, sched-ext, Emil Tsalapatis,
	linux-kernel, Matt Fleming, kernel-team

Hello,

On Mon, Jun 01, 2026 at 09:22:37AM -1000, Tejun Heo wrote:
> A WARN fires when systemd's user manager writes "+cpu +memory +pids" to
> its own subtree_control while a sched_ext scheduler is loaded:
> 
> [...]

Makes sense to me.

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea



* Re: [PATCH sched_ext/for-7.1-fixes] sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()
@ 2026-06-02 21:35   ` Tejun Heo
From: Tejun Heo @ 2026-06-02 21:35 UTC (permalink / raw)
  To: David Vernet, Andrea Righi, Changwoo Min, sched-ext
  Cc: Emil Tsalapatis, linux-kernel, Matt Fleming, kernel-team

Applied to sched_ext/for-7.1-fixes.

Thanks.
--
tejun
