* [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler
@ 2026-05-26 13:52 Zicheng Qu
  2026-05-26 15:18 ` sashiko-bot
  2026-05-26 17:20 ` Andrea Righi
  0 siblings, 2 replies; 10+ messages in thread
From: Zicheng Qu @ 2026-05-26 13:52 UTC (permalink / raw)
  To: tj, void, arighi, changwoo, mingo, peterz, juri.lelli,
	vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
	vschneid, kprateek.nayak, haoluo, joshdon, brho, sched-ext,
	linux-kernel
  Cc: tanghui20, zhangqiao22, quzicheng, quzicheng315

From: Zicheng Qu <quzicheng@huawei.com>

When a BPF scheduler is disabled, scx_root_disable() switches tasks
from ext_sched_class back to fair_sched_class directly. This does not
go through __setscheduler_params(), so p->se.load is not rebuilt for
tasks returning to fair.

For example, after enabling a sched_ext BPF scheduler and creating
CPU-bound tasks with different nice values, disabling the BPF scheduler
can leave them running under fair with stale p->se.load. They may then
split CPU time according to the stale weight instead of their current
nice weights.

Rebuild the fair load weight when scx_root_disable() switches a task
from ext_sched_class to fair_sched_class. Use set_load_weight(p, false)
so CFS gets a native load_weight derived from the task's current
policy/static_prio before the task is enqueued on fair.
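
To make the effect described above concrete, here is a minimal user-space
sketch (not kernel code) of how fair splits one CPU between two
always-runnable tasks in proportion to their load weights. The table mirrors
the values of the kernel's sched_prio_to_weight[]; nice_to_weight() and
cpu_share_permille() are hypothetical helper names used only for
illustration.

```c
#include <assert.h>

/* Values mirror the kernel's sched_prio_to_weight[] (nice -20 .. 19). */
static const unsigned int nice_weight[40] = {
	88761, 71755, 56483, 46273, 36291,
	29154, 23254, 18705, 14949, 11916,
	 9548,  7620,  6100,  4904,  3906,
	 3121,  2501,  1991,  1586,  1277,
	 1024,   820,   655,   526,   423,
	  335,   272,   215,   172,   137,
	  110,    87,    70,    56,    45,
	   36,    29,    23,    18,    15,
};

/* Hypothetical helper: map a nice value to its fair load weight. */
static unsigned int nice_to_weight(int nice)
{
	assert(nice >= -20 && nice <= 19);
	return nice_weight[nice + 20];
}

/*
 * Share of one CPU, in permille, that a task of weight w gets when it
 * competes with a single always-runnable task of weight other.
 */
static unsigned int cpu_share_permille(unsigned int w, unsigned int other)
{
	return (unsigned int)((1000ULL * w) / (w + other));
}
```

With correct weights, a nice 0 task against a nice 10 task gets
cpu_share_permille(1024, 110) == 902, roughly a 90/10 split. If the second
task's p->se.load were stale at 1024 (an assumed value, chosen only for
illustration), the split would be 500 permille each regardless of its nice
value.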

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
 kernel/sched/ext.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 65631e577ee9..e5b8509ce7ee 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5967,6 +5967,10 @@ static void scx_root_disable(struct scx_sched *sch)
 
 		scoped_guard (sched_change, p, queue_flags) {
 			p->sched_class = new_class;
+
+			if (old_class == &ext_sched_class &&
+			    new_class == &fair_sched_class)
+				set_load_weight(p, false);
 		}
 
 		scx_disable_and_exit_task(scx_task_sched(p), p);
-- 
2.43.0



* Re: [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler
  2026-05-26 13:52 [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler Zicheng Qu
@ 2026-05-26 15:18 ` sashiko-bot
  2026-05-26 17:20 ` Andrea Righi
  1 sibling, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-26 15:18 UTC (permalink / raw)
  To: Zicheng Qu; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Stale p->se.load weights when transitioning tasks from SCHED_EXT to SCHED_NORMAL via sys_sched_setscheduler()
--

commit d672e37dd0f7bb43fb8f1a521e9d70c91ecf303e
Author: Zicheng Qu <quzicheng@huawei.com>

sched_ext: Rebuild fair weight when disabling BPF scheduler

This commit fixes an issue where tasks switching from the BPF scheduler back
to the fair scheduler during scx_root_disable() retain stale load weights.
It rebuilds the fair weight by calling set_load_weight() so that CFS gets
the correct load_weight derived from the task's current policy.

> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 345aa11b84b28..3126125968d67 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5872,6 +5872,10 @@ static void scx_root_disable(struct scx_sched *sch)
>  
>  		scoped_guard (sched_change, p, queue_flags) {
>  			p->sched_class = new_class;
> +
> +			if (old_class == &ext_sched_class &&
> +			    new_class == &fair_sched_class)
> +				set_load_weight(p, false);

[Severity: High]
This isn't a bug introduced by this patch, but does this same stale weight
problem happen when a task is moved from the BPF scheduler to the fair
scheduler via sys_sched_setscheduler()?

In __sched_setscheduler(), __setscheduler_params() is called before
p->sched_class is updated to the new class. __setscheduler_params() calls
set_load_weight(p, true).

Because p->sched_class is still ext_sched_class at this point,
set_load_weight() invokes reweight_task_scx(). reweight_task_scx() updates
p->scx.weight but completely ignores p->se.load.

Immediately after, p->sched_class is set to fair_sched_class, and the task
is enqueued into the CFS runqueue with its old, stale load weight. Could
tasks transitioning via sys_sched_setscheduler() also run with the wrong
scheduling priority and monopolize CPU time?
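
The ordering described above can be sketched as a small user-space toy
model. This is only an illustration under simplifying assumptions: the toy_*
types and functions are hypothetical stand-ins, and the real p->scx.weight
uses a different scale than p->se.load, which the model ignores.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for kernel state; only the fields discussed above. */
enum toy_class { TOY_CLASS_EXT, TOY_CLASS_FAIR };

struct toy_task {
	enum toy_class class;
	unsigned int se_load;		/* models p->se.load.weight */
	unsigned int scx_weight;	/* models p->scx.weight (scale ignored) */
};

/*
 * Models the dispatch described in the review: with update_load set, the
 * new weight is handed to the task's current class; otherwise se_load is
 * written directly.
 */
static void toy_set_load_weight(struct toy_task *p, unsigned int weight,
				bool update_load)
{
	if (!update_load) {
		p->se_load = weight;
		return;
	}

	if (p->class == TOY_CLASS_EXT)
		p->scx_weight = weight;	/* like reweight_task_scx(): se_load untouched */
	else
		p->se_load = weight;
}

/* Models the ordering: parameters are updated first, the class second. */
static void toy_setscheduler_to_fair(struct toy_task *p, unsigned int new_weight)
{
	assert(p);
	toy_set_load_weight(p, new_weight, true);
	p->class = TOY_CLASS_FAIR;
}
```

Starting from an ext task with se_load == 1024 and calling
toy_setscheduler_to_fair(&t, 110) leaves the task in the fair class with
scx_weight == 110 but se_load still 1024, which is the stale state the
question above is about.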

>  		}
>  
>  		scx_disable_and_exit_task(scx_task_sched(p), p);

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260526135211.1892261-1-quzicheng315@gmail.com?part=1


* Re: [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler
  2026-05-26 13:52 [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler Zicheng Qu
  2026-05-26 15:18 ` sashiko-bot
@ 2026-05-26 17:20 ` Andrea Righi
  2026-05-27  9:40   ` [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches quzicheng315
  1 sibling, 1 reply; 10+ messages in thread
From: Andrea Righi @ 2026-05-26 17:20 UTC (permalink / raw)
  To: Zicheng Qu
  Cc: tj, void, changwoo, mingo, peterz, juri.lelli, vincent.guittot,
	dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
	kprateek.nayak, haoluo, joshdon, brho, sched-ext, linux-kernel,
	tanghui20, zhangqiao22, quzicheng

Hi Zicheng,

On Tue, May 26, 2026 at 09:52:11PM +0800, Zicheng Qu wrote:
> From: Zicheng Qu <quzicheng@huawei.com>
> 
> When a BPF scheduler is disabled, scx_root_disable() switches tasks
> from ext_sched_class back to fair_sched_class directly. This does not
> go through __setscheduler_params(), so p->se.load is not rebuilt for
> tasks returning to fair.
> 
> For example, after enabling a sched_ext BPF scheduler and creating
> CPU-bound tasks with different nice values, disabling the BPF scheduler
> can leave them running under fair with stale p->se.load. They may then
> split CPU time according to the stale weight instead of their current
> nice weights.
> 
> Rebuild the fair load weight when scx_root_disable() switches a task
> from ext_sched_class to fair_sched_class. Use set_load_weight(p, false)
> so CFS gets a native load_weight derived from the task's current
> policy/static_prio before the task is enqueued on fair.
> 
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
> ---
>  kernel/sched/ext.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 65631e577ee9..e5b8509ce7ee 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5967,6 +5967,10 @@ static void scx_root_disable(struct scx_sched *sch)
>  
>  		scoped_guard (sched_change, p, queue_flags) {
>  			p->sched_class = new_class;
> +
> +			if (old_class == &ext_sched_class &&
> +			    new_class == &fair_sched_class)
> +				set_load_weight(p, false);

I'm wondering if we have a similar issue for tasks moving from SCHED_EXT to
SCHED_NORMAL when a scx scheduler is running in partial mode. Maybe we need to
intercept this special case in __sched_setscheduler()? (not necessarily for this
patch, it can be addressed later as a separate follow-up patch).

For now, this makes sense to me.

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

>  		}
>  
>  		scx_disable_and_exit_task(scx_task_sched(p), p);
> -- 
> 2.43.0
> 


* [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
  2026-05-26 17:20 ` Andrea Righi
@ 2026-05-27  9:40   ` quzicheng315
  2026-05-27 11:26     ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: quzicheng315 @ 2026-05-27  9:40 UTC (permalink / raw)
  To: arighi
  Cc: brho, bsegall, changwoo, dietmar.eggemann, haoluo, joshdon,
	juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo, peterz,
	quzicheng315, quzicheng, rostedt, sched-ext, tanghui20, tj,
	vincent.guittot, void, vschneid, zhangqiao22

From: Zicheng Qu <quzicheng315@gmail.com>

Tasks running on sched_ext do not use p->se.load as their active
scheduling weight. Their nice-derived weight is maintained as
p->scx.weight instead.

When such a task switches back to fair, CFS expects p->se.load to match
the task's current policy/static_prio before the task is enqueued.
However, not all ext to fair transitions rebuild p->se.load. For
example, scx_root_disable() switches tasks back to fair directly, and
partial mode can move a task from SCHED_EXT to SCHED_NORMAL through
sched_setscheduler(). In the latter case, set_load_weight(p, true) runs
while p->sched_class is still ext_sched_class, so reweight_task_scx()
updates p->scx.weight but leaves p->se.load stale.

Rebuild the fair load weight in sched_change_end() when the class switch
is from ext_sched_class to fair_sched_class. This is after the class has
been changed and before the task is enqueued on fair, so CFS sees a
native load_weight derived from the task's current policy/static_prio.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
Changes in v2:
- Move the fix from scx_root_disable() to sched_change_end() so the same
  ext-to-fair rebuild also covers partial mode SCHED_EXT to SCHED_NORMAL
  transitions through sched_setscheduler(), as Andrea pointed out.

 kernel/sched/core.c |  2 ++
 kernel/sched/ext.h  | 11 +++++++++++
 2 files changed, 13 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..c694aabc451a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11200,6 +11200,8 @@ void sched_change_end(struct sched_change_ctx *ctx)
 	 */
 	WARN_ON_ONCE(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
 
+	scx_rebuild_fair_weight_on_class_switch(p, ctx->class, p->sched_class);
+
 	if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
 		p->sched_class->switching_to(rq, p);
 
diff --git a/kernel/sched/ext.h b/kernel/sched/ext.h
index 0b7fc46aee08..1f8248c897af 100644
--- a/kernel/sched/ext.h
+++ b/kernel/sched/ext.h
@@ -35,6 +35,14 @@ static inline bool task_on_scx(const struct task_struct *p)
 	return scx_enabled() && p->sched_class == &ext_sched_class;
 }
 
+static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
+							   const struct sched_class *old_class,
+							   const struct sched_class *new_class)
+{
+	if (old_class == &ext_sched_class && new_class == &fair_sched_class)
+		set_load_weight(p, false);
+}
+
 #ifdef CONFIG_SCHED_CORE
 bool scx_prio_less(const struct task_struct *a, const struct task_struct *b,
 		   bool in_fi);
@@ -55,6 +63,9 @@ static inline int scx_check_setscheduler(struct task_struct *p, int policy) { re
 static inline bool task_on_scx(const struct task_struct *p) { return false; }
 static inline bool scx_allow_ttwu_queue(const struct task_struct *p) { return true; }
 static inline void init_sched_ext_class(void) {}
+static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
+							   const struct sched_class *old_class,
+							   const struct sched_class *new_class) {}
 
 #endif	/* CONFIG_SCHED_CLASS_EXT */
 
-- 
2.43.0


* Re: [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
  2026-05-27  9:40   ` [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches quzicheng315
@ 2026-05-27 11:26     ` Peter Zijlstra
  2026-05-28  2:53       ` Zicheng Qu
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2026-05-27 11:26 UTC (permalink / raw)
  To: quzicheng315
  Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo,
	joshdon, juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo,
	quzicheng, rostedt, sched-ext, tanghui20, tj, vincent.guittot,
	void, vschneid, zhangqiao22

On Wed, May 27, 2026 at 05:40:37PM +0800, quzicheng315@gmail.com wrote:
> From: Zicheng Qu <quzicheng315@gmail.com>
> 
> Tasks running on sched_ext do not use p->se.load as their active
> scheduling weight. Their nice-derived weight is maintained as
> p->scx.weight instead.
> 
> When such a task switches back to fair, CFS expects p->se.load to match
> the task's current policy/static_prio before the task is enqueued.
> However, not all ext to fair transitions rebuild p->se.load. For
> example, scx_root_disable() switches tasks back to fair directly, and
> partial mode can move a task from SCHED_EXT to SCHED_NORMAL through
> sched_setscheduler(). In the latter case, set_load_weight(p, true) runs
> while p->sched_class is still ext_sched_class, so reweight_task_scx()
> updates p->scx.weight but leaves p->se.load stale.
> 
> Rebuild the fair load weight in sched_change_end() when the class switch
> is from ext_sched_class to fair_sched_class. This is after the class has
> been changed and before the task is enqueued on fair, so CFS sees a
> native load_weight derived from the task's current policy/static_prio.
> 
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
> ---
> Changes in v2:
> - Move the fix from scx_root_disable() to sched_change_end() so the same
>   ext-to-fair rebuild also covers partial mode SCHED_EXT to SCHED_NORMAL
>   transitions through sched_setscheduler(), as Andrea pointed out.
> 
>  kernel/sched/core.c |  2 ++
>  kernel/sched/ext.h  | 11 +++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b8871449d3c6..c694aabc451a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -11200,6 +11200,8 @@ void sched_change_end(struct sched_change_ctx *ctx)
>  	 */
>  	WARN_ON_ONCE(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
>  
> +	scx_rebuild_fair_weight_on_class_switch(p, ctx->class, p->sched_class);
> +
>  	if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
>  		p->sched_class->switching_to(rq, p);
>  
> diff --git a/kernel/sched/ext.h b/kernel/sched/ext.h
> index 0b7fc46aee08..1f8248c897af 100644
> --- a/kernel/sched/ext.h
> +++ b/kernel/sched/ext.h
> @@ -35,6 +35,14 @@ static inline bool task_on_scx(const struct task_struct *p)
>  	return scx_enabled() && p->sched_class == &ext_sched_class;
>  }
>  
> +static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
> +							   const struct sched_class *old_class,
> +							   const struct sched_class *new_class)
> +{
> +	if (old_class == &ext_sched_class && new_class == &fair_sched_class)
> +		set_load_weight(p, false);
> +}
> +
>  #ifdef CONFIG_SCHED_CORE
>  bool scx_prio_less(const struct task_struct *a, const struct task_struct *b,
>  		   bool in_fi);
> @@ -55,6 +63,9 @@ static inline int scx_check_setscheduler(struct task_struct *p, int policy) { re
>  static inline bool task_on_scx(const struct task_struct *p) { return false; }
>  static inline bool scx_allow_ttwu_queue(const struct task_struct *p) { return true; }
>  static inline void init_sched_ext_class(void) {}
> +static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
> +							   const struct sched_class *old_class,
> +							   const struct sched_class *new_class) {}
>  
>  #endif	/* CONFIG_SCHED_CLASS_EXT */

This is truly horrible. We have 4 class methods involved with switching
classes and you stick in a random call in a place that is called when no
class is changed.

Would not something like this work?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 62a2dcb0d03e..a2eb43bd73b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -14957,6 +14957,11 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 	detach_task_cfs_rq(p);
 }
 
+static void switching_to_fair(struct rq *rq, struct task_struct *p)
+{
+	set_load_weight(p, false);
+}
+
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
 	WARN_ON_ONCE(p->se.sched_delayed);
@@ -15351,6 +15356,7 @@ DEFINE_SCHED_CLASS(fair) = {
 	.prio_changed		= prio_changed_fair,
 	.switching_from		= switching_from_fair,
 	.switched_from		= switched_from_fair,
+	.switching_to		= switching_to_fair,
 	.switched_to		= switched_to_fair,
 
 	.get_rr_interval	= get_rr_interval_fair,


* Re: [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
  2026-05-27 11:26     ` Peter Zijlstra
@ 2026-05-28  2:53       ` Zicheng Qu
  2026-05-28  9:25         ` Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Zicheng Qu @ 2026-05-28  2:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo,
	joshdon, juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo,
	quzicheng, rostedt, sched-ext, tanghui20, tj, vincent.guittot,
	void, vschneid, zhangqiao22, quzicheng315

On Wed, May 27, 2026 at 07:26PM +0800, Peter Zijlstra wrote:

> This is truly horrible. We have 4 class methods involved with switching
> classes and you stick in a random call in a place that is called when no
> class is changed.
>
> Would not something like this work?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 62a2dcb0d03e..a2eb43bd73b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -14957,6 +14957,11 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>   	detach_task_cfs_rq(p);
>   }
>   
> +static void switching_to_fair(struct rq *rq, struct task_struct *p)
> +{
> +	set_load_weight(p, false);
> +}
> +
>   static void switched_to_fair(struct rq *rq, struct task_struct *p)
>   {
>   	WARN_ON_ONCE(p->se.sched_delayed);
> @@ -15351,6 +15356,7 @@ DEFINE_SCHED_CLASS(fair) = {
>   	.prio_changed		= prio_changed_fair,
>   	.switching_from		= switching_from_fair,
>   	.switched_from		= switched_from_fair,
> +	.switching_to		= switching_to_fair,
>   	.switched_to		= switched_to_fair,
>   
>   	.get_rr_interval	= get_rr_interval_fair,

Yes, from the class switch point of view, `switching_to_fair()` is a better
fit.

Before v2, I was weighing three possible places for the fix:

1. Updating `p->se.load` from `reweight_task_scx()`. This would keep the
fair weight in sync while the task is on sched_ext, so switching back to
fair would not need any extra fixup. However, it would also make sched_ext
maintain fair class state even when fair is not using it, which does not
seem like the right ownership model.

2. Rebuilding `p->se.load` from fair's `switching_to` hook. This is the most
natural place semantically, since the task is entering fair and fair
prepares its own state before enqueue. My only concern was that, for
non-ext -> fair paths, `__setscheduler_params()` may have already updated
`p->se.load` through `set_load_weight(p, true)`, so calling
`set_load_weight(p, false)` unconditionally here can be redundant
logically. Functionally, though, it is harmless.

3. Rebuilding in `sched_change_end()` based on the old/new classes. That was
the v2 choice because both classes are available there, the task has not
been enqueued yet, and it covers both `scx_root_disable()` and the
partial-mode `sched_setscheduler()` path. In hindsight, though, this makes
the generic sched_change path handle a scx & fair-specific fixup. That is
more awkward than letting fair prepare its own state in
`switching_to_fair()`.
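
As a rough illustration of option 2, the following user-space toy model
shows a destination class preparing its own state in a `switching_to` hook
that runs after the class pointer changes and before enqueue. All toy_*
names are hypothetical, and the enqueue step is reduced to recording the
weight the class would see.

```c
#include <assert.h>
#include <stddef.h>

struct toy_task;

/* Toy class with only the hook discussed above. */
struct toy_class {
	void (*switching_to)(struct toy_task *p);	/* runs before enqueue */
};

struct toy_task {
	const struct toy_class *class;
	unsigned int nice_weight;	/* weight implied by the current nice value */
	unsigned int se_load;		/* models p->se.load.weight */
	unsigned int enqueued_weight;	/* weight observed at enqueue time */
};

/* Option 2: fair rebuilds its own weight, whatever class p came from. */
static void toy_switching_to_fair(struct toy_task *p)
{
	p->se_load = p->nice_weight;
}

static const struct toy_class toy_fair = { toy_switching_to_fair };
static const struct toy_class toy_ext = { NULL };

/* Models a class change: set the class, run its hook, then enqueue. */
static void toy_change_class(struct toy_task *p, const struct toy_class *next)
{
	assert(p && next);
	p->class = next;
	if (next->switching_to)
		next->switching_to(p);
	p->enqueued_weight = p->se_load;	/* stand-in for the enqueue */
}
```

A task that sat on toy_ext with se_load == 1024 while its nice_weight
dropped to 110 is enqueued with weight 110 after
toy_change_class(&t, &toy_fair). The caller does not need to know which
class the task came from, which is the ownership argument made above.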

I'll respin v3 as you suggested.


Thanks,

Zicheng



* Re: [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
  2026-05-28  2:53       ` Zicheng Qu
@ 2026-05-28  9:25         ` Peter Zijlstra
  2026-05-28 13:12           ` [PATCH v3] sched/fair: Rebuild load weight when switching to fair quzicheng315
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2026-05-28  9:25 UTC (permalink / raw)
  To: Zicheng Qu
  Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo,
	joshdon, juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo,
	quzicheng, rostedt, sched-ext, tanghui20, tj, vincent.guittot,
	void, vschneid, zhangqiao22

On Thu, May 28, 2026 at 10:53:54AM +0800, Zicheng Qu wrote:

> 2. Rebuilding `p->se.load` from fair's `switching_to` hook. This is the most
> natural place semantically, since the task is entering fair and fair
> prepares its own state before enqueue. My only concern was that, for
> non-ext -> fair paths, `__setscheduler_params()` may have already updated
> `p->se.load` through `set_load_weight(p, true)`, so calling
> `set_load_weight(p, false)` unconditionally here can be redundant
> logically. Functionally, though, it is harmless.

Right. We can worry about optimizing this if there's ever a report. I
don't expect this to be noticeable much. If anything, the PI code would
be the one to trip this most often I think.




* [PATCH v3] sched/fair: Rebuild load weight when switching to fair
  2026-05-28  9:25         ` Peter Zijlstra
@ 2026-05-28 13:12           ` quzicheng315
  2026-05-28 14:27             ` Tejun Heo
  2026-05-30  5:06             ` sashiko-bot
  0 siblings, 2 replies; 10+ messages in thread
From: quzicheng315 @ 2026-05-28 13:12 UTC (permalink / raw)
  To: peterz
  Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo,
	joshdon, juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo,
	quzicheng315, quzicheng, rostedt, sched-ext, tanghui20, tj,
	vincent.guittot, void, vschneid, zhangqiao22

From: Zicheng Qu <quzicheng@huawei.com>

Tasks that run outside fair may not keep p->se.load in sync with their
current scheduling policy and static priority. sched_ext, for example,
uses p->scx.weight as the active scheduling weight, so p->se.load can be
stale when a task moves back to fair.

The fair_sched_class expects the sched_entity load weight to be valid
before the task is enqueued. Rebuild it from fair's switching_to hook,
which runs after the class has been changed to fair and before enqueue,
so both sched_ext disable and SCHED_EXT to SCHED_NORMAL transitions get
a native fair load weight.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
Changes in v3:
- Move the rebuild into fair's switching_to hook, as suggested by Peter.
  This lets fair prepare its own state before enqueue and avoids adding a
  sched_ext/fair-specific fixup to the generic sched_change_end() path.

Changes in v2:
- Move the fix from scx_root_disable() to the class switch path so it also
  covers partial-mode SCHED_EXT to SCHED_NORMAL transitions through
  sched_setscheduler(). Andrea identified this missing case in the v1
  discussion.

 kernel/sched/fair.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..3a21ceefcadf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13837,6 +13837,15 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 	detach_task_cfs_rq(p);
 }
 
+static void switching_to_fair(struct rq *rq, struct task_struct *p)
+{
+	/*
+	 * Tasks may come from classes that don't keep se.load up to date.
+	 * Rebuild it before the task is enqueued.
+	 */
+	set_load_weight(p, false);
+}
+
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
 	WARN_ON_ONCE(p->se.sched_delayed);
@@ -14233,6 +14242,7 @@ DEFINE_SCHED_CLASS(fair) = {
 	.prio_changed		= prio_changed_fair,
 	.switching_from		= switching_from_fair,
 	.switched_from		= switched_from_fair,
+	.switching_to		= switching_to_fair,
 	.switched_to		= switched_to_fair,
 
 	.get_rr_interval	= get_rr_interval_fair,
-- 
2.53.0


* Re: [PATCH v3] sched/fair: Rebuild load weight when switching to fair
  2026-05-28 13:12           ` [PATCH v3] sched/fair: Rebuild load weight when switching to fair quzicheng315
@ 2026-05-28 14:27             ` Tejun Heo
  2026-05-30  5:06             ` sashiko-bot
  1 sibling, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2026-05-28 14:27 UTC (permalink / raw)
  To: quzicheng315
  Cc: peterz, arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo,
	joshdon, juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo,
	quzicheng, rostedt, sched-ext, tanghui20, vincent.guittot, void,
	vschneid, zhangqiao22

On Thu, May 28, 2026 at 09:12:38PM +0800, quzicheng315@gmail.com wrote:
> From: Zicheng Qu <quzicheng@huawei.com>
> 
> Tasks that run outside fair may not keep p->se.load in sync with their
> current scheduling policy and static priority. sched_ext, for example,
> uses p->scx.weight as the active scheduling weight, so p->se.load can be
> stale when a task moves back to fair.
> 
> The fair_sched_class expects the sched_entity load weight to be valid
> before the task is enqueued. Rebuild it from fair's switching_to hook,
> which runs after the class has been changed to fair and before enqueue,
> so both sched_ext disable and SCHED_EXT to SCHED_NORMAL transitions get
> a native fair load weight.
> 
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> 
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

-- 
tejun


* Re: [PATCH v3] sched/fair: Rebuild load weight when switching to fair
  2026-05-28 13:12           ` [PATCH v3] sched/fair: Rebuild load weight when switching to fair quzicheng315
  2026-05-28 14:27             ` Tejun Heo
@ 2026-05-30  5:06             ` sashiko-bot
  1 sibling, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-30  5:06 UTC (permalink / raw)
  To: quzicheng315; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Updating task load weight without scaling its PELT averages corrupts load_sum when attaching to fair.
--

commit 5e323dec0121afc6858dcf6c332ec85a8dc01718
Author: Zicheng Qu <quzicheng@huawei.com>

sched/fair: Rebuild load weight when switching to fair

Tasks running outside the fair scheduling class may not keep their
load weight in sync. This commit rebuilds the load weight from the
switching_to hook before enqueueing back into fair.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 62a2dcb0d03e6..3671b512dcf7f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c

[ ... ]

> @@ -14957,6 +14957,15 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>  	detach_task_cfs_rq(p);
>  }
>  
> +static void switching_to_fair(struct rq *rq, struct task_struct *p)
> +{
> +	/*
> +	 * Tasks may come from classes that don't keep se.load up to date.
> +	 * Rebuild it before the task is enqueued.
> +	 */
> +	set_load_weight(p, false);

[Severity: High]
Does updating the task load weight without scaling its PELT averages corrupt
load_sum when attaching to fair?

When a task transitions to fair, switched_to_fair() enqueues it by calling
attach_task_cfs_rq() -> attach_entity_load_avg(). The attach_entity_load_avg()
function relies on se.load.weight matching se.avg.load_avg to reconstruct the
unweighted load_sum:

kernel/sched/fair.c:attach_entity_load_avg() {
    ...
    se->avg.load_sum = div_u64(se->avg.load_sum, se_weight(se));
    ...
}

Since load_avg is based on the old weight but se_weight() returns the newly
updated weight, load_sum ends up mathematically scaled by old_weight /
new_weight. If a task's nice value was changed to a lower priority (smaller
weight) while outside fair, load_sum could be artificially inflated by a
massive factor, polluting cfs_rq->avg and disrupting CPU load balancing.
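
The arithmetic behind this concern can be reproduced with a small user-space
sketch. It takes the description above at face value and does not verify
whether the real code path reaches this state. toy_attach_load_sum() is a
hypothetical helper, and TOY_PELT_DIVIDER is an assumed constant standing in
for whatever divider the kernel computes.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed divider for illustration; the kernel derives its own value. */
#define TOY_PELT_DIVIDER 47742ULL

/*
 * Models the reconstruction quoted above: the unweighted load_sum is
 * recovered by dividing the weighted sum (load_avg * divider) by the
 * entity's current weight.
 */
static uint64_t toy_attach_load_sum(uint64_t load_avg, uint64_t weight)
{
	assert(weight > 0);
	return (load_avg * TOY_PELT_DIVIDER) / weight;
}
```

With load_avg == 1024 accumulated at weight 1024 (nice 0), a matching weight
gives toy_attach_load_sum(1024, 1024) == 47742. If the weight has since been
rebuilt to 15 (nice 19) while load_avg is unchanged,
toy_attach_load_sum(1024, 15) == 3259187, about 68 times larger, which
matches the old_weight / new_weight factor (1024 / 15) named above.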

> +}
> +

-- 
Sashiko AI review · https://sashiko.dev/#/patchset/20260528131238.3879110-1-quzicheng315@gmail.com?part=1
