* [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler
@ 2026-05-26 13:52 Zicheng Qu
2026-05-26 15:18 ` sashiko-bot
2026-05-26 17:20 ` Andrea Righi
0 siblings, 2 replies; 10+ messages in thread
From: Zicheng Qu @ 2026-05-26 13:52 UTC (permalink / raw)
To: tj, void, arighi, changwoo, mingo, peterz, juri.lelli,
vincent.guittot, dietmar.eggemann, rostedt, bsegall, mgorman,
vschneid, kprateek.nayak, haoluo, joshdon, brho, sched-ext,
linux-kernel
Cc: tanghui20, zhangqiao22, quzicheng, quzicheng315
From: Zicheng Qu <quzicheng@huawei.com>
When a BPF scheduler is disabled, scx_root_disable() switches tasks
from ext_sched_class back to fair_sched_class directly. This does not
go through __setscheduler_params(), so p->se.load is not rebuilt for
tasks returning to fair.
For example, after enabling a sched_ext BPF scheduler and creating
CPU-bound tasks with different nice values, disabling the BPF scheduler
can leave them running under fair with stale p->se.load. They may then
split CPU time according to the stale weight instead of their current
nice weights.
Rebuild the fair load weight when scx_root_disable() switches a task
from ext_sched_class to fair_sched_class. Use set_load_weight(p, false)
so CFS gets a native load_weight derived from the task's current
policy/static_prio before the task is enqueued on fair.
Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
kernel/sched/ext.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
index 65631e577ee9..e5b8509ce7ee 100644
--- a/kernel/sched/ext.c
+++ b/kernel/sched/ext.c
@@ -5967,6 +5967,10 @@ static void scx_root_disable(struct scx_sched *sch)
 
 		scoped_guard (sched_change, p, queue_flags) {
 			p->sched_class = new_class;
+
+			if (old_class == &ext_sched_class &&
+			    new_class == &fair_sched_class)
+				set_load_weight(p, false);
 		}
 
 		scx_disable_and_exit_task(scx_task_sched(p), p);
--
2.43.0
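To make the CPU-time effect in the commit message concrete, here is a minimal userspace sketch, not kernel code. It assumes two CPU-bound tasks competing on a single CPU and uses a few entries of the kernel's nice-to-weight table (sched_prio_to_weight[]). The helper names nice_to_weight() and cpu_share_permille() are hypothetical and exist only for this example.

```c
#include <assert.h>

/*
 * A few entries from the kernel's sched_prio_to_weight[] table.
 * Only the nice values used in this sketch are listed.
 */
static unsigned long nice_to_weight(int nice)
{
	switch (nice) {
	case 0:
		return 1024;
	case 5:
		return 335;
	case 10:
		return 110;
	case 19:
		return 15;
	default:
		assert(!"nice value not covered by this sketch");
		return 0;
	}
}

/*
 * Approximate share of one CPU, in permille, that a task with weight
 * w_self receives while competing with a single task of weight w_other.
 * This models proportional sharing only; it ignores everything else the
 * fair class does.
 */
static unsigned long cpu_share_permille(unsigned long w_self,
					unsigned long w_other)
{
	return w_self * 1000 / (w_self + w_other);
}
```

With up-to-date weights, a nice 0 task competing with a nice 10 task would get roughly 90% of the CPU (1024 / (1024 + 110)). If the second task's fair weight were stale and still held 1024, the same calculation would give a 50/50 split, which is the kind of mismatch the patch describes.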
^ permalink raw reply related [flat|nested] 10+ messages in thread

* Re: [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler
2026-05-26 13:52 [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler Zicheng Qu
@ 2026-05-26 15:18 ` sashiko-bot
2026-05-26 17:20 ` Andrea Righi
1 sibling, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-26 15:18 UTC (permalink / raw)
To: Zicheng Qu; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:

Pre-existing issues:
- [High] Stale p->se.load weights when transitioning tasks from SCHED_EXT to SCHED_NORMAL via sys_sched_setscheduler()
--

commit d672e37dd0f7bb43fb8f1a521e9d70c91ecf303e
Author: Zicheng Qu <quzicheng@huawei.com>

sched_ext: Rebuild fair weight when disabling BPF scheduler

This commit fixes an issue where tasks switching from the BPF scheduler back
to the fair scheduler during scx_root_disable() retain stale load weights. It
rebuilds the fair weight by calling set_load_weight() so that CFS gets the
correct load_weight derived from the task's current policy.

> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 345aa11b84b28..3126125968d67 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5872,6 +5872,10 @@ static void scx_root_disable(struct scx_sched *sch)
>
>  		scoped_guard (sched_change, p, queue_flags) {
>  			p->sched_class = new_class;
> +
> +			if (old_class == &ext_sched_class &&
> +			    new_class == &fair_sched_class)
> +				set_load_weight(p, false);

[Severity: High]
This isn't a bug introduced by this patch, but does this same stale weight
problem happen when a task is moved from the BPF scheduler to the fair
scheduler via sys_sched_setscheduler()?

In __sched_setscheduler(), __setscheduler_params() is called before
p->sched_class is updated to the new class. __setscheduler_params() calls
set_load_weight(p, true).

Because p->sched_class is still ext_sched_class at this point,
set_load_weight() invokes reweight_task_scx(). reweight_task_scx() updates
p->scx.weight but completely ignores p->se.load.

Immediately after, p->sched_class is set to fair_sched_class, and the task is
enqueued into the CFS runqueue with its old, stale load weight.

Could tasks transitioning via sys_sched_setscheduler() also run with the wrong
scheduling priority and monopolize CPU time?

>  		}
>
>  		scx_disable_and_exit_task(scx_task_sched(p), p);

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260526135211.1892261-1-quzicheng315@gmail.com?part=1

^ permalink raw reply [flat|nested] 10+ messages in thread
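The ordering problem raised in the review above can be followed with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: struct task, the two toy classes, and ext_to_fair() are hypothetical, and the dispatch inside set_load_weight() is modeled only on the behaviour the review describes (with update_load set, the current class's reweight hook runs instead of a direct write to the fair weight).

```c
#include <assert.h>
#include <stdbool.h>

struct task;

/* Toy scheduling class: only the reweight hook matters in this model. */
struct sched_class {
	void (*reweight_task)(struct task *p, unsigned long weight);
};

struct task {
	const struct sched_class *sched_class;
	unsigned long nice_weight;	/* weight implied by the current nice value */
	unsigned long se_load;		/* stands in for p->se.load */
	unsigned long scx_weight;	/* stands in for p->scx.weight */
};

/* Models the ext reweight hook: updates the ext weight, not se_load. */
static void reweight_task_ext(struct task *p, unsigned long weight)
{
	p->scx_weight = weight;
}

/* Models the fair reweight hook: updates se_load. */
static void reweight_task_fair(struct task *p, unsigned long weight)
{
	p->se_load = weight;
}

static const struct sched_class ext_class = {
	.reweight_task = reweight_task_ext,
};
static const struct sched_class fair_class = {
	.reweight_task = reweight_task_fair,
};

/* With update_load set, defer to the current class; otherwise write se_load. */
static void set_load_weight(struct task *p, bool update_load)
{
	if (update_load && p->sched_class->reweight_task)
		p->sched_class->reweight_task(p, p->nice_weight);
	else
		p->se_load = p->nice_weight;
}

/* Returns se_load after an ext task is reniced and then moved to fair. */
static unsigned long ext_to_fair(bool rebuild_after_switch)
{
	struct task p = {
		.sched_class = &ext_class,
		.nice_weight = 1024,
		.se_load = 1024,
		.scx_weight = 1024,
	};

	p.nice_weight = 110;		/* priority changes while on ext */
	set_load_weight(&p, true);	/* class is still ext: se_load untouched */
	p.sched_class = &fair_class;	/* class switch happens afterwards */
	if (rebuild_after_switch)
		set_load_weight(&p, false);
	return p.se_load;
}
```

In this model, ext_to_fair(false) leaves se_load at the old value, while ext_to_fair(true) yields the weight matching the current priority, which is why the rebuild has to happen after the class pointer has changed.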
* Re: [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler
2026-05-26 13:52 [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler Zicheng Qu
2026-05-26 15:18 ` sashiko-bot
@ 2026-05-26 17:20 ` Andrea Righi
2026-05-27 9:40 ` [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches quzicheng315
1 sibling, 1 reply; 10+ messages in thread
From: Andrea Righi @ 2026-05-26 17:20 UTC (permalink / raw)
To: Zicheng Qu
Cc: tj, void, changwoo, mingo, peterz, juri.lelli, vincent.guittot,
dietmar.eggemann, rostedt, bsegall, mgorman, vschneid,
kprateek.nayak, haoluo, joshdon, brho, sched-ext, linux-kernel,
tanghui20, zhangqiao22, quzicheng

Hi Zicheng,

On Tue, May 26, 2026 at 09:52:11PM +0800, Zicheng Qu wrote:
> From: Zicheng Qu <quzicheng@huawei.com>
>
> When a BPF scheduler is disabled, scx_root_disable() switches tasks
> from ext_sched_class back to fair_sched_class directly. This does not
> go through __setscheduler_params(), so p->se.load is not rebuilt for
> tasks returning to fair.
>
> For example, after enabling a sched_ext BPF scheduler and creating
> CPU-bound tasks with different nice values, disabling the BPF scheduler
> can leave them running under fair with stale p->se.load. They may then
> split CPU time according to the stale weight instead of their current
> nice weights.
>
> Rebuild the fair load weight when scx_root_disable() switches a task
> from ext_sched_class to fair_sched_class. Use set_load_weight(p, false)
> so CFS gets a native load_weight derived from the task's current
> policy/static_prio before the task is enqueued on fair.
>
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
> ---
> kernel/sched/ext.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 65631e577ee9..e5b8509ce7ee 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -5967,6 +5967,10 @@ static void scx_root_disable(struct scx_sched *sch)
>
>  		scoped_guard (sched_change, p, queue_flags) {
>  			p->sched_class = new_class;
> +
> +			if (old_class == &ext_sched_class &&
> +			    new_class == &fair_sched_class)
> +				set_load_weight(p, false);

I'm wondering if we have a similar issue for tasks moving from SCHED_EXT to
SCHED_NORMAL when a scx scheduler is running in partial mode. Maybe we need to
intercept this special case in __sched_setscheduler()? (not necessarily for
this patch, it can be addressed later as a separate follow-up patch).

For now, this makes sense to me.

Reviewed-by: Andrea Righi <arighi@nvidia.com>

Thanks,
-Andrea

>  		}
>
>  		scx_disable_and_exit_task(scx_task_sched(p), p);
> --
> 2.43.0
>

^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
2026-05-26 17:20 ` Andrea Righi
@ 2026-05-27 9:40 ` quzicheng315
2026-05-27 11:26 ` Peter Zijlstra
0 siblings, 1 reply; 10+ messages in thread
From: quzicheng315 @ 2026-05-27 9:40 UTC (permalink / raw)
To: arighi
Cc: brho, bsegall, changwoo, dietmar.eggemann, haoluo, joshdon,
juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo, peterz,
quzicheng315, quzicheng, rostedt, sched-ext, tanghui20, tj,
vincent.guittot, void, vschneid, zhangqiao22

From: Zicheng Qu <quzicheng315@gmail.com>

Tasks running on sched_ext do not use p->se.load as their active
scheduling weight. Their nice-derived weight is maintained as
p->scx.weight instead.

When such a task switches back to fair, CFS expects p->se.load to match
the task's current policy/static_prio before the task is enqueued.
However, not all ext to fair transitions rebuild p->se.load. For
example, scx_root_disable() switches tasks back to fair directly, and
partial mode can move a task from SCHED_EXT to SCHED_NORMAL through
sched_setscheduler(). In the latter case, set_load_weight(p, true) runs
while p->sched_class is still ext_sched_class, so reweight_task_scx()
updates p->scx.weight but leaves p->se.load stale.

Rebuild the fair load weight in sched_change_end() when the class switch
is from ext_sched_class to fair_sched_class. This is after the class has
been changed and before the task is enqueued on fair, so CFS sees a
native load_weight derived from the task's current policy/static_prio.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
Changes in v2:
- Move the fix from scx_root_disable() to sched_change_end() so the same
  ext-to-fair rebuild also covers partial mode SCHED_EXT to SCHED_NORMAL
  transitions through sched_setscheduler(), as Andrea pointed out.

kernel/sched/core.c | 2 ++
kernel/sched/ext.h | 11 +++++++++++
2 files changed, 13 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b8871449d3c6..c694aabc451a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -11200,6 +11200,8 @@ void sched_change_end(struct sched_change_ctx *ctx)
 	 */
 	WARN_ON_ONCE(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
 
+	scx_rebuild_fair_weight_on_class_switch(p, ctx->class, p->sched_class);
+
 	if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
 		p->sched_class->switching_to(rq, p);
 
diff --git a/kernel/sched/ext.h b/kernel/sched/ext.h
index 0b7fc46aee08..1f8248c897af 100644
--- a/kernel/sched/ext.h
+++ b/kernel/sched/ext.h
@@ -35,6 +35,14 @@ static inline bool task_on_scx(const struct task_struct *p)
 	return scx_enabled() && p->sched_class == &ext_sched_class;
 }
 
+static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
+				const struct sched_class *old_class,
+				const struct sched_class *new_class)
+{
+	if (old_class == &ext_sched_class && new_class == &fair_sched_class)
+		set_load_weight(p, false);
+}
+
 #ifdef CONFIG_SCHED_CORE
 bool scx_prio_less(const struct task_struct *a, const struct task_struct *b,
 		   bool in_fi);
@@ -55,6 +63,9 @@ static inline int scx_check_setscheduler(struct task_struct *p, int policy) { re
 static inline bool task_on_scx(const struct task_struct *p) { return false; }
 static inline bool scx_allow_ttwu_queue(const struct task_struct *p) { return true; }
 static inline void init_sched_ext_class(void) {}
+static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
+				const struct sched_class *old_class,
+				const struct sched_class *new_class) {}
 
 #endif /* CONFIG_SCHED_CLASS_EXT */
 
--
2.43.0

^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
2026-05-27 9:40 ` [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches quzicheng315
@ 2026-05-27 11:26 ` Peter Zijlstra
2026-05-28 2:53 ` Zicheng Qu
0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2026-05-27 11:26 UTC (permalink / raw)
To: quzicheng315
Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo, joshdon,
juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo, quzicheng,
rostedt, sched-ext, tanghui20, tj, vincent.guittot, void, vschneid,
zhangqiao22

On Wed, May 27, 2026 at 05:40:37PM +0800, quzicheng315@gmail.com wrote:
> From: Zicheng Qu <quzicheng315@gmail.com>
>
> Tasks running on sched_ext do not use p->se.load as their active
> scheduling weight. Their nice-derived weight is maintained as
> p->scx.weight instead.
>
> When such a task switches back to fair, CFS expects p->se.load to match
> the task's current policy/static_prio before the task is enqueued.
> However, not all ext to fair transitions rebuild p->se.load. For
> example, scx_root_disable() switches tasks back to fair directly, and
> partial mode can move a task from SCHED_EXT to SCHED_NORMAL through
> sched_setscheduler(). In the latter case, set_load_weight(p, true) runs
> while p->sched_class is still ext_sched_class, so reweight_task_scx()
> updates p->scx.weight but leaves p->se.load stale.
>
> Rebuild the fair load weight in sched_change_end() when the class switch
> is from ext_sched_class to fair_sched_class. This is after the class has
> been changed and before the task is enqueued on fair, so CFS sees a
> native load_weight derived from the task's current policy/static_prio.
>
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
> ---
> Changes in v2:
> - Move the fix from scx_root_disable() to sched_change_end() so the same
>   ext-to-fair rebuild also covers partial mode SCHED_EXT to SCHED_NORMAL
>   transitions through sched_setscheduler(), as Andrea pointed out.
>
> kernel/sched/core.c | 2 ++
> kernel/sched/ext.h | 11 +++++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b8871449d3c6..c694aabc451a 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -11200,6 +11200,8 @@ void sched_change_end(struct sched_change_ctx *ctx)
>  	 */
>  	WARN_ON_ONCE(p->sched_class != ctx->class && !(ctx->flags & ENQUEUE_CLASS));
>
> +	scx_rebuild_fair_weight_on_class_switch(p, ctx->class, p->sched_class);
> +
>  	if ((ctx->flags & ENQUEUE_CLASS) && p->sched_class->switching_to)
>  		p->sched_class->switching_to(rq, p);
>
> diff --git a/kernel/sched/ext.h b/kernel/sched/ext.h
> index 0b7fc46aee08..1f8248c897af 100644
> --- a/kernel/sched/ext.h
> +++ b/kernel/sched/ext.h
> @@ -35,6 +35,14 @@ static inline bool task_on_scx(const struct task_struct *p)
>  	return scx_enabled() && p->sched_class == &ext_sched_class;
>  }
>
> +static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
> +				const struct sched_class *old_class,
> +				const struct sched_class *new_class)
> +{
> +	if (old_class == &ext_sched_class && new_class == &fair_sched_class)
> +		set_load_weight(p, false);
> +}
> +
>  #ifdef CONFIG_SCHED_CORE
>  bool scx_prio_less(const struct task_struct *a, const struct task_struct *b,
>  		   bool in_fi);
> @@ -55,6 +63,9 @@ static inline int scx_check_setscheduler(struct task_struct *p, int policy) { re
>  static inline bool task_on_scx(const struct task_struct *p) { return false; }
>  static inline bool scx_allow_ttwu_queue(const struct task_struct *p) { return true; }
>  static inline void init_sched_ext_class(void) {}
> +static inline void scx_rebuild_fair_weight_on_class_switch(struct task_struct *p,
> +				const struct sched_class *old_class,
> +				const struct sched_class *new_class) {}
>
>  #endif /* CONFIG_SCHED_CLASS_EXT */

This is truly horrible. We have 4 class methods involved with switching
classes and you stick in a random call in a place that is called when no
class is changed.

Would not something like this work?

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 62a2dcb0d03e..a2eb43bd73b9 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -14957,6 +14957,11 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 	detach_task_cfs_rq(p);
 }
 
+static void switching_to_fair(struct rq *rq, struct task_struct *p)
+{
+	set_load_weight(p, false);
+}
+
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
 	WARN_ON_ONCE(p->se.sched_delayed);
@@ -15351,6 +15356,7 @@ DEFINE_SCHED_CLASS(fair) = {
 	.prio_changed = prio_changed_fair,
 	.switching_from = switching_from_fair,
 	.switched_from = switched_from_fair,
+	.switching_to = switching_to_fair,
 	.switched_to = switched_to_fair,
 
 	.get_rr_interval = get_rr_interval_fair,

^ permalink raw reply related [flat|nested] 10+ messages in thread
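The difference between the two placements discussed above can be sketched with a toy model of a per-class switching_to hook. This is a userspace illustration, not kernel code: struct task, change_end(), and weight_seen_at_enqueue() are hypothetical, and the only ordering assumed is the one stated in the thread, namely that the hook runs after the class pointer has changed and before the task is enqueued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task;

/* Toy scheduling class with an optional "about to enter this class" hook. */
struct sched_class {
	void (*switching_to)(struct task *p);
};

struct task {
	const struct sched_class *sched_class;
	unsigned long nice_weight;	 /* weight implied by the current nice value */
	unsigned long se_load;		 /* stands in for p->se.load */
	unsigned long weight_at_enqueue; /* what the class saw when enqueueing */
};

/* The destination class prepares its own per-task state. */
static void switching_to_fair(struct task *p)
{
	p->se_load = p->nice_weight;
}

static const struct sched_class fair_class = {
	.switching_to = switching_to_fair,
};
static const struct sched_class ext_class = {
	.switching_to = NULL,
};

/*
 * Models the tail of a scheduling change: the hook only runs when the
 * class really changed, and it runs before the (modeled) enqueue.
 */
static void change_end(struct task *p, bool class_changed)
{
	if (class_changed && p->sched_class->switching_to)
		p->sched_class->switching_to(p);
	p->weight_at_enqueue = p->se_load;
}

/* Returns the weight observed at enqueue time for a task with stale se_load. */
static unsigned long weight_seen_at_enqueue(bool class_changed)
{
	struct task p = {
		.sched_class = &ext_class,
		.nice_weight = 110,
		.se_load = 1024,	/* stale value left over from earlier */
	};

	if (class_changed)
		p.sched_class = &fair_class;
	change_end(&p, class_changed);
	return p.weight_at_enqueue;
}
```

Keeping the rebuild inside the destination class's hook means the generic change path needs no knowledge of which classes are involved, and changes that do not switch class never reach the hook at all.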
* Re: [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
2026-05-27 11:26 ` Peter Zijlstra
@ 2026-05-28 2:53 ` Zicheng Qu
2026-05-28 9:25 ` Peter Zijlstra
0 siblings, 1 reply; 10+ messages in thread
From: Zicheng Qu @ 2026-05-28 2:53 UTC (permalink / raw)
To: Peter Zijlstra
Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo, joshdon,
juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo, quzicheng,
rostedt, sched-ext, tanghui20, tj, vincent.guittot, void, vschneid,
zhangqiao22, quzicheng315

On Wed, May 27, 2026 at 07:26PM +0800, Peter Zijlstra wrote:
> This is truly horrible. We have 4 class methods involved with switching
> classes and you stick in a random call in a place that is called when no
> class is changed.
>
> Would not something like this work?
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 62a2dcb0d03e..a2eb43bd73b9 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -14957,6 +14957,11 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>  	detach_task_cfs_rq(p);
>  }
>
> +static void switching_to_fair(struct rq *rq, struct task_struct *p)
> +{
> +	set_load_weight(p, false);
> +}
> +
>  static void switched_to_fair(struct rq *rq, struct task_struct *p)
>  {
>  	WARN_ON_ONCE(p->se.sched_delayed);
> @@ -15351,6 +15356,7 @@ DEFINE_SCHED_CLASS(fair) = {
>  	.prio_changed = prio_changed_fair,
>  	.switching_from = switching_from_fair,
>  	.switched_from = switched_from_fair,
> +	.switching_to = switching_to_fair,
>  	.switched_to = switched_to_fair,
>
>  	.get_rr_interval = get_rr_interval_fair,

Yes, from the class switch point of view, `switching_to_fair()` is a better
fit.

Before v2, I was weighing three possible places for the fix:

1. Updating `p->se.load` from `reweight_task_scx()`. This would keep the fair
   weight in sync while the task is on sched_ext, so switching back to fair
   would not need any extra fixup. However, it would also make sched_ext
   maintain fair class state even when fair is not using it, which does not
   seem like the right ownership model.

2. Rebuilding `p->se.load` from fair's `switching_to` hook. This is the most
   natural place semantically, since the task is entering fair and fair
   prepares its own state before enqueue. My only concern was that, for
   non-ext -> fair paths, `__setscheduler_params()` may have already updated
   `p->se.load` through `set_load_weight(p, true)`, so calling
   `set_load_weight(p, false)` unconditionally here can be redundant
   logically. Functionally, though, it is harmless.

3. Rebuilding in `sched_change_end()` based on the old/new classes. That was
   the v2 choice because both classes are available there, the task has not
   been enqueued yet, and it covers both `scx_root_disable()` and the
   partial-mode `sched_setscheduler()` path.

In hindsight, though, this makes the generic sched_change path handle a scx &
fair-specific fixup. That is more awkward than letting fair prepare its own
state in `switching_to_fair()`.

I'll respin v3 as you suggested.

Thanks,
Zicheng

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches
2026-05-28 2:53 ` Zicheng Qu
@ 2026-05-28 9:25 ` Peter Zijlstra
2026-05-28 13:12 ` [PATCH v3] sched/fair: Rebuild load weight when switching to fair quzicheng315
0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2026-05-28 9:25 UTC (permalink / raw)
To: Zicheng Qu
Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo, joshdon,
juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo, quzicheng,
rostedt, sched-ext, tanghui20, tj, vincent.guittot, void, vschneid,
zhangqiao22

On Thu, May 28, 2026 at 10:53:54AM +0800, Zicheng Qu wrote:
> 2. Rebuilding `p->se.load` from fair's `switching_to` hook. This is the most
> natural place semantically, since the task is entering fair and fair
> prepares its own state before enqueue. My only concern was that, for
> non-ext -> fair paths, `__setscheduler_params()` may have already updated
> `p->se.load` through `set_load_weight(p, true)`, so calling
> `set_load_weight(p, false)` unconditionally here can be redundant
> logically. Functionally, though, it is harmless.

Right. We can worry about optimizing this if there's ever a report. I
don't expect this to be noticeable much. If anything, the PI code would
be the one to trip this most often I think.

^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v3] sched/fair: Rebuild load weight when switching to fair
2026-05-28 9:25 ` Peter Zijlstra
@ 2026-05-28 13:12 ` quzicheng315
2026-05-28 14:27 ` Tejun Heo
2026-05-30 5:06 ` sashiko-bot
0 siblings, 2 replies; 10+ messages in thread
From: quzicheng315 @ 2026-05-28 13:12 UTC (permalink / raw)
To: peterz
Cc: arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo, joshdon,
juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo,
quzicheng315, quzicheng, rostedt, sched-ext, tanghui20, tj,
vincent.guittot, void, vschneid, zhangqiao22

From: Zicheng Qu <quzicheng@huawei.com>

Tasks that run outside fair may not keep p->se.load in sync with their
current scheduling policy and static priority. sched_ext, for example,
uses p->scx.weight as the active scheduling weight, so p->se.load can be
stale when a task moves back to fair.

The fair_sched_class expects the sched_entity load weight to be valid
before the task is enqueued. Rebuild it from fair's switching_to hook,
which runs after the class has been changed to fair and before enqueue,
so both sched_ext disable and SCHED_EXT to SCHED_NORMAL transitions get
a native fair load weight.

Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
---
Changes in v3:
- Move the rebuild into fair's switching_to hook, as suggested by Peter.
  This lets fair prepare its own state before enqueue and avoids adding a
  sched_ext/fair-specific fixup to the generic sched_change_end() path.

Changes in v2:
- Move the fix from scx_root_disable() to the class switch path so it also
  covers partial-mode SCHED_EXT to SCHED_NORMAL transitions through
  sched_setscheduler(). Andrea identified this missing case in the v1
  discussion.

kernel/sched/fair.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 3ebec186f982..3a21ceefcadf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -13837,6 +13837,15 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
 	detach_task_cfs_rq(p);
 }
 
+static void switching_to_fair(struct rq *rq, struct task_struct *p)
+{
+	/*
+	 * Tasks may come from classes that don't keep se.load up to date.
+	 * Rebuild it before the task is enqueued.
+	 */
+	set_load_weight(p, false);
+}
+
 static void switched_to_fair(struct rq *rq, struct task_struct *p)
 {
 	WARN_ON_ONCE(p->se.sched_delayed);
@@ -14233,6 +14242,7 @@ DEFINE_SCHED_CLASS(fair) = {
 	.prio_changed = prio_changed_fair,
 	.switching_from = switching_from_fair,
 	.switched_from = switched_from_fair,
+	.switching_to = switching_to_fair,
 	.switched_to = switched_to_fair,
 
 	.get_rr_interval = get_rr_interval_fair,
--
2.53.0

^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v3] sched/fair: Rebuild load weight when switching to fair
2026-05-28 13:12 ` [PATCH v3] sched/fair: Rebuild load weight when switching to fair quzicheng315
@ 2026-05-28 14:27 ` Tejun Heo
2026-05-30 5:06 ` sashiko-bot
1 sibling, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2026-05-28 14:27 UTC (permalink / raw)
To: quzicheng315
Cc: peterz, arighi, brho, bsegall, changwoo, dietmar.eggemann, haoluo,
joshdon, juri.lelli, kprateek.nayak, linux-kernel, mgorman, mingo,
quzicheng, rostedt, sched-ext, tanghui20, vincent.guittot, void,
vschneid, zhangqiao22

On Thu, May 28, 2026 at 09:12:38PM +0800, quzicheng315@gmail.com wrote:
> From: Zicheng Qu <quzicheng@huawei.com>
>
> Tasks that run outside fair may not keep p->se.load in sync with their
> current scheduling policy and static priority. sched_ext, for example,
> uses p->scx.weight as the active scheduling weight, so p->se.load can be
> stale when a task moves back to fair.
>
> The fair_sched_class expects the sched_entity load weight to be valid
> before the task is enqueued. Rebuild it from fair's switching_to hook,
> which runs after the class has been changed to fair and before enqueue,
> so both sched_ext disable and SCHED_EXT to SCHED_NORMAL transitions get
> a native fair load weight.
>
> Fixes: f0e1a0643a59 ("sched_ext: Implement BPF extensible scheduler class")
>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>

Acked-by: Tejun Heo <tj@kernel.org>

Thanks.

--
tejun

^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v3] sched/fair: Rebuild load weight when switching to fair
2026-05-28 13:12 ` [PATCH v3] sched/fair: Rebuild load weight when switching to fair quzicheng315
2026-05-28 14:27 ` Tejun Heo
@ 2026-05-30 5:06 ` sashiko-bot
1 sibling, 0 replies; 10+ messages in thread
From: sashiko-bot @ 2026-05-30 5:06 UTC (permalink / raw)
To: quzicheng315; +Cc: sched-ext

Thank you for your contribution! Sashiko AI review found 1 potential issue(s) to consider:
- [High] Updating task load weight without scaling its PELT averages corrupts load_sum when attaching to fair.
--

commit 5e323dec0121afc6858dcf6c332ec85a8dc01718
Author: Zicheng Qu <quzicheng@huawei.com>

sched/fair: Rebuild load weight when switching to fair

Tasks running outside the fair scheduling class may not keep their load
weight in sync. This commit rebuilds the load weight from the switching_to
hook before enqueueing back into fair.

> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index 62a2dcb0d03e6..3671b512dcf7f 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c

[ ... ]

> @@ -14957,6 +14957,15 @@ static void switched_from_fair(struct rq *rq, struct task_struct *p)
>  	detach_task_cfs_rq(p);
>  }
>
> +static void switching_to_fair(struct rq *rq, struct task_struct *p)
> +{
> +	/*
> +	 * Tasks may come from classes that don't keep se.load up to date.
> +	 * Rebuild it before the task is enqueued.
> +	 */
> +	set_load_weight(p, false);

[Severity: High]
Does updating the task load weight without scaling its PELT averages corrupt
load_sum when attaching to fair?

When a task transitions to fair, switched_to_fair() enqueues it by calling
attach_task_cfs_rq() -> attach_entity_load_avg(). The attach_entity_load_avg()
function relies on se.load.weight matching se.avg.load_avg to reconstruct the
unweighted load_sum:

kernel/sched/fair.c:attach_entity_load_avg() {
	...
	se->avg.load_sum = div_u64(se->avg.load_sum, se_weight(se));
	...
}

Since load_avg is based on the old weight but se_weight() returns the newly
updated weight, load_sum ends up mathematically scaled by
old_weight / new_weight. If a task's nice value was changed to a lower
priority (smaller weight) while outside fair, load_sum could be artificially
inflated by a massive factor, polluting cfs_rq->avg and disrupting CPU load
balancing.

> +}
> +

--
Sashiko AI review · https://sashiko.dev/#/patchset/20260528131238.3879110-1-quzicheng315@gmail.com?part=1

^ permalink raw reply [flat|nested] 10+ messages in thread
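The scaling argument in the review above can be checked with plain integer arithmetic. The sketch below is a userspace model and not the kernel's PELT code: PELT_DIVIDER, load_avg_from(), and reconstruct_load_sum() are hypothetical names, the divider value is illustrative only, and the relationship load_avg = weight * load_sum / divider is taken from the review's description. Whether the kernel actually reaches this state on the ext-to-fair path is the open question the review raises, not something this example settles.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constant; the kernel computes its own divider at runtime. */
#define PELT_DIVIDER 47742ULL

/* Weighted average as described in the review: weight * load_sum / divider. */
static uint64_t load_avg_from(uint64_t weight, uint64_t load_sum)
{
	assert(PELT_DIVIDER != 0);
	return weight * load_sum / PELT_DIVIDER;
}

/*
 * Recover the unweighted load_sum from load_avg by dividing out the weight
 * that is in effect at attach time. The result is only meaningful when that
 * weight is the one load_avg was computed with.
 */
static uint64_t reconstruct_load_sum(uint64_t load_avg,
				     uint64_t weight_at_attach)
{
	assert(weight_at_attach != 0);
	return load_avg * PELT_DIVIDER / weight_at_attach;
}
```

For a task whose load_avg was computed with weight 1024 and a true load_sum of 20000, dividing by the same weight gives back roughly 20000 (up to integer rounding), while dividing by a new weight of 15 gives a value about 68 times larger, which is the old_weight / new_weight factor the review mentions.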
end of thread, other threads:[~2026-05-30 5:06 UTC | newest]

Thread overview: 10+ messages
2026-05-26 13:52 [PATCH] sched_ext: Rebuild fair weight when disabling BPF scheduler Zicheng Qu
2026-05-26 15:18 ` sashiko-bot
2026-05-26 17:20 ` Andrea Righi
2026-05-27 9:40   ` [PATCH v2] sched_ext: Rebuild fair weight on ext to fair switches quzicheng315
2026-05-27 11:26     ` Peter Zijlstra
2026-05-28 2:53       ` Zicheng Qu
2026-05-28 9:25         ` Peter Zijlstra
2026-05-28 13:12           ` [PATCH v3] sched/fair: Rebuild load weight when switching to fair quzicheng315
2026-05-28 14:27             ` Tejun Heo
2026-05-30 5:06             ` sashiko-bot