From: luca abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>,
Pierre Gondois <pierre.gondois@arm.com>,
tj@kernel.org, linux-kernel@vger.kernel.org, mingo@kernel.org,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com,
void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
cgroups@vger.kernel.org, sched-ext@lists.linux.dev,
liuwenfang@honor.com, tglx@linutronix.de,
Christian Loehle <christian.loehle@arm.com>
Subject: Re: [PATCH 05/12] sched: Move sched_class::prio_changed() into the change pattern
Date: Wed, 14 Jan 2026 15:04:30 +0100
Message-ID: <20260114150430.36cb2b4a@nowhere>
In-Reply-To: <20260114130528.GB831285@noisy.programming.kicks-ass.net>
Hi Peter,
On Wed, 14 Jan 2026 14:05:28 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jan 14, 2026 at 11:23:36AM +0100, Peter Zijlstra wrote:
>
> > Juri, Luca, I'm tempted to suggest to simply remove the replenish on
> > RESTORE entirely -- that would allow the task to continue as it had
> > been, irrespective of it being 'late'.
> >
> > Something like so -- what would this break?
> >
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -2214,10 +2214,6 @@ enqueue_dl_entity(struct sched_dl_entity
> > update_dl_entity(dl_se);
> > } else if (flags & ENQUEUE_REPLENISH) {
> > replenish_dl_entity(dl_se);
> > - } else if ((flags & ENQUEUE_RESTORE) &&
> > - !is_dl_boosted(dl_se) &&
> > - dl_time_before(dl_se->deadline, rq_clock(rq_of_dl_se(dl_se)))) {
> > - setup_new_dl_entity(dl_se);
> > }
> >
> > /*
>
> Ah, this is de-boost, right? Boosting allows one to break the CBS
> rules and then we have to rein in the excesses.
Sorry, I am missing a bit of context (I am still catching up on the
mailing list archives)... But I agree that the call to
setup_new_dl_entity() mentioned above does not make much sense.
I suspect the hunk above could simply be removed, as you originally
suggested: on de-boosting, the task returns to its original deadline,
which is larger than the inherited one, so I am not sure whether we
should generate a new deadline or just leave it as it is, even if it
has been missed.
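To make the three options discussed in this thread concrete, here is a
minimal userspace sketch, not kernel code. It compares generating a new
deadline on RESTORE (the existing behaviour), on MOVE (the
s/RESTORE/MOVE/ proposal), or never (removing the hunk). All sk_* names,
the flag values and the toy entity struct are hypothetical, and
"generate a new deadline" is modelled as now + relative deadline, which
is an assumption about what setup_new_dl_entity() amounts to here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch; the kernel's values differ. */
enum {
	SK_ENQUEUE_RESTORE = 1 << 0,
	SK_ENQUEUE_MOVE    = 1 << 1,
};

/* Toy stand-in for struct sched_dl_entity (only the fields used here). */
struct sk_dl_entity {
	uint64_t deadline;	/* absolute deadline, ns */
	uint64_t dl_deadline;	/* relative deadline, ns */
	bool boosted;		/* still running with an inherited deadline */
};

/* Which enqueue flag (if any) may trigger a new deadline. */
enum sk_policy { SK_ON_RESTORE, SK_ON_MOVE, SK_NEVER };

/* Wrap-safe "a is before b", in the style of dl_time_before(). */
static bool sk_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/*
 * Models only the last branch of the enqueue path quoted above.
 * Returns true when a new deadline was generated.
 */
static bool sk_enqueue(struct sk_dl_entity *se, int flags, uint64_t now,
		       enum sk_policy policy)
{
	int trigger = 0;

	assert(se);
	if (policy == SK_ON_RESTORE)
		trigger = SK_ENQUEUE_RESTORE;
	else if (policy == SK_ON_MOVE)
		trigger = SK_ENQUEUE_MOVE;

	if ((flags & trigger) && !se->boosted &&
	    sk_time_before(se->deadline, now)) {
		/* Assumption: a "new" deadline is now + relative deadline. */
		se->deadline = now + se->dl_deadline;
		return true;
	}
	return false;	/* the (possibly missed) deadline is left as it is */
}
```

With SK_NEVER a de-boosted task simply keeps its original deadline even
if it is already in the past, which is the case being questioned above.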
Luca
>
> But we have {DE,EN}QUEUE_MOVE for this, that explicitly allows
> priority to change and is set for rt_mutex_setprio() (among others).
>
> So doing s/RESTORE/MOVE/ above.
>
> The corollary to all this is that everybody that sets MOVE must be
> able to deal with balance callbacks, so audit that too.
>
> This then gives something like so.. which builds and boots for me, but
> clearly I haven't been able to trigger these funny cases.
>
> ---
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4969,9 +4969,13 @@ struct balance_callback *splice_balance_
> return __splice_balance_callbacks(rq, true);
> }
>
> -static void __balance_callbacks(struct rq *rq)
> +void __balance_callbacks(struct rq *rq, struct rq_flags *rf)
> {
> + if (rf)
> + rq_unpin_lock(rq, rf);
> do_balance_callbacks(rq, __splice_balance_callbacks(rq, false));
> + if (rf)
> + rq_repin_lock(rq, rf);
> }
>
> void balance_callbacks(struct rq *rq, struct balance_callback *head)
> @@ -5018,7 +5022,7 @@ static inline void finish_lock_switch(st
> * prev into current:
> */
> spin_acquire(&__rq_lockp(rq)->dep_map, 0, 0, _THIS_IP_);
> - __balance_callbacks(rq);
> + __balance_callbacks(rq, NULL);
> raw_spin_rq_unlock_irq(rq);
> }
>
> @@ -6901,7 +6905,7 @@ static void __sched notrace __schedule(i
> proxy_tag_curr(rq, next);
>
> rq_unpin_lock(rq, &rf);
> - __balance_callbacks(rq);
> + __balance_callbacks(rq, NULL);
> raw_spin_rq_unlock_irq(rq);
> }
> trace_sched_exit_tp(is_switch);
> @@ -7350,7 +7354,7 @@ void rt_mutex_setprio(struct task_struct
> trace_sched_pi_setprio(p, pi_task);
> oldprio = p->prio;
>
> - if (oldprio == prio)
> + if (oldprio == prio && !dl_prio(prio))
> queue_flag &= ~DEQUEUE_MOVE;
>
> prev_class = p->sched_class;
> @@ -7396,9 +7400,7 @@ void rt_mutex_setprio(struct task_struct
> out_unlock:
> /* Caller holds task_struct::pi_lock, IRQs are still disabled */
> - rq_unpin_lock(rq, &rf);
> - __balance_callbacks(rq);
> - rq_repin_lock(rq, &rf);
> + __balance_callbacks(rq, &rf);
> __task_rq_unlock(rq, p, &rf);
> }
> #endif /* CONFIG_RT_MUTEXES */
> @@ -9167,6 +9169,8 @@ void sched_move_task(struct task_struct
>
> if (resched)
> resched_curr(rq);
> +
> + __balance_callbacks(rq, &rq_guard.rf);
> }
>
> static struct cgroup_subsys_state *
> @@ -10891,6 +10895,9 @@ void sched_change_end(struct sched_chang
> resched_curr(rq);
> }
> } else {
> + /*
> + * XXX validate prio only really changed when ENQUEUE_MOVE is set.
> + */
> p->sched_class->prio_changed(rq, p, ctx->prio);
> }
> }
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2214,9 +2214,14 @@ enqueue_dl_entity(struct sched_dl_entity
> update_dl_entity(dl_se);
> } else if (flags & ENQUEUE_REPLENISH) {
> replenish_dl_entity(dl_se);
> - } else if ((flags & ENQUEUE_RESTORE) &&
> + } else if ((flags & ENQUEUE_MOVE) &&
> !is_dl_boosted(dl_se) &&
> dl_time_before(dl_se->deadline, rq_clock(rq_of_dl_se(dl_se)))) {
> + /*
> + * Deals with the de-boost case, and ENQUEUE_MOVE explicitly
> + * allows us to change priority. Callers are expected to deal
> + * with balance_callbacks.
> + */
> setup_new_dl_entity(dl_se);
> }
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -545,6 +545,7 @@ static void scx_task_iter_start(struct s
> static void __scx_task_iter_rq_unlock(struct scx_task_iter *iter)
> {
> if (iter->locked_task) {
> + __balance_callbacks(iter->rq, &iter->rf);
> task_rq_unlock(iter->rq, iter->locked_task, &iter->rf);
> iter->locked_task = NULL;
> }
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2430,7 +2430,8 @@ extern const u32 sched_prio_to_wmult[40
> * should preserve as much state as possible.
> *
> * MOVE - paired with SAVE/RESTORE, explicitly does not preserve the location
> - * in the runqueue.
> + * in the runqueue. IOW the priority is allowed to change. Callers
> + * must expect to deal with balance callbacks.
> *
> * NOCLOCK - skip the update_rq_clock() (avoids double updates)
> *
> @@ -4019,6 +4020,8 @@ extern void enqueue_task(struct rq *rq,
> extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags);
> extern struct balance_callback *splice_balance_callbacks(struct rq *rq);
> +
> +extern void __balance_callbacks(struct rq *rq, struct rq_flags *rf);
> extern void balance_callbacks(struct rq *rq, struct balance_callback *head);
> /*
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -639,7 +639,7 @@ int __sched_setscheduler(struct task_str
> * itself.
> */
> newprio = rt_effective_prio(p, newprio);
> - if (newprio == oldprio)
> + if (newprio == oldprio && !dl_prio(newprio))
> queue_flags &= ~DEQUEUE_MOVE;
> }
>
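The core.c part of the quoted patch folds the unpin/run/repin sequence
into __balance_callbacks(rq, rf). Below is a rough userspace model of
that calling convention only. The sk_* types and helpers are
hypothetical stand-ins, and the assumption that callbacks must run with
the pin dropped (because they may release and re-take the rq lock) is
mine, not something stated in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_rq;

/* Toy stand-in for struct balance_callback. */
struct sk_callback {
	struct sk_callback *next;
	void (*func)(struct sk_rq *rq);
};

/* Toy runqueue: lock and pin state are plain booleans. */
struct sk_rq {
	bool locked;
	bool pinned;
	struct sk_callback *head;
	int ran;		/* number of callbacks executed */
};

/* Toy stand-in for struct rq_flags: remembers who holds the pin. */
struct sk_rq_flags {
	bool holds_pin;
};

static void sk_unpin(struct sk_rq *rq, struct sk_rq_flags *rf)
{
	assert(rq->pinned && rf->holds_pin);
	rq->pinned = false;
	rf->holds_pin = false;
}

static void sk_repin(struct sk_rq *rq, struct sk_rq_flags *rf)
{
	assert(!rq->pinned && !rf->holds_pin);
	rq->pinned = true;
	rf->holds_pin = true;
}

static void sk_queue_callback(struct sk_rq *rq, struct sk_callback *cb,
			      void (*func)(struct sk_rq *rq))
{
	cb->func = func;
	cb->next = rq->head;
	rq->head = cb;
}

/*
 * Mirrors the shape of the patched __balance_callbacks(): with rf the
 * pin is dropped around the callbacks and restored afterwards; with
 * NULL the caller must already have dropped it.
 */
static void sk_balance_callbacks(struct sk_rq *rq, struct sk_rq_flags *rf)
{
	struct sk_callback *head;

	assert(rq->locked);
	if (rf)
		sk_unpin(rq, rf);

	head = rq->head;	/* splice the list off the runqueue */
	rq->head = NULL;
	while (head) {
		struct sk_callback *next = head->next;

		head->next = NULL;
		head->func(rq);
		head = next;
	}

	if (rf)
		sk_repin(rq, rf);
}

/* Example callback; asserts the assumed "lock held, not pinned" rule. */
static void sk_push_task(struct sk_rq *rq)
{
	assert(rq->locked && !rq->pinned);
	rq->ran++;
}
```

In this model a caller that still holds the pin (like the
rt_mutex_setprio() hunk) passes its flags, while one that has already
unpinned (like the __schedule() hunk) passes NULL.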