From: luca abeni <luca.abeni@santannapisa.it>
To: Peter Zijlstra <peterz@infradead.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>,
Pierre Gondois <pierre.gondois@arm.com>,
tj@kernel.org, linux-kernel@vger.kernel.org, mingo@kernel.org,
juri.lelli@redhat.com, vincent.guittot@linaro.org,
dietmar.eggemann@arm.com, rostedt@goodmis.org,
bsegall@google.com, mgorman@suse.de, vschneid@redhat.com,
longman@redhat.com, hannes@cmpxchg.org, mkoutny@suse.com,
void@manifault.com, arighi@nvidia.com, changwoo@igalia.com,
cgroups@vger.kernel.org, sched-ext@lists.linux.dev,
liuwenfang@honor.com, tglx@linutronix.de,
Christian Loehle <christian.loehle@arm.com>
Subject: Re: [PATCH 05/12] sched: Move sched_class::prio_changed() into the change pattern
Date: Wed, 14 Jan 2026 15:04:30 +0100
Message-ID: <20260114150430.36cb2b4a@nowhere>
In-Reply-To: <20260114130528.GB831285@noisy.programming.kicks-ass.net>
Hi Peter,
On Wed, 14 Jan 2026 14:05:28 +0100
Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, Jan 14, 2026 at 11:23:36AM +0100, Peter Zijlstra wrote:
>
> > Juri, Luca, I'm tempted to suggest to simply remove the replenish on
> > RESTORE entirely -- that would allow the task to continue as it had
> > been, irrespective of it being 'late'.
> >
> > Something like so -- what would this break?
> >
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -2214,10 +2214,6 @@ enqueue_dl_entity(struct sched_dl_entity
> > update_dl_entity(dl_se);
> > } else if (flags & ENQUEUE_REPLENISH) {
> > replenish_dl_entity(dl_se);
> > - } else if ((flags & ENQUEUE_RESTORE) &&
> > - !is_dl_boosted(dl_se) &&
> > - dl_time_before(dl_se->deadline, rq_clock(rq_of_dl_se(dl_se)))) {
> > - setup_new_dl_entity(dl_se);
> > }
> >
> > /*
>
> Ah, this is de-boost, right? Boosting allows one to break the CBS
> rules and then we have to rein in the excesses.
Sorry, I am missing a bit of context (I am still catching up on the
mailing list archives)... But I agree that the call to
setup_new_dl_entity() mentioned above does not make much sense.
I suspect the hunk above could simply be removed, as you originally
suggested. On de-boosting, the task returns to its original deadline,
which is larger than the inherited one, so I am not sure whether we
should generate a new deadline or just leave it as it is, even if it
has been missed.
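To make the two options concrete, here is a minimal user-space sketch of the de-boost enqueue decision. It is a toy model, not kernel code: struct toy_dl_entity, toy_setup_new() and the regenerate switch are hypothetical names, and the "new period" step assumes that setup_new_dl_entity() resets the deadline to now + dl_deadline and refills the budget.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for struct sched_dl_entity; only the fields used here. */
struct toy_dl_entity {
	uint64_t deadline;     /* absolute deadline (ns) */
	int64_t  runtime;      /* remaining budget (ns) */
	uint64_t dl_deadline;  /* relative deadline parameter (ns) */
	uint64_t dl_runtime;   /* budget parameter (ns) */
	bool     boosted;      /* still running with an inherited deadline? */
};

/* Wrap-safe "a is before b", in the style of dl_time_before(). */
static bool toy_time_before(uint64_t a, uint64_t b)
{
	return (int64_t)(a - b) < 0;
}

/* Assumption: a "new entity" starts a fresh period at the current time. */
static void toy_setup_new(struct toy_dl_entity *se, uint64_t now)
{
	se->deadline = now + se->dl_deadline;
	se->runtime = (int64_t)se->dl_runtime;
}

/*
 * Enqueue after de-boost.
 * regenerate == true  models keeping the hunk (a missed deadline gets a new one),
 * regenerate == false models removing it (the task continues as it was).
 * Returns true if a new deadline was generated.
 */
static bool toy_deboost_enqueue(struct toy_dl_entity *se, uint64_t now,
				bool regenerate)
{
	if (regenerate && !se->boosted && toy_time_before(se->deadline, now)) {
		toy_setup_new(se, now);
		return true;
	}
	return false;
}
```

With the hunk kept, a de-boosted task whose deadline is already in the past restarts with a full budget; with the hunk removed, it keeps its old deadline and whatever runtime it had left.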
Luca
>
> But we have {DE,EN}QUEUE_MOVE for this, that explicitly allows
> priority to change and is set for rt_mutex_setprio() (among others).
>
> So doing s/RESTORE/MOVE/ above.
>
> The corollary to all this is that everybody that sets MOVE must be
> able to deal with balance callbacks, so audit that too.
>
> This then gives something like so.. which builds and boots for me, but
> clearly I haven't been able to trigger these funny cases.
>
> ---
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -4969,9 +4969,13 @@ struct balance_callback *splice_balance_
> return __splice_balance_callbacks(rq, true);
> }
>
> -static void __balance_callbacks(struct rq *rq)
> +void __balance_callbacks(struct rq *rq, struct rq_flags *rf)
> {
> + if (rf)
> + rq_unpin_lock(rq, rf);
> do_balance_callbacks(rq, __splice_balance_callbacks(rq, false));
> + if (rf)
> + rq_repin_lock(rq, rf);
> }
>
> void balance_callbacks(struct rq *rq, struct balance_callback *head)
> @@ -5018,7 +5022,7 @@ static inline void finish_lock_switch(st
> * prev into current:
> */
> spin_acquire(&__rq_lockp(rq)->dep_map, 0, 0, _THIS_IP_);
> - __balance_callbacks(rq);
> + __balance_callbacks(rq, NULL);
> raw_spin_rq_unlock_irq(rq);
> }
>
> @@ -6901,7 +6905,7 @@ static void __sched notrace __schedule(i
> proxy_tag_curr(rq, next);
>
> rq_unpin_lock(rq, &rf);
> - __balance_callbacks(rq);
> + __balance_callbacks(rq, NULL);
> raw_spin_rq_unlock_irq(rq);
> }
> trace_sched_exit_tp(is_switch);
> @@ -7350,7 +7354,7 @@ void rt_mutex_setprio(struct task_struct
> trace_sched_pi_setprio(p, pi_task);
> oldprio = p->prio;
>
> - if (oldprio == prio)
> + if (oldprio == prio && !dl_prio(prio))
> queue_flag &= ~DEQUEUE_MOVE;
>
> prev_class = p->sched_class;
> @@ -7396,9 +7400,7 @@ void rt_mutex_setprio(struct task_struct
> out_unlock:
> /* Caller holds task_struct::pi_lock, IRQs are still disabled */
> - rq_unpin_lock(rq, &rf);
> - __balance_callbacks(rq);
> - rq_repin_lock(rq, &rf);
> + __balance_callbacks(rq, &rf);
> __task_rq_unlock(rq, p, &rf);
> }
> #endif /* CONFIG_RT_MUTEXES */
> @@ -9167,6 +9169,8 @@ void sched_move_task(struct task_struct
>
> if (resched)
> resched_curr(rq);
> +
> + __balance_callbacks(rq, &rq_guard.rf);
> }
>
> static struct cgroup_subsys_state *
> @@ -10891,6 +10895,9 @@ void sched_change_end(struct sched_chang
> resched_curr(rq);
> }
> } else {
> + /*
> + * XXX validate prio only really changed when ENQUEUE_MOVE is set.
> + */
> p->sched_class->prio_changed(rq, p, ctx->prio);
> }
> }
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -2214,9 +2214,14 @@ enqueue_dl_entity(struct sched_dl_entity
> update_dl_entity(dl_se);
> } else if (flags & ENQUEUE_REPLENISH) {
> replenish_dl_entity(dl_se);
> - } else if ((flags & ENQUEUE_RESTORE) &&
> + } else if ((flags & ENQUEUE_MOVE) &&
> !is_dl_boosted(dl_se) &&
> dl_time_before(dl_se->deadline, rq_clock(rq_of_dl_se(dl_se)))) {
> + /*
> + * Deals with the de-boost case, and ENQUEUE_MOVE explicitly
> + * allows us to change priority. Callers are expected to deal
> + * with balance_callbacks.
> + */
> setup_new_dl_entity(dl_se);
> }
>
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -545,6 +545,7 @@ static void scx_task_iter_start(struct s
> static void __scx_task_iter_rq_unlock(struct scx_task_iter *iter)
> {
> if (iter->locked_task) {
> + __balance_callbacks(iter->rq, &iter->rf);
> task_rq_unlock(iter->rq, iter->locked_task, &iter->rf);
> iter->locked_task = NULL;
> }
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2430,7 +2430,8 @@ extern const u32 sched_prio_to_wmult[40
> * should preserve as much state as possible.
> *
> * MOVE - paired with SAVE/RESTORE, explicitly does not preserve the location
> - * in the runqueue.
> + * in the runqueue. IOW the priority is allowed to change. Callers
> + * must expect to deal with balance callbacks.
> *
> * NOCLOCK - skip the update_rq_clock() (avoids double updates)
> *
> @@ -4019,6 +4020,8 @@ extern void enqueue_task(struct rq *rq,
> extern bool dequeue_task(struct rq *rq, struct task_struct *p, int flags);
> extern struct balance_callback *splice_balance_callbacks(struct rq *rq);
> +
> +extern void __balance_callbacks(struct rq *rq, struct rq_flags *rf);
> extern void balance_callbacks(struct rq *rq, struct balance_callback *head);
> /*
> --- a/kernel/sched/syscalls.c
> +++ b/kernel/sched/syscalls.c
> @@ -639,7 +639,7 @@ int __sched_setscheduler(struct task_str
> * itself.
> */
> newprio = rt_effective_prio(p, newprio);
> - if (newprio == oldprio)
> + if (newprio == oldprio && !dl_prio(newprio))
> queue_flags &= ~DEQUEUE_MOVE;
> }
>
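For readers following the rt_mutex_setprio() and __sched_setscheduler() hunks quoted above, the flag handling can be sketched in user space as below. This is only an illustration: the TOY_ constants use made-up values rather than the kernel's, toy_queue_flags() is a hypothetical helper, and toy_dl_prio() assumes that deadline tasks are the ones with a negative prio.

```c
#include <stdbool.h>

#define TOY_MAX_DL_PRIO   0     /* assumption: deadline tasks have prio < 0 */
#define TOY_DEQUEUE_SAVE  0x02  /* illustrative values, not the kernel's */
#define TOY_DEQUEUE_MOVE  0x04

/* Assumed equivalent of dl_prio(): true for deadline-class priorities. */
static bool toy_dl_prio(int prio)
{
	return prio < TOY_MAX_DL_PRIO;
}

/*
 * Start from SAVE|MOVE and drop MOVE only when the priority is unchanged
 * AND the task is not a deadline task. A deadline task keeps MOVE even
 * with an equal prio value, because its effective priority (the deadline)
 * can still change.
 */
static int toy_queue_flags(int oldprio, int newprio)
{
	int flags = TOY_DEQUEUE_SAVE | TOY_DEQUEUE_MOVE;

	if (oldprio == newprio && !toy_dl_prio(newprio))
		flags &= ~TOY_DEQUEUE_MOVE;

	return flags;
}
```

In this model an unchanged non-deadline priority yields SAVE only, while any deadline priority, or any actual priority change, keeps MOVE set, which is the case where callers have to be prepared for balance callbacks.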