* [PATCH 0/6] sched/proxy: doodles..
@ 2026-05-26 11:16 Peter Zijlstra
2026-05-26 11:16 ` [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in() Peter Zijlstra
` (6 more replies)
0 siblings, 7 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:16 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, peterz, Mike Galbraith
This goes on top of queue:sched/proxy.
I was trying to do some cleanups and playing around with that PROXY_WAKING
removal, and also moving towards using ->is_blocked to replace the core
se.sched_delayed usage.
But aside from making a few silly mistakes and taking far too long to figure
out WTF happened, I've run into a snag with the scheme to remove PROXY_WAKING.
This happens in patch #4, where we switch the proxy paths to be guarded by
->is_blocked, rather than ->blocked_on. This works fine for schedule() /
pick_next_task, since that guarantees there are no delayed tasks.
However ttwu_runnable() has no such luck, and if ->is_blocked is all we have,
then it turns out that we'll always fully block delayed tasks and send them
through the long path.
Now, I did me some ttwu-delayed patches a while ago, and Mike has been poking
me about them. Those patches pick out the delayed things before we take locks,
so perhaps we can resolve this that way. I'll have to poke a bit more.
In the meantime, I figured I'd share the patches I got... I think at least the
first 3 might live :-)
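
The ttwu_runnable() snag described above can be illustrated with a small
userspace toy model. This is only a sketch, not kernel code: struct toy_task,
enum toy_path and both wake_guarded_by_*() helpers are hypothetical names made
up for illustration, and the model assumes that a delayed task has
->is_blocked set while ->blocked_on is NULL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the task fields discussed above; not the kernel's task_struct. */
struct toy_task {
	bool is_blocked;        /* models p->is_blocked */
	bool sched_delayed;     /* models p->se.sched_delayed (kept for labelling only) */
	const void *blocked_on; /* models p->blocked_on; NULL means no mutex relation */
};

enum toy_path { TOY_FAST_PATH, TOY_LONG_PATH };

/* Guard on ->blocked_on: only tasks with a mutex relation leave the fast path. */
static enum toy_path wake_guarded_by_blocked_on(const struct toy_task *p)
{
	return p->blocked_on ? TOY_LONG_PATH : TOY_FAST_PATH;
}

/* Guard on ->is_blocked alone: delayed tasks are caught as well. */
static enum toy_path wake_guarded_by_is_blocked(const struct toy_task *p)
{
	return p->is_blocked ? TOY_LONG_PATH : TOY_FAST_PATH;
}
```

In this model a delayed task (is_blocked set, blocked_on NULL) stays on the
fast path under the first guard and is pushed onto the long path under the
second, which is the behaviour change the cover letter worries about.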
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
@ 2026-05-26 11:16 ` Peter Zijlstra
2026-05-26 23:39 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:16 ` [PATCH 2/6] sched/proxy: Optimize try_to_wake_up() Peter Zijlstra
` (5 subsequent siblings)
6 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:16 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Peter Zijlstra (Intel), Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Per the discussion here:
https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/
The reason for this condition is that the signal condition in
try_to_block_task() would set_task_blocked_on_waking(). However, it no longer
does that; in fact, that path does clear_task_blocked_on(), rendering the
clause under discussion moot.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 3 ---
1 file changed, 3 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7132,9 +7132,6 @@ static void __sched notrace __schedule(i
if (sched_proxy_exec()) {
struct task_struct *prev_donor = rq->donor;
- if (!prev_state && prev->blocked_on)
- clear_task_blocked_on(prev, NULL);
-
rq_set_donor(rq, next);
next->blocked_donor = NULL;
if (unlikely(next->is_blocked && next->blocked_on)) {
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 2/6] sched/proxy: Optimize try_to_wake_up()
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
2026-05-26 11:16 ` [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in() Peter Zijlstra
@ 2026-05-26 11:16 ` Peter Zijlstra
2026-05-27 1:56 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:16 ` [PATCH 3/6] sched: Be more strict about p->is_blocked Peter Zijlstra
` (4 subsequent siblings)
6 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:16 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Peter Zijlstra (Intel), Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
The reason for the clause in try_to_wake_up() is, per its comment, that
find_proxy_task()'s proxy_deactivate() is not always called with a cleared
p->blocked_on.
However, that seems silly and easily cured. Make sure to always call
proxy_deactivate() with a cleared p->blocked_on such that we might remove this
clause from the common wake-up path.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4344,14 +4344,6 @@ int try_to_wake_up(struct task_struct *p
WRITE_ONCE(p->__state, TASK_WAKING);
/*
- * We never clear the blocked_on relation on proxy_deactivate.
- * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
- * when waking up. Since this is a fully blocked, off CPU task
- * waking up, it should be safe to clear the blocked_on relation.
- */
- if (task_is_blocked(p))
- clear_task_blocked_on(p, NULL);
- /*
* If the owning (remote) CPU is still in the middle of schedule() with
* this task as prev, considering queueing p on the remote CPUs wake_list
* which potentially sends an IPI instead of spinning on p->on_cpu to
@@ -6739,6 +6731,7 @@ static void proxy_deactivate(struct rq *
unsigned long state = READ_ONCE(donor->__state);
WARN_ON_ONCE(state == TASK_RUNNING);
+ WARN_ON_ONCE(donor->blocked_on);
/*
* Because we got donor from pick_next_task(), it is *crucial*
* that we call proxy_resched_idle() before we deactivate it.
@@ -6864,9 +6857,9 @@ find_proxy_task(struct rq *rq, struct ta
for (p = donor; (mutex = p->blocked_on); p = owner) {
/* if its PROXY_WAKING, do return migration or run if current */
if (mutex == PROXY_WAKING) {
+ clear_task_blocked_on(p, PROXY_WAKING);
if (task_current(rq, p)) {
p->is_blocked = 0;
- clear_task_blocked_on(p, PROXY_WAKING);
return p;
}
goto deactivate;
@@ -6900,9 +6893,9 @@ find_proxy_task(struct rq *rq, struct ta
* and return p (if it is current and safe to
* just run on this rq), or return-migrate the task.
*/
+ __clear_task_blocked_on(p, NULL);
if (task_current(rq, p)) {
p->is_blocked = 0;
- __clear_task_blocked_on(p, NULL);
return p;
}
goto deactivate;
@@ -6912,6 +6905,7 @@ find_proxy_task(struct rq *rq, struct ta
/* XXX Don't handle blocked owners/delayed dequeue yet */
if (curr_in_chain)
return proxy_resched_idle(rq);
+ __clear_task_blocked_on(p, NULL);
goto deactivate;
}
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 3/6] sched: Be more strict about p->is_blocked
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
2026-05-26 11:16 ` [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in() Peter Zijlstra
2026-05-26 11:16 ` [PATCH 2/6] sched/proxy: Optimize try_to_wake_up() Peter Zijlstra
@ 2026-05-26 11:16 ` Peter Zijlstra
2026-05-27 1:56 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:16 ` [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked Peter Zijlstra
` (3 subsequent siblings)
6 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:16 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Peter Zijlstra (Intel), Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Upon entry to try_to_block_task(), p->is_blocked should be false. After all,
the prior wakeup would have made it so per ttwu_do_wakeup().
Ensure this is the case, rather than clearing it in the path that doesn't set
it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6676,8 +6676,9 @@ static bool try_to_block_task(struct rq
{
unsigned long task_state = *task_state_p;
+ WARN_ON_ONCE(p->is_blocked);
+
if (signal_pending_state(task_state, p)) {
- p->is_blocked = 0;
WRITE_ONCE(p->__state, TASK_RUNNING);
*task_state_p = TASK_RUNNING;
clear_task_blocked_on(p, NULL);
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
` (2 preceding siblings ...)
2026-05-26 11:16 ` [PATCH 3/6] sched: Be more strict about p->is_blocked Peter Zijlstra
@ 2026-05-26 11:16 ` Peter Zijlstra
2026-05-26 14:57 ` Peter Zijlstra
` (2 more replies)
2026-05-26 11:16 ` [PATCH 5/6] sched/proxy: Remove PROXY_WAKING Peter Zijlstra
` (2 subsequent siblings)
6 siblings, 3 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:16 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Peter Zijlstra (Intel), Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.
This opens up the state: '->is_blocked && !->blocked_on' for future use.
Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
guaranteed that sched_class::pick_task() will never return a delayed task.
Therefore any task returned from pick_next_task() that has ->is_blocked set,
must be a proxy task.
XXX: ttwu_runnable(): AFAICT this results in all delayed tasks getting blocked
and sent down the long wakeup-path -- and while there were some plans there
[*], that was especially careful to not take all those locks.
[*] https://lore.kernel.org/r/20250702114924.091581796@infradead.org
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3764,7 +3764,7 @@ static inline void proxy_reset_donor(str
*/
static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
{
- if (!task_is_blocked(p))
+ if (!p->is_blocked)
return false;
scoped_guard(raw_spinlock, &p->blocked_lock) {
@@ -6850,14 +6850,14 @@ find_proxy_task(struct rq *rq, struct ta
bool curr_in_chain = false;
int this_cpu = cpu_of(rq);
struct task_struct *p;
- struct mutex *mutex;
int owner_cpu;
/* Follow blocked_on chain. */
- for (p = donor; (mutex = p->blocked_on); p = owner) {
+ for (p = donor; p->is_blocked; p = owner) {
/* if its PROXY_WAKING, do return migration or run if current */
- if (mutex == PROXY_WAKING) {
- clear_task_blocked_on(p, PROXY_WAKING);
+ struct mutex *mutex = p->blocked_on;
+ if (!mutex || mutex == PROXY_WAKING) {
+ clear_task_blocked_on(p, mutex);
if (task_current(rq, p)) {
p->is_blocked = 0;
return p;
@@ -7128,7 +7128,7 @@ static void __sched notrace __schedule(i
rq_set_donor(rq, next);
next->blocked_donor = NULL;
- if (unlikely(next->is_blocked && next->blocked_on)) {
+ if (unlikely(next->is_blocked)) {
next = find_proxy_task(rq, next, &rf);
if (!next) {
zap_balance_callbacks(rq);
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
` (3 preceding siblings ...)
2026-05-26 11:16 ` [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked Peter Zijlstra
@ 2026-05-26 11:16 ` Peter Zijlstra
2026-06-01 10:54 ` Peter Zijlstra
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-05-26 11:16 ` [PATCH 6/6] sched: Simplify ttwu_runnable() Peter Zijlstra
2026-05-26 11:45 ` [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
6 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:16 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Peter Zijlstra (Intel), Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
From: K Prateek Nayak <kprateek.nayak@amd.com>
Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
!->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
blocked_on relation is broken but the donor task might still need a return
migration.
(Not-yet-)Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
include/linux/sched.h | 50 +---------------------------------------------
kernel/locking/mutex.c | 4 +--
kernel/locking/ww_mutex.h | 4 +--
kernel/sched/core.c | 13 ++++++++++-
4 files changed, 17 insertions(+), 54 deletions(-)
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2202,19 +2202,10 @@ extern int __cond_resched_rwlock_write(r
#ifndef CONFIG_PREEMPT_RT
-/*
- * With proxy exec, if a task has been proxy-migrated, it may be a donor
- * on a cpu that it can't actually run on. Thus we need a special state
- * to denote that the task is being woken, but that it needs to be
- * evaluated for return-migration before it is run. So if the task is
- * blocked_on PROXY_WAKING, return migrate it before running it.
- */
-#define PROXY_WAKING ((struct mutex *)(-1L))
-
static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
{
lockdep_assert_held_once(&p->blocked_lock);
- return p->blocked_on == PROXY_WAKING ? NULL : p->blocked_on;
+ return p->blocked_on;
}
static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
@@ -2242,7 +2233,7 @@ static inline void __clear_task_blocked_
* blocked_on relationships, but make sure we are not
* clearing the relationship with a different lock.
*/
- WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
+ WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
p->blocked_on = NULL;
}
@@ -2251,35 +2242,6 @@ static inline void clear_task_blocked_on
guard(raw_spinlock_irqsave)(&p->blocked_lock);
__clear_task_blocked_on(p, m);
}
-
-static inline void __set_task_blocked_on_waking(struct task_struct *p, struct mutex *m)
-{
- /* Currently we serialize blocked_on under the task::blocked_lock */
- lockdep_assert_held_once(&p->blocked_lock);
-
- if (!sched_proxy_exec()) {
- __clear_task_blocked_on(p, m);
- return;
- }
-
- /* Don't set PROXY_WAKING if blocked_on was already cleared */
- if (!p->blocked_on)
- return;
- /*
- * There may be cases where we set PROXY_WAKING on tasks that were
- * already set to waking, but make sure we are not changing
- * the relationship with a different lock.
- */
- WARN_ON_ONCE(m && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
- p->blocked_on = PROXY_WAKING;
-}
-
-static inline void set_task_blocked_on_waking(struct task_struct *p, struct mutex *m)
-{
- guard(raw_spinlock_irqsave)(&p->blocked_lock);
- __set_task_blocked_on_waking(p, m);
-}
-
#else
static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
{
@@ -2288,14 +2250,6 @@ static inline void __clear_task_blocked_
static inline void clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
{
}
-
-static inline void __set_task_blocked_on_waking(struct task_struct *p, struct rt_mutex *m)
-{
-}
-
-static inline void set_task_blocked_on_waking(struct task_struct *p, struct rt_mutex *m)
-{
-}
#endif /* !CONFIG_PREEMPT_RT */
static __always_inline bool need_resched(void)
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -1043,7 +1043,7 @@ static noinline void __sched __mutex_unl
next_lock = __get_task_blocked_on(donor);
if (next_lock == lock) {
next = get_task_struct(donor);
- __set_task_blocked_on_waking(donor, next_lock);
+ __clear_task_blocked_on(next, lock);
current->blocked_donor = NULL;
}
raw_spin_unlock(&donor->blocked_lock);
@@ -1059,7 +1059,7 @@ static noinline void __sched __mutex_unl
raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING);
debug_mutex_wake_waiter(lock, waiter);
- __set_task_blocked_on_waking(next, lock);
+ __clear_task_blocked_on(next, lock);
raw_spin_unlock(&next->blocked_lock);
}
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -324,7 +324,7 @@ __ww_mutex_die(struct MUTEX *lock, struc
* blocked_on to PROXY_WAKING. Otherwise we can see
* circular blocked_on relationships that can't resolve.
*/
- set_task_blocked_on_waking(waiter->task, lock);
+ clear_task_blocked_on(waiter->task, lock);
wake_q_add(wake_q, waiter->task);
}
@@ -383,7 +383,7 @@ static bool __ww_mutex_wound(struct MUTE
* are waking the mutex owner, who may be currently
* blocked on a different mutex.
*/
- set_task_blocked_on_waking(owner, NULL);
+ clear_task_blocked_on(owner, NULL);
wake_q_add(wake_q, owner);
}
return true;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6856,7 +6865,7 @@ find_proxy_task(struct rq *rq, struct ta
for (p = donor; p->is_blocked; p = owner) {
/* if its PROXY_WAKING, do return migration or run if current */
struct mutex *mutex = p->blocked_on;
- if (!mutex || mutex == PROXY_WAKING) {
+ if (!mutex) {
clear_task_blocked_on(p, mutex);
if (task_current(rq, p)) {
p->is_blocked = 0;
^ permalink raw reply [flat|nested] 45+ messages in thread
* [PATCH 6/6] sched: Simplify ttwu_runnable()
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
` (4 preceding siblings ...)
2026-05-26 11:16 ` [PATCH 5/6] sched/proxy: Remove PROXY_WAKING Peter Zijlstra
@ 2026-05-26 11:16 ` Peter Zijlstra
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:45 ` [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
6 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:16 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Peter Zijlstra (Intel), Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Note that both proxy and delayed tasks have ->is_blocked set. Use this one
condition to guard both paths.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
kernel/sched/core.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3764,9 +3764,6 @@ static inline void proxy_reset_donor(str
*/
static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
{
- if (!p->is_blocked)
- return false;
-
scoped_guard(raw_spinlock, &p->blocked_lock) {
/* Task is waking up; clear any blocked_on relationship */
__clear_task_blocked_on(p, NULL);
@@ -3860,10 +3857,12 @@ static int ttwu_runnable(struct task_str
return 0;
update_rq_clock(rq);
- if (p->se.sched_delayed)
- enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
- if (proxy_needs_return(rq, p))
- return 0;
+ if (p->is_blocked) {
+ if (p->se.sched_delayed)
+ enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+ if (proxy_needs_return(rq, p))
+ return 0;
+ }
if (!task_on_cpu(rq, p)) {
/*
* When on_rq && !on_cpu the task is preempted, see if
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 0/6] sched/proxy: doodles..
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
` (5 preceding siblings ...)
2026-05-26 11:16 ` [PATCH 6/6] sched: Simplify ttwu_runnable() Peter Zijlstra
@ 2026-05-26 11:45 ` Peter Zijlstra
6 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 11:45 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
I managed to screw up the emails and most MTAs refused to accept them. The
mails made their way to lore though.
On Tue, May 26, 2026 at 01:16:09PM +0200, Peter Zijlstra wrote:
> This goes on top of queue:sched/proxy.
>
> I was trying to do some cleanups and playing around with that PROXY_WAKING
> removal, and also moving towards using ->is_blocked to replace the core
> se.sched_delayed usage.
>
> But aside from making a few silly mistakes and taking far too long to figure
> out WTF happened, I've run into a snag with the scheme to remove PROXY_WAKING.
>
> This happens in patch #4, where we switch the proxy paths to be guarded by
> ->is_blocked, rather than ->blocked_on. This works fine for schedule() /
> pick_next_task, since that guarantees there are no delayed tasks.
>
> However ttwu_runnable() has no such luck, and if ->is_blocked is all we have,
> then it turns out that we'll always fully block delayed tasks and send them
> through the long path.
>
> Now, I did me some ttwu-delayed patches a while ago, and Mike has been poking
> me about them. Those patches pick out the delayed things before we take locks,
> so perhaps we can resolve this that way. I'll have to poke a bit more.
>
> In the meantime, I figured I'd share the patches I got... I think at least the
> first 3 might live :-)
https://lore.kernel.org/all/20260526111609.433880331@infradead.org/T/#u
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked
2026-05-26 11:16 ` [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked Peter Zijlstra
@ 2026-05-26 14:57 ` Peter Zijlstra
2026-05-26 19:48 ` John Stultz
2026-05-27 2:25 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] sched/proxy: Switch proxy to use p->is_blocked tip-bot2 for Peter Zijlstra
2 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-26 14:57 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 01:16:13PM +0200, Peter Zijlstra wrote:
> Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.
>
> This opens up the state: '->is_blocked && !->blocked_on' for future use.
>
> Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
> guaranteed that sched_class::pick_task() will never return a delayed task.
> Therefore any task returned from pick_next_task() that has ->is_blocked set,
> must be a proxy task.
>
> XXX: ttwu_runnable(): AFAICT this results in all delayed tasks getting blocked
> and sent down the long wakeup-path -- and while there were some plans there
> [*], that was especially careful to not take all those locks.
>
> [*] https://lore.kernel.org/r/20250702114924.091581796@infradead.org
>
> Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/sched/core.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3764,7 +3764,7 @@ static inline void proxy_reset_donor(str
> */
> static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
> {
> - if (!task_is_blocked(p))
> + if (!p->is_blocked)
> return false;
Oh, I think we can solve things if we have a cpus_allowed check here. If
the task is on an allowed CPU, it doesn't need migration and we can carry
on without eating the overhead.
>
> scoped_guard(raw_spinlock, &p->blocked_lock) {
> @@ -6850,14 +6850,14 @@ find_proxy_task(struct rq *rq, struct ta
> bool curr_in_chain = false;
> int this_cpu = cpu_of(rq);
> struct task_struct *p;
> - struct mutex *mutex;
> int owner_cpu;
>
> /* Follow blocked_on chain. */
> - for (p = donor; (mutex = p->blocked_on); p = owner) {
> + for (p = donor; p->is_blocked; p = owner) {
> /* if its PROXY_WAKING, do return migration or run if current */
> - if (mutex == PROXY_WAKING) {
> - clear_task_blocked_on(p, PROXY_WAKING);
> + struct mutex *mutex = p->blocked_on;
> + if (!mutex || mutex == PROXY_WAKING) {
> + clear_task_blocked_on(p, mutex);
> if (task_current(rq, p)) {
> p->is_blocked = 0;
> return p;
> @@ -7128,7 +7128,7 @@ static void __sched notrace __schedule(i
>
> rq_set_donor(rq, next);
> next->blocked_donor = NULL;
> - if (unlikely(next->is_blocked && next->blocked_on)) {
> + if (unlikely(next->is_blocked)) {
> next = find_proxy_task(rq, next, &rf);
> if (!next) {
> zap_balance_callbacks(rq);
>
>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked
2026-05-26 14:57 ` Peter Zijlstra
@ 2026-05-26 19:48 ` John Stultz
0 siblings, 0 replies; 45+ messages in thread
From: John Stultz @ 2026-05-26 19:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 7:57 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, May 26, 2026 at 01:16:13PM +0200, Peter Zijlstra wrote:
> > Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.
> >
> > This opens up the state: '->is_blocked && !->blocked_on' for future use.
> >
> > Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
> > guaranteed that sched_class::pick_task() will never return a delayed task.
> > Therefore any task returned from pick_next_task() that has ->is_blocked set,
> > must be a proxy task.
> >
> > XXX: ttwu_runnable(): AFAICT this results in all delayed tasks getting blocked
> > and sent down the long wakeup-path -- and while there were some plans there
> > [*], that was especially careful to not take all those locks.
> >
> > [*] https://lore.kernel.org/r/20250702114924.091581796@infradead.org
> >
> > Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/sched/core.c | 12 ++++++------
> > 1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3764,7 +3764,7 @@ static inline void proxy_reset_donor(str
> > */
> > static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
> > {
> > - if (!task_is_blocked(p))
> > + if (!p->is_blocked)
> > return false;
>
> Oh, I think we can solve things if we have a cpus_allowed check here. If
> the task is on an allowed CPU, it doesn't need migration and we can carry
> on without eating the overhead.
I need to look more at your patches here, but I had a similar shortcut
a while back in early versions of the series, and I dropped it because
it was pointed out that on big-little systems, you might have an
important task on the big that proxy-migrates to a little to get a
lock owned by a background task quickly released. But when the owner
wakes up the donor, if it wakes it on the little's rq, then it may be
a while until the important task gets re-balanced to the big, impacting
performance. Instead it seemed better to match the non-proxy behavior:
when the task is proxy-migrated, it's the same as if it's off the
rq blocking, and thus on wakeup we'd want to do the full placement
(hopefully back to the big, but at least wherever select_task_rq()
chooses).
That way proxy-migrations won't disrupt scheduling behavior very much.
thanks
-john
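
The trade-off between the cpus_allowed shortcut and full placement can be
sketched with a userspace toy model. Everything here is hypothetical and only
for illustration: struct toy_task, its 64-bit cpus_mask, and preferred_cpu
(a stand-in for whatever select_task_rq() would choose) are assumptions, not
kernel structures or APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy task: an affinity bitmask plus the CPU that full placement would pick. */
struct toy_task {
	uint64_t cpus_mask; /* bit n set: the task may run on CPU n */
	int preferred_cpu;  /* hypothetical result of a full placement decision */
};

/* True if the toy task is allowed to run on @cpu. */
static bool toy_cpu_allowed(const struct toy_task *p, int cpu)
{
	return cpu >= 0 && cpu < 64 && ((p->cpus_mask >> cpu) & 1);
}

/* Shortcut: stay where the task is whenever the current CPU is allowed. */
static int toy_wake_cpu_shortcut(const struct toy_task *p, int cur_cpu)
{
	return toy_cpu_allowed(p, cur_cpu) ? cur_cpu : p->preferred_cpu;
}

/* Full placement: always redo placement, as for a task that blocked off the rq. */
static int toy_wake_cpu_full(const struct toy_task *p, int cur_cpu)
{
	(void)cur_cpu; /* the current CPU is deliberately ignored */
	return p->preferred_cpu;
}
```

With a task allowed on CPUs 0-7, preferring CPU 0 (a big core in this toy
setup) and currently proxy-migrated to CPU 4 (a little core), the shortcut
leaves it on CPU 4 while full placement returns it to CPU 0.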
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-26 11:16 ` [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in() Peter Zijlstra
@ 2026-05-26 23:39 ` John Stultz
2026-05-26 23:54 ` John Stultz
2026-05-28 23:20 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
1 sibling, 2 replies; 45+ messages in thread
From: John Stultz @ 2026-05-26 23:39 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Per the discussion here:
>
> https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/
>
> The reason for this condition is that the signal condition in
> try_to_block_task() would set_task_blocked_on_waking(). However, it no longer
> does that; in fact, that path does clear_task_blocked_on(), rendering the
> clause under discussion moot.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
> kernel/sched/core.c | 3 ---
> 1 file changed, 3 deletions(-)
>
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7132,9 +7132,6 @@ static void __sched notrace __schedule(i
> if (sched_proxy_exec()) {
> struct task_struct *prev_donor = rq->donor;
>
> - if (!prev_state && prev->blocked_on)
> - clear_task_blocked_on(prev, NULL);
> -
> rq_set_donor(rq, next);
> next->blocked_donor = NULL;
> if (unlikely(next->is_blocked && next->blocked_on)) {
Oh good! I had a note to try to re-confirm if that chunk was really
needed, as it did feel a bit like it was patching up a problem after
the fact.
That said, running this on top of your sched/proxy branch tripped over
warnings with the ww_mutex selftest, so it looks like there's
something else missing before this can land.
Digging in, it looks like we still need the fix I had here:
https://lore.kernel.org/lkml/20260430215103.2978955-3-jstultz@google.com/
Since without that, we can get into a situation where we have
blocked_on set when a task's __state is TASK_RUNNING. The segment
you're dropping would catch and clear that out, but really we should
avoid getting into that situation in the mutex lock code.
thanks
-john
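
The situation described above, blocked_on still set while __state is
TASK_RUNNING, can be modelled in userspace as follows. This is a sketch under
assumptions: struct toy_task and both helpers are hypothetical names, and
toy_fixup_at_schedule() only mimics the shape of the hunk being removed (test
for a zero state, then clear), not the kernel's locking.

```c
#include <stdbool.h>
#include <stddef.h>

/* The kernel's TASK_RUNNING is 0, which is why the removed hunk tests !prev_state. */
#define TOY_TASK_RUNNING 0u

/* Toy task: just the two fields the invariant talks about. */
struct toy_task {
	unsigned int state;     /* models p->__state */
	const void *blocked_on; /* models p->blocked_on; NULL means no relation */
};

/* True when the task violates "TASK_RUNNING implies no blocked_on". */
static bool toy_stale_blocked_on(const struct toy_task *p)
{
	return p->state == TOY_TASK_RUNNING && p->blocked_on != NULL;
}

/*
 * Models the removed clause: patch things up after the fact.
 * Returns true if a stale relation had to be cleared.
 */
static bool toy_fixup_at_schedule(struct toy_task *p)
{
	if (!toy_stale_blocked_on(p))
		return false;
	p->blocked_on = NULL;
	return true;
}
```

In this model the fixup is a no-op as long as the lock code never leaves a
running task with blocked_on set, which matches the point that the mutex path
should avoid creating that state in the first place.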
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-26 23:39 ` John Stultz
@ 2026-05-26 23:54 ` John Stultz
2026-05-27 8:59 ` Peter Zijlstra
2026-05-28 23:20 ` John Stultz
1 sibling, 1 reply; 45+ messages in thread
From: John Stultz @ 2026-05-26 23:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 4:39 PM John Stultz <jstultz@google.com> wrote:
> On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Per the discussion here:
> >
> > https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/
> >
> > The reason for this condition is that the signal condition in
> > try_to_block_task() would set_task_blocked_in_waking(). However, it no longer
> > does that, in fact, that path does clear_task_blocked_on(), rendering the
> > clause under discussion moot.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/sched/core.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7132,9 +7132,6 @@ static void __sched notrace __schedule(i
> > if (sched_proxy_exec()) {
> > struct task_struct *prev_donor = rq->donor;
> >
> > - if (!prev_state && prev->blocked_on)
> > - clear_task_blocked_on(prev, NULL);
> > -
> > rq_set_donor(rq, next);
> > next->blocked_donor = NULL;
> > if (unlikely(next->is_blocked && next->blocked_on)) {
>
> Oh good! I had a note to try to re-confirm if that chunk was really
> needed, as it did feel a bit like it was patching up a problem after
> the fact.
>
> That said, running this on top of your sched/proxy branch tripped over
> warnings with the ww_mutex selftest, so it looks like there's
> something else missing before this can land.
>
> Digging in, it looks like we still need the fix I had here:
> https://lore.kernel.org/lkml/20260430215103.2978955-3-jstultz@google.com/
I'll be doing some stress testing with the rest of the set later, but so
far, with the extra fix I pointed to, this is looking ok.
Acked-by: John Stultz <jstultz@google.com>
thanks
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 2/6] sched/proxy: Optimize try_to_wake_up()
2026-05-26 11:16 ` [PATCH 2/6] sched/proxy: Optimize try_to_wake_up() Peter Zijlstra
@ 2026-05-27 1:56 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 45+ messages in thread
From: John Stultz @ 2026-05-27 1:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> The reason for the clause in try_to_wake_up() is, per its comment, that
> find_proxy_task()'s proxy_deactivate() is not always called with a cleared
> p->blocked_on.
>
> However, that seems silly and easily cured. Make sure to always call
> proxy_deactivate() with a cleared p->blocked_on such that we might remove this
> clause from the common wake-up path.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 3/6] sched: Be more strict about p->is_blocked
2026-05-26 11:16 ` [PATCH 3/6] sched: Be more strict about p->is_blocked Peter Zijlstra
@ 2026-05-27 1:56 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 45+ messages in thread
From: John Stultz @ 2026-05-27 1:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Upon entry to try_to_block_task(), p->is_blocked should be false. After all,
> the prior wakeup would have made it so per ttwu_do_wakeup().
>
> Ensure this is the case, rather than clearing it in the path that doesn't set
> it.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked
2026-05-26 11:16 ` [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked Peter Zijlstra
2026-05-26 14:57 ` Peter Zijlstra
@ 2026-05-27 2:25 ` John Stultz
2026-05-27 8:29 ` Peter Zijlstra
2026-06-04 18:45 ` [tip: sched/core] sched/proxy: Switch proxy to use p->is_blocked tip-bot2 for Peter Zijlstra
2 siblings, 1 reply; 45+ messages in thread
From: John Stultz @ 2026-05-27 2:25 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.
>
> This opens up the state: '->is_blocked && !->blocked_on' for future use.
>
> Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
> guaranteed that sched_class::pick_task() will never return a delayed task.
> Therefore any task returned from pick_next_task() that has ->is_blocked set,
> must be a proxy task.
While this seems true, it also feels very subtle.
Just taking a step back, while it might be possible, I'm not sure I'm
totally seeing the benefit of doing this.
When we were playing around with the idea of keeping ptr+latch-bit in
the blocked_on field, using NULL+latch to replace PROXY_WAKING made
sense, but with is_blocked being used for more than just proxy logic,
I'm not sure encoding meaning across the two fields is particularly
intuitive (and def seems more error prone). Is the special
PROXY_WAKING value really so awful? Or maybe does it make sense to
have different values for is_blocked (DELAYED, PROXY) so we can better
separate the variants when combining with blocked_on?
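For illustration only, the (DELAYED, PROXY) idea could look roughly like the user-space sketch below. The enum values, `struct task_model` and both helpers are hypothetical names made up for this example; none of them are existing kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the kinds of "blocked" a still-queued task could be in. */
enum blocked_kind {
	BLOCKED_NONE = 0,	/* runnable */
	BLOCKED_DELAYED,	/* delayed dequeue, still on the runqueue */
	BLOCKED_PROXY,		/* kept on the runqueue to donate to a lock owner */
};

/* Hypothetical stand-in for the few task_struct fields discussed here. */
struct task_model {
	bool on_rq;
	enum blocked_kind is_blocked;
	const void *blocked_on;	/* the mutex being waited on, or NULL */
};

/* With an explicit kind, nothing has to be inferred from blocked_on. */
static bool task_is_proxy(const struct task_model *p)
{
	return p->on_rq && p->is_blocked == BLOCKED_PROXY;
}

static bool task_is_delayed(const struct task_model *p)
{
	return p->on_rq && p->is_blocked == BLOCKED_DELAYED;
}
```

The trade-off is one more multi-valued field to keep consistent, versus not having to reason about which (is_blocked, blocked_on) combinations are reachable on a given path.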
It is a little funny to see how close this is getting to the separate
blocked_on_state + blocked_on management I had way back when before we
compressed that down with PROXY_WAKING. :)
thanks
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked
2026-05-27 2:25 ` John Stultz
@ 2026-05-27 8:29 ` Peter Zijlstra
2026-06-04 18:45 ` [tip: sched/core] sched/proxy: Only return migrate when needed tip-bot2 for Peter Zijlstra
0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-27 8:29 UTC (permalink / raw)
To: John Stultz
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 07:25:13PM -0700, John Stultz wrote:
> On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.
> >
> > This opens up the state: '->is_blocked && !->blocked_on' for future use.
> >
> > Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
> > guaranteed that sched_class::pick_task() will never return a delayed task.
> > Therefore any task returned from pick_next_task() that has ->is_blocked set,
> > must be a proxy task.
>
> While this seems true, it also feels very subtle.
>
> Just taking a step back, while it might be possible, I'm not sure I'm
> totally seeing the benefit of doing this.
>
> When we were playing around with the idea of keeping ptr+latch-bit in
> the blocked_on field, using NULL+latch to replace PROXY_WAKING made
> sense, but with is_blocked being used for more than just proxy logic,
> I'm not sure encoding meaning across the two fields is particularly
> intuitive (and def seems more error prone). Is the special
> PROXY_WAKING value really so awful? Or maybe does it make sense to
> have different values for is_blocked (DELAYED, PROXY) so we can better
> separate the variants when combining with blocked_on?
>
> It is a little funny to see how close this is getting to the separate
> blocked_on_state + blocked_on management I had way back when before we
> compressed that down with PROXY_WAKING. :)
Yeah, I was thinking the same. But then last night, after it cooled down
a bit, my brain started working again and I realized that there is a
simple test that should work.
Basically, *IF* we are proxy migrated -- and thus need a return
migration -- then task_cpu(p) != p->wake_cpu, per proxy_set_task_cpu().
This doesn't suffer the random migration issues you get from purely
checking against p->cpus_ptr, and it is more specific than PROXY_WAKING,
in that it will really only do the long path / migration if we do in
fact need return migration. If we stayed on the right CPU, we simply
stay there.
So I've stuck the below into the series between 3 and 4. This seems to
survive boot with ww_mutex selftest and hackbench.
---
kernel/sched/core.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3767,6 +3767,21 @@ static inline bool proxy_needs_return(st
if (!task_is_blocked(p))
return false;
+ /*
+ * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
+ *
+ * However, proxy_set_task_cpu() is such that it preserves the
+ * original cpu in p->wake_cpu while migrating p for proxy reasons
+ * (possibly outside of the allowed p->cpus_ptr).
+ *
+ * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
+ * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
+ * will not apply. But if it did, this check is the safe way around
+ * and would migrate.
+ */
+ if (task_cpu(p) == p->wake_cpu)
+ return false;
+
scoped_guard(raw_spinlock, &p->blocked_lock) {
/* Task is waking up; clear any blocked_on relationship */
__clear_task_blocked_on(p, NULL);
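A minimal user-space sketch of why the task_cpu(p) vs p->wake_cpu comparison identifies a proxy-migrated task. It assumes only what the comment above states: an ordinary move updates both values, while a proxy move preserves wake_cpu. `struct task_model` and the `model_*()` helpers are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the fields the check reads. */
struct task_model {
	int cpu;	/* where the task is queued now, i.e. task_cpu(p) */
	int wake_cpu;	/* CPU remembered across a proxy migration */
	bool is_blocked;
};

/* Models proxy_set_task_cpu(): move the task, keep wake_cpu as it was. */
static void model_proxy_set_task_cpu(struct task_model *p, int cpu)
{
	p->cpu = cpu;
}

/* Models __set_task_cpu(): an ordinary move updates both fields. */
static void model_set_task_cpu(struct task_model *p, int cpu)
{
	p->cpu = cpu;
	p->wake_cpu = cpu;
}

/* Only a blocked task that was moved for proxy reasons needs to go back. */
static bool model_needs_return(const struct task_model *p)
{
	if (!p->is_blocked)
		return false;
	return p->cpu != p->wake_cpu;
}
```

In this model a task that was never proxy-migrated takes the cheap path on wakeup, which is the property the patch is after.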
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-26 23:54 ` John Stultz
@ 2026-05-27 8:59 ` Peter Zijlstra
0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-27 8:59 UTC (permalink / raw)
To: John Stultz
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 04:54:01PM -0700, John Stultz wrote:
> On Tue, May 26, 2026 at 4:39 PM John Stultz <jstultz@google.com> wrote:
> > On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Per the discussion here:
> > >
> > > https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/
> > >
> > > The reason for this condition is that the signal condition in
> > > try_to_block_task() would set_task_blocked_in_waking(). However, it no longer
> > > does that, in fact, that path does clear_task_blocked_on(), rendering the
> > > clause under discussion moot.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > > kernel/sched/core.c | 3 ---
> > > 1 file changed, 3 deletions(-)
> > >
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -7132,9 +7132,6 @@ static void __sched notrace __schedule(i
> > > if (sched_proxy_exec()) {
> > > struct task_struct *prev_donor = rq->donor;
> > >
> > > - if (!prev_state && prev->blocked_on)
> > > - clear_task_blocked_on(prev, NULL);
> > > -
> > > rq_set_donor(rq, next);
> > > next->blocked_donor = NULL;
> > > if (unlikely(next->is_blocked && next->blocked_on)) {
> >
> > Oh good! I had a note to try to re-confirm if that chunk was really
> > needed, as it did feel a bit like it was patching up a problem after
> > the fact.
> >
> > That said, running this on top of your sched/proxy branch tripped over
> > warnings with the ww_mutex selftest, so it looks like there's
> > something else missing before this can land.
> >
> > Digging in, it looks like we still need the fix I had here:
> > https://lore.kernel.org/lkml/20260430215103.2978955-3-jstultz@google.com/
Right, that patch looks reasonable, pulled that in.
> I'll be doing some stress testing with the rest of set later, but so
> far with the extra fix I pointed to this is looking ok
> Acked-by: John Stultz <jstultz@google.com>
Thanks!
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-26 23:39 ` John Stultz
2026-05-26 23:54 ` John Stultz
@ 2026-05-28 23:20 ` John Stultz
2026-05-29 6:45 ` K Prateek Nayak
2026-05-29 6:48 ` John Stultz
1 sibling, 2 replies; 45+ messages in thread
From: John Stultz @ 2026-05-28 23:20 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 4:39 PM John Stultz <jstultz@google.com> wrote:
> On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > Per the discussion here:
> >
> > https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/
> >
> > The reason for this condition is that the signal condition in
> > try_to_block_task() would set_task_blocked_in_waking(). However, it no longer
> > does that, in fact, that path does clear_task_blocked_on(), rendering the
> > clause under discussion moot.
> >
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > ---
> > kernel/sched/core.c | 3 ---
> > 1 file changed, 3 deletions(-)
> >
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -7132,9 +7132,6 @@ static void __sched notrace __schedule(i
> > if (sched_proxy_exec()) {
> > struct task_struct *prev_donor = rq->donor;
> >
> > - if (!prev_state && prev->blocked_on)
> > - clear_task_blocked_on(prev, NULL);
> > -
> > rq_set_donor(rq, next);
> > next->blocked_donor = NULL;
> > if (unlikely(next->is_blocked && next->blocked_on)) {
>
> Oh good! I had a note to try to re-confirm if that chunk was really
> needed, as it did feel a bit like it was patching up a problem after
> the fact.
>
> That said, running this on top of your sched/proxy branch tripped over
> warnings with the ww_mutex selftest, so it looks like there's
> something else missing before this can land.
>
> Digging in, it looks like we still need the fix I had here:
> https://lore.kernel.org/lkml/20260430215103.2978955-3-jstultz@google.com/
>
> Since without that, we can get into a situation where we have
> blocked_on set when a task __state is TASK_RUNNING. The segment
> you're dropping would catch and clear that out, but really we should
> avoid getting into that situation in the mutex lock code.
Hey Peter,
So I've done testing with your full sched/proxy tree and with the
entire set it looks ok.
However, even with the fix I pointed out, I've unfortunately hit races
with the ww_mutex selftest at the point of this patch in the series.
Basically between commit
1b89b7b21bf5 ("sched/proxy: Remove superfluous clear_task_blocked_in()")
and
a8be1edac5a1 ("sched/proxy: Remove PROXY_WAKING")
I'm currently tracing down exactly why the race is cropping up but I
believe the chunk removed in this case is avoiding cases where we end
up getting PROXY_WAKING set on a TASK_RUNNING task.
I'll get back to you when I get my head around it properly, but wanted
to raise the issue in case you or K Prateek can see right through it.
Again, once the PROXY_WAKING code is dropped the race seems to go
away, but I'd like to understand it better so we don't have a
broken-window in the patch series.
thanks
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-28 23:20 ` John Stultz
@ 2026-05-29 6:45 ` K Prateek Nayak
2026-05-29 7:14 ` John Stultz
` (2 more replies)
2026-05-29 6:48 ` John Stultz
1 sibling, 3 replies; 45+ messages in thread
From: K Prateek Nayak @ 2026-05-29 6:45 UTC (permalink / raw)
To: John Stultz, Peter Zijlstra
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Hello John,
On 5/29/2026 4:50 AM, John Stultz wrote:
> However, even with the fix I pointed out, I've unfortunately hit races
> with the ww_mutex selftest at the point of this patch in the series.
> Basically between commit
> 1b89b7b21bf5 ("sched/proxy: Remove superfluous clear_task_blocked_in()")
> and
> a8be1edac5a1 ("sched/proxy: Remove PROXY_WAKING")
>
> I'm currently tracing down exactly why the race is cropping up but I
> believe the chunk removed in this case is avoiding cases where we end
> up getting PROXY_WAKING set on a TASK_RUNNING task.
This seems to be the failure path:
/* Task p */
mutex_lock(mutex)
  ...                                   try_to_wake_up(p)
  schedule_preempt_disabled()             ttwu_runnable()
    __schedule()                            __task_rq_lock() /* Wins */
      rq_lock() /* Waits */                 if (task_on_rq_queued(p))
                                              /*
                                               * p->is_blocked is still not set!
                                               * proxy_needs_return() bails out early.
                                               */
                                              ttwu_do_wakeup()
                                                p->__state = TASK_RUNNING;
                                            __task_rq_unlock();
      ...
      /* p->__state = TASK_RUNNING */
      prev_state = p->__state;
      if (prev_state && ...) {
        /*
         * Skipped since task is
         * already TASK_RUNNING
         */
      }

      /* p->is_blocked = 0; p->blocked_on = PROXY_WAKING */
      next = p;

  /* Returns from schedule_preempt_disabled() */
  set_task_blocked_on(p, mutex)

  !!! p->blocked_on == PROXY_WAKING && p->blocked_on != mutex !!!
---
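The interleaving above can be replayed sequentially in user space. This is only a rough model under stated assumptions: the task starts out already marked PROXY_WAKING by the unlocker, the waker's ttwu_runnable() runs to completion before the sleeper's __schedule() gets the rq lock, and the `model_*()` helpers and the PROXY_WAKING marker are hypothetical stand-ins rather than the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_RUNNING		0x0
#define TASK_UNINTERRUPTIBLE	0x2

/* Hypothetical sentinel standing in for PROXY_WAKING. */
static int proxy_waking_marker;
#define PROXY_WAKING ((const void *)&proxy_waking_marker)

struct task_model {
	unsigned int state;
	bool is_blocked;
	const void *blocked_on;
};

/* Waker side: ttwu_runnable() as the trace describes it. */
static void model_ttwu_runnable(struct task_model *p)
{
	/* proxy_needs_return() only clears blocked_on once is_blocked is set */
	if (p->is_blocked)
		p->blocked_on = NULL;
	/* ttwu_do_wakeup() */
	p->is_blocked = false;
	p->state = TASK_RUNNING;
}

/* Sleeper side: the part of __schedule() that matters here. */
static void model_schedule(struct task_model *prev, bool keep_chunk)
{
	unsigned int prev_state = prev->state;

	if (prev_state)
		prev->is_blocked = true;	/* try_to_block_task() path */
	else if (keep_chunk && prev->blocked_on)
		prev->blocked_on = NULL;	/* the clause patch 1 removes */
}

/* Returns true if blocked_on is still PROXY_WAKING when p runs again. */
static bool model_replay(bool keep_chunk)
{
	struct task_model p = { TASK_UNINTERRUPTIBLE, false, PROXY_WAKING };

	model_ttwu_runnable(&p);	/* waker wins the rq lock */
	model_schedule(&p, keep_chunk);	/* sleeper then sees TASK_RUNNING */
	return p.blocked_on == PROXY_WAKING;
}
```

In this model, `model_replay(false)` leaves the stale PROXY_WAKING behind while `model_replay(true)` does not, matching the observation that the removed clause was papering over the race.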
Also, proxy_needs_return() bails out too early: a wakeup from a signal
should still clear p->blocked_on even if p->wake_cpu is the same as
task_cpu(p).
I think we need the following at commit 83f9b04ef50c ("sched/proxy:
Switch proxy to use p->is_blocked") to clear p->blocked_on in the wakeup
path irrespective of p->is_blocked:
(Lightly tested)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a125e65c35bb..fe903976fd09 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3764,28 +3764,28 @@ static inline void proxy_reset_donor(struct rq *rq)
*/
static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
{
- if (!p->is_blocked)
- return false;
-
- /*
- * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
- *
- * However, proxy_set_task_cpu() is such that it preserves the
- * original cpu in p->wake_cpu while migrating p for proxy reasons
- * (possibly outside of the allowed p->cpus_ptr).
- *
- * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
- * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
- * will not apply. But if it did, this check is the safe way around
- * and would migrate.
- */
- if (task_cpu(p) == p->wake_cpu)
+ if (!task_is_blocked(p))
return false;
scoped_guard(raw_spinlock, &p->blocked_lock) {
/* Task is waking up; clear any blocked_on relationship */
__clear_task_blocked_on(p, NULL);
+ /*
+ * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
+ *
+ * However, proxy_set_task_cpu() is such that it preserves the
+ * original cpu in p->wake_cpu while migrating p for proxy reasons
+ * (possibly outside of the allowed p->cpus_ptr).
+ *
+ * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
+ * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
+ * will not apply. But if it did, this check is the safe way around
+ * and would migrate.
+ */
+ if (task_cpu(p) == p->wake_cpu)
+ return false;
+
/* If already current, don't need to return migrate */
if (task_current(rq, p))
return false;
---
Part of that belongs in commit e2ff8b7bde07 ("sched/proxy: Only return
migrate when needed") and the first hunk of 83f9b04ef50c ("sched/proxy:
Switch proxy to use p->is_blocked") in proxy_needs_return() should be
dropped.
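A small user-space sketch of what the reordering changes, assuming task_is_blocked() tests p->blocked_on as it does in the proxy-exec tree this is based on. The p->blocked_lock guard and the task_current() check are left out, and `struct task_model` plus both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task_model {
	int cpu;		/* task_cpu(p) */
	int wake_cpu;		/* p->wake_cpu */
	bool is_blocked;	/* p->is_blocked */
	const void *blocked_on;	/* p->blocked_on */
};

/* Before the fixup: both early returns happen before blocked_on is cleared. */
static bool needs_return_check_first(struct task_model *p)
{
	if (!p->is_blocked)
		return false;
	if (p->cpu == p->wake_cpu)
		return false;
	p->blocked_on = NULL;
	return true;
}

/* After the fixup: clear blocked_on first, then decide about migration. */
static bool needs_return_clear_first(struct task_model *p)
{
	if (!p->blocked_on)
		return false;
	p->blocked_on = NULL;
	if (p->cpu == p->wake_cpu)
		return false;
	return true;
}
```

For a task woken before is_blocked was set, and still on its original CPU, both variants report that no return migration is needed, but only the second one drops the blocked_on relationship.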
--
Thanks and Regards,
Prateek
^ permalink raw reply related [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-28 23:20 ` John Stultz
2026-05-29 6:45 ` K Prateek Nayak
@ 2026-05-29 6:48 ` John Stultz
2026-05-29 7:58 ` K Prateek Nayak
1 sibling, 1 reply; 45+ messages in thread
From: John Stultz @ 2026-05-29 6:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Thu, May 28, 2026 at 4:20 PM John Stultz <jstultz@google.com> wrote:
> On Tue, May 26, 2026 at 4:39 PM John Stultz <jstultz@google.com> wrote:
> > On Tue, May 26, 2026 at 4:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > >
> > > Per the discussion here:
> > >
> > > https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/
> > >
> > > The reason for this condition is that the signal condition in
> > > try_to_block_task() would set_task_blocked_in_waking(). However, it no longer
> > > does that, in fact, that path does clear_task_blocked_on(), rendering the
> > > clause under discussion moot.
> > >
> > > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > > ---
> > > kernel/sched/core.c | 3 ---
> > > 1 file changed, 3 deletions(-)
> > >
> > > --- a/kernel/sched/core.c
> > > +++ b/kernel/sched/core.c
> > > @@ -7132,9 +7132,6 @@ static void __sched notrace __schedule(i
> > > if (sched_proxy_exec()) {
> > > struct task_struct *prev_donor = rq->donor;
> > >
> > > - if (!prev_state && prev->blocked_on)
> > > - clear_task_blocked_on(prev, NULL);
> > > -
> > > rq_set_donor(rq, next);
> > > next->blocked_donor = NULL;
> > > if (unlikely(next->is_blocked && next->blocked_on)) {
> >
> > Oh good! I had a note to try to re-confirm if that chunk was really
> > needed, as it did feel a bit like it was patching up a problem after
> > the fact.
> >
> > That said, running this on top of your sched/proxy branch tripped over
> > warnings with the ww_mutex selftest, so it looks like there's
> > something else missing before this can land.
> >
> > Digging in, it looks like we still need the fix I had here:
> > https://lore.kernel.org/lkml/20260430215103.2978955-3-jstultz@google.com/
> >
> > Since without that, we can get into a situation where we have
> > blocked_on set when a task __state is TASK_RUNNING. The segment
> > you're dropping would catch and clear that out, but really we should
> > avoid getting into that situation in the mutex lock code.
>
> Hey Peter,
> So I've done testing with your full sched/proxy tree and with the
> entire set it looks ok.
>
> However, even with the fix I pointed out, I've unfortunately hit races
> with the ww_mutex selftest at the point of this patch in the series.
> Basically between commit
> 1b89b7b21bf5 ("sched/proxy: Remove superfluous clear_task_blocked_in()")
> and
> a8be1edac5a1 ("sched/proxy: Remove PROXY_WAKING")
>
> I'm currently tracing down exactly why the race is cropping up but I
> believe the chunk removed in this case is avoiding cases where we end
> up getting PROXY_WAKING set on a TASK_RUNNING task.
>
> I'll get back to you when I get my head around it properly, but wanted
> to raise the issue in case you or K Prateek can see right through it.
So digging into this, I think part of the issue is the logic you're
removing here sort of keeps the blocked_on and __state/is_blocked
values in sync. And I think one reason they can get out of sync is due
to the task_is_current() shortcut returns in find_proxy_task(), where
we clear is_blocked/blocked_on and return the current task.
Basically we can get into a situation where (sorry, this gets a bit convoluted):
1) On CPU1, in __mutex_lock_common(), we set task A
blocked_on/TASK_UNINTERRUPTIBLE, and call into __schedule().
2) On CPU2, task B who holds the mutex calls __mutex_unlock_slowpath()
and sets task A as PROXY_WAKING and starts to call into
wake_up_task().
3) On CPU1, in __schedule() we pick_next_task(), which returns task A,
which is_blocked. We call find_proxy_task() and note task A is
PROXY_WAKING. Since it's also current, we take the short-cut and clear
is_blocked and blocked_on and return task A to run.
4) On CPU2, try_to_wake_up() hits ttwu_runnable(), and
proxy_needs_return() returns false as A->blocked_on is zero.
5) On CPU 1, task A is running, it grabs the lock it was waiting for
and exits __mutex_lock_common. It then enters __mutex_lock_common to
grab a different mutex that is already locked. It sets itself
blocked_on/TASK_INTERRUPTIBLE and calls into __schedule()
6) On CPU2, ttwu_runnable() continues, and calls ttwu_do_wakeup(),
which clears A->is_blocked and sets the A->__state TASK_RUNNING
7) On CPU3, task C that holds the mutex A is waiting on, calls
__mutex_unlock_slowpath, setting A as PROXY_WAKING and calls into
wake_up_task()
8) On CPU1, in __schedule() pick_next_task() again returns task A. But
is_blocked is now zero, so we just return task A, even though
blocked_on is PROXY_WAKING.
9) On CPU1, task A gets back to the __mutex_lock_common() loop, calls
set_task_blocked_on() and trips warnings as A->blocked_on is still
PROXY_WAKING.
The chunk you drop in this patch effectively clears blocked_on
whenever we notice current is TASK_RUNNING, which keeps us in sync to
avoid this.
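The nine steps can be replayed as straight-line code to see where blocked_on goes stale. This is a simplified sketch, not kernel code: the three CPUs are serialized in the order listed, is_blocked is assumed not to be set again in step 5 because the wakeup in step 6 lands first, and `struct task_model` and the PROXY_WAKING marker are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TASK_RUNNING		0x0
#define TASK_INTERRUPTIBLE	0x1
#define TASK_UNINTERRUPTIBLE	0x2

static int waking_marker;	/* hypothetical stand-in for PROXY_WAKING */
#define PROXY_WAKING ((const void *)&waking_marker)

struct task_model {
	unsigned int state;
	bool is_blocked;
	const void *blocked_on;
};

/* Returns true if step 9 would find a stale PROXY_WAKING in blocked_on. */
static bool replay_nine_steps(bool keep_chunk)
{
	static int m1, m2;	/* the two mutexes */
	struct task_model a = { TASK_RUNNING, false, NULL };

	/* 1) CPU1: A blocks on m1 and enters __schedule() */
	a.blocked_on = &m1;
	a.state = TASK_UNINTERRUPTIBLE;
	a.is_blocked = true;
	/* 2) CPU2: B unlocks m1 and marks A as waking */
	a.blocked_on = PROXY_WAKING;
	/* 3) CPU1: find_proxy_task() shortcut, A is current */
	a.is_blocked = false;
	a.blocked_on = NULL;
	/* 4) CPU2: proxy_needs_return() finds nothing to do */
	/* 5) CPU1: A takes m1, then blocks on m2 */
	a.blocked_on = &m2;
	a.state = TASK_INTERRUPTIBLE;
	/* 6) CPU2: the wakeup from step 2 finally lands */
	a.is_blocked = false;
	a.state = TASK_RUNNING;
	/* 7) CPU3: C unlocks m2 and marks A as waking */
	a.blocked_on = PROXY_WAKING;
	/* 8) CPU1: __schedule() sees TASK_RUNNING and does not block A */
	if (keep_chunk && !a.state && a.blocked_on)
		a.blocked_on = NULL;	/* the clause patch 1 removes */
	/* 9) CPU1: back in the mutex loop, about to set_task_blocked_on() */
	return a.blocked_on == PROXY_WAKING;
}
```

With `keep_chunk` set, step 8 resynchronizes blocked_on with the TASK_RUNNING state; without it, step 9 sees the leftover PROXY_WAKING.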
I'm also wondering if dropping the shortcut returns in
find_proxy_task() that clear blocked_on and is_blocked is maybe the
right thing? Just to reduce the cases we have to wrangle here.
Even so, in my initial testing, doing that with just this patch
doesn't seem sufficient: ttwu() can still race, setting A->__state to
TASK_RUNNING while find_proxy_task() hits a proxy_deactivate() case,
which trips the TASK_RUNNING warn-on.
Maybe we just need to move the removal of this chunk till after we
drop PROXY_WAKING?
I'll tinker more tomorrow
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 6:45 ` K Prateek Nayak
@ 2026-05-29 7:14 ` John Stultz
2026-05-29 8:24 ` K Prateek Nayak
2026-05-29 8:47 ` Peter Zijlstra
2026-05-29 9:33 ` Peter Zijlstra
2 siblings, 1 reply; 45+ messages in thread
From: John Stultz @ 2026-05-29 7:14 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Thu, May 28, 2026 at 11:45 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
>
> Hello John,
>
> On 5/29/2026 4:50 AM, John Stultz wrote:
> > However, even with the fix I pointed out, I've unfortunately hit races
> > with the ww_mutex selftest at the point of this patch in the series.
> > Basically between commit
> > 1b89b7b21bf5 ("sched/proxy: Remove superfluous clear_task_blocked_in()")
> > and
> > a8be1edac5a1 ("sched/proxy: Remove PROXY_WAKING")
> >
> > I'm currently tracing down exactly why the race is cropping up but I
> > believe the chunk removed in this case is avoiding cases where we end
> > up getting PROXY_WAKING set on a TASK_RUNNING task.
>
> This seems to be the failure path:
>
> /* Task p */
> mutex_lock(mutex)
>   ...                                   try_to_wake_up(p)
>   schedule_preempt_disabled()             ttwu_runnable()
>     __schedule()                            __task_rq_lock() /* Wins */
>       rq_lock() /* Waits */                 if (task_on_rq_queued(p))
>                                               /*
>                                                * p->is_blocked is still not set!
>                                                * proxy_needs_return() bails out early.
>                                                */
>                                               ttwu_do_wakeup()
>                                                 p->__state = TASK_RUNNING;
>                                             __task_rq_unlock();
>       ...
>       /* p->__state = TASK_RUNNING */
>       prev_state = p->__state;
>       if (prev_state && ...) {
>         /*
>          * Skipped since task is
>          * already TASK_RUNNING
>          */
>       }
>
>       /* p->is_blocked = 0; p->blocked_on = PROXY_WAKING */
>       next = p;
>
>   /* Returns from schedule_preempt_disabled() */
>   set_task_blocked_on(p, mutex)
>
>   !!! p->blocked_on == PROXY_WAKING && p->blocked_on != mutex !!!
> ---
You see these things so quickly! :) Beat me sending out my own
analysis (which is maybe a slightly different case, but still).
> Also proxy_needs_return() bails out too early - a wakeup from signal
> should still clear p->blocked_on even if p->wake_cpu is same as
> task_cpu().
>
>
> I think we need the following at commit 83f9b04ef50c ("sched/proxy:
> Switch proxy to use p->is_blocked") to clear p->blocked_on in the wakeup
> path irrespective of p->is_blocked:
>
> (Lightly tested)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a125e65c35bb..fe903976fd09 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3764,28 +3764,28 @@ static inline void proxy_reset_donor(struct rq *rq)
> */
> static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
> {
> - if (!p->is_blocked)
> - return false;
> -
> - /*
> - * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
> - *
> - * However, proxy_set_task_cpu() is such that it preserves the
> - * original cpu in p->wake_cpu while migrating p for proxy reasons
> - * (possibly outside of the allowed p->cpus_ptr).
> - *
> - * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
> - * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
> - * will not apply. But if it did, this check is the safe way around
> - * and would migrate.
> - */
> - if (task_cpu(p) == p->wake_cpu)
> + if (!task_is_blocked(p))
> return false;
>
> scoped_guard(raw_spinlock, &p->blocked_lock) {
> /* Task is waking up; clear any blocked_on relationship */
> __clear_task_blocked_on(p, NULL);
>
> + /*
> + * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
> + *
> + * However, proxy_set_task_cpu() is such that it preserves the
> + * original cpu in p->wake_cpu while migrating p for proxy reasons
> + * (possibly outside of the allowed p->cpus_ptr).
> + *
> + * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
> + * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
> + * will not apply. But if it did, this check is the safe way around
> + * and would migrate.
> + */
> + if (task_cpu(p) == p->wake_cpu)
> + return false;
> +
> /* If already current, don't need to return migrate */
> if (task_current(rq, p))
> return false;
> ---
>
> Part of that belongs in commit e2ff8b7bde07 ("sched/proxy: Only return
> migrate when needed") and the first hunk of 83f9b04ef50c ("sched/proxy:
> Switch proxy to use p->is_blocked") in proxy_needs_return() should be
> dropped.
Very nice, yes, this does help once the "Only return migrate when
needed" patch is added!
So I've included this and reworked the order of Peter's doodles slightly to be:
sched/proxy: Optimize try_to_wake_up()
sched: Be more strict about p->is_blocked
sched/proxy: Only return migrate when needed
FOLD: k prateek's fixup
sched/proxy: Switch proxy to use p->is_blocked
sched/proxy: Remove PROXY_WAKING
sched/proxy: Remove superfluous clear_task_blocked_in()
sched: Simplify ttwu_runnable()
Which seems to be working at each step. Though I've only lightly
tested and I didn't trip this initially with the series, so more
testing will be needed tomorrow.
thanks
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 6:48 ` John Stultz
@ 2026-05-29 7:58 ` K Prateek Nayak
2026-05-29 10:06 ` Peter Zijlstra
0 siblings, 1 reply; 45+ messages in thread
From: K Prateek Nayak @ 2026-05-29 7:58 UTC (permalink / raw)
To: John Stultz, Peter Zijlstra
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Hello John,
On 5/29/2026 12:18 PM, John Stultz wrote:
> Basically we can get in a situation where (sorry this gets a bit convoluted):
>
> 1) On CPU1, __mutex_lock_common, we set task A
> blocked_on/TASK_UNINTERRUPTIBLE, and call into __schedule().
>
> 2) On CPU2, task B who holds the mutex calls __mutex_unlock_slowpath()
> and sets task A as PROXY_WAKING and starts to call into
> wake_up_task().
>
> 3) On CPU1, in __schedule() we pick_next_task(), which returns task A,
> which is_blocked. We call find_proxy_task() and note task A is
> PROXY_WAKING. Since it's also current, we take the short-cut and clear
> is_blocked and blocked_on and return task A to run.
>
> 4) On CPU2, try_to_wake_up() hits ttwu_runnable(), and
> proxy_needs_return() returns false as A->blocked_on is zero.
>
> 5) On CPU 1, task A is running, it grabs the lock it was waiting for
> and exits __mutex_lock_common. It then enters __mutex_lock_common to
> grab a different mutex that is already locked. It sets itself
> blocked_on/TASK_INTERRUPTIBLE and calls into __schedule()
>
> 6) On CPU2, ttwu_runnable() continues, and calls ttwu_do_wakeup(),
> which clears A->is_blocked and sets the A->__state TASK_RUNNING
>
> 7) On CPU3, task C that holds the mutex A is waiting on, calls
> __mutex_unlock_slowpath, setting A as PROXY_WAKING and calls into
> wake_up_task()
>
> 8) On CPU1, in __schedule() pick_next_task() again returns task A. But
> is_blocked is now zero, so we just return task A, even though
> blocked_on is PROXY_WAKING.
>
> 9) On CPU1, task A gets back to the __mutex_lock_common() loop, calls
> set_task_blocked_on() and trips warnings as A->blocked_on is still
> PROXY_WAKING.
Oh geez! Me tries to visualize:
    CPU1                                            CPU2                                  CPU3
    ====                                            ====                                  ====

  __mutex_lock_common(MutexA)
    set_task_blocked_on(TaskA, MutexA)
    set_current_state(TASK_UNINTERRUPTIBLE)         __mutex_unlock_slowpath(MutexA)
    ...                                               set_task_blocked_on_waking(TaskA)
    schedule_preempt_disabled()                       wake_up_process(TaskA)
      __schedule() /* (1) */                            ... /* (2) */

        if (prev_state &..)
          TaskA->is_blocked = 0;
        next = TaskA
        find_proxy_task(TaskA)
          /* TaskA->blocked_on == PROXY_WAKING */
          clear_task_blocked_on(TaskA, NULL);
          TaskA->is_blocked = 0;
        ...
        next = TaskA /* (3) */
        rq_unlock(CPU1)                                 try_to_wake_up(TaskA)
                                                          rq_lock(CPU1)
                                                          ttwu_runnable()
                                                            /* TaskA->blocked_on == 0 (4) */
    ...
    set_current_state(TASK_RUNNING)
    ...

  mutex_lock(MutexB)
    __mutex_lock_common(MutexB)
      ...
      set_task_blocked_on(TaskA, MutexB)
      set_current_state(TASK_UNINTERRUPTIBLE)
      schedule_preempt_disabled()
        __schedule() /* (5) */                          ...                               __mutex_unlock_slowpath(MutexB)
                                                        ttwu_do_wakeup(TaskA) /* (6) */     set_task_blocked_on_waking(TaskA) /* (7) */
                                                        rq_unlock(CPU1)

          rq_lock(CPU1)
          /* TaskA->__state == TASK_RUNNING */
          next = TaskA;

          if (TaskA->is_blocked /* False */ && TaskA->blocked_on /* PROXY_WAKING */)
            /* Skip */
          next = TaskA; /* (8) */

      set_task_blocked_on(p, MutexB)

      !!! p->blocked_on != MutexB !!!
Yup! That is a concern too then!
I think we can just squash the PROXY_WAKING removal with "p->is_blocked"
introduction and a part of this problem should go away since unlocks
always clear task->blocked_on then.
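
For anyone who wants to replay the interleaving above without three
CPUs, here is a minimal single-threaded sketch in C. It is a toy model,
not kernel code: the toy_* types and helpers are hypothetical stand-ins,
PROXY_WAKING is modelled as the address of a dummy object, and the
warning in set_task_blocked_on() is assumed to fire when blocked_on is
already set to something other than the requested mutex. The
keep_clear_hunk flag models the hunk that patch 1 removes, i.e. clearing
blocked_on when __schedule() notices that current is TASK_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the kernel's definitions. */
struct toy_mutex { int id; };

static struct toy_mutex toy_proxy_waking;	/* sentinel, like PROXY_WAKING */
#define TOY_PROXY_WAKING (&toy_proxy_waking)

struct toy_task {
	bool is_blocked;
	struct toy_mutex *blocked_on;
	bool running;			/* models __state == TASK_RUNNING */
};

/* Models set_task_blocked_on(); returns false where we assume a WARN. */
static bool toy_set_blocked_on(struct toy_task *p, struct toy_mutex *m)
{
	if (p->blocked_on && p->blocked_on != m)
		return false;
	p->blocked_on = m;
	return true;
}

/* Models ttwu_do_wakeup() as described in step (6). */
static void toy_ttwu_do_wakeup(struct toy_task *p)
{
	p->is_blocked = false;
	p->running = true;
}

/* Replays steps (1)-(9); returns true if step (9) does not warn. */
static bool toy_replay(bool keep_clear_hunk)
{
	struct toy_mutex mutex_a = { 1 }, mutex_b = { 2 };
	struct toy_task a = { false, NULL, true };

	/* (1) CPU1: A blocks on mutex_a and enters __schedule(). */
	toy_set_blocked_on(&a, &mutex_a);
	a.running = false;
	a.is_blocked = true;

	/* (2) CPU2: the owner unlocks and marks A as waking. */
	a.blocked_on = TOY_PROXY_WAKING;

	/* (3) CPU1: the pick sees a waking current task, takes the shortcut. */
	if (a.is_blocked && a.blocked_on == TOY_PROXY_WAKING) {
		a.blocked_on = NULL;
		a.is_blocked = false;
	}

	/* (4) CPU2: ttwu_runnable() sees blocked_on == NULL, nothing to do. */

	/* (5) CPU1: A runs, takes mutex_a, then blocks on mutex_b. */
	a.running = true;
	if (!toy_set_blocked_on(&a, &mutex_b))
		return false;
	a.running = false;

	/* (6) CPU2: the stale wakeup finally lands. */
	toy_ttwu_do_wakeup(&a);

	/* (7) CPU3: the owner of mutex_b unlocks and marks A as waking. */
	a.blocked_on = TOY_PROXY_WAKING;

	/* (8) CPU1: __schedule() sees TASK_RUNNING, so is_blocked stays 0. */
	if (!a.running)
		a.is_blocked = true;
	else if (keep_clear_hunk)
		a.blocked_on = NULL;	/* the hunk patch 1 removes */

	/* (9) CPU1: back in the mutex loop. */
	return toy_set_blocked_on(&a, &mutex_b);
}
```

In this model toy_replay(true) gets through step (9) cleanly while
toy_replay(false) hits the assumed warning, which matches the
observation that the removed hunk is what keeps blocked_on in sync as
long as PROXY_WAKING exists.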
>
> The chunk you drop in this patch effectively clears blocked_on
> whenever we notice current is TASK_RUNNING, which keeps us in sync to
> avoid this.
>
> I'm also wondering if dropping the shortcut returns in
> find_proxy_task() that clear blocked_on and is_blocked is maybe the
> right thing? Just to reduce the cases we have to wrangle here.
> Even so, at least in my initial testing in doing so with just this
> patch that doesn't seem sufficient as it seems ttwu() can still race
> setting A->__state TASK_RUNNING vs find_proxy_task() hitting a
> proxy_deactive() case tripping the TASK_RUNNING warn-on.
Yeah, I tripped on those bits in the parallel thread. You can
perhaps give those changes a try.
>
> Maybe we just need to move the removal of this chunk till after we
> drop PROXY_WAKING?
That would be safer, yes.
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 7:14 ` John Stultz
@ 2026-05-29 8:24 ` K Prateek Nayak
0 siblings, 0 replies; 45+ messages in thread
From: K Prateek Nayak @ 2026-05-29 8:24 UTC (permalink / raw)
To: John Stultz
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Hello John,
On 5/29/2026 12:44 PM, John Stultz wrote:
> You see these things so quickly! :) Beat me sending out my own
> analysis (which is maybe a slightly different case, but still).
That was an interesting case too! My brain defaults to one lock and two
CPUs but your brain goes a few steps (and a few locks, CPUs) further ;-)
>> Part of that belongs in commit e2ff8b7bde07 ("sched/proxy: Only return
>> migrate when needed") and the first hunk of 83f9b04ef50c ("sched/proxy:
>> Switch proxy to use p->is_blocked") in proxy_needs_return() should be
>> dropped.
>
> Very nice, yes this does help when the "Only return migrate when
> needed" is added!
>
> So I've included this and reworked the order of Peter's doodles slightly to be:
> sched/proxy: Optimize try_to_wake_up()
> sched: Be more strict about p->is_blocked
> sched/proxy: Only return migrate when needed
> FOLD: k prateek's fixup
> sched/proxy: Switch proxy to use p->is_blocked
> sched/proxy: Remove PROXY_WAKING
> sched/proxy: Remove superfluous clear_task_blocked_in()
> sched: Simplify ttwu_runnable()
Ack! That should cover all bases.
>
> Which seems to be working at each step. Though I've only lightly
> tested and I didn't trip this initially with the series, so more
> testing will be needed tomorrow.
Meanwhile, I'll rearrange the patches like you suggested above and run
some tests too. Fingers crossed.
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 6:45 ` K Prateek Nayak
2026-05-29 7:14 ` John Stultz
@ 2026-05-29 8:47 ` Peter Zijlstra
2026-05-29 8:50 ` Peter Zijlstra
2026-05-29 10:46 ` K Prateek Nayak
2026-05-29 9:33 ` Peter Zijlstra
2 siblings, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-29 8:47 UTC (permalink / raw)
To: K Prateek Nayak
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Fri, May 29, 2026 at 12:15:09PM +0530, K Prateek Nayak wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index a125e65c35bb..fe903976fd09 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3764,28 +3764,28 @@ static inline void proxy_reset_donor(struct rq *rq)
> */
> static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
> {
> - if (!p->is_blocked)
> - return false;
> -
> - /*
> - * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
> - *
> - * However, proxy_set_task_cpu() is such that it preserves the
> - * original cpu in p->wake_cpu while migrating p for proxy reasons
> - * (possibly outside of the allowed p->cpus_ptr).
> - *
> - * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
> - * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
> - * will not apply. But if it did, this check is the safe way around
> - * and would migrate.
> - */
> - if (task_cpu(p) == p->wake_cpu)
> + if (!task_is_blocked(p))
> return false;
>
> scoped_guard(raw_spinlock, &p->blocked_lock) {
> /* Task is waking up; clear any blocked_on relationship */
> __clear_task_blocked_on(p, NULL);
>
> + /*
> + * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
> + *
> + * However, proxy_set_task_cpu() is such that it preserves the
> + * original cpu in p->wake_cpu while migrating p for proxy reasons
> + * (possibly outside of the allowed p->cpus_ptr).
> + *
> + * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
> + * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
> + * will not apply. But if it did, this check is the safe way around
> + * and would migrate.
> + */
> + if (task_cpu(p) == p->wake_cpu)
> + return false;
> +
> /* If already current, don't need to return migrate */
> if (task_current(rq, p))
> return false;
Egads, this is terrible. This means all is_blocked tasks always end up
taking blocked_lock.
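
To put a number on that cost, here is a small sketch in C that compares
the two orderings of the checks. It is a toy model under stated
assumptions: struct demo_task and both helpers are hypothetical, the
lock is only a counter, task_is_blocked() is simplified to the
is_blocked flag, and the task_current() check is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the fields proxy_needs_return() looks at. */
struct demo_task {
	bool is_blocked;
	void *blocked_on;
	int cpu;		/* models task_cpu(p) */
	int wake_cpu;		/* models p->wake_cpu */
	int lock_acquisitions;	/* counts simulated blocked_lock takes */
};

/* Ordering in the quoted fixup: take the lock, then compare CPUs. */
static bool demo_needs_return_lock_first(struct demo_task *p)
{
	if (!p->is_blocked)
		return false;

	p->lock_acquisitions++;	/* stands in for scoped_guard(raw_spinlock) */
	p->blocked_on = NULL;	/* always clears the relation */

	if (p->cpu == p->wake_cpu)
		return false;
	return true;
}

/* Ordering before the fixup: compare CPUs first, lock only to migrate. */
static bool demo_needs_return_check_first(struct demo_task *p)
{
	if (!p->is_blocked)
		return false;
	if (p->cpu == p->wake_cpu)
		return false;	/* no lock taken, blocked_on left untouched */

	p->lock_acquisitions++;
	p->blocked_on = NULL;
	return true;
}
```

For a blocked task that never migrated (cpu == wake_cpu), the
lock-first variant takes the lock once per wakeup and clears blocked_on,
while the check-first variant takes no lock and leaves blocked_on for
the mutex_lock() exit path to clear. That is the trade-off under
discussion here.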
The other suggestion (in the other subthread) was to simply delay this
patch until the end. That seems far more sensible.
Anyway, let me go try and find your git tree and see what you ended up
with.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 8:47 ` Peter Zijlstra
@ 2026-05-29 8:50 ` Peter Zijlstra
2026-05-29 10:46 ` K Prateek Nayak
1 sibling, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-29 8:50 UTC (permalink / raw)
To: K Prateek Nayak
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Fri, May 29, 2026 at 10:47:12AM +0200, Peter Zijlstra wrote:
> On Fri, May 29, 2026 at 12:15:09PM +0530, K Prateek Nayak wrote:
>
> > diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> > index a125e65c35bb..fe903976fd09 100644
> > --- a/kernel/sched/core.c
> > +++ b/kernel/sched/core.c
> > @@ -3764,28 +3764,28 @@ static inline void proxy_reset_donor(struct rq *rq)
> > */
> > static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
> > {
> > - if (!p->is_blocked)
> > - return false;
> > -
> > - /*
> > - * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
> > - *
> > - * However, proxy_set_task_cpu() is such that it preserves the
> > - * original cpu in p->wake_cpu while migrating p for proxy reasons
> > - * (possibly outside of the allowed p->cpus_ptr).
> > - *
> > - * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
> > - * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
> > - * will not apply. But if it did, this check is the safe way around
> > - * and would migrate.
> > - */
> > - if (task_cpu(p) == p->wake_cpu)
> > + if (!task_is_blocked(p))
> > return false;
> >
> > scoped_guard(raw_spinlock, &p->blocked_lock) {
> > /* Task is waking up; clear any blocked_on relationship */
> > __clear_task_blocked_on(p, NULL);
> >
> > + /*
> > + * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
> > + *
> > + * However, proxy_set_task_cpu() is such that it preserves the
> > + * original cpu in p->wake_cpu while migrating p for proxy reasons
> > + * (possibly outside of the allowed p->cpus_ptr).
> > + *
> > + * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
> > + * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
> > + * will not apply. But if it did, this check is the safe way around
> > + * and would migrate.
> > + */
> > + if (task_cpu(p) == p->wake_cpu)
> > + return false;
> > +
> > /* If already current, don't need to return migrate */
> > if (task_current(rq, p))
> > return false;
>
> Egads, this is terrible. This means all is_blocked tasks always end up
> taking blocked_lock.
>
> The other suggestion (in the other subthread) was to simply delay this
> patch until the end. That seems far more sensible.
>
> Anyway, let me go try and find your git tree and see what you ended up
> with.
Argh, github is such an absolute terrible piece of shit :-( It won't
even let me browse git trees because of rate-limiting or something
stupid -- I've done less than 10 clicks on the site and it says I hit a
limit and should wait and/or log in.
Fuckers.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 6:45 ` K Prateek Nayak
2026-05-29 7:14 ` John Stultz
2026-05-29 8:47 ` Peter Zijlstra
@ 2026-05-29 9:33 ` Peter Zijlstra
2 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-29 9:33 UTC (permalink / raw)
To: K Prateek Nayak
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Fri, May 29, 2026 at 12:15:09PM +0530, K Prateek Nayak wrote:
> Hello John,
>
> On 5/29/2026 4:50 AM, John Stultz wrote:
> > However, even with the fix I pointed out, I've unfortunately hit races
> > with the ww_mutex selftest at the point of this patch in the series.
> > Basically between commit
> > 1b89b7b21bf5 ("sched/proxy: Remove superfluous clear_task_blocked_in()")
> > and
> > a8be1edac5a1 ("sched/proxy: Remove PROXY_WAKING")
> >
> > I'm currently tracing down exactly why the race is cropping up but I
> > believe the chunk removed in this case is avoiding cases where we end
> > up getting PROXY_WAKING set on a TASK_RUNNING task.
I'm struggling to make sense of this...
> This seems to be the failure path:
>
>     /* Task p */
>     mutex_lock(mutex)
>       ...                                   try_to_wake_up(p)
>       schedule_preempt_disabled()             ttwu_runnable()
>         __schedule()                            __task_rq_lock() /* Wins */
>           rq_lock() /* Waits */                 if (task_on_rq_queued(p))
>                                                   /*
>                                                    * p->is_blocked is still not set!
>                                                    * proxy_needs_return() bails out early.
>                                                    */
>                                                   ttwu_do_wakeup()
>                                                     p->__state = TASK_RUNNING;
>                                                 __tsk_rq_unlock();
>           ...
>           /* p->__state = TASK_RUNNING */
>           prev_state = p->__state;
>           if (prev_state && ...) {
>             /*
>              * Skipped since task is
>              * already TASK_RUNNING
>              */
>           }
>
>           /* p->is_blocked = 0; p->blocked_on = PROXY_WAKING */
>           next = p;
>
>       /* Returns from schedule_preempt_disabled() */
>       set_task_blocked_on(p, mutex)
>
>       !!! p->blocked_on == PROXY_WAKING && p->blocked_on != mutex !!!
> ---
>
> Also proxy_needs_return() bails out too early - a wakeup from signal
> should still clear p->blocked_on even if p->wake_cpu is same as
> task_cpu().
esp. in the context of the full patch set. There was no migration,
therefore there is no need for a return migration. We're in
mutex_lock(), any exit path will clear blocked_on.
I don't see why we should have proxy_needs_return() unconditionally take
that lock and clear blocked_on.
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 7:58 ` K Prateek Nayak
@ 2026-05-29 10:06 ` Peter Zijlstra
2026-05-29 10:54 ` K Prateek Nayak
0 siblings, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2026-05-29 10:06 UTC (permalink / raw)
To: K Prateek Nayak
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Fri, May 29, 2026 at 01:28:45PM +0530, K Prateek Nayak wrote:
> Hello John,
>
> On 5/29/2026 12:18 PM, John Stultz wrote:
> > Basically we can get in a situation where (sorry this gets a bit convoluted):
> >
> > 1) On CPU1, __mutex_lock_common, we set task A
> > blocked_on/TASK_UNINTERRUPTIBLE, and call into __schedule().
> >
> > 2) On CPU2, task B who holds the mutex calls __mutex_unlock_slowpath()
> > and sets task A as PROXY_WAKING and starts to call into
> > wake_up_task().
> >
> > 3) On CPU1, in __schedule() we pick_next_task(), which returns task A,
> > which is_blocked. We call find_proxy_task() and note task A is
> > PROXY_WAKING. Since it's also current, we take the short-cut and clear
> > is_blocked and blocked_on and return task A to run.
> >
> > 4) On CPU2, try_to_wake_up() hits ttwu_runnable(), and
> > proxy_needs_return() returns false as A->blocked_on is zero.
> >
> > 5) On CPU 1, task A is running, it grabs the lock it was waiting for
> > and exits __mutex_lock_common. It then enters __mutex_lock_common to
> > grab a different mutex that is already locked. It sets itself
> > blocked_on/TASK_INTERRUPTIBLE and calls into __schedule()
> >
> > 6) On CPU2, ttwu_runnable() continues, and calls ttwu_do_wakeup(),
> > which clears A->is_blocked and sets the A->__state TASK_RUNNING
> >
> > 7) On CPU3, task C that holds the mutex A is waiting on, calls
> > __mutex_unlock_slowpath, setting A as PROXY_WAKING and calls into
> > wake_up_task()
> >
> > 8) On CPU1, in __schedule() pick_next_task() again returns task A. But
> > is_blocked is now zero, so we just return task A, even though
> > blocked_on is PROXY_WAKING.
> >
> > 9) On CPU1, task A gets back to the __mutex_lock_common() loop, calls
> > set_task_blocked_on() and trips warnings as A->blocked_on is still
> > PROXY_WAKING.
>
> Oh geez! Me tries to visualize:
Thanks! I too need pictures; prose will forever confuse me :/
>
> CPU1 CPU2 CPU3
> ==== ==== ====
>
> __mutex_lock_common(MutexA)
> set_task_blocked_on(TaskA, MutexA)
> set_current_state(TASK_UNINTERRUPTABLE) __mutex_unlock_slowpath(MutexA)
> ... set_task_blocked_on_waking(TaskA)
> schedule_preempt_disabled() wake_up_process(TaskA)
> __schedule() /* (1) */ ... /* (2) */
>
> if (prev_state &..)
> TaskA->is_blocked = 0;
Should this be: TaskA->is_blocked = 1? Otherwise I'm not following.
> next = TaskA
> find_proxy_task(TaskA)
> /* TaskA-> blocked_on == TASK_WAKING */
> clear_task_blocked_on(TaskA, NULL);
> TaskA->is_blocked = 0;
> ...
> next = TaskA /* (3) */
> rq_unlock(CPU1) try_to_wake_up(TaskA)
> rq_lock(CPU1)
> ttwu_runnable()
> /* TaskA->blocked_on == 0 (4) */
> ...
> set_curent_state(TASK_RUNNING)
> ...
>
> mutex_lock(MutexB)
> __mutex_lock_common(MutexB)
> ...
> set_task_blocked_on(TaskA, MutexB)
> set_current_state(TASK_UNINTERRUPTABLE)
> schedule_preempt_disabled()
> __schedule() /* (5) */ ... __mutex_unlock_slowpath(MutexB)
> ttwu_do_wakeup(TaskA) /* (6) */ set_task_blocked_on_waking(TaskA) /* (7) */
> rq_unlock(CPU1)
>
> rq_lock(CPU1)
> /* TaskA->__state == TASK_RUNNING */
> next = TasKA;
>
> if (TaskA->is_blocked /* False */ && TaskA->blocked_on /* PROXY_WAKING */)
> /* Skip */
> next = TaskA; /* (8) */
>
> set_task_blocked_on(p, MutexB)
>
> !!! p->blocked_on != MutexB !!!
>
>
> Yup! That is a concern too then!
>
> I think we can just squash the PROXY_WAKING removal with "p->is_blocked"
> introduction and a part of this problem should go away since unlocks
> always clear task->blocked_on then.
While staring at this, I noted that the PROXY_WAKING removal patch
should also remove the clear_task_blocked_on() line in the very last
hunk.
That said, I do have a note to double check the lockless access to
p->blocked_on there.
Anyway, yes, the hunk removed in patch 1 cures this by clearing
TaskA->blocked_on (because prev == current == TaskA). I don't think we
need to squash everything into one giant patch over this if we just
reorder things.
The Changelog of patch 1 needs an extra few links to this discussion,
but that should be it, no?
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 8:47 ` Peter Zijlstra
2026-05-29 8:50 ` Peter Zijlstra
@ 2026-05-29 10:46 ` K Prateek Nayak
2026-05-30 2:56 ` John Stultz
1 sibling, 1 reply; 45+ messages in thread
From: K Prateek Nayak @ 2026-05-29 10:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Hello Peter,
On 5/29/2026 2:17 PM, Peter Zijlstra wrote:
> Egads, this is terrible. This means all is_blocked tasks always end up
> taking blocked_lock.
>
> The other suggestion (in the other subthread) was to simply delay this
> patch until the end. That seems far more sensible.
>
> Anyway, let me go try and find your git tree and see what you ended up
> with.
Now that I look at this again, yeah, you can simply move that
optimization later too. Github isn't letting me push the branch but
essentially you move:
e2ff8b7bde07 sched/proxy: Only return migrate when needed
1b89b7b21bf5 sched/proxy: Remove superfluous clear_task_blocked_in()
to happen after commit a8be1edac5a1 ("sched/proxy: Remove
PROXY_WAKING").
All of that was for essentially this scenario:
  mutex_lock()
    set_task_blocked_on(mutex)
    schedule_preempt_disabled()
      __schedule()
        p->is_blocked = 1
        -------> preempted                  __send_signal_locked()
                                              signal_wake_up()
                                                try_to_wake_up()
                                                  ttwu_runnable()
                                                    /*
                                                     * Needs to clear p->blocked_on
                                                     * As long as PROXY_WAKING exists.
                                                     */
                                                    ttwu_do_wakeup()
                                                      p->__state = TASK_RUNNING
                                                      p->is_blocked = 0

                                            mutex_unlock()
                                              set_task_blocked_on_waking(p, mutex)
                                                p->blocked_on = PROXY_WAKING;

        /* p->is_blocked = 0; p->blocked_on = PROXY_WAKING */
        next = p;
        <------ switch_in

    /* Exits out of schedule_preempt_disabled() */
    set_task_blocked_on(p, mutex)

    !!! p->blocked_on == PROXY_WAKING !!!
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 10:06 ` Peter Zijlstra
@ 2026-05-29 10:54 ` K Prateek Nayak
0 siblings, 0 replies; 45+ messages in thread
From: K Prateek Nayak @ 2026-05-29 10:54 UTC (permalink / raw)
To: Peter Zijlstra
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On 5/29/2026 3:36 PM, Peter Zijlstra wrote:
> On Fri, May 29, 2026 at 01:28:45PM +0530, K Prateek Nayak wrote:
>> Hello John,
>>
>> On 5/29/2026 12:18 PM, John Stultz wrote:
>>> Basically we can get in a situation where (sorry this gets a bit convoluted):
>>>
>>> 1) On CPU1, __mutex_lock_common, we set task A
>>> blocked_on/TASK_UNINTERRUPTIBLE, and call into __schedule().
>>>
>>> 2) On CPU2, task B who holds the mutex calls __mutex_unlock_slowpath()
>>> and sets task A as PROXY_WAKING and starts to call into
>>> wake_up_task().
>>>
>>> 3) On CPU1, in __schedule() we pick_next_task(), which returns task A,
>>> which is_blocked. We call find_proxy_task() and note task A is
>>> PROXY_WAKING. Since it's also current, we take the short-cut and clear
>>> is_blocked and blocked_on and return task A to run.
>>>
>>> 4) On CPU2, try_to_wake_up() hits ttwu_runnable(), and
>>> proxy_needs_return() returns false as A->blocked_on is zero.
>>>
>>> 5) On CPU 1, task A is running, it grabs the lock it was waiting for
>>> and exits __mutex_lock_common. It then enters __mutex_lock_common to
>>> grab a different mutex that is already locked. It sets itself
>>> blocked_on/TASK_INTERRUPTIBLE and calls into __schedule()
>>>
>>> 6) On CPU2, ttwu_runnable() continues, and calls ttwu_do_wakeup(),
>>> which clears A->is_blocked and sets the A->__state TASK_RUNNING
>>>
>>> 7) On CPU3, task C that holds the mutex A is waiting on, calls
>>> __mutex_unlock_slowpath, setting A as PROXY_WAKING and calls into
>>> wake_up_task()
>>>
>>> 8) On CPU1, in __schedule() pick_next_task() again returns task A. But
>>> is_blocked is now zero, so we just return task A, even though
>>> blocked_on is PROXY_WAKING.
>>>
>>> 9) On CPU1, task A gets back to the __mutex_lock_common() loop, calls
>>> set_task_blocked_on() and trips warnings as A->blocked_on is still
>>> PROXY_WAKING.
>>
>> Oh geez! Me tries to visualize:
>
> Thanks! I too need pictures; prose will forever confuse me :/
>
>>
>> CPU1 CPU2 CPU3
>> ==== ==== ====
>>
>> __mutex_lock_common(MutexA)
>> set_task_blocked_on(TaskA, MutexA)
>> set_current_state(TASK_UNINTERRUPTABLE) __mutex_unlock_slowpath(MutexA)
>> ... set_task_blocked_on_waking(TaskA)
>> schedule_preempt_disabled() wake_up_process(TaskA)
>> __schedule() /* (1) */ ... /* (2) */
>>
>> if (prev_state &..)
>> TaskA->is_blocked = 0;
>
> Should this be: TaskA->is_blocked = 1? Otherwise I'm not following.
Yup! My bad.
>
>> next = TaskA
>> find_proxy_task(TaskA)
>> /* TaskA-> blocked_on == TASK_WAKING */
>> clear_task_blocked_on(TaskA, NULL);
>> TaskA->is_blocked = 0;
>> ...
>> next = TaskA /* (3) */
>> rq_unlock(CPU1) try_to_wake_up(TaskA)
>> rq_lock(CPU1)
>> ttwu_runnable()
>> /* TaskA->blocked_on == 0 (4) */
>> ...
>> set_curent_state(TASK_RUNNING)
>> ...
>>
>> mutex_lock(MutexB)
>> __mutex_lock_common(MutexB)
>> ...
>> set_task_blocked_on(TaskA, MutexB)
>> set_current_state(TASK_UNINTERRUPTABLE)
>> schedule_preempt_disabled()
>> __schedule() /* (5) */ ... __mutex_unlock_slowpath(MutexB)
>> ttwu_do_wakeup(TaskA) /* (6) */ set_task_blocked_on_waking(TaskA) /* (7) */
>> rq_unlock(CPU1)
>>
>> rq_lock(CPU1)
>> /* TaskA->__state == TASK_RUNNING */
>> next = TasKA;
>>
>> if (TaskA->is_blocked /* False */ && TaskA->blocked_on /* PROXY_WAKING */)
>> /* Skip */
>> next = TaskA; /* (8) */
>>
>> set_task_blocked_on(p, MutexB)
>>
>> !!! p->blocked_on != MutexB !!!
>>
>>
>> Yup! That is a concern too then!
>>
>> I think we can just squash the PROXY_WAKING removal with "p->is_blocked"
>> introduction and a part of this problem should go away since unlocks
>> always clear task->blocked_on then.
>
> While staring at this, I noted that the PROXY_WAKING removal patch
> should also remove the clear_task_blocked_on() line in the very last
> hunk.
>
> That said, I do have a note to double check the lockless access to
> p->blocked_on there.
Now that PROXY_WAKING is gone, the only reason we locklessly inspect
p->blocked_on is with the intention of clearing it.
If we see a valid p->blocked_on, we take the lock and inspect it
again. If it is cleared, no other entity except the task can set it
for itself so it will remain blocked until the time task is
selected to run on CPU and the blocked donors don't run until they
are woken up.
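
The peek-then-recheck pattern described above can be sketched in C
roughly as follows. This is a userspace illustration only, assuming C11
atomics: struct peek_task, the atomic_flag spinlock and
peek_clear_blocked_on() are hypothetical names, and the relaxed memory
ordering is chosen for the sketch rather than taken from the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two task fields involved. */
struct peek_task {
	atomic_flag blocked_lock;	/* toy spinlock */
	_Atomic(void *) blocked_on;
};

static void peek_lock(struct peek_task *p)
{
	while (atomic_flag_test_and_set_explicit(&p->blocked_lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void peek_unlock(struct peek_task *p)
{
	atomic_flag_clear_explicit(&p->blocked_lock, memory_order_release);
}

/* Returns true if this call cleared a blocked_on relation. */
static bool peek_clear_blocked_on(struct peek_task *p)
{
	bool cleared = false;

	/* Lockless peek: NULL means nothing to clear, so skip the lock. */
	if (!atomic_load_explicit(&p->blocked_on, memory_order_relaxed))
		return false;

	peek_lock(p);
	/* Recheck under the lock; someone else may have cleared it. */
	if (atomic_load_explicit(&p->blocked_on, memory_order_relaxed)) {
		atomic_store_explicit(&p->blocked_on, NULL,
				      memory_order_relaxed);
		cleared = true;
	}
	peek_unlock(p);

	return cleared;
}
```

The lockless read is only ever used to decide whether taking the lock is
worth it; the decision to clear is made on the value seen under the
lock.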
>
> Anyway, yes, the hunk removed in patch 1 cures this by clearing
> TaskA->blocked_on (because prev == current == TaskA). I don't think we
> need to squash everything into one giant patch over this if we just
> reorder things.
>
> The Changelog of patch 1 needs an extra few links to this discussion,
> but that should be it, no?
Sure! That should do just fine.
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-29 10:46 ` K Prateek Nayak
@ 2026-05-30 2:56 ` John Stultz
0 siblings, 0 replies; 45+ messages in thread
From: John Stultz @ 2026-05-30 2:56 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Fri, May 29, 2026 at 3:46 AM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> On 5/29/2026 2:17 PM, Peter Zijlstra wrote:
> > Egads, this is terrible. This means all is_blocked tasks always end up
> > taking blocked_lock.
> >
> > The other suggestion (in the other subthread) was to simply delay this
> > patch until the end. That seems far more sensible.
> >
> > Anyway, let me go try and find your git tree and see what you ended up
> > with.
>
> Now that I look at this again, yeah, you can simply move that
> optimization later too. Github isn't letting me push the branch but
> essentially you move:
>
> e2ff8b7bde07 sched/proxy: Only return migrate when needed
> 1b89b7b21bf5 sched/proxy: Remove superfluous clear_task_blocked_in()
>
> to happen after commit a8be1edac5a1 ("sched/proxy: Remove
> PROXY_WAKING").
>
Yeah, just to confirm in my testing of the current peterz/sched/proxy
branch, I agree it looks like pushing the
"sched/proxy: Only return migrate when needed"
to be after
"sched/proxy: Remove PROXY_WAKING"
avoids some of the WARN_ONs I can trip in between those two currently.
thanks
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-05-26 11:16 ` [PATCH 5/6] sched/proxy: Remove PROXY_WAKING Peter Zijlstra
@ 2026-06-01 10:54 ` Peter Zijlstra
2026-06-01 20:32 ` John Stultz
2026-06-02 3:19 ` K Prateek Nayak
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
1 sibling, 2 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-06-01 10:54 UTC (permalink / raw)
To: John Stultz, K Prateek Nayak
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, May 26, 2026 at 01:16:14PM +0200, Peter Zijlstra wrote:
> From: K Prateek Nayak <kprateek.nayak@amd.com>
>
> Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
> !->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
> blocked_on relation is broken but the donor task might still need a return
> migration.
>
> (Not-yet-)Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Prateek, can I make that a normal SoB from you? I'm thinking I should
merge sched/proxy into sched/core so we can get on with other stuff.
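
For reference, the state encoding from the quoted changelog can be
written down as a small C sketch. The enum, struct and function names
are hypothetical and only illustrate how '->is_blocked && !->blocked_on'
takes over the role PROXY_WAKING used to play; this is not the code in
the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the three combinations that matter. */
enum state_kind {
	ST_NOT_BLOCKED,		/* !is_blocked: ordinary runnable task */
	ST_BLOCKED_ON_LOCK,	/* is_blocked && blocked_on: proxy candidate */
	ST_RELATION_BROKEN,	/* is_blocked && !blocked_on: was PROXY_WAKING */
};

struct state_task {
	bool is_blocked;
	const void *blocked_on;
};

static enum state_kind state_classify(const struct state_task *p)
{
	if (!p->is_blocked)
		return ST_NOT_BLOCKED;
	if (p->blocked_on)
		return ST_BLOCKED_ON_LOCK;
	/* Relation is broken; the donor may still need a return migration. */
	return ST_RELATION_BROKEN;
}
```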
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-06-01 10:54 ` Peter Zijlstra
@ 2026-06-01 20:32 ` John Stultz
2026-06-02 5:22 ` K Prateek Nayak
2026-06-02 3:19 ` K Prateek Nayak
1 sibling, 1 reply; 45+ messages in thread
From: John Stultz @ 2026-06-01 20:32 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Mon, Jun 1, 2026 at 3:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, May 26, 2026 at 01:16:14PM +0200, Peter Zijlstra wrote:
> > From: K Prateek Nayak <kprateek.nayak@amd.com>
> >
> > Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
> > !->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
> > blocked_on relation is broken but the donor task might still need a return
> > migration.
> >
> > (Not-yet-)Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>
> Prateek, can I make that a normal SoB from you? I'm thinking I should
> merge sched/proxy into sched/core so we can get on with other stuff.
Just as a heads up, so in stress testing[1] over the weekend with your
sched/proxy series, I hit the below null ptr traversal that seems to
be another pick_eevdf() returning null issue.
I'm not sure if this is proxy related or not yet, so I'll be working
to reproduce (took ~31 hours to trip this one) and narrow it down.
But I'm wondering, given this pick_eevdf() returning null symptom has
been a regular issue for various bugs over time, do we need some
better debug checks to help narrow these down?
This was using your tree at 4d92e41a046d, plus one workaround for
binutils on my system:
https://lore.kernel.org/lkml/7b45d196-063e-4e76-b08b-ec2bcc111328@linux.ibm.com/
[1]: Running the following in a loop:
stress-ng --class scheduler --all 1 --timeout 300
Crash below:
[112007.261294] BUG: kernel NULL pointer dereference, address: 0000000000000059
[112007.265100] #PF: supervisor read access in kernel mode
[112007.267796] #PF: error_code(0x0000) - not-present page
[112007.270507] PGD 0 P4D 0
[112007.271913] Oops: Oops: 0000 [#1] SMP NOPTI
[112007.274149] CPU: 6 UID: 0 PID: 1830390 Comm: stress-ng-cpu-s Tainted: G W 7.1.0-rc2-00079-g7f66f556bfd7 #78 PREEMPT(full)
[112007.280445] Tainted: [W]=WARN
[112007.282098] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
[112007.286982] RIP: 0010:pick_task_fair+0x49/0x7c0
[112007.289398] Code: 18 a2 02 4c 89 74 24 40 49 89 f6 44 8b 8b 50 01 00 00 48 8d bb 40 01 00 00 45 85 c9 75 27 eb 63 be 01 00 00 00 e8 b7 49 ff ff <80> 78 59 00 75 41 48 85 c0 74 d6 48 8b b8 b8 00 00 00 48 85 ff 0f
[112007.298732] RSP: 0018:ffffc90016a5bca0 EFLAGS: 00010082
[112007.301475] RAX: 0000000000000000 RBX: ffff8881b8fada00 RCX: fffca7c2df129800
[112007.305138] RDX: ffff888191e90080 RSI: 0000000003000c00 RDI: ffff888106fc7600
[112007.308813] RBP: ffffc90016a5bde0 R08: fffca7c2df129800 R09: 0000000003000c00
[112007.312498] R10: 000790b20f70c03b R11: 0000000000000000 R12: 0000000000000000
[112007.316165] R13: ffffffff82f4c310 R14: ffffc90016a5bd70 R15: ffff8881b8fada00
[112007.319866] FS: 00007fdb4d1e7b00(0000) GS:ffff8882351ee000(0000) knlGS:0000000000000000
[112007.324083] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[112007.327102] CR2: 0000000000000059 CR3: 0000000109b7b001 CR4: 0000000000370ef0
[112007.330798] Call Trace:
[112007.332211] <TASK>
[112007.333463] ? do_nanosleep+0x1a/0x190
[112007.335475] ? dequeue_task_fair+0x2b/0x180
[112007.337698] __schedule+0x2e7/0x1e50
[112007.339633] ? lock_acquire+0xd9/0x320
[112007.341668] ? do_nanosleep+0x1a/0x190
[112007.343669] ? lock_release+0x191/0x310
[112007.345725] ? do_nanosleep+0x1a/0x190
[112007.347753] schedule+0x3d/0x130
[112007.349622] do_nanosleep+0x6f/0x190
[112007.351546] hrtimer_nanosleep+0xba/0x1f0
[112007.353682] ? lock_release+0x191/0x310
[112007.355749] ? __pfx_hrtimer_wakeup+0x10/0x10
[112007.358092] common_nsleep+0x34/0x60
[112007.360021] __x64_sys_clock_nanosleep+0xde/0x150
[112007.362544] do_syscall_64+0xf3/0x6c0
[112007.364515] entry_SYSCALL_64_after_hwframe+0x77/0x7f
[112007.367156] RIP: 0033:0x7fdb4daba687
[112007.369087] Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00 83 e2 39 83 fa 08 75 de e8 23 ff ff ff
[112007.378506] RSP: 002b:00007fff604b87d0 EFLAGS: 00000202 ORIG_RAX: 00000000000000e6
[112007.382426] RAX: ffffffffffffffda RBX: 00007fdb4d1e7b00 RCX: 00007fdb4daba687
[112007.386111] RDX: 00007fff604b8810 RSI: 0000000000000000 RDI: 0000000000000000
[112007.389821] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[112007.393501] R10: 00007fff604b8810 R11: 0000000000000202 R12: 00000000001bedf6
[112007.397174] R13: 000055632496f550 R14: 0000000000000033 R15: 0000000000000001
[112007.400864] </TASK>
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-06-01 10:54 ` Peter Zijlstra
2026-06-01 20:32 ` John Stultz
@ 2026-06-02 3:19 ` K Prateek Nayak
1 sibling, 0 replies; 45+ messages in thread
From: K Prateek Nayak @ 2026-06-02 3:19 UTC (permalink / raw)
To: Peter Zijlstra, John Stultz
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Hello Peter,
On 6/1/2026 4:24 PM, Peter Zijlstra wrote:
>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Prateek, can I make that a normal SoB from you? I'm thinking I should
> merge sched/proxy into sched/core so we can get on with other stuff.
Sorry for the delay! Please convert it into an official S-o-b.
Thank you.
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-06-01 20:32 ` John Stultz
@ 2026-06-02 5:22 ` K Prateek Nayak
2026-06-02 6:58 ` John Stultz
2026-06-02 10:02 ` Peter Zijlstra
0 siblings, 2 replies; 45+ messages in thread
From: K Prateek Nayak @ 2026-06-02 5:22 UTC (permalink / raw)
To: John Stultz, Peter Zijlstra
Cc: Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
Hello John,
On 6/2/2026 2:02 AM, John Stultz wrote:
> On Mon, Jun 1, 2026 at 3:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
>> On Tue, May 26, 2026 at 01:16:14PM +0200, Peter Zijlstra wrote:
>>> From: K Prateek Nayak <kprateek.nayak@amd.com>
>>>
>>> Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
>>> !->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
>>> blocked_on relation is broken but the donor task might still need a return
>>> migration.
>>>
>>> (Not-yet-)Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
>>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>
>> Prateek, can I make that a normal SoB from you? I'm thinking I should
>> merge sched/proxy into sched/core so we can get on with other stuff.
>
> Just as a heads up, so in stress testing[1] over the weekend with your
> sched/proxy series, I hit the below null ptr traversal that seems to
> be another pick_eevdf() returning null issue.
>
> I'm not sure if this is proxy related or not yet, so I'll be working
> to reproduce (took ~31 hours to trip this one) and narrow it down.
> But I'm wondering, given this pick_eevdf() returning null symptom has
> been a regular issue for various bugs over time, do we need some
> better debug checks to help narrow these down?
I think PARANOID_AVG sched feat allows for some indication if things
have gone sideways without crashing but there isn't an easy way to get
the cfs_rq state which led to the crash without a crash kernel.
>
> This was using your tree at 4d92e41a046d, plus one workaround for
> binutils on my system:
> https://lore.kernel.org/lkml/7b45d196-063e-4e76-b08b-ec2bcc111328@linux.ibm.com/
Could you also try merging tip:sched/urgent into this branch and
rerunning.
commit b6eee96843e8 ("sched/fair: Fix overflow in
vruntime_eligible()") in v7.1-rc3 moved to using 128-bit data type for
the eligibility check and it can catch cases where an overflow in the
multiplication will cause all entities to appear ineligible.
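To illustrate the failure mode described above, here is a minimal userspace sketch. It is not the kernel code: it only assumes the eligibility test has the general shape "avg >= key * load", and the names eligible_narrow() and eligible_wide() are hypothetical. The 64-bit variant models wrap-around with an unsigned multiply (signed overflow would be undefined behaviour in C), and the wide variant relies on __int128, which is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of an eligibility check of the form
 *     avg >= key * load
 * evaluated in 64 bits. The product is computed with unsigned
 * arithmetic so that it wraps modulo 2^64 instead of invoking
 * undefined behaviour; the cast back to int64_t assumes the usual
 * two's complement conversion done by GCC and Clang.
 */
static int eligible_narrow(int64_t avg, int64_t key, int64_t load)
{
	assert(load > 0);
	int64_t prod = (int64_t)((uint64_t)key * (uint64_t)load);
	return avg >= prod;
}

/*
 * Same comparison, but the product is formed in 128 bits, so it
 * cannot wrap for any pair of 64-bit inputs.
 */
static int eligible_wide(int64_t avg, int64_t key, int64_t load)
{
	assert(load > 0);
	__int128 prod = (__int128)key * load;
	return (__int128)avg >= prod;
}
```

With avg = 0, key = -(2^40) and load = 2^23 + 1, the true product is negative, so the entity is eligible, but the wrapped 64-bit product is a large positive number and eligible_narrow() reports it as ineligible. If every entity on a queue is misjudged this way, the picker has nothing to return, which matches the NULL symptom discussed in this thread.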
--
Thanks and Regards,
Prateek
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-06-02 5:22 ` K Prateek Nayak
@ 2026-06-02 6:58 ` John Stultz
2026-06-02 10:02 ` Peter Zijlstra
1 sibling, 0 replies; 45+ messages in thread
From: John Stultz @ 2026-06-02 6:58 UTC (permalink / raw)
To: K Prateek Nayak
Cc: Peter Zijlstra, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Mon, Jun 1, 2026 at 10:22 PM K Prateek Nayak <kprateek.nayak@amd.com> wrote:
> On 6/2/2026 2:02 AM, John Stultz wrote:
> > On Mon, Jun 1, 2026 at 3:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >> On Tue, May 26, 2026 at 01:16:14PM +0200, Peter Zijlstra wrote:
> >>> From: K Prateek Nayak <kprateek.nayak@amd.com>
> >>>
> >>> Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
> >>> !->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
> >>> blocked_on relation is broken but the donor task might still need a return
> >>> migration.
> >>>
> >>> (Not-yet-)Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
> >>> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> >>
> >> Prateek, can I make that a normal SoB from you? I'm thinking I should
> >> merge sched/proxy into sched/core so we can get on with other stuff.
> >
> > Just as a heads up, so in stress testing[1] over the weekend with your
> > sched/proxy series, I hit the below null ptr traversal that seems to
> > be another pick_eevdf() returning null issue.
> >
> > I'm not sure if this is proxy related or not yet, so I'll be working
> > to reproduce (took ~31 hours to trip this one) and narrow it down.
> > But I'm wondering, given this pick_eevdf() returning null symptom has
> > been a regular issue for various bugs over time, do we need some
> > better debug checks to help narrow these down?
>
> I think PARANOID_AVG sched feat allows for some indication if things
> have gone sideways without crashing but there isn't an easy way to get
> the cfs_rq state which led to the crash without a crash kernel.
>
> >
> > This was using your tree at 4d92e41a046d, plus one workaround for
> > binutils on my system:
> > https://lore.kernel.org/lkml/7b45d196-063e-4e76-b08b-ec2bcc111328@linux.ibm.com/
>
> Could you also try merging tip:sched/urgent into this branch and
> rerunning.
>
> commit b6eee96843e8 ("sched/fair: Fix overflow in
> vruntime_eligible()") in v7.1-rc3 moved to using 128-bit data type for
> the eligibility check and it can catch cases where an overflow in the
> multiplication will cause all entities to appear ineligible.
Oh, good point! I was thinking that was in there, but it landed later.
Many thanks for pointing this out. I'll get the tests restarted.
thanks
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-06-02 5:22 ` K Prateek Nayak
2026-06-02 6:58 ` John Stultz
@ 2026-06-02 10:02 ` Peter Zijlstra
2026-06-04 18:29 ` John Stultz
1 sibling, 1 reply; 45+ messages in thread
From: Peter Zijlstra @ 2026-06-02 10:02 UTC (permalink / raw)
To: K Prateek Nayak
Cc: John Stultz, Joel Fernandes, Qais Yousef, Ingo Molnar, Juri Lelli,
Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, Jun 02, 2026 at 10:52:10AM +0530, K Prateek Nayak wrote:
> Could you also try merging tip:sched/urgent into this branch and
> rerunning.
I'll merge that into sched/core so we don't keep hitting this. Been
there done that etc. :(
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-06-02 10:02 ` Peter Zijlstra
@ 2026-06-04 18:29 ` John Stultz
2026-06-04 18:41 ` Peter Zijlstra
0 siblings, 1 reply; 45+ messages in thread
From: John Stultz @ 2026-06-04 18:29 UTC (permalink / raw)
To: Peter Zijlstra
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Tue, Jun 2, 2026 at 3:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Jun 02, 2026 at 10:52:10AM +0530, K Prateek Nayak wrote:
>
> > Could you also try merging tip:sched/urgent into this branch and
> > rerunning.
>
> I'll merge that into sched/core so we don't keep hitting this. Been
> there done that etc. :(
Just an update: So I re-ran with K Prateek's suggestion for a night
with no trouble, and then after I saw sched/core was updated I started
over with that as of 130fc7bdfadb ("sched/fair: Unify cfs_rq
throttling via account_cfs_rq_runtime()"), and it's been running the
same stress testing for almost 40 hours without any apparent issues.
So yeah, I'm pretty sure the b6eee96843e8 ("sched/fair: Fix overflow
in vruntime_eligible()") fix resolved what I was hitting.
thanks!
-john
^ permalink raw reply [flat|nested] 45+ messages in thread
* Re: [PATCH 5/6] sched/proxy: Remove PROXY_WAKING
2026-06-04 18:29 ` John Stultz
@ 2026-06-04 18:41 ` Peter Zijlstra
0 siblings, 0 replies; 45+ messages in thread
From: Peter Zijlstra @ 2026-06-04 18:41 UTC (permalink / raw)
To: John Stultz
Cc: K Prateek Nayak, Joel Fernandes, Qais Yousef, Ingo Molnar,
Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
Steven Rostedt, Ben Segall, Zimuzo Ezeozue, Will Deacon,
Waiman Long, Boqun Feng, Paul E. McKenney, Metin Kaya, Xuewen Yan,
Thomas Gleixner, Daniel Lezcano, Suleiman Souhlal, kuyo chang,
hupu, linux-kernel, Mike Galbraith
On Thu, Jun 04, 2026 at 11:29:06AM -0700, John Stultz wrote:
> On Tue, Jun 2, 2026 at 3:02 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Tue, Jun 02, 2026 at 10:52:10AM +0530, K Prateek Nayak wrote:
> >
> > > Could you also try merging tip:sched/urgent into this branch and
> > > rerunning.
> >
> > I'll merge that into sched/core so we don't keep hitting this. Been
> > there done that etc. :(
>
> Just an update: So I re-ran with K Prateek's suggestion for a night
> with no trouble, and then after I saw sched/core was updated I started
> over with that as of 130fc7bdfadb ("sched/fair: Unify cfs_rq
> throttling via account_cfs_rq_runtime()"), and it's been running the
> same stress testing for almost 40 hours without any apparent issues.
>
> So yeah, I'm pretty sure the b6eee96843e8 ("sched/fair: Fix overflow
> in vruntime_eligible()") fix resolved what I was hitting.
Excellent! I'll go push queue:sched/core into -tip momentarily.
^ permalink raw reply [flat|nested] 45+ messages in thread
* [tip: sched/core] sched: Simplify ttwu_runnable()
2026-05-26 11:16 ` [PATCH 6/6] sched: Simplify ttwu_runnable() Peter Zijlstra
@ 2026-06-04 18:45 ` tip-bot2 for Peter Zijlstra
0 siblings, 0 replies; 45+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-06-04 18:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 56e50ff567810db208cc37d9e17b8df044a9158c
Gitweb: https://git.kernel.org/tip/56e50ff567810db208cc37d9e17b8df044a9158c
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 26 May 2026 12:00:59 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 02 Jun 2026 12:26:10 +02:00
sched: Simplify ttwu_runnable()
Note that both proxy and delayed tasks have ->is_blocked set. Use this one
condition to guard both paths.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260526113322.714832584%40infradead.org
---
kernel/sched/core.c | 13 ++++++-------
1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index d579518..5a317f6 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3764,9 +3764,6 @@ static inline void proxy_reset_donor(struct rq *rq)
*/
static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
{
- if (!p->is_blocked)
- return false;
-
/*
* Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
*
@@ -3875,10 +3872,12 @@ static int ttwu_runnable(struct task_struct *p, int wake_flags)
return 0;
update_rq_clock(rq);
- if (p->se.sched_delayed)
- enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
- if (proxy_needs_return(rq, p))
- return 0;
+ if (p->is_blocked) {
+ if (p->se.sched_delayed)
+ enqueue_task(rq, p, ENQUEUE_NOCLOCK | ENQUEUE_DELAYED);
+ if (proxy_needs_return(rq, p))
+ return 0;
+ }
if (!task_on_cpu(rq, p)) {
/*
* When on_rq && !on_cpu the task is preempted, see if
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [tip: sched/core] sched/proxy: Remove superfluous clear_task_blocked_in()
2026-05-26 11:16 ` [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in() Peter Zijlstra
2026-05-26 23:39 ` John Stultz
@ 2026-06-04 18:45 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 45+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-06-04 18:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), John Stultz, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: c0404dd88d124714351f7a961d3313ee0f2f036b
Gitweb: https://git.kernel.org/tip/c0404dd88d124714351f7a961d3313ee0f2f036b
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 26 May 2026 11:22:30 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 02 Jun 2026 12:26:09 +02:00
sched/proxy: Remove superfluous clear_task_blocked_in()
Per the discussion here:
https://lore.kernel.org/all/20260403112810.GG3738786@noisy.programming.kicks-ass.net/
The reason for this condition is that the signal condition in
try_to_block_task() would set_task_blocked_on_waking(). However, it no longer
does that; in fact, that path now does clear_task_blocked_on().
Further, per the discussions here:
https://lore.kernel.org/r/dc61cf77-e541-441d-a708-c40e19aa0db2%40amd.com
https://lore.kernel.org/r//9dd1d24d-45d3-4ee2-8e67-8305b34bfb6d%40amd.com
there are a few other edge cases that needed this. But they're all
variants of PROXY_WAKING leaking out. And since PROXY_WAKING is now
gone, this is no longer needed either.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260526113322.120970670%40infradead.org
---
kernel/sched/core.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cec2c16..d579518 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7142,9 +7142,6 @@ pick_again:
if (sched_proxy_exec()) {
struct task_struct *prev_donor = rq->donor;
- if (!prev_state && prev->blocked_on)
- clear_task_blocked_on(prev, NULL);
-
rq_set_donor(rq, next);
next->blocked_donor = NULL;
if (unlikely(next->is_blocked)) {
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [tip: sched/core] sched/proxy: Remove PROXY_WAKING
2026-05-26 11:16 ` [PATCH 5/6] sched/proxy: Remove PROXY_WAKING Peter Zijlstra
2026-06-01 10:54 ` Peter Zijlstra
@ 2026-06-04 18:45 ` tip-bot2 for K Prateek Nayak
1 sibling, 0 replies; 45+ messages in thread
From: tip-bot2 for K Prateek Nayak @ 2026-06-04 18:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: K Prateek Nayak, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: ec9d4f1c424134bbf30965075df78d02a5d021dc
Gitweb: https://git.kernel.org/tip/ec9d4f1c424134bbf30965075df78d02a5d021dc
Author: K Prateek Nayak <kprateek.nayak@amd.com>
AuthorDate: Tue, 26 May 2026 11:43:02 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 02 Jun 2026 12:26:09 +02:00
sched/proxy: Remove PROXY_WAKING
Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
!->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
blocked_on relation is broken but the donor task might still need a return
migration.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260526113322.596522894%40infradead.org
---
include/linux/sched.h | 50 +-------------------------------------
kernel/locking/mutex.c | 4 +--
kernel/locking/ww_mutex.h | 4 +--
kernel/sched/core.c | 2 +-
4 files changed, 7 insertions(+), 53 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index e2f127a..35e6183 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2205,19 +2205,10 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock) __must_hold(lock);
#ifndef CONFIG_PREEMPT_RT
-/*
- * With proxy exec, if a task has been proxy-migrated, it may be a donor
- * on a cpu that it can't actually run on. Thus we need a special state
- * to denote that the task is being woken, but that it needs to be
- * evaluated for return-migration before it is run. So if the task is
- * blocked_on PROXY_WAKING, return migrate it before running it.
- */
-#define PROXY_WAKING ((struct mutex *)(-1L))
-
static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
{
lockdep_assert_held_once(&p->blocked_lock);
- return p->blocked_on == PROXY_WAKING ? NULL : p->blocked_on;
+ return p->blocked_on;
}
static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
@@ -2245,7 +2236,7 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *
* blocked_on relationships, but make sure we are not
* clearing the relationship with a different lock.
*/
- WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
+ WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
p->blocked_on = NULL;
}
@@ -2254,35 +2245,6 @@ static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
guard(raw_spinlock_irqsave)(&p->blocked_lock);
__clear_task_blocked_on(p, m);
}
-
-static inline void __set_task_blocked_on_waking(struct task_struct *p, struct mutex *m)
-{
- /* Currently we serialize blocked_on under the task::blocked_lock */
- lockdep_assert_held_once(&p->blocked_lock);
-
- if (!sched_proxy_exec()) {
- __clear_task_blocked_on(p, m);
- return;
- }
-
- /* Don't set PROXY_WAKING if blocked_on was already cleared */
- if (!p->blocked_on)
- return;
- /*
- * There may be cases where we set PROXY_WAKING on tasks that were
- * already set to waking, but make sure we are not changing
- * the relationship with a different lock.
- */
- WARN_ON_ONCE(m && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
- p->blocked_on = PROXY_WAKING;
-}
-
-static inline void set_task_blocked_on_waking(struct task_struct *p, struct mutex *m)
-{
- guard(raw_spinlock_irqsave)(&p->blocked_lock);
- __set_task_blocked_on_waking(p, m);
-}
-
#else
static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
{
@@ -2291,14 +2253,6 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mute
static inline void clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
{
}
-
-static inline void __set_task_blocked_on_waking(struct task_struct *p, struct rt_mutex *m)
-{
-}
-
-static inline void set_task_blocked_on_waking(struct task_struct *p, struct rt_mutex *m)
-{
-}
#endif /* !CONFIG_PREEMPT_RT */
static __always_inline bool need_resched(void)
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2867716..89d01f7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -1044,7 +1044,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
next_lock = __get_task_blocked_on(donor);
if (next_lock == lock) {
next = get_task_struct(donor);
- __set_task_blocked_on_waking(donor, next_lock);
+ __clear_task_blocked_on(next, lock);
current->blocked_donor = NULL;
}
raw_spin_unlock(&donor->blocked_lock);
@@ -1060,7 +1060,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
raw_spin_lock_nested(&next->blocked_lock, SINGLE_DEPTH_NESTING);
debug_mutex_wake_waiter(lock, waiter);
- __set_task_blocked_on_waking(next, lock);
+ __clear_task_blocked_on(next, lock);
raw_spin_unlock(&next->blocked_lock);
}
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 6c12452..d62b49b 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -324,7 +324,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
* blocked_on to PROXY_WAKING. Otherwise we can see
* circular blocked_on relationships that can't resolve.
*/
- set_task_blocked_on_waking(waiter->task, lock);
+ clear_task_blocked_on(waiter->task, lock);
wake_q_add(wake_q, waiter->task);
}
@@ -383,7 +383,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
* are waking the mutex owner, who may be currently
* blocked on a different mutex.
*/
- set_task_blocked_on_waking(owner, NULL);
+ clear_task_blocked_on(owner, NULL);
wake_q_add(wake_q, owner);
}
return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 9b71031..cec2c16 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6872,7 +6872,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
for (p = donor; p->is_blocked; p = owner) {
/* if its PROXY_WAKING, do return migration or run if current */
struct mutex *mutex = p->blocked_on;
- if (!mutex || mutex == PROXY_WAKING) {
+ if (!mutex) {
clear_task_blocked_on(p, mutex);
if (task_current(rq, p)) {
p->is_blocked = 0;
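The state encoding described in the commit message above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: struct task_model, struct mutex_model, enum proxy_state, classify() and model_clear_blocked_on() are hypothetical names, and the locking around the real fields (p->blocked_lock) is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct mutex and the two task fields. */
struct mutex_model { int id; };

struct task_model {
	bool is_blocked;                 /* models p->is_blocked */
	struct mutex_model *blocked_on;  /* models p->blocked_on */
};

enum proxy_state {
	TASK_NOT_BLOCKED,                /* !is_blocked */
	TASK_BLOCKED_ON_MUTEX,           /* is_blocked && blocked_on */
	TASK_WAKING_NEEDS_RETURN_CHECK,  /* is_blocked && !blocked_on */
};

/* Derive the state from the two fields; no sentinel pointer needed. */
static enum proxy_state classify(const struct task_model *p)
{
	assert(p != NULL);
	if (!p->is_blocked)
		return TASK_NOT_BLOCKED;
	if (p->blocked_on)
		return TASK_BLOCKED_ON_MUTEX;
	return TASK_WAKING_NEEDS_RETURN_CHECK;
}

/*
 * Models the unlock path after this patch: the blocked_on relation
 * is simply cleared, while is_blocked is left for the scheduler.
 */
static void model_clear_blocked_on(struct task_model *p)
{
	p->blocked_on = NULL;
}
```

In this model the third state is what the old code spelled as blocked_on == PROXY_WAKING: the relation to the mutex is already broken, but the task has not yet been checked for return migration.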
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [tip: sched/core] sched/proxy: Switch proxy to use p->is_blocked
2026-05-26 11:16 ` [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked Peter Zijlstra
2026-05-26 14:57 ` Peter Zijlstra
2026-05-27 2:25 ` John Stultz
@ 2026-06-04 18:45 ` tip-bot2 for Peter Zijlstra
2 siblings, 0 replies; 45+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-06-04 18:45 UTC (permalink / raw)
To: linux-tip-commits
Cc: K Prateek Nayak, Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: be365ce2bc20b8970bed350f82c3b760256b6945
Gitweb: https://git.kernel.org/tip/be365ce2bc20b8970bed350f82c3b760256b6945
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 26 May 2026 11:42:29 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 02 Jun 2026 12:26:09 +02:00
sched/proxy: Switch proxy to use p->is_blocked
Rather than gate the proxy paths with p->blocked_on, use p->is_blocked.
This opens up the state: '->is_blocked && !->blocked_on' for future use.
Notably, only proxy and delayed tasks can be ->on_rq && ->is_blocked, and it is
guaranteed that sched_class::pick_task() will never return a delayed task.
Therefore any task returned from pick_next_task() that has ->is_blocked set,
must be a proxy task.
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260526113322.477954312%40infradead.org
---
kernel/sched/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b007b65..9b71031 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3764,7 +3764,7 @@ static inline void proxy_reset_donor(struct rq *rq)
*/
static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
{
- if (!task_is_blocked(p))
+ if (!p->is_blocked)
return false;
/*
@@ -6866,14 +6866,14 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
bool curr_in_chain = false;
int this_cpu = cpu_of(rq);
struct task_struct *p;
- struct mutex *mutex;
int owner_cpu;
/* Follow blocked_on chain. */
- for (p = donor; (mutex = p->blocked_on); p = owner) {
+ for (p = donor; p->is_blocked; p = owner) {
/* if its PROXY_WAKING, do return migration or run if current */
- if (mutex == PROXY_WAKING) {
- clear_task_blocked_on(p, PROXY_WAKING);
+ struct mutex *mutex = p->blocked_on;
+ if (!mutex || mutex == PROXY_WAKING) {
+ clear_task_blocked_on(p, mutex);
if (task_current(rq, p)) {
p->is_blocked = 0;
return p;
@@ -7147,7 +7147,7 @@ pick_again:
rq_set_donor(rq, next);
next->blocked_donor = NULL;
- if (unlikely(next->is_blocked && next->blocked_on)) {
+ if (unlikely(next->is_blocked)) {
next = find_proxy_task(rq, next, &rf);
if (!next) {
zap_balance_callbacks(rq);
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [tip: sched/core] sched/proxy: Only return migrate when needed
2026-05-27 8:29 ` Peter Zijlstra
@ 2026-06-04 18:45 ` tip-bot2 for Peter Zijlstra
0 siblings, 0 replies; 45+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-06-04 18:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 7918cf3693614c9f96bc9e43daff6fc72c01b81a
Gitweb: https://git.kernel.org/tip/7918cf3693614c9f96bc9e43daff6fc72c01b81a
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Wed, 27 May 2026 09:58:02 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 02 Jun 2026 12:26:08 +02:00
sched/proxy: Only return migrate when needed
Current code will 'unconditionally' return migrate on PROXY_WAKING, even if the
task is (still) on the original CPU.
Check task_cpu(p) against p->wake_cpu, which per proxy_set_task_cpu()
preserves the original CPU the task was on. If they match, there is
no need to go through the more expensive wakeup path.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260527082916.GP3126523%40noisy.programming.kicks-ass.net
---
kernel/sched/core.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b7eb12..b007b65 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3767,6 +3767,21 @@ static inline bool proxy_needs_return(struct rq *rq, struct task_struct *p)
if (!task_is_blocked(p))
return false;
+ /*
+ * Typically per __set_task_cpu(), task_cpu(p) == p->wake_cpu.
+ *
+ * However, proxy_set_task_cpu() is such that it preserves the
+ * original cpu in p->wake_cpu while migrating p for proxy reasons
+ * (possibly outside of the allowed p->cpus_ptr).
+ *
+ * Furthermore, migration_cpu_stop() / __migrate_swap_task(), will
+ * only set p->wake_cpu when !p->on_rq, and since here p->on_rq, this
+ * will not apply. But if it did, this check is the safe way around
+ * and would migrate.
+ */
+ if (task_cpu(p) == p->wake_cpu)
+ return false;
+
scoped_guard(raw_spinlock, &p->blocked_lock) {
/* Task is waking up; clear any blocked_on relationship */
__clear_task_blocked_on(p, NULL);
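The early exits added by this hunk can be sketched as a small userspace model. The names struct task_model and model_needs_return() are hypothetical, and everything the real proxy_needs_return() does after the two checks is collapsed into a plain "return true", which is an assumption made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task fields the check looks at. */
struct task_model {
	bool is_blocked;  /* models the blocked check at the top */
	int cpu;          /* models task_cpu(p) */
	int wake_cpu;     /* models p->wake_cpu, the preserved original CPU */
};

/*
 * Returns true only when the task is blocked and currently sits on a
 * CPU other than the one recorded in wake_cpu, i.e. when it was
 * moved for proxy reasons and has to go back.
 */
static bool model_needs_return(const struct task_model *p)
{
	assert(p != NULL);
	if (!p->is_blocked)
		return false;
	if (p->cpu == p->wake_cpu)
		return false;  /* still on the original CPU: cheap path */
	return true;       /* assumption: the rest is the slow path */
}
```

A blocked task with cpu == wake_cpu takes the cheap path in this model, which is the case the commit message says used to be return-migrated unconditionally.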
^ permalink raw reply related [flat|nested] 45+ messages in thread
* [tip: sched/core] sched: Be more strict about p->is_blocked
2026-05-26 11:16 ` [PATCH 3/6] sched: Be more strict about p->is_blocked Peter Zijlstra
2026-05-27 1:56 ` John Stultz
@ 2026-06-04 18:45 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 45+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-06-04 18:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), John Stultz, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 708024b575b4ea58c5956e7c09f2d2f48facd478
Gitweb: https://git.kernel.org/tip/708024b575b4ea58c5956e7c09f2d2f48facd478
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 26 May 2026 11:32:34 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 02 Jun 2026 12:26:08 +02:00
sched: Be more strict about p->is_blocked
Upon entry to try_to_block_task(), p->is_blocked should be false. After all,
the prior wakeup would have made it so per ttwu_do_wakeup().
Ensure this is the case, rather than clearing it in the path that doesn't set
it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260526113322.364017314%40infradead.org
---
kernel/sched/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index a06d5a5..8b7eb12 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6676,8 +6676,9 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
{
unsigned long task_state = *task_state_p;
+ WARN_ON_ONCE(p->is_blocked);
+
if (signal_pending_state(task_state, p)) {
- p->is_blocked = 0;
WRITE_ONCE(p->__state, TASK_RUNNING);
*task_state_p = TASK_RUNNING;
clear_task_blocked_on(p, NULL);
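The invariant this patch enforces can be sketched in userspace as below. This is a simplified model, not the kernel implementation: struct model_task, model_wakeup(), model_try_to_block() and the model_warnings counter are hypothetical names, the counter merely stands in for WARN_ON_ONCE(), and the sketch assumes the blocking path is the one that sets the flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bits of task_struct involved. */
struct model_task {
	bool is_blocked;      /* models p->is_blocked */
	bool signal_pending;  /* models signal_pending_state() being true */
};

/* Counts invariant violations; stands in for WARN_ON_ONCE(). */
int model_warnings;

/* Models the wakeup side: the flag is always cleared here. */
void model_wakeup(struct model_task *p)
{
	p->is_blocked = false;
}

/*
 * Models try_to_block_task(): on entry the flag must already be clear.
 * The signal-pending path neither sets nor clears it; only the blocking
 * path sets it (an assumption of this sketch).
 */
bool model_try_to_block(struct model_task *p)
{
	if (p->is_blocked)
		model_warnings++;

	if (p->signal_pending)
		return false;  /* task stays runnable, flag untouched */

	p->is_blocked = true;
	return true;
}

/* Small self-check: a block/wake/block cycle raises no warning. */
void model_is_blocked_demo(void)
{
	struct model_task t = { 0 };
	int before = model_warnings;

	assert(model_try_to_block(&t));
	model_wakeup(&t);
	assert(model_try_to_block(&t));
	assert(model_warnings == before);
}
```

As long as every block is preceded by a wakeup, the counter never moves; blocking twice in a row without a wakeup in between is what the added warning is meant to catch.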
* [tip: sched/core] sched/proxy: Optimize try_to_wake_up()
2026-05-26 11:16 ` [PATCH 2/6] sched/proxy: Optimize try_to_wake_up() Peter Zijlstra
2026-05-27 1:56 ` John Stultz
@ 2026-06-04 18:45 ` tip-bot2 for Peter Zijlstra
1 sibling, 0 replies; 45+ messages in thread
From: tip-bot2 for Peter Zijlstra @ 2026-06-04 18:45 UTC (permalink / raw)
To: linux-tip-commits; +Cc: Peter Zijlstra (Intel), John Stultz, x86, linux-kernel
The following commit has been merged into the sched/core branch of tip:
Commit-ID: abc40cca0efdf5ba28b7bc37f1db445a8cc840bd
Gitweb: https://git.kernel.org/tip/abc40cca0efdf5ba28b7bc37f1db445a8cc840bd
Author: Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 26 May 2026 11:28:46 +02:00
Committer: Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 02 Jun 2026 12:26:08 +02:00
sched/proxy: Optimize try_to_wake_up()
The reason for the clause in try_to_wake_up() is, per its comment, that
find_proxy_task()'s proxy_deactivate() is not always called with a cleared
p->blocked_on.
However, that seems silly and easily cured. Make sure to always call
proxy_deactivate() with a cleared p->blocked_on such that we might remove this
clause from the common wake-up path.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: John Stultz <jstultz@google.com>
Link: https://patch.msgid.link/20260526113322.244729903%40infradead.org
---
kernel/sched/core.c | 14 ++++----------
1 file changed, 4 insertions(+), 10 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4c6ceff..a06d5a5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4344,14 +4344,6 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
WRITE_ONCE(p->__state, TASK_WAKING);
/*
- * We never clear the blocked_on relation on proxy_deactivate.
- * If we don't clear it here, we have TASK_RUNNING + p->blocked_on
- * when waking up. Since this is a fully blocked, off CPU task
- * waking up, it should be safe to clear the blocked_on relation.
- */
- if (task_is_blocked(p))
- clear_task_blocked_on(p, NULL);
- /*
* If the owning (remote) CPU is still in the middle of schedule() with
* this task as prev, considering queueing p on the remote CPUs wake_list
* which potentially sends an IPI instead of spinning on p->on_cpu to
@@ -6739,6 +6731,7 @@ static void proxy_deactivate(struct rq *rq, struct task_struct *donor)
unsigned long state = READ_ONCE(donor->__state);
WARN_ON_ONCE(state == TASK_RUNNING);
+ WARN_ON_ONCE(donor->blocked_on);
/*
* Because we got donor from pick_next_task(), it is *crucial*
* that we call proxy_resched_idle() before we deactivate it.
@@ -6864,9 +6857,9 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
for (p = donor; (mutex = p->blocked_on); p = owner) {
/* if its PROXY_WAKING, do return migration or run if current */
if (mutex == PROXY_WAKING) {
+ clear_task_blocked_on(p, PROXY_WAKING);
if (task_current(rq, p)) {
p->is_blocked = 0;
- clear_task_blocked_on(p, PROXY_WAKING);
return p;
}
goto deactivate;
@@ -6900,9 +6893,9 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
* and return p (if it is current and safe to
* just run on this rq), or return-migrate the task.
*/
+ __clear_task_blocked_on(p, NULL);
if (task_current(rq, p)) {
p->is_blocked = 0;
- __clear_task_blocked_on(p, NULL);
return p;
}
goto deactivate;
@@ -6912,6 +6905,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
/* XXX Don't handle blocked owners/delayed dequeue yet */
if (curr_in_chain)
return proxy_resched_idle(rq);
+ __clear_task_blocked_on(p, NULL);
goto deactivate;
}
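The reordering in find_proxy_task() can be pictured with the userspace sketch below. It is only a model under stated assumptions: struct model_task and the model_*() functions are hypothetical, the warning counter stands in for the new WARN_ON_ONCE(donor->blocked_on), and the real chain walk, locking and migration are left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the task_struct fields involved. */
struct model_task {
	const void *blocked_on;  /* models p->blocked_on */
	bool is_blocked;         /* models p->is_blocked */
	bool on_rq;              /* cleared when the task is deactivated */
};

/* Counts violations; stands in for WARN_ON_ONCE(donor->blocked_on). */
int model_deactivate_warnings;

/* Models proxy_deactivate(): expects blocked_on to be cleared already. */
void model_proxy_deactivate(struct model_task *p)
{
	if (p->blocked_on)
		model_deactivate_warnings++;
	p->on_rq = false;
}

/*
 * Models the waking case after the patch: blocked_on is cleared before
 * branching, so both the "run it here" and the "deactivate" outcome see
 * a cleared relation. Returns true if the task can simply run here.
 */
bool model_handle_waking(struct model_task *p, bool is_current)
{
	p->blocked_on = NULL;

	if (is_current) {
		p->is_blocked = false;
		return true;
	}

	model_proxy_deactivate(p);
	return false;
}

/* Small self-check: the deactivate path never sees a stale blocked_on. */
void model_handle_waking_demo(void)
{
	int lock;  /* dummy object standing in for a mutex */
	struct model_task t = { .blocked_on = &lock, .is_blocked = true, .on_rq = true };
	int before = model_deactivate_warnings;

	assert(!model_handle_waking(&t, false));
	assert(model_deactivate_warnings == before);
}
```

With the clear hoisted above the branch, the wakeup side no longer needs its own cleanup, which is the clause the patch drops from try_to_wake_up().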
end of thread, other threads:[~2026-06-04 18:45 UTC | newest]
Thread overview: 45+ messages
2026-05-26 11:16 [PATCH 0/6] sched/proxy: doodles Peter Zijlstra
2026-05-26 11:16 ` [PATCH 1/6] sched/proxy: Remove superfluous clear_task_blocked_in() Peter Zijlstra
2026-05-26 23:39 ` John Stultz
2026-05-26 23:54 ` John Stultz
2026-05-27 8:59 ` Peter Zijlstra
2026-05-28 23:20 ` John Stultz
2026-05-29 6:45 ` K Prateek Nayak
2026-05-29 7:14 ` John Stultz
2026-05-29 8:24 ` K Prateek Nayak
2026-05-29 8:47 ` Peter Zijlstra
2026-05-29 8:50 ` Peter Zijlstra
2026-05-29 10:46 ` K Prateek Nayak
2026-05-30 2:56 ` John Stultz
2026-05-29 9:33 ` Peter Zijlstra
2026-05-29 6:48 ` John Stultz
2026-05-29 7:58 ` K Prateek Nayak
2026-05-29 10:06 ` Peter Zijlstra
2026-05-29 10:54 ` K Prateek Nayak
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:16 ` [PATCH 2/6] sched/proxy: Optimize try_to_wake_up() Peter Zijlstra
2026-05-27 1:56 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:16 ` [PATCH 3/6] sched: Be more strict about p->is_blocked Peter Zijlstra
2026-05-27 1:56 ` John Stultz
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:16 ` [PATCH 4/6] sched/proxy: Switch proxy to use p->is_blocked Peter Zijlstra
2026-05-26 14:57 ` Peter Zijlstra
2026-05-26 19:48 ` John Stultz
2026-05-27 2:25 ` John Stultz
2026-05-27 8:29 ` Peter Zijlstra
2026-06-04 18:45 ` [tip: sched/core] sched/proxy: Only return migrate when needed tip-bot2 for Peter Zijlstra
2026-06-04 18:45 ` [tip: sched/core] sched/proxy: Switch proxy to use p->is_blocked tip-bot2 for Peter Zijlstra
2026-05-26 11:16 ` [PATCH 5/6] sched/proxy: Remove PROXY_WAKING Peter Zijlstra
2026-06-01 10:54 ` Peter Zijlstra
2026-06-01 20:32 ` John Stultz
2026-06-02 5:22 ` K Prateek Nayak
2026-06-02 6:58 ` John Stultz
2026-06-02 10:02 ` Peter Zijlstra
2026-06-04 18:29 ` John Stultz
2026-06-04 18:41 ` Peter Zijlstra
2026-06-02 3:19 ` K Prateek Nayak
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for K Prateek Nayak
2026-05-26 11:16 ` [PATCH 6/6] sched: Simplify ttwu_runnable() Peter Zijlstra
2026-06-04 18:45 ` [tip: sched/core] " tip-bot2 for Peter Zijlstra
2026-05-26 11:45 ` [PATCH 0/6] sched/proxy: doodles Peter Zijlstra