From: John Stultz <jstultz@google.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <jstultz@google.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: [PATCH v26 04/10] locking: Add task::blocked_lock to serialize blocked_on state
Date: Tue, 24 Mar 2026 19:13:19 +0000 [thread overview]
Message-ID: <20260324191337.1841376-5-jstultz@google.com> (raw)
In-Reply-To: <20260324191337.1841376-1-jstultz@google.com>
So far, we have been able to utilize the mutex::wait_lock
for serializing the blocked_on state, but when we move to
proxying across runqueues, we will need to add more state
and a way to serialize changes to this state in contexts
where we don't hold the mutex::wait_lock.
So introduce the task::blocked_lock, which nests under the
mutex::wait_lock in the locking order, and rework the locking
to use it.
Signed-off-by: John Stultz <jstultz@google.com>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
v15:
* Split back out into later in the series
v16:
* Fixups to mark tasks unblocked before sleeping in
mutex_optimistic_spin()
* Rework to use guard() as suggested by Peter
v19:
* Rework logic for PREEMPT_RT issues reported by
K Prateek Nayak
v21:
* After recently thinking more on ww_mutex code, I
reworked the blocked_lock usage in mutex lock to
avoid having to take nested locks in the ww_mutex
paths, as I was concerned the lock ordering
constraints weren't as strong as I had previously
thought.
v22:
* Added some extra spaces to avoid dense code blocks
suggested by K Prateek
v23:
* Move get_task_blocked_on() to kernel/locking/mutex.h
as requested by PeterZ
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
---
include/linux/sched.h | 48 +++++++++++++-----------------------
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/locking/mutex-debug.c | 4 +--
kernel/locking/mutex.c | 40 +++++++++++++++++++-----------
kernel/locking/mutex.h | 6 +++++
kernel/locking/ww_mutex.h | 4 +--
kernel/sched/core.c | 4 ++-
8 files changed, 58 insertions(+), 50 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5a5d3dbc9cdf3..2eef9bc6daaab 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1238,6 +1238,7 @@ struct task_struct {
#endif
struct mutex *blocked_on; /* lock we're blocked on */
+ raw_spinlock_t blocked_lock;
#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
/*
@@ -2181,57 +2182,42 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock) __must_hold(lock);
#ifndef CONFIG_PREEMPT_RT
static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
{
- struct mutex *m = p->blocked_on;
-
- if (m)
- lockdep_assert_held_once(&m->wait_lock);
- return m;
+ lockdep_assert_held_once(&p->blocked_lock);
+ return p->blocked_on;
}
static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- struct mutex *blocked_on = READ_ONCE(p->blocked_on);
-
WARN_ON_ONCE(!m);
/* The task should only be setting itself as blocked */
WARN_ON_ONCE(p != current);
- /* Currently we serialize blocked_on under the mutex::wait_lock */
- lockdep_assert_held_once(&m->wait_lock);
+ /* Currently we serialize blocked_on under the task::blocked_lock */
+ lockdep_assert_held_once(&p->blocked_lock);
/*
* Check ensure we don't overwrite existing mutex value
* with a different mutex. Note, setting it to the same
* lock repeatedly is ok.
*/
- WARN_ON_ONCE(blocked_on && blocked_on != m);
- WRITE_ONCE(p->blocked_on, m);
-}
-
-static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
-{
- guard(raw_spinlock_irqsave)(&m->wait_lock);
- __set_task_blocked_on(p, m);
+ WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
+ p->blocked_on = m;
}
static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- if (m) {
- struct mutex *blocked_on = READ_ONCE(p->blocked_on);
-
- /* Currently we serialize blocked_on under the mutex::wait_lock */
- lockdep_assert_held_once(&m->wait_lock);
- /*
- * There may be cases where we re-clear already cleared
- * blocked_on relationships, but make sure we are not
- * clearing the relationship with a different lock.
- */
- WARN_ON_ONCE(blocked_on && blocked_on != m);
- }
- WRITE_ONCE(p->blocked_on, NULL);
+ /* Currently we serialize blocked_on under the task::blocked_lock */
+ lockdep_assert_held_once(&p->blocked_lock);
+ /*
+ * There may be cases where we re-clear already cleared
+ * blocked_on relationships, but make sure we are not
+ * clearing the relationship with a different lock.
+ */
+ WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+ p->blocked_on = NULL;
}
static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- guard(raw_spinlock_irqsave)(&m->wait_lock);
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
__clear_task_blocked_on(p, m);
}
#else
diff --git a/init/init_task.c b/init/init_task.c
index 5c838757fc10e..b5f48ebdc2b6e 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -169,6 +169,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
.journal_info = NULL,
INIT_CPU_TIMERS(init_task)
.pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
+ .blocked_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.blocked_lock),
.timer_slack_ns = 50000, /* 50 usec default slack */
.thread_pid = &init_struct_pid,
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
diff --git a/kernel/fork.c b/kernel/fork.c
index bc2bf58b93b65..079802cb61002 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2076,6 +2076,7 @@ __latent_entropy struct task_struct *copy_process(
ftrace_graph_init_task(p);
rt_mutex_init_task(p);
+ raw_spin_lock_init(&p->blocked_lock);
lockdep_assert_irqs_enabled();
#ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 2c6b02d4699be..cc6aa9c6e9813 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -54,13 +54,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
lockdep_assert_held(&lock->wait_lock);
/* Current thread can't be already blocked (since it's executing!) */
- DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task));
+ DEBUG_LOCKS_WARN_ON(get_task_blocked_on(task));
}
void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
struct task_struct *task)
{
- struct mutex *blocked_on = __get_task_blocked_on(task);
+ struct mutex *blocked_on = get_task_blocked_on(task);
DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
DEBUG_LOCKS_WARN_ON(waiter->task != task);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 2a1d165b3167e..4aa79bcab08c7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -656,6 +656,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
goto err_early_kill;
}
+ raw_spin_lock(¤t->blocked_lock);
__set_task_blocked_on(current, lock);
set_current_state(state);
trace_contention_begin(lock, LCB_F_MUTEX);
@@ -669,8 +670,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
* the handoff.
*/
if (__mutex_trylock(lock))
- goto acquired;
+ break;
+ raw_spin_unlock(¤t->blocked_lock);
/*
* Check for signals and kill conditions while holding
* wait_lock. This ensures the lock cancellation is ordered
@@ -693,12 +695,14 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
first = __mutex_waiter_is_first(lock, &waiter);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
+ raw_spin_lock(¤t->blocked_lock);
/*
* As we likely have been woken up by task
* that has cleared our blocked_on state, re-set
* it to the lock we are trying to acquire.
*/
- set_task_blocked_on(current, lock);
+ __set_task_blocked_on(current, lock);
set_current_state(state);
/*
* Here we order against unlock; we must either see it change
@@ -709,25 +713,33 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
break;
if (first) {
- trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+ bool opt_acquired;
+
/*
* mutex_optimistic_spin() can call schedule(), so
- * clear blocked on so we don't become unselectable
+ * we need to release these locks before calling it,
+ * and clear blocked on so we don't become unselectable
* to run.
*/
- clear_task_blocked_on(current, lock);
- if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
+ __clear_task_blocked_on(current, lock);
+ raw_spin_unlock(¤t->blocked_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+
+ trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+ opt_acquired = mutex_optimistic_spin(lock, ww_ctx, &waiter);
+
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
+ raw_spin_lock(¤t->blocked_lock);
+ __set_task_blocked_on(current, lock);
+
+ if (opt_acquired)
break;
- set_task_blocked_on(current, lock);
trace_contention_begin(lock, LCB_F_MUTEX);
}
-
- raw_spin_lock_irqsave(&lock->wait_lock, flags);
}
- raw_spin_lock_irqsave(&lock->wait_lock, flags);
-acquired:
__clear_task_blocked_on(current, lock);
__set_current_state(TASK_RUNNING);
+ raw_spin_unlock(¤t->blocked_lock);
if (ww_ctx) {
/*
@@ -756,11 +768,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
return 0;
err:
- __clear_task_blocked_on(current, lock);
+ clear_task_blocked_on(current, lock);
__set_current_state(TASK_RUNNING);
__mutex_remove_waiter(lock, &waiter);
err_early_kill:
- WARN_ON(__get_task_blocked_on(current));
+ WARN_ON(get_task_blocked_on(current));
trace_contention_end(lock, ret);
raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
debug_mutex_free_waiter(&waiter);
@@ -971,7 +983,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
next = waiter->task;
debug_mutex_wake_waiter(lock, waiter);
- __clear_task_blocked_on(next, lock);
+ clear_task_blocked_on(next, lock);
wake_q_add(&wake_q, next);
}
diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 9ad4da8cea004..7a8ba13fee949 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -47,6 +47,12 @@ static inline struct task_struct *__mutex_owner(struct mutex *lock)
return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
}
+static inline struct mutex *get_task_blocked_on(struct task_struct *p)
+{
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
+ return __get_task_blocked_on(p);
+}
+
#ifdef CONFIG_DEBUG_MUTEXES
extern void debug_mutex_lock_common(struct mutex *lock,
struct mutex_waiter *waiter);
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 31a785afee6c0..e4a81790ea7dd 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -289,7 +289,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
* blocked_on pointer. Otherwise we can see circular
* blocked_on relationships that can't resolve.
*/
- __clear_task_blocked_on(waiter->task, lock);
+ clear_task_blocked_on(waiter->task, lock);
wake_q_add(wake_q, waiter->task);
}
@@ -347,7 +347,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
* are waking the mutex owner, who may be currently
* blocked on a different mutex.
*/
- __clear_task_blocked_on(owner, NULL);
+ clear_task_blocked_on(owner, NULL);
wake_q_add(wake_q, owner);
}
return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 610e48cdb66a9..7187c63174cd2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6587,6 +6587,7 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
* p->pi_lock
* rq->lock
* mutex->wait_lock
+ * p->blocked_lock
*
* Returns the task that is going to be used as execution context (the one
* that is actually going to be run on cpu_of(rq)).
@@ -6606,8 +6607,9 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
* and ensure @owner sticks around.
*/
guard(raw_spinlock)(&mutex->wait_lock);
+ guard(raw_spinlock)(&p->blocked_lock);
- /* Check again that p is blocked with wait_lock held */
+ /* Check again that p is blocked with blocked_lock held */
if (mutex != __get_task_blocked_on(p)) {
/*
* Something changed in the blocked_on chain and
--
2.53.0.1018.g2bb0e51243-goog
next prev parent reply other threads:[~2026-03-24 19:13 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-24 19:13 [PATCH v26 00/10] Simple Donor Migration for Proxy Execution John Stultz
2026-03-24 19:13 ` [PATCH v26 01/10] sched: Make class_schedulers avoid pushing current, and get rid of proxy_tag_curr() John Stultz
2026-03-24 19:13 ` [PATCH v26 02/10] sched: Minimise repeated sched_proxy_exec() checking John Stultz
2026-03-24 19:13 ` [PATCH v26 03/10] sched: Fix potentially missing balancing with Proxy Exec John Stultz
2026-03-24 19:13 ` John Stultz [this message]
2026-03-24 19:13 ` [PATCH v26 05/10] sched: Fix modifying donor->blocked on without proper locking John Stultz
2026-03-26 21:45 ` Steven Rostedt
2026-03-24 19:13 ` [PATCH v26 06/10] sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration John Stultz
2026-03-24 19:13 ` [PATCH v26 07/10] sched: Add assert_balance_callbacks_empty helper John Stultz
2026-03-24 19:13 ` [PATCH v26 08/10] sched: Add logic to zap balance callbacks if we pick again John Stultz
2026-03-24 19:13 ` [PATCH v26 09/10] sched: Move attach_one_task and attach_task helpers to sched.h John Stultz
2026-03-24 19:13 ` [PATCH v26 10/10] sched: Handle blocked-waiter migration (and return migration) John Stultz
2026-03-26 22:52 ` Steven Rostedt
2026-03-27 4:47 ` K Prateek Nayak
2026-03-27 12:47 ` Peter Zijlstra
2026-03-25 10:52 ` [PATCH v26 00/10] Simple Donor Migration for Proxy Execution K Prateek Nayak
2026-03-27 11:48 ` Peter Zijlstra
2026-03-27 13:33 ` K Prateek Nayak
2026-03-27 15:20 ` Peter Zijlstra
2026-03-27 15:41 ` Peter Zijlstra
2026-03-27 16:00 ` Peter Zijlstra
2026-03-27 16:57 ` K Prateek Nayak
2026-03-27 19:15 ` John Stultz
2026-03-27 19:10 ` John Stultz
2026-03-28 4:53 ` K Prateek Nayak
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260324191337.1841376-5-jstultz@google.com \
--to=jstultz@google.com \
--cc=Metin.Kaya@arm.com \
--cc=boqun.feng@gmail.com \
--cc=bsegall@google.com \
--cc=daniel.lezcano@linaro.org \
--cc=dietmar.eggemann@arm.com \
--cc=hupu.gm@gmail.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=kprateek.nayak@amd.com \
--cc=kuyo.chang@mediatek.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=qyousef@layalina.io \
--cc=rostedt@goodmis.org \
--cc=suleiman@google.com \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=will@kernel.org \
--cc=xuewen.yan94@gmail.com \
--cc=zezeozue@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox