From: John Stultz <jstultz@google.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <jstultz@google.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: [PATCH v23 1/9] locking: Add task::blocked_lock to serialize blocked_on state
Date: Thu, 30 Oct 2025 00:18:42 +0000 [thread overview]
Message-ID: <20251030001857.681432-2-jstultz@google.com> (raw)
In-Reply-To: <20251030001857.681432-1-jstultz@google.com>
So far, we have been able to utilize the mutex::wait_lock
for serializing the blocked_on state, but when we move to
proxying across runqueues, we will need to add more state
and a way to serialize changes to this state in contexts
where we don't hold the mutex::wait_lock.
So introduce the task::blocked_lock, which nests under the
mutex::wait_lock in the locking order, and rework the locking
to use it.
Signed-off-by: John Stultz <jstultz@google.com>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
---
v15:
* Split back out into later in the series
v16:
* Fixups to mark tasks unblocked before sleeping in
mutex_optimistic_spin()
* Rework to use guard() as suggested by Peter
v19:
* Rework logic for PREEMPT_RT issues reported by
K Prateek Nayak
v21:
* After recently thinking more on ww_mutex code, I
reworked the blocked_lock usage in mutex lock to
avoid having to take nested locks in the ww_mutex
paths, as I was concerned the lock ordering
constraints weren't as strong as I had previously
thought.
v22:
* Added some extra spaces to avoid dense code blocks
suggested by K Prateek
v23:
* Move get_task_blocked_on() to kernel/locking/mutex.h
as requested by PeterZ
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
---
include/linux/sched.h | 48 +++++++++++++-----------------------
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/locking/mutex-debug.c | 4 +--
kernel/locking/mutex.c | 40 +++++++++++++++++++-----------
kernel/locking/mutex.h | 6 +++++
kernel/locking/ww_mutex.h | 4 +--
kernel/sched/core.c | 4 ++-
8 files changed, 58 insertions(+), 50 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index cbb7340c5866f..16122c2a2a224 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1241,6 +1241,7 @@ struct task_struct {
#endif
struct mutex *blocked_on; /* lock we're blocked on */
+ raw_spinlock_t blocked_lock;
#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
/*
@@ -2149,57 +2150,42 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock);
#ifndef CONFIG_PREEMPT_RT
static inline struct mutex *__get_task_blocked_on(struct task_struct *p)
{
- struct mutex *m = p->blocked_on;
-
- if (m)
- lockdep_assert_held_once(&m->wait_lock);
- return m;
+ lockdep_assert_held_once(&p->blocked_lock);
+ return p->blocked_on;
}
static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- struct mutex *blocked_on = READ_ONCE(p->blocked_on);
-
WARN_ON_ONCE(!m);
/* The task should only be setting itself as blocked */
WARN_ON_ONCE(p != current);
- /* Currently we serialize blocked_on under the mutex::wait_lock */
- lockdep_assert_held_once(&m->wait_lock);
+ /* Currently we serialize blocked_on under the task::blocked_lock */
+ lockdep_assert_held_once(&p->blocked_lock);
/*
* Check ensure we don't overwrite existing mutex value
* with a different mutex. Note, setting it to the same
* lock repeatedly is ok.
*/
- WARN_ON_ONCE(blocked_on && blocked_on != m);
- WRITE_ONCE(p->blocked_on, m);
-}
-
-static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
-{
- guard(raw_spinlock_irqsave)(&m->wait_lock);
- __set_task_blocked_on(p, m);
+ WARN_ON_ONCE(p->blocked_on && p->blocked_on != m);
+ p->blocked_on = m;
}
static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- if (m) {
- struct mutex *blocked_on = READ_ONCE(p->blocked_on);
-
- /* Currently we serialize blocked_on under the mutex::wait_lock */
- lockdep_assert_held_once(&m->wait_lock);
- /*
- * There may be cases where we re-clear already cleared
- * blocked_on relationships, but make sure we are not
- * clearing the relationship with a different lock.
- */
- WARN_ON_ONCE(blocked_on && blocked_on != m);
- }
- WRITE_ONCE(p->blocked_on, NULL);
+ /* Currently we serialize blocked_on under the task::blocked_lock */
+ lockdep_assert_held_once(&p->blocked_lock);
+ /*
+ * There may be cases where we re-clear already cleared
+ * blocked_on relationships, but make sure we are not
+ * clearing the relationship with a different lock.
+ */
+ WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m);
+ p->blocked_on = NULL;
}
static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- guard(raw_spinlock_irqsave)(&m->wait_lock);
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
__clear_task_blocked_on(p, m);
}
#else
diff --git a/init/init_task.c b/init/init_task.c
index a55e2189206fa..60477d74546e0 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -143,6 +143,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
.journal_info = NULL,
INIT_CPU_TIMERS(init_task)
.pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
+ .blocked_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.blocked_lock),
.timer_slack_ns = 50000, /* 50 usec default slack */
.thread_pid = &init_struct_pid,
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
diff --git a/kernel/fork.c b/kernel/fork.c
index 3da0f08615a95..0697084be202f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2038,6 +2038,7 @@ __latent_entropy struct task_struct *copy_process(
ftrace_graph_init_task(p);
rt_mutex_init_task(p);
+ raw_spin_lock_init(&p->blocked_lock);
lockdep_assert_irqs_enabled();
#ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/locking/mutex-debug.c b/kernel/locking/mutex-debug.c
index 949103fd8e9b5..1d8cff71f65e1 100644
--- a/kernel/locking/mutex-debug.c
+++ b/kernel/locking/mutex-debug.c
@@ -54,13 +54,13 @@ void debug_mutex_add_waiter(struct mutex *lock, struct mutex_waiter *waiter,
lockdep_assert_held(&lock->wait_lock);
/* Current thread can't be already blocked (since it's executing!) */
- DEBUG_LOCKS_WARN_ON(__get_task_blocked_on(task));
+ DEBUG_LOCKS_WARN_ON(get_task_blocked_on(task));
}
void debug_mutex_remove_waiter(struct mutex *lock, struct mutex_waiter *waiter,
struct task_struct *task)
{
- struct mutex *blocked_on = __get_task_blocked_on(task);
+ struct mutex *blocked_on = get_task_blocked_on(task);
DEBUG_LOCKS_WARN_ON(list_empty(&waiter->list));
DEBUG_LOCKS_WARN_ON(waiter->task != task);
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index de7d6702cd96c..c44fc63d4476e 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -640,6 +640,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
goto err_early_kill;
}
+ raw_spin_lock(¤t->blocked_lock);
__set_task_blocked_on(current, lock);
set_current_state(state);
trace_contention_begin(lock, LCB_F_MUTEX);
@@ -653,8 +654,9 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
* the handoff.
*/
if (__mutex_trylock(lock))
- goto acquired;
+ break;
+ raw_spin_unlock(¤t->blocked_lock);
/*
* Check for signals and kill conditions while holding
* wait_lock. This ensures the lock cancellation is ordered
@@ -677,12 +679,14 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
first = __mutex_waiter_is_first(lock, &waiter);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
+ raw_spin_lock(¤t->blocked_lock);
/*
* As we likely have been woken up by task
* that has cleared our blocked_on state, re-set
* it to the lock we are trying to acquire.
*/
- set_task_blocked_on(current, lock);
+ __set_task_blocked_on(current, lock);
set_current_state(state);
/*
* Here we order against unlock; we must either see it change
@@ -693,25 +697,33 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
break;
if (first) {
- trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+ bool opt_acquired;
+
/*
* mutex_optimistic_spin() can call schedule(), so
- * clear blocked on so we don't become unselectable
+ * we need to release these locks before calling it,
+ * and clear blocked on so we don't become unselectable
* to run.
*/
- clear_task_blocked_on(current, lock);
- if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
+ __clear_task_blocked_on(current, lock);
+ raw_spin_unlock(¤t->blocked_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+
+ trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+ opt_acquired = mutex_optimistic_spin(lock, ww_ctx, &waiter);
+
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
+ raw_spin_lock(¤t->blocked_lock);
+ __set_task_blocked_on(current, lock);
+
+ if (opt_acquired)
break;
- set_task_blocked_on(current, lock);
trace_contention_begin(lock, LCB_F_MUTEX);
}
-
- raw_spin_lock_irqsave(&lock->wait_lock, flags);
}
- raw_spin_lock_irqsave(&lock->wait_lock, flags);
-acquired:
__clear_task_blocked_on(current, lock);
__set_current_state(TASK_RUNNING);
+ raw_spin_unlock(¤t->blocked_lock);
if (ww_ctx) {
/*
@@ -740,11 +752,11 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
return 0;
err:
- __clear_task_blocked_on(current, lock);
+ clear_task_blocked_on(current, lock);
__set_current_state(TASK_RUNNING);
__mutex_remove_waiter(lock, &waiter);
err_early_kill:
- WARN_ON(__get_task_blocked_on(current));
+ WARN_ON(get_task_blocked_on(current));
trace_contention_end(lock, ret);
raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
debug_mutex_free_waiter(&waiter);
@@ -955,7 +967,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
next = waiter->task;
debug_mutex_wake_waiter(lock, waiter);
- __clear_task_blocked_on(next, lock);
+ clear_task_blocked_on(next, lock);
wake_q_add(&wake_q, next);
}
diff --git a/kernel/locking/mutex.h b/kernel/locking/mutex.h
index 2e8080a9bee37..5cfd663e2c011 100644
--- a/kernel/locking/mutex.h
+++ b/kernel/locking/mutex.h
@@ -47,6 +47,12 @@ static inline struct task_struct *__mutex_owner(struct mutex *lock)
return (struct task_struct *)(atomic_long_read(&lock->owner) & ~MUTEX_FLAGS);
}
+static inline struct mutex *get_task_blocked_on(struct task_struct *p)
+{
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
+ return __get_task_blocked_on(p);
+}
+
#ifdef CONFIG_DEBUG_MUTEXES
extern void debug_mutex_lock_common(struct mutex *lock,
struct mutex_waiter *waiter);
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 31a785afee6c0..e4a81790ea7dd 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -289,7 +289,7 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
* blocked_on pointer. Otherwise we can see circular
* blocked_on relationships that can't resolve.
*/
- __clear_task_blocked_on(waiter->task, lock);
+ clear_task_blocked_on(waiter->task, lock);
wake_q_add(wake_q, waiter->task);
}
@@ -347,7 +347,7 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
* are waking the mutex owner, who may be currently
* blocked on a different mutex.
*/
- __clear_task_blocked_on(owner, NULL);
+ clear_task_blocked_on(owner, NULL);
wake_q_add(wake_q, owner);
}
return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index cb4f6d91d4455..517b26c515bc5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6607,6 +6607,7 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
* p->pi_lock
* rq->lock
* mutex->wait_lock
+ * p->blocked_lock
*
* Returns the task that is going to be used as execution context (the one
* that is actually going to be run on cpu_of(rq)).
@@ -6630,8 +6631,9 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
* and ensure @owner sticks around.
*/
guard(raw_spinlock)(&mutex->wait_lock);
+ guard(raw_spinlock)(&p->blocked_lock);
- /* Check again that p is blocked with wait_lock held */
+ /* Check again that p is blocked with blocked_lock held */
if (mutex != __get_task_blocked_on(p)) {
/*
* Something changed in the blocked_on chain and
--
2.51.1.930.gacf6e81ea2-goog
next prev parent reply other threads:[~2025-10-30 0:19 UTC|newest]
Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-30 0:18 [PATCH v23 0/9] Donor Migration for Proxy Execution (v23) John Stultz
2025-10-30 0:18 ` John Stultz [this message]
2025-10-30 0:18 ` [PATCH v23 2/9] sched: Fix modifying donor->blocked on without proper locking John Stultz
2025-10-30 4:51 ` K Prateek Nayak
2025-10-30 23:42 ` John Stultz
2025-10-30 0:18 ` [PATCH v23 3/9] sched/locking: Add special p->blocked_on==PROXY_WAKING value for proxy return-migration John Stultz
2025-10-30 7:32 ` K Prateek Nayak
2025-10-30 23:53 ` John Stultz
2025-10-30 0:18 ` [PATCH v23 4/9] sched: Add assert_balance_callbacks_empty helper John Stultz
2025-10-30 7:38 ` K Prateek Nayak
2025-10-30 0:18 ` [PATCH v23 5/9] sched: Add logic to zap balance callbacks if we pick again John Stultz
2025-10-30 8:08 ` K Prateek Nayak
2025-10-31 3:15 ` John Stultz
2025-10-31 3:50 ` K Prateek Nayak
2025-10-30 0:18 ` [PATCH v23 6/9] sched: Handle blocked-waiter migration (and return migration) John Stultz
2025-10-30 9:32 ` K Prateek Nayak
2025-11-07 23:18 ` John Stultz
2025-11-10 4:47 ` K Prateek Nayak
2025-11-20 1:53 ` John Stultz
2025-11-20 2:00 ` John Stultz
2025-11-20 2:55 ` K Prateek Nayak
2025-11-20 6:33 ` John Stultz
2025-11-20 7:16 ` K Prateek Nayak
2025-11-20 7:27 ` John Stultz
2025-11-07 15:19 ` Juri Lelli
2025-11-07 17:24 ` John Stultz
2025-10-30 0:18 ` [PATCH v23 7/9] sched: Have try_to_wake_up() handle return-migration for PROXY_WAKING case John Stultz
2025-10-31 4:27 ` K Prateek Nayak
2025-11-20 1:05 ` John Stultz
2025-11-20 3:15 ` K Prateek Nayak
2025-11-20 7:34 ` John Stultz
2025-10-30 0:18 ` [PATCH v23 8/9] sched: Add blocked_donor link to task for smarter mutex handoffs John Stultz
2025-10-31 5:01 ` K Prateek Nayak
2025-11-11 7:50 ` John Stultz
2025-11-11 8:35 ` K Prateek Nayak
2025-10-30 0:18 ` [PATCH v23 9/9] sched: Migrate whole chain in proxy_migrate_task() John Stultz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251030001857.681432-2-jstultz@google.com \
--to=jstultz@google.com \
--cc=Metin.Kaya@arm.com \
--cc=boqun.feng@gmail.com \
--cc=bsegall@google.com \
--cc=daniel.lezcano@linaro.org \
--cc=dietmar.eggemann@arm.com \
--cc=hupu.gm@gmail.com \
--cc=joelagnelf@nvidia.com \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=kprateek.nayak@amd.com \
--cc=kuyo.chang@mediatek.com \
--cc=linux-kernel@vger.kernel.org \
--cc=longman@redhat.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=paulmck@kernel.org \
--cc=peterz@infradead.org \
--cc=qyousef@layalina.io \
--cc=rostedt@goodmis.org \
--cc=suleiman@google.com \
--cc=tglx@linutronix.de \
--cc=vincent.guittot@linaro.org \
--cc=vschneid@redhat.com \
--cc=will@kernel.org \
--cc=xuewen.yan94@gmail.com \
--cc=zezeozue@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox