From: John Stultz <jstultz@google.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <jstultz@google.com>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Ben Segall <bsegall@google.com>,
Zimuzo Ezeozue <zezeozue@google.com>,
Mel Gorman <mgorman@suse.de>, Will Deacon <will@kernel.org>,
Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: [RFC][PATCH v20 1/6] locking: Add task::blocked_lock to serialize blocked_on state
Date: Tue, 22 Jul 2025 07:05:47 +0000
Message-ID: <20250722070600.3267819-2-jstultz@google.com>
In-Reply-To: <20250722070600.3267819-1-jstultz@google.com>

So far, we have been able to utilize the mutex::wait_lock
for serializing the blocked_on state, but when we move to
proxying across runqueues, we will need to add more state
and a way to serialize changes to this state in contexts
where we don't hold the mutex::wait_lock.

So introduce the task::blocked_lock, which nests under the
mutex::wait_lock in the locking order, and rework the locking
to use it.

Signed-off-by: John Stultz <jstultz@google.com>
---
v15:
* Split back out into later in the series
v16:
* Fixups to mark tasks unblocked before sleeping in
mutex_optimistic_spin()
* Rework to use guard() as suggested by Peter
v19:
* Rework logic for PREEMPT_RT issues reported by
K Prateek Nayak
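
For readers unfamiliar with guard(): the kernel's scope-based helpers are
built on the compiler's cleanup attribute, so the lock is dropped on every
exit path from the enclosing scope. The following is only a userspace
sketch of that idea using pthreads and a GCC/Clang extension. DEMO_GUARD,
demo_unlock, demo_lock, demo_counter and demo_increment are hypothetical
names made up for illustration and are not part of this patch or of the
kernel API.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical helper: runs when the guard variable goes out of scope. */
static void demo_unlock(pthread_mutex_t **lockp)
{
	pthread_mutex_unlock(*lockp);
}

/*
 * Hypothetical macro, loosely modeled on the idea behind guard().
 * Takes the lock now and registers demo_unlock() as the scope-exit
 * handler. Relies on __attribute__((cleanup)), a GCC/Clang extension.
 */
#define DEMO_GUARD(lock) \
	pthread_mutex_t *demo_guard_ __attribute__((cleanup(demo_unlock))) = \
		(pthread_mutex_lock(lock), (lock))

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_counter;

/* Increment the counter under demo_lock, refusing to go past @limit. */
static int demo_increment(int limit)
{
	DEMO_GUARD(&demo_lock);

	if (demo_counter >= limit)
		return -1;		/* early return: lock is still released */
	return ++demo_counter;		/* normal return: lock is released too */
}
```

Calling demo_increment() repeatedly from main() never leaves demo_lock
held, whichever return path is taken, which is the property that makes
the guard() form less error prone than paired lock/unlock calls.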
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Zimuzo Ezeozue <zezeozue@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
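
To illustrate the lock ordering described in the commit message, here is
a minimal userspace sketch, assuming pthread mutexes as stand-ins for the
raw spinlocks. The demo_mutex and demo_task types and the demo_*()
functions are hypothetical and exist only for this illustration. The
sketch shows the outer wait_lock being taken before the inner per-task
blocked_lock, and a clear path that needs only the per-task lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the kernel's struct mutex or task_struct. */
struct demo_mutex {
	pthread_mutex_t wait_lock;
};

struct demo_task {
	pthread_mutex_t blocked_lock;
	struct demo_mutex *blocked_on;
};

/* Record that @t waits on @m: outer wait_lock first, then blocked_lock. */
static void demo_set_blocked_on(struct demo_task *t, struct demo_mutex *m)
{
	pthread_mutex_lock(&m->wait_lock);
	pthread_mutex_lock(&t->blocked_lock);
	/* Never overwrite an existing value with a different mutex. */
	assert(t->blocked_on == NULL || t->blocked_on == m);
	t->blocked_on = m;
	/* Release in the reverse order of acquisition. */
	pthread_mutex_unlock(&t->blocked_lock);
	pthread_mutex_unlock(&m->wait_lock);
}

/* Clear from a context that does not hold m->wait_lock. */
static void demo_clear_blocked_on(struct demo_task *t)
{
	pthread_mutex_lock(&t->blocked_lock);
	t->blocked_on = NULL;
	pthread_mutex_unlock(&t->blocked_lock);
}

/* Read the blocked_on pointer under the per-task lock. */
static struct demo_mutex *demo_get_blocked_on(struct demo_task *t)
{
	struct demo_mutex *m;

	pthread_mutex_lock(&t->blocked_lock);
	m = t->blocked_on;
	pthread_mutex_unlock(&t->blocked_lock);
	return m;
}
```

As long as every path that needs both locks takes wait_lock first, the
two cannot deadlock against each other, and code that only touches the
per-task state does not need the mutex's wait_lock at all.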
---
include/linux/sched.h | 25 ++++++++++++++++++-------
init/init_task.c | 1 +
kernel/fork.c | 1 +
kernel/locking/mutex.c | 34 ++++++++++++++++++++++------------
kernel/locking/ww_mutex.h | 6 ++++--
kernel/sched/core.c | 4 +++-
6 files changed, 49 insertions(+), 22 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5b4e1cd52e27a..a6654948d264f 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1232,6 +1232,7 @@ struct task_struct {
#endif
struct mutex *blocked_on; /* lock we're blocked on */
+ raw_spinlock_t blocked_lock;
#ifdef CONFIG_DETECT_HUNG_TASK_BLOCKER
/*
@@ -2145,8 +2146,8 @@ static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
WARN_ON_ONCE(!m);
/* The task should only be setting itself as blocked */
WARN_ON_ONCE(p != current);
- /* Currently we serialize blocked_on under the mutex::wait_lock */
- lockdep_assert_held_once(&m->wait_lock);
+ /* Currently we serialize blocked_on under the task::blocked_lock */
+ lockdep_assert_held_once(&p->blocked_lock);
/*
* Check ensure we don't overwrite existing mutex value
* with a different mutex. Note, setting it to the same
@@ -2158,15 +2159,14 @@ static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
static inline void set_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- guard(raw_spinlock_irqsave)(&m->wait_lock);
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
__set_task_blocked_on(p, m);
}
static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- WARN_ON_ONCE(!m);
- /* Currently we serialize blocked_on under the mutex::wait_lock */
- lockdep_assert_held_once(&m->wait_lock);
+ /* Currently we serialize blocked_on under the task::blocked_lock */
+ lockdep_assert_held_once(&p->blocked_lock);
/*
* There may be cases where we re-clear already cleared
* blocked_on relationships, but make sure we are not
@@ -2178,8 +2178,15 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *
static inline void clear_task_blocked_on(struct task_struct *p, struct mutex *m)
{
- guard(raw_spinlock_irqsave)(&m->wait_lock);
+ guard(raw_spinlock_irqsave)(&p->blocked_lock);
+ __clear_task_blocked_on(p, m);
+}
+
+static inline void clear_task_blocked_on_nested(struct task_struct *p, struct mutex *m)
+{
+ raw_spin_lock_nested(&p->blocked_lock, SINGLE_DEPTH_NESTING);
__clear_task_blocked_on(p, m);
+ raw_spin_unlock(&p->blocked_lock);
}
#else
static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
@@ -2189,6 +2196,10 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct rt_mute
static inline void clear_task_blocked_on(struct task_struct *p, struct rt_mutex *m)
{
}
+
+static inline void clear_task_blocked_on_nested(struct task_struct *p, struct rt_mutex *m)
+{
+}
#endif /* !CONFIG_PREEMPT_RT */
static __always_inline bool need_resched(void)
diff --git a/init/init_task.c b/init/init_task.c
index e557f622bd906..7e29d86153d9f 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -140,6 +140,7 @@ struct task_struct init_task __aligned(L1_CACHE_BYTES) = {
.journal_info = NULL,
INIT_CPU_TIMERS(init_task)
.pi_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.pi_lock),
+ .blocked_lock = __RAW_SPIN_LOCK_UNLOCKED(init_task.blocked_lock),
.timer_slack_ns = 50000, /* 50 usec default slack */
.thread_pid = &init_struct_pid,
.thread_node = LIST_HEAD_INIT(init_signals.thread_head),
diff --git a/kernel/fork.c b/kernel/fork.c
index 5f87f05aff4a0..6a294e6ee105d 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2025,6 +2025,7 @@ __latent_entropy struct task_struct *copy_process(
ftrace_graph_init_task(p);
rt_mutex_init_task(p);
+ raw_spin_lock_init(&p->blocked_lock);
lockdep_assert_irqs_enabled();
#ifdef CONFIG_PROVE_LOCKING
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 80d778fedd605..2ab6d291696e8 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -614,6 +614,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
}
raw_spin_lock_irqsave(&lock->wait_lock, flags);
+ raw_spin_lock(&current->blocked_lock);
/*
* After waiting to acquire the wait_lock, try again.
*/
@@ -657,7 +658,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
* the handoff.
*/
if (__mutex_trylock(lock))
- goto acquired;
+ break;
/*
* Check for signals and kill conditions while holding
@@ -675,18 +676,21 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
goto err;
}
+ raw_spin_unlock(&current->blocked_lock);
raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
schedule_preempt_disabled();
first = __mutex_waiter_is_first(lock, &waiter);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
+ raw_spin_lock(&current->blocked_lock);
/*
* As we likely have been woken up by task
* that has cleared our blocked_on state, re-set
* it to the lock we are trying to acquire.
*/
- set_task_blocked_on(current, lock);
+ __set_task_blocked_on(current, lock);
set_current_state(state);
/*
* Here we order against unlock; we must either see it change
@@ -697,23 +701,27 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
break;
if (first) {
- trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+ bool opt_acquired;
+
/*
* mutex_optimistic_spin() can call schedule(), so
- * clear blocked on so we don't become unselectable
+ * we need to release these locks before calling it,
+ * and clear blocked on so we don't become unselectable
* to run.
*/
- clear_task_blocked_on(current, lock);
- if (mutex_optimistic_spin(lock, ww_ctx, &waiter))
+ __clear_task_blocked_on(current, lock);
+ raw_spin_unlock(&current->blocked_lock);
+ raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
+ trace_contention_begin(lock, LCB_F_MUTEX | LCB_F_SPIN);
+ opt_acquired = mutex_optimistic_spin(lock, ww_ctx, &waiter);
+ raw_spin_lock_irqsave(&lock->wait_lock, flags);
+ raw_spin_lock(&current->blocked_lock);
+ __set_task_blocked_on(current, lock);
+ if (opt_acquired)
break;
- set_task_blocked_on(current, lock);
trace_contention_begin(lock, LCB_F_MUTEX);
}
-
- raw_spin_lock_irqsave(&lock->wait_lock, flags);
}
- raw_spin_lock_irqsave(&lock->wait_lock, flags);
-acquired:
__clear_task_blocked_on(current, lock);
__set_current_state(TASK_RUNNING);
@@ -739,6 +747,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
if (ww_ctx)
ww_mutex_lock_acquired(ww, ww_ctx);
+ raw_spin_unlock(&current->blocked_lock);
raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
preempt_enable();
return 0;
@@ -750,6 +759,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
err_early_kill:
WARN_ON(__get_task_blocked_on(current));
trace_contention_end(lock, ret);
+ raw_spin_unlock(&current->blocked_lock);
raw_spin_unlock_irqrestore_wake(&lock->wait_lock, flags, &wake_q);
debug_mutex_free_waiter(&waiter);
mutex_release(&lock->dep_map, ip);
@@ -959,7 +969,7 @@ static noinline void __sched __mutex_unlock_slowpath(struct mutex *lock, unsigne
next = waiter->task;
debug_mutex_wake_waiter(lock, waiter);
- __clear_task_blocked_on(next, lock);
+ clear_task_blocked_on(next, lock);
wake_q_add(&wake_q, next);
}
diff --git a/kernel/locking/ww_mutex.h b/kernel/locking/ww_mutex.h
index 086fd5487ca77..bf13039fb2a04 100644
--- a/kernel/locking/ww_mutex.h
+++ b/kernel/locking/ww_mutex.h
@@ -289,7 +289,8 @@ __ww_mutex_die(struct MUTEX *lock, struct MUTEX_WAITER *waiter,
* blocked_on pointer. Otherwise we can see circular
* blocked_on relationships that can't resolve.
*/
- __clear_task_blocked_on(waiter->task, lock);
+ /* nested as we should hold current->blocked_lock already */
+ clear_task_blocked_on_nested(waiter->task, lock);
wake_q_add(wake_q, waiter->task);
}
@@ -343,7 +344,8 @@ static bool __ww_mutex_wound(struct MUTEX *lock,
* blocked_on pointer. Otherwise we can see circular
* blocked_on relationships that can't resolve.
*/
- __clear_task_blocked_on(owner, lock);
+ /* nested as we should hold current->blocked_lock already */
+ clear_task_blocked_on_nested(owner, lock);
wake_q_add(wake_q, owner);
}
return true;
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index f7f576ad9b223..52c0f16aab101 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6633,6 +6633,7 @@ static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *d
* p->pi_lock
* rq->lock
* mutex->wait_lock
+ * p->blocked_lock
*
* Returns the task that is going to be used as execution context (the one
* that is actually going to be run on cpu_of(rq)).
@@ -6656,8 +6657,9 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
* and ensure @owner sticks around.
*/
guard(raw_spinlock)(&mutex->wait_lock);
+ guard(raw_spinlock)(&p->blocked_lock);
- /* Check again that p is blocked with wait_lock held */
+ /* Check again that p is blocked with blocked_lock held */
if (mutex != __get_task_blocked_on(p)) {
/*
* Something changed in the blocked_on chain and
--
2.50.0.727.gbf7dc18ff4-goog