From: John Stultz <jstultz@google.com>
To: LKML <linux-kernel@vger.kernel.org>
Cc: John Stultz <jstultz@google.com>,
Vineeth Pillai <vineethrp@google.com>,
Sonam Sanju <sonam.sanju@intel.com>,
Sean Christopherson <seanjc@google.com>,
Kunwu Chan <kunwu.chan@linux.dev>, Tejun Heo <tj@kernel.org>,
Joel Fernandes <joelagnelf@nvidia.com>,
Qais Yousef <qyousef@layalina.io>,
Ingo Molnar <mingo@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Dietmar Eggemann <dietmar.eggemann@arm.com>,
Valentin Schneider <vschneid@redhat.com>,
Steven Rostedt <rostedt@goodmis.org>,
Will Deacon <will@kernel.org>, Waiman Long <longman@redhat.com>,
Boqun Feng <boqun.feng@gmail.com>,
"Paul E. McKenney" <paulmck@kernel.org>,
Metin Kaya <Metin.Kaya@arm.com>,
Xuewen Yan <xuewen.yan94@gmail.com>,
K Prateek Nayak <kprateek.nayak@amd.com>,
Thomas Gleixner <tglx@linutronix.de>,
Daniel Lezcano <daniel.lezcano@linaro.org>,
Suleiman Souhlal <suleiman@google.com>,
kuyo chang <kuyo.chang@mediatek.com>, hupu <hupu.gm@gmail.com>,
kernel-team@android.com
Subject: [PATCH 1/2] sched: proxy-exec: Close race causing workqueue work being delayed
Date: Mon, 27 Apr 2026 18:38:40 +0000
Message-ID: <20260427183848.698551-2-jstultz@google.com>
In-Reply-To: <20260427183848.698551-1-jstultz@google.com>
Vineeth reported seeing a KVM-related deadlock connected to
workqueue lockups using the android17-6.18 tree, which has
Proxy Execution enabled (using the full patch stack), and I've
subsequently reproduced it on v7.1-rc1.

On further debugging he found:
 - The kvm-irqfd-cleanup workqueue and rcu_gp land in a per-cpu
   pwq (workqueue pool).
 - One kvm-irqfd-cleanup worker (say A) takes a mutex and then
   calls synchronize_srcu_expedited().
 - Another kvm-irqfd-cleanup worker (say B) tries to acquire
   the lock and gets blocked.
 - On the way to blocking, B's CPU gets an IPI, and on return
   from the IPI it calls __schedule() before B has completed
   the workqueue accounting (updating worker->sleeping and
   decrementing pool->nr_running). That accounting is done in
   sched_submit_work() -> wq_worker_sleeping(), called from
   schedule(), and B was preempted before reaching it.
 - Proxy execution doesn't immediately take B off the runqueue,
   as p->blocked_on is set during __mutex_lock().
 - The next time B is picked to run, the scheduler notices that
   A (the mutex holder) is not on a runqueue and blocks B:
   find_proxy_task() -> proxy_deactivate() -> block_task()
 - And things are then stuck. A is waiting for the workqueue to
   be run, but B can't run the workqueue as it is blocked on A.
The trouble is that with Proxy Execution, in
__mutex_lock_common() we set the task state to
TASK_UNINTERRUPTIBLE and set blocked_on before calling into
schedule(), where sched_submit_work() will be called.

But if an IPI comes in before we call schedule(), the interrupt
will call __schedule(SM_PREEMPT) directly. This causes the
scheduler to see the current task as blocked_on and deactivate
it (because the owner is off the runqueue).

Since it's deactivated, it won't be run, and it won't get to
call sched_submit_work().

Without proxy-execution, the SM_PREEMPT case will prevent the
task from being dequeued, so it can be reselected and run,
which allows it to finish calling into schedule() and
sched_submit_work() before actually blocking.

So in the SM_PREEMPT case, if current is marked as blocked_on,
we need to clear the blocked_on state and mark the task
TASK_RUNNING so it can be selected to complete its call to
schedule() -> sched_submit_work().

Now, because we cleared blocked_on and set the task
TASK_RUNNING, the task can be selected and run again, and will
loop back in __mutex_lock_common(), where it can re-set the
blocked_on state and call back into schedule() in order to
properly be chosen as a donor.
Many thanks to Vineeth for figuring this very obscure race out
and for implementing a test tool to make it easily reproducible!
Reported-by: Vineeth Pillai <vineethrp@google.com>
Tested-by: Vineeth Pillai <vineethrp@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
---
Cc: Vineeth Pillai <vineethrp@google.com>
Cc: Sonam Sanju <sonam.sanju@intel.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Kunwu Chan <kunwu.chan@linux.dev>
Cc: Tejun Heo <tj@kernel.org>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Cc: Qais Yousef <qyousef@layalina.io>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Valentin Schneider <vschneid@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Metin Kaya <Metin.Kaya@arm.com>
Cc: Xuewen Yan <xuewen.yan94@gmail.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Suleiman Souhlal <suleiman@google.com>
Cc: kuyo chang <kuyo.chang@mediatek.com>
Cc: hupu <hupu.gm@gmail.com>
Cc: kernel-team@android.com
---
kernel/sched/core.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index da20fb6ea25ae..5f684caefd8b2 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7097,6 +7097,17 @@ static void __sched notrace __schedule(int sched_mode)
try_to_block_task(rq, prev, &prev_state,
!task_is_blocked(prev));
switch_count = &prev->nvcsw;
+ } else if (preempt && prev->blocked_on) {
+ /*
+ * If we are SM_PREEMPT, we may have interrupted
+ * after blocked_on was set, before schedule()
+ * was run, preventing workqueues from running. So
+ * clear blocked_on and mark task TASK_RUNNING so it
+ * can be reselected to run and complete its
+ * logic.
+ */
+ WRITE_ONCE(prev->__state, TASK_RUNNING);
+ clear_task_blocked_on(prev, NULL);
}
pick_again:
--
2.54.0.545.g6539524ca2-goog