Message-ID: <148b7898-9eef-4203-bb6c-5ba7f523fd01@amd.com>
Date: Tue, 28 Apr 2026 13:36:11 +0530
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] sched: proxy-exec: Close race causing workqueue work being delayed
From: K Prateek Nayak
To: John Stultz, LKML
CC: Vineeth Pillai, Sonam Sanju, Sean Christopherson, Kunwu Chan,
    Tejun Heo, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
    Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
    Steven Rostedt, Will Deacon, Waiman Long, Boqun Feng,
    "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
    Daniel Lezcano, "Suleiman Souhlal", kuyo chang, hupu
References: <20260427183848.698551-1-jstultz@google.com> <20260427183848.698551-2-jstultz@google.com>
In-Reply-To: <20260427183848.698551-2-jstultz@google.com>

Hello John,

On 4/28/2026 12:08 AM, John Stultz wrote:
> Vineeth reported seeing a KVM related deadlock connected to work
> queue lockups using the android17-6.18 tree, which has
> Proxy Execution enabled (using the full patch stack), but I've
> subsequently reproduced it on v7.1-rc1.
>
> On further debugging he found:
> - kvm-irqfd-cleanup workqueue and rcu_gp lands in a per-cpu
>   pwq(work queue pool)
> - one of kvm-irqfd-cleanup worker(say A) takes a mutex and then
>   calls synchronize_srcu_expedited()
> - one other kvm-irqfd-cleanup worker worker(Say B) tries to
>   acquire the lock and then gets blocked
> - On the way to blocking, this cpu gets an IPI and on return
>   from IPI, it calls __schedule() and did not get to complete
>   workqueue accounting(worker->sleeping = 0 and decrementing
>   pool->nr_running). This is done in sched_submit_work() ->
>   wq_worker_sleeping() called from schedule() and we got
>   preempted before that.
> - proxy execution doesn't immediately take it off run queue as
>   p->blocked_on is set during __mutex_lock
> - Next time when B is picked for running, it notices A(mutex
>   holder) is not on a runqueue and then blocks B.
>   find_proxy_task() -> proxy_deactivate() -> block_task()
> - And things are then stuck. A is waiting for the workqueue to
>   be run, but B can't run the workqueue as it is blocked on A.
>
> The trouble is that with Proxy Execution, in
> __mutex_lock_common() we set the task state to
> TASK_UNINTERRUPTIBLE, and set blocked_on before calling into
> schedule(), where sched_submit_work() will be called.

Geez! That is an interesting race.

>
> But if an IPI comes in before we call schedule() the interrupt
> will call __schedule(SM_PREEMPT) directly. This causes the
> scheduler to see the current task as blocked_on, and deactivate
> it (because the owner is off the runqueue).
>
> Since its deactivated, it wont' be run, and it won't get to
> call sched_submit_work().
>
> Without proxy-execution, the SM_PREEMPT case will prevent the
> task from being dequeued, and it can be reselected again and
> run, which will allow it to finish calling into schedule()
> and calling sched_submit_work() before actually blocking.
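
For anyone following along, the sequence described above can be sketched
as a small userspace toy model. This is only an illustration and not
kernel code: every name in it (toy_task, toy_preempt(), toy_pick(), ...)
is made up, the "scheduler" is reduced to a few flags, and it assumes the
lock owner (worker A) is already off the runqueue when B gets picked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model of the race; NOT kernel code. Every name is made up. */
enum toy_state { TOY_RUNNING, TOY_UNINTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	const void *blocked_on;	/* lock being waited on, NULL if none */
	bool on_rq;		/* still queued on the runqueue? */
	bool wq_accounted;	/* did the workqueue "going to sleep" hook run? */
};

/* Mutex slow path: mark the task blocked *before* it calls schedule(). */
static void toy_mutex_prepare_to_block(struct toy_task *t, const void *lock)
{
	t->state = TOY_UNINTERRUPTIBLE;
	t->blocked_on = lock;
}

/* Preemption on interrupt return; the fix un-blocks a half-blocked task. */
static void toy_preempt(struct toy_task *t, bool with_fix)
{
	if (with_fix && t->blocked_on) {
		t->state = TOY_RUNNING;
		t->blocked_on = NULL;
	}
}

/* Pick time: a blocked task whose lock owner is off the runqueue is deactivated. */
static bool toy_pick(struct toy_task *t, bool owner_on_rq)
{
	if (t->blocked_on && !owner_on_rq) {
		t->on_rq = false;
		return false;	/* the task does not get to run */
	}
	return true;
}

/* Voluntary schedule(): the submit-work hook runs first, then the task blocks. */
static void toy_schedule(struct toy_task *t)
{
	assert(t->on_rq);	/* only a queued task can reach this point */
	t->wq_accounted = true;
	t->on_rq = false;
}

/* Returns true if workqueue accounting ran before B left the runqueue. */
static bool toy_run_scenario(bool with_fix)
{
	int lock;	/* stands in for the mutex held by worker A */
	struct toy_task b = { TOY_RUNNING, NULL, true, false };

	toy_mutex_prepare_to_block(&b, &lock);
	toy_preempt(&b, with_fix);	/* IPI lands before schedule() */
	if (toy_pick(&b, false)) {	/* A is off the runqueue */
		/* B runs again, loops back in the slow path and blocks properly */
		toy_mutex_prepare_to_block(&b, &lock);
		toy_schedule(&b);
	}
	return b.wq_accounted;
}
```

In this model toy_run_scenario(false) returns false: B is taken off the
runqueue without the workqueue hook ever running, which is the stuck
case. toy_run_scenario(true) returns true, because the preemption path
clears blocked_on and B gets to finish its own call into the schedule
step.
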
>
> So we need to make sure on the SM_PREEMPT case, if current is
> marked as blocked_on, we should clear the blocked_on state and
> mark the task RUNNABLE so the task can be selected to complete
> its call to schedule() -> sched_submit_work().
>
> Now because we cleared BLOCKED_ON and set the task RUNNABLE,
> the task will be able to be selected and run again and loop back
> in __mutex_lock_common() where it can re-set the blocked_on
> state and call back into schedule() in order to properly be
> chosen as a donor.
>
> Many thanks to Vineeth for figuring this very obscure race out
> and for implementing a test tool to make it easily reproducible!
>
> Reported-by: Vineeth Pillai
> Tested-by: Vineeth Pillai
> Signed-off-by: John Stultz

I guess it is missing a:

  Fixes: be41bde4c3a8 ("sched: Add an initial sketch of the find_proxy_task() function")

since that is where we began blocking a task on task_is_blocked().

I really wish there was a better way to have detected this but I
cannot think of any better way at the moment so feel free to include:

Reviewed-by: K Prateek Nayak

> ---
> Cc: Vineeth Pillai
> Cc: Sonam Sanju
> Cc: Sean Christopherson
> Cc: Kunwu Chan
> Cc: Tejun Heo
> Cc: Joel Fernandes
> Cc: Qais Yousef
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Juri Lelli
> Cc: Vincent Guittot
> Cc: Dietmar Eggemann
> Cc: Valentin Schneider
> Cc: Steven Rostedt
> Cc: Will Deacon
> Cc: Waiman Long
> Cc: Boqun Feng
> Cc: "Paul E. McKenney"
> Cc: Metin Kaya
> Cc: Xuewen Yan
> Cc: K Prateek Nayak
> Cc: Thomas Gleixner
> Cc: Daniel Lezcano
> Cc: Suleiman Souhlal
> Cc: kuyo chang
> Cc: hupu
> Cc: kernel-team@android.com
> ---
>  kernel/sched/core.c | 11 +++++++++++
>  1 file changed, 11 insertions(+)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da20fb6ea25ae..5f684caefd8b2 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -7097,6 +7097,17 @@ static void __sched notrace __schedule(int sched_mode)
>  		try_to_block_task(rq, prev, &prev_state,
>  				  !task_is_blocked(prev));
>  		switch_count = &prev->nvcsw;
> +	} else if (preempt && prev->blocked_on) {
> +		/*
> +		 * If we are SM_PREEMPT, we may have interrupted
> +		 * after blocked_on was set, before schedule()
> +		 * was run, preventing workques from running. So
> +		 * clear blocked_on and mark task RUNNING so it
> +		 * can be reselected to run and complete its
> +		 * logic
> +		 */
> +		WRITE_ONCE(prev->__state, TASK_RUNNING);

nit. You probably need to update "prev_state" too for
trace_sched_switch() to capture the right state down below.

Since this is on the way to schedule(), I wonder if it is possible to
just do a "next = prev" and goto picked ... but that adds more latency
on PREEMPT_RT so that is a no go I presume.

> +		clear_task_blocked_on(prev, NULL);
>  	}
>
>  pick_again:

--
Thanks and Regards,
Prateek