From: K Prateek Nayak
To: John Stultz, LKML
CC: Vineeth Pillai, Sonam Sanju, Sean Christopherson, Kunwu Chan, Tejun Heo, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt, Will Deacon, Waiman Long, Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner, Daniel Lezcano, "Suleiman Souhlal", kuyo chang, hupu
Subject: Re: [PATCH v2 2/2] locking: mutex: Fix proxy-exec potentially deactivating tasks marked TASK_RUNNING
Date: Fri, 1 May 2026 12:27:57 +0530
Message-ID: <2d8b79f8-b7a5-4b84-b844-6de0609fd56d@amd.com>
In-Reply-To: <20260430215103.2978955-3-jstultz@google.com>
References: <20260430215103.2978955-1-jstultz@google.com> <20260430215103.2978955-3-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello John,

On 5/1/2026 3:20 AM, John Stultz wrote:
> Vineeth came up with a test driver that could trip up
> workqueue stalls. After fixing one issue this test found,
> Vineeth reported the test was still failing.
>
> Greatly simplified: a task that tries to take a mutex already
> owned by a sleeping task can hit an edge case in
> __mutex_lock_common().
>
> If the task fails to get the lock and calls into schedule(), but
> gets a spurious wakeup, it will find that it is the first waiter
> and go into the mutex_optimistic_spin() logic.
> Before calling mutex_optimistic_spin(), though, we clear the
> task's blocked_on state, since mutex_optimistic_spin() may call
> schedule() if need_resched() is set.
>
> After mutex_optimistic_spin() fails, we set blocked_on again,
> restart the main mutex loop, try to take the lock and call into
> schedule_preempt_disabled().
>
> From there, with proxy-execution, we'll see the task is
> blocked_on, follow the chain, see the owner is sleeping and
> dequeue the waiting task from the runqueue.
>
> This all sounds fine and reasonable. But what I had missed is
> that in mutex_optimistic_spin(), not only do we call schedule(),
> but we set TASK_RUNNING right before doing so.
>
> This is ok for that invocation of schedule(). But when we come
> back, we re-set the blocked_on we had just cleared, but we do not
> re-set the task state to TASK_INTERRUPTIBLE/UNINTERRUPTIBLE.
>
> This means we have a task that is blocked_on and TASK_RUNNING,
> so when the proxy execution code dequeues the task, we are
> in trouble, since future wakeups will be short-circuited by the
> ttwu_state_match() check.

I'm still having a hard time understanding how this happens. When the
task fails to grab the lock during mutex_optimistic_spin(), we set
blocked_on while the task is still TASK_RUNNING and go through another
iteration of the loop.

When the task hits schedule_preempt_disabled(), it is still
TASK_RUNNING, so __schedule() skips try_to_block_task(), leaving the
task in a preempted (unlatched) state. When selected again, the task
sets its state back to interruptible/uninterruptible/killable and then
goes into optimistic spinning again, since it should still be the first
waiter if it hasn't managed to grab the lock.

I don't see how this can cause a problem now with the latched state.
There is no need for a wakeup: a TASK_RUNNING task stays runnable, so
the pick will select it again at some point and blocked_on will be
re-evaluated.
signal_pending_state() checks the "state" parameter passed to
__mutex_lock_common(), not the current task state, so it'll still bail
out early for signal delivery.

Do we still need this change with the latched state machine?

>
> Thus, to avoid this, after mutex_optimistic_spin(), set the task
> state back when we set blocked_on.
>
> Many many thanks again to Vineeth for his very useful testing
> driver that uncovered this long-hidden bug, which I hadn't
> tripped in all my testing! Very impressed with the problems he's
> uncovered!
>
> Reported-by: Vineeth Pillai
> Tested-by: Vineeth Pillai
> Signed-off-by: John Stultz
> ---
>  kernel/locking/mutex.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 09534628dc01a..a93d4c6bee1a3 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -763,6 +763,7 @@ __mutex_lock_common(struct mutex *lock, unsigned int state, unsigned int subclas
>  			raw_spin_lock_irqsave(&lock->wait_lock, flags);
>  			raw_spin_lock(&current->blocked_lock);
>  			__set_task_blocked_on(current, lock);
> +			set_current_state(state);
>
>  			if (opt_acquired)
>  				break;

-- 
Thanks and Regards,
Prateek