Date: Fri, 1 May 2026 12:09:09 +0530
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 1/2] sched: proxy-exec: Close race causing workqueue work being delayed
To: John Stultz, LKML
CC: Vineeth Pillai, Sonam Sanju, Sean Christopherson, Kunwu Chan,
 Tejun Heo, Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Valentin Schneider,
 Steven Rostedt, Will Deacon, Waiman Long, Boqun Feng,
 "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
 Daniel Lezcano, "Suleiman Souhlal", kuyo chang, hupu
References: <20260430215103.2978955-1-jstultz@google.com> <20260430215103.2978955-2-jstultz@google.com>
From: K Prateek Nayak
In-Reply-To: <20260430215103.2978955-2-jstultz@google.com>

Hello John,

Mostly cosmetic nitpicks. The overall idea looks good.

On 5/1/2026 3:20 AM, John Stultz wrote:
> @@ -2183,18 +2183,56 @@ extern int __cond_resched_rwlock_write(rwlock_t *lock) __must_hold(lock);
>  #ifndef CONFIG_PREEMPT_RT
>
>  /*
> - * With proxy exec, if a task has been proxy-migrated, it may be a donor
> - * on a cpu that it can't actually run on. Thus we need a special state
> - * to denote that the task is being woken, but that it needs to be
> - * evaluated for return-migration before it is run. So if the task is
> - * blocked_on PROXY_WAKING, return migrate it before running it.
> + * The proxy exec blocked_on pointer value uses the low bit as a latch
> + * value which clarifies if the blocked_on value is used for proxying or
> + * not.
> + *
> + * The state machine looks something like
> + * NULL -> ptr:unlatched -> ptr:latched -> PROXY_WAKING -> NULL
> + *
> + * With some additional transitions:
> + * ptr:unlatched -> NULL (done on current, or via set_task_blocked_on_waking())
> + * ptr:latched -> NULL (done only on current)
> + *
> + * 1) NULL and ptr:unlatched are effectively equivalent, no proxying will occur
> + * 2) ptr:latched is the state when proxying will occur
> + * 3) PROXY_WAKING is used when the task is being woken to ensure we
> + * return-migrate proxy-migrated tasks before running them (note it has
> + * the latch bit set).
>   */
> -#define PROXY_WAKING ((struct mutex *)(-1L))
> +#define PROXY_BLOCKED_LATCH (1UL)
> +#define PROXY_BLOCKED_ON_MASK(x) ((struct mutex *)((unsigned long)(x) & ~PROXY_BLOCKED_LATCH))

nit. I think PROXY_BLOCKED_ON_MUTEX() would be a better name since this
is returning the true mutex pointer back. No strong feelings, I'll defer
to others for more comments.

> +#define PROXY_WAKING ((struct mutex *)(-1L)) /* PROXY_WAKING has LATCH bit set */
> +
> +static inline struct mutex *task_is_blocked_on(struct task_struct *p)

I think this can take the role of task_is_blocked() no? Only one caller
for try_to_block_task() will require looking at the raw blocked_on state
but other than that, it is safe for the scheduler to move around the
preempted task until it has grabbed the BO latch.

> +{
> +	if (!sched_proxy_exec())
> +		return false;
> +	return (struct mutex *)((unsigned long)p->blocked_on & PROXY_BLOCKED_LATCH);
> +}
> +
> +static inline void __set_task_blocked_on_latched(struct task_struct *p)
> +{

Are you planning to reuse this sometime later in the series?
If not, I think we can convert this to "try_set_task_blocked_on_latch()"
and return false if it finds blocked_on having been cleared already.
That way the lock + check in try_to_block_task() can be moved here.

> +	lockdep_assert_held_once(&p->blocked_lock);
> +	WARN_ON_ONCE(!p->blocked_on);
> +	p->blocked_on = (struct mutex *)((unsigned long)p->blocked_on | PROXY_BLOCKED_LATCH);
> +}
> +
> +static inline struct mutex *__get_task_latched_blocked_on(struct task_struct *p)

I think this can be __get_task_blocked_on() ...

> +{
> +	if (!task_is_blocked_on(p))
> +		return NULL;
> +	if (p->blocked_on == PROXY_WAKING)
> +		return PROXY_WAKING;
> +	return PROXY_BLOCKED_ON_MASK(p->blocked_on);
> +}
>
>  static inline struct mutex *__get_task_blocked_on(struct task_struct *p)

... and this can be __get_task_blocked_on_raw() since only one caller in
kernel/locking/mutex.h really cares about the ~PROXY_BLOCKED_LATCH value
outside of this file.

Everything in the sched bits can then simply be __get_task_blocked_on()
and that seems much cleaner. Thoughts?

>  {
>  	lockdep_assert_held_once(&p->blocked_lock);
> -	return p->blocked_on == PROXY_WAKING ? NULL : p->blocked_on;
> +	if (p->blocked_on == PROXY_WAKING)
> +		return NULL;
> +	return PROXY_BLOCKED_ON_MASK(p->blocked_on);
>  }
>
>  static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
> @@ -2215,6 +2253,8 @@ static inline void __set_task_blocked_on(struct task_struct *p, struct mutex *m)
>
>  static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *m)
>  {
> +	struct mutex *bo = p->blocked_on;
> +
>  	/* Currently we serialize blocked_on under the task::blocked_lock */
>  	lockdep_assert_held_once(&p->blocked_lock);
>  	/*
> @@ -2222,7 +2262,7 @@ static inline void __clear_task_blocked_on(struct task_struct *p, struct mutex *
>  	 * blocked_on relationships, but make sure we are not
>  	 * clearing the relationship with a different lock.
>  	 */
> -	WARN_ON_ONCE(m && p->blocked_on && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
> +	WARN_ON_ONCE(m && bo && __get_task_blocked_on(p) != m && bo != PROXY_WAKING);
>  	p->blocked_on = NULL;
>  }
>
> @@ -2242,15 +2282,17 @@ static inline void __set_task_blocked_on_waking(struct task_struct *p, struct mu
>  		return;
>  	}
>
> -	/* Don't set PROXY_WAKING if blocked_on was already cleared */
> -	if (!p->blocked_on)
> +	/* Don't set PROXY_WAKING if we are not really blocked_on */
> +	if (!task_is_blocked_on(p)) {
> +		p->blocked_on = NULL; /* clear if unlatched */
>  		return;
> +	}
>  	/*
>  	 * There may be cases where we set PROXY_WAKING on tasks that were
>  	 * already set to waking, but make sure we are not changing
>  	 * the relationship with a different lock.
>  	 */
> -	WARN_ON_ONCE(m && p->blocked_on != m && p->blocked_on != PROXY_WAKING);
> +	WARN_ON_ONCE(m && __get_task_blocked_on(p) != m && p->blocked_on != PROXY_WAKING);
>  	p->blocked_on = PROXY_WAKING;
>  }
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index da20fb6ea25ae..2f912bf698446 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -6599,8 +6599,13 @@ static bool try_to_block_task(struct rq *rq, struct task_struct *p,
>  	 * blocked on a mutex, and we want to keep it on the runqueue
>  	 * to be selectable for proxy-execution.
>  	 */
> -	if (!should_block)
> -		return false;
> +	if (!should_block) {
> +		guard(raw_spinlock)(&p->blocked_lock);
> +		if (p->blocked_on) {
> +			__set_task_blocked_on_latched(p);
> +			return false;
> +		}
> +	}

In my head, writing this as:

	if (!should_block && try_to_latch_task_blocked_on(p))
		return false;

seems much cleaner. I'll defer to others for comments.

>
>  	p->sched_contributes_to_load =
>  		(task_state & TASK_UNINTERRUPTIBLE) &&
> @@ -6833,7 +6838,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>  	int owner_cpu;
>
>  	/* Follow blocked_on chain. */
> -	for (p = donor; (mutex = p->blocked_on); p = owner) {
> +	for (p = donor; (mutex = __get_task_latched_blocked_on(p)); p = owner) {
>  		/* if its PROXY_WAKING, do return migration or run if current */
>  		if (mutex == PROXY_WAKING) {
>  			if (task_current(rq, p)) {
> @@ -6851,7 +6856,7 @@ find_proxy_task(struct rq *rq, struct task_struct *donor, struct rq_flags *rf)
>  		guard(raw_spinlock)(&p->blocked_lock);
>
>  		/* Check again that p is blocked with blocked_lock held */
> -		if (mutex != __get_task_blocked_on(p)) {
> +		if (mutex != __get_task_latched_blocked_on(p)) {
>  			/*
>  			 * Something changed in the blocked_on chain and
>  			 * we don't know if only at this level. So, let's
> @@ -7107,7 +7112,7 @@ static void __sched notrace __schedule(int sched_mode)
>  		struct task_struct *prev_donor = rq->donor;
>
>  		rq_set_donor(rq, next);
> -		if (unlikely(next->blocked_on)) {
> +		if (unlikely(task_is_blocked_on(next))) {
>  			next = find_proxy_task(rq, next, &rf);
>  			if (!next) {
>  				zap_balance_callbacks(rq);

--
Thanks and Regards,
Prateek
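
For illustration only, and not part of the patch: a minimal userspace
sketch of the low-bit latch encoding discussed in this thread. The
struct task, struct mutex stand-ins and the helper names (is_latched,
raw_mutex, latched_mutex, try_latch, set_waking) are hypothetical; they
only loosely mirror the quoted helpers, and locking (blocked_lock) and
the sched_proxy_exec() check are deliberately not modelled. try_latch()
follows the "return false if blocked_on was already cleared" idea
suggested above, as an assumption about how such a helper might look.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the kernel's struct mutex; its alignment keeps bit 0 of a pointer clear. */
struct mutex { int dummy; };

/* Hypothetical stand-in for the blocked_on field of task_struct. */
struct task { struct mutex *blocked_on; };

#define PROXY_BLOCKED_LATCH ((uintptr_t)1)
/* All bits set, so the latch bit is set as well. */
#define PROXY_WAKING ((struct mutex *)(uintptr_t)-1)

/* True once the low (latch) bit is set, i.e. proxying may use this value. */
static bool is_latched(const struct task *p)
{
	return ((uintptr_t)p->blocked_on & PROXY_BLOCKED_LATCH) != 0;
}

/* Mutex pointer with the latch bit masked off; PROXY_WAKING maps to NULL. */
static struct mutex *raw_mutex(const struct task *p)
{
	if (p->blocked_on == PROXY_WAKING)
		return NULL;
	return (struct mutex *)((uintptr_t)p->blocked_on & ~PROXY_BLOCKED_LATCH);
}

/* What a chain walker would see: NULL unless latched, PROXY_WAKING passed through. */
static struct mutex *latched_mutex(const struct task *p)
{
	if (!is_latched(p))
		return NULL;
	if (p->blocked_on == PROXY_WAKING)
		return PROXY_WAKING;
	return raw_mutex(p);
}

/* ptr:unlatched -> ptr:latched; fails if blocked_on was already cleared. */
static bool try_latch(struct task *p)
{
	if (!p->blocked_on)
		return false;
	p->blocked_on = (struct mutex *)((uintptr_t)p->blocked_on | PROXY_BLOCKED_LATCH);
	assert(is_latched(p));
	return true;
}

/* ptr:latched -> PROXY_WAKING, or ptr:unlatched -> NULL. */
static void set_waking(struct task *p)
{
	if (!is_latched(p)) {
		p->blocked_on = NULL;
		return;
	}
	p->blocked_on = PROXY_WAKING;
}
```

With these stand-ins, an unlatched pointer is invisible to
latched_mutex() but still visible to raw_mutex(), which is the
distinction the naming discussion above is about.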