From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Sun, 15 Mar 2026 23:08:20 +0530
From: K Prateek Nayak
Subject: Re: [PATCH v25 9/9] sched: Handle blocked-waiter migration (and
 return migration)
To: John Stultz, LKML
CC: Joel Fernandes, Qais Yousef, Ingo Molnar, Peter Zijlstra, Juri Lelli,
 Vincent Guittot, Dietmar Eggemann, Valentin Schneider, Steven Rostedt,
 Ben Segall, Zimuzo Ezeozue, Mel Gorman, Will Deacon, Waiman Long,
 Boqun Feng, "Paul E. McKenney", Metin Kaya, Xuewen Yan, Thomas Gleixner,
 Daniel Lezcano, Suleiman Souhlal, kuyo chang, hupu
References: <20260313023022.2902479-1-jstultz@google.com>
 <20260313023022.2902479-10-jstultz@google.com>
In-Reply-To: <20260313023022.2902479-10-jstultz@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
User-Agent: Mozilla Thunderbird
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Hello John,

On 3/13/2026 8:00 AM, John Stultz wrote:
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index af497b8c72dce..fe20204cf51cc 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -3643,6 +3643,23 @@ void update_rq_avg_idle(struct rq *rq)
>          rq->idle_stamp = 0;
>  }
>  
> +#ifdef CONFIG_SCHED_PROXY_EXEC
> +static inline void proxy_set_task_cpu(struct task_struct *p, int cpu)
> +{
> +        unsigned int wake_cpu;
> +
> +        /*
> +         * Since we are enqueuing a blocked task on a cpu it may
> +         * not be able to run on, preserve wake_cpu when we
> +         * __set_task_cpu so we can return the task to where it
> +         * was previously runnable.
> +         */
> +        wake_cpu = p->wake_cpu;
> +        __set_task_cpu(p, cpu);
> +        p->wake_cpu = wake_cpu;
> +}
> +#endif /* CONFIG_SCHED_PROXY_EXEC */
> +
>  static void
>  ttwu_do_activate(struct rq *rq, struct task_struct *p, int wake_flags,
>                  struct rq_flags *rf)
> @@ -4242,13 +4259,6 @@ int try_to_wake_up(struct task_struct *p, unsigned int state, int wake_flags)
>                  ttwu_queue(p, cpu, wake_flags);
>          }
>  out:
> -        /*
> -         * For now, if we've been woken up, clear the task->blocked_on
> -         * regardless if it was set to a mutex or PROXY_WAKING so the
> -         * task can run. We will need to be more careful later when
> -         * properly handling proxy migration
> -         */
> -        clear_task_blocked_on(p, NULL);

So, for this bit: there are mutex variants that are interruptible and
killable, which would probably benefit from clearing the blocked_on
relation. For potential proxy tasks that are still queued, we'll hit the
ttwu_runnable() path and resched out of there, so it makes sense to mark
them as PROXY_WAKING so that schedule() can return-migrate them; they
then run, hit the signal_pending_state() check in the
__mutex_lock_common() loop, and return -EINTR. Otherwise, if they need a
full wakeup, they may be blocked on a sleeping owner, in which case it
is beneficial to clear blocked_on, do a full wakeup, and let them run to
evaluate the pending signal. ttwu_state_match() should filter out any
spurious signals.

Thoughts?
>          if (success)
>                  ttwu_stat(p, task_cpu(p), wake_flags);
>  
> @@ -6575,7 +6585,7 @@ static inline struct task_struct *proxy_resched_idle(struct rq *rq)
>          return rq->idle;
>  }
>  
> -static bool __proxy_deactivate(struct rq *rq, struct task_struct *donor)
> +static bool proxy_deactivate(struct rq *rq, struct task_struct *donor)
>  {
>          unsigned long state = READ_ONCE(donor->__state);
>  
> @@ -6595,17 +6605,135 @@ static bool __proxy_deactivate(struct rq *rq, struct task_struct *donor)
>          return try_to_block_task(rq, donor, &state, true);
>  }
>  
> -static struct task_struct *proxy_deactivate(struct rq *rq, struct task_struct *donor)
> +/*
> + * If the blocked-on relationship crosses CPUs, migrate @p to the
> + * owner's CPU.
> + *
> + * This is because we must respect the CPU affinity of execution
> + * contexts (owner) but we can ignore affinity for scheduling
> + * contexts (@p). So we have to move scheduling contexts towards
> + * potential execution contexts.
> + *
> + * Note: The owner can disappear, but simply migrate to @target_cpu
> + * and leave that CPU to sort things out.
> + */
> +static void proxy_migrate_task(struct rq *rq, struct rq_flags *rf,
> +                               struct task_struct *p, int target_cpu)
>  {
> -        if (!__proxy_deactivate(rq, donor)) {
> -                /*
> -                 * XXX: For now, if deactivation failed, set donor
> -                 * as unblocked, as we aren't doing proxy-migrations
> -                 * yet (more logic will be needed then).
> -                 */
> -                clear_task_blocked_on(donor, NULL);
> +        struct rq *target_rq = cpu_rq(target_cpu);
> +
> +        lockdep_assert_rq_held(rq);
> +
> +        /*
> +         * Since we're going to drop @rq, we have to put(@rq->donor) first,
> +         * otherwise we have a reference that no longer belongs to us.
> +         *
> +         * Additionally, as we put_prev_task(prev) earlier, it's possible that
> +         * prev will migrate away as soon as we drop the rq lock, however we
> +         * still have it marked as rq->curr, as we've not yet switched tasks.
> +         *
> +         * So call proxy_resched_idle() to let go of the references before
> +         * we release the lock.
> +         */
> +        proxy_resched_idle(rq);
> +
> +        WARN_ON(p == rq->curr);
> +
> +        deactivate_task(rq, p, DEQUEUE_NOCLOCK);
> +        proxy_set_task_cpu(p, target_cpu);
> +
> +        /*
> +         * We have to zap callbacks before unlocking the rq
> +         * as another CPU may jump in and call sched_balance_rq
> +         * which can trip the warning in rq_pin_lock() if we
> +         * leave callbacks set.
> +         */
> +        zap_balance_callbacks(rq);
> +        rq_unpin_lock(rq, rf);
> +        raw_spin_rq_unlock(rq);
> +
> +        attach_one_task(target_rq, p);
> +
> +        raw_spin_rq_lock(rq);
> +        rq_repin_lock(rq, rf);
> +        update_rq_clock(rq);
> +}
> +
> +static void proxy_force_return(struct rq *rq, struct rq_flags *rf,
> +                               struct task_struct *p)
> +{
> +        struct rq *this_rq, *target_rq;
> +        struct rq_flags this_rf;
> +        int cpu, wake_flag = WF_TTWU;
> +
> +        lockdep_assert_rq_held(rq);
> +        WARN_ON(p == rq->curr);
> +
> +        /*
> +         * We have to zap callbacks before unlocking the rq
> +         * as another CPU may jump in and call sched_balance_rq
> +         * which can trip the warning in rq_pin_lock() if we
> +         * leave callbacks set.
> +         */
> +        zap_balance_callbacks(rq);
> +        rq_unpin_lock(rq, rf);
> +        raw_spin_rq_unlock(rq);
> +
> +        /*
> +         * We drop the rq lock, and re-grab task_rq_lock to get
> +         * the pi_lock (needed for select_task_rq) as well.
> +         */
> +        this_rq = task_rq_lock(p, &this_rf);
> +
> +        /*
> +         * Since we let go of the rq lock, the task may have been
> +         * woken or migrated to another rq before we got the
> +         * task_rq_lock. So re-check we're on the same RQ. If
> +         * not, the task has already been migrated and that CPU
> +         * will handle any further migrations.
> +         */
> +        if (this_rq != rq)
> +                goto err_out;
> +
> +        /* Similarly, if we've been dequeued, someone else will wake us */
> +        if (!task_on_rq_queued(p))
> +                goto err_out;
> +
> +        /*
> +         * Since we should only be calling here from __schedule()
> +         * -> find_proxy_task(), no one else should have
> +         * assigned current out from under us. But check and warn
> +         * if we see this, then bail.
> +         */
> +        if (task_current(this_rq, p) || task_on_cpu(this_rq, p)) {
> +                WARN_ONCE(1, "%s rq: %i current/on_cpu task %s %d on_cpu: %i\n",
> +                          __func__, cpu_of(this_rq),
> +                          p->comm, p->pid, p->on_cpu);
> +                goto err_out;
>          }
> -        return NULL;
> +
> +        update_rq_clock(this_rq);
> +        proxy_resched_idle(this_rq);

I still think this is too late, and it is only required if we are moving
the donor. Can we do this before we drop the rq lock, so that a remote
wakeup doesn't need to clear this? (Although I think we no longer have
that bit in the ttwu path and rely completely on the schedule() bits for
return migration in this version -- any particular reason?)

> +        deactivate_task(this_rq, p, DEQUEUE_NOCLOCK);
> +        cpu = select_task_rq(p, p->wake_cpu, &wake_flag);
> +        set_task_cpu(p, cpu);
> +        target_rq = cpu_rq(cpu);
> +        clear_task_blocked_on(p, NULL);
> +        task_rq_unlock(this_rq, p, &this_rf);
> +
> +        attach_one_task(target_rq, p);

I'm still having a hard time believing we cannot use wake_up_process()
here, but let me look into that more tomorrow when the sun rises.

> +
> +        /* Finally, re-grab the original rq lock and return to pick-again */
> +        raw_spin_rq_lock(rq);
> +        rq_repin_lock(rq, rf);
> +        update_rq_clock(rq);
> +        return;
> +
> +err_out:
> +        task_rq_unlock(this_rq, p, &this_rf);
> +        raw_spin_rq_lock(rq);
> +        rq_repin_lock(rq, rf);
> +        update_rq_clock(rq);
>  }
>  
>  /*

-- 
Thanks and Regards,
Prateek