From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Sun, 29 Mar 2026 18:26:10 +0200
From: Andrea Righi
To: Tejun Heo
Cc: David Vernet, Changwoo Min, Christian Loehle, Emil Tsalapatis,
 sched-ext@lists.linux.dev, linux-kernel@vger.kernel.org,
 stable@vger.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Fix SCX_KICK_WAIT deadlock by deferring
 wait to balance callback
References: <20260329001856.835643-1-tj@kernel.org>
 <20260329001856.835643-2-tj@kernel.org>
In-Reply-To: <20260329001856.835643-2-tj@kernel.org>
Hi Tejun,

On Sat, Mar 28, 2026 at 02:18:55PM -1000, Tejun Heo wrote:
> SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using
> smp_cond_load_acquire() until the target CPU's kick_sync advances. Because
> the irq_work runs in hardirq context, the waiting CPU cannot reschedule and
> its own kick_sync never advances. If multiple CPUs form a wait cycle, all
> CPUs deadlock.
>
> Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to
> force the CPU through do_pick_task_scx(), which queues a balance callback
> to perform the wait. The balance callback drops the rq lock and enables
> IRQs following the sched_core_balance() pattern, so the CPU can process
> IPIs while waiting. The local CPU's kick_sync is advanced on entry to
> do_pick_task_scx() and continuously during the wait, ensuring any CPU that
> starts waiting for us sees the advancement and cannot form cyclic
> dependencies.
>
> Fixes: 90e55164dad4 ("sched_ext: Implement SCX_KICK_WAIT")
> Cc: stable@vger.kernel.org # v6.12+
> Reported-by: Christian Loehle
> Link: https://lore.kernel.org/r/20260316100249.1651641-1-christian.loehle@arm.com
> Signed-off-by: Tejun Heo
> ---
>  kernel/sched/ext.c   | 95 ++++++++++++++++++++++++++++++++------------
>  kernel/sched/sched.h |  3 ++
>  2 files changed, 73 insertions(+), 25 deletions(-)
>
> diff --git a/kernel/sched/ext.c b/kernel/sched/ext.c
> index 26a6ac2f8826..d5bdcdb3f700 100644
> --- a/kernel/sched/ext.c
> +++ b/kernel/sched/ext.c
> @@ -2404,7 +2404,7 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  {
>  	struct scx_sched *sch = scx_root;
>
> -	/* see kick_cpus_irq_workfn() */
> +	/* see kick_sync_wait_bal_cb() */
>  	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
>
>  	update_curr_scx(rq);
> @@ -2447,6 +2447,48 @@ static void put_prev_task_scx(struct rq *rq, struct task_struct *p,
>  	switch_class(rq, next);
>  }
>
> +static void kick_sync_wait_bal_cb(struct rq *rq)
> +{
> +	struct scx_kick_syncs __rcu *ks = __this_cpu_read(scx_kick_syncs);
> +	unsigned long *ksyncs = rcu_dereference_sched(ks)->syncs;
> +	bool waited;
> +	s32 cpu;
> +
> +	/*
> +	 * Drop rq lock and enable IRQs while waiting. IRQs must be enabled
> +	 * — a target CPU may be waiting for us to process an IPI (e.g. TLB

nit: s/—/-/

> +	 * flush) while we wait for its kick_sync to advance.
> +	 *
> +	 * Also, keep advancing our own kick_sync so that new kick_sync waits
> +	 * targeting us, which can start after we drop the lock, cannot form
> +	 * cyclic dependencies.
> +	 */
> +retry:
> +	waited = false;
> +	for_each_cpu(cpu, rq->scx.cpus_to_sync) {
> +		/*
> +		 * smp_load_acquire() pairs with smp_store_release() on
> +		 * kick_sync updates on the target CPUs.
> +		 */
> +		if (cpu == cpu_of(rq) ||
> +		    smp_load_acquire(&cpu_rq(cpu)->scx.kick_sync) != ksyncs[cpu]) {
> +			cpumask_clear_cpu(cpu, rq->scx.cpus_to_sync);
> +			continue;
> +		}

Should we add something like:

		if (cpu != cpu_of(rq) && !cpu_online(cpu)) {
			cpumask_clear_cpu(cpu, rq->scx.cpus_to_sync);
			continue;
		}

> +
> +		raw_spin_rq_unlock_irq(rq);
> +		while (READ_ONCE(cpu_rq(cpu)->scx.kick_sync) == ksyncs[cpu]) {

And here:

			if (cpu != cpu_of(rq) && !cpu_online(cpu))
				break;

(see below)

> +			smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
> +			cpu_relax();
> +		}
> +		raw_spin_rq_lock_irq(rq);
> +		waited = true;
> +	}
> +
> +	if (waited)
> +		goto retry;
> +}
> +
>  static struct task_struct *first_local_task(struct rq *rq)
>  {
>  	return list_first_entry_or_null(&rq->scx.local_dsq.list,
> @@ -2460,7 +2502,7 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
>  	bool keep_prev;
>  	struct task_struct *p;
>
> -	/* see kick_cpus_irq_workfn() */
> +	/* see kick_sync_wait_bal_cb() */
>  	smp_store_release(&rq->scx.kick_sync, rq->scx.kick_sync + 1);
>
>  	rq_modified_begin(rq, &ext_sched_class);
> @@ -2470,6 +2512,17 @@ do_pick_task_scx(struct rq *rq, struct rq_flags *rf, bool force_scx)
>  	rq_repin_lock(rq, rf);
>  	maybe_queue_balance_callback(rq);
>
> +	/*
> +	 * Defer to a balance callback which can drop rq lock and enable
> +	 * IRQs. Waiting directly in the pick path would deadlock against
> +	 * CPUs sending us IPIs (e.g. TLB flushes) while we wait for them.
> +	 */
> +	if (unlikely(rq->scx.kick_sync_pending)) {
> +		rq->scx.kick_sync_pending = false;
> +		queue_balance_callback(rq, &rq->scx.kick_sync_bal_cb,
> +				       kick_sync_wait_bal_cb);

queue_balance_callback() is a no-op if the rq is in balance_push, but I
guess it's ok to just clear the kick_sync_pending if we add the checks
above.
> +	}
> +
>  	/*
>  	 * If any higher-priority sched class enqueued a runnable task on
>  	 * this rq during balance_one(), abort and return RETRY_TASK, so
> @@ -4713,6 +4766,9 @@ static void scx_dump_state(struct scx_exit_info *ei, size_t dump_len)
>  		if (!cpumask_empty(rq->scx.cpus_to_wait))
>  			dump_line(&ns, "  cpus_to_wait   : %*pb",
>  				  cpumask_pr_args(rq->scx.cpus_to_wait));
> +		if (!cpumask_empty(rq->scx.cpus_to_sync))
> +			dump_line(&ns, "  cpus_to_sync   : %*pb",
> +				  cpumask_pr_args(rq->scx.cpus_to_sync));
>
>  		used = seq_buf_used(&ns);
>  		if (SCX_HAS_OP(sch, dump_cpu)) {
> @@ -5610,11 +5666,11 @@ static bool kick_one_cpu(s32 cpu, struct rq *this_rq, unsigned long *ksyncs)
>
>  	if (cpumask_test_cpu(cpu, this_scx->cpus_to_wait)) {
>  		if (cur_class == &ext_sched_class) {
> +			cpumask_set_cpu(cpu, this_scx->cpus_to_sync);
>  			ksyncs[cpu] = rq->scx.kick_sync;
>  			should_wait = true;
> -		} else {
> -			cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
>  		}
> +		cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
>  	}
>
>  	resched_curr(rq);
> @@ -5669,27 +5725,15 @@ static void kick_cpus_irq_workfn(struct irq_work *irq_work)
>  		cpumask_clear_cpu(cpu, this_scx->cpus_to_kick_if_idle);
>  	}
>
> -	if (!should_wait)
> -		return;
> -
> -	for_each_cpu(cpu, this_scx->cpus_to_wait) {
> -		unsigned long *wait_kick_sync = &cpu_rq(cpu)->scx.kick_sync;
> -
> -		/*
> -		 * Busy-wait until the task running at the time of kicking is no
> -		 * longer running. This can be used to implement e.g. core
> -		 * scheduling.
> -		 *
> -		 * smp_cond_load_acquire() pairs with store_releases in
> -		 * pick_task_scx() and put_prev_task_scx(). The former breaks
> -		 * the wait if SCX's scheduling path is entered even if the same
> -		 * task is picked subsequently. The latter is necessary to break
> -		 * the wait when $cpu is taken by a higher sched class.
> -		 */
> -		if (cpu != cpu_of(this_rq))
> -			smp_cond_load_acquire(wait_kick_sync, VAL != ksyncs[cpu]);
> -
> -		cpumask_clear_cpu(cpu, this_scx->cpus_to_wait);
> +	/*
> +	 * Can't wait in hardirq — kick_sync can't advance, deadlocking if
> +	 * CPUs wait for each other. Defer to kick_sync_wait_bal_cb().
> +	 */
> +	if (should_wait) {
> +		raw_spin_rq_lock(this_rq);
> +		this_scx->kick_sync_pending = true;
> +		resched_curr(this_rq);
> +		raw_spin_rq_unlock(this_rq);
>  	}
>  }
>
> @@ -5794,6 +5838,7 @@ void __init init_sched_ext_class(void)
>  		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_kick_if_idle, GFP_KERNEL, n));
>  		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_preempt, GFP_KERNEL, n));
>  		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_wait, GFP_KERNEL, n));
> +		BUG_ON(!zalloc_cpumask_var_node(&rq->scx.cpus_to_sync, GFP_KERNEL, n));
>  		rq->scx.deferred_irq_work = IRQ_WORK_INIT_HARD(deferred_irq_workfn);
>  		rq->scx.kick_cpus_irq_work = IRQ_WORK_INIT_HARD(kick_cpus_irq_workfn);
>
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 43bbf0693cca..1ef9ba480f51 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -805,9 +805,12 @@ struct scx_rq {
>  	cpumask_var_t		cpus_to_kick_if_idle;
>  	cpumask_var_t		cpus_to_preempt;
>  	cpumask_var_t		cpus_to_wait;
> +	cpumask_var_t		cpus_to_sync;
> +	bool			kick_sync_pending;
>  	unsigned long		kick_sync;
>  	local_t			reenq_local_deferred;
>  	struct balance_callback	deferred_bal_cb;
> +	struct balance_callback	kick_sync_bal_cb;
>  	struct irq_work		deferred_irq_work;
>  	struct irq_work		kick_cpus_irq_work;
>  	struct scx_dispatch_q	bypass_dsq;
> --
> 2.53.0
>

Thanks,
-Andrea