Date: Tue, 12 May 2026 16:55:36 +0200
From: Andrea Righi
To: Juri Lelli
Cc: Ingo Molnar, Peter Zijlstra, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Ben Segall, Mel Gorman, Valentin Schneider,
    K Prateek Nayak, Frederic Weisbecker, linux-kernel@vger.kernel.org,
    David Haufe, Cao Ruichuang
Subject: Re: [PATCH] sched/deadline: Make dl-server nohz full aware
References: <20260512-upstream-fix-dlserver-nohzfull-b4-v1-1-a94844387ae7@redhat.com>
In-Reply-To: <20260512-upstream-fix-dlserver-nohzfull-b4-v1-1-a94844387ae7@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Juri,

On Tue, May 12, 2026 at 11:02:37AM +0200, Juri Lelli wrote:
> The dl_server_timer() causes spurious IPIs on nohz_full cores, breaking
> isolation guarantees. The timer executes on a housekeeping core and
> eventually calls tick_nohz_dep_set_cpu(), sending IPIs to isolated cores
> even when only a single task is running.
>
> The problem is that dl-servers are not coordinated with nohz_full tick
> state. Timers can fire and send IPIs to otherwise undisturbed cores.
>
> Fix by managing servers in sched_can_stop_tick():
>
> - When RT tasks run with CFS/SCX tasks, start the appropriate server
>   and keep the tick running
> - When only RT tasks remain, stop all servers and allow tick to stop
>   (except for >1 RR tasks which need the tick for round-robin)
> - When only CFS/SCX tasks remain, stop all servers before stopping tick
>
> Introduce dl_servers_stop_all() to reduce duplication and abstract
> server management from core.c. Unify RT handling into one block that
> handles both RR and FIFO cases.
>
> Fixes: 557a6bfc662c ("sched/fair: Add trivial fair server")
> Reported-by: David Haufe
> Closes: https://lore.kernel.org/lkml/CAKJHwtOw_G67edzuHVtL1xC5Vyt6StcZzihtDd0yaKudW=rwVw@mail.gmail.com
> Signed-off-by: Juri Lelli
> ---
> I had to modify my first original attempt at fixing this (please take a
> look at the linked report/discussion) to also take SCX into
> consideration.

As mentioned by Frederic, we don't allow loading BPF schedulers when
isolcpus= is used, so I think we can simplify the sched_can_stop_tick()
part by dropping the SCX handling.

>
> FYI, I temporarily pushed the script I'm using to repro and verify the
> fix here
>
> https://github.com/jlelli/sched-deadline-tests/blob/master/test-dlserver-nohz.sh
> ---
>  kernel/sched/core.c     | 43 +++++++++++++++++++++++--------------------
>  kernel/sched/deadline.c | 14 ++++++++++++++
>  kernel/sched/sched.h    |  1 +
>  3 files changed, 38 insertions(+), 20 deletions(-)
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index b905805bbcbe4..98759255c306b 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -1414,30 +1414,35 @@ static inline bool __need_bw_check(struct rq *rq, struct task_struct *p)
>
>  bool sched_can_stop_tick(struct rq *rq)
>  {
> -	int fifo_nr_running;
> -
>  	/* Deadline tasks, even if single, need the tick */
>  	if (rq->dl.dl_nr_running)
>  		return false;
>
>  	/*
> -	 * If there are more than one RR tasks, we need the tick to affect the
> -	 * actual RR behaviour.
> +	 * If there are RT tasks, we may need the tick (for >1 RR tasks),
> +	 * but we must also service lower-priority CFS/SCX tasks via dl-servers.

No need to mention SCX here. Maybe we can add a note that SCX is
incompatible with isolcpus, so there are no SCX tasks to run.

>  	 */
> -	if (rq->rt.rr_nr_running) {
> -		if (rq->rt.rr_nr_running == 1)
> -			return true;
> -		else
> +	if (rq->rt.rt_nr_running) {
> +		if (rq->cfs.h_nr_queued) {
> +			dl_server_start(&rq->fair_server);
> +			return false;
> +		}
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +		if (rq->scx.nr_running) {
> +			dl_server_start(&rq->ext_server);
> +			return false;
> +		}
> +#endif

This #ifdef block can go away.

> +		/*
> +		 * Only RT tasks, no CFS/SCX. Stop servers to prevent spurious

CFS/SCX -> CFS.

> +		 * wakeups. Tick can stop for single RR or any FIFO, but must
> +		 * run for multiple RR (round-robin behavior).
> +		 */
> +		dl_servers_stop_all(rq);
> +		if (rq->rt.rr_nr_running > 1)
>  			return false;
> -	}
> -
> -	/*
> -	 * If there's no RR tasks, but FIFO tasks, we can skip the tick, no
> -	 * forced preemption between FIFO tasks.
> -	 */
> -	fifo_nr_running = rq->rt.rt_nr_running - rq->rt.rr_nr_running;
> -	if (fifo_nr_running)
>  		return true;
> +	}
>
>  	/*
>  	 * If there are no DL,RR/FIFO tasks, there must only be CFS or SCX tasks
> @@ -1462,6 +1467,7 @@ bool sched_can_stop_tick(struct rq *rq)
>  		return false;
>  	}
>
> +	dl_servers_stop_all(rq);
>  	return true;
>  }
>  #endif /* CONFIG_NO_HZ_FULL */
> @@ -8810,10 +8816,7 @@ int sched_cpu_dying(unsigned int cpu)
>  		WARN(true, "Dying CPU not properly vacated!");
>  		dump_rq_tasks(rq, KERN_WARNING);
>  	}
> -	dl_server_stop(&rq->fair_server);
> -#ifdef CONFIG_SCHED_CLASS_EXT
> -	dl_server_stop(&rq->ext_server);
> -#endif
> +	dl_servers_stop_all(rq);
>  	rq_unlock_irqrestore(rq, &rf);
>
>  	calc_load_migrate(rq);
> diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> index edca7849b165d..c2b3d6bbe4828 100644
> --- a/kernel/sched/deadline.c
> +++ b/kernel/sched/deadline.c
> @@ -1826,6 +1826,20 @@ void dl_server_stop(struct sched_dl_entity *dl_se)
>  	dl_se->dl_server_active = 0;
>  }
>
> +/*
> + * Stop all dl-servers on this runqueue. Called when transitioning to a state
> + * where the tick can be stopped (e.g., single RR/FIFO task, or no RT tasks).
> + * This ensures server timers are disarmed and won't cause spurious wakeups on
> + * nohz_full isolated cores.
> + */
> +void dl_servers_stop_all(struct rq *rq)
> +{
> +	dl_server_stop(&rq->fair_server);
> +#ifdef CONFIG_SCHED_CLASS_EXT
> +	dl_server_stop(&rq->ext_server);
> +#endif
> +}

And I think the dl_servers_stop_all() helper still makes sense: stopping
the ext_server is still needed in sched_cpu_dying(), and calling
dl_server_stop() on an already-inactive server is harmless in the no-RT
path.

Thanks,
-Andrea

> +
>  void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
>  		    dl_server_pick_f pick_task)
>  {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index 9f63b15d309d1..26cf1d14efde5 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -412,6 +412,7 @@ extern void dl_server_update_idle(struct sched_dl_entity *dl_se, s64 delta_exec)
>  extern void dl_server_update(struct sched_dl_entity *dl_se, s64 delta_exec);
>  extern void dl_server_start(struct sched_dl_entity *dl_se);
>  extern void dl_server_stop(struct sched_dl_entity *dl_se);
> +extern void dl_servers_stop_all(struct rq *rq);
>  extern void dl_server_init(struct sched_dl_entity *dl_se, struct rq *rq,
>  			   dl_server_pick_f pick_task);
>  extern void sched_init_dl_servers(void);
>
> ---
> base-commit: 4ac4d6549a6563878d7c19c154e017f6cb7114d3
> change-id: 20260512-upstream-fix-dlserver-nohzfull-b4-b745e2a967ed
>
> Best regards,
> --
> Juri Lelli
>