From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 3 Apr 2026 15:58:53 +0200
From: Andrea Righi
To: Peter Zijlstra
Cc: soolaugust@gmail.com, jstultz@google.com, juri.lelli@redhat.com,
	mingo@redhat.com, linux-kernel@vger.kernel.org, zhidao su
Subject: Re: [PATCH] sched/deadline: Fix stale dl_defer_running in update_dl_entity() if-branch
References: <20260403081215.3942454-1-soolaugust@gmail.com>
	<20260403134256.GH3558198@noisy.programming.kicks-ass.net>
In-Reply-To: <20260403134256.GH3558198@noisy.programming.kicks-ass.net>
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0

Hello,

On Fri, Apr 03, 2026 at 03:42:56PM +0200, Peter Zijlstra wrote:
> On Fri, Apr 03, 2026 at 04:12:15PM +0800, soolaugust@gmail.com wrote:
> > From: zhidao su
> > 
> > commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server") added a
> > dl_defer_running = 0 reset in the if-branch of update_dl_entity() to
> > handle the case where [4] D->A is followed by [1] A->B (lapsed
> > deadline). The intent was to ensure the server re-enters the zero-laxity
> > wait when restarted after the deadline has passed.
> > 
> > With Proxy Execution (PE), RT tasks proxied through the scheduler appear
> > to trigger frequent dl_server_start() calls with expired deadlines. When
> > this happens with dl_defer_running=1 (from a prior starvation episode),
> > Peter's fix forces the fair_server back through the ~950ms zero-laxity
> > wait each time.
> > 
> > In our testing (virtme-ng, 4 CPUs, 4G RAM, ksched_football):
> >   With this fix:    ~1s for all players to check in
> >   Without this fix: ~28s for all players to check in
> > 
> > The issue appears to be that the clearing in update_dl_entity()'s
> > if-branch is too aggressive for the PE use case.
> > replenish_dl_new_period() already handles this via its internal guard:
> > 
> >   if (dl_se->dl_defer && !dl_se->dl_defer_running) {
> >       dl_se->dl_throttled = 1;
> >       dl_se->dl_defer_armed = 1;
> >   }
> > 
> > When dl_defer_running=1 (starvation previously confirmed by the
> > zero-laxity timer), replenish_dl_new_period() skips arming the
> > zero-laxity timer, allowing the server to run directly. This seems
> > correct: once starvation has been confirmed, subsequent start/stop
> > cycles triggered by PE should not re-introduce the deferral delay.
> > 
> > Note: this is the same change as the HACK revert in John's PE series
> > (679ede58445 "HACK: Revert 'sched/deadline: Fix stuck dl_server'"),
> > but with the rationale documented.
> > 
> > The state machine comment is updated to reflect the actual behavior of
> > replenish_dl_new_period() when dl_defer_running=1.
> > 
> > Signed-off-by: zhidao su
> > ---
> >  kernel/sched/deadline.c | 12 +++---------
> >  1 file changed, 3 insertions(+), 9 deletions(-)
> > 
> > diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
> > index 01754d699f0..30b03021fce 100644
> > --- a/kernel/sched/deadline.c
> > +++ b/kernel/sched/deadline.c
> > @@ -1034,12 +1034,6 @@ static void update_dl_entity(struct sched_dl_entity *dl_se)
> >  			return;
> >  		}
> >  
> > -		/*
> > -		 * When [4] D->A is followed by [1] A->B, dl_defer_running
> > -		 * needs to be cleared, otherwise it will fail to properly
> > -		 * start the zero-laxity timer.
> > -		 */
> > -		dl_se->dl_defer_running = 0;
> >  		replenish_dl_new_period(dl_se, rq);
> >  	} else if (dl_server(dl_se) && dl_se->dl_defer) {
> >  		/*
> 
> This cannot be right; it will insta break Andrea's test case again.

I confirm that with this applied the sched_ext rt_stall selftest starts
failing:

 $ sudo ./runner -t rt_stall
 ...
 # Runtime of EXT task (PID 2260) is 0.010000 seconds
 # Runtime of RT task (PID 2261) is 5.000000 seconds
 # EXT task got 0.20% of total runtime
 not ok 4 FAIL: EXT task got less than 4.00% of runtime
 [  218.923834] sched_ext: BPF scheduler "rt_stall" disabled (unregistered from user space)
 # Planned tests != run tests (1 != 4)

> 
> And I cannot make sense of your explanation; how does PE cause what to
> happen? You mention PROXY_WAKING, this then means proxy_force_return().
> 
> I suspect whatever it is you're seeing will go away once we delete that
> thing, see this discussion:
> 
>   https://lkml.kernel.org/r/20260402155055.GV3738010@noisy.programming.kicks-ass.net
> 

Thanks,
-Andrea
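
For readers following the thread, the flag interaction under dispute can
be sketched as a small userspace toy model. This is not kernel code: the
struct and every toy_* name are hypothetical, and the only logic taken
from the thread is the replenish_dl_new_period() guard quoted in the
commit message and the dl_defer_running = 0 reset that the patch removes.
It shows which path re-arms the deferral, and says nothing about which
behaviour is correct for the rt_stall selftest or for PE.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few sched_dl_entity flags named in the thread. */
struct toy_dl_se {
	bool dl_defer;         /* server uses the deferred (zero-laxity) start */
	bool dl_defer_running; /* starvation was already confirmed by the timer */
	bool dl_throttled;
	bool dl_defer_armed;
};

/* Models only the guard quoted in the commit message. */
static void toy_replenish_new_period(struct toy_dl_se *dl_se)
{
	if (dl_se->dl_defer && !dl_se->dl_defer_running) {
		dl_se->dl_throttled = true;
		dl_se->dl_defer_armed = true;
	}
}

/*
 * Models the lapsed-deadline branch of update_dl_entity().
 * clear_defer_running == true  -> current code (reset kept)
 * clear_defer_running == false -> proposed patch (reset removed)
 */
static void toy_restart_lapsed(struct toy_dl_se *dl_se, bool clear_defer_running)
{
	if (clear_defer_running)
		dl_se->dl_defer_running = false;
	toy_replenish_new_period(dl_se);
}

/* Returns true when a restart with dl_defer_running=1 re-arms the deferral. */
static bool toy_restart_defers(bool clear_defer_running)
{
	struct toy_dl_se se = { .dl_defer = true, .dl_defer_running = true };

	toy_restart_lapsed(&se, clear_defer_running);
	return se.dl_throttled && se.dl_defer_armed;
}

/* Small self-check covering both variants. */
static void toy_self_check(void)
{
	assert(toy_restart_defers(true));   /* reset kept: server is deferred again */
	assert(!toy_restart_defers(false)); /* reset removed: guard is skipped */
}
```

Calling toy_self_check() from a main() exercises both variants. With the
reset kept, a restart after a lapsed deadline is throttled and armed
again; with the reset removed, the stale dl_defer_running=1 makes the
guard a no-op.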