From: K Prateek Nayak
To: Peter Zijlstra
CC: John Stultz, ..., Suleiman Souhlal
Date: Tue, 31 Mar 2026 06:08:27 +0530
Subject: Re: [PATCH v2 1/7] sched/fair: Fix zero_vruntime tracking
X-Mailing-List: linux-kernel@vger.kernel.org
In-Reply-To: <20260330191108.GU2872@noisy.programming.kicks-ass.net>
References: <20260219075840.162631716@infradead.org> <20260219080624.438854780@infradead.org> <20260330101018.GN3738786@noisy.programming.kicks-ass.net> <73dab51a-650f-4c82-9e73-13236b2a26c2@amd.com> <20260330144005.GP3738786@noisy.programming.kicks-ass.net> <20260330191108.GU2872@noisy.programming.kicks-ass.net>

Hello Peter,

On 3/31/2026 12:41 AM, Peter Zijlstra wrote:
>> Turns out I spoke too soon and it did eventually run into that
>> problem again and then eventually crashed in pick_task_fair()
>> later so there is definitely something amiss still :-(
>>
>> I'll throw in some debug traces and get back tomorrow.
>
> Are there cgroups involved?

Indeed there are.

>
> I'm thinking that if you have two groups, and the tick always hits the
> one group, the other group can go a while without ever getting updated.

Ack!
That could be, but I only have one cgroup on top of the root cgroup as far as the cpu controller is concerned, so sched_yield() catching up avg_vruntime() should have worked.

Either way, I have more data. When I hit the overflow warning, I have:

  se: entity_key(-83106064385) weight(90891264) overflow(-7553615238018032640)
  cfs_rq: zero_vruntime(138430453113448575) sum_w_vruntime(0) sum_weight(0)
  cfs_rq->curr: entity_key(0) vruntime(138430453113448575) deadline(138430500540426854)

Post avg_vruntime():

  se: entity_key(-83106064385) weight(90891264) overflow(-7553615238018032640)
  cfs_rq: zero_vruntime(138430453113448575) sum_w_vruntime(0) sum_weight(0)
  cfs_rq->curr: entity_key(0) vruntime(138430453113448575) deadline(138430500540426854)

So running avg_vruntime() doesn't make a difference. This seems to be a genuine case of place_entity() putting the newly woken entity pretty far back in the timeline. (I forgot to print weights!)

Now, the funny part: if I leave the system undisturbed, I get a few of the above warnings and nothing interesting. But as soon as I do a:

  grep bits /sys/kernel/debug/sched/debug

Boom! Pick fails very consistently. (Because of copy-pasta, this too doesn't contain weights.)

  NULL Pick!
  cfs_rq: zero_vruntime(89029406877992895) sum_w_vruntime(-135049248768) sum_weight(1048576)
  cfs_rq->curr: entity_key(149162) vruntime(89029406878142057) deadline(89029406976268435)
  queued se: entity_key(-123294) vruntime(89029406877869601) deadline(89029406880669601)

  after avg_vruntime()!
  cfs_rq: zero_vruntime(89029406877868114) sum_w_vruntime(-4206886912) sum_weight(1048576)
  cfs_rq->curr: entity_key(273943) vruntime(89029406878142057) deadline(89029406976268435)
  queued se: entity_key(1487) vruntime(89029406877869601) deadline(89029406880669601)
  NULL Pick!

The above doesn't recover after an avg_vruntime().
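For reference, the dumped numbers can be cross-checked with a small sketch. It rests on several assumptions that are not taken from the patch:

- entity_key = (s64)(vruntime - zero_vruntime).
- sum_w_vruntime and sum_weight cover only the queued (non-curr) entities, and sum_weight 1048576 is a single queued entity.
- Eligibility follows a vruntime_eligible()-style test, with curr counted as on the runqueue.

curr's weight was not printed, so CURR_WEIGHT below is a hypothetical nice-19 value (15 << 10). to_s64(), entity_key() and eligible() are illustrative helpers, not kernel functions.

```python
# Hypothetical helpers for cross-checking the dumps; these are NOT kernel functions.
S64_MAX = (1 << 63) - 1

SUM_WEIGHT = 1048576    # sum_weight from both NULL-pick dumps (assumed: one queued se)
CURR_WEIGHT = 15 << 10  # HYPOTHETICAL: curr's weight was not printed; nice-19 weight assumed


def to_s64(x: int) -> int:
    """Wrap x to a signed 64-bit value, like a C (s64) cast."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x > S64_MAX else x


def entity_key(vruntime: int, zero_vruntime: int) -> int:
    """Assumed definition: key = (s64)(vruntime - zero_vruntime)."""
    return to_s64(vruntime - zero_vruntime)


def eligible(sum_w_vruntime: int, sum_weight: int,
             curr_key: int, curr_weight: int, key: int) -> bool:
    """Assumed vruntime_eligible()-style test: fold an on_rq curr into the
    weighted sum, then require key * total_weight <= sum of (key * weight)."""
    avg = sum_w_vruntime + curr_key * curr_weight
    load = sum_weight + curr_weight
    return avg >= key * load


# Overflow warning: the reported overflow(...) value is exactly key * weight.
assert -83106064385 * 90891264 == -7553615238018032640

CURR_VRUNTIME = 89029406878142057
SE_VRUNTIME = 89029406877869601
dumps = {
    "NULL pick": (89029406877992895, -135049248768),
    "after avg_vruntime()": (89029406877868114, -4206886912),
}

for label, (zero_vruntime, sum_w_vruntime) in dumps.items():
    k_curr = entity_key(CURR_VRUNTIME, zero_vruntime)
    k_se = entity_key(SE_VRUNTIME, zero_vruntime)
    # Assuming a single queued entity, sum_w_vruntime / sum_weight should equal its key.
    implied = sum_w_vruntime // SUM_WEIGHT
    print(f"{label}: curr key {k_curr}, se key {k_se}, "
          f"implied key {implied}, diff {k_se - implied}")
    print("  se eligible:", eligible(sum_w_vruntime, SUM_WEIGHT, k_curr, CURR_WEIGHT, k_se),
          "| curr eligible:", eligible(sum_w_vruntime, SUM_WEIGHT, k_curr, CURR_WEIGHT, k_curr),
          "| se eligible with a consistent sum:",
          eligible(k_se * SUM_WEIGHT, SUM_WEIGHT, k_curr, CURR_WEIGHT, k_se))
```

Under these assumptions, the queued se's key differs by 5499 from the key implied by sum_w_vruntime/sum_weight, in both dumps. With the hypothetical curr weight, neither entity passes the check, which would be consistent with a NULL pick. A sum matching the se's key would make it eligible. A curr weight of roughly 21164 or more (same units) would also flip the result, so the missing weights matter here.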
Btw, I'm running:

  nice -n 19 stress-ng --yield 32 -t 1000000s &
  while true; do perf bench sched messaging -p -t -l 100000 -g 16; done

Nice 19 is there to get a large deadline and keep catching up to that deadline at every yield, to see if that makes any difference.

>
> But if there's no cgroups, this can't be it.
>
> Anyway, something like the below would rule this out I suppose.

I'll add that in and see if it makes a difference. I'll also add in weights and look at place_entity() to see if anything interesting is going on there.

Thank you for taking a look.

-- 
Thanks and Regards,
Prateek