From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 20 Jan 2025 14:42:16 +0530
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: Re: [PATCH] sched/fair: Fix inaccurate h_nr_runnable accounting with delayed dequeue
To: Madadi Vineeth Reddy
From: K Prateek Nayak
References: <20250117105852.23908-1-kprateek.nayak@amd.com> <20250120050639.48966-1-vineethr@linux.ibm.com>
In-Reply-To: <20250120050639.48966-1-vineethr@linux.ibm.com>

Hello Madadi Vineeth Reddy,

Thank you for the review and test.

On 1/20/2025 10:36 AM, Madadi Vineeth Reddy wrote:
> Hi Prateek,
>
>> A SCHED_WARN_ON() to inspect h_nr_runnable post its update in
>> dequeue_entities() like below:
>>
>>     cfs_rq->h_nr_runnable -= h_nr_runnable;
>>     SCHED_WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
>>
>> is consistently tripped when running wakeup intensive workloads like
>> hackbench in a cgroup.
>
> I observed that the WARN_ON is triggered during the boot process without
> the patch, and the patch resolves the issue.
>
> However, I was unable to trigger the WARN_ON by running hackbench in a
> cgroup without the patch. Could you please share the specific test
> scenario or configuration you used to reproduce it?

Can you try converting the SCHED_WARN_ON() to a WARN_ON() and try again?
I can consistently hit it, to the point that it floods my console. With
autogroup enabled on Ubuntu, I believe it is trivial to hit this issue.
The following is the exact diff I'm using on top of tip:sched/core that
floods my console:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 98ac49ce78ea..7bc2c57601b6 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7160,6 +7160,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
 		cfs_rq->h_nr_queued -= h_nr_queued;
+		WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
 		cfs_rq->h_nr_idle -= h_nr_idle;
 
 		if (cfs_rq_is_idle(cfs_rq))
@@ -7199,6 +7200,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
 
 		cfs_rq->h_nr_runnable -= h_nr_runnable;
 		cfs_rq->h_nr_queued -= h_nr_queued;
+		WARN_ON(((int) cfs_rq->h_nr_runnable) < 0);
 		cfs_rq->h_nr_idle -= h_nr_idle;
 
 		if (cfs_rq_is_idle(cfs_rq))
--

I tested this on a 32 vCPU VM and I get similar floods.

>
> For the boot process scenario:
> Tested-by: Madadi Vineeth Reddy

Thanks a ton for testing!

>
> Thanks,
> Madadi Vineeth Reddy

-- 
Thanks and Regards,
Prateek
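
Note on the (int) cast used in the check above: the cast suggests that h_nr_runnable is an unsigned counter, in which case subtracting more than it holds does not produce a negative value but wraps around to a very large one. Reinterpreting the result as a signed int is what makes the over-decrement visible as "< 0". The following is only a minimal userspace sketch of that idea, not kernel code: struct demo_cfs_rq and demo_dequeue_underflows() are hypothetical names made up for illustration, and the field type is an assumption.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the runnable counter in struct cfs_rq.
 * Assumption: the real field is an unsigned int.
 */
struct demo_cfs_rq {
	unsigned int h_nr_runnable;
};

/*
 * Subtract delta the same way the quoted snippet does, then report
 * whether the counter wrapped below zero.
 *
 * Unsigned subtraction wraps modulo 2^N, so 0 - 1 becomes UINT_MAX.
 * Converting such a value to int is implementation-defined in ISO C
 * before C23; GCC and Clang yield the two's complement value, which is
 * negative here.
 */
static inline bool demo_dequeue_underflows(struct demo_cfs_rq *cfs_rq,
					   unsigned int delta)
{
	cfs_rq->h_nr_runnable -= delta;
	return ((int) cfs_rq->h_nr_runnable) < 0;
}
```

With a counter of 1, the first call with delta 1 leaves it at 0 and returns false; a second call with delta 1 wraps the counter and returns true, which is the condition the WARN_ON() in the diff is checking for.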