From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from MW6PR02CU001.outbound.protection.outlook.com (mail-westus2azon11012020.outbound.protection.outlook.com [52.101.48.20])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 63E0822652D
	for <linux-kernel@vger.kernel.org>; Fri,  9 Jan 2026 04:50:23 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=fail smtp.client-ip=52.101.48.20
ARC-Seal:i=2; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1767934225; cv=fail; b=WnjoJ9ExnvjhGq/zB4pecUINtPMvyU0Cu1CJpglY7rw3ZLZR2J6Z7jeiJoisVIlSwrNoN4AH3KPVNblukgtQ0W26d6We91dSOr5ZlkPoywg/kKWqsLa5EgpY1kIYPwOH74sZHPSVibEh1b0CMyD7MBe5A0nLVJUch96ikJ9CvAo=
ARC-Message-Signature:i=2; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1767934225; c=relaxed/simple;
	bh=G94Z6FMWDoNwGzWtU/lO9gS8XJ/IsrYZUye75paQDaU=;
	h=Message-ID:Date:MIME-Version:Subject:To:CC:References:From:
	 In-Reply-To:Content-Type; b=U2hrU9yirO/39XSbV+KphkP4+Wz3L+rfLxx8Z7BF79SU1zZ1YXWe9sCb5QGYfDikhgGbOlRYw+va1Ns++eDMC3KuoUyqDA7wGa/x4ImVn+UitkoX1c6WPks+uZV65KOc+eFAspXju2567pjcWS+g3W8SXpXaVXC6Sp3y22b88wE=
ARC-Authentication-Results:i=2; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com; spf=fail smtp.mailfrom=amd.com; dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b=QO87WxII; arc=fail smtp.client-ip=52.101.48.20
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=amd.com
Authentication-Results: smtp.subspace.kernel.org; spf=fail smtp.mailfrom=amd.com
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (1024-bit key) header.d=amd.com header.i=@amd.com header.b="QO87WxII"
ARC-Seal: i=1; a=rsa-sha256; s=arcselector10001; d=microsoft.com; cv=none;
 b=yQFCUg5yefl1ER9VtIw1D947OwU4RxpLDPZxwJAJYGHe81K+EE1+kb3AtGT/1eXW1nHDbYtwIxrvE2A0vO1MnQdyhlcOMeLARFos86xhCQdyyEnu9xJU4iZb59EZAakq4dj/1H62hkP6C+19Hs65fdNERLzmebHU0bXjf/zGeVsYjT+CN0QJBSAj0ptVdWvOaGOpxR0IMhS2Fe4tXeRTqlcALpng85VmmsLHzlg7/RcRnQilwHLlzglcaVcPpr626eZGD4FrZd5dfx2rHBoN2zm9jK2H4SOnTLkDAq/wrEwFfvFkwM3GPdcMHb7lsvSOUdwox8GyrEOn3Z+U+vwJqw==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com;
 s=arcselector10001;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1;
 bh=uMd1CfnTe8t6GkYY9JuUdNQ4t7SyudpvzhdCyYeRlR4=;
 b=j2HPcBzw8iEpU1BFZdn0Wd9BG9qnqcUeVndDxx/8f0mdfUqzNML2GC3+5RfGRNC94oJynSeO0oJhLpQuhG4wCAkr6Pmmhya2jv6qOE1dCOJx0uRU4zQaOfH06iFrKcJ43yafVFQ2Wyh2/2WPvlxaUgU+6ZQ/DrqZubrq63/K2YPbfgsy3LUWjZ/6mAPJBuEgjP8rKCiKb/vg35sSe/VD9KucZCrSbrLZsbIsbHH01SXkPX2cXEHIQOOs+Xkm7JmstAje2kkEQODXFPyTT5jr9gljnpUSAwVHjdnKV5p301Y05DOZa1MaaZvohEhXPNrz+qn9nUcBOl+wLC9jq8gHNg==
ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass (sender ip is
 165.204.84.17) smtp.rcpttodomain=huawei.com smtp.mailfrom=amd.com; dmarc=pass
 (p=quarantine sp=quarantine pct=100) action=none header.from=amd.com;
 dkim=none (message not signed); arc=none (0)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amd.com; s=selector1;
 h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck;
 bh=uMd1CfnTe8t6GkYY9JuUdNQ4t7SyudpvzhdCyYeRlR4=;
 b=QO87WxIIESFzO8VUsNLyzce8c+0RjFn/j2Qx7WfW29jpCgPhWPRwlAXTAO0n93sYE2wmyPrlFgGQRoqKxx/QThlKcF60tFwuEVeL0/ldiSiSG/mA9fU+XcDighFTnXNUJ/oBULjgm2mAK0f7AGFIqgXB1Gsu5sm0d/TleXA7n2k=
Received: from SJ0PR03CA0138.namprd03.prod.outlook.com (2603:10b6:a03:33c::23)
 by CH2PR12MB4152.namprd12.prod.outlook.com (2603:10b6:610:a7::8) with
 Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.9499.2; Fri, 9 Jan
 2026 04:50:17 +0000
Received: from SJ5PEPF000001F6.namprd05.prod.outlook.com
 (2603:10b6:a03:33c:cafe::1d) by SJ0PR03CA0138.outlook.office365.com
 (2603:10b6:a03:33c::23) with Microsoft SMTP Server (version=TLS1_3,
 cipher=TLS_AES_256_GCM_SHA384) id 15.20.9499.4 via Frontend Transport; Fri, 9
 Jan 2026 04:50:17 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 165.204.84.17)
 smtp.mailfrom=amd.com; dkim=none (message not signed)
 header.d=none;dmarc=pass action=none header.from=amd.com;
Received-SPF: Pass (protection.outlook.com: domain of amd.com designates
 165.204.84.17 as permitted sender) receiver=protection.outlook.com;
 client-ip=165.204.84.17; helo=satlexmb07.amd.com; pr=C
Received: from satlexmb07.amd.com (165.204.84.17) by
 SJ5PEPF000001F6.mail.protection.outlook.com (10.167.242.74) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.9520.1 via Frontend Transport; Fri, 9 Jan 2026 04:50:15 +0000
Received: from SATLEXMB04.amd.com (10.181.40.145) by satlexmb07.amd.com
 (10.181.42.216) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.2.2562.17; Thu, 8 Jan
 2026 22:50:15 -0600
Received: from satlexmb08.amd.com (10.181.42.217) by SATLEXMB04.amd.com
 (10.181.40.145) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.39; Thu, 8 Jan
 2026 22:50:14 -0600
Received: from [10.136.32.160] (10.180.168.240) by satlexmb08.amd.com
 (10.181.42.217) with Microsoft SMTP Server id 15.2.2562.17 via Frontend
 Transport; Thu, 8 Jan 2026 22:50:11 -0600
Message-ID: <0615d2c6-c963-46ff-9088-d85e3821eec8@amd.com>
Date: Fri, 9 Jan 2026 10:20:10 +0530
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] sched/fair: Fix vruntime drift by preventing double lag
 scaling during reweight
To: Zicheng Qu <quzicheng@huawei.com>, <mingo@redhat.com>,
	<peterz@infradead.org>, <juri.lelli@redhat.com>,
	<vincent.guittot@linaro.org>, <dietmar.eggemann@arm.com>,
	<rostedt@goodmis.org>, <bsegall@google.com>, <mgorman@suse.de>,
	<vschneid@redhat.com>, <linux-kernel@vger.kernel.org>
CC: <tanghui20@huawei.com>
References: <20251226001731.3730586-1-quzicheng@huawei.com>
Content-Language: en-US
From: K Prateek Nayak <kprateek.nayak@amd.com>
In-Reply-To: <20251226001731.3730586-1-quzicheng@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Received-SPF: None (SATLEXMB04.amd.com: kprateek.nayak@amd.com does not
 designate permitted sender hosts)
X-EOPAttributedMessage: 0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: SJ5PEPF000001F6:EE_|CH2PR12MB4152:EE_
X-MS-Office365-Filtering-Correlation-Id: 08872439-c95c-4692-8b43-08de4f3a9517
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-OriginatorOrg: amd.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 09 Jan 2026 04:50:15.9893
 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 08872439-c95c-4692-8b43-08de4f3a9517
X-MS-Exchange-CrossTenant-Id: 3dd8961f-e488-4e60-8e11-a82d994e183d
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=3dd8961f-e488-4e60-8e11-a82d994e183d;Ip=[165.204.84.17];Helo=[satlexmb07.amd.com]
X-MS-Exchange-CrossTenant-AuthSource:
	SJ5PEPF000001F6.namprd05.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: CH2PR12MB4152

Hello Zicheng,

On 12/26/2025 5:47 AM, Zicheng Qu wrote:
> In reweight_entity(), when reweighting a currently running entity (se ==
> cfs_rq->curr), the entity remains on the runqueue context without
> undergoing a full dequeue/enqueue cycle. This means avg_vruntime()
> remains constant throughout the reweight operation.
> 
> However, the current implementation calls place_entity(..., 0) at the
> end of reweight_entity(). Under EEVDF, place_entity() is designed to
> handle entities entering the runqueue and calculates the virtual lag
> (vlag) to account for the change in the weighted average vruntime (V)
> using the formula:
> 
> 	vlag' = vlag * (W + w_i) / W
> 
> Where 'W' is the current aggregate weight (including
> cfs_rq->curr->load.weight) and 'w_i' is the weight of the entity being
> enqueued (in this case, the se is exactly the cfs_rq->curr).
> 
> This leads to a "double scaling" logic for running entities:
> 1. reweight_entity() already rescales se->vlag based on the new weight
>    ratio.
> 2. place_entity() then mistakenly applies the (W + w_i)/W scaling again,
>    treating the reweight as a fresh enqueue into a new total weight
>    pool.
> 
> This can cause the entity's vlag to be amplified (if positive) or
> suppressed (if negative) incorrectly during the reweight process.
> 
> In environments with frequent cgroup throttle/unthrottle operations,
> this math error manifests as a vruntime drift.
> 
> A hungtask was observed as below:
> crash> runq -c 0 -g
> CPU 0
>   CURRENT: PID: 330440  TASK: ffff00004cd61540  COMMAND: "stress-ng"
>   ROOT_TASK_GROUP: ffff8001025fa4c0  RT_RQ: ffff0000fff42500
> 	 [no tasks queued]
>   ROOT_TASK_GROUP: ffff8001025fa4c0  CFS_RQ: ffff0000fff422c0
> 	 TASK_GROUP: ffff0000c130fc00  CFS_RQ: ffff00009125a400  <test_cg>	cfs_bandwidth: period=100000000, quota=18446744073709551615, gse: 0xffff000091258c00, vruntime=127285708384434, deadline=127285714880550, vlag=11721467, weight=338965, my_q=ffff00009125a400, cfs_rq: avg_vruntime=0, zero_vruntime=2029704519792, avg_load=0, nr_running=1
> 		TASK_GROUP: ffff0000d7cc8800  CFS_RQ: ffff0000c8f86800  <test_test329274_1>	cfs_bandwidth: period=14000000, quota=14000000, gse: 0xffff0000c8f86400, vruntime=2034894470719, deadline=2034898697770, vlag=0, weight=215291, my_q=ffff0000c8f86800, cfs_rq: avg_vruntime=-422528991, zero_vruntime=8444226681954, avg_load=54, nr_running=19
> 		   [110] PID: 330440  TASK: ffff00004cd61540  COMMAND: "stress-ng" [CURRENT]    vruntime=8444367524951, deadline=8444932411139, vlag=8444932411139, weight=3072, last_arrival=4002964107010, last_queued=0, exec_start=3872860294100, sum_exec_runtime=22252021900
> 		   ...
> 		   [110] PID: 330291  TASK: ffff0000c02c9540  COMMAND: "stress-ng"	vruntime=8444229273009, deadline=8444946073008, vlag=-2701415, weight=3072, last_arrival=4002964076840, last_queued=4002964550990, exec_start=3872859839290, sum_exec_runtime=22310951770
> 	 [100] PID: 97     TASK: ffff0000c2432a00  COMMAND: "kworker/0:1H"	vruntime=127285720095197, deadline=127285720119423, vlag=48453, weight=90891264, last_arrival=3846600432710, last_queued=3846600721010, exec_start=3743307237970, sum_exec_runtime=413405210
> 	 [120] PID: 15     TASK: ffff0000c0368080  COMMAND: "ksoftirqd/0"	vruntime=127285722433404, deadline=127285724533404, vlag=0, weight=1048576, last_arrival=3506755665780, last_queued=3506852159390, exec_start=3461615726670, sum_exec_runtime=16341041340
> 	 [120] PID: 50173  TASK: ffff0000741d8080  COMMAND: "kworker/0:0"	vruntime=127285722960040, deadline=127285725060040, vlag=-414755, weight=1048576, last_arrival=3506828139580, last_queued=3506972354700, exec_start=3461676584440, sum_exec_runtime=84414080
> 	 [120] PID: 58662  TASK: ffff000091180080  COMMAND: "kworker/0:2"	vruntime=127285723428168, deadline=127285725528168, vlag=3049158, weight=1048576, last_arrival=3505689085070, last_queued=3506848131990, exec_start=3460592328510, sum_exec_runtime=89193000
> 
> TASK 1 (systemd) is waiting for cgroup_mutex.
> TASK 329296 (sh) holds cgroup_mutex and is waiting for cpus_read_lock.
> TASK 50173 (kworker/0:0) holds the cpus_read_lock, but fails to be
> scheduled.
> test_cg and TASK 97 may have suppressed TASK 50173, causing
> it to not be scheduled for a long time, thus failing to release locks in
> a timely manner and ultimately causing a hungtask issue.
> 
> Fix by adding ENQUEUE_REWEIGHT_CURR flag and skipping vlag recalculation
> in place_entity() when reweighting the current running entity. For
> non-current entities, the existing logic remains as dequeue/enqueue
> changes avg_vruntime().
> 
> Fixes: 6d71a9c61604 ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag")
> Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
> ---
>  kernel/sched/fair.c  | 11 ++++++++++-
>  kernel/sched/sched.h |  1 +
>  2 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index da46c3164537..3be42729049e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3787,7 +3787,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>  
>  	enqueue_load_avg(cfs_rq, se);
>  	if (se->on_rq) {
> -		place_entity(cfs_rq, se, 0);
> +		place_entity(cfs_rq, se, curr ? ENQUEUE_REWEIGHT_CURR : 0);
>  		update_load_add(&cfs_rq->load, se->load.weight);
>  		if (!curr)
>  			__enqueue_entity(cfs_rq, se);
> @@ -5123,6 +5123,14 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  
>  		lag = se->vlag;
>  
> +		/*
> +		 * ENQUEUE_REWEIGHT_CURR:
> +		 * current running se (cfs_rq->curr) should skip vlag recalculation,
> +		 * because avg_vruntime(...) hasn't changed.
> +		 */
> +		if (flags & ENQUEUE_REWEIGHT_CURR)
> +			goto skip_lag_scale;

If I'm not mistaken, the problem is that we'll see "curr->on_rq" and
then do:

    if (curr && curr->on_rq)
        load += scale_load_down(curr->load.weight);

    lag *= load + scale_load_down(se->load.weight);


which shouldn't be the case: when "se" is also "curr", its weight is
counted twice, once via the "curr->on_rq" check and once more in the
final multiplication. avg_vruntime() would also have accounted for it
already since "curr->on_rq" is set.

I'm wondering if instead of adding a flag, we can do:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7377f9117501..7b4a7f4f2efa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3792,8 +3792,9 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {
 	bool curr = cfs_rq->curr == se;
+	bool queued = !!se->on_rq;
 
-	if (se->on_rq) {
+	if (queued) {
 		/* commit outstanding execution time */
 		update_curr(cfs_rq);
 		update_entity_lag(cfs_rq, se);
@@ -3803,6 +3804,12 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 		if (!curr)
 			__dequeue_entity(cfs_rq, se);
 		update_load_sub(&cfs_rq->load, se->load.weight);
+		/*
+		 * Indicate that se is off the cfs_rq for place_entity()
+		 * to correctly scale the weight especially when curr is
+		 * being placed back.
+		 */
+		se->on_rq = 0;
 	}
 	dequeue_load_avg(cfs_rq, se);
 
@@ -3823,12 +3830,14 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 	} while (0);
 
 	enqueue_load_avg(cfs_rq, se);
-	if (se->on_rq) {
+	if (queued) {
 		place_entity(cfs_rq, se, 0);
 		update_load_add(&cfs_rq->load, se->load.weight);
 		if (!curr)
 			__enqueue_entity(cfs_rq, se);
 		cfs_rq->nr_queued++;
+		/* Entity has been enqueued back. */
+		se->on_rq = 1;
 	}
 }
 
---

This matches what we do for curr in enqueue_entity() where we know
"cfs_rq->curr == se" but "se->on_rq == 0". Thoughts?

On a side note, I was looking at requeue_delayed_entity() and was
wondering if something like this makes sense there too, since it also
does a place_entity(). But then, an entity can never be both
"cfs_rq->curr" and delayed by the time we drop the rq_lock:

1) If se is ineligible, there must be another queued entity and if it is
   runnable, pick_task_fair() will pick the runnable entity and do an
   equivalent of (put_prev/set_next)_entity() to switch the
   "cfs_rq->curr" to the runnable hierarchy before dropping the rq_lock.

2) If everything is delayed, pick_next_entity() will dequeue them all
   completely before dropping the rq_lock for idle balancing.

FWIW, I've been running with:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7377f9117501..550bddfb2cc0 100644
@@ -6843,6 +6852,7 @@ requeue_delayed_entity(struct sched_entity *se)
 	 */
 	WARN_ON_ONCE(!se->sched_delayed);
 	WARN_ON_ONCE(!se->on_rq);
+	WARN_ON_ONCE(cfs_rq->curr == se);
 
 	if (sched_feat(DELAY_ZERO)) {
 		update_entity_lag(cfs_rq, se);
---

and I haven't seen any splats (yet!) :-)

Peter, thoughts?

> +
>  		/*
>  		 * If we want to place a task and preserve lag, we have to
>  		 * consider the effect of the new entity on the weighted
> @@ -5185,6 +5193,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  		lag = div_s64(lag, load);
>  	}
>  
> +skip_lag_scale:
>  	se->vruntime = vruntime - lag;
>  
>  	if (se->rel_deadline) {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index d30cca6870f5..e3a43f94dd2f 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2412,6 +2412,7 @@ extern const u32		sched_prio_to_wmult[40];
>  #define ENQUEUE_MIGRATED	0x00040000
>  #define ENQUEUE_INITIAL		0x00080000
>  #define ENQUEUE_RQ_SELECTED	0x00100000
> +#define ENQUEUE_REWEIGHT_CURR	0x00200000
>  
>  #define RETRY_TASK		((void *)-1UL)
>  

-- 
Thanks and Regards,
Prateek