From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <0615d2c6-c963-46ff-9088-d85e3821eec8@amd.com>
Date: Fri, 9 Jan 2026 10:20:10 +0530
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH] sched/fair: Fix vruntime drift by preventing double lag scaling during reweight
To: Zicheng Qu
References: <20251226001731.3730586-1-quzicheng@huawei.com>
Content-Language: en-US
From: K Prateek Nayak
In-Reply-To: <20251226001731.3730586-1-quzicheng@huawei.com>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit

Hello Zicheng,

On 12/26/2025 5:47 AM, Zicheng Qu wrote:
> In reweight_entity(), when reweighting a currently running entity (se ==
> cfs_rq->curr), the entity remains on the runqueue context without
> undergoing a full dequeue/enqueue cycle. This means avg_vruntime()
> remains constant throughout the reweight operation.
>
> However, the current implementation calls place_entity(..., 0) at the
> end of reweight_entity(). Under EEVDF, place_entity() is designed to
> handle entities entering the runqueue and calculates the virtual lag
> (vlag) to account for the change in the weighted average vruntime (V)
> using the formula:
>
>     vlag' = vlag * (W + w_i) / W
>
> Where 'W' is the current aggregate weight (including
> cfs_rq->curr->load.weight) and 'w_i' is the weight of the entity being
> enqueued (in this case, the se is exactly the cfs_rq->curr).
>
> This leads to a "double scaling" logic for running entities:
> 1. reweight_entity() already rescales se->vlag based on the new weight
>    ratio.
> 2. place_entity() then mistakenly applies the (W + w_i)/W scaling again,
>    treating the reweight as a fresh enqueue into a new total weight
>    pool.
>
> This can cause the entity's vlag to be amplified (if positive) or
> suppressed (if negative) incorrectly during the reweight process.
>
> In environments with frequent cgroup throttle/unthrottle operations,
> this math error manifests as a vruntime drift.
>
> A hungtask was observed as below:
> crash> runq -c 0 -g
> CPU 0
> CURRENT: PID: 330440 TASK: ffff00004cd61540 COMMAND: "stress-ng"
> ROOT_TASK_GROUP: ffff8001025fa4c0 RT_RQ: ffff0000fff42500
> [no tasks queued]
> ROOT_TASK_GROUP: ffff8001025fa4c0 CFS_RQ: ffff0000fff422c0
> TASK_GROUP: ffff0000c130fc00 CFS_RQ: ffff00009125a400 cfs_bandwidth: period=100000000, quota=18446744073709551615, gse: 0xffff000091258c00, vruntime=127285708384434, deadline=127285714880550, vlag=11721467, weight=338965, my_q=ffff00009125a400, cfs_rq: avg_vruntime=0, zero_vruntime=2029704519792, avg_load=0, nr_running=1
> TASK_GROUP: ffff0000d7cc8800 CFS_RQ: ffff0000c8f86800 cfs_bandwidth: period=14000000, quota=14000000, gse: 0xffff0000c8f86400, vruntime=2034894470719, deadline=2034898697770, vlag=0, weight=215291, my_q=ffff0000c8f86800, cfs_rq: avg_vruntime=-422528991, zero_vruntime=8444226681954, avg_load=54, nr_running=19
> [110] PID: 330440 TASK: ffff00004cd61540 COMMAND: "stress-ng" [CURRENT] vruntime=8444367524951, deadline=8444932411139, vlag=8444932411139, weight=3072, last_arrival=4002964107010, last_queued=0, exec_start=3872860294100, sum_exec_runtime=22252021900
> ...
> [110] PID: 330291 TASK: ffff0000c02c9540 COMMAND: "stress-ng" vruntime=8444229273009, deadline=8444946073008, vlag=-2701415, weight=3072, last_arrival=4002964076840, last_queued=4002964550990, exec_start=3872859839290, sum_exec_runtime=22310951770
> [100] PID: 97 TASK: ffff0000c2432a00 COMMAND: "kworker/0:1H" vruntime=127285720095197, deadline=127285720119423, vlag=48453, weight=90891264, last_arrival=3846600432710, last_queued=3846600721010, exec_start=3743307237970, sum_exec_runtime=413405210
> [120] PID: 15 TASK: ffff0000c0368080 COMMAND: "ksoftirqd/0" vruntime=127285722433404, deadline=127285724533404, vlag=0, weight=1048576, last_arrival=3506755665780, last_queued=3506852159390, exec_start=3461615726670, sum_exec_runtime=16341041340
> [120] PID: 50173 TASK: ffff0000741d8080 COMMAND: "kworker/0:0" vruntime=127285722960040, deadline=127285725060040, vlag=-414755, weight=1048576, last_arrival=3506828139580, last_queued=3506972354700, exec_start=3461676584440, sum_exec_runtime=84414080
> [120] PID: 58662 TASK: ffff000091180080 COMMAND: "kworker/0:2" vruntime=127285723428168, deadline=127285725528168, vlag=3049158, weight=1048576, last_arrival=3505689085070, last_queued=3506848131990, exec_start=3460592328510, sum_exec_runtime=89193000
>
> TASK 1 (systemd) is waiting for cgroup_mutex.
> TASK 329296 (sh) holds cgroup_mutex and is waiting for cpus_read_lock.
> TASK 50173 (kworker/0:0) holds the cpus_read_lock, but fail to be
> scheduled.
> test_cg and TASK 97 may have suppressed TASK 50173, causing
> it to not be scheduled for a long time, thus failing to release locks in
> a timely manner and ultimately causing a hungtask issue.
>
> Fix by adding ENQUEUE_REWEIGHT_CURR flag and skipping vlag recalculation
> in place_entity() when reweighting the current running entity. For
> non-current entities, the existing logic remains as dequeue/enqueue
> changes avg_vruntime().
>
> Fixes: 6d71a9c61604 ("sched/fair: Fix EEVDF entity placement bug causing scheduling lag")
> Signed-off-by: Zicheng Qu
> ---
>  kernel/sched/fair.c  | 11 ++++++++++-
>  kernel/sched/sched.h |  1 +
>  2 files changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index da46c3164537..3be42729049e 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -3787,7 +3787,7 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
>
>  	enqueue_load_avg(cfs_rq, se);
>  	if (se->on_rq) {
> -		place_entity(cfs_rq, se, 0);
> +		place_entity(cfs_rq, se, curr ? ENQUEUE_REWEIGHT_CURR : 0);
>  		update_load_add(&cfs_rq->load, se->load.weight);
>  		if (!curr)
>  			__enqueue_entity(cfs_rq, se);
> @@ -5123,6 +5123,14 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>
>  		lag = se->vlag;
>
> +		/*
> +		 * ENQUEUE_REWEIGHT_CURR:
> +		 * current running se (cfs_rq->curr) should skip vlag recalculation,
> +		 * because avg_vruntime(...) hasn't changed.
> +		 */
> +		if (flags & ENQUEUE_REWEIGHT_CURR)
> +			goto skip_lag_scale;

If I'm not mistaken, the problem is that we'll see "curr->on_rq" and
then do:

    if (curr && curr->on_rq)
        load += scale_load_down(curr->load.weight);

    lag *= load + scale_load_down(se->load.weight);

which shouldn't be the case: when "se" is also the "curr", its weight
is added to "load" and then added again in the multiplier.
avg_vruntime() would also have accounted for it already since
"curr->on_rq" is set, so we end up doing everything twice for "se".
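
To put numbers on the double accounting, here is a minimal userspace sketch of that scaling step. It is not kernel code: the toy_se and toy_cfs_rq structures and toy_scaled_lag() are hypothetical stand-ins, the weights are assumed to be already scaled down, and avg_load is assumed to cover only the queued entities other than curr.

```c
#include <stdint.h>

/* Hypothetical stand-in for a sched_entity; only the fields used here. */
struct toy_se {
	int64_t vlag;	/* lag to be preserved across placement */
	int64_t weight;	/* assumed to be already scaled down */
	int on_rq;	/* non-zero while the entity counts as queued */
};

/* Hypothetical stand-in for a cfs_rq. */
struct toy_cfs_rq {
	int64_t avg_load;	/* assumed: sum of queued weights, curr excluded */
	struct toy_se *curr;	/* currently running entity, may be NULL */
};

/*
 * Mirror of the scaling step quoted above: add curr's weight to the
 * load if curr is on the runqueue, then compute
 * lag * (load + w_se) / load.
 */
static int64_t toy_scaled_lag(const struct toy_cfs_rq *rq,
			      const struct toy_se *se)
{
	int64_t load = rq->avg_load;
	int64_t lag = se->vlag;

	if (rq->curr && rq->curr->on_rq)
		load += rq->curr->weight;

	lag *= load + se->weight;
	if (load == 0)
		load = 1;	/* avoid dividing by zero in this toy model */

	return lag / load;
}
```

With avg_load = 3072 and se == curr with weight 1024 and vlag 1000, leaving on_rq set makes the load 4096, so the multiplier is 4096 + 1024 and the result is 1250: the weight of "se" shows up in both terms. Clearing on_rq first gives 1000 * 4096 / 3072 = 1333, which is the (W + w_i) / W form with W covering only the other entities.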

I'm wondering if, instead of adding a flag, we can do:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7377f9117501..7b4a7f4f2efa 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3792,8 +3792,9 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 			    unsigned long weight)
 {
 	bool curr = cfs_rq->curr == se;
+	bool queued = !!se->on_rq;
 
-	if (se->on_rq) {
+	if (queued) {
 		/* commit outstanding execution time */
 		update_curr(cfs_rq);
 		update_entity_lag(cfs_rq, se);
@@ -3803,6 +3804,12 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 		if (!curr)
 			__dequeue_entity(cfs_rq, se);
 		update_load_sub(&cfs_rq->load, se->load.weight);
+		/*
+		 * Indicate that se is off the cfs_rq for place_entity()
+		 * to correctly scale the weight especially when curr is
+		 * being placed back.
+		 */
+		se->on_rq = 0;
 	}
 	dequeue_load_avg(cfs_rq, se);
 
@@ -3823,12 +3830,14 @@ static void reweight_entity(struct cfs_rq *cfs_rq, struct sched_entity *se,
 	} while (0);
 
 	enqueue_load_avg(cfs_rq, se);
-	if (se->on_rq) {
+	if (queued) {
 		place_entity(cfs_rq, se, 0);
 		update_load_add(&cfs_rq->load, se->load.weight);
 		if (!curr)
 			__enqueue_entity(cfs_rq, se);
 		cfs_rq->nr_queued++;
+		/* Entity has been enqueued back. */
+		se->on_rq = 1;
 	}
 }
---

This matches what we do for curr in enqueue_entity() where we know
"cfs_rq->curr == se" but "se->on_rq == 0". Thoughts?

On a side note, I was looking at requeue_delayed_entity() and was
wondering if something like this makes sense there, since it also does
a place_entity(). But then, an entity can never be both "cfs_rq->curr"
and delayed when we drop the rq_lock:

1) If se is ineligible, there must be another queued entity and if it
   is runnable, pick_task_fair() will pick the runnable entity and do
   an equivalent of (put_prev/set_next)_entity() to switch the
   "cfs_rq->curr" to the runnable hierarchy before dropping the
   rq_lock.

2) If everything is delayed, pick_next_entity() will dequeue them all
   completely before dropping the rq_lock for idle balancing.

FWIW, I've been running with:

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7377f9117501..550bddfb2cc0 100644
@@ -6843,6 +6852,7 @@ requeue_delayed_entity(struct sched_entity *se)
 	 */
 	WARN_ON_ONCE(!se->sched_delayed);
 	WARN_ON_ONCE(!se->on_rq);
+	WARN_ON_ONCE(cfs_rq->curr == se);
 
 	if (sched_feat(DELAY_ZERO)) {
 		update_entity_lag(cfs_rq, se);
---

and I haven't seen any splats (yet!) :-)

Peter, thoughts?

> +
>  		/*
>  		 * If we want to place a task and preserve lag, we have to
>  		 * consider the effect of the new entity on the weighted
> @@ -5185,6 +5193,7 @@ place_entity(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
>  		lag = div_s64(lag, load);
>  	}
>
> +skip_lag_scale:
>  	se->vruntime = vruntime - lag;
>
>  	if (se->rel_deadline) {
> diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
> index d30cca6870f5..e3a43f94dd2f 100644
> --- a/kernel/sched/sched.h
> +++ b/kernel/sched/sched.h
> @@ -2412,6 +2412,7 @@ extern const u32 sched_prio_to_wmult[40];
>  #define ENQUEUE_MIGRATED	0x00040000
>  #define ENQUEUE_INITIAL		0x00080000
>  #define ENQUEUE_RQ_SELECTED	0x00100000
> +#define ENQUEUE_REWEIGHT_CURR	0x00200000
>
>  #define RETRY_TASK		((void *)-1UL)
>

-- 
Thanks and Regards,
Prateek