To: "xupengbo", "Ingo Molnar", "Peter Zijlstra"
Cc: "Juri Lelli", "Vincent Guittot", "Dietmar Eggemann", "Steven Rostedt", "Ben Segall", "Mel Gorman", "Valentin Schneider", "David Vernet"
Subject: Re: [PATCH v5] sched/fair: Fix unfairness caused by stalled tg_load_avg_contrib when the last task migrates out.
From: "Aaron Lu"
Date: Fri, 28 Nov 2025 19:54:45 +0800
Message-Id: <20251128115445.GA1526246@bytedance.com>
In-Reply-To: <20250827022208.14487-1-xupengbo@oppo.com>
References: <20250827022208.14487-1-xupengbo@oppo.com>
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

On Wed, Aug 27, 2025 at 10:22:07AM +0800, xupengbo wrote:
> When a task is migrated out, there is a probability that the tg->load_avg
> value will become abnormal. The reason is as follows.
>
> 1. Due to the 1ms update period limitation in update_tg_load_avg(), there
>    is a possibility that the reduced load_avg is not updated to tg->load_avg
>    when a task migrates out.
> 2. Even though __update_blocked_fair() traverses the leaf_cfs_rq_list and
>    calls update_tg_load_avg() for cfs_rqs that are not fully decayed, the
>    key function cfs_rq_is_decayed() does not check whether
>    cfs_rq->tg_load_avg_contrib is null. Consequently, in some cases,
>    __update_blocked_fair() removes cfs_rqs whose avg.load_avg has not been
>    updated to tg->load_avg.
>
> Add a check of cfs_rq->tg_load_avg_contrib in cfs_rq_is_decayed(),
> which fixes the case (2.) mentioned above.
>
> Fixes: 1528c661c24b ("sched/fair: Ratelimit update to tg->load_avg")
> Tested-by: Aaron Lu
> Reviewed-by: Aaron Lu
> Reviewed-by: Vincent Guittot
> Signed-off-by: xupengbo

I wonder if there are any remaining concerns about this patch? If not, I
hope this fix can be merged. It's a rare case, but it does happen with
some specific setups.

Sorry if the timing is bad, but I just hit an oncall where this exact
problem occurred, so I figured it was worth a ping :)

Best regards,
Aaron