From mboxrd@z Thu Jan 1 00:00:00 1970
From: Fernand Sieber <sieberf@amazon.com>
To: Vincent Guittot
CC: Peter Zijlstra, ...
Subject: Re: [PATCH] sched/fair: Force idle aware load balancing
Date: Mon, 1 Dec 2025 14:58:49 +0200
Message-ID: <20251201125851.272237-1-sieberf@amazon.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To:
References:
 <20251127202719.963766-1-sieberf@amazon.com>
 <20251128111427.GJ3245006@noisy.programming.kicks-ass.net>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Fri, 28 Nov 2025 at 14:50, Vincent Guittot wrote:
> On Fri, 28 Nov 2025 at 12:14, Peter Zijlstra wrote:
> >
> > On Thu, Nov 27, 2025 at 10:27:17PM +0200, Fernand Sieber wrote:
> >
> > > @@ -11123,7 +11136,8 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
> > >  		return;
> > >  	}
> > >
> > > -	if (busiest->group_type == group_smt_balance) {
> > > +	if (busiest->group_type == group_smt_balance ||
> > > +	    busiest->forceidle_weight) {
> >
> > Should we not instead make it so that we select group_smt_balance in
> > this case?
>
> Why do we need this test ? We have already removed forced idle cpus
> from statistics ?
>
> I suppose Fernand wants to cover cases where there is 1 task per CPU
> so we are balanced but one CPU is forced idle and we want to force
> migrating a task to then try to move back another one ? In this case
> it should be detected early and become group_imbalanced type
> Also what happens if we could migrate more than one task

I've removed this override in v2; it doesn't seem to make much of a
difference after doing more benchmarking.

When I traced LB inefficiencies, I noticed that in some situations a
large imbalance (overloaded vs spare capacity) was detected, but
remediation was delayed. So the intention of the override was to
"nudge" the LB into taking a remediation action immediately,
regardless of the load to move, with the idea that it's better to
migrate anything now than to waste capacity in force idle for longer.
This override was probably not the right tool for that.
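For illustration, the "balanced but forced idle" scenario Vincent describes (one runnable task per CPU, so plain nr_running accounting sees no imbalance) can be modeled with a toy userspace check. This is only a sketch: `toy_group_stats` and its `forceidle_weight` field are made-up stand-ins for the idea, not the actual fair.c statistics or the patch's implementation:

```c
#include <assert.h>

/* Toy per-group stats, loosely inspired by sg_lb_stats-style fields. */
struct toy_group_stats {
	unsigned int sum_nr_running;   /* runnable tasks in the group */
	unsigned int group_weight;     /* number of CPUs in the group */
	unsigned int forceidle_weight; /* CPUs held in forced idle (illustrative) */
};

/* Naive check: one task per CPU looks perfectly balanced, so no pull. */
static int naive_needs_balance(const struct toy_group_stats *busiest,
			       const struct toy_group_stats *local)
{
	return busiest->sum_nr_running > busiest->group_weight &&
	       local->sum_nr_running < local->group_weight;
}

/*
 * Force-idle aware check: CPUs sitting in forced idle contribute no real
 * capacity, so compare runnable tasks against the *effective* width of the
 * group. A nominally balanced busiest group then still reports work to pull.
 */
static int fi_aware_needs_balance(const struct toy_group_stats *busiest,
				  const struct toy_group_stats *local)
{
	unsigned int effective = busiest->group_weight - busiest->forceidle_weight;

	return busiest->sum_nr_running > effective &&
	       local->sum_nr_running < local->group_weight;
}
```

With 4 tasks on 4 CPUs of which one is forced idle, the naive check sees balance while the force-idle aware check flags the group, which is the gap the override was trying to close.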
If I get a chance I'll try to dive deeper and provide more details.

A different thing I noticed is that the task_hot() check includes a
cookie check which is more or less bound to fail on a busy, large
system running lots of tasks with different cookies (e.g. a hypervisor
on large servers with cookied, time-shared vCPUs), because there's
almost zero chance that the target CPU happens to be running the same
cookie as the migrating task. This delays migrations unnecessarily
when the run queues are short and there are no valid spare candidates.
I need to think more about that one, but if you have any ideas let me
know. Maybe instead of this check, the list of migrating tasks should
be sorted to prioritize tasks with a matching cookie first, if any,
similar to what is proposed in the cache aware scheduling RFC?
https://lwn.net/ml/all/26e7bfa88163e13ba1ebefbb54ecf5f42d84f884.1760206683.git.tim.c.chen@linux.intel.com/

Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07
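The sorting idea above could look something like the following. This is a userspace sketch under stated assumptions: a plain array stands in for the rq's task list, `toy_task`, `cookie` and `prioritize_matching_cookie()` are made-up names, and a stable partition is used so relative order is preserved within each class, mimicking how a detach_tasks()-style scan would then try compatible tasks first:

```c
#include <assert.h>
#include <stddef.h>

/* Toy migration candidate; 'cookie' stands in for a core-sched cookie. */
struct toy_task {
	int pid;
	unsigned long cookie;
};

/*
 * Stable partition: move tasks whose cookie matches the destination CPU's
 * currently running cookie to the front of the candidate list, preserving
 * relative order. A scan over the list then considers compatible tasks
 * first instead of rejecting candidates one by one on a cookie mismatch.
 */
static void prioritize_matching_cookie(struct toy_task *tasks, size_t n,
				       unsigned long dst_cookie)
{
	struct toy_task tmp;
	size_t front = 0; /* end of the matching-cookie prefix */

	for (size_t i = 0; i < n; i++) {
		if (tasks[i].cookie == dst_cookie) {
			tmp = tasks[i];
			/* shift the non-matching run up by one slot */
			for (size_t j = i; j > front; j--)
				tasks[j] = tasks[j - 1];
			tasks[front++] = tmp;
		}
	}
}
```

For example, with candidates carrying cookies {7, 3, 7, 3, 9} and a destination cookie of 3, the two cookie-3 tasks end up first (in their original relative order), followed by the others unchanged.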