Date: Wed, 11 Mar 2026 12:06:22 +0100
From: Frederic Weisbecker
To: "Christoph Lameter (Ampere)"
Cc: Shubhang Kaushik, Anna-Maria Behnsen, Ingo Molnar, Thomas Gleixner,
    Vincent Guittot, Valentin Schneider, dietmar.eggemann@arm.com,
    bsegall@google.com, mgorman@suse.de, rostedt@goodmis.org,
    linux-kernel@vger.kernel.org, Adam Li
Subject: Re: [RESEND PATCH] tick/nohz: Fix wrong NOHZ idle CPU state
References: <20260203-fix-nohz-idle-v1-1-ad05a5872080@os.amperecomputing.com>
    <8ae7f176-2e1c-2390-a7da-a694df4b551e@gentwo.org>
In-Reply-To: <8ae7f176-2e1c-2390-a7da-a694df4b551e@gentwo.org>

On Fri, Feb 13, 2026 at 10:15:15AM -0800, Christoph Lameter (Ampere) wrote:
> On Fri, 13 Feb 2026, Frederic Weisbecker wrote:
>
> > Then there seems to be something else going on that we don't fully understand
> > because isolated CPUs run 1 pinned task per CPU and the only housekeeping CPU
> > is CPU 0. So there is nothing to balance here.
> >
> > Perhaps some CPUs spend too much time scanning through all isolated CPUs to
> > see if there is balancing to do. I don't know, this needs further investigation.
> > But if the nohz_full CPUs are correctly domain isolated as they should be
> > (through isolcpus=domain or cpuset isolated partitions), they should be
> > invisible to ilb anyway.
>
>
> "balancing" would mean moving tasks from busy cpus (that are not in
> NOHZ_FULL state) to idle cpus that can then be in NOHZ_FULL state.
>
> If the move from a busy cpu to an idle cpu succeeds then both cpus may
> only run one process and be able to enter NOHZ_FULL.
>
> This is, e.g., the case with thread pools used by certain AI apps. Before
> the app starts, numactl is used to set up a group of cpus that the app can use.
>
> One may optimize and allow NOHZ_FULL for these cpus.
>
> The app will then create a number of threads during its startup phase.
> These should all be placed on idle cpus in the allowed cpu range.
>
> If this is configured the right way then each thread is on a different cpu
> and there is one thread per cpu so that we can use NOHZ_FULL.
>
> This is sometimes broken because not all idle cpus are used. Instead some
> cpus get two threads and other cpus stay idle. That is why idle load
> balancing is needed.

Which means you guys eventually rely on load balancing... So I can only
repeat what I said there:

https://lore.kernel.org/lkml/aY3k1_JJjPFUhPd4@localhost.localdomain/

> There is no cpu isolation/cgroups or other black magic involved here.

Too bad, static task placement would fix your issue and domain isolation
would improve your workload.

Thanks.

-- 
Frederic Weisbecker
SUSE Labs
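
[Editorial illustration, not part of the original message.] The "static task placement" suggested above could, under some assumptions, look like the minimal sketch below: each worker thread pins itself to one distinct CPU from the process's allowed set (for instance the set handed over by numactl), so that one thread per CPU no longer depends on idle load balancing. The helper names `assign_cpus`, `worker` and `run_pinned` are hypothetical; the sketch is Linux-only and says nothing about how the application in this thread actually creates its threads.

```python
import os
import threading


def assign_cpus(allowed_cpus, n_threads):
    """Map each worker index to one distinct CPU (hypothetical helper)."""
    cpus = sorted(allowed_cpus)
    if n_threads > len(cpus):
        # One thread per CPU is impossible if there are more threads than CPUs.
        raise ValueError("more threads than allowed CPUs")
    return {index: cpus[index] for index in range(n_threads)}


def worker(cpu, index, results):
    # On Linux, pid 0 refers to the calling thread, so each worker pins itself.
    os.sched_setaffinity(0, {cpu})
    # Record the affinity mask actually in effect for this thread.
    results[index] = os.sched_getaffinity(0)


def run_pinned(n_threads):
    """Start n_threads workers, each pinned to its own CPU, and return their masks."""
    allowed = os.sched_getaffinity(0)  # CPUs this process may currently use
    placement = assign_cpus(allowed, n_threads)
    results = {}
    threads = [
        threading.Thread(target=worker, args=(cpu, index, results))
        for index, cpu in placement.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    # Pin one worker per allowed CPU and show the resulting placement.
    for index, mask in sorted(run_pinned(len(os.sched_getaffinity(0))).items()):
        print(f"worker {index} -> CPUs {sorted(mask)}")
```

Pinning only fixes placement. Whether the CPUs can then stay in NOHZ_FULL still depends on the boot-time and cpuset configuration discussed in the thread (nohz_full, isolcpus=domain or cpuset isolated partitions).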