From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from smtp.kernel.org (aws-us-west-2-korg-mail-alma10-1.taild15c8.ts.net [100.103.45.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AEF553D45C1; Fri, 3 Jul 2026 13:19:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=100.103.45.18 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1783084773; cv=none; b=hW6dqwlNeqOvlrGboRFIhNFtlpIzP/YZedNEBXkUbNx1eflRTpQ3p5O6UZAj49SNIeG0XGnLu5JjwwfX/sG+o0lNW7vv5Be77Z37kf8XxE14FtOXA6Ab+ohZU+/5RNX+gvg8ak8Co28OitpDBXZw9rBC/xlem5p8nyZrbNQuSFU= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1783084773; c=relaxed/simple; bh=SybsfnBwWh6wNYO6K17YybWLHNe2wFkmCwK5CiUpNw8=; h=Date:From:To:Cc:Subject:Message-ID:References:MIME-Version: Content-Type:Content-Disposition:In-Reply-To; b=XqKv/Q/JW7yWA/IBNdU5YL0Aiw+GJMJS/fXgcyo7Qv/VDGQXSka55y0BLQ+iLCxImH9r5Z6BNoqGdNLc5LPP220vxAKVm9CYw0o18voLss+N1JCRPrvSlZKS3AZb2x6PYzt2ScivcXeH0bXohXD+wEw/6PfqqCVPGNZEOY6qCws= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=QOE+X9Tc; arc=none smtp.client-ip=100.103.45.18 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="QOE+X9Tc" Received: by smtp.kernel.org (Postfix) with ESMTPSA id CE7C41F000E9; Fri, 3 Jul 2026 13:19:31 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=kernel.org; s=k20260515; t=1783084772; bh=nQj/hR3UQosCLZxgAly99fHbBbLd46rIDcrsPC7zt0Y=; h=Date:From:To:Cc:Subject:References:In-Reply-To; b=QOE+X9Tc6w7Al+6bRdD2DCLmRFcVP3+Apmw+i8eugOlGjisfMQa7KfblbyOmJ1bgS hwibJ4dstELm2Q++c9s1rXuAx5KlK8klqEc5gYTFi45Ow3239Cl843wnhFoQOAIj2+ E2moHoWghnK4DR2XdCbqrF2H09E1E/8UYuCAoaCKS4iSasC0Jq5Rpv6voChRUH0h7k wXywDHjK+YiIIr/u1pRtDZ1HcUekQqxYMHKpjCWsAZV4h0yg9LBK9T6YXNBrMekQNl Q7HpGNGpyvLwtGYpR9rxY+OQWlbcrz68J/3lyyMecJu4tIg43kYWthZ6d4McEmnZ5Q aF7C+y7vWCbEQ== Date: Fri, 3 Jul 2026 15:19:29 +0200 From: Frederic Weisbecker To: Waiman Long Cc: Jing Wu , Thomas Gleixner , linux-kernel@vger.kernel.org, rcu@vger.kernel.org, cgroups@vger.kernel.org, Qiliang Yuan Subject: Re: [PATCH-next 00/23] cgroup/cpuset: Enable runtime update of nohz_full and managed_irq CPUs Message-ID: References: <20260421030351.281436-1-longman@redhat.com> <20260624063404.2106807-1-realwujing@gmail.com> <4ad24488-9cc1-4f1c-8dc5-6830ae7420df@redhat.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: Le Wed, Jul 01, 2026 at 02:56:34PM -0400, Waiman Long a écrit : > On 7/1/26 10:22 AM, Frederic Weisbecker wrote: > > Le Thu, Jun 25, 2026 at 01:27:54AM -0400, Waiman Long a écrit : > > > On 6/24/26 2:34 AM, Jing Wu wrote: > > > > 3. Are there specific patches in your series where you would welcome > > > > our contribution directly? > > > I have broken down the shutdown callback into separate portions as suggested > > > by Thomas. The other major change that I am working on is to try to shutdown > > > to only CPUHP_AP_OFFLINE state instead of all the way down to CPUHP_OFFLINE. > > What was the reason for that already? Can we perhaps ask the user to offline > > the target CPUs before toggling isolation on them? > The major problem about fully offlining the CPU is the CPU hotplug stop > machine mechanism which put all the CPUs except the CPU to be offlined in a > waiting loop within the IPI handler when the offline CPU is transitioning > from CPUHP_TEARDOWN_CPU to  CPUHP_AP_IDLE_DEAD. If there is another active > isolated partition running DPDK, for instance, it will break the low latency > guarantee for a short duration. Looks like a long standing problem that does not only concern nohz_full but also RT in general. I made a proposal a while ago to solve this: https://lore.kernel.org/lkml/aQuNdOEmPYkI03my@localhost.localdomain/ To summarize, we could remove that stop machine thing and have this on the outgoing CPU at CPUHP_TEARDOWN_CPU: set_cpu_online(cpu, 0) synchronize_rcu() migrate things // call CPUHP_TEARDOWN_CPU -> CPUHP_AP_IDLE_DEAD And on other CPUs the usual should work: preempt_disable() // could now be replaced with rcu_read_lock() if (cpu_online(target)) // do things preempt_enable() There are a few dragons on the way in the update side but nothing unsolvable as far as I checked. Of course we must check all those callbacks one by one. Also on the read side we must be careful because: rcu_read_lock() A = cpu_online(target)) B = cpu_online(target)) rcu_read_unlock() We can now have A && !B but I doubt many callsites do that. > > > That will require some adjustments to the nohz_full related hotplug > > > functions. I have some ideas of what needs to be done. However, I haven't > > > looked into RCU yet. I know RCU support changing the nocb mask for fully > > > offline CPUs, I will need to find out if it possible to do that for > > > partially offline CPUs. > > No because callbacks can still be enqueued at this stage. But we could > > manage to make it work with CPUHP_AP_IDLE_DEAD. > > If we can only go as high as CPUHP_AP_IDLE_DEAD, we may as well go down all > the way to CPUHP_OFFLINE as stop machine should be done at > CPUHP_AP_IDLE_DEAD. In that case, we may have to break RCU out from > HK_TYPE_KERNEL_NOISE and add a cpuset control switch for the system > administrators to decide if they are willing to suffer a brief latency spike > for an existing isolated partition or keep the RCU housekeeping mask > unchanged to avoid that when creating a new or destroying an old isolated > partition. Halfway nohz_full doesn't sound good... Thanks. -- Frederic Weisbecker SUSE Labs