Date: Sun, 5 Jan 2025 00:07:12 +0530
From: "Srivatsa S. Bhat"
To: Costa Shulyupin
Cc: Thomas Gleixner, Peter Zijlstra, Yury Norov, Andrew Morton,
 Valentin Schneider, Frederic Weisbecker, Neeraj Upadhyay,
 linux-kernel@vger.kernel.org, Waiman Long, x86@kernel.org,
 paulmck@kernel.org
Subject: Re: [RFC PATCH v1] stop_machine: Add stop_housekeeping_cpuslocked()
In-Reply-To: <20241218171531.2217275-1-costa.shul@redhat.com>

Hi Costa,

On Wed, Dec 18, 2024 at 07:15:31PM +0200, Costa Shulyupin wrote:
> CPU hotplug interferes with CPU isolation and introduces latency to
> real-time tasks.
>
> The test:
>
> rtla timerlat hist -c 1 -a 500 &
> echo 0 > /sys/devices/system/cpu/cpu2/online
>
> The RTLA tool reveals the following blocking thread stack trace:
>
> -> multi_cpu_stop
> -> cpu_stopper_thread
> -> smpboot_thread_fn
>
> This happens because multi_cpu_stop() disables interrupts for EACH online
> CPU since takedown_cpu() indirectly invokes take_cpu_down() through
> stop_machine_cpuslocked(). I'm omitting the detailed description of the
> call chain.
>

I had explored removing stop-machine from the CPU hotplug offline path
a very long time ago:

https://lore.kernel.org/all/20130218123714.26245.61816.stgit@srivatsabhat.in.ibm.com/

Towards the tail end of that patchset is the actual change that replaces
the call to __stop_machine() with stop_one_cpu():

https://lore.kernel.org/all/20130218124431.26245.10956.stgit@srivatsabhat.in.ibm.com/

But before that, ~45 patches in the series made sure that every existing
CPU hotplug callback (at the time, in that kernel version) relying on
implicit assumptions about the guarantees provided by stop_machine() was
adequately addressed with an alternative scheme before switching over to
stop_one_cpu() for CPU offlining.

> Proposal: Limit the stop operation to housekeeping CPUs.
>
> take_cpu_down() invokes with cpuhp_invoke_callback_range_nofail:
> - tick_cpu_dying()
> - hrtimers_cpu_dying()
> - smpcfd_dying_cpu()
> - x86_pmu_dying_cpu()
> - rcutree_dying_cpu()
> - sched_cpu_dying()
> - cache_ap_offline()
>
> Which synchronizations do these functions require instead of stop_machine?
>

I'd recommend taking a look at one such prior attempt to remove
stop_machine from CPU hotplug (shared above) for reference, as you
begin your analysis for the current kernel.

Regards,
Srivatsa
Microsoft Linux Systems Group