From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from galois.linutronix.de (Galois.linutronix.de [193.142.43.55]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFDBE1D61B1 for ; Thu, 19 Dec 2024 21:38:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=193.142.43.55 ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734644312; cv=none; b=nE5wwLFE6Va0ssZO5fbK27j+xeHvVsvZiF87I5zAbnGH5wSQKJrJux/B0m2ks/1ODGXVUiqrr9kXkC/XqD+j6Q8vcarYKaE8TTbECXDZQ7wE6108zbzznM7j3cQI7Ov/Nf+zEpGNq8nEAPQOmV/FXgyC+xtwYdR1jJCABJJAguw= ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1734644312; c=relaxed/simple; bh=D8sbOHXFOY2EN0fSHJyCUVCgjpQluKQLk2M4WSmQBeE=; h=From:To:Subject:In-Reply-To:References:Date:Message-ID: MIME-Version:Content-Type; b=IhMfw11O0mmSFxr/fl73gvqQdeivjWThoYJHPRQBsCz8sUZ2sAAivfc72Tk2AIl1h5EXCG8RFyZyus4p3DHQ4L3QyOWXOwbwxoMXRAKrC7l5gdteV6DQPb/CB8mBJRnYkgqcx5nh9Q/8Fo0lVH7EyFFCWsiFMVOamQbH2dnIop4= ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de; spf=pass smtp.mailfrom=linutronix.de; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=om0xomOr; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b=Qv6PgzGk; arc=none smtp.client-ip=193.142.43.55 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linutronix.de Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=linutronix.de Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="om0xomOr"; dkim=permerror (0-bit key) header.d=linutronix.de header.i=@linutronix.de header.b="Qv6PgzGk" From: Thomas Gleixner DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020; t=1734644308; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FddzshwyoO0fybx3gFO4DRoxp50NsYlwaVQEVJz9Zyk=; b=om0xomOrkjrlzYMkwmhOBriXOm3WR4J2UfNDndzO70nG2AvGpQWjAS3fG3JnszfT7och1P MPfXpcJjMU3YvHBp4jW7JFX9ekJJqIa6cfhx6JWnv8M3utDpvRw2ePrQJo/fKZu4l7Vyoh Tt5r7X01LfRNLwEq5rttrmZzWADmzmNitE4kqIv0SVU3NiKUAYEpvwUujfzQTAsOF6+hZl 0XP5HIGc8f4ifMkNuY0BSTslAAXKLZnsFKlRxQZm8yHGXQ1KXPgeJ3xMkXfQikaUP/MrTT gmB3A0P8ISnltNCGPrOgzryvWEuMM6kYxuqF5dKsomI5iU3Tb6vpcZsrD19IeQ== DKIM-Signature: v=1; a=ed25519-sha256; c=relaxed/relaxed; d=linutronix.de; s=2020e; t=1734644308; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=FddzshwyoO0fybx3gFO4DRoxp50NsYlwaVQEVJz9Zyk=; b=Qv6PgzGkuIR5HuVtJD6RiHSQ76lu96lv0i0ooM3xxLsCpTHPApBajI0ZnQBBN74AGskEC6 XRZcjFf8AHu+zSCg== To: Costa Shulyupin , Peter Zijlstra , Yury Norov , Andrew Morton , Costa Shulyupin , Valentin Schneider , Frederic Weisbecker , Neeraj Upadhyay , linux-kernel@vger.kernel.org, Waiman Long , x86@kernel.org Subject: Re: [RFC PATCH v1] stop_machine: Add stop_housekeeping_cpuslocked() In-Reply-To: <20241218171531.2217275-1-costa.shul@redhat.com> References: <20241218171531.2217275-1-costa.shul@redhat.com> Date: Thu, 19 Dec 2024 22:38:28 +0100 Message-ID: <87frmj73qj.ffs@tglx> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain Costa! On Wed, Dec 18 2024 at 19:15, Costa Shulyupin wrote: > CPU hotplug interferes with CPU isolation and introduces latency to > real-time tasks. > > The test: > > rtla timerlat hist -c 1 -a 500 & > echo 0 > /sys/devices/system/cpu/cpu2/online > > The RTLA tool reveals the following blocking thread stack trace: > > -> multi_cpu_stop > -> cpu_stopper_thread > -> smpboot_thread_fn > > This happens because multi_cpu_stop() disables interrupts for EACH online > CPU since takedown_cpu() indirectly invokes take_cpu_down() through > stop_machine_cpuslocked(). I'm omitting the detailed description of the > call chain. TLDR: It's well known that taking a CPU offline is done via stop_machine(), which requires all online CPUs to spin in the stopper thread. > Proposal: Limit the stop operation to housekeeping CPUs. You proposed other non-working solutions before. How is that any different? Especially after Peter told you how to proceed: https://lore.kernel.org/all/20241209095730.GG21636@noisy.programming.kicks-ass.net He said after you tinkered with stop_one_cpu(): "Audit the full cpu hotplug stack and determine who all relies on this 'implicit' serialization, then proceed to provide alternative solutions for these sites. Then, at the very end move to stop_one_cpu()." Where is that audit or the argument why you don't need the audit because you now restrict stop_machine() to the housekeeping CPUs? > take_cpu_down() invokes with cpuhp_invoke_callback_range_nofail: > - tick_cpu_dying() > - hrtimers_cpu_dying() > - smpcfd_dying_cpu() > - x86_pmu_dying_cpu() > - rcutree_dying_cpu() > - sched_cpu_dying() > - cache_ap_offline() That's what happens to be called with your kernel config on your test system. What about the other 50+ CPU hotplug states in that range, which are .config or architecture dependent? > Which synchronizations do these functions require instead of > stop_machine? So you propose a change without knowing what the consequences are and expect us to do the audit (aka. your homework) and provide it to you on a silver tablet? Actually you are asking the wrong question. Peter told you to determine which other parts of the kernel rely on that implicit serialization for a reason: The callbacks on the unplugged CPU are not the main problem, but still they have to be audited one by one. The real problem is the state change caused by these callbacks, which is visible to the rest of the system. Which consequences has such a state change, if it is not serialized via stop_machine(). And that's what Peter asked you to audit and provide, no? > Passed standard regression tests: > - https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-9/-/merge_requests/6042 > - https://gitlab.com/redhat/centos-stream/src/kernel/centos-stream-10/-/merge_requests/89 > > Passes primitive CPU hotplug test without warnings. Q: What is the value of this statement? A: Zero, as the change comes with zero analysis and the random centos regression tests based on primitive CPU hotplug tests are as useful as running hello_world(). > What tests and test suites do you recommend? You are not really asking me to explain the usage of git grep and gurgle search? Let me give you a hint. You now propose to only stop the housekeeping CPUs because (my assumption) you assume that all isolated CPUs are happily computing along in user space or in a VM, which is equivalent from the kernel POV. Q: How is that assumption justified? A: Not at all. [1] There is no guarantee that isolated CPUs (user/vm) are not inside the kernel, are not coming back to the kernel while the unplug is in progress or come back to the kernel post unplug and observe stale state. That's not something which can be answered by random test suites. That's something which has to be audited and argued for correctness, which is what you have been asked to provide, right? The test suites are there to validate correctness and to eventually find the odd corner cases you missed in your analysis and which were missed by people reviewing it. They cannot replace upfront analysis as their test coverage is not complete. Your approach to solve this problem seems to be to throw random sh^Wideas at the wall and see what sticks. That's not how it works. You want this "feature", so it's your task to do the homework, i.e. the analysis you have been asked for, and to argue why your changes are correct under all circumstances and not only under a set of your made up assumptions. Even if these assumptions match the reality of your use case, you still have to provide the argument and the prove that there is no other use case, which makes your assumptions moot. For illustration let me go back to [1] above. Assume there are two housekeeping CPUs and two isolated CPUs. HKCPU0 HKCPU1 ISOLCPU0 ISOLCPU1 dont'care don't care user space don't care // context unplug(ISOLCPU1) CPU hotplug thread invokes all thread context callbacks takedown_cpu() stop_machine(HKCPUS) stopper_thread() stopper_thread() stopper_thread() local_irq_disable(); // All HK CPUs arrived invoke_dying_callbacks() syscall/vmexit() .... smp_function_call(cpu_online_mask, wait=true) .... wait_for_completion() set_cpu_offline(false); ISOLCPU0 now waits forever because ISOLCPU1 cannot process the IPI as it has interrupts disabled already and had not yet removed itself in the CPU online mask wehn ISOLCPU0 sent the IPI. That's one simple example which makes your assumption completely moot as you cannot make any unconditionally valid assumptions about user space or VM behaviour. There are tons of other examples, but that's not the maintainers/reviewers job to explain them to you. It's your job to explain to the maintainers/reviewers why these problems do not exist according to your fact based analysis. Please come back when you can provide such fact based analysis and spare us the next random "brilliant" idea thrown at the wall. Thanks, tglx