[PATCH 0/2] Reduce CPU usage when finished handling panic

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH 0/2] Reduce CPU usage when finished handling panic
@ 2025-03-26 15:12 carlos.bilbao
  2025-03-26 15:12 ` [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic carlos.bilbao
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: carlos.bilbao @ 2025-03-26 15:12 UTC (permalink / raw)
  To: tglx
  Cc: bilbao, pmladek, akpm, jan.glauber, jani.nikula, linux-kernel,
	gregkh, takakura, john.ogness, Carlos Bilbao

From: Carlos Bilbao <cbilbao@digitalocean.com>

After the kernel has finished handling a panic, it enters a busy-wait loop.
But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
this negatively impacts the throughput of other VM guests running on the
same hypervisor.

This patch set introduces a weak function cpu_halt_after_panic() to give
architectures the option to halt the CPU during this state while still
allowing interrupts to be processed. Do so for arch/x86 by defining the
weak function and calling safe_halt().

Here's some numbers to support my claim, the perf stats from the hypervisor
after triggering a panic on a guest Linux kernel.

Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
Overhead  Command          Shared Object            Symbol
  42.20%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vmexit
  19.07%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_spec_ctrl_restore_host
   9.73%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_enter_exit
   3.60%  CPU 5/KVM        [kernel.kallsyms]        [k] __flush_smp_call_function_queue
   2.91%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_run
   2.85%  CPU 5/KVM        [kernel.kallsyms]        [k] native_irq_return_iret
   2.67%  CPU 5/KVM        [kernel.kallsyms]        [k] native_flush_tlb_one_user
   2.16%  CPU 5/KVM        [kernel.kallsyms]        [k] llist_reverse_order
   2.10%  CPU 5/KVM        [kernel.kallsyms]        [k] __srcu_read_lock
   2.08%  CPU 5/KVM        [kernel.kallsyms]        [k] flush_tlb_func
   1.52%  CPU 5/KVM        [kernel.kallsyms]        [k] vcpu_enter_guest.constprop.0

And here are the results from the guest VM after applying my patch:

Samples: 51  of event 'cycles:P', Event count (approx.): 37553709
Overhead  Command          Shared Object            Symbol
   7.94%  qemu-system-x86  [kernel.kallsyms]        [k] __schedule
   7.94%  qemu-system-x86  libc.so.6                [.] 0x00000000000a2702
   7.94%  qemu-system-x86  qemu-system-x86_64       [.] 0x000000000057603c
   7.43%  qemu-system-x86  libc.so.6                [.] malloc
   7.43%  qemu-system-x86  libc.so.6                [.] 0x00000000001af9c0
   6.37%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_mutex_unlock
   5.21%  IO mon_iothread  [kernel.kallsyms]        [k] __pollwait
   4.70%  IO mon_iothread  [kernel.kallsyms]        [k] clear_bhb_loop
   3.56%  IO mon_iothread  [kernel.kallsyms]        [k] __secure_computing
   3.56%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_query
   3.15%  IO mon_iothread  [kernel.kallsyms]        [k] __hrtimer_start_range_ns
   3.15%  IO mon_iothread  [kernel.kallsyms]        [k] _raw_spin_lock_irq
   2.88%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_prepare
   2.83%  qemu-system-x86  libglib-2.0.so.0.7200.4  [.] g_slist_foreach
   2.58%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004e820d
   2.21%  qemu-system-x86  libc.so.6                [.] 0x0000000000088010
   1.94%  IO mon_iothread  [kernel.kallsyms]        [k] arch_exit_to_user_mode_prepar

As you can see, CPU consumption is significantly reduced after applying the
proposed change after panic logic, with KVM-related functions (e.g.,
vmx_vmexit()) dropping from more than 70% of CPU usage to virtually
nothing. Also, the num of samples decreased from 55K to 51 and the event
count dropped from 36.09 billion to 37.55 million.

Carlos Bilbao at DigitalOcean (2):
  panic: Allow archs to reduce CPU consumption after panic
  x86/panic: Use safe_halt() for CPU halt after panic

---

 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/panic.c  |  9 +++++++++
 kernel/panic.c           | 12 +++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/panic.c


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
  2025-03-26 15:12 [PATCH 0/2] Reduce CPU usage when finished handling panic carlos.bilbao
@ 2025-03-26 15:12 ` carlos.bilbao
  2025-04-11 14:03   ` Petr Mladek
  2025-03-26 15:12 ` [PATCH 2/2] x86/panic: Use safe_halt() for CPU halt " carlos.bilbao
  2025-04-10 17:30 ` Reminder: [PATCH 0/2] Reduce CPU usage when finished handling panic Carlos Bilbao
  2 siblings, 1 reply; 8+ messages in thread
From: carlos.bilbao @ 2025-03-26 15:12 UTC (permalink / raw)
  To: tglx
  Cc: bilbao, pmladek, akpm, jan.glauber, jani.nikula, linux-kernel,
	gregkh, takakura, john.ogness, Carlos Bilbao

From: Carlos Bilbao <carlos.bilbao@kernel.org>

After handling a panic, the kernel enters a busy-wait loop, unnecessarily
consuming CPU and potentially impacting other workloads including other
guest VMs in the case of virtualized setups.

Introduce cpu_halt_after_panic(), a weak function that archs can override
for a more efficient halt of CPU work. By default, it preserves the
pre-existing behavior of delay.

Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
---
 kernel/panic.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index fbc59b3b64d0..fafe3fa22533 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
 		crash_smp_send_stop();
 }
 
+/*
+ * Called after a kernel panic has been handled, at which stage halting
+ * the CPU can help reduce unnecessary CPU consumption. In the absence of
+ * arch-specific implementations, just delay
+ */
+static void __weak cpu_halt_after_panic(void)
+{
+	mdelay(PANIC_TIMER_STEP);
+}
+
 /**
  *	panic - halt the system
  *	@fmt: The text string to print
@@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
 			i += panic_blink(state ^= 1);
 			i_next = i + 3600 / PANIC_BLINK_SPD;
 		}
-		mdelay(PANIC_TIMER_STEP);
+		cpu_halt_after_panic();
 	}
 }
 
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH 2/2] x86/panic: Use safe_halt() for CPU halt after panic
  2025-03-26 15:12 [PATCH 0/2] Reduce CPU usage when finished handling panic carlos.bilbao
  2025-03-26 15:12 ` [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic carlos.bilbao
@ 2025-03-26 15:12 ` carlos.bilbao
  2025-04-10 17:30 ` Reminder: [PATCH 0/2] Reduce CPU usage when finished handling panic Carlos Bilbao
  2 siblings, 0 replies; 8+ messages in thread
From: carlos.bilbao @ 2025-03-26 15:12 UTC (permalink / raw)
  To: tglx
  Cc: bilbao, pmladek, akpm, jan.glauber, jani.nikula, linux-kernel,
	gregkh, takakura, john.ogness, Carlos Bilbao

From: Carlos Bilbao <carlos.bilbao@kernel.org>

Include an arch/x86-specific function to halt CPU once the kernel has
finished handling a panic, by defining the weak function
cpu_halt_after_panic().

Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
---
 arch/x86/kernel/Makefile | 1 +
 arch/x86/kernel/panic.c  | 9 +++++++++
 2 files changed, 10 insertions(+)
 create mode 100644 arch/x86/kernel/panic.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index f7918980667a..0e4d87596f5d 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -85,6 +85,7 @@ obj-y				+= stacktrace.o
 obj-y				+= cpu/
 obj-y				+= acpi/
 obj-y				+= reboot.o
+obj-y				+= panic.o
 obj-$(CONFIG_X86_MSR)		+= msr.o
 obj-$(CONFIG_X86_CPUID)		+= cpuid.o
 obj-$(CONFIG_PCI)		+= early-quirks.o
diff --git a/arch/x86/kernel/panic.c b/arch/x86/kernel/panic.c
new file mode 100644
index 000000000000..0838bb7cf9a9
--- /dev/null
+++ b/arch/x86/kernel/panic.c
@@ -0,0 +1,9 @@
+// SPDX-License-Identifier: GPL-2.0
+// Author Carlos Bilbao <carlos.bilbao@kernel.org>
+#include <linux/kernel.h>
+#include <asm/processor.h>
+
+void cpu_halt_after_panic(void)
+{
+	safe_halt();
+}
-- 
2.47.1


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Reminder: [PATCH 0/2] Reduce CPU usage when finished handling panic
  2025-03-26 15:12 [PATCH 0/2] Reduce CPU usage when finished handling panic carlos.bilbao
  2025-03-26 15:12 ` [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic carlos.bilbao
  2025-03-26 15:12 ` [PATCH 2/2] x86/panic: Use safe_halt() for CPU halt " carlos.bilbao
@ 2025-04-10 17:30 ` Carlos Bilbao
  2 siblings, 0 replies; 8+ messages in thread
From: Carlos Bilbao @ 2025-04-10 17:30 UTC (permalink / raw)
  To: carlos.bilbao, tglx, pmladek, akpm
  Cc: jan.glauber, jani.nikula, linux-kernel, gregkh, takakura,
	john.ogness, Carlos Bilbao

Hello again,

I would really appreciate your opinions on this.

Thanks!
Carlos

On 3/26/25 10:12, carlos.bilbao@kernel.org wrote:
> From: Carlos Bilbao <cbilbao@digitalocean.com>
> 
> After the kernel has finished handling a panic, it enters a busy-wait loop.
> But, this unnecessarily consumes CPU power and electricity. Plus, in VMs,
> this negatively impacts the throughput of other VM guests running on the
> same hypervisor.
> 
> This patch set introduces a weak function cpu_halt_after_panic() to give
> architectures the option to halt the CPU during this state while still
> allowing interrupts to be processed. Do so for arch/x86 by defining the
> weak function and calling safe_halt().
> 
> Here's some numbers to support my claim, the perf stats from the hypervisor
> after triggering a panic on a guest Linux kernel.
> 
> Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
> Overhead  Command          Shared Object            Symbol
>   42.20%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vmexit
>   19.07%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_spec_ctrl_restore_host
>    9.73%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_enter_exit
>    3.60%  CPU 5/KVM        [kernel.kallsyms]        [k] __flush_smp_call_function_queue
>    2.91%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_run
>    2.85%  CPU 5/KVM        [kernel.kallsyms]        [k] native_irq_return_iret
>    2.67%  CPU 5/KVM        [kernel.kallsyms]        [k] native_flush_tlb_one_user
>    2.16%  CPU 5/KVM        [kernel.kallsyms]        [k] llist_reverse_order
>    2.10%  CPU 5/KVM        [kernel.kallsyms]        [k] __srcu_read_lock
>    2.08%  CPU 5/KVM        [kernel.kallsyms]        [k] flush_tlb_func
>    1.52%  CPU 5/KVM        [kernel.kallsyms]        [k] vcpu_enter_guest.constprop.0
> 
> And here are the results from the guest VM after applying my patch:
> 
> Samples: 51  of event 'cycles:P', Event count (approx.): 37553709
> Overhead  Command          Shared Object            Symbol
>    7.94%  qemu-system-x86  [kernel.kallsyms]        [k] __schedule
>    7.94%  qemu-system-x86  libc.so.6                [.] 0x00000000000a2702
>    7.94%  qemu-system-x86  qemu-system-x86_64       [.] 0x000000000057603c
>    7.43%  qemu-system-x86  libc.so.6                [.] malloc
>    7.43%  qemu-system-x86  libc.so.6                [.] 0x00000000001af9c0
>    6.37%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_mutex_unlock
>    5.21%  IO mon_iothread  [kernel.kallsyms]        [k] __pollwait
>    4.70%  IO mon_iothread  [kernel.kallsyms]        [k] clear_bhb_loop
>    3.56%  IO mon_iothread  [kernel.kallsyms]        [k] __secure_computing
>    3.56%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_query
>    3.15%  IO mon_iothread  [kernel.kallsyms]        [k] __hrtimer_start_range_ns
>    3.15%  IO mon_iothread  [kernel.kallsyms]        [k] _raw_spin_lock_irq
>    2.88%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_prepare
>    2.83%  qemu-system-x86  libglib-2.0.so.0.7200.4  [.] g_slist_foreach
>    2.58%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004e820d
>    2.21%  qemu-system-x86  libc.so.6                [.] 0x0000000000088010
>    1.94%  IO mon_iothread  [kernel.kallsyms]        [k] arch_exit_to_user_mode_prepar
> 
> As you can see, CPU consumption is significantly reduced after applying the
> proposed change after panic logic, with KVM-related functions (e.g.,
> vmx_vmexit()) dropping from more than 70% of CPU usage to virtually
> nothing. Also, the num of samples decreased from 55K to 51 and the event
> count dropped from 36.09 billion to 37.55 million.
> 
> Carlos Bilbao at DigitalOcean (2):
>   panic: Allow archs to reduce CPU consumption after panic
>   x86/panic: Use safe_halt() for CPU halt after panic
> 
> ---
> 
>  arch/x86/kernel/Makefile |  1 +
>  arch/x86/kernel/panic.c  |  9 +++++++++
>  kernel/panic.c           | 12 +++++++++++-
>  3 files changed, 21 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/kernel/panic.c
> 
> 
> From mboxrd@z Thu Jan  1 00:00:00 1970
> Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
> 	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
> 	(No client certificate requested)
> 	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7B42E1F4174
> 	for <linux-kernel@vger.kernel.org>; Wed, 26 Mar 2025 15:12:15 +0000 (UTC)
> Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=10.30.226.201
> ARC-Seal:i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
> 	t=1743001935; cv=none; b=pTx5wAwLeH5sWAAgsmlCk1lZgzSyUJH4X0UwzbEXvNm3EDKfoAwmJNvbIAk6ESdDQZ4j/9u/Tr51T9mIAGBteoeogjNzS7CEhokwMvfjLwfK/GZHSzyN+0oqtMptT829NvzENA2BVex1DLKAjsePN5nTlrf3/WMHr1bcmQSYBG4=
> ARC-Message-Signature:i=1; a=rsa-sha256; d=subspace.kernel.org;
> 	s=arc-20240116; t=1743001935; c=relaxed/simple;
> 	bh=dvH6cZROBDqL/EIJB0ddluLgh3GMP5pgXEtD5g291tI=;
> 	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
> 	 MIME-Version; b=oQxBjv4Hpv/rJEVcoN/5DAetBXYAQcfNM++r5iZT8phmtHiLu/rCJ3KAEuqzqy6ffyuEgAPLj8oD9G5nwxUFtscLmkYOL1LlhmcNF5Qtdfnmpdbtsd6oCsCd+9eV0diUhXdALWysZAF6aQXpSZw5LUfT8xresIHTaKWrp6pvX7A=
> ARC-Authentication-Results:i=1; smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=MurPLbRg; arc=none smtp.client-ip=10.30.226.201
> Authentication-Results: smtp.subspace.kernel.org;
> 	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b="MurPLbRg"
> Received: by smtp.kernel.org (Postfix) with ESMTPSA id 4D6FDC4CEE8;
> 	Wed, 26 Mar 2025 15:12:14 +0000 (UTC)
> DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
> 	s=k20201202; t=1743001935;
> 	bh=dvH6cZROBDqL/EIJB0ddluLgh3GMP5pgXEtD5g291tI=;
> 	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
> 	b=MurPLbRgOhvxG7DGoeI4e6uf1uBNgQKuYEVn+R9J1Ys/ntkU8s2GjleUUf4P5gSje
> 	 K0Qw27qmTj6yClEmUiZYU3Jw1dUraF20/y3Y5X2ULu4JIBKzJDJcs5zPefI7Hkzoie
> 	 vbSpNhTmjCjRrUQu0tIv9GAwFTQynj6olDOMx+wonf4CXVF2xg0OSv6n4KuZs9Plps
> 	 V14SmWmJUQLArdVDliLtaFaZ+VR12eQgLTD7XuLG8HExBuGdATgUYre2U3B9lGEfxr
> 	 RcHi7NoRsrkmWAkQfXjInPNCwOkLvWM6CaaRHxsMWSD8aK5/8DS82WxDealKGyUyX0
> 	 LuAEXKNekpppw==
> From: carlos.bilbao@kernel.org
> To: tglx@linutronix.de
> Cc: bilbao@vt.edu,
> 	pmladek@suse.com,
> 	akpm@linux-foundation.org,
> 	jan.glauber@gmail.com,
> 	jani.nikula@intel.com,
> 	linux-kernel@vger.kernel.org,
> 	gregkh@linuxfoundation.org,
> 	takakura@valinux.co.jp,
> 	john.ogness@linutronix.de,
> 	Carlos Bilbao <carlos.bilbao@kernel.org>
> Subject: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
> Date: Wed, 26 Mar 2025 10:12:03 -0500
> Message-ID: <20250326151204.67898-2-carlos.bilbao@kernel.org>
> X-Mailer: git-send-email 2.47.1
> In-Reply-To: <20250326151204.67898-1-carlos.bilbao@kernel.org>
> References: <20250326151204.67898-1-carlos.bilbao@kernel.org>
> Precedence: bulk
> X-Mailing-List: linux-kernel@vger.kernel.org
> List-Id: <linux-kernel.vger.kernel.org>
> List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
> List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
> MIME-Version: 1.0
> Content-Transfer-Encoding: 8bit
> 
> From: Carlos Bilbao <carlos.bilbao@kernel.org>
> 
> After handling a panic, the kernel enters a busy-wait loop, unnecessarily
> consuming CPU and potentially impacting other workloads including other
> guest VMs in the case of virtualized setups.
> 
> Introduce cpu_halt_after_panic(), a weak function that archs can override
> for a more efficient halt of CPU work. By default, it preserves the
> pre-existing behavior of delay.
> 
> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
> Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
> ---
>  kernel/panic.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..fafe3fa22533 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>  		crash_smp_send_stop();
>  }
>  
> +/*
> + * Called after a kernel panic has been handled, at which stage halting
> + * the CPU can help reduce unnecessary CPU consumption. In the absence of
> + * arch-specific implementations, just delay
> + */
> +static void __weak cpu_halt_after_panic(void)
> +{
> +	mdelay(PANIC_TIMER_STEP);
> +}
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
>  			i += panic_blink(state ^= 1);
>  			i_next = i + 3600 / PANIC_BLINK_SPD;
>  		}
> -		mdelay(PANIC_TIMER_STEP);
> +		cpu_halt_after_panic();
>  	}
>  }
>  


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
  2025-03-26 15:12 ` [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic carlos.bilbao
@ 2025-04-11 14:03   ` Petr Mladek
  2025-04-11 16:31     ` Sean Christopherson
  0 siblings, 1 reply; 8+ messages in thread
From: Petr Mladek @ 2025-04-11 14:03 UTC (permalink / raw)
  To: carlos.bilbao
  Cc: tglx, bilbao, akpm, jan.glauber, jani.nikula, linux-kernel,
	gregkh, takakura, john.ogness, Linus Torvalds, Jiri Slaby

Added Linus and Jiri (tty) into Cc.

On Wed 2025-03-26 10:12:03, carlos.bilbao@kernel.org wrote:
> From: Carlos Bilbao <carlos.bilbao@kernel.org>
> 
> After handling a panic, the kernel enters a busy-wait loop, unnecessarily
> consuming CPU and potentially impacting other workloads including other
> guest VMs in the case of virtualized setups.
> 
> Introduce cpu_halt_after_panic(), a weak function that archs can override
> for a more efficient halt of CPU work. By default, it preserves the
> pre-existing behavior of delay.
>
> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
> Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
> ---
>  kernel/panic.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index fbc59b3b64d0..fafe3fa22533 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>  		crash_smp_send_stop();
>  }
>  
> +/*
> + * Called after a kernel panic has been handled, at which stage halting
> + * the CPU can help reduce unnecessary CPU consumption. In the absence of
> + * arch-specific implementations, just delay
> + */
> +static void __weak cpu_halt_after_panic(void)
> +{
> +	mdelay(PANIC_TIMER_STEP);
> +}
> +
>  /**
>   *	panic - halt the system
>   *	@fmt: The text string to print
> @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
>  			i += panic_blink(state ^= 1);
>  			i_next = i + 3600 / PANIC_BLINK_SPD;
>  		}
> -		mdelay(PANIC_TIMER_STEP);
> +		cpu_halt_after_panic();

The 2nd patch implements this functions using safe_halt().

IMHO, it makes perfect sense to halt a virtualized system or
a system without a registered "graphical" console.

I think that the blinking diods were designed for desktops
and laptops with some "graphical" terminal attached. The
infinite loop allows to read the last lines, ideally
the backtrace on the screen.

I wonder if we could make it more clever and do the halt
only when the system is virtualized or when there is no
"graphical" console registered.

One way to detect graphical console might a check of
the existence of the c->unblank callback, see console_unblank().
But I am not sure if it is good enough.

Best Regards,
Petr

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
  2025-04-11 14:03   ` Petr Mladek
@ 2025-04-11 16:31     ` Sean Christopherson
  2025-04-14 10:02       ` Jan Glauber
  2025-04-22 12:44       ` Carlos Bilbao
  0 siblings, 2 replies; 8+ messages in thread
From: Sean Christopherson @ 2025-04-11 16:31 UTC (permalink / raw)
  To: Petr Mladek
  Cc: carlos.bilbao, tglx, bilbao, akpm, jan.glauber, jani.nikula,
	linux-kernel, gregkh, takakura, john.ogness, Linus Torvalds,
	Jiri Slaby

On Fri, Apr 11, 2025, Petr Mladek wrote:
> Added Linus and Jiri (tty) into Cc.
> 
> On Wed 2025-03-26 10:12:03, carlos.bilbao@kernel.org wrote:
> > From: Carlos Bilbao <carlos.bilbao@kernel.org>
> > 
> > After handling a panic, the kernel enters a busy-wait loop, unnecessarily
> > consuming CPU and potentially impacting other workloads including other
> > guest VMs in the case of virtualized setups.

Impacting other guests isn't the guest kernel's problem.  If the host has heavily
overcommited CPUs and can't meet SLOs because VMs are panicking and not rebooting,
that's a host problem.

This could become a customer problem if they're getting billed based on CPU usage,
but I don't know that simply doing HLT is the best solution.  E.g. advising the
customer to configure their kernels to kexec into a kdump kernel or to reboot
on panic, seems like it would provide a better overall experience for most.

QEMU (assuming y'all use QEMU) also supports a pvpanic device, so unless the VM
and/or customer is using a funky setup, the host should already know the guest
has panicked.  At that point, the host can make appropiate scheduling decisions,
e.g. userspace can simply stop running the VM after a certain timeout, throttle
it, jail it, etc.

> > Introduce cpu_halt_after_panic(), a weak function that archs can override
> > for a more efficient halt of CPU work. By default, it preserves the
> > pre-existing behavior of delay.
> >
> > Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
> > Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
> > ---
> >  kernel/panic.c | 12 +++++++++++-
> >  1 file changed, 11 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index fbc59b3b64d0..fafe3fa22533 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
> >  		crash_smp_send_stop();
> >  }
> >  
> > +/*
> > + * Called after a kernel panic has been handled, at which stage halting
> > + * the CPU can help reduce unnecessary CPU consumption. In the absence of
> > + * arch-specific implementations, just delay
> > + */
> > +static void __weak cpu_halt_after_panic(void)
> > +{
> > +	mdelay(PANIC_TIMER_STEP);
> > +}
> > +
> >  /**
> >   *	panic - halt the system
> >   *	@fmt: The text string to print
> > @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
> >  			i += panic_blink(state ^= 1);
> >  			i_next = i + 3600 / PANIC_BLINK_SPD;
> >  		}
> > -		mdelay(PANIC_TIMER_STEP);
> > +		cpu_halt_after_panic();
> 
> The 2nd patch implements this functions using safe_halt().
> 
> IMHO, it makes perfect sense to halt a virtualized system or

I really, really don't want to arbitrarily single out VMs, especially in core
kernel code, as doing so risks creating problems that are unique to VMs.  If the
behavior is based on a virtualization-agnostic heuristic like "no console", then
by all means, but please don't condition something like this purely based on
running in a VM.

I also have no objection to the host/hypervisor prescribing specific behavior
(see below).  What I don't like is the kernel deciding to do things differently
because it's a VM, without any knowledge of the environment in which the VM is
running.

> a system without a registered "graphical" console.
>
> I think that the blinking diods were designed for desktops
> and laptops with some "graphical" terminal attached. The
> infinite loop allows to read the last lines, ideally
> the backtrace on the screen.
> 
> I wonder if we could make it more clever and do the halt
> only when the system is virtualized or when there is no
> "graphical" console registered.

Could we do something with panic_notifier_list?  E.g. let panic devices override
the default post-print behavior.  Then VMs with a paravirt panic interface could
make an explicit call out to the host instead of relying on arch specific code
to HLT and hope for the best.  And maybe console drivers that want/need to keep
the CPU running can nullify panic_halt?

E.g. with a rudimentary priority system so that paravirt devices can override
everything else.

diff --git a/kernel/panic.c b/kernel/panic.c
index d8635d5cecb2..fe9007222308 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -141,6 +141,18 @@ static long no_blink(int state)
 long (*panic_blink)(int state);
 EXPORT_SYMBOL(panic_blink);

+static void (*panic_halt)(void) = cpu_halt_after_panic;
+static int panic_hlt_priority;
+
+void panic_set_hlt(void (*fn)(void), int priority)
+{
+       if (priority <= panic_hlt_priority)
+               return;
+
+       panic_halt = fn;
+}
+EXPORT_SYMBOL_GPL(panic_set_hlt);
+
 /*
  * Stop ourself in panic -- architecture code may override this
  */
@@ -467,6 +479,9 @@ void panic(const char *fmt, ...)
        console_flush_on_panic(CONSOLE_FLUSH_PENDING);
        nbcon_atomic_flush_unsafe();

+       if (panic_halt)
+               panic_halt();
+
        local_irq_enable();
        for (i = 0; ; i += PANIC_TIMER_STEP) {
                touch_softlockup_watchdog();

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
  2025-04-11 16:31     ` Sean Christopherson
@ 2025-04-14 10:02       ` Jan Glauber
  2025-04-22 12:44       ` Carlos Bilbao
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Glauber @ 2025-04-14 10:02 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Petr Mladek, carlos.bilbao, tglx, bilbao, akpm, jan.glauber,
	jani.nikula, linux-kernel, gregkh, takakura, john.ogness,
	Linus Torvalds, Jiri Slaby

On Fri, Apr 11, 2025 at 09:31:11AM -0700, Sean Christopherson wrote:
> On Fri, Apr 11, 2025, Petr Mladek wrote:
> > Added Linus and Jiri (tty) into Cc.
> > 
> > On Wed 2025-03-26 10:12:03, carlos.bilbao@kernel.org wrote:
> > > From: Carlos Bilbao <carlos.bilbao@kernel.org>
> > > 
> > > After handling a panic, the kernel enters a busy-wait loop, unnecessarily
> > > consuming CPU and potentially impacting other workloads including other
> > > guest VMs in the case of virtualized setups.
> 
> Impacting other guests isn't the guest kernel's problem.  If the host has heavily
> overcommited CPUs and can't meet SLOs because VMs are panicking and not rebooting,
> that's a host problem.

Agreed, and it might be worth having the host detect panics.

> This could become a customer problem if they're getting billed based on CPU usage,
> but I don't know that simply doing HLT is the best solution.  E.g. advising the
> customer to configure their kernels to kexec into a kdump kernel or to reboot
> on panic, seems like it would provide a better overall experience for most.

Reboot on panic would probably make most sense. But this is not about
the lack of configuration options or possibilites to deal with paniced
guests. This is plainly about the default option that we set. And
while there are good alternatives, most people or setups will just stick
to the defaults in my experience.

So is a busy loop for panicked virtual guest the default that we want to set or
can we change that case to something more sensible?

> QEMU (assuming y'all use QEMU) also supports a pvpanic device, so unless the VM
> and/or customer is using a funky setup, the host should already know the guest
> has panicked.  At that point, the host can make appropiate scheduling decisions,
> e.g. userspace can simply stop running the VM after a certain timeout, throttle
> it, jail it, etc.

pvpanic is the right solution, but again this is an external solution
that requires additional setup.

--Jan

> > > Introduce cpu_halt_after_panic(), a weak function that archs can override
> > > for a more efficient halt of CPU work. By default, it preserves the
> > > pre-existing behavior of delay.
> > >
> > > Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
> > > Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
> > > ---
> > >  kernel/panic.c | 12 +++++++++++-
> > >  1 file changed, 11 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/kernel/panic.c b/kernel/panic.c
> > > index fbc59b3b64d0..fafe3fa22533 100644
> > > --- a/kernel/panic.c
> > > +++ b/kernel/panic.c
> > > @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
> > >  		crash_smp_send_stop();
> > >  }
> > >  
> > > +/*
> > > + * Called after a kernel panic has been handled, at which stage halting
> > > + * the CPU can help reduce unnecessary CPU consumption. In the absence of
> > > + * arch-specific implementations, just delay
> > > + */
> > > +static void __weak cpu_halt_after_panic(void)
> > > +{
> > > +	mdelay(PANIC_TIMER_STEP);
> > > +}
> > > +
> > >  /**
> > >   *	panic - halt the system
> > >   *	@fmt: The text string to print
> > > @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
> > >  			i += panic_blink(state ^= 1);
> > >  			i_next = i + 3600 / PANIC_BLINK_SPD;
> > >  		}
> > > -		mdelay(PANIC_TIMER_STEP);
> > > +		cpu_halt_after_panic();
> > 
> > The 2nd patch implements this functions using safe_halt().
> > 
> > IMHO, it makes perfect sense to halt a virtualized system or
> 
> I really, really don't want to arbitrarily single out VMs, especially in core
> kernel code, as doing so risks creating problems that are unique to VMs.  If the
> behavior is based on a virtualization-agnostic heuristic like "no console", then
> by all means, but please don't condition something like this purely based on
> running in a VM.
> 
> I also have no objection to the host/hypervisor prescribing specific behavior
> (see below).  What I don't like is the kernel deciding to do things differently
> because it's a VM, without any knowledge of the environment in which the VM is
> running.
> 
> > a system without a registered "graphical" console.
> >
> > I think that the blinking diods were designed for desktops
> > and laptops with some "graphical" terminal attached. The
> > infinite loop allows to read the last lines, ideally
> > the backtrace on the screen.
> > 
> > I wonder if we could make it more clever and do the halt
> > only when the system is virtualized or when there is no
> > "graphical" console registered.
> 
> Could we do something with panic_notifier_list?  E.g. let panic devices override
> the default post-print behavior.  Then VMs with a paravirt panic interface could
> make an explicit call out to the host instead of relying on arch specific code
> to HLT and hope for the best.  And maybe console drivers that want/need to keep
> the CPU running can nullify panic_halt?
> 
> E.g. with a rudimentary priority system so that paravirt devices can override
> everything else.
> 
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d8635d5cecb2..fe9007222308 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -141,6 +141,18 @@ static long no_blink(int state)
>  long (*panic_blink)(int state);
>  EXPORT_SYMBOL(panic_blink);
>  
> +static void (*panic_halt)(void) = cpu_halt_after_panic;
> +static int panic_hlt_priority;
> +
> +void panic_set_hlt(void (*fn)(void), int priority)
> +{
> +       if (priority <= panic_hlt_priority)
> +               return;
> +
> +       panic_halt = fn;
> +}
> +EXPORT_SYMBOL_GPL(panic_set_hlt);
> +
>  /*
>   * Stop ourself in panic -- architecture code may override this
>   */
> @@ -467,6 +479,9 @@ void panic(const char *fmt, ...)
>         console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>         nbcon_atomic_flush_unsafe();
>  
> +       if (panic_halt)
> +               panic_halt();
> +
>         local_irq_enable();
>         for (i = 0; ; i += PANIC_TIMER_STEP) {
>                 touch_softlockup_watchdog();
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic
  2025-04-11 16:31     ` Sean Christopherson
  2025-04-14 10:02       ` Jan Glauber
@ 2025-04-22 12:44       ` Carlos Bilbao
  1 sibling, 0 replies; 8+ messages in thread
From: Carlos Bilbao @ 2025-04-22 12:44 UTC (permalink / raw)
  To: Sean Christopherson, Petr Mladek, jan.glauber
  Cc: carlos.bilbao, tglx, bilbao, akpm, jani.nikula, linux-kernel,
	gregkh, takakura, john.ogness, Linus Torvalds, Jiri Slaby,
	Carlos Bilbao

Thanks for the feedback, everyone!


On 4/11/25 11:31, Sean Christopherson wrote:

> On Fri, Apr 11, 2025, Petr Mladek wrote:
>> Added Linus and Jiri (tty) into Cc.
>>
>> On Wed 2025-03-26 10:12:03, carlos.bilbao@kernel.org wrote:
>>> From: Carlos Bilbao <carlos.bilbao@kernel.org>
>>>
>>> After handling a panic, the kernel enters a busy-wait loop, unnecessarily
>>> consuming CPU and potentially impacting other workloads including other
>>> guest VMs in the case of virtualized setups.
> Impacting other guests isn't the guest kernel's problem.  If the host has heavily
> overcommited CPUs and can't meet SLOs because VMs are panicking and not rebooting,
> that's a host problem.
>
> This could become a customer problem if they're getting billed based on CPU usage,
> but I don't know that simply doing HLT is the best solution.  E.g. advising the
> customer to configure their kernels to kexec into a kdump kernel or to reboot
> on panic, seems like it would provide a better overall experience for most.
>
> QEMU (assuming y'all use QEMU) also supports a pvpanic device, so unless the VM
> and/or customer is using a funky setup, the host should already know the guest
> has panicked.  At that point, the host can make appropiate scheduling decisions,
> e.g. userspace can simply stop running the VM after a certain timeout, throttle
> it, jail it, etc.
>
>>> Introduce cpu_halt_after_panic(), a weak function that archs can override
>>> for a more efficient halt of CPU work. By default, it preserves the
>>> pre-existing behavior of delay.
>>>
>>> Signed-off-by: Carlos Bilbao (DigitalOcean) <carlos.bilbao@kernel.org>
>>> Reviewed-by: Jan Glauber (DigitalOcean) <jan.glauber@gmail.com>
>>> ---
>>>  kernel/panic.c | 12 +++++++++++-
>>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/kernel/panic.c b/kernel/panic.c
>>> index fbc59b3b64d0..fafe3fa22533 100644
>>> --- a/kernel/panic.c
>>> +++ b/kernel/panic.c
>>> @@ -276,6 +276,16 @@ static void panic_other_cpus_shutdown(bool crash_kexec)
>>>  		crash_smp_send_stop();
>>>  }
>>>  
>>> +/*
>>> + * Called after a kernel panic has been handled, at which stage halting
>>> + * the CPU can help reduce unnecessary CPU consumption. In the absence of
>>> + * arch-specific implementations, just delay
>>> + */
>>> +static void __weak cpu_halt_after_panic(void)
>>> +{
>>> +	mdelay(PANIC_TIMER_STEP);
>>> +}
>>> +
>>>  /**
>>>   *	panic - halt the system
>>>   *	@fmt: The text string to print
>>> @@ -474,7 +484,7 @@ void panic(const char *fmt, ...)
>>>  			i += panic_blink(state ^= 1);
>>>  			i_next = i + 3600 / PANIC_BLINK_SPD;
>>>  		}
>>> -		mdelay(PANIC_TIMER_STEP);
>>> +		cpu_halt_after_panic();
>> The 2nd patch implements this functions using safe_halt().
>>
>> IMHO, it makes perfect sense to halt a virtualized system or
> I really, really don't want to arbitrarily single out VMs, especially in core
> kernel code, as doing so risks creating problems that are unique to VMs.  If the
> behavior is based on a virtualization-agnostic heuristic like "no console", then
> by all means, but please don't condition something like this purely based on
> running in a VM.
>
> I also have no objection to the host/hypervisor prescribing specific behavior
> (see below).  What I don't like is the kernel deciding to do things differently
> because it's a VM, without any knowledge of the environment in which the VM is
> running.


Sean, I understand your point that we shouldn’t make core kernel behavior arbitrarily VM-specific. Even though, arguably, my RFC cover letter focused on VMs -- as that’s where I detected the issue (without oversubscription, BTW) -- the intention is to provide a more general post-panic solution that can help both VM/bare-metal envs. As Jan mentioned, the current default (a
CPU spinning forever after panic()) is often suboptimal.


>> a system without a registered "graphical" console.
>>
>> I think that the blinking diods were designed for desktops
>> and laptops with some "graphical" terminal attached. The
>> infinite loop allows to read the last lines, ideally
>> the backtrace on the screen.
>>
>> I wonder if we could make it more clever and do the halt
>> only when the system is virtualized or when there is no
>> "graphical" console registered.
> Could we do something with panic_notifier_list?  E.g. let panic devices override
> the default post-print behavior.  Then VMs with a paravirt panic interface could
> make an explicit call out to the host instead of relying on arch specific code
> to HLT and hope for the best.  And maybe console drivers that want/need to keep
> the CPU running can nullify panic_halt?


I like your suggestion of a panic_set_hlt() API with a priority mechanism
-- it’s more flexible than a weak function, and I’m happy to integrate that
into v2. Here's what I propose:

    Patch 1: Introduce panic_set_hlt(func, prio).
    Patch 2: Register a fallback safe_halt() implementation at priority 0
             that just calls safe_halt().

Please, LMK if you've any concerns with that plan.


> E.g. with a rudimentary priority system so that paravirt devices can override
> everything else.
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d8635d5cecb2..fe9007222308 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -141,6 +141,18 @@ static long no_blink(int state)
>  long (*panic_blink)(int state);
>  EXPORT_SYMBOL(panic_blink);
>  
> +static void (*panic_halt)(void) = cpu_halt_after_panic;
> +static int panic_hlt_priority;
> +
> +void panic_set_hlt(void (*fn)(void), int priority)
> +{
> +       if (priority <= panic_hlt_priority)
> +               return;
> +
> +       panic_halt = fn;
> +}
> +EXPORT_SYMBOL_GPL(panic_set_hlt);
> +
>  /*
>   * Stop ourself in panic -- architecture code may override this
>   */
> @@ -467,6 +479,9 @@ void panic(const char *fmt, ...)
>         console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>         nbcon_atomic_flush_unsafe();
>  
> +       if (panic_halt)
> +               panic_halt();
> +
>         local_irq_enable();
>         for (i = 0; ; i += PANIC_TIMER_STEP) {
>                 touch_softlockup_watchdog();
>

Thanks,

Carlos


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2025-04-22 14:40 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-03-26 15:12 [PATCH 0/2] Reduce CPU usage when finished handling panic carlos.bilbao
2025-03-26 15:12 ` [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic carlos.bilbao
2025-04-11 14:03   ` Petr Mladek
2025-04-11 16:31     ` Sean Christopherson
2025-04-14 10:02       ` Jan Glauber
2025-04-22 12:44       ` Carlos Bilbao
2025-03-26 15:12 ` [PATCH 2/2] x86/panic: Use safe_halt() for CPU halt " carlos.bilbao
2025-04-10 17:30 ` Reminder: [PATCH 0/2] Reduce CPU usage when finished handling panic Carlos Bilbao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox