[PATCH 0/2] Reduce CPU usage when finished handling panic

From: carlos.bilbao @ 2025-03-26 15:12 UTC
  To: tglx
  Cc: bilbao, pmladek, akpm, jan.glauber, jani.nikula, linux-kernel,
	gregkh, takakura, john.ogness, Carlos Bilbao

From: Carlos Bilbao <cbilbao@digitalocean.com>

After the kernel has finished handling a panic, it enters a busy-wait loop.
This needlessly consumes CPU time and power. In VMs, it also reduces the
throughput of other guests running on the same hypervisor.
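To make the cost concrete: in current mainline, the final loop in panic() calls mdelay() on every iteration, and mdelay() spins on the CPU rather than sleeping. The following is a userspace analogue only, not kernel code. It assumes a POSIX system with CLOCK_MONOTONIC, and busy_delay_ms() and now_ns() are hypothetical helpers. The helper keeps polling the clock until a deadline passes, so the CPU stays fully busy for the whole delay.

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Current CLOCK_MONOTONIC time in nanoseconds (hypothetical helper). */
static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical userspace analogue of a busy-wait delay: poll the clock
 * until @ms milliseconds have elapsed. The CPU never idles; every
 * iteration burns cycles. Returns the number of polls performed.
 */
unsigned long busy_delay_ms(unsigned int ms)
{
	long long deadline = now_ns() + (long long)ms * 1000000LL;
	unsigned long polls = 0;

	while (now_ns() < deadline)
		polls++;
	return polls;
}
```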

This patch set introduces a weak function, cpu_halt_after_panic(), that
gives architectures the option to halt the CPU in this state while still
allowing interrupts to be processed. For arch/x86, the series overrides the
weak function with a definition that calls safe_halt().
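Here is a minimal userspace sketch of the weak-symbol override pattern the series relies on. It assumes GCC or Clang on a target that supports `__attribute__((weak))` (the kernel spells this `__weak`). The default hook only counts calls, and `halt_calls` and `panic_idle_loop()` are hypothetical names used for the demo. The loop is bounded only so the sketch terminates; the kernel's post-panic loop never returns. This is not the patch's actual code.

```c
/* Counter used only so this demo can observe which hook ran (hypothetical). */
static unsigned long halt_calls;

/*
 * Generic fallback, analogous to the weak cpu_halt_after_panic().
 * If another object file linked into the same program defines a
 * non-weak cpu_halt_after_panic(), the linker uses that definition
 * instead of this one.
 */
__attribute__((weak)) void cpu_halt_after_panic(void)
{
	halt_calls++;	/* stand-in for "wait one step" */
}

/*
 * Bounded model of the post-panic loop (hypothetical helper).
 * The real loop is infinite; @max_iters exists only so the sketch ends.
 */
unsigned long panic_idle_loop(unsigned long max_iters)
{
	for (unsigned long i = 0; i < max_iters; i++)
		cpu_halt_after_panic();
	return halt_calls;
}
```

If only this file is linked, `panic_idle_loop(5)` returns 5, because the weak default runs on every iteration. An architecture would supply a strong `void cpu_halt_after_panic(void)` in its own file. On x86, per this cover letter, that override calls safe_halt(), which enables interrupts and halts the CPU until the next one arrives.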

Here are some numbers to support this claim: perf stats collected on the
hypervisor after triggering a panic in a Linux guest.

Samples: 55K of event 'cycles:P', Event count (approx.): 36090772574
Overhead  Command          Shared Object            Symbol
  42.20%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vmexit
  19.07%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_spec_ctrl_restore_host
   9.73%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_enter_exit
   3.60%  CPU 5/KVM        [kernel.kallsyms]        [k] __flush_smp_call_function_queue
   2.91%  CPU 5/KVM        [kernel.kallsyms]        [k] vmx_vcpu_run
   2.85%  CPU 5/KVM        [kernel.kallsyms]        [k] native_irq_return_iret
   2.67%  CPU 5/KVM        [kernel.kallsyms]        [k] native_flush_tlb_one_user
   2.16%  CPU 5/KVM        [kernel.kallsyms]        [k] llist_reverse_order
   2.10%  CPU 5/KVM        [kernel.kallsyms]        [k] __srcu_read_lock
   2.08%  CPU 5/KVM        [kernel.kallsyms]        [k] flush_tlb_func
   1.52%  CPU 5/KVM        [kernel.kallsyms]        [k] vcpu_enter_guest.constprop.0

And here are the same hypervisor-side measurements after applying the patch:

Samples: 51  of event 'cycles:P', Event count (approx.): 37553709
Overhead  Command          Shared Object            Symbol
   7.94%  qemu-system-x86  [kernel.kallsyms]        [k] __schedule
   7.94%  qemu-system-x86  libc.so.6                [.] 0x00000000000a2702
   7.94%  qemu-system-x86  qemu-system-x86_64       [.] 0x000000000057603c
   7.43%  qemu-system-x86  libc.so.6                [.] malloc
   7.43%  qemu-system-x86  libc.so.6                [.] 0x00000000001af9c0
   6.37%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_mutex_unlock
   5.21%  IO mon_iothread  [kernel.kallsyms]        [k] __pollwait
   4.70%  IO mon_iothread  [kernel.kallsyms]        [k] clear_bhb_loop
   3.56%  IO mon_iothread  [kernel.kallsyms]        [k] __secure_computing
   3.56%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_query
   3.15%  IO mon_iothread  [kernel.kallsyms]        [k] __hrtimer_start_range_ns
   3.15%  IO mon_iothread  [kernel.kallsyms]        [k] _raw_spin_lock_irq
   2.88%  IO mon_iothread  libglib-2.0.so.0.7200.4  [.] g_main_context_prepare
   2.83%  qemu-system-x86  libglib-2.0.so.0.7200.4  [.] g_slist_foreach
   2.58%  IO mon_iothread  qemu-system-x86_64       [.] 0x00000000004e820d
   2.21%  qemu-system-x86  libc.so.6                [.] 0x0000000000088010
   1.94%  IO mon_iothread  [kernel.kallsyms]        [k] arch_exit_to_user_mode_prepar

As you can see, CPU consumption after a panic drops significantly with this
change. The KVM-related functions (e.g., vmx_vmexit()), which together
accounted for more than 70% of the samples, no longer appear among the top
entries. The number of samples also decreased from 55K to 51, and the event
count dropped from about 36.09 billion to about 37.55 million.

Carlos Bilbao at DigitalOcean (2):
  panic: Allow archs to reduce CPU consumption after panic
  x86/panic: Use safe_halt() for CPU halt after panic

---

 arch/x86/kernel/Makefile |  1 +
 arch/x86/kernel/panic.c  |  9 +++++++++
 kernel/panic.c           | 12 +++++++++++-
 3 files changed, 21 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/panic.c


Thread overview: 8+ messages
2025-03-26 15:12 [PATCH 0/2] Reduce CPU usage when finished handling panic carlos.bilbao
2025-03-26 15:12 ` [PATCH 1/2] panic: Allow archs to reduce CPU consumption after panic carlos.bilbao
2025-04-11 14:03   ` Petr Mladek
2025-04-11 16:31     ` Sean Christopherson
2025-04-14 10:02       ` Jan Glauber
2025-04-22 12:44       ` Carlos Bilbao
2025-03-26 15:12 ` [PATCH 2/2] x86/panic: Use safe_halt() for CPU halt " carlos.bilbao
2025-04-10 17:30 ` Reminder: [PATCH 0/2] Reduce CPU usage when finished handling panic Carlos Bilbao
