linux-perf-users.vger.kernel.org archive mirror
* kvm/x86: perf: Softlockup issue
From: Breno Leitao @ 2023-10-03 13:52 UTC
  To: kvm, linux-kernel, linux-perf-users; +Cc: rcu, rbc

I've been chasing a bug in a KVM virtual machine that I would like to share
here. The VM gets stuck with soft lockups when running perf inside the guest.

The bug reproduces on upstream (Linux 6.6-rc4, commit 8a749fd1a8720d461). The
same kernel is running on both the host and the guest.

The problem only happens under very specific circumstances:

1) The PMU needs to be enabled in the guest

2) Libvirt/QEMU needs to use a custom CPU model:
	* Here is the QEMU option:
		-cpu Skylake-Server,kvm-pv-eoi=on,pmu=on
	* Any other custom CPU model seems to hit the problem as well
		* Even using Skylake-Server on a Skylake server
	* Using CPU passthrough works around the problem

3) You need to use 6 or more events in perf.
	* This command line reproduces the problem:
	  # perf stat -e cpu-clock -e context-switches -e cpu-migrations -e page-faults -e cycles -e instructions -e branches ls
	* Removing any of these events (totaling 5 events) makes `perf` work again

4) The problem happens on upstream, 6.4, and 5.19
	* It doesn't seem to happen on 5.12

Problem
========

When running perf under the circumstances above, the VM gets stuck and prints a
lot of stack traces. Here are some of the messages:

	 kernel:[  400.314381] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [kworker/u68:11:6853]
	 kernel:[  400.324380] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [dynoKernelMon:9781]
	 kernel:[  404.368380] watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [kworker/30:2:1326]

Here is part of the stack trace. The full trace is in the pastebin links below:

	 nmi_cpu_backtrace (lib/nmi_backtrace.c:115)
	 nmi_cpu_backtrace_handler (arch/x86/kernel/apic/hw_nmi.c:47)
	 nmi_handle (arch/x86/kernel/nmi.c:149)
	 __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
	 __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
	 default_do_nmi (arch/x86/kernel/nmi.c:347)
	 exc_nmi (arch/x86/kernel/nmi.c:543)
	 end_repeat_nmi (arch/x86/entry/entry_64.S:1471)
	 __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
	 __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
	 __intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)



More info
=========

Soft lockup messages in the guest:
	https://paste.debian.net/1293888/
Full log from the guest:
	https://paste.debian.net/1293891/
vCPU stacks dumped from the host (cat /proc/<vcpu>/stack):
	https://paste.debian.net/1293887/
QEMU (version 7.1.0) command line:
	https://paste.debian.net/1293894/


* Re: kvm/x86: perf: Softlockup issue
From: Jim Mattson @ 2023-10-03 14:36 UTC
  To: Breno Leitao; +Cc: kvm, linux-kernel, linux-perf-users, rcu, rbc

On Tue, Oct 3, 2023 at 6:52 AM Breno Leitao <leitao@debian.org> wrote:
>
> I've been pursuing a bug in a virtual machine (KVM) that I would like to share
> in here. The VM gets stuck when running perf in a VM and getting soft lockups.
>
> The bug happens upstream (Linux 6.6-rc4 - 8a749fd1a8720d461). The same kernel
> is being used in the host and in the guest.

Have you tried https://lore.kernel.org/kvm/169567819674.170423.4384853980629356216.b4-ty@google.com/?


* Re: kvm/x86: perf: Softlockup issue
From: Breno Leitao @ 2023-10-04 10:08 UTC
  To: Jim Mattson; +Cc: kvm, linux-kernel, linux-perf-users, rcu, rbc

Hello Jim,

On Tue, Oct 03, 2023 at 07:36:48AM -0700, Jim Mattson wrote:
> On Tue, Oct 3, 2023 at 6:52 AM Breno Leitao <leitao@debian.org> wrote:
> >
> > I've been pursuing a bug in a virtual machine (KVM) that I would like to share
> > in here. The VM gets stuck when running perf in a VM and getting soft lockups.
> >
> > The bug happens upstream (Linux 6.6-rc4 - 8a749fd1a8720d461). The same kernel
> > is being used in the host and in the guest.
> 
> Have you tried https://lore.kernel.org/kvm/169567819674.170423.4384853980629356216.b4-ty@google.com/?

Thanks for the heads-up. These two patches do fix the problem.
Thanks for getting it fixed.

Tested-by: Breno Leitao <leitao@debian.org>

Breno
