From: Breno Leitao <leitao@debian.org>
To: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-perf-users@vger.kernel.org
Cc: rcu@vger.kernel.org, rbc@meta.com
Subject: kvm/x86: perf: Softlockup issue
Date: Tue, 3 Oct 2023 06:52:38 -0700
Message-ID: <ZRwcpki67uhpAUKi@gmail.com>

I've been chasing a bug in a KVM virtual machine that I would like to share
here. When perf runs inside the guest, the VM gets stuck and reports soft
lockups. The bug happens upstream (Linux 6.6-rc4, commit 8a749fd1a8720d461).
The same kernel is used on both the host and the guest.

The problem only happens under very specific circumstances:
1) The PMU needs to be enabled in the guest
2) Libvirt/QEMU needs to use a custom CPU:
* Here is the qemu line:
-cpu Skylake-Server,kvm-pv-eoi=on,pmu=on
* Any other CPU model seems to hit the problem as well
* It happens even when using Skylake-Server on a Skylake host
* Using CPU passthrough works around the problem
3) You need to use 6 or more events in perf.
* This is a line that reproduces the problem:
# perf stat -e cpu-clock -e context-switches -e cpu-migrations -e page-faults -e cycles -e instructions -e branches ls
* Removing any of these events (totaling 5 events) makes `perf` work again
4) The problem happens on upstream, 6.4, and 5.19
* This problem doesn't seem to happen on 5.12
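To narrow down the event-count threshold, a script like the one below could step through the reproducer using the first 1..7 events from the command line above. It is a hypothetical helper, not part of perf. It assumes it runs as root inside a disposable guest, and it only tests prefixes of the event list, not every subset. By default it just prints the commands; pass `run` to execute them.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): run the reproducer with the
# first N events, for N = 1..7, to locate the event-count threshold.
# Usage: ./perf-threshold.sh        -> only print the commands (dry run)
#        ./perf-threshold.sh run    -> execute them (as root, in the guest)

EVENTS="cpu-clock context-switches cpu-migrations page-faults cycles instructions branches"

# Print "-e ev1 -e ev2 ..." for the first $1 events listed in $EVENTS.
event_args() {
    n=$1
    i=0
    args=""
    for ev in $EVENTS; do
        [ "$i" -ge "$n" ] && break
        if [ -z "$args" ]; then
            args="-e $ev"
        else
            args="$args -e $ev"
        fi
        i=$((i + 1))
    done
    echo "$args"
}

mode=${1:-dry-run}
for n in 1 2 3 4 5 6 7; do
    cmd="perf stat $(event_args "$n") ls"
    echo "[$n events] $cmd"
    if [ "$mode" = run ]; then
        # On an affected guest this is expected to hang once n reaches
        # the threshold; the output of ls itself is discarded.
        $cmd > /dev/null
    fi
done
```

Because each command is printed before it runs, on an affected guest the last line printed before the hang identifies the first event count that triggers the lockup.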
Problem
========
When running perf under the circumstances above, the VM gets stuck and prints
a lot of stack traces. These are some of the messages:
kernel:[ 400.314381] watchdog: BUG: soft lockup - CPU#3 stuck for 26s! [kworker/u68:11:6853]
kernel:[ 400.324380] watchdog: BUG: soft lockup - CPU#8 stuck for 26s! [dynoKernelMon:9781]
kernel:[ 404.368380] watchdog: BUG: soft lockup - CPU#30 stuck for 22s! [kworker/30:2:1326]
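For a quick overview of which vCPUs reported soft lockups in a saved guest log (for example, the full log linked in the "More info" section), a small sed filter could extract the CPU number, the stall duration, and the task from each watchdog line. This is a sketch that assumes the message format shown above; `summarize_lockups` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print "<cpu> <seconds> <task>" for every
# "watchdog: BUG: soft lockup" line in the given files (or stdin).
# Assumes the message format shown in the log excerpt above.
summarize_lockups() {
    sed -n 's/.*soft lockup - CPU#\([0-9]*\) stuck for \([0-9]*\)s! \[\(.*\)].*/\1 \2 \3/p' "$@"
}
```

For example, `summarize_lockups guest.log | sort -n` (where `guest.log` is a hypothetical local copy of the log) lists the stuck CPUs in order.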
Here is part of the stack trace. The full trace is in the pastebins below:
nmi_cpu_backtrace (lib/nmi_backtrace.c:115)
nmi_cpu_backtrace_handler (arch/x86/kernel/apic/hw_nmi.c:47)
nmi_handle (arch/x86/kernel/nmi.c:149)
__intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
__intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
default_do_nmi (arch/x86/kernel/nmi.c:347)
exc_nmi (arch/x86/kernel/nmi.c:543)
end_repeat_nmi (arch/x86/entry/entry_64.S:1471)
__intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
__intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
__intel_pmu_enable_all (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/msr.h:147 arch/x86/include/asm/msr.h:262 arch/x86/events/intel/core.c:2239)
More info
=========
Soft lockup messages in the guest:
https://paste.debian.net/1293888/
Full log from the guest:
https://paste.debian.net/1293891/
vCPU stacks dumped from the host (cat /proc/<vcpu>/stack):
https://paste.debian.net/1293887/
QEMU (version 7.1.0) command line:
https://paste.debian.net/1293894/
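The host-side vCPU stacks linked above were collected with `cat /proc/<vcpu>/stack`. A minimal sketch for dumping every thread of the QEMU process in one go, each labeled with its TID and thread name, could look like this. `dump_thread_stacks` and the `PROC_ROOT` override are hypothetical (the override only exists to allow testing against a fake tree), and reading `/proc/<pid>/task/<tid>/stack` requires root.

```shell
#!/bin/sh
# Hypothetical helper: dump the kernel stack of every thread of a
# process, each prefixed with its TID and thread name (comm).
# Must run as root on the host to read .../stack.
# PROC_ROOT (hypothetical) defaults to /proc; it only exists so the
# function can be exercised against a fake directory tree.
dump_thread_stacks() {
    pid=$1
    root=${PROC_ROOT:-/proc}
    for task in "$root/$pid"/task/*; do
        # Skip the unexpanded glob if the process has gone away.
        [ -d "$task" ] || continue
        echo "=== tid ${task##*/} ($(cat "$task/comm"))"
        cat "$task/stack"
    done
}
```

For example, `dump_thread_stacks "$(pgrep -o qemu-system)"`, assuming a single QEMU instance on the host. With QEMU's debug-threads naming enabled, the vCPU threads show up as `CPU <n>/KVM`, which makes them easy to pick out of the output.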