From: Sean Christopherson <seanjc@google.com>
To: Will Deacon <will@kernel.org>,
Mark Rutland <mark.rutland@arm.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Arnaldo Carvalho de Melo <acme@kernel.org>,
Catalin Marinas <catalin.marinas@arm.com>,
Marc Zyngier <maz@kernel.org>, Guo Ren <guoren@kernel.org>,
Nick Hu <nickhu@andestech.com>, Greentime Hu <green.hu@gmail.com>,
Vincent Chen <deanbo422@gmail.com>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Thomas Gleixner <tglx@linutronix.de>,
Borislav Petkov <bp@alien8.de>,
x86@kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
Boris Ostrovsky <boris.ostrovsky@oracle.com>,
Juergen Gross <jgross@suse.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>,
Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Sean Christopherson <seanjc@google.com>,
Vitaly Kuznetsov <vkuznets@redhat.com>,
Wanpeng Li <wanpengli@tencent.com>,
Jim Mattson <jmattson@google.com>, Joerg Roedel <joro@8bytes.org>,
Stefano Stabellini <sstabellini@kernel.org>,
linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
kvmarm@lists.cs.columbia.edu, linux-csky@vger.kernel.org,
linux-riscv@lists.infradead.org, kvm@vger.kernel.org,
xen-devel@lists.xenproject.org,
Artem Kashkanov <artem.kashkanov@intel.com>,
Like Xu <like.xu.linux@gmail.com>,
Zhu Lingshan <lingshan.zhu@intel.com>
Subject: [PATCH 00/15] perf: KVM: Fix, optimize, and clean up callbacks
Date: Thu, 26 Aug 2021 17:57:03 -0700 [thread overview]
Message-ID: <20210827005718.585190-1-seanjc@google.com> (raw)
This started out as a small series[1] to fix a KVM bug related to Intel PT
interrupt handling and snowballed horribly.
The main problem being addressed is that the perf_guest_cbs are shared by
all CPUs, can be nullified by KVM during module unload, and are not
protected against concurrent access from NMI context.
The bug has escaped notice because all dereferences of perf_guest_cbs
follow the same "perf_guest_cbs && perf_guest_cbs->is_in_guest()" pattern,
and AFAICT the compiler never reloads perf_guest_cbs in this sequence.
The compiler does reload perf_guest_cbs for any future dereferences, but
the ->is_in_guest() guard all but guarantees the PMI handler will win the
race, e.g. to nullify perf_guest_cbs, KVM has to completely exit the guest
and teardown down all VMs before it can be unloaded.
But with help, e.g. READ_ONCE(perf_guest_cbs), unloading kvm_intel can
trigger a NULL pointer derference (see below). Manual intervention aside,
the bug is a bit of a time bomb, e.g. my patch 3 from the original PT
handling series would have omitted the ->is_in_guest() guard.
This series fixes the problem by making the callbacks per-CPU, and
registering/unregistering the callbacks only with preemption disabled
(except for the Xen case, which doesn't unregister).
This approach also allows for several nice cleanups in this series.
KVM x86 and arm64 can share callbacks, KVM x86 drops its somewhat
redundant current_vcpu, and the retpoline that is currently hit when KVM
is loaded (due to always checking ->is_in_guest()) goes away (it's still
there when running as Xen Dom0).
Changing to per-CPU callbacks also provides a good excuse to excise
copy+paste code from architectures that can't possibly have guest
callbacks.
This series conflicts horribly with a proposed patch[2] to use static
calls for perf_guest_cbs. But that patch is broken as it completely
fails to handle unregister, and it's not clear to me whether or not
it can correctly handle unregister without fixing the underlying race
(I don't know enough about the code patching for static calls).
This tweak
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 1eb45139fcc6..202e5ad97f82 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2954,7 +2954,7 @@ unsigned long perf_misc_flags(struct pt_regs *regs)
{
int misc = 0;
- if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+ if (READ_ONCE(perf_guest_cbs) && READ_ONCE(perf_guest_cbs)->is_in_guest()) {
if (perf_guest_cbs->is_user_mode())
misc |= PERF_RECORD_MISC_GUEST_USER;
else
while spamming module load/unload leads to:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:perf_misc_flags+0x1c/0x70
Call Trace:
perf_prepare_sample+0x53/0x6b0
perf_event_output_forward+0x67/0x160
__perf_event_overflow+0x52/0xf0
handle_pmi_common+0x207/0x300
intel_pmu_handle_irq+0xcf/0x410
perf_event_nmi_handler+0x28/0x50
nmi_handle+0xc7/0x260
default_do_nmi+0x6b/0x170
exc_nmi+0x103/0x130
asm_exc_nmi+0x76/0xbf
[1] https://lkml.kernel.org/r/20210823193709.55886-1-seanjc@google.com
[2] https://lkml.kernel.org/r/20210806133802.3528-2-lingshan.zhu@intel.com
Sean Christopherson (15):
KVM: x86: Register perf callbacks after calling vendor's
hardware_setup()
KVM: x86: Register Processor Trace interrupt hook iff PT enabled in
guest
perf: Stop pretending that perf can handle multiple guest callbacks
perf: Force architectures to opt-in to guest callbacks
perf: Track guest callbacks on a per-CPU basis
KVM: x86: Register perf callbacks only when actively handling
interrupt
KVM: Use dedicated flag to track if KVM is handling an NMI from guest
KVM: x86: Drop current_vcpu in favor of kvm_running_vcpu
KVM: arm64: Register/unregister perf callbacks at vcpu load/put
KVM: Move x86's perf guest info callbacks to generic KVM
KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c
KVM: arm64: Convert to the generic perf callbacks
KVM: arm64: Drop perf.c and fold its tiny bit of code into pmu.c
perf: Disallow bulk unregistering of guest callbacks and do cleanup
perf: KVM: Indicate "in guest" via NULL ->is_in_guest callback
arch/arm/kernel/perf_callchain.c | 28 ++------------
arch/arm64/Kconfig | 1 +
arch/arm64/include/asm/kvm_host.h | 8 +++-
arch/arm64/kernel/perf_callchain.c | 18 ++++++---
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/arm.c | 13 ++++++-
arch/arm64/kvm/perf.c | 62 ------------------------------
arch/arm64/kvm/pmu.c | 8 ++++
arch/csky/kernel/perf_callchain.c | 10 -----
arch/nds32/kernel/perf_event_cpu.c | 29 ++------------
arch/riscv/kernel/perf_callchain.c | 10 -----
arch/x86/Kconfig | 1 +
arch/x86/events/core.c | 17 +++++---
arch/x86/events/intel/core.c | 7 ++--
arch/x86/include/asm/kvm_host.h | 4 +-
arch/x86/kvm/pmu.c | 2 +-
arch/x86/kvm/pmu.h | 1 +
arch/x86/kvm/svm/svm.c | 2 +-
arch/x86/kvm/vmx/vmx.c | 25 ++++++++++--
arch/x86/kvm/x86.c | 54 +++-----------------------
arch/x86/kvm/x86.h | 12 +++---
arch/x86/xen/pmu.c | 2 +-
include/kvm/arm_pmu.h | 1 +
include/linux/kvm_host.h | 12 ++++++
include/linux/perf_event.h | 33 ++++++++++++----
init/Kconfig | 3 ++
kernel/events/core.c | 28 ++++++++------
virt/kvm/kvm_main.c | 42 ++++++++++++++++++++
28 files changed, 204 insertions(+), 231 deletions(-)
delete mode 100644 arch/arm64/kvm/perf.c
--
2.33.0.259.gc128427fd7-goog
next reply other threads:[~2021-08-27 0:57 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-08-27 0:57 Sean Christopherson [this message]
2021-08-27 0:57 ` [PATCH 01/15] KVM: x86: Register perf callbacks after calling vendor's hardware_setup() Sean Christopherson
2021-08-27 0:57 ` [PATCH 02/15] KVM: x86: Register Processor Trace interrupt hook iff PT enabled in guest Sean Christopherson
2021-08-27 0:57 ` [PATCH 03/15] perf: Stop pretending that perf can handle multiple guest callbacks Sean Christopherson
2021-08-27 0:57 ` [PATCH 04/15] perf: Force architectures to opt-in to " Sean Christopherson
2021-08-27 0:57 ` [PATCH 05/15] perf: Track guest callbacks on a per-CPU basis Sean Christopherson
2021-08-27 7:15 ` Peter Zijlstra
2021-08-27 14:49 ` Sean Christopherson
2021-08-27 14:56 ` Peter Zijlstra
2021-08-27 15:22 ` Sean Christopherson
2021-08-27 0:57 ` [PATCH 06/15] KVM: x86: Register perf callbacks only when actively handling interrupt Sean Christopherson
2021-08-27 7:21 ` Peter Zijlstra
2021-08-27 0:57 ` [PATCH 07/15] KVM: Use dedicated flag to track if KVM is handling an NMI from guest Sean Christopherson
2021-08-27 7:30 ` Peter Zijlstra
2021-08-27 14:58 ` Sean Christopherson
2021-08-27 0:57 ` [PATCH 08/15] KVM: x86: Drop current_vcpu in favor of kvm_running_vcpu Sean Christopherson
2021-08-27 0:57 ` [PATCH 09/15] KVM: arm64: Register/unregister perf callbacks at vcpu load/put Sean Christopherson
2021-08-27 0:57 ` [PATCH 10/15] KVM: Move x86's perf guest info callbacks to generic KVM Sean Christopherson
2021-08-27 0:57 ` [PATCH 11/15] KVM: x86: Move Intel Processor Trace interrupt handler to vmx.c Sean Christopherson
2021-08-27 7:34 ` Peter Zijlstra
2021-08-27 0:57 ` [PATCH 12/15] KVM: arm64: Convert to the generic perf callbacks Sean Christopherson
2021-08-27 0:57 ` [PATCH 13/15] KVM: arm64: Drop perf.c and fold its tiny bit of code into pmu.c Sean Christopherson
2021-08-27 0:57 ` [PATCH 14/15] perf: Disallow bulk unregistering of guest callbacks and do cleanup Sean Christopherson
2021-08-27 0:57 ` [PATCH 15/15] perf: KVM: Indicate "in guest" via NULL ->is_in_guest callback Sean Christopherson
2021-08-27 6:52 ` [PATCH 00/15] perf: KVM: Fix, optimize, and clean up callbacks Like Xu
2021-08-27 7:44 ` Peter Zijlstra
2021-08-27 8:01 ` Like Xu
2021-08-27 10:47 ` Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20210827005718.585190-1-seanjc@google.com \
--to=seanjc@google.com \
--cc=acme@kernel.org \
--cc=alexander.shishkin@linux.intel.com \
--cc=alexandru.elisei@arm.com \
--cc=aou@eecs.berkeley.edu \
--cc=artem.kashkanov@intel.com \
--cc=boris.ostrovsky@oracle.com \
--cc=bp@alien8.de \
--cc=catalin.marinas@arm.com \
--cc=deanbo422@gmail.com \
--cc=green.hu@gmail.com \
--cc=guoren@kernel.org \
--cc=hpa@zytor.com \
--cc=james.morse@arm.com \
--cc=jgross@suse.com \
--cc=jmattson@google.com \
--cc=jolsa@redhat.com \
--cc=joro@8bytes.org \
--cc=kvm@vger.kernel.org \
--cc=kvmarm@lists.cs.columbia.edu \
--cc=like.xu.linux@gmail.com \
--cc=lingshan.zhu@intel.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-csky@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=linux-riscv@lists.infradead.org \
--cc=mark.rutland@arm.com \
--cc=maz@kernel.org \
--cc=mingo@redhat.com \
--cc=namhyung@kernel.org \
--cc=nickhu@andestech.com \
--cc=palmer@dabbelt.com \
--cc=paul.walmsley@sifive.com \
--cc=pbonzini@redhat.com \
--cc=peterz@infradead.org \
--cc=sstabellini@kernel.org \
--cc=suzuki.poulose@arm.com \
--cc=tglx@linutronix.de \
--cc=vkuznets@redhat.com \
--cc=wanpengli@tencent.com \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=xen-devel@lists.xenproject.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).