* [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. @ 2025-04-26 0:13 Sean Christopherson 2025-04-27 9:10 ` Mi, Dapeng 2025-04-28 14:13 ` Seth Forshee 0 siblings, 2 replies; 7+ messages in thread From: Sean Christopherson @ 2025-04-26 0:13 UTC (permalink / raw) To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86 Cc: linux-perf-users, linux-kernel, Seth Forshee, Dapeng Mi, Sean Christopherson When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE value. Consulting only the host kernel's host vs. guest masks results in running the guest with PEBS enabled even when the guest doesn't want to use PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply looking at exclude_host can't differentiate between events created by host userspace, and events created by KVM on behalf of the guest. Running the guest with PEBS unexpectedly enabled typically manifests as crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when trying to record PEBS events. The issue is most easily reproduced by running `perf kvm top` from before commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after which, `perf kvm top` effectively stopped using PEBS). The userspace side of perf creates a guest-only PEBS event, which intel_guest_get_msrs() misconstrues a guest-*owned* PEBS event. Arguably, this is a userspace bug, as enabling PEBS on guest-only events simply cannot work, and userspace can kill VMs in many other ways (there is no danger to the host). However, even if this is considered to be bad userspace behavior, there's zero downside to perf/KVM restricting PEBS to guest-owned events. Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily in two rare situations") fixed the case where host userspace is profiling KVM *and* userspace, but missed the case where userspace is profiling only KVM. Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") Reported-by: Seth Forshee <sforshee@kernel.org> Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon Cc: stable@vger.kernel.org Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> --- arch/x86/events/intel/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index cd6329207311..75a376478b21 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4625,7 +4625,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) arr[pebs_enable] = (struct perf_guest_switch_msr){ .msr = MSR_IA32_PEBS_ENABLE, .host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask, - .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask, + .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable, }; if (arr[pebs_enable].host) { base-commit: 2492e5aba2be064d0604ae23ae0770ecc0168192 -- 2.49.0.850.g28803427d3-goog ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. 2025-04-26 0:13 [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value Sean Christopherson @ 2025-04-27 9:10 ` Mi, Dapeng 2025-05-02 15:04 ` Sean Christopherson 2025-04-28 14:13 ` Seth Forshee 1 sibling, 1 reply; 7+ messages in thread From: Mi, Dapeng @ 2025-04-27 9:10 UTC (permalink / raw) To: Sean Christopherson, Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86 Cc: linux-perf-users, linux-kernel, Seth Forshee On 4/26/2025 8:13 AM, Sean Christopherson wrote: > When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on > VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE > value. Consulting only the host kernel's host vs. guest masks results in > running the guest with PEBS enabled even when the guest doesn't want to use > PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply > looking at exclude_host can't differentiate between events created by host > userspace, and events created by KVM on behalf of the guest. > > Running the guest with PEBS unexpectedly enabled typically manifests as > crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't > written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when > trying to record PEBS events. > > The issue is most easily reproduced by running `perf kvm top` from before > commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after > which, `perf kvm top` effectively stopped using PEBS). The userspace side > of perf creates a guest-only PEBS event, which intel_guest_get_msrs() > misconstrues a guest-*owned* PEBS event. > > Arguably, this is a userspace bug, as enabling PEBS on guest-only events > simply cannot work, and userspace can kill VMs in many other ways (there > is no danger to the host). However, even if this is considered to be bad > userspace behavior, there's zero downside to perf/KVM restricting PEBS to > guest-owned events. > > Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily > in two rare situations") fixed the case where host userspace is profiling > KVM *and* userspace, but missed the case where userspace is profiling only > KVM. > > Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") > Reported-by: Seth Forshee <sforshee@kernel.org> > Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon > Cc: stable@vger.kernel.org > Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> > --- > arch/x86/events/intel/core.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > index cd6329207311..75a376478b21 100644 > --- a/arch/x86/events/intel/core.c > +++ b/arch/x86/events/intel/core.c > @@ -4625,7 +4625,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) > arr[pebs_enable] = (struct perf_guest_switch_msr){ > .msr = MSR_IA32_PEBS_ENABLE, > .host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask, > - .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask, > + .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable, Although I can't reproduce the guest crash issue on local SPR server, the logic is correct. Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Furthermore, the below global_ctrl guest value calculation seems incorrect as well. u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable; arr[global_ctrl] = (struct perf_guest_switch_msr){ .msr = MSR_CORE_PERF_GLOBAL_CTRL, .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, }; For guest PEBS events, its HW counter idx would be set into cpuc->pebs_enabled as well, so the calculated guest value would clear the corresponding bit of the guest PEBS events. Thus the guest PEBS events should not be enabled in non-root mode, but coincidentally since the calculated host and guest value of global_ctrl MSR is same, atomic_switch_perf_msrs() doesn't really write the guest value into GLOBAL_CTRL MSR before vm-entry and GLOBAL_CTRL MSR still keeps the default value (all HW counters are enabled). That's why we see guest PEBS still works. for (i = 0; i < nr_msrs; i++) if (msrs[i].host == msrs[i].guest) clear_atomic_switch_msr(vmx, msrs[i].msr); else add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, msrs[i].host, false); Currently we have this Sean's fix, only the guest PEBS event bits of IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply change global_ctrl guest value calculation to this. diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index 09d2d66c9f21..5bc56bb616ec 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data) arr[global_ctrl] = (struct perf_guest_switch_msr){ .msr = MSR_CORE_PERF_GLOBAL_CTRL, .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask, }; > }; > > if (arr[pebs_enable].host) { > > base-commit: 2492e5aba2be064d0604ae23ae0770ecc0168192 ^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. 2025-04-27 9:10 ` Mi, Dapeng @ 2025-05-02 15:04 ` Sean Christopherson 2025-05-06 12:10 ` Mi, Dapeng 0 siblings, 1 reply; 7+ messages in thread From: Sean Christopherson @ 2025-05-02 15:04 UTC (permalink / raw) To: Dapeng Mi Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, linux-perf-users, linux-kernel, Seth Forshee On Sun, Apr 27, 2025, Dapeng Mi wrote: > On 4/26/2025 8:13 AM, Sean Christopherson wrote: > Currently we have this Sean's fix, only the guest PEBS event bits of > IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply > change global_ctrl guest value calculation to this. > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > index 09d2d66c9f21..5bc56bb616ec 100644 > --- a/arch/x86/events/intel/core.c > +++ b/arch/x86/events/intel/core.c > @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr > *intel_guest_get_msrs(int *nr, void *data) > arr[global_ctrl] = (struct perf_guest_switch_msr){ > .msr = MSR_CORE_PERF_GLOBAL_CTRL, > .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, > + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask, > }; Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled will crash the guest. For the counter itself, unless leaving it enabled breaks perf and/or degrades the sampling, I don't think there's an obvious right/wrong approach. E.g. if the host wants to profile host and guest, then keeping the count running while the guest is active might be a good thing. It's still far, far from perfect, as a counter that overflows when the guest is active won't generate a PEBS record, but without digging further, it's not obvious that even that flaw is overall worse than always disabling the counter. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. 2025-05-02 15:04 ` Sean Christopherson @ 2025-05-06 12:10 ` Mi, Dapeng 2025-05-08 13:47 ` Sean Christopherson 0 siblings, 1 reply; 7+ messages in thread From: Mi, Dapeng @ 2025-05-06 12:10 UTC (permalink / raw) To: Sean Christopherson Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, linux-perf-users, linux-kernel, Seth Forshee On 5/2/2025 11:04 PM, Sean Christopherson wrote: > On Sun, Apr 27, 2025, Dapeng Mi wrote: >> On 4/26/2025 8:13 AM, Sean Christopherson wrote: >> Currently we have this Sean's fix, only the guest PEBS event bits of >> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply >> change global_ctrl guest value calculation to this. >> >> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c >> index 09d2d66c9f21..5bc56bb616ec 100644 >> --- a/arch/x86/events/intel/core.c >> +++ b/arch/x86/events/intel/core.c >> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr >> *intel_guest_get_msrs(int *nr, void *data) >> arr[global_ctrl] = (struct perf_guest_switch_msr){ >> .msr = MSR_CORE_PERF_GLOBAL_CTRL, >> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, >> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, >> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask, >> }; > Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled > will crash the guest. For the counter itself, unless leaving it enabled breaks > perf and/or degrades the sampling, I don't think there's an obvious right/wrong > approach. > > E.g. if the host wants to profile host and guest, then keeping the count running > while the guest is active might be a good thing. It's still far, far from > perfect, as a counter that overflows when the guest is active won't generate a > PEBS record, but without digging further, it's not obvious that even that flaw > is overall worse than always disabling the counter. Hmm, if the host PEBS event only samples host side, whether the HW counter or the PEBS engine would be disabled once VM enters non-root mode, the KVM PEBS implementation is correct. But for the host PEBS events which sampling both host and guest, the implementation seems incorrect. As the below code shows, as long as there are host PEBS events regardless of the host PEBS events only sample guest or both host and guest, the host PEBS events would be disabled on both HW counters and PEBS engine once VM enters non-root mode. arr[global_ctrl] = (struct perf_guest_switch_msr){ .msr = MSR_CORE_PERF_GLOBAL_CTRL, .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, }; if (arr[pebs_enable].host) { /* Disable guest PEBS if host PEBS is enabled. */ arr[pebs_enable].guest = 0; } So the host PEBS events which hopes to sample both host and guest only samples host side in fact. This is unexpected. I agree it's not enough to just remove the "~pebs_mask" for guest global_ctrl, a simple way to fix this issue could be to skip all PEBS related MSR switch at VM-exit/VM-entry and always keep the host vlaue in these PEBS MSRS regardless of in root or non-root mode. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. 2025-05-06 12:10 ` Mi, Dapeng @ 2025-05-08 13:47 ` Sean Christopherson 2025-05-09 2:19 ` Mi, Dapeng 0 siblings, 1 reply; 7+ messages in thread From: Sean Christopherson @ 2025-05-08 13:47 UTC (permalink / raw) To: Dapeng Mi Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, linux-perf-users, linux-kernel, Seth Forshee On Tue, May 06, 2025, Dapeng Mi wrote: > > On 5/2/2025 11:04 PM, Sean Christopherson wrote: > > On Sun, Apr 27, 2025, Dapeng Mi wrote: > >> On 4/26/2025 8:13 AM, Sean Christopherson wrote: > >> Currently we have this Sean's fix, only the guest PEBS event bits of > >> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply > >> change global_ctrl guest value calculation to this. > >> > >> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c > >> index 09d2d66c9f21..5bc56bb616ec 100644 > >> --- a/arch/x86/events/intel/core.c > >> +++ b/arch/x86/events/intel/core.c > >> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr > >> *intel_guest_get_msrs(int *nr, void *data) > >> arr[global_ctrl] = (struct perf_guest_switch_msr){ > >> .msr = MSR_CORE_PERF_GLOBAL_CTRL, > >> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > >> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, > >> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask, > >> }; > > Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled > > will crash the guest. For the counter itself, unless leaving it enabled breaks > > perf and/or degrades the sampling, I don't think there's an obvious right/wrong > > approach. > > > > E.g. if the host wants to profile host and guest, then keeping the count running > > while the guest is active might be a good thing. It's still far, far from > > perfect, as a counter that overflows when the guest is active won't generate a > > PEBS record, but without digging further, it's not obvious that even that flaw > > is overall worse than always disabling the counter. > > Hmm, if the host PEBS event only samples host side, whether the HW counter > or the PEBS engine would be disabled once VM enters non-root mode, the KVM > PEBS implementation is correct. But for the host PEBS events which sampling > both host and guest, the implementation seems incorrect. Well, yeah, because there is no correct implementation. > As the below code shows, as long as there are host PEBS events regardless > of the host PEBS events only sample guest or both host and guest, the host > PEBS events would be disabled on both HW counters and PEBS engine once VM > enters non-root mode. > > arr[global_ctrl] = (struct perf_guest_switch_msr){ > .msr = MSR_CORE_PERF_GLOBAL_CTRL, > .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, > .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, > }; > > if (arr[pebs_enable].host) { > /* Disable guest PEBS if host PEBS is enabled. */ > arr[pebs_enable].guest = 0; > > } > > So the host PEBS events which hopes to sample both host and guest only > samples host side in fact. This is unexpected. It's only unexpected in the sense that it's probably not well documented. Because the DS buffer is virtually address, there simply isn't a sane way to enable PEBS (or any feature that utizies the DS buffer) while running a KVM guest that isn't enlightened to explicitly allow profiling via host PEBS (and AFAIK, no such guest exists). Even when KVM is using shadowing paging, i.e. fully controls the page tables used while the guest is running, enabling PEBS isn't feasible as KVM has no way to prevent the guest from using the virtual address. E.g. KVM could shove in mappings for the DS buffer, but that DoS the guest if the guest wants to use the same virtual address range for its own purposes, and would be a massive data leak to the guest since the guest could read host data from the buffer. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. 2025-05-08 13:47 ` Sean Christopherson @ 2025-05-09 2:19 ` Mi, Dapeng 0 siblings, 0 replies; 7+ messages in thread From: Mi, Dapeng @ 2025-05-09 2:19 UTC (permalink / raw) To: Sean Christopherson Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, linux-perf-users, linux-kernel, Seth Forshee On 5/8/2025 9:47 PM, Sean Christopherson wrote: > On Tue, May 06, 2025, Dapeng Mi wrote: >> On 5/2/2025 11:04 PM, Sean Christopherson wrote: >>> On Sun, Apr 27, 2025, Dapeng Mi wrote: >>>> On 4/26/2025 8:13 AM, Sean Christopherson wrote: >>>> Currently we have this Sean's fix, only the guest PEBS event bits of >>>> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply >>>> change global_ctrl guest value calculation to this. >>>> >>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c >>>> index 09d2d66c9f21..5bc56bb616ec 100644 >>>> --- a/arch/x86/events/intel/core.c >>>> +++ b/arch/x86/events/intel/core.c >>>> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr >>>> *intel_guest_get_msrs(int *nr, void *data) >>>> arr[global_ctrl] = (struct perf_guest_switch_msr){ >>>> .msr = MSR_CORE_PERF_GLOBAL_CTRL, >>>> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, >>>> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, >>>> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask, >>>> }; >>> Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled >>> will crash the guest. For the counter itself, unless leaving it enabled breaks >>> perf and/or degrades the sampling, I don't think there's an obvious right/wrong >>> approach. >>> >>> E.g. if the host wants to profile host and guest, then keeping the count running >>> while the guest is active might be a good thing. It's still far, far from >>> perfect, as a counter that overflows when the guest is active won't generate a >>> PEBS record, but without digging further, it's not obvious that even that flaw >>> is overall worse than always disabling the counter. >> Hmm, if the host PEBS event only samples host side, whether the HW counter >> or the PEBS engine would be disabled once VM enters non-root mode, the KVM >> PEBS implementation is correct. But for the host PEBS events which sampling >> both host and guest, the implementation seems incorrect. > Well, yeah, because there is no correct implementation. > >> As the below code shows, as long as there are host PEBS events regardless >> of the host PEBS events only sample guest or both host and guest, the host >> PEBS events would be disabled on both HW counters and PEBS engine once VM >> enters non-root mode. >> >> arr[global_ctrl] = (struct perf_guest_switch_msr){ >> .msr = MSR_CORE_PERF_GLOBAL_CTRL, >> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask, >> .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask, >> }; >> >> if (arr[pebs_enable].host) { >> /* Disable guest PEBS if host PEBS is enabled. */ >> arr[pebs_enable].guest = 0; >> >> } >> >> So the host PEBS events which hopes to sample both host and guest only >> samples host side in fact. This is unexpected. > It's only unexpected in the sense that it's probably not well documented. Because > the DS buffer is virtually address, there simply isn't a sane way to enable PEBS > (or any feature that utizies the DS buffer) while running a KVM guest that isn't > enlightened to explicitly allow profiling via host PEBS (and AFAIK, no such guest > exists). > > Even when KVM is using shadowing paging, i.e. fully controls the page tables used > while the guest is running, enabling PEBS isn't feasible as KVM has no way to > prevent the guest from using the virtual address. E.g. KVM could shove in mappings > for the DS buffer, but that DoS the guest if the guest wants to use the same > virtual address range for its own purposes, and would be a massive data leak to > the guest since the guest could read host data from the buffer. Yeah, that's true. Thanks for the explanation. > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value. 2025-04-26 0:13 [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value Sean Christopherson 2025-04-27 9:10 ` Mi, Dapeng @ 2025-04-28 14:13 ` Seth Forshee 1 sibling, 0 replies; 7+ messages in thread From: Seth Forshee @ 2025-04-28 14:13 UTC (permalink / raw) To: Sean Christopherson Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86, linux-perf-users, linux-kernel, Dapeng Mi On Fri, Apr 25, 2025 at 05:13:55PM -0700, Sean Christopherson wrote: > When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on > VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE > value. Consulting only the host kernel's host vs. guest masks results in > running the guest with PEBS enabled even when the guest doesn't want to use > PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply > looking at exclude_host can't differentiate between events created by host > userspace, and events created by KVM on behalf of the guest. > > Running the guest with PEBS unexpectedly enabled typically manifests as > crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't > written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when > trying to record PEBS events. > > The issue is most easily reproduced by running `perf kvm top` from before > commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after > which, `perf kvm top` effectively stopped using PEBS). The userspace side > of perf creates a guest-only PEBS event, which intel_guest_get_msrs() > misconstrues a guest-*owned* PEBS event. > > Arguably, this is a userspace bug, as enabling PEBS on guest-only events > simply cannot work, and userspace can kill VMs in many other ways (there > is no danger to the host). However, even if this is considered to be bad > userspace behavior, there's zero downside to perf/KVM restricting PEBS to > guest-owned events. > > Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily > in two rare situations") fixed the case where host userspace is profiling > KVM *and* userspace, but missed the case where userspace is profiling only > KVM. > > Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS") > Reported-by: Seth Forshee <sforshee@kernel.org> > Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon > Cc: stable@vger.kernel.org > Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> > Signed-off-by: Sean Christopherson <seanjc@google.com> Tested-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-05-09 2:19 UTC | newest] Thread overview: 7+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-04-26 0:13 [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value Sean Christopherson 2025-04-27 9:10 ` Mi, Dapeng 2025-05-02 15:04 ` Sean Christopherson 2025-05-06 12:10 ` Mi, Dapeng 2025-05-08 13:47 ` Sean Christopherson 2025-05-09 2:19 ` Mi, Dapeng 2025-04-28 14:13 ` Seth Forshee
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).