* [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.
@ 2025-04-26 0:13 Sean Christopherson
2025-04-27 9:10 ` Mi, Dapeng
2025-04-28 14:13 ` Seth Forshee
0 siblings, 2 replies; 7+ messages in thread
From: Sean Christopherson @ 2025-04-26 0:13 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86
Cc: linux-perf-users, linux-kernel, Seth Forshee, Dapeng Mi,
Sean Christopherson
When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on
VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE
value. Consulting only the host kernel's host vs. guest masks results in
running the guest with PEBS enabled even when the guest doesn't want to use
PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply
looking at exclude_host can't differentiate between events created by host
userspace, and events created by KVM on behalf of the guest.
Running the guest with PEBS unexpectedly enabled typically manifests as
crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't
written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when
trying to record PEBS events.
The issue is most easily reproduced by running `perf kvm top` from before
commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after
which, `perf kvm top` effectively stopped using PEBS). The userspace side
of perf creates a guest-only PEBS event, which intel_guest_get_msrs()
misconstrues a guest-*owned* PEBS event.
Arguably, this is a userspace bug, as enabling PEBS on guest-only events
simply cannot work, and userspace can kill VMs in many other ways (there
is no danger to the host). However, even if this is considered to be bad
userspace behavior, there's zero downside to perf/KVM restricting PEBS to
guest-owned events.
Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily
in two rare situations") fixed the case where host userspace is profiling
KVM *and* userspace, but missed the case where userspace is profiling only
KVM.
Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
Reported-by: Seth Forshee <sforshee@kernel.org>
Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon
Cc: stable@vger.kernel.org
Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/events/intel/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cd6329207311..75a376478b21 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4625,7 +4625,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
arr[pebs_enable] = (struct perf_guest_switch_msr){
.msr = MSR_IA32_PEBS_ENABLE,
.host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
- .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
+ .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable,
};
if (arr[pebs_enable].host) {
base-commit: 2492e5aba2be064d0604ae23ae0770ecc0168192
--
2.49.0.850.g28803427d3-goog
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.
2025-04-26 0:13 [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value Sean Christopherson
@ 2025-04-27 9:10 ` Mi, Dapeng
2025-05-02 15:04 ` Sean Christopherson
2025-04-28 14:13 ` Seth Forshee
1 sibling, 1 reply; 7+ messages in thread
From: Mi, Dapeng @ 2025-04-27 9:10 UTC (permalink / raw)
To: Sean Christopherson, Peter Zijlstra, Ingo Molnar,
Arnaldo Carvalho de Melo, Namhyung Kim, Thomas Gleixner,
Borislav Petkov, Dave Hansen, x86
Cc: linux-perf-users, linux-kernel, Seth Forshee
On 4/26/2025 8:13 AM, Sean Christopherson wrote:
> When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on
> VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE
> value. Consulting only the host kernel's host vs. guest masks results in
> running the guest with PEBS enabled even when the guest doesn't want to use
> PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply
> looking at exclude_host can't differentiate between events created by host
> userspace, and events created by KVM on behalf of the guest.
>
> Running the guest with PEBS unexpectedly enabled typically manifests as
> crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't
> written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when
> trying to record PEBS events.
>
> The issue is most easily reproduced by running `perf kvm top` from before
> commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after
> which, `perf kvm top` effectively stopped using PEBS). The userspace side
> of perf creates a guest-only PEBS event, which intel_guest_get_msrs()
> misconstrues a guest-*owned* PEBS event.
>
> Arguably, this is a userspace bug, as enabling PEBS on guest-only events
> simply cannot work, and userspace can kill VMs in many other ways (there
> is no danger to the host). However, even if this is considered to be bad
> userspace behavior, there's zero downside to perf/KVM restricting PEBS to
> guest-owned events.
>
> Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily
> in two rare situations") fixed the case where host userspace is profiling
> KVM *and* userspace, but missed the case where userspace is profiling only
> KVM.
>
> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
> Reported-by: Seth Forshee <sforshee@kernel.org>
> Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon
> Cc: stable@vger.kernel.org
> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/events/intel/core.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index cd6329207311..75a376478b21 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4625,7 +4625,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> arr[pebs_enable] = (struct perf_guest_switch_msr){
> .msr = MSR_IA32_PEBS_ENABLE,
> .host = cpuc->pebs_enabled & ~cpuc->intel_ctrl_guest_mask,
> - .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask,
> + .guest = pebs_mask & ~cpuc->intel_ctrl_host_mask & kvm_pmu->pebs_enable,
Although I can't reproduce the guest crash issue on local SPR server, the
logic is correct.
Reviewed-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Furthermore, the below global_ctrl guest value calculation seems
incorrect as well.
u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
arr[global_ctrl] = (struct perf_guest_switch_msr){
.msr = MSR_CORE_PERF_GLOBAL_CTRL,
.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
.guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
};
For guest PEBS events, its HW counter idx would be set into
cpuc->pebs_enabled as well, so the calculated guest value would clear the
corresponding bit of the guest PEBS events. Thus the guest PEBS events
should not be enabled in non-root mode, but coincidentally since the
calculated host and guest value of global_ctrl MSR is same,
atomic_switch_perf_msrs() doesn't really write the guest value into
GLOBAL_CTRL MSR before vm-entry and GLOBAL_CTRL MSR still keeps the default
value (all HW counters are enabled). That's why we see guest PEBS still works.
for (i = 0; i < nr_msrs; i++)
if (msrs[i].host == msrs[i].guest)
clear_atomic_switch_msr(vmx, msrs[i].msr);
else
add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest,
msrs[i].host, false);
Currently we have this Sean's fix, only the guest PEBS event bits of
IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply
change global_ctrl guest value calculation to this.
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 09d2d66c9f21..5bc56bb616ec 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr
*intel_guest_get_msrs(int *nr, void *data)
arr[global_ctrl] = (struct perf_guest_switch_msr){
.msr = MSR_CORE_PERF_GLOBAL_CTRL,
.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
- .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
+ .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask,
};
> };
>
> if (arr[pebs_enable].host) {
>
> base-commit: 2492e5aba2be064d0604ae23ae0770ecc0168192
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.
2025-04-26 0:13 [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value Sean Christopherson
2025-04-27 9:10 ` Mi, Dapeng
@ 2025-04-28 14:13 ` Seth Forshee
1 sibling, 0 replies; 7+ messages in thread
From: Seth Forshee @ 2025-04-28 14:13 UTC (permalink / raw)
To: Sean Christopherson
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
linux-perf-users, linux-kernel, Dapeng Mi
On Fri, Apr 25, 2025 at 05:13:55PM -0700, Sean Christopherson wrote:
> When generating the MSR_IA32_PEBS_ENABLE value that will be loaded on
> VM-Entry to a KVM guest, mask the value with the vCPU's desired PEBS_ENABLE
> value. Consulting only the host kernel's host vs. guest masks results in
> running the guest with PEBS enabled even when the guest doesn't want to use
> PEBS. Because KVM uses perf events to proxy the guest virtual PMU, simply
> looking at exclude_host can't differentiate between events created by host
> userspace, and events created by KVM on behalf of the guest.
>
> Running the guest with PEBS unexpectedly enabled typically manifests as
> crashes due to a near-infinite stream of #PFs. E.g. if the guest hasn't
> written MSR_IA32_DS_AREA, the CPU will hit page faults on address '0' when
> trying to record PEBS events.
>
> The issue is most easily reproduced by running `perf kvm top` from before
> commit 7b100989b4f6 ("perf evlist: Remove __evlist__add_default") (after
> which, `perf kvm top` effectively stopped using PEBS). The userspace side
> of perf creates a guest-only PEBS event, which intel_guest_get_msrs()
> misconstrues a guest-*owned* PEBS event.
>
> Arguably, this is a userspace bug, as enabling PEBS on guest-only events
> simply cannot work, and userspace can kill VMs in many other ways (there
> is no danger to the host). However, even if this is considered to be bad
> userspace behavior, there's zero downside to perf/KVM restricting PEBS to
> guest-owned events.
>
> Note, commit 854250329c02 ("KVM: x86/pmu: Disable guest PEBS temporarily
> in two rare situations") fixed the case where host userspace is profiling
> KVM *and* userspace, but missed the case where userspace is profiling only
> KVM.
>
> Fixes: c59a1f106f5c ("KVM: x86/pmu: Add IA32_PEBS_ENABLE MSR emulation for extended PEBS")
> Reported-by: Seth Forshee <sforshee@kernel.org>
> Closes: https://lore.kernel.org/all/Z_VUswFkWiTYI0eD@do-x1carbon
> Cc: stable@vger.kernel.org
> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Tested-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.
2025-04-27 9:10 ` Mi, Dapeng
@ 2025-05-02 15:04 ` Sean Christopherson
2025-05-06 12:10 ` Mi, Dapeng
0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2025-05-02 15:04 UTC (permalink / raw)
To: Dapeng Mi
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
linux-perf-users, linux-kernel, Seth Forshee
On Sun, Apr 27, 2025, Dapeng Mi wrote:
> On 4/26/2025 8:13 AM, Sean Christopherson wrote:
> Currently we have this Sean's fix, only the guest PEBS event bits of
> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply
> change global_ctrl guest value calculation to this.
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 09d2d66c9f21..5bc56bb616ec 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr
> *intel_guest_get_msrs(int *nr, void *data)
> arr[global_ctrl] = (struct perf_guest_switch_msr){
> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask,
> };
Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled
will crash the guest. For the counter itself, unless leaving it enabled breaks
perf and/or degrades the sampling, I don't think there's an obvious right/wrong
approach.
E.g. if the host wants to profile host and guest, then keeping the count running
while the guest is active might be a good thing. It's still far, far from
perfect, as a counter that overflows when the guest is active won't generate a
PEBS record, but without digging further, it's not obvious that even that flaw
is overall worse than always disabling the counter.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.
2025-05-02 15:04 ` Sean Christopherson
@ 2025-05-06 12:10 ` Mi, Dapeng
2025-05-08 13:47 ` Sean Christopherson
0 siblings, 1 reply; 7+ messages in thread
From: Mi, Dapeng @ 2025-05-06 12:10 UTC (permalink / raw)
To: Sean Christopherson
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
linux-perf-users, linux-kernel, Seth Forshee
On 5/2/2025 11:04 PM, Sean Christopherson wrote:
> On Sun, Apr 27, 2025, Dapeng Mi wrote:
>> On 4/26/2025 8:13 AM, Sean Christopherson wrote:
>> Currently we have this Sean's fix, only the guest PEBS event bits of
>> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply
>> change global_ctrl guest value calculation to this.
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index 09d2d66c9f21..5bc56bb616ec 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr
>> *intel_guest_get_msrs(int *nr, void *data)
>> arr[global_ctrl] = (struct perf_guest_switch_msr){
>> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
>> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
>> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
>> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask,
>> };
> Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled
> will crash the guest. For the counter itself, unless leaving it enabled breaks
> perf and/or degrades the sampling, I don't think there's an obvious right/wrong
> approach.
>
> E.g. if the host wants to profile host and guest, then keeping the count running
> while the guest is active might be a good thing. It's still far, far from
> perfect, as a counter that overflows when the guest is active won't generate a
> PEBS record, but without digging further, it's not obvious that even that flaw
> is overall worse than always disabling the counter.
Hmm, if the host PEBS event only samples host side, whether the HW counter
or the PEBS engine would be disabled once VM enters non-root mode, the KVM
PEBS implementation is correct. But for the host PEBS events which sampling
both host and guest, the implementation seems incorrect.
As the below code shows, as long as there are host PEBS events regardless
of the host PEBS events only sample guest or both host and guest, the host
PEBS events would be disabled on both HW counters and PEBS engine once VM
enters non-root mode.
arr[global_ctrl] = (struct perf_guest_switch_msr){
.msr = MSR_CORE_PERF_GLOBAL_CTRL,
.host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
.guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
};
if (arr[pebs_enable].host) {
/* Disable guest PEBS if host PEBS is enabled. */
arr[pebs_enable].guest = 0;
}
So the host PEBS events which hopes to sample both host and guest only
samples host side in fact. This is unexpected.
I agree it's not enough to just remove the "~pebs_mask" for guest
global_ctrl, a simple way to fix this issue could be to skip all PEBS
related MSR switch at VM-exit/VM-entry and always keep the host vlaue in
these PEBS MSRS regardless of in root or non-root mode.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.
2025-05-06 12:10 ` Mi, Dapeng
@ 2025-05-08 13:47 ` Sean Christopherson
2025-05-09 2:19 ` Mi, Dapeng
0 siblings, 1 reply; 7+ messages in thread
From: Sean Christopherson @ 2025-05-08 13:47 UTC (permalink / raw)
To: Dapeng Mi
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
linux-perf-users, linux-kernel, Seth Forshee
On Tue, May 06, 2025, Dapeng Mi wrote:
>
> On 5/2/2025 11:04 PM, Sean Christopherson wrote:
> > On Sun, Apr 27, 2025, Dapeng Mi wrote:
> >> On 4/26/2025 8:13 AM, Sean Christopherson wrote:
> >> Currently we have this Sean's fix, only the guest PEBS event bits of
> >> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply
> >> change global_ctrl guest value calculation to this.
> >>
> >> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> >> index 09d2d66c9f21..5bc56bb616ec 100644
> >> --- a/arch/x86/events/intel/core.c
> >> +++ b/arch/x86/events/intel/core.c
> >> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr
> >> *intel_guest_get_msrs(int *nr, void *data)
> >> arr[global_ctrl] = (struct perf_guest_switch_msr){
> >> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> >> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> >> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
> >> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask,
> >> };
> > Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled
> > will crash the guest. For the counter itself, unless leaving it enabled breaks
> > perf and/or degrades the sampling, I don't think there's an obvious right/wrong
> > approach.
> >
> > E.g. if the host wants to profile host and guest, then keeping the count running
> > while the guest is active might be a good thing. It's still far, far from
> > perfect, as a counter that overflows when the guest is active won't generate a
> > PEBS record, but without digging further, it's not obvious that even that flaw
> > is overall worse than always disabling the counter.
>
> Hmm, if the host PEBS event only samples host side, whether the HW counter
> or the PEBS engine would be disabled once VM enters non-root mode, the KVM
> PEBS implementation is correct. But for the host PEBS events which sampling
> both host and guest, the implementation seems incorrect.
Well, yeah, because there is no correct implementation.
> As the below code shows, as long as there are host PEBS events regardless
> of the host PEBS events only sample guest or both host and guest, the host
> PEBS events would be disabled on both HW counters and PEBS engine once VM
> enters non-root mode.
>
> arr[global_ctrl] = (struct perf_guest_switch_msr){
> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
> };
>
> if (arr[pebs_enable].host) {
> /* Disable guest PEBS if host PEBS is enabled. */
> arr[pebs_enable].guest = 0;
>
> }
>
> So the host PEBS events which hopes to sample both host and guest only
> samples host side in fact. This is unexpected.
It's only unexpected in the sense that it's probably not well documented. Because
the DS buffer is virtually address, there simply isn't a sane way to enable PEBS
(or any feature that utizies the DS buffer) while running a KVM guest that isn't
enlightened to explicitly allow profiling via host PEBS (and AFAIK, no such guest
exists).
Even when KVM is using shadowing paging, i.e. fully controls the page tables used
while the guest is running, enabling PEBS isn't feasible as KVM has no way to
prevent the guest from using the virtual address. E.g. KVM could shove in mappings
for the DS buffer, but that DoS the guest if the guest wants to use the same
virtual address range for its own purposes, and would be a massive data leak to
the guest since the guest could read host data from the buffer.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value.
2025-05-08 13:47 ` Sean Christopherson
@ 2025-05-09 2:19 ` Mi, Dapeng
0 siblings, 0 replies; 7+ messages in thread
From: Mi, Dapeng @ 2025-05-09 2:19 UTC (permalink / raw)
To: Sean Christopherson
Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
linux-perf-users, linux-kernel, Seth Forshee
On 5/8/2025 9:47 PM, Sean Christopherson wrote:
> On Tue, May 06, 2025, Dapeng Mi wrote:
>> On 5/2/2025 11:04 PM, Sean Christopherson wrote:
>>> On Sun, Apr 27, 2025, Dapeng Mi wrote:
>>>> On 4/26/2025 8:13 AM, Sean Christopherson wrote:
>>>> Currently we have this Sean's fix, only the guest PEBS event bits of
>>>> IA32_PEBS_ENABLE MSR are enabled in non-root mode, suppose we can simply
>>>> change global_ctrl guest value calculation to this.
>>>>
>>>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>>>> index 09d2d66c9f21..5bc56bb616ec 100644
>>>> --- a/arch/x86/events/intel/core.c
>>>> +++ b/arch/x86/events/intel/core.c
>>>> @@ -4342,9 +4342,12 @@ static struct perf_guest_switch_msr
>>>> *intel_guest_get_msrs(int *nr, void *data)
>>>> arr[global_ctrl] = (struct perf_guest_switch_msr){
>>>> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
>>>> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
>>>> - .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
>>>> + .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask,
>>>> };
>>> Hmm, that's not as clear cut. PEBS needs to be disabled because leaving it enabled
>>> will crash the guest. For the counter itself, unless leaving it enabled breaks
>>> perf and/or degrades the sampling, I don't think there's an obvious right/wrong
>>> approach.
>>>
>>> E.g. if the host wants to profile host and guest, then keeping the count running
>>> while the guest is active might be a good thing. It's still far, far from
>>> perfect, as a counter that overflows when the guest is active won't generate a
>>> PEBS record, but without digging further, it's not obvious that even that flaw
>>> is overall worse than always disabling the counter.
>> Hmm, if the host PEBS event only samples host side, whether the HW counter
>> or the PEBS engine would be disabled once VM enters non-root mode, the KVM
>> PEBS implementation is correct. But for the host PEBS events which sampling
>> both host and guest, the implementation seems incorrect.
> Well, yeah, because there is no correct implementation.
>
>> As the below code shows, as long as there are host PEBS events regardless
>> of the host PEBS events only sample guest or both host and guest, the host
>> PEBS events would be disabled on both HW counters and PEBS engine once VM
>> enters non-root mode.
>>
>> arr[global_ctrl] = (struct perf_guest_switch_msr){
>> .msr = MSR_CORE_PERF_GLOBAL_CTRL,
>> .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
>> .guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask & ~pebs_mask,
>> };
>>
>> if (arr[pebs_enable].host) {
>> /* Disable guest PEBS if host PEBS is enabled. */
>> arr[pebs_enable].guest = 0;
>>
>> }
>>
>> So the host PEBS events which hopes to sample both host and guest only
>> samples host side in fact. This is unexpected.
> It's only unexpected in the sense that it's probably not well documented. Because
> the DS buffer is virtually address, there simply isn't a sane way to enable PEBS
> (or any feature that utizies the DS buffer) while running a KVM guest that isn't
> enlightened to explicitly allow profiling via host PEBS (and AFAIK, no such guest
> exists).
>
> Even when KVM is using shadowing paging, i.e. fully controls the page tables used
> while the guest is running, enabling PEBS isn't feasible as KVM has no way to
> prevent the guest from using the virtual address. E.g. KVM could shove in mappings
> for the DS buffer, but that DoS the guest if the guest wants to use the same
> virtual address range for its own purposes, and would be a massive data leak to
> the guest since the guest could read host data from the buffer.
Yeah, that's true. Thanks for the explanation.
>
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-05-09 2:19 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-04-26 0:13 [PATCH] perf/x86/intel: KVM: Mask PEBS_ENABLE loaded for guest with vCPU's value Sean Christopherson
2025-04-27 9:10 ` Mi, Dapeng
2025-05-02 15:04 ` Sean Christopherson
2025-05-06 12:10 ` Mi, Dapeng
2025-05-08 13:47 ` Sean Christopherson
2025-05-09 2:19 ` Mi, Dapeng
2025-04-28 14:13 ` Seth Forshee
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).