* [PATCH] perf/x86/intel: Don't update PEBS_ENABLE MSR in NMI context
@ 2026-03-20 15:27 Jim Mattson
2026-03-20 18:58 ` Jim Mattson
0 siblings, 1 reply; 4+ messages in thread
From: Jim Mattson @ 2026-03-20 15:27 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
Namhyung Kim, Mark Rutland, Alexander Shishkin, Jiri Olsa,
Ian Rogers, Adrian Hunter, James Clark, Thomas Gleixner,
Borislav Petkov, Dave Hansen, x86, H. Peter Anvin,
linux-perf-users, linux-kernel, Stephane Eranian, Mingwei Zhang
Cc: Jim Mattson
Writing MSR_IA32_PEBS_ENABLE in handle_pmi_common() when a PEBS event is
throttled is problematic for KVM, which may have an older value for the MSR
(previously obtained via perf_guest_get_msrs()) stored in the VM-exit
MSR-load list.
It isn't necessary to update the MSR in the PMI handler, because the
throttled counter's PerfEvtSel ENABLE bit has been cleared, so its
PEBS_ENABLE bit is irrelevant. However, the MSR must be updated before the
stale bit becomes relevant again (i.e. when the PMC starts counting a new
perf event).
When the PEBS_ENABLE MSR is rendered stale in the PMI handler, just record
that fact in a new cpu_hw_events boolean, pebs_stale. Extend
intel_pmu_pebs_enable_all() to write MSR_IA32_PEBS_ENABLE when pebs_stale
is set and to clear pebs_stale. This ensures stale bits are cleared
during the normal PMU enable path, before PERF_GLOBAL_CTRL is restored and
any new event can start counting.
Signed-off-by: Jim Mattson <jmattson@google.com>
---
arch/x86/events/intel/core.c | 28 ++++++++++++++++++++++------
arch/x86/events/intel/ds.c | 4 +++-
arch/x86/events/perf_event.h | 1 +
3 files changed, 26 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index cf3a4fe06ff2..3b0173d25232 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3548,14 +3548,26 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
static_call(x86_pmu_drain_pebs)(regs, &data);
/*
- * PMI throttle may be triggered, which stops the PEBS event.
- * Although cpuc->pebs_enabled is updated accordingly, the
- * MSR_IA32_PEBS_ENABLE is not updated. Because the
- * cpuc->enabled has been forced to 0 in PMI.
- * Update the MSR if pebs_enabled is changed.
+ * PMI throttling may be triggered, which stops the PEBS
+ * event. Although cpuc->pebs_enabled was updated
+ * accordingly, MSR_IA32_PEBS_ENABLE has not been updated.
+ *
+ * The MSR must not be updated in NMI context, because KVM
+ * may be in a critical region on its way to VM-entry, with
+ * the now-stale MSR value stored in the VM-exit MSR-load
+ * list. On VM-exit, the CPU will restore the stale value.
+ *
+ * Deferring the MSR update is harmless because the
+ * throttled event's PerfEvtSel ENABLE bit has been
+ * cleared, so the stale PEBS_ENABLE bit is irrelevant for
+ * now.
+ *
+ * Track when MSR_IA32_PEBS_ENABLE is stale, so that
+ * pebs_enable_all() will write cpuc->pebs_enabled to the
+ * MSR, even when cpuc->pebs_enabled is 0.
*/
if (pebs_enabled != cpuc->pebs_enabled)
- wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
+ cpuc->pebs_stale = true;
/*
* Above PEBS handler (PEBS counters snapshotting) has updated fixed
@@ -4978,6 +4990,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
* These values have nothing to do with the emulated values the guest sees
* when it uses {RD,WR}MSR, which should be handled by the KVM context,
* specifically in the intel_pmu_{get,set}_msr().
+ *
+ * Note that MSRs returned from this function must not be modified in NMI
+ * context. Doing so could result in a stale host value being restored at
+ * the next VM-exit.
*/
static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
{
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 5027afc97b65..852baf6a05e9 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1982,8 +1982,10 @@ void intel_pmu_pebs_enable_all(void)
{
struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
- if (cpuc->pebs_enabled)
+ if (cpuc->pebs_enabled || cpuc->pebs_stale)
wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
+
+ cpuc->pebs_stale = false;
}
void intel_pmu_pebs_disable_all(void)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index fad87d3c8b2c..22102296e31b 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -298,6 +298,7 @@ struct cpu_hw_events {
/* DS based PEBS or arch-PEBS buffer address */
void *pebs_vaddr;
u64 pebs_enabled;
+ bool pebs_stale;
int n_pebs;
int n_large_pebs;
int n_pebs_via_pt;
--
2.53.0.959.g497ff81fa9-goog
^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/x86/intel: Don't update PEBS_ENABLE MSR in NMI context
2026-03-20 15:27 [PATCH] perf/x86/intel: Don't update PEBS_ENABLE MSR in NMI context Jim Mattson
@ 2026-03-20 18:58 ` Jim Mattson
2026-04-10 2:24 ` Mi, Dapeng
0 siblings, 1 reply; 4+ messages in thread
From: Jim Mattson @ 2026-03-20 18:58 UTC (permalink / raw)
To: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-perf-users, linux-kernel, Stephane Eranian,
Mingwei Zhang, Peter Zijlstra
On Fri, Mar 20, 2026 at 8:29 AM Jim Mattson <jmattson@google.com> wrote:
>
> Writing MSR_IA32_PEBS_ENABLE in handle_pmi_common() when a PEBS event is
> throttled is problematic for KVM, which may have an older value for the MSR
> (previously obtained via perf_guest_get_msrs()) stored in the VM-exit
> MSR-load list.
>
> It isn't necessary to update the MSR in the PMI handler, because the
> throttled counter's PerfEvtSel ENABLE bit has been cleared, so its
> PEBS_ENABLE bit is irrelevant. However, the MSR must be updated before the
> stale bit becomes relevant again (i.e. when the PMC starts counting a new
> perf event).
>
> When the PEBS_ENABLE MSR is rendered stale in the PMI handler, just record
> that fact in a new cpu_hw_events boolean, pebs_stale. Extend
> intel_pmu_pebs_enable_all() to write MSR_IA32_PEBS_ENABLE when pebs_stale
> is set and to clear pebs_stale. This ensures stale bits are cleared
> during the normal PMU enable path, before PERF_GLOBAL_CTRL is restored and
> any new event can start counting.
>
> Signed-off-by: Jim Mattson <jmattson@google.com>
> ---
> arch/x86/events/intel/core.c | 28 ++++++++++++++++++++++------
> arch/x86/events/intel/ds.c | 4 +++-
> arch/x86/events/perf_event.h | 1 +
> 3 files changed, 26 insertions(+), 7 deletions(-)
>
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index cf3a4fe06ff2..3b0173d25232 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3548,14 +3548,26 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
> static_call(x86_pmu_drain_pebs)(regs, &data);
>
> /*
> - * PMI throttle may be triggered, which stops the PEBS event.
> - * Although cpuc->pebs_enabled is updated accordingly, the
> - * MSR_IA32_PEBS_ENABLE is not updated. Because the
> - * cpuc->enabled has been forced to 0 in PMI.
> - * Update the MSR if pebs_enabled is changed.
> + * PMI throttling may be triggered, which stops the PEBS
> + * event. Although cpuc->pebs_enabled was updated
> + * accordingly, MSR_IA32_PEBS_ENABLE has not been updated.
> + *
> + * The MSR must not be updated in NMI context, because KVM
> + * may be in a critical region on its way to VM-entry, with
> + * the now-stale MSR value stored in the VM-exit MSR-load
> + * list. On VM-exit, the CPU will restore the stale value.
> + *
> + * Deferring the MSR update is harmless because the
> + * throttled event's PerfEvtSel ENABLE bit has been
> + * cleared, so the stale PEBS_ENABLE bit is irrelevant for
> + * now.
> + *
> + * Track when MSR_IA32_PEBS_ENABLE is stale, so that
> + * pebs_enable_all() will write cpuc->pebs_enabled to the
> + * MSR, even when cpuc->pebs_enabled is 0.
> */
> if (pebs_enabled != cpuc->pebs_enabled)
> - wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
> + cpuc->pebs_stale = true;
>
> /*
> * Above PEBS handler (PEBS counters snapshotting) has updated fixed
> @@ -4978,6 +4990,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
> * These values have nothing to do with the emulated values the guest sees
> * when it uses {RD,WR}MSR, which should be handled by the KVM context,
> * specifically in the intel_pmu_{get,set}_msr().
> + *
> + * Note that MSRs returned from this function must not be modified in NMI
> + * context. Doing so could result in a stale host value being restored at
> + * the next VM-exit.
> */
> static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> {
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 5027afc97b65..852baf6a05e9 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1982,8 +1982,10 @@ void intel_pmu_pebs_enable_all(void)
> {
> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>
> - if (cpuc->pebs_enabled)
> + if (cpuc->pebs_enabled || cpuc->pebs_stale)
> wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
> +
> + cpuc->pebs_stale = false;
> }
>
> void intel_pmu_pebs_disable_all(void)
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index fad87d3c8b2c..22102296e31b 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -298,6 +298,7 @@ struct cpu_hw_events {
> /* DS based PEBS or arch-PEBS buffer address */
> void *pebs_vaddr;
> u64 pebs_enabled;
> + bool pebs_stale;
> int n_pebs;
> int n_large_pebs;
> int n_pebs_via_pt;
> --
> 2.53.0.959.g497ff81fa9-goog
>
Apologies. Resend to fix Peter Zijlstra's address.
^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/x86/intel: Don't update PEBS_ENABLE MSR in NMI context
2026-03-20 18:58 ` Jim Mattson
@ 2026-04-10 2:24 ` Mi, Dapeng
2026-04-10 2:58 ` Jim Mattson
0 siblings, 1 reply; 4+ messages in thread
From: Mi, Dapeng @ 2026-04-10 2:24 UTC (permalink / raw)
To: Jim Mattson, Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim,
Mark Rutland, Alexander Shishkin, Jiri Olsa, Ian Rogers,
Adrian Hunter, James Clark, Thomas Gleixner, Borislav Petkov,
Dave Hansen, x86, H. Peter Anvin, linux-perf-users, linux-kernel,
Stephane Eranian, Mingwei Zhang, Peter Zijlstra
On 3/21/2026 2:58 AM, Jim Mattson wrote:
> On Fri, Mar 20, 2026 at 8:29 AM Jim Mattson <jmattson@google.com> wrote:
>> Writing MSR_IA32_PEBS_ENABLE in handle_pmi_common() when a PEBS event is
>> throttled is problematic for KVM, which may have an older value for the MSR
>> (previously obtained via perf_guest_get_msrs()) stored in the VM-exit
>> MSR-load list.
>>
>> It isn't necessary to update the MSR in the PMI handler, because the
>> throttled counter's PerfEvtSel ENABLE bit has been cleared, so its
>> PEBS_ENABLE bit is irrelevant. However, the MSR must be updated before the
>> stale bit becomes relevant again (i.e. when the PMC starts counting a new
>> perf event).
>>
>> When the PEBS_ENABLE MSR is rendered stale in the PMI handler, just record
>> that fact in a new cpu_hw_events boolean, pebs_stale. Extend
>> intel_pmu_pebs_enable_all() to write MSR_IA32_PEBS_ENABLE when pebs_stale
>> is set and to clear pebs_stale. This ensures stale bits are cleared
>> during the normal PMU enable path, before PERF_GLOBAL_CTRL is restored and
>> any new event can start counting.
>>
>> Signed-off-by: Jim Mattson <jmattson@google.com>
>> ---
>> arch/x86/events/intel/core.c | 28 ++++++++++++++++++++++------
>> arch/x86/events/intel/ds.c | 4 +++-
>> arch/x86/events/perf_event.h | 1 +
>> 3 files changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
>> index cf3a4fe06ff2..3b0173d25232 100644
>> --- a/arch/x86/events/intel/core.c
>> +++ b/arch/x86/events/intel/core.c
>> @@ -3548,14 +3548,26 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
>> static_call(x86_pmu_drain_pebs)(regs, &data);
>>
>> /*
>> - * PMI throttle may be triggered, which stops the PEBS event.
>> - * Although cpuc->pebs_enabled is updated accordingly, the
>> - * MSR_IA32_PEBS_ENABLE is not updated. Because the
>> - * cpuc->enabled has been forced to 0 in PMI.
>> - * Update the MSR if pebs_enabled is changed.
>> + * PMI throttling may be triggered, which stops the PEBS
>> + * event. Although cpuc->pebs_enabled was updated
>> + * accordingly, MSR_IA32_PEBS_ENABLE has not been updated.
>> + *
>> + * The MSR must not be updated in NMI context, because KVM
>> + * may be in a critical region on its way to VM-entry, with
>> + * the now-stale MSR value stored in the VM-exit MSR-load
>> + * list. On VM-exit, the CPU will restore the stale value.
Hmm, that looks like a race condition between KVM and the PMI handler. I
suppose the issue can only happen in the legacy perf-based vPMU case,
right? With the mediated vPMU, the host events are rescheduled on each
VM-exit and the MSR_IA32_PEBS_ENABLE MSR is always rewritten.
>> + *
>> + * Deferring the MSR update is harmless because the
>> + * throttled event's PerfEvtSel ENABLE bit has been
>> + * cleared, so the stale PEBS_ENABLE bit is irrelevant for
>> + * now.
>> + *
>> + * Track when MSR_IA32_PEBS_ENABLE is stale, so that
>> + * pebs_enable_all() will write cpuc->pebs_enabled to the
>> + * MSR, even when cpuc->pebs_enabled is 0.
>> */
>> if (pebs_enabled != cpuc->pebs_enabled)
>> - wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
>> + cpuc->pebs_stale = true;
Deferring the MSR_IA32_PEBS_ENABLE write leaves a mismatch between the
value of MSR_IA32_PEBS_ENABLE and cpuc->pebs_enabled until
intel_pmu_pebs_enable_all() is called again. But I suppose that's fine,
since the event has been stopped and intel_pmu_pebs_enable_all() will be
called on the next unthrottle.
>>
>> /*
>> * Above PEBS handler (PEBS counters snapshotting) has updated fixed
>> @@ -4978,6 +4990,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
>> * These values have nothing to do with the emulated values the guest sees
>> * when it uses {RD,WR}MSR, which should be handled by the KVM context,
>> * specifically in the intel_pmu_{get,set}_msr().
>> + *
>> + * Note that MSRs returned from this function must not be modified in NMI
>> + * context. Doing so could result in a stale host value being restored at
>> + * the next VM-exit.
>> */
>> static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
>> {
>> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
>> index 5027afc97b65..852baf6a05e9 100644
>> --- a/arch/x86/events/intel/ds.c
>> +++ b/arch/x86/events/intel/ds.c
>> @@ -1982,8 +1982,10 @@ void intel_pmu_pebs_enable_all(void)
>> {
>> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
>>
>> - if (cpuc->pebs_enabled)
>> + if (cpuc->pebs_enabled || cpuc->pebs_stale)
>> wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
>> +
>> + cpuc->pebs_stale = false;
Sashiko comments:
"
Can this race with an NMI?
If a PMI fires exactly between wrmsrq() and cpuc->pebs_stale = false:
CPU0
intel_pmu_pebs_enable_all()
wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
<--- NMI (PMI) fires here
handle_pmi_common()
... throttles event ...
cpuc->pebs_stale = true;
<--- NMI returns
cpuc->pebs_stale = false;
Would this clobber the pebs_stale flag set by the NMI, leaving the software
state desynchronized from the hardware MSR and potentially causing the PMU
to skip the MSR update on the next enable?
"
It looks reasonable.
https://sashiko.dev/#/patchset/20260320152928.1916447-1-jmattson%40google.com
Thanks.
>> }
>>
>> void intel_pmu_pebs_disable_all(void)
>> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
>> index fad87d3c8b2c..22102296e31b 100644
>> --- a/arch/x86/events/perf_event.h
>> +++ b/arch/x86/events/perf_event.h
>> @@ -298,6 +298,7 @@ struct cpu_hw_events {
>> /* DS based PEBS or arch-PEBS buffer address */
>> void *pebs_vaddr;
>> u64 pebs_enabled;
>> + bool pebs_stale;
>> int n_pebs;
>> int n_large_pebs;
>> int n_pebs_via_pt;
>> --
>> 2.53.0.959.g497ff81fa9-goog
>>
> Apologies. Resend to fix Peter Zijlstra's address.
>
^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/x86/intel: Don't update PEBS_ENABLE MSR in NMI context
2026-04-10 2:24 ` Mi, Dapeng
@ 2026-04-10 2:58 ` Jim Mattson
0 siblings, 0 replies; 4+ messages in thread
From: Jim Mattson @ 2026-04-10 2:58 UTC (permalink / raw)
To: Mi, Dapeng
Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland,
Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
James Clark, Thomas Gleixner, Borislav Petkov, Dave Hansen, x86,
H. Peter Anvin, linux-perf-users, linux-kernel, Stephane Eranian,
Mingwei Zhang, Peter Zijlstra
On Thu, Apr 9, 2026 at 7:24 PM Mi, Dapeng <dapeng1.mi@linux.intel.com> wrote:
>
>
> On 3/21/2026 2:58 AM, Jim Mattson wrote:
> > On Fri, Mar 20, 2026 at 8:29 AM Jim Mattson <jmattson@google.com> wrote:
> >> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> >> index 5027afc97b65..852baf6a05e9 100644
> >> --- a/arch/x86/events/intel/ds.c
> >> +++ b/arch/x86/events/intel/ds.c
> >> @@ -1982,8 +1982,10 @@ void intel_pmu_pebs_enable_all(void)
> >> {
> >> struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> >>
> >> - if (cpuc->pebs_enabled)
> >> + if (cpuc->pebs_enabled || cpuc->pebs_stale)
> >> wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
> >> +
> >> + cpuc->pebs_stale = false;
>
> Sashiko comments
>
> "
>
> Can this race with an NMI?
> If a PMI fires exactly between wrmsrq() and cpuc->pebs_stale = false:
> CPU0
> intel_pmu_pebs_enable_all()
> wrmsrq(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
>
> <--- NMI (PMI) fires here
> handle_pmi_common()
> ... throttles event ...
> cpuc->pebs_stale = true;
> <--- NMI returns
>
> cpuc->pebs_stale = false;
> Would this clobber the pebs_stale flag set by the NMI, leaving the software
> state desynchronized from the hardware MSR and potentially causing the PMU
> to skip the MSR update on the next enable?
>
> "
>
> It looks reasonable.
> https://sashiko.dev/#/patchset/20260320152928.1916447-1-jmattson%40google.com
>
> Thanks.
I believe this function is only called while the PMU is disabled (on
the path to re-enabling it), so there should be no PMIs.
Sadly, though, this patch has not fixed our problem.
^ permalink raw reply [flat|nested] 4+ messages in thread