public inbox for stable@vger.kernel.org
* [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
@ 2025-12-01 14:23 Fernand Sieber
  2025-12-01 14:26 ` kernel test robot
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Fernand Sieber @ 2025-12-01 14:23 UTC (permalink / raw)
  To: seanjc, pbonzini
  Cc: Jan H. Schönherr, tglx, mingo, bp, dave.hansen, x86, hpa,
	kvm, linux-kernel, dwmw, hborghor, sieberf, nh-open-source,
	abusse, nsaenz, stable

From: Jan H. Schönherr <jschoenh@amazon.de>

It is possible to degrade host performance by manipulating performance
counters from a VM and tricking the host hypervisor into enabling branch
tracing. When the guest programs a CPU to count branch instructions and
deliver an interrupt after exactly one branch, the host KVM/perf
subsystems incorrectly treat the value one as a special value that
enables the branch trace store (BTS) facility. It should not be possible
to enable BTS from a guest. Once BTS is enabled, it degrades performance
for both VMs and the host.

Perf considers the combination of PERF_COUNT_HW_BRANCH_INSTRUCTIONS with
a sample_period of 1 a special case and handles it as a BTS event (see
intel_pmu_has_bts_period()) -- a deviation from the usual semantics,
where the sample_period is the number of branch instructions to count
before the overflow handler is invoked.

Nothing prevents a guest from programming its vPMU with the above
settings (count branch, interrupt after one branch), which causes KVM to
erroneously instruct perf to create a BTS event within
pmc_reprogram_counter(), which does not have the desired semantics.

The guest could also take more benign actions and request an interrupt
after a more reasonable number of branch instructions via its vPMU. In
that case counting works initially. However, KVM occasionally pauses and
resumes the created performance counters. If the remaining number of
branch instructions until the interrupt is exactly 1,
pmc_resume_counter() fails to resume the counter, and a BTS event with
its incorrect semantics is created instead.

Fix this behavior by not passing the special value "1" as sample_period
to perf. Instead, apply the same quirk that happens later in
x86_perf_event_set_period() anyway, when the performance counter is
transferred to the actual PMU: bump the sample_period to 2.
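
For illustration only (not part of the patch), the period arithmetic and
the bump can be sketched in Python, assuming 48-bit general-purpose
counters:

```python
# Illustrative sketch of KVM's get_sample_period() logic; the 48-bit
# counter width is an assumption for this example.
def pmc_bitmask(width=48):
    return (1 << width) - 1

def get_sample_period(counter_value, width=48):
    # sample_period = (-counter_value) & pmc_bitmask(pmc)
    period = (-counter_value) & pmc_bitmask(width)
    if period == 1:
        # The fix: avoid the value perf mistakes for a BTS request.
        period = 2
    if period == 0:
        period = pmc_bitmask(width) + 1
    return period

# A guest writing 0xffffffffffff into a 48-bit counter requests an
# overflow after exactly one event ...
assert (-0xffffffffffff) & pmc_bitmask() == 1
# ... which the quirk bumps to 2, matching x86_perf_event_set_period().
assert get_sample_period(0xffffffffffff) == 2
```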

Testing:
From guest:
`./wrmsr -p 12 0x186 0x1100c4`
`./wrmsr -p 12 0xc1 0xffffffffffff`
`./wrmsr -p 12 0x186 0x5100c4`

This sequence programs branch instruction counting with an interrupt on
overflow, initializes the counter so that it overflows after one more
event (0xffffffffffff), and then enables the counter.

./wrmsr -p 12 0x186 0x1100c4
    Writes to IA32_PERFEVTSEL0 (0x186)
    Value 0x1100c4 breaks down as:
        Event = 0xC4 (branch instructions retired)
        Bit 16: 1 (user mode only)
        Bit 20: 1 (interrupt on counter overflow)

./wrmsr -p 12 0xc1 0xffffffffffff
    Writes to IA32_PMC0 (0xC1)
    Sets the counter to its maximum 48-bit value (0xffffffffffff)
    This effectively sets up the counter to overflow on the next branch

./wrmsr -p 12 0x186 0x5100c4
    Updates IA32_PERFEVTSEL0 again
    Same as the first write, but with bit 22 additionally set (top
    nibble 0x1 -> 0x5)
    Enables the counter (bit 22)

These MSR writes are trapped by the hypervisor in KVM and forwarded to
the perf subsystem to create corresponding monitoring events.
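
As a sanity check, the values above can be decoded with a small
hypothetical helper that follows the IA32_PERFEVTSELx bit layout (event
select in bits 0-7, USR bit 16, OS bit 17, edge detect bit 18, INT bit
20, EN bit 22):

```python
# Hypothetical decoder for IA32_PERFEVTSELx values, used only to
# sanity-check the wrmsr arguments in the repro above.
def decode_perfevtsel(val):
    return {
        "event": val & 0xff,             # event select
        "umask": (val >> 8) & 0xff,      # unit mask
        "usr":   bool(val & (1 << 16)),  # count in user mode
        "os":    bool(val & (1 << 17)),  # count in kernel mode
        "edge":  bool(val & (1 << 18)),  # edge detect
        "int":   bool(val & (1 << 20)),  # interrupt on overflow
        "en":    bool(val & (1 << 22)),  # enable counter
    }

first = decode_perfevtsel(0x1100c4)
third = decode_perfevtsel(0x5100c4)
assert first["event"] == 0xc4 and first["usr"] and first["int"]
assert not first["en"]          # counter not yet enabled
assert third["en"]              # the final write sets the enable bit
assert not third["edge"]        # edge detect stays off
```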

It is possible to reproduce this problem in a more realistic guest scenario:

`perf record -e branches:u -c 2 -a &`
`perf record -e branches:u -c 2 -a &`

This presumably triggers the issue when KVM pauses and resumes a
performance counter at the moment its value is about to overflow.

Signed-off-by: Jan H. Schönherr <jschoenh@amazon.de>
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Hendrik Borghorst <hborghor@amazon.de>
Link: https://lore.kernel.org/r/20251124100220.238177-1-sieberf@amazon.com
---
 arch/x86/kvm/pmu.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 487ad19a236e..547512028e24 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -225,6 +225,19 @@ static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
 {
 	u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
 
+	/*
+	 * A sample_period of 1 might get mistaken by perf for a BTS event, see
+	 * intel_pmu_has_bts_period(). This would prevent re-arming the counter
+	 * via pmc_resume_counter(), followed by the accidental creation of an
+	 * actual BTS event, which we do not want.
+	 *
+	 * Avoid this by bumping the sampling period. Note, that we do not lose
+	 * any precision, because the same quirk happens later anyway (for
+	 * different reasons) in x86_perf_event_set_period().
+	 */
+	if (sample_period == 1)
+		sample_period = 2;
+
 	if (!sample_period)
 		sample_period = pmc_bitmask(pmc) + 1;
 	return sample_period;
-- 
2.43.0




Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-01 14:23 [PATCH] KVM: x86/pmu: Do not accidentally create BTS events Fernand Sieber
@ 2025-12-01 14:26 ` kernel test robot
  2025-12-01 14:45 ` Woodhouse, David
  2025-12-02 10:03 ` Peter Zijlstra
  2 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-12-01 14:26 UTC (permalink / raw)
  To: Fernand Sieber; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1

Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
Link: https://lore.kernel.org/stable/20251201142359.344741-1-sieberf%40amazon.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki





* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-01 14:23 [PATCH] KVM: x86/pmu: Do not accidentally create BTS events Fernand Sieber
  2025-12-01 14:26 ` kernel test robot
@ 2025-12-01 14:45 ` Woodhouse, David
  2025-12-02  9:35   ` Peter Zijlstra
  2025-12-02 10:03 ` Peter Zijlstra
  2 siblings, 1 reply; 16+ messages in thread
From: Woodhouse, David @ 2025-12-01 14:45 UTC (permalink / raw)
  To: Sieber, Fernand, pbonzini@redhat.com, seanjc@google.com,
	peterz@infradead.org
  Cc: Saenz Julienne, Nicolas, Busse, Anselm, x86@kernel.org,
	bp@alien8.de, stable@vger.kernel.org, hpa@zytor.com,
	mingo@redhat.com, tglx@linutronix.de, dave.hansen@linux.intel.com,
	kvm@vger.kernel.org, Schönherr, Jan H., Borghorst, Hendrik,
	linux-kernel@vger.kernel.org, nh-open-source@amazon.com



On Mon, 2025-12-01 at 16:23 +0200, Fernand Sieber wrote
> Perf considers the combination of PERF_COUNT_HW_BRANCH_INSTRUCTIONS with
> a sample_period of 1 a special case and handles this as a BTS event (see
> intel_pmu_has_bts_period()) -- a deviation from the usual semantic,
> where the sample_period represents the amount of branch instructions to
> encounter before the overflow handler is invoked.

That's kind of awful, and seems to be the real underlying cause of the KVM
issue. Can we kill it with fire? Peter?





Amazon Development Centre (London) Ltd. Registered in England and Wales with registration number 04543232 with its registered office at 1 Principal Place, Worship Street, London EC2A 2FA, United Kingdom.





* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-01 14:45 ` Woodhouse, David
@ 2025-12-02  9:35   ` Peter Zijlstra
  2025-12-02  9:59     ` Fernand Sieber
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2025-12-02  9:35 UTC (permalink / raw)
  To: Woodhouse, David
  Cc: Sieber, Fernand, pbonzini@redhat.com, seanjc@google.com,
	Saenz Julienne, Nicolas, Busse, Anselm, x86@kernel.org,
	bp@alien8.de, stable@vger.kernel.org, hpa@zytor.com,
	mingo@redhat.com, tglx@linutronix.de, dave.hansen@linux.intel.com,
	kvm@vger.kernel.org, Schönherr, Jan H., Borghorst, Hendrik,
	linux-kernel@vger.kernel.org, nh-open-source@amazon.com

On Mon, Dec 01, 2025 at 02:45:01PM +0000, Woodhouse, David wrote:
> On Mon, 2025-12-01 at 16:23 +0200, Fernand Sieber wrote
> > Perf considers the combination of PERF_COUNT_HW_BRANCH_INSTRUCTIONS with
> > a sample_period of 1 a special case and handles this as a BTS event (see
> > intel_pmu_has_bts_period()) -- a deviation from the usual semantic,
> > where the sample_period represents the amount of branch instructions to
> > encounter before the overflow handler is invoked.
> 
> That's kind of awful, and seems to be the real underlying cause of the KVM
> issue. Can we kill it with fire? Peter?

Well, IIRC it gives the same information and was actually less expensive
to run, seeing how BTS can buffer the data rather than having to take an
interrupt on every event.

But its been ages since this was done.

Now arguably it should not be done for this kvm stuff, because the
data-store buffers don't virtualize (just like old PEBS).



* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-02  9:35   ` Peter Zijlstra
@ 2025-12-02  9:59     ` Fernand Sieber
  0 siblings, 0 replies; 16+ messages in thread
From: Fernand Sieber @ 2025-12-02  9:59 UTC (permalink / raw)
  To: peterz
  Cc: abusse, bp, dave.hansen, dwmw, hborghor, hpa, jschoenh, kvm,
	linux-kernel, mingo, nh-open-source, nsaenz, pbonzini, seanjc,
	sieberf, stable, tglx, x86

On Mon, Dec 02, 2025 at 10:35:34AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 01, 2025 at 02:45:01PM +0000, Woodhouse, David wrote:
> > On Mon, 2025-12-01 at 16:23 +0200, Fernand Sieber wrote
> > > Perf considers the combination of PERF_COUNT_HW_BRANCH_INSTRUCTIONS with
> > > a sample_period of 1 a special case and handles this as a BTS event (see
> > > intel_pmu_has_bts_period()) -- a deviation from the usual semantic,
> > > where the sample_period represents the amount of branch instructions to
> > > encounter before the overflow handler is invoked.
> >
> > That's kind of awful, and seems to be the real underlying cause of the KVM
> > issue. Can we kill it with fire? Peter?
>
> Well, IIRC it gives the same information and was actually less expensive
> to run, seeing how BTS can buffer the data rather than having to take an
> interrupt on every event.
>
> But its been ages since this was done.
>
> Now arguably it should not be done for this kvm stuff, because the
> data-store buffers don't virtualize (just like old PEBS).

This. The current logic bypasses what the guest should actually be allowed
to do. See `vmx_get_supported_debugctl`; specifically, the guest should not
be allowed to enable BTS.

Also semi-related to this thread: the auto-enablement of BTS for
sample_period = 1 yields undesirable behavior in the guest OS. The guest OS
will try to enable BTS and get a wrmsr failure because the host KVM rejects
it, which leads to incorrect behavior (no tracing happens at all).






* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-01 14:23 [PATCH] KVM: x86/pmu: Do not accidentally create BTS events Fernand Sieber
  2025-12-01 14:26 ` kernel test robot
  2025-12-01 14:45 ` Woodhouse, David
@ 2025-12-02 10:03 ` Peter Zijlstra
  2025-12-02 12:44   ` Peter Zijlstra
  2 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2025-12-02 10:03 UTC (permalink / raw)
  To: Fernand Sieber
  Cc: seanjc, pbonzini, Jan H. Schönherr, tglx, mingo, bp,
	dave.hansen, x86, hpa, kvm, linux-kernel, dwmw, hborghor,
	nh-open-source, abusse, nsaenz, stable

On Mon, Dec 01, 2025 at 04:23:57PM +0200, Fernand Sieber wrote:
>  arch/x86/kvm/pmu.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> index 487ad19a236e..547512028e24 100644
> --- a/arch/x86/kvm/pmu.c
> +++ b/arch/x86/kvm/pmu.c
> @@ -225,6 +225,19 @@ static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
>  {
>  	u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
>  
> +	/*
> +	 * A sample_period of 1 might get mistaken by perf for a BTS event, see
> +	 * intel_pmu_has_bts_period(). This would prevent re-arming the counter
> +	 * via pmc_resume_counter(), followed by the accidental creation of an
> +	 * actual BTS event, which we do not want.
> +	 *
> +	 * Avoid this by bumping the sampling period. Note, that we do not lose
> +	 * any precision, because the same quirk happens later anyway (for
> +	 * different reasons) in x86_perf_event_set_period().
> +	 */
> +	if (sample_period == 1)
> +		sample_period = 2;
> +
>  	if (!sample_period)
>  		sample_period = pmc_bitmask(pmc) + 1;
>  	return sample_period;

Oh gawd, I so hate this kvm code. It is so ludicrously bad. The way it
keeps recreating counters is just stupid. And then they complain it
sucks, it does :-(

Anyway, yes this is terrible. Let me try and untangle all this, see if
there's a saner solution.


* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-02 10:03 ` Peter Zijlstra
@ 2025-12-02 12:44   ` Peter Zijlstra
  2025-12-02 16:03     ` Sean Christopherson
  2025-12-10 10:11     ` Fernand Sieber
  0 siblings, 2 replies; 16+ messages in thread
From: Peter Zijlstra @ 2025-12-02 12:44 UTC (permalink / raw)
  To: Fernand Sieber
  Cc: seanjc, pbonzini, Jan H. Schönherr, tglx, mingo, bp,
	dave.hansen, x86, hpa, kvm, linux-kernel, dwmw, hborghor,
	nh-open-source, abusse, nsaenz, stable

On Tue, Dec 02, 2025 at 11:03:11AM +0100, Peter Zijlstra wrote:
> On Mon, Dec 01, 2025 at 04:23:57PM +0200, Fernand Sieber wrote:
> >  arch/x86/kvm/pmu.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > index 487ad19a236e..547512028e24 100644
> > --- a/arch/x86/kvm/pmu.c
> > +++ b/arch/x86/kvm/pmu.c
> > @@ -225,6 +225,19 @@ static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
> >  {
> >  	u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
> >  
> > +	/*
> > +	 * A sample_period of 1 might get mistaken by perf for a BTS event, see
> > +	 * intel_pmu_has_bts_period(). This would prevent re-arming the counter
> > +	 * via pmc_resume_counter(), followed by the accidental creation of an
> > +	 * actual BTS event, which we do not want.
> > +	 *
> > +	 * Avoid this by bumping the sampling period. Note, that we do not lose
> > +	 * any precision, because the same quirk happens later anyway (for
> > +	 * different reasons) in x86_perf_event_set_period().
> > +	 */
> > +	if (sample_period == 1)
> > +		sample_period = 2;
> > +
> >  	if (!sample_period)
> >  		sample_period = pmc_bitmask(pmc) + 1;
> >  	return sample_period;
> 
> Oh gawd, I so hate this kvm code. It is so ludicrously bad. The way it
> keeps recreating counters is just stupid. And then they complain it
> sucks, it does :-(
> 
> Anyway, yes this is terrible. Let me try and untangle all this, see if
> there's a saner solution.

Does something like so work? It is still terrible, but perhaps slightly
less so.

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 2b969386dcdd..493e6ba51e06 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1558,13 +1558,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned int hw_event, bts_event;
 
-	if (event->attr.freq)
+	/*
+	 * Only use BTS for fixed rate period==1 events.
+	 */
+	if (event->attr.freq || period != 1)
+		return false;
+
+	/*
+	 * BTS doesn't virtualize.
+	 */
+	if (event->attr.exclude_host)
 		return false;
 
 	hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
 	bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
 
-	return hw_event == bts_event && period == 1;
+	return hw_event == bts_event;
 }
 
 static inline bool intel_pmu_has_bts(struct perf_event *event)


* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-02 12:44   ` Peter Zijlstra
@ 2025-12-02 16:03     ` Sean Christopherson
  2025-12-10 10:11     ` Fernand Sieber
  1 sibling, 0 replies; 16+ messages in thread
From: Sean Christopherson @ 2025-12-02 16:03 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Fernand Sieber, pbonzini, Jan H. Schönherr, tglx, mingo, bp,
	dave.hansen, x86, hpa, kvm, linux-kernel, dwmw, hborghor,
	nh-open-source, abusse, nsaenz, stable

On Tue, Dec 02, 2025, Peter Zijlstra wrote:
> Does something like so work? It is still terrible, but perhaps slightly
> less so.
> 
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 2b969386dcdd..493e6ba51e06 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -1558,13 +1558,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
>  	struct hw_perf_event *hwc = &event->hw;
>  	unsigned int hw_event, bts_event;
>  
> -	if (event->attr.freq)
> +	/*
> +	 * Only use BTS for fixed rate period==1 events.
> +	 */
> +	if (event->attr.freq || period != 1)
> +		return false;
> +
> +	/*
> +	 * BTS doesn't virtualize.
> +	 */
> +	if (event->attr.exclude_host)

Ya, this seems like the right fix.  Pulling in the original bug report:

  When BTS is enabled, it leads to general host performance degradation to both
  VMs and host.

I assume the underlying problem is that intel_pmu_enable_bts() is called even
when the event isn't enabled in the host, and BTS doesn't discriminate once it's
enabled and bogs down the host (but presumably not the guest, at least not directly,
since KVM should prevent setting BTS in vmcs.GUEST_IA32_DEBUGCTL).  Enabling BTS
for exclude-host events simply can't work, even if the event came from host userspace.


* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-02 12:44   ` Peter Zijlstra
  2025-12-02 16:03     ` Sean Christopherson
@ 2025-12-10 10:11     ` Fernand Sieber
  2025-12-10 11:16       ` Peter Zijlstra
  1 sibling, 1 reply; 16+ messages in thread
From: Fernand Sieber @ 2025-12-10 10:11 UTC (permalink / raw)
  To: peterz
  Cc: abusse, bp, dave.hansen, dwmw, hborghor, hpa, jschoenh, kvm,
	linux-kernel, mingo, nh-open-source, nsaenz, pbonzini, seanjc,
	sieberf, stable, tglx, x86

On Tue, Dec 02, 2025 at 01:44:23PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 02, 2025 at 11:03:11AM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 01, 2025 at 04:23:57PM +0200, Fernand Sieber wrote:
> > >  arch/x86/kvm/pmu.c | 13 +++++++++++++
> > >  1 file changed, 13 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > > index 487ad19a236e..547512028e24 100644
> > > --- a/arch/x86/kvm/pmu.c
> > > +++ b/arch/x86/kvm/pmu.c
> > > @@ -225,6 +225,19 @@ static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
> > >  {
> > >  	u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
> > >
> > > +	/*
> > > +	 * A sample_period of 1 might get mistaken by perf for a BTS event, see
> > > +	 * intel_pmu_has_bts_period(). This would prevent re-arming the counter
> > > +	 * via pmc_resume_counter(), followed by the accidental creation of an
> > > +	 * actual BTS event, which we do not want.
> > > +	 *
> > > +	 * Avoid this by bumping the sampling period. Note, that we do not lose
> > > +	 * any precision, because the same quirk happens later anyway (for
> > > +	 * different reasons) in x86_perf_event_set_period().
> > > +	 */
> > > +	if (sample_period == 1)
> > > +		sample_period = 2;
> > > +
> > >  	if (!sample_period)
> > >  		sample_period = pmc_bitmask(pmc) + 1;
> > >  	return sample_period;
> >
> > Oh gawd, I so hate this kvm code. It is so ludicrously bad. The way it
> > keeps recreating counters is just stupid. And then they complain it
> > sucks, it does :-(
> >
> > Anyway, yes this is terrible. Let me try and untangle all this, see if
> > there's a saner solution.
>
> Does something like so work? It is still terrible, but perhaps slightly
> less so.
>
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 2b969386dcdd..493e6ba51e06 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -1558,13 +1558,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
>  	struct hw_perf_event *hwc = &event->hw;
>  	unsigned int hw_event, bts_event;
>
> -	if (event->attr.freq)
> +	/*
> +	 * Only use BTS for fixed rate period==1 events.
> +	 */
> +	if (event->attr.freq || period != 1)
> +		return false;
> +
> +	/*
> +	 * BTS doesn't virtualize.
> +	 */
> +	if (event->attr.exclude_host)
>  		return false;
>
>  	hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
>  	bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
>
> -	return hw_event == bts_event && period == 1;
> +	return hw_event == bts_event;
>  }
>
>  static inline bool intel_pmu_has_bts(struct perf_event *event)

Hi Peter,

I've pulled your changes and confirmed that they address the original
bug report.

The repro I use is running on host, with a guest running:
`perf record -e branches:u -c 2 -a &`
`perf record -e branches:u -c 2 -a &`
Then I monitor the enablement of BTS on the host and verify that without
the change BTS is enabled, and with the change it's not.

This looks good to me, should we go ahead with your changes then?

--Fernand






* Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
  2025-12-10 10:11     ` Fernand Sieber
@ 2025-12-10 11:16       ` Peter Zijlstra
  2025-12-11 18:36         ` [PATCH v2] perf/x86/intel: Do not enable BTS for guests Fernand Sieber
  0 siblings, 1 reply; 16+ messages in thread
From: Peter Zijlstra @ 2025-12-10 11:16 UTC (permalink / raw)
  To: Fernand Sieber
  Cc: abusse, bp, dave.hansen, dwmw, hborghor, hpa, jschoenh, kvm,
	linux-kernel, mingo, nh-open-source, nsaenz, pbonzini, seanjc,
	stable, tglx, x86

On Wed, Dec 10, 2025 at 12:11:47PM +0200, Fernand Sieber wrote:

> > Does something like so work? It is still terrible, but perhaps slightly
> > less so.
> >
> > diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> > index 2b969386dcdd..493e6ba51e06 100644
> > --- a/arch/x86/events/perf_event.h
> > +++ b/arch/x86/events/perf_event.h
> > @@ -1558,13 +1558,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
> >  	struct hw_perf_event *hwc = &event->hw;
> >  	unsigned int hw_event, bts_event;
> >
> > -	if (event->attr.freq)
> > +	/*
> > +	 * Only use BTS for fixed rate period==1 events.
> > +	 */
> > +	if (event->attr.freq || period != 1)
> > +		return false;
> > +
> > +	/*
> > +	 * BTS doesn't virtualize.
> > +	 */
> > +	if (event->attr.exclude_host)
> >  		return false;
> >
> >  	hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
> >  	bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
> >
> > -	return hw_event == bts_event && period == 1;
> > +	return hw_event == bts_event;
> >  }
> >
> >  static inline bool intel_pmu_has_bts(struct perf_event *event)
> 
> Hi Peter,
> 
> I've pulled your changes and confirmed that they address the original
> bug report.
> 
> The repro I use is running on host, with a guest running:
> `perf record -e branches:u -c 2 -a &`
> `perf record -e branches:u -c 2 -a &`
> Then I monitor the enablement of BTS on the host and verify that without
> the change BTS is enabled, and with the change it's not.
> 
> This looks good to me, should we go ahead with your changes then?

Yeah, I suppose. Please stick a coherent changelog on and repost.


* [PATCH v2] perf/x86/intel: Do not enable BTS for guests
  2025-12-10 11:16       ` Peter Zijlstra
@ 2025-12-11 18:36         ` Fernand Sieber
  2025-12-11 18:38           ` kernel test robot
                             ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Fernand Sieber @ 2025-12-11 18:36 UTC (permalink / raw)
  To: peterz
  Cc: abusse, bp, dave.hansen, dwmw, hborghor, hpa, jschoenh, kvm,
	linux-kernel, mingo, nh-open-source, nsaenz, pbonzini, seanjc,
	sieberf, stable, tglx, x86

By default when users program perf to sample branch instructions
(PERF_COUNT_HW_BRANCH_INSTRUCTIONS) with a sample period of 1, perf
interprets this as a special case and enables BTS (Branch Trace Store)
as an optimization to avoid taking an interrupt on every branch.

Since BTS doesn't virtualize, this optimization doesn't make sense when
the request originates from a guest. Add an additional check that
prevents this optimization for virtualized events (exclude_host).

Reported-by: Jan H. Schönherr <jschoenh@amazon.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
---
 arch/x86/events/perf_event.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 3161ec0a3416..f2e2d9b03367 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1574,13 +1574,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned int hw_event, bts_event;
 
-	if (event->attr.freq)
+	/*
+	 * Only use BTS for fixed rate period==1 events.
+	 */
+	if (event->attr.freq || period != 1)
+		return false;
+
+	/*
+	 * BTS doesn't virtualize.
+	 */
+	if (event->attr.exclude_host)
 		return false;
 
 	hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
 	bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
 
-	return hw_event == bts_event && period == 1;
+	return hw_event == bts_event;
 }
 
 static inline bool intel_pmu_has_bts(struct perf_event *event)
-- 
2.43.0






* Re: [PATCH v2] perf/x86/intel: Do not enable BTS for guests
  2025-12-11 18:36         ` [PATCH v2] perf/x86/intel: Do not enable BTS for guests Fernand Sieber
@ 2025-12-11 18:38           ` kernel test robot
  2026-01-14 17:25           ` David Woodhouse
  2026-01-21 15:50           ` [tip: perf/urgent] " tip-bot2 for Fernand Sieber
  2 siblings, 0 replies; 16+ messages in thread
From: kernel test robot @ 2025-12-11 18:38 UTC (permalink / raw)
  To: Fernand Sieber; +Cc: stable, oe-kbuild-all

Hi,

Thanks for your patch.

FYI: kernel test robot notices the stable kernel rule is not satisfied.

The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#option-1

Rule: add the tag "Cc: stable@vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree.
Subject: [PATCH v2] perf/x86/intel: Do not enable BTS for guests
Link: https://lore.kernel.org/stable/20251211183604.868641-1-sieberf%40amazon.com

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki





* Re: [PATCH v2] perf/x86/intel: Do not enable BTS for guests
  2025-12-11 18:36         ` [PATCH v2] perf/x86/intel: Do not enable BTS for guests Fernand Sieber
  2025-12-11 18:38           ` kernel test robot
@ 2026-01-14 17:25           ` David Woodhouse
  2026-01-21 13:57             ` Fernand Sieber
  2026-01-21 15:50           ` [tip: perf/urgent] " tip-bot2 for Fernand Sieber
  2 siblings, 1 reply; 16+ messages in thread
From: David Woodhouse @ 2026-01-14 17:25 UTC (permalink / raw)
  To: Fernand Sieber, peterz
  Cc: abusse, bp, dave.hansen, hborghor, hpa, jschoenh, kvm,
	linux-kernel, mingo, nh-open-source, nsaenz, pbonzini, seanjc,
	stable, tglx, x86


On Thu, 2025-12-11 at 20:36 +0200, Fernand Sieber wrote:
> By default when users program perf to sample branch instructions
> (PERF_COUNT_HW_BRANCH_INSTRUCTIONS) with a sample period of 1, perf
> interprets this as a special case and enables BTS (Branch Trace Store)
> as an optimization to avoid taking an interrupt on every branch.
> 
> Since BTS doesn't virtualize, this optimization doesn't make sense when
> the request originates from a guest. Add an additional check that
> prevents this optimization for virtualized events (exclude_host).
> 
> Reported-by: Jan H. Schönherr <jschoenh@amazon.de>
> Suggested-by: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: Fernand Sieber <sieberf@amazon.com>

Did this get applied?

I think you want Cc: stable@vger.kernel.org here in the body of the
commit itself, not just on the actual email.

> ---
>  arch/x86/events/perf_event.h | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 3161ec0a3416..f2e2d9b03367 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -1574,13 +1574,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
>  	struct hw_perf_event *hwc = &event->hw;
>  	unsigned int hw_event, bts_event;
>  
> -	if (event->attr.freq)
> +	/*
> +	 * Only use BTS for fixed rate period==1 events.
> +	 */
> +	if (event->attr.freq || period != 1)
> +		return false;
> +
> +	/*
> +	 * BTS doesn't virtualize.
> +	 */
> +	if (event->attr.exclude_host)
>  		return false;
>  
>  	hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
>  	bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
>  
> -	return hw_event == bts_event && period == 1;
> +	return hw_event == bts_event;
>  }
>  
>  static inline bool intel_pmu_has_bts(struct perf_event *event)




* Re: [PATCH v2] perf/x86/intel: Do not enable BTS for guests
  2026-01-14 17:25           ` David Woodhouse
@ 2026-01-21 13:57             ` Fernand Sieber
  2026-01-21 15:31               ` Peter Zijlstra
  0 siblings, 1 reply; 16+ messages in thread
From: Fernand Sieber @ 2026-01-21 13:57 UTC (permalink / raw)
  To: dwmw2, peterz, seanjc
  Cc: abusse, bp, dave.hansen, hborghor, hpa, jschoenh, kvm,
	linux-kernel, mingo, nh-open-source, nsaenz, pbonzini, sieberf,
	stable, tglx, x86

Hi Peter,

Could you please take another look and see if you are happy to pull in v2 which
implements the approach that you suggested?

Thanks,

--Fernand






* Re: [PATCH v2] perf/x86/intel: Do not enable BTS for guests
  2026-01-21 13:57             ` Fernand Sieber
@ 2026-01-21 15:31               ` Peter Zijlstra
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Zijlstra @ 2026-01-21 15:31 UTC (permalink / raw)
  To: Fernand Sieber
  Cc: dwmw2, seanjc, abusse, bp, dave.hansen, hborghor, hpa, jschoenh,
	kvm, linux-kernel, mingo, nh-open-source, nsaenz, pbonzini,
	stable, tglx, x86

On Wed, Jan 21, 2026 at 03:57:13PM +0200, Fernand Sieber wrote:
> Hi Peter,
> 
> Could you please take another look and see if you are happy to pull in v2 which
> implements the approach that you suggested?

Yeah, sorry, fell into a crack and all that. Got it now.

Thanks!


* [tip: perf/urgent] perf/x86/intel: Do not enable BTS for guests
  2025-12-11 18:36         ` [PATCH v2] perf/x86/intel: Do not enable BTS for guests Fernand Sieber
  2025-12-11 18:38           ` kernel test robot
  2026-01-14 17:25           ` David Woodhouse
@ 2026-01-21 15:50           ` tip-bot2 for Fernand Sieber
  2 siblings, 0 replies; 16+ messages in thread
From: tip-bot2 for Fernand Sieber @ 2026-01-21 15:50 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jschoenh, Peter Zijlstra, Fernand Sieber, stable, x86,
	linux-kernel

The following commit has been merged into the perf/urgent branch of tip:

Commit-ID:     91dcfae0ff2b9b9ab03c1ec95babaceefbffb9f4
Gitweb:        https://git.kernel.org/tip/91dcfae0ff2b9b9ab03c1ec95babaceefbffb9f4
Author:        Fernand Sieber <sieberf@amazon.com>
AuthorDate:    Thu, 11 Dec 2025 20:36:04 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Wed, 21 Jan 2026 16:28:59 +01:00

perf/x86/intel: Do not enable BTS for guests

By default when users program perf to sample branch instructions
(PERF_COUNT_HW_BRANCH_INSTRUCTIONS) with a sample period of 1, perf
interprets this as a special case and enables BTS (Branch Trace Store)
as an optimization to avoid taking an interrupt on every branch.

Since BTS doesn't virtualize, this optimization doesn't make sense when
the request originates from a guest. Add an additional check that
prevents this optimization for virtualized events (exclude_host).

Reported-by: Jan H. Schönherr <jschoenh@amazon.de>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Fernand Sieber <sieberf@amazon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org>
Link: https://patch.msgid.link/20251211183604.868641-1-sieberf@amazon.com
---
 arch/x86/events/perf_event.h | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 6296302..ad35c54 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -1574,13 +1574,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
 	struct hw_perf_event *hwc = &event->hw;
 	unsigned int hw_event, bts_event;
 
-	if (event->attr.freq)
+	/*
+	 * Only use BTS for fixed rate period==1 events.
+	 */
+	if (event->attr.freq || period != 1)
+		return false;
+
+	/*
+	 * BTS doesn't virtualize.
+	 */
+	if (event->attr.exclude_host)
 		return false;
 
 	hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
 	bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
 
-	return hw_event == bts_event && period == 1;
+	return hw_event == bts_event;
 }
 
 static inline bool intel_pmu_has_bts(struct perf_event *event)


end of thread, other threads:[~2026-01-21 15:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-01 14:23 [PATCH] KVM: x86/pmu: Do not accidentally create BTS events Fernand Sieber
2025-12-01 14:26 ` kernel test robot
2025-12-01 14:45 ` Woodhouse, David
2025-12-02  9:35   ` Peter Zijlstra
2025-12-02  9:59     ` Fernand Sieber
2025-12-02 10:03 ` Peter Zijlstra
2025-12-02 12:44   ` Peter Zijlstra
2025-12-02 16:03     ` Sean Christopherson
2025-12-10 10:11     ` Fernand Sieber
2025-12-10 11:16       ` Peter Zijlstra
2025-12-11 18:36         ` [PATCH v2] perf/x86/intel: Do not enable BTS for guests Fernand Sieber
2025-12-11 18:38           ` kernel test robot
2026-01-14 17:25           ` David Woodhouse
2026-01-21 13:57             ` Fernand Sieber
2026-01-21 15:31               ` Peter Zijlstra
2026-01-21 15:50           ` [tip: perf/urgent] " tip-bot2 for Fernand Sieber
