The Linux Kernel Mailing List
 help / color / mirror / Atom feed
From: Sean Christopherson <seanjc@google.com>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	 Arnaldo Carvalho de Melo <acme@kernel.org>,
	Namhyung Kim <namhyung@kernel.org>,
	 Sean Christopherson <seanjc@google.com>,
	Paolo Bonzini <pbonzini@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	 Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,  Ian Rogers <irogers@google.com>,
	Adrian Hunter <adrian.hunter@intel.com>,
	 James Clark <james.clark@linaro.org>,
	linux-perf-users@vger.kernel.org,  linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org,  Jim Mattson <jmattson@google.com>,
	Mingwei Zhang <mizhang@google.com>,
	 Stephane Eranian <eranian@google.com>,
	Dapeng Mi <dapeng1.mi@linux.intel.com>
Subject: [PATCH v3 0/9] perf/x86: Don't write PEBS_ENABLED on KVM transitions
Date: Fri,  8 May 2026 16:13:44 -0700	[thread overview]
Message-ID: <20260508231353.406465-1-seanjc@google.com> (raw)

Rework the handling of PEBS_ENABLED (and related PEBS MSRs) to *never* touch
PEBS_ENABLED if the CPU provides PEBS isolation, in which case disabling
counters via PERF_GLOBAL_CTRL is sufficient to prevent generation of unwanted
PEBS records.  For vCPUs without PEBS enabled, this saves upwards of 7 MSR
writes on each roundtrip between the guest and host (KVM performs an immediate
WRMSR to zero out PEBS_ENABLED if it's in the load list).  For vCPUS with PEBS,
this saves 3 MSR writes per roundtrip.

E.g. without PEBS activity in the host, for a guest with a vPMU, this reduces
the roundtrip time for a fastpath exit from ~1120 => ~860 cycles on EMR.  With
host PEBS active, the reduction is ~1450 => ~900 cycles.

However, performance isn't the underlying motiviation (well, at least, it
didn't start that way).  Jim, Mingwei, and Stephane have been chasing issues
where PEBS_ENABLED bits can get "stuck" in a '1' state when running KVM guests
while profiling the host with PEBS events.  The working theory is that perf
throttles PEBS events in NMI context, and thus clears bits in cpuc->pebs_enabled
and PEBS_ENABLED, after generating the list of PMU MSRs to context switch but
before VM-Entry.  And so when the host's PEBS_ENABLED is loaded on VM-Exit, the
CPU ends up with a stale PEBS_ENABLED that doesn't get reset until something
triggers an explicit reload in perf.

Note, as Peter pointed out, more than likely KVM needs to zero PERF_GLOBAL_CTRL
before invoking perf_guest_get_msrs(), as that's the only way to guarantee
stable output.  I deliberately didn't include that here, as I want to keep this
series focused on PEBS.  I also wanted to let Jim and company bottom out on
their investigation (still ongoing) before pursuing fixes that we'll probably
want to send to stable@.

v3:
 - Ensure guest PEBS_ENABLE is a subset of intel_ctrl. [Jim]
 - Rename intel_ctrl_{guest,host}_mask to be less confusing. [Jim]
 - Do even more cleanup of the cross-mapped handling, and specifically avoid
   overhead when PEBS isn't in use. [Sashiko]
 - Leave behind a FIXME regarding the "disable guest PEBS if host is using
   PEBS" code.  I still don't know for sure why that restriction is in place,
   and I'm too scared too change it. :-)

v2:
 - https://lore.kernel.org/all/20260423150340.463896-1-seanjc@google.com
 - "Load" the host value for the guest when an MSR should remain unchanged,
    instead of omitting the MSR from the list entirely, as KVM may need to
    _remove_ the MSR from the list. [Sashiko, Jim]
 - Collect Jim's reviews. [Jim]
 - Call out that the bug being fixed is theoretical at this point.
 - Dropping PEBS_ENABLED from the lists save three MSR writes, not two, as
   KVM performs an explicit WRMSR prior to VM-Entry to guarantee PEBS is
   quiesced.

v1: https://lore.kernel.org/all/20260414191425.2697918-1-seanjc@google.com


Sean Christopherson (9):
  perf/x86/intel: Ensure guest PEBS path doesn't set unwanted
    PERF_GLOBAL_CTRL bits
  perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU
    has isolation
  perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS
    is unused
  perf/x86/intel: Make @data a mandatory param for
    intel_guest_get_msrs()
  perf/x86/intel: Invert names of intel_ctrl_{guest,host}_mask
  perf/x86: KVM: Have perf define a dedicated struct for getting guest
    PEBS data
  perf/x86/intel: KVM: Handle cross-mapped PEBS PMCs entirely within KVM
  KVM: VMX: Drop a redundant pmu->global_ctrl check when processing
    pebs_enable
  KVM: VMX: Only tell perf to enable PEBS counters for fully enabled
    PMCs

 arch/x86/events/core.c            |  5 +-
 arch/x86/events/intel/core.c      | 92 +++++++++++++++++++------------
 arch/x86/events/intel/lbr.c       |  2 +-
 arch/x86/events/perf_event.h      |  7 ++-
 arch/x86/include/asm/kvm_host.h   |  9 ---
 arch/x86/include/asm/perf_event.h | 11 +++-
 arch/x86/kvm/vmx/pmu_intel.c      | 28 +++++++---
 arch/x86/kvm/vmx/vmx.c            | 10 ++--
 arch/x86/kvm/vmx/vmx.h            | 15 ++++-
 9 files changed, 114 insertions(+), 65 deletions(-)


base-commit: 254f49634ee16a731174d2ae34bc50bd5f45e731
-- 
2.54.0.563.g4f69b47b94-goog


             reply	other threads:[~2026-05-08 23:13 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-08 23:13 Sean Christopherson [this message]
2026-05-08 23:13 ` [PATCH v3 1/9] perf/x86/intel: Ensure guest PEBS path doesn't set unwanted PERF_GLOBAL_CTRL bits Sean Christopherson
2026-05-12  4:53   ` Mi, Dapeng
2026-05-08 23:13 ` [PATCH v3 2/9] perf/x86/intel: Don't write PEBS_ENABLED on host<=>guest xfers if CPU has isolation Sean Christopherson
2026-05-12  4:53   ` Mi, Dapeng
2026-05-08 23:13 ` [PATCH v3 3/9] perf/x86/intel: Don't context switch DS_AREA (and PEBS config) if PEBS is unused Sean Christopherson
2026-05-08 23:13 ` [PATCH v3 4/9] perf/x86/intel: Make @data a mandatory param for intel_guest_get_msrs() Sean Christopherson
2026-05-12 12:39   ` Jim Mattson
2026-05-08 23:13 ` [PATCH v3 5/9] perf/x86/intel: Invert names of intel_ctrl_{guest,host}_mask Sean Christopherson
2026-05-12  4:58   ` Mi, Dapeng
2026-05-08 23:13 ` [PATCH v3 6/9] perf/x86: KVM: Have perf define a dedicated struct for getting guest PEBS data Sean Christopherson
2026-05-08 23:13 ` [PATCH v3 7/9] perf/x86/intel: KVM: Handle cross-mapped PEBS PMCs entirely within KVM Sean Christopherson
2026-05-12  4:59   ` Mi, Dapeng
2026-05-08 23:13 ` [PATCH v3 8/9] KVM: VMX: Drop a redundant pmu->global_ctrl check when processing pebs_enable Sean Christopherson
2026-05-12  5:00   ` Mi, Dapeng
2026-05-08 23:13 ` [PATCH v3 9/9] KVM: VMX: Only tell perf to enable PEBS counters for fully enabled PMCs Sean Christopherson
2026-05-12  5:01   ` Mi, Dapeng

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260508231353.406465-1-seanjc@google.com \
    --to=seanjc@google.com \
    --cc=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=james.clark@linaro.org \
    --cc=jmattson@google.com \
    --cc=jolsa@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=mizhang@google.com \
    --cc=namhyung@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox