kvm.vger.kernel.org archive mirror
From: Like Xu <like.xu.linux@gmail.com>
To: Jim Mattson <jmattson@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	kvm list <kvm@vger.kernel.org>
Subject: Re: PMU virtualization and AMD erratum 1292
Date: Mon, 17 Jan 2022 12:26:47 +0800
Message-ID: <453a2a09-5f29-491e-c386-6b23d4244cc2@gmail.com>
In-Reply-To: <CALMp9eQZa_y3ZN0_xHuB6nW0YU8oO6=5zPEov=DUQYPbzLeQVA@mail.gmail.com>

On 15/1/2022 4:02 am, Jim Mattson wrote:
>  From AMD erratum 1292:

I see quite a few PMU counter errata in AMD's products.

Considering how many of these machines are deployed in the real world,
this really needs to be addressed. Thanks for pointing it out.

> 
> The processor may experience sampling inaccuracies that cause the
> following performance counters to overcount retire-based events.
>   • PMCx0C0 [Retired Instructions]
>   • PMCx0C1 [Retired Uops]
>   • PMCx0C2 [Retired Branch Instructions]
>   • PMCx0C3 [Retired Branch Instructions Mispredicted]
>   • PMCx0C4 [Retired Taken Branch Instructions]
>   • PMCx0C5 [Retired Taken Branch Instructions Mispredicted]
>   • PMCx0C8 [Retired Near Returns]
>   • PMCx0C9 [Retired Near Returns Mispredicted]
>   • PMCx0CA [Retired Indirect Branch Instructions Mispredicted]
>   • PMCx0CC [Retired Indirect Branch Instructions]
>   • PMCx0D1 [Retired Conditional Branch Instructions]
>   • PMCx1C7 [Retired Mispredicted Branch Instructions due to Direction Mismatch]
>   • PMCx1D0 [Retired Fused Branch Instructions]
> 
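For reference, the affected events can be recognized from a PERF_CTL value:
on AMD, EventSelect[7:0] sits in bits 7:0 and EventSelect[11:8] in bits 35:32,
which is how PMCx1C7 and PMCx1D0 are encoded. A minimal sketch of such a check;
the helper names are hypothetical (not existing kernel functions) and unit
masks are ignored here:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 12-bit event selects listed in AMD erratum 1292. */
static const uint16_t erratum_1292_events[] = {
	0x0C0, 0x0C1, 0x0C2, 0x0C3, 0x0C4, 0x0C5, 0x0C8,
	0x0C9, 0x0CA, 0x0CC, 0x0D1, 0x1C7, 0x1D0,
};
static_assert(sizeof(erratum_1292_events) / sizeof(erratum_1292_events[0]) == 13,
	      "erratum 1292 lists 13 events");

/* Rebuild the 12-bit event select: bits 7:0, plus bits 35:32 moved to 11:8. */
static uint16_t amd_event_select(uint64_t perf_ctl)
{
	return (uint16_t)((perf_ctl & 0xffULL) | ((perf_ctl >> 24) & 0xf00ULL));
}

/* True if the PERF_CTL value selects one of the affected retire events. */
static bool is_erratum_1292_event(uint64_t perf_ctl)
{
	uint16_t event = amd_event_select(perf_ctl);
	size_t n = sizeof(erratum_1292_events) / sizeof(erratum_1292_events[0]);

	for (size_t i = 0; i < n; i++)
		if (erratum_1292_events[i] == event)
			return true;
	return false;
}
```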
> The recommended workaround is:

Alternatively, set the BIOS setup option "IBS hardware workaround"
(not recommended for production due to its negative performance impact).

> 
> To count the non-FP affected PMC events correctly:
>   • Use Core::X86::Msr::PERF_CTL2 to count the events, and
>   • Program Core::X86::Msr::PERF_CTL2[43] to 1b, and
>   • Program Core::X86::Msr::PERF_CTL2[20] to 0b.
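In terms of the raw control value, the last two steps amount to the
following. This is only a sketch; the helper and macro names are
hypothetical, and it does not cover getting the event onto counter 2 in the
first place:

```c
#include <assert.h>
#include <stdint.h>

#define PERF_CTL_INT		(1ULL << 20)	/* APIC interrupt on overflow */
#define PERF_CTL_ERRATUM_BIT	(1ULL << 43)	/* reserved bit named by the workaround */

/* Return the PERF_CTL2 value adjusted per the erratum 1292 workaround. */
static uint64_t erratum_1292_perf_ctl2(uint64_t perf_ctl)
{
	perf_ctl |= PERF_CTL_ERRATUM_BIT;	/* PERF_CTL2[43] = 1b */
	perf_ctl &= ~PERF_CTL_INT;		/* PERF_CTL2[20] = 0b */
	return perf_ctl;
}
```

Note that clearing bit 20 (INT) disables the overflow interrupt, so a counter
programmed this way counts but does not raise PMIs.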

diff --git a/arch/x86/kvm/svm/pmu.c b/arch/x86/kvm/svm/pmu.c
index 12d8b301065a..6a7638043066 100644
--- a/arch/x86/kvm/svm/pmu.c
+++ b/arch/x86/kvm/svm/pmu.c
@@ -18,6 +18,13 @@
 #include "pmu.h"
 #include "svm.h"

+/* AMD erratum 1292 */
+static inline bool cpu_overcount_retire_events(struct kvm_vcpu *vcpu)
+{
+	return guest_cpuid_family(vcpu) == 0x19 &&
+		guest_cpuid_model(vcpu) < 0x10;
+}
+
 enum pmu_type {
 	PMU_TYPE_COUNTER = 0,
 	PMU_TYPE_EVNTSEL,
@@ -252,6 +259,7 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	struct kvm_pmc *pmc;
 	u32 msr = msr_info->index;
 	u64 data = msr_info->data;
+	u64 reserved_bits = pmu->reserved_bits;

 	/* MSR_PERFCTRn */
 	pmc = get_gp_pmc_amd(pmu, msr, PMU_TYPE_COUNTER);
@@ -264,7 +272,9 @@ static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	if (pmc) {
 		if (data == pmc->eventsel)
 			return 0;
-		if (!(data & pmu->reserved_bits)) {
+		if (pmc->idx == 2 && cpu_overcount_retire_events(vcpu))
+			reserved_bits &= ~BIT_ULL(43);
+		if (!(data & reserved_bits)) {
 			reprogram_gp_counter(pmc, data);
 			return 0;
 		}

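Stripped of KVM types, the patched eventsel check reduces to the logic below.
This is a user-space model for illustration only: struct mock_vcpu stands in
for the guest CPUID lookup done by guest_cpuid_family()/guest_cpuid_model(),
the reserved mask is passed in instead of being read from pmu->reserved_bits,
and BIT_ULL is redefined locally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n) (1ULL << (n))

/* Hypothetical stand-in for the guest's CPUID family/model. */
struct mock_vcpu {
	unsigned int family;
	unsigned int model;
};

/* AMD erratum 1292: Family 19h, Models 00h-0Fh. */
static bool cpu_overcount_retire_events(const struct mock_vcpu *vcpu)
{
	return vcpu->family == 0x19 && vcpu->model < 0x10;
}

/* Returns true if the eventsel write would be accepted by the patched check. */
static bool eventsel_write_ok(const struct mock_vcpu *vcpu, int pmc_idx,
			      uint64_t pmu_reserved_bits, uint64_t data)
{
	uint64_t reserved_bits = pmu_reserved_bits;

	/* Only counter 2 on affected guests may set bit 43. */
	if (pmc_idx == 2 && cpu_overcount_retire_events(vcpu))
		reserved_bits &= ~BIT_ULL(43);
	return !(data & reserved_bits);
}
```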
> 
> It's unfortunate that kvm's PMU virtualization completely circumvents
> any attempt to employ the recommended workaround. Admittedly, bit 43
> is "reserved," and it would be foolish for a hypervisor to let a guest
> set a reserved bit in a host MSR. 

It's easy for KVM to stop treating PERF_CTL2[43] as reserved for
AMD Family 19h Models 00h-0Fh guests only, as the diff above does.
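The family/model test relies on the guest's CPUID.01H:EAX signature. A sketch
of that decoding, following the same rules as the kernel's
x86_family()/x86_model() helpers (the function names below are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Base family, plus extended family when the base family is 0xF. */
static unsigned int cpuid_family(uint32_t eax)
{
	unsigned int fam = (eax >> 8) & 0xf;

	if (fam == 0xf)
		fam += (eax >> 20) & 0xff;
	return fam;
}

/* Base model, with extended model prepended for family 6 and above. */
static unsigned int cpuid_model(uint32_t eax)
{
	unsigned int model = (eax >> 4) & 0xf;

	if (cpuid_family(eax) >= 0x6)
		model += ((eax >> 16) & 0xf) << 4;
	return model;
}

/* Family 19h, Models 00h-0Fh, matching the check in the patch. */
static bool erratum_1292_applies(uint32_t eax)
{
	return cpuid_family(eax) == 0x19 && cpuid_model(eax) < 0x10;
}
```

For example, a signature of 0x00A00F11 decodes to family 19h, model 01h and is
covered, while 0x00A20F10 decodes to model 21h and is not.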

Such guests need to be updated to apply the workaround themselves, after which
they can set this bit safely. Legacy guests are covered as well; see below.

> But, even the first recommendation
> is impossible under KVM, because the host's perf subsystem actually
> decides which hardware counter is going to be used, regardless of what
> the guest asks for.

First, the host perf subsystem needs to be patched to implement this workaround
(the AMD folks have been notified).

The patched host perf will schedule all affected retire events on counter 2
whenever the requested event_select and unit_mask match an entry in the
workaround table.

This works for both host-created and KVM-created perf_events, so the
retire-event counters of all legacy guests will also use host counter 2, and
sampling (in the absence of host counter multiplexing) will stay accurate.
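The scheduling rule described above can be sketched as a counter-selection
function. This is a simplified model under stated assumptions, not the actual
perf patch (in perf this would more likely be expressed as an event constraint
on the counter mask); whether the event matches the workaround table is
passed in as a flag, and pick_counter() is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORKAROUND_COUNTER 2	/* PERF_CTL2, per the erratum */

/*
 * Pick a general-purpose counter for a new event. Bit i of free_mask is set
 * when counter i is free. Returns the counter index, or -1 if the event
 * cannot be placed now (perf would then have to multiplex).
 */
static int pick_counter(bool erratum_event, uint32_t free_mask, int num_counters)
{
	assert(num_counters > WORKAROUND_COUNTER && num_counters <= 32);

	/* Affected retire events may only run on counter 2. */
	if (erratum_event)
		return (free_mask & (1u << WORKAROUND_COUNTER)) ? WORKAROUND_COUNTER : -1;

	/* Other events take any free counter, trying counter 2 last. */
	for (int i = 0; i < num_counters; i++)
		if (i != WORKAROUND_COUNTER && (free_mask & (1u << i)))
			return i;
	return (free_mask & (1u << WORKAROUND_COUNTER)) ? WORKAROUND_COUNTER : -1;
}
```

Non-affected events try counter 2 last so it stays available for affected
ones. A second affected event finds counter 2 busy and gets -1, which is
where perf would have to fall back to multiplexing.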

> 
> Am I the only one bothered by this?

With this workaround, multiplexing becomes easier to trigger, and the guest
does not correctly perceive multiplexing even now.
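For comparison, a host-side perf user compensates for multiplexing by scaling
the raw count with the time_enabled and time_running values reported via
PERF_FORMAT_TOTAL_TIME_ENABLED and PERF_FORMAT_TOTAL_TIME_RUNNING; a guest
reading its virtual counter gets no such information. A minimal sketch of that
scaling (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Estimate the full-interval count of an event that was time-multiplexed. */
static uint64_t scale_multiplexed_count(uint64_t raw, uint64_t time_enabled,
					uint64_t time_running)
{
	if (time_running == 0)
		return 0;	/* never scheduled: nothing to extrapolate */
	if (time_running >= time_enabled)
		return raw;	/* ran the whole time: no scaling needed */
	return (uint64_t)((double)raw * (double)time_enabled / (double)time_running);
}
```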

Thread overview: 8+ messages
2022-01-14 20:02 PMU virtualization and AMD erratum 1292 Jim Mattson
2022-01-17  4:26 ` Like Xu [this message]
2022-01-17 20:57   ` Jim Mattson
2022-01-18  4:08     ` Jim Mattson
2022-01-18  6:25       ` Like Xu
2022-01-18 18:22         ` Jim Mattson
2022-01-19  3:54           ` Like Xu
2022-01-19  4:36             ` Ananth Narayan