From: Sean Christopherson <seanjc@google.com>
To: Mingwei Zhang <mizhang@google.com>
Cc: Aaron Lewis <aaronlewis@google.com>,
Paolo Bonzini <pbonzini@redhat.com>,
"H. Peter Anvin" <hpa@zytor.com>,
kvm@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/2] KVM: x86/pmu: Reset perf_capabilities in vcpu to 0 if PDCM is disabled
Date: Wed, 24 Jan 2024 14:51:42 -0800
Message-ID: <ZbGUfmn-ZAe4lkiN@google.com>
In-Reply-To: <ZbGOK9m6UKkQ38bK@google.com>
On Wed, Jan 24, 2024, Mingwei Zhang wrote:
> On Wed, Jan 24, 2024, Sean Christopherson wrote:
> > On Wed, Jan 24, 2024, Aaron Lewis wrote:
> > > On Wed, Jan 24, 2024 at 7:49 AM Sean Christopherson <seanjc@google.com> wrote:
> > > >
> > > > On Wed, Jan 24, 2024, Mingwei Zhang wrote:
> > > > No, this is just papering over the underlying bug. KVM shouldn't be stuffing
> > > > vcpu->arch.perf_capabilities without explicit writes from host userspace. E.g.
> > > > KVM_SET_CPUID{,2} is allowed multiple times, at which point KVM could clobber a
> > > > host userspace write to MSR_IA32_PERF_CAPABILITIES. It's unlikely any userspace
> > > > actually does something like that, but KVM overwriting guest state is almost
> > > > never a good thing.
> > > >
> > > > I've been meaning to send a patch for a long time (IIRC, Aaron also ran into this?).
> > > > KVM needs to simply not stuff vcpu->arch.perf_capabilities. I believe we are
> > > > already fudging around this in our internal kernels, so I don't think there's a
> > > > need to carry a hack-a-fix for the destination kernel.
> > > >
> > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > > index 27e23714e960..fdef9d706d61 100644
> > > > --- a/arch/x86/kvm/x86.c
> > > > +++ b/arch/x86/kvm/x86.c
> > > > @@ -12116,7 +12116,6 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
> > > >
> > > >         kvm_async_pf_hash_reset(vcpu);
> > > >
> > > > -       vcpu->arch.perf_capabilities = kvm_caps.supported_perf_cap;
> > >
> > > Yeah, that will fix the issue we are seeing. The only thing that's
> > > not clear to me is if userspace should expect KVM to set this or if
> > > KVM should expect userspace to set this. How is that generally
> > > decided?
> >
> > By "this", you mean the effective RESET value for vcpu->arch.perf_capabilities?
> > To be consistent with KVM's CPUID model at vCPU creation, which is completely
> > empty (vCPU has no PMU and no PDCM support), KVM *must* zero
> > vcpu->arch.perf_capabilities.
> >
> > If userspace wants a non-zero value, then userspace needs to set CPUID to enable
> > PDCM and set MSR_IA32_PERF_CAPABILITIES.
> >
> > MSR_IA32_ARCH_CAPABILITIES is in the same boat, e.g. a vCPU without
> > X86_FEATURE_ARCH_CAPABILITIES can end up seeing a non-zero MSR value. That too
> > should be excised.
> >
> hmm, does that mean KVM just allows an invalid vcpu state to exist from
> the host's point of view?

Yes.

https://lore.kernel.org/all/ZC4qF90l77m3X1Ir@google.com

> I think this creates a lot of confusion on migration, where the VMM on the source
> believes that a non-zero value from KVM_GET_MSRS is valid and the VMM on the
> target will find that it is not.

Yes, but seeing a non-zero value is a KVM bug that should be fixed.
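
To make the invariant concrete, here is a toy model of the state in question.
It is only a sketch, not KVM code: struct toy_vcpu and both helpers are made up
for illustration.  It shows why stuffing the MSR at vCPU creation is
inconsistent with an empty CPUID model, while zeroing it is not.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for the relevant vCPU state; NOT the real struct kvm_vcpu. */
struct toy_vcpu {
	bool guest_has_pdcm;        /* guest CPUID.01H:ECX[15] (PDCM) */
	uint64_t perf_capabilities; /* what a read of MSR 0x345 would return */
};

/* Model vCPU creation: guest CPUID starts out empty, so PDCM is clear. */
static struct toy_vcpu toy_vcpu_create(uint64_t supported_perf_cap, bool stuff_msr)
{
	struct toy_vcpu vcpu = {
		.guest_has_pdcm = false,
		.perf_capabilities = stuff_msr ? supported_perf_cap : 0,
	};
	return vcpu;
}

/* Without PDCM in guest CPUID, the only consistent MSR value is zero. */
static bool toy_perf_caps_consistent(const struct toy_vcpu *vcpu)
{
	return vcpu->guest_has_pdcm || vcpu->perf_capabilities == 0;
}
```

With a non-zero supported_perf_cap, toy_vcpu_create(cap, true) fails the
consistency check and toy_vcpu_create(cap, false) passes it.
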
> If we follow the suggestion and remove the initial value at vCPU
> creation time, then I think it breaks existing VMM code, since that
> requires the VMM to explicitly set the MSR, which I am not sure we do today.

Yeah, I'm hoping we can squeak by without breaking existing setups.

I'm 99% certain QEMU is ok, as QEMU has explicitly set MSR_IA32_PERF_CAPABILITIES
since support for PDCM/PERF_CAPABILITIES was added by commit ea39f9b643
("target/i386: define a new MSR based feature word - FEAT_PERF_CAPABILITIES").

Frankly, if our VMM doesn't do the same, then it's wildly busted.  Relying on
KVM to define the vCPU is irresponsible, to put it nicely.
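
As a rough illustration of what "explicitly set" means on the userspace side,
consider the sketch below.  It is not QEMU's code or any real VMM's: struct
toy_msr_entry and vmm_perf_caps_entry() are hypothetical, and a real VMM would
pass the resulting index/data pair to KVM_SET_MSRS after configuring guest CPUID
via KVM_SET_CPUID2.  Only the MSR index (0x345) and the PDCM bit
(CPUID.01H:ECX[15]) are architectural.  Treating the value as a plain bitmask is
also a simplifying assumption; multi-bit fields would need real handling.

```c
#include <stdint.h>

#define MSR_IA32_PERF_CAPABILITIES 0x345u      /* architectural MSR index */
#define CPUID_01_ECX_PDCM          (1u << 15)  /* CPUID.01H:ECX.PDCM */

/* Hypothetical stand-in for an (index, data) pair handed to KVM_SET_MSRS. */
struct toy_msr_entry {
	uint32_t index;
	uint64_t data;
};

/*
 * Decide what the VMM writes to MSR_IA32_PERF_CAPABILITIES for a vCPU:
 * zero if guest CPUID doesn't advertise PDCM, otherwise the wanted bits
 * restricted to what the host reports as supported (assumed bitmask).
 */
static struct toy_msr_entry vmm_perf_caps_entry(uint32_t guest_cpuid_01_ecx,
						uint64_t wanted,
						uint64_t host_supported)
{
	struct toy_msr_entry e = {
		.index = MSR_IA32_PERF_CAPABILITIES,
		.data = 0,
	};

	if (guest_cpuid_01_ecx & CPUID_01_ECX_PDCM)
		e.data = wanted & host_supported;
	return e;
}
```

The point is that the value is derived from the CPUID the VMM itself chose, so
the VMM never depends on whatever KVM happened to put in the MSR at creation.
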
> The code below is different. The key difference is that it preserves
> a valid value, whereas this case is about not preserving
> an invalid value.

But it's a completely different fix.  I referenced that commit to call out that
the need for the commit and changelog suggests that someone (*cough* us) is relying
on KVM to initialize MSR_PLATFORM_INFO, and has been doing so for a very long time.
That doesn't mean it's the correct KVM behavior, just that it's much riskier to
change.