Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition

From: Sean Christopherson <seanjc@google.com>
To: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Anirudh Rayabharam <anrayabh@linux.microsoft.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Michael Kelley <mikelley@microsoft.com>,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
Date: Tue, 23 Aug 2022 15:00:38 +0000
Message-ID: <YwTrlgeqoAqyH0KF@google.com>
In-Reply-To: <87wnazwh1r.fsf@redhat.com>

On Tue, Aug 23, 2022, Vitaly Kuznetsov wrote:
> Sean Christopherson <seanjc@google.com> writes:
> 
> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
> >> QEMU's migration depends on the assumption that identical QEMU's command
> >> lines create identical (from guest PoV) configurations. Assume we have
> >> (simplified)
> >> 
> >> "-cpu CascadeLake-Server,hv-evmcs"
> >> 
> >> on both source and destination but source host is newer, i.e. its KVM
> >> knows about TSC Scaling in eVMCS and destination host has no idea about
> >> it. If we just apply filtering upon vCPU creation, guest visible MSR
> >> values are going to be different, right? Ok, assuming QEMU also migrates
> >> VMX feature MSRs (TODO: check if that's true), we will be able to fail
> >> migration late (which is already much worse than not being able to
> >> create the desired configuration on destination, 'fail early') if we use
> >> in-KVM filtering to throw an error to userspace. But if we blindly
> >> filter control MSRs on the destination, 'TscScaling' will just disappear
> >> underneath the guest. This is unlikely to work.
> >
> > But all of that holds true irrespective of eVMCS.  If QEMU attempts to migrate a
> > nested guest from a KVM that supports TSC_SCALING to a KVM that doesn't support
> > TSC_SCALING, then TSC_SCALING is going to disappear and VM-Entry on the dest will
> > fail.  I.e. QEMU _must_ be able to detect the incompatibility and not attempt
> > the migration.  And with that code in place, QEMU doesn't need to do anything new
> > for eVMCS, it Just Works.
> 
> I'm obviously missing something. "-cpu CascadeLake-Server" presumes
> certain features, including VMX features (e.g. TSC_SCALING), an attempt
> to create such vCPU on a CPU which doesn't support it will lead to
> immediate failure. So two VMs created on different hosts with
> 
> -cpu CascadeLake-Server
> 
> are guaranteed to look exactly the same from guest PoV. This is not true
> for '-cpu CascadeLake-Server,hv-evmcs' (if we do it the way you suggest)
> as 'hv-evmcs' will be a *different* filter on each host (which is going
> to depend on KVM version, not even on the host's hardware).

We're talking about nested VMX, i.e. exposing TSC_SCALING to L1.  QEMU's CLX
definition doesn't include TSC_SCALING.  In fact, none of QEMU's predefined CPU
models supports TSC_SCALING, precisely because KVM didn't support exposing the
feature to L1 until relatively recently.

$ git grep VMX_SECONDARY_EXEC_TSC_SCALING
target/i386/cpu.h:#define VMX_SECONDARY_EXEC_TSC_SCALING              0x02000000
target/i386/kvm/kvm.c:    if (f[FEAT_VMX_SECONDARY_CTLS] & VMX_SECONDARY_EXEC_TSC_SCALING) {

> >> In any case, what we need, is an option for VMM (read: QEMU) to create
> >> the configuration with 'TscScaling' filtered out even if KVM supports the
> >> bit in eVMCS. This way the guest will be able to migrate backwards to an
> >> older KVM which doesn't support it, i.e.
> >> 
> >> '-cpu CascadeLake-Server,hv-evmcs'
> >>  creates the 'origin' eVMCS configuration, no TscScaling
> >> 
> >> '-cpu CascadeLake-Server,hv-evmcs,hv-evmcs-2022' creates the updated one.

Ah, I see what you're worried about.  Your concern is that QEMU will add a VMX
feature to a predefined CPU model, but only later gain eVMCS support, and so
"CascadeLake-Server,hv-evmcs" will do different things depending on the KVM
version.

But again, that's already reality.  Run "-cpu CascadeLake-Server" against a KVM
from before commits:

  28c1c9fabf48 ("KVM/VMX: Emulate MSR_IA32_ARCH_CAPABILITIES")
  1eaafe91a0df ("kvm: x86: IA32_ARCH_CAPABILITIES is always supported")

and it will fail.  There are undoubtedly many other features that are similarly
affected, just go back far enough in KVM time.

Or simply run "-cpu CascadeLake-Server" on pre-CLX hardware.  Anything that KVM
doesn't fully emulate will not be present.

> > Again, this conundrum exists irrespective of eVMCS.  Properly solve the problem
> > for regular nVMX and eVMCS should naturally work.
> 
> I don't think we have this problem for VMX features as named CPU models
> in QEMU encode all of them explicitly, they *must* be present whenever
> such vCPU is created.

Yes, and if KVM doesn't support features that CascadeLake-Server requires, spawning
the VM will fail on the destination, as it should.  My point is that this behavior
is not unique to eVMCS.

QEMU/Libvirt must also be prepared for rejection, because it is flat out impossible
to ensure that KVM+hardware supports a specific feature.
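
To illustrate the kind of check being argued for, here is a minimal sketch of a
pre-migration compatibility test on VMX control bits.  Only the TSC_SCALING bit
value comes from the QEMU grep output above; the function names and the idea of
comparing two plain integers are assumptions for illustration, not QEMU or
Libvirt code.

```python
# Bit value as defined in QEMU's target/i386/cpu.h (see the grep output above).
VMX_SECONDARY_EXEC_TSC_SCALING = 0x02000000


def missing_features(requested: int, supported: int) -> int:
    """Return the bits promised to the guest that the destination lacks.

    Hypothetical helper: 'requested' is the set of secondary controls the
    guest saw on the source, 'supported' is what the destination can expose.
    """
    return requested & ~supported


def can_migrate(requested: int, supported: int) -> bool:
    """Migration is safe only if no guest-visible bit would disappear."""
    return missing_features(requested, supported) == 0


if __name__ == "__main__":
    source_guest = VMX_SECONDARY_EXEC_TSC_SCALING
    old_destination = 0  # destination KVM without nested TSC scaling
    if not can_migrate(source_guest, old_destination):
        print("reject migration: missing bits "
              f"{missing_features(source_guest, old_destination):#010x}")
```

The same comparison applies whether the missing bit is due to eVMCS filtering,
an older KVM, or older hardware, which is why nothing eVMCS-specific is needed.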

> >> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is bad as it only takes 'eVMCS' version
> >> as a parameter (as we assumed it will always change when new fields are
> >> added, but that turned out to be false). That's why I suggested
> >> KVM_CAP_HYPERV_ENLIGHTENED_VMCS2.
> >
> > Enumerating features via versions is such a bad API though, e.g. if there's a
> > bug with nested TSC_SCALING, userspace can't disable just nested TSC_SCALING
> > without everything else under the inscrutable "hv-evmcs-2022" being collateral
> > damage.
> 
> Why? Something like 
> 
> "-cpu CascadeLake-Server,hv-evmcs,hv-evmcs-2022,-vmx-tsc-scaling" 
> 
> should work well, no? 'hv-evmcs*' are just filters, if the VMX feature
> is not there -- it's not there.

Because it's completely unnecessary, adds non-trivial maintenance burden to KVM,
and requires explicit documentation to explain to userspace what "hv-evmcs-2022"
means.

It's unnecessary because if the user is concerned about eVMCS features showing up
in the future, then they should do:

  -cpu CascadeLake-Server,hv-evmcs,-vmx-tsc-scaling,-<any other VMX features not eVMCS-friendly>

If QEMU wants to make that more user friendly, then define CascadeLake-Server-eVMCS
or whatever so that the features that are unlikely to be supported for eVMCS are off by
default.  This is no different than QEMU not including nested TSC_SCALING in any of
the predefined models; the developers _know_ KVM doesn't widely support TSC_SCALING,
so it was omitted, even though a real CLX CPU is guaranteed to support TSC_SCALING.
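
As a rough illustration of why per-feature flags are sufficient, the sketch
below resolves a "-cpu" string into a feature set.  It is a toy resolver, not
QEMU's parser, and the model defaults passed in are hypothetical (no predefined
QEMU model enables nested TSC_SCALING today).

```python
def resolve_features(model_defaults: set[str], cpu_arg: str) -> set[str]:
    """Apply 'feat' / '+feat' (enable) and '-feat' (disable) flags in order.

    Toy resolver for illustration only; the first comma-separated token is
    treated as the model name and otherwise ignored.
    """
    _model, *flags = cpu_arg.split(",")
    features = set(model_defaults)
    for flag in flags:
        if flag.startswith("-"):
            features.discard(flag[1:])
        else:
            features.add(flag.lstrip("+"))
    return features


if __name__ == "__main__":
    # Hypothetical future model that turns on nested TSC scaling by default.
    defaults = {"vmx", "vmx-tsc-scaling"}
    print(sorted(resolve_features(
        defaults, "CascadeLake-Server,hv-evmcs,-vmx-tsc-scaling")))
```

Because each feature is toggled by name, a single problematic feature can be
dropped without a version bundle dragging unrelated features along with it.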

It's non-trivial maintenance for KVM because it would require defining new versions
every time an eVMCS field is added, allowing userspace to specify and restrict
features based on arbitrary versions, and do all of that without conflicting with
whatever PV enumeration Microsoft adds.
