linux-hyperv.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org, Paolo Bonzini <pbonzini@redhat.com>,
	Anirudh Rayabharam <anrayabh@linux.microsoft.com>,
	Wanpeng Li <wanpengli@tencent.com>,
	Jim Mattson <jmattson@google.com>,
	Maxim Levitsky <mlevitsk@redhat.com>,
	Nathan Chancellor <nathan@kernel.org>,
	Michael Kelley <mikelley@microsoft.com>,
	linux-hyperv@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v5 03/26] x86/hyperv: Update 'struct hv_enlightened_vmcs' definition
Date: Tue, 23 Aug 2022 09:33:04 +0200	[thread overview]
Message-ID: <87wnazwh1r.fsf@redhat.com> (raw)
In-Reply-To: <YwPLt2e7CuqMzjt1@google.com>

Sean Christopherson <seanjc@google.com> writes:

> On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> 
>> > On Mon, Aug 22, 2022, Vitaly Kuznetsov wrote:
>> >> My initial implementation was inventing 'eVMCS revision' concept:
>> >> https://lore.kernel.org/kvm/20220629150625.238286-7-vkuznets@redhat.com/
>> >> 
>> >> which is needed if we don't gate all these new fields on CPUID.0x4000000A.EBX BIT(0).
>> >> 
>> >> Going forward, we will still (likely) need something when new fields show up.
>> >
>> > My comments from that thread still apply.  Adding "revisions" or feature flags
>> > isn't maintanable, e.g. at best KVM will end up with a ridiculous number of flags.
>> >
>> > Looking at QEMU, which I strongly suspect is the only VMM that enables
>> > KVM_CAP_HYPERV_ENLIGHTENED_VMCS, it does the sane thing of enabling the capability
>> > before grabbing the VMX MSRs.
>> >
>> > So, why not simply apply filtering for host accesses as well?
>> 
>> (I understand that using QEMU to justify KVM's behavior is flawed but...)
>> 
>> QEMU's migration depends on the assumption that identical QEMU's command
>> lines create identical (from guest PoV) configurations. Assume we have
>> (simplified)
>> 
>> "-cpu CascadeLake-Sever,hv-evmcs"
>> 
>> on both source and destination but source host is newer, i.e. its KVM
>> knows about TSC Scaling in eVMCS and destination host has no idea about
>> it. If we just apply filtering upon vCPU creation, guest visible MSR
>> values are going to be different, right? Ok, assuming QEMU also migrates
>> VMX feature MSRs (TODO: check if that's true), we will be able to fail
>> mirgration late (which is already much worse than not being able to
>> create the desired configuration on destination, 'fail early') if we use
>> in-KVM filtering to throw an error to userspace. But if we blindly
>> filter control MSRs on the destination, 'TscScaling' will just disapper
>> undreneath the guest. This is unlikely to work.
>
> But all of that holds true irrespetive of eVMCS.  If QEMU attempts to migrate a
> nested guest from a KVM that supports TSC_SCALING to a KVM that doesn't support
> TSC_SCALING, then TSC_SCALING is going to disappear and VM-Entry on the dest will
> fail.  I.e. QEMU _must_ be able to detect the incompatibility and not attempt
> the migration.  And with that code in place, QEMU doesn't need to do anything new
> for eVMCS, it Just Works.

I'm obviously missing something. "-cpu CascadeLake-Sever" presumes
cetain features, including VMX features (e.g. TSC_SCALING), an attempt
to create such vCPU on a CPU which doesn't support it will lead to
immediate failure. So two VMs created on different hosts with

-cpu CascadeLake-Sever

are guaranteed to look exactly the same from guest PoV. This is not true
for '-cpu CascadeLake-Sever,hv-evmcs' (if we do it the way you suggest)
as 'hv-evmcs' will be a *different* filter on each host (which is going
to depend on KVM version, not even on the host's hardware).

>
>> In any case, what we need, is an option for VMM (read: QEMU) to create
>> the configuration with 'TscScaling' filtered out even KVM supports the
>> bit in eVMCS. This way the guest will be able to migrate backwards to an
>> older KVM which doesn't support it, i.e.
>> 
>> '-cpu CascadeLake-Sever,hv-evmcs'
>>  creates the 'origin' eVMCS configuration, no TscScaling
>> 
>> '-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022' creates the updated one.
>
> Again, this conundrum exists irrespective of eVMCS.  Properly solve the problem
> for regular nVMX and eVMCS should naturally work.

I don't think we have this problem for VMX features as named CPU models
in QEMU encode all of them explicitly, they *must* be present whenever
such vCPU is created.

>
>> KVM_CAP_HYPERV_ENLIGHTENED_VMCS is bad as it only takes 'eVMCS' version
>> as a parameter (as we assumed it will always change when new fields are
>> added, but that turned out to be false). That's why I suggested
>> KVM_CAP_HYPERV_ENLIGHTENED_VMCS2.
>
> Enumerating features via versions is such a bad API though, e.g. if there's a
> bug with nested TSC_SCALING, userspace can't disable just nested TSC_SCALING
> without everything else under the inscrutable "hv-evmcs-2022" being collateral
> damage.

Why? Something like 

"-cpu CascadeLake-Sever,hv-evmcs,hv-evmcs-2022,-vmx-tsc-scaling" 

should work well, no? 'hv-evmcs*' are just filters, if the VMX feature
is not there -- it's not there.

We can certainly make this fine-grained and introduce
KVM_CAP_HYPERV_EVMCS_TSC_SCALING instead and make a 'hv-*' flag for it,
however, with Hyper-V and Windows I really don't like 'frankenstein'
Hyper-V configurations which look nothing like genuine Hyper-V as nobody
at Microsoft tests new Windows builds with such configurations. And yes,
bugs were discovered in the past (e.g. PerfGlobalCtrl and Win11).

-- 
Vitaly


  reply	other threads:[~2022-08-23  7:33 UTC|newest]

Thread overview: 59+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-02 16:07 [PATCH v5 00/26] KVM: VMX: Support updated eVMCSv1 revision + use vmcs_config for L1 VMX MSRs Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 01/26] KVM: x86: hyper-v: Expose access to debug MSRs in the partition privilege flags Vitaly Kuznetsov
2022-08-18 15:14   ` Sean Christopherson
2022-08-18 15:20     ` Vitaly Kuznetsov
2022-08-18 15:49       ` Sean Christopherson
2022-08-18 15:59         ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 02/26] x86/hyperv: Fix 'struct hv_enlightened_vmcs' definition Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 03/26] x86/hyperv: Update " Vitaly Kuznetsov
2022-08-18 15:21   ` Sean Christopherson
2022-08-18 15:29     ` Vitaly Kuznetsov
2022-08-18 17:57       ` Sean Christopherson
2022-08-22  9:18         ` Vitaly Kuznetsov
2022-08-22 15:55           ` Sean Christopherson
2022-08-22 16:21             ` Vitaly Kuznetsov
2022-08-22 17:01               ` Sean Christopherson
2022-08-22 17:46                 ` Vitaly Kuznetsov
2022-08-22 18:32                   ` Sean Christopherson
2022-08-23  7:33                     ` Vitaly Kuznetsov [this message]
2022-08-23 15:00                       ` Sean Christopherson
2022-08-23 15:31                         ` Sean Christopherson
2022-08-23 16:54                         ` Vitaly Kuznetsov
2022-08-23 20:16                           ` Sean Christopherson
2022-08-22 16:13           ` Sean Christopherson
2022-08-22 16:24             ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 04/26] KVM: VMX: Define VMCS-to-EVMCS conversion for the new fields Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 05/26] KVM: nVMX: Support several new fields in eVMCSv1 Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 06/26] KVM: x86: hyper-v: Cache HYPERV_CPUID_NESTED_FEATURES CPUID leaf Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 07/26] KVM: selftests: Add ENCLS_EXITING_BITMAP{,HIGH} VMCS fields Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 08/26] KVM: selftests: Switch to updated eVMCSv1 definition Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 09/26] KVM: VMX: nVMX: Support TSC scaling and PERF_GLOBAL_CTRL with enlightened VMCS Vitaly Kuznetsov
2022-08-18 17:15   ` Sean Christopherson
2022-08-19  8:06     ` Vitaly Kuznetsov
2022-08-19 17:02       ` Sean Christopherson
2022-08-22  8:47         ` Vitaly Kuznetsov
2022-08-22 16:50           ` Sean Christopherson
2022-08-22 17:49             ` Vitaly Kuznetsov
2022-08-18 17:19   ` Sean Christopherson
2022-08-19  7:42     ` Vitaly Kuznetsov
2022-08-19 14:49       ` Sean Christopherson
2022-08-19 15:07         ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 10/26] KVM: selftests: Enable TSC scaling in evmcs selftest Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 11/26] KVM: VMX: Get rid of eVMCS specific VMX controls sanitization Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 12/26] KVM: VMX: Check VM_ENTRY_IA32E_MODE in setup_vmcs_config() Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 13/26] KVM: VMX: Check CPU_BASED_{INTR,NMI}_WINDOW_EXITING " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 14/26] KVM: VMX: Tweak the special handling of SECONDARY_EXEC_ENCLS_EXITING " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 15/26] KVM: VMX: Don't toggle VM_ENTRY_IA32E_MODE for 32-bit kernels/KVM Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 16/26] KVM: VMX: Extend VMX controls macro shenanigans Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 17/26] KVM: VMX: Move CPU_BASED_CR8_{LOAD,STORE}_EXITING filtering out of setup_vmcs_config() Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 18/26] KVM: VMX: Add missing VMEXIT controls to vmcs_config Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 19/26] KVM: VMX: Add missing CPU based VM execution " Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 20/26] KVM: VMX: Adjust CR3/INVPLG interception for EPT=y at runtime, not setup Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 21/26] KVM: x86: VMX: Replace some Intel model numbers with mnemonics Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 22/26] KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config() Vitaly Kuznetsov
2022-08-18 17:49   ` Sean Christopherson
2022-08-19  7:48     ` Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 23/26] KVM: nVMX: Always set required-1 bits of pinbased_ctls to PIN_BASED_ALWAYSON_WITHOUT_TRUE_MSR Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 24/26] KVM: nVMX: Use sanitized allowed-1 bits for VMX control MSRs Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 25/26] KVM: VMX: Cache MSR_IA32_VMX_MISC in vmcs_config Vitaly Kuznetsov
2022-08-02 16:07 ` [PATCH v5 26/26] KVM: nVMX: Use cached host MSR_IA32_VMX_MISC value for setting up nested MSR Vitaly Kuznetsov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87wnazwh1r.fsf@redhat.com \
    --to=vkuznets@redhat.com \
    --cc=anrayabh@linux.microsoft.com \
    --cc=jmattson@google.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-hyperv@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mikelley@microsoft.com \
    --cc=mlevitsk@redhat.com \
    --cc=nathan@kernel.org \
    --cc=pbonzini@redhat.com \
    --cc=seanjc@google.com \
    --cc=wanpengli@tencent.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).