From: Vitaly Kuznetsov <vkuznets@redhat.com>
To: Sean Christopherson <seanjc@google.com>
Cc: James Morse <james.morse@arm.com>,
Alexandru Elisei <alexandru.elisei@arm.com>,
Suzuki K Poulose <suzuki.poulose@arm.com>,
Oliver Upton <oliver.upton@linux.dev>,
Atish Patra <atishp@atishpatra.org>,
David Hildenbrand <david@redhat.com>,
kvm@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
kvmarm@lists.linux.dev, kvmarm@lists.cs.columbia.edu,
linux-mips@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
kvm-riscv@lists.infradead.org, linux-riscv@lists.infradead.org,
linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org,
Isaku Yamahata <isaku.yamahata@intel.com>,
Fabiano Rosas <farosas@linux.ibm.com>,
Michael Ellerman <mpe@ellerman.id.au>,
Chao Gao <chao.gao@intel.com>,
Thomas Gleixner <tglx@linutronix.de>,
Yuan Yao <yuan.yao@intel.com>,
Paolo Bonzini <pbonzini@redhat.com>,
Marc Zyngier <maz@kernel.org>,
Huacai Chen <chenhuacai@kernel.org>,
Aleksandar Markovic <aleksandar.qemu.devel@gmail.com>,
Anup Patel <anup@brainfault.org>,
Paul Walmsley <paul.walmsley@sifive.com>,
Palmer Dabbelt <palmer@dabbelt.com>,
Albert Ou <aou@eecs.berkeley.edu>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Janosch Frank <frankja@linux.ibm.com>,
Claudio Imbrenda <imbrenda@linux.ibm.com>,
Matthew Rosato <mjrosato@linux.ibm.com>,
Eric Farman <farman@linux.ibm.com>
Subject: Re: [PATCH 10/44] KVM: VMX: Clean up eVMCS enabling if KVM initialization fails
Date: Tue, 15 Nov 2022 10:30:14 +0100
Message-ID: <87sfikmuop.fsf@redhat.com>
In-Reply-To: <Y22nrQ7aziK0NMOE@google.com>

Sean Christopherson <seanjc@google.com> writes:
> On Thu, Nov 03, 2022, Vitaly Kuznetsov wrote:
>> Sean Christopherson <seanjc@google.com> writes:
>> > + /*
>> > + * Reset everything to support using non-enlightened VMCS access later
>> > + * (e.g. when we reload the module with enlightened_vmcs=0)
>> > + */
>> > + for_each_online_cpu(cpu) {
>> > + vp_ap = hv_get_vp_assist_page(cpu);
>> > +
>> > + if (!vp_ap)
>> > + continue;
>> > +
>> > + vp_ap->nested_control.features.directhypercall = 0;
>> > + vp_ap->current_nested_vmcs = 0;
>> > + vp_ap->enlighten_vmentry = 0;
>> > + }
>>
>> Unrelated to your patch, but while looking at this code I got curious
>> about why we don't need protection against CPU offlining here. It turns
>> out that even when we offline a CPU, its VP assist page remains
>> allocated (see hv_cpu_die()); we just write '0' to the MSR and thus
>
> Heh, "die". Hyper-V is quite dramatic.
>
>> accessing the page is safe. The subsequent hv_cpu_init(), however, does
>> not restore the VP assist page when it's already allocated:
>>
>> # rdmsr -p 24 0x40000073
>> 10212f001
>> # echo 0 > /sys/devices/system/cpu/cpu24/online
>> # echo 1 > /sys/devices/system/cpu/cpu24/online
>> # rdmsr -p 24 0x40000073
>> 0
>>
>> The culprit is commit e5d9b714fe402 ("x86/hyperv: fix root partition
>> faults when writing to VP assist page MSR"). A patch is inbound.
>>
>> The 'hv_root_partition' case is different, though. There we do memunmap()
>> and reset the VP assist page pointer to zero, so it is theoretically
>> possible we're going to clash. Unless I'm missing some obvious reason why
>> module unload can't coincide with CPU offlining, we may be better off
>> surrounding this with cpus_read_lock()/cpus_read_unlock().
>
> I finally see what you're concerned about. If a CPU goes offline and its assist
> page is unmapped, zeroing out the nested/eVMCS stuff will fault.
>
> I think the real problem is that the purging of the eVMCS is in the wrong place.
> Move the clearing to vmx_hardware_disable() and then the CPU hotplug bug goes
> away once KVM disables hotplug during hardware enabling/disable later in the series.
> There's no need to wait until module exit, e.g. it's not like it costs much to
> clear a few variables, and IIUC the state is used only when KVM is actively using
> VMX/eVMCS.
>
> However, I believe there's a second bug. KVM's CPU online hook is called before
> Hyper-V's online hook (CPUHP_AP_ONLINE_DYN). Before this series, which moves KVM's
> hook from STARTING to ONLINE, KVM's hook is waaaay before Hyper-V's. That means
> that hv_cpu_init()'s allocation of the VP assist page will come _after_ KVM's
> check in vmx_hardware_enable():
>
> /*
> * This can happen if we hot-added a CPU but failed to allocate
> * VP assist page for it.
> */
> if (static_branch_unlikely(&enable_evmcs) &&
> !hv_get_vp_assist_page(cpu))
> return -EFAULT;
>
> I.e. CPU hotplug will never work if KVM is running VMs as a Hyper-V guest. I bet
> you can repro by doing a SUSPEND+RESUME.
>
> Can you try to see if that's actually a bug? If so, the only sane fix seems to
> be to add a dedicated ONLINE action for Hyper-V.
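The ordering problem Sean describes can be sketched with a toy model in which "online" callbacks are listed in ascending state order and run in that order, stopping at the first failure. This is an illustrative assumption, not the kernel's cpuhp machinery: the model_* names and the two boolean flags are hypothetical stand-ins for hv_cpu_init(), the eVMCS check in vmx_hardware_enable(), hv_get_vp_assist_page() and the enable_evmcs static key. It models a CPU whose VP assist page is not yet allocated (e.g. freshly hot-added).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of CPU bring-up; NOT kernel code. */
typedef int (*startup_fn)(void);

static bool vp_assist_allocated;  /* like hv_get_vp_assist_page(cpu) != NULL */
static bool evmcs_enabled = true; /* like the enable_evmcs static key */

/* Stand-in for Hyper-V's hv_cpu_init(): allocates the VP assist page. */
static int model_hv_cpu_init(void)
{
	vp_assist_allocated = true;
	return 0;
}

/* Stand-in for the eVMCS check in vmx_hardware_enable() quoted above. */
static int model_vmx_hardware_enable(void)
{
	if (evmcs_enabled && !vp_assist_allocated)
		return -EFAULT;
	return 0;
}

/* Run startup callbacks in ascending state order; stop at the first failure. */
static int model_cpu_online(const startup_fn *states, size_t n)
{
	assert(states != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		int ret = states[i]();

		if (ret)
			return ret;
	}
	return 0;
}
```

With KVM's callback ordered first, model_cpu_online() returns -EFAULT before the model Hyper-V callback ever runs; ordering the Hyper-V callback first lets both succeed, which is the motivation for a dedicated Hyper-V stage.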

It seems we can't get away without a dedicated stage for Hyper-V anyway,
e.g. see our discussion with Michael:
https://lore.kernel.org/linux-hyperv/878rkqr7ku.fsf@ovpn-192-136.brq.redhat.com/

All these issues are more or less "theoretical" as there's no real CPU
hotplug on Hyper-V/Azure. Yes, it is possible to trigger problems by
doing CPU offline/online, but I don't see how this may come in handy outside
of testing environments.

> Per patch
>
> KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section
>
> from this series, CPUHP_AP_KVM_ONLINE needs to be before CPUHP_AP_SCHED_WAIT_EMPTY
> to ensure there are no tasks, i.e. no vCPUs, running on the to-be-unplugged CPU.
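Why the placement relative to CPUHP_AP_SCHED_WAIT_EMPTY matters can be sketched as follows, assuming only that offlining runs teardown callbacks in the reverse of bring-up order. Everything here is hypothetical (the model_* names and the task counter); the "wait empty" step is modeled as simply draining the counter.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of CPU offlining; NOT kernel code. States are listed in
 * ascending (bring-up) order, and teardown runs them from the highest down.
 */
typedef void (*teardown_fn)(void);

static int nr_running_on_cpu;    /* tasks (e.g. vCPU threads) on the dying CPU */
static int kvm_saw_running = -1; /* what KVM's teardown observed */

/* Stand-in for CPUHP_AP_SCHED_WAIT_EMPTY: once it completes, no tasks remain. */
static void model_sched_wait_empty(void)
{
	nr_running_on_cpu = 0;
}

/* Stand-in for KVM's teardown (hardware disabling) on the dying CPU. */
static void model_kvm_offline(void)
{
	kvm_saw_running = nr_running_on_cpu;
}

/* Run teardown callbacks in descending state order. */
static void model_cpu_offline(const teardown_fn *states, size_t n)
{
	assert(states != NULL || n == 0);
	for (size_t i = n; i-- > 0; )
		states[i]();
}
```

If KVM's state sits below the wait-empty state, its teardown runs after the CPU has been drained and observes zero tasks; if it sits above, it runs first and can still see running vCPUs.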
>
> Back to the original bug, proposed fix is below. The other advantage of moving
> the reset to hardware disabling is that the "cleanup" is just disabling the static
> key, and at that point can simply be deleted as there's no need to disable the
> static key when kvm-intel is unloaded since kvm-intel owns the key. I.e. this
> patch (that we're replying to) would get replaced with a patch to delete the
> disabling of the static key.
>

From a quick glance this looks good to me; I'll try to find some time to work
on this issue. I will likely end up proposing a dedicated CPU hotplug
stage for Hyper-V (which needs to happen before KVM's
CPUHP_AP_KVM_ONLINE on CPU hotplug and after it on unplug) anyway.
Thanks for looking into this!
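The symmetry of such a dedicated stage (Hyper-V before KVM when a CPU comes up, after KVM when it goes down) falls out of ordinary ascending/descending hotplug ordering. A minimal sketch, assuming a hypothetical two-entry state table and an event log; none of these names are real kernel identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: each hotplug state has a startup and a teardown callback. */
struct model_state {
	void (*startup)(void);
	void (*teardown)(void);
};

static char event_log[128]; /* records the order in which callbacks ran */

static void log_event(const char *ev)
{
	/* Leave room for the separator and the terminating NUL. */
	assert(strlen(event_log) + strlen(ev) + 2 <= sizeof(event_log));
	strcat(event_log, ev);
	strcat(event_log, " ");
}

static void hv_startup(void)   { log_event("hv+"); }  /* enable VP assist page */
static void hv_teardown(void)  { log_event("hv-"); }  /* disable (root: unmap) it */
static void kvm_startup(void)  { log_event("kvm+"); } /* ~ vmx_hardware_enable() */
static void kvm_teardown(void) { log_event("kvm-"); } /* ~ vmx_hardware_disable() */

/* Ascending order on bring-up... */
static void model_online(const struct model_state *s, size_t n)
{
	for (size_t i = 0; i < n; i++)
		s[i].startup();
}

/* ...descending order on teardown. */
static void model_offline(const struct model_state *s, size_t n)
{
	for (size_t i = n; i-- > 0; )
		s[i].teardown();
}

/* A hypothetical dedicated Hyper-V stage placed before KVM's. */
static const struct model_state hv_then_kvm[] = {
	{ hv_startup, hv_teardown },
	{ kvm_startup, kvm_teardown },
};
```

Onlining and then offlining with this table logs "hv+ kvm+ kvm- hv- ", i.e. KVM only ever touches the VP assist page while the Hyper-V stage has it set up.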
> --
> From: Sean Christopherson <seanjc@google.com>
> Date: Thu, 10 Nov 2022 17:28:08 -0800
> Subject: [PATCH] KVM: VMX: Reset eVMCS controls in VP assist page during
> hardware disabling
>
> Reset the eVMCS controls in the per-CPU VP assist page during hardware
> disabling instead of waiting until kvm-intel's module exit. The controls
> are activated if and only if KVM creates a VM, i.e. don't need to be
> reset if hardware is never enabled.
>
> Doing the reset during hardware disabling will naturally fix a potential
> NULL pointer deref bug once KVM disables CPU hotplug while enabling and
> disabling hardware (which is necessary to fix a variety of bugs). If the
> kernel is running as the root partition, the VP assist page is unmapped
> during CPU hot unplug, and so KVM's clearing of the eVMCS controls needs
> to occur with CPU hot(un)plug disabled, otherwise KVM could attempt to
> write to a CPU's VP assist page after it's unmapped.
>
> Reported-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 50 +++++++++++++++++++++++++-----------------
> 1 file changed, 30 insertions(+), 20 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index aca88524fd1e..ae13aa3e8a1d 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -552,6 +552,33 @@ static int hv_enable_direct_tlbflush(struct kvm_vcpu *vcpu)
> return 0;
> }
>
> +static void hv_reset_evmcs(void)
> +{
> + struct hv_vp_assist_page *vp_ap;
> +
> + if (!static_branch_unlikely(&enable_evmcs))
> + return;
> +
> + /*
> + * KVM should enable eVMCS if and only if all CPUs have a VP assist
> + * page, and should reject CPU onlining if eVMCS is enabled and the CPU
> + * doesn't have a VP assist page allocated.
> + */
> + vp_ap = hv_get_vp_assist_page(smp_processor_id());
> + if (WARN_ON_ONCE(!vp_ap))
> + return;
> +
> + /*
> + * Reset everything to support using non-enlightened VMCS access later
> + * (e.g. when we reload the module with enlightened_vmcs=0)
> + */
> + vp_ap->nested_control.features.directhypercall = 0;
> + vp_ap->current_nested_vmcs = 0;
> + vp_ap->enlighten_vmentry = 0;
> +}
> +
> +#else /* IS_ENABLED(CONFIG_HYPERV) */
> +static void hv_reset_evmcs(void) {}
> #endif /* IS_ENABLED(CONFIG_HYPERV) */
>
> /*
> @@ -2497,6 +2524,8 @@ static void vmx_hardware_disable(void)
> if (cpu_vmxoff())
> kvm_spurious_fault();
>
> + hv_reset_evmcs();
> +
> intel_pt_handle_vmx(0);
> }
>
> @@ -8463,27 +8492,8 @@ static void vmx_exit(void)
> kvm_exit();
>
> #if IS_ENABLED(CONFIG_HYPERV)
> - if (static_branch_unlikely(&enable_evmcs)) {
> - int cpu;
> - struct hv_vp_assist_page *vp_ap;
> - /*
> - * Reset everything to support using non-enlightened VMCS
> - * access later (e.g. when we reload the module with
> - * enlightened_vmcs=0)
> - */
> - for_each_online_cpu(cpu) {
> - vp_ap = hv_get_vp_assist_page(cpu);
> -
> - if (!vp_ap)
> - continue;
> -
> - vp_ap->nested_control.features.directhypercall = 0;
> - vp_ap->current_nested_vmcs = 0;
> - vp_ap->enlighten_vmentry = 0;
> - }
> -
> + if (static_branch_unlikely(&enable_evmcs))
> static_branch_disable(&enable_evmcs);
> - }
> #endif
> vmx_cleanup_l1d_flush();
>
>
> base-commit: 5f47ba6894477dfbdc5416467a25fb7acb47d404
--
Vitaly
Thread overview: 127+ messages
2022-11-02 23:18 [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling Sean Christopherson
2022-11-02 23:18 ` [PATCH 01/44] KVM: Register /dev/kvm as the _very_ last thing during initialization Sean Christopherson
2022-11-02 23:18 ` [PATCH 02/44] KVM: Initialize IRQ FD after arch hardware setup Sean Christopherson
2022-11-04 0:41 ` Chao Gao
2022-11-04 20:15 ` Sean Christopherson
2022-11-02 23:18 ` [PATCH 03/44] KVM: Allocate cpus_hardware_enabled " Sean Christopherson
2022-11-04 5:37 ` Yuan Yao
2022-11-02 23:18 ` [PATCH 04/44] KVM: Teardown VFIO ops earlier in kvm_exit() Sean Christopherson
2022-11-03 12:46 ` Cornelia Huck
2022-11-07 17:56 ` Eric Farman
2022-11-02 23:18 ` [PATCH 05/44] KVM: s390: Unwind kvm_arch_init() piece-by-piece() if a step fails Sean Christopherson
2022-11-07 17:57 ` Eric Farman
2022-11-02 23:18 ` [PATCH 06/44] KVM: s390: Move hardware setup/unsetup to init/exit Sean Christopherson
2022-11-07 17:58 ` Eric Farman
2022-11-02 23:18 ` [PATCH 07/44] KVM: x86: Do timer initialization after XCR0 configuration Sean Christopherson
2022-11-02 23:18 ` [PATCH 08/44] KVM: x86: Move hardware setup/unsetup to init/exit Sean Christopherson
2022-11-04 6:22 ` Yuan Yao
2022-11-04 16:31 ` Sean Christopherson
2022-11-02 23:18 ` [PATCH 09/44] KVM: Drop arch hardware (un)setup hooks Sean Christopherson
2022-11-07 3:01 ` Anup Patel
2022-11-07 18:22 ` Eric Farman
2022-11-02 23:18 ` [PATCH 10/44] KVM: VMX: Clean up eVMCS enabling if KVM initialization fails Sean Christopherson
2022-11-03 14:01 ` Paolo Bonzini
2022-11-03 14:04 ` Paolo Bonzini
2022-11-03 14:28 ` Vitaly Kuznetsov
2022-11-11 1:38 ` Sean Christopherson
2022-11-15 9:30 ` Vitaly Kuznetsov [this message]
2022-11-02 23:18 ` [PATCH 11/44] KVM: x86: Move guts of kvm_arch_init() to standalone helper Sean Christopherson
2022-11-02 23:18 ` [PATCH 12/44] KVM: VMX: Do _all_ initialization before exposing /dev/kvm to userspace Sean Christopherson
2022-11-02 23:18 ` [PATCH 13/44] KVM: x86: Serialize vendor module initialization (hardware setup) Sean Christopherson
2022-11-16 1:46 ` Huang, Kai
2022-11-16 15:52 ` Sean Christopherson
2022-11-02 23:18 ` [PATCH 14/44] KVM: arm64: Simplify the CPUHP logic Sean Christopherson
2022-11-02 23:18 ` [PATCH 15/44] KVM: arm64: Free hypervisor allocations if vector slot init fails Sean Christopherson
2022-11-02 23:18 ` [PATCH 16/44] KVM: arm64: Unregister perf callbacks if hypervisor finalization fails Sean Christopherson
2022-11-02 23:18 ` [PATCH 17/44] KVM: arm64: Do arm/arch initialiation without bouncing through kvm_init() Sean Christopherson
2022-11-03 7:25 ` Philippe Mathieu-Daudé
2022-11-03 15:29 ` Sean Christopherson
2022-11-02 23:18 ` [PATCH 18/44] KVM: arm64: Mark kvm_arm_init() and its unique descendants as __init Sean Christopherson
2022-11-02 23:18 ` [PATCH 19/44] KVM: MIPS: Hardcode callbacks to hardware virtualization extensions Sean Christopherson
2022-11-02 23:18 ` [PATCH 20/44] KVM: MIPS: Setup VZ emulation? directly from kvm_mips_init() Sean Christopherson
2022-11-03 7:10 ` Philippe Mathieu-Daudé
2022-11-02 23:18 ` [PATCH 21/44] KVM: MIPS: Register die notifier prior to kvm_init() Sean Christopherson
2022-11-03 7:12 ` Philippe Mathieu-Daudé
2022-11-02 23:18 ` [PATCH 22/44] KVM: RISC-V: Do arch init directly in riscv_kvm_init() Sean Christopherson
2022-11-03 7:14 ` Philippe Mathieu-Daudé
2022-11-07 3:05 ` Anup Patel
2022-11-02 23:18 ` [PATCH 23/44] KVM: RISC-V: Tag init functions and data with __init, __ro_after_init Sean Christopherson
2022-11-07 3:10 ` Anup Patel
2022-11-02 23:18 ` [PATCH 24/44] KVM: PPC: Move processor compatibility check to module init Sean Christopherson
2022-11-02 23:18 ` [PATCH 25/44] KVM: s390: Do s390 specific init without bouncing through kvm_init() Sean Christopherson
2022-11-03 7:16 ` Philippe Mathieu-Daudé
2022-11-03 12:44 ` Claudio Imbrenda
2022-11-03 13:21 ` Claudio Imbrenda
2022-11-07 18:22 ` Eric Farman
2022-11-02 23:18 ` [PATCH 26/44] KVM: s390: Mark __kvm_s390_init() and its descendants as __init Sean Christopherson
2022-11-07 18:22 ` Eric Farman
2022-11-02 23:18 ` [PATCH 27/44] KVM: Drop kvm_arch_{init,exit}() hooks Sean Christopherson
2022-11-03 7:18 ` Philippe Mathieu-Daudé
2022-11-07 3:13 ` Anup Patel
2022-11-07 19:08 ` Eric Farman
2022-11-02 23:18 ` [PATCH 28/44] KVM: VMX: Make VMCS configuration/capabilities structs read-only after init Sean Christopherson
2022-11-02 23:18 ` [PATCH 29/44] KVM: x86: Do CPU compatibility checks in x86 code Sean Christopherson
2022-11-02 23:18 ` [PATCH 30/44] KVM: Drop kvm_arch_check_processor_compat() hook Sean Christopherson
2022-11-03 7:20 ` Philippe Mathieu-Daudé
2022-11-07 3:16 ` Anup Patel
2022-11-07 19:08 ` Eric Farman
2022-11-02 23:18 ` [PATCH 31/44] KVM: x86: Use KBUILD_MODNAME to specify vendor module name Sean Christopherson
2022-11-02 23:18 ` [PATCH 32/44] KVM: x86: Unify pr_fmt to use module name for all KVM modules Sean Christopherson
2022-11-10 7:31 ` Robert Hoo
2022-11-10 16:50 ` Sean Christopherson
2022-11-30 23:02 ` Sean Christopherson
2022-12-01 1:34 ` Robert Hoo
2022-11-02 23:19 ` [PATCH 33/44] KVM: x86: Do VMX/SVM support checks directly in vendor code Sean Christopherson
2022-11-03 15:08 ` Paolo Bonzini
2022-11-03 18:35 ` Sean Christopherson
2022-11-03 18:46 ` Paolo Bonzini
2022-11-03 18:58 ` Sean Christopherson
2022-11-04 8:02 ` Paolo Bonzini
2022-11-04 15:40 ` Sean Christopherson
2022-11-15 22:50 ` Huang, Kai
2022-11-16 1:56 ` Sean Christopherson
2022-11-02 23:19 ` [PATCH 34/44] KVM: VMX: Shuffle support checks and hardware enabling code around Sean Christopherson
2022-11-02 23:19 ` [PATCH 35/44] KVM: SVM: Check for SVM support in CPU compatibility checks Sean Christopherson
2022-11-02 23:19 ` [PATCH 36/44] KVM: x86: Do compatibility checks when onlining CPU Sean Christopherson
2022-11-03 15:17 ` Paolo Bonzini
2022-11-03 17:44 ` Sean Christopherson
2022-11-03 17:57 ` Paolo Bonzini
2022-11-03 21:04 ` Isaku Yamahata
2022-11-03 22:34 ` Sean Christopherson
2022-11-04 7:18 ` Isaku Yamahata
2022-11-11 0:06 ` Sean Christopherson
2022-11-02 23:19 ` [PATCH 37/44] KVM: Rename and move CPUHP_AP_KVM_STARTING to ONLINE section Sean Christopherson
2022-11-10 7:26 ` Robert Hoo
2022-11-10 16:49 ` Sean Christopherson
2022-11-02 23:19 ` [PATCH 38/44] KVM: Disable CPU hotplug during hardware enabling Sean Christopherson
2022-11-10 1:08 ` Huang, Kai
2022-11-10 2:20 ` Huang, Kai
2022-11-10 1:33 ` Huang, Kai
2022-11-10 2:11 ` Huang, Kai
2022-11-10 16:58 ` Sean Christopherson
2022-11-15 20:16 ` Sean Christopherson
2022-11-15 20:21 ` Sean Christopherson
2022-11-16 12:23 ` Huang, Kai
2022-11-16 17:11 ` Sean Christopherson
2022-11-17 1:39 ` Huang, Kai
2022-11-17 15:16 ` Sean Christopherson
2022-11-02 23:19 ` [PATCH 39/44] KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock Sean Christopherson
2022-11-03 15:23 ` Paolo Bonzini
2022-11-03 17:53 ` Sean Christopherson
2022-11-02 23:19 ` [PATCH 40/44] KVM: Remove on_each_cpu(hardware_disable_nolock) in kvm_exit() Sean Christopherson
2022-11-02 23:19 ` [PATCH 41/44] KVM: Use a per-CPU variable to track which CPUs have enabled virtualization Sean Christopherson
2022-11-02 23:19 ` [PATCH 42/44] KVM: Make hardware_enable_failed a local variable in the "enable all" path Sean Christopherson
2022-11-02 23:19 ` [PATCH 43/44] KVM: Register syscore (suspend/resume) ops early in kvm_init() Sean Christopherson
2022-11-02 23:19 ` [PATCH 44/44] KVM: Opt out of generic hardware enabling on s390 and PPC Sean Christopherson
2022-11-07 3:23 ` Anup Patel
2022-11-03 12:08 ` [PATCH 00/44] KVM: Rework kvm_init() and hardware enabling Christian Borntraeger
2022-11-03 15:27 ` Paolo Bonzini
2022-11-04 7:17 ` Isaku Yamahata
2022-11-04 7:59 ` Paolo Bonzini
2022-11-04 20:27 ` Sean Christopherson
2022-11-07 21:46 ` Isaku Yamahata
2022-11-08 1:09 ` Huang, Kai
2022-11-08 5:43 ` Isaku Yamahata
2022-11-08 8:56 ` Huang, Kai
2022-11-08 10:35 ` Huang, Kai
2022-11-08 17:46 ` Sean Christopherson