From: "Huang, Kai" <kai.huang@intel.com>
To: "kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"xin@zytor.com" <xin@zytor.com>
Cc: "brgerst@gmail.com" <brgerst@gmail.com>,
"andrew.cooper3@citrix.com" <andrew.cooper3@citrix.com>,
"arjan@linux.intel.com" <arjan@linux.intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
"seanjc@google.com" <seanjc@google.com>,
"mingo@redhat.com" <mingo@redhat.com>,
"pbonzini@redhat.com" <pbonzini@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"hpa@zytor.com" <hpa@zytor.com>,
"peterz@infradead.org" <peterz@infradead.org>,
"pavel@kernel.org" <pavel@kernel.org>,
"bp@alien8.de" <bp@alien8.de>,
"kprateek.nayak@amd.com" <kprateek.nayak@amd.com>,
"rafael@kernel.org" <rafael@kernel.org>,
"david.kaplan@amd.com" <david.kaplan@amd.com>,
"x86@kernel.org" <x86@kernel.org>,
"Gao, Chao" <chao.gao@intel.com>
Subject: Re: [RFC PATCH v1 1/5] x86/boot: Shift VMXON from KVM init to CPU startup phase
Date: Wed, 10 Sep 2025 08:02:46 +0000
Message-ID: <1301b802284ed5755fe397f54e1de41638aec49c.camel@intel.com>
In-Reply-To: <20250909182828.1542362-2-xin@zytor.com>
On Tue, 2025-09-09 at 11:28 -0700, Xin Li (Intel) wrote:
> Move the VMXON setup from the KVM initialization path to the CPU startup
> phase to guarantee that hardware virtualization is enabled early and
> without interruption.
>
> As a result, KVM, often loaded as a kernel module, no longer needs to worry
> about whether or not VMXON has been executed on a CPU (e.g., CPU offline
> events or system reboots while KVM is loading).
KVM has a module parameter 'enable_virt_at_load', which controls whether
to enable virtualization (in the case of VMX, do VMXON) when loading KVM or
to defer enabling until the first VM is created.

Changing to unconditionally do VMXON when bringing up the CPU will
effectively break this. Maybe we will eventually switch to unconditional
VMXON, but for now it seems like a dramatic move.
I was thinking the code change would be that the core kernel only provides
VMXON/VMXOFF APIs for KVM (and other kernel components) to use, i.e., more
like "moving" the VMX code out of KVM.
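I.e., something along these lines (the function names below are just
placeholders for discussion, not an existing kernel API):

```c
/* Hypothetical core-kernel API, sketched for discussion only. */

/*
 * Enable VMX operation (set CR4.VMXE, then VMXON) on the calling CPU.
 * Returns 0 on success or a negative errno, so that callers such as
 * KVM can keep honoring 'enable_virt_at_load' and handle failure.
 */
int x86_virt_vmxon(void);

/* Disable VMX operation (VMXOFF, then clear CR4.VMXE) on the calling CPU. */
void x86_virt_vmxoff(void);
```

KVM would then call these from its existing enable/disable paths instead of
open-coding VMXON/VMXOFF itself.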
[...]
> +static bool is_vmx_supported(void)
> +{
> + int cpu = raw_smp_processor_id();
> +
> + if (!(cpuid_ecx(1) & (1 << (X86_FEATURE_VMX & 31)))) {
> + /* May not be an Intel CPU */
> + pr_info("VMX not supported by CPU%d\n", cpu);
> + return false;
> + }
> +
> + if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
> + !this_cpu_has(X86_FEATURE_VMX)) {
> + pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL on CPU%d\n", cpu);
> + return false;
> + }
> +
> + return true;
> +}
> +
> +/* IA-32 SDM Vol 3B: VMCS size is never greater than 4kB. */
> +union vmxon_vmcs {
> + struct vmcs_hdr hdr;
> + char data[PAGE_SIZE];
> +};
> +
> +static DEFINE_PER_CPU_PAGE_ALIGNED(union vmxon_vmcs, vmxon_vmcs);
> +
> +/*
> + * Executed during the CPU startup phase to execute VMXON to enable VMX. This
> + * ensures that KVM, often loaded as a kernel module, no longer needs to worry
> + * about whether or not VMXON has been executed on a CPU (e.g., CPU offline
> + * events or system reboots while KVM is loading).
> + *
> + * VMXON is not expected to fault, but fault handling is kept as a precaution
> + * against any unexpected code paths that might trigger it and can be removed
> + * later if unnecessary.
> + */
> +void cpu_enable_virtualization(void)
> +{
> + u64 vmxon_pointer = __pa(this_cpu_ptr(&vmxon_vmcs));
> + int cpu = raw_smp_processor_id();
> + u64 basic_msr;
> +
> + if (!is_vmx_supported())
> + return;
> +
> + if (cr4_read_shadow() & X86_CR4_VMXE) {
> + pr_err("VMX already enabled on CPU%d\n", cpu);
> + return;
> + }
> +
> + memset(this_cpu_ptr(&vmxon_vmcs), 0, PAGE_SIZE);
> +
> + /*
> + * Even though not explicitly documented by TLFS, VMXArea passed as
> + * VMXON argument should still be marked with revision_id reported by
> + * physical CPU.
> + */
> + rdmsrq(MSR_IA32_VMX_BASIC, basic_msr);
> + this_cpu_ptr(&vmxon_vmcs)->hdr.revision_id = vmx_basic_vmcs_revision_id(basic_msr);
> +
> + intel_pt_handle_vmx(1);
> +
> + cr4_set_bits(X86_CR4_VMXE);
> +
> + asm goto("1: vmxon %[vmxon_pointer]\n\t"
> + _ASM_EXTABLE(1b, %l[fault])
> + : : [vmxon_pointer] "m"(vmxon_pointer)
> + : : fault);
> +
> + return;
> +
> +fault:
> + pr_err("VMXON faulted on CPU%d\n", cpu);
> + cr4_clear_bits(X86_CR4_VMXE);
> + intel_pt_handle_vmx(0);
> +}
> +
> /*
> * This does the hard work of actually picking apart the CPU stuff...
> */
> @@ -2120,6 +2199,12 @@ void identify_secondary_cpu(unsigned int cpu)
>
> tsx_ap_init();
> c->initialized = true;
> +
> + /*
> + * Enable AP virtualization immediately after initializing the per-CPU
> + * cpuinfo_x86 structure, ensuring that this_cpu_has() operates correctly.
> + */
> + cpu_enable_virtualization();
> }
AFAICT there's a functional drawback here: this implementation doesn't
handle VMXON failure, while the existing KVM code does via a CPUHP
callback.
>
> void print_cpu_info(struct cpuinfo_x86 *c)
> @@ -2551,6 +2636,12 @@ void __init arch_cpu_finalize_init(void)
> *c = boot_cpu_data;
> c->initialized = true;
>
> + /*
> + * Enable BSP virtualization right after the BSP cpuinfo_x86 structure
> + * is initialized to ensure this_cpu_has() works as expected.
> + */
> + cpu_enable_virtualization();
> +
>
Any reason that you chose to do it in arch_cpu_finalize_init()? Perhaps
just an arch_initcall() or similar?
KVM has a dedicated CPUHP state, CPUHP_AP_KVM_ONLINE, to handle
VMXON/VMXOFF on CPU online/offline. It's not in the STARTUP section (which
is not allowed to fail), so it can handle VMXON failure.

How about adding a VMX-specific CPUHP callback instead?

That way, not only can we put all VMX-related code together (e.g., in
arch/x86/virt/vmx/vmx.c), which is much easier to review/maintain, but we
can also still handle VMXON failure just like KVM does.
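Roughly like the below (names are illustrative; CPUHP_AP_ONLINE_DYN is
just one option for a state that is allowed to fail, and this assumes
cpu_enable_virtualization() is changed to return an error):

```c
static int vmx_cpu_online(unsigned int cpu)
{
	/*
	 * Do VMXON on this CPU; returning an error here fails the
	 * online operation gracefully instead of being fatal.
	 */
	return cpu_enable_virtualization();
}

static int vmx_cpu_offline(unsigned int cpu)
{
	cpu_disable_virtualization();
	return 0;
}

static int __init vmx_init(void)
{
	int ret;

	ret = cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/vmx:online",
				vmx_cpu_online, vmx_cpu_offline);
	return ret < 0 ? ret : 0;
}
arch_initcall(vmx_init);
```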
(Btw, originally KVM's CPUHP callback was also in the STARTUP section, but
we later moved it out of STARTUP in order to handle VMXON failure and
compatibility-check failure gracefully.)
[...]
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 916441f5e85c..0eec314b79c2 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -206,11 +206,11 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> /* cr4 was introduced in the Pentium CPU */
> #ifdef CONFIG_X86_32
> if (ctxt->cr4)
> - __write_cr4(ctxt->cr4);
> + __write_cr4(ctxt->cr4 & ~X86_CR4_VMXE);
> #else
> /* CONFIG X86_64 */
> wrmsrq(MSR_EFER, ctxt->efer);
> - __write_cr4(ctxt->cr4);
> + __write_cr4(ctxt->cr4 & ~X86_CR4_VMXE);
> #endif
> write_cr3(ctxt->cr3);
> write_cr2(ctxt->cr2);
> @@ -291,6 +291,9 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
> * because some of the MSRs are "emulated" in microcode.
> */
> msr_restore_context(ctxt);
> +
> + if (ctxt->cr4 & X86_CR4_VMXE)
> + cpu_enable_virtualization();
> }
>
If we still leverage what KVM is doing -- using a syscore_ops callback -- I
think we can avoid changing this function and keep all the VMX code in a
dedicated file.