public inbox for kvm@vger.kernel.org
* [RFC PATCH 00/14] Support multiple KVM modules on the same host
@ 2023-11-07 20:19 Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 01/14] KVM: x86: Move common module params from SVM/VMX to x86 Anish Ghulati
                   ` (14 more replies)
  0 siblings, 15 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

This series is a rough, PoC-quality RFC to allow (un)loading and running
multiple KVM modules simultaneously on a single host, e.g. to deploy
fixes, mitigations, and/or new features without having to drain all VMs 
from the host. Multi-KVM will also allow running the "same" KVM module
with different params, e.g. to run trusted VMs with different mitigations.

The goal of this RFC is to get feedback on the idea itself and the
high-level approach.  In particular, we're looking for input on:

 - Combining kvm_intel.ko and kvm_amd.ko into kvm.ko
 - Exposing multiple /dev/kvmX devices via Kconfig
 - The name and prefix of the new base module

Feedback on individual patches is also welcome, but please keep in mind
that this is very much a work in progress.

This builds on Sean's series to hide KVM internals:

    https://lore.kernel.org/lkml/20230916003118.2540661-1-seanjc@google.com

The whole thing can be found at:

    https://github.com/asg-17/linux vac-rfc

The basic gist of the approach is to:

 - Move system-wide virtualization resource management to a new base
   module to avoid collisions between different KVM modules, e.g. VPIDs
   and ASIDs need to be unique per VM, and callbacks from IRQ handlers need
   to be mediated so that things like PMIs get to the right KVM instance.

 - Refactor KVM to make all upgradable assets visible only to KVM, i.e.
   make KVM a black box, so that the layout/size of things like "struct
   kvm_vcpu" isn't exposed to the kernel at-large.

 - Fold kvm_intel.ko and kvm_amd.ko into kvm.ko to avoid the
   complications of generating unique symbols for every symbol exported
   by kvm.ko.

 - Add a Kconfig string to allow defining a device and module postfix at
   build time, e.g. to create kvmX.ko and /dev/kvmX (a rough sketch
   follows this list).
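
As a rough sketch of the Kconfig postfix idea (the option name and
help text below are illustrative, not lifted from the patches):

    config KVM_MODULE_POSTFIX
    	string "Postfix for the KVM module and device node"
    	default ""
    	help
    	  If set to e.g. "2", KVM is built as kvm2.ko and registers
    	  /dev/kvm2 instead of /dev/kvm, allowing multiple
    	  differently-built KVM modules to coexist on one host.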

The proposed name of the new base module is vac.ko, a.k.a.
Virtualization Acceleration Code (Unupgradable Units Module). Childish
humor aside, "vac" is a unique name in the kernel and hopefully in x86
and hardware terminology, is a unique name in the kernel and hopefully
in x86 and hardware terminology, e.g. `git grep vac_` yields no hits in
the kernel. It also has the same number of characters as "kvm", i.e.
the namespace can be modified without needing whitespace adjustment if
we want to go that route.

Requirements / Goals / Notes:
 - Fully opt-in and backwards compatible (except for the disappearance
   of kvm_{amd,intel}.ko).

 - User space ultimately controls and is responsible for deployment,
   usage, lifecycles, etc.  Standard module refcounting applies, but
   ensuring that a VM is created with the "right" KVM module is a user
   space problem.

 - No user space *VMM* changes are required, e.g. /dev/kvm can be
   presented to a VMM by symlinking /dev/kvmX (see the example after
   this list).

 - Mutually exclusive with subsystems that have a hard dependency on KVM,
   i.e. KVMGT.

 - x86 only (for the foreseeable future).
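
As a trivial example of the /dev/kvm symlink approach mentioned in the
VMM note above (the device name is illustrative):

    # Point an unmodified VMM at a specific KVM build, e.g. one
    # deployed as kvm2.ko exposing /dev/kvm2:
    ln -sf /dev/kvm2 /dev/kvm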

Anish Ghulati (13):
  KVM: x86: Move common module params from SVM/VMX to x86
  KVM: x86: Fold x86 vendor modules into the main KVM modules
  KVM: x86: Remove unused exports
  KVM: x86: Create stubs for a new VAC module
  KVM: x86: Refactor hardware enable/disable operations into a new file
  KVM: x86: Move user return msr operations out of KVM
  KVM: SVM: Move shared SVM data structures into VAC
  KVM: VMX: Move shared VMX data structures into VAC
  KVM: VMX: Move VMX enable and disable into VAC
  KVM: SVM: Move SVM enable and disable into VAC
  KVM: x86: Move VMX and SVM support checks into VAC
  KVM: x86: VAC: Move all hardware enable/disable code into VAC
  KVM: VAC: Bring up VAC as a new module

Venkatesh Srinivas (1):
  KVM: x86: Move shared KVM state into VAC

 arch/x86/include/asm/kvm-x86-ops.h |   3 +-
 arch/x86/include/asm/kvm_host.h    |  12 +-
 arch/x86/kernel/nmi.c              |   2 +-
 arch/x86/kvm/Kconfig               |  29 +-
 arch/x86/kvm/Makefile              |  31 ++-
 arch/x86/kvm/cpuid.c               |   8 +-
 arch/x86/kvm/hyperv.c              |   2 -
 arch/x86/kvm/irq.c                 |   3 -
 arch/x86/kvm/irq_comm.c            |   2 -
 arch/x86/kvm/kvm_onhyperv.c        |   3 -
 arch/x86/kvm/lapic.c               |  15 -
 arch/x86/kvm/mmu/mmu.c             |  12 -
 arch/x86/kvm/mmu/spte.c            |   4 -
 arch/x86/kvm/mtrr.c                |   1 -
 arch/x86/kvm/pmu.c                 |   2 -
 arch/x86/kvm/svm/nested.c          |   4 +-
 arch/x86/kvm/svm/sev.c             |   2 +-
 arch/x86/kvm/svm/svm.c             | 224 ++-------------
 arch/x86/kvm/svm/svm.h             |  21 +-
 arch/x86/kvm/svm/svm_data.h        |  23 ++
 arch/x86/kvm/svm/svm_ops.h         |   1 +
 arch/x86/kvm/svm/vac.c             | 172 ++++++++++++
 arch/x86/kvm/svm/vac.h             |  20 ++
 arch/x86/kvm/vac.c                 | 214 +++++++++++++++
 arch/x86/kvm/vac.h                 |  69 +++++
 arch/x86/kvm/vmx/nested.c          |   6 +-
 arch/x86/kvm/vmx/vac.c             | 287 +++++++++++++++++++
 arch/x86/kvm/vmx/vac.h             |  20 ++
 arch/x86/kvm/vmx/vmx.c             | 332 +++-------------------
 arch/x86/kvm/vmx/vmx.h             |   2 -
 arch/x86/kvm/vmx/vmx_ops.h         |   1 +
 arch/x86/kvm/x86.c                 | 423 ++---------------------------
 arch/x86/kvm/x86.h                 |  15 +-
 include/linux/kvm_host.h           |   2 +
 virt/kvm/Makefile.kvm              |  14 +-
 virt/kvm/kvm_main.c                | 210 +-------------
 virt/kvm/vac.c                     | 192 +++++++++++++
 virt/kvm/vac.h                     |  40 +++
 38 files changed, 1212 insertions(+), 1211 deletions(-)
 create mode 100644 arch/x86/kvm/svm/svm_data.h
 create mode 100644 arch/x86/kvm/svm/vac.c
 create mode 100644 arch/x86/kvm/svm/vac.h
 create mode 100644 arch/x86/kvm/vac.c
 create mode 100644 arch/x86/kvm/vac.h
 create mode 100644 arch/x86/kvm/vmx/vac.c
 create mode 100644 arch/x86/kvm/vmx/vac.h
 create mode 100644 virt/kvm/vac.c
 create mode 100644 virt/kvm/vac.h


base-commit: 0b78fc46e5450f08ef92431e569c797a63f31517
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [RFC PATCH 01/14] KVM: x86: Move common module params from SVM/VMX to x86
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 02/14] KVM: x86: Fold x86 vendor modules into the main KVM modules Anish Ghulati
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Move the "nested" and vNMI module params from SVM and VMX into common
x86 code, standardizing on a single enable_vnmi variable (previously
named "vnmi" on the SVM side).

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/svm/nested.c |  4 ++--
 arch/x86/kvm/svm/svm.c    | 17 +++++------------
 arch/x86/kvm/svm/svm.h    |  3 +--
 arch/x86/kvm/vmx/vmx.c    | 11 -----------
 arch/x86/kvm/x86.c        | 11 +++++++++++
 arch/x86/kvm/x86.h        |  4 ++++
 6 files changed, 23 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index dd496c9e5f91..aebccf7c1c2d 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -666,7 +666,7 @@ static void nested_vmcb02_prepare_control(struct vcpu_svm *svm,
 	else
 		int_ctl_vmcb01_bits |= (V_GIF_MASK | V_GIF_ENABLE_MASK);
 
-	if (vnmi) {
+	if (enable_vnmi) {
 		if (vmcb01->control.int_ctl & V_NMI_PENDING_MASK) {
 			svm->vcpu.arch.nmi_pending++;
 			kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
@@ -1083,7 +1083,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 		svm_update_lbrv(vcpu);
 	}
 
-	if (vnmi) {
+	if (enable_vnmi) {
 		if (vmcb02->control.int_ctl & V_NMI_BLOCKING_MASK)
 			vmcb01->control.int_ctl |= V_NMI_BLOCKING_MASK;
 		else
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f283eb47f6ac..3d44e42f4f22 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -197,10 +197,6 @@ module_param(pause_filter_count_max, ushort, 0444);
 bool npt_enabled = true;
 module_param_named(npt, npt_enabled, bool, 0444);
 
-/* allow nested virtualization in KVM/SVM */
-static int nested = true;
-module_param(nested, int, S_IRUGO);
-
 /* enable/disable Next RIP Save */
 int nrips = true;
 module_param(nrips, int, 0444);
@@ -234,9 +230,6 @@ module_param(dump_invalid_vmcb, bool, 0644);
 bool intercept_smi = true;
 module_param(intercept_smi, bool, 0444);
 
-bool vnmi = true;
-module_param(vnmi, bool, 0444);
-
 static bool svm_gp_erratum_intercept = true;
 
 static u8 rsm_ins_bytes[] = "\x0f\xaa";
@@ -1357,7 +1350,7 @@ static void init_vmcb(struct kvm_vcpu *vcpu)
 	if (kvm_vcpu_apicv_active(vcpu))
 		avic_init_vmcb(svm, vmcb);
 
-	if (vnmi)
+	if (enable_vnmi)
 		svm->vmcb->control.int_ctl |= V_NMI_ENABLE_MASK;
 
 	if (vgif) {
@@ -5089,7 +5082,7 @@ static __init void svm_set_cpu_caps(void)
 		if (vgif)
 			kvm_cpu_cap_set(X86_FEATURE_VGIF);
 
-		if (vnmi)
+		if (enable_vnmi)
 			kvm_cpu_cap_set(X86_FEATURE_VNMI);
 
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
@@ -5253,11 +5246,11 @@ static __init int svm_hardware_setup(void)
 			pr_info("Virtual GIF supported\n");
 	}
 
-	vnmi = vgif && vnmi && boot_cpu_has(X86_FEATURE_VNMI);
-	if (vnmi)
+	enable_vnmi = vgif && enable_vnmi && boot_cpu_has(X86_FEATURE_VNMI);
+	if (enable_vnmi)
 		pr_info("Virtual NMI enabled\n");
 
-	if (!vnmi) {
+	if (!enable_vnmi) {
 		svm_x86_ops.is_vnmi_pending = NULL;
 		svm_x86_ops.set_vnmi_pending = NULL;
 	}
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index f41253958357..436632706848 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -38,7 +38,6 @@ extern int nrips;
 extern int vgif;
 extern bool intercept_smi;
 extern bool x2avic_enabled;
-extern bool vnmi;
 
 /*
  * Clean bits in VMCB.
@@ -510,7 +509,7 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
 
 static inline struct vmcb *get_vnmi_vmcb_l1(struct vcpu_svm *svm)
 {
-	if (!vnmi)
+	if (!enable_vnmi)
 		return NULL;
 
 	if (is_guest_mode(&svm->vcpu))
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 43b87ad5fde8..65d59de3cc63 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -80,9 +80,6 @@ MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id);
 bool __read_mostly enable_vpid = 1;
 module_param_named(vpid, enable_vpid, bool, 0444);
 
-static bool __read_mostly enable_vnmi = 1;
-module_param_named(vnmi, enable_vnmi, bool, S_IRUGO);
-
 bool __read_mostly flexpriority_enabled = 1;
 module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO);
 
@@ -107,14 +104,6 @@ module_param(enable_apicv, bool, S_IRUGO);
 bool __read_mostly enable_ipiv = true;
 module_param(enable_ipiv, bool, 0444);
 
-/*
- * If nested=1, nested virtualization is supported, i.e., guests may use
- * VMX and be a hypervisor for its own guests. If nested=0, guests may not
- * use VMX instructions.
- */
-static bool __read_mostly nested = 1;
-module_param(nested, bool, S_IRUGO);
-
 bool __read_mostly enable_pml = 1;
 module_param_named(pml, enable_pml, bool, S_IRUGO);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index aab095f89d9e..6b7f89fd2d47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -176,6 +176,17 @@ bool __read_mostly enable_vmware_backdoor = false;
 module_param(enable_vmware_backdoor, bool, S_IRUGO);
 EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
 
+/*
+ * If nested=1, nested virtualization is supported
+ */
+bool __read_mostly nested = 1;
+module_param(nested, bool, S_IRUGO);
+EXPORT_SYMBOL_GPL(nested);
+
+bool __read_mostly enable_vnmi = 1;
+module_param(enable_vnmi, bool, S_IRUGO);
+EXPORT_SYMBOL_GPL(enable_vnmi);
+
 /*
  * Flags to manipulate forced emulation behavior (any non-zero value will
  * enable forced emulation).
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1e7be1f6ab29..6b5490319d1b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -367,6 +367,10 @@ extern unsigned int min_timer_period_us;
 
 extern bool enable_vmware_backdoor;
 
+extern bool nested;
+
+extern bool enable_vnmi;
+
 extern int pi_inject_timer;
 
 extern bool report_ignored_msrs;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 02/14] KVM: x86: Fold x86 vendor modules into the main KVM modules
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 01/14] KVM: x86: Move common module params from SVM/VMX to x86 Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 03/14] KVM: x86: Remove unused exports Anish Ghulati
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Collapse the x86 vendor modules, i.e. kvm_intel and kvm_amd, into
kvm.ko.

Add a new vendor_exit hook to kvm_x86_ops so that KVM knows which
vendor exit function to call on module exit. Since the vendor modules
no longer exist, the vendor_exit call does not have to go through a
static call.

Expose the vendor init/exit/supported functions so that they can be
called from vendor-neutral KVM code.

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  2 +
 arch/x86/kernel/nmi.c              |  2 +-
 arch/x86/kvm/Kconfig               | 12 ++----
 arch/x86/kvm/Makefile              | 10 ++---
 arch/x86/kvm/svm/svm.c             | 38 +++++++----------
 arch/x86/kvm/vmx/vmx.c             | 66 +++++++++++++-----------------
 arch/x86/kvm/x86.c                 | 18 +++++---
 arch/x86/kvm/x86.h                 | 15 +++++++
 9 files changed, 82 insertions(+), 82 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index e3054e3e46d5..764be4a26a0c 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -14,6 +14,7 @@ BUILD_BUG_ON(1)
  * to make a definition optional, but in this case the default will
  * be __static_call_return0.
  */
+KVM_X86_OP(vendor_exit)
 KVM_X86_OP(check_processor_compatibility)
 KVM_X86_OP(hardware_enable)
 KVM_X86_OP(hardware_disable)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index eda45a937666..e01d1aa3628c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1546,6 +1546,8 @@ static inline u16 kvm_lapic_irq_dest_mode(bool dest_mode_logical)
 struct kvm_x86_ops {
 	const char *name;
 
+	void (*vendor_exit)(void);
+
 	int (*check_processor_compatibility)(void);
 
 	int (*hardware_enable)(void);
diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index a0c551846b35..8f2ac7598912 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -565,7 +565,7 @@ DEFINE_IDTENTRY_RAW(exc_nmi_kvm_vmx)
 {
 	exc_nmi(regs);
 }
-#if IS_MODULE(CONFIG_KVM_INTEL)
+#if IS_MODULE(CONFIG_KVM) && IS_ENABLED(CONFIG_KVM_INTEL)
 EXPORT_SYMBOL_GPL(asm_exc_nmi_kvm_vmx);
 #endif
 #endif
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 8c5fb7f57b4c..adfa57d59643 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -75,15 +75,12 @@ config KVM_WERROR
 	  If in doubt, say "N".
 
 config KVM_INTEL
-	tristate "KVM for Intel (and compatible) processors support"
+	bool "KVM for Intel (and compatible) processors support"
 	depends on KVM && IA32_FEAT_CTL
 	help
 	  Provides support for KVM on processors equipped with Intel's VT
 	  extensions, a.k.a. Virtual Machine Extensions (VMX).
 
-	  To compile this as a module, choose M here: the module
-	  will be called kvm-intel.
-
 config X86_SGX_KVM
 	bool "Software Guard eXtensions (SGX) Virtualization"
 	depends on X86_SGX && KVM_INTEL
@@ -97,20 +94,17 @@ config X86_SGX_KVM
 	  If unsure, say N.
 
 config KVM_AMD
-	tristate "KVM for AMD processors support"
+	bool "KVM for AMD processors support"
 	depends on KVM && (CPU_SUP_AMD || CPU_SUP_HYGON)
 	help
 	  Provides support for KVM on AMD processors equipped with the AMD-V
 	  (SVM) extensions.
 
-	  To compile this as a module, choose M here: the module
-	  will be called kvm-amd.
-
 config KVM_AMD_SEV
 	def_bool y
 	bool "AMD Secure Encrypted Virtualization (SEV) support"
 	depends on KVM_AMD && X86_64
-	depends on CRYPTO_DEV_SP_PSP && !(KVM_AMD=y && CRYPTO_DEV_CCP_DD=m)
+	depends on CRYPTO_DEV_SP_PSP && !(KVM=y && CRYPTO_DEV_CCP_DD=m)
 	help
 	  Provides support for launching Encrypted VMs (SEV) and Encrypted VMs
 	  with Encrypted State (SEV-ES) on AMD processors.
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index d13f1a7b7b3d..3e965c90e065 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -21,20 +21,18 @@ kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
 kvm-$(CONFIG_KVM_XEN)	+= xen.o
 kvm-$(CONFIG_KVM_SMM)	+= smm.o
 
-kvm-intel-y		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
+kvm-$(CONFIG_KVM_INTEL)	+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
 			   vmx/hyperv.o vmx/nested.o vmx/posted_intr.o
-kvm-intel-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
+kvm-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
 
-kvm-amd-y		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
+kvm-$(CONFIG_KVM_AMD)	+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
 			   svm/sev.o svm/hyperv.o
 
 ifdef CONFIG_HYPERV
-kvm-amd-y		+= svm/svm_onhyperv.o
+kvm-$(CONFIG_KVM_AMD)	+= svm/svm_onhyperv.o
 endif
 
 obj-$(CONFIG_KVM)	+= kvm.o
-obj-$(CONFIG_KVM_INTEL)	+= kvm-intel.o
-obj-$(CONFIG_KVM_AMD)	+= kvm-amd.o
 
 AFLAGS_svm/vmenter.o    := -iquote $(obj)
 $(obj)/svm/vmenter.o: $(obj)/kvm-asm-offsets.h
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 3d44e42f4f22..7fe9d11db8a6 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -52,9 +52,6 @@
 #include "kvm_onhyperv.h"
 #include "svm_onhyperv.h"
 
-MODULE_AUTHOR("Qumranet");
-MODULE_LICENSE("GPL");
-
 #ifdef MODULE
 static const struct x86_cpu_id svm_cpu_id[] = {
 	X86_MATCH_FEATURE(X86_FEATURE_SVM, NULL),
@@ -551,7 +548,7 @@ static bool __kvm_is_svm_supported(void)
 	return true;
 }
 
-static bool kvm_is_svm_supported(void)
+bool kvm_is_svm_supported(void)
 {
 	bool supported;
 
@@ -4873,9 +4870,21 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
+static void __svm_exit(void)
+{
+	cpu_emergency_unregister_virt_callback(svm_emergency_disable);
+}
+
+void svm_module_exit(void)
+{
+	__svm_exit();
+}
+
 static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
+	.vendor_exit = svm_module_exit,
+
 	.check_processor_compatibility = svm_check_processor_compat,
 
 	.hardware_unsetup = svm_hardware_unsetup,
@@ -5298,22 +5307,12 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
 	.pmu_ops = &amd_pmu_ops,
 };
 
-static void __svm_exit(void)
-{
-	kvm_x86_vendor_exit();
-
-	cpu_emergency_unregister_virt_callback(svm_emergency_disable);
-}
-
-static int __init svm_init(void)
+int __init svm_init(void)
 {
 	int r;
 
 	__unused_size_checks();
 
-	if (!kvm_is_svm_supported())
-		return -EOPNOTSUPP;
-
 	r = kvm_x86_vendor_init(&svm_init_ops);
 	if (r)
 		return r;
@@ -5335,12 +5334,3 @@ static int __init svm_init(void)
 	__svm_exit();
 	return r;
 }
-
-static void __exit svm_exit(void)
-{
-	kvm_exit();
-	__svm_exit();
-}
-
-module_init(svm_init)
-module_exit(svm_exit)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 65d59de3cc63..629e662b131e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -66,9 +66,6 @@
 #include "x86.h"
 #include "smm.h"
 
-MODULE_AUTHOR("Qumranet");
-MODULE_LICENSE("GPL");
-
 #ifdef MODULE
 static const struct x86_cpu_id vmx_cpu_id[] = {
 	X86_MATCH_FEATURE(X86_FEATURE_VMX, NULL),
@@ -2738,7 +2735,7 @@ static bool __kvm_is_vmx_supported(void)
 	return true;
 }
 
-static bool kvm_is_vmx_supported(void)
+bool kvm_is_vmx_supported(void)
 {
 	bool supported;
 
@@ -8199,9 +8196,35 @@ static void vmx_vm_destroy(struct kvm *kvm)
 	free_pages((unsigned long)kvm_vmx->pid_table, vmx_get_pid_table_order(kvm));
 }
 
+static void vmx_cleanup_l1d_flush(void)
+{
+	if (vmx_l1d_flush_pages) {
+		free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
+		vmx_l1d_flush_pages = NULL;
+	}
+	/* Restore state so sysfs ignores VMX */
+	l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
+}
+
+static void __vmx_exit(void)
+{
+	allow_smaller_maxphyaddr = false;
+
+	cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
+
+	vmx_cleanup_l1d_flush();
+}
+
+void vmx_module_exit(void)
+{
+	__vmx_exit();
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,
 
+	.vendor_exit = vmx_module_exit,
+
 	.check_processor_compatibility = vmx_check_processor_compat,
 
 	.hardware_unsetup = vmx_hardware_unsetup,
@@ -8608,41 +8631,10 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {
 	.pmu_ops = &intel_pmu_ops,
 };
 
-static void vmx_cleanup_l1d_flush(void)
-{
-	if (vmx_l1d_flush_pages) {
-		free_pages((unsigned long)vmx_l1d_flush_pages, L1D_CACHE_ORDER);
-		vmx_l1d_flush_pages = NULL;
-	}
-	/* Restore state so sysfs ignores VMX */
-	l1tf_vmx_mitigation = VMENTER_L1D_FLUSH_AUTO;
-}
-
-static void __vmx_exit(void)
-{
-	allow_smaller_maxphyaddr = false;
-
-	cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
-
-	vmx_cleanup_l1d_flush();
-}
-
-static void vmx_exit(void)
-{
-	kvm_exit();
-	kvm_x86_vendor_exit();
-
-	__vmx_exit();
-}
-module_exit(vmx_exit);
-
-static int __init vmx_init(void)
+int __init vmx_init(void)
 {
 	int r, cpu;
 
-	if (!kvm_is_vmx_supported())
-		return -EOPNOTSUPP;
-
 	/*
 	 * Note, hv_init_evmcs() touches only VMX knobs, i.e. there's nothing
 	 * to unwind if a later step fails.
@@ -8691,6 +8683,7 @@ static int __init vmx_init(void)
 	if (r)
 		goto err_kvm_init;
 
+
 	return 0;
 
 err_kvm_init:
@@ -8699,4 +8692,3 @@ static int __init vmx_init(void)
 	kvm_x86_vendor_exit();
 	return r;
 }
-module_init(vmx_init);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6b7f89fd2d47..e62daa2c3017 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -15,6 +15,7 @@
  *   Amit Shah    <amit.shah@qumranet.com>
  *   Ben-Ami Yassour <benami@il.ibm.com>
  */
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/errno.h>
 
 #include <linux/kvm_host.h>
@@ -13678,15 +13679,22 @@ static int __init kvm_x86_init(void)
 {
 	kvm_mmu_x86_module_init();
 	mitigate_smt_rsb &= boot_cpu_has_bug(X86_BUG_SMT_RSB) && cpu_smt_possible();
-	return 0;
+
+	if (kvm_is_svm_supported())
+		return svm_init();
+	else if (kvm_is_vmx_supported())
+		return vmx_init();
+
+	pr_err_ratelimited("kvm: no hardware support for SVM or VMX\n");
+	return -EOPNOTSUPP;
 }
 module_init(kvm_x86_init);
 
 static void __exit kvm_x86_exit(void)
 {
-	/*
-	 * If module_init() is implemented, module_exit() must also be
-	 * implemented to allow module unload.
-	 */
+	kvm_exit();
+	kvm_x86_vendor_exit();
+	kvm_x86_ops.vendor_exit();
+
 }
 module_exit(kvm_x86_exit);
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 6b5490319d1b..322be05e6c5b 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -9,6 +9,21 @@
 #include "kvm_cache_regs.h"
 #include "kvm_emulate.h"
 
+#ifdef CONFIG_KVM_AMD
+bool kvm_is_svm_supported(void);
+int __init svm_init(void);
+void svm_module_exit(void);
+#else
+static inline bool kvm_is_svm_supported(void) { return false; }
+#endif
+#ifdef CONFIG_KVM_INTEL
+bool kvm_is_vmx_supported(void);
+int __init vmx_init(void);
+void vmx_module_exit(void);
+#else
+static inline bool kvm_is_vmx_supported(void) { return false; }
+#endif
+
 struct kvm_caps {
 	/* control of guest tsc rate supported? */
 	bool has_tsc_control;
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 03/14] KVM: x86: Remove unused exports
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 01/14] KVM: x86: Move common module params from SVM/VMX to x86 Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 02/14] KVM: x86: Fold x86 vendor modules into the main KVM modules Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 04/14] KVM: x86: Create stubs for a new VAC module Anish Ghulati
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Remove all the unused exports from KVM now that the vendor modules no
longer exist.

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/cpuid.c        |   7 --
 arch/x86/kvm/hyperv.c       |   2 -
 arch/x86/kvm/irq.c          |   3 -
 arch/x86/kvm/irq_comm.c     |   2 -
 arch/x86/kvm/kvm_onhyperv.c |   3 -
 arch/x86/kvm/lapic.c        |  15 ----
 arch/x86/kvm/mmu/mmu.c      |  12 ----
 arch/x86/kvm/mmu/spte.c     |   4 --
 arch/x86/kvm/mtrr.c         |   1 -
 arch/x86/kvm/pmu.c          |   2 -
 arch/x86/kvm/x86.c          | 140 ------------------------------------
 11 files changed, 191 deletions(-)

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 0544e30b4946..01de1f659beb 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -34,7 +34,6 @@
  * aligned to sizeof(unsigned long) because it's not accessed via bitops.
  */
 u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
-EXPORT_SYMBOL_GPL(kvm_cpu_caps);
 
 u32 xstate_required_size(u64 xstate_bv, bool compacted)
 {
@@ -310,7 +309,6 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu)
 {
 	__kvm_update_cpuid_runtime(vcpu, vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent);
 }
-EXPORT_SYMBOL_GPL(kvm_update_cpuid_runtime);
 
 static bool kvm_cpuid_has_hyperv(struct kvm_cpuid_entry2 *entries, int nent)
 {
@@ -808,7 +806,6 @@ void kvm_set_cpu_caps(void)
 		kvm_cpu_cap_clear(X86_FEATURE_RDPID);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);
 
 struct kvm_cpuid_array {
 	struct kvm_cpuid_entry2 *entries;
@@ -1432,7 +1429,6 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry_index(struct kvm_vcpu *vcpu,
 	return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
 				 function, index);
 }
-EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry_index);
 
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
 					      u32 function)
@@ -1440,7 +1436,6 @@ struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
 	return cpuid_entry2_find(vcpu->arch.cpuid_entries, vcpu->arch.cpuid_nent,
 				 function, KVM_CPUID_INDEX_NOT_SIGNIFICANT);
 }
-EXPORT_SYMBOL_GPL(kvm_find_cpuid_entry);
 
 /*
  * Intel CPUID semantics treats any query for an out-of-range leaf as if the
@@ -1560,7 +1555,6 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 			used_max_basic);
 	return exact;
 }
-EXPORT_SYMBOL_GPL(kvm_cpuid);
 
 int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 {
@@ -1578,4 +1572,3 @@ int kvm_emulate_cpuid(struct kvm_vcpu *vcpu)
 	kvm_rdx_write(vcpu, edx);
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_cpuid);
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 7c2dac6824e2..c093307dbfcb 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -917,7 +917,6 @@ bool kvm_hv_assist_page_enabled(struct kvm_vcpu *vcpu)
 		return false;
 	return vcpu->arch.pv_eoi.msr_val & KVM_MSR_ENABLED;
 }
-EXPORT_SYMBOL_GPL(kvm_hv_assist_page_enabled);
 
 int kvm_hv_get_assist_page(struct kvm_vcpu *vcpu)
 {
@@ -929,7 +928,6 @@ int kvm_hv_get_assist_page(struct kvm_vcpu *vcpu)
 	return kvm_read_guest_cached(vcpu->kvm, &vcpu->arch.pv_eoi.data,
 				     &hv_vcpu->vp_assist_page, sizeof(struct hv_vp_assist_page));
 }
-EXPORT_SYMBOL_GPL(kvm_hv_get_assist_page);
 
 static void stimer_prepare_msg(struct kvm_vcpu_hv_stimer *stimer)
 {
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index b2c397dd2bc6..88de44c8087e 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -89,7 +89,6 @@ int kvm_cpu_has_injectable_intr(struct kvm_vcpu *v)
 
 	return kvm_apic_has_interrupt(v) != -1; /* LAPIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_has_injectable_intr);
 
 /*
  * check if there is pending interrupt without
@@ -102,7 +101,6 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
 
 	return kvm_apic_has_interrupt(v) != -1;	/* LAPIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_has_interrupt);
 
 /*
  * Read pending interrupt(from non-APIC source)
@@ -141,7 +139,6 @@ int kvm_cpu_get_interrupt(struct kvm_vcpu *v)
 
 	return kvm_get_apic_interrupt(v);	/* APIC */
 }
-EXPORT_SYMBOL_GPL(kvm_cpu_get_interrupt);
 
 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 16d076a1b91a..fa12d7340844 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -120,7 +120,6 @@ void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
 	irq->level = 1;
 	irq->shorthand = APIC_DEST_NOSHORT;
 }
-EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
 
 static inline bool kvm_msi_route_invalid(struct kvm *kvm,
 		struct kvm_kernel_irq_routing_entry *e)
@@ -356,7 +355,6 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
 
 	return r == 1;
 }
-EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
 
 #define IOAPIC_ROUTING_ENTRY(irq) \
 	{ .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP,	\
diff --git a/arch/x86/kvm/kvm_onhyperv.c b/arch/x86/kvm/kvm_onhyperv.c
index ded0bd688c65..aed3fdbd4b92 100644
--- a/arch/x86/kvm/kvm_onhyperv.c
+++ b/arch/x86/kvm/kvm_onhyperv.c
@@ -101,13 +101,11 @@ int hv_flush_remote_tlbs_range(struct kvm *kvm, gfn_t start_gfn, gfn_t nr_pages)
 
 	return __hv_flush_remote_tlbs_range(kvm, &range);
 }
-EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs_range);
 
 int hv_flush_remote_tlbs(struct kvm *kvm)
 {
 	return __hv_flush_remote_tlbs_range(kvm, NULL);
 }
-EXPORT_SYMBOL_GPL(hv_flush_remote_tlbs);
 
 void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 {
@@ -121,4 +119,3 @@ void hv_track_root_tdp(struct kvm_vcpu *vcpu, hpa_t root_tdp)
 		spin_unlock(&kvm_arch->hv_root_tdp_lock);
 	}
 }
-EXPORT_SYMBOL_GPL(hv_track_root_tdp);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index dcd60b39e794..1009ef21248d 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -682,7 +682,6 @@ bool __kvm_apic_update_irr(u32 *pir, void *regs, int *max_irr)
 	return ((max_updated_irr != -1) &&
 		(max_updated_irr == *max_irr));
 }
-EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
 
 bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir, int *max_irr)
 {
@@ -693,7 +692,6 @@ bool kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir, int *max_irr)
 		apic->irr_pending = true;
 	return irr_updated;
 }
-EXPORT_SYMBOL_GPL(kvm_apic_update_irr);
 
 static inline int apic_search_irr(struct kvm_lapic *apic)
 {
@@ -736,7 +734,6 @@ void kvm_apic_clear_irr(struct kvm_vcpu *vcpu, int vec)
 {
 	apic_clear_irr(vec, vcpu->arch.apic);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_clear_irr);
 
 static inline void apic_set_isr(int vec, struct kvm_lapic *apic)
 {
@@ -811,7 +808,6 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 	 */
 	return apic_find_highest_irr(vcpu->arch.apic);
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_find_highest_irr);
 
 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 			     int vector, int level, int trig_mode,
@@ -973,7 +969,6 @@ void kvm_apic_update_ppr(struct kvm_vcpu *vcpu)
 {
 	apic_update_ppr(vcpu->arch.apic);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_update_ppr);
 
 static void apic_set_tpr(struct kvm_lapic *apic, u32 tpr)
 {
@@ -1084,7 +1079,6 @@ bool kvm_apic_match_dest(struct kvm_vcpu *vcpu, struct kvm_lapic *source,
 		return false;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_apic_match_dest);
 
 int kvm_vector_to_index(u32 vector, u32 dest_vcpus,
 		       const unsigned long *bitmap, u32 bitmap_size)
@@ -1497,7 +1491,6 @@ void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
 	kvm_ioapic_send_eoi(apic, vector);
 	kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
 
 void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 {
@@ -1522,7 +1515,6 @@ void kvm_apic_send_ipi(struct kvm_lapic *apic, u32 icr_low, u32 icr_high)
 
 	kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq, NULL);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_send_ipi);
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
 {
@@ -1638,7 +1630,6 @@ u64 kvm_lapic_readable_reg_mask(struct kvm_lapic *apic)
 
 	return valid_reg_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_readable_reg_mask);
 
 static int kvm_lapic_reg_read(struct kvm_lapic *apic, u32 offset, int len,
 			      void *data)
@@ -1872,7 +1863,6 @@ void kvm_wait_lapic_expire(struct kvm_vcpu *vcpu)
 	    lapic_timer_int_injected(vcpu))
 		__kvm_wait_lapic_expire(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_wait_lapic_expire);
 
 static void kvm_apic_inject_pending_timer_irqs(struct kvm_lapic *apic)
 {
@@ -2185,7 +2175,6 @@ void kvm_lapic_expired_hv_timer(struct kvm_vcpu *vcpu)
 out:
 	preempt_enable();
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_expired_hv_timer);
 
 void kvm_lapic_switch_to_hv_timer(struct kvm_vcpu *vcpu)
 {
@@ -2438,7 +2427,6 @@ void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 {
 	kvm_lapic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
 /* emulate APIC access in a trap manner */
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
@@ -2461,7 +2449,6 @@ void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 		kvm_lapic_reg_write(apic, offset, (u32)val);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
 
 void kvm_free_lapic(struct kvm_vcpu *vcpu)
 {
@@ -2627,7 +2614,6 @@ int kvm_alloc_apic_access_page(struct kvm *kvm)
 	mutex_unlock(&kvm->slots_lock);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvm_alloc_apic_access_page);
 
 void kvm_inhibit_apic_access_page(struct kvm_vcpu *vcpu)
 {
@@ -2858,7 +2844,6 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
 	__apic_update_ppr(apic, &ppr);
 	return apic_has_interrupt_for_ppr(apic, ppr);
 }
-EXPORT_SYMBOL_GPL(kvm_apic_has_interrupt);
 
 int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
 {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e1d011c67cc6..e9e66d635688 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3610,7 +3610,6 @@ void kvm_mmu_free_roots(struct kvm *kvm, struct kvm_mmu *mmu,
 	kvm_mmu_commit_zap_page(kvm, &invalid_list);
 	write_unlock(&kvm->mmu_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_free_roots);
 
 void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 {
@@ -3637,7 +3636,6 @@ void kvm_mmu_free_guest_mode_roots(struct kvm *kvm, struct kvm_mmu *mmu)
 
 	kvm_mmu_free_roots(kvm, mmu, roots_to_free);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_free_guest_mode_roots);
 
 static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, int quadrant,
 			    u8 level)
@@ -4441,7 +4439,6 @@ int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
 
 #ifdef CONFIG_X86_64
 static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
@@ -4660,7 +4657,6 @@ void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd)
 			__clear_sp_write_flooding_count(sp);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_new_pgd);
 
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
 			   unsigned int access)
@@ -5294,7 +5290,6 @@ void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0,
 	shadow_mmu_init_context(vcpu, context, cpu_role, root_role);
 	kvm_mmu_new_pgd(vcpu, nested_cr3);
 }
-EXPORT_SYMBOL_GPL(kvm_init_shadow_npt_mmu);
 
 static union kvm_cpu_role
 kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty,
@@ -5348,7 +5343,6 @@ void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
 
 	kvm_mmu_new_pgd(vcpu, new_eptp);
 }
-EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
 
 static void init_kvm_softmmu(struct kvm_vcpu *vcpu,
 			     union kvm_cpu_role cpu_role)
@@ -5413,7 +5407,6 @@ void kvm_init_mmu(struct kvm_vcpu *vcpu)
 	else
 		init_kvm_softmmu(vcpu, cpu_role);
 }
-EXPORT_SYMBOL_GPL(kvm_init_mmu);
 
 void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
 {
@@ -5449,7 +5442,6 @@ void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
 	kvm_mmu_unload(vcpu);
 	kvm_init_mmu(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_reset_context);
 
 int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
@@ -5763,7 +5755,6 @@ int noinline kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 err
 	return x86_emulate_instruction(vcpu, cr2_or_gpa, emulation_type, insn,
 				       insn_len);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
 
 static void __kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 				      u64 addr, hpa_t root_hpa)
@@ -5829,7 +5820,6 @@ void kvm_mmu_invalidate_addr(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
 			__kvm_mmu_invalidate_addr(vcpu, mmu, addr, mmu->prev_roots[i].hpa);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_addr);
 
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 {
@@ -5846,7 +5836,6 @@ void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 	kvm_mmu_invalidate_addr(vcpu, vcpu->arch.walk_mmu, gva, KVM_MMU_ROOTS_ALL);
 	++vcpu->stat.invlpg;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invlpg);
 
 
 void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
@@ -5899,7 +5888,6 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_forced_root_level,
 	else
 		max_huge_page_level = PG_LEVEL_2M;
 }
-EXPORT_SYMBOL_GPL(kvm_configure_mmu);
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
 typedef bool (*slot_rmaps_handler) (struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c
index 4a599130e9c9..feb3bbb16d70 100644
--- a/arch/x86/kvm/mmu/spte.c
+++ b/arch/x86/kvm/mmu/spte.c
@@ -22,7 +22,6 @@
 bool __read_mostly enable_mmio_caching = true;
 static bool __ro_after_init allow_mmio_caching;
 module_param_named(mmio_caching, enable_mmio_caching, bool, 0444);
-EXPORT_SYMBOL_GPL(enable_mmio_caching);
 
 u64 __read_mostly shadow_host_writable_mask;
 u64 __read_mostly shadow_mmu_writable_mask;
@@ -409,7 +408,6 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_value, u64 mmio_mask, u64 access_mask)
 	shadow_mmio_mask  = mmio_mask;
 	shadow_mmio_access_mask = access_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
 void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
 {
@@ -420,7 +418,6 @@ void kvm_mmu_set_me_spte_mask(u64 me_value, u64 me_mask)
 	shadow_me_value = me_value;
 	shadow_me_mask = me_mask;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_me_spte_mask);
 
 void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
 {
@@ -448,7 +445,6 @@ void kvm_mmu_set_ept_masks(bool has_ad_bits, bool has_exec_only)
 	kvm_mmu_set_mmio_spte_mask(VMX_EPT_MISCONFIG_WX_VALUE,
 				   VMX_EPT_RWX_MASK, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_set_ept_masks);
 
 void kvm_mmu_reset_all_pte_masks(void)
 {
diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c
index 3eb6e7f47e96..409225d19ac5 100644
--- a/arch/x86/kvm/mtrr.c
+++ b/arch/x86/kvm/mtrr.c
@@ -685,7 +685,6 @@ u8 kvm_mtrr_get_guest_memory_type(struct kvm_vcpu *vcpu, gfn_t gfn)
 
 	return type;
 }
-EXPORT_SYMBOL_GPL(kvm_mtrr_get_guest_memory_type);
 
 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn,
 					  int page_num)
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index edb89b51b383..f35511086046 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -27,7 +27,6 @@
 #define KVM_PMU_EVENT_FILTER_MAX_EVENTS 300
 
 struct x86_pmu_capability __read_mostly kvm_pmu_cap;
-EXPORT_SYMBOL_GPL(kvm_pmu_cap);
 
 /* Precise Distribution of Instructions Retired (PDIR) */
 static const struct x86_cpu_id vmx_pebs_pdir_cpu[] = {
@@ -773,7 +772,6 @@ void kvm_pmu_trigger_event(struct kvm_vcpu *vcpu, u64 perf_hw_id)
 			kvm_pmu_incr_counter(pmc);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_pmu_trigger_event);
 
 static bool is_masked_filter_valid(const struct kvm_x86_pmu_event_filter *filter)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e62daa2c3017..0a8b94678928 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -95,7 +95,6 @@
 struct kvm_caps kvm_caps __read_mostly = {
 	.supported_mce_cap = MCG_CTL_P | MCG_SER_P,
 };
-EXPORT_SYMBOL_GPL(kvm_caps);
 
 #define  ERR_PTR_USR(e)  ((void __user *)ERR_PTR(e))
 
@@ -149,7 +148,6 @@ module_param(ignore_msrs, bool, S_IRUGO | S_IWUSR);
 
 bool __read_mostly report_ignored_msrs = true;
 module_param(report_ignored_msrs, bool, S_IRUGO | S_IWUSR);
-EXPORT_SYMBOL_GPL(report_ignored_msrs);
 
 unsigned int min_timer_period_us = 200;
 module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR);
@@ -175,18 +173,15 @@ module_param(vector_hashing, bool, S_IRUGO);
 
 bool __read_mostly enable_vmware_backdoor = false;
 module_param(enable_vmware_backdoor, bool, S_IRUGO);
-EXPORT_SYMBOL_GPL(enable_vmware_backdoor);
 
 /*
  * If nested=1, nested virtualization is supported
  */
 bool __read_mostly nested = 1;
 module_param(nested, bool, S_IRUGO);
-EXPORT_SYMBOL_GPL(nested);
 
 bool __read_mostly enable_vnmi = 1;
 module_param(enable_vnmi, bool, S_IRUGO);
-EXPORT_SYMBOL_GPL(enable_vnmi);
 
 /*
  * Flags to manipulate forced emulation behavior (any non-zero value will
@@ -201,7 +196,6 @@ module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR);
 
 /* Enable/disable PMU virtualization */
 bool __read_mostly enable_pmu = true;
-EXPORT_SYMBOL_GPL(enable_pmu);
 module_param(enable_pmu, bool, 0444);
 
 bool __read_mostly eager_page_split = true;
@@ -228,7 +222,6 @@ struct kvm_user_return_msrs {
 };
 
 u32 __read_mostly kvm_nr_uret_msrs;
-EXPORT_SYMBOL_GPL(kvm_nr_uret_msrs);
 static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
 static struct kvm_user_return_msrs __percpu *user_return_msrs;
 
@@ -238,19 +231,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs;
 				| XFEATURE_MASK_PKRU | XFEATURE_MASK_XTILE)
 
 u64 __read_mostly host_efer;
-EXPORT_SYMBOL_GPL(host_efer);
 
 bool __read_mostly allow_smaller_maxphyaddr = 0;
-EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr);
 
 bool __read_mostly enable_apicv = true;
-EXPORT_SYMBOL_GPL(enable_apicv);
 
 u64 __read_mostly host_xss;
-EXPORT_SYMBOL_GPL(host_xss);
 
 u64 __read_mostly host_arch_capabilities;
-EXPORT_SYMBOL_GPL(host_arch_capabilities);
 
 const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
 	KVM_GENERIC_VM_STATS(),
@@ -422,7 +410,6 @@ int kvm_add_user_return_msr(u32 msr)
 	kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
 	return kvm_nr_uret_msrs++;
 }
-EXPORT_SYMBOL_GPL(kvm_add_user_return_msr);
 
 int kvm_find_user_return_msr(u32 msr)
 {
@@ -434,7 +421,6 @@ int kvm_find_user_return_msr(u32 msr)
 	}
 	return -1;
 }
-EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
 
 static void kvm_user_return_msr_cpu_online(void)
 {
@@ -471,7 +457,6 @@ int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
 
 static void drop_user_return_notifiers(void)
 {
@@ -491,7 +476,6 @@ enum lapic_mode kvm_get_apic_mode(struct kvm_vcpu *vcpu)
 {
 	return kvm_apic_mode(kvm_get_apic_base(vcpu));
 }
-EXPORT_SYMBOL_GPL(kvm_get_apic_mode);
 
 int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
@@ -526,7 +510,6 @@ noinstr void kvm_spurious_fault(void)
 	/* Fault while not rebooting.  We want the trace. */
 	BUG_ON(!kvm_rebooting);
 }
-EXPORT_SYMBOL_GPL(kvm_spurious_fault);
 
 #define EXCPT_BENIGN		0
 #define EXCPT_CONTRIBUTORY	1
@@ -631,7 +614,6 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu,
 	ex->has_payload = false;
 	ex->payload = 0;
 }
-EXPORT_SYMBOL_GPL(kvm_deliver_exception_payload);
 
 static void kvm_queue_exception_vmexit(struct kvm_vcpu *vcpu, unsigned int vector,
 				       bool has_error_code, u32 error_code,
@@ -743,20 +725,17 @@ void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, false);
 }
-EXPORT_SYMBOL_GPL(kvm_queue_exception);
 
 void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr)
 {
 	kvm_multiple_exception(vcpu, nr, false, 0, false, 0, true);
 }
-EXPORT_SYMBOL_GPL(kvm_requeue_exception);
 
 void kvm_queue_exception_p(struct kvm_vcpu *vcpu, unsigned nr,
 			   unsigned long payload)
 {
 	kvm_multiple_exception(vcpu, nr, false, 0, true, payload, false);
 }
-EXPORT_SYMBOL_GPL(kvm_queue_exception_p);
 
 static void kvm_queue_exception_e_p(struct kvm_vcpu *vcpu, unsigned nr,
 				    u32 error_code, unsigned long payload)
@@ -774,7 +753,6 @@ int kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err)
 
 	return 1;
 }
-EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
 
 static int complete_emulated_insn_gp(struct kvm_vcpu *vcpu, int err)
 {
@@ -824,7 +802,6 @@ void kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
 
 	fault_mmu->inject_page_fault(vcpu, fault);
 }
-EXPORT_SYMBOL_GPL(kvm_inject_emulated_page_fault);
 
 void kvm_inject_nmi(struct kvm_vcpu *vcpu)
 {
@@ -836,13 +813,11 @@ void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, false);
 }
-EXPORT_SYMBOL_GPL(kvm_queue_exception_e);
 
 void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code)
 {
 	kvm_multiple_exception(vcpu, nr, true, error_code, false, 0, true);
 }
-EXPORT_SYMBOL_GPL(kvm_requeue_exception_e);
 
 /*
  * Checks if cpl <= required_cpl; if true, return true.  Otherwise queue
@@ -864,7 +839,6 @@ bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr)
 	kvm_queue_exception(vcpu, UD_VECTOR);
 	return false;
 }
-EXPORT_SYMBOL_GPL(kvm_require_dr);
 
 static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu)
 {
@@ -919,7 +893,6 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	return 1;
 }
-EXPORT_SYMBOL_GPL(load_pdptrs);
 
 static bool kvm_is_valid_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
@@ -977,7 +950,6 @@ void kvm_post_set_cr0(struct kvm_vcpu *vcpu, unsigned long old_cr0, unsigned lon
 	    !kvm_check_has_quirk(vcpu->kvm, KVM_X86_QUIRK_CD_NW_CLEARED))
 		kvm_zap_gfn_range(vcpu->kvm, 0, ~0ULL);
 }
-EXPORT_SYMBOL_GPL(kvm_post_set_cr0);
 
 int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 {
@@ -1018,13 +990,11 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr0);
 
 void kvm_lmsw(struct kvm_vcpu *vcpu, unsigned long msw)
 {
 	(void)kvm_set_cr0(vcpu, kvm_read_cr0_bits(vcpu, ~0x0eul) | (msw & 0x0f));
 }
-EXPORT_SYMBOL_GPL(kvm_lmsw);
 
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 {
@@ -1047,7 +1017,6 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 	     kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE)))
 		write_pkru(vcpu->arch.pkru);
 }
-EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
 
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 {
@@ -1073,7 +1042,6 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 	}
 
 }
-EXPORT_SYMBOL_GPL(kvm_load_host_xsave_state);
 
 #ifdef CONFIG_X86_64
 static inline u64 kvm_guest_supported_xfd(struct kvm_vcpu *vcpu)
@@ -1138,7 +1106,6 @@ int kvm_emulate_xsetbv(struct kvm_vcpu *vcpu)
 
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_xsetbv);
 
 bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
@@ -1150,7 +1117,6 @@ bool __kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 
 	return true;
 }
-EXPORT_SYMBOL_GPL(__kvm_is_valid_cr4);
 
 static bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
@@ -1198,7 +1164,6 @@ void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned lon
 		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 
 }
-EXPORT_SYMBOL_GPL(kvm_post_set_cr4);
 
 int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 {
@@ -1229,7 +1194,6 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr4);
 
 static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid)
 {
@@ -1321,7 +1285,6 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr3);
 
 int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
 {
@@ -1333,7 +1296,6 @@ int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8)
 		vcpu->arch.cr8 = cr8;
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_cr8);
 
 unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 {
@@ -1342,7 +1304,6 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu)
 	else
 		return vcpu->arch.cr8;
 }
-EXPORT_SYMBOL_GPL(kvm_get_cr8);
 
 static void kvm_update_dr0123(struct kvm_vcpu *vcpu)
 {
@@ -1367,7 +1328,6 @@ void kvm_update_dr7(struct kvm_vcpu *vcpu)
 	if (dr7 & DR7_BP_EN_MASK)
 		vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED;
 }
-EXPORT_SYMBOL_GPL(kvm_update_dr7);
 
 static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu)
 {
@@ -1408,7 +1368,6 @@ int kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val)
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_dr);
 
 void kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
 {
@@ -1428,7 +1387,6 @@ void kvm_get_dr(struct kvm_vcpu *vcpu, int dr, unsigned long *val)
 		break;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_get_dr);
 
 int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
 {
@@ -1444,7 +1402,6 @@ int kvm_emulate_rdpmc(struct kvm_vcpu *vcpu)
 	kvm_rdx_write(vcpu, data >> 32);
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_rdpmc);
 
 /*
  * The three MSR lists(msrs_to_save, emulated_msrs, msr_based_features) track
@@ -1758,7 +1715,6 @@ bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
 
 	return __kvm_valid_efer(vcpu, efer);
 }
-EXPORT_SYMBOL_GPL(kvm_valid_efer);
 
 static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
@@ -1797,7 +1753,6 @@ void kvm_enable_efer_bits(u64 mask)
 {
        efer_reserved_bits &= ~mask;
 }
-EXPORT_SYMBOL_GPL(kvm_enable_efer_bits);
 
 bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
 {
@@ -1840,7 +1795,6 @@ bool kvm_msr_allowed(struct kvm_vcpu *vcpu, u32 index, u32 type)
 
 	return allowed;
 }
-EXPORT_SYMBOL_GPL(kvm_msr_allowed);
 
 /*
  * Write @data into the MSR specified by @index.  Select MSR specific fault
@@ -1988,13 +1942,11 @@ int kvm_get_msr(struct kvm_vcpu *vcpu, u32 index, u64 *data)
 {
 	return kvm_get_msr_ignored_check(vcpu, index, data, false);
 }
-EXPORT_SYMBOL_GPL(kvm_get_msr);
 
 int kvm_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 {
 	return kvm_set_msr_ignored_check(vcpu, index, data, false);
 }
-EXPORT_SYMBOL_GPL(kvm_set_msr);
 
 static void complete_userspace_rdmsr(struct kvm_vcpu *vcpu)
 {
@@ -2083,7 +2035,6 @@ int kvm_emulate_rdmsr(struct kvm_vcpu *vcpu)
 
 	return static_call(kvm_x86_complete_emulated_msr)(vcpu, r);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_rdmsr);
 
 int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 {
@@ -2108,7 +2059,6 @@ int kvm_emulate_wrmsr(struct kvm_vcpu *vcpu)
 
 	return static_call(kvm_x86_complete_emulated_msr)(vcpu, r);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_wrmsr);
 
 int kvm_emulate_as_nop(struct kvm_vcpu *vcpu)
 {
@@ -2120,14 +2070,12 @@ int kvm_emulate_invd(struct kvm_vcpu *vcpu)
 	/* Treat an INVD instruction as a NOP and just skip it. */
 	return kvm_emulate_as_nop(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_invd);
 
 int kvm_handle_invalid_op(struct kvm_vcpu *vcpu)
 {
 	kvm_queue_exception(vcpu, UD_VECTOR);
 	return 1;
 }
-EXPORT_SYMBOL_GPL(kvm_handle_invalid_op);
 
 
 static int kvm_emulate_monitor_mwait(struct kvm_vcpu *vcpu, const char *insn)
@@ -2143,13 +2091,11 @@ int kvm_emulate_mwait(struct kvm_vcpu *vcpu)
 {
 	return kvm_emulate_monitor_mwait(vcpu, "MWAIT");
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_mwait);
 
 int kvm_emulate_monitor(struct kvm_vcpu *vcpu)
 {
 	return kvm_emulate_monitor_mwait(vcpu, "MONITOR");
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_monitor);
 
 static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
 {
@@ -2222,7 +2168,6 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(handle_fastpath_set_msr_irqoff);
 
 /*
  * Adapt set_msr() to msr_io()'s calling convention
@@ -2593,7 +2538,6 @@ u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc)
 	return vcpu->arch.l1_tsc_offset +
 		kvm_scale_tsc(host_tsc, vcpu->arch.l1_tsc_scaling_ratio);
 }
-EXPORT_SYMBOL_GPL(kvm_read_l1_tsc);
 
 u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier)
 {
@@ -2608,7 +2552,6 @@ u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier)
 	nested_offset += l2_offset;
 	return nested_offset;
 }
-EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_offset);
 
 u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
 {
@@ -2618,7 +2561,6 @@ u64 kvm_calc_nested_tsc_multiplier(u64 l1_multiplier, u64 l2_multiplier)
 
 	return l1_multiplier;
 }
-EXPORT_SYMBOL_GPL(kvm_calc_nested_tsc_multiplier);
 
 static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 l1_offset)
 {
@@ -3525,7 +3467,6 @@ void kvm_service_local_tlb_flush_requests(struct kvm_vcpu *vcpu)
 	if (kvm_check_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu))
 		kvm_vcpu_flush_tlb_guest(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_service_local_tlb_flush_requests);
 
 static void record_steal_time(struct kvm_vcpu *vcpu)
 {
@@ -4005,7 +3946,6 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_set_msr_common);
 
 static int get_msr_mce(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata, bool host)
 {
@@ -4363,7 +4303,6 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	}
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_get_msr_common);
 
 /*
  * Read or write a bunch of msrs. All parameters are kernel addresses.
@@ -7314,7 +7253,6 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva,
 	u64 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0;
 	return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read);
 
 gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
 			       struct x86_exception *exception)
@@ -7325,7 +7263,6 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva,
 	access |= PFERR_WRITE_MASK;
 	return mmu->gva_to_gpa(vcpu, mmu, gva, access, exception);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write);
 
 /* uses this to access any guest's mapped memory without checking CPL */
 gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
@@ -7411,7 +7348,6 @@ int kvm_read_guest_virt(struct kvm_vcpu *vcpu,
 	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
 					  exception);
 }
-EXPORT_SYMBOL_GPL(kvm_read_guest_virt);
 
 static int emulator_read_std(struct x86_emulate_ctxt *ctxt,
 			     gva_t addr, void *val, unsigned int bytes,
@@ -7483,7 +7419,6 @@ int kvm_write_guest_virt_system(struct kvm_vcpu *vcpu, gva_t addr, void *val,
 	return kvm_write_guest_virt_helper(addr, val, bytes, vcpu,
 					   PFERR_WRITE_MASK, exception);
 }
-EXPORT_SYMBOL_GPL(kvm_write_guest_virt_system);
 
 static int kvm_can_emulate_insn(struct kvm_vcpu *vcpu, int emul_type,
 				void *insn, int insn_len)
@@ -7515,7 +7450,6 @@ int handle_ud(struct kvm_vcpu *vcpu)
 
 	return kvm_emulate_instruction(vcpu, emul_type);
 }
-EXPORT_SYMBOL_GPL(handle_ud);
 
 static int vcpu_is_mmio_gpa(struct kvm_vcpu *vcpu, unsigned long gva,
 			    gpa_t gpa, bool write)
@@ -7985,7 +7919,6 @@ int kvm_emulate_wbinvd(struct kvm_vcpu *vcpu)
 	kvm_emulate_wbinvd_noskip(vcpu);
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_wbinvd);
 
 
 
@@ -8460,7 +8393,6 @@ void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip)
 		kvm_set_rflags(vcpu, ctxt->eflags);
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt);
 
 static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
 					   u8 ndata, u8 *insn_bytes, u8 insn_size)
@@ -8526,13 +8458,11 @@ void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data,
 {
 	prepare_emulation_failure_exit(vcpu, data, ndata, NULL, 0);
 }
-EXPORT_SYMBOL_GPL(__kvm_prepare_emulation_failure_exit);
 
 void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu)
 {
 	__kvm_prepare_emulation_failure_exit(vcpu, NULL, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit);
 
 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type)
 {
@@ -8740,7 +8670,6 @@ int kvm_skip_emulated_instruction(struct kvm_vcpu *vcpu)
 		r = kvm_vcpu_do_singlestep(vcpu);
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_skip_emulated_instruction);
 
 static bool kvm_is_code_breakpoint_inhibited(struct kvm_vcpu *vcpu)
 {
@@ -8873,7 +8802,6 @@ int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type,
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(x86_decode_emulated_instruction);
 
 int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
 			    int emulation_type, void *insn, int insn_len)
@@ -9061,14 +8989,12 @@ int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type)
 {
 	return x86_emulate_instruction(vcpu, 0, emulation_type, NULL, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_instruction);
 
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
 					void *insn, int insn_len)
 {
 	return x86_emulate_instruction(vcpu, 0, 0, insn, insn_len);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_instruction_from_buffer);
 
 static int complete_fast_pio_out_port_0x7e(struct kvm_vcpu *vcpu)
 {
@@ -9163,7 +9089,6 @@ int kvm_fast_pio(struct kvm_vcpu *vcpu, int size, unsigned short port, int in)
 		ret = kvm_fast_pio_out(vcpu, size, port);
 	return ret && kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_fast_pio);
 
 static int kvmclock_cpu_down_prep(unsigned int cpu)
 {
@@ -9591,7 +9516,6 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 
 	return r;
 }
-EXPORT_SYMBOL_GPL(kvm_x86_vendor_init);
 
 void kvm_x86_vendor_exit(void)
 {
@@ -9625,7 +9549,6 @@ void kvm_x86_vendor_exit(void)
 	kvm_x86_ops.hardware_enable = NULL;
 	mutex_unlock(&vendor_module_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_x86_vendor_exit);
 
 static int __kvm_emulate_halt(struct kvm_vcpu *vcpu, int state, int reason)
 {
@@ -9650,7 +9573,6 @@ int kvm_emulate_halt_noskip(struct kvm_vcpu *vcpu)
 {
 	return __kvm_emulate_halt(vcpu, KVM_MP_STATE_HALTED, KVM_EXIT_HLT);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_halt_noskip);
 
 int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 {
@@ -9661,7 +9583,6 @@ int kvm_emulate_halt(struct kvm_vcpu *vcpu)
 	 */
 	return kvm_emulate_halt_noskip(vcpu) && ret;
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_halt);
 
 int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
 {
@@ -9670,7 +9591,6 @@ int kvm_emulate_ap_reset_hold(struct kvm_vcpu *vcpu)
 	return __kvm_emulate_halt(vcpu, KVM_MP_STATE_AP_RESET_HOLD,
 					KVM_EXIT_AP_RESET_HOLD) && ret;
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_ap_reset_hold);
 
 #ifdef CONFIG_X86_64
 static int kvm_pv_clock_pairing(struct kvm_vcpu *vcpu, gpa_t paddr,
@@ -9734,7 +9654,6 @@ bool kvm_apicv_activated(struct kvm *kvm)
 {
 	return (READ_ONCE(kvm->arch.apicv_inhibit_reasons) == 0);
 }
-EXPORT_SYMBOL_GPL(kvm_apicv_activated);
 
 bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
 {
@@ -9743,7 +9662,6 @@ bool kvm_vcpu_apicv_activated(struct kvm_vcpu *vcpu)
 
 	return (vm_reasons | vcpu_reasons) == 0;
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_apicv_activated);
 
 static void set_or_clear_apicv_inhibit(unsigned long *inhibits,
 				       enum kvm_apicv_inhibit reason, bool set)
@@ -9917,7 +9835,6 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 	++vcpu->stat.hypercalls;
 	return kvm_skip_emulated_instruction(vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_emulate_hypercall);
 
 static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt)
 {
@@ -10355,7 +10272,6 @@ void __kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 	preempt_enable();
 	up_read(&vcpu->kvm->arch.apicv_update_lock);
 }
-EXPORT_SYMBOL_GPL(__kvm_vcpu_update_apicv);
 
 static void kvm_vcpu_update_apicv(struct kvm_vcpu *vcpu)
 {
@@ -10431,7 +10347,6 @@ void kvm_set_or_clear_apicv_inhibit(struct kvm *kvm,
 	__kvm_set_or_clear_apicv_inhibit(kvm, reason, set);
 	up_write(&kvm->arch.apicv_update_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_set_or_clear_apicv_inhibit);
 
 static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
 {
@@ -10490,7 +10405,6 @@ void __kvm_request_immediate_exit(struct kvm_vcpu *vcpu)
 {
 	smp_send_reschedule(vcpu->cpu);
 }
-EXPORT_SYMBOL_GPL(__kvm_request_immediate_exit);
 
 /*
  * Called within kvm->srcu read side.
@@ -11467,7 +11381,6 @@ int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index,
 	kvm_set_rflags(vcpu, ctxt->eflags);
 	return 1;
 }
-EXPORT_SYMBOL_GPL(kvm_task_switch);
 
 static bool kvm_is_valid_sregs(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs)
 {
@@ -12159,7 +12072,6 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 	if (init_event)
 		kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_reset);
 
 void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 {
@@ -12171,7 +12083,6 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	kvm_set_segment(vcpu, &cs, VCPU_SREG_CS);
 	kvm_rip_write(vcpu, 0);
 }
-EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
 
 int kvm_arch_hardware_enable(void)
 {
@@ -12286,7 +12197,6 @@ bool kvm_vcpu_is_bsp(struct kvm_vcpu *vcpu)
 }
 
 __read_mostly DEFINE_STATIC_KEY_FALSE(kvm_has_noapic_vcpu);
-EXPORT_SYMBOL_GPL(kvm_has_noapic_vcpu);
 
 void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu)
 {
@@ -12475,7 +12385,6 @@ void __user * __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa,
 
 	return (void __user *)hva;
 }
-EXPORT_SYMBOL_GPL(__x86_set_memory_region);
 
 void kvm_arch_pre_destroy_vm(struct kvm *kvm)
 {
@@ -12939,13 +12848,11 @@ unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu)
 	return (u32)(get_segment_base(vcpu, VCPU_SREG_CS) +
 		     kvm_rip_read(vcpu));
 }
-EXPORT_SYMBOL_GPL(kvm_get_linear_rip);
 
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip)
 {
 	return kvm_get_linear_rip(vcpu) == linear_rip;
 }
-EXPORT_SYMBOL_GPL(kvm_is_linear_rip);
 
 unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 {
@@ -12956,7 +12863,6 @@ unsigned long kvm_get_rflags(struct kvm_vcpu *vcpu)
 		rflags &= ~X86_EFLAGS_TF;
 	return rflags;
 }
-EXPORT_SYMBOL_GPL(kvm_get_rflags);
 
 static void __kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 {
@@ -12971,7 +12877,6 @@ void kvm_set_rflags(struct kvm_vcpu *vcpu, unsigned long rflags)
 	__kvm_set_rflags(vcpu, rflags);
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 }
-EXPORT_SYMBOL_GPL(kvm_set_rflags);
 
 static inline u32 kvm_async_pf_hash_fn(gfn_t gfn)
 {
@@ -13188,37 +13093,31 @@ void kvm_arch_start_assignment(struct kvm *kvm)
 	if (atomic_inc_return(&kvm->arch.assigned_device_count) == 1)
 		static_call_cond(kvm_x86_pi_start_assignment)(kvm);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_start_assignment);
 
 void kvm_arch_end_assignment(struct kvm *kvm)
 {
 	atomic_dec(&kvm->arch.assigned_device_count);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_end_assignment);
 
 bool noinstr kvm_arch_has_assigned_device(struct kvm *kvm)
 {
 	return raw_atomic_read(&kvm->arch.assigned_device_count);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_has_assigned_device);
 
 void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
 {
 	atomic_inc(&kvm->arch.noncoherent_dma_count);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
 
 void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
 {
 	atomic_dec(&kvm->arch.noncoherent_dma_count);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
 
 bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
 {
 	return atomic_read(&kvm->arch.noncoherent_dma_count);
 }
-EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
 
 bool kvm_arch_has_irq_bypass(void)
 {
@@ -13291,8 +13190,6 @@ bool kvm_arch_no_poll(struct kvm_vcpu *vcpu)
 {
 	return (vcpu->arch.msr_kvm_poll_control & 1) == 0;
 }
-EXPORT_SYMBOL_GPL(kvm_arch_no_poll);
-
 
 int kvm_spec_ctrl_test_value(u64 value)
 {
@@ -13318,7 +13215,6 @@ int kvm_spec_ctrl_test_value(u64 value)
 
 	return ret;
 }
-EXPORT_SYMBOL_GPL(kvm_spec_ctrl_test_value);
 
 void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_code)
 {
@@ -13343,7 +13239,6 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c
 	}
 	vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault);
 }
-EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error);
 
 /*
  * Handles kvm_read/write_guest_virt*() result and either injects #PF or returns
@@ -13372,7 +13267,6 @@ int kvm_handle_memory_failure(struct kvm_vcpu *vcpu, int r,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_handle_memory_failure);
 
 int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 {
@@ -13432,7 +13326,6 @@ int kvm_handle_invpcid(struct kvm_vcpu *vcpu, unsigned long type, gva_t gva)
 		return 1;
 	}
 }
-EXPORT_SYMBOL_GPL(kvm_handle_invpcid);
 
 static int complete_sev_es_emulated_mmio(struct kvm_vcpu *vcpu)
 {
@@ -13517,7 +13410,6 @@ int kvm_sev_es_mmio_write(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_write);
 
 int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
 			 void *data)
@@ -13555,7 +13447,6 @@ int kvm_sev_es_mmio_read(struct kvm_vcpu *vcpu, gpa_t gpa, unsigned int bytes,
 
 	return 0;
 }
-EXPORT_SYMBOL_GPL(kvm_sev_es_mmio_read);
 
 static void advance_sev_es_emulated_pio(struct kvm_vcpu *vcpu, unsigned count, int size)
 {
@@ -13643,37 +13534,6 @@ int kvm_sev_es_string_io(struct kvm_vcpu *vcpu, unsigned int size,
 	return in ? kvm_sev_es_ins(vcpu, size, port)
 		  : kvm_sev_es_outs(vcpu, size, port);
 }
-EXPORT_SYMBOL_GPL(kvm_sev_es_string_io);
-
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_entry);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_msr);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_cr);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmenter);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmexit_inject);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intr_vmexit);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_vmenter_failed);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_invlpga);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_skinit);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_nested_intercepts);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_write_tsc_offset);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_ple_window_update);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pml_full);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_pi_irte_update);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_unaccelerated_access);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_incomplete_ipi);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_ga_log);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_kick_vcpu_slowpath);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_avic_doorbell);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_apicv_accept_irq);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_enter);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_exit);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_enter);
-EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_vmgexit_msr_protocol_exit);
 
 static int __init kvm_x86_init(void)
 {
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 04/14] KVM: x86: Create stubs for a new VAC module
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (2 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 03/14] KVM: x86: Remove unused exports Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 05/14] KVM: x86: Refactor hardware enable/disable operations into a new file Anish Ghulati
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Create empty stub files for what will eventually be a new module named
Virtualization Acceleration Code: Unupgradable Units Module (backronym:
VACUUM, or VAC for short). VAC will function as a base module for
multiple KVM modules and will contain the code needed to manage
system-wide virtualization resources, like enabling/disabling
virtualization hardware.
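
The stubs are empty on purpose; as a rough sketch of where this is
going, the system-wide helpers that later patches in this series move
into these files look like:

    /* Illustrative only -- introduced by later patches in this series: */
    int hardware_enable_all(void);          /* refcounted enabling */
    void hardware_disable_all(void);
    int kvm_online_cpu(unsigned int cpu);   /* CPU hotplug hooks   */
    int kvm_offline_cpu(unsigned int cpu);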

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/Makefile  | 2 ++
 arch/x86/kvm/svm/vac.c | 2 ++
 arch/x86/kvm/vac.c     | 3 +++
 arch/x86/kvm/vac.h     | 6 ++++++
 arch/x86/kvm/vmx/vac.c | 2 ++
 virt/kvm/Makefile.kvm  | 1 +
 virt/kvm/vac.c         | 3 +++
 virt/kvm/vac.h         | 6 ++++++
 8 files changed, 25 insertions(+)
 create mode 100644 arch/x86/kvm/svm/vac.c
 create mode 100644 arch/x86/kvm/vac.c
 create mode 100644 arch/x86/kvm/vac.h
 create mode 100644 arch/x86/kvm/vmx/vac.c
 create mode 100644 virt/kvm/vac.c
 create mode 100644 virt/kvm/vac.h

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 3e965c90e065..b3de4bd7988f 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -13,6 +13,8 @@ kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
 			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
 			   mmu/spte.o
 
+kvm-y                   += vac.o vmx/vac.o svm/vac.o
+
 ifdef CONFIG_HYPERV
 kvm-y			+= kvm_onhyperv.o
 endif
diff --git a/arch/x86/kvm/svm/vac.c b/arch/x86/kvm/svm/vac.c
new file mode 100644
index 000000000000..4aabf16d2fc0
--- /dev/null
+++ b/arch/x86/kvm/svm/vac.c
@@ -0,0 +1,2 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
diff --git a/arch/x86/kvm/vac.c b/arch/x86/kvm/vac.c
new file mode 100644
index 000000000000..18d2ae7d3e47
--- /dev/null
+++ b/arch/x86/kvm/vac.c
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "vac.h"
diff --git a/arch/x86/kvm/vac.h b/arch/x86/kvm/vac.h
new file mode 100644
index 000000000000..4d5dc4700f4e
--- /dev/null
+++ b/arch/x86/kvm/vac.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef ARCH_X86_KVM_VAC_H
+#define ARCH_X86_KVM_VAC_H
+
+#endif // ARCH_X86_KVM_VAC_H
diff --git a/arch/x86/kvm/vmx/vac.c b/arch/x86/kvm/vmx/vac.c
new file mode 100644
index 000000000000..4aabf16d2fc0
--- /dev/null
+++ b/arch/x86/kvm/vmx/vac.c
@@ -0,0 +1,2 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 4de10d447ef3..7876021ea4d7 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -11,6 +11,7 @@ kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
 ifdef CONFIG_VFIO
 kvm-y += $(KVM)/vfio.o
 endif
+kvm-y += $(KVM)/vac.o
 kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
 kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
 kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
diff --git a/virt/kvm/vac.c b/virt/kvm/vac.c
new file mode 100644
index 000000000000..18d2ae7d3e47
--- /dev/null
+++ b/virt/kvm/vac.c
@@ -0,0 +1,3 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include "vac.h"
diff --git a/virt/kvm/vac.h b/virt/kvm/vac.h
new file mode 100644
index 000000000000..8f7123a916c5
--- /dev/null
+++ b/virt/kvm/vac.h
@@ -0,0 +1,6 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef __KVM_VAC_H__
+#define __KVM_VAC_H__
+
+#endif
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 05/14] KVM: x86: Refactor hardware enable/disable operations into a new file
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (3 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 04/14] KVM: x86: Create stubs for a new VAC module Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 06/14] KVM: x86: Move user return msr operations out of KVM Anish Ghulati
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Move KVM's hardware enabling to vac.{h,c} as a first step towards
building VAC and all of the system-wide virtualization support as a
separate module.

Defer moving arch code to future patches to keep the diff reasonable.

No functional change intended.
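
For reference, a minimal sketch (not part of this patch) of how the
refcounted helpers are consumed; the existing VM creation/destruction
call sites are unchanged, error labels elided:

    r = hardware_enable_all();      /* on VM creation */
    if (r)
        goto out;
    ...
    hardware_disable_all();         /* on VM destruction */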

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 virt/kvm/kvm_main.c | 197 +-------------------------------------------
 virt/kvm/vac.c      | 177 +++++++++++++++++++++++++++++++++++++++
 virt/kvm/vac.h      |  26 ++++++
 3 files changed, 204 insertions(+), 196 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f585a159b4f5..fb50deaad3fd 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -59,6 +59,7 @@
 #include "coalesced_mmio.h"
 #include "async_pf.h"
 #include "kvm_mm.h"
+#include "vac.h"
 #include "vfio.h"
 
 #include <trace/events/ipi.h>
@@ -140,8 +141,6 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
 #define KVM_COMPAT(c)	.compat_ioctl	= kvm_no_compat_ioctl,	\
 			.open		= kvm_no_compat_open
 #endif
-static int hardware_enable_all(void);
-static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 
@@ -5167,200 +5166,6 @@ static struct miscdevice kvm_dev = {
 	&kvm_chardev_ops,
 };
 
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
-__visible bool kvm_rebooting;
-EXPORT_SYMBOL_GPL(kvm_rebooting);
-
-static DEFINE_PER_CPU(bool, hardware_enabled);
-static int kvm_usage_count;
-
-static int __hardware_enable_nolock(void)
-{
-	if (__this_cpu_read(hardware_enabled))
-		return 0;
-
-	if (kvm_arch_hardware_enable()) {
-		pr_info("kvm: enabling virtualization on CPU%d failed\n",
-			raw_smp_processor_id());
-		return -EIO;
-	}
-
-	__this_cpu_write(hardware_enabled, true);
-	return 0;
-}
-
-static void hardware_enable_nolock(void *failed)
-{
-	if (__hardware_enable_nolock())
-		atomic_inc(failed);
-}
-
-static int kvm_online_cpu(unsigned int cpu)
-{
-	int ret = 0;
-
-	/*
-	 * Abort the CPU online process if hardware virtualization cannot
-	 * be enabled. Otherwise running VMs would encounter unrecoverable
-	 * errors when scheduled to this CPU.
-	 */
-	mutex_lock(&kvm_lock);
-	if (kvm_usage_count)
-		ret = __hardware_enable_nolock();
-	mutex_unlock(&kvm_lock);
-	return ret;
-}
-
-static void hardware_disable_nolock(void *junk)
-{
-	/*
-	 * Note, hardware_disable_all_nolock() tells all online CPUs to disable
-	 * hardware, not just CPUs that successfully enabled hardware!
-	 */
-	if (!__this_cpu_read(hardware_enabled))
-		return;
-
-	kvm_arch_hardware_disable();
-
-	__this_cpu_write(hardware_enabled, false);
-}
-
-static int kvm_offline_cpu(unsigned int cpu)
-{
-	mutex_lock(&kvm_lock);
-	if (kvm_usage_count)
-		hardware_disable_nolock(NULL);
-	mutex_unlock(&kvm_lock);
-	return 0;
-}
-
-static void hardware_disable_all_nolock(void)
-{
-	BUG_ON(!kvm_usage_count);
-
-	kvm_usage_count--;
-	if (!kvm_usage_count)
-		on_each_cpu(hardware_disable_nolock, NULL, 1);
-}
-
-static void hardware_disable_all(void)
-{
-	cpus_read_lock();
-	mutex_lock(&kvm_lock);
-	hardware_disable_all_nolock();
-	mutex_unlock(&kvm_lock);
-	cpus_read_unlock();
-}
-
-static int hardware_enable_all(void)
-{
-	atomic_t failed = ATOMIC_INIT(0);
-	int r;
-
-	/*
-	 * Do not enable hardware virtualization if the system is going down.
-	 * If userspace initiated a forced reboot, e.g. reboot -f, then it's
-	 * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling
-	 * after kvm_reboot() is called.  Note, this relies on system_state
-	 * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops
-	 * hook instead of registering a dedicated reboot notifier (the latter
-	 * runs before system_state is updated).
-	 */
-	if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
-	    system_state == SYSTEM_RESTART)
-		return -EBUSY;
-
-	/*
-	 * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
-	 * is called, and so on_each_cpu() between them includes the CPU that
-	 * is being onlined.  As a result, hardware_enable_nolock() may get
-	 * invoked before kvm_online_cpu(), which also enables hardware if the
-	 * usage count is non-zero.  Disable CPU hotplug to avoid attempting to
-	 * enable hardware multiple times.
-	 */
-	cpus_read_lock();
-	mutex_lock(&kvm_lock);
-
-	r = 0;
-
-	kvm_usage_count++;
-	if (kvm_usage_count == 1) {
-		on_each_cpu(hardware_enable_nolock, &failed, 1);
-
-		if (atomic_read(&failed)) {
-			hardware_disable_all_nolock();
-			r = -EBUSY;
-		}
-	}
-
-	mutex_unlock(&kvm_lock);
-	cpus_read_unlock();
-
-	return r;
-}
-
-static void kvm_shutdown(void)
-{
-	/*
-	 * Disable hardware virtualization and set kvm_rebooting to indicate
-	 * that KVM has asynchronously disabled hardware virtualization, i.e.
-	 * that relevant errors and exceptions aren't entirely unexpected.
-	 * Some flavors of hardware virtualization need to be disabled before
-	 * transferring control to firmware (to perform shutdown/reboot), e.g.
-	 * on x86, virtualization can block INIT interrupts, which are used by
-	 * firmware to pull APs back under firmware control.  Note, this path
-	 * is used for both shutdown and reboot scenarios, i.e. neither name is
-	 * 100% comprehensive.
-	 */
-	pr_info("kvm: exiting hardware virtualization\n");
-	kvm_rebooting = true;
-	on_each_cpu(hardware_disable_nolock, NULL, 1);
-}
-
-static int kvm_suspend(void)
-{
-	/*
-	 * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
-	 * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
-	 * is stable.  Assert that kvm_lock is not held to ensure the system
-	 * isn't suspended while KVM is enabling hardware.  Hardware enabling
-	 * can be preempted, but the task cannot be frozen until it has dropped
-	 * all locks (userspace tasks are frozen via a fake signal).
-	 */
-	lockdep_assert_not_held(&kvm_lock);
-	lockdep_assert_irqs_disabled();
-
-	if (kvm_usage_count)
-		hardware_disable_nolock(NULL);
-	return 0;
-}
-
-static void kvm_resume(void)
-{
-	lockdep_assert_not_held(&kvm_lock);
-	lockdep_assert_irqs_disabled();
-
-	if (kvm_usage_count)
-		WARN_ON_ONCE(__hardware_enable_nolock());
-}
-
-static struct syscore_ops kvm_syscore_ops = {
-	.suspend = kvm_suspend,
-	.resume = kvm_resume,
-	.shutdown = kvm_shutdown,
-};
-#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
-static int hardware_enable_all(void)
-{
-	return 0;
-}
-
-static void hardware_disable_all(void)
-{
-
-}
-#endif /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
-
 static void kvm_iodevice_destructor(struct kvm_io_device *dev)
 {
 	if (dev->ops->destructor)
diff --git a/virt/kvm/vac.c b/virt/kvm/vac.c
index 18d2ae7d3e47..ff034a53af50 100644
--- a/virt/kvm/vac.c
+++ b/virt/kvm/vac.c
@@ -1,3 +1,180 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include "vac.h"
+
+#include <linux/cpu.h>
+#include <linux/percpu.h>
+#include <linux/mutex.h>
+
+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+DEFINE_MUTEX(vac_lock);
+
+__visible bool kvm_rebooting;
+EXPORT_SYMBOL_GPL(kvm_rebooting);
+
+static DEFINE_PER_CPU(bool, hardware_enabled);
+static int kvm_usage_count;
+
+static int __hardware_enable_nolock(void)
+{
+	if (__this_cpu_read(hardware_enabled))
+		return 0;
+
+	if (kvm_arch_hardware_enable()) {
+		pr_info("kvm: enabling virtualization on CPU%d failed\n",
+			raw_smp_processor_id());
+		return -EIO;
+	}
+
+	__this_cpu_write(hardware_enabled, true);
+	return 0;
+}
+
+static void hardware_enable_nolock(void *failed)
+{
+	if (__hardware_enable_nolock())
+		atomic_inc(failed);
+}
+
+int kvm_online_cpu(unsigned int cpu)
+{
+	int ret = 0;
+
+	/*
+	 * Abort the CPU online process if hardware virtualization cannot
+	 * be enabled. Otherwise running VMs would encounter unrecoverable
+	 * errors when scheduled to this CPU.
+	 */
+	mutex_lock(&vac_lock);
+	if (kvm_usage_count)
+		ret = __hardware_enable_nolock();
+	mutex_unlock(&vac_lock);
+	return ret;
+}
+
+static void hardware_disable_nolock(void *junk)
+{
+	/*
+	 * Note, hardware_disable_all_nolock() tells all online CPUs to disable
+	 * hardware, not just CPUs that successfully enabled hardware!
+	 */
+	if (!__this_cpu_read(hardware_enabled))
+		return;
+
+	kvm_arch_hardware_disable();
+
+	__this_cpu_write(hardware_enabled, false);
+}
+
+int kvm_offline_cpu(unsigned int cpu)
+{
+	mutex_lock(&vac_lock);
+	if (kvm_usage_count)
+		hardware_disable_nolock(NULL);
+	mutex_unlock(&vac_lock);
+	return 0;
+}
+
+static void hardware_disable_all_nolock(void)
+{
+	BUG_ON(!kvm_usage_count);
+
+	kvm_usage_count--;
+	if (!kvm_usage_count)
+		on_each_cpu(hardware_disable_nolock, NULL, 1);
+}
+
+void hardware_disable_all(void)
+{
+	cpus_read_lock();
+	mutex_lock(&vac_lock);
+	hardware_disable_all_nolock();
+	mutex_unlock(&vac_lock);
+	cpus_read_unlock();
+}
+
+int hardware_enable_all(void)
+{
+	atomic_t failed = ATOMIC_INIT(0);
+	int r = 0;
+
+	/*
+	 * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
+	 * is called, and so on_each_cpu() between them includes the CPU that
+	 * is being onlined.  As a result, hardware_enable_nolock() may get
+	 * invoked before kvm_online_cpu(), which also enables hardware if the
+	 * usage count is non-zero.  Disable CPU hotplug to avoid attempting to
+	 * enable hardware multiple times.
+	 */
+	cpus_read_lock();
+	mutex_lock(&vac_lock);
+
+	kvm_usage_count++;
+	if (kvm_usage_count == 1) {
+		on_each_cpu(hardware_enable_nolock, &failed, 1);
+
+		if (atomic_read(&failed)) {
+			hardware_disable_all_nolock();
+			r = -EBUSY;
+		}
+	}
+
+	mutex_unlock(&vac_lock);
+	cpus_read_unlock();
+
+	return r;
+}
+
+static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
+		      void *v)
+{
+	/*
+	 * Some (well, at least mine) BIOSes hang on reboot if
+	 * in vmx root mode.
+	 *
+	 * And Intel TXT required VMX off for all cpu when system shutdown.
+	 */
+	pr_info("kvm: exiting hardware virtualization\n");
+	kvm_rebooting = true;
+	on_each_cpu(hardware_disable_nolock, NULL, 1);
+	return NOTIFY_OK;
+}
+
+static int kvm_suspend(void)
+{
+	/*
+	 * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
+	 * callbacks, i.e. no need to acquire vac_lock to ensure the usage count
+	 * is stable.  Assert that vac_lock is not held to ensure the system
+	 * isn't suspended while KVM is enabling hardware.  Hardware enabling
+	 * can be preempted, but the task cannot be frozen until it has dropped
+	 * all locks (userspace tasks are frozen via a fake signal).
+	 */
+	lockdep_assert_not_held(&vac_lock);
+	lockdep_assert_irqs_disabled();
+
+	if (kvm_usage_count)
+		hardware_disable_nolock(NULL);
+	return 0;
+}
+
+static void kvm_resume(void)
+{
+	lockdep_assert_not_held(&vac_lock);
+	lockdep_assert_irqs_disabled();
+
+	if (kvm_usage_count)
+		WARN_ON_ONCE(__hardware_enable_nolock());
+}
+
+struct notifier_block kvm_reboot_notifier = {
+	.notifier_call = kvm_reboot,
+	.priority = 0,
+};
+
+struct syscore_ops kvm_syscore_ops = {
+	.suspend = kvm_suspend,
+	.resume = kvm_resume,
+};
+
+#endif
diff --git a/virt/kvm/vac.h b/virt/kvm/vac.h
index 8f7123a916c5..aed178a16bdb 100644
--- a/virt/kvm/vac.h
+++ b/virt/kvm/vac.h
@@ -3,4 +3,30 @@
 #ifndef __KVM_VAC_H__
 #define __KVM_VAC_H__
 
+#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+
+#include <linux/kvm_host.h>
+#include <linux/syscore_ops.h>
+
+int kvm_online_cpu(unsigned int cpu);
+int kvm_offline_cpu(unsigned int cpu);
+void hardware_disable_all(void);
+int hardware_enable_all(void);
+
+extern struct notifier_block kvm_reboot_notifier;
+
+extern struct syscore_ops kvm_syscore_ops;
+
+#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
+static inline int hardware_enable_all(void)
+{
+	return 0;
+}
+
+static inline void hardware_disable_all(void)
+{
+
+}
+#endif /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
+
 #endif
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 06/14] KVM: x86: Move user return msr operations out of KVM
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (4 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 05/14] KVM: x86: Refactor hardware enable/disable operations into a new file Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 07/14] KVM: SVM: Move shared SVM data structures into VAC Anish Ghulati
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Move kvm_user_return_msrs into VAC. Create helper functions to access
user return msrs from KVM and (temporarily) expose them via vac.h. When
more code is moved to VAC, these functions will no longer need to be
public and can be made internal.
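
As an illustration of the interface being moved (this mirrors the
existing SVM handling of the TSC_AUX user return slot, it is not new
code in this patch):

    int slot = kvm_add_user_return_msr(MSR_TSC_AUX);  /* module setup */
    if (slot < 0)
        return -EIO;
    ...
    /* on guest MSR write, defer the host restore to return-to-user: */
    if (kvm_set_user_return_msr(slot, guest_val, -1ull))
        return 1;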

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/include/asm/kvm_host.h |  10 ---
 arch/x86/kvm/cpuid.c            |   1 +
 arch/x86/kvm/svm/svm.c          |   1 +
 arch/x86/kvm/vac.c              | 131 +++++++++++++++++++++++++++++
 arch/x86/kvm/vac.h              |  34 ++++++++
 arch/x86/kvm/vmx/vmx.c          |   1 +
 arch/x86/kvm/x86.c              | 142 ++------------------------------
 7 files changed, 173 insertions(+), 147 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e01d1aa3628c..34b995306c31 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1785,7 +1785,6 @@ struct kvm_arch_async_pf {
 	bool direct_map;
 };
 
-extern u32 __read_mostly kvm_nr_uret_msrs;
 extern u64 __read_mostly host_efer;
 extern bool __read_mostly allow_smaller_maxphyaddr;
 extern bool __read_mostly enable_apicv;
@@ -2139,15 +2138,6 @@ int kvm_pv_send_ipi(struct kvm *kvm, unsigned long ipi_bitmap_low,
 		    unsigned long ipi_bitmap_high, u32 min,
 		    unsigned long icr, int op_64_bit);
 
-int kvm_add_user_return_msr(u32 msr);
-int kvm_find_user_return_msr(u32 msr);
-int kvm_set_user_return_msr(unsigned index, u64 val, u64 mask);
-
-static inline bool kvm_is_supported_user_return_msr(u32 msr)
-{
-	return kvm_find_user_return_msr(msr) >= 0;
-}
-
 u64 kvm_scale_tsc(u64 tsc, u64 ratio);
 u64 kvm_read_l1_tsc(struct kvm_vcpu *vcpu, u64 host_tsc);
 u64 kvm_calc_nested_tsc_offset(u64 l1_offset, u64 l2_offset, u64 l2_multiplier);
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 01de1f659beb..961e5acd434c 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,6 +28,7 @@
 #include "trace.h"
 #include "pmu.h"
 #include "xen.h"
+#include "vac.h"
 
 /*
  * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 7fe9d11db8a6..f0a5cc43c023 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5,6 +5,7 @@
 #include "irq.h"
 #include "mmu.h"
 #include "kvm_cache_regs.h"
+#include "vac.h"
 #include "x86.h"
 #include "smm.h"
 #include "cpuid.h"
diff --git a/arch/x86/kvm/vac.c b/arch/x86/kvm/vac.c
index 18d2ae7d3e47..ab77aee4e1fa 100644
--- a/arch/x86/kvm/vac.c
+++ b/arch/x86/kvm/vac.c
@@ -1,3 +1,134 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include "vac.h"
+#include <asm/msr.h>
+
+u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
+struct kvm_user_return_msrs __percpu *user_return_msrs;
+
+u32 __read_mostly kvm_nr_uret_msrs;
+
+void kvm_on_user_return(struct user_return_notifier *urn)
+{
+	unsigned int slot;
+	struct kvm_user_return_msrs *msrs
+		= container_of(urn, struct kvm_user_return_msrs, urn);
+	struct kvm_user_return_msr_values *values;
+	unsigned long flags;
+
+	/*
+	 * Disabling irqs at this point since the following code could be
+	 * interrupted and executed through kvm_arch_hardware_disable()
+	 */
+	local_irq_save(flags);
+	if (msrs->registered) {
+		msrs->registered = false;
+		user_return_notifier_unregister(urn);
+	}
+	local_irq_restore(flags);
+	for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
+		values = &msrs->values[slot];
+		if (values->host != values->curr) {
+			wrmsrl(kvm_uret_msrs_list[slot], values->host);
+			values->curr = values->host;
+		}
+	}
+}
+
+void kvm_user_return_msr_cpu_online(void)
+{
+	unsigned int cpu = smp_processor_id();
+	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+	u64 value;
+	int i;
+
+	for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+		rdmsrl_safe(kvm_uret_msrs_list[i], &value);
+		msrs->values[i].host = value;
+		msrs->values[i].curr = value;
+	}
+}
+
+void drop_user_return_notifiers(void)
+{
+	unsigned int cpu = smp_processor_id();
+	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+
+	if (msrs->registered)
+		kvm_on_user_return(&msrs->urn);
+}
+
+static int kvm_probe_user_return_msr(u32 msr)
+{
+	u64 val;
+	int ret;
+
+	preempt_disable();
+	ret = rdmsrl_safe(msr, &val);
+	if (ret)
+		goto out;
+	ret = wrmsrl_safe(msr, val);
+out:
+	preempt_enable();
+	return ret;
+}
+
+int kvm_add_user_return_msr(u32 msr)
+{
+	BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
+
+	if (kvm_probe_user_return_msr(msr))
+		return -1;
+
+	kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
+	return kvm_nr_uret_msrs++;
+}
+
+int kvm_find_user_return_msr(u32 msr)
+{
+	int i;
+
+	for (i = 0; i < kvm_nr_uret_msrs; ++i) {
+		if (kvm_uret_msrs_list[i] == msr)
+			return i;
+	}
+	return -1;
+}
+
+int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask)
+{
+	unsigned int cpu = smp_processor_id();
+	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
+	int err;
+
+	value = (value & mask) | (msrs->values[slot].host & ~mask);
+	if (value == msrs->values[slot].curr)
+		return 0;
+	err = wrmsrl_safe(kvm_uret_msrs_list[slot], value);
+	if (err)
+		return 1;
+
+	msrs->values[slot].curr = value;
+	if (!msrs->registered) {
+		msrs->urn.on_user_return = kvm_on_user_return;
+		user_return_notifier_register(&msrs->urn);
+		msrs->registered = true;
+	}
+	return 0;
+}
+
+int kvm_alloc_user_return_msrs(void)
+{
+	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
+	if (!user_return_msrs) {
+		pr_err("failed to allocate percpu kvm_user_return_msrs\n");
+		return -ENOMEM;
+	}
+	kvm_nr_uret_msrs = 0;
+	return 0;
+}
+
+void kvm_free_user_return_msrs(void)
+{
+	free_percpu(user_return_msrs);
+}
diff --git a/arch/x86/kvm/vac.h b/arch/x86/kvm/vac.h
index 4d5dc4700f4e..135d3be5461e 100644
--- a/arch/x86/kvm/vac.h
+++ b/arch/x86/kvm/vac.h
@@ -3,4 +3,38 @@
 #ifndef ARCH_X86_KVM_VAC_H
 #define ARCH_X86_KVM_VAC_H
 
+#include <linux/user-return-notifier.h>
+
+/*
+ * Restoring the host value for MSRs that are only consumed when running in
+ * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
+ * returns to userspace, i.e. the kernel can run with the guest's value.
+ */
+#define KVM_MAX_NR_USER_RETURN_MSRS 16
+
+struct kvm_user_return_msrs {
+	struct user_return_notifier urn;
+	bool registered;
+	struct kvm_user_return_msr_values {
+		u64 host;
+		u64 curr;
+	} values[KVM_MAX_NR_USER_RETURN_MSRS];
+};
+
+extern u32 __read_mostly kvm_nr_uret_msrs;
+
+int kvm_alloc_user_return_msrs(void);
+void kvm_free_user_return_msrs(void);
+int kvm_add_user_return_msr(u32 msr);
+int kvm_find_user_return_msr(u32 msr);
+int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask);
+void kvm_on_user_return(struct user_return_notifier *urn);
+void kvm_user_return_msr_cpu_online(void);
+void drop_user_return_notifiers(void);
+
+static inline bool kvm_is_supported_user_return_msr(u32 msr)
+{
+	return kvm_find_user_return_msr(msr) >= 0;
+}
+
 #endif // ARCH_X86_KVM_VAC_H
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 629e662b131e..7fea84a17edf 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -64,6 +64,7 @@
 #include "vmcs12.h"
 #include "vmx.h"
 #include "x86.h"
+#include "vac.h"
 #include "smm.h"
 
 #ifdef MODULE
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0a8b94678928..7466a5945147 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -34,6 +34,7 @@
 #include "lapic.h"
 #include "xen.h"
 #include "smm.h"
+#include "vac.h"
 
 #include <linux/clocksource.h>
 #include <linux/interrupt.h>
@@ -205,26 +206,6 @@ module_param(eager_page_split, bool, 0644);
 static bool __read_mostly mitigate_smt_rsb;
 module_param(mitigate_smt_rsb, bool, 0444);
 
-/*
- * Restoring the host value for MSRs that are only consumed when running in
- * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
- * returns to userspace, i.e. the kernel can run with the guest's value.
- */
-#define KVM_MAX_NR_USER_RETURN_MSRS 16
-
-struct kvm_user_return_msrs {
-	struct user_return_notifier urn;
-	bool registered;
-	struct kvm_user_return_msr_values {
-		u64 host;
-		u64 curr;
-	} values[KVM_MAX_NR_USER_RETURN_MSRS];
-};
-
-u32 __read_mostly kvm_nr_uret_msrs;
-static u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
-static struct kvm_user_return_msrs __percpu *user_return_msrs;
-
 #define KVM_SUPPORTED_XCR0     (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \
 				| XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \
 				| XFEATURE_MASK_BNDCSR | XFEATURE_MASK_AVX512 \
@@ -358,115 +339,6 @@ static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
 		vcpu->arch.apf.gfns[i] = ~0;
 }
 
-static void kvm_on_user_return(struct user_return_notifier *urn)
-{
-	unsigned slot;
-	struct kvm_user_return_msrs *msrs
-		= container_of(urn, struct kvm_user_return_msrs, urn);
-	struct kvm_user_return_msr_values *values;
-	unsigned long flags;
-
-	/*
-	 * Disabling irqs at this point since the following code could be
-	 * interrupted and executed through kvm_arch_hardware_disable()
-	 */
-	local_irq_save(flags);
-	if (msrs->registered) {
-		msrs->registered = false;
-		user_return_notifier_unregister(urn);
-	}
-	local_irq_restore(flags);
-	for (slot = 0; slot < kvm_nr_uret_msrs; ++slot) {
-		values = &msrs->values[slot];
-		if (values->host != values->curr) {
-			wrmsrl(kvm_uret_msrs_list[slot], values->host);
-			values->curr = values->host;
-		}
-	}
-}
-
-static int kvm_probe_user_return_msr(u32 msr)
-{
-	u64 val;
-	int ret;
-
-	preempt_disable();
-	ret = rdmsrl_safe(msr, &val);
-	if (ret)
-		goto out;
-	ret = wrmsrl_safe(msr, val);
-out:
-	preempt_enable();
-	return ret;
-}
-
-int kvm_add_user_return_msr(u32 msr)
-{
-	BUG_ON(kvm_nr_uret_msrs >= KVM_MAX_NR_USER_RETURN_MSRS);
-
-	if (kvm_probe_user_return_msr(msr))
-		return -1;
-
-	kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
-	return kvm_nr_uret_msrs++;
-}
-
-int kvm_find_user_return_msr(u32 msr)
-{
-	int i;
-
-	for (i = 0; i < kvm_nr_uret_msrs; ++i) {
-		if (kvm_uret_msrs_list[i] == msr)
-			return i;
-	}
-	return -1;
-}
-
-static void kvm_user_return_msr_cpu_online(void)
-{
-	unsigned int cpu = smp_processor_id();
-	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
-	u64 value;
-	int i;
-
-	for (i = 0; i < kvm_nr_uret_msrs; ++i) {
-		rdmsrl_safe(kvm_uret_msrs_list[i], &value);
-		msrs->values[i].host = value;
-		msrs->values[i].curr = value;
-	}
-}
-
-int kvm_set_user_return_msr(unsigned slot, u64 value, u64 mask)
-{
-	unsigned int cpu = smp_processor_id();
-	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
-	int err;
-
-	value = (value & mask) | (msrs->values[slot].host & ~mask);
-	if (value == msrs->values[slot].curr)
-		return 0;
-	err = wrmsrl_safe(kvm_uret_msrs_list[slot], value);
-	if (err)
-		return 1;
-
-	msrs->values[slot].curr = value;
-	if (!msrs->registered) {
-		msrs->urn.on_user_return = kvm_on_user_return;
-		user_return_notifier_register(&msrs->urn);
-		msrs->registered = true;
-	}
-	return 0;
-}
-
-static void drop_user_return_notifiers(void)
-{
-	unsigned int cpu = smp_processor_id();
-	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
-
-	if (msrs->registered)
-		kvm_on_user_return(&msrs->urn);
-}
-
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.apic_base;
@@ -9415,13 +9287,9 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		return -ENOMEM;
 	}
 
-	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
-	if (!user_return_msrs) {
-		pr_err("failed to allocate percpu kvm_user_return_msrs\n");
-		r = -ENOMEM;
+	r = kvm_alloc_user_return_msrs();
+	if (r)
 		goto out_free_x86_emulator_cache;
-	}
-	kvm_nr_uret_msrs = 0;
 
 	r = kvm_mmu_vendor_module_init();
 	if (r)
@@ -9500,7 +9368,7 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
 out_free_percpu:
-	free_percpu(user_return_msrs);
+	kvm_free_user_return_msrs();
 out_free_x86_emulator_cache:
 	kmem_cache_destroy(x86_emulator_cache);
 	return r;
@@ -9539,7 +9407,7 @@ void kvm_x86_vendor_exit(void)
 #endif
 	static_call(kvm_x86_hardware_unsetup)();
 	kvm_mmu_vendor_module_exit();
-	free_percpu(user_return_msrs);
+	kvm_free_user_return_msrs();
 	kmem_cache_destroy(x86_emulator_cache);
 #ifdef CONFIG_KVM_XEN
 	static_key_deferred_flush(&kvm_xen_enabled);
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 07/14] KVM: SVM: Move shared SVM data structures into VAC
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (5 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 06/14] KVM: x86: Move user return msr operations out of KVM Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 08/14] KVM: VMX: Move shared VMX " Anish Ghulati
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati, Venkatesh Srinivas

Move svm_cpu_data into VAC.

TODO: Explain why this data should be shared between KVMs and should be
made global by moving it into VAC.
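
One concrete answer for the TODO above, sketched from the existing
new_asid() logic: ASIDs are handed out from this per-CPU state, so two
KVM modules running VMs on the same CPU must share a single allocator
or they can hand out the same ASID twice (simplified, illustrative
only):

    if (sd->next_asid > sd->max_asid) {
        ++sd->asid_generation;
        sd->next_asid = sd->min_asid;   /* wrap and flush */
    }
    svm->asid = sd->next_asid++;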

Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/svm/svm.c |  9 ++++++++-
 arch/x86/kvm/svm/svm.h | 16 +---------------
 arch/x86/kvm/svm/vac.c |  5 +++++
 arch/x86/kvm/svm/vac.h | 23 +++++++++++++++++++++++
 4 files changed, 37 insertions(+), 16 deletions(-)
 create mode 100644 arch/x86/kvm/svm/vac.h

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index f0a5cc43c023..d53808d8ec37 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -234,7 +234,14 @@ static u8 rsm_ins_bytes[] = "\x0f\xaa";
 
 static unsigned long iopm_base;
 
-DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
+struct kvm_ldttss_desc {
+	u16 limit0;
+	u16 base0;
+	unsigned base1:8, type:5, dpl:2, p:1;
+	unsigned limit1:4, zero0:3, g:1, base2:8;
+	u32 base3;
+	u32 zero1;
+} __attribute__((packed));
 
 /*
  * Only MSR_TSC_AUX is switched via the user return hook.  EFER is switched via
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 436632706848..7fc652b1b92d 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -24,6 +24,7 @@
 
 #include "cpuid.h"
 #include "kvm_cache_regs.h"
+#include "vac.h"
 
 #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
@@ -291,21 +292,6 @@ struct vcpu_svm {
 	bool guest_gif;
 };
 
-struct svm_cpu_data {
-	u64 asid_generation;
-	u32 max_asid;
-	u32 next_asid;
-	u32 min_asid;
-
-	struct page *save_area;
-	unsigned long save_area_pa;
-
-	struct vmcb *current_vmcb;
-
-	/* index = sev_asid, value = vmcb pointer */
-	struct vmcb **sev_vmcbs;
-};
-
 DECLARE_PER_CPU(struct svm_cpu_data, svm_data);
 
 void recalc_intercepts(struct vcpu_svm *svm);
diff --git a/arch/x86/kvm/svm/vac.c b/arch/x86/kvm/svm/vac.c
index 4aabf16d2fc0..3e79279c6b34 100644
--- a/arch/x86/kvm/svm/vac.c
+++ b/arch/x86/kvm/svm/vac.c
@@ -1,2 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#include <linux/percpu-defs.h>
+
+#include "vac.h"
+
+DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
diff --git a/arch/x86/kvm/svm/vac.h b/arch/x86/kvm/svm/vac.h
new file mode 100644
index 000000000000..2d42e4472703
--- /dev/null
+++ b/arch/x86/kvm/svm/vac.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0-only
+//
+#ifndef ARCH_X86_KVM_SVM_VAC_H
+#define ARCH_X86_KVM_SVM_VAC_H
+
+#include "../vac.h"
+
+struct svm_cpu_data {
+	u64 asid_generation;
+	u32 max_asid;
+	u32 next_asid;
+	u32 min_asid;
+
+	struct page *save_area;
+	unsigned long save_area_pa;
+
+	struct vmcb *current_vmcb;
+
+	/* index = sev_asid, value = vmcb pointer */
+	struct vmcb **sev_vmcbs;
+};
+
+#endif // ARCH_X86_KVM_SVM_VAC_H
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 08/14] KVM: VMX: Move shared VMX data structures into VAC
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (6 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 07/14] KVM: SVM: Move shared SVM data structures into VAC Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 09/14] KVM: x86: Move shared KVM state " Anish Ghulati
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati, Venkatesh Srinivas

Move vmxarea and current_vmcs into VAC.

Move the VPID bitmap into VAC as well.

TODO: Explain why this data needs to be shared among multiple KVM
modules and moved into VAC.
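
For the VPID half, the rationale is visible in the allocator itself:
VPIDs are a single hardware namespace, so every KVM module must draw
from the same bitmap. A sketch of the (unchanged) vCPU call sites:

    vmx->vpid = allocate_vpid();    /* vCPU creation */
    ...
    free_vpid(vmx->vpid);           /* vCPU free */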

Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/vmx/nested.c |  1 +
 arch/x86/kvm/vmx/vac.c    | 47 +++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vac.h    | 12 ++++++++++
 arch/x86/kvm/vmx/vmx.c    | 41 +++++-----------------------------
 arch/x86/kvm/vmx/vmx.h    |  2 --
 5 files changed, 65 insertions(+), 38 deletions(-)
 create mode 100644 arch/x86/kvm/vmx/vac.h

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index c5ec0ef51ff7..5c6ac7662453 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -14,6 +14,7 @@
 #include "pmu.h"
 #include "sgx.h"
 #include "trace.h"
+#include "vac.h"
 #include "vmx.h"
 #include "x86.h"
 #include "smm.h"
diff --git a/arch/x86/kvm/vmx/vac.c b/arch/x86/kvm/vmx/vac.c
index 4aabf16d2fc0..7b8ade0fb97f 100644
--- a/arch/x86/kvm/vmx/vac.c
+++ b/arch/x86/kvm/vmx/vac.c
@@ -1,2 +1,49 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#include <asm/percpu.h>
+#include <linux/percpu-defs.h>
+
+#include "vac.h"
+
+
+static DEFINE_PER_CPU(struct vmcs *, vmxarea);
+
+DEFINE_PER_CPU(struct vmcs *, current_vmcs);
+
+void vac_set_vmxarea(struct vmcs *vmcs, int cpu)
+{
+	per_cpu(vmxarea, cpu) = vmcs;
+}
+
+struct vmcs *vac_get_vmxarea(int cpu)
+{
+	return per_cpu(vmxarea, cpu);
+}
+
+static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
+static DEFINE_SPINLOCK(vmx_vpid_lock);
+
+int allocate_vpid(void)
+{
+	int vpid;
+
+	if (!enable_vpid)
+		return 0;
+	spin_lock(&vmx_vpid_lock);
+	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
+	if (vpid < VMX_NR_VPIDS)
+		__set_bit(vpid, vmx_vpid_bitmap);
+	else
+		vpid = 0;
+	spin_unlock(&vmx_vpid_lock);
+	return vpid;
+}
+
+void free_vpid(int vpid)
+{
+	if (!enable_vpid || vpid == 0)
+		return;
+	spin_lock(&vmx_vpid_lock);
+	__clear_bit(vpid, vmx_vpid_bitmap);
+	spin_unlock(&vmx_vpid_lock);
+}
diff --git a/arch/x86/kvm/vmx/vac.h b/arch/x86/kvm/vmx/vac.h
new file mode 100644
index 000000000000..46c54fe7447d
--- /dev/null
+++ b/arch/x86/kvm/vmx/vac.h
@@ -0,0 +1,12 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <asm/vmx.h>
+
+#include "../vac.h"
+#include "vmcs.h"
+
+void vac_set_vmxarea(struct vmcs *vmcs, int cpu);
+
+struct vmcs *vac_get_vmxarea(int cpu);
+int allocate_vpid(void);
+void free_vpid(int vpid);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 7fea84a17edf..407e37810419 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -60,6 +60,7 @@
 #include "pmu.h"
 #include "sgx.h"
 #include "trace.h"
+#include "vac.h"
 #include "vmcs.h"
 #include "vmcs12.h"
 #include "vmx.h"
@@ -455,17 +456,12 @@ noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
 			ext, eptp, gpa);
 }
 
-static DEFINE_PER_CPU(struct vmcs *, vmxarea);
-DEFINE_PER_CPU(struct vmcs *, current_vmcs);
 /*
  * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
  * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
  */
 static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 
-static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
-static DEFINE_SPINLOCK(vmx_vpid_lock);
-
 struct vmcs_config vmcs_config __ro_after_init;
 struct vmx_capability vmx_capability __ro_after_init;
 
@@ -2792,7 +2788,7 @@ static int kvm_cpu_vmxon(u64 vmxon_pointer)
 static int vmx_hardware_enable(void)
 {
 	int cpu = raw_smp_processor_id();
-	u64 phys_addr = __pa(per_cpu(vmxarea, cpu));
+	u64 phys_addr = __pa(vac_get_vmxarea(cpu));
 	int r;
 
 	if (cr4_read_shadow() & X86_CR4_VMXE)
@@ -2921,8 +2917,8 @@ static void free_kvm_area(void)
 	int cpu;
 
 	for_each_possible_cpu(cpu) {
-		free_vmcs(per_cpu(vmxarea, cpu));
-		per_cpu(vmxarea, cpu) = NULL;
+		free_vmcs(vac_get_vmxarea(cpu));
+		vac_set_vmxarea(NULL, cpu);
 	}
 }
 
@@ -2952,7 +2948,7 @@ static __init int alloc_kvm_area(void)
 		if (kvm_is_using_evmcs())
 			vmcs->hdr.revision_id = vmcs_config.revision_id;
 
-		per_cpu(vmxarea, cpu) = vmcs;
+		vac_set_vmxarea(vmcs, cpu);
 	}
 	return 0;
 }
@@ -3897,31 +3893,6 @@ static void seg_setup(int seg)
 	vmcs_write32(sf->ar_bytes, ar);
 }
 
-int allocate_vpid(void)
-{
-	int vpid;
-
-	if (!enable_vpid)
-		return 0;
-	spin_lock(&vmx_vpid_lock);
-	vpid = find_first_zero_bit(vmx_vpid_bitmap, VMX_NR_VPIDS);
-	if (vpid < VMX_NR_VPIDS)
-		__set_bit(vpid, vmx_vpid_bitmap);
-	else
-		vpid = 0;
-	spin_unlock(&vmx_vpid_lock);
-	return vpid;
-}
-
-void free_vpid(int vpid)
-{
-	if (!enable_vpid || vpid == 0)
-		return;
-	spin_lock(&vmx_vpid_lock);
-	__clear_bit(vpid, vmx_vpid_bitmap);
-	spin_unlock(&vmx_vpid_lock);
-}
-
 static void vmx_msr_bitmap_l01_changed(struct vcpu_vmx *vmx)
 {
 	/*
@@ -8538,8 +8509,6 @@ static __init int hardware_setup(void)
 	kvm_caps.has_bus_lock_exit = cpu_has_vmx_bus_lock_detection();
 	kvm_caps.has_notify_vmexit = cpu_has_notify_vmexit();
 
-	set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */
-
 	if (enable_ept)
 		kvm_mmu_set_ept_masks(enable_ept_ad_bits,
 				      cpu_has_vmx_ept_execute_only());
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index 476119670d82..03b11159fde5 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -376,8 +376,6 @@ struct kvm_vmx {
 
 void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 			struct loaded_vmcs *buddy);
-int allocate_vpid(void);
-void free_vpid(int vpid);
 void vmx_set_constant_host_state(struct vcpu_vmx *vmx);
 void vmx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel,
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 09/14] KVM: x86: Move shared KVM state into VAC
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (7 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 08/14] KVM: VMX: Move shared VMX " Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-17  8:54   ` Lai Jiangshan
  2023-11-07 20:19 ` [RFC PATCH 10/14] KVM: VMX: Move VMX enable and disable " Anish Ghulati
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Venkatesh Srinivas, Anish Ghulati

From: Venkatesh Srinivas <venkateshs@chromium.org>

Move cpu_kick_mask and kvm_running_vcpu* from arch-neutral KVM code
into VAC.

TODO: Explain why this needs to be moved into VAC.
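
As one data point for the TODO: IRQ-context callbacks such as the PMI
handler resolve the current vCPU through this per-CPU pointer, so the
lookup has to live in code shared by all KVM modules. The (unchanged)
reader, for reference:

    struct kvm_vcpu *kvm_get_running_vcpu(void)
    {
        struct kvm_vcpu *vcpu;

        preempt_disable();
        vcpu = __this_cpu_read(kvm_running_vcpu);
        preempt_enable();

        return vcpu;
    }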

Signed-off-by: Venkatesh Srinivas <venkateshs@chromium.org>
Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 virt/kvm/kvm_main.c | 6 ++++--
 virt/kvm/vac.c      | 5 +++++
 virt/kvm/vac.h      | 3 +++
 3 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fb50deaad3fd..575f044fd842 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -108,7 +108,6 @@ LIST_HEAD(vm_list);
 static struct kmem_cache *kvm_vcpu_cache;
 
 static __read_mostly struct preempt_ops kvm_preempt_ops;
-static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
 
 struct dentry *kvm_debugfs_dir;
 EXPORT_SYMBOL_GPL(kvm_debugfs_dir);
@@ -150,7 +149,10 @@ static void kvm_uevent_notify_change(unsigned int type, struct kvm *kvm);
 static unsigned long long kvm_createvm_count;
 static unsigned long long kvm_active_vms;
 
-static DEFINE_PER_CPU(cpumask_var_t, cpu_kick_mask);
+__weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
+						   unsigned long start, unsigned long end)
+{
+}
 
 __weak void kvm_arch_guest_memory_reclaimed(struct kvm *kvm)
 {
diff --git a/virt/kvm/vac.c b/virt/kvm/vac.c
index ff034a53af50..c628afeb3d4b 100644
--- a/virt/kvm/vac.c
+++ b/virt/kvm/vac.c
@@ -6,6 +6,11 @@
 #include <linux/percpu.h>
 #include <linux/mutex.h>
 
+DEFINE_PER_CPU(cpumask_var_t, cpu_kick_mask);
+EXPORT_SYMBOL(cpu_kick_mask);
+
+DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
+
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
 DEFINE_MUTEX(vac_lock);
 
diff --git a/virt/kvm/vac.h b/virt/kvm/vac.h
index aed178a16bdb..f3e7b08168df 100644
--- a/virt/kvm/vac.h
+++ b/virt/kvm/vac.h
@@ -29,4 +29,7 @@ static inline void hardware_disable_all(void)
 }
 #endif /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
 
+DECLARE_PER_CPU(cpumask_var_t, cpu_kick_mask);
+DECLARE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
+
 #endif
-- 
2.42.0.869.gea05f2083d-goog


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [RFC PATCH 10/14] KVM: VMX: Move VMX enable and disable into VAC
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (8 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 09/14] KVM: x86: Move shared KVM state " Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:19 ` [RFC PATCH 11/14] KVM: SVM: Move SVM " Anish Ghulati
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Move loaded_vmcss_on_cpu into VAC. loaded_vmcss_on_cpu is a per-CPU
list of the VMCSs loaded on that CPU; when the CPU is offlined or the
kernel kexecs/crashes, every VMCS on the list needs to be VMCLEAR'd.

Register and unregister vmx_emergency_disable as the emergency disable
callback in vac_vmx_init/exit, the init and exit functions for the VMX
portion of VAC. Temporarily call these from vmx_init and __vmx_exit;
this will change once VAC is a module (the init/exit will then be
called via module init and exit functions).

Move the call to hv_reset_evmcs from vmx_hardware_disable to
vmx_module_exit. This is because vmx_hardware_enable/disable is now part
of VAC.
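
A minimal sketch of the init/exit pairing described above (this assumes
the existing cpu_emergency_{register,unregister}_virt_callback()
helpers from <asm/reboot.h>; the actual patch differs in detail):

    int __init vac_vmx_init(void)
    {
        int cpu;

        for_each_possible_cpu(cpu)
            INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));

        cpu_emergency_register_virt_callback(vmx_emergency_disable);
        return 0;
    }

    void vac_vmx_exit(void)
    {
        cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
    }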

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/vac.h     |  14 +++
 arch/x86/kvm/vmx/vac.c | 190 +++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vac.h |   7 +-
 arch/x86/kvm/vmx/vmx.c | 182 ++-------------------------------------
 4 files changed, 219 insertions(+), 174 deletions(-)

diff --git a/arch/x86/kvm/vac.h b/arch/x86/kvm/vac.h
index 135d3be5461e..59cbf36ff8ce 100644
--- a/arch/x86/kvm/vac.h
+++ b/arch/x86/kvm/vac.h
@@ -5,6 +5,20 @@
 
 #include <linux/user-return-notifier.h>
 
+int __init vac_init(void);
+void vac_exit(void);
+
+#ifdef CONFIG_KVM_INTEL
+int __init vac_vmx_init(void);
+void vac_vmx_exit(void);
+#else
+int __init vac_vmx_init(void)
+{
+	return 0;
+}
+void vac_vmx_exit(void) {}
+#endif
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
diff --git a/arch/x86/kvm/vmx/vac.c b/arch/x86/kvm/vmx/vac.c
index 7b8ade0fb97f..202686ccbaec 100644
--- a/arch/x86/kvm/vmx/vac.c
+++ b/arch/x86/kvm/vmx/vac.c
@@ -1,10 +1,18 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
 #include <asm/percpu.h>
+#include <asm/reboot.h>
 #include <linux/percpu-defs.h>
 
 #include "vac.h"
+#include "vmx_ops.h"
+#include "posted_intr.h"
 
+/*
+ * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
+ * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
+ */
+static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 
@@ -47,3 +55,185 @@ void free_vpid(int vpid)
 	__clear_bit(vpid, vmx_vpid_bitmap);
 	spin_unlock(&vmx_vpid_lock);
 }
+
+void add_vmcs_to_loaded_vmcss_on_cpu(
+		struct list_head *loaded_vmcss_on_cpu_link,
+		int cpu)
+{
+	list_add(loaded_vmcss_on_cpu_link, &per_cpu(loaded_vmcss_on_cpu, cpu));
+}
+
+static void __loaded_vmcs_clear(void *arg)
+{
+	struct loaded_vmcs *loaded_vmcs = arg;
+	int cpu = raw_smp_processor_id();
+
+	if (loaded_vmcs->cpu != cpu)
+		return; /* vcpu migration can race with cpu offline */
+	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
+		per_cpu(current_vmcs, cpu) = NULL;
+
+	vmcs_clear(loaded_vmcs->vmcs);
+	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
+		vmcs_clear(loaded_vmcs->shadow_vmcs);
+
+	list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
+
+	/*
+	 * Ensure all writes to loaded_vmcs, including deleting it from its
+	 * current percpu list, complete before setting loaded_vmcs->cpu to
+	 * -1, otherwise a different cpu can see loaded_vmcs->cpu == -1 first
+	 * and add loaded_vmcs to its percpu list before it's deleted from this
+	 * cpu's list. Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
+	 */
+	smp_wmb();
+
+	loaded_vmcs->cpu = -1;
+	loaded_vmcs->launched = 0;
+}
+
+void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
+{
+	int cpu = loaded_vmcs->cpu;
+
+	if (cpu != -1)
+		smp_call_function_single(cpu,
+			 __loaded_vmcs_clear, loaded_vmcs, 1);
+
+}
+
+static int kvm_cpu_vmxon(u64 vmxon_pointer)
+{
+	u64 msr;
+
+	cr4_set_bits(X86_CR4_VMXE);
+
+	asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
+			  _ASM_EXTABLE(1b, %l[fault])
+			  : : [vmxon_pointer] "m"(vmxon_pointer)
+			  : : fault);
+	return 0;
+
+fault:
+	WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
+		  rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
+	cr4_clear_bits(X86_CR4_VMXE);
+
+	return -EFAULT;
+}
+
+int vmx_hardware_enable(void)
+{
+	int cpu = raw_smp_processor_id();
+	u64 phys_addr = __pa(vac_get_vmxarea(cpu));
+	int r;
+
+	if (cr4_read_shadow() & X86_CR4_VMXE)
+		return -EBUSY;
+
+	/*
+	 * This can happen if we hot-added a CPU but failed to allocate
+	 * VP assist page for it.
+	 */
+	if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu))
+		return -EFAULT;
+
+	intel_pt_handle_vmx(1);
+
+	r = kvm_cpu_vmxon(phys_addr);
+	if (r) {
+		intel_pt_handle_vmx(0);
+		return r;
+	}
+
+	if (enable_ept)
+		ept_sync_global();
+
+	return 0;
+}
+
+static void vmclear_local_loaded_vmcss(void)
+{
+	int cpu = raw_smp_processor_id();
+	struct loaded_vmcs *v, *n;
+
+	list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
+				 loaded_vmcss_on_cpu_link)
+		__loaded_vmcs_clear(v);
+}
+
+/*
+ * Disable VMX and clear CR4.VMXE (even if VMXOFF faults)
+ *
+ * Note, VMXOFF causes a #UD if the CPU is !post-VMXON, but it's impossible to
+ * atomically track post-VMXON state, e.g. this may be called in NMI context.
+ * Eat all faults as all other faults on VMXOFF faults are mode related, i.e.
+ * faults are guaranteed to be due to the !post-VMXON check unless the CPU is
+ * magically in RM, VM86, compat mode, or at CPL>0.
+ */
+static int kvm_cpu_vmxoff(void)
+{
+	asm_volatile_goto("1: vmxoff\n\t"
+			  _ASM_EXTABLE(1b, %l[fault])
+			  ::: "cc", "memory" : fault);
+
+	cr4_clear_bits(X86_CR4_VMXE);
+	return 0;
+
+fault:
+	cr4_clear_bits(X86_CR4_VMXE);
+	return -EIO;
+}
+
+static void vmx_emergency_disable(void)
+{
+	int cpu = raw_smp_processor_id();
+	struct loaded_vmcs *v;
+
+	kvm_rebooting = true;
+
+	/*
+	 * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be
+	 * set in task context.  If this races with VMX is disabled by an NMI,
+	 * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to
+	 * kvm_rebooting set.
+	 */
+	if (!(__read_cr4() & X86_CR4_VMXE))
+		return;
+
+	list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
+			    loaded_vmcss_on_cpu_link)
+		vmcs_clear(v->vmcs);
+
+	kvm_cpu_vmxoff();
+}
+
+void vmx_hardware_disable(void)
+{
+	vmclear_local_loaded_vmcss();
+
+	if (kvm_cpu_vmxoff())
+		kvm_spurious_fault();
+
+	intel_pt_handle_vmx(0);
+}
+
+int __init vac_vmx_init(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu) {
+		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
+
+		pi_init_cpu(cpu);
+	}
+
+	cpu_emergency_register_virt_callback(vmx_emergency_disable);
+
+	return 0;
+}
+
+void vac_vmx_exit(void)
+{
+	cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
+}
diff --git a/arch/x86/kvm/vmx/vac.h b/arch/x86/kvm/vmx/vac.h
index 46c54fe7447d..daeea8ef0d33 100644
--- a/arch/x86/kvm/vmx/vac.h
+++ b/arch/x86/kvm/vmx/vac.h
@@ -9,4 +9,9 @@ void vac_set_vmxarea(struct vmcs *vmcs, int cpu);
 
 struct vmcs *vac_get_vmxarea(int cpu);
 int allocate_vpid(void);
-void free_vpid(int vpid);
+void add_vmcs_to_loaded_vmcss_on_cpu(
+		struct list_head *loaded_vmcss_on_cpu_link,
+		int cpu);
+void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs);
+int vmx_hardware_enable(void);
+void vmx_hardware_disable(void);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 407e37810419..46e2d5c69d1d 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -456,12 +456,6 @@ noinline void invept_error(unsigned long ext, u64 eptp, gpa_t gpa)
 			ext, eptp, gpa);
 }
 
-/*
- * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
- * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
- */
-static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
-
 struct vmcs_config vmcs_config __ro_after_init;
 struct vmx_capability vmx_capability __ro_after_init;
 
@@ -716,90 +710,6 @@ static int vmx_set_guest_uret_msr(struct vcpu_vmx *vmx,
 	return ret;
 }
 
-/*
- * Disable VMX and clear CR4.VMXE (even if VMXOFF faults)
- *
- * Note, VMXOFF causes a #UD if the CPU is !post-VMXON, but it's impossible to
- * atomically track post-VMXON state, e.g. this may be called in NMI context.
- * Eat all faults as all other faults on VMXOFF faults are mode related, i.e.
- * faults are guaranteed to be due to the !post-VMXON check unless the CPU is
- * magically in RM, VM86, compat mode, or at CPL>0.
- */
-static int kvm_cpu_vmxoff(void)
-{
-	asm_volatile_goto("1: vmxoff\n\t"
-			  _ASM_EXTABLE(1b, %l[fault])
-			  ::: "cc", "memory" : fault);
-
-	cr4_clear_bits(X86_CR4_VMXE);
-	return 0;
-
-fault:
-	cr4_clear_bits(X86_CR4_VMXE);
-	return -EIO;
-}
-
-static void vmx_emergency_disable(void)
-{
-	int cpu = raw_smp_processor_id();
-	struct loaded_vmcs *v;
-
-	kvm_rebooting = true;
-
-	/*
-	 * Note, CR4.VMXE can be _cleared_ in NMI context, but it can only be
-	 * set in task context.  If this races with VMX is disabled by an NMI,
-	 * VMCLEAR and VMXOFF may #UD, but KVM will eat those faults due to
-	 * kvm_rebooting set.
-	 */
-	if (!(__read_cr4() & X86_CR4_VMXE))
-		return;
-
-	list_for_each_entry(v, &per_cpu(loaded_vmcss_on_cpu, cpu),
-			    loaded_vmcss_on_cpu_link)
-		vmcs_clear(v->vmcs);
-
-	kvm_cpu_vmxoff();
-}
-
-static void __loaded_vmcs_clear(void *arg)
-{
-	struct loaded_vmcs *loaded_vmcs = arg;
-	int cpu = raw_smp_processor_id();
-
-	if (loaded_vmcs->cpu != cpu)
-		return; /* vcpu migration can race with cpu offline */
-	if (per_cpu(current_vmcs, cpu) == loaded_vmcs->vmcs)
-		per_cpu(current_vmcs, cpu) = NULL;
-
-	vmcs_clear(loaded_vmcs->vmcs);
-	if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
-		vmcs_clear(loaded_vmcs->shadow_vmcs);
-
-	list_del(&loaded_vmcs->loaded_vmcss_on_cpu_link);
-
-	/*
-	 * Ensure all writes to loaded_vmcs, including deleting it from its
-	 * current percpu list, complete before setting loaded_vmcs->cpu to
-	 * -1, otherwise a different cpu can see loaded_vmcs->cpu == -1 first
-	 * and add loaded_vmcs to its percpu list before it's deleted from this
-	 * cpu's list. Pairs with the smp_rmb() in vmx_vcpu_load_vmcs().
-	 */
-	smp_wmb();
-
-	loaded_vmcs->cpu = -1;
-	loaded_vmcs->launched = 0;
-}
-
-void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
-{
-	int cpu = loaded_vmcs->cpu;
-
-	if (cpu != -1)
-		smp_call_function_single(cpu,
-			 __loaded_vmcs_clear, loaded_vmcs, 1);
-}
-
 static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg,
 				       unsigned field)
 {
@@ -1411,8 +1321,9 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu,
 		 */
 		smp_rmb();
 
-		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
-			 &per_cpu(loaded_vmcss_on_cpu, cpu));
+		add_vmcs_to_loaded_vmcss_on_cpu(
+				&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
+				cpu);
 		local_irq_enable();
 	}
 
@@ -2765,78 +2676,6 @@ static int vmx_check_processor_compat(void)
 	return 0;
 }
 
-static int kvm_cpu_vmxon(u64 vmxon_pointer)
-{
-	u64 msr;
-
-	cr4_set_bits(X86_CR4_VMXE);
-
-	asm_volatile_goto("1: vmxon %[vmxon_pointer]\n\t"
-			  _ASM_EXTABLE(1b, %l[fault])
-			  : : [vmxon_pointer] "m"(vmxon_pointer)
-			  : : fault);
-	return 0;
-
-fault:
-	WARN_ONCE(1, "VMXON faulted, MSR_IA32_FEAT_CTL (0x3a) = 0x%llx\n",
-		  rdmsrl_safe(MSR_IA32_FEAT_CTL, &msr) ? 0xdeadbeef : msr);
-	cr4_clear_bits(X86_CR4_VMXE);
-
-	return -EFAULT;
-}
-
-static int vmx_hardware_enable(void)
-{
-	int cpu = raw_smp_processor_id();
-	u64 phys_addr = __pa(vac_get_vmxarea(cpu));
-	int r;
-
-	if (cr4_read_shadow() & X86_CR4_VMXE)
-		return -EBUSY;
-
-	/*
-	 * This can happen if we hot-added a CPU but failed to allocate
-	 * VP assist page for it.
-	 */
-	if (kvm_is_using_evmcs() && !hv_get_vp_assist_page(cpu))
-		return -EFAULT;
-
-	intel_pt_handle_vmx(1);
-
-	r = kvm_cpu_vmxon(phys_addr);
-	if (r) {
-		intel_pt_handle_vmx(0);
-		return r;
-	}
-
-	if (enable_ept)
-		ept_sync_global();
-
-	return 0;
-}
-
-static void vmclear_local_loaded_vmcss(void)
-{
-	int cpu = raw_smp_processor_id();
-	struct loaded_vmcs *v, *n;
-
-	list_for_each_entry_safe(v, n, &per_cpu(loaded_vmcss_on_cpu, cpu),
-				 loaded_vmcss_on_cpu_link)
-		__loaded_vmcs_clear(v);
-}
-
-static void vmx_hardware_disable(void)
-{
-	vmclear_local_loaded_vmcss();
-
-	if (kvm_cpu_vmxoff())
-		kvm_spurious_fault();
-
-	hv_reset_evmcs();
-
-	intel_pt_handle_vmx(0);
-}
-
 struct vmcs *alloc_vmcs_cpu(bool shadow, int cpu, gfp_t flags)
 {
 	int node = cpu_to_node(cpu);
@@ -8182,13 +8021,15 @@ static void __vmx_exit(void)
 {
 	allow_smaller_maxphyaddr = false;
 
-	cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
+	//TODO: Remove this exit call once VAC is a module
+	vac_vmx_exit();
 
 	vmx_cleanup_l1d_flush();
 }
 
 void vmx_module_exit(void)
 {
+	hv_reset_evmcs();
 	__vmx_exit();
 }
 
@@ -8603,7 +8444,7 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {
 
 int __init vmx_init(void)
 {
-	int r, cpu;
+	int r;
 
 	/*
 	 * Note, hv_init_evmcs() touches only VMX knobs, i.e. there's nothing
@@ -8626,13 +8467,8 @@ int __init vmx_init(void)
 	if (r)
 		goto err_l1d_flush;
 
-	for_each_possible_cpu(cpu) {
-		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
-
-		pi_init_cpu(cpu);
-	}
-
-	cpu_emergency_register_virt_callback(vmx_emergency_disable);
+	//TODO: Remove this init call once VAC is a module
+	vac_vmx_init();
 
 	vmx_check_vmcs12_offsets();
 
-- 
2.42.0.869.gea05f2083d-goog



* [RFC PATCH 11/14] KVM: SVM: Move SVM enable and disable into VAC
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (9 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 10/14] KVM: VMX: Move VMX enable and disable " Anish Ghulati
@ 2023-11-07 20:19 ` Anish Ghulati
  2023-11-07 20:20 ` [RFC PATCH 12/14] KVM: x86: Move VMX and SVM support checks " Anish Ghulati
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:19 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Move SVM's hardware enable and disable into VAC.

As with VMX, this requires temporary calls to new init and exit
functions within VAC, and moving svm_init_erratum_383 into svm_init,
since the erratum check only needs to run once at module load rather
than on every hardware enable.

Delete __svm_exit and make svm_module_exit a noop.

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/svm/sev.c      |   2 +-
 arch/x86/kvm/svm/svm.c      | 129 ++----------------------------------
 arch/x86/kvm/svm/svm.h      |   4 +-
 arch/x86/kvm/svm/svm_data.h |  23 +++++++
 arch/x86/kvm/svm/vac.c      | 116 ++++++++++++++++++++++++++++++++
 arch/x86/kvm/svm/vac.h      |  23 +++----
 arch/x86/kvm/vac.h          |  12 ++++
 arch/x86/kvm/vmx/vac.h      |   5 ++
 8 files changed, 175 insertions(+), 139 deletions(-)
 create mode 100644 arch/x86/kvm/svm/svm_data.h

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index b9a0a939d59f..d7b76710ab0a 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -28,6 +28,7 @@
 #include "mmu.h"
 #include "x86.h"
 #include "svm.h"
+#include "svm_data.h"
 #include "svm_ops.h"
 #include "cpuid.h"
 #include "trace.h"
@@ -68,7 +69,6 @@ module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
 static u8 sev_enc_bit;
 static DECLARE_RWSEM(sev_deactivate_lock);
 static DEFINE_MUTEX(sev_bitmap_lock);
-unsigned int max_sev_asid;
 static unsigned int min_sev_asid;
 static unsigned long sev_me_mask;
 static unsigned int nr_asids;
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d53808d8ec37..752f769c0333 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5,11 +5,11 @@
 #include "irq.h"
 #include "mmu.h"
 #include "kvm_cache_regs.h"
-#include "vac.h"
 #include "x86.h"
 #include "smm.h"
 #include "cpuid.h"
 #include "pmu.h"
+#include "vac.h"
 
 #include <linux/module.h>
 #include <linux/mod_devicetable.h>
@@ -68,12 +68,6 @@ static bool erratum_383_found __read_mostly;
 
 u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly;
 
-/*
- * Set osvw_len to higher value when updated Revision Guides
- * are published and we know what the new status bits are
- */
-static uint64_t osvw_len = 4, osvw_status;
-
 static DEFINE_PER_CPU(u64, current_tsc_ratio);
 
 #define X2APIC_MSR(x)	(APIC_BASE_MSR + (x >> 4))
@@ -211,9 +205,6 @@ module_param(vgif, int, 0444);
 static int lbrv = true;
 module_param(lbrv, int, 0444);
 
-static int tsc_scaling = true;
-module_param(tsc_scaling, int, 0444);
-
 /*
  * enable / disable AVIC.  Because the defaults differ for APICv
  * support between VMX and SVM we cannot use module_param_named.
@@ -584,106 +575,6 @@ static void __svm_write_tsc_multiplier(u64 multiplier)
 	__this_cpu_write(current_tsc_ratio, multiplier);
 }
 
-static inline void kvm_cpu_svm_disable(void)
-{
-	uint64_t efer;
-
-	wrmsrl(MSR_VM_HSAVE_PA, 0);
-	rdmsrl(MSR_EFER, efer);
-	if (efer & EFER_SVME) {
-		/*
-		 * Force GIF=1 prior to disabling SVM, e.g. to ensure INIT and
-		 * NMI aren't blocked.
-		 */
-		stgi();
-		wrmsrl(MSR_EFER, efer & ~EFER_SVME);
-	}
-}
-
-static void svm_emergency_disable(void)
-{
-	kvm_rebooting = true;
-
-	kvm_cpu_svm_disable();
-}
-
-static void svm_hardware_disable(void)
-{
-	/* Make sure we clean up behind us */
-	if (tsc_scaling)
-		__svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT);
-
-	kvm_cpu_svm_disable();
-
-	amd_pmu_disable_virt();
-}
-
-static int svm_hardware_enable(void)
-{
-
-	struct svm_cpu_data *sd;
-	uint64_t efer;
-	int me = raw_smp_processor_id();
-
-	rdmsrl(MSR_EFER, efer);
-	if (efer & EFER_SVME)
-		return -EBUSY;
-
-	sd = per_cpu_ptr(&svm_data, me);
-	sd->asid_generation = 1;
-	sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
-	sd->next_asid = sd->max_asid + 1;
-	sd->min_asid = max_sev_asid + 1;
-
-	wrmsrl(MSR_EFER, efer | EFER_SVME);
-
-	wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa);
-
-	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
-		/*
-		 * Set the default value, even if we don't use TSC scaling
-		 * to avoid having stale value in the msr
-		 */
-		__svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT);
-	}
-
-
-	/*
-	 * Get OSVW bits.
-	 *
-	 * Note that it is possible to have a system with mixed processor
-	 * revisions and therefore different OSVW bits. If bits are not the same
-	 * on different processors then choose the worst case (i.e. if erratum
-	 * is present on one processor and not on another then assume that the
-	 * erratum is present everywhere).
-	 */
-	if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) {
-		uint64_t len, status = 0;
-		int err;
-
-		len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &err);
-		if (!err)
-			status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS,
-						      &err);
-
-		if (err)
-			osvw_status = osvw_len = 0;
-		else {
-			if (len < osvw_len)
-				osvw_len = len;
-			osvw_status |= status;
-			osvw_status &= (1ULL << osvw_len) - 1;
-		}
-	} else
-		osvw_status = osvw_len = 0;
-
-	svm_init_erratum_383();
-
-	amd_pmu_enable_virt();
-
-	return 0;
-}
-
 static void svm_cpu_uninit(int cpu)
 {
 	struct svm_cpu_data *sd = per_cpu_ptr(&svm_data, cpu);
@@ -4878,14 +4769,9 @@ static int svm_vm_init(struct kvm *kvm)
 	return 0;
 }
 
-static void __svm_exit(void)
-{
-	cpu_emergency_unregister_virt_callback(svm_emergency_disable);
-}
-
 void svm_module_exit(void)
 {
-	__svm_exit();
+	return;
 }
 
 static struct kvm_x86_ops svm_x86_ops __initdata = {
@@ -5325,7 +5211,8 @@ int __init svm_init(void)
 	if (r)
 		return r;
 
-	cpu_emergency_register_virt_callback(svm_emergency_disable);
+	//TODO: Remove this init call once VAC is a module
+	vac_svm_init();
 
 	/*
 	 * Common KVM initialization _must_ come last, after this, /dev/kvm is
@@ -5334,11 +5221,9 @@ int __init svm_init(void)
 	r = kvm_init(sizeof(struct vcpu_svm), __alignof__(struct vcpu_svm),
 		     THIS_MODULE);
 	if (r)
-		goto err_kvm_init;
+		return r;
 
-	return 0;
+	svm_init_erratum_383();
 
-err_kvm_init:
-	__svm_exit();
-	return r;
+	return 0;
 }
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h
index 7fc652b1b92d..7bd0dc0e000f 100644
--- a/arch/x86/kvm/svm/svm.h
+++ b/arch/x86/kvm/svm/svm.h
@@ -24,7 +24,7 @@
 
 #include "cpuid.h"
 #include "kvm_cache_regs.h"
-#include "vac.h"
+#include "svm_data.h"
 
 #define __sme_page_pa(x) __sme_set(page_to_pfn(x) << PAGE_SHIFT)
 
@@ -651,8 +651,6 @@ void avic_refresh_virtual_apic_mode(struct kvm_vcpu *vcpu);
 #define GHCB_VERSION_MIN	1ULL
 
 
-extern unsigned int max_sev_asid;
-
 void sev_vm_destroy(struct kvm *kvm);
 int sev_mem_enc_ioctl(struct kvm *kvm, void __user *argp);
 int sev_mem_enc_register_region(struct kvm *kvm,
diff --git a/arch/x86/kvm/svm/svm_data.h b/arch/x86/kvm/svm/svm_data.h
new file mode 100644
index 000000000000..9605807fc9d4
--- /dev/null
+++ b/arch/x86/kvm/svm/svm_data.h
@@ -0,0 +1,23 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#ifndef ARCH_X86_KVM_SVM_DATA_H
+#define ARCH_X86_KVM_SVM_DATA_H
+
+struct svm_cpu_data {
+	u64 asid_generation;
+	u32 max_asid;
+	u32 next_asid;
+	u32 min_asid;
+
+	struct page *save_area;
+	unsigned long save_area_pa;
+
+	struct vmcb *current_vmcb;
+
+	/* index = sev_asid, value = vmcb pointer */
+	struct vmcb **sev_vmcbs;
+};
+
+extern unsigned int max_sev_asid;
+
+#endif // ARCH_X86_KVM_SVM_DATA_H
diff --git a/arch/x86/kvm/svm/vac.c b/arch/x86/kvm/svm/vac.c
index 3e79279c6b34..2dd1c763f7d6 100644
--- a/arch/x86/kvm/svm/vac.c
+++ b/arch/x86/kvm/svm/vac.c
@@ -1,7 +1,123 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#include <asm/reboot.h>
+#include <asm/svm.h>
 #include <linux/percpu-defs.h>
 
+#include "svm_ops.h"
 #include "vac.h"
 
 DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
+unsigned int max_sev_asid;
+
+static inline void kvm_cpu_svm_disable(void)
+{
+	uint64_t efer;
+
+	wrmsrl(MSR_VM_HSAVE_PA, 0);
+	rdmsrl(MSR_EFER, efer);
+	if (efer & EFER_SVME) {
+		/*
+		 * Force GIF=1 prior to disabling SVM, e.g. to ensure INIT and
+		 * NMI aren't blocked.
+		 */
+		stgi();
+		wrmsrl(MSR_EFER, efer & ~EFER_SVME);
+	}
+}
+
+static void svm_emergency_disable(void)
+{
+	kvm_rebooting = true;
+
+	kvm_cpu_svm_disable();
+}
+
+void svm_hardware_disable(void)
+{
+	/* Make sure we clean up behind us */
+	if (tsc_scaling) {
+		// TODO: Fix everything TSC
+		// __svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT);
+	}
+
+	kvm_cpu_svm_disable();
+
+	amd_pmu_disable_virt();
+}
+
+int svm_hardware_enable(void)
+{
+
+	struct svm_cpu_data *sd;
+	uint64_t efer;
+	int me = raw_smp_processor_id();
+
+	rdmsrl(MSR_EFER, efer);
+	if (efer & EFER_SVME)
+		return -EBUSY;
+
+	sd = per_cpu_ptr(&svm_data, me);
+	sd->asid_generation = 1;
+	sd->max_asid = cpuid_ebx(SVM_CPUID_FUNC) - 1;
+	sd->next_asid = sd->max_asid + 1;
+	sd->min_asid = max_sev_asid + 1;
+
+	wrmsrl(MSR_EFER, efer | EFER_SVME);
+
+	wrmsrl(MSR_VM_HSAVE_PA, sd->save_area_pa);
+
+	if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) {
+		/*
+		 * Set the default value, even if we don't use TSC scaling
+		 * to avoid having stale value in the msr
+		 */
+		// TODO: Fix everything TSC
+		// __svm_write_tsc_multiplier(SVM_TSC_RATIO_DEFAULT);
+	}
+
+
+	/*
+	 * Get OSVW bits.
+	 *
+	 * Note that it is possible to have a system with mixed processor
+	 * revisions and therefore different OSVW bits. If bits are not the same
+	 * on different processors then choose the worst case (i.e. if erratum
+	 * is present on one processor and not on another then assume that the
+	 * erratum is present everywhere).
+	 */
+	if (cpu_has(&boot_cpu_data, X86_FEATURE_OSVW)) {
+		uint64_t len, status = 0;
+		int err;
+
+		len = native_read_msr_safe(MSR_AMD64_OSVW_ID_LENGTH, &err);
+		if (!err)
+			status = native_read_msr_safe(MSR_AMD64_OSVW_STATUS,
+						      &err);
+
+		if (err)
+			osvw_status = osvw_len = 0;
+		else {
+			if (len < osvw_len)
+				osvw_len = len;
+			osvw_status |= status;
+			osvw_status &= (1ULL << osvw_len) - 1;
+		}
+	} else
+		osvw_status = osvw_len = 0;
+
+	amd_pmu_enable_virt();
+
+	return 0;
+}
+
+int __init vac_svm_init(void)
+{
+	cpu_emergency_register_virt_callback(svm_emergency_disable);
+
+	return 0;
+}
+
+void vac_svm_exit(void)
+{
+	cpu_emergency_unregister_virt_callback(svm_emergency_disable);
+}
diff --git a/arch/x86/kvm/svm/vac.h b/arch/x86/kvm/svm/vac.h
index 2d42e4472703..870cb8a9c8d2 100644
--- a/arch/x86/kvm/svm/vac.h
+++ b/arch/x86/kvm/svm/vac.h
@@ -1,23 +1,20 @@
 // SPDX-License-Identifier: GPL-2.0-only
-//
+
 #ifndef ARCH_X86_KVM_SVM_VAC_H
 #define ARCH_X86_KVM_SVM_VAC_H
 
 #include "../vac.h"
+#include "svm_data.h"
 
-struct svm_cpu_data {
-	u64 asid_generation;
-	u32 max_asid;
-	u32 next_asid;
-	u32 min_asid;
-
-	struct page *save_area;
-	unsigned long save_area_pa;
+static int tsc_scaling = true;
 
-	struct vmcb *current_vmcb;
+/*
+ * Set osvw_len to higher value when updated Revision Guides
+ * are published and we know what the new status bits are
+ */
+static uint64_t osvw_len = 4, osvw_status;
 
-	/* index = sev_asid, value = vmcb pointer */
-	struct vmcb **sev_vmcbs;
-};
+int svm_hardware_enable(void);
+void svm_hardware_disable(void);
 
 #endif // ARCH_X86_KVM_SVM_VAC_H
diff --git a/arch/x86/kvm/vac.h b/arch/x86/kvm/vac.h
index 59cbf36ff8ce..6c0a480ee9e3 100644
--- a/arch/x86/kvm/vac.h
+++ b/arch/x86/kvm/vac.h
@@ -19,6 +19,18 @@ int __init vac_vmx_init(void)
 void vac_vmx_exit(void) {}
 #endif
 
+#ifdef CONFIG_KVM_AMD
+int __init vac_svm_init(void);
+void vac_svm_exit(void);
+#else
+int __init vac_svm_init(void)
+{
+	return 0;
+}
+void vac_svm_exit(void) {}
+#endif
+
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
diff --git a/arch/x86/kvm/vmx/vac.h b/arch/x86/kvm/vmx/vac.h
index daeea8ef0d33..d5af0ca67e3f 100644
--- a/arch/x86/kvm/vmx/vac.h
+++ b/arch/x86/kvm/vmx/vac.h
@@ -1,5 +1,8 @@
 // SPDX-License-Identifier: GPL-2.0-only
 
+#ifndef ARCH_X86_KVM_VMX_VAC_H
+#define ARCH_X86_KVM_VMX_VAC_H
+
 #include <asm/vmx.h>
 
 #include "../vac.h"
@@ -15,3 +18,5 @@ void add_vmcs_to_loaded_vmcss_on_cpu(
 void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs);
 int vmx_hardware_enable(void);
 void vmx_hardware_disable(void);
+
+#endif // ARCH_X86_KVM_VMX_VAC_H
-- 
2.42.0.869.gea05f2083d-goog



* [RFC PATCH 12/14] KVM: x86: Move VMX and SVM support checks into VAC
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (10 preceding siblings ...)
  2023-11-07 20:19 ` [RFC PATCH 11/14] KVM: SVM: Move SVM " Anish Ghulati
@ 2023-11-07 20:20 ` Anish Ghulati
  2023-11-07 20:20 ` [RFC PATCH 13/14] KVM: x86: VAC: Move all hardware enable/disable code " Anish Ghulati
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:20 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

The kvm_is_{vmx,svm}_supported() helpers check whether VMX or SVM is
supported on the host. Move them into VAC because they need to be
called before each hardware enable and disable call.
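
For illustration, a minimal sketch of how VAC uses these checks to pick
the vendor path before touching hardware (the next patch adds the real
version, which also handles user-return MSRs):

  int kvm_arch_hardware_enable(void)
  {
  	if (kvm_is_vmx_supported())
  		return vmx_hardware_enable();
  	if (kvm_is_svm_supported())
  		return svm_hardware_enable();
  	return -EIO;
  }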

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/svm/svm.c | 45 +-----------------------------------------
 arch/x86/kvm/svm/vac.c | 43 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vac.h     |  4 ++++
 arch/x86/kvm/vmx/vac.c | 29 +++++++++++++++++++++++++++
 arch/x86/kvm/vmx/vmx.c | 31 +----------------------------
 arch/x86/kvm/x86.h     |  6 ------
 6 files changed, 78 insertions(+), 80 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 752f769c0333..df5673c98e7b 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -515,52 +515,9 @@ static void svm_init_osvw(struct kvm_vcpu *vcpu)
 		vcpu->arch.osvw.status |= 1;
 }
 
-static bool __kvm_is_svm_supported(void)
-{
-	int cpu = smp_processor_id();
-	struct cpuinfo_x86 *c = &cpu_data(cpu);
-
-	u64 vm_cr;
-
-	if (c->x86_vendor != X86_VENDOR_AMD &&
-	    c->x86_vendor != X86_VENDOR_HYGON) {
-		pr_err("CPU %d isn't AMD or Hygon\n", cpu);
-		return false;
-	}
-
-	if (!cpu_has(c, X86_FEATURE_SVM)) {
-		pr_err("SVM not supported by CPU %d\n", cpu);
-		return false;
-	}
-
-	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
-		pr_info("KVM is unsupported when running as an SEV guest\n");
-		return false;
-	}
-
-	rdmsrl(MSR_VM_CR, vm_cr);
-	if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE)) {
-		pr_err("SVM disabled (by BIOS) in MSR_VM_CR on CPU %d\n", cpu);
-		return false;
-	}
-
-	return true;
-}
-
-bool kvm_is_svm_supported(void)
-{
-	bool supported;
-
-	migrate_disable();
-	supported = __kvm_is_svm_supported();
-	migrate_enable();
-
-	return supported;
-}
-
 static int svm_check_processor_compat(void)
 {
-	if (!__kvm_is_svm_supported())
+	if (!kvm_is_svm_supported())
 		return -EIO;
 
 	return 0;
diff --git a/arch/x86/kvm/svm/vac.c b/arch/x86/kvm/svm/vac.c
index 2dd1c763f7d6..7c4db99ca7d5 100644
--- a/arch/x86/kvm/svm/vac.c
+++ b/arch/x86/kvm/svm/vac.c
@@ -10,6 +10,49 @@
 DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
 unsigned int max_sev_asid;
 
+static bool __kvm_is_svm_supported(void)
+{
+	int cpu = smp_processor_id();
+	struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+	u64 vm_cr;
+
+	if (c->x86_vendor != X86_VENDOR_AMD &&
+	    c->x86_vendor != X86_VENDOR_HYGON) {
+		pr_err("CPU %d isn't AMD or Hygon\n", cpu);
+		return false;
+	}
+
+	if (!cpu_has(c, X86_FEATURE_SVM)) {
+		pr_err("SVM not supported by CPU %d\n", cpu);
+		return false;
+	}
+
+	if (cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
+		pr_info("KVM is unsupported when running as an SEV guest\n");
+		return false;
+	}
+
+	rdmsrl(MSR_VM_CR, vm_cr);
+	if (vm_cr & (1 << SVM_VM_CR_SVM_DISABLE)) {
+		pr_err("SVM disabled (by BIOS) in MSR_VM_CR on CPU %d\n", cpu);
+		return false;
+	}
+
+	return true;
+}
+
+bool kvm_is_svm_supported(void)
+{
+	bool supported;
+
+	migrate_disable();
+	supported = __kvm_is_svm_supported();
+	migrate_enable();
+
+	return supported;
+}
+
 static inline void kvm_cpu_svm_disable(void)
 {
 	uint64_t efer;
diff --git a/arch/x86/kvm/vac.h b/arch/x86/kvm/vac.h
index 6c0a480ee9e3..5be30cce5a1c 100644
--- a/arch/x86/kvm/vac.h
+++ b/arch/x86/kvm/vac.h
@@ -9,9 +9,11 @@ int __init vac_init(void);
 void vac_exit(void);
 
 #ifdef CONFIG_KVM_INTEL
+bool kvm_is_vmx_supported(void);
 int __init vac_vmx_init(void);
 void vac_vmx_exit(void);
 #else
+bool kvm_is_vmx_supported(void) { return false; }
 int __init vac_vmx_init(void)
 {
 	return 0;
@@ -20,9 +22,11 @@ void vac_vmx_exit(void) {}
 #endif
 
 #ifdef CONFIG_KVM_AMD
+bool kvm_is_svm_supported(void);
 int __init vac_svm_init(void);
 void vac_svm_exit(void);
 #else
+bool kvm_is_svm_supported(void) { return false; }
 int __init vac_svm_init(void)
 {
 	return 0;
diff --git a/arch/x86/kvm/vmx/vac.c b/arch/x86/kvm/vmx/vac.c
index 202686ccbaec..cdfdeb67a719 100644
--- a/arch/x86/kvm/vmx/vac.c
+++ b/arch/x86/kvm/vmx/vac.c
@@ -31,6 +31,35 @@ struct vmcs *vac_get_vmxarea(int cpu)
 static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
 static DEFINE_SPINLOCK(vmx_vpid_lock);
 
+static bool __kvm_is_vmx_supported(void)
+{
+	int cpu = smp_processor_id();
+
+	if (!(cpuid_ecx(1) & feature_bit(VMX))) {
+		pr_err("VMX not supported by CPU %d\n", cpu);
+		return false;
+	}
+
+	if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
+	    !this_cpu_has(X86_FEATURE_VMX)) {
+		pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL on CPU %d\n", cpu);
+		return false;
+	}
+
+	return true;
+}
+
+bool kvm_is_vmx_supported(void)
+{
+	bool supported;
+
+	migrate_disable();
+	supported = __kvm_is_vmx_supported();
+	migrate_enable();
+
+	return supported;
+}
+
 int allocate_vpid(void)
 {
 	int vpid;
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 46e2d5c69d1d..6301b49e0e80 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -2625,42 +2625,13 @@ static int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	return 0;
 }
 
-static bool __kvm_is_vmx_supported(void)
-{
-	int cpu = smp_processor_id();
-
-	if (!(cpuid_ecx(1) & feature_bit(VMX))) {
-		pr_err("VMX not supported by CPU %d\n", cpu);
-		return false;
-	}
-
-	if (!this_cpu_has(X86_FEATURE_MSR_IA32_FEAT_CTL) ||
-	    !this_cpu_has(X86_FEATURE_VMX)) {
-		pr_err("VMX not enabled (by BIOS) in MSR_IA32_FEAT_CTL on CPU %d\n", cpu);
-		return false;
-	}
-
-	return true;
-}
-
-bool kvm_is_vmx_supported(void)
-{
-	bool supported;
-
-	migrate_disable();
-	supported = __kvm_is_vmx_supported();
-	migrate_enable();
-
-	return supported;
-}
-
 static int vmx_check_processor_compat(void)
 {
 	int cpu = raw_smp_processor_id();
 	struct vmcs_config vmcs_conf;
 	struct vmx_capability vmx_cap;
 
-	if (!__kvm_is_vmx_supported())
+	if (!kvm_is_vmx_supported())
 		return -EIO;
 
 	if (setup_vmcs_config(&vmcs_conf, &vmx_cap) < 0) {
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 322be05e6c5b..1da8efcd3e9c 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -10,18 +10,12 @@
 #include "kvm_emulate.h"
 
 #ifdef CONFIG_KVM_AMD
-bool kvm_is_svm_supported(void);
 int __init svm_init(void);
 void svm_module_exit(void);
-#else
-bool kvm_is_svm_supported(void) { return false; }
 #endif
 #ifdef CONFIG_KVM_INTEL
-bool kvm_is_vmx_supported(void);
 int __init vmx_init(void);
 void vmx_module_exit(void);
-#else
-bool kvm_is_vmx_supported(void) { return false; }
 #endif
 
 struct kvm_caps {
-- 
2.42.0.869.gea05f2083d-goog



* [RFC PATCH 13/14] KVM: x86: VAC: Move all hardware enable/disable code into VAC
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (11 preceding siblings ...)
  2023-11-07 20:20 ` [RFC PATCH 12/14] KVM: x86: Move VMX and SVM support checks " Anish Ghulati
@ 2023-11-07 20:20 ` Anish Ghulati
  2023-11-07 20:20 ` [RFC PATCH 14/14] KVM: VAC: Bring up VAC as a new module Anish Ghulati
  2023-11-17  8:53 ` [RFC PATCH 00/14] Support multiple KVM modules on the same host Lai Jiangshan
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:20 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

De-indirect hardware enable and disable. Now that all of these functions
live in VAC, they no longer need to be called via an indirect ops table
and static calls.
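
For reference, the change in calling convention boils down to roughly
the following (simplified from the diff below):

  /* Before: indirect dispatch through kvm_x86_ops and static calls. */
  ret = static_call(kvm_x86_hardware_enable)();

  /* After: direct calls inside VAC, keyed off the support checks. */
  if (kvm_is_vmx_supported())
  	ret = vmx_hardware_enable();
  else if (kvm_is_svm_supported())
  	ret = svm_hardware_enable();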

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/include/asm/kvm-x86-ops.h |   2 -
 arch/x86/kvm/svm/svm.c             |   2 -
 arch/x86/kvm/svm/svm_ops.h         |   1 +
 arch/x86/kvm/vac.c                 |  46 +++++++++++-
 arch/x86/kvm/vac.h                 |   9 ++-
 arch/x86/kvm/vmx/vmx.c             |   2 -
 arch/x86/kvm/vmx/vmx_ops.h         |   1 +
 arch/x86/kvm/x86.c                 | 117 -----------------------------
 arch/x86/kvm/x86.h                 |   2 -
 include/linux/kvm_host.h           |   2 +
 virt/kvm/vac.h                     |   5 ++
 11 files changed, 58 insertions(+), 131 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 764be4a26a0c..340dcae9dd32 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -16,8 +16,6 @@ BUILD_BUG_ON(1)
  */
 KVM_X86_OP(vendor_exit)
 KVM_X86_OP(check_processor_compatibility)
-KVM_X86_OP(hardware_enable)
-KVM_X86_OP(hardware_disable)
 KVM_X86_OP(hardware_unsetup)
 KVM_X86_OP(has_emulated_msr)
 KVM_X86_OP(vcpu_after_set_cpuid)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index df5673c98e7b..fb2c72430c7a 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4739,8 +4739,6 @@ static struct kvm_x86_ops svm_x86_ops __initdata = {
 	.check_processor_compatibility = svm_check_processor_compat,
 
 	.hardware_unsetup = svm_hardware_unsetup,
-	.hardware_enable = svm_hardware_enable,
-	.hardware_disable = svm_hardware_disable,
 	.has_emulated_msr = svm_has_emulated_msr,
 
 	.vcpu_create = svm_vcpu_create,
diff --git a/arch/x86/kvm/svm/svm_ops.h b/arch/x86/kvm/svm/svm_ops.h
index 36c8af87a707..5e89c06b5147 100644
--- a/arch/x86/kvm/svm/svm_ops.h
+++ b/arch/x86/kvm/svm/svm_ops.h
@@ -4,6 +4,7 @@
 
 #include <linux/compiler_types.h>
 
+#include "../vac.h"
 #include "x86.h"
 
 #define svm_asm(insn, clobber...)				\
diff --git a/arch/x86/kvm/vac.c b/arch/x86/kvm/vac.c
index ab77aee4e1fa..79f5c2ac159a 100644
--- a/arch/x86/kvm/vac.c
+++ b/arch/x86/kvm/vac.c
@@ -3,6 +3,8 @@
 #include "vac.h"
 #include <asm/msr.h>
 
+extern bool kvm_rebooting;
+
 u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
 struct kvm_user_return_msrs __percpu *user_return_msrs;
 
@@ -35,7 +37,7 @@ void kvm_on_user_return(struct user_return_notifier *urn)
 	}
 }
 
-void kvm_user_return_msr_cpu_online(void)
+static void kvm_user_return_msr_cpu_online(void)
 {
 	unsigned int cpu = smp_processor_id();
 	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
@@ -49,7 +51,7 @@ void kvm_user_return_msr_cpu_online(void)
 	}
 }
 
-void drop_user_return_notifiers(void)
+static void drop_user_return_notifiers(void)
 {
 	unsigned int cpu = smp_processor_id();
 	struct kvm_user_return_msrs *msrs = per_cpu_ptr(user_return_msrs, cpu);
@@ -117,6 +119,46 @@ int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask)
 	return 0;
 }
 
+int kvm_arch_hardware_enable(void)
+{
+	int ret = -EIO;
+
+	kvm_user_return_msr_cpu_online();
+
+	if (kvm_is_vmx_supported())
+		ret = vmx_hardware_enable();
+	else if (kvm_is_svm_supported())
+		ret = svm_hardware_enable();
+	if (ret != 0)
+		return ret;
+
+	// TODO: Handle unstable TSC
+
+	return 0;
+}
+
+void kvm_arch_hardware_disable(void)
+{
+	if (kvm_is_vmx_supported())
+		vmx_hardware_disable();
+	else if (kvm_is_svm_supported())
+		svm_hardware_disable();
+	drop_user_return_notifiers();
+}
+
+/*
+ * Handle a fault on a hardware virtualization (VMX or SVM) instruction.
+ *
+ * Hardware virtualization extension instructions may fault if a reboot turns
+ * off virtualization while processes are running.  Usually after catching the
+ * fault we just panic; during reboot instead the instruction is ignored.
+ */
+noinstr void kvm_spurious_fault(void)
+{
+	/* Fault while not rebooting.  We want the trace. */
+	BUG_ON(!kvm_rebooting);
+}
+
 int kvm_alloc_user_return_msrs(void)
 {
 	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
diff --git a/arch/x86/kvm/vac.h b/arch/x86/kvm/vac.h
index 5be30cce5a1c..daf1f137d196 100644
--- a/arch/x86/kvm/vac.h
+++ b/arch/x86/kvm/vac.h
@@ -5,13 +5,14 @@
 
 #include <linux/user-return-notifier.h>
 
-int __init vac_init(void);
-void vac_exit(void);
+void kvm_spurious_fault(void);
 
 #ifdef CONFIG_KVM_INTEL
 bool kvm_is_vmx_supported(void);
 int __init vac_vmx_init(void);
 void vac_vmx_exit(void);
+int vmx_hardware_enable(void);
+void vmx_hardware_disable(void);
 #else
 bool kvm_is_vmx_supported(void) { return false; }
 int __init vac_vmx_init(void)
@@ -25,6 +26,8 @@ void vac_vmx_exit(void) {}
 bool kvm_is_svm_supported(void);
 int __init vac_svm_init(void);
 void vac_svm_exit(void);
+int svm_hardware_enable(void);
+void svm_hardware_disable(void);
 #else
 bool kvm_is_svm_supported(void) { return false; }
 int __init vac_svm_init(void)
@@ -59,8 +62,6 @@ int kvm_add_user_return_msr(u32 msr);
 int kvm_find_user_return_msr(u32 msr);
 int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask);
 void kvm_on_user_return(struct user_return_notifier *urn);
-void kvm_user_return_msr_cpu_online(void);
-void drop_user_return_notifiers(void);
 
 static inline bool kvm_is_supported_user_return_msr(u32 msr)
 {
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6301b49e0e80..69a6a8591996 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8013,8 +8013,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.hardware_unsetup = vmx_hardware_unsetup,
 
-	.hardware_enable = vmx_hardware_enable,
-	.hardware_disable = vmx_hardware_disable,
 	.has_emulated_msr = vmx_has_emulated_msr,
 
 	.vm_size = sizeof(struct kvm_vmx),
diff --git a/arch/x86/kvm/vmx/vmx_ops.h b/arch/x86/kvm/vmx/vmx_ops.h
index 33af7b4c6eb4..60325fd39120 100644
--- a/arch/x86/kvm/vmx/vmx_ops.h
+++ b/arch/x86/kvm/vmx/vmx_ops.h
@@ -8,6 +8,7 @@
 
 #include "hyperv.h"
 #include "vmcs.h"
+#include "../vac.h"
 #include "../x86.h"
 
 void vmread_error(unsigned long field);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7466a5945147..a74139061e4d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -370,19 +370,6 @@ int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 	return 0;
 }
 
-/*
- * Handle a fault on a hardware virtualization (VMX or SVM) instruction.
- *
- * Hardware virtualization extension instructions may fault if a reboot turns
- * off virtualization while processes are running.  Usually after catching the
- * fault we just panic; during reboot instead the instruction is ignored.
- */
-noinstr void kvm_spurious_fault(void)
-{
-	/* Fault while not rebooting.  We want the trace. */
-	BUG_ON(!kvm_rebooting);
-}
-
 #define EXCPT_BENIGN		0
 #define EXCPT_CONTRIBUTORY	1
 #define EXCPT_PF		2
@@ -9363,7 +9350,6 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	return 0;
 
 out_unwind_ops:
-	kvm_x86_ops.hardware_enable = NULL;
 	static_call(kvm_x86_hardware_unsetup)();
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
@@ -9414,7 +9400,6 @@ void kvm_x86_vendor_exit(void)
 	WARN_ON(static_branch_unlikely(&kvm_xen_enabled.key));
 #endif
 	mutex_lock(&vendor_module_lock);
-	kvm_x86_ops.hardware_enable = NULL;
 	mutex_unlock(&vendor_module_lock);
 }
 
@@ -11952,108 +11937,6 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
 	kvm_rip_write(vcpu, 0);
 }
 
-int kvm_arch_hardware_enable(void)
-{
-	struct kvm *kvm;
-	struct kvm_vcpu *vcpu;
-	unsigned long i;
-	int ret;
-	u64 local_tsc;
-	u64 max_tsc = 0;
-	bool stable, backwards_tsc = false;
-
-	kvm_user_return_msr_cpu_online();
-
-	ret = kvm_x86_check_processor_compatibility();
-	if (ret)
-		return ret;
-
-	ret = static_call(kvm_x86_hardware_enable)();
-	if (ret != 0)
-		return ret;
-
-	local_tsc = rdtsc();
-	stable = !kvm_check_tsc_unstable();
-	list_for_each_entry(kvm, &vm_list, vm_list) {
-		kvm_for_each_vcpu(i, vcpu, kvm) {
-			if (!stable && vcpu->cpu == smp_processor_id())
-				kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu);
-			if (stable && vcpu->arch.last_host_tsc > local_tsc) {
-				backwards_tsc = true;
-				if (vcpu->arch.last_host_tsc > max_tsc)
-					max_tsc = vcpu->arch.last_host_tsc;
-			}
-		}
-	}
-
-	/*
-	 * Sometimes, even reliable TSCs go backwards.  This happens on
-	 * platforms that reset TSC during suspend or hibernate actions, but
-	 * maintain synchronization.  We must compensate.  Fortunately, we can
-	 * detect that condition here, which happens early in CPU bringup,
-	 * before any KVM threads can be running.  Unfortunately, we can't
-	 * bring the TSCs fully up to date with real time, as we aren't yet far
-	 * enough into CPU bringup that we know how much real time has actually
-	 * elapsed; our helper function, ktime_get_boottime_ns() will be using boot
-	 * variables that haven't been updated yet.
-	 *
-	 * So we simply find the maximum observed TSC above, then record the
-	 * adjustment to TSC in each VCPU.  When the VCPU later gets loaded,
-	 * the adjustment will be applied.  Note that we accumulate
-	 * adjustments, in case multiple suspend cycles happen before some VCPU
-	 * gets a chance to run again.  In the event that no KVM threads get a
-	 * chance to run, we will miss the entire elapsed period, as we'll have
-	 * reset last_host_tsc, so VCPUs will not have the TSC adjusted and may
-	 * loose cycle time.  This isn't too big a deal, since the loss will be
-	 * uniform across all VCPUs (not to mention the scenario is extremely
-	 * unlikely). It is possible that a second hibernate recovery happens
-	 * much faster than a first, causing the observed TSC here to be
-	 * smaller; this would require additional padding adjustment, which is
-	 * why we set last_host_tsc to the local tsc observed here.
-	 *
-	 * N.B. - this code below runs only on platforms with reliable TSC,
-	 * as that is the only way backwards_tsc is set above.  Also note
-	 * that this runs for ALL vcpus, which is not a bug; all VCPUs should
-	 * have the same delta_cyc adjustment applied if backwards_tsc
-	 * is detected.  Note further, this adjustment is only done once,
-	 * as we reset last_host_tsc on all VCPUs to stop this from being
-	 * called multiple times (one for each physical CPU bringup).
-	 *
-	 * Platforms with unreliable TSCs don't have to deal with this, they
-	 * will be compensated by the logic in vcpu_load, which sets the TSC to
-	 * catchup mode.  This will catchup all VCPUs to real time, but cannot
-	 * guarantee that they stay in perfect synchronization.
-	 */
-	if (backwards_tsc) {
-		u64 delta_cyc = max_tsc - local_tsc;
-		list_for_each_entry(kvm, &vm_list, vm_list) {
-			kvm->arch.backwards_tsc_observed = true;
-			kvm_for_each_vcpu(i, vcpu, kvm) {
-				vcpu->arch.tsc_offset_adjustment += delta_cyc;
-				vcpu->arch.last_host_tsc = local_tsc;
-				kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu);
-			}
-
-			/*
-			 * We have to disable TSC offset matching.. if you were
-			 * booting a VM while issuing an S4 host suspend....
-			 * you may have some problem.  Solving this issue is
-			 * left as an exercise to the reader.
-			 */
-			kvm->arch.last_tsc_nsec = 0;
-			kvm->arch.last_tsc_write = 0;
-		}
-
-	}
-	return 0;
-}
-
-void kvm_arch_hardware_disable(void)
-{
-	static_call(kvm_x86_hardware_disable)();
-	drop_user_return_notifiers();
-}
-
 bool kvm_vcpu_is_reset_bsp(struct kvm_vcpu *vcpu)
 {
 	return vcpu->kvm->arch.bsp_vcpu_id == vcpu->vcpu_id;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 1da8efcd3e9c..17ff3917b9a8 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -40,8 +40,6 @@ struct kvm_caps {
 	u64 supported_perf_cap;
 };
 
-void kvm_spurious_fault(void);
-
 #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)		\
 ({									\
 	bool failed = (consistency_check);				\
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f0afe549c0d6..d26671682764 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1467,9 +1467,11 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
 #endif
 
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+#ifndef CONFIG_X86
 int kvm_arch_hardware_enable(void);
 void kvm_arch_hardware_disable(void);
 #endif
+#endif
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/vac.h b/virt/kvm/vac.h
index f3e7b08168df..b5159fa3f18d 100644
--- a/virt/kvm/vac.h
+++ b/virt/kvm/vac.h
@@ -13,6 +13,11 @@ int kvm_offline_cpu(unsigned int cpu);
 void hardware_disable_all(void);
 int hardware_enable_all(void);
 
+#ifdef CONFIG_X86
+int kvm_arch_hardware_enable(void);
+void kvm_arch_hardware_disable(void);
+#endif
+
 extern struct notifier_block kvm_reboot_notifier;
 
 extern struct syscore_ops kvm_syscore_ops;
-- 
2.42.0.869.gea05f2083d-goog



* [RFC PATCH 14/14] KVM: VAC: Bring up VAC as a new module
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (12 preceding siblings ...)
  2023-11-07 20:20 ` [RFC PATCH 13/14] KVM: x86: VAC: Move all hardware enable/disable code " Anish Ghulati
@ 2023-11-07 20:20 ` Anish Ghulati
  2023-11-17  8:53 ` [RFC PATCH 00/14] Support multiple KVM modules on the same host Lai Jiangshan
  14 siblings, 0 replies; 22+ messages in thread
From: Anish Ghulati @ 2023-11-07 20:20 UTC (permalink / raw)
  To: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland
  Cc: Anish Ghulati

Add Kconfig options to build VAC as a new module. Make KVM dependent on
this new "base" module.

Add another Kconfig option to accept a postfix ID for multi-KVM. This
option allows appending a postfix to the KVM module and character device
names to distinguish multiple KVMs running on the same host. The default
is an empty string.

Opportunistically fix indentation in the Makefile.

Resolve TODOs by moving vac_vmx/svm_init/exit() calls to the module_init
and module_exit functions of VAC.

Add exports for VAC data and functions that are required in KVM. Also
make some functions private now that they are not needed outside VAC.

TODO: Fix the module name, which is currently set to kvm-vac.
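
As a hypothetical usage example (the postfix value below is made up),
a .config fragment such as:

  CONFIG_VAC=m
  CONFIG_KVM=m
  CONFIG_KVM_ID="2"

is intended to build kvm-vac.ko plus a kvm2.ko exposing /dev/kvm2, so
that a second KVM built with a different postfix can be loaded alongside
it.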

Signed-off-by: Anish Ghulati <aghulati@google.com>
---
 arch/x86/kvm/Kconfig      | 17 ++++++++++++++++
 arch/x86/kvm/Makefile     | 29 +++++++++++++++------------
 arch/x86/kvm/svm/svm.c    |  3 ---
 arch/x86/kvm/svm/vac.c    |  8 +++++++-
 arch/x86/kvm/vac.c        | 42 +++++++++++++++++++++++++++++++++++++--
 arch/x86/kvm/vac.h        |  2 --
 arch/x86/kvm/vmx/nested.c |  5 +++--
 arch/x86/kvm/vmx/vac.c    | 25 ++++++++++++++++++++---
 arch/x86/kvm/vmx/vac.h    |  2 --
 arch/x86/kvm/vmx/vmx.c    | 14 +++++--------
 arch/x86/kvm/x86.c        |  7 -------
 virt/kvm/Makefile.kvm     | 15 +++++++-------
 virt/kvm/kvm_main.c       |  7 ++++---
 virt/kvm/vac.c            |  7 +++++++
 14 files changed, 128 insertions(+), 55 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index adfa57d59643..42a0a0107572 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -17,10 +17,19 @@ menuconfig VIRTUALIZATION
 
 if VIRTUALIZATION
 
+config VAC
+	tristate "Virtualization Acceleration Component (VAC)"
+	select KVM_GENERIC_HARDWARE_ENABLING
+	help
+	  Support running multiple KVM modules on the same host. If VAC is not
+	  selected to run as a separate module, it will run as part of KVM, and
+	  the system will only support a single KVM.
+
 config KVM
 	tristate "Kernel-based Virtual Machine (KVM) support"
 	depends on HIGH_RES_TIMERS
 	depends on X86_LOCAL_APIC
+	depends on VAC
 	select PREEMPT_NOTIFIERS
 	select MMU_NOTIFIER
 	select HAVE_KVM_IRQCHIP
@@ -60,6 +69,14 @@ config KVM
 
 	  If unsure, say N.
 
+config KVM_ID
+	string "KVM: Postfix ID for multi-KVM"
+	depends on KVM
+	default ""
+	help
+	  This is the postfix string to append to the KVM module and
+	  character device names to differentiate multiple KVM builds.
+
 config KVM_WERROR
 	bool "Compile KVM with -Werror"
 	# KASAN may cause the build to fail due to larger frames
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index b3de4bd7988f..48d263ecaffa 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -8,33 +8,36 @@ endif
 
 include $(srctree)/virt/kvm/Makefile.kvm
 
-kvm-y			+= x86.o emulate.o i8259.o irq.o lapic.o \
+kvm-vac-y                   := $(KVM)/vac.o vac.o
+kvm-vac-$(CONFIG_KVM_INTEL) += vmx/vac.o
+kvm-vac-$(CONFIG_KVM_AMD)   += svm/vac.o
+
+kvm$(CONFIG_KVM_ID)-y	   += x86.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
 			   hyperv.o debugfs.o mmu/mmu.o mmu/page_track.o \
 			   mmu/spte.o
 
-kvm-y                   += vac.o vmx/vac.o svm/vac.o
-
 ifdef CONFIG_HYPERV
-kvm-y			+= kvm_onhyperv.o
+kvm$(CONFIG_KVM_ID)-y	    += kvm_onhyperv.o
 endif
 
-kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
-kvm-$(CONFIG_KVM_XEN)	+= xen.o
-kvm-$(CONFIG_KVM_SMM)	+= smm.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_X86_64)    	+= mmu/tdp_iter.o mmu/tdp_mmu.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_KVM_XEN)		+= xen.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_KVM_SMM)		+= smm.o
 
-kvm-$(CONFIG_KVM_INTEL)	+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
-			   vmx/hyperv.o vmx/nested.o vmx/posted_intr.o
-kvm-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_KVM_INTEL)		+= vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
+						vmx/hyperv.o vmx/nested.o vmx/posted_intr.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_X86_SGX_KVM)	+= vmx/sgx.o
 
-kvm-$(CONFIG_KVM_AMD)	+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
-			   svm/sev.o svm/hyperv.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_KVM_AMD)		+= svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o \
+						svm/sev.o svm/hyperv.o
 
 ifdef CONFIG_HYPERV
 kvm-$(CONFIG_KVM_AMD)	+= svm/svm_onhyperv.o
 endif
 
-obj-$(CONFIG_KVM)	+= kvm.o
+obj-$(CONFIG_VAC)       += kvm-vac.o
+obj-$(CONFIG_KVM)	+= kvm$(CONFIG_KVM_ID).o
 
 AFLAGS_svm/vmenter.o    := -iquote $(obj)
 $(obj)/svm/vmenter.o: $(obj)/kvm-asm-offsets.h
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index fb2c72430c7a..6b9f81fc84db 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5166,9 +5166,6 @@ int __init svm_init(void)
 	if (r)
 		return r;
 
-	//TODO: Remove this init call once VAC is a module
-	vac_svm_init();
-
 	/*
 	 * Common KVM initialization _must_ come last, after this, /dev/kvm is
 	 * exposed to userspace!
diff --git a/arch/x86/kvm/svm/vac.c b/arch/x86/kvm/svm/vac.c
index 7c4db99ca7d5..37ad2a3a9d2d 100644
--- a/arch/x86/kvm/svm/vac.c
+++ b/arch/x86/kvm/svm/vac.c
@@ -8,7 +8,10 @@
 #include "vac.h"
 
 DEFINE_PER_CPU(struct svm_cpu_data, svm_data);
+EXPORT_SYMBOL_GPL(svm_data);
+
 unsigned int max_sev_asid;
+EXPORT_SYMBOL_GPL(max_sev_asid);
 
 static bool __kvm_is_svm_supported(void)
 {
@@ -52,6 +55,7 @@ bool kvm_is_svm_supported(void)
 
 	return supported;
 }
+EXPORT_SYMBOL_GPL(kvm_is_svm_supported);
 
 static inline void kvm_cpu_svm_disable(void)
 {
@@ -87,6 +91,7 @@ void svm_hardware_disable(void)
 
 	amd_pmu_disable_virt();
 }
+EXPORT_SYMBOL_GPL(svm_hardware_disable);
 
 int svm_hardware_enable(void)
 {
@@ -152,6 +157,7 @@ int svm_hardware_enable(void)
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(svm_hardware_enable);
 
 int __init vac_svm_init(void)
 {
@@ -160,7 +166,7 @@ int __init vac_svm_init(void)
 	return 0;
 }
 
-void vac_svm_exit(void)
+void __exit vac_svm_exit(void)
 {
 	cpu_emergency_unregister_virt_callback(svm_emergency_disable);
 }
diff --git a/arch/x86/kvm/vac.c b/arch/x86/kvm/vac.c
index 79f5c2ac159a..2d7ca59b4c90 100644
--- a/arch/x86/kvm/vac.c
+++ b/arch/x86/kvm/vac.c
@@ -2,13 +2,18 @@
 
 #include "vac.h"
 #include <asm/msr.h>
+#include <linux/module.h>
+
+MODULE_LICENSE("GPL");
 
 extern bool kvm_rebooting;
 
 u32 __read_mostly kvm_uret_msrs_list[KVM_MAX_NR_USER_RETURN_MSRS];
 struct kvm_user_return_msrs __percpu *user_return_msrs;
+EXPORT_SYMBOL_GPL(user_return_msrs);
 
 u32 __read_mostly kvm_nr_uret_msrs;
+EXPORT_SYMBOL_GPL(kvm_nr_uret_msrs);
 
 void kvm_on_user_return(struct user_return_notifier *urn)
 {
@@ -85,6 +90,7 @@ int kvm_add_user_return_msr(u32 msr)
 	kvm_uret_msrs_list[kvm_nr_uret_msrs] = msr;
 	return kvm_nr_uret_msrs++;
 }
+EXPORT_SYMBOL_GPL(kvm_add_user_return_msr);
 
 int kvm_find_user_return_msr(u32 msr)
 {
@@ -96,6 +102,7 @@ int kvm_find_user_return_msr(u32 msr)
 	}
 	return -1;
 }
+EXPORT_SYMBOL_GPL(kvm_find_user_return_msr);
 
 int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask)
 {
@@ -118,6 +125,7 @@ int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask)
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(kvm_set_user_return_msr);
 
 int kvm_arch_hardware_enable(void)
 {
@@ -158,8 +166,9 @@ noinstr void kvm_spurious_fault(void)
 	/* Fault while not rebooting.  We want the trace. */
 	BUG_ON(!kvm_rebooting);
 }
+EXPORT_SYMBOL_GPL(kvm_spurious_fault);
 
-int kvm_alloc_user_return_msrs(void)
+static int kvm_alloc_user_return_msrs(void)
 {
 	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
 	if (!user_return_msrs) {
@@ -170,7 +179,36 @@ int kvm_alloc_user_return_msrs(void)
 	return 0;
 }
 
-void kvm_free_user_return_msrs(void)
+static void kvm_free_user_return_msrs(void)
 {
 	free_percpu(user_return_msrs);
 }
+
+int __init vac_init(void)
+{
+	int r = 0;
+
+	r = kvm_alloc_user_return_msrs();
+	if (r)
+		goto out_user_return_msrs;
+
+	if (kvm_is_vmx_supported())
+		r = vac_vmx_init();
+	else if (kvm_is_svm_supported())
+		r = vac_svm_init();
+
+out_user_return_msrs:
+	return r;
+}
+module_init(vac_init);
+
+void __exit vac_exit(void)
+{
+	if (kvm_is_vmx_supported())
+		vac_vmx_exit();
+	else if (kvm_is_svm_supported())
+		vac_svm_exit();
+
+	kvm_free_user_return_msrs();
+}
+module_exit(vac_exit);
diff --git a/arch/x86/kvm/vac.h b/arch/x86/kvm/vac.h
index daf1f137d196..a40e5309ec5f 100644
--- a/arch/x86/kvm/vac.h
+++ b/arch/x86/kvm/vac.h
@@ -56,8 +56,6 @@ struct kvm_user_return_msrs {
 
 extern u32 __read_mostly kvm_nr_uret_msrs;
 
-int kvm_alloc_user_return_msrs(void);
-void kvm_free_user_return_msrs(void);
 int kvm_add_user_return_msr(u32 msr);
 int kvm_find_user_return_msr(u32 msr);
 int kvm_set_user_return_msr(unsigned int slot, u64 value, u64 mask);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 5c6ac7662453..c4999b4cf257 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -307,7 +307,8 @@ static void free_nested(struct kvm_vcpu *vcpu)
 	vmx->nested.vmxon = false;
 	vmx->nested.smm.vmxon = false;
 	vmx->nested.vmxon_ptr = INVALID_GPA;
-	free_vpid(vmx->nested.vpid02);
+	if (enable_vpid)
+		free_vpid(vmx->nested.vpid02);
 	vmx->nested.posted_intr_nv = -1;
 	vmx->nested.current_vmptr = INVALID_GPA;
 	if (enable_shadow_vmcs) {
@@ -5115,7 +5116,7 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu)
 		     HRTIMER_MODE_ABS_PINNED);
 	vmx->nested.preemption_timer.function = vmx_preemption_timer_fn;
 
-	vmx->nested.vpid02 = allocate_vpid();
+	vmx->nested.vpid02 = enable_vpid ? allocate_vpid() : 0;
 
 	vmx->nested.vmcs02_initialized = false;
 	vmx->nested.vmxon = true;
diff --git a/arch/x86/kvm/vmx/vac.c b/arch/x86/kvm/vmx/vac.c
index cdfdeb67a719..e147b9890c99 100644
--- a/arch/x86/kvm/vmx/vac.c
+++ b/arch/x86/kvm/vmx/vac.c
@@ -8,6 +8,10 @@
 #include "vmx_ops.h"
 #include "posted_intr.h"
 
+// TODO: Move these to VAC
+void vmclear_error(struct vmcs *vmcs, u64 phys_addr) {}
+void invept_error(unsigned long ext, u64 eptp, gpa_t gpa) {}
+
 /*
  * We maintain a per-CPU linked-list of VMCS loaded on that CPU. This is needed
  * when a CPU is brought down, and we need to VMCLEAR all VMCSs loaded on it.
@@ -17,16 +21,19 @@ static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu);
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 
 DEFINE_PER_CPU(struct vmcs *, current_vmcs);
+EXPORT_SYMBOL_GPL(current_vmcs);
 
 void vac_set_vmxarea(struct vmcs *vmcs, int cpu)
 {
 	per_cpu(vmxarea, cpu) = vmcs;
 }
+EXPORT_SYMBOL_GPL(vac_set_vmxarea);
 
 struct vmcs *vac_get_vmxarea(int cpu)
 {
 	return per_cpu(vmxarea, cpu);
 }
+EXPORT_SYMBOL_GPL(vac_get_vmxarea);
 
 static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS);
 static DEFINE_SPINLOCK(vmx_vpid_lock);
@@ -59,6 +66,7 @@ bool kvm_is_vmx_supported(void)
 
 	return supported;
 }
+EXPORT_SYMBOL_GPL(kvm_is_vmx_supported);
 
 int allocate_vpid(void)
 {
@@ -75,6 +83,7 @@ int allocate_vpid(void)
 	spin_unlock(&vmx_vpid_lock);
 	return vpid;
 }
+EXPORT_SYMBOL_GPL(allocate_vpid);
 
 void free_vpid(int vpid)
 {
@@ -84,6 +93,7 @@ void free_vpid(int vpid)
 	__clear_bit(vpid, vmx_vpid_bitmap);
 	spin_unlock(&vmx_vpid_lock);
 }
+EXPORT_SYMBOL_GPL(free_vpid);
 
 void add_vmcs_to_loaded_vmcss_on_cpu(
 		struct list_head *loaded_vmcss_on_cpu_link,
@@ -91,6 +101,7 @@ void add_vmcs_to_loaded_vmcss_on_cpu(
 {
 	list_add(loaded_vmcss_on_cpu_link, &per_cpu(loaded_vmcss_on_cpu, cpu));
 }
+EXPORT_SYMBOL_GPL(add_vmcs_to_loaded_vmcss_on_cpu);
 
 static void __loaded_vmcs_clear(void *arg)
 {
@@ -130,6 +141,7 @@ void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs)
 			 __loaded_vmcs_clear, loaded_vmcs, 1);
 
 }
+EXPORT_SYMBOL_GPL(loaded_vmcs_clear);
 
 static int kvm_cpu_vmxon(u64 vmxon_pointer)
 {
@@ -175,11 +187,16 @@ int vmx_hardware_enable(void)
 		return r;
 	}
 
-	if (enable_ept)
+	// TODO: VAC: Since we can have a mix of KVMs with enable_ept=0 and
+	// enable_ept=1, we need to perform a global INVEPT here.
+	// TODO: Check for the vmx_capability INVEPT bit before executing
+	// this.
+	if (1)
 		ept_sync_global();
 
 	return 0;
 }
+EXPORT_SYMBOL_GPL(vmx_hardware_enable);
 
 static void vmclear_local_loaded_vmcss(void)
 {
@@ -246,6 +263,7 @@ void vmx_hardware_disable(void)
 
 	intel_pt_handle_vmx(0);
 }
+EXPORT_SYMBOL_GPL(vmx_hardware_disable);
 
 int __init vac_vmx_init(void)
 {
@@ -254,7 +272,8 @@ int __init vac_vmx_init(void)
 	for_each_possible_cpu(cpu) {
 		INIT_LIST_HEAD(&per_cpu(loaded_vmcss_on_cpu, cpu));
 
-		pi_init_cpu(cpu);
+		// TODO: Move posted interrupts list to VAC
+		// pi_init_cpu(cpu);
 	}
 
 	cpu_emergency_register_virt_callback(vmx_emergency_disable);
@@ -262,7 +281,7 @@ int __init vac_vmx_init(void)
 	return 0;
 }
 
-void vac_vmx_exit(void)
+void __exit vac_vmx_exit(void)
 {
 	cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
 }
diff --git a/arch/x86/kvm/vmx/vac.h b/arch/x86/kvm/vmx/vac.h
index d5af0ca67e3f..991df7ad9d81 100644
--- a/arch/x86/kvm/vmx/vac.h
+++ b/arch/x86/kvm/vmx/vac.h
@@ -16,7 +16,5 @@ void add_vmcs_to_loaded_vmcss_on_cpu(
 		struct list_head *loaded_vmcss_on_cpu_link,
 		int cpu);
 void loaded_vmcs_clear(struct loaded_vmcs *loaded_vmcs);
-int vmx_hardware_enable(void);
-void vmx_hardware_disable(void);
 
 #endif // ARCH_X86_KVM_VMX_VAC_H
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 69a6a8591996..8d749da61c71 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7201,7 +7201,8 @@ static void vmx_vcpu_free(struct kvm_vcpu *vcpu)
 
 	if (enable_pml)
 		vmx_destroy_pml_buffer(vmx);
-	free_vpid(vmx->vpid);
+	if (enable_vpid)
+		free_vpid(vmx->vpid);
 	nested_vmx_free_vcpu(vcpu);
 	free_loaded_vmcs(vmx->loaded_vmcs);
 }
@@ -7219,7 +7220,7 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	err = -ENOMEM;
 
-	vmx->vpid = allocate_vpid();
+	vmx->vpid = enable_vpid ? allocate_vpid() : 0;
 
 	/*
 	 * If PML is turned on, failure on enabling PML just results in failure
@@ -7308,7 +7309,8 @@ static int vmx_vcpu_create(struct kvm_vcpu *vcpu)
 free_pml:
 	vmx_destroy_pml_buffer(vmx);
 free_vpid:
-	free_vpid(vmx->vpid);
+	if (enable_vpid)
+		free_vpid(vmx->vpid);
 	return err;
 }
 
@@ -7992,9 +7994,6 @@ static void __vmx_exit(void)
 {
 	allow_smaller_maxphyaddr = false;
 
-	//TODO: Remove this exit call once VAC is a module
-	vac_vmx_exit();
-
 	vmx_cleanup_l1d_flush();
 }
 
@@ -8436,9 +8435,6 @@ int __init vmx_init(void)
 	if (r)
 		goto err_l1d_flush;
 
-	//TODO: Remove this init call once VAC is a module
-	vac_vmx_init();
-
 	vmx_check_vmcs12_offsets();
 
 	/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a74139061e4d..57b5bee2d484 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9274,10 +9274,6 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 		return -ENOMEM;
 	}
 
-	r = kvm_alloc_user_return_msrs();
-	if (r)
-		goto out_free_x86_emulator_cache;
-
 	r = kvm_mmu_vendor_module_init();
 	if (r)
 		goto out_free_percpu;
@@ -9354,8 +9350,6 @@ static int __kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
 out_free_percpu:
-	kvm_free_user_return_msrs();
-out_free_x86_emulator_cache:
 	kmem_cache_destroy(x86_emulator_cache);
 	return r;
 }
@@ -9393,7 +9387,6 @@ void kvm_x86_vendor_exit(void)
 #endif
 	static_call(kvm_x86_hardware_unsetup)();
 	kvm_mmu_vendor_module_exit();
-	kvm_free_user_return_msrs();
 	kmem_cache_destroy(x86_emulator_cache);
 #ifdef CONFIG_KVM_XEN
 	static_key_deferred_flush(&kvm_xen_enabled);
diff --git a/virt/kvm/Makefile.kvm b/virt/kvm/Makefile.kvm
index 7876021ea4d7..f1ad75797fe8 100644
--- a/virt/kvm/Makefile.kvm
+++ b/virt/kvm/Makefile.kvm
@@ -7,13 +7,12 @@ ccflags-y += -I$(srctree)/$(src) -D__KVM__
 
 KVM ?= ../../../virt/kvm
 
-kvm-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
+kvm$(CONFIG_KVM_ID)-y := $(KVM)/kvm_main.o $(KVM)/eventfd.o $(KVM)/binary_stats.o
 ifdef CONFIG_VFIO
-kvm-y += $(KVM)/vfio.o
+kvm$(CONFIG_KVM_ID)-y += $(KVM)/vfio.o
 endif
-kvm-y += $(KVM)/vac.o
-kvm-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
-kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
-kvm-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
-kvm-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
-kvm-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_HAVE_KVM_IRQ_ROUTING) += $(KVM)/irqchip.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_HAVE_KVM_DIRTY_RING) += $(KVM)/dirty_ring.o
+kvm$(CONFIG_KVM_ID)-$(CONFIG_HAVE_KVM_PFNCACHE) += $(KVM)/pfncache.o
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 575f044fd842..c4af06b1e62c 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5163,9 +5163,8 @@ static struct file_operations kvm_chardev_ops = {
 };
 
 static struct miscdevice kvm_dev = {
-	KVM_MINOR,
-	"kvm",
-	&kvm_chardev_ops,
+	.minor = KVM_MINOR,
+	.fops = &kvm_chardev_ops,
 };
 
 static void kvm_iodevice_destructor(struct kvm_io_device *dev)
@@ -5914,7 +5913,9 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
 	/*
 	 * Registration _must_ be the very last thing done, as this exposes
 	 * /dev/kvm to userspace, i.e. all infrastructure must be setup!
+	 * Append CONFIG_KVM_ID to the device name.
 	 */
+	kvm_dev.name = kasprintf(GFP_KERNEL, "kvm%s", CONFIG_KVM_ID);
 	r = misc_register(&kvm_dev);
 	if (r) {
 		pr_err("kvm: misc device register failed\n");
diff --git a/virt/kvm/vac.c b/virt/kvm/vac.c
index c628afeb3d4b..60f5bec2659a 100644
--- a/virt/kvm/vac.c
+++ b/virt/kvm/vac.c
@@ -10,6 +10,7 @@ DEFINE_PER_CPU(cpumask_var_t, cpu_kick_mask);
 EXPORT_SYMBOL(cpu_kick_mask);
 
 DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu);
+EXPORT_SYMBOL_GPL(kvm_running_vcpu);
 
 #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
 DEFINE_MUTEX(vac_lock);
@@ -56,6 +57,7 @@ int kvm_online_cpu(unsigned int cpu)
 	mutex_unlock(&vac_lock);
 	return ret;
 }
+EXPORT_SYMBOL_GPL(kvm_online_cpu);
 
 static void hardware_disable_nolock(void *junk)
 {
@@ -79,6 +81,7 @@ int kvm_offline_cpu(unsigned int cpu)
 	mutex_unlock(&vac_lock);
 	return 0;
 }
+EXPORT_SYMBOL_GPL(kvm_offline_cpu);
 
 static void hardware_disable_all_nolock(void)
 {
@@ -97,6 +100,7 @@ void hardware_disable_all(void)
 	mutex_unlock(&vac_lock);
 	cpus_read_unlock();
 }
+EXPORT_SYMBOL_GPL(hardware_disable_all);
 
 int hardware_enable_all(void)
 {
@@ -129,6 +133,7 @@ int hardware_enable_all(void)
 
 	return r;
 }
+EXPORT_SYMBOL_GPL(hardware_enable_all);
 
 static int kvm_reboot(struct notifier_block *notifier, unsigned long val,
 		      void *v)
@@ -176,10 +181,12 @@ struct notifier_block kvm_reboot_notifier = {
 	.notifier_call = kvm_reboot,
 	.priority = 0,
 };
+EXPORT_SYMBOL_GPL(kvm_reboot_notifier);
 
 struct syscore_ops kvm_syscore_ops = {
 	.suspend = kvm_suspend,
 	.resume = kvm_resume,
 };
+EXPORT_SYMBOL_GPL(kvm_syscore_ops);
 
 #endif
-- 
2.42.0.869.gea05f2083d-goog



* Re: [RFC PATCH 00/14] Support multiple KVM modules on the same host
  2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
                   ` (13 preceding siblings ...)
  2023-11-07 20:20 ` [RFC PATCH 14/14] KVM: VAC: Bring up VAC as a new module Anish Ghulati
@ 2023-11-17  8:53 ` Lai Jiangshan
  2023-11-28 18:10   ` Sean Christopherson
  14 siblings, 1 reply; 22+ messages in thread
From: Lai Jiangshan @ 2023-11-17  8:53 UTC (permalink / raw)
  To: Anish Ghulati
  Cc: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland

On Wed, Nov 8, 2023 at 4:20 AM Anish Ghulati <aghulati@google.com> wrote:
>
> This series is a rough, PoC-quality RFC to allow (un)loading and running
> multiple KVM modules simultaneously on a single host, e.g. to deploy
> fixes, mitigations, and/or new features without having to drain all VMs
> from the host. Multi-KVM will also allow running the "same" KVM module
> with different params, e.g. to run trusted VMs with different mitigations.
>
> The goal of this RFC is to get feedback on the idea itself and the
> high-level approach.  In particular, we're looking for input on:
>
>  - Combining kvm_intel.ko and kvm_amd.ko into kvm.ko
>  - Exposing multiple /dev/kvmX devices via Kconfig
>  - The name and prefix of the new base module
>
> Feedback on individual patches is also welcome, but please keep in mind
> that this is very much a work in-progress

Hello Anish

Little effort on multi-KVM is visible on the mailing list, albeit many
companies enable multi-KVM internally.

I'm glad that you took a big step toward upstreaming it, and I hope it
can materialize soon.


>
>  - Move system-wide virtualization resource management to a new base
>    module to avoid collisions between different KVM modules, e.g. VPIDs
>    and ASIDs need to be unique per VM, and callbacks from IRQ handlers need
>    to be mediated so that things like PMIs get to the right KVM instance.

perf_register_guest_info_callbacks() also accesses system-wide resources,
but I don't see its related code, including kvm_guest_cbs, being moved to VAC.

>
>  - Refactor KVM to make all upgradable assets visible only to KVM, i.e.
>    make KVM a black box, so that the layout/size of things like "struct
>    kvm_vcpu" isn't exposed to the kernel at-large.
>
>  - Fold kvm_intel.ko and kvm_amd.ko into kvm.ko to avoid complications
>    having to generate unique symbols for every symbol exported by kvm.ko.

kvm_intel.ko and kvm_amd.ko are large, and the kernel has only 1G of
address space available for modules. So I don't think folding the two
vendors' code into kvm.ko is a good idea.

Since the symbols in the new module are invisible outside, I recommend:
new kvm_intel.ko = kvm_intel.ko + kvm.ko
new kvm_amd.ko = kvm_amd.ko + kvm.ko
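
In kbuild terms, that could look roughly like the sketch below. This is
purely illustrative: the object lists are abbreviated, and kbuild would
need to build its own copy of the shared objects for each module:

    # arch/x86/kvm/Makefile, hypothetical layout
    kvm-intel-y := vmx/vmx.o vmx/nested.o $(KVM)/kvm_main.o $(KVM)/eventfd.o
    kvm-amd-y   := svm/svm.o svm/sev.o $(KVM)/kvm_main.o $(KVM)/eventfd.o

    obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
    obj-$(CONFIG_KVM_AMD)   += kvm-amd.o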

>
>  - Add a Kconfig string to allow defining a device and module postfix at
>    build time, e.g. to create kvmX.ko and /dev/kvmX.
>
> The proposed name of the new base module is vac.ko, a.k.a.
> Virtualization Acceleration Code (Unupgradable Units Module). Childish
> humor aside, "vac" is a unique name in the kernel and hopefully in x86
> and hardware terminology, e.g. `git grep vac_` yields no hits in
> the kernel. It also has the same number of characters as "kvm", e.g.
> the namespace can be modified without needing whitespace adjustment if
> we want to go that route.

How about the name kvm_base.ko?

And the variable/function names in it can still be kvm_foo (rather than
kvm_base_foo).

Thanks
Lai


* Re: [RFC PATCH 09/14] KVM: x86: Move shared KVM state into VAC
  2023-11-07 20:19 ` [RFC PATCH 09/14] KVM: x86: Move shared KVM state " Anish Ghulati
@ 2023-11-17  8:54   ` Lai Jiangshan
  2023-11-28 18:01     ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Lai Jiangshan @ 2023-11-17  8:54 UTC (permalink / raw)
  To: Anish Ghulati
  Cc: kvm, linux-kernel, Sean Christopherson, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland,
	Venkatesh Srinivas

On Wed, Nov 8, 2023 at 4:21 AM Anish Ghulati <aghulati@google.com> wrote:
>
> From: Venkatesh Srinivas <venkateshs@chromium.org>
>
> Move cpu_kick_mask and kvm_running_vcpu* from arch-neutral KVM code into
> VAC.

Hello, Venkatesh, Anish

IMO, the allocation code for cpu_kick_mask has to be moved too.
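
For reference, the allocation in question is the per-CPU kick-mask setup
that kvm.ko currently does at init time; a from-memory sketch of its
shape (details vary by kernel version, unwind on failure omitted):

    int cpu;

    for_each_possible_cpu(cpu) {
            if (!zalloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu),
                                         GFP_KERNEL, cpu_to_node(cpu)))
                    return -ENOMEM; /* plus freeing the prior CPUs' masks */
    }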

Thanks

Lai


* Re: [RFC PATCH 09/14] KVM: x86: Move shared KVM state into VAC
  2023-11-17  8:54   ` Lai Jiangshan
@ 2023-11-28 18:01     ` Sean Christopherson
  0 siblings, 0 replies; 22+ messages in thread
From: Sean Christopherson @ 2023-11-28 18:01 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Anish Ghulati, kvm, linux-kernel, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, hpa,
	Vitaly Kuznetsov, peterz, paulmck, Mark Rutland,
	Venkatesh Srinivas

On Fri, Nov 17, 2023, Lai Jiangshan wrote:
> On Wed, Nov 8, 2023 at 4:21 AM Anish Ghulati <aghulati@google.com> wrote:
> >
> > From: Venkatesh Srinivas <venkateshs@chromium.org>
> >
> > Move cpu_kick_mask and kvm_running_vcpu* from arch-neutral KVM code into
> > VAC.
> 
> Hello, Venkatesh, Anish
> 
> IMO, the allocation code for cpu_kick_mask has to be moved too.

I'm pretty sure this patch should be dropped.  I can't think of any reason why
cpu_kick_mask needs to be in VAC, and kvm_running_vcpu definitely needs to be
per-KVM.


* Re: [RFC PATCH 00/14] Support multiple KVM modules on the same host
  2023-11-17  8:53 ` [RFC PATCH 00/14] Support multiple KVM modules on the same host Lai Jiangshan
@ 2023-11-28 18:10   ` Sean Christopherson
  2026-01-05  7:48     ` Hou Wenlong
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2023-11-28 18:10 UTC (permalink / raw)
  To: Lai Jiangshan
  Cc: Anish Ghulati, kvm, linux-kernel, Paolo Bonzini, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, x86, hpa,
	Vitaly Kuznetsov, peterz, paulmck, Mark Rutland

On Fri, Nov 17, 2023, Lai Jiangshan wrote:
> On Wed, Nov 8, 2023 at 4:20 AM Anish Ghulati <aghulati@google.com> wrote:
> >
> > This series is a rough, PoC-quality RFC to allow (un)loading and running
> > multiple KVM modules simultaneously on a single host, e.g. to deploy
> > fixes, mitigations, and/or new features without having to drain all VMs
> > from the host. Multi-KVM will also allow running the "same" KVM module
> > with different params, e.g. to run trusted VMs with different mitigations.
> >
> > The goal of this RFC is to get feedback on the idea itself and the
> > high-level approach.  In particular, we're looking for input on:
> >
> >  - Combining kvm_intel.ko and kvm_amd.ko into kvm.ko
> >  - Exposing multiple /dev/kvmX devices via Kconfig
> >  - The name and prefix of the new base module
> >
> > Feedback on individual patches is also welcome, but please keep in mind
> > that this is very much a work in-progress
> 
> Hello Anish
> 
> Little effort on multi-KVM is visible on the mailing list, albeit many
> companies enable multi-KVM internally.
> 
> I'm glad that you took a big step toward upstreaming it, and I hope it
> can materialize soon.
> 
> 
> >
> >  - Move system-wide virtualization resource management to a new base
> >    module to avoid collisions between different KVM modules, e.g. VPIDs
> >    and ASIDs need to be unique per VM, and callbacks from IRQ handlers need
> >    to be mediated so that things like PMIs get to the right KVM instance.
> 
> perf_register_guest_info_callbacks() also accesses system-wide resources,
> but I don't see its related code, including kvm_guest_cbs, being moved to VAC.

Yeah, that's on the TODO list.  IIRC, the plan is to have VAC register a single
callback with perf, and then have VAC deal with invoking the callback(s) for the
correct KVM instance.
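
E.g. a very rough sketch of the mediation (names invented, this is not
code from the series; vac_active_cbs() is a hypothetical lookup of the
callbacks registered by the KVM instance that currently has a vCPU
loaded on this CPU):

    /* Hypothetical: returns the active KVM instance's callbacks, or NULL. */
    struct perf_guest_info_callbacks *vac_active_cbs(void);

    /* VAC-owned trampolines around the per-instance callbacks. */
    static unsigned int vac_guest_state(void)
    {
            struct perf_guest_info_callbacks *cbs = vac_active_cbs();

            return cbs ? cbs->state() : 0;
    }

    static unsigned long vac_guest_get_ip(void)
    {
            struct perf_guest_info_callbacks *cbs = vac_active_cbs();

            return cbs ? cbs->get_ip() : 0;
    }

    static struct perf_guest_info_callbacks vac_guest_cbs = {
            .state  = vac_guest_state,
            .get_ip = vac_guest_get_ip,
    };

    /* Registered with perf exactly once, from vac_init(); KVM instances
     * (un)register their callbacks with VAC, never with perf directly. */
    void vac_register_perf_callbacks(void)
    {
            perf_register_guest_info_callbacks(&vac_guest_cbs);
    }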

> >  - Refactor KVM to make all upgradable assets visible only to KVM, i.e.
> >    make KVM a black box, so that the layout/size of things like "struct
> >    kvm_vcpu" isn't exposed to the kernel at-large.
> >
> >  - Fold kvm_intel.ko and kvm_amd.ko into kvm.ko to avoid complications
> >    having to generate unique symbols for every symbol exported by kvm.ko.
> 
> kvm_intel.ko and kvm_amd.ko are large, and the kernel has only 1G of
> address space available for modules. So I don't think folding the two
> vendors' code into kvm.ko is a good idea.
> 
> Since the symbols in the new module are invisible outside, I recommend:
> new kvm_intel.ko = kvm_intel.ko + kvm.ko
> new kvm_amd.ko = kvm_amd.ko + kvm.ko

Yeah, Paolo also suggested this at LPC.

> >  - Add a Kconfig string to allow defining a device and module postfix at
> >    build time, e.g. to create kvmX.ko and /dev/kvmX.
> >
> > The proposed name of the new base module is vac.ko, a.k.a.
> > Virtualization Acceleration Code (Unupgradable Units Module). Childish
> > humor aside, "vac" is a unique name in the kernel and hopefully in x86
> > and hardware terminology, e.g. `git grep vac_` yields no hits in
> > the kernel. It also has the same number of characters as "kvm", e.g.
> > the namespace can be modified without needing whitespace adjustment if
> > we want to go that route.
> 
> How about the name kvm_base.ko?
> 
> And the variable/function names in it can still be kvm_foo (rather than
> kvm_base_foo).

My preference is to have a unique name that allows us to differentiate between
the "base" module/code and KVM code.  Verbal conversations about all of this get
quite confusing because it's not always clear whether "base KVM" refers to what
is currently kvm.ko, or what would become kvm_base.ko/vac.ko.


* Re: [RFC PATCH 00/14] Support multiple KVM modules on the same host
  2023-11-28 18:10   ` Sean Christopherson
@ 2026-01-05  7:48     ` Hou Wenlong
  2026-01-07 15:54       ` Sean Christopherson
  0 siblings, 1 reply; 22+ messages in thread
From: Hou Wenlong @ 2026-01-05  7:48 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Lai Jiangshan, Anish Ghulati, kvm, linux-kernel, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland

On Tue, Nov 28, 2023 at 10:10:20AM -0800, Sean Christopherson wrote:
> On Fri, Nov 17, 2023, Lai Jiangshan wrote:
> > On Wed, Nov 8, 2023 at 4:20 AM Anish Ghulati <aghulati@google.com> wrote:
> > >
> > > This series is a rough, PoC-quality RFC to allow (un)loading and running
> > > multiple KVM modules simultaneously on a single host, e.g. to deploy
> > > fixes, mitigations, and/or new features without having to drain all VMs
> > > from the host. Multi-KVM will also allow running the "same" KVM module
> > > with different params, e.g. to run trusted VMs with different mitigations.
> > >
> > > The goal of this RFC is to get feedback on the idea itself and the
> > > high-level approach.  In particular, we're looking for input on:
> > >
> > >  - Combining kvm_intel.ko and kvm_amd.ko into kvm.ko
> > >  - Exposing multiple /dev/kvmX devices via Kconfig
> > >  - The name and prefix of the new base module
> > >
> > > Feedback on individual patches is also welcome, but please keep in mind
> > > that this is very much a work in-progress
> > 
> > Hello Anish
> > 
> > Little effort on multi-KVM is visible on the mailing list, albeit many
> > companies enable multi-KVM internally.
> > 
> > I'm glad that you took a big step toward upstreaming it, and I hope it
> > can materialize soon.
> > 
> > 
> > >
> > >  - Move system-wide virtualization resource management to a new base
> > >    module to avoid collisions between different KVM modules, e.g. VPIDs
> > >    and ASIDs need to be unique per VM, and callbacks from IRQ handlers need
> > >    to be mediated so that things like PMIs get to the right KVM instance.
> > 
> > perf_register_guest_info_callbacks() also accesses system-wide resources,
> > but I don't see its related code, including kvm_guest_cbs, being moved to VAC.
> 
> Yeah, that's on the TODO list.  IIRC, the plan is to have VAC register a single
> callback with perf, and then have VAC deal with invoking the callback(s) for the
> correct KVM instance.
> 
> > >  - Refactor KVM to make all upgradable assets visible only to KVM, i.e.
> > >    make KVM a black box, so that the layout/size of things like "struct
> > >    kvm_vcpu" isn't exposed to the kernel at-large.
> > >
> > >  - Fold kvm_intel.ko and kvm_amd.ko into kvm.ko to avoid complications
> > >    having to generate unique symbols for every symbol exported by kvm.ko.
> > 
> > kvm_intel.ko and kvm_amd.ko are large, and the kernel has only 1G of
> > address space available for modules. So I don't think folding the two
> > vendors' code into kvm.ko is a good idea.
> > 
> > Since the symbols in the new module are invisible outside, I recommend:
> > new kvm_intel.ko = kvm_intel.ko + kvm.ko
> > new kvm_amd.ko = kvm_amd.ko + kvm.ko
> 
> Yeah, Paolo also suggested this at LPC.
> 
> > >  - Add a Kconfig string to allow defining a device and module postfix at
> > >    build time, e.g. to create kvmX.ko and /dev/kvmX.
> > >
> > > The proposed name of the new base module is vac.ko, a.k.a.
> > > Virtualization Acceleration Code (Unupgradable Units Module). Childish
> > > humor aside, "vac" is a unique name in the kernel and hopefully in x86
> > > and hardware terminology, e.g. `git grep vac_` yields no hits in
> > > the kernel. It also has the same number of characters as "kvm", e.g.
> > > the namespace can be modified without needing whitespace adjustment if
> > > we want to go that route.
> > 
> > How about the name kvm_base.ko?
> > 
> > And the variable/function names in it can still be kvm_foo (rather than
> > kvm_base_foo).
> 
> My preference is to have a unique name that allows us to differentiate between
> the "base" module/code and KVM code.  Verbal conversations about all of this get
> quite confusing because it's not always clear whether "base KVM" refers to what
> is currently kvm.ko, or what would become kvm_base.ko/vac.ko.
>
Hi, Sean and Anish.

Sorry for revisiting this topic after a long time. I haven't seen any
new updates regarding this topic/series, and I didn’t find any recent
activity on the GitHub repository. Is the multi-KVM topic still being
considered for upstreaming, or is there anything blocking this?

As Lai pointed out, we also have a similar multi-KVM implementation in
our internal environment, so we are quite interested in this topic.
Recently, when we upgraded our kernel version, we found that maintaining
multi-KVM has become a significant burden. We are willing to move
forward with it if multi-KVM is still acceptable for upstream. So I look
forward to feedback from the maintainers.

From what I've seen, the recent patch set that enables VMX/SVM during
booting is a good starting point for multi-KVM as well.

Thanks!


* Re: [RFC PATCH 00/14] Support multiple KVM modules on the same host
  2026-01-05  7:48     ` Hou Wenlong
@ 2026-01-07 15:54       ` Sean Christopherson
  2026-01-08  6:55         ` Hou Wenlong
  0 siblings, 1 reply; 22+ messages in thread
From: Sean Christopherson @ 2026-01-07 15:54 UTC (permalink / raw)
  To: Hou Wenlong
  Cc: Lai Jiangshan, Anish Ghulati, kvm, linux-kernel, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland

On Mon, Jan 05, 2026, Hou Wenlong wrote:
> Sorry for revisiting this topic after a long time. I haven't seen any
> new updates regarding this topic/series, and I didn’t find any recent
> activity on the GitHub repository. Is the multi-KVM topic still being
> considered for upstreaming, or is there anything blocking this?

We have abandoned upstreaming multi-KVM.  The operational cost+complexity is too
high relative to the benefits, especially when factoring in things like ASI and
live patching, and the benefits are almost entirely obsoleted by kernel live update
support.

> As Lai pointed out, we also have a similar multi-KVM implementation in
> our internal environment, so we are quite interested in this topic.
> Recently, when we upgraded our kernel version, we found that maintaining
> multi-KVM has become a significant burden.

Yeah, I can imagine the pain all too well.  :-/

> We are willing to move forward with it if multi-KVM is still acceptable for
> upstream. So I look forward to feedback from the maintainers.
>
> From what I've seen, the recent patch set that enables VMX/SVM during
> booting is a good starting point for multi-KVM as well.

I have mixed feelings on multi-KVM.  Without considering maintenance and support
costs, I still love the idea of reworking the kernel to support running multiple
hypervisors concurrently.  But as I explained in the first cover letter of that
series[0], there is a massive amount of complexity, both in initial development
and ongoing maintenance, needed to provide such infrastructure:

 : I got quite far along on rebasing some internal patches we have to extract the
 : core virtualization bits out of KVM x86, but as I paged back in all of the
 : things we had punted on (because they were waaay out of scope for our needs),
 : I realized more and more that providing truly generic virtualization
 : instrastructure is vastly different than providing infrastructure that can be
 : shared by multiple instances of KVM (or things very similar to KVM)[1].
 :
 : So while I still don't want to blindly do VMXON, I also think that trying to
 : actually support another in-tree hypervisor, without an imminent user to drive
 : the development, is a waste of resources, and would saddle KVM with a pile of
 : pointless complexity.

For deployment to a relatively homogeneous fleet, many of the pain points can be
sidestepped, either by avoiding them entirely or by making the settings "inflexible",
because there is effectively one use case and so such caveats are a non-issue.
But those types of simplifications don't work upstream, e.g. saying "eVMCS is
unsupported if multi-KVM is possible" instead of moving eVMCS enabling to a base
module isn't acceptable.
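
(To illustrate what "moving it to a base module" means, with invented
names, not code from the series: the system-wide decision is probed and
latched exactly once in the base module, and every KVM instance queries
the result.)

    /* In the base module: probed once at init, read-only afterwards. */
    static bool vac_enlightened_vmcs __ro_after_init;

    bool vac_evmcs_enabled(void)
    {
            return vac_enlightened_vmcs;
    }
    EXPORT_SYMBOL_GPL(vac_evmcs_enabled);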

So I guess my "official" stance is that I'm not opposed to upstreaming multi-KVM
(or similar) functionality, but I'm not exactly in favor of it either.  And
practically speaking, because multi-KVM would be in constant conflict with so
much ongoing/new feature support (both in software and hardware), and is not a
priority for anyone pursuing kernel live update, upstreaming would likely take
several years, without any guarantee of a successful landing.

[0] https://lore.kernel.org/all/20251010220403.987927-1-seanjc@google.com
[1] https://lore.kernel.org/all/aOl5EutrdL_OlVOO@google.com


* Re: [RFC PATCH 00/14] Support multiple KVM modules on the same host
  2026-01-07 15:54       ` Sean Christopherson
@ 2026-01-08  6:55         ` Hou Wenlong
  0 siblings, 0 replies; 22+ messages in thread
From: Hou Wenlong @ 2026-01-08  6:55 UTC (permalink / raw)
  To: Sean Christopherson
  Cc: Lai Jiangshan, Anish Ghulati, kvm, linux-kernel, Paolo Bonzini,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86,
	hpa, Vitaly Kuznetsov, peterz, paulmck, Mark Rutland

On Wed, Jan 07, 2026 at 07:54:53AM -0800, Sean Christopherson wrote:
> On Mon, Jan 05, 2026, Hou Wenlong wrote:
> > Sorry for revisiting this topic after a long time. I haven't seen any
> > new updates regarding this topic/series, and I didn’t find any recent
> > activity on the GitHub repository. Is the multi-KVM topic still being
> > considered for upstreaming, or is there anything blocking this?
> 
> We have abandoned upstreaming multi-KVM.  The operational cost+complexity is too
> high relative to the benefits, especially when factoring in things like ASI and
> live patching, and the benefits are almost entirely obsoleted by kernel live update
> support.
>

We need to look into the new features to see if they’re compatible with
multi-KVM and how we can integrate them effectively. This is definitely
challenging work. We also have the option to explore kernel live
update now.

> > As Lai pointed out, we also have a similar multi-KVM implementation in
> > our internal environment, so we are quite interested in this topic.
> > Recently, when we upgraded our kernel version, we found that maintaining
> > multi-KVM has become a significant burden.
> 
> Yeah, I can imagine the pain all too well.  :-/
> 
> > We are willing to move forward with it if multi-KVM is still acceptable for
> > upstream. So I look forward to feedback from the maintainers.
> >
> > From what I've seen, the recent patch set that enables VMX/SVM during
> > booting is a good starting point for multi-KVM as well.
> 
> I have mixed feelings on multi-KVM.  Without considering maintenance and support
> costs, I still love the idea of reworking the kernel to support running multiple
> hypervisors concurrently.  But as I explained in the first cover letter of that
> series[0], there is a massive amount of complexity, both in initial development
> and ongoing maintenance, needed to provide such infrastructure:
> 
>  : I got quite far along on rebasing some internal patches we have to extract the
>  : core virtualization bits out of KVM x86, but as I paged back in all of the
>  : things we had punted on (because they were waaay out of scope for our needs),
>  : I realized more and more that providing truly generic virtualization
>  : infrastructure is vastly different than providing infrastructure that can be
>  : shared by multiple instances of KVM (or things very similar to KVM)[1].
>  :
>  : So while I still don't want to blindly do VMXON, I also think that trying to
>  : actually support another in-tree hypervisor, without an imminent user to drive
>  : the development, is a waste of resources, and would saddle KVM with a pile of
>  : pointless complexity.
> 
> For deployment to a relatively homogeneous fleet, many of the pain points can be
> sidestepped, either by avoiding them entirely or by making the settings "inflexible",
> because there is effectively one use case and so such caveats are a non-issue.
> But those types of simplifications don't work upstream, e.g. saying "eVMCS is
> unsupported if multi-KVM is possible" instead of moving eVMCS enabling to a base
> module isn't acceptable.
> 
> So I guess my "official" stance is that I'm not opposed to upstreaming multi-KVM
> (or similar) functionality, but I'm not exactly in favor of it either.  And
> practically speaking, because multi-KVM would be in constant conflict with so
> much ongoing/new feature support (both in software and hardware), and is not a
> priority for anyone pursuing kernel live update, upstreaming would likely take
> several years, without any guarantee of a successful landing.
> 
> [0] https://lore.kernel.org/all/20251010220403.987927-1-seanjc@google.com
> [1] https://lore.kernel.org/all/aOl5EutrdL_OlVOO@google.com

Got it, thanks for your feedback!


end of thread

Thread overview: 22+ messages
2023-11-07 20:19 [RFC PATCH 00/14] Support multiple KVM modules on the same host Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 01/14] KVM: x86: Move common module params from SVM/VMX to x86 Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 02/14] KVM: x86: Fold x86 vendor modules into the main KVM modules Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 03/14] KVM: x86: Remove unused exports Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 04/14] KVM: x86: Create stubs for a new VAC module Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 05/14] KVM: x86: Refactor hardware enable/disable operations into a new file Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 06/14] KVM: x86: Move user return msr operations out of KVM Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 07/14] KVM: SVM: Move shared SVM data structures into VAC Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 08/14] KVM: VMX: Move shared VMX " Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 09/14] KVM: x86: Move shared KVM state " Anish Ghulati
2023-11-17  8:54   ` Lai Jiangshan
2023-11-28 18:01     ` Sean Christopherson
2023-11-07 20:19 ` [RFC PATCH 10/14] KVM: VMX: Move VMX enable and disable " Anish Ghulati
2023-11-07 20:19 ` [RFC PATCH 11/14] KVM: SVM: Move SVM " Anish Ghulati
2023-11-07 20:20 ` [RFC PATCH 12/14] KVM: x86: Move VMX and SVM support checks " Anish Ghulati
2023-11-07 20:20 ` [RFC PATCH 13/14] KVM: x86: VAC: Move all hardware enable/disable code " Anish Ghulati
2023-11-07 20:20 ` [RFC PATCH 14/14] KVM: VAC: Bring up VAC as a new module Anish Ghulati
2023-11-17  8:53 ` [RFC PATCH 00/14] Support multiple KVM modules on the same host Lai Jiangshan
2023-11-28 18:10   ` Sean Christopherson
2026-01-05  7:48     ` Hou Wenlong
2026-01-07 15:54       ` Sean Christopherson
2026-01-08  6:55         ` Hou Wenlong
