* [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt
@ 2024-06-08 0:06 Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock Sean Christopherson
` (10 more replies)
0 siblings, 11 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.
The primary motivation for this series is to simplify dealing with enabling
virtualization for Intel's TDX, which needs to enable virtualization
when kvm-intel.ko is loaded, i.e. long before the first VM is created. TDX
doesn't _need_ to keep virtualization enabled, but doing so is much simpler
for KVM (see patch 3).
That said, this is a nice cleanup on its own, assuming I haven't broken
something. By registering the callbacks on-demand, the callbacks themselves
don't need to check kvm_usage_count, because their very existence implies a
non-zero count.
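The "existence implies a non-zero count" invariant can be sketched with a tiny userspace model (illustrative C with simplified names, not the kernel code): the callbacks are registered on the 0=>1 transition of the usage count and unregistered on the 1=>0 transition, so a registered callback never needs to re-check the count.

```c
#include <assert.h>
#include <stdbool.h>

int kvm_usage_count;
bool callbacks_registered;

int enable_virtualization(void)
{
	if (kvm_usage_count++)
		return 0;
	/* stands in for cpuhp_setup_state() + register_syscore_ops() */
	callbacks_registered = true;
	return 0;
}

void disable_virtualization(void)
{
	if (--kvm_usage_count)
		return;
	/* stands in for unregister_syscore_ops() + cpuhp_remove_state() */
	callbacks_registered = false;
}
```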
The meat is in patch 2. Patch 3 renames the helpers so that patch 4 is
less awkward. Patch 4 adds a module param to enable virtualization when KVM
is loaded. Patches 5-7 are tangentially related x86 cleanups to register
KVM's "emergency disable" callback on-demand, same as the syscore callbacks.
The suspend/resume and cpuhp paths still need to be fully tested, as do
non-x86 architectures.
v3:
- Collect reviews/acks.
- Switch to kvm_usage_lock in a dedicated patch, Cc'd for stable@. [Chao]
- Enable virt at load by default. [Chao]
- Add comments to document how kvm_arch_{en,dis}able_virtualization() fit
into the overall flow. [Kai]
v2:
- https://lore.kernel.org/all/20240522022827.1690416-1-seanjc@google.com
- Use a dedicated mutex to avoid lock inversion issues between kvm_lock and
the cpuhp lock.
- Register emergency disable callbacks on-demand. [Kai]
- Drop an unintended s/junk/ign rename. [Kai]
- Decrement kvm_usage_count on failure. [Chao]
v1: https://lore.kernel.org/all/20240425233951.3344485-1-seanjc@google.com
Sean Christopherson (8):
KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
KVM: Register cpuhp and syscore callbacks when enabling hardware
KVM: Rename functions related to enabling virtualization hardware
KVM: Add a module param to allow enabling virtualization when KVM is
loaded
KVM: Add arch hooks for enabling/disabling virtualization
x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef
KVM: x86: Register "emergency disable" callbacks when virt is enabled
KVM: Enable virtualization at load/initialization by default
Documentation/virt/kvm/locking.rst | 19 ++-
arch/x86/include/asm/kvm_host.h | 3 +
arch/x86/include/asm/reboot.h | 2 +-
arch/x86/kvm/svm/svm.c | 5 +-
arch/x86/kvm/vmx/main.c | 2 +
arch/x86/kvm/vmx/vmx.c | 6 +-
arch/x86/kvm/vmx/x86_ops.h | 1 +
arch/x86/kvm/x86.c | 10 ++
include/linux/kvm_host.h | 14 ++
virt/kvm/kvm_main.c | 258 ++++++++++++++---------------
10 files changed, 175 insertions(+), 145 deletions(-)
base-commit: af0903ab52ee6d6f0f63af67fa73d5eb00f79b9a
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-06-10 0:26 ` Huang, Kai
2024-08-14 18:06 ` Paolo Bonzini
2024-06-08 0:06 ` [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
` (9 subsequent siblings)
10 siblings, 2 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
on x86 due to a chain of locks and SRCU synchronizations. Translating the
below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
fairness of r/w semaphores).
     CPU0                     CPU1                     CPU2
1   lock(&kvm->slots_lock);
2                                                     lock(&vcpu->mutex);
3                                                     lock(&kvm->srcu);
4                            lock(cpu_hotplug_lock);
5                            lock(kvm_lock);
6                            lock(&kvm->slots_lock);
7                                                     lock(cpu_hotplug_lock);
8   sync(&kvm->srcu);
Note, there are likely more potential deadlocks in KVM x86, e.g. the same
pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
__kvmclock_cpufreq_notifier(), but actually triggering such deadlocks is
beyond rare due to the combination of dependencies and timings involved.
E.g. the cpufreq notifier is only used on older CPUs without a constant
TSC, mucking with the NX hugepage mitigation while VMs are running is very
uncommon, and doing so while also onlining/offlining a CPU (necessary to
generate contention on cpu_hotplug_lock) would be even more unusual.
======================================================
WARNING: possible circular locking dependency detected
6.10.0-smp--c257535a0c9d-pip #330 Tainted: G S O
------------------------------------------------------
tee/35048 is trying to acquire lock:
ff6a80eced71e0a8 (&kvm->slots_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x179/0x1e0 [kvm]
but task is already holding lock:
ffffffffc07abb08 (kvm_lock){+.+.}-{3:3}, at: set_nx_huge_pages+0x14a/0x1e0 [kvm]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #3 (kvm_lock){+.+.}-{3:3}:
__mutex_lock+0x6a/0xb40
mutex_lock_nested+0x1f/0x30
kvm_dev_ioctl+0x4fb/0xe50 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #2 (cpu_hotplug_lock){++++}-{0:0}:
cpus_read_lock+0x2e/0xb0
static_key_slow_inc+0x16/0x30
kvm_lapic_set_base+0x6a/0x1c0 [kvm]
kvm_set_apic_base+0x8f/0xe0 [kvm]
kvm_set_msr_common+0x9ae/0xf80 [kvm]
vmx_set_msr+0xa54/0xbe0 [kvm_intel]
__kvm_set_msr+0xb6/0x1a0 [kvm]
kvm_arch_vcpu_ioctl+0xeca/0x10c0 [kvm]
kvm_vcpu_ioctl+0x485/0x5b0 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #1 (&kvm->srcu){.+.+}-{0:0}:
__synchronize_srcu+0x44/0x1a0
synchronize_srcu_expedited+0x21/0x30
kvm_swap_active_memslots+0x110/0x1c0 [kvm]
kvm_set_memslot+0x360/0x620 [kvm]
__kvm_set_memory_region+0x27b/0x300 [kvm]
kvm_vm_ioctl_set_memory_region+0x43/0x60 [kvm]
kvm_vm_ioctl+0x295/0x650 [kvm]
__se_sys_ioctl+0x7b/0xd0
__x64_sys_ioctl+0x21/0x30
x64_sys_call+0x15d0/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (&kvm->slots_lock){+.+.}-{3:3}:
__lock_acquire+0x15ef/0x2e30
lock_acquire+0xe0/0x260
__mutex_lock+0x6a/0xb40
mutex_lock_nested+0x1f/0x30
set_nx_huge_pages+0x179/0x1e0 [kvm]
param_attr_store+0x93/0x100
module_attr_store+0x22/0x40
sysfs_kf_write+0x81/0xb0
kernfs_fop_write_iter+0x133/0x1d0
vfs_write+0x28d/0x380
ksys_write+0x70/0xe0
__x64_sys_write+0x1f/0x30
x64_sys_call+0x281b/0x2e60
do_syscall_64+0x83/0x160
entry_SYSCALL_64_after_hwframe+0x76/0x7e
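The cycle above can also be checked mechanically. The sketch below (hypothetical userspace code, not part of the patch) encodes each "held while acquiring" pair from the splat as a graph edge; the fix breaks the cycle by routing the cpuhp callbacks through kvm_usage_lock instead of kvm_lock, leaving kvm_usage_lock with no path back to kvm->slots_lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* An edge A -> B means "B is acquired (or waited on) while A is held". */
enum { SLOTS_LOCK, SRCU, CPU_HOTPLUG_LOCK, KVM_LOCK, KVM_USAGE_LOCK, NR_LOCKS };

bool edge[NR_LOCKS][NR_LOCKS];

static bool reachable(int from, int to, bool *visited)
{
	if (edge[from][to])
		return true;
	visited[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (edge[from][next] && !visited[next] &&
		    reachable(next, to, visited))
			return true;
	return false;
}

/* A lock that can "reach" itself is part of a potential deadlock cycle. */
bool has_cycle(void)
{
	for (int lock = 0; lock < NR_LOCKS; lock++) {
		bool visited[NR_LOCKS] = { false };

		if (reachable(lock, lock, visited))
			return true;
	}
	return false;
}

void add_splat_edges(bool online_cpu_uses_usage_lock)
{
	memset(edge, 0, sizeof(edge));
	edge[SLOTS_LOCK][SRCU] = true;       /* CPU0: srcu sync under slots_lock */
	edge[SRCU][CPU_HOTPLUG_LOCK] = true; /* CPU2: static key update in SRCU section */
	edge[KVM_LOCK][SLOTS_LOCK] = true;   /* set_nx_huge_pages() */
	/* kvm_online_cpu() runs with cpu_hotplug_lock held... */
	if (online_cpu_uses_usage_lock)
		edge[CPU_HOTPLUG_LOCK][KVM_USAGE_LOCK] = true; /* after the fix */
	else
		edge[CPU_HOTPLUG_LOCK][KVM_LOCK] = true;       /* before the fix */
}
```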
Cc: Chao Gao <chao.gao@intel.com>
Fixes: 0bf50497f03b ("KVM: Drop kvm_count_lock and instead protect kvm_usage_count with kvm_lock")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
Documentation/virt/kvm/locking.rst | 19 ++++++++++++------
virt/kvm/kvm_main.c | 31 +++++++++++++++---------------
2 files changed, 29 insertions(+), 21 deletions(-)
diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
index 02880d5552d5..5e102fe5b396 100644
--- a/Documentation/virt/kvm/locking.rst
+++ b/Documentation/virt/kvm/locking.rst
@@ -227,7 +227,13 @@ time it will be set using the Dirty tracking mechanism described above.
:Type: mutex
:Arch: any
:Protects: - vm_list
- - kvm_usage_count
+
+``kvm_usage_count``
+^^^^^^^^^^^^^^^^^^^
+
+:Type: mutex
+:Arch: any
+:Protects: - kvm_usage_count
- hardware virtualization enable/disable
:Comment: KVM also disables CPU hotplug via cpus_read_lock() during
enable/disable.
@@ -290,11 +296,12 @@ time it will be set using the Dirty tracking mechanism described above.
wakeup.
``vendor_module_lock``
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+^^^^^^^^^^^^^^^^^^^^^^
:Type: mutex
:Arch: x86
:Protects: loading a vendor module (kvm_amd or kvm_intel)
-:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
- taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
- many operations need to take cpu_hotplug_lock when loading a vendor module,
- e.g. updating static calls.
+:Comment: Exists because using kvm_lock leads to deadlock. kvm_lock is taken
+ in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while
+ cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many
+ operations need to take cpu_hotplug_lock when loading a vendor module, e.g.
+ updating static calls.
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4965196cad58..d9b0579d3eea 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5499,6 +5499,7 @@ __visible bool kvm_rebooting;
EXPORT_SYMBOL_GPL(kvm_rebooting);
static DEFINE_PER_CPU(bool, hardware_enabled);
+static DEFINE_MUTEX(kvm_usage_lock);
static int kvm_usage_count;
static int __hardware_enable_nolock(void)
@@ -5531,10 +5532,10 @@ static int kvm_online_cpu(unsigned int cpu)
* be enabled. Otherwise running VMs would encounter unrecoverable
* errors when scheduled to this CPU.
*/
- mutex_lock(&kvm_lock);
+ mutex_lock(&kvm_usage_lock);
if (kvm_usage_count)
ret = __hardware_enable_nolock();
- mutex_unlock(&kvm_lock);
+ mutex_unlock(&kvm_usage_lock);
return ret;
}
@@ -5554,10 +5555,10 @@ static void hardware_disable_nolock(void *junk)
static int kvm_offline_cpu(unsigned int cpu)
{
- mutex_lock(&kvm_lock);
+ mutex_lock(&kvm_usage_lock);
if (kvm_usage_count)
hardware_disable_nolock(NULL);
- mutex_unlock(&kvm_lock);
+ mutex_unlock(&kvm_usage_lock);
return 0;
}
@@ -5573,9 +5574,9 @@ static void hardware_disable_all_nolock(void)
static void hardware_disable_all(void)
{
cpus_read_lock();
- mutex_lock(&kvm_lock);
+ mutex_lock(&kvm_usage_lock);
hardware_disable_all_nolock();
- mutex_unlock(&kvm_lock);
+ mutex_unlock(&kvm_usage_lock);
cpus_read_unlock();
}
@@ -5606,7 +5607,7 @@ static int hardware_enable_all(void)
* enable hardware multiple times.
*/
cpus_read_lock();
- mutex_lock(&kvm_lock);
+ mutex_lock(&kvm_usage_lock);
r = 0;
@@ -5620,7 +5621,7 @@ static int hardware_enable_all(void)
}
}
- mutex_unlock(&kvm_lock);
+ mutex_unlock(&kvm_usage_lock);
cpus_read_unlock();
return r;
@@ -5648,13 +5649,13 @@ static int kvm_suspend(void)
{
/*
* Secondary CPUs and CPU hotplug are disabled across the suspend/resume
- * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
- * is stable. Assert that kvm_lock is not held to ensure the system
- * isn't suspended while KVM is enabling hardware. Hardware enabling
- * can be preempted, but the task cannot be frozen until it has dropped
- * all locks (userspace tasks are frozen via a fake signal).
+ * callbacks, i.e. no need to acquire kvm_usage_lock to ensure the usage
+ * count is stable. Assert that kvm_usage_lock is not held to ensure
+ * the system isn't suspended while KVM is enabling hardware. Hardware
+ * enabling can be preempted, but the task cannot be frozen until it has
+ * dropped all locks (userspace tasks are frozen via a fake signal).
*/
- lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_not_held(&kvm_usage_lock);
lockdep_assert_irqs_disabled();
if (kvm_usage_count)
@@ -5664,7 +5665,7 @@ static int kvm_suspend(void)
static void kvm_resume(void)
{
- lockdep_assert_not_held(&kvm_lock);
+ lockdep_assert_not_held(&kvm_usage_lock);
lockdep_assert_irqs_disabled();
if (kvm_usage_count)
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-06-10 0:55 ` Huang, Kai
2024-08-14 18:12 ` Paolo Bonzini
2024-06-08 0:06 ` [PATCH v3 3/8] KVM: Rename functions related to enabling virtualization hardware Sean Christopherson
` (8 subsequent siblings)
10 siblings, 2 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Register KVM's cpuhp and syscore callbacks when enabling virtualization
in hardware instead of registering the callbacks during initialization,
and let the CPU up/down framework invoke the inner enable/disable
functions. Registering the callbacks during initialization makes things
more complex than they need to be, as KVM needs to be very careful about
handling races between enabling CPUs being onlined/offlined and hardware
being enabled/disabled.
Intel TDX support will require KVM to enable virtualization during KVM
initialization, i.e. will add another wrinkle to things, at which point
sorting out the potential races with kvm_usage_count would become even
more complex.
Note, using the cpuhp framework introduces a subtle behavioral change: enabling
will be done serially across all CPUs, whereas KVM currently sends an IPI
to all CPUs in parallel. While serializing virtualization enabling could
create undesirable latency, the issue is limited to creation of KVM's
first VM, and even that can be mitigated, e.g. by letting userspace force
virtualization to be enabled when KVM is initialized.
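A minimal userspace model of the reworked enable path (simplified names, error value assumed) shows the shape of the change: the first user registers the callbacks, then re-checks system_state and backs out if a forced reboot raced with enabling.

```c
#include <assert.h>
#include <stdbool.h>

enum states { SYSTEM_RUNNING, SYSTEM_RESTART } system_state;

int kvm_usage_count;
bool callbacks_registered;

int kvm_enable_virtualization(void)
{
	if (kvm_usage_count++)
		return 0;

	/* stands in for cpuhp_setup_state() + register_syscore_ops() */
	callbacks_registered = true;

	/* Re-check after registering: a shutdown may have raced with enabling. */
	if (system_state == SYSTEM_RESTART) {
		callbacks_registered = false;
		--kvm_usage_count;
		return -1; /* -EBUSY in the real code */
	}
	return 0;
}
```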
Cc: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 174 ++++++++++++++++----------------------------
1 file changed, 61 insertions(+), 113 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index d9b0579d3eea..f6b114f42433 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5502,7 +5502,7 @@ static DEFINE_PER_CPU(bool, hardware_enabled);
static DEFINE_MUTEX(kvm_usage_lock);
static int kvm_usage_count;
-static int __hardware_enable_nolock(void)
+static int hardware_enable_nolock(void)
{
if (__this_cpu_read(hardware_enabled))
return 0;
@@ -5517,34 +5517,18 @@ static int __hardware_enable_nolock(void)
return 0;
}
-static void hardware_enable_nolock(void *failed)
-{
- if (__hardware_enable_nolock())
- atomic_inc(failed);
-}
-
static int kvm_online_cpu(unsigned int cpu)
{
- int ret = 0;
-
/*
* Abort the CPU online process if hardware virtualization cannot
* be enabled. Otherwise running VMs would encounter unrecoverable
* errors when scheduled to this CPU.
*/
- mutex_lock(&kvm_usage_lock);
- if (kvm_usage_count)
- ret = __hardware_enable_nolock();
- mutex_unlock(&kvm_usage_lock);
- return ret;
+ return hardware_enable_nolock();
}
static void hardware_disable_nolock(void *junk)
{
- /*
- * Note, hardware_disable_all_nolock() tells all online CPUs to disable
- * hardware, not just CPUs that successfully enabled hardware!
- */
if (!__this_cpu_read(hardware_enabled))
return;
@@ -5555,78 +5539,10 @@ static void hardware_disable_nolock(void *junk)
static int kvm_offline_cpu(unsigned int cpu)
{
- mutex_lock(&kvm_usage_lock);
- if (kvm_usage_count)
- hardware_disable_nolock(NULL);
- mutex_unlock(&kvm_usage_lock);
+ hardware_disable_nolock(NULL);
return 0;
}
-static void hardware_disable_all_nolock(void)
-{
- BUG_ON(!kvm_usage_count);
-
- kvm_usage_count--;
- if (!kvm_usage_count)
- on_each_cpu(hardware_disable_nolock, NULL, 1);
-}
-
-static void hardware_disable_all(void)
-{
- cpus_read_lock();
- mutex_lock(&kvm_usage_lock);
- hardware_disable_all_nolock();
- mutex_unlock(&kvm_usage_lock);
- cpus_read_unlock();
-}
-
-static int hardware_enable_all(void)
-{
- atomic_t failed = ATOMIC_INIT(0);
- int r;
-
- /*
- * Do not enable hardware virtualization if the system is going down.
- * If userspace initiated a forced reboot, e.g. reboot -f, then it's
- * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling
- * after kvm_reboot() is called. Note, this relies on system_state
- * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops
- * hook instead of registering a dedicated reboot notifier (the latter
- * runs before system_state is updated).
- */
- if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
- system_state == SYSTEM_RESTART)
- return -EBUSY;
-
- /*
- * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
- * is called, and so on_each_cpu() between them includes the CPU that
- * is being onlined. As a result, hardware_enable_nolock() may get
- * invoked before kvm_online_cpu(), which also enables hardware if the
- * usage count is non-zero. Disable CPU hotplug to avoid attempting to
- * enable hardware multiple times.
- */
- cpus_read_lock();
- mutex_lock(&kvm_usage_lock);
-
- r = 0;
-
- kvm_usage_count++;
- if (kvm_usage_count == 1) {
- on_each_cpu(hardware_enable_nolock, &failed, 1);
-
- if (atomic_read(&failed)) {
- hardware_disable_all_nolock();
- r = -EBUSY;
- }
- }
-
- mutex_unlock(&kvm_usage_lock);
- cpus_read_unlock();
-
- return r;
-}
-
static void kvm_shutdown(void)
{
/*
@@ -5658,8 +5574,7 @@ static int kvm_suspend(void)
lockdep_assert_not_held(&kvm_usage_lock);
lockdep_assert_irqs_disabled();
- if (kvm_usage_count)
- hardware_disable_nolock(NULL);
+ hardware_disable_nolock(NULL);
return 0;
}
@@ -5668,8 +5583,7 @@ static void kvm_resume(void)
lockdep_assert_not_held(&kvm_usage_lock);
lockdep_assert_irqs_disabled();
- if (kvm_usage_count)
- WARN_ON_ONCE(__hardware_enable_nolock());
+ WARN_ON_ONCE(hardware_enable_nolock());
}
static struct syscore_ops kvm_syscore_ops = {
@@ -5677,6 +5591,60 @@ static struct syscore_ops kvm_syscore_ops = {
.resume = kvm_resume,
.shutdown = kvm_shutdown,
};
+
+static int hardware_enable_all(void)
+{
+ int r;
+
+ guard(mutex)(&kvm_usage_lock);
+
+ if (kvm_usage_count++)
+ return 0;
+
+ r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
+ kvm_online_cpu, kvm_offline_cpu);
+ if (r)
+ goto err_cpuhp;
+
+ register_syscore_ops(&kvm_syscore_ops);
+
+ /*
+ * Undo virtualization enabling and bail if the system is going down.
+ * If userspace initiated a forced reboot, e.g. reboot -f, then it's
+ * possible for an in-flight operation to enable virtualization after
+ * syscore_shutdown() is called, i.e. without kvm_shutdown() being
+ * invoked. Note, this relies on system_state being set _before_
+ * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
+ * or this CPU observes the impending shutdown. Which is why KVM uses
+ * a syscore ops hook instead of registering a dedicated reboot
+ * notifier (the latter runs before system_state is updated).
+ */
+ if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
+ system_state == SYSTEM_RESTART) {
+ r = -EBUSY;
+ goto err_rebooting;
+ }
+
+ return 0;
+
+err_rebooting:
+ unregister_syscore_ops(&kvm_syscore_ops);
+ cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
+err_cpuhp:
+ --kvm_usage_count;
+ return r;
+}
+
+static void hardware_disable_all(void)
+{
+ guard(mutex)(&kvm_usage_lock);
+
+ if (--kvm_usage_count)
+ return;
+
+ unregister_syscore_ops(&kvm_syscore_ops);
+ cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
+}
#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
static int hardware_enable_all(void)
{
@@ -6382,15 +6350,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
int r;
int cpu;
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
- r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
- kvm_online_cpu, kvm_offline_cpu);
- if (r)
- return r;
-
- register_syscore_ops(&kvm_syscore_ops);
-#endif
-
/* A kmem cache lets us meet the alignment requirements of fx_save. */
if (!vcpu_align)
vcpu_align = __alignof__(struct kvm_vcpu);
@@ -6401,10 +6360,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
offsetofend(struct kvm_vcpu, stats_id)
- offsetof(struct kvm_vcpu, arch),
NULL);
- if (!kvm_vcpu_cache) {
- r = -ENOMEM;
- goto err_vcpu_cache;
- }
+ if (!kvm_vcpu_cache)
+ return -ENOMEM;
for_each_possible_cpu(cpu) {
if (!alloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu),
@@ -6461,11 +6418,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
for_each_possible_cpu(cpu)
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
kmem_cache_destroy(kvm_vcpu_cache);
-err_vcpu_cache:
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
- unregister_syscore_ops(&kvm_syscore_ops);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
-#endif
return r;
}
EXPORT_SYMBOL_GPL(kvm_init);
@@ -6487,10 +6439,6 @@ void kvm_exit(void)
kmem_cache_destroy(kvm_vcpu_cache);
kvm_vfio_ops_exit();
kvm_async_pf_deinit();
-#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
- unregister_syscore_ops(&kvm_syscore_ops);
- cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
-#endif
kvm_irqfd_exit();
}
EXPORT_SYMBOL_GPL(kvm_exit);
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 3/8] KVM: Rename functions related to enabling virtualization hardware
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded Sean Christopherson
` (7 subsequent siblings)
10 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Rename the various functions that enable virtualization to prepare for
upcoming changes, and to clean up artifacts of KVM's previous behavior,
which required manually juggling locks around kvm_usage_count.
Drop the "nolock" qualifier from per-CPU functions now that there are no
"nolock" implementations of the "all" variants, i.e. now that calling a
non-nolock function from a nolock function isn't confusing (unlike this
sentence).
Drop "all" from the outer helpers as they no longer manually iterate
over all CPUs, and because it might not be obvious what "all" refers to.
Instead, use double-underscores to communicate that the per-CPU functions
are helpers to the outer APIs.
Opportunistically prepend "kvm" to all functions to help make it clear
that they are KVM helpers, but mostly there's no reason not to.
Lastly, use "virtualization" instead of "hardware", because while the
functions do enable virtualization in hardware, there are a _lot_ of
things that KVM enables in hardware.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 32 ++++++++++++++++----------------
1 file changed, 16 insertions(+), 16 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f6b114f42433..98e52d12f137 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -138,8 +138,8 @@ static int kvm_no_compat_open(struct inode *inode, struct file *file)
#define KVM_COMPAT(c) .compat_ioctl = kvm_no_compat_ioctl, \
.open = kvm_no_compat_open
#endif
-static int hardware_enable_all(void);
-static void hardware_disable_all(void);
+static int kvm_enable_virtualization(void);
+static void kvm_disable_virtualization(void);
static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
@@ -1215,7 +1215,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
if (r)
goto out_err_no_arch_destroy_vm;
- r = hardware_enable_all();
+ r = kvm_enable_virtualization();
if (r)
goto out_err_no_disable;
@@ -1258,7 +1258,7 @@ static struct kvm *kvm_create_vm(unsigned long type, const char *fdname)
mmu_notifier_unregister(&kvm->mmu_notifier, current->mm);
#endif
out_err_no_mmu_notifier:
- hardware_disable_all();
+ kvm_disable_virtualization();
out_err_no_disable:
kvm_arch_destroy_vm(kvm);
out_err_no_arch_destroy_vm:
@@ -1353,7 +1353,7 @@ static void kvm_destroy_vm(struct kvm *kvm)
#endif
kvm_arch_free_vm(kvm);
preempt_notifier_dec();
- hardware_disable_all();
+ kvm_disable_virtualization();
mmdrop(mm);
}
@@ -5502,7 +5502,7 @@ static DEFINE_PER_CPU(bool, hardware_enabled);
static DEFINE_MUTEX(kvm_usage_lock);
static int kvm_usage_count;
-static int hardware_enable_nolock(void)
+static int __kvm_enable_virtualization(void)
{
if (__this_cpu_read(hardware_enabled))
return 0;
@@ -5524,10 +5524,10 @@ static int kvm_online_cpu(unsigned int cpu)
* be enabled. Otherwise running VMs would encounter unrecoverable
* errors when scheduled to this CPU.
*/
- return hardware_enable_nolock();
+ return __kvm_enable_virtualization();
}
-static void hardware_disable_nolock(void *junk)
+static void __kvm_disable_virtualization(void *ign)
{
if (!__this_cpu_read(hardware_enabled))
return;
@@ -5539,7 +5539,7 @@ static void hardware_disable_nolock(void *junk)
static int kvm_offline_cpu(unsigned int cpu)
{
- hardware_disable_nolock(NULL);
+ __kvm_disable_virtualization(NULL);
return 0;
}
@@ -5558,7 +5558,7 @@ static void kvm_shutdown(void)
*/
pr_info("kvm: exiting hardware virtualization\n");
kvm_rebooting = true;
- on_each_cpu(hardware_disable_nolock, NULL, 1);
+ on_each_cpu(__kvm_disable_virtualization, NULL, 1);
}
static int kvm_suspend(void)
@@ -5574,7 +5574,7 @@ static int kvm_suspend(void)
lockdep_assert_not_held(&kvm_usage_lock);
lockdep_assert_irqs_disabled();
- hardware_disable_nolock(NULL);
+ __kvm_disable_virtualization(NULL);
return 0;
}
@@ -5583,7 +5583,7 @@ static void kvm_resume(void)
lockdep_assert_not_held(&kvm_usage_lock);
lockdep_assert_irqs_disabled();
- WARN_ON_ONCE(hardware_enable_nolock());
+ WARN_ON_ONCE(__kvm_enable_virtualization());
}
static struct syscore_ops kvm_syscore_ops = {
@@ -5592,7 +5592,7 @@ static struct syscore_ops kvm_syscore_ops = {
.shutdown = kvm_shutdown,
};
-static int hardware_enable_all(void)
+static int kvm_enable_virtualization(void)
{
int r;
@@ -5635,7 +5635,7 @@ static int hardware_enable_all(void)
return r;
}
-static void hardware_disable_all(void)
+static void kvm_disable_virtualization(void)
{
guard(mutex)(&kvm_usage_lock);
@@ -5646,12 +5646,12 @@ static void hardware_disable_all(void)
cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
}
#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
-static int hardware_enable_all(void)
+static int kvm_enable_virtualization(void)
{
return 0;
}
-static void hardware_disable_all(void)
+static void kvm_disable_virtualization(void)
{
}
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (2 preceding siblings ...)
2024-06-08 0:06 ` [PATCH v3 3/8] KVM: Rename functions related to enabling virtualization hardware Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-08-02 12:02 ` Huang, Kai
2024-08-14 18:14 ` Paolo Bonzini
2024-06-08 0:06 ` [PATCH v3 5/8] KVM: Add arch hooks for enabling/disabling virtualization Sean Christopherson
` (6 subsequent siblings)
10 siblings, 2 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Add an off-by-default module param, enable_virt_at_load, to let userspace
force virtualization to be enabled in hardware when KVM is initialized,
i.e. just before /dev/kvm is exposed to userspace. Enabling virtualization
during KVM initialization allows userspace to avoid the additional latency
when creating/destroying the first/last VM. Now that KVM uses the cpuhp
framework to do per-CPU enabling, the latency could be non-trivial as the
cpuhp bringup/teardown is serialized across CPUs, e.g. the latency could
be problematic for use cases that need to spin up VMs quickly.
Enabling virtualization during initialization will also allow KVM to set up
the Intel TDX Module, which requires VMX to be fully enabled, without
needing additional APIs to temporarily enable virtualization.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 37 +++++++++++++++++++++++++++++++++++++
1 file changed, 37 insertions(+)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 98e52d12f137..7bdd744e4821 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5495,6 +5495,9 @@ static struct miscdevice kvm_dev = {
};
#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+static bool enable_virt_at_load;
+module_param(enable_virt_at_load, bool, 0444);
+
__visible bool kvm_rebooting;
EXPORT_SYMBOL_GPL(kvm_rebooting);
@@ -5645,15 +5648,41 @@ static void kvm_disable_virtualization(void)
unregister_syscore_ops(&kvm_syscore_ops);
cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
}
+
+static int kvm_init_virtualization(void)
+{
+ if (enable_virt_at_load)
+ return kvm_enable_virtualization();
+
+ return 0;
+}
+
+static void kvm_uninit_virtualization(void)
+{
+ if (enable_virt_at_load)
+ kvm_disable_virtualization();
+
+ WARN_ON(kvm_usage_count);
+}
#else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
static int kvm_enable_virtualization(void)
{
return 0;
}
+static int kvm_init_virtualization(void)
+{
+ return 0;
+}
+
static void kvm_disable_virtualization(void)
{
+}
+
+static void kvm_uninit_virtualization(void)
+{
+
}
#endif /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
@@ -6395,6 +6424,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
kvm_gmem_init(module);
+ r = kvm_init_virtualization();
+ if (r)
+ goto err_virt;
+
/*
* Registration _must_ be the very last thing done, as this exposes
* /dev/kvm to userspace, i.e. all infrastructure must be setup!
@@ -6408,6 +6441,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
return 0;
err_register:
+ kvm_uninit_virtualization();
+err_virt:
kvm_vfio_ops_exit();
err_vfio:
kvm_async_pf_deinit();
@@ -6433,6 +6468,8 @@ void kvm_exit(void)
*/
misc_deregister(&kvm_dev);
+ kvm_uninit_virtualization();
+
debugfs_remove_recursive(kvm_debugfs_dir);
for_each_possible_cpu(cpu)
free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 5/8] KVM: Add arch hooks for enabling/disabling virtualization
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (3 preceding siblings ...)
2024-06-08 0:06 ` [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-08-14 18:15 ` Paolo Bonzini
2024-06-08 0:06 ` [PATCH v3 6/8] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
` (5 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Add arch hooks that are invoked when KVM enables/disables virtualization.
x86 will use the hooks to register an "emergency disable" callback, which
is essentially an x86-specific shutdown notifier that is used when the
kernel is doing an emergency reboot/shutdown/kexec.
Add comments for the declarations to help arch code understand exactly
when the callbacks are invoked. Alternatively, the APIs themselves could
communicate most of the same info, but kvm_arch_pre_enable_virtualization()
and kvm_arch_post_disable_virtualization() are a bit cumbersome, and make
it a bit less obvious that they are intended to be implemented as a pair.
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/linux/kvm_host.h | 14 ++++++++++++++
virt/kvm/kvm_main.c | 14 ++++++++++++++
2 files changed, 28 insertions(+)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 96ad3e8b9ddb..12ef3beb4e47 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1514,6 +1514,20 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
#endif
#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
+/*
+ * kvm_arch_{enable,disable}_virtualization() are called on one CPU, under
+ * kvm_usage_lock, immediately after/before 0=>1 and 1=>0 transitions of
+ * kvm_usage_count, i.e. at the beginning of the generic hardware enabling
+ * sequence, and at the end of the generic hardware disabling sequence.
+ */
+void kvm_arch_enable_virtualization(void);
+void kvm_arch_disable_virtualization(void);
+/*
+ * kvm_arch_hardware_{enable,disable}() are called on "every" CPU to do the
+ * actual twiddling of hardware bits. The hooks are called on all online
+ * CPUs when KVM enables/disables virtualization. Enabling/disabling is
+ * also done when a CPU is onlined/offlined (or resumed/suspended).
+ */
int kvm_arch_hardware_enable(void);
void kvm_arch_hardware_disable(void);
#endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7bdd744e4821..e20189a89a64 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5505,6 +5505,16 @@ static DEFINE_PER_CPU(bool, hardware_enabled);
static DEFINE_MUTEX(kvm_usage_lock);
static int kvm_usage_count;
+__weak void kvm_arch_enable_virtualization(void)
+{
+
+}
+
+__weak void kvm_arch_disable_virtualization(void)
+{
+
+}
+
static int __kvm_enable_virtualization(void)
{
if (__this_cpu_read(hardware_enabled))
@@ -5604,6 +5614,8 @@ static int kvm_enable_virtualization(void)
if (kvm_usage_count++)
return 0;
+ kvm_arch_enable_virtualization();
+
r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
kvm_online_cpu, kvm_offline_cpu);
if (r)
@@ -5634,6 +5646,7 @@ static int kvm_enable_virtualization(void)
unregister_syscore_ops(&kvm_syscore_ops);
cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
err_cpuhp:
+ kvm_arch_disable_virtualization();
--kvm_usage_count;
return r;
}
@@ -5647,6 +5660,7 @@ static void kvm_disable_virtualization(void)
unregister_syscore_ops(&kvm_syscore_ops);
cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
+ kvm_arch_disable_virtualization();
}
static int kvm_init_virtualization(void)
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 6/8] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (4 preceding siblings ...)
2024-06-08 0:06 ` [PATCH v3 5/8] KVM: Add arch hooks for enabling/disabling virtualization Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 7/8] KVM: x86: Register "emergency disable" callbacks when virt is enabled Sean Christopherson
` (4 subsequent siblings)
10 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Define cpu_emergency_virt_cb even if the kernel is being built without KVM
support so that KVM can reference the typedef in asm/kvm_host.h without
needing yet more #ifdefs.
No functional change intended.
Acked-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/reboot.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index 6536873f8fc0..d0ef2a678d66 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -25,8 +25,8 @@ void __noreturn machine_real_restart(unsigned int type);
#define MRR_BIOS 0
#define MRR_APM 1
-#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
typedef void (cpu_emergency_virt_cb)(void);
+#if IS_ENABLED(CONFIG_KVM_INTEL) || IS_ENABLED(CONFIG_KVM_AMD)
void cpu_emergency_register_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_unregister_virt_callback(cpu_emergency_virt_cb *callback);
void cpu_emergency_disable_virtualization(void);
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 7/8] KVM: x86: Register "emergency disable" callbacks when virt is enabled
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (5 preceding siblings ...)
2024-06-08 0:06 ` [PATCH v3 6/8] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 8/8] KVM: Enable virtualization at load/initialization by default Sean Christopherson
` (3 subsequent siblings)
10 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Register the "disable virtualization in an emergency" callback just
before KVM enables virtualization in hardware, as there is no functional
need to keep the callbacks registered while KVM happens to be loaded, but
is inactive, i.e. if KVM hasn't enabled virtualization.
Note, unregistering the callback every time the last VM is destroyed could
have measurable latency due to the synchronize_rcu() needed to ensure all
references to the callback are dropped before KVM is unloaded. But the
latency should be a small fraction of the total latency of disabling
virtualization across all CPUs, and userspace can set enable_virt_at_load
to completely eliminate the runtime overhead.
Add a pointer in kvm_x86_ops to allow vendor code to provide its callback.
There is no reason to force vendor code to do the registration, and either
way KVM would need a new kvm_x86_ops hook.
Suggested-by: Kai Huang <kai.huang@intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 3 +++
arch/x86/kvm/svm/svm.c | 5 +----
arch/x86/kvm/vmx/main.c | 2 ++
arch/x86/kvm/vmx/vmx.c | 6 +-----
arch/x86/kvm/vmx/x86_ops.h | 1 +
arch/x86/kvm/x86.c | 10 ++++++++++
6 files changed, 18 insertions(+), 9 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5c0415899a07..a4444b43f575 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
#include <asm/kvm_page_track.h>
#include <asm/kvm_vcpu_regs.h>
#include <asm/hyperv-tlfs.h>
+#include <asm/reboot.h>
#define __KVM_HAVE_ARCH_VCPU_DEBUGFS
@@ -1626,6 +1627,8 @@ struct kvm_x86_ops {
int (*hardware_enable)(void);
void (*hardware_disable)(void);
+ cpu_emergency_virt_cb *emergency_disable;
+
void (*hardware_unsetup)(void);
bool (*has_emulated_msr)(struct kvm *kvm, u32 index);
void (*vcpu_after_set_cpuid)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d33193d522e3..aa0aeb185d17 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4981,6 +4981,7 @@ static void *svm_alloc_apic_backing_page(struct kvm_vcpu *vcpu)
static struct kvm_x86_ops svm_x86_ops __initdata = {
.name = KBUILD_MODNAME,
+ .emergency_disable = svm_emergency_disable,
.check_processor_compatibility = svm_check_processor_compat,
.hardware_unsetup = svm_hardware_unsetup,
@@ -5416,8 +5417,6 @@ static struct kvm_x86_init_ops svm_init_ops __initdata = {
static void __svm_exit(void)
{
kvm_x86_vendor_exit();
-
- cpu_emergency_unregister_virt_callback(svm_emergency_disable);
}
static int __init svm_init(void)
@@ -5433,8 +5432,6 @@ static int __init svm_init(void)
if (r)
return r;
- cpu_emergency_register_virt_callback(svm_emergency_disable);
-
/*
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d0e1a5b5c915..45d6b5ad2da3 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -25,6 +25,8 @@ struct kvm_x86_ops vt_x86_ops __initdata = {
.hardware_enable = vmx_hardware_enable,
.hardware_disable = vmx_hardware_disable,
+ .emergency_disable = vmx_emergency_disable,
+
.has_emulated_msr = vmx_has_emulated_msr,
.vm_size = sizeof(struct kvm_vmx),
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 0e3aaf520db2..7edbd4e5758e 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -754,7 +754,7 @@ static int kvm_cpu_vmxoff(void)
return -EIO;
}
-static void vmx_emergency_disable(void)
+void vmx_emergency_disable(void)
{
int cpu = raw_smp_processor_id();
struct loaded_vmcs *v;
@@ -8603,8 +8603,6 @@ static void __vmx_exit(void)
{
allow_smaller_maxphyaddr = false;
- cpu_emergency_unregister_virt_callback(vmx_emergency_disable);
-
vmx_cleanup_l1d_flush();
}
@@ -8651,8 +8649,6 @@ static int __init vmx_init(void)
pi_init_cpu(cpu);
}
- cpu_emergency_register_virt_callback(vmx_emergency_disable);
-
vmx_check_vmcs12_offsets();
/*
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index 502704596c83..afddfe3747dd 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -15,6 +15,7 @@ void vmx_hardware_unsetup(void);
int vmx_check_processor_compat(void);
int vmx_hardware_enable(void);
void vmx_hardware_disable(void);
+void vmx_emergency_disable(void);
int vmx_vm_init(struct kvm *kvm);
void vmx_vm_destroy(struct kvm *kvm);
int vmx_vcpu_precreate(struct kvm *kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4157602c964e..e4bdb42a9b00 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12478,6 +12478,16 @@ void kvm_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector)
}
EXPORT_SYMBOL_GPL(kvm_vcpu_deliver_sipi_vector);
+void kvm_arch_enable_virtualization(void)
+{
+ cpu_emergency_register_virt_callback(kvm_x86_ops.emergency_disable);
+}
+
+void kvm_arch_disable_virtualization(void)
+{
+ cpu_emergency_unregister_virt_callback(kvm_x86_ops.emergency_disable);
+}
+
int kvm_arch_hardware_enable(void)
{
struct kvm *kvm;
--
2.45.2.505.gda0bf45e8d-goog
* [PATCH v3 8/8] KVM: Enable virtualization at load/initialization by default
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (6 preceding siblings ...)
2024-06-08 0:06 ` [PATCH v3 7/8] KVM: x86: Register "emergency disable" callbacks when virt is enabled Sean Christopherson
@ 2024-06-08 0:06 ` Sean Christopherson
2024-08-14 18:20 ` Paolo Bonzini
2024-06-10 0:59 ` [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Huang, Kai
` (2 subsequent siblings)
10 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2024-06-08 0:06 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
Enable virtualization when KVM loads by default, as doing so avoids the
potential runtime overhead associated with using the cpuhp framework to
enable virtualization on each CPU.
Prior to commit 10474ae8945c ("KVM: Activate Virtualization On Demand"),
KVM _unconditionally_ enabled virtualization during load, i.e. there's no
fundamental reason KVM needs to dynamically toggle virtualization. These
days, the only known argument for not enabling virtualization is to allow
KVM to be autoloaded without blocking other out-of-tree hypervisors, and
such use cases can simply change the module param, e.g. via command line.
Note, the aforementioned commit also mentioned that enabling SVM (AMD's
virtualization extensions) can result in "using invalid TLB entries".
It's not clear whether the changelog was referring to a KVM bug, a CPU
bug, or something else entirely. Regardless, leaving virtualization off
by default is not a robust "fix", as any protection provided is lost the
instant userspace creates the first VM.
Suggested-by: Chao Gao <chao.gao@intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e20189a89a64..1440c0a7c3c3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -5495,7 +5495,7 @@ static struct miscdevice kvm_dev = {
};
#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
-static bool enable_virt_at_load;
+static bool enable_virt_at_load = true;
module_param(enable_virt_at_load, bool, 0444);
__visible bool kvm_rebooting;
--
2.45.2.505.gda0bf45e8d-goog
* Re: [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
2024-06-08 0:06 ` [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock Sean Christopherson
@ 2024-06-10 0:26 ` Huang, Kai
2024-08-14 18:06 ` Paolo Bonzini
1 sibling, 0 replies; 31+ messages in thread
From: Huang, Kai @ 2024-06-10 0:26 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Gao, Chao
On Fri, 2024-06-07 at 17:06 -0700, Sean Christopherson wrote:
> Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
> on x86 due to a chain of locks and SRCU synchronizations. Translating the
> below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
> CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
> fairness of r/w semaphores).
>
> CPU0 CPU1 CPU2
> 1 lock(&kvm->slots_lock);
> 2 lock(&vcpu->mutex);
> 3 lock(&kvm->srcu);
> 4 lock(cpu_hotplug_lock);
> 5 lock(kvm_lock);
> 6 lock(&kvm->slots_lock);
> 7 lock(cpu_hotplug_lock);
> 8 sync(&kvm->srcu);
>
>
[...]
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Kai Huang <kai.huang@intel.com>
Nitpickings below:
> ---
> Documentation/virt/kvm/locking.rst | 19 ++++++++++++------
> virt/kvm/kvm_main.c | 31 +++++++++++++++---------------
> 2 files changed, 29 insertions(+), 21 deletions(-)
>
> diff --git a/Documentation/virt/kvm/locking.rst b/Documentation/virt/kvm/locking.rst
> index 02880d5552d5..5e102fe5b396 100644
> --- a/Documentation/virt/kvm/locking.rst
> +++ b/Documentation/virt/kvm/locking.rst
> @@ -227,7 +227,13 @@ time it will be set using the Dirty tracking mechanism described above.
> :Type: mutex
> :Arch: any
> :Protects: - vm_list
> - - kvm_usage_count
> +
> +``kvm_usage_count``
> +^^^^^^^^^^^^^^^^^^^
kvm_usage_lock
> +
> +:Type: mutex
> +:Arch: any
> +:Protects: - kvm_usage_count
> - hardware virtualization enable/disable
> :Comment: KVM also disables CPU hotplug via cpus_read_lock() during
> enable/disable.
I think this sentence should be improved to at least mention "Exists
because using kvm_lock leads to deadlock", just like the comment for
vendor_module_lock below.
> @@ -290,11 +296,12 @@ time it will be set using the Dirty tracking mechanism described above.
> wakeup.
>
> ``vendor_module_lock``
> -^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +^^^^^^^^^^^^^^^^^^^^^^
> :Type: mutex
> :Arch: x86
> :Protects: loading a vendor module (kvm_amd or kvm_intel)
> -:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
> - taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
> - many operations need to take cpu_hotplug_lock when loading a vendor module,
> - e.g. updating static calls.
> +:Comment: Exists because using kvm_lock leads to deadlock. kvm_lock is taken
> + in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while
> + cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many
> + operations need to take cpu_hotplug_lock when loading a vendor module, e.g.
> + updating static calls.
* Re: [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware
2024-06-08 0:06 ` [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
@ 2024-06-10 0:55 ` Huang, Kai
2024-08-14 18:12 ` Paolo Bonzini
1 sibling, 0 replies; 31+ messages in thread
From: Huang, Kai @ 2024-06-10 0:55 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Gao, Chao
On Fri, 2024-06-07 at 17:06 -0700, Sean Christopherson wrote:
> Register KVM's cpuhp and syscore callback when enabling virtualization
> in hardware instead of registering the callbacks during initialization,
> and let the CPU up/down framework invoke the inner enable/disable
> functions. Registering the callbacks during initialization makes things
> more complex than they need to be, as KVM needs to be very careful about
> handling races between CPUs being onlined/offlined and hardware
> being enabled/disabled.
>
> Intel TDX support will require KVM to enable virtualization during KVM
> initialization, i.e. will add another wrinkle to things, at which point
> sorting out the potential races with kvm_usage_count would become even
> more complex.
>
> Note, using the cpuhp framework has a subtle behavioral change: enabling
> will be done serially across all CPUs, whereas KVM currently sends an IPI
> to all CPUs in parallel. While serializing virtualization enabling could
> create undesirable latency, the issue is limited to creation of KVM's
> first VM, and even that can be mitigated, e.g. by letting userspace force
> virtualization to be enabled when KVM is initialized.
>
> Cc: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>
Reviewed-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (7 preceding siblings ...)
2024-06-08 0:06 ` [PATCH v3 8/8] KVM: Enable virtualization at load/initialization by default Sean Christopherson
@ 2024-06-10 0:59 ` Huang, Kai
2024-08-14 18:23 ` Paolo Bonzini
2024-08-20 8:57 ` Paolo Bonzini
10 siblings, 0 replies; 31+ messages in thread
From: Huang, Kai @ 2024-06-10 0:59 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Gao, Chao
On Fri, 2024-06-07 at 17:06 -0700, Sean Christopherson wrote:
> Register KVM's cpuhp and syscore callbacks when enabling virtualization in
> hardware, as the sole purpose of said callbacks is to disable and re-enable
> virtualization as needed.
>
> The primary motivation for this series is to simplify dealing with enabling
> virtualization for Intel's TDX, which needs to enable virtualization
> when kvm-intel.ko is loaded, i.e. long before the first VM is created. TDX
> doesn't _need_ to keep virtualization enabled, but doing so is much simpler
> for KVM (see patch 3).
>
> That said, this is a nice cleanup on its own, assuming I haven't broken
> something. By registering the callbacks on-demand, the callbacks themselves
> don't need to check kvm_usage_count, because their very existence implies a
> non-zero count.
>
>
For this series,
Acked-by: Kai Huang <kai.huang@intel.com>
* Re: [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded
2024-06-08 0:06 ` [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded Sean Christopherson
@ 2024-08-02 12:02 ` Huang, Kai
2024-08-02 12:06 ` Huang, Kai
2024-08-13 2:31 ` Sean Christopherson
2024-08-14 18:14 ` Paolo Bonzini
1 sibling, 2 replies; 31+ messages in thread
From: Huang, Kai @ 2024-08-02 12:02 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Gao, Chao
> +static void kvm_uninit_virtualization(void)
> +{
> + if (enable_virt_at_load)
> + kvm_disable_virtualization();
> +
> + WARN_ON(kvm_usage_count);
> +}
>
Hi Sean,
The above "WARN_ON(kvm_usage_count);" assumes the
kvm_uninit_virtualization() is the last call of
kvm_disable_virtualization(), and it is called ...
> @@ -6433,6 +6468,8 @@ void kvm_exit(void)
> */
> misc_deregister(&kvm_dev);
>
> + kvm_uninit_virtualization();
> +
>
... from kvm_exit().
Accordingly, kvm_init_virtualization() is called in kvm_init().
For TDX, we want to "explicitly call kvm_enable_virtualization() +
initializing TDX module" before kvm_init() in vt_init(), since kvm_init()
is supposed to be the last step after initializing TDX.
In the exit path, accordingly, for TDX we want to call kvm_exit() first,
and then "do TDX cleanup stuff + explicitly call
kvm_disable_virtualization()".
This will trigger the above "WARN_ON(kvm_usage_count);" when
enable_virt_at_load is true, because kvm_uninit_virtualization() isn't
the last call of kvm_disable_virtualization().
To resolve, I think one way is we can move kvm_init_virtualization() out
of kvm_init(), but I am not sure whether there's another common place
that kvm_init_virtualization() can be called for all ARCHs.
Do you have any comments?
* Re: [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded
2024-08-02 12:02 ` Huang, Kai
@ 2024-08-02 12:06 ` Huang, Kai
2024-08-13 2:31 ` Sean Christopherson
1 sibling, 0 replies; 31+ messages in thread
From: Huang, Kai @ 2024-08-02 12:06 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Gao, Chao
On Fri, 2024-08-02 at 12:02 +0000, Huang, Kai wrote:
> This will trigger the above "WARN_ON(kvm_usage_count);" when
> enable_virt_at_load is true, because kvm_uninit_virtualization() isn't
> the last call of kvm_disable_virtualization().
Correct: the WARN_ON() will be triggered regardless of whether
enable_virt_at_load is true or false, if kvm_uninit_virtualization()
isn't the last call of kvm_disable_virtualization().
* Re: [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded
2024-08-02 12:02 ` Huang, Kai
2024-08-02 12:06 ` Huang, Kai
@ 2024-08-13 2:31 ` Sean Christopherson
2024-08-13 5:22 ` Huang, Kai
1 sibling, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2024-08-13 2:31 UTC (permalink / raw)
To: Kai Huang
Cc: pbonzini@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, Chao Gao
On Fri, Aug 02, 2024, Kai Huang wrote:
>
> > +static void kvm_uninit_virtualization(void)
> > +{
> > + if (enable_virt_at_load)
> > + kvm_disable_virtualization();
> > +
> > + WARN_ON(kvm_usage_count);
> > +}
> >
>
> Hi Sean,
>
> The above "WARN_ON(kvm_usage_count);" assumes the
> kvm_uninit_virtualization() is the last call of
> kvm_disable_virtualization(), and it is called ...
>
> > @@ -6433,6 +6468,8 @@ void kvm_exit(void)
> > */
> > misc_deregister(&kvm_dev);
> >
> > + kvm_uninit_virtualization();
> > +
> >
>
> ... from kvm_exit().
>
> Accordingly, kvm_init_virtualization() is called in kvm_init().
>
> For TDX, we want to "explicitly call kvm_enable_virtualization() +
> initializing TDX module" before kvm_init() in vt_init(), since kvm_init()
> is supposed to be the last step after initializing TDX.
>
> In the exit path, accordingly, for TDX we want to call kvm_exit() first,
> and then "do TDX cleanup stuff + explicitly call
> kvm_disable_virtualization()".
>
> This will trigger the above "WARN_ON(kvm_usage_count);" when
> enable_virt_at_load is true, because kvm_uninit_virtualization() isn't
> the last call of kvm_disable_virtualization().
>
> To resolve, I think one way is we can move kvm_init_virtualization() out
> of kvm_init(), but I am not sure whether there's another common place
> that kvm_init_virtualization() can be called for all ARCHs.
>
> Do you have any comments?
Drat. That's my main comment, though not the exact word I used :-)
I managed to completely forget about TDX needing to enable virtualization to do
its setup before creating /dev/kvm. A few options jump to mind:
1. Expose kvm_enable_virtualization() to arch code and delete the WARN_ON().
2. Move kvm_init_virtualization() as you suggested.
3. Move the call to misc_register() out of kvm_init(), so that arch code can
do additional setup between kvm_init() and kvm_register_dev_kvm() or whatever.
I'm leaning towards #1. IIRC, that was my original intent before going down the
"enable virtualization at module load" path. And it's not mutually exclusive
with allowing virtualization to be forced on at module load.
If #1 isn't a good option for whatever reason, I'd lean slightly for #3 over #2,
purely because it's less arbitrary (registering /dev/kvm is the only thing that
has strict ordering requirements). But I don't know that having a separate
registration API would be a net positive, e.g. it's kinda nice that kvm_init()
needs to be last, because it helps ensure some amount of guaranteed ordering
between common KVM and arch code.
* Re: [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded
2024-08-13 2:31 ` Sean Christopherson
@ 2024-08-13 5:22 ` Huang, Kai
2024-08-13 23:47 ` Huang, Kai
0 siblings, 1 reply; 31+ messages in thread
From: Huang, Kai @ 2024-08-13 5:22 UTC (permalink / raw)
To: seanjc@google.com
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, Gao, Chao
On Mon, 2024-08-12 at 19:31 -0700, Sean Christopherson wrote:
> On Fri, Aug 02, 2024, Kai Huang wrote:
> >
> > > +static void kvm_uninit_virtualization(void)
> > > +{
> > > + if (enable_virt_at_load)
> > > + kvm_disable_virtualization();
> > > +
> > > + WARN_ON(kvm_usage_count);
> > > +}
> > >
> >
> > Hi Sean,
> >
> > The above "WARN_ON(kvm_usage_count);" assumes the
> > kvm_uninit_virtualization() is the last call of
> > kvm_disable_virtualization(), and it is called ...
> >
> > > @@ -6433,6 +6468,8 @@ void kvm_exit(void)
> > > */
> > > misc_deregister(&kvm_dev);
> > >
> > > + kvm_uninit_virtualization();
> > > +
> > >
> >
> > ... from kvm_exit().
> >
> > Accordingly, kvm_init_virtualization() is called in kvm_init().
> >
> > For TDX, we want to "explicitly call kvm_enable_virtualization() +
> > initializing TDX module" before kvm_init() in vt_init(), since kvm_init()
> > is supposed to be the last step after initializing TDX.
> >
> > In the exit path, accordingly, for TDX we want to call kvm_exit() first,
> > and then "do TDX cleanup stuff + explicitly call
> > kvm_disable_virtualization()".
> >
> > This will trigger the above "WARN_ON(kvm_usage_count);" when
> > enable_virt_at_load is true, because kvm_uninit_virtualization() isn't
> > the last call of kvm_disable_virtualization().
> >
> > To resolve, I think one way is we can move kvm_init_virtualization() out
> > of kvm_init(), but I am not sure whether there's another common place
> > that kvm_init_virtualization() can be called for all ARCHs.
> >
> > Do you have any comments?
>
> Drat. That's my main comment, though not the exact word I used :-)
>
> I managed to completely forget about TDX needing to enable virtualization to do
> its setup before creating /dev/kvm. A few options jump to mind:
>
> 1. Expose kvm_enable_virtualization() to arch code and delete the WARN_ON().
>
> 2. Move kvm_init_virtualization() as you suggested.
>
> 3. Move the call to misc_register() out of kvm_init(), so that arch code can
> do additional setup between kvm_init() and kvm_register_dev_kvm() or whatever.
>
> I'm leaning towards #1. IIRC, that was my original intent before going down the
> "enable virtualization at module load" path. And it's not mutually exclusive
> with allowing virtualization to be forced on at module load.
>
> If #1 isn't a good option for whatever reason, I'd lean slightly for #3 over #2,
> purely because it's less arbitrary (registering /dev/kvm is the only thing that
> has strict ordering requirements). But I don't know that having a separate
> registration API would be a net positive, e.g. it's kinda nice that kvm_init()
> needs to be last, because it helps ensure some amount of guaranteed ordering
> between common KVM and arch code.
I agree with option 1). We just allow arch code to do additional
kvm_enable_virtualization() before kvm_init() and kvm_disable_virtualization()
after kvm_exit(). I think it's kinda normal behaviour anyway.
And this is exactly what I am doing :-)
https://github.com/intel/tdx/commit/2f7cef685527a5ef952346ff5ab9adbb8bb6f371
https://github.com/intel/tdx/commit/6c76ffa47a98ca370fad389271dc3cedf304df2d
* Re: [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded
2024-08-13 5:22 ` Huang, Kai
@ 2024-08-13 23:47 ` Huang, Kai
0 siblings, 0 replies; 31+ messages in thread
From: Huang, Kai @ 2024-08-13 23:47 UTC (permalink / raw)
To: seanjc@google.com
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, Gao, Chao
On 13/08/2024 5:22 pm, Huang, Kai wrote:
> On Mon, 2024-08-12 at 19:31 -0700, Sean Christopherson wrote:
>> On Fri, Aug 02, 2024, Kai Huang wrote:
>>>
>>>> +static void kvm_uninit_virtualization(void)
>>>> +{
>>>> + if (enable_virt_at_load)
>>>> + kvm_disable_virtualization();
>>>> +
>>>> + WARN_ON(kvm_usage_count);
>>>> +}
>>>>
>>>
>>> Hi Sean,
>>>
>>> The above "WARN_ON(kvm_usage_count);" assumes the
>>> kvm_uninit_virtualization() is the last call of
>>> kvm_disable_virtualization(), and it is called ...
>>>
>>>> @@ -6433,6 +6468,8 @@ void kvm_exit(void)
>>>> */
>>>> misc_deregister(&kvm_dev);
>>>>
>>>> + kvm_uninit_virtualization();
>>>> +
>>>>
>>>
>>> ... from kvm_exit().
>>>
>>> Accordingly, kvm_init_virtualization() is called in kvm_init().
>>>
>>> For TDX, we want to "explicitly call kvm_enable_virtualization() +
>>> initializing TDX module" before kvm_init() in vt_init(), since kvm_init()
>>> is supposed to be the last step after initializing TDX.
>>>
>>> In the exit path, accordingly, for TDX we want to call kvm_exit() first,
>>> and then "do TDX cleanup stuff + explicitly call
>>> kvm_disable_virtualization()".
>>>
>>> This will trigger the above "WARN_ON(kvm_usage_count);" when
>>> enable_virt_at_load is true, because kvm_uninit_virtualization() isn't
>>> the last call of kvm_disable_virtualization().
>>>
>>> To resolve, I think one way is we can move kvm_init_virtualization() out
>>> of kvm_init(), but I am not sure whether there's another common place
>>> that kvm_init_virtualization() can be called for all ARCHs.
>>>
>>> Do you have any comments?
>>
>> Drat. That's my main comment, though not the exact word I used :-)
>>
>> I managed to completely forget about TDX needing to enable virtualization to do
>> its setup before creating /dev/kvm. A few options jump to mind:
>>
>> 1. Expose kvm_enable_virtualization() to arch code and delete the WARN_ON().
>>
>> 2. Move kvm_init_virtualization() as you suggested.
>>
>> 3. Move the call to misc_register() out of kvm_init(), so that arch code can
>> do additional setup between kvm_init() and kvm_register_dev_kvm() or whatever.
>>
>> I'm leaning towards #1. IIRC, that was my original intent before going down the
>> "enable virtualization at module load" path. And it's not mutually exclusive
>> with allowing virtualization to be forced on at module load.
>>
>> If #1 isn't a good option for whatever reason, I'd lean slightly for #3 over #2,
>> purely because it's less arbitrary (registering /dev/kvm is the only thing that
>> has strict ordering requirements). But I don't know that having a separate
>> registration API would be a net positive, e.g. it's kinda nice that kvm_init()
>> needs to be last, because it helps ensure some amount of guaranteed ordering
>> between common KVM and arch code.
>
> I agree with option 1). We just allow arch code to do additional
> kvm_enable_virtualization() before kvm_init() and kvm_disable_virtualization()
> after kvm_exit(). I think it's kinda normal behaviour anyway.
>
> And this is exactly what I am doing :-)
>
> https://github.com/intel/tdx/commit/2f7cef685527a5ef952346ff5ab9adbb8bb6f371
> https://github.com/intel/tdx/commit/6c76ffa47a98ca370fad389271dc3cedf304df2d
>
Hi Sean,
Forgot to ask: I assume you will post v4 with that WARN_ON() in
kvm_uninit_virtualization() removed, correct?
I am not sure whether you will include the patch to export
kvm_enable_virtualization() and kvm_disable_virtualization() but either
way is fine to me.
I am thinking that if we can get this patchset merged into kvm-coco-queue,
we can actually start to review (and try to integrate) the "init TDX
during KVM loading" patches.
I would appreciate it if you and Paolo could share some plan on this. Thanks!
^ permalink raw reply [flat|nested] 31+ messages in thread
* Re: [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
2024-06-08 0:06 ` [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock Sean Christopherson
2024-06-10 0:26 ` Huang, Kai
@ 2024-08-14 18:06 ` Paolo Bonzini
2024-08-15 14:39 ` Sean Christopherson
1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-14 18:06 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On 6/8/24 02:06, Sean Christopherson wrote:
> Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
> on x86 due to a chain of locks and SRCU synchronizations. Translating the
> below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
> CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
> fairness of r/w semaphores).
>
> CPU0 CPU1 CPU2
> 1 lock(&kvm->slots_lock);
> 2 lock(&vcpu->mutex);
> 3 lock(&kvm->srcu);
> 4 lock(cpu_hotplug_lock);
> 5 lock(kvm_lock);
> 6 lock(&kvm->slots_lock);
> 7 lock(cpu_hotplug_lock);
> 8 sync(&kvm->srcu);
>
> Note, there are likely more potential deadlocks in KVM x86, e.g. the same
> pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
> __kvmclock_cpufreq_notifier()
Offhand I couldn't see any places where {,__}cpufreq_driver_target() is
called within cpus_read_lock(). I didn't look too closely though.
> +``kvm_usage_count``
> +^^^^^^^^^^^^^^^^^^^
``kvm_usage_lock``
Paolo
> +
> +:Type: mutex
> +:Arch: any
> +:Protects: - kvm_usage_count
> - hardware virtualization enable/disable
> :Comment: KVM also disables CPU hotplug via cpus_read_lock() during
> enable/disable.
> @@ -290,11 +296,12 @@ time it will be set using the Dirty tracking mechanism described above.
> wakeup.
>
> ``vendor_module_lock``
> -^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> +^^^^^^^^^^^^^^^^^^^^^^
> :Type: mutex
> :Arch: x86
> :Protects: loading a vendor module (kvm_amd or kvm_intel)
> -:Comment: Exists because using kvm_lock leads to deadlock. cpu_hotplug_lock is
> - taken outside of kvm_lock, e.g. in KVM's CPU online/offline callbacks, and
> - many operations need to take cpu_hotplug_lock when loading a vendor module,
> - e.g. updating static calls.
> +:Comment: Exists because using kvm_lock leads to deadlock. kvm_lock is taken
> + in notifiers, e.g. __kvmclock_cpufreq_notifier(), that may be invoked while
> + cpu_hotplug_lock is held, e.g. from cpufreq_boost_trigger_state(), and many
> + operations need to take cpu_hotplug_lock when loading a vendor module, e.g.
> + updating static calls.
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4965196cad58..d9b0579d3eea 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5499,6 +5499,7 @@ __visible bool kvm_rebooting;
> EXPORT_SYMBOL_GPL(kvm_rebooting);
>
> static DEFINE_PER_CPU(bool, hardware_enabled);
> +static DEFINE_MUTEX(kvm_usage_lock);
> static int kvm_usage_count;
>
> static int __hardware_enable_nolock(void)
> @@ -5531,10 +5532,10 @@ static int kvm_online_cpu(unsigned int cpu)
> * be enabled. Otherwise running VMs would encounter unrecoverable
> * errors when scheduled to this CPU.
> */
> - mutex_lock(&kvm_lock);
> + mutex_lock(&kvm_usage_lock);
> if (kvm_usage_count)
> ret = __hardware_enable_nolock();
> - mutex_unlock(&kvm_lock);
> + mutex_unlock(&kvm_usage_lock);
> return ret;
> }
>
> @@ -5554,10 +5555,10 @@ static void hardware_disable_nolock(void *junk)
>
> static int kvm_offline_cpu(unsigned int cpu)
> {
> - mutex_lock(&kvm_lock);
> + mutex_lock(&kvm_usage_lock);
> if (kvm_usage_count)
> hardware_disable_nolock(NULL);
> - mutex_unlock(&kvm_lock);
> + mutex_unlock(&kvm_usage_lock);
> return 0;
> }
>
> @@ -5573,9 +5574,9 @@ static void hardware_disable_all_nolock(void)
> static void hardware_disable_all(void)
> {
> cpus_read_lock();
> - mutex_lock(&kvm_lock);
> + mutex_lock(&kvm_usage_lock);
> hardware_disable_all_nolock();
> - mutex_unlock(&kvm_lock);
> + mutex_unlock(&kvm_usage_lock);
> cpus_read_unlock();
> }
>
> @@ -5606,7 +5607,7 @@ static int hardware_enable_all(void)
> * enable hardware multiple times.
> */
> cpus_read_lock();
> - mutex_lock(&kvm_lock);
> + mutex_lock(&kvm_usage_lock);
>
> r = 0;
>
> @@ -5620,7 +5621,7 @@ static int hardware_enable_all(void)
> }
> }
>
> - mutex_unlock(&kvm_lock);
> + mutex_unlock(&kvm_usage_lock);
> cpus_read_unlock();
>
> return r;
> @@ -5648,13 +5649,13 @@ static int kvm_suspend(void)
> {
> /*
> * Secondary CPUs and CPU hotplug are disabled across the suspend/resume
> - * callbacks, i.e. no need to acquire kvm_lock to ensure the usage count
> - * is stable. Assert that kvm_lock is not held to ensure the system
> - * isn't suspended while KVM is enabling hardware. Hardware enabling
> - * can be preempted, but the task cannot be frozen until it has dropped
> - * all locks (userspace tasks are frozen via a fake signal).
> + * callbacks, i.e. no need to acquire kvm_usage_lock to ensure the usage
> + * count is stable. Assert that kvm_usage_lock is not held to ensure
> + * the system isn't suspended while KVM is enabling hardware. Hardware
> + * enabling can be preempted, but the task cannot be frozen until it has
> + * dropped all locks (userspace tasks are frozen via a fake signal).
> */
> - lockdep_assert_not_held(&kvm_lock);
> + lockdep_assert_not_held(&kvm_usage_lock);
> lockdep_assert_irqs_disabled();
>
> if (kvm_usage_count)
> @@ -5664,7 +5665,7 @@ static int kvm_suspend(void)
>
> static void kvm_resume(void)
> {
> - lockdep_assert_not_held(&kvm_lock);
> + lockdep_assert_not_held(&kvm_usage_lock);
> lockdep_assert_irqs_disabled();
>
> if (kvm_usage_count)
* Re: [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware
2024-06-08 0:06 ` [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
2024-06-10 0:55 ` Huang, Kai
@ 2024-08-14 18:12 ` Paolo Bonzini
2024-08-14 20:55 ` Sean Christopherson
1 sibling, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-14 18:12 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On 6/8/24 02:06, Sean Christopherson wrote:
> Register KVM's cpuhp and syscore callback when enabling virtualization
> in hardware instead of registering the callbacks during initialization,
> and let the CPU up/down framework invoke the inner enable/disable
> functions. Registering the callbacks during initialization makes things
> more complex than they need to be, as KVM needs to be very careful about
> handling races between enabling CPUs being onlined/offlined and hardware
> being enabled/disabled.
>
> Intel TDX support will require KVM to enable virtualization during KVM
> initialization, i.e. will add another wrinkle to things, at which point
> sorting out the potential races with kvm_usage_count would become even
> more complex.
>
> Note, using the cpuhp framework has a subtle behavioral change: enabling
> will be done serially across all CPUs, whereas KVM currently sends an IPI
> to all CPUs in parallel. While serializing virtualization enabling could
> create undesirable latency, the issue is limited to creation of KVM's
> first VM,
Isn't that "limited to when kvm_usage_count goes from 0 to 1", so every
time a VM is started if you never run two?
You're fixing this later though, so this is just an issue with the
commit message.
Paolo
> and even that can be mitigated, e.g. by letting userspace force
> virtualization to be enabled when KVM is initialized.
>
> Cc: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/kvm_main.c | 174 ++++++++++++++++----------------------------
> 1 file changed, 61 insertions(+), 113 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index d9b0579d3eea..f6b114f42433 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5502,7 +5502,7 @@ static DEFINE_PER_CPU(bool, hardware_enabled);
> static DEFINE_MUTEX(kvm_usage_lock);
> static int kvm_usage_count;
>
> -static int __hardware_enable_nolock(void)
> +static int hardware_enable_nolock(void)
> {
> if (__this_cpu_read(hardware_enabled))
> return 0;
> @@ -5517,34 +5517,18 @@ static int __hardware_enable_nolock(void)
> return 0;
> }
>
> -static void hardware_enable_nolock(void *failed)
> -{
> - if (__hardware_enable_nolock())
> - atomic_inc(failed);
> -}
> -
> static int kvm_online_cpu(unsigned int cpu)
> {
> - int ret = 0;
> -
> /*
> * Abort the CPU online process if hardware virtualization cannot
> * be enabled. Otherwise running VMs would encounter unrecoverable
> * errors when scheduled to this CPU.
> */
> - mutex_lock(&kvm_usage_lock);
> - if (kvm_usage_count)
> - ret = __hardware_enable_nolock();
> - mutex_unlock(&kvm_usage_lock);
> - return ret;
> + return hardware_enable_nolock();
> }
>
> static void hardware_disable_nolock(void *junk)
> {
> - /*
> - * Note, hardware_disable_all_nolock() tells all online CPUs to disable
> - * hardware, not just CPUs that successfully enabled hardware!
> - */
> if (!__this_cpu_read(hardware_enabled))
> return;
>
> @@ -5555,78 +5539,10 @@ static void hardware_disable_nolock(void *junk)
>
> static int kvm_offline_cpu(unsigned int cpu)
> {
> - mutex_lock(&kvm_usage_lock);
> - if (kvm_usage_count)
> - hardware_disable_nolock(NULL);
> - mutex_unlock(&kvm_usage_lock);
> + hardware_disable_nolock(NULL);
> return 0;
> }
>
> -static void hardware_disable_all_nolock(void)
> -{
> - BUG_ON(!kvm_usage_count);
> -
> - kvm_usage_count--;
> - if (!kvm_usage_count)
> - on_each_cpu(hardware_disable_nolock, NULL, 1);
> -}
> -
> -static void hardware_disable_all(void)
> -{
> - cpus_read_lock();
> - mutex_lock(&kvm_usage_lock);
> - hardware_disable_all_nolock();
> - mutex_unlock(&kvm_usage_lock);
> - cpus_read_unlock();
> -}
> -
> -static int hardware_enable_all(void)
> -{
> - atomic_t failed = ATOMIC_INIT(0);
> - int r;
> -
> - /*
> - * Do not enable hardware virtualization if the system is going down.
> - * If userspace initiated a forced reboot, e.g. reboot -f, then it's
> - * possible for an in-flight KVM_CREATE_VM to trigger hardware enabling
> - * after kvm_reboot() is called. Note, this relies on system_state
> - * being set _before_ kvm_reboot(), which is why KVM uses a syscore ops
> - * hook instead of registering a dedicated reboot notifier (the latter
> - * runs before system_state is updated).
> - */
> - if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
> - system_state == SYSTEM_RESTART)
> - return -EBUSY;
> -
> - /*
> - * When onlining a CPU, cpu_online_mask is set before kvm_online_cpu()
> - * is called, and so on_each_cpu() between them includes the CPU that
> - * is being onlined. As a result, hardware_enable_nolock() may get
> - * invoked before kvm_online_cpu(), which also enables hardware if the
> - * usage count is non-zero. Disable CPU hotplug to avoid attempting to
> - * enable hardware multiple times.
> - */
> - cpus_read_lock();
> - mutex_lock(&kvm_usage_lock);
> -
> - r = 0;
> -
> - kvm_usage_count++;
> - if (kvm_usage_count == 1) {
> - on_each_cpu(hardware_enable_nolock, &failed, 1);
> -
> - if (atomic_read(&failed)) {
> - hardware_disable_all_nolock();
> - r = -EBUSY;
> - }
> - }
> -
> - mutex_unlock(&kvm_usage_lock);
> - cpus_read_unlock();
> -
> - return r;
> -}
> -
> static void kvm_shutdown(void)
> {
> /*
> @@ -5658,8 +5574,7 @@ static int kvm_suspend(void)
> lockdep_assert_not_held(&kvm_usage_lock);
> lockdep_assert_irqs_disabled();
>
> - if (kvm_usage_count)
> - hardware_disable_nolock(NULL);
> + hardware_disable_nolock(NULL);
> return 0;
> }
>
> @@ -5668,8 +5583,7 @@ static void kvm_resume(void)
> lockdep_assert_not_held(&kvm_usage_lock);
> lockdep_assert_irqs_disabled();
>
> - if (kvm_usage_count)
> - WARN_ON_ONCE(__hardware_enable_nolock());
> + WARN_ON_ONCE(hardware_enable_nolock());
> }
>
> static struct syscore_ops kvm_syscore_ops = {
> @@ -5677,6 +5591,60 @@ static struct syscore_ops kvm_syscore_ops = {
> .resume = kvm_resume,
> .shutdown = kvm_shutdown,
> };
> +
> +static int hardware_enable_all(void)
> +{
> + int r;
> +
> + guard(mutex)(&kvm_usage_lock);
> +
> + if (kvm_usage_count++)
> + return 0;
> +
> + r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> + kvm_online_cpu, kvm_offline_cpu);
> + if (r)
> + goto err_cpuhp;
> +
> + register_syscore_ops(&kvm_syscore_ops);
> +
> + /*
> + * Undo virtualization enabling and bail if the system is going down.
> + * If userspace initiated a forced reboot, e.g. reboot -f, then it's
> + * possible for an in-flight operation to enable virtualization after
> + * syscore_shutdown() is called, i.e. without kvm_shutdown() being
> + * invoked. Note, this relies on system_state being set _before_
> + * kvm_shutdown(), e.g. to ensure either kvm_shutdown() is invoked
> + * or this CPU observes the impending shutdown. Which is why KVM uses
> + * a syscore ops hook instead of registering a dedicated reboot
> + * notifier (the latter runs before system_state is updated).
> + */
> + if (system_state == SYSTEM_HALT || system_state == SYSTEM_POWER_OFF ||
> + system_state == SYSTEM_RESTART) {
> + r = -EBUSY;
> + goto err_rebooting;
> + }
> +
> + return 0;
> +
> +err_rebooting:
> + unregister_syscore_ops(&kvm_syscore_ops);
> + cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
> +err_cpuhp:
> + --kvm_usage_count;
> + return r;
> +}
> +
> +static void hardware_disable_all(void)
> +{
> + guard(mutex)(&kvm_usage_lock);
> +
> + if (--kvm_usage_count)
> + return;
> +
> + unregister_syscore_ops(&kvm_syscore_ops);
> + cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
> +}
> #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
> static int hardware_enable_all(void)
> {
> @@ -6382,15 +6350,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> int r;
> int cpu;
>
> -#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> - r = cpuhp_setup_state_nocalls(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> - kvm_online_cpu, kvm_offline_cpu);
> - if (r)
> - return r;
> -
> - register_syscore_ops(&kvm_syscore_ops);
> -#endif
> -
> /* A kmem cache lets us meet the alignment requirements of fx_save. */
> if (!vcpu_align)
> vcpu_align = __alignof__(struct kvm_vcpu);
> @@ -6401,10 +6360,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> offsetofend(struct kvm_vcpu, stats_id)
> - offsetof(struct kvm_vcpu, arch),
> NULL);
> - if (!kvm_vcpu_cache) {
> - r = -ENOMEM;
> - goto err_vcpu_cache;
> - }
> + if (!kvm_vcpu_cache)
> + return -ENOMEM;
>
> for_each_possible_cpu(cpu) {
> if (!alloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu),
> @@ -6461,11 +6418,6 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> for_each_possible_cpu(cpu)
> free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
> kmem_cache_destroy(kvm_vcpu_cache);
> -err_vcpu_cache:
> -#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> - unregister_syscore_ops(&kvm_syscore_ops);
> - cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> -#endif
> return r;
> }
> EXPORT_SYMBOL_GPL(kvm_init);
> @@ -6487,10 +6439,6 @@ void kvm_exit(void)
> kmem_cache_destroy(kvm_vcpu_cache);
> kvm_vfio_ops_exit();
> kvm_async_pf_deinit();
> -#ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> - unregister_syscore_ops(&kvm_syscore_ops);
> - cpuhp_remove_state_nocalls(CPUHP_AP_KVM_ONLINE);
> -#endif
> kvm_irqfd_exit();
> }
> EXPORT_SYMBOL_GPL(kvm_exit);
* Re: [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded
2024-06-08 0:06 ` [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded Sean Christopherson
2024-08-02 12:02 ` Huang, Kai
@ 2024-08-14 18:14 ` Paolo Bonzini
1 sibling, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-14 18:14 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On 6/8/24 02:06, Sean Christopherson wrote:
> Add an off-by-default module param, enable_virt_at_load, to let userspace
> force virtualization to be enabled in hardware when KVM is initialized,
> i.e. just before /dev/kvm is exposed to userspace. Enabling virtualization
> during KVM initialization allows userspace to avoid the additional latency
> when creating/destroying the first/last VM. Now that KVM uses the cpuhp
> framework to do per-CPU enabling, the latency could be non-trivial as the
> cpuhp bringup/teardown is serialized across CPUs, e.g. the latency could
> be problematic for use cases that need to spin up VMs quickly.
>
> Enabling virtualization during initialization will also allow KVM to set up
> the Intel TDX Module, which requires VMX to be fully enabled, without
> needing additional APIs to temporarily enable virtualization.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
I think we should enable it by default and wait for someone to complain.
Or notice, even.
Paolo
> ---
> virt/kvm/kvm_main.c | 37 +++++++++++++++++++++++++++++++++++++
> 1 file changed, 37 insertions(+)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 98e52d12f137..7bdd744e4821 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5495,6 +5495,9 @@ static struct miscdevice kvm_dev = {
> };
>
> #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> +static bool enable_virt_at_load;
> +module_param(enable_virt_at_load, bool, 0444);
> +
> __visible bool kvm_rebooting;
> EXPORT_SYMBOL_GPL(kvm_rebooting);
>
> @@ -5645,15 +5648,41 @@ static void kvm_disable_virtualization(void)
> unregister_syscore_ops(&kvm_syscore_ops);
> cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
> }
> +
> +static int kvm_init_virtualization(void)
> +{
> + if (enable_virt_at_load)
> + return kvm_enable_virtualization();
> +
> + return 0;
> +}
> +
> +static void kvm_uninit_virtualization(void)
> +{
> + if (enable_virt_at_load)
> + kvm_disable_virtualization();
> +
> + WARN_ON(kvm_usage_count);
> +}
> #else /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
> static int kvm_enable_virtualization(void)
> {
> return 0;
> }
>
> +static int kvm_init_virtualization(void)
> +{
> + return 0;
> +}
> +
> static void kvm_disable_virtualization(void)
> {
>
> +}
> +
> +static void kvm_uninit_virtualization(void)
> +{
> +
> }
> #endif /* CONFIG_KVM_GENERIC_HARDWARE_ENABLING */
>
> @@ -6395,6 +6424,10 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
>
> kvm_gmem_init(module);
>
> + r = kvm_init_virtualization();
> + if (r)
> + goto err_virt;
> +
> /*
> * Registration _must_ be the very last thing done, as this exposes
> * /dev/kvm to userspace, i.e. all infrastructure must be setup!
> @@ -6408,6 +6441,8 @@ int kvm_init(unsigned vcpu_size, unsigned vcpu_align, struct module *module)
> return 0;
>
> err_register:
> + kvm_uninit_virtualization();
> +err_virt:
> kvm_vfio_ops_exit();
> err_vfio:
> kvm_async_pf_deinit();
> @@ -6433,6 +6468,8 @@ void kvm_exit(void)
> */
> misc_deregister(&kvm_dev);
>
> + kvm_uninit_virtualization();
> +
> debugfs_remove_recursive(kvm_debugfs_dir);
> for_each_possible_cpu(cpu)
> free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
* Re: [PATCH v3 5/8] KVM: Add arch hooks for enabling/disabling virtualization
2024-06-08 0:06 ` [PATCH v3 5/8] KVM: Add arch hooks for enabling/disabling virtualization Sean Christopherson
@ 2024-08-14 18:15 ` Paolo Bonzini
0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-14 18:15 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On 6/8/24 02:06, Sean Christopherson wrote:
> Add arch hooks that are invoked when KVM enables/disable virtualization.
> x86 will use the hooks to register an "emergency disable" callback, which
> is essentially an x86-specific shutdown notifier that is used when the
> kernel is doing an emergency reboot/shutdown/kexec.
>
> Add comments for the declarations to help arch code understand exactly
> when the callbacks are invoked. Alternatively, the APIs themselves could
> communicate most of the same info, but kvm_arch_pre_enable_virtualization()
> and kvm_arch_post_disable_virtualization() are a bit cumbersome, and make
> it a bit less obvious that they are intended to be implemented as a pair.
>
> Reviewed-by: Chao Gao <chao.gao@intel.com>
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> include/linux/kvm_host.h | 14 ++++++++++++++
> virt/kvm/kvm_main.c | 14 ++++++++++++++
> 2 files changed, 28 insertions(+)
>
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 96ad3e8b9ddb..12ef3beb4e47 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -1514,6 +1514,20 @@ static inline void kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu) {}
> #endif
>
> #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> +/*
> + * kvm_arch_{enable,disable}_virtualization() are called on one CPU, under
> + * kvm_usage_lock, immediately after/before 0=>1 and 1=>0 transitions of
> + * kvm_usage_count, i.e. at the beginning of the generic hardware enabling
> + * sequence, and at the end of the generic hardware disabling sequence.
> + */
> +void kvm_arch_enable_virtualization(void);
> +void kvm_arch_disable_virtualization(void);
> +/*
> + * kvm_arch_hardware_{enable,disable}() are called on "every" CPU to do the
> + * actual twiddling of hardware bits. The hooks are called on all online CPUs
> + * when KVM enables/disables virtualization. Enabling/disabling is also done
> + * when a CPU is onlined/offlined (or resumed/suspended).
> + */
> int kvm_arch_hardware_enable(void);
> void kvm_arch_hardware_disable(void);
Since you are at it, rename these to
kvm_arch_{enable,disable}_virtualization_cpu()?
Paolo
> #endif
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 7bdd744e4821..e20189a89a64 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5505,6 +5505,16 @@ static DEFINE_PER_CPU(bool, hardware_enabled);
> static DEFINE_MUTEX(kvm_usage_lock);
> static int kvm_usage_count;
>
> +__weak void kvm_arch_enable_virtualization(void)
> +{
> +
> +}
> +
> +__weak void kvm_arch_disable_virtualization(void)
> +{
> +
> +}
> +
> static int __kvm_enable_virtualization(void)
> {
> if (__this_cpu_read(hardware_enabled))
> @@ -5604,6 +5614,8 @@ static int kvm_enable_virtualization(void)
> if (kvm_usage_count++)
> return 0;
>
> + kvm_arch_enable_virtualization();
> +
> r = cpuhp_setup_state(CPUHP_AP_KVM_ONLINE, "kvm/cpu:online",
> kvm_online_cpu, kvm_offline_cpu);
> if (r)
> @@ -5634,6 +5646,7 @@ static int kvm_enable_virtualization(void)
> unregister_syscore_ops(&kvm_syscore_ops);
> cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
> err_cpuhp:
> + kvm_arch_disable_virtualization();
> --kvm_usage_count;
> return r;
> }
> @@ -5647,6 +5660,7 @@ static void kvm_disable_virtualization(void)
>
> unregister_syscore_ops(&kvm_syscore_ops);
> cpuhp_remove_state(CPUHP_AP_KVM_ONLINE);
> + kvm_arch_disable_virtualization();
> }
>
> static int kvm_init_virtualization(void)
* Re: [PATCH v3 8/8] KVM: Enable virtualization at load/initialization by default
2024-06-08 0:06 ` [PATCH v3 8/8] KVM: Enable virtualization at load/initialization by default Sean Christopherson
@ 2024-08-14 18:20 ` Paolo Bonzini
0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-14 18:20 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On 6/8/24 02:06, Sean Christopherson wrote:
> Enable virtualization when KVM loads by default, as doing so avoids the
> potential runtime overhead associated with using the cpuhp framework to
> enable virtualization on each CPU.
Hah, should have read ahead to this point of the series. Just squash
this in the earlier patch and call it a day; and place all the worrisome
remarks about possible latency along with documentation for the parameter.
Oops, there's no documentation :) please add it to
Documentation/admin-guide/kernel-parameters.txt in v4.
Thanks,
Paolo
> Prior to commit 10474ae8945c ("KVM: Activate Virtualization On Demand"),
> KVM _unconditionally_ enabled virtualization during load, i.e. there's no
> fundamental reason KVM needs to dynamically toggle virtualization. These
> days, the only known argument for not enabling virtualization is to allow
> KVM to be autoloaded without blocking other out-of-tree hypervisors, and
> such use cases can simply change the module param, e.g. via command line.
>
> Note, the aforementioned commit also mentioned that enabling SVM (AMD's
> virtualization extensions) can result in "using invalid TLB entries".
> It's not clear whether the changelog was referring to a KVM bug, a CPU
> bug, or something else entirely. Regardless, leaving virtualization off
> by default is not a robust "fix", as any protection provided is lost the
> instant userspace creates the first VM.
>
> Suggested-by: Chao Gao <chao.gao@intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> virt/kvm/kvm_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index e20189a89a64..1440c0a7c3c3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -5495,7 +5495,7 @@ static struct miscdevice kvm_dev = {
> };
>
> #ifdef CONFIG_KVM_GENERIC_HARDWARE_ENABLING
> -static bool enable_virt_at_load;
> +static bool enable_virt_at_load = true;
> module_param(enable_virt_at_load, bool, 0444);
>
> __visible bool kvm_rebooting;
* Re: [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (8 preceding siblings ...)
2024-06-10 0:59 ` [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Huang, Kai
@ 2024-08-14 18:23 ` Paolo Bonzini
2024-08-14 22:17 ` Huang, Kai
2024-08-20 8:57 ` Paolo Bonzini
10 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-14 18:23 UTC (permalink / raw)
To: Sean Christopherson
Cc: kvm, linux-kernel, Chao Gao, Kai Huang, Marc Zyngier, Anup Patel,
Huacai Chen, Oliver Upton
On 6/8/24 02:06, Sean Christopherson wrote:
> Register KVM's cpuhp and syscore callbacks when enabling virtualization in
> hardware, as the sole purpose of said callbacks is to disable and re-enable
> virtualization as needed.
>
> The primary motivation for this series is to simplify dealing with enabling
> virtualization for Intel's TDX, which needs to enable virtualization
> when kvm-intel.ko is loaded, i.e. long before the first VM is created. TDX
> doesn't _need_ to keep virtualization enabled, but doing so is much simpler
> for KVM (see patch 3).
>
> That said, this is a nice cleanup on its own, assuming I haven't broken
> something. By registering the callbacks on-demand, the callbacks themselves
> don't need to check kvm_usage_count, because their very existence implies a
> non-zero count.
>
> The meat is in patch 1. Patches 2 renames the helpers so that patch 3 is
> less awkward. Patch 3 adds a module param to enable virtualization when KVM
> is loaded. Patches 4-6 are tangentially related x86 cleanups to registers
> KVM's "emergency disable" callback on-demand, same as the syscore callbacks.
>
> The suspend/resume and cphup paths still need to be fully tested, as do
> non-x86 architectures.
Also placed in kvm/queue, mostly as a reminder to myself, and added
other maintainers for testing on ARM, RISC-V and LoongArch. The changes
from v3 to v4 should be mostly nits, documentation and organization of
the series.
Thanks,
Paolo
> v3:
> - Collect reviews/acks.
> - Switch to kvm_usage_lock in a dedicated patch, Cc'd for stable@. [Chao]
> - Enable virt at load by default. [Chao]
> - Add comments to document how kvm_arch_{en,dis}able_virtualization() fit
> into the overall flow. [Kai]
>
> v2:
> - https://lore.kernel.org/all/20240522022827.1690416-1-seanjc@google.com
> - Use a dedicated mutex to avoid lock inversion issues between kvm_lock and
> the cpuhp lock.
> - Register emergency disable callbacks on-demand. [Kai]
> - Drop an unintended s/junk/ign rename. [Kai]
> - Decrement kvm_usage_count on failure. [Chao]
>
> v1: https://lore.kernel.org/all/20240425233951.3344485-1-seanjc@google.com
>
> Sean Christopherson (8):
> KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
> KVM: Register cpuhp and syscore callbacks when enabling hardware
> KVM: Rename functions related to enabling virtualization hardware
> KVM: Add a module param to allow enabling virtualization when KVM is
> loaded
> KVM: Add arch hooks for enabling/disabling virtualization
> x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef
> KVM: x86: Register "emergency disable" callbacks when virt is enabled
> KVM: Enable virtualization at load/initialization by default
>
> Documentation/virt/kvm/locking.rst | 19 ++-
> arch/x86/include/asm/kvm_host.h | 3 +
> arch/x86/include/asm/reboot.h | 2 +-
> arch/x86/kvm/svm/svm.c | 5 +-
> arch/x86/kvm/vmx/main.c | 2 +
> arch/x86/kvm/vmx/vmx.c | 6 +-
> arch/x86/kvm/vmx/x86_ops.h | 1 +
> arch/x86/kvm/x86.c | 10 ++
> include/linux/kvm_host.h | 14 ++
> virt/kvm/kvm_main.c | 258 ++++++++++++++---------------
> 10 files changed, 175 insertions(+), 145 deletions(-)
>
>
> base-commit: af0903ab52ee6d6f0f63af67fa73d5eb00f79b9a
* Re: [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware
2024-08-14 18:12 ` Paolo Bonzini
@ 2024-08-14 20:55 ` Sean Christopherson
0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-08-14 20:55 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On Wed, Aug 14, 2024, Paolo Bonzini wrote:
> On 6/8/24 02:06, Sean Christopherson wrote:
> > Register KVM's cpuhp and syscore callback when enabling virtualization
> > in hardware instead of registering the callbacks during initialization,
> > and let the CPU up/down framework invoke the inner enable/disable
> > functions. Registering the callbacks during initialization makes things
> > more complex than they need to be, as KVM needs to be very careful about
> > handling races between enabling CPUs being onlined/offlined and hardware
> > being enabled/disabled.
> >
> > Intel TDX support will require KVM to enable virtualization during KVM
> > initialization, i.e. will add another wrinkle to things, at which point
> > sorting out the potential races with kvm_usage_count would become even
> > more complex.
> >
> > Note, using the cpuhp framework has a subtle behavioral change: enabling
> > will be done serially across all CPUs, whereas KVM currently sends an IPI
> > to all CPUs in parallel. While serializing virtualization enabling could
> > create undesirable latency, the issue is limited to creation of KVM's
> > first VM,
>
> Isn't that "limited to when kvm_usage_count goes from 0 to 1", so every time
> a VM is started if you never run two?
Yes, "first" isn't the correct word/phrase.
* Re: [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt
2024-08-14 18:23 ` Paolo Bonzini
@ 2024-08-14 22:17 ` Huang, Kai
2024-08-15 14:41 ` Sean Christopherson
0 siblings, 1 reply; 31+ messages in thread
From: Huang, Kai @ 2024-08-14 22:17 UTC (permalink / raw)
To: Paolo Bonzini, Sean Christopherson
Cc: kvm, linux-kernel, Chao Gao, Marc Zyngier, Anup Patel,
Huacai Chen, Oliver Upton
> Also placed in kvm/queue, mostly as a reminder to myself, and added
> other maintainers for testing on ARM, RISC-V and LoongArch. The changes
> from v3 to v4 should be mostly nits, documentation and organization of
> the series.
>
Also another reminder:
Could you also remove the WARN_ON() in kvm_uninit_virtualization() so
that we can allow additional kvm_disable_virtualization() calls after
that for TDX?
* Re: [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
2024-08-14 18:06 ` Paolo Bonzini
@ 2024-08-15 14:39 ` Sean Christopherson
2024-08-15 16:10 ` Paolo Bonzini
0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2024-08-15 14:39 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On Wed, Aug 14, 2024, Paolo Bonzini wrote:
> On 6/8/24 02:06, Sean Christopherson wrote:
> > Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
> > on x86 due to a chain of locks and SRCU synchronizations. Translating the
> > below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
> > CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
> > fairness of r/w semaphores).
> >
> >           CPU0                     CPU1                     CPU2
> > 1  lock(&kvm->slots_lock);
> > 2                                                 lock(&vcpu->mutex);
> > 3                                                 lock(&kvm->srcu);
> > 4                        lock(cpu_hotplug_lock);
> > 5                        lock(kvm_lock);
> > 6                        lock(&kvm->slots_lock);
> > 7                                                 lock(cpu_hotplug_lock);
> > 8  sync(&kvm->srcu);
> >
> > Note, there are likely more potential deadlocks in KVM x86, e.g. the same
> > pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
> > __kvmclock_cpufreq_notifier()
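For reference, the shape of the fix is simply to guard the 0->1 and 1->0
transitions of kvm_usage_count with a dedicated lock that has a much smaller
lock graph than kvm_lock. A minimal userspace model of the pattern (a pthread
mutex stands in for the kernel mutex; names follow kvm_main.c but the bodies
are illustrative, not the actual kernel code):

```c
#include <assert.h>
#include <pthread.h>

/*
 * Model of kvm_usage_lock: guards only kvm_usage_count and the
 * enable/disable transitions, so it can safely nest inside
 * cpu_hotplug_lock, unlike kvm_lock, which also protects vm_list
 * and thus drags in a much larger lock graph.
 */
static pthread_mutex_t kvm_usage_lock = PTHREAD_MUTEX_INITIALIZER;
static int kvm_usage_count;
static int hardware_enabled;

static int hardware_enable_all(void)
{
	pthread_mutex_lock(&kvm_usage_lock);
	if (++kvm_usage_count == 1)
		hardware_enabled = 1;	/* enable virt, register cpuhp/syscore */
	pthread_mutex_unlock(&kvm_usage_lock);
	return 0;
}

static void hardware_disable_all(void)
{
	pthread_mutex_lock(&kvm_usage_lock);
	if (--kvm_usage_count == 0)
		hardware_enabled = 0;	/* disable virt, unregister callbacks */
	pthread_mutex_unlock(&kvm_usage_lock);
}
```

Because the dedicated lock is a leaf (nothing else is taken while holding it),
none of the #4-#7 edges in the splat above can form through it.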
>
> Offhand I couldn't see any places where {,__}cpufreq_driver_target() is
> called within cpus_read_lock(). I didn't look too closely though.
Aha! I think I finally found it and it's rather obvious now that I've found it.
I looked quite deeply on multiple occasions in the past and never found such a
case, but I could've sworn someone (Kai?) reported a lockdep splat related to the
cpufreq stuff when I did the big generic hardware enabling a while back. Of
course, I couldn't find that either :-)
Anyways...
cpuhp_cpufreq_online()
|
-> cpufreq_online()
   |
   -> cpufreq_gov_performance_limits()
      |
      -> __cpufreq_driver_target()
         |
         -> __target_index()
>
> > +``kvm_usage_count``
> > +^^^^^^^^^^^^^^^^^^^
>
> ``kvm_usage_lock``
Good job me.
* Re: [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt
2024-08-14 22:17 ` Huang, Kai
@ 2024-08-15 14:41 ` Sean Christopherson
0 siblings, 0 replies; 31+ messages in thread
From: Sean Christopherson @ 2024-08-15 14:41 UTC (permalink / raw)
To: Kai Huang
Cc: Paolo Bonzini, kvm, linux-kernel, Chao Gao, Marc Zyngier,
Anup Patel, Huacai Chen, Oliver Upton
On Thu, Aug 15, 2024, Kai Huang wrote:
>
> > Also placed in kvm/queue, mostly as a reminder to myself, and added
> > other maintainers for testing on ARM, RISC-V and LoongArch. The changes
> > from v3 to v4 should be mostly nits, documentation and organization of
> > the series.
> >
>
> Also another reminder:
>
> Could you also remove the WARN_ON() in kvm_uninit_virtualization() so that
> we can allow additional kvm_disable_virtualization() calls after that for TDX?
Yeah, I'll take care of that in v4 (as above, Paolo put this in kvm/queue as a
placeholder of sorts).
* Re: [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
2024-08-15 14:39 ` Sean Christopherson
@ 2024-08-15 16:10 ` Paolo Bonzini
2024-08-30 23:45 ` Sean Christopherson
0 siblings, 1 reply; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-15 16:10 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On Thu, Aug 15, 2024 at 4:40 PM Sean Christopherson <seanjc@google.com> wrote:
>
> On Wed, Aug 14, 2024, Paolo Bonzini wrote:
> > On 6/8/24 02:06, Sean Christopherson wrote:
> > > Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
> > > on x86 due to a chain of locks and SRCU synchronizations. Translating the
> > > below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
> > > CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
> > > fairness of r/w semaphores).
> > >
> > >           CPU0                     CPU1                     CPU2
> > > 1  lock(&kvm->slots_lock);
> > > 2                                                 lock(&vcpu->mutex);
> > > 3                                                 lock(&kvm->srcu);
> > > 4                        lock(cpu_hotplug_lock);
> > > 5                        lock(kvm_lock);
> > > 6                        lock(&kvm->slots_lock);
> > > 7                                                 lock(cpu_hotplug_lock);
> > > 8  sync(&kvm->srcu);
> > >
> > > Note, there are likely more potential deadlocks in KVM x86, e.g. the same
> > > pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
> > > __kvmclock_cpufreq_notifier()
> >
> > Offhand I couldn't see any places where {,__}cpufreq_driver_target() is
> > called within cpus_read_lock(). I didn't look too closely though.
>
> Anyways...
>
> cpuhp_cpufreq_online()
> |
> -> cpufreq_online()
>    |
>    -> cpufreq_gov_performance_limits()
>       |
>       -> __cpufreq_driver_target()
>          |
>          -> __target_index()
Ah, I only looked in generic code.
Can you add a comment to the commit message suggesting switching the
vm_list to RCU? All the occurrences of list_for_each_entry(...,
&vm_list, ...) seem amenable to that, and it should be as easy as
sticking all or part of kvm_destroy_vm() behind call_rcu().
Thanks,
Paolo
* Re: [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
` (9 preceding siblings ...)
2024-08-14 18:23 ` Paolo Bonzini
@ 2024-08-20 8:57 ` Paolo Bonzini
10 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-08-20 8:57 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang, Chen, Farrah
On Sat, Jun 8, 2024 at 2:06 AM Sean Christopherson <seanjc@google.com> wrote:
> The suspend/resume and cpuhp paths still need to be fully tested, as do
> non-x86 architectures.
You can add
Tested-by: Farrah Chen <farrah.chen@intel.com>
> For CPU hotplug we tested this:
>
> 1) offline some CPUs;
> 2) load kvm-intel.ko;
> 3) run a VM;
> 4) online those offlined CPUs;
> 5) offline those CPUs again;
> 6) online those CPUs again;
>
> All steps can be done successfully, and the VM ran as expected during
> step 4) to 6).
>
> For suspend/resume we tested:
>
> 1) load kvm-intel.ko and run a VM;
> 2) suspend host
> 3) resume the host back (using the IPMI KVM console)
>
> All steps worked successfully, and the VM still ran fine after resume.
Thanks Kai and Farrah :)
Paolo
* Re: [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
2024-08-15 16:10 ` Paolo Bonzini
@ 2024-08-30 23:45 ` Sean Christopherson
2024-09-02 13:03 ` Paolo Bonzini
0 siblings, 1 reply; 31+ messages in thread
From: Sean Christopherson @ 2024-08-30 23:45 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On Thu, Aug 15, 2024, Paolo Bonzini wrote:
> On Thu, Aug 15, 2024 at 4:40 PM Sean Christopherson <seanjc@google.com> wrote:
> >
> > On Wed, Aug 14, 2024, Paolo Bonzini wrote:
> > > On 6/8/24 02:06, Sean Christopherson wrote:
> > > > Use a dedicated mutex to guard kvm_usage_count to fix a potential deadlock
> > > > on x86 due to a chain of locks and SRCU synchronizations. Translating the
> > > > below lockdep splat, CPU1 #6 will wait on CPU0 #1, CPU0 #8 will wait on
> > > > CPU2 #3, and CPU2 #7 will wait on CPU1 #4 (if there's a writer, due to the
> > > > fairness of r/w semaphores).
> > > >
> > > >           CPU0                     CPU1                     CPU2
> > > > 1  lock(&kvm->slots_lock);
> > > > 2                                                 lock(&vcpu->mutex);
> > > > 3                                                 lock(&kvm->srcu);
> > > > 4                        lock(cpu_hotplug_lock);
> > > > 5                        lock(kvm_lock);
> > > > 6                        lock(&kvm->slots_lock);
> > > > 7                                                 lock(cpu_hotplug_lock);
> > > > 8  sync(&kvm->srcu);
> > > >
> > > > Note, there are likely more potential deadlocks in KVM x86, e.g. the same
> > > > pattern of taking cpu_hotplug_lock outside of kvm_lock likely exists with
> > > > __kvmclock_cpufreq_notifier()
> > >
> > > Offhand I couldn't see any places where {,__}cpufreq_driver_target() is
> > > called within cpus_read_lock(). I didn't look too closely though.
> >
> > Anyways...
> >
> > cpuhp_cpufreq_online()
> > |
> > -> cpufreq_online()
> >    |
> >    -> cpufreq_gov_performance_limits()
> >       |
> >       -> __cpufreq_driver_target()
> >          |
> >          -> __target_index()
>
> Ah, I only looked in generic code.
>
> Can you add a comment to the commit message suggesting switching the vm_list
> to RCU? All the occurrences of list_for_each_entry(..., &vm_list, ...) seem
> amenable to that, and it should be as easy as sticking all or part of
> kvm_destroy_vm() behind call_rcu().
+1 to the idea of making vm_list RCU-protected, though I think we'd want to use
SRCU, e.g. set_nx_huge_pages() currently takes each VM's slots_lock while purging
possible NX hugepages.
And I think kvm_destroy_vm() can simply do a synchronize_srcu() after removing
the VM from the list. Trying to put kvm_destroy_vm() into an RCU callback would
probably be a bit of a disaster, e.g. kvm-intel.ko in particular currently does
some rather nasty things while destroying a VM.
* Re: [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock
2024-08-30 23:45 ` Sean Christopherson
@ 2024-09-02 13:03 ` Paolo Bonzini
0 siblings, 0 replies; 31+ messages in thread
From: Paolo Bonzini @ 2024-09-02 13:03 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Chao Gao, Kai Huang
On Sat, Aug 31, 2024 at 1:45 AM Sean Christopherson <seanjc@google.com> wrote:
> > Can you add a comment to the commit message suggesting switching the vm_list
> > to RCU? All the occurrences of list_for_each_entry(..., &vm_list, ...) seem
> > amenable to that, and it should be as easy as sticking all or part of
> > kvm_destroy_vm() behind call_rcu().
>
> +1 to the idea of making vm_list RCU-protected, though I think we'd want to use
> SRCU, e.g. set_nx_huge_pages() currently takes each VM's slots_lock while purging
> possible NX hugepages.
Ah, for that I was thinking of wrapping everything with
kvm_get_kvm_safe()/rcu_read_unlock() and kvm_put_kvm()/rcu_read_lock().
Avoiding zero refcounts is safer and generally these visits are not
hot code.
> And I think kvm_destroy_vm() can simply do a synchronize_srcu() after removing
> the VM from the list. Trying to put kvm_destroy_vm() into an RCU callback would
> probably be a bit of a disaster, e.g. kvm-intel.ko in particular currently does
> some rather nasty things while destory a VM.
If all iteration is guarded by kvm_get_kvm_safe(), probably you can
defer only the reclaiming part (i.e. the part after
kvm_destroy_devices()) which is a lot easier to audit.
Anyhow, I took a look at the v2 and it looks good.
Paolo
end of thread, other threads:[~2024-09-02 13:03 UTC | newest]
Thread overview: 31+ messages
2024-06-08 0:06 [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 1/8] KVM: Use dedicated mutex to protect kvm_usage_count to avoid deadlock Sean Christopherson
2024-06-10 0:26 ` Huang, Kai
2024-08-14 18:06 ` Paolo Bonzini
2024-08-15 14:39 ` Sean Christopherson
2024-08-15 16:10 ` Paolo Bonzini
2024-08-30 23:45 ` Sean Christopherson
2024-09-02 13:03 ` Paolo Bonzini
2024-06-08 0:06 ` [PATCH v3 2/8] KVM: Register cpuhp and syscore callbacks when enabling hardware Sean Christopherson
2024-06-10 0:55 ` Huang, Kai
2024-08-14 18:12 ` Paolo Bonzini
2024-08-14 20:55 ` Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 3/8] KVM: Rename functions related to enabling virtualization hardware Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 4/8] KVM: Add a module param to allow enabling virtualization when KVM is loaded Sean Christopherson
2024-08-02 12:02 ` Huang, Kai
2024-08-02 12:06 ` Huang, Kai
2024-08-13 2:31 ` Sean Christopherson
2024-08-13 5:22 ` Huang, Kai
2024-08-13 23:47 ` Huang, Kai
2024-08-14 18:14 ` Paolo Bonzini
2024-06-08 0:06 ` [PATCH v3 5/8] KVM: Add arch hooks for enabling/disabling virtualization Sean Christopherson
2024-08-14 18:15 ` Paolo Bonzini
2024-06-08 0:06 ` [PATCH v3 6/8] x86/reboot: Unconditionally define cpu_emergency_virt_cb typedef Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 7/8] KVM: x86: Register "emergency disable" callbacks when virt is enabled Sean Christopherson
2024-06-08 0:06 ` [PATCH v3 8/8] KVM: Enable virtualization at load/initialization by default Sean Christopherson
2024-08-14 18:20 ` Paolo Bonzini
2024-06-10 0:59 ` [PATCH v3 0/8] KVM: Register cpuhp/syscore callbacks when enabling virt Huang, Kai
2024-08-14 18:23 ` Paolo Bonzini
2024-08-14 22:17 ` Huang, Kai
2024-08-15 14:41 ` Sean Christopherson
2024-08-20 8:57 ` Paolo Bonzini