* [PATCH 0/3] KVM: extract lock_all_vcpus/unlock_all_vcpus
From: Maxim Levitsky @ 2025-02-11 0:09 UTC
To: kvm
Cc: Paolo Bonzini, Jing Zhang, Oliver Upton, linux-arm-kernel,
Marc Zyngier, linux-kernel, Randy Dunlap, Suzuki K Poulose,
Palmer Dabbelt, Zenghui Yu, kvm-riscv, Ingo Molnar, linux-riscv,
Joey Gouly, Paul Walmsley, Maxim Levitsky, Thomas Gleixner,
Bjorn Helgaas, Albert Ou, kvmarm, Alexander Potapenko, x86,
Sean Christopherson, Anup Patel, Kunkun Jiang, Atish Patra,
Catalin Marinas, Will Deacon, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Implement Paolo's suggestion of reusing
sev_lock/unlock_vcpus_for_migration in the arm64 and riscv code
for the purpose of taking vcpu->mutex of all vCPUs of a VM.

Because sev_lock/unlock_vcpus_for_migration already contain a
workaround for lockdep's MAX_LOCK_DEPTH limit, this also fixes the
lockdep warnings on arm64 that were the inspiration for this
refactoring.
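For reference, the caller pattern that the new helpers enable looks
roughly like this (an illustrative sketch only; error handling and the
surrounding ioctl context are trimmed):

	mutex_lock(&kvm->lock);

	ret = kvm_lock_all_vcpus(kvm);
	if (ret)
		goto out_unlock;

	/* no vCPU can enter a KVM ioctl while VM-wide state is updated */

	kvm_unlock_all_vcpus(kvm);
out_unlock:
	mutex_unlock(&kvm->lock);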
This patch series was only compile-tested on all 3 architectures.
Best regards,
Maxim Levitsky
Maxim Levitsky (3):
KVM: x86: move sev_lock/unlock_vcpus_for_migration to kvm_main.c
KVM: arm64: switch to using kvm_lock/unlock_all_vcpus
RISC-V: KVM: switch to kvm_lock/unlock_all_vcpus
arch/arm64/include/asm/kvm_host.h | 3 --
arch/arm64/kvm/arch_timer.c | 8 ++--
arch/arm64/kvm/arm.c | 32 -------------
arch/arm64/kvm/vgic/vgic-init.c | 11 +++--
arch/arm64/kvm/vgic/vgic-its.c | 18 +++----
arch/arm64/kvm/vgic/vgic-kvm-device.c | 21 ++++----
arch/riscv/kvm/aia_device.c | 36 ++------------
arch/x86/kvm/svm/sev.c | 65 ++-----------------------
include/linux/kvm_host.h | 6 +++
virt/kvm/kvm_main.c | 69 +++++++++++++++++++++++++++
10 files changed, 115 insertions(+), 154 deletions(-)
--
2.26.3
* [PATCH 1/3] KVM: x86: move sev_lock/unlock_vcpus_for_migration to kvm_main.c
From: Maxim Levitsky @ 2025-02-11 0:09 UTC
To: kvm
Cc: Paolo Bonzini, Jing Zhang, Oliver Upton, linux-arm-kernel,
Marc Zyngier, linux-kernel, Randy Dunlap, Suzuki K Poulose,
Palmer Dabbelt, Zenghui Yu, kvm-riscv, Ingo Molnar, linux-riscv,
Joey Gouly, Paul Walmsley, Maxim Levitsky, Thomas Gleixner,
Bjorn Helgaas, Albert Ou, kvmarm, Alexander Potapenko, x86,
Sean Christopherson, Anup Patel, Kunkun Jiang, Atish Patra,
Catalin Marinas, Will Deacon, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Move sev_lock/unlock_vcpus_for_migration to kvm_main.c and rename the
new functions to kvm_lock_all_vcpus/kvm_unlock_all_vcpus and
kvm_lock_all_vcpus_nested.

This code allows locking all vCPUs without triggering a lockdep
warning about reaching MAX_LOCK_DEPTH: since lockdep tracks only a
bounded number of locks held by one task, we coerce it into thinking
that every lock other than vcpu 0's is released immediately after it
is taken, while the underlying mutexes in fact stay held until the
matching unlock.
No functional change intended.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/x86/kvm/svm/sev.c | 65 +++----------------------------------
include/linux/kvm_host.h | 6 ++++
virt/kvm/kvm_main.c | 69 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 79 insertions(+), 61 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index a2a794c32050..5ba1dd61aff0 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -1909,63 +1909,6 @@ enum sev_migration_role {
SEV_NR_MIGRATION_ROLES,
};
-static int sev_lock_vcpus_for_migration(struct kvm *kvm,
- enum sev_migration_role role)
-{
- struct kvm_vcpu *vcpu;
- unsigned long i, j;
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- if (mutex_lock_killable_nested(&vcpu->mutex, role))
- goto out_unlock;
-
-#ifdef CONFIG_PROVE_LOCKING
- if (!i)
- /*
- * Reset the role to one that avoids colliding with
- * the role used for the first vcpu mutex.
- */
- role = SEV_NR_MIGRATION_ROLES;
- else
- mutex_release(&vcpu->mutex.dep_map, _THIS_IP_);
-#endif
- }
-
- return 0;
-
-out_unlock:
-
- kvm_for_each_vcpu(j, vcpu, kvm) {
- if (i == j)
- break;
-
-#ifdef CONFIG_PROVE_LOCKING
- if (j)
- mutex_acquire(&vcpu->mutex.dep_map, role, 0, _THIS_IP_);
-#endif
-
- mutex_unlock(&vcpu->mutex);
- }
- return -EINTR;
-}
-
-static void sev_unlock_vcpus_for_migration(struct kvm *kvm)
-{
- struct kvm_vcpu *vcpu;
- unsigned long i;
- bool first = true;
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- if (first)
- first = false;
- else
- mutex_acquire(&vcpu->mutex.dep_map,
- SEV_NR_MIGRATION_ROLES, 0, _THIS_IP_);
-
- mutex_unlock(&vcpu->mutex);
- }
-}
-
static void sev_migrate_from(struct kvm *dst_kvm, struct kvm *src_kvm)
{
struct kvm_sev_info *dst = &to_kvm_svm(dst_kvm)->sev_info;
@@ -2104,10 +2047,10 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
charged = true;
}
- ret = sev_lock_vcpus_for_migration(kvm, SEV_MIGRATION_SOURCE);
+ ret = kvm_lock_all_vcpus_nested(kvm, SEV_MIGRATION_SOURCE);
if (ret)
goto out_dst_cgroup;
- ret = sev_lock_vcpus_for_migration(source_kvm, SEV_MIGRATION_TARGET);
+ ret = kvm_lock_all_vcpus_nested(source_kvm, SEV_MIGRATION_TARGET);
if (ret)
goto out_dst_vcpu;
@@ -2121,9 +2064,9 @@ int sev_vm_move_enc_context_from(struct kvm *kvm, unsigned int source_fd)
ret = 0;
out_source_vcpu:
- sev_unlock_vcpus_for_migration(source_kvm);
+ kvm_unlock_all_vcpus(source_kvm);
out_dst_vcpu:
- sev_unlock_vcpus_for_migration(kvm);
+ kvm_unlock_all_vcpus(kvm);
out_dst_cgroup:
/* Operates on the source on success, on the destination on failure. */
if (charged)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f34f4cfaa513..14b4a2a6f8e6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1014,6 +1014,12 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
void kvm_destroy_vcpus(struct kvm *kvm);
+int kvm_lock_all_vcpus_nested(struct kvm *kvm, unsigned int role);
+void kvm_unlock_all_vcpus(struct kvm *kvm);
+
+#define kvm_lock_all_vcpus(kvm) \
+ kvm_lock_all_vcpus_nested(kvm, 0)
+
void vcpu_load(struct kvm_vcpu *vcpu);
void vcpu_put(struct kvm_vcpu *vcpu);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ba0327e2d0d3..f233a79af799 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1354,6 +1354,75 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)
return 0;
}
+
+/*
+ * Lock all vCPUs of a VM.
+ * Can be used nested (e.g. to lock the vCPUs of two VMs).
+ */
+
+int kvm_lock_all_vcpus_nested(struct kvm *kvm, unsigned int role)
+{
+ struct kvm_vcpu *vcpu;
+ unsigned long i, j;
+
+ lockdep_assert_held(&kvm->lock);
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (mutex_lock_killable_nested(&vcpu->mutex, role))
+ goto out_unlock;
+
+#ifdef CONFIG_PROVE_LOCKING
+ if (!i)
+ /*
+ * Reset the role to one that avoids colliding with
+ * the role used for the first vcpu mutex.
+ */
+ role = MAX_LOCK_DEPTH - 1;
+ else
+ mutex_release(&vcpu->mutex.dep_map, _THIS_IP_);
+#endif
+ }
+
+ return 0;
+
+out_unlock:
+
+ kvm_for_each_vcpu(j, vcpu, kvm) {
+ if (i == j)
+ break;
+
+#ifdef CONFIG_PROVE_LOCKING
+ if (j)
+ mutex_acquire(&vcpu->mutex.dep_map, role, 0, _THIS_IP_);
+#endif
+
+ mutex_unlock(&vcpu->mutex);
+ }
+ return -EINTR;
+}
+EXPORT_SYMBOL_GPL(kvm_lock_all_vcpus_nested);
+
+void kvm_unlock_all_vcpus(struct kvm *kvm)
+{
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ bool first = true;
+
+ lockdep_assert_held(&kvm->lock);
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (first)
+ first = false;
+ else
+ mutex_acquire(&vcpu->mutex.dep_map,
+ MAX_LOCK_DEPTH - 1, 0, _THIS_IP_);
+
+ mutex_unlock(&vcpu->mutex);
+ }
+}
+EXPORT_SYMBOL_GPL(kvm_unlock_all_vcpus);
+
+
/*
* Allocation size is twice as large as the actual dirty bitmap size.
* See kvm_vm_ioctl_get_dirty_log() why this is needed.
--
2.26.3
* [PATCH 2/3] KVM: arm64: switch to using kvm_lock/unlock_all_vcpus
From: Maxim Levitsky @ 2025-02-11 0:09 UTC
To: kvm
Cc: Paolo Bonzini, Jing Zhang, Oliver Upton, linux-arm-kernel,
Marc Zyngier, linux-kernel, Randy Dunlap, Suzuki K Poulose,
Palmer Dabbelt, Zenghui Yu, kvm-riscv, Ingo Molnar, linux-riscv,
Joey Gouly, Paul Walmsley, Maxim Levitsky, Thomas Gleixner,
Bjorn Helgaas, Albert Ou, kvmarm, Alexander Potapenko, x86,
Sean Christopherson, Anup Patel, Kunkun Jiang, Atish Patra,
Catalin Marinas, Will Deacon, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Switch to kvm_lock/unlock_all_vcpus instead of arm64's own
version.

This fixes a lockdep warning about reaching the maximum lock depth:
[ 328.171264] BUG: MAX_LOCK_DEPTH too low!
[ 328.175227] turning off the locking correctness validator.
[ 328.180726] Please attach the output of /proc/lock_stat to the bug report
[ 328.187531] depth: 48 max: 48!
[ 328.190678] 48 locks held by qemu-kvm/11664:
[ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
[ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
[ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
No functional change intended.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/arm64/include/asm/kvm_host.h | 3 ---
arch/arm64/kvm/arch_timer.c | 8 +++----
arch/arm64/kvm/arm.c | 32 ---------------------------
arch/arm64/kvm/vgic/vgic-init.c | 11 +++++----
arch/arm64/kvm/vgic/vgic-its.c | 18 ++++++++-------
arch/arm64/kvm/vgic/vgic-kvm-device.c | 21 ++++++++++--------
6 files changed, 33 insertions(+), 60 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cfa024de4e3..bba97ea700ca 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -1234,9 +1234,6 @@ int __init populate_sysreg_config(const struct sys_reg_desc *sr,
unsigned int idx);
int __init populate_nv_trap_config(void);
-bool lock_all_vcpus(struct kvm *kvm);
-void unlock_all_vcpus(struct kvm *kvm);
-
void kvm_calculate_traps(struct kvm_vcpu *vcpu);
/* MMIO helpers */
diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
index 231c0cd9c7b4..3af1da807f9c 100644
--- a/arch/arm64/kvm/arch_timer.c
+++ b/arch/arm64/kvm/arch_timer.c
@@ -1769,7 +1769,9 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
mutex_lock(&kvm->lock);
- if (lock_all_vcpus(kvm)) {
+ ret = kvm_lock_all_vcpus(kvm);
+
+ if (!ret) {
set_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &kvm->arch.flags);
/*
@@ -1781,9 +1783,7 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
kvm->arch.timer_data.voffset = offset->counter_offset;
kvm->arch.timer_data.poffset = offset->counter_offset;
- unlock_all_vcpus(kvm);
- } else {
- ret = -EBUSY;
+ kvm_unlock_all_vcpus(kvm);
}
mutex_unlock(&kvm->lock);
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 071a7d75be68..f58849c5b4f0 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -1895,38 +1895,6 @@ static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
}
}
-void unlock_all_vcpus(struct kvm *kvm)
-{
- lockdep_assert_held(&kvm->lock);
-
- unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
-}
-
-/* Returns true if all vcpus were locked, false otherwise */
-bool lock_all_vcpus(struct kvm *kvm)
-{
- struct kvm_vcpu *tmp_vcpu;
- unsigned long c;
-
- lockdep_assert_held(&kvm->lock);
-
- /*
- * Any time a vcpu is in an ioctl (including running), the
- * core KVM code tries to grab the vcpu->mutex.
- *
- * By grabbing the vcpu->mutex of all VCPUs we ensure that no
- * other VCPUs can fiddle with the state while we access it.
- */
- kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
- if (!mutex_trylock(&tmp_vcpu->mutex)) {
- unlock_vcpus(kvm, c - 1);
- return false;
- }
- }
-
- return true;
-}
-
static unsigned long nvhe_percpu_size(void)
{
return (unsigned long)CHOOSE_NVHE_SYM(__per_cpu_end) -
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index bc7e22ab5d81..8fbce4db5c2e 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -85,8 +85,8 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
/* Must be held to avoid race with vCPU creation */
lockdep_assert_held(&kvm->lock);
- ret = -EBUSY;
- if (!lock_all_vcpus(kvm))
+ ret = kvm_lock_all_vcpus(kvm);
+ if (ret)
return ret;
mutex_lock(&kvm->arch.config_lock);
@@ -97,9 +97,12 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
}
kvm_for_each_vcpu(i, vcpu, kvm) {
- if (vcpu_has_run_once(vcpu))
+ if (vcpu_has_run_once(vcpu)) {
+ ret = -EBUSY;
goto out_unlock;
+ }
}
+
ret = 0;
if (type == KVM_DEV_TYPE_ARM_VGIC_V2)
@@ -124,7 +127,7 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
out_unlock:
mutex_unlock(&kvm->arch.config_lock);
- unlock_all_vcpus(kvm);
+ kvm_unlock_all_vcpus(kvm);
return ret;
}
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index fb96802799c6..02b02b4fff5d 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -1977,7 +1977,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
struct vgic_its *its;
gpa_t addr, offset;
unsigned int len;
- int align, ret = 0;
+ int align, ret;
its = dev->private;
offset = attr->attr;
@@ -1999,9 +1999,10 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
mutex_lock(&dev->kvm->lock);
- if (!lock_all_vcpus(dev->kvm)) {
+ ret = kvm_lock_all_vcpus(dev->kvm);
+ if (ret) {
mutex_unlock(&dev->kvm->lock);
- return -EBUSY;
+ return ret;
}
mutex_lock(&dev->kvm->arch.config_lock);
@@ -2034,7 +2035,7 @@ static int vgic_its_attr_regs_access(struct kvm_device *dev,
}
out:
mutex_unlock(&dev->kvm->arch.config_lock);
- unlock_all_vcpus(dev->kvm);
+ kvm_unlock_all_vcpus(dev->kvm);
mutex_unlock(&dev->kvm->lock);
return ret;
}
@@ -2697,16 +2698,17 @@ static int vgic_its_has_attr(struct kvm_device *dev,
static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
{
const struct vgic_its_abi *abi = vgic_its_get_abi(its);
- int ret = 0;
+ int ret;
if (attr == KVM_DEV_ARM_VGIC_CTRL_INIT) /* Nothing to do */
return 0;
mutex_lock(&kvm->lock);
- if (!lock_all_vcpus(kvm)) {
+ ret = kvm_lock_all_vcpus(kvm);
+ if (ret) {
mutex_unlock(&kvm->lock);
- return -EBUSY;
+ return ret;
}
mutex_lock(&kvm->arch.config_lock);
@@ -2726,7 +2728,7 @@ static int vgic_its_ctrl(struct kvm *kvm, struct vgic_its *its, u64 attr)
mutex_unlock(&its->its_lock);
mutex_unlock(&kvm->arch.config_lock);
- unlock_all_vcpus(kvm);
+ kvm_unlock_all_vcpus(kvm);
mutex_unlock(&kvm->lock);
return ret;
}
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 5f4f57aaa23e..ee70a9d642ed 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -268,15 +268,16 @@ static int vgic_set_common_attr(struct kvm_device *dev,
return -ENXIO;
mutex_lock(&dev->kvm->lock);
- if (!lock_all_vcpus(dev->kvm)) {
+ r = kvm_lock_all_vcpus(dev->kvm);
+ if (r) {
mutex_unlock(&dev->kvm->lock);
- return -EBUSY;
+ return r;
}
mutex_lock(&dev->kvm->arch.config_lock);
r = vgic_v3_save_pending_tables(dev->kvm);
mutex_unlock(&dev->kvm->arch.config_lock);
- unlock_all_vcpus(dev->kvm);
+ kvm_unlock_all_vcpus(dev->kvm);
mutex_unlock(&dev->kvm->lock);
return r;
}
@@ -384,9 +385,10 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
mutex_lock(&dev->kvm->lock);
- if (!lock_all_vcpus(dev->kvm)) {
+ ret = kvm_lock_all_vcpus(dev->kvm);
+ if (ret) {
mutex_unlock(&dev->kvm->lock);
- return -EBUSY;
+ return ret;
}
mutex_lock(&dev->kvm->arch.config_lock);
@@ -409,7 +411,7 @@ static int vgic_v2_attr_regs_access(struct kvm_device *dev,
out:
mutex_unlock(&dev->kvm->arch.config_lock);
- unlock_all_vcpus(dev->kvm);
+ kvm_unlock_all_vcpus(dev->kvm);
mutex_unlock(&dev->kvm->lock);
if (!ret && !is_write)
@@ -545,9 +547,10 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
mutex_lock(&dev->kvm->lock);
- if (!lock_all_vcpus(dev->kvm)) {
+ ret = kvm_lock_all_vcpus(dev->kvm);
+ if (ret) {
mutex_unlock(&dev->kvm->lock);
- return -EBUSY;
+ return ret;
}
mutex_lock(&dev->kvm->arch.config_lock);
@@ -589,7 +592,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
out:
mutex_unlock(&dev->kvm->arch.config_lock);
- unlock_all_vcpus(dev->kvm);
+ kvm_unlock_all_vcpus(dev->kvm);
mutex_unlock(&dev->kvm->lock);
if (!ret && uaccess && !is_write) {
--
2.26.3
* [PATCH 3/3] RISC-V: KVM: switch to kvm_lock/unlock_all_vcpus
From: Maxim Levitsky @ 2025-02-11 0:09 UTC
To: kvm
Cc: Paolo Bonzini, Jing Zhang, Oliver Upton, linux-arm-kernel,
Marc Zyngier, linux-kernel, Randy Dunlap, Suzuki K Poulose,
Palmer Dabbelt, Zenghui Yu, kvm-riscv, Ingo Molnar, linux-riscv,
Joey Gouly, Paul Walmsley, Maxim Levitsky, Thomas Gleixner,
Bjorn Helgaas, Albert Ou, kvmarm, Alexander Potapenko, x86,
Sean Christopherson, Anup Patel, Kunkun Jiang, Atish Patra,
Catalin Marinas, Will Deacon, Borislav Petkov, Dave Hansen,
H. Peter Anvin
Use kvm_lock/unlock_all_vcpus instead of riscv's own
implementation.

Note that this does introduce a slight functional change: vCPUs are
now unlocked in the same order in which they were locked, rather than
in the reverse order.
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
---
arch/riscv/kvm/aia_device.c | 36 +++---------------------------------
1 file changed, 3 insertions(+), 33 deletions(-)
diff --git a/arch/riscv/kvm/aia_device.c b/arch/riscv/kvm/aia_device.c
index 39cd26af5a69..32354e47c7d8 100644
--- a/arch/riscv/kvm/aia_device.c
+++ b/arch/riscv/kvm/aia_device.c
@@ -12,36 +12,6 @@
#include <linux/kvm_host.h>
#include <linux/uaccess.h>
-static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
-{
- struct kvm_vcpu *tmp_vcpu;
-
- for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
- tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
- mutex_unlock(&tmp_vcpu->mutex);
- }
-}
-
-static void unlock_all_vcpus(struct kvm *kvm)
-{
- unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
-}
-
-static bool lock_all_vcpus(struct kvm *kvm)
-{
- struct kvm_vcpu *tmp_vcpu;
- unsigned long c;
-
- kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
- if (!mutex_trylock(&tmp_vcpu->mutex)) {
- unlock_vcpus(kvm, c - 1);
- return false;
- }
- }
-
- return true;
-}
-
static int aia_create(struct kvm_device *dev, u32 type)
{
int ret;
@@ -52,8 +22,8 @@ static int aia_create(struct kvm_device *dev, u32 type)
if (irqchip_in_kernel(kvm))
return -EEXIST;
- ret = -EBUSY;
- if (!lock_all_vcpus(kvm))
+ ret = kvm_lock_all_vcpus(kvm);
+ if (ret)
return ret;
kvm_for_each_vcpu(i, vcpu, kvm) {
@@ -65,7 +35,7 @@ static int aia_create(struct kvm_device *dev, u32 type)
kvm->arch.aia.in_kernel = true;
out_unlock:
- unlock_all_vcpus(kvm);
+ kvm_unlock_all_vcpus(kvm);
return ret;
}
--
2.26.3
* Re: [PATCH 2/3] KVM: arm64: switch to using kvm_lock/unlock_all_vcpus
From: Marc Zyngier @ 2025-02-11 9:24 UTC
To: Maxim Levitsky
Cc: kvm, Paolo Bonzini, Jing Zhang, Oliver Upton, linux-arm-kernel,
linux-kernel, Randy Dunlap, Suzuki K Poulose, Palmer Dabbelt,
Zenghui Yu, kvm-riscv, Ingo Molnar, linux-riscv, Joey Gouly,
Paul Walmsley, Thomas Gleixner, Bjorn Helgaas, Albert Ou, kvmarm,
Alexander Potapenko, x86, Sean Christopherson, Anup Patel,
Kunkun Jiang, Atish Patra, Catalin Marinas, Will Deacon,
Borislav Petkov, Dave Hansen, H. Peter Anvin
On Tue, 11 Feb 2025 00:09:16 +0000,
Maxim Levitsky <mlevitsk@redhat.com> wrote:
>
> Switch to kvm_lock/unlock_all_vcpus instead of arm64's own
> version.
>
> This fixes a lockdep warning about reaching the maximum lock depth:
>
> [ 328.171264] BUG: MAX_LOCK_DEPTH too low!
> [ 328.175227] turning off the locking correctness validator.
> [ 328.180726] Please attach the output of /proc/lock_stat to the bug report
> [ 328.187531] depth: 48 max: 48!
> [ 328.190678] 48 locks held by qemu-kvm/11664:
> [ 328.194957] #0: ffff800086de5ba0 (&kvm->lock){+.+.}-{3:3}, at: kvm_ioctl_create_device+0x174/0x5b0
> [ 328.204048] #1: ffff0800e78800b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> [ 328.212521] #2: ffff07ffeee51e98 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> [ 328.220991] #3: ffff0800dc7d80b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> [ 328.229463] #4: ffff07ffe0c980b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> [ 328.237934] #5: ffff0800a3883c78 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
> [ 328.246405] #6: ffff07fffbe480b8 (&vcpu->mutex){+.+.}-{3:3}, at: lock_all_vcpus+0x16c/0x2a0
>
> No functional change intended.
Actually plenty of it. This sort of broad assertion is really an
indication of the contrary.
>
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
> ---
> arch/arm64/include/asm/kvm_host.h | 3 ---
> arch/arm64/kvm/arch_timer.c | 8 +++----
> arch/arm64/kvm/arm.c | 32 ---------------------------
> arch/arm64/kvm/vgic/vgic-init.c | 11 +++++----
> arch/arm64/kvm/vgic/vgic-its.c | 18 ++++++++-------
> arch/arm64/kvm/vgic/vgic-kvm-device.c | 21 ++++++++++--------
> 6 files changed, 33 insertions(+), 60 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index 7cfa024de4e3..bba97ea700ca 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -1234,9 +1234,6 @@ int __init populate_sysreg_config(const struct sys_reg_desc *sr,
> unsigned int idx);
> int __init populate_nv_trap_config(void);
>
> -bool lock_all_vcpus(struct kvm *kvm);
> -void unlock_all_vcpus(struct kvm *kvm);
> -
> void kvm_calculate_traps(struct kvm_vcpu *vcpu);
>
> /* MMIO helpers */
> diff --git a/arch/arm64/kvm/arch_timer.c b/arch/arm64/kvm/arch_timer.c
> index 231c0cd9c7b4..3af1da807f9c 100644
> --- a/arch/arm64/kvm/arch_timer.c
> +++ b/arch/arm64/kvm/arch_timer.c
> @@ -1769,7 +1769,9 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
>
> mutex_lock(&kvm->lock);
>
> - if (lock_all_vcpus(kvm)) {
> + ret = kvm_lock_all_vcpus(kvm);
> +
> + if (!ret) {
> set_bit(KVM_ARCH_FLAG_VM_COUNTER_OFFSET, &kvm->arch.flags);
>
> /*
> @@ -1781,9 +1783,7 @@ int kvm_vm_ioctl_set_counter_offset(struct kvm *kvm,
> kvm->arch.timer_data.voffset = offset->counter_offset;
> kvm->arch.timer_data.poffset = offset->counter_offset;
>
> - unlock_all_vcpus(kvm);
> - } else {
> - ret = -EBUSY;
> + kvm_unlock_all_vcpus(kvm);
> }
This is a userspace ABI change. This ioctl is documented as being able
to return -EINVAL or -EBUSY, and nothing else other than 0. Yet the
new helper returns -EINTR, which you blindly forward to userspace.
>
> mutex_unlock(&kvm->lock);
> diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
> index 071a7d75be68..f58849c5b4f0 100644
> --- a/arch/arm64/kvm/arm.c
> +++ b/arch/arm64/kvm/arm.c
> @@ -1895,38 +1895,6 @@ static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
> }
> }
>
> -void unlock_all_vcpus(struct kvm *kvm)
> -{
> - lockdep_assert_held(&kvm->lock);
> -
> - unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
> -}
> -
> -/* Returns true if all vcpus were locked, false otherwise */
> -bool lock_all_vcpus(struct kvm *kvm)
> -{
> - struct kvm_vcpu *tmp_vcpu;
> - unsigned long c;
> -
> - lockdep_assert_held(&kvm->lock);
> -
> - /*
> - * Any time a vcpu is in an ioctl (including running), the
> - * core KVM code tries to grab the vcpu->mutex.
> - *
> - * By grabbing the vcpu->mutex of all VCPUs we ensure that no
> - * other VCPUs can fiddle with the state while we access it.
> - */
> - kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
> - if (!mutex_trylock(&tmp_vcpu->mutex)) {
> - unlock_vcpus(kvm, c - 1);
> - return false;
> - }
> - }
> -
> - return true;
> -}
The semantics are different.
Other than the return values mentioned above, the new version fails on
signal delivery, which isn't expected. The guarantee given to
userspace is that unless a vcpu thread is currently in KVM, the
locking will succeed. Not "will succeed unless something that is
outside of your control happens".
The arm64 version is also built around a mutex_trylock() because we
don't want to wait forever until the vcpu's mutex is released. We want
it now, or never. That's consistent with the above requirement on
userspace.
We can argue whether or not these are good guarantees (or
requirements) to give to (or demand from) userspace, but that's what
we have, and I'm not prepared to break any of it.
At the end of the day, the x86 locking serves completely different
purposes. It wants to gracefully wait for vcpus to exit and is happy
to replay things, because migration (which is what x86 seems to be
using this for) is a stupidly long process. Our locking is designed to
either succeed or fail quickly, because some of the lock paths are on
the critical path for VM startup and configuration.
So for this series to be acceptable, you'd have to provide the same
semantics. It is probably doable with a bit of macro magic, at the
expense of readability.
What I would also like to see is for this primitive to be usable with
scoped_cond_guard(), which would make the code much more readable.
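Something along these lines, perhaps (a completely untested sketch;
kvm_trylock_all_vcpus() is the hypothetical trylock-semantics variant
discussed above, assumed to return 0 on success):

	DEFINE_LOCK_GUARD_1(kvm_all_vcpus, struct kvm,
			    kvm_lock_all_vcpus(_T->lock),
			    kvm_unlock_all_vcpus(_T->lock))
	DEFINE_LOCK_GUARD_1_COND(kvm_all_vcpus, _try,
				 kvm_trylock_all_vcpus(_T->lock) == 0)

which would reduce the usual lock/unlock dance to:

	scoped_cond_guard(kvm_all_vcpus_try, return -EBUSY, kvm) {
		/* all vcpu mutexes held, dropped on scope exit */
	}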
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* Re: [PATCH 2/3] KVM: arm64: switch to using kvm_lock/unlock_all_vcpus
From: Paolo Bonzini @ 2025-02-11 10:40 UTC
To: Marc Zyngier
Cc: Maxim Levitsky, kvm, Jing Zhang, Oliver Upton, linux-arm-kernel,
linux-kernel, Randy Dunlap, Suzuki K Poulose, Palmer Dabbelt,
Zenghui Yu, kvm-riscv, Ingo Molnar, linux-riscv, Joey Gouly,
Paul Walmsley, Thomas Gleixner, Bjorn Helgaas, Albert Ou, kvmarm,
Alexander Potapenko, x86, Sean Christopherson, Anup Patel,
Kunkun Jiang, Atish Patra, Catalin Marinas, Will Deacon,
Borislav Petkov, Dave Hansen, H. Peter Anvin
On Tue, Feb 11, 2025 at 10:25 AM Marc Zyngier <maz@kernel.org> wrote:
> > No functional change intended.
>
> Actually plenty of it.
Yes, definitely. That's not "no functional change intended", it's "no
breakage of sane userspace intended" which is pretty much the
opposite.
> Yet the new helper returns -EINTR, which you blindly forward to userspace.
It won't quite reach userspace, since the task won't survive
mutex_lock_killable(). With your current code the kill will arrive
just after the ioctl returns to userspace, so there's no change in
behavior there. The -EBUSY change is the one that matters.
> At the end of the day, the x86 locking serves completely different
> purposes. It wants to gracefully wait for vcpus to exit and is happy
> to replay things, because migration (which is what x86 seems to be
> using this for) is a stupidly long process.
No, it's not using it for the whole length of migration. It serves the
exact same purpose as on ARM: operate on state that spans a whole
struct kvm. The only difference is the behavior on a contended mutex,
where x86 simply waits for it instead of failing fast.
> Our locking is designed to
> either succeed or fail quickly, because some of the lock paths are on
> the critical path for VM startup and configuration.
The only long-running vCPU ioctl is KVM_RUN and you don't have that
during VM startup and configuration. The interesting case is
snapshotting, i.e. reading attributes.
Failing quickly when reading attributes makes sense, and I'd be
rightly wary of changing it. I know that QEMU is doing them only when
it knows CPUs are stopped, but it does seem to affect crosvm and
Firecracker more, for snapshotting more than startup/configuration.
The GIC snapshotting code in crosvm returns an anyhow::Result and ends
up logging an error with println!; Firecracker is similar, as it also
propagates the error. So neither crosvm nor Firecracker has
particularly sophisticated error handling, but they do seem to want to
fail their RPCs quickly.
Failing quickly when creating the device or setting attributes makes
less sense (Firecracker for one does an unwrap() there). But anyhow
the change would have to be a separate patch, certainly not one
applied through the generic KVM tree; and if you decide that you're
stuck with it, that would be more than understandable.
> So for this series to be acceptable, you'd have to provide the same
> semantics. It is probably doable with a bit of macro magic, at the
> expense of readability.
Or it can be just an extra bool argument, plus wrappers.
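Something like this (a rough sketch with hypothetical names, not a
tested implementation; the CONFIG_PROVE_LOCKING juggling from this
patch is elided):

	static int __kvm_lock_all_vcpus(struct kvm *kvm, unsigned int role,
					bool trylock)
	{
		struct kvm_vcpu *vcpu;
		unsigned long i, j;

		kvm_for_each_vcpu(i, vcpu, kvm) {
			if (trylock ? !mutex_trylock(&vcpu->mutex) :
			    mutex_lock_killable_nested(&vcpu->mutex, role))
				goto out_unlock;
		}
		return 0;

	out_unlock:
		kvm_for_each_vcpu(j, vcpu, kvm) {
			if (i == j)
				break;
			mutex_unlock(&vcpu->mutex);
		}
		return trylock ? -EBUSY : -EINTR;
	}

	#define kvm_lock_all_vcpus(kvm)    __kvm_lock_all_vcpus(kvm, 0, false)
	#define kvm_trylock_all_vcpus(kvm) __kvm_lock_all_vcpus(kvm, 0, true)

That way arm64 and riscv keep the trylock/-EBUSY behavior and x86
keeps the killable wait.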
For RISC-V it's only used at device creation time, and the ioctl fails
if a vCPU has run at least once. Removing the "try" in trylock is
totally feasible there; however, it must be documented in the commit
message.
Thanks,
Paolo
> What I would also like to see is for this primitive to be usable with
> scoped_cond_guard(), which would make the code much more readable.