All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd
@ 2025-09-05 10:05 Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state Oliver Upton
                   ` (6 more replies)
  0 siblings, 7 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan, Oliver Upton

v1: https://lore.kernel.org/kvmarm/20250904062348.223976-1-oliver.upton@linux.dev/

v1 -> v2:
 - Drop union misbehavior + misleading comment (Marc)
 - Fix LPI check to include the first one (Ben)
 - Clarify changelog to make lock relaxation slightly more obvious

Oliver Upton (6):
  KVM: arm64: vgic: Drop stale comment on IRQ active state
  KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
  KVM: arm64: Spin off release helper from vgic_put_irq()
  KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
  KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray
    lock
  KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock

 arch/arm64/kvm/vgic/vgic-debug.c |  2 +-
 arch/arm64/kvm/vgic/vgic-init.c  |  6 +--
 arch/arm64/kvm/vgic/vgic-its.c   | 15 +++---
 arch/arm64/kvm/vgic/vgic-v4.c    |  2 +-
 arch/arm64/kvm/vgic/vgic.c       | 78 +++++++++++++++++++++++---------
 arch/arm64/kvm/vgic/vgic.h       |  8 ++--
 include/kvm/arm_vgic.h           |  9 ++--
 7 files changed, 79 insertions(+), 41 deletions(-)


base-commit: b320789d6883cc00ac78ce83bccbfe7ed58afcf0
-- 
2.39.5


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state
  2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs Oliver Upton
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan, Oliver Upton

While LPIs lack an active state, KVM unconditionally folds the active
state from the LR into the vgic_irq struct meaning this field cannot be
'creatively' reused for something else.

Drop the misleading comment to reflect this.

Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 include/kvm/arm_vgic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 404883c7af6e..9f8a116925ca 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -139,7 +139,7 @@ struct vgic_irq {
 	bool pending_latch;		/* The pending latch state used to calculate
 					 * the pending state for both level
 					 * and edge triggered IRQs. */
-	bool active;			/* not used for LPIs */
+	bool active;
 	bool enabled;
 	bool hw;			/* Tied to HW IRQ */
 	struct kref refcount;		/* Used for LPIs */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
  2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 3/6] KVM: arm64: Spin off release helper from vgic_put_irq() Oliver Upton
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan, Oliver Upton

KVM's use of krefs to manage LPIs isn't adding much, especially
considering vgic_irq_release() is a noop due to the lack of sufficient
context.

Switch to using a regular refcount in anticipation of adding a
meaningful release concept for LPIs.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/vgic/vgic-debug.c |  2 +-
 arch/arm64/kvm/vgic/vgic-init.c  |  4 ++--
 arch/arm64/kvm/vgic/vgic-its.c   |  8 ++++----
 arch/arm64/kvm/vgic/vgic-v4.c    |  2 +-
 arch/arm64/kvm/vgic/vgic.c       | 17 ++++-------------
 arch/arm64/kvm/vgic/vgic.h       |  8 ++++----
 include/kvm/arm_vgic.h           |  4 ++--
 7 files changed, 18 insertions(+), 27 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
index 2684f273d9e1..4c1209261b65 100644
--- a/arch/arm64/kvm/vgic/vgic-debug.c
+++ b/arch/arm64/kvm/vgic/vgic-debug.c
@@ -69,7 +69,7 @@ static int iter_mark_lpis(struct kvm *kvm)
 	int nr_lpis = 0;
 
 	xa_for_each(&dist->lpi_xa, intid, irq) {
-		if (!vgic_try_get_irq_kref(irq))
+		if (!vgic_try_get_irq_ref(irq))
 			continue;
 
 		xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 1e680ad6e863..4c777136ea5f 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -208,7 +208,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
 		raw_spin_lock_init(&irq->irq_lock);
 		irq->vcpu = NULL;
 		irq->target_vcpu = vcpu0;
-		kref_init(&irq->refcount);
+		refcount_set(&irq->refcount, 0);
 		switch (dist->vgic_model) {
 		case KVM_DEV_TYPE_ARM_VGIC_V2:
 			irq->targets = 0;
@@ -277,7 +277,7 @@ static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu, u32 type)
 		irq->intid = i;
 		irq->vcpu = NULL;
 		irq->target_vcpu = vcpu;
-		kref_init(&irq->refcount);
+		refcount_set(&irq->refcount, 0);
 		if (vgic_irq_is_sgi(i)) {
 			/* SGIs */
 			irq->enabled = 1;
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 7368c13f16b7..f162206adb48 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -99,7 +99,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	raw_spin_lock_init(&irq->irq_lock);
 
 	irq->config = VGIC_CONFIG_EDGE;
-	kref_init(&irq->refcount);
+	refcount_set(&irq->refcount, 1);
 	irq->intid = intid;
 	irq->target_vcpu = vcpu;
 	irq->group = 1;
@@ -111,7 +111,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	 * check that we don't add a second list entry with the same LPI.
 	 */
 	oldirq = xa_load(&dist->lpi_xa, intid);
-	if (vgic_try_get_irq_kref(oldirq)) {
+	if (vgic_try_get_irq_ref(oldirq)) {
 		/* Someone was faster with adding this LPI, lets use that. */
 		kfree(irq);
 		irq = oldirq;
@@ -547,7 +547,7 @@ static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db,
 	rcu_read_lock();
 
 	irq = xa_load(&its->translation_cache, cache_key);
-	if (!vgic_try_get_irq_kref(irq))
+	if (!vgic_try_get_irq_ref(irq))
 		irq = NULL;
 
 	rcu_read_unlock();
@@ -571,7 +571,7 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
 	 * its_lock, as the ITE (and the reference it holds) cannot be freed.
 	 */
 	lockdep_assert_held(&its->its_lock);
-	vgic_get_irq_kref(irq);
+	vgic_get_irq_ref(irq);
 
 	old = xa_store(&its->translation_cache, cache_key, irq, GFP_KERNEL_ACCOUNT);
 
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 4d9343d2b0b1..548aec9d5a72 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -518,7 +518,7 @@ static struct vgic_irq *__vgic_host_irq_get_vlpi(struct kvm *kvm, int host_irq)
 		if (!irq->hw || irq->host_irq != host_irq)
 			continue;
 
-		if (!vgic_try_get_irq_kref(irq))
+		if (!vgic_try_get_irq_ref(irq))
 			return NULL;
 
 		return irq;
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index f5148b38120a..a1d6fab895c4 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -71,7 +71,7 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid)
 	rcu_read_lock();
 
 	irq = xa_load(&dist->lpi_xa, intid);
-	if (!vgic_try_get_irq_kref(irq))
+	if (!vgic_try_get_irq_ref(irq))
 		irq = NULL;
 
 	rcu_read_unlock();
@@ -114,15 +114,6 @@ struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid)
 	return vgic_get_irq(vcpu->kvm, intid);
 }
 
-/*
- * We can't do anything in here, because we lack the kvm pointer to
- * lock and remove the item from the lpi_list. So we keep this function
- * empty and use the return value of kref_put() to trigger the freeing.
- */
-static void vgic_irq_release(struct kref *ref)
-{
-}
-
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
@@ -131,7 +122,7 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 	if (irq->intid < VGIC_MIN_LPI)
 		return;
 
-	if (!kref_put(&irq->refcount, vgic_irq_release))
+	if (!refcount_dec_and_test(&irq->refcount))
 		return;
 
 	xa_lock_irqsave(&dist->lpi_xa, flags);
@@ -399,7 +390,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
 	 * now in the ap_list. This is safe as the caller must already hold a
 	 * reference on the irq.
 	 */
-	vgic_get_irq_kref(irq);
+	vgic_get_irq_ref(irq);
 	list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head);
 	irq->vcpu = vcpu;
 
@@ -657,7 +648,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 
 			/*
 			 * This vgic_put_irq call matches the
-			 * vgic_get_irq_kref in vgic_queue_irq_unlock,
+			 * vgic_get_irq_ref in vgic_queue_irq_unlock,
 			 * where we added the LPI to the ap_list. As
 			 * we remove the irq from the list, we drop
 			 * also drop the refcount.
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index de1c1d3261c3..ac5f9c5d2b98 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -267,7 +267,7 @@ void vgic_v2_put(struct kvm_vcpu *vcpu);
 void vgic_v2_save_state(struct kvm_vcpu *vcpu);
 void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
 
-static inline bool vgic_try_get_irq_kref(struct vgic_irq *irq)
+static inline bool vgic_try_get_irq_ref(struct vgic_irq *irq)
 {
 	if (!irq)
 		return false;
@@ -275,12 +275,12 @@ static inline bool vgic_try_get_irq_kref(struct vgic_irq *irq)
 	if (irq->intid < VGIC_MIN_LPI)
 		return true;
 
-	return kref_get_unless_zero(&irq->refcount);
+	return refcount_inc_not_zero(&irq->refcount);
 }
 
-static inline void vgic_get_irq_kref(struct vgic_irq *irq)
+static inline void vgic_get_irq_ref(struct vgic_irq *irq)
 {
-	WARN_ON_ONCE(!vgic_try_get_irq_kref(irq));
+	WARN_ON_ONCE(!vgic_try_get_irq_ref(irq));
 }
 
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9f8a116925ca..640555ff5b54 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -8,8 +8,8 @@
 #include <linux/bits.h>
 #include <linux/kvm.h>
 #include <linux/irqreturn.h>
-#include <linux/kref.h>
 #include <linux/mutex.h>
+#include <linux/refcount.h>
 #include <linux/spinlock.h>
 #include <linux/static_key.h>
 #include <linux/types.h>
@@ -142,7 +142,7 @@ struct vgic_irq {
 	bool active;
 	bool enabled;
 	bool hw;			/* Tied to HW IRQ */
-	struct kref refcount;		/* Used for LPIs */
+	refcount_t refcount;		/* Used for LPIs */
 	u32 hwintid;			/* HW INTID number */
 	unsigned int host_irq;		/* linux irq corresponding to hwintid */
 	union {
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 3/6] KVM: arm64: Spin off release helper from vgic_put_irq()
  2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks Oliver Upton
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan, Oliver Upton

Spin off the release implementation from vgic_put_irq() to prepare for a
more involved fix for lock ordering such that it may be unnested from
raw spinlocks. This has the minor functional change of doing call_rcu()
behind the xa_lock although it shouldn't be consequential.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/vgic/vgic.c | 24 +++++++++++++++++-------
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index a1d6fab895c4..ec4d70936a5b 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -114,22 +114,32 @@ struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid)
 	return vgic_get_irq(vcpu->kvm, intid);
 }
 
+static void vgic_release_lpi_locked(struct vgic_dist *dist, struct vgic_irq *irq)
+{
+	lockdep_assert_held(&dist->lpi_xa.xa_lock);
+	__xa_erase(&dist->lpi_xa, irq->intid);
+	kfree_rcu(irq, rcu);
+}
+
+static __must_check bool __vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
+{
+	if (irq->intid < VGIC_MIN_LPI)
+		return false;
+
+	return refcount_dec_and_test(&irq->refcount);
+}
+
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	unsigned long flags;
 
-	if (irq->intid < VGIC_MIN_LPI)
-		return;
-
-	if (!refcount_dec_and_test(&irq->refcount))
+	if (!__vgic_put_irq(kvm, irq))
 		return;
 
 	xa_lock_irqsave(&dist->lpi_xa, flags);
-	__xa_erase(&dist->lpi_xa, irq->intid);
+	vgic_release_lpi_locked(dist, irq);
 	xa_unlock_irqrestore(&dist->lpi_xa, flags);
-
-	kfree_rcu(irq, rcu);
 }
 
 void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
  2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
                   ` (2 preceding siblings ...)
  2025-09-05 10:05 ` [PATCH v2 3/6] KVM: arm64: Spin off release helper from vgic_put_irq() Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock Oliver Upton
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan, Oliver Upton, syzbot+cef594105ac7e60c6d93

syzkaller has caught us red-handed once more, this time nesting regular
spinlocks behind raw spinlocks:

  =============================
  [ BUG: Invalid wait context ]
  6.16.0-rc3-syzkaller-g7b8346bd9fce #0 Not tainted
  -----------------------------
  syz.0.29/3743 is trying to lock:
  a3ff80008e2e9e18 (&xa->xa_lock#20){....}-{3:3}, at: vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
  other info that might help us debug this:
  context-{5:5}
  3 locks held by syz.0.29/3743:
   #0: a3ff80008e2e90a8 (&kvm->slots_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x50/0x624 arch/arm64/kvm/vgic/vgic-init.c:499
   #1: a3ff80008e2e9fa0 (&kvm->arch.config_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x5c/0x624 arch/arm64/kvm/vgic/vgic-init.c:500
   #2: 58f0000021be1428 (&vgic_cpu->ap_list_lock){....}-{2:2}, at: vgic_flush_pending_lpis+0x3c/0x31c arch/arm64/kvm/vgic/vgic.c:150
  stack backtrace:
  CPU: 0 UID: 0 PID: 3743 Comm: syz.0.29 Not tainted 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 PREEMPT
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
   __dump_stack+0x30/0x40 lib/dump_stack.c:94
   dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
   dump_stack+0x1c/0x28 lib/dump_stack.c:129
   print_lock_invalid_wait_context kernel/locking/lockdep.c:4833 [inline]
   check_wait_context kernel/locking/lockdep.c:4905 [inline]
   __lock_acquire+0x978/0x299c kernel/locking/lockdep.c:5190
   lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5871
   __raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
   _raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162
   vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
   vgic_flush_pending_lpis+0x24c/0x31c arch/arm64/kvm/vgic/vgic.c:158
   __kvm_vgic_vcpu_destroy+0x44/0x500 arch/arm64/kvm/vgic/vgic-init.c:455
   kvm_vgic_destroy+0x100/0x624 arch/arm64/kvm/vgic/vgic-init.c:505
   kvm_arch_destroy_vm+0x80/0x138 arch/arm64/kvm/arm.c:244
   kvm_destroy_vm virt/kvm/kvm_main.c:1308 [inline]
   kvm_put_kvm+0x800/0xff8 virt/kvm/kvm_main.c:1344
   kvm_vm_release+0x58/0x78 virt/kvm/kvm_main.c:1367
   __fput+0x4ac/0x980 fs/file_table.c:465
   ____fput+0x20/0x58 fs/file_table.c:493
   task_work_run+0x1bc/0x254 kernel/task_work.c:227
   resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
   do_notify_resume+0x1b4/0x270 arch/arm64/kernel/entry-common.c:151
   exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
   exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
   el0_svc+0xb4/0x160 arch/arm64/kernel/entry-common.c:768
   el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
   el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600

This is of course no good, but is at odds with how LPI refcounts are
managed. Solve the locking mess by deferring the release of unreferenced
LPIs after the ap_list_lock is released. Mark these to-be-released LPIs
specially to avoid racing with vgic_put_irq() and causing a double-free.

Since references can only be taken on LPIs with a nonzero refcount,
extending the lifetime of freed LPIs is still safe.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Reported-by: syzbot+cef594105ac7e60c6d93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/kvmarm/68acd0d9.a00a0220.33401d.048b.GAE@google.com/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/vgic/vgic.c | 41 ++++++++++++++++++++++++++++++++++----
 include/kvm/arm_vgic.h     |  3 +++
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index ec4d70936a5b..480391a0dcad 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -28,8 +28,8 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
  *     kvm->arch.config_lock (mutex)
  *       its->cmd_lock (mutex)
  *         its->its_lock (mutex)
- *           vgic_cpu->ap_list_lock		must be taken with IRQs disabled
- *             vgic_dist->lpi_xa.xa_lock	must be taken with IRQs disabled
+ *           vgic_dist->lpi_xa.xa_lock		must be taken with IRQs disabled
+ *             vgic_cpu->ap_list_lock		must be taken with IRQs disabled
  *               vgic_irq->irq_lock		must be taken with IRQs disabled
  *
  * As the ap_list_lock might be taken from the timer interrupt handler,
@@ -129,6 +129,15 @@ static __must_check bool __vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 	return refcount_dec_and_test(&irq->refcount);
 }
 
+static __must_check bool vgic_put_irq_norelease(struct kvm *kvm, struct vgic_irq *irq)
+{
+	if (!__vgic_put_irq(kvm, irq))
+		return false;
+
+	irq->pending_release = true;
+	return true;
+}
+
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
@@ -142,10 +151,27 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 	xa_unlock_irqrestore(&dist->lpi_xa, flags);
 }
 
+static void vgic_release_deleted_lpis(struct kvm *kvm)
+{
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	unsigned long flags, intid;
+	struct vgic_irq *irq;
+
+	xa_lock_irqsave(&dist->lpi_xa, flags);
+
+	xa_for_each(&dist->lpi_xa, intid, irq) {
+		if (irq->pending_release)
+			vgic_release_lpi_locked(dist, irq);
+	}
+
+	xa_unlock_irqrestore(&dist->lpi_xa, flags);
+}
+
 void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_irq *irq, *tmp;
+	bool deleted = false;
 	unsigned long flags;
 
 	raw_spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
@@ -156,11 +182,14 @@ void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
 			list_del(&irq->ap_list);
 			irq->vcpu = NULL;
 			raw_spin_unlock(&irq->irq_lock);
-			vgic_put_irq(vcpu->kvm, irq);
+			deleted |= vgic_put_irq_norelease(vcpu->kvm, irq);
 		}
 	}
 
 	raw_spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
+
+	if (deleted)
+		vgic_release_deleted_lpis(vcpu->kvm);
 }
 
 void vgic_irq_set_phys_pending(struct vgic_irq *irq, bool pending)
@@ -631,6 +660,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 {
 	struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
 	struct vgic_irq *irq, *tmp;
+	bool deleted_lpis = false;
 
 	DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
 
@@ -663,7 +693,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 			 * we remove the irq from the list, we drop
 			 * also drop the refcount.
 			 */
-			vgic_put_irq(vcpu->kvm, irq);
+			deleted_lpis |= vgic_put_irq_norelease(vcpu->kvm, irq);
 			continue;
 		}
 
@@ -726,6 +756,9 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
 	}
 
 	raw_spin_unlock(&vgic_cpu->ap_list_lock);
+
+	if (unlikely(deleted_lpis))
+		vgic_release_deleted_lpis(vcpu->kvm);
 }
 
 static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 640555ff5b54..4000ff16f295 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -140,6 +140,9 @@ struct vgic_irq {
 					 * the pending state for both level
 					 * and edge triggered IRQs. */
 	bool active;
+	bool pending_release;		/* Used for LPIs only, unreferenced IRQ
+					 * pending a release */
+
 	bool enabled;
 	bool hw;			/* Tied to HW IRQ */
 	refcount_t refcount;		/* Used for LPIs */
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
  2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
                   ` (3 preceding siblings ...)
  2025-09-05 10:05 ` [PATCH v2 4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
  2025-09-05 10:05 ` [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take " Oliver Upton
  2025-09-06  6:11 ` [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
  6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan, Oliver Upton

Now that releases of LPIs have been unnested from the ap_list_lock there
are no xarray writers that exist in a context where IRQs are already
disabled. As such we can relax the locking to the non-IRQ disabling
spinlock to guard the LPI xarray.

Note that there are still readers of the LPI xarray where IRQs are
disabled however the readers rely on RCU protection instead of the lock.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/vgic/vgic-init.c |  2 +-
 arch/arm64/kvm/vgic/vgic-its.c  |  7 +++----
 arch/arm64/kvm/vgic/vgic.c      | 13 ++++++-------
 3 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 4c777136ea5f..4c3c0d82e476 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -53,7 +53,7 @@ void kvm_vgic_early_init(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 
-	xa_init_flags(&dist->lpi_xa, XA_FLAGS_LOCK_IRQ);
+	xa_init(&dist->lpi_xa);
 }
 
 /* CREATION */
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index f162206adb48..ce3e3ed3f29f 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -78,7 +78,6 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 	struct vgic_irq *irq = vgic_get_irq(kvm, intid), *oldirq;
-	unsigned long flags;
 	int ret;
 
 	/* In this case there is no put, since we keep the reference. */
@@ -89,7 +88,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	if (!irq)
 		return ERR_PTR(-ENOMEM);
 
-	ret = xa_reserve_irq(&dist->lpi_xa, intid, GFP_KERNEL_ACCOUNT);
+	ret = xa_reserve(&dist->lpi_xa, intid, GFP_KERNEL_ACCOUNT);
 	if (ret) {
 		kfree(irq);
 		return ERR_PTR(ret);
@@ -104,7 +103,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	irq->target_vcpu = vcpu;
 	irq->group = 1;
 
-	xa_lock_irqsave(&dist->lpi_xa, flags);
+	xa_lock(&dist->lpi_xa);
 
 	/*
 	 * There could be a race with another vgic_add_lpi(), so we need to
@@ -126,7 +125,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
 	}
 
 out_unlock:
-	xa_unlock_irqrestore(&dist->lpi_xa, flags);
+	xa_unlock(&dist->lpi_xa);
 
 	if (ret)
 		return ERR_PTR(ret);
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 480391a0dcad..a21b482844ce 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -28,7 +28,7 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
  *     kvm->arch.config_lock (mutex)
  *       its->cmd_lock (mutex)
  *         its->its_lock (mutex)
- *           vgic_dist->lpi_xa.xa_lock		must be taken with IRQs disabled
+ *           vgic_dist->lpi_xa.xa_lock
  *             vgic_cpu->ap_list_lock		must be taken with IRQs disabled
  *               vgic_irq->irq_lock		must be taken with IRQs disabled
  *
@@ -141,30 +141,29 @@ static __must_check bool vgic_put_irq_norelease(struct kvm *kvm, struct vgic_irq
 void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
-	unsigned long flags;
 
 	if (!__vgic_put_irq(kvm, irq))
 		return;
 
-	xa_lock_irqsave(&dist->lpi_xa, flags);
+	xa_lock(&dist->lpi_xa);
 	vgic_release_lpi_locked(dist, irq);
-	xa_unlock_irqrestore(&dist->lpi_xa, flags);
+	xa_unlock(&dist->lpi_xa);
 }
 
 static void vgic_release_deleted_lpis(struct kvm *kvm)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
-	unsigned long flags, intid;
+	unsigned long intid;
 	struct vgic_irq *irq;
 
-	xa_lock_irqsave(&dist->lpi_xa, flags);
+	xa_lock(&dist->lpi_xa);
 
 	xa_for_each(&dist->lpi_xa, intid, irq) {
 		if (irq->pending_release)
 			vgic_release_lpi_locked(dist, irq);
 	}
 
-	xa_unlock_irqrestore(&dist->lpi_xa, flags);
+	xa_unlock(&dist->lpi_xa);
 }
 
 void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
                   ` (4 preceding siblings ...)
  2025-09-05 10:05 ` [PATCH v2 5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
  2025-11-05  9:37   ` Zenghui Yu
  2025-09-06  6:11 ` [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
  6 siblings, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
  To: kvmarm
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan, Oliver Upton

The release path on LPIs is quite rare, meaning it can be difficult to
find lock ordering bugs on the LPI xarray's spinlock. Tell lockdep that
vgic_put_irq() might acquire the xa_lock to make unsafe patterns more
obvious.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
 arch/arm64/kvm/vgic/vgic.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index a21b482844ce..3b247041a130 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -142,6 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 
+	if (irq->intid >= VGIC_MIN_LPI)
+		might_lock(&dist->lpi_xa.xa_lock);
+
 	if (!__vgic_put_irq(kvm, irq))
 		return;
 
-- 
2.39.5


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd
  2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
                   ` (5 preceding siblings ...)
  2025-09-05 10:05 ` [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take " Oliver Upton
@ 2025-09-06  6:11 ` Oliver Upton
  6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-06  6:11 UTC (permalink / raw)
  To: kvmarm, Oliver Upton
  Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	Ben Horgan

On Fri, 05 Sep 2025 03:05:25 -0700, Oliver Upton wrote:
> v1: https://lore.kernel.org/kvmarm/20250904062348.223976-1-oliver.upton@linux.dev/
> 
> v1 -> v2:
>  - Drop union misbehavior + misleading comment (Marc)
>  - Fix LPI check to include the first one (Ben)
>  - Clarify changelog to make lock relaxation slightly more obvious
> 
> [...]

Applied to fixes, thanks!

[1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state
      https://git.kernel.org/kvmarm/kvmarm/c/982c6cee4c28
[2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
      https://git.kernel.org/kvmarm/kvmarm/c/5cce5e814fa2
[3/6] KVM: arm64: Spin off release helper from vgic_put_irq()
      https://git.kernel.org/kvmarm/kvmarm/c/135f15386207
[4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
      https://git.kernel.org/kvmarm/kvmarm/c/0388323fb7cb
[5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
      https://git.kernel.org/kvmarm/kvmarm/c/032ea3e45463
[6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
      https://git.kernel.org/kvmarm/kvmarm/c/78921bec9d58

--
Best,
Oliver

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  2025-09-05 10:05 ` [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take " Oliver Upton
@ 2025-11-05  9:37   ` Zenghui Yu
  2025-11-05 10:28     ` Marc Zyngier
  0 siblings, 1 reply; 13+ messages in thread
From: Zenghui Yu @ 2025-11-05  9:37 UTC (permalink / raw)
  To: Oliver Upton
  Cc: kvmarm, Marc Zyngier, Joey Gouly, Suzuki K Poulose, Ben Horgan

Hi Oliver,

On 2025/9/5 18:05, Oliver Upton wrote:
> The release path on LPIs is quite rare, meaning it can be difficult to
> find lock ordering bugs on the LPI xarray's spinlock. Tell lockdep that
> vgic_put_irq() might acquire the xa_lock to make unsafe patterns more
> obvious.
> 
> Reviewed-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
>  arch/arm64/kvm/vgic/vgic.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index a21b482844ce..3b247041a130 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -142,6 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
>  {
>  	struct vgic_dist *dist = &kvm->arch.vgic;
>  
> +	if (irq->intid >= VGIC_MIN_LPI)
> +		might_lock(&dist->lpi_xa.xa_lock);

I got the following splat on a lockdep kernel. The reproducing step can
be easily inferred from the backtrace (i.e., starting a guest with an
assigned device).

 ================================
 WARNING: inconsistent lock state
 6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
 --------------------------------
 inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
 swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
 ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
 {HARDIRQ-ON-W} state was registered at:
   lock_acquire+0x1c8/0x354
   _raw_spin_lock+0x48/0x60
   vgic_add_lpi.part.0+0x70/0x2f8
   vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
   vgic_its_process_commands.part.0+0x4d4/0xfa0
   vgic_mmio_write_its_cwriter+0x80/0xa4
   dispatch_mmio_write+0xd0/0x128
   __kvm_io_bus_write+0xb4/0xe8
   kvm_io_bus_write+0x58/0x98
   io_mem_abort+0xe8/0x3f0
   kvm_handle_guest_abort+0x4d0/0x1414
   handle_exit+0x6c/0x1c4
   kvm_arch_vcpu_ioctl_run+0x678/0xbfc
   kvm_vcpu_ioctl+0x1ac/0xb24
   __arm64_sys_ioctl+0xac/0x104
   invoke_syscall+0x48/0x10c
   el0_svc_common.constprop.0+0x40/0xe0
   do_el0_svc+0x1c/0x28
   el0_svc+0x50/0x2c0
   el0t_64_sync_handler+0xa0/0xe4
   el0t_64_sync+0x198/0x19c
 irq event stamp: 5415534
 hardirqs last  enabled at (5415533): [<ffff8000813e291c>]
default_idle_call+0x7c/0x138
 hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
enter_from_kernel_mode+0x10/0x3c
 softirqs last  enabled at (5415516): [<ffff8000800c7b54>]
handle_softirqs+0x4ac/0x4c4
 softirqs last disabled at (5415511): [<ffff800080010748>]
__do_softirq+0x14/0x20

 other info that might help us debug this:
  Possible unsafe locking scenario:

        CPU0
        ----
   lock(&xa->xa_lock#19);
   <Interrupt>
     lock(&xa->xa_lock#19);

  *** DEADLOCK ***

 2 locks held by swapper/10/0:
  #0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
eventfd_signal_mask+0x38/0xc0
  #1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
irqfd_wakeup+0x88/0x2ac

 stack backtrace:
 CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
 Call trace:
  show_stack+0x18/0x24 (C)
  dump_stack_lvl+0x90/0xd0
  dump_stack+0x18/0x24
  print_usage_bug.part.0+0x29c/0x358
  mark_lock+0x6c0/0x960
  __lock_acquire+0xd4c/0x20fc
  lock_acquire+0x1c8/0x354
  vgic_put_irq+0x54/0x110
  vgic_its_inject_cached_translation+0x178/0x25c
  kvm_arch_set_irq_inatomic+0xac/0x124
  irqfd_wakeup+0x1a0/0x2ac
  __wake_up_common+0x94/0x10c
  __wake_up_locked_key+0x1c/0x28
  eventfd_signal_mask+0x78/0xc0
  vfio_msihandler+0x18/0x28
  __handle_irq_event_percpu+0x8c/0x374
  handle_irq_event+0x4c/0xa8
  handle_fasteoi_irq+0x108/0x198
  handle_irq_desc+0x40/0x58
  generic_handle_domain_irq+0x1c/0x28
  gic_handle_irq+0x4c/0x120
  call_on_irq_stack+0x30/0x48
  do_interrupt_handler+0x80/0x84
  el1_interrupt+0x3c/0x60
  el1h_64_irq_handler+0x18/0x24
  el1h_64_irq+0x6c/0x70
  default_idle_call+0x80/0x138 (P)
  do_idle+0x21c/0x27c
  cpu_startup_entry+0x34/0x3c
  secondary_start_kernel+0x144/0x164
  __secondary_switched+0xc0/0xc4

Thanks,
Zenghui

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  2025-11-05  9:37   ` Zenghui Yu
@ 2025-11-05 10:28     ` Marc Zyngier
  2025-11-06  0:46       ` Oliver Upton
  0 siblings, 1 reply; 13+ messages in thread
From: Marc Zyngier @ 2025-11-05 10:28 UTC (permalink / raw)
  To: Zenghui Yu; +Cc: Oliver Upton, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan

On Wed, 05 Nov 2025 09:37:10 +0000,
Zenghui Yu <yuzenghui@huawei.com> wrote:
> 
> Hi Oliver,
> 
> On 2025/9/5 18:05, Oliver Upton wrote:
> > The release path on LPIs is quite rare, meaning it can be difficult to
> > find lock ordering bugs on the LPI xarray's spinlock. Tell lockdep that
> > vgic_put_irq() might acquire the xa_lock to make unsafe patterns more
> > obvious.
> > 
> > Reviewed-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> >  arch/arm64/kvm/vgic/vgic.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> > index a21b482844ce..3b247041a130 100644
> > --- a/arch/arm64/kvm/vgic/vgic.c
> > +++ b/arch/arm64/kvm/vgic/vgic.c
> > @@ -142,6 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> >  {
> >  	struct vgic_dist *dist = &kvm->arch.vgic;
> >  
> > +	if (irq->intid >= VGIC_MIN_LPI)
> > +		might_lock(&dist->lpi_xa.xa_lock);
> 
> I got the following splat on a lockdep kernel. The reproducing step can
> be easily inferred from the backtrace (i.e., starting a guest with an
> assigned device).
> 
>  ================================
>  WARNING: inconsistent lock state
>  6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
>  --------------------------------
>  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
>  swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
>  ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
>  {HARDIRQ-ON-W} state was registered at:
>    lock_acquire+0x1c8/0x354
>    _raw_spin_lock+0x48/0x60
>    vgic_add_lpi.part.0+0x70/0x2f8
>    vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
>    vgic_its_process_commands.part.0+0x4d4/0xfa0
>    vgic_mmio_write_its_cwriter+0x80/0xa4
>    dispatch_mmio_write+0xd0/0x128
>    __kvm_io_bus_write+0xb4/0xe8
>    kvm_io_bus_write+0x58/0x98
>    io_mem_abort+0xe8/0x3f0
>    kvm_handle_guest_abort+0x4d0/0x1414
>    handle_exit+0x6c/0x1c4
>    kvm_arch_vcpu_ioctl_run+0x678/0xbfc
>    kvm_vcpu_ioctl+0x1ac/0xb24
>    __arm64_sys_ioctl+0xac/0x104
>    invoke_syscall+0x48/0x10c
>    el0_svc_common.constprop.0+0x40/0xe0
>    do_el0_svc+0x1c/0x28
>    el0_svc+0x50/0x2c0
>    el0t_64_sync_handler+0xa0/0xe4
>    el0t_64_sync+0x198/0x19c
>  irq event stamp: 5415534
>  hardirqs last  enabled at (5415533): [<ffff8000813e291c>]
> default_idle_call+0x7c/0x138
>  hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
> enter_from_kernel_mode+0x10/0x3c
>  softirqs last  enabled at (5415516): [<ffff8000800c7b54>]
> handle_softirqs+0x4ac/0x4c4
>  softirqs last disabled at (5415511): [<ffff800080010748>]
> __do_softirq+0x14/0x20
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&xa->xa_lock#19);
>    <Interrupt>
>      lock(&xa->xa_lock#19);
> 
>   *** DEADLOCK ***
> 
>  2 locks held by swapper/10/0:
>   #0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
> eventfd_signal_mask+0x38/0xc0
>   #1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
> irqfd_wakeup+0x88/0x2ac
> 
>  stack backtrace:
>  CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
> 6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
>  Call trace:
>   show_stack+0x18/0x24 (C)
>   dump_stack_lvl+0x90/0xd0
>   dump_stack+0x18/0x24
>   print_usage_bug.part.0+0x29c/0x358
>   mark_lock+0x6c0/0x960
>   __lock_acquire+0xd4c/0x20fc
>   lock_acquire+0x1c8/0x354
>   vgic_put_irq+0x54/0x110
>   vgic_its_inject_cached_translation+0x178/0x25c
>   kvm_arch_set_irq_inatomic+0xac/0x124

Right. This might_lock() is gross, and clearly doesn't do the right
thing outside of direct injection of LPIs.

I think we should drop it, but we should ensure that lpi_xa.xa_lock is
never taken in interrupt context.

Oliver, what do you think?

	M.

-- 
Without deviation from the norm, progress is not possible.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  2025-11-05 10:28     ` Marc Zyngier
@ 2025-11-06  0:46       ` Oliver Upton
  2025-11-06  0:58         ` Oliver Upton
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2025-11-06  0:46 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Zenghui Yu, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan

Hey,

On Wed, Nov 05, 2025 at 10:28:04AM +0000, Marc Zyngier wrote:
> On Wed, 05 Nov 2025 09:37:10 +0000,
> Zenghui Yu <yuzenghui@huawei.com> wrote:
> > I got the following splat on a lockdep kernel. The reproducing step can
> > be easily inferred from the backtrace (i.e., starting a guest with an
> > assigned device).

Ouch, sorry about that!

> >  ================================
> >  WARNING: inconsistent lock state
> >  6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
> >  --------------------------------
> >  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> >  swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> >  ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
> >  {HARDIRQ-ON-W} state was registered at:
> >    lock_acquire+0x1c8/0x354
> >    _raw_spin_lock+0x48/0x60
> >    vgic_add_lpi.part.0+0x70/0x2f8
> >    vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
> >    vgic_its_process_commands.part.0+0x4d4/0xfa0
> >    vgic_mmio_write_its_cwriter+0x80/0xa4
> >    dispatch_mmio_write+0xd0/0x128
> >    __kvm_io_bus_write+0xb4/0xe8
> >    kvm_io_bus_write+0x58/0x98
> >    io_mem_abort+0xe8/0x3f0
> >    kvm_handle_guest_abort+0x4d0/0x1414
> >    handle_exit+0x6c/0x1c4
> >    kvm_arch_vcpu_ioctl_run+0x678/0xbfc
> >    kvm_vcpu_ioctl+0x1ac/0xb24
> >    __arm64_sys_ioctl+0xac/0x104
> >    invoke_syscall+0x48/0x10c
> >    el0_svc_common.constprop.0+0x40/0xe0
> >    do_el0_svc+0x1c/0x28
> >    el0_svc+0x50/0x2c0
> >    el0t_64_sync_handler+0xa0/0xe4
> >    el0t_64_sync+0x198/0x19c
> >  irq event stamp: 5415534
> >  hardirqs last  enabled at (5415533): [<ffff8000813e291c>]
> > default_idle_call+0x7c/0x138
> >  hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
> > enter_from_kernel_mode+0x10/0x3c
> >  softirqs last  enabled at (5415516): [<ffff8000800c7b54>]
> > handle_softirqs+0x4ac/0x4c4
> >  softirqs last disabled at (5415511): [<ffff800080010748>]
> > __do_softirq+0x14/0x20
> > 
> >  other info that might help us debug this:
> >   Possible unsafe locking scenario:
> > 
> >         CPU0
> >         ----
> >    lock(&xa->xa_lock#19);
> >    <Interrupt>
> >      lock(&xa->xa_lock#19);
> > 
> >   *** DEADLOCK ***
> > 
> >  2 locks held by swapper/10/0:
> >   #0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
> > eventfd_signal_mask+0x38/0xc0
> >   #1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
> > irqfd_wakeup+0x88/0x2ac
> > 
> >  stack backtrace:
> >  CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
> > 6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
> >  Call trace:
> >   show_stack+0x18/0x24 (C)
> >   dump_stack_lvl+0x90/0xd0
> >   dump_stack+0x18/0x24
> >   print_usage_bug.part.0+0x29c/0x358
> >   mark_lock+0x6c0/0x960
> >   __lock_acquire+0xd4c/0x20fc
> >   lock_acquire+0x1c8/0x354
> >   vgic_put_irq+0x54/0x110
> >   vgic_its_inject_cached_translation+0x178/0x25c
> >   kvm_arch_set_irq_inatomic+0xac/0x124
> 
> Right. This might_lock() is gross, and clearly doesn't do the right
> thing outside of direct injection of LPIs.
> 
> I think we should drop it, but we should ensure that lpi_xa.xa_lock is
> never taken in interrupt context.
> 
> Oliver, what do you think?

It is possible (albeit improbable) that the last reference to an LPI gets
dropped here after injecting a cached translation. When that is the
case, vgic_put_irq() will take the xa_lock from an irq context. So
I'd say the might_lock() here is valid.

Zenghui, does reverting 982f31bbb5b0 ("KVM: arm64: vgic-v3: Don't require
IRQs be disabled for LPI xarray lock") make this go away?

Thanks,
Oliver

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  2025-11-06  0:46       ` Oliver Upton
@ 2025-11-06  0:58         ` Oliver Upton
  2025-11-06  3:34           ` Zenghui Yu
  0 siblings, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2025-11-06  0:58 UTC (permalink / raw)
  To: Marc Zyngier; +Cc: Zenghui Yu, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan

On Wed, Nov 05, 2025 at 04:46:59PM -0800, Oliver Upton wrote:
> Hey,
> 
> On Wed, Nov 05, 2025 at 10:28:04AM +0000, Marc Zyngier wrote:
> > On Wed, 05 Nov 2025 09:37:10 +0000,
> > Zenghui Yu <yuzenghui@huawei.com> wrote:
> > > I got the following splat on a lockdep kernel. The reproducing step can
> > > be easily inferred from the backtrace (i.e., starting a guest with an
> > > assigned device).
> 
> Ouch, sorry about that!
> 
> > >  ================================
> > >  WARNING: inconsistent lock state
> > >  6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
> > >  --------------------------------
> > >  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> > >  swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > >  ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
> > >  {HARDIRQ-ON-W} state was registered at:
> > >    lock_acquire+0x1c8/0x354
> > >    _raw_spin_lock+0x48/0x60
> > >    vgic_add_lpi.part.0+0x70/0x2f8
> > >    vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
> > >    vgic_its_process_commands.part.0+0x4d4/0xfa0
> > >    vgic_mmio_write_its_cwriter+0x80/0xa4
> > >    dispatch_mmio_write+0xd0/0x128
> > >    __kvm_io_bus_write+0xb4/0xe8
> > >    kvm_io_bus_write+0x58/0x98
> > >    io_mem_abort+0xe8/0x3f0
> > >    kvm_handle_guest_abort+0x4d0/0x1414
> > >    handle_exit+0x6c/0x1c4
> > >    kvm_arch_vcpu_ioctl_run+0x678/0xbfc
> > >    kvm_vcpu_ioctl+0x1ac/0xb24
> > >    __arm64_sys_ioctl+0xac/0x104
> > >    invoke_syscall+0x48/0x10c
> > >    el0_svc_common.constprop.0+0x40/0xe0
> > >    do_el0_svc+0x1c/0x28
> > >    el0_svc+0x50/0x2c0
> > >    el0t_64_sync_handler+0xa0/0xe4
> > >    el0t_64_sync+0x198/0x19c
> > >  irq event stamp: 5415534
> > >  hardirqs last  enabled at (5415533): [<ffff8000813e291c>]
> > > default_idle_call+0x7c/0x138
> > >  hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
> > > enter_from_kernel_mode+0x10/0x3c
> > >  softirqs last  enabled at (5415516): [<ffff8000800c7b54>]
> > > handle_softirqs+0x4ac/0x4c4
> > >  softirqs last disabled at (5415511): [<ffff800080010748>]
> > > __do_softirq+0x14/0x20
> > > 
> > >  other info that might help us debug this:
> > >   Possible unsafe locking scenario:
> > > 
> > >         CPU0
> > >         ----
> > >    lock(&xa->xa_lock#19);
> > >    <Interrupt>
> > >      lock(&xa->xa_lock#19);
> > > 
> > >   *** DEADLOCK ***
> > > 
> > >  2 locks held by swapper/10/0:
> > >   #0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
> > > eventfd_signal_mask+0x38/0xc0
> > >   #1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
> > > irqfd_wakeup+0x88/0x2ac
> > > 
> > >  stack backtrace:
> > >  CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
> > > 6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
> > >  Call trace:
> > >   show_stack+0x18/0x24 (C)
> > >   dump_stack_lvl+0x90/0xd0
> > >   dump_stack+0x18/0x24
> > >   print_usage_bug.part.0+0x29c/0x358
> > >   mark_lock+0x6c0/0x960
> > >   __lock_acquire+0xd4c/0x20fc
> > >   lock_acquire+0x1c8/0x354
> > >   vgic_put_irq+0x54/0x110
> > >   vgic_its_inject_cached_translation+0x178/0x25c
> > >   kvm_arch_set_irq_inatomic+0xac/0x124
> > 
> > Right. This might_lock() is gross, and clearly doesn't do the right
> > thing outside of direct injection of LPIs.
> > 
> > I think we should drop it, but we should ensure that lpi_xa.xa_lock is
> > never taken in interrupt context.
> > 
> > Oliver, what do you think?
> 
> It is possible (albeit improbable) that the last reference to an LPI gets
> dropped here after injecting a cached translation. When that is the
> case, vgic_put_irq() will take the xa_lock from an irq context. So
> I'd say the might_lock() here is valid.
> 
> Zenghui, does reverting 982f31bbb5b0 ("KVM: arm64: vgic-v3: Don't require
> IRQs be disabled for LPI xarray lock") make this go away?

Well, a bit more than that. Revert and add the diff below. Like I said
in the original changelog, finding bugs for rare release paths is
annoying and having a reliable way of causing an explosion when the
calling context isn't right is a property I'd like to preserve.

Thanks,
Oliver

diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 6dd5a10081e2..1045e9538e91 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -142,8 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
 {
 	struct vgic_dist *dist = &kvm->arch.vgic;
 
-	if (irq->intid >= VGIC_MIN_LPI)
-		might_lock(&dist->lpi_xa.xa_lock);
+	if (IS_ENABLED(CONFIG_LOCKDEP) && irq->intid >= VGIC_MIN_LPI) {
+		guard(spinlock_irqsave)(&dist->lpi_xa.xa_lock);
+	}
 
 	if (!__vgic_put_irq(kvm, irq))
 		return;

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
  2025-11-06  0:58         ` Oliver Upton
@ 2025-11-06  3:34           ` Zenghui Yu
  0 siblings, 0 replies; 13+ messages in thread
From: Zenghui Yu @ 2025-11-06  3:34 UTC (permalink / raw)
  To: Oliver Upton
  Cc: Marc Zyngier, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan

On 2025/11/6 8:58, Oliver Upton wrote:
> On Wed, Nov 05, 2025 at 04:46:59PM -0800, Oliver Upton wrote:
> > Hey,
> >
> > On Wed, Nov 05, 2025 at 10:28:04AM +0000, Marc Zyngier wrote:
> > > On Wed, 05 Nov 2025 09:37:10 +0000,
> > > Zenghui Yu <yuzenghui@huawei.com> wrote:
> > > >  Call trace:
> > > >   show_stack+0x18/0x24 (C)
> > > >   dump_stack_lvl+0x90/0xd0
> > > >   dump_stack+0x18/0x24
> > > >   print_usage_bug.part.0+0x29c/0x358
> > > >   mark_lock+0x6c0/0x960
> > > >   __lock_acquire+0xd4c/0x20fc
> > > >   lock_acquire+0x1c8/0x354
> > > >   vgic_put_irq+0x54/0x110
> > > >   vgic_its_inject_cached_translation+0x178/0x25c
> > > >   kvm_arch_set_irq_inatomic+0xac/0x124
> > >
> > > Right. This might_lock() is gross, and clearly doesn't do the right
> > > thing outside of direct injection of LPIs.
> > >
> > > I think we should drop it, but we should ensure that lpi_xa.xa_lock is
> > > never taken in interrupt context.
> > >
> > > Oliver, what do you think?
> >
> > It is possible (albeit improbable) that the last reference to an LPI gets
> > dropped here after injecting a cached translation. When that is the
> > case, vgic_put_irq() will take the xa_lock from an irq context.

IMO this is possible (with a really small possibility though) to happen.

> > So
> > I'd say the might_lock() here is valid.
> >
> > Zenghui, does reverting 982f31bbb5b0 ("KVM: arm64: vgic-v3: Don't require
> > IRQs be disabled for LPI xarray lock") make this go away?
> 
> Well, a bit more than that. Revert and add the diff below. Like I said
> in the original changelog, finding bugs for rare release paths is
> annoying and having a reliable way of causing an explosion when the
> calling context isn't right is a property I'd like to preserve.

The warning disappears with that.

Thanks,
Zenghui

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2025-11-06  3:34 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
2025-09-05 10:05 ` [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state Oliver Upton
2025-09-05 10:05 ` [PATCH v2 2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs Oliver Upton
2025-09-05 10:05 ` [PATCH v2 3/6] KVM: arm64: Spin off release helper from vgic_put_irq() Oliver Upton
2025-09-05 10:05 ` [PATCH v2 4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks Oliver Upton
2025-09-05 10:05 ` [PATCH v2 5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock Oliver Upton
2025-09-05 10:05 ` [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take " Oliver Upton
2025-11-05  9:37   ` Zenghui Yu
2025-11-05 10:28     ` Marc Zyngier
2025-11-06  0:46       ` Oliver Upton
2025-11-06  0:58         ` Oliver Upton
2025-11-06  3:34           ` Zenghui Yu
2025-09-06  6:11 ` [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.