* [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
2025-09-05 10:05 ` [PATCH v2 2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs Oliver Upton
` (5 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
To: kvmarm
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Ben Horgan, Oliver Upton
While LPIs lack an active state, KVM unconditionally folds the active
state from the LR into the vgic_irq struct meaning this field cannot be
'creatively' reused for something else.
Drop the misleading comment to reflect this.
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
include/kvm/arm_vgic.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 404883c7af6e..9f8a116925ca 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -139,7 +139,7 @@ struct vgic_irq {
bool pending_latch; /* The pending latch state used to calculate
* the pending state for both level
* and edge triggered IRQs. */
- bool active; /* not used for LPIs */
+ bool active;
bool enabled;
bool hw; /* Tied to HW IRQ */
struct kref refcount; /* Used for LPIs */
--
2.39.5
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v2 2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
2025-09-05 10:05 ` [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
2025-09-05 10:05 ` [PATCH v2 3/6] KVM: arm64: Spin off release helper from vgic_put_irq() Oliver Upton
` (4 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
To: kvmarm
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Ben Horgan, Oliver Upton
KVM's use of krefs to manage LPIs isn't adding much, especially
considering vgic_irq_release() is a noop due to the lack of sufficient
context.
Switch to using a regular refcount in anticipation of adding a
meaningful release concept for LPIs.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/kvm/vgic/vgic-debug.c | 2 +-
arch/arm64/kvm/vgic/vgic-init.c | 4 ++--
arch/arm64/kvm/vgic/vgic-its.c | 8 ++++----
arch/arm64/kvm/vgic/vgic-v4.c | 2 +-
arch/arm64/kvm/vgic/vgic.c | 17 ++++-------------
arch/arm64/kvm/vgic/vgic.h | 8 ++++----
include/kvm/arm_vgic.h | 4 ++--
7 files changed, 18 insertions(+), 27 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-debug.c b/arch/arm64/kvm/vgic/vgic-debug.c
index 2684f273d9e1..4c1209261b65 100644
--- a/arch/arm64/kvm/vgic/vgic-debug.c
+++ b/arch/arm64/kvm/vgic/vgic-debug.c
@@ -69,7 +69,7 @@ static int iter_mark_lpis(struct kvm *kvm)
int nr_lpis = 0;
xa_for_each(&dist->lpi_xa, intid, irq) {
- if (!vgic_try_get_irq_kref(irq))
+ if (!vgic_try_get_irq_ref(irq))
continue;
xa_set_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER);
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 1e680ad6e863..4c777136ea5f 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -208,7 +208,7 @@ static int kvm_vgic_dist_init(struct kvm *kvm, unsigned int nr_spis)
raw_spin_lock_init(&irq->irq_lock);
irq->vcpu = NULL;
irq->target_vcpu = vcpu0;
- kref_init(&irq->refcount);
+ refcount_set(&irq->refcount, 0);
switch (dist->vgic_model) {
case KVM_DEV_TYPE_ARM_VGIC_V2:
irq->targets = 0;
@@ -277,7 +277,7 @@ static int vgic_allocate_private_irqs_locked(struct kvm_vcpu *vcpu, u32 type)
irq->intid = i;
irq->vcpu = NULL;
irq->target_vcpu = vcpu;
- kref_init(&irq->refcount);
+ refcount_set(&irq->refcount, 0);
if (vgic_irq_is_sgi(i)) {
/* SGIs */
irq->enabled = 1;
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index 7368c13f16b7..f162206adb48 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -99,7 +99,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
raw_spin_lock_init(&irq->irq_lock);
irq->config = VGIC_CONFIG_EDGE;
- kref_init(&irq->refcount);
+ refcount_set(&irq->refcount, 1);
irq->intid = intid;
irq->target_vcpu = vcpu;
irq->group = 1;
@@ -111,7 +111,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
* check that we don't add a second list entry with the same LPI.
*/
oldirq = xa_load(&dist->lpi_xa, intid);
- if (vgic_try_get_irq_kref(oldirq)) {
+ if (vgic_try_get_irq_ref(oldirq)) {
/* Someone was faster with adding this LPI, lets use that. */
kfree(irq);
irq = oldirq;
@@ -547,7 +547,7 @@ static struct vgic_irq *vgic_its_check_cache(struct kvm *kvm, phys_addr_t db,
rcu_read_lock();
irq = xa_load(&its->translation_cache, cache_key);
- if (!vgic_try_get_irq_kref(irq))
+ if (!vgic_try_get_irq_ref(irq))
irq = NULL;
rcu_read_unlock();
@@ -571,7 +571,7 @@ static void vgic_its_cache_translation(struct kvm *kvm, struct vgic_its *its,
* its_lock, as the ITE (and the reference it holds) cannot be freed.
*/
lockdep_assert_held(&its->its_lock);
- vgic_get_irq_kref(irq);
+ vgic_get_irq_ref(irq);
old = xa_store(&its->translation_cache, cache_key, irq, GFP_KERNEL_ACCOUNT);
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index 4d9343d2b0b1..548aec9d5a72 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -518,7 +518,7 @@ static struct vgic_irq *__vgic_host_irq_get_vlpi(struct kvm *kvm, int host_irq)
if (!irq->hw || irq->host_irq != host_irq)
continue;
- if (!vgic_try_get_irq_kref(irq))
+ if (!vgic_try_get_irq_ref(irq))
return NULL;
return irq;
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index f5148b38120a..a1d6fab895c4 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -71,7 +71,7 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid)
rcu_read_lock();
irq = xa_load(&dist->lpi_xa, intid);
- if (!vgic_try_get_irq_kref(irq))
+ if (!vgic_try_get_irq_ref(irq))
irq = NULL;
rcu_read_unlock();
@@ -114,15 +114,6 @@ struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid)
return vgic_get_irq(vcpu->kvm, intid);
}
-/*
- * We can't do anything in here, because we lack the kvm pointer to
- * lock and remove the item from the lpi_list. So we keep this function
- * empty and use the return value of kref_put() to trigger the freeing.
- */
-static void vgic_irq_release(struct kref *ref)
-{
-}
-
void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
@@ -131,7 +122,7 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
if (irq->intid < VGIC_MIN_LPI)
return;
- if (!kref_put(&irq->refcount, vgic_irq_release))
+ if (!refcount_dec_and_test(&irq->refcount))
return;
xa_lock_irqsave(&dist->lpi_xa, flags);
@@ -399,7 +390,7 @@ bool vgic_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
* now in the ap_list. This is safe as the caller must already hold a
* reference on the irq.
*/
- vgic_get_irq_kref(irq);
+ vgic_get_irq_ref(irq);
list_add_tail(&irq->ap_list, &vcpu->arch.vgic_cpu.ap_list_head);
irq->vcpu = vcpu;
@@ -657,7 +648,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
/*
* This vgic_put_irq call matches the
- * vgic_get_irq_kref in vgic_queue_irq_unlock,
+ * vgic_get_irq_ref in vgic_queue_irq_unlock,
* where we added the LPI to the ap_list. As
* we remove the irq from the list, we drop
* also drop the refcount.
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index de1c1d3261c3..ac5f9c5d2b98 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -267,7 +267,7 @@ void vgic_v2_put(struct kvm_vcpu *vcpu);
void vgic_v2_save_state(struct kvm_vcpu *vcpu);
void vgic_v2_restore_state(struct kvm_vcpu *vcpu);
-static inline bool vgic_try_get_irq_kref(struct vgic_irq *irq)
+static inline bool vgic_try_get_irq_ref(struct vgic_irq *irq)
{
if (!irq)
return false;
@@ -275,12 +275,12 @@ static inline bool vgic_try_get_irq_kref(struct vgic_irq *irq)
if (irq->intid < VGIC_MIN_LPI)
return true;
- return kref_get_unless_zero(&irq->refcount);
+ return refcount_inc_not_zero(&irq->refcount);
}
-static inline void vgic_get_irq_kref(struct vgic_irq *irq)
+static inline void vgic_get_irq_ref(struct vgic_irq *irq)
{
- WARN_ON_ONCE(!vgic_try_get_irq_kref(irq));
+ WARN_ON_ONCE(!vgic_try_get_irq_ref(irq));
}
void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9f8a116925ca..640555ff5b54 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -8,8 +8,8 @@
#include <linux/bits.h>
#include <linux/kvm.h>
#include <linux/irqreturn.h>
-#include <linux/kref.h>
#include <linux/mutex.h>
+#include <linux/refcount.h>
#include <linux/spinlock.h>
#include <linux/static_key.h>
#include <linux/types.h>
@@ -142,7 +142,7 @@ struct vgic_irq {
bool active;
bool enabled;
bool hw; /* Tied to HW IRQ */
- struct kref refcount; /* Used for LPIs */
+ refcount_t refcount; /* Used for LPIs */
u32 hwintid; /* HW INTID number */
unsigned int host_irq; /* linux irq corresponding to hwintid */
union {
--
2.39.5
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v2 3/6] KVM: arm64: Spin off release helper from vgic_put_irq()
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
2025-09-05 10:05 ` [PATCH v2 1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state Oliver Upton
2025-09-05 10:05 ` [PATCH v2 2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
2025-09-05 10:05 ` [PATCH v2 4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks Oliver Upton
` (3 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
To: kvmarm
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Ben Horgan, Oliver Upton
Spin off the release implementation from vgic_put_irq() to prepare for a
more involved fix for lock ordering such that it may be unnested from
raw spinlocks. This has the minor functional change of doing call_rcu()
behind the xa_lock although it shouldn't be consequential.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/kvm/vgic/vgic.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index a1d6fab895c4..ec4d70936a5b 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -114,22 +114,32 @@ struct vgic_irq *vgic_get_vcpu_irq(struct kvm_vcpu *vcpu, u32 intid)
return vgic_get_irq(vcpu->kvm, intid);
}
+static void vgic_release_lpi_locked(struct vgic_dist *dist, struct vgic_irq *irq)
+{
+ lockdep_assert_held(&dist->lpi_xa.xa_lock);
+ __xa_erase(&dist->lpi_xa, irq->intid);
+ kfree_rcu(irq, rcu);
+}
+
+static __must_check bool __vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
+{
+ if (irq->intid < VGIC_MIN_LPI)
+ return false;
+
+ return refcount_dec_and_test(&irq->refcount);
+}
+
void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
unsigned long flags;
- if (irq->intid < VGIC_MIN_LPI)
- return;
-
- if (!refcount_dec_and_test(&irq->refcount))
+ if (!__vgic_put_irq(kvm, irq))
return;
xa_lock_irqsave(&dist->lpi_xa, flags);
- __xa_erase(&dist->lpi_xa, irq->intid);
+ vgic_release_lpi_locked(dist, irq);
xa_unlock_irqrestore(&dist->lpi_xa, flags);
-
- kfree_rcu(irq, rcu);
}
void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
--
2.39.5
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v2 4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
` (2 preceding siblings ...)
2025-09-05 10:05 ` [PATCH v2 3/6] KVM: arm64: Spin off release helper from vgic_put_irq() Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
2025-09-05 10:05 ` [PATCH v2 5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock Oliver Upton
` (2 subsequent siblings)
6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
To: kvmarm
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Ben Horgan, Oliver Upton, syzbot+cef594105ac7e60c6d93
syzkaller has caught us red-handed once more, this time nesting regular
spinlocks behind raw spinlocks:
=============================
[ BUG: Invalid wait context ]
6.16.0-rc3-syzkaller-g7b8346bd9fce #0 Not tainted
-----------------------------
syz.0.29/3743 is trying to lock:
a3ff80008e2e9e18 (&xa->xa_lock#20){....}-{3:3}, at: vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
other info that might help us debug this:
context-{5:5}
3 locks held by syz.0.29/3743:
#0: a3ff80008e2e90a8 (&kvm->slots_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x50/0x624 arch/arm64/kvm/vgic/vgic-init.c:499
#1: a3ff80008e2e9fa0 (&kvm->arch.config_lock){+.+.}-{4:4}, at: kvm_vgic_destroy+0x5c/0x624 arch/arm64/kvm/vgic/vgic-init.c:500
#2: 58f0000021be1428 (&vgic_cpu->ap_list_lock){....}-{2:2}, at: vgic_flush_pending_lpis+0x3c/0x31c arch/arm64/kvm/vgic/vgic.c:150
stack backtrace:
CPU: 0 UID: 0 PID: 3743 Comm: syz.0.29 Not tainted 6.16.0-rc3-syzkaller-g7b8346bd9fce #0 PREEMPT
Hardware name: linux,dummy-virt (DT)
Call trace:
show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:466 (C)
__dump_stack+0x30/0x40 lib/dump_stack.c:94
dump_stack_lvl+0xd8/0x12c lib/dump_stack.c:120
dump_stack+0x1c/0x28 lib/dump_stack.c:129
print_lock_invalid_wait_context kernel/locking/lockdep.c:4833 [inline]
check_wait_context kernel/locking/lockdep.c:4905 [inline]
__lock_acquire+0x978/0x299c kernel/locking/lockdep.c:5190
lock_acquire+0x14c/0x2e0 kernel/locking/lockdep.c:5871
__raw_spin_lock_irqsave include/linux/spinlock_api_smp.h:110 [inline]
_raw_spin_lock_irqsave+0x5c/0x7c kernel/locking/spinlock.c:162
vgic_put_irq+0xb4/0x190 arch/arm64/kvm/vgic/vgic.c:137
vgic_flush_pending_lpis+0x24c/0x31c arch/arm64/kvm/vgic/vgic.c:158
__kvm_vgic_vcpu_destroy+0x44/0x500 arch/arm64/kvm/vgic/vgic-init.c:455
kvm_vgic_destroy+0x100/0x624 arch/arm64/kvm/vgic/vgic-init.c:505
kvm_arch_destroy_vm+0x80/0x138 arch/arm64/kvm/arm.c:244
kvm_destroy_vm virt/kvm/kvm_main.c:1308 [inline]
kvm_put_kvm+0x800/0xff8 virt/kvm/kvm_main.c:1344
kvm_vm_release+0x58/0x78 virt/kvm/kvm_main.c:1367
__fput+0x4ac/0x980 fs/file_table.c:465
____fput+0x20/0x58 fs/file_table.c:493
task_work_run+0x1bc/0x254 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
do_notify_resume+0x1b4/0x270 arch/arm64/kernel/entry-common.c:151
exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline]
exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline]
el0_svc+0xb4/0x160 arch/arm64/kernel/entry-common.c:768
el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
This is of course no good, but is at odds with how LPI refcounts are
managed. Solve the locking mess by deferring the release of unreferenced
LPIs after the ap_list_lock is released. Mark these to-be-released LPIs
specially to avoid racing with vgic_put_irq() and causing a double-free.
Since references can only be taken on LPIs with a nonzero refcount,
extending the lifetime of freed LPIs is still safe.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Reported-by: syzbot+cef594105ac7e60c6d93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/kvmarm/68acd0d9.a00a0220.33401d.048b.GAE@google.com/
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/kvm/vgic/vgic.c | 41 ++++++++++++++++++++++++++++++++++----
include/kvm/arm_vgic.h | 3 +++
2 files changed, 40 insertions(+), 4 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index ec4d70936a5b..480391a0dcad 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -28,8 +28,8 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
* kvm->arch.config_lock (mutex)
* its->cmd_lock (mutex)
* its->its_lock (mutex)
- * vgic_cpu->ap_list_lock must be taken with IRQs disabled
- * vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
+ * vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
+ * vgic_cpu->ap_list_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
*
* As the ap_list_lock might be taken from the timer interrupt handler,
@@ -129,6 +129,15 @@ static __must_check bool __vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
return refcount_dec_and_test(&irq->refcount);
}
+static __must_check bool vgic_put_irq_norelease(struct kvm *kvm, struct vgic_irq *irq)
+{
+ if (!__vgic_put_irq(kvm, irq))
+ return false;
+
+ irq->pending_release = true;
+ return true;
+}
+
void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
@@ -142,10 +151,27 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
xa_unlock_irqrestore(&dist->lpi_xa, flags);
}
+static void vgic_release_deleted_lpis(struct kvm *kvm)
+{
+ struct vgic_dist *dist = &kvm->arch.vgic;
+ unsigned long flags, intid;
+ struct vgic_irq *irq;
+
+ xa_lock_irqsave(&dist->lpi_xa, flags);
+
+ xa_for_each(&dist->lpi_xa, intid, irq) {
+ if (irq->pending_release)
+ vgic_release_lpi_locked(dist, irq);
+ }
+
+ xa_unlock_irqrestore(&dist->lpi_xa, flags);
+}
+
void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_irq *irq, *tmp;
+ bool deleted = false;
unsigned long flags;
raw_spin_lock_irqsave(&vgic_cpu->ap_list_lock, flags);
@@ -156,11 +182,14 @@ void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
list_del(&irq->ap_list);
irq->vcpu = NULL;
raw_spin_unlock(&irq->irq_lock);
- vgic_put_irq(vcpu->kvm, irq);
+ deleted |= vgic_put_irq_norelease(vcpu->kvm, irq);
}
}
raw_spin_unlock_irqrestore(&vgic_cpu->ap_list_lock, flags);
+
+ if (deleted)
+ vgic_release_deleted_lpis(vcpu->kvm);
}
void vgic_irq_set_phys_pending(struct vgic_irq *irq, bool pending)
@@ -631,6 +660,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
{
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
struct vgic_irq *irq, *tmp;
+ bool deleted_lpis = false;
DEBUG_SPINLOCK_BUG_ON(!irqs_disabled());
@@ -663,7 +693,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
* we remove the irq from the list, we drop
* also drop the refcount.
*/
- vgic_put_irq(vcpu->kvm, irq);
+ deleted_lpis |= vgic_put_irq_norelease(vcpu->kvm, irq);
continue;
}
@@ -726,6 +756,9 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
}
raw_spin_unlock(&vgic_cpu->ap_list_lock);
+
+ if (unlikely(deleted_lpis))
+ vgic_release_deleted_lpis(vcpu->kvm);
}
static inline void vgic_fold_lr_state(struct kvm_vcpu *vcpu)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 640555ff5b54..4000ff16f295 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -140,6 +140,9 @@ struct vgic_irq {
* the pending state for both level
* and edge triggered IRQs. */
bool active;
+ bool pending_release; /* Used for LPIs only, unreferenced IRQ
+ * pending a release */
+
bool enabled;
bool hw; /* Tied to HW IRQ */
refcount_t refcount; /* Used for LPIs */
--
2.39.5
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v2 5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
` (3 preceding siblings ...)
2025-09-05 10:05 ` [PATCH v2 4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
2025-09-05 10:05 ` [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take " Oliver Upton
2025-09-06 6:11 ` [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
To: kvmarm
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Ben Horgan, Oliver Upton
Now that releases of LPIs have been unnested from the ap_list_lock there
are no xarray writers that exist in a context where IRQs are already
disabled. As such we can relax the locking to the non-IRQ disabling
spinlock to guard the LPI xarray.
Note that there are still readers of the LPI xarray where IRQs are
disabled however the readers rely on RCU protection instead of the lock.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/kvm/vgic/vgic-init.c | 2 +-
arch/arm64/kvm/vgic/vgic-its.c | 7 +++----
arch/arm64/kvm/vgic/vgic.c | 13 ++++++-------
3 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 4c777136ea5f..4c3c0d82e476 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -53,7 +53,7 @@ void kvm_vgic_early_init(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
- xa_init_flags(&dist->lpi_xa, XA_FLAGS_LOCK_IRQ);
+ xa_init(&dist->lpi_xa);
}
/* CREATION */
diff --git a/arch/arm64/kvm/vgic/vgic-its.c b/arch/arm64/kvm/vgic/vgic-its.c
index f162206adb48..ce3e3ed3f29f 100644
--- a/arch/arm64/kvm/vgic/vgic-its.c
+++ b/arch/arm64/kvm/vgic/vgic-its.c
@@ -78,7 +78,6 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
{
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_irq *irq = vgic_get_irq(kvm, intid), *oldirq;
- unsigned long flags;
int ret;
/* In this case there is no put, since we keep the reference. */
@@ -89,7 +88,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
if (!irq)
return ERR_PTR(-ENOMEM);
- ret = xa_reserve_irq(&dist->lpi_xa, intid, GFP_KERNEL_ACCOUNT);
+ ret = xa_reserve(&dist->lpi_xa, intid, GFP_KERNEL_ACCOUNT);
if (ret) {
kfree(irq);
return ERR_PTR(ret);
@@ -104,7 +103,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
irq->target_vcpu = vcpu;
irq->group = 1;
- xa_lock_irqsave(&dist->lpi_xa, flags);
+ xa_lock(&dist->lpi_xa);
/*
* There could be a race with another vgic_add_lpi(), so we need to
@@ -126,7 +125,7 @@ static struct vgic_irq *vgic_add_lpi(struct kvm *kvm, u32 intid,
}
out_unlock:
- xa_unlock_irqrestore(&dist->lpi_xa, flags);
+ xa_unlock(&dist->lpi_xa);
if (ret)
return ERR_PTR(ret);
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 480391a0dcad..a21b482844ce 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -28,7 +28,7 @@ struct vgic_global kvm_vgic_global_state __ro_after_init = {
* kvm->arch.config_lock (mutex)
* its->cmd_lock (mutex)
* its->its_lock (mutex)
- * vgic_dist->lpi_xa.xa_lock must be taken with IRQs disabled
+ * vgic_dist->lpi_xa.xa_lock
* vgic_cpu->ap_list_lock must be taken with IRQs disabled
* vgic_irq->irq_lock must be taken with IRQs disabled
*
@@ -141,30 +141,29 @@ static __must_check bool vgic_put_irq_norelease(struct kvm *kvm, struct vgic_irq
void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
- unsigned long flags;
if (!__vgic_put_irq(kvm, irq))
return;
- xa_lock_irqsave(&dist->lpi_xa, flags);
+ xa_lock(&dist->lpi_xa);
vgic_release_lpi_locked(dist, irq);
- xa_unlock_irqrestore(&dist->lpi_xa, flags);
+ xa_unlock(&dist->lpi_xa);
}
static void vgic_release_deleted_lpis(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
- unsigned long flags, intid;
+ unsigned long intid;
struct vgic_irq *irq;
- xa_lock_irqsave(&dist->lpi_xa, flags);
+ xa_lock(&dist->lpi_xa);
xa_for_each(&dist->lpi_xa, intid, irq) {
if (irq->pending_release)
vgic_release_lpi_locked(dist, irq);
}
- xa_unlock_irqrestore(&dist->lpi_xa, flags);
+ xa_unlock(&dist->lpi_xa);
}
void vgic_flush_pending_lpis(struct kvm_vcpu *vcpu)
--
2.39.5
^ permalink raw reply related [flat|nested] 13+ messages in thread* [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
` (4 preceding siblings ...)
2025-09-05 10:05 ` [PATCH v2 5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock Oliver Upton
@ 2025-09-05 10:05 ` Oliver Upton
2025-11-05 9:37 ` Zenghui Yu
2025-09-06 6:11 ` [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
6 siblings, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2025-09-05 10:05 UTC (permalink / raw)
To: kvmarm
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Ben Horgan, Oliver Upton
The release path on LPIs is quite rare, meaning it can be difficult to
find lock ordering bugs on the LPI xarray's spinlock. Tell lockdep that
vgic_put_irq() might acquire the xa_lock to make unsafe patterns more
obvious.
Reviewed-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
---
arch/arm64/kvm/vgic/vgic.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index a21b482844ce..3b247041a130 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -142,6 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
+ if (irq->intid >= VGIC_MIN_LPI)
+ might_lock(&dist->lpi_xa.xa_lock);
+
if (!__vgic_put_irq(kvm, irq))
return;
--
2.39.5
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
2025-09-05 10:05 ` [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take " Oliver Upton
@ 2025-11-05 9:37 ` Zenghui Yu
2025-11-05 10:28 ` Marc Zyngier
0 siblings, 1 reply; 13+ messages in thread
From: Zenghui Yu @ 2025-11-05 9:37 UTC (permalink / raw)
To: Oliver Upton
Cc: kvmarm, Marc Zyngier, Joey Gouly, Suzuki K Poulose, Ben Horgan
Hi Oliver,
On 2025/9/5 18:05, Oliver Upton wrote:
> The release path on LPIs is quite rare, meaning it can be difficult to
> find lock ordering bugs on the LPI xarray's spinlock. Tell lockdep that
> vgic_put_irq() might acquire the xa_lock to make unsafe patterns more
> obvious.
>
> Reviewed-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> ---
> arch/arm64/kvm/vgic/vgic.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> index a21b482844ce..3b247041a130 100644
> --- a/arch/arm64/kvm/vgic/vgic.c
> +++ b/arch/arm64/kvm/vgic/vgic.c
> @@ -142,6 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> {
> struct vgic_dist *dist = &kvm->arch.vgic;
>
> + if (irq->intid >= VGIC_MIN_LPI)
> + might_lock(&dist->lpi_xa.xa_lock);
I got the following splat on a lockdep kernel. The reproducing step can
be easily inferred from the backtrace (i.e., starting a guest with an
assigned device).
================================
WARNING: inconsistent lock state
6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
--------------------------------
inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
{HARDIRQ-ON-W} state was registered at:
lock_acquire+0x1c8/0x354
_raw_spin_lock+0x48/0x60
vgic_add_lpi.part.0+0x70/0x2f8
vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
vgic_its_process_commands.part.0+0x4d4/0xfa0
vgic_mmio_write_its_cwriter+0x80/0xa4
dispatch_mmio_write+0xd0/0x128
__kvm_io_bus_write+0xb4/0xe8
kvm_io_bus_write+0x58/0x98
io_mem_abort+0xe8/0x3f0
kvm_handle_guest_abort+0x4d0/0x1414
handle_exit+0x6c/0x1c4
kvm_arch_vcpu_ioctl_run+0x678/0xbfc
kvm_vcpu_ioctl+0x1ac/0xb24
__arm64_sys_ioctl+0xac/0x104
invoke_syscall+0x48/0x10c
el0_svc_common.constprop.0+0x40/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x50/0x2c0
el0t_64_sync_handler+0xa0/0xe4
el0t_64_sync+0x198/0x19c
irq event stamp: 5415534
hardirqs last enabled at (5415533): [<ffff8000813e291c>]
default_idle_call+0x7c/0x138
hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
enter_from_kernel_mode+0x10/0x3c
softirqs last enabled at (5415516): [<ffff8000800c7b54>]
handle_softirqs+0x4ac/0x4c4
softirqs last disabled at (5415511): [<ffff800080010748>]
__do_softirq+0x14/0x20
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&xa->xa_lock#19);
<Interrupt>
lock(&xa->xa_lock#19);
*** DEADLOCK ***
2 locks held by swapper/10/0:
#0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
eventfd_signal_mask+0x38/0xc0
#1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
irqfd_wakeup+0x88/0x2ac
stack backtrace:
CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
Call trace:
show_stack+0x18/0x24 (C)
dump_stack_lvl+0x90/0xd0
dump_stack+0x18/0x24
print_usage_bug.part.0+0x29c/0x358
mark_lock+0x6c0/0x960
__lock_acquire+0xd4c/0x20fc
lock_acquire+0x1c8/0x354
vgic_put_irq+0x54/0x110
vgic_its_inject_cached_translation+0x178/0x25c
kvm_arch_set_irq_inatomic+0xac/0x124
irqfd_wakeup+0x1a0/0x2ac
__wake_up_common+0x94/0x10c
__wake_up_locked_key+0x1c/0x28
eventfd_signal_mask+0x78/0xc0
vfio_msihandler+0x18/0x28
__handle_irq_event_percpu+0x8c/0x374
handle_irq_event+0x4c/0xa8
handle_fasteoi_irq+0x108/0x198
handle_irq_desc+0x40/0x58
generic_handle_domain_irq+0x1c/0x28
gic_handle_irq+0x4c/0x120
call_on_irq_stack+0x30/0x48
do_interrupt_handler+0x80/0x84
el1_interrupt+0x3c/0x60
el1h_64_irq_handler+0x18/0x24
el1h_64_irq+0x6c/0x70
default_idle_call+0x80/0x138 (P)
do_idle+0x21c/0x27c
cpu_startup_entry+0x34/0x3c
secondary_start_kernel+0x144/0x164
__secondary_switched+0xc0/0xc4
Thanks,
Zenghui
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
2025-11-05 9:37 ` Zenghui Yu
@ 2025-11-05 10:28 ` Marc Zyngier
2025-11-06 0:46 ` Oliver Upton
0 siblings, 1 reply; 13+ messages in thread
From: Marc Zyngier @ 2025-11-05 10:28 UTC (permalink / raw)
To: Zenghui Yu; +Cc: Oliver Upton, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan
On Wed, 05 Nov 2025 09:37:10 +0000,
Zenghui Yu <yuzenghui@huawei.com> wrote:
>
> Hi Oliver,
>
> On 2025/9/5 18:05, Oliver Upton wrote:
> > The release path on LPIs is quite rare, meaning it can be difficult to
> > find lock ordering bugs on the LPI xarray's spinlock. Tell lockdep that
> > vgic_put_irq() might acquire the xa_lock to make unsafe patterns more
> > obvious.
> >
> > Reviewed-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
> > ---
> > arch/arm64/kvm/vgic/vgic.c | 3 +++
> > 1 file changed, 3 insertions(+)
> >
> > diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
> > index a21b482844ce..3b247041a130 100644
> > --- a/arch/arm64/kvm/vgic/vgic.c
> > +++ b/arch/arm64/kvm/vgic/vgic.c
> > @@ -142,6 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
> > {
> > struct vgic_dist *dist = &kvm->arch.vgic;
> >
> > + if (irq->intid >= VGIC_MIN_LPI)
> > + might_lock(&dist->lpi_xa.xa_lock);
>
> I got the following splat on a lockdep kernel. The reproducing step can
> be easily inferred from the backtrace (i.e., starting a guest with an
> assigned device).
>
> ================================
> WARNING: inconsistent lock state
> 6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
> --------------------------------
> inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
> {HARDIRQ-ON-W} state was registered at:
> lock_acquire+0x1c8/0x354
> _raw_spin_lock+0x48/0x60
> vgic_add_lpi.part.0+0x70/0x2f8
> vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
> vgic_its_process_commands.part.0+0x4d4/0xfa0
> vgic_mmio_write_its_cwriter+0x80/0xa4
> dispatch_mmio_write+0xd0/0x128
> __kvm_io_bus_write+0xb4/0xe8
> kvm_io_bus_write+0x58/0x98
> io_mem_abort+0xe8/0x3f0
> kvm_handle_guest_abort+0x4d0/0x1414
> handle_exit+0x6c/0x1c4
> kvm_arch_vcpu_ioctl_run+0x678/0xbfc
> kvm_vcpu_ioctl+0x1ac/0xb24
> __arm64_sys_ioctl+0xac/0x104
> invoke_syscall+0x48/0x10c
> el0_svc_common.constprop.0+0x40/0xe0
> do_el0_svc+0x1c/0x28
> el0_svc+0x50/0x2c0
> el0t_64_sync_handler+0xa0/0xe4
> el0t_64_sync+0x198/0x19c
> irq event stamp: 5415534
> hardirqs last enabled at (5415533): [<ffff8000813e291c>]
> default_idle_call+0x7c/0x138
> hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
> enter_from_kernel_mode+0x10/0x3c
> softirqs last enabled at (5415516): [<ffff8000800c7b54>]
> handle_softirqs+0x4ac/0x4c4
> softirqs last disabled at (5415511): [<ffff800080010748>]
> __do_softirq+0x14/0x20
>
> other info that might help us debug this:
> Possible unsafe locking scenario:
>
> CPU0
> ----
> lock(&xa->xa_lock#19);
> <Interrupt>
> lock(&xa->xa_lock#19);
>
> *** DEADLOCK ***
>
> 2 locks held by swapper/10/0:
> #0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
> eventfd_signal_mask+0x38/0xc0
> #1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
> irqfd_wakeup+0x88/0x2ac
>
> stack backtrace:
> CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
> 6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
> Call trace:
> show_stack+0x18/0x24 (C)
> dump_stack_lvl+0x90/0xd0
> dump_stack+0x18/0x24
> print_usage_bug.part.0+0x29c/0x358
> mark_lock+0x6c0/0x960
> __lock_acquire+0xd4c/0x20fc
> lock_acquire+0x1c8/0x354
> vgic_put_irq+0x54/0x110
> vgic_its_inject_cached_translation+0x178/0x25c
> kvm_arch_set_irq_inatomic+0xac/0x124
Right. This might_lock() is gross, and clearly doesn't do the right
thing outside of direct injection of LPIs.
I think we should drop it, but we should ensure that lpi_xa.xa_lock is
never taken in interrupt context.
Oliver, what do you think?
M.
--
Without deviation from the norm, progress is not possible.
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
2025-11-05 10:28 ` Marc Zyngier
@ 2025-11-06 0:46 ` Oliver Upton
2025-11-06 0:58 ` Oliver Upton
0 siblings, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2025-11-06 0:46 UTC (permalink / raw)
To: Marc Zyngier; +Cc: Zenghui Yu, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan
Hey,
On Wed, Nov 05, 2025 at 10:28:04AM +0000, Marc Zyngier wrote:
> On Wed, 05 Nov 2025 09:37:10 +0000,
> Zenghui Yu <yuzenghui@huawei.com> wrote:
> > I got the following splat on a lockdep kernel. The reproducing step can
> > be easily inferred from the backtrace (i.e., starting a guest with an
> > assigned device).
Ouch, sorry about that!
> > ================================
> > WARNING: inconsistent lock state
> > 6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
> > --------------------------------
> > inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> > swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
> > {HARDIRQ-ON-W} state was registered at:
> > lock_acquire+0x1c8/0x354
> > _raw_spin_lock+0x48/0x60
> > vgic_add_lpi.part.0+0x70/0x2f8
> > vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
> > vgic_its_process_commands.part.0+0x4d4/0xfa0
> > vgic_mmio_write_its_cwriter+0x80/0xa4
> > dispatch_mmio_write+0xd0/0x128
> > __kvm_io_bus_write+0xb4/0xe8
> > kvm_io_bus_write+0x58/0x98
> > io_mem_abort+0xe8/0x3f0
> > kvm_handle_guest_abort+0x4d0/0x1414
> > handle_exit+0x6c/0x1c4
> > kvm_arch_vcpu_ioctl_run+0x678/0xbfc
> > kvm_vcpu_ioctl+0x1ac/0xb24
> > __arm64_sys_ioctl+0xac/0x104
> > invoke_syscall+0x48/0x10c
> > el0_svc_common.constprop.0+0x40/0xe0
> > do_el0_svc+0x1c/0x28
> > el0_svc+0x50/0x2c0
> > el0t_64_sync_handler+0xa0/0xe4
> > el0t_64_sync+0x198/0x19c
> > irq event stamp: 5415534
> > hardirqs last enabled at (5415533): [<ffff8000813e291c>]
> > default_idle_call+0x7c/0x138
> > hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
> > enter_from_kernel_mode+0x10/0x3c
> > softirqs last enabled at (5415516): [<ffff8000800c7b54>]
> > handle_softirqs+0x4ac/0x4c4
> > softirqs last disabled at (5415511): [<ffff800080010748>]
> > __do_softirq+0x14/0x20
> >
> > other info that might help us debug this:
> > Possible unsafe locking scenario:
> >
> > CPU0
> > ----
> > lock(&xa->xa_lock#19);
> > <Interrupt>
> > lock(&xa->xa_lock#19);
> >
> > *** DEADLOCK ***
> >
> > 2 locks held by swapper/10/0:
> > #0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
> > eventfd_signal_mask+0x38/0xc0
> > #1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
> > irqfd_wakeup+0x88/0x2ac
> >
> > stack backtrace:
> > CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
> > 6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
> > Call trace:
> > show_stack+0x18/0x24 (C)
> > dump_stack_lvl+0x90/0xd0
> > dump_stack+0x18/0x24
> > print_usage_bug.part.0+0x29c/0x358
> > mark_lock+0x6c0/0x960
> > __lock_acquire+0xd4c/0x20fc
> > lock_acquire+0x1c8/0x354
> > vgic_put_irq+0x54/0x110
> > vgic_its_inject_cached_translation+0x178/0x25c
> > kvm_arch_set_irq_inatomic+0xac/0x124
>
> Right. This might_lock() is gross, and clearly doesn't do the right
> thing outside of direct injection of LPIs.
>
> I think we should drop it, but we should ensure that lpi_xa.xa_lock is
> never taken in interrupt context.
>
> Oliver, what do you think?
It is possible (albeit improbable) that the last reference to an LPI gets
dropped here after injecting a cached translation. When that is the
case, vgic_put_irq() will take the xa_lock from an irq context. So
I'd say the might_lock() here is valid.
Zenghui, does reverting 982f31bbb5b0 ("KVM: arm64: vgic-v3: Don't require
IRQs be disabled for LPI xarray lock") make this go away?
Thanks,
Oliver
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
2025-11-06 0:46 ` Oliver Upton
@ 2025-11-06 0:58 ` Oliver Upton
2025-11-06 3:34 ` Zenghui Yu
0 siblings, 1 reply; 13+ messages in thread
From: Oliver Upton @ 2025-11-06 0:58 UTC (permalink / raw)
To: Marc Zyngier; +Cc: Zenghui Yu, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan
On Wed, Nov 05, 2025 at 04:46:59PM -0800, Oliver Upton wrote:
> Hey,
>
> On Wed, Nov 05, 2025 at 10:28:04AM +0000, Marc Zyngier wrote:
> > On Wed, 05 Nov 2025 09:37:10 +0000,
> > Zenghui Yu <yuzenghui@huawei.com> wrote:
> > > I got the following splat on a lockdep kernel. The reproducing step can
> > > be easily inferred from the backtrace (i.e., starting a guest with an
> > > assigned device).
>
> Ouch, sorry about that!
>
> > > ================================
> > > WARNING: inconsistent lock state
> > > 6.18.0-rc4-00019-g284922f4c563-dirty #2390 Not tainted
> > > --------------------------------
> > > inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.
> > > swapper/10/0 [HC1[1]:SC0[0]:HE0:SE1] takes:
> > > ffff8000a504de18 (&xa->xa_lock#19){?.+.}-{3:3}, at: vgic_put_irq+0x28/0x110
> > > {HARDIRQ-ON-W} state was registered at:
> > > lock_acquire+0x1c8/0x354
> > > _raw_spin_lock+0x48/0x60
> > > vgic_add_lpi.part.0+0x70/0x2f8
> > > vgic_its_cmd_handle_mapi.isra.0+0x398/0x418
> > > vgic_its_process_commands.part.0+0x4d4/0xfa0
> > > vgic_mmio_write_its_cwriter+0x80/0xa4
> > > dispatch_mmio_write+0xd0/0x128
> > > __kvm_io_bus_write+0xb4/0xe8
> > > kvm_io_bus_write+0x58/0x98
> > > io_mem_abort+0xe8/0x3f0
> > > kvm_handle_guest_abort+0x4d0/0x1414
> > > handle_exit+0x6c/0x1c4
> > > kvm_arch_vcpu_ioctl_run+0x678/0xbfc
> > > kvm_vcpu_ioctl+0x1ac/0xb24
> > > __arm64_sys_ioctl+0xac/0x104
> > > invoke_syscall+0x48/0x10c
> > > el0_svc_common.constprop.0+0x40/0xe0
> > > do_el0_svc+0x1c/0x28
> > > el0_svc+0x50/0x2c0
> > > el0t_64_sync_handler+0xa0/0xe4
> > > el0t_64_sync+0x198/0x19c
> > > irq event stamp: 5415534
> > > hardirqs last enabled at (5415533): [<ffff8000813e291c>]
> > > default_idle_call+0x7c/0x138
> > > hardirqs last disabled at (5415534): [<ffff8000813dabf4>]
> > > enter_from_kernel_mode+0x10/0x3c
> > > softirqs last enabled at (5415516): [<ffff8000800c7b54>]
> > > handle_softirqs+0x4ac/0x4c4
> > > softirqs last disabled at (5415511): [<ffff800080010748>]
> > > __do_softirq+0x14/0x20
> > >
> > > other info that might help us debug this:
> > > Possible unsafe locking scenario:
> > >
> > > CPU0
> > > ----
> > > lock(&xa->xa_lock#19);
> > > <Interrupt>
> > > lock(&xa->xa_lock#19);
> > >
> > > *** DEADLOCK ***
> > >
> > > 2 locks held by swapper/10/0:
> > > #0: ffff00280db646a0 (&ctx->wqh#2){-...}-{3:3}, at:
> > > eventfd_signal_mask+0x38/0xc0
> > > #1: ffff8000a504e480 (&kvm->irq_srcu){.?.+}-{0:0}, at:
> > > irqfd_wakeup+0x88/0x2ac
> > >
> > > stack backtrace:
> > > CPU: 10 UID: 0 PID: 0 Comm: swapper/10 Kdump: loaded Not tainted
> > > 6.18.0-rc4-00019-g284922f4c563-dirty #2390 PREEMPT
> > > Call trace:
> > > show_stack+0x18/0x24 (C)
> > > dump_stack_lvl+0x90/0xd0
> > > dump_stack+0x18/0x24
> > > print_usage_bug.part.0+0x29c/0x358
> > > mark_lock+0x6c0/0x960
> > > __lock_acquire+0xd4c/0x20fc
> > > lock_acquire+0x1c8/0x354
> > > vgic_put_irq+0x54/0x110
> > > vgic_its_inject_cached_translation+0x178/0x25c
> > > kvm_arch_set_irq_inatomic+0xac/0x124
> >
> > Right. This might_lock() is gross, and clearly doesn't do the right
> > thing outside of direct injection of LPIs.
> >
> > I think we should drop it, but we should ensure that lpi_xa.xa_lock is
> > never taken in interrupt context.
> >
> > Oliver, what do you think?
>
> It is possible (albeit improbable) that the last reference to an LPI gets
> dropped here after injecting a cached translation. When that is the
> case, vgic_put_irq() will take the xa_lock from an irq context. So
> I'd say the might_lock() here is valid.
>
> Zenghui, does reverting 982f31bbb5b0 ("KVM: arm64: vgic-v3: Don't require
> IRQs be disabled for LPI xarray lock") make this go away?
Well, a bit more than that. Revert and add the diff below. Like I said
in the original changelog, finding bugs for rare release paths is
annoying and having a reliable way of causing an explosion when the
calling context isn't right is a property I'd like to preserve.
Thanks,
Oliver
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index 6dd5a10081e2..1045e9538e91 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -142,8 +142,9 @@ void vgic_put_irq(struct kvm *kvm, struct vgic_irq *irq)
{
struct vgic_dist *dist = &kvm->arch.vgic;
- if (irq->intid >= VGIC_MIN_LPI)
- might_lock(&dist->lpi_xa.xa_lock);
+ if (IS_ENABLED(CONFIG_LOCKDEP) && irq->intid >= VGIC_MIN_LPI) {
+ guard(spinlock_irqsave)(&dist->lpi_xa.xa_lock);
+ }
if (!__vgic_put_irq(kvm, irq))
return;
^ permalink raw reply related [flat|nested] 13+ messages in thread* Re: [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
2025-11-06 0:58 ` Oliver Upton
@ 2025-11-06 3:34 ` Zenghui Yu
0 siblings, 0 replies; 13+ messages in thread
From: Zenghui Yu @ 2025-11-06 3:34 UTC (permalink / raw)
To: Oliver Upton
Cc: Marc Zyngier, kvmarm, Joey Gouly, Suzuki K Poulose, Ben Horgan
On 2025/11/6 8:58, Oliver Upton wrote:
> On Wed, Nov 05, 2025 at 04:46:59PM -0800, Oliver Upton wrote:
> > Hey,
> >
> > On Wed, Nov 05, 2025 at 10:28:04AM +0000, Marc Zyngier wrote:
> > > On Wed, 05 Nov 2025 09:37:10 +0000,
> > > Zenghui Yu <yuzenghui@huawei.com> wrote:
> > > > Call trace:
> > > > show_stack+0x18/0x24 (C)
> > > > dump_stack_lvl+0x90/0xd0
> > > > dump_stack+0x18/0x24
> > > > print_usage_bug.part.0+0x29c/0x358
> > > > mark_lock+0x6c0/0x960
> > > > __lock_acquire+0xd4c/0x20fc
> > > > lock_acquire+0x1c8/0x354
> > > > vgic_put_irq+0x54/0x110
> > > > vgic_its_inject_cached_translation+0x178/0x25c
> > > > kvm_arch_set_irq_inatomic+0xac/0x124
> > >
> > > Right. This might_lock() is gross, and clearly doesn't do the right
> > > thing outside of direct injection of LPIs.
> > >
> > > I think we should drop it, but we should ensure that lpi_xa.xa_lock is
> > > never taken in interrupt context.
> > >
> > > Oliver, what do you think?
> >
> > It is possible (albeit improbable) that the last reference to an LPI gets
> > dropped here after injecting a cached translation. When that is the
> > case, vgic_put_irq() will take the xa_lock from an irq context.
IMO this is possible (with a really small possibility though) to happen.
> > So
> > I'd say the might_lock() here is valid.
> >
> > Zenghui, does reverting 982f31bbb5b0 ("KVM: arm64: vgic-v3: Don't require
> > IRQs be disabled for LPI xarray lock") make this go away?
>
> Well, a bit more than that. Revert and add the diff below. Like I said
> in the original changelog, finding bugs for rare release paths is
> annoying and having a reliable way of causing an explosion when the
> calling context isn't right is a property I'd like to preserve.
The warning disappears with that.
Thanks,
Zenghui
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd
2025-09-05 10:05 [PATCH v2 0/6] KVM: arm64: vgic-v3: Fix yet another lock ordering turd Oliver Upton
` (5 preceding siblings ...)
2025-09-05 10:05 ` [PATCH v2 6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take " Oliver Upton
@ 2025-09-06 6:11 ` Oliver Upton
6 siblings, 0 replies; 13+ messages in thread
From: Oliver Upton @ 2025-09-06 6:11 UTC (permalink / raw)
To: kvmarm, Oliver Upton
Cc: Marc Zyngier, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
Ben Horgan
On Fri, 05 Sep 2025 03:05:25 -0700, Oliver Upton wrote:
> v1: https://lore.kernel.org/kvmarm/20250904062348.223976-1-oliver.upton@linux.dev/
>
> v1 -> v2:
> - Drop union misbehavior + misleading comment (Marc)
> - Fix LPI check to include the first one (Ben)
> - Clarify changelog to make lock relaxation slightly more obvious
>
> [...]
Applied to fixes, thanks!
[1/6] KVM: arm64: vgic: Drop stale comment on IRQ active state
https://git.kernel.org/kvmarm/kvmarm/c/982c6cee4c28
[2/6] KVM: arm64: vgic-v3: Use bare refcount for VGIC LPIs
https://git.kernel.org/kvmarm/kvmarm/c/5cce5e814fa2
[3/6] KVM: arm64: Spin off release helper from vgic_put_irq()
https://git.kernel.org/kvmarm/kvmarm/c/135f15386207
[4/6] KVM: arm64: vgic-v3: Erase LPIs from xarray outside of raw spinlocks
https://git.kernel.org/kvmarm/kvmarm/c/0388323fb7cb
[5/6] KVM: arm64: vgic-v3: Don't require IRQs be disabled for LPI xarray lock
https://git.kernel.org/kvmarm/kvmarm/c/032ea3e45463
[6/6] KVM: arm64: vgic-v3: Indicate vgic_put_irq() may take LPI xarray lock
https://git.kernel.org/kvmarm/kvmarm/c/78921bec9d58
--
Best,
Oliver
^ permalink raw reply [flat|nested] 13+ messages in thread