* [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c
@ 2025-05-19 23:27 Sean Christopherson
2025-05-19 23:27 ` [PATCH 01/15] KVM: x86: Trigger I/O APIC route rescan in kvm_arch_irq_routing_update() Sean Christopherson
` (16 more replies)
0 siblings, 17 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
This series is prep work for the big device posted IRQs overhaul[1], in which
Paolo suggested getting rid of arch/x86/kvm/irq_comm.c[2]. As I started
chipping away bits of irq_comm.c to make the final code movement to irq.c as
small as possible, I realized that (a) a rather large amount of irq_comm.c was
actually I/O APIC code and (b) this would be a perfect opportunity to further
isolate the I/O APIC code.
So, a bit of hacking later and voila, CONFIG_KVM_IOAPIC. Similar to KVM's SMM
and Xen Kconfigs, this is something we would enable in production straightaway,
if we could magically fast-forward our kernel, as fully disabling I/O APIC
emulation puts a decent chunk of guest-visible surface entirely out of reach.
Side topic, Paolo's recollection that irq_comm.c was to hold common APIs between
x86 and Itanium was spot on. Though when I read Paolo's mail, I parsed "ia64"
as x86-64. I got quite a good laugh when I eventually realized that he really
did mean ia64 :-)
[1] https://lore.kernel.org/all/20250404193923.1413163-1-seanjc@google.com
[2] https://lore.kernel.org/all/cf4d9b81-c1ab-40a6-8c8c-36ad36b9be63@redhat.com
Sean Christopherson (15):
KVM: x86: Trigger I/O APIC route rescan in
kvm_arch_irq_routing_update()
KVM: x86: Drop superfluous kvm_set_pic_irq() => kvm_pic_set_irq()
wrapper
KVM: x86: Drop superfluous kvm_set_ioapic_irq() =>
kvm_ioapic_set_irq() wrapper
KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq()
wrapper
KVM: x86: Fold kvm_setup_default_irq_routing() into kvm_ioapic_init()
KVM: x86: Move kvm_{request,free}_irq_source_id() to i8254.c (PIT)
KVM: x86: Hardcode the PIT IRQ source ID to '2'
KVM: x86: Don't clear PIT's IRQ line status when destroying PIT
KVM: x86: Explicitly check for in-kernel PIC when getting ExtINT
KVM: Move x86-only tracepoints to x86's trace.h
KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
KVM: Squash two CONFIG_HAVE_KVM_IRQCHIP #ifdefs into one
KVM: selftests: Fall back to split IRQ chip if full in-kernel chip is
unsupported
KVM: x86: Move IRQ mask notifier infrastructure to I/O APIC emulation
KVM: x86: Fold irq_comm.c into irq.c
arch/x86/include/asm/kvm_host.h | 22 +-
arch/x86/kvm/Kconfig | 10 +
arch/x86/kvm/Makefile | 7 +-
arch/x86/kvm/hyperv.c | 10 +-
arch/x86/kvm/hyperv.h | 3 +-
arch/x86/kvm/i8254.c | 11 +-
arch/x86/kvm/i8254.h | 3 +-
arch/x86/kvm/i8259.c | 17 +-
arch/x86/kvm/ioapic.c | 87 +++-
arch/x86/kvm/ioapic.h | 22 +-
arch/x86/kvm/irq.c | 336 ++++++++++++++-
arch/x86/kvm/irq.h | 3 +-
arch/x86/kvm/irq_comm.c | 469 ---------------------
arch/x86/kvm/lapic.c | 7 +-
arch/x86/kvm/trace.h | 80 ++++
arch/x86/kvm/x86.c | 37 +-
include/linux/kvm_host.h | 9 +-
include/trace/events/kvm.h | 84 +---
tools/testing/selftests/kvm/lib/kvm_util.c | 13 +-
virt/kvm/irqchip.c | 2 -
20 files changed, 577 insertions(+), 655 deletions(-)
delete mode 100644 arch/x86/kvm/irq_comm.c
base-commit: 3f7b307757ecffc1c18ede9ee3cf9ce8101f3cc9
--
2.49.0.1101.gccaa498523-goog
* [PATCH 01/15] KVM: x86: Trigger I/O APIC route rescan in kvm_arch_irq_routing_update()
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
@ 2025-05-19 23:27 ` Sean Christopherson
2025-05-19 23:27 ` [PATCH 02/15] KVM: x86: Drop superfluous kvm_set_pic_irq() => kvm_pic_set_irq() wrapper Sean Christopherson
` (15 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Trigger the I/O APIC route rescan that's performed for a split IRQ chip
after userspace updates IRQ routes in kvm_arch_irq_routing_update(), i.e.
before dropping kvm->irq_lock. Calling kvm_make_all_cpus_request() under
a mutex is perfectly safe, and the smp_wmb()+smp_mb__after_atomic() pair
in __kvm_make_request()+kvm_check_request() ensures the new routing is
visible to vCPUs prior to the request being visible to vCPUs.
In all likelihood, commit b053b2aef25d ("KVM: x86: Add EOI exit bitmap
inference") somewhat arbitrarily made the request outside of irq_lock to
avoid holding irq_lock any longer than is strictly necessary. And then
commit abdb080f7ac8 ("kvm/irqchip: kvm_arch_irq_routing_update renaming
split") took the easy route of adding another arch hook instead of risking
a functional change.
Note, the call to synchronize_srcu_expedited() does NOT provide ordering
guarantees with respect to vCPUs scanning the new routing; as above, the
request infrastructure provides the necessary ordering. I.e. there's no
need to wait for kvm_scan_ioapic_routes() to complete if it's actively
running, because regardless of whether it grabs the old or new table, the
vCPU will have another KVM_REQ_SCAN_IOAPIC pending, i.e. will rescan again
and see the new mappings.
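
As a rough illustration of the ordering argument above, here is a minimal
userspace sketch using C11 atomics. It is not KVM code: struct fake_vcpu,
struct routing_table, REQ_SCAN_IOAPIC, update_routing() and check_and_scan()
are all hypothetical stand-ins, and release/acquire ordering is assumed here
as an analogue of the smp_wmb()+smp_mb__after_atomic() pairing described in
the changelog.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only, not KVM's real types. */
struct routing_table {
	int version;
};

struct fake_vcpu {
	_Atomic(unsigned long) requests;
};

#define REQ_SCAN_IOAPIC 0x1UL

static _Atomic(struct routing_table *) current_table;

/* Writer side: publish the new table first, then raise the request. */
static void update_routing(struct fake_vcpu *vcpu,
			   struct routing_table *new_table)
{
	assert(new_table != NULL);

	atomic_store_explicit(&current_table, new_table, memory_order_relaxed);

	/* Release: the table store is visible before the request bit is. */
	atomic_fetch_or_explicit(&vcpu->requests, REQ_SCAN_IOAPIC,
				 memory_order_release);
}

/*
 * Reader side: consume the request, then read the table.  Returns the
 * version of the table that was scanned, or -1 if no request was pending.
 */
static int check_and_scan(struct fake_vcpu *vcpu)
{
	struct routing_table *table;
	unsigned long old;

	/* Acquire: pairs with the release in update_routing(). */
	old = atomic_fetch_and_explicit(&vcpu->requests, ~REQ_SCAN_IOAPIC,
					memory_order_acquire);
	if (!(old & REQ_SCAN_IOAPIC))
		return -1;

	table = atomic_load_explicit(&current_table, memory_order_relaxed);
	return table ? table->version : -1;
}
```

In this sketch, a reader that observes the request bit is guaranteed to also
observe the table that was stored before it. A reader that was already
mid-scan with the old table simply finds the bit set again on its next check
and rescans, which is the property the changelog relies on.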
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/irq_comm.c | 10 +++-------
include/linux/kvm_host.h | 4 ----
virt/kvm/irqchip.c | 2 --
3 files changed, 3 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index d6d792b5d1bd..e2ae62ff9cc2 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -395,13 +395,6 @@ int kvm_setup_default_irq_routing(struct kvm *kvm)
ARRAY_SIZE(default_routing), 0);
}
-void kvm_arch_post_irq_routing_update(struct kvm *kvm)
-{
- if (!irqchip_split(kvm))
- return;
- kvm_make_scan_ioapic_request(kvm);
-}
-
void kvm_scan_ioapic_irq(struct kvm_vcpu *vcpu, u32 dest_id, u16 dest_mode,
u8 vector, unsigned long *ioapic_handled_vectors)
{
@@ -466,4 +459,7 @@ void kvm_arch_irq_routing_update(struct kvm *kvm)
#ifdef CONFIG_KVM_HYPERV
kvm_hv_irq_routing_update(kvm);
#endif
+
+ if (irqchip_split(kvm))
+ kvm_make_scan_ioapic_request(kvm);
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c685fb417e92..963e250664d6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1020,14 +1020,10 @@ void vcpu_put(struct kvm_vcpu *vcpu);
#ifdef __KVM_HAVE_IOAPIC
void kvm_arch_post_irq_ack_notifier_list_update(struct kvm *kvm);
-void kvm_arch_post_irq_routing_update(struct kvm *kvm);
#else
static inline void kvm_arch_post_irq_ack_notifier_list_update(struct kvm *kvm)
{
}
-static inline void kvm_arch_post_irq_routing_update(struct kvm *kvm)
-{
-}
#endif
#ifdef CONFIG_HAVE_KVM_IRQCHIP
diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
index 162d8ed889f2..6ccabfd32287 100644
--- a/virt/kvm/irqchip.c
+++ b/virt/kvm/irqchip.c
@@ -222,8 +222,6 @@ int kvm_set_irq_routing(struct kvm *kvm,
kvm_arch_irq_routing_update(kvm);
mutex_unlock(&kvm->irq_lock);
- kvm_arch_post_irq_routing_update(kvm);
-
synchronize_srcu_expedited(&kvm->irq_srcu);
new = old;
--
2.49.0.1101.gccaa498523-goog
* [PATCH 02/15] KVM: x86: Drop superfluous kvm_set_pic_irq() => kvm_pic_set_irq() wrapper
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
2025-05-19 23:27 ` [PATCH 01/15] KVM: x86: Trigger I/O APIC route rescan in kvm_arch_irq_routing_update() Sean Christopherson
@ 2025-05-19 23:27 ` Sean Christopherson
2025-05-19 23:27 ` [PATCH 03/15] KVM: x86: Drop superfluous kvm_set_ioapic_irq() => kvm_ioapic_set_irq() wrapper Sean Christopherson
` (14 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Drop the superfluous and confusing kvm_set_pic_irq() => kvm_pic_set_irq()
wrapper, and instead wire up ->set() directly to its final destination.
Opportunistically move the declaration of kvm_pic_set_irq() to irq.h to
start gathering more of the in-kernel APIC/IO-APIC logic in irq.{c,h}.
No functional change intended.
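
To make the shape of the change concrete, here is a small standalone sketch
of the pattern: the device handler takes the routing-entry callback signature
itself and pulls the pin out of the entry, so the callback pointer can target
it without a wrapper in between. Everything here (struct fake_vm,
struct fake_route, fake_pic_set_irq(), fake_setup_route(), NUM_PINS) is a
hypothetical stand-in, not the KVM definitions.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PINS 16

/* Hypothetical stand-ins for illustration only. */
struct fake_vm {
	int pin_level[NUM_PINS];
};

struct fake_route;

typedef int (*set_fn)(struct fake_route *e, struct fake_vm *vm,
		      int irq_source_id, int level, bool line_status);

struct fake_route {
	int pin;
	set_fn set;
};

/*
 * The handler has the callback signature and extracts the pin from the
 * routing entry, so no translation wrapper is required.
 */
static int fake_pic_set_irq(struct fake_route *e, struct fake_vm *vm,
			    int irq_source_id, int level, bool line_status)
{
	int irq = e->pin;

	(void)irq_source_id;
	(void)line_status;

	assert(irq >= 0 && irq < NUM_PINS);

	vm->pin_level[irq] = !!level;
	return vm->pin_level[irq];
}

/* Wire the callback straight to its final destination. */
static void fake_setup_route(struct fake_route *e, int pin)
{
	e->pin = pin;
	e->set = fake_pic_set_irq;
}
```

A caller would then invoke e->set(e, vm, source_id, level, line_status) and
land directly in the device code.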
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/i8259.c | 5 ++++-
arch/x86/kvm/irq.h | 2 ++
arch/x86/kvm/irq_comm.c | 10 +---------
4 files changed, 7 insertions(+), 11 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 330cdcbed1a6..f25ec3ec5ce4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2208,7 +2208,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
return !!(*irq_state);
}
-int kvm_pic_set_irq(struct kvm_pic *pic, int irq, int irq_source_id, int level);
void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index a8fb19940975..0150aec4f523 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -185,8 +185,11 @@ void kvm_pic_update_irq(struct kvm_pic *s)
pic_unlock(s);
}
-int kvm_pic_set_irq(struct kvm_pic *s, int irq, int irq_source_id, int level)
+int kvm_pic_set_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
+ int irq_source_id, int level, bool line_status)
{
+ struct kvm_pic *s = kvm->arch.vpic;
+ int irq = e->irqchip.pin;
int ret, irq_level;
BUG_ON(irq < 0 || irq >= PIC_NUM_PINS);
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 76d46b2f41dd..33dd5666b656 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -63,6 +63,8 @@ int kvm_pic_init(struct kvm *kvm);
void kvm_pic_destroy(struct kvm *kvm);
int kvm_pic_read_irq(struct kvm *kvm);
void kvm_pic_update_irq(struct kvm_pic *s);
+int kvm_pic_set_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
+ int irq_source_id, int level, bool line_status);
static inline int irqchip_split(struct kvm *kvm)
{
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index e2ae62ff9cc2..64f352e7bcb0 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -27,14 +27,6 @@
#include "x86.h"
#include "xen.h"
-static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e,
- struct kvm *kvm, int irq_source_id, int level,
- bool line_status)
-{
- struct kvm_pic *pic = kvm->arch.vpic;
- return kvm_pic_set_irq(pic, e->irqchip.pin, irq_source_id, level);
-}
-
static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level,
bool line_status)
@@ -296,7 +288,7 @@ int kvm_set_routing_entry(struct kvm *kvm,
case KVM_IRQCHIP_PIC_MASTER:
if (ue->u.irqchip.pin >= PIC_NUM_PINS / 2)
return -EINVAL;
- e->set = kvm_set_pic_irq;
+ e->set = kvm_pic_set_irq;
break;
case KVM_IRQCHIP_IOAPIC:
if (ue->u.irqchip.pin >= KVM_IOAPIC_NUM_PINS)
--
2.49.0.1101.gccaa498523-goog
* [PATCH 03/15] KVM: x86: Drop superfluous kvm_set_ioapic_irq() => kvm_ioapic_set_irq() wrapper
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
2025-05-19 23:27 ` [PATCH 01/15] KVM: x86: Trigger I/O APIC route rescan in kvm_arch_irq_routing_update() Sean Christopherson
2025-05-19 23:27 ` [PATCH 02/15] KVM: x86: Drop superfluous kvm_set_pic_irq() => kvm_pic_set_irq() wrapper Sean Christopherson
@ 2025-05-19 23:27 ` Sean Christopherson
2025-05-19 23:27 ` [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper Sean Christopherson
` (13 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Drop the superfluous and confusing kvm_set_ioapic_irq() and instead wire
up ->set() directly to its final destination.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/ioapic.c | 6 ++++--
arch/x86/kvm/ioapic.h | 5 +++--
arch/x86/kvm/irq_comm.c | 11 +----------
3 files changed, 8 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 45dae2d5d2f1..8c8a8062eb19 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -479,9 +479,11 @@ static int ioapic_service(struct kvm_ioapic *ioapic, int irq, bool line_status)
return ret;
}
-int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
- int level, bool line_status)
+int kvm_ioapic_set_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
+ int irq_source_id, int level, bool line_status)
{
+ struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+ int irq = e->irqchip.pin;
int ret, irq_level;
BUG_ON(irq < 0 || irq >= IOAPIC_NUM_PINS);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index aa8cb4ac0479..a86f59bbea44 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -111,8 +111,9 @@ void kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, int vector,
int trigger_mode);
int kvm_ioapic_init(struct kvm *kvm);
void kvm_ioapic_destroy(struct kvm *kvm);
-int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
- int level, bool line_status);
+int kvm_ioapic_set_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
+ int irq_source_id, int level, bool line_status);
+
void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id);
void kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 64f352e7bcb0..8dcb6a555902 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -27,15 +27,6 @@
#include "x86.h"
#include "xen.h"
-static int kvm_set_ioapic_irq(struct kvm_kernel_irq_routing_entry *e,
- struct kvm *kvm, int irq_source_id, int level,
- bool line_status)
-{
- struct kvm_ioapic *ioapic = kvm->arch.vioapic;
- return kvm_ioapic_set_irq(ioapic, e->irqchip.pin, irq_source_id, level,
- line_status);
-}
-
int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq, struct dest_map *dest_map)
{
@@ -293,7 +284,7 @@ int kvm_set_routing_entry(struct kvm *kvm,
case KVM_IRQCHIP_IOAPIC:
if (ue->u.irqchip.pin >= KVM_IOAPIC_NUM_PINS)
return -EINVAL;
- e->set = kvm_set_ioapic_irq;
+ e->set = kvm_ioapic_set_irq;
break;
default:
return -EINVAL;
--
2.49.0.1101.gccaa498523-goog
* [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (2 preceding siblings ...)
2025-05-19 23:27 ` [PATCH 03/15] KVM: x86: Drop superfluous kvm_set_ioapic_irq() => kvm_ioapic_set_irq() wrapper Sean Christopherson
@ 2025-05-19 23:27 ` Sean Christopherson
2025-05-20 9:57 ` Vitaly Kuznetsov
2025-05-29 11:37 ` Huang, Kai
2025-05-19 23:27 ` [PATCH 05/15] KVM: x86: Fold kvm_setup_default_irq_routing() into kvm_ioapic_init() Sean Christopherson
` (12 subsequent siblings)
16 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Drop the superfluous kvm_hv_set_sint() and instead wire up ->set() directly
to its final destination.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/hyperv.c | 10 +++++++---
arch/x86/kvm/hyperv.h | 3 ++-
arch/x86/kvm/irq_comm.c | 12 ------------
3 files changed, 9 insertions(+), 16 deletions(-)
diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
index 24f0318c50d7..7f565636edde 100644
--- a/arch/x86/kvm/hyperv.c
+++ b/arch/x86/kvm/hyperv.c
@@ -497,15 +497,19 @@ static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
return ret;
}
-int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vpidx, u32 sint)
+int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
+ int irq_source_id, int level, bool line_status)
{
struct kvm_vcpu_hv_synic *synic;
- synic = synic_get(kvm, vpidx);
+ if (!level)
+ return -1;
+
+ synic = synic_get(kvm, e->hv_sint.vcpu);
if (!synic)
return -EINVAL;
- return synic_set_irq(synic, sint);
+ return synic_set_irq(synic, e->hv_sint.sint);
}
void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector)
diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
index 913bfc96959c..4ad5a0749739 100644
--- a/arch/x86/kvm/hyperv.h
+++ b/arch/x86/kvm/hyperv.h
@@ -103,7 +103,8 @@ static inline bool kvm_hv_hypercall_enabled(struct kvm_vcpu *vcpu)
int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
void kvm_hv_irq_routing_update(struct kvm *kvm);
-int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vcpu_id, u32 sint);
+int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
+ int irq_source_id, int level, bool line_status);
void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector);
int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 8dcb6a555902..b85e4be2ddff 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -127,18 +127,6 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
}
-#ifdef CONFIG_KVM_HYPERV
-static int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e,
- struct kvm *kvm, int irq_source_id, int level,
- bool line_status)
-{
- if (!level)
- return -1;
-
- return kvm_hv_synic_set_irq(kvm, e->hv_sint.vcpu, e->hv_sint.sint);
-}
-#endif
-
int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id, int level,
bool line_status)
--
2.49.0.1101.gccaa498523-goog
* [PATCH 05/15] KVM: x86: Fold kvm_setup_default_irq_routing() into kvm_ioapic_init()
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (3 preceding siblings ...)
2025-05-19 23:27 ` [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper Sean Christopherson
@ 2025-05-19 23:27 ` Sean Christopherson
2025-06-04 16:43 ` Paolo Bonzini
2025-05-19 23:27 ` [PATCH 06/15] KVM: x86: Move kvm_{request,free}_irq_source_id() to i8254.c (PIT) Sean Christopherson
` (11 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Move the default IRQ routing table used for in-kernel I/O APIC routing to
ioapic.c where it belongs, and fold the call to kvm_set_irq_routing() into
kvm_ioapic_init() (the call via kvm_setup_default_irq_routing() is done
immediately after kvm_ioapic_init()).
In addition to making it more obvious that the so called "default" routing
only applies to an in-kernel I/O APIC, getting it out of irq_comm.c will
allow removing irq_comm.c entirely, and will also allow for guarding KVM's
in-kernel I/O APIC emulation with a Kconfig with minimal #ifdefs.
No functional change intended.
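
For readers who want to see what the default table expands to, here is a
standalone sketch that mirrors the macros from the diff with simplified
types. struct route, the CHIP_* constants, pic_route_for_gsi() and
routes_for_gsi() are hypothetical, and the SELECT_PIC() definition (master
for IRQs 0-7, slave otherwise) is an assumption, since the macro is used but
not defined in this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified types, not the KVM UAPI structures. */
enum { CHIP_PIC_MASTER, CHIP_PIC_SLAVE, CHIP_IOAPIC };

struct route {
	unsigned int gsi;
	unsigned int chip;
	unsigned int pin;
};

/* Assumed definition: IRQs 0-7 on the master PIC, 8-15 on the slave. */
#define SELECT_PIC(irq) ((irq) < 8 ? CHIP_PIC_MASTER : CHIP_PIC_SLAVE)

#define IOAPIC_ROUTING_ENTRY(irq) \
	{ .gsi = (irq), .chip = CHIP_IOAPIC, .pin = (irq) }
#define ROUTING_ENTRY1(irq) IOAPIC_ROUTING_ENTRY(irq)

#define PIC_ROUTING_ENTRY(irq) \
	{ .gsi = (irq), .chip = SELECT_PIC(irq), .pin = (irq) % 8 }
#define ROUTING_ENTRY2(irq) \
	IOAPIC_ROUTING_ENTRY(irq), PIC_ROUTING_ENTRY(irq)

static const struct route default_routing[] = {
	ROUTING_ENTRY2(0), ROUTING_ENTRY2(1),
	ROUTING_ENTRY2(2), ROUTING_ENTRY2(3),
	ROUTING_ENTRY2(4), ROUTING_ENTRY2(5),
	ROUTING_ENTRY2(6), ROUTING_ENTRY2(7),
	ROUTING_ENTRY2(8), ROUTING_ENTRY2(9),
	ROUTING_ENTRY2(10), ROUTING_ENTRY2(11),
	ROUTING_ENTRY2(12), ROUTING_ENTRY2(13),
	ROUTING_ENTRY2(14), ROUTING_ENTRY2(15),
	ROUTING_ENTRY1(16), ROUTING_ENTRY1(17),
	ROUTING_ENTRY1(18), ROUTING_ENTRY1(19),
	ROUTING_ENTRY1(20), ROUTING_ENTRY1(21),
	ROUTING_ENTRY1(22), ROUTING_ENTRY1(23),
};

#define NR_DEFAULT_ROUTES \
	(sizeof(default_routing) / sizeof(default_routing[0]))

/* Count how many routes target a given GSI. */
static size_t routes_for_gsi(unsigned int gsi)
{
	size_t i, n = 0;

	for (i = 0; i < NR_DEFAULT_ROUTES; i++)
		n += (default_routing[i].gsi == gsi);
	return n;
}

/* Return the PIC route for a GSI, or NULL if the GSI is I/O APIC only. */
static const struct route *pic_route_for_gsi(unsigned int gsi)
{
	size_t i;

	for (i = 0; i < NR_DEFAULT_ROUTES; i++) {
		const struct route *r = &default_routing[i];

		if (r->gsi == gsi && r->chip != CHIP_IOAPIC)
			return r;
	}
	return NULL;
}
```

GSIs 0-15 each get two routes (I/O APIC plus PIC) and GSIs 16-23 get one
(I/O APIC only), which is why the table only makes sense when both in-kernel
chips exist.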
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/ioapic.c | 32 ++++++++++++++++++++++++++++++++
arch/x86/kvm/irq.h | 1 -
arch/x86/kvm/irq_comm.c | 32 --------------------------------
arch/x86/kvm/x86.c | 6 ------
4 files changed, 32 insertions(+), 39 deletions(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 8c8a8062eb19..dc45ea9f5b9c 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -710,6 +710,32 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
.write = ioapic_mmio_write,
};
+#define IOAPIC_ROUTING_ENTRY(irq) \
+ { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
+ .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
+#define ROUTING_ENTRY1(irq) IOAPIC_ROUTING_ENTRY(irq)
+
+#define PIC_ROUTING_ENTRY(irq) \
+ { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
+ .u.irqchip = { .irqchip = SELECT_PIC(irq), .pin = (irq) % 8 } }
+#define ROUTING_ENTRY2(irq) \
+ IOAPIC_ROUTING_ENTRY(irq), PIC_ROUTING_ENTRY(irq)
+
+static const struct kvm_irq_routing_entry default_routing[] = {
+ ROUTING_ENTRY2(0), ROUTING_ENTRY2(1),
+ ROUTING_ENTRY2(2), ROUTING_ENTRY2(3),
+ ROUTING_ENTRY2(4), ROUTING_ENTRY2(5),
+ ROUTING_ENTRY2(6), ROUTING_ENTRY2(7),
+ ROUTING_ENTRY2(8), ROUTING_ENTRY2(9),
+ ROUTING_ENTRY2(10), ROUTING_ENTRY2(11),
+ ROUTING_ENTRY2(12), ROUTING_ENTRY2(13),
+ ROUTING_ENTRY2(14), ROUTING_ENTRY2(15),
+ ROUTING_ENTRY1(16), ROUTING_ENTRY1(17),
+ ROUTING_ENTRY1(18), ROUTING_ENTRY1(19),
+ ROUTING_ENTRY1(20), ROUTING_ENTRY1(21),
+ ROUTING_ENTRY1(22), ROUTING_ENTRY1(23),
+};
+
int kvm_ioapic_init(struct kvm *kvm)
{
struct kvm_ioapic *ioapic;
@@ -731,8 +757,14 @@ int kvm_ioapic_init(struct kvm *kvm)
if (ret < 0) {
kvm->arch.vioapic = NULL;
kfree(ioapic);
+ return ret;
}
+ ret = kvm_set_irq_routing(kvm, default_routing,
+ ARRAY_SIZE(default_routing), 0);
+ if (ret)
+ kvm_ioapic_destroy(kvm);
+
return ret;
}
diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
index 33dd5666b656..f6134289523e 100644
--- a/arch/x86/kvm/irq.h
+++ b/arch/x86/kvm/irq.h
@@ -107,7 +107,6 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
int apic_has_pending_timer(struct kvm_vcpu *vcpu);
-int kvm_setup_default_irq_routing(struct kvm *kvm);
int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq,
struct dest_map *dest_map);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index b85e4be2ddff..998c4a34d87c 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -334,38 +334,6 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
}
EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
-#define IOAPIC_ROUTING_ENTRY(irq) \
- { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
- .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
-#define ROUTING_ENTRY1(irq) IOAPIC_ROUTING_ENTRY(irq)
-
-#define PIC_ROUTING_ENTRY(irq) \
- { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
- .u.irqchip = { .irqchip = SELECT_PIC(irq), .pin = (irq) % 8 } }
-#define ROUTING_ENTRY2(irq) \
- IOAPIC_ROUTING_ENTRY(irq), PIC_ROUTING_ENTRY(irq)
-
-static const struct kvm_irq_routing_entry default_routing[] = {
- ROUTING_ENTRY2(0), ROUTING_ENTRY2(1),
- ROUTING_ENTRY2(2), ROUTING_ENTRY2(3),
- ROUTING_ENTRY2(4), ROUTING_ENTRY2(5),
- ROUTING_ENTRY2(6), ROUTING_ENTRY2(7),
- ROUTING_ENTRY2(8), ROUTING_ENTRY2(9),
- ROUTING_ENTRY2(10), ROUTING_ENTRY2(11),
- ROUTING_ENTRY2(12), ROUTING_ENTRY2(13),
- ROUTING_ENTRY2(14), ROUTING_ENTRY2(15),
- ROUTING_ENTRY1(16), ROUTING_ENTRY1(17),
- ROUTING_ENTRY1(18), ROUTING_ENTRY1(19),
- ROUTING_ENTRY1(20), ROUTING_ENTRY1(21),
- ROUTING_ENTRY1(22), ROUTING_ENTRY1(23),
-};
-
-int kvm_setup_default_irq_routing(struct kvm *kvm)
-{
- return kvm_set_irq_routing(kvm, default_routing,
- ARRAY_SIZE(default_routing), 0);
-}
-
void kvm_scan_ioapic_irq(struct kvm_vcpu *vcpu, u32 dest_id, u16 dest_mode,
u8 vector, unsigned long *ioapic_handled_vectors)
{
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f9f798f286ce..4a9c252c9dab 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7118,12 +7118,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
goto create_irqchip_unlock;
}
- r = kvm_setup_default_irq_routing(kvm);
- if (r) {
- kvm_ioapic_destroy(kvm);
- kvm_pic_destroy(kvm);
- goto create_irqchip_unlock;
- }
/* Write kvm->irq_routing before enabling irqchip_in_kernel. */
smp_wmb();
kvm->arch.irqchip_mode = KVM_IRQCHIP_KERNEL;
--
2.49.0.1101.gccaa498523-goog
* [PATCH 06/15] KVM: x86: Move kvm_{request,free}_irq_source_id() to i8254.c (PIT)
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (4 preceding siblings ...)
2025-05-19 23:27 ` [PATCH 05/15] KVM: x86: Fold kvm_setup_default_irq_routing() into kvm_ioapic_init() Sean Christopherson
@ 2025-05-19 23:27 ` Sean Christopherson
2025-05-19 23:28 ` [PATCH 07/15] KVM: x86: Hardcode the PIT IRQ source ID to '2' Sean Christopherson
` (10 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:27 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Move kvm_{request,free}_irq_source_id() to i8254.c, i.e. the dedicated PIT
emulation file, in anticipation of removing them entirely in favor of
hardcoding the PIT's "requested" source ID (the source ID can only ever be
'2', and the request can never fail).
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/i8254.c | 44 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/irq_comm.c | 44 ----------------------------------------
include/linux/kvm_host.h | 2 --
3 files changed, 44 insertions(+), 46 deletions(-)
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 739aa6c0d0c3..2a0964c859af 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -641,6 +641,50 @@ static void kvm_pit_reset(struct kvm_pit *pit)
kvm_pit_reset_reinject(pit);
}
+static int kvm_request_irq_source_id(struct kvm *kvm)
+{
+ unsigned long *bitmap = &kvm->arch.irq_sources_bitmap;
+ int irq_source_id;
+
+ mutex_lock(&kvm->irq_lock);
+ irq_source_id = find_first_zero_bit(bitmap, BITS_PER_LONG);
+
+ if (irq_source_id >= BITS_PER_LONG) {
+ pr_warn("exhausted allocatable IRQ sources!\n");
+ irq_source_id = -EFAULT;
+ goto unlock;
+ }
+
+ ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
+ ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID);
+ set_bit(irq_source_id, bitmap);
+unlock:
+ mutex_unlock(&kvm->irq_lock);
+
+ return irq_source_id;
+}
+
+static void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
+{
+ ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
+ ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID);
+
+ mutex_lock(&kvm->irq_lock);
+ if (irq_source_id < 0 ||
+ irq_source_id >= BITS_PER_LONG) {
+ pr_err("IRQ source ID out of range!\n");
+ goto unlock;
+ }
+ clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
+ if (!irqchip_kernel(kvm))
+ goto unlock;
+
+ kvm_ioapic_clear_all(kvm->arch.vioapic, irq_source_id);
+ kvm_pic_clear_all(kvm->arch.vpic, irq_source_id);
+unlock:
+ mutex_unlock(&kvm->irq_lock);
+}
+
static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
{
struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 998c4a34d87c..8c827da3e3d6 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -165,50 +165,6 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
return -EWOULDBLOCK;
}
-int kvm_request_irq_source_id(struct kvm *kvm)
-{
- unsigned long *bitmap = &kvm->arch.irq_sources_bitmap;
- int irq_source_id;
-
- mutex_lock(&kvm->irq_lock);
- irq_source_id = find_first_zero_bit(bitmap, BITS_PER_LONG);
-
- if (irq_source_id >= BITS_PER_LONG) {
- pr_warn("exhausted allocatable IRQ sources!\n");
- irq_source_id = -EFAULT;
- goto unlock;
- }
-
- ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
- ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID);
- set_bit(irq_source_id, bitmap);
-unlock:
- mutex_unlock(&kvm->irq_lock);
-
- return irq_source_id;
-}
-
-void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
-{
- ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
- ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID);
-
- mutex_lock(&kvm->irq_lock);
- if (irq_source_id < 0 ||
- irq_source_id >= BITS_PER_LONG) {
- pr_err("IRQ source ID out of range!\n");
- goto unlock;
- }
- clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
- if (!irqchip_kernel(kvm))
- goto unlock;
-
- kvm_ioapic_clear_all(kvm->arch.vioapic, irq_source_id);
- kvm_pic_clear_all(kvm->arch.vpic, irq_source_id);
-unlock:
- mutex_unlock(&kvm->irq_lock);
-}
-
void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
struct kvm_irq_mask_notifier *kimn)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 963e250664d6..0d4506598d62 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1780,8 +1780,6 @@ void kvm_register_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian);
void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
struct kvm_irq_ack_notifier *kian);
-int kvm_request_irq_source_id(struct kvm *kvm);
-void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
/*
--
2.49.0.1101.gccaa498523-goog
* [PATCH 07/15] KVM: x86: Hardcode the PIT IRQ source ID to '2'
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (5 preceding siblings ...)
2025-05-19 23:27 ` [PATCH 06/15] KVM: x86: Move kvm_{request,free}_irq_source_id() to i8254.c (PIT) Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-19 23:28 ` [PATCH 08/15] KVM: x86: Don't clear PIT's IRQ line status when destroying PIT Sean Christopherson
` (9 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Hardcode the PIT's IRQ source ID to '2' instead of "finding" that bit 2
is always the first available bit in irq_sources_bitmap. Bits 0 and 1 are
set/reserved by kvm_arch_init_vm(), i.e. long before kvm_create_pit() can
be invoked, and KVM allows at most one in-kernel PIT instance, i.e. it's
impossible for the PIT to find a different free bit (there are no other
users of kvm_request_irq_source_id()).
Delete the now-defunct irq_sources_bitmap and all its associated code.
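
The reasoning above can be checked with a tiny standalone sketch. The helper
names (first_zero_bit(), initial_sources_bitmap(), request_source_id()) and
the *_SKETCH constants are hypothetical; only the facts that bits 0 and 1 are
reserved up front and that the PIT is the sole requester come from the
changelog.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG_SKETCH	((unsigned int)(sizeof(unsigned long) * CHAR_BIT))

/* Mirrors the two source IDs reserved at VM creation (bits 0 and 1). */
#define USERSPACE_SOURCE_ID_SKETCH	0
#define IRQFD_RESAMPLE_SOURCE_ID_SKETCH	1
#define PIT_SOURCE_ID_SKETCH		2

/* Portable stand-in for a find-first-zero-bit search over one long. */
static unsigned int first_zero_bit(unsigned long bitmap)
{
	unsigned int bit;

	for (bit = 0; bit < BITS_PER_LONG_SKETCH; bit++) {
		if (!(bitmap & (1UL << bit)))
			return bit;
	}
	return BITS_PER_LONG_SKETCH;
}

/* State of the bitmap before any PIT can be created. */
static unsigned long initial_sources_bitmap(void)
{
	return (1UL << USERSPACE_SOURCE_ID_SKETCH) |
	       (1UL << IRQFD_RESAMPLE_SOURCE_ID_SKETCH);
}

/* Allocate the first free ID, as the removed helper did. */
static unsigned int request_source_id(unsigned long *bitmap)
{
	unsigned int id;

	assert(bitmap != NULL);

	id = first_zero_bit(*bitmap);
	if (id < BITS_PER_LONG_SKETCH)
		*bitmap |= 1UL << id;
	return id;
}
```

With a single requester, the first (and only) allocation from the initial
bitmap can only ever yield 2, so the search is equivalent to a constant.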
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/i8254.c | 55 +++++----------------------------
arch/x86/kvm/i8254.h | 1 -
arch/x86/kvm/x86.c | 6 ----
include/linux/kvm_host.h | 1 +
5 files changed, 8 insertions(+), 56 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f25ec3ec5ce4..c8654e461933 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1396,7 +1396,6 @@ struct kvm_arch {
bool pause_in_guest;
bool cstate_in_guest;
- unsigned long irq_sources_bitmap;
s64 kvmclock_offset;
/*
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index 2a0964c859af..d4fc20c265b2 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -248,8 +248,8 @@ static void pit_do_work(struct kthread_work *work)
if (atomic_read(&ps->reinject) && !atomic_xchg(&ps->irq_ack, 0))
return;
- kvm_set_irq(kvm, pit->irq_source_id, 0, 1, false);
- kvm_set_irq(kvm, pit->irq_source_id, 0, 0, false);
+ kvm_set_irq(kvm, KVM_PIT_IRQ_SOURCE_ID, 0, 1, false);
+ kvm_set_irq(kvm, KVM_PIT_IRQ_SOURCE_ID, 0, 0, false);
/*
* Provides NMI watchdog support via Virtual Wire mode.
@@ -641,47 +641,11 @@ static void kvm_pit_reset(struct kvm_pit *pit)
kvm_pit_reset_reinject(pit);
}
-static int kvm_request_irq_source_id(struct kvm *kvm)
+static void kvm_pit_clear_all(struct kvm *kvm)
{
- unsigned long *bitmap = &kvm->arch.irq_sources_bitmap;
- int irq_source_id;
-
mutex_lock(&kvm->irq_lock);
- irq_source_id = find_first_zero_bit(bitmap, BITS_PER_LONG);
-
- if (irq_source_id >= BITS_PER_LONG) {
- pr_warn("exhausted allocatable IRQ sources!\n");
- irq_source_id = -EFAULT;
- goto unlock;
- }
-
- ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
- ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID);
- set_bit(irq_source_id, bitmap);
-unlock:
- mutex_unlock(&kvm->irq_lock);
-
- return irq_source_id;
-}
-
-static void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id)
-{
- ASSERT(irq_source_id != KVM_USERSPACE_IRQ_SOURCE_ID);
- ASSERT(irq_source_id != KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID);
-
- mutex_lock(&kvm->irq_lock);
- if (irq_source_id < 0 ||
- irq_source_id >= BITS_PER_LONG) {
- pr_err("IRQ source ID out of range!\n");
- goto unlock;
- }
- clear_bit(irq_source_id, &kvm->arch.irq_sources_bitmap);
- if (!irqchip_kernel(kvm))
- goto unlock;
-
- kvm_ioapic_clear_all(kvm->arch.vioapic, irq_source_id);
- kvm_pic_clear_all(kvm->arch.vpic, irq_source_id);
-unlock:
+ kvm_ioapic_clear_all(kvm->arch.vioapic, KVM_PIT_IRQ_SOURCE_ID);
+ kvm_pic_clear_all(kvm->arch.vpic, KVM_PIT_IRQ_SOURCE_ID);
mutex_unlock(&kvm->irq_lock);
}
@@ -715,10 +679,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
if (!pit)
return NULL;
- pit->irq_source_id = kvm_request_irq_source_id(kvm);
- if (pit->irq_source_id < 0)
- goto fail_request;
-
mutex_init(&pit->pit_state.lock);
pid = get_pid(task_tgid(current));
@@ -770,8 +730,7 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
kvm_pit_set_reinject(pit, false);
kthread_destroy_worker(pit->worker);
fail_kthread:
- kvm_free_irq_source_id(kvm, pit->irq_source_id);
-fail_request:
+ kvm_pit_clear_all(kvm);
kfree(pit);
return NULL;
}
@@ -788,7 +747,7 @@ void kvm_free_pit(struct kvm *kvm)
kvm_pit_set_reinject(pit, false);
hrtimer_cancel(&pit->pit_state.timer);
kthread_destroy_worker(pit->worker);
- kvm_free_irq_source_id(kvm, pit->irq_source_id);
+ kvm_pit_clear_all(kvm);
kfree(pit);
}
}
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index a768212ba821..14fb310357f2 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -42,7 +42,6 @@ struct kvm_pit {
struct kvm_io_device speaker_dev;
struct kvm *kvm;
struct kvm_kpit_state pit_state;
- int irq_source_id;
struct kvm_irq_mask_notifier mask_notifier;
struct kthread_worker *worker;
struct kthread_work expired;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4a9c252c9dab..9e2c249d45ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12790,12 +12790,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
atomic_set(&kvm->arch.noncoherent_dma_count, 0);
- /* Reserve bit 0 of irq_sources_bitmap for userspace irq source */
- set_bit(KVM_USERSPACE_IRQ_SOURCE_ID, &kvm->arch.irq_sources_bitmap);
- /* Reserve bit 1 of irq_sources_bitmap for irqfd-resampler */
- set_bit(KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID,
- &kvm->arch.irq_sources_bitmap);
-
raw_spin_lock_init(&kvm->arch.tsc_write_lock);
mutex_init(&kvm->arch.apic_map_lock);
seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0d4506598d62..44b439c5fcf4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -190,6 +190,7 @@ bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req);
#define KVM_USERSPACE_IRQ_SOURCE_ID 0
#define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
+#define KVM_PIT_IRQ_SOURCE_ID 2
extern struct mutex kvm_lock;
extern struct list_head vm_list;
--
2.49.0.1101.gccaa498523-goog
* [PATCH 08/15] KVM: x86: Don't clear PIT's IRQ line status when destroying PIT
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (6 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 07/15] KVM: x86: Hardcode the PIT IRQ source ID to '2' Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-29 11:41 ` Huang, Kai
2025-05-19 23:28 ` [PATCH 09/15] KVM: x86: Explicitly check for in-kernel PIC when getting ExtINT Sean Christopherson
` (8 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Don't bother clearing the PIT's IRQ line status when destroying the PIT,
as userspace can't possibly rely on KVM to lower the IRQ line in any sane
use case, and it's not at all obvious that clearing the PIT's IRQ line is
correct/desirable in kvm_create_pit()'s error path.
When called from kvm_arch_pre_destroy_vm(), the entire VM is being torn
down and thus {kvm_pic,kvm_ioapic}.irq_states are unreachable.
As for the error path in kvm_create_pit(), the only way the PIT's bit in
irq_states can be set is if userspace raises the associated IRQ before
KVM_CREATE_PIT{2} completes. Forcefully clearing the bit would clobber
userspace's input, nonsensical though that input may be. Not to mention
that no known VMM will continue on if PIT creation fails.
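For context, each pin's irq_states word holds one bit per IRQ source, and
the line reads as asserted while any bit is set. The userspace model below is
a sketch only: the model_* names and the pin count are illustrative
assumptions, and model_clear_all() merely mimics the shape of the
kvm_pic_clear_all() and kvm_ioapic_clear_all() helpers removed by this patch,
minus the locking.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NUM_PINS			24	/* illustrative pin count */
#define KVM_USERSPACE_IRQ_SOURCE_ID	0
#define KVM_PIT_IRQ_SOURCE_ID		2

/* One word per pin, one bit per IRQ source. */
static unsigned long model_irq_states[MODEL_NUM_PINS];

/* Set or clear one source's bit; the line is high if any bit remains set. */
static bool model_irq_line_state(unsigned long *irq_state, int source,
				 bool level)
{
	assert(source >= 0 && source < (int)(sizeof(unsigned long) * 8));

	if (level)
		*irq_state |= 1UL << source;
	else
		*irq_state &= ~(1UL << source);

	return *irq_state != 0;
}

/* Drop one source's contribution on every pin, as the removed helpers did. */
static void model_clear_all(int source)
{
	int i;

	for (i = 0; i < MODEL_NUM_PINS; i++)
		model_irq_states[i] &= ~(1UL << source);
}
```

In this model a forced clear discards whatever level was recorded under the
given source ID on every pin, regardless of who recorded it, while leaving
the other sources' bits alone.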
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 --
arch/x86/kvm/i8254.c | 10 ----------
arch/x86/kvm/i8259.c | 10 ----------
arch/x86/kvm/ioapic.c | 10 ----------
arch/x86/kvm/ioapic.h | 1 -
5 files changed, 33 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c8654e461933..ebda93979179 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -2207,8 +2207,6 @@ static inline int __kvm_irq_line_state(unsigned long *irq_state,
return !!(*irq_state);
}
-void kvm_pic_clear_all(struct kvm_pic *pic, int irq_source_id);
-
void kvm_inject_nmi(struct kvm_vcpu *vcpu);
int kvm_get_nr_pending_nmis(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d4fc20c265b2..518e2e042605 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -641,14 +641,6 @@ static void kvm_pit_reset(struct kvm_pit *pit)
kvm_pit_reset_reinject(pit);
}
-static void kvm_pit_clear_all(struct kvm *kvm)
-{
- mutex_lock(&kvm->irq_lock);
- kvm_ioapic_clear_all(kvm->arch.vioapic, KVM_PIT_IRQ_SOURCE_ID);
- kvm_pic_clear_all(kvm->arch.vpic, KVM_PIT_IRQ_SOURCE_ID);
- mutex_unlock(&kvm->irq_lock);
-}
-
static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask)
{
struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier);
@@ -730,7 +722,6 @@ struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags)
kvm_pit_set_reinject(pit, false);
kthread_destroy_worker(pit->worker);
fail_kthread:
- kvm_pit_clear_all(kvm);
kfree(pit);
return NULL;
}
@@ -747,7 +738,6 @@ void kvm_free_pit(struct kvm *kvm)
kvm_pit_set_reinject(pit, false);
hrtimer_cancel(&pit->pit_state.timer);
kthread_destroy_worker(pit->worker);
- kvm_pit_clear_all(kvm);
kfree(pit);
}
}
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 0150aec4f523..4de055efc4ee 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -206,16 +206,6 @@ int kvm_pic_set_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
return ret;
}
-void kvm_pic_clear_all(struct kvm_pic *s, int irq_source_id)
-{
- int i;
-
- pic_lock(s);
- for (i = 0; i < PIC_NUM_PINS; i++)
- __clear_bit(irq_source_id, &s->irq_states[i]);
- pic_unlock(s);
-}
-
/*
* acknowledge interrupt 'irq'
*/
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index dc45ea9f5b9c..7d2d47a6c2b6 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -498,16 +498,6 @@ int kvm_ioapic_set_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
return ret;
}
-void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id)
-{
- int i;
-
- spin_lock(&ioapic->lock);
- for (i = 0; i < KVM_IOAPIC_NUM_PINS; i++)
- __clear_bit(irq_source_id, &ioapic->irq_states[i]);
- spin_unlock(&ioapic->lock);
-}
-
static void kvm_ioapic_eoi_inject_work(struct work_struct *work)
{
int i;
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index a86f59bbea44..fee17eb201ef 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -114,7 +114,6 @@ void kvm_ioapic_destroy(struct kvm *kvm);
int kvm_ioapic_set_irq(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
int irq_source_id, int level, bool line_status);
-void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id);
void kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
--
2.49.0.1101.gccaa498523-goog
* [PATCH 09/15] KVM: x86: Explicitly check for in-kernel PIC when getting ExtINT
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (7 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 08/15] KVM: x86: Don't clear PIT's IRQ line status when destroying PIT Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-19 23:28 ` [PATCH 10/15] KVM: Move x86-only tracepoints to x86's trace.h Sean Christopherson
` (7 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Explicitly check for an in-kernel PIC when checking for a pending ExtINT
in the PIC. Effectively swapping the split vs. full irqchip logic will
allow guarding the in-kernel I/O APIC (and PIC) emulation with a Kconfig,
and also makes it more obvious that kvm_pic_read_irq() won't result in a
NULL pointer dereference.
Opportunistically add WARNs in the fallthrough path, mostly to document
that the userspace ExtINT logic is only relevant to split IRQ chips.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/irq.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index 97d68d837929..b9b9df00ab77 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -41,6 +41,14 @@ static int pending_userspace_extint(struct kvm_vcpu *v)
return v->arch.pending_external_vector != -1;
}
+static int get_userspace_extint(struct kvm_vcpu *vcpu)
+{
+ int vector = vcpu->arch.pending_external_vector;
+
+ vcpu->arch.pending_external_vector = -1;
+ return vector;
+}
+
/*
* check if there is pending interrupt from
* non-APIC source without intack.
@@ -67,10 +75,11 @@ int kvm_cpu_has_extint(struct kvm_vcpu *v)
if (!kvm_apic_accept_pic_intr(v))
return 0;
- if (irqchip_split(v->kvm))
- return pending_userspace_extint(v);
- else
+ if (pic_in_kernel(v->kvm))
return v->kvm->arch.vpic->output;
+
+ WARN_ON_ONCE(!irqchip_split(v->kvm));
+ return pending_userspace_extint(v);
}
/*
@@ -126,13 +135,11 @@ int kvm_cpu_get_extint(struct kvm_vcpu *v)
return v->kvm->arch.xen.upcall_vector;
#endif
- if (irqchip_split(v->kvm)) {
- int vector = v->arch.pending_external_vector;
-
- v->arch.pending_external_vector = -1;
- return vector;
- } else
+ if (pic_in_kernel(v->kvm))
return kvm_pic_read_irq(v->kvm); /* PIC */
+
+ WARN_ON_ONCE(!irqchip_split(v->kvm));
+ return get_userspace_extint(v);
}
EXPORT_SYMBOL_GPL(kvm_cpu_get_extint);
--
2.49.0.1101.gccaa498523-goog
* [PATCH 10/15] KVM: Move x86-only tracepoints to x86's trace.h
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (8 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 09/15] KVM: x86: Explicitly check for in-kernel PIC when getting ExtINT Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-19 23:28 ` [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC Sean Christopherson
` (6 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Move the I/O APIC tracepoints and trace_kvm_msi_set_irq() to x86, as
__KVM_HAVE_IOAPIC is just code for "x86", and trace_kvm_msi_set_irq()
isn't unique to I/O APIC emulation.
Opportunistically clean up the absurdly messy #includes in ioapic.c.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/ioapic.c | 2 +-
arch/x86/kvm/irq_comm.c | 10 ++---
arch/x86/kvm/trace.h | 78 ++++++++++++++++++++++++++++++++++++++
include/trace/events/kvm.h | 77 -------------------------------------
4 files changed, 82 insertions(+), 85 deletions(-)
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 7d2d47a6c2b6..151ee9a64c3c 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -41,11 +41,11 @@
#include <asm/processor.h>
#include <asm/page.h>
#include <asm/current.h>
-#include <trace/events/kvm.h>
#include "ioapic.h"
#include "lapic.h"
#include "irq.h"
+#include "trace.h"
static int ioapic_service(struct kvm_ioapic *vioapic, int irq,
bool line_status);
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index 8c827da3e3d6..adef53dc4fef 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -15,15 +15,11 @@
#include <linux/export.h>
#include <linux/rculist.h>
-#include <trace/events/kvm.h>
-
-#include "irq.h"
-
+#include "hyperv.h"
#include "ioapic.h"
-
+#include "irq.h"
#include "lapic.h"
-
-#include "hyperv.h"
+#include "trace.h"
#include "x86.h"
#include "xen.h"
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index ba736cbb0587..4ef17990574d 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -260,6 +260,84 @@ TRACE_EVENT(kvm_cpuid,
__entry->used_max_basic ? ", used max basic" : "")
);
+#define kvm_deliver_mode \
+ {0x0, "Fixed"}, \
+ {0x1, "LowPrio"}, \
+ {0x2, "SMI"}, \
+ {0x3, "Res3"}, \
+ {0x4, "NMI"}, \
+ {0x5, "INIT"}, \
+ {0x6, "SIPI"}, \
+ {0x7, "ExtINT"}
+
+TRACE_EVENT(kvm_ioapic_set_irq,
+ TP_PROTO(__u64 e, int pin, bool coalesced),
+ TP_ARGS(e, pin, coalesced),
+
+ TP_STRUCT__entry(
+ __field( __u64, e )
+ __field( int, pin )
+ __field( bool, coalesced )
+ ),
+
+ TP_fast_assign(
+ __entry->e = e;
+ __entry->pin = pin;
+ __entry->coalesced = coalesced;
+ ),
+
+ TP_printk("pin %u dst %x vec %u (%s|%s|%s%s)%s",
+ __entry->pin, (u8)(__entry->e >> 56), (u8)__entry->e,
+ __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode),
+ (__entry->e & (1<<11)) ? "logical" : "physical",
+ (__entry->e & (1<<15)) ? "level" : "edge",
+ (__entry->e & (1<<16)) ? "|masked" : "",
+ __entry->coalesced ? " (coalesced)" : "")
+);
+
+TRACE_EVENT(kvm_ioapic_delayed_eoi_inj,
+ TP_PROTO(__u64 e),
+ TP_ARGS(e),
+
+ TP_STRUCT__entry(
+ __field( __u64, e )
+ ),
+
+ TP_fast_assign(
+ __entry->e = e;
+ ),
+
+ TP_printk("dst %x vec %u (%s|%s|%s%s)",
+ (u8)(__entry->e >> 56), (u8)__entry->e,
+ __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode),
+ (__entry->e & (1<<11)) ? "logical" : "physical",
+ (__entry->e & (1<<15)) ? "level" : "edge",
+ (__entry->e & (1<<16)) ? "|masked" : "")
+);
+
+TRACE_EVENT(kvm_msi_set_irq,
+ TP_PROTO(__u64 address, __u64 data),
+ TP_ARGS(address, data),
+
+ TP_STRUCT__entry(
+ __field( __u64, address )
+ __field( __u64, data )
+ ),
+
+ TP_fast_assign(
+ __entry->address = address;
+ __entry->data = data;
+ ),
+
+ TP_printk("dst %llx vec %u (%s|%s|%s%s)",
+ (u8)(__entry->address >> 12) | ((__entry->address >> 32) & 0xffffff00),
+ (u8)__entry->data,
+ __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode),
+ (__entry->address & (1<<2)) ? "logical" : "physical",
+ (__entry->data & (1<<15)) ? "level" : "edge",
+ (__entry->address & (1<<3)) ? "|rh" : "")
+);
+
#define AREG(x) { APIC_##x, "APIC_" #x }
#define kvm_trace_symbol_apic \
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index fc7d0f8ff078..96e581900c8e 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -85,83 +85,6 @@ TRACE_EVENT(kvm_set_irq,
#endif /* defined(CONFIG_HAVE_KVM_IRQCHIP) */
#if defined(__KVM_HAVE_IOAPIC)
-#define kvm_deliver_mode \
- {0x0, "Fixed"}, \
- {0x1, "LowPrio"}, \
- {0x2, "SMI"}, \
- {0x3, "Res3"}, \
- {0x4, "NMI"}, \
- {0x5, "INIT"}, \
- {0x6, "SIPI"}, \
- {0x7, "ExtINT"}
-
-TRACE_EVENT(kvm_ioapic_set_irq,
- TP_PROTO(__u64 e, int pin, bool coalesced),
- TP_ARGS(e, pin, coalesced),
-
- TP_STRUCT__entry(
- __field( __u64, e )
- __field( int, pin )
- __field( bool, coalesced )
- ),
-
- TP_fast_assign(
- __entry->e = e;
- __entry->pin = pin;
- __entry->coalesced = coalesced;
- ),
-
- TP_printk("pin %u dst %x vec %u (%s|%s|%s%s)%s",
- __entry->pin, (u8)(__entry->e >> 56), (u8)__entry->e,
- __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode),
- (__entry->e & (1<<11)) ? "logical" : "physical",
- (__entry->e & (1<<15)) ? "level" : "edge",
- (__entry->e & (1<<16)) ? "|masked" : "",
- __entry->coalesced ? " (coalesced)" : "")
-);
-
-TRACE_EVENT(kvm_ioapic_delayed_eoi_inj,
- TP_PROTO(__u64 e),
- TP_ARGS(e),
-
- TP_STRUCT__entry(
- __field( __u64, e )
- ),
-
- TP_fast_assign(
- __entry->e = e;
- ),
-
- TP_printk("dst %x vec %u (%s|%s|%s%s)",
- (u8)(__entry->e >> 56), (u8)__entry->e,
- __print_symbolic((__entry->e >> 8 & 0x7), kvm_deliver_mode),
- (__entry->e & (1<<11)) ? "logical" : "physical",
- (__entry->e & (1<<15)) ? "level" : "edge",
- (__entry->e & (1<<16)) ? "|masked" : "")
-);
-
-TRACE_EVENT(kvm_msi_set_irq,
- TP_PROTO(__u64 address, __u64 data),
- TP_ARGS(address, data),
-
- TP_STRUCT__entry(
- __field( __u64, address )
- __field( __u64, data )
- ),
-
- TP_fast_assign(
- __entry->address = address;
- __entry->data = data;
- ),
-
- TP_printk("dst %llx vec %u (%s|%s|%s%s)",
- (u8)(__entry->address >> 12) | ((__entry->address >> 32) & 0xffffff00),
- (u8)__entry->data,
- __print_symbolic((__entry->data >> 8 & 0x7), kvm_deliver_mode),
- (__entry->address & (1<<2)) ? "logical" : "physical",
- (__entry->data & (1<<15)) ? "level" : "edge",
- (__entry->address & (1<<3)) ? "|rh" : "")
-);
#define kvm_irqchips \
{KVM_IRQCHIP_PIC_MASTER, "PIC master"}, \
--
2.49.0.1101.gccaa498523-goog
* [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (9 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 10/15] KVM: Move x86-only tracepoints to x86's trace.h Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-29 11:55 ` Huang, Kai
2025-05-19 23:28 ` [PATCH 12/15] KVM: Squash two CONFIG_HAVE_KVM_IRQCHIP #ifdefs into one Sean Christopherson
` (5 subsequent siblings)
16 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Add a Kconfig to allow building KVM without support for emulating an
I/O APIC, PIC, and PIT, which is desirable for deployments that effectively
don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
create an in-kernel I/O APIC. E.g. compiling out support eliminates a few
thousand lines of guest-facing code and gives security folks warm fuzzies.
As a bonus, wrapping relevant paths with CONFIG_KVM_IOAPIC #ifdefs makes
it much easier for readers to understand which bits and pieces exist
specifically for fully in-kernel IRQ chips.
Opportunistically convert both in-kernel uses of __KVM_HAVE_IOAPIC to
CONFIG_KVM_IOAPIC, e.g. rather than add a second #ifdef to generate a stub
for kvm_arch_post_irq_routing_update().
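The userspace-visible effect of compiling out the emulation can be sketched
with a toy dispatcher. This is an illustration under assumptions, not the
real handler: MODEL_KVM_IOAPIC is a hypothetical stand-in for
CONFIG_KVM_IOAPIC, and the enum and function names are invented for the
example. The one property borrowed from kvm_arch_vm_ioctl() is that the
return value starts out as -ENOTTY and the guarded cases simply disappear.

```c
#include <errno.h>

/*
 * Build with -DMODEL_KVM_IOAPIC to model a kernel that includes the
 * emulation (hypothetical stand-in for CONFIG_KVM_IOAPIC=y).
 */
enum model_ioctl {
	MODEL_CREATE_IRQCHIP,
	MODEL_SET_BOOT_CPU_ID,
};

static int model_vm_ioctl(enum model_ioctl cmd)
{
	int r = -ENOTTY;	/* unhandled ioctls fall through with -ENOTTY */

	switch (cmd) {
#ifdef MODEL_KVM_IOAPIC
	case MODEL_CREATE_IRQCHIP:
		r = 0;		/* pretend the irqchip was created */
		break;
#endif
	case MODEL_SET_BOOT_CPU_ID:
		r = 0;
		break;
	default:
		break;
	}

	return r;
}
```

Built without the define, the irqchip request is indistinguishable from an
unknown ioctl, which is the failure mode a VMM would need to handle.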
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/Kconfig | 10 ++++++++++
arch/x86/kvm/Makefile | 5 +++--
arch/x86/kvm/irq.c | 6 ++++++
arch/x86/kvm/irq_comm.c | 2 ++
arch/x86/kvm/lapic.c | 7 ++++++-
arch/x86/kvm/trace.h | 2 ++
arch/x86/kvm/x86.c | 24 ++++++++++++++++++++----
include/linux/kvm_host.h | 2 +-
include/trace/events/kvm.h | 4 ++--
10 files changed, 54 insertions(+), 10 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ebda93979179..f5ff5174674c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1374,9 +1374,11 @@ struct kvm_arch {
atomic_t noncoherent_dma_count;
#define __KVM_HAVE_ARCH_ASSIGNED_DEVICE
atomic_t assigned_device_count;
+#ifdef CONFIG_KVM_IOAPIC
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
+#endif
atomic_t vapics_in_nmi_mode;
struct mutex apic_map_lock;
struct kvm_apic_map __rcu *apic_map;
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 2eeffcec5382..2c86673155c9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -166,6 +166,16 @@ config KVM_AMD_SEV
Encrypted State (SEV-ES), and Secure Encrypted Virtualization with
Secure Nested Paging (SEV-SNP) technologies on AMD processors.
+config KVM_IOAPIC
+ bool "I/O APIC, PIC, and PIT emulation"
+ default y
+ depends on KVM
+ help
+ Provides support for KVM to emulate an I/O APIC, PIC, and PIT, i.e.
+ for full in-kernel APIC emulation.
+
+ If unsure, say Y.
+
config KVM_SMM
bool "System Management Mode emulation"
default y
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index a5d362c7b504..92c737257789 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -5,12 +5,13 @@ ccflags-$(CONFIG_KVM_WERROR) += -Werror
include $(srctree)/virt/kvm/Makefile.kvm
-kvm-y += x86.o emulate.o i8259.o irq.o lapic.o \
- i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
+kvm-y += x86.o emulate.o irq.o lapic.o \
+ irq_comm.o cpuid.o pmu.o mtrr.o \
debugfs.o mmu/mmu.o mmu/page_track.o \
mmu/spte.o
kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
+kvm-$(CONFIG_KVM_IOAPIC) += i8259.o i8254.o ioapic.o
kvm-$(CONFIG_KVM_HYPERV) += hyperv.o
kvm-$(CONFIG_KVM_XEN) += xen.o
kvm-$(CONFIG_KVM_SMM) += smm.o
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index b9b9df00ab77..a416ccddde5f 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -75,8 +75,10 @@ int kvm_cpu_has_extint(struct kvm_vcpu *v)
if (!kvm_apic_accept_pic_intr(v))
return 0;
+#ifdef CONFIG_KVM_IOAPIC
if (pic_in_kernel(v->kvm))
return v->kvm->arch.vpic->output;
+#endif
WARN_ON_ONCE(!irqchip_split(v->kvm));
return pending_userspace_extint(v);
@@ -135,8 +137,10 @@ int kvm_cpu_get_extint(struct kvm_vcpu *v)
return v->kvm->arch.xen.upcall_vector;
#endif
+#ifdef CONFIG_KVM_IOAPIC
if (pic_in_kernel(v->kvm))
return kvm_pic_read_irq(v->kvm); /* PIC */
+#endif
WARN_ON_ONCE(!irqchip_split(v->kvm));
return get_userspace_extint(v);
@@ -170,7 +174,9 @@ void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu)
void __kvm_migrate_timers(struct kvm_vcpu *vcpu)
{
__kvm_migrate_apic_timer(vcpu);
+#ifdef CONFIG_KVM_IOAPIC
__kvm_migrate_pit_timer(vcpu);
+#endif
kvm_x86_call(migrate_timers)(vcpu);
}
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index adef53dc4fef..a4ef150fdd1c 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -208,6 +208,7 @@ int kvm_set_routing_entry(struct kvm *kvm,
* check kvm_arch_can_set_irq_routing() before calling this function.
*/
switch (ue->type) {
+#ifdef CONFIG_KVM_IOAPIC
case KVM_IRQ_ROUTING_IRQCHIP:
if (irqchip_split(kvm))
return -EINVAL;
@@ -231,6 +232,7 @@ int kvm_set_routing_entry(struct kvm *kvm,
}
e->irqchip.irqchip = ue->u.irqchip.irqchip;
break;
+#endif
case KVM_IRQ_ROUTING_MSI:
e->set = kvm_set_msi;
e->msi.address_lo = ue->u.msi.address_lo;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 73418dc0ebb2..4cf8c1f753d3 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1455,7 +1455,7 @@ static bool kvm_ioapic_handles_vector(struct kvm_lapic *apic, int vector)
static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
{
- int trigger_mode;
+ int __maybe_unused trigger_mode;
/* Eoi the ioapic only if the ioapic doesn't own the vector. */
if (!kvm_ioapic_handles_vector(apic, vector))
@@ -1476,12 +1476,14 @@ static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
return;
}
+#ifdef CONFIG_KVM_IOAPIC
if (apic_test_vector(vector, apic->regs + APIC_TMR))
trigger_mode = IOAPIC_LEVEL_TRIG;
else
trigger_mode = IOAPIC_EDGE_TRIG;
kvm_ioapic_update_eoi(apic->vcpu, vector, trigger_mode);
+#endif
}
static int apic_set_eoi(struct kvm_lapic *apic)
@@ -3146,8 +3148,11 @@ int kvm_apic_set_state(struct kvm_vcpu *vcpu, struct kvm_lapic_state *s)
kvm_x86_call(hwapic_isr_update)(vcpu, apic_find_highest_isr(apic));
}
kvm_make_request(KVM_REQ_EVENT, vcpu);
+
+#ifdef CONFIG_KVM_IOAPIC
if (ioapic_in_kernel(vcpu->kvm))
kvm_rtc_eoi_tracking_restore_one(vcpu);
+#endif
vcpu->arch.apic_arb_prio = 0;
diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h
index 4ef17990574d..ababdba2c186 100644
--- a/arch/x86/kvm/trace.h
+++ b/arch/x86/kvm/trace.h
@@ -270,6 +270,7 @@ TRACE_EVENT(kvm_cpuid,
{0x6, "SIPI"}, \
{0x7, "ExtINT"}
+#ifdef CONFIG_KVM_IOAPIC
TRACE_EVENT(kvm_ioapic_set_irq,
TP_PROTO(__u64 e, int pin, bool coalesced),
TP_ARGS(e, pin, coalesced),
@@ -314,6 +315,7 @@ TRACE_EVENT(kvm_ioapic_delayed_eoi_inj,
(__entry->e & (1<<15)) ? "level" : "edge",
(__entry->e & (1<<16)) ? "|masked" : "")
);
+#endif
TRACE_EVENT(kvm_msi_set_irq,
TP_PROTO(__u64 address, __u64 data),
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9e2c249d45ca..52eff4919d95 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4630,17 +4630,20 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_EXT_CPUID:
case KVM_CAP_EXT_EMUL_CPUID:
case KVM_CAP_CLOCKSOURCE:
+#ifdef CONFIG_KVM_IOAPIC
case KVM_CAP_PIT:
+ case KVM_CAP_PIT2:
+ case KVM_CAP_PIT_STATE2:
+ case KVM_CAP_REINJECT_CONTROL:
+#endif
case KVM_CAP_NOP_IO_DELAY:
case KVM_CAP_MP_STATE:
case KVM_CAP_SYNC_MMU:
case KVM_CAP_USER_NMI:
- case KVM_CAP_REINJECT_CONTROL:
case KVM_CAP_IRQ_INJECT_STATUS:
case KVM_CAP_IOEVENTFD:
case KVM_CAP_IOEVENTFD_NO_LENGTH:
- case KVM_CAP_PIT2:
- case KVM_CAP_PIT_STATE2:
+
case KVM_CAP_SET_IDENTITY_MAP_ADDR:
case KVM_CAP_VCPU_EVENTS:
#ifdef CONFIG_KVM_HYPERV
@@ -6393,6 +6396,7 @@ static int kvm_vm_ioctl_set_nr_mmu_pages(struct kvm *kvm,
return 0;
}
+#ifdef CONFIG_KVM_IOAPIC
static int kvm_vm_ioctl_get_irqchip(struct kvm *kvm, struct kvm_irqchip *chip)
{
struct kvm_pic *pic = kvm->arch.vpic;
@@ -6521,6 +6525,7 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
return 0;
}
+#endif /* CONFIG_KVM_IOAPIC */
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
@@ -7064,9 +7069,11 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
struct kvm *kvm = filp->private_data;
void __user *argp = (void __user *)arg;
int r = -ENOTTY;
+
+#ifdef CONFIG_KVM_IOAPIC
/*
* This union makes it completely explicit to gcc-3.x
- * that these two variables' stack usage should be
+ * that these three variables' stack usage should be
* combined, not added together.
*/
union {
@@ -7074,6 +7081,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
struct kvm_pit_state2 ps2;
struct kvm_pit_config pit_config;
} u;
+#endif
switch (ioctl) {
case KVM_SET_TSS_ADDR:
@@ -7097,6 +7105,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
case KVM_SET_NR_MMU_PAGES:
r = kvm_vm_ioctl_set_nr_mmu_pages(kvm, arg);
break;
+#ifdef CONFIG_KVM_IOAPIC
case KVM_CREATE_IRQCHIP: {
mutex_lock(&kvm->lock);
@@ -7257,6 +7266,7 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
r = kvm_vm_ioctl_reinject(kvm, &control);
break;
}
+#endif
case KVM_SET_BOOT_CPU_ID:
r = 0;
mutex_lock(&kvm->lock);
@@ -10716,8 +10726,10 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
if (irqchip_split(vcpu->kvm))
kvm_scan_ioapic_routes(vcpu, vcpu->arch.ioapic_handled_vectors);
+#ifdef CONFIG_KVM_IOAPIC
else if (ioapic_in_kernel(vcpu->kvm))
kvm_ioapic_scan_entry(vcpu, vcpu->arch.ioapic_handled_vectors);
+#endif
if (is_guest_mode(vcpu))
vcpu->arch.load_eoi_exitmap_pending = true;
@@ -12920,7 +12932,9 @@ void kvm_arch_pre_destroy_vm(struct kvm *kvm)
cancel_delayed_work_sync(&kvm->arch.kvmclock_sync_work);
cancel_delayed_work_sync(&kvm->arch.kvmclock_update_work);
+#ifdef CONFIG_KVM_IOAPIC
kvm_free_pit(kvm);
+#endif
kvm_mmu_pre_destroy_vm(kvm);
static_call_cond(kvm_x86_vm_pre_destroy)(kvm);
@@ -12944,8 +12958,10 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
}
kvm_destroy_vcpus(kvm);
kvm_free_msr_filter(srcu_dereference_check(kvm->arch.msr_filter, &kvm->srcu, 1));
+#ifdef CONFIG_KVM_IOAPIC
kvm_pic_destroy(kvm);
kvm_ioapic_destroy(kvm);
+#endif
kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
kvm_mmu_uninit_vm(kvm);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 44b439c5fcf4..0e151db44ecd 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1019,7 +1019,7 @@ void kvm_destroy_vcpus(struct kvm *kvm);
void vcpu_load(struct kvm_vcpu *vcpu);
void vcpu_put(struct kvm_vcpu *vcpu);
-#ifdef __KVM_HAVE_IOAPIC
+#ifdef CONFIG_KVM_IOAPIC
void kvm_arch_post_irq_ack_notifier_list_update(struct kvm *kvm);
#else
static inline void kvm_arch_post_irq_ack_notifier_list_update(struct kvm *kvm)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 96e581900c8e..1065a81ca57f 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -84,14 +84,14 @@ TRACE_EVENT(kvm_set_irq,
);
#endif /* defined(CONFIG_HAVE_KVM_IRQCHIP) */
-#if defined(__KVM_HAVE_IOAPIC)
+#ifdef CONFIG_KVM_IOAPIC
#define kvm_irqchips \
{KVM_IRQCHIP_PIC_MASTER, "PIC master"}, \
{KVM_IRQCHIP_PIC_SLAVE, "PIC slave"}, \
{KVM_IRQCHIP_IOAPIC, "IOAPIC"}
-#endif /* defined(__KVM_HAVE_IOAPIC) */
+#endif /* CONFIG_KVM_IOAPIC */
#if defined(CONFIG_HAVE_KVM_IRQCHIP)
--
2.49.0.1101.gccaa498523-goog
* [PATCH 12/15] KVM: Squash two CONFIG_HAVE_KVM_IRQCHIP #ifdefs into one
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (10 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-19 23:28 ` [PATCH 13/15] KVM: selftests: Fall back to split IRQ chip if full in-kernel chip is unsupported Sean Christopherson
` (4 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Squash two #ifdef CONFIG_HAVE_KVM_IRQCHIP regions in KVM's trace events, as
the only code outside of the #ifdefs depends on CONFIG_KVM_IOAPIC, and that
Kconfig only exists for x86, which unconditionally selects HAVE_KVM_IRQCHIP.
No functional change intended.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
include/trace/events/kvm.h | 3 ---
1 file changed, 3 deletions(-)
diff --git a/include/trace/events/kvm.h b/include/trace/events/kvm.h
index 1065a81ca57f..0b6b79b1a1bc 100644
--- a/include/trace/events/kvm.h
+++ b/include/trace/events/kvm.h
@@ -82,7 +82,6 @@ TRACE_EVENT(kvm_set_irq,
TP_printk("gsi %u level %d source %d",
__entry->gsi, __entry->level, __entry->irq_source_id)
);
-#endif /* defined(CONFIG_HAVE_KVM_IRQCHIP) */
#ifdef CONFIG_KVM_IOAPIC
@@ -93,8 +92,6 @@ TRACE_EVENT(kvm_set_irq,
#endif /* CONFIG_KVM_IOAPIC */
-#if defined(CONFIG_HAVE_KVM_IRQCHIP)
-
#ifdef kvm_irqchips
#define kvm_ack_irq_string "irqchip %s pin %u"
#define kvm_ack_irq_parm __print_symbolic(__entry->irqchip, kvm_irqchips), __entry->pin
--
2.49.0.1101.gccaa498523-goog
* [PATCH 13/15] KVM: selftests: Fall back to split IRQ chip if full in-kernel chip is unsupported
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (11 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 12/15] KVM: Squash two CONFIG_HAVE_KVM_IRQCHIP #ifdefs into one Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-19 23:28 ` [PATCH 14/15] KVM: x86: Move IRQ mask notifier infrastructure to I/O APIC emulation Sean Christopherson
` (3 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Now that KVM x86 allows compiling out support for in-kernel I/O APIC (and
PIC and PIT) emulation, i.e. allows disabling KVM_CREATE_IRQCHIP for all
intents and purposes, fall back to a split IRQ chip for x86 if creating
the full in-kernel version fails with ENOTTY.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
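For readers skimming the thread, the fallback decision described above can be modeled in plain userspace C without touching /dev/kvm. This is only a sketch under stated assumptions: struct fake_vm, fake_create_irqchip() and pick_irqchip() are hypothetical names invented for illustration, not selftest or KVM APIs, and the "compiled out" case is modeled simply as a call that fails with ENOTTY.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a VM; not a KVM structure. */
struct fake_vm {
	bool full_irqchip_built;	/* models in-kernel I/O APIC support being built */
	bool split_irqchip_cap;		/* models KVM_CAP_SPLIT_IRQCHIP being reported */
};

enum irqchip_kind { IRQCHIP_FULL, IRQCHIP_SPLIT, IRQCHIP_ERROR };

/* Models KVM_CREATE_IRQCHIP: fails with ENOTTY when support is compiled out. */
static int fake_create_irqchip(const struct fake_vm *vm)
{
	if (!vm->full_irqchip_built) {
		errno = ENOTTY;
		return -1;
	}
	return 0;
}

/* Try the full in-kernel chip first, fall back to split only on ENOTTY. */
static enum irqchip_kind pick_irqchip(const struct fake_vm *vm)
{
	int r;

	assert(vm);
	r = fake_create_irqchip(vm);
	if (!r)
		return IRQCHIP_FULL;
	if (errno == ENOTTY && vm->split_irqchip_cap)
		return IRQCHIP_SPLIT;
	return IRQCHIP_ERROR;	/* the selftest asserts at this point instead */
}
```

Any error other than ENOTTY, or ENOTTY without the split capability, is treated as a hard failure in this model, which matches the intent of keeping the assert on the non-fallback path.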
tools/testing/selftests/kvm/lib/kvm_util.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c
index 50edc59cc0ca..53116f4ffe97 100644
--- a/tools/testing/selftests/kvm/lib/kvm_util.c
+++ b/tools/testing/selftests/kvm/lib/kvm_util.c
@@ -1713,7 +1713,18 @@ void *addr_gpa2alias(struct kvm_vm *vm, vm_paddr_t gpa)
/* Create an interrupt controller chip for the specified VM. */
void vm_create_irqchip(struct kvm_vm *vm)
{
- vm_ioctl(vm, KVM_CREATE_IRQCHIP, NULL);
+ int r;
+
+ /*
+ * Allocate a fully in-kernel IRQ chip by default, but fall back to a
+ * split model (x86 only) if that fails (KVM x86 allows compiling out
+ * support for KVM_CREATE_IRQCHIP).
+ */
+ r = __vm_ioctl(vm, KVM_CREATE_IRQCHIP, NULL);
+ if (r && errno == ENOTTY && kvm_has_cap(KVM_CAP_SPLIT_IRQCHIP))
+ vm_enable_cap(vm, KVM_CAP_SPLIT_IRQCHIP, 24);
+ else
+ TEST_ASSERT_VM_VCPU_IOCTL(!r, KVM_CREATE_IRQCHIP, r, vm);
vm->has_irqchip = true;
}
--
2.49.0.1101.gccaa498523-goog
* [PATCH 14/15] KVM: x86: Move IRQ mask notifier infrastructure to I/O APIC emulation
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (12 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 13/15] KVM: selftests: Fall back to split IRQ chip if full in-kernel chip is unsupported Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-19 23:28 ` [PATCH 15/15] KVM: x86: Fold irq_comm.c into irq.c Sean Christopherson
` (2 subsequent siblings)
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Move the IRQ mask logic to ioapic.c as KVM's only user is its in-kernel
I/O APIC emulation. In addition to encapsulating more I/O APIC specific
code, trimming down irq_comm.c helps pave the way for removing it entirely.
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
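As a rough illustration of what the mask notifier infrastructure does, here is a userspace model of the register/unregister/fire pattern. It is a sketch only: the names (struct mask_notifier, notifier_register(), and so on) are hypothetical, the list is a plain singly linked list, and the irq_lock/SRCU protection used by the real code is deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace analogue of a mask notifier entry. */
struct mask_notifier {
	void (*func)(struct mask_notifier *n, bool masked);
	int irq;			/* GSI this notifier cares about */
	struct mask_notifier *next;
};

struct notifier_list {
	struct mask_notifier *head;
};

/* Record the GSI and push the notifier onto the front of the list. */
static void notifier_register(struct notifier_list *l, int irq,
			      struct mask_notifier *n)
{
	n->irq = irq;
	n->next = l->head;
	l->head = n;
}

/* Unlink the notifier if present; no-op otherwise. */
static void notifier_unregister(struct notifier_list *l,
				struct mask_notifier *n)
{
	struct mask_notifier **pp;

	for (pp = &l->head; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			return;
		}
	}
}

/* Invoke every notifier registered for the given GSI. */
static void notifier_fire(struct notifier_list *l, int gsi, bool masked)
{
	struct mask_notifier *n;

	for (n = l->head; n; n = n->next)
		if (n->irq == gsi)
			n->func(n, masked);
}

/* Example callback: counts how many times it was fired. */
static int fired;

static void count_cb(struct mask_notifier *n, bool masked)
{
	(void)n;
	(void)masked;
	fired++;
}
```

In this model the list head lives in its own structure, loosely mirroring how the patch moves the list head from the per-VM arch state into the I/O APIC state.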
arch/x86/include/asm/kvm_host.h | 16 --------------
arch/x86/kvm/i8254.h | 2 ++
arch/x86/kvm/i8259.c | 2 ++
arch/x86/kvm/ioapic.c | 37 +++++++++++++++++++++++++++++++++
arch/x86/kvm/ioapic.h | 16 ++++++++++++++
arch/x86/kvm/irq_comm.c | 33 -----------------------------
arch/x86/kvm/x86.c | 1 -
7 files changed, 57 insertions(+), 50 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f5ff5174674c..21ccb122ab76 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1426,9 +1426,6 @@ struct kvm_arch {
struct delayed_work kvmclock_update_work;
struct delayed_work kvmclock_sync_work;
- /* reads protected by irq_srcu, writes by irq_lock */
- struct hlist_head mask_notifier_list;
-
#ifdef CONFIG_KVM_HYPERV
struct kvm_hv hyperv;
#endif
@@ -2038,19 +2035,6 @@ int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
const void *val, int bytes);
-struct kvm_irq_mask_notifier {
- void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked);
- int irq;
- struct hlist_node link;
-};
-
-void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
- struct kvm_irq_mask_notifier *kimn);
-void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
- struct kvm_irq_mask_notifier *kimn);
-void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
- bool mask);
-
extern bool tdp_enabled;
u64 vcpu_tsc_khz(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index 14fb310357f2..de172567b56a 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -6,6 +6,8 @@
#include <kvm/iodev.h>
+#include "ioapic.h"
+
struct kvm_kpit_channel_state {
u32 count; /* can be 65536 */
u16 latched_count;
diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 4de055efc4ee..2ac7f1678c46 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -31,6 +31,8 @@
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/bitops.h>
+
+#include "ioapic.h"
#include "irq.h"
#include <linux/kvm_host.h>
diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
index 151ee9a64c3c..daaf16e4681a 100644
--- a/arch/x86/kvm/ioapic.c
+++ b/arch/x86/kvm/ioapic.c
@@ -310,6 +310,42 @@ void kvm_arch_post_irq_ack_notifier_list_update(struct kvm *kvm)
kvm_make_scan_ioapic_request(kvm);
}
+void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
+ struct kvm_irq_mask_notifier *kimn)
+{
+ struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+
+ mutex_lock(&kvm->irq_lock);
+ kimn->irq = irq;
+ hlist_add_head_rcu(&kimn->link, &ioapic->mask_notifier_list);
+ mutex_unlock(&kvm->irq_lock);
+}
+
+void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
+ struct kvm_irq_mask_notifier *kimn)
+{
+ mutex_lock(&kvm->irq_lock);
+ hlist_del_rcu(&kimn->link);
+ mutex_unlock(&kvm->irq_lock);
+ synchronize_srcu(&kvm->irq_srcu);
+}
+
+void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
+ bool mask)
+{
+ struct kvm_ioapic *ioapic = kvm->arch.vioapic;
+ struct kvm_irq_mask_notifier *kimn;
+ int idx, gsi;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
+ if (gsi != -1)
+ hlist_for_each_entry_rcu(kimn, &ioapic->mask_notifier_list, link)
+ if (kimn->irq == gsi)
+ kimn->func(kimn, mask);
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
static void ioapic_write_indirect(struct kvm_ioapic *ioapic, u32 val)
{
unsigned index;
@@ -736,6 +772,7 @@ int kvm_ioapic_init(struct kvm *kvm)
return -ENOMEM;
spin_lock_init(&ioapic->lock);
INIT_DELAYED_WORK(&ioapic->eoi_inject, kvm_ioapic_eoi_inject_work);
+ INIT_HLIST_HEAD(&ioapic->mask_notifier_list);
kvm->arch.vioapic = ioapic;
kvm_ioapic_reset(ioapic);
kvm_iodevice_init(&ioapic->dev, &ioapic_mmio_ops);
diff --git a/arch/x86/kvm/ioapic.h b/arch/x86/kvm/ioapic.h
index fee17eb201ef..f5c1ff640635 100644
--- a/arch/x86/kvm/ioapic.h
+++ b/arch/x86/kvm/ioapic.h
@@ -86,8 +86,24 @@ struct kvm_ioapic {
struct delayed_work eoi_inject;
u32 irq_eoi[IOAPIC_NUM_PINS];
u32 irr_delivered;
+
+ /* reads protected by irq_srcu, writes by irq_lock */
+ struct hlist_head mask_notifier_list;
};
+struct kvm_irq_mask_notifier {
+ void (*func)(struct kvm_irq_mask_notifier *kimn, bool masked);
+ int irq;
+ struct hlist_node link;
+};
+
+void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
+ struct kvm_irq_mask_notifier *kimn);
+void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
+ struct kvm_irq_mask_notifier *kimn);
+void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
+ bool mask);
+
#ifdef DEBUG
#define ASSERT(x) \
do { \
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
index a4ef150fdd1c..fc0fa8155882 100644
--- a/arch/x86/kvm/irq_comm.c
+++ b/arch/x86/kvm/irq_comm.c
@@ -161,39 +161,6 @@ int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
return -EWOULDBLOCK;
}
-void kvm_register_irq_mask_notifier(struct kvm *kvm, int irq,
- struct kvm_irq_mask_notifier *kimn)
-{
- mutex_lock(&kvm->irq_lock);
- kimn->irq = irq;
- hlist_add_head_rcu(&kimn->link, &kvm->arch.mask_notifier_list);
- mutex_unlock(&kvm->irq_lock);
-}
-
-void kvm_unregister_irq_mask_notifier(struct kvm *kvm, int irq,
- struct kvm_irq_mask_notifier *kimn)
-{
- mutex_lock(&kvm->irq_lock);
- hlist_del_rcu(&kimn->link);
- mutex_unlock(&kvm->irq_lock);
- synchronize_srcu(&kvm->irq_srcu);
-}
-
-void kvm_fire_mask_notifiers(struct kvm *kvm, unsigned irqchip, unsigned pin,
- bool mask)
-{
- struct kvm_irq_mask_notifier *kimn;
- int idx, gsi;
-
- idx = srcu_read_lock(&kvm->irq_srcu);
- gsi = kvm_irq_map_chip_pin(kvm, irqchip, pin);
- if (gsi != -1)
- hlist_for_each_entry_rcu(kimn, &kvm->arch.mask_notifier_list, link)
- if (kimn->irq == gsi)
- kimn->func(kimn, mask);
- srcu_read_unlock(&kvm->irq_srcu, idx);
-}
-
bool kvm_arch_can_set_irq_routing(struct kvm *kvm)
{
return irqchip_in_kernel(kvm);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 52eff4919d95..3ac6f7c83a06 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12799,7 +12799,6 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
if (ret)
goto out_uninit_mmu;
- INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
atomic_set(&kvm->arch.noncoherent_dma_count, 0);
raw_spin_lock_init(&kvm->arch.tsc_write_lock);
--
2.49.0.1101.gccaa498523-goog
* [PATCH 15/15] KVM: x86: Fold irq_comm.c into irq.c
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (13 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 14/15] KVM: x86: Move IRQ mask notifier infrastructure to I/O APIC emulation Sean Christopherson
@ 2025-05-19 23:28 ` Sean Christopherson
2025-05-29 11:58 ` [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Huang, Kai
2025-06-04 16:56 ` Paolo Bonzini
16 siblings, 0 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-19 23:28 UTC (permalink / raw)
To: Sean Christopherson, Paolo Bonzini, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
Drop irq_comm.c, a.k.a. common IRQ APIs, as there has been no non-x86 user
since commit 003f7de62589 ("KVM: ia64: remove") (at the time, irq_comm.c
lived in virt/kvm, not arch/x86/kvm).
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/Makefile | 6 +-
arch/x86/kvm/irq.c | 305 ++++++++++++++++++++++++++++++++++++-
arch/x86/kvm/irq_comm.c | 325 ----------------------------------------
3 files changed, 306 insertions(+), 330 deletions(-)
delete mode 100644 arch/x86/kvm/irq_comm.c
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 92c737257789..c4b8950c7abe 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -5,10 +5,8 @@ ccflags-$(CONFIG_KVM_WERROR) += -Werror
include $(srctree)/virt/kvm/Makefile.kvm
-kvm-y += x86.o emulate.o irq.o lapic.o \
- irq_comm.o cpuid.o pmu.o mtrr.o \
- debugfs.o mmu/mmu.o mmu/page_track.o \
- mmu/spte.o
+kvm-y += x86.o emulate.o irq.o lapic.o cpuid.o pmu.o mtrr.o \
+ debugfs.o mmu/mmu.o mmu/page_track.o mmu/spte.o
kvm-$(CONFIG_X86_64) += mmu/tdp_iter.o mmu/tdp_mmu.o
kvm-$(CONFIG_KVM_IOAPIC) += i8259.o i8254.o ioapic.o
diff --git a/arch/x86/kvm/irq.c b/arch/x86/kvm/irq.c
index a416ccddde5f..314a93599942 100644
--- a/arch/x86/kvm/irq.c
+++ b/arch/x86/kvm/irq.c
@@ -12,8 +12,10 @@
#include <linux/export.h>
#include <linux/kvm_host.h>
+#include "hyperv.h"
+#include "ioapic.h"
#include "irq.h"
-#include "i8254.h"
+#include "trace.h"
#include "x86.h"
#include "xen.h"
@@ -191,3 +193,304 @@ bool kvm_arch_irqchip_in_kernel(struct kvm *kvm)
{
return irqchip_in_kernel(kvm);
}
+
+int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
+ struct kvm_lapic_irq *irq, struct dest_map *dest_map)
+{
+ int r = -1;
+ struct kvm_vcpu *vcpu, *lowest = NULL;
+ unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
+ unsigned int dest_vcpus = 0;
+
+ if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
+ return r;
+
+ if (irq->dest_mode == APIC_DEST_PHYSICAL &&
+ irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) {
+ pr_info("apic: phys broadcast and lowest prio\n");
+ irq->delivery_mode = APIC_DM_FIXED;
+ }
+
+ memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_apic_present(vcpu))
+ continue;
+
+ if (!kvm_apic_match_dest(vcpu, src, irq->shorthand,
+ irq->dest_id, irq->dest_mode))
+ continue;
+
+ if (!kvm_lowest_prio_delivery(irq)) {
+ if (r < 0)
+ r = 0;
+ r += kvm_apic_set_irq(vcpu, irq, dest_map);
+ } else if (kvm_apic_sw_enabled(vcpu->arch.apic)) {
+ if (!kvm_vector_hashing_enabled()) {
+ if (!lowest)
+ lowest = vcpu;
+ else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
+ lowest = vcpu;
+ } else {
+ __set_bit(i, dest_vcpu_bitmap);
+ dest_vcpus++;
+ }
+ }
+ }
+
+ if (dest_vcpus != 0) {
+ int idx = kvm_vector_to_index(irq->vector, dest_vcpus,
+ dest_vcpu_bitmap, KVM_MAX_VCPUS);
+
+ lowest = kvm_get_vcpu(kvm, idx);
+ }
+
+ if (lowest)
+ r = kvm_apic_set_irq(lowest, irq, dest_map);
+
+ return r;
+}
+
+void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
+ struct kvm_lapic_irq *irq)
+{
+ struct msi_msg msg = { .address_lo = e->msi.address_lo,
+ .address_hi = e->msi.address_hi,
+ .data = e->msi.data };
+
+ trace_kvm_msi_set_irq(msg.address_lo | (kvm->arch.x2apic_format ?
+ (u64)msg.address_hi << 32 : 0), msg.data);
+
+ irq->dest_id = x86_msi_msg_get_destid(&msg, kvm->arch.x2apic_format);
+ irq->vector = msg.arch_data.vector;
+ irq->dest_mode = kvm_lapic_irq_dest_mode(msg.arch_addr_lo.dest_mode_logical);
+ irq->trig_mode = msg.arch_data.is_level;
+ irq->delivery_mode = msg.arch_data.delivery_mode << 8;
+ irq->msi_redir_hint = msg.arch_addr_lo.redirect_hint;
+ irq->level = 1;
+ irq->shorthand = APIC_DEST_NOSHORT;
+}
+EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
+
+static inline bool kvm_msi_route_invalid(struct kvm *kvm,
+ struct kvm_kernel_irq_routing_entry *e)
+{
+ return kvm->arch.x2apic_format && (e->msi.address_hi & 0xff);
+}
+
+int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm *kvm, int irq_source_id, int level, bool line_status)
+{
+ struct kvm_lapic_irq irq;
+
+ if (kvm_msi_route_invalid(kvm, e))
+ return -EINVAL;
+
+ if (!level)
+ return -1;
+
+ kvm_set_msi_irq(kvm, e, &irq);
+
+ return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
+}
+
+int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
+ struct kvm *kvm, int irq_source_id, int level,
+ bool line_status)
+{
+ struct kvm_lapic_irq irq;
+ int r;
+
+ switch (e->type) {
+#ifdef CONFIG_KVM_HYPERV
+ case KVM_IRQ_ROUTING_HV_SINT:
+ return kvm_hv_set_sint(e, kvm, irq_source_id, level,
+ line_status);
+#endif
+
+ case KVM_IRQ_ROUTING_MSI:
+ if (kvm_msi_route_invalid(kvm, e))
+ return -EINVAL;
+
+ kvm_set_msi_irq(kvm, e, &irq);
+
+ if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
+ return r;
+ break;
+
+#ifdef CONFIG_KVM_XEN
+ case KVM_IRQ_ROUTING_XEN_EVTCHN:
+ if (!level)
+ return -1;
+
+ return kvm_xen_set_evtchn_fast(&e->xen_evtchn, kvm);
+#endif
+ default:
+ break;
+ }
+
+ return -EWOULDBLOCK;
+}
+
+bool kvm_arch_can_set_irq_routing(struct kvm *kvm)
+{
+ return irqchip_in_kernel(kvm);
+}
+
+int kvm_set_routing_entry(struct kvm *kvm,
+ struct kvm_kernel_irq_routing_entry *e,
+ const struct kvm_irq_routing_entry *ue)
+{
+ /* We can't check irqchip_in_kernel() here as some callers are
+ * currently initializing the irqchip. Other callers should therefore
+ * check kvm_arch_can_set_irq_routing() before calling this function.
+ */
+ switch (ue->type) {
+#ifdef CONFIG_KVM_IOAPIC
+ case KVM_IRQ_ROUTING_IRQCHIP:
+ if (irqchip_split(kvm))
+ return -EINVAL;
+ e->irqchip.pin = ue->u.irqchip.pin;
+ switch (ue->u.irqchip.irqchip) {
+ case KVM_IRQCHIP_PIC_SLAVE:
+ e->irqchip.pin += PIC_NUM_PINS / 2;
+ fallthrough;
+ case KVM_IRQCHIP_PIC_MASTER:
+ if (ue->u.irqchip.pin >= PIC_NUM_PINS / 2)
+ return -EINVAL;
+ e->set = kvm_pic_set_irq;
+ break;
+ case KVM_IRQCHIP_IOAPIC:
+ if (ue->u.irqchip.pin >= KVM_IOAPIC_NUM_PINS)
+ return -EINVAL;
+ e->set = kvm_ioapic_set_irq;
+ break;
+ default:
+ return -EINVAL;
+ }
+ e->irqchip.irqchip = ue->u.irqchip.irqchip;
+ break;
+#endif
+ case KVM_IRQ_ROUTING_MSI:
+ e->set = kvm_set_msi;
+ e->msi.address_lo = ue->u.msi.address_lo;
+ e->msi.address_hi = ue->u.msi.address_hi;
+ e->msi.data = ue->u.msi.data;
+
+ if (kvm_msi_route_invalid(kvm, e))
+ return -EINVAL;
+ break;
+#ifdef CONFIG_KVM_HYPERV
+ case KVM_IRQ_ROUTING_HV_SINT:
+ e->set = kvm_hv_set_sint;
+ e->hv_sint.vcpu = ue->u.hv_sint.vcpu;
+ e->hv_sint.sint = ue->u.hv_sint.sint;
+ break;
+#endif
+#ifdef CONFIG_KVM_XEN
+ case KVM_IRQ_ROUTING_XEN_EVTCHN:
+ return kvm_xen_setup_evtchn(kvm, e, ue);
+#endif
+ default:
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
+ struct kvm_vcpu **dest_vcpu)
+{
+ int r = 0;
+ unsigned long i;
+ struct kvm_vcpu *vcpu;
+
+ if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
+ return true;
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (!kvm_apic_present(vcpu))
+ continue;
+
+ if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
+ irq->dest_id, irq->dest_mode))
+ continue;
+
+ if (++r == 2)
+ return false;
+
+ *dest_vcpu = vcpu;
+ }
+
+ return r == 1;
+}
+EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
+
+void kvm_scan_ioapic_irq(struct kvm_vcpu *vcpu, u32 dest_id, u16 dest_mode,
+ u8 vector, unsigned long *ioapic_handled_vectors)
+{
+ /*
+ * Intercept EOI if the vCPU is the target of the new IRQ routing, or
+ * the vCPU has a pending IRQ from the old routing, i.e. if the vCPU
+ * may receive a level-triggered IRQ in the future, or already received
+ * level-triggered IRQ. The EOI needs to be intercepted and forwarded
+ * to I/O APIC emulation so that the IRQ can be de-asserted.
+ */
+ if (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT, dest_id, dest_mode)) {
+ __set_bit(vector, ioapic_handled_vectors);
+ } else if (kvm_apic_pending_eoi(vcpu, vector)) {
+ __set_bit(vector, ioapic_handled_vectors);
+
+ /*
+ * Track the highest pending EOI for which the vCPU is NOT the
+ * target in the new routing. Only the EOI for the IRQ that is
+ * in-flight (for the old routing) needs to be intercepted, any
+ * future IRQs that arrive on this vCPU will be coincidental to
+ * the level-triggered routing and don't need to be intercepted.
+ */
+ if ((int)vector > vcpu->arch.highest_stale_pending_ioapic_eoi)
+ vcpu->arch.highest_stale_pending_ioapic_eoi = vector;
+ }
+}
+
+void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
+ ulong *ioapic_handled_vectors)
+{
+ struct kvm *kvm = vcpu->kvm;
+ struct kvm_kernel_irq_routing_entry *entry;
+ struct kvm_irq_routing_table *table;
+ u32 i, nr_ioapic_pins;
+ int idx;
+
+ idx = srcu_read_lock(&kvm->irq_srcu);
+ table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
+ nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
+ kvm->arch.nr_reserved_ioapic_pins);
+ for (i = 0; i < nr_ioapic_pins; ++i) {
+ hlist_for_each_entry(entry, &table->map[i], link) {
+ struct kvm_lapic_irq irq;
+
+ if (entry->type != KVM_IRQ_ROUTING_MSI)
+ continue;
+
+ kvm_set_msi_irq(vcpu->kvm, entry, &irq);
+
+ if (!irq.trig_mode)
+ continue;
+
+ kvm_scan_ioapic_irq(vcpu, irq.dest_id, irq.dest_mode,
+ irq.vector, ioapic_handled_vectors);
+ }
+ }
+ srcu_read_unlock(&kvm->irq_srcu, idx);
+}
+
+void kvm_arch_irq_routing_update(struct kvm *kvm)
+{
+#ifdef CONFIG_KVM_HYPERV
+ kvm_hv_irq_routing_update(kvm);
+#endif
+
+ if (irqchip_split(kvm))
+ kvm_make_scan_ioapic_request(kvm);
+}
diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
deleted file mode 100644
index fc0fa8155882..000000000000
--- a/arch/x86/kvm/irq_comm.c
+++ /dev/null
@@ -1,325 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0-only
-/*
- * irq_comm.c: Common API for in kernel interrupt controller
- * Copyright (c) 2007, Intel Corporation.
- *
- * Authors:
- * Yaozu (Eddie) Dong <Eddie.dong@intel.com>
- *
- * Copyright 2010 Red Hat, Inc. and/or its affiliates.
- */
-#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
-
-#include <linux/kvm_host.h>
-#include <linux/slab.h>
-#include <linux/export.h>
-#include <linux/rculist.h>
-
-#include "hyperv.h"
-#include "ioapic.h"
-#include "irq.h"
-#include "lapic.h"
-#include "trace.h"
-#include "x86.h"
-#include "xen.h"
-
-int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
- struct kvm_lapic_irq *irq, struct dest_map *dest_map)
-{
- int r = -1;
- struct kvm_vcpu *vcpu, *lowest = NULL;
- unsigned long i, dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)];
- unsigned int dest_vcpus = 0;
-
- if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map))
- return r;
-
- if (irq->dest_mode == APIC_DEST_PHYSICAL &&
- irq->dest_id == 0xff && kvm_lowest_prio_delivery(irq)) {
- pr_info("apic: phys broadcast and lowest prio\n");
- irq->delivery_mode = APIC_DM_FIXED;
- }
-
- memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap));
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- if (!kvm_apic_present(vcpu))
- continue;
-
- if (!kvm_apic_match_dest(vcpu, src, irq->shorthand,
- irq->dest_id, irq->dest_mode))
- continue;
-
- if (!kvm_lowest_prio_delivery(irq)) {
- if (r < 0)
- r = 0;
- r += kvm_apic_set_irq(vcpu, irq, dest_map);
- } else if (kvm_apic_sw_enabled(vcpu->arch.apic)) {
- if (!kvm_vector_hashing_enabled()) {
- if (!lowest)
- lowest = vcpu;
- else if (kvm_apic_compare_prio(vcpu, lowest) < 0)
- lowest = vcpu;
- } else {
- __set_bit(i, dest_vcpu_bitmap);
- dest_vcpus++;
- }
- }
- }
-
- if (dest_vcpus != 0) {
- int idx = kvm_vector_to_index(irq->vector, dest_vcpus,
- dest_vcpu_bitmap, KVM_MAX_VCPUS);
-
- lowest = kvm_get_vcpu(kvm, idx);
- }
-
- if (lowest)
- r = kvm_apic_set_irq(lowest, irq, dest_map);
-
- return r;
-}
-
-void kvm_set_msi_irq(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e,
- struct kvm_lapic_irq *irq)
-{
- struct msi_msg msg = { .address_lo = e->msi.address_lo,
- .address_hi = e->msi.address_hi,
- .data = e->msi.data };
-
- trace_kvm_msi_set_irq(msg.address_lo | (kvm->arch.x2apic_format ?
- (u64)msg.address_hi << 32 : 0), msg.data);
-
- irq->dest_id = x86_msi_msg_get_destid(&msg, kvm->arch.x2apic_format);
- irq->vector = msg.arch_data.vector;
- irq->dest_mode = kvm_lapic_irq_dest_mode(msg.arch_addr_lo.dest_mode_logical);
- irq->trig_mode = msg.arch_data.is_level;
- irq->delivery_mode = msg.arch_data.delivery_mode << 8;
- irq->msi_redir_hint = msg.arch_addr_lo.redirect_hint;
- irq->level = 1;
- irq->shorthand = APIC_DEST_NOSHORT;
-}
-EXPORT_SYMBOL_GPL(kvm_set_msi_irq);
-
-static inline bool kvm_msi_route_invalid(struct kvm *kvm,
- struct kvm_kernel_irq_routing_entry *e)
-{
- return kvm->arch.x2apic_format && (e->msi.address_hi & 0xff);
-}
-
-int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
- struct kvm *kvm, int irq_source_id, int level, bool line_status)
-{
- struct kvm_lapic_irq irq;
-
- if (kvm_msi_route_invalid(kvm, e))
- return -EINVAL;
-
- if (!level)
- return -1;
-
- kvm_set_msi_irq(kvm, e, &irq);
-
- return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
-}
-
-int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
- struct kvm *kvm, int irq_source_id, int level,
- bool line_status)
-{
- struct kvm_lapic_irq irq;
- int r;
-
- switch (e->type) {
-#ifdef CONFIG_KVM_HYPERV
- case KVM_IRQ_ROUTING_HV_SINT:
- return kvm_hv_set_sint(e, kvm, irq_source_id, level,
- line_status);
-#endif
-
- case KVM_IRQ_ROUTING_MSI:
- if (kvm_msi_route_invalid(kvm, e))
- return -EINVAL;
-
- kvm_set_msi_irq(kvm, e, &irq);
-
- if (kvm_irq_delivery_to_apic_fast(kvm, NULL, &irq, &r, NULL))
- return r;
- break;
-
-#ifdef CONFIG_KVM_XEN
- case KVM_IRQ_ROUTING_XEN_EVTCHN:
- if (!level)
- return -1;
-
- return kvm_xen_set_evtchn_fast(&e->xen_evtchn, kvm);
-#endif
- default:
- break;
- }
-
- return -EWOULDBLOCK;
-}
-
-bool kvm_arch_can_set_irq_routing(struct kvm *kvm)
-{
- return irqchip_in_kernel(kvm);
-}
-
-int kvm_set_routing_entry(struct kvm *kvm,
- struct kvm_kernel_irq_routing_entry *e,
- const struct kvm_irq_routing_entry *ue)
-{
- /* We can't check irqchip_in_kernel() here as some callers are
- * currently initializing the irqchip. Other callers should therefore
- * check kvm_arch_can_set_irq_routing() before calling this function.
- */
- switch (ue->type) {
-#ifdef CONFIG_KVM_IOAPIC
- case KVM_IRQ_ROUTING_IRQCHIP:
- if (irqchip_split(kvm))
- return -EINVAL;
- e->irqchip.pin = ue->u.irqchip.pin;
- switch (ue->u.irqchip.irqchip) {
- case KVM_IRQCHIP_PIC_SLAVE:
- e->irqchip.pin += PIC_NUM_PINS / 2;
- fallthrough;
- case KVM_IRQCHIP_PIC_MASTER:
- if (ue->u.irqchip.pin >= PIC_NUM_PINS / 2)
- return -EINVAL;
- e->set = kvm_pic_set_irq;
- break;
- case KVM_IRQCHIP_IOAPIC:
- if (ue->u.irqchip.pin >= KVM_IOAPIC_NUM_PINS)
- return -EINVAL;
- e->set = kvm_ioapic_set_irq;
- break;
- default:
- return -EINVAL;
- }
- e->irqchip.irqchip = ue->u.irqchip.irqchip;
- break;
-#endif
- case KVM_IRQ_ROUTING_MSI:
- e->set = kvm_set_msi;
- e->msi.address_lo = ue->u.msi.address_lo;
- e->msi.address_hi = ue->u.msi.address_hi;
- e->msi.data = ue->u.msi.data;
-
- if (kvm_msi_route_invalid(kvm, e))
- return -EINVAL;
- break;
-#ifdef CONFIG_KVM_HYPERV
- case KVM_IRQ_ROUTING_HV_SINT:
- e->set = kvm_hv_set_sint;
- e->hv_sint.vcpu = ue->u.hv_sint.vcpu;
- e->hv_sint.sint = ue->u.hv_sint.sint;
- break;
-#endif
-#ifdef CONFIG_KVM_XEN
- case KVM_IRQ_ROUTING_XEN_EVTCHN:
- return kvm_xen_setup_evtchn(kvm, e, ue);
-#endif
- default:
- return -EINVAL;
- }
-
- return 0;
-}
-
-bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
- struct kvm_vcpu **dest_vcpu)
-{
- int r = 0;
- unsigned long i;
- struct kvm_vcpu *vcpu;
-
- if (kvm_intr_is_single_vcpu_fast(kvm, irq, dest_vcpu))
- return true;
-
- kvm_for_each_vcpu(i, vcpu, kvm) {
- if (!kvm_apic_present(vcpu))
- continue;
-
- if (!kvm_apic_match_dest(vcpu, NULL, irq->shorthand,
- irq->dest_id, irq->dest_mode))
- continue;
-
- if (++r == 2)
- return false;
-
- *dest_vcpu = vcpu;
- }
-
- return r == 1;
-}
-EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
-
-void kvm_scan_ioapic_irq(struct kvm_vcpu *vcpu, u32 dest_id, u16 dest_mode,
- u8 vector, unsigned long *ioapic_handled_vectors)
-{
- /*
- * Intercept EOI if the vCPU is the target of the new IRQ routing, or
- * the vCPU has a pending IRQ from the old routing, i.e. if the vCPU
- * may receive a level-triggered IRQ in the future, or already received
- * level-triggered IRQ. The EOI needs to be intercepted and forwarded
- * to I/O APIC emulation so that the IRQ can be de-asserted.
- */
- if (kvm_apic_match_dest(vcpu, NULL, APIC_DEST_NOSHORT, dest_id, dest_mode)) {
- __set_bit(vector, ioapic_handled_vectors);
- } else if (kvm_apic_pending_eoi(vcpu, vector)) {
- __set_bit(vector, ioapic_handled_vectors);
-
- /*
- * Track the highest pending EOI for which the vCPU is NOT the
- * target in the new routing. Only the EOI for the IRQ that is
- * in-flight (for the old routing) needs to be intercepted, any
- * future IRQs that arrive on this vCPU will be coincidental to
- * the level-triggered routing and don't need to be intercepted.
- */
- if ((int)vector > vcpu->arch.highest_stale_pending_ioapic_eoi)
- vcpu->arch.highest_stale_pending_ioapic_eoi = vector;
- }
-}
-
-void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu,
- ulong *ioapic_handled_vectors)
-{
- struct kvm *kvm = vcpu->kvm;
- struct kvm_kernel_irq_routing_entry *entry;
- struct kvm_irq_routing_table *table;
- u32 i, nr_ioapic_pins;
- int idx;
-
- idx = srcu_read_lock(&kvm->irq_srcu);
- table = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu);
- nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
- kvm->arch.nr_reserved_ioapic_pins);
- for (i = 0; i < nr_ioapic_pins; ++i) {
- hlist_for_each_entry(entry, &table->map[i], link) {
- struct kvm_lapic_irq irq;
-
- if (entry->type != KVM_IRQ_ROUTING_MSI)
- continue;
-
- kvm_set_msi_irq(vcpu->kvm, entry, &irq);
-
- if (!irq.trig_mode)
- continue;
-
- kvm_scan_ioapic_irq(vcpu, irq.dest_id, irq.dest_mode,
- irq.vector, ioapic_handled_vectors);
- }
- }
- srcu_read_unlock(&kvm->irq_srcu, idx);
-}
-
-void kvm_arch_irq_routing_update(struct kvm *kvm)
-{
-#ifdef CONFIG_KVM_HYPERV
- kvm_hv_irq_routing_update(kvm);
-#endif
-
- if (irqchip_split(kvm))
- kvm_make_scan_ioapic_request(kvm);
-}
--
2.49.0.1101.gccaa498523-goog
* Re: [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper
2025-05-19 23:27 ` [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper Sean Christopherson
@ 2025-05-20 9:57 ` Vitaly Kuznetsov
2025-05-29 11:37 ` Huang, Kai
1 sibling, 0 replies; 32+ messages in thread
From: Vitaly Kuznetsov @ 2025-05-20 9:57 UTC (permalink / raw)
To: Sean Christopherson; +Cc: kvm, linux-kernel, Paolo Bonzini
Sean Christopherson <seanjc@google.com> writes:
> Drop the superfluous kvm_hv_set_sint() and instead wire up ->set() directly
> to its final destination.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
Nitpick: synic_set_irq() still has trace_kvm_hv_synic_set_irq() but
kvm_hv_synic_set_irq() is now gone, I think it may make sense to rename
it to e.g. 'trace_kvm_hv_set_sint' or 'trace_synic_set_irq' to avoid any
confusion.
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
> ---
> arch/x86/kvm/hyperv.c | 10 +++++++---
> arch/x86/kvm/hyperv.h | 3 ++-
> arch/x86/kvm/irq_comm.c | 12 ------------
> 3 files changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 24f0318c50d7..7f565636edde 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -497,15 +497,19 @@ static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
> return ret;
> }
>
> -int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vpidx, u32 sint)
> +int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
> + int irq_source_id, int level, bool line_status)
> {
> struct kvm_vcpu_hv_synic *synic;
>
> - synic = synic_get(kvm, vpidx);
> + if (!level)
> + return -1;
> +
> + synic = synic_get(kvm, e->hv_sint.vcpu);
> if (!synic)
> return -EINVAL;
>
> - return synic_set_irq(synic, sint);
> + return synic_set_irq(synic, e->hv_sint.sint);
> }
>
> void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector)
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 913bfc96959c..4ad5a0749739 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -103,7 +103,8 @@ static inline bool kvm_hv_hypercall_enabled(struct kvm_vcpu *vcpu)
> int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
>
> void kvm_hv_irq_routing_update(struct kvm *kvm);
> -int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vcpu_id, u32 sint);
> +int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
> + int irq_source_id, int level, bool line_status);
> void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector);
> int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages);
>
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 8dcb6a555902..b85e4be2ddff 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -127,18 +127,6 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
> return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
> }
>
> -#ifdef CONFIG_KVM_HYPERV
> -static int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e,
> - struct kvm *kvm, int irq_source_id, int level,
> - bool line_status)
> -{
> - if (!level)
> - return -1;
> -
> - return kvm_hv_synic_set_irq(kvm, e->hv_sint.vcpu, e->hv_sint.sint);
> -}
> -#endif
> -
> int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
> struct kvm *kvm, int irq_source_id, int level,
> bool line_status)
--
Vitaly
* Re: [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper
2025-05-19 23:27 ` [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper Sean Christopherson
2025-05-20 9:57 ` Vitaly Kuznetsov
@ 2025-05-29 11:37 ` Huang, Kai
2025-05-29 14:39 ` Sean Christopherson
1 sibling, 1 reply; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 11:37 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Mon, 2025-05-19 at 16:27 -0700, Sean Christopherson wrote:
> Drop the superfluous kvm_hv_set_sint() and instead wire up ->set() directly
> to its final destination.
kvm_hv_set_sint() is still there after this patch. Did you mean "superfluous
kvm_hv_synic_set_irq()"? :-)
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/hyperv.c | 10 +++++++---
> arch/x86/kvm/hyperv.h | 3 ++-
> arch/x86/kvm/irq_comm.c | 12 ------------
> 3 files changed, 9 insertions(+), 16 deletions(-)
>
> diff --git a/arch/x86/kvm/hyperv.c b/arch/x86/kvm/hyperv.c
> index 24f0318c50d7..7f565636edde 100644
> --- a/arch/x86/kvm/hyperv.c
> +++ b/arch/x86/kvm/hyperv.c
> @@ -497,15 +497,19 @@ static int synic_set_irq(struct kvm_vcpu_hv_synic *synic, u32 sint)
> return ret;
> }
>
> -int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vpidx, u32 sint)
> +int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
> + int irq_source_id, int level, bool line_status)
> {
> struct kvm_vcpu_hv_synic *synic;
>
> - synic = synic_get(kvm, vpidx);
> + if (!level)
> + return -1;
> +
> + synic = synic_get(kvm, e->hv_sint.vcpu);
> if (!synic)
> return -EINVAL;
>
> - return synic_set_irq(synic, sint);
> + return synic_set_irq(synic, e->hv_sint.sint);
> }
>
> void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector)
> diff --git a/arch/x86/kvm/hyperv.h b/arch/x86/kvm/hyperv.h
> index 913bfc96959c..4ad5a0749739 100644
> --- a/arch/x86/kvm/hyperv.h
> +++ b/arch/x86/kvm/hyperv.h
> @@ -103,7 +103,8 @@ static inline bool kvm_hv_hypercall_enabled(struct kvm_vcpu *vcpu)
> int kvm_hv_hypercall(struct kvm_vcpu *vcpu);
>
> void kvm_hv_irq_routing_update(struct kvm *kvm);
> -int kvm_hv_synic_set_irq(struct kvm *kvm, u32 vcpu_id, u32 sint);
> +int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e, struct kvm *kvm,
> + int irq_source_id, int level, bool line_status);
> void kvm_hv_synic_send_eoi(struct kvm_vcpu *vcpu, int vector);
> int kvm_hv_activate_synic(struct kvm_vcpu *vcpu, bool dont_zero_synic_pages);
>
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index 8dcb6a555902..b85e4be2ddff 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -127,18 +127,6 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
> return kvm_irq_delivery_to_apic(kvm, NULL, &irq, NULL);
> }
>
> -#ifdef CONFIG_KVM_HYPERV
> -static int kvm_hv_set_sint(struct kvm_kernel_irq_routing_entry *e,
> - struct kvm *kvm, int irq_source_id, int level,
> - bool line_status)
> -{
> - if (!level)
> - return -1;
> -
> - return kvm_hv_synic_set_irq(kvm, e->hv_sint.vcpu, e->hv_sint.sint);
> -}
> -#endif
> -
> int kvm_arch_set_irq_inatomic(struct kvm_kernel_irq_routing_entry *e,
> struct kvm *kvm, int irq_source_id, int level,
> bool line_status)
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 08/15] KVM: x86: Don't clear PIT's IRQ line status when destroying PIT
2025-05-19 23:28 ` [PATCH 08/15] KVM: x86: Don't clear PIT's IRQ line status when destroying PIT Sean Christopherson
@ 2025-05-29 11:41 ` Huang, Kai
0 siblings, 0 replies; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 11:41 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Mon, 2025-05-19 at 16:28 -0700, Sean Christopherson wrote:
> Don't bother clearing the PIT's IRQ line status when destroying the PIT,
> as userspace can't possibly rely on KVM to lower the IRQ line in any sane
> use case, and it's not at all obvious that clearing the PIT's IRQ line is
> correct/desirable in kvm_create_pit()'s error path.
>
> When called from kvm_arch_pre_destroy_vm(), the entire VM is being torn
> down and thus {kvm_pic,kvm_ioapic}.irq_states are unreachable.
>
> As for the error path in kvm_create_pit(), the only way the PIT's bit in
> irq_states can be set is if userspace raises the associated IRQ before
> KVM_CREATE_PIT{2} completes. Forcefully clearing the bit would clobber's
^
clobber
> userspace's input, nonsensical though that input may be. Not to mention
> that no known VMM will continue on if PIT creation fails.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-19 23:28 ` [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC Sean Christopherson
@ 2025-05-29 11:55 ` Huang, Kai
2025-05-29 11:57 ` Huang, Kai
0 siblings, 1 reply; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 11:55 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Mon, 2025-05-19 at 16:28 -0700, Sean Christopherson wrote:
> Add a Kconfig to allowing building KVM without support for emulating an
^
allow
> I/O APIC, PIC, and PIT, which is desirable for deployments that effectively
> don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
> create an in-kernel I/O APIC.
>
Do you happen to know what deployments don't support a full in-kernel IRQ chip?
Do they only support userspace IRQ chip, or not support any IRQ chip at all?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-29 11:55 ` Huang, Kai
@ 2025-05-29 11:57 ` Huang, Kai
2025-05-29 14:31 ` Sean Christopherson
0 siblings, 1 reply; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 11:57 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Thu, 2025-05-29 at 23:55 +1200, Kai Huang wrote:
> On Mon, 2025-05-19 at 16:28 -0700, Sean Christopherson wrote:
> > Add a Kconfig to allowing building KVM without support for emulating an
> ^
> allow
>
> > I/O APIC, PIC, and PIT, which is desirable for deployments that effectively
> > don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
> > create an in-kernel I/O APIC.
> >
>
> Do you happen to know what deployments don't support a full in-kernel IRQ chip?
>
> Do they only support userspace IRQ chip, or not support any IRQ chip at all?
Forgot to ask:
Since this new Kconfig option is not only for IOAPIC but also includes PIC and
PIT, is CONFIG_KVM_IRQCHIP a better name?
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (14 preceding siblings ...)
2025-05-19 23:28 ` [PATCH 15/15] KVM: x86: Fold irq_comm.c into irq.c Sean Christopherson
@ 2025-05-29 11:58 ` Huang, Kai
2025-06-04 16:56 ` Paolo Bonzini
16 siblings, 0 replies; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 11:58 UTC (permalink / raw)
To: pbonzini@redhat.com, seanjc@google.com, vkuznets@redhat.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org
On Mon, 2025-05-19 at 16:27 -0700, Sean Christopherson wrote:
> This series is prep work for the big device posted IRQs overhaul[1], in which
> Paolo suggested getting rid of arch/x86/kvm/irq_comm.c[2].
For this series,
Acked-by: Kai Huang <kai.huang@intel.com>
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-29 11:57 ` Huang, Kai
@ 2025-05-29 14:31 ` Sean Christopherson
2025-05-29 22:51 ` Huang, Kai
0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-05-29 14:31 UTC (permalink / raw)
To: Kai Huang
Cc: pbonzini@redhat.com, vkuznets@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
On Thu, May 29, 2025, Kai Huang wrote:
> On Thu, 2025-05-29 at 23:55 +1200, Kai Huang wrote:
> > On Mon, 2025-05-19 at 16:28 -0700, Sean Christopherson wrote:
> > > Add a Kconfig to allowing building KVM without support for emulating an
> > ^
> > allow
> >
> > > I/O APIC, PIC, and PIT, which is desirable for deployments that effectively
> > > don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
> > > create an in-kernel I/O APIC.
> >
> > Do you happen to know what deployments don't support a full in-kernel IRQ chip?
Google Cloud, for one. I suspect/assume many/most CSPs don't utilize an in-kernel
I/O APIC.
> > Do they only support userspace IRQ chip, or not support any IRQ chip at all?
The former, only userspace I/O APIC (and associated devices), though some VM
shapes, e.g. TDX, don't provide an I/O APIC or PIC.
> Forgot to ask:
>
> Since this new Kconfig option is not only for IOAPIC but also includes PIC and
> PIT, is CONFIG_KVM_IRQCHIP a better name?
I much prefer IOAPIC, because IRQCHIP is far too ambiguous and confusing, e.g.
just look at KVM's internal APIs, where these:
irqchip_in_kernel()
irqchip_kernel()
are not equivalent. In practice, no modern guest kernel is going to utilize the
PIC, and the PIT isn't an IRQ chip, i.e. isn't strictly covered by IRQCHIP either.
So I think/hope the vast majority of users/readers will be able to intuit that
CONFIG_KVM_IOAPIC also covers the PIC and PIT.
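To illustrate why the two helper names are easy to confuse, here is a minimal
user-space model of the distinction. It is only a sketch: struct vm_model and
the IRQCHIP_* enum are stand-ins invented for this example, the helper bodies
are assumed to reduce to simple comparisons on the VM's irqchip mode, and any
memory barriers the real helpers need are omitted.

```c
#include <stdbool.h>

/* Stand-ins for KVM's irqchip modes; hypothetical, not the kernel's definitions. */
enum irqchip_mode {
	IRQCHIP_NONE,	/* no interrupt controller emulated in the kernel */
	IRQCHIP_KERNEL,	/* local APIC, I/O APIC and PIC all in the kernel */
	IRQCHIP_SPLIT,	/* local APIC in the kernel, I/O APIC and PIC in userspace */
};

/* Hypothetical, stripped-down replacement for struct kvm. */
struct vm_model {
	enum irqchip_mode irqchip_mode;
};

/* True if *any* part of the irqchip lives in the kernel (at least the local APIC). */
static inline bool irqchip_in_kernel(const struct vm_model *vm)
{
	return vm->irqchip_mode != IRQCHIP_NONE;
}

/* True only if the *full* irqchip (I/O APIC and PIC included) lives in the kernel. */
static inline bool irqchip_kernel(const struct vm_model *vm)
{
	return vm->irqchip_mode == IRQCHIP_KERNEL;
}

/* True if the local APIC is in the kernel but the I/O APIC and PIC are in userspace. */
static inline bool irqchip_split(const struct vm_model *vm)
{
	return vm->irqchip_mode == IRQCHIP_SPLIT;
}
```

In this model a split-irqchip VM makes irqchip_in_kernel() true and
irqchip_kernel() false, which is exactly the kind of setup where the two names
stop being interchangeable.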
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper
2025-05-29 11:37 ` Huang, Kai
@ 2025-05-29 14:39 ` Sean Christopherson
2025-05-29 22:34 ` Huang, Kai
0 siblings, 1 reply; 32+ messages in thread
From: Sean Christopherson @ 2025-05-29 14:39 UTC (permalink / raw)
To: Kai Huang
Cc: pbonzini@redhat.com, vkuznets@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
On Thu, May 29, 2025, Kai Huang wrote:
> On Mon, 2025-05-19 at 16:27 -0700, Sean Christopherson wrote:
> > Drop the superfluous kvm_hv_set_sint() and instead wire up ->set() directly
> > to its final destination.
>
> kvm_hv_set_sint() is still there after this patch. Did you mean "superfluous
> kvm_hv_synic_set_irq()"? :-)
Ugh, yeah, bad changelog. Maybe this?
Rename kvm_hv_synic_set_irq() to kvm_hv_set_sint() and drop the previous
incarnation of kvm_hv_set_sint() provided by irq_comm.c, which is just a
wrapper to the hyperv.c code.
That said, given that the tracepoint is trace_kvm_hv_synic_set_irq(), and that
the IOAPIC and PIC versions are kvm_ioapic_set_irq() and kvm_pic_set_irq()
respectively, I'm leaning towards a straight drop of kvm_hv_set_sint(), i.e. keep
kvm_hv_synic_set_irq().
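Whichever name survives, the net effect of the patch is a single function that
takes the routing entry directly. A rough user-space model of that shape, as a
sketch only: the *_model types, the MODEL_* limits and the bitmask bookkeeping
are invented for illustration, and synic_get()/synic_set_irq() here are mocks
rather than the hyperv.c implementations.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_VCPUS		4	/* hypothetical limit for this sketch */
#define MODEL_SINT_COUNT	16	/* hypothetical number of SINTs per SynIC */

struct synic_model { unsigned int pending; };	/* one bit per asserted SINT */
struct vm_model { struct synic_model *synic[MODEL_MAX_VCPUS]; };
struct hv_sint_route { unsigned int vcpu; unsigned int sint; };

/* Mock lookup: NULL if the vCPU does not exist or has no active SynIC. */
static inline struct synic_model *synic_get(struct vm_model *vm, unsigned int vcpu)
{
	return vcpu < MODEL_MAX_VCPUS ? vm->synic[vcpu] : NULL;
}

/* Mock delivery: record the SINT as pending and report one delivery. */
static inline int synic_set_irq(struct synic_model *synic, unsigned int sint)
{
	if (sint >= MODEL_SINT_COUNT)
		return -EINVAL;
	synic->pending |= 1u << sint;
	return 1;
}

/* Merged handler: the level check from the old wrapper plus the lookup. */
static inline int hv_synic_set_irq(const struct hv_sint_route *e,
				   struct vm_model *vm, int irq_source_id,
				   int level, bool line_status)
{
	struct synic_model *synic;

	(void)irq_source_id;	/* unused in this model */
	(void)line_status;	/* unused in this model */

	if (!level)		/* only assertions are delivered */
		return -1;

	synic = synic_get(vm, e->vcpu);
	if (!synic)
		return -EINVAL;

	return synic_set_irq(synic, e->sint);
}
```

The point of the sketch is only that the !level check and the vCPU/SINT lookup
end up in one place, so the ->set() hook no longer needs a trampoline.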
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 04/15] KVM: x86: Drop superfluous kvm_hv_set_sint() => kvm_hv_synic_set_irq() wrapper
2025-05-29 14:39 ` Sean Christopherson
@ 2025-05-29 22:34 ` Huang, Kai
0 siblings, 0 replies; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 22:34 UTC (permalink / raw)
To: seanjc@google.com
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, vkuznets@redhat.com
On Thu, 2025-05-29 at 07:39 -0700, Sean Christopherson wrote:
> On Thu, May 29, 2025, Kai Huang wrote:
> > On Mon, 2025-05-19 at 16:27 -0700, Sean Christopherson wrote:
> > > Drop the superfluous kvm_hv_set_sint() and instead wire up ->set() directly
> > > to its final destination.
> >
> > kvm_hv_set_sint() is still there after this patch. Did you mean "superfluous
> > kvm_hv_synic_set_irq()"? :-)
>
> Ugh, yeah, bad changelog. Maybe this?
>
> Rename kvm_hv_synic_set_irq() to kvm_hv_set_sint() and drop the previous
> incarnation of kvm_hv_set_sint() provided by irq_comm.c, which is just a
> wrapper to the hyperv.c code.
>
> That said, given that the tracepoint is trace_kvm_hv_synic_set_irq(), and that
> the IOAPIC and PIC versions are kvm_ioapic_set_irq() and kvm_pic_set_irq()
> respectively, I'm leaning towards a straight drop of kvm_hv_set_sint(), i.e. keep
> kvm_hv_synic_set_irq().
Yeah LGTM. Thanks.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-29 14:31 ` Sean Christopherson
@ 2025-05-29 22:51 ` Huang, Kai
2025-05-29 23:08 ` Sean Christopherson
0 siblings, 1 reply; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 22:51 UTC (permalink / raw)
To: seanjc@google.com
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, vkuznets@redhat.com
On Thu, 2025-05-29 at 07:31 -0700, Sean Christopherson wrote:
> On Thu, May 29, 2025, Kai Huang wrote:
> > On Thu, 2025-05-29 at 23:55 +1200, Kai Huang wrote:
> > > On Mon, 2025-05-19 at 16:28 -0700, Sean Christopherson wrote:
> > > > Add a Kconfig to allowing building KVM without support for emulating an
> > > ^
> > > allow
> > >
> > > > I/O APIC, PIC, and PIT, which is desirable for deployments that effectively
> > > > don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
> > > > create an in-kernel I/O APIC.
> > >
> > > > Do you happen to know what deployments don't support a full in-kernel IRQ chip?
>
> Google Cloud, for one. I suspect/assume many/most CSPs don't utilize an in-kernel
> I/O APIC.
>
> > > Do they only support userspace IRQ chip, or not support any IRQ chip at all?
>
> The former, only userspace I/O APIC (and associated devices), though some VM
> shapes, e.g. TDX, don't provide an I/O APIC or PIC.
Thanks for the info.
Just wondering what's the benefit of using userspace IRQCHIP instead of
emulating in the kernel? I thought one should either use in-kernel IRQCHIP or
not use one at all.
>
> > Forgot to ask:
> >
> > Since this new Kconfig option is not only for IOAPIC but also includes PIC and
> > PIT, is CONFIG_KVM_IRQCHIP a better name?
>
> I much prefer IOAPIC, because IRQCHIP is far too ambiguous and confusing, e.g.
> just look at KVM's internal APIs, where these:
>
> irqchip_in_kernel()
> irqchip_kernel()
>
> are not equivalent. In practice, no modern guest kernel is going to utilize the
> PIC, and the PIT isn't an IRQ chip, i.e. isn't strictly covered by IRQCHIP either.
Right.
Maybe it is worth having dedicated Kconfigs for the PIC, PIT and IOAPIC?
But hmm, I am not sure whether emulating IOAPIC has more value than PIC. For
modern guests all emulated/assigned devices should just use MSI/MSI-X?
> So I think/hope the vast majority of users/readers will be able to intuit that
> CONFIG_KVM_IOAPIC also covers the PIC and PIT.
Sure.
Btw, I also find irqchip_in_kernel() and irqchip_kernel() confusing. I am not
sure of the value of having irqchip_in_kernel(), in fact. Modern guests should
always have an in-kernel APIC. I am wondering whether we can get rid of it
completely (the logic would be that it is always true), or have a Kconfig to
only build it when the user truly wants it.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-29 22:51 ` Huang, Kai
@ 2025-05-29 23:08 ` Sean Christopherson
2025-05-29 23:55 ` Huang, Kai
2025-06-04 16:54 ` Paolo Bonzini
0 siblings, 2 replies; 32+ messages in thread
From: Sean Christopherson @ 2025-05-29 23:08 UTC (permalink / raw)
To: Kai Huang
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, vkuznets@redhat.com
On Thu, May 29, 2025, Kai Huang wrote:
> On Thu, 2025-05-29 at 07:31 -0700, Sean Christopherson wrote:
> > On Thu, May 29, 2025, Kai Huang wrote:
> > > On Thu, 2025-05-29 at 23:55 +1200, Kai Huang wrote:
> > > > On Mon, 2025-05-19 at 16:28 -0700, Sean Christopherson wrote:
> > > > > Add a Kconfig to allowing building KVM without support for emulating an
> > > > ^
> > > > allow
> > > >
> > > > > I/O APIC, PIC, and PIT, which is desirable for deployments that effectively
> > > > > don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
> > > > > create an in-kernel I/O APIC.
> > > >
> > > > > Do you happen to know what deployments don't support a full in-kernel IRQ chip?
> >
> > Google Cloud, for one. I suspect/assume many/most CSPs don't utilize an in-kernel
> > I/O APIC.
> >
> > > > Do they only support userspace IRQ chip, or not support any IRQ chip at all?
> >
> > The former, only userspace I/O APIC (and associated devices), though some VM
> > shapes, e.g. TDX, don't provide an I/O APIC or PIC.
>
> Thanks for the info.
>
> Just wondering what's the benefit of using userspace IRQCHIP instead of
> emulating in the kernel?
Reduced kernel attack surface (this was especially true years ago, before KVM's
I/O APIC emulation was well-tested) and more flexibility (e.g. shipping userspace
changes is typically easier than shipping new kernels). I'm pretty sure there's
one more big one that I'm blanking on at the moment.
> I thought one should either use in-kernel IRQCHIP or not use one at all.
>
> >
> > > Forgot to ask:
> > >
> > > Since this new Kconfig option is not only for IOAPIC but also includes PIC and
> > > PIT, is CONFIG_KVM_IRQCHIP a better name?
> >
> > I much prefer IOAPIC, because IRQCHIP is far too ambiguous and confusing, e.g.
> > just look at KVM's internal APIs, where these:
> >
> > irqchip_in_kernel()
> > irqchip_kernel()
> >
> > are not equivalent. In practice, no modern guest kernel is going to utilize the
> > PIC, and the PIT isn't an IRQ chip, i.e. isn't strictly covered by IRQCHIP either.
>
> Right.
>
> Maybe it is worth having dedicated Kconfigs for the PIC, PIT and IOAPIC?
Nah. PIC and I/O APIC can't be split (without new uAPI and non-trivial complexity),
and I highly doubt there is any use case that would want an in-kernel I/O APIC
with a userspace PIT. I.e. in practice, the three almost always come as a group;
either a setup wants all, or a setup wants none.
> But hmm, I am not sure whether emulating IOAPIC has more value than PIC.
AIUI, it's not really an either or, since most software expects both an I/O APIC
and PIC. Any remotely modern kernel will definitely prefer the I/O APIC, but I
don't think it's something that can be guaranteed.
> For modern guests all emulated/assigned devices should just use MSI/MSI-X?
Not all emulated devices, since some legacy devices hang off the I/O APIC, i.e.
aren't capable of generating MSIs.
> > So I think/hope the vast majority of users/readers will be able to intuit that
> > CONFIG_KVM_IOAPIC also covers the PIC and PIT.
>
> Sure.
>
> Btw, I also find irqchip_in_kernel() and irqchip_kernel() confusing. I am not
> sure of the value of having irqchip_in_kernel(), in fact. Modern guests should
> always have an in-kernel APIC. I am wondering whether we can get rid of it
> completely (the logic would be that it is always true), or have a Kconfig to
> only build it when the user truly wants it.
For better or worse, an in-kernel local APIC is still optional. I do hope/want
to make it mandatory, but that's not a small ABI change.
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-29 23:08 ` Sean Christopherson
@ 2025-05-29 23:55 ` Huang, Kai
2025-06-04 16:54 ` Paolo Bonzini
1 sibling, 0 replies; 32+ messages in thread
From: Huang, Kai @ 2025-05-29 23:55 UTC (permalink / raw)
To: seanjc@google.com
Cc: kvm@vger.kernel.org, pbonzini@redhat.com,
linux-kernel@vger.kernel.org, vkuznets@redhat.com
On Thu, 2025-05-29 at 16:08 -0700, Sean Christopherson wrote:
> On Thu, May 29, 2025, Kai Huang wrote:
> > On Thu, 2025-05-29 at 07:31 -0700, Sean Christopherson wrote:
> > > On Thu, May 29, 2025, Kai Huang wrote:
> > > > On Thu, 2025-05-29 at 23:55 +1200, Kai Huang wrote:
> > > > > On Mon, 2025-05-19 at 16:28 -0700, Sean Christopherson wrote:
> > > > > > Add a Kconfig to allowing building KVM without support for emulating an
> > > > > ^
> > > > > allow
> > > > >
> > > > > > I/O APIC, PIC, and PIT, which is desirable for deployments that effectively
> > > > > > don't support a fully in-kernel IRQ chip, i.e. never expect any VMM to
> > > > > > create an in-kernel I/O APIC.
> > > > >
> > > > > > Do you happen to know what deployments don't support a full in-kernel IRQ chip?
> > >
> > > Google Cloud, for one. I suspect/assume many/most CSPs don't utilize an in-kernel
> > > I/O APIC.
> > >
> > > > > Do they only support userspace IRQ chip, or not support any IRQ chip at all?
> > >
> > > The former, only userspace I/O APIC (and associated devices), though some VM
> > > shapes, e.g. TDX, don't provide an I/O APIC or PIC.
> >
> > Thanks for the info.
> >
> > Just wondering what's the benefit of using userspace IRQCHIP instead of
> > emulating in the kernel?
>
> Reduced kernel attack surface (this was especially true years ago, before KVM's
> I/O APIC emulation was well-tested) and more flexibility (e.g. shipping userspace
> changes is typically easier than shipping new kernels). I'm pretty sure there's
> one more big one that I'm blanking on at the moment.
Yeah, those make sense. I thought the benefit was from a functionality/performance
perspective, but I was looking in the wrong direction.
>
> > I thought one should either use in-kernel IRQCHIP or not use one at all.
> >
> > >
> > > > Forgot to ask:
> > > >
> > > > Since this new Kconfig option is not only for IOAPIC but also includes PIC and
> > > > PIT, is CONFIG_KVM_IRQCHIP a better name?
> > >
> > > I much prefer IOAPIC, because IRQCHIP is far too ambiguous and confusing, e.g.
> > > just look at KVM's internal APIs, where these:
> > >
> > > irqchip_in_kernel()
> > > irqchip_kernel()
> > >
> > > are not equivalent. In practice, no modern guest kernel is going to utilize the
> > > PIC, and the PIT isn't an IRQ chip, i.e. isn't strictly covered by IRQCHIP either.
> >
> > Right.
> >
> > Maybe it is worth having dedicated Kconfigs for the PIC, PIT and IOAPIC?
>
> Nah. PIC and I/O APIC can't be split (without new uAPI and non-trivial complexity),
Right. I forgot this.
> and I highly doubt there is any use case that would want an in-kernel I/O APIC
> with a userspace PIT. I.e. in practice, the three almost always come as a group;
> either a setup wants all, or a setup wants none.
OK.
>
> > But hmm, I am not sure whether emulating IOAPIC has more value than PIC.
>
> AIUI, it's not really an either or, since most software expects both an I/O APIC
> and PIC. Any remotely modern kernel will definitely prefer the I/O APIC, but I
> don't think it's something that can be guaranteed.
OK :-)
>
> > For modern guests all emulated/assigned devices should just use MSI/MSI-X?
>
> Not all emulated devices, since some legacy devices hang off the I/O APIC, i.e.
> aren't capable of generating MSIs.
Yeah. I thought in those deployments the guests should not be configured to
have those devices.
>
> > > So I think/hope the vast majority of users/readers will be able to intuit that
> > > CONFIG_KVM_IOAPIC also covers the PIC and PIT.
> >
> > Sure.
> >
> > Btw, I also find irqchip_in_kernel() and irqchip_kernel() confusing. I am not
> > sure of the value of having irqchip_in_kernel(), in fact. Modern guests should
> > always have an in-kernel APIC. I am wondering whether we can get rid of it
> > completely (the logic would be that it is always true), or have a Kconfig to
> > only build it when the user truly wants it.
>
> For better or worse, an in-kernel local APIC is still optional. I do hope/want
> to make it mandatory, but that's not a small ABI change.
Right. The ABI change is a concern.
Thanks for all the explanation!
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 05/15] KVM: x86: Fold kvm_setup_default_irq_routing() into kvm_ioapic_init()
2025-05-19 23:27 ` [PATCH 05/15] KVM: x86: Fold kvm_setup_default_irq_routing() into kvm_ioapic_init() Sean Christopherson
@ 2025-06-04 16:43 ` Paolo Bonzini
0 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2025-06-04 16:43 UTC (permalink / raw)
To: Sean Christopherson, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
On 5/20/25 01:27, Sean Christopherson wrote:
> Move the default IRQ routing table used for in-kernel I/O APIC routing to
> ioapic.c where it belongs, and fold the call to kvm_set_irq_routing() into
> kvm_ioapic_init() (the call via kvm_setup_default_irq_routing() is done
> immediately after kvm_ioapic_init()).
>
> In addition to making it more obvious that the so called "default" routing
> only applies to an in-kernel I/O APIC, getting it out of irq_comm.c will
> allow removing irq_comm.c entirely, and will also allow for guarding KVM's
> in-kernel I/O APIC emulation with a Kconfig with minimal #ifdefs.
>
> No functional change intended.
Well, it also applies to the PIC. Even though the IOAPIC and PIC (and
PIT) do come in a bundle, it's a bit weird to have the PIC routing
entries initialized by kvm_ioapic_init(). Please keep
kvm_setup_default_irq_routing() a separate function.
Paolo
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/ioapic.c | 32 ++++++++++++++++++++++++++++++++
> arch/x86/kvm/irq.h | 1 -
> arch/x86/kvm/irq_comm.c | 32 --------------------------------
> arch/x86/kvm/x86.c | 6 ------
> 4 files changed, 32 insertions(+), 39 deletions(-)
>
> diff --git a/arch/x86/kvm/ioapic.c b/arch/x86/kvm/ioapic.c
> index 8c8a8062eb19..dc45ea9f5b9c 100644
> --- a/arch/x86/kvm/ioapic.c
> +++ b/arch/x86/kvm/ioapic.c
> @@ -710,6 +710,32 @@ static const struct kvm_io_device_ops ioapic_mmio_ops = {
> .write = ioapic_mmio_write,
> };
>
> +#define IOAPIC_ROUTING_ENTRY(irq) \
> + { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
> + .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
> +#define ROUTING_ENTRY1(irq) IOAPIC_ROUTING_ENTRY(irq)
> +
> +#define PIC_ROUTING_ENTRY(irq) \
> + { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
> + .u.irqchip = { .irqchip = SELECT_PIC(irq), .pin = (irq) % 8 } }
> +#define ROUTING_ENTRY2(irq) \
> + IOAPIC_ROUTING_ENTRY(irq), PIC_ROUTING_ENTRY(irq)
> +
> +static const struct kvm_irq_routing_entry default_routing[] = {
> + ROUTING_ENTRY2(0), ROUTING_ENTRY2(1),
> + ROUTING_ENTRY2(2), ROUTING_ENTRY2(3),
> + ROUTING_ENTRY2(4), ROUTING_ENTRY2(5),
> + ROUTING_ENTRY2(6), ROUTING_ENTRY2(7),
> + ROUTING_ENTRY2(8), ROUTING_ENTRY2(9),
> + ROUTING_ENTRY2(10), ROUTING_ENTRY2(11),
> + ROUTING_ENTRY2(12), ROUTING_ENTRY2(13),
> + ROUTING_ENTRY2(14), ROUTING_ENTRY2(15),
> + ROUTING_ENTRY1(16), ROUTING_ENTRY1(17),
> + ROUTING_ENTRY1(18), ROUTING_ENTRY1(19),
> + ROUTING_ENTRY1(20), ROUTING_ENTRY1(21),
> + ROUTING_ENTRY1(22), ROUTING_ENTRY1(23),
> +};
> +
> int kvm_ioapic_init(struct kvm *kvm)
> {
> struct kvm_ioapic *ioapic;
> @@ -731,8 +757,14 @@ int kvm_ioapic_init(struct kvm *kvm)
> if (ret < 0) {
> kvm->arch.vioapic = NULL;
> kfree(ioapic);
> + return ret;
> }
>
> + ret = kvm_set_irq_routing(kvm, default_routing,
> + ARRAY_SIZE(default_routing), 0);
> + if (ret)
> + kvm_ioapic_destroy(kvm);
> +
> return ret;
> }
>
> diff --git a/arch/x86/kvm/irq.h b/arch/x86/kvm/irq.h
> index 33dd5666b656..f6134289523e 100644
> --- a/arch/x86/kvm/irq.h
> +++ b/arch/x86/kvm/irq.h
> @@ -107,7 +107,6 @@ void __kvm_migrate_timers(struct kvm_vcpu *vcpu);
>
> int apic_has_pending_timer(struct kvm_vcpu *vcpu);
>
> -int kvm_setup_default_irq_routing(struct kvm *kvm);
> int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> struct kvm_lapic_irq *irq,
> struct dest_map *dest_map);
> diff --git a/arch/x86/kvm/irq_comm.c b/arch/x86/kvm/irq_comm.c
> index b85e4be2ddff..998c4a34d87c 100644
> --- a/arch/x86/kvm/irq_comm.c
> +++ b/arch/x86/kvm/irq_comm.c
> @@ -334,38 +334,6 @@ bool kvm_intr_is_single_vcpu(struct kvm *kvm, struct kvm_lapic_irq *irq,
> }
> EXPORT_SYMBOL_GPL(kvm_intr_is_single_vcpu);
>
> -#define IOAPIC_ROUTING_ENTRY(irq) \
> - { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
> - .u.irqchip = { .irqchip = KVM_IRQCHIP_IOAPIC, .pin = (irq) } }
> -#define ROUTING_ENTRY1(irq) IOAPIC_ROUTING_ENTRY(irq)
> -
> -#define PIC_ROUTING_ENTRY(irq) \
> - { .gsi = irq, .type = KVM_IRQ_ROUTING_IRQCHIP, \
> - .u.irqchip = { .irqchip = SELECT_PIC(irq), .pin = (irq) % 8 } }
> -#define ROUTING_ENTRY2(irq) \
> - IOAPIC_ROUTING_ENTRY(irq), PIC_ROUTING_ENTRY(irq)
> -
> -static const struct kvm_irq_routing_entry default_routing[] = {
> - ROUTING_ENTRY2(0), ROUTING_ENTRY2(1),
> - ROUTING_ENTRY2(2), ROUTING_ENTRY2(3),
> - ROUTING_ENTRY2(4), ROUTING_ENTRY2(5),
> - ROUTING_ENTRY2(6), ROUTING_ENTRY2(7),
> - ROUTING_ENTRY2(8), ROUTING_ENTRY2(9),
> - ROUTING_ENTRY2(10), ROUTING_ENTRY2(11),
> - ROUTING_ENTRY2(12), ROUTING_ENTRY2(13),
> - ROUTING_ENTRY2(14), ROUTING_ENTRY2(15),
> - ROUTING_ENTRY1(16), ROUTING_ENTRY1(17),
> - ROUTING_ENTRY1(18), ROUTING_ENTRY1(19),
> - ROUTING_ENTRY1(20), ROUTING_ENTRY1(21),
> - ROUTING_ENTRY1(22), ROUTING_ENTRY1(23),
> -};
> -
> -int kvm_setup_default_irq_routing(struct kvm *kvm)
> -{
> - return kvm_set_irq_routing(kvm, default_routing,
> - ARRAY_SIZE(default_routing), 0);
> -}
> -
> void kvm_scan_ioapic_irq(struct kvm_vcpu *vcpu, u32 dest_id, u16 dest_mode,
> u8 vector, unsigned long *ioapic_handled_vectors)
> {
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index f9f798f286ce..4a9c252c9dab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7118,12 +7118,6 @@ int kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
> goto create_irqchip_unlock;
> }
>
> - r = kvm_setup_default_irq_routing(kvm);
> - if (r) {
> - kvm_ioapic_destroy(kvm);
> - kvm_pic_destroy(kvm);
> - goto create_irqchip_unlock;
> - }
> /* Write kvm->irq_routing before enabling irqchip_in_kernel. */
> smp_wmb();
> kvm->arch.irqchip_mode = KVM_IRQCHIP_KERNEL;
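To make the point about the PIC concrete, the default table quoted above can be
modelled in user space as below. This is a sketch under assumptions: struct
route and the CHIP_* enum are invented stand-ins for the KVM routing types, the
table is built with a loop instead of the ROUTING_ENTRY macros, and SELECT_PIC()
is assumed to pick the master PIC for IRQs 0-7 and the slave for IRQs 8-15.

```c
#include <stddef.h>

/* Hypothetical stand-ins for KVM_IRQCHIP_PIC_MASTER/_PIC_SLAVE/_IOAPIC. */
enum chip { CHIP_PIC_MASTER, CHIP_PIC_SLAVE, CHIP_IOAPIC };

/* Hypothetical, simplified routing entry: GSI -> (chip, pin). */
struct route {
	unsigned int gsi;
	enum chip chip;
	unsigned int pin;
};

#define NR_IOAPIC_PINS	24
#define NR_PIC_PINS	16
#define MAX_ROUTES	(NR_IOAPIC_PINS + NR_PIC_PINS)

/* Assumed behavior: IRQs 0-7 go to the master PIC, 8-15 to the slave. */
#define SELECT_PIC(irq)	((irq) < 8 ? CHIP_PIC_MASTER : CHIP_PIC_SLAVE)

/*
 * Fill 'out' (at least MAX_ROUTES entries) in the same order as the quoted
 * table: every GSI gets an I/O APIC entry, and GSIs 0-15 additionally get a
 * PIC entry. Returns the number of entries written.
 */
static inline size_t build_default_routes(struct route *out)
{
	size_t n = 0;

	for (unsigned int irq = 0; irq < NR_IOAPIC_PINS; irq++) {
		out[n++] = (struct route){ irq, CHIP_IOAPIC, irq };
		if (irq < NR_PIC_PINS)
			out[n++] = (struct route){ irq, SELECT_PIC(irq), irq % 8 };
	}
	return n;
}

/* Count how many entries target either PIC rather than the I/O APIC. */
static inline size_t count_pic_routes(const struct route *r, size_t n)
{
	size_t pic = 0;

	for (size_t i = 0; i < n; i++)
		if (r[i].chip != CHIP_IOAPIC)
			pic++;
	return pic;
}
```

In this model 16 of the 40 default entries target the PIC, which is why a
routing setup owned solely by the I/O APIC init path reads a bit oddly.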
^ permalink raw reply [flat|nested] 32+ messages in thread
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-05-29 23:08 ` Sean Christopherson
2025-05-29 23:55 ` Huang, Kai
@ 2025-06-04 16:54 ` Paolo Bonzini
2025-06-19 10:05 ` Naveen N Rao
1 sibling, 1 reply; 32+ messages in thread
From: Paolo Bonzini @ 2025-06-04 16:54 UTC (permalink / raw)
To: Sean Christopherson, Kai Huang
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
vkuznets@redhat.com
On 5/30/25 01:08, Sean Christopherson wrote:
> On Thu, May 29, 2025, Kai Huang wrote:
>> On Thu, 2025-05-29 at 07:31 -0700, Sean Christopherson wrote:
>>> On Thu, May 29, 2025, Kai Huang wrote:
>>>> On Thu, 2025-05-29 at 23:55 +1200, Kai Huang wrote:
>>>>> Do they only support userspace IRQ chip, or not support any IRQ chip at all?
>>>
>>> The former, only userspace I/O APIC (and associated devices), though some VM
>>> shapes, e.g. TDX, don't provide an I/O APIC or PIC.
>>
>> Thanks for the info.
>>
>> Just wondering what's the benefit of using userspace IRQCHIP instead of
>> emulating in the kernel?
>
> Reduced kernel attack surface (this was especially true years ago, before KVM's
> I/O APIC emulation was well-tested) and more flexibility (e.g. shipping userspace
> changes is typically easier than shipping new kernels). I'm pretty sure there's
> one more big one that I'm blanking on at the moment.
Feature-wise, the big one is support for IRQ remapping which is not
implemented in the in-kernel IOAPIC.
>>>> Forgot to ask:
>>>>
>>>> Since this new Kconfig option is not only for IOAPIC but also includes PIC and
>>>> PIT, is CONFIG_KVM_IRQCHIP a better name?
>>>
>>> I much prefer IOAPIC, because IRQCHIP is far too ambiguous and confusing, e.g.
>>> just look at KVM's internal APIs, where these:
>>>
>>> irqchip_in_kernel()
>>> irqchip_kernel()
>>>
>>> are not equivalent. In practice, no modern guest kernel is going to utilize the
>>> PIC, and the PIT isn't an IRQ chip, i.e. isn't strictly covered by IRQCHIP either.
>>
>> Right.
>>
>> Maybe it is worth having dedicated Kconfigs for the PIC, PIT and IOAPIC?
>
> Nah. PIC and I/O APIC can't be split (without new uAPI and non-trivial complexity),
> and I highly doubt there is any use case that would want an in-kernel I/O APIC
> with a userspace PIT. I.e. in practice, the three almost always come as a group;
> either a setup wants all, or a setup wants none.
Without "almost", even. I think it's okay to make it CONFIG_KVM_IOAPIC;
it's not super accurate, but there's no single word that conveys "IOAPIC,
PIC and PIT".
>> Btw, I also find irqchip_in_kernel() and irqchip_kernel() confusing. I am not
>> sure of the value of having irqchip_in_kernel(), in fact. Modern guests should
>> always have an in-kernel APIC. I am wondering whether we can get rid of it
>> completely (the logic being that it is always true), or have a Kconfig to only
>> build it when the user truly wants it.
irqchip_kernel() can be renamed to irqchip_full().
> For better or worse, an in-kernel local APIC is still optional. I do hope/want
> to make it mandatory, but that's not a small ABI change.
I am pretty sure that some users (was it DOSBox? or maybe even gVisor?)
would break.
Paolo
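
To illustrate why irqchip_in_kernel() and irqchip_kernel() are not
equivalent, here is a minimal standalone sketch. It assumes a three-state
IRQ chip mode (none, full in-kernel, split); the enum, the struct vm_model
type, and the helper bodies are simplified stand-ins written for this
example, not the kernel's actual definitions. irqchip_full() is only the
rename proposed above and does not exist in this form.

```c
#include <stdbool.h>

/* Hypothetical model of a VM's IRQ chip mode (not the kernel's enum). */
enum irqchip_mode {
	IRQCHIP_NONE,   /* local APIC, I/O APIC and PIC all live in userspace */
	IRQCHIP_KERNEL, /* "full": local APIC, I/O APIC and PIC in the kernel */
	IRQCHIP_SPLIT,  /* local APIC in the kernel, I/O APIC and PIC in userspace */
};

/* Hypothetical stand-in for struct kvm, holding only the mode. */
struct vm_model {
	enum irqchip_mode mode;
};

/* True if *any* part of the IRQ chip is in the kernel (full or split). */
static inline bool irqchip_in_kernel(const struct vm_model *vm)
{
	return vm->mode != IRQCHIP_NONE;
}

/* True only for the full in-kernel chip, i.e. in-kernel I/O APIC and PIC. */
static inline bool irqchip_kernel(const struct vm_model *vm)
{
	return vm->mode == IRQCHIP_KERNEL;
}

/* The rename suggested in this thread, shown as an alias for clarity. */
static inline bool irqchip_full(const struct vm_model *vm)
{
	return irqchip_kernel(vm);
}

/* True when only the local APIC is emulated in the kernel. */
static inline bool irqchip_split(const struct vm_model *vm)
{
	return vm->mode == IRQCHIP_SPLIT;
}
```

Under this model, a split-irqchip VM answers "yes" to irqchip_in_kernel()
but "no" to irqchip_kernel(), which is the ambiguity that makes IRQCHIP a
confusing name for the Kconfig.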
* Re: [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c
2025-05-19 23:27 [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Sean Christopherson
` (15 preceding siblings ...)
2025-05-29 11:58 ` [PATCH 00/15] KVM: x86: Add I/O APIC kconfig, delete irq_comm.c Huang, Kai
@ 2025-06-04 16:56 ` Paolo Bonzini
16 siblings, 0 replies; 32+ messages in thread
From: Paolo Bonzini @ 2025-06-04 16:56 UTC (permalink / raw)
To: Sean Christopherson, Vitaly Kuznetsov; +Cc: kvm, linux-kernel
On 5/20/25 01:27, Sean Christopherson wrote:
> This series is prep work for the big device posted IRQs overhaul[1], in which
> Paolo suggested getting rid of arch/x86/kvm/irq_comm.c[2]. As I started
> chipping away bits of irq_comm.c to make the final code movement to irq.c as
> small as possible, I realized that (a) a rather large amount of irq_comm.c was
> actually I/O APIC code and (b) this would be a perfect opportunity to further
> isolate the I/O APIC code.
>
> So, a bit of hacking later and voila, CONFIG_KVM_IOAPIC. Similar to KVM's SMM
> and Xen Kconfigs, this is something we would enable in production straightaway,
> if we could magically fast-forward our kernel, as fully disabling I/O APIC
> emulation puts a decent chunk of guest-visible surface entirely out of reach.
>
> Side topic, Paolo's recollection that irq_comm.c was to hold common APIs between
> x86 and Itanium was spot on. Though when I read Paolo's mail, I parsed "ia64"
> as x86-64. I got quite a good laugh when I eventually realized that he really
> did mean ia64 :-)
I totally did!
Looks good, other than the small comments here and there that you
received and my "preference" for keeping kvm_setup_default_irq_routing()
a separate function.
Thanks,
Paolo
* Re: [PATCH 11/15] KVM: x86: Add CONFIG_KVM_IOAPIC to allow disabling in-kernel I/O APIC
2025-06-04 16:54 ` Paolo Bonzini
@ 2025-06-19 10:05 ` Naveen N Rao
0 siblings, 0 replies; 32+ messages in thread
From: Naveen N Rao @ 2025-06-19 10:05 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Sean Christopherson, Kai Huang, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org, vkuznets@redhat.com
[Sorry for bumping an old thread]
On Wed, Jun 04, 2025 at 06:54:44PM +0200, Paolo Bonzini wrote:
> On 5/30/25 01:08, Sean Christopherson wrote:
> > On Thu, May 29, 2025, Kai Huang wrote:
> > > On Thu, 2025-05-29 at 07:31 -0700, Sean Christopherson wrote:
> > > > On Thu, May 29, 2025, Kai Huang wrote:
> > > > > On Thu, 2025-05-29 at 23:55 +1200, Kai Huang wrote:
> > > > > > Do they only support userspace IRQ chip, or not support any IRQ chip at all?
> > > >
> > > > The former, only userspace I/O APIC (and associated devices), though some VM
> > > > shapes, e.g. TDX, don't provide an I/O APIC or PIC.
> > >
> > > Thanks for the info.
> > >
> > > Just wondering what's the benefit of using userspace IRQCHIP instead of
> > > emulating in the kernel?
> >
> > Reduced kernel attack surface (this was especially true years ago, before KVM's
> > I/O APIC emulation was well-tested) and more flexibility (e.g. shipping userspace
> > > changes is typically easier than shipping new kernels). I'm pretty sure there's
> > one more big one that I'm blanking on at the moment.
>
> Feature-wise, the big one is support for IRQ remapping which is not
> implemented in the in-kernel IOAPIC.
Is there a reason to prefer the in-kernel IOAPIC today, seeing as it is
still the default with QEMU?
Thanks,
Naveen