* [PATCH v2 01/39] irqchip/gic-v5: Allow KVM setup without a maintenance IRQ
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
@ 2026-05-21 14:49 ` Sascha Bischoff
2026-05-21 14:49 ` [PATCH v2 02/39] irqchip/gic-v5: Provide OF IRS config frame attrs to KVM Sascha Bischoff
From: Sascha Bischoff @ 2026-05-21 14:49 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
GICv5 does not require a virtual CPU interface maintenance interrupt
for native GCIE operation. The interrupt is only needed when
FEAT_GCIE_LEGACY is present, as the legacy GICv3 interface still
relies on maintenance IRQ delivery.
Stop rejecting KVM setup solely because the maintenance interrupt is
absent. Parse the interrupt if present, but if none is described and
the system does not advertise FEAT_GCIE_LEGACY, tell KVM that no
maintenance interrupt is required.
This allows native GICv5 KVM support to be registered on systems that
do not provide a maintenance interrupt, while still requiring one on
GICv3-capable systems, i.e. those that implement FEAT_GCIE_LEGACY.
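The decision described above can be modelled as a small user-space sketch. It assumes, as the patch does, that a maintenance interrupt number of 0 means none was described (`irq_of_parse_and_map()` returns 0 on failure), and it reduces the `ARM64_HAS_GICV5_LEGACY` capability check to a plain boolean. `struct kvm_mi_info`, `setup_mi()` and `kvm_can_register()` are hypothetical names; the last one is only meant to approximate the check KVM's vGIC initialisation applies to `gic_kvm_info`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two struct gic_kvm_info fields involved. */
struct kvm_mi_info {
	unsigned int maint_irq;  /* 0: no maintenance IRQ was described */
	bool no_maint_irq_mask;  /* true: tell KVM no MI is required */
};

/* Mirrors the patch: waive the MI only if none exists and there is no legacy support. */
static struct kvm_mi_info setup_mi(unsigned int maint_irq, bool has_gcie_legacy)
{
	struct kvm_mi_info info = {
		.maint_irq = maint_irq,
		.no_maint_irq_mask = false,
	};

	if (!has_gcie_legacy && !info.maint_irq)
		info.no_maint_irq_mask = true;

	return info;
}

/* Hypothetical approximation of KVM's check: an MI is mandatory unless waived. */
static bool kvm_can_register(const struct kvm_mi_info *info)
{
	return info->no_maint_irq_mask || info->maint_irq != 0;
}
```

With these rules, a system without FEAT_GCIE_LEGACY and without a described interrupt is accepted, while a FEAT_GCIE_LEGACY system without one is still rejected.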
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
drivers/irqchip/irq-gic-v5.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v5.c b/drivers/irqchip/irq-gic-v5.c
index e9d1795235a66..600726b5c0a46 100644
--- a/drivers/irqchip/irq-gic-v5.c
+++ b/drivers/irqchip/irq-gic-v5.c
@@ -1141,12 +1141,19 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
gic_v5_kvm_info.type = GIC_V5;
/* GIC Virtual CPU interface maintenance interrupt */
- gic_v5_kvm_info.no_maint_irq_mask = false;
gic_v5_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
- if (!gic_v5_kvm_info.maint_irq) {
- pr_warn("cannot find GICv5 virtual CPU interface maintenance interrupt\n");
- return;
- }
+
+ /*
+ * We require an MI if we have legacy support, but don't, otherwise.
+ * Given that there's an existing flag to convey that an MI isn't
+ * needed, we (ab)use it to tell KVM that the MI isn't needed if we
+ * don't support legacy.
+ *
+ * The check for ARM64_HAS_GICV5_LEGACY explicitly doesn't use
+ * cpus_have_final_cap() here as we run too early.
+ */
+ if (!cpus_have_cap(ARM64_HAS_GICV5_LEGACY) && !gic_v5_kvm_info.maint_irq)
+ gic_v5_kvm_info.no_maint_irq_mask = true;
vgic_set_kvm_info(&gic_v5_kvm_info);
}
--
2.34.1
* [PATCH v2 02/39] irqchip/gic-v5: Provide OF IRS config frame attrs to KVM
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
2026-05-21 14:49 ` [PATCH v2 01/39] irqchip/gic-v5: Allow KVM setup without a maintenance IRQ Sascha Bischoff
@ 2026-05-21 14:49 ` Sascha Bischoff
2026-05-21 14:50 ` [PATCH v2 03/39] irqchip/gic-v5: Setup gic_kvm_info on ACPI hosts Sascha Bischoff
From: Sascha Bischoff @ 2026-05-21 14:49 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
KVM needs to interact with the host IRS in order to, for example, make
VMs or VPEs valid. There are two possible approaches: either the host
irqchip driver provides an interface, or KVM interacts directly with
the host IRS. The latter is chosen because the set of MMIO registers
that KVM needs to access is disjoint from the set used by the host
irqchip driver, with the exception of some of the read-only IRS_IDRx
registers.
Pass KVM a pointer to an IRS config frame in struct gic_kvm_info. The
config frame of any IRS will do, as long as the same IRS's config frame
is used consistently. Additionally, include a flag telling KVM whether
the IRS is coherent or non-coherent, so that KVM can perform the
correct cache maintenance where required.
This change only supports OF (Device Tree); ACPI is not covered.
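A user-space sketch of what the host hands to KVM, assuming a plain array stands in for the memory-mapped IRS config frame. `struct irs_chip_data`, `struct kvm_irs_info`, `export_irs()`, `irs_read32()` and `needs_cache_clean()` are hypothetical simplifications of `struct gicv5_irs_chip_data` and the new `gicv5_irs` member of `struct gic_kvm_info`; real kernel code would keep a `void __iomem *` and use `readl_relaxed()` rather than dereferencing a raw pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRS_FLAGS_NON_COHERENT (1UL << 0) /* same bit as in the patch */

/* Hypothetical, simplified view of struct gicv5_irs_chip_data. */
struct irs_chip_data {
	volatile uint32_t *irs_base; /* void __iomem * in the kernel */
	unsigned long flags;
};

/* Hypothetical mirror of the gicv5_irs member added to struct gic_kvm_info. */
struct kvm_irs_info {
	volatile uint32_t *base;
	bool non_coherent;
};

static struct kvm_irs_info export_irs(const struct irs_chip_data *irs)
{
	struct kvm_irs_info info = {
		.base = irs->irs_base,
		.non_coherent = !!(irs->flags & IRS_FLAGS_NON_COHERENT),
	};

	return info;
}

/* KVM-side access to a 32-bit register at a byte offset within the frame. */
static uint32_t irs_read32(const struct kvm_irs_info *info, uint32_t offset)
{
	return info->base[offset / sizeof(uint32_t)];
}

/* Non-coherent IRS: memory it reads must first be cleaned from the CPU caches. */
static bool needs_cache_clean(const struct kvm_irs_info *info)
{
	return info->non_coherent;
}
```

In C, assigning any non-zero value to a `bool` already yields `true`, so the `!!` used in the patch is a readability idiom rather than a correctness requirement.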
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
drivers/irqchip/irq-gic-v5-irs.c | 7 +++++--
drivers/irqchip/irq-gic-v5.c | 5 +++++
include/linux/irqchip/arm-gic-v5.h | 3 +++
include/linux/irqchip/arm-vgic-info.h | 5 +++++
4 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v5-irs.c b/drivers/irqchip/irq-gic-v5-irs.c
index f3fce0b1e25d9..607e066821b52 100644
--- a/drivers/irqchip/irq-gic-v5-irs.c
+++ b/drivers/irqchip/irq-gic-v5-irs.c
@@ -21,8 +21,6 @@
*/
#define LPI_ID_BITS_LINEAR 12
-#define IRS_FLAGS_NON_COHERENT BIT(0)
-
static DEFINE_PER_CPU_READ_MOSTLY(struct gicv5_irs_chip_data *, per_cpu_irs_data);
static LIST_HEAD(irs_nodes);
@@ -50,6 +48,11 @@ static void irs_writeq_relaxed(struct gicv5_irs_chip_data *irs_data,
writeq_relaxed(val, irs_data->irs_base + reg_offset);
}
+struct gicv5_irs_chip_data *gicv5_irs_get_chip_data(void)
+{
+ return per_cpu(per_cpu_irs_data, 0);
+}
+
/*
* The polling wait (in gicv5_wait_for_op_s_atomic()) on a GIC register
* provides the memory barriers (through MMIO accessors)
diff --git a/drivers/irqchip/irq-gic-v5.c b/drivers/irqchip/irq-gic-v5.c
index 600726b5c0a46..707deabbf2f63 100644
--- a/drivers/irqchip/irq-gic-v5.c
+++ b/drivers/irqchip/irq-gic-v5.c
@@ -1128,6 +1128,8 @@ static struct gic_kvm_info gic_v5_kvm_info __initdata;
static void __init gic_of_setup_kvm_info(struct device_node *node)
{
+ struct gicv5_irs_chip_data *irs_data = gicv5_irs_get_chip_data();
+
/*
* If we don't have native GICv5 virtualisation support, then
* we also don't have FEAT_GCIE_LEGACY - the architecture
@@ -1140,6 +1142,9 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
gic_v5_kvm_info.type = GIC_V5;
+ gic_v5_kvm_info.gicv5_irs.base = irs_data->irs_base;
+ gic_v5_kvm_info.gicv5_irs.non_coherent = !!(irs_data->flags & IRS_FLAGS_NON_COHERENT);
+
/* GIC Virtual CPU interface maintenance interrupt */
gic_v5_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
index f78787e654f4c..681c5c51207d6 100644
--- a/include/linux/irqchip/arm-gic-v5.h
+++ b/include/linux/irqchip/arm-gic-v5.h
@@ -330,6 +330,8 @@ struct gicv5_irs_chip_data {
raw_spinlock_t spi_config_lock;
};
+#define IRS_FLAGS_NON_COHERENT BIT(0)
+
static inline int gicv5_wait_for_op_s_atomic(void __iomem *addr, u32 offset,
const char *reg_s, u32 mask,
u32 *val)
@@ -377,6 +379,7 @@ void __init gicv5_free_lpi_domain(void);
int gicv5_irs_of_probe(struct device_node *parent);
int gicv5_irs_acpi_probe(void);
+struct gicv5_irs_chip_data *gicv5_irs_get_chip_data(void);
void gicv5_irs_remove(void);
int gicv5_irs_enable(void);
void gicv5_irs_its_probe(void);
diff --git a/include/linux/irqchip/arm-vgic-info.h b/include/linux/irqchip/arm-vgic-info.h
index 67d9d960273b9..f05370e2debf4 100644
--- a/include/linux/irqchip/arm-vgic-info.h
+++ b/include/linux/irqchip/arm-vgic-info.h
@@ -38,6 +38,11 @@ struct gic_kvm_info {
bool has_v4_1;
/* Deactivation impared, subpar stuff */
bool no_hw_deactivation;
+ /* GICv5 IRS base */
+ struct {
+ void __iomem *base;
+ bool non_coherent;
+ } gicv5_irs;
};
#ifdef CONFIG_KVM
--
2.34.1
* [PATCH v2 03/39] irqchip/gic-v5: Setup gic_kvm_info on ACPI hosts
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
2026-05-21 14:49 ` [PATCH v2 01/39] irqchip/gic-v5: Allow KVM setup without a maintenance IRQ Sascha Bischoff
2026-05-21 14:49 ` [PATCH v2 02/39] irqchip/gic-v5: Provide OF IRS config frame attrs to KVM Sascha Bischoff
@ 2026-05-21 14:50 ` Sascha Bischoff
2026-05-27 10:51 ` Marc Zyngier
2026-05-21 14:50 ` [PATCH v2 04/39] KVM: arm64: gic-v5: Define remaining IRS MMIO registers Sascha Bischoff
From: Sascha Bischoff @ 2026-05-21 14:50 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Device-tree based GICv5 probing already passes the IRS details and
maintenance interrupt to KVM, but the ACPI path only initialises the
irqchip and installs the ACPI IRQ model. As a result, KVM never sees
the GICv5 host information required to probe the vGIC on ACPI systems.
Add the ACPI equivalent of the DT KVM setup. Parse the MADT GICC
entries for the maintenance interrupt, require all relevant entries to
agree, register the interrupt as a GICv5 PPI-encoded GSI, and pass the
resulting IRQ together with the IRS base and coherency information to
KVM. Native GICv5 does not require a maintenance interrupt unless the
legacy GICv3-compatible CPU interface is present, so preserve the
existing no-maintenance-IRQ handling for that case.
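The two ACPI-specific steps, checking that all usable MADT GICC entries agree on the maintenance interrupt and encoding it as a PPI GSI, can be sketched in user space as follows. The sketch assumes the GICv5 INTID layout used by the kernel's `GICV5_HWIRQ_TYPE` (bits [31:29]) and `GICV5_HWIRQ_ID` (bits [23:0]) masks, with a PPI type value of 1; `struct gicc_entry`, `collect_maint_irq()` and `ppi_gsi()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed GICv5 INTID layout: TYPE in bits [31:29], ID in bits [23:0]. */
#define HWIRQ_TYPE_SHIFT 29
#define HWIRQ_ID_MASK    0x00ffffffu
#define HWIRQ_TYPE_PPI   0x1u

enum trigger { TRIGGER_LEVEL, TRIGGER_EDGE };

/* Hypothetical, trimmed-down view of one MADT GICC entry. */
struct gicc_entry {
	bool usable;        /* ENABLED or ONLINE_CAPABLE set */
	uint32_t vgic_irq;  /* maintenance interrupt PPI ID, 0 if none */
	enum trigger mode;
};

/* Returns 0 with *irq/*mode filled in if all usable entries agree, else -1. */
static int collect_maint_irq(const struct gicc_entry *e, size_t n,
			     uint32_t *irq, enum trigger *mode)
{
	bool first = true;

	*irq = 0;
	*mode = TRIGGER_LEVEL;

	for (size_t i = 0; i < n; i++) {
		if (!e[i].usable)
			continue;

		if (first) {
			first = false;
			*irq = e[i].vgic_irq;
			*mode = e[i].mode;
			continue;
		}

		if (e[i].vgic_irq != *irq || e[i].mode != *mode)
			return -1;
	}

	return 0;
}

/* Encode a PPI ID as a GICv5 GSI, as the patch does with FIELD_PREP(). */
static uint32_t ppi_gsi(uint32_t id)
{
	assert(id <= HWIRQ_ID_MASK);
	return (HWIRQ_TYPE_PPI << HWIRQ_TYPE_SHIFT) | id;
}
```

Unlike the patch, which tracks the first entry in a function-local `static`, the sketch keeps that state in a local variable, so it can safely be called more than once.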
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
drivers/irqchip/irq-gic-v5.c | 103 +++++++++++++++++++++++++++++++++--
1 file changed, 98 insertions(+), 5 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v5.c b/drivers/irqchip/irq-gic-v5.c
index 707deabbf2f63..ccd1ec69a6ab2 100644
--- a/drivers/irqchip/irq-gic-v5.c
+++ b/drivers/irqchip/irq-gic-v5.c
@@ -1126,7 +1126,7 @@ static void gicv5_set_cpuif_idbits(void)
#ifdef CONFIG_KVM
static struct gic_kvm_info gic_v5_kvm_info __initdata;
-static void __init gic_of_setup_kvm_info(struct device_node *node)
+static void __init gic_setup_kvm_info(unsigned int maint_irq)
{
struct gicv5_irs_chip_data *irs_data = gicv5_irs_get_chip_data();
@@ -1140,13 +1140,14 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
return;
}
- gic_v5_kvm_info.type = GIC_V5;
+ if (WARN_ON(!irs_data))
+ return;
+ gic_v5_kvm_info.type = GIC_V5;
gic_v5_kvm_info.gicv5_irs.base = irs_data->irs_base;
gic_v5_kvm_info.gicv5_irs.non_coherent = !!(irs_data->flags & IRS_FLAGS_NON_COHERENT);
-
- /* GIC Virtual CPU interface maintenance interrupt */
- gic_v5_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
+ gic_v5_kvm_info.maint_irq = maint_irq;
+ gic_v5_kvm_info.no_maint_irq_mask = false;
/*
* We require an MI if we have legacy support, but don't, otherwise.
@@ -1162,10 +1163,101 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
vgic_set_kvm_info(&gic_v5_kvm_info);
}
+
+static void __init gic_of_setup_kvm_info(struct device_node *node)
+{
+ /* GIC Virtual CPU interface maintenance interrupt */
+ gic_setup_kvm_info(irq_of_parse_and_map(node, 0));
+}
+
+#ifdef CONFIG_ACPI
+struct gicv5_acpi_kvm_info {
+ u32 maint_irq;
+ int maint_irq_mode;
+};
+
+static struct gicv5_acpi_kvm_info acpi_v5_kvm_info __initdata;
+
+static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *header,
+ const unsigned long end)
+{
+ struct acpi_madt_generic_interrupt *gicc =
+ (struct acpi_madt_generic_interrupt *)header;
+ static int first_madt = true;
+ int maint_irq_mode;
+
+ if (!(gicc->flags &
+ (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
+ return 0;
+
+ maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ?
+ ACPI_EDGE_SENSITIVE : ACPI_LEVEL_SENSITIVE;
+
+ if (first_madt) {
+ first_madt = false;
+
+ acpi_v5_kvm_info.maint_irq = gicc->vgic_interrupt;
+ acpi_v5_kvm_info.maint_irq_mode = maint_irq_mode;
+ return 0;
+ }
+
+ /* The maintenance interrupt must be the same for every GICC entry. */
+ if (acpi_v5_kvm_info.maint_irq != gicc->vgic_interrupt ||
+ acpi_v5_kvm_info.maint_irq_mode != maint_irq_mode)
+ return -EINVAL;
+
+ return 0;
+}
+
+static bool __init gic_acpi_collect_virt_info(void)
+{
+ int count;
+
+ acpi_v5_kvm_info.maint_irq = 0;
+ acpi_v5_kvm_info.maint_irq_mode = 0;
+
+ count = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
+ gic_acpi_parse_virt_madt_gicc, 0);
+
+ return count > 0;
+}
+
+static void __init gic_acpi_setup_kvm_info(void)
+{
+ unsigned int maint_irq = 0;
+ int irq;
+
+ if (!gic_acpi_collect_virt_info()) {
+ pr_warn("Unable to get hardware information used for virtualization\n");
+ return;
+ }
+
+ if (acpi_v5_kvm_info.maint_irq) {
+ u32 gsi = FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_PPI) |
+ FIELD_PREP(GICV5_HWIRQ_ID, acpi_v5_kvm_info.maint_irq);
+
+ irq = acpi_register_gsi(NULL, gsi,
+ acpi_v5_kvm_info.maint_irq_mode,
+ ACPI_ACTIVE_HIGH);
+ if (irq <= 0)
+ return;
+
+ maint_irq = irq;
+ }
+
+ gic_setup_kvm_info(maint_irq);
+}
+#endif
#else
static inline void __init gic_of_setup_kvm_info(struct device_node *node)
{
}
+
+#ifdef CONFIG_ACPI
+static inline void __init gic_acpi_setup_kvm_info(void)
+{
+}
+#endif
#endif // CONFIG_KVM
static int __init gicv5_init_common(struct fwnode_handle *parent_domain)
@@ -1264,6 +1356,7 @@ static int __init gic_acpi_init(union acpi_subtable_headers *header, const unsig
goto out_irs;
acpi_set_irq_model(ACPI_IRQ_MODEL_GIC_V5, gic_v5_get_gsi_domain_id);
+ gic_acpi_setup_kvm_info();
return 0;
--
2.34.1
* Re: [PATCH v2 03/39] irqchip/gic-v5: Setup gic_kvm_info on ACPI hosts
2026-05-21 14:50 ` [PATCH v2 03/39] irqchip/gic-v5: Setup gic_kvm_info on ACPI hosts Sascha Bischoff
@ 2026-05-27 10:51 ` Marc Zyngier
From: Marc Zyngier @ 2026-05-27 10:51 UTC
To: Sascha Bischoff
Cc: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org, nd, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
On Thu, 21 May 2026 15:50:09 +0100,
Sascha Bischoff <Sascha.Bischoff@arm.com> wrote:
>
> Device-tree based GICv5 probing already passes the IRS details and
> maintenance interrupt to KVM, but the ACPI path only initialises the
> irqchip and installs the ACPI IRQ model. As a result, KVM never sees
> the GICv5 host information required to probe the vGIC on ACPI systems.
>
> Add the ACPI equivalent of the DT KVM setup. Parse the MADT GICC
> entries for the maintenance interrupt, require all relevant entries to
> agree, register the interrupt as a GICv5 PPI-encoded GSI, and pass the
> resulting IRQ together with the IRS base and coherency information to
> KVM. Native GICv5 does not require a maintenance interrupt unless the
> legacy GICv3-compatible CPU interface is present, so preserve the
> existing no-maintenance-IRQ handling for that case.
>
> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
> ---
> drivers/irqchip/irq-gic-v5.c | 103 +++++++++++++++++++++++++++++++++--
> 1 file changed, 98 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v5.c b/drivers/irqchip/irq-gic-v5.c
> index 707deabbf2f63..ccd1ec69a6ab2 100644
> --- a/drivers/irqchip/irq-gic-v5.c
> +++ b/drivers/irqchip/irq-gic-v5.c
> @@ -1126,7 +1126,7 @@ static void gicv5_set_cpuif_idbits(void)
> #ifdef CONFIG_KVM
> static struct gic_kvm_info gic_v5_kvm_info __initdata;
>
> -static void __init gic_of_setup_kvm_info(struct device_node *node)
> +static void __init gic_setup_kvm_info(unsigned int maint_irq)
> {
> struct gicv5_irs_chip_data *irs_data = gicv5_irs_get_chip_data();
>
> @@ -1140,13 +1140,14 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
> return;
> }
>
> - gic_v5_kvm_info.type = GIC_V5;
> + if (WARN_ON(!irs_data))
> + return;
>
> + gic_v5_kvm_info.type = GIC_V5;
> gic_v5_kvm_info.gicv5_irs.base = irs_data->irs_base;
> gic_v5_kvm_info.gicv5_irs.non_coherent = !!(irs_data->flags & IRS_FLAGS_NON_COHERENT);
> -
> - /* GIC Virtual CPU interface maintenance interrupt */
> - gic_v5_kvm_info.maint_irq = irq_of_parse_and_map(node, 0);
> + gic_v5_kvm_info.maint_irq = maint_irq;
> + gic_v5_kvm_info.no_maint_irq_mask = false;
You remove this last line from patch #1, and reintroduce it here. My
gut feeling is that it should never have been removed in the first place.
>
> /*
> * We require an MI if we have legacy support, but don't, otherwise.
> @@ -1162,10 +1163,101 @@ static void __init gic_of_setup_kvm_info(struct device_node *node)
>
> vgic_set_kvm_info(&gic_v5_kvm_info);
> }
> +
> +static void __init gic_of_setup_kvm_info(struct device_node *node)
> +{
> + /* GIC Virtual CPU interface maintenance interrupt */
> + gic_setup_kvm_info(irq_of_parse_and_map(node, 0));
> +}
> +
> +#ifdef CONFIG_ACPI
> +struct gicv5_acpi_kvm_info {
> + u32 maint_irq;
> + int maint_irq_mode;
> +};
> +
> +static struct gicv5_acpi_kvm_info acpi_v5_kvm_info __initdata;
> +
> +static int __init gic_acpi_parse_virt_madt_gicc(union acpi_subtable_headers *header,
> + const unsigned long end)
> +{
> + struct acpi_madt_generic_interrupt *gicc =
> + (struct acpi_madt_generic_interrupt *)header;
> + static int first_madt = true;
> + int maint_irq_mode;
> +
> + if (!(gicc->flags &
> + (ACPI_MADT_ENABLED | ACPI_MADT_GICC_ONLINE_CAPABLE)))
> + return 0;
> +
> + maint_irq_mode = (gicc->flags & ACPI_MADT_VGIC_IRQ_MODE) ?
> + ACPI_EDGE_SENSITIVE : ACPI_LEVEL_SENSITIVE;
> +
> + if (first_madt) {
> + first_madt = false;
> +
> + acpi_v5_kvm_info.maint_irq = gicc->vgic_interrupt;
> + acpi_v5_kvm_info.maint_irq_mode = maint_irq_mode;
> + return 0;
> + }
> +
> + /* The maintenance interrupt must be the same for every GICC entry. */
> + if (acpi_v5_kvm_info.maint_irq != gicc->vgic_interrupt ||
> + acpi_v5_kvm_info.maint_irq_mode != maint_irq_mode)
> + return -EINVAL;
> +
> + return 0;
> +}
> +
> +static bool __init gic_acpi_collect_virt_info(void)
> +{
> + int count;
> +
> + acpi_v5_kvm_info.maint_irq = 0;
> + acpi_v5_kvm_info.maint_irq_mode = 0;
> +
> + count = acpi_table_parse_madt(ACPI_MADT_TYPE_GENERIC_INTERRUPT,
> + gic_acpi_parse_virt_madt_gicc, 0);
> +
> + return count > 0;
> +}
> +
> +static void __init gic_acpi_setup_kvm_info(void)
> +{
> + unsigned int maint_irq = 0;
> + int irq;
> +
> + if (!gic_acpi_collect_virt_info()) {
> + pr_warn("Unable to get hardware information used for virtualization\n");
> + return;
> + }
> +
> + if (acpi_v5_kvm_info.maint_irq) {
> + u32 gsi = FIELD_PREP(GICV5_HWIRQ_TYPE, GICV5_HWIRQ_TYPE_PPI) |
> + FIELD_PREP(GICV5_HWIRQ_ID, acpi_v5_kvm_info.maint_irq);
> +
> + irq = acpi_register_gsi(NULL, gsi,
> + acpi_v5_kvm_info.maint_irq_mode,
> + ACPI_ACTIVE_HIGH);
> + if (irq <= 0)
> + return;
This probably deserves a bit of a warning. And maybe not completely
fail the registration with KVM?
Thanks,
M.
--
Without deviation from the norm, progress is not possible.
* [PATCH v2 04/39] KVM: arm64: gic-v5: Define remaining IRS MMIO registers
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
2026-05-21 14:50 ` [PATCH v2 03/39] irqchip/gic-v5: Setup gic_kvm_info on ACPI hosts Sascha Bischoff
@ 2026-05-21 14:50 ` Sascha Bischoff
2026-05-21 14:50 ` [PATCH v2 05/39] arm64/sysreg: Add GICv5 GIC VDPEND and VDRCFG encodings Sascha Bischoff
From: Sascha Bischoff @ 2026-05-21 14:50 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Complete the set of IRS MMIO register definitions in the GICv5 header
file. Until now, only the IRS MMIO registers required by existing code
have been defined. However, in order to properly emulate the IRS MMIO
interface in KVM, the full set of IRS MMIO registers needs to be
described.
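As an illustration of how the new masks might be consumed, the sketch below decodes IRS_AIDR and IRS_IDR0 values and builds an IRS_IDR0 value from fields. It uses local stand-ins for the kernel's `GENMASK()`, `FIELD_GET()` and `FIELD_PREP()` (relying on the GCC/Clang builtin `__builtin_ctz()`), with mask values copied from this patch. `irs_is_gicv5_0()`, `irs_supports_virt()` and `make_guest_idr0()` are hypothetical helpers; in particular, the IRS_IDR0 contents KVM actually presents to a guest are not defined by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the kernel's GENMASK()/BIT(). */
#define GENMASK32(h, l) ((uint32_t)((~0u >> (31 - (h))) & (~0u << (l))))
#define BIT32(n)        ((uint32_t)1u << (n))

/* Masks and values copied from the patch. */
#define IRS_AIDR_COMPONENT          GENMASK32(11, 8)
#define IRS_AIDR_ARCHMAJORREV       GENMASK32(7, 4)
#define IRS_AIDR_ARCHMINORREV       GENMASK32(3, 0)
#define AIDR_COMPONENT_IRS          0x0u
#define AIDR_ARCH_MAJ_REV_V5        0x0u
#define AIDR_ARCH_MIN_REV_V0        0x0u

#define IRS_IDR0_IRSID              GENMASK32(31, 16)
#define IRS_IDR0_VIRT               BIT32(6)
#define IRS_IDR0_INT_DOM            GENMASK32(1, 0)
#define IRS_IDR0_INT_DOM_NON_SECURE 0x1u

/* FIELD_GET()/FIELD_PREP() stand-ins; mask must be non-zero. */
static uint32_t field_get32(uint32_t mask, uint32_t reg)
{
	return (reg & mask) >> __builtin_ctz(mask);
}

static uint32_t field_prep32(uint32_t mask, uint32_t val)
{
	return (val << __builtin_ctz(mask)) & mask;
}

/* True if AIDR identifies an IRS implementing GICv5.0. */
static bool irs_is_gicv5_0(uint32_t aidr)
{
	return field_get32(IRS_AIDR_COMPONENT, aidr) == AIDR_COMPONENT_IRS &&
	       field_get32(IRS_AIDR_ARCHMAJORREV, aidr) == AIDR_ARCH_MAJ_REV_V5 &&
	       field_get32(IRS_AIDR_ARCHMINORREV, aidr) == AIDR_ARCH_MIN_REV_V0;
}

static bool irs_supports_virt(uint32_t idr0)
{
	return field_get32(IRS_IDR0_VIRT, idr0) != 0;
}

/* Hypothetical: an IDR0 for a Non-secure IRS with the given ID and no VIRT. */
static uint32_t make_guest_idr0(uint32_t irsid)
{
	return field_prep32(IRS_IDR0_IRSID, irsid) |
	       field_prep32(IRS_IDR0_INT_DOM, IRS_IDR0_INT_DOM_NON_SECURE);
}
```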
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
include/linux/irqchip/arm-gic-v5.h | 203 +++++++++++++++++++++++++++--
1 file changed, 194 insertions(+), 9 deletions(-)
diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
index 681c5c51207d6..dd7da568ee8b8 100644
--- a/include/linux/irqchip/arm-gic-v5.h
+++ b/include/linux/irqchip/arm-gic-v5.h
@@ -62,20 +62,34 @@
#define GICV5_OUTER_SHARE 0b10
#define GICV5_INNER_SHARE 0b11
+#define GICV5_AIDR_COMPONENT_IRS 0b00
+#define GICV5_AIDR_COMPONENT_ITS 0b01
+#define GICV5_AIDR_COMPONENT_IWB 0b10
+
+#define GICV5_AIDR_ARCH_MAJ_REV_V5 0
+#define GICV5_AIDR_ARCH_MIN_REV_V0 0
+
/*
* IRS registers and tables structures
*/
#define GICV5_IRS_IDR0 0x0000
#define GICV5_IRS_IDR1 0x0004
#define GICV5_IRS_IDR2 0x0008
+#define GICV5_IRS_IDR3 0x000c
+#define GICV5_IRS_IDR4 0x0010
#define GICV5_IRS_IDR5 0x0014
#define GICV5_IRS_IDR6 0x0018
#define GICV5_IRS_IDR7 0x001c
+#define GICV5_IRS_IIDR 0x0040
+#define GICV5_IRS_AIDR 0x0044
#define GICV5_IRS_CR0 0x0080
#define GICV5_IRS_CR1 0x0084
#define GICV5_IRS_SYNCR 0x00c0
#define GICV5_IRS_SYNC_STATUSR 0x00c4
+#define GICV5_IRS_SPI_VMR 0x0100
#define GICV5_IRS_SPI_SELR 0x0108
+#define GICV5_IRS_SPI_DOMAINR 0x010c
+#define GICV5_IRS_SPI_RESAMPLER 0x0110
#define GICV5_IRS_SPI_CFGR 0x0114
#define GICV5_IRS_SPI_STATUSR 0x0118
#define GICV5_IRS_PE_SELR 0x0140
@@ -85,11 +99,51 @@
#define GICV5_IRS_IST_CFGR 0x0190
#define GICV5_IRS_IST_STATUSR 0x0194
#define GICV5_IRS_MAP_L2_ISTR 0x01c0
-
+#define GICV5_IRS_VMT_BASER 0x0200
+#define GICV5_IRS_VMT_CFGR 0x0210
+#define GICV5_IRS_VMT_STATUSR 0x0214
+#define GICV5_IRS_VPE_SELR 0x0240
+#define GICV5_IRS_VPE_DBR 0x0248
+#define GICV5_IRS_VPE_HPPIR 0x0250
+#define GICV5_IRS_VPE_CR0 0x0258
+#define GICV5_IRS_VPE_STATUSR 0x025c
+#define GICV5_IRS_VM_DBR 0x0280
+#define GICV5_IRS_VM_SELR 0x0288
+#define GICV5_IRS_VM_STATUSR 0x028c
+#define GICV5_IRS_VMAP_L2_VMTR 0x02c0
+#define GICV5_IRS_VMAP_VMR 0x02c8
+#define GICV5_IRS_VMAP_VISTR 0x02d0
+#define GICV5_IRS_VMAP_L2_VISTR 0x02d8
+#define GICV5_IRS_VMAP_VPER 0x02e0
+#define GICV5_IRS_SAVE_VMR 0x0300
+#define GICV5_IRS_SAVE_VM_STATUSR 0x0308
+#define GICV5_IRS_MEC_IDR 0x0340
+#define GICV5_IRS_MEC_MECID_R 0x0344
+#define GICV5_IRS_MPAM_IDR 0x0380
+#define GICV5_IRS_MPAM_PARTID_R 0x0384
+#define GICV5_IRS_SWERR_STATUSR 0x03c0
+#define GICV5_IRS_SWERR_SYNDROMER0 0x03c8
+#define GICV5_IRS_SWERR_SYNDROMER1 0x03d0
+
+#define GICV5_IRS_IDR0_IRSID GENMASK(31, 16)
+#define GICV5_IRS_IDR0_SWE BIT(12)
+#define GICV5_IRS_IDR0_MPAM BIT(11)
+#define GICV5_IRS_IDR0_MEC BIT(10)
+#define GICV5_IRS_IDR0_SETLPI BIT(9)
+#define GICV5_IRS_IDR0_VIRT_ONE_N BIT(8)
+#define GICV5_IRS_IDR0_ONE_N BIT(7)
#define GICV5_IRS_IDR0_VIRT BIT(6)
+#define GICV5_IRS_IDR0_PA_RANGE GENMASK(5, 2)
+#define GICV5_IRS_IDR0_INT_DOM GENMASK(1, 0)
+
+#define GICV5_IRS_IDR0_INT_DOM_SECURE 0b00
+#define GICV5_IRS_IDR0_INT_DOM_NON_SECURE 0b01
+#define GICV5_IRS_IDR0_INT_DOM_EL3 0b10
+#define GICV5_IRS_IDR0_INT_DOM_REALM 0b11
#define GICV5_IRS_IDR1_PRIORITY_BITS GENMASK(22, 20)
#define GICV5_IRS_IDR1_IAFFID_BITS GENMASK(19, 16)
+#define GICV5_IRS_IDR1_PE_CNT GENMASK(15, 0)
#define GICV5_IRS_IDR1_PRIORITY_BITS_1BITS 0b000
#define GICV5_IRS_IDR1_PRIORITY_BITS_2BITS 0b001
@@ -105,13 +159,30 @@
#define GICV5_IRS_IDR2_LPI BIT(5)
#define GICV5_IRS_IDR2_ID_BITS GENMASK(4, 0)
+#define GICV5_IRS_IST_L2SZ_SUPPORT_4KB(r) FIELD_GET(BIT(11), (r))
+#define GICV5_IRS_IST_L2SZ_SUPPORT_16KB(r) FIELD_GET(BIT(12), (r))
+#define GICV5_IRS_IST_L2SZ_SUPPORT_64KB(r) FIELD_GET(BIT(13), (r))
+
+#define GICV5_IRS_IDR3_VMT_LEVELS BIT(10)
+#define GICV5_IRS_IDR3_VM_ID_BITS GENMASK(9, 5)
+#define GICV5_IRS_IDR3_VMD_SZ GENMASK(4, 1)
+#define GICV5_IRS_IDR3_VMD BIT(0)
+
+#define GICV5_IRS_IDR4_VPE_ID_BITS GENMASK(9, 6)
+#define GICV5_IRS_IDR4_VPED_SZ GENMASK(5, 0)
+
#define GICV5_IRS_IDR5_SPI_RANGE GENMASK(24, 0)
#define GICV5_IRS_IDR6_SPI_IRS_RANGE GENMASK(24, 0)
#define GICV5_IRS_IDR7_SPI_BASE GENMASK(23, 0)
-#define GICV5_IRS_IST_L2SZ_SUPPORT_4KB(r) FIELD_GET(BIT(11), (r))
-#define GICV5_IRS_IST_L2SZ_SUPPORT_16KB(r) FIELD_GET(BIT(12), (r))
-#define GICV5_IRS_IST_L2SZ_SUPPORT_64KB(r) FIELD_GET(BIT(13), (r))
+#define GICV5_IRS_IIDR_PRODUCT_ID GENMASK(31, 20)
+#define GICV5_IRS_IIDR_VARIANT GENMASK(19, 16)
+#define GICV5_IRS_IIDR_REVISION GENMASK(15, 12)
+#define GICV5_IRS_IIDR_IMPLEMENTER GENMASK(11, 0)
+
+#define GICV5_IRS_AIDR_COMPONENT GENMASK(11, 8)
+#define GICV5_IRS_AIDR_ARCHMAJORREV GENMASK(7, 4)
+#define GICV5_IRS_AIDR_ARCHMINORREV GENMASK(3, 0)
#define GICV5_IRS_CR0_IDLE BIT(1)
#define GICV5_IRS_CR0_IRSEN BIT(0)
@@ -134,21 +205,39 @@
#define GICV5_IRS_SYNC_STATUSR_IDLE BIT(0)
-#define GICV5_IRS_SPI_STATUSR_V BIT(1)
-#define GICV5_IRS_SPI_STATUSR_IDLE BIT(0)
+#define GICV5_IRS_SPI_VMR_VIRT BIT_ULL(63)
+#define GICV5_IRS_SPI_VMR_VM_ID GENMASK_ULL(15, 0)
#define GICV5_IRS_SPI_SELR_ID GENMASK(23, 0)
+#define GICV5_IRS_SPI_DOMAINR_DOMAIN GENMASK(1, 0)
+
+#define GICV5_IRS_SPI_DOMAINR_DOMAIN_SECURE 0b00
+#define GICV5_IRS_SPI_DOMAINR_DOMAIN_NON_SECURE 0b01
+#define GICV5_IRS_SPI_DOMAINR_DOMAIN_EL3 0b10
+#define GICV5_IRS_SPI_DOMAINR_DOMAIN_REALM 0b11
+
+#define GICV5_IRS_SPI_RESAMPLER_ID GENMASK(23, 0)
+
#define GICV5_IRS_SPI_CFGR_TM BIT(0)
+#define GICV5_IRS_SPI_CFGR_TM_EDGE 0b0
+#define GICV5_IRS_SPI_CFGR_TM_LEVEL 0b1
+
+#define GICV5_IRS_SPI_STATUSR_V BIT(1)
+#define GICV5_IRS_SPI_STATUSR_IDLE BIT(0)
+
#define GICV5_IRS_PE_SELR_IAFFID GENMASK(15, 0)
+#define GICV5_IRS_PE_STATUSR_ONLINE BIT(2)
#define GICV5_IRS_PE_STATUSR_V BIT(1)
#define GICV5_IRS_PE_STATUSR_IDLE BIT(0)
#define GICV5_IRS_PE_CR0_DPS BIT(0)
-#define GICV5_IRS_IST_STATUSR_IDLE BIT(0)
+#define GICV5_IRS_IST_BASER_ADDR_MASK GENMASK_ULL(55, 6)
+#define GICV5_IRS_IST_BASER_VALID BIT_ULL(0)
+#define GICV5_IRS_IST_BASER_ADDR_SHIFT 6ULL
#define GICV5_IRS_IST_CFGR_STRUCTURE BIT(16)
#define GICV5_IRS_IST_CFGR_ISTSZ GENMASK(8, 7)
@@ -166,15 +255,111 @@
#define GICV5_IRS_IST_CFGR_L2SZ_16K 0b01
#define GICV5_IRS_IST_CFGR_L2SZ_64K 0b10
-#define GICV5_IRS_IST_BASER_ADDR_MASK GENMASK_ULL(55, 6)
-#define GICV5_IRS_IST_BASER_VALID BIT_ULL(0)
+#define GICV5_IRS_IST_STATUSR_IDLE BIT(0)
#define GICV5_IRS_MAP_L2_ISTR_ID GENMASK(23, 0)
+#define GICV5_IRS_VMT_BASER_ADDR GENMASK_ULL(55, 3)
+#define GICV5_IRS_VMT_BASER_ADDR_SHIFT 3ULL
+#define GICV5_IRS_VMT_BASER_VALID BIT_ULL(0)
+
+#define GICV5_IRS_VMT_CFGR_STRUCTURE_TWO_LEVEL 0b1
+#define GICV5_IRS_VMT_CFGR_STRUCTURE_LINEAR 0b0
+
+#define GICV5_IRS_VMT_CFGR_STRUCTURE BIT(16)
+#define GICV5_IRS_VMT_CFGR_VM_ID_BITS GENMASK(4, 0)
+
+#define GICV5_IRS_VMT_STATUSR_IDLE BIT(0)
+
+#define GICV5_IRS_VPE_SELR_S BIT_ULL(63)
+#define GICV5_IRS_VPE_SELR_VPE_ID GENMASK_ULL(47, 32)
+#define GICV5_IRS_VPE_SELR_VM_ID GENMASK_ULL(15, 0)
+
+#define GICV5_IRS_VPE_DBR_DBV BIT_ULL(63)
+#define GICV5_IRS_VPE_DBR_REQ_DB BIT_ULL(62)
+#define GICV5_IRS_VPE_DBR_DBPM GENMASK_ULL(36, 32)
+#define GICV5_IRS_VPE_DBR_INTID GENMASK_ULL(23, 0)
+
+#define GICV5_IRS_VPE_HPPIR_HPPIV BIT_ULL(32)
+#define GICV5_IRS_VPE_HPPIR_TYPE GENMASK_ULL(31, 29)
+#define GICV5_IRS_VPE_HPPIR_ID GENMASK_ULL(23, 0)
+
+#define GICV5_IRS_VPE_CR0_DPS BIT(0)
+
+#define GICV5_IRS_VPE_STATUSR_V BIT(1)
+#define GICV5_IRS_VPE_STATUSR_IDLE BIT(0)
+
+#define GICV5_IRS_VM_DBR_EN BIT_ULL(63)
+#define GICV5_IRS_VM_DBR_VPE_ID GENMASK_ULL(15, 0)
+
+#define GICV5_IRS_VM_SELR_VM_ID GENMASK(15, 0)
+
+#define GICV5_IRS_VM_STATUSR_V BIT(1)
+#define GICV5_IRS_VM_STATUSR_IDLE BIT(0)
+
+#define GICV5_IRS_VMAP_L2_VMTR_M BIT_ULL(63)
+#define GICV5_IRS_VMAP_L2_VMTR_VM_ID GENMASK_ULL(15, 0)
+
+#define GICV5_IRS_VMAP_VMR_M BIT_ULL(63)
+#define GICV5_IRS_VMAP_VMR_U BIT_ULL(62)
+#define GICV5_IRS_VMAP_VMR_VM_ID GENMASK_ULL(15, 0)
+
+#define GICV5_IRS_VMAP_VISTR_M BIT_ULL(63)
+#define GICV5_IRS_VMAP_VISTR_U BIT_ULL(62)
+#define GICV5_IRS_VMAP_VISTR_VM_ID GENMASK_ULL(47, 32)
+#define GICV5_IRS_VMAP_VISTR_TYPE GENMASK_ULL(31, 29)
+
+#define GICV5_IRS_VMAP_L2_VISTR_M BIT_ULL(63)
+#define GICV5_IRS_VMAP_L2_VISTR_VM_ID GENMASK_ULL(47, 32)
+#define GICV5_IRS_VMAP_L2_VISTR_TYPE GENMASK_ULL(31, 29)
+#define GICV5_IRS_VMAP_L2_VISTR_ID GENMASK_ULL(23, 0)
+
+#define GICV5_IRS_VMAP_VPER_M BIT_ULL(63)
+#define GICV5_IRS_VMAP_VPER_VM_ID GENMASK_ULL(47, 32)
+#define GICV5_IRS_VMAP_VPER_VPE_ID GENMASK_ULL(15, 0)
+
+#define GICV5_IRS_SAVE_VMR_VM_ID GENMASK_ULL(15, 0)
+#define GICV5_IRS_SAVE_VMR_Q BIT_ULL(62)
+#define GICV5_IRS_SAVE_VMR_S BIT_ULL(63)
+
+#define GICV5_IRS_SAVE_VM_STATUSR_IDLE BIT(0)
+#define GICV5_IRS_SAVE_VM_STATUSR_Q BIT(1)
+
+#define GICV5_IRS_MEC_IDR_MECIDSIZE GENMASK(3, 0)
+
+#define GICV5_IRS_MEC_MECID_R_MECID GENMASK(15, 0)
+
+#define GICV5_IRS_MPAM_IDR_HAS_MPAM_SP BIT(24)
+#define GICV5_IRS_MPAM_IDR_PMG_MAX GENMASK(23, 16)
+#define GICV5_IRS_MPAM_IDR_PARTID_MAX GENMASK(15, 0)
+
+#define GICV5_IRS_MPAM_PARTID_R_IDLE BIT(31)
+#define GICV5_IRS_MPAM_PARTID_R_MPAM_SP GENMASK(25, 24)
+#define GICV5_IRS_MPAM_PARTID_R_PMG GENMASK(23, 16)
+#define GICV5_IRS_MPAM_PARTID_R_PARTID GENMASK(15, 0)
+
+#define GICV5_IRS_SWERR_STATUSR_IMP_EC GENMASK_ULL(31, 24)
+#define GICV5_IRS_SWERR_STATUSR_EC GENMASK_ULL(23, 16)
+#define GICV5_IRS_SWERR_STATUSR_OF BIT_ULL(3)
+#define GICV5_IRS_SWERR_STATUSR_S1V BIT_ULL(2)
+#define GICV5_IRS_SWERR_STATUSR_S0V BIT_ULL(1)
+#define GICV5_IRS_SWERR_STATUSR_V BIT_ULL(0)
+
+#define GICV5_IRS_SWERR_SYNDROMER0_VIRTUAL BIT_ULL(63)
+#define GICV5_IRS_SWERR_SYNDROMER0_TYPE GENMASK_ULL(62, 60)
+#define GICV5_IRS_SWERR_SYNDROMER0_ID GENMASK_ULL(55, 32)
+#define GICV5_IRS_SWERR_SYNDROMER0_VM_ID GENMASK_ULL(15, 0)
+
+#define GICV5_IRS_SWERR_SYNDROMER1_ADDR GENMASK_ULL(55, 3)
+
#define GICV5_ISTL1E_VALID BIT_ULL(0)
+#define GICV5_IRS_ISTL1E_SIZE 8UL
#define GICV5_ISTL1E_L2_ADDR_MASK GENMASK_ULL(55, 12)
+#define GICV5_IRS_SETLPIR 0x0000
+#define GICV5_IRS_SETLPIR_ID GENMASK(23, 0)
+
/*
* ITS registers and tables structures
*/
--
2.34.1
* [PATCH v2 05/39] arm64/sysreg: Add GICv5 GIC VDPEND and VDRCFG encodings
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
2026-05-21 14:50 ` [PATCH v2 04/39] KVM: arm64: gic-v5: Define remaining IRS MMIO registers Sascha Bischoff
@ 2026-05-21 14:50 ` Sascha Bischoff
2026-05-21 14:51 ` [PATCH v2 06/39] arm64/sysreg: Update ICC_CR0_EL1 with LINK and LINK_IDLE fields Sascha Bischoff
From: Sascha Bischoff @ 2026-05-21 14:50 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Add the encodings for the GIC VDPEND and GIC VDRCFG system
instructions. These operate on the virtual interrupt domain, and are
used to make interrupts pending for a VM and to read back the
configuration of a VM's interrupts.
This is part of enabling GICv5 KVM support: these instructions are
required to inject SPIs and LPIs, and to query the state of in-flight
SPIs in order to detect their deactivation.
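A sketch of how a GIC VDPEND operand could be assembled from the field masks above, written as user-space C with a local `FIELD_PREP()` stand-in (using the GCC/Clang builtin `__builtin_ctzll()`). The type encodings (LPI = 2, SPI = 3) are assumed to match the kernel's `GICV5_HWIRQ_TYPE_*` values, and `vdpend_val()` is a hypothetical helper; in the kernel, the resulting value would be issued with the `gic_insn(val, VDPEND)` helper from this header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GENMASK64(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))

/* Field masks copied from the GIC VDPEND definitions in this patch. */
#define VDPEND_PENDING_MASK (1ULL << 63)
#define VDPEND_VM_MASK      GENMASK64(47, 32)
#define VDPEND_TYPE_MASK    GENMASK64(31, 29)
#define VDPEND_ID_MASK      GENMASK64(23, 0)

/* Assumed INTID type encodings, matching GICV5_HWIRQ_TYPE_{LPI,SPI}. */
#define INTID_TYPE_LPI 0x2ULL
#define INTID_TYPE_SPI 0x3ULL

/* FIELD_PREP() stand-in; mask must be non-zero. */
static uint64_t field_prep64(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

/* Hypothetical helper: build the operand for GIC VDPEND. */
static uint64_t vdpend_val(uint16_t vm_id, uint64_t type, uint32_t id,
			   bool pending)
{
	assert(id <= VDPEND_ID_MASK);

	return field_prep64(VDPEND_PENDING_MASK, pending) |
	       field_prep64(VDPEND_VM_MASK, vm_id) |
	       field_prep64(VDPEND_TYPE_MASK, type) |
	       field_prep64(VDPEND_ID_MASK, id);
}
```

For example, making SPI 32 pending for VM 5 gives `0x8000000560000020`, and the same request for LPI 0 in VM 1 with the PENDING bit clear gives `0x0000000140000000`.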
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/include/asm/sysreg.h | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 7aa08d59d4944..40ff7d25d37b0 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -1040,7 +1040,7 @@
#define GCS_CAP(x) ((((unsigned long)x) & GCS_CAP_ADDR_MASK) | \
GCS_CAP_VALID_TOKEN)
/*
- * Definitions for GICv5 instructions
+ * Definitions for GICv5 instructions for the Current Domain
*/
#define GICV5_OP_GIC_CDAFF sys_insn(1, 0, 12, 1, 3)
#define GICV5_OP_GIC_CDDI sys_insn(1, 0, 12, 2, 0)
@@ -1105,6 +1105,22 @@
#define GICV5_GICR_CDNMIA_TYPE_MASK GENMASK_ULL(31, 29)
#define GICV5_GICR_CDNMIA_ID_MASK GENMASK_ULL(23, 0)
+/*
+ * Definitions for GICv5 instructions for the Virtual Domain
+ */
+#define GICV5_OP_GIC_VDPEND sys_insn(1, 4, 12, 1, 4)
+#define GICV5_OP_GIC_VDRCFG sys_insn(1, 4, 12, 1, 5)
+
+/* Shift and mask definitions for GIC VDPEND */
+#define GICV5_GIC_VDPEND_PENDING_MASK BIT_ULL(63)
+#define GICV5_GIC_VDPEND_VM_MASK GENMASK_ULL(47, 32)
+#define GICV5_GIC_VDPEND_TYPE_MASK GENMASK_ULL(31, 29)
+#define GICV5_GIC_VDPEND_ID_MASK GENMASK_ULL(23, 0)
+
+/* Shift and mask definitions for GIC VDRCFG */
+#define GICV5_GIC_VDRCFG_TYPE_MASK GENMASK_ULL(31, 29)
+#define GICV5_GIC_VDRCFG_ID_MASK GENMASK_ULL(23, 0)
+
#define gicr_insn(insn) read_sysreg_s(GICV5_OP_GICR_##insn)
#define gic_insn(v, insn) write_sysreg_s(v, GICV5_OP_GIC_##insn)
--
2.34.1
* [PATCH v2 06/39] arm64/sysreg: Update ICC_CR0_EL1 with LINK and LINK_IDLE fields
From: Sascha Bischoff @ 2026-05-21 14:51 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
The LINK (bit 1) and LINK_IDLE (bit 2) fields were added to the
architecture after ICC_CR0_EL1 was first described in the sysreg
generator, so the generated definitions lack them. Add them.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/tools/sysreg | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 6c3ff14e561e6..57ab09404267c 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -3736,7 +3736,9 @@ Sysreg ICC_CR0_EL1 3 1 12 0 1
Res0 63:39
Field 38 PID
Field 37:32 IPPT
-Res0 31:1
+Res0 31:3
+Field 2 LINK_IDLE
+Field 1 LINK
Field 0 EN
EndSysreg
--
2.34.1
* [PATCH v2 07/39] KVM: arm64: gic-v5: Extract host IRS caps from IRS config frame
From: Sascha Bischoff @ 2026-05-21 14:51 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
The host irqchip driver provides KVM with a pointer to an IRS's config
frame, which allows KVM to directly interact with the host's IRS. The
MMIO registers in the config frame are used to configure VMs (in
addition to them being used by the host). The IRS's config frame also
includes a set of ID registers which describe the capabilities that
the IRS has.
Stash the pointer to the config frame, and extract the VM capabilities
(from IRS_IDR3 & IRS_IDR4), as well as the IST
capabilities/requirements (IRS_IDR2) from the IRS.
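The arithmetic performed by vgic_v5_irs_extract_vm_caps() and the
subsequent clamp in vgic_v5_probe() can be sketched in plain C as
below. It works on raw field values rather than reading MMIO, and
SKETCH_VGIC_V5_MAX_CPUS (512) is a made-up stand-in for
VGIC_V5_MAX_CPUS, whose real value is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for VGIC_V5_MAX_CPUS; the real value is KVM's. */
#define SKETCH_VGIC_V5_MAX_CPUS 512ULL

struct irs_vm_caps_sketch {
	uint64_t max_vms;
	uint64_t max_vpes;
	uint64_t vmd_size;
	uint64_t max_gic_vcpus;
};

/*
 * Inputs are raw field values already extracted from the ID registers:
 *   vm_id_bits     - IRS_IDR3.VM_ID_BITS
 *   has_vmd/vmd_sz - IRS_IDR3.VMD and IRS_IDR3.VMD_SZ
 *   vpe_id_bits_m1 - IRS_IDR4.VPE_ID_BITS, which stores VPE_ID_BITS - 1
 */
static struct irs_vm_caps_sketch decode_vm_caps(unsigned int vm_id_bits,
						bool has_vmd,
						unsigned int vmd_sz,
						unsigned int vpe_id_bits_m1)
{
	struct irs_vm_caps_sketch caps;

	caps.max_vms = 1ULL << vm_id_bits;
	/* No VM descriptor is needed when IRS_IDR3.VMD is clear. */
	caps.vmd_size = has_vmd ? 1ULL << vmd_sz : 0;
	/* Undo the "minus one" encoding before converting bits to a count. */
	caps.max_vpes = 1ULL << (vpe_id_bits_m1 + 1);
	/* Never advertise more vCPUs than the IRS can describe as VPEs. */
	caps.max_gic_vcpus = caps.max_vpes < SKETCH_VGIC_V5_MAX_CPUS ?
			     caps.max_vpes : SKETCH_VGIC_V5_MAX_CPUS;
	return caps;
}
```

The clamp mirrors the min() in vgic_v5_probe(): whichever of the IRS
limit and KVM's own limit is smaller bounds the number of vCPUs.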
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-v5.c | 46 +++++++++++++++++++++++++++++++++--
include/kvm/arm_vgic.h | 26 ++++++++++++++++++++
2 files changed, 70 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index d4789ff3e7402..3f7b132110114 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -11,6 +11,7 @@
#include "vgic.h"
#define ppi_caps kvm_vgic_global_state.vgic_v5_ppi_caps
+#define irs_caps kvm_vgic_global_state.vgic_v5_irs_caps
/*
* Not all PPIs are guaranteed to be implemented for GICv5. Deterermine which
@@ -34,6 +35,45 @@ static void vgic_v5_get_implemented_ppis(void)
__assign_bit(GICV5_ARCH_PPI_PMUIRQ, ppi_caps.impl_ppi_mask, system_supports_pmuv3());
}
+static u32 irs_readl_relaxed(const u32 reg_offset)
+{
+ return readl_relaxed(irs_caps.irs_base + reg_offset);
+}
+
+static void vgic_v5_irs_extract_vm_caps(const struct gic_kvm_info *info)
+{
+ u64 idr;
+
+ irs_caps.irs_base = info->gicv5_irs.base;
+ irs_caps.non_coherent = info->gicv5_irs.non_coherent;
+
+ idr = irs_readl_relaxed(GICV5_IRS_IDR2);
+
+ /* We skip the LPI field as it only applies to physical LPIs */
+ irs_caps.ist_id_bits = FIELD_GET(GICV5_IRS_IDR2_ID_BITS, idr);
+ irs_caps.min_lpi_id_bits = FIELD_GET(GICV5_IRS_IDR2_MIN_LPI_ID_BITS, idr);
+ irs_caps.ist_levels = (idr & GICV5_IRS_IDR2_IST_LEVELS);
+ irs_caps.ist_l2sz = FIELD_GET(GICV5_IRS_IDR2_IST_L2SZ, idr);
+ irs_caps.istmd = (idr & GICV5_IRS_IDR2_ISTMD);
+ irs_caps.istmd_sz = FIELD_GET(GICV5_IRS_IDR2_ISTMD_SZ, idr);
+
+ idr = irs_readl_relaxed(GICV5_IRS_IDR3);
+
+ irs_caps.max_vms = BIT(FIELD_GET(GICV5_IRS_IDR3_VM_ID_BITS, idr));
+ irs_caps.two_level_vmt_support = (idr & GICV5_IRS_IDR3_VMT_LEVELS);
+
+ if (idr & GICV5_IRS_IDR3_VMD)
+ irs_caps.vmd_size = BIT(FIELD_GET(GICV5_IRS_IDR3_VMD_SZ, idr));
+ else
+ irs_caps.vmd_size = 0;
+
+ idr = irs_readl_relaxed(GICV5_IRS_IDR4);
+
+ irs_caps.vped_size = BIT(FIELD_GET(GICV5_IRS_IDR4_VPED_SZ, idr));
+ /* Field stores VPE_ID_BITS - 1 */
+ irs_caps.max_vpes = BIT(FIELD_GET(GICV5_IRS_IDR4_VPE_ID_BITS, idr) + 1);
+}
+
/*
* Probe for a vGICv5 compatible interrupt controller, returning 0 on success.
*/
@@ -61,10 +101,12 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
goto skip_v5;
}
- kvm_vgic_global_state.max_gic_vcpus = VGIC_V5_MAX_CPUS;
-
+ vgic_v5_irs_extract_vm_caps(info);
vgic_v5_get_implemented_ppis();
+ kvm_vgic_global_state.max_gic_vcpus = min(irs_caps.max_vpes,
+ VGIC_V5_MAX_CPUS);
+
ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V5);
if (ret) {
kvm_err("Cannot register GICv5 KVM device.\n");
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index fe49fb56dc3c9..8d65a18fefb80 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -182,6 +182,32 @@ struct vgic_global {
struct {
DECLARE_BITMAP(impl_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
} vgic_v5_ppi_caps;
+
+ /* GICv5 IRS capabilities */
+ struct {
+ /* Base address of the host IRS's CONFIG_FRAME */
+ void __iomem *irs_base;
+
+ /* IST Caps */
+ u8 ist_id_bits;
+ bool ist_levels;
+ u8 ist_l2sz;
+ bool istmd;
+ u8 istmd_sz;
+
+ /* LPI only */
+ u8 min_lpi_id_bits;
+
+ /* VM Table, VPE Table */
+ bool two_level_vmt_support;
+ u32 max_vms;
+ u32 max_vpes;
+ u16 vmd_size;
+ u16 vped_size;
+
+ /* Is the IRS coherent with us, or not? */
+ bool non_coherent;
+ } vgic_v5_irs_caps;
};
extern struct vgic_global kvm_vgic_global_state;
--
2.34.1
* [PATCH v2 08/39] KVM: arm64: gic-v5: Add VPE doorbell domain
From: Sascha Bischoff @ 2026-05-21 14:51 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
VPE doorbells allow the GICv5 hardware to notify KVM when an SPI or LPI
can be signalled to a non-resident VPE. This provides the mechanism used
to wake blocked vcpus once the hardware determines that the interrupt is
eligible to be delivered.
Add support for a per-VM VPE doorbell irq domain. The domain is created
under the GICv5 LPI domain, with one doorbell allocated per VPE. Store
the allocated doorbell base in the VM's GICv5 state so that later
patches can request per-vcpu doorbell IRQs and use them for IRS
commands and wakeups.
Add the per-VPE doorbell state to the GICv5 CPU interface state. The
doorbell IRQ number is populated when the IRQs are requested, and the
db_fired state is used by later patches once doorbell delivery is wired
up.
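Because vgic_v5_irq_db_domain_alloc() gives the IRQ at virq + i the
hwirq i, and all doorbells are allocated in a single
irq_domain_alloc_irqs() call, a VM's doorbells form one contiguous
range. Assuming later patches index that range by vCPU index (this
patch does not show it), the lookup reduces to an offset from
vpe_db_base. The struct and helpers below are a hypothetical userspace
model, not KVM code; a base of 0 stands for "not allocated", matching
how the error path resets vpe_db_base.

```c
#include <assert.h>

/* Hypothetical model of a VM's contiguous doorbell IRQ range. */
struct db_range_sketch {
	int vpe_db_base;	/* first IRQ returned by irq_domain_alloc_irqs() */
	int nr_vcpus;		/* one doorbell was allocated per VPE */
};

/* Doorbell IRQ for vcpu_idx, or -1 if unallocated or out of range. */
static int db_irq_for_vcpu(const struct db_range_sketch *r, int vcpu_idx)
{
	if (r->vpe_db_base <= 0 || vcpu_idx < 0 || vcpu_idx >= r->nr_vcpus)
		return -1;
	return r->vpe_db_base + vcpu_idx;
}

/* hwirq within the per-VM domain: the doorbell's offset from the base. */
static int db_hwirq(const struct db_range_sketch *r, int irq)
{
	return irq - r->vpe_db_base;
}
```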
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-init.c | 18 ++--
arch/arm64/kvm/vgic/vgic-v5.c | 137 +++++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic.h | 1 +
include/kvm/arm_vgic.h | 4 +
include/linux/irqchip/arm-gic-v5.h | 2 +
5 files changed, 156 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 907057881b26a..625d352756fcf 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -492,16 +492,22 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
dist->nr_spis = 0;
dist->vgic_dist_base = VGIC_ADDR_UNDEF;
- if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
+ switch (dist->vgic_model) {
+ case KVM_DEV_TYPE_ARM_VGIC_V2:
+ dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
+ break;
+ case KVM_DEV_TYPE_ARM_VGIC_V3:
list_for_each_entry_safe(rdreg, next, &dist->rd_regions, list)
vgic_v3_free_redist_region(kvm, rdreg);
INIT_LIST_HEAD(&dist->rd_regions);
- } else {
- dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
- }
- if (vgic_supports_direct_irqs(kvm))
- vgic_v4_teardown(kvm);
+ if (vgic_supports_direct_irqs(kvm))
+ vgic_v4_teardown(kvm);
+ break;
+ case KVM_DEV_TYPE_ARM_VGIC_V5:
+ vgic_v5_teardown(kvm);
+ break;
+ }
xa_destroy(&dist->lpi_xa);
}
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 3f7b132110114..52924408ca990 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -7,6 +7,7 @@
#include <linux/bitops.h>
#include <linux/irqchip/arm-vgic-info.h>
+#include <linux/irqdomain.h>
#include "vgic.h"
@@ -152,6 +153,132 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
return 0;
}
+/*
+ * This set of irq_chip functions is specific for doorbells.
+ */
+static const struct irq_chip vgic_v5_db_irq_chip = {
+ .name = "GICv5-DB",
+ .irq_mask = irq_chip_mask_parent,
+ .irq_unmask = irq_chip_unmask_parent,
+ .irq_eoi = irq_chip_eoi_parent,
+ .irq_set_affinity = irq_chip_set_affinity_parent,
+ .irq_get_irqchip_state = irq_chip_get_parent_state,
+ .irq_set_irqchip_state = irq_chip_set_parent_state,
+ .flags = IRQCHIP_SET_TYPE_MASKED | IRQCHIP_SKIP_SET_WAKE |
+ IRQCHIP_MASK_ON_SUSPEND,
+};
+
+static void vgic_v5_irq_db_domain_free(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs)
+{
+ int i;
+
+ for (i = 0; i < nr_irqs; i++) {
+ struct irq_data *d = irq_domain_get_irq_data(domain, virq + i);
+
+ irq_set_handler(virq + i, NULL);
+ irq_domain_reset_irq_data(d);
+ }
+
+ irq_domain_free_irqs_parent(domain, virq, nr_irqs);
+}
+
+static int vgic_v5_irq_db_domain_alloc(struct irq_domain *domain,
+ unsigned int virq, unsigned int nr_irqs,
+ void *arg)
+{
+ const struct irq_chip *chip = &vgic_v5_db_irq_chip;
+ struct vgic_v5_vm *vm = arg;
+ struct irq_data *irqd;
+ int ret;
+
+ if (!vm) {
+ kvm_err("invalid parameter for doorbell irq allocation\n");
+ return -EINVAL;
+ }
+
+ ret = irq_domain_alloc_irqs_parent(domain, virq, nr_irqs, NULL);
+ if (ret)
+ return ret;
+
+ for (int i = 0; i < nr_irqs; i++) {
+ irq_domain_set_hwirq_and_chip(domain, virq + i, i, chip,
+ domain->host_data);
+ irqd = irq_desc_get_irq_data(irq_to_desc(virq + i));
+ irqd_set_single_target(irqd);
+ }
+
+ return 0;
+}
+
+static const struct irq_domain_ops vgic_v5_irq_db_domain_ops = {
+ .alloc = vgic_v5_irq_db_domain_alloc,
+ .free = vgic_v5_irq_db_domain_free,
+};
+
+static int vgic_v5_create_per_vm_domain(struct kvm *kvm)
+{
+ struct vgic_v5_vm *vm = &kvm->arch.vgic.gicv5_vm;
+ int nr_vcpus = atomic_read(&kvm->online_vcpus);
+ int id = task_pid_nr(current);
+ int ret, db_virq = 0;
+
+ if (!gicv5_global_data.lpi_domain) {
+ kvm_err("LPI domain uninitialized, can't set up KVM Doorbells\n");
+ return -ENODEV;
+ }
+
+ vm->fwnode = irq_domain_alloc_named_id_fwnode("GICv5-vpe-db", id);
+
+ /*
+ * KVM per-VM VPE DB domain; child of LPI domain; only ever handles
+ * doorbells. We know how many doorbells we have, and therefore we
+ * create a linear domain.
+ */
+ vm->domain = irq_domain_create_hierarchy(gicv5_global_data.lpi_domain,
+ 0, nr_vcpus, vm->fwnode,
+ &vgic_v5_irq_db_domain_ops, vm);
+ if (WARN_ON(!vm->domain)) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ db_virq = irq_domain_alloc_irqs(vm->domain, nr_vcpus, NUMA_NO_NODE, vm);
+ if (db_virq <= 0) {
+ ret = db_virq;
+ goto err;
+ }
+
+ kvm->arch.vgic.gicv5_vm.vpe_db_base = db_virq;
+
+ return 0;
+
+err:
+ if (db_virq > 0)
+ irq_domain_free_irqs(db_virq, nr_vcpus);
+ if (vm->domain)
+ irq_domain_remove(vm->domain);
+ if (vm->fwnode)
+ irq_domain_free_fwnode(vm->fwnode);
+
+ kvm->arch.vgic.gicv5_vm.vpe_db_base = 0;
+ vm->domain = NULL;
+ vm->fwnode = NULL;
+
+ return ret;
+}
+
+static void vgic_v5_teardown_per_vm_domain(struct vgic_v5_vm *vm)
+{
+ if (!vm->domain)
+ return;
+
+ irq_domain_remove(vm->domain);
+ irq_domain_free_fwnode(vm->fwnode);
+ vm->domain = NULL;
+ vm->fwnode = NULL;
+}
+
void vgic_v5_reset(struct kvm_vcpu *vcpu)
{
/*
@@ -167,10 +294,16 @@ void vgic_v5_reset(struct kvm_vcpu *vcpu)
vcpu->arch.vgic_cpu.num_pri_bits = 5;
}
+void vgic_v5_teardown(struct kvm *kvm)
+{
+ vgic_v5_teardown_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
+}
+
int vgic_v5_init(struct kvm *kvm)
{
struct kvm_vcpu *vcpu;
unsigned long idx;
+ int ret;
if (vgic_initialized(kvm))
return 0;
@@ -182,6 +315,10 @@ int vgic_v5_init(struct kvm *kvm)
}
}
+ ret = vgic_v5_create_per_vm_domain(kvm);
+ if (ret)
+ return ret;
+
/* We only allow userspace to drive the SW_PPI, if it is implemented. */
bitmap_zero(kvm->arch.vgic.gicv5_vm.userspace_ppis,
VGIC_V5_NR_PRIVATE_IRQS);
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index f45f7e3ec4d6e..f2f5fdc3211d7 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -366,6 +366,7 @@ void vgic_debug_destroy(struct kvm *kvm);
int vgic_v5_probe(const struct gic_kvm_info *info);
void vgic_v5_reset(struct kvm_vcpu *vcpu);
int vgic_v5_init(struct kvm *kvm);
+void vgic_v5_teardown(struct kvm *kvm);
int vgic_v5_map_resources(struct kvm *kvm);
void vgic_v5_set_ppi_ops(struct kvm_vcpu *vcpu, u32 vintid);
bool vgic_v5_has_pending_ppi(struct kvm_vcpu *vcpu);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 8d65a18fefb80..bff2b7c896d55 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -392,6 +392,10 @@ struct vgic_v5_vm {
* convenient way to do that).
*/
DECLARE_BITMAP(vgic_ppi_hmr, VGIC_V5_NR_PRIVATE_IRQS);
+
+ struct fwnode_handle *fwnode;
+ struct irq_domain *domain;
+ int vpe_db_base;
};
struct vgic_dist {
diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
index dd7da568ee8b8..1702b57527dee 100644
--- a/include/linux/irqchip/arm-gic-v5.h
+++ b/include/linux/irqchip/arm-gic-v5.h
@@ -577,6 +577,8 @@ void gicv5_irs_syncr(void);
/* Embedded in kvm.arch */
struct gicv5_vpe {
+ int db;
+ bool db_fired;
bool resident;
};
--
2.34.1
* [PATCH v2 09/39] KVM: arm64: gic-v5: Create & manage VM and VPE tables
From: Sascha Bischoff @ 2026-05-21 14:52 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
GICv5 uses a set of in-memory tables to track and manage VM
state. These must be allocated by the hypervisor, and provided to the
IRS to use.
The VMT (Virtual Machine Table) is a linear or two-level table
comprising VMT Entries (VMTE). Each VMTE describes the state for a
single VM. This state includes things such as the SPI and LPI IST
configuration (coming in a future commit), an implementation-defined
VM Descriptor, and a VPE Table (VPET).
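The two-level lookup used by vgic_v5_alloc_vmt_two_level() and
vgic_v5_get_l2_vmte() boils down to the arithmetic below: a 4kB L2
table holds 4096 / 32 = 128 VMTEs, so a VM ID selects L1 slot
vm_id / 128 and entry vm_id % 128 within that L2 table. This is a
userspace sketch of the indexing only; it allocates nothing, and the
helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define VMTEL2E_SIZE         32u   /* bytes per L2 VMTE */
#define VMT_L2_TABLE_SIZE    4096u /* an L2 table is always 4kB */
#define VMT_L2_TABLE_ENTRIES (VMT_L2_TABLE_SIZE / VMTEL2E_SIZE) /* 128 */

/* Smallest power of two >= x (x assumed to be at most 2^31). */
static uint32_t roundup_pow2(uint32_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return (uint32_t)p;
}

/* L1 entries needed for max_vms VMs: at least one full 4kB L2 table. */
static uint32_t vmt_num_l1_ents(uint32_t max_vms)
{
	uint32_t n = max_vms < VMT_L2_TABLE_ENTRIES ?
		     VMT_L2_TABLE_ENTRIES : max_vms;

	return roundup_pow2(n) / VMT_L2_TABLE_ENTRIES;
}

/* Split a VM ID into its L1 slot and the VMTE index inside that L2 table. */
static void vmt_index(uint16_t vm_id, uint32_t *l1, uint32_t *l2)
{
	*l1 = vm_id / VMT_L2_TABLE_ENTRIES;
	*l2 = vm_id % VMT_L2_TABLE_ENTRIES;
}
```

With a linear VMT there is no L1 step: the VM ID indexes the VMTE array
directly.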
The VPET contains one entry per VPE belonging to a VM, and is used to
mark a VPE as valid, as well as providing the address of an
implementation-defined VPE Descriptor, which is used by the hardware
to track and manage VPE state.
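The VPET sizing in vgic_v5_vmte_init() and the entry layout given by
the GICV5_VPE_VALID and GICV5_VPED_ADDR definitions can be illustrated
with the userspace sketch below. It models only the arithmetic: the
table holds a power-of-two number of entries (minimum two), the VMTE is
given log2 of that count, and each entry carries VALID in bit 0 and the
VPE descriptor address in place in bits 55:3. It does not model who
sets VALID, or when.

```c
#include <assert.h>
#include <stdint.h>

#define GENMASK_ULL(h, l) ((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define GICV5_VPE_VALID   (1ULL << 0)
#define GICV5_VPED_ADDR   GENMASK_ULL(55, 3)

/* VPET entries for nr_vcpus vCPUs: a power of two, and never fewer than 2. */
static uint64_t vpet_entries(uint32_t nr_vcpus)
{
	uint64_t n = 2;

	while (n < nr_vcpus)
		n <<= 1;
	return n;
}

/* log2(entries); equivalent to fls(entries) - 1 for a power of two. */
static unsigned int vpe_id_bits(uint64_t entries)
{
	unsigned int bits = 0;

	while ((1ULL << bits) < entries)
		bits++;
	return bits;
}

/* Decode a VPET entry: VALID flag and the VPE descriptor's address. */
static int vpete_valid(uint64_t e)
{
	return (e & GICV5_VPE_VALID) != 0;
}

static uint64_t vpete_vped_pa(uint64_t e)
{
	return e & GICV5_VPED_ADDR;
}
```

The minimum of two entries is why a single-vCPU VM still ends up with
a one-bit VPE ID field.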
Add support for allocating the VMT and for managing VMTEs, which can
be initialised or released for re-use. Allocation and tracking of
unused VM IDs (and hence VMTEs) is handled with an IDA.
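The VM ID bookkeeping in vgic_v5_allocate_vm_id() and
vgic_v5_release_vm_id() follows a simple pattern: refuse to allocate
twice, hand out the lowest free ID up to num_entries - 1, and reset to
an invalid sentinel on release. The sketch below swaps the kernel IDA
for a plain bitmap so that it runs in userspace; SKETCH_MAX_VMS and
SKETCH_VM_ID_INVAL are hypothetical values, not the ones KVM uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_VMS     256u        /* hypothetical; KVM derives it from IRS_IDR3 */
#define SKETCH_VM_ID_INVAL 0xFFFFFFFFu /* hypothetical "no VM ID" sentinel */

/* A bitmap standing in for the kernel IDA: bit set means ID in use. */
struct vm_id_alloc_sketch {
	uint64_t used[SKETCH_MAX_VMS / 64];
};

/* Allocate the lowest free ID in [0, SKETCH_MAX_VMS - 1]. */
static int vm_id_alloc(struct vm_id_alloc_sketch *a, uint32_t *vm_id)
{
	if (*vm_id != SKETCH_VM_ID_INVAL)
		return -EBUSY;	/* this VM already holds an ID */

	for (uint32_t id = 0; id < SKETCH_MAX_VMS; id++) {
		uint64_t bit = 1ULL << (id % 64);

		if (!(a->used[id / 64] & bit)) {
			a->used[id / 64] |= bit;
			*vm_id = id;
			return 0;
		}
	}
	return -ENOSPC;		/* every ID is taken */
}

/* Return the ID to the pool and mark the VM as having none. */
static void vm_id_release(struct vm_id_alloc_sketch *a, uint32_t *vm_id)
{
	if (*vm_id == SKETCH_VM_ID_INVAL)
		return;
	a->used[*vm_id / 64] &= ~(1ULL << (*vm_id % 64));
	*vm_id = SKETCH_VM_ID_INVAL;
}
```

Handing out the lowest free ID keeps live VMs packed towards the start
of the VMT, which for two-level tables tends to limit how many L2
tables need to be allocated.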
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/vgic/vgic-init.c | 2 +
arch/arm64/kvm/vgic/vgic-v5-tables.c | 625 +++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic-v5-tables.h | 76 ++++
arch/arm64/kvm/vgic/vgic-v5.c | 15 +
drivers/irqchip/irq-gic-v5-irs.c | 12 +-
include/kvm/arm_vgic.h | 4 +
include/linux/irqchip/arm-gic-v5.h | 14 +-
8 files changed, 740 insertions(+), 10 deletions(-)
create mode 100644 arch/arm64/kvm/vgic/vgic-v5-tables.c
create mode 100644 arch/arm64/kvm/vgic/vgic-v5-tables.h
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 59612d2f277c1..431de9b145ca1 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -24,7 +24,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \
- vgic/vgic-v5.o
+ vgic/vgic-v5.o vgic/vgic-v5-tables.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 625d352756fcf..079a57c2b18f6 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -154,6 +154,8 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
case KVM_DEV_TYPE_ARM_VGIC_V3:
INIT_LIST_HEAD(&kvm->arch.vgic.rd_regions);
break;
+ case KVM_DEV_TYPE_ARM_VGIC_V5:
+ kvm->arch.vgic.gicv5_vm.vm_id = VGIC_V5_VM_ID_INVAL;
}
/*
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.c b/arch/arm64/kvm/vgic/vgic-v5-tables.c
new file mode 100644
index 0000000000000..e9b92893b4e1f
--- /dev/null
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.c
@@ -0,0 +1,625 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, 2026 Arm Ltd.
+ */
+
+#include <kvm/arm_vgic.h>
+#include <linux/kernel.h>
+#include <linux/mm.h>
+#include <linux/mutex.h>
+#include <linux/sizes.h>
+#include <linux/slab.h>
+#include <linux/xarray.h>
+#include <asm/kvm_mmu.h>
+
+#include "vgic.h"
+#include "vgic-v5-tables.h"
+
+#define irs_caps kvm_vgic_global_state.vgic_v5_irs_caps
+
+static struct vgic_v5_vmt *vmt_info;
+/* Serialises lazy installation of shared second-level VMTs. */
+static DEFINE_MUTEX(vmt_l2_lock);
+static DEFINE_XARRAY(vm_info);
+
+/* Level 1 Virtual Machine Table Entry */
+#define GICV5_VMTEL1E_VALID BIT_ULL(0)
+/* Note that there is no shift for the address by design */
+#define GICV5_VMTEL1E_L2_ADDR GENMASK(51, 12)
+
+#define GICV5_VMTEL2E_SIZE 32ULL
+/* An L2 table (two-level VMT) is ALWAYS 4kB! */
+#define GICV5_VMT_L2_TABLE_SIZE 4096ULL
+#define GICV5_VMT_L2_TABLE_ENTRIES (GICV5_VMT_L2_TABLE_SIZE / GICV5_VMTEL2E_SIZE)
+
+/*
+ * As the L2 VMTE is a large data structure, we are splitting it into 4 parts.
+ * We only mask and shift WITHIN each part for simplicity.
+ */
+/* First 64-bit chunk */
+#define GICV5_VMTEL2E_VALID BIT_ULL(0)
+#define GICV5_VMTEL2E_VMD_ADDR_SHIFT 3ULL
+#define GICV5_VMTEL2E_VMD_ADDR GENMASK_ULL(55, 3)
+/* Second 64-bit chunk */
+#define GICV5_VMTEL2E_VPET_ADDR_SHIFT 3ULL
+#define GICV5_VMTEL2E_VPET_ADDR GENMASK_ULL(55, 3)
+#define GICV5_VMTEL2E_VPE_ID_BITS GENMASK_ULL(63, 59)
+/* Third & fourth 64-bit chunks (the encodings are the same for each) */
+#define GICV5_VMTEL2E_IST_VALID BIT_ULL(0)
+#define GICV5_VMTEL2E_IST_L2SZ GENMASK_ULL(2, 1)
+#define GICV5_VMTEL2E_IST_ADDR_SHIFT 6ULL
+#define GICV5_VMTEL2E_IST_ADDR GENMASK_ULL(55, 6)
+#define GICV5_VMTEL2E_IST_ISTSZ GENMASK_ULL(57, 56)
+#define GICV5_VMTEL2E_IST_STRUCTURE BIT_ULL(58)
+#define GICV5_VMTEL2E_IST_ID_BITS GENMASK_ULL(63, 59)
+
+/* Virtual PE Table Entry */
+#define GICV5_VPE_VALID BIT_ULL(0)
+/* The VPED address sits in place in bits 55:3 of the entry. */
+#define GICV5_VPED_ADDR_SHIFT 3ULL
+#define GICV5_VPED_ADDR GENMASK_ULL(55, 3)
+
+/*
+ * Our IRS might be coherent or non-coherent. If coherent, we can just emit a
+ * DSB to ensure that we're in sync. However, when non-coherent, we need to
+ * manage our cached data explicitly.
+ *
+ * This helper is used to handle both coherent and non-coherent IRSes, and
+ * handles all combinations of cleaning and invalidating to the PoC.
+ */
+static void vgic_v5_clean_inval(void *va, size_t size)
+{
+ unsigned long base = (unsigned long)va;
+
+ dsb(ishst);
+
+ if (kvm_vgic_global_state.vgic_v5_irs_caps.non_coherent)
+ dcache_clean_inval_poc(base, base + size);
+}
+
+/*
+ * Create a linear VM Table of num_entries L2 VMTEs (32 bytes each). As
+ * num_entries is a power of two, kmalloc()'s natural alignment for
+ * power-of-two sizes meets the GICv5 alignment rules for IRS_VMT_BASER.
+ */
+static int vgic_v5_alloc_vmt_linear(unsigned int num_entries)
+{
+ vmt_info->linear.vmt_base = kzalloc_objs(*vmt_info->linear.vmt_base,
+ num_entries);
+ if (!vmt_info->linear.vmt_base)
+ return -ENOMEM;
+
+ vgic_v5_clean_inval(vmt_info->linear.vmt_base,
+ num_entries * sizeof(struct vmtl2_entry));
+
+ return 0;
+}
+
+/*
+ * Allocate the first level of a two-level VM table. The second-level VM tables
+ * are allocated on demand (by vgic_v5_alloc_l2_vmt()).
+ */
+static int vgic_v5_alloc_vmt_two_level(unsigned int num_entries)
+{
+ /*
+ * Each L2 VMT array is always 4k-sized (covering 128 VMs). This is
+ * mandated by the GICv5 specification (GICv5 EAC0 Specification rule
+ * D_LSPBK). Hence, round up the number of entries to be at least 128
+ * (or the next highest power of two as we give the HW the number of VM
+ * ID bits).
+ */
+ if (num_entries < GICV5_VMT_L2_TABLE_ENTRIES)
+ num_entries = GICV5_VMT_L2_TABLE_ENTRIES;
+ num_entries = roundup_pow_of_two(num_entries);
+
+ vmt_info->l2.num_l1_ents = (num_entries / GICV5_VMT_L2_TABLE_ENTRIES);
+ vmt_info->l2.vmt_base = kzalloc_objs(*vmt_info->l2.vmt_base,
+ vmt_info->l2.num_l1_ents);
+ if (!vmt_info->l2.vmt_base)
+ return -ENOMEM;
+
+ vmt_info->l2.l2ptrs = kzalloc_objs(*vmt_info->l2.l2ptrs,
+ vmt_info->l2.num_l1_ents,
+ GFP_KERNEL);
+ if (!vmt_info->l2.l2ptrs) {
+ kfree(vmt_info->l2.vmt_base);
+ return -ENOMEM;
+ }
+
+ vgic_v5_clean_inval(vmt_info->l2.vmt_base,
+ vmt_info->l2.num_l1_ents * sizeof(vmtl1_entry));
+
+ return 0;
+}
+
+/*
+ * Allocate a second level VMT, if required. This can be called eagerly, and
+ * will only perform the allocation if required.
+ */
+static int vgic_v5_alloc_l2_vmt(struct kvm *kvm)
+{
+ struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ enum gicv5_vcpu_cmd cmd = VMT_L2_MAP;
+ struct vmtl2_entry *l2_table;
+ unsigned int l1_index;
+ int ret;
+
+ /* Nothing to do if we have linear tables! */
+ if (!vmt_info->two_level)
+ return 0;
+
+ /*
+ * We have 4k-sized L2 tables - this is mandated by the spec for
+ * two-level VMTs (GICv5 EAC0 Specification rule D_LSPBK). This means
+ * that we have 128 entries per L1 VMTE.
+ */
+ l1_index = vm_id / GICV5_VMT_L2_TABLE_ENTRIES;
+
+ guard(mutex)(&vmt_l2_lock);
+
+ /* Already valid? Great! */
+ if (vmt_info->l2.l2ptrs[l1_index])
+ return 0;
+
+ l2_table = kzalloc_objs(*l2_table, GICV5_VMT_L2_TABLE_ENTRIES);
+ if (!l2_table)
+ return -ENOMEM;
+
+ vgic_v5_clean_inval(l2_table, GICV5_VMT_L2_TABLE_SIZE);
+
+ vgic_v5_clean_inval(vmt_info->l2.vmt_base + l1_index,
+ sizeof(vmtl1_entry));
+
+ WRITE_ONCE(vmt_info->l2.vmt_base[l1_index],
+ cpu_to_le64(virt_to_phys(l2_table)));
+
+ vgic_v5_clean_inval(vmt_info->l2.vmt_base + l1_index,
+ sizeof(vmtl1_entry));
+
+ /*
+ * VMAP in the L2 VMT via the IRS. We use any of the VM's CPUs as a
+ * conduit for interacting with the host's IRS. In the current case,
+ * this lets us resolve the VM ID to pass to the hardware.
+ */
+ ret = irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu0), &cmd);
+
+ /* We've failed to make the L2 VMT valid - things are very broken! */
+ if (ret) {
+ /* Remove the pointer from L1 table */
+ WRITE_ONCE(vmt_info->l2.vmt_base[l1_index], 0);
+
+ vgic_v5_clean_inval(vmt_info->l2.vmt_base + l1_index,
+ sizeof(vmtl1_entry));
+
+ kfree(l2_table);
+
+ return ret;
+ }
+
+ vmt_info->l2.l2ptrs[l1_index] = l2_table;
+
+ return 0;
+}
+
+/*
+ * Allocate the top-level VMT. This can either be linear or two-level.
+ */
+int vgic_v5_vmt_allocate(unsigned int max_vpes)
+{
+ int ret;
+
+ /* Allocate the tracking structure */
+ vmt_info = kzalloc_obj(*vmt_info, GFP_KERNEL);
+ if (!vmt_info)
+ return -ENOMEM;
+
+ ida_init(&vmt_info->vm_id_ida);
+ vmt_info->max_vpes = max_vpes;
+ vmt_info->vmd_size = irs_caps.vmd_size;
+ vmt_info->vped_size = irs_caps.vped_size;
+ vmt_info->two_level = irs_caps.two_level_vmt_support;
+ vmt_info->num_entries = irs_caps.max_vms;
+
+ if (vmt_info->two_level)
+ ret = vgic_v5_alloc_vmt_two_level(vmt_info->num_entries);
+ else
+ ret = vgic_v5_alloc_vmt_linear(vmt_info->num_entries);
+
+ /* If anything failed, free our tracking structure before returning */
+ if (ret) {
+ kfree(vmt_info);
+ vmt_info = NULL;
+ }
+
+ return ret;
+}
+
+/*
+ * Free the VMT and associated tracking structures. This isn't strictly expected
+ * to be called in general operation, but instead exists for completeness.
+ */
+int vgic_v5_vmt_free(void)
+{
+ if (!vmt_info)
+ return 0;
+
+ if (!vmt_info->two_level) {
+ kfree(vmt_info->linear.vmt_base);
+ } else {
+ /* Free the L2 tables; kfree(NULL) is safe */
+ for (int i = 0; i < vmt_info->l2.num_l1_ents; ++i)
+ kfree(vmt_info->l2.l2ptrs[i]);
+ kfree(vmt_info->l2.l2ptrs);
+
+ /* And now free the L1 table */
+ kfree(vmt_info->l2.vmt_base);
+ }
+
+ ida_destroy(&vmt_info->vm_id_ida);
+ kfree(vmt_info);
+ vmt_info = NULL;
+
+ return 0;
+}
+
+/*
+ * Look up a VMT Entry by VM ID.
+ */
+static struct vmtl2_entry *vgic_v5_get_l2_vmte(u16 vm_id)
+{
+ unsigned int l1_index, l2_index;
+ struct vmtl2_entry *l2_table;
+
+ if (!vmt_info->two_level)
+ return &vmt_info->linear.vmt_base[vm_id];
+
+ l1_index = vm_id / GICV5_VMT_L2_TABLE_ENTRIES;
+ l2_index = vm_id % GICV5_VMT_L2_TABLE_ENTRIES;
+
+ if (l1_index >= vmt_info->l2.num_l1_ents)
+ return ERR_PTR(-E2BIG);
+
+ if (!vmt_info->l2.l2ptrs[l1_index])
+ return ERR_PTR(-EINVAL);
+
+ l2_table = vmt_info->l2.l2ptrs[l1_index];
+ return &l2_table[l2_index];
+}
+
+/*
+ * Zero a VMT Entry, and flush & invalidate to the PoC, if required.
+ */
+static int vgic_v5_reset_vmte(struct kvm *kvm)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vmtl2_entry *vmte;
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ /*
+ * The VMT is normal memory shared with the IRS. Invalidate before
+ * rewriting the entry so that cacheline-granular maintenance cannot
+ * later push stale data for neighbouring IRS-visible state back to
+ * memory.
+ */
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+
+ /*
+ * Prevent the compiler from eliding the individual VMTE
+ * stores. Ordering and visibility to the IRS are provided by the
+ * surrounding cache maintenance and command protocol, not by
+ * WRITE_ONCE().
+ *
+ * The same compiler-access constraint applies to READ_ONCE() users in
+ * this file: when inspecting IRS-visible table entries, read the field
+ * exactly once and prevent the compiler from reusing, merging or
+ * tearing the access. Coherency and freshness for non-coherent IRSes
+ * still come from the surrounding cache maintenance.
+ */
+ WRITE_ONCE(vmte->val[0], cpu_to_le64(0ULL));
+ WRITE_ONCE(vmte->val[1], cpu_to_le64(0ULL));
+ WRITE_ONCE(vmte->val[2], cpu_to_le64(0ULL));
+ WRITE_ONCE(vmte->val[3], cpu_to_le64(0ULL));
+
+ /* And make our write visible to the IRS (if non-coherent) */
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+
+ return 0;
+}
+
+/*
+ * Use the IDA to allocate a new VM ID, and track it in the gicv5_vm data
+ * structure. If we're out of VM IDs, the IDA catches that, and we return the
+ * error (-ENOSPC). If we've previously allocated a VM ID, we catch that too and
+ * return -EBUSY.
+ */
+int vgic_v5_allocate_vm_id(struct kvm *kvm)
+{
+ int id;
+
+ if (kvm->arch.vgic.gicv5_vm.vm_id != VGIC_V5_VM_ID_INVAL)
+ return -EBUSY;
+
+ id = ida_alloc_max(&vmt_info->vm_id_ida, vmt_info->num_entries - 1u,
+ GFP_KERNEL);
+ if (id < 0)
+ return id;
+
+ kvm->arch.vgic.gicv5_vm.vm_id = id;
+
+ return 0;
+}
+
+/*
+ * Release the VM ID to allow it to be reallocated in the future.
+ */
+void vgic_v5_release_vm_id(struct kvm *kvm)
+{
+ if (kvm->arch.vgic.gicv5_vm.vm_id == VGIC_V5_VM_ID_INVAL)
+ return;
+
+ ida_free(&vmt_info->vm_id_ida, kvm->arch.vgic.gicv5_vm.vm_id);
+ kvm->arch.vgic.gicv5_vm.vm_id = VGIC_V5_VM_ID_INVAL;
+}
+
+/*
+ * Initialise an entry in the VMT based on the index of the VM.
+ *
+ * Note: We don't mark the VMTE as valid as this needs to be done by
+ * the hardware.
+ */
+int vgic_v5_vmte_init(struct kvm *kvm)
+{
+ int nr_cpus = atomic_read(&kvm->online_vcpus);
+ struct vgic_v5_vm_info *vmi = NULL;
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vmtl2_entry *vmte;
+ void **vped_ptrs = NULL;
+ vpe_entry *vpet = NULL;
+ void *vmd = NULL;
+ int ret;
+ u64 tmp;
+
+ if (nr_cpus > vmt_info->max_vpes)
+ return -E2BIG;
+
+ /*
+ * If we're using two-level VMTs, L2 is allocated on demand. For linear
+ * VMTs, this is a NOP.
+ */
+ ret = vgic_v5_alloc_l2_vmt(kvm);
+ if (ret)
+ return ret;
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ /* If the entry is already valid, something went wrong */
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+ if (le64_to_cpu(READ_ONCE(vmte->val[0])) & GICV5_VMTEL2E_VALID)
+ return -EINVAL;
+
+ ret = vgic_v5_reset_vmte(kvm);
+ if (ret)
+ return ret;
+
+ vmi = kzalloc_obj(*vmi);
+ if (!vmi) {
+ ret = -ENOMEM;
+ goto out_fail;
+ }
+
+ ret = xa_insert(&vm_info, vm_id, vmi, GFP_KERNEL);
+ if (ret)
+ goto out_fail;
+
+ /* Allocate and assign the VM Descriptor, if required. */
+ if (vmt_info->vmd_size != 0) {
+ vmd = kzalloc(vmt_info->vmd_size, GFP_KERNEL);
+ if (!vmd) {
+ ret = -ENOMEM;
+ goto out_fail;
+ }
+
+ /* Stash the VA so we can free it later */
+ vmi->vmd_base = vmd;
+
+ tmp = FIELD_PREP(GICV5_VMTEL2E_VMD_ADDR,
+ virt_to_phys(vmd) >> GICV5_VMTEL2E_VMD_ADDR_SHIFT);
+ WRITE_ONCE(vmte->val[0], cpu_to_le64(tmp));
+ }
+
+ /*
+ * Allocate and assign the VPE Table. Round up the number of CPUs to a
+ * whole power of two as we cannot describe non-powers-of-two in the
+ * VMTE field as it conveys the number of ID bits used and not the
+ * number of vPEs.
+ *
+ * The IRS encodes the number of IAFFID bits as N - 1, so a VM with a
+ * single vCPU must still allocate two VPET entries and expose 1 bit.
+ */
+ nr_cpus = max(2UL, roundup_pow_of_two(nr_cpus));
+ vmi->vpe_id_bits = fls(nr_cpus) - 1;
+
+ vpet = kzalloc_objs(*vpet, nr_cpus);
+ if (!vpet) {
+ ret = -ENOMEM;
+ goto out_fail;
+ }
+
+ /* Stash the VA so we can free it later */
+ vmi->vpet_base = vpet;
+
+ tmp = FIELD_PREP(GICV5_VMTEL2E_VPET_ADDR,
+ virt_to_phys(vpet) >> GICV5_VMTEL2E_VPET_ADDR_SHIFT);
+ tmp |= FIELD_PREP(GICV5_VMTEL2E_VPE_ID_BITS, vmi->vpe_id_bits);
+ WRITE_ONCE(vmte->val[1], cpu_to_le64(tmp));
+
+ vped_ptrs = kzalloc_objs(*vped_ptrs, nr_cpus, GFP_KERNEL);
+ if (!vped_ptrs) {
+ ret = -ENOMEM;
+ goto out_fail;
+ }
+ vmi->vped_ptrs = vped_ptrs;
+
+ if (vmd)
+ vgic_v5_clean_inval(vmd, vmt_info->vmd_size);
+ vgic_v5_clean_inval(vpet, sizeof(*vpet) * nr_cpus);
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+
+ kvm->arch.vgic.gicv5_vm.vmte_allocated = true;
+
+ return 0;
+
+out_fail:
+ /* kfree(NULL) is safe so we can just kfree() at leisure */
+ kfree(vmd);
+ kfree(vpet);
+ kfree(vped_ptrs);
+ if (vmi)
+ xa_erase(&vm_info, vm_id);
+ kfree(vmi);
+
+ vgic_v5_reset_vmte(kvm);
+
+ return ret;
+}
+
+/*
+ * Release the VMT Entry, freeing up any allocated data structures before
+ * zeroing the VMTE.
+ *
+ * The VMTE must be marked as invalid before it is released.
+ */
+int vgic_v5_vmte_release(struct kvm *kvm)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+ struct vmtl2_entry *vmte;
+ int ret;
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ /* Reject if the VMTE has not been marked as invalid! */
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+ if (le64_to_cpu(READ_ONCE(vmte->val[0])) & GICV5_VMTEL2E_VALID)
+ return -EINVAL;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ goto no_vmi;
+
+ for (int i = 0; i < BIT(vmi->vpe_id_bits); i++)
+ kfree(vmi->vped_ptrs[i]);
+ kfree(vmi->vped_ptrs);
+ kfree(vmi->vpet_base);
+ kfree(vmi->vmd_base);
+
+ xa_erase(&vm_info, vm_id);
+ kfree(vmi);
+
+no_vmi:
+ /*
+ * If we didn't get far enough into allocating a VMTE to create the VM
+ * info structure, then we just zero the VMTE and move on. There's
+ * nothing else we can realistically do here.
+ */
+ ret = vgic_v5_reset_vmte(kvm);
+ if (ret)
+ return ret;
+
+ kvm->arch.vgic.gicv5_vm.vmte_allocated = false;
+
+ return 0;
+}
+
+/*
+ * Allocate a VPE descriptor and provide it to the hardware via the VPE Table.
+ */
+int vgic_v5_vmte_alloc_vpe(struct kvm_vcpu *vcpu)
+{
+ u16 vm_id = vgic_v5_vm_id(vcpu->kvm);
+ u16 vpe_id = vgic_v5_vpe_id(vcpu);
+ struct vgic_v5_vm_info *vmi;
+ vpe_entry tmp, *vpet_base;
+ void *vped;
+
+ /* Make sure we're not over what the hardware supports */
+ if (vpe_id >= vmt_info->max_vpes)
+ return -E2BIG;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return -EINVAL;
+
+ if (vpe_id >= 1 << vmi->vpe_id_bits)
+ return -E2BIG;
+
+ vpet_base = vmi->vpet_base;
+
+ /* If the VPETE for this CPU is already valid we've gone wrong */
+ vgic_v5_clean_inval(&vpet_base[vpe_id], sizeof(*vpet_base));
+ if (le64_to_cpu(READ_ONCE(vpet_base[vpe_id])) & GICV5_VPE_VALID)
+ return -EBUSY;
+
+ /* Alloc VPE Descriptor. Only used by IRS. */
+ vped = kzalloc(vmt_info->vped_size, GFP_KERNEL);
+ if (!vped)
+ return -ENOMEM;
+
+ vmi->vped_ptrs[vpe_id] = vped;
+
+ tmp = FIELD_PREP(GICV5_VPED_ADDR, virt_to_phys(vped) >> GICV5_VPED_ADDR_SHIFT);
+ WRITE_ONCE(vpet_base[vpe_id], cpu_to_le64(tmp));
+
+ vgic_v5_clean_inval(vped, vmt_info->vped_size);
+ vgic_v5_clean_inval(vpet_base + vpe_id, sizeof(vpe_entry));
+
+ return 0;
+}
+
+/*
+ * Free the memory allocated for the VPE descriptor.
+ */
+int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu)
+{
+ u16 vm_id = vgic_v5_vm_id(vcpu->kvm);
+ u16 vpe_id = vgic_v5_vpe_id(vcpu);
+ struct vgic_v5_vm_info *vmi;
+ struct vmtl2_entry *vmte;
+ vpe_entry *vpet_base;
+ void *vped;
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+ if (le64_to_cpu(READ_ONCE(vmte->val[0])) & GICV5_VMTEL2E_VALID)
+ return -EBUSY;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (!vmi)
+ return -EINVAL;
+
+ if (vpe_id >= 1 << vmi->vpe_id_bits)
+ return -E2BIG;
+
+ vpet_base = vmi->vpet_base;
+ WRITE_ONCE(vpet_base[vpe_id], 0ULL);
+
+ vgic_v5_clean_inval(vpet_base + vpe_id, sizeof(vpe_entry));
+
+ /* Free VPE Descriptor. Only used by IRS. */
+ vped = vmi->vped_ptrs[vpe_id];
+ vmi->vped_ptrs[vpe_id] = NULL;
+ kfree(vped);
+
+ return 0;
+}
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.h b/arch/arm64/kvm/vgic/vgic-v5-tables.h
new file mode 100644
index 0000000000000..3ca5bc7214fc9
--- /dev/null
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.h
@@ -0,0 +1,76 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2025, 2026 Arm Ltd.
+ */
+
+#ifndef __KVM_ARM_VGICV5_TABLES_H__
+#define __KVM_ARM_VGICV5_TABLES_H__
+
+#include <linux/idr.h>
+#include <linux/irqchip/arm-gic-v5.h>
+
+/* Level 1 Virtual Machine Table Entry */
+typedef __le64 vmtl1_entry;
+
+/* Level 2 Virtual Machine Table Entry */
+struct vmtl2_entry {
+ __le64 val[4];
+};
+
+/* Virtual PE Table Entry */
+typedef __le64 vpe_entry;
+
+struct vgic_v5_vm_info {
+ void __iomem *vmd_base;
+ vpe_entry __iomem *vpet_base;
+ void __iomem **vped_ptrs;
+ u8 vpe_id_bits;
+};
+
+struct vgic_v5_vmt {
+ union {
+ struct {
+ struct vmtl2_entry *vmt_base;
+ unsigned int num_ents;
+ } linear;
+ struct {
+ vmtl1_entry *vmt_base;
+ struct vmtl2_entry **l2ptrs;
+ unsigned int num_l1_ents;
+ } l2;
+ };
+ bool two_level;
+ unsigned int num_entries;
+ unsigned int max_vpes;
+ size_t vmd_size;
+ size_t vped_size;
+ struct ida vm_id_ida;
+};
+
+static inline u16 vgic_v5_vm_id(struct kvm *kvm)
+{
+ return kvm->arch.vgic.gicv5_vm.vm_id;
+}
+
+static inline u16 vgic_v5_vpe_id(struct kvm_vcpu *vcpu)
+{
+ return vcpu->vcpu_id;
+}
+
+static inline int vgic_v5_vpe_db(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db;
+}
+
+int vgic_v5_vmt_allocate(unsigned int max_vpes);
+int vgic_v5_vmt_free(void);
+
+int vgic_v5_allocate_vm_id(struct kvm *kvm);
+void vgic_v5_release_vm_id(struct kvm *kvm);
+
+int vgic_v5_vmte_init(struct kvm *kvm);
+int vgic_v5_vmte_release(struct kvm *kvm);
+int vgic_v5_vmte_alloc_vpe(struct kvm_vcpu *vcpu);
+int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu);
+
+#endif
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 52924408ca990..adfe0b207ef40 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -153,6 +153,20 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
return 0;
}
+static int vgic_v5_db_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
+{
+ enum gicv5_vcpu_cmd *cmd = vcpu_info;
+
+ switch (*cmd) {
+ case VMT_L2_MAP:
+ case VMTE_MAKE_VALID:
+ case VMTE_MAKE_INVALID:
+ /* Not yet implemented */
+ default:
+ return -EINVAL;
+ }
+}
+
/*
* This set of irq_chip functions is specific for doorbells.
*/
@@ -164,6 +178,7 @@ static const struct irq_chip vgic_v5_db_irq_chip = {
.irq_set_affinity = irq_chip_set_affinity_parent,
.irq_get_irqchip_state = irq_chip_get_parent_state,
.irq_set_irqchip_state = irq_chip_set_parent_state,
+ .irq_set_vcpu_affinity = vgic_v5_db_set_vcpu_affinity,
.flags = IRQCHIP_SET_TYPE_MASKED | IRQCHIP_SKIP_SET_WAKE |
IRQCHIP_MASK_ON_SUSPEND,
};
diff --git a/drivers/irqchip/irq-gic-v5-irs.c b/drivers/irqchip/irq-gic-v5-irs.c
index 607e066821b52..70502b07ec8d7 100644
--- a/drivers/irqchip/irq-gic-v5-irs.c
+++ b/drivers/irqchip/irq-gic-v5-irs.c
@@ -269,24 +269,24 @@ int gicv5_irs_iste_alloc(const u32 lpi)
* itself is not supported) again serves to make it easier to find physically
* contiguous blocks of memory.
*/
-static unsigned int gicv5_irs_l2_sz(u32 idr2)
+unsigned int gicv5_irs_l2_sz(u32 l2sz)
{
switch (PAGE_SIZE) {
case SZ_64K:
- if (GICV5_IRS_IST_L2SZ_SUPPORT_64KB(idr2))
+ if (GICV5_IRS_IST_L2SZ_SUPPORT_64KB(l2sz))
return GICV5_IRS_IST_CFGR_L2SZ_64K;
fallthrough;
case SZ_4K:
- if (GICV5_IRS_IST_L2SZ_SUPPORT_4KB(idr2))
+ if (GICV5_IRS_IST_L2SZ_SUPPORT_4KB(l2sz))
return GICV5_IRS_IST_CFGR_L2SZ_4K;
fallthrough;
case SZ_16K:
- if (GICV5_IRS_IST_L2SZ_SUPPORT_16KB(idr2))
+ if (GICV5_IRS_IST_L2SZ_SUPPORT_16KB(l2sz))
return GICV5_IRS_IST_CFGR_L2SZ_16K;
break;
}
- if (GICV5_IRS_IST_L2SZ_SUPPORT_4KB(idr2))
+ if (GICV5_IRS_IST_L2SZ_SUPPORT_4KB(l2sz))
return GICV5_IRS_IST_CFGR_L2SZ_4K;
return GICV5_IRS_IST_CFGR_L2SZ_64K;
@@ -334,7 +334,7 @@ static int __init gicv5_irs_init_ist(struct gicv5_irs_chip_data *irs_data)
lpi_id_bits = min(lpi_id_bits, gicv5_global_data.cpuif_id_bits);
if (two_levels)
- l2sz = gicv5_irs_l2_sz(idr2);
+ l2sz = gicv5_irs_l2_sz(FIELD_GET(GICV5_IRS_IDR2_IST_L2SZ, idr2));
istmd = !!FIELD_GET(GICV5_IRS_IDR2_ISTMD, idr2);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index bff2b7c896d55..ba32cd71fe0a7 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -374,6 +374,8 @@ struct vgic_redist_region {
struct list_head list;
};
+#define VGIC_V5_VM_ID_INVAL (-1)
+
struct vgic_v5_vm {
/*
* We only expose a subset of PPIs to the guest. This subset is a
@@ -396,6 +398,8 @@ struct vgic_v5_vm {
struct fwnode_handle *fwnode;
struct irq_domain *domain;
int vpe_db_base;
+ u32 vm_id;
+ bool vmte_allocated;
};
struct vgic_dist {
diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
index 1702b57527dee..64e31068d9d17 100644
--- a/include/linux/irqchip/arm-gic-v5.h
+++ b/include/linux/irqchip/arm-gic-v5.h
@@ -159,9 +159,9 @@
#define GICV5_IRS_IDR2_LPI BIT(5)
#define GICV5_IRS_IDR2_ID_BITS GENMASK(4, 0)
-#define GICV5_IRS_IST_L2SZ_SUPPORT_4KB(r) FIELD_GET(BIT(11), (r))
-#define GICV5_IRS_IST_L2SZ_SUPPORT_16KB(r) FIELD_GET(BIT(12), (r))
-#define GICV5_IRS_IST_L2SZ_SUPPORT_64KB(r) FIELD_GET(BIT(13), (r))
+#define GICV5_IRS_IST_L2SZ_SUPPORT_4KB(r) FIELD_GET(BIT(0), (r))
+#define GICV5_IRS_IST_L2SZ_SUPPORT_16KB(r) FIELD_GET(BIT(1), (r))
+#define GICV5_IRS_IST_L2SZ_SUPPORT_64KB(r) FIELD_GET(BIT(2), (r))
#define GICV5_IRS_IDR3_VMT_LEVELS BIT(10)
#define GICV5_IRS_IDR3_VM_ID_BITS GENMASK(9, 5)
@@ -573,6 +573,7 @@ int gicv5_irs_cpu_to_iaffid(int cpu_id, u16 *iaffid);
struct gicv5_irs_chip_data *gicv5_irs_lookup_by_spi_id(u32 spi_id);
int gicv5_spi_irq_set_type(struct irq_data *d, unsigned int type);
int gicv5_irs_iste_alloc(u32 lpi);
+unsigned int gicv5_irs_l2_sz(u32 l2sz);
void gicv5_irs_syncr(void);
/* Embedded in kvm.arch */
@@ -617,4 +618,11 @@ void gicv5_deinit_lpis(void);
void __init gicv5_its_of_probe(struct device_node *parent);
void __init gicv5_its_acpi_probe(void);
+
+enum gicv5_vcpu_cmd {
+ VMT_L2_MAP, /* Map in a L2 VMT - *may* happen on VM init */
+ VMTE_MAKE_VALID, /* Make the VMTE valid */
+ VMTE_MAKE_INVALID, /* Make the VMTE (et al.) invalid */
+};
+
#endif
--
2.34.1
* [PATCH v2 10/39] KVM: arm64: gic-v5: Introduce guest IST alloc and management
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (8 preceding siblings ...)
2026-05-21 14:52 ` [PATCH v2 09/39] KVM: arm64: gic-v5: Create & manage VM and VPE tables Sascha Bischoff
@ 2026-05-21 14:52 ` Sascha Bischoff
2026-05-21 14:52 ` [PATCH v2 11/39] KVM: arm64: gic-v5: Implement VMT/vIST IRS MMIO Ops Sascha Bischoff
` (28 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:52 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
GICv5 guests use Interrupt State Tables (ISTs) to track and manage the
interrupt state for SPIs and LPIs. These ISTs are provided to the host's
IRS via the VMTE.
On a host GICv5 system, SPIs do not require any up-front memory
allocation prior to their use, unlike LPIs which require the OS to
allocate an IST. For a GICv5 guest, the same holds from the guest's
point of view: SPIs should require no explicit memory allocation by the
guest. This means that KVM must provision the memory passed to the IRS
for managing a guest's SPI state.
Allocate the SPI IST before running the guest for the first time. As only
a small number of SPIs are expected, this is always allocated as a linear
IST. The host is responsible for freeing this memory on guest teardown.
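As a rough illustration of the linear sizing, the sketch below mirrors the arithmetic in vgic_v5_alloc_linear_ist(). A linear IST holds 2^id_bits entries of (4 << istsz) bytes each. The helper name and the ISTSZ encodings (0, 1, 2 selecting 4-, 8- and 16-byte entries) are assumptions made for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed ISTSZ field encodings: 4-, 8- and 16-byte ISTEs. */
enum { ISTSZ_4 = 0, ISTSZ_8 = 1, ISTSZ_16 = 2 };

/*
 * Hypothetical helper: bytes needed for a linear IST covering 2^id_bits
 * interrupt IDs. Mirrors n = id_bits + 1 + istsz, size = BIT(n + 1).
 */
static uint64_t linear_ist_bytes(unsigned int id_bits, unsigned int istsz)
{
	unsigned int n;

	assert(istsz <= ISTSZ_16);
	n = id_bits + 1 + istsz;
	return UINT64_C(1) << (n + 1);	/* == (1 << id_bits) * (4 << istsz) */
}
```

For example, 32 SPIs (id_bits = 5) with 4-byte entries need a 128-byte table, which is small enough that a linear layout is the obvious choice.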
For LPIs, the guest provisions memory for its LPI IST. KVM does not pass
that memory directly to the host IRS. Instead, allocate a shadow LPI IST
and pass that to the IRS through the VMTE. The LPI IST may be allocated
as a two-level structure when supported and required by the configured
LPI ID space, as many more LPIs are expected than SPIs. The host frees
this memory on guest teardown.
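The choice between a linear and a two-level LPI IST, and the resulting table sizes, can be sketched as follows. This restates the arithmetic used by vgic_v5_lpi_ist_alloc(), vgic_v5_alloc_l1_ist() and vgic_v5_alloc_l2_ists() in standalone form. The helper names are hypothetical, and the L2SZ encoding (0, 1, 2 selecting 4KiB, 16KiB and 64KiB L2 tables) is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Log2 of the number of interrupt IDs covered by a single L2 IST. */
static unsigned int l2_ist_id_bits(unsigned int istsz, unsigned int l2sz)
{
	/* L2 IST is (4KiB << 2 * l2sz) bytes; each ISTE is (4 << istsz) bytes. */
	return (10 - istsz) + 2 * l2sz;
}

/* Two levels only if supported and one L2 IST cannot cover the ID space. */
static bool lpi_ist_needs_two_levels(bool irs_two_levels, unsigned int id_bits,
				     unsigned int istsz, unsigned int l2sz)
{
	return irs_two_levels && id_bits > l2_ist_id_bits(istsz, l2sz);
}

/* Size in bytes of the L1 IST, built from 8-byte L1 entries. */
static uint64_t l1_ist_bytes(unsigned int id_bits, unsigned int istsz,
			     unsigned int l2sz)
{
	unsigned int span = l2_ist_id_bits(istsz, l2sz);
	unsigned int n;

	/* Only meaningful for two-level ISTs; otherwise id_bits - span underflows. */
	assert(id_bits > span);
	n = id_bits - span + 3 - 1;
	if (n < 5)
		n = 5;		/* mirrors the max(5, ...) floor: 64-byte minimum */
	return UINT64_C(1) << (n + 1);
}

/* Size in bytes of each L2 IST. */
static uint64_t l2_ist_bytes(unsigned int l2sz)
{
	return UINT64_C(1) << (12 + 2 * l2sz);
}
```

With 4-byte ISTEs and 4KiB L2 tables, one L2 table covers 2^10 IDs, so a 16-bit LPI ID space needs a 512-byte L1 table pointing at 64 L2 tables.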
This commit also extends the doorbell domain to allow the doorbells
themselves to act as a conduit for issuing commands, similar to what
exists for GICv4 support. Effectively, irq_set_vcpu_affinity() becomes
an ioctl-like interface for issuing commands specific to either a VM or
the particular VPE that the doorbell belongs to. Add support for:
VMT_L2_MAP - Make a second level VM table valid
VMTE_MAKE_VALID - Make a single VMTE, and hence VM, valid
VMTE_MAKE_INVALID - Make a single VMTE, and hence VM, invalid
SPI_VIST_MAKE_VALID - Make the SPI IST valid
LPI_VIST_MAKE_VALID - Make the LPI IST valid
LPI_VIST_MAKE_INVALID - Make the LPI IST invalid
None of these commands are plumbed through to the host IRS at this
stage.
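The doorbell-as-command-conduit idea can be illustrated with a userspace model. The opaque vcpu_info pointer handed to an irq_chip's irq_set_vcpu_affinity() callback carries a command rather than affinity data. The function below is a hypothetical stand-in, not the kernel callback. Unlike the patch, which returns -EINVAL for every command at this stage, it returns -EOPNOTSUPP for known but unimplemented commands so that the two cases can be told apart.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mirrors the enum gicv5_vcpu_cmd values introduced by this series. */
enum gicv5_vcpu_cmd {
	VMT_L2_MAP,
	VMTE_MAKE_VALID,
	VMTE_MAKE_INVALID,
	SPI_VIST_MAKE_VALID,
	LPI_VIST_MAKE_VALID,
	LPI_VIST_MAKE_INVALID,
};

/* Hypothetical stand-in: the opaque pointer carries a command. */
static int db_set_vcpu_affinity(void *vcpu_info)
{
	const enum gicv5_vcpu_cmd *cmd = vcpu_info;

	if (!cmd)
		return -EINVAL;

	switch (*cmd) {
	case VMT_L2_MAP:
	case VMTE_MAKE_VALID:
	case VMTE_MAKE_INVALID:
	case SPI_VIST_MAKE_VALID:
	case LPI_VIST_MAKE_VALID:
	case LPI_VIST_MAKE_INVALID:
		/* Known command, not yet plumbed through to the host IRS */
		return -EOPNOTSUPP;
	default:
		/* Unknown command */
		return -EINVAL;
	}
}
```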
There is intentionally no SPI_VIST_MAKE_INVALID command. The SPI IST is
allocated as part of VM creation and is not invalidated while the VM is
live. It is freed when the VM is destroyed, after the VMTE has been made
invalid. The LPI IST, on the other hand, is driven by the guest, which is
free to invalidate and free its LPI IST at any point.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-v5-tables.c | 527 +++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic-v5-tables.h | 22 ++
arch/arm64/kvm/vgic/vgic-v5.c | 3 +
include/linux/irqchip/arm-gic-v5.h | 3 +
4 files changed, 555 insertions(+)
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.c b/arch/arm64/kvm/vgic/vgic-v5-tables.c
index e9b92893b4e1f..a1d0f620b7913 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.c
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.c
@@ -59,6 +59,14 @@ static DEFINE_XARRAY(vm_info);
#define GICV5_VPED_ADDR_SHIFT 3ULL
#define GICV5_VPED_ADDR GENMASK_ULL(55, 3)
+/*
+ * The LPI and SPI configuration is stored in 64-bit words 2 and 3 (0-based) of
+ * the VMTE, respectively. We call each of these words a section here to
+ * simplify the code.
+ */
+#define GICV5_VMTEL2_LPI_SECTION 2
+#define GICV5_VMTEL2_SPI_SECTION 3
+
/*
* Our IRS might be coherent or non-coherent. If coherent, we can just emit a
* DSB to ensure that we're in sync. However, when non-coherent, we need to
@@ -489,6 +497,25 @@ int vgic_v5_vmte_init(struct kvm *kvm)
return ret;
}
+/*
+ * The following set of forward declarations makes the code layout a *little*
+ * clearer as it lets us keep the IST-related code together.
+ */
+static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
+ unsigned int id_bits,
+ unsigned int istsz);
+static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int id_bits,
+ unsigned int istsz, unsigned int l2_split);
+static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int id_bits,
+ unsigned int istsz, unsigned int l2_split);
+static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm,
+ unsigned int id_bits,
+ unsigned int istsz,
+ unsigned int l2_split);
+static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi);
+static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi);
+static int vgic_v5_spi_ist_free(struct kvm *kvm);
+
/*
* Release the VMT Entry, freeing up any allocated data structures before
* zeroing the VMTE.
@@ -521,6 +548,20 @@ int vgic_v5_vmte_release(struct kvm *kvm)
kfree(vmi->vpet_base);
kfree(vmi->vmd_base);
+ /* If we have an LPI IST, free it */
+ if (vmi->h_lpi_ist) {
+ ret = vgic_v5_lpi_ist_free(kvm);
+ if (ret)
+ return ret;
+ }
+
+ /* If we have an SPI IST, free it */
+ if (vmi->h_spi_ist) {
+ ret = vgic_v5_spi_ist_free(kvm);
+ if (ret)
+ return ret;
+ }
+
xa_erase(&vm_info, vm_id);
kfree(vmi);
@@ -623,3 +664,489 @@ int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu)
return 0;
}
+
+/*
+ * Assign an already allocated IST to the VM by populating the fields in the
+ * corresponding VMTE. We re-use this code for both an SPI IST and LPI IST, even
+ * if the paths to reach it might be vastly different.
+ */
+static int vgic_v5_vmte_assign_ist(struct kvm *kvm, phys_addr_t ist_base,
+ bool two_level, unsigned int id_bits,
+ unsigned int l2sz, unsigned int istsz,
+ bool spi_ist)
+{
+ struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ enum gicv5_vcpu_cmd cmd;
+ struct vmtl2_entry *vmte;
+ unsigned int section;
+ u64 tmp;
+ int ret;
+
+ /*
+ * The L2 VMTE comprises four 64-bit "sections", where sections 2 & 3
+ * describe the LPI and SPI ISTs, respectively. Both the LPI and SPI
+ * sections have the same layout, and as we are either operating on SPIs
+ * or LPIs we pick a section of the VMTE to modify up-front.
+ *
+ * See the GICv5 EAC0 Specification 11.2.2 for more details about the
+ * VMTE layout.
+ */
+ section = spi_ist ? GICV5_VMTEL2_SPI_SECTION : GICV5_VMTEL2_LPI_SECTION;
+
+ if (ist_base & ~GICV5_VMTEL2E_IST_ADDR) {
+ kvm_err("IST alignment issue! Address: 0x%llx, Mask 0x%llx\n",
+ ist_base, GICV5_VMTEL2E_IST_ADDR);
+ return -EINVAL;
+ }
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ /* Bail if already allocated */
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+ if (le64_to_cpu(READ_ONCE(vmte->val[section])) & GICV5_VMTEL2E_IST_VALID)
+ return -EINVAL;
+
+ tmp = FIELD_PREP(GICV5_VMTEL2E_IST_L2SZ, l2sz);
+ tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ADDR,
+ ist_base >> GICV5_VMTEL2E_IST_ADDR_SHIFT);
+ tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ISTSZ, istsz);
+ tmp |= FIELD_PREP(GICV5_VMTEL2E_IST_ID_BITS, id_bits);
+ if (two_level)
+ tmp |= GICV5_VMTEL2E_IST_STRUCTURE;
+
+ WRITE_ONCE(vmte->val[section], cpu_to_le64(tmp));
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+
+ /* Finally, mark the entry as valid */
+ cmd = spi_ist ? SPI_VIST_MAKE_VALID : LPI_VIST_MAKE_VALID;
+ ret = irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu0), &cmd);
+
+ return ret;
+}
+
+/*
+ * Allocate a Linear IST - always used for SPIs and potentially LPIs.
+ *
+ * The calculation for n has been taken from section 11.2.2 of the GICv5 EAC0
+ * spec.
+ *
+ * NOTE: istsz is the FIELD used by GICv5, not the actual size (or log2() of the
+ * size).
+ */
+static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
+ unsigned int id_bits, unsigned int istsz)
+{
+ const size_t n = id_bits + 1 + istsz;
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+ __le64 *ist;
+ u32 l1sz;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return -EINVAL;
+
+ /*
+ * Allocate the IST. We only have one level, so we just use the L2 ISTE.
+ */
+ l1sz = BIT(n + 1);
+ ist = kzalloc(l1sz, GFP_KERNEL);
+ if (!ist)
+ return -ENOMEM;
+
+ if (spi_ist) {
+ vmi->h_spi_ist = ist;
+ } else {
+ vmi->h_lpi_ist_structure = false;
+ vmi->h_lpi_ist = ist;
+ }
+
+ vgic_v5_clean_inval(ist, l1sz);
+
+ return 0;
+}
+
+/*
+ * Allocate the first level of a two-level IST - LPI, only.
+ *
+ * The calculation for n has been taken from section 11.2.2 of the GICv5 EAC0
+ * spec.
+ *
+ * NOTE: istsz and l2sz are the FIELDS used by GICv5, not the actual sizes (or
+ * log2() of the sizes).
+ */
+static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int id_bits,
+ unsigned int istsz, unsigned int l2sz)
+{
+ const size_t n = max(5, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 - 1);
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ const u32 l1_size = BIT(n + 1);
+ struct vgic_v5_vm_info *vmi;
+ __le64 *ist;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (!vmi)
+ return -EINVAL;
+
+ ist = kzalloc(l1_size, GFP_KERNEL);
+ if (!ist)
+ return -ENOMEM;
+
+ vmi->h_lpi_ist_structure = true;
+ vmi->h_lpi_ist = ist;
+
+ vgic_v5_clean_inval(ist, l1_size);
+
+ return 0;
+}
+
+/*
+ * Allocate ALL of the second level ISTs for a two-level IST - LPI, only.
+ *
+ * The calculation for n has been taken from section 11.2.2 of the GICv5 EAC0
+ * spec. The l2_size calculation is from section 11.2.3 of the same document.
+ *
+ * NOTE: istsz and l2sz are the FIELDS used by GICv5, not the actual sizes (or
+ * log2() of the sizes).
+ */
+static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int id_bits,
+ unsigned int istsz, unsigned int l2sz)
+{
+ const size_t n = max(5, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 - 1);
+ const int l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
+ const size_t l2_size = BIT(11 + (2 * l2sz) + 1);
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+ __le64 *l2ist;
+ __le64 *l1ist;
+ int index;
+ u64 val;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return -EINVAL;
+
+ l1ist = vmi->h_lpi_ist;
+
+ /*
+ * Allocate the storage for the pointers to the L2 ISTs (used when
+ * freeing later).
+ */
+ vmi->h_lpi_l2_ists = kzalloc_objs(*vmi->h_lpi_l2_ists, l1_entries,
+ GFP_KERNEL);
+ if (!vmi->h_lpi_l2_ists)
+ return -ENOMEM;
+
+ /* Allocate the L2 IST for each L1 IST entry */
+ for (index = 0; index < l1_entries; ++index) {
+ l2ist = kzalloc(l2_size, GFP_KERNEL);
+ if (!l2ist) {
+ while (--index >= 0)
+ kfree(vmi->h_lpi_l2_ists[index]);
+
+ kfree(vmi->h_lpi_l2_ists);
+ vmi->h_lpi_l2_ists = NULL;
+
+ return -ENOMEM;
+ }
+
+ /*
+ * We are not doing on-demand allocation of the L2 ISTs, and are
+ * instead provisioning the whole IST up front. This means that
+ * we are able to mark the L2 ISTs as valid in the L1 ISTEs as
+ * the overall IST is not yet valid.
+ */
+ val = (virt_to_phys(l2ist) & GICV5_ISTL1E_L2_ADDR_MASK) |
+ GICV5_ISTL1E_VALID;
+ l1ist[index] = cpu_to_le64(val);
+
+ vmi->h_lpi_l2_ists[index] = l2ist;
+
+ vgic_v5_clean_inval(l2ist, l2_size);
+ }
+
+ /* Handle CMOs for the whole L1 IST in one go */
+ vgic_v5_clean_inval(l1ist, l1_entries * sizeof(*l1ist));
+
+ return 0;
+}
+
+/* Allocate a two-level IST - LPIs, only */
+static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm, unsigned int id_bits,
+ unsigned int istsz, unsigned int l2sz)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+ int ret;
+
+ /*
+ * Allocate the L1 IST first, then all of the L2s. Everything
+ * is preallocated and we do no on-demand IST allocation. This
+ * is to avoid needing to track if and when the guest is doing
+ * on-demand IST allocation.
+ */
+ ret = vgic_v5_alloc_l1_ist(kvm, id_bits, istsz, l2sz);
+ if (ret)
+ return ret;
+
+ ret = vgic_v5_alloc_l2_ists(kvm, id_bits, istsz, l2sz);
+ if (ret) {
+ /* Free the L1 IST again */
+ vmi = xa_load(&vm_info, vm_id);
+ kfree(vmi->h_lpi_ist);
+ vmi->h_lpi_ist = NULL;
+
+ return ret;
+ }
+
+ return 0;
+}
+
+static void vgic_v5_free_allocated_lpi_ist(struct vgic_v5_vm_info *vmi,
+ unsigned int id_bits,
+ unsigned int istsz,
+ unsigned int l2sz)
+{
+ if (!vmi->h_lpi_ist_structure) {
+ kfree(vmi->h_lpi_ist);
+ vmi->h_lpi_ist = NULL;
+ return;
+ }
+
+ if (vmi->h_lpi_l2_ists) {
+ const size_t n = max(5, id_bits - ((10 - istsz) + (2 * l2sz)) + 3 - 1);
+ const int l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
+ int index;
+
+ for (index = 0; index < l1_entries; ++index)
+ kfree(vmi->h_lpi_l2_ists[index]);
+
+ kfree(vmi->h_lpi_l2_ists);
+ vmi->h_lpi_l2_ists = NULL;
+ }
+
+ kfree(vmi->h_lpi_ist);
+ vmi->h_lpi_ist = NULL;
+}
+
+static void vgic_v5_free_allocated_spi_ist(struct kvm *kvm)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return;
+
+ kfree(vmi->h_spi_ist);
+ vmi->h_spi_ist = NULL;
+}
+
+/*
+ * Free a Linear IST. Can only happen once the VM is dead.
+ */
+static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vmtl2_entry *vmte;
+ struct vgic_v5_vm_info *vmi;
+ int section;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (!vmi)
+ return -EINVAL;
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ if (spi) {
+ section = GICV5_VMTEL2_SPI_SECTION;
+ vgic_v5_free_allocated_spi_ist(kvm);
+ } else {
+ section = GICV5_VMTEL2_LPI_SECTION;
+ vgic_v5_free_allocated_lpi_ist(vmi, 0, 0, 0);
+ }
+
+ /* The VM should be dead here, so we can just zero the VMT section */
+ vmte->val[section] = cpu_to_le64(0);
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+
+ return 0;
+}
+
+/*
+ * Free a Two-Level IST. Can only happen once the VM is dead.
+ */
+static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi)
+{
+ unsigned int id_bits, istsz, l2sz;
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+ struct vmtl2_entry *vmte;
+ u64 tmp;
+ int section;
+
+ /* We don't create two-level SPI ISTs, so freeing is a bad idea! */
+ if (spi)
+ return -EINVAL;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (!vmi)
+ return -EINVAL;
+
+ section = GICV5_VMTEL2_LPI_SECTION;
+
+ if (!vmi->h_lpi_ist_structure)
+ return -EINVAL;
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ tmp = le64_to_cpu(READ_ONCE(vmte->val[section]));
+
+ id_bits = FIELD_GET(GICV5_VMTEL2E_IST_ID_BITS, tmp);
+ istsz = FIELD_GET(GICV5_VMTEL2E_IST_ISTSZ, tmp);
+ l2sz = FIELD_GET(GICV5_VMTEL2E_IST_L2SZ, tmp);
+
+ vgic_v5_free_allocated_lpi_ist(vmi, id_bits, istsz, l2sz);
+
+ /* The VM must be dead, so we can just zero the VMT section */
+ vmte->val[section] = cpu_to_le64(0);
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+
+ return 0;
+}
+
+/* Helper to determine ISTE size based on metadata requirements */
+static unsigned int vgic_v5_ist_istsz(unsigned int id_bits)
+{
+ if (!irs_caps.istmd)
+ return GICV5_IRS_IST_CFGR_ISTSZ_4;
+
+ if (id_bits >= irs_caps.istmd_sz)
+ return GICV5_IRS_IST_CFGR_ISTSZ_16;
+
+ return GICV5_IRS_IST_CFGR_ISTSZ_8;
+}
+
+/*
+ * Allocate an IST for SPIs.
+ *
+ * We don't anticipate a large number of SPIs being allocated. Therefore, we
+ * always allocate a Linear IST for SPIs. This will need to be revisited should
+ * that assumption no longer hold.
+ */
+int vgic_v5_spi_ist_allocate(struct kvm *kvm, unsigned int id_bits)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+ phys_addr_t base_addr;
+ unsigned int istsz;
+ int ret;
+
+ istsz = vgic_v5_ist_istsz(id_bits);
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return -EINVAL;
+
+ ret = vgic_v5_alloc_linear_ist(kvm, true, id_bits, istsz);
+ if (ret)
+ return ret;
+ base_addr = virt_to_phys(vmi->h_spi_ist);
+
+ ret = vgic_v5_vmte_assign_ist(kvm, base_addr, false, id_bits, 0, istsz,
+ true);
+ if (ret) {
+ vgic_v5_free_allocated_spi_ist(kvm);
+ return ret;
+ }
+
+ return 0;
+}
+
+/*
+ * Free the IST for SPIs. Should only happen once the VM is dead.
+ */
+static int vgic_v5_spi_ist_free(struct kvm *kvm)
+{
+ return vgic_v5_linear_ist_free(kvm, true);
+}
+
+/*
+ * Allocate an IST for LPIs.
+ *
+ * Unlike with SPIs, we anticipate that the guest will allocate a relatively
+ * large number of LPIs. Therefore, while we support doing a linear LPI IST, it
+ * is expected that LPI ISTs will be two-level.
+ */
+int vgic_v5_lpi_ist_alloc(struct kvm *kvm, unsigned int id_bits)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+ unsigned int istsz, l2sz;
+ phys_addr_t phys_addr;
+ bool two_level;
+ int ret;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return -EINVAL;
+
+ if (vmi->h_lpi_ist)
+ return -EBUSY;
+
+ istsz = vgic_v5_ist_istsz(id_bits);
+ l2sz = gicv5_irs_l2_sz(irs_caps.ist_l2sz);
+
+ /*
+ * Determine if we want to create a Linear or a Two-Level IST.
+ *
+ * A two-level IST is only required when a single L2 IST cannot cover
+ * the requested ID space. This depends on the L2 IST size selected for
+ * the IRS, not PAGE_SIZE. Using PAGE_SIZE here would switch to
+ * two-level too early when the selected L2 IST is larger than a page,
+ * and the allocation sizing arithmetic would underflow.
+ */
+ two_level = irs_caps.ist_levels &&
+ id_bits > ((10 - istsz) + (2 * l2sz));
+
+ if (!two_level)
+ ret = vgic_v5_alloc_linear_ist(kvm, false /* LPIs, not SPIs */,
+ id_bits, istsz);
+ else
+ ret = vgic_v5_alloc_two_level_lpi_ist(kvm, id_bits, istsz,
+ l2sz);
+
+ if (ret)
+ return ret;
+
+ phys_addr = virt_to_phys(vmi->h_lpi_ist);
+ ret = vgic_v5_vmte_assign_ist(kvm, phys_addr, two_level, id_bits, l2sz,
+ istsz, false);
+ if (ret)
+ vgic_v5_free_allocated_lpi_ist(vmi, id_bits, istsz, l2sz);
+
+ return ret;
+}
+
+/* Free the LPI IST again */
+int vgic_v5_lpi_ist_free(struct kvm *kvm)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (!vmi)
+ return -ENXIO;
+
+ if (!vmi->h_lpi_ist_structure)
+ return vgic_v5_linear_ist_free(kvm, false);
+ else
+ return vgic_v5_two_level_ist_free(kvm, false);
+}
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.h b/arch/arm64/kvm/vgic/vgic-v5-tables.h
index 3ca5bc7214fc9..81fed6c5b1559 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.h
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.h
@@ -25,6 +25,24 @@ struct vgic_v5_vm_info {
vpe_entry __iomem *vpet_base;
void __iomem **vped_ptrs;
u8 vpe_id_bits;
+
+ /*
+ * Both the LPI and SPI ISTs are allocated by the hypervisor. While it
+ * would be possible to track and access them by iterating over the ISTs
+ * themselves, it makes more sense to store pointers to the ISTs.
+ *
+ * The LPI IST can either be two-level or linear. Hence, we keep track
+ * of the structure. If it is two-level, we retain pointers to the L1
+ * IST and to each L2 IST array. If it is linear, we just store the base
+ * address of the IST array.
+ *
+ * The SPI IST is linear, and therefore we just store the base address
+ * of the SPI IST array.
+ */
+ bool h_lpi_ist_structure;
+ __le64 *h_lpi_ist;
+ __le64 **h_lpi_l2_ists;
+ __le64 *h_spi_ist;
};
struct vgic_v5_vmt {
@@ -73,4 +91,8 @@ int vgic_v5_vmte_release(struct kvm *kvm);
int vgic_v5_vmte_alloc_vpe(struct kvm_vcpu *vcpu);
int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu);
+int vgic_v5_spi_ist_allocate(struct kvm *kvm, unsigned int id_bits);
+int vgic_v5_lpi_ist_alloc(struct kvm *kvm, unsigned int id_bits);
+int vgic_v5_lpi_ist_free(struct kvm *kvm);
+
#endif
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index adfe0b207ef40..120eadff9a128 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -161,6 +161,9 @@ static int vgic_v5_db_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
case VMT_L2_MAP:
case VMTE_MAKE_VALID:
case VMTE_MAKE_INVALID:
+ case SPI_VIST_MAKE_VALID:
+ case LPI_VIST_MAKE_VALID:
+ case LPI_VIST_MAKE_INVALID:
/* Not yet implemented */
default:
return -EINVAL;
diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
index 64e31068d9d17..ef649faeeb0ff 100644
--- a/include/linux/irqchip/arm-gic-v5.h
+++ b/include/linux/irqchip/arm-gic-v5.h
@@ -623,6 +623,9 @@ enum gicv5_vcpu_cmd {
VMT_L2_MAP, /* Map in a L2 VMT - *may* happen on VM init */
VMTE_MAKE_VALID, /* Make the VMTE valid */
VMTE_MAKE_INVALID, /* Make the VMTE (et al.) invalid */
+ SPI_VIST_MAKE_VALID, /* No corresponding invalid */
+ LPI_VIST_MAKE_VALID, /* Triggered by a guest */
+ LPI_VIST_MAKE_INVALID, /* Triggered by a guest */
};
#endif
--
2.34.1
* [PATCH v2 11/39] KVM: arm64: gic-v5: Implement VMT/vIST IRS MMIO Ops
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (9 preceding siblings ...)
2026-05-21 14:52 ` [PATCH v2 10/39] KVM: arm64: gic-v5: Introduce guest IST alloc and management Sascha Bischoff
@ 2026-05-21 14:52 ` Sascha Bischoff
2026-05-21 14:53 ` [PATCH v2 12/39] KVM: arm64: gic-v5: Keep GICv5 vCPU limit model-specific Sascha Bischoff
` (27 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:52 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
GICv5 has rules about which fields of a VMTE (or L1 VMT) may be
directly written by the host once the table is valid. These rules
ensure that no stale state is cached by the hardware, and provide a
clear interface for making VMs, ISTs, etc. valid.
The hypervisor is responsible for populating the VMTE for a
VM. However, it is not permitted to write the Valid bit (as the VM
table is already valid). Instead, the VM is made valid via an IRS MMIO
Op. The same applies to the ISTs - they must be made valid via the
host IRS.
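Setting a Valid bit through the IRS rather than by a direct write boils down to a request/complete handshake: wait until the IRS reports idle, write the request register, then wait for idle again. The following is a minimal user-space sketch of that pattern against a simulated status register. `struct fake_irs`, its fields and the retry budget are hypothetical stand-ins, not the kernel's gicv5_wait_for_op_atomic() or the real IRS register layout.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical IRS model: a request register plus an IDLE status bit. */
struct fake_irs {
	uint64_t req;        /* last value written to the request register */
	int busy_polls;      /* polls remaining before the current op completes */
	bool stuck;          /* if true, the IRS never reports idle */
	unsigned int writes; /* number of request writes observed */
};

static bool fake_irs_idle(struct fake_irs *irs)
{
	if (irs->stuck)
		return false;
	if (irs->busy_polls > 0) {
		irs->busy_polls--;
		return false;
	}
	return true;
}

/* Poll the IDLE bit a bounded (hypothetical) number of times. */
static int wait_for_idle(struct fake_irs *irs)
{
	for (int i = 0; i < 1000; i++) {
		if (fake_irs_idle(irs))
			return 0;
	}
	return -ETIMEDOUT;
}

/* Wait for idle, issue the request, then wait for it to complete. */
static int irs_write_op(struct fake_irs *irs, uint64_t val, int op_latency)
{
	int ret = wait_for_idle(irs);

	if (ret)
		return ret;

	irs->req = val;
	irs->writes++;
	irs->busy_polls = op_latency; /* the op takes a few polls to finish */

	return wait_for_idle(irs);
}
```

The sketch is single-threaded, so it omits the global_irs_lock raw spinlock that the patch takes around the same sequence because the IRS MMIO interface is shared by all VMs.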
This commit adds support for:
* Making level 1 VMTs valid (only), allowing for dynamic level 2 array
allocation.
* Making VMTEs (VMs) valid or invalid
* Making SPI/LPI ISTs valid or invalid for a specific VM
As part of this commit, the following vcpu_affinity-based commands are
plumbed in:
VMT_L2_MAP - Make a second level VM table valid
VMTE_MAKE_VALID - Make a single VMTE (and hence VM) valid
VMTE_MAKE_INVALID - Make a single VMTE (and hence VM) invalid
SPI_VIST_MAKE_VALID - Make the SPI IST valid
LPI_VIST_MAKE_VALID - Make the LPI IST valid
LPI_VIST_MAKE_INVALID - Make the LPI IST invalid
Note: the lack of SPI_VIST_MAKE_INVALID is intentional. The SPI IST
is invalidated implicitly when the VMTE is made invalid during
teardown.
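Each command in the list above ends up as a single write to one IRS register, optionally with the U (unmap) bit set, and for vISTs with a type of 0b011 (SPI) or 0b010 (LPI) as used in the patch. A minimal sketch of that mapping as a lookup follows. The enum values, `enum irs_reg` and `struct irs_op` are illustrative only and are not types from the patch. It also makes visible that no command produces an SPI IST unmap.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Copy of the command names from the patch; numeric values are illustrative. */
enum vcpu_cmd {
	VMT_L2_MAP,
	VMTE_MAKE_VALID,
	VMTE_MAKE_INVALID,
	SPI_VIST_MAKE_VALID,
	LPI_VIST_MAKE_VALID,
	LPI_VIST_MAKE_INVALID,
};

/* Hypothetical labels for the IRS registers the commands write. */
enum irs_reg { IRS_VMAP_L2_VMTR, IRS_VMAP_VMR, IRS_VMAP_VISTR };

struct irs_op {
	enum irs_reg reg;
	bool unmap;        /* U bit: invalidate rather than make valid */
	unsigned int type; /* vIST type: 0x3 (0b011) = SPI, 0x2 (0b010) = LPI */
};

/* Translate a command into the IRS operation it triggers. */
static int cmd_to_irs_op(enum vcpu_cmd cmd, struct irs_op *op)
{
	switch (cmd) {
	case VMT_L2_MAP:
		*op = (struct irs_op){ IRS_VMAP_L2_VMTR, false, 0 };
		return 0;
	case VMTE_MAKE_VALID:
	case VMTE_MAKE_INVALID:
		*op = (struct irs_op){ IRS_VMAP_VMR, cmd == VMTE_MAKE_INVALID, 0 };
		return 0;
	case SPI_VIST_MAKE_VALID:
		*op = (struct irs_op){ IRS_VMAP_VISTR, false, 0x3 };
		return 0;
	case LPI_VIST_MAKE_VALID:
	case LPI_VIST_MAKE_INVALID:
		*op = (struct irs_op){ IRS_VMAP_VISTR,
				       cmd == LPI_VIST_MAKE_INVALID, 0x2 };
		return 0;
	}
	return -EINVAL; /* unknown command */
}
```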
When probing for a GICv5 succeeds, the VMT is allocated and made valid
via the IRS MMIO interface. Treat failures while allocating or
assigning the VMT as hard GICv5 probe failures. At that point the IRS
VM table state is a prerequisite for vGICv5 operation, and falling
back to the legacy path would leave the host without a valid GICv5 VM
table setup. Later failures (such as failing to register the vGICv5
device) only fall back once the IRS VMT state has been cleared.
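The probe-time error handling above follows an allocate / assign / register sequence in which each failure unwinds only what has already succeeded. A minimal sketch of that ordering with hypothetical fault injection: `enum step`, `struct probe_state` and `do_step()` are made up for illustration and stand in for vgic_v5_vmt_allocate(), vgic_v5_irs_assign_vmt() and kvm_register_vgic_device().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Which step (if any) should fail: hypothetical fault injection. */
enum step { STEP_NONE, STEP_ALLOC, STEP_ASSIGN, STEP_REGISTER };

struct probe_state {
	enum step fail_at;
	bool vmt_allocated; /* VM table memory allocated */
	bool vmt_assigned;  /* IRS told about the VMT (VMT_BASER valid) */
	bool v5_registered; /* vGICv5 device registered */
};

static int do_step(struct probe_state *s, enum step st)
{
	return s->fail_at == st ? -ENOMEM : 0;
}

/*
 * Returns -ENODEV for a hard GICv5 probe failure, or 0 if probing may
 * continue (possibly without vGICv5, e.g. with only the legacy device).
 */
static int probe_v5(struct probe_state *s)
{
	if (do_step(s, STEP_ALLOC))
		return -ENODEV;           /* nothing to unwind yet */
	s->vmt_allocated = true;

	if (do_step(s, STEP_ASSIGN)) {
		s->vmt_allocated = false; /* free the VMT */
		return -ENODEV;
	}
	s->vmt_assigned = true;

	if (do_step(s, STEP_REGISTER)) {
		/* Detach the VMT from the IRS before freeing it, then fall back */
		s->vmt_assigned = false;
		s->vmt_allocated = false;
		return 0;
	}
	s->v5_registered = true;
	return 0;
}
```

The sketch does not model a failure of the clear step itself; the patch reports that case with WARN_ON() before freeing the tables.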
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-v5-tables.c | 58 ++++++---
arch/arm64/kvm/vgic/vgic-v5-tables.h | 2 +
arch/arm64/kvm/vgic/vgic-v5.c | 188 ++++++++++++++++++++++++++-
3 files changed, 225 insertions(+), 23 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.c b/arch/arm64/kvm/vgic/vgic-v5-tables.c
index a1d0f620b7913..5c87c6c27087a 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.c
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.c
@@ -67,6 +67,21 @@ static DEFINE_XARRAY(vm_info);
#define GICV5_VMTEL2_LPI_SECTION 2
#define GICV5_VMTEL2_SPI_SECTION 3
+static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
+ unsigned int id_bits,
+ unsigned int istsz);
+static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int id_bits,
+ unsigned int istsz, unsigned int l2_split);
+static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int id_bits,
+ unsigned int istsz, unsigned int l2_split);
+static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm,
+ unsigned int id_bits,
+ unsigned int istsz,
+ unsigned int l2_split);
+static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi);
+static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi);
+static int vgic_v5_spi_ist_free(struct kvm *kvm);
+
/*
* Our IRS might be coherent or non-coherent. If coherent, we can just emit a
* DSB to ensure that we're in sync. However, when non-coherent, we need to
@@ -497,25 +512,6 @@ int vgic_v5_vmte_init(struct kvm *kvm)
return ret;
}
-/*
- * The following set of forward declarations makes the code layout a *little*
- * clearer as it lets us keep the IST-related code together.
- */
-static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
- unsigned int id_bits,
- unsigned int istsz);
-static int vgic_v5_alloc_l1_ist(struct kvm *kvm, unsigned int id_bits,
- unsigned int istsz, unsigned int l2_split);
-static int vgic_v5_alloc_l2_ists(struct kvm *kvm, unsigned int id_bits,
- unsigned int istsz, unsigned int l2_split);
-static int vgic_v5_alloc_two_level_lpi_ist(struct kvm *kvm,
- unsigned int id_bits,
- unsigned int istsz,
- unsigned int l2_split);
-static int vgic_v5_linear_ist_free(struct kvm *kvm, bool spi);
-static int vgic_v5_two_level_ist_free(struct kvm *kvm, bool spi);
-static int vgic_v5_spi_ist_free(struct kvm *kvm);
-
/*
* Release the VMT Entry, freeing up any allocated data structures before
* zeroing the VMTE.
@@ -665,6 +661,23 @@ int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu)
return 0;
}
+phys_addr_t vgic_v5_get_vmt_base(void)
+{
+ phys_addr_t vmt_base;
+
+ if (!vmt_info->two_level)
+ vmt_base = virt_to_phys(vmt_info->linear.vmt_base);
+ else
+ vmt_base = virt_to_phys(vmt_info->l2.vmt_base);
+
+ return vmt_base;
+}
+
+u8 vgic_v5_vmt_vpe_id_bits(void)
+{
+ return fls(vmt_info->max_vpes) - 1;
+}
+
/*
* Assign an already allocated IST to the VM by populating the fields in the
* corresponding VMTE. We re-use this code for both an SPI IST and LPI IST, even
@@ -723,8 +736,13 @@ static int vgic_v5_vmte_assign_ist(struct kvm *kvm, phys_addr_t ist_base,
/* Finally, mark the entry as valid */
cmd = spi_ist ? SPI_VIST_MAKE_VALID : LPI_VIST_MAKE_VALID;
ret = irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu0), &cmd);
+ if (ret) {
+ WRITE_ONCE(vmte->val[section], 0ULL);
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+ return ret;
+ }
- return ret;
+ return 0;
}
/*
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.h b/arch/arm64/kvm/vgic/vgic-v5-tables.h
index 81fed6c5b1559..acd862b8806d1 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.h
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.h
@@ -82,6 +82,8 @@ static inline int vgic_v5_vpe_db(struct kvm_vcpu *vcpu)
int vgic_v5_vmt_allocate(unsigned int max_vpes);
int vgic_v5_vmt_free(void);
+phys_addr_t vgic_v5_get_vmt_base(void);
+u8 vgic_v5_vmt_vpe_id_bits(void);
int vgic_v5_allocate_vm_id(struct kvm *kvm);
void vgic_v5_release_vm_id(struct kvm *kvm);
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 120eadff9a128..f9578c2a634a4 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -10,10 +10,14 @@
#include <linux/irqdomain.h>
#include "vgic.h"
+#include "vgic-v5-tables.h"
#define ppi_caps kvm_vgic_global_state.vgic_v5_ppi_caps
#define irs_caps kvm_vgic_global_state.vgic_v5_irs_caps
+static int vgic_v5_irs_assign_vmt(bool two_level, u8 vm_id_bits, phys_addr_t vmt_base);
+static int vgic_v5_irs_clear_vmt(void);
+
/*
* Not all PPIs are guaranteed to be implemented for GICv5. Deterermine which
* ones are, and generate a mask.
@@ -36,11 +40,32 @@ static void vgic_v5_get_implemented_ppis(void)
__assign_bit(GICV5_ARCH_PPI_PMUIRQ, ppi_caps.impl_ppi_mask, system_supports_pmuv3());
}
+/*
+ * The IRS MMIO interface is shared between all VMs, so make sure we don't do
+ * anything stupid!
+ */
+static DEFINE_RAW_SPINLOCK(global_irs_lock);
+
static u32 irs_readl_relaxed(const u32 reg_offset)
{
return readl_relaxed(irs_caps.irs_base + reg_offset);
}
+static void irs_writel_relaxed(const u32 val, const u32 reg_offset)
+{
+ writel_relaxed(val, irs_caps.irs_base + reg_offset);
+}
+
+static u64 irs_readq_relaxed(const u32 reg_offset)
+{
+ return readq_relaxed(irs_caps.irs_base + reg_offset);
+}
+
+static void irs_writeq_relaxed(const u64 val, const u32 reg_offset)
+{
+ writeq_relaxed(val, irs_caps.irs_base + reg_offset);
+}
+
static void vgic_v5_irs_extract_vm_caps(const struct gic_kvm_info *info)
{
u64 idr;
@@ -85,6 +110,7 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
int ret;
kvm_vgic_global_state.type = VGIC_V5;
+ kvm_vgic_global_state.max_gic_vcpus = VGIC_V5_MAX_CPUS;
kvm_vgic_global_state.vcpu_base = 0;
kvm_vgic_global_state.vctrl_base = NULL;
@@ -105,12 +131,49 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
vgic_v5_irs_extract_vm_caps(info);
vgic_v5_get_implemented_ppis();
+ /*
+ * Even if the HW supports more per-VM vCPUs, artificially cap as we
+ * can't use them all.
+ */
+ kvm_vgic_global_state.max_gic_vcpus = min(irs_caps.max_vpes,
+ VGIC_V5_MAX_CPUS);
+
+ /*
+ * GICv5 requires a set of tables to be allocated in order to manage
+ * VMs. We allocate them in advance here, which alas means that we
+ * already have to make a decision regarding the maximum number of VMs
+ * we want to run. For now, we match the maximum number offered by the
+ * hardware, but this might not be a wise choice in the long term.
+ */
+ ret = vgic_v5_vmt_allocate(kvm_vgic_global_state.max_gic_vcpus);
+ if (ret) {
+ kvm_err("Failed to allocate the GICv5 VM tables; no GICv5 support\n");
+ return -ENODEV;
+ }
+
+ /*
+ * We've now allocated the VM table, but the host's IRS doesn't know
+ * about it yet. Provide the base address of the VMT to the IRS, as well
+ * as the number of ID bits that it covers and the structure used
+ * (linear/two-level).
+ */
+ ret = vgic_v5_irs_assign_vmt(irs_caps.two_level_vmt_support,
+ ilog2(irs_caps.max_vms),
+ vgic_v5_get_vmt_base());
+ if (ret) {
+ kvm_err("Failed to assign the GICv5 VM tables to the IRS; no GICv5 support\n");
+ vgic_v5_vmt_free();
+ return -ENODEV;
+ }
+
kvm_vgic_global_state.max_gic_vcpus = min(irs_caps.max_vpes,
VGIC_V5_MAX_CPUS);
ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V5);
if (ret) {
kvm_err("Cannot register GICv5 KVM device.\n");
+ WARN_ON(vgic_v5_irs_clear_vmt());
+ vgic_v5_vmt_free();
goto skip_v5;
}
@@ -138,12 +201,13 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V3);
if (ret) {
kvm_err("Cannot register GICv3-legacy KVM device.\n");
- return ret;
+ /* vGICv5 should still work */
+ return v5_registered ? 0 : ret;
}
/* We potentially limit the max VCPUs further than we need to here */
kvm_vgic_global_state.max_gic_vcpus = min(VGIC_V3_MAX_CPUS,
- VGIC_V5_MAX_CPUS);
+ kvm_vgic_global_state.max_gic_vcpus);
static_branch_enable(&kvm_vgic_global_state.gicv3_cpuif);
kvm_info("GCIE legacy system register CPU interface\n");
@@ -153,18 +217,136 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
return 0;
}
+/*
+ * Wait for completion of a change in any of IRS_VMT_BASER, IRS_VMAP_L2_VMTR,
+ * IRS_VMAP_VMR, IRS_VMAP_VPER, IRS_VMAP_VISTR, IRS_VMAP_L2_VISTR.
+ */
+static int vgic_v5_irs_wait_for_vm_op(void)
+{
+ return gicv5_wait_for_op_atomic(irs_caps.irs_base,
+ GICV5_IRS_VMT_STATUSR,
+ GICV5_IRS_VMT_STATUSR_IDLE,
+ NULL);
+}
+
+static int vgic_v5_irs_write_vm_mmio_reg(u64 val, u32 offset)
+{
+ int ret;
+
+ guard(raw_spinlock_irqsave)(&global_irs_lock);
+
+ /* Make sure that we are idle to begin with */
+ ret = vgic_v5_irs_wait_for_vm_op();
+ if (ret)
+ return ret;
+
+ irs_writeq_relaxed(val, offset);
+
+ return vgic_v5_irs_wait_for_vm_op();
+}
+
+static int vgic_v5_irs_assign_vmt(bool two_level, u8 vm_id_bits,
+ phys_addr_t vmt_base)
+{
+ u64 vmt_baser;
+ u32 vmt_cfgr;
+
+ guard(raw_spinlock_irqsave)(&global_irs_lock);
+
+ vmt_baser = irs_readq_relaxed(GICV5_IRS_VMT_BASER);
+ if (!!FIELD_GET(GICV5_IRS_VMT_BASER_VALID, vmt_baser))
+ return -EBUSY;
+
+ vmt_cfgr = FIELD_PREP(GICV5_IRS_VMT_CFGR_VM_ID_BITS, vm_id_bits);
+ if (two_level)
+ vmt_cfgr |= FIELD_PREP(GICV5_IRS_VMT_CFGR_STRUCTURE,
+ GICV5_IRS_VMT_CFGR_STRUCTURE_TWO_LEVEL);
+
+ irs_writel_relaxed(vmt_cfgr, GICV5_IRS_VMT_CFGR);
+
+ /* The base address is intentionally only masked and not shifted */
+ vmt_baser = FIELD_PREP(GICV5_IRS_VMT_BASER_VALID, true) |
+ (vmt_base & GICV5_IRS_VMT_BASER_ADDR);
+ irs_writeq_relaxed(vmt_baser, GICV5_IRS_VMT_BASER);
+
+ return vgic_v5_irs_wait_for_vm_op();
+}
+
+static int vgic_v5_irs_clear_vmt(void)
+{
+ return vgic_v5_irs_write_vm_mmio_reg(0, GICV5_IRS_VMT_BASER);
+}
+
+static int vgic_v5_irs_vmap_l2_vmt(u16 vm_id)
+{
+ u64 val = FIELD_PREP(GICV5_IRS_VMAP_L2_VMTR_VM_ID, vm_id) |
+ GICV5_IRS_VMAP_L2_VMTR_M;
+
+ return vgic_v5_irs_write_vm_mmio_reg(val, GICV5_IRS_VMAP_L2_VMTR);
+}
+
+static int __vgic_v5_irs_vmap_vm(u16 vm_id, bool unmap)
+{
+ u64 val = FIELD_PREP(GICV5_IRS_VMAP_VMR_VM_ID, vm_id) |
+ FIELD_PREP(GICV5_IRS_VMAP_VMR_U, unmap) |
+ GICV5_IRS_VMAP_VMR_M;
+
+ return vgic_v5_irs_write_vm_mmio_reg(val, GICV5_IRS_VMAP_VMR);
+}
+
+static int vgic_v5_irs_set_vm_valid(u16 vm_id)
+{
+ return __vgic_v5_irs_vmap_vm(vm_id, false);
+}
+
+static int vgic_v5_irs_set_vm_invalid(u16 vm_id)
+{
+ return __vgic_v5_irs_vmap_vm(vm_id, true);
+}
+
+static int __vgic_v5_irs_update_vist_validity(u16 vm_id, bool spi_ist, bool unmap)
+{
+ u8 type = spi_ist ? 0b011 : 0b010;
+ u64 val = FIELD_PREP(GICV5_IRS_VMAP_VISTR_TYPE, type) |
+ FIELD_PREP(GICV5_IRS_VMAP_VISTR_VM_ID, vm_id) |
+ FIELD_PREP(GICV5_IRS_VMAP_VISTR_U, unmap) |
+ GICV5_IRS_VMAP_VISTR_M;
+
+ return vgic_v5_irs_write_vm_mmio_reg(val, GICV5_IRS_VMAP_VISTR);
+}
+
+static int vgic_v5_irs_set_vist_valid(u16 vm_id, bool spi_ist)
+{
+ return __vgic_v5_irs_update_vist_validity(vm_id, spi_ist, false);
+}
+
+/*
+ * LPI ISTs can be invalidated explicitly. SPI ISTs are invalidated by making
+ * the VMTE invalid during teardown.
+ */
+static int vgic_v5_irs_set_vist_invalid(u16 vm_id, bool spi_ist)
+{
+ return __vgic_v5_irs_update_vist_validity(vm_id, spi_ist, true);
+}
+
static int vgic_v5_db_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
{
+ struct vgic_v5_vm *vm = data->domain->host_data;
enum gicv5_vcpu_cmd *cmd = vcpu_info;
switch (*cmd) {
case VMT_L2_MAP:
+ return vgic_v5_irs_vmap_l2_vmt(vm->vm_id);
case VMTE_MAKE_VALID:
+ return vgic_v5_irs_set_vm_valid(vm->vm_id);
case VMTE_MAKE_INVALID:
+ return vgic_v5_irs_set_vm_invalid(vm->vm_id);
case SPI_VIST_MAKE_VALID:
+ return vgic_v5_irs_set_vist_valid(vm->vm_id, true);
case LPI_VIST_MAKE_VALID:
+ return vgic_v5_irs_set_vist_valid(vm->vm_id, false);
case LPI_VIST_MAKE_INVALID:
- /* Not yet implemented */
+ return vgic_v5_irs_set_vist_invalid(vm->vm_id, false);
default:
return -EINVAL;
}
--
2.34.1
* [PATCH v2 12/39] KVM: arm64: gic-v5: Keep GICv5 vCPU limit model-specific
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (10 preceding siblings ...)
2026-05-21 14:52 ` [PATCH v2 11/39] KVM: arm64: gic-v5: Implement VMT/vIST IRS MMIO Ops Sascha Bischoff
@ 2026-05-21 14:53 ` Sascha Bischoff
2026-05-21 14:53 ` [PATCH v2 13/39] KVM: arm64: gic-v5: Implement VPE IRS MMIO Ops Sascha Bischoff
` (26 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:53 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
A GICv5 host with FEAT_GCIE_LEGACY can expose both a native vGICv5 and
a vGICv3 device. These models do not necessarily have the same vCPU
limit: the native GICv5 limit is probed from the IRS VPE capacity,
while the GICv3 limit remains the fixed KVM vGICv3 limit.
Track the IRS-derived limit separately for vGICv5 creation. The
pre-VGIC KVM_CAP_MAX_VCPUS value continues to expose the largest limit
among the still-selectable models, and kvm_vgic_create() clamps the VM
to the limit of the VGIC model that userspace actually selected.
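A small sketch of the limit computation described above: the pre-selection value is the largest limit among the models that registered, and creating a VGIC clamps to the selected model. VGIC_V2_MAX_CPUS (8) and VGIC_V3_MAX_CPUS (512) match the values in current kernels; `EXAMPLE_V5_MAX_CPUS` and the IRS `max_vpes` inputs used in the usage are hypothetical example numbers, and the helper functions are illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define VGIC_V2_MAX_CPUS 8
#define VGIC_V3_MAX_CPUS 512
#define EXAMPLE_V5_MAX_CPUS 1024 /* hypothetical stand-in for VGIC_V5_MAX_CPUS */

enum vgic_model { MODEL_V2, MODEL_V3, MODEL_V5 };

struct limits {
	int max_gic_vcpus;   /* exposed before a model is selected */
	int max_gicv5_vcpus; /* IRS-derived, vGICv5 only */
};

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* Mirrors the probe flow: v5 limit first, then widen for legacy vGICv3. */
static struct limits probe_limits(int irs_max_vpes, bool v5_ok, bool v3_ok)
{
	struct limits l = { 0, min_int(irs_max_vpes, EXAMPLE_V5_MAX_CPUS) };

	if (v5_ok)
		l.max_gic_vcpus = l.max_gicv5_vcpus;
	if (v3_ok)
		l.max_gic_vcpus = max_int(l.max_gic_vcpus, VGIC_V3_MAX_CPUS);
	return l;
}

/* What kvm_vgic_create() sets kvm->max_vcpus to for each model. */
static int model_limit(const struct limits *l, enum vgic_model m)
{
	switch (m) {
	case MODEL_V2: return VGIC_V2_MAX_CPUS;
	case MODEL_V3: return VGIC_V3_MAX_CPUS;
	case MODEL_V5: return l->max_gicv5_vcpus;
	}
	return 0;
}
```

For example, with an IRS reporting 64 VPEs and both models registered, KVM_CAP_MAX_VCPUS reports 512, but a VM that then creates a vGICv5 is clamped to 64.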
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-init.c | 14 +++++++++-----
arch/arm64/kvm/vgic/vgic-v5.c | 19 +++++++++----------
include/kvm/arm_vgic.h | 16 ++++++++++++----
3 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 079a57c2b18f6..94632fd90b728 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -129,13 +129,17 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
}
ret = 0;
- if (type == KVM_DEV_TYPE_ARM_VGIC_V2)
+ switch (type) {
+ case KVM_DEV_TYPE_ARM_VGIC_V2:
kvm->max_vcpus = VGIC_V2_MAX_CPUS;
- else if (type == KVM_DEV_TYPE_ARM_VGIC_V3)
+ break;
+ case KVM_DEV_TYPE_ARM_VGIC_V3:
kvm->max_vcpus = VGIC_V3_MAX_CPUS;
- else if (type == KVM_DEV_TYPE_ARM_VGIC_V5)
- kvm->max_vcpus = min(VGIC_V5_MAX_CPUS,
- kvm_vgic_global_state.max_gic_vcpus);
+ break;
+ case KVM_DEV_TYPE_ARM_VGIC_V5:
+ kvm->max_vcpus = kvm_vgic_global_state.max_gicv5_vcpus;
+ break;
+ }
if (atomic_read(&kvm->online_vcpus) > kvm->max_vcpus) {
ret = -E2BIG;
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index f9578c2a634a4..909cef5f31afa 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -110,7 +110,8 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
int ret;
kvm_vgic_global_state.type = VGIC_V5;
- kvm_vgic_global_state.max_gic_vcpus = VGIC_V5_MAX_CPUS;
+ kvm_vgic_global_state.max_gic_vcpus = 0;
+ kvm_vgic_global_state.max_gicv5_vcpus = 0;
kvm_vgic_global_state.vcpu_base = 0;
kvm_vgic_global_state.vctrl_base = NULL;
@@ -135,8 +136,8 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
* Even if the HW supports more per-VM vCPUs, artificially cap as we
* can't use them all.
*/
- kvm_vgic_global_state.max_gic_vcpus = min(irs_caps.max_vpes,
- VGIC_V5_MAX_CPUS);
+ kvm_vgic_global_state.max_gicv5_vcpus = min(irs_caps.max_vpes,
+ VGIC_V5_MAX_CPUS);
/*
* GICv5 requires a set of tables to be allocated in order to manage
@@ -145,7 +146,7 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
* we want to run. For now, we match the maximum number offered by the
* hardware, but this might not be a wise choice in the long term.
*/
- ret = vgic_v5_vmt_allocate(kvm_vgic_global_state.max_gic_vcpus);
+ ret = vgic_v5_vmt_allocate(kvm_vgic_global_state.max_gicv5_vcpus);
if (ret) {
kvm_err("Failed to allocate the GICv5 VM tables; no GICv5 support\n");
return -ENODEV;
@@ -166,9 +167,6 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
return -ENODEV;
}
- kvm_vgic_global_state.max_gic_vcpus = min(irs_caps.max_vpes,
- VGIC_V5_MAX_CPUS);
-
ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V5);
if (ret) {
kvm_err("Cannot register GICv5 KVM device.\n");
@@ -178,6 +176,8 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
}
v5_registered = true;
+ kvm_vgic_global_state.max_gic_vcpus =
+ kvm_vgic_global_state.max_gicv5_vcpus;
kvm_info("GCIE system register CPU interface\n");
skip_v5:
@@ -205,9 +205,8 @@ int vgic_v5_probe(const struct gic_kvm_info *info)
return v5_registered ? 0 : ret;
}
- /* We potentially limit the max VCPUs further than we need to here */
- kvm_vgic_global_state.max_gic_vcpus = min(VGIC_V3_MAX_CPUS,
- kvm_vgic_global_state.max_gic_vcpus);
+ kvm_vgic_global_state.max_gic_vcpus = max(kvm_vgic_global_state.max_gic_vcpus,
+ VGIC_V3_MAX_CPUS);
static_branch_enable(&kvm_vgic_global_state.gicv3_cpuif);
kvm_info("GCIE legacy system register CPU interface\n");
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index ba32cd71fe0a7..6f736094a0e7e 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -157,9 +157,16 @@ struct vgic_global {
/* Maintenance IRQ number */
unsigned int maint_irq;
- /* maximum number of VCPUs allowed (GICv2 limits us to 8) */
+ /*
+ * Maximum number of VCPUs exposed before userspace has selected a
+ * VGIC model. Individual VGIC models can impose a lower limit
+ * (GICv2 limits us to 8).
+ */
int max_gic_vcpus;
+ /* Maximum number of VCPUs allowed for a GICv5 VM. */
+ int max_gicv5_vcpus;
+
/* Only needed for the legacy KVM_CREATE_IRQCHIP */
bool can_emulate_gicv2;
@@ -635,10 +642,11 @@ void kvm_vgic_process_async_update(struct kvm_vcpu *vcpu);
void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg, bool allow_group1);
/**
- * kvm_vgic_get_max_vcpus - Get the maximum number of VCPUs allowed by HW
+ * kvm_vgic_get_max_vcpus - Get the pre-VGIC-selection VCPU limit
*
- * The host's GIC naturally limits the maximum amount of VCPUs a guest
- * can use.
+ * Userspace can query KVM_CAP_MAX_VCPUS before selecting a VGIC model, so
+ * expose the highest model-specific limit and let kvm_vgic_create() enforce
+ * the selected model's actual limit.
*/
static inline int kvm_vgic_get_max_vcpus(void)
{
--
2.34.1
* [PATCH v2 13/39] KVM: arm64: gic-v5: Implement VPE IRS MMIO Ops
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (11 preceding siblings ...)
2026-05-21 14:53 ` [PATCH v2 12/39] KVM: arm64: gic-v5: Keep GICv5 vCPU limit model-specific Sascha Bischoff
@ 2026-05-21 14:53 ` Sascha Bischoff
2026-05-21 14:53 ` [PATCH v2 14/39] KVM: arm64: gic-v5: Set up VMTEs and VPE doorbells Sascha Bischoff
` (25 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:53 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Introduce interfaces to make VPEs valid, and to configure them, via
the host's IRS. As with the other valid bits in the GICv5 VM tables,
VPEs cannot be made valid directly, and instead are made valid via an
IRS MMIO Op.
Additionally, part of the VPE configuration also takes place via the
IRS MMIO interface (IRS_VPE_SELR, IRS_VPE_CR0 and IRS_VPE_DBR). VPE
doorbells, for example, are configured this way.
The existing VPE-doorbell-based commands are extended with:
VPE_MAKE_VALID - Make the VPE valid in the VPET
Note: there is no VPE_MAKE_INVALID. VPEs are only made invalid on
teardown, at which point the whole VMTE is marked invalid, so a
separate command is not required.
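The VPE bring-up in this patch is a fixed sequence of IRS accesses, each followed by a wait for the relevant status register to report idle: IRS_VMAP_VPER (waiting on the VM status), then IRS_VPE_SELR, a check of IRS_VPE_STATUSR.V, IRS_VPE_CR0 (DPS for targeted-only routing) and finally IRS_VPE_DBR. The sketch below records that order in a trace. `enum irs_access`, `struct trace` and the `vpe_valid` switch are hypothetical, and real register encodings are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum irs_access {
	WR_VMAP_VPER, WAIT_VM_IDLE, WR_VPE_SELR, WAIT_VPE_IDLE,
	RD_VPE_STATUSR, WR_VPE_CR0, WR_VPE_DBR,
};

struct trace {
	enum irs_access log[16];
	int n;
	bool vpe_valid; /* hypothetical: what VPE_STATUSR.V reports */
};

static void rec(struct trace *t, enum irs_access a)
{
	if (t->n < 16)
		t->log[t->n++] = a;
}

static int set_up_vpe(struct trace *t)
{
	rec(t, WAIT_VM_IDLE);   /* make sure the IRS is idle first */
	rec(t, WR_VMAP_VPER);   /* mark the VPE valid in the VPET */
	rec(t, WAIT_VM_IDLE);
	rec(t, WR_VPE_SELR);    /* select (VM ID, VPE ID) */
	rec(t, WAIT_VPE_IDLE);
	rec(t, RD_VPE_STATUSR);
	if (!t->vpe_valid)
		return -EINVAL; /* selection did not hit a valid VPE */
	rec(t, WR_VPE_CR0);     /* DPS: targeted-only routing */
	rec(t, WAIT_VPE_IDLE);
	rec(t, WR_VPE_DBR);     /* doorbell INTID + DBV */
	rec(t, WAIT_VPE_IDLE);
	return 0;
}
```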
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-v5.c | 84 ++++++++++++++++++++++++++++++
include/linux/irqchip/arm-gic-v5.h | 1 +
2 files changed, 85 insertions(+)
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 909cef5f31afa..6a312c24d0b31 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -228,6 +228,18 @@ static int vgic_v5_irs_wait_for_vm_op(void)
NULL);
}
+/*
+ * Wait for completion of a change in any of IRS_VPE_SELR, IRS_VPE_DBR,
+ * IRS_VPE_CR0.
+ */
+static int vgic_v5_irs_wait_for_vpe_op(void)
+{
+ return gicv5_wait_for_op_atomic(irs_caps.irs_base,
+ GICV5_IRS_VPE_STATUSR,
+ GICV5_IRS_VPE_STATUSR_IDLE,
+ NULL);
+}
+
static int vgic_v5_irs_write_vm_mmio_reg(u64 val, u32 offset)
{
int ret;
@@ -328,10 +340,73 @@ static int vgic_v5_irs_set_vist_invalid(u16 vm_id, bool spi_ist)
return __vgic_v5_irs_update_vist_validity(vm_id, spi_ist, true);
}
+static int vgic_v5_irs_set_up_vpe(u16 vm_id, u16 vpe_id,
+ irq_hw_number_t db_hwirq)
+{
+ u64 vmap_vper, dbr, selr;
+ u32 statusr, cr0;
+ int ret;
+
+ guard(raw_spinlock_irqsave)(&global_irs_lock);
+
+ /* Make sure that we are idle to begin with */
+ ret = vgic_v5_irs_wait_for_vm_op();
+ if (ret)
+ return ret;
+
+ /* Mark the VPE as valid */
+ vmap_vper = FIELD_PREP(GICV5_IRS_VMAP_VPER_VPE_ID, vpe_id) |
+ FIELD_PREP(GICV5_IRS_VMAP_VPER_VM_ID, vm_id) |
+ GICV5_IRS_VMAP_VPER_M;
+ irs_writeq_relaxed(vmap_vper, GICV5_IRS_VMAP_VPER);
+
+ /* Wait for the VPE to be marked valid in the VPET */
+ ret = vgic_v5_irs_wait_for_vm_op();
+ if (ret)
+ return ret;
+
+ selr = FIELD_PREP(GICV5_IRS_VPE_SELR_VPE_ID, vpe_id) |
+ FIELD_PREP(GICV5_IRS_VPE_SELR_VM_ID, vm_id) |
+ GICV5_IRS_VPE_SELR_S;
+ irs_writeq_relaxed(selr, GICV5_IRS_VPE_SELR);
+
+ ret = vgic_v5_irs_wait_for_vpe_op();
+ if (ret)
+ return ret;
+
+ statusr = irs_readl_relaxed(GICV5_IRS_VPE_STATUSR);
+ if (!FIELD_GET(GICV5_IRS_VPE_STATUSR_V, statusr))
+ return -EINVAL;
+
+ /* Set targeted only routing (disable 1ofN vPE selection) */
+ cr0 = GICV5_IRS_VPE_CR0_DPS;
+ irs_writel_relaxed(cr0, GICV5_IRS_VPE_CR0);
+
+ ret = vgic_v5_irs_wait_for_vpe_op();
+ if (ret)
+ return ret;
+
+ /*
+ * The VPE has not yet run. Therefore, make sure that all interrupts
+ * will generate a doorbell.
+ */
+ dbr = FIELD_PREP(GICV5_IRS_VPE_DBR_INTID, db_hwirq) |
+ GICV5_IRS_VPE_DBR_DBV;
+ irs_writeq_relaxed(dbr, GICV5_IRS_VPE_DBR);
+
+ ret = vgic_v5_irs_wait_for_vpe_op();
+ if (ret)
+ return ret;
+
+ return 0;
+}
+
static int vgic_v5_db_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
{
struct vgic_v5_vm *vm = data->domain->host_data;
enum gicv5_vcpu_cmd *cmd = vcpu_info;
+ /* Our VPE ID is the index within the doorbell domain */
+ u16 vpe_id = data->hwirq;
switch (*cmd) {
case VMT_L2_MAP:
@@ -340,6 +415,15 @@ static int vgic_v5_db_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
return vgic_v5_irs_set_vm_valid(vm->vm_id);
case VMTE_MAKE_INVALID:
return vgic_v5_irs_set_vm_invalid(vm->vm_id);
+ case VPE_MAKE_VALID:
+ /*
+ * We need the actual LPI ID which lives in the top-most parent
+ * domain. This hwirq won't include the type (LPI) but that's
+ * not required for the IRS_VPE_DBR.
+ */
+ while (data->parent_data)
+ data = data->parent_data;
+ return vgic_v5_irs_set_up_vpe(vm->vm_id, vpe_id, data->hwirq);
case SPI_VIST_MAKE_VALID:
return vgic_v5_irs_set_vist_valid(vm->vm_id, true);
case LPI_VIST_MAKE_VALID:
diff --git a/include/linux/irqchip/arm-gic-v5.h b/include/linux/irqchip/arm-gic-v5.h
index ef649faeeb0ff..4cf85859f3a31 100644
--- a/include/linux/irqchip/arm-gic-v5.h
+++ b/include/linux/irqchip/arm-gic-v5.h
@@ -623,6 +623,7 @@ enum gicv5_vcpu_cmd {
VMT_L2_MAP, /* Map in a L2 VMT - *may* happen on VM init */
VMTE_MAKE_VALID, /* Make the VMTE valid */
VMTE_MAKE_INVALID, /* Make the VMTE (et al.) invalid */
+ VPE_MAKE_VALID, /* No corresponding invalid */
SPI_VIST_MAKE_VALID, /* No corresponding invalid */
LPI_VIST_MAKE_VALID, /* Triggered by a guest */
LPI_VIST_MAKE_INVALID, /* Triggered by a guest */
--
2.34.1
* [PATCH v2 14/39] KVM: arm64: gic-v5: Set up VMTEs and VPE doorbells
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (12 preceding siblings ...)
2026-05-21 14:53 ` [PATCH v2 13/39] KVM: arm64: gic-v5: Implement VPE IRS MMIO Ops Sascha Bischoff
@ 2026-05-21 14:53 ` Sascha Bischoff
2026-05-21 14:54 ` [PATCH v2 15/39] KVM: arm64: gic-v5: Add resident/non-resident hyp calls Sascha Bischoff
` (24 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:53 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
A GICv5 VM needs a VM table entry before it can use SPIs and LPIs,
which are backed by the host IRS. The VM table itself is created at
probe time, but each VM still needs to claim and populate one VMTE
before it can use those interrupts.
Allocate a VM ID during vgic_v5_init(). The VM ID is also the index
into the VM table, so allocating it selects the VMTE slot that will be
used for the lifetime of the VM.
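Because the VM ID doubles as the VMTE index, allocating an ID is the same as claiming a table slot, and releasing it frees the slot for a later VM. A minimal bitmap-based sketch, assuming a hypothetical 64-entry table; vgic_v5_allocate_vm_id() itself is not shown in this patch and may use a different allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define NR_VM_IDS 64 /* hypothetical VMT size */

struct vmt {
	uint64_t used;         /* bit n set => VMTE n is claimed */
	int owner[NR_VM_IDS];  /* illustrative per-slot payload */
};

/* Claim the lowest free VMTE; the returned ID is the table index. */
static int alloc_vm_id(struct vmt *t, int owner)
{
	for (int id = 0; id < NR_VM_IDS; id++) {
		if (!(t->used & (UINT64_C(1) << id))) {
			t->used |= UINT64_C(1) << id;
			t->owner[id] = owner;
			return id;
		}
	}
	return -ENOSPC;
}

/* Release the slot so a later VM can reuse the same VMTE. */
static void release_vm_id(struct vmt *t, int id)
{
	if (id < 0 || id >= NR_VM_IDS)
		return;
	t->used &= ~(UINT64_C(1) << id);
	t->owner[id] = 0;
}
```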
Create a per-VM VPE doorbell irq domain, allocate one doorbell
interrupt per vcpu, request the interrupts, and keep the doorbell IRQ
number in the vcpu's GICv5 state. The doorbell handler itself marks
the VPE doorbell as fired, raises KVM_REQ_IRQ_PENDING, and kicks the
target vcpu so that KVM can re-evaluate pending interrupt state.
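The doorbell handler only records state and pokes the vcpu; the actual re-evaluation happens later on the vcpu's own path. A user-space analogue using C11 atomics, where `struct fake_vcpu`, `REQ_IRQ_PENDING`, the kick counter and the consumer function are hypothetical stand-ins for the GICv5 VPE state, kvm_make_request() and kvm_vcpu_kick(). The consumer is not part of this patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define REQ_IRQ_PENDING (1u << 0) /* hypothetical request bit */

struct fake_vcpu {
	atomic_bool db_fired;  /* doorbell has fired since the last check */
	atomic_uint requests;  /* pending request bits */
	atomic_uint kicks;     /* number of times the vcpu was kicked */
};

/* Doorbell "IRQ handler": record, request re-evaluation, kick. */
static void db_handler(struct fake_vcpu *v)
{
	atomic_store(&v->db_fired, true);
	atomic_fetch_or(&v->requests, REQ_IRQ_PENDING);
	atomic_fetch_add(&v->kicks, 1);
}

/* vcpu side (hypothetical): consume the request and the doorbell flag. */
static bool vcpu_check_pending(struct fake_vcpu *v)
{
	unsigned int req = atomic_fetch_and(&v->requests, ~REQ_IRQ_PENDING);

	if (!(req & REQ_IRQ_PENDING))
		return false;
	return atomic_exchange(&v->db_fired, false);
}
```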
With the VM ID and doorbells in place, initialise the VMTE backing
state, including the VM descriptor and VPE table. The doorbells have
to exist before making the VMTE valid, as they provide the IRQ-side
conduit used by the IRS commands. Make the VMTE valid via the IRS,
then allocate the VPE state for each vcpu.
Extend vgic_v5_teardown() to unwind this state in reverse order. Make
the VMTE invalid, free the VPE state, release the VMTE backing state,
free the doorbell IRQs and irq domain, and finally release the VM ID
so that the VMTE slot can be reused by a later VM.
On init failure, call the same teardown path so that partially created
state is unwound consistently.
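The init/teardown contract above (teardown copes with any partially built state, and a second call is a no-op) can be sketched with per-stage flags. The stage order mirrors this patch, but `struct vm_state`, `fails()` and the `fail_at` injection knob are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define VM_ID_INVAL (-1)

struct vm_state {
	int vm_id;
	bool domain, doorbells, vmte, vmte_valid, vpes;
	int fail_at;        /* hypothetical: stage that fails (0 = none) */
	int teardowns_done; /* teardowns that actually did work */
};

static void teardown(struct vm_state *s)
{
	if (s->vm_id == VM_ID_INVAL)
		return;                 /* failed very early, or already torn down */

	if (s->vmte) {                  /* reverse order of init */
		s->vmte_valid = false;  /* VMTE_MAKE_INVALID */
		s->vpes = false;        /* free per-VPE state */
		s->vmte = false;        /* release the VMTE */
	}
	s->doorbells = false;           /* free any requested doorbell IRQs */
	s->domain = false;              /* tear down the per-VM domain */
	s->vm_id = VM_ID_INVAL;         /* release the VM ID (VMTE slot) last */
	s->teardowns_done++;
}

static bool fails(const struct vm_state *s, int stage)
{
	return s->fail_at == stage;
}

static int init(struct vm_state *s, int id)
{
	s->vm_id = id;          /* 1: allocate VM ID */
	if (fails(s, 2))
		goto err;
	s->domain = true;       /* 2: per-VM doorbell irq domain */
	if (fails(s, 3))
		goto err;
	s->doorbells = true;    /* 3: request one doorbell per vcpu */
	if (fails(s, 4))
		goto err;
	s->vmte = true;         /* 4: populate VMTE, VPET, VM descriptor */
	if (fails(s, 5))
		goto err;
	s->vmte_valid = true;   /* 5: VMTE_MAKE_VALID via the IRS */
	if (fails(s, 6))
		goto err;
	s->vpes = true;         /* 6: per-VPE state */
	return 0;
err:
	teardown(s);            /* safe on any partial state */
	return -ENOMEM;
}
```

Because teardown resets the VM ID, the later call made on VM destruction returns immediately, and userspace can retry init from a clean slate.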
As part of resetting a vcpu, mark it as valid in the VM's VPE Table
(VPET). This tells the IRS that the VPE may be made resident; without
it, the IRS treats the VPE as invalid.
Also introduce vgic_v5_send_command(), a convenience wrapper around
the VPE doorbells. It takes a struct kvm_vcpu pointer and the command
to run, and triggers the function bound to that command via the
vcpu's doorbell (through irq_set_vcpu_affinity()). This simplifies the
callers.
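The wrapper hides the fact that irq_set_vcpu_affinity() takes an untyped `void *vcpu_info`: the caller passes the command by value, its address is handed down, and the doorbell's callback casts it back. A minimal sketch of that calling convention; `struct fake_db` and `fake_set_vcpu_affinity()` are hypothetical stand-ins for the doorbell irq_data and the irq core, and only a subset of the commands is copied.

```c
#include <assert.h>
#include <errno.h>

enum vcpu_cmd { VMTE_MAKE_VALID, VMTE_MAKE_INVALID, VPE_MAKE_VALID };

struct fake_db {
	int last_cmd; /* what the callback last handled */
	int (*set_vcpu_affinity)(struct fake_db *db, void *vcpu_info);
};

/* Stand-in for the irq core: forwards the opaque pointer unchanged. */
static int fake_set_vcpu_affinity(struct fake_db *db, void *vcpu_info)
{
	return db->set_vcpu_affinity(db, vcpu_info);
}

/* Doorbell callback: recovers the command from the opaque pointer. */
static int db_set_vcpu_affinity(struct fake_db *db, void *vcpu_info)
{
	enum vcpu_cmd *cmd = vcpu_info;

	switch (*cmd) {
	case VMTE_MAKE_VALID:
	case VMTE_MAKE_INVALID:
	case VPE_MAKE_VALID:
		db->last_cmd = *cmd;
		return 0;
	}
	return -EINVAL;
}

/* The convenience wrapper: take the command by value, pass its address. */
static int send_command(struct fake_db *db, enum vcpu_cmd cmd)
{
	return fake_set_vcpu_affinity(db, &cmd);
}
```

Passing the address of a by-value parameter is safe here because the callback runs synchronously and does not keep the pointer.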
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-v5.c | 154 +++++++++++++++++++++++++++++++---
1 file changed, 144 insertions(+), 10 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 6a312c24d0b31..08f2411c0a134 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -401,6 +401,23 @@ static int vgic_v5_irs_set_up_vpe(u16 vm_id, u16 vpe_id,
return 0;
}
+static irqreturn_t db_handler(int irq, void *data)
+{
+ struct kvm_vcpu *vcpu = data;
+
+ WRITE_ONCE(vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db_fired, true);
+
+ kvm_make_request(KVM_REQ_IRQ_PENDING, vcpu);
+ kvm_vcpu_kick(vcpu);
+
+ return IRQ_HANDLED;
+}
+
+static int vgic_v5_send_command(struct kvm_vcpu *vcpu, enum gicv5_vcpu_cmd cmd)
+{
+ return irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu), &cmd);
+}
+
static int vgic_v5_db_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
{
struct vgic_v5_vm *vm = data->domain->host_data;
@@ -575,33 +592,101 @@ void vgic_v5_reset(struct kvm_vcpu *vcpu)
* CPUIF (but potentially fewer in the IRS).
*/
vcpu->arch.vgic_cpu.num_pri_bits = 5;
+
+ /* Make the VPE valid in the VPET */
+ if (WARN_ON(vgic_v5_send_command(vcpu, VPE_MAKE_VALID)))
+ return;
+}
+
+static void vgic_v5_free_doorbells(struct kvm *kvm, unsigned int nr_dbs)
+{
+ struct vgic_v5_vm *vm = &kvm->arch.vgic.gicv5_vm;
+ struct kvm_vcpu *vcpu;
+ unsigned long i;
+ int db;
+
+ for (i = 0; i < nr_dbs; i++) {
+ vcpu = kvm_get_vcpu(kvm, i);
+ db = vgic_v5_vpe_db(vcpu);
+ if (!db)
+ continue;
+
+ free_irq(db, vcpu);
+ vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db = 0;
+ }
+
+ if (vm->vpe_db_base) {
+ irq_domain_free_irqs(vm->vpe_db_base,
+ atomic_read(&kvm->online_vcpus));
+ vm->vpe_db_base = 0;
+ }
}
void vgic_v5_teardown(struct kvm *kvm)
{
+ struct vgic_dist *dist = &kvm->arch.vgic;
+ struct kvm_vcpu *vcpu, *vcpu0;
+ unsigned long i;
+ int rc;
+
+ /*
+ * If the VM's ID isn't valid, then we either failed init very early or
+ * we've been called a second time. Nothing to do here in either case.
+ */
+ if (kvm->arch.vgic.gicv5_vm.vm_id == VGIC_V5_VM_ID_INVAL)
+ return;
+
+ if (kvm->arch.vgic.gicv5_vm.vmte_allocated) {
+ /* Make the VM invalid */
+ vcpu0 = kvm_get_vcpu(kvm, 0);
+ rc = vgic_v5_send_command(vcpu0, VMTE_MAKE_INVALID);
+ if (rc)
+ kvm_err("could not make VMTE invalid\n");
+
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ if (vgic_v5_vmte_free_vpe(vcpu))
+ kvm_err("Failed to free VPE\n");
+ }
+
+ if (vgic_v5_vmte_release(kvm))
+ kvm_err("Failed to release VM 0x%x\n", dist->gicv5_vm.vm_id);
+ }
+
+ vgic_v5_free_doorbells(kvm, atomic_read(&kvm->online_vcpus));
+
vgic_v5_teardown_per_vm_domain(&kvm->arch.vgic.gicv5_vm);
+
+ vgic_v5_release_vm_id(kvm);
}
+/*
+ * Claim and populate a VMTE (optionally making a new L2 VMT valid), create VPE
+ * doorbells, allocate VPET and populate for each VPE.
+ *
+ * Note: We do need to put the cart before the horse here. The VPE doorbells are
+ * our conduit for communication with the IRS, which means we need to have those
+ * before making the VMTE valid.
+ *
+ * On failure, we clean up in the teardown path (vgic_v5_teardown()).
+ */
int vgic_v5_init(struct kvm *kvm)
{
- struct kvm_vcpu *vcpu;
- unsigned long idx;
- int ret;
+ struct kvm_vcpu *vcpu, *vcpu0;
+ int nr_vcpus, ret = 0;
+ unsigned int db_virq;
+ unsigned long i;
- if (vgic_initialized(kvm))
- return 0;
+ nr_vcpus = atomic_read(&kvm->online_vcpus);
+ if (nr_vcpus == 0)
+ return -ENODEV;
- kvm_for_each_vcpu(idx, vcpu, kvm) {
+ kvm_for_each_vcpu(i, vcpu, kvm) {
if (vcpu_has_nv(vcpu)) {
kvm_err("Nested GICv5 VMs are currently unsupported\n");
return -EINVAL;
}
}
- ret = vgic_v5_create_per_vm_domain(kvm);
- if (ret)
- return ret;
-
/* We only allow userspace to drive the SW_PPI, if it is implemented. */
bitmap_zero(kvm->arch.vgic.gicv5_vm.userspace_ppis,
VGIC_V5_NR_PRIVATE_IRQS);
@@ -610,7 +695,56 @@ int vgic_v5_init(struct kvm *kvm)
kvm->arch.vgic.gicv5_vm.userspace_ppis,
ppi_caps.impl_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
+ ret = vgic_v5_allocate_vm_id(kvm);
+ if (ret)
+ return ret;
+
+ ret = vgic_v5_create_per_vm_domain(kvm);
+ if (ret)
+ goto err;
+
+ db_virq = kvm->arch.vgic.gicv5_vm.vpe_db_base;
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ ret = request_irq(db_virq + i, db_handler, 0, "vcpu", vcpu);
+ if (ret)
+ goto err;
+
+ /* Stash it with the VCPU for easy retrieval */
+ vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db = db_virq + i;
+ }
+
+ /* Populate VMTE (with VPET and VM descriptor) */
+ ret = vgic_v5_vmte_init(kvm);
+ if (ret)
+ goto err;
+
+ /* We pick the first vcpu to make the VMTE valid - any would do */
+ vcpu0 = kvm_get_vcpu(kvm, 0);
+ ret = vgic_v5_send_command(vcpu0, VMTE_MAKE_VALID);
+ if (ret)
+ goto err;
+
+ /* Loop over all VPEs, allocate/populate their data structures */
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ ret = vgic_v5_vmte_alloc_vpe(vcpu);
+ if (ret)
+ goto err;
+ }
+
return 0;
+
+err:
+ /*
+ * Explicitly tear everything down on failure. The teardown function is
+ * written to handle any partial state we might have, so we don't need
+ * to do any clean-up first. Teardown will be called a second time on VM
+ * destruction, but that's fine - it is better to leave things in a
+ * clean state now, and doubly so because userspace could actually go
+ * and retry init.
+ */
+ vgic_v5_teardown(kvm);
+
+ return ret;
}
int vgic_v5_map_resources(struct kvm *kvm)
--
2.34.1
* [PATCH v2 15/39] KVM: arm64: gic-v5: Add resident/non-resident hyp calls
From: Sascha Bischoff @ 2026-05-21 14:54 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
GICv5 introduces the concept of VPE residency - a VPE can be either
resident or non-resident. When the VPE is resident, the IRS is allowed
to select interrupts that target that VPE (or the VM) as the HPPI
(Highest Priority Pending Interrupt). As the IRS handles both SPIs and
LPIs, these will only be picked as the IRS's HPPI when a VPE is
resident.
A GICv5 VPE is made resident by writing ICH_CONTEXTR_EL2 with
ICH_CONTEXTR_EL2.V set, together with valid VM and VPE IDs. This
informs the IRS that a specific VPE is running, and that it can begin
HPPI selection for that VPE. Making a VPE non-resident (by making the
ICH_CONTEXTR_EL2 invalid) informs the IRS that the VPE is no longer
running, and it stops HPPI selection for it.
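As a rough illustration of how the value written here is assembled with FIELD_PREP() and inspected with FIELD_GET(), the following userspace C sketch packs V, the VM ID and the VPE ID into a single 64-bit word. The bit positions and the ctxr_* helper names are hypothetical; the real ICH_CONTEXTR_EL2 layout is defined by the GICv5 specification and the kernel's generated sysreg definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical field layout, for illustration only. Consult the GICv5
 * specification for the real ICH_CONTEXTR_EL2 bit positions.
 */
#define CTXR_V_SHIFT   63
#define CTXR_F_SHIFT   62
#define CTXR_VM_SHIFT  32
#define CTXR_VM_MASK   0xffffULL
#define CTXR_VPE_SHIFT 0
#define CTXR_VPE_MASK  0xffffULL

/* Compose a "make resident" value: V set plus the VM and VPE IDs. */
static uint64_t ctxr_make_resident(uint16_t vm, uint16_t vpe)
{
	return (1ULL << CTXR_V_SHIFT) |
	       (((uint64_t)vm & CTXR_VM_MASK) << CTXR_VM_SHIFT) |
	       (((uint64_t)vpe & CTXR_VPE_MASK) << CTXR_VPE_SHIFT);
}

/* Field extractors, in the spirit of FIELD_GET(). */
static bool ctxr_valid(uint64_t v)
{
	return (v >> CTXR_V_SHIFT) & 1;
}

static bool ctxr_fault(uint64_t v)
{
	return (v >> CTXR_F_SHIFT) & 1;
}

static uint16_t ctxr_vm(uint64_t v)
{
	return (v >> CTXR_VM_SHIFT) & CTXR_VM_MASK;
}

static uint16_t ctxr_vpe(uint64_t v)
{
	return (v >> CTXR_VPE_SHIFT) & CTXR_VPE_MASK;
}
```

Writing zero (V clear) is what makes the VPE non-resident again, which is why vgic_v5_put() simply resets the stored value before calling the hyp.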
This change introduces two new hyp calls - one to make a VPE resident
and its counterpart to make a VPE non-resident. As part of making a
VPE resident, the resulting ICH_CONTEXTR_EL2.F bit is checked to catch
residency faults. Such a fault indicates a broken VM/VPE setup, so
warn and mark the VM dead.
Furthermore, this change extends vgic_v5_load() and vgic_v5_put() to
make the VPEs resident and non-resident, respectively. Hence, the VPE
is considered resident for the entire load-to-put interval.
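The load/put ordering and the fault check can be modelled in userspace. In this sketch all names are hypothetical: fake_hw_write_read() stands in for writing ICH_CONTEXTR_EL2 and reading it back, with a flag to inject a residency fault, and the V/F bit positions are assumptions. model_put() clears the software resident flag before invalidating the register, mirroring the order used in __vgic_v5_make_non_resident().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define CTXR_V (1ULL << 63)
#define CTXR_F (1ULL << 62)

struct vpe_model {
	uint64_t contextr; /* value last written and read back */
	bool resident;     /* software view of residency */
	bool vm_dead;      /* set when residency faults */
};

/*
 * Stand-in for the system register. Real hardware reports a residency
 * fault via F; here the caller injects one to exercise the error path.
 */
static uint64_t fake_hw_write_read(uint64_t val, bool inject_fault)
{
	return inject_fault ? (val | CTXR_F) : val;
}

/* vcpu load: write V plus IDs, read back, mark resident only on success. */
static void model_load(struct vpe_model *m, uint64_t ids, bool inject_fault)
{
	m->contextr = fake_hw_write_read(CTXR_V | ids, inject_fault);
	if (m->contextr & CTXR_F) {
		m->vm_dead = true; /* mirrors WARN_ON() + kvm_vm_dead() */
		return;
	}
	m->resident = true;
}

/* vcpu put: clear the software flag first, then invalidate the register. */
static void model_put(struct vpe_model *m)
{
	m->resident = false;
	m->contextr = fake_hw_write_read(0, false);
}
```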
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/include/asm/kvm_asm.h | 2 ++
arch/arm64/include/asm/kvm_hyp.h | 2 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 16 ++++++++++++++++
arch/arm64/kvm/hyp/vgic-v5-sr.c | 26 ++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic-v5.c | 16 ++++++++++++++--
include/kvm/arm_vgic.h | 3 +++
6 files changed, 63 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index 043495f7fc78b..d9ff9c2999aa7 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -87,6 +87,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___tracing_write_event,
__KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
+ __KVM_HOST_SMCCC_FUNC___vgic_v5_make_resident,
+ __KVM_HOST_SMCCC_FUNC___vgic_v5_make_non_resident,
__KVM_HOST_SMCCC_FUNC___vgic_v5_save_apr,
__KVM_HOST_SMCCC_FUNC___vgic_v5_restore_vmcr_apr,
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 8d06b62e7188c..5f9184276b04e 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -88,6 +88,8 @@ void __vgic_v3_restore_vmcr_aprs(struct vgic_v3_cpu_if *cpu_if);
int __vgic_v3_perform_cpuif_access(struct kvm_vcpu *vcpu);
/* GICv5 */
+void __vgic_v5_make_resident(struct vgic_v5_cpu_if *cpu_if);
+void __vgic_v5_make_non_resident(struct vgic_v5_cpu_if *cpu_if);
void __vgic_v5_save_apr(struct vgic_v5_cpu_if *cpu_if);
void __vgic_v5_restore_vmcr_apr(struct vgic_v5_cpu_if *cpu_if);
/* No hypercalls for the following */
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 06db299c37a89..555275736fa77 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -672,6 +672,20 @@ static void handle___tracing_write_event(struct kvm_cpu_context *host_ctxt)
trace_selftest(id);
}
+static void handle___vgic_v5_make_resident(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
+
+ __vgic_v5_make_resident(kern_hyp_va(cpu_if));
+}
+
+static void handle___vgic_v5_make_non_resident(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
+
+ __vgic_v5_make_non_resident(kern_hyp_va(cpu_if));
+}
+
static void handle___vgic_v5_save_apr(struct kvm_cpu_context *host_ctxt)
{
DECLARE_REG(struct vgic_v5_cpu_if *, cpu_if, host_ctxt, 1);
@@ -719,6 +733,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__tracing_write_event),
HANDLE_FUNC(__vgic_v3_save_aprs),
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
+ HANDLE_FUNC(__vgic_v5_make_resident),
+ HANDLE_FUNC(__vgic_v5_make_non_resident),
HANDLE_FUNC(__vgic_v5_save_apr),
HANDLE_FUNC(__vgic_v5_restore_vmcr_apr),
diff --git a/arch/arm64/kvm/hyp/vgic-v5-sr.c b/arch/arm64/kvm/hyp/vgic-v5-sr.c
index 6d69dfe89a96c..f064045a31aee 100644
--- a/arch/arm64/kvm/hyp/vgic-v5-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v5-sr.c
@@ -7,6 +7,32 @@
#include <asm/kvm_hyp.h>
+void __vgic_v5_make_resident(struct vgic_v5_cpu_if *cpu_if)
+{
+ write_sysreg_s(cpu_if->vgic_contextr, SYS_ICH_CONTEXTR_EL2);
+ isb();
+
+ /* Catch any faults */
+ cpu_if->vgic_contextr = read_sysreg_s(SYS_ICH_CONTEXTR_EL2);
+ if (FIELD_GET(ICH_CONTEXTR_EL2_F, cpu_if->vgic_contextr))
+ return;
+
+ cpu_if->gicv5_vpe.resident = true;
+}
+
+void __vgic_v5_make_non_resident(struct vgic_v5_cpu_if *cpu_if)
+{
+ /*
+ * Mark as non-resident before actually making non-resident, to avoid
+ * racing with an arriving doorbell.
+ */
+ cpu_if->gicv5_vpe.resident = false;
+ dsb(st);
+
+ write_sysreg_s(cpu_if->vgic_contextr, SYS_ICH_CONTEXTR_EL2);
+ isb();
+}
+
void __vgic_v5_save_apr(struct vgic_v5_cpu_if *cpu_if)
{
cpu_if->vgic_apr = read_sysreg_s(SYS_ICH_APR_EL2);
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 08f2411c0a134..25590cf5ebee1 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -1038,6 +1038,8 @@ void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu)
void vgic_v5_load(struct kvm_vcpu *vcpu)
{
struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+ u16 vm = vgic_v5_vm_id(vcpu->kvm);
+ u16 vpe = vgic_v5_vpe_id(vcpu);
/*
* On the WFI path, vgic_load is called a second time. The first is when
@@ -1050,7 +1052,15 @@ void vgic_v5_load(struct kvm_vcpu *vcpu)
kvm_call_hyp(__vgic_v5_restore_vmcr_apr, cpu_if);
- cpu_if->gicv5_vpe.resident = true;
+ cpu_if->vgic_contextr = FIELD_PREP(ICH_CONTEXTR_EL2_V, true) |
+ FIELD_PREP(ICH_CONTEXTR_EL2_VPE, vpe) |
+ FIELD_PREP(ICH_CONTEXTR_EL2_VM, vm);
+
+ kvm_call_hyp(__vgic_v5_make_resident, cpu_if);
+
+ /* Failed to make the VPE resident? Bang! */
+ if (WARN_ON(FIELD_GET(ICH_CONTEXTR_EL2_F, cpu_if->vgic_contextr)))
+ kvm_vm_dead(vcpu->kvm);
}
void vgic_v5_put(struct kvm_vcpu *vcpu)
@@ -1068,7 +1078,9 @@ void vgic_v5_put(struct kvm_vcpu *vcpu)
kvm_call_hyp(__vgic_v5_save_apr, cpu_if);
- cpu_if->gicv5_vpe.resident = false;
+ cpu_if->vgic_contextr = 0;
+
+ kvm_call_hyp(__vgic_v5_make_non_resident, cpu_if);
/* The shadow priority is only updated on entering WFI */
if (vcpu_get_flag(vcpu, IN_WFI))
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 6f736094a0e7e..faecde764fea3 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -536,6 +536,9 @@ struct vgic_v5_cpu_if {
*/
u64 vgic_icsr;
+ /* The contextr used to make VPEs resident and non-resident */
+ u64 vgic_contextr;
+
struct gicv5_vpe gicv5_vpe;
};
--
2.34.1
* [PATCH v2 16/39] KVM: arm64: gic-v5: Request doorbells when VPEs enter WFI
From: Sascha Bischoff @ 2026-05-21 14:54 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
When a GICv5 VPE is made non-resident as part of the vcpu entering
WFI, request a VPE doorbell so that KVM can be notified when a
suitable SPI or LPI becomes pending for that VPE.
Program the doorbell priority mask, DBPM, from the effective virtual
priority mask before making the VPE non-resident. DBPM is the priority
threshold used by the GICv5 hardware to decide whether a pending SPI
or LPI is allowed to signal the VPE doorbell. This allows hardware to
signal the doorbell only for interrupts that the vcpu can actually
take, and avoids waking it for interrupts masked by the guest priority
state. If no interrupt can be signalled to the vcpu, leave the
doorbell request clear.
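The DBPM derivation can be sketched as follows. The signalling rule used here, that a pending interrupt with priority value p may ring the doorbell only if p <= DBPM (numerically lower values being higher priority), is an assumption chosen to match the "subtract 1" reasoning in the patch, not a statement of the architected rule. compute_db_request() and would_ring() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Doorbell request derived from the effective virtual priority mask. */
struct db_request {
	bool requested;
	uint32_t dbpm;
};

static struct db_request compute_db_request(uint32_t effective_mask)
{
	struct db_request req = { .requested = false, .dbpm = 0 };

	/* A mask of 0 means nothing can be delivered: leave DB clear. */
	if (effective_mask == 0)
		return req;

	req.requested = true;
	/* Only priorities strictly higher than the mask may signal. */
	req.dbpm = effective_mask - 1;
	return req;
}

/* Assumed rule: ring only if requested and p <= DBPM. */
static bool would_ring(struct db_request req, uint32_t prio)
{
	return req.requested && prio <= req.dbpm;
}
```

Under this rule, an interrupt at exactly the mask value does not wake the vcpu, which matches the guest being unable to take it.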
Make the doorbell interrupt affine to the current CPU before
requesting it. This nudges the wakeup back towards the CPU that last
ran the vcpu, where the relevant state is more likely to be cache-hot,
while also spreading doorbell interrupts across host PEs as different
vcpus enter WFI on different CPUs.
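A toy model of the check-before-set affinity logic, using a 64-bit word in place of a struct cpumask. toy_set_affinity() stands in for irq_set_affinity() and counts how often the reprogramming path runs; all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct toy_irq {
	uint64_t effective_affinity; /* bit N set: affine to CPU N */
	unsigned int reprogram_count;
};

/* Stand-in for the comparatively expensive affinity change. */
static void toy_set_affinity(struct toy_irq *irq, unsigned int cpu)
{
	irq->effective_affinity = 1ULL << cpu;
	irq->reprogram_count++;
}

/* Only retarget the doorbell if it is not already affine to @cpu. */
static void toy_make_db_affine(struct toy_irq *irq, unsigned int cpu)
{
	assert(cpu < 64);
	if (!(irq->effective_affinity & (1ULL << cpu)))
		toy_set_affinity(irq, cpu);
}
```

When a vcpu repeatedly enters WFI on the same CPU, the test succeeds every time after the first and the reprogramming path is skipped.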
Clear stale db_fired state before making the VPE non-resident. Any
previous doorbell notification has already been consumed by this
point, and clearing it before the non-resident transition ensures that
a newly fired doorbell is observed.
Finally, teach kvm_vgic_vcpu_pending_irq() to report pending work for
a GICv5 vcpu when its VPE doorbell has fired, in addition to the
existing pending-PPI check.
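The db_fired handshake can be sketched with relaxed C11 atomics standing in for the kernel's WRITE_ONCE()/READ_ONCE(). Both provide single, untorn accesses without implying ordering, but they are not identical mechanisms. The toy_* structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_vpe {
	atomic_bool db_fired;
	bool has_pending_ppi;
};

/* Doorbell IRQ handler: a suitable SPI/LPI became pending. */
static void toy_db_handler(struct toy_vpe *vpe)
{
	atomic_store_explicit(&vpe->db_fired, true, memory_order_relaxed);
}

/* On the way to non-resident: drop the already-consumed notification. */
static void toy_make_non_resident(struct toy_vpe *vpe)
{
	atomic_store_explicit(&vpe->db_fired, false, memory_order_relaxed);
	/* ...then request the doorbell and invalidate the context. */
}

/* Mirrors the GICv5 branch of kvm_vgic_vcpu_pending_irq(). */
static bool toy_pending_irq(struct toy_vpe *vpe)
{
	if (atomic_load_explicit(&vpe->db_fired, memory_order_relaxed))
		return true;
	return vpe->has_pending_ppi;
}
```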
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/hyp/vgic-v5-sr.c | 9 ++++++++
arch/arm64/kvm/vgic/vgic-v5.c | 40 +++++++++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic.c | 6 ++++-
3 files changed, 54 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/hyp/vgic-v5-sr.c b/arch/arm64/kvm/hyp/vgic-v5-sr.c
index f064045a31aee..46992a6c2cacb 100644
--- a/arch/arm64/kvm/hyp/vgic-v5-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v5-sr.c
@@ -22,6 +22,15 @@ void __vgic_v5_make_resident(struct vgic_v5_cpu_if *cpu_if)
void __vgic_v5_make_non_resident(struct vgic_v5_cpu_if *cpu_if)
{
+ /*
+ * Clear the db_fired state to ensure that we're ready for the next
+ * doorbell when it is requested. If a doorbell firing caused us to
+ * enter the guest, then we've already consumed that state at this
+ * point, so this is safe to clear. Use WRITE_ONCE() as the doorbell
+ * handler may concurrently set the state true again.
+ */
+ WRITE_ONCE(cpu_if->gicv5_vpe.db_fired, false);
+
/*
* Mark as non-resident before actually making non-resident, to avoid
* racing with an arriving doorbell.
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 25590cf5ebee1..b966495901cc4 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -1079,6 +1079,46 @@ void vgic_v5_put(struct kvm_vcpu *vcpu)
kvm_call_hyp(__vgic_v5_save_apr, cpu_if);
cpu_if->vgic_contextr = 0;
+ if (vcpu_get_flag(vcpu, IN_WFI)) {
+ u32 priority_mask;
+ int dbpm;
+
+ /*
+ * Find the virtual running priority and use this to calculate
+ * the doorbell priority mask. We combine the highest active
+ * priority and the CPU's priority mask. The guest can't handle
+ * interrupts with priorities less than or equal to the virtual
+ * running priority, so there's literally no point in waking the
+ * guest for these.
+ *
+ * The priority needs to be higher than the mask to signal, so
+ * pick the next higher priority (subtract 1).
+ */
+ priority_mask = vgic_v5_get_effective_priority_mask(vcpu);
+
+ /*
+ * Request a doorbell *unless* the priority is 0, indicating
+ * that no interrupt can wake the CPU up.
+ */
+ if (priority_mask) {
+ int db_irq = vgic_v5_vpe_db(vcpu);
+ struct irq_data *d = irq_get_irq_data(db_irq);
+ const struct cpumask *aff = irq_data_get_effective_affinity_mask(d);
+ int cpu = smp_processor_id();
+
+ dbpm = priority_mask - 1;
+ cpu_if->vgic_contextr = FIELD_PREP(ICH_CONTEXTR_EL2_DB, 1) |
+ FIELD_PREP(ICH_CONTEXTR_EL2_DBPM, dbpm);
+
+ /*
+ * Make the doorbell affine to this CPU, if it isn't
+ * already. Actively check the cpumask first as it is
+ * cheaper than changing the affinity every time.
+ */
+ if (!cpumask_test_cpu(cpu, aff))
+ WARN_ON(irq_set_affinity(db_irq, cpumask_of(cpu)));
+ }
+ }
kvm_call_hyp(__vgic_v5_make_non_resident, cpu_if);
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index b697678d68b01..d56e87a0d2acc 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -1229,8 +1229,12 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
unsigned long flags;
struct vgic_vmcr vmcr;
- if (vgic_is_v5(vcpu->kvm))
+ if (vgic_is_v5(vcpu->kvm)) {
+ if (READ_ONCE(vcpu->arch.vgic_cpu.vgic_v5.gicv5_vpe.db_fired))
+ return true;
+
return vgic_v5_has_pending_ppi(vcpu);
+ }
if (!vcpu->kvm->arch.vgic.enabled)
return false;
--
2.34.1
* [PATCH v2 17/39] KVM: arm64: gic-v5: Introduce struct vgic_v5_irs and IRS base address
From: Sascha Bischoff @ 2026-05-21 14:54 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
In order to properly emulate the operation of the IRS from KVM, we
require storage for the MMIO register state. This change introduces
struct vgic_v5_irs, and adds a pointer to it to the struct vgic_dist.
This new data structure holds the IRS MMIO state required to emulate
the MMIO interface in KVM. It provides persistent storage and a way to
track state across MMIO writes; for example, selecting an SPI and then
updating its configuration takes two MMIO writes.
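A minimal sketch of why the emulated IRS needs persistent state: the selector written by one MMIO access decides which SPI a later access configures. The register offsets, the single trigger-mode bit and the toy_* names are hypothetical and do not reflect the real IRS_SPI_SELR/IRS_SPI_CFGR layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NR_SPIS 32

/* Hypothetical register offsets, not the real GICv5 IRS layout. */
enum { TOY_SPI_SELR = 0x0, TOY_SPI_CFGR = 0x4 };

struct toy_irs {
	uint32_t spi_selr_id;        /* persists between MMIO writes */
	bool spi_level[TOY_NR_SPIS]; /* per-SPI trigger mode */
};

static void toy_irs_write(struct toy_irs *irs, uint32_t offset, uint32_t val)
{
	switch (offset) {
	case TOY_SPI_SELR:
		irs->spi_selr_id = val;
		break;
	case TOY_SPI_CFGR:
		/* Applies to whichever SPI the earlier SELR write selected. */
		if (irs->spi_selr_id < TOY_NR_SPIS)
			irs->spi_level[irs->spi_selr_id] = val & 1;
		break;
	}
}
```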
Note that only a pointer to the new structure is added to struct
vgic_dist: the structure is large, so it is allocated dynamically
rather than embedded.
In addition to the IRS MMIO state, struct vgic_v5_irs also stores the
IRS base address in GPA space.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
include/kvm/arm_vgic.h | 86 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 86 insertions(+)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index faecde764fea3..25368c5cda5df 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -409,6 +409,87 @@ struct vgic_v5_vm {
bool vmte_allocated;
};
+/*** GICv5 ***/
+struct vgic_v5_irs {
+ /* base addresses in guest physical address space: */
+ gpa_t vgic_v5_irs_base;
+
+ struct vgic_io_device iodev;
+ struct kvm_device *dev;
+
+ /* IRS state - used for registers etc */
+ struct {
+ u8 domain;
+ u8 pa_range;
+ bool virt;
+ bool setlpi;
+ bool mec;
+ bool mpam;
+ bool swe;
+ u16 irs_id;
+ } idr0;
+
+ struct {
+ /* PE_CNT is populated from online_vcpus at runtime */
+ u8 priority_bits;
+ } idr1;
+
+ struct {
+ u8 id_bits;
+ u8 min_lpi_id_bits;
+ bool ist_levels;
+ u8 ist_l2sz;
+ bool istmd;
+ u8 istmd_sz;
+ } idr2;
+
+ struct {
+ u32 spi_range;
+ } idr5;
+
+ struct {
+ u32 spi_irs_range;
+ } idr6;
+
+ struct {
+ u32 spi_base;
+ } idr7;
+
+ struct {
+ u8 sh;
+ u8 oc;
+ u8 ic;
+ bool ist_ra;
+ bool ist_wa;
+ bool vmt_ra;
+ bool vpet_ra;
+ bool vmd_ra;
+ bool vmd_wa;
+ bool vped_ra;
+ bool vped_wa;
+ } cr1;
+
+ struct {
+ u32 id;
+ } spi_selr;
+
+ struct {
+ u32 iaffid;
+ } pe_selr;
+
+ struct {
+ u8 lpi_id_bits;
+ u8 l2sz;
+ u8 istsz;
+ bool structure;
+ } ist_cfgr;
+
+ struct {
+ bool valid;
+ u64 addr;
+ } ist_baser;
+};
+
struct vgic_dist {
bool in_kernel;
bool ready;
@@ -486,6 +567,11 @@ struct vgic_dist {
* GICv5 per-VM data.
*/
struct vgic_v5_vm gicv5_vm;
+
+ /*
+ * GICv5 IRS data. Dynamically allocated due to the size.
+ */
+ struct vgic_v5_irs *vgic_v5_irs_data;
};
struct vgic_v2_cpu_if {
--
2.34.1
* [PATCH v2 18/39] KVM: arm64: gic-v5: Add IRS IODEV support to MMIO handlers
From: Sascha Bischoff @ 2026-05-21 14:55 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Supporting GICv5 VMs that use more than just PPIs requires KVM to
emulate the GICv5 IRS as well. The IRS includes an MMIO interface which
is used to interact with and configure the IRS.
As part of providing the emulated IRS MMIO interface in KVM, extend
enum iodev_type to include a GICv5 IRS device, and extend the MMIO
code to handle reads and writes to that type of IO device. This will
allow the creation of a GICv5 IRS IO Device in KVM.
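The dispatch change amounts to one more case in a switch over the IO device type. This simplified userspace sketch uses hypothetical toy_* types, covers reads only and omits the length argument; it shows the IRS reusing the per-vcpu accessor while the ITS keeps its own signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum toy_iodev_type { TOY_IODEV_DIST, TOY_IODEV_ITS, TOY_IODEV_GICV5_IRS };

struct toy_vcpu { int id; };
struct toy_its { uint64_t base; };

/* A region carries either per-vcpu or ITS-style accessors. */
struct toy_region {
	uint64_t (*read)(struct toy_vcpu *vcpu, uint64_t addr);
	uint64_t (*its_read)(struct toy_its *its, uint64_t addr);
};

struct toy_iodev {
	enum toy_iodev_type type;
	struct toy_its *its;
};

static uint64_t toy_dispatch_read(struct toy_vcpu *vcpu, struct toy_iodev *dev,
				  const struct toy_region *region, uint64_t addr)
{
	switch (dev->type) {
	case TOY_IODEV_ITS:
		return region->its_read(dev->its, addr);
	case TOY_IODEV_DIST:
	case TOY_IODEV_GICV5_IRS:
		/* The IRS reuses the per-vcpu accessor signature. */
		return region->read(vcpu, addr);
	}
	return 0;
}

/* Example accessor: returns the address offset by the vcpu ID. */
static uint64_t toy_irs_read(struct toy_vcpu *vcpu, uint64_t addr)
{
	return addr + (uint64_t)vcpu->id;
}
```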
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-mmio.c | 6 ++++++
arch/arm64/kvm/vgic/vgic-mmio.h | 2 ++
include/kvm/arm_vgic.h | 3 ++-
3 files changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/vgic/vgic-mmio.c b/arch/arm64/kvm/vgic/vgic-mmio.c
index 74d76dec97304..fddb9da0403d5 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio.c
+++ b/arch/arm64/kvm/vgic/vgic-mmio.c
@@ -1065,6 +1065,9 @@ static int dispatch_mmio_read(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
case IODEV_ITS:
data = region->its_read(vcpu->kvm, iodev->its, addr, len);
break;
+ case IODEV_GICV5_IRS:
+ data = region->read(vcpu, addr, len);
+ break;
}
vgic_data_host_to_mmio_bus(val, len, data);
@@ -1095,6 +1098,9 @@ static int dispatch_mmio_write(struct kvm_vcpu *vcpu, struct kvm_io_device *dev,
case IODEV_ITS:
region->its_write(vcpu->kvm, iodev->its, addr, len, data);
break;
+ case IODEV_GICV5_IRS:
+ region->write(vcpu, addr, len, data);
+ break;
}
return 0;
diff --git a/arch/arm64/kvm/vgic/vgic-mmio.h b/arch/arm64/kvm/vgic/vgic-mmio.h
index 50dc80220b0f3..38ed730d68ac3 100644
--- a/arch/arm64/kvm/vgic/vgic-mmio.h
+++ b/arch/arm64/kvm/vgic/vgic-mmio.h
@@ -217,6 +217,8 @@ unsigned int vgic_v2_init_cpuif_iodev(struct vgic_io_device *dev);
unsigned int vgic_v3_init_dist_iodev(struct vgic_io_device *dev);
+unsigned int vgic_v5_init_irs_iodev(struct vgic_io_device *dev);
+
u64 vgic_sanitise_outer_cacheability(u64 reg);
u64 vgic_sanitise_inner_cacheability(u64 reg);
u64 vgic_sanitise_shareability(u64 reg);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 25368c5cda5df..4d930a2651213 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -323,7 +323,8 @@ enum iodev_type {
IODEV_CPUIF,
IODEV_DIST,
IODEV_REDIST,
- IODEV_ITS
+ IODEV_ITS,
+ IODEV_GICV5_IRS
};
struct vgic_io_device {
--
2.34.1
* [PATCH v2 19/39] KVM: arm64: gic-v5: Add KVM_VGIC_V5_ADDR_TYPE_IRS to UAPI
From: Sascha Bischoff @ 2026-05-21 14:55 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Define the UAPI address type used by userspace to describe the
location of the emulated IRS in guest physical address space, together
with the size reserved for that region.
As per the GICv5 specification, the IRS has one CONFIG_FRAME and
optionally one SETLPI_FRAME per interrupt domain. Within a KVM VM we
are only concerned with one interrupt domain. Each of these frames is
64kB in size, so reserve 2x64kB of contiguous memory in the GPA space
for a GICv5 IRS.
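A sketch of a sanity check a VMM might perform before placing the IRS region in its guest memory map. The 64K alignment requirement and the toy_* helper names are assumptions for illustration; only the size mirrors KVM_VGIC_V5_IRS_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SZ_64K   0x10000ULL
#define TOY_IRS_SIZE (2 * TOY_SZ_64K) /* mirrors KVM_VGIC_V5_IRS_SIZE */

/* Half-open interval overlap test: [a, a + alen) vs [b, b + blen). */
static bool toy_overlaps(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
{
	return a < b + blen && b < a + alen;
}

/* Assumed: base must be 64K aligned and must not overlap @other. */
static bool toy_irs_base_ok(uint64_t base, uint64_t other, uint64_t other_len)
{
	if (base & (TOY_SZ_64K - 1))
		return false;
	return !toy_overlaps(base, TOY_IRS_SIZE, other, other_len);
}
```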
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/include/uapi/asm/kvm.h | 5 +++++
tools/arch/arm64/include/uapi/asm/kvm.h | 5 +++++
2 files changed, 10 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 1c13bfa2d38aa..d1b2ca317f586 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -97,6 +97,11 @@ struct kvm_regs {
#define KVM_VGIC_V3_REDIST_SIZE (2 * SZ_64K)
#define KVM_VGIC_V3_ITS_SIZE (2 * SZ_64K)
+/* Supported VGICv5 address types */
+#define KVM_VGIC_V5_ADDR_TYPE_IRS 6
+
+#define KVM_VGIC_V5_IRS_SIZE (2 * SZ_64K)
+
#define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
#define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
#define KVM_ARM_VCPU_PSCI_0_2 2 /* CPU uses PSCI v0.2 */
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index 1c13bfa2d38aa..d1b2ca317f586 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -97,6 +97,11 @@ struct kvm_regs {
#define KVM_VGIC_V3_REDIST_SIZE (2 * SZ_64K)
#define KVM_VGIC_V3_ITS_SIZE (2 * SZ_64K)
+/* Supported VGICv5 address types */
+#define KVM_VGIC_V5_ADDR_TYPE_IRS 6
+
+#define KVM_VGIC_V5_IRS_SIZE (2 * SZ_64K)
+
#define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
#define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
#define KVM_ARM_VCPU_PSCI_0_2 2 /* CPU uses PSCI v0.2 */
--
2.34.1
* [PATCH v2 20/39] KVM: arm64: gic-v5: Add GICv5 IRS IODEV and MMIO emulation
From: Sascha Bischoff @ 2026-05-21 14:55 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
In order to properly support GICv5-based VMs in KVM, we need to
emulate the CONFIG_FRAME for a virtual IRS. This emulation needs to
handle all guest accesses to the MMIO region, and mimic the behaviour
of a real IRS.
Introduce an IODEV for the GICv5 IRS, and an associated init function
that sets up the SPIs and initial state for the IRS. The MMIO
emulation provides support for the guest to query the IRS_IDx
registers, manipulate SPIs, configure ISTs, and so forth.
The emulation tracks selector state across MMIO accesses. For example,
a guest writes IRS_PE_SELR to select a PE by IAFFID. This is the VPE
ID for a VM, but the guest does not know this. If the guest reads
IRS_PE_STATUSR, KVM checks whether that IAFFID selects a valid VPE and
sets the V bit accordingly. IRS_PE_CR0 is accepted as write-ignored,
because KVM does not support 1-of-N routing.
The same selector/status register model is exposed for SPIs too.
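The IRS_PE_SELR/IRS_PE_STATUSR interaction can be modelled as a status read computed from the last selector write. The bit positions and toy_* names are hypothetical, and the sketch simplifies presence to "IAFFID below the vcpu count", whereas the patch looks the vcpu up with kvm_get_vcpu_by_id().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit assignments; the real ones are in the GICv5 spec. */
#define TOY_PE_STATUSR_IDLE   (1U << 0)
#define TOY_PE_STATUSR_V      (1U << 1)
#define TOY_PE_STATUSR_ONLINE (1U << 2)

struct toy_vm {
	unsigned int nr_vcpus;   /* vcpu IDs 0..nr_vcpus-1 are present */
	uint32_t pe_selr_iaffid; /* last IAFFID written to IRS_PE_SELR */
};

/* The guest's IAFFID doubles as the VPE ID, i.e. the vcpu ID. */
static uint32_t toy_read_pe_statusr(const struct toy_vm *vm)
{
	uint32_t val = TOY_PE_STATUSR_IDLE; /* emulation is never busy */

	if (vm->pe_selr_iaffid < vm->nr_vcpus)
		val |= TOY_PE_STATUSR_V | TOY_PE_STATUSR_ONLINE;
	return val;
}
```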
The LPI IST also requires KVM to act on behalf of the guest. When the
emulated IRS_IST_BASER is written, KVM re-allocates the IST on the
host, matching the guest's configuration (from the emulated
IRS_IST_CFGR) where appropriate. This host IST is then provided to the
physical IRS via the VMTE. As far as the guest is concerned, the IST it
allocated is being used by the hardware, but in reality the host IST
is used instead.
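A sketch of the IST shadowing flow: a write to the emulated IRS_IST_BASER with the valid bit set causes a host-side table to be allocated from the emulated IRS_IST_CFGR. Two assumptions are made here: the table is linear (single-level) with 2^LPI_ID_BITS entries, and each entry is 4 << ISTSZ bytes. The toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct toy_ist_cfg {
	uint8_t lpi_id_bits; /* LPI ID bits requested by the guest */
	uint8_t istsz;       /* assumed: entry size = 4 << istsz bytes */
};

struct toy_irs_ist {
	struct toy_ist_cfg cfgr; /* emulated IRS_IST_CFGR */
	void *host_ist;          /* host table actually given to hardware */
	size_t host_ist_bytes;
};

/* Linear (single-level) table size under the assumptions above. */
static size_t toy_ist_bytes(struct toy_ist_cfg cfg)
{
	return ((size_t)1 << cfg.lpi_id_bits) * ((size_t)4 << cfg.istsz);
}

/*
 * Emulated IRS_IST_BASER write: the guest's own table is not used; the
 * host frees any previous table and allocates a fresh one instead.
 */
static int toy_write_ist_baser(struct toy_irs_ist *irs, bool valid)
{
	free(irs->host_ist);
	irs->host_ist = NULL;
	irs->host_ist_bytes = 0;
	if (!valid)
		return 0;

	irs->host_ist_bytes = toy_ist_bytes(irs->cfgr);
	irs->host_ist = calloc(1, irs->host_ist_bytes);
	return irs->host_ist ? 0 : -1;
}
```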
This change provides the IRS IODEV as a whole, but this is not plumbed
into the rest of KVM yet.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/Makefile | 2 +-
arch/arm64/kvm/vgic/vgic-irs-v5.c | 757 +++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic-v5-tables.c | 16 +
arch/arm64/kvm/vgic/vgic-v5-tables.h | 1 +
arch/arm64/kvm/vgic/vgic.h | 2 +
include/kvm/arm_vgic.h | 1 +
6 files changed, 778 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kvm/vgic/vgic-irs-v5.c
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 431de9b145ca1..92dda57c08766 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -24,7 +24,7 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \
- vgic/vgic-v5.o vgic/vgic-v5-tables.o
+ vgic/vgic-v5.o vgic/vgic-v5-tables.o vgic/vgic-irs-v5.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
diff --git a/arch/arm64/kvm/vgic/vgic-irs-v5.c b/arch/arm64/kvm/vgic/vgic-irs-v5.c
new file mode 100644
index 0000000000000..d1c724d0fd0b6
--- /dev/null
+++ b/arch/arm64/kvm/vgic/vgic-irs-v5.c
@@ -0,0 +1,757 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025 ARM Limited, All Rights Reserved.
+ */
+#include <linux/bitops.h>
+#include <linux/bsearch.h>
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <kvm/iodev.h>
+#include <kvm/arm_arch_timer.h>
+#include <kvm/arm_vgic.h>
+
+#include "vgic.h"
+#include "vgic-mmio.h"
+#include "vgic-v5-tables.h"
+
+#define irs_caps kvm_vgic_global_state.vgic_v5_irs_caps
+
+static struct vgic_dist *vgic_v5_get_vgic(struct kvm_vcpu *vcpu)
+{
+ return &vcpu->kvm->arch.vgic;
+}
+
+static struct vgic_v5_irs *vgic_v5_get_irs(struct kvm_vcpu *vcpu)
+{
+ return vcpu->kvm->arch.vgic.vgic_v5_irs_data;
+}
+
+static unsigned long vgic_v5_mmio_read_irs_misc(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len)
+{
+ struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
+ const size_t offset = addr & (SZ_64K - 1);
+ struct kvm_vcpu *target_vcpu;
+ u8 vpe_id_bits;
+ u64 value = 0;
+
+ switch (offset) {
+ case GICV5_IRS_IDR0:
+ value = FIELD_PREP(GICV5_IRS_IDR0_INT_DOM, irs->idr0.domain);
+ value |= FIELD_PREP(GICV5_IRS_IDR0_PA_RANGE, irs->idr0.pa_range);
+ if (irs->idr0.virt)
+ value |= GICV5_IRS_IDR0_VIRT;
+ if (irs->idr0.setlpi)
+ value |= GICV5_IRS_IDR0_SETLPI;
+ if (irs->idr0.mec)
+ value |= GICV5_IRS_IDR0_MEC;
+ if (irs->idr0.mpam)
+ value |= GICV5_IRS_IDR0_MPAM;
+ if (irs->idr0.swe)
+ value |= GICV5_IRS_IDR0_SWE;
+ value |= FIELD_PREP(GICV5_IRS_IDR0_IRSID, irs->idr0.irs_id);
+ break;
+ case GICV5_IRS_IDR1:
+ value = FIELD_PREP(GICV5_IRS_IDR1_PE_CNT,
+ atomic_read(&vcpu->kvm->online_vcpus));
+ /*
+ * IRS_IDR1 encodes IAFFID_BITS as N - 1. The VMTE stores the
+ * actual number of bits used for VPE IDs.
+ */
+ vpe_id_bits = vgic_v5_vmte_vpe_id_bits(vcpu);
+ value |= FIELD_PREP(GICV5_IRS_IDR1_IAFFID_BITS, vpe_id_bits - 1);
+ value |= FIELD_PREP(GICV5_IRS_IDR1_PRIORITY_BITS, irs->idr1.priority_bits);
+ break;
+ case GICV5_IRS_IDR2:
+ value = FIELD_PREP(GICV5_IRS_IDR2_ISTMD_SZ, irs->idr2.istmd_sz);
+ if (irs->idr2.istmd)
+ value |= GICV5_IRS_IDR2_ISTMD;
+ value |= FIELD_PREP(GICV5_IRS_IDR2_IST_L2SZ, irs->idr2.ist_l2sz);
+ if (irs->idr2.ist_levels)
+ value |= GICV5_IRS_IDR2_IST_LEVELS;
+ value |= FIELD_PREP(GICV5_IRS_IDR2_MIN_LPI_ID_BITS, irs->idr2.min_lpi_id_bits);
+ value |= GICV5_IRS_IDR2_LPI;
+ value |= FIELD_PREP(GICV5_IRS_IDR2_ID_BITS, irs->idr2.id_bits);
+ break;
+ case GICV5_IRS_IDR5:
+ value = FIELD_PREP(GICV5_IRS_IDR5_SPI_RANGE, irs->idr5.spi_range);
+ break;
+ case GICV5_IRS_IDR6:
+ value = FIELD_PREP(GICV5_IRS_IDR6_SPI_IRS_RANGE, irs->idr6.spi_irs_range);
+ break;
+ case GICV5_IRS_IDR7:
+ value = FIELD_PREP(GICV5_IRS_IDR7_SPI_BASE, irs->idr7.spi_base);
+ break;
+ case GICV5_IRS_IIDR:
+ /* Revision, Variant, ProductID are implementation defined */
+ value = FIELD_PREP(GICV5_IRS_IIDR_PRODUCT_ID, PRODUCT_ID_KVM);
+ value |= FIELD_PREP(GICV5_IRS_IIDR_VARIANT, 0);
+ value |= FIELD_PREP(GICV5_IRS_IIDR_REVISION, 0);
+ value |= FIELD_PREP(GICV5_IRS_IIDR_IMPLEMENTER, IMPLEMENTER_ARM);
+ break;
+ case GICV5_IRS_AIDR:
+ value = FIELD_PREP(GICV5_IRS_AIDR_COMPONENT,
+ GICV5_AIDR_COMPONENT_IRS);
+ value |= FIELD_PREP(GICV5_IRS_AIDR_ARCHMAJORREV,
+ GICV5_AIDR_ARCH_MAJ_REV_V5);
+ value |= FIELD_PREP(GICV5_IRS_AIDR_ARCHMINORREV,
+ GICV5_AIDR_ARCH_MIN_REV_V0);
+ break;
+ case GICV5_IRS_CR0:
+ /*
+ * The IRS is ALWAYS idle as we handle things instantaneously
+ * from a guest's viewpoint.
+ */
+ value = GICV5_IRS_CR0_IDLE;
+ if (vcpu->kvm->arch.vgic.enabled)
+ value |= GICV5_IRS_CR0_IRSEN;
+ break;
+ case GICV5_IRS_CR1:
+ if (irs->cr1.vped_wa)
+ value |= GICV5_IRS_CR1_VPED_WA;
+ if (irs->cr1.vped_ra)
+ value |= GICV5_IRS_CR1_VPED_RA;
+ if (irs->cr1.vmd_wa)
+ value |= GICV5_IRS_CR1_VMD_WA;
+ if (irs->cr1.vmd_ra)
+ value |= GICV5_IRS_CR1_VMD_RA;
+ if (irs->cr1.vpet_ra)
+ value |= GICV5_IRS_CR1_VPET_RA;
+ if (irs->cr1.vmt_ra)
+ value |= GICV5_IRS_CR1_VMT_RA;
+ if (irs->cr1.ist_wa)
+ value |= GICV5_IRS_CR1_IST_WA;
+ if (irs->cr1.ist_ra)
+ value |= GICV5_IRS_CR1_IST_RA;
+ value |= FIELD_PREP(GICV5_IRS_CR1_IC, irs->cr1.ic);
+ value |= FIELD_PREP(GICV5_IRS_CR1_OC, irs->cr1.oc);
+ value |= FIELD_PREP(GICV5_IRS_CR1_SH, irs->cr1.sh);
+ break;
+ case GICV5_IRS_SYNC_STATUSR:
+ value = GICV5_IRS_SYNC_STATUSR_IDLE;
+ break;
+ case GICV5_IRS_PE_SELR:
+ value = FIELD_PREP(GICV5_IRS_PE_SELR_IAFFID, irs->pe_selr.iaffid);
+ break;
+ case GICV5_IRS_PE_STATUSR:
+ /* We assume that the PE is Online if present. Always IDLE too */
+ value = GICV5_IRS_PE_STATUSR_IDLE;
+
+ /* Set ONLINE and V if IAFFID selects a present PE */
+ if (kvm_get_vcpu_by_id(vcpu->kvm, irs->pe_selr.iaffid)) {
+ value |= GICV5_IRS_PE_STATUSR_ONLINE;
+ value |= GICV5_IRS_PE_STATUSR_V;
+ }
+ break;
+ case GICV5_IRS_PE_CR0:
+ /*
+ * Check that the selected IAFFID names an existing vCPU first.
+ * Remember, the IAFFID is the same as the VPE_ID.
+ */
+ target_vcpu = kvm_get_vcpu_by_id(vcpu->kvm, irs->pe_selr.iaffid);
+ if (!target_vcpu) {
+ kvm_err("Guest programmed invalid IAFFID (0x%x) into the IRS_PE_SELR\n",
+ irs->pe_selr.iaffid);
+ break;
+ }
+
+ value = GICV5_IRS_PE_CR0_DPS;
+ break;
+ default:
+ return 0;
+ }
+
+ return value;
+}
+
+static void vgic_v5_mmio_write_irs_misc(struct kvm_vcpu *vcpu, gpa_t addr,
+ unsigned int len, unsigned long val)
+{
+ struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
+ struct vgic_dist *vgic = vgic_v5_get_vgic(vcpu);
+ const size_t offset = addr & (SZ_64K - 1);
+
+ switch (offset) {
+ case GICV5_IRS_CR0:
+ mutex_lock(&vcpu->kvm->arch.config_lock);
+ /*
+ * We need to make sure that the IRS coming online (or
+ * going offline) is visible to all vCPUs, even if
+ * they are currently resident. Halt all of the vCPUs
+ * now, and resume once we've done the update.
+ */
+ kvm_arm_halt_guest(vcpu->kvm);
+
+ vgic->enabled = !!(val & GICV5_IRS_CR0_IRSEN);
+
+ kvm_arm_resume_guest(vcpu->kvm);
+ mutex_unlock(&vcpu->kvm->arch.config_lock);
+
+ return;
+ case GICV5_IRS_CR1:
+ irs->cr1.sh = FIELD_GET(GICV5_IRS_CR1_SH, val);
+ irs->cr1.oc = FIELD_GET(GICV5_IRS_CR1_OC, val);
+ irs->cr1.ic = FIELD_GET(GICV5_IRS_CR1_IC, val);
+ irs->cr1.ist_ra = !!(val & GICV5_IRS_CR1_IST_RA);
+ irs->cr1.ist_wa = !!(val & GICV5_IRS_CR1_IST_WA);
+ irs->cr1.vmt_ra = !!(val & GICV5_IRS_CR1_VMT_RA);
+ irs->cr1.vpet_ra = !!(val & GICV5_IRS_CR1_VPET_RA);
+ irs->cr1.vmd_ra = !!(val & GICV5_IRS_CR1_VMD_RA);
+ irs->cr1.vmd_wa = !!(val & GICV5_IRS_CR1_VMD_WA);
+ irs->cr1.vped_ra = !!(val & GICV5_IRS_CR1_VPED_RA);
+ irs->cr1.vped_wa = !!(val & GICV5_IRS_CR1_VPED_WA);
+ return;
+ case GICV5_IRS_PE_SELR:
+ irs->pe_selr.iaffid = FIELD_GET(GICV5_IRS_PE_SELR_IAFFID, val);
+ return;
+ case GICV5_IRS_PE_CR0:
+ /*
+ * We actually have nothing to do here as we don't support
+ * 1-of-N routing. The only thing that the guest can correctly
+ * write here is 0x1. However, there's no way to fault if it
+ * writes something else. This is effectively a WI in our case,
+ * but we keep it here for the purposes of documenting it.
+ */
+ return;
+ default:
+ return;
+ }
+}
+
+static bool vgic_v5_is_spi_selr_valid(struct vgic_v5_irs *irs)
+{
+ /* Invalid - we don't have any SPIs at all */
+ if (irs->idr5.spi_range == 0)
+ return false;
+
+ /* Invalid - we don't have any on this IRS */
+ if (irs->idr6.spi_irs_range == 0)
+ return false;
+
+ /* Invalid - ID is less than min */
+ if (irs->spi_selr.id < irs->idr7.spi_base)
+ return false;
+
+ /* Invalid - ID is greater than max */
+ if (irs->spi_selr.id >=
+ (irs->idr7.spi_base + irs->idr6.spi_irs_range))
+ return false;
+
+ return true;
+}
+
+static unsigned long vgic_v5_mmio_read_irs_spi(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len)
+{
+ struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
+ const size_t offset = addr & (SZ_64K - 1);
+ struct vgic_irq *irq;
+ u64 value = 0;
+
+ switch (offset) {
+ case GICV5_IRS_SPI_SELR:
+ /* Return whatever was last written */
+ value = FIELD_PREP(GICV5_IRS_SPI_SELR_ID, irs->spi_selr.id);
+ break;
+ case GICV5_IRS_SPI_STATUSR:
+ /* We assume that we can always claim to be idle */
+ value = GICV5_IRS_SPI_STATUSR_IDLE;
+ if (vgic_v5_is_spi_selr_valid(irs))
+ value |= GICV5_IRS_SPI_STATUSR_V;
+ break;
+ case GICV5_IRS_SPI_DOMAINR:
+ value = FIELD_PREP(GICV5_IRS_SPI_DOMAINR_DOMAIN,
+ GICV5_IRS_SPI_DOMAINR_DOMAIN_NON_SECURE);
+ break;
+ case GICV5_IRS_SPI_CFGR:
+ if (!vgic_v5_is_spi_selr_valid(irs)) {
+ /* IRS_SPI_SELR selects an invalid SPI; read as 0 */
+ value = 0;
+ break;
+ }
+
+ irq = vgic_get_irq(vcpu->kvm, vgic_v5_make_spi(irs->spi_selr.id));
+ if (!irq) {
+ kvm_err("Guest trying to access SPI not backed by KVM\n");
+ value = 0;
+ break;
+ }
+
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
+ if (irq->config == VGIC_CONFIG_LEVEL)
+ value = GICV5_IRS_SPI_CFGR_TM;
+ }
+
+ vgic_put_irq(vcpu->kvm, irq);
+
+ break;
+ default:
+ return 0;
+ }
+
+ return value;
+}
+
+static void vgic_v5_mmio_write_irs_spi(struct kvm_vcpu *vcpu, gpa_t addr,
+ unsigned int len, unsigned long val)
+{
+ struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
+ const size_t offset = addr & (SZ_64K - 1);
+ struct vgic_irq *irq;
+
+ switch (offset) {
+ case GICV5_IRS_SPI_SELR:
+ irs->spi_selr.id = FIELD_GET(GICV5_IRS_SPI_SELR_ID, val);
+ return;
+ case GICV5_IRS_SPI_CFGR:
+ if (!vgic_v5_is_spi_selr_valid(irs))
+ return;
+
+ /*
+ * Find KVM's representation of the interrupt - we need to make
+ * sure that KVM's view agrees with the guest's, else interrupt
+ * injection won't work properly for level-triggered interrupts
+ * (we fail to handle the clearing of the pending state if KVM
+ * thinks that the interrupt is edge-triggered, which is the
+ * default.)
+ */
+ irq = vgic_get_irq(vcpu->kvm, vgic_v5_make_spi(irs->spi_selr.id));
+ if (!irq)
+ return;
+
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
+ if (val & GICV5_IRS_SPI_CFGR_TM)
+ irq->config = VGIC_CONFIG_LEVEL;
+ else
+ irq->config = VGIC_CONFIG_EDGE;
+ }
+
+ vgic_put_irq(vcpu->kvm, irq);
+
+ return;
+ default:
+ return;
+ }
+}
+
+static bool vgic_v5_ist_cfgr_valid(struct vgic_v5_irs *irs)
+{
+ unsigned int expected_istsz;
+
+ if (irs->ist_cfgr.lpi_id_bits < irs->idr2.min_lpi_id_bits ||
+ irs->ist_cfgr.lpi_id_bits > irs->idr2.id_bits)
+ return false;
+
+ if (!irs->idr2.istmd)
+ expected_istsz = GICV5_IRS_IST_CFGR_ISTSZ_4;
+ else if (irs->ist_cfgr.lpi_id_bits >= irs->idr2.istmd_sz)
+ expected_istsz = GICV5_IRS_IST_CFGR_ISTSZ_16;
+ else
+ expected_istsz = GICV5_IRS_IST_CFGR_ISTSZ_8;
+
+ if (irs->ist_cfgr.istsz != expected_istsz)
+ return false;
+
+ if (irs->ist_cfgr.structure && !irs->idr2.ist_levels)
+ return false;
+
+ if (!irs->ist_cfgr.structure)
+ return true;
+
+ return irs->ist_cfgr.l2sz == irs->idr2.ist_l2sz;
+}
+
+static unsigned long vgic_v5_mmio_read_irs_ist(struct kvm_vcpu *vcpu,
+ gpa_t addr, unsigned int len)
+{
+ struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
+ const size_t offset = addr & (SZ_64K - 1);
+ u64 value = 0;
+
+ switch (offset) {
+ case GICV5_IRS_IST_STATUSR:
+ return GICV5_IRS_IST_STATUSR_IDLE;
+ case GICV5_IRS_IST_CFGR:
+ if (irs->ist_cfgr.structure)
+ value |= GICV5_IRS_IST_CFGR_STRUCTURE;
+ value |= FIELD_PREP(GICV5_IRS_IST_CFGR_ISTSZ, irs->ist_cfgr.istsz);
+ value |= FIELD_PREP(GICV5_IRS_IST_CFGR_L2SZ, irs->ist_cfgr.l2sz);
+ value |= FIELD_PREP(GICV5_IRS_IST_CFGR_LPI_ID_BITS, irs->ist_cfgr.lpi_id_bits);
+ break;
+ case GICV5_IRS_IST_BASER:
+ value = FIELD_PREP(GICV5_IRS_IST_BASER_ADDR_MASK,
+ irs->ist_baser.addr >> GICV5_IRS_IST_BASER_ADDR_SHIFT);
+ if (irs->ist_baser.valid)
+ value |= GICV5_IRS_IST_BASER_VALID;
+ break;
+ default:
+ return 0;
+ }
+
+ return value;
+}
+
+static void vgic_v5_mmio_write_irs_ist(struct kvm_vcpu *vcpu, gpa_t addr,
+ unsigned int len, unsigned long val)
+{
+ struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
+ const size_t offset = addr & (SZ_64K - 1);
+ enum gicv5_vcpu_cmd cmd = LPI_VIST_MAKE_INVALID;
+
+ switch (offset) {
+ case GICV5_IRS_IST_CFGR:
+ irs->ist_cfgr.lpi_id_bits = FIELD_GET(GICV5_IRS_IST_CFGR_LPI_ID_BITS, val);
+ irs->ist_cfgr.l2sz = FIELD_GET(GICV5_IRS_IST_CFGR_L2SZ, val);
+ irs->ist_cfgr.istsz = FIELD_GET(GICV5_IRS_IST_CFGR_ISTSZ, val);
+ irs->ist_cfgr.structure = !!(val & GICV5_IRS_IST_CFGR_STRUCTURE);
+ return;
+ case GICV5_IRS_IST_BASER: {
+ bool valid = !!(val & GICV5_IRS_IST_BASER_VALID);
+
+ guard(mutex)(&vcpu->kvm->arch.config_lock);
+
+ /* Valid -> Invalid */
+ if (irs->ist_baser.valid && !valid) {
+ /* Make the LPI IST invalid and then ... */
+ if (irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu), &cmd))
+ break;
+
+ /*
+ * ... free the host IST if we successfully marked the
+ * IST as invalid. Frankly, if we failed to mark the
+ * guest's IST as invalid, we're cooked, because the
+ * IRS may still be using the memory that we want to
+ * free. Hence, we leave it allocated and skip clearing
+ * the valid bit in IRS_IST_BASER.
+ */
+ if (vgic_v5_lpi_ist_free(vcpu->kvm))
+ break;
+ } else if (!irs->ist_baser.valid && valid) { /* Invalid -> Valid */
+ if (!vgic_v5_ist_cfgr_valid(irs)) {
+ kvm_err("Guest programmed invalid IRS_IST_CFGR\n");
+ break;
+ }
+
+ if (vgic_v5_lpi_ist_alloc(vcpu->kvm, irs->ist_cfgr.lpi_id_bits))
+ break;
+ }
+
+ /* Now that we've handled the edges, update the valid bit and addr */
+ irs->ist_baser.valid = !!(val & GICV5_IRS_IST_BASER_VALID);
+ irs->ist_baser.addr = FIELD_GET(GICV5_IRS_IST_BASER_ADDR_MASK, val)
+ << GICV5_IRS_IST_BASER_ADDR_SHIFT;
+
+ return;
+ }
+ default:
+ return;
+ }
+}
+
+static const struct vgic_register_region vgic_v5_irs_registers[] = {
+ /*
+ * This is the IRS_CONFIG_FRAME.
+ */
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR0, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR1, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR2, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR3, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR4, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR5, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR6, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR7, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IIDR, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_AIDR, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_CR0, vgic_v5_mmio_read_irs_misc,
+ vgic_v5_mmio_write_irs_misc, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_CR1, vgic_v5_mmio_read_irs_misc,
+ vgic_v5_mmio_write_irs_misc, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SYNCR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SYNC_STATUSR,
+ vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_VMR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8,
+ VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_SELR, vgic_v5_mmio_read_irs_spi,
+ vgic_v5_mmio_write_irs_spi, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_DOMAINR, vgic_v5_mmio_read_irs_spi,
+ vgic_v5_mmio_write_irs_spi, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_RESAMPLER, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_CFGR, vgic_v5_mmio_read_irs_spi,
+ vgic_v5_mmio_write_irs_spi, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_STATUSR,
+ vgic_v5_mmio_read_irs_spi, vgic_mmio_write_wi,
+ 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_PE_SELR, vgic_v5_mmio_read_irs_misc,
+ vgic_v5_mmio_write_irs_misc, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_PE_STATUSR,
+ vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_PE_CR0, vgic_v5_mmio_read_irs_misc,
+ vgic_v5_mmio_write_irs_misc, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IST_BASER, vgic_v5_mmio_read_irs_ist,
+ vgic_v5_mmio_write_irs_ist, 8,
+ VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IST_CFGR, vgic_v5_mmio_read_irs_ist,
+ vgic_v5_mmio_write_irs_ist, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IST_STATUSR,
+ vgic_v5_mmio_read_irs_ist, vgic_mmio_write_wi,
+ 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_MAP_L2_ISTR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+
+ /*
+ * The following registers are only used when the guest runs its
+ * own VMs. Nested virtualisation is not currently supported, so
+ * expose them as read-as-zero/write-ignored.
+ */
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMT_BASER, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMT_CFGR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMT_STATUSR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VPE_SELR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VPE_DBR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VPE_HPPIR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VPE_CR0, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VPE_STATUSR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VM_DBR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VM_SELR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VM_STATUSR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMAP_L2_VMTR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMAP_VMR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMAP_VISTR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMAP_L2_VISTR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_VMAP_VPER, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SAVE_VMR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SAVE_VM_STATUSR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+
+ /* MEC, MPAM, SWERR - all unimplemented */
+
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_MEC_IDR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_MEC_MECID_R, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_MPAM_IDR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_MPAM_PARTID_R, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SWERR_STATUSR, vgic_mmio_read_raz,
+ vgic_mmio_write_wi, 8, VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SWERR_SYNDROMER0,
+ vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
+ VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SWERR_SYNDROMER1,
+ vgic_mmio_read_raz, vgic_mmio_write_wi, 8,
+ VGIC_ACCESS_64bit),
+};
+
+unsigned int vgic_v5_init_irs_iodev(struct vgic_io_device *dev)
+{
+ dev->regions = vgic_v5_irs_registers;
+ dev->nr_regions = ARRAY_SIZE(vgic_v5_irs_registers);
+
+ kvm_iodevice_init(&dev->dev, &kvm_io_gic_ops);
+
+ /* We represent both of the IRS frames back to back, so this is 128K */
+ return KVM_VGIC_V5_IRS_SIZE;
+}
+
+int vgic_v5_register_irs_iodev(struct kvm *kvm, gpa_t irs_base_address)
+{
+ struct vgic_io_device *io_device = &kvm->arch.vgic.vgic_v5_irs_data->iodev;
+ unsigned int len;
+
+ /*
+ * Design choice: require the MMIO region to be 64K-aligned, so that
+ * register offsets can be extracted by masking with SZ_64K - 1.
+ */
+ if (!IS_ALIGNED(irs_base_address, SZ_64K)) {
+ kvm_err("IRS Base address is not aligned to 64k\n");
+ return -EINVAL;
+ }
+
+ len = vgic_v5_init_irs_iodev(io_device);
+
+ io_device->base_addr = irs_base_address;
+ io_device->iodev_type = IODEV_GICV5_IRS;
+ io_device->redist_vcpu = NULL;
+
+ return kvm_io_bus_register_dev(kvm, KVM_MMIO_BUS, irs_base_address, len,
+ &io_device->dev);
+}
+
+/**
+ * kvm_vgic_v5_irs_init() - initialize the emulated IRS state
+ * @kvm: kvm struct pointer
+ * @nr_spis: number of SPIs, frozen by the caller
+ *
+ * Return: 0 on success, or a negative error code on failure.
+ */
+int kvm_vgic_v5_irs_init(struct kvm *kvm, unsigned int nr_spis)
+{
+ struct vgic_dist *dist = &kvm->arch.vgic;
+ struct vgic_v5_irs *irs = dist->vgic_v5_irs_data;
+ struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
+ size_t nr_spi_bits;
+ u64 mmfr0;
+ int ret, i;
+
+ /*
+ * We (KVM) allocate an Interrupt State Table (IST) for SPIs. The
+ * hardware mandates that the lower 6 bits of its address are 0. Each
+ * ISTE is 4 bytes in size (or larger if metadata storage is required),
+ * so 16 entries would be enough to satisfy that alignment. Keep the
+ * minimum at 32 SPIs to match KVM's vGICv3 minimum and the vGICv5
+ * device API.
+ */
+ if (nr_spis && nr_spis < VGIC_V5_DEFAULT_NR_SPIS)
+ nr_spis = VGIC_V5_DEFAULT_NR_SPIS;
+
+ if (nr_spis) {
+ dist->spis = kcalloc(nr_spis, sizeof(struct vgic_irq),
+ GFP_KERNEL_ACCOUNT);
+ if (!dist->spis)
+ return -ENOMEM;
+
+ /*
+ * In the following code we do not take the irq struct lock since
+ * no other action on irq structs can happen while the VGIC is
+ * not initialized yet.
+ */
+ for (i = 0; i < nr_spis; i++) {
+ struct vgic_irq *irq = &dist->spis[i];
+
+ irq->intid = vgic_v5_make_spi(i);
+ INIT_LIST_HEAD(&irq->ap_list);
+ raw_spin_lock_init(&irq->irq_lock);
+ irq->vcpu = NULL;
+ irq->target_vcpu = vcpu0;
+ refcount_set(&irq->refcount, 0);
+ /*
+ * The guest controls the enable state, which is handled
+ * directly by the hardware. From KVM's point of view,
+ * the interrupt is always enabled.
+ */
+ irq->enabled = 1;
+ }
+
+ nr_spi_bits = fls(roundup_pow_of_two(nr_spis)) - 1;
+
+ ret = vgic_v5_spi_ist_allocate(kvm, nr_spi_bits);
+ if (ret) {
+ kfree(dist->spis);
+ dist->spis = NULL;
+ return ret;
+ }
+ }
+
+ /* Set sane initial state for the IRS MMIO registers */
+
+ irs->idr0.domain = GICV5_IRS_IDR0_INT_DOM_NON_SECURE;
+
+ mmfr0 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
+ irs->idr0.pa_range = cpuid_feature_extract_unsigned_field(mmfr0,
+ ID_AA64MMFR0_EL1_PARANGE_SHIFT);
+
+ irs->idr0.virt = 0;
+ irs->idr0.setlpi = 0;
+ irs->idr0.mec = 0;
+ irs->idr0.mpam = 0;
+ irs->idr0.swe = 0;
+ irs->idr0.irs_id = 0;
+
+ irs->idr1.priority_bits = gicv5_global_data.irs_pri_bits - 1;
+
+ /*
+ * Support 16 bits of ID space for the IRS. This should be sufficient
+ * for most applications, and the CPUIF is guaranteed to have at least
+ * 16 bits of ID space support (we actually present 16 bits there, even
+ * if the hardware supports more). Warn if the hardware doesn't support
+ * 16 bits, and use the smaller value. YMMV!
+ *
+ * As for the minimum number of ID bits, we match the hardware's
+ * capability.
+ */
+ if (irs_caps.ist_id_bits < 16)
+ pr_warn("Host IRS supports fewer than 16 ID bits for ISTs (%u)\n",
+ irs_caps.ist_id_bits);
+
+ irs->idr2.id_bits = min(16, irs_caps.ist_id_bits);
+ irs->idr2.min_lpi_id_bits = irs_caps.min_lpi_id_bits;
+
+ /* Only allow the guest to create Linear ISTs - simplifies Save/Restore */
+ irs->idr2.ist_levels = 0;
+ irs->idr2.ist_l2sz = GICV5_IRS_IST_CFGR_L2SZ_4K;
+ irs->idr2.istmd = 0;
+ irs->idr2.istmd_sz = 0;
+
+ /* There is only a single IRS, so all SPIs reside here. */
+ irs->idr5.spi_range = nr_spis;
+ irs->idr6.spi_irs_range = nr_spis;
+ irs->idr7.spi_base = 0;
+
+ irs->cr1.sh = 0;
+ irs->cr1.oc = 0;
+ irs->cr1.ic = 0;
+ irs->cr1.ist_ra = 0;
+ irs->cr1.ist_wa = 0;
+ irs->cr1.vmt_ra = 0;
+ irs->cr1.vpet_ra = 0;
+ irs->cr1.vmd_ra = 0;
+ irs->cr1.vmd_wa = 0;
+ irs->cr1.vped_ra = 0;
+ irs->cr1.vped_wa = 0;
+
+ irs->spi_selr.id = -1;
+
+ irs->pe_selr.iaffid = -1;
+
+ irs->ist_cfgr.lpi_id_bits = 0;
+ irs->ist_cfgr.l2sz = 0;
+ irs->ist_cfgr.istsz = 0;
+ irs->ist_cfgr.structure = 0;
+
+ irs->ist_baser.valid = 0;
+ irs->ist_baser.addr = 0;
+
+ return 0;
+}
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.c b/arch/arm64/kvm/vgic/vgic-v5-tables.c
index 5c87c6c27087a..2df470d29d64a 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.c
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.c
@@ -576,6 +576,22 @@ int vgic_v5_vmte_release(struct kvm *kvm)
return 0;
}
+/*
+ * Provide a way for the IRS MMIO emulation to correctly populate the number of
+ * IAFFID bits (which correspond to our vpe_id_bits).
+ */
+u8 vgic_v5_vmte_vpe_id_bits(struct kvm_vcpu *vcpu)
+{
+ u16 vm_id = vgic_v5_vm_id(vcpu->kvm);
+ struct vgic_v5_vm_info *vmi;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return 0;
+
+ return vmi->vpe_id_bits;
+}
+
/*
* Allocate a VPE descriptor and provide it to the hardware via the VPE Table.
*/
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.h b/arch/arm64/kvm/vgic/vgic-v5-tables.h
index acd862b8806d1..0ca0ae798dda6 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.h
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.h
@@ -90,6 +90,7 @@ void vgic_v5_release_vm_id(struct kvm *kvm);
int vgic_v5_vmte_init(struct kvm *kvm);
int vgic_v5_vmte_release(struct kvm *kvm);
+u8 vgic_v5_vmte_vpe_id_bits(struct kvm_vcpu *vcpu);
int vgic_v5_vmte_alloc_vpe(struct kvm_vcpu *vcpu);
int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index f2f5fdc3211d7..282278e4a6c19 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -366,6 +366,7 @@ void vgic_debug_destroy(struct kvm *kvm);
int vgic_v5_probe(const struct gic_kvm_info *info);
void vgic_v5_reset(struct kvm_vcpu *vcpu);
int vgic_v5_init(struct kvm *kvm);
+int kvm_vgic_v5_irs_init(struct kvm *kvm, unsigned int nr_spis);
void vgic_v5_teardown(struct kvm *kvm);
int vgic_v5_map_resources(struct kvm *kvm);
void vgic_v5_set_ppi_ops(struct kvm_vcpu *vcpu, u32 vintid);
@@ -378,6 +379,7 @@ void vgic_v5_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
void vgic_v5_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
void vgic_v5_restore_state(struct kvm_vcpu *vcpu);
void vgic_v5_save_state(struct kvm_vcpu *vcpu);
+int vgic_v5_register_irs_iodev(struct kvm *kvm, gpa_t irs_base_address);
#define for_each_visible_v5_ppi(__i, __k) \
for_each_set_bit(__i, (__k)->arch.vgic.gicv5_vm.vgic_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS)
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 4d930a2651213..143e75743da86 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -40,6 +40,7 @@
* in KVM for now. At a future stage, this can be bumped up to 128, if required.
*/
#define VGIC_V5_NR_PRIVATE_IRQS 64
+#define VGIC_V5_DEFAULT_NR_SPIS 32
#define is_v5_type(t, i) (FIELD_GET(GICV5_HWIRQ_TYPE, (i)) == (t))
--
2.34.1
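The IRS_IST_CFGR checks performed by vgic_v5_ist_cfgr_valid() in the patch above can be tried in isolation. The following is a minimal userspace sketch that mirrors that logic; the struct layouts and the `enum istsz` values are hypothetical stand-ins for the kernel's IRS_IDR2 state and GICV5_IRS_IST_CFGR_ISTSZ_* encodings, not their real definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the GICV5_IRS_IST_CFGR_ISTSZ_* encodings. */
enum istsz { ISTSZ_4, ISTSZ_8, ISTSZ_16 };

/* Subset of the IRS_IDR2 fields that the emulated IRS advertises. */
struct idr2_caps {
	unsigned int id_bits;         /* maximum LPI ID bits */
	unsigned int min_lpi_id_bits; /* minimum LPI ID bits */
	bool istmd;                   /* ISTEs carry metadata */
	unsigned int istmd_sz;        /* ID-bit threshold for 16-byte ISTEs */
	bool ist_levels;              /* two-level ISTs supported */
	unsigned int ist_l2sz;        /* supported L2 table size encoding */
};

/* Guest-programmed IRS_IST_CFGR fields. */
struct ist_cfgr {
	unsigned int lpi_id_bits;
	enum istsz istsz;
	bool structure;               /* true: two-level IST */
	unsigned int l2sz;
};

static bool ist_cfgr_valid(const struct idr2_caps *caps,
			   const struct ist_cfgr *cfg)
{
	enum istsz expected;

	/* The requested ID width must lie within the advertised range. */
	if (cfg->lpi_id_bits < caps->min_lpi_id_bits ||
	    cfg->lpi_id_bits > caps->id_bits)
		return false;

	/* The ISTE size is dictated by metadata support and the ID width. */
	if (!caps->istmd)
		expected = ISTSZ_4;
	else if (cfg->lpi_id_bits >= caps->istmd_sz)
		expected = ISTSZ_16;
	else
		expected = ISTSZ_8;

	if (cfg->istsz != expected)
		return false;

	/* Two-level tables are only valid if the IRS advertises them. */
	if (cfg->structure && !caps->ist_levels)
		return false;

	/* Linear tables ignore L2SZ. */
	if (!cfg->structure)
		return true;

	return cfg->l2sz == caps->ist_l2sz;
}
```

Because kvm_vgic_v5_irs_init() advertises linear ISTs without metadata (ist_levels = 0, istmd = 0), a guest of this series only ever hits the 4-byte ISTE branch; the metadata branches are kept to show the full rule.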
* [PATCH v2 21/39] KVM: arm64: gic-v5: Initialise per-VM IRS state
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
2026-05-21 14:55 ` [PATCH v2 20/39] KVM: arm64: gic-v5: Add GICv5 IRS IODEV and MMIO emulation Sascha Bischoff
@ 2026-05-21 14:56 ` Sascha Bischoff
2026-05-21 14:56 ` [PATCH v2 22/39] KVM: arm64: gic-v5: Register the IRS IODEV Sascha Bischoff
From: Sascha Bischoff @ 2026-05-21 14:56 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
A virtual GICv5 needs an emulated IRS in addition to the host IRS
state used to back VMTEs, VPEs, and ISTs. Without this, KVM can only
provide the CPU-local PPI state and cannot expose the IRS-backed SPI
and LPI configuration expected by a GICv5 guest.
Allocate the per-VM emulated IRS state when creating a virtual GICv5,
and initialise it from vgic_v5_init(). If userspace has not provided a
number of SPIs, use the GICv5 default of 32. The IRS init path
allocates the SPI state, initialises the virtual IRS register state,
and creates the backing SPI IST when SPIs are present.
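As a worked example of the SPI sizing (a sketch, not the kernel code):
kvm_vgic_v5_irs_init() bumps a non-zero SPI count up to at least 32,
then derives the SPI IST ID width as fls(roundup_pow_of_two(nr_spis))
- 1. Assuming a linear table of 2^bits entries with 4-byte ISTEs (no
metadata), 32 SPIs need 5 ID bits and 128 bytes. The helpers below are
hypothetical stand-ins for roundup_pow_of_two() and fls().

```c
#include <stdint.h>

#define DEFAULT_NR_SPIS 32u /* mirrors VGIC_V5_DEFAULT_NR_SPIS */
#define ISTE_BYTES      4u  /* assumed: linear IST without metadata */

/* Stand-in for roundup_pow_of_two(); v must be non-zero. */
static uint64_t roundup_pow2(uint64_t v)
{
	uint64_t p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Stand-in for fls(): 1-based index of the top set bit, 0 if v == 0. */
static unsigned int fls64_sketch(uint64_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/* A non-zero SPI count below the default is bumped up to the default. */
static uint32_t clamp_nr_spis(uint32_t nr_spis)
{
	if (nr_spis && nr_spis < DEFAULT_NR_SPIS)
		return DEFAULT_NR_SPIS;
	return nr_spis;
}

/* ID bits needed to index every SPI: fls(roundup_pow_of_two(n)) - 1. */
static unsigned int spi_id_bits(uint32_t nr_spis)
{
	return fls64_sketch(roundup_pow2(nr_spis)) - 1;
}

/* Size of a linear SPI IST with 2^id_bits entries (assumption). */
static uint64_t spi_ist_bytes(uint32_t nr_spis)
{
	return (uint64_t)ISTE_BYTES << spi_id_bits(nr_spis);
}
```

Note that a non-power-of-two count such as 96 SPIs rounds up to 7 ID
bits, so the table covers 128 entries.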
Keep the per-VM IRS object alive for the lifetime of the virtual GICv5.
vgic_v5_teardown() only unwinds resources allocated by vgic_v5_init(), so
failed initialisation can be retried, while kvm_vgic_dist_destroy() frees
the IRS object during final VGIC destruction.
This gives virtual GICv5s the IRS backing required for SPIs and LPIs,
rather than limiting them to PPIs. Further patches add support for SPI
injection and lifecycle tracking.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-init.c | 59 +++++++++++++++++++++++----------
arch/arm64/kvm/vgic/vgic-v5.c | 8 ++++-
2 files changed, 49 insertions(+), 18 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index 94632fd90b728..aa883507d00d1 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -174,28 +174,48 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
break;
}
- if (ret) {
- kvm_for_each_vcpu(i, vcpu, kvm) {
- struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
- kfree(vgic_cpu->private_irqs);
- vgic_cpu->private_irqs = NULL;
- }
-
- kvm->arch.vgic.vgic_model = 0;
- goto out_unlock;
- }
+ if (ret)
+ goto out_free_private_irqs;
if (type == KVM_DEV_TYPE_ARM_VGIC_V3)
kvm->arch.vgic.nassgicap = system_supports_direct_sgis();
- /*
- * We now know that we have a GICv5. The Arch Timer PPI interrupts may
- * have been initialised at this stage, but will have done so assuming
- * that we have an older GIC, meaning that the IntIDs won't be
- * correct. We init them again, and this time they will be correct.
- */
- if (type == KVM_DEV_TYPE_ARM_VGIC_V5)
+ if (type == KVM_DEV_TYPE_ARM_VGIC_V5) {
+ /* Allocate a vIRS for GICv5 systems */
+ kvm->arch.vgic.vgic_v5_irs_data = kzalloc_obj(struct vgic_v5_irs,
+ GFP_KERNEL_ACCOUNT);
+ if (!kvm->arch.vgic.vgic_v5_irs_data) {
+ ret = -ENOMEM;
+ goto out_free_private_irqs;
+ }
+
+ /*
+ * Initialization happens later; for now, just mark the
+ * IRS base address as undefined.
+ */
+ kvm->arch.vgic.vgic_v5_irs_data->vgic_v5_irs_base = VGIC_ADDR_UNDEF;
+
+ /*
+ * We now know that we have a GICv5. The Arch Timer PPI
+ * interrupts may have been initialised at this stage, but will
+ * have done so assuming that we have an older GIC, meaning that
+ * the IntIDs won't be correct. We init them again, and this
+ * time they will be correct.
+ */
kvm_timer_init_vm(kvm);
+ }
+
+ goto out_unlock;
+
+out_free_private_irqs:
+ kvm_for_each_vcpu(i, vcpu, kvm) {
+ struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+
+ kfree(vgic_cpu->private_irqs);
+ vgic_cpu->private_irqs = NULL;
+ }
+
+ kvm->arch.vgic.vgic_model = 0;
out_unlock:
mutex_unlock(&kvm->arch.config_lock);
@@ -467,6 +487,9 @@ int vgic_init(struct kvm *kvm)
return ret;
}
} else {
+ if (!dist->nr_spis)
+ dist->nr_spis = VGIC_V5_DEFAULT_NR_SPIS;
+
ret = vgic_v5_init(kvm);
if (ret)
return ret;
@@ -512,6 +535,8 @@ static void kvm_vgic_dist_destroy(struct kvm *kvm)
break;
case KVM_DEV_TYPE_ARM_VGIC_V5:
vgic_v5_teardown(kvm);
+ kfree(dist->vgic_v5_irs_data);
+ dist->vgic_v5_irs_data = NULL;
break;
}
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index b966495901cc4..f481191d72eae 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -661,7 +661,8 @@ void vgic_v5_teardown(struct kvm *kvm)
/*
* Claim and populate a VMTE (optionally making a new L2 VMT valid), create VPE
- * doorbells, allocate VPET and populate for each VPE.
+ * doorbells, allocate VPET and populate for each VPE. Finally, we also init the
+ * vIRS, which means allocating and making the virtual SPI IST valid.
*
* Note: We do need to put the cart before the horse here. The VPE doorbells are
* our conduit for communication with the IRS, which means we need to have those
@@ -731,6 +732,11 @@ int vgic_v5_init(struct kvm *kvm)
goto err;
}
+ /* Init IRS (and alloc SPI IST) */
+ ret = kvm_vgic_v5_irs_init(kvm, kvm->arch.vgic.nr_spis);
+ if (ret)
+ goto err;
+
return 0;
err:
--
2.34.1
* [PATCH v2 22/39] KVM: arm64: gic-v5: Register the IRS IODEV
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
2026-05-21 14:56 ` [PATCH v2 21/39] KVM: arm64: gic-v5: Initialise per-VM IRS state Sascha Bischoff
@ 2026-05-21 14:56 ` Sascha Bischoff
2026-05-21 14:57 ` [PATCH v2 23/39] KVM: arm64: gic-v5: Set IRICHPPIDIS based on IRS enable state Sascha Bischoff
From: Sascha Bischoff @ 2026-05-21 14:56 UTC
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Now that we have an emulated IRS, it must be registered as a KVM IODEV
so that guest accesses to its MMIO frames are routed to the emulation.
Register the GICv5 IRS IODEV from kvm_vgic_map_resources(). If
userspace has not provided an address for the IRS, fail with -ENXIO, as
a GICv5 guest without an IRS is not a supported configuration.
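The base-address rules can be summarised in a small sketch (userspace
C, not the kernel code): the address must have been set (otherwise
-ENXIO, as in kvm_vgic_map_resources()) and must be 64K-aligned
(otherwise -EINVAL, as in vgic_v5_register_irs_iodev()). Because the
base is 64K-aligned, masking a guest address with SZ_64K - 1 yields the
register offset within whichever of the two back-to-back 64K frames is
accessed, which is how the MMIO handlers compute their offset.
ADDR_UNDEF and IRS_SIZE are stand-ins for VGIC_ADDR_UNDEF and
KVM_VGIC_V5_IRS_SIZE.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SZ_64K     0x10000ull
#define ADDR_UNDEF ((uint64_t)-1) /* stand-in for VGIC_ADDR_UNDEF */
#define IRS_SIZE   (2 * SZ_64K)   /* two 64K frames back to back */

/* Mirrors the map-resources and IODEV registration checks. */
static int check_irs_base(uint64_t base)
{
	if (base == ADDR_UNDEF)
		return -ENXIO;  /* userspace never set the IRS address */
	if (base & (SZ_64K - 1))
		return -EINVAL; /* must be 64K-aligned */
	return 0;
}

/* Does a guest physical address fall inside the IRS MMIO window? */
static bool irs_contains(uint64_t base, uint64_t gpa)
{
	return gpa >= base && gpa - base < IRS_SIZE;
}

/* With a 64K-aligned base, masking yields the offset within a frame. */
static uint64_t irs_reg_offset(uint64_t gpa)
{
	return gpa & (SZ_64K - 1);
}
```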
As part of this change, expose setting the address of the emulated IRS
via KVM_VGIC_V5_ADDR_TYPE_IRS to userspace. Also allow userspace to set
the number of SPIs handled by the emulated GICv5 implementation, using a
GICv5-specific SPI count rather than the legacy total interrupt count.
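The difference between the two interpretations of the NR_IRQS
attribute can be sketched as follows (userspace C, not the kernel
code). For legacy GICs the value is a total interrupt count that
includes the 32 private interrupts; for GICv5 it is the SPI count
itself, so the same value 64 yields 32 SPIs on a vGICv3 but 64 SPIs on
a vGICv5. The 1023 legacy upper bound and the 25-bit width assumed for
IRS_IDR5.SPI_RANGE are assumptions; the -EBUSY handling for an already
configured or initialised VGIC is omitted.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define VGIC_NR_PRIVATE_IRQS 32u   /* legacy: 16 SGIs + 16 PPIs */
#define VGIC_MAX_RESERVED    1023u /* assumed legacy upper bound */
#define V5_DEFAULT_NR_SPIS   32u   /* mirrors VGIC_V5_DEFAULT_NR_SPIS */
#define V5_SPI_RANGE_MAX     ((1u << 25) - 1) /* assumed SPI_RANGE width */

/*
 * Translate the NR_IRQS attribute value into an SPI count.
 * Returns the SPI count on success or -EINVAL.
 */
static int64_t nr_irqs_attr_to_nr_spis(bool is_v5, uint32_t val)
{
	if (!is_v5) {
		/* Legacy GICs: total interrupts, including private ones. */
		if (val < VGIC_NR_PRIVATE_IRQS + 32 ||
		    val > VGIC_MAX_RESERVED || (val & 31))
			return -EINVAL;
		return (int64_t)val - VGIC_NR_PRIVATE_IRQS;
	}

	/* GICv5: the value already is the number of SPIs. */
	if (val < V5_DEFAULT_NR_SPIS || val > V5_SPI_RANGE_MAX || (val & 31))
		return -EINVAL;
	return val;
}
```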
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-init.c | 19 ++++-
arch/arm64/kvm/vgic/vgic-kvm-device.c | 106 ++++++++++++++++++--------
2 files changed, 91 insertions(+), 34 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-init.c b/arch/arm64/kvm/vgic/vgic-init.c
index aa883507d00d1..f6c8a11c9aa44 100644
--- a/arch/arm64/kvm/vgic/vgic-init.c
+++ b/arch/arm64/kvm/vgic/vgic-init.c
@@ -657,9 +657,8 @@ int vgic_lazy_init(struct kvm *kvm)
int kvm_vgic_map_resources(struct kvm *kvm)
{
struct vgic_dist *dist = &kvm->arch.vgic;
- bool needs_dist = true;
enum vgic_type type;
- gpa_t dist_base;
+ gpa_t dist_base, irs_base;
int ret = 0;
if (likely(smp_load_acquire(&dist->ready)))
@@ -682,13 +681,12 @@ int kvm_vgic_map_resources(struct kvm *kvm)
} else {
ret = vgic_v5_map_resources(kvm);
type = VGIC_V5;
- needs_dist = false;
}
if (ret)
goto out;
- if (needs_dist) {
+ if (type != VGIC_V5) {
dist_base = dist->vgic_dist_base;
mutex_unlock(&kvm->arch.config_lock);
@@ -698,7 +696,20 @@ int kvm_vgic_map_resources(struct kvm *kvm)
goto out_slots;
}
} else {
+ irs_base = dist->vgic_v5_irs_data->vgic_v5_irs_base;
mutex_unlock(&kvm->arch.config_lock);
+
+ if (IS_VGIC_ADDR_UNDEF(irs_base)) {
+ kvm_err("No IRS address provided\n");
+ ret = -ENXIO;
+ goto out_slots;
+ }
+
+ ret = vgic_v5_register_irs_iodev(kvm, irs_base);
+ if (ret) {
+ kvm_err("Unable to register VGIC IRS MMIO regions\n");
+ goto out_slots;
+ }
}
smp_store_release(&dist->ready, true);
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 90be99443df3b..2bf1930902b8e 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -181,6 +181,14 @@ static int kvm_vgic_addr(struct kvm *kvm, struct kvm_device_attr *attr, bool wri
addr |= (u64)rdreg->count << KVM_VGIC_V3_RDIST_COUNT_SHIFT;
goto out;
}
+ case KVM_VGIC_V5_ADDR_TYPE_IRS:
+ r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V5);
+ if (r)
+ break;
+ addr_ptr = &vgic->vgic_v5_irs_data->vgic_v5_irs_base;
+ alignment = SZ_64K;
+ size = KVM_VGIC_V5_IRS_SIZE;
+ break;
default:
r = -ENODEV;
}
@@ -224,31 +232,48 @@ static int vgic_set_common_attr(struct kvm_device *dev,
if (get_user(val, uaddr))
return -EFAULT;
- /*
- * We require:
- * - at least 32 SPIs on top of the 16 SGIs and 16 PPIs
- * - at most 1024 interrupts
- * - a multiple of 32 interrupts
- */
- if (val < (VGIC_NR_PRIVATE_IRQS + 32) ||
- val > VGIC_MAX_RESERVED ||
- (val & 31))
- return -EINVAL;
+ if (!vgic_is_v5(dev->kvm)) {
+ /*
+ * We require:
+ * - at least 32 SPIs on top of the 16 SGIs and 16 PPIs
+ * - at most 1024 interrupts
+ * - a multiple of 32 interrupts
+ */
+ if (val < (VGIC_NR_PRIVATE_IRQS + 32) ||
+ val > VGIC_MAX_RESERVED || (val & 31))
+ return -EINVAL;
- mutex_lock(&dev->kvm->arch.config_lock);
+ mutex_lock(&dev->kvm->arch.config_lock);
- /*
- * Either userspace has already configured NR_IRQS or
- * the vgic has already been initialized and vgic_init()
- * supplied a default amount of SPIs.
- */
- if (dev->kvm->arch.vgic.nr_spis)
- ret = -EBUSY;
- else
- dev->kvm->arch.vgic.nr_spis =
- val - VGIC_NR_PRIVATE_IRQS;
+ /*
+ * Either userspace has already configured NR_IRQS or
+ * the vgic has already been initialized and vgic_init()
+ * supplied a default amount of SPIs.
+ */
+ if (dev->kvm->arch.vgic.nr_spis)
+ ret = -EBUSY;
+ else
+ dev->kvm->arch.vgic.nr_spis =
+ val - VGIC_NR_PRIVATE_IRQS;
- mutex_unlock(&dev->kvm->arch.config_lock);
+ mutex_unlock(&dev->kvm->arch.config_lock);
+ } else {
+ /*
+ * GICv5 reports a number of SPIs, not a total number of
+ * interrupts. Require a multiple of 32 SPIs.
+ */
+ if (val < VGIC_V5_DEFAULT_NR_SPIS ||
+ val > FIELD_MAX(GICV5_IRS_IDR5_SPI_RANGE) ||
+ (val & 31))
+ return -EINVAL;
+
+ mutex_lock(&dev->kvm->arch.config_lock);
+ if (vgic_initialized(dev->kvm) || dev->kvm->arch.vgic.nr_spis)
+ ret = -EBUSY;
+ else
+ dev->kvm->arch.vgic.nr_spis = val;
+ mutex_unlock(&dev->kvm->arch.config_lock);
+ }
return ret;
}
@@ -299,9 +324,14 @@ static int vgic_get_common_attr(struct kvm_device *dev,
return (r == -ENODEV) ? -ENXIO : r;
case KVM_DEV_ARM_VGIC_GRP_NR_IRQS: {
u32 __user *uaddr = (u32 __user *)(long)attr->addr;
-
- r = put_user(dev->kvm->arch.vgic.nr_spis +
- VGIC_NR_PRIVATE_IRQS, uaddr);
+ /* Older GICs */
+ if (!vgic_is_v5(dev->kvm)) {
+ r = put_user(dev->kvm->arch.vgic.nr_spis +
+ VGIC_NR_PRIVATE_IRQS,
+ uaddr);
+ } else {
+ r = put_user(dev->kvm->arch.vgic.nr_spis, uaddr);
+ }
break;
}
}
@@ -748,21 +778,25 @@ static int vgic_v5_set_attr(struct kvm_device *dev,
{
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_ADDR:
+ break;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
- case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
return -ENXIO;
+ case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+ break;
case KVM_DEV_ARM_VGIC_GRP_CTRL:
switch (attr->attr) {
case KVM_DEV_ARM_VGIC_CTRL_INIT:
- return vgic_set_common_attr(dev, attr);
+ break;
case KVM_DEV_ARM_VGIC_USERSPACE_PPIS:
default:
return -ENXIO;
}
+ break;
default:
return -ENXIO;
}
+ return vgic_set_common_attr(dev, attr);
}
static int vgic_v5_get_attr(struct kvm_device *dev,
@@ -770,21 +804,26 @@ static int vgic_v5_get_attr(struct kvm_device *dev,
{
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_ADDR:
+ break;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
- case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
return -ENXIO;
+ case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+ break;
case KVM_DEV_ARM_VGIC_GRP_CTRL:
switch (attr->attr) {
case KVM_DEV_ARM_VGIC_CTRL_INIT:
- return vgic_get_common_attr(dev, attr);
+ break;
case KVM_DEV_ARM_VGIC_USERSPACE_PPIS:
return vgic_v5_get_userspace_ppis(dev, attr);
default:
return -ENXIO;
}
+ break;
default:
return -ENXIO;
}
+
+ return vgic_get_common_attr(dev, attr);
}
static int vgic_v5_has_attr(struct kvm_device *dev,
@@ -792,15 +831,22 @@ static int vgic_v5_has_attr(struct kvm_device *dev,
{
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_ADDR:
+ switch (attr->attr) {
+ case KVM_VGIC_V5_ADDR_TYPE_IRS:
+ return 0;
+ }
+ return -ENXIO;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
- case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
return -ENXIO;
+ case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
+ return 0;
case KVM_DEV_ARM_VGIC_GRP_CTRL:
switch (attr->attr) {
case KVM_DEV_ARM_VGIC_CTRL_INIT:
return 0;
case KVM_DEV_ARM_VGIC_USERSPACE_PPIS:
return 0;
+ case KVM_DEV_ARM_VGIC_SAVE_PENDING_TABLES:
default:
return -ENXIO;
}
--
2.34.1
* [PATCH v2 23/39] KVM: arm64: gic-v5: Set IRICHPPIDIS based on IRS enable state
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (21 preceding siblings ...)
2026-05-21 14:56 ` [PATCH v2 22/39] KVM: arm64: gic-v5: Register the IRS IODEV Sascha Bischoff
@ 2026-05-21 14:57 ` Sascha Bischoff
2026-05-21 14:57 ` [PATCH v2 24/39] KVM: arm64: selftests: Update vGICv5 selftest to set IRS address Sascha Bischoff
` (15 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:57 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
The GICv5 ICH_CONTEXTR_EL2 register has an IRICHPPIDIS field, which
allows the hypervisor to enable or disable HPPI selection for SPIs and
LPIs. This can be used to emulate the guest enabling or disabling the
IRS. Therefore, drive IRICHPPIDIS from the emulated IRS enable state.
SPIs and LPIs then cannot be delivered to the guest until it enables
the emulated IRS, which matches the behaviour of real hardware.
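A sketch of how the ICH_CONTEXTR_EL2 value can be composed, with IRICHPPIDIS being the inverse of the emulated IRS enable. The field positions below are hypothetical placeholders, not the architectural layout. toy_field_prep() is a simplified userspace FIELD_PREP() that relies on the GCC/Clang builtin __builtin_ctzll().

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. */
#define TOY_CONTEXTR_VM_MASK          0x000000000000ffffULL
#define TOY_CONTEXTR_VPE_MASK         0x00000000ffff0000ULL
#define TOY_CONTEXTR_IRICHPPIDIS_MASK 0x0000000100000000ULL
#define TOY_CONTEXTR_V_MASK           0x8000000000000000ULL

/* Shift val into the position described by a contiguous mask (like FIELD_PREP). */
static uint64_t toy_field_prep(uint64_t mask, uint64_t val)
{
	return (val << __builtin_ctzll(mask)) & mask;
}

uint64_t toy_contextr(bool irs_enabled, uint16_t vpe, uint16_t vm)
{
	/* While the guest has the IRS disabled, stop HPPI selection of SPIs/LPIs. */
	bool irichppidis = !irs_enabled;

	return toy_field_prep(TOY_CONTEXTR_V_MASK, 1) |
	       toy_field_prep(TOY_CONTEXTR_IRICHPPIDIS_MASK, irichppidis) |
	       toy_field_prep(TOY_CONTEXTR_VPE_MASK, vpe) |
	       toy_field_prep(TOY_CONTEXTR_VM_MASK, vm);
}
```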
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-v5.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index f481191d72eae..458afdbfe2938 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -1043,6 +1043,7 @@ void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu)
void vgic_v5_load(struct kvm_vcpu *vcpu)
{
+ bool irichppidis = !vcpu->kvm->arch.vgic.enabled;
struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
u16 vm = vgic_v5_vm_id(vcpu->kvm);
u16 vpe = vgic_v5_vpe_id(vcpu);
@@ -1059,6 +1060,7 @@ void vgic_v5_load(struct kvm_vcpu *vcpu)
kvm_call_hyp(__vgic_v5_restore_vmcr_apr, cpu_if);
cpu_if->vgic_contextr = FIELD_PREP(ICH_CONTEXTR_EL2_V, true) |
+ FIELD_PREP(ICH_CONTEXTR_EL2_IRICHPPIDIS, irichppidis) |
FIELD_PREP(ICH_CONTEXTR_EL2_VPE, vpe) |
FIELD_PREP(ICH_CONTEXTR_EL2_VM, vm);
--
2.34.1
* [PATCH v2 24/39] KVM: arm64: selftests: Update vGICv5 selftest to set IRS address
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (22 preceding siblings ...)
2026-05-21 14:57 ` [PATCH v2 23/39] KVM: arm64: gic-v5: Set IRICHPPIDIS based on IRS enable state Sascha Bischoff
@ 2026-05-21 14:57 ` Sascha Bischoff
2026-05-21 14:57 ` [PATCH v2 25/39] KVM: arm64: gic-v5: Introduce SPI AP list Sascha Bischoff
` (14 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:57 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
This selftest was added before KVM supported the GICv5 IRS. There was
therefore no IRS address to set, and the corresponding UAPI did not
yet exist.
Now that the IRS is supported, and setting its address is mandatory
before VGIC resources are mapped, set the emulated IRS GPA before
initialising the VGIC. Running a GICv5 VM will fail if userspace has
not provided the IRS address before the first vCPU run.
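The ordering requirement can be modelled in a few lines. This is a userspace sketch, not KVM code. struct toy_vgic and the toy_* helpers are hypothetical. TOY_ADDR_UNDEF plays the role of VGIC_ADDR_UNDEF, and the 64K alignment check mirrors the "alignment = SZ_64K" set for KVM_VGIC_V5_ADDR_TYPE_IRS in kvm_vgic_addr(). Returning -EEXIST when the address is set a second time is an assumption about vgic_check_iorange().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_ADDR_UNDEF ((uint64_t)-1)   /* stand-in for VGIC_ADDR_UNDEF */
#define TOY_SZ_64K     0x10000ULL

struct toy_vgic {
	uint64_t irs_base;
	bool ready;
};

void toy_vgic_create(struct toy_vgic *v)
{
	v->irs_base = TOY_ADDR_UNDEF;
	v->ready = false;
}

/* KVM_DEV_ARM_VGIC_GRP_ADDR / KVM_VGIC_V5_ADDR_TYPE_IRS, modelled. */
int toy_set_irs_addr(struct toy_vgic *v, uint64_t gpa)
{
	if (v->irs_base != TOY_ADDR_UNDEF)
		return -EEXIST;			/* assumption: only settable once */
	if (gpa & (TOY_SZ_64K - 1))
		return -EINVAL;			/* must be 64K aligned */
	v->irs_base = gpa;
	return 0;
}

/* What the first vCPU run triggers via kvm_vgic_map_resources(). */
int toy_map_resources(struct toy_vgic *v)
{
	if (v->irs_base == TOY_ADDR_UNDEF)
		return -ENXIO;			/* "No IRS address provided" */
	v->ready = true;
	return 0;
}
```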
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
tools/testing/selftests/kvm/arm64/vgic_v5.c | 6 ++++++
tools/testing/selftests/kvm/include/arm64/gic_v5.h | 3 +++
2 files changed, 9 insertions(+)
diff --git a/tools/testing/selftests/kvm/arm64/vgic_v5.c b/tools/testing/selftests/kvm/arm64/vgic_v5.c
index 96cfd6bb32f6f..19039a8940568 100644
--- a/tools/testing/selftests/kvm/arm64/vgic_v5.c
+++ b/tools/testing/selftests/kvm/arm64/vgic_v5.c
@@ -100,6 +100,7 @@ static void test_vgic_v5_ppis(u32 gic_dev_type)
struct ucall uc;
u64 user_ppis[2];
struct vm_gic v;
+ uint64_t attr;
int ret, i;
v.gic_dev_type = gic_dev_type;
@@ -116,6 +117,11 @@ static void test_vgic_v5_ppis(u32 gic_dev_type)
for (i = 0; i < NR_VCPUS; i++)
vcpu_init_descriptor_tables(vcpus[i]);
+ /* Set the address of the IRS before initialising the GIC */
+ attr = GICV5_IRS_CONFIG_BASE_GPA;
+ kvm_device_attr_set(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
+ KVM_VGIC_V5_ADDR_TYPE_IRS, &attr);
+
kvm_device_attr_set(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
KVM_DEV_ARM_VGIC_CTRL_INIT, NULL);
diff --git a/tools/testing/selftests/kvm/include/arm64/gic_v5.h b/tools/testing/selftests/kvm/include/arm64/gic_v5.h
index eb523d9277cf1..c388df8f2a2b4 100644
--- a/tools/testing/selftests/kvm/include/arm64/gic_v5.h
+++ b/tools/testing/selftests/kvm/include/arm64/gic_v5.h
@@ -10,6 +10,9 @@
#include "processor.h"
+/* GIC component base address is guest PA space */
+#define GICV5_IRS_CONFIG_BASE_GPA 0x8000000ULL
+
/*
* Definitions for GICv5 instructions for the Current Domain
*/
--
2.34.1
* [PATCH v2 25/39] KVM: arm64: gic-v5: Introduce SPI AP list
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (23 preceding siblings ...)
2026-05-21 14:57 ` [PATCH v2 24/39] KVM: arm64: selftests: Update vGICv5 selftest to set IRS address Sascha Bischoff
@ 2026-05-21 14:57 ` Sascha Bischoff
2026-05-21 14:58 ` [PATCH v2 26/39] KVM: arm64: gic-v5: Add GIC VDPEND and GIC VDRCFG hyp calls Sascha Bischoff
` (13 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:57 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
As a general rule, GICv5 works somewhat differently from
previous-generation GICs. For virtual interrupts, as much as possible
is handled directly by the hardware, with minimal software
interaction.
So far, the GICv5 support has been limited to PPIs. These are handled
via a set of ICH_PPI_*_EL2 registers, which are used by the hypervisor
to manage the PPI state exposed to the guest. They effectively take
the role of the ICH_LR*_EL2 registers found in earlier GICs, but do so
for EVERY PPI in parallel. For this reason, the GICv5 PPI support
doesn't use AP lists at all - all PPI state is always presented to the
guest.
The lifecycle of a virtual SPI is largely handled by the hardware with
GICv5. GICv5 itself provides a set of system instructions that act
upon the virtual domain. One of these, GIC VDPEND, can be used to make
a specified interrupt pending for a guest. The state of guest
interrupts is tracked in Interrupt State Tables (ISTs), which are
allocated by the hypervisor and provided directly to the hardware. The
enable state for SPIs and LPIs is driven directly by the guest (using
the GIC CDEN/CDDIS system instructions). Priority and affinity are
also driven by the guest.
All of the above means that it is in theory possible to handle virtual
SPIs from KVM by just executing GIC VDPEND whenever new state is to be
injected into the guest. Of course, reality is a little bit more
complicated.
KVM itself provides an interface to register a notifier on interrupt
deactivation - specifically intended for use with SPIs on Arm-based
systems. This notifier requires KVM to track when an interrupt has
been consumed by the guest, so that the notifier can be called.
SPIs are not per-vcpu - they are effectively global to the VM (even if
they are affine to a specific VCPU, KVM doesn't need to know this
information). Therefore, this change introduces a per-VM AP list
specifically for tracking SPIs for a GICv5 guest. The intent is that
while an SPI is in-flight (pending/active) it remains on this list,
such that KVM knows to track the state of said SPI. Once the interrupt
has been consumed by the guest, it can be popped off the list.
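A minimal userspace sketch of the per-VM SPI AP list described above. It illustrates the idea under several assumptions and is not the KVM code. A pthread mutex stands in for the raw spinlock, a tiny intrusive list replaces <linux/list.h>, and an on_list flag replaces the irq->vcpu marker. An SPI is added once when it becomes in flight, and popped off when it has been consumed.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node { struct toy_node *prev, *next; };

struct toy_spi {
	unsigned int intid;
	bool on_list;            /* stands in for irq->vcpu != NULL */
	struct toy_node ap_list;
};

struct toy_vm {
	pthread_mutex_t ap_lock; /* stands in for vgic_v5_spi_ap_list_lock */
	struct toy_node ap_head; /* stands in for vgic_v5_spi_ap_list_head */
};

void toy_vm_init(struct toy_vm *vm)
{
	pthread_mutex_init(&vm->ap_lock, NULL);
	vm->ap_head.prev = vm->ap_head.next = &vm->ap_head; /* empty circular list */
}

/* Queue an in-flight SPI; queueing an SPI that is already listed is a no-op. */
void toy_queue_spi(struct toy_vm *vm, struct toy_spi *spi)
{
	pthread_mutex_lock(&vm->ap_lock);
	if (!spi->on_list) {
		struct toy_node *tail = vm->ap_head.prev;

		spi->ap_list.prev = tail;
		spi->ap_list.next = &vm->ap_head;
		tail->next = &spi->ap_list;
		vm->ap_head.prev = &spi->ap_list;
		spi->on_list = true;
	}
	pthread_mutex_unlock(&vm->ap_lock);
}

/* Pop an SPI off the list once the guest has consumed it. */
void toy_retire_spi(struct toy_vm *vm, struct toy_spi *spi)
{
	pthread_mutex_lock(&vm->ap_lock);
	if (spi->on_list) {
		spi->ap_list.prev->next = spi->ap_list.next;
		spi->ap_list.next->prev = spi->ap_list.prev;
		spi->on_list = false;
	}
	pthread_mutex_unlock(&vm->ap_lock);
}

/* Count the SPIs that currently have to be tracked. */
size_t toy_nr_in_flight(struct toy_vm *vm)
{
	size_t n = 0;

	pthread_mutex_lock(&vm->ap_lock);
	for (struct toy_node *p = vm->ap_head.next; p != &vm->ap_head; p = p->next)
		n++;
	pthread_mutex_unlock(&vm->ap_lock);
	return n;
}
```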
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-irs-v5.c | 3 +++
include/kvm/arm_vgic.h | 14 ++++++++++++++
2 files changed, 17 insertions(+)
diff --git a/arch/arm64/kvm/vgic/vgic-irs-v5.c b/arch/arm64/kvm/vgic/vgic-irs-v5.c
index d1c724d0fd0b6..6739c01277866 100644
--- a/arch/arm64/kvm/vgic/vgic-irs-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-irs-v5.c
@@ -636,6 +636,9 @@ int kvm_vgic_v5_irs_init(struct kvm *kvm, unsigned int nr_spis)
u64 mmfr0;
int ret, i;
+ INIT_LIST_HEAD(&dist->vgic_v5_spi_ap_list_head);
+ raw_spin_lock_init(&dist->vgic_v5_spi_ap_list_lock);
+
/*
* We (KVM) allocate an Interrupt State Table (IST) for SPIs. The
* hardware mandates that lower 6 bits of the address are 0. Each ISTE
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 143e75743da86..d4b0e7e3edf26 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -574,6 +574,20 @@ struct vgic_dist {
* GICv5 IRS data. Dynamically allocated due to the size.
*/
struct vgic_v5_irs *vgic_v5_irs_data;
+
+ /*
+ * The GICv5 SPI AP list is global to the VM. This spinlock ensures that
+ * we don't do anything untoward!
+ */
+ raw_spinlock_t vgic_v5_spi_ap_list_lock;
+
+ /*
+ * List of global (non-private) IRQs that must be tracked because they
+ * are either Active or Pending (hence the name; AP list). This list
+ * will only ever contain SPIs. All private IRQs must go into a specific
+ * vcpu's AP list.
+ */
+ struct list_head vgic_v5_spi_ap_list_head;
};
struct vgic_v2_cpu_if {
--
2.34.1
* [PATCH v2 26/39] KVM: arm64: gic-v5: Add GIC VDPEND and GIC VDRCFG hyp calls
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (24 preceding siblings ...)
2026-05-21 14:57 ` [PATCH v2 25/39] KVM: arm64: gic-v5: Introduce SPI AP list Sascha Bischoff
@ 2026-05-21 14:58 ` Sascha Bischoff
2026-05-21 14:58 ` [PATCH v2 27/39] KVM: arm64: gic-v5: Track SPI state for in-flight SPIs Sascha Bischoff
` (12 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:58 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
With PPIs, their state is injected via the ICH_PPI_x_EL2 system
registers. For SPIs and LPIs, there are no such registers as these
would limit the number of interrupts significantly. Instead, SPI and
LPI pending state can be managed from the hypervisor using the GIC
VDPEND instruction. This provides a way to set an SPI or LPI for a VM
as pending or non-pending, i.e., to inject interrupts into a guest.
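The GIC VDPEND operand packs the interrupt ID and type, the new pending state and the target VM into a single 64-bit register, as __vgic_v5_vdpend() does below. The masks and shifts in this sketch are hypothetical placeholders, not the architectural GICV5_GIC_VDPEND_* definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for illustration only. */
#define TOY_VDPEND_ID_MASK     0x0000000000ffffffULL  /* bits [23:0]  */
#define TOY_VDPEND_TYPE_MASK   0x00000000e0000000ULL  /* bits [31:29] */
#define TOY_VDPEND_PENDING_BIT 32                     /* bit 32       */
#define TOY_VDPEND_VM_SHIFT    48                     /* bits [63:48] */

uint64_t toy_vdpend_operand(uint32_t intid, bool pending, uint16_t vm)
{
	/* Keep only the ID and TYPE fields of the (type-tagged) INTID. */
	uint64_t value = intid & (TOY_VDPEND_ID_MASK | TOY_VDPEND_TYPE_MASK);

	value |= (uint64_t)pending << TOY_VDPEND_PENDING_BIT;
	value |= (uint64_t)vm << TOY_VDPEND_VM_SHIFT;
	return value;
}
```

Passing pending == false produces the "make non-pending" form of the same operand. This is how a single instruction can both inject and retract state.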
At times, it is important to detect when an interrupt has been
"consumed" (deactivated) by the guest. For PPIs, this was possible via
the ICH_PPI_x_EL2 registers, but for SPIs and LPIs it needs to be done
using the GIC VDRCFG instruction. This, in combination with a read of
ICC_ICSR_EL1, allows the hypervisor to query the state of any valid
SPI or LPI belonging to a guest.
These system instructions are only executable from EL2, and therefore
must be wrapped in hypercalls for nVHE/hVHE configurations. For GIC
VDRCFG, the hypercall also performs the read of ICC_ICSR_EL1, so that
the state is snapshotted immediately after the instruction executes.
Reading ICC_ICSR_EL1 separately could return incorrect state, as there
is no guarantee that nothing else has modified it in the meantime.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/include/asm/kvm_asm.h | 2 ++
arch/arm64/include/asm/kvm_hyp.h | 2 ++
arch/arm64/kvm/hyp/nvhe/hyp-main.c | 18 ++++++++++++++++++
arch/arm64/kvm/hyp/vgic-v5-sr.c | 20 ++++++++++++++++++++
4 files changed, 42 insertions(+)
diff --git a/arch/arm64/include/asm/kvm_asm.h b/arch/arm64/include/asm/kvm_asm.h
index d9ff9c2999aa7..38a4ba998076c 100644
--- a/arch/arm64/include/asm/kvm_asm.h
+++ b/arch/arm64/include/asm/kvm_asm.h
@@ -89,6 +89,8 @@ enum __kvm_host_smccc_func {
__KVM_HOST_SMCCC_FUNC___vgic_v3_restore_vmcr_aprs,
__KVM_HOST_SMCCC_FUNC___vgic_v5_make_resident,
__KVM_HOST_SMCCC_FUNC___vgic_v5_make_non_resident,
+ __KVM_HOST_SMCCC_FUNC___vgic_v5_vdpend,
+ __KVM_HOST_SMCCC_FUNC___vgic_v5_vdrcfg,
__KVM_HOST_SMCCC_FUNC___vgic_v5_save_apr,
__KVM_HOST_SMCCC_FUNC___vgic_v5_restore_vmcr_apr,
diff --git a/arch/arm64/include/asm/kvm_hyp.h b/arch/arm64/include/asm/kvm_hyp.h
index 5f9184276b04e..20aeb29a4adf1 100644
--- a/arch/arm64/include/asm/kvm_hyp.h
+++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -97,6 +97,8 @@ void __vgic_v5_save_ppi_state(struct vgic_v5_cpu_if *cpu_if);
void __vgic_v5_restore_ppi_state(struct vgic_v5_cpu_if *cpu_if);
void __vgic_v5_save_state(struct vgic_v5_cpu_if *cpu_if);
void __vgic_v5_restore_state(struct vgic_v5_cpu_if *cpu_if);
+void __vgic_v5_vdpend(u32 intid, bool pending, u16 vm);
+u64 __vgic_v5_vdrcfg(u32 intid);
#ifdef __KVM_NVHE_HYPERVISOR__
void __timer_enable_traps(struct kvm_vcpu *vcpu);
diff --git a/arch/arm64/kvm/hyp/nvhe/hyp-main.c b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
index 555275736fa77..9d3f968c316e7 100644
--- a/arch/arm64/kvm/hyp/nvhe/hyp-main.c
+++ b/arch/arm64/kvm/hyp/nvhe/hyp-main.c
@@ -700,6 +700,22 @@ static void handle___vgic_v5_restore_vmcr_apr(struct kvm_cpu_context *host_ctxt)
__vgic_v5_restore_vmcr_apr(kern_hyp_va(cpu_if));
}
+static void handle___vgic_v5_vdpend(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u32, intid, host_ctxt, 1);
+ DECLARE_REG(bool, pending, host_ctxt, 2);
+ DECLARE_REG(u16, vm, host_ctxt, 3);
+
+ __vgic_v5_vdpend(intid, pending, vm);
+}
+
+static void handle___vgic_v5_vdrcfg(struct kvm_cpu_context *host_ctxt)
+{
+ DECLARE_REG(u32, intid, host_ctxt, 1);
+
+ cpu_reg(host_ctxt, 1) = __vgic_v5_vdrcfg(intid);
+}
+
typedef void (*hcall_t)(struct kvm_cpu_context *);
#define HANDLE_FUNC(x) [__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x
@@ -735,6 +751,8 @@ static const hcall_t host_hcall[] = {
HANDLE_FUNC(__vgic_v3_restore_vmcr_aprs),
HANDLE_FUNC(__vgic_v5_make_resident),
HANDLE_FUNC(__vgic_v5_make_non_resident),
+ HANDLE_FUNC(__vgic_v5_vdpend),
+ HANDLE_FUNC(__vgic_v5_vdrcfg),
HANDLE_FUNC(__vgic_v5_save_apr),
HANDLE_FUNC(__vgic_v5_restore_vmcr_apr),
diff --git a/arch/arm64/kvm/hyp/vgic-v5-sr.c b/arch/arm64/kvm/hyp/vgic-v5-sr.c
index 46992a6c2cacb..c50e6ae93ba3f 100644
--- a/arch/arm64/kvm/hyp/vgic-v5-sr.c
+++ b/arch/arm64/kvm/hyp/vgic-v5-sr.c
@@ -149,3 +149,23 @@ void __vgic_v5_restore_state(struct vgic_v5_cpu_if *cpu_if)
{
write_sysreg_s(cpu_if->vgic_icsr, SYS_ICC_ICSR_EL1);
}
+
+void __vgic_v5_vdpend(u32 intid, bool pending, u16 vm)
+{
+ u64 value;
+
+ value = intid & (GICV5_GIC_VDPEND_ID_MASK | GICV5_GIC_VDPEND_TYPE_MASK);
+ value |= FIELD_PREP(GICV5_GIC_VDPEND_PENDING_MASK, pending);
+ value |= FIELD_PREP(GICV5_GIC_VDPEND_VM_MASK, vm);
+ gic_insn(value, VDPEND);
+}
+
+u64 __vgic_v5_vdrcfg(u32 intid)
+{
+ u64 value;
+
+ value = intid & (GICV5_GIC_VDRCFG_ID_MASK | GICV5_GIC_VDRCFG_TYPE_MASK);
+ gic_insn(value, VDRCFG);
+ isb();
+ return read_sysreg_s(SYS_ICC_ICSR_EL1);
+}
--
2.34.1
* [PATCH v2 27/39] KVM: arm64: gic-v5: Track SPI state for in-flight SPIs
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (25 preceding siblings ...)
2026-05-21 14:58 ` [PATCH v2 26/39] KVM: arm64: gic-v5: Add GIC VDPEND and GIC VDRCFG hyp calls Sascha Bischoff
@ 2026-05-21 14:58 ` Sascha Bischoff
2026-05-21 14:58 ` [PATCH v2 28/39] KVM: arm64: gic: Introduce set_pending_state() to irq_op Sascha Bischoff
` (11 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:58 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
GICv5 interrupt state is largely managed by the hardware
itself. However, it is possible to register a notifier for the
deactivation of an SPI, and hence KVM is required to track when such
an SPI has been consumed by the guest in order to trigger the
notifier. This allows the code that registered the notifier to be
informed when an SPI has been consumed and deactivated by a guest, and
that the guest is ready to receive the next interrupt, if required.
As part of folding interrupt state for GICv5, which until now just
included PPIs, check the SPI state. For each in-flight SPI (an SPI
that is on the VM's SPI AP list), use GIC VDRCFG to retrieve the state
of the SPI, and track the active and pending states to determine when
the SPI has been deactivated by the guest. This needs to happen on
*every* vcpu exit for *all* vcpus belonging to the VM whenever any SPI
is in flight. When no SPIs are in flight, no SPI state is queried.
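The per-SPI decision made while folding can be written as a pure function of the ICC_ICSR_EL1 snapshot, as returned by the VDRCFG hypercall, and the SPI's configuration. This sketch follows the logic of vgic_v5_fold_irq_state() below. The Active/Pending bit positions are hypothetical. toy_irq_is_pending() reproduces the VGIC rule that an edge interrupt is pending if its latch is set, and a level interrupt is pending if either its latch or its line level is set.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_ICSR_PENDING (1ULL << 0)   /* hypothetical bit position */
#define TOY_ICSR_ACTIVE  (1ULL << 1)   /* hypothetical bit position */

enum toy_config { TOY_EDGE, TOY_LEVEL };

struct toy_irq {
	enum toy_config config;
	bool active, pending_latch, line_level;
};

static bool toy_irq_is_pending(const struct toy_irq *irq)
{
	if (irq->config == TOY_EDGE)
		return irq->pending_latch;
	return irq->pending_latch || irq->line_level;
}

/* Fold one ICSR snapshot into the shadow state; return true if the SPI is done. */
bool toy_fold_spi(struct toy_irq *irq, uint64_t icsr)
{
	bool pending = icsr & TOY_ICSR_PENDING;

	irq->active = icsr & TOY_ICSR_ACTIVE;

	/* Edge: the hardware pending state is authoritative. */
	if (irq->config == TOY_EDGE)
		irq->pending_latch = pending;

	/* Level: drop a stale latch once the hardware shows nothing in flight. */
	if (irq->config == TOY_LEVEL && !(pending || irq->active))
		irq->pending_latch = false;

	/* Deactivated: nothing active or pending anywhere, so notify and unlink. */
	return !irq->active && !pending && !toy_irq_is_pending(irq);
}
```

Note that a level SPI whose input line is still high is not retired even after the guest deactivates it. It remains on the AP list until the line drops.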
When an SPI deactivation is detected, kvm_notify_acked_irq() is
called, which triggers any registered notifiers for the SPI (and is a
no-op otherwise). The SPI itself is also popped off the AP list.
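kvm_notify_acked_irq() is called with the AP list lock dropped, so the list may change under the walker. The walk therefore restarts from the head after each removal. Below is a minimal userspace sketch of that pattern. It assumes a pthread mutex in place of the raw spinlock, a singly linked list, and a caller-supplied notify() callback. All names are hypothetical, and toy_sum_notify() is only an example notifier.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_entry {
	struct toy_entry *next;
	unsigned int gsi;
	bool consumed;
};

struct toy_list {
	pthread_mutex_t lock;
	struct toy_entry *head;
};

/*
 * Unlink every consumed entry and invoke notify() for it. notify() runs
 * without the lock held (it may take other locks), so after each removal
 * the walk restarts: the list may have changed while the lock was dropped.
 */
void toy_reap(struct toy_list *l, void (*notify)(unsigned int gsi, void *ctx),
	      void *ctx)
{
retry:
	pthread_mutex_lock(&l->lock);
	for (struct toy_entry **pp = &l->head; *pp; pp = &(*pp)->next) {
		struct toy_entry *e = *pp;

		if (e->consumed) {
			*pp = e->next;			/* unlink under the lock */
			e->next = NULL;
			pthread_mutex_unlock(&l->lock);

			notify(e->gsi, ctx);		/* lock not held here */
			goto retry;
		}
	}
	pthread_mutex_unlock(&l->lock);
}

/* Example notifier: accumulate the notified GSIs into *ctx. */
void toy_sum_notify(unsigned int gsi, void *ctx)
{
	*(unsigned int *)ctx += gsi;
}
```

Restarting is quadratic in the worst case. It is simple and safe, though, and the list is expected to hold only the SPIs currently in flight.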
NOTE: there is currently no way to query whether an SPI has a
notification requirement. This could be optimised by introducing such
a query and only tracking the state of SPIs that have notifiers attached.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-v5.c | 59 ++++++++++++++++++++++++++++++++++-
arch/arm64/kvm/vgic/vgic.c | 2 +-
arch/arm64/kvm/vgic/vgic.h | 2 +-
3 files changed, 60 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 458afdbfe2938..5684b65fa9389 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -969,7 +969,7 @@ bool vgic_v5_has_pending_ppi(struct kvm_vcpu *vcpu)
* Detect any PPIs state changes, and propagate the state with KVM's
* shadow structures.
*/
-void vgic_v5_fold_ppi_state(struct kvm_vcpu *vcpu)
+static void vgic_v5_fold_ppi_state(struct kvm_vcpu *vcpu)
{
struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
unsigned long *activer, *pendr;
@@ -1041,6 +1041,63 @@ void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu)
VGIC_V5_NR_PRIVATE_IRQS);
}
+void vgic_v5_fold_irq_state(struct kvm_vcpu *vcpu)
+{
+ struct vgic_dist *vgic_dist = &vcpu->kvm->arch.vgic;
+ struct vgic_irq *irq;
+
+ /* Sync back the guest PPI state to the KVM shadow state */
+ vgic_v5_fold_ppi_state(vcpu);
+
+ /*
+ * For SPIs, which are on the global AP list, we synchronise their state
+ * with the hardware state. If they have been deactivated, immediately
+ * pop them off the list. The notifier is called without the SPI AP list
+ * lock held to avoid deadlocks.
+ */
+retry:
+ raw_spin_lock(&vgic_dist->vgic_v5_spi_ap_list_lock);
+ list_for_each_entry(irq, &vgic_dist->vgic_v5_spi_ap_list_head, ap_list) {
+ bool pending;
+ u32 intid;
+ u64 icsr;
+
+ raw_spin_lock(&irq->irq_lock);
+
+ icsr = kvm_call_hyp_ret(__vgic_v5_vdrcfg, irq->intid);
+
+ irq->active = !!FIELD_GET(ICC_ICSR_EL1_Active, icsr);
+ pending = !!FIELD_GET(ICC_ICSR_EL1_Pending, icsr);
+
+ if (irq->config == VGIC_CONFIG_EDGE)
+ irq->pending_latch = pending;
+
+ if (irq->config == VGIC_CONFIG_LEVEL && !(pending || irq->active))
+ irq->pending_latch = false;
+
+ /* Deactivated? */
+ if (!irq->active && !pending && !irq_is_pending(irq)) {
+ /* Use raw SPI index without type for the GSI */
+ intid = FIELD_GET(GICV5_HWIRQ_ID, irq->intid);
+
+ /* And we're done with this SPI */
+ list_del(&irq->ap_list);
+ irq->vcpu = NULL;
+
+ raw_spin_unlock(&irq->irq_lock);
+ raw_spin_unlock(&vgic_dist->vgic_v5_spi_ap_list_lock);
+
+ kvm_notify_acked_irq(vcpu->kvm, 0, intid);
+ vgic_put_irq(vcpu->kvm, irq);
+
+ goto retry;
+ }
+
+ raw_spin_unlock(&irq->irq_lock);
+ }
+ raw_spin_unlock(&vgic_dist->vgic_v5_spi_ap_list_lock);
+}
+
void vgic_v5_load(struct kvm_vcpu *vcpu)
{
bool irichppidis = !vcpu->kvm->arch.vgic.enabled;
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index d56e87a0d2acc..d628eea4cfa4e 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -855,7 +855,7 @@ static void vgic_prune_ap_list(struct kvm_vcpu *vcpu)
static void vgic_fold_state(struct kvm_vcpu *vcpu)
{
if (vgic_is_v5(vcpu->kvm)) {
- vgic_v5_fold_ppi_state(vcpu);
+ vgic_v5_fold_irq_state(vcpu);
return;
}
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 282278e4a6c19..7eef8ece52dde 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -372,7 +372,7 @@ int vgic_v5_map_resources(struct kvm *kvm);
void vgic_v5_set_ppi_ops(struct kvm_vcpu *vcpu, u32 vintid);
bool vgic_v5_has_pending_ppi(struct kvm_vcpu *vcpu);
void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu);
-void vgic_v5_fold_ppi_state(struct kvm_vcpu *vcpu);
+void vgic_v5_fold_irq_state(struct kvm_vcpu *vcpu);
void vgic_v5_load(struct kvm_vcpu *vcpu);
void vgic_v5_put(struct kvm_vcpu *vcpu);
void vgic_v5_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
--
2.34.1
* [PATCH v2 28/39] KVM: arm64: gic: Introduce set_pending_state() to irq_op
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (26 preceding siblings ...)
2026-05-21 14:58 ` [PATCH v2 27/39] KVM: arm64: gic-v5: Track SPI state for in-flight SPIs Sascha Bischoff
@ 2026-05-21 14:58 ` Sascha Bischoff
2026-05-21 14:59 ` [PATCH v2 29/39] KVM: arm64: gic-v5: Support SPI injection Sascha Bischoff
` (10 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:58 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
There are times, such as with GICv5 SPIs and LPIs, where the hardware
itself manages parts of the interrupt lifecycle. This means that
pending state can be directly communicated to the hardware instead of
being represented only in the VGIC shadow state.
In order to accommodate cases where the hardware handles pending state
directly, add a new set_pending_state() function pointer to
irq_ops. The intent is for this to be used after the VGIC shadow
pending state has changed, allowing the backend to mirror the updated
state into hardware.
This new function is plumbed into kvm_vgic_inject_irq(), and is only
called if irq_ops are provided and this function pointer is explicitly
set. In the general case, this has no effect.
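The hook follows the usual optional-operation pattern. The common injection path updates the shadow state first. It then calls the backend only when an ops table is attached and the pointer is non-NULL, so existing IRQs are unaffected. Below is a userspace sketch. struct toy_irq, struct toy_irq_ops and toy_inject() are hypothetical simplifications of struct vgic_irq, struct irq_ops and kvm_vgic_inject_irq(). Only the edge-style latch update is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_irq;

struct toy_irq_ops {
	/* Mirror the shadow pending state into "hardware"; false on failure. */
	bool (*set_pending_state)(struct toy_irq *irq);
};

struct toy_irq {
	bool pending_latch;          /* VGIC shadow state */
	bool hw_pending;             /* what the backend has pushed to hardware */
	const struct toy_irq_ops *ops;
};

/* Edge-style injection: update the shadow state, then let a backend mirror it. */
bool toy_inject(struct toy_irq *irq)
{
	irq->pending_latch = true;

	if (irq->ops && irq->ops->set_pending_state)
		return irq->ops->set_pending_state(irq);
	return true;                 /* no hook: shadow state only */
}

static bool toy_hw_set_pending(struct toy_irq *irq)
{
	irq->hw_pending = irq->pending_latch;
	return true;
}

const struct toy_irq_ops toy_hw_ops = {
	.set_pending_state = toy_hw_set_pending,
};
```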
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic.c | 3 +++
include/kvm/arm_vgic.h | 6 ++++++
2 files changed, 9 insertions(+)
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index d628eea4cfa4e..b35833a4e2bf9 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -565,6 +565,9 @@ int kvm_vgic_inject_irq(struct kvm *kvm, struct kvm_vcpu *vcpu,
else
irq->pending_latch = true;
+ if (irq->ops && irq->ops->set_pending_state)
+ WARN_ON_ONCE(!irq->ops->set_pending_state(vcpu, irq));
+
vgic_queue_irq_unlock(kvm, irq, flags);
vgic_put_irq(kvm, irq);
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index d4b0e7e3edf26..f9f58ca793707 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -250,6 +250,12 @@ struct irq_ops {
*/
bool (*get_input_level)(int vintid);
+ /*
+ * Function pointer to directly update hardware pending state after the
+ * VGIC shadow pending state has changed.
+ */
+ bool (*set_pending_state)(struct kvm_vcpu *vcpu, struct vgic_irq *irq);
+
/*
* Function pointer to override the queuing of an IRQ.
*/
--
2.34.1
* [PATCH v2 29/39] KVM: arm64: gic-v5: Support SPI injection
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (27 preceding siblings ...)
2026-05-21 14:58 ` [PATCH v2 28/39] KVM: arm64: gic: Introduce set_pending_state() to irq_op Sascha Bischoff
@ 2026-05-21 14:59 ` Sascha Bischoff
2026-05-26 13:41 ` Vladimir Murzin
2026-05-21 14:59 ` [PATCH v2 30/39] Documentation: KVM: Extend VGICv5 docs for KVM_VGIC_V5_ADDR_TYPE_IRS Sascha Bischoff
` (9 subsequent siblings)
38 siblings, 1 reply; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:59 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
The lifecycle of a GICv5 SPI is handled by the GICv5 hardware once its
pending state has been injected.
This change adds support for injecting and managing SPIs to the core
VGIC code and GICv5 code. First of all, allow GICv5 SPIs to be looked
up by ID via vgic_get_irq(). Previously, only PPIs were supported.
Two irq_ops are used. set_pending_state() injects the SPI's pending
state into the hardware, and hence into the guest. queue_irq_unlock()
appends the SPI to the VM's global SPI AP list. SPIs are not added to a
per-vCPU AP list, as they are global to the VM. Doing so would also
require KVM to track the affinity of individual interrupts, which would
negate much of the benefit of their lifecycle being hardware-managed.
While the SPIs are on the global AP list, their state is checked on
every vcpu exit, and once they've been consumed they are removed from
the AP list again.
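vgic_v5_spi_queue_irq_unlock() must take the AP list lock before the IRQ lock, but it is entered holding only the IRQ lock. The usual way out has four steps: check under the IRQ lock, drop it, take both locks in the correct order, and re-check. The re-check is needed because another CPU may have queued the SPI in the window. Below is a simplified userspace sketch. It uses pthread mutexes, and a queued flag stands in for irq->vcpu. All names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

struct toy_vm  { pthread_mutex_t ap_lock; unsigned int nr_queued; };
struct toy_spi { pthread_mutex_t lock; bool queued; };

/* Called with spi->lock held; always returns with it released. */
bool toy_queue_spi_unlock(struct toy_vm *vm, struct toy_spi *spi)
{
retry:
	if (spi->queued) {                       /* already on the AP list */
		pthread_mutex_unlock(&spi->lock);
		return true;
	}
	pthread_mutex_unlock(&spi->lock);

	/* Lock order: AP list lock first, then the per-IRQ lock. */
	pthread_mutex_lock(&vm->ap_lock);
	pthread_mutex_lock(&spi->lock);

	if (spi->queued) {                       /* lost the race: re-evaluate */
		pthread_mutex_unlock(&spi->lock);
		pthread_mutex_unlock(&vm->ap_lock);
		pthread_mutex_lock(&spi->lock);
		goto retry;
	}

	spi->queued = true;                      /* list_add_tail() in KVM */
	vm->nr_queued++;

	pthread_mutex_unlock(&spi->lock);
	pthread_mutex_unlock(&vm->ap_lock);
	return true;
}
```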
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-irs-v5.c | 1 +
arch/arm64/kvm/vgic/vgic-v5.c | 93 +++++++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic.c | 28 +++++++---
arch/arm64/kvm/vgic/vgic.h | 2 +
4 files changed, 116 insertions(+), 8 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-irs-v5.c b/arch/arm64/kvm/vgic/vgic-irs-v5.c
index 6739c01277866..6352d17d557e0 100644
--- a/arch/arm64/kvm/vgic/vgic-irs-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-irs-v5.c
@@ -675,6 +675,7 @@ int kvm_vgic_v5_irs_init(struct kvm *kvm, unsigned int nr_spis)
* view it is always enabled.
*/
irq->enabled = 1;
+ vgic_v5_set_spi_ops(irq);
}
nr_spi_bits = fls(roundup_pow_of_two(nr_spis)) - 1;
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 5684b65fa9389..6e2191620e8d7 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -1098,6 +1098,99 @@ void vgic_v5_fold_irq_state(struct kvm_vcpu *vcpu)
raw_spin_unlock(&vgic_dist->vgic_v5_spi_ap_list_lock);
}
+static bool vgic_v5_set_spi_pending_state(struct kvm_vcpu *vcpu,
+ struct vgic_irq *irq)
+{
+ vgic_v5_set_irq_pend(irq->target_vcpu, irq);
+ return true;
+}
+
+/*
+ * Put the SPI on the SPI AP list. No need to kick the VCPU. If it is running,
+ * the interrupt will signal at some point, and if not, then a VPE doorbell will
+ * fire (based on the IAFFID the guest has configured).
+ */
+static bool vgic_v5_spi_queue_irq_unlock(struct kvm *kvm,
+ struct vgic_irq *irq,
+ unsigned long flags)
+ __releases(&irq->irq_lock)
+{
+ struct vgic_dist *vgic_dist = &kvm->arch.vgic;
+
+ lockdep_assert_held(&irq->irq_lock);
+
+ if (WARN_ON(!__irq_is_spi(KVM_DEV_TYPE_ARM_VGIC_V5, irq->intid))) {
+ raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+ return false;
+ }
+
+retry:
+ /*
+ * We're already on the AP list or don't need to be on
+ * one; nothing more to do.
+ */
+ if (irq->vcpu) {
+ raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+ return true;
+ }
+
+ raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+ /* Another CPU may change the IRQ state here; we re-check below. */
+ raw_spin_lock_irqsave(&vgic_dist->vgic_v5_spi_ap_list_lock, flags);
+ raw_spin_lock(&irq->irq_lock);
+
+ /*
+ * We lost the race and have already been queued. Unlock the
+ * global AP list, relock the IRQ, and retry.
+ */
+ if (unlikely(irq->vcpu)) {
+ raw_spin_unlock(&irq->irq_lock);
+ raw_spin_unlock_irqrestore(&vgic_dist->vgic_v5_spi_ap_list_lock, flags);
+
+ raw_spin_lock_irqsave(&irq->irq_lock, flags);
+
+ goto retry;
+ }
+
+ list_add_tail(&irq->ap_list, &vgic_dist->vgic_v5_spi_ap_list_head);
+
+ /*
+ * Use the VCPU we've been given as the target VCPU to track
+ * that we're on an AP list. We're not queued on that VCPU's AP
+ * list, but in lieu of an AP flag, this will do.
+ */
+ irq->vcpu = irq->target_vcpu;
+
+ raw_spin_unlock(&irq->irq_lock);
+ raw_spin_unlock_irqrestore(&vgic_dist->vgic_v5_spi_ap_list_lock, flags);
+
+ return true;
+}
+
+static const struct irq_ops vgic_v5_spi_irq_ops = {
+ .set_pending_state = vgic_v5_set_spi_pending_state,
+ .queue_irq_unlock = vgic_v5_spi_queue_irq_unlock,
+};
+
+void vgic_v5_set_spi_ops(struct vgic_irq *irq)
+{
+ if (WARN_ON(!irq) || WARN_ON(irq->ops))
+ return;
+
+ irq->ops = &vgic_v5_spi_irq_ops;
+}
+
+/* Set the pending state for GICv5 SPIs and LPIs */
+void vgic_v5_set_irq_pend(struct kvm_vcpu *vcpu, struct vgic_irq *irq)
+{
+ if (WARN_ON(__irq_is_ppi(KVM_DEV_TYPE_ARM_VGIC_V5, irq->intid)))
+ return;
+
+ kvm_call_hyp(__vgic_v5_vdpend, irq->intid, irq_is_pending(irq),
+ vcpu->kvm->arch.vgic.gicv5_vm.vm_id);
+}
+
void vgic_v5_load(struct kvm_vcpu *vcpu)
{
bool irichppidis = !vcpu->kvm->arch.vgic.enabled;
diff --git a/arch/arm64/kvm/vgic/vgic.c b/arch/arm64/kvm/vgic/vgic.c
index b35833a4e2bf9..8d5bfec4d26bc 100644
--- a/arch/arm64/kvm/vgic/vgic.c
+++ b/arch/arm64/kvm/vgic/vgic.c
@@ -86,19 +86,31 @@ static struct vgic_irq *vgic_get_lpi(struct kvm *kvm, u32 intid)
*/
struct vgic_irq *vgic_get_irq(struct kvm *kvm, u32 intid)
{
- /* Non-private IRQs are not yet implemented for GICv5 */
- if (vgic_is_v5(kvm))
- return NULL;
+ enum kvm_device_type type = kvm->arch.vgic.vgic_model;
/* SPIs */
- if (intid >= VGIC_NR_PRIVATE_IRQS &&
- intid < (kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS)) {
- intid = array_index_nospec(intid, kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS);
- return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
+ if (__irq_is_spi(type, intid)) {
+ switch (type) {
+ case KVM_DEV_TYPE_ARM_VGIC_V5:
+ intid = vgic_v5_get_hwirq_id(intid);
+
+ if (intid >= kvm->arch.vgic.nr_spis)
+ return NULL;
+
+ intid = array_index_nospec(intid, kvm->arch.vgic.nr_spis);
+ return &kvm->arch.vgic.spis[intid];
+ default:
+ u32 max_intid = kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
+
+ if (intid < max_intid) {
+ intid = array_index_nospec(intid, max_intid);
+ return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
+ }
+ }
}
/* LPIs */
- if (irq_is_lpi(kvm, intid))
+ if (__irq_is_lpi(type, intid))
return vgic_get_lpi(kvm, intid);
return NULL;
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index 7eef8ece52dde..b5036170430dd 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -370,6 +370,8 @@ int kvm_vgic_v5_irs_init(struct kvm *kvm, unsigned int nr_spis);
void vgic_v5_teardown(struct kvm *kvm);
int vgic_v5_map_resources(struct kvm *kvm);
void vgic_v5_set_ppi_ops(struct kvm_vcpu *vcpu, u32 vintid);
+void vgic_v5_set_spi_ops(struct vgic_irq *irq);
+void vgic_v5_set_irq_pend(struct kvm_vcpu *vcpu, struct vgic_irq *irq);
bool vgic_v5_has_pending_ppi(struct kvm_vcpu *vcpu);
void vgic_v5_flush_ppi_state(struct kvm_vcpu *vcpu);
void vgic_v5_fold_irq_state(struct kvm_vcpu *vcpu);
--
2.34.1
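The SPI lookup in vgic_get_irq() above differs per model. For GICv5 the IntID carries a type field, and the spis[] index is the ID field. For GICv2/v3 the index is the IntID minus the 32 private IRQs. A standalone sketch of that translation, assuming the GICv5 IntID layout of a 3-bit type in bits [31:29] (SPI = 0x3) and a 24-bit ID in bits [23:0]. All macro and function names are hypothetical, and the sketch omits the array_index_nospec() hardening the kernel applies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed GICv5 IntID layout (hypothetical names). */
#define SKETCH_V5_TYPE_SHIFT 29
#define SKETCH_V5_TYPE_SPI   0x3u
#define SKETCH_V5_ID_MASK    0x00ffffffu

/* GICv2/v3: IntIDs 0-31 are SGIs and PPIs. */
#define SKETCH_NR_PRIVATE_IRQS 32u

static uint32_t sketch_v5_make_spi(uint32_t id)
{
	return (SKETCH_V5_TYPE_SPI << SKETCH_V5_TYPE_SHIFT) | (id & SKETCH_V5_ID_MASK);
}

/*
 * Translate an SPI IntID into an index into a spis[] array of nr_spis
 * entries. Returns -1 if the IntID is not an in-range SPI for the model.
 */
static int64_t sketch_spi_index(bool is_v5, uint32_t intid, uint32_t nr_spis)
{
	if (is_v5) {
		if ((intid >> SKETCH_V5_TYPE_SHIFT) != SKETCH_V5_TYPE_SPI)
			return -1;
		intid &= SKETCH_V5_ID_MASK;
		return intid < nr_spis ? (int64_t)intid : -1;
	}

	if (intid < SKETCH_NR_PRIVATE_IRQS ||
	    intid >= nr_spis + SKETCH_NR_PRIVATE_IRQS)
		return -1;
	return (int64_t)(intid - SKETCH_NR_PRIVATE_IRQS);
}
```

For GICv5 the bounds check is against nr_spis alone, because the GICv5 SPI count does not include any private interrupts.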
* Re: [PATCH v2 29/39] KVM: arm64: gic-v5: Support SPI injection
2026-05-21 14:59 ` [PATCH v2 29/39] KVM: arm64: gic-v5: Support SPI injection Sascha Bischoff
@ 2026-05-26 13:41 ` Vladimir Murzin
0 siblings, 0 replies; 42+ messages in thread
From: Vladimir Murzin @ 2026-05-26 13:41 UTC (permalink / raw)
To: Sascha Bischoff, linux-arm-kernel@lists.infradead.org,
kvmarm@lists.linux.dev, kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Hi Sascha,
On 5/21/26 15:59, Sascha Bischoff wrote:
> /* SPIs */
> - if (intid >= VGIC_NR_PRIVATE_IRQS &&
> - intid < (kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS)) {
> - intid = array_index_nospec(intid, kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS);
> - return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
> + if (__irq_is_spi(type, intid)) {
> + switch (type) {
> + case KVM_DEV_TYPE_ARM_VGIC_V5:
> + intid = vgic_v5_get_hwirq_id(intid);
> +
> + if (intid >= kvm->arch.vgic.nr_spis)
> + return NULL;
> +
> + intid = array_index_nospec(intid, kvm->arch.vgic.nr_spis);
> + return &kvm->arch.vgic.spis[intid];
> + default:
> + u32 max_intid = kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
> +
> + if (intid < max_intid) {
> + intid = array_index_nospec(intid, max_intid);
> + return &kvm->arch.vgic.spis[intid - VGIC_NR_PRIVATE_IRQS];
> + }
> + }
> }
Just a quick update to save everybody's time: that hunk causes my build to fail with:
arch/arm64/kvm/vgic/vgic.c: In function 'vgic_get_irq':
arch/arm64/kvm/vgic/vgic.c:103:4: error: a label can only be part of a statement and a declaration is not a statement
103 | u32 max_intid = kvm->arch.vgic.nr_spis + VGIC_NR_PRIVATE_IRQS;
| ^~~
make[4]: *** [scripts/Makefile.build:289: arch/arm64/kvm/vgic/vgic.o] Error 1
make[3]: *** [scripts/Makefile.build:548: arch/arm64/kvm] Error 2
make[2]: *** [scripts/Makefile.build:548: arch/arm64] Error 2
The obvious fix-up would be to wrap the default case in curly braces.
Cheers
Vladimir
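A standalone reduction of the error and the suggested fix. Before C23, a case label must be followed by a statement, so a declaration placed directly after `default:` is rejected. Wrapping the case body in braces turns it into a compound statement, which gives the declaration its own block scope. The function and parameter names below are illustrative only and are not the kernel code.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PRIVATE_IRQS 32u

/* Illustrative reduction of the vgic_get_irq() switch. */
static int64_t lookup_index(int model_is_v5, uint32_t intid, uint32_t nr_spis)
{
	switch (model_is_v5) {
	case 1:
		return intid < nr_spis ? (int64_t)intid : -1;
	default: {
		/* The braces make this declaration legal before C23. */
		uint32_t max_intid = nr_spis + NR_PRIVATE_IRQS;

		if (intid >= NR_PRIVATE_IRQS && intid < max_intid)
			return (int64_t)(intid - NR_PRIVATE_IRQS);
		return -1;
	}
	}
}
```

Hoisting `max_intid` to the top of the function would also compile. The braces keep the variable scoped to the only case that uses it.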
* [PATCH v2 30/39] Documentation: KVM: Extend VGICv5 docs for KVM_VGIC_V5_ADDR_TYPE_IRS
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (28 preceding siblings ...)
2026-05-21 14:59 ` [PATCH v2 29/39] KVM: arm64: gic-v5: Support SPI injection Sascha Bischoff
@ 2026-05-21 14:59 ` Sascha Bischoff
2026-05-21 14:59 ` [PATCH v2 31/39] KVM: arm64: gic-v5: Add GICv5 SPI injection to irqfd Sascha Bischoff
` (8 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:59 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Now that it is possible and required to set the address of the GICv5
IRS in GPA space, update the documentation accordingly. This region
must be 64 KByte aligned and covers a total of 128 KBytes.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
.../virt/kvm/devices/arm-vgic-v5.rst | 35 ++++++++++++++++---
1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v5.rst b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
index 70b9162755c7e..5c6323d82f784 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
@@ -12,12 +12,39 @@ Only one VGIC instance may be instantiated through this API. The created VGIC
will act as the VM interrupt controller, requiring emulated user-space devices
to inject interrupts to the VGIC instead of directly to CPUs.
-Creating a guest GICv5 device requires a GICv5 host. The current VGICv5 device
-only supports PPI interrupts. These can either be injected from emulated
-in-kernel devices (such as the Arch Timer, or PMU), or via the KVM_IRQ_LINE
-ioctl.
+Creating a guest GICv5 device requires a GICv5 host. The VGICv5 device supports
+PPI, SPI, and LPI interrupts. The PPI and SPI interrupts can either be injected
+from emulated in-kernel devices (such as the Arch Timer, or PMU), or via the
+KVM_IRQ_LINE ioctl. LPIs are not externally injected, but are handled in
+hardware via the LPI IST. Their pending state is driven directly by the guest.
Groups:
+ KVM_DEV_ARM_VGIC_GRP_ADDR
+ Attributes:
+
+ KVM_VGIC_V5_ADDR_TYPE_IRS (rw, 64-bit)
+ Base address in the guest physical address space of the GICv5 IRS
+ (Interrupt Routing Service) register mappings. Only valid for
+ KVM_DEV_TYPE_ARM_VGIC_V5. This address must be 64 KByte aligned, and the
+ region covers 128 KBytes: the IRS has a CONFIG_FRAME and a SETLPI_FRAME,
+ each of which is 64 KBytes in size.
+
+ Setting the address of the IRS in GPA space is mandatory before VGIC
+ resources are mapped, as the IRS is responsible for handling SPIs and
+ LPIs. Failure to set the IRS address before the first vCPU run results in
+ an error.
+
+ KVM_DEV_ARM_VGIC_GRP_NR_IRQS
+ Attributes:
+
+ A value describing the number of SPIs for this GIC instance. This is
+ GICv5-specific: unlike GICv2/v3, the value does not include SGIs or PPIs.
+ The value ranges from 32 to the maximum value reported by
+ GICV5_IRS_IDR5.SPI_RANGE, in increments of 32. If userspace does not set
+ this attribute, KVM uses 32 SPIs by default.
+
+ kvm_device_attr.addr points to a __u32 value.
+
KVM_DEV_ARM_VGIC_GRP_CTRL
Attributes:
--
2.34.1
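Userspace may want to pre-check values before passing them to KVM_SET_DEVICE_ATTR. A sketch of the constraints documented above, assuming the caller already knows the host maximum derived from GICV5_IRS_IDR5.SPI_RANGE (passed in as `max_spis`), and treating a zero request as "attribute never set". The helper names are hypothetical, and KVM remains the authority on which values it accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_SZ_64K       0x10000ull
#define SKETCH_DEFAULT_SPIS 32u

/*
 * KVM_VGIC_V5_ADDR_TYPE_IRS: the base must be 64 KByte aligned. The
 * region then spans 128 KBytes (CONFIG_FRAME + SETLPI_FRAME).
 */
static bool sketch_irs_base_ok(uint64_t base)
{
	return (base & (SKETCH_SZ_64K - 1)) == 0;
}

/* KVM_DEV_ARM_VGIC_GRP_NR_IRQS counts SPIs only: 32..max_spis, step 32. */
static bool sketch_nr_spis_ok(uint32_t nr_spis, uint32_t max_spis)
{
	return nr_spis >= 32 && nr_spis <= max_spis && nr_spis % 32 == 0;
}

/* Documented default if userspace never sets NR_IRQS (0 = unset here). */
static uint32_t sketch_effective_nr_spis(uint32_t requested)
{
	return requested ? requested : SKETCH_DEFAULT_SPIS;
}
```

Unlike GICv2/v3, no 32 needs to be added for SGIs/PPIs when computing the NR_IRQS value for a GICv5 guest.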
* [PATCH v2 31/39] KVM: arm64: gic-v5: Add GICv5 SPI injection to irqfd
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (29 preceding siblings ...)
2026-05-21 14:59 ` [PATCH v2 30/39] Documentation: KVM: Extend VGICv5 docs for KVM_VGIC_V5_ADDR_TYPE_IRS Sascha Bischoff
@ 2026-05-21 14:59 ` Sascha Bischoff
2026-05-21 15:00 ` [PATCH v2 32/39] KVM: arm64: gic-v5: Mask per-vcpu PPI state in vgic_v5_finalize_ppi_state() Sascha Bischoff
` (7 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 14:59 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Now that there is support for GICv5 SPIs in KVM, update
vgic_irqfd_set_irq() to translate irqchip pins into GICv5 SPI IntIDs
before injecting them.
Also adjust IRQCHIP route validation for GICv5: use the configured SPI
count, fall back to the default SPI count before VGIC init, and cap
the accepted pin range to the generic irq routing table size.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/vgic/vgic-irqfd.c | 20 +++++++++++++++++---
1 file changed, 17 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kvm/vgic/vgic-irqfd.c b/arch/arm64/kvm/vgic/vgic-irqfd.c
index b9b86e3a6c862..3644516811214 100644
--- a/arch/arm64/kvm/vgic/vgic-irqfd.c
+++ b/arch/arm64/kvm/vgic/vgic-irqfd.c
@@ -19,7 +19,12 @@ static int vgic_irqfd_set_irq(struct kvm_kernel_irq_routing_entry *e,
struct kvm *kvm, int irq_source_id,
int level, bool line_status)
{
- unsigned int spi_id = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
+ unsigned int spi_id;
+
+ if (kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V5)
+ spi_id = vgic_v5_make_spi(e->irqchip.pin);
+ else
+ spi_id = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
if (!vgic_valid_spi(kvm, spi_id))
return -EINVAL;
@@ -39,15 +44,24 @@ int kvm_set_routing_entry(struct kvm *kvm,
struct kvm_kernel_irq_routing_entry *e,
const struct kvm_irq_routing_entry *ue)
{
+ unsigned int nr_pins = KVM_IRQCHIP_NUM_PINS;
int r = -EINVAL;
+ if (vgic_is_v5(kvm)) {
+ nr_pins = kvm->arch.vgic.nr_spis;
+ if (!nr_pins)
+ nr_pins = VGIC_V5_DEFAULT_NR_SPIS;
+
+ nr_pins = min(nr_pins, KVM_IRQCHIP_NUM_PINS);
+ }
+
switch (ue->type) {
case KVM_IRQ_ROUTING_IRQCHIP:
e->set = vgic_irqfd_set_irq;
e->irqchip.irqchip = ue->u.irqchip.irqchip;
e->irqchip.pin = ue->u.irqchip.pin;
- if ((e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS) ||
- (e->irqchip.irqchip >= KVM_NR_IRQCHIPS))
+ if (e->irqchip.pin >= nr_pins ||
+ e->irqchip.irqchip >= KVM_NR_IRQCHIPS)
goto out;
break;
case KVM_IRQ_ROUTING_MSI:
--
2.34.1
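The route validation and pin translation above reduce to two small computations: an upper bound on accepted pins, and a pin-to-IntID mapping. A sketch under stated assumptions: KVM_IRQCHIP_NUM_PINS is taken to be 988 (1020 - 32, as in the arm64 headers; treat this value as an assumption), the GICv5 default is 32 SPIs, and GICv5 SPIs use the assumed encoding of type 0x3 in bits [31:29]. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_IRQCHIP_NUM_PINS 988u  /* assumed: 1020 - 32 */
#define SKETCH_V5_DEFAULT_SPIS  32u
#define SKETCH_NR_PRIVATE_IRQS  32u

static uint32_t sketch_nr_pins(bool is_v5, uint32_t nr_spis)
{
	uint32_t nr_pins = SKETCH_IRQCHIP_NUM_PINS;

	if (is_v5) {
		/* Before VGIC init nr_spis is still 0: fall back to the default. */
		nr_pins = nr_spis ? nr_spis : SKETCH_V5_DEFAULT_SPIS;
		/* Never exceed the generic routing table size. */
		if (nr_pins > SKETCH_IRQCHIP_NUM_PINS)
			nr_pins = SKETCH_IRQCHIP_NUM_PINS;
	}
	return nr_pins;
}

static uint32_t sketch_pin_to_intid(bool is_v5, uint32_t pin)
{
	if (is_v5)
		return (0x3u << 29) | pin;    /* assumed GICv5 SPI encoding */
	return pin + SKETCH_NR_PRIVATE_IRQS;  /* GICv2/v3: skip SGIs/PPIs */
}
```

A route is then rejected when `pin >= sketch_nr_pins(...)`, which matches the `e->irqchip.pin >= nr_pins` check in kvm_set_routing_entry().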
* [PATCH v2 32/39] KVM: arm64: gic-v5: Mask per-vcpu PPI state in vgic_v5_finalize_ppi_state()
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (30 preceding siblings ...)
2026-05-21 14:59 ` [PATCH v2 31/39] KVM: arm64: gic-v5: Add GICv5 SPI injection to irqfd Sascha Bischoff
@ 2026-05-21 15:00 ` Sascha Bischoff
2026-05-21 15:00 ` [PATCH v2 33/39] KVM: arm64: gic-v5: Add GICv5 EL1 sysreg userspace accessors Sascha Bischoff
` (6 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:00 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Only a subset of the possible PPIs are exposed to a guest when running
with a vGICv5. First of all, only the architected PPIs are considered
by KVM. Secondly, only a subset of those is exposed to a guest: those
corresponding to devices that KVM emulates (timers, PMU) and the GICv5
SW_PPI.
The finalisation of exposed PPIs happens on first vCPU run, as this is
the first point at which the full set of exposed devices is known. At this
stage a mask is calculated, and this mask is applied to both hide
non-exposed PPI state from the guest and to reduce overhead when
iterating over the PPIs.
As part of introducing support for userspace accesses to the GICv5
system registers, it has become apparent that userspace writes to the
GICv5 PPI registers can result in a mismatch between the state exposed
to the guest and what KVM expects to be exposed. Effectively,
userspace can set the Enable, Active, and Pending state of PPIs that KVM
has chosen to hide from a guest.
Under the assumption that on a VM restore userspace will set the PPI
state prior to running the vCPU(s) for the first time, rework
vgic_v5_finalize_ppi_state() to not only calculate the mask of exposed
PPIs, but also to clear any state for the non-exposed PPIs. This
ensures that only the state that KVM intends to expose to the guest is
exposed.
Note: If userspace chooses to set the state of PPI registers after
running a vCPU for the first time, then no masking takes place and
that state is directly exposed to a guest.
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/arm.c | 2 +-
arch/arm64/kvm/vgic/vgic-v5.c | 71 +++++++++++++++++++++++++----------
include/kvm/arm_vgic.h | 2 +-
3 files changed, 53 insertions(+), 22 deletions(-)
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 34c9950884d5e..2a9cda1972b69 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -958,7 +958,7 @@ int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
return ret;
}
- ret = vgic_v5_finalize_ppi_state(kvm);
+ ret = vgic_v5_finalize_ppi_state(vcpu);
if (ret)
return ret;
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 6e2191620e8d7..05fd10030da84 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -761,9 +761,10 @@ int vgic_v5_map_resources(struct kvm *kvm)
return 0;
}
-int vgic_v5_finalize_ppi_state(struct kvm *kvm)
+int vgic_v5_finalize_ppi_state(struct kvm_vcpu *vcpu)
{
- struct kvm_vcpu *vcpu0;
+ struct kvm *kvm = vcpu->kvm;
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
int i;
if (!vgic_is_v5(kvm))
@@ -772,35 +773,65 @@ int vgic_v5_finalize_ppi_state(struct kvm *kvm)
guard(mutex)(&kvm->arch.config_lock);
/*
- * If SW_PPI has been advertised, then we know we already
- * initialised the whole thing, and we can return early. Yes,
- * this is pretty hackish as far as state tracking goes...
+ * Discover the set of PPIs that are exposed to the guest once per VM.
+ * Once known, apply that mask to each VCPU's restored PPI state as the
+ * VCPUs are first run.
*/
- if (test_bit(GICV5_ARCH_PPI_SW_PPI, kvm->arch.vgic.gicv5_vm.vgic_ppi_mask))
- return 0;
-
- /* The PPI state for all VCPUs should be the same. Pick the first. */
- vcpu0 = kvm_get_vcpu(kvm, 0);
+ if (!test_bit(GICV5_ARCH_PPI_SW_PPI, kvm->arch.vgic.gicv5_vm.vgic_ppi_mask)) {
+ bitmap_zero(kvm->arch.vgic.gicv5_vm.vgic_ppi_mask,
+ VGIC_V5_NR_PRIVATE_IRQS);
+ bitmap_zero(kvm->arch.vgic.gicv5_vm.vgic_ppi_hmr,
+ VGIC_V5_NR_PRIVATE_IRQS);
+
+ for_each_set_bit(i, ppi_caps.impl_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS) {
+ const u32 intid = vgic_v5_make_ppi(i);
+ struct vgic_irq *irq;
+
+ irq = vgic_get_vcpu_irq(vcpu, intid);
+
+ /* Expose PPIs with an owner or the SW_PPI, only */
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
+ if (irq->owner || i == GICV5_ARCH_PPI_SW_PPI) {
+ __set_bit(i, kvm->arch.vgic.gicv5_vm.vgic_ppi_mask);
+ __assign_bit(i, kvm->arch.vgic.gicv5_vm.vgic_ppi_hmr,
+ irq->config == VGIC_CONFIG_LEVEL);
+ }
+ }
- bitmap_zero(kvm->arch.vgic.gicv5_vm.vgic_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
- bitmap_zero(kvm->arch.vgic.gicv5_vm.vgic_ppi_hmr, VGIC_V5_NR_PRIVATE_IRQS);
+ vgic_put_irq(kvm, irq);
+ }
+ }
- for_each_set_bit(i, ppi_caps.impl_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS) {
+ /*
+ * Apply the mask to the Enable and Active state. Skip Pending, as that is
+ * calculated on guest entry.
+ */
+ bitmap_and(cpu_if->vgic_ppi_enabler, cpu_if->vgic_ppi_enabler,
+ kvm->arch.vgic.gicv5_vm.vgic_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
+ bitmap_and(cpu_if->vgic_ppi_activer, cpu_if->vgic_ppi_activer,
+ kvm->arch.vgic.gicv5_vm.vgic_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS);
+
+ /* Also update the vgic_irqs */
+ for (i = 0; i < VGIC_V5_NR_PRIVATE_IRQS; i++) {
+ bool visible = test_bit(i, kvm->arch.vgic.gicv5_vm.vgic_ppi_mask);
const u32 intid = vgic_v5_make_ppi(i);
struct vgic_irq *irq;
- irq = vgic_get_vcpu_irq(vcpu0, intid);
+ irq = vgic_get_vcpu_irq(vcpu, intid);
- /* Expose PPIs with an owner or the SW_PPI, only */
scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
- if (irq->owner || i == GICV5_ARCH_PPI_SW_PPI) {
- __set_bit(i, kvm->arch.vgic.gicv5_vm.vgic_ppi_mask);
- __assign_bit(i, kvm->arch.vgic.gicv5_vm.vgic_ppi_hmr,
- irq->config == VGIC_CONFIG_LEVEL);
+ if (!visible) {
+ irq->enabled = false;
+ irq->active = false;
+ irq->pending_latch = false;
+ irq->line_level = false;
+ } else {
+ irq->enabled = test_bit(i, cpu_if->vgic_ppi_enabler);
+ irq->active = test_bit(i, cpu_if->vgic_ppi_activer);
}
}
- vgic_put_irq(vcpu0->kvm, irq);
+ vgic_put_irq(kvm, irq);
}
return 0;
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index f9f58ca793707..eb68c96a46ed2 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -783,7 +783,7 @@ int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
int vgic_v4_put(struct kvm_vcpu *vcpu);
-int vgic_v5_finalize_ppi_state(struct kvm *kvm);
+int vgic_v5_finalize_ppi_state(struct kvm_vcpu *vcpu);
bool vgic_v5_ppi_queue_irq_unlock(struct kvm *kvm, struct vgic_irq *irq,
unsigned long flags);
void vgic_v5_set_ppi_dvi(struct kvm_vcpu *vcpu, struct vgic_irq *irq, bool dvi);
--
2.34.1
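The finalisation step can be modelled with plain 64-bit masks for the 64 architected PPIs. A sketch, assuming bit i of each word describes PPI i, that `owned` marks PPIs with an in-kernel owner (timers, PMU), and that the SW_PPI index is passed in rather than hard-coded (its real value comes from GICV5_ARCH_PPI_SW_PPI). The struct and function names are hypothetical, and the kernel keeps this state in bitmaps plus per-PPI struct vgic_irq rather than in a single struct.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-vCPU PPI state for the 64 architected PPIs. */
struct sketch_ppi_state {
	uint64_t enabler;
	uint64_t activer;
	uint64_t pending;
};

/* Exposed set: PPIs with an owner plus the SW_PPI (sw_ppi must be < 64). */
static uint64_t sketch_ppi_mask(uint64_t owned, unsigned int sw_ppi)
{
	return owned | (UINT64_C(1) << sw_ppi);
}

/* Hide any restored state for PPIs that are not exposed. */
static void sketch_finalize(struct sketch_ppi_state *s, uint64_t mask)
{
	s->enabler &= mask;
	s->activer &= mask;
	/* Mirrors clearing pending_latch/line_level for hidden vgic_irqs. */
	s->pending &= mask;
}
```

The mask is computed once per VM, while the AND is applied per vCPU on its first run. This is why the function now takes a vcpu rather than the kvm pointer.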
* [PATCH v2 33/39] KVM: arm64: gic-v5: Add GICv5 EL1 sysreg userspace accessors
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (31 preceding siblings ...)
2026-05-21 15:00 ` [PATCH v2 32/39] KVM: arm64: gic-v5: Mask per-vcpu PPI state in vgic_v5_finalize_ppi_state() Sascha Bischoff
@ 2026-05-21 15:00 ` Sascha Bischoff
2026-05-21 15:00 ` [PATCH v2 34/39] KVM: arm64: gic-v5: Handle userspace accesses to IRS MMIO region Sascha Bischoff
` (5 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:00 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Now that KVM is at the point where it is able to run meaningful VMs
with GICv5, it is important to be able to save/restore the GICv5 state
in order to allow for VM migration.
Add set/get handlers for the GICv5 EL1 system registers to facilitate
their save/restore. These handlers access the hypervisor's stored copy
of the guest state, rather than the guest registers themselves. Much of
the state that is read out is generated on the fly, as it is spread
across a range of registers. When writing the system registers, the
state is merged back into the appropriate places.
The save/restore accessors follow the existing GICv3 CPU sysreg UAPI
encoding, so the GICv5 device can reuse that interface once the device
attribute plumbing is enabled.
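As an example of state that is split across registers, the PPI priority accessors in this patch map each ICC_PPI_PRIORITYRn_EL1 encoding to an index (`((CRm & 1) << 3) + Op2`), keep 5 priority bits in each of the 8 bytes (REPEAT_BYTE(0x1f)), and assign byte i to PPI `8 * reg + i`. A standalone sketch of that arithmetic; the function names are hypothetical, and the CRm/Op2 values in any test are arbitrary inputs, not the real register encodings.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the low 5 bits of every byte (REPEAT_BYTE(0x1f) in the kernel). */
#define SKETCH_PRIORITY_MASK UINT64_C(0x1f1f1f1f1f1f1f1f)

static unsigned int sketch_priorityr_index(unsigned int crm, unsigned int op2)
{
	return ((crm & 0x1) << 3) + op2;
}

/*
 * Unpack one PRIORITYR value into 8 per-PPI priorities.
 * Returns the first PPI number covered by register 'reg'.
 */
static unsigned int sketch_unpack_priorityr(unsigned int reg, uint64_t val,
					    uint8_t prio[8])
{
	val &= SKETCH_PRIORITY_MASK;
	for (unsigned int i = 0; i < 8; i++)
		prio[i] = (uint8_t)(val >> (8 * i));
	return reg * 8;
}
```

With only the 64 architected PPIs supported, indices 8-15 cover non-existent PPIs, which is why the accessors ignore writes to (and return 0 for) `reg > 7`.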
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/kvm/Makefile | 3 +-
arch/arm64/kvm/sys_regs.c | 6 +-
arch/arm64/kvm/vgic-sys-reg-v5.c | 519 ++++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic-kvm-device.c | 106 +++++-
arch/arm64/kvm/vgic/vgic.h | 7 +
5 files changed, 633 insertions(+), 8 deletions(-)
create mode 100644 arch/arm64/kvm/vgic-sys-reg-v5.c
diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index 92dda57c08766..7aaeeb84e788e 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -24,7 +24,8 @@ kvm-y += arm.o mmu.o mmio.o psci.o hypercalls.o pvtime.o \
vgic/vgic-mmio.o vgic/vgic-mmio-v2.o \
vgic/vgic-mmio-v3.o vgic/vgic-kvm-device.o \
vgic/vgic-its.o vgic/vgic-debug.o vgic/vgic-v3-nested.o \
- vgic/vgic-v5.o vgic/vgic-v5-tables.o vgic/vgic-irs-v5.o
+ vgic/vgic-v5.o vgic/vgic-v5-tables.o vgic/vgic-irs-v5.o \
+ vgic-sys-reg-v5.o
kvm-$(CONFIG_HW_PERF_EVENTS) += pmu-emul.o pmu.o
kvm-$(CONFIG_ARM64_PTR_AUTH) += pauth.o
diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 6083a1b23dbf9..af0d8357003be 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -5831,7 +5831,7 @@ int kvm_finalize_sys_regs(struct kvm_vcpu *vcpu)
int __init kvm_sys_reg_table_init(void)
{
- const struct sys_reg_desc *gicv3_regs;
+ const struct sys_reg_desc *gicv3_regs, *gicv5_regs;
bool valid = true;
unsigned int i, sz;
int ret = 0;
@@ -5844,8 +5844,12 @@ int __init kvm_sys_reg_table_init(void)
valid &= check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs), false);
valid &= check_sysreg_table(sys_insn_descs, ARRAY_SIZE(sys_insn_descs), false);
+ /* The GICv3 system registers... */
gicv3_regs = vgic_v3_get_sysreg_table(&sz);
valid &= check_sysreg_table(gicv3_regs, sz, false);
+ /* ...and the GICv5 system registers. */
+ gicv5_regs = vgic_v5_get_sysreg_table(&sz);
+ valid &= check_sysreg_table(gicv5_regs, sz, false);
if (!valid)
return -EINVAL;
diff --git a/arch/arm64/kvm/vgic-sys-reg-v5.c b/arch/arm64/kvm/vgic-sys-reg-v5.c
new file mode 100644
index 0000000000000..bbdc4f222c029
--- /dev/null
+++ b/arch/arm64/kvm/vgic-sys-reg-v5.c
@@ -0,0 +1,519 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2025, 2026 Arm Ltd.
+ */
+
+/*
+ * VGICv5 system registers handling functions for AArch64 mode
+ */
+
+#include <linux/irqchip/arm-gic-v5.h>
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/wordpart.h>
+
+#include <asm/kvm_emulate.h>
+
+#include "vgic/vgic.h"
+#include "sys_regs.h"
+
+#define ICC_PPI_PRIORITYR_PRIORITY_MASK REPEAT_BYTE(0x1f)
+
+static int set_gic_apr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ /* The upper 32 bits are RES0 */
+ cpu_if->vgic_apr = val & ~ICC_APR_EL1_RES0;
+
+ return 0;
+}
+
+static int get_gic_apr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 *val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ *val = cpu_if->vgic_apr;
+
+ return 0;
+}
+
+static int set_gic_cr0(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ /*
+ * We only support setting the ICC_CR0_EL1.En bit, which is actually
+ * stored in the VMCR.
+ */
+ FIELD_MODIFY(FEAT_GCIE_ICH_VMCR_EL2_EN, &cpu_if->vgic_vmcr,
+ FIELD_GET(ICC_CR0_EL1_EN, val));
+
+ return 0;
+}
+
+static int get_gic_cr0(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 *val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ /*
+ * PID only applies if EL3 is present. Same applies to IPPT. Hence,
+ * those fields are always presented as 0.
+ *
+ * We always present the link as connected and idle:
+ * (LINK = 1, LINK_IDLE = 1).
+ */
+ *val = FIELD_PREP(ICC_CR0_EL1_EN,
+ FIELD_GET(FEAT_GCIE_ICH_VMCR_EL2_EN, cpu_if->vgic_vmcr));
+ *val |= ICC_CR0_EL1_LINK_MASK;
+ *val |= ICC_CR0_EL1_LINK_IDLE_MASK;
+
+ return 0;
+}
+
+static int set_gic_pcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ /* Set the VPMR field in the VMCR */
+ FIELD_MODIFY(FEAT_GCIE_ICH_VMCR_EL2_VPMR, &cpu_if->vgic_vmcr,
+ FIELD_GET(ICC_PCR_EL1_PRIORITY, val));
+
+ return 0;
+}
+
+static int get_gic_pcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 *val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ *val = FIELD_PREP(ICC_PCR_EL1_PRIORITY,
+ FIELD_GET(FEAT_GCIE_ICH_VMCR_EL2_VPMR, cpu_if->vgic_vmcr));
+
+ return 0;
+}
+
+static int set_gic_icsr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ cpu_if->vgic_icsr = val & ~ICC_ICSR_EL1_RES0;
+
+ return 0;
+}
+
+static int get_gic_icsr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r,
+ u64 *val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+
+ *val = cpu_if->vgic_icsr;
+
+ return 0;
+}
+
+/*
+ * Helper macro to iterate over a range of PPIs and execute some code (to either
+ * extract or set the vgic_irq state). This is used when `get`-ing the PPI
+ * ENABLER, ACTIVER, PENDR and when setting the PRIORITYR state.
+ *
+ * vcpu: Pointer to struct kvm_vcpu (to which these PPIs belong)
+ * r: The register index. 0 or 1 for all except PRIORITYR (which is 0-15)
+ * nr: The number of PPIs iterated over. 64 for all but PRIORITYR (which is 8)
+ * code: The code snippet to execute for each vgic_irq
+ */
+#define for_ppi_state(vcpu, r, nr, code) \
+ do { \
+ struct kvm_vcpu *__vcpu = (vcpu); \
+ int __r = (r); \
+ int __nr = (nr); \
+ \
+ for (int i = 0; i < __nr; i++) { \
+ u32 id = vgic_v5_make_ppi(__r * __nr + i); \
+ struct vgic_irq *irq; \
+ \
+ irq = vgic_get_vcpu_irq(__vcpu, id); \
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) { \
+ code; \
+ } \
+ vgic_put_irq(__vcpu->kvm, irq); \
+ } \
+ } while (0)
+
+static int set_gic_ppi_enabler(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+ int i, start, end, reg = r->Op2 % 2;
+
+ /*
+ * If we're only handling architected PPIs and userspace writes to the
+ * enables for the non-architected PPIs, we just return as there's
+ * nothing to do at all. We don't even allocate the storage for them in
+ * this case.
+ */
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg == 1)
+ return 0;
+
+ /*
+ * Merge the raw userspace write into our bitmap at an offset of either 0 or
+ * 64.
+ *
+ * Note that there is *NO* masking applied - the enable state is written
+ * unfiltered. The assumption is that userspace uses this interface to
+ * set initial state before the guest runs, and then the exposed PPI
+ * mask is applied later, when vgic_v5_finalize_ppi_state() runs on
+ * first entry to each vCPU. If userspace chooses to set the enabler
+ * state later, it is fully capable of breaking the illusion we provided
+ * to the guest by exposing register state (and PPIs) to the guest that
+ * were not initially exposed. Good luck!
+ */
+ bitmap_write(cpu_if->vgic_ppi_enabler, val, 64 * reg, 64);
+
+ /*
+ * Sync the change in enable states to the vgic_irqs for the written
+ * register slice.
+ */
+ start = VGIC_V5_NR_PRIVATE_IRQS * reg;
+ end = start + VGIC_V5_NR_PRIVATE_IRQS;
+ for (i = start; i < end; i++) {
+ u32 intid = vgic_v5_make_ppi(i);
+ struct vgic_irq *irq;
+
+ irq = vgic_get_vcpu_irq(vcpu, intid);
+
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock)
+ irq->enabled = test_bit(i, cpu_if->vgic_ppi_enabler);
+
+ vgic_put_irq(vcpu->kvm, irq);
+ }
+
+ return 0;
+}
+
+static int get_gic_ppi_enabler(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 *val)
+{
+ unsigned long enabler = 0;
+ int reg = r->Op2 % 2;
+
+ /* If we only support architected PPIs, return 0 */
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg == 1) {
+ *val = 0;
+ return 0;
+ }
+
+ /* Iterate over each struct vgic_irq to build the ENABLER value. */
+ for_ppi_state(vcpu, reg, 64, __assign_bit(i % 64, &enabler, irq->enabled));
+
+ *val = enabler;
+
+ return 0;
+}
+
+static int set_gic_ppi_activer(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+ int i, start, end, reg = r->Op2 % 2;
+
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg == 1)
+ return 0;
+
+ /*
+ * Store the raw userspace write. The exposed PPI mask is applied later,
+ * when vgic_v5_finalize_ppi_state() runs on first entry to each
+ * vCPU. See comment on set_gic_ppi_enabler() for details.
+ */
+ bitmap_write(cpu_if->vgic_ppi_activer, val, 64 * reg, 64);
+
+ start = VGIC_V5_NR_PRIVATE_IRQS * reg;
+ end = start + VGIC_V5_NR_PRIVATE_IRQS;
+ for (i = start; i < end; i++) {
+ u32 intid = vgic_v5_make_ppi(i);
+ struct vgic_irq *irq;
+
+ irq = vgic_get_vcpu_irq(vcpu, intid);
+
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock)
+ irq->active = test_bit(i, cpu_if->vgic_ppi_activer);
+
+ vgic_put_irq(vcpu->kvm, irq);
+ }
+
+ return 0;
+}
+
+static int get_gic_ppi_activer(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 *val)
+{
+ unsigned long activer = 0;
+ int reg = r->Op2 % 2;
+
+ /* If we only support architected PPIs, return 0 */
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg == 1) {
+ *val = 0;
+ return 0;
+ }
+
+ /* Iterate over each struct vgic_irq to build the ACTIVER value. */
+ for_ppi_state(vcpu, reg, 64, __assign_bit(i % 64, &activer, irq->active));
+
+ *val = activer;
+
+ return 0;
+}
+
+static int set_gic_ppi_pendr(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 val)
+{
+ int i, start, end, reg = r->Op2 % 2;
+
+ /* If we only support architected PPIs, return */
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg == 1)
+ return 0;
+
+ /*
+ * Update each struct vgic_irq with the pending state, treating Level
+ * and Edge interrupts differently. The exposed PPI mask is applied
+ * later, when vgic_v5_finalize_ppi_state() runs on first entry to each
+ * vCPU. See comment on set_gic_ppi_enabler() for details.
+ */
+ start = VGIC_V5_NR_PRIVATE_IRQS * reg;
+ end = start + VGIC_V5_NR_PRIVATE_IRQS;
+ for (i = start; i < end; i++) {
+ u32 intid = vgic_v5_make_ppi(i);
+ struct vgic_irq *irq;
+
+ irq = vgic_get_vcpu_irq(vcpu, intid);
+
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
+ bool level = !!(val & BIT_ULL(i));
+
+ if (irq->config == VGIC_CONFIG_LEVEL)
+ irq->line_level = level;
+ else
+ irq->pending_latch = level;
+ }
+
+ vgic_put_irq(vcpu->kvm, irq);
+ }
+
+ /*
+ * The pending state is generated from the vgic_irqs on each guest
+ * entry. Therefore, we don't store the raw value written anywhere in
+ * the case of userspace PPI_PENDRx_EL1 writes.
+ */
+
+ return 0;
+}
+
+static int get_gic_ppi_pendr(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 *val)
+{
+ unsigned long pendr = 0;
+ int reg = r->Op2 % 2;
+
+ /* If we only support architected PPIs, return 0 */
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg == 1) {
+ *val = 0;
+ return 0;
+ }
+
+ /* Iterate over each struct vgic_irq to build the PENDR value. */
+ for_ppi_state(vcpu, reg, 64, {
+ if (irq_is_pending(irq))
+ __assign_bit(i % 64, &pendr, 1);
+ });
+
+ *val = pendr;
+
+ return 0;
+}
+
+static int set_gic_ppi_priorityr(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+ int reg = ((r->CRm & 0x1) << 3) + r->Op2;
+
+ /* If we only support architected PPIs, return */
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg > 7)
+ return 0;
+
+ val &= ICC_PPI_PRIORITYR_PRIORITY_MASK;
+
+ /*
+ * Although priorities are not regularly synced back to the vgic_irq
+ * state, they are explicitly synced back here. This is to ensure that
+ * any pending PPIs are evaluated correctly when first running the guest
+ * after setting the state.
+ */
+ for_ppi_state(vcpu, reg, 8,
+ irq->priority = (u8)(val >> (8 * (i % 8)));
+ );
+
+ /*
+ * Update the state that will be written to the ICH_PPI_PRIORITYRx_EL2
+ * on next guest entry.
+ */
+ cpu_if->vgic_ppi_priorityr[reg] = val;
+
+ return 0;
+}
+
+static int get_gic_ppi_priorityr(struct kvm_vcpu *vcpu,
+ const struct sys_reg_desc *r, u64 *val)
+{
+ struct vgic_v5_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v5;
+ int reg = ((r->CRm & 0x1) << 3) + r->Op2;
+
+ /* If we only support architected PPIs, return 0 */
+ if (VGIC_V5_NR_PRIVATE_IRQS == 64 && reg > 7) {
+ *val = 0;
+ return 0;
+ }
+
+ /*
+ * The priorities are only synced back to the vgic_irq state when the
+ * vcpu is entering WFI (KVM only needs to know the priorities when
+ * evaluating if there are pending PPI interrupts for a vcpu). The raw
+ * ICH_PPI_PRIORITYRx_EL2 register state is simply saved and restored
+ * blindly. This state is just returned as it contains the most recent
+ * priorities written by the guest.
+ */
+ *val = cpu_if->vgic_ppi_priorityr[reg];
+
+ return 0;
+}
+
+/*
+ * The following registers are NOT supported:
+ *
+ * - ICC_HAPR_EL1
+ * The value of this is directly generated by the GICv5 hardware based on
+ * the ICC_APR_EL1 when the guest is running.
+ * - ICC_IAFFIDR_EL1
+ * The IAFFID for a GICv5 VPE is the same as the VPE ID, which is the index
+ * into the in-memory VPE Table. This is not configurable, and instead we
+ * rely on userspace recreating the VPEs in the same order prior to
+ * restoring guest state.
+ * - ICC_PPI_CACTIVER<n>_EL1
+ * Only raw state writes are supported via the S(et) variant.
+ * - ICC_PPI_CPENDR<n>_EL1
+ * Only raw state writes are supported via the S(et) variant.
+ */
+static const struct sys_reg_desc gic_v5_icc_reg_descs[] = {
+ { SYS_DESC(SYS_ICC_ICSR_EL1),
+ .set_user = set_gic_icsr, .get_user = get_gic_icsr, },
+ { SYS_DESC(SYS_ICC_PPI_ENABLER0_EL1),
+ .set_user = set_gic_ppi_enabler, .get_user = get_gic_ppi_enabler, },
+ { SYS_DESC(SYS_ICC_PPI_ENABLER1_EL1),
+ .set_user = set_gic_ppi_enabler, .get_user = get_gic_ppi_enabler, },
+ /*
+ * Only ICC_PPI_SACTIVER<n>_EL1 is exposed to userspace. Writes to it
+ * are treated as *RAW* writes of the register state.
+ */
+ { SYS_DESC(SYS_ICC_PPI_SACTIVER0_EL1),
+ .set_user = set_gic_ppi_activer, .get_user = get_gic_ppi_activer, },
+ { SYS_DESC(SYS_ICC_PPI_SACTIVER1_EL1),
+ .set_user = set_gic_ppi_activer, .get_user = get_gic_ppi_activer, },
+ /*
+ * Only ICC_PPI_SPENDR<n>_EL1 is exposed to userspace. Writes to it
+ * are treated as *RAW* writes of the register state.
+ */
+ { SYS_DESC(SYS_ICC_PPI_SPENDR0_EL1),
+ .set_user = set_gic_ppi_pendr, .get_user = get_gic_ppi_pendr, },
+ { SYS_DESC(SYS_ICC_PPI_SPENDR1_EL1),
+ .set_user = set_gic_ppi_pendr, .get_user = get_gic_ppi_pendr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR0_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR1_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR2_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR3_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR4_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR5_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR6_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR7_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR8_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR9_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR10_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR11_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR12_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR13_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR14_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_PPI_PRIORITYR15_EL1),
+ .set_user = set_gic_ppi_priorityr, .get_user = get_gic_ppi_priorityr, },
+ { SYS_DESC(SYS_ICC_APR_EL1),
+ .set_user = set_gic_apr, .get_user = get_gic_apr, },
+ { SYS_DESC(SYS_ICC_CR0_EL1),
+ .set_user = set_gic_cr0, .get_user = get_gic_cr0, },
+ { SYS_DESC(SYS_ICC_PCR_EL1),
+ .set_user = set_gic_pcr, .get_user = get_gic_pcr, },
+};
+
+const struct sys_reg_desc *vgic_v5_get_sysreg_table(unsigned int *sz)
+{
+ *sz = ARRAY_SIZE(gic_v5_icc_reg_descs);
+ return gic_v5_icc_reg_descs;
+}
+
+static u64 attr_to_id(u64 attr)
+{
+ return ARM64_SYS_REG(FIELD_GET(KVM_REG_ARM_VGIC_SYSREG_OP0_MASK, attr),
+ FIELD_GET(KVM_REG_ARM_VGIC_SYSREG_OP1_MASK, attr),
+ FIELD_GET(KVM_REG_ARM_VGIC_SYSREG_CRN_MASK, attr),
+ FIELD_GET(KVM_REG_ARM_VGIC_SYSREG_CRM_MASK, attr),
+ FIELD_GET(KVM_REG_ARM_VGIC_SYSREG_OP2_MASK, attr));
+}
+
+int vgic_v5_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+ const struct sys_reg_desc *r;
+
+ r = get_reg_by_id(attr_to_id(attr->attr), gic_v5_icc_reg_descs,
+ ARRAY_SIZE(gic_v5_icc_reg_descs));
+
+ if (r && !sysreg_hidden(vcpu, r))
+ return 0;
+
+ return -ENXIO;
+}
+
+int vgic_v5_cpu_sysregs_uaccess(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr,
+ bool is_write)
+{
+ struct kvm_one_reg reg = {
+ .id = attr_to_id(attr->attr),
+ .addr = attr->addr,
+ };
+
+ if (is_write)
+ return kvm_sys_reg_set_user(vcpu, &reg, gic_v5_icc_reg_descs,
+ ARRAY_SIZE(gic_v5_icc_reg_descs));
+ else
+ return kvm_sys_reg_get_user(vcpu, &reg, gic_v5_icc_reg_descs,
+ ARRAY_SIZE(gic_v5_icc_reg_descs));
+}
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 2bf1930902b8e..075e4c1326754 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -542,7 +542,7 @@ int vgic_v3_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
* Allow access to certain ID-like registers prior to VGIC initialization,
* thereby allowing the VMM to provision the features / sizing of the VGIC.
*/
-static bool reg_allowed_pre_init(struct kvm_device_attr *attr)
+static bool v3_reg_allowed_pre_init(struct kvm_device_attr *attr)
{
if (attr->group != KVM_DEV_ARM_VGIC_GRP_DIST_REGS)
return false;
@@ -605,7 +605,7 @@ static int vgic_v3_attr_regs_access(struct kvm_device *dev,
mutex_lock(&dev->kvm->arch.config_lock);
- if (!(vgic_initialized(dev->kvm) || reg_allowed_pre_init(attr))) {
+ if (!(vgic_initialized(dev->kvm) || v3_reg_allowed_pre_init(attr))) {
ret = -EBUSY;
goto out;
}
@@ -773,6 +773,92 @@ static int vgic_v5_get_userspace_ppis(struct kvm_device *dev,
return ret;
}
+int vgic_v5_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
+ struct vgic_reg_attr *reg_attr)
+{
+ unsigned long vgic_mpidr, mpidr_reg;
+
+ switch (attr->group) {
+ case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
+ vgic_mpidr = (attr->attr & KVM_DEV_ARM_VGIC_V3_MPIDR_MASK) >>
+ KVM_DEV_ARM_VGIC_V3_MPIDR_SHIFT;
+
+ mpidr_reg = VGIC_TO_MPIDR(vgic_mpidr);
+ reg_attr->vcpu = kvm_mpidr_to_vcpu(dev->kvm, mpidr_reg);
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ if (!reg_attr->vcpu)
+ return -EINVAL;
+
+ reg_attr->addr = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
+
+ return 0;
+}
+
+/*
+ * Some registers could potentially be accessed before the core GIC and IRS
+ * have been initialised. For now, all accesses are required to be post-init.
+ */
+static bool v5_reg_allowed_pre_init(struct kvm_device_attr *attr)
+{
+ return false;
+}
+
+/*
+ * vgic_v5_attr_regs_access - allows user space to access VGIC v5 state
+ *
+ * @dev: kvm device handle
+ * @attr: kvm device attribute
+ * @is_write: true if userspace is writing a register
+ */
+static int vgic_v5_attr_regs_access(struct kvm_device *dev,
+ struct kvm_device_attr *attr,
+ bool is_write)
+{
+ struct vgic_reg_attr reg_attr;
+ struct kvm_vcpu *vcpu;
+ int ret;
+
+ ret = vgic_v5_parse_attr(dev, attr, &reg_attr);
+ if (ret)
+ return ret;
+
+ vcpu = reg_attr.vcpu;
+
+ mutex_lock(&dev->kvm->lock);
+
+ if (kvm_trylock_all_vcpus(dev->kvm)) {
+ mutex_unlock(&dev->kvm->lock);
+ return -EBUSY;
+ }
+
+ mutex_lock(&dev->kvm->arch.config_lock);
+
+ if (!(vgic_initialized(dev->kvm) || v5_reg_allowed_pre_init(attr))) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ switch (attr->group) {
+ case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
+ ret = vgic_v5_cpu_sysregs_uaccess(vcpu, attr, is_write);
+ break;
+ default:
+ ret = -EINVAL;
+ break;
+ }
+
+out:
+ mutex_unlock(&dev->kvm->arch.config_lock);
+ kvm_unlock_all_vcpus(dev->kvm);
+ mutex_unlock(&dev->kvm->lock);
+
+ return ret;
+}
+
static int vgic_v5_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
@@ -780,7 +866,7 @@ static int vgic_v5_set_attr(struct kvm_device *dev,
case KVM_DEV_ARM_VGIC_GRP_ADDR:
break;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
- return -ENXIO;
+ return vgic_v5_attr_regs_access(dev, attr, true);
case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
break;
case KVM_DEV_ARM_VGIC_GRP_CTRL:
@@ -806,7 +892,7 @@ static int vgic_v5_get_attr(struct kvm_device *dev,
case KVM_DEV_ARM_VGIC_GRP_ADDR:
break;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
- return -ENXIO;
+ return vgic_v5_attr_regs_access(dev, attr, false);
case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
break;
case KVM_DEV_ARM_VGIC_GRP_CTRL:
@@ -836,8 +922,16 @@ static int vgic_v5_has_attr(struct kvm_device *dev,
return 0;
}
return -ENXIO;
- case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
- return -ENXIO;
+ case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS: {
+ struct vgic_reg_attr reg_attr;
+ int ret;
+
+ ret = vgic_v5_parse_attr(dev, attr, &reg_attr);
+ if (ret)
+ return ret;
+
+ return vgic_v5_has_cpu_sysregs_attr(reg_attr.vcpu, attr);
+ }
case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
return 0;
case KVM_DEV_ARM_VGIC_GRP_CTRL:
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index b5036170430dd..bcdac044a23f4 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -252,6 +252,8 @@ struct ap_list_summary {
#define irqs_active_outside_lrs(s) \
((s)->nr_act && irqs_outside_lrs(s))
+int vgic_v5_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
+ struct vgic_reg_attr *reg_attr);
int vgic_v3_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
struct vgic_reg_attr *reg_attr);
int vgic_v2_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
@@ -383,6 +385,11 @@ void vgic_v5_restore_state(struct kvm_vcpu *vcpu);
void vgic_v5_save_state(struct kvm_vcpu *vcpu);
int vgic_v5_register_irs_iodev(struct kvm *kvm, gpa_t irs_base_address);
+int vgic_v5_cpu_sysregs_uaccess(struct kvm_vcpu *vcpu,
+ struct kvm_device_attr *attr, bool is_write);
+int vgic_v5_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
+const struct sys_reg_desc *vgic_v5_get_sysreg_table(unsigned int *sz);
+
#define for_each_visible_v5_ppi(__i, __k) \
for_each_set_bit(__i, (__k)->arch.vgic.gicv5_vm.vgic_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS)
--
2.34.1
* [PATCH v2 34/39] KVM: arm64: gic-v5: Handle userspace accesses to IRS MMIO region
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (32 preceding siblings ...)
2026-05-21 15:00 ` [PATCH v2 33/39] KVM: arm64: gic-v5: Add GICv5 EL1 sysreg userspace accessors Sascha Bischoff
@ 2026-05-21 15:00 ` Sascha Bischoff
2026-05-21 15:01 ` [PATCH v2 35/39] KVM: arm64: gic-v5: Implement save/restore mechanisms for ISTs Sascha Bischoff
` (4 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:00 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
To save and restore the state of a GICv5-based VM, userspace must also
save and restore the IRS MMIO registers. These hold state such as the
guest's IST configuration, and KVM must present a consistent view of
that state to the guest after a restore.
Provide accessors to read and write the IRS MMIO state through a new
KVM_DEV_ARM_VGIC_GRP_IRS_REGS device attribute group. This is modelled
on the existing GICv3 ITS register accessors, as the approach is
broadly the same.
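The userspace-visible rules enforced by vgic_v5_irs_attr_regs_access() can be modelled in plain C: every access carries a __u64 regardless of register width, the offset must be aligned to the register width, writes to 32-bit registers drop the upper half, and reads of 32-bit registers come back zero-extended. This is a minimal sketch rather than KVM code. irs_reg_model, irs_regs[], irs_model_find() and irs_model_uaccess() are hypothetical names, and the two register offsets are illustrative only.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register descriptor; not the kernel's vgic_register_region. */
struct irs_reg_model {
	uint64_t offset;	/* byte offset in the IRS config frame */
	unsigned int len;	/* register width in bytes: 4 or 8 */
	uint64_t value;		/* backing storage for the model */
};

/* Illustrative offsets only; check them against the GICv5 specification. */
static struct irs_reg_model irs_regs[] = {
	{ .offset = 0x0008, .len = 4 },	/* IRS_IDR2 (32-bit) */
	{ .offset = 0x0180, .len = 8 },	/* IRS_IST_BASER (64-bit) */
};

/* Find the register whose byte range contains @offset, if any. */
static struct irs_reg_model *irs_model_find(uint64_t offset)
{
	for (size_t i = 0; i < sizeof(irs_regs) / sizeof(irs_regs[0]); i++) {
		struct irs_reg_model *r = &irs_regs[i];

		if (offset >= r->offset && offset < r->offset + r->len)
			return r;
	}
	return NULL;
}

/*
 * Userspace always passes a 64-bit value. Unknown offsets give -ENXIO,
 * misaligned offsets give -EINVAL, 32-bit writes keep only the low half,
 * and 32-bit reads are zero-extended because only 32 bits were stored.
 */
static int irs_model_uaccess(uint64_t offset, uint64_t *uval, bool is_write)
{
	struct irs_reg_model *r = irs_model_find(offset);

	if (!r)
		return -ENXIO;
	if (offset & (r->len - 1))
		return -EINVAL;

	if (is_write)
		r->value = (r->len == 8) ? *uval : (uint32_t)*uval;
	else
		*uval = r->value;
	return 0;
}
```

A VMM does the equivalent of the write path with KVM_SET_DEVICE_ATTR and the read path with KVM_GET_DEVICE_ATTR. Fixing the userspace value at 64 bits keeps the ABI uniform, and the kernel picks the access width from the register itself.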
Where possible, the existing MMIO access handlers are reused. Some
registers are handled differently, however, because writes to them
have wider effects. For example, writes to the ID registers are
sanitised so that userspace cannot advertise capabilities that the
hardware lacks, such as the IST capabilities presented to the guest.
Similarly, for the SPI configuration, userspace is blocked from
setting anything that does not match what has already been set.
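The IRS_IDR2 sanitisation done by vgic_v5_mmio_uaccess_write_irs() can be sketched as a standalone check. The bit positions below are assumed to match the kernel's GICV5_IRS_IDR2_* masks and should be checked against the GICv5 specification. struct host_irs_caps is a hypothetical stand-in for irs_caps, and check_idr2() is an illustrative name.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed IRS_IDR2 layout (mirrors the kernel's GICV5_IRS_IDR2_* masks). */
#define IDR2_ID_BITS(v)		((uint32_t)(v) & 0x1f)		/* [4:0] */
#define IDR2_LPI		(1u << 5)
#define IDR2_MIN_LPI_ID_BITS(v)	(((uint32_t)(v) >> 6) & 0xf)	/* [9:6] */
#define IDR2_IST_LEVELS		(1u << 10)
#define IDR2_ISTMD		(1u << 14)
#define IDR2_ISTMD_SZ(v)	(((uint32_t)(v) >> 15) & 0x1f)	/* [19:15] */

/* Hypothetical host capabilities, standing in for irs_caps. */
struct host_irs_caps {
	unsigned int ist_id_bits;	/* max ID bits the host IST supports */
	unsigned int min_lpi_id_bits;	/* host minimum LPI ID bits */
};

/* Returns 0 if userspace may restore @val, -EINVAL otherwise. */
static int check_idr2(uint32_t val, const struct host_irs_caps *caps)
{
	if (!(val & IDR2_LPI))
		return -EINVAL;		/* LPIs are always supported */
	if (val & IDR2_IST_LEVELS)
		return -EINVAL;		/* linear guest ISTs only */
	if ((val & IDR2_ISTMD) || IDR2_ISTMD_SZ(val))
		return -EINVAL;		/* no IST metadata */
	if (IDR2_ID_BITS(val) > caps->ist_id_bits)
		return -EINVAL;		/* cannot exceed the hardware */
	if (IDR2_MIN_LPI_ID_BITS(val) < caps->min_lpi_id_bits)
		return -EINVAL;		/* must be at least the HW minimum */
	if (IDR2_MIN_LPI_ID_BITS(val) > IDR2_ID_BITS(val))
		return -EINVAL;		/* minimum cannot exceed the maximum */
	return 0;
}
```

Each failed check rejects the whole write, so a restore that asks for more than the host can provide is refused when userspace sets the register.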
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/vgic/vgic-irs-v5.c | 415 ++++++++++++++++++++----
arch/arm64/kvm/vgic/vgic-kvm-device.c | 55 +++-
arch/arm64/kvm/vgic/vgic.h | 4 +
tools/arch/arm64/include/uapi/asm/kvm.h | 1 +
5 files changed, 396 insertions(+), 80 deletions(-)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d1b2ca317f586..710a0d267347d 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -422,6 +422,7 @@ enum {
#define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO 7
#define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ 9
+#define KVM_DEV_ARM_VGIC_GRP_IRS_REGS 10
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT 10
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/arch/arm64/kvm/vgic/vgic-irs-v5.c b/arch/arm64/kvm/vgic/vgic-irs-v5.c
index 6352d17d557e0..b7808555adc82 100644
--- a/arch/arm64/kvm/vgic/vgic-irs-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-irs-v5.c
@@ -393,12 +393,61 @@ static unsigned long vgic_v5_mmio_read_irs_ist(struct kvm_vcpu *vcpu,
return value;
}
+static void vgic_v5_update_irs_ist_baser(struct vgic_v5_irs *irs,
+ unsigned long val)
+{
+ irs->ist_baser.valid = !!(val & GICV5_IRS_IST_BASER_VALID);
+ irs->ist_baser.addr = FIELD_GET(GICV5_IRS_IST_BASER_ADDR_MASK, val)
+ << GICV5_IRS_IST_BASER_ADDR_SHIFT;
+}
+
+static int vgic_v5_write_irs_ist_baser(struct kvm_vcpu *vcpu, unsigned long val)
+{
+ struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
+ enum gicv5_vcpu_cmd cmd = LPI_VIST_MAKE_INVALID;
+ bool valid = !!(val & GICV5_IRS_IST_BASER_VALID);
+ int rc;
+
+ /* Valid -> Invalid */
+ if (irs->ist_baser.valid && !valid) {
+ /* Make the LPI IST invalid and then ... */
+ rc = irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu), &cmd);
+ if (rc)
+ return rc;
+
+ /*
+ * ... free the host IST if we successfully marked the
+ * IST as invalid. Frankly, if we failed to make the
+ * guest's IST as invalid, we're cooked because it means
+ * that the IRS may still be using the memory that we
+ * want to free. Hence, we leave it allocated and skip
+ * the clearing of valid bit in the baser.
+ */
+ rc = vgic_v5_lpi_ist_free(vcpu->kvm);
+ if (rc)
+ return rc;
+ } else if (!irs->ist_baser.valid && valid) { /* Invalid -> Valid */
+ if (!vgic_v5_ist_cfgr_valid(irs)) {
+ kvm_err("Guest programmed invalid IRS_IST_CFGR\n");
+ return -EINVAL;
+ }
+
+ rc = vgic_v5_lpi_ist_alloc(vcpu->kvm, irs->ist_cfgr.lpi_id_bits);
+ if (rc)
+ return rc;
+ }
+
+ /* Now that we've handled the edges, update the valid bit and addr */
+ vgic_v5_update_irs_ist_baser(irs, val);
+
+ return 0;
+}
+
static void vgic_v5_mmio_write_irs_ist(struct kvm_vcpu *vcpu, gpa_t addr,
unsigned int len, unsigned long val)
{
struct vgic_v5_irs *irs = vgic_v5_get_irs(vcpu);
const size_t offset = addr & (SZ_64K - 1);
- enum gicv5_vcpu_cmd cmd = LPI_VIST_MAKE_INVALID;
switch (offset) {
case GICV5_IRS_IST_CFGR:
@@ -408,72 +457,192 @@ static void vgic_v5_mmio_write_irs_ist(struct kvm_vcpu *vcpu, gpa_t addr,
irs->ist_cfgr.structure = !!(val & GICV5_IRS_IST_CFGR_STRUCTURE);
return;
case GICV5_IRS_IST_BASER: {
- bool valid = !!(val & GICV5_IRS_IST_BASER_VALID);
-
guard(mutex)(&vcpu->kvm->arch.config_lock);
+ vgic_v5_write_irs_ist_baser(vcpu, val);
+ return;
+ }
+ default:
+ return;
+ }
+}
- /* Valid -> Invalid */
- if (irs->ist_baser.valid && !valid) {
- /* Make the LPI IST invalid and then ... */
- if (irq_set_vcpu_affinity(vgic_v5_vpe_db(vcpu), &cmd))
- break;
+static unsigned long vgic_v5_mmio_uaccess_read_irs_status(struct kvm_vcpu *vcpu,
+ gpa_t addr,
+ unsigned int len)
+{
+ const size_t offset = addr & (SZ_64K - 1);
- /*
- * ... free the host IST if we successfully marked the
- * IST as invalid. Frankly, if we failed to make the
- * guest's IST as invalid, we're cooked because it means
- * that the IRS may still be using the memory that we
- * want to free. Hence, we leave it allocated and skip
- * the clearing of valid bit in the baser.
- */
- if (vgic_v5_lpi_ist_free(vcpu->kvm))
- break;
- } else if (!irs->ist_baser.valid && valid) { /* Invalid -> Valid */
- if (!vgic_v5_ist_cfgr_valid(irs)) {
- kvm_err("Guest programmed invalid IRS_IST_CFGR\n");
- break;
- }
-
- if (vgic_v5_lpi_ist_alloc(vcpu->kvm, irs->ist_cfgr.lpi_id_bits))
- break;
- }
+ switch (offset) {
+ case GICV5_IRS_SYNC_STATUSR:
+ return GICV5_IRS_SYNC_STATUSR_IDLE;
+ case GICV5_IRS_SPI_STATUSR:
+ return GICV5_IRS_SPI_STATUSR_IDLE;
+ case GICV5_IRS_PE_STATUSR:
+ return GICV5_IRS_PE_STATUSR_IDLE;
+ case GICV5_IRS_IST_STATUSR:
+ return GICV5_IRS_IST_STATUSR_IDLE;
+ default:
+ return 0;
+ }
+}
- /* Now that we've handled the edges, update the valid bit and addr */
- irs->ist_baser.valid = !!(val & GICV5_IRS_IST_BASER_VALID);
- irs->ist_baser.addr = FIELD_GET(GICV5_IRS_IST_BASER_ADDR_MASK, val)
- << GICV5_IRS_IST_BASER_ADDR_SHIFT;
+static int vgic_v5_mmio_uaccess_write_irs(struct kvm_vcpu *vcpu, gpa_t addr,
+ unsigned int len, unsigned long val)
+{
+ struct vgic_dist *vgic = &vcpu->kvm->arch.vgic;
+ struct vgic_v5_irs *irs_data = vgic->vgic_v5_irs_data;
+ size_t offset = addr & (SZ_64K - 1);
- return;
+ /*
+ * The following registers are ONLY settable via uaccesses. The guest
+ * cannot write them!
+ */
+
+ switch (offset) {
+ case GICV5_IRS_IDR0:
+ if (FIELD_GET(GICV5_IRS_IDR0_INT_DOM, val) !=
+ GICV5_IRS_IDR0_INT_DOM_NON_SECURE)
+ return -EINVAL;
+
+ if ((val & GICV5_IRS_IDR0_VIRT) ||
+ (val & GICV5_IRS_IDR0_ONE_N) ||
+ (val & GICV5_IRS_IDR0_VIRT_ONE_N) ||
+ (val & GICV5_IRS_IDR0_SETLPI) ||
+ (val & GICV5_IRS_IDR0_MEC) ||
+ (val & GICV5_IRS_IDR0_MPAM) ||
+ (val & GICV5_IRS_IDR0_SWE))
+ return -EINVAL;
+
+ irs_data->idr0.domain = FIELD_GET(GICV5_IRS_IDR0_INT_DOM, val);
+ irs_data->idr0.pa_range = FIELD_GET(GICV5_IRS_IDR0_PA_RANGE, val);
+ irs_data->idr0.virt = !!(val & GICV5_IRS_IDR0_VIRT);
+ irs_data->idr0.setlpi = !!(val & GICV5_IRS_IDR0_SETLPI);
+ irs_data->idr0.mec = !!(val & GICV5_IRS_IDR0_MEC);
+ irs_data->idr0.mpam = !!(val & GICV5_IRS_IDR0_MPAM);
+ irs_data->idr0.swe = !!(val & GICV5_IRS_IDR0_SWE);
+ irs_data->idr0.irs_id = FIELD_GET(GICV5_IRS_IDR0_IRSID, val);
+ break;
+ case GICV5_IRS_IDR1: {
+ unsigned int iaffid_bits, priority_bits;
+
+ /* Ignore writes to PE_CNT; it is derived from the number of vCPUs */
+ iaffid_bits = FIELD_GET(GICV5_IRS_IDR1_IAFFID_BITS, val);
+ priority_bits = FIELD_GET(GICV5_IRS_IDR1_PRIORITY_BITS, val);
+
+ /*
+ * IAFFID_BITS is derived from the number of vPE ID bits in
+ * the VMTE, and is encoded as N - 1.
+ */
+ if (iaffid_bits != vgic_v5_vmte_vpe_id_bits(vcpu) - 1)
+ return -EINVAL;
+
+ if (priority_bits > gicv5_global_data.irs_pri_bits - 1)
+ return -EINVAL;
+
+ irs_data->idr1.priority_bits = priority_bits;
+ break;
}
+ case GICV5_IRS_IDR2:
+ /* We always support LPIs */
+ if (!(val & GICV5_IRS_IDR2_LPI))
+ return -EINVAL;
+
+ /* We only support LPIs with linear, non-metadata guest ISTs */
+ if (val & GICV5_IRS_IDR2_IST_LEVELS)
+ return -EINVAL;
+
+ if ((val & GICV5_IRS_IDR2_ISTMD) ||
+ FIELD_GET(GICV5_IRS_IDR2_ISTMD_SZ, val))
+ return -EINVAL;
+
+ /* We can't present more bits than we have support for in HW */
+ if (FIELD_GET(GICV5_IRS_IDR2_ID_BITS, val) > irs_caps.ist_id_bits)
+ return -EINVAL;
+
+ /* MIN_LPI_ID_BITS must be at least the HW minimum */
+ if (FIELD_GET(GICV5_IRS_IDR2_MIN_LPI_ID_BITS, val) <
+ irs_caps.min_lpi_id_bits)
+ return -EINVAL;
+
+ if (FIELD_GET(GICV5_IRS_IDR2_MIN_LPI_ID_BITS, val) >
+ FIELD_GET(GICV5_IRS_IDR2_ID_BITS, val))
+ return -EINVAL;
+
+ irs_data->idr2.istmd_sz = FIELD_GET(GICV5_IRS_IDR2_ISTMD_SZ, val);
+ irs_data->idr2.istmd = !!(val & GICV5_IRS_IDR2_ISTMD);
+ irs_data->idr2.ist_l2sz = FIELD_GET(GICV5_IRS_IDR2_IST_L2SZ, val);
+ irs_data->idr2.ist_levels = !!(val & GICV5_IRS_IDR2_IST_LEVELS);
+ irs_data->idr2.min_lpi_id_bits = FIELD_GET(GICV5_IRS_IDR2_MIN_LPI_ID_BITS, val);
+ irs_data->idr2.id_bits = FIELD_GET(GICV5_IRS_IDR2_ID_BITS, val);
+ break;
+ case GICV5_IRS_IDR5:
+ if (FIELD_GET(GICV5_IRS_IDR5_SPI_RANGE, val) != irs_data->idr5.spi_range)
+ return -EINVAL;
+ break;
+ case GICV5_IRS_IDR6:
+ if (FIELD_GET(GICV5_IRS_IDR6_SPI_IRS_RANGE, val) != irs_data->idr6.spi_irs_range)
+ return -EINVAL;
+ break;
+ case GICV5_IRS_IDR7:
+ if (FIELD_GET(GICV5_IRS_IDR7_SPI_BASE, val) != irs_data->idr7.spi_base)
+ return -EINVAL;
+ break;
+ case GICV5_IRS_IST_BASER:
+ vgic_v5_update_irs_ist_baser(irs_data, val);
+ break;
+ case GICV5_IRS_SPI_CFGR:
+ break;
+ case GICV5_IRS_IIDR:
+ fallthrough;
+ case GICV5_IRS_AIDR:
+ break;
default:
- return;
+ return -EINVAL;
}
+
+ return 0;
}
static const struct vgic_register_region vgic_v5_irs_registers[] = {
/*
* This is the IRS_CONFIG_FRAME.
*/
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR0, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR1, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR2, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IDR0, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IDR1, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IDR2, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR3, vgic_mmio_read_raz,
vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR4, vgic_mmio_read_raz,
vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR5, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR6, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IDR7, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IIDR, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_AIDR, vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IDR5, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IDR6, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IDR7, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IIDR, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_AIDR, vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_CR0, vgic_v5_mmio_read_irs_misc,
vgic_v5_mmio_write_irs_misc, 4,
VGIC_ACCESS_32bit),
@@ -483,9 +652,12 @@ static const struct vgic_register_region vgic_v5_irs_registers[] = {
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SYNCR, vgic_mmio_read_raz,
vgic_mmio_write_wi, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SYNC_STATUSR,
- vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_SYNC_STATUSR,
+ vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi,
+ vgic_v5_mmio_uaccess_read_irs_status,
+ vgic_mmio_uaccess_write_wi, 4,
+ VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_VMR, vgic_mmio_read_raz,
vgic_mmio_write_wi, 8,
VGIC_ACCESS_64bit),
@@ -493,35 +665,48 @@ static const struct vgic_register_region vgic_v5_irs_registers[] = {
vgic_v5_mmio_write_irs_spi, 4,
VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_DOMAINR, vgic_v5_mmio_read_irs_spi,
- vgic_v5_mmio_write_irs_spi, 4,
- VGIC_ACCESS_32bit),
+ vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_RESAMPLER, vgic_mmio_read_raz,
vgic_mmio_write_wi, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_CFGR, vgic_v5_mmio_read_irs_spi,
- vgic_v5_mmio_write_irs_spi, 4,
- VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_SPI_STATUSR,
- vgic_v5_mmio_read_irs_spi, vgic_mmio_write_wi,
- 4, VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_PE_SELR, vgic_v5_mmio_read_irs_misc,
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_SPI_CFGR,
+ vgic_v5_mmio_read_irs_spi,
+ vgic_v5_mmio_write_irs_spi, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_SPI_STATUSR,
+ vgic_v5_mmio_read_irs_spi,
+ vgic_mmio_write_wi,
+ vgic_v5_mmio_uaccess_read_irs_status,
+ vgic_mmio_uaccess_write_wi, 4,
+ VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH(GICV5_IRS_PE_SELR,
+ vgic_v5_mmio_read_irs_misc,
vgic_v5_mmio_write_irs_misc, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_PE_STATUSR,
- vgic_v5_mmio_read_irs_misc,
- vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_PE_STATUSR,
+ vgic_v5_mmio_read_irs_misc,
+ vgic_mmio_write_wi,
+ vgic_v5_mmio_uaccess_read_irs_status,
+ vgic_mmio_uaccess_write_wi, 4,
+ VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_PE_CR0, vgic_v5_mmio_read_irs_misc,
vgic_v5_mmio_write_irs_misc, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IST_BASER, vgic_v5_mmio_read_irs_ist,
- vgic_v5_mmio_write_irs_ist, 8,
- VGIC_ACCESS_64bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IST_BASER,
+ vgic_v5_mmio_read_irs_ist,
+ vgic_v5_mmio_write_irs_ist, NULL,
+ vgic_v5_mmio_uaccess_write_irs, 8,
+ VGIC_ACCESS_64bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IST_CFGR, vgic_v5_mmio_read_irs_ist,
vgic_v5_mmio_write_irs_ist, 4,
VGIC_ACCESS_32bit),
- REGISTER_DESC_WITH_LENGTH(GICV5_IRS_IST_STATUSR,
- vgic_v5_mmio_read_irs_ist, vgic_mmio_write_wi,
- 4, VGIC_ACCESS_32bit),
+ REGISTER_DESC_WITH_LENGTH_UACCESS(GICV5_IRS_IST_STATUSR,
+ vgic_v5_mmio_read_irs_ist,
+ vgic_mmio_write_wi,
+ vgic_v5_mmio_uaccess_read_irs_status,
+ vgic_mmio_uaccess_write_wi, 4,
+ VGIC_ACCESS_32bit),
REGISTER_DESC_WITH_LENGTH(GICV5_IRS_MAP_L2_ISTR, vgic_mmio_read_raz,
vgic_mmio_write_wi, 4, VGIC_ACCESS_32bit),
@@ -759,3 +944,93 @@ int kvm_vgic_v5_irs_init(struct kvm *kvm, unsigned int nr_spis)
return 0;
}
+
+int vgic_v5_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr)
+{
+ const struct vgic_register_region *region;
+ struct vgic_reg_attr reg_attr;
+ struct kvm_vcpu *vcpu;
+ gpa_t addr, offset;
+ int ret, align;
+
+ ret = vgic_v5_parse_attr(dev, attr, &reg_attr);
+ if (ret)
+ return ret;
+
+ vcpu = reg_attr.vcpu;
+ addr = reg_attr.addr;
+
+ if (attr->group == KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS)
+ return vgic_v5_has_cpu_sysregs_attr(vcpu, attr);
+
+ offset = attr->attr;
+
+ if (IS_VGIC_ADDR_UNDEF(dev->kvm->arch.vgic.vgic_v5_irs_data->vgic_v5_irs_base))
+ return -ENXIO;
+
+ region = vgic_find_mmio_region(vgic_v5_irs_registers,
+ ARRAY_SIZE(vgic_v5_irs_registers),
+ offset);
+ if (!region)
+ return -ENXIO;
+
+ align = region->access_flags & VGIC_ACCESS_64bit ? 0x7 : 0x3;
+ if (offset & align)
+ return -EINVAL;
+
+ return 0;
+}
+
+/*
+ * Access the IRS MMIO registers. The caller must already hold the relevant locks.
+ */
+int vgic_v5_irs_attr_regs_access(struct kvm_device *dev,
+ struct kvm_device_attr *attr,
+ u64 *reg, bool is_write)
+{
+ const struct vgic_register_region *region;
+ gpa_t addr, offset;
+ unsigned int len;
+ int align, ret = 0;
+
+ offset = attr->attr;
+
+ if (IS_VGIC_ADDR_UNDEF(dev->kvm->arch.vgic.vgic_v5_irs_data->vgic_v5_irs_base))
+ return -ENXIO;
+
+ region = vgic_find_mmio_region(vgic_v5_irs_registers,
+ ARRAY_SIZE(vgic_v5_irs_registers),
+ offset);
+ if (!region)
+ return -ENXIO;
+
+ /*
+ * Although the spec supports upper/lower 32-bit accesses to
+ * 64-bit IRS registers, the userspace ABI requires 64-bit
+ * accesses to all 64-bit wide registers. We therefore only
+ * support 32-bit accesses to 32-bit-wide registers.
+ */
+ align = region->access_flags & VGIC_ACCESS_64bit ? 0x7 : 0x3;
+ len = region->access_flags & VGIC_ACCESS_64bit ? 8 : 4;
+
+ if (offset & align)
+ return -EINVAL;
+
+ addr = dev->kvm->arch.vgic.vgic_v5_irs_data->vgic_v5_irs_base + offset;
+
+ if (is_write) {
+ if (region->uaccess_write)
+ ret = region->uaccess_write(kvm_get_vcpu(dev->kvm, 0),
+ addr, len, *reg);
+ else
+ region->write(kvm_get_vcpu(dev->kvm, 0), addr, len, *reg);
+ } else {
+ if (region->uaccess_read)
+ *reg = region->uaccess_read(kvm_get_vcpu(dev->kvm, 0),
+ addr, len);
+ else
+ *reg = region->read(kvm_get_vcpu(dev->kvm, 0), addr, len);
+ }
+
+ return ret;
+}
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index 075e4c1326754..cab3d6db070ac 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -786,6 +786,9 @@ int vgic_v5_parse_attr(struct kvm_device *dev, struct kvm_device_attr *attr,
mpidr_reg = VGIC_TO_MPIDR(vgic_mpidr);
reg_attr->vcpu = kvm_mpidr_to_vcpu(dev->kvm, mpidr_reg);
break;
+ case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
+ reg_attr->vcpu = kvm_get_vcpu(dev->kvm, 0);
+ break;
default:
return -EINVAL;
}
@@ -818,8 +821,11 @@ static int vgic_v5_attr_regs_access(struct kvm_device *dev,
struct kvm_device_attr *attr,
bool is_write)
{
+ u64 __user *uaddr = (u64 __user *)(unsigned long)attr->addr;
struct vgic_reg_attr reg_attr;
struct kvm_vcpu *vcpu;
+ bool uaccess;
+ u64 val;
int ret;
ret = vgic_v5_parse_attr(dev, attr, &reg_attr);
@@ -828,6 +834,22 @@ static int vgic_v5_attr_regs_access(struct kvm_device *dev,
vcpu = reg_attr.vcpu;
+ switch (attr->group) {
+ case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
+ /* Sysregs uaccess is performed by the sysreg handling code */
+ uaccess = false;
+ break;
+ case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
+ fallthrough;
+ default:
+ uaccess = true;
+ }
+
+ if (uaccess && is_write) {
+ if (get_user(val, uaddr))
+ return -EFAULT;
+ }
+
mutex_lock(&dev->kvm->lock);
if (kvm_trylock_all_vcpus(dev->kvm)) {
@@ -846,6 +868,18 @@ static int vgic_v5_attr_regs_access(struct kvm_device *dev,
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
ret = vgic_v5_cpu_sysregs_uaccess(vcpu, attr, is_write);
break;
+ case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
+ /*
+ * The IRS registers are a mixture of 32-bit and 64-bit
+ * registers. Internally, we always perform the correctly sized
+ * access, but the UAPI is defined such that userspace always
+ * provides a __u64. When userspace writes, the upper 32 bits
+ * are ignored for 32-bit accesses, and on a read, 32-bit
+ * values are written back to user memory using the full
+ * 64 bits.
+ */
+ ret = vgic_v5_irs_attr_regs_access(dev, attr, &val, is_write);
+ break;
default:
ret = -EINVAL;
break;
@@ -856,6 +890,9 @@ static int vgic_v5_attr_regs_access(struct kvm_device *dev,
kvm_unlock_all_vcpus(dev->kvm);
mutex_unlock(&dev->kvm->lock);
+ if (!ret && uaccess && !is_write)
+ ret = put_user(val, uaddr);
+
return ret;
}
@@ -865,6 +902,8 @@ static int vgic_v5_set_attr(struct kvm_device *dev,
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_ADDR:
break;
+ case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
+ fallthrough;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
return vgic_v5_attr_regs_access(dev, attr, true);
case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
@@ -891,6 +930,8 @@ static int vgic_v5_get_attr(struct kvm_device *dev,
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_ADDR:
break;
+ case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
+ fallthrough;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
return vgic_v5_attr_regs_access(dev, attr, false);
case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
@@ -922,16 +963,10 @@ static int vgic_v5_has_attr(struct kvm_device *dev,
return 0;
}
return -ENXIO;
- case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS: {
- struct vgic_reg_attr reg_attr;
- int ret;
-
- ret = vgic_v5_parse_attr(dev, attr, &reg_attr);
- if (ret)
- return ret;
-
- return vgic_v5_has_cpu_sysregs_attr(reg_attr.vcpu, attr);
- }
+ case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
+ fallthrough;
+ case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
+ return vgic_v5_has_attr_regs(dev, attr);
case KVM_DEV_ARM_VGIC_GRP_NR_IRQS:
return 0;
case KVM_DEV_ARM_VGIC_GRP_CTRL:
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index bcdac044a23f4..e05b4a5c2e49b 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -389,6 +389,10 @@ int vgic_v5_cpu_sysregs_uaccess(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr, bool is_write);
int vgic_v5_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
const struct sys_reg_desc *vgic_v5_get_sysreg_table(unsigned int *sz);
+int vgic_v5_irs_attr_regs_access(struct kvm_device *dev,
+ struct kvm_device_attr *attr,
+ u64 *reg, bool is_write);
+int vgic_v5_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr);
#define for_each_visible_v5_ppi(__i, __k) \
for_each_set_bit(__i, (__k)->arch.vgic.gicv5_vm.vgic_ppi_mask, VGIC_V5_NR_PRIVATE_IRQS)
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index d1b2ca317f586..710a0d267347d 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -422,6 +422,7 @@ enum {
#define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO 7
#define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ 9
+#define KVM_DEV_ARM_VGIC_GRP_IRS_REGS 10
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT 10
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
--
2.34.1
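The userspace checks in vgic_v5_irs_attr_regs_access() follow a simple rule: 64-bit registers need an 8-byte-aligned offset and an 8-byte access, 32-bit registers need 4-byte alignment and a 4-byte access, and the value always travels through a __u64. The following is a minimal standalone sketch of that rule, not kernel code: struct irs_reg_desc and its is_64bit flag are hypothetical stand-ins for struct vgic_register_region and VGIC_ACCESS_64bit.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a vgic_register_region; is_64bit models VGIC_ACCESS_64bit. */
struct irs_reg_desc {
	bool is_64bit;
};

/*
 * Return the access size in bytes for a userspace access at @offset, or
 * -EINVAL if the offset is not naturally aligned for the register width.
 */
static int irs_uaccess_len(const struct irs_reg_desc *desc, uint64_t offset)
{
	uint64_t align = desc->is_64bit ? 0x7 : 0x3;

	if (offset & align)
		return -EINVAL;

	return desc->is_64bit ? 8 : 4;
}

/*
 * Userspace always supplies a 64-bit value; for a 32-bit register only the
 * low 32 bits are used and the upper half is ignored.
 */
static uint64_t irs_uaccess_write_value(const struct irs_reg_desc *desc,
					uint64_t user_val)
{
	return desc->is_64bit ? user_val : (uint32_t)user_val;
}
```

Keeping the UAPI fixed at 64 bits means a VMM can use one buffer type for every IRS register and rely on the kernel to pick the access width from the register description.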
* [PATCH v2 35/39] KVM: arm64: gic-v5: Implement save/restore mechanisms for ISTs
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (33 preceding siblings ...)
2026-05-21 15:00 ` [PATCH v2 34/39] KVM: arm64: gic-v5: Handle userspace accesses to IRS MMIO region Sascha Bischoff
@ 2026-05-21 15:01 ` Sascha Bischoff
2026-05-21 15:01 ` [PATCH v2 36/39] Documentation: KVM: Document KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS for VGICv5 Sascha Bischoff
` (3 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:01 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
When migrating a GICv5 VM, up to two ISTs must be saved or
restored.
The SPI IST is allocated by the hypervisor, as the guest expects the
memory for SPI state to be provided by the hardware. The LPI IST, on
the other hand, is allocated by the guest if it wishes to use LPIs.
We shadow the guest's LPI IST in KVM, so the guest's memory is never
used directly by the GICv5 hardware. Hence, in both cases, the ISTs
used by the hardware are allocated by the hypervisor.
As there is no guest-allocated memory for the SPI IST, its state must
be saved by the VMM. The VMM must therefore provide a buffer large
enough to store/restore the SPI IST (32 bits per SPI).
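From the VMM side, the SPI buffer is sized at one 32-bit ISTE per SPI and is passed through the addr field of the device attribute. This is a minimal sketch under stated assumptions: SKETCH_VGIC_GRP_IST mirrors KVM_DEV_ARM_VGIC_GRP_IST (11, added by this patch), and struct device_attr_sketch is a hypothetical local copy of the struct kvm_device_attr fields so that it builds without kernel headers. The addr rules follow vgic_v5_get_spi_ist_desc(): addr must be 0 when the VM has no SPIs (-ENOENT otherwise) and non-zero when it has SPIs (-EINVAL otherwise).

```c
#include <stddef.h>
#include <stdint.h>

/* Mirrors KVM_DEV_ARM_VGIC_GRP_IST, added to the arm64 UAPI by this patch. */
#define SKETCH_VGIC_GRP_IST 11

/* Hypothetical local stand-in with the same fields as struct kvm_device_attr. */
struct device_attr_sketch {
	uint32_t flags;
	uint32_t group;
	uint64_t attr;
	uint64_t addr;
};

/* The SPI IST is transferred as one architected 32-bit ISTE per SPI. */
static size_t spi_ist_buf_size(unsigned int nr_spis)
{
	return (size_t)nr_spis * sizeof(uint32_t);
}

/*
 * Fill in an IST save/restore request. attr must be 0 (anything else yields
 * -ENXIO). spi_buf must be NULL when the VM has no SPIs, and otherwise point
 * at a buffer of spi_ist_buf_size() bytes.
 */
static void fill_ist_attr(struct device_attr_sketch *da, uint32_t *spi_buf)
{
	da->flags = 0;
	da->group = SKETCH_VGIC_GRP_IST;
	da->attr = 0;
	da->addr = (uint64_t)(uintptr_t)spi_buf;
}
```

A VMM would pass the equivalent struct kvm_device_attr to KVM_GET_DEVICE_ATTR on the VGIC device fd to save, and to KVM_SET_DEVICE_ATTR to restore. The same call also handles the LPI IST, which lives in guest memory rather than in this buffer.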
The LPI IST, if present, is stored into guest memory, as the guest has
already allocated storage under the assumption that it would be used
by the GIC. On a save, each IST entry is written back to guest memory
(skipping host metadata); on a restore, it is read back from guest
memory. The guest is only allowed to create a linear IST, so its
storage is a single region that is contiguous in GPA space and large
enough to hold every entry.
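The "skip host metadata" step amounts to copying the low 32 bits of each host ISTE, whose stride is BIT(istsz + 2) bytes (see GICV5_ISTE_SIZE), into a packed array of 32-bit entries. On restore, the reverse zeroes the whole host ISTE before writing the 32-bit state back. This is a simplified user-space sketch of that idea with hypothetical helper names. It copies raw bytes, as the kernel copies raw __le32 values, and omits cache maintenance and guest-memory accessors.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ISTE size in bytes for a given ISTSZ field: BIT(istsz + 2). */
static size_t iste_size(unsigned int istsz)
{
	return (size_t)1 << (istsz + 2);
}

/* Save: pack the architected low 32 bits of each host ISTE. */
static void pack_istes(uint32_t *out, const uint8_t *host, size_t nr_entries,
		       unsigned int istsz)
{
	size_t stride = iste_size(istsz);

	for (size_t i = 0; i < nr_entries; i++)
		memcpy(&out[i], host + i * stride, sizeof(out[i]));
}

/* Restore: zero each full host ISTE (incl. metadata), then write 32 bits back. */
static void unpack_istes(uint8_t *host, const uint32_t *in, size_t nr_entries,
			 unsigned int istsz)
{
	size_t stride = iste_size(istsz);

	for (size_t i = 0; i < nr_entries; i++) {
		memset(host + i * stride, 0, stride);
		memcpy(host + i * stride, &in[i], sizeof(in[i]));
	}
}
```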
On a save, the VM is quiesced using IRS_SAVE_VMR, which ensures that
the hardware has written all interrupt state back to the ISTs. After
each step of the save, IRS_SAVE_VM_STATUSR is checked to ensure that
the VM has remained quiescent. If it has not, -EBUSY is propagated
back to the VMM so that it can retry the save.
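On the VMM side, this suggests a bounded retry loop around the save call. This is a hypothetical sketch, not part of any real VMM. save_fn stands for a wrapper around KVM_GET_DEVICE_ATTR for the IST group that returns 0 or a negative errno, for example by mapping a -1 return to -errno. The retry count is bounded because vgic_v5_irs_save_ists() also returns -EBUSY when the vCPUs cannot be locked or the vgic is not yet initialised. fake_save is a test double included only to demonstrate usage.

```c
#include <errno.h>

/*
 * Call save_fn until it stops reporting -EBUSY, giving up after max_tries
 * attempts. Returns the last result from save_fn (or -EBUSY if max_tries is 0).
 */
static int save_ists_with_retry(int (*save_fn)(void *ctx), void *ctx,
				unsigned int max_tries)
{
	int ret = -EBUSY;

	for (unsigned int i = 0; i < max_tries && ret == -EBUSY; i++)
		ret = save_fn(ctx);

	return ret;
}

/* Test double: reports -EBUSY for the first *busy_left calls, then succeeds. */
static int fake_save(void *ctx)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	return 0;
}
```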
On restore, the VM is first made invalid, as its tables must not be
written while the VM is valid. The SPI and LPI ISTs are then restored
(if required) before the VM is made valid again. While restoring the
ISTs, any pending interrupts are tracked, and the pending and HWU
fields are cleared from each ISTE. Once the VM is valid again, the
tracked interrupts are made pending again via the GIC VDPEND system
instruction.
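The per-entry sanitising step can be sketched as a bit mask: record whether the pending bit was set, then clear the pending bit (bit 0) and the HWU field (bits 10:9), matching the GICV5_ISTL2E_PENDING and GICV5_ISTL2E_HWU definitions in this patch. The helper names below are hypothetical. The sketch works on CPU-endian values, whereas the kernel converts from __le32 first. It also collects IDs into an array rather than the kernel's pending_irqs list, and does not issue VDPEND.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions taken from the GICV5_ISTL2E_* definitions in this patch. */
#define ISTE_PENDING	(UINT32_C(1) << 0)
#define ISTE_HWU	(UINT32_C(0x3) << 9)	/* GENMASK(10, 9) */

/* Report whether the ISTE was pending, and return it with PENDING/HWU cleared. */
static uint32_t sanitise_iste(uint32_t iste, bool *was_pending)
{
	*was_pending = iste & ISTE_PENDING;
	return iste & ~(ISTE_PENDING | ISTE_HWU);
}

/*
 * Sanitise a table of ISTEs in place and record the IDs of entries that were
 * pending, so they can be replayed later. Returns the number of pending IDs.
 */
static size_t sanitise_table(uint32_t *istes, size_t n, uint32_t *pending_ids)
{
	size_t nr_pending = 0;

	for (size_t i = 0; i < n; i++) {
		bool pending;

		istes[i] = sanitise_iste(istes[i], &pending);
		if (pending)
			pending_ids[nr_pending++] = (uint32_t)i;
	}

	return nr_pending;
}
```

Clearing the pending bit and replaying it after the VM is made valid lets the hardware rebuild its own view of pending state, rather than trusting migrated HWU bits that are reserved for hardware use.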
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
arch/arm64/include/uapi/asm/kvm.h | 1 +
arch/arm64/kvm/vgic/vgic-irs-v5.c | 20 +
arch/arm64/kvm/vgic/vgic-kvm-device.c | 13 +
arch/arm64/kvm/vgic/vgic-v5-tables.c | 645 ++++++++++++++++++++++++
arch/arm64/kvm/vgic/vgic-v5-tables.h | 12 +
arch/arm64/kvm/vgic/vgic-v5.c | 286 +++++++++++
arch/arm64/kvm/vgic/vgic.h | 3 +
tools/arch/arm64/include/uapi/asm/kvm.h | 1 +
8 files changed, 981 insertions(+)
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 710a0d267347d..1b9bbeab18a4e 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -423,6 +423,7 @@ enum {
#define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ 9
#define KVM_DEV_ARM_VGIC_GRP_IRS_REGS 10
+#define KVM_DEV_ARM_VGIC_GRP_IST 11
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT 10
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
diff --git a/arch/arm64/kvm/vgic/vgic-irs-v5.c b/arch/arm64/kvm/vgic/vgic-irs-v5.c
index b7808555adc82..92f646036439f 100644
--- a/arch/arm64/kvm/vgic/vgic-irs-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-irs-v5.c
@@ -945,6 +945,26 @@ int kvm_vgic_v5_irs_init(struct kvm *kvm, unsigned int nr_spis)
return 0;
}
+int vgic_v5_irs_lpi_ist_id_bits(struct kvm *kvm, unsigned int *id_bits)
+{
+ struct vgic_v5_irs *irs = kvm->arch.vgic.vgic_v5_irs_data;
+
+ if (WARN_ON_ONCE(!irs))
+ return -ENXIO;
+
+ if (!irs->ist_baser.valid)
+ return 0;
+
+ if (!vgic_v5_ist_cfgr_valid(irs)) {
+ kvm_err("Guest programmed invalid IRS_IST_CFGR\n");
+ return -EINVAL;
+ }
+
+ *id_bits = irs->ist_cfgr.lpi_id_bits;
+
+ return 1;
+}
+
int vgic_v5_has_attr_regs(struct kvm_device *dev, struct kvm_device_attr *attr)
{
const struct vgic_register_region *region;
diff --git a/arch/arm64/kvm/vgic/vgic-kvm-device.c b/arch/arm64/kvm/vgic/vgic-kvm-device.c
index cab3d6db070ac..afea89b99411f 100644
--- a/arch/arm64/kvm/vgic/vgic-kvm-device.c
+++ b/arch/arm64/kvm/vgic/vgic-kvm-device.c
@@ -902,6 +902,11 @@ static int vgic_v5_set_attr(struct kvm_device *dev,
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_ADDR:
break;
+ case KVM_DEV_ARM_VGIC_GRP_IST:
+ if (attr->attr)
+ return -ENXIO;
+
+ return vgic_v5_irs_restore_ists(dev->kvm, attr);
case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
fallthrough;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
@@ -930,6 +935,11 @@ static int vgic_v5_get_attr(struct kvm_device *dev,
switch (attr->group) {
case KVM_DEV_ARM_VGIC_GRP_ADDR:
break;
+ case KVM_DEV_ARM_VGIC_GRP_IST:
+ if (attr->attr)
+ return -ENXIO;
+
+ return vgic_v5_irs_save_ists(dev->kvm, attr);
case KVM_DEV_ARM_VGIC_GRP_IRS_REGS:
fallthrough;
case KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS:
@@ -979,6 +989,9 @@ static int vgic_v5_has_attr(struct kvm_device *dev,
default:
return -ENXIO;
}
+ break;
+ case KVM_DEV_ARM_VGIC_GRP_IST:
+ return attr->attr ? -ENXIO : 0;
default:
return -ENXIO;
}
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.c b/arch/arm64/kvm/vgic/vgic-v5-tables.c
index 2df470d29d64a..b499731aa4ec4 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.c
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.c
@@ -59,6 +59,20 @@ static DEFINE_XARRAY(vm_info);
#define GICV5_VPED_ADDR_SHIFT 3ULL
#define GICV5_VPED_ADDR GENMASK_ULL(55, 3)
+/* L2 Interrupt State Table Entry */
+#define GICV5_ISTL2E_PENDING BIT(0)
+#define GICV5_ISTL2E_ACTIVE BIT(1)
+#define GICV5_ISTL2E_HM BIT(2)
+#define GICV5_ISTL2E_ENABLE BIT(3)
+#define GICV5_ISTL2E_IRM BIT(4)
+#define GICV5_ISTL2E_HWU GENMASK(10, 9)
+#define GICV5_ISTL2E_PRIORITY GENMASK(15, 11)
+#define GICV5_ISTL2E_IAFFID GENMASK(31, 16)
+
+#define GICV5_ISTE_SIZE(istsz) BIT((istsz) + 2)
+#define GICV5_LINEAR_IST_SIZE(id_bits, istsz) \
+ (BIT(id_bits) * GICV5_ISTE_SIZE(istsz))
+
/*
* The LPI and SPI configuration is stored in the 2nd and 3rd 64-bit chunks of
* the VMTE (0-based). We call this a section here in an attempt to simplify the
@@ -67,6 +81,26 @@ static DEFINE_XARRAY(vm_info);
#define GICV5_VMTEL2_LPI_SECTION 2
#define GICV5_VMTEL2_SPI_SECTION 3
+struct vgic_v5_ist_desc {
+ struct vgic_v5_vm_info *vmi;
+ void *base;
+ unsigned int id_bits;
+ unsigned int istsz;
+ unsigned int l2sz;
+ size_t iste_size;
+ bool present;
+};
+
+struct vgic_v5_two_level_ist_shape {
+ size_t l1_entries;
+ size_t l2_entries;
+};
+
+struct vgic_v5_pending_irq {
+ u32 irq;
+ struct list_head next;
+};
+
static int vgic_v5_alloc_linear_ist(struct kvm *kvm, bool spi_ist,
unsigned int id_bits,
unsigned int istsz);
@@ -100,6 +134,22 @@ static void vgic_v5_clean_inval(void *va, size_t size)
dcache_clean_inval_poc(base, base + size);
}
+static void vgic_v5_drain_pending_irqs(struct kvm *kvm,
+ struct vgic_v5_vm_info *vmi,
+ bool reinject)
+{
+ struct vgic_v5_pending_irq *pirq, *tmp;
+
+ list_for_each_entry_safe(pirq, tmp, &vmi->pending_irqs, next) {
+ if (reinject)
+ kvm_call_hyp(__vgic_v5_vdpend, pirq->irq, true,
+ vgic_v5_vm_id(kvm));
+
+ list_del(&pirq->next);
+ kfree(pirq);
+ }
+}
+
/*
* Create a linear VM Table. Directly using the number of entries supplied as
* the size of an L2 VMTE (32 bytes) guarantees that our allocation is aligned per
@@ -440,6 +490,13 @@ int vgic_v5_vmte_init(struct kvm *kvm)
if (ret)
goto out_fail;
+ /*
+ * If we are restoring the state of a guest, we need to re-inject any
+ * IRQs that were pending when the state of the guest was originally
+ * saved. We use the pending_irqs list for this.
+ */
+ INIT_LIST_HEAD(&vmi->pending_irqs);
+
/* Allocate and assign the VM Descriptor, if required. */
if (vmt_info->vmd_size != 0) {
vmd = kzalloc(vmt_info->vmd_size, GFP_KERNEL);
@@ -544,6 +601,9 @@ int vgic_v5_vmte_release(struct kvm *kvm)
kfree(vmi->vpet_base);
kfree(vmi->vmd_base);
+ /* Unlikely, but possible. Avoid leaking the memory. */
+ vgic_v5_drain_pending_irqs(kvm, vmi, false);
+
/* If we have an LPI IST, free it */
if (vmi->h_lpi_ist) {
ret = vgic_v5_lpi_ist_free(kvm);
@@ -1112,6 +1172,18 @@ static int vgic_v5_spi_ist_free(struct kvm *kvm)
return vgic_v5_linear_ist_free(kvm, true);
}
+int vgic_v5_lpi_ist_exists(struct kvm *kvm)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return -ENXIO;
+
+ return !!vmi->h_lpi_ist;
+}
+
/*
* Allocate an IST for LPIs.
*
@@ -1184,3 +1256,576 @@ int vgic_v5_lpi_ist_free(struct kvm *kvm)
else
return vgic_v5_two_level_ist_free(kvm, false);
}
+
+static struct vgic_v5_two_level_ist_shape
+vgic_v5_two_level_ist_shape(const struct vgic_v5_ist_desc *ist)
+{
+ struct vgic_v5_two_level_ist_shape shape;
+ size_t l2bits, n;
+
+ l2bits = (10 - ist->istsz) + (2 * ist->l2sz);
+ n = max(2, ist->id_bits - l2bits + 3 - 1);
+
+ shape.l1_entries = BIT(n + 1) / GICV5_IRS_ISTL1E_SIZE;
+ shape.l2_entries = BIT(l2bits);
+
+ return shape;
+}
+
+static int vgic_v5_read_vm_ist_desc(struct kvm *kvm, unsigned int section,
+ struct vgic_v5_ist_desc *ist)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vmtl2_entry *vmte;
+ u64 vmte_ist_section;
+
+ vmte = vgic_v5_get_l2_vmte(vm_id);
+ if (IS_ERR(vmte))
+ return PTR_ERR(vmte);
+
+ vgic_v5_clean_inval(vmte, sizeof(*vmte));
+ vmte_ist_section = le64_to_cpu(READ_ONCE(vmte->val[section]));
+
+ ist->id_bits = FIELD_GET(GICV5_VMTEL2E_IST_ID_BITS, vmte_ist_section);
+ ist->istsz = FIELD_GET(GICV5_VMTEL2E_IST_ISTSZ, vmte_ist_section);
+ ist->l2sz = FIELD_GET(GICV5_VMTEL2E_IST_L2SZ, vmte_ist_section);
+ ist->iste_size = GICV5_ISTE_SIZE(ist->istsz);
+
+ return vmte_ist_section & GICV5_VMTEL2E_IST_VALID;
+}
+
+static int vgic_v5_get_spi_ist_desc(struct kvm *kvm, bool userspace_buf,
+ struct vgic_v5_ist_desc *ist)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ int ret;
+
+ memset(ist, 0, sizeof(*ist));
+
+ ist->vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!ist->vmi))
+ return -ENXIO;
+
+ ret = vgic_v5_read_vm_ist_desc(kvm, GICV5_VMTEL2_SPI_SECTION, ist);
+ if (ret < 0)
+ return ret;
+
+ ist->base = ist->vmi->h_spi_ist;
+
+ /* We don't have SPIs, but userspace is trying to save/restore them. */
+ if (!ist->base && userspace_buf)
+ return -ENOENT;
+
+ /* We have SPIs but userspace isn't trying to save/restore them. */
+ if (ist->base && !userspace_buf)
+ return -EINVAL;
+
+ /* No SPIs and no userspace buffer: nothing to do. */
+ if (!ist->base && !userspace_buf)
+ return 0;
+
+ ist->present = true;
+ return 0;
+}
+
+static int vgic_v5_get_lpi_ist_desc(struct kvm *kvm,
+ struct vgic_v5_ist_desc *ist)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ bool guest_valid, host_valid;
+ int ret;
+
+ memset(ist, 0, sizeof(*ist));
+
+ ist->vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!ist->vmi))
+ return -ENXIO;
+
+ ret = vgic_v5_read_vm_ist_desc(kvm, GICV5_VMTEL2_LPI_SECTION, ist);
+ if (ret < 0)
+ return ret;
+
+ host_valid = ret;
+ guest_valid = kvm->arch.vgic.vgic_v5_irs_data->ist_baser.valid;
+ ist->base = ist->vmi->h_lpi_ist;
+
+ /* If there is no IST to save/restore, return without error. */
+ if (!guest_valid && !host_valid && !ist->base)
+ return 0;
+
+ /* Mismatched combination of valid state */
+ if (!guest_valid || !host_valid || !ist->base)
+ return -ENXIO;
+
+ if (ist->vmi->h_lpi_ist_structure && !ist->vmi->h_lpi_l2_ists)
+ return -ENXIO;
+
+ ist->present = true;
+ return 0;
+}
+
+/*
+ * Save the SPI IST to userspace-provided memory.
+ *
+ * Only the architected 32-bit ISTE state is exposed to userspace. Host
+ * metadata is skipped when striding through the linear host SPI IST.
+ */
+int vgic_v5_save_spi_ist(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+ u32 __user *uaddr = (u32 __user *)(unsigned long)attr->addr;
+ struct vgic_v5_ist_desc ist;
+ __le32 h_iste;
+ int ret;
+
+ ret = vgic_v5_get_spi_ist_desc(kvm, !!attr->addr, &ist);
+ if (ret || !ist.present)
+ return ret;
+
+ vgic_v5_clean_inval(ist.base,
+ GICV5_LINEAR_IST_SIZE(ist.id_bits, ist.istsz));
+
+ /* The host SPI IST is always linear. */
+ for (unsigned int i = 0; i < kvm->arch.vgic.nr_spis; ++i) {
+ /*
+ * Only the low 32 bits are saved. Any host metadata after the
+ * architected ISTE is skipped by the host ISTE stride.
+ */
+ __le32 *h_iste_addr = ist.base + i * ist.iste_size;
+
+ h_iste = READ_ONCE(*h_iste_addr);
+ ret = put_user(h_iste, uaddr);
+ if (ret)
+ return ret;
+
+ uaddr++;
+ }
+
+ return 0;
+}
+
+/*
+ * Save a Linear host LPI IST to guest memory.
+ *
+ * Only the architected 32-bit ISTE state is stored. Host metadata is skipped
+ * when striding through the host's LPI IST.
+ *
+ * The guest's LPI IST is always Linear.
+ */
+static int vgic_v5_save_linear_lpi_ist(struct kvm *kvm,
+ const struct vgic_v5_ist_desc *ist,
+ gpa_t g_entry_addr)
+{
+ size_t h_l2_index, h_l2_entries;
+ __le32 h_iste;
+ int ret;
+
+ h_l2_entries = BIT(ist->id_bits);
+
+ vgic_v5_clean_inval(ist->base,
+ GICV5_LINEAR_IST_SIZE(ist->id_bits, ist->istsz));
+
+ for (h_l2_index = 0; h_l2_index < h_l2_entries; h_l2_index++) {
+ __le32 *h_iste_addr = ist->base + h_l2_index * ist->iste_size;
+
+ h_iste = *h_iste_addr;
+ ret = vgic_write_guest_lock(kvm, g_entry_addr, &h_iste,
+ sizeof(h_iste));
+ if (ret)
+ return ret;
+
+ g_entry_addr += sizeof(h_iste);
+ }
+
+ return 0;
+}
+
+/*
+ * Save a Two-level host LPI IST to guest memory.
+ *
+ * Only the architected 32-bit ISTE state is stored. Host metadata is skipped
+ * when striding through the host's IST.
+ *
+ * The guest's LPI IST is always Linear.
+ */
+static int vgic_v5_save_two_level_lpi_ist(struct kvm *kvm,
+ const struct vgic_v5_ist_desc *ist,
+ gpa_t g_entry_addr)
+{
+ struct vgic_v5_two_level_ist_shape shape;
+ size_t h_l1_index, h_l2_index;
+ void *h_l2_ist_base;
+ __le32 h_iste;
+ int ret;
+
+ shape = vgic_v5_two_level_ist_shape(ist);
+
+ vgic_v5_clean_inval(ist->base,
+ shape.l1_entries * sizeof(*ist->vmi->h_lpi_ist));
+
+ for (h_l1_index = 0; h_l1_index < shape.l1_entries; h_l1_index++) {
+ u64 l1_iste;
+
+ /*
+ * Host L2 ISTs are preallocated. Any invalid L1 entry means the
+ * host IST state is inconsistent.
+ */
+ l1_iste = le64_to_cpu(READ_ONCE(ist->vmi->h_lpi_ist[h_l1_index]));
+ if (!FIELD_GET(GICV5_ISTL1E_VALID, l1_iste))
+ return -ENXIO;
+
+ h_l2_ist_base = ist->vmi->h_lpi_l2_ists[h_l1_index];
+ if (!h_l2_ist_base)
+ return -ENXIO;
+
+ vgic_v5_clean_inval(h_l2_ist_base,
+ shape.l2_entries * ist->iste_size);
+
+ for (h_l2_index = 0; h_l2_index < shape.l2_entries; h_l2_index++) {
+ h_iste = *(__le32 *)(h_l2_ist_base +
+ h_l2_index * ist->iste_size);
+
+ ret = vgic_write_guest_lock(kvm, g_entry_addr,
+ &h_iste, sizeof(h_iste));
+ if (ret)
+ return ret;
+
+ g_entry_addr += sizeof(__le32);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Save the LPI IST to guest memory.
+ *
+ * The guest LPI IST is exposed as a linear GPA range. The host LPI IST may be
+ * linear or two-level, so host iteration depends on the allocated host shape.
+ *
+ * Only the architected 32-bit ISTE state is saved. Host metadata is rebuilt on
+ * restore.
+ */
+int vgic_v5_save_lpi_ist(struct kvm *kvm)
+{
+ struct vgic_v5_ist_desc ist;
+ gpa_t g_entry_addr;
+ int ret;
+
+ ret = vgic_v5_get_lpi_ist_desc(kvm, &ist);
+ if (ret || !ist.present)
+ return ret;
+
+ /* The guest LPI IST is saved through its linear GPA range. */
+ g_entry_addr = kvm->arch.vgic.vgic_v5_irs_data->ist_baser.addr;
+
+ if (!ist.vmi->h_lpi_ist_structure)
+ return vgic_v5_save_linear_lpi_ist(kvm, &ist, g_entry_addr);
+
+ return vgic_v5_save_two_level_lpi_ist(kvm, &ist, g_entry_addr);
+}
+
+/*
+ * Track any SPIs and LPIs that were marked as pending at the point where the
+ * IST was restored.
+ *
+ * Restored pending state is cleared from the host IST and replayed with VDPEND
+ * before the VM first runs.
+ */
+static int vgic_v5_track_pending_irq(struct list_head *pending_irqs, u32 intid,
+ u32 type)
+{
+ struct vgic_v5_pending_irq *pirq;
+
+ pirq = kzalloc_obj(*pirq, GFP_KERNEL);
+ if (!pirq)
+ return -ENOMEM;
+
+ /* Encode the interrupt as a GICv5 IntID. */
+ pirq->irq = FIELD_PREP(GICV5_HWIRQ_TYPE, type) |
+ FIELD_PREP(GICV5_HWIRQ_ID, intid);
+
+ INIT_LIST_HEAD(&pirq->next);
+ list_add_tail(&pirq->next, pending_irqs);
+
+ return 0;
+}
+
+/*
+ * Process and sanitise each restored ISTE.
+ *
+ * HWU is for hardware use and must not survive migration. Pending state is
+ * tracked, cleared from the ISTE, and replayed before the VM first runs.
+ */
+static int vgic_v5_process_iste(__le32 *iste, struct list_head *pending_irqs,
+ u32 intid, u32 type)
+{
+ u32 iste_data = le32_to_cpu(READ_ONCE(*iste));
+ int ret;
+
+ /* Pending state is replayed later with VDPEND. */
+ if (iste_data & GICV5_ISTL2E_PENDING) {
+ ret = vgic_v5_track_pending_irq(pending_irqs, intid, type);
+ if (ret)
+ return ret;
+ }
+
+ iste_data &= ~GICV5_ISTL2E_PENDING;
+ iste_data &= ~GICV5_ISTL2E_HWU;
+
+ WRITE_ONCE(*iste, cpu_to_le32(iste_data));
+
+ return 0;
+}
+
+/*
+ * As part of restoring SPIs, sync back their handling modes to KVM. This is
+ * handled via the IRS's MMIO interface during normal operation, but we need to
+ * do this explicitly on restore.
+ */
+static void vgic_v5_restore_spi_config(struct kvm *kvm, __le32 iste, u32 spi)
+{
+ struct vgic_irq *irq;
+
+ irq = vgic_get_irq(kvm, vgic_v5_make_spi(spi));
+ if (WARN_ON_ONCE(!irq))
+ return;
+
+ scoped_guard(raw_spinlock_irqsave, &irq->irq_lock) {
+ if (le32_to_cpu(iste) & GICV5_ISTL2E_HM)
+ irq->config = VGIC_CONFIG_LEVEL;
+ else
+ irq->config = VGIC_CONFIG_EDGE;
+ }
+
+ vgic_put_irq(kvm, irq);
+}
+
+/*
+ * Restore the SPI IST from a userspace-provided buffer to the host-allocated IST.
+ *
+ * Userspace supplies only the architected 32-bit SPI ISTEs.
+ */
+int vgic_v5_restore_spi_ist(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+ u32 __user *uaddr = (u32 __user *)(unsigned long)attr->addr;
+ struct vgic_v5_ist_desc ist;
+ __le32 h_iste;
+ int ret;
+
+ ret = vgic_v5_get_spi_ist_desc(kvm, !!attr->addr, &ist);
+ if (ret || !ist.present)
+ return ret;
+
+ /*
+ * The saved SPI IST is linear and contains only architected 32-bit
+ * ISTEs. The host ISTE stride skips host metadata sections.
+ */
+ for (unsigned int i = 0; i < kvm->arch.vgic.nr_spis; i++) {
+ void *h_iste_addr = ist.base + i * ist.iste_size;
+
+ ret = get_user(h_iste, uaddr);
+ if (ret)
+ return ret;
+
+ /*
+ * Sanitise the ISTE, clearing the HWU and pending fields.
+ * Pending state is later replayed via GIC VDPEND.
+ */
+ ret = vgic_v5_process_iste(&h_iste, &ist.vmi->pending_irqs,
+ i, GICV5_HWIRQ_TYPE_SPI);
+ if (ret)
+ return ret;
+
+ /* Update KVM's SPI level/edge tracking to match the ISTE */
+ vgic_v5_restore_spi_config(kvm, h_iste, i);
+
+ /*
+ * Zero the full ISTE (including metadata), then write back
+ * only the architected 32-bit state.
+ */
+ memset(h_iste_addr, 0, ist.iste_size);
+ WRITE_ONCE(*(__le32 *)h_iste_addr, h_iste);
+ vgic_v5_clean_inval(h_iste_addr, ist.iste_size);
+
+ uaddr++;
+ }
+
+ return 0;
+}
+
+/*
+ * Restore the LPI IST from guest memory to the Linear host-allocated LPI IST.
+ *
+ * The guest LPI IST is restored from a linear GPA range.
+ *
+ * Only the lower 32 bits of each ISTE are restored.
+ */
+static int vgic_v5_restore_linear_lpi_ist(struct kvm *kvm,
+ const struct vgic_v5_ist_desc *ist,
+ gpa_t g_entry_addr)
+{
+ size_t h_l2_index, h_l2_entries;
+ __le32 h_iste;
+ int ret;
+
+ h_l2_entries = BIT(ist->id_bits);
+
+ for (h_l2_index = 0; h_l2_index < h_l2_entries; h_l2_index++) {
+ void *h_iste_addr = ist->base + h_l2_index * ist->iste_size;
+
+ ret = kvm_read_guest_lock(kvm, g_entry_addr, &h_iste,
+ sizeof(h_iste));
+ if (ret)
+ return ret;
+
+ /*
+ * Sanitise the ISTE, clearing the HWU and pending fields.
+ * Pending state is later replayed via GIC VDPEND.
+ */
+ ret = vgic_v5_process_iste(&h_iste, &ist->vmi->pending_irqs,
+ h_l2_index, GICV5_HWIRQ_TYPE_LPI);
+ if (ret)
+ return ret;
+
+ /*
+ * Zero the full ISTE (including metadata), then write back
+ * only the architected 32-bit state.
+ */
+ memset(h_iste_addr, 0, ist->iste_size);
+ WRITE_ONCE(*(__le32 *)h_iste_addr, h_iste);
+ vgic_v5_clean_inval(h_iste_addr, ist->iste_size);
+
+ g_entry_addr += sizeof(h_iste);
+ }
+
+ return 0;
+}
+
+/*
+ * Restore the LPI IST from guest memory to the Two-level host-allocated LPI
+ * IST.
+ *
+ * The guest LPI IST is restored from a linear GPA range.
+ *
+ * Only the lower 32 bits of each ISTE are restored.
+ */
+static int vgic_v5_restore_two_level_lpi_ist(struct kvm *kvm,
+ const struct vgic_v5_ist_desc *ist,
+ gpa_t g_entry_addr)
+{
+ struct vgic_v5_two_level_ist_shape shape;
+ size_t h_l1_index, h_l2_index;
+ void *h_l2_ist_base;
+ __le32 h_iste;
+ int ret;
+
+ shape = vgic_v5_two_level_ist_shape(ist);
+
+ vgic_v5_clean_inval(ist->vmi->h_lpi_ist,
+ shape.l1_entries * sizeof(*ist->vmi->h_lpi_ist));
+
+ for (h_l1_index = 0; h_l1_index < shape.l1_entries; ++h_l1_index) {
+ u64 l1_iste;
+
+ /*
+ * Host L2 ISTs are preallocated. Any invalid L1 entry means the
+ * host IST state is inconsistent.
+ */
+ l1_iste = le64_to_cpu(READ_ONCE(ist->vmi->h_lpi_ist[h_l1_index]));
+ if (!FIELD_GET(GICV5_ISTL1E_VALID, l1_iste))
+ return -ENXIO;
+
+ h_l2_ist_base = ist->vmi->h_lpi_l2_ists[h_l1_index];
+ if (!h_l2_ist_base)
+ return -ENXIO;
+
+ for (h_l2_index = 0; h_l2_index < shape.l2_entries; h_l2_index++) {
+ void *h_iste_addr = h_l2_ist_base +
+ h_l2_index * ist->iste_size;
+
+ ret = kvm_read_guest_lock(kvm, g_entry_addr,
+ &h_iste, sizeof(h_iste));
+ if (ret)
+ return ret;
+
+ /*
+ * Sanitise the ISTE, clearing the HWU and pending
+ * fields. Pending state is later replayed via GIC
+ * VDPEND.
+ */
+ ret = vgic_v5_process_iste(&h_iste, &ist->vmi->pending_irqs,
+ h_l1_index * shape.l2_entries + h_l2_index,
+ GICV5_HWIRQ_TYPE_LPI);
+ if (ret)
+ return ret;
+
+ /*
+ * Zero the full ISTE (including metadata), then write
+ * back only the architected 32-bit state.
+ */
+ memset(h_iste_addr, 0, ist->iste_size);
+ WRITE_ONCE(*(__le32 *)h_iste_addr, h_iste);
+ vgic_v5_clean_inval(h_iste_addr, ist->iste_size);
+
+ g_entry_addr += sizeof(h_iste);
+ }
+ }
+
+ return 0;
+}
+
+/*
+ * Restore the LPI IST from guest memory to the host-allocated LPI IST.
+ *
+ * The guest LPI IST is restored from a linear GPA range. The host LPI IST may
+ * be linear or two-level, so host iteration depends on the allocated host
+ * shape.
+ */
+int vgic_v5_restore_lpi_ist(struct kvm *kvm)
+{
+ struct vgic_v5_ist_desc ist;
+ gpa_t g_entry_addr;
+ int ret;
+
+ ret = vgic_v5_get_lpi_ist_desc(kvm, &ist);
+ if (ret || !ist.present)
+ return ret;
+
+ /* The guest LPI IST is restored through its linear GPA range. */
+ g_entry_addr = kvm->arch.vgic.vgic_v5_irs_data->ist_baser.addr;
+
+ if (!ist.vmi->h_lpi_ist_structure)
+ return vgic_v5_restore_linear_lpi_ist(kvm, &ist, g_entry_addr);
+
+ return vgic_v5_restore_two_level_lpi_ist(kvm, &ist, g_entry_addr);
+}
+
+/*
+ * Process the pending IRQs, removing them from the list and optionally
+ * injecting them.
+ */
+static int vgic_v5_process_pending_irqs(struct kvm *kvm, bool inject)
+{
+ u16 vm_id = vgic_v5_vm_id(kvm);
+ struct vgic_v5_vm_info *vmi;
+
+ vmi = xa_load(&vm_info, vm_id);
+ if (WARN_ON_ONCE(!vmi))
+ return -ENXIO;
+
+ vgic_v5_drain_pending_irqs(kvm, vmi, inject);
+
+ return 0;
+}
+
+/* Replay pending state that was cleared while restoring guest IST state. */
+int vgic_v5_restore_pending_irqs(struct kvm *kvm)
+{
+ return vgic_v5_process_pending_irqs(kvm, true);
+}
+
+/* Drop pending state collected by a failed IST restore. */
+void vgic_v5_discard_pending_irqs(struct kvm *kvm)
+{
+ vgic_v5_process_pending_irqs(kvm, false);
+}
diff --git a/arch/arm64/kvm/vgic/vgic-v5-tables.h b/arch/arm64/kvm/vgic/vgic-v5-tables.h
index 0ca0ae798dda6..ec54208e8825b 100644
--- a/arch/arm64/kvm/vgic/vgic-v5-tables.h
+++ b/arch/arm64/kvm/vgic/vgic-v5-tables.h
@@ -8,6 +8,7 @@
#include <linux/idr.h>
#include <linux/irqchip/arm-gic-v5.h>
+#include <linux/list.h>
/* Level 1 Virtual Machine Table Entry */
typedef __le64 vmtl1_entry;
@@ -43,6 +44,9 @@ struct vgic_v5_vm_info {
__le64 *h_lpi_ist;
__le64 **h_lpi_l2_ists;
__le64 *h_spi_ist;
+
+ /* Tracking of pending interrupts as part of IST restore */
+ struct list_head pending_irqs;
};
struct vgic_v5_vmt {
@@ -95,7 +99,15 @@ int vgic_v5_vmte_alloc_vpe(struct kvm_vcpu *vcpu);
int vgic_v5_vmte_free_vpe(struct kvm_vcpu *vcpu);
int vgic_v5_spi_ist_allocate(struct kvm *kvm, unsigned int id_bits);
+int vgic_v5_lpi_ist_exists(struct kvm *kvm);
int vgic_v5_lpi_ist_alloc(struct kvm *kvm, unsigned int id_bits);
int vgic_v5_lpi_ist_free(struct kvm *kvm);
+int vgic_v5_save_spi_ist(struct kvm *kvm, struct kvm_device_attr *attr);
+int vgic_v5_save_lpi_ist(struct kvm *kvm);
+int vgic_v5_restore_spi_ist(struct kvm *kvm, struct kvm_device_attr *attr);
+int vgic_v5_restore_lpi_ist(struct kvm *kvm);
+int vgic_v5_restore_pending_irqs(struct kvm *kvm);
+void vgic_v5_discard_pending_irqs(struct kvm *kvm);
+
#endif
diff --git a/arch/arm64/kvm/vgic/vgic-v5.c b/arch/arm64/kvm/vgic/vgic-v5.c
index 05fd10030da84..f89028082529a 100644
--- a/arch/arm64/kvm/vgic/vgic-v5.c
+++ b/arch/arm64/kvm/vgic/vgic-v5.c
@@ -8,6 +8,7 @@
#include <linux/bitops.h>
#include <linux/irqchip/arm-vgic-info.h>
#include <linux/irqdomain.h>
+#include <linux/kvm_host.h>
#include "vgic.h"
#include "vgic-v5-tables.h"
@@ -240,6 +241,17 @@ static int vgic_v5_irs_wait_for_vpe_op(void)
NULL);
}
+/*
+ * Wait for a write to IRS_SAVE_VMR to complete.
+ */
+static int vgic_v5_irs_wait_for_save_vm_op(u32 *statusr)
+{
+ return gicv5_wait_for_op_atomic(irs_caps.irs_base,
+ GICV5_IRS_SAVE_VM_STATUSR,
+ GICV5_IRS_SAVE_VM_STATUSR_IDLE,
+ statusr);
+}
+
static int vgic_v5_irs_write_vm_mmio_reg(u64 val, u32 offset)
{
int ret;
@@ -401,6 +413,27 @@ static int vgic_v5_irs_set_up_vpe(u16 vm_id, u16 vpe_id,
return 0;
}
+static int vgic_v5_irs_save_vm_op(u16 vm_id, bool save, u32 *statusr)
+{
+ u64 save_vmr;
+ int ret;
+
+ save_vmr = FIELD_PREP(GICV5_IRS_SAVE_VMR_VM_ID, vm_id);
+ save_vmr |= GICV5_IRS_SAVE_VMR_Q;
+ save_vmr |= FIELD_PREP(GICV5_IRS_SAVE_VMR_S, save);
+
+ guard(raw_spinlock_irqsave)(&global_irs_lock);
+
+ /* Make sure that we are idle to begin with. */
+ ret = vgic_v5_irs_wait_for_save_vm_op(NULL);
+ if (ret)
+ return ret;
+
+ irs_writeq_relaxed(save_vmr, GICV5_IRS_SAVE_VMR);
+
+ return vgic_v5_irs_wait_for_save_vm_op(statusr);
+}
+
static irqreturn_t db_handler(int irq, void *data)
{
struct kvm_vcpu *vcpu = data;
@@ -1212,6 +1245,46 @@ void vgic_v5_set_spi_ops(struct vgic_irq *irq)
irq->ops = &vgic_v5_spi_irq_ops;
}
+/*
+ * Rebuild the global SPI AP list after restoring the IST. Pending state is
+ * replayed directly to the IRS, so read the restored hardware state back before
+ * deciding whether an SPI must be tracked by KVM.
+ */
+static void vgic_v5_restore_spi_ap_list(struct kvm *kvm)
+{
+ struct vgic_dist *dist = &kvm->arch.vgic;
+
+ for (unsigned int i = 0; i < dist->nr_spis; i++) {
+ struct vgic_irq *irq = vgic_get_irq(kvm, vgic_v5_make_spi(i));
+ unsigned long flags;
+ bool pending;
+ u64 icsr;
+
+ if (WARN_ON_ONCE(!irq))
+ continue;
+
+ raw_spin_lock_irqsave(&irq->irq_lock, flags);
+
+ icsr = kvm_call_hyp_ret(__vgic_v5_vdrcfg, irq->intid);
+ irq->active = !!FIELD_GET(ICC_ICSR_EL1_Active, icsr);
+ pending = !!FIELD_GET(ICC_ICSR_EL1_Pending, icsr);
+
+ if (irq->config == VGIC_CONFIG_EDGE)
+ irq->pending_latch = pending;
+
+ if (irq->config == VGIC_CONFIG_LEVEL &&
+ !(pending || irq->active))
+ irq->pending_latch = false;
+
+ if (irq->active || pending)
+ vgic_v5_spi_queue_irq_unlock(kvm, irq, flags);
+ else
+ raw_spin_unlock_irqrestore(&irq->irq_lock, flags);
+
+ vgic_put_irq(kvm, irq);
+ }
+}
+
/* Set the pending state for GICv5 SPIs and LPIs */
void vgic_v5_set_irq_pend(struct kvm_vcpu *vcpu, struct vgic_irq *irq)
{
@@ -1353,3 +1426,216 @@ void vgic_v5_save_state(struct kvm_vcpu *vcpu)
__vgic_v5_save_ppi_state(cpu_if);
dsb(sy);
}
+
+static int vgic_v5_irs_status_is_quiesced(u32 statusr)
+{
+ if (statusr & GICV5_IRS_SAVE_VM_STATUSR_Q)
+ return 0;
+
+ return -EBUSY;
+}
+
+static int vgic_v5_irs_is_quiesced(u16 vm_id)
+{
+ u32 statusr;
+ int ret;
+
+ ret = vgic_v5_irs_save_vm_op(vm_id, false, &statusr);
+ if (ret)
+ return ret;
+
+ return vgic_v5_irs_status_is_quiesced(statusr);
+}
+
+int vgic_v5_irs_save_ists(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+ int ret = 0;
+ u32 statusr;
+ u16 vm_id = vgic_v5_vm_id(kvm);
+
+ mutex_lock(&kvm->lock);
+
+ if (kvm_trylock_all_vcpus(kvm)) {
+ mutex_unlock(&kvm->lock);
+ return -EBUSY;
+ }
+
+ mutex_lock(&kvm->arch.config_lock);
+
+ if (!vgic_initialized(kvm)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ ret = vgic_v5_irs_save_vm_op(vm_id, true, &statusr);
+ if (ret) {
+ kvm_err("Failed to save GICv5 IRS VM state: %d\n", ret);
+ goto out_unlock;
+ }
+
+ ret = vgic_v5_irs_status_is_quiesced(statusr);
+ if (ret)
+ goto out_unlock;
+
+ /* Save the SPI IST to the userspace buffer. */
+ ret = vgic_v5_save_spi_ist(kvm, attr);
+ if (ret)
+ goto out_unlock;
+
+ ret = vgic_v5_irs_is_quiesced(vm_id);
+ if (ret)
+ goto out_unlock;
+
+ /* Save the LPI IST to guest memory. */
+ ret = vgic_v5_save_lpi_ist(kvm);
+ if (ret)
+ goto out_unlock;
+
+ ret = vgic_v5_irs_is_quiesced(vm_id);
+ if (ret)
+ goto out_unlock;
+
+out_unlock:
+ mutex_unlock(&kvm->arch.config_lock);
+ kvm_unlock_all_vcpus(kvm);
+ mutex_unlock(&kvm->lock);
+
+ return ret;
+}
+
+static int vgic_v5_restore_lpi_ist_alloc(struct kvm *kvm, bool *allocated)
+{
+ unsigned int id_bits;
+ int ret;
+
+ *allocated = false;
+
+ ret = vgic_v5_irs_lpi_ist_id_bits(kvm, &id_bits);
+ if (ret <= 0)
+ return ret;
+
+ ret = vgic_v5_lpi_ist_alloc(kvm, id_bits);
+ if (ret)
+ return ret;
+
+ *allocated = true;
+
+ return 0;
+}
+
+/*
+ * Clean up the LPI IST if we allocated it, and restore the VMTE to the
+ * original, valid state.
+ */
+static void vgic_v5_restore_cleanup(struct kvm *kvm,
+ struct kvm_vcpu *vcpu,
+ bool lpi_ist_allocated)
+{
+ if (lpi_ist_allocated) {
+ WARN_ON(vgic_v5_send_command(vcpu, VMTE_MAKE_INVALID));
+ WARN_ON(vgic_v5_lpi_ist_free(kvm));
+ }
+
+ WARN_ON(vgic_v5_send_command(vcpu, VMTE_MAKE_VALID));
+}
+
+int vgic_v5_irs_restore_ists(struct kvm *kvm, struct kvm_device_attr *attr)
+{
+ bool lpi_ist_allocated = false, vmte_invalid = false;
+ struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0);
+ int ret = 0;
+
+ mutex_lock(&kvm->lock);
+
+ if (kvm_trylock_all_vcpus(kvm)) {
+ mutex_unlock(&kvm->lock);
+ return -EBUSY;
+ }
+
+ mutex_lock(&kvm->arch.config_lock);
+
+ if (!vgic_initialized(kvm)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ if (kvm_vm_has_ran_once(kvm)) {
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ ret = vgic_v5_lpi_ist_exists(kvm);
+ if (ret) {
+ if (ret > 0)
+ ret = -EBUSY;
+ goto out_unlock;
+ }
+
+ /*
+ * If the guest has previously allocated an IST (which we check based on
+ * the IRS_IST_BASER), extract the number of LPI ID bits from the
+ * IRS_IST_CFGR. Else, do nothing.
+ *
+ * We do this before making the VMTE invalid as we rely on
+ * IRS_VMAP_VISTR to mark the IST as valid in the VMTE. This can only
+ * happen while the VMTE is valid.
+ */
+ ret = vgic_v5_restore_lpi_ist_alloc(kvm, &lpi_ist_allocated);
+ if (ret)
+ goto out_unlock;
+
+ /*
+ * Host ISTs are updated while the VMTE is invalid, so the GIC cannot
+ * observe partially restored state.
+ */
+ ret = vgic_v5_send_command(vcpu0, VMTE_MAKE_INVALID);
+ if (ret) {
+ /*
+ * If invalidation fails, the restore cannot safely update host
+ * IST state.
+ */
+ goto out_unlock;
+ }
+ vmte_invalid = true;
+
+ /* Restore the SPI IST from the userspace buffer. */
+ ret = vgic_v5_restore_spi_ist(kvm, attr);
+ if (ret)
+ goto out_unlock;
+
+ /* Restore the LPI IST from guest memory. */
+ if (lpi_ist_allocated) {
+ ret = vgic_v5_restore_lpi_ist(kvm);
+ if (ret)
+ goto out_unlock;
+ }
+
+ /* And make the VM valid again. */
+ ret = vgic_v5_send_command(vcpu0, VMTE_MAKE_VALID);
+ if (ret)
+ goto out_unlock;
+ vmte_invalid = false;
+
+ /*
+ * As part of restoring the ISTs, any previously pending interrupts have
+ * been tracked and made non-pending. Now that the ISTs have been
+ * restored and the VM is valid again, restore the pending interrupts.
+ */
+ ret = vgic_v5_restore_pending_irqs(kvm);
+ if (ret)
+ goto out_unlock;
+
+ vgic_v5_restore_spi_ap_list(kvm);
+
+out_unlock:
+ if (ret && (vmte_invalid || lpi_ist_allocated)) {
+ vgic_v5_discard_pending_irqs(kvm);
+ vgic_v5_restore_cleanup(kvm, vcpu0, lpi_ist_allocated);
+ }
+
+ mutex_unlock(&kvm->arch.config_lock);
+ kvm_unlock_all_vcpus(kvm);
+ mutex_unlock(&kvm->lock);
+
+ return ret;
+}
diff --git a/arch/arm64/kvm/vgic/vgic.h b/arch/arm64/kvm/vgic/vgic.h
index e05b4a5c2e49b..9c140a54e840e 100644
--- a/arch/arm64/kvm/vgic/vgic.h
+++ b/arch/arm64/kvm/vgic/vgic.h
@@ -384,11 +384,14 @@ void vgic_v5_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr);
void vgic_v5_restore_state(struct kvm_vcpu *vcpu);
void vgic_v5_save_state(struct kvm_vcpu *vcpu);
int vgic_v5_register_irs_iodev(struct kvm *kvm, gpa_t irs_base_address);
+int vgic_v5_irs_lpi_ist_id_bits(struct kvm *kvm, unsigned int *id_bits);
int vgic_v5_cpu_sysregs_uaccess(struct kvm_vcpu *vcpu,
struct kvm_device_attr *attr, bool is_write);
int vgic_v5_has_cpu_sysregs_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr);
const struct sys_reg_desc *vgic_v5_get_sysreg_table(unsigned int *sz);
+int vgic_v5_irs_save_ists(struct kvm *kvm, struct kvm_device_attr *attr);
+int vgic_v5_irs_restore_ists(struct kvm *kvm, struct kvm_device_attr *attr);
int vgic_v5_irs_attr_regs_access(struct kvm_device *dev,
struct kvm_device_attr *attr,
u64 *reg, bool is_write);
diff --git a/tools/arch/arm64/include/uapi/asm/kvm.h b/tools/arch/arm64/include/uapi/asm/kvm.h
index 710a0d267347d..1b9bbeab18a4e 100644
--- a/tools/arch/arm64/include/uapi/asm/kvm.h
+++ b/tools/arch/arm64/include/uapi/asm/kvm.h
@@ -423,6 +423,7 @@ enum {
#define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ 9
#define KVM_DEV_ARM_VGIC_GRP_IRS_REGS 10
+#define KVM_DEV_ARM_VGIC_GRP_IST 11
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT 10
#define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
--
2.34.1
* [PATCH v2 36/39] Documentation: KVM: Document KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS for VGICv5
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (34 preceding siblings ...)
2026-05-21 15:01 ` [PATCH v2 35/39] KVM: arm64: gic-v5: Implement save/restore mechanisms for ISTs Sascha Bischoff
@ 2026-05-21 15:01 ` Sascha Bischoff
2026-05-21 15:01 ` [PATCH v2 37/39] Documentation: KVM: Add KVM_DEV_ARM_VGIC_GRP_IRS_REGS to VGICv5 docs Sascha Bischoff
` (2 subsequent siblings)
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:01 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
The virtual GICv5 adopts the same mechanism as GICv3 for userspace
reads and writes of the system registers, albeit operating on a
different set of registers.
Document KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS for GICv5 in the VGICv5
documentation, explicitly calling out the registers it operates
on. The main body of documentation has been directly copied from the
VGICv3 documentation as it has identical operation.
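As a hedged illustration of the attr layout documented by this patch
(affinity-based MPIDR in bits 63:32, RES bits 31:16 left zero, and the
A64 Op0/Op1/CRn/CRm/Op2 encoding in bits 15:0), a VMM could pack the
value as sketched below. The helper and macro names are hypothetical and
are not part of the KVM UAPI; the shifts are read off the documented bit
layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; field positions follow the documented attr layout. */
#define SYSREG_ATTR_MPIDR_SHIFT 32
#define SYSREG_OP0_SHIFT        14
#define SYSREG_OP1_SHIFT        11
#define SYSREG_CRN_SHIFT        7
#define SYSREG_CRM_SHIFT        3
#define SYSREG_OP2_SHIFT        0

/* Pack the 16-bit "instr" field from the A64 system register encoding. */
static uint16_t sysreg_instr(unsigned int op0, unsigned int op1,
                             unsigned int crn, unsigned int crm,
                             unsigned int op2)
{
    assert(op0 <= 3 && op1 <= 7 && crn <= 15 && crm <= 15 && op2 <= 7);
    return (uint16_t)((op0 << SYSREG_OP0_SHIFT) | (op1 << SYSREG_OP1_SHIFT) |
                      (crn << SYSREG_CRN_SHIFT) | (crm << SYSREG_CRM_SHIFT) |
                      (op2 << SYSREG_OP2_SHIFT));
}

/* Aff3..Aff0 go to bits 63:32, RES bits 31:16 stay zero, instr in 15:0. */
static uint64_t sysreg_attr(uint8_t aff3, uint8_t aff2, uint8_t aff1,
                            uint8_t aff0, uint16_t instr)
{
    uint64_t mpidr = ((uint64_t)aff3 << 24) | ((uint64_t)aff2 << 16) |
                     ((uint64_t)aff1 << 8) | aff0;

    return (mpidr << SYSREG_ATTR_MPIDR_SHIFT) | instr;
}
```

The resulting value would go in kvm_device_attr.attr, with
kvm_device_attr.addr pointing at a __u64 holding the register value.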
One key thing to note is that for two sets of GICv5 registers, those
pertaining to Active and Pending state, the interface behaves
differently from the actual registers. In hardware, both sets have C
and S variants (to clear and set bits, respectively). For this
interface, however, we ONLY implement the S variant, AND treat a write
to it as a raw write of the underlying state. This simplifies reading
and writing the state.
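To make the difference concrete, the sketch below models a 64-bit
Pending (or Active) state word under both semantics: hardware S/C
registers that set or clear bits, and the userspace S-register access
that overwrites the state. The function names are hypothetical; this is
a model of the documented behaviour, not KVM code.

```c
#include <assert.h>
#include <stdint.h>

/* Hardware semantics: S registers set bits, C registers clear bits. */
static uint64_t hw_write_set(uint64_t state, uint64_t val)
{
    return state | val;
}

static uint64_t hw_write_clear(uint64_t state, uint64_t val)
{
    return state & ~val;
}

/* Userspace semantics: only S is exposed, and a write replaces the state. */
static uint64_t uaccess_write_s(uint64_t state, uint64_t val)
{
    (void)state; /* the previous state is irrelevant for a raw write */
    return val;
}

/* Restoring a saved value with hardware semantics needs both registers. */
static uint64_t hw_restore(uint64_t state, uint64_t saved)
{
    state = hw_write_clear(state, ~saved); /* clear bits not in saved */
    return hw_write_set(state, saved);     /* set bits that are in saved */
}
```

With hardware semantics, restoring an arbitrary saved value takes a
clear followed by a set; with the raw S write it is a single access.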
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
.../virt/kvm/devices/arm-vgic-v5.rst | 66 +++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v5.rst b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
index 5c6323d82f784..e2045b09f27d0 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
@@ -75,3 +75,69 @@ Groups:
-EFAULT Invalid guest ram access
-EBUSY One or more VCPUS are running
======= ========================================================
+
+ KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS
+ Attributes:
+
+ The attr field of kvm_device_attr encodes two values::
+
+ bits: | 63 .... 32 | 31 .... 16 | 15 .... 0 |
+ values: | mpidr | RES | instr |
+
+ The mpidr field encodes the CPU ID based on the affinity information in the
+ architecture defined MPIDR, and the field is encoded as follows::
+
+ | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 |
+ | Aff3 | Aff2 | Aff1 | Aff0 |
+
+ The instr field encodes the system register to access based on the fields
+ defined in the A64 instruction set encoding for system register access
+ (RES means the bits are reserved for future use and should be zero)::
+
+ | 15 ... 14 | 13 ... 11 | 10 ... 7 | 6 ... 3 | 2 ... 0 |
+ | Op 0 | Op1 | CRn | CRm | Op2 |
+
+ All system regs accessed through this API are (rw, 64-bit) and
+ kvm_device_attr.addr points to a __u64 value.
+
+ KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS accesses the CPU interface registers for the
+ CPU specified by the mpidr field.
+
+ The available registers are:
+
+ ======================= ===================================================
+ ICC_ICSR_EL1
+ ICC_PPI_ENABLER0_EL1
+ ICC_PPI_ENABLER1_EL1
+ ICC_PPI_SACTIVER0_EL1 ICC_PPI_CACTIVER0_EL1 is not supported. Writes to
+ ICC_PPI_SACTIVER0_EL1 are treated as RAW writes of
+ the underlying state.
+ ICC_PPI_SACTIVER1_EL1 ICC_PPI_CACTIVER1_EL1 is not supported. Writes to
+ ICC_PPI_SACTIVER1_EL1 are treated as RAW writes of
+ the underlying state.
+ ICC_PPI_SPENDR0_EL1 ICC_PPI_CPENDR0_EL1 is not supported. Writes to
+ ICC_PPI_SPENDR0_EL1 are treated as RAW writes of
+ the underlying state.
+ ICC_PPI_SPENDR1_EL1 ICC_PPI_CPENDR1_EL1 is not supported. Writes to
+ ICC_PPI_SPENDR1_EL1 are treated as RAW writes of
+ the underlying state.
+ ICC_PPI_PRIORITYR0_EL1
+ ICC_PPI_PRIORITYR1_EL1
+ ICC_PPI_PRIORITYR2_EL1
+ ICC_PPI_PRIORITYR3_EL1
+ ICC_PPI_PRIORITYR4_EL1
+ ICC_PPI_PRIORITYR5_EL1
+ ICC_PPI_PRIORITYR6_EL1
+ ICC_PPI_PRIORITYR7_EL1
+ ICC_PPI_PRIORITYR8_EL1
+ ICC_PPI_PRIORITYR9_EL1
+ ICC_PPI_PRIORITYR10_EL1
+ ICC_PPI_PRIORITYR11_EL1
+ ICC_PPI_PRIORITYR12_EL1
+ ICC_PPI_PRIORITYR13_EL1
+ ICC_PPI_PRIORITYR14_EL1
+ ICC_PPI_PRIORITYR15_EL1
+ ICC_APR_EL1
+ ICC_CR0_EL1
+ ICC_PCR_EL1
+ ======================= ===================================================
--
2.34.1
* [PATCH v2 37/39] Documentation: KVM: Add KVM_DEV_ARM_VGIC_GRP_IRS_REGS to VGICv5 docs
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (35 preceding siblings ...)
2026-05-21 15:01 ` [PATCH v2 36/39] Documentation: KVM: Document KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS for VGICv5 Sascha Bischoff
@ 2026-05-21 15:01 ` Sascha Bischoff
2026-05-21 15:02 ` [PATCH v2 38/39] Documentation: KVM: Add docs for KVM_DEV_ARM_VGIC_GRP_IST Sascha Bischoff
2026-05-21 15:02 ` [PATCH v2 39/39] Documentation: KVM: Add the VGICv5 IRS save/restore sequences Sascha Bischoff
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:01 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Document the KVM_DEV_ARM_VGIC_GRP_IRS_REGS attribute group used to
read and write the virtual IRS's MMIO register state. This provides a
GICv5-specific interface for state that is conceptually similar to the
VGICv3 ITS register interface, but uses IRS terminology instead of ITS.
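For illustration only, the sketch below mirrors the documented attr
rules: attr is an offset from the IRS CONFIG_FRAME base, offsets must be
naturally aligned to the register width (-EINVAL), and unknown offsets
are rejected (-ENXIO). The register table uses HYPOTHETICAL placeholder
offsets, not values from the GICv5 specification, and KVM's real lookup
may perform its checks in a different order.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct irs_reg_desc {
    const char *name;
    uint32_t offset;    /* relative to the IRS CONFIG_FRAME base */
    unsigned int width; /* register width in bytes: 4 or 8 */
};

/* HYPOTHETICAL placeholder offsets, not taken from the GICv5 spec. */
static const struct irs_reg_desc irs_regs[] = {
    { "IRS_IDR0",      0x0000, 4 },
    { "IRS_IST_BASER", 0x0180, 8 },
    { "IRS_IST_CFGR",  0x0190, 4 },
};

/* Model of the documented error rules for KVM_DEV_ARM_VGIC_GRP_IRS_REGS. */
static int irs_regs_check_offset(uint64_t offset,
                                 const struct irs_reg_desc **out)
{
    for (size_t i = 0; i < sizeof(irs_regs) / sizeof(irs_regs[0]); i++) {
        const struct irs_reg_desc *r = &irs_regs[i];

        if (offset < r->offset || offset >= (uint64_t)r->offset + r->width)
            continue;
        /* Offsets must be naturally aligned to the register width. */
        if (offset & (r->width - 1))
            return -EINVAL;
        *out = r;
        return 0;
    }
    return -ENXIO; /* no supported register at this offset */
}
```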
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
.../virt/kvm/devices/arm-vgic-v5.rst | 36 +++++++++++++++++++
1 file changed, 36 insertions(+)
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v5.rst b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
index e2045b09f27d0..217a1ecfbdc5f 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
@@ -141,3 +141,39 @@ Groups:
ICC_CR0_EL1
ICC_PCR_EL1
======================= ===================================================
+
+ KVM_DEV_ARM_VGIC_GRP_IRS_REGS
+ Attributes:
+ The attr field of kvm_device_attr encodes the offset of the IRS register,
+ relative to the IRS CONFIG_FRAME base address. This is the address that
+ was provided via KVM_VGIC_V5_ADDR_TYPE_IRS when creating VGICv5 in the
+ first place.
+
+ kvm_device_attr.addr points to a __u64 value whatever the width
+ of the addressed register (32/64 bits). 64 bit registers can only
+ be accessed with full length.
+
+ Writes to read-only registers are ignored by the kernel except for:
+
+ - IRS_IDR0 - IRS_IDR2 and IRS_IDR5 - IRS_IDR7: These are sanity checked to
+ ensure that they match a sane config.
+ - IRS_IDR3 and IRS_IDR4: These are RAZ/WI as nested virtualization is not
+ supported.
+
+ For registers without dedicated userspace accessors, getting or setting a
+ register uses the same emulated MMIO handlers as guest reads/writes.
+ Dedicated userspace accessors may instead save or restore migration state
+ without triggering guest-visible side effects. For example, restoring
+ IRS_IST_BASER only restores the emulated register state; any host LPI IST
+ allocation based on the restored IRS_IST_CFGR and IRS_IST_BASER state
+ happens when KVM_DEV_ARM_VGIC_GRP_IST is restored.
+
+ Errors:
+
+ ======= =================================================================
+ -ENXIO Offset does not correspond to any supported register
+ -EFAULT Invalid user pointer for attr->addr
+ -EINVAL Offset is not 32-bit aligned for 32-bit MMIO registers, or not
+ 64-bit aligned for 64-bit registers
+ -EBUSY VGIC is not initialized, or one or more VCPUs are running
+ ======= =================================================================
--
2.34.1
* [PATCH v2 38/39] Documentation: KVM: Add docs for KVM_DEV_ARM_VGIC_GRP_IST
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (36 preceding siblings ...)
2026-05-21 15:01 ` [PATCH v2 37/39] Documentation: KVM: Add KVM_DEV_ARM_VGIC_GRP_IRS_REGS to VGICv5 docs Sascha Bischoff
@ 2026-05-21 15:02 ` Sascha Bischoff
2026-05-21 15:02 ` [PATCH v2 39/39] Documentation: KVM: Add the VGICv5 IRS save/restore sequences Sascha Bischoff
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:02 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
Document the IST save/restore userspace interface for the VGICv5
device, KVM_DEV_ARM_VGIC_GRP_IST.
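As a minimal sketch of the SPI IST buffer format documented in this
patch (one little-endian 32-bit entry per SPI, in SPI number order,
nr_spis * sizeof(__u32) bytes in total, with nr_spis taken from
KVM_DEV_ARM_VGIC_GRP_NR_IRQS), a VMM could size and access the buffer
as follows. The helper names are hypothetical, and the entry contents
are treated as opaque.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One little-endian 32-bit IST entry per SPI, in SPI number order. */
static size_t spi_ist_buf_size(uint32_t nr_spis)
{
    return (size_t)nr_spis * sizeof(uint32_t);
}

/* Read the entry for @spi, independent of host endianness. */
static uint32_t spi_ist_get_entry(const uint8_t *buf, uint32_t spi)
{
    const uint8_t *p = buf + (size_t)spi * sizeof(uint32_t);

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write the entry for @spi in little-endian byte order. */
static void spi_ist_set_entry(uint8_t *buf, uint32_t spi, uint32_t val)
{
    uint8_t *p = buf + (size_t)spi * sizeof(uint32_t);

    p[0] = val & 0xff;
    p[1] = (val >> 8) & 0xff;
    p[2] = (val >> 16) & 0xff;
    p[3] = (val >> 24) & 0xff;
}
```

The same buffer, passed through kvm_device_attr.addr, serves both the
get (save) and set (restore) directions.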
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
.../virt/kvm/devices/arm-vgic-v5.rst | 55 +++++++++++++++++++
1 file changed, 55 insertions(+)
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v5.rst b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
index 217a1ecfbdc5f..0ee0fe9308fc9 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
@@ -177,3 +177,58 @@ Groups:
64-bit aligned for 64-bit registers
-EBUSY VGIC is not initialized, or one or more VCPUs are running
======= =================================================================
+
+ KVM_DEV_ARM_VGIC_GRP_IST
+ Attributes:
+ This interface is used to either save the state of the IRS's Interrupt
+ State Tables (ISTs), or to restore them. A get operation saves IST state,
+ and a set operation restores IST state. kvm_device_attr.attr is reserved
+ and must be zero.
+
+ The VGIC must be initialized before using this interface. Restore must be
+ performed before the VM has run. For restore, userspace must have already
+ restored the IRS state and guest memory needed to describe and back any
+ guest LPI IST.
+
+ Saving first asks the IRS to save and quiesce the VM so that interrupt
+ state has been written back to the ISTs. KVM checks that the VM remains
+ quiesced while copying out the SPI and LPI IST state.
+
+ The LPI IST is written to or read from guest-allocated memory. KVM assumes
+ that the guest has provisioned a linear virtual IST through IRS_IST_CFGR
+ and IRS_IST_BASER, and uses that guest memory as the LPI IST migration
+ storage. If the guest has not enabled an LPI IST, there is no LPI IST
+ state to save or restore.
+
+ The SPI IST has no guest-owned backing memory, so userspace must provide a
+ buffer through kvm_device_attr.addr for both get and set operations. The
+ buffer contains one little-endian 32-bit IST entry per exposed SPI, in SPI
+ number order. Its size is::
+
+ nr_spis * sizeof(__u32)
+
+ where nr_spis is the value returned by KVM_DEV_ARM_VGIC_GRP_NR_IRQS for
+ the VGICv5 device. For VGICv5 this value is the number of SPIs, not the
+ total number of interrupts. Since VGICv5 currently exposes at least 32
+ SPIs, kvm_device_attr.addr must be non-zero.
+
+ Errors:
+
+ =========== ============================================================
+ -EBUSY One or more VCPUs are running, the VGIC is not initialized,
+ restore was requested after the VM has run, an LPI IST
+ already exists, or the save operation completed but the VM
+ did not remain quiesced
+ -EINVAL A userspace SPI IST buffer was not supplied when one is
+ required, or an internal VM table operation rejected the VM
+ state
+ -ENOENT A userspace SPI IST buffer was supplied, but there is no SPI
+ IST to serialise/deserialise
+ -EFAULT Invalid user pointer for attr->addr, or the guest memory
+ backing the LPI IST could not be accessed
+ -ENXIO Required per-VM VGICv5/IST backing state is missing or
+ inconsistent
+ -ENOMEM Restoring IST state failed while allocating the host LPI IST
+ or tracking pending interrupts
+ -ETIMEDOUT An IRS save/VM operation timed out
+ =========== ============================================================
--
2.34.1
* [PATCH v2 39/39] Documentation: KVM: Add the VGICv5 IRS save/restore sequences
2026-05-21 14:49 [PATCH v2 00/39] KVM: arm64: Add GICv5 IRS support Sascha Bischoff
` (37 preceding siblings ...)
2026-05-21 15:02 ` [PATCH v2 38/39] Documentation: KVM: Add docs for KVM_DEV_ARM_VGIC_GRP_IST Sascha Bischoff
@ 2026-05-21 15:02 ` Sascha Bischoff
38 siblings, 0 replies; 42+ messages in thread
From: Sascha Bischoff @ 2026-05-21 15:02 UTC (permalink / raw)
To: linux-arm-kernel@lists.infradead.org, kvmarm@lists.linux.dev,
kvm@vger.kernel.org
Cc: nd, maz@kernel.org, oliver.upton@linux.dev, Joey Gouly,
Suzuki Poulose, yuzenghui@huawei.com, peter.maydell@linaro.org,
lpieralisi@kernel.org, Timothy Hayes
When saving/restoring the state of the GICv5 IRS, it is important that
it happens in the correct order. Failing to do so will almost
certainly result in a restored guest that cannot handle interrupts
correctly.
On a save, the ISTs must be saved prior to saving the guest's memory
as the guest's LPI IST is written to guest memory. Conversely, on
restore the guest's memory must be restored prior to restoring the
ISTs.
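The save-side rule can be expressed as a simple check over the order in
which a VMM performs its steps. This is a hedged sketch: the step names
are hypothetical labels, not KVM identifiers, and it encodes only the
constraint stated in this series (ISTs before guest memory), leaving
the IRS register state free to be saved in either order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels for a VMM's save path. */
enum save_step { SAVE_IRS_REGS, SAVE_ISTS, SAVE_GUEST_MEMORY };

/*
 * Saving the ISTs writes the LPI IST back into guest memory, so guest
 * memory must only be serialised after SAVE_ISTS has been performed.
 */
static bool save_order_ok(const enum save_step *steps, size_t n)
{
    bool ists_saved = false;

    for (size_t i = 0; i < n; i++) {
        if (steps[i] == SAVE_ISTS)
            ists_saved = true;
        else if (steps[i] == SAVE_GUEST_MEMORY && !ists_saved)
            return false;
    }
    return ists_saved;
}
```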
It is important to restore the IRS MMIO registers by first restoring
the IRS_IDRx registers, as they define the capabilities of the IRS and
are used when creating and managing ISTs and SPIs.
In order to restore the ISTs themselves, the IRS_IST_CFGR must be
restored prior to the IRS_IST_BASER. KVM uses these restored registers
when KVM_DEV_ARM_VGIC_GRP_IST is restored to determine whether a guest
LPI IST exists, how large it must be, and where the guest-provided
migration storage lives. The host LPI IST is allocated and populated
as part of restoring KVM_DEV_ARM_VGIC_GRP_IST.
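Likewise, the restore-side constraints described in this series (guest
memory before the IST restore, IRS_IDRx first, IRS_IST_CFGR before
IRS_IST_BASER, and all IRS register state before
KVM_DEV_ARM_VGIC_GRP_IST) can be captured as pairwise ordering rules. A
sketch under those assumptions; the step names are hypothetical and the
table encodes only the rules stated here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step labels for a VMM's restore path. */
enum restore_step {
    RESTORE_GUEST_MEMORY,
    RESTORE_IRS_IDRS,       /* IRS_IDRx: capabilities, restored first */
    RESTORE_IST_CFGR,
    RESTORE_IST_BASER,
    RESTORE_OTHER_IRS_REGS,
    RESTORE_ISTS,           /* KVM_SET_DEVICE_ATTR on GRP_IST */
    NR_RESTORE_STEPS,
};

/* Each pair is { must happen first, must happen later }. */
static const enum restore_step restore_deps[][2] = {
    { RESTORE_IRS_IDRS,       RESTORE_IST_CFGR },
    { RESTORE_IRS_IDRS,       RESTORE_OTHER_IRS_REGS },
    { RESTORE_IST_CFGR,       RESTORE_IST_BASER },
    { RESTORE_IST_BASER,      RESTORE_ISTS },
    { RESTORE_OTHER_IRS_REGS, RESTORE_ISTS },
    { RESTORE_GUEST_MEMORY,   RESTORE_ISTS },
};

static bool restore_order_ok(const enum restore_step *steps, size_t n)
{
    int pos[NR_RESTORE_STEPS];

    /* Record where each step first appears (-1 if it never does). */
    for (int s = 0; s < NR_RESTORE_STEPS; s++)
        pos[s] = -1;
    for (size_t i = 0; i < n; i++)
        if (pos[steps[i]] < 0)
            pos[steps[i]] = (int)i;

    for (size_t d = 0; d < sizeof(restore_deps) / sizeof(restore_deps[0]); d++) {
        int before = pos[restore_deps[d][0]];
        int after = pos[restore_deps[d][1]];

        /* Every step must be present, and in the required order. */
        if (before < 0 || after < 0 || before > after)
            return false;
    }
    return true;
}
```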
At this stage the remaining MMIO registers can be restored. The SPI
IST gets extracted from a userspace provided buffer, and is
transferred to the host-allocated SPI IST. The LPI IST is extracted
from guest memory, and is written to the host-allocated LPI IST.
As a general rule, the IRS_*_STATUSR registers can be ignored on
restore. They are not userspace writable.
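Since the IRS_*_STATUSR registers are not userspace writable, a VMM may
simply skip them when replaying saved IRS register state. A minimal
sketch, assuming (hypothetically) that the VMM keys its saved registers
by name; a real VMM would more likely key them by offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Return true if a saved IRS register should be replayed on restore.
 * Registers whose names end in "_STATUSR" are skipped.
 */
static bool irs_reg_needs_restore(const char *name)
{
    static const char suffix[] = "_STATUSR";
    size_t len = strlen(name);
    size_t slen = sizeof(suffix) - 1;

    return !(len >= slen && strcmp(name + len - slen, suffix) == 0);
}
```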
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
.../virt/kvm/devices/arm-vgic-v5.rst | 45 +++++++++++++++++++
1 file changed, 45 insertions(+)
diff --git a/Documentation/virt/kvm/devices/arm-vgic-v5.rst b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
index 0ee0fe9308fc9..188851f22f9eb 100644
--- a/Documentation/virt/kvm/devices/arm-vgic-v5.rst
+++ b/Documentation/virt/kvm/devices/arm-vgic-v5.rst
@@ -232,3 +232,48 @@ Groups:
or tracking pending interrupts
-ETIMEDOUT An IRS save/VM operation timed out
=========== ============================================================
+
+IRS Save Sequence:
+------------------
+
+The following operations are required when saving the virtual GICv5 IRS:
+
+a) Save the ISTs by issuing KVM_GET_DEVICE_ATTR on KVM_DEV_ARM_VGIC_GRP_IST.
+b) Save the IRS MMIO register state by issuing KVM_GET_DEVICE_ATTR on
+ KVM_DEV_ARM_VGIC_GRP_IRS_REGS.
+
+These two steps may be performed in either order. However, the guest memory
+must be serialised after the ISTs have been saved, as saving the LPI IST writes
+the IST state back into guest memory.
+
+IRS Restore Sequence:
+---------------------
+
+The following order must be observed when restoring the virtual GICv5 and
+IRS:
+
+a) Create vCPUs.
+b) Provide the IRS base address by issuing KVM_SET_DEVICE_ATTR on
+ KVM_DEV_ARM_VGIC_GRP_ADDR.
+c) Restore the number of SPIs by issuing KVM_SET_DEVICE_ATTR on
+ KVM_DEV_ARM_VGIC_GRP_NR_IRQS.
+d) Initialise the GIC (this sets up the default state and creates the SPI
+ IST) by issuing KVM_SET_DEVICE_ATTR on KVM_DEV_ARM_VGIC_GRP_CTRL with
+ KVM_DEV_ARM_VGIC_CTRL_INIT.
+e) Restore guest memory.
+f) Restore the IRS MMIO register state by issuing KVM_SET_DEVICE_ATTR on
+ KVM_DEV_ARM_VGIC_GRP_IRS_REGS. KVM uses the restored IRS_IST_CFGR and
+ IRS_IST_BASER state to allocate the LPI IST during the following step.
+g) Restore the ISTs by issuing KVM_SET_DEVICE_ATTR on
+ KVM_DEV_ARM_VGIC_GRP_IST.
+
+The number of SPIs must be restored before VGIC initialization because
+initialization allocates the SPI state and fixes the SPI range exposed by the
+IRS ID registers.
+
+The various ``*_STATUSR`` registers are observational state in the current KVM
+implementation. Userspace may save them for validation or debugging purposes,
+but they are not required as restore input and do not need to be replayed during
+restore.
+
+Then vCPUs can be started.
--
2.34.1