public inbox for kvm@vger.kernel.org
* [PATCH kvmtool v5 0/7] arm64: Nested virtualization support
@ 2026-01-23 14:27 Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 1/7] Sync kernel UAPI headers with v6.19-rc6 Andre Przywara
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

This is v5 of the nested virt support series, fixing a corner case when
some maintenance IRQ setup fails. Also there is now a warning if --e2h0
is specified without --nested. Many thanks to Sascha for the review!
========================================================

Thanks to the imperturbable efforts from Marc, arm64 support for nested
virtualization has now reached the mainline kernel, which means the
respective kvmtool support should now be ready as well.

Patch 1 updates the kernel headers, to get the new EL2 capability and
the VGIC device control needed to set up the maintenance IRQ.
Patch 2 introduces the new "--nested" command line option, to let the
VCPUs start in EL2. To allow KVM guests to run inside such a guest, we
also need VGIC support, which patch 3 provides by setting the
maintenance IRQ.
Patches 4 to 6 are picked from Marc's repo: they allow setting the arch
timer offset, enable non-VHE guests (at the cost of losing recursive
nested virtualisation), and advertise the virtual EL2 timer IRQ.
Patch 7, also from Marc, handles the virtio endianness reset when
running nested.

Tested on the FVP (with a good deal of patience), and on some commercial
(non-fruity) hardware, down to a guest's guest's guest.

Cheers,
Andre

Changelog v4 ... v5:
- bump kernel headers to v6.19-rc6
- print a warning if --e2h0 is given without --nested
- fail if the maintenance IRQ setting attribute is not supported

Changelog v3 ... v4:
- pass kvm pointer to gic__generate_fdt_nodes()
- use macros for PPI offset and DT type identifier
- properly calculate DT interrupt flags value
- add patch 7 to fix virtio endianness issues
- CAPITALISE verbs in commit message

Changelog v2 ... v3:
- adjust^Wreplace commit messages for E2H0 and counter-offset patch
- check for KVM_CAP_ARM_EL2_E2H0 when --e2h0 is requested
- update kernel headers to v6.16 release

Changelog v1 ... v2:
- add three patches from Marc:
  - add --e2h0 command line option
  - add --counter-offset command line option
  - advertise all five arch timer interrupts in DT

Andre Przywara (3):
  Sync kernel UAPI headers with v6.19-rc6
  arm64: Initial nested virt support
  arm64: nested: Add support for setting maintenance IRQ

Marc Zyngier (4):
  arm64: Add counter offset control
  arm64: Add FEAT_E2H0 support
  arm64: Generate HYP timer interrupt specifiers
  arm64: Handle virtio endianness reset when running nested

 arm64/arm-cpu.c                     |   6 +-
 arm64/fdt.c                         |   5 +-
 arm64/gic.c                         |  29 ++++++-
 arm64/include/asm/kvm.h             |  25 ++++--
 arm64/include/kvm/gic.h             |   2 +-
 arm64/include/kvm/kvm-config-arch.h |  11 ++-
 arm64/include/kvm/kvm-cpu-arch.h    |   5 +-
 arm64/include/kvm/timer.h           |   2 +-
 arm64/kvm-cpu.c                     |  64 ++++++++++++---
 arm64/kvm.c                         |  19 +++++
 arm64/timer.c                       |  29 +++----
 include/linux/kvm.h                 |  47 +++++++++++
 include/linux/virtio_ids.h          |   1 +
 include/linux/virtio_net.h          |  49 +++++++++++-
 include/linux/virtio_pci.h          |   3 +-
 powerpc/include/asm/kvm.h           |  13 ----
 riscv/include/asm/kvm.h             |  29 ++++++-
 x86/include/asm/kvm.h               | 116 ++++++++++++++++++++++++++++
 18 files changed, 394 insertions(+), 61 deletions(-)

-- 
2.43.0



* [PATCH kvmtool v5 1/7] Sync kernel UAPI headers with v6.19-rc6
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
@ 2026-01-23 14:27 ` Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 2/7] arm64: Initial nested virt support Andre Przywara
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

Needed for ARM nested virt support.
Generated using util/update_headers.sh.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arm64/include/asm/kvm.h    |  25 ++++++--
 include/linux/kvm.h        |  47 +++++++++++++++
 include/linux/virtio_ids.h |   1 +
 include/linux/virtio_net.h |  49 +++++++++++++++-
 include/linux/virtio_pci.h |   3 +-
 powerpc/include/asm/kvm.h  |  13 -----
 riscv/include/asm/kvm.h    |  29 +++++++++-
 x86/include/asm/kvm.h      | 116 +++++++++++++++++++++++++++++++++++++
 8 files changed, 262 insertions(+), 21 deletions(-)

diff --git a/arm64/include/asm/kvm.h b/arm64/include/asm/kvm.h
index 568bf858..a792a599 100644
--- a/arm64/include/asm/kvm.h
+++ b/arm64/include/asm/kvm.h
@@ -31,7 +31,7 @@
 #define KVM_SPSR_FIQ	4
 #define KVM_NR_SPSR	5
 
-#ifndef __ASSEMBLY__
+#ifndef __ASSEMBLER__
 #include <linux/psci.h>
 #include <linux/types.h>
 #include <asm/ptrace.h>
@@ -105,6 +105,7 @@ struct kvm_regs {
 #define KVM_ARM_VCPU_PTRAUTH_ADDRESS	5 /* VCPU uses address authentication */
 #define KVM_ARM_VCPU_PTRAUTH_GENERIC	6 /* VCPU uses generic authentication */
 #define KVM_ARM_VCPU_HAS_EL2		7 /* Support nested virtualization */
+#define KVM_ARM_VCPU_HAS_EL2_E2H0	8 /* Limit NV support to E2H RES0 */
 
 struct kvm_vcpu_init {
 	__u32 target;
@@ -371,6 +372,7 @@ enum {
 #endif
 };
 
+/* Vendor hyper call function numbers 0-63 */
 #define KVM_REG_ARM_VENDOR_HYP_BMAP		KVM_REG_ARM_FW_FEAT_BMAP_REG(2)
 
 enum {
@@ -381,6 +383,17 @@ enum {
 #endif
 };
 
+/* Vendor hyper call function numbers 64-127 */
+#define KVM_REG_ARM_VENDOR_HYP_BMAP_2		KVM_REG_ARM_FW_FEAT_BMAP_REG(3)
+
+enum {
+	KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_VER	= 0,
+	KVM_REG_ARM_VENDOR_HYP_BIT_DISCOVER_IMPL_CPUS	= 1,
+#ifdef __KERNEL__
+	KVM_REG_ARM_VENDOR_HYP_BMAP_2_BIT_COUNT,
+#endif
+};
+
 /* Device Control API on vm fd */
 #define KVM_ARM_VM_SMCCC_CTRL		0
 #define   KVM_ARM_VM_SMCCC_FILTER	0
@@ -403,6 +416,7 @@ enum {
 #define KVM_DEV_ARM_VGIC_GRP_CPU_SYSREGS 6
 #define KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO  7
 #define KVM_DEV_ARM_VGIC_GRP_ITS_REGS 8
+#define KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ  9
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT	10
 #define KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_MASK \
 			(0x3fffffULL << KVM_DEV_ARM_VGIC_LINE_LEVEL_INFO_SHIFT)
@@ -417,10 +431,11 @@ enum {
 
 /* Device Control API on vcpu fd */
 #define KVM_ARM_VCPU_PMU_V3_CTRL	0
-#define   KVM_ARM_VCPU_PMU_V3_IRQ	0
-#define   KVM_ARM_VCPU_PMU_V3_INIT	1
-#define   KVM_ARM_VCPU_PMU_V3_FILTER	2
-#define   KVM_ARM_VCPU_PMU_V3_SET_PMU	3
+#define   KVM_ARM_VCPU_PMU_V3_IRQ		0
+#define   KVM_ARM_VCPU_PMU_V3_INIT		1
+#define   KVM_ARM_VCPU_PMU_V3_FILTER		2
+#define   KVM_ARM_VCPU_PMU_V3_SET_PMU		3
+#define   KVM_ARM_VCPU_PMU_V3_SET_NR_COUNTERS	4
 #define KVM_ARM_VCPU_TIMER_CTRL		1
 #define   KVM_ARM_VCPU_TIMER_IRQ_VTIMER		0
 #define   KVM_ARM_VCPU_TIMER_IRQ_PTIMER		1
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 45e6d8fc..dddb781b 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -178,6 +178,8 @@ struct kvm_xen_exit {
 #define KVM_EXIT_NOTIFY           37
 #define KVM_EXIT_LOONGARCH_IOCSR  38
 #define KVM_EXIT_MEMORY_FAULT     39
+#define KVM_EXIT_TDX              40
+#define KVM_EXIT_ARM_SEA          41
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -375,6 +377,7 @@ struct kvm_run {
 #define KVM_SYSTEM_EVENT_WAKEUP         4
 #define KVM_SYSTEM_EVENT_SUSPEND        5
 #define KVM_SYSTEM_EVENT_SEV_TERM       6
+#define KVM_SYSTEM_EVENT_TDX_FATAL      7
 			__u32 type;
 			__u32 ndata;
 			union {
@@ -446,6 +449,39 @@ struct kvm_run {
 			__u64 gpa;
 			__u64 size;
 		} memory_fault;
+		/* KVM_EXIT_TDX */
+		struct {
+			__u64 flags;
+			__u64 nr;
+			union {
+				struct {
+					__u64 ret;
+					__u64 data[5];
+				} unknown;
+				struct {
+					__u64 ret;
+					__u64 gpa;
+					__u64 size;
+				} get_quote;
+				struct {
+					__u64 ret;
+					__u64 leaf;
+					__u64 r11, r12, r13, r14;
+				} get_tdvmcall_info;
+				struct {
+					__u64 ret;
+					__u64 vector;
+				} setup_event_notify;
+			};
+		} tdx;
+		/* KVM_EXIT_ARM_SEA */
+		struct {
+#define KVM_EXIT_ARM_SEA_FLAG_GPA_VALID	(1ULL << 0)
+			__u64 flags;
+			__u64 esr;
+			__u64 gva;
+			__u64 gpa;
+		} arm_sea;
 		/* Fix the size of the union. */
 		char padding[256];
 	};
@@ -617,6 +653,7 @@ struct kvm_ioeventfd {
 #define KVM_X86_DISABLE_EXITS_HLT            (1 << 1)
 #define KVM_X86_DISABLE_EXITS_PAUSE          (1 << 2)
 #define KVM_X86_DISABLE_EXITS_CSTATE         (1 << 3)
+#define KVM_X86_DISABLE_EXITS_APERFMPERF     (1 << 4)
 
 /* for KVM_ENABLE_CAP */
 struct kvm_enable_cap {
@@ -929,6 +966,14 @@ struct kvm_enable_cap {
 #define KVM_CAP_PRE_FAULT_MEMORY 236
 #define KVM_CAP_X86_APIC_BUS_CYCLES_NS 237
 #define KVM_CAP_X86_GUEST_MODE 238
+#define KVM_CAP_ARM_WRITABLE_IMP_ID_REGS 239
+#define KVM_CAP_ARM_EL2 240
+#define KVM_CAP_ARM_EL2_E2H0 241
+#define KVM_CAP_RISCV_MP_STATE_RESET 242
+#define KVM_CAP_ARM_CACHEABLE_PFNMAP_SUPPORTED 243
+#define KVM_CAP_GUEST_MEMFD_FLAGS 244
+#define KVM_CAP_ARM_SEA_TO_USER 245
+#define KVM_CAP_S390_USER_OPEREXEC 246
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;
@@ -1565,6 +1610,8 @@ struct kvm_memory_attributes {
 #define KVM_MEMORY_ATTRIBUTE_PRIVATE           (1ULL << 3)
 
 #define KVM_CREATE_GUEST_MEMFD	_IOWR(KVMIO,  0xd4, struct kvm_create_guest_memfd)
+#define GUEST_MEMFD_FLAG_MMAP		(1ULL << 0)
+#define GUEST_MEMFD_FLAG_INIT_SHARED	(1ULL << 1)
 
 struct kvm_create_guest_memfd {
 	__u64 size;
diff --git a/include/linux/virtio_ids.h b/include/linux/virtio_ids.h
index 7aa2eb76..6c12db16 100644
--- a/include/linux/virtio_ids.h
+++ b/include/linux/virtio_ids.h
@@ -68,6 +68,7 @@
 #define VIRTIO_ID_AUDIO_POLICY		39 /* virtio audio policy */
 #define VIRTIO_ID_BT			40 /* virtio bluetooth */
 #define VIRTIO_ID_GPIO			41 /* virtio gpio */
+#define VIRTIO_ID_SPI			45 /* virtio spi */
 
 /*
  * Virtio Transitional IDs
diff --git a/include/linux/virtio_net.h b/include/linux/virtio_net.h
index ac917471..1db45b01 100644
--- a/include/linux/virtio_net.h
+++ b/include/linux/virtio_net.h
@@ -70,6 +70,28 @@
 					 * with the same MAC.
 					 */
 #define VIRTIO_NET_F_SPEED_DUPLEX 63	/* Device set linkspeed and duplex */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO 65 /* Driver can receive
+					      * GSO-over-UDP-tunnel packets
+					      */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM 66 /* Driver handles
+						   * GSO-over-UDP-tunnel
+						   * packets with partial csum
+						   * for the outer header
+						   */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO 67 /* Device can receive
+					     * GSO-over-UDP-tunnel packets
+					     */
+#define VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO_CSUM 68 /* Device handles
+						  * GSO-over-UDP-tunnel
+						  * packets with partial csum
+						  * for the outer header
+						  */
+
+/* Offloads bits corresponding to VIRTIO_NET_F_HOST_UDP_TUNNEL_GSO{,_CSUM}
+ * features
+ */
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_MAPPED	46
+#define VIRTIO_NET_F_GUEST_UDP_TUNNEL_GSO_CSUM_MAPPED	47
 
 #ifndef VIRTIO_NET_NO_LEGACY
 #define VIRTIO_NET_F_GSO	6	/* Host handles pkts w/ any GSO type */
@@ -131,12 +153,17 @@ struct virtio_net_hdr_v1 {
 #define VIRTIO_NET_HDR_F_NEEDS_CSUM	1	/* Use csum_start, csum_offset */
 #define VIRTIO_NET_HDR_F_DATA_VALID	2	/* Csum is valid */
 #define VIRTIO_NET_HDR_F_RSC_INFO	4	/* rsc info in csum_ fields */
+#define VIRTIO_NET_HDR_F_UDP_TUNNEL_CSUM 8	/* UDP tunnel csum offload */
 	__u8 flags;
 #define VIRTIO_NET_HDR_GSO_NONE		0	/* Not a GSO frame */
 #define VIRTIO_NET_HDR_GSO_TCPV4	1	/* GSO frame, IPv4 TCP (TSO) */
 #define VIRTIO_NET_HDR_GSO_UDP		3	/* GSO frame, IPv4 UDP (UFO) */
 #define VIRTIO_NET_HDR_GSO_TCPV6	4	/* GSO frame, IPv6 TCP */
 #define VIRTIO_NET_HDR_GSO_UDP_L4	5	/* GSO frame, IPv4& IPv6 UDP (USO) */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 0x20 /* UDPv4 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6 0x40 /* UDPv6 tunnel present */
+#define VIRTIO_NET_HDR_GSO_UDP_TUNNEL (VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV4 | \
+				       VIRTIO_NET_HDR_GSO_UDP_TUNNEL_IPV6)
 #define VIRTIO_NET_HDR_GSO_ECN		0x80	/* TCP has ECN set */
 	__u8 gso_type;
 	__virtio16 hdr_len;	/* Ethernet + IP + tcp/udp hdrs */
@@ -166,7 +193,8 @@ struct virtio_net_hdr_v1 {
 
 struct virtio_net_hdr_v1_hash {
 	struct virtio_net_hdr_v1 hdr;
-	__le32 hash_value;
+	__le16 hash_value_lo;
+	__le16 hash_value_hi;
 #define VIRTIO_NET_HASH_REPORT_NONE            0
 #define VIRTIO_NET_HASH_REPORT_IPv4            1
 #define VIRTIO_NET_HASH_REPORT_TCPv4           2
@@ -181,6 +209,12 @@ struct virtio_net_hdr_v1_hash {
 	__le16 padding;
 };
 
+struct virtio_net_hdr_v1_hash_tunnel {
+	struct virtio_net_hdr_v1_hash hash_hdr;
+	__le16 outer_th_offset;
+	__le16 inner_nh_offset;
+};
+
 #ifndef VIRTIO_NET_NO_LEGACY
 /* This header comes first in the scatter-gather list.
  * For legacy virtio, if VIRTIO_F_ANY_LAYOUT is not negotiated, it must
@@ -327,6 +361,19 @@ struct virtio_net_rss_config {
 	__u8 hash_key_data[/* hash_key_length */];
 };
 
+struct virtio_net_rss_config_hdr {
+	__le32 hash_types;
+	__le16 indirection_table_mask;
+	__le16 unclassified_queue;
+	__le16 indirection_table[/* 1 + indirection_table_mask */];
+};
+
+struct virtio_net_rss_config_trailer {
+	__le16 max_tx_vq;
+	__u8 hash_key_length;
+	__u8 hash_key_data[/* hash_key_length */];
+};
+
  #define VIRTIO_NET_CTRL_MQ_RSS_CONFIG          1
 
 /*
diff --git a/include/linux/virtio_pci.h b/include/linux/virtio_pci.h
index 8549d457..e732e345 100644
--- a/include/linux/virtio_pci.h
+++ b/include/linux/virtio_pci.h
@@ -40,7 +40,7 @@
 #define _LINUX_VIRTIO_PCI_H
 
 #include <linux/types.h>
-#include <linux/kernel.h>
+#include <linux/const.h>
 
 #ifndef VIRTIO_PCI_NO_LEGACY
 
@@ -246,6 +246,7 @@ struct virtio_pci_cfg_cap {
 #define VIRTIO_ADMIN_CMD_LIST_USE	0x1
 
 /* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SELF	0x0
 #define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
 
 /* Transitional device admin command. */
diff --git a/powerpc/include/asm/kvm.h b/powerpc/include/asm/kvm.h
index eaeda001..077c5437 100644
--- a/powerpc/include/asm/kvm.h
+++ b/powerpc/include/asm/kvm.h
@@ -1,18 +1,5 @@
 /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
 /*
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License, version 2, as
- * published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
- *
  * Copyright IBM Corp. 2007
  *
  * Authors: Hollis Blanchard <hollisb@us.ibm.com>
diff --git a/riscv/include/asm/kvm.h b/riscv/include/asm/kvm.h
index f06bc5ef..54f3ad7e 100644
--- a/riscv/include/asm/kvm.h
+++ b/riscv/include/asm/kvm.h
@@ -9,7 +9,7 @@
 #ifndef __LINUX_KVM_RISCV_H
 #define __LINUX_KVM_RISCV_H
 
-#ifndef __ASSEMBLY__
+#ifndef __ASSEMBLER__
 
 #include <linux/types.h>
 #include <asm/bitsperlong.h>
@@ -18,10 +18,13 @@
 #define __KVM_HAVE_IRQ_LINE
 
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_DIRTY_LOG_PAGE_OFFSET 64
 
 #define KVM_INTERRUPT_SET	-1U
 #define KVM_INTERRUPT_UNSET	-2U
 
+#define KVM_EXIT_FAIL_ENTRY_NO_VSFILE	(1ULL << 0)
+
 /* for KVM_GET_REGS and KVM_SET_REGS */
 struct kvm_regs {
 };
@@ -55,6 +58,7 @@ struct kvm_riscv_config {
 	unsigned long mimpid;
 	unsigned long zicboz_block_size;
 	unsigned long satp_mode;
+	unsigned long zicbop_block_size;
 };
 
 /* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
@@ -182,6 +186,12 @@ enum KVM_RISCV_ISA_EXT_ID {
 	KVM_RISCV_ISA_EXT_SVVPTC,
 	KVM_RISCV_ISA_EXT_ZABHA,
 	KVM_RISCV_ISA_EXT_ZICCRSE,
+	KVM_RISCV_ISA_EXT_ZAAMO,
+	KVM_RISCV_ISA_EXT_ZALRSC,
+	KVM_RISCV_ISA_EXT_ZICBOP,
+	KVM_RISCV_ISA_EXT_ZFBFMIN,
+	KVM_RISCV_ISA_EXT_ZVFBFMIN,
+	KVM_RISCV_ISA_EXT_ZVFBFWMA,
 	KVM_RISCV_ISA_EXT_MAX,
 };
 
@@ -202,6 +212,8 @@ enum KVM_RISCV_SBI_EXT_ID {
 	KVM_RISCV_SBI_EXT_DBCN,
 	KVM_RISCV_SBI_EXT_STA,
 	KVM_RISCV_SBI_EXT_SUSP,
+	KVM_RISCV_SBI_EXT_FWFT,
+	KVM_RISCV_SBI_EXT_MPXY,
 	KVM_RISCV_SBI_EXT_MAX,
 };
 
@@ -211,6 +223,18 @@ struct kvm_riscv_sbi_sta {
 	unsigned long shmem_hi;
 };
 
+struct kvm_riscv_sbi_fwft_feature {
+	unsigned long enable;
+	unsigned long flags;
+	unsigned long value;
+};
+
+/* SBI FWFT extension registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_sbi_fwft {
+	struct kvm_riscv_sbi_fwft_feature misaligned_deleg;
+	struct kvm_riscv_sbi_fwft_feature pointer_masking;
+};
+
 /* Possible states for kvm_riscv_timer */
 #define KVM_RISCV_TIMER_STATE_OFF	0
 #define KVM_RISCV_TIMER_STATE_ON	1
@@ -294,6 +318,9 @@ struct kvm_riscv_sbi_sta {
 #define KVM_REG_RISCV_SBI_STA		(0x0 << KVM_REG_RISCV_SUBTYPE_SHIFT)
 #define KVM_REG_RISCV_SBI_STA_REG(name)		\
 		(offsetof(struct kvm_riscv_sbi_sta, name) / sizeof(unsigned long))
+#define KVM_REG_RISCV_SBI_FWFT		(0x1 << KVM_REG_RISCV_SUBTYPE_SHIFT)
+#define KVM_REG_RISCV_SBI_FWFT_REG(name)	\
+		(offsetof(struct kvm_riscv_sbi_fwft, name) / sizeof(unsigned long))
 
 /* Device Control API: RISC-V AIA */
 #define KVM_DEV_RISCV_APLIC_ALIGN		0x1000
diff --git a/x86/include/asm/kvm.h b/x86/include/asm/kvm.h
index 9e75da97..7ceff658 100644
--- a/x86/include/asm/kvm.h
+++ b/x86/include/asm/kvm.h
@@ -35,6 +35,11 @@
 #define MC_VECTOR 18
 #define XM_VECTOR 19
 #define VE_VECTOR 20
+#define CP_VECTOR 21
+
+#define HV_VECTOR 28
+#define VC_VECTOR 29
+#define SX_VECTOR 30
 
 /* Select x86 specific features in <linux/kvm.h> */
 #define __KVM_HAVE_PIT
@@ -411,6 +416,35 @@ struct kvm_xcrs {
 	__u64 padding[16];
 };
 
+#define KVM_X86_REG_TYPE_MSR		2
+#define KVM_X86_REG_TYPE_KVM		3
+
+#define KVM_X86_KVM_REG_SIZE(reg)						\
+({										\
+	reg == KVM_REG_GUEST_SSP ? KVM_REG_SIZE_U64 : 0;			\
+})
+
+#define KVM_X86_REG_TYPE_SIZE(type, reg)					\
+({										\
+	__u64 type_size = (__u64)type << 32;					\
+										\
+	type_size |= type == KVM_X86_REG_TYPE_MSR ? KVM_REG_SIZE_U64 :		\
+		     type == KVM_X86_REG_TYPE_KVM ? KVM_X86_KVM_REG_SIZE(reg) :	\
+		     0;								\
+	type_size;								\
+})
+
+#define KVM_X86_REG_ID(type, index)				\
+	(KVM_REG_X86 | KVM_X86_REG_TYPE_SIZE(type, index) | index)
+
+#define KVM_X86_REG_MSR(index)					\
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_MSR, index)
+#define KVM_X86_REG_KVM(index)					\
+	KVM_X86_REG_ID(KVM_X86_REG_TYPE_KVM, index)
+
+/* KVM-defined registers starting from 0 */
+#define KVM_REG_GUEST_SSP	0
+
 #define KVM_SYNC_X86_REGS      (1UL << 0)
 #define KVM_SYNC_X86_SREGS     (1UL << 1)
 #define KVM_SYNC_X86_EVENTS    (1UL << 2)
@@ -441,6 +475,7 @@ struct kvm_sync_regs {
 #define KVM_X86_QUIRK_MWAIT_NEVER_UD_FAULTS	(1 << 6)
 #define KVM_X86_QUIRK_SLOT_ZAP_ALL		(1 << 7)
 #define KVM_X86_QUIRK_STUFF_FEATURE_MSRS	(1 << 8)
+#define KVM_X86_QUIRK_IGNORE_GUEST_PAT		(1 << 9)
 
 #define KVM_STATE_NESTED_FORMAT_VMX	0
 #define KVM_STATE_NESTED_FORMAT_SVM	1
@@ -467,6 +502,7 @@ struct kvm_sync_regs {
 /* vendor-specific groups and attributes for system fd */
 #define KVM_X86_GRP_SEV			1
 #  define KVM_X86_SEV_VMSA_FEATURES	0
+#  define KVM_X86_SNP_POLICY_BITS	1
 
 struct kvm_vmx_nested_state_data {
 	__u8 vmcs12[KVM_STATE_NESTED_VMX_VMCS_SIZE];
@@ -559,6 +595,9 @@ struct kvm_x86_mce {
 #define KVM_XEN_HVM_CONFIG_PVCLOCK_TSC_UNSTABLE	(1 << 7)
 #define KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA	(1 << 8)
 
+#define KVM_XEN_MSR_MIN_INDEX			0x40000000u
+#define KVM_XEN_MSR_MAX_INDEX			0x4fffffffu
+
 struct kvm_xen_hvm_config {
 	__u32 flags;
 	__u32 msr;
@@ -841,6 +880,7 @@ struct kvm_sev_snp_launch_start {
 };
 
 /* Kept in sync with firmware values for simplicity. */
+#define KVM_SEV_PAGE_TYPE_INVALID		0x0
 #define KVM_SEV_SNP_PAGE_TYPE_NORMAL		0x1
 #define KVM_SEV_SNP_PAGE_TYPE_ZERO		0x3
 #define KVM_SEV_SNP_PAGE_TYPE_UNMEASURED	0x4
@@ -927,4 +967,80 @@ struct kvm_hyperv_eventfd {
 #define KVM_X86_SNP_VM		4
 #define KVM_X86_TDX_VM		5
 
+/* Trust Domain eXtension sub-ioctl() commands. */
+enum kvm_tdx_cmd_id {
+	KVM_TDX_CAPABILITIES = 0,
+	KVM_TDX_INIT_VM,
+	KVM_TDX_INIT_VCPU,
+	KVM_TDX_INIT_MEM_REGION,
+	KVM_TDX_FINALIZE_VM,
+	KVM_TDX_GET_CPUID,
+
+	KVM_TDX_CMD_NR_MAX,
+};
+
+struct kvm_tdx_cmd {
+	/* enum kvm_tdx_cmd_id */
+	__u32 id;
+	/* flags for sub-commend. If sub-command doesn't use this, set zero. */
+	__u32 flags;
+	/*
+	 * data for each sub-command. An immediate or a pointer to the actual
+	 * data in process virtual address.  If sub-command doesn't use it,
+	 * set zero.
+	 */
+	__u64 data;
+	/*
+	 * Auxiliary error code.  The sub-command may return TDX SEAMCALL
+	 * status code in addition to -Exxx.
+	 */
+	__u64 hw_error;
+};
+
+struct kvm_tdx_capabilities {
+	__u64 supported_attrs;
+	__u64 supported_xfam;
+
+	__u64 kernel_tdvmcallinfo_1_r11;
+	__u64 user_tdvmcallinfo_1_r11;
+	__u64 kernel_tdvmcallinfo_1_r12;
+	__u64 user_tdvmcallinfo_1_r12;
+
+	__u64 reserved[250];
+
+	/* Configurable CPUID bits for userspace */
+	struct kvm_cpuid2 cpuid;
+};
+
+struct kvm_tdx_init_vm {
+	__u64 attributes;
+	__u64 xfam;
+	__u64 mrconfigid[6];	/* sha384 digest */
+	__u64 mrowner[6];	/* sha384 digest */
+	__u64 mrownerconfig[6];	/* sha384 digest */
+
+	/* The total space for TD_PARAMS before the CPUIDs is 256 bytes */
+	__u64 reserved[12];
+
+	/*
+	 * Call KVM_TDX_INIT_VM before vcpu creation, thus before
+	 * KVM_SET_CPUID2.
+	 * This configuration supersedes KVM_SET_CPUID2s for VCPUs because the
+	 * TDX module directly virtualizes those CPUIDs without VMM.  The user
+	 * space VMM, e.g. qemu, should make KVM_SET_CPUID2 consistent with
+	 * those values.  If it doesn't, KVM may have wrong idea of vCPUIDs of
+	 * the guest, and KVM may wrongly emulate CPUIDs or MSRs that the TDX
+	 * module doesn't virtualize.
+	 */
+	struct kvm_cpuid2 cpuid;
+};
+
+#define KVM_TDX_MEASURE_MEMORY_REGION   _BITULL(0)
+
+struct kvm_tdx_init_mem_region {
+	__u64 source_addr;
+	__u64 gpa;
+	__u64 nr_pages;
+};
+
 #endif /* _ASM_X86_KVM_H */
-- 
2.43.0



* [PATCH kvmtool v5 2/7] arm64: Initial nested virt support
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 1/7] Sync kernel UAPI headers with v6.19-rc6 Andre Przywara
@ 2026-01-23 14:27 ` Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ Andre Przywara
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

The ARMv8.3 architecture update includes support for nested
virtualization. Allow the user to specify "--nested" to start a guest in
(virtual) EL2 instead of EL1.
This will also change the PSCI conduit from HVC to SMC in the device
tree.
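
For illustration only (not part of the patch), the three places where
"--nested" changes behaviour can be sketched as below. The helper names
are hypothetical; the constants are assumed to match the arm64 UAPI
headers (asm/ptrace.h and asm/kvm.h).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the arm64 UAPI definitions. */
#define PSR_MODE_EL1h		0x00000005
#define PSR_MODE_EL2h		0x00000009
#define PSR_F_BIT		0x00000040
#define PSR_I_BIT		0x00000080
#define PSR_A_BIT		0x00000100
#define PSR_D_BIT		0x00000200
#define KVM_ARM_VCPU_HAS_EL2	7

/* Hypothetical helper: reset PSTATE, all interrupts masked. */
static uint64_t initial_pstate(bool nested_virt)
{
	uint64_t pstate = PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT;

	/* Start in (virtual) EL2 when nested virt is requested. */
	pstate |= nested_virt ? PSR_MODE_EL2h : PSR_MODE_EL1h;
	return pstate;
}

/* Hypothetical helper: feature bit to set in kvm_vcpu_init.features[0]. */
static uint32_t el2_feature_bits(bool nested_virt)
{
	return nested_virt ? (UINT32_C(1) << KVM_ARM_VCPU_HAS_EL2) : 0;
}

/* Hypothetical helper: PSCI "method" string put into the device tree. */
static const char *psci_method(bool nested_virt)
{
	return nested_virt ? "smc" : "hvc";
}
```

With these assumed values, a nested guest resets with PSTATE 0x3c9
instead of 0x3c5, and bit 7 is set in the first feature word.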

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
 arm64/fdt.c                         |  5 ++++-
 arm64/include/kvm/kvm-config-arch.h |  5 ++++-
 arm64/kvm-cpu.c                     | 12 +++++++++++-
 3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/arm64/fdt.c b/arm64/fdt.c
index df777587..98f1dd9d 100644
--- a/arm64/fdt.c
+++ b/arm64/fdt.c
@@ -205,7 +205,10 @@ static int setup_fdt(struct kvm *kvm)
 		_FDT(fdt_property_string(fdt, "compatible", "arm,psci"));
 		fns = &psci_0_1_fns;
 	}
-	_FDT(fdt_property_string(fdt, "method", "hvc"));
+	if (kvm->cfg.arch.nested_virt)
+		_FDT(fdt_property_string(fdt, "method", "smc"));
+	else
+		_FDT(fdt_property_string(fdt, "method", "hvc"));
 	_FDT(fdt_property_cell(fdt, "cpu_suspend", fns->cpu_suspend));
 	_FDT(fdt_property_cell(fdt, "cpu_off", fns->cpu_off));
 	_FDT(fdt_property_cell(fdt, "cpu_on", fns->cpu_on));
diff --git a/arm64/include/kvm/kvm-config-arch.h b/arm64/include/kvm/kvm-config-arch.h
index ee031f01..a1dac28e 100644
--- a/arm64/include/kvm/kvm-config-arch.h
+++ b/arm64/include/kvm/kvm-config-arch.h
@@ -10,6 +10,7 @@ struct kvm_config_arch {
 	bool		aarch32_guest;
 	bool		has_pmuv3;
 	bool		mte_disabled;
+	bool		nested_virt;
 	u64		kaslr_seed;
 	enum irqchip_type irqchip;
 	u64		fw_addr;
@@ -57,6 +58,8 @@ int sve_vl_parser(const struct option *opt, const char *arg, int unset);
 		     "Type of interrupt controller to emulate in the guest",	\
 		     irqchip_parser, NULL),					\
 	OPT_U64('\0', "firmware-address", &(cfg)->fw_addr,			\
-		"Address where firmware should be loaded"),
+		"Address where firmware should be loaded"),			\
+	OPT_BOOLEAN('\0', "nested", &(cfg)->nested_virt,			\
+		    "Start VCPUs in EL2 (for nested virt)"),
 
 #endif /* ARM_COMMON__KVM_CONFIG_ARCH_H */
diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
index 94c08a4d..42dc11da 100644
--- a/arm64/kvm-cpu.c
+++ b/arm64/kvm-cpu.c
@@ -71,6 +71,12 @@ static void kvm_cpu__select_features(struct kvm *kvm, struct kvm_vcpu_init *init
 	/* Enable SVE if available */
 	if (kvm__supports_extension(kvm, KVM_CAP_ARM_SVE))
 		init->features[0] |= 1UL << KVM_ARM_VCPU_SVE;
+
+	if (kvm->cfg.arch.nested_virt) {
+		if (!kvm__supports_extension(kvm, KVM_CAP_ARM_EL2))
+			die("EL2 (nested virt) is not supported");
+		init->features[0] |= 1UL << KVM_ARM_VCPU_HAS_EL2;
+	}
 }
 
 static int vcpu_configure_sve(struct kvm_cpu *vcpu)
@@ -313,7 +319,11 @@ static void reset_vcpu_aarch64(struct kvm_cpu *vcpu)
 	reg.addr = (u64)&data;
 
 	/* pstate = all interrupts masked */
-	data	= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT | PSR_MODE_EL1h;
+	data	= PSR_D_BIT | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT;
+	if (vcpu->kvm->cfg.arch.nested_virt)
+		data |= PSR_MODE_EL2h;
+	else
+		data |= PSR_MODE_EL1h;
 	reg.id	= ARM64_CORE_REG(regs.pstate);
 	if (ioctl(vcpu->vcpu_fd, KVM_SET_ONE_REG, &reg) < 0)
 		die_perror("KVM_SET_ONE_REG failed (spsr[EL1])");
-- 
2.43.0



* [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 1/7] Sync kernel UAPI headers with v6.19-rc6 Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 2/7] arm64: Initial nested virt support Andre Przywara
@ 2026-01-23 14:27 ` Andre Przywara
  2026-01-26 18:03   ` Marc Zyngier
  2026-01-23 14:27 ` [PATCH kvmtool v5 4/7] arm64: Add counter offset control Andre Przywara
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

Use the new VGIC KVM device attribute to set the maintenance IRQ.
The IRQ is fixed to PPI 9, a platform decision made by kvmtool that
matches the SBSA recommendation.
Use the opportunity to pass the kvm pointer to gic__generate_fdt_nodes(),
as this simplifies the call and gives us access to the nested_virt
config variable on the way.
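
As a rough sketch (not part of the patch), PPI 9 shows up in two
different encodings: as a GIC INTID for the KVM device attribute, and
as a three-cell interrupt specifier in the device tree. The helper
names below are hypothetical, and GIC_PPI_IRQ_BASE is assumed to be 16,
since the GIC architecture places PPIs at INTIDs 16 to 31.

```c
#include <assert.h>
#include <stdint.h>

#define GIC_PPI_IRQ_BASE	16	/* assumed: PPIs are INTIDs 16..31 */
#define GIC_MAINT_IRQ		9	/* PPI number chosen by the patch */
#define GIC_FDT_IRQ_TYPE_PPI	1	/* first cell of a GIC specifier */
#define IRQ_TYPE_LEVEL_HIGH	4	/* generic DT trigger flag */

/* INTID as passed via KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ. */
static uint32_t maint_irq_intid(void)
{
	return GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
}

/* FDT cells are big-endian regardless of the host byte order. */
static void put_fdt32(uint8_t *p, uint32_t v)
{
	p[0] = v >> 24;
	p[1] = v >> 16;
	p[2] = v >> 8;
	p[3] = v;
}

/* Hypothetical helper: fill the "interrupts" property of the GIC node. */
static void maint_irq_specifier(uint8_t out[12], uint32_t cpumask)
{
	/* The PPI CPU mask lives in bits [15:8] of the flags cell. */
	assert((cpumask & 0xff) == 0);

	put_fdt32(out, GIC_FDT_IRQ_TYPE_PPI);
	put_fdt32(out + 4, GIC_MAINT_IRQ);
	put_fdt32(out + 8, cpumask | IRQ_TYPE_LEVEL_HIGH);
}
```

Under these assumptions the INTID handed to KVM is 25, while the device
tree keeps the PPI-relative number 9 in its second cell.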

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arm64/arm-cpu.c         |  2 +-
 arm64/gic.c             | 29 +++++++++++++++++++++++++++--
 arm64/include/kvm/gic.h |  2 +-
 3 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
index 69bb2cb2..0843ac05 100644
--- a/arm64/arm-cpu.c
+++ b/arm64/arm-cpu.c
@@ -14,7 +14,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm)
 {
 	int timer_interrupts[4] = {13, 14, 11, 10};
 
-	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
+	gic__generate_fdt_nodes(fdt, kvm);
 	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
 	pmu__generate_fdt_nodes(fdt, kvm);
 }
diff --git a/arm64/gic.c b/arm64/gic.c
index b0d3a1ab..2a595184 100644
--- a/arm64/gic.c
+++ b/arm64/gic.c
@@ -11,6 +11,8 @@
 
 #define IRQCHIP_GIC 0
 
+#define GIC_MAINT_IRQ	9
+
 static int gic_fd = -1;
 static u64 gic_redists_base;
 static u64 gic_redists_size;
@@ -302,10 +304,15 @@ static int gic__init_gic(struct kvm *kvm)
 
 	int lines = irq__get_nr_allocated_lines();
 	u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
+	u32 maint_irq = GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
 	struct kvm_device_attr nr_irqs_attr = {
 		.group	= KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
 		.addr	= (u64)(unsigned long)&nr_irqs,
 	};
+	struct kvm_device_attr maint_irq_attr = {
+		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
+		.addr	= (u64)(unsigned long)&maint_irq,
+	};
 	struct kvm_device_attr vgic_init_attr = {
 		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
 		.attr	= KVM_DEV_ARM_VGIC_CTRL_INIT,
@@ -325,6 +332,16 @@ static int gic__init_gic(struct kvm *kvm)
 			return ret;
 	}
 
+	if (kvm->cfg.arch.nested_virt) {
+		ret = ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &maint_irq_attr);
+		if (!ret)
+			ret = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &maint_irq_attr);
+		if (ret) {
+			pr_err("could not set maintenance IRQ\n");
+			return ret;
+		}
+	}
+
 	irq__routing_init(kvm);
 
 	if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &vgic_init_attr)) {
@@ -342,7 +359,7 @@ static int gic__init_gic(struct kvm *kvm)
 }
 late_init(gic__init_gic)
 
-void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
+void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
 {
 	const char *compatible, *msi_compatible = NULL;
 	u64 msi_prop[2];
@@ -350,8 +367,12 @@ void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
 		cpu_to_fdt64(ARM_GIC_DIST_BASE), cpu_to_fdt64(ARM_GIC_DIST_SIZE),
 		0, 0,				/* to be filled */
 	};
+	u32 maint_irq[] = {
+		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(GIC_MAINT_IRQ),
+		cpu_to_fdt32(gic__get_fdt_irq_cpumask(kvm) | IRQ_TYPE_LEVEL_HIGH)
+	};
 
-	switch (type) {
+	switch (kvm->cfg.arch.irqchip) {
 	case IRQCHIP_GICV2M:
 		msi_compatible = "arm,gic-v2m-frame";
 		/* fall-through */
@@ -377,6 +398,10 @@ void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
 	_FDT(fdt_property_cell(fdt, "#interrupt-cells", GIC_FDT_IRQ_NUM_CELLS));
 	_FDT(fdt_property(fdt, "interrupt-controller", NULL, 0));
 	_FDT(fdt_property(fdt, "reg", reg_prop, sizeof(reg_prop)));
+	if (kvm->cfg.arch.nested_virt) {
+		_FDT(fdt_property(fdt, "interrupts", maint_irq,
+				  sizeof(maint_irq)));
+	}
 	_FDT(fdt_property_cell(fdt, "phandle", PHANDLE_GIC));
 	_FDT(fdt_property_cell(fdt, "#address-cells", 2));
 	_FDT(fdt_property_cell(fdt, "#size-cells", 2));
diff --git a/arm64/include/kvm/gic.h b/arm64/include/kvm/gic.h
index ad8bcbf2..8490cca6 100644
--- a/arm64/include/kvm/gic.h
+++ b/arm64/include/kvm/gic.h
@@ -36,7 +36,7 @@ struct kvm;
 int gic__alloc_irqnum(void);
 int gic__create(struct kvm *kvm, enum irqchip_type type);
 int gic__create_gicv2m_frame(struct kvm *kvm, u64 msi_frame_addr);
-void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type);
+void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm);
 u32 gic__get_fdt_irq_cpumask(struct kvm *kvm);
 
 int gic__add_irqfd(struct kvm *kvm, unsigned int gsi, int trigger_fd,
-- 
2.43.0



* [PATCH kvmtool v5 4/7] arm64: Add counter offset control
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
                   ` (2 preceding siblings ...)
  2026-01-23 14:27 ` [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ Andre Przywara
@ 2026-01-23 14:27 ` Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 5/7] arm64: Add FEAT_E2H0 support Andre Przywara
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC (permalink / raw)
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

From: Marc Zyngier <maz@kernel.org>

KVM allows the offsetting of the global counter in order to help with
migration of a VM. This offset applies cumulatively with the offsets
provided by the architecture.

Although kvmtool doesn't provide a way to migrate a VM, controlling
this offset is useful to test the timer subsystem.

Add the command line option --counter-offset to allow setting this value
when creating a VM.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
 arm64/include/kvm/kvm-config-arch.h |  3 +++
 arm64/kvm.c                         | 17 +++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arm64/include/kvm/kvm-config-arch.h b/arm64/include/kvm/kvm-config-arch.h
index a1dac28e..44c43367 100644
--- a/arm64/include/kvm/kvm-config-arch.h
+++ b/arm64/include/kvm/kvm-config-arch.h
@@ -14,6 +14,7 @@ struct kvm_config_arch {
 	u64		kaslr_seed;
 	enum irqchip_type irqchip;
 	u64		fw_addr;
+	u64		counter_offset;
 	unsigned int	sve_max_vq;
 	bool		no_pvtime;
 };
@@ -59,6 +60,8 @@ int sve_vl_parser(const struct option *opt, const char *arg, int unset);
 		     irqchip_parser, NULL),					\
 	OPT_U64('\0', "firmware-address", &(cfg)->fw_addr,			\
 		"Address where firmware should be loaded"),			\
+	OPT_U64('\0', "counter-offset", &(cfg)->counter_offset,			\
+		"Specify the counter offset, defaulting to 0"),			\
 	OPT_BOOLEAN('\0', "nested", &(cfg)->nested_virt,			\
 		    "Start VCPUs in EL2 (for nested virt)"),
 
diff --git a/arm64/kvm.c b/arm64/kvm.c
index 23b4dab1..6e971dd7 100644
--- a/arm64/kvm.c
+++ b/arm64/kvm.c
@@ -119,6 +119,22 @@ static void kvm__arch_enable_mte(struct kvm *kvm)
 	pr_debug("MTE capability enabled");
 }
 
+static void kvm__arch_set_counter_offset(struct kvm *kvm)
+{
+	struct kvm_arm_counter_offset offset = {
+		.counter_offset = kvm->cfg.arch.counter_offset,
+	};
+
+	if (!kvm->cfg.arch.counter_offset)
+		return;
+
+	if (!kvm__supports_extension(kvm, KVM_CAP_COUNTER_OFFSET))
+		die("No support for global counter offset");
+
+	if (ioctl(kvm->vm_fd, KVM_ARM_SET_COUNTER_OFFSET, &offset))
+		die_perror("KVM_ARM_SET_COUNTER_OFFSET");
+}
+
 void kvm__arch_init(struct kvm *kvm)
 {
 	/* Create the virtual GIC. */
@@ -126,6 +142,7 @@ void kvm__arch_init(struct kvm *kvm)
 		die("Failed to create virtual GIC");
 
 	kvm__arch_enable_mte(kvm);
+	kvm__arch_set_counter_offset(kvm);
 }
 
 static u64 kvm__arch_get_payload_region_size(struct kvm *kvm)
-- 
2.43.0



* [PATCH kvmtool v5 5/7] arm64: Add FEAT_E2H0 support
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
                   ` (3 preceding siblings ...)
  2026-01-23 14:27 ` [PATCH kvmtool v5 4/7] arm64: Add counter offset control Andre Przywara
@ 2026-01-23 14:27 ` Andre Przywara
  2026-01-30  9:29   ` Sascha Bischoff
  2026-01-23 14:27 ` [PATCH kvmtool v5 6/7] arm64: Generate HYP timer interrupt specifiers Andre Przywara
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC (permalink / raw)
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

From: Marc Zyngier <maz@kernel.org>

The --nested option allows a guest to boot at EL2 without FEAT_E2H0
(i.e. mandating VHE support). While this is great for "modern" operating
systems and hypervisors, a few legacy guests are stuck in a distant past.

To support those, add the --e2h0 command line option, that exposes
FEAT_E2H0 to the guest, at the expense of a number of other features, such
as FEAT_NV2. This is conditioned on the host itself supporting FEAT_E2H0.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
 arm64/include/kvm/kvm-config-arch.h | 5 ++++-
 arm64/kvm-cpu.c                     | 5 +++++
 arm64/kvm.c                         | 2 ++
 3 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/arm64/include/kvm/kvm-config-arch.h b/arm64/include/kvm/kvm-config-arch.h
index 44c43367..73bf4211 100644
--- a/arm64/include/kvm/kvm-config-arch.h
+++ b/arm64/include/kvm/kvm-config-arch.h
@@ -11,6 +11,7 @@ struct kvm_config_arch {
 	bool		has_pmuv3;
 	bool		mte_disabled;
 	bool		nested_virt;
+	bool		e2h0;
 	u64		kaslr_seed;
 	enum irqchip_type irqchip;
 	u64		fw_addr;
@@ -63,6 +64,8 @@ int sve_vl_parser(const struct option *opt, const char *arg, int unset);
 	OPT_U64('\0', "counter-offset", &(cfg)->counter_offset,			\
 		"Specify the counter offset, defaulting to 0"),			\
 	OPT_BOOLEAN('\0', "nested", &(cfg)->nested_virt,			\
-		    "Start VCPUs in EL2 (for nested virt)"),
+		    "Start VCPUs in EL2 (for nested virt)"),			\
+	OPT_BOOLEAN('\0', "e2h0", &(cfg)->e2h0,					\
+		    "Create guest without VHE support"),
 
 #endif /* ARM_COMMON__KVM_CONFIG_ARCH_H */
diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
index 42dc11da..5e4f3a7d 100644
--- a/arm64/kvm-cpu.c
+++ b/arm64/kvm-cpu.c
@@ -76,6 +76,11 @@ static void kvm_cpu__select_features(struct kvm *kvm, struct kvm_vcpu_init *init
 		if (!kvm__supports_extension(kvm, KVM_CAP_ARM_EL2))
 			die("EL2 (nested virt) is not supported");
 		init->features[0] |= 1UL << KVM_ARM_VCPU_HAS_EL2;
+		if (kvm->cfg.arch.e2h0) {
+			if (!kvm__supports_extension(kvm, KVM_CAP_ARM_EL2_E2H0))
+				die("FEAT_E2H0 is not supported");
+			init->features[0] |= 1UL << KVM_ARM_VCPU_HAS_EL2_E2H0;
+		}
 	}
 }
 
diff --git a/arm64/kvm.c b/arm64/kvm.c
index 6e971dd7..ed0f1264 100644
--- a/arm64/kvm.c
+++ b/arm64/kvm.c
@@ -440,6 +440,8 @@ void kvm__arch_validate_cfg(struct kvm *kvm)
 	    kvm->cfg.ram_addr + kvm->cfg.ram_size > SZ_4G) {
 		die("RAM extends above 4GB");
 	}
+	if (kvm->cfg.arch.e2h0 && !kvm->cfg.arch.nested_virt)
+		pr_warning("--e2h0 requires --nested, ignoring");
 }
 
 u64 kvm__arch_default_ram_address(void)
-- 
2.43.0



* [PATCH kvmtool v5 6/7] arm64: Generate HYP timer interrupt specifiers
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
                   ` (4 preceding siblings ...)
  2026-01-23 14:27 ` [PATCH kvmtool v5 5/7] arm64: Add FEAT_E2H0 support Andre Przywara
@ 2026-01-23 14:27 ` Andre Przywara
  2026-01-23 14:27 ` [PATCH kvmtool v5 7/7] arm64: Handle virtio endianness reset when running nested Andre Przywara
  2026-02-09  2:21 ` [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Itaru Kitayama
  7 siblings, 0 replies; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC (permalink / raw)
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

From: Marc Zyngier <maz@kernel.org>

FEAT_VHE introduced a non-secure EL2 virtual timer, along with its
interrupt line. Consequently the arch timer DT binding introduced a fifth
interrupt to communicate this interrupt number.

Refactor the interrupts property generation code to deal with a variable
number of interrupts, and forward five interrupts instead of four in case
nested virt is enabled.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
---
 arm64/arm-cpu.c           |  4 +---
 arm64/include/kvm/timer.h |  2 +-
 arm64/timer.c             | 29 ++++++++++++-----------------
 3 files changed, 14 insertions(+), 21 deletions(-)

diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
index 0843ac05..5b5484d8 100644
--- a/arm64/arm-cpu.c
+++ b/arm64/arm-cpu.c
@@ -12,10 +12,8 @@
 
 static void generate_fdt_nodes(void *fdt, struct kvm *kvm)
 {
-	int timer_interrupts[4] = {13, 14, 11, 10};
-
 	gic__generate_fdt_nodes(fdt, kvm);
-	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
+	timer__generate_fdt_nodes(fdt, kvm);
 	pmu__generate_fdt_nodes(fdt, kvm);
 }
 
diff --git a/arm64/include/kvm/timer.h b/arm64/include/kvm/timer.h
index 928e9ea7..81e093e4 100644
--- a/arm64/include/kvm/timer.h
+++ b/arm64/include/kvm/timer.h
@@ -1,6 +1,6 @@
 #ifndef ARM_COMMON__TIMER_H
 #define ARM_COMMON__TIMER_H
 
-void timer__generate_fdt_nodes(void *fdt, struct kvm *kvm, int *irqs);
+void timer__generate_fdt_nodes(void *fdt, struct kvm *kvm);
 
 #endif /* ARM_COMMON__TIMER_H */
diff --git a/arm64/timer.c b/arm64/timer.c
index 861f2d99..2ac6144f 100644
--- a/arm64/timer.c
+++ b/arm64/timer.c
@@ -5,31 +5,26 @@
 #include "kvm/timer.h"
 #include "kvm/util.h"
 
-void timer__generate_fdt_nodes(void *fdt, struct kvm *kvm, int *irqs)
+void timer__generate_fdt_nodes(void *fdt, struct kvm *kvm)
 {
 	const char compatible[] = "arm,armv8-timer\0arm,armv7-timer";
 	u32 cpu_mask = gic__get_fdt_irq_cpumask(kvm);
-	u32 irq_prop[] = {
-		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
-		cpu_to_fdt32(irqs[0]),
-		cpu_to_fdt32(cpu_mask | IRQ_TYPE_LEVEL_LOW),
+	int irqs[5] = {13, 14, 11, 10, 12};
+	int nr = ARRAY_SIZE(irqs);
+	u32 irq_prop[nr * 3];
 
-		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
-		cpu_to_fdt32(irqs[1]),
-		cpu_to_fdt32(cpu_mask | IRQ_TYPE_LEVEL_LOW),
+	if (!kvm->cfg.arch.nested_virt)
+		nr--;
 
-		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
-		cpu_to_fdt32(irqs[2]),
-		cpu_to_fdt32(cpu_mask | IRQ_TYPE_LEVEL_LOW),
-
-		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
-		cpu_to_fdt32(irqs[3]),
-		cpu_to_fdt32(cpu_mask | IRQ_TYPE_LEVEL_LOW),
-	};
+	for (int i = 0; i < nr; i++) {
+		irq_prop[i * 3 + 0] = cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI);
+		irq_prop[i * 3 + 1] = cpu_to_fdt32(irqs[i]);
+		irq_prop[i * 3 + 2] = cpu_to_fdt32(cpu_mask | IRQ_TYPE_LEVEL_LOW);
+	}
 
 	_FDT(fdt_begin_node(fdt, "timer"));
 	_FDT(fdt_property(fdt, "compatible", compatible, sizeof(compatible)));
-	_FDT(fdt_property(fdt, "interrupts", irq_prop, sizeof(irq_prop)));
+	_FDT(fdt_property(fdt, "interrupts", irq_prop, nr * 3 * sizeof(irq_prop[0])));
 	_FDT(fdt_property(fdt, "always-on", NULL, 0));
 	if (kvm->cfg.arch.force_cntfrq > 0)
 		_FDT(fdt_property_cell(fdt, "clock-frequency", kvm->cfg.arch.force_cntfrq));
-- 
2.43.0



* [PATCH kvmtool v5 7/7] arm64: Handle virtio endianness reset when running nested
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
                   ` (5 preceding siblings ...)
  2026-01-23 14:27 ` [PATCH kvmtool v5 6/7] arm64: Generate HYP timer interrupt specifiers Andre Przywara
@ 2026-01-23 14:27 ` Andre Przywara
  2026-01-23 16:03   ` Marc Zyngier
  2026-02-09  2:21 ` [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Itaru Kitayama
  7 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2026-01-23 14:27 UTC (permalink / raw)
  To: Julien Thierry, Will Deacon
  Cc: Marc Zyngier, kvm, kvmarm, Alexandru Elisei, Sascha Bischoff

From: Marc Zyngier <maz@kernel.org>

When running an EL2 guest, we need to make sure we don't sample
SCTLR_EL1 to work out the virtio endianness, as this is likely
to be a bit random.

Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
 arm64/include/kvm/kvm-cpu-arch.h |  5 ++--
 arm64/kvm-cpu.c                  | 47 +++++++++++++++++++++++++-------
 2 files changed, 40 insertions(+), 12 deletions(-)

diff --git a/arm64/include/kvm/kvm-cpu-arch.h b/arm64/include/kvm/kvm-cpu-arch.h
index 1af394aa..85646ad4 100644
--- a/arm64/include/kvm/kvm-cpu-arch.h
+++ b/arm64/include/kvm/kvm-cpu-arch.h
@@ -10,8 +10,9 @@
 #define ARM_MPIDR_HWID_BITMASK	0xFF00FFFFFFUL
 #define ARM_CPU_ID		3, 0, 0, 0
 #define ARM_CPU_ID_MPIDR	5
-#define ARM_CPU_CTRL		3, 0, 1, 0
-#define ARM_CPU_CTRL_SCTLR_EL1	0
+#define SYS_SCTLR_EL1		3, 4, 1, 0, 0
+#define SYS_SCTLR_EL2		3, 4, 1, 0, 0
+#define SYS_HCR_EL2		3, 4, 1, 1, 0
 
 struct kvm_cpu {
 	pthread_t	thread;
diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
index 5e4f3a7d..35e1c639 100644
--- a/arm64/kvm-cpu.c
+++ b/arm64/kvm-cpu.c
@@ -12,6 +12,7 @@
 
 #define SCTLR_EL1_E0E_MASK	(1 << 24)
 #define SCTLR_EL1_EE_MASK	(1 << 25)
+#define HCR_EL2_TGE		(1 << 27)
 
 static int debug_fd;
 
@@ -408,7 +409,8 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
 {
 	struct kvm_one_reg reg;
 	u64 psr;
-	u64 sctlr;
+	u64 sctlr, bit;
+	u64 hcr = 0;
 
 	/*
 	 * Quoting the definition given by Peter Maydell:
@@ -419,8 +421,9 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
 	 * We first check for an AArch32 guest: its endianness can
 	 * change when using SETEND, which affects the CPSR.E bit.
 	 *
-	 * If we're AArch64, use SCTLR_EL1.E0E if access comes from
-	 * EL0, and SCTLR_EL1.EE if access comes from EL1.
+	 * If we're AArch64, determine which SCTLR register to use,
+	 * depending on NV being used or not. Then use either the E0E
+	 * bit for EL0, or the EE bit for EL1/EL2.
 	 */
 	reg.id = ARM64_CORE_REG(regs.pstate);
 	reg.addr = (u64)&psr;
@@ -430,16 +433,40 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
 	if (psr & PSR_MODE32_BIT)
 		return (psr & COMPAT_PSR_E_BIT) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
 
-	reg.id = ARM64_SYS_REG(ARM_CPU_CTRL, ARM_CPU_CTRL_SCTLR_EL1);
+	if (vcpu->kvm->cfg.arch.nested_virt) {
+		reg.id = ARM64_SYS_REG(SYS_HCR_EL2);
+		reg.addr = (u64)&hcr;
+		if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
+			die("KVM_GET_ONE_REG failed (HCR_EL2)");
+	}
+
+	switch (psr & PSR_MODE_MASK) {
+	case PSR_MODE_EL0t:
+		if (hcr & HCR_EL2_TGE)
+			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
+		else
+			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL1);
+		bit = SCTLR_EL1_E0E_MASK;
+		break;
+	case PSR_MODE_EL1t:
+	case PSR_MODE_EL1h:
+		reg.id = ARM64_SYS_REG(SYS_SCTLR_EL1);
+		bit = SCTLR_EL1_EE_MASK;
+		break;
+	case PSR_MODE_EL2t:
+	case PSR_MODE_EL2h:
+		reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
+		bit = SCTLR_EL1_EE_MASK;
+		break;
+	default:
+		die("What's that mode???\n");
+	}
+
 	reg.addr = (u64)&sctlr;
 	if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
-		die("KVM_GET_ONE_REG failed (SCTLR_EL1)");
+		die("KVM_GET_ONE_REG failed (SCTLR_ELx)");
 
-	if ((psr & PSR_MODE_MASK) == PSR_MODE_EL0t)
-		sctlr &= SCTLR_EL1_E0E_MASK;
-	else
-		sctlr &= SCTLR_EL1_EE_MASK;
-	return sctlr ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
+	return (sctlr & bit) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
 }
 
 void kvm_cpu__show_code(struct kvm_cpu *vcpu)
-- 
2.43.0



* Re: [PATCH kvmtool v5 7/7] arm64: Handle virtio endianness reset when running nested
  2026-01-23 14:27 ` [PATCH kvmtool v5 7/7] arm64: Handle virtio endianness reset when running nested Andre Przywara
@ 2026-01-23 16:03   ` Marc Zyngier
  2026-01-27 10:15     ` Sascha Bischoff
  0 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2026-01-23 16:03 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Julien Thierry, Will Deacon, kvm, kvmarm, Alexandru Elisei,
	Sascha Bischoff

On Fri, 23 Jan 2026 14:27:29 +0000,
Andre Przywara <andre.przywara@arm.com> wrote:
> 
> From: Marc Zyngier <maz@kernel.org>
> 
> When running an EL2 guest, we need to make sure we don't sample
> SCTLR_EL1 to work out the virtio endianness, as this is likely
> to be a bit random.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  arm64/include/kvm/kvm-cpu-arch.h |  5 ++--
>  arm64/kvm-cpu.c                  | 47 +++++++++++++++++++++++++-------
>  2 files changed, 40 insertions(+), 12 deletions(-)
> 
> diff --git a/arm64/include/kvm/kvm-cpu-arch.h b/arm64/include/kvm/kvm-cpu-arch.h
> index 1af394aa..85646ad4 100644
> --- a/arm64/include/kvm/kvm-cpu-arch.h
> +++ b/arm64/include/kvm/kvm-cpu-arch.h
> @@ -10,8 +10,9 @@
>  #define ARM_MPIDR_HWID_BITMASK	0xFF00FFFFFFUL
>  #define ARM_CPU_ID		3, 0, 0, 0
>  #define ARM_CPU_ID_MPIDR	5
> -#define ARM_CPU_CTRL		3, 0, 1, 0
> -#define ARM_CPU_CTRL_SCTLR_EL1	0
> +#define SYS_SCTLR_EL1		3, 4, 1, 0, 0
> +#define SYS_SCTLR_EL2		3, 4, 1, 0, 0
> +#define SYS_HCR_EL2		3, 4, 1, 1, 0
>  
>  struct kvm_cpu {
>  	pthread_t	thread;
> diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
> index 5e4f3a7d..35e1c639 100644
> --- a/arm64/kvm-cpu.c
> +++ b/arm64/kvm-cpu.c
> @@ -12,6 +12,7 @@
>  
>  #define SCTLR_EL1_E0E_MASK	(1 << 24)
>  #define SCTLR_EL1_EE_MASK	(1 << 25)
> +#define HCR_EL2_TGE		(1 << 27)
>  
>  static int debug_fd;
>  
> @@ -408,7 +409,8 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
>  {
>  	struct kvm_one_reg reg;
>  	u64 psr;
> -	u64 sctlr;
> +	u64 sctlr, bit;
> +	u64 hcr = 0;
>  
>  	/*
>  	 * Quoting the definition given by Peter Maydell:
> @@ -419,8 +421,9 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
>  	 * We first check for an AArch32 guest: its endianness can
>  	 * change when using SETEND, which affects the CPSR.E bit.
>  	 *
> -	 * If we're AArch64, use SCTLR_EL1.E0E if access comes from
> -	 * EL0, and SCTLR_EL1.EE if access comes from EL1.
> +	 * If we're AArch64, determine which SCTLR register to use,
> +	 * depending on NV being used or not. Then use either the E0E
> +	 * bit for EL0, or the EE bit for EL1/EL2.
>  	 */
>  	reg.id = ARM64_CORE_REG(regs.pstate);
>  	reg.addr = (u64)&psr;
> @@ -430,16 +433,40 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
>  	if (psr & PSR_MODE32_BIT)
>  		return (psr & COMPAT_PSR_E_BIT) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
>  
> -	reg.id = ARM64_SYS_REG(ARM_CPU_CTRL, ARM_CPU_CTRL_SCTLR_EL1);
> +	if (vcpu->kvm->cfg.arch.nested_virt) {
> +		reg.id = ARM64_SYS_REG(SYS_HCR_EL2);
> +		reg.addr = (u64)&hcr;
> +		if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
> +			die("KVM_GET_ONE_REG failed (HCR_EL2)");
> +	}
> +
> +	switch (psr & PSR_MODE_MASK) {
> +	case PSR_MODE_EL0t:
> +		if (hcr & HCR_EL2_TGE)
> +			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
> +		else
> +			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL1);
> +		bit = SCTLR_EL1_E0E_MASK;
> +		break;

A discussion with Sascha outlined a small bug here: when using the EL2
translation regime (E2H==0), we sample the wrong bit (SCTLR_EL2.E0E
does not exist in this case).

A potential fix is as follows, though I don't think anyone will care...

	M.

diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
index 35e1c639..7b012e7a 100644
--- a/arm64/kvm-cpu.c
+++ b/arm64/kvm-cpu.c
@@ -12,7 +12,8 @@
 
 #define SCTLR_EL1_E0E_MASK	(1 << 24)
 #define SCTLR_EL1_EE_MASK	(1 << 25)
-#define HCR_EL2_TGE		(1 << 27)
+#define HCR_EL2_TGE		(1UL << 27)
+#define HCR_EL2_E2H		(1UL << 34)
 
 static int debug_fd;
 
@@ -442,11 +443,21 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
 
 	switch (psr & PSR_MODE_MASK) {
 	case PSR_MODE_EL0t:
-		if (hcr & HCR_EL2_TGE)
+		switch (hcr & (HCR_EL2_E2H | HCR_EL2_TGE)) {
+		case HCR_EL2_E2H | HCR_EL2_TGE: /* EL2&0 */
 			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
-		else
+			bit = SCTLR_EL1_E0E_MASK;
+			break;
+		case HCR_EL2_TGE: 		/* EL2 */
+			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
+			bit = SCTLR_EL1_EE_MASK;
+			break;
+		case HCR_EL2_E2H:		/* EL1&0 (VHE) */
+		default:			/* EL1&0 (!VHE) */
 			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL1);
-		bit = SCTLR_EL1_E0E_MASK;
+			bit = SCTLR_EL1_E0E_MASK;
+			break;
+		}
 		break;
 	case PSR_MODE_EL1t:
 	case PSR_MODE_EL1h:

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ
  2026-01-23 14:27 ` [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ Andre Przywara
@ 2026-01-26 18:03   ` Marc Zyngier
  2026-01-27 12:07     ` Andre Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Marc Zyngier @ 2026-01-26 18:03 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Julien Thierry, Will Deacon, kvm, kvmarm, Alexandru Elisei,
	Sascha Bischoff

On Fri, 23 Jan 2026 14:27:25 +0000,
Andre Przywara <andre.przywara@arm.com> wrote:
> 
> Uses the new VGIC KVM device attribute to set the maintenance IRQ.
> This is fixed to use PPI 9, as a platform decision made by kvmtool,
> matching the SBSA recommendation.
> Use the opportunity to pass the kvm pointer to gic__generate_fdt_nodes(),
> as this simplifies the call and allows us access to the nested_virt
> config variable on the way.
> 
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
>  arm64/arm-cpu.c         |  2 +-
>  arm64/gic.c             | 29 +++++++++++++++++++++++++++--
>  arm64/include/kvm/gic.h |  2 +-
>  3 files changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
> index 69bb2cb2..0843ac05 100644
> --- a/arm64/arm-cpu.c
> +++ b/arm64/arm-cpu.c
> @@ -14,7 +14,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm)
>  {
>  	int timer_interrupts[4] = {13, 14, 11, 10};
>  
> -	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
> +	gic__generate_fdt_nodes(fdt, kvm);
>  	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
>  	pmu__generate_fdt_nodes(fdt, kvm);
>  }
> diff --git a/arm64/gic.c b/arm64/gic.c
> index b0d3a1ab..2a595184 100644
> --- a/arm64/gic.c
> +++ b/arm64/gic.c
> @@ -11,6 +11,8 @@
>  
>  #define IRQCHIP_GIC 0
>  
> +#define GIC_MAINT_IRQ	9
> +
>  static int gic_fd = -1;
>  static u64 gic_redists_base;
>  static u64 gic_redists_size;
> @@ -302,10 +304,15 @@ static int gic__init_gic(struct kvm *kvm)
>  
>  	int lines = irq__get_nr_allocated_lines();
>  	u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
> +	u32 maint_irq = GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
>  	struct kvm_device_attr nr_irqs_attr = {
>  		.group	= KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
>  		.addr	= (u64)(unsigned long)&nr_irqs,
>  	};
> +	struct kvm_device_attr maint_irq_attr = {
> +		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
> +		.addr	= (u64)(unsigned long)&maint_irq,
> +	};
>  	struct kvm_device_attr vgic_init_attr = {
>  		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
>  		.attr	= KVM_DEV_ARM_VGIC_CTRL_INIT,
> @@ -325,6 +332,16 @@ static int gic__init_gic(struct kvm *kvm)
>  			return ret;
>  	}
>  
> +	if (kvm->cfg.arch.nested_virt) {
> +		ret = ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &maint_irq_attr);
> +		if (!ret)
> +			ret = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &maint_irq_attr);
> +		if (ret) {
> +			pr_err("could not set maintenance IRQ\n");
> +			return ret;
> +		}
> +	}
> +
>  	irq__routing_init(kvm);
>  
>  	if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &vgic_init_attr)) {
> @@ -342,7 +359,7 @@ static int gic__init_gic(struct kvm *kvm)
>  }
>  late_init(gic__init_gic)
>  
> -void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
> +void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
>  {
>  	const char *compatible, *msi_compatible = NULL;
>  	u64 msi_prop[2];
> @@ -350,8 +367,12 @@ void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
>  		cpu_to_fdt64(ARM_GIC_DIST_BASE), cpu_to_fdt64(ARM_GIC_DIST_SIZE),
>  		0, 0,				/* to be filled */
>  	};
> +	u32 maint_irq[] = {
> +		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(GIC_MAINT_IRQ),
> +		gic__get_fdt_irq_cpumask(kvm) | IRQ_TYPE_LEVEL_HIGH
> +	};

This looks utterly broken, and my guests barf on this:

        intc {  
                compatible = "arm,gic-v3";
                #interrupt-cells = <0x03>;
                interrupt-controller;
                reg = <0x00 0x3fff0000 0x00 0x10000 0x00 0x3fef0000 0x00 0x100000>;
                interrupts = <0x01 0x09 0x4000000>;
                                        ^^^^^^^^^^^
Are you testing on a big-endian box??? I fixed it with the patchlet
below, but I also wonder why you added gic__get_fdt_irq_cpumask()...
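
For illustration, the 0x4000000 cell above is what a little-endian host
produces when a host-order value lands in the FDT unconverted. The sketch
below is not kvmtool code: to_fdt32() and read_fdt_cell() are hypothetical
stand-ins for cpu_to_fdt32() and for a DT consumer, and the cpumask is
assumed to be 0 (as the dump suggests for this GICv3 guest), which leaves
only IRQ_TYPE_LEVEL_HIGH (4) in the cell.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for cpu_to_fdt32(): returns a u32 whose in-memory
 * bytes are the big-endian encoding of v, whatever the host byte order.
 */
static uint32_t to_fdt32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v,
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* What a DT consumer does: interpret the stored bytes as big-endian. */
static uint32_t read_fdt_cell(uint32_t stored)
{
	uint8_t b[4];

	memcpy(b, &stored, sizeof(b));
	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

With these helpers, read_fdt_cell(to_fdt32(4)) gives back 4 on any host,
whereas read_fdt_cell(4) (the unconverted case) yields 0x04000000 on a
little-endian machine, matching the dump.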

	M.

diff --git a/arm64/gic.c b/arm64/gic.c
index 2a59518..640ff35 100644
--- a/arm64/gic.c
+++ b/arm64/gic.c
@@ -369,7 +369,7 @@ void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
 	};
 	u32 maint_irq[] = {
 		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(GIC_MAINT_IRQ),
-		gic__get_fdt_irq_cpumask(kvm) | IRQ_TYPE_LEVEL_HIGH
+		cpu_to_fdt32(gic__get_fdt_irq_cpumask(kvm) | IRQ_TYPE_LEVEL_HIGH),
 	};
 
 	switch (kvm->cfg.arch.irqchip) {

-- 
Without deviation from the norm, progress is not possible.


* Re: [PATCH kvmtool v5 7/7] arm64: Handle virtio endianness reset when running nested
  2026-01-23 16:03   ` Marc Zyngier
@ 2026-01-27 10:15     ` Sascha Bischoff
  0 siblings, 0 replies; 18+ messages in thread
From: Sascha Bischoff @ 2026-01-27 10:15 UTC (permalink / raw)
  To: maz@kernel.org, Andre Przywara
  Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev, Alexandru Elisei,
	will@kernel.org, nd, julien.thierry.kdev@gmail.com

On Fri, 2026-01-23 at 16:03 +0000, Marc Zyngier wrote:
> On Fri, 23 Jan 2026 14:27:29 +0000,
> Andre Przywara <andre.przywara@arm.com> wrote:
> > 
> > From: Marc Zyngier <maz@kernel.org>
> > 
> > When running an EL2 guest, we need to make sure we don't sample
> > SCTLR_EL1 to work out the virtio endianness, as this is likely
> > to be a bit random.
> > 
> > Signed-off-by: Marc Zyngier <maz@kernel.org>
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > ---
> >  arm64/include/kvm/kvm-cpu-arch.h |  5 ++--
> >  arm64/kvm-cpu.c                  | 47 +++++++++++++++++++++++++-------
> >  2 files changed, 40 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arm64/include/kvm/kvm-cpu-arch.h b/arm64/include/kvm/kvm-cpu-arch.h
> > index 1af394aa..85646ad4 100644
> > --- a/arm64/include/kvm/kvm-cpu-arch.h
> > +++ b/arm64/include/kvm/kvm-cpu-arch.h
> > @@ -10,8 +10,9 @@
> >  #define ARM_MPIDR_HWID_BITMASK	0xFF00FFFFFFUL
> >  #define ARM_CPU_ID		3, 0, 0, 0
> >  #define ARM_CPU_ID_MPIDR	5
> > -#define ARM_CPU_CTRL		3, 0, 1, 0
> > -#define ARM_CPU_CTRL_SCTLR_EL1	0
> > +#define SYS_SCTLR_EL1		3, 4, 1, 0, 0
> > +#define SYS_SCTLR_EL2		3, 4, 1, 0, 0

These can't both be the same! Looks like SCTLR_EL2 is correct, but
SCTLR_EL1 should be 3, 0, 1, 0, 0.
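
To make the clash concrete, here is a small sketch of how the five-tuple
ends up in the low bits of the register ID. The shift values are assumed to
match the KVM_REG_ARM64_SYSREG_*_SHIFT definitions in the arm64 UAPI header,
and sysreg_field() is a made-up helper that leaves out the KVM_REG_ARM64,
size and SYSREG prefix bits that ARM64_SYS_REG() adds on top.

```c
#include <stdint.h>

/* Assumed to mirror KVM_REG_ARM64_SYSREG_{OP0,OP1,CRN,CRM,OP2}_SHIFT. */
#define OP0_SHIFT	14
#define OP1_SHIFT	11
#define CRN_SHIFT	7
#define CRM_SHIFT	3
#define OP2_SHIFT	0

/* Hypothetical helper: packs only the op0/op1/CRn/CRm/op2 part of the ID. */
static uint32_t sysreg_field(uint32_t op0, uint32_t op1, uint32_t crn,
			     uint32_t crm, uint32_t op2)
{
	return (op0 << OP0_SHIFT) | (op1 << OP1_SHIFT) | (crn << CRN_SHIFT) |
	       (crm << CRM_SHIFT) | (op2 << OP2_SHIFT);
}
```

Under these assumptions sysreg_field(3, 0, 1, 0, 0) is 0xc080 for SCTLR_EL1
and sysreg_field(3, 4, 1, 0, 0) is 0xe080 for SCTLR_EL2. With both macros
set to 3, 4, 1, 0, 0, every SCTLR_EL1 read would actually fetch SCTLR_EL2.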

> > +#define SYS_HCR_EL2		3, 4, 1, 1, 0
> >  
> >  struct kvm_cpu {
> >  	pthread_t	thread;
> > diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
> > index 5e4f3a7d..35e1c639 100644
> > --- a/arm64/kvm-cpu.c
> > +++ b/arm64/kvm-cpu.c
> > @@ -12,6 +12,7 @@
> >  
> >  #define SCTLR_EL1_E0E_MASK	(1 << 24)
> >  #define SCTLR_EL1_EE_MASK	(1 << 25)

nit: It might be worth renaming these to SCTLR_ELx_E... (or
SCTLR_EE_MASK, SCTLR_E0E_MASK) as we apply them to both the EL1 and EL2
versions.

> > +#define HCR_EL2_TGE		(1 << 27)
> >  
> >  static int debug_fd;
> >  
> > @@ -408,7 +409,8 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
> >  {
> >  	struct kvm_one_reg reg;
> >  	u64 psr;
> > -	u64 sctlr;
> > +	u64 sctlr, bit;
> > +	u64 hcr = 0;
> >  
> >  	/*
> >  	 * Quoting the definition given by Peter Maydell:
> > @@ -419,8 +421,9 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
> >  	 * We first check for an AArch32 guest: its endianness can
> >  	 * change when using SETEND, which affects the CPSR.E bit.
> >  	 *
> > -	 * If we're AArch64, use SCTLR_EL1.E0E if access comes from
> > -	 * EL0, and SCTLR_EL1.EE if access comes from EL1.
> > +	 * If we're AArch64, determine which SCTLR register to use,
> > +	 * depending on NV being used or not. Then use either the E0E
> > +	 * bit for EL0, or the EE bit for EL1/EL2.
> >  	 */
> >  	reg.id = ARM64_CORE_REG(regs.pstate);
> >  	reg.addr = (u64)&psr;
> > @@ -430,16 +433,40 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
> >  	if (psr & PSR_MODE32_BIT)
> >  		return (psr & COMPAT_PSR_E_BIT) ? VIRTIO_ENDIAN_BE : VIRTIO_ENDIAN_LE;
> >  
> > -	reg.id = ARM64_SYS_REG(ARM_CPU_CTRL, ARM_CPU_CTRL_SCTLR_EL1);
> > +	if (vcpu->kvm->cfg.arch.nested_virt) {
> > +		reg.id = ARM64_SYS_REG(SYS_HCR_EL2);
> > +		reg.addr = (u64)&hcr;
> > +		if (ioctl(vcpu->vcpu_fd, KVM_GET_ONE_REG, &reg) < 0)
> > +			die("KVM_GET_ONE_REG failed (HCR_EL2)");
> > +	}
> > +
> > +	switch (psr & PSR_MODE_MASK) {
> > +	case PSR_MODE_EL0t:
> > +		if (hcr & HCR_EL2_TGE)
> > +			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
> > +		else
> > +			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL1);
> > +		bit = SCTLR_EL1_E0E_MASK;
> > +		break;
> 
> A discussion with Sascha outlined a small bug here: when using the EL2
> translation regime (E2H==0), we sample the wrong bit (SCTLR_EL2.E0E
> does not exist in this case).
> 
> A potential fix is as follows, though I don't think anyone will care...
> 
> 	M.
> 
> diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
> index 35e1c639..7b012e7a 100644
> --- a/arm64/kvm-cpu.c
> +++ b/arm64/kvm-cpu.c
> @@ -12,7 +12,8 @@
>  
>  #define SCTLR_EL1_E0E_MASK	(1 << 24)
>  #define SCTLR_EL1_EE_MASK	(1 << 25)
> -#define HCR_EL2_TGE		(1 << 27)
> +#define HCR_EL2_TGE		(1UL << 27)
> +#define HCR_EL2_E2H		(1UL << 34)
>  
>  static int debug_fd;
>  
> @@ -442,11 +443,21 @@ int kvm_cpu__get_endianness(struct kvm_cpu *vcpu)
>  
>  	switch (psr & PSR_MODE_MASK) {
>  	case PSR_MODE_EL0t:
> -		if (hcr & HCR_EL2_TGE)
> +		switch (hcr & (HCR_EL2_E2H | HCR_EL2_TGE)) {
> +		case HCR_EL2_E2H | HCR_EL2_TGE: /* EL2&0 */
>  			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
> -		else
> +			bit = SCTLR_EL1_E0E_MASK;
> +			break;
> +		case HCR_EL2_TGE: 		/* EL2 */
> +			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL2);
> +			bit = SCTLR_EL1_EE_MASK;
> +			break;
> +		case HCR_EL2_E2H:		/* EL1&0 (VHE) */
> +		default:			/* EL1&0 (!VHE) */
>  			reg.id = ARM64_SYS_REG(SYS_SCTLR_EL1);
> -		bit = SCTLR_EL1_E0E_MASK;
> +			bit = SCTLR_EL1_E0E_MASK;
> +			break;
> +		}
>  		break;

This version matches my understanding of how this should work.

It is probably worth updating the comment above this block too. At the
very least "depending on NV being used or not" should become "depending
on the combination of TGE and E2H" or similar.
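
As input for such a comment, the EL0 case boils down to a four-entry table.
A standalone sketch of that table follows; the enum, struct and function
names are invented for illustration, while the bit positions are the ones
used in the patch and in the fix quoted above.

```c
#include <stdint.h>

#define HCR_EL2_TGE	(1ULL << 27)
#define HCR_EL2_E2H	(1ULL << 34)
#define SCTLR_E0E_MASK	(1ULL << 24)
#define SCTLR_EE_MASK	(1ULL << 25)

/* Illustrative only: which SCTLR_ELx the caller should read. */
enum sctlr_src { USE_SCTLR_EL1, USE_SCTLR_EL2 };

struct endian_sel {
	enum sctlr_src reg;
	uint64_t bit;
};

/* Pick register and bit for an access from EL0, given the guest's HCR_EL2. */
static struct endian_sel el0_endian_sel(uint64_t hcr)
{
	struct endian_sel sel;

	switch (hcr & (HCR_EL2_E2H | HCR_EL2_TGE)) {
	case HCR_EL2_E2H | HCR_EL2_TGE:	/* EL2&0 regime */
		sel.reg = USE_SCTLR_EL2;
		sel.bit = SCTLR_E0E_MASK;
		break;
	case HCR_EL2_TGE:		/* EL2 regime, no E0E bit there */
		sel.reg = USE_SCTLR_EL2;
		sel.bit = SCTLR_EE_MASK;
		break;
	default:			/* EL1&0 regime, with or without VHE */
		sel.reg = USE_SCTLR_EL1;
		sel.bit = SCTLR_E0E_MASK;
		break;
	}

	return sel;
}
```

Without --nested the hcr value stays 0, so the default branch keeps the
existing SCTLR_EL1.E0E behaviour for non-nested guests.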

Thanks,
Sascha

>  	case PSR_MODE_EL1t:
>  	case PSR_MODE_EL1h:
> 



* Re: [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ
  2026-01-26 18:03   ` Marc Zyngier
@ 2026-01-27 12:07     ` Andre Przywara
  2026-01-27 13:23       ` Sascha Bischoff
  0 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2026-01-27 12:07 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Julien Thierry, Will Deacon, kvm, kvmarm, Alexandru Elisei,
	Sascha Bischoff

Hi Marc,

On 26/01/2026 18:03, Marc Zyngier wrote:
> On Fri, 23 Jan 2026 14:27:25 +0000,
> Andre Przywara <andre.przywara@arm.com> wrote:
>>
>> Uses the new VGIC KVM device attribute to set the maintenance IRQ.
>> This is fixed to use PPI 9, as a platform decision made by kvmtool,
>> matching the SBSA recommendation.
>> Use the opportunity to pass the kvm pointer to gic__generate_fdt_nodes(),
>> as this simplifies the call and allows us access to the nested_virt
>> config variable on the way.
>>
>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>> ---
>>   arm64/arm-cpu.c         |  2 +-
>>   arm64/gic.c             | 29 +++++++++++++++++++++++++++--
>>   arm64/include/kvm/gic.h |  2 +-
>>   3 files changed, 29 insertions(+), 4 deletions(-)
>>
>> diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
>> index 69bb2cb2..0843ac05 100644
>> --- a/arm64/arm-cpu.c
>> +++ b/arm64/arm-cpu.c
>> @@ -14,7 +14,7 @@ static void generate_fdt_nodes(void *fdt, struct kvm *kvm)
>>   {
>>   	int timer_interrupts[4] = {13, 14, 11, 10};
>>   
>> -	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
>> +	gic__generate_fdt_nodes(fdt, kvm);
>>   	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
>>   	pmu__generate_fdt_nodes(fdt, kvm);
>>   }
>> diff --git a/arm64/gic.c b/arm64/gic.c
>> index b0d3a1ab..2a595184 100644
>> --- a/arm64/gic.c
>> +++ b/arm64/gic.c
>> @@ -11,6 +11,8 @@
>>   
>>   #define IRQCHIP_GIC 0
>>   
>> +#define GIC_MAINT_IRQ	9
>> +
>>   static int gic_fd = -1;
>>   static u64 gic_redists_base;
>>   static u64 gic_redists_size;
>> @@ -302,10 +304,15 @@ static int gic__init_gic(struct kvm *kvm)
>>   
>>   	int lines = irq__get_nr_allocated_lines();
>>   	u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
>> +	u32 maint_irq = GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
>>   	struct kvm_device_attr nr_irqs_attr = {
>>   		.group	= KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
>>   		.addr	= (u64)(unsigned long)&nr_irqs,
>>   	};
>> +	struct kvm_device_attr maint_irq_attr = {
>> +		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
>> +		.addr	= (u64)(unsigned long)&maint_irq,
>> +	};
>>   	struct kvm_device_attr vgic_init_attr = {
>>   		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
>>   		.attr	= KVM_DEV_ARM_VGIC_CTRL_INIT,
>> @@ -325,6 +332,16 @@ static int gic__init_gic(struct kvm *kvm)
>>   			return ret;
>>   	}
>>   
>> +	if (kvm->cfg.arch.nested_virt) {
>> +		ret = ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &maint_irq_attr);
>> +		if (!ret)
>> +			ret = ioctl(gic_fd, KVM_SET_DEVICE_ATTR, &maint_irq_attr);
>> +		if (ret) {
>> +			pr_err("could not set maintenance IRQ\n");
>> +			return ret;
>> +		}
>> +	}
>> +
>>   	irq__routing_init(kvm);
>>   
>>   	if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR, &vgic_init_attr)) {
>> @@ -342,7 +359,7 @@ static int gic__init_gic(struct kvm *kvm)
>>   }
>>   late_init(gic__init_gic)
>>   
>> -void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
>> +void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
>>   {
>>   	const char *compatible, *msi_compatible = NULL;
>>   	u64 msi_prop[2];
>> @@ -350,8 +367,12 @@ void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
>>   		cpu_to_fdt64(ARM_GIC_DIST_BASE), cpu_to_fdt64(ARM_GIC_DIST_SIZE),
>>   		0, 0,				/* to be filled */
>>   	};
>> +	u32 maint_irq[] = {
>> +		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(GIC_MAINT_IRQ),
>> +		gic__get_fdt_irq_cpumask(kvm) | IRQ_TYPE_LEVEL_HIGH
>> +	};
> 
> This looks utterly broken, and my guests barf on this:
> 
>          intc {
>                  compatible = "arm,gic-v3";
>                  #interrupt-cells = <0x03>;
>                  interrupt-controller;
>                  reg = <0x00 0x3fff0000 0x00 0x10000 0x00 0x3fef0000 0x00 0x100000>;
>                  interrupts = <0x01 0x09 0x4000000>;

Ah yeah, sorry, that's of course a complete blunder, this got lost in
translation between v3 and v4.
                                           ^^^^^^^^^^^
> Are you testing on a big-endian box??? I fixed it with the patchlet
> below, but I also wonder why you added gic__get_fdt_irq_cpumask()...

this was to accommodate GICv2 (it returns 0 for GICv3), and was the 
equivalent of the hardcoded 0xff04 we had before. And though I guess 
there would be no overlap between machines supporting nested virt and 
having a GICv2 or a GICv2 emulation capable GICv3, I added this for the 
sake of completeness anyway, as it didn't feel right to make this 
assumption in the otherwise generic code.
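
For the record, a minimal standalone sketch of what went wrong with the
flags cell. put_be32() is a hypothetical stand-in for cpu_to_fdt32(),
and the helper names are made up for illustration, this is not kvmtool
code. On a little-endian host the unconverted value 0x4 is read back by
a DT consumer as 0x04000000, which matches the dump above:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Value of IRQ_TYPE_LEVEL_HIGH in the DT interrupt bindings. */
#define LEVEL_HIGH	0x4u

/* Hypothetical stand-in for cpu_to_fdt32(): emit a cell as big-endian bytes. */
static void put_be32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}

/* What a DT consumer does: always interpret a cell as big-endian. */
static uint32_t read_be32(const uint8_t in[4])
{
	return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16) |
	       ((uint32_t)in[2] << 8) | (uint32_t)in[3];
}

/* Buggy path: the host-order value is copied into the blob verbatim. */
static uint32_t cell_without_conversion(uint32_t v)
{
	uint8_t raw[4];

	memcpy(raw, &v, sizeof(raw));
	return read_be32(raw);
}

/* Fixed path: the value is converted before it lands in the blob. */
static uint32_t cell_with_conversion(uint32_t v)
{
	uint8_t raw[4];

	put_be32(raw, v);
	return read_be32(raw);
}

/* True on a little-endian host (lowest address holds the low byte). */
static int host_is_little_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 1;
}

/* Usage: the converted cell is correct on any host, the raw one is not. */
static void example(void)
{
	assert(cell_with_conversion(LEVEL_HIGH) == 0x4u);
	if (host_is_little_endian())
		assert(cell_without_conversion(LEVEL_HIGH) == 0x04000000u);
}
```

On a big-endian host both paths give the same result, which is why the
missing conversion would go unnoticed there.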

Consider this fixed.

Cheers,
Andre

> 
> 	M.
> 
> diff --git a/arm64/gic.c b/arm64/gic.c
> index 2a59518..640ff35 100644
> --- a/arm64/gic.c
> +++ b/arm64/gic.c
> @@ -369,7 +369,7 @@ void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
>   	};
>   	u32 maint_irq[] = {
>   		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI), cpu_to_fdt32(GIC_MAINT_IRQ),
> -		gic__get_fdt_irq_cpumask(kvm) | IRQ_TYPE_LEVEL_HIGH
> +		cpu_to_fdt32(gic__get_fdt_irq_cpumask(kvm) | IRQ_TYPE_LEVEL_HIGH),
>   	};
>   
>   	switch (kvm->cfg.arch.irqchip) {
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ
  2026-01-27 12:07     ` Andre Przywara
@ 2026-01-27 13:23       ` Sascha Bischoff
  2026-01-29 17:08         ` Andre Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Sascha Bischoff @ 2026-01-27 13:23 UTC (permalink / raw)
  To: maz@kernel.org, Andre Przywara
  Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev, Alexandru Elisei,
	will@kernel.org, nd, julien.thierry.kdev@gmail.com

On Tue, 2026-01-27 at 12:07 +0000, Andre Przywara wrote:
> Hi Marc,
> 
> On 26/01/2026 18:03, Marc Zyngier wrote:
> > On Fri, 23 Jan 2026 14:27:25 +0000,
> > Andre Przywara <andre.przywara@arm.com> wrote:
> > > 
> > > Uses the new VGIC KVM device attribute to set the maintenance
> > > IRQ.
> > > This is fixed to use PPI 9, as a platform decision made by
> > > kvmtool,
> > > matching the SBSA recommendation.
> > > Use the opportunity to pass the kvm pointer to
> > > gic__generate_fdt_nodes(),
> > > as this simplifies the call and allows us access to the
> > > nested_virt
> > > config variable on the way.
> > > 
> > > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > ---
> > >   arm64/arm-cpu.c         |  2 +-
> > >   arm64/gic.c             | 29 +++++++++++++++++++++++++++--
> > >   arm64/include/kvm/gic.h |  2 +-
> > >   3 files changed, 29 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
> > > index 69bb2cb2..0843ac05 100644
> > > --- a/arm64/arm-cpu.c
> > > +++ b/arm64/arm-cpu.c
> > > @@ -14,7 +14,7 @@ static void generate_fdt_nodes(void *fdt,
> > > struct kvm *kvm)
> > >   {
> > >   	int timer_interrupts[4] = {13, 14, 11, 10};
> > >   
> > > -	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
> > > +	gic__generate_fdt_nodes(fdt, kvm);
> > >   	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
> > >   	pmu__generate_fdt_nodes(fdt, kvm);
> > >   }
> > > diff --git a/arm64/gic.c b/arm64/gic.c
> > > index b0d3a1ab..2a595184 100644
> > > --- a/arm64/gic.c
> > > +++ b/arm64/gic.c
> > > @@ -11,6 +11,8 @@
> > >   
> > >   #define IRQCHIP_GIC 0
> > >   
> > > +#define GIC_MAINT_IRQ	9
> > > +
> > >   static int gic_fd = -1;
> > >   static u64 gic_redists_base;
> > >   static u64 gic_redists_size;
> > > @@ -302,10 +304,15 @@ static int gic__init_gic(struct kvm *kvm)
> > >   
> > >   	int lines = irq__get_nr_allocated_lines();
> > >   	u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
> > > +	u32 maint_irq = GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
> > >   	struct kvm_device_attr nr_irqs_attr = {
> > >   		.group	= KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
> > >   		.addr	= (u64)(unsigned long)&nr_irqs,
> > >   	};
> > > +	struct kvm_device_attr maint_irq_attr = {
> > > +		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
> > > +		.addr	= (u64)(unsigned long)&maint_irq,
> > > +	};
> > >   	struct kvm_device_attr vgic_init_attr = {
> > >   		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
> > >   		.attr	= KVM_DEV_ARM_VGIC_CTRL_INIT,
> > > @@ -325,6 +332,16 @@ static int gic__init_gic(struct kvm *kvm)
> > >   			return ret;
> > >   	}
> > >   
> > > +	if (kvm->cfg.arch.nested_virt) {
> > > +		ret = ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
> > > &maint_irq_attr);
> > > +		if (!ret)
> > > +			ret = ioctl(gic_fd, KVM_SET_DEVICE_ATTR,
> > > &maint_irq_attr);
> > > +		if (ret) {
> > > +			pr_err("could not set maintenance
> > > IRQ\n");
> > > +			return ret;
> > > +		}
> > > +	}
> > > +
> > >   	irq__routing_init(kvm);
> > >   
> > >   	if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
> > > &vgic_init_attr)) {
> > > @@ -342,7 +359,7 @@ static int gic__init_gic(struct kvm *kvm)
> > >   }
> > >   late_init(gic__init_gic)
> > >   
> > > -void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
> > > +void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
> > >   {
> > >   	const char *compatible, *msi_compatible = NULL;
> > >   	u64 msi_prop[2];
> > > @@ -350,8 +367,12 @@ void gic__generate_fdt_nodes(void *fdt, enum
> > > irqchip_type type)
> > >   		cpu_to_fdt64(ARM_GIC_DIST_BASE),
> > > cpu_to_fdt64(ARM_GIC_DIST_SIZE),
> > >   		0, 0,				/* to be filled
> > > */
> > >   	};
> > > +	u32 maint_irq[] = {
> > > +		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
> > > cpu_to_fdt32(GIC_MAINT_IRQ),
> > > +		gic__get_fdt_irq_cpumask(kvm) |
> > > IRQ_TYPE_LEVEL_HIGH
> > > +	};
> > 
> > This looks utterly broken, and my guests barf on this:
> > 
> >          intc {
> >                  compatible = "arm,gic-v3";
> >                  #interrupt-cells = <0x03>;
> >                  interrupt-controller;
> >                  reg = <0x00 0x3fff0000 0x00 0x10000 0x00
> > 0x3fef0000 0x00 0x100000>;
> >                  interrupts = <0x01 0x09 0x4000000>;
> 
> Ah yeah, sorry, that's of course complete blunder, this got lost in 
> translation between v3 and v4.
>                                            ^^^^^^^^^^^
> > Are you testing on a big-endian box??? I fixed it with the patchlet
> > below, but I also wonder why you added
> > gic__get_fdt_irq_cpumask()...
> 
> this was to accommodate GICv2 (it returns 0 for GICv3), and was the 
> equivalent of the hardcoded 0xff04 we had before. And though I guess 
> there would be no overlap between machines supporting nested virt and
> having a GICv2 or a GICv2 emulation capable GICv3, I added this for
> the 
> sake of completeness anyway, as it didn't feel right to make this 
> assumption in the otherwise generic code.
> 
> Consider this fixed.
> 
> Cheers,
> Andre

Seems I'd missed this in v4. Sorry!

However, this made me think about GICv5 guests. Right now one can try
and create a nested guest with GICv2. Attempting to do so fails a
little ungracefully:

  Error: could not set maintenance IRQ

  Warning: Failed init: gic__init_gic

  Fatal: Initialisation failed

It might be worth catching the v2 + nested combo explicitly and
returning a slightly more useful error.

Thanks,
Sascha

> 
> > 
> > 	M.
> > 
> > diff --git a/arm64/gic.c b/arm64/gic.c
> > index 2a59518..640ff35 100644
> > --- a/arm64/gic.c
> > +++ b/arm64/gic.c
> > @@ -369,7 +369,7 @@ void gic__generate_fdt_nodes(void *fdt, struct
> > kvm *kvm)
> >   	};
> >   	u32 maint_irq[] = {
> >   		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
> > cpu_to_fdt32(GIC_MAINT_IRQ),
> > -		gic__get_fdt_irq_cpumask(kvm) |
> > IRQ_TYPE_LEVEL_HIGH
> > +		cpu_to_fdt32(gic__get_fdt_irq_cpumask(kvm) |
> > IRQ_TYPE_LEVEL_HIGH),
> >   	};
> >   
> >   	switch (kvm->cfg.arch.irqchip) {
> > 
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ
  2026-01-27 13:23       ` Sascha Bischoff
@ 2026-01-29 17:08         ` Andre Przywara
  2026-01-30  9:29           ` Sascha Bischoff
  0 siblings, 1 reply; 18+ messages in thread
From: Andre Przywara @ 2026-01-29 17:08 UTC (permalink / raw)
  To: Sascha Bischoff, maz@kernel.org
  Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev, Alexandru Elisei,
	will@kernel.org, nd, julien.thierry.kdev@gmail.com

Hi Sascha,

On 1/27/26 14:23, Sascha Bischoff wrote:
> On Tue, 2026-01-27 at 12:07 +0000, Andre Przywara wrote:
>> Hi Marc,
>>
>> On 26/01/2026 18:03, Marc Zyngier wrote:
>>> On Fri, 23 Jan 2026 14:27:25 +0000,
>>> Andre Przywara <andre.przywara@arm.com> wrote:
>>>>
>>>> Uses the new VGIC KVM device attribute to set the maintenance
>>>> IRQ.
>>>> This is fixed to use PPI 9, as a platform decision made by
>>>> kvmtool,
>>>> matching the SBSA recommendation.
>>>> Use the opportunity to pass the kvm pointer to
>>>> gic__generate_fdt_nodes(),
>>>> as this simplifies the call and allows us access to the
>>>> nested_virt
>>>> config variable on the way.
>>>>
>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>> ---
>>>>    arm64/arm-cpu.c         |  2 +-
>>>>    arm64/gic.c             | 29 +++++++++++++++++++++++++++--
>>>>    arm64/include/kvm/gic.h |  2 +-
>>>>    3 files changed, 29 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
>>>> index 69bb2cb2..0843ac05 100644
>>>> --- a/arm64/arm-cpu.c
>>>> +++ b/arm64/arm-cpu.c
>>>> @@ -14,7 +14,7 @@ static void generate_fdt_nodes(void *fdt,
>>>> struct kvm *kvm)
>>>>    {
>>>>    	int timer_interrupts[4] = {13, 14, 11, 10};
>>>>    
>>>> -	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
>>>> +	gic__generate_fdt_nodes(fdt, kvm);
>>>>    	timer__generate_fdt_nodes(fdt, kvm, timer_interrupts);
>>>>    	pmu__generate_fdt_nodes(fdt, kvm);
>>>>    }
>>>> diff --git a/arm64/gic.c b/arm64/gic.c
>>>> index b0d3a1ab..2a595184 100644
>>>> --- a/arm64/gic.c
>>>> +++ b/arm64/gic.c
>>>> @@ -11,6 +11,8 @@
>>>>    
>>>>    #define IRQCHIP_GIC 0
>>>>    
>>>> +#define GIC_MAINT_IRQ	9
>>>> +
>>>>    static int gic_fd = -1;
>>>>    static u64 gic_redists_base;
>>>>    static u64 gic_redists_size;
>>>> @@ -302,10 +304,15 @@ static int gic__init_gic(struct kvm *kvm)
>>>>    
>>>>    	int lines = irq__get_nr_allocated_lines();
>>>>    	u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
>>>> +	u32 maint_irq = GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
>>>>    	struct kvm_device_attr nr_irqs_attr = {
>>>>    		.group	= KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
>>>>    		.addr	= (u64)(unsigned long)&nr_irqs,
>>>>    	};
>>>> +	struct kvm_device_attr maint_irq_attr = {
>>>> +		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
>>>> +		.addr	= (u64)(unsigned long)&maint_irq,
>>>> +	};
>>>>    	struct kvm_device_attr vgic_init_attr = {
>>>>    		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
>>>>    		.attr	= KVM_DEV_ARM_VGIC_CTRL_INIT,
>>>> @@ -325,6 +332,16 @@ static int gic__init_gic(struct kvm *kvm)
>>>>    			return ret;
>>>>    	}
>>>>    
>>>> +	if (kvm->cfg.arch.nested_virt) {
>>>> +		ret = ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
>>>> &maint_irq_attr);
>>>> +		if (!ret)
>>>> +			ret = ioctl(gic_fd, KVM_SET_DEVICE_ATTR,
>>>> &maint_irq_attr);
>>>> +		if (ret) {
>>>> +			pr_err("could not set maintenance
>>>> IRQ\n");
>>>> +			return ret;
>>>> +		}
>>>> +	}
>>>> +
>>>>    	irq__routing_init(kvm);
>>>>    
>>>>    	if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
>>>> &vgic_init_attr)) {
>>>> @@ -342,7 +359,7 @@ static int gic__init_gic(struct kvm *kvm)
>>>>    }
>>>>    late_init(gic__init_gic)
>>>>    
>>>> -void gic__generate_fdt_nodes(void *fdt, enum irqchip_type type)
>>>> +void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
>>>>    {
>>>>    	const char *compatible, *msi_compatible = NULL;
>>>>    	u64 msi_prop[2];
>>>> @@ -350,8 +367,12 @@ void gic__generate_fdt_nodes(void *fdt, enum
>>>> irqchip_type type)
>>>>    		cpu_to_fdt64(ARM_GIC_DIST_BASE),
>>>> cpu_to_fdt64(ARM_GIC_DIST_SIZE),
>>>>    		0, 0,				/* to be filled
>>>> */
>>>>    	};
>>>> +	u32 maint_irq[] = {
>>>> +		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
>>>> cpu_to_fdt32(GIC_MAINT_IRQ),
>>>> +		gic__get_fdt_irq_cpumask(kvm) |
>>>> IRQ_TYPE_LEVEL_HIGH
>>>> +	};
>>>
>>> This looks utterly broken, and my guests barf on this:
>>>
>>>           intc {
>>>                   compatible = "arm,gic-v3";
>>>                   #interrupt-cells = <0x03>;
>>>                   interrupt-controller;
>>>                   reg = <0x00 0x3fff0000 0x00 0x10000 0x00
>>> 0x3fef0000 0x00 0x100000>;
>>>                   interrupts = <0x01 0x09 0x4000000>;
>>
>> Ah yeah, sorry, that's of course complete blunder, this got lost in
>> translation between v3 and v4.
>>                                             ^^^^^^^^^^^
>>> Are you testing on a big-endian box??? I fixed it with the patchlet
>>> below, but I also wonder why you added
>>> gic__get_fdt_irq_cpumask()...
>>
>> this was to accommodate GICv2 (it returns 0 for GICv3), and was the
>> equivalent of the hardcoded 0xff04 we had before. And though I guess
>> there would be no overlap between machines supporting nested virt and
>> having a GICv2 or a GICv2 emulation capable GICv3, I added this for
>> the
>> sake of completeness anyway, as it didn't feel right to make this
>> assumption in the otherwise generic code.
>>
>> Consider this fixed.
>>
>> Cheers,
>> Andre
> 
> Seems I'd missed this in v4. Sorry!
> 
> However, this made me think about GICv5 guests. Right now one can try
> and create a nested guest with GICv2. Attempting to do so fails a
> little ungracefully:
> 
>    Error: could not set maintenance IRQ
> 
>    Warning: Failed init: gic__init_gic
> 
>    Fatal: Initialisation failed
> 
> It might be worth catching the v2 + nested combo explicitly and
> returning a slightly more useful error.

Mmmh, would that be really useful? You created that situation on the 
model, right? I don't think it's a common scenario to run a guest in EL2 
while having a GICv2 interrupt controller. And while we cannot 
completely rule this out (as you have shown), I don't think it's common 
enough to warrant an explicit check or message. At least it failed 
(because the vGICv2 device doesn't implement 
KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ), and barfed about the GIC, which should 
give people that tinker with the GIC enough clues, right?
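
For reference, the failure path boils down to the following, sketched
here with a mocked device instead of real ioctl() calls so that it runs
anywhere. Everything in it (struct fake_vgic, FAKE_GRP_MAINT_IRQ, the
status enum) is hypothetical and only models the probe-then-set order
used in the patch, it is not kvmtool or KVM code:

```c
#include <assert.h>

/* Hypothetical attribute group number, not the real UAPI value. */
#define FAKE_GRP_MAINT_IRQ	1

/*
 * Hypothetical stand-in for a VGIC device fd. The callbacks mimic
 * KVM_HAS_DEVICE_ATTR and KVM_SET_DEVICE_ATTR: 0 on success, -1 on error.
 */
struct fake_vgic {
	int (*has_attr)(int group);
	int (*set_attr)(int group);
};

enum maint_irq_status {
	MAINT_IRQ_OK,
	MAINT_IRQ_UNSUPPORTED,	/* device does not know the attribute */
	MAINT_IRQ_SET_FAILED,	/* attribute exists, but setting it failed */
};

/* Probe first, then set, and keep the two failure reasons apart. */
static enum maint_irq_status set_maint_irq(const struct fake_vgic *dev)
{
	if (dev->has_attr(FAKE_GRP_MAINT_IRQ))
		return MAINT_IRQ_UNSUPPORTED;
	if (dev->set_attr(FAKE_GRP_MAINT_IRQ))
		return MAINT_IRQ_SET_FAILED;
	return MAINT_IRQ_OK;
}

/* Models a vGICv2: the maintenance IRQ attribute is not implemented. */
static int v2_has_attr(int group)
{
	(void)group;
	return -1;
}

static int v2_set_attr(int group)
{
	(void)group;
	return -1;
}

/* Models a vGICv3: the attribute is known and can be set. */
static int v3_has_attr(int group)
{
	return group == FAKE_GRP_MAINT_IRQ ? 0 : -1;
}

static int v3_set_attr(int group)
{
	return group == FAKE_GRP_MAINT_IRQ ? 0 : -1;
}

static const struct fake_vgic fake_v2 = { v2_has_attr, v2_set_attr };
static const struct fake_vgic fake_v3 = { v3_has_attr, v3_set_attr };

/* Usage: the v2 model already fails at the probe step. */
static void example(void)
{
	assert(set_maint_irq(&fake_v2) == MAINT_IRQ_UNSUPPORTED);
	assert(set_maint_irq(&fake_v3) == MAINT_IRQ_OK);
}
```

The patch folds both failure cases into the one "could not set
maintenance IRQ" message, which is what you saw.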

Please let me know what you think!

Cheers,
Andre

> 
> Thanks,
> Sascha
> 
>>
>>>
>>> 	M.
>>>
>>> diff --git a/arm64/gic.c b/arm64/gic.c
>>> index 2a59518..640ff35 100644
>>> --- a/arm64/gic.c
>>> +++ b/arm64/gic.c
>>> @@ -369,7 +369,7 @@ void gic__generate_fdt_nodes(void *fdt, struct
>>> kvm *kvm)
>>>    	};
>>>    	u32 maint_irq[] = {
>>>    		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
>>> cpu_to_fdt32(GIC_MAINT_IRQ),
>>> -		gic__get_fdt_irq_cpumask(kvm) |
>>> IRQ_TYPE_LEVEL_HIGH
>>> +		cpu_to_fdt32(gic__get_fdt_irq_cpumask(kvm) |
>>> IRQ_TYPE_LEVEL_HIGH),
>>>    	};
>>>    
>>>    	switch (kvm->cfg.arch.irqchip) {
>>>
>>
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ
  2026-01-29 17:08         ` Andre Przywara
@ 2026-01-30  9:29           ` Sascha Bischoff
  2026-02-02  8:54             ` Andre Przywara
  0 siblings, 1 reply; 18+ messages in thread
From: Sascha Bischoff @ 2026-01-30  9:29 UTC (permalink / raw)
  To: maz@kernel.org, Andre Przywara
  Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev, Alexandru Elisei,
	will@kernel.org, nd, julien.thierry.kdev@gmail.com

On Thu, 2026-01-29 at 18:08 +0100, Andre Przywara wrote:
> Hi Sascha,
> 
> On 1/27/26 14:23, Sascha Bischoff wrote:
> > On Tue, 2026-01-27 at 12:07 +0000, Andre Przywara wrote:
> > > Hi Marc,
> > > 
> > > On 26/01/2026 18:03, Marc Zyngier wrote:
> > > > On Fri, 23 Jan 2026 14:27:25 +0000,
> > > > Andre Przywara <andre.przywara@arm.com> wrote:
> > > > > 
> > > > > Uses the new VGIC KVM device attribute to set the maintenance
> > > > > IRQ.
> > > > > This is fixed to use PPI 9, as a platform decision made by
> > > > > kvmtool,
> > > > > matching the SBSA recommendation.
> > > > > Use the opportunity to pass the kvm pointer to
> > > > > gic__generate_fdt_nodes(),
> > > > > as this simplifies the call and allows us access to the
> > > > > nested_virt
> > > > > config variable on the way.
> > > > > 
> > > > > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > > > > ---
> > > > >    arm64/arm-cpu.c         |  2 +-
> > > > >    arm64/gic.c             | 29 +++++++++++++++++++++++++++--
> > > > >    arm64/include/kvm/gic.h |  2 +-
> > > > >    3 files changed, 29 insertions(+), 4 deletions(-)
> > > > > 
> > > > > diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
> > > > > index 69bb2cb2..0843ac05 100644
> > > > > --- a/arm64/arm-cpu.c
> > > > > +++ b/arm64/arm-cpu.c
> > > > > @@ -14,7 +14,7 @@ static void generate_fdt_nodes(void *fdt,
> > > > > struct kvm *kvm)
> > > > >    {
> > > > >    	int timer_interrupts[4] = {13, 14, 11, 10};
> > > > >    
> > > > > -	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
> > > > > +	gic__generate_fdt_nodes(fdt, kvm);
> > > > >    	timer__generate_fdt_nodes(fdt, kvm,
> > > > > timer_interrupts);
> > > > >    	pmu__generate_fdt_nodes(fdt, kvm);
> > > > >    }
> > > > > diff --git a/arm64/gic.c b/arm64/gic.c
> > > > > index b0d3a1ab..2a595184 100644
> > > > > --- a/arm64/gic.c
> > > > > +++ b/arm64/gic.c
> > > > > @@ -11,6 +11,8 @@
> > > > >    
> > > > >    #define IRQCHIP_GIC 0
> > > > >    
> > > > > +#define GIC_MAINT_IRQ	9
> > > > > +
> > > > >    static int gic_fd = -1;
> > > > >    static u64 gic_redists_base;
> > > > >    static u64 gic_redists_size;
> > > > > @@ -302,10 +304,15 @@ static int gic__init_gic(struct kvm
> > > > > *kvm)
> > > > >    
> > > > >    	int lines = irq__get_nr_allocated_lines();
> > > > >    	u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
> > > > > +	u32 maint_irq = GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
> > > > >    	struct kvm_device_attr nr_irqs_attr = {
> > > > >    		.group	= KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
> > > > >    		.addr	= (u64)(unsigned long)&nr_irqs,
> > > > >    	};
> > > > > +	struct kvm_device_attr maint_irq_attr = {
> > > > > +		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
> > > > > +		.addr	= (u64)(unsigned long)&maint_irq,
> > > > > +	};
> > > > >    	struct kvm_device_attr vgic_init_attr = {
> > > > >    		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
> > > > >    		.attr	= KVM_DEV_ARM_VGIC_CTRL_INIT,
> > > > > @@ -325,6 +332,16 @@ static int gic__init_gic(struct kvm
> > > > > *kvm)
> > > > >    			return ret;
> > > > >    	}
> > > > >    
> > > > > +	if (kvm->cfg.arch.nested_virt) {
> > > > > +		ret = ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
> > > > > &maint_irq_attr);
> > > > > +		if (!ret)
> > > > > +			ret = ioctl(gic_fd,
> > > > > KVM_SET_DEVICE_ATTR,
> > > > > &maint_irq_attr);
> > > > > +		if (ret) {
> > > > > +			pr_err("could not set maintenance
> > > > > IRQ\n");
> > > > > +			return ret;
> > > > > +		}
> > > > > +	}
> > > > > +
> > > > >    	irq__routing_init(kvm);
> > > > >    
> > > > >    	if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
> > > > > &vgic_init_attr)) {
> > > > > @@ -342,7 +359,7 @@ static int gic__init_gic(struct kvm *kvm)
> > > > >    }
> > > > >    late_init(gic__init_gic)
> > > > >    
> > > > > -void gic__generate_fdt_nodes(void *fdt, enum irqchip_type
> > > > > type)
> > > > > +void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
> > > > >    {
> > > > >    	const char *compatible, *msi_compatible = NULL;
> > > > >    	u64 msi_prop[2];
> > > > > @@ -350,8 +367,12 @@ void gic__generate_fdt_nodes(void *fdt,
> > > > > enum
> > > > > irqchip_type type)
> > > > >    		cpu_to_fdt64(ARM_GIC_DIST_BASE),
> > > > > cpu_to_fdt64(ARM_GIC_DIST_SIZE),
> > > > >    		0, 0,				/* to be
> > > > > filled
> > > > > */
> > > > >    	};
> > > > > +	u32 maint_irq[] = {
> > > > > +		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
> > > > > cpu_to_fdt32(GIC_MAINT_IRQ),
> > > > > +		gic__get_fdt_irq_cpumask(kvm) |
> > > > > IRQ_TYPE_LEVEL_HIGH
> > > > > +	};
> > > > 
> > > > This looks utterly broken, and my guests barf on this:
> > > > 
> > > >           intc {
> > > >                   compatible = "arm,gic-v3";
> > > >                   #interrupt-cells = <0x03>;
> > > >                   interrupt-controller;
> > > >                   reg = <0x00 0x3fff0000 0x00 0x10000 0x00
> > > > 0x3fef0000 0x00 0x100000>;
> > > >                   interrupts = <0x01 0x09 0x4000000>;
> > > 
> > > Ah yeah, sorry, that's of course complete blunder, this got lost
> > > in
> > > translation between v3 and v4.
> > >                                             ^^^^^^^^^^^
> > > > Are you testing on a big-endian box??? I fixed it with the
> > > > patchlet
> > > > below, but I also wonder why you added
> > > > gic__get_fdt_irq_cpumask()...
> > > 
> > > this was to accommodate GICv2 (it returns 0 for GICv3), and was
> > > the
> > > equivalent of the hardcoded 0xff04 we had before. And though I
> > > guess
> > > there would be no overlap between machines supporting nested virt
> > > and
> > > having a GICv2 or a GICv2 emulation capable GICv3, I added this
> > > for
> > > the
> > > sake of completeness anyway, as it didn't feel right to make this
> > > assumption in the otherwise generic code.
> > > 
> > > Consider this fixed.
> > > 
> > > Cheers,
> > > Andre
> > 
> > Seems I'd missed this in v4. Sorry!
> > 
> > However, this made me think about GICv5 guests.

Hi Andre!

Apologies for confusing things. It seems that my muscle memory kicked
in when I replied, and I typed GICv5 where I'd meant GICv2! Argh!

> > Right now one can try
> > and create a nested guest with GICv2. Attempting to do so fails a
> > little ungracefully:
> > 
> >    Error: could not set maintenance IRQ
> > 
> >    Warning: Failed init: gic__init_gic
> > 
> >    Fatal: Initialisation failed
> > 
> > It might be worth catching the v2 + nested combo explicitly and
> > returning a slightly more useful error.
> 
> Mmmh, would that be really useful? You created that situation on the 
> model, right? I don't think it's a common scenario to run a guest in
> EL2 
> while having a GICv2 interrupt controller.

I did create this on the model, but it was a GICv3 FVP. So, this was
`--irqchip=gicv2 --nested` on a GICv3 host.

I think that we are somewhat in agreement that running an EL2 guest on
GICv2 isn't a common or expected use-case. My main thinking is that it
doesn't really make sense to allow the combination of anything but
--irqchip=gicv3(-its) and --nested (and eventually GICv5 once there is
nested support in KVM).

> And while we cannot 
> completely rule this out (as you have shown), I don't think it's
> common 
> enough to warrant an explicit check or message. At least it failed 
> (because the vGICv2 device doesn't implement 
> KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ), and barfed about the GIC, which
> should 
> give people that tinker with the GIC enough clues, right?

Yeah, it should give enough clues. I was just observing that the error
could have been more explicit. I was thinking of something along the
following lines when validating the config (and not actually as part
of this change). Feel free to disregard if you think it is overkill.

Thanks,
Sascha

diff --git a/arm64/kvm.c b/arm64/kvm.c
index ed0f1264..a50bcc23 100644
--- a/arm64/kvm.c
+++ b/arm64/kvm.c
@@ -440,8 +440,14 @@ void kvm__arch_validate_cfg(struct kvm *kvm)
            kvm->cfg.ram_addr + kvm->cfg.ram_size > SZ_4G) {
                die("RAM extends above 4GB");
        }
+
        if (kvm->cfg.arch.e2h0 && !kvm->cfg.arch.nested_virt)
                pr_warning("--e2h0 requires --nested, ignoring");
+
+       if (kvm->cfg.arch.nested_virt &&
+           kvm->cfg.arch.irqchip != IRQCHIP_GICV3 &&
+           kvm->cfg.arch.irqchip != IRQCHIP_GICV3_ITS)
+               die("--nested requires a GICv3-based guest");
 }
 
 u64 kvm__arch_default_ram_address(void)
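
One thing I have not checked: if the irqchip is still IRQCHIP_AUTO when
the config is validated, the check above would also reject a plain
--nested without any --irqchip option. Below is a minimal standalone
sketch of the same logic with that case let through. The enum is a
local mirror for illustration only (names and order are assumed), and
check_nested_irqchip() is a made-up helper, not existing kvmtool code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local mirror of the irqchip types, for illustration only. */
enum irqchip_type {
	IRQCHIP_AUTO,
	IRQCHIP_GICV2,
	IRQCHIP_GICV2M,
	IRQCHIP_GICV3,
	IRQCHIP_GICV3_ITS,
};

/*
 * Hypothetical helper: returns NULL if the combination is acceptable,
 * or an error message suitable for die() otherwise.
 */
static const char *check_nested_irqchip(bool nested_virt,
					enum irqchip_type irqchip)
{
	if (!nested_virt)
		return NULL;

	switch (irqchip) {
	case IRQCHIP_AUTO:	/* assumption: resolved to a real type later */
	case IRQCHIP_GICV3:
	case IRQCHIP_GICV3_ITS:
		return NULL;
	default:
		return "--nested requires a GICv3-based guest";
	}
}

/* Usage: only an explicit GICv2 flavour is rejected with --nested. */
static void example(void)
{
	assert(check_nested_irqchip(true, IRQCHIP_GICV2) != NULL);
	assert(check_nested_irqchip(true, IRQCHIP_AUTO) == NULL);
	assert(check_nested_irqchip(false, IRQCHIP_GICV2) == NULL);
}
```

If the automatic selection could still end up on GICv2, the
maintenance IRQ error in gic__init_gic() would remain the backstop.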

> 
> Please let me know what you think!
> 
> Cheers,
> Andre
> 
> > 
> > Thanks,
> > Sascha
> > 
> > > 
> > > > 
> > > > 	M.
> > > > 
> > > > diff --git a/arm64/gic.c b/arm64/gic.c
> > > > index 2a59518..640ff35 100644
> > > > --- a/arm64/gic.c
> > > > +++ b/arm64/gic.c
> > > > @@ -369,7 +369,7 @@ void gic__generate_fdt_nodes(void *fdt,
> > > > struct
> > > > kvm *kvm)
> > > >    	};
> > > >    	u32 maint_irq[] = {
> > > >    		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
> > > > cpu_to_fdt32(GIC_MAINT_IRQ),
> > > > -		gic__get_fdt_irq_cpumask(kvm) |
> > > > IRQ_TYPE_LEVEL_HIGH
> > > > +		cpu_to_fdt32(gic__get_fdt_irq_cpumask(kvm) |
> > > > IRQ_TYPE_LEVEL_HIGH),
> > > >    	};
> > > >    
> > > >    	switch (kvm->cfg.arch.irqchip) {
> > > > 
> > > 
> > 
> 


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH kvmtool v5 5/7] arm64: Add FEAT_E2H0 support
  2026-01-23 14:27 ` [PATCH kvmtool v5 5/7] arm64: Add FEAT_E2H0 support Andre Przywara
@ 2026-01-30  9:29   ` Sascha Bischoff
  0 siblings, 0 replies; 18+ messages in thread
From: Sascha Bischoff @ 2026-01-30  9:29 UTC (permalink / raw)
  To: Andre Przywara, will@kernel.org, julien.thierry.kdev@gmail.com
  Cc: maz@kernel.org, kvm@vger.kernel.org, kvmarm@lists.linux.dev,
	Alexandru Elisei, nd

On Fri, 2026-01-23 at 14:27 +0000, Andre Przywara wrote:
> From: Marc Zyngier <maz@kernel.org>
> 
> The --nested option allows a guest to boot at EL2 without FEAT_E2H0
> (i.e. mandating VHE support). While this is great for "modern"
> operating
> systems and hypervisors, a few legacy guests are stuck in a distant
> past.
> 
> To support those, add the --e2h0 command line option, that exposes
> FEAT_E2H0 to the guest, at the expense of a number of other features,
> such
> as FEAT_NV2. This is conditioned on the host itself supporting
> FEAT_E2H0.
> 
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> Reviewed-by: Sascha Bischoff <sascha.bischoff@arm.com>
> ---
>  arm64/include/kvm/kvm-config-arch.h | 5 ++++-
>  arm64/kvm-cpu.c                     | 5 +++++
>  arm64/kvm.c                         | 2 ++
>  3 files changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/arm64/include/kvm/kvm-config-arch.h
> b/arm64/include/kvm/kvm-config-arch.h
> index 44c43367..73bf4211 100644
> --- a/arm64/include/kvm/kvm-config-arch.h
> +++ b/arm64/include/kvm/kvm-config-arch.h
> @@ -11,6 +11,7 @@ struct kvm_config_arch {
>  	bool		has_pmuv3;
>  	bool		mte_disabled;
>  	bool		nested_virt;
> +	bool		e2h0;
>  	u64		kaslr_seed;
>  	enum irqchip_type irqchip;
>  	u64		fw_addr;
> @@ -63,6 +64,8 @@ int sve_vl_parser(const struct option *opt, const
> char *arg, int unset);
>  	OPT_U64('\0', "counter-offset", &(cfg)-
> >counter_offset,			\
>  		"Specify the counter offset, defaulting to
> 0"),			\
>  	OPT_BOOLEAN('\0', "nested", &(cfg)-
> >nested_virt,			\
> -		    "Start VCPUs in EL2 (for nested virt)"),
> +		    "Start VCPUs in EL2 (for nested
> virt)"),			\
> +	OPT_BOOLEAN('\0', "e2h0", &(cfg)-
> >e2h0,					\
> +		    "Create guest without VHE support"),
>  
>  #endif /* ARM_COMMON__KVM_CONFIG_ARCH_H */
> diff --git a/arm64/kvm-cpu.c b/arm64/kvm-cpu.c
> index 42dc11da..5e4f3a7d 100644
> --- a/arm64/kvm-cpu.c
> +++ b/arm64/kvm-cpu.c
> @@ -76,6 +76,11 @@ static void kvm_cpu__select_features(struct kvm
> *kvm, struct kvm_vcpu_init *init
>  		if (!kvm__supports_extension(kvm, KVM_CAP_ARM_EL2))
>  			die("EL2 (nested virt) is not supported");
>  		init->features[0] |= 1UL << KVM_ARM_VCPU_HAS_EL2;
> +		if (kvm->cfg.arch.e2h0) {
> +			if (!kvm__supports_extension(kvm,
> KVM_CAP_ARM_EL2_E2H0))
> +				die("FEAT_E2H0 is not supported");
> +			init->features[0] |= 1UL <<
> KVM_ARM_VCPU_HAS_EL2_E2H0;
> +		}
>  	}
>  }
>  
> diff --git a/arm64/kvm.c b/arm64/kvm.c
> index 6e971dd7..ed0f1264 100644
> --- a/arm64/kvm.c
> +++ b/arm64/kvm.c
> @@ -440,6 +440,8 @@ void kvm__arch_validate_cfg(struct kvm *kvm)
>  	    kvm->cfg.ram_addr + kvm->cfg.ram_size > SZ_4G) {
>  		die("RAM extends above 4GB");
>  	}

As part of the other discussion I spotted the lack of a blank line here.
Please add one to make it consistent with the rest of the function and
improve readability.

Thanks!
Sascha

> +	if (kvm->cfg.arch.e2h0 && !kvm->cfg.arch.nested_virt)
> +		pr_warning("--e2h0 requires --nested, ignoring");
>  }
>  
>  u64 kvm__arch_default_ram_address(void)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ
  2026-01-30  9:29           ` Sascha Bischoff
@ 2026-02-02  8:54             ` Andre Przywara
  0 siblings, 0 replies; 18+ messages in thread
From: Andre Przywara @ 2026-02-02  8:54 UTC (permalink / raw)
  To: Sascha Bischoff, maz@kernel.org
  Cc: kvm@vger.kernel.org, kvmarm@lists.linux.dev, Alexandru Elisei,
	will@kernel.org, julien.thierry.kdev@gmail.com

Hi Sascha,

On 1/30/26 10:29, Sascha Bischoff wrote:
> On Thu, 2026-01-29 at 18:08 +0100, Andre Przywara wrote:
>> Hi Sascha,
>>
>> On 1/27/26 14:23, Sascha Bischoff wrote:
>>> On Tue, 2026-01-27 at 12:07 +0000, Andre Przywara wrote:
>>>> Hi Marc,
>>>>
>>>> On 26/01/2026 18:03, Marc Zyngier wrote:
>>>>> On Fri, 23 Jan 2026 14:27:25 +0000,
>>>>> Andre Przywara <andre.przywara@arm.com> wrote:
>>>>>>
>>>>>> Uses the new VGIC KVM device attribute to set the maintenance
>>>>>> IRQ.
>>>>>> This is fixed to use PPI 9, as a platform decision made by
>>>>>> kvmtool,
>>>>>> matching the SBSA recommendation.
>>>>>> Use the opportunity to pass the kvm pointer to
>>>>>> gic__generate_fdt_nodes(),
>>>>>> as this simplifies the call and allows us access to the
>>>>>> nested_virt
>>>>>> config variable on the way.
>>>>>>
>>>>>> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
>>>>>> ---
>>>>>>     arm64/arm-cpu.c         |  2 +-
>>>>>>     arm64/gic.c             | 29 +++++++++++++++++++++++++++--
>>>>>>     arm64/include/kvm/gic.h |  2 +-
>>>>>>     3 files changed, 29 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/arm64/arm-cpu.c b/arm64/arm-cpu.c
>>>>>> index 69bb2cb2..0843ac05 100644
>>>>>> --- a/arm64/arm-cpu.c
>>>>>> +++ b/arm64/arm-cpu.c
>>>>>> @@ -14,7 +14,7 @@ static void generate_fdt_nodes(void *fdt,
>>>>>> struct kvm *kvm)
>>>>>>     {
>>>>>>     	int timer_interrupts[4] = {13, 14, 11, 10};
>>>>>>     
>>>>>> -	gic__generate_fdt_nodes(fdt, kvm->cfg.arch.irqchip);
>>>>>> +	gic__generate_fdt_nodes(fdt, kvm);
>>>>>>     	timer__generate_fdt_nodes(fdt, kvm,
>>>>>> timer_interrupts);
>>>>>>     	pmu__generate_fdt_nodes(fdt, kvm);
>>>>>>     }
>>>>>> diff --git a/arm64/gic.c b/arm64/gic.c
>>>>>> index b0d3a1ab..2a595184 100644
>>>>>> --- a/arm64/gic.c
>>>>>> +++ b/arm64/gic.c
>>>>>> @@ -11,6 +11,8 @@
>>>>>>     
>>>>>>     #define IRQCHIP_GIC 0
>>>>>>     
>>>>>> +#define GIC_MAINT_IRQ	9
>>>>>> +
>>>>>>     static int gic_fd = -1;
>>>>>>     static u64 gic_redists_base;
>>>>>>     static u64 gic_redists_size;
>>>>>> @@ -302,10 +304,15 @@ static int gic__init_gic(struct kvm
>>>>>> *kvm)
>>>>>>     
>>>>>>     	int lines = irq__get_nr_allocated_lines();
>>>>>>     	u32 nr_irqs = ALIGN(lines, 32) + GIC_SPI_IRQ_BASE;
>>>>>> +	u32 maint_irq = GIC_PPI_IRQ_BASE + GIC_MAINT_IRQ;
>>>>>>     	struct kvm_device_attr nr_irqs_attr = {
>>>>>>     		.group	= KVM_DEV_ARM_VGIC_GRP_NR_IRQS,
>>>>>>     		.addr	= (u64)(unsigned long)&nr_irqs,
>>>>>>     	};
>>>>>> +	struct kvm_device_attr maint_irq_attr = {
>>>>>> +		.group	= KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ,
>>>>>> +		.addr	= (u64)(unsigned long)&maint_irq,
>>>>>> +	};
>>>>>>     	struct kvm_device_attr vgic_init_attr = {
>>>>>>     		.group	= KVM_DEV_ARM_VGIC_GRP_CTRL,
>>>>>>     		.attr	= KVM_DEV_ARM_VGIC_CTRL_INIT,
>>>>>> @@ -325,6 +332,16 @@ static int gic__init_gic(struct kvm
>>>>>> *kvm)
>>>>>>     			return ret;
>>>>>>     	}
>>>>>>     
>>>>>> +	if (kvm->cfg.arch.nested_virt) {
>>>>>> +		ret = ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
>>>>>> &maint_irq_attr);
>>>>>> +		if (!ret)
>>>>>> +			ret = ioctl(gic_fd,
>>>>>> KVM_SET_DEVICE_ATTR,
>>>>>> &maint_irq_attr);
>>>>>> +		if (ret) {
>>>>>> +			pr_err("could not set maintenance
>>>>>> IRQ\n");
>>>>>> +			return ret;
>>>>>> +		}
>>>>>> +	}
>>>>>> +
>>>>>>     	irq__routing_init(kvm);
>>>>>>     
>>>>>>     	if (!ioctl(gic_fd, KVM_HAS_DEVICE_ATTR,
>>>>>> &vgic_init_attr)) {
>>>>>> @@ -342,7 +359,7 @@ static int gic__init_gic(struct kvm *kvm)
>>>>>>     }
>>>>>>     late_init(gic__init_gic)
>>>>>>     
>>>>>> -void gic__generate_fdt_nodes(void *fdt, enum irqchip_type
>>>>>> type)
>>>>>> +void gic__generate_fdt_nodes(void *fdt, struct kvm *kvm)
>>>>>>     {
>>>>>>     	const char *compatible, *msi_compatible = NULL;
>>>>>>     	u64 msi_prop[2];
>>>>>> @@ -350,8 +367,12 @@ void gic__generate_fdt_nodes(void *fdt,
>>>>>> enum
>>>>>> irqchip_type type)
>>>>>>     		cpu_to_fdt64(ARM_GIC_DIST_BASE),
>>>>>> cpu_to_fdt64(ARM_GIC_DIST_SIZE),
>>>>>>     		0, 0,				/* to be
>>>>>> filled
>>>>>> */
>>>>>>     	};
>>>>>> +	u32 maint_irq[] = {
>>>>>> +		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
>>>>>> cpu_to_fdt32(GIC_MAINT_IRQ),
>>>>>> +		gic__get_fdt_irq_cpumask(kvm) |
>>>>>> IRQ_TYPE_LEVEL_HIGH
>>>>>> +	};
>>>>>
>>>>> This looks utterly broken, and my guests barf on this:
>>>>>
>>>>>            intc {
>>>>>                    compatible = "arm,gic-v3";
>>>>>                    #interrupt-cells = <0x03>;
>>>>>                    interrupt-controller;
>>>>>                    reg = <0x00 0x3fff0000 0x00 0x10000 0x00
>>>>> 0x3fef0000 0x00 0x100000>;
>>>>>                    interrupts = <0x01 0x09 0x4000000>;
>>>>
>>>> Ah yeah, sorry, that's of course a complete blunder, this got lost in
>>>> translation between v3 and v4.
>>>>                                              ^^^^^^^^^^^
>>>>> Are you testing on a big-endian box??? I fixed it with the
>>>>> patchlet
>>>>> below, but I also wonder why you added
>>>>> gic__get_fdt_irq_cpumask()...
>>>>
>>>> this was to accommodate GICv2 (it returns 0 for GICv3), and was
>>>> the
>>>> equivalent of the hardcoded 0xff04 we had before. And though I
>>>> guess
>>>> there would be no overlap between machines supporting nested virt
>>>> and
>>>> having a GICv2 or a GICv2 emulation capable GICv3, I added this
>>>> for
>>>> the
>>>> sake of completeness anyway, as it didn't feel right to make this
>>>> assumption in the otherwise generic code.
>>>>
>>>> Consider this fixed.
>>>>
>>>> Cheers,
>>>> Andre
>>>
>>> Seems I'd missed this in v4. Sorry!
>>>
>>> However, this made me think about GICv5 guests.
> 
> Hi Andre!
> 
> Apologies for confusing things. It seems that my muscle memory kicked
> in when I replied, and I typed GICv5 where I'd meant GICv2! Argh!

No worries, I sed'ed it while reading ;-)

>>> Right now one can try
>>> and create a nested guest with GICv2. Attempting to do so fails a
>>> little ungracefully:
>>>
>>>     Error: could not set maintenance IRQ
>>>
>>>     Warning: Failed init: gic__init_gic
>>>
>>>     Fatal: Initialisation failed
>>>
>>> It might be worth catching the v2 + nested combo explicitly and
>>> returning a slightly more useful error.
>>
>> Mmmh, would that be really useful? You created that situation on the
>> model, right? I don't think it's a common scenario to run a guest in
>> EL2
>> while having a GICv2 interrupt controller.
> 
> I did create this on the model, but it was a GICv3 FVP. So, this was
> `--irqchip=gicv2 --nested` on a GICv3 host.

So the model is then providing a GICv2-compatible GICv3, which is quite 
rare in real silicon. Any halfway recent GIC would not support this, I 
think.

> I think that we are somewhat in agreement that running an EL2 guest on
> GICv2 isn't a common or expected use-case. My main thinking is that it
> doesn't really make sense to allow the combination of anything but
> --irqchip=gicv3(-its) and --nested (and eventually GICv5 once there is
> nested support in KVM).

I don't think kvmtool should do this kind of predicting and filtering
when this is ultimately a feature of the running *kernel*. At the moment
GICv3 is indeed the only supported GIC for nested, but this might change 
(with GICv5, for instance). The kernel returning an error should be the 
actual cause of a bailout.

>> And while we cannot
>> completely rule this out (as you have shown), I don't think it's
>> common
>> enough to warrant an explicit check or message. At least it failed
>> (because the vGICv2 device doesn't implement
>> KVM_DEV_ARM_VGIC_GRP_MAINT_IRQ), and barfed about the GIC, which
>> should
>> give people that tinker with the GIC enough clues, right?
> 
> Yeah, it should give enough clues. I was just observing that the error
> could have been more explicit. I was just thinking of something along
> the following lines when validating the config (and actually not as
> part of this change). Feel free to disregard if you think it is
> overkill.

With what I said above and the fact that trying to run in EL2 with a 
GICv2 is fringe anyway, I'd prefer to not have any explicit test.

Cheers,
Andre

> 
> Thanks,
> Sascha
> 
> diff --git a/arm64/kvm.c b/arm64/kvm.c
> index ed0f1264..a50bcc23 100644
> --- a/arm64/kvm.c
> +++ b/arm64/kvm.c
> @@ -440,8 +440,14 @@ void kvm__arch_validate_cfg(struct kvm *kvm)
>              kvm->cfg.ram_addr + kvm->cfg.ram_size > SZ_4G) {
>                  die("RAM extends above 4GB");
>          }
> +
>          if (kvm->cfg.arch.e2h0 && !kvm->cfg.arch.nested_virt)
>                  pr_warning("--e2h0 requires --nested, ignoring");
> +
> +       if (kvm->cfg.arch.nested_virt &&
> +           kvm->cfg.arch.irqchip != IRQCHIP_GICV3 &&
> +           kvm->cfg.arch.irqchip != IRQCHIP_GICV3_ITS)
> +               die("--nested requires a GICv3-based guest");
>   }
>   
>   u64 kvm__arch_default_ram_address(void)
> 
>>
>> Please let me know what you think!
>>
>> Cheers,
>> Andre
>>
>>>
>>> Thanks,
>>> Sascha
>>>
>>>>
>>>>>
>>>>> 	M.
>>>>>
>>>>> diff --git a/arm64/gic.c b/arm64/gic.c
>>>>> index 2a59518..640ff35 100644
>>>>> --- a/arm64/gic.c
>>>>> +++ b/arm64/gic.c
>>>>> @@ -369,7 +369,7 @@ void gic__generate_fdt_nodes(void *fdt,
>>>>> struct
>>>>> kvm *kvm)
>>>>>     	};
>>>>>     	u32 maint_irq[] = {
>>>>>     		cpu_to_fdt32(GIC_FDT_IRQ_TYPE_PPI),
>>>>> cpu_to_fdt32(GIC_MAINT_IRQ),
>>>>> -		gic__get_fdt_irq_cpumask(kvm) |
>>>>> IRQ_TYPE_LEVEL_HIGH
>>>>> +		cpu_to_fdt32(gic__get_fdt_irq_cpumask(kvm) |
>>>>> IRQ_TYPE_LEVEL_HIGH),
>>>>>     	};
>>>>>     
>>>>>     	switch (kvm->cfg.arch.irqchip) {
>>>>>
>>>>
>>>
>>
> 
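
For readers following the byte-order problem discussed in this message,
here is a minimal sketch of why the unconverted flags cell showed up as
0x4000000 in the guest DT, and where the old hardcoded 0xff04 came from.
It is not kvmtool code: ppi_cpumask(), to_fdt32() and fdt32_read() are
hypothetical stand-ins for gic__get_fdt_irq_cpumask(), cpu_to_fdt32() and
a DT consumer's cell read, and the CPU mask layout (bits [15:8], at most
8 CPUs) is assumed from the GICv2 devicetree binding.

```c
#include <stdint.h>
#include <string.h>

#define IRQ_TYPE_LEVEL_HIGH	4u	/* trigger type in bits [3:0] */
#define PPI_CPU_SHIFT		8	/* assumed: GICv2 CPU mask in bits [15:8] */

/* Hypothetical, simplified take on gic__get_fdt_irq_cpumask(). */
static uint32_t ppi_cpumask(int is_gicv2, unsigned int nr_cpus)
{
	if (!is_gicv2)
		return 0;		/* GICv3 specifiers carry no CPU mask */
	if (nr_cpus > 8)
		nr_cpus = 8;		/* GICv2 addresses at most 8 CPUs */
	return ((1u << nr_cpus) - 1) << PPI_CPU_SHIFT;
}

/* Stand-in for cpu_to_fdt32(): store v as big-endian bytes on any host. */
static uint32_t to_fdt32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v >> 24), (uint8_t)(v >> 16),
		(uint8_t)(v >> 8), (uint8_t)v,
	};
	uint32_t cell;

	memcpy(&cell, b, sizeof(cell));
	return cell;
}

/* Read a cell the way a DT consumer does: always as big-endian. */
static uint32_t fdt32_read(const uint32_t *cell)
{
	const uint8_t *b = (const uint8_t *)cell;

	return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
	       ((uint32_t)b[2] << 8) | (uint32_t)b[3];
}
```

On a little-endian host, storing the flags value without conversion makes
fdt32_read() return 0x04000000, which matches the broken
"interrupts = <0x01 0x09 0x4000000>" dump above. Passing the value through
to_fdt32() first makes it read back as 4. For a GICv2 guest with 8 CPUs,
ppi_cpumask(1, 8) | IRQ_TYPE_LEVEL_HIGH evaluates to 0xff04.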



* Re: [PATCH kvmtool v5 0/7] arm64: Nested virtualization support
  2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
                   ` (6 preceding siblings ...)
  2026-01-23 14:27 ` [PATCH kvmtool v5 7/7] arm64: Handle virtio endianness reset when running nested Andre Przywara
@ 2026-02-09  2:21 ` Itaru Kitayama
  7 siblings, 0 replies; 18+ messages in thread
From: Itaru Kitayama @ 2026-02-09  2:21 UTC (permalink / raw)
  To: Andre Przywara
  Cc: Julien Thierry, Will Deacon, Marc Zyngier, kvm, kvmarm,
	Alexandru Elisei, Sascha Bischoff

On Fri, Jan 23, 2026 at 02:27:22PM +0000, Andre Przywara wrote:
> This is v5 of the nested virt support series, fixing a corner case when
> some maintenance IRQ setup fails. Also there is now a warning if --e2h0
> is specified without --nested. Many thanks to Sascha for the review!
> ========================================================
> 
> Thanks to the imperturbable efforts from Marc, arm64 support for nested
> virtualization has now reached the mainline kernel, which means the
> respective kvmtool support should now be ready as well.
> 
> Patch 1 updates the kernel headers, to get the new EL2 capability, and
> the VGIC device control to set up the maintenance IRQ.
> Patch 2 introduces the new "--nested" command line option, to let the
> VCPUs start in EL2. To allow KVM guests running in such a guest, we also
> need VGIC support, which patch 3 allows by setting the maintenance IRQ.
> Patches 4 to 6 are picked from Marc's repo, and allow setting the arch
> timer offset, enabling non-VHE guests (at the cost of losing recursive
> nested virtualisation), and advertising the virtual EL2 timer IRQ.
> 
> Tested on the FVP (with a good deal of patience), and some commercial
> (non-fruity) hardware, down to a guest's guest's guest.

Hi Andre,
I wonder if you have also tested this series with the latest QEMU which
has FEAT_NV2 support; I ask because when I tried to create a nested
guest with (again) lkvm it got stuck forever.

Thanks,
Itaru.

> 
> Cheers,
> Andre
> 
> Changelog v4 ... v5:
> - bump kernel headers to v6.19-rc6
> - print a warning if --e2h0 is given without --nested
> - fail if the maintenance IRQ setting attribute is not supported
> 
> Changelog v3 ... v4:
> - pass kvm pointer to gic__generate_fdt_nodes()
> - use macros for PPI offset and DT type identifier
> - properly calculate DT interrupt flags value
> - add patch 7 to fix virtio endianness issues
> - CAPITALISE verbs in commit message
> 
> Changelog v2 ... v3:
> - adjust^Wreplace commit messages for E2H0 and counter-offset patch
> - check for KVM_CAP_ARM_EL2_E2H0 when --e2h0 is requested
> - update kernel headers to v6.16 release
> 
> Changelog v1 ... v2:
> - add three patches from Marc:
>   - add --e2h0 command line option
>   - add --counter-offset command line option
>   - advertise all five arch timer interrupts in DT
> 
> Andre Przywara (3):
>   Sync kernel UAPI headers with v6.19-rc6
>   arm64: Initial nested virt support
>   arm64: nested: Add support for setting maintenance IRQ
> 
> Marc Zyngier (4):
>   arm64: Add counter offset control
>   arm64: Add FEAT_E2H0 support
>   arm64: Generate HYP timer interrupt specifiers
>   arm64: Handle virtio endianness reset when running nested
> 
>  arm64/arm-cpu.c                     |   6 +-
>  arm64/fdt.c                         |   5 +-
>  arm64/gic.c                         |  29 ++++++-
>  arm64/include/asm/kvm.h             |  25 ++++--
>  arm64/include/kvm/gic.h             |   2 +-
>  arm64/include/kvm/kvm-config-arch.h |  11 ++-
>  arm64/include/kvm/kvm-cpu-arch.h    |   5 +-
>  arm64/include/kvm/timer.h           |   2 +-
>  arm64/kvm-cpu.c                     |  64 ++++++++++++---
>  arm64/kvm.c                         |  19 +++++
>  arm64/timer.c                       |  29 +++----
>  include/linux/kvm.h                 |  47 +++++++++++
>  include/linux/virtio_ids.h          |   1 +
>  include/linux/virtio_net.h          |  49 +++++++++++-
>  include/linux/virtio_pci.h          |   3 +-
>  powerpc/include/asm/kvm.h           |  13 ----
>  riscv/include/asm/kvm.h             |  29 ++++++-
>  x86/include/asm/kvm.h               | 116 ++++++++++++++++++++++++++++
>  18 files changed, 394 insertions(+), 61 deletions(-)
> 
> -- 
> 2.43.0
> 


Thread overview: 18+ messages
2026-01-23 14:27 [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Andre Przywara
2026-01-23 14:27 ` [PATCH kvmtool v5 1/7] Sync kernel UAPI headers with v6.19-rc6 Andre Przywara
2026-01-23 14:27 ` [PATCH kvmtool v5 2/7] arm64: Initial nested virt support Andre Przywara
2026-01-23 14:27 ` [PATCH kvmtool v5 3/7] arm64: nested: Add support for setting maintenance IRQ Andre Przywara
2026-01-26 18:03   ` Marc Zyngier
2026-01-27 12:07     ` Andre Przywara
2026-01-27 13:23       ` Sascha Bischoff
2026-01-29 17:08         ` Andre Przywara
2026-01-30  9:29           ` Sascha Bischoff
2026-02-02  8:54             ` Andre Przywara
2026-01-23 14:27 ` [PATCH kvmtool v5 4/7] arm64: Add counter offset control Andre Przywara
2026-01-23 14:27 ` [PATCH kvmtool v5 5/7] arm64: Add FEAT_E2H0 support Andre Przywara
2026-01-30  9:29   ` Sascha Bischoff
2026-01-23 14:27 ` [PATCH kvmtool v5 6/7] arm64: Generate HYP timer interrupt specifiers Andre Przywara
2026-01-23 14:27 ` [PATCH kvmtool v5 7/7] arm64: Handle virtio endianness reset when running nested Andre Przywara
2026-01-23 16:03   ` Marc Zyngier
2026-01-27 10:15     ` Sascha Bischoff
2026-02-09  2:21 ` [PATCH kvmtool v5 0/7] arm64: Nested virtualization support Itaru Kitayama
