* [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less
@ 2026-03-10 17:08 Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL Oleksii Kurochko
` (26 more replies)
0 siblings, 27 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
This patch series contains the patches necessary to enable the common part
of Dom0less on RISC-V.
The code necessary to start/launch domains will be introduced separately.
This patch series is based on [1], but many of the patches could go in even
without it.
[1] https://lore.kernel.org/xen-devel/cover.1770821989.git.oleksii.kurochko@gmail.com/
Oleksii Kurochko (27):
xen/riscv: Implement ARCH_PAGING_MEMPOOL
xen/riscv: Implement construct_domain()
xen/riscv: implement prerequisites for domain_create()
xen/riscv: rework G-stage mode handling
xen/riscv: introduce guest riscv,isa string
xen/riscv: implement make_cpus_node()
xen/riscv: implement make_timer_node()
xen/riscv: implement make_arch_nodes()
xen/riscv: implement make_intc_domU_node()
xen/riscv: generate IMSIC DT node for guest domains
xen/riscv: create APLIC DT node for guest domains
xen/riscv: introduce aia_init() and aia_available()
xen/riscv: add basic VGEIN management for AIA guests
xen/riscv: introduce per-vCPU IMSIC state
xen/riscv: add very early virtual APLIC (vAPLIC) initialization support
xen/riscv: implement IRQ mapping for device passthrough
xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h.
xen/riscv: add vaplic access check
xen/riscv: emulate guest writes to virtual APLIC MMIO
xen/riscv: emulate guest reads from virtual APLIC MMIO
xen/riscv: introduce (de)initialization helpers for vINTC
xen/riscv: implement init_intc_phandle()
xen/riscv: call do_initcalls() in start_xen()
xen/riscv: init rcu
xen/riscv: setup system domains
xen/riscv: provide init_vuart()
xen/riscv: add initial dom0less infrastructure support
xen/arch/riscv/Kconfig | 2 +
xen/arch/riscv/Makefile | 6 +-
xen/arch/riscv/aia.c | 135 +++++++
xen/arch/riscv/aplic.c | 83 +++++
xen/arch/riscv/cpufeature.c | 33 ++
xen/arch/riscv/dom0less-build.c | 63 ++++
xen/arch/riscv/domain-build.c | 175 +++++++++
xen/arch/riscv/domain.c | 41 ++-
xen/arch/riscv/imsic.c | 155 ++++++++
xen/arch/riscv/include/asm/aia.h | 28 ++
xen/arch/riscv/include/asm/aplic.h | 37 ++
xen/arch/riscv/include/asm/cpufeature.h | 2 +
xen/arch/riscv/include/asm/domain.h | 17 +
xen/arch/riscv/include/asm/imsic.h | 24 ++
xen/arch/riscv/include/asm/intc.h | 51 ++-
xen/arch/riscv/include/asm/p2m.h | 9 +-
xen/arch/riscv/include/asm/setup.h | 21 +-
xen/arch/riscv/include/asm/vaplic.h | 36 ++
xen/arch/riscv/intc.c | 60 ++++
xen/arch/riscv/p2m.c | 91 +++--
xen/arch/riscv/setup.c | 14 +
xen/arch/riscv/stubs.c | 9 +-
xen/arch/riscv/vaplic.c | 449 ++++++++++++++++++++++++
xen/arch/riscv/vmid.c | 2 +-
xen/include/public/arch-riscv.h | 3 +
xen/include/public/device_tree_defs.h | 1 +
26 files changed, 1488 insertions(+), 59 deletions(-)
create mode 100644 xen/arch/riscv/aia.c
create mode 100644 xen/arch/riscv/dom0less-build.c
create mode 100644 xen/arch/riscv/domain-build.c
create mode 100644 xen/arch/riscv/include/asm/aia.h
create mode 100644 xen/arch/riscv/include/asm/vaplic.h
create mode 100644 xen/arch/riscv/vaplic.c
--
2.53.0
* [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-03-11 8:18 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 02/27] xen/riscv: Implement construct_domain() Oleksii Kurochko
` (25 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
The p2m_freelist is used to allocate pages for the P2M. To populate this
list, domain_p2m_set_allocation() may be called.
This function is invoked from construct_domU() in the common Dom0less
code, so providing an implementation of domain_p2m_set_allocation() when
CONFIG_ARCH_PAGING_MEMPOOL=y is appropriate for RISC-V.
Additionally, implement arch_{set,get}_paging_mempool_size().
They are not used directly for now, but are required to support the
XEN_DOMCTL_{get,set}_paging_mempool_size hypercalls.
Compile 'struct paging_domain' out of RISC-V's 'struct arch_domain' when
CONFIG_ARCH_PAGING_MEMPOOL is disabled and the field is not required.
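
The size check done by arch_set_paging_mempool_size() can be sketched in
isolation as follows. This is only an illustration, not the Xen code: the
helper names are hypothetical, and a 4 KiB page size is assumed.

```c
#include <errno.h>
#include <stdint.h>

#define PAGE_SHIFT 12                        /* assumption: 4 KiB pages */
#define PAGE_SIZE  ((uint64_t)1 << PAGE_SHIFT)

/*
 * Hypothetical helper: convert a pool size in bytes into a page count.
 * Returns 0 on success, or -EINVAL if the size is not a multiple of the
 * page size or the page count does not fit in an unsigned long.
 */
static int mempool_bytes_to_pages(uint64_t size, unsigned long *pages)
{
    uint64_t nr = size >> PAGE_SHIFT;

    if ( size & (PAGE_SIZE - 1) )            /* Non page-sized request? */
        return -EINVAL;

    if ( (unsigned long)nr != nr )           /* Truncated on 32-bit? */
        return -EINVAL;

    *pages = (unsigned long)nr;

    return 0;
}

/* Inverse direction: report a pool of 'pages' pages in bytes. */
static uint64_t mempool_pages_to_bytes(unsigned long pages)
{
    return (uint64_t)pages << PAGE_SHIFT;
}
```

Widening to uint64_t before shifting in the second helper matters on
32-bit builds, where shifting an unsigned long could overflow.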
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/Kconfig | 1 +
xen/arch/riscv/Makefile | 2 +-
xen/arch/riscv/include/asm/domain.h | 2 ++
xen/arch/riscv/p2m.c | 31 +++++++++++++++++++++++++++++
4 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/xen/arch/riscv/Kconfig b/xen/arch/riscv/Kconfig
index 12b337365f1f..091510380949 100644
--- a/xen/arch/riscv/Kconfig
+++ b/xen/arch/riscv/Kconfig
@@ -1,5 +1,6 @@
config RISCV
def_bool y
+ select ARCH_PAGING_MEMPOOL
select DOMAIN_BUILD_HELPERS
select FUNCTION_ALIGNMENT_16B
select GENERIC_BUG_FRAME
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 48957104e0b5..b1514f1514a2 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -10,7 +10,7 @@ obj-y += irq.o
obj-y += kernel.o
obj-y += mm.o
obj-y += p2m.o
-obj-y += paging.o
+obj-$(CONFIG_ARCH_PAGING_MEMPOOL) += paging.o
obj-y += pt.o
obj-$(CONFIG_RISCV_64) += riscv64/
obj-y += sbi.o
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 6c48bf13111d..0caacf92b5a2 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -92,7 +92,9 @@ struct arch_domain {
/* Virtual MMU */
struct p2m_domain p2m;
+#ifdef CONFIG_ARCH_PAGING_MEMPOOL
struct paging_domain paging;
+#endif
};
#include <xen/sched.h>
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index fba93bce24d9..886e06196ba2 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -1568,3 +1568,34 @@ struct page_info *get_page_from_gfn(struct domain *d, unsigned long gfn,
return page;
}
+
+#ifdef CONFIG_ARCH_PAGING_MEMPOOL
+
+int arch_set_paging_mempool_size(struct domain *d, uint64_t size)
+{
+ unsigned long pages = PFN_DOWN(size);
+ bool preempted = false;
+ int rc;
+
+ if ( (size & ~PAGE_MASK) || /* Non page-sized request? */
+ pages != PFN_DOWN(size) ) /* 32-bit overflow? */
+ return -EINVAL;
+
+ spin_lock(&d->arch.paging.lock);
+ rc = p2m_set_allocation(d, pages, &preempted);
+ spin_unlock(&d->arch.paging.lock);
+
+ ASSERT(preempted == (rc == -ERESTART));
+
+ return rc;
+}
+
+/* Return the size of the pool, in bytes. */
+int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
+{
+ *size = (uint64_t)ACCESS_ONCE(d->arch.paging.total_pages) << PAGE_SHIFT;
+
+ return 0;
+}
+
+#endif /* CONFIG_ARCH_PAGING_MEMPOOL */
--
2.53.0
* [PATCH v1 02/27] xen/riscv: Implement construct_domain()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-03-24 9:37 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 03/27] xen/riscv: implement prerequisites for domain_create() Oleksii Kurochko
` (24 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Implement the construct_domain() function for RISC-V, which performs initial
setup for the domain's first vCPU, loads the kernel, initrd, and device tree,
and sets up guest CPU registers for boot.
It also creates additional vCPUs up to max_vcpus and passes the boot cpuid
and the device tree address to the guest in registers a0 and a1.
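
The register setup described above follows the usual RISC-V boot
convention (a0 = hart/cpu id, a1 = physical address of the DTB). A minimal
standalone sketch, with a hypothetical struct standing in for Xen's
struct cpu_user_regs:

```c
/* Hypothetical stand-in for the few guest registers touched at boot. */
struct boot_regs {
    unsigned long sepc; /* address the guest starts executing at */
    unsigned long a0;   /* boot cpuid (hart id) */
    unsigned long a1;   /* guest physical address of the device tree */
};

/* Fill in the boot registers for the first vCPU of a domain. */
static void set_boot_regs(struct boot_regs *regs, unsigned long entry,
                          unsigned long boot_cpuid, unsigned long dtb_paddr)
{
    regs->sepc = entry;
    regs->a0 = boot_cpuid;
    regs->a1 = dtb_paddr;
}
```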
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/domain-build.c | 46 +++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)
create mode 100644 xen/arch/riscv/domain-build.c
diff --git a/xen/arch/riscv/domain-build.c b/xen/arch/riscv/domain-build.c
new file mode 100644
index 000000000000..0ce903a911ae
--- /dev/null
+++ b/xen/arch/riscv/domain-build.c
@@ -0,0 +1,46 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <xen/fdt-domain-build.h>
+#include <xen/fdt-kernel.h>
+#include <xen/init.h>
+#include <xen/sched.h>
+
+#include <asm/current.h>
+#include <asm/guest_access.h>
+
+int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
+{
+ struct vcpu *v = d->vcpu[0];
+ struct cpu_user_regs *regs = vcpu_guest_cpu_user_regs(v);
+
+ BUG_ON(d->vcpu[0] == NULL);
+ BUG_ON(v->is_initialised);
+
+ kernel_load(kinfo);
+ initrd_load(kinfo, copy_to_guest_phys);
+ dtb_load(kinfo, copy_to_guest_phys);
+
+ regs->sepc = kinfo->entry;
+
+ /* Guest boot cpuid = 0 */
+ regs->a0 = 0;
+ regs->a1 = kinfo->dtb_paddr;
+
+ for ( unsigned int i = 1; i < d->max_vcpus; i++ )
+ {
+ if ( vcpu_create(d, i) == NULL )
+ {
+ printk("Failed to allocate %pd v%d\n", d, i);
+ break;
+ }
+
+ printk("Created vcpu %d for %pd\n", i, d);
+ }
+
+ domain_update_node_affinity(d);
+
+ v->is_initialised = 1;
+ clear_bit(_VPF_down, &v->pause_flags);
+
+ return 0;
+}
--
2.53.0
* [PATCH v1 03/27] xen/riscv: implement prerequisites for domain_create()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 02/27] xen/riscv: Implement construct_domain() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 12:57 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 04/27] xen/riscv: rework G-stage mode handling Oleksii Kurochko
` (23 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
arch_domain_create() and arch_sanitise_domain_config() are prerequisites for
domain_create().
arch_sanitise_domain_config() currently returns 0, as there is no specific
work required at this stage.
arch_domain_create() performs basic initialization, such as setting up the P2M
and initializing the domain's next unused device tree phandle.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/domain.c | 27 +++++++++++++++++++++++++++
xen/arch/riscv/include/asm/domain.h | 3 +++
xen/arch/riscv/stubs.c | 9 +--------
xen/include/public/device_tree_defs.h | 1 +
4 files changed, 32 insertions(+), 8 deletions(-)
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index 7e3070101714..515735b32e30 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -288,6 +288,33 @@ void sync_vcpu_execstate(struct vcpu *v)
/* Nothing to do -- no lazy switching */
}
+int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
+{
+ return 0;
+}
+
+int arch_domain_create(struct domain *d,
+ struct xen_domctl_createdomain *config,
+ unsigned int flags)
+{
+ int rc = 0;
+
+ if ( is_idle_domain(d) )
+ return 0;
+
+ if ( (rc = p2m_init(d)) != 0)
+ goto fail;
+
+ d->arch.next_phandle = GUEST_PHANDLE_LAST + 1;
+
+ return rc;
+
+ fail:
+ d->is_dying = DOMDYING_dead;
+ arch_domain_destroy(d);
+ return rc;
+}
+
static void __init __maybe_unused build_assertions(void)
{
/*
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 0caacf92b5a2..506365f199c7 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -95,6 +95,9 @@ struct arch_domain {
#ifdef CONFIG_ARCH_PAGING_MEMPOOL
struct paging_domain paging;
#endif
+
+ /* Next unused device tree phandle number */
+ uint32_t next_phandle;
};
#include <xen/sched.h>
diff --git a/xen/arch/riscv/stubs.c b/xen/arch/riscv/stubs.c
index acbb5b9123ea..d139f8786cb5 100644
--- a/xen/arch/riscv/stubs.c
+++ b/xen/arch/riscv/stubs.c
@@ -101,14 +101,7 @@ void dump_pageframe_info(struct domain *d)
BUG_ON("unimplemented");
}
-int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
-{
- BUG_ON("unimplemented");
-}
-
-int arch_domain_create(struct domain *d,
- struct xen_domctl_createdomain *config,
- unsigned int flags)
+void vcpu_switch_to_aarch64_mode(struct vcpu *v)
{
BUG_ON("unimplemented");
}
diff --git a/xen/include/public/device_tree_defs.h b/xen/include/public/device_tree_defs.h
index 9e80d0499dc3..c9679cb3543c 100644
--- a/xen/include/public/device_tree_defs.h
+++ b/xen/include/public/device_tree_defs.h
@@ -14,6 +14,7 @@
*/
#define GUEST_PHANDLE_GIC (65000)
#define GUEST_PHANDLE_IOMMU (GUEST_PHANDLE_GIC + 1)
+#define GUEST_PHANDLE_LAST GUEST_PHANDLE_IOMMU
#define GUEST_ROOT_ADDRESS_CELLS 2
#define GUEST_ROOT_SIZE_CELLS 2
--
2.53.0
* [PATCH v1 04/27] xen/riscv: rework G-stage mode handling
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less Oleksii Kurochko
` (2 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 03/27] xen/riscv: implement prerequisites for domain_create() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 13:19 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string Oleksii Kurochko
` (22 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Rework G-stage mode handling to make the selected mode descriptor reusable
outside of p2m initialization.
As max_gstage_mode is going to be reused by code that creates CPU nodes for
guest domains, not only max_gstage_mode->mode but also max_gstage_mode->name
is required. To support this, make max_gstage_mode a global pointer to one of
the entries in a global modes[] array, and remove get_max_supported_mode().
Update struct p2m_domain to store a pointer to a mode descriptor instead of
embedding the structure directly.
Refactor the modes[] array so that mode->name contains only the MMU scheme
name (without the "x4" suffix), as this value is reused when filling the
maximum MMU type passed to the guest. According to DT bindings [1], the MMU
type must not include the "x4" suffix. Use "none" for the Bare mode to match
the DT binding requirements.
Adjust modes[]->paging_levels to represent the maximum paging level rather
than the total number of levels. This ensures that P2M_ROOT_LEVEL() and its
users behave correctly without relying on hardcoded p2m mode values.
Finally, drop __initconst from the modes[] declaration, as the array is
referenced via p2m->mode and max_gstage_mode beyond the init stage.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/riscv/cpus.yaml?h=v6.19-rc3#n82
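
The idea of keeping one global table of mode descriptors and handing out
pointers into it can be sketched as below. This is a simplified model, not
the patch itself: the struct and function names are hypothetical, the
numeric mode values are the hgatp.MODE encodings for Sv39x4/Sv48x4/Sv57x4
as I understand them from the privileged spec, and the probe callback
stands in for writing CSR_HGATP and reading it back.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical descriptor, mirroring the fields used in the patch. */
struct mode_desc {
    unsigned char mode;          /* hgatp.MODE encoding */
    unsigned char paging_levels; /* index of the root paging level */
    char name[8];                /* MMU scheme name without "x4" */
};

static const struct mode_desc modes[] = {
    {  0, 0, "none" },
    {  8, 2, "sv39" },
    {  9, 3, "sv48" },
    { 10, 4, "sv57" },
};

/* Example probe: pretend the hardware supports up to Sv48x4. */
static bool probe_up_to_sv48(unsigned char mode)
{
    return mode <= 9;
}

/*
 * Scan from the largest scheme downwards and return a pointer to the
 * first supported entry, falling back to Bare ("none").
 */
static const struct mode_desc *select_max_mode(bool (*supported)(unsigned char))
{
    for ( size_t i = sizeof(modes) / sizeof(modes[0]); i-- > 1; )
        if ( supported(modes[i].mode) )
            return &modes[i];

    return &modes[0];
}
```

Because callers only hold a pointer, both the mode value and the name stay
available after init, which is why the table cannot live in init-only data.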
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/p2m.h | 7 ++--
xen/arch/riscv/p2m.c | 60 +++++++++++++-------------------
xen/arch/riscv/vmid.c | 2 +-
3 files changed, 30 insertions(+), 39 deletions(-)
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index c6d846b96fb4..4441c0400b83 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -13,7 +13,7 @@
#define P2M_ROOT_ORDER (ilog2(GSTAGE_ROOT_PAGE_TABLE_SIZE) - PAGE_SHIFT)
#define P2M_ROOT_PAGES BIT(P2M_ROOT_ORDER, U)
-#define P2M_ROOT_LEVEL(p2m) ((p2m)->mode.paging_levels)
+#define P2M_ROOT_LEVEL(p2m) ((p2m)->mode->paging_levels)
/*
* According to the RISC-V spec:
@@ -58,6 +58,8 @@ struct gstage_mode_desc {
char name[8];
};
+extern const struct gstage_mode_desc *max_gstage_mode;
+
/* Per-p2m-table state */
struct p2m_domain {
/*
@@ -71,7 +73,7 @@ struct p2m_domain {
/* The root of the p2m tree. May be concatenated */
struct page_info *root;
- struct gstage_mode_desc mode;
+ const struct gstage_mode_desc *mode;
/* Back pointer to domain */
struct domain *domain;
@@ -218,7 +220,6 @@ static inline bool arch_acquire_resource_check(struct domain *d)
}
void guest_mm_init(void);
-unsigned char get_max_supported_mode(void);
int p2m_init(struct domain *d);
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 886e06196ba2..dce1eb205ec9 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -45,18 +45,32 @@ struct p2m_pte_ctx {
unsigned int level; /* Paging level at which the PTE resides. */
};
-static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
- .mode = HGATP_MODE_OFF,
- .paging_levels = 0,
- .name = "Bare",
-};
-
/*
* Set to the maximum configured support for IPA bits, so the number of IPA bits can be
* restricted by external entity (e.g. IOMMU).
*/
unsigned int __read_mostly p2m_ipa_bits = PADDR_BITS;
+static const struct gstage_mode_desc modes[] = {
+ /*
+ * Based on the RISC-V spec:
+ * Bare mode is always supported, regardless of SXLEN.
+ * When SXLEN=32, the only other valid setting for MODE is Sv32.
+ * When SXLEN=64, three paged virtual-memory schemes are defined:
+ * Sv39, Sv48, and Sv57.
+ */
+ [0] = { HGATP_MODE_OFF, 0, "none" },
+#ifdef CONFIG_RISCV_32
+ [1] = { HGATP_MODE_SV32X4, 1, "sv32" }
+#else
+ [2] = { HGATP_MODE_SV39X4, 2, "sv39" },
+ [3] = { HGATP_MODE_SV48X4, 3, "sv48" },
+ [4] = { HGATP_MODE_SV57X4, 4, "sv57" },
+#endif
+};
+
+const struct gstage_mode_desc * __ro_after_init max_gstage_mode = &modes[0];
+
static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
static inline void p2m_free_metadata_page(struct p2m_domain *p2m,
@@ -69,11 +83,6 @@ static inline void p2m_free_metadata_page(struct p2m_domain *p2m,
}
}
-unsigned char get_max_supported_mode(void)
-{
- return max_gstage_mode.mode;
-}
-
/*
* If anything is changed here, it may also require updates to
* p2m_{get,set}_type().
@@ -154,23 +163,6 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
static void __init gstage_mode_detect(void)
{
- static const struct gstage_mode_desc modes[] __initconst = {
- /*
- * Based on the RISC-V spec:
- * Bare mode is always supported, regardless of SXLEN.
- * When SXLEN=32, the only other valid setting for MODE is Sv32.
- * When SXLEN=64, three paged virtual-memory schemes are defined:
- * Sv39, Sv48, and Sv57.
- */
-#ifdef CONFIG_RISCV_32
- { HGATP_MODE_SV32X4, 2, "Sv32x4" }
-#else
- { HGATP_MODE_SV39X4, 3, "Sv39x4" },
- { HGATP_MODE_SV48X4, 4, "Sv48x4" },
- { HGATP_MODE_SV57X4, 5, "Sv57x4" },
-#endif
- };
-
for ( unsigned int mode_idx = ARRAY_SIZE(modes); mode_idx-- > 0; )
{
unsigned long mode = modes[mode_idx].mode;
@@ -179,16 +171,16 @@ static void __init gstage_mode_detect(void)
if ( MASK_EXTR(csr_read(CSR_HGATP), HGATP_MODE_MASK) == mode )
{
- max_gstage_mode = modes[mode_idx];
+ max_gstage_mode = &modes[mode_idx];
break;
}
}
- if ( max_gstage_mode.mode == HGATP_MODE_OFF )
+ if ( max_gstage_mode->mode == HGATP_MODE_OFF )
panic("Xen expects that G-stage won't be Bare mode\n");
- printk("Max supported G-stage mode is %s\n", max_gstage_mode.name);
+ printk("Max supported G-stage mode is %sx4\n", max_gstage_mode->name);
csr_write(CSR_HGATP, 0);
@@ -289,7 +281,7 @@ static void clear_and_clean_page(struct page_info *page, bool clean_dcache)
unsigned long construct_hgatp(const struct p2m_domain *p2m, uint16_t vmid)
{
return MASK_INSR(mfn_x(page_to_mfn(p2m->root)), HGATP_PPN_MASK) |
- MASK_INSR(p2m->mode.mode, HGATP_MODE_MASK) |
+ MASK_INSR(p2m->mode->mode, HGATP_MODE_MASK) |
MASK_INSR(vmid, HGATP_VMID_MASK);
}
@@ -369,9 +361,7 @@ int p2m_init(struct domain *d)
#endif
/* TODO: don't hardcode used for a domain g-stage mode. */
- p2m->mode.mode = HGATP_MODE_SV39X4;
- p2m->mode.paging_levels = 2;
- safe_strcpy(p2m->mode.name, "Sv39x4");
+ p2m->mode = &modes[2];
return 0;
}
diff --git a/xen/arch/riscv/vmid.c b/xen/arch/riscv/vmid.c
index 8fbcd500f24d..11c7e9d6d6c8 100644
--- a/xen/arch/riscv/vmid.c
+++ b/xen/arch/riscv/vmid.c
@@ -52,7 +52,7 @@ static DEFINE_PER_CPU(struct vmid_data, vmid_data);
static unsigned int vmidlen_detect(void)
{
unsigned int vmid_bits;
- unsigned char gstage_mode = get_max_supported_mode();
+ unsigned char gstage_mode = max_gstage_mode->mode;
/*
* According to the RISC-V Privileged Architecture Spec:
--
2.53.0
* [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less Oleksii Kurochko
` (3 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 04/27] xen/riscv: rework G-stage mode handling Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 13:49 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 06/27] xen/riscv: implement make_cpus_node() Oleksii Kurochko
` (21 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce generation of the riscv,isa string passed to the guest via the
Device Tree riscv,isa property.
The following changes are introduced:
- According to the RISC-V privileged specification, M-mode software controls
the XLEN value used by S-mode and U-mode. For RV64 harts, the SXL and UXL
fields of the mstatus register are WARL fields that define the XLEN for
S-mode and U-mode.
The XLEN value is provided by M-mode software (OpenSBI in the case of Xen)
via the riscv,isa DT property. Introduce and initialize an xlen variable
when parsing the host riscv,isa string in riscv_isa_parse_string().
- Introduce the guest_isa_str variable to store the riscv,isa string
generated for a guest domain during riscv,isa property parsing. Update
match_isa_ext() to populate guest_isa_str accordingly.
- Introduce guest_unsupp_bmp and guest_unsupp_exts[] to filter out ISA
extensions that should not be exposed to guests. For example, FPU-related
extensions are currently not supported for guests and are therefore
removed from the guest riscv,isa string.
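
The filtering described in the list above can be illustrated with a small
standalone sketch. It is not the cpufeature.c code: the struct, its flags,
and build_guest_isa() are hypothetical names, and the extension list is
passed in directly rather than parsed from the host riscv,isa string.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record for one extension found in the host ISA string. */
struct isa_ext {
    const char *name;
    bool multi_letter;  /* multi-letter extensions are joined with '_' */
    bool guest_visible; /* false for extensions hidden from guests */
};

/* Build "rv<xlen>" followed by every guest-visible extension. */
static void build_guest_isa(char *buf, size_t len, unsigned int xlen,
                            const struct isa_ext *exts, size_t nr)
{
    int n = snprintf(buf, len, "rv%u", xlen);
    size_t used = (n < 0) ? 0 : (size_t)n;

    /* Stop as soon as the buffer is full; snprintf() keeps it terminated. */
    for ( size_t i = 0; i < nr && used < len; i++ )
    {
        if ( !exts[i].guest_visible )
            continue;

        n = snprintf(buf + used, len - used, "%s%s",
                     exts[i].multi_letter ? "_" : "", exts[i].name);
        if ( n < 0 )
            break;

        used += (size_t)n;
    }
}
```

With a host list of i, m, a, f, d, c and zihintpause, and f/d marked as
hidden, the result would be "rv64imac_zihintpause".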
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/cpufeature.c | 33 +++++++++++++++++++++++++
xen/arch/riscv/include/asm/cpufeature.h | 2 ++
2 files changed, 35 insertions(+)
diff --git a/xen/arch/riscv/cpufeature.c b/xen/arch/riscv/cpufeature.c
index 987d36dc7eee..d7e483603dbe 100644
--- a/xen/arch/riscv/cpufeature.c
+++ b/xen/arch/riscv/cpufeature.c
@@ -38,6 +38,8 @@ struct riscv_isa_ext_data {
/* Host ISA bitmap */
static __ro_after_init DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX);
+static __ro_after_init unsigned int xlen;
+
static int __init dt_get_cpuid_from_node(const struct dt_device_node *cpu,
unsigned long *dt_cpuid)
{
@@ -160,6 +162,19 @@ static const struct riscv_isa_ext_data __initconst required_extensions[] = {
RISCV_ISA_EXT_DATA(svpbmt),
};
+static const unsigned int __initconst guest_unsupp_exts[] = {
+ RISCV_ISA_EXT_f,
+ RISCV_ISA_EXT_d,
+ RISCV_ISA_EXT_h,
+ RISCV_ISA_EXT_q,
+ RISCV_ISA_EXT_v,
+};
+
+static __ro_after_init DECLARE_BITMAP(guest_unsupp_bmp, RISCV_ISA_EXT_MAX);
+
+#define MAX_GUEST_ISA_STR_LEN 256
+char guest_isa_str[MAX_GUEST_ISA_STR_LEN];
+
static bool __init is_lowercase_extension_name(const char *str)
{
/*
@@ -193,6 +208,15 @@ static void __init match_isa_ext(const char *name, const char *name_end,
!memcmp(name, ext->name, name_end - name) )
{
__set_bit(ext->id, bitmap);
+
+ if ( riscv_isa_extension_available(guest_unsupp_bmp, ext->id) )
+ break;
+
+ if ( ext->id >= RISCV_ISA_EXT_BASE )
+ safe_strcat(guest_isa_str, "_");
+
+ safe_strcat(guest_isa_str, ext->name);
+
break;
}
}
@@ -207,13 +231,17 @@ static int __init riscv_isa_parse_string(const char *isa,
#if defined(CONFIG_RISCV_32)
if ( isa[2] != '3' && isa[3] != '2' )
return -EINVAL;
+ xlen = 32;
#elif defined(CONFIG_RISCV_64)
if ( isa[2] != '6' && isa[3] != '4' )
return -EINVAL;
+ xlen = 64;
#else
# error "unsupported RISC-V bitness"
#endif
+ snprintf(guest_isa_str, sizeof(guest_isa_str), "rv%d", xlen);
+
/*
* In unpriv. specification (*_20240411) is mentioned the following:
* (1) A RISC-V ISA is defined as a base integer ISA, which must be
@@ -487,6 +515,11 @@ void __init riscv_fill_hwcap(void)
bool all_extns_available = true;
struct trap_info trap;
+ for ( i = 0; i < ARRAY_SIZE(guest_unsupp_exts); i++ )
+ {
+ __set_bit(guest_unsupp_exts[i], guest_unsupp_bmp);
+ }
+
riscv_fill_hwcap_from_isa_string();
if ( bitmap_empty(riscv_isa, RISCV_ISA_EXT_MAX) )
diff --git a/xen/arch/riscv/include/asm/cpufeature.h b/xen/arch/riscv/include/asm/cpufeature.h
index ef02a3e26d2c..aabbbf0c2cc3 100644
--- a/xen/arch/riscv/include/asm/cpufeature.h
+++ b/xen/arch/riscv/include/asm/cpufeature.h
@@ -43,6 +43,8 @@ enum riscv_isa_ext_id {
RISCV_ISA_EXT_MAX
};
+extern char guest_isa_str[];
+
void riscv_fill_hwcap(void);
bool riscv_isa_extension_available(const unsigned long *isa_bitmap,
--
2.53.0
* [PATCH v1 06/27] xen/riscv: implement make_cpus_node()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less Oleksii Kurochko
` (4 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 14:11 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 07/27] xen/riscv: implement make_timer_node() Oleksii Kurochko
` (20 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Implement make_cpus_node() to create the cpus node for a guest domain.
This function is going to be used by common dom0less code during
domain construction.
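
As a rough model of the per-CPU values make_cpus_node() emits (node name,
mmu-type string, and a fresh phandle for the per-CPU interrupt controller),
consider the sketch below. The struct and helper are hypothetical and no
libfdt calls are involved; only the string formats and the phandle
post-increment are taken from the patch.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical summary of what is generated for one guest CPU node. */
struct cpu_node_desc {
    char name[16];         /* "cpu@<n>" */
    char mmu_type[16];     /* "riscv,<mode>", e.g. "riscv,sv39" */
    uint32_t intc_phandle; /* phandle of the nested interrupt-controller */
};

/* Fill in the description and consume one phandle from the allocator. */
static void describe_cpu(struct cpu_node_desc *desc, unsigned int cpu,
                         const char *mode_name, uint32_t *next_phandle)
{
    snprintf(desc->name, sizeof(desc->name), "cpu@%u", cpu);
    snprintf(desc->mmu_type, sizeof(desc->mmu_type), "riscv,%s", mode_name);
    desc->intc_phandle = (*next_phandle)++;
}
```

Starting the counter at GUEST_PHANDLE_LAST + 1 (65002 with the definitions
in this series) keeps the generated phandles clear of the fixed ones.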
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/domain-build.c | 108 ++++++++++++++++++++++++++++++++++
2 files changed, 109 insertions(+)
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index b1514f1514a2..6c6c58ed1ac0 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,6 +1,7 @@
obj-y += aplic.o
obj-y += cpufeature.o
obj-y += domain.o
+obj-y += domain-build.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-y += entry.o
obj-y += guestcopy.o
diff --git a/xen/arch/riscv/domain-build.c b/xen/arch/riscv/domain-build.c
index 0ce903a911ae..633f02e42c5e 100644
--- a/xen/arch/riscv/domain-build.c
+++ b/xen/arch/riscv/domain-build.c
@@ -3,8 +3,10 @@
#include <xen/fdt-domain-build.h>
#include <xen/fdt-kernel.h>
#include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
#include <xen/sched.h>
+#include <asm/cpufeature.h>
#include <asm/current.h>
#include <asm/guest_access.h>
@@ -44,3 +46,109 @@ int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
return 0;
}
+
+int __init make_cpus_node(const struct domain *d, void *fdt)
+{
+ int res;
+ const struct dt_device_node *cpus = dt_find_node_by_path("/cpus");
+ unsigned int cpu;
+ u32 timebase_frequency;
+ bool frequency_valid;
+ uint32_t *next_phandle = &((struct domain *)d)->arch.next_phandle;
+
+ dt_dprintk("Create cpus node\n");
+
+ if ( !cpus )
+ {
+ dprintk(XENLOG_ERR, "Missing /cpus node in the device tree?\n");
+ return -ENOENT;
+ }
+
+ frequency_valid = dt_property_read_u32(cpus, "timebase-frequency",
+ &timebase_frequency);
+
+ res = fdt_begin_node(fdt, "cpus");
+ if ( res )
+ return res;
+
+ res = fdt_property_cell(fdt, "#address-cells", 1);
+ if ( res )
+ return res;
+
+ res = fdt_property_cell(fdt, "#size-cells", 0);
+ if ( res )
+ return res;
+
+ if ( frequency_valid )
+ res = fdt_property_cell(fdt, "timebase-frequency", timebase_frequency);
+
+ for ( cpu = 0; cpu < d->max_vcpus; cpu++ )
+ {
+ char buf[64];
+ uint32_t reg = cpu_to_fdt32(cpu);
+
+ snprintf(buf, sizeof(buf), "cpu@%u", cpu);
+ res = fdt_begin_node(fdt, buf);
+ if ( res )
+ return res;
+
+ res = fdt_property(fdt, "reg", &reg, sizeof(reg));
+ if ( res )
+ return res;
+
+ res = fdt_property_string(fdt, "status", "okay");
+ if ( res )
+ return res;
+
+ res = fdt_property_string(fdt, "compatible", "riscv");
+ if ( res )
+ return res;
+
+ BUILD_BUG_ON((sizeof("riscv,") + sizeof_field(struct gstage_mode_desc, name) + 1) >= sizeof(buf));
+ snprintf(buf, sizeof(buf), "riscv,%s", max_gstage_mode->name);
+ res = fdt_property_string(fdt, "mmu-type", buf);
+ if ( res )
+ return res;
+
+ res = fdt_property_string(fdt, "riscv,isa", guest_isa_str);
+ if ( res )
+ return res;
+
+ res = fdt_property_string(fdt, "device_type", "cpu");
+ if ( res )
+ return res;
+
+ res = fdt_begin_node(fdt, "interrupt-controller");
+ if ( res )
+ return res;
+
+ res = fdt_property_string(fdt, "compatible", "riscv,cpu-intc");
+ if ( res )
+ return res;
+
+ res = fdt_property_cell(fdt, "#interrupt-cells", 1);
+ if ( res )
+ return res;
+
+ res = fdt_property(fdt, "interrupt-controller", NULL, 0);
+ if ( res )
+ return res;
+
+ res = fdt_property_u32(fdt, "phandle", (*next_phandle)++);
+ if ( res )
+ return res;
+
+ /* end of interrupt-controller */
+ res = fdt_end_node(fdt);
+ if ( res )
+ return res;
+
+ res = fdt_end_node(fdt);
+ if ( res )
+ return res;
+ }
+
+ res = fdt_end_node(fdt);
+
+ return res;
+}
--
2.53.0
* [PATCH v1 07/27] xen/riscv: implement make_timer_node()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablement of dom0less Oleksii Kurochko
` (5 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 06/27] xen/riscv: implement make_cpus_node() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 14:24 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 08/27] xen/riscv: implement make_arch_nodes() Oleksii Kurochko
` (19 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
The make_timer_node() function is implemented to return 0,
as RISC-V does not require the creation of a timer node.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/domain-build.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/xen/arch/riscv/domain-build.c b/xen/arch/riscv/domain-build.c
index 633f02e42c5e..ae26faed09ed 100644
--- a/xen/arch/riscv/domain-build.c
+++ b/xen/arch/riscv/domain-build.c
@@ -3,6 +3,7 @@
#include <xen/fdt-domain-build.h>
#include <xen/fdt-kernel.h>
#include <xen/init.h>
+#include <xen/fdt-kernel.h>
#include <xen/libfdt/libfdt.h>
#include <xen/sched.h>
@@ -152,3 +153,10 @@ int __init make_cpus_node(const struct domain *d, void *fdt)
return res;
}
+
+int __init make_timer_node(const struct kernel_info *kinfo)
+{
+ /* There is no need for timer node for RISC-V. */
+
+ return 0;
+}
--
2.53.0
* [PATCH v1 08/27] xen/riscv: implement make_arch_nodes()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (6 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 07/27] xen/riscv: implement make_timer_node() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 14:29 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node() Oleksii Kurochko
` (18 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
No RISC-V-specific nodes need to be created at the moment,
so make_arch_nodes() is implemented to simply return 0.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/dom0less-build.c | 11 +++++++++++
2 files changed, 12 insertions(+)
create mode 100644 xen/arch/riscv/dom0less-build.c
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 6c6c58ed1ac0..fb04cc5a56b2 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,5 +1,6 @@
obj-y += aplic.o
obj-y += cpufeature.o
+obj-y += dom0less-build.o
obj-y += domain.o
obj-y += domain-build.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
diff --git a/xen/arch/riscv/dom0less-build.c b/xen/arch/riscv/dom0less-build.c
new file mode 100644
index 000000000000..10863cffc96b
--- /dev/null
+++ b/xen/arch/riscv/dom0less-build.c
@@ -0,0 +1,11 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/fdt-kernel.h>
+#include <xen/init.h>
+
+int __init make_arch_nodes(struct kernel_info *kinfo)
+{
+ /* No RISC-V specific nodes need to be made, at the moment. */
+
+ return 0;
+}
--
2.53.0
* [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (7 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 08/27] xen/riscv: implement make_arch_nodes() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 14:38 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains Oleksii Kurochko
` (17 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce a RISC-V specific function to create an interrupt controller
Device Tree node for DomU domains during dom0less build.
Add make_intc_domU_node() to the dom0less build path and wire it to
a new generic helper, intc_make_domu_dt_node(), which delegates DT
node creation to the active interrupt controller implementation via
intc_hw_ops.
Extend struct intc_hw_operations with a make_dom_dt_node callback
and expose the required interfaces in the interrupt controller header,
allowing individual interrupt controller drivers to populate DomU DT
nodes as needed.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
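The delegation described above can be sketched in plain C. This is only an illustration under stated assumptions: struct kernel_info and struct dt_node are simplified stand-ins for the Xen types, and register_intc_ops(), make_intc_node() and the demo_* names are hypothetical. Unlike the hunk below, the sketch propagates the callback's return value to the caller; that is a choice made here, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the Xen types; only the shape matters here. */
struct kernel_info { int nodes_created; };
struct dt_node { const char *name; };

struct intc_ops {
    const struct dt_node *node;
    /* Optional callback: a driver may leave this NULL. */
    int (*make_dom_dt_node)(struct kernel_info *kinfo,
                            const struct dt_node *intc);
};

static const struct intc_ops *active_ops;

/* Hypothetical registration hook, standing in for driver init. */
void register_intc_ops(const struct intc_ops *ops)
{
    active_ops = ops;
}

/* Generic helper: delegate to the active driver, or report "unsupported". */
int make_domu_dt_node(struct kernel_info *kinfo)
{
    assert(kinfo != NULL);

    if ( active_ops && active_ops->make_dom_dt_node )
        return active_ops->make_dom_dt_node(kinfo, active_ops->node);

    return -ENOSYS;
}

/* Arch hook: forwards the helper's result instead of discarding it. */
int make_intc_node(struct kernel_info *kinfo)
{
    return make_domu_dt_node(kinfo);
}

/* Example driver callback that just counts the nodes it "creates". */
static int demo_make_node(struct kernel_info *kinfo,
                          const struct dt_node *intc)
{
    (void)intc;
    kinfo->nodes_created++;
    return 0;
}

static const struct dt_node demo_node = { "interrupt-controller" };

const struct intc_ops demo_ops = {
    .node = &demo_node,
    .make_dom_dt_node = demo_make_node,
};
```

With no driver registered, make_intc_node() returns -ENOSYS; after register_intc_ops(&demo_ops) it returns whatever the driver callback returns.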
xen/arch/riscv/dom0less-build.c | 9 +++++++++
xen/arch/riscv/include/asm/intc.h | 10 ++++++++--
xen/arch/riscv/intc.c | 8 ++++++++
3 files changed, 25 insertions(+), 2 deletions(-)
diff --git a/xen/arch/riscv/dom0less-build.c b/xen/arch/riscv/dom0less-build.c
index 10863cffc96b..312c5d7d4979 100644
--- a/xen/arch/riscv/dom0less-build.c
+++ b/xen/arch/riscv/dom0less-build.c
@@ -3,6 +3,15 @@
#include <xen/fdt-kernel.h>
#include <xen/init.h>
+#include <asm/intc.h>
+
+int __init make_intc_domU_node(struct kernel_info *kinfo)
+{
+ intc_make_domu_dt_node(kinfo);
+
+ return 0;
+}
+
int __init make_arch_nodes(struct kernel_info *kinfo)
{
/* No RISC-V specific nodes need to be made, at the moment. */
diff --git a/xen/arch/riscv/include/asm/intc.h b/xen/arch/riscv/include/asm/intc.h
index ecdc8a5e6577..8300d71d472f 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -8,14 +8,14 @@
#ifndef ASM__RISCV__INTERRUPT_CONTOLLER_H
#define ASM__RISCV__INTERRUPT_CONTOLLER_H
-struct dt_device_node;
-
enum intc_version {
INTC_APLIC,
};
struct cpu_user_regs;
+struct dt_device_node;
struct irq_desc;
+struct kernel_info;
struct intc_info {
enum intc_version hw_version;
@@ -41,6 +41,10 @@ struct intc_hw_operations {
/* handle external interrupt */
void (*handle_interrupt)(struct cpu_user_regs *regs);
+
+ /* Create interrupt controller node for domain */
+ int (*make_dom_dt_node)(const struct kernel_info *kinfo,
+ const struct dt_device_node *intc);
};
void intc_preinit(void);
@@ -53,4 +57,6 @@ void intc_route_irq_to_xen(struct irq_desc *desc, unsigned int priority);
void intc_handle_external_irqs(struct cpu_user_regs *regs);
+int intc_make_domu_dt_node(const struct kernel_info *kinfo);
+
#endif /* ASM__RISCV__INTERRUPT_CONTOLLER_H */
diff --git a/xen/arch/riscv/intc.c b/xen/arch/riscv/intc.c
index ea317aea5ad8..a6f8c30f4771 100644
--- a/xen/arch/riscv/intc.c
+++ b/xen/arch/riscv/intc.c
@@ -67,3 +67,11 @@ void intc_route_irq_to_xen(struct irq_desc *desc, unsigned int priority)
intc_set_irq_type(desc, desc->arch.type);
intc_set_irq_priority(desc, priority);
}
+
+int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
+{
+ if ( intc_hw_ops && intc_hw_ops->make_dom_dt_node )
+ return intc_hw_ops->make_dom_dt_node(kinfo, intc_hw_ops->info->node);
+
+ return -ENOSYS;
+}
--
2.53.0
* [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (8 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 15:05 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 11/27] xen/riscv: create APLIC " Oleksii Kurochko
` (16 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Guests using the IMSIC interrupt controller require a corresponding
Device Tree description. Add support for generating an IMSIC node when
building the guest DT.
Keep a reference to the host IMSIC DT node and reuse its compatible
property while constructing the guest-visible node. The generated node
contains guest-specific information such as the MMIO region sized for
the number of vCPUs and the interrupts-extended property referencing
per-vCPU interrupt controllers.
This allows guests to discover and use the IMSIC interrupt controller.
Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
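For reviewers, the cell layout of the two generated properties can be sketched outside of Xen. This is a minimal illustration, not the patch's code: make_reg() and make_irq_ext() are hypothetical names, MMIO_PAGE_SZ assumes one 4 KiB page per interrupt file, and the sketch assumes two address cells and two size cells. IRQ_S_EXT is the RISC-V supervisor external interrupt number (9). The size is computed as a 64-bit product so that the ">> 32" split is well defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MMIO_PAGE_SZ UINT64_C(0x1000) /* assumed: 4 KiB per interrupt file */
#define IRQ_S_EXT    9u               /* supervisor external interrupt */

/* Store a 32-bit value in big-endian byte order, as FDT cells require. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Fill a 16-byte "reg" value: <addr-hi addr-lo size-hi size-lo>. */
void make_reg(uint8_t out[16], uint64_t base, unsigned int nr_vcpus)
{
    uint64_t size = MMIO_PAGE_SZ * nr_vcpus; /* one page per vCPU */

    put_be32(out + 0, (uint32_t)(base >> 32));
    put_be32(out + 4, (uint32_t)base);
    put_be32(out + 8, (uint32_t)(size >> 32));
    put_be32(out + 12, (uint32_t)size);
}

/*
 * Fill "interrupts-extended": one <phandle IRQ_S_EXT> pair per vCPU.
 * 'out' must hold at least 8 * nr_vcpus bytes. Returns the byte length.
 */
size_t make_irq_ext(uint8_t *out, const uint32_t *phandles,
                    unsigned int nr_vcpus)
{
    size_t len = 0;

    for ( unsigned int i = 0; i < nr_vcpus; i++ )
    {
        assert(phandles[i] != 0); /* 0 is not a valid phandle */

        put_be32(out + len, phandles[i]);
        len += 4;
        put_be32(out + len, IRQ_S_EXT);
        len += 4;
    }

    return len;
}
```

For a base address of 0x128000000 and four vCPUs this yields the cells <0x1 0x28000000 0x0 0x4000>.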
xen/arch/riscv/imsic.c | 113 +++++++++++++++++++++++++++++
xen/arch/riscv/include/asm/imsic.h | 7 ++
2 files changed, 120 insertions(+)
diff --git a/xen/arch/riscv/imsic.c b/xen/arch/riscv/imsic.c
index a4460576f620..0956b187705f 100644
--- a/xen/arch/riscv/imsic.c
+++ b/xen/arch/riscv/imsic.c
@@ -13,8 +13,11 @@
#include <xen/const.h>
#include <xen/cpumask.h>
#include <xen/device_tree.h>
+#include <xen/domain.h>
#include <xen/errno.h>
+#include <xen/fdt-kernel.h>
#include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
#include <xen/macros.h>
#include <xen/smp.h>
#include <xen/spinlock.h>
@@ -329,6 +332,8 @@ int __init imsic_init(const struct dt_device_node *node)
struct imsic_mmios *mmios;
struct imsic_msi *msi = NULL;
+ imsic_cfg.host_node = node;
+
/* Parse IMSIC node */
rc = imsic_parse_node(node, &nr_parent_irqs, &nr_mmios);
/*
@@ -487,3 +492,111 @@ int __init imsic_init(const struct dt_device_node *node)
return rc;
}
+
+static int __init imsic_make_reg_property(struct domain *d, void *fdt)
+{
+ __be32 regs[4];
+
+ regs[0] = cpu_to_be32(imsic_cfg.base_addr >> 32);
+ regs[1] = cpu_to_be32(imsic_cfg.base_addr);
+ regs[2] = cpu_to_be32((IMSIC_MMIO_PAGE_SZ * d->max_vcpus) >> 32);
+ regs[3] = cpu_to_be32(IMSIC_MMIO_PAGE_SZ * d->max_vcpus);
+
+ return fdt_property(fdt, "reg", regs, sizeof(regs));
+}
+
+static int __init imsic_set_interrupt_extended_prop(struct domain *d,
+ void *fdt)
+{
+ uint32_t len = 0, pos = 0, cpu, phandle;
+ uint32_t *irq_ext;
+ char buf[64];
+ int res;
+
+ irq_ext = xvzalloc_array(uint32_t, d->max_vcpus * 2);
+ if ( !irq_ext )
+ return -ENOMEM;
+
+ for ( cpu = 0; cpu < d->max_vcpus; cpu++ )
+ {
+ snprintf(buf, sizeof(buf), "/cpus/cpu@%u/interrupt-controller", cpu);
+ phandle = fdt_get_phandle(fdt, fdt_path_offset(fdt, buf));
+
+ if ( phandle <= 0 )
+ return phandle;
+
+ irq_ext[pos++] = cpu_to_be32(phandle);
+ len += sizeof(uint32_t);
+ irq_ext[pos++] = cpu_to_be32(IRQ_S_EXT);
+ len += sizeof(uint32_t);
+ }
+
+ res = fdt_property(fdt, "interrupts-extended", irq_ext, len);
+
+ XVFREE(irq_ext);
+
+ return res;
+}
+
+int __init imsic_make_dt_node(const struct kernel_info *kinfo)
+{
+ uint32_t len;
+ const void *data = NULL;
+ int res = 0;
+ void *fdt = kinfo->fdt;
+ const struct dt_device_node *host_imsic_node = imsic_cfg.host_node;
+ uint32_t *next_phandle = &kinfo->bd.d->arch.next_phandle;
+
+ res = fdt_begin_node(fdt, host_imsic_node->full_name);
+ if ( res )
+ return res;
+
+ data = dt_get_property(host_imsic_node, "compatible", &len);
+ if ( !data )
+ {
+ printk(XENLOG_ERR "%s: Can't find 'compatible' property\n",
+ host_imsic_node->full_name);
+
+ return -ENOENT;
+ }
+
+ res = fdt_property(fdt, "compatible", data, len);
+ if ( res )
+ return res;
+
+ res = imsic_make_reg_property(kinfo->bd.d, fdt);
+ if ( res )
+ return res;
+
+ res = imsic_set_interrupt_extended_prop(kinfo->bd.d, fdt);
+ if ( res )
+ return res;
+
+ res = fdt_property_u32(fdt, "riscv,num-ids", imsic_cfg.nr_ids);
+ if ( res )
+ return res;
+
+ res = fdt_property(fdt, "msi-controller", NULL, 0);
+ if ( res )
+ return res;
+
+ res = fdt_property_u32(fdt, "#msi-cells", 0);
+ if ( res )
+ return res;
+
+ res = fdt_property(fdt, "interrupt-controller", NULL, 0);
+ if ( res )
+ return res;
+
+ res = fdt_property_u32(fdt, "#interrupt-cells", 0);
+ if ( res )
+ return res;
+
+ imsic_cfg.phandle = (*next_phandle)++;
+
+ res = fdt_property_cell(fdt, "phandle", imsic_cfg.phandle);
+ if ( res )
+ return res;
+
+ return fdt_end_node(fdt);
+}
diff --git a/xen/arch/riscv/include/asm/imsic.h b/xen/arch/riscv/include/asm/imsic.h
index c6c59215df20..a63d56fbd5d9 100644
--- a/xen/arch/riscv/include/asm/imsic.h
+++ b/xen/arch/riscv/include/asm/imsic.h
@@ -57,11 +57,16 @@ struct imsic_config {
/* MSI */
const struct imsic_msi *msi;
+ /* DT node of IMSIC */
+ const struct dt_device_node *host_node;
+
/* Lock to protect access to IMSIC's stuff */
spinlock_t lock;
};
struct dt_device_node;
+struct kernel_info;
+
int imsic_init(const struct dt_device_node *node);
const struct imsic_config *imsic_get_config(void);
@@ -71,4 +76,6 @@ void imsic_irq_disable(unsigned int hwirq);
void imsic_ids_local_delivery(bool enable);
+int imsic_make_dt_node(const struct kernel_info *kinfo);
+
#endif /* ASM_RISCV_IMSIC_H */
--
2.53.0
* [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (9 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-01 15:16 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available() Oleksii Kurochko
` (15 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Guests require a Device Tree description of the interrupt controller
topology. Add support for creating an APLIC node when building the
guest DT.
The node is derived from the host DT and linked to the guest IMSIC
via the msi-parent property, allowing guests to discover and use the
APLIC interrupt controller.
Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/aplic.c | 82 ++++++++++++++++++++++++++++++++++++++++++
1 file changed, 82 insertions(+)
diff --git a/xen/arch/riscv/aplic.c b/xen/arch/riscv/aplic.c
index 739e8dab3498..e139946a05a0 100644
--- a/xen/arch/riscv/aplic.c
+++ b/xen/arch/riscv/aplic.c
@@ -12,8 +12,10 @@
#include <xen/const.h>
#include <xen/device_tree.h>
#include <xen/errno.h>
+#include <xen/fdt-kernel.h>
#include <xen/init.h>
#include <xen/irq.h>
+#include <xen/libfdt/libfdt.h>
#include <xen/mm.h>
#include <xen/sections.h>
#include <xen/spinlock.h>
@@ -295,6 +297,85 @@ static void cf_check aplic_set_irq_type(struct irq_desc *desc,
spin_unlock(&aplic.lock);
}
+static int __init cf_check aplic_make_dom_dt_node(
+ const struct kernel_info *kinfo,
+ const struct dt_device_node *aplic_node)
+{
+ uint32_t len;
+ const __be32 *regs;
+ const void *data = NULL;
+ int res = 0;
+ void *fdt = kinfo->fdt;
+
+ res = imsic_make_dt_node(kinfo);
+ if ( res )
+ return res;
+
+ res = fdt_begin_node(fdt, aplic_node->full_name);
+ if ( res )
+ return res;
+
+ data = dt_get_property(aplic_node, "#interrupt-cells", &len);
+ if ( !data )
+ {
+ printk("%s: Can't find '#interrupt-cells' property\n",
+ aplic_node->full_name);
+ return -FDT_ERR_XEN(ENOENT);
+ }
+
+ res = fdt_property(fdt, "#interrupt-cells", data, len);
+ if ( res )
+ return res;
+
+ regs = dt_get_property(aplic_node, "reg", &len);
+ if ( !regs )
+ {
+ printk("%s: Can't find 'reg' property\n", aplic_node->full_name);
+ return -FDT_ERR_XEN(ENOENT);
+ }
+
+ res = fdt_property(fdt, "reg", regs, len);
+ if ( res )
+ return res;
+
+ data = dt_get_property(aplic_node, "riscv,num-sources", &len);
+ if ( !data )
+ {
+ printk("%s: Can't find 'riscv,num-sources' property\n",
+ aplic_node->full_name);
+ return -FDT_ERR_XEN(ENOENT);
+ }
+
+ res = fdt_property(fdt, "riscv,num-sources", data, len);
+ if ( res )
+ return res;
+
+ res = fdt_property(fdt, "interrupt-controller", NULL, 0);
+ if ( res )
+ return res;
+
+ data = dt_get_property(aplic_node, "compatible", &len);
+ if ( !data )
+ {
+ printk("%s: Can't find 'compatible' property\n", aplic_node->full_name);
+ return -FDT_ERR_XEN(ENOENT);
+ }
+
+ res = fdt_property(fdt, "compatible", data, len);
+ if ( res )
+ return res;
+
+ res = fdt_property_cell(fdt, "msi-parent", aplic.imsic_cfg->phandle);
+ if ( res )
+ return res;
+
+ res = fdt_property_cell(fdt, "phandle", kinfo->phandle_intc);
+ if ( res )
+ return res;
+
+ return fdt_end_node(fdt);
+}
+
static const hw_irq_controller aplic_xen_irq_type = {
.typename = "aplic",
.startup = aplic_irq_startup,
@@ -310,6 +391,7 @@ static const struct intc_hw_operations aplic_ops = {
.host_irq_type = &aplic_xen_irq_type,
.handle_interrupt = aplic_handle_interrupt,
.set_irq_type = aplic_set_irq_type,
+ .make_dom_dt_node = aplic_make_dom_dt_node,
};
static int cf_check aplic_irq_xlate(const uint32_t *intspec,
--
2.53.0
* [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (10 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 11/27] xen/riscv: create APLIC " Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 9:00 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests Oleksii Kurochko
` (14 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
aia_init() will contain all AIA-related initialization.
For now, it only checks whether the Ssaia extension is available and,
if so, sets is_aia_available to true.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/aia.c | 25 +++++++++++++++++++++++++
xen/arch/riscv/include/asm/aia.h | 10 ++++++++++
xen/arch/riscv/intc.c | 3 +++
4 files changed, 39 insertions(+)
create mode 100644 xen/arch/riscv/aia.c
create mode 100644 xen/arch/riscv/include/asm/aia.h
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index fb04cc5a56b2..d772b42386c0 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,3 +1,4 @@
+obj-y += aia.o
obj-y += aplic.o
obj-y += cpufeature.o
obj-y += dom0less-build.o
diff --git a/xen/arch/riscv/aia.c b/xen/arch/riscv/aia.c
new file mode 100644
index 000000000000..5e3f190e8e2c
--- /dev/null
+++ b/xen/arch/riscv/aia.c
@@ -0,0 +1,25 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/errno.h>
+#include <xen/init.h>
+#include <xen/sections.h>
+#include <xen/types.h>
+
+#include <asm/cpufeature.h>
+
+static bool __ro_after_init is_aia_available;
+
+bool aia_available(void)
+{
+ return is_aia_available;
+}
+
+int __init aia_init(void)
+{
+ if ( !riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ssaia) )
+ return -ENODEV;
+
+ is_aia_available = true;
+
+ return 0;
+}
diff --git a/xen/arch/riscv/include/asm/aia.h b/xen/arch/riscv/include/asm/aia.h
new file mode 100644
index 000000000000..039607faf685
--- /dev/null
+++ b/xen/arch/riscv/include/asm/aia.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#ifndef ASM__RISCV__AIA_H
+#define ASM__RISCV__AIA_H
+
+bool aia_available(void);
+
+int aia_init(void);
+
+#endif /* ASM__RISCV__ACPI_H */
diff --git a/xen/arch/riscv/intc.c b/xen/arch/riscv/intc.c
index a6f8c30f4771..c9f12651fda1 100644
--- a/xen/arch/riscv/intc.c
+++ b/xen/arch/riscv/intc.c
@@ -8,6 +8,7 @@
#include <xen/lib.h>
#include <xen/spinlock.h>
+#include <asm/aia.h>
#include <asm/intc.h>
static const struct intc_hw_operations *__ro_after_init intc_hw_ops;
@@ -27,6 +28,8 @@ void __init intc_preinit(void)
void __init intc_init(void)
{
+ aia_init();
+
if ( intc_hw_ops->init() )
panic("Failed to initialize the interrupt controller drivers\n");
}
--
2.53.0
* [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (11 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 10:03 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state Oleksii Kurochko
` (13 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
AIA provides a hardware-accelerated mechanism for delivering external
interrupts to domains via "guest interrupt files" located in IMSIC.
A single physical hart can implement multiple such files (up to GEILEN),
allowing several virtual harts to receive interrupts directly from hardware.
Introduce per-CPU tracking of guest interrupt file identifiers (VGEIN)
for systems implementing the AIA specification. Each CPU maintains
a bitmap describing which guest interrupt files are currently in use.
Add helpers to initialize the bitmap based on the number of available
guest interrupt files (GEILEN), assign a VGEIN to a vCPU, and release it
when no longer needed. When assigning a VGEIN, the corresponding value
is written to the VGEIN field of the guest hstatus register so that
VS-level external interrupts are delivered from the selected interrupt
file.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
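The bookkeeping described above can be modelled in standalone C. This is a sketch under assumptions, not the patch's implementation: struct vgein_pool and all function names are hypothetical, locking is omitted, and the pool is a single 64-bit word. The allocator searches for the first clear bit rather than using the number of set bits as the index, so that an identifier released in the middle of the range can be handed out again. The VGEIN field position (hstatus bits 17:12) follows the RISC-V privileged specification.

```c
#include <assert.h>
#include <stdint.h>

#define HSTATUS_VGEIN_SHIFT 12
#define HSTATUS_VGEIN_MASK  (UINT64_C(0x3f) << HSTATUS_VGEIN_SHIFT)

/* Hypothetical per-pCPU pool; bit i set means VGEIN i + 1 is in use. */
struct vgein_pool {
    uint64_t used;
    unsigned int geilen;
};

/*
 * Derive GEILEN from the value read back after writing all-ones to hgeie:
 * bits GEILEN:1 read as one, bit 0 is read-only zero.
 */
unsigned int geilen_from_hgeie(uint64_t readback)
{
    unsigned int n = 0;

    assert((readback & 1) == 0);

    while ( (readback >>= 1) != 0 )
        n++;

    return n;
}

/* Returns a VGEIN in 1..geilen, or 0 when every interrupt file is busy. */
unsigned int vgein_alloc(struct vgein_pool *p)
{
    assert(p->geilen <= 63);

    for ( unsigned int i = 0; i < p->geilen; i++ )
    {
        if ( !(p->used & (UINT64_C(1) << i)) )
        {
            p->used |= UINT64_C(1) << i;
            return i + 1; /* VGEIN 0 means "no source selected" */
        }
    }

    return 0;
}

void vgein_free(struct vgein_pool *p, unsigned int vgein)
{
    assert(vgein >= 1 && vgein <= p->geilen);

    p->used &= ~(UINT64_C(1) << (vgein - 1));
}

/* Replace the VGEIN field of an hstatus value. */
uint64_t hstatus_set_vgein(uint64_t hstatus, unsigned int vgein)
{
    hstatus &= ~HSTATUS_VGEIN_MASK;

    return hstatus |
           (((uint64_t)vgein << HSTATUS_VGEIN_SHIFT) & HSTATUS_VGEIN_MASK);
}
```

Returning 0 on exhaustion lets the caller fall back or fail gracefully; whether that is preferable to asserting is a design question for the series.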
xen/arch/riscv/aia.c | 112 ++++++++++++++++++++++++++++++-
xen/arch/riscv/include/asm/aia.h | 18 +++++
2 files changed, 129 insertions(+), 1 deletion(-)
diff --git a/xen/arch/riscv/aia.c b/xen/arch/riscv/aia.c
index 5e3f190e8e2c..7bd66d1e37c6 100644
--- a/xen/arch/riscv/aia.c
+++ b/xen/arch/riscv/aia.c
@@ -1,11 +1,24 @@
/* SPDX-License-Identifier: GPL-2.0-only */
+#include <xen/bitmap.h>
#include <xen/errno.h>
#include <xen/init.h>
#include <xen/sections.h>
+#include <xen/sched.h>
+#include <xen/spinlock.h>
#include <xen/types.h>
+#include <xen/xvmalloc.h>
+#include <asm/aia.h>
#include <asm/cpufeature.h>
+#include <asm/csr.h>
+#include <asm/current.h>
+
+/*
+ * Per-pCPU bitmap tracking which VS (guest) interrupt file IDs
+ * are currently in use.
+ */
+DEFINE_PER_CPU(struct vgein_bmp, vgein_bmp);
static bool __ro_after_init is_aia_available;
@@ -14,12 +27,109 @@ bool aia_available(void)
return is_aia_available;
}
+int __init vgein_init(unsigned int cpu)
+{
+ struct vgein_bmp *vgein = &per_cpu(vgein_bmp, cpu);
+
+ csr_write(CSR_HGEIE, -1UL);
+ vgein->geilen = flsl(csr_read(CSR_HGEIE));
+ csr_write(CSR_HGEIE, 0);
+ if ( vgein->geilen )
+ vgein->geilen--;
+
+ BUG_ON(!vgein->geilen);
+
+ printk("cpu%d.geilen=%d\n", cpu, vgein->geilen);
+
+ if ( !vgein->bmp )
+ {
+ vgein->bmp = xvzalloc_array(unsigned long, BITS_TO_LONGS(vgein->geilen));
+ if ( !vgein->bmp )
+ return -ENOMEM;
+ }
+
+ spin_lock_init(&vgein->lock);
+
+ return 0;
+}
+
int __init aia_init(void)
{
+ int rc = 0;
+
if ( !riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ssaia) )
return -ENODEV;
+ if ( (rc = vgein_init(0)) )
+ return rc;
+
is_aia_available = true;
- return 0;
+ return rc;
+}
+
+unsigned int vgein_assign(struct vcpu *v)
+{
+ unsigned int vgein_id;
+
+ struct vgein_bmp *vgein_bmp = &per_cpu(vgein_bmp, v->processor);
+ unsigned long *bmp = vgein_bmp->bmp;
+ unsigned long flags;
+
+ spin_lock_irqsave(&vgein_bmp->lock, flags);
+ vgein_id = bitmap_weight(bmp, vgein_bmp->geilen);
+
+ /*
+ * Running out of guest interrupt files is not supported: the number
+ * of vCPUs on one pCPU must not be greater than GEILEN.
+ */
+ ASSERT(vgein_id < vgein_bmp->geilen);
+
+ bitmap_set(bmp, vgein_id, 1);
+ spin_unlock_irqrestore(&vgein_bmp->lock, flags);
+
+ /*
+ * The vgein_id must not be zero, as zero indicates that no guest
+ * external interrupt source is selected for VS-level external interrupts.
+ * Quoting section 8.2.1 "Hypervisor Status Register (hstatus)" of the
+ * RISC-V privileged spec:
+ *
+ * The VGEIN (Virtual Guest External Interrupt Number) field selects
+ * a guest external interrupt source for VS-level external interrupts.
+ * VGEIN is a WLRL field that must be able to hold values between zero
+ * and the maximum guest external interrupt number (known as GEILEN),
+ * inclusive.
+ * When VGEIN=0, no guest external interrupt source is selected for
+ * VS-level external interrupts.
+ */
+ vgein_id++;
+
+#ifdef VGEIN_DEBUG
+ printk("%s: %pv: vgein_id(%u), xen_cpu%d_bmp=%#lx\n",
+ __func__, v, vgein_id, v->processor, *bmp);
+#endif
+
+ vcpu_guest_cpu_user_regs(v)->hstatus &= ~HSTATUS_VGEIN;
+ vcpu_guest_cpu_user_regs(v)->hstatus |=
+ MASK_INSR(vgein_id, HSTATUS_VGEIN);
+
+ return vgein_id;
+}
+
+void vgein_release(struct vcpu *v, unsigned int vgen_id)
+{
+ unsigned long flags;
+
+ struct vgein_bmp *vgein_bmp = &per_cpu(vgein_bmp, v->processor);
+
+ spin_lock_irqsave(&vgein_bmp->lock, flags);
+ bitmap_clear(vgein_bmp->bmp, vgen_id - 1, 1);
+ spin_unlock_irqrestore(&vgein_bmp->lock, flags);
+
+#ifdef VGEIN_DEBUG
+ printk("%s: vgein_id(%u), xen_cpu%d_bmp=%#lx\n",
+ __func__, vgen_id, v->processor, *vgein_bmp->bmp);
+#endif
+
+ vcpu_guest_cpu_user_regs(v)->hstatus &= ~HSTATUS_VGEIN;
}
diff --git a/xen/arch/riscv/include/asm/aia.h b/xen/arch/riscv/include/asm/aia.h
index 039607faf685..c2717504cbea 100644
--- a/xen/arch/riscv/include/asm/aia.h
+++ b/xen/arch/riscv/include/asm/aia.h
@@ -3,8 +3,26 @@
#ifndef ASM__RISCV__AIA_H
#define ASM__RISCV__AIA_H
+#include <xen/percpu.h>
+#include <xen/spinlock.h>
+
+struct vcpu;
+
+struct vgein_bmp {
+ unsigned long *bmp;
+ spinlock_t lock;
+ struct vcpu *owners[BITS_PER_LONG];
+ unsigned int geilen;
+};
+
+DECLARE_PER_CPU(struct vgein_bmp, vgein_bmp);
+
bool aia_available(void);
int aia_init(void);
+int vgein_init(unsigned int cpu);
+unsigned int vgein_assign(struct vcpu *v);
+void vgein_release(struct vcpu *v, unsigned int vgen_id);
+
#endif /* ASM__RISCV__ACPI_H */
--
2.53.0
* [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (12 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 11:31 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support Oleksii Kurochko
` (12 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Each vCPU interacting with the IMSIC requires state to track the
associated guest interrupt file and its backing context.
Introduce a per-vCPU structure to hold IMSIC-related state, including
the guest interrupt file identifier and the CPU providing the backing
VS-file. Access to the guest file identifier is protected by a lock.
Initialize this structure during vCPU setup and store it in arch_vcpu.
The initial state marks the VS-file as software-backed until it becomes
associated with a physical CPU.
Add helpers to retrieve and update the guest interrupt file identifier.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
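The "software-backed until bound" convention can be illustrated with a small standalone model. This is a sketch only: locking is deliberately left out, the NR_CPUS value is an assumed placeholder, and imsic_state_alloc(), imsic_is_hw_backed() and imsic_bind() are hypothetical helper names that do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 64u /* assumed placeholder; doubles as the sentinel */

struct imsic_state {
    unsigned int guest_file_id;
    /* < NR_CPUS: hardware VS-file on that pCPU; == NR_CPUS: software file */
    unsigned int vsfile_pcpu;
};

/* Allocate the state in its initial, software-backed configuration. */
struct imsic_state *imsic_state_alloc(void)
{
    struct imsic_state *s = calloc(1, sizeof(*s));

    if ( s )
    {
        s->guest_file_id = NR_CPUS;
        s->vsfile_pcpu = NR_CPUS;
    }

    return s;
}

bool imsic_is_hw_backed(const struct imsic_state *s)
{
    return s->vsfile_pcpu < NR_CPUS;
}

/* Associate the vCPU with a guest interrupt file on a physical CPU. */
void imsic_bind(struct imsic_state *s, unsigned int pcpu,
                unsigned int guest_file_id)
{
    assert(pcpu < NR_CPUS);

    s->vsfile_pcpu = pcpu;
    s->guest_file_id = guest_file_id;
}
```

Using an out-of-range CPU number as the sentinel keeps the check a single comparison against NR_CPUS.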
xen/arch/riscv/imsic.c | 42 +++++++++++++++++++++++++++++
xen/arch/riscv/include/asm/domain.h | 2 ++
xen/arch/riscv/include/asm/imsic.h | 17 ++++++++++++
3 files changed, 61 insertions(+)
diff --git a/xen/arch/riscv/imsic.c b/xen/arch/riscv/imsic.c
index 0956b187705f..bbadbdf352a1 100644
--- a/xen/arch/riscv/imsic.c
+++ b/xen/arch/riscv/imsic.c
@@ -59,6 +59,29 @@ do { \
csr_clear(CSR_SIREG, v); \
} while (0)
+unsigned int vcpu_guest_file_id(const struct vcpu *v)
+{
+ struct imsic_state *imsic_state = v->arch.imsic_state;
+ unsigned long flags;
+ unsigned int vsfile_id;
+
+ read_lock_irqsave(&imsic_state->vsfile_lock, flags);
+ vsfile_id = imsic_state->guest_file_id;
+ read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
+
+ return vsfile_id;
+}
+
+void imsic_set_guest_file_id(const struct vcpu *v, unsigned int guest_file_id)
+{
+ struct imsic_state *imsic_state = v->arch.imsic_state;
+ unsigned long flags;
+
+ write_lock_irqsave(&imsic_state->vsfile_lock, flags);
+ imsic_state->guest_file_id = guest_file_id;
+ write_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
+}
+
void __init imsic_ids_local_delivery(bool enable)
{
if ( enable )
@@ -315,6 +338,25 @@ static int imsic_parse_node(const struct dt_device_node *node,
return 0;
}
+int __init vcpu_imsic_init(struct vcpu *v)
+{
+ struct imsic_state *imsic_state;
+
+ /* Allocate IMSIC context */
+ imsic_state = xvzalloc(struct imsic_state);
+ if ( !imsic_state )
+ return -ENOMEM;
+
+ v->arch.imsic_state = imsic_state;
+
+ /* Setup IMSIC context */
+ rwlock_init(&imsic_state->vsfile_lock);
+
+ imsic_state->guest_file_id = imsic_state->vsfile_pcpu = NR_CPUS;
+
+ return 0;
+}
+
/*
* Initialize the imsic_cfg structure based on the IMSIC DT node.
*
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 506365f199c7..bdb1ffd748c9 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -52,6 +52,8 @@ struct arch_vcpu {
struct vtimer vtimer;
+ struct imsic_state *imsic_state;
+
register_t hcounteren;
register_t hedeleg;
register_t hideleg;
diff --git a/xen/arch/riscv/include/asm/imsic.h b/xen/arch/riscv/include/asm/imsic.h
index a63d56fbd5d9..13a563dce066 100644
--- a/xen/arch/riscv/include/asm/imsic.h
+++ b/xen/arch/riscv/include/asm/imsic.h
@@ -11,6 +11,7 @@
#ifndef ASM_RISCV_IMSIC_H
#define ASM_RISCV_IMSIC_H
+#include <xen/rwlock.h>
#include <xen/spinlock.h>
#include <xen/stdbool.h>
#include <xen/types.h>
@@ -64,8 +65,20 @@ struct imsic_config {
spinlock_t lock;
};
+struct imsic_state {
+ /* IMSIC VS-file */
+ rwlock_t vsfile_lock;
+ unsigned int guest_file_id;
+ /*
+ * (vsfile_pcpu < NR_CPUS) => h/w IMSIC VS-file
+ * (vsfile_pcpu == NR_CPUS) => s/w IMSIC SW-file
+ */
+ unsigned long vsfile_pcpu;
+};
+
struct dt_device_node;
struct kernel_info;
+struct vcpu;
int imsic_init(const struct dt_device_node *node);
@@ -78,4 +91,8 @@ void imsic_ids_local_delivery(bool enable);
int imsic_make_dt_node(const struct kernel_info *kinfo);
+int vcpu_imsic_init(struct vcpu *v);
+unsigned int vcpu_guest_file_id(const struct vcpu *v);
+void imsic_set_guest_file_id(const struct vcpu *v, unsigned int guest_file_id);
+
#endif /* ASM_RISCV_IMSIC_H */
--
2.53.0
* [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (13 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 11:58 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough Oleksii Kurochko
` (11 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
At the current development stage, only domain vINTC init and deinit
operations are required, so implement those first.
Initialize vAPLIC's domaincfg with the interrupt-enable bit set and
MSI delivery mode selected, as the current solution is expected to
always have an IMSIC, and initialize vintc->ops.
Other operations such as emulate_load(), emulate_store(), and is_access()
will be needed once guests are running and MMIO accesses to APLIC MMIO
range must be handled. These will be introduced separately later.
Introduce a structure to describe a virtual interrupt controller (vINTC)
and a vintc_ops structure, which provides operations to emulate load and
store accesses to interrupt controller MMIOs and to check whether a given
address falls within the MMIO range of a specific virtual interrupt
controller.
The vAPLIC implementation of these operations will be provided later
once guests can be run and these operations are actually needed.
Introduce these structures here as they are required for the implementation
of domain_vaplic_init() and domain_vaplic_alloc(). Also, introduce
vcpu_vaplic_init() and init vintc_ops->vcpu_init() with it.
Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
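To show how an ops table of the kind described above might eventually be consumed, here is a standalone sketch. It is not the layout used by the patch: the field names, signatures, vaplic_init() and vintc_mmio_read() are assumptions made for illustration, and only a read of domaincfg at offset 0 is modelled. The bit positions of IE (bit 8) and DM (bit 2) follow the AIA specification.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define APLIC_DOMAINCFG_IE (UINT32_C(1) << 8) /* interrupt enable */
#define APLIC_DOMAINCFG_DM (UINT32_C(1) << 2) /* delivery mode: MSI */

struct vintc;

/* Assumed shape of the operations table. */
struct vintc_ops {
    bool (*is_access)(const struct vintc *v, uint64_t addr);
    int (*emulate_load)(struct vintc *v, uint64_t addr, uint32_t *out);
};

struct vintc {
    uint64_t base, size;  /* guest-physical MMIO window */
    uint32_t domaincfg;
    const struct vintc_ops *ops;
};

static bool vaplic_is_access(const struct vintc *v, uint64_t addr)
{
    return addr >= v->base && addr - v->base < v->size;
}

static int vaplic_emulate_load(struct vintc *v, uint64_t addr, uint32_t *out)
{
    if ( addr - v->base == 0 ) /* domaincfg lives at offset 0 */
    {
        *out = v->domaincfg;
        return 0;
    }

    return -EOPNOTSUPP; /* other registers are not modelled here */
}

static const struct vintc_ops vaplic_ops = {
    .is_access = vaplic_is_access,
    .emulate_load = vaplic_emulate_load,
};

void vaplic_init(struct vintc *v, uint64_t base, uint64_t size)
{
    v->base = base;
    v->size = size;
    v->domaincfg = APLIC_DOMAINCFG_IE | APLIC_DOMAINCFG_DM;
    v->ops = &vaplic_ops;
}

/* Fault-path dispatch: check the range first, then emulate the load. */
int vintc_mmio_read(struct vintc *v, uint64_t addr, uint32_t *out)
{
    assert(v->ops != NULL);

    if ( !v->ops->is_access(v, addr) )
        return -EFAULT;

    return v->ops->emulate_load(v, addr, out);
}
```

Keeping the range check separate from the emulation callbacks lets a trap handler decide cheaply whether a faulting address belongs to the virtual controller at all.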
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/domain.c | 11 ++---
xen/arch/riscv/include/asm/domain.h | 2 +
xen/arch/riscv/include/asm/intc.h | 14 ++++++
xen/arch/riscv/include/asm/vaplic.h | 36 ++++++++++++++
xen/arch/riscv/intc.c | 1 +
xen/arch/riscv/vaplic.c | 74 +++++++++++++++++++++++++++++
7 files changed, 131 insertions(+), 8 deletions(-)
create mode 100644 xen/arch/riscv/include/asm/vaplic.h
create mode 100644 xen/arch/riscv/vaplic.c
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index d772b42386c0..b9941a230e03 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -24,6 +24,7 @@ obj-y += smpboot.o
obj-y += stubs.o
obj-y += time.o
obj-y += traps.o
+obj-y += vaplic.o
obj-y += vmid.o
obj-y += vm_event.o
obj-y += vsbi/
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index 515735b32e30..560b21b16ffb 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -11,6 +11,7 @@
#include <asm/bitops.h>
#include <asm/cpufeature.h>
#include <asm/csr.h>
+#include <asm/intc.h>
#include <asm/riscv_encoding.h>
#include <asm/vtimer.h>
@@ -154,14 +155,8 @@ int arch_vcpu_create(struct vcpu *v)
if ( (rc = vcpu_vtimer_init(v)) )
goto fail;
- /*
- * As interrupt controller (IC) is not yet implemented,
- * return an error.
- *
- * TODO: Drop this once IC is implemented.
- */
- rc = -EOPNOTSUPP;
- goto fail;
+ if ( (rc = v->domain->arch.vintc->ops->vcpu_init(v)) )
+ goto fail;
return rc;
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index bdb1ffd748c9..21a3e6876f36 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -98,6 +98,8 @@ struct arch_domain {
struct paging_domain paging;
#endif
+ struct vintc *vintc;
+
/* Next unused device tree phandle number */
uint32_t next_phandle;
};
diff --git a/xen/arch/riscv/include/asm/intc.h b/xen/arch/riscv/include/asm/intc.h
index 8300d71d472f..c5a869db8bc5 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -16,6 +16,7 @@ struct cpu_user_regs;
struct dt_device_node;
struct irq_desc;
struct kernel_info;
+struct vcpu;
struct intc_info {
enum intc_version hw_version;
@@ -47,6 +48,19 @@ struct intc_hw_operations {
const struct dt_device_node *intc);
};
+struct vintc_ops {
+ /* Initialize some vINTC-related stuff for a vCPU */
+ int (*vcpu_init)(struct vcpu *vcpu);
+
+ /* Check if a register is virtual interrupt controller MMIO */
+ int (*is_access)(const struct vcpu *vcpu, const unsigned long addr);
+};
+
+struct vintc {
+ const struct intc_info *info;
+ const struct vintc_ops *ops;
+};
+
void intc_preinit(void);
void register_intc_ops(const struct intc_hw_operations *ops);
diff --git a/xen/arch/riscv/include/asm/vaplic.h b/xen/arch/riscv/include/asm/vaplic.h
new file mode 100644
index 000000000000..7684f3490829
--- /dev/null
+++ b/xen/arch/riscv/include/asm/vaplic.h
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * xen/arch/riscv/include/asm/vaplic.h
+ *
+ * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
+ *
+ * Copyright (c) Microchip.
+ */
+
+#ifndef ASM__RISCV__VAPLIC_H
+#define ASM__RISCV__VAPLIC_H
+
+#include <xen/kernel.h>
+#include <xen/types.h>
+
+#include <asm/intc.h>
+
+struct domain;
+
+#define to_vaplic(v) container_of(v, struct vaplic, base)
+
+struct vaplic_regs {
+ uint32_t domaincfg;
+ uint32_t smsiaddrcfg;
+ uint32_t smsiaddrcfgh;
+};
+
+struct vaplic {
+ struct vintc base;
+ struct vaplic_regs regs;
+};
+
+int domain_vaplic_init(struct domain *d);
+void domain_vaplic_deinit(struct domain *d);
+
+#endif /* ASM__RISCV__VAPLIC_H */
diff --git a/xen/arch/riscv/intc.c b/xen/arch/riscv/intc.c
index c9f12651fda1..ff7a76accaca 100644
--- a/xen/arch/riscv/intc.c
+++ b/xen/arch/riscv/intc.c
@@ -6,6 +6,7 @@
#include <xen/init.h>
#include <xen/irq.h>
#include <xen/lib.h>
+#include <xen/sched.h>
#include <xen/spinlock.h>
#include <asm/aia.h>
diff --git a/xen/arch/riscv/vaplic.c b/xen/arch/riscv/vaplic.c
new file mode 100644
index 000000000000..9b105de7ed7d
--- /dev/null
+++ b/xen/arch/riscv/vaplic.c
@@ -0,0 +1,74 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * xen/arch/riscv/vaplic.c
+ *
+ * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
+ *
+ * Copyright (c) Microchip.
+ * Copyright (c) Vates
+ */
+
+#include <xen/errno.h>
+#include <xen/sched.h>
+#include <xen/xvmalloc.h>
+
+#include <asm/aia.h>
+#include <asm/imsic.h>
+#include <asm/intc.h>
+#include <asm/vaplic.h>
+
+#include "aplic-priv.h"
+
+static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
+{
+ int rc = 0;
+
+ rc = vcpu_imsic_init(v);
+ if ( rc )
+ return rc;
+
+ imsic_set_guest_file_id(v, vgein_assign(v));
+
+ return rc;
+}
+
+static const struct vintc_ops vaplic_ops = {
+ .vcpu_init = vcpu_vaplic_init,
+};
+
+static struct vintc * __init vaplic_alloc(void)
+{
+ struct vaplic *v = NULL;
+
+ v = xvzalloc(struct vaplic);
+ if ( !v )
+ return NULL;
+
+ return &v->base;
+}
+
+int __init domain_vaplic_init(struct domain *d)
+{
+ int ret = 0;
+
+ d->arch.vintc = vaplic_alloc();
+ if ( !d->arch.vintc )
+ {
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ d->arch.vintc->ops = &vaplic_ops;
+ to_vaplic(d->arch.vintc)->regs.domaincfg =
+ APLIC_DOMAINCFG_IE | APLIC_DOMAINCFG_DM;
+
+ fail:
+ return ret;
+}
+
+void __init domain_vaplic_deinit(struct domain *d)
+{
+ struct vaplic *vaplic = to_vaplic(d->arch.vintc);
+
+ XVFREE(vaplic);
+}
--
2.53.0
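
The layering described in this patch (a generic vintc embedded in a
controller-specific structure, reached through an ops table) can be
sketched outside of Xen as follows. This is only an illustration under
stated assumptions: every name ending in _sketch, the local container_of()
stand-in and the use of calloc()/free() in place of xvzalloc()/XVFREE()
are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local stand-in for Xen's container_of(); illustrative only. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Same bit positions as APLIC_DOMAINCFG_IE / APLIC_DOMAINCFG_DM. */
#define DOMAINCFG_IE_SKETCH (UINT32_C(1) << 8)
#define DOMAINCFG_DM_SKETCH (UINT32_C(1) << 2)

/* Hypothetical, reduced vCPU: just enough to observe the callback. */
struct vcpu_sketch {
    int id;
    int inited;
};

struct vintc_ops_sketch {
    int (*vcpu_init)(struct vcpu_sketch *v);
};

/* Generic part: the only thing common code gets to see. */
struct vintc_sketch {
    const struct vintc_ops_sketch *ops;
};

/* Controller-specific part wraps the generic one. */
struct vaplic_sketch {
    struct vintc_sketch base;
    uint32_t domaincfg;
};

static int vaplic_vcpu_init_sketch(struct vcpu_sketch *v)
{
    v->inited = 1;
    return 0;
}

static const struct vintc_ops_sketch vaplic_ops_sketch = {
    .vcpu_init = vaplic_vcpu_init_sketch,
};

/* Allocate the specific structure, hand out a pointer to the generic one. */
static struct vintc_sketch *vintc_create_sketch(void)
{
    struct vaplic_sketch *va = calloc(1, sizeof(*va));

    if ( !va )
        return NULL;

    va->base.ops = &vaplic_ops_sketch;
    va->domaincfg = DOMAINCFG_IE_SKETCH | DOMAINCFG_DM_SKETCH;

    return &va->base;
}

/* Recover the enclosing structure before freeing it; NULL is tolerated. */
static void vintc_destroy_sketch(struct vintc_sketch *vi)
{
    if ( vi )
        free(container_of(vi, struct vaplic_sketch, base));
}
```

Common code would only ever call vi->ops->vcpu_init(), which is what keeps
arch_vcpu_create() independent of the concrete controller type.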
* [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (14 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 12:22 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 17/27] xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h Oleksii Kurochko
` (10 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Add initial support for assigning device interrupts to domains in
dom0less configurations.
IRQs described in the device tree are retrieved and authorized for
the target domain. Interrupts that are not connected to the primary
interrupt controller are ignored (so far I haven't seen a platform
with a secondary interrupt controller), and IRQs already assigned
to another domain are rejected to prevent sharing.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
For a better understanding of how auth_irq_bmp[] is going to be used,
two extra patches follow this one. They aren't strictly needed now,
as this code will only be used once guests are run, but they improve
understanding and can in general go in with this patch series.
---
---
xen/arch/riscv/include/asm/aplic.h | 2 +
xen/arch/riscv/include/asm/intc.h | 12 +++
xen/arch/riscv/include/asm/setup.h | 17 ++++
xen/arch/riscv/intc.c | 8 ++
xen/arch/riscv/vaplic.c | 124 ++++++++++++++++++++++++++++-
5 files changed, 162 insertions(+), 1 deletion(-)
diff --git a/xen/arch/riscv/include/asm/aplic.h b/xen/arch/riscv/include/asm/aplic.h
index b0724fe6f360..55cd4a128de7 100644
--- a/xen/arch/riscv/include/asm/aplic.h
+++ b/xen/arch/riscv/include/asm/aplic.h
@@ -15,6 +15,8 @@
#include <asm/imsic.h>
+#define APLIC_NUM_REGS 32
+
#define APLIC_DOMAINCFG_IE BIT(8, U)
#define APLIC_DOMAINCFG_DM BIT(2, U)
diff --git a/xen/arch/riscv/include/asm/intc.h b/xen/arch/riscv/include/asm/intc.h
index c5a869db8bc5..76d2fd09cb8b 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -16,6 +16,7 @@ struct cpu_user_regs;
struct dt_device_node;
struct irq_desc;
struct kernel_info;
+struct rangeset;
struct vcpu;
struct intc_info {
@@ -54,11 +55,22 @@ struct vintc_ops {
/* Check if a register is virtual interrupt controller MMIO */
int (*is_access)(const struct vcpu *vcpu, const unsigned long addr);
+
+ /*
+ * Retrieves the interrupts configuration from a device tree node and maps
+ * those interrupts to the target domain.
+ */
+ int (*map_device_irqs_to_domain)(struct domain *d,
+ struct dt_device_node *dev,
+ bool need_mapping,
+ struct rangeset *irq_ranges);
};
struct vintc {
const struct intc_info *info;
const struct vintc_ops *ops;
+
+ void *private;
};
void intc_preinit(void);
diff --git a/xen/arch/riscv/include/asm/setup.h b/xen/arch/riscv/include/asm/setup.h
index 2215894cfbb1..1c23043f409f 100644
--- a/xen/arch/riscv/include/asm/setup.h
+++ b/xen/arch/riscv/include/asm/setup.h
@@ -5,6 +5,10 @@
#include <xen/types.h>
+struct domain;
+struct dt_device_node;
+struct rangeset;
+
#define max_init_domid (0)
void setup_mm(void);
@@ -13,6 +17,19 @@ void copy_from_paddr(void *dst, paddr_t paddr, unsigned long len);
void init_csr_masks(void);
+/* TODO: move somewhere to common header? */
+/*
+ * Retrieves the interrupts configuration from a device tree node and maps
+ * those interrupts to the target domain.
+ *
+ * Returns:
+ * < 0 error
+ * 0 success
+ */
+int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
+ bool need_mapping,
+ struct rangeset *irq_ranges);
+
#endif /* ASM__RISCV__SETUP_H */
/*
diff --git a/xen/arch/riscv/intc.c b/xen/arch/riscv/intc.c
index ff7a76accaca..a91dbc5e997c 100644
--- a/xen/arch/riscv/intc.c
+++ b/xen/arch/riscv/intc.c
@@ -79,3 +79,11 @@ int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
return -ENOSYS;
}
+
+int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
+ bool need_mapping,
+ struct rangeset *irq_ranges)
+{
+ return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
+ irq_ranges);
+}
diff --git a/xen/arch/riscv/vaplic.c b/xen/arch/riscv/vaplic.c
index 9b105de7ed7d..0c69f087cf4d 100644
--- a/xen/arch/riscv/vaplic.c
+++ b/xen/arch/riscv/vaplic.c
@@ -9,6 +9,7 @@
*/
#include <xen/errno.h>
+#include <xen/iocap.h>
#include <xen/sched.h>
#include <xen/xvmalloc.h>
@@ -19,6 +20,113 @@
#include "aplic-priv.h"
+struct vaplic_priv {
+ /* Bitmap of the interrupts a domain is authorized to use */
+ uint32_t auth_irq_bmp[APLIC_NUM_REGS];
+};
+
+static bool is_irq_shared_among_domains(const struct domain *d,
+ const unsigned int irq_num)
+{
+ struct domain *tmp;
+ unsigned int reg_num = irq_num / APLIC_NUM_REGS;
+ unsigned int bit_pos = irq_num % APLIC_NUM_REGS;
+
+ for_each_domain ( tmp )
+ {
+ uint32_t *auth_irq_bmp;
+
+ if ( tmp == d )
+ continue;
+
+ auth_irq_bmp = tmp->arch.vintc->private;
+
+ if ( auth_irq_bmp[reg_num] & BIT(bit_pos, U) )
+ {
+ printk("%s: irq%d is shared between %pd and %pd\n", __func__,
+ irq_num, tmp, d);
+
+ return true;
+ }
+ }
+
+ return false;
+}
+
+int vaplic_map_device_irqs_to_domain(struct domain *d,
+ struct dt_device_node *dev,
+ bool need_mapping,
+ struct rangeset *irq_ranges)
+{
+ unsigned int i, nirq;
+ int res, irq;
+ struct dt_raw_irq rirq;
+ uint32_t *auth_irq_bmp = d->arch.vintc->private;
+ unsigned int reg_num;
+
+ nirq = dt_number_of_irq(dev);
+
+ /* Give permission and map IRQs */
+ for ( i = 0; i < nirq; i++ )
+ {
+ res = dt_device_get_raw_irq(dev, i, &rirq);
+ if ( res )
+ {
+ printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
+ i, dt_node_full_name(dev));
+ return res;
+ }
+
+ /*
+ * Don't map IRQs that have no physical meaning,
+ * i.e. IRQs whose controller is not the APLIC/IMSIC/PLIC.
+ */
+ if ( rirq.controller != dt_interrupt_controller )
+ {
+ dt_dprintk("irq %u not connected to primary controller. "
+ "Connected to %s\n", i,
+ dt_node_full_name(rirq.controller));
+ continue;
+ }
+
+ irq = platform_get_irq(dev, i);
+ if ( irq < 0 )
+ {
+ printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
+ return irq;
+ }
+
+ res = irq_permit_access(d, irq);
+ if ( res )
+ {
+ printk(XENLOG_ERR "Unable to permit %pd access to IRQ %u\n", d,
+ irq);
+ return res;
+ }
+
+ reg_num = irq / APLIC_NUM_REGS;
+
+ if ( is_irq_shared_among_domains(d, irq) )
+ {
+ printk("%s: Shared IRQ isn't supported\n", __func__);
+ return -EINVAL;
+ }
+
+ auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
+
+ dt_dprintk(" - IRQ: %u\n", irq);
+
+ if ( irq_ranges )
+ {
+ res = rangeset_add_singleton(irq_ranges, irq);
+ if ( res )
+ return res;
+ }
+ }
+
+ return 0;
+}
+
static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
{
int rc = 0;
@@ -34,6 +142,7 @@ static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
static const struct vintc_ops vaplic_ops = {
.vcpu_init = vcpu_vaplic_init,
+ .map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
};
static struct vintc * __init vaplic_alloc(void)
@@ -62,13 +171,26 @@ int __init domain_vaplic_init(struct domain *d)
to_vaplic(d->arch.vintc)->regs.domaincfg =
APLIC_DOMAINCFG_IE | APLIC_DOMAINCFG_DM;
+ d->arch.vintc->private = xvzalloc(struct vaplic_priv);
+ if ( !d->arch.vintc->private )
+ {
+ ret = -ENOMEM;
+ goto fail;
+ }
+
+ return ret;
+
fail:
+ domain_vaplic_deinit(d);
+
return ret;
}
void __init domain_vaplic_deinit(struct domain *d)
{
- struct vaplic *vaplic = to_vaplic(d->arch.vintc);
+ struct vintc *vintc = d->arch.vintc;
+ struct vaplic *vaplic = to_vaplic(vintc);
+ XVFREE(vintc->private);
XVFREE(vaplic);
}
--
2.53.0
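
The bookkeeping behind auth_irq_bmp[] can be sketched in isolation as
below. This is a minimal illustration, not the patch's code: struct
dom_sketch, the flat array of domains (standing in for for_each_domain)
and the -1 return value are assumptions made for the example. The
word/bit split mirrors the irq / 32 and irq % 32 arithmetic above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD_SKETCH 32u
#define NUM_WORDS_SKETCH     32u   /* 32 words x 32 bits = 1024 sources */
#define MAX_IRQS_SKETCH      (BITS_PER_WORD_SKETCH * NUM_WORDS_SKETCH)

/* Hypothetical per-domain state: one authorization bit per interrupt. */
struct dom_sketch {
    uint32_t auth_irq_bmp[NUM_WORDS_SKETCH];
};

static bool irq_is_authorized(const struct dom_sketch *d, unsigned int irq)
{
    if ( irq >= MAX_IRQS_SKETCH )
        return false;

    return d->auth_irq_bmp[irq / BITS_PER_WORD_SKETCH] &
           (UINT32_C(1) << (irq % BITS_PER_WORD_SKETCH));
}

/* True if any domain other than doms[self] already owns the interrupt. */
static bool irq_owned_by_other(const struct dom_sketch *doms, size_t ndoms,
                               size_t self, unsigned int irq)
{
    for ( size_t i = 0; i < ndoms; i++ )
        if ( i != self && irq_is_authorized(&doms[i], irq) )
            return true;

    return false;
}

/* Returns 0 on success, -1 if the IRQ is out of range or already taken. */
static int irq_authorize(struct dom_sketch *doms, size_t ndoms,
                         size_t self, unsigned int irq)
{
    if ( irq >= MAX_IRQS_SKETCH || irq_owned_by_other(doms, ndoms, self, irq) )
        return -1;

    doms[self].auth_irq_bmp[irq / BITS_PER_WORD_SKETCH] |=
        UINT32_C(1) << (irq % BITS_PER_WORD_SKETCH);

    return 0;
}
```

Note that the sketch checks ownership before touching any state, so a
rejected request leaves the requesting domain's bitmap unchanged.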
* [PATCH v1 17/27] xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h.
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (15 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 12:51 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 18/27] xen/riscv: add vaplic access check Oleksii Kurochko
` (9 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
These definitions are required for correct decoding of APLIC MMIO
accesses and target configuration, and will be used by both the
physical and virtual APLIC implementations.
No functional change is intended by this patch; it only centralises
hardware definitions that were previously missing.
Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/aplic.h | 35 ++++++++++++++++++++++++++++++
1 file changed, 35 insertions(+)
diff --git a/xen/arch/riscv/include/asm/aplic.h b/xen/arch/riscv/include/asm/aplic.h
index 55cd4a128de7..7e9161d98ceb 100644
--- a/xen/arch/riscv/include/asm/aplic.h
+++ b/xen/arch/riscv/include/asm/aplic.h
@@ -17,6 +17,11 @@
#define APLIC_NUM_REGS 32
+#define APLIC_REG_OFFSET_MASK 0x3FFF
+#define APLIC_TARGET_IPRIO_MASK 0xFF
+#define APLIC_TARGET_GUEST_IDX_SHIFT 12
+#define APLIC_TARGET_EIID_MASK 0x7FF
+
#define APLIC_DOMAINCFG_IE BIT(8, U)
#define APLIC_DOMAINCFG_DM BIT(2, U)
@@ -27,6 +32,36 @@
#define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
#define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
+#define APLIC_DOMAINCFG 0x0000
+#define APLIC_SOURCECFG_BASE 0x0004
+#define APLIC_SOURCECFG_LAST 0x0FFC
+
+#define APLIC_SMSICFGADDR 0x1BC8
+#define APLIC_SMSICFGADDRH 0x1BCC
+
+#define APLIC_SETIP_BASE 0x1C00
+#define APLIC_SETIP_LAST 0x1C7C
+#define APLIC_SETIPNUM 0x1CDC
+
+#define APLIC_CLRIP_BASE 0x1D00
+#define APLIC_CLRIP_LAST 0x1D7C
+#define APLIC_CLRIPNUM 0x1DDC
+
+#define APLIC_SETIE_BASE 0x1E00
+#define APLIC_SETIE_LAST 0x1E7C
+#define APLIC_SETIENUM 0x1EDC
+
+#define APLIC_CLRIE_BASE 0x1F00
+#define APLIC_CLRIE_LAST 0x1F7C
+#define APLIC_CLRIENUM 0x1FDC
+
+#define APLIC_SETIPNUM_LE 0x2000
+
+#define APLIC_GENMSI 0x3000
+
+#define APLIC_TARGET_BASE 0x3004
+#define APLIC_TARGET_LAST 0x3FFC
+
#define APLIC_TARGET_HART_IDX_SHIFT 18
struct aplic_regs {
--
2.53.0
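
As an illustration of how these offsets are meant to be consumed, the
sketch below masks an address down to its register offset and sorts it
into a register group. The numeric values are copied from the
definitions in this patch; the enum, the function name and the example
base address used in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register groups, for illustration only. */
enum aplic_reg_class_sketch {
    REG_DOMAINCFG,
    REG_SOURCECFG,
    REG_SETIP,
    REG_CLRIP,
    REG_SETIE,
    REG_CLRIE,
    REG_GENMSI,
    REG_TARGET,
    REG_OTHER,
};

static enum aplic_reg_class_sketch aplic_classify_sketch(uint64_t addr)
{
    /* Same value as APLIC_REG_OFFSET_MASK. */
    uint32_t off = addr & 0x3FFF;

    if ( off == 0x0000 )
        return REG_DOMAINCFG;
    if ( off >= 0x0004 && off <= 0x0FFC )
        return REG_SOURCECFG;
    if ( off >= 0x1C00 && off <= 0x1C7C )
        return REG_SETIP;
    if ( off >= 0x1D00 && off <= 0x1D7C )
        return REG_CLRIP;
    if ( off >= 0x1E00 && off <= 0x1E7C )
        return REG_SETIE;
    if ( off >= 0x1F00 && off <= 0x1F7C )
        return REG_CLRIE;
    if ( off == 0x3000 )
        return REG_GENMSI;
    if ( off >= 0x3004 && off <= 0x3FFC )
        return REG_TARGET;

    /* Single registers such as SETIPNUM or the MSI address config. */
    return REG_OTHER;
}
```

For example, with a made-up APLIC base of 0x0c000000, the address
0x0c001c00 would be classified as the first SETIP word.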
* [PATCH v1 18/27] xen/riscv: add vaplic access check
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (16 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 17/27] xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 13:10 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO Oleksii Kurochko
` (8 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Provide a mechanism for the virtual APLIC to determine whether a guest
access targets the physical APLIC MMIO region.
This is required to correctly identify and route guest APLIC accesses.
Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/aplic.c | 1 +
xen/arch/riscv/include/asm/intc.h | 3 +++
xen/arch/riscv/vaplic.c | 15 +++++++++++++++
3 files changed, 19 insertions(+)
diff --git a/xen/arch/riscv/aplic.c b/xen/arch/riscv/aplic.c
index e139946a05a0..754b444a2a13 100644
--- a/xen/arch/riscv/aplic.c
+++ b/xen/arch/riscv/aplic.c
@@ -38,6 +38,7 @@ static struct aplic_priv aplic = {
static struct intc_info __ro_after_init aplic_info = {
.hw_version = INTC_APLIC,
+ .private = &aplic,
};
static void __init aplic_init_hw_interrupts(void)
diff --git a/xen/arch/riscv/include/asm/intc.h b/xen/arch/riscv/include/asm/intc.h
index 76d2fd09cb8b..e6b6c179415a 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -25,6 +25,9 @@ struct intc_info {
/* number of irqs */
unsigned int num_irqs;
+
+ /* private data pointer of the interrupt controller */
+ void *private;
};
struct intc_hw_operations {
diff --git a/xen/arch/riscv/vaplic.c b/xen/arch/riscv/vaplic.c
index 0c69f087cf4d..82e74a609ee6 100644
--- a/xen/arch/riscv/vaplic.c
+++ b/xen/arch/riscv/vaplic.c
@@ -127,6 +127,20 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
return 0;
}
+static int cf_check vaplic_is_access(const struct vcpu *vcpu,
+ const unsigned long addr)
+{
+ const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
+ const struct aplic_priv *priv = vaplic->base.info->private;
+ const paddr_t paddr_end = priv->paddr_start + priv->size;
+
+ /* check if it is an APLIC access */
+ if ( priv->paddr_start <= addr && addr < paddr_end )
+ return 1;
+
+ return 0;
+}
+
static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
{
int rc = 0;
@@ -143,6 +157,7 @@ static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
static const struct vintc_ops vaplic_ops = {
.vcpu_init = vcpu_vaplic_init,
.map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
+ .is_access = vaplic_is_access,
};
static struct vintc * __init vaplic_alloc(void)
--
2.53.0
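
The access check boils down to a half-open range test. A standalone
sketch follows, assuming a hypothetical struct mmio_region_sketch in
place of aplic_priv's paddr_start/size fields. It expresses the test as
a subtraction so that start + size never has to be computed, which is a
variation on the comparison used in the patch rather than a copy of it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an MMIO window: [start, start + size). */
struct mmio_region_sketch {
    uint64_t start;
    uint64_t size;
};

static bool mmio_region_contains(const struct mmio_region_sketch *r,
                                 uint64_t addr)
{
    /* addr - start is only evaluated once addr >= start is known. */
    return addr >= r->start && addr - r->start < r->size;
}
```

The first address past the window (start + size) is deliberately
reported as outside.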
* [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (17 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 18/27] xen/riscv: add vaplic access check Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 14:18 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 20/27] xen/riscv: emulate guest reads from " Oleksii Kurochko
` (7 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
RISC-V guests program interrupt routing by writing to APLIC registers.
When APLIC is virtualised, Xen must intercept these MMIO writes and
translate them into updates of the virtual interrupt controller state.
Introduce vaplic_emulate_store() to validate guest APLIC accesses and
apply only those changes that are permitted by the domain’s authorised
interrupt bitmap. Writes targeting unauthorised interrupts are ignored,
ensuring that guests cannot affect interrupts they do not own.
APLIC TARGET writes are rewritten to reflect the effective interrupt
delivery configuration, depending on whether the domain operates in
direct mode (not supported by Xen) or MSI mode. This ensures that
guest-visible routing requests are correctly mapped onto the underlying
virtual interrupt topology.
Unexpected or unsupported register accesses are treated as fatal, as
they indicate an emulation bug.
Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/intc.h | 4 +
xen/arch/riscv/vaplic.c | 169 ++++++++++++++++++++++++++++++
2 files changed, 173 insertions(+)
diff --git a/xen/arch/riscv/include/asm/intc.h b/xen/arch/riscv/include/asm/intc.h
index e6b6c179415a..92a74eede4a0 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -56,6 +56,10 @@ struct vintc_ops {
/* Initialize some vINTC-related stuff for a vCPU */
int (*vcpu_init)(struct vcpu *vcpu);
+ /* Emulate store to virtual interrupt controller MMIOs */
+ int (*emulate_store)(const struct vcpu *vcpu, unsigned long addr,
+ uint32_t in);
+
/* Check if a register is virtual interrupt controller MMIO */
int (*is_access)(const struct vcpu *vcpu, const unsigned long addr);
diff --git a/xen/arch/riscv/vaplic.c b/xen/arch/riscv/vaplic.c
index 82e74a609ee6..5540b4884179 100644
--- a/xen/arch/riscv/vaplic.c
+++ b/xen/arch/riscv/vaplic.c
@@ -20,6 +20,16 @@
#include "aplic-priv.h"
+#define APLIC_REG_GET(addr, offset) \
+ readl((void *)((vaddr_t)(addr) + offset))
+#define APLIC_REG_SET(addr, offset, value) \
+ writel(value, (void *)((vaddr_t)(addr) + offset))
+
+#define AUTH_IRQ_BIT(irqnum) (auth_irq_bmp[(irqnum) / APLIC_NUM_REGS] & \
+ BIT((irqnum) % APLIC_NUM_REGS, U))
+
+#define regval_to_irqn(reg_val) ((reg_val) / sizeof(uint32_t))
+
struct vaplic_priv {
/* Contains a legal interrupts for a domain */
uint32_t auth_irq_bmp[APLIC_NUM_REGS];
@@ -127,6 +137,164 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
return 0;
}
+static void vaplic_dm_update_target(const unsigned long hart_id, uint32_t *iprio)
+{
+ *iprio &= APLIC_TARGET_IPRIO_MASK;
+ *iprio |= (hart_id << APLIC_TARGET_HART_IDX_SHIFT);
+}
+
+static void vaplic_update_target(const struct imsic_config *imsic,
+ const int guest_id,
+ const unsigned long hart_id, uint32_t *value)
+{
+ unsigned long group_index;
+ unsigned int hhxw = imsic->group_index_bits;
+ unsigned int lhxw = imsic->hart_index_bits;
+ unsigned int hhxs = imsic->group_index_shift - IMSIC_MMIO_PAGE_SHIFT * 2;
+ unsigned long base_ppn = imsic->msi[hart_id].base_addr >> IMSIC_MMIO_PAGE_SHIFT;
+
+ group_index = (base_ppn >> (hhxs + 12)) & (BIT(hhxw, UL) - 1);
+
+ *value &= APLIC_TARGET_EIID_MASK;
+ *value |= guest_id << APLIC_TARGET_GUEST_IDX_SHIFT;
+ *value |= hart_id << APLIC_TARGET_HART_IDX_SHIFT;
+ *value |= group_index << (lhxw + APLIC_TARGET_HART_IDX_SHIFT);
+}
+
+#define CALC_REG_VALUE(base) \
+{ \
+ uint32_t index; \
+ uint32_t tmp_val; \
+ index = regval_to_irqn(offset - base); \
+ tmp_val = APLIC_REG_GET(priv->regs, aplic_addr) & ~auth_irq_bmp[index]; \
+ value &= auth_irq_bmp[index]; \
+ value |= tmp_val; \
+}
+
+static int cf_check vaplic_emulate_store(const struct vcpu *vcpu,
+ unsigned long addr, uint32_t value)
+{
+ struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
+ struct aplic_priv *priv = vaplic->base.info->private;
+ uint32_t offset = addr & APLIC_REG_OFFSET_MASK;
+ unsigned long aplic_addr = addr - priv->paddr_start;
+ const uint32_t *auth_irq_bmp = vcpu->domain->arch.vintc->private;
+
+ switch ( offset )
+ {
+ case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
+ CALC_REG_VALUE(APLIC_SETIP_BASE);
+ break;
+
+ case APLIC_CLRIP_BASE ... APLIC_CLRIP_LAST:
+ CALC_REG_VALUE(APLIC_CLRIP_BASE);
+ break;
+
+ case APLIC_SETIE_BASE ... APLIC_SETIE_LAST:
+ CALC_REG_VALUE(APLIC_SETIE_BASE);
+ break;
+
+ case APLIC_CLRIE_BASE ... APLIC_CLRIE_LAST:
+ CALC_REG_VALUE(APLIC_CLRIE_BASE);
+ break;
+
+ case APLIC_SOURCECFG_BASE ... APLIC_SOURCECFG_LAST:
+ /* We don't support delegation, so bit 10 of sourcecfg should be 0 */
+ ASSERT(!(value & BIT(10, U)));
+
+ /*
+ * As sourcecfg register starts from 1:
+ * 0x0000 domaincfg
+ * 0x0004 sourcecfg[1]
+ * 0x0008 sourcecfg[2]
+ * ...
+ * 0x0FFC sourcecfg[1023]
+ * It is necessary to calculate an interrupt number by subtracting
+ * APLIC_DOMAINCFG instead of APLIC_SOURCECFG_BASE.
+ */
+ if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_DOMAINCFG)) )
+ /* interrupt not enabled, ignore it */
+ return 0;
+
+ break;
+
+ case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
+ struct vcpu *target_vcpu = NULL;
+
+ /*
+ * See vaplic_emulate_load() for an explanation of why
+ * APLIC_GENMSI is subtracted.
+ */
+ if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_GENMSI)) )
+ /* interrupt not enabled, ignore it */
+ return 0;
+
+ for ( int i = 0; i < vcpu->domain->max_vcpus; i++ )
+ {
+ struct vcpu *v = vcpu->domain->vcpu[i];
+
+ if ( v->vcpu_id == (value >> APLIC_TARGET_HART_IDX_SHIFT) )
+ {
+ target_vcpu = v;
+ break;
+ }
+ }
+
+ ASSERT(target_vcpu);
+
+ if ( !(vaplic->regs.domaincfg & APLIC_DOMAINCFG_DM) )
+ {
+ vaplic_dm_update_target(cpuid_to_hartid(target_vcpu->processor),
+ &value);
+ }
+ else
+ vaplic_update_target(priv->imsic_cfg,
+ vcpu_guest_file_id(target_vcpu),
+ cpuid_to_hartid(target_vcpu->processor),
+ &value);
+
+ break;
+
+ case APLIC_SETIPNUM:
+ case APLIC_SETIPNUM_LE:
+ case APLIC_CLRIPNUM:
+ case APLIC_SETIENUM:
+ case APLIC_CLRIENUM:
+ if ( AUTH_IRQ_BIT(value) )
+ break;
+
+ return 0;
+
+ case APLIC_DOMAINCFG:
+ /*
+ * TODO:
+ * The domaincfg register has this format:
+ * bits 31:24 read-only 0x80
+ * bit 8 IE
+ * bit 7 read-only 0
+ * bit 2 DM (WARL)
+ * bit 0 BE (WARL)
+ *
+ * The most interesting bit for us is the IE (Interrupt Enable) bit.
+ * At the moment, Linux at least doesn't use the domaincfg.IE bit to
+ * disable interrupts globally, but if someone starts using it one
+ * day, extra handling will be needed here.
+ */
+
+ printk_once("%s: Nothing to do, domaincfg is set by aplic during "
+ "initialization in Xen\n", __func__);
+
+ return 0;
+
+ default:
+ panic("%s: unsupported register offset: %#x\n", __func__, offset);
+ }
+
+ APLIC_REG_SET(priv->regs, aplic_addr, value);
+
+ return 0;
+}
+
static int cf_check vaplic_is_access(const struct vcpu *vcpu,
const unsigned long addr)
{
@@ -158,6 +326,7 @@ static const struct vintc_ops vaplic_ops = {
.vcpu_init = vcpu_vaplic_init,
.map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
.is_access = vaplic_is_access,
+ .emulate_store = vaplic_emulate_store,
};
static struct vintc * __init vaplic_alloc(void)
--
2.53.0
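
Two pieces of the store path can be shown on their own: merging a guest
write with the current register contents under the authorization mask
(what CALC_REG_VALUE does), and composing an MSI-mode TARGET value. The
following is a simplified sketch under stated assumptions: the function
names are hypothetical, the field positions are taken from the
APLIC_TARGET_* definitions in this series, and the group index
contribution is left out (treated as zero).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bits the domain owns come from the guest's value; all other bits keep
 * what the register currently holds.
 */
static uint32_t merge_authorized_sketch(uint32_t hw, uint32_t guest,
                                        uint32_t auth)
{
    return (guest & auth) | (hw & ~auth);
}

/*
 * Keep the guest's EIID (low 11 bits) and substitute the guest file and
 * hart indices chosen by the hypervisor. Group index assumed to be zero.
 */
static uint32_t compose_msi_target_sketch(uint32_t guest_value,
                                          uint32_t guest_file_id,
                                          uint32_t hart_idx)
{
    uint32_t v = guest_value & 0x7FF;   /* APLIC_TARGET_EIID_MASK */

    v |= guest_file_id << 12;           /* APLIC_TARGET_GUEST_IDX_SHIFT */
    v |= hart_idx << 18;                /* APLIC_TARGET_HART_IDX_SHIFT */

    return v;
}
```

With an authorization mask of 0xF, a guest write of all ones only
changes the low four bits and leaves the other 28 bits as they were.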
* [PATCH v1 20/27] xen/riscv: emulate guest reads from virtual APLIC MMIO
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (18 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC Oleksii Kurochko
` (6 subsequent siblings)
26 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Guests may read back APLIC registers to inspect interrupt state and
routing configuration. When virtualising APLIC, Xen must intercept
these MMIO reads and present a consistent, restricted view of the
virtual interrupt controller state. Note that at the moment only
MSI mode is supported.
Introduce vaplic_emulate_load() to handle guest loads from virtual
APLIC registers. Readback is filtered through the domain’s authorised
interrupt bitmap so that guests can observe state only for interrupts
they are permitted to control.
Registers defined by the AIA specification to read as zero are handled
accordingly, while other registers return masked values derived from
the underlying virtual APLIC state. Unsupported accesses are treated
as fatal, as they indicate an emulation error.
Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/include/asm/intc.h | 4 ++
xen/arch/riscv/vaplic.c | 69 +++++++++++++++++++++++++++++++
2 files changed, 73 insertions(+)
diff --git a/xen/arch/riscv/include/asm/intc.h b/xen/arch/riscv/include/asm/intc.h
index 92a74eede4a0..45d41e191e30 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -56,6 +56,10 @@ struct vintc_ops {
/* Initialize some vINTC-related stuff for a vCPU */
int (*vcpu_init)(struct vcpu *vcpu);
+ /* Emulate load to virtual interrupt controller MMIOs */
+ int (*emulate_load)(const struct vcpu *vcpu, unsigned long addr,
+ uint32_t *out);
+
/* Emulate store to virtual interrupt controller MMIOs */
int (*emulate_store)(const struct vcpu *vcpu, unsigned long addr,
uint32_t in);
diff --git a/xen/arch/riscv/vaplic.c b/xen/arch/riscv/vaplic.c
index 5540b4884179..293729ad0ad4 100644
--- a/xen/arch/riscv/vaplic.c
+++ b/xen/arch/riscv/vaplic.c
@@ -137,6 +137,74 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
return 0;
}
+static int vaplic_emulate_load(const struct vcpu *vcpu,
+ const unsigned long addr, uint32_t *out)
+{
+ const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
+ struct aplic_priv *priv = vaplic->base.info->private;
+ const unsigned long offset = addr & APLIC_REG_OFFSET_MASK;
+ const uint32_t *auth_irq_bmp = vcpu->domain->arch.vintc->private;
+ uint32_t auth_bit;
+
+ switch ( offset )
+ {
+ case APLIC_DOMAINCFG:
+ *out = vaplic->regs.domaincfg;
+ break;
+
+ case APLIC_SETIPNUM:
+ case APLIC_SETIPNUM_LE:
+ case APLIC_CLRIPNUM:
+ case APLIC_SETIENUM:
+ /*
+ * According to the RISC-V AIA spec, a read of these registers
+ * always returns zero.
+ */
+ *out = 0;
+ break;
+
+ case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
+ auth_bit = auth_irq_bmp[regval_to_irqn(offset - APLIC_SETIP_BASE)];
+ *out = APLIC_REG_GET(priv->regs, addr - priv->paddr_start) & auth_bit;
+ break;
+
+ case APLIC_CLRIP_BASE ... APLIC_CLRIP_LAST:
+ auth_bit = auth_irq_bmp[regval_to_irqn(offset - APLIC_CLRIP_BASE)];
+ *out = APLIC_REG_GET(priv->regs, addr - priv->paddr_start) & auth_bit;
+ break;
+
+ case APLIC_SETIE_BASE ... APLIC_SETIE_LAST:
+ auth_bit = auth_irq_bmp[regval_to_irqn(offset - APLIC_SETIE_BASE)];
+ *out = APLIC_REG_GET(priv->regs, addr - priv->paddr_start) & auth_bit;
+ break;
+
+ case APLIC_CLRIE_BASE ... APLIC_CLRIE_LAST:
+ auth_bit = auth_irq_bmp[regval_to_irqn(offset - APLIC_CLRIE_BASE)];
+ *out = APLIC_REG_GET(priv->regs, addr - priv->paddr_start) & auth_bit;
+ break;
+
+ case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
+ /*
+ * As target registers start from 1:
+ * 0x3000 genmsi
+ * 0x3004 target[1]
+ * 0x3008 target[2]
+ * ...
+ * 0x3FFC target[1023]
+ * It is necessary to calculate an interrupt number by subtracting
+ * APLIC_GENMSI instead of APLIC_TARGET_BASE.
+ */
+ auth_bit = auth_irq_bmp[regval_to_irqn(offset - APLIC_GENMSI)];
+ *out = APLIC_REG_GET(priv->regs, addr - priv->paddr_start) & auth_bit;
+ break;
+
+ default:
+ panic("%s: unsupported register offset: %#lx", __func__, offset);
+ }
+
+ return 0;
+}
+
static void vaplic_dm_update_target(const unsigned long hart_id, uint32_t *iprio)
{
*iprio &= APLIC_TARGET_IPRIO_MASK;
@@ -327,6 +395,7 @@ static const struct vintc_ops vaplic_ops = {
.map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
.is_access = vaplic_is_access,
.emulate_store = vaplic_emulate_store,
+ .emulate_load = vaplic_emulate_load,
};
static struct vintc * __init vaplic_alloc(void)
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
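The offset arithmetic described in the APLIC_TARGET comment of the patch above
can be sketched in isolation. This is only an illustration, not the patch's
code: the helper name target_offset_to_irq() and the SKETCH_-prefixed constants
are made up here, with values taken from the register layout listed in that
comment (genmsi at 0x3000, target[1] at 0x3004, target[1023] at 0x3FFC).

```c
#include <assert.h>
#include <stdint.h>

/* Values from the layout in the patch comment; the names are hypothetical. */
#define SKETCH_APLIC_GENMSI       0x3000u
#define SKETCH_APLIC_TARGET_BASE  0x3004u
#define SKETCH_APLIC_TARGET_LAST  0x3FFCu

/*
 * Map an offset inside the target[] window to its interrupt source number.
 * Subtracting GENMSI rather than TARGET_BASE makes target[1] yield 1
 * instead of 0, because there is no target[0] register.
 */
static inline unsigned int target_offset_to_irq(uint32_t offset)
{
    assert(offset >= SKETCH_APLIC_TARGET_BASE &&
           offset <= SKETCH_APLIC_TARGET_LAST);
    assert((offset & 3u) == 0); /* registers are 32 bits wide */

    return (offset - SKETCH_APLIC_GENMSI) / sizeof(uint32_t);
}
```

Had TARGET_BASE been subtracted instead, every result would be one too low and
target[1] would alias interrupt source 0, which does not exist.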
* [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (19 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 20/27] xen/riscv: emulate guest reads from " Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 14:58 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 22/27] xen/riscv: implement init_intc_phandle() Oleksii Kurochko
` (5 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Add common helpers domain_vintc_init() and domain_vintc_deinit() to
allocate and deallocate a virtual interrupt controller (vINTC)
structure and initialize basic virtual interrupt controller registers.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/domain.c | 3 +++
xen/arch/riscv/include/asm/intc.h | 4 ++++
xen/arch/riscv/intc.c | 40 +++++++++++++++++++++++++++++++
3 files changed, 47 insertions(+)
diff --git a/xen/arch/riscv/domain.c b/xen/arch/riscv/domain.c
index 560b21b16ffb..1c08ba7396ea 100644
--- a/xen/arch/riscv/domain.c
+++ b/xen/arch/riscv/domain.c
@@ -300,6 +300,9 @@ int arch_domain_create(struct domain *d,
if ( (rc = p2m_init(d)) != 0)
goto fail;
+ if ( (rc = domain_vintc_init(d)) )
+ goto fail;
+
d->arch.next_phandle = GUEST_PHANDLE_LAST + 1;
return rc;
diff --git a/xen/arch/riscv/include/asm/intc.h b/xen/arch/riscv/include/asm/intc.h
index 45d41e191e30..fb4188b96a75 100644
--- a/xen/arch/riscv/include/asm/intc.h
+++ b/xen/arch/riscv/include/asm/intc.h
@@ -13,6 +13,7 @@ enum intc_version {
};
struct cpu_user_regs;
+struct domain;
struct dt_device_node;
struct irq_desc;
struct kernel_info;
@@ -96,4 +97,7 @@ void intc_handle_external_irqs(struct cpu_user_regs *regs);
int intc_make_domu_dt_node(const struct kernel_info *kinfo);
+int domain_vintc_init(struct domain *d);
+void domain_vintc_deinit(struct domain *d);
+
#endif /* ASM__RISCV__INTERRUPT_CONTOLLER_H */
diff --git a/xen/arch/riscv/intc.c b/xen/arch/riscv/intc.c
index a91dbc5e997c..88ce658a4de0 100644
--- a/xen/arch/riscv/intc.c
+++ b/xen/arch/riscv/intc.c
@@ -11,6 +11,7 @@
#include <asm/aia.h>
#include <asm/intc.h>
+#include <asm/vaplic.h>
static const struct intc_hw_operations *__ro_after_init intc_hw_ops;
@@ -87,3 +88,42 @@ int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
irq_ranges);
}
+
+int __init domain_vintc_init(struct domain *d)
+{
+ int ret = -EOPNOTSUPP;
+ const enum intc_version ver = intc_hw_ops->info->hw_version;
+
+ switch ( ver )
+ {
+ case INTC_APLIC:
+ ret = domain_vaplic_init(d);
+ break;
+
+ default:
+ printk("vintc (ver:%d) isn't implemented\n", ver);
+ break;
+ }
+
+ if ( !ret )
+ d->arch.vintc->info = intc_hw_ops->info;
+
+ return ret;
+}
+
+void __init domain_vintc_deinit(struct domain *d)
+{
+ const enum intc_version ver = intc_hw_ops->info->hw_version;
+
+ switch ( ver )
+ {
+ case INTC_APLIC:
+ domain_vaplic_deinit(d);
+ break;
+
+ default:
+ printk("vintc (ver:%d) isn't implemented\n", ver);
+ break;
+ }
+
+}
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v1 22/27] xen/riscv: implement init_intc_phandle()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (20 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 15:00 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 23/27] xen/riscv: call do_initcalls() in start_xen() Oleksii Kurochko
` (4 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Implement init_intc_phandle() to read the phandle of the interrupt controller
node and save it in kinfo->phandle_intc for later use when creating the
guest interrupt controller node.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/dom0less-build.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/xen/arch/riscv/dom0less-build.c b/xen/arch/riscv/dom0less-build.c
index 312c5d7d4979..180d653e5780 100644
--- a/xen/arch/riscv/dom0less-build.c
+++ b/xen/arch/riscv/dom0less-build.c
@@ -2,9 +2,26 @@
#include <xen/fdt-kernel.h>
#include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
#include <asm/intc.h>
+int __init init_intc_phandle(struct kernel_info *kinfo, const char *name,
+ const int node_next, const void *pfdt)
+{
+ if ( dt_node_cmp(name, "intc") == 0 )
+ {
+ uint32_t phandle_intc = fdt_get_phandle(pfdt, node_next);
+
+ if ( phandle_intc != 0 )
+ kinfo->phandle_intc = phandle_intc;
+
+ return 0;
+ }
+
+ return 1;
+}
+
int __init make_intc_domU_node(struct kernel_info *kinfo)
{
intc_make_domu_dt_node(kinfo);
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
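The return convention of init_intc_phandle() in the patch above (0 when the
node was recognised, 1 when it was not) can be sketched without any Xen or
libfdt dependencies. Everything prefixed with sketch_ is hypothetical: the
struct stands in for the single kernel_info field involved, the phandle is
passed in directly instead of being read with fdt_get_phandle(), and plain
strcmp() replaces dt_node_cmp().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the one struct kernel_info field used here. */
struct sketch_kinfo {
    uint32_t phandle_intc;
};

/*
 * Returns 0 when the node is the interrupt controller (and was consumed),
 * 1 when the caller should keep handling the node in some other way.
 * A phandle of 0 means "no phandle" and leaves the stored value untouched.
 */
static int sketch_init_intc_phandle(struct sketch_kinfo *kinfo,
                                    const char *name, uint32_t phandle)
{
    assert(kinfo != NULL && name != NULL);

    if ( strcmp(name, "intc") != 0 )
        return 1;

    if ( phandle != 0 )
        kinfo->phandle_intc = phandle;

    return 0;
}
```

Note that a matching node without a phandle still returns 0, so the caller
cannot tell from the return value alone whether kinfo->phandle_intc was set.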
* [PATCH v1 23/27] xen/riscv: call do_initcalls() in start_xen()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (21 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 22/27] xen/riscv: implement init_intc_phandle() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 15:01 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 24/27] xen/riscv: init rcu Oleksii Kurochko
` (3 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Execute the initcalls by calling do_initcalls() from start_xen().
Additionally, call scheduler_init() before do_initcalls(): it is needed
because cpupool_create_pool() is called by do_initcalls(), and otherwise
BUG_ON(IS_ERR(pool)) will trigger inside cpupool_create_pool().
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/setup.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index cae49bb29626..24bad0c606a5 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -9,6 +9,7 @@
#include <xen/init.h>
#include <xen/irq.h>
#include <xen/mm.h>
+#include <xen/sched.h>
#include <xen/serial.h>
#include <xen/shutdown.h>
#include <xen/smp.h>
@@ -159,6 +160,11 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
guest_mm_init();
+ scheduler_init();
+ set_current(idle_vcpu[0]);
+
+ do_initcalls();
+
printk("All set up\n");
machine_halt();
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v1 24/27] xen/riscv: init rcu
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (22 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 23/27] xen/riscv: call do_initcalls() in start_xen() Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-02 15:03 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 25/27] xen/riscv: setup system domains Oleksii Kurochko
` (2 subsequent siblings)
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/setup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 24bad0c606a5..3e337bf611fb 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -9,6 +9,7 @@
#include <xen/init.h>
#include <xen/irq.h>
#include <xen/mm.h>
+#include <xen/rcupdate.h>
#include <xen/sched.h>
#include <xen/serial.h>
#include <xen/shutdown.h>
@@ -154,6 +155,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
timer_init();
+ rcu_init();
+
local_irq_enable();
console_init_postirq();
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v1 25/27] xen/riscv: setup system domains
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (23 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 24/27] xen/riscv: init rcu Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 26/27] xen/riscv: provide init_vuart() Oleksii Kurochko
2026-03-10 17:09 ` [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support Oleksii Kurochko
26 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/setup.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 3e337bf611fb..5bc34e41fe55 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -6,6 +6,7 @@
#include <xen/compile.h>
#include <xen/console.h>
#include <xen/device_tree.h>
+#include <xen/domain.h>
#include <xen/init.h>
#include <xen/irq.h>
#include <xen/mm.h>
@@ -157,6 +158,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
rcu_init();
+ setup_system_domains();
+
local_irq_enable();
console_init_postirq();
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v1 26/27] xen/riscv: provide init_vuart()
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (24 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 25/27] xen/riscv: setup system domains Oleksii Kurochko
@ 2026-03-10 17:08 ` Oleksii Kurochko
2026-04-07 13:52 ` Jan Beulich
2026-03-10 17:09 ` [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support Oleksii Kurochko
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:08 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
For debugging purposes it is enough to print messages from the guest, which is
already implemented in vsbi_legacy_ecall_handler().
For full guest console support it would be better to have something similar to
[1]. Therefore nothing specific needs to be done, at least for now, and
init_vuart() is provided only to make the dom0less code buildable.
[1] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2602041533440.3175371@ubuntu-linux-20-04-desktop/
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
xen/arch/riscv/dom0less-build.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/xen/arch/riscv/dom0less-build.c b/xen/arch/riscv/dom0less-build.c
index 180d653e5780..43c8bdc52300 100644
--- a/xen/arch/riscv/dom0less-build.c
+++ b/xen/arch/riscv/dom0less-build.c
@@ -6,6 +6,14 @@
#include <asm/intc.h>
+int __init init_vuart(struct domain *d, struct kernel_info *kinfo,
+ const struct dt_device_node *node)
+{
+ /* Nothing to do at the moment */
+
+ return 0;
+}
+
int __init init_intc_phandle(struct kernel_info *kinfo, const char *name,
const int node_next, const void *pfdt)
{
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
* [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
` (25 preceding siblings ...)
2026-03-10 17:08 ` [PATCH v1 26/27] xen/riscv: provide init_vuart() Oleksii Kurochko
@ 2026-03-10 17:09 ` Oleksii Kurochko
2026-04-07 14:11 ` Jan Beulich
26 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-03-10 17:09 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Enable dom0less support for RISC-V by selecting HAS_DOM0LESS and
providing the minimal architecture hooks required by the common
dom0less infrastructure.
Add stub implementations for architecture-specific helpers used when
building domains from the device tree. These currently perform no
additional work but allow the generic dom0less code to build and run
on RISC-V.
Introduce max_init_domid as a runtime variable rather than a constant
so that it can be updated during dom0less domain creation.
Provide missing helpers and definitions required by the domain
construction code, including domain bitness helpers and the
p2m_set_allocation() prototype.
Additionally define the guest magic memory region in the public
RISC-V interface.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Open questions:
- Move the declaration of p2m_set_allocation() to xen/fdt-domain-build.h or
xen/dom0less-build.h, as it is used in the common Dom0less code and
there is little sense in declaring it separately for each arch which
supports Dom0less. It could be ifdef-ed in the common header as, at the
moment, it is used only for Dom0less.
- Shouldn't the declaration/definition of max_init_domid move to common code
instead of having it for each architecture separately? If yes, then what
would be the best place?
---
---
xen/arch/riscv/Kconfig | 1 +
xen/arch/riscv/dom0less-build.c | 18 ++++++++++++++++++
xen/arch/riscv/domain-build.c | 13 +++++++++++++
xen/arch/riscv/include/asm/domain.h | 8 ++++++++
xen/arch/riscv/include/asm/p2m.h | 2 ++
xen/arch/riscv/include/asm/setup.h | 4 +++-
xen/arch/riscv/setup.c | 2 ++
xen/include/public/arch-riscv.h | 3 +++
8 files changed, 50 insertions(+), 1 deletion(-)
diff --git a/xen/arch/riscv/Kconfig b/xen/arch/riscv/Kconfig
index 091510380949..abd579aa6a54 100644
--- a/xen/arch/riscv/Kconfig
+++ b/xen/arch/riscv/Kconfig
@@ -6,6 +6,7 @@ config RISCV
select GENERIC_BUG_FRAME
select GENERIC_UART_INIT
select HAS_DEVICE_TREE_DISCOVERY
+ select HAS_DOM0LESS
select HAS_PMAP
select HAS_UBSAN
select HAS_VMAP
diff --git a/xen/arch/riscv/dom0less-build.c b/xen/arch/riscv/dom0less-build.c
index 43c8bdc52300..865311ec7037 100644
--- a/xen/arch/riscv/dom0less-build.c
+++ b/xen/arch/riscv/dom0less-build.c
@@ -43,3 +43,21 @@ int __init make_arch_nodes(struct kernel_info *kinfo)
return 0;
}
+
+void __init set_domain_type(struct domain *d, struct kernel_info *kinfo)
+{
+ /* Nothing to do */
+}
+
+int __init arch_parse_dom0less_node(struct dt_device_node *node,
+ struct boot_domain *bd)
+{
+ /* Nothing to do */
+ return 0;
+}
+
+int __init arch_handle_passthrough_prop(struct kernel_info *kinfo,
+ struct dt_device_node *node)
+{
+ return 0;
+}
diff --git a/xen/arch/riscv/domain-build.c b/xen/arch/riscv/domain-build.c
index ae26faed09ed..20735f41d646 100644
--- a/xen/arch/riscv/domain-build.c
+++ b/xen/arch/riscv/domain-build.c
@@ -154,9 +154,22 @@ int __init make_cpus_node(const struct domain *d, void *fdt)
return res;
}
+int __init construct_hwdom(struct kernel_info *kinfo,
+ const struct dt_device_node *node)
+{
+ return -EOPNOTSUPP;
+}
+
int __init make_timer_node(const struct kernel_info *kinfo)
{
/* There is no need for timer node for RISC-V. */
return 0;
}
+
+int __init make_hypervisor_node(struct domain *d,
+ const struct kernel_info *kinfo,
+ int addrcells, int sizecells)
+{
+ return -EOPNOTSUPP;
+}
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 21a3e6876f36..a8342ca7c5bf 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -20,6 +20,14 @@ struct hvm_domain
uint64_t params[HVM_NR_PARAMS];
};
+#ifdef CONFIG_RISCV_64
+#define is_32bit_domain(d) (0)
+#define is_64bit_domain(d) (1)
+#else
+#define is_32bit_domain(d) (1)
+#define is_64bit_domain(d) (0)
+#endif
+
struct arch_vcpu_io {
};
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 4441c0400b83..d97c8d13ef6b 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -259,6 +259,8 @@ void p2m_ctxt_switch_from(struct vcpu *p);
void p2m_ctxt_switch_to(struct vcpu *n);
void p2m_handle_vmenter(void);
+int p2m_set_allocation(struct domain *d, unsigned long pages, bool *preempted);
+
#endif /* ASM__RISCV__P2M_H */
/*
diff --git a/xen/arch/riscv/include/asm/setup.h b/xen/arch/riscv/include/asm/setup.h
index 1c23043f409f..2e3f8931d01b 100644
--- a/xen/arch/riscv/include/asm/setup.h
+++ b/xen/arch/riscv/include/asm/setup.h
@@ -5,11 +5,13 @@
#include <xen/types.h>
+#include <public/xen.h>
+
struct domain;
struct dt_device_node;
struct rangeset;
-#define max_init_domid (0)
+extern domid_t max_init_domid;
void setup_mm(void);
diff --git a/xen/arch/riscv/setup.c b/xen/arch/riscv/setup.c
index 5bc34e41fe55..513f94777f38 100644
--- a/xen/arch/riscv/setup.c
+++ b/xen/arch/riscv/setup.c
@@ -32,6 +32,8 @@
#include <asm/traps.h>
#include <asm/vsbi.h>
+domid_t max_init_domid = 0;
+
/* Xen stack for bringing up the first CPU. */
unsigned char __initdata cpu0_boot_stack[STACK_SIZE]
__aligned(STACK_SIZE);
diff --git a/xen/include/public/arch-riscv.h b/xen/include/public/arch-riscv.h
index 91cee3096041..3c0c786c57ac 100644
--- a/xen/include/public/arch-riscv.h
+++ b/xen/include/public/arch-riscv.h
@@ -58,6 +58,9 @@ typedef uint64_t xen_ulong_t;
#define GUEST_RAM_BANK_BASES { GUEST_RAM0_BASE }
#define GUEST_RAM_BANK_SIZES { GUEST_RAM0_SIZE }
+#define GUEST_MAGIC_BASE xen_mk_ullong(0x39000000)
+#define GUEST_MAGIC_SIZE xen_mk_ullong(0x01000000)
+
struct vcpu_guest_context {
};
typedef struct vcpu_guest_context vcpu_guest_context_t;
--
2.53.0
^ permalink raw reply related [flat|nested] 123+ messages in thread
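Why max_init_domid has to become a real variable can be shown with a small
standalone sketch. It assumes that domain creation bumps the counter once per
new domain; the helper sketch_alloc_init_domid() is hypothetical and is not
taken from the common dom0less code.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t domid_t;          /* same width as Xen's public domid_t */

static domid_t max_init_domid;     /* starts at 0, like the patch's definition */

/*
 * Hypothetical helper: hand out the next boot-time domain ID.
 * With "#define max_init_domid (0)" the increment below would not
 * compile, since a constant expression is not an lvalue.
 */
static domid_t sketch_alloc_init_domid(void)
{
    assert(max_init_domid < UINT16_MAX);

    return ++max_init_domid;
}
```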
* Re: [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL
2026-03-10 17:08 ` [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL Oleksii Kurochko
@ 2026-03-11 8:18 ` Jan Beulich
2026-04-09 10:31 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-03-11 8:18 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> The p2m_freelist is used to allocate pages for the P2M, but to initialize
> this list, domain_p2m_set_allocation() might be called.
> This function is invoked in construct_domU() within the common Dom0less
> code, and providing an implementation of domain_p2m_set_allocation() when
> CONFIG_ARCH_PAGING_MEMPOOL=y is appropriate for RISC-V.
With this wording it is odd to see ...
> --- a/xen/arch/riscv/Kconfig
> +++ b/xen/arch/riscv/Kconfig
> @@ -1,5 +1,6 @@
> config RISCV
> def_bool y
> + select ARCH_PAGING_MEMPOOL
... this. You really want to settle on whether it is selected unconditionally
or not. Also for the code below, where ...
> --- a/xen/arch/riscv/Makefile
> +++ b/xen/arch/riscv/Makefile
> @@ -10,7 +10,7 @@ obj-y += irq.o
> obj-y += kernel.o
> obj-y += mm.o
> obj-y += p2m.o
> -obj-y += paging.o
> +obj-$(CONFIG_ARCH_PAGING_MEMPOOL) += paging.o
> obj-y += pt.o
> obj-$(CONFIG_RISCV_64) += riscv64/
> obj-y += sbi.o
... this change and any #ifdef-ary further down aren't needed unless the
select became conditional. (Plus with the change above things likely wouldn't
even build if ARCH_PAGING_MEMPOOL could be off under certain conditions.)
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -1568,3 +1568,34 @@ struct page_info *get_page_from_gfn(struct domain *d, unsigned long gfn,
>
> return page;
> }
> +
> +#ifdef CONFIG_ARCH_PAGING_MEMPOOL
> +
> +int arch_set_paging_mempool_size(struct domain *d, uint64_t size)
> +{
> + unsigned long pages = PFN_DOWN(size);
> + bool preempted = false;
> + int rc;
> +
> + if ( (size & ~PAGE_MASK) || /* Non page-sized request? */
> + pages != PFN_DOWN(size) ) /* 32-bit overflow? */
> + return -EINVAL;
Can't this be had with just
if ( ((paddr_t)pages << PAGE_SHIFT) != size )
return -EINVAL;
(and perhaps utilizing pfn_to_paddr(), even if it's not a PFN we're dealing
with here)?
> + spin_lock(&d->arch.paging.lock);
> + rc = p2m_set_allocation(d, pages, &preempted);
> + spin_unlock(&d->arch.paging.lock);
> +
> + ASSERT(preempted == (rc == -ERESTART));
This actually suggests that (once again) Arm code perhaps shouldn't have been
copied verbatim: There shouldn't be a need for the "preempted" state to be
returned back in two distinct ways.
> + return rc;
> +}
> +
> +/* Return the size of the pool, in bytes. */
> +int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
> +{
> + *size = (uint64_t)ACCESS_ONCE(d->arch.paging.total_pages) << PAGE_SHIFT;
As per above, maybe use pfn_to_paddr()?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
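The single-comparison check suggested in the review above can be tried out on
any host with the sketch below. It is an illustration under stated
assumptions: pages_t is a 32-bit type standing in for "unsigned long" on a
32-bit build, SKETCH_PAGE_SHIFT is fixed at 12, and the helper name is made up.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12

/* Plays the role of "unsigned long" on a 32-bit build. */
typedef uint32_t pages_t;

/*
 * Convert a byte size to a page count. The round trip back to bytes
 * fails both for sizes that are not a multiple of the page size and
 * for page counts that were truncated by the narrower type.
 */
static int sketch_size_to_pages(uint64_t size, pages_t *pages)
{
    assert(pages != NULL);

    *pages = (pages_t)(size >> SKETCH_PAGE_SHIFT);

    if ( ((uint64_t)*pages << SKETCH_PAGE_SHIFT) != size )
        return -EINVAL;

    return 0;
}
```

The one comparison thus covers both conditions that the quoted code tests
separately.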
* Re: [PATCH v1 02/27] xen/riscv: Implement construct_domain()
2026-03-10 17:08 ` [PATCH v1 02/27] xen/riscv: Implement construct_domain() Oleksii Kurochko
@ 2026-03-24 9:37 ` Jan Beulich
2026-04-09 11:26 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-03-24 9:37 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/arch/riscv/domain-build.c
> @@ -0,0 +1,46 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +
> +#include <xen/fdt-domain-build.h>
> +#include <xen/fdt-kernel.h>
> +#include <xen/init.h>
> +#include <xen/sched.h>
> +
> +#include <asm/current.h>
> +#include <asm/guest_access.h>
> +
> +int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
Are you actually altering what kinfo points to?
> +{
> + struct vcpu *v = d->vcpu[0];
> + struct cpu_user_regs *regs = vcpu_guest_cpu_user_regs(v);
> +
> + BUG_ON(d->vcpu[0] == NULL);
Why not simply "!v"?
Also, while in the cover letter you state a dependency on another series,
this is somewhat unwieldy here. From the titles there I can't deduce which
of the patches would introduce vcpu_guest_cpu_user_regs(). Yet I would
have wanted to double check that it doesn't de-reference v already.
> + BUG_ON(v->is_initialised);
> +
> + kernel_load(kinfo);
> + initrd_load(kinfo, copy_to_guest_phys);
> + dtb_load(kinfo, copy_to_guest_phys);
These all return void, despite this also being used for non-Dom0. Is it
really fatal to a dom0less system if one out of many domains fails to be
built? Especially when, despite the name, there is a Dom0?
> + regs->sepc = kinfo->entry;
> +
> + /* Guest boot cpuid = 0 */
> + regs->a0 = 0;
> + regs->a1 = kinfo->dtb_paddr;
> +
> + for ( unsigned int i = 1; i < d->max_vcpus; i++ )
> + {
> + if ( vcpu_create(d, i) == NULL )
> + {
> + printk("Failed to allocate %pd v%d\n", d, i);
> + break;
And no error is indicated to the caller?
> + }
> +
> + printk("Created vcpu %d for %pd\n", i, d);
dprintk() at most, and if so, then please use %pv.
> + }
> +
> + domain_update_node_affinity(d);
> +
> + v->is_initialised = 1;
This is a boolean field, so the rhs wants to be "true".
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
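One way to address the "no error is indicated to the caller" remark in the
review above is to return on the first failed creation. The sketch below is
standalone and hypothetical: the callback type stands in for vcpu_create(),
the stub creators exist only to exercise the helper, and -ENOMEM is an assumed
error code that the review does not prescribe.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical creator callback: returns NULL on failure, like vcpu_create(). */
typedef void *(*sketch_vcpu_create_fn)(unsigned int vcpu_id);

/*
 * Create vCPUs 1..max_vcpus-1 (vCPU 0 is assumed to exist already) and
 * report the first failure to the caller instead of silently stopping.
 */
static int sketch_create_secondary_vcpus(unsigned int max_vcpus,
                                         sketch_vcpu_create_fn create)
{
    assert(create != NULL);

    for ( unsigned int i = 1; i < max_vcpus; i++ )
        if ( create(i) == NULL )
            return -ENOMEM;

    return 0;
}

/* Stub creator that always succeeds. */
static void *sketch_create_ok(unsigned int vcpu_id)
{
    static int token;

    (void)vcpu_id;
    return &token;
}

/* Stub creator that fails for vCPU 2 only. */
static void *sketch_create_fail_at_2(unsigned int vcpu_id)
{
    static int token;

    return vcpu_id == 2 ? NULL : &token;
}
```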
* Re: [PATCH v1 03/27] xen/riscv: implement prerequisites for domain_create()
2026-03-10 17:08 ` [PATCH v1 03/27] xen/riscv: implement prerequisites for domain_create() Oleksii Kurochko
@ 2026-04-01 12:57 ` Jan Beulich
2026-04-09 11:55 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 12:57 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> arch_domain_create() and arch_sanitise_domain_config() are prerequisites for
> domain_create().
>
> arch_sanitise_domain_config() currently returns 0, as there is no specific
> work required at this stage.
>
> arch_domain_create() performs basic initialization, such as setting up the P2M
> and initializing the domain's virtual timer.
Does it? I can spot only the former; instead there is ...
> --- a/xen/arch/riscv/domain.c
> +++ b/xen/arch/riscv/domain.c
> @@ -288,6 +288,33 @@ void sync_vcpu_execstate(struct vcpu *v)
> /* Nothing to do -- no lazy switching */
> }
>
> +int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> +{
> + return 0;
> +}
> +
> +int arch_domain_create(struct domain *d,
> + struct xen_domctl_createdomain *config,
> + unsigned int flags)
> +{
> + int rc = 0;
> +
> + if ( is_idle_domain(d) )
> + return 0;
> +
> + if ( (rc = p2m_init(d)) != 0)
> + goto fail;
> +
> + d->arch.next_phandle = GUEST_PHANDLE_LAST + 1;
... this, which I can't make any sense of. I can't find matching Arm code
either, which might otherwise have helped.
> + return rc;
> +
> + fail:
> + d->is_dying = DOMDYING_dead;
> + arch_domain_destroy(d);
(At least) for the use here, that other function would better also move out
of stubs.c at the same time (and no longer have unconditional BUG_ON() in it).
> --- a/xen/arch/riscv/stubs.c
> +++ b/xen/arch/riscv/stubs.c
> @@ -101,14 +101,7 @@ void dump_pageframe_info(struct domain *d)
> BUG_ON("unimplemented");
> }
>
> -int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
> -{
> - BUG_ON("unimplemented");
> -}
> -
> -int arch_domain_create(struct domain *d,
> - struct xen_domctl_createdomain *config,
> - unsigned int flags)
> +void vcpu_switch_to_aarch64_mode(struct vcpu *v)
What is this? Surely nothing with this name should exist under riscv/.
> --- a/xen/include/public/device_tree_defs.h
> +++ b/xen/include/public/device_tree_defs.h
> @@ -14,6 +14,7 @@
> */
> #define GUEST_PHANDLE_GIC (65000)
> #define GUEST_PHANDLE_IOMMU (GUEST_PHANDLE_GIC + 1)
> +#define GUEST_PHANDLE_LAST GUEST_PHANDLE_IOMMU
This, to me, looks like a questionable addition to the public interface.
Yet I'm not a DT person, so I may simply be missing why something like
this might be wanted (and how stable it then would be, long term).
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 04/27] xen/riscv: rework G-stage mode handling
2026-03-10 17:08 ` [PATCH v1 04/27] xen/riscv: rework G-stage mode handling Oleksii Kurochko
@ 2026-04-01 13:19 ` Jan Beulich
2026-04-07 10:47 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 13:19 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> Rework G-stage mode handling to make the selected mode descriptor reusable
> outside of p2m initialization.
>
> As max_gstage_mode is going to be reused by code that creates CPU nodes for
> guest domains, not only max_gstage_mode->mode but also max_gstage_mode->name
> is required.
I guess I'm not DT-savvy enough to understand why that would be.
> To support this, make max_gstage_mode a global pointer to one of
> the entries in a global modes[] array, and remove get_max_supported_mode().
>
> Update struct p2m_domain to store a pointer to a mode descriptor instead of
> embedding the structure directly.
>
> Refactor the modes[] array so that mode->name contains only the MMU scheme
> name (without the "x4" suffix), as this value is reused when filling the
> maximum MMU type passed to the guest. According to DT bindings [1], the MMU
> type must not include the "x4" suffix. Use "none" for the Bare mode to match
> the DT binding requirements.
I expect this DT aspect is also why Sv changes to sv in the table? (Which
is a little unhelpful for the printk() where it's used.)
> Adjust modes[]->paging_levels to represent the maximum paging level rather
> than the total number of levels. This ensures that P2M_ROOT_LEVEL() and its
> users behave correctly without relying on hardcoded p2m mode values.
>
> Finally, drop __initconst from the modes[] declaration, as the array is
> referenced via p2m->mode and max_gstage_mode beyond the init stage.
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/riscv/cpus.yaml?h=v6.19-rc3#n82
Is a reference into Linux doc really providing something "canonical"? Surely
there's an independent spec somewhere?
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -45,18 +45,32 @@ struct p2m_pte_ctx {
> unsigned int level; /* Paging level at which the PTE resides. */
> };
>
> -static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
> - .mode = HGATP_MODE_OFF,
> - .paging_levels = 0,
> - .name = "Bare",
> -};
> -
> /*
> * Set to the maximum configured support for IPA bits, so the number of IPA bits can be
> * restricted by external entity (e.g. IOMMU).
> */
> unsigned int __read_mostly p2m_ipa_bits = PADDR_BITS;
>
> +static const struct gstage_mode_desc modes[] = {
As a function scope static this was a fine identifier. Please consider whether
with the wider scope gstage_modes[] might not be better.
> + /*
> + * Based on the RISC-V spec:
> + * Bare mode is always supported, regardless of SXLEN.
> + * When SXLEN=32, the only other valid setting for MODE is Sv32.
> + * When SXLEN=64, three paged virtual-memory schemes are defined:
> + * Sv39, Sv48, and Sv57.
> + */
> + [0] = { HGATP_MODE_OFF, 0, "none" },
> +#ifdef CONFIG_RISCV_32
> + [1] = { HGATP_MODE_SV32X4, 1, "sv32" }
> +#else
> + [2] = { HGATP_MODE_SV39X4, 2, "sv39" },
> + [3] = { HGATP_MODE_SV48X4, 3, "sv48" },
> + [4] = { HGATP_MODE_SV57X4, 4, "sv57" },
> +#endif
> +};
The dedicated initializer form isn't adding any value here (whereas it slightly
hampers readability). You really don't want the array to be sparsely populated,
so perhaps better to leave as it was before?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
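The sparse-population concern raised in the review above can be reproduced
with a standalone sketch. The struct and table names are hypothetical and the
mode values are illustrative; the two tables mirror the RV64 branch of the
quoted hunk, once with designated indices and once positionally.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_mode_desc {
    unsigned int mode;
    unsigned int paging_levels;
    const char *name;
};

#define SKETCH_ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Designated form as quoted: index 1 is never named, so it is zero-filled. */
static const struct sketch_mode_desc sketch_designated[] = {
    [0] = { 0, 0, "none" },
    [2] = { 8, 2, "sv39" },
    [3] = { 9, 3, "sv48" },
    [4] = { 10, 4, "sv57" },
};

/* Positional form: four entries, no hole. */
static const struct sketch_mode_desc sketch_positional[] = {
    { 0, 0, "none" },
    { 8, 2, "sv39" },
    { 9, 3, "sv48" },
    { 10, 4, "sv57" },
};

/* Count entries whose name is NULL, i.e. holes left by the initializer. */
static size_t sketch_count_holes(const struct sketch_mode_desc *tbl, size_t n)
{
    size_t holes = 0;

    assert(tbl != NULL);

    for ( size_t i = 0; i < n; i++ )
        if ( tbl[i].name == NULL )
            holes++;

    return holes;
}
```

Any loop over the designated table would have to skip the all-zero entry at
index 1, whose mode field is indistinguishable from the "none" entry.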
* Re: [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string
2026-03-10 17:08 ` [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string Oleksii Kurochko
@ 2026-04-01 13:49 ` Jan Beulich
2026-04-10 10:24 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 13:49 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> Introduce generation of the riscv,isa string passed to the guest via the
> Device Tree riscv,isa property.
Title as well as this suggest this is all about guest properties. Then ...
> The following changes are introduced:
>
> - According to the RISC-V privileged specification, M-mode software controls
> the XLEN value used by S-mode and U-mode. For RV64 harts, the SXL and UXL
> fields of the mstatus register are WARL fields that define the XLEN for
> S-mode and U-mode.
>
> The XLEN value is provided by M-mode software (OpenSBI in the case of Xen)
> via the riscv,isa DT property. Introduce and initialize an xlen variable
> when parsing the host riscv,isa string in riscv_isa_parse_string().
... suddenly talk is of host aspects? (See below as to what "xlen" really
is meant to hold.)
> --- a/xen/arch/riscv/cpufeature.c
> +++ b/xen/arch/riscv/cpufeature.c
> @@ -38,6 +38,8 @@ struct riscv_isa_ext_data {
> /* Host ISA bitmap */
> static __ro_after_init DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX);
>
> +static __ro_after_init unsigned int xlen;
Nit: Attribute between type and identifier please, whenever possible (it
isn't neatly possible in riscv_isa above, due to DECLARE_BITMAP()).
> @@ -160,6 +162,19 @@ static const struct riscv_isa_ext_data __initconst required_extensions[] = {
> RISCV_ISA_EXT_DATA(svpbmt),
> };
>
> +static const unsigned int __initconst guest_unsupp_exts[] = {
> + RISCV_ISA_EXT_f,
> + RISCV_ISA_EXT_d,
> + RISCV_ISA_EXT_h,
> + RISCV_ISA_EXT_q,
> + RISCV_ISA_EXT_v,
> +};
This could do with a comment clarifying what needs (and what doesn't need)
putting here. My expectation would have been that everything in
riscv_isa_ext[] which shouldn't be exposed to guests should appear here.
Yet then there is V (which riscv_isa_ext[] doesn't have), while e.g. Svade
and Svpbmt (which iirc won't be available to guests right away) aren't
there.
> +static __ro_after_init DECLARE_BITMAP(guest_unsupp_bmp, RISCV_ISA_EXT_MAX);
Is the _bmp suffix really needed? riscv_isa, for example, doesn't have it.
> +#define MAX_GUEST_ISA_STR_LEN 256
> +char guest_isa_str[MAX_GUEST_ISA_STR_LEN];
__ro_after_init?
Yet then - can this really be a global? Isn't the set of extensions
available to a guest a per-guest property, i.e. a global could at best
represent an upper bound on features?
> @@ -193,6 +208,15 @@ static void __init match_isa_ext(const char *name, const char *name_end,
> !memcmp(name, ext->name, name_end - name) )
> {
> __set_bit(ext->id, bitmap);
> +
> + if ( riscv_isa_extension_available(guest_unsupp_bmp, ext->id) )
> + break;
> +
> + if ( ext->id >= RISCV_ISA_EXT_BASE )
> + safe_strcat(guest_isa_str, "_");
> +
> + safe_strcat(guest_isa_str, ext->name);
Shouldn't you check the (kind-of-)return value? (Yet better would be a build-
time check, but I can't think of a way to achieve that.)
> @@ -207,13 +231,17 @@ static int __init riscv_isa_parse_string(const char *isa,
> #if defined(CONFIG_RISCV_32)
> if ( isa[2] != '3' && isa[3] != '2' )
> return -EINVAL;
> + xlen = 32;
> #elif defined(CONFIG_RISCV_64)
> if ( isa[2] != '6' && isa[3] != '4' )
> return -EINVAL;
> + xlen = 64;
> #else
> # error "unsupported RISC-V bitness"
> #endif
This can be had with an initializer of "xlen". Then the (kind-of-)variable
could be const unsigned int. Seeing the use below, is the variable
correctly named, though? I.e. shouldn't it be guest_xlen or some such?
Independently I expect you will want to support 32-bit guests on 64-bit Xen
at some point, in which case encoding this value into a global string won't
work very well.
> + snprintf(guest_isa_str, sizeof(guest_isa_str), "rv%d", xlen);
%u please with unsigned int.
This being the only use of the variable (afaics), why is it not function-
scope?
> @@ -487,6 +515,11 @@ void __init riscv_fill_hwcap(void)
> bool all_extns_available = true;
> struct trap_info trap;
>
> + for ( i = 0; i < ARRAY_SIZE(guest_unsupp_exts); i++ )
> + {
> + __set_bit(guest_unsupp_exts[i], guest_unsupp_bmp);
> + }
Nit: No need for braces here. And anyway - can't this be had with an
initializer for guest_unsupp_bmp?
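For illustration, a minimal standalone sketch of such a build-time initializer. It is not Xen code: the EXT_* IDs and helper names are hypothetical stand-ins, and it assumes every listed ID fits into the first word of the bitmap (which would hold if single-letter extensions are numbered 'a'..'z' from 0).

```c
#include <limits.h>

/* Hypothetical extension IDs; the real ones live in Xen's cpufeature.h. */
enum {
    EXT_d = 'd' - 'a', EXT_f = 'f' - 'a', EXT_h = 'h' - 'a',
    EXT_q = 'q' - 'a', EXT_v = 'v' - 'a', EXT_MAX = 128
};

#define BITS_PER_LONG    (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define EXT_BIT(id)      (1UL << (id))

/* The single-word initializer below is only valid under this assumption. */
_Static_assert(EXT_v < sizeof(unsigned long) * CHAR_BIT,
               "extension ID does not fit in the first bitmap word");

/* Populated at build time; no loop at boot and the object can be const. */
static const unsigned long guest_unsupp[BITS_TO_LONGS(EXT_MAX)] = {
    EXT_BIT(EXT_f) | EXT_BIT(EXT_d) | EXT_BIT(EXT_h) |
    EXT_BIT(EXT_q) | EXT_BIT(EXT_v),
};

/* Returns non-zero if the given extension ID is marked as not exposed. */
static int ext_unsupported(unsigned int id)
{
    return (guest_unsupp[id / BITS_PER_LONG] >> (id % BITS_PER_LONG)) & 1;
}
```

IDs beyond the first word would need one initializer element per word, which is where this form stops being convenient.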
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 06/27] xen/riscv: implement make_cpus_node()
2026-03-10 17:08 ` [PATCH v1 06/27] xen/riscv: implement make_cpus_node() Oleksii Kurochko
@ 2026-04-01 14:11 ` Jan Beulich
2026-04-10 11:19 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 14:11 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/Makefile
> +++ b/xen/arch/riscv/Makefile
> @@ -1,6 +1,7 @@
> obj-y += aplic.o
> obj-y += cpufeature.o
> obj-y += domain.o
> +obj-y += domain-build.o
> obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
> obj-y += entry.o
> obj-y += guestcopy.o
Doesn't this belong in patch 02? And then domain-build.init.o?
> --- a/xen/arch/riscv/domain-build.c
> +++ b/xen/arch/riscv/domain-build.c
> @@ -3,8 +3,10 @@
> #include <xen/fdt-domain-build.h>
> #include <xen/fdt-kernel.h>
> #include <xen/init.h>
> +#include <xen/libfdt/libfdt.h>
> #include <xen/sched.h>
>
> +#include <asm/cpufeature.h>
> #include <asm/current.h>
> #include <asm/guest_access.h>
>
> @@ -44,3 +46,109 @@ int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
>
> return 0;
> }
> +
> +int __init make_cpus_node(const struct domain *d, void *fdt)
> +{
> + int res;
> + const struct dt_device_node *cpus = dt_find_node_by_path("/cpus");
> + unsigned int cpu;
> + u32 timebase_frequency;
uint32_t please.
> + bool frequency_valid;
> + uint32_t *next_phandle = &((struct domain *)d)->arch.next_phandle;
No casting away of const, please.
> + dt_dprintk("Create cpus node\n");
> +
> + if ( !cpus )
> + {
> + dprintk(XENLOG_ERR, "Missing /cpus node in the device tree?\n");
> + return -ENOENT;
> + }
> +
> + frequency_valid = dt_property_read_u32(cpus, "timebase-frequency",
> + &timebase_frequency);
> +
> + res = fdt_begin_node(fdt, "cpus");
> + if ( res )
> + return res;
> +
> + res = fdt_property_cell(fdt, "#address-cells", 1);
> + if ( res )
> + return res;
> +
> + res = fdt_property_cell(fdt, "#size-cells", 0);
> + if ( res )
> + return res;
> +
> + if ( frequency_valid )
> + res = fdt_property_cell(fdt, "timebase-frequency", timebase_frequency);
Handing through a property directly makes me wonder how that's going to
fit with migration. I understand migration may not even be a mid-term
goal, but still.
> + for ( cpu = 0; cpu < d->max_vcpus; cpu++ )
> + {
> + char buf[64];
> + uint32_t reg = cpu_to_fdt32(cpu);
> +
> + snprintf(buf, sizeof(buf), "cpu@%u", cpu);
> + res = fdt_begin_node(fdt, buf);
> + if ( res )
> + return res;
> +
> + res = fdt_property(fdt, "reg", &reg, sizeof(reg));
> + if ( res )
> + return res;
> +
> + res = fdt_property_string(fdt, "status", "okay");
> + if ( res )
> + return res;
> +
> + res = fdt_property_string(fdt, "compatible", "riscv");
> + if ( res )
> + return res;
> +
> + BUILD_BUG_ON((sizeof("riscv,") + sizeof_field(struct gstage_mode_desc, name) + 1) >= sizeof(buf));
Nit: Overlong line. Also, why +1? The name field has to include a nul, or
else ...
> + snprintf(buf, sizeof(buf), "riscv,%s", max_gstage_mode->name);
... you can't use plain %s here.
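To make the size arithmetic concrete, a standalone sketch (not Xen code). struct mode_desc, its 8-byte name field, and the sample values are assumptions; the point is only that sizeof() of the literal already counts one nul and the name field counts the other, so no further +1 is needed.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct gstage_mode_desc. */
struct mode_desc {
    unsigned char mode;
    unsigned int paging_levels;
    char name[8];               /* assumed to always hold a nul terminator */
};

#define sizeof_field(type, member) sizeof(((type *)0)->member)

/*
 * Worst-case size of "riscv,<name>" including its terminator: drop the
 * literal's nul (the -1), keep the one accounted for by the name field.
 */
#define MMU_TYPE_MAX \
    (sizeof("riscv,") - 1 + sizeof_field(struct mode_desc, name))

static char mmu_buf[64];

_Static_assert(MMU_TYPE_MAX <= sizeof(mmu_buf), "mmu-type string may not fit");

/* Sample descriptor with made-up values. */
static const struct mode_desc sv39_desc = { 8, 3, "sv39" };

/* Formats the property value into the static buffer and returns it. */
static const char *mmu_type(const struct mode_desc *m)
{
    snprintf(mmu_buf, sizeof(mmu_buf), "riscv,%s", m->name);
    return mmu_buf;
}
```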
> + res = fdt_property_string(fdt, "mmu-type", buf);
> + if ( res )
> + return res;
> +
> + res = fdt_property_string(fdt, "riscv,isa", guest_isa_str);
> + if ( res )
> + return res;
> +
> + res = fdt_property_string(fdt, "device_type", "cpu");
> + if ( res )
> + return res;
> +
> + res = fdt_begin_node(fdt, "interrupt-controller");
> + if ( res )
> + return res;
> +
> + res = fdt_property_string(fdt, "compatible", "riscv,cpu-intc");
> + if ( res )
> + return res;
> +
> + res = fdt_property_cell(fdt, "#interrupt-cells", 1);
> + if ( res )
> + return res;
> +
> + res = fdt_property(fdt, "interrupt-controller", NULL, 0);
> + if ( res )
> + return res;
> +
> + res = fdt_property_u32(fdt, "phandle", (*next_phandle)++);
> + if ( res )
> + return res;
> +
> + /* end of interrupt-controller */
> + res = fdt_end_node(fdt);
> + if ( res )
> + return res;
> +
> + res = fdt_end_node(fdt);
> + if ( res )
> + return res;
> + }
> +
> + res = fdt_end_node(fdt);
> +
> + return res;
Simply "return fdt_end_node(fdt);"?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 07/27] xen/riscv: implement make_timer_node()
2026-03-10 17:08 ` [PATCH v1 07/27] xen/riscv: implement make_timer_node() Oleksii Kurochko
@ 2026-04-01 14:24 ` Jan Beulich
2026-04-10 11:54 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 14:24 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> The make_timer_node() function is implemented to return 0,
> as RISC-V does not require the creation of a timer node.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Yet: Why is this needed then in the first place? If this node may not
be required, why would the function need providing? Shouldn't the call
site be conditional then?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 08/27] xen/riscv: implement make_arch_nodes()
2026-03-10 17:08 ` [PATCH v1 08/27] xen/riscv: implement make_arch_nodes() Oleksii Kurochko
@ 2026-04-01 14:29 ` Jan Beulich
2026-04-10 13:32 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 14:29 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> No RISC-V-specific nodes need to be created at the moment,
> so make_arch_nodes() is implemented to simply return 0.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Same question as on the previous patch, albeit to a lesser degree: An
arch hook may plausibly exist (and be called) uniformly, even if on
some arch-es it doesn't do anything.
> ---
> xen/arch/riscv/Makefile | 1 +
> xen/arch/riscv/dom0less-build.c | 11 +++++++++++
> 2 files changed, 12 insertions(+)
> create mode 100644 xen/arch/riscv/dom0less-build.c
Further, on what basis is this function put in this new file, when the
earlier ones live in domain-build.c?
> --- a/xen/arch/riscv/Makefile
> +++ b/xen/arch/riscv/Makefile
> @@ -1,5 +1,6 @@
> obj-y += aplic.o
> obj-y += cpufeature.o
> +obj-y += dom0less-build.o
obj-$(CONFIG_DOM0LESS_BOOT) += dom0less-build.init.o
just like Arm has it?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node()
2026-03-10 17:08 ` [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node() Oleksii Kurochko
@ 2026-04-01 14:38 ` Jan Beulich
2026-04-10 14:00 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 14:38 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/dom0less-build.c
> +++ b/xen/arch/riscv/dom0less-build.c
> @@ -3,6 +3,15 @@
> #include <xen/fdt-kernel.h>
> #include <xen/init.h>
>
> +#include <asm/intc.h>
> +
> +int __init make_intc_domU_node(struct kernel_info *kinfo)
> +{
> + intc_make_domu_dt_node(kinfo);
> +
> + return 0;
> +}
Is this wrapper really needed? Can't what's intc_make_domu_dt_node() right
now become make_intc_domU_node()?
> @@ -41,6 +41,10 @@ struct intc_hw_operations {
>
> /* handle external interrupt */
> void (*handle_interrupt)(struct cpu_user_regs *regs);
> +
> + /* Create interrupt controller node for domain */
> + int (*make_dom_dt_node)(const struct kernel_info *kinfo,
> + const struct dt_device_node *intc);
An __init-only hook is somewhat risky, just to mention it. In IOMMU code
besides struct iommu_ops we have struct iommu_init_ops, just to give an
example of where the same could have been the case.
> --- a/xen/arch/riscv/intc.c
> +++ b/xen/arch/riscv/intc.c
> @@ -67,3 +67,11 @@ void intc_route_irq_to_xen(struct irq_desc *desc, unsigned int priority)
> intc_set_irq_type(desc, desc->arch.type);
> intc_set_irq_priority(desc, priority);
> }
> +
> +int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
> +{
> + if ( intc_hw_ops && intc_hw_ops->make_dom_dt_node )
> + return intc_hw_ops->make_dom_dt_node(kinfo, intc_hw_ops->info->node);
> +
> + return -ENOSYS;
How do you justify this choice of return value? This isn't even a hypercall
handler.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-03-10 17:08 ` [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains Oleksii Kurochko
@ 2026-04-01 15:05 ` Jan Beulich
2026-04-10 15:40 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 15:05 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> Guests using the IMSIC interrupt controller require a corresponding
> Device Tree description. Add support for generating an IMSIC node when
> building the guest DT.
>
> Keep a reference to the host IMSIC DT node and reuse its compatible
> property while constructing the guest-visible node.
Again raises a migration concern. Presumably a guest would then be able
to migrate only to other hosts with the same compatible property.
> @@ -487,3 +492,111 @@ int __init imsic_init(const struct dt_device_node *node)
>
> return rc;
> }
> +
> +static int __init imsic_make_reg_property(struct domain *d, void *fdt)
> +{
> + __be32 regs[4];
> +
> + regs[0] = cpu_to_be32(imsic_cfg.base_addr >> 32);
> + regs[1] = cpu_to_be32(imsic_cfg.base_addr);
> + regs[2] = cpu_to_be32((IMSIC_MMIO_PAGE_SZ * d->max_vcpus) >> 32);
> + regs[3] = cpu_to_be32(IMSIC_MMIO_PAGE_SZ * d->max_vcpus);
Might be nice to encode as an initializer of the array variable. There don't
look to be any side effects in the expressions used, so Misra should take no
issue with this.
> + return fdt_property(fdt, "reg", regs, sizeof(regs));
> +}
> +
> +static int __init imsic_set_interrupt_extended_prop(struct domain *d,
> + void *fdt)
> +{
> + uint32_t len = 0, pos = 0, cpu, phandle;
At least pos and cpu should be of fixed width types.
> + uint32_t *irq_ext;
> + char buf[64];
Move this one into the loop's scope?
> + int res;
> +
> + irq_ext = xvzalloc_array(uint32_t, d->max_vcpus * 2);
> + if ( !irq_ext )
> + return -ENOMEM;
> +
> + for ( cpu = 0; cpu < d->max_vcpus; cpu++ )
> + {
> + snprintf(buf, sizeof(buf), "/cpus/cpu@%u/interrupt-controller", cpu);
> + phandle = fdt_get_phandle(fdt, fdt_path_offset(fdt, buf));
> +
> + if ( phandle <= 0 )
> + return phandle;
<= is odd to use on an unsigned type. Arm, in a similar situation, uses != .
Yet then - if 0 is an error indicator, aren't you converting this into a
success indicator for the caller?
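A small standalone sketch of the concern (not Xen code). lookup_phandle() is a hypothetical stub which, like libfdt's fdt_get_phandle(), reports failure as 0; the choice of -ENOENT in the second variant is likewise only an assumption.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stub: valid phandles for CPUs 0 and 1, 0 (failure) beyond. */
static uint32_t lookup_phandle(unsigned int cpu)
{
    return cpu < 2 ? 10 + cpu : 0;
}

static uint32_t out_buf[4];

/* As posted: a failed lookup is handed back as 0, which callers read as success. */
static int collect_as_posted(uint32_t *out, unsigned int nr)
{
    for ( unsigned int cpu = 0; cpu < nr; cpu++ )
    {
        uint32_t phandle = lookup_phandle(cpu);

        if ( phandle <= 0 )     /* on an unsigned type only true for 0 */
            return phandle;

        out[cpu] = phandle;
    }

    return 0;
}

/* Variant which turns "no phandle" into an actual error code. */
static int collect_checked(uint32_t *out, unsigned int nr)
{
    for ( unsigned int cpu = 0; cpu < nr; cpu++ )
    {
        uint32_t phandle = lookup_phandle(cpu);

        if ( !phandle )
            return -ENOENT;

        out[cpu] = phandle;
    }

    return 0;
}
```

With three CPUs requested, the first variant reports success despite the third lookup failing, while the second one returns an error.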
> + irq_ext[pos++] = cpu_to_be32(phandle);
> + len += sizeof(uint32_t);
As before, preferably sizeof(<expression>) to clarify the connection. It's
pretty obvious here, but still. Yet then - do you really need to maintain
"len"? Why not use ...
> + irq_ext[pos++] = cpu_to_be32(IRQ_S_EXT);
> + len += sizeof(uint32_t);
> + }
> +
> + res = fdt_property(fdt, "interrupts-extended", irq_ext, len);
... "d->max_vcpus * 2 * sizeof(*irq_ext)" here?
> + XVFREE(irq_ext);
This can be just xvfree(), as the variable goes out of scope just afterwards
anyway.
> + return res;
> +}
> +
> +int __init imsic_make_dt_node(const struct kernel_info *kinfo)
So this function is unused until the next patch. Acceptable as long as Eclair
(or other) scans aren't done for RISC-V, but generally you may want to try to
get used to ordering patches such that this is avoided.
> +{
> + uint32_t len;
> + const void *data = NULL;
Why the initializer?
> + int res = 0;
> + void *fdt = kinfo->fdt;
> + const struct dt_device_node *host_imsic_node = imsic_cfg.host_node;
> + uint32_t *next_phandle = &kinfo->bd.d->arch.next_phandle;
> +
> + res = fdt_begin_node(fdt, host_imsic_node->full_name);
> + if ( res )
> + return res;
> +
> + data = dt_get_property(host_imsic_node, "compatible", &len);
> + if ( !data )
> + {
> + printk(XENLOG_ERR "%s: Can't find 'compatible' property\n",
> + host_imsic_node->full_name);
> +
> + return -ENOENT;
> + }
Move fdt_begin_node() down below here?
> --- a/xen/arch/riscv/include/asm/imsic.h
> +++ b/xen/arch/riscv/include/asm/imsic.h
> @@ -57,11 +57,16 @@ struct imsic_config {
> /* MSI */
> const struct imsic_msi *msi;
>
> + /* DT node of IMSIC */
> + const struct dt_device_node *host_node;
Does "host" in the name carry much of a meaning? Maybe better dt_node, to
(potentially) distinguish it from a NUMA node which also may need tracking
for an imsic at some point?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-03-10 17:08 ` [PATCH v1 11/27] xen/riscv: create APLIC " Oleksii Kurochko
@ 2026-04-01 15:16 ` Jan Beulich
2026-04-13 8:43 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-01 15:16 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> @@ -295,6 +297,85 @@ static void cf_check aplic_set_irq_type(struct irq_desc *desc,
> spin_unlock(&aplic.lock);
> }
>
> +static int __init cf_check aplic_make_dom_dt_node(
> + const struct kernel_info *kinfo,
> + const struct dt_device_node *aplic_node)
Nit: Entirely bogus indentation.
> +{
> + uint32_t len;
> + const __be32 *regs;
> + const void *data = NULL;
Again, why the initializer?
> + int res = 0;
> + void *fdt = kinfo->fdt;
> +
> + res = imsic_make_dt_node(kinfo);
> + if ( res )
> + return res;
> +
> + res = fdt_begin_node(fdt, aplic_node->full_name);
> + if (res)
Nit: Style.
> + return res;
> +
> + data = dt_get_property(aplic_node, "#interrupt-cells", &len);
> + if ( !data )
> + {
> + printk("%s: Can't find '#interrupt-cells' property\n",
> + aplic_node->full_name);
> + return -FDT_ERR_XEN(ENOENT);
> + }
Again, pull this up to be first in the function?
> + res = fdt_property(fdt, "#interrupt-cells", data, len);
> + if ( res )
> + return res;
So host properties are again directly handed through to the guest?
Shouldn't the number of interrupts (aiui that's the "number of cells"
here) a guest gets be independent from the host it runs on?
> + regs = dt_get_property(aplic_node, "reg", &len);
> + if ( !regs )
> + {
> + printk("%s: Can't find 'reg' property\n", aplic_node->full_name);
> + return -FDT_ERR_XEN(ENOENT);
> + }
> +
> + res = fdt_property(fdt, "reg", regs, len);
> + if ( res )
> + return res;
> +
> + data = dt_get_property(aplic_node, "riscv,num-sources", &len);
> + if ( !data )
> + {
> + printk("%s: Can't find 'riscv,num-sources' property\n",
> + aplic_node->full_name);
> + return -FDT_ERR_XEN(ENOENT);
> + }
> +
> + res = fdt_property(fdt, "riscv,num-sources", data, len);
> + if ( res )
> + return res;
Or maybe this is the number of interrupts?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available()
2026-03-10 17:08 ` [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available() Oleksii Kurochko
@ 2026-04-02 9:00 ` Jan Beulich
2026-04-13 9:32 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 9:00 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> aia_init() is going to contain all the stuff related to AIA initialization.
> At the moment, it is just Check if SSAIA extension is available and if yes
> set is_aia_available to true.
And (future) users of aia_available() can't directly call
riscv_isa_extension_available()? Nor can aia_available() be a convenience
wrapper around that call? It's only ...
> --- /dev/null
> +++ b/xen/arch/riscv/aia.c
> @@ -0,0 +1,25 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/errno.h>
> +#include <xen/init.h>
> +#include <xen/sections.h>
> +#include <xen/types.h>
> +
> +#include <asm/cpufeature.h>
> +
> +static bool __ro_after_init is_aia_available;
... a boolean, yes, but still.
> +bool aia_available(void)
> +{
> + return is_aia_available;
> +}
> +
> +int __init aia_init(void)
> +{
> + if ( !riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ssaia) )
> + return -ENODEV;
> +
> + is_aia_available = true;
> +
> + return 0;
> +}
Why the return value, when ...
> --- a/xen/arch/riscv/intc.c
> +++ b/xen/arch/riscv/intc.c
> @@ -8,6 +8,7 @@
> #include <xen/lib.h>
> #include <xen/spinlock.h>
>
> +#include <asm/aia.h>
> #include <asm/intc.h>
>
> static const struct intc_hw_operations *__ro_after_init intc_hw_ops;
> @@ -27,6 +28,8 @@ void __init intc_preinit(void)
>
> void __init intc_init(void)
> {
> + aia_init();
... the sole caller doesn't care?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests
2026-03-10 17:08 ` [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests Oleksii Kurochko
@ 2026-04-02 10:03 ` Jan Beulich
2026-04-13 14:42 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 10:03 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> AIA provides a hardware-accelerated mechanism for delivering external
> interrupts to domains via "guest interrupt files" located in IMSIC.
> A single physical hart can implement multiple such files (up to GEILEN),
> allowing several virtual harts to receive interrupts directly from hardware
Isn't use of such an optimization coming prematurely? Shouldn't this series
focus on getting basic functionality in place?
> --- a/xen/arch/riscv/aia.c
> +++ b/xen/arch/riscv/aia.c
> @@ -1,11 +1,24 @@
> /* SPDX-License-Identifier: GPL-2.0-only */
>
> +#include <xen/bitmap.h>
> #include <xen/errno.h>
> #include <xen/init.h>
> #include <xen/sections.h>
> +#include <xen/sched.h>
> +#include <xen/spinlock.h>
> #include <xen/types.h>
> +#include <xen/xvmalloc.h>
>
> +#include <asm/aia.h>
> #include <asm/cpufeature.h>
> +#include <asm/csr.h>
> +#include <asm/current.h>
> +
> +/*
> + * Bitmap for each physical cpus to detect which VS (guest)
> + * interrupt file id was used.
> + */
> +DEFINE_PER_CPU(struct vgein_bmp, vgein_bmp);
Why can this not be static? All management looks to be in this same file.
> @@ -14,12 +27,109 @@ bool aia_available(void)
> return is_aia_available;
> }
>
> +int __init vgein_init(unsigned int cpu)
If this needs invoking once per CPU being brought up, it can't be __init.
Also - static?
> +{
> + struct vgein_bmp *vgein = &per_cpu(vgein_bmp, cpu);
> +
> + csr_write(CSR_HGEIE, -1UL);
> + vgein->geilen = flsl(csr_read(CSR_HGEIE));
> + csr_write(CSR_HGEIE, 0);
> + if ( vgein->geilen )
> + vgein->geilen--;
I don't understand this. The "len" in "geilen" stands for "length", I suppose,
i.e. the number of bits. Hmm, the spec itself is inconsistent: "The number of
bits implemented in hgeip and hgeie for guest external interrupts is UNSPECIFIED
and may be zero. This number is known as GEILEN." This may or may not include
bit 0 (which is implemented, but r/o zero). Then saying "Hence, if GEILEN is
nonzero, bits GEILEN:1 shall be writable in ..." suggests 0 isn't included, but
that's not unambiguous.
Anyway, may I suggest
vgein->geilen = flsl(csr_read(CSR_HGEIE) >> 1);
instead?
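A standalone sketch comparing the two computations (not Xen code). flsl() here is a stand-in built on the GCC/Clang __builtin_clzl(), and the hgeie argument models the value read back after writing all ones, with bit 0 being read-only zero.

```c
#include <limits.h>

/* Stand-in for flsl(): 1-based index of the highest set bit, 0 if none. */
static unsigned int flsl(unsigned long x)
{
    return x ? sizeof(x) * CHAR_BIT - __builtin_clzl(x) : 0;
}

/* As posted: take the top bit's index, then compensate for bit 0. */
static unsigned int geilen_as_posted(unsigned long hgeie)
{
    unsigned int geilen = flsl(hgeie);

    if ( geilen )
        geilen--;

    return geilen;
}

/* Suggested form: discard bit 0 first, no conditional needed. */
static unsigned int geilen_suggested(unsigned long hgeie)
{
    return flsl(hgeie >> 1);
}
```

For a readback of 0x7e (bits 6:1 writable) both yield 6, and both yield 0 when no guest interrupt files are implemented.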
> + BUG_ON(!vgein->geilen);
You can return (an error, but see the respective remark on the earlier patch),
no need to crash the system. That return may want to come after the printk()
below, though.
> + printk("cpu%d.geilen=%d\n", cpu, vgein->geilen);
As before - %u please with unsigned int.
> + if ( !vgein->bmp )
Why would this check be needed?
> + {
> + vgein->bmp = xvzalloc_array(unsigned long, BITS_TO_LONGS(vgein->geilen));
With the determination above, isn't BITS_TO_LONGS(vgein->geilen) ==
1 in all cases? Surely you don't mean to runtime-allocate
space for a single unsigned long? So I wonder if the dimension used
is wrong.
If it isn't, dynamically allocating the owners array may be more
useful, as it (on RV64) occupies a fixed 512 bytes right now.
> + if ( !vgein->bmp )
> + return -ENOMEM;
> + }
> +
> + spin_lock_init(&vgein->lock);
> +
> + return 0;
> +}
> +
> int __init aia_init(void)
> {
> + int rc = 0;
> +
> if ( !riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ssaia) )
> return -ENODEV;
>
> + if ( (rc = vgein_init(0)) )
> + return rc;
> +
> is_aia_available = true;
Ah, this answers a question of mine on the earlier patch: This boolean
indicates more than just the extension being available. But why does
the description there not simply say so? How am I as a reviewer supposed
to know?
> - return 0;
> + return rc;
> +}
> +
> +unsigned int vgein_assign(struct vcpu *v)
> +{
> + unsigned int vgein_id;
> +
Seemingly undue blank line.
> + struct vgein_bmp *vgein_bmp = &per_cpu(vgein_bmp, v->processor);
> + unsigned long *bmp = vgein_bmp->bmp;
> + unsigned long flags;
> +
> + spin_lock_irqsave(&vgein_bmp->lock, flags);
> + vgein_id = bitmap_weight(bmp, vgein_bmp->geilen);
How can the ID to use be the number of bits which are set? This only works
if all set bits are contiguous at the bottom.
> + /*
> + * All vCPU guest interrupt files are used and we don't support a case
> + * when number of vCPU on 1 pCPU is bigger then geilen.
> + */
This wants checking in vgein_init() then. CPUs (beyond the boot one)
violating this should not be brought online.
> + ASSERT(vgein_id < vgein_bmp->geilen);
What if no bit is available? By asserting, you assume the caller will not
call here when no ID is available. Yet there is no caller of this function,
so how can one verify whether this assertion is appropriate?
> + bitmap_set(bmp, vgein_id, 1);
__set_bit()?
> + spin_unlock_irqrestore(&vgein_bmp->lock, flags);
> +
> + /*
> + * The vgein_id shouldn't be zero, as it will indicate that no guest
> + * external interrupt source is selected for VS-level external interrupts
> + * according to RISC-V priviliged spec:
> + * 8.2.1 Hypervisor Status Register (hstatus) in RISC-V priviliged spec:
Please avoid section numbers in such references. The section of this name
in the version I'm looking at is 21.2.1.
> + * The VGEIN (Virtual Guest External Interrupt Number) field selects
> + * a guest external interrupt source for VS-level external interrupts.
> + * VGEIN is a WLRL field that must be able to hold values between zero
> + * and the maximum guest external interrupt number (known as GEILEN),
> + * inclusive.
> + * When VGEIN=0, no guest external interrupt source is selected for
> + * VS-level external interrupts.
> + */
> + vgein_id++;
Related to my comment regarding GEILEN, this shouldn't be necessary. Keep
bits in their natural positions, and simply avoid using bit 0 (either by
setting it during init and then never clearing it, or by starting the
scan for clear bits at bit 1).
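A standalone sketch contrasting the two allocation schemes on a single-word bitmap (not Xen code; all names are hypothetical, locking is omitted, geilen is assumed to be smaller than the word width, and __builtin_popcountl() is a GCC/Clang builtin standing in for bitmap_weight()).

```c
/* As posted: ID = number of set bits, then shifted up by one. */
static unsigned int assign_by_weight(unsigned long *bmp, unsigned int geilen)
{
    unsigned int bit = __builtin_popcountl(*bmp & ((1UL << geilen) - 1));

    *bmp |= 1UL << bit;

    return bit + 1;
}

/* Natural positions: bit N tracks VGEIN N, bit 0 is never handed out. */
static unsigned int assign_first_free(unsigned long *bmp, unsigned int geilen)
{
    for ( unsigned int id = 1; id <= geilen; id++ )
        if ( !(*bmp & (1UL << id)) )
        {
            *bmp |= 1UL << id;
            return id;
        }

    return 0;                   /* nothing free */
}

static void release_bit(unsigned long *bmp, unsigned int bit)
{
    *bmp &= ~(1UL << bit);
}

/* Allocate three IDs, release the middle one, allocate again. */
static unsigned int reuse_after_release(int by_weight)
{
    unsigned long bmp = 0;
    unsigned int ids[3];

    for ( unsigned int i = 0; i < 3; i++ )
        ids[i] = by_weight ? assign_by_weight(&bmp, 8)
                           : assign_first_free(&bmp, 8);

    /* The weight-based scheme stores ID n in bit n - 1. */
    release_bit(&bmp, by_weight ? ids[1] - 1 : ids[1]);

    return by_weight ? assign_by_weight(&bmp, 8)
                     : assign_first_free(&bmp, 8);
}
```

After IDs 1, 2, 3 are handed out and 2 is released, the weight-based variant returns 3 again, which is still in use, whereas the scan returns the freed ID 2.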
> +#ifdef VGEIN_DEBUG
> + printk("%s: %pv: vgein_id(%u), xen_cpu%d_bmp=%#lx\n",
> + __func__, v, vgein_id, v->processor, *bmp);
> +#endif
> +
> + vcpu_guest_cpu_user_regs(v)->hstatus &= ~HSTATUS_VGEIN;
> + vcpu_guest_cpu_user_regs(v)->hstatus |=
> + MASK_INSR(vgein_id, HSTATUS_VGEIN);
When is this function going to be invoked? (As before, not knowing this is
one of the problems with introducing functions with no callers.)
> + return vgein_id;
> +}
> +
> +void vgein_release(struct vcpu *v, unsigned int vgen_id)
> +{
> + unsigned long flags;
> +
Another seemingly stray blank line.
> + struct vgein_bmp *vgein_bmp = &per_cpu(vgein_bmp, v->processor);
> +
> + spin_lock_irqsave(&vgein_bmp->lock, flags);
> + bitmap_clear(vgein_bmp->bmp, vgen_id - 1, 1);
__clear_bit()?
> + spin_unlock_irqrestore(&vgein_bmp->lock, flags);
> +
> +#ifdef VGEIN_DEBUG
> + printk("%s: vgein_id(%u), xen_cpu%d_bmp=%#lx\n",
> + __func__, vgen_id, v->processor, *vgein_bmp->bmp);
I can't spot a difference from the message in vgein_assign(). How is one
to distinguish the two in a log?
> --- a/xen/arch/riscv/include/asm/aia.h
> +++ b/xen/arch/riscv/include/asm/aia.h
> @@ -3,8 +3,26 @@
> #ifndef ASM__RISCV__AIA_H
> #define ASM__RISCV__AIA_H
>
> +#include <xen/percpu.h>
> +#include <xen/spinlock.h>
> +
> +struct vcpu;
> +
> +struct vgein_bmp {
What does the _bmp suffix indicate here? There's ...
> + unsigned long *bmp;
... a bitmap field, yes, but ...
> + spinlock_t lock;
> + struct vcpu *owners[BITS_PER_LONG];
> + unsigned int geilen;
> +};
... the structure as a whole has quite a bit more.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state
2026-03-10 17:08 ` [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state Oleksii Kurochko
@ 2026-04-02 11:31 ` Jan Beulich
2026-04-14 9:22 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 11:31 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> Each vCPU interacting with the IMSIC requires state to track the
> associated guest interrupt file and its backing context.
>
> Introduce a per-vCPU structure to hold IMSIC-related state, including
> the guest interrupt file identifier and the CPU providing the backing
> VS-file. Access to the guest file identifier is protected by a lock.
>
> Initialize this structure during vCPU setup and store it in arch_vcpu.
> The initial state marks the VS-file as software-backed until it becomes
> associated with a physical CPU.
>
> Add helpers to retrieve and update the guest interrupt file identifier.
Yet again functions with no callers.
> --- a/xen/arch/riscv/imsic.c
> +++ b/xen/arch/riscv/imsic.c
> @@ -59,6 +59,29 @@ do { \
> csr_clear(CSR_SIREG, v); \
> } while (0)
>
> +unsigned int vcpu_guest_file_id(const struct vcpu *v)
> +{
> + struct imsic_state *imsic_state = v->arch.imsic_state;
> + unsigned long flags;
> + unsigned int vsfile_id;
> +
> + read_lock_irqsave(&imsic_state->vsfile_lock, flags);
> + vsfile_id = imsic_state->guest_file_id;
> + read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
What purpose does this locking have? Already ...
> + return vsfile_id;
... here the value can be stale, if indeed there is a chance of races.
Did you perhaps mean to use ACCESS_ONCE() here and where the value is
set?
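A standalone sketch of what lock-free accessors could look like (not Xen code). The ACCESS_ONCE() below is a simplified stand-in relying on GNU C __typeof__, and the struct and function names are hypothetical. This only ensures a single, untorn access; it does not make the returned value any less stale than with the lock.

```c
/* Simplified stand-in for the ACCESS_ONCE() macro. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical cut-down per-vCPU state. */
struct vimsic_state {
    unsigned int guest_file_id;
};

static struct vimsic_state demo_state;

/* Reader: exactly one load, which the compiler may not split or repeat. */
static unsigned int get_guest_file_id(const struct vimsic_state *s)
{
    return ACCESS_ONCE(s->guest_file_id);
}

/* Writer: exactly one store, pairing with the reader above. */
static void set_guest_file_id(struct vimsic_state *s, unsigned int id)
{
    ACCESS_ONCE(s->guest_file_id) = id;
}
```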
> @@ -315,6 +338,25 @@ static int imsic_parse_node(const struct dt_device_node *node,
> return 0;
> }
>
> +int __init vcpu_imsic_init(struct vcpu *v)
__init for a function involved in setting up a vCPU?
> +{
> + struct imsic_state *imsic_state;
> +
> + /* Allocate IMSIC context */
> + imsic_state = xvzalloc(struct imsic_state);
> + if ( !imsic_state )
> + return -ENOMEM;
> +
> + v->arch.imsic_state = imsic_state;
> +
> + /* Setup IMSIC context */
> + rwlock_init(&imsic_state->vsfile_lock);
> +
> + imsic_state->guest_file_id = imsic_state->vsfile_pcpu = NR_CPUS;
Iirc Misra dislikes such double assignments, so better avoid them right away.
(As per a comment at the bottom this may need splitting anyway.)
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -52,6 +52,8 @@ struct arch_vcpu {
>
> struct vtimer vtimer;
>
> + struct imsic_state *imsic_state;
Just like it's "vtimer", perhaps also "vimsic_state" for both the field
and the struct tag?
> @@ -64,8 +65,20 @@ struct imsic_config {
> spinlock_t lock;
> };
>
> +struct imsic_state {
> + /* IMSIC VS-file */
> + rwlock_t vsfile_lock;
> + unsigned int guest_file_id;
> + /*
> + * (vsfile_pcpu >= 0) => h/w IMSIC VS-file
> + * (vsfile_pcpu == NR_CPUS) => s/w IMSIC SW-file
> + */
> + unsigned long vsfile_pcpu;
And why unsigned long, when unsigned int will do (as about everywhere else
for CPU numbers)? That'll also shrink the structure size by 8 bytes.
As to the comment - as per vcpu_imsic_init() NR_CPUS also has some special
meaning for guest_file_id, yet there's no comment there. How do file ID and
NR_CPUS fit together anyway?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support
2026-03-10 17:08 ` [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support Oleksii Kurochko
@ 2026-04-02 11:58 ` Jan Beulich
2026-04-14 10:27 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 11:58 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> @@ -47,6 +48,19 @@ struct intc_hw_operations {
> const struct dt_device_node *intc);
> };
>
> +struct vintc_ops {
> + /* Initialize some vINTC-related stuff for a vCPU */
> + int (*vcpu_init)(struct vcpu *vcpu);
v as the parameter name, to fit our convention? (Same below for the other
hook.)
> + /* Check if a register is virtual interrupt controller MMIO */
> + int (*is_access)(const struct vcpu *vcpu, const unsigned long addr);
What does "register" in the comment refer to? All I see is an address.
(The const will also want dropping from the parameter in this declaration.)
> +};
> +
> +struct vintc {
> + const struct intc_info *info;
Isn't this referencing a physical INTC's structure? Why would the virtual
one's properties have to match that of the physical one?
> --- /dev/null
> +++ b/xen/arch/riscv/include/asm/vaplic.h
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * xen/arch/riscv/vaplic.c
> + *
> + * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
> + *
> + * Copyright (c) Microchip.
> + */
> +
> +#ifndef ASM__RISCV__VAPLIC_H
> +#define ASM__RISCV__VAPLIC_H
> +
> +#include <xen/kernel.h>
> +#include <xen/types.h>
> +
> +#include <asm/intc.h>
> +
> +struct domain;
> +
> +#define to_vaplic(v) container_of(v, struct vaplic, base)
I'm confused here, maybe first of all because of the use of v. v is our
common identifier for struct vcpu * instances. Using it in a macro like
this one suggests a struct vcpu * needs passing into the macro. Yet from
the two uses of the macro that doesn't look to be the case.
Perhaps best to have a struct domain * passed into here?
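As an illustration of the alternative, a standalone sketch of a to_vaplic() taking the domain rather than the embedded structure. All types here are simplified stand-ins (struct domain_sketch, the single-member struct vintc), and container_of() is re-defined locally instead of coming from Xen's headers.

```c
#include <stddef.h>

/* Local re-definition; Xen provides its own container_of(). */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-ins for the structures in the patch. */
struct vintc { int placeholder; };
struct vaplic_regs { unsigned int domaincfg; };

struct vaplic {
    struct vintc base;          /* embedded generic part */
    struct vaplic_regs regs;
};

struct domain_sketch { struct vintc *vintc; };

/* Takes the domain, so no "v" that could be mistaken for a struct vcpu *. */
#define to_vaplic(d) container_of((d)->vintc, struct vaplic, base)
```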
> +struct vaplic_regs {
> + uint32_t domaincfg;
> + uint32_t smsiaddrcfg;
> + uint32_t smsiaddrcfgh;
The latter two aren't used, and generally I'd expect a h-suffixed field to
exist only for RV32. (The un-suffixed field then would need to be unsigned
long, of course.)
> +};
> +
> +struct vaplic {
> + struct vintc base;
How does "base" fit with the type of the field?
> --- a/xen/arch/riscv/intc.c
> +++ b/xen/arch/riscv/intc.c
> @@ -6,6 +6,7 @@
> #include <xen/init.h>
> #include <xen/irq.h>
> #include <xen/lib.h>
> +#include <xen/sched.h>
> #include <xen/spinlock.h>
Why is this change needed all of a sudden?
> --- /dev/null
> +++ b/xen/arch/riscv/vaplic.c
> @@ -0,0 +1,74 @@
> +/* SPDX-License-Identifier: MIT */
> +/*
> + * xen/arch/riscv/vaplic.c
> + *
> + * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
> + *
> + * Copyright (c) Microchip.
> + * Copyright (c) Vates
> + */
> +
> +#include <xen/errno.h>
> +#include <xen/sched.h>
> +#include <xen/xvmalloc.h>
> +
> +#include <asm/aia.h>
> +#include <asm/imsic.h>
> +#include <asm/intc.h>
> +#include <asm/vaplic.h>
> +
> +#include "aplic-priv.h"
> +
> +static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
> +{
> + int rc = 0;
> +
> + rc = vcpu_imsic_init(v);
> + if ( rc )
> + return rc;
> +
> + imsic_set_guest_file_id(v, vgein_assign(v));
And vgein_assign() can't fail? (Rhetorical question - of course it can. That
function shouldn't assert that it can find a valid ID.)
But then - aren't you limiting the number of vCPU-s a host can handle by the
number of vgein IDs?
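For illustration, a minimal sketch of an ID allocator that reports exhaustion to its caller instead of asserting. The names guest_file_alloc() / guest_file_free() and the limit NR_GUEST_FILES are hypothetical, and locking is left out entirely; the sketch only assumes that ID 0 means "no guest interrupt file", so valid IDs start at 1.

```c
#include <errno.h>

#define NR_GUEST_FILES 4U       /* hypothetical number of guest interrupt files */

static unsigned int used_mask;  /* bit n set => ID n + 1 is in use */

/* Returns an ID in [1, NR_GUEST_FILES], or -ENOSPC when none is free. */
static int guest_file_alloc(void)
{
    for ( unsigned int i = 0; i < NR_GUEST_FILES; i++ )
        if ( !(used_mask & (1U << i)) )
        {
            used_mask |= 1U << i;
            return (int)i + 1;
        }

    return -ENOSPC;
}

/* Releases an ID obtained from guest_file_alloc(); other values are ignored. */
static void guest_file_free(int id)
{
    if ( id >= 1 && id <= (int)NR_GUEST_FILES )
        used_mask &= ~(1U << (id - 1));
}
```

A caller such as the vCPU init hook would then propagate the negative value rather than storing it as a file ID.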
> + return rc;
> +}
> +
> +static const struct vintc_ops vaplic_ops = {
> + .vcpu_init = vcpu_vaplic_init,
> +};
> +
> +static struct vintc * __init vaplic_alloc(void)
> +{
> + struct vaplic *v = NULL;
Once again - why the initializer? In fact, ...
> + v = xvzalloc(struct vaplic);
... this could be the initializer.
> + if ( !v )
> + return NULL;
> +
> + return &v->base;
> +}
If you returned and ...
> +int __init domain_vaplic_init(struct domain *d)
> +{
> + int ret = 0;
> +
> + d->arch.vintc = vaplic_alloc();
... stored struct vaplic *, the slightly odd to_vaplic() macro wouldn't
be needed.
> + if ( !d->arch.vintc )
> + {
> + ret = -ENOMEM;
> + goto fail;
Nit: goto when simply return could be used.
> + }
> +
> + d->arch.vintc->ops = &vaplic_ops;
Are other kinds of ops structures going to appear? If not, why the extra
indirection?
> + to_vaplic(d->arch.vintc)->regs.domaincfg =
> + APLIC_DOMAINCFG_IE | APLIC_DOMAINCFG_DM;
> +
> + fail:
> + return ret;
> +}
> +
> +void __init domain_vaplic_deinit(struct domain *d)
> +{
> + struct vaplic *vaplic = to_vaplic(d->arch.vintc);
> +
> + XVFREE(vaplic);
If this cleared the struct domain field, then yes. But the way it is, just
xvfree() will suffice. (Re-work following other remarks may want it to
become XVFREE() again, though.)
Jan
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-03-10 17:08 ` [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough Oleksii Kurochko
@ 2026-04-02 12:22 ` Jan Beulich
2026-04-14 11:29 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 12:22 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/setup.h
> +++ b/xen/arch/riscv/include/asm/setup.h
> @@ -5,6 +5,10 @@
>
> #include <xen/types.h>
>
> +struct domain;
> +struct dt_device_node;
> +struct rangeset;
> +
> #define max_init_domid (0)
>
> void setup_mm(void);
> @@ -13,6 +17,19 @@ void copy_from_paddr(void *dst, paddr_t paddr, unsigned long len);
>
> void init_csr_masks(void);
>
> +/* TODO: move somewhere to common header? */
Counter question: Why ...
> +/*
> + * Retrieves the interrupts configuration from a device tree node and maps
> + * those interrupts to the target domain.
> + *
> + * Returns:
> + * < 0 error
> + * 0 success
> + */
> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
> + bool need_mapping,
> + struct rangeset *irq_ranges);
... is this not an inline function, when ...
> --- a/xen/arch/riscv/intc.c
> +++ b/xen/arch/riscv/intc.c
> @@ -79,3 +79,11 @@ int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
>
> return -ENOSYS;
> }
> +
> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
> + bool need_mapping,
> + struct rangeset *irq_ranges)
> +{
> + return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
> + irq_ranges);
> +}
... it's merely a wrapper around an indirect function call? And then the
function isn't used anywhere anyway.
> --- a/xen/arch/riscv/vaplic.c
> +++ b/xen/arch/riscv/vaplic.c
> @@ -9,6 +9,7 @@
> */
>
> #include <xen/errno.h>
> +#include <xen/iocap.h>
> #include <xen/sched.h>
> #include <xen/xvmalloc.h>
>
> @@ -19,6 +20,113 @@
>
> #include "aplic-priv.h"
>
> +struct vaplic_priv {
> + /* Contains a legal interrupts for a domain */
> + uint32_t auth_irq_bmp[APLIC_NUM_REGS];
> +};
Assuming the plural is what is intended, the first "a" wants dropping
from the comment.
> +static bool is_irq_shared_among_domains(const struct domain *d,
> + const unsigned int irq_num)
> +{
> + struct domain *tmp;
const
> + unsigned int reg_num = irq_num / APLIC_NUM_REGS;
> + unsigned int bit_pos = irq_num % APLIC_NUM_REGS;
> +
> + for_each_domain ( tmp )
See other uses of the construct for how this needs synchronizing. But looping
over all domains looks pretty inefficient anyway for ...
> + {
> + uint32_t *auth_irq_bmp;
> +
> + if ( tmp == d )
> + continue;
> +
> + auth_irq_bmp = tmp->arch.vintc->private;
> +
> + if ( auth_irq_bmp[reg_num] & BIT(bit_pos, U) )
> + {
> + printk("%s: irq%d is shared between %pd and %pd\n", __func__,
> + irq_num, tmp, d);
> +
> + return true;
> + }
... the intended purpose. If IRQs can't be shared, can't you maintain global
state of which ones are in use _somewhere_?
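One possible shape of such global state, as a standalone sketch: a single bitmap with claim/release helpers, so that a conflict is detected without walking all domains. The names (irq_claim(), irq_release(), NR_IRQS_SKETCH) are made up for the example, and the synchronization a real implementation would need is omitted.

```c
#include <errno.h>
#include <limits.h>

#define NR_IRQS_SKETCH 1024U    /* hypothetical number of IRQs */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)

/* One bit per IRQ: set while some domain owns the IRQ. */
static unsigned long irq_in_use[(NR_IRQS_SKETCH + BITS_PER_WORD - 1) /
                                BITS_PER_WORD];

/* Claim an IRQ: 0 on success, -EBUSY if it is already owned. */
static int irq_claim(unsigned int irq)
{
    unsigned long mask;

    if ( irq >= NR_IRQS_SKETCH )
        return -EINVAL;

    mask = 1UL << (irq % BITS_PER_WORD);
    if ( irq_in_use[irq / BITS_PER_WORD] & mask )
        return -EBUSY;

    irq_in_use[irq / BITS_PER_WORD] |= mask;

    return 0;
}

/* Give an IRQ back, e.g. during domain cleanup. */
static void irq_release(unsigned int irq)
{
    if ( irq < NR_IRQS_SKETCH )
        irq_in_use[irq / BITS_PER_WORD] &= ~(1UL << (irq % BITS_PER_WORD));
}
```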
> + }
> +
> + return false;
> +}
This together with ...
> +int vaplic_map_device_irqs_to_domain(struct domain *d,
> + struct dt_device_node *dev,
> + bool need_mapping,
> + struct rangeset *irq_ranges)
> +{
> + unsigned int i, nirq;
> + int res, irq;
> + struct dt_raw_irq rirq;
> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
> + unsigned int reg_num;
> +
> + nirq = dt_number_of_irq(dev);
> +
> + /* Give permission and map IRQs */
> + for ( i = 0; i < nirq; i++ )
> + {
> + res = dt_device_get_raw_irq(dev, i, &rirq);
> + if ( res )
> + {
> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
> + i, dt_node_full_name(dev));
> + return res;
> + }
> +
> + /*
> + * Don't map IRQ that have no physical meaning
> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
> + */
> + if ( rirq.controller != dt_interrupt_controller )
> + {
> + dt_dprintk("irq %u not connected to primary controller."
> + "Connected to %s\n", i,
> + dt_node_full_name(rirq.controller));
> + continue;
> + }
> +
> + irq = platform_get_irq(dev, i);
> + if ( irq < 0 )
> + {
> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
> + return irq;
> + }
> +
> + res = irq_permit_access(d, irq);
> + if ( res )
> + {
> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
> + irq);
This time the other way around: %d please with plain int. (Again at least
once further down.)
> + return res;
> + }
> +
> + reg_num = irq / APLIC_NUM_REGS;
> +
> + if ( is_irq_shared_among_domains(d, irq) )
> + {
> + printk("%s: Shared IRQ isn't supported\n", __func__);
> + return -EINVAL;
> + }
> +
> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
... all of this leaves me with the impression that IRQ numbering isn't really
virtualized. IRQs are merely split into groups, one group per domain (and
maybe some unused). How are you going to fit in truly virtual IRQs?
> + dt_dprintk(" - IRQ: %u\n", irq);
> +
> + if ( irq_ranges )
> + {
> + res = rangeset_add_singleton(irq_ranges, irq);
> + if ( res )
> + return res;
> + }
What is irq_ranges?
> @@ -34,6 +142,7 @@ static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
>
> static const struct vintc_ops vaplic_ops = {
> .vcpu_init = vcpu_vaplic_init,
> + .map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
> };
What about the inverse function, needed for domain cleanup?
Jan
* Re: [PATCH v1 17/27] xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h.
2026-03-10 17:08 ` [PATCH v1 17/27] xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h Oleksii Kurochko
@ 2026-04-02 12:51 ` Jan Beulich
2026-04-14 11:42 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 12:51 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> These definitions are required for correct decoding of APLIC MMIO
> accesses and target configuration, and will be used by both the
> physical and virtual APLIC implementations.
>
> No functional change is intended by this patch; it only centralises
> hardware definitions that were previously missing.
>
> Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
I can commit this as is (with my ack added), but I'd like you to consider
again my remark elsewhere as to ...
> --- a/xen/arch/riscv/include/asm/aplic.h
> +++ b/xen/arch/riscv/include/asm/aplic.h
> @@ -17,6 +17,11 @@
>
> #define APLIC_NUM_REGS 32
>
> +#define APLIC_REG_OFFSET_MASK 0x3FFF
> +#define APLIC_TARGET_IPRIO_MASK 0xFF
> +#define APLIC_TARGET_GUEST_IDX_SHIFT 12
> +#define APLIC_TARGET_EIID_MASK 0x7FF
> +
> #define APLIC_DOMAINCFG_IE BIT(8, U)
> #define APLIC_DOMAINCFG_DM BIT(2, U)
>
> @@ -27,6 +32,36 @@
> #define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
> #define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
>
> +#define APLIC_DOMAINCFG 0x0000
> +#define APLIC_SOURCECFG_BASE 0x0004
> +#define APLIC_SOURCECFG_LAST 0x0FFC
> +
> +#define APLIC_SMSICFGADDR 0x1BC8
> +#define APLIC_SMSICFGADDRH 0x1BCC
> +
> +#define APLIC_SETIP_BASE 0x1C00
> +#define APLIC_SETIP_LAST 0x1C7C
> +#define APLIC_SETIPNUM 0x1CDC
> +
> +#define APLIC_CLRIP_BASE 0x1D00
> +#define APLIC_CLRIP_LAST 0x1D7C
> +#define APLIC_CLRIPNUM 0x1DDC
> +
> +#define APLIC_SETIE_BASE 0x1E00
> +#define APLIC_SETIE_LAST 0x1E7C
> +#define APLIC_SETIENUM 0x1EDC
> +
> +#define APLIC_CLRIE_BASE 0x1F00
> +#define APLIC_CLRIE_LAST 0x1F7C
> +#define APLIC_CLRIENUM 0x1FDC
> +
> +#define APLIC_SETIPNUM_LE 0x2000
> +
> +#define APLIC_GENMSI 0x3000
> +
> +#define APLIC_TARGET_BASE 0x3004
> +#define APLIC_TARGET_LAST 0x3FFC
... the case of hex digits vs the need to have L (and hence U) suffixes in
upper case for Misra's sake. Those suffixes are easier to spot as such when
the hex digits use lower case. Then even for un-suffixed numbers lower case
is preferable imo, for overall consistency.
Jan
* Re: [PATCH v1 18/27] xen/riscv: add vaplic access check
2026-03-10 17:08 ` [PATCH v1 18/27] xen/riscv: add vaplic access check Oleksii Kurochko
@ 2026-04-02 13:10 ` Jan Beulich
2026-04-14 11:45 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 13:10 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/aplic.c
> +++ b/xen/arch/riscv/aplic.c
> @@ -38,6 +38,7 @@ static struct aplic_priv aplic = {
>
> static struct intc_info __ro_after_init aplic_info = {
> .hw_version = INTC_APLIC,
> + .private = &aplic,
Isn't this the host instance again? How can you ...
> --- a/xen/arch/riscv/vaplic.c
> +++ b/xen/arch/riscv/vaplic.c
> @@ -127,6 +127,20 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
> return 0;
> }
>
> +static int cf_check vaplic_is_access(const struct vcpu *vcpu,
> + const unsigned long addr)
> +{
> + const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
> + const struct aplic_priv *priv = vaplic->base.info->private;
> + const paddr_t paddr_end = priv->paddr_start + priv->size;
> +
> + /* check if it is an APLIC access */
> + if ( priv->paddr_start <= addr && addr < paddr_end )
... use that here? Or asked differently, again: Where's the virtualization,
i.e. the abstraction away from host properties?
Furthermore, is it really sufficient to check just the starting address of
an access? Shouldn't the last byte accessed also fall into the range in
question?
> + return 1;
> +
> + return 0;
> +}
This function looks to want to return bool (and then use true/false).
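A standalone sketch combining both remarks (bool return type, and checking the last byte of the access as well). The helper name access_in_range() and the explicit length parameter are assumptions; the hook in the patch currently receives only an address.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True only if all 'len' bytes starting at 'addr' lie inside
 * [start, start + size). Written so that no addition can wrap.
 */
static bool access_in_range(uint64_t addr, uint64_t len,
                            uint64_t start, uint64_t size)
{
    if ( len == 0 || len > size )
        return false;

    return addr >= start && addr - start <= size - len;
}
```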
Jan
* Re: [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO
2026-03-10 17:08 ` [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO Oleksii Kurochko
@ 2026-04-02 14:18 ` Jan Beulich
2026-04-14 16:04 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 14:18 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/vaplic.c
> +++ b/xen/arch/riscv/vaplic.c
> @@ -20,6 +20,16 @@
>
> #include "aplic-priv.h"
>
> +#define APLIC_REG_GET(addr, offset) \
> + readl((void *)((vaddr_t)(addr) + offset))
> +#define APLIC_REG_SET(addr, offset, value) \
> + writel(value, (void *)((vaddr_t)(addr) + offset))
Why is addr properly parenthesized, but offset isn't?
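To show why this matters, a small standalone sketch with two made-up macros that differ only in whether the second parameter is parenthesized. Passing a conditional expression as the offset changes the result, because '+' binds tighter than '?:'. Compilers may warn about the loose variant, which is rather the point.

```c
#define ADDR_LOOSE(base, offset) ((base) + offset)      /* offset unprotected */
#define ADDR_TIGHT(base, offset) ((base) + (offset))

static unsigned long loose(unsigned long base, int big)
{
    /* Expands to ((base) + big) ? 8 : 4, i.e. the base is lost. */
    return ADDR_LOOSE(base, big ? 8 : 4);
}

static unsigned long tight(unsigned long base, int big)
{
    /* Expands to (base) + (big ? 8 : 4), as intended. */
    return ADDR_TIGHT(base, big ? 8 : 4);
}
```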
> +#define AUTH_IRQ_BIT(irqnum) (auth_irq_bmp[(irqnum) / APLIC_NUM_REGS] & \
> + BIT((irqnum) % APLIC_NUM_REGS, U))
> +
> +#define regval_to_irqn(reg_val) ((reg_val) / sizeof(uint32_t))
I'm trying to make sense of the division here, but I think the main issue
is with naming: It's not a "register value" which is passed into here, but
a register index (offset from a range's base register).
> @@ -127,6 +137,164 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
> return 0;
> }
>
> +static void vaplic_dm_update_target(const unsigned long hart_id, uint32_t *iprio)
> +{
> + *iprio &= APLIC_TARGET_IPRIO_MASK;
> + *iprio |= (hart_id << APLIC_TARGET_HART_IDX_SHIFT);
> +}
> +
> +static void vaplic_update_target(const struct imsic_config *imsic,
> + const int guest_id,
> + const unsigned long hart_id, uint32_t *value)
> +{
> + unsigned long group_index;
> + unsigned int hhxw = imsic->group_index_bits;
> + unsigned int lhxw = imsic->hart_index_bits;
> + unsigned int hhxs = imsic->group_index_shift - IMSIC_MMIO_PAGE_SHIFT * 2;
> + unsigned long base_ppn = imsic->msi[hart_id].base_addr >> IMSIC_MMIO_PAGE_SHIFT;
> +
> + group_index = (base_ppn >> (hhxs + 12)) & (BIT(hhxw, UL) - 1);
And there's no constant available to make this literal 12 more descriptive?
> + *value &= APLIC_TARGET_EIID_MASK;
> + *value |= guest_id << APLIC_TARGET_GUEST_IDX_SHIFT;
> + *value |= hart_id << APLIC_TARGET_HART_IDX_SHIFT;
> + *value |= group_index << (lhxw + APLIC_TARGET_HART_IDX_SHIFT) ;
> +}
Both functions returning void right now, why would they need to return their
result via indirection?
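A sketch of the by-value form for the simpler of the two helpers. APLIC_TARGET_IPRIO_MASK is taken from the earlier patch in this series; the value 18 for APLIC_TARGET_HART_IDX_SHIFT is an assumption here, since its definition isn't part of the quoted hunks, and the function name is made up.

```c
#include <stdint.h>

#define APLIC_TARGET_IPRIO_MASK     0xffU
#define APLIC_TARGET_HART_IDX_SHIFT 18  /* assumed; not shown in the hunks */

/* Returns the new target value instead of updating it through a pointer. */
static uint32_t vaplic_direct_target(unsigned long hart_id, uint32_t value)
{
    return (value & APLIC_TARGET_IPRIO_MASK) |
           ((uint32_t)hart_id << APLIC_TARGET_HART_IDX_SHIFT);
}
```

The call site then becomes a plain assignment, value = vaplic_direct_target(hart, value), with no address-of needed.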
> +#define CALC_REG_VALUE(base) \
> +{ \
> + uint32_t index; \
> + uint32_t tmp_val; \
Combine these two, or have the variables have initializers?
> + index = regval_to_irqn(offset - base); \
There's no "offset" declared or passed into here, nor ...
> + tmp_val = APLIC_REG_GET(priv->regs, aplic_addr) & ~auth_irq_bmp[index]; \
... "priv", nor ...
> + value &= auth_irq_bmp[index]; \
> + value |= tmp_val; \
... "value". It may remain like this, but then it wants putting inside the
sole function that uses it, and be #undef-ed at the end of the function.
> +}
Please wrap in do/while(0), for use sites to be required to have semicolons
(and hence look like normal statements). Or make it a statement expression
properly returning the calculated value.
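For reference, a standalone sketch of the do/while(0) form. Unlike the macro in the patch, this hypothetical MERGE_AUTHORISED() takes everything it touches as parameters; the merge logic itself (keep authorised bits from the written value, all other bits from the current register content) mirrors what the quoted macro does.

```c
#include <stdint.h>

/*
 * Keep only the authorised bits ('mask') from 'val' and take all other
 * bits from the current register content 'cur'.
 */
#define MERGE_AUTHORISED(val, cur, mask) do {   \
    uint32_t keep_ = (cur) & ~(mask);           \
    (val) = ((val) & (mask)) | keep_;           \
} while ( 0 )

static uint32_t merge(uint32_t val, uint32_t cur, uint32_t mask, int enable)
{
    /* Thanks to do/while(0) the macro is one statement, safe in if/else. */
    if ( enable )
        MERGE_AUTHORISED(val, cur, mask);
    else
        val = cur;

    return val;
}
```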
> +static int cf_check vaplic_emulate_store(const struct vcpu *vcpu,
> + unsigned long addr, uint32_t value)
> +{
> + struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
> + struct aplic_priv *priv = vaplic->base.info->private;
> + uint32_t offset = addr & APLIC_REG_OFFSET_MASK;
See ./CODING_STYLE as to uses of fixed-width types.
> + unsigned long aplic_addr = addr - priv->paddr_start;
> + const uint32_t *auth_irq_bmp = vcpu->domain->arch.vintc->private;
> +
> + switch ( offset )
> + {
> + case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
And (taking this just as example) any misaligned accesses falling in this range
are fine?
> + CALC_REG_VALUE(APLIC_SETIP_BASE);
> + break;
> +
> + case APLIC_CLRIP_BASE ... APLIC_CLRIP_LAST:
> + CALC_REG_VALUE(APLIC_CLRIP_BASE);
> + break;
> +
> + case APLIC_SETIE_BASE ... APLIC_SETIE_LAST:
> + CALC_REG_VALUE(APLIC_SETIE_BASE);
> + break;
> +
> + case APLIC_CLRIE_BASE ... APLIC_CLRIE_LAST:
> + CALC_REG_VALUE(APLIC_CLRIE_BASE);
> + break;
> +
> + case APLIC_SOURCECFG_BASE ... APLIC_SOURCECFG_LAST:
> + /* We don't suppert delagation, so bit10 if sourcecfg should be 0 */
> + ASSERT(!(value & BIT(10, U)));
And that bit doesn't have a proper #define?
> + /*
> + * As sourcecfg register starts from 1:
> + * 0x0000 domaincfg
> + * 0x0004 sourcecfg[1]
> + * 0x0008 sourcecfg[2]
> + * ...
> + * 0x0FFC sourcecfg[1023]
> + * It is necessary to calculate an interrupt number by substracting
Nit: subtracting
> + * of APLIC_DOMAINCFG instead of APLIC_SOURCECFG_BASE.
> + */
> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_DOMAINCFG)) )
> + /* interrupt not enabled, ignore it */
Throughout the series: Please adhere to ./CODING_STYLE.
> + return 0;
> +
> + break;
And any value is okay to write?
> + case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
> + struct vcpu *target_vcpu = NULL;
> +
> + /*
> + * Look at vaplic_emulate_load() for explanation why
> + * APLIC_GENMSI is substracted.
> + */
There's no vaplic_emulate_load() - how can I go look there?
Also same typo again as above.
> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_GENMSI)) )
> + /* interrupt not enabled, ignore it */
> + return 0;
> +
> + for ( int i = 0; i < vcpu->domain->max_vcpus; i++ )
unsigned int
> + {
> + struct vcpu *v = vcpu->domain->vcpu[i];
> +
> + if ( v->vcpu_id == (value >> APLIC_TARGET_HART_IDX_SHIFT) )
> + {
> + target_vcpu = v;
> + break;
> + }
> + }
> +
> + ASSERT(target_vcpu);
What guarantees the pointer to be non-NULL? The incoming value can be
arbitrary, afaict.
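A sketch of handling the not-found case as an ordinary error, using simplified stand-in types (struct vcpu_sketch) and made-up function names. The shift value 18 is again an assumption, and whether -EINVAL or silently ignoring the write is the right reaction is a separate question.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TARGET_HART_IDX_SHIFT 18    /* assumed, as in the APLIC target layout */

struct vcpu_sketch { unsigned int vcpu_id; };

/* Returns the vCPU addressed by 'value', or NULL if there is none. */
static struct vcpu_sketch *find_target(struct vcpu_sketch *vcpus,
                                       unsigned int nr, uint32_t value)
{
    unsigned int idx = value >> TARGET_HART_IDX_SHIFT;

    for ( unsigned int i = 0; i < nr; i++ )
        if ( vcpus[i].vcpu_id == idx )
            return &vcpus[i];

    return NULL;
}

static int handle_target_write(struct vcpu_sketch *vcpus, unsigned int nr,
                               uint32_t value)
{
    const struct vcpu_sketch *target = find_target(vcpus, nr, value);

    if ( !target )
        return -EINVAL; /* guest supplied a hart index with no matching vCPU */

    /* ... the register update using 'target' would go here ... */
    return 0;
}
```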
> + if ( !(vaplic->regs.domaincfg & APLIC_DOMAINCFG_DM) )
> + {
> + vaplic_dm_update_target(cpuid_to_hartid(target_vcpu->processor),
> + &value);
> + }
> + else
> + vaplic_update_target(priv->imsic_cfg,
> + vcpu_guest_file_id(target_vcpu),
> + cpuid_to_hartid(target_vcpu->processor),
> + &value);
I'm struggling with the naming here: When DM is clear, a function with "dm"
in the name is called.
For the latter one, unless other uses are intended speaking against that,
instead of the middle two arguments simply pass target_vcpu?
Also please omit the braces consistently from both branches.
> + break;
> +
> + case APLIC_SETIPNUM:
> + case APLIC_SETIPNUM_LE:
What about APLIC_SETIPNUM_BE?
> + case APLIC_CLRIPNUM:
> + case APLIC_SETIENUM:
> + case APLIC_CLRIENUM:
> + if ( AUTH_IRQ_BIT(value) )
> + break;
Aren't you easily overrunning auth_irq_bmp[] here?
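A minimal sketch of a bounds-checked lookup, assuming the bitmap is 32 words of 32 bits each (mirroring APLIC_NUM_REGS); the helper name irq_authorised() is made up. Any guest-written number beyond the bitmap is simply treated as not authorised.

```c
#include <stdbool.h>
#include <stdint.h>

#define BMP_WORDS 32U   /* mirrors APLIC_NUM_REGS: 32 words of 32 bits */

/* Out-of-range IRQ numbers are never authorised and never index the array. */
static bool irq_authorised(const uint32_t bmp[BMP_WORDS], uint32_t irq)
{
    if ( irq >= BMP_WORDS * 32U )
        return false;

    return bmp[irq / 32U] & (1U << (irq % 32U));
}
```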
> + return 0;
> +
> + case APLIC_DOMAINCFG:
> + /*
> + * TODO:
> + * The domaincfg register has this format:
> + * bits 31:24 read-only 0x80
> + * bit 8 IE
> + * bit 7 read-only 0
> + * bit 2 DM (WARL)
> + * bit 0 BE (WARL)
> + *
> + * The most interesting bit for us is IE(Interrupt Enable) bit.
> + * At the moment, at least, Linux doesn't use domaincfg.IE bit to
> + * disable interrupts globally, but if one day someone will use it
> + * then extra actions should be done.
> + */
> +
> + printk_once("%s: Nothing to do, domaincfg is set by aplic during "
> + "initialization in Xen\n", __func__);
As per the comment it's not "nothing to do", but your choice to ignore writes
even if they may be relevant.
> + return 0;
> +
> + default:
> + panic("%s: unsupported register offset: %#x\n", __func__, offset);
Crashing the host for the guest doing something odd? It's odd that the function
only ever returns 0 anyway - it could simply return an error here (if the
intention is to not ignore such writes).
As it's not clear what values other than zero such a function may return, I
also can't comment on its (and the hook's) return type (may want to be bool
instead of int).
Jan
* Re: [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC
2026-03-10 17:08 ` [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC Oleksii Kurochko
@ 2026-04-02 14:58 ` Jan Beulich
2026-04-15 7:50 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 14:58 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> @@ -96,4 +97,7 @@ void intc_handle_external_irqs(struct cpu_user_regs *regs);
>
> int intc_make_domu_dt_node(const struct kernel_info *kinfo);
>
> +int domain_vintc_init(struct domain *d);
> +void domain_vintc_deinit(struct domain *d);
The former has a caller added, but the latter hasn't.
> @@ -87,3 +88,42 @@ int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
> return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
> irq_ranges);
> }
> +
> +int __init domain_vintc_init(struct domain *d)
The __init here and below is questionable again.
Jan
* Re: [PATCH v1 22/27] xen/riscv: implement init_intc_phandle()
2026-03-10 17:08 ` [PATCH v1 22/27] xen/riscv: implement init_intc_phandle() Oleksii Kurochko
@ 2026-04-02 15:00 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 15:00 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> Implement init_intc_phandle() to read phandle of interrupt controller
> node and save it in kernel->phandle_intc for the future usage during
> creation of guest interrupt controller node.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
On the basis that it matches Arm's:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
* Re: [PATCH v1 23/27] xen/riscv: call do_initcalls() in start_xen()
2026-03-10 17:08 ` [PATCH v1 23/27] xen/riscv: call do_initcalls() in start_xen() Oleksii Kurochko
@ 2026-04-02 15:01 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 15:01 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> Execute initcall function. Additionally, before do_initcalls() call
> scheduler_init() which is needed because of cpupool_create_pool()
> is called by do_initcalls(); otherwise BUG_ON(IS_ERR(pool)) will occur
> in inside cpupool_create_pool().
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
* Re: [PATCH v1 24/27] xen/riscv: init rcu
2026-03-10 17:08 ` [PATCH v1 24/27] xen/riscv: init rcu Oleksii Kurochko
@ 2026-04-02 15:03 ` Jan Beulich
2026-04-14 11:50 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-02 15:03 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Fold patches 23-25 into a single one, with an adjusted title?
Jan
> --- a/xen/arch/riscv/setup.c
> +++ b/xen/arch/riscv/setup.c
> @@ -9,6 +9,7 @@
> #include <xen/init.h>
> #include <xen/irq.h>
> #include <xen/mm.h>
> +#include <xen/rcupdate.h>
> #include <xen/sched.h>
> #include <xen/serial.h>
> #include <xen/shutdown.h>
> @@ -154,6 +155,8 @@ void __init noreturn start_xen(unsigned long bootcpu_id,
>
> timer_init();
>
> + rcu_init();
> +
> local_irq_enable();
>
> console_init_postirq();
* Re: [PATCH v1 04/27] xen/riscv: rework G-stage mode handling
2026-04-01 13:19 ` Jan Beulich
@ 2026-04-07 10:47 ` Oleksii Kurochko
2026-04-07 13:43 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-07 10:47 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 3:19 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> Rework G-stage mode handling to make the selected mode descriptor reusable
>> outside of p2m initialization.
>>
>> As max_gstage_mode is going to be reused by code that creates CPU nodes for
>> guest domains, not only max_gstage_mode->mode but also max_gstage_mode->name
>> is required.
>
> I guess I'm not DT-savvy enough to understand why that would be.
There is an optional mmu-type property for each cpu:
https://github.com/riscv-non-isa/riscv-device-tree-doc/blob/master/bindings/riscv/cpus.txt#L73
>
>> To support this, make max_gstage_mode a global pointer to one of
>> the entries in a global modes[] array, and remove get_max_supported_mode().
>>
>> Update struct p2m_domain to store a pointer to a mode descriptor instead of
>> embedding the structure directly.
>>
>> Refactor the modes[] array so that mode->name contains only the MMU scheme
>> name (without the "x4" suffix), as this value is reused when filling the
>> maximum MMU type passed to the guest. According to DT bindings [1], the MMU
>> type must not include the "x4" suffix. Use "none" for the Bare mode to match
>> the DT binding requirements.
>
> I expect this DT aspect is also why Sv changes to sv in the table? (Which
> is a little unhelpful for the printk() where it's used.)
Yes. According to the link above the following options could be passed:
"riscv,sv32"
"riscv,sv39"
"riscv,sv48"
>
>> Adjust modes[]->paging_levels to represent the maximum paging level rather
>> than the total number of levels. This ensures that P2M_ROOT_LEVEL() and its
>> users behave correctly without relying on hardcoded p2m mode values.
>>
>> Finally, drop __initconst from the modes[] declaration, as the array is
>> referenced via p2m->mode and max_gstage_mode beyond the init stage.
>>
>> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/riscv/cpus.yaml?h=v6.19-rc3#n82
>
> Is a reference into Linux doc really providing something "canonical"? Surely
> there's an independent spec somewhere?
I wasn't able to find a better source for arch-specific definitions. For
example, the source mentioned above has outdated mmu-type properties;
the list should also contain riscv,sv57 and riscv,none.
>
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -45,18 +45,32 @@ struct p2m_pte_ctx {
>> unsigned int level; /* Paging level at which the PTE resides. */
>> };
>>
>> -static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
>> - .mode = HGATP_MODE_OFF,
>> - .paging_levels = 0,
>> - .name = "Bare",
>> -};
>> -
>> /*
>> * Set to the maximum configured support for IPA bits, so the number of IPA bits can be
>> * restricted by external entity (e.g. IOMMU).
>> */
>> unsigned int __read_mostly p2m_ipa_bits = PADDR_BITS;
>>
>> +static const struct gstage_mode_desc modes[] = {
>
> As a function scope static this was a fine identifier. Please consider whether
> with the wider scope gstage_modes[] might not be better.
>
>> + /*
>> + * Based on the RISC-V spec:
>> + * Bare mode is always supported, regardless of SXLEN.
>> + * When SXLEN=32, the only other valid setting for MODE is Sv32.
>> + * When SXLEN=64, three paged virtual-memory schemes are defined:
>> + * Sv39, Sv48, and Sv57.
>> + */
>> + [0] = { HGATP_MODE_OFF, 0, "none" },
>> +#ifdef CONFIG_RISCV_32
>> + [1] = { HGATP_MODE_SV32X4, 1, "sv32" }
>> +#else
>> + [2] = { HGATP_MODE_SV39X4, 2, "sv39" },
>> + [3] = { HGATP_MODE_SV48X4, 3, "sv48" },
>> + [4] = { HGATP_MODE_SV57X4, 4, "sv57" },
>> +#endif
>> +};
>
> The dedicated initializer form isn't adding any value here (whereas it slightly
> hampers readability). You really don't want the array to be sparsely populated,
> so perhaps better to leave as it was before?
I need modes[] to be outside of gstage_mode_detect() so that it can be
re-used. For example, if the expected G-stage mode is passed via a DTS
property, then the DTS will contain something like:
    chosen {
        ...
        DOMU1 {
            mmu-type="riscv,sv48";
            ...
        }
        ...
    }
And I will need another function, something like:
    static unsigned int find_gstage_mode(const char *mmu_type) {...}
which will re-use modes[] to find the corresponding mode and return an
index for it (or just return the corresponding mode), which is then used
to initialize p2m->mode:
    p2m->mode = &modes[find_gstage_mode(mmu_type)];
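A minimal sketch of such a lookup, for illustration only. The struct
layout, the numeric mode values and the decision to return a pointer
(rather than an index) are assumptions made here, not what the patch
does; only the RV64 entries are shown.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in values; the real HGATP_MODE_* constants come from Xen's headers. */
enum {
    HGATP_MODE_OFF    = 0,
    HGATP_MODE_SV39X4 = 8,
    HGATP_MODE_SV48X4 = 9,
    HGATP_MODE_SV57X4 = 10,
};

/* Assumed layout: field order follows the initializers in the patch. */
struct gstage_mode_desc {
    unsigned int mode;
    unsigned int paging_levels;
    const char *name;
};

static const struct gstage_mode_desc modes[] = {
    { HGATP_MODE_OFF,    0, "none" },
    { HGATP_MODE_SV39X4, 2, "sv39" },
    { HGATP_MODE_SV48X4, 3, "sv48" },
    { HGATP_MODE_SV57X4, 4, "sv57" },
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical helper: map a DT mmu-type string such as "riscv,sv48" to
 * its descriptor. Returns NULL for a missing prefix or an unknown mode.
 */
static const struct gstage_mode_desc *find_gstage_mode(const char *mmu_type)
{
    static const char prefix[] = "riscv,";
    const size_t plen = sizeof(prefix) - 1;

    if ( !mmu_type || strncmp(mmu_type, prefix, plen) )
        return NULL;

    for ( size_t i = 0; i < ARRAY_SIZE(modes); i++ )
        if ( !strcmp(mmu_type + plen, modes[i].name) )
            return &modes[i];

    return NULL;
}
```

Returning a pointer lets the caller reject an unknown string before
assigning p2m->mode, which an unsigned index cannot express without a
sentinel value.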
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 04/27] xen/riscv: rework G-stage mode handling
2026-04-07 10:47 ` Oleksii Kurochko
@ 2026-04-07 13:43 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-07 13:43 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 07.04.2026 12:47, Oleksii Kurochko wrote:
> On 4/1/26 3:19 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/p2m.c
>>> +++ b/xen/arch/riscv/p2m.c
>>> @@ -45,18 +45,32 @@ struct p2m_pte_ctx {
>>> unsigned int level; /* Paging level at which the PTE resides. */
>>> };
>>>
>>> -static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
>>> - .mode = HGATP_MODE_OFF,
>>> - .paging_levels = 0,
>>> - .name = "Bare",
>>> -};
>>> -
>>> /*
>>> * Set to the maximum configured support for IPA bits, so the number of IPA bits can be
>>> * restricted by external entity (e.g. IOMMU).
>>> */
>>> unsigned int __read_mostly p2m_ipa_bits = PADDR_BITS;
>>>
>>> +static const struct gstage_mode_desc modes[] = {
>>
>> As a function scope static this was a fine identifier. Please consider whether
>> with the wider scope gstage_modes[] might not be better.
>>
>>> + /*
>>> + * Based on the RISC-V spec:
>>> + * Bare mode is always supported, regardless of SXLEN.
>>> + * When SXLEN=32, the only other valid setting for MODE is Sv32.
>>> + * When SXLEN=64, three paged virtual-memory schemes are defined:
>>> + * Sv39, Sv48, and Sv57.
>>> + */
>>> + [0] = { HGATP_MODE_OFF, 0, "none" },
>>> +#ifdef CONFIG_RISCV_32
>>> + [1] = { HGATP_MODE_SV32X4, 1, "sv32" }
>>> +#else
>>> + [2] = { HGATP_MODE_SV39X4, 2, "sv39" },
>>> + [3] = { HGATP_MODE_SV48X4, 3, "sv48" },
>>> + [4] = { HGATP_MODE_SV57X4, 4, "sv57" },
>>> +#endif
>>> +};
>>
>> The dedicated initializer form isn't adding any value here (whereas it slightly
>> hampers readability). You really don't want the array to be sparsely populated,
>> so perhaps better to leave as it was before?
>
> I need modes[] to be outside of gstage_mode_detect() as it then could be
> re-used.
Sure, and I didn't say "where it was before". I said "as it was before", i.e.
without dedicated initializers.
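To illustrate the sparseness concern, here is a small stand-alone
comparison. The struct and the numeric values are placeholders chosen
for this sketch; the first array mirrors the index choice of the RV64
branch in the patch, the second uses plain positional entries.

```c
#include <stddef.h>

/* Placeholder descriptor type for this sketch only. */
struct desc {
    unsigned int mode;
    unsigned int levels;
    const char *name;
};

/* Designated form with the indices of the RV64 branch: slot 1 is skipped. */
static const struct desc sparse[] = {
    [0] = { 0,  0, "none" },
    [2] = { 8,  2, "sv39" },
    [3] = { 9,  3, "sv48" },
    [4] = { 10, 4, "sv57" },
};

/* Positional form: entries are densely packed. */
static const struct desc dense[] = {
    { 0,  0, "none" },
    { 8,  2, "sv39" },
    { 9,  3, "sv48" },
    { 10, 4, "sv57" },
};

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

static const size_t sparse_count = ARRAY_SIZE(sparse);
static const size_t dense_count  = ARRAY_SIZE(dense);
```

The sparse array has one more element than it has initializers, and the
skipped slot is zero-filled (NULL name), so any loop over it would have
to cope with that hole.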
Jan
> For example, if expected G-stage mode should be passed by DTS
> property then in DTS property we'll have something like:
> chosen {
> ...
> DOMU1 {
> mmu-type="riscv,sv48";
> ...
> }
> ...
> }
>
> And I will need to have another functions something like:
> static unsigned int find_gstage_mode(const char *mmu_type) {...}
> which will re-use modes[] to find a correspondent mode and return an
> index (or return just correspondent mode) for that mode to then re-use
> it to initialize p2m->mode:
> p2m->mode = &modes[find_gstage_mode(mmu_type)];
>
> ~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 26/27] xen/riscv: provide init_vuart()
2026-03-10 17:08 ` [PATCH v1 26/27] xen/riscv: provide init_vuart() Oleksii Kurochko
@ 2026-04-07 13:52 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-07 13:52 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:08, Oleksii Kurochko wrote:
> For debug purpose is enough to have only print messages from guest what is
> now implemented in vsbi_legacy_ecall_handler().
>
> For full guesst console support it will better to have something similar to
> [1], thereby there is nothing specific should be done, at least, for now
> and init_vuart() is provided to make dom0less code buildable.
>
> [1] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2602041533440.3175371@ubuntu-linux-20-04-desktop/
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support
2026-03-10 17:09 ` [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support Oleksii Kurochko
@ 2026-04-07 14:11 ` Jan Beulich
2026-04-15 10:00 ` Oleksii Kurochko
2026-04-15 10:28 ` Oleksii Kurochko
0 siblings, 2 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-07 14:11 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.03.2026 18:09, Oleksii Kurochko wrote:
> Enable dom0less support for RISC-V by selecting HAS_DOM0LESS and
> providing the minimal architecture hooks required by the common
> dom0less infrastructure.
>
> Add stub implementations for architecture-specific helpers used when
> building domains from the device tree. These currently perform no
> additional work but allow the generic dom0less code to build and run
> on RISC-V.
>
> Introduce max_init_domid as a runtime variable rather than a constant
> so that it can be updated during dom0less domain creation.
>
> Provide missing helpers and definitions required by the domain
> construction code,
I'm wondering about the splitting among patches: There's half a dozen
(effectively stub) functions which are added here, and then there is
the single init_vuart() which was split out into the earlier patch.
What's the pattern behind this, i.e. why isn't init_vuart() also
being added here?
> including domain bitness helpers and the
> p2m_set_allocation() prototype.
>
> Additionally define the guest magic memory region in the public
> RISC-V interface.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
> ---
> Open questions:
> - Move declaration of p2m_set_allocation() to xen/fdt-domain-build.h or
> xen/dom0less-build.h as it is used in common code of Dom0less and
> there is not too much sense in declaration of it for each arch which
> supports Dom0less. It could be ifdef-ed in common header as, at the
> momemnt, it is used only for Dom0less.
Having a common declaration of p2m_set_allocation() is certainly a
sensible thing to do, but not in DT or dom0less headers. p2m-common.h
is, going from file names, the only plausible place for it to go.
Whether that (a) works and (b) makes sense are separate questions.
> - Shouldn't declaration/defintion of max_init_domid move to common code
> instead of having it for each architecture separately? If yes, then what
> would be the best place.
What would you use to decide whether the declaration or #define is
needed? (Plausible headers to put it can surely be found: console.h,
domain.h, and perhaps more.)
> --- a/xen/arch/riscv/include/asm/domain.h
> +++ b/xen/arch/riscv/include/asm/domain.h
> @@ -20,6 +20,14 @@ struct hvm_domain
> uint64_t params[HVM_NR_PARAMS];
> };
>
> +#ifdef CONFIG_RISCV_64
> +#define is_32bit_domain(d) (0)
> +#define is_64bit_domain(d) (1)
> +#else
> +#define is_32bit_domain(d) (1)
> +#define is_64bit_domain(d) (0)
> +#endif
First, please use true/false. Then, while I agree with the RV32 part, 32-bit
guests surely will need to be an option on a 64-bit hypervisor. Imo you'd
better introduce a field in struct arch_domain to carry that information
(or to derive it from) right away. That wouldn't be set to non-zero for the
time being, i.e. that same constant-true/false would still result.
Otherwise I don't see why you use #ifdef; you could then have things
simpler as
#define is_32bit_domain(d) IS_ENABLED(CONFIG_RISCV_32)
#define is_64bit_domain(d) IS_ENABLED(CONFIG_RISCV_64)
(but I specifically don't recommend going this route).
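A rough sketch of the field-based variant, under the assumption of a
hypothetical boolean member named is_32bit; the structures are reduced
stand-ins, not Xen's real definitions.

```c
#include <stdbool.h>

/* Reduced stand-ins for Xen's structures; "is_32bit" is a made-up name. */
struct arch_domain {
    bool is_32bit;   /* stays false on RV64 until 32-bit guests are supported */
};

struct domain {
    struct arch_domain arch;
};

/* Both predicates derive from the per-domain field, not from Kconfig. */
#define is_32bit_domain(d) ((d)->arch.is_32bit)
#define is_64bit_domain(d) (!is_32bit_domain(d))
```

On an RV64 build the field would simply never be set for now, so callers
see the same constant results as with the #ifdef form, while the macros
are already prepared for 32-bit guests.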
> --- a/xen/arch/riscv/setup.c
> +++ b/xen/arch/riscv/setup.c
> @@ -32,6 +32,8 @@
> #include <asm/traps.h>
> #include <asm/vsbi.h>
>
> +domid_t max_init_domid = 0;
The initializer isn't of much use, is it? Instead add __read_mostly, like
Arm has it?
> --- a/xen/include/public/arch-riscv.h
> +++ b/xen/include/public/arch-riscv.h
> @@ -58,6 +58,9 @@ typedef uint64_t xen_ulong_t;
> #define GUEST_RAM_BANK_BASES { GUEST_RAM0_BASE }
> #define GUEST_RAM_BANK_SIZES { GUEST_RAM0_SIZE }
>
> +#define GUEST_MAGIC_BASE xen_mk_ullong(0x39000000)
> +#define GUEST_MAGIC_SIZE xen_mk_ullong(0x01000000)
What is this, and why does it need putting in the public interface? Plus
how come the numbers are exactly the same as what Arm uses?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL
2026-03-11 8:18 ` Jan Beulich
@ 2026-04-09 10:31 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-09 10:31 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 3/11/26 9:18 AM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> The p2m_freelist is used to allocate pages for the P2M, but to initialize
>> this list, domain_p2m_set_allocation() might be called.
>> This function is invoked in construct_domU() within the common Dom0less
>> code, and providing an implementation of domain_p2m_set_allocation() when
>> CONFIG_ARCH_PAGING_MEMPOOL=y is appropriate for RISC-V.
>
> With this wording it is odd to see ...
>
>> --- a/xen/arch/riscv/Kconfig
>> +++ b/xen/arch/riscv/Kconfig
>> @@ -1,5 +1,6 @@
>> config RISCV
>> def_bool y
>> + select ARCH_PAGING_MEMPOOL
>
> ... this. You really want to settle on whether it is selected unconditionally
> or not. Also for the code below, where ...
>
>> --- a/xen/arch/riscv/Makefile
>> +++ b/xen/arch/riscv/Makefile
>> @@ -10,7 +10,7 @@ obj-y += irq.o
>> obj-y += kernel.o
>> obj-y += mm.o
>> obj-y += p2m.o
>> -obj-y += paging.o
>> +obj-$(CONFIG_ARCH_PAGING_MEMPOOL) += paging.o
>> obj-y += pt.o
>> obj-$(CONFIG_RISCV_64) += riscv64/
>> obj-y += sbi.o
>
> ... this change and any #ifdef-ary further down aren't needed unless the
> select became conditional. (Plus with the change above things likely wouldn't
> even build if ARCH_PAGING_MEMPOOL could be off under certain conditions.)
I missed that with ARCH_PAGING_MEMPOOL=n at least p2m.c will fail to
compile, so many more things would need to be #ifdef-ed.
For simplicity I will then set CONFIG_ARCH_PAGING_MEMPOOL=y
unconditionally and drop all the #ifdef-s related to this config that
I've added in this patch.
>
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -1568,3 +1568,34 @@ struct page_info *get_page_from_gfn(struct domain *d, unsigned long gfn,
>>
>> return page;
>> }
>> +
>> +#ifdef CONFIG_ARCH_PAGING_MEMPOOL
>> +
>> +int arch_set_paging_mempool_size(struct domain *d, uint64_t size)
>> +{
>> + unsigned long pages = PFN_DOWN(size);
>> + bool preempted = false;
>> + int rc;
>> +
>> + if ( (size & ~PAGE_MASK) || /* Non page-sized request? */
>> + pages != PFN_DOWN(size) ) /* 32-bit overflow? */
>> + return -EINVAL;
>
> Can't this be had with just
>
> if ( ((paddr_t)pages << PAGE_SHIFT) != size )
> return -EINVAL;
>
> (and perhaps utilizing pfn_to_paddr(), even if it's not a PFN we're dealing
> with here)?
It makes sense, I will apply that for RISC-V.
The same could then be done for Arm and x86; I can send a separate patch
for them.
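For reference, a stand-alone sketch of why the single round-trip
comparison covers both cases. The helper name and the typedefs are
assumptions of this sketch; a 32-bit type stands in for a 32-bit
unsigned long so that the truncation case is visible on any host.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12           /* assumed 4 KiB pages */

typedef uint64_t paddr_t;       /* stand-in for Xen's paddr_t */
typedef uint32_t page_count_t;  /* stand-in for a 32-bit unsigned long */

/*
 * Hypothetical helper: convert a byte size into a page count.
 * Shifting the (possibly truncated) count back up and comparing with
 * the input rejects both sub-page remainders and counts that do not
 * fit into page_count_t.
 */
static bool mempool_size_to_pages(uint64_t size, page_count_t *pages)
{
    page_count_t nr = (page_count_t)(size >> PAGE_SHIFT);

    if ( ((paddr_t)nr << PAGE_SHIFT) != size )
        return false;

    *pages = nr;
    return true;
}
```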
>
>> + spin_lock(&d->arch.paging.lock);
>> + rc = p2m_set_allocation(d, pages, &preempted);
>> + spin_unlock(&d->arch.paging.lock);
>> +
>> + ASSERT(preempted == (rc == -ERESTART));
>
> This actually suggests that (once again) Arm code perhaps shouldn't have been
> copied verbatim: There shouldn't be a need for the "preempted" state to be
> returned back in two distinct ways.
Agree.
The preempted argument of p2m_set_allocation() could be switched to a
plain 'bool could_preempt', preemption would then be signaled solely via
return -ERESTART, and the caller would just check rc, so no local bool
preempted in this function and no ASSERT.
I'm also thinking that the preempted argument could be dropped entirely,
as it seems to exist only to conditionally enable the
general_preempt_check() call inside the function. It is skipped only
during domain_p2m_set_allocation(), which won't be a significant penalty
if general_preempt_check() is called every time. All other callers pass
a non-NULL preempted, so general_preempt_check() would always be
executed regardless.
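Roughly what I have in mind, as a toy model only: the pool structure,
the preempt_pending flag (standing in for general_preempt_check()) and
the ERESTART value are placeholders of this sketch, not the real code.

```c
#include <stdbool.h>

#define ERESTART 85                /* placeholder error value */

/* Toy pool: only the page counter matters for this sketch. */
struct pool {
    unsigned long total_pages;
};

/* Stand-in for general_preempt_check(); set by the caller in this model. */
static bool preempt_pending;

/*
 * Grow or shrink the pool one page at a time towards the target.
 * Preemption is reported solely through the return value: -ERESTART
 * means "call again", 0 means the target was reached.
 */
static int set_allocation(struct pool *p, unsigned long pages)
{
    for ( ; ; )
    {
        if ( p->total_pages < pages )
            p->total_pages++;
        else if ( p->total_pages > pages )
            p->total_pages--;
        else
            break;

        /* Always checked; there is no separate "preempted" out-parameter. */
        if ( preempt_pending )
            return -ERESTART;
    }

    return 0;
}
```

The caller then only looks at rc, so neither a local "preempted" flag
nor the ASSERT() is needed.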
>
>> + return rc;
>> +}
>> +
>> +/* Return the size of the pool, in bytes. */
>> +int arch_get_paging_mempool_size(struct domain *d, uint64_t *size)
>> +{
>> + *size = (uint64_t)ACCESS_ONCE(d->arch.paging.total_pages) << PAGE_SHIFT;
>
> As per above, maybe use pfn_to_paddr()?
Yes, it could be used to be in sync with the code above.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 02/27] xen/riscv: Implement construct_domain()
2026-03-24 9:37 ` Jan Beulich
@ 2026-04-09 11:26 ` Oleksii Kurochko
2026-04-09 12:58 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-09 11:26 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 3/24/26 10:37 AM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> --- /dev/null
>> +++ b/xen/arch/riscv/domain-build.c
>> @@ -0,0 +1,46 @@
>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> +
>> +#include <xen/fdt-domain-build.h>
>> +#include <xen/fdt-kernel.h>
>> +#include <xen/init.h>
>> +#include <xen/sched.h>
>> +
>> +#include <asm/current.h>
>> +#include <asm/guest_access.h>
>> +
>> +int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
>
> Are you actually altering what kinfo points to?
Not directly in this function, but it could be altered, for example, by
kernel_image_load() where "info->entry = load_addr" is happening.
>
>> +{
>> + struct vcpu *v = d->vcpu[0];
>> + struct cpu_user_regs *regs = vcpu_guest_cpu_user_regs(v);
>> +
>> + BUG_ON(d->vcpu[0] == NULL);
>
> Why not simply "!v"?
It could work. I'll apply that.
>
> Also, while in the cover letter you state a dependency on another series,
> this is somewhat unwieldy here. From the titles there I can't deduce which
> of the patches would introduce vcpu_guest_cpu_user_regs(). Yet I would
> have wanted to double check that it doesn't de-reference v already.
It was already merged. It was part of:
xen/riscv: implement vcpu_csr_init() "02b3a1b0e53c"
>
>> + BUG_ON(v->is_initialised);
>> +
>> + kernel_load(kinfo);
>> + initrd_load(kinfo, copy_to_guest_phys);
>> + dtb_load(kinfo, copy_to_guest_phys);
>
> These all return void, despite this also being used for non-Dom0. Is it
> really fatal to a dom0less system if one out of many domains fail to be
> built?
For a dom0less system, my opinion is that it should not be fatal, it
should simply ignore a domain that fails to build and continue with the
rest. However, with the current common dom0less code it will just
panic(). This is a behavior I would like to change and it is on my TODO
list.
Regarding the functions returning void, this is because all of them
currently call panic() on failure, which I expect will need to change in
order to ignore a domain that fails to build in dom0less mode.
For the current implementation of the common dom0less code this is fine,
but I agree it should be addressed in a separate patch series.
> Especially when, despite the name, there is a Dom0?
For this case, a failure there should indeed be fatal, so panic() is
appropriate.
>
>> + regs->sepc = kinfo->entry;
>> +
>> + /* Guest boot cpuid = 0 */
>> + regs->a0 = 0;
>> + regs->a1 = kinfo->dtb_paddr;
>> +
>> + for ( unsigned int i = 1; i < d->max_vcpus; i++ )
>> + {
>> + if ( vcpu_create(d, i) == NULL )
>> + {
>> + printk("Failed to allocate %pd v%d\n", d, i);
>> + break;
>
> And no error is indicated to the caller?
No, as generally a single vCPU (vCPU0) is enough to run a domain. So we
print that something went wrong with the allocation of vCPU1...n, but it
is okay to me to continue domain construction.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 03/27] xen/riscv: implement prerequisites for domain_create()
2026-04-01 12:57 ` Jan Beulich
@ 2026-04-09 11:55 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-09 11:55 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 2:57 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> arch_domain_create() and arch_sanitise_domain_config() are prerequisites for
>> domain_create().
>>
>> arch_sanitise_domain_config() currently returns 0, as there is no specific
>> work required at this stage.
>>
>> arch_domain_create() performs basic initialization, such as setting up the P2M
>> and initializing the domain's virtual timer.
>
> Does it? I can spot only the former; instead there is ...
Stale comment, I will drop that as vtimer init is called per vCPU.
>
>> --- a/xen/arch/riscv/domain.c
>> +++ b/xen/arch/riscv/domain.c
>> @@ -288,6 +288,33 @@ void sync_vcpu_execstate(struct vcpu *v)
>> /* Nothing to do -- no lazy switching */
>> }
>>
>> +int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>> +{
>> + return 0;
>> +}
>> +
>> +int arch_domain_create(struct domain *d,
>> + struct xen_domctl_createdomain *config,
>> + unsigned int flags)
>> +{
>> + int rc = 0;
>> +
>> + if ( is_idle_domain(d) )
>> + return 0;
>> +
>> + if ( (rc = p2m_init(d)) != 0)
>> + goto fail;
>> +
>> + d->arch.next_phandle = GUEST_PHANDLE_LAST + 1;
>
> ... this, which I can't make any sense of. I can't find matching Arm code
> either, which might otherwise have helped.
When creating nodes for the guest DTB there are cases where we need to
know which phandles are still free (to avoid several nodes ending up
with the same phandle), so this struct field was introduced to track
that.
I will check the Arm code, as I expect that phandles are generated
somehow on their side too.
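The intended use is a simple per-domain counter, sketched below. The
alloc_phandle() and init_phandles() helpers are hypothetical names used
only here, and struct arch_domain is reduced to the one field; the
GUEST_PHANDLE_* values are the ones from device_tree_defs.h plus the
GUEST_PHANDLE_LAST define proposed in this patch.

```c
#include <stdint.h>

#define GUEST_PHANDLE_GIC   (65000)
#define GUEST_PHANDLE_IOMMU (GUEST_PHANDLE_GIC + 1)
#define GUEST_PHANDLE_LAST  GUEST_PHANDLE_IOMMU

/* Reduced stand-in for the RISC-V struct arch_domain. */
struct arch_domain {
    uint32_t next_phandle;
};

/* Hypothetical: start handing out phandles after the reserved ones. */
static void init_phandles(struct arch_domain *arch)
{
    arch->next_phandle = GUEST_PHANDLE_LAST + 1;
}

/* Hypothetical: return a phandle not yet used in this domain's DTB. */
static uint32_t alloc_phandle(struct arch_domain *arch)
{
    return arch->next_phandle++;
}
```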
>
>> + return rc;
>> +
>> + fail:
>> + d->is_dying = DOMDYING_dead;
>> + arch_domain_destroy(d);
>
> (At least) for the use here, that other function would better also move out
> of stubs.c at the same time (and no longer have unconditional BUG_ON() in it).
>
>> --- a/xen/arch/riscv/stubs.c
>> +++ b/xen/arch/riscv/stubs.c
>> @@ -101,14 +101,7 @@ void dump_pageframe_info(struct domain *d)
>> BUG_ON("unimplemented");
>> }
>>
>> -int arch_sanitise_domain_config(struct xen_domctl_createdomain *config)
>> -{
>> - BUG_ON("unimplemented");
>> -}
>> -
>> -int arch_domain_create(struct domain *d,
>> - struct xen_domctl_createdomain *config,
>> - unsigned int flags)
>> +void vcpu_switch_to_aarch64_mode(struct vcpu *v)
>
> What is this? Surely nothing with this name should exist under riscv/.
Good question. I don't know how it appeared here. I will drop it.
>
>> --- a/xen/include/public/device_tree_defs.h
>> +++ b/xen/include/public/device_tree_defs.h
>> @@ -14,6 +14,7 @@
>> */
>> #define GUEST_PHANDLE_GIC (65000)
>> #define GUEST_PHANDLE_IOMMU (GUEST_PHANDLE_GIC + 1)
>> +#define GUEST_PHANDLE_LAST GUEST_PHANDLE_IOMMU
>
> This, to me, looks like a questionable addition to the public interface.
> Yet I'm not a DT person, so I may simply be missing why something like
> this might be wanted (and how stable it then would be, long term).
I will double check whether I have to generate a phandle for the node I'm
creating for the guest DTB. It looks like I have confused something and
it could just be dropped.
Considering that only RISC-V needs that at the moment (if it is needed at
all), I will put this definition somewhere in riscv/.
Thanks.
~ Oleksii.
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 02/27] xen/riscv: Implement construct_domain()
2026-04-09 11:26 ` Oleksii Kurochko
@ 2026-04-09 12:58 ` Jan Beulich
2026-04-09 13:39 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-09 12:58 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 09.04.2026 13:26, Oleksii Kurochko wrote:
> On 3/24/26 10:37 AM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/domain-build.c
>>> @@ -0,0 +1,46 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-or-later */
>>> +
>>> +#include <xen/fdt-domain-build.h>
>>> +#include <xen/fdt-kernel.h>
>>> +#include <xen/init.h>
>>> +#include <xen/sched.h>
>>> +
>>> +#include <asm/current.h>
>>> +#include <asm/guest_access.h>
>>> +
>>> +int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
>>
>> Are you actually altering what kinfo points to?
>
> Not directly in this function, but it could be altered, for example, by
> kernel_image_load() where "info->entry = load_addr" is happening.
Ah, I see.
>>> +{
>>> + struct vcpu *v = d->vcpu[0];
>>> + struct cpu_user_regs *regs = vcpu_guest_cpu_user_regs(v);
>>> +
>>> + BUG_ON(d->vcpu[0] == NULL);
>>
>> Why not simply "!v"?
>
> It could work. I'll apply that.
>
>>
>> Also, while in the cover letter you state a dependency on another series,
>> this is somewhat unwieldy here. From the titles there I can't deduce which
>> of the patches would introduce vcpu_guest_cpu_user_regs(). Yet I would
>> have wanted to double check that it doesn't de-reference v already.
>
> It was already merged. It was part of:
> xen/riscv: implement vcpu_csr_init() "02b3a1b0e53c"
Oh, indeed. Which makes clear that the BUG_ON() comes too late.
>>> + BUG_ON(v->is_initialised);
>>> +
>>> + kernel_load(kinfo);
>>> + initrd_load(kinfo, copy_to_guest_phys);
>>> + dtb_load(kinfo, copy_to_guest_phys);
>>
>> These all return void, despite this also being used for non-Dom0. Is it
>> really fatal to a dom0less system if one out of many domains fail to be
>> built?
>
> For a dom0less system, my opinion is that it should not be fatal, it
> should simply ignore a domain that fails to build and continue with the
> rest. However, with the current common dom0less code it will just
> panic(). This is a behavior I would like to change and it is on my TODO
> list.
>
> Regarding the functions returning void, this is because all of them
> currently call panic() on failure, which I expect will need to change in
> order to ignore a domain that fails to build in dom0less mode.
>
> For the current implementation of the common dom0less code this is fine,
> but I agree it should be addressed in a separate patch series.
>
> Especially when, despite the name, there is a Dom0?
>
> For this case, a failure there should indeed be fatal, so panic() is
> appropriate.
I think you misunderstood. I wasn't referring to the building of Dom0
failing. Was rather emphasizing that when there is a Dom0, failure to
create a DomU likely should even less so be fatal, as Dom0 could later
rectify the situation.
>>> + regs->sepc = kinfo->entry;
>>> +
>>> + /* Guest boot cpuid = 0 */
>>> + regs->a0 = 0;
>>> + regs->a1 = kinfo->dtb_paddr;
>>> +
>>> + for ( unsigned int i = 1; i < d->max_vcpus; i++ )
>>> + {
>>> + if ( vcpu_create(d, i) == NULL )
>>> + {
>>> + printk("Failed to allocate %pd v%d\n", d, i);
>>> + break;
>>
>> And no error is indicated to the caller?
>
> No, as generally it is enough to have only one vCPU0 to run domain, so
> we have to print that something went wrong with allocation of vCPU1...n
> but it is okay to me to continue domain construction.
Hmm, now that I look there, sched_setup_dom0_vcpus() ignores errors
and doesn't even emit a log message. Question is why neither Arm nor
RISC-V use that function, when we have it.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 02/27] xen/riscv: Implement construct_domain()
2026-04-09 12:58 ` Jan Beulich
@ 2026-04-09 13:39 ` Oleksii Kurochko
2026-04-09 14:01 ` Oleksii Kurochko
2026-04-14 6:26 ` Julien Grall
0 siblings, 2 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-09 13:39 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/9/26 2:58 PM, Jan Beulich wrote:
>>>> +{
>>>> + struct vcpu *v = d->vcpu[0];
>>>> + struct cpu_user_regs *regs = vcpu_guest_cpu_user_regs(v);
>>>> +
>>>> + BUG_ON(d->vcpu[0] == NULL);
>>> Why not simply "!v"?
>> It could work. I'll apply that.
>>
>>> Also, while in the cover letter you state a dependency on another series,
>>> this is somewhat unwieldy here. From the titles there I can't deduce which
>>> of the patches would introduce vcpu_guest_cpu_user_regs(). Yet I would
>>> have wanted to double check that it doesn't de-reference v already.
>> It was already merged. It was part of:
>> xen/riscv: implement vcpu_csr_init() "02b3a1b0e53c"
> Oh, indeed. Which makes clear that the BUG_ON() comes too late.
It makes this BUG_ON() unnecessary altogether: if d->vcpu[0] wasn't
created, then construct_domain() won't be called for such a domain.
I will just drop this BUG_ON().
>
>>>> + BUG_ON(v->is_initialised);
>>>> +
>>>> + kernel_load(kinfo);
>>>> + initrd_load(kinfo, copy_to_guest_phys);
>>>> + dtb_load(kinfo, copy_to_guest_phys);
>>> These all return void, despite this also being used for non-Dom0. Is it
>>> really fatal to a dom0less system if one out of many domains fail to be
>>> built?
>> For a dom0less system, my opinion is that it should not be fatal, it
>> should simply ignore a domain that fails to build and continue with the
>> rest. However, with the current common dom0less code it will just
>> panic(). This is a behavior I would like to change and it is on my TODO
>> list.
>>
>> Regarding the functions returning void, this is because all of them
>> currently call panic() on failure, which I expect will need to change in
>> order to ignore a domain that fails to build in dom0less mode.
>>
>> For the current implementation of the common dom0less code this is fine,
>> but I agree it should be addressed in a separate patch series.
>>
>> Especially when, despite the name, there is a Dom0?
>>
>> For this case, a failure there should indeed be fatal, so panic() is
>> appropriate.
> I think you misunderstood. I wasn't referring to the building of Dom0
> failing. Was rather emphasizing that when there is a Dom0, failure to
> create a DomU likely should even less so be fatal, as Dom0 could later
> rectify the situation.
Oh, okay, then a failure to create a DomU is indeed even less fatal when
there is a Dom0.
>
>>>> + regs->sepc = kinfo->entry;
>>>> +
>>>> + /* Guest boot cpuid = 0 */
>>>> + regs->a0 = 0;
>>>> + regs->a1 = kinfo->dtb_paddr;
>>>> +
>>>> + for ( unsigned int i = 1; i < d->max_vcpus; i++ )
>>>> + {
>>>> + if ( vcpu_create(d, i) == NULL )
>>>> + {
>>>> + printk("Failed to allocate %pd v%d\n", d, i);
>>>> + break;
>>> And no error is indicated to the caller?
>> No, as generally it is enough to have only one vCPU0 to run domain, so
>> we have to print that something went wrong with allocation of vCPU1...n
>> but it is okay to me to continue domain construction.
> Hmm, now that I look there, sched_setup_dom0_vcpus() ignores errors
> and doesn't even emit a log message. Question is why neither Arm nor
> RISC-V use that function, when we have it.
I hadn't seen this function. I will re-use it and, in a separate patch,
suggest re-using it for Arm.
One thing I think we want to do then is to drop the #ifdef x86 around
sched_setup_dom0_vcpus() and rename it to sched_setup_dom_vcpus().
And maybe add dprintk()s to provide some information about which vCPUs
were created and which were not.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 02/27] xen/riscv: Implement construct_domain()
2026-04-09 13:39 ` Oleksii Kurochko
@ 2026-04-09 14:01 ` Oleksii Kurochko
2026-04-14 6:26 ` Julien Grall
1 sibling, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-09 14:01 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/9/26 3:39 PM, Oleksii Kurochko wrote:
>>
>>>>> + regs->sepc = kinfo->entry;
>>>>> +
>>>>> + /* Guest boot cpuid = 0 */
>>>>> + regs->a0 = 0;
>>>>> + regs->a1 = kinfo->dtb_paddr;
>>>>> +
>>>>> + for ( unsigned int i = 1; i < d->max_vcpus; i++ )
>>>>> + {
>>>>> + if ( vcpu_create(d, i) == NULL )
>>>>> + {
>>>>> + printk("Failed to allocate %pd v%d\n", d, i);
>>>>> + break;
>>>> And no error is indicated to the caller?
>>> No, as generally it is enough to have only one vCPU0 to run domain, so
>>> we have to print that something went wrong with allocation of vCPU1...n
>>> but it is okay to me to continue domain construction.
>> Hmm, now that I look there, sched_setup_dom0_vcpus() ignores errors
>> and doesn't even emit a log message. Question is why neither Arm nor
>> RISC-V use that function, when we have it.
>
> I haven't seen this function, I will re-use it and in separate patch
> suggest to re-use it for Arm.
>
> One thing I think we want to do then is to drop #ifdef x86 around
> sched_setup_dom0_vcpus() and rename it to sched_setup_dom_vcpus().
> And maybe add dprintk()'s to provide some information about which vCPUs
> were created and which not.
Arm doesn't re-use sched_setup_dom0_vcpus() because it has an extra if()
inside the loop:
    for ( i = 1; i < d->max_vcpus; i++ )
    {
        if ( vcpu_create(d, i) == NULL )
        {
            printk("Failed to allocate d%dv%d\n", d->domain_id, i);
            break;
        }
        if ( is_64bit_domain(d) )
            vcpu_switch_to_aarch64_mode(d->vcpu[i]);
    }
    domain_update_node_affinity(d);
sched_setup_dom0_vcpus() could still be re-used, but then an extra for()
loop would be needed for:
    if ( is_64bit_domain(d) )
        vcpu_switch_to_aarch64_mode(d->vcpu[i]);
I think it then doesn't make much sense to re-use
sched_setup_dom0_vcpus() for Arm.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string
2026-04-01 13:49 ` Jan Beulich
@ 2026-04-10 10:24 ` Oleksii Kurochko
2026-04-10 10:50 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 10:24 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 3:49 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> Introduce generation of the riscv,isa string passed to the guest via the
>> Device Tree riscv,isa property.
>
> Title as well as this suggest this is all about guest properties. Then ...
>
>> The following changes are introduced:
>>
>> - According to the RISC-V privileged specification, M-mode software controls
>> the XLEN value used by S-mode and U-mode. For RV64 harts, the SXL and UXL
>> fields of the mstatus register are WARL fields that define the XLEN for
>> S-mode and U-mode.
>>
>> The XLEN value is provided by M-mode software (OpenSBI in the case of Xen)
>> via the riscv,isa DT property. Introduce and initialize an xlen variable
>> when parsing the host riscv,isa string in riscv_isa_parse_string().
>
> ... suddenly talk is of host aspects? (See below as to what "xlen" really
> is meant to hold.)
I only used it to show that xlen can differ, depending on what the
previous mode put into the respective register field to encode xlen. The
same applies to guest domains. I will rephrase it in a more proper way.
>
>> --- a/xen/arch/riscv/cpufeature.c
>> +++ b/xen/arch/riscv/cpufeature.c
>> @@ -38,6 +38,8 @@ struct riscv_isa_ext_data {
>> /* Host ISA bitmap */
>> static __ro_after_init DECLARE_BITMAP(riscv_isa, RISCV_ISA_EXT_MAX);
>>
>> +static __ro_after_init unsigned int xlen;
>
> Nit: Attribute between type and identifier please, whenever possible (it
> isn't neatly possible in riscv_isa above, due to DECLARE_BITMAP()).
>
>> @@ -160,6 +162,19 @@ static const struct riscv_isa_ext_data __initconst required_extensions[] = {
>> RISCV_ISA_EXT_DATA(svpbmt),
>> };
>>
>> +static const unsigned int __initconst guest_unsupp_exts[] = {
>> + RISCV_ISA_EXT_f,
>> + RISCV_ISA_EXT_d,
>> + RISCV_ISA_EXT_h,
>> + RISCV_ISA_EXT_q,
>> + RISCV_ISA_EXT_v,
>> +};
>
> This could do with a comment clarifying what needs (and what doesn't need)
> putting here. My expectation would have been that everything in
> riscv_isa_ext[] which shouldn't be exposed to guests should appear here.
> Yet then there is V (which riscv_isa_ext[] doesn't have), while e.g. Svade
> and Svpbmt (which iirc won't be available to guests right away) aren't
> there.
>
>> +static __ro_after_init DECLARE_BITMAP(guest_unsupp_bmp, RISCV_ISA_EXT_MAX);
>
> Is the _bmp suffix really needed? riscv_isa, for example, doesn't have it.
It makes sense to drop it.
>
>> +#define MAX_GUEST_ISA_STR_LEN 256
>> +char guest_isa_str[MAX_GUEST_ISA_STR_LEN];
>
> __ro_after_init?
>
> Yet then - can this really be a global? Isn't the set of extensions
> available to a guest a per-guest property, i.e. a global could at best
> represent an upper bound on features?
At least for now I think it is okay to just use a global guest riscv,isa
for all the domains. From my point of view, if Xen doesn't support some
extension being used by a guest, then Xen should be responsible for
generating a proper riscv,isa.
It probably makes sense to have riscv,isa per guest, but then we would
still want to parse the passed riscv,isa and check what Xen supports and
what not, which looks a little bit like double work. Probably that is the
best option, and if riscv,isa isn't provided then just use the one
generated by Xen.
>
>> @@ -193,6 +208,15 @@ static void __init match_isa_ext(const char *name, const char *name_end,
>> !memcmp(name, ext->name, name_end - name) )
>> {
>> __set_bit(ext->id, bitmap);
>> +
>> + if ( riscv_isa_extension_available(guest_unsupp_bmp, ext->id) )
>> + break;
>> +
>> + if ( ext->id >= RISCV_ISA_EXT_BASE )
>> + safe_strcat(guest_isa_str, "_");
>> +
>> + safe_strcat(guest_isa_str, ext->name);
>
> Shouldn't you check the (kind-of-)return value? (Yet better would be a build-
> time check, but I can't think of a way to achieve that.)
Yes, it would be nice. But it isn't clear what the reaction should be:
just panic() with a message that we don't have enough space to cover
all extensions in riscv,isa, or just continue with what was put into
guest_isa_str. I think the second option could be fine.
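As a rough illustration of the second option (continue with what fits), a bounded append could look like the sketch below. isa_str_append() is a hypothetical helper, not Xen's safe_strcat(), and plain libc calls are used so that the fragment stands alone; the caller decides whether a false return is worth a warning.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not Xen's safe_strcat()): append src to dst only if
 * the result, including the terminating NUL, fits into dst_size bytes.
 * Returns true on success; on failure dst is left unchanged.
 */
static bool isa_str_append(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t need = strlen(src);

    /* Not enough room for src plus the terminating NUL. */
    if ( used >= dst_size || need >= dst_size - used )
        return false;

    memcpy(dst + used, src, need + 1);

    return true;
}
```

Leaving dst untouched on failure means the guest never sees a half-written extension name.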
>
>> @@ -207,13 +231,17 @@ static int __init riscv_isa_parse_string(const char *isa,
>> #if defined(CONFIG_RISCV_32)
>> if ( isa[2] != '3' && isa[3] != '2' )
>> return -EINVAL;
>> + xlen = 32;
>> #elif defined(CONFIG_RISCV_64)
>> if ( isa[2] != '6' && isa[3] != '4' )
>> return -EINVAL;
>> + xlen = 64;
>> #else
>> # error "unsupported RISC-V bitness"
>> #endif
>
> This can be had with an initializer of "xlen". Then the (kind-of-)variable
> could be const unsigned int. Seeing the use below, is the variable
> correctly named, though? I.e. shouldn't it be guest_xlen or some such?
guest_xlen would be better.
>
> Independently I expect you will want to support 32-bit guests on 64-bit Xen
> at some point, in which case encoding this value into a global string won't
> work very well.
Yes, then it will need to be moved to a domain-specific structure.
I don't know if it makes sense to do that now (it depends on what we
decide to do with per-domain riscv,isa, or whether it is fine to go with
the global one).
Btw, it isn't always possible to have a 32-bit guest on 64-bit Xen:
when HSXLEN=64, VSXL is a WARL field that is encoded the same as the MXL
field of misa. In particular, an implementation may make VSXL be a
read-only field whose value always ensures that VSXLEN=HSXLEN.
>
>> + snprintf(guest_isa_str, sizeof(guest_isa_str), "rv%d", xlen);
>
> %u please with unsigned int.
>
> This being the only use of the variable (afaics), why is it not function-
> scope?
With the current implementation, yes, it should be function-scope.
>
>> @@ -487,6 +515,11 @@ void __init riscv_fill_hwcap(void)
>> bool all_extns_available = true;
>> struct trap_info trap;
>>
>> + for ( i = 0; i < ARRAY_SIZE(guest_unsupp_exts); i++ )
>> + {
>> + __set_bit(guest_unsupp_exts[i], guest_unsupp_bmp);
>> + }
>
> Nit: No need for braces here. And anyway - can't this be had with an
> initializer for guest_unsupp_bmp?
It could be. I will use an initializer.
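For reference, a static initializer for such a bitmap might look roughly like the sketch below. The EXT_* values and helper macros are stand-ins chosen for illustration (they are not the RISCV_ISA_EXT_* definitions or Xen's bitmap macros), and the sketch assumes all listed IDs fall into the first word of the bitmap.

```c
#include <limits.h>

/* Hypothetical extension IDs, standing in for the RISCV_ISA_EXT_* values. */
enum { EXT_d = 3, EXT_f = 5, EXT_h = 7, EXT_q = 16, EXT_v = 21, EXT_MAX = 64 };

#define BITS_PER_WORD   (sizeof(unsigned long) * CHAR_BIT)
#define BMP_WORDS(n)    (((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)
#define EXT_BIT(id)     (1UL << ((id) % BITS_PER_WORD))

/* Built at compile time, so no loop is needed at boot. */
static const unsigned long guest_unsupp[BMP_WORDS(EXT_MAX)] = {
    [0] = EXT_BIT(EXT_d) | EXT_BIT(EXT_f) | EXT_BIT(EXT_h) |
          EXT_BIT(EXT_q) | EXT_BIT(EXT_v),
};

/* Returns non-zero if the extension must be hidden from guests. */
static int ext_is_unsupp(unsigned int id)
{
    return (guest_unsupp[id / BITS_PER_WORD] >> (id % BITS_PER_WORD)) & 1;
}
```

With an initializer the object can also be const rather than merely read-only after init.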
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string
2026-04-10 10:24 ` Oleksii Kurochko
@ 2026-04-10 10:50 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-10 10:50 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 12:24, Oleksii Kurochko wrote:
> On 4/1/26 3:49 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> +#define MAX_GUEST_ISA_STR_LEN 256
>>> +char guest_isa_str[MAX_GUEST_ISA_STR_LEN];
>>
>> __ro_after_init?
>>
>> Yet then - can this really be a global? Isn't the set of extensions
>> available to a guest a per-guest property, i.e. a global could at best
>> represent an upper bound on features?
>
> At least, for now I think it is okay just to use global guest riscv,isa
> for all the domains. From my point of view if Xen doesn't support some
> extension to be used by guest so then should be responsible to generate
> proper riscv,isa.
>
> It probably make sense to have riscv,isa per guest but then we still
> will want to parse this passed riscv,isa and check what Xen supports and
> what not, what looks like a double work a little bit. Probably it is the
> best one option and if riscv,isa isn't provided then just use that one
> generated by Xen.
My take is: Have data arrangement be as it wants to be in the end (i.e.
here: record what ought to be per guest as per-guest data). All guests
using the same value for now is a fair restriction until things have
progressed more.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 06/27] xen/riscv: implement make_cpus_node()
2026-04-01 14:11 ` Jan Beulich
@ 2026-04-10 11:19 ` Oleksii Kurochko
2026-04-10 12:02 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 11:19 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 4:11 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/Makefile
>> +++ b/xen/arch/riscv/Makefile
>> @@ -1,6 +1,7 @@
>> obj-y += aplic.o
>> obj-y += cpufeature.o
>> obj-y += domain.o
>> +obj-y += domain-build.o
>> obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
>> obj-y += entry.o
>> obj-y += guestcopy.o
>
> Doesn't this belong in patch 02? And then domain-build.init.o?
Yes, it should be part of patch 02. Made a mistake during splitting.
>
>> --- a/xen/arch/riscv/domain-build.c
>> +++ b/xen/arch/riscv/domain-build.c
>> @@ -3,8 +3,10 @@
>> #include <xen/fdt-domain-build.h>
>> #include <xen/fdt-kernel.h>
>> #include <xen/init.h>
>> +#include <xen/libfdt/libfdt.h>
>> #include <xen/sched.h>
>>
>> +#include <asm/cpufeature.h>
>> #include <asm/current.h>
>> #include <asm/guest_access.h>
>>
>> @@ -44,3 +46,109 @@ int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
>>
>> return 0;
>> }
>> +
>> +int __init make_cpus_node(const struct domain *d, void *fdt)
>> +{
>> + int res;
>> + const struct dt_device_node *cpus = dt_find_node_by_path("/cpus");
>> + unsigned int cpu;
>> + u32 timebase_frequency;
>
> uint32_t please.
>
>> + bool frequency_valid;
>> + uint32_t *next_phandle = &((struct domain *)d)->arch.next_phandle;
>
> No casting away of const, please.
>
>> + dt_dprintk("Create cpus node\n");
>> +
>> + if ( !cpus )
>> + {
>> + dprintk(XENLOG_ERR, "Missing /cpus node in the device tree?\n");
>> + return -ENOENT;
>> + }
>> +
>> + frequency_valid = dt_property_read_u32(cpus, "timebase-frequency",
>> + &timebase_frequency);
>> +
>> + res = fdt_begin_node(fdt, "cpus");
>> + if ( res )
>> + return res;
>> +
>> + res = fdt_property_cell(fdt, "#address-cells", 1);
>> + if ( res )
>> + return res;
>> +
>> + res = fdt_property_cell(fdt, "#size-cells", 0);
>> + if ( res )
>> + return res;
>> +
>> + if ( frequency_valid )
>> + res = fdt_property_cell(fdt, "timebase-frequency", timebase_frequency);
>
> Handing through a property directly makes me wonder how that's going to
> fit with migration. I understand migration may not even be a mid-term
> goal, but still.
Do you mean that if I set timebase-frequency = X for the guest cpu node
and then this guest migrates to h/w where timebase-frequency is Y, we
will have an unsynced timebase-frequency?
Migration between hosts with differing timebase-frequency would require
either (a) restricting migration pools to frequency-matched hosts (for
example, KVM checks that timebase-frequency isn't different here:
https://elixir.bootlin.com/linux/v6.19.11/source/arch/riscv/kvm/vcpu_timer.c#L200),
or (b) trap-and-emulate of time CSR reads to scale the virtual timer.
I think so because, unlike Arm (which has CNTFRQ_EL0 writable by EL2,
letting the hypervisor normalize the frequency it presents), RISC-V has no
hypervisor-controlled frequency register. The guest reads
timebase-frequency from DT exactly once and trusts it forever. There's
no in-guest mechanism to update it post-boot (at least, I don't see now
how to do that based on the spec).
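To make option (b) a bit more concrete: the scaling itself is just a ratio of the two frequencies, as in the sketch below. scale_ticks() is a hypothetical name, nothing like it is part of this series, and the sketch assumes host_hz is non-zero and a compiler that provides unsigned __int128.

```c
#include <stdint.h>

/*
 * Convert a host tick count into ticks at the frequency advertised to the
 * guest. The 128-bit intermediate avoids overflow of host_ticks * guest_hz.
 * Assumes host_hz != 0.
 */
static uint64_t scale_ticks(uint64_t host_ticks, uint32_t host_hz,
                            uint32_t guest_hz)
{
    return (uint64_t)(((unsigned __int128)host_ticks * guest_hz) / host_hz);
}
```

The arithmetic is cheap; the cost of option (b) is that every guest read of the time CSR has to trap for the scaling to be applied.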
>
>> + for ( cpu = 0; cpu < d->max_vcpus; cpu++ )
>> + {
>> + char buf[64];
>> + uint32_t reg = cpu_to_fdt32(cpu);
>> +
>> + snprintf(buf, sizeof(buf), "cpu@%u", cpu);
>> + res = fdt_begin_node(fdt, buf);
>> + if ( res )
>> + return res;
>> +
>> + res = fdt_property(fdt, "reg", &reg, sizeof(reg));
>> + if ( res )
>> + return res;
>> +
>> + res = fdt_property_string(fdt, "status", "okay");
>> + if ( res )
>> + return res;
>> +
>> + res = fdt_property_string(fdt, "compatible", "riscv");
>> + if ( res )
>> + return res;
>> +
>> + BUILD_BUG_ON((sizeof("riscv,") + sizeof_field(struct gstage_mode_desc, name) + 1) >= sizeof(buf));
>
> Nit: Overlong line. Also, why +1? The name field has to include a nul, or
> else ...
Agree, +1 looks wrong here.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 07/27] xen/riscv: implement make_timer_node()
2026-04-01 14:24 ` Jan Beulich
@ 2026-04-10 11:54 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 11:54 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 4:24 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> The make_timer_node() function is implemented to return 0,
>> as RISC-V does not require the creation of a timer node.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> Yet: Why is this needed then in the first place? If this node may not
> be required, why would the function need providing? Shouldn't the call
> site be conditional then?
Generally, for RISC-V there are DT bindings which describe a timer
node (riscv,timer.yaml or sifive,clint.yaml), but the driver
(drivers/clocksource/timer-riscv.c:244) is declared with
TIMER_OF_DECLARE(riscv_timer, "riscv", ...). It matches the CPU node
(compatible "riscv"), not the timer node itself. It then does
of_find_compatible_node(NULL, NULL, "riscv,timer") only to read the
optional riscv,timer-cannot-wake-cpu flag. But I don't see much
sense at the moment in enabling the timer-cannot-wake-cpu option.
Regarding sifive,clint.yaml: it describes basically the same as the
riscv,timer node, but additionally provides some MMIO registers which
are accessible only in M-mode (which isn't the case for virtualization,
as the guest isn't run in M-mode; otherwise emulation of M-mode
registers would be needed).
So potentially someone will want to add a riscv,timer node to enable the
timer-cannot-wake-cpu option, and then some additional changes should
be done in this function.
Considering that it isn't needed now and that the timebase-frequency
property is always part of the cpu node, not a timer one, I think we can
go with an empty implementation of make_timer_node().
I can add some extra information to the commit message.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 06/27] xen/riscv: implement make_cpus_node()
2026-04-10 11:19 ` Oleksii Kurochko
@ 2026-04-10 12:02 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-10 12:02 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 13:19, Oleksii Kurochko wrote:
> On 4/1/26 4:11 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> @@ -44,3 +46,109 @@ int __init construct_domain(struct domain *d, struct kernel_info *kinfo)
>>>
>>> return 0;
>>> }
>>> +
>>> +int __init make_cpus_node(const struct domain *d, void *fdt)
>>> +{
>>> + int res;
>>> + const struct dt_device_node *cpus = dt_find_node_by_path("/cpus");
>>> + unsigned int cpu;
>>> + u32 timebase_frequency;
>>
>> uint32_t please.
>>
>>> + bool frequency_valid;
>>> + uint32_t *next_phandle = &((struct domain *)d)->arch.next_phandle;
>>
>> No casting away of const, please.
>>
>>> + dt_dprintk("Create cpus node\n");
>>> +
>>> + if ( !cpus )
>>> + {
>>> + dprintk(XENLOG_ERR, "Missing /cpus node in the device tree?\n");
>>> + return -ENOENT;
>>> + }
>>> +
>>> + frequency_valid = dt_property_read_u32(cpus, "timebase-frequency",
>>> + &timebase_frequency);
>>> +
>>> + res = fdt_begin_node(fdt, "cpus");
>>> + if ( res )
>>> + return res;
>>> +
>>> + res = fdt_property_cell(fdt, "#address-cells", 1);
>>> + if ( res )
>>> + return res;
>>> +
>>> + res = fdt_property_cell(fdt, "#size-cells", 0);
>>> + if ( res )
>>> + return res;
>>> +
>>> + if ( frequency_valid )
>>> + res = fdt_property_cell(fdt, "timebase-frequency", timebase_frequency);
>>
>> Handing through a property directly makes me wonder how that's going to
>> fit with migration. I understand migration may not even be a mid-term
>> goal, but still.
>
> Do you mean if I will set timebase-frequency = X for guest cpu node and
> then this guest will migrate to h/w where timebase-frequency is Y, so we
> will have unsynced timebase-frequency?
Yes (I wouldn't name it "unsync-ed" though; it'll simply be wrong).
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 08/27] xen/riscv: implement make_arch_nodes()
2026-04-01 14:29 ` Jan Beulich
@ 2026-04-10 13:32 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 13:32 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 4:29 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> No RISC-V-specific nodes need to be created at the moment,
>> so make_arch_nodes() is implemented to simply return 0.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> Same question as on the previous patch, albeit to a lesser degree: An
> arch hook may plausibly exist (and be called) uniformly, even if on
> some arch-es it doesn't do anything.
I'm using it to provide a node for an emulated UART, similar to what Arm
does for VPL011 by calling make_vpl011_uart_node().
>
>> ---
>> xen/arch/riscv/Makefile | 1 +
>> xen/arch/riscv/dom0less-build.c | 11 +++++++++++
>> 2 files changed, 12 insertions(+)
>> create mode 100644 xen/arch/riscv/dom0less-build.c
>
> Further, on what basis is this function put in this new file, when the
> earlier ones live in domain-build.c?
Because it is called only on the dom0less path, while the previous ones
are called on both the dom0 and dom0less paths.
>
>> --- a/xen/arch/riscv/Makefile
>> +++ b/xen/arch/riscv/Makefile
>> @@ -1,5 +1,6 @@
>> obj-y += aplic.o
>> obj-y += cpufeature.o
>> +obj-y += dom0less-build.o
>
> obj-$(CONFIG_DOM0LESS_BOOT) += dom0less-build.init.o
>
> just like Arm has it?
It makes sense.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node()
2026-04-01 14:38 ` Jan Beulich
@ 2026-04-10 14:00 ` Oleksii Kurochko
2026-04-10 14:23 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 14:00 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 4:38 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/dom0less-build.c
>> +++ b/xen/arch/riscv/dom0less-build.c
>> @@ -3,6 +3,15 @@
>> #include <xen/fdt-kernel.h>
>> #include <xen/init.h>
>>
>> +#include <asm/intc.h>
>> +
>> +int __init make_intc_domU_node(struct kernel_info *kinfo)
>> +{
>> + intc_make_domu_dt_node(kinfo);
>> +
>> + return 0;
>> +}
>
> Is this wrapper really needed? Can't what's intc_make_domu_dt_node() right
> now become make_intc_domU_node()?
With the current implementation no, as intc_hw_ops used inside
intc_make_domu_dt_node() is declared as static.
But I can introduce:
enum intc_version intc_hw_version(void)
{
    return intc_hw_ops->info->hw_version;
}
and then in make_intc_domU_node() just use switch/case to call interrupt
controller specific functions.
Would that be better? It would also help to ...
>
>> @@ -41,6 +41,10 @@ struct intc_hw_operations {
>>
>> /* handle external interrupt */
>> void (*handle_interrupt)(struct cpu_user_regs *regs);
>> +
>> + /* Create interrupt controller node for domain */
>> + int (*make_dom_dt_node)(const struct kernel_info *kinfo,
>> + const struct dt_device_node *intc);
>
> An __init-only hook is somewhat risky, just to mention it. In IOMMU code
> besides struct iommu_ops we have struct iommu_init_ops, just to give an
> example of where the same could have been the case.
.. not introduce hooks in this structure which won't exist after init.
>
>> --- a/xen/arch/riscv/intc.c
>> +++ b/xen/arch/riscv/intc.c
>> @@ -67,3 +67,11 @@ void intc_route_irq_to_xen(struct irq_desc *desc, unsigned int priority)
>> intc_set_irq_type(desc, desc->arch.type);
>> intc_set_irq_priority(desc, priority);
>> }
>> +
>> +int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
>> +{
>> + if ( intc_hw_ops && intc_hw_ops->make_dom_dt_node )
>> + return intc_hw_ops->make_dom_dt_node(kinfo, intc_hw_ops->info->node);
>> +
>> + return -ENOSYS;
>
> How do you justify this choice of return value? This isn't even a hypercall
> handler.
Because make_dom_dt_node() isn't provided by the interrupt controller, so
it isn't supported or, as the comment in public/errno.h puts it, "not
implemented":
XEN_ERRNO(ENOSYS, 38) /* Function not implemented */
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node()
2026-04-10 14:00 ` Oleksii Kurochko
@ 2026-04-10 14:23 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-10 14:23 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 16:00, Oleksii Kurochko wrote:
> On 4/1/26 4:38 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/dom0less-build.c
>>> +++ b/xen/arch/riscv/dom0less-build.c
>>> @@ -3,6 +3,15 @@
>>> #include <xen/fdt-kernel.h>
>>> #include <xen/init.h>
>>>
>>> +#include <asm/intc.h>
>>> +
>>> +int __init make_intc_domU_node(struct kernel_info *kinfo)
>>> +{
>>> + intc_make_domu_dt_node(kinfo);
>>> +
>>> + return 0;
>>> +}
>>
>> Is this wrapper really needed? Can't what's intc_make_domu_dt_node() right
>> now become make_intc_domU_node()?
>
> With current implementation no as intc_hw_ops used inside
> intc_make_domu_dt_node() is declared as static.
How does that matter if you simply rename intc_make_domu_dt_node()?
> But I can introduce:
>
> enum intc_version intc_hw_version(void)
> {
> return intc_hw_ops->info->hw_version;
> }
>
> and the in make_intc_domU_node() just use switch/case to call interrupt
> controller specific functions.
>
> Would it be better? It will also help to ...
>
>>
>>> @@ -41,6 +41,10 @@ struct intc_hw_operations {
>>>
>>> /* handle external interrupt */
>>> void (*handle_interrupt)(struct cpu_user_regs *regs);
>>> +
>>> + /* Create interrupt controller node for domain */
>>> + int (*make_dom_dt_node)(const struct kernel_info *kinfo,
>>> + const struct dt_device_node *intc);
>>
>> An __init-only hook is somewhat risky, just to mention it. In IOMMU code
>> besides struct iommu_ops we have struct iommu_init_ops, just to give an
>> example of where the same could have been the case.
>
> .. not introduce hooks in this structure which won't exist after init.
That would be nice (as already said).
>>> --- a/xen/arch/riscv/intc.c
>>> +++ b/xen/arch/riscv/intc.c
>>> @@ -67,3 +67,11 @@ void intc_route_irq_to_xen(struct irq_desc *desc, unsigned int priority)
>>> intc_set_irq_type(desc, desc->arch.type);
>>> intc_set_irq_priority(desc, priority);
>>> }
>>> +
>>> +int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
>>> +{
>>> + if ( intc_hw_ops && intc_hw_ops->make_dom_dt_node )
>>> + return intc_hw_ops->make_dom_dt_node(kinfo, intc_hw_ops->info->node);
>>> +
>>> + return -ENOSYS;
>>
>> How do you justify this choice of return value? This isn't even a hypercall
>> handler.
>
> That make_dom_dt_node() isn't provided by interrupt controller, so isn't
> supported or as it mentioned in the comment "not implemented in
> public/errno.h:
> XEN_ERRNO(ENOSYS, 38) /* Function not implemented */
"Function" as in "system call function" (normally; for us: "hypercall function").
ENOSYS really has a pretty narrow range of valid uses (according to my reading).
EOPNOTSUPP is the more generic alternative.
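As a minimal sketch of that convention, with a simplified, hypothetical ops structure (not the actual struct intc_hw_operations) and the libc errno values standing in for Xen's:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for an init-time ops table. */
struct intc_init_ops {
    int (*make_dom_dt_node)(void *fdt);
};

/* Example hook used only to exercise the dispatcher below. */
static int stub_make_node(void *fdt)
{
    (void)fdt;
    return 0;
}

/* A missing optional hook is "operation not supported", not "no such syscall". */
static int make_node(const struct intc_init_ops *ops, void *fdt)
{
    if ( !ops || !ops->make_dom_dt_node )
        return -EOPNOTSUPP;

    return ops->make_dom_dt_node(fdt);
}
```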
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-04-01 15:05 ` Jan Beulich
@ 2026-04-10 15:40 ` Oleksii Kurochko
2026-04-16 11:42 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:40 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 5:05 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> Guests using the IMSIC interrupt controller require a corresponding
>> Device Tree description. Add support for generating an IMSIC node when
>> building the guest DT.
>>
>> Keep a reference to the host IMSIC DT node and reuse its compatible
>> property while constructing the guest-visible node.
>
> Again raises a migration concern. Presumably a guest would then be able
> to migrate only to other hosts with the same compatible property.
Right, but I don't think we can do much about it; the simplest
approach is to migrate only to hosts with the same compatible property.
If your concern is about the DTS property, then for IMSIC it will always
be riscv,imsics according to the RISC-V DT spec. (IIRC, the only other
option could be qemu,riscv). Actually, I can just hard code
"riscv,imsics" explicitly instead of re-using the host's value.
If your concern is that we will migrate to a host which doesn't support
IMSIC at all, then we should either (a) not migrate to such a host or (b)
provide an emulation of IMSIC. And option (b) would be terrible, at
least from a performance point of view.
>> +static int __init imsic_set_interrupt_extended_prop(struct domain *d,
>> + void *fdt)
>> +{
>> + uint32_t len = 0, pos = 0, cpu, phandle;
>
> At least pos and cpu should be of fixed width types.
I thought that uint32_t is a fixed-width type, isn't it?
>
>> + uint32_t *irq_ext;
>> + char buf[64];
>
> Move this one into the loop's scope?
I will do that.
>> + int res;
>> +
>> + irq_ext = xvzalloc_array(uint32_t, d->max_vcpus * 2);
>> + if ( !irq_ext )
>> + return -ENOMEM;
>> +
>> + for ( cpu = 0; cpu < d->max_vcpus; cpu++ )
>> + {
>> + snprintf(buf, sizeof(buf), "/cpus/cpu@%u/interrupt-controller", cpu);
>> + phandle = fdt_get_phandle(fdt, fdt_path_offset(fdt, buf));
>> +
>> + if ( phandle <= 0 )
>> + return phandle;
>
> <= is odd to use on an unsigned type. Arm, in a similar situation, uses != .
Agree, it should be !=.
>
> Yet then - if 0 is an error indicator, aren't you converting this into a
> success indicator for the caller?
I misread the comment describing what fdt_get_phandle() returns, so it
should be:
if ( !phandle )
{
    res = -ENODEV;
    goto out;
}
with goto here, as xvfree() needs to be called in this case.
>
>> + irq_ext[pos++] = cpu_to_be32(phandle);
>> + len += sizeof(uint32_t);
>
> As before, preferably sizeof(<expression>) to clarify the connection. It's
> pretty obvious here, but still. Yet then - do you really need to maintain
> "len"? Why not use ...
>
>> + irq_ext[pos++] = cpu_to_be32(IRQ_S_EXT);
>> + len += sizeof(uint32_t);
>> + }
>> +
>> + res = fdt_property(fdt, "interrupts-extended", irq_ext, len);
>
> ... "d->max_vcpus * 2 * sizeof(*irq_ext)" here?
Agree, it could and would be better.
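Putting the points so far together (a zero phandle treated as an error, a single exit path that frees the buffer, and the length derived from the vCPU count), the shape could be roughly as in the sketch below. It is self-contained on purpose: build_irq_ext() and its phandles[] input are hypothetical stand-ins for the per-vCPU fdt_get_phandle() lookups, calloc()/free() replace xvzalloc_array()/xvfree(), and nr_vcpus is assumed to be non-zero.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define IRQ_S_EXT 9 /* RISC-V supervisor external interrupt */

/* Store a 32-bit value big-endian, as FDT cells are. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = v >> 24;
    p[1] = v >> 16;
    p[2] = v >> 8;
    p[3] = v;
}

/*
 * Build <phandle IRQ_S_EXT> pairs for nr_vcpus vCPUs. A phandle of 0 marks
 * a failed lookup. On success the caller owns *out (of *len bytes).
 */
static int build_irq_ext(const uint32_t *phandles, unsigned int nr_vcpus,
                         uint8_t **out, size_t *len)
{
    size_t size = (size_t)nr_vcpus * 2 * sizeof(uint32_t);
    uint8_t *buf = calloc(1, size);
    int res = 0;

    if ( !buf )
        return -ENOMEM;

    for ( unsigned int i = 0; i < nr_vcpus; i++ )
    {
        if ( !phandles[i] )
        {
            res = -ENODEV;
            goto out;
        }

        put_be32(buf + (2 * i) * sizeof(uint32_t), phandles[i]);
        put_be32(buf + (2 * i + 1) * sizeof(uint32_t), IRQ_S_EXT);
    }

    *out = buf;
    *len = size;
    buf = NULL; /* Ownership moved to the caller. */

 out:
    free(buf);

    return res;
}
```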
>
>> + XVFREE(irq_ext);
>
> This can be just xvfree(), as the variable goes out of scope just afterwards
> anyway.
>
>> + return res;
>> +}
>> +
>> +int __init imsic_make_dt_node(const struct kernel_info *kinfo)
>
> So this function is unused until the next patch. Acceptable as long as Eclair
> (or other) scans aren't done for RISC-V, but generally you may want to try to
> get used to order patches such that this is avoided.
>
>> +{
>> + uint32_t len;
>> + const void *data = NULL;
>
> Why the initializer?
No need any more; I had a use case before and just forgot to clean up properly.
>
>> + int res = 0;
>> + void *fdt = kinfo->fdt;
>> + const struct dt_device_node *host_imsic_node = imsic_cfg.host_node;
>> + uint32_t *next_phandle = &kinfo->bd.d->arch.next_phandle;
>> +
>> + res = fdt_begin_node(fdt, host_imsic_node->full_name);
>> + if ( res )
>> + return res;
>> +
>> + data = dt_get_property(host_imsic_node, "compatible", &len);
>> + if ( !data )
>> + {
>> + printk(XENLOG_ERR "%s: Can't find 'compatible' property\n",
>> + host_imsic_node->full_name);
>> +
>> + return -ENOENT;
>> + }
>
> Move fdt_begin_node() down below here?
>
>> --- a/xen/arch/riscv/include/asm/imsic.h
>> +++ b/xen/arch/riscv/include/asm/imsic.h
>> @@ -57,11 +57,16 @@ struct imsic_config {
>> /* MSI */
>> const struct imsic_msi *msi;
>>
>> + /* DT node of IMSIC */
>> + const struct dt_device_node *host_node;
>
> Does "host" in the name carry much of a meaning? Maybe better dt_node, to
> (potentially) distinguish it from a NUMA node which also may need tracking
> for an imsic at some point?
Maybe not too much. Let's rename it.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-04-01 15:16 ` Jan Beulich
@ 2026-04-13 8:43 ` Oleksii Kurochko
2026-04-13 8:48 ` Oleksii Kurochko
2026-04-16 11:49 ` Jan Beulich
0 siblings, 2 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-13 8:43 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/1/26 5:16 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> @@ -295,6 +297,85 @@ static void cf_check aplic_set_irq_type(struct irq_desc *desc,
>> spin_unlock(&aplic.lock);
>> }
>>
>> +static int __init cf_check aplic_make_dom_dt_node(
>> + const struct kernel_info *kinfo,
>> + const struct dt_device_node *aplic_node)
>
> Nit: Entirely bogus indentation.
>
>> +{
>> + uint32_t len;
>> + const __be32 *regs;
>> + const void *data = NULL;
>
> Again, why the initializer?
No real need, I will drop.
>
>> + int res = 0;
>> + void *fdt = kinfo->fdt;
>> +
>> + res = imsic_make_dt_node(kinfo);
>> + if ( res )
>> + return res;
>> +
>> + res = fdt_begin_node(fdt, aplic_node->full_name);
>> + if (res)
>
> Nit: Style.
>
>> + return res;
>> +
>> + data = dt_get_property(aplic_node, "#interrupt-cells", &len);
>> + if ( !data )
>> + {
>> + printk("%s: Can't find '#interrupt-cells' property\n",
>> + aplic_node->full_name);
>> + return -FDT_ERR_XEN(ENOENT);
>> + }
>
> Again, pull this up to be first in the function?
Agree, it makes sense to move it up.
>
>> + res = fdt_property(fdt, "#interrupt-cells", data, len);
>> + if ( res )
>> + return res;
>
> So host properties are again directly handed through to the guest?
I am not sure that it can be different from what the host has.
> Shouldn't the number of interrupts (aiui that's the "number of cells"
> here) a guest gets be independent from the host it runs on?
The #interrupt-cells property specifies how many 32-bit values (cells)
are used to encode a single interrupt specifier when referencing this
interrupt controller from another device node.
In this APLIC schema, it's fixed at const: 2, meaning every interrupt
reference requires exactly two cells — typically:
Cell 1 — the interrupt source number (which of the riscv,num-sources
wired inputs)
Cell 2 — the interrupt type/trigger flags (e.g. edge vs. level, active
high/low)
So when a device node writes something like interrupts = <42 1>, the
interrupt controller knows to interpret that as "source 42, trigger type
1" because #interrupt-cells = <2> told it to consume two cells per
specifier.
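A small sketch of how a consumer would decode such two-cell specifiers from a raw, big-endian property value. The struct and function names are made up for illustration and are not taken from Xen or libfdt.

```c
#include <stddef.h>
#include <stdint.h>

/* One decoded APLIC interrupt specifier (#interrupt-cells = <2>). */
struct irq_spec {
    uint32_t source; /* cell 1: interrupt source number */
    uint32_t flags;  /* cell 2: trigger type flags */
};

/* Read one big-endian 32-bit cell. */
static uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/*
 * Decode the idx-th specifier of an "interrupts" property of len bytes.
 * Returns 0 on success, -1 if the property is too short.
 */
static int decode_irq(const uint8_t *prop, size_t len, unsigned int idx,
                      struct irq_spec *out)
{
    const size_t spec_size = 2 * sizeof(uint32_t);
    size_t off = idx * spec_size;

    if ( len < off + spec_size )
        return -1;

    out->source = read_be32(prop + off);
    out->flags = read_be32(prop + off + sizeof(uint32_t));

    return 0;
}
```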
>
>> + regs = dt_get_property(aplic_node, "reg", &len);
>> + if ( !regs )
>> + {
>> + printk("%s: Can't find 'reg' property\n", aplic_node->full_name);
>> + return -FDT_ERR_XEN(ENOENT);
>> + }
>> +
>> + res = fdt_property(fdt, "reg", regs, len);
>> + if ( res )
>> + return res;
>> +
>> + data = dt_get_property(aplic_node, "riscv,num-sources", &len);
>> + if ( !data )
>> + {
>> + printk("%s: Can't find 'riscv,num-sources' property\n",
>> + aplic_node->full_name);
>> + return -FDT_ERR_XEN(ENOENT);
>> + }
>> +
>> + res = fdt_property(fdt, "riscv,num-sources", data, len);
>> + if ( res )
>> + return res;
>
> Or maybe this is the number of interrupts?
This is the total count of hardware interrupt lines wired into this
APLIC domain.
It could be independent of the host the guest runs on, but given the
possible range [1,1023] for this property, if we give a guest, let's
say, 22 sources and a new host supports only 20, then we will be in
trouble anyway.
It seems like yet another thing where we will need to check that the
new host has enough interrupts and simply deny migration to such a host.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-04-13 8:43 ` Oleksii Kurochko
@ 2026-04-13 8:48 ` Oleksii Kurochko
2026-04-16 11:49 ` Jan Beulich
1 sibling, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-13 8:48 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/13/26 10:43 AM, Oleksii Kurochko wrote:
>>> + return res;
>>> +
>>> + data = dt_get_property(aplic_node, "#interrupt-cells", &len);
>>> + if ( !data )
>>> + {
>>> + printk("%s: Can't find '#interrupt-cells' property\n",
>>> + aplic_node->full_name);
>>> + return -FDT_ERR_XEN(ENOENT);
>>> + }
>>
>> Again, pull this up to be first in the function?
>
> Agree, it makes sense to move it up.
But on the other hand, the same construction is used several times:
dt_get_property() from the host APLIC node, then writing this property
to the guest APLIC node. So maybe we want to keep
dt_get_property(aplic_node, "#interrupt-cells", &len); where it is now,
so that all guest device tree properties are constructed in a visually
uniform way.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available()
2026-04-02 9:00 ` Jan Beulich
@ 2026-04-13 9:32 ` Oleksii Kurochko
2026-04-16 12:06 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-13 9:32 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 11:00 AM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> aia_init() is going to contain all the stuff related to AIA initialization.
>> At the moment, it is just Check if SSAIA extension is available and if yes
>> set is_aia_available to true.
>
> And (future) users of aia_available() can't directly call
> riscv_isa_extension_available()? Nor can aia_available() be a convenience
> wrapper around that call? It's only ...
>
>> --- /dev/null
>> +++ b/xen/arch/riscv/aia.c
>> @@ -0,0 +1,25 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +
>> +#include <xen/errno.h>
>> +#include <xen/init.h>
>> +#include <xen/sections.h>
>> +#include <xen/types.h>
>> +
>> +#include <asm/cpufeature.h>
>> +
>> +static bool __ro_after_init is_aia_available;
>
> ... a boolean, yes, but still.
My purpose was to have a variable which represents that AIA has been
initialized properly. Maybe it makes sense to rename this variable to
is_aia_inited.
The idea is that in future patches VGEIN will also be initialized in
aia_init(), and if it wasn't initialized properly then
is_aia_available is simply kept false; in that case we will have
is_aia_available != riscv_isa_extension_available(NULL,
RISCV_ISA_EXT_ssaia).
Note that regarding VGEIN it is arguable whether it must be initialized,
as in general there may be no VGEIN, which means that h/w assisted guest
interrupt files aren't available and s/w ones should be used. But s/w
guest interrupt files aren't supported.
So with the current implementation, if VGEIN isn't initialized I will
report that AIA isn't available, which in general isn't quite true.
>
>> +bool aia_available(void)
>> +{
>> + return is_aia_available;
>> +}
>> +
>> +int __init aia_init(void)
>> +{
>> + if ( !riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ssaia) )
>> + return -ENODEV;
>> +
>> + is_aia_available = true;
>> +
>> + return 0;
>> +}
>
> Why the return value, when ...
>
>> --- a/xen/arch/riscv/intc.c
>> +++ b/xen/arch/riscv/intc.c
>> @@ -8,6 +8,7 @@
>> #include <xen/lib.h>
>> #include <xen/spinlock.h>
>>
>> +#include <asm/aia.h>
>> #include <asm/intc.h>
>>
>> static const struct intc_hw_operations *__ro_after_init intc_hw_ops;
>> @@ -27,6 +28,8 @@ void __init intc_preinit(void)
>>
>> void __init intc_init(void)
>> {
>> + aia_init();
>
> ... the sole caller doesn't care?
Good point. I think it should return nothing, as a failure probably just
means that another interrupt controller (PLIC) is going to be used.
Also, I wondered whether aia_init() should be called at the start of
aplic_preinit(). That looks a little bit incorrect, as APLIC is just a
part of the AIA spec...
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests
2026-04-02 10:03 ` Jan Beulich
@ 2026-04-13 14:42 ` Oleksii Kurochko
2026-04-16 12:21 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-13 14:42 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 12:03 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> AIA provides a hardware-accelerated mechanism for delivering external
>> interrupts to domains via "guest interrupt files" located in IMSIC.
>> A single physical hart can implement multiple such files (up to GEILEN),
>> allowing several virtual harts to receive interrupts directly from hardware
>
> Isn't use of such an optimization coming prematurely? Shouldn't this series
> focus on getting basic functionality in place?
At the moment, we don't support an APLIC-only setup for guest interrupts,
as it would require a trap-and-emulate approach. So from the start it was
decided to go with the APLIC+IMSIC approach (IMSIC because it is the only
interrupt controller which exists and supports VGEIN at the moment), and
to come back to the APLIC-only case when it is needed.
Maybe it would have been better to introduce this in the patch series
where launching of a domain actually happens.
Considering that you've already reviewed it, I would prefer to keep this
patch as part of this series.
>
>> --- a/xen/arch/riscv/aia.c
>> +++ b/xen/arch/riscv/aia.c
>> @@ -1,11 +1,24 @@
>> /* SPDX-License-Identifier: GPL-2.0-only */
>>
>> +#include <xen/bitmap.h>
>> #include <xen/errno.h>
>> #include <xen/init.h>
>> #include <xen/sections.h>
>> +#include <xen/sched.h>
>> +#include <xen/spinlock.h>
>> #include <xen/types.h>
>> +#include <xen/xvmalloc.h>
>>
>> +#include <asm/aia.h>
>> #include <asm/cpufeature.h>
>> +#include <asm/csr.h>
>> +#include <asm/current.h>
>> +
>> +/*
>> + * Bitmap for each physical cpus to detect which VS (guest)
>> + * interrupt file id was used.
>> + */
>> +DEFINE_PER_CPU(struct vgein_bmp, vgein_bmp);
>
> Why can this not be static? All management looks to be in this same file.
It could be. It couldn't be before, when I had the vgein/hgei interrupt
handler in traps, but since I decided to move it to aia.c it looks fine
to make it static.
>
>> @@ -14,12 +27,109 @@ bool aia_available(void)
>> return is_aia_available;
>> }
>>
>> +int __init vgein_init(unsigned int cpu)
>
> If this needs invoking once per CPU being brought up, it can't be __init.
Yes, it is going to be called inside the secondary CPU bring-up function.
__init sections are removed much later, after all CPUs are brought up,
so at the moment when secondary CPUs are being brought up, __init code
still exists and can be called.
>
> Also - static?
It isn't static because it will be called inside the secondary CPU
bring-up function.
>
>> +{
>> + struct vgein_bmp *vgein = &per_cpu(vgein_bmp, cpu);
>> +
>> + csr_write(CSR_HGEIE, -1UL);
>> + vgein->geilen = flsl(csr_read(CSR_HGEIE));
>> + csr_write(CSR_HGEIE, 0);
>> + if ( vgein->geilen )
>> + vgein->geilen--;
>
> I don't understand this. The "len" in "geilen" stands for "length", I suppose,
> i.e. the number of bits. Hmm, the spec itself is inconsistent: "The number of
> bits implemented in hgeip and hgeie for guest external interrupts is UNSPECIFIED
> and may be zero. This number is known as GEILEN." This may or may not include
> bit 0 (which is implemented, but r/o zero). Then saying "Hence, if GEILEN is
> nonzero, bits GEILEN:1 shall be writable in ..." suggests 0 isn't included, but
> that's not unambiguous.
But they explicitly wrote: "The least-significant bits are implemented
first, apart from bit 0." So bit 0 is explicitly excluded.
>
> Anyway, may I suggest
>
> vgein->geilen = flsl(csr_read(CSR_HGEIE) >> 1);
>
> instead?
That would indeed be better.
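As a sanity check of the suggested expression, here is a small sketch of
the GEILEN computation on a plain value instead of a CSR. fls64_demo() is
a hypothetical portable stand-in for flsl(), and the argument models what
reading hgeie returns after all ones were written to it (bits GEILEN:1
set, bit 0 read-only zero).

```c
#include <stdint.h>

/* Stand-in for flsl(): 1-based index of the highest set bit, 0 if none. */
unsigned int fls64_demo(uint64_t x)
{
    unsigned int n = 0;

    while ( x )
    {
        n++;
        x >>= 1;
    }

    return n;
}

/*
 * Shifting out the always-zero bit 0 first makes the result equal to the
 * number of implemented guest interrupt files, with no later decrement.
 */
unsigned int geilen_from_hgeie(uint64_t hgeie_readback)
{
    return fls64_demo(hgeie_readback >> 1);
}
```

A read-back of 0xE (bits 3:1 implemented) gives 3, and a read-back of 0
gives 0, so the "no guest interrupt files" case needs no special handling.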
>
>> + BUG_ON(!vgein->geilen);
>
> You can return (an error, but see the respective remark on the earlier patch),
> no need to crash the system. That return may want to come after the printk()
> below, though.
>
>> + printk("cpu%d.geilen=%d\n", cpu, vgein->geilen);
>
> As before - %u please with unsigned int.
>
>> + if ( !vgein->bmp )
>
> Why would this check be needed?
>
>> + {
>> + vgein->bmp = xvzalloc_array(unsigned long, BITS_TO_LONGS(vgein->geilen));
>
> With the determination above, isn't BITS_TO_LONGS(vgein->geilen) ==
> BITS_PER_LONG in all cases? Surely you don't mean to runtime-allocate
> space for a single unsigned long? So I wonder is the dimension used
> is wrong.
Hm, I can't remember why I did so. You are right, there is no sense in
allocating a single unsigned long at runtime...
>
> If it isn't, dynamically allocating the owners array may be more
> useful, as (on RV64) occupies a fixed 512 bytes right now.
Agreed, it makes sense; it will use much less memory if allocated like:
    vgein->owners = xvzalloc_array(struct vcpu *, vgein->geilen);
    if ( !vgein->owners )
        return -ENOMEM;
as the maximum value of vgein->geilen is 63.
>
>> + if ( !vgein->bmp )
>> + return -ENOMEM;
>> + }
>> +
>> + spin_lock_init(&vgein->lock);
>> +
>> + return 0;
>> +}
>> +
>> int __init aia_init(void)
>> {
>> + int rc = 0;
>> +
>> if ( !riscv_isa_extension_available(NULL, RISCV_ISA_EXT_ssaia) )
>> return -ENODEV;
>>
>> + if ( (rc = vgein_init(0)) )
>> + return rc;
>> +
>> is_aia_available = true;
>
> Ah, this answers a question of mine on the earlier patch: This boolean
> indicates more than just the extension being available. But why does
> the description there not simply say so? How am I as a reviewer supposed
> to know?
I realised that it is worth mentioning only when I saw your reply to
the earlier patch. I will update the commit message for convenience.
>
>> - return 0;
>> + return rc;
>> +}
>> +
>> +unsigned int vgein_assign(struct vcpu *v)
>> +{
>> + unsigned int vgein_id;
>> +
>
> Seemingly undue blank line.
>
>> + struct vgein_bmp *vgein_bmp = &per_cpu(vgein_bmp, v->processor);
>> + unsigned long *bmp = vgein_bmp->bmp;
>> + unsigned long flags;
>> +
>> + spin_lock_irqsave(&vgein_bmp->lock, flags);
>> + vgein_id = bitmap_weight(bmp, vgein_bmp->geilen);
>
> How can the ID to use be the number of bits which are set? This only works
> if all set bits are contiguous at the bottom.
Oh, it is really wrong. find_first_zero_bit() should be used instead, or:
vgein_id = find_next_zero_bit(bmp, vgein_bmp->geilen + 1, 1);
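For reference, the intended allocation scheme (scan for the first clear
bit starting at bit 1, keeping IDs in their natural bit positions) can be
sketched without Xen's bitmap helpers. This is an illustration only:
vgein_alloc_demo() and vgein_free_demo() are hypothetical names, the
bitmap is a single 64-bit word, and locking is left out.

```c
#include <stdint.h>

/*
 * Find and claim the lowest free guest interrupt file ID in [1, geilen].
 * Bit N of *bmp set means ID N is in use; bit 0 is never handed out,
 * since VGEIN = 0 means "no guest external interrupt source selected".
 * Returns the claimed ID, or 0 if all IDs are taken.
 */
unsigned int vgein_alloc_demo(uint64_t *bmp, unsigned int geilen)
{
    unsigned int id;

    for ( id = 1; id <= geilen && id < 64; id++ )
    {
        if ( !(*bmp & (UINT64_C(1) << id)) )
        {
            *bmp |= UINT64_C(1) << id;
            return id;
        }
    }

    return 0;
}

/* Release a previously claimed ID; the bit position equals the ID. */
void vgein_free_demo(uint64_t *bmp, unsigned int id)
{
    if ( id >= 1 && id < 64 )
        *bmp &= ~(UINT64_C(1) << id);
}
```

Unlike counting set bits, this still returns a valid ID after an ID in the
middle has been released, and no "+ 1" / "- 1" adjustment is needed
between the bitmap position and the value written to hstatus.VGEIN.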
>
>> + /*
>> + * All vCPU guest interrupt files are used and we don't support a case
>> + * when number of vCPU on 1 pCPU is bigger then geilen.
>> + */
>
> This wants checking in vgein_init() then. CPUs (beyond the boot one)
> violating this should not be brought online.
That would be nice. But we can't know how many vCPUs will be run on a
pCPU when vgein_init() is executed.
>
>> + ASSERT(vgein_id < vgein_bmp->geilen);
>
> What if not bit is available? By asserting, you assume the caller will not
> call here when no ID is available.
It is just a temporary ASSERT() (as we don't support software guest
interrupt files), because in general it is fine if there is no bit
available: it just means that no hardware guest interrupt file is
assigned to the virtual hart, and software-based emulation (a "software
file") must be used to handle guest external interrupts.
Would it be better to return 0 here for now and simply not create a vCPU
on ...
> Yet there is no caller of this function,
> so how can one verify whether this assertion is appropriate?
... the caller side when an assignment is expected to happen?
>
>> + bitmap_set(bmp, vgein_id, 1);
>
> __set_bit()?
I thought it would be fine to use the bitmap_*() functions for bmp.
__set_bit() is what is called inside bitmap_set().
>
>> + spin_unlock_irqrestore(&vgein_bmp->lock, flags);
>> +
>> + /*
>> + * The vgein_id shouldn't be zero, as it will indicate that no guest
>> + * external interrupt source is selected for VS-level external interrupts
>> + * according to RISC-V priviliged spec:
>> + * 8.2.1 Hypervisor Status Register (hstatus) in RISC-V priviliged spec:
>
> Please avoid section numbers in such references. The section of this name
> in the version I'm looking at is 21.2.1.
>
>> + * The VGEIN (Virtual Guest External Interrupt Number) field selects
>> + * a guest external interrupt source for VS-level external interrupts.
>> + * VGEIN is a WLRL field that must be able to hold values between zero
>> + * and the maximum guest external interrupt number (known as GEILEN),
>> + * inclusive.
>> + * When VGEIN=0, no guest external interrupt source is selected for
>> + * VS-level external interrupts.
>> + */
>> + vgein_id++;
>
> Related to my comment regarding GEILEN, this shouldn't be necessary. Keep
> bits in their natural positions, and simply avoid using bit 0 (either by
> setting it during init and then never clearing it, or by starting the
> scan for clear bits at bit 1).
>
>> +#ifdef VGEIN_DEBUG
>> + printk("%s: %pv: vgein_id(%u), xen_cpu%d_bmp=%#lx\n",
>> + __func__, v, vgein_id, v->processor, *bmp);
>> +#endif
>> +
>> + vcpu_guest_cpu_user_regs(v)->hstatus &= ~HSTATUS_VGEIN;
>> + vcpu_guest_cpu_user_regs(v)->hstatus |=
>> + MASK_INSR(vgein_id, HSTATUS_VGEIN);
>
> When is this function going to be invoked? (As before, not knowing this is
> one of the problems with introducing functions with no callers.)
vgein_assign() is going to be invoked during the call of
arch_vcpu_create().
I also thought of making vgein_assign() work only with vgein_id, just
returning it, and filling v->hstatus on the caller side. That looks a
little bit cleaner from some points of view.
It still needs to return vgein_id, as it is needed for the IMSIC guest
interrupt file address calculation.
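A caller-side helper along those lines could look roughly like the sketch
below. It assumes hstatus.VGEIN occupies bits 17:12, as in the RISC-V
privileged spec; mask_insr_demo() is a local stand-in written for this
example rather than Xen's MASK_INSR(), and hstatus_with_vgein() is a
hypothetical name.

```c
#include <stdint.h>

/* hstatus.VGEIN field mask: bits 17:12. */
#define HSTATUS_VGEIN_DEMO UINT64_C(0x0003F000)

/* Shift val up to the lowest set bit of mask and clip it to the field. */
uint64_t mask_insr_demo(uint64_t val, uint64_t mask)
{
    return (val * (mask & -mask)) & mask;
}

/* Return hstatus with VGEIN replaced by vgein_id, other bits untouched. */
uint64_t hstatus_with_vgein(uint64_t hstatus, unsigned int vgein_id)
{
    hstatus &= ~HSTATUS_VGEIN_DEMO;
    hstatus |= mask_insr_demo(vgein_id, HSTATUS_VGEIN_DEMO);

    return hstatus;
}
```

Passing vgein_id = 0 clears the field, which matches the "no guest
external interrupt source selected" meaning of VGEIN = 0.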
>
>> + return vgein_id;
>> +}
>> +
>> +void vgein_release(struct vcpu *v, unsigned int vgen_id)
>> +{
>> + unsigned long flags;
>> +
>
> Another seemingly stray blank line.
>
>> + struct vgein_bmp *vgein_bmp = &per_cpu(vgein_bmp, v->processor);
>> +
>> + spin_lock_irqsave(&vgein_bmp->lock, flags);
>> + bitmap_clear(vgein_bmp->bmp, vgen_id - 1, 1);
>
> __clear_bit()?
>
The same as with bitmap_set(): as ->bmp is a bitmap, I expected that
bitmap_*() functions should be used. But just to avoid the extra if()
inside bitmap_clear(), I will use __clear_bit().
>> + spin_unlock_irqrestore(&vgein_bmp->lock, flags);
>> +
>> +#ifdef VGEIN_DEBUG
>> + printk("%s: vgein_id(%u), xen_cpu%d_bmp=%#lx\n",
>> + __func__, vgen_id, v->processor, *vgein_bmp->bmp);
>
> I can't spot a difference from the message in vgein_assign(). How is one
> to distinguish the two in a log?
By function name which is the first argument (__func__).
>
>> --- a/xen/arch/riscv/include/asm/aia.h
>> +++ b/xen/arch/riscv/include/asm/aia.h
>> @@ -3,8 +3,26 @@
>> #ifndef ASM__RISCV__AIA_H
>> #define ASM__RISCV__AIA_H
>>
>> +#include <xen/percpu.h>
>> +#include <xen/spinlock.h>
>> +
>> +struct vcpu;
>> +
>> +struct vgein_bmp {
>
> What does the _bmp suffix indicate here? There's ...
>
>> + unsigned long *bmp;
>
> ... a bitmap field, yes, but ...
>
>> + spinlock_t lock;
>> + struct vcpu *owners[BITS_PER_LONG];
>> + unsigned int geilen;
>> +};
>
> ... the structure as a whole has quite a bit more.
Agree, the _bmp suffix makes no sense. It would be better to use _ctrl.
Also, I will move this struct to aia.c. Then it also makes sense to
rename the vgein_bmp variable to just vgein.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 02/27] xen/riscv: Implement construct_domain()
2026-04-09 13:39 ` Oleksii Kurochko
2026-04-09 14:01 ` Oleksii Kurochko
@ 2026-04-14 6:26 ` Julien Grall
1 sibling, 0 replies; 123+ messages in thread
From: Julien Grall @ 2026-04-14 6:26 UTC (permalink / raw)
To: Oleksii Kurochko, Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Roger Pau Monné,
Stefano Stabellini, xen-devel
Hi,
On 09/04/2026 22:39, Oleksii Kurochko wrote:
>>>>> + BUG_ON(v->is_initialised);
>>>>> +
>>>>> + kernel_load(kinfo);
>>>>> + initrd_load(kinfo, copy_to_guest_phys);
>>>>> + dtb_load(kinfo, copy_to_guest_phys);
>>>> These all return void, despite this also being used for non-Dom0. Is it
>>>> really fatal to a dom0less system if one out of many domains fail to be
>>>> built?
>>> For a dom0less system, my opinion is that it should not be fatal, it
>>> should simply ignore a domain that fails to build and continue with the
>>> rest. However, with the current common dom0less code it will just
>>> panic(). This is a behavior I would like to change and it is on my TODO
>>> list.
>>>
>>> Regarding the functions returning void, this is because all of them
>>> currently call panic() on failure, which I expect will need to change in
>>> order to ignore a domain that fails to build in dom0less mode.
>>>
>>> For the current implementation of the common dom0less code this is fine,
>>> but I agree it should be addressed in a separate patch series.
>>>
>>> Especially when, despite the name, there is a Dom0?
>>>
>>> For this case, a failure there should indeed be fatal, so panic() is
>>> appropriate.
>> I think you misunderstood. I wasn't referring to the building of Dom0
>> failing. Was rather emphasizing that when there is a Dom0, failure to
>> create a DomU likely should even less so be fatal, as Dom0 could later
>> rectify the situation.
>
> Oh, okay, then it is really less fatal if DomU creation will fail in the
> case of Dom0.
I am not sure I agree with this statement. The goal of dom0less is to
not have a dom0 at all. So there is no way to rectify after the fact.
At least on Arm, we took the stance that boot failures are fatal because
this is a clear signal that something went wrong. It may be more
difficult to notice if you continue to boot.
Cheers,
--
Julien Grall
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state
2026-04-02 11:31 ` Jan Beulich
@ 2026-04-14 9:22 ` Oleksii Kurochko
2026-04-16 12:31 ` Jan Beulich
2026-04-16 12:31 ` Jan Beulich
0 siblings, 2 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-14 9:22 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 1:31 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> Each vCPU interacting with the IMSIC requires state to track the
>> associated guest interrupt file and its backing context.
>>
>> Introduce a per-vCPU structure to hold IMSIC-related state, including
>> the guest interrupt file identifier and the CPU providing the backing
>> VS-file. Access to the guest file identifier is protected by a lock.
>>
>> Initialize this structure during vCPU setup and store it in arch_vcpu.
>> The initial state marks the VS-file as software-backed until it becomes
>> associated with a physical CPU.
>>
>> Add helpers to retrieve and update the guest interrupt file identifier.
>
> Yet again a functions with no callers.
They will be called in follow-up patches.
>
>> --- a/xen/arch/riscv/imsic.c
>> +++ b/xen/arch/riscv/imsic.c
>> @@ -59,6 +59,29 @@ do { \
>> csr_clear(CSR_SIREG, v); \
>> } while (0)
>>
>> +unsigned int vcpu_guest_file_id(const struct vcpu *v)
>> +{
>> + struct imsic_state *imsic_state = v->arch.imsic_state;
>> + unsigned long flags;
>> + unsigned int vsfile_id;
>> +
>> + read_lock_irqsave(&imsic_state->vsfile_lock, flags);
>> + vsfile_id = imsic_state->guest_file_id;
>> + read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
>
> What purpose does this locking have? Already ...
>
>> + return vsfile_id;
>
> ... here the value can be stale, if indeed there is a chance of races.
> Did you perhaps mean to use ACCESS_ONCE() here and where the value is
> set?
Doesn't ACCESS_ONCE() only guard against compiler re-ordering (as
basically it is just volatile-related stuff inside the macro)?
Generally, I think guest_file_id needs to be updated only during
migration of a vCPU from one pCPU to another, and I expect that the vCPU
isn't active during this migration, so no one will want to read
imsic_state->guest_file_id. But on the other hand, there is:
bool imsic_has_interrupt(const struct vcpu *vcpu)
{
...
/*
* The IMSIC SW-file directly injects interrupt via hvip so
* only check for interrupt when IMSIC VS-file is being used.
*/
read_lock_irqsave(&imsic_state->vsfile_lock, flags);
if ( imsic_state->vsfile_pcpu != NR_CPUS )
ret = !!(csr_read(CSR_HGEIP) & BIT(imsic_state->guest_file_id,
UL));
read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
...
}
which I think could be called in parallel with migration, so locks are
still needed.
>
>> @@ -315,6 +338,25 @@ static int imsic_parse_node(const struct dt_device_node *node,
>> return 0;
>> }
>>
>> +int __init vcpu_imsic_init(struct vcpu *v)
>
> __init for a function involved in setting up a vCPU?
Yes, it will be used during creation of a vCPU.
>
>> +{
>> + struct imsic_state *imsic_state;
>> +
>> + /* Allocate IMSIC context */
>> + imsic_state = xvzalloc(struct imsic_state);
>> + if ( !imsic_state )
>> + return -ENOMEM;
>> +
>> + v->arch.imsic_state = imsic_state;
>> +
>> + /* Setup IMSIC context */
>> + rwlock_init(&imsic_state->vsfile_lock);
>> +
>> + imsic_state->guest_file_id = imsic_state->vsfile_pcpu = NR_CPUS;
>
> Iirc Misra dislikes such double assignments, so better avoid them right away.
> (As per a comment at the bottom this may need splitting anyway.)
>
>> --- a/xen/arch/riscv/include/asm/domain.h
>> +++ b/xen/arch/riscv/include/asm/domain.h
>> @@ -52,6 +52,8 @@ struct arch_vcpu {
>>
>> struct vtimer vtimer;
>>
>> + struct imsic_state *imsic_state;
>
> Just like it's "vtimer", perhaps also "vimsic_state" for both the field
> and the struct tag?
Makes sense. Let's rename this field and tag.
>
>> @@ -64,8 +65,20 @@ struct imsic_config {
>> spinlock_t lock;
>> };
>>
>> +struct imsic_state {
>> + /* IMSIC VS-file */
>> + rwlock_t vsfile_lock;
>> + unsigned int guest_file_id;
>> + /*
>> + * (vsfile_pcpu >= 0) => h/w IMSIC VS-file
>> + * (vsfile_pcpu == NR_CPUS) => s/w IMSIC SW-file
>> + */
>> + unsigned long vsfile_pcpu;
>
> And why unsigned long, when unsigned int will do (as about everywhere else
> for CPU numbers)? That'll also shrink the structure size by 8 bytes.
Originally, IIRC, because the mhartid register is unsigned long, so
potentially the number of pCPUs could be anything that fits into
unsigned long.
But now I see that AIA limits the number of harts to 16,384. So we could
really use unsigned int here.
>
> As to the comment - as per vcpu_imsic_init() NR_CPUS also has some special
> meaning for guest_file_id, yet there's no comment there. How do file ID and
> NR_CPUS fit together anyway?
Agree, it looks incorrect. I tried to avoid introducing "#define
GEILEN_MAX 64".
As an option I can use UINT_MAX as the init value for guest_file_id, or
even better just use 0, as it basically means that a SW-file should be
used.
The idea is that if a s/w IMSIC SW-file is used, it doesn't sit on any
pCPU, so we need some marker to show that, and NR_CPUS should be fine.
As a result, guest_file_id should be initialized with 0 (which it will be
by default, as xvzalloc() is used for allocation) and ->vsfile_pcpu with
NR_CPUS. Also, I will add a comment above guest_file_id.
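The resulting convention could be sketched as below. The struct and
function names are hypothetical and DEMO_NR_CPUS is an arbitrary stand-in
for NR_CPUS; the only point is that the two fields use two different
"not assigned" markers.

```c
#include <stdbool.h>

#define DEMO_NR_CPUS 64u  /* hypothetical stand-in for NR_CPUS */

struct vimsic_state_demo {
    /* 0 => no h/w guest interrupt file is assigned (VGEIN = 0). */
    unsigned int guest_file_id;
    /* DEMO_NR_CPUS => s/w file, i.e. not backed by any pCPU. */
    unsigned int vsfile_pcpu;
};

/* Initial state: software-backed, no guest interrupt file. */
void vimsic_state_demo_init(struct vimsic_state_demo *s)
{
    s->guest_file_id = 0;
    s->vsfile_pcpu = DEMO_NR_CPUS;
}

/* True once the state has been bound to a physical CPU's VS-file. */
bool vimsic_uses_hw_file(const struct vimsic_state_demo *s)
{
    return s->vsfile_pcpu != DEMO_NR_CPUS;
}
```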
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support
2026-04-02 11:58 ` Jan Beulich
@ 2026-04-14 10:27 ` Oleksii Kurochko
2026-04-16 12:42 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-14 10:27 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 1:58 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> @@ -47,6 +48,19 @@ struct intc_hw_operations {
>> const struct dt_device_node *intc);
>> };
>> +
>> +struct vintc {
>> + const struct intc_info *info;
>
> Isn't this referencing a physical INTC's structure? Why would the virtual
> one's properties have to match that of the physical one?
It is because of how vAPLIC load and store emulation works.
>
>> --- /dev/null
>> +++ b/xen/arch/riscv/include/asm/vaplic.h
>> @@ -0,0 +1,36 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * xen/arch/riscv/vaplic.c
>> + *
>> + * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
>> + *
>> + * Copyright (c) Microchip.
>> + */
>> +
>> +#ifndef ASM__RISCV__VAPLIC_H
>> +#define ASM__RISCV__VAPLIC_H
>> +
>> +#include <xen/kernel.h>
>> +#include <xen/types.h>
>> +
>> +#include <asm/intc.h>
>> +
>> +struct domain;
>> +
>> +#define to_vaplic(v) container_of(v, struct vaplic, base)
>
> I'm confused here, maybe first of all because of the use of v. v is our
> common identified for struct vcpu * instances. Using it in a macro like
> this one suggests a struct vcpu * needs passing into the macro. Yet from
> the two uses of the macro that doesn't look to be the case.
v is a stale name from when vaplic was per-vCPU (which looks incorrect,
as on real h/w an APLIC doesn't belong to a single pCPU).
>
> Perhaps best to have a struct domain * passed into here?
struct domain * will be better.
>
>> +struct vaplic_regs {
>> + uint32_t domaincfg;
>> + uint32_t smsiaddrcfg;
>> + uint32_t smsiaddrcfgh;
>
> The latter two aren't used, and generally I'd expect a h-suffixed field to
> exist only for RV32. (The un-suffixed field then would need to be unsigned
> long, of course.)
It should be a part of another patch. I will drop them for now.
>
>> +};
>> +
>> +struct vaplic {
>> + struct vintc base;
>
> How does "base" fit with the type of the field?
The field name base is an idiom for embedding a "base class" struct as
the first member, enabling a form of inheritance.
Any suggestion for a better name?
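For readers not familiar with the idiom, a stripped-down sketch follows.
The struct contents are placeholders and container_of_demo() is a local
macro built on offsetof(), assumed to behave like the container_of() from
xen/kernel.h for this purpose.

```c
#include <stddef.h>

/* Generic part, known to common interrupt controller code. */
struct vintc_demo {
    int placeholder;
};

/* Specific part; embeds the generic one as a member named "base". */
struct vaplic_demo {
    struct vintc_demo base;
    unsigned int domaincfg;
};

/* Step back from a member pointer to the enclosing structure. */
#define container_of_demo(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* "Downcast" from the generic object to the vAPLIC that contains it. */
struct vaplic_demo *to_vaplic_demo(struct vintc_demo *vintc)
{
    return container_of_demo(vintc, struct vaplic_demo, base);
}
```

Common code only ever sees struct vintc_demo *, while vAPLIC code recovers
its own structure from that pointer without storing a second one.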
>
>> --- a/xen/arch/riscv/intc.c
>> +++ b/xen/arch/riscv/intc.c
>> @@ -6,6 +6,7 @@
>> #include <xen/init.h>
>> #include <xen/irq.h>
>> #include <xen/lib.h>
>> +#include <xen/sched.h>
>> #include <xen/spinlock.h>
>
> Why is this change needed all of the sudden?
Incorrect rebase. I will drop that.
>
>> --- /dev/null
>> +++ b/xen/arch/riscv/vaplic.c
>> @@ -0,0 +1,74 @@
>> +/* SPDX-License-Identifier: MIT */
>> +/*
>> + * xen/arch/riscv/vaplic.c
>> + *
>> + * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
>> + *
>> + * Copyright (c) Microchip.
>> + * Copyright (c) Vates
>> + */
>> +
>> +#include <xen/errno.h>
>> +#include <xen/sched.h>
>> +#include <xen/xvmalloc.h>
>> +
>> +#include <asm/aia.h>
>> +#include <asm/imsic.h>
>> +#include <asm/intc.h>
>> +#include <asm/vaplic.h>
>> +
>> +#include "aplic-priv.h"
>> +
>> +static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
>> +{
>> + int rc = 0;
>> +
>> + rc = vcpu_imsic_init(v);
>> + if ( rc )
>> + return rc;
>> +
>> + imsic_set_guest_file_id(v, vgein_assign(v));
>
> And vgein_assign() can't fail? (Rhetorical question - of course it can. That
> function shouldn't assert that it can fine a valid ID.)
Technically it can't fail (barring a bug, of course): in general this
function should return 0 (when there are no h/w IDs left) or a value
greater than 0 (when some h/w IDs are available). The ASSERT() inside it
was added only because of ...
>
> But then - aren't you limiting the number of vCPU-s a host can handle by the
> number vgein IDs?
... At the moment I am limiting it, because S/W guest interrupt files
(IDs) aren't supported.
>
>> + return rc;
>> +}
>> +
>> +static const struct vintc_ops vaplic_ops = {
>> + .vcpu_init = vcpu_vaplic_init,
>> +};
>> +
>> +static struct vintc * __init vaplic_alloc(void)
>> +{
>> + struct vaplic *v = NULL;
>
> Onve again - why the initializer? In fact, ...
>
>> + v = xvzalloc(struct vaplic);
>
> ... this could be the initializer.
Sure, I will use it as initializer.
>
>> + if ( !v )
>> + return NULL;
>> +
>> + return &v->base;
>> +}
>
> If you returned and ...
>
>> +int __init domain_vaplic_init(struct domain *d)
>> +{
>> + int ret = 0;
>> +
>> + d->arch.vintc = vaplic_alloc();
>
> ... stored struct vaplic *, the slightly odd to_vaplic() macro wouldn't
> be needed.
vaplic_alloc() returns struct vintc *, which is then used by to_vaplic()
to get struct vaplic *.
>
>> + if ( !d->arch.vintc )
>> + {
>> + ret = -ENOMEM;
>> + goto fail;
>
> Nit: goto when simply return could be used.
>
>> + }
>> +
>> + d->arch.vintc->ops = &vaplic_ops;
>
> Are other kinds of ops structures going to appear? If not, why the extra
> indirection?
At the moment, no, I don't see any other kinds of ops struct. It was just
a convenient way to group them and makes them easier to initialize: just
one assignment instead of adding a separate line per operation in
domain_vaplic_init().
>
>> + to_vaplic(d->arch.vintc)->regs.domaincfg =
>> + APLIC_DOMAINCFG_IE | APLIC_DOMAINCFG_DM;
>> +
>> + fail:
>> + return ret;
>> +}
>> +
>> +void __init domain_vaplic_deinit(struct domain *d)
>> +{
>> + struct vaplic *vaplic = to_vaplic(d->arch.vintc);
>> +
>> + XVFREE(vaplic);
>
> If this cleared the struct domain field, then yes. But the way it is, just
> xvfree() will suffice. (Re-work following other remarks may want it to
> become XVFREE() again, though.)
Agree, it could be xvfree() for now.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-04-02 12:22 ` Jan Beulich
@ 2026-04-14 11:29 ` Oleksii Kurochko
2026-04-16 12:51 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-14 11:29 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 2:22 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/include/asm/setup.h
>> +++ b/xen/arch/riscv/include/asm/setup.h
>> @@ -5,6 +5,10 @@
>>
>> #include <xen/types.h>
>>
>> +struct domain;
>> +struct dt_device_node;
>> +struct rangeset;
>> +
>> #define max_init_domid (0)
>>
>> void setup_mm(void);
>> @@ -13,6 +17,19 @@ void copy_from_paddr(void *dst, paddr_t paddr, unsigned long len);
>>
>> void init_csr_masks(void);
>>
>> +/* TODO: move somewhere to common header? */
>
> Counter question: Why ...
>
>> +/*
>> + * Retrieves the interrupts configuration from a device tree node and maps
>> + * those interrupts to the target domain.
>> + *
>> + * Returns:
>> + * < 0 error
>> + * 0 success
>> + */
>> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>> + bool need_mapping,
>> + struct rangeset *irq_ranges);
>
> ... is this not an inline function, when ...
>
>> --- a/xen/arch/riscv/intc.c
>> +++ b/xen/arch/riscv/intc.c
>> @@ -79,3 +79,11 @@ int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
>>
>> return -ENOSYS;
>> }
>> +
>> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>> + bool need_mapping,
>> + struct rangeset *irq_ranges)
>> +{
>> + return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
>> + irq_ranges);
>> +}
>
> ... it's merely a wrapper around an indirect function call? And then the
> function isn't used anywhere anyway.
It is used by dom0less common code, and it is a wrapper because Arm has a
different implementation and doesn't have
map_device_irqs_to_domain() in its virtual interrupt controller operations.
>
>> --- a/xen/arch/riscv/vaplic.c
>> +++ b/xen/arch/riscv/vaplic.c
>> @@ -9,6 +9,7 @@
>> */
>>
>> #include <xen/errno.h>
>> +#include <xen/iocap.h>
>> #include <xen/sched.h>
>> #include <xen/xvmalloc.h>
>>
>> @@ -19,6 +20,113 @@
>>
>> #include "aplic-priv.h"
>>
>> +struct vaplic_priv {
>> + /* Contains a legal interrupts for a domain */
>> + uint32_t auth_irq_bmp[APLIC_NUM_REGS];
>> +};
>
> With it apparently plural that is correct, the first "a" wants dropping
> from the comment.
>
>> +static bool is_irq_shared_among_domains(const struct domain *d,
>> + const unsigned int irq_num)
>> +{
>> + struct domain *tmp;
>
> const
>
>> + unsigned int reg_num = irq_num / APLIC_NUM_REGS;
>> + unsigned int bit_pos = irq_num % APLIC_NUM_REGS;
>> +
>> + for_each_domain ( tmp )
>
> See other uses of the construct for how this needs synchronizing. But looping
> over all domains looks pretty inefficient anyway for ...
>
>> + {
>> + uint32_t *auth_irq_bmp;
>> +
>> + if ( tmp == d )
>> + continue;
>> +
>> + auth_irq_bmp = tmp->arch.vintc->private;
>> +
>> + if ( auth_irq_bmp[reg_num] & BIT(bit_pos, U) )
>> + {
>> + printk("%s: irq%d is shared between %pd and %pd\n", __func__,
>> + irq_num, tmp, d);
>> +
>> + return true;
>> + }
>
> ... the intended purpose. If IRQs can't be shared, can't you maintain global
> state of which ones are in use _somewhere_?
Makes sense. I will create a static array inside
vaplic_map_device_irqs_to_domain() and maintain global state there.
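A rough sketch of what I have in mind, not the final code: a single
bitmap recording which IRQs are already assigned, so the check no longer
loops over all domains. The names (irq_in_use, claim_irq(), NR_IRQS) are
placeholders, and locking is left out on the assumption that this only
runs during boot-time domain construction.

```c
#include <stdbool.h>
#include <stdint.h>

#define NR_IRQS        1024u  /* assumption: 32 words x 32 bits, as auth_irq_bmp[] */
#define BITS_PER_WORD  32u

/* Hypothetical global record of IRQs already handed to some domain. */
static uint32_t irq_in_use[NR_IRQS / BITS_PER_WORD];

/*
 * Try to claim @irq. Returns true if it was free and is now marked used,
 * false if it is out of range or already owned by someone.
 * No locking: the sketch assumes callers are serialised.
 */
static bool claim_irq(unsigned int irq)
{
    uint32_t mask;

    if ( irq >= NR_IRQS )
        return false;

    mask = UINT32_C(1) << (irq % BITS_PER_WORD);

    if ( irq_in_use[irq / BITS_PER_WORD] & mask )
        return false;

    irq_in_use[irq / BITS_PER_WORD] |= mask;

    return true;
}
```

A second claim of the same IRQ fails, which is where the "shared IRQ
isn't supported" error would be reported.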
>
>> + }
>> +
>> + return false;
>> +}
>
> This together with ...
>
>> +int vaplic_map_device_irqs_to_domain(struct domain *d,
>> + struct dt_device_node *dev,
>> + bool need_mapping,
>> + struct rangeset *irq_ranges)
>> +{
>> + unsigned int i, nirq;
>> + int res, irq;
>> + struct dt_raw_irq rirq;
>> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
>> + unsigned int reg_num;
>> +
>> + nirq = dt_number_of_irq(dev);
>> +
>> + /* Give permission and map IRQs */
>> + for ( i = 0; i < nirq; i++ )
>> + {
>> + res = dt_device_get_raw_irq(dev, i, &rirq);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
>> + i, dt_node_full_name(dev));
>> + return res;
>> + }
>> +
>> + /*
>> + * Don't map IRQ that have no physical meaning
>> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
>> + */
>> + if ( rirq.controller != dt_interrupt_controller )
>> + {
>> + dt_dprintk("irq %u not connected to primary controller."
>> + "Connected to %s\n", i,
>> + dt_node_full_name(rirq.controller));
>> + continue;
>> + }
>> +
>> + irq = platform_get_irq(dev, i);
>> + if ( irq < 0 )
>> + {
>> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
>> + return irq;
>> + }
>> +
>> + res = irq_permit_access(d, irq);
>> + if ( res )
>> + {
>> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
>> + irq);
>
> This time the other way around: %d please with plain int. (Again at least
> once further down.)
>
>> + return res;
>> + }
>> +
>> + reg_num = irq / APLIC_NUM_REGS;
>> +
>> + if ( is_irq_shared_among_domains(d, irq) )
>> + {
>> + printk("%s: Shared IRQ isn't supported\n", __func__);
>> + return -EINVAL;
>> + }
>> +
>> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
>
> ... all of this leaves me with the impression that IRQ numbering isn't really
> virtualized. IRQs are merely split into groups, one group per domain (and
> maybe some unused). How are you going to fit in truly virtual IRQs?
What do you mean by truly virtual IRQs?
I can't fully agree that the current approach doesn't use virtual IRQs.
Yes, they are 1:1 mapped, but on the other hand Xen is responsible for
assigning an IRQ number to a guest's device, and Xen is responsible for
ensuring that a guest can't reach an IRQ which doesn't belong to it.
>
>> + dt_dprintk(" - IRQ: %u\n", irq);
>> +
>> + if ( irq_ranges )
>> + {
>> + res = rangeset_add_singleton(irq_ranges, irq);
>> + if ( res )
>> + return res;
>> + }
>
> What is irq_ranges?
IIUC, based on the Arm code, irq_ranges is an optional output accumulator:
the caller allocates and passes it in when it needs to track which IRQs were
mapped (overlay use case), or passes NULL when that tracking is not needed.
I added it here as map_device_irqs_to_domain() is called from the common
code, and so maybe one day someone will decide to pass irq_ranges to this
function. At the moment, for RISC-V there is only one user of
map_device_irqs_to_domain(), and it passes NULL.
>
>> @@ -34,6 +142,7 @@ static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
>>
>> static const struct vintc_ops vaplic_ops = {
>> .vcpu_init = vcpu_vaplic_init,
>> + .map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
>> };
>
> What about the inverse function, needed for domain cleanup?
I planned to add it when it is really needed. At the moment, I
don't have such a use case.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 17/27] xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h.
2026-04-02 12:51 ` Jan Beulich
@ 2026-04-14 11:42 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-14 11:42 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 2:51 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> These definitions are required for correct decoding of APLIC MMIO
>> accesses and target configuration, and will be used by both the
>> physical and virtual APLIC implementations.
>>
>> No functional change is intended by this patch; it only centralises
>> hardware definitions that were previously missing.
>>
>> Co-developed-by: Romain Caritey <Romain.Caritey@microchip.com>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> I can commit this as is (with my ack added), but I'd like you to consider
> again my remark elsewhere as to ...
>
>> --- a/xen/arch/riscv/include/asm/aplic.h
>> +++ b/xen/arch/riscv/include/asm/aplic.h
>> @@ -17,6 +17,11 @@
>>
>> #define APLIC_NUM_REGS 32
>>
>> +#define APLIC_REG_OFFSET_MASK 0x3FFF
>> +#define APLIC_TARGET_IPRIO_MASK 0xFF
>> +#define APLIC_TARGET_GUEST_IDX_SHIFT 12
>> +#define APLIC_TARGET_EIID_MASK 0x7FF
>> +
>> #define APLIC_DOMAINCFG_IE BIT(8, U)
>> #define APLIC_DOMAINCFG_DM BIT(2, U)
>>
>> @@ -27,6 +32,36 @@
>> #define APLIC_SOURCECFG_SM_LEVEL_HIGH 0x6
>> #define APLIC_SOURCECFG_SM_LEVEL_LOW 0x7
>>
>> +#define APLIC_DOMAINCFG 0x0000
>> +#define APLIC_SOURCECFG_BASE 0x0004
>> +#define APLIC_SOURCECFG_LAST 0x0FFC
>> +
>> +#define APLIC_SMSICFGADDR 0x1BC8
>> +#define APLIC_SMSICFGADDRH 0x1BCC
>> +
>> +#define APLIC_SETIP_BASE 0x1C00
>> +#define APLIC_SETIP_LAST 0x1C7C
>> +#define APLIC_SETIPNUM 0x1CDC
>> +
>> +#define APLIC_CLRIP_BASE 0x1D00
>> +#define APLIC_CLRIP_LAST 0x1D7C
>> +#define APLIC_CLRIPNUM 0x1DDC
>> +
>> +#define APLIC_SETIE_BASE 0x1E00
>> +#define APLIC_SETIE_LAST 0x1E7C
>> +#define APLIC_SETIENUM 0x1EDC
>> +
>> +#define APLIC_CLRIE_BASE 0x1F00
>> +#define APLIC_CLRIE_LAST 0x1F7C
>> +#define APLIC_CLRIENUM 0x1FDC
>> +
>> +#define APLIC_SETIPNUM_LE 0x2000
>> +
>> +#define APLIC_GENMSI 0x3000
>> +
>> +#define APLIC_TARGET_BASE 0x3004
>> +#define APLIC_TARGET_LAST 0x3FFC
>
> ... the case of hex digits vs the need to have L (and hence U) suffixes in
> upper case for Misra's sake. Those suffixes are easier to spot as such when
> the hex digits use lower case. Then even for un-suffixed numbers lower case
> is preferable imo, for overall consistency.
I will switch to lower case.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 18/27] xen/riscv: add vaplic access check
2026-04-02 13:10 ` Jan Beulich
@ 2026-04-14 11:45 ` Oleksii Kurochko
2026-04-15 7:35 ` Oleksii Kurochko
2026-04-16 13:01 ` Jan Beulich
0 siblings, 2 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-14 11:45 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 3:10 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/aplic.c
>> +++ b/xen/arch/riscv/aplic.c
>> @@ -38,6 +38,7 @@ static struct aplic_priv aplic = {
>>
>> static struct intc_info __ro_after_init aplic_info = {
>> .hw_version = INTC_APLIC,
>> + .private = &aplic,
>
> Isn't this the host instance again? How can you ...
>
>> --- a/xen/arch/riscv/vaplic.c
>> +++ b/xen/arch/riscv/vaplic.c
>> @@ -127,6 +127,20 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
>> return 0;
>> }
>>
>> +static int cf_check vaplic_is_access(const struct vcpu *vcpu,
>> + const unsigned long addr)
>> +{
>> + const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>> + const struct aplic_priv *priv = vaplic->base.info->private;
>> + const paddr_t paddr_end = priv->paddr_start + priv->size;
>> +
>> + /* check if it is an APLIC access */
>> + if ( priv->paddr_start <= addr && addr < paddr_end )
>
> ... use that here? Or asked differently, again: Where's the virtualization,
> i.e. the abstraction away from host properties?
With the current use case it was easier to choose this approach than to
provide the full abstraction.
>
> Furthermore, is it really sufficient to check just the starting address of
> an access? Shouldn't the last byte accessed also fall into the range in
> question?
I think that it is okay; my understanding is that the address at paddr_end
technically belongs to another range.
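If the last byte of the access should be checked as well, it could look
roughly like the following. This is only a sketch: access_in_range() and
its len parameter are hypothetical, as the current hook only receives the
start address of the access.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true only if [addr, addr + len) lies entirely
 * within [start, start + size). The comparison is arranged so that
 * addr + len is never computed and thus cannot overflow.
 */
static bool access_in_range(uint64_t start, uint64_t size,
                            uint64_t addr, uint64_t len)
{
    if ( !len || len > size || addr < start )
        return false;

    return addr - start <= size - len;
}
```

With this form a 4-byte access starting at the last two bytes of the
region is rejected, while one ending exactly at the last byte is accepted.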
>
>> + return 1;
>> +
>> + return 0;
>> +}
>
> This function looks to want to return bool (and then use true/false).
Agreed; the function pointer prototype in vintc_ops will then need
updating as well.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 24/27] xen/riscv: init rcu
2026-04-02 15:03 ` Jan Beulich
@ 2026-04-14 11:50 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-14 11:50 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 5:03 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> Fold patches 23-25 into a single one, with an adjusted title?
I am totally okay with that. I will do that.
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO
2026-04-02 14:18 ` Jan Beulich
@ 2026-04-14 16:04 ` Oleksii Kurochko
2026-04-16 13:19 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-14 16:04 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 4:18 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/vaplic.c
>> +++ b/xen/arch/riscv/vaplic.c
>> @@ -20,6 +20,16 @@
>>
>> #include "aplic-priv.h"
>>
>> +#define APLIC_REG_GET(addr, offset) \
>> + readl((void *)((vaddr_t)(addr) + offset))
>> +#define APLIC_REG_SET(addr, offset, value) \
>> + writel(value, (void *)((vaddr_t)(addr) + offset))
>
> Why is addr properly parenthesized, but offset isn't?
Overlooked that. Will fix.
>
>> +#define AUTH_IRQ_BIT(irqnum) (auth_irq_bmp[(irqnum) / APLIC_NUM_REGS] & \
>> + BIT((irqnum) % APLIC_NUM_REGS, U))
>> +
>> +#define regval_to_irqn(reg_val) ((reg_val) / sizeof(uint32_t))
>
> I'm trying to make sense of the division here, but I think the main issue
> is with naming: It's not a "register value" which is passed into here, but
> a register index (offset from a range's base register).
Register index would be clearer. I will rename: s/regval/regindx/.
>
>> @@ -127,6 +137,164 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
>> return 0;
>> }
>>
>> +static void vaplic_dm_update_target(const unsigned long hart_id, uint32_t *iprio)
>> +{
>> + *iprio &= APLIC_TARGET_IPRIO_MASK;
>> + *iprio |= (hart_id << APLIC_TARGET_HART_IDX_SHIFT);
>> +}
>> +
>> +static void vaplic_update_target(const struct imsic_config *imsic,
>> + const int guest_id,
>> + const unsigned long hart_id, uint32_t *value)
>> +{
>> + unsigned long group_index;
>> + unsigned int hhxw = imsic->group_index_bits;
>> + unsigned int lhxw = imsic->hart_index_bits;
>> + unsigned int hhxs = imsic->group_index_shift - IMSIC_MMIO_PAGE_SHIFT * 2;
>> + unsigned long base_ppn = imsic->msi[hart_id].base_addr >> IMSIC_MMIO_PAGE_SHIFT;
>> +
>> + group_index = (base_ppn >> (hhxs + 12)) & (BIT(hhxw, UL) - 1);
>
> And there's no constant available to make this literal 12 more descriptive?
As it was used in aplic_set_irq_affinity() - IMSIC_MMIO_PAGE_SHIFT could
be used here.
>
>> + *value &= APLIC_TARGET_EIID_MASK;
>> + *value |= guest_id << APLIC_TARGET_GUEST_IDX_SHIFT;
>> + *value |= hart_id << APLIC_TARGET_HART_IDX_SHIFT;
>> + *value |= group_index << (lhxw + APLIC_TARGET_HART_IDX_SHIFT) ;
>> +}
>
> Both functions returning void right now, why would they need to return their
> result via indirection?
No specific reason. Do you think it would be better to just return the
value instead? I am okay with reworking that.
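For reference, the direct-mode helper returning its result could be
sketched as below. The constants are local stand-ins: the IPRIO mask
mirrors APLIC_TARGET_IPRIO_MASK from the patch, while the hart index
shift of 18 is my assumption for APLIC_TARGET_HART_IDX_SHIFT, and
dm_target() is a placeholder name.

```c
#include <stdint.h>

#define TARGET_IPRIO_MASK      0xffu  /* mirrors APLIC_TARGET_IPRIO_MASK */
#define TARGET_HART_IDX_SHIFT  18     /* assumption: hart index in bits 31:18 */

/*
 * Build a direct-mode target value: keep only the priority bits of
 * @iprio and place @hart_id into the hart index field. The result is
 * returned instead of being written back through a pointer.
 */
static uint32_t dm_target(unsigned long hart_id, uint32_t iprio)
{
    return (iprio & TARGET_IPRIO_MASK) |
           ((uint32_t)hart_id << TARGET_HART_IDX_SHIFT);
}
```

The caller would then simply do "value = dm_target(hart_id, value);".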
>
>> +#define CALC_REG_VALUE(base) \
>> +{ \
>> + uint32_t index; \
>> + uint32_t tmp_val; \
>
> Combine these two, or have the variables have initializers?
>
>> + index = regval_to_irqn(offset - base); \
>
> There's no "offset" declared or passed into here, nor ...
>
>> + tmp_val = APLIC_REG_GET(priv->regs, aplic_addr) & ~auth_irq_bmp[index]; \
>
> ... "priv", nor ...
>
>> + value &= auth_irq_bmp[index]; \
>> + value |= tmp_val; \
>
> ... "value". It may remain like this, but then it wants putting inside the
> sole function that uses it, and be #undef-ed at the end of the function.
>
>> +}
>
> Please wrap in do/while(0), for use sites to be required to have semicolons
> (and hence look like normal statements). Or make it a statement expression
> properly returning the calculated value.
I will put the following inside the function + undef at the end:
#define CALC_REG_VALUE(base) do { \
\
uint32_t index = regindx_to_irqn(offset - (base)); \
uint32_t tmp_val = APLIC_REG_GET(priv->regs, aplic_addr) & \
~auth_irq_bmp[index]; \
value &= auth_irq_bmp[index]; \
value |= tmp_val; \
} while ( 0 )
>
>> +static int cf_check vaplic_emulate_store(const struct vcpu *vcpu,
>> + unsigned long addr, uint32_t value)
>> +{
>> + struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>> + struct aplic_priv *priv = vaplic->base.info->private;
>> + uint32_t offset = addr & APLIC_REG_OFFSET_MASK;
>
> See ./CODING_STYLE as to uses of fixed-width types.
>
>> + unsigned long aplic_addr = addr - priv->paddr_start;
>> + const uint32_t *auth_irq_bmp = vcpu->domain->arch.vintc->private;
>> +
>> + switch ( offset )
>> + {
>> + case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
>
> And (taking this just as example) any misaligned accesses falling in this range
> are fine?
Do you mean something like 0x1C02 instead of 0x1C00 or 0x1C04? If yes,
then I don't think so. I will add the following before the switch():
if ( offset & 3 )
{
gdprintk(XENLOG_WARNING, "Misaligned APLIC access at offset %#x\n",
offset);
return -EINVAL;
}
>
>> + CALC_REG_VALUE(APLIC_SETIP_BASE);
>> + break;
>> +
>> + case APLIC_CLRIP_BASE ... APLIC_CLRIP_LAST:
>> + CALC_REG_VALUE(APLIC_CLRIP_BASE);
>> + break;
>> +
>> + case APLIC_SETIE_BASE ... APLIC_SETIE_LAST:
>> + CALC_REG_VALUE(APLIC_SETIE_BASE);
>> + break;
>> +
>> + case APLIC_CLRIE_BASE ... APLIC_CLRIE_LAST:
>> + CALC_REG_VALUE(APLIC_CLRIE_BASE);
>> + break;
>> +
>> + case APLIC_SOURCECFG_BASE ... APLIC_SOURCECFG_LAST:
>> + /* We don't suppert delagation, so bit10 if sourcecfg should be 0 */
>> + ASSERT(!(value & BIT(10, U)));
>
> And that bit doesn't have a proper #define?
Not at the moment. I will add:
#define APLIC_SOURCECFG_D BIT(10, U) to aplic.h.
>
>> + /*
>> + * As sourcecfg register starts from 1:
>> + * 0x0000 domaincfg
>> + * 0x0004 sourcecfg[1]
>> + * 0x0008 sourcecfg[2]
>> + * ...
>> + * 0x0FFC sourcecfg[1023]
>> + * It is necessary to calculate an interrupt number by substracting
>
> Nit: subtracting
>
>> + * of APLIC_DOMAINCFG instead of APLIC_SOURCECFG_BASE.
>> + */
>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_DOMAINCFG)) )
>> + /* interrupt not enabled, ignore it */
>
> Throughout the series: Please adhere to ./CODING_STYLE.
>
>> + return 0;
>> +
>> + break;
>
> And any value is okay to write?
No, it should be in the range
[APLIC_SOURCECFG_SM_INACTIVE,APLIC_SOURCECFG_SM_LEVEL_LOW].
I will add this check before the break:
if ( value > APLIC_SOURCECFG_SM_LEVEL_LOW )
{
gdprintk(XENLOG_WARNING,
"value(%u) is incorrect for sourcecfg register\n",
value);
value = APLIC_SOURCECFG_SM_INACTIVE;
}
>
>> + case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
>> + struct vcpu *target_vcpu = NULL;
>> +
>> + /*
>> + * Look at vaplic_emulate_load() for explanation why
>> + * APLIC_GENMSI is substracted.
>> + */
>
> There's no vaplic_emulate_load() - how can I go look there?
It is introduced in the next patch.
>
> Also same typo again as above.
>
>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_GENMSI)) )
>> + /* interrupt not enabled, ignore it */
>> + return 0;
>> +
>> + for ( int i = 0; i < vcpu->domain->max_vcpus; i++ )
>
> unsigned int
>
>> + {
>> + struct vcpu *v = vcpu->domain->vcpu[i];
>> +
>> + if ( v->vcpu_id == (value >> APLIC_TARGET_HART_IDX_SHIFT) )
>> + {
>> + target_vcpu = v;
>> + break;
>> + }
>> + }
>> +
>> + ASSERT(target_vcpu);
>
> What guarantees the pointer to be non-NULL? The incoming value can be
> arbitrary, afaict.
I didn't understand your point. It is just checking that target_vcpu has
been found. If, after the for() loop, target_vcpu is still NULL,
then something is wrong in Xen.
>
>> + if ( !(vaplic->regs.domaincfg & APLIC_DOMAINCFG_DM) )
>> + {
>> + vaplic_dm_update_target(cpuid_to_hartid(target_vcpu->processor),
>> + &value);
>> + }
>> + else
>> + vaplic_update_target(priv->imsic_cfg,
>> + vcpu_guest_file_id(target_vcpu),
>> + cpuid_to_hartid(target_vcpu->processor),
>> + &value);
>
> I'm struggling with the naming here: When DM is clear, a function with "dm"
> in the name is called.
It means direct (delivery) mode. Maybe it would be better to put dm at the
end of the function name? Or would it be better to rename it to something else?
>
> For the latter one, unless other uses are intended speaking against that,
> instead of the middle two arguments simply pass target_vcpu?
Good point. I will update the function accordingly.
>
> Also please omit the braces consistently from both branches.
>
>> + break;
>> +
>> + case APLIC_SETIPNUM:
>> + case APLIC_SETIPNUM_LE:
>
> What about APLIC_SETIPNUM_BE?
>
>> + case APLIC_CLRIPNUM:
>> + case APLIC_SETIENUM:
>> + case APLIC_CLRIENUM:
>> + if ( AUTH_IRQ_BIT(value) )
>> + break;
>
> Aren't you easily overrunning auth_irq_bmp[] here?
It makes sense to add instead:
if ( !value || value >= APLIC_NUM_REGS * APLIC_NUM_REGS ||
!AUTH_IRQ_BIT(value) )
return 0;
break;
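Spelled out as a standalone helper, the bounds-checked lookup would be
roughly as follows. The names are placeholders, and the sketch assumes
32 words of 32 bits each, as auth_irq_bmp[] has with APLIC_NUM_REGS = 32.

```c
#include <stdbool.h>
#include <stdint.h>

#define BMP_WORDS      32u  /* assumption: matches APLIC_NUM_REGS */
#define BITS_PER_WORD  32u

/*
 * Return true only if @irq is a valid source number (1 .. 1023) and its
 * bit is set in @bmp. IRQ 0 is not a valid source, and anything beyond
 * the bitmap is rejected before the array is indexed.
 */
static bool irq_authorized(const uint32_t bmp[BMP_WORDS], uint32_t irq)
{
    if ( !irq || irq >= BMP_WORDS * BITS_PER_WORD )
        return false;

    return bmp[irq / BITS_PER_WORD] & (UINT32_C(1) << (irq % BITS_PER_WORD));
}
```

The point is only that the range check happens before the array access,
so an arbitrary guest-written value can never index past the bitmap.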
>
>> + return 0;
>> +
>> + case APLIC_DOMAINCFG:
>> + /*
>> + * TODO:
>> + * The domaincfg register has this format:
>> + * bits 31:24 read-only 0x80
>> + * bit 8 IE
>> + * bit 7 read-only 0
>> + * bit 2 DM (WARL)
>> + * bit 0 BE (WARL)
>> + *
>> + * The most interesting bit for us is IE(Interrupt Enable) bit.
>> + * At the moment, at least, Linux doesn't use domaincfg.IE bit to
>> + * disable interrupts globally, but if one day someone will use it
>> + * then extra actions should be done.
>> + */
>> +
>> + printk_once("%s: Nothing to do, domaincfg is set by aplic during "
>> + "initialization in Xen\n", __func__);
>
> As per the comment it's not "nothing to do", but your choice to ignore writes
> even if they may be relevant.
I will change "Nothing to do" to "Ignore writes to".
>
>> + return 0;
>> +
>> + default:
>> + panic("%s: unsupported register offset: %#x\n", __func__, offset);
>
> Crashing the host for the guest doing something odd? It's odd that the function
> only ever returns 0 anyway - it could simply return an error here (if the
> itention is to not ignore such writes).
But maybe it is a legal offset and we really want to support it?
Even if I just return an error, the call site will want to do
something with this error: for example, kill the domain or panic() again.
Maybe panic is too much and only the domain should be crashed here:
default:
gdprintk(XENLOG_WARNING,
"Unhandled APLIC write at offset %#x (value %#x)\n",
offset, value);
domain_crash(vcpu->domain);
return 0;
?
>
> As it's not clear what values other than zero such a function may return, I
> also can't comment on its (and the hook's) return type (may want to be bool
> instead of int).
Maybe it makes sense to switch to bool. As I mentioned above, I'm
returning -EINVAL in some cases, but it could really just return 1
instead. Here:
if ( offset & 3 )
{
gdprintk(XENLOG_WARNING, "Misaligned APLIC access at offset %#x\n",
offset);
return -EINVAL;
}
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 18/27] xen/riscv: add vaplic access check
2026-04-14 11:45 ` Oleksii Kurochko
@ 2026-04-15 7:35 ` Oleksii Kurochko
2026-04-16 13:01 ` Jan Beulich
1 sibling, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-15 7:35 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/14/26 1:45 PM, Oleksii Kurochko wrote:
>>> +static int cf_check vaplic_is_access(const struct vcpu *vcpu,
>>> + const unsigned long addr)
>>> +{
>>> + const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>>> + const struct aplic_priv *priv = vaplic->base.info->private;
>>> + const paddr_t paddr_end = priv->paddr_start + priv->size;
>>> +
>>> + /* check if it is an APLIC access */
>>> + if ( priv->paddr_start <= addr && addr < paddr_end )
>>
>> ... use that here? Or asked differently, again: Where's the
>> virtualization,
>> i.e. the abstraction away from host properties?
>
> With the current use case it was easier to choose such approach then
> provide the full abstraction.
But specifically here, even with the current approach, it doesn't make
much sense to check against the MMIO range of the real APLIC.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC
2026-04-02 14:58 ` Jan Beulich
@ 2026-04-15 7:50 ` Oleksii Kurochko
2026-04-16 13:23 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-15 7:50 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/2/26 4:58 PM, Jan Beulich wrote:
> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>> @@ -96,4 +97,7 @@ void intc_handle_external_irqs(struct cpu_user_regs *regs);
>>
>> int intc_make_domu_dt_node(const struct kernel_info *kinfo);
>>
>> +int domain_vintc_init(struct domain *d);
>> +void domain_vintc_deinit(struct domain *d);
>
> The former has a caller added, but the later hasn't.
As arch_domain_destroy() is a stub at the moment, domain_vintc_deinit()
isn't called.
>
>> @@ -87,3 +88,42 @@ int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>> return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
>> irq_ranges);
>> }
>> +
>> +int __init domain_vintc_init(struct domain *d)
>
> The __init here and below is questionable again.
This function is called only during domain construction, which happens
while the __init section still exists. So it is okay to have __init here.
Are there any use cases where that isn't true?
It could be wrong for domain_vintc_deinit(), as that could be called after
the __init section has been freed.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support
2026-04-07 14:11 ` Jan Beulich
@ 2026-04-15 10:00 ` Oleksii Kurochko
2026-04-16 14:13 ` Jan Beulich
2026-04-15 10:28 ` Oleksii Kurochko
1 sibling, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-15 10:00 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/7/26 4:11 PM, Jan Beulich wrote:
> On 10.03.2026 18:09, Oleksii Kurochko wrote:
>> Enable dom0less support for RISC-V by selecting HAS_DOM0LESS and
>> providing the minimal architecture hooks required by the common
>> dom0less infrastructure.
>>
>> Add stub implementations for architecture-specific helpers used when
>> building domains from the device tree. These currently perform no
>> additional work but allow the generic dom0less code to build and run
>> on RISC-V.
>>
>> Introduce max_init_domid as a runtime variable rather than a constant
>> so that it can be updated during dom0less domain creation.
>>
>> Provide missing helpers and definitions required by the domain
>> construction code,
>
> I'm wondering about the splitting among patches: There's half a dozen
> (effectively stub) functions which are added here, and then there is
> the single init_vuart() which was split out into the earlier patch.
> What's the pattern behind this, i.e. why isn't init_vuart() also
> being added here?
If it is more convenient, I am okay with merging the previous patch into
the current one.
init_vuart() is in a separate patch as it has some useful functionality
inside, and thereby it is more convenient to review.
>
>> including domain bitness helpers and the
>> p2m_set_allocation() prototype.
>>
>> Additionally define the guest magic memory region in the public
>> RISC-V interface.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>> ---
>> Open questions:
>> - Shouldn't declaration/defintion of max_init_domid move to common code
>> instead of having it for each architecture separately? If yes, then what
>> would be the best place.
>
> What would you use to decide whether the declaration or #define is
> needed? (Plausible headers to put it can surely be found: console.h,
> domain.h, and perhaps more.)
I thought about wrapping that in CONFIG_DOM0LESS_BOOT, as the
declaration is needed only for this case; for all other cases it is just
a #define.
Or, as an option, we could use the declaration all the time. It won't
increase the size of Xen much or decrease performance because of
the variable access.
>
>> --- a/xen/arch/riscv/include/asm/domain.h
>> +++ b/xen/arch/riscv/include/asm/domain.h
>> @@ -20,6 +20,14 @@ struct hvm_domain
>> uint64_t params[HVM_NR_PARAMS];
>> };
>>
>> +#ifdef CONFIG_RISCV_64
>> +#define is_32bit_domain(d) (0)
>> +#define is_64bit_domain(d) (1)
>> +#else
>> +#define is_32bit_domain(d) (1)
>> +#define is_64bit_domain(d) (0)
>> +#endif
>
> First, please use true/false. Then, while I agree with the RV32 part, 32-bit
> guests surely will need to be an option on a 64-bit hypervisor. Imo you'd
> better introduced a field in struct arch_domain to carry that information
> (or to derive it from) right away. That wouldn't be set to non-zero for the
> time being, i.e. that same constant-true/false would still result.
>
> Otherwise I don't see why you use #ifdef; you could then have things
> simpler as
>
> #define is_32bit_domain(d) IS_ENABLED(CONFIG_RISCV_32)
> #define is_64bit_domain(d) IS_ENABLED(CONFIG_RISCV_64)
>
> (but I specifically don't recommend going this route).
I will introduce a type in struct arch_domain then.
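Roughly along these lines, which I believe is similar in spirit to what
Arm does. All names below are placeholders for the sketch (including the
stand-in struct), not the final field or enum names:

```c
#include <stdbool.h>

/* Hypothetical guest type, recorded per domain. */
enum domain_type {
    DOMAIN_32BIT,
    DOMAIN_64BIT,
};

/* Stand-in for struct arch_domain; only the new field is shown. */
struct arch_domain_sketch {
    enum domain_type type;
};

/* Bitness is derived from the per-domain field, not from the build config. */
static inline bool is_32bit_domain(const struct arch_domain_sketch *arch)
{
    return arch->type == DOMAIN_32BIT;
}

static inline bool is_64bit_domain(const struct arch_domain_sketch *arch)
{
    return arch->type == DOMAIN_64BIT;
}
```

For the time being the field would always be set to match the hypervisor
bitness, so the result stays the same as with the constant macros.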
>
>> --- a/xen/include/public/arch-riscv.h
>> +++ b/xen/include/public/arch-riscv.h
>> @@ -58,6 +58,9 @@ typedef uint64_t xen_ulong_t;
>> #define GUEST_RAM_BANK_BASES { GUEST_RAM0_BASE }
>> #define GUEST_RAM_BANK_SIZES { GUEST_RAM0_SIZE }
>>
>> +#define GUEST_MAGIC_BASE xen_mk_ullong(0x39000000)
>> +#define GUEST_MAGIC_SIZE xen_mk_ullong(0x01000000)
>
> What is this, and why does it need putting in the public interface?
In another patch series such things will live in
asm/guest-layout.h. It will be moved there after a rebase on top of that
patch series.
It is needed now only to compile the common dom0less code, as at the
moment we don't use xenstore pages for dom0less.
But generally it is the region which is used to allocate "magic pages":
special pages that facilitate communication between the guest and the
Xen hypervisor, such as Console, XenStore pages etc.
I have it on my TODO list to understand how to remove the requirement to
have this fixed region from the dom0less common code.
> Plus how come the numbers are exactly the same as what Arm uses?
It could be different. But this region is free for RISC-V too so it is
fine to re-use.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support
2026-04-07 14:11 ` Jan Beulich
2026-04-15 10:00 ` Oleksii Kurochko
@ 2026-04-15 10:28 ` Oleksii Kurochko
2026-04-16 14:15 ` Jan Beulich
1 sibling, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-15 10:28 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/7/26 4:11 PM, Jan Beulich wrote:
>> --- a/xen/arch/riscv/setup.c
>> +++ b/xen/arch/riscv/setup.c
>> @@ -32,6 +32,8 @@
>> #include <asm/traps.h>
>> #include <asm/vsbi.h>
>>
>> +domid_t max_init_domid = 0;
> The initializer isn't of much use, is it? Instead add __read_mostly, like
> Arm has it?
Then I expect it will fail to compile, as setup.c should contain only
__init data.
So we have to either drop __read_mostly or put max_init_domid into a
different file.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-04-10 15:40 ` Oleksii Kurochko
@ 2026-04-16 11:42 ` Jan Beulich
2026-04-17 8:10 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 11:42 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:40, Oleksii Kurochko wrote:
> On 4/1/26 5:05 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> Guests using the IMSIC interrupt controller require a corresponding
>>> Device Tree description. Add support for generating an IMSIC node when
>>> building the guest DT.
>>>
>>> Keep a reference to the host IMSIC DT node and reuse its compatible
>>> property while constructing the guest-visible node.
>>
>> Again raises a migration concern. Presumably a guest would then be able
>> to migrate only to other hosts with the same compatible property.
>
> Right, but I don't think we can do too much and it is the simplest
> approach just to migrate to hosts with the same compatible property.
>
> If your concern is about DTS property then for IMSIC it will be always
> riscv,imsics according to the RISC-V DT spec. (IIRC, the only other
> option could be qemu,riscv). Actually, I can just hard code
> "riscv,imsic" explicitly instead of re-using of host name.
>
> If your concern is that we will migrate to a host which doesn't support
> IMSIC at all then we should or (a) don't migrate to such host or (b)
> provide an emulation of IMSIC. And option (b) would be terrible from at
> least performance point of view.
That would all be only a secondary concern - if an IMSIC is needed right
now, so be it. My primary concern is with inheriting the host IMSIC's
properties.
>>> +static int __init imsic_set_interrupt_extended_prop(struct domain *d,
>>> + void *fdt)
>>> +{
>>> + uint32_t len = 0, pos = 0, cpu, phandle;
>>
>> At least pos and cpu should be of fixed width types.
>
> I thought that uint32_t is fixed width type, isn't it?
Oh, of course. Problem being: I meant "shouldn't". More generally, see
./CODING_STYLE.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-04-13 8:43 ` Oleksii Kurochko
2026-04-13 8:48 ` Oleksii Kurochko
@ 2026-04-16 11:49 ` Jan Beulich
2026-04-17 9:01 ` Oleksii Kurochko
1 sibling, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 11:49 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.04.2026 10:43, Oleksii Kurochko wrote:
> On 4/1/26 5:16 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> + res = fdt_property(fdt, "#interrupt-cells", data, len);
>>> + if ( res )
>>> + return res;
>>
>> So host properties are again directly handed through to the guest?
>
> I am not sure that it can be different from what host has.
>
>> Shouldn't the number of interrupts (aiui that's the "number of cells"
>> here) a guest gets be independent from the host it runs on?
>
> The #interrupt-cells property specifies how many 32-bit values (cells)
> are used to encode a single interrupt specifier when referencing this
> interrupt controller from another device node.
> In this APLIC schema, it's fixed at const: 2, meaning every interrupt
> reference requires exactly two cells — typically:
>
> Cell 1 — the interrupt source number (which of the riscv,num-sources
> wired inputs)
> Cell 2 — the interrupt type/trigger flags (e.g. edge vs. level, active
> high/low)
So what if #interrupt-cells is 3 in the DT Xen is handed? If Xen can
cope, should that value really also be handed through to guests?
>>> + regs = dt_get_property(aplic_node, "reg", &len);
>>> + if ( !regs )
>>> + {
>>> + printk("%s: Can't find 'reg' property\n", aplic_node->full_name);
>>> + return -FDT_ERR_XEN(ENOENT);
>>> + }
>>> +
>>> + res = fdt_property(fdt, "reg", regs, len);
>>> + if ( res )
>>> + return res;
>>> +
>>> + data = dt_get_property(aplic_node, "riscv,num-sources", &len);
>>> + if ( !data )
>>> + {
>>> + printk("%s: Can't find 'riscv,num-sources' property\n",
>>> + aplic_node->full_name);
>>> + return -FDT_ERR_XEN(ENOENT);
>>> + }
>>> +
>>> + res = fdt_property(fdt, "riscv,num-sources", data, len);
>>> + if ( res )
>>> + return res;
>>
>> Or maybe this is the number of interrupts?
>
> This is the total count of hardware interrupt lines wired into this
> APLIC domain.
>
> It could be independent from the host it runs one but looking at the
> possible range [1,1023] for this property if we will put for a guest
> lets say 22 but new host support only 20 when we will be in a trouble
> anyway.
Correct. But if you had some hosts with 1024 and some with 256, how
would you bring up a guest on the former to later be able to migrate
it to one of the latter when guests inherit the count from the host?
> It seems like it is another one thing where it will be needed to check
> that new host has enough number of interrupts and then just deny
> migration to such host.
Yes.
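A minimal sketch of the check agreed on above, assuming the guest's source
count is chosen independently of the host and validated against each
candidate host. aplic_check_target() and its error codes are hypothetical,
and the [1,1023] range is taken from the quoted text:

```c
#include <errno.h>

/*
 * Hypothetical check: a guest built with 'guest_sources' APLIC sources can
 * only be hosted where the physical APLIC provides at least as many.
 */
static int aplic_check_target(unsigned int guest_sources,
                              unsigned int host_sources)
{
    /* Range quoted in the thread for riscv,num-sources. */
    if ( guest_sources == 0 || guest_sources > 1023 )
        return -EINVAL;

    /* Deny the target host if it cannot back all guest sources. */
    return guest_sources <= host_sources ? 0 : -ENOSPC;
}
```

With a guest count fixed at, say, 256, the same guest description would pass
this check on both kinds of hosts mentioned above.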
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available()
2026-04-13 9:32 ` Oleksii Kurochko
@ 2026-04-16 12:06 ` Jan Beulich
2026-04-17 9:37 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 12:06 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.04.2026 11:32, Oleksii Kurochko wrote:
> On 4/2/26 11:00 AM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> aia_init() is going to contain all the stuff related to AIA initialization.
>>> At the moment, it is just Check if SSAIA extension is available and if yes
>>> set is_aia_available to true.
>>
>> And (future) users of aia_available() can't directly call
>> riscv_isa_extension_available()? Nor can aia_available() be a convenience
>> wrapper around that call? It's only ...
>>
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/aia.c
>>> @@ -0,0 +1,25 @@
>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>> +
>>> +#include <xen/errno.h>
>>> +#include <xen/init.h>
>>> +#include <xen/sections.h>
>>> +#include <xen/types.h>
>>> +
>>> +#include <asm/cpufeature.h>
>>> +
>>> +static bool __ro_after_init is_aia_available;
>>
>> ... a boolean, yes, but still.
>
> My purpose was to have a variable which represent that AIA is
> initialized properly. Maybe, it makes sense to rename this variable to
> is_aia_inited.
I'm generally having trouble with the non-word "inited". How about
"aia_usable"? Fits ...
> The idea is that in future patches VGEIN will be also initialized in
> aia_init() and if wasn't initialized properly then just keep
> is_aia_available be set to false and in such case we will have that
> is_aia_available != riscv_isa_extension_available(NULL,
> RISCV_ISA_EXT_ssaia).
... this as well. And ...
> Note that regarding VGEIN it is arguable that it should be initialized
> as generally it could be that there is no VGEIN what means that h/w
> assisted guest interrupt files aren't available and s/w one should be
> used. But s/w guest interrupt files aren't supported.
> So I mean that with the current implementation if VGEIN isn't
> initialized I will tell that AIA isn't available what generally isn't
> quite true.
... this. The question of course is what aia_usable (or whatever its name)
being set to false means for Xen's own operation. If Xen would still work
okay (without being able to create guests), perhaps the name may want to
be "aia_guest_usable" or some such. I.e. as I said on a number of
occasions: A variable wants to be named to accurately express its
purpose, without the name growing excessively long.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests
2026-04-13 14:42 ` Oleksii Kurochko
@ 2026-04-16 12:21 ` Jan Beulich
2026-04-17 11:34 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 12:21 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 13.04.2026 16:42, Oleksii Kurochko wrote:
> On 4/2/26 12:03 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> AIA provides a hardware-accelerated mechanism for delivering external
>>> interrupts to domains via "guest interrupt files" located in IMSIC.
>>> A single physical hart can implement multiple such files (up to GEILEN),
>>> allowing several virtual harts to receive interrupts directly from hardware
>>
>> Isn't use of such an optimization coming prematurely? Shouldn't this series
>> focus on getting basic functionality in place?
>
> At the moment, we don't support only APLIC for guest interrupts as it
> will require trap-and-emulation approach, so just from the start it was
> decided to go with APLIC+IMSIC (IMSIC here as it only one interrupt
> controller which exists and supports VGEIN stuff at the moment) approach
> and then when it will be needed back to only the case when APLIC is
> supported.
>
> Maybe, it was better to introduce in patch series where a launching of
> domain actually happens.
>
> Considering that you've already made a review, I prefer then to have
> this patch part of this patch series.
That's okay, but then please take my remark as an indication that once
again the patch description makes assumptions about what the reader knows
(without it being written down anywhere I'm aware of).
>>> @@ -14,12 +27,109 @@ bool aia_available(void)
>>> return is_aia_available;
>>> }
>>>
>>> +int __init vgein_init(unsigned int cpu)
>>
>> If this needs invoking once per CPU being brought up, it can't be __init.
>
> Yes, it is going to be called inside the secondary CPU bring-up function.
>
> __init sections are removed much later, after all CPUs are brought up,
> so it looks like that at the moment when secondary CPUs are being
> brought up, __init still exists and can be called.
No. And I said so before. See XEN_SYSCTL_cpu_hotplug (and ignore it having
"hotplug" in the name, but merely consider that you can take CPUs offline
at runtime, and later bring them online again).
>> Also - static?
>
> It isn't static because it will be called inside the secondary CPU
> bring-up function.
As it doesn't need calling from the outside for the boot CPU, it's not
obvious why it would need calling from the outside for secondary ones.
>>> +{
>>> + struct vgein_bmp *vgein = &per_cpu(vgein_bmp, cpu);
>>> +
>>> + csr_write(CSR_HGEIE, -1UL);
>>> + vgein->geilen = flsl(csr_read(CSR_HGEIE));
>>> + csr_write(CSR_HGEIE, 0);
>>> + if ( vgein->geilen )
>>> + vgein->geilen--;
>>
>> I don't understand this. The "len" in "geilen" stands for "length", I suppose,
>> i.e. the number of bits. Hmm, the spec itself is inconsistent: "The number of
>> bits implemented in hgeip and hgeie for guest external interrupts is UNSPECIFIED
>> and may be zero. This number is known as GEILEN." This may or may not include
>> bit 0 (which is implemented, but r/o zero). Then saying "Hence, if GEILEN is
>> nonzero, bits GEILEN:1 shall be writable in ..." suggests 0 isn't included, but
>> that's not unambiguous.
>
> But they explicitly wrote that: The least-significant bits are
> implemented first, apart from bit 0. So bit 0 is explicitly excluded.
Fine, but not including it in anything named *LEN feels - as said - ambiguous.
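A minimal sketch of the probing arithmetic under discussion, assuming the
spec reading that bits GEILEN:1 of hgeie are writable and bit 0 is read-only
zero. The CSR access is replaced by a plain 'readback' argument; fls_ul() and
geilen_from_readback() are hypothetical names used only for illustration:

```c
/* Position of the most significant set bit, counting from 1; 0 if none. */
static unsigned int fls_ul(unsigned long x)
{
    unsigned int pos = 0;

    while ( x )
    {
        pos++;
        x >>= 1;
    }

    return pos;
}

/*
 * 'readback' stands in for the value read from hgeie after all ones were
 * written to it. Bit 0 never reads back as set, so the highest set bit sits
 * at position GEILEN + 1 (counting from 1), or there is no set bit at all.
 */
static unsigned int geilen_from_readback(unsigned long readback)
{
    unsigned int top = fls_ul(readback);

    return top ? top - 1 : 0;
}
```

Under this reading, a readback of 0x3e (bits 5:1 set) yields 5, i.e. bit 0 is
not counted.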
>>> + /*
>>> + * All vCPU guest interrupt files are used and we don't support a case
>>> + * when number of vCPU on 1 pCPU is bigger then geilen.
>>> + */
>>
>> This wants checking in vgein_init() then. CPUs (beyond the boot one)
>> violating this should not be brought online.
>
> It'll be nice. But we can't know how many vCPUs will be ran on pCPU when
> vgein_init() is executed.
I don't understand: How does it matter how many vCPU-s will exist later on
(and will run wherever)?
>>> + ASSERT(vgein_id < vgein_bmp->geilen);
>>
>> What if no bit is available? By asserting, you assume the caller will not
>> call here when no ID is available.
>
> It is just a temporary ASSERT() (as we don't support software guest
> interrupt files) because in general it is fine if there is no bit
> available, it will just mean that that no physical hardware guest
> interrupt file is assigned to the virtual hart, and software-based
> emulation (a "software file") must be used to handle guest external
> interrupts.
>
> Will it be better to return 0 now here and just don't create a vCPU
> on ...
>
> Yet there is no caller of this function,
>> so how can one verify whether this assertion is appropriate?
>
> ... the caller side when an assignment is expected to be happen?
Yes, you want to return errors for cases you can't handle. You want to
assert on internal state only, not on anything controlled by a caller
outside of the hypervisor.
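A minimal sketch of an allocator that reports exhaustion through an error
code instead of asserting. vgein_alloc() and vgein_free() are hypothetical
names, locking is left out for brevity, and geilen is assumed to be smaller
than the number of bits in unsigned long:

```c
#include <errno.h>

/*
 * Hand out the lowest free ID in 1..geilen; bit 0 is never used.
 * Returns the ID on success, or -ENOSPC when every ID is taken.
 */
static int vgein_alloc(unsigned long *bmp, unsigned int geilen)
{
    unsigned int id;

    for ( id = 1; id <= geilen; id++ )
    {
        unsigned long mask = 1UL << id;

        if ( !(*bmp & mask) )
        {
            *bmp |= mask;
            return (int)id;
        }
    }

    return -ENOSPC;
}

/* Return a previously allocated ID to the pool. */
static void vgein_free(unsigned long *bmp, unsigned int id)
{
    *bmp &= ~(1UL << id);
}
```

A caller can then fail vCPU creation cleanly when it sees a negative return
value, rather than relying on an assertion inside the allocator.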
>>> + bitmap_set(bmp, vgein_id, 1);
>>
>> __set_bit()?
>
> I thought that it will be fine to use for bmp, bitmap_* functions().
> __set_bit is what is called inside bitmap_set().
Sure, but why make things look complicated? We use __set_bit() and
friends all over the place on bitmaps. bitmap_*() exist for the less
trivial cases.
>>> + spin_unlock_irqrestore(&vgein_bmp->lock, flags);
>>> +
>>> +#ifdef VGEIN_DEBUG
>>> + printk("%s: vgein_id(%u), xen_cpu%d_bmp=%#lx\n",
>>> + __func__, vgen_id, v->processor, *vgein_bmp->bmp);
>>
>> I can't spot a difference from the message in vgein_assign(). How is one
>> to distinguish the two in a log?
>
> By function name which is the first argument (__func__).
My general take is that in gprintk() use of __func__ may be okay, but
hardly anywhere else.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state
2026-04-14 9:22 ` Oleksii Kurochko
@ 2026-04-16 12:31 ` Jan Beulich
2026-04-16 12:31 ` Jan Beulich
1 sibling, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 12:31 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.04.2026 11:22, Oleksii Kurochko wrote:
> On 4/2/26 1:31 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> Each vCPU interacting with the IMSIC requires state to track the
>>> associated guest interrupt file and its backing context.
>>>
>>> Introduce a per-vCPU structure to hold IMSIC-related state, including
>>> the guest interrupt file identifier and the CPU providing the backing
>>> VS-file. Access to the guest file identifier is protected by a lock.
>>>
>>> Initialize this structure during vCPU setup and store it in arch_vcpu.
>>> The initial state marks the VS-file as software-backed until it becomes
>>> associated with a physical CPU.
>>>
>>> Add helpers to retrieve and update the guest interrupt file identifier.
>>
>> Yet again functions with no callers.
>
> They will be called in follow-up patches.
In which case please provide some minimal information on the intended use.
>>> --- a/xen/arch/riscv/imsic.c
>>> +++ b/xen/arch/riscv/imsic.c
>>> @@ -59,6 +59,29 @@ do { \
>>> csr_clear(CSR_SIREG, v); \
>>> } while (0)
>>>
>>> +unsigned int vcpu_guest_file_id(const struct vcpu *v)
>>> +{
>>> + struct imsic_state *imsic_state = v->arch.imsic_state;
>>> + unsigned long flags;
>>> + unsigned int vsfile_id;
>>> +
>>> + read_lock_irqsave(&imsic_state->vsfile_lock, flags);
>>> + vsfile_id = imsic_state->guest_file_id;
>>> + read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
>>
>> What purpose does this locking have? Already ...
>>
>>> + return vsfile_id;
>>
>> ... here the value can be stale, if indeed there is a chance of races.
>> Did you perhaps mean to use ACCESS_ONCE() here and where the value is
>> set?
>
> ACCESS_ONCE() isn't guarantee only compiler re-ordering (as basically it
> is just volatile-related stuff inside the macros)?
>
> Generally, I think that that guest_file_id is needed to be updated only
> during migration of vCPU from one pCPU to another and I expect that
> during this migration vCPU isn't active, so no one will want to read
> imsic_state->guest_file_id. But on the other hand, there is:
> bool imsic_has_interrupt(const struct vcpu *vcpu)
> {
> ...
> /*
> * The IMSIC SW-file directly injects interrupt via hvip so
> * only check for interrupt when IMSIC VS-file is being used.
> */
>
> read_lock_irqsave(&imsic_state->vsfile_lock, flags);
> if ( imsic_state->vsfile_pcpu != NR_CPUS )
> ret = !!(csr_read(CSR_HGEIP) & BIT(imsic_state->guest_file_id,
> UL));
> read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
> ...
> }
> which I think could be called in parallel with migration, so then
> still lock are needed.
None of this addresses my pointing out that the returned value will be
stale by the point the caller gets to look at it. Which in turn raises
said question about the use of a lock. If you read
imsic_state->guest_file_id atomically (i.e. excluding tearing of reads),
the value seen / used will be just as stale as with the lock in use. Unless of
course there's yet another aspect hidden somewhere in what is not being
explained.
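A minimal sketch of the lock-free alternative, using C11 atomics as a
stand-in for ACCESS_ONCE(). The structure and function names are
hypothetical. The read cannot tear, but, exactly as with the lock, the value
may be outdated by the time the caller uses it:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-vCPU IMSIC state. */
struct imsic_state_sketch {
    _Atomic unsigned int guest_file_id;
};

/* Single untorn read; no ordering against other accesses is implied. */
static unsigned int guest_file_id_get(struct imsic_state_sketch *s)
{
    return atomic_load_explicit(&s->guest_file_id, memory_order_relaxed);
}

/* Single untorn write, pairing with the read above. */
static void guest_file_id_set(struct imsic_state_sketch *s, unsigned int id)
{
    atomic_store_explicit(&s->guest_file_id, id, memory_order_relaxed);
}
```

A lock only adds value where several fields have to be observed
consistently, as in the imsic_has_interrupt() snippet quoted above.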
>>> @@ -315,6 +338,25 @@ static int imsic_parse_node(const struct dt_device_node *node,
>>> return 0;
>>> }
>>>
>>> +int __init vcpu_imsic_init(struct vcpu *v)
>>
>> __init for a function involved in setting up a vCPU?
>
> Yes, it will be used during creation of a vCPU.
And vCPU-s can be created post-boot, can't they? (Outside of dom0less
of course, but imo you really don't want to tie each and every function
to dom0less being the primary goal right now.)
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support
2026-04-14 10:27 ` Oleksii Kurochko
@ 2026-04-16 12:42 ` Jan Beulich
2026-04-20 10:25 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 12:42 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.04.2026 12:27, Oleksii Kurochko wrote:
> On 4/2/26 1:58 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> @@ -47,6 +48,19 @@ struct intc_hw_operations {
>>> const struct dt_device_node *intc);
>>> };
>>> +
>>> +struct vintc {
>>> + const struct intc_info *info;
>>
>> Isn't this referencing a physical INTC's structure? Why would the virtual
>> one's properties have to match that of the physical one?
>
> It is because of how vAPLIC emulation load and store is working.
Thank you very much. This fully explains things, the more that of course
emulation of loads and stores comes earlier in this series. Oleksii,
really, please.
>>> +struct vaplic {
>>> + struct vintc base;
>>
>> How does "base" fit with the type of the field?
>
> The field name base is a idiom for embedding a "base class" struct as
> the first member, enabling a form of inheritance.
>
> Any suggestion how to rename it better?
vintc?
>>> --- /dev/null
>>> +++ b/xen/arch/riscv/vaplic.c
>>> @@ -0,0 +1,74 @@
>>> +/* SPDX-License-Identifier: MIT */
>>> +/*
>>> + * xen/arch/riscv/vaplic.c
>>> + *
>>> + * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
>>> + *
>>> + * Copyright (c) Microchip.
>>> + * Copyright (c) Vates
>>> + */
>>> +
>>> +#include <xen/errno.h>
>>> +#include <xen/sched.h>
>>> +#include <xen/xvmalloc.h>
>>> +
>>> +#include <asm/aia.h>
>>> +#include <asm/imsic.h>
>>> +#include <asm/intc.h>
>>> +#include <asm/vaplic.h>
>>> +
>>> +#include "aplic-priv.h"
>>> +
>>> +static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
>>> +{
>>> + int rc = 0;
>>> +
>>> + rc = vcpu_imsic_init(v);
>>> + if ( rc )
>>> + return rc;
>>> +
>>> + imsic_set_guest_file_id(v, vgein_assign(v));
>>
>> And vgein_assign() can't fail? (Rhetorical question - of course it can. That
>> function shouldn't assert that it can find a valid ID.)
>
> Technically it can't fail (except some bug of course), this function
> should in general return 0 (when there aren't left h/w IDs)
Which is "failure".
> or something > 0 (when there are some h/w IDs).
Which is "success".
> ASSERT() inside it was added only
> because of ...
>
>> But then - aren't you limiting the number of vCPU-s a host can handle by the
>> number vgein IDs?
>
> ... At the moment, I am limiting because S/W interrupt guest files
> (IDs) aren't supported.
As before - return error codes when errors occur.
>>> +static struct vintc * __init vaplic_alloc(void)
>>> +{
>>> + struct vaplic *v = NULL;
>>
>> Once again - why the initializer? In fact, ...
>>
>>> + v = xvzalloc(struct vaplic);
>>
>> ... this could be the initializer.
>
> Sure, I will use it as initializer.
>
>>
>>> + if ( !v )
>>> + return NULL;
>>> +
>>> + return &v->base;
>>> +}
>>
>> If you returned and ...
>>
>>> +int __init domain_vaplic_init(struct domain *d)
>>> +{
>>> + int ret = 0;
>>> +
>>> + d->arch.vintc = vaplic_alloc();
>>
>> ... stored struct vaplic *, the slightly odd to_vaplic() macro wouldn't
>> be needed.
>
> vaplic_alloc() return struct vintc *,
Which is what I'm putting under question. Why would a function of this name
return anything else than struct vaplic *?
> which is then used by to_vaplic()
> to get struct vaplic *.
And which is what I'm saying can be avoided.
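A minimal sketch of the suggested shape, assuming the embedded member is
renamed to "vintc" as proposed earlier in this reply. The struct contents
are placeholders, and calloc() stands in for xvzalloc():

```c
#include <stddef.h>
#include <stdlib.h>

/* Placeholder for the generic virtual interrupt controller state. */
struct vintc {
    unsigned int nr_irqs;
};

/* Placeholder for the APLIC-specific state embedding the generic part. */
struct vaplic {
    struct vintc vintc;
    unsigned int domaincfg;
};

/* Recover the containing vaplic from a pointer to its embedded vintc. */
#define to_vaplic(p) \
    ((struct vaplic *)((char *)(p) - offsetof(struct vaplic, vintc)))

/* Returns the specific type; the allocation is zero-filled. */
static struct vaplic *vaplic_alloc(void)
{
    return calloc(1, sizeof(struct vaplic));
}
```

Code holding the struct vaplic pointer can pass &v->vintc to generic code
and only needs the conversion macro when starting from a struct vintc
pointer.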
>>> + if ( !d->arch.vintc )
>>> + {
>>> + ret = -ENOMEM;
>>> + goto fail;
>>
>> Nit: goto when simply return could be used.
>>
>>> + }
>>> +
>>> + d->arch.vintc->ops = &vaplic_ops;
>>
>> Are other kinds of ops structures going to appear? If not, why the extra
>> indirection?
>
> At the moment, no I don't see any other kinds of ops struct. It was just
> convenient way to group them and then easier to initialize them - just
> one assignment instead of adding a separate line in domain_vaplic_init().
Maybe I wasn't as clear as I should have been: Why the indirection when it
doesn't abstract anything? I.e. why the "ops" field in the first place,
when everyone could access the global (until such time that abstraction
becomes necessary)?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-04-14 11:29 ` Oleksii Kurochko
@ 2026-04-16 12:51 ` Jan Beulich
2026-04-20 11:39 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 12:51 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.04.2026 13:29, Oleksii Kurochko wrote:
> On 4/2/26 2:22 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/include/asm/setup.h
>>> +++ b/xen/arch/riscv/include/asm/setup.h
>>> @@ -5,6 +5,10 @@
>>>
>>> #include <xen/types.h>
>>>
>>> +struct domain;
>>> +struct dt_device_node;
>>> +struct rangeset;
>>> +
>>> #define max_init_domid (0)
>>>
>>> void setup_mm(void);
>>> @@ -13,6 +17,19 @@ void copy_from_paddr(void *dst, paddr_t paddr, unsigned long len);
>>>
>>> void init_csr_masks(void);
>>>
>>> +/* TODO: move somewhere to common header? */
>>
>> Counter question: Why ...
>>
>>> +/*
>>> + * Retrieves the interrupts configuration from a device tree node and maps
>>> + * those interrupts to the target domain.
>>> + *
>>> + * Returns:
>>> + * < 0 error
>>> + * 0 success
>>> + */
>>> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>>> + bool need_mapping,
>>> + struct rangeset *irq_ranges);
>>
>> ... is this not an inline function, when ...
>>
>>> --- a/xen/arch/riscv/intc.c
>>> +++ b/xen/arch/riscv/intc.c
>>> @@ -79,3 +79,11 @@ int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
>>>
>>> return -ENOSYS;
>>> }
>>> +
>>> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>>> + bool need_mapping,
>>> + struct rangeset *irq_ranges)
>>> +{
>>> + return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
>>> + irq_ranges);
>>> +}
>>
>> ... it's merely a wrapper around an indirect function call? And then the
>> function isn't used anywhere anyway.
>
> It is used by dom0less common code and it is a wrapper because Arm has
> different implementation and Arm doesn't have
> map_device_irqs_to_domain() in its virtual interrupt controller operations.
But the question wasn't why this is a wrapper, but why this wrapper isn't an
inline function.
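A minimal sketch of the wrapper as a header-style inline function. The
types are heavily reduced, hypothetical stand-ins for the Xen ones
(d->arch.vintc becomes d->vintc), and stub_map() exists only to exercise the
wrapper:

```c
#include <stdbool.h>
#include <stddef.h>

struct dt_device_node;
struct rangeset;
struct domain;

struct vintc_ops {
    int (*map_device_irqs_to_domain)(struct domain *d,
                                     struct dt_device_node *dev,
                                     bool need_mapping,
                                     struct rangeset *irq_ranges);
};

struct vintc {
    const struct vintc_ops *ops;
};

struct domain {
    struct vintc *vintc; /* d->arch.vintc in the patch */
};

/* Inline wrapper: only the indirect call remains at each call site. */
static inline int map_device_irqs_to_domain(struct domain *d,
                                            struct dt_device_node *dev,
                                            bool need_mapping,
                                            struct rangeset *irq_ranges)
{
    return d->vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
                                                    irq_ranges);
}

/* Stub backend: succeeds only when a mapping is requested. */
static int stub_map(struct domain *d, struct dt_device_node *dev,
                    bool need_mapping, struct rangeset *irq_ranges)
{
    (void)d;
    (void)dev;
    (void)irq_ranges;

    return need_mapping ? 0 : -1;
}

static const struct vintc_ops stub_ops = {
    .map_device_irqs_to_domain = stub_map,
};
```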
>>> +int vaplic_map_device_irqs_to_domain(struct domain *d,
>>> + struct dt_device_node *dev,
>>> + bool need_mapping,
>>> + struct rangeset *irq_ranges)
>>> +{
>>> + unsigned int i, nirq;
>>> + int res, irq;
>>> + struct dt_raw_irq rirq;
>>> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
>>> + unsigned int reg_num;
>>> +
>>> + nirq = dt_number_of_irq(dev);
>>> +
>>> + /* Give permission and map IRQs */
>>> + for ( i = 0; i < nirq; i++ )
>>> + {
>>> + res = dt_device_get_raw_irq(dev, i, &rirq);
>>> + if ( res )
>>> + {
>>> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
>>> + i, dt_node_full_name(dev));
>>> + return res;
>>> + }
>>> +
>>> + /*
>>> + * Don't map IRQ that have no physical meaning
>>> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
>>> + */
>>> + if ( rirq.controller != dt_interrupt_controller )
>>> + {
>>> + dt_dprintk("irq %u not connected to primary controller."
>>> + "Connected to %s\n", i,
>>> + dt_node_full_name(rirq.controller));
>>> + continue;
>>> + }
>>> +
>>> + irq = platform_get_irq(dev, i);
>>> + if ( irq < 0 )
>>> + {
>>> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
>>> + return irq;
>>> + }
>>> +
>>> + res = irq_permit_access(d, irq);
>>> + if ( res )
>>> + {
>>> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
>>> + irq);
>>
>> This time the other way around: %d please with plain int. (Again at least
>> once further down.)
>>
>>> + return res;
>>> + }
>>> +
>>> + reg_num = irq / APLIC_NUM_REGS;
>>> +
>>> + if ( is_irq_shared_among_domains(d, irq) )
>>> + {
>>> + printk("%s: Shared IRQ isn't supported\n", __func__);
>>> + return -EINVAL;
>>> + }
>>> +
>>> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
>>
>> ... all of this leaves me with the impression that IRQ numbering isn't really
>> virtualized. IRQs are merely split into groups, one group per domain (and
>> maybe some unused). How are you going to fit in truly virtual IRQs?
>
> What do you mean by truly virtual IRQs?
Ones where no aspects are represented by any piece of hardware.
> I can't totally agree that the current approach isn't use virtual IRQs,
> yes, they are 1:1 mapped but on the other side Xen is responsible to
> give an IRQ number for guest's device and Xen is responsible that guest
> isn't trying to reach IRQ which not belongs to it.
In a non-virtualized environment I expect IRQs are going to be "sparse"
(i.e. with perhaps large blocks of items used elsewhere). If you had
proper translation of IRQ numbers, the same could be true for your
guests.
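A minimal sketch of what translating IRQ numbers could look like, as
opposed to the 1:1 scheme in the patch. Everything here is hypothetical:
the table size, the names, and the convention that 0 means "not mapped":

```c
#include <errno.h>
#include <stdint.h>

#define VIRQ_MAX 64 /* hypothetical number of guest-visible sources */

/* Guest IRQ -> host IRQ table; entry 0 means "not mapped". */
struct irq_xlate {
    uint16_t host_irq[VIRQ_MAX + 1];
};

/* Bind a guest-visible IRQ number to a host IRQ number. */
static int irq_xlate_map(struct irq_xlate *x, unsigned int virq,
                         unsigned int hwirq)
{
    if ( virq == 0 || virq > VIRQ_MAX || hwirq == 0 || hwirq > UINT16_MAX )
        return -EINVAL;

    if ( x->host_irq[virq] )
        return -EBUSY;

    x->host_irq[virq] = (uint16_t)hwirq;

    return 0;
}

/* Translate a guest IRQ; -ENOENT if the guest does not own it. */
static int irq_xlate_lookup(const struct irq_xlate *x, unsigned int virq)
{
    if ( virq == 0 || virq > VIRQ_MAX || !x->host_irq[virq] )
        return -ENOENT;

    return x->host_irq[virq];
}
```

With such a table the guest can see a dense range (1, 2, ...) while the
host sources behind it are sparse, and the lookup doubles as the ownership
check.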
>>> + dt_dprintk(" - IRQ: %u\n", irq);
>>> +
>>> + if ( irq_ranges )
>>> + {
>>> + res = rangeset_add_singleton(irq_ranges, irq);
>>> + if ( res )
>>> + return res;
>>> + }
>>
>> What is irq_ranges?
>
> IIUC based on Arm code irq_ranges is an optional output accumulator, the
> caller allocates and passes it in when it needs to track which IRQs were
> mapped (overlay use case), or passes NULL when that tracking is not needed.
>
> I added here as map_device_irqs_to_domain() is called from the common
> code and so maybe one day someone will decide to pass irq_ranges to this
> functions. At the moment, for RISC-V it is the only one user of
> map_device_irqs_to_domain() and it passes NULL.
Simply assert then that it's NULL?
>>> @@ -34,6 +142,7 @@ static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
>>>
>>> static const struct vintc_ops vaplic_ops = {
>>> .vcpu_init = vcpu_vaplic_init,
>>> + .map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
>>> };
>>
>> What about the inverse function, needed for domain cleanup?
>
> I planned to add it when it will be really needed. At the moment, I
> don't have such use cases.
I.e. if any domain needs re-starting, the entire system needs rebooting?
Recall that "dom0less" is a slightly misleading name, as it only allows
for there to not be a Dom0. One can be there, and hence re-starting a crashed
domain ought to be possible. For that, you need to correctly clean up
after the crashed one.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 18/27] xen/riscv: add vaplic access check
2026-04-14 11:45 ` Oleksii Kurochko
2026-04-15 7:35 ` Oleksii Kurochko
@ 2026-04-16 13:01 ` Jan Beulich
2026-04-20 11:53 ` Oleksii Kurochko
1 sibling, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 13:01 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.04.2026 13:45, Oleksii Kurochko wrote:
> On 4/2/26 3:10 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> --- a/xen/arch/riscv/aplic.c
>>> +++ b/xen/arch/riscv/aplic.c
>>> @@ -38,6 +38,7 @@ static struct aplic_priv aplic = {
>>>
>>> static struct intc_info __ro_after_init aplic_info = {
>>> .hw_version = INTC_APLIC,
>>> + .private = &aplic,
>>
>> Isn't this the host instance again? How can you ...
>>
>>> --- a/xen/arch/riscv/vaplic.c
>>> +++ b/xen/arch/riscv/vaplic.c
>>> @@ -127,6 +127,20 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
>>> return 0;
>>> }
>>>
>>> +static int cf_check vaplic_is_access(const struct vcpu *vcpu,
>>> + const unsigned long addr)
>>> +{
>>> + const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>>> + const struct aplic_priv *priv = vaplic->base.info->private;
>>> + const paddr_t paddr_end = priv->paddr_start + priv->size;
>>> +
>>> + /* check if it is an APLIC access */
>>> + if ( priv->paddr_start <= addr && addr < paddr_end )
>>
>> ... use that here? Or asked differently, again: Where's the virtualization,
>> i.e. the abstraction away from host properties?
>
> With the current use case it was easier to choose such approach then
> provide the full abstraction.
>
>> Furthermore, is it really sufficient to check just the starting address of
>> an access? Shouldn't the last byte accessed also fall into the range in
>> question?
>
> I think that it is okay, my understanding is that *paddr_end technically
> is another range.
Of course it is. But a multi-byte access crossing the paddr_end boundary
isn't purely an APLIC one. You can reject such for simplicity, but I'm
unconvinced that you can claim you will be able to correctly handle it
without proper merging.
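
A minimal sketch of the "reject for simplicity" option, assuming the
access width is made available to the check (the posted vaplic_is_access()
only receives the address). struct mmio_range and access_in_range() are
hypothetical names used for illustration only:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of an emulated MMIO window. */
struct mmio_range {
    uint64_t start; /* first byte of the window */
    uint64_t size;  /* window length in bytes */
};

/*
 * Accept an access only if both its first and its last byte fall inside
 * the window. Written so that none of the comparisons can overflow.
 */
static bool access_in_range(const struct mmio_range *r,
                            uint64_t addr, unsigned int len)
{
    if ( !len || addr < r->start )
        return false;

    /* Offset of the first accessed byte within the window. */
    uint64_t off = addr - r->start;

    return off < r->size && len <= r->size - off;
}
```

An access that starts inside the window but runs past its end is then
rejected outright rather than being treated as a pure APLIC access.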
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO
2026-04-14 16:04 ` Oleksii Kurochko
@ 2026-04-16 13:19 ` Jan Beulich
2026-04-20 15:02 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 13:19 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 14.04.2026 18:04, Oleksii Kurochko wrote:
> On 4/2/26 4:18 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> @@ -127,6 +137,164 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
>>> return 0;
>>> }
>>>
>>> +static void vaplic_dm_update_target(const unsigned long hart_id, uint32_t *iprio)
>>> +{
>>> + *iprio &= APLIC_TARGET_IPRIO_MASK;
>>> + *iprio |= (hart_id << APLIC_TARGET_HART_IDX_SHIFT);
>>> +}
>>> +
>>> +static void vaplic_update_target(const struct imsic_config *imsic,
>>> + const int guest_id,
>>> + const unsigned long hart_id, uint32_t *value)
>>> +{
>>> + unsigned long group_index;
>>> + unsigned int hhxw = imsic->group_index_bits;
>>> + unsigned int lhxw = imsic->hart_index_bits;
>>> + unsigned int hhxs = imsic->group_index_shift - IMSIC_MMIO_PAGE_SHIFT * 2;
>>> + unsigned long base_ppn = imsic->msi[hart_id].base_addr >> IMSIC_MMIO_PAGE_SHIFT;
>>> +
>>> + group_index = (base_ppn >> (hhxs + 12)) & (BIT(hhxw, UL) - 1);
>>
>> And there's no constant available to make this literal 12 more descriptive?
>
> As it was used in aplic_set_irq_affinity() - IMSIC_MMIO_PAGE_SHIFT could
> be used here.
>
>>
>>> + *value &= APLIC_TARGET_EIID_MASK;
>>> + *value |= guest_id << APLIC_TARGET_GUEST_IDX_SHIFT;
>>> + *value |= hart_id << APLIC_TARGET_HART_IDX_SHIFT;
>>> + *value |= group_index << (lhxw + APLIC_TARGET_HART_IDX_SHIFT) ;
>>> +}
>>
>> Both functions returning void right now, why would they need to return their
>> result via indirection?
>
> No specific reason. Do you think it would be better just to return value
> instead? I am okay to rework that.
Rule of thumb is: Use return value in preference to indirection when the
returned value isn't needed for some other purpose.
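
As an illustration of that rule of thumb, the direct-mode helper could
return the updated value. This is only a sketch: the two EX_ constants are
placeholders picked for the example, not the definitions from asm/aplic.h,
and ex_dm_update_target() is a hypothetical name.

```c
#include <stdint.h>

/* Placeholder values for illustration only; see asm/aplic.h for the real ones. */
#define EX_TARGET_IPRIO_MASK      0xffU
#define EX_TARGET_HART_IDX_SHIFT  18

/* Return the new target value instead of updating it through a pointer. */
static uint32_t ex_dm_update_target(unsigned long hart_id, uint32_t target)
{
    target &= EX_TARGET_IPRIO_MASK;
    target |= (uint32_t)hart_id << EX_TARGET_HART_IDX_SHIFT;

    return target;
}
```

A call site then reads "value = ex_dm_update_target(hart_id, value);".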
>>> +#define CALC_REG_VALUE(base) \
>>> +{ \
>>> + uint32_t index; \
>>> + uint32_t tmp_val; \
>>
>> Combine these two, or have the variables have initializers?
>>
>>> + index = regval_to_irqn(offset - base); \
>>
>> There's no "offset" declared or passed into here, nor ...
>>
>>> + tmp_val = APLIC_REG_GET(priv->regs, aplic_addr) & ~auth_irq_bmp[index]; \
>>
>> ... "priv", nor ...
>>
>>> + value &= auth_irq_bmp[index]; \
>>> + value |= tmp_val; \
>>
>> ... "value". It may remain like this, but then it wants putting inside the
>> sole function that uses it, and be #undef-ed at the end of the function.
>>
>>> +}
>>
>> Please wrap in do/while(0), for use sites to be required to have semicolons
>> (and hence look like normal statements). Or make it a statement expression
>> properly returning the calculated value.
>
> I will put the following inside the function + undef at the end:
>
> #define CALC_REG_VALUE(base) do { \
> \
Nit: Why this extra line?
>>> +static int cf_check vaplic_emulate_store(const struct vcpu *vcpu,
>>> + unsigned long addr, uint32_t value)
>>> +{
>>> + struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>>> + struct aplic_priv *priv = vaplic->base.info->private;
>>> + uint32_t offset = addr & APLIC_REG_OFFSET_MASK;
>>
>> See ./CODING_STYLE as to uses of fixed-width types.
>>
>>> + unsigned long aplic_addr = addr - priv->paddr_start;
>>> + const uint32_t *auth_irq_bmp = vcpu->domain->arch.vintc->private;
>>> +
>>> + switch ( offset )
>>> + {
>>> + case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
>>
>> And (taking this just as example) any misaligned accesses falling in this range
>> are fine?
>
> Do you mean something like 0x1C02 instead of 0x1C00 or 0x1C04?
Yes.
>>> + /*
>>> + * As sourcecfg register starts from 1:
>>> + * 0x0000 domaincfg
>>> + * 0x0004 sourcecfg[1]
>>> + * 0x0008 sourcecfg[2]
>>> + * ...
>>> + * 0x0FFC sourcecfg[1023]
>>> + * It is necessary to calculate an interrupt number by substracting
>>
>> Nit: subtracting
>>
>>> + * of APLIC_DOMAINCFG instead of APLIC_SOURCECFG_BASE.
>>> + */
>>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_DOMAINCFG)) )
>>> + /* interrupt not enabled, ignore it */
>>
>> Throughout the series: Please adhere to ./CODING_STYLE.
>>
>>> + return 0;
>>> +
>>> + break;
>>
>> And any value is okay to write?
>
> No, it should be in a range
> [APLIC_SOURCECFG_SM_INACTIVE,APLIC_SOURCECFG_SM_LEVEL_LOW].
>
> I will add the check before break:
> if ( value > APLIC_SOURCECFG_SM_LEVEL_LOW )
> {
> gdprintk(XENLOG_WARNING,
> "value(%u) is incorrect for sourcecfg register\n",
> value);
> value = APLIC_SOURCECFG_SM_INACTIVE;
> }
And why would writing APLIC_SOURCECFG_SM_INACTIVE be any better, when
that's not what the guest wanted? Simply ignore such writes, unless the
spec mandates specific behavior for out-of-range values?
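
A sketch of the "simply ignore" option, assuming (as in the check proposed
above) that valid source modes are those up to
APLIC_SOURCECFG_SM_LEVEL_LOW. The EX_ constant and the helper are
hypothetical, and whether the spec mandates something else for such values
still needs checking:

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder upper bound for the example; the real constant lives in asm/aplic.h. */
#define EX_SOURCECFG_SM_LEVEL_LOW 7U

/*
 * Update *reg only for in-range values. Returns true if the write was
 * applied, false if it was dropped and *reg left untouched.
 */
static bool ex_sourcecfg_write(uint32_t *reg, uint32_t value)
{
    if ( value > EX_SOURCECFG_SM_LEVEL_LOW )
        return false; /* ignore the write rather than substituting a value */

    *reg = value;

    return true;
}
```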
>>> + case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
>>> + struct vcpu *target_vcpu = NULL;
>>> +
>>> + /*
>>> + * Look at vaplic_emulate_load() for explanation why
>>> + * APLIC_GENMSI is substracted.
>>> + */
>>
>> There's no vaplic_emulate_load() - how can I go look there?
>
> It is introduced in the next patch.
As before - it should be possible to review patch series strictly
sequentially. Further, what if this patch gets committed, and the other
gets delayed by several months?
>>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_GENMSI)) )
>>> + /* interrupt not enabled, ignore it */
>>> + return 0;
>>> +
>>> + for ( int i = 0; i < vcpu->domain->max_vcpus; i++ )
>>
>> unsigned int
>>
>>> + {
>>> + struct vcpu *v = vcpu->domain->vcpu[i];
>>> +
>>> + if ( v->vcpu_id == (value >> APLIC_TARGET_HART_IDX_SHIFT) )
>>> + {
>>> + target_vcpu = v;
>>> + break;
>>> + }
>>> + }
>>> +
>>> + ASSERT(target_vcpu);
>>
>> What guarantees the pointer to be non-NULL? The incoming value can be
>> arbitrary, afaict.
>
> I didn't understand your point. It is just checking that target_vcpu has
> been found. If after for() loop the value of target_vcpu is still NULL
> then something wrong in Xen.
If that's true, then the assertion is fine to have. I can't help the
impression though that a guest could pick a value such that you can't
possibly find the target vCPU. Asserting on guest controlled input is
not okay, as was said several times before.
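
A sketch of treating a failed lookup as an error instead. The ex_ types
are stand-ins for struct vcpu / struct domain, reduced to what the example
needs, and the choice of -EINVAL is an assumption:

```c
#include <errno.h>
#include <stddef.h>

/* Stand-ins for the hypervisor structures, for illustration only. */
struct ex_vcpu { unsigned int vcpu_id; };
struct ex_domain {
    unsigned int max_vcpus;
    struct ex_vcpu **vcpu;
};

/* Look up a vCPU by a guest-supplied ID; NULL if there is none. */
static struct ex_vcpu *ex_find_vcpu(const struct ex_domain *d, unsigned int id)
{
    for ( unsigned int i = 0; i < d->max_vcpus; i++ )
        if ( d->vcpu[i] && d->vcpu[i]->vcpu_id == id )
            return d->vcpu[i];

    return NULL;
}

/* The ID comes from the guest, so a miss is an error, not a Xen bug. */
static int ex_set_target(const struct ex_domain *d, unsigned int id)
{
    const struct ex_vcpu *target = ex_find_vcpu(d, id);

    if ( !target )
        return -EINVAL;

    /* ... program the target register for 'target' here ... */
    return 0;
}
```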
>>> + if ( !(vaplic->regs.domaincfg & APLIC_DOMAINCFG_DM) )
>>> + {
>>> + vaplic_dm_update_target(cpuid_to_hartid(target_vcpu->processor),
>>> + &value);
>>> + }
>>> + else
>>> + vaplic_update_target(priv->imsic_cfg,
>>> + vcpu_guest_file_id(target_vcpu),
>>> + cpuid_to_hartid(target_vcpu->processor),
>>> + &value);
>>
>> I'm struggling with the naming here: When DM is clear, a function with "dm"
>> in the name is called.
>
> it means direct (delivery) mode. Maybe it is better to put dm at the end
> of the function name? Or it is just better to change it to something else?
Without a better understanding of what is wanted, all I can say is that
calling something with "dm" in its name when the condition says it's not
"dm" is confusing.
>>> + default:
>>> + panic("%s: unsupported register offset: %#x\n", __func__, offset);
>>
>> Crashing the host for the guest doing something odd? It's odd that the function
>> only ever returns 0 anyway - it could simply return an error here (if the
>> itention is to not ignore such writes).
>
> But maybe it is a legal offset and we really want to support it?
Still not a reason to crash the entire host?
> Even if I will return just error then a caller site will want to do
> something with this error -> for example, kill domain or panic() again.
> Maybe panic is to much and just domain should be crashed here:
>
> default:
> gdprintk(XENLOG_WARNING,
> "Unhandled APLIC write at offset %#x (value %#x)\n",
> offset, value);
> domain_crash(vcpu->domain);
> return 0;
> ?
This would already be better. You shouldn't use gdprintk() with
domain_crash() though. Please take a look at domain_crash()'s
definition - you'll then see what to do, I suppose.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC
2026-04-15 7:50 ` Oleksii Kurochko
@ 2026-04-16 13:23 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 13:23 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.04.2026 09:50, Oleksii Kurochko wrote:
> On 4/2/26 4:58 PM, Jan Beulich wrote:
>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>> @@ -87,3 +88,42 @@ int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>>> return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
>>> irq_ranges);
>>> }
>>> +
>>> +int __init domain_vintc_init(struct domain *d)
>>
>> The __init here and below is questionable again.
>
> This function is called during construction domain only, what happen
> when __init section exists. So it is okay to have __init here.
>
> Any use cases where it isn't true?
See my earlier reply. Imo it is wrong for you to tailor the code to
dom0less-only operation, with no domains ever created (or destroyed)
at runtime.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support
2026-04-15 10:00 ` Oleksii Kurochko
@ 2026-04-16 14:13 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 14:13 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.04.2026 12:00, Oleksii Kurochko wrote:
> On 4/7/26 4:11 PM, Jan Beulich wrote:
>> On 10.03.2026 18:09, Oleksii Kurochko wrote:
>>> ---
>>> Open questions:
>>> - Shouldn't declaration/defintion of max_init_domid move to common code
>>> instead of having it for each architecture separately? If yes, then what
>>> would be the best place.
>>
>> What would you use to decide whether the declaration or #define is
>> needed? (Plausible headers to put it can surely be found: console.h,
>> domain.h, and perhaps more.)
>
> I thought about to wrap that with CONFIG_DOM0LESS_BOOT as the
> declaration is needed only for this case, for all other cases it is just
> #define.
>
> Or as an option we could always use declaration all the time. It won't
> increase size of Xen too much or decrease performance because of
> variable access.
Just to mention - unless coming with very good justification, I for one
am opposed to using variables for compile-time constants. We avoid
doing so in a number of situations, so deliberately introducing a case
to the contrary would at least feel inconsistent.
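
For reference, the CONFIG_DOM0LESS_BOOT split discussed above could look
roughly like this. It is a sketch only: ex_domid_t stands in for domid_t,
ex_is_init_domid() is a made-up user, and which header would hold the
declaration is exactly the open question.

```c
#include <stdint.h>

typedef uint16_t ex_domid_t; /* stand-in for domid_t */

#ifdef CONFIG_DOM0LESS_BOOT
/* Several domains may be created at boot, so a real variable is needed. */
extern ex_domid_t max_init_domid;
#else
/* Only one initial domain: keep this a compile-time constant. */
#define max_init_domid ((ex_domid_t)0)
#endif

/* Users are written the same way in both configurations. */
static inline int ex_is_init_domid(ex_domid_t id)
{
    return id <= max_init_domid;
}
```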
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support
2026-04-15 10:28 ` Oleksii Kurochko
@ 2026-04-16 14:15 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-16 14:15 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 15.04.2026 12:28, Oleksii Kurochko wrote:
> On 4/7/26 4:11 PM, Jan Beulich wrote:
>>> --- a/xen/arch/riscv/setup.c
>>> +++ b/xen/arch/riscv/setup.c
>>> @@ -32,6 +32,8 @@
>>> #include <asm/traps.h>
>>> #include <asm/vsbi.h>
>>>
>>> +domid_t max_init_domid = 0;
>> The initializer isn't of much use, is it? Instead add __read_mostly, like
>> Arm has it?
>
> Then I expect it will fail to compile as setup.c should contain only
> __init data.
But that's true no matter whether you have __read_mostly there or no section
placement attribute at all.
> So the we have to remove or __read_mostly or put max_init_domid to
> different file.
If it can't be __initdata (or alike), any data item needs to live outside of
any file compiled into *.init.o.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-04-16 11:42 ` Jan Beulich
@ 2026-04-17 8:10 ` Oleksii Kurochko
2026-04-17 13:50 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-17 8:10 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 1:42 PM, Jan Beulich wrote:
> On 10.04.2026 17:40, Oleksii Kurochko wrote:
>> On 4/1/26 5:05 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> Guests using the IMSIC interrupt controller require a corresponding
>>>> Device Tree description. Add support for generating an IMSIC node when
>>>> building the guest DT.
>>>>
>>>> Keep a reference to the host IMSIC DT node and reuse its compatible
>>>> property while constructing the guest-visible node.
>>>
>>> Again raises a migration concern. Presumably a guest would then be able
>>> to migrate only to other hosts with the same compatible property.
>>
>> Right, but I don't think we can do too much and it is the simplest
>> approach just to migrate to hosts with the same compatible property.
>>
>> If you concern is about DTS property then for IMSIC it will be always
>> riscv,imsics according to the RISC-V DT spec. (IIRC, the only other
>> option could be qemu,riscv). Actually, I can just hard code
>> "riscv,imsic" explicitly instead of re-using of host name.
>>
>> If your concern is that we will migrate to a host which doesn't support
>> IMSIC at all then we should or (a) don't migrate to such host or (b)
>> provide an emulation of IMSIC. And option (b) would be terrible from at
>> least performance point of view.
>
> That would all be only a secondary concern - in an IMSIC is needed right
> now, so be it. My primary concern is with inheriting the host IMSIC's
> properties.
Could we really do something better?
At the moment, the following properties are inherited:
node name -> not an issue at all. It is just the name of the node and
generally doesn't matter for DTB generation, as the node's phandle, not
its name, is used to refer to it. If that is preferable, I can hard
code just "imsic".
compatible -> also not a big issue. According to the current RISC-V DT
bindings it could be just "riscv,imsic". But then I have a question:
what if one day someone comes up with their own implementation of IMSIC?
Then "riscv,own-imsic" will be in the host DTB, and I expect the guest
DTB should have it too, as this h/w expects the OWN-IMSIC driver
specifically to be used. So it seems okay to copy compatible from the
host DTB. Yes, such a guest can't easily be migrated to h/w where just
"riscv,imsic" is supported, and that's okay. But generally I expect
that compatible will always be riscv,imsic.
riscv,num-ids -> generally, it could be any number from [63, 2047], but
there seems to be no sense in giving a guest a value bigger than
guest-num-ids=min(riscv,num-ids, riscv,num-guest-ids), as h/w can't
support more than that. At the same time I don't see much sense in
telling a guest that it supports less than guest-num-ids. What is the
problem with giving a guest the maximum that IMSIC could provide? With
migration, of course, it will be an issue if the new host supports a
different number of riscv,num-ids, but I don't know what we can do
better than just avoiding migration to such hosts if old-num-ids >
new-num-ids without a significant performance drop.
In summary, it is okay IMO to re-use these host properties.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-04-16 11:49 ` Jan Beulich
@ 2026-04-17 9:01 ` Oleksii Kurochko
2026-04-17 13:53 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-17 9:01 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 1:49 PM, Jan Beulich wrote:
> On 13.04.2026 10:43, Oleksii Kurochko wrote:
>> On 4/1/26 5:16 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> + res = fdt_property(fdt, "#interrupt-cells", data, len);
>>>> + if ( res )
>>>> + return res;
>>>
>>> So host properties are again directly handed through to the guest?
>>
>> I am not sure that it can be different from what host has.
>>
>>> Shouldn't the number of interrupts (aiui that's the "number of cells"
>>> here) a guest gets be independent from the host it runs one?
>>
>> The #interrupt-cells property specifies how many 32-bit values (cells)
>> are used to encode a single interrupt specifier when referencing this
>> interrupt controller from another device node.
>> In this APLIC schema, it's fixed at const: 2, meaning every interrupt
>> reference requires exactly two cells — typically:
>>
>> Cell 1 — the interrupt source number (which of the riscv,num-sources
>> wired inputs)
>> Cell 2 — the interrupt type/trigger flags (e.g. edge vs. level, active
>> high/low)
>
> So what if #interrupt-cells is 3 in the DT Xen is handed? If Xen can
> cope, should that value really also be handed through to guests?
I would say that it depends on what cell 3 represents. But likely it
would need to be handed to the guest.
>
>>>> + regs = dt_get_property(aplic_node, "reg", &len);
>>>> + if ( !regs )
>>>> + {
>>>> + printk("%s: Can't find 'reg' property\n", aplic_node->full_name);
>>>> + return -FDT_ERR_XEN(ENOENT);
>>>> + }
>>>> +
>>>> + res = fdt_property(fdt, "reg", regs, len);
>>>> + if ( res )
>>>> + return res;
>>>> +
>>>> + data = dt_get_property(aplic_node, "riscv,num-sources", &len);
>>>> + if ( !data )
>>>> + {
>>>> + printk("%s: Can't find 'riscv,num-sources' property\n",
>>>> + aplic_node->full_name);
>>>> + return -FDT_ERR_XEN(ENOENT);
>>>> + }
>>>> +
>>>> + res = fdt_property(fdt, "riscv,num-sources", data, len);
>>>> + if ( res )
>>>> + return res;
>>>
>>> Or maybe this is the number of interrupts?
>>
>> This is the total count of hardware interrupt lines wired into this
>> APLIC domain.
>>
>> It could be independent from the host it runs one but looking at the
>> possible range [1,1023] for this property if we will put for a guest
>> lets say 22 but new host support only 20 when we will be in a trouble
>> anyway.
>
> Correct. But if you had some hosts with 1024 and some with 256, how
> would you bring up a guest on the former to later be able to migrate
> it to one of the latter when guests inherit the count from the host?
It will be impossible to do that without emulation, so I expect it will
be easier to simply forbid migration to such a host.
Basically, it is just a game of numbers. We could try to estimate how
many interrupts are needed for the guest, or for simplicity use some
hard-coded number (say 128 or 256, something “pretty small”). Will it
resolve the migration issue? Partially yes, but not fully, as there
could always be cases where the new host’s number of sources is lower
than our hard-coded “small guest value”. So having a smaller value will
likely help on average and will allow migration to more h/w, but the
migration issue will still be present.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available()
2026-04-16 12:06 ` Jan Beulich
@ 2026-04-17 9:37 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-17 9:37 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 2:06 PM, Jan Beulich wrote:
> On 13.04.2026 11:32, Oleksii Kurochko wrote:
>> On 4/2/26 11:00 AM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> aia_init() is going to contain all the stuff related to AIA initialization.
>>>> At the moment, it is just Check if SSAIA extension is available and if yes
>>>> set is_aia_available to true.
>>>
>>> And (future) users of aia_available() can't directly call
>>> riscv_isa_extension_available()? Nor can aia_available() be a convenience
>>> wrapper around that call? It's only ...
>>>
>>>> --- /dev/null
>>>> +++ b/xen/arch/riscv/aia.c
>>>> @@ -0,0 +1,25 @@
>>>> +/* SPDX-License-Identifier: GPL-2.0-only */
>>>> +
>>>> +#include <xen/errno.h>
>>>> +#include <xen/init.h>
>>>> +#include <xen/sections.h>
>>>> +#include <xen/types.h>
>>>> +
>>>> +#include <asm/cpufeature.h>
>>>> +
>>>> +static bool __ro_after_init is_aia_available;
>>>
>>> ... a boolean, yes, but still.
>>
>> My purpose was to have a variable which represent that AIA is
>> initialized properly. Maybe, it makes sense to rename this variable to
>> is_aia_inited.
>
> I'm generally having trouble with the non-word "inited". How about
> "aia_usable"? Fits ...
>
>> The idea is that in future patches VGEIN will be also initialized in
>> aia_init() and if wasn't initialized properly then just keep
>> is_aia_availabe be set to false and in such case we will have that
>> is_aia_available != riscv_isa_extension_available(NULL,
>> RISCV_ISA_EXT_ssaia).
>
> ... this as well. And ...
>
>> Note that regarding VGEIN it is arguable that it should be initialized
>> as generally it could be that there is no VGEIN what means that h/w
>> assisted guest interrupt files aren't available and s/w one should be
>> used. But s/w guest interrupt files aren't supported.
>> So I mean that with the current implementation if VGEIN isn't
>> initialized I will tell that AIA isn't available what generally isn't
>> quite true.
>
> ... this. Question of course if what aia_usable (or whatever its name)
> being set to false means to Xen's own operation. If Xen would still work
> okay (without being able to create guests), perhaps the name may want to
> be "aia_guest_usable" or some such. I.e. as I said on a number of
> occasions: A variable wants to be named to accurately express its
> purpose, without the name growing excessively long.
aia_usable sounds good to me.
It also covers the case where AIA isn't expected to be used because h/w
uses a non-AIA-compliant interrupt controller, for example, PLIC.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests
2026-04-16 12:21 ` Jan Beulich
@ 2026-04-17 11:34 ` Oleksii Kurochko
2026-04-17 14:07 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-17 11:34 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 2:21 PM, Jan Beulich wrote:
> On 13.04.2026 16:42, Oleksii Kurochko wrote:
>> On 4/2/26 12:03 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> @@ -14,12 +27,109 @@ bool aia_available(void)
>>>> return is_aia_available;
>>>> }
>>>>
>>>> +int __init vgein_init(unsigned int cpu)
>>>
>>> If this needs invoking once per CPU being brought up, it can't be __init.
>>
>> Yes, it is going to be called inside the secondary CPU bring-up function.
>>
>> __init sections are removed much later, after all CPUs are brought up,
>> so it looks like that at the moment when secondary CPUs are being
>> brought up, __init still exists and can be called.
>
> No. And I said so before. See XEN_SYSCTL_cpu_hotplug (and ignore it having
> "hotplug" in the name, but merely consider that you can take CPUs offline
> at runtime, and later bring them online again).
Thanks, now it makes sense to drop __init.
>
>>> Also - static?
>>
>> It isn't static because it will be called inside the secondary CPU
>> bring-up function.
>
> As it doesn't need calling from the outside for the boot CPU, it's not
> obvious why it would need calling from the outside for secondary ones.
I think I am confused here by what you mean by "calling from the
outside".
It should be called during boot or secondary CPU initialization (if AIA
is used), right?
Then, considering that VGEIN-related stuff is connected to AIA,
vgein_init() is expected to live in aia.c, and considering that it wants
to be called by secondary CPU boot code, it can't be static.
I have a feeling that I misunderstood your statement...
>
>>>> +{
>>>> + struct vgein_bmp *vgein = &per_cpu(vgein_bmp, cpu);
>>>> +
>>>> + csr_write(CSR_HGEIE, -1UL);
>>>> + vgein->geilen = flsl(csr_read(CSR_HGEIE));
>>>> + csr_write(CSR_HGEIE, 0);
>>>> + if ( vgein->geilen )
>>>> + vgein->geilen--;
>>>
>>> I don't understand this. The "len" in "geilen" stands for "length", I suppose,
>>> i.e. the number of bits. Hmm, the spec itself is inconsistent: "The number of
>>> bits implemented in hgeip and hgeie for guest external interrupts is UNSPECIFIED
>>> and may be zero. This number is known as GEILEN." This may or may not include
>>> bit 0 (which is implemented, but r/o zero). Then saying "Hence, if GEILEN is
>>> nonzero, bits GEILEN:1 shall be writable in ..." suggests 0 isn't included, but
>>> that's not unambiguous.
>>
>> But they explicitly wrote that: The least-significant bits are
>> implemented first, apart from bit 0. So bit 0 is explicitly excluded.
>
> Fine, but not including it in anything named *LEN feels - as said - ambiguous.
Agree, it is. I just tried to follow the spec naming here. But I am okay
to rename it to max_gein, for example.
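
To make the off-by-one explicit: bit 0 of hgeie is read-only zero and bits
GEILEN:1 are writable, so after writing all ones the highest set bit sits
at position GEILEN. A small sketch, with the CSR read-back passed in as a
plain value and the hypothetical ex_fls() standing in for flsl():

```c
/* 1-based index of the most significant set bit, 0 if none (like flsl()). */
static unsigned int ex_fls(unsigned long v)
{
    unsigned int n = 0;

    while ( v )
    {
        n++;
        v >>= 1;
    }

    return n;
}

/*
 * 'readback' is what hgeie returns after all ones were written to it.
 * The result is the highest usable guest interrupt file number, or 0
 * if no guest interrupt files are implemented.
 */
static unsigned int ex_max_gein(unsigned long readback)
{
    unsigned int n = ex_fls(readback);

    return n ? n - 1 : 0;
}
```

With three guest interrupt files the read-back is 0xe, ex_fls() yields 4,
and the result is 3.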
>
>>>> + /*
>>>> + * All vCPU guest interrupt files are used and we don't support a case
>>>> + * when number of vCPU on 1 pCPU is bigger then geilen.
>>>> + */
>>>
>>> This wants checking in vgein_init() then. CPUs (beyond the boot one)
>>> violating this should not be brought online.
>>
>> It'll be nice. But we can't know how many vCPUs will be ran on pCPU when
>> vgein_init() is executed.
>
> I don't understand: How does it matter how many vCPU-s will exist later on
> (and will run wherever)?
IIUC, you want to check what is mentioned in the comment in vcpu_init().
The comment says that it is checking that the number of vCPUs on one pCPU
isn't bigger than geilen. To check that we have to know the number of
vCPUs that will potentially be run on the pCPU.
For me it is enough to check that vgein->geilen isn't 0, as this case
isn't supported now.
>
>>>> + ASSERT(vgein_id < vgein_bmp->geilen);
>>>
>>> What if not bit is available? By asserting, you assume the caller will not
>>> call here when no ID is available.
>>
>> It is just a temporary ASSERT() (as we don't support software guest
>> interrupt files) because in general it is fine if there is no bit
>> available, it will just mean that that no physical hardware guest
>> interrupt file is assigned to the virtual hart, and software-based
>> emulation (a "software file") must be used to handle guest external
>> interrupts.
>>
>> Will it be better to return 0 now here and just don't create a vCPU
>> on ...
>>
>> Yet there is no caller of this function,
>>> so how can one verify whether this assertion is appropriate?
>>
>> ... the caller side when an assignment is expected to be happen?
>
> Yes, you want to return errors for cases you can't handle. You wan to
> assert on internal state only, not on anything controlled by a caller
> outside of the hypervisor.
I will then just return vgein_id = 0 (as that is what should be returned
when s/w guest interrupt files are supported) and handle that on the
caller side.
And symmetrically I will add 'if ( !vgein_id ) return' to
vgein_release(), as nothing should be done in the vgein bitmap in this case.
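
A sketch of that pairing, assuming a per-pCPU bitmap in which bit N tracks
guest interrupt file N and ID 0 means "no hardware file assigned". Locking
is left out and all ex_ names are hypothetical:

```c
/* Hypothetical per-pCPU allocator state for guest interrupt files. */
struct ex_vgein {
    unsigned long bmp;     /* bit N set: guest interrupt file N is in use */
    unsigned int max_gein; /* highest valid file number (GEILEN) */
};

/* Returns a free file number in [1, max_gein], or 0 if none is left. */
static unsigned int ex_vgein_assign(struct ex_vgein *v)
{
    for ( unsigned int id = 1; id <= v->max_gein; id++ )
    {
        if ( !(v->bmp & (1UL << id)) )
        {
            v->bmp |= 1UL << id;
            return id;
        }
    }

    return 0; /* the caller has to handle this, e.g. fail vCPU creation */
}

/* Releasing ID 0 is a no-op, mirroring the "nothing assigned" case. */
static void ex_vgein_release(struct ex_vgein *v, unsigned int id)
{
    if ( !id || id > v->max_gein )
        return;

    v->bmp &= ~(1UL << id);
}
```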
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state
2026-04-16 12:31 ` Jan Beulich
@ 2026-04-17 13:47 ` Oleksii Kurochko
2026-04-20 8:29 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-17 13:47 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 2:31 PM, Jan Beulich wrote:
> On 14.04.2026 11:22, Oleksii Kurochko wrote:
>> On 4/2/26 1:31 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> --- a/xen/arch/riscv/imsic.c
>>>> +++ b/xen/arch/riscv/imsic.c
>>>> @@ -59,6 +59,29 @@ do { \
>>>> csr_clear(CSR_SIREG, v); \
>>>> } while (0)
>>>>
>>>> +unsigned int vcpu_guest_file_id(const struct vcpu *v)
>>>> +{
>>>> + struct imsic_state *imsic_state = v->arch.imsic_state;
>>>> + unsigned long flags;
>>>> + unsigned int vsfile_id;
>>>> +
>>>> + read_lock_irqsave(&imsic_state->vsfile_lock, flags);
>>>> + vsfile_id = imsic_state->guest_file_id;
>>>> + read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
>>>
>>> What purpose does this locking have? Already ...
>>>
>>>> + return vsfile_id;
>>>
>>> ... here the value can be stale, if indeed there is a chance of races.
>>> Did you perhaps mean to use ACCESS_ONCE() here and where the value is
>>> set?
>>
>> ACCESS_ONCE() isn't guarantee only compiler re-ordering (as basically it
>> is just volatile-related stuff inisde the macros)?
>>
>> Generally, I think that that guest_file_id is needed to be updated only
>> during migration of vCPU from one pCPU to another and I expect that
>> during this migration vCPU isn't active, so no one will want to read
>> imsic_state->guest_file_id. But on the other hand, there is:
>> bool imsic_has_interrupt(const struct vcpu *vcpu)
>> {
>> ...
>> /*
>> * The IMSIC SW-file directly injects interrupt via hvip so
>> * only check for interrupt when IMSIC VS-file is being used.
>> */
>>
>> read_lock_irqsave(&imsic_state->vsfile_lock, flags);
>> if ( imsic_state->vsfile_pcpu != NR_CPUS )
>> ret = !!(csr_read(CSR_HGEIP) & BIT(imsic_state->guest_file_id,
>> UL));
>> read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
>> ...
>> }
>> which I think could be called in parallel with with migration, so then
>> still lock are needed.
>
> None of this addresses my pointing out that the returned value will be
> stale by the point the caller gets to look at it.
Yes, I agree that the lock in vcpu_guest_file_id() is useless and that it
should be taken on the caller side and held for the whole IMSIC state
access. But ...
> Which in turn raises
> said question about the use of a lock. If you read
> imsic_state->guest_file_id atomically (i.e. excluding tearing of reads),
> the value seen / used will be stale as with the lock in use. Unless of
> course there's yet another aspect hidden somewhere in what is not being
> explained.
... I am not sure that I get this part.
If I am somewhere in the migration code where I took the write lock to
update the IMSIC state (and of course ->guest_file_id as part of it), and
someone else calls imsic_has_interrupt() in parallel, then it won't enter
the critical section where ->guest_file_id is read, so no stale
->guest_file_id will be read.
Does it then make sense to use ACCESS_ONCE() for reads and writes of
->guest_file_id in such use cases?
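For illustration, a single-threaded sketch of the staleness point raised in the review: a getter that returns a copy of guest_file_id hands out a snapshot, whether the read inside it is protected by a lock or is merely tear-free. The sketch uses C11 atomics as a rough stand-in for ACCESS_ONCE(); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

struct imsic_state_sketch {
    _Atomic unsigned int guest_file_id;
};

/* Tear-free read; the returned value is only a snapshot. */
static unsigned int guest_file_id_snapshot(struct imsic_state_sketch *s)
{
    return atomic_load_explicit(&s->guest_file_id, memory_order_relaxed);
}

/* Stand-in for the migration path re-assigning the guest interrupt file. */
static void migrate_sketch(struct imsic_state_sketch *s, unsigned int new_id)
{
    atomic_store_explicit(&s->guest_file_id, new_id, memory_order_relaxed);
}

/* The snapshot taken before the update no longer matches the live state. */
static void stale_snapshot_demo(void)
{
    struct imsic_state_sketch s;
    unsigned int id;

    atomic_init(&s.guest_file_id, 3);
    id = guest_file_id_snapshot(&s);
    migrate_sketch(&s, 5);

    assert(id == 3);
    assert(guest_file_id_snapshot(&s) == 5);
}
```

A lock helps only when it is held across both the read and the use of the value, as in the quoted imsic_has_interrupt() snippet, which is why it belongs on the caller side.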
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-04-17 8:10 ` Oleksii Kurochko
@ 2026-04-17 13:50 ` Jan Beulich
2026-04-17 14:01 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-17 13:50 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 17.04.2026 10:10, Oleksii Kurochko wrote:
> On 4/16/26 1:42 PM, Jan Beulich wrote:
>> On 10.04.2026 17:40, Oleksii Kurochko wrote:
>>> On 4/1/26 5:05 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> Guests using the IMSIC interrupt controller require a corresponding
>>>>> Device Tree description. Add support for generating an IMSIC node when
>>>>> building the guest DT.
>>>>>
>>>>> Keep a reference to the host IMSIC DT node and reuse its compatible
>>>>> property while constructing the guest-visible node.
>>>>
>>>> Again raises a migration concern. Presumably a guest would then be able
>>>> to migrate only to other hosts with the same compatible property.
>>>
>>> Right, but I don't think we can do too much and it is the simplest
>>> approach just to migrate to hosts with the same compatible property.
>>>
>>> If you concern is about DTS property then for IMSIC it will be always
>>> riscv,imsics according to the RISC-V DT spec. (IIRC, the only other
>>> option could be qemu,riscv). Actually, I can just hard code
>>> "riscv,imsic" explicitly instead of re-using of host name.
>>>
>>> If your concern is that we will migrate to a host which doesn't support
>>> IMSIC at all then we should or (a) don't migrate to such host or (b)
>>> provide an emulation of IMSIC. And option (b) would be terrible from at
>>> least performance point of view.
>>
>> That would all be only a secondary concern - in an IMSIC is needed right
>> now, so be it. My primary concern is with inheriting the host IMSIC's
>> properties.
> Could we do really something better?
>
> At the moment, the following properties are inhereted:
> node name -> not an issue at all, it is just a name of the node and
> generally doesn't matter what it is in the matter of DTB generation as
> phandle number of this node will be used as pointer to this node, not
> the name. If it would be better I can hard code just "imsic".
>
> compatible -> also, not a big issue. According to current DT RISC-V
> bindings it could be just "riscv,imsic". But then I have the question
> what if one day someone will come up with own implementation of IMSIC
> then "riscv,own-imsic" will be in host DTB and so I expect that guest
> DTB should have it as this h/w expects to use specifically OWN-IMSIC
> driver. So it seems okay to copy compatible from host DTB. Yes, if
> migration will happen where just "riscv,imsic" is supported then it
> can't be easily migrated to such h/w and its okay. But generally I
> expect that compatible will always riscv,imsic.
>
> riscv,num-ids -> generally, it could be any number from [63, 2047], but
> it seems like there is no any sense if put a value bigger then
> guest-num-ids=min(riscv,num-ids, riscv,num-guest-ids) for guest as h/w
> can't support more then that. At the same time I don't see too much
> sense to tell a guest that it supports less then guest-num-ids. What is
> the problem to give a guest a maximum that IMSIC could provide? With
> migration, of course, it will be an issue if new host will support
> different number of riscv,num-ids but I don't know what we can do better
> then just avoid migration to such hosts if old-num-ids > new-num-ids
> without significant performance drop.
This is exactly why an admin may want to limit what a guest gets to
see / use.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-04-17 9:01 ` Oleksii Kurochko
@ 2026-04-17 13:53 ` Jan Beulich
2026-04-17 14:27 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-17 13:53 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 17.04.2026 11:01, Oleksii Kurochko wrote:
>
>
> On 4/16/26 1:49 PM, Jan Beulich wrote:
>> On 13.04.2026 10:43, Oleksii Kurochko wrote:
>>> On 4/1/26 5:16 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> + res = fdt_property(fdt, "#interrupt-cells", data, len);
>>>>> + if ( res )
>>>>> + return res;
>>>>
>>>> So host properties are again directly handed through to the guest?
>>>
>>> I am not sure that it can be different from what host has.
>>>
>>>> Shouldn't the number of interrupts (aiui that's the "number of cells"
>>>> here) a guest gets be independent from the host it runs one?
>>>
>>> The #interrupt-cells property specifies how many 32-bit values (cells)
>>> are used to encode a single interrupt specifier when referencing this
>>> interrupt controller from another device node.
>>> In this APLIC schema, it's fixed at const: 2, meaning every interrupt
>>> reference requires exactly two cells — typically:
>>>
>>> Cell 1 — the interrupt source number (which of the riscv,num-sources
>>> wired inputs)
>>> Cell 2 — the interrupt type/trigger flags (e.g. edge vs. level, active
>>> high/low)
>>
>> So what if #interrupt-cells is 3 in the DT Xen is handed? If Xen can
>> cope, should that value really also be handed through to guests?
>
> I would say that it depends on what cell 3 will represent. But likely it
> will be needed to hand it to the guest.
>
>>
>>>>> + regs = dt_get_property(aplic_node, "reg", &len);
>>>>> + if ( !regs )
>>>>> + {
>>>>> + printk("%s: Can't find 'reg' property\n", aplic_node->full_name);
>>>>> + return -FDT_ERR_XEN(ENOENT);
>>>>> + }
>>>>> +
>>>>> + res = fdt_property(fdt, "reg", regs, len);
>>>>> + if ( res )
>>>>> + return res;
>>>>> +
>>>>> + data = dt_get_property(aplic_node, "riscv,num-sources", &len);
>>>>> + if ( !data )
>>>>> + {
>>>>> + printk("%s: Can't find 'riscv,num-sources' property\n",
>>>>> + aplic_node->full_name);
>>>>> + return -FDT_ERR_XEN(ENOENT);
>>>>> + }
>>>>> +
>>>>> + res = fdt_property(fdt, "riscv,num-sources", data, len);
>>>>> + if ( res )
>>>>> + return res;
>>>>
>>>> Or maybe this is the number of interrupts?
>>>
>>> This is the total count of hardware interrupt lines wired into this
>>> APLIC domain.
>>>
>>> It could be independent from the host it runs one but looking at the
>>> possible range [1,1023] for this property if we will put for a guest
>>> lets say 22 but new host support only 20 when we will be in a trouble
>>> anyway.
>>
>> Correct. But if you had some hosts with 1024 and some with 256, how
>> would you bring up a guest on the former to later be able to migrate
>> it to one of the latter when guests inherit the count from the host?
>
> It will be impossible to do that without emulation,
It's not quite clear to me why this would be.
> so I expect it will
> be easier to simply forbid migration to such a host.
>
> Basically, it is just a game of numbers. We could try to estimate how
> many interrupts are needed for the guest, or for simplicity use some
> hard-coded number (say 128 or 256, something “pretty small”). Will it
> resolve the migration issue? Partially yes, but not fully, as there
> could always be cases where the new host’s number of sources is lower
> than our hard-coded “small guest value.” So having a smaller value will
> likely help on average and will allow to support more h/w migration to
> which could happen, but the migration issue will still be present.
An admin, knowing the capabilities of all hosts in a pool, can suitably
limit guests intended to move among hosts. No migration issue at all.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-04-17 13:50 ` Jan Beulich
@ 2026-04-17 14:01 ` Oleksii Kurochko
2026-04-17 14:10 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-17 14:01 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/17/26 3:50 PM, Jan Beulich wrote:
> On 17.04.2026 10:10, Oleksii Kurochko wrote:
>> On 4/16/26 1:42 PM, Jan Beulich wrote:
>>> On 10.04.2026 17:40, Oleksii Kurochko wrote:
>>>> On 4/1/26 5:05 PM, Jan Beulich wrote:
>>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>>> Guests using the IMSIC interrupt controller require a corresponding
>>>>>> Device Tree description. Add support for generating an IMSIC node when
>>>>>> building the guest DT.
>>>>>>
>>>>>> Keep a reference to the host IMSIC DT node and reuse its compatible
>>>>>> property while constructing the guest-visible node.
>>>>>
>>>>> Again raises a migration concern. Presumably a guest would then be able
>>>>> to migrate only to other hosts with the same compatible property.
>>>>
>>>> Right, but I don't think we can do too much and it is the simplest
>>>> approach just to migrate to hosts with the same compatible property.
>>>>
>>>> If you concern is about DTS property then for IMSIC it will be always
>>>> riscv,imsics according to the RISC-V DT spec. (IIRC, the only other
>>>> option could be qemu,riscv). Actually, I can just hard code
>>>> "riscv,imsic" explicitly instead of re-using of host name.
>>>>
>>>> If your concern is that we will migrate to a host which doesn't support
>>>> IMSIC at all then we should or (a) don't migrate to such host or (b)
>>>> provide an emulation of IMSIC. And option (b) would be terrible from at
>>>> least performance point of view.
>>>
>>> That would all be only a secondary concern - in an IMSIC is needed right
>>> now, so be it. My primary concern is with inheriting the host IMSIC's
>>> properties.
>> Could we do really something better?
>>
>> At the moment, the following properties are inhereted:
>> node name -> not an issue at all, it is just a name of the node and
>> generally doesn't matter what it is in the matter of DTB generation as
>> phandle number of this node will be used as pointer to this node, not
>> the name. If it would be better I can hard code just "imsic".
>>
>> compatible -> also, not a big issue. According to current DT RISC-V
>> bindings it could be just "riscv,imsic". But then I have the question
>> what if one day someone will come up with own implementation of IMSIC
>> then "riscv,own-imsic" will be in host DTB and so I expect that guest
>> DTB should have it as this h/w expects to use specifically OWN-IMSIC
>> driver. So it seems okay to copy compatible from host DTB. Yes, if
>> migration will happen where just "riscv,imsic" is supported then it
>> can't be easily migrated to such h/w and its okay. But generally I
>> expect that compatible will always riscv,imsic.
>>
>> riscv,num-ids -> generally, it could be any number from [63, 2047], but
>> it seems like there is no any sense if put a value bigger then
>> guest-num-ids=min(riscv,num-ids, riscv,num-guest-ids) for guest as h/w
>> can't support more then that. At the same time I don't see too much
>> sense to tell a guest that it supports less then guest-num-ids. What is
>> the problem to give a guest a maximum that IMSIC could provide? With
>> migration, of course, it will be an issue if new host will support
>> different number of riscv,num-ids but I don't know what we can do better
>> then just avoid migration to such hosts if old-num-ids > new-num-ids
>> without significant performance drop.
>
> This is exactly why an admin may want to limit what a guest gets to
> see / use.
Would it be better then to provide a separate define for riscv,num-ids and
use it here? Maybe here it makes sense to re-use the property already
provided by the DT binding:
  riscv,num-guest-ids:
    $ref: /schemas/types.yaml#/definitions/uint32
    minimum: 63
    maximum: 2047
    description:
      Number of interrupt identities are supported by IMSIC guest interrupt
      file. When not specified it is assumed to be same as specified by the
      riscv,num-ids property.
And if riscv,num-guest-ids is absent, then use some value predefined in a
define?
Does it make sense then to provide similar defines for compatible and
node name, or would it be okay for those properties to re-use what the host
DTB IMSIC node provides?
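For illustration, a minimal sketch of how a guest-visible riscv,num-ids value could be derived from the host properties plus an administrator-supplied cap. The function name and the admin_limit parameter are hypothetical. Rounding down to the form 64 * k - 1 reflects my reading of the AIA specification and should be checked against it.

```c
#include <assert.h>
#include <stdint.h>

#define IMSIC_MIN_IDS 63u    /* minimum allowed by the DT binding */
#define IMSIC_MAX_IDS 2047u  /* maximum allowed by the DT binding */

/*
 * host_num_ids:       riscv,num-ids of the host IMSIC node.
 * host_num_guest_ids: riscv,num-guest-ids, or 0 if the property is absent.
 * admin_limit:        hypothetical per-guest cap, 0 meaning "no cap".
 *
 * Returns the value to expose to the guest, or 0 if no valid value exists.
 */
static uint32_t guest_num_ids_sketch(uint32_t host_num_ids,
                                     uint32_t host_num_guest_ids,
                                     uint32_t admin_limit)
{
    /* Absent riscv,num-guest-ids means "same as riscv,num-ids". */
    uint32_t ids = host_num_guest_ids ? host_num_guest_ids : host_num_ids;

    if ( ids > host_num_ids )
        ids = host_num_ids;
    if ( admin_limit && admin_limit < ids )
        ids = admin_limit;
    if ( ids > IMSIC_MAX_IDS )
        ids = IMSIC_MAX_IDS;

    /* Round down to the nearest value of the form 64 * k - 1. */
    ids = (ids + 1) & ~UINT32_C(63);
    if ( !ids )
        return 0;
    ids -= 1;

    /* Internal invariant of the rounding above. */
    assert((ids + 1) % 64 == 0 && ids >= IMSIC_MIN_IDS);

    return ids;
}
```

With a cap like this, an admin who knows the smallest riscv,num-guest-ids in a pool can give every guest that value and keep the guests movable between hosts.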
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests
2026-04-17 11:34 ` Oleksii Kurochko
@ 2026-04-17 14:07 ` Jan Beulich
2026-04-20 7:52 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-17 14:07 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 17.04.2026 13:34, Oleksii Kurochko wrote:
> On 4/16/26 2:21 PM, Jan Beulich wrote:
>> On 13.04.2026 16:42, Oleksii Kurochko wrote:
>>> On 4/2/26 12:03 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> @@ -14,12 +27,109 @@ bool aia_available(void)
>>>>> return is_aia_available;
>>>>> }
>>>>>
>>>>> +int __init vgein_init(unsigned int cpu)
>>>>
>>>> If this needs invoking once per CPU being brought up, it can't be __init.
>>>
>>> Yes, it is going to be called inside the secondary CPU bring-up function.
>>>
>>> __init sections are removed much later, after all CPUs are brought up,
>>> so it looks like that at the moment when secondary CPUs are being
>>> brought up, __init still exists and can be called.
>>
>> No. And I said so before. See XEN_SYSCTL_cpu_hotplug (and ignore it having
>> "hotplug" in the name, but merely consider that you can take CPUs offline
>> at runtime, and later bring them online again).
>
> Thanks, now it makes sense to drop __init.
>
>>>> Also - static?
>>>
>>> It isn't static because it will be called inside the secondary CPU
>>> bring-up function.
>>
>> As it doesn't need calling from the outside for the boot CPU, it's not
>> obvious why it would need calling from the outside for secondary ones.
>
> I think I am confused here by what do you mean by "calling from the
> outside".
As in "from another translation unit".
> It should be called during boot or secondary CPU initialization (if AIA
> is used), right?
> Then considering that VGEIN-related stuff is connected to AIA then
> vgein_init() is expected to live in aia.c and considering that it wants
> to be called by secondary CPU boot code it can't be static.
Putting my question differently: Why is there a difference here between
boot CPU and secondary CPUs? My (general) expectation would be that either
the function wants calling from outside of aia.c in all cases, or it could
be static (now and later).
>>>>> +{
>>>>> + struct vgein_bmp *vgein = &per_cpu(vgein_bmp, cpu);
>>>>> +
>>>>> + csr_write(CSR_HGEIE, -1UL);
>>>>> + vgein->geilen = flsl(csr_read(CSR_HGEIE));
>>>>> + csr_write(CSR_HGEIE, 0);
>>>>> + if ( vgein->geilen )
>>>>> + vgein->geilen--;
>>>>
>>>> I don't understand this. The "len" in "geilen" stands for "length", I suppose,
>>>> i.e. the number of bits. Hmm, the spec itself is inconsistent: "The number of
>>>> bits implemented in hgeip and hgeie for guest external interrupts is UNSPECIFIED
>>>> and may be zero. This number is known as GEILEN." This may or may not include
>>>> bit 0 (which is implemented, but r/o zero). Then saying "Hence, if GEILEN is
>>>> nonzero, bits GEILEN:1 shall be writable in ..." suggests 0 isn't included, but
>>>> that's not unambiguous.
>>>
>>> But they explicitly wrote that: The least-significant bits are
>>> implemented first, apart from bit 0. So bit 0 is explicitly excluded.
>>
>> Fine, but not including it in anything named *LEN feels - as said - ambiguous.
>
> Agree, it is. I just tried to follow the spec naming here. But I am okay
> to rename it to max_gein, for example.
Perhaps better stick to spec naming, but consider adding a clarifying comment.
>>>>> + /*
>>>>> + * All vCPU guest interrupt files are used and we don't support a case
>>>>> + * when number of vCPU on 1 pCPU is bigger then geilen.
>>>>> + */
>>>>
>>>> This wants checking in vgein_init() then. CPUs (beyond the boot one)
>>>> violating this should not be brought online.
>>>
>>> It'll be nice. But we can't know how many vCPUs will be ran on pCPU when
>>> vgein_init() is executed.
>>
>> I don't understand: How does it matter how many vCPU-s will exist later on
>> (and will run wherever)?
>
> IIUC, you want to check what is mentioned in the comment in vcpu_init().
> The comment says that it is checking that number of vCPU on one pCPU
> isn't bigger then geilen. To check that we have to know an amount of
> vCPU potentially will be ran on pCPU.
Well, not quite. I don't understand at all why the number of vCPU-s in
a system would be limited by any pCPU property. I don't think we have
any such constraint on x86, for example.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains
2026-04-17 14:01 ` Oleksii Kurochko
@ 2026-04-17 14:10 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-17 14:10 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 17.04.2026 16:01, Oleksii Kurochko wrote:
>
>
> On 4/17/26 3:50 PM, Jan Beulich wrote:
>> On 17.04.2026 10:10, Oleksii Kurochko wrote:
>>> On 4/16/26 1:42 PM, Jan Beulich wrote:
>>>> On 10.04.2026 17:40, Oleksii Kurochko wrote:
>>>>> On 4/1/26 5:05 PM, Jan Beulich wrote:
>>>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>>>> Guests using the IMSIC interrupt controller require a corresponding
>>>>>>> Device Tree description. Add support for generating an IMSIC node when
>>>>>>> building the guest DT.
>>>>>>>
>>>>>>> Keep a reference to the host IMSIC DT node and reuse its compatible
>>>>>>> property while constructing the guest-visible node.
>>>>>>
>>>>>> Again raises a migration concern. Presumably a guest would then be able
>>>>>> to migrate only to other hosts with the same compatible property.
>>>>>
>>>>> Right, but I don't think we can do too much and it is the simplest
>>>>> approach just to migrate to hosts with the same compatible property.
>>>>>
>>>>> If you concern is about DTS property then for IMSIC it will be always
>>>>> riscv,imsics according to the RISC-V DT spec. (IIRC, the only other
>>>>> option could be qemu,riscv). Actually, I can just hard code
>>>>> "riscv,imsic" explicitly instead of re-using of host name.
>>>>>
>>>>> If your concern is that we will migrate to a host which doesn't support
>>>>> IMSIC at all then we should or (a) don't migrate to such host or (b)
>>>>> provide an emulation of IMSIC. And option (b) would be terrible from at
>>>>> least performance point of view.
>>>>
>>>> That would all be only a secondary concern - in an IMSIC is needed right
>>>> now, so be it. My primary concern is with inheriting the host IMSIC's
>>>> properties.
>>> Could we do really something better?
>>>
>>> At the moment, the following properties are inhereted:
>>> node name -> not an issue at all, it is just a name of the node and
>>> generally doesn't matter what it is in the matter of DTB generation as
>>> phandle number of this node will be used as pointer to this node, not
>>> the name. If it would be better I can hard code just "imsic".
>>>
>>> compatible -> also, not a big issue. According to current DT RISC-V
>>> bindings it could be just "riscv,imsic". But then I have the question
>>> what if one day someone will come up with own implementation of IMSIC
>>> then "riscv,own-imsic" will be in host DTB and so I expect that guest
>>> DTB should have it as this h/w expects to use specifically OWN-IMSIC
>>> driver. So it seems okay to copy compatible from host DTB. Yes, if
>>> migration will happen where just "riscv,imsic" is supported then it
>>> can't be easily migrated to such h/w and its okay. But generally I
>>> expect that compatible will always riscv,imsic.
>>>
>>> riscv,num-ids -> generally, it could be any number from [63, 2047], but
>>> it seems like there is no any sense if put a value bigger then
>>> guest-num-ids=min(riscv,num-ids, riscv,num-guest-ids) for guest as h/w
>>> can't support more then that. At the same time I don't see too much
>>> sense to tell a guest that it supports less then guest-num-ids. What is
>>> the problem to give a guest a maximum that IMSIC could provide? With
>>> migration, of course, it will be an issue if new host will support
>>> different number of riscv,num-ids but I don't know what we can do better
>>> then just avoid migration to such hosts if old-num-ids > new-num-ids
>>> without significant performance drop.
>>
>> This is exactly why an admin may want to limit what a guest gets to
>> see / use.
>
> Would it be better then provide a separate define for riscv, num-ids and
> use it here? Maybe here it makes sense to re-use already provided by DT
> binding property:
> riscv,num-guest-ids:
> $ref: /schemas/types.yaml#/definitions/uint32
> minimum: 63
> maximum: 2047
> description:
> Number of interrupt identities are supported by IMSIC guest
> interrupt
> file. When not specified it is assumed to be same as specified by the
> riscv,num-ids property.
> And if riscv,num-guest-ids the use some predefined in define value?
>
> Does it make sense then provide the similar defines for compatible and
> node name or for that properties it would be okay to re-use what host
> DTB IMSIC node provides?
I fear I can't answer this. I'm not knowledgeable enough on DT. To me,
guest properties are (originally) specified by a guest config file.
How that maps to DT (in particular in the dom0less case) I don't really
know.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 11/27] xen/riscv: create APLIC DT node for guest domains
2026-04-17 13:53 ` Jan Beulich
@ 2026-04-17 14:27 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-17 14:27 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/17/26 3:53 PM, Jan Beulich wrote:
> On 17.04.2026 11:01, Oleksii Kurochko wrote:
>>
>>
>> On 4/16/26 1:49 PM, Jan Beulich wrote:
>>> On 13.04.2026 10:43, Oleksii Kurochko wrote:
>>>> On 4/1/26 5:16 PM, Jan Beulich wrote:
>>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>>> + res = fdt_property(fdt, "#interrupt-cells", data, len);
>>>>>> + if ( res )
>>>>>> + return res;
>>>>>
>>>>> So host properties are again directly handed through to the guest?
>>>>
>>>> I am not sure that it can be different from what host has.
>>>>
>>>>> Shouldn't the number of interrupts (aiui that's the "number of cells"
>>>>> here) a guest gets be independent from the host it runs one?
>>>>
>>>> The #interrupt-cells property specifies how many 32-bit values (cells)
>>>> are used to encode a single interrupt specifier when referencing this
>>>> interrupt controller from another device node.
>>>> In this APLIC schema, it's fixed at const: 2, meaning every interrupt
>>>> reference requires exactly two cells — typically:
>>>>
>>>> Cell 1 — the interrupt source number (which of the riscv,num-sources
>>>> wired inputs)
>>>> Cell 2 — the interrupt type/trigger flags (e.g. edge vs. level, active
>>>> high/low)
>>>
>>> So what if #interrupt-cells is 3 in the DT Xen is handed? If Xen can
>>> cope, should that value really also be handed through to guests?
>>
>> I would say that it depends on what cell 3 will represent. But likely it
>> will be needed to hand it to the guest.
>>
>>>
>>>>>> + regs = dt_get_property(aplic_node, "reg", &len);
>>>>>> + if ( !regs )
>>>>>> + {
>>>>>> + printk("%s: Can't find 'reg' property\n", aplic_node->full_name);
>>>>>> + return -FDT_ERR_XEN(ENOENT);
>>>>>> + }
>>>>>> +
>>>>>> + res = fdt_property(fdt, "reg", regs, len);
>>>>>> + if ( res )
>>>>>> + return res;
>>>>>> +
>>>>>> + data = dt_get_property(aplic_node, "riscv,num-sources", &len);
>>>>>> + if ( !data )
>>>>>> + {
>>>>>> + printk("%s: Can't find 'riscv,num-sources' property\n",
>>>>>> + aplic_node->full_name);
>>>>>> + return -FDT_ERR_XEN(ENOENT);
>>>>>> + }
>>>>>> +
>>>>>> + res = fdt_property(fdt, "riscv,num-sources", data, len);
>>>>>> + if ( res )
>>>>>> + return res;
>>>>>
>>>>> Or maybe this is the number of interrupts?
>>>>
>>>> This is the total count of hardware interrupt lines wired into this
>>>> APLIC domain.
>>>>
>>>> It could be independent from the host it runs one but looking at the
>>>> possible range [1,1023] for this property if we will put for a guest
>>>> lets say 22 but new host support only 20 when we will be in a trouble
>>>> anyway.
>>>
>>> Correct. But if you had some hosts with 1024 and some with 256, how
>>> would you bring up a guest on the former to later be able to migrate
>>> it to one of the latter when guests inherit the count from the host?
>>
>> It will be impossible to do that without emulation,
>
> It's not quite clear to me why this would be.
If migration happens from the host with 256 to the host with 1024, then
some interrupt remapping/multiplexing might be needed. But I don't
consider this a good option...
>
>> so I expect it will
>> be easier to simply forbid migration to such a host.
>>
>> Basically, it is just a game of numbers. We could try to estimate how
>> many interrupts are needed for the guest, or for simplicity use some
>> hard-coded number (say 128 or 256, something “pretty small”). Will it
>> resolve the migration issue? Partially yes, but not fully, as there
>> could always be cases where the new host’s number of sources is lower
>> than our hard-coded “small guest value.” So having a smaller value will
>> likely help on average and will allow to support more h/w migration to
>> which could happen, but the migration issue will still be present.
>
> An admin, knowing the capabilities of all hosts in a pool, can suitably
> limit guests intended to move among hosts. No migration issue at all.
... this option is much better. For now I will introduce a define in the
vaplic.h header, or maybe just inside this function, to hard-code the
number of interrupts supported by the guest vAPLIC.
But in the future it makes sense to introduce a property in the host APLIC
node, something like riscv,num-guest-sources, similar to what IMSIC has for
interrupt identities (riscv,num-guest-ids).
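For illustration, a minimal sketch of the hard-coded limit mentioned above, together with the big-endian encoding that a 32-bit Device Tree property cell uses. The constant, its value of 128 and the function names are hypothetical. Clamping to the host value when the host has fewer sources is an assumption of this sketch, not something decided in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical guest vAPLIC limit; the actual value is still to be chosen. */
#define GUEST_APLIC_NUM_SOURCES 128u

/* The guest never sees more sources than the limit or than the host has. */
static uint32_t guest_num_sources_sketch(uint32_t host_num_sources)
{
    return host_num_sources < GUEST_APLIC_NUM_SOURCES
           ? host_num_sources : GUEST_APLIC_NUM_SOURCES;
}

/* Device Tree property cells are stored as 32-bit big-endian values. */
static void put_be32_cell(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}
```

The encoded cell is what would be written as the guest's riscv,num-sources property instead of copying the host property bytes unchanged.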
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests
2026-04-17 14:07 ` Jan Beulich
@ 2026-04-20 7:52 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-20 7:52 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/17/26 4:07 PM, Jan Beulich wrote:
> On 17.04.2026 13:34, Oleksii Kurochko wrote:
>> On 4/16/26 2:21 PM, Jan Beulich wrote:
>>> On 13.04.2026 16:42, Oleksii Kurochko wrote:
>>>> On 4/2/26 12:03 PM, Jan Beulich wrote:
>>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>>> @@ -14,12 +27,109 @@ bool aia_available(void)
>>>>>> return is_aia_available;
>>>>>> }
>>>>>>
>>>>>> +int __init vgein_init(unsigned int cpu)
>>>>>
>>>>> If this needs invoking once per CPU being brought up, it can't be __init.
>>>>
>>>> Yes, it is going to be called inside the secondary CPU bring-up function.
>>>>
>>>> __init sections are removed much later, after all CPUs are brought up,
>>>> so it looks like that at the moment when secondary CPUs are being
>>>> brought up, __init still exists and can be called.
>>>
>>> No. And I said so before. See XEN_SYSCTL_cpu_hotplug (and ignore it having
>>> "hotplug" in the name, but merely consider that you can take CPUs offline
>>> at runtime, and later bring them online again).
>>
>> Thanks, now it makes sense to drop __init.
>>
>>>>> Also - static?
>>>>
>>>> It isn't static because it will be called inside the secondary CPU
>>>> bring-up function.
>>>
>>> As it doesn't need calling from the outside for the boot CPU, it's not
>>> obvious why it would need calling from the outside for secondary ones.
>>
>> I think I am confused here by what do you mean by "calling from the
>> outside".
>
> As in "from another translation unit".
>
>> It should be called during boot or secondary CPU initialization (if AIA
>> is used), right?
>> Then considering that VGEIN-related stuff is connected to AIA then
>> vgein_init() is expected to live in aia.c and considering that it wants
>> to be called by secondary CPU boot code it can't be static.
>
> Putting my question differently: Why is there a difference here between
> boot CPU and secondary CPUs. My (general) expectation would be that either
> the function wants calling from outside of aia.c in all cases, or it could
> be static (now and later).
Generally, if I update the prototype of aia_init() to take a cpuid
argument, then it will be enough to make vgein_init() static, and all
callers will go through aia_init(). The only reason I tried to keep
vgein_init() non-static is that for a secondary CPU there is no need (at
least for now) to call anything else from aia_init() a second time, or
to re-initialize the is_aia_usable variable.
But I agree that it would probably be better to make vgein_init() static
now and use aia_init() everywhere it is needed, so that if something new
is added later which has to be called for both boot and secondary CPUs,
it won't be missed. And I think it is fine to just re-initialize
is_aia_usable without any additional synchronization.
As an option, I could check at the start whether is_aia_usable is
already true and, if it is, just call vgein_init() for that cpuid.
Would that be better?
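Roughly what I have in mind, as a standalone sketch rather than the
actual patch (NR_CPUS, the geilen[] array and the placeholder values are
assumptions made only for this illustration):

```c
#include <stdbool.h>

#define NR_CPUS 4  /* hypothetical CPU count for this sketch */

static bool is_aia_usable;            /* system-wide AIA state */
static unsigned int geilen[NR_CPUS];  /* stand-in for the per-CPU VGEIN state */

/* Per-CPU part; static because only aia_init() needs to call it. */
static int vgein_init(unsigned int cpu)
{
    if ( cpu >= NR_CPUS )
        return -1;

    geilen[cpu] = 3;  /* placeholder for the value probed from hgeie */

    return 0;
}

/*
 * Single entry point for the boot CPU and for secondary CPUs. Not __init,
 * as CPUs can be brought online again at runtime.
 */
int aia_init(unsigned int cpu)
{
    /* System-wide setup is done only on the first call. */
    if ( !is_aia_usable )
        is_aia_usable = true;  /* placeholder for the real AIA detection */

    return vgein_init(cpu);
}
```

With this shape the secondary CPU bring-up path would only ever call
aia_init(cpu), and anything added later for all CPUs has one obvious home.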
>
>>>>>> +{
>>>>>> + struct vgein_bmp *vgein = &per_cpu(vgein_bmp, cpu);
>>>>>> +
>>>>>> + csr_write(CSR_HGEIE, -1UL);
>>>>>> + vgein->geilen = flsl(csr_read(CSR_HGEIE));
>>>>>> + csr_write(CSR_HGEIE, 0);
>>>>>> + if ( vgein->geilen )
>>>>>> + vgein->geilen--;
>>>>>
>>>>> I don't understand this. The "len" in "geilen" stands for "length", I suppose,
>>>>> i.e. the number of bits. Hmm, the spec itself is inconsistent: "The number of
>>>>> bits implemented in hgeip and hgeie for guest external interrupts is UNSPECIFIED
>>>>> and may be zero. This number is known as GEILEN." This may or may not include
>>>>> bit 0 (which is implemented, but r/o zero). Then saying "Hence, if GEILEN is
>>>>> nonzero, bits GEILEN:1 shall be writable in ..." suggests 0 isn't included, but
>>>>> that's not unambiguous.
>>>>
>>>> But they explicitly wrote that: The least-significant bits are
>>>> implemented first, apart from bit 0. So bit 0 is explicitly excluded.
>>>
>>> Fine, but not including it in anything named *LEN feels - as said - ambiguous.
>>
>> Agree, it is. I just tried to follow the spec naming here. But I am okay
>> to rename it to max_gein, for example.
>
> Perhaps better stick to spec naming, but consider adding a clarifying comment.
I will then add the following comment above the geilen field in struct
vgein_bmp:
    The least-significant bits are implemented first, apart from bit 0
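To make the "bit 0 is excluded" point concrete, here is a small
standalone model of the probing sequence from the patch. The mock CSR
accessors, mock_hw_geilen and fls_ul() are hypothetical stand-ins for
csr_write()/csr_read() and flsl(); this is only a sketch:

```c
/* Mock of hgeie: only bits GEILEN:1 are writable, bit 0 always reads 0. */
static unsigned long mock_hgeie;
static unsigned int mock_hw_geilen = 5;  /* hypothetical hardware value */

static void hgeie_write(unsigned long val)
{
    unsigned long writable = 0;
    unsigned int i;

    for ( i = 1; i <= mock_hw_geilen; i++ )
        writable |= 1UL << i;

    mock_hgeie = val & writable;
}

static unsigned long hgeie_read(void)
{
    return mock_hgeie;
}

/* 1-based index of the most significant set bit, 0 if no bit is set. */
static unsigned int fls_ul(unsigned long x)
{
    unsigned int n = 0;

    while ( x )
    {
        n++;
        x >>= 1;
    }

    return n;
}

/* Write all ones, read back, and convert the top set bit into GEILEN. */
unsigned int probe_geilen(void)
{
    unsigned int len;

    hgeie_write(~0UL);
    len = fls_ul(hgeie_read());
    hgeie_write(0);

    /* Bit 0 is not counted, hence the decrement when anything is set. */
    return len ? len - 1 : 0;
}
```

The decrement is what turns "index of the highest writable bit, counted
from 1" into "number of implemented guest interrupt files".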
>
>>>>>> + /*
>>>>>> + * All vCPU guest interrupt files are used and we don't support a case
>>>>>> + * when number of vCPU on 1 pCPU is bigger then geilen.
>>>>>> + */
>>>>>
>>>>> This wants checking in vgein_init() then. CPUs (beyond the boot one)
>>>>> violating this should not be brought online.
>>>>
>>>> It'll be nice. But we can't know how many vCPUs will be ran on pCPU when
>>>> vgein_init() is executed.
>>>
>>> I don't understand: How does it matter how many vCPU-s will exist later on
>>> (and will run wherever)?
>>
>> IIUC, you want to check what is mentioned in the comment in vcpu_init().
>> The comment says that it is checking that number of vCPU on one pCPU
>> isn't bigger then geilen. To check that we have to know an amount of
>> vCPU potentially will be ran on pCPU.
>
> Well, not quite. I don't understand at all why the number of vCPU-s in
> a system would be limited by any pCPU property. I don't think we have
> any such constraint on x86, for example.
Then I will just drop this comment and the corresponding ASSERT(), and
return a vgein_id of 0 when there are no h/w VGEINs available, to
indicate that s/w guest interrupt files should be used:
    if ( vgein_id > vgein->geilen )
        vgein_id = 0;
    else
        __set_bit(vgein_id, bmp);
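As a self-contained illustration of that behaviour (the structure and
function names below are made up for the sketch and are not the ones in
the patch):

```c
struct vgein_bmp_sketch {
    unsigned long bmp;    /* bit N set means VGEIN N is in use */
    unsigned int geilen;  /* highest usable VGEIN; bit 0 is never handed out */
};

/*
 * Returns a free VGEIN in the range 1..geilen and marks it as used, or
 * returns 0 when none is left, so the caller can fall back to a s/w
 * guest interrupt file (or report an error while those are unsupported).
 */
unsigned int vgein_assign_sketch(struct vgein_bmp_sketch *v)
{
    unsigned int id;

    for ( id = 1; id <= v->geilen; id++ )
    {
        if ( !(v->bmp & (1UL << id)) )
        {
            v->bmp |= 1UL << id;
            return id;
        }
    }

    return 0;
}
```

No assertion is involved: running out of h/w files is an ordinary
outcome that the caller has to handle.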
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state
2026-04-17 13:47 ` Oleksii Kurochko
@ 2026-04-20 8:29 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-20 8:29 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 17.04.2026 15:47, Oleksii Kurochko wrote:
> On 4/16/26 2:31 PM, Jan Beulich wrote:
>> On 14.04.2026 11:22, Oleksii Kurochko wrote:
>>> On 4/2/26 1:31 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> --- a/xen/arch/riscv/imsic.c
>>>>> +++ b/xen/arch/riscv/imsic.c
>>>>> @@ -59,6 +59,29 @@ do { \
>>>>> csr_clear(CSR_SIREG, v); \
>>>>> } while (0)
>>>>>
>>>>> +unsigned int vcpu_guest_file_id(const struct vcpu *v)
>>>>> +{
>>>>> + struct imsic_state *imsic_state = v->arch.imsic_state;
>>>>> + unsigned long flags;
>>>>> + unsigned int vsfile_id;
>>>>> +
>>>>> + read_lock_irqsave(&imsic_state->vsfile_lock, flags);
>>>>> + vsfile_id = imsic_state->guest_file_id;
>>>>> + read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
>>>>
>>>> What purpose does this locking have? Already ...
>>>>
>>>>> + return vsfile_id;
>>>>
>>>> ... here the value can be stale, if indeed there is a chance of races.
>>>> Did you perhaps mean to use ACCESS_ONCE() here and where the value is
>>>> set?
>>>
>>> Doesn't ACCESS_ONCE() only protect against compiler re-ordering (as
>>> basically it is just volatile-related stuff inside the macro)?
>>>
>>> Generally, I think that that guest_file_id is needed to be updated only
>>> during migration of vCPU from one pCPU to another and I expect that
>>> during this migration vCPU isn't active, so no one will want to read
>>> imsic_state->guest_file_id. But on the other hand, there is:
>>> bool imsic_has_interrupt(const struct vcpu *vcpu)
>>> {
>>> ...
>>> /*
>>> * The IMSIC SW-file directly injects interrupt via hvip so
>>> * only check for interrupt when IMSIC VS-file is being used.
>>> */
>>>
>>> read_lock_irqsave(&imsic_state->vsfile_lock, flags);
>>> if ( imsic_state->vsfile_pcpu != NR_CPUS )
>>> ret = !!(csr_read(CSR_HGEIP) & BIT(imsic_state->guest_file_id,
>>> UL));
>>> read_unlock_irqrestore(&imsic_state->vsfile_lock, flags);
>>> ...
>>> }
>>> which I think could be called in parallel with migration, so the
>>> lock is still needed.
>>
>> None of this addresses my pointing out that the returned value will be
>> stale by the point the caller gets to look at it.
>
> Yes, I agree that lock in vcpu_guest_file_id() is useless and it should
> be on the caller side and used for the whole IMSIC state access. But ...
>
>> Which in turn raises
>> said question about the use of a lock. If you read
>> imsic_state->guest_file_id atomically (i.e. excluding tearing of reads),
>> the value seen / used will be stale as with the lock in use. Unless of
>> course there's yet another aspect hidden somewhere in what is not being
>> explained.
>
> ... I am not sure that I get this part.
>
> If I am somewhere in migration code where I took write lock to update
> imsic state (and of course ->guest_file_id as part of it) then if
> someone else in parallel calls imsic_has_interrupt() then it won't enter
> critical section where ->guest_file_id is trying to be read so no stale
> ->guest_file_id will be read.
Well, hence why I said "Unless of course there's yet another aspect hidden
somewhere in what is not being explained." If I don't know the full
picture, I can't very well judge whether a lock is needed, or whether ...
> Then does it make sense to use ACCESS_ONCE() during read and write of
> ->guest_file_id in such use cases?
... ACCESS_ONCE() would be enough.
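For reference, a sketch of what the lock-free variant could look like.
The macro definition is the usual volatile-cast form and relies on the
GNU __typeof__ extension; the structure and accessor names are
hypothetical:

```c
/* Forces exactly one access through a volatile-qualified lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

struct imsic_state_sketch {
    unsigned int guest_file_id;
};

/* Writer side, e.g. when the vCPU is moved to another pCPU. */
void set_guest_file_id(struct imsic_state_sketch *s, unsigned int id)
{
    ACCESS_ONCE(s->guest_file_id) = id;
}

/*
 * Reader side: a single read that the compiler may not split or repeat.
 * The returned value can still be stale by the time the caller uses it,
 * exactly as with a lock taken only around the read.
 */
unsigned int get_guest_file_id(const struct imsic_state_sketch *s)
{
    return ACCESS_ONCE(s->guest_file_id);
}
```

Whether this suffices depends on what else has to stay consistent with
the ID; if several fields must be observed together, the lock belongs in
the caller around the whole sequence.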
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support
2026-04-16 12:42 ` Jan Beulich
@ 2026-04-20 10:25 ` Oleksii Kurochko
2026-04-20 10:47 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-20 10:25 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 2:42 PM, Jan Beulich wrote:
> On 14.04.2026 12:27, Oleksii Kurochko wrote:
>> On 4/2/26 1:58 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> @@ -47,6 +48,19 @@ struct intc_hw_operations {
>>>> const struct dt_device_node *intc);
>>>> };
>>>> +
>>>> +struct vintc {
>>>> + const struct intc_info *info;
>>>
>>> Isn't this referencing a physical INTC's structure? Why would the virtual
>>> one's properties have to match that of the physical one?
>>
>> It is because of how vAPLIC emulation load and store is working.
>
> Thank you very much. This fully explains things, the more that of course
> emulation of loads and stores comes earlier in this series. Oleksii,
> really, please.
Sorry for that. Let me add some extra details on where I think a pointer
to the physical APLIC registers is needed.
When the vAPLIC's TARGET register is accessed, it is necessary to update
the real APLIC, as this register encodes the Hart index, the Guest Index
(guest interrupt file id) and the EIID (External Interrupt Identity). To
do that, vintc needs access to the physical APLIC registers.
I expect something similar for some of the other registers, for example
the ones which turn interrupts on/off (SETIE, CLRIE). If the vAPLIC
requests that an interrupt be enabled, then I expect the corresponding
physical APLIC register has to be updated too, as otherwise the device
interrupt would never start occurring. So again a pointer to the
physical APLIC is needed to access these registers.
Does it make sense at least a little bit now?
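As an illustration of the TARGET case, a standalone sketch of how a
guest-written value could be turned into the value for the physical
register. The field layout (Hart Index in bits 31:18, Guest Index in
bits 17:12, EIID in bits 10:0) is my reading of the AIA spec for MSI
delivery mode, and the macro and function names are invented for this
example:

```c
#include <stdint.h>

#define TARGET_HART_IDX_SHIFT   18
#define TARGET_HART_IDX_MASK    0x3fffU
#define TARGET_GUEST_IDX_SHIFT  12
#define TARGET_GUEST_IDX_MASK   0x3fU
#define TARGET_EIID_MASK        0x7ffU

/*
 * Keep the EIID chosen by the guest, but substitute the physical hart
 * index and the guest interrupt file the vCPU currently owns.
 */
uint32_t vaplic_target_to_phys(uint32_t guest_val, uint32_t phys_hart_idx,
                               uint32_t guest_file_id)
{
    uint32_t eiid = guest_val & TARGET_EIID_MASK;

    return ((phys_hart_idx & TARGET_HART_IDX_MASK) << TARGET_HART_IDX_SHIFT) |
           ((guest_file_id & TARGET_GUEST_IDX_MASK) << TARGET_GUEST_IDX_SHIFT) |
           eiid;
}
```

The result is what would have to be written to the real APLIC, which is
why the emulation code needs a way to reach the physical registers.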
>
>>>> --- /dev/null
>>>> +++ b/xen/arch/riscv/vaplic.c
>>>> @@ -0,0 +1,74 @@
>>>> +/* SPDX-License-Identifier: MIT */
>>>> +/*
>>>> + * xen/arch/riscv/vaplic.c
>>>> + *
>>>> + * Virtual RISC-V Advanced Platform-Level Interrupt Controller support
>>>> + *
>>>> + * Copyright (c) Microchip.
>>>> + * Copyright (c) Vates
>>>> + */
>>>> +
>>>> +#include <xen/errno.h>
>>>> +#include <xen/sched.h>
>>>> +#include <xen/xvmalloc.h>
>>>> +
>>>> +#include <asm/aia.h>
>>>> +#include <asm/imsic.h>
>>>> +#include <asm/intc.h>
>>>> +#include <asm/vaplic.h>
>>>> +
>>>> +#include "aplic-priv.h"
>>>> +
>>>> +static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
>>>> +{
>>>> + int rc = 0;
>>>> +
>>>> + rc = vcpu_imsic_init(v);
>>>> + if ( rc )
>>>> + return rc;
>>>> +
>>>> + imsic_set_guest_file_id(v, vgein_assign(v));
>>>
>>> And vgein_assign() can't fail? (Rhetorical question - of course it can. That
>>> function shouldn't assert that it can find a valid ID.)
>>
>> Technically it can't fail (except some bug of course), this function
>> should in general return 0 (when there aren't left h/w IDs)
>
> Which is "failure".
>
>> or something > 0 (when there are some h/w IDs).
>
> Which is "success".
>
>> ASSERT() inside it was added only
>> because of ...
>>
>>> But then - aren't you limiting the number of vCPU-s a host can handle by the
>>> number vgein IDs?
>>
>> ... At the moment, I am limiting because S/W guest interrupt files
>> (IDs) aren't supported.
>
> As before - return error codes when errors occur.
I will return an error code on the caller side of vgein_assign() if it
returns 0.
>
>>>> +static struct vintc * __init vaplic_alloc(void)
>>>> +{
>>>> + struct vaplic *v = NULL;
>>>
>>> Once again - why the initializer? In fact, ...
>>>
>>>> + v = xvzalloc(struct vaplic);
>>>
>>> ... this could be the initializer.
>>
>> Sure, I will use it as initializer.
>>
>>>
>>>> + if ( !v )
>>>> + return NULL;
>>>> +
>>>> + return &v->base;
>>>> +}
>>>
>>> If you returned and ...
>>>
>>>> +int __init domain_vaplic_init(struct domain *d)
>>>> +{
>>>> + int ret = 0;
>>>> +
>>>> + d->arch.vintc = vaplic_alloc();
>>>
>>> ... stored struct vaplic *, the slightly odd to_vaplic() macro wouldn't
>>> be needed.
>>
>> vaplic_alloc() return struct vintc *,
>
> Which is what I'm putting under question. Why would a function of this name
> return anything else than struct vaplic *?
Agree, this function could return struct vaplic *. I will do that.
>
>> which is then used by to_vaplic()
>> to get struct vaplic *.
>
> And which is what I'm saying can be avoided.
'struct vintc *vintc;' is still needed in struct arch_domain, as it is
needed to call vintc->ops->..., e.g. during vCPU creation:
    if ( (rc = v->domain->arch.vintc->ops->vcpu_init(v)) )
        goto fail;
And if 'struct vintc *vintc;' is still present in struct arch_domain,
then to_vaplic() is still needed in domain_vaplic_deinit(struct domain
*d) to get struct vaplic *. (The same is true for the other vAPLIC
functions which take struct domain as an argument.)
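For clarity, this is the pattern I mean, reduced to a standalone sketch.
The field names are simplified and the macro is written with offsetof()
directly instead of Xen's container_of(), so treat the details as
assumptions:

```c
#include <stddef.h>

/* Generic part, the only thing struct arch_domain would point to. */
struct vintc {
    const void *ops;  /* stand-in for 'const struct vintc_ops *' */
};

/* vAPLIC-specific state with the generic part embedded. */
struct vaplic {
    unsigned int nr_sources;
    struct vintc base;
};

/* Recover the containing struct vaplic from a pointer to its 'base'. */
#define to_vaplic(v) \
    ((struct vaplic *)((char *)(v) - offsetof(struct vaplic, base)))
```

A function that only receives struct domain * sees just the struct vintc
pointer, so it has to go through to_vaplic() to reach the vAPLIC fields.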
>
>>>> + if ( !d->arch.vintc )
>>>> + {
>>>> + ret = -ENOMEM;
>>>> + goto fail;
>>>
>>> Nit: goto when simply return could be used.
>>>
>>>> + }
>>>> +
>>>> + d->arch.vintc->ops = &vaplic_ops;
>>>
>>> Are other kinds of ops structures going to appear? If not, why the extra
>>> indirection?
>>
>> At the moment, no I don't see any other kinds of ops struct. It was just
>> convenient way to group them and then easier to initialize them - just
>> one assignment instead of adding a separate line in domain_vaplic_init().
>
> Maybe I wasn't as clear as I should have been: Why the indirection when it
> doesn't abstract anything? I.e. why the "ops" field in the first place,
> when everyone could access the global (until such time that abstraction
> becomes necessary)?
It isn't really needed now. I can just embed the ops into vintc
directly, without grouping them into a structure.
The exception would be if we want to have a 'vintc_ops *ops;' field in
struct arch_domain and, separately, 'void *vintc;' (which in the future
could be cast to struct vaplic *), to fully drop the to_vaplic() macro.
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support
2026-04-20 10:25 ` Oleksii Kurochko
@ 2026-04-20 10:47 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-20 10:47 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 20.04.2026 12:25, Oleksii Kurochko wrote:
> On 4/16/26 2:42 PM, Jan Beulich wrote:
>> On 14.04.2026 12:27, Oleksii Kurochko wrote:
>>> On 4/2/26 1:58 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> @@ -47,6 +48,19 @@ struct intc_hw_operations {
>>>>> const struct dt_device_node *intc);
>>>>> };
>>>>> +
>>>>> +struct vintc {
>>>>> + const struct intc_info *info;
>>>>
>>>> Isn't this referencing a physical INTC's structure? Why would the virtual
>>>> one's properties have to match that of the physical one?
>>>
>>> It is because of how vAPLIC emulation load and store is working.
>>
>> Thank you very much. This fully explains things, the more that of course
>> emulation of loads and stores comes earlier in this series. Oleksii,
>> really, please.
>
> Sorry for that. Let me add some extra details on where I think a pointer
> to the physical APLIC registers is needed.
>
> When the vAPLIC's TARGET register is accessed, it is necessary to update
> the real APLIC, as this register encodes the Hart index, the Guest Index
> (guest interrupt file id) and the EIID (External Interrupt Identity). To
> do that, vintc needs access to the physical APLIC registers.
>
> I expect something similar for some of the other registers, for example
> the ones which turn interrupts on/off (SETIE, CLRIE). If the vAPLIC
> requests that an interrupt be enabled, then I expect the corresponding
> physical APLIC register has to be updated too, as otherwise the device
> interrupt would never start occurring. So again a pointer to the
> physical APLIC is needed to access these registers.
>
> Does it make sense at least a little bit now?
Yes. Just that there's a "but": I can't spot any use of the field in this
patch. Hence I still can't say whether I actually agree. Introduce the
field when it's actually needed?
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-04-16 12:51 ` Jan Beulich
@ 2026-04-20 11:39 ` Oleksii Kurochko
2026-04-20 13:45 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-20 11:39 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 2:51 PM, Jan Beulich wrote:
> On 14.04.2026 13:29, Oleksii Kurochko wrote:
>> On 4/2/26 2:22 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> --- a/xen/arch/riscv/include/asm/setup.h
>>>> +++ b/xen/arch/riscv/include/asm/setup.h
>>>> @@ -5,6 +5,10 @@
>>>>
>>>> #include <xen/types.h>
>>>>
>>>> +struct domain;
>>>> +struct dt_device_node;
>>>> +struct rangeset;
>>>> +
>>>> #define max_init_domid (0)
>>>>
>>>> void setup_mm(void);
>>>> @@ -13,6 +17,19 @@ void copy_from_paddr(void *dst, paddr_t paddr, unsigned long len);
>>>>
>>>> void init_csr_masks(void);
>>>>
>>>> +/* TODO: move somewhere to common header? */
>>>
>>> Counter question: Why ...
>>>
>>>> +/*
>>>> + * Retrieves the interrupts configuration from a device tree node and maps
>>>> + * those interrupts to the target domain.
>>>> + *
>>>> + * Returns:
>>>> + * < 0 error
>>>> + * 0 success
>>>> + */
>>>> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>>>> + bool need_mapping,
>>>> + struct rangeset *irq_ranges);
>>>
>>> ... is this not an inline function, when ...
>>>
>>>> --- a/xen/arch/riscv/intc.c
>>>> +++ b/xen/arch/riscv/intc.c
>>>> @@ -79,3 +79,11 @@ int __init intc_make_domu_dt_node(const struct kernel_info *kinfo)
>>>>
>>>> return -ENOSYS;
>>>> }
>>>> +
>>>> +int map_device_irqs_to_domain(struct domain *d, struct dt_device_node *dev,
>>>> + bool need_mapping,
>>>> + struct rangeset *irq_ranges)
>>>> +{
>>>> + return d->arch.vintc->ops->map_device_irqs_to_domain(d, dev, need_mapping,
>>>> + irq_ranges);
>>>> +}
>>>
>>> ... it's merely a wrapper around an indirect function call? And then the
>>> function isn't used anywhere anyway.
>>
>> It is used by dom0less common code and it is a wrapper because Arm has
>> different implementation and Arm doesn't have
>> map_device_irqs_to_domain() in its virtual interrupt controller operations.
>
> But the question wasn't why this is a wrapper, but why this wrapper isn't an
> inline function.
Got it, it could indeed be an inline function.
>
>>>> +int vaplic_map_device_irqs_to_domain(struct domain *d,
>>>> + struct dt_device_node *dev,
>>>> + bool need_mapping,
>>>> + struct rangeset *irq_ranges)
>>>> +{
>>>> + unsigned int i, nirq;
>>>> + int res, irq;
>>>> + struct dt_raw_irq rirq;
>>>> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
>>>> + unsigned int reg_num;
>>>> +
>>>> + nirq = dt_number_of_irq(dev);
>>>> +
>>>> + /* Give permission and map IRQs */
>>>> + for ( i = 0; i < nirq; i++ )
>>>> + {
>>>> + res = dt_device_get_raw_irq(dev, i, &rirq);
>>>> + if ( res )
>>>> + {
>>>> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
>>>> + i, dt_node_full_name(dev));
>>>> + return res;
>>>> + }
>>>> +
>>>> + /*
>>>> + * Don't map IRQ that have no physical meaning
>>>> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
>>>> + */
>>>> + if ( rirq.controller != dt_interrupt_controller )
>>>> + {
>>>> + dt_dprintk("irq %u not connected to primary controller."
>>>> + "Connected to %s\n", i,
>>>> + dt_node_full_name(rirq.controller));
>>>> + continue;
>>>> + }
>>>> +
>>>> + irq = platform_get_irq(dev, i);
>>>> + if ( irq < 0 )
>>>> + {
>>>> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
>>>> + return irq;
>>>> + }
>>>> +
>>>> + res = irq_permit_access(d, irq);
>>>> + if ( res )
>>>> + {
>>>> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
>>>> + irq);
>>>
>>> This time the other way around: %d please with plain int. (Again at least
>>> once further down.)
>>>
>>>> + return res;
>>>> + }
>>>> +
>>>> + reg_num = irq / APLIC_NUM_REGS;
>>>> +
>>>> + if ( is_irq_shared_among_domains(d, irq) )
>>>> + {
>>>> + printk("%s: Shared IRQ isn't supported\n", __func__);
>>>> + return -EINVAL;
>>>> + }
>>>> +
>>>> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
>>>
>>> ... all of this leaves me with the impression that IRQ numbering isn't really
>>> virtualized. IRQs are merely split into groups, one group per domain (and
>>> maybe some unused). How are you going to fit in truly virtual IRQs?
>>
>> What do you mean by truly virtual IRQs?
>
> Ones where no aspects are represented by any piece of hardware.
>
>> I can't totally agree that the current approach isn't use virtual IRQs,
>> yes, they are 1:1 mapped but on the other side Xen is responsible to
>> give an IRQ number for guest's device and Xen is responsible that guest
>> isn't trying to reach IRQ which not belongs to it.
>
> In a non-virtualized environment I expect IRQs are going to be "sparse"
> (i.e. with perhaps large blocks of items used elsewhere). If you had
> proper translation of IRQ numbers, the same could be true for your
> guests.
The partial FDT, which is used to tell which devices are passed through
to a guest, uses physical IRQ numbers (which, for simplicity, I treat as
1:1 mapped to virtual IRQ numbers). So suppose we have the following
configuration, where the physical (bare-metal) IRQ layout is sparse:
    IRQ 5   → UART     -> domU0
    IRQ 23  → Ethernet -> domU1
    IRQ 47  → PCIe     -> domU0
    IRQ 100 → Storage  -> domU1
    (gaps everywhere, driven by hardware wiring)
For such a configuration, each domain's auth_irq_bmp[] will contain:
    IRQ 5 and IRQ 47 for domU0
and
    IRQ 23 and IRQ 100 for domU1
Here vIRQ5 = pIRQ5, vIRQ47 = pIRQ47 and so on. auth_irq_bmp just
transforms an xIRQ number into the bit position it has in the real APLIC
registers. As an example, let's take vIRQ5 and vIRQ47.
Reading or writing register setie[k] reads or potentially modifies the
enable bits for interrupt sources k × 32 through k × 32 + 31. For an
implemented interrupt source i within that range, the enable bit for
source i corresponds with register bit (i mod 32).
So for:
- vIRQ5 == pIRQ5 we have to set bit 5 in setie[0]
- vIRQ47 == pIRQ47 we have to set bit 15 in setie[1]
Probably it was not the best idea to declare auth_irq_bmp the way it
looks in h/w; maybe just 'bool auth_irq_bmp[1024]' would be clearer.
So IRQ numbers still stay "sparse" in the guest.
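The index arithmetic above, written out as a standalone sketch. The
helper names and the 1024-source bound are assumptions for the
illustration and do not come from the patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define IRQS_PER_REG  32U    /* one 32-bit setie[k] register covers 32 sources */
#define MAX_IRQS      1024U  /* assumed upper bound on interrupt sources */

/* Index k of the setie[k] register that holds this IRQ's enable bit. */
static inline unsigned int irq_to_reg(unsigned int irq)
{
    return irq / IRQS_PER_REG;
}

/* Mask of the bit for this IRQ within its register (i mod 32). */
static inline uint32_t irq_to_mask(unsigned int irq)
{
    return UINT32_C(1) << (irq % IRQS_PER_REG);
}

/* Record that a domain is allowed to use this IRQ. */
void auth_irq_set(uint32_t *bmp, unsigned int irq)
{
    bmp[irq_to_reg(irq)] |= irq_to_mask(irq);
}

/* Check whether a domain is allowed to use this IRQ. */
bool auth_irq_test(const uint32_t *bmp, unsigned int irq)
{
    return bmp[irq_to_reg(irq)] & irq_to_mask(irq);
}
```

For domU0 from the example, auth_irq_set() for IRQ 5 and IRQ 47 leaves
bit 5 set in word 0 and bit 15 set in word 1, matching the setie[]
layout, while the IRQ numbers the guest sees remain 5 and 47.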
>
>>>> + dt_dprintk(" - IRQ: %u\n", irq);
>>>> +
>>>> + if ( irq_ranges )
>>>> + {
>>>> + res = rangeset_add_singleton(irq_ranges, irq);
>>>> + if ( res )
>>>> + return res;
>>>> + }
>>>
>>> What is irq_ranges?
>>
>> IIUC based on Arm code irq_ranges is an optional output accumulator, the
>> caller allocates and passes it in when it needs to track which IRQs were
>> mapped (overlay use case), or passes NULL when that tracking is not needed.
>>
>> I added here as map_device_irqs_to_domain() is called from the common
>> code and so maybe one day someone will decide to pass irq_ranges to this
>> functions. At the moment, for RISC-V it is the only one user of
>> map_device_irqs_to_domain() and it passes NULL.
>
> Simply assert then that it's NULL?
Wouldn't BUG_ON() be better here, as the BUG_ON() macro is always
active, no matter whether a release or a debug build is used?
Or maybe you meant:
    if ( irq_ranges )
        assert_failed("irq_ranges arg isn't supported\n");
>
>>>> @@ -34,6 +142,7 @@ static int __init cf_check vcpu_vaplic_init(struct vcpu *v)
>>>>
>>>> static const struct vintc_ops vaplic_ops = {
>>>> .vcpu_init = vcpu_vaplic_init,
>>>> + .map_device_irqs_to_domain = vaplic_map_device_irqs_to_domain,
>>>> };
>>>
>>> What about the inverse function, needed for domain cleanup?
>>
>> I planned to add it when it is really needed. At the moment, I
>> don't have such use cases.
>
> I.e. if any domain needs re-starting, the entire system needs rebooting?
> Recall that "dom0less" is slightly misleading a name, as it only allows
> there to not be a Dom0. One can be there, and hence re-starting a crashed
> domain ought to be possible. For that, you need to correctly clean up
> after the crashed one.
I haven't investigated yet what the code path for rebooting or
re-starting a crashed domain will be. But looking at the implementation
of vaplic_map_device_irqs_to_domain(): nothing is allocated there, and
the only thing that happens is the filling of auth_irq_bmp[], which
either stays the same after a reboot if it is simply re-used, or is
allocated anew if the crashed domain is recreated from scratch. So it
seems like it is enough to have only map_device_irqs_to_domain().
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 18/27] xen/riscv: add vaplic access check
2026-04-16 13:01 ` Jan Beulich
@ 2026-04-20 11:53 ` Oleksii Kurochko
2026-04-20 12:03 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-20 11:53 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 3:01 PM, Jan Beulich wrote:
> On 14.04.2026 13:45, Oleksii Kurochko wrote:
>> On 4/2/26 3:10 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> --- a/xen/arch/riscv/aplic.c
>>>> +++ b/xen/arch/riscv/aplic.c
>>>> @@ -38,6 +38,7 @@ static struct aplic_priv aplic = {
>>>>
>>>> static struct intc_info __ro_after_init aplic_info = {
>>>> .hw_version = INTC_APLIC,
>>>> + .private = &aplic,
>>>
>>> Isn't this the host instance again? How can you ...
>>>
>>>> --- a/xen/arch/riscv/vaplic.c
>>>> +++ b/xen/arch/riscv/vaplic.c
>>>> @@ -127,6 +127,20 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
>>>> return 0;
>>>> }
>>>>
>>>> +static int cf_check vaplic_is_access(const struct vcpu *vcpu,
>>>> + const unsigned long addr)
>>>> +{
>>>> + const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>>>> + const struct aplic_priv *priv = vaplic->base.info->private;
>>>> + const paddr_t paddr_end = priv->paddr_start + priv->size;
>>>> +
>>>> + /* check if it is an APLIC access */
>>>> + if ( priv->paddr_start <= addr && addr < paddr_end )
>>>
>>> ... use that here? Or asked differently, again: Where's the virtualization,
>>> i.e. the abstraction away from host properties?
>>
>> With the current use case it was easier to choose such approach then
>> provide the full abstraction.
>>
>>> Furthermore, is it really sufficient to check just the starting address of
>>> an access? Shouldn't the last byte accessed also fall into the range in
>>> question?
>>
>> I think that it is okay, my understanding is that *paddr_end technically
>> is another range.
>
> Of course it is. But a multi-byte access crossing the paddr_end boundary
> isn't purely an APLIC one. You can reject such for simplicity, but I'm
> unconvinced that you can claim you will be able to correctly handle it
> without proper merging.
Let's say the guest has the following description of the vAPLIC in its
DTB:
    aplic@d000000 {
        phandle = <0x06>;
        riscv,num-sources = <0x60>;
        reg = <0x00 0xd000000 0x00 0x8000>;
        ...
    }
This means the vAPLIC's MMIO range is [0xd000000, 0xd007fff]. If someone
tries to access 0xd008000, that is not an MMIO address belonging to the
vAPLIC, so vaplic_is_access() should return 0.
IIUC, your concern is that someone may try to access 0xd007fff, which
from this function's point of view is legal. I think it is okay to
return 1 here, telling that this address is within our vAPLIC range, as
the access will be rejected on the vaplic_emulate_{load,store}() side,
since addr (more accurately, the offset derived from addr) has to be
properly aligned:
    const unsigned int offset = addr & APLIC_REG_OFFSET_MASK;
    ...
    if ( offset & 3 )
    {
        gdprintk(XENLOG_WARNING, "Misaligned APLIC access at offset %#x\n",
                 offset);
        return -EINVAL;
    }
Is it okay? Actually I think we could add an ( addr & 3 ) check to
vaplic_is_access() too...
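Something along these lines, as a standalone sketch. The len parameter
and the explicit base/size arguments are hypothetical additions for the
illustration; the function in the patch only receives addr:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Accept an access only if it starts inside [base, base + size), its
 * last byte is inside the range as well, and it is 4-byte aligned.
 */
bool vaplic_is_access_sketch(uint64_t base, uint64_t size,
                             uint64_t addr, unsigned int len)
{
    uint64_t off;

    /* The first byte has to be inside the range. */
    if ( addr < base || addr - base >= size )
        return false;

    off = addr - base;

    /* The last byte has to be inside the range too (off < size here). */
    if ( len == 0 || len > size - off )
        return false;

    /* APLIC registers are accessed as naturally aligned 32-bit words. */
    return !(addr & 3);
}
```

With the DTB above, 0xd007ffc with a 4-byte access is accepted, while
0xd007fff, 0xd008000, and an 8-byte access at 0xd007ffc are all
rejected up front, so the load/store emulation could simply assert
alignment.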
~ Oleksii
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 18/27] xen/riscv: add vaplic access check
2026-04-20 11:53 ` Oleksii Kurochko
@ 2026-04-20 12:03 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-20 12:03 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 20.04.2026 13:53, Oleksii Kurochko wrote:
>
>
> On 4/16/26 3:01 PM, Jan Beulich wrote:
>> On 14.04.2026 13:45, Oleksii Kurochko wrote:
>>> On 4/2/26 3:10 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> --- a/xen/arch/riscv/aplic.c
>>>>> +++ b/xen/arch/riscv/aplic.c
>>>>> @@ -38,6 +38,7 @@ static struct aplic_priv aplic = {
>>>>>
>>>>> static struct intc_info __ro_after_init aplic_info = {
>>>>> .hw_version = INTC_APLIC,
>>>>> + .private = &aplic,
>>>>
>>>> Isn't this the host instance again? How can you ...
>>>>
>>>>> --- a/xen/arch/riscv/vaplic.c
>>>>> +++ b/xen/arch/riscv/vaplic.c
>>>>> @@ -127,6 +127,20 @@ int vaplic_map_device_irqs_to_domain(struct domain *d,
>>>>> return 0;
>>>>> }
>>>>>
>>>>> +static int cf_check vaplic_is_access(const struct vcpu *vcpu,
>>>>> + const unsigned long addr)
>>>>> +{
>>>>> + const struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>>>>> + const struct aplic_priv *priv = vaplic->base.info->private;
>>>>> + const paddr_t paddr_end = priv->paddr_start + priv->size;
>>>>> +
>>>>> + /* check if it is an APLIC access */
>>>>> + if ( priv->paddr_start <= addr && addr < paddr_end )
>>>>
>>>> ... use that here? Or asked differently, again: Where's the virtualization,
>>>> i.e. the abstraction away from host properties?
>>>
>>> With the current use case it was easier to choose such approach then
>>> provide the full abstraction.
>>>
>>>> Furthermore, is it really sufficient to check just the starting address of
>>>> an access? Shouldn't the last byte accessed also fall into the range in
>>>> question?
>>>
>>> I think that it is okay, my understanding is that *paddr_end technically
>>> is another range.
>>
>> Of course it is. But a multi-byte access crossing the paddr_end boundary
>> isn't purely an APLIC one. You can reject such for simplicity, but I'm
>> unconvinced that you can claim you will be able to correctly handle it
>> without proper merging.
>
> Let's say the guest has the following description of the vAPLIC in its
> DTB:
>     aplic@d000000 {
>         phandle = <0x06>;
>         riscv,num-sources = <0x60>;
>         reg = <0x00 0xd000000 0x00 0x8000>;
>         ...
>     }
> This means the vAPLIC's MMIO range is [0xd000000, 0xd007fff]. If someone
> tries to access 0xd008000, that is not an MMIO address belonging to the
> vAPLIC, so vaplic_is_access() should return 0.
>
> IIUC, your concern is that someone may try to access 0xd007fff, which
> from this function's point of view is legal. I think it is okay to
> return 1 here, telling that this address is within our vAPLIC range, as
> the access will be rejected on the vaplic_emulate_{load,store}() side,
> since addr (more accurately, the offset derived from addr) has to be
> properly aligned:
>     const unsigned int offset = addr & APLIC_REG_OFFSET_MASK;
>     ...
>     if ( offset & 3 )
>     {
>         gdprintk(XENLOG_WARNING, "Misaligned APLIC access at offset %#x\n",
>                  offset);
>         return -EINVAL;
>     }
>
> Is it okay? Actually I think we could add an ( addr & 3 ) check to
> vaplic_is_access() too...
Perhaps best. The load/store functions could then simply assert that property.
Jan
^ permalink raw reply [flat|nested] 123+ messages in thread
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-04-20 11:39 ` Oleksii Kurochko
@ 2026-04-20 13:45 ` Jan Beulich
2026-04-20 14:34 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-20 13:45 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 20.04.2026 13:39, Oleksii Kurochko wrote:
> On 4/16/26 2:51 PM, Jan Beulich wrote:
>> On 14.04.2026 13:29, Oleksii Kurochko wrote:
>>> On 4/2/26 2:22 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> +int vaplic_map_device_irqs_to_domain(struct domain *d,
>>>>> + struct dt_device_node *dev,
>>>>> + bool need_mapping,
>>>>> + struct rangeset *irq_ranges)
>>>>> +{
>>>>> + unsigned int i, nirq;
>>>>> + int res, irq;
>>>>> + struct dt_raw_irq rirq;
>>>>> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
>>>>> + unsigned int reg_num;
>>>>> +
>>>>> + nirq = dt_number_of_irq(dev);
>>>>> +
>>>>> + /* Give permission and map IRQs */
>>>>> + for ( i = 0; i < nirq; i++ )
>>>>> + {
>>>>> + res = dt_device_get_raw_irq(dev, i, &rirq);
>>>>> + if ( res )
>>>>> + {
>>>>> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
>>>>> + i, dt_node_full_name(dev));
>>>>> + return res;
>>>>> + }
>>>>> +
>>>>> + /*
>>>>> + * Don't map IRQ that have no physical meaning
>>>>> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
>>>>> + */
>>>>> + if ( rirq.controller != dt_interrupt_controller )
>>>>> + {
>>>>> + dt_dprintk("irq %u not connected to primary controller."
>>>>> + "Connected to %s\n", i,
>>>>> + dt_node_full_name(rirq.controller));
>>>>> + continue;
>>>>> + }
>>>>> +
>>>>> + irq = platform_get_irq(dev, i);
>>>>> + if ( irq < 0 )
>>>>> + {
>>>>> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
>>>>> + return irq;
>>>>> + }
>>>>> +
>>>>> + res = irq_permit_access(d, irq);
>>>>> + if ( res )
>>>>> + {
>>>>> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
>>>>> + irq);
>>>>
>>>> This time the other way around: %d please with plain int. (Again at least
>>>> once further down.)
>>>>
>>>>> + return res;
>>>>> + }
>>>>> +
>>>>> + reg_num = irq / APLIC_NUM_REGS;
>>>>> +
>>>>> + if ( is_irq_shared_among_domains(d, irq) )
>>>>> + {
>>>>> + printk("%s: Shared IRQ isn't supported\n", __func__);
>>>>> + return -EINVAL;
>>>>> + }
>>>>> +
>>>>> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
>>>>
>>>> ... all of this leaves me with the impression that IRQ numbering isn't really
>>>> virtualized. IRQs are merely split into groups, one group per domain (and
>>>> maybe some unused). How are you going to fit in truly virtual IRQs?
>>>
>>> What do you mean by truly virtual IRQs?
>>
>> Ones where no aspects are represented by any piece of hardware.
>>
>>> I can't totally agree that the current approach isn't use virtual IRQs,
>>> yes, they are 1:1 mapped but on the other side Xen is responsible to
>>> give an IRQ number for guest's device and Xen is responsible that guest
>>> isn't trying to reach IRQ which not belongs to it.
>>
>> In a non-virtualized environment I expect IRQs are going to be "sparse"
>> (i.e. with perhaps large blocks of items used elsewhere). If you had
>> proper translation of IRQ numbers, the same could be true for your
>> guests.
>
> Partial FDT, which is used to tell which device be passthroughed to
> guest, is using physical IRQ number (which I am just considering for
> simplicity to be 1:1 mapped to virtual IRQ number). So if we have the
> following configuration:
> Physical (bare-metal) IRQ layout is sparse:
> IRQ 5 → UART -> domU0
> IRQ 23 → Ethernet -> domU1
> IRQ 47 → PCIe -> domU0
> IRQ 100 → Storage -> domU1
> (gaps everywhere, driven by hardware wiring)
>
> For such configuration we will have for each domain auth_irq_bmp[] which
> contains:
> IRQ 5 and IRQ47 for domU0
> and
> IRQ 23 and IRQ 100 for domU1
>
> And here vIRQ5 = pIRQ5, vIRQ47 = pIRQ47 and so on. auth_irq_bmp just
> transform xIRQ number to bit position which it will have in real APLIC
> register. Just as an example, lets take vIRQ5 and vIRQ47.
>
> As reading or writing register setie[k] reads or potentially modifies
> the enable bits for interrupt sources k × 32 through k × 32 + 31. For an
> implemented interrupt source i within that range, the enable bit for
> source i corresponds with register bit (i mod 32).
> So for:
> - vIRQ5 == pIRQ5 we have to set bit 5 in setie[0]
> - vIRQ47 == pIRQ47 we have to set bit 15 in setie[1]
>
> Probably it was not the best idea to declare auth_irq_bmp as it will
> look in h/w and maybe just 'bool auth_irq_bmp[1024]' would be more clearer.
>
> So irqs number are still stay "sparsed" in guest.
Well, twice (or more) as sparse in the example you give, compared to the
host.
>>>>> + dt_dprintk(" - IRQ: %u\n", irq);
>>>>> +
>>>>> + if ( irq_ranges )
>>>>> + {
>>>>> + res = rangeset_add_singleton(irq_ranges, irq);
>>>>> + if ( res )
>>>>> + return res;
>>>>> + }
>>>>
>>>> What is irq_ranges?
>>>
>>> IIUC based on Arm code irq_ranges is an optional output accumulator, the
>>> caller allocates and passes it in when it needs to track which IRQs were
>>> mapped (overlay use case), or passes NULL when that tracking is not needed.
>>>
>>> I added here as map_device_irqs_to_domain() is called from the common
>>> code and so maybe one day someone will decide to pass irq_ranges to this
>>> functions. At the moment, for RISC-V it is the only one user of
>>> map_device_irqs_to_domain() and it passes NULL.
>>
>> Simply assert then that it's NULL?
>
> Won't BUG_ON() be better here as it BUG_ON() macros is always defined
> and doesn't matter if release or debug build are used.
Depends on the context, really.
> Or maybe you meant:
> if ( irq_ranges )
> assert_failed("irq_ranges arg isn't supported\n");
Definitely not. assert_failed() shouldn't be called directly, as I had
told you on at least one earlier occasion.
Jan
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-04-20 13:45 ` Jan Beulich
@ 2026-04-20 14:34 ` Oleksii Kurochko
2026-04-20 15:21 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-20 14:34 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/20/26 3:45 PM, Jan Beulich wrote:
> On 20.04.2026 13:39, Oleksii Kurochko wrote:
>> On 4/16/26 2:51 PM, Jan Beulich wrote:
>>> On 14.04.2026 13:29, Oleksii Kurochko wrote:
>>>> On 4/2/26 2:22 PM, Jan Beulich wrote:
>>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>>> +int vaplic_map_device_irqs_to_domain(struct domain *d,
>>>>>> + struct dt_device_node *dev,
>>>>>> + bool need_mapping,
>>>>>> + struct rangeset *irq_ranges)
>>>>>> +{
>>>>>> + unsigned int i, nirq;
>>>>>> + int res, irq;
>>>>>> + struct dt_raw_irq rirq;
>>>>>> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
>>>>>> + unsigned int reg_num;
>>>>>> +
>>>>>> + nirq = dt_number_of_irq(dev);
>>>>>> +
>>>>>> + /* Give permission and map IRQs */
>>>>>> + for ( i = 0; i < nirq; i++ )
>>>>>> + {
>>>>>> + res = dt_device_get_raw_irq(dev, i, &rirq);
>>>>>> + if ( res )
>>>>>> + {
>>>>>> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
>>>>>> + i, dt_node_full_name(dev));
>>>>>> + return res;
>>>>>> + }
>>>>>> +
>>>>>> + /*
>>>>>> + * Don't map IRQ that have no physical meaning
>>>>>> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
>>>>>> + */
>>>>>> + if ( rirq.controller != dt_interrupt_controller )
>>>>>> + {
>>>>>> + dt_dprintk("irq %u not connected to primary controller."
>>>>>> + "Connected to %s\n", i,
>>>>>> + dt_node_full_name(rirq.controller));
>>>>>> + continue;
>>>>>> + }
>>>>>> +
>>>>>> + irq = platform_get_irq(dev, i);
>>>>>> + if ( irq < 0 )
>>>>>> + {
>>>>>> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
>>>>>> + return irq;
>>>>>> + }
>>>>>> +
>>>>>> + res = irq_permit_access(d, irq);
>>>>>> + if ( res )
>>>>>> + {
>>>>>> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
>>>>>> + irq);
>>>>> This time the other way around: %d please with plain int. (Again at least
>>>>> once further down.)
>>>>>
>>>>>> + return res;
>>>>>> + }
>>>>>> +
>>>>>> + reg_num = irq / APLIC_NUM_REGS;
>>>>>> +
>>>>>> + if ( is_irq_shared_among_domains(d, irq) )
>>>>>> + {
>>>>>> + printk("%s: Shared IRQ isn't supported\n", __func__);
>>>>>> + return -EINVAL;
>>>>>> + }
>>>>>> +
>>>>>> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
>>>>> ... all of this leaves me with the impression that IRQ numbering isn't really
>>>>> virtualized. IRQs are merely split into groups, one group per domain (and
>>>>> maybe some unused). How are you going to fit in truly virtual IRQs?
>>>> What do you mean by truly virtual IRQs?
>>> Ones where no aspects are represented by any piece of hardware.
>>>
>>>> I can't totally agree that the current approach isn't use virtual IRQs,
>>>> yes, they are 1:1 mapped but on the other side Xen is responsible to
>>>> give an IRQ number for guest's device and Xen is responsible that guest
>>>> isn't trying to reach IRQ which not belongs to it.
>>> In a non-virtualized environment I expect IRQs are going to be "sparse"
>>> (i.e. with perhaps large blocks of items used elsewhere). If you had
>>> proper translation of IRQ numbers, the same could be true for your
>>> guests.
>> Partial FDT, which is used to tell which device be passthroughed to
>> guest, is using physical IRQ number (which I am just considering for
>> simplicity to be 1:1 mapped to virtual IRQ number). So if we have the
>> following configuration:
>> Physical (bare-metal) IRQ layout is sparse:
>> IRQ 5 → UART -> domU0
>> IRQ 23 → Ethernet -> domU1
>> IRQ 47 → PCIe -> domU0
>> IRQ 100 → Storage -> domU1
>> (gaps everywhere, driven by hardware wiring)
>>
>> For such configuration we will have for each domain auth_irq_bmp[] which
>> contains:
>> IRQ 5 and IRQ47 for domU0
>> and
>> IRQ 23 and IRQ 100 for domU1
>>
>> And here vIRQ5 = pIRQ5, vIRQ47 = pIRQ47 and so on. auth_irq_bmp just
>> transform xIRQ number to bit position which it will have in real APLIC
>> register. Just as an example, lets take vIRQ5 and vIRQ47.
>>
>> As reading or writing register setie[k] reads or potentially modifies
>> the enable bits for interrupt sources k × 32 through k × 32 + 31. For an
>> implemented interrupt source i within that range, the enable bit for
>> source i corresponds with register bit (i mod 32).
>> So for:
>> - vIRQ5 == pIRQ5 we have to set bit 5 in setie[0]
>> - vIRQ47 == pIRQ47 we have to set bit 15 in setie[1]
>>
>> Probably it was not the best idea to declare auth_irq_bmp as it will
>> look in h/w and maybe just 'bool auth_irq_bmp[1024]' would be more clearer.
>>
>> So irqs number are still stay "sparsed" in guest.
> Well, twice (or more) as sparse in the example you give, compared to the
> host.
Just to be sure that I fully understand your concern here.
xIRQ5 and xIRQ47 are 42 bit positions apart (if we use 1 bit per IRQ), so
we end up with a 48-bit bitmap allocated somewhere in which only two bits
are set and all others are zero.
Would it be better to remap instead: pIRQ5 -> vIRQ1, pIRQ47 -> vIRQ2,
right?
If yes, won't we still need to store this mapping somewhere? It seems like
'unsigned int auth_irq_bmp[1024]' is a good option, where the index is the
vIRQ number and the 'unsigned int' value is the pIRQ number. But at the
moment I think that we could go with 1:1 IRQ number mapping, and then
'bool auth_irq_bmp[1024]' will be more than enough and will save some
memory.
~ Oleksii
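A rough sketch of the vIRQ-indexed table described above, only to make the
idea concrete. All names here (struct virq_map, PIRQ_INVALID, the helpers)
and the table size of 1024 are assumptions for illustration, not code from
the series.

```c
#include <stdint.h>

#define NR_VIRQS      1024u        /* assumed table size */
#define PIRQ_INVALID  UINT32_MAX   /* hypothetical "not mapped" marker */

struct virq_map {
    uint32_t pirq[NR_VIRQS];       /* index: vIRQ, value: pIRQ */
};

/* Mark every vIRQ as unmapped. */
static void virq_map_init(struct virq_map *m)
{
    for ( unsigned int i = 0; i < NR_VIRQS; i++ )
        m->pirq[i] = PIRQ_INVALID;
}

/* Bind a vIRQ to a pIRQ; fails if out of range or already bound. */
static int virq_map_set(struct virq_map *m, unsigned int virq, uint32_t pirq)
{
    if ( virq >= NR_VIRQS || m->pirq[virq] != PIRQ_INVALID )
        return -1;

    m->pirq[virq] = pirq;

    return 0;
}

/* Translate a vIRQ; returns PIRQ_INVALID when there is no mapping. */
static uint32_t virq_to_pirq(const struct virq_map *m, unsigned int virq)
{
    return virq < NR_VIRQS ? m->pirq[virq] : PIRQ_INVALID;
}
```

With 1:1 mapping the table degenerates to pirq[n] == n for permitted IRQs,
which is why a plain permission bitmap is enough in that case.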
* Re: [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO
2026-04-16 13:19 ` Jan Beulich
@ 2026-04-20 15:02 ` Oleksii Kurochko
2026-04-20 15:27 ` Jan Beulich
0 siblings, 1 reply; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-20 15:02 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/16/26 3:19 PM, Jan Beulich wrote:
> On 14.04.2026 18:04, Oleksii Kurochko wrote:
>> On 4/2/26 4:18 PM, Jan Beulich wrote:
>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>> +static int cf_check vaplic_emulate_store(const struct vcpu *vcpu,
>>>> + unsigned long addr, uint32_t value)
>>>> +{
>>>> + struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>>>> + struct aplic_priv *priv = vaplic->base.info->private;
>>>> + uint32_t offset = addr & APLIC_REG_OFFSET_MASK;
>>>
>>> See ./CODING_STYLE as to uses of fixed-width types.
>>>
>>>> + unsigned long aplic_addr = addr - priv->paddr_start;
>>>> + const uint32_t *auth_irq_bmp = vcpu->domain->arch.vintc->private;
>>>> +
>>>> + switch ( offset )
>>>> + {
>>>> + case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
>>>
>>> And (taking this just as example) any misaligned accesses falling in this range
>>> are fine?
>>
>> Do you mean something like 0x1C02 instead of 0x1C00 or 0x1C04?
>
> Yes.
>
>>>> + /*
>>>> + * As sourcecfg register starts from 1:
>>>> + * 0x0000 domaincfg
>>>> + * 0x0004 sourcecfg[1]
>>>> + * 0x0008 sourcecfg[2]
>>>> + * ...
>>>> + * 0x0FFC sourcecfg[1023]
>>>> + * It is necessary to calculate an interrupt number by substracting
>>>
>>> Nit: subtracting
>>>
>>>> + * of APLIC_DOMAINCFG instead of APLIC_SOURCECFG_BASE.
>>>> + */
>>>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_DOMAINCFG)) )
>>>> + /* interrupt not enabled, ignore it */
>>>
>>> Throughout the series: Please adhere to ./CODING_STYLE.
>>>
>>>> + return 0;
>>>> +
>>>> + break;
>>>
>>> And any value is okay to write?
>>
>> No, it should be in a range
>> [APLIC_SOURCECFG_SM_INACTIVE,APLIC_SOURCECFG_SM_LEVEL_LOW].
>>
>> I will add the check before break:
>> if ( value > APLIC_SOURCECFG_SM_LEVEL_LOW )
>> {
>> gdprintk(XENLOG_WARNING,
>> "value(%u) is incorrect for sourcecfg register\n",
>> value);
>> value = APLIC_SOURCECFG_SM_INACTIVE;
>> }
>
> And why would writing APLIC_SOURCECFG_SM_INACTIVE be any better, when
> that's not what the guest wanted? Simply ignore such writes, unless the
> spec mandates specific behavior for out-of-range values?
The spec doesn't mandate specific behavior for out-of-range values, but I
thought it would be better to make the IRQ inactive instead of just ignoring
the write, so that a potential occurrence of this interrupt has no effect.
>
>>>> + case APLIC_TARGET_BASE ... APLIC_TARGET_LAST:
>>>> + struct vcpu *target_vcpu = NULL;
>>>> +
>>>> + /*
>>>> + * Look at vaplic_emulate_load() for explanation why
>>>> + * APLIC_GENMSI is substracted.
>>>> + */
>>>
>>> There's no vaplic_emulate_load() - how can I go look there?
>>
>> It is introduced in the next patch.
>
> As before - it should be possible to review patch series strictly
> sequentially. Further, what if this patch gets committed, and the other
> gets delayed by several months?
Got it, I will re-order the patches.
>
>>>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_GENMSI)) )
>>>> + /* interrupt not enabled, ignore it */
>>>> + return 0;
>>>> +
>>>> + for ( int i = 0; i < vcpu->domain->max_vcpus; i++ )
>>>
>>> unsigned int
>>>
>>>> + {
>>>> + struct vcpu *v = vcpu->domain->vcpu[i];
>>>> +
>>>> + if ( v->vcpu_id == (value >> APLIC_TARGET_HART_IDX_SHIFT) )
>>>> + {
>>>> + target_vcpu = v;
>>>> + break;
>>>> + }
>>>> + }
>>>> +
>>>> + ASSERT(target_vcpu);
>>>
>>> What guarantees the pointer to be non-NULL? The incoming value can be
>>> arbitrary, afaict.
>>
>> I didn't understand your point. It is just checking that target_vcpu has
>> been found. If after for() loop the value of target_vcpu is still NULL
>> then something wrong in Xen.
>
> If that's true, then the assertion is fine to have. I can't help the
> impression though that a guest could pick a value such that you can't
> possibly find the target vCPU. Asserting on guest controlled input is
> not okay, as was said several times before.
I will then call domain_crash() if target_vcpu is NULL, as the value is
incorrect in that case. I missed that a guest could write a wrong value.
>
>>>> + if ( !(vaplic->regs.domaincfg & APLIC_DOMAINCFG_DM) )
>>>> + {
>>>> + vaplic_dm_update_target(cpuid_to_hartid(target_vcpu->processor),
>>>> + &value);
>>>> + }
>>>> + else
>>>> + vaplic_update_target(priv->imsic_cfg,
>>>> + vcpu_guest_file_id(target_vcpu),
>>>> + cpuid_to_hartid(target_vcpu->processor),
>>>> + &value);
>>>
>>> I'm struggling with the naming here: When DM is clear, a function with "dm"
>>> in the name is called.
>>
>> it means direct (delivery) mode. Maybe it is better to put dm at the end
>> of the function name? Or it is just better to change it to something else?
>
> Without a better understanding of what is wanted, all I can say is that
> calling something with "dm" in its name when the condition says it's not
> "dm" is confusing.
Basically it works as follows. If domaincfg.DM (DM is "delivery mode"
according to the spec) is 0, the APLIC works in direct delivery mode; if the
DM bit is 1, MSI delivery mode is used.
So just for clarity I will rename:
- vaplic_dm_update_target -> vaplic_ddm_update_target
- vaplic_update_target -> vaplic_mdm_update_target
Or maybe s/ddm/direct/ and s/mdm/msi/ would be even better in the
function names.
>
>>>> + default:
>>>> + panic("%s: unsupported register offset: %#x\n", __func__, offset);
>>>
>>> Crashing the host for the guest doing something odd? It's odd that the function
>>> only ever returns 0 anyway - it could simply return an error here (if the
>>> intention is to not ignore such writes).
>>
>> But maybe it is a legal offset and we really want to support it?
>
> Still not a reason to crash the entire host?
Agreed, a domain crash will be more than enough.
Thanks.
~ Oleksii
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-04-20 14:34 ` Oleksii Kurochko
@ 2026-04-20 15:21 ` Jan Beulich
2026-04-20 15:31 ` Oleksii Kurochko
0 siblings, 1 reply; 123+ messages in thread
From: Jan Beulich @ 2026-04-20 15:21 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 20.04.2026 16:34, Oleksii Kurochko wrote:
>
>
> On 4/20/26 3:45 PM, Jan Beulich wrote:
>> On 20.04.2026 13:39, Oleksii Kurochko wrote:
>>> On 4/16/26 2:51 PM, Jan Beulich wrote:
>>>> On 14.04.2026 13:29, Oleksii Kurochko wrote:
>>>>> On 4/2/26 2:22 PM, Jan Beulich wrote:
>>>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>>>> +int vaplic_map_device_irqs_to_domain(struct domain *d,
>>>>>>> + struct dt_device_node *dev,
>>>>>>> + bool need_mapping,
>>>>>>> + struct rangeset *irq_ranges)
>>>>>>> +{
>>>>>>> + unsigned int i, nirq;
>>>>>>> + int res, irq;
>>>>>>> + struct dt_raw_irq rirq;
>>>>>>> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
>>>>>>> + unsigned int reg_num;
>>>>>>> +
>>>>>>> + nirq = dt_number_of_irq(dev);
>>>>>>> +
>>>>>>> + /* Give permission and map IRQs */
>>>>>>> + for ( i = 0; i < nirq; i++ )
>>>>>>> + {
>>>>>>> + res = dt_device_get_raw_irq(dev, i, &rirq);
>>>>>>> + if ( res )
>>>>>>> + {
>>>>>>> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
>>>>>>> + i, dt_node_full_name(dev));
>>>>>>> + return res;
>>>>>>> + }
>>>>>>> +
>>>>>>> + /*
>>>>>>> + * Don't map IRQ that have no physical meaning
>>>>>>> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
>>>>>>> + */
>>>>>>> + if ( rirq.controller != dt_interrupt_controller )
>>>>>>> + {
>>>>>>> + dt_dprintk("irq %u not connected to primary controller."
>>>>>>> + "Connected to %s\n", i,
>>>>>>> + dt_node_full_name(rirq.controller));
>>>>>>> + continue;
>>>>>>> + }
>>>>>>> +
>>>>>>> + irq = platform_get_irq(dev, i);
>>>>>>> + if ( irq < 0 )
>>>>>>> + {
>>>>>>> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
>>>>>>> + return irq;
>>>>>>> + }
>>>>>>> +
>>>>>>> + res = irq_permit_access(d, irq);
>>>>>>> + if ( res )
>>>>>>> + {
>>>>>>> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
>>>>>>> + irq);
>>>>>> This time the other way around: %d please with plain int. (Again at least
>>>>>> once further down.)
>>>>>>
>>>>>>> + return res;
>>>>>>> + }
>>>>>>> +
>>>>>>> + reg_num = irq / APLIC_NUM_REGS;
>>>>>>> +
>>>>>>> + if ( is_irq_shared_among_domains(d, irq) )
>>>>>>> + {
>>>>>>> + printk("%s: Shared IRQ isn't supported\n", __func__);
>>>>>>> + return -EINVAL;
>>>>>>> + }
>>>>>>> +
>>>>>>> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
>>>>>> ... all of this leaves me with the impression that IRQ numbering isn't really
>>>>>> virtualized. IRQs are merely split into groups, one group per domain (and
>>>>>> maybe some unused). How are you going to fit in truly virtual IRQs?
>>>>> What do you mean by truly virtual IRQs?
>>>> Ones where no aspects are represented by any piece of hardware.
>>>>
>>>>> I can't totally agree that the current approach isn't use virtual IRQs,
>>>>> yes, they are 1:1 mapped but on the other side Xen is responsible to
>>>>> give an IRQ number for guest's device and Xen is responsible that guest
>>>>> isn't trying to reach IRQ which not belongs to it.
>>>> In a non-virtualized environment I expect IRQs are going to be "sparse"
>>>> (i.e. with perhaps large blocks of items used elsewhere). If you had
>>>> proper translation of IRQ numbers, the same could be true for your
>>>> guests.
>>> Partial FDT, which is used to tell which device be passthroughed to
>>> guest, is using physical IRQ number (which I am just considering for
>>> simplicity to be 1:1 mapped to virtual IRQ number). So if we have the
>>> following configuration:
>>> Physical (bare-metal) IRQ layout is sparse:
>>> IRQ 5 → UART -> domU0
>>> IRQ 23 → Ethernet -> domU1
>>> IRQ 47 → PCIe -> domU0
>>> IRQ 100 → Storage -> domU1
>>> (gaps everywhere, driven by hardware wiring)
>>>
>>> For such configuration we will have for each domain auth_irq_bmp[] which
>>> contains:
>>> IRQ 5 and IRQ47 for domU0
>>> and
>>> IRQ 23 and IRQ 100 for domU1
>>>
>>> And here vIRQ5 = pIRQ5, vIRQ47 = pIRQ47 and so on. auth_irq_bmp just
>>> transform xIRQ number to bit position which it will have in real APLIC
>>> register. Just as an example, lets take vIRQ5 and vIRQ47.
>>>
>>> As reading or writing register setie[k] reads or potentially modifies
>>> the enable bits for interrupt sources k × 32 through k × 32 + 31. For an
>>> implemented interrupt source i within that range, the enable bit for
>>> source i corresponds with register bit (i mod 32).
>>> So for:
>>> - vIRQ5 == pIRQ5 we have to set bit 5 in setie[0]
>>> - vIRQ47 == pIRQ47 we have to set bit 15 in setie[1]
>>>
>>> Probably it was not the best idea to declare auth_irq_bmp as it will
>>> look in h/w and maybe just 'bool auth_irq_bmp[1024]' would be more clearer.
>>>
>>> So irqs number are still stay "sparsed" in guest.
>> Well, twice (or more) as sparse in the example you give, compared to the
>> host.
>
> Just to be sure that I fully understand your concern here.
>
> The difference between xIRQ5 and xIRQ47 is 42 bits (if for 1 irq we are
> using 1 bit) which leads to that we have somewhere allocated 48 bit
> bitmap where only two bits will be set, all others will be zero.
Why 48-bit bitmap? As you inherit the property from the host, it'll be e.g.
1024 bits. Compared to the host, each guest will have yet fewer bits set in
there.
> Instead it would be better to have to do mapping: pIRQ5 -> vIRQ1, pIRQ47
> ->vIRQ2, right?
Which specific mapping I don't care very much. There may also be conventions
to adhere to (on x86 for example there are).
> If yes, won't we still store somewhere this mapping? it seems like
> having 'unsigned int auth_irq_bmp[1024]' is a good option where index
> will be vIRQ number and 'unsigned int' will be pIRQ number. But at the
> moment I think that we could go with 1:1 IRQ number mapping and then
> have 'bool auth_irq_bmp[1024]' will be more then enough and will safe
> some memory.
Well, if using 1:1 mapping was clearly identified as "for the time being",
then that might be acceptable (for the time being).
As to you (again) suggesting "bool auth_irq_bmp[1024]" - why would you use
an array of bool-s when a bitmap can do the same in 1/8th of the space?
Jan
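To put numbers on the bitmap vs. array-of-bool comparison, here is a small
sketch. NR_IRQS (1024), struct auth_irq_bitmap and the two helpers are
hypothetical names chosen for illustration; the word/bit split is the same
irq / 32 and irq % 32 arithmetic used for setie[] earlier in the thread.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IRQS        1024u                          /* assumed upper bound */
#define BITS_PER_WORD  (sizeof(uint32_t) * CHAR_BIT)  /* 32 on usual targets */
#define BITMAP_WORDS   (NR_IRQS / BITS_PER_WORD)

struct auth_irq_bitmap {
    uint32_t word[BITMAP_WORDS];   /* one bit per IRQ */
};

/* Mark an IRQ as permitted; out-of-range numbers are ignored. */
static void auth_irq_set(struct auth_irq_bitmap *b, unsigned int irq)
{
    if ( irq >= NR_IRQS )
        return;

    b->word[irq / BITS_PER_WORD] |= UINT32_C(1) << (irq % BITS_PER_WORD);
}

/* Check whether an IRQ is permitted. */
static bool auth_irq_test(const struct auth_irq_bitmap *b, unsigned int irq)
{
    if ( irq >= NR_IRQS )
        return false;

    return (b->word[irq / BITS_PER_WORD] >> (irq % BITS_PER_WORD)) & 1;
}
```

On a target with 8-bit bytes the structure takes 128 bytes, against 1024
bytes for 'bool auth_irq_bmp[1024]'. IRQ 5 lands in bit 5 of word 0 and
IRQ 47 in bit 15 of word 1, matching the setie[] example.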
* Re: [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO
2026-04-20 15:02 ` Oleksii Kurochko
@ 2026-04-20 15:27 ` Jan Beulich
0 siblings, 0 replies; 123+ messages in thread
From: Jan Beulich @ 2026-04-20 15:27 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 20.04.2026 17:02, Oleksii Kurochko wrote:
> On 4/16/26 3:19 PM, Jan Beulich wrote:
>> On 14.04.2026 18:04, Oleksii Kurochko wrote:
>>> On 4/2/26 4:18 PM, Jan Beulich wrote:
>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>> +static int cf_check vaplic_emulate_store(const struct vcpu *vcpu,
>>>>> + unsigned long addr, uint32_t value)
>>>>> +{
>>>>> + struct vaplic *vaplic = to_vaplic(vcpu->domain->arch.vintc);
>>>>> + struct aplic_priv *priv = vaplic->base.info->private;
>>>>> + uint32_t offset = addr & APLIC_REG_OFFSET_MASK;
>>>>
>>>> See ./CODING_STYLE as to uses of fixed-width types.
>>>>
>>>>> + unsigned long aplic_addr = addr - priv->paddr_start;
>>>>> + const uint32_t *auth_irq_bmp = vcpu->domain->arch.vintc->private;
>>>>> +
>>>>> + switch ( offset )
>>>>> + {
>>>>> + case APLIC_SETIP_BASE ... APLIC_SETIP_LAST:
>>>>
>>>> And (taking this just as example) any misaligned accesses falling in this range
>>>> are fine?
>>>
>>> Do you mean something like 0x1C02 instead of 0x1C00 or 0x1C04?
>>
>> Yes.
>>
>>>>> + /*
>>>>> + * As sourcecfg register starts from 1:
>>>>> + * 0x0000 domaincfg
>>>>> + * 0x0004 sourcecfg[1]
>>>>> + * 0x0008 sourcecfg[2]
>>>>> + * ...
>>>>> + * 0x0FFC sourcecfg[1023]
>>>>> + * It is necessary to calculate an interrupt number by substracting
>>>>
>>>> Nit: subtracting
>>>>
>>>>> + * of APLIC_DOMAINCFG instead of APLIC_SOURCECFG_BASE.
>>>>> + */
>>>>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_DOMAINCFG)) )
>>>>> + /* interrupt not enabled, ignore it */
>>>>
>>>> Throughout the series: Please adhere to ./CODING_STYLE.
>>>>
>>>>> + return 0;
>>>>> +
>>>>> + break;
>>>>
>>>> And any value is okay to write?
>>>
>>> No, it should be in a range
>>> [APLIC_SOURCECFG_SM_INACTIVE,APLIC_SOURCECFG_SM_LEVEL_LOW].
>>>
>>> I will add the check before break:
>>> if ( value > APLIC_SOURCECFG_SM_LEVEL_LOW )
>>> {
>>> gdprintk(XENLOG_WARNING,
>>> "value(%u) is incorrect for sourcecfg register\n",
>>> value);
>>> value = APLIC_SOURCECFG_SM_INACTIVE;
>>> }
>>
>> And why would writing APLIC_SOURCECFG_SM_INACTIVE be any better, when
>> that's not what the guest wanted? Simply ignore such writes, unless the
>> spec mandates specific behavior for out-of-range values?
>
> The spec doesn't mandate specific behavior for out-of-range values but I
> thought it would be better to make irq inactive instead of just ignoring
> so it won't affect somehow potential occurrence of this interrupt.
I'd find it concerning if we did anything not mandated by the spec. Just
doing nothing is imo still the best course of action.
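A minimal sketch of the "do nothing" option, assuming the range check
proposed earlier in the thread. vaplic_write_sourcecfg() is a hypothetical
helper, and the numeric value given for APLIC_SOURCECFG_SM_LEVEL_LOW is an
assumption about the header.

```c
#include <stdbool.h>
#include <stdint.h>

#define APLIC_SOURCECFG_SM_LEVEL_LOW  7u  /* assumed encoding */

/*
 * Returns true if the register was updated, false if the write was dropped.
 * An out-of-range value leaves the previous register content untouched
 * rather than forcing the source to "inactive".
 */
static bool vaplic_write_sourcecfg(uint32_t *reg, uint32_t value)
{
    if ( value > APLIC_SOURCECFG_SM_LEVEL_LOW )
        return false;

    *reg = value;

    return true;
}
```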
>>>>> + if ( !AUTH_IRQ_BIT(regval_to_irqn(offset - APLIC_GENMSI)) )
>>>>> + /* interrupt not enabled, ignore it */
>>>>> + return 0;
>>>>> +
>>>>> + for ( int i = 0; i < vcpu->domain->max_vcpus; i++ )
>>>>
>>>> unsigned int
>>>>
>>>>> + {
>>>>> + struct vcpu *v = vcpu->domain->vcpu[i];
>>>>> +
>>>>> + if ( v->vcpu_id == (value >> APLIC_TARGET_HART_IDX_SHIFT) )
>>>>> + {
>>>>> + target_vcpu = v;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> +
>>>>> + ASSERT(target_vcpu);
>>>>
>>>> What guarantees the pointer to be non-NULL? The incoming value can be
>>>> arbitrary, afaict.
>>>
>>> I didn't understand your point. It is just checking that target_vcpu has
>>> been found. If after for() loop the value of target_vcpu is still NULL
>>> then something wrong in Xen.
>>
>> If that's true, then the assertion is fine to have. I can't help the
>> impression though that a guest could pick a value such that you can't
>> possibly find the target vCPU. Asserting on guest controlled input is
>> not okay, as was said several times before.
>
> I will then do domain_crash() that as a value is incorrect in case if
> target_vcpu is NULL, I missed that guest could put wrong value.
Would domain_crash() come close to anything real hardware would do? I
doubt it. Doing nothing (besides perhaps logging a debug message) may
once again be the best we can do.
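For illustration, a sketch of a lookup that tolerates an arbitrary guest
value. struct vcpu_stub, find_target_vcpu() and vaplic_write_target() are
hypothetical stand-ins, and the shift value of 18 is an assumption about
APLIC_TARGET_HART_IDX_SHIFT. A miss results in the write being dropped
instead of an ASSERT() or a domain crash.

```c
#include <stddef.h>
#include <stdint.h>

#define APLIC_TARGET_HART_IDX_SHIFT  18   /* assumed field position */

struct vcpu_stub {                        /* stand-in for struct vcpu */
    unsigned int vcpu_id;
};

/* Returns the vCPU named by the hart index in value, or NULL if none. */
static const struct vcpu_stub *
find_target_vcpu(const struct vcpu_stub *vcpus, unsigned int nr,
                 uint32_t value)
{
    unsigned int hart_idx = value >> APLIC_TARGET_HART_IDX_SHIFT;

    for ( unsigned int i = 0; i < nr; i++ )
        if ( vcpus[i].vcpu_id == hart_idx )
            return &vcpus[i];

    return NULL;   /* guest picked a hart index with no matching vCPU */
}

/* Returns 1 if the register was updated, 0 if the write was ignored. */
static int vaplic_write_target(const struct vcpu_stub *vcpus, unsigned int nr,
                               uint32_t *reg, uint32_t value)
{
    if ( !find_target_vcpu(vcpus, nr, value) )
        return 0;  /* a debug message could be logged here */

    *reg = value;

    return 1;
}
```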
>>>>> + if ( !(vaplic->regs.domaincfg & APLIC_DOMAINCFG_DM) )
>>>>> + {
>>>>> + vaplic_dm_update_target(cpuid_to_hartid(target_vcpu->processor),
>>>>> + &value);
>>>>> + }
>>>>> + else
>>>>> + vaplic_update_target(priv->imsic_cfg,
>>>>> + vcpu_guest_file_id(target_vcpu),
>>>>> + cpuid_to_hartid(target_vcpu->processor),
>>>>> + &value);
>>>>
>>>> I'm struggling with the naming here: When DM is clear, a function with "dm"
>>>> in the name is called.
>>>
>>> it means direct (delivery) mode. Maybe it is better to put dm at the end
>>> of the function name? Or it is just better to change it to something else?
>>
>> Without a better understanding of what is wanted, all I can say is that
>> calling something with "dm" in its name when the condition says it's not
>> "dm" is confusing.
>
> Basically it should be the following. If domaincfg.DM (here dm is
> delivery mode according to spec) is 0 then it means that APLIC works in
> direct delivery mode, if DM bit is 1 then MSI delivery mode is used.
>
> So just for clarity I will rename:
> - vaplic_dm_update_target -> vaplic_ddm_update_target
> - vaplic_update_target -> vaplic_mdm_update_target
>
> Or maybe just s/ddm/direct and s/mdm/msi will be just better in the
> function names.
Afaic - yes to these latter two names.
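As a sketch of how the naming lines up with the DM bit: the enum and
vaplic_delivery_mode() below are hypothetical, and the bit position used
for APLIC_DOMAINCFG_DM is an assumption. DM clear selects direct delivery
(the "direct" helper), DM set selects MSI delivery (the "msi" helper).

```c
#include <stdint.h>

#define APLIC_DOMAINCFG_DM  (UINT32_C(1) << 2)   /* assumed bit position */

enum delivery_mode {
    DELIVERY_DIRECT,   /* DM == 0: would call vaplic_direct_update_target() */
    DELIVERY_MSI,      /* DM == 1: would call vaplic_msi_update_target() */
};

/* Decode the delivery mode from a domaincfg register value. */
static enum delivery_mode vaplic_delivery_mode(uint32_t domaincfg)
{
    return (domaincfg & APLIC_DOMAINCFG_DM) ? DELIVERY_MSI : DELIVERY_DIRECT;
}
```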
>>>>> + default:
>>>>> + panic("%s: unsupported register offset: %#x\n", __func__, offset);
>>>>
>>>> Crashing the host for the guest doing something odd? It's odd that the function
>>>> only ever returns 0 anyway - it could simply return an error here (if the
>>>> intention is to not ignore such writes).
>>>
>>> But maybe it is a legal offset and we really want to support it?
>>
>> Still not a reason to crash the entire host?
>
> Agree, domain crash will be more then enough.
See above as to even that possibly being too harsh.
Jan
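A sketch of a store handler whose default case neither panics nor crashes
the domain. vaplic_store_sketch() and the single decoded register are
hypothetical; whether the default path should return an error or silently
ignore the write (return 0) is the open question in this thread.

```c
#include <errno.h>
#include <stdint.h>

#define APLIC_DOMAINCFG  0x0000u   /* assumed register offset */

/* Returns 0 on success, -EINVAL for an offset that is not handled. */
static int vaplic_store_sketch(uint32_t offset, uint32_t value,
                               uint32_t *domaincfg)
{
    switch ( offset )
    {
    case APLIC_DOMAINCFG:
        *domaincfg = value;
        return 0;

    default:
        /* Unknown offset: report it to the caller, leave all state as is. */
        return -EINVAL;
    }
}
```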
* Re: [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough
2026-04-20 15:21 ` Jan Beulich
@ 2026-04-20 15:31 ` Oleksii Kurochko
0 siblings, 0 replies; 123+ messages in thread
From: Oleksii Kurochko @ 2026-04-20 15:31 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/20/26 5:21 PM, Jan Beulich wrote:
> On 20.04.2026 16:34, Oleksii Kurochko wrote:
>>
>>
>> On 4/20/26 3:45 PM, Jan Beulich wrote:
>>> On 20.04.2026 13:39, Oleksii Kurochko wrote:
>>>> On 4/16/26 2:51 PM, Jan Beulich wrote:
>>>>> On 14.04.2026 13:29, Oleksii Kurochko wrote:
>>>>>> On 4/2/26 2:22 PM, Jan Beulich wrote:
>>>>>>> On 10.03.2026 18:08, Oleksii Kurochko wrote:
>>>>>>>> +int vaplic_map_device_irqs_to_domain(struct domain *d,
>>>>>>>> + struct dt_device_node *dev,
>>>>>>>> + bool need_mapping,
>>>>>>>> + struct rangeset *irq_ranges)
>>>>>>>> +{
>>>>>>>> + unsigned int i, nirq;
>>>>>>>> + int res, irq;
>>>>>>>> + struct dt_raw_irq rirq;
>>>>>>>> + uint32_t *auth_irq_bmp = d->arch.vintc->private;
>>>>>>>> + unsigned int reg_num;
>>>>>>>> +
>>>>>>>> + nirq = dt_number_of_irq(dev);
>>>>>>>> +
>>>>>>>> + /* Give permission and map IRQs */
>>>>>>>> + for ( i = 0; i < nirq; i++ )
>>>>>>>> + {
>>>>>>>> + res = dt_device_get_raw_irq(dev, i, &rirq);
>>>>>>>> + if ( res )
>>>>>>>> + {
>>>>>>>> + printk(XENLOG_ERR "Unable to retrieve irq %u for %s\n",
>>>>>>>> + i, dt_node_full_name(dev));
>>>>>>>> + return res;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + /*
>>>>>>>> + * Don't map IRQ that have no physical meaning
>>>>>>>> + * ie: IRQ whose controller is not APLIC/IMSIC/PLIC.
>>>>>>>> + */
>>>>>>>> + if ( rirq.controller != dt_interrupt_controller )
>>>>>>>> + {
>>>>>>>> + dt_dprintk("irq %u not connected to primary controller. "
>>>>>>>> + "Connected to %s\n", i,
>>>>>>>> + dt_node_full_name(rirq.controller));
>>>>>>>> + continue;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + irq = platform_get_irq(dev, i);
>>>>>>>> + if ( irq < 0 )
>>>>>>>> + {
>>>>>>>> + printk("Unable to get irq %u for %s\n", i, dt_node_full_name(dev));
>>>>>>>> + return irq;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + res = irq_permit_access(d, irq);
>>>>>>>> + if ( res )
>>>>>>>> + {
>>>>>>>> + printk(XENLOG_ERR "Unable to permit to %pd access to IRQ %u\n", d,
>>>>>>>> + irq);
>>>>>>> This time the other way around: %d please with plain int. (Again at least
>>>>>>> once further down.)
>>>>>>>
>>>>>>>> + return res;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + reg_num = irq / APLIC_NUM_REGS;
>>>>>>>> +
>>>>>>>> + if ( is_irq_shared_among_domains(d, irq) )
>>>>>>>> + {
>>>>>>>> + printk("%s: Shared IRQ isn't supported\n", __func__);
>>>>>>>> + return -EINVAL;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + auth_irq_bmp[reg_num] |= BIT(irq % APLIC_NUM_REGS, U);
>>>>>>> ... all of this leaves me with the impression that IRQ numbering isn't really
>>>>>>> virtualized. IRQs are merely split into groups, one group per domain (and
>>>>>>> maybe some unused). How are you going to fit in truly virtual IRQs?
>>>>>> What do you mean by truly virtual IRQs?
>>>>> Ones where no aspects are represented by any piece of hardware.
>>>>>
>>>>>> I can't totally agree that the current approach doesn't use virtual IRQs.
>>>>>> Yes, they are 1:1 mapped, but on the other hand Xen is responsible for
>>>>>> assigning an IRQ number to a guest's device, and Xen is responsible for
>>>>>> ensuring a guest doesn't try to reach an IRQ which doesn't belong to it.
>>>>> In a non-virtualized environment I expect IRQs are going to be "sparse"
>>>>> (i.e. with perhaps large blocks of items used elsewhere). If you had
>>>>> proper translation of IRQ numbers, the same could be true for your
>>>>> guests.
>>>> The partial FDT, which is used to tell which devices are passed through to
>>>> a guest, uses physical IRQ numbers (which, for simplicity, I am treating
>>>> as 1:1 mapped to virtual IRQ numbers). So if we have the following
>>>> configuration:
>>>> Physical (bare-metal) IRQ layout is sparse:
>>>> IRQ 5 → UART -> domU0
>>>> IRQ 23 → Ethernet -> domU1
>>>> IRQ 47 → PCIe -> domU0
>>>> IRQ 100 → Storage -> domU1
>>>> (gaps everywhere, driven by hardware wiring)
>>>>
>>>> For such configuration we will have for each domain auth_irq_bmp[] which
>>>> contains:
>>>> IRQ 5 and IRQ47 for domU0
>>>> and
>>>> IRQ 23 and IRQ 100 for domU1
>>>>
>>>> And here vIRQ5 = pIRQ5, vIRQ47 = pIRQ47 and so on. auth_irq_bmp just
>>>> transforms an xIRQ number into the bit position it has in the real APLIC
>>>> register. As an example, let's take vIRQ5 and vIRQ47.
>>>>
>>>> Reading or writing register setie[k] reads or potentially modifies
>>>> the enable bits for interrupt sources k × 32 through k × 32 + 31. For an
>>>> implemented interrupt source i within that range, the enable bit for
>>>> source i corresponds with register bit (i mod 32).
>>>> So for:
>>>> - vIRQ5 == pIRQ5 we have to set bit 5 in setie[0]
>>>> - vIRQ47 == pIRQ47 we have to set bit 15 in setie[1]
>>>>
>>>> Probably it was not the best idea to declare auth_irq_bmp the way it
>>>> looks in h/w, and maybe just 'bool auth_irq_bmp[1024]' would be clearer.
>>>>
>>>> So IRQ numbers still stay "sparse" in the guest.
>>> Well, twice (or more) as sparse in the example you give, compared to the
>>> host.
>>
>> Just to be sure that I fully understand your concern here.
>>
>> The distance between xIRQ5 and xIRQ47 is 42 bits (if we use 1 bit per
>> IRQ), which means that somewhere we have a 48-bit bitmap allocated in
>> which only two bits are set and all others are zero.
>
> Why 48-bit bitmap? As you inherit the property from the host, it'll be e.g.
> 1024 bits. Compared to the host, each guest will have yet fewer bits set in
> there.
A 48-bit bitmap specifically in this example, because IRQ47 is the highest
IRQ mentioned here: if we don't compress the zeros for IRQ0-4 and IRQ6-46,
48 bits are needed to cover IRQ0 through IRQ47. I was thinking along those
lines because it seems I misunderstood what you meant by "sparse in the
guest" and how I am supposed to deal with that.
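
To make the setie[k] arithmetic from the example above concrete, here is a
minimal sketch in plain C. It is not the code from the patch: NR_SOURCES,
struct auth_bitmap, auth_irq() and irq_is_authorized() are hypothetical names
used only for illustration, the 1024-source limit is an assumption taken from
the 1024-bit figure mentioned in this thread, and it covers only the 1:1
(vIRQ == pIRQ) case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_SOURCES   1024u  /* assumed maximum number of interrupt sources */
#define BITS_PER_REG 32u    /* each setie[k] register covers 32 sources */

/* Hypothetical per-domain authorization bitmap, laid out like setie[]. */
struct auth_bitmap {
    uint32_t reg[NR_SOURCES / BITS_PER_REG];
};

/* Index k of the setie[k]-style register that holds this IRQ. */
static inline unsigned int irq_to_reg(unsigned int irq)
{
    return irq / BITS_PER_REG;
}

/* Bit position (i mod 32) of this IRQ inside its register. */
static inline unsigned int irq_to_bit(unsigned int irq)
{
    return irq % BITS_PER_REG;
}

/* Mark an IRQ as belonging to the domain that owns the bitmap. */
static inline void auth_irq(struct auth_bitmap *b, unsigned int irq)
{
    assert(irq < NR_SOURCES);
    b->reg[irq_to_reg(irq)] |= UINT32_C(1) << irq_to_bit(irq);
}

/* Check whether the domain may touch this IRQ; out-of-range is refused. */
static inline bool irq_is_authorized(const struct auth_bitmap *b,
                                     unsigned int irq)
{
    return (irq < NR_SOURCES) &&
           (((b->reg[irq_to_reg(irq)] >> irq_to_bit(irq)) & 1u) != 0);
}
```

With IRQ 5 and IRQ 47 authorized for a domain, reg[0] ends up with bit 5 set
and reg[1] with bit 15 set, which matches the setie[0] / setie[1] positions
discussed above.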
>
>> Instead it would be better to do a mapping: pIRQ5 -> vIRQ1, pIRQ47 ->
>> vIRQ2, right?
>
> Which specific mapping I don't care very much. There may also be conventions
> to adhere to (on x86 for example there are).
If just a bitmap is okay here ... then I am okay with using it.
>
>> If yes, won't we still need to store this mapping somewhere? It seems like
>> 'unsigned int auth_irq_bmp[1024]' is a good option, where the index is
>> the vIRQ number and the 'unsigned int' value is the pIRQ number. But at
>> the moment I think we could go with a 1:1 IRQ number mapping, and then
>> 'bool auth_irq_bmp[1024]' will be more than enough and will save
>> some memory.
>
> Well, if using 1:1 mapping was clearly identified as "for the time being",
> then that might be acceptable (for the time being).
>
> As to you (again) suggesting "bool auth_irq_bmp[1024]" - why would you use
> an array of bool-s when a bitmap can do the same in 1/8th of the space?
... just a bitmap will be better.
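
For reference, the space difference between the two representations can be
sketched as follows. This is a standalone illustration, not Xen code: the
type names and the demo_set_irq() / demo_test_irq() helpers are hypothetical,
and NR_IRQS = 1024 is assumed from the figure used in this thread.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS       1024u  /* assumed number of IRQs to track */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS  ((NR_IRQS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bool per IRQ: NR_IRQS * sizeof(bool) bytes. */
typedef bool irq_bool_array_t[NR_IRQS];

/* One bit per IRQ: NR_IRQS / CHAR_BIT bytes. */
typedef unsigned long irq_bitmap_t[BITMAP_WORDS];

/* Set the bit for an IRQ in the bitmap. */
static inline void demo_set_irq(unsigned long *bmp, unsigned int irq)
{
    assert(irq < NR_IRQS);
    bmp[irq / BITS_PER_WORD] |= 1UL << (irq % BITS_PER_WORD);
}

/* Test the bit for an IRQ; out-of-range IRQs are reported as not set. */
static inline bool demo_test_irq(const unsigned long *bmp, unsigned int irq)
{
    return (irq < NR_IRQS) &&
           (((bmp[irq / BITS_PER_WORD] >> (irq % BITS_PER_WORD)) & 1UL) != 0);
}
```

On a platform with 8-bit bytes and a 1-byte bool, irq_bool_array_t takes 1024
bytes while irq_bitmap_t takes 128 bytes, i.e. the 1/8th mentioned above.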
Thanks.
~ Oleksii
Thread overview: 123+ messages
2026-03-10 17:08 [PATCH v1 00/27] [RISC-V] Introduce enablemenant of dom0less Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 01/27] xen/riscv: Implement ARCH_PAGING_MEMPOOL Oleksii Kurochko
2026-03-11 8:18 ` Jan Beulich
2026-04-09 10:31 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 02/27] xen/riscv: Implement construct_domain() Oleksii Kurochko
2026-03-24 9:37 ` Jan Beulich
2026-04-09 11:26 ` Oleksii Kurochko
2026-04-09 12:58 ` Jan Beulich
2026-04-09 13:39 ` Oleksii Kurochko
2026-04-09 14:01 ` Oleksii Kurochko
2026-04-14 6:26 ` Julien Grall
2026-03-10 17:08 ` [PATCH v1 03/27] xen/riscv: implement prerequisites for domain_create() Oleksii Kurochko
2026-04-01 12:57 ` Jan Beulich
2026-04-09 11:55 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 04/27] xen/riscv: rework G-stage mode handling Oleksii Kurochko
2026-04-01 13:19 ` Jan Beulich
2026-04-07 10:47 ` Oleksii Kurochko
2026-04-07 13:43 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 05/27] xen/riscv: introduce guest riscv,isa string Oleksii Kurochko
2026-04-01 13:49 ` Jan Beulich
2026-04-10 10:24 ` Oleksii Kurochko
2026-04-10 10:50 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 06/27] xen/riscv: implement make_cpus_node() Oleksii Kurochko
2026-04-01 14:11 ` Jan Beulich
2026-04-10 11:19 ` Oleksii Kurochko
2026-04-10 12:02 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 07/27] xen/riscv: implement make_timer_node() Oleksii Kurochko
2026-04-01 14:24 ` Jan Beulich
2026-04-10 11:54 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 08/27] xen/riscv: implement make_arch_nodes() Oleksii Kurochko
2026-04-01 14:29 ` Jan Beulich
2026-04-10 13:32 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 09/27] xen/riscv: implement make_intc_domU_node() Oleksii Kurochko
2026-04-01 14:38 ` Jan Beulich
2026-04-10 14:00 ` Oleksii Kurochko
2026-04-10 14:23 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 10/27] xen/riscv: generate IMSIC DT node for guest domains Oleksii Kurochko
2026-04-01 15:05 ` Jan Beulich
2026-04-10 15:40 ` Oleksii Kurochko
2026-04-16 11:42 ` Jan Beulich
2026-04-17 8:10 ` Oleksii Kurochko
2026-04-17 13:50 ` Jan Beulich
2026-04-17 14:01 ` Oleksii Kurochko
2026-04-17 14:10 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 11/27] xen/riscv: create APLIC " Oleksii Kurochko
2026-04-01 15:16 ` Jan Beulich
2026-04-13 8:43 ` Oleksii Kurochko
2026-04-13 8:48 ` Oleksii Kurochko
2026-04-16 11:49 ` Jan Beulich
2026-04-17 9:01 ` Oleksii Kurochko
2026-04-17 13:53 ` Jan Beulich
2026-04-17 14:27 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 12/27] xen/riscv: introduce aia_init() and aia_available() Oleksii Kurochko
2026-04-02 9:00 ` Jan Beulich
2026-04-13 9:32 ` Oleksii Kurochko
2026-04-16 12:06 ` Jan Beulich
2026-04-17 9:37 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 13/27] xen/riscv: add basic VGEIN management for AIA guests Oleksii Kurochko
2026-04-02 10:03 ` Jan Beulich
2026-04-13 14:42 ` Oleksii Kurochko
2026-04-16 12:21 ` Jan Beulich
2026-04-17 11:34 ` Oleksii Kurochko
2026-04-17 14:07 ` Jan Beulich
2026-04-20 7:52 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 14/27] xen/riscv: introduce per-vCPU IMSIC state Oleksii Kurochko
2026-04-02 11:31 ` Jan Beulich
2026-04-14 9:22 ` Oleksii Kurochko
2026-04-16 12:31 ` Jan Beulich
2026-04-16 12:31 ` Jan Beulich
2026-04-17 13:47 ` Oleksii Kurochko
2026-04-20 8:29 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 15/27] xen/riscv: add very early virtual APLIC (vAPLIC) initialization support Oleksii Kurochko
2026-04-02 11:58 ` Jan Beulich
2026-04-14 10:27 ` Oleksii Kurochko
2026-04-16 12:42 ` Jan Beulich
2026-04-20 10:25 ` Oleksii Kurochko
2026-04-20 10:47 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 16/27] xen/riscv: implement IRQ mapping for device passthrough Oleksii Kurochko
2026-04-02 12:22 ` Jan Beulich
2026-04-14 11:29 ` Oleksii Kurochko
2026-04-16 12:51 ` Jan Beulich
2026-04-20 11:39 ` Oleksii Kurochko
2026-04-20 13:45 ` Jan Beulich
2026-04-20 14:34 ` Oleksii Kurochko
2026-04-20 15:21 ` Jan Beulich
2026-04-20 15:31 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 17/27] xen/riscv: add missing APLIC register offsets, masks to asm/aplic.h Oleksii Kurochko
2026-04-02 12:51 ` Jan Beulich
2026-04-14 11:42 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 18/27] xen/riscv: add vaplic access check Oleksii Kurochko
2026-04-02 13:10 ` Jan Beulich
2026-04-14 11:45 ` Oleksii Kurochko
2026-04-15 7:35 ` Oleksii Kurochko
2026-04-16 13:01 ` Jan Beulich
2026-04-20 11:53 ` Oleksii Kurochko
2026-04-20 12:03 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 19/27] xen/riscv: emulate guest writes to virtual APLIC MMIO Oleksii Kurochko
2026-04-02 14:18 ` Jan Beulich
2026-04-14 16:04 ` Oleksii Kurochko
2026-04-16 13:19 ` Jan Beulich
2026-04-20 15:02 ` Oleksii Kurochko
2026-04-20 15:27 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 20/27] xen/riscv: emulate guest reads from " Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 21/27] xen/riscv: introduce (de)initialization helpers for vINTC Oleksii Kurochko
2026-04-02 14:58 ` Jan Beulich
2026-04-15 7:50 ` Oleksii Kurochko
2026-04-16 13:23 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 22/27] xen/riscv: implement init_intc_phandle() Oleksii Kurochko
2026-04-02 15:00 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 23/27] xen/riscv: call do_initcalls() in start_xen() Oleksii Kurochko
2026-04-02 15:01 ` Jan Beulich
2026-03-10 17:08 ` [PATCH v1 24/27] xen/riscv: init rcu Oleksii Kurochko
2026-04-02 15:03 ` Jan Beulich
2026-04-14 11:50 ` Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 25/27] xen/riscv: setup system domains Oleksii Kurochko
2026-03-10 17:08 ` [PATCH v1 26/27] xen/riscv: provide init_vuart() Oleksii Kurochko
2026-04-07 13:52 ` Jan Beulich
2026-03-10 17:09 ` [PATCH v1 27/27] xen/riscv: add initial dom0less infrastructure support Oleksii Kurochko
2026-04-07 14:11 ` Jan Beulich
2026-04-15 10:00 ` Oleksii Kurochko
2026-04-16 14:13 ` Jan Beulich
2026-04-15 10:28 ` Oleksii Kurochko
2026-04-16 14:15 ` Jan Beulich