* [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS
@ 2026-04-10 15:54 Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 01/12] xen/riscv: implement get_page_from_gfn() Oleksii Kurochko
` (11 more replies)
0 siblings, 12 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini,
Bertrand Marquis, Volodymyr Babchuk
Introduce necessary things to enable DOMAIN_BUILD_HELPERS config for RISC-V.
Generally it is indepenent patch series from [1] but depends on which
patches will go first it could be some merge conflicts.
[1] https://lore.kernel.org/xen-devel/eba232ac5a338332ddedc2cb084e0c04ee8744c2.1775835741.git.oleksii.kurochko@gmail.com/T/#u
CI tests: https://gitlab.com/xen-project/people/olkur/xen/-/pipelines/2444698148
---
Changes in v3:
- Address the comments from ML.
---
Changes in v2:
- Address the comments from ML.
- Introduce some new patches to make dom0less solution more architecture
indepenent from terminology point of view.
- Minor fixes.
---
Oleksii Kurochko (12):
xen/riscv: implement get_page_from_gfn()
xen: fix len type for guest copy functions
xen/riscv: implement copy_to_guest_phys()
xen/dom0less: rename kernel_zimage_probe() to kernel_image_probe()
xen/riscv: add kernel loading support
xen: move declaration of fw_unreserved_regions() to common header
xen: introduce domain-layout.h with common domain_use_host_layout()
xen/riscv: rework G-stage mode handling
xen: rename p2m_ipa_bits to p2m_gpa_bits
xen/riscv: introduce p2m_gpa_bits
xen/riscv: add definition of guest RAM banks
xen/riscv: enable DOMAIN_BUILD_HELPERS
xen/arch/arm/domain_build.c | 1 +
xen/arch/arm/guestcopy.c | 6 +-
xen/arch/arm/include/asm/domain.h | 14 --
xen/arch/arm/include/asm/guest_access.h | 2 +-
xen/arch/arm/include/asm/p2m.h | 2 +
xen/arch/arm/include/asm/setup.h | 3 -
xen/arch/arm/kernel.c | 48 ++---
xen/arch/arm/vgic-v3.c | 1 +
xen/arch/riscv/Kconfig | 1 +
xen/arch/riscv/Makefile | 3 +
xen/arch/riscv/dom0less-build.c | 30 +++
xen/arch/riscv/guestcopy.c | 116 +++++++++++
xen/arch/riscv/include/asm/config.h | 13 ++
xen/arch/riscv/include/asm/domain.h | 1 +
xen/arch/riscv/include/asm/guest-layout.h | 23 +++
xen/arch/riscv/include/asm/guest_access.h | 7 +
xen/arch/riscv/include/asm/p2m.h | 29 +--
xen/arch/riscv/kernel.c | 230 ++++++++++++++++++++++
xen/arch/riscv/p2m.c | 141 +++++++++----
xen/arch/riscv/vmid.c | 2 +-
xen/common/device-tree/domain-build.c | 3 +-
xen/common/device-tree/kernel.c | 2 +-
xen/include/public/arch-riscv.h | 5 +
xen/include/xen/bootinfo.h | 4 +
xen/include/xen/domain-layout.h | 28 +++
xen/include/xen/fdt-domain-build.h | 2 +-
xen/include/xen/fdt-kernel.h | 15 +-
27 files changed, 631 insertions(+), 101 deletions(-)
create mode 100644 xen/arch/riscv/dom0less-build.c
create mode 100644 xen/arch/riscv/guestcopy.c
create mode 100644 xen/arch/riscv/include/asm/guest-layout.h
create mode 100644 xen/arch/riscv/kernel.c
create mode 100644 xen/include/xen/domain-layout.h
--
2.53.0
^ permalink raw reply [flat|nested] 34+ messages in thread
* [PATCH v3 01/12] xen/riscv: implement get_page_from_gfn()
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-20 14:17 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 02/12] xen: fix len type for guest copy functions Oleksii Kurochko
` (10 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
The function is implemented out-of-line rather than as a static inline,
to avoid header ordering issues where struct domain is incomplete when
asm/p2m.h is included, leading to build failures:
In file included from ./arch/riscv/include/asm/domain.h:10,
from ./include/xen/domain.h:16,
from ./include/xen/sched.h:11,
from ./include/xen/event.h:12,
from common/cpu.c:3:
./arch/riscv/include/asm/p2m.h: In function 'get_page_from_gfn':
./arch/riscv/include/asm/p2m.h:50:33: error: invalid use of undefined type 'struct domain'
50 | #define p2m_get_hostp2m(d) (&(d)->arch.p2m)
| ^~
./arch/riscv/include/asm/p2m.h:180:38: note: in expansion of macro 'p2m_get_hostp2m'
180 | return p2m_get_page_from_gfn(p2m_get_hostp2m(d), _gfn(gfn), t);
| ^~~~~~~~~~~~~~~
make[2]: *** [Rules.mk:253: common/cpu.o] Error 1
make[1]: *** [build.mk:72: common] Error 2
make: *** [Makefile:623: xen] Error 2
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- As nothing is called get_page_from_gfn() for system domains, leave only
handling of not system domains.
---
Changes in v2:
- Align implemntation with Arm's get_page_from_gfn().
- Update the first comment about DOMID_XEN to mention that isn't "normal"
domain instead of no-autotranslated.
- Drop footer after commit message.
---
xen/arch/riscv/include/asm/p2m.h | 8 ++------
xen/arch/riscv/p2m.c | 13 +++++++++++++
2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 60f27f9b347e..54ea67990f06 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -164,12 +164,8 @@ typedef unsigned int p2m_query_t;
#define P2M_ALLOC (1u<<0) /* Populate PoD and paged-out entries */
#define P2M_UNSHARE (1u<<1) /* Break CoW sharing */
-static inline struct page_info *get_page_from_gfn(
- struct domain *d, unsigned long gfn, p2m_type_t *t, p2m_query_t q)
-{
- BUG_ON("unimplemented");
- return NULL;
-}
+struct page_info *get_page_from_gfn(struct domain *d, unsigned long gfn,
+ p2m_type_t *t, p2m_query_t q);
static inline void memory_type_changed(struct domain *d)
{
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index 89e5db606fc8..d63697c89a1a 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -1534,3 +1534,16 @@ void p2m_handle_vmenter(void)
* won't be reused until need_flush is set to true.
*/
}
+
+struct page_info *get_page_from_gfn(struct domain *d, unsigned long gfn,
+ p2m_type_t *t, p2m_query_t q)
+{
+ /*
+ * system domains are domains which doesn't have p2m translation tables,
+ * so they can't use p2m_get_page_from_gfn() and extra care should be
+ * done for them.
+ */
+ ASSERT(!is_system_domain(d));
+
+ return p2m_get_page_from_gfn(p2m_get_hostp2m(d), _gfn(gfn), t);
+}
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 02/12] xen: fix len type for guest copy functions
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 01/12] xen/riscv: implement get_page_from_gfn() Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-20 15:43 ` Jan Beulich
2026-04-20 15:44 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 03/12] xen/riscv: implement copy_to_guest_phys() Oleksii Kurochko
` (9 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
Widen the len argument of copy_to_guest_phys_flush_dcache() and the
copy_to_guest_phys_cb typedef from unsigned int to unsigned long, as
the function can be used to copy large blobs such as the initrd which
may exceed 4GB. Update the internal copy_guest() len argument to
unsigned long accordingly.
Change the type for local variable size in copy_guest() to avoid
compilation error because of type mismatch.
raw_* wrappers above copy_guest() keep returning unsigned long to
avoid type narrowing; it is not an issue for raw_*'s len argument
to remain 'unsigned int' since the assignment to copy_guest()'s wider
unsigned long parameter is safe and there is no raw_* users who
are using a value bigger than what can fit into 'unsigned int'.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Correct the declaration of copy_to_guest_phys_cb() instead of return
'unsigned int' to be in sync with len argument, the type of len argument
is changed on 'unsigned long' as initrd could be pretty big and also its
size is stroed in 'paddr_t' which is 'unsigned long'.
- Update copy_guest() prototype to avoid trancation bug for len argument.
- Revert prototype changes for raw_* wrappers above copy_guest() as they
should keep returning unsigned long to avoid type narrowing; it is not
an issue for raw_*'s len argument to remain 'unsigned int' since the
assignment to copy_guest()'s wider unsigned long parameter is safe.
- Change the type for local variable size in copy_guest() to avoid
compilation error because of type mismatch.
- Add Reported-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v2:
- New patch.
---
xen/arch/arm/guestcopy.c | 6 +++---
xen/arch/arm/include/asm/guest_access.h | 2 +-
xen/include/xen/fdt-domain-build.h | 2 +-
3 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
index fdb06422b8e9..ca0f72a80bff 100644
--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -53,7 +53,7 @@ static struct page_info *translate_get_page(copy_info_t info, uint64_t addr,
return page;
}
-static unsigned long copy_guest(void *buf, uint64_t addr, unsigned int len,
+static unsigned long copy_guest(void *buf, uint64_t addr, unsigned long len,
copy_info_t info, unsigned int flags)
{
/* XXX needs to handle faults */
@@ -65,7 +65,7 @@ static unsigned long copy_guest(void *buf, uint64_t addr, unsigned int len,
while ( len )
{
void *p;
- unsigned int size = min(len, (unsigned int)PAGE_SIZE - offset);
+ unsigned long size = min_t(unsigned long, len, PAGE_SIZE - offset);
struct page_info *page;
page = translate_get_page(info, addr, flags & COPY_linear,
@@ -136,7 +136,7 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from,
unsigned long copy_to_guest_phys_flush_dcache(struct domain *d,
paddr_t gpa,
void *buf,
- unsigned int len)
+ unsigned long len)
{
return copy_guest(buf, gpa, len, GPA_INFO(d),
COPY_to_guest | COPY_ipa | COPY_flush_dcache);
diff --git a/xen/arch/arm/include/asm/guest_access.h b/xen/arch/arm/include/asm/guest_access.h
index 18c88b70d7ec..c13cbec55b65 100644
--- a/xen/arch/arm/include/asm/guest_access.h
+++ b/xen/arch/arm/include/asm/guest_access.h
@@ -14,7 +14,7 @@ unsigned long raw_clear_guest(void *to, unsigned int len);
unsigned long copy_to_guest_phys_flush_dcache(struct domain *d,
paddr_t gpa,
void *buf,
- unsigned int len);
+ unsigned long len);
int access_guest_memory_by_gpa(struct domain *d, paddr_t gpa, void *buf,
uint32_t size, bool is_write);
diff --git a/xen/include/xen/fdt-domain-build.h b/xen/include/xen/fdt-domain-build.h
index 886a85381651..1d9e77df0eb3 100644
--- a/xen/include/xen/fdt-domain-build.h
+++ b/xen/include/xen/fdt-domain-build.h
@@ -47,7 +47,7 @@ static inline int get_allocation_size(paddr_t size)
typedef unsigned long (*copy_to_guest_phys_cb)(struct domain *d,
paddr_t gpa,
void *buf,
- unsigned int len);
+ unsigned long len);
void initrd_load(struct kernel_info *kinfo,
copy_to_guest_phys_cb cb);
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 03/12] xen/riscv: implement copy_to_guest_phys()
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 01/12] xen/riscv: implement get_page_from_gfn() Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 02/12] xen: fix len type for guest copy functions Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-20 15:46 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 04/12] xen/dom0less: rename kernel_zimage_probe() to kernel_image_probe() Oleksii Kurochko
` (8 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce copy_to_guest_phys() for RISC-V, based on the Arm implementation.
Add a generic copy_guest() helper for copying to and from guest physical
(and potentially virtual addresses in the future), and implement
translate_get_page() to translate a guest physical address into a struct
page_info via the domain p2m.
Compared to the Arm code:
- Drop COPY_flush_dcache(), as no such use cases exist on RISC-V.
- Do not implement the linear mapping case, which is currently unused.
- Use PAGE_OFFSET() to initialize the local offset variable in copy_guest().
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- s/if (page == NULL)/if ( !page ).
- Drop " - offset" for unmap_domain_page() inside copy_guest() function.
unmap_domain_page() is expected to mask (or something similar) the page
offset bits.
- Change some types for functions as copy_to_guest_phys_cb function pointer
prototype was changed too in the prev. commit.
---
Changes in v2:
- Use BIT() instead of open-coding.
- Rename COPY_ipa to COPY_gpa.
- Rename COPY_linear to COPY_gva.
- Use BUG_ON(linear) instead if (lineer) + BUG_ON.
- Rename arg liner to gva for translate_get_page().
- Update translate_get_page() to properly handling write argument.
- Return unsigned int for copy_guest() and copy_to_guest_phys() as
len function parameter is only 'unsigned int'.
- Reformat function arguments for alignment
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/guestcopy.c | 116 ++++++++++++++++++++++
xen/arch/riscv/include/asm/guest_access.h | 7 ++
3 files changed, 124 insertions(+)
create mode 100644 xen/arch/riscv/guestcopy.c
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index 04f02ad89cba..cfc3fdf7d208 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -4,6 +4,7 @@ obj-y += domain.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-y += entry.o
obj-y += extable.o
+obj-y += guestcopy.o
obj-y += imsic.o
obj-y += intc.o
obj-y += irq.o
diff --git a/xen/arch/riscv/guestcopy.c b/xen/arch/riscv/guestcopy.c
new file mode 100644
index 000000000000..55a2d6597621
--- /dev/null
+++ b/xen/arch/riscv/guestcopy.c
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/domain_page.h>
+#include <xen/page-size.h>
+#include <xen/sched.h>
+#include <xen/string.h>
+
+#include <asm/guest_access.h>
+
+#define COPY_from_guest 0U
+#define COPY_to_guest BIT(0, U)
+#define COPY_gpa 0U
+#define COPY_gva BIT(1, U)
+
+typedef union
+{
+ struct
+ {
+ struct vcpu *v;
+ } gva;
+
+ struct
+ {
+ struct domain *d;
+ } gpa;
+} copy_info_t;
+
+#define GVA_INFO(vcpu) ((copy_info_t) { .gva = { vcpu } })
+#define GPA_INFO(domain) ((copy_info_t) { .gpa = { domain } })
+
+static struct page_info *translate_get_page(copy_info_t info, uint64_t addr,
+ bool gva, bool write)
+{
+ p2m_type_t p2mt;
+ struct page_info *page;
+
+ /*
+ * Not implemented yet.
+ *
+ * If gva == true, the operation will likely require a struct vcpu
+ * rather than just a struct domain. For this reason copy_info_t is
+ * already passed here instead of only struct domain.
+ */
+ BUG_ON(gva);
+
+ page = get_page_from_gfn(info.gpa.d, paddr_to_pfn(addr), &p2mt, P2M_ALLOC);
+
+ if ( !page )
+ return NULL;
+
+ if ( write ? p2mt != p2m_ram_rw : !p2m_is_ram(p2mt) )
+ {
+ put_page(page);
+ return NULL;
+ }
+
+ return page;
+}
+
+static unsigned long copy_guest(void *buf, uint64_t addr, unsigned long len,
+ copy_info_t info, unsigned int flags)
+{
+ unsigned int offset = PAGE_OFFSET(addr);
+
+ BUILD_BUG_ON((sizeof(addr)) < sizeof(vaddr_t));
+ BUILD_BUG_ON((sizeof(addr)) < sizeof(paddr_t));
+
+ while ( len )
+ {
+ void *p;
+ unsigned long size = min_t(unsigned long, len, PAGE_SIZE - offset);
+ struct page_info *page;
+
+ page = translate_get_page(info, addr, flags & COPY_gva,
+ flags & COPY_to_guest);
+ if ( !page )
+ return len;
+
+ p = __map_domain_page(page);
+ p += offset;
+ if ( flags & COPY_to_guest )
+ {
+ /*
+ * buf will be NULL when the caller request to zero the
+ * guest memory.
+ */
+ if ( buf )
+ memcpy(p, buf, size);
+ else
+ memset(p, 0, size);
+ }
+ else
+ memcpy(buf, p, size);
+
+ unmap_domain_page(p);
+ put_page(page);
+ len -= size;
+ buf += size;
+ addr += size;
+
+ /*
+ * After the first iteration, guest virtual address is correctly
+ * aligned to PAGE_SIZE.
+ */
+ offset = 0;
+ }
+
+ return 0;
+}
+
+unsigned long copy_to_guest_phys(struct domain *d, paddr_t gpa, void *buf,
+ unsigned long len)
+{
+ return copy_guest(buf, gpa, len, GPA_INFO(d),
+ COPY_to_guest | COPY_gpa);
+}
diff --git a/xen/arch/riscv/include/asm/guest_access.h b/xen/arch/riscv/include/asm/guest_access.h
index 7cd51fbbdead..8d679319ded0 100644
--- a/xen/arch/riscv/include/asm/guest_access.h
+++ b/xen/arch/riscv/include/asm/guest_access.h
@@ -2,6 +2,10 @@
#ifndef ASM__RISCV__GUEST_ACCESS_H
#define ASM__RISCV__GUEST_ACCESS_H
+#include <xen/types.h>
+
+struct domain;
+
unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len);
unsigned long raw_copy_from_guest(void *to, const void *from, unsigned len);
unsigned long raw_clear_guest(void *to, unsigned int len);
@@ -18,6 +22,9 @@ unsigned long raw_clear_guest(void *to, unsigned int len);
#define guest_handle_okay(hnd, nr) (1)
#define guest_handle_subrange_okay(hnd, first, last) (1)
+unsigned long copy_to_guest_phys(struct domain *d, paddr_t gpa, void *buf,
+ unsigned long len);
+
#endif /* ASM__RISCV__GUEST_ACCESS_H */
/*
* Local variables:
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 04/12] xen/dom0less: rename kernel_zimage_probe() to kernel_image_probe()
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (2 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 03/12] xen/riscv: implement copy_to_guest_phys() Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 05/12] xen/riscv: add kernel loading support Oleksii Kurochko
` (7 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
The helper kernel_zimage_probe() is referenced from common code
(xen/common/device-tree/kernel.c), but its name is tied to the zImage
format which is specific to Arm (from architectures supported by Xen).
Other architectures supported by Xen, such as RISC-V, do not use the
zImage format and instead rely on other kernel image types (e.g. Image
or compressed Image variants: Image.gz, etc). Using "zimage" in the
name is therefore misleading in architecture-independent code.
Rename kernel_zimage_probe() to kernel_image_probe() and update the
associated structure field from "zimage" to "image" to reflect that the
code handles generic kernel images rather than the zImage format
specifically.
No functional change intended.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Nothing changed. Only rebase.
---
Changes in v2:
- new patch.
---
xen/arch/arm/kernel.c | 48 ++++++++++++++++-----------------
xen/common/device-tree/kernel.c | 2 +-
xen/include/xen/fdt-kernel.h | 4 +--
3 files changed, 27 insertions(+), 27 deletions(-)
diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 7544fd50a20f..3c613cdb233f 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -101,8 +101,8 @@ static paddr_t __init kernel_zimage_place(struct kernel_info *info)
paddr_t load_addr;
#ifdef CONFIG_ARM_64
- if ( (info->arch.type == DOMAIN_64BIT) && (info->zimage.start == 0) )
- return mem->bank[0].start + info->zimage.text_offset;
+ if ( (info->arch.type == DOMAIN_64BIT) && (info->image.start == 0) )
+ return mem->bank[0].start + info->image.text_offset;
#endif
/*
@@ -111,19 +111,19 @@ static paddr_t __init kernel_zimage_place(struct kernel_info *info)
* and above 32MiB. Load it as high as possible within these
* constraints, while also avoiding the DTB.
*/
- if ( info->zimage.start == 0 )
+ if ( info->image.start == 0 )
{
paddr_t load_end;
load_end = mem->bank[0].start + mem->bank[0].size;
load_end = MIN(mem->bank[0].start + MB(128), load_end);
- load_addr = load_end - info->zimage.len;
+ load_addr = load_end - info->image.len;
/* Align to 2MB */
load_addr &= ~((2 << 20) - 1);
}
else
- load_addr = info->zimage.start;
+ load_addr = info->image.start;
return load_addr;
}
@@ -131,8 +131,8 @@ static paddr_t __init kernel_zimage_place(struct kernel_info *info)
static void __init kernel_zimage_load(struct kernel_info *info)
{
paddr_t load_addr = kernel_zimage_place(info);
- paddr_t paddr = info->zimage.kernel_addr;
- paddr_t len = info->zimage.len;
+ paddr_t paddr = info->image.kernel_addr;
+ paddr_t len = info->image.len;
void *kernel;
int rc;
@@ -215,7 +215,7 @@ int __init kernel_uimage_probe(struct kernel_info *info,
return -EOPNOTSUPP;
}
- info->zimage.start = be32_to_cpu(uimage.load);
+ info->image.start = be32_to_cpu(uimage.load);
info->entry = be32_to_cpu(uimage.ep);
/*
@@ -224,20 +224,20 @@ int __init kernel_uimage_probe(struct kernel_info *info,
* independent image. That means Xen is free to load such an image at
* any valid address.
*/
- if ( info->zimage.start == 0 )
+ if ( info->image.start == 0 )
printk(XENLOG_INFO
"No load address provided. Xen will decide where to load it.\n");
else
printk(XENLOG_INFO
"Provided load address: %"PRIpaddr" and entry address: %"PRIpaddr"\n",
- info->zimage.start, info->entry);
+ info->image.start, info->entry);
/*
* If the image supports position independent execution, then user cannot
* provide an entry point as Xen will load such an image at any appropriate
* memory address. Thus, we need to return error.
*/
- if ( (info->zimage.start == 0) && (info->entry != 0) )
+ if ( (info->image.start == 0) && (info->entry != 0) )
{
printk(XENLOG_ERR
"Entry point cannot be non zero for PIE image.\n");
@@ -257,13 +257,13 @@ int __init kernel_uimage_probe(struct kernel_info *info,
if ( rc )
return rc;
- info->zimage.kernel_addr = mod->start;
- info->zimage.len = mod->size;
+ info->image.kernel_addr = mod->start;
+ info->image.len = mod->size;
}
else
{
- info->zimage.kernel_addr = addr + sizeof(uimage);
- info->zimage.len = len;
+ info->image.kernel_addr = addr + sizeof(uimage);
+ info->image.len = len;
}
info->load = kernel_zimage_load;
@@ -289,7 +289,7 @@ int __init kernel_uimage_probe(struct kernel_info *info,
* Thus, Xen uses uimage.load attribute to determine the load address and
* zimage.text_offset is ignored.
*/
- info->zimage.text_offset = 0;
+ info->image.text_offset = 0;
#endif
return 0;
@@ -338,10 +338,10 @@ static int __init kernel_zimage64_probe(struct kernel_info *info,
if ( (end - start) > size )
return -EINVAL;
- info->zimage.kernel_addr = addr;
- info->zimage.len = end - start;
- info->zimage.text_offset = zimage.text_offset;
- info->zimage.start = 0;
+ info->image.kernel_addr = addr;
+ info->image.len = end - start;
+ info->image.text_offset = zimage.text_offset;
+ info->image.start = 0;
info->load = kernel_zimage_load;
@@ -389,10 +389,10 @@ static int __init kernel_zimage32_probe(struct kernel_info *info,
}
}
- info->zimage.kernel_addr = addr;
+ info->image.kernel_addr = addr;
- info->zimage.start = start;
- info->zimage.len = end - start;
+ info->image.start = start;
+ info->image.len = end - start;
info->load = kernel_zimage_load;
@@ -403,7 +403,7 @@ static int __init kernel_zimage32_probe(struct kernel_info *info,
return 0;
}
-int __init kernel_zimage_probe(struct kernel_info *info, paddr_t addr,
+int __init kernel_image_probe(struct kernel_info *info, paddr_t addr,
paddr_t size)
{
int rc;
diff --git a/xen/common/device-tree/kernel.c b/xen/common/device-tree/kernel.c
index 28096121a52d..cfa27464f0fc 100644
--- a/xen/common/device-tree/kernel.c
+++ b/xen/common/device-tree/kernel.c
@@ -235,7 +235,7 @@ int __init kernel_probe(struct kernel_info *info,
if ( rc && rc != -EINVAL )
return rc;
- rc = kernel_zimage_probe(info, mod->start, mod->size);
+ rc = kernel_image_probe(info, mod->start, mod->size);
return rc;
}
diff --git a/xen/include/xen/fdt-kernel.h b/xen/include/xen/fdt-kernel.h
index 33a60597bb4d..2af3bd5f0722 100644
--- a/xen/include/xen/fdt-kernel.h
+++ b/xen/include/xen/fdt-kernel.h
@@ -56,7 +56,7 @@ struct kernel_info {
paddr_t text_offset; /* 64-bit Image only */
#endif
paddr_t start; /* Must be 0 for 64-bit Image */
- } zimage;
+ } image;
};
#if __has_include(<asm/kernel.h>)
@@ -122,7 +122,7 @@ void kernel_load(struct kernel_info *info);
int kernel_decompress(struct boot_module *mod, uint32_t offset);
-int kernel_zimage_probe(struct kernel_info *info, paddr_t addr, paddr_t size);
+int kernel_image_probe(struct kernel_info *info, paddr_t addr, paddr_t size);
/*
* uImage isn't really used nowadays thereby leave kernel_uimage_probe()
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (3 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 04/12] xen/dom0less: rename kernel_zimage_probe() to kernel_image_probe() Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-21 8:57 ` Jan Beulich
2026-04-21 9:01 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 06/12] xen: move declaration of fw_unreserved_regions() to common header Oleksii Kurochko
` (6 subsequent siblings)
11 siblings, 2 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Introduce support for loading a Linux kernel Image which is got by
uncompressing of Image.gz on RISC-V.
kernel_image_load() and place_modules() currently call panic() on
failure rather than returning an error. This is because the common
kernel_load() in common/device-tree/kernel.c does not expect a
return code. Handling errors gracefully would require a separate
refactor.
The implementation is based on the Xen Arm kernel loading code.
text_offset is available for both 32-bit and 64-bit Image, so fix
that.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- s/kernel.o/kernel.init.o.
- s/struct kernel_info *info/const struct kernel_info *info for kernel_image_place
- s/ioremap_wc/ioremap_cache() inside kernel_image_load().
- s/u<N>/uint<N>.
- Drop hard tabs.
- Add le{32,64}_to_cpu() when correpondent le endian fields are read.
- Update the comment in kernel_image64_probe() about initialization of end of
image64.
- Introduce image_size to pass to kernel load function total size of kernel
it will have at runtime.
- Update kernel_image_place() to not put Image always to bank 0.
- Update place_modules() function to deal with different banks.
- Propage error up the call chain in kernel_image_probe().
---
Changes in v2:
- s/zimage/image as RISC-V doesn't support zImages, only Image and
Image.{gz,...}.
- Update the commit message.
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/include/asm/config.h | 13 ++
xen/arch/riscv/kernel.c | 230 ++++++++++++++++++++++++++++
xen/include/xen/fdt-kernel.h | 11 +-
4 files changed, 253 insertions(+), 2 deletions(-)
create mode 100644 xen/arch/riscv/kernel.c
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index cfc3fdf7d208..eecdcbc76867 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -8,6 +8,7 @@ obj-y += guestcopy.o
obj-y += imsic.o
obj-y += intc.o
obj-y += irq.o
+obj-y += kernel.init.o
obj-y += mm.o
obj-y += p2m.o
obj-y += paging.o
diff --git a/xen/arch/riscv/include/asm/config.h b/xen/arch/riscv/include/asm/config.h
index 0613de008b13..fd69057826e1 100644
--- a/xen/arch/riscv/include/asm/config.h
+++ b/xen/arch/riscv/include/asm/config.h
@@ -151,6 +151,19 @@
extern unsigned long phys_offset; /* = load_start - XEN_VIRT_START */
#endif
+/*
+ * KERNEL_LOAD_ADDR_ALIGNMENT is defined based on paragraph of
+ * "Kernel location" of boot.rst:
+ * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
+ */
+#if defined(CONFIG_RISCV_32)
+#define KERNEL_LOAD_ADDR_ALIGNMENT MB(4)
+#elif defined(CONFIG_RISCV_64)
+#define KERNEL_LOAD_ADDR_ALIGNMENT MB(2)
+#else
+#error "Define KERNEL_LOAD_ADDR_ALIGNMENT"
+#endif
+
#endif /* ASM__RISCV__CONFIG_H */
/*
* Local variables:
diff --git a/xen/arch/riscv/kernel.c b/xen/arch/riscv/kernel.c
new file mode 100644
index 000000000000..6d58f3372358
--- /dev/null
+++ b/xen/arch/riscv/kernel.c
@@ -0,0 +1,230 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/bug.h>
+#include <xen/compiler.h>
+#include <xen/errno.h>
+#include <xen/fdt-kernel.h>
+#include <xen/guest_access.h>
+#include <xen/init.h>
+#include <xen/libfdt/libfdt.h>
+#include <xen/mm.h>
+#include <xen/types.h>
+#include <xen/vmap.h>
+
+#include <asm/setup.h>
+
+#define IMAGE64_MAGIC_V2 0x05435352 /* Magic number 2, le, "RSC\x05" */
+
+static void __init place_modules(struct kernel_info *info, paddr_t kernbase,
+ paddr_t kernend)
+{
+ const struct boot_module *mod = info->bd.initrd;
+ const struct membanks *banks = kernel_info_get_mem_const(info);
+ const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0,
+ KERNEL_LOAD_ADDR_ALIGNMENT);
+ const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt),
+ KERNEL_LOAD_ADDR_ALIGNMENT);
+ const paddr_t modsize = initrd_len + dtb_len;
+ int bi;
+
+ BUG_ON(modsize < initrd_len);
+
+ /*
+ * Place modules as high in RAM as possible, scanning banks from
+ * last to first so that the end of the last bank is preferred.
+ */
+ for ( bi = banks->nr_banks - 1; bi >= 0; bi-- )
+ {
+ const struct membank *bank = &banks->bank[bi];
+ const paddr_t bank_end = bank->start + bank->size;
+ paddr_t modbase;
+
+ if ( modsize > bank->size )
+ continue;
+
+ modbase = ROUNDDOWN(bank_end - modsize, KERNEL_LOAD_ADDR_ALIGNMENT);
+
+ if ( modbase < bank->start )
+ continue;
+
+ /*
+ * If the kernel resides in this bank, ensure modules do not
+ * overlap with it.
+ */
+ if ( (kernbase >= bank->start) && (kernbase < bank_end) &&
+ (modbase < ROUNDUP(kernend, KERNEL_LOAD_ADDR_ALIGNMENT)) &&
+ (modbase + modsize > kernbase) )
+ continue;
+
+ info->dtb_paddr = modbase;
+ info->initrd_paddr = modbase + dtb_len;
+
+ return;
+ }
+
+ panic("Unable to find suitable location for dtb+initrd\n");
+}
+
+static paddr_t __init kernel_image_place(struct kernel_info *info)
+{
+ paddr_t load_addr = INVALID_PADDR;
+ uint64_t image_size = info->image.image_size ?: info->image.len;
+ const struct membanks *banks = kernel_info_get_mem_const(info);
+ unsigned int nr_banks = banks->nr_banks;
+ unsigned int bi;
+
+ dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);
+
+ /*
+ * At the moment, RISC-V's Linux kernel should be always position
+ * independent based on "Per-MMU execution" of boot.rst:
+ * https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
+ *
+ * But just for the case when RISC-V's Linux kernel isn't position
+ * independent it is needed to take load address from
+ * info->image.start.
+ *
+ * If `start` is zero, the Image is position independent.
+ */
+ if ( likely(!info->image.start) )
+ {
+ for ( bi = 0; bi != nr_banks; bi++ )
+ {
+ const struct membank *bank = &banks->bank[bi];
+ paddr_t bank_start = bank->start;
+ /*
+ * According to boot.rst kernel load address should be properly
+ * aligned:
+ * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
+ *
+ * As Image in this case is PIC we can ignore
+ * info->image.text_offset.
+ */
+ paddr_t aligned_start = ROUNDUP(bank_start, KERNEL_LOAD_ADDR_ALIGNMENT);
+ paddr_t bank_end = bank_start + bank->size;
+ paddr_t bank_size;
+
+ if ( aligned_start > bank_end )
+ continue;
+
+ bank_size = bank_end - aligned_start;
+
+ dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, bank->start);
+
+ if ( image_size <= bank_size )
+ {
+ load_addr = aligned_start;
+ break;
+ }
+ }
+ }
+ else
+ {
+ load_addr = info->image.start + info->image.text_offset;
+
+ WARN_ON(!IS_ALIGNED(load_addr, KERNEL_LOAD_ADDR_ALIGNMENT));
+
+ for ( bi = 0; bi != nr_banks; bi++ )
+ {
+ const struct membank *bank = &banks->bank[bi];
+ paddr_t bank_start = bank->start;
+ paddr_t bank_end = bank_start + bank->size;
+
+ if ( (load_addr >= bank_start) && (load_addr < bank_end) &&
+ (bank_end - load_addr) >= image_size )
+ break;
+ }
+ }
+
+ if ( bi == nr_banks )
+ panic("Failed to place kernel image in any memory bank\n");
+
+ info->entry = load_addr;
+
+ return load_addr;
+}
+
+static void __init kernel_image_load(struct kernel_info *info)
+{
+ int rc;
+ paddr_t load_addr = kernel_image_place(info);
+ paddr_t paddr = info->image.kernel_addr;
+ paddr_t len = info->image.len;
+ paddr_t effective_size = info->image.image_size ?: len;
+ void *kernel;
+
+ place_modules(info, load_addr, load_addr + effective_size);
+
+ printk("Loading Image from %"PRIpaddr" to %"PRIpaddr"-%"PRIpaddr"\n",
+ paddr, load_addr, load_addr + effective_size);
+
+ kernel = ioremap_cache(paddr, len);
+
+ if ( !kernel )
+ panic("Unable to map kernel\n");
+
+ /* Move kernel to proper location in guest phys map */
+ rc = copy_to_guest_phys(info->bd.d, load_addr, kernel, len);
+
+ if ( rc )
+ panic("Unable to copy kernel to proper guest location\n");
+
+ iounmap(kernel);
+}
+
+/* Check if the image is a 64-bit Image */
+static int __init kernel_image64_probe(struct kernel_info *info,
+ paddr_t addr, paddr_t size)
+{
+ /* riscv/boot-image-header.rst */
+ struct {
+ uint32_t code0; /* Executable code */
+ uint32_t code1; /* Executable code */
+ uint64_t text_offset; /* Image load offset, little endian */
+ uint64_t image_size; /* Effective Image size, little endian */
+ uint64_t flags; /* kernel flags, little endian */
+ uint32_t version; /* Version of this header */
+ uint32_t res1; /* Reserved */
+ uint64_t res2; /* Reserved */
+ uint64_t magic; /* Deprecated: Magic number, little endian, "RISCV" */
+ uint32_t magic2; /* Magic number 2, little endian, "RSC\x05" */
+ uint32_t res3; /* Reserved for PE COFF offset */
+ } image;
+ uint64_t effective_size;
+
+ if ( size < sizeof(image) )
+ return -EINVAL;
+
+ copy_from_paddr(&image, addr, sizeof(image));
+
+ /* Magic v1 is deprecated and may be removed. Only use v2 */
+ if ( le32_to_cpu(image.magic2) != IMAGE64_MAGIC_V2 )
+ return -EINVAL;
+
+ effective_size = le64_to_cpu(image.image_size);
+
+ if ( effective_size && size > effective_size )
+ return -EINVAL;
+
+ info->image.kernel_addr = addr;
+ /* Actual size in the binary file */
+ info->image.len = size;
+ /* Total memory the kernel occupies at runtime */
+ info->image.image_size = effective_size;
+ info->image.text_offset = le64_to_cpu(image.text_offset);
+ info->image.start = 0;
+
+ info->load = kernel_image_load;
+
+ return 0;
+}
+
+int __init kernel_image_probe(struct kernel_info *info, paddr_t addr,
+ paddr_t size)
+{
+#ifdef CONFIG_RISCV_64
+ return kernel_image64_probe(info, addr, size);
+#else
+ return -EOPNOTSUPP;
+#endif
+}
diff --git a/xen/include/xen/fdt-kernel.h b/xen/include/xen/fdt-kernel.h
index 2af3bd5f0722..aa977a50f4fc 100644
--- a/xen/include/xen/fdt-kernel.h
+++ b/xen/include/xen/fdt-kernel.h
@@ -52,8 +52,15 @@ struct kernel_info {
struct {
paddr_t kernel_addr;
paddr_t len;
-#if defined(CONFIG_ARM_64) || defined(CONFIG_RISCV_64)
- paddr_t text_offset; /* 64-bit Image only */
+#if defined(CONFIG_ARM_64) || defined(CONFIG_RISCV)
+ /*
+ * ARM: 64-bit Image only.
+ * RISC-V: both 32-bit and 64-bit Images.
+ */
+ paddr_t text_offset;
+#endif
+#if defined(CONFIG_RISCV)
+ uint64_t image_size; /* Effective size of Image */
#endif
paddr_t start; /* Must be 0 for 64-bit Image */
} image;
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 06/12] xen: move declaration of fw_unreserved_regions() to common header
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (4 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 05/12] xen/riscv: add kernel loading support Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 07/12] xen: introduce domain-layout.h with common domain_use_host_layout() Oleksii Kurochko
` (5 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
Since the implementation of fw_unreserved_regions() is in common code, move
its declaration to xen/bootinfo.h.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v2-v3:
- Nothing changed. Only rebase.
---
xen/arch/arm/include/asm/setup.h | 3 ---
xen/include/xen/bootinfo.h | 4 ++++
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/xen/arch/arm/include/asm/setup.h b/xen/arch/arm/include/asm/setup.h
index 899e33925ca4..0d29b46ea52b 100644
--- a/xen/arch/arm/include/asm/setup.h
+++ b/xen/arch/arm/include/asm/setup.h
@@ -43,9 +43,6 @@ int acpi_make_efi_nodes(void *fdt, struct membank tbl_add[]);
void create_dom0(void);
void discard_initial_modules(void);
-void fw_unreserved_regions(paddr_t s, paddr_t e,
- void (*cb)(paddr_t ps, paddr_t pe),
- unsigned int first);
void init_pdx(void);
void setup_mm(void);
diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
index f834f1957155..dbf492c2e36e 100644
--- a/xen/include/xen/bootinfo.h
+++ b/xen/include/xen/bootinfo.h
@@ -210,4 +210,8 @@ static inline struct membanks *membanks_xzalloc(unsigned int nr,
return banks;
}
+void fw_unreserved_regions(paddr_t s, paddr_t e,
+ void (*cb)(paddr_t ps, paddr_t pe),
+ unsigned int first);
+
#endif /* XEN_BOOTINFO_H */
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 07/12] xen: introduce domain-layout.h with common domain_use_host_layout()
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (5 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 06/12] xen: move declaration of fw_unreserved_regions() to common header Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-21 9:20 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 08/12] xen/riscv: rework G-stage mode handling Oleksii Kurochko
` (4 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Andrew Cooper, Anthony PERARD, Jan Beulich, Roger Pau Monné
domain_use_host_layout() is not architecture-specific and may be needed
on x86 [1]. Replace the ARM-specific macro in asm/domain.h with a common
static inline in a new dedicated header, xen/domain-layout.h.
xen/domain.h would be the natural home, but placing it there would
require including xen/paging.h (for paging_mode_translate()) and
xen/sched.h (for is_hardware_domain()), which would introduce circular
dependencies. A separate header that callers opt into avoids this.
Adjust the implementation to take paging_mode_translate() into account
so it works correctly for all architectures, including x86. Some extra
details about implementation [2] and [3].
[1] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2602161038120.359097@ubuntu-linux-20-04-desktop/
[2] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2602271742400.3148344@ubuntu-linux-20-04-desktop/
[3] https://lore.kernel.org/xen-devel/alpine.DEB.2.22.394.2602271750190.3148344@ubuntu-linux-20-04-desktop/
Suggested-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Make argument of domain_use_host_layout() const.
- Create a separate header to avoid circular heder dependecy and making
domain_use_host_layour() as static inline.
- Rework domain_use_host_layout() to be protected by paging_mode_translate().
- Update the commit message.
---
Changes in v2:
- Drop ifdef around defintion of domain_use_host_layout() as it
was suggested generic version. It could be returned back when
the real use case for it will appear.
- Add Suggested-by: and update the commit message.
- Make domain_use_host_layout() function instead of macros to
avoid ciclular header dependecies. Look at more details in
the commit message.
---
xen/arch/arm/domain_build.c | 1 +
xen/arch/arm/include/asm/domain.h | 14 --------------
xen/arch/arm/vgic-v3.c | 1 +
xen/common/device-tree/domain-build.c | 1 +
xen/include/xen/domain-layout.h | 28 +++++++++++++++++++++++++++
5 files changed, 31 insertions(+), 14 deletions(-)
create mode 100644 xen/include/xen/domain-layout.h
diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 6c17a84b2633..60a7cbf915a5 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -2,6 +2,7 @@
#include <xen/init.h>
#include <xen/bootinfo.h>
#include <xen/compile.h>
+#include <xen/domain-layout.h>
#include <xen/dom0less-build.h>
#include <xen/fdt-domain-build.h>
#include <xen/fdt-kernel.h>
diff --git a/xen/arch/arm/include/asm/domain.h b/xen/arch/arm/include/asm/domain.h
index ffe5d0d9f0a6..f95ad1285e6e 100644
--- a/xen/arch/arm/include/asm/domain.h
+++ b/xen/arch/arm/include/asm/domain.h
@@ -30,20 +30,6 @@ enum domain_type {
#define is_64bit_domain(d) (0)
#endif
-/*
- * Is the domain using the host memory layout?
- *
- * Direct-mapped domain will always have the RAM mapped with GFN == MFN.
- * To avoid any trouble finding space, it is easier to force using the
- * host memory layout.
- *
- * The hardware domain will use the host layout regardless of
- * direct-mapped because some OS may rely on a specific address ranges
- * for the devices.
- */
-#define domain_use_host_layout(d) (is_domain_direct_mapped(d) || \
- is_hardware_domain(d))
-
struct vtimer {
struct vcpu *v;
int irq;
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 77aab5c3c293..77517c303061 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -10,6 +10,7 @@
*/
#include <xen/bitops.h>
+#include <xen/domain-layout.h>
#include <xen/init.h>
#include <xen/irq.h>
#include <xen/lib.h>
diff --git a/xen/common/device-tree/domain-build.c b/xen/common/device-tree/domain-build.c
index 540627b74e96..e706a6173ba6 100644
--- a/xen/common/device-tree/domain-build.c
+++ b/xen/common/device-tree/domain-build.c
@@ -1,6 +1,7 @@
/* SPDX-License-Identifier: GPL-2.0-only */
#include <xen/bootinfo.h>
+#include <xen/domain-layout.h>
#include <xen/fdt-domain-build.h>
#include <xen/init.h>
#include <xen/lib.h>
diff --git a/xen/include/xen/domain-layout.h b/xen/include/xen/domain-layout.h
new file mode 100644
index 000000000000..15cbb1813c8a
--- /dev/null
+++ b/xen/include/xen/domain-layout.h
@@ -0,0 +1,28 @@
+#ifndef __XEN_DOMAIN_LAYOUT_H__
+#define __XEN_DOMAIN_LAYOUT_H__
+
+#include <xen/domain.h>
+#include <xen/paging.h>
+#include <xen/sched.h>
+
+/*
+ * Is a domain using the host memory layout?
+ *
+ * domain_use_host_layout() is always False for PV guests.
+ *
+ * Direct-mapped domains (autotranslated domains with memory allocated
+ * contiguously and mapped 1:1 so that GFN == MFN) are always using the
+ * host memory layout to avoid address clashes.
+ *
+ * The hardware domain will use the host layout (regardless of
+ * direct-mapped) because some OS may rely on specific address ranges
+ * for the devices. PV Dom0, like any other PV guests, has
+ * domain_use_host_layout() returning False.
+ */
+static inline bool domain_use_host_layout(const struct domain *d)
+{
+ return paging_mode_translate(d) &&
+ (is_domain_direct_mapped(d) || is_hardware_domain(d));
+}
+
+#endif /* __XEN_DOMAIN_LAYOUT_H__ */
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 08/12] xen/riscv: rework G-stage mode handling
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (6 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 07/12] xen: introduce domain-layout.h with common domain_use_host_layout() Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-21 9:39 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 09/12] xen: rename p2m_ipa_bits to p2m_gpa_bits Oleksii Kurochko
` (3 subsequent siblings)
11 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Rework G-stage mode handling to make the selected mode descriptor
reusable outside of p2m initialization, both for filling CPU nodes in
the device tree passed to dom0less guests and for per-domain G-stage
mode selection at domain creation time.
Promote modes[] from a local __initconst variable inside
gstage_mode_detect() to a file-scope static const array, and convert
max_gstage_mode from an embedded struct (assigned by value) to a global
const pointer into modes[]. This allows referencing both the mode
identifier and the mode name after init without copying the descriptor.
Remove get_max_supported_mode(); its callers now dereference
max_gstage_mode->mode directly.
Change struct p2m_domain::mode from an embedded gstage_mode_desc to a
const pointer into modes[], so each domain shares the descriptor rather
than carrying its own copy.
Adjust the modes[] entries in three ways:
- Use lowercase names without the "x4" suffix (e.g. "sv39" instead of
"Sv39x4"). The DT mmu-type binding [1] does not include the suffix,
so the name can now be passed directly to the guest without
transformation. The suffix is appended only in the diagnostic
printk, where it remains informative.
- Use "none" for Bare mode (HGATP_MODE_OFF) to match the DT binding.
- Change paging_levels to represent the root page-table level index
(i.e. total paging levels minus one) rather than the total count.
P2M_ROOT_LEVEL() now returns the correct VPN index directly, without
requiring callers to subtract one or use hardcoded offsets.
Add gstage_mode[8] to xen_arch_domainconfig so the toolstack can
request a specific G-stage mode at domain creation time. Introduce
find_gstage_mode() to resolve a mode descriptor by name
(case-insensitive), capping the result at max_gstage_mode to prevent
requesting a mode the hardware does not support. Update p2m_init() to
accept a xen_domctl_createdomain pointer and call find_gstage_mode()
instead of hardcoding Sv39x4.
Add arch_parse_dom0less_node() in a new dom0less-build.c to read the
"mmu-type" DT property from a guest domain node and store it in
boot_domain::create_cfg.arch.gstage_mode, falling back to maximum
supported mode when the property is absent.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/devicetree/bindings/riscv/cpus.yaml?h=v6.19-rc3#n82
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- New patch. Was taken from another patch series:
https://lore.kernel.org/xen-devel/cover.1773157782.git.oleksii.kurochko@gmail.com/T/#m6eb886bdf718b06b967d1688314b80e7374a8de5
---
xen/arch/riscv/Makefile | 1 +
xen/arch/riscv/dom0less-build.c | 30 ++++++++
xen/arch/riscv/include/asm/p2m.h | 11 +--
xen/arch/riscv/p2m.c | 113 +++++++++++++++++++++----------
xen/arch/riscv/vmid.c | 2 +-
xen/include/public/arch-riscv.h | 5 ++
6 files changed, 121 insertions(+), 41 deletions(-)
create mode 100644 xen/arch/riscv/dom0less-build.c
diff --git a/xen/arch/riscv/Makefile b/xen/arch/riscv/Makefile
index eecdcbc76867..8f7fd625dddd 100644
--- a/xen/arch/riscv/Makefile
+++ b/xen/arch/riscv/Makefile
@@ -1,6 +1,7 @@
obj-y += aplic.o
obj-y += cpufeature.o
obj-y += domain.o
+obj-$(CONFIG_DOM0LESS_BOOT) += dom0less-build.init.o
obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
obj-y += entry.o
obj-y += extable.o
diff --git a/xen/arch/riscv/dom0less-build.c b/xen/arch/riscv/dom0less-build.c
new file mode 100644
index 000000000000..11fa184d54be
--- /dev/null
+++ b/xen/arch/riscv/dom0less-build.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <xen/bootfdt.h>
+#include <xen/device_tree.h>
+#include <xen/init.h>
+
+#include <asm/p2m.h>
+
+int __init arch_parse_dom0less_node(struct dt_device_node *node,
+ struct boot_domain *bd)
+{
+ const char *mmu_type;
+
+ if ( dt_property_read_string(node, "mmu-type", &mmu_type) )
+ {
+ dprintk(XENLOG_WARNING, "mmu-type property is missing in guest domain "
+ "node. %s will be used as fallback\n", max_gstage_mode->name);
+
+ mmu_type = max_gstage_mode->name;
+ }
+
+ if ( safe_strcpy(bd->create_cfg.arch.gstage_mode, mmu_type) )
+ {
+ dprintk(XENLOG_ERR, "mmu-type value \"%s\" is too long\n", mmu_type);
+
+ return -EINVAL;
+ }
+
+ return 0;
+}
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index 54ea67990f06..b5b6a996baeb 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -13,7 +13,7 @@
#define P2M_ROOT_ORDER (ilog2(GSTAGE_ROOT_PAGE_TABLE_SIZE) - PAGE_SHIFT)
#define P2M_ROOT_PAGES BIT(P2M_ROOT_ORDER, U)
-#define P2M_ROOT_LEVEL(p2m) ((p2m)->mode.paging_levels)
+#define P2M_ROOT_LEVEL(p2m) ((p2m)->mode->paging_levels)
/*
* According to the RISC-V spec:
@@ -55,6 +55,8 @@ struct gstage_mode_desc {
char name[8];
};
+extern const struct gstage_mode_desc *max_gstage_mode;
+
/* Per-p2m-table state */
struct p2m_domain {
/*
@@ -68,7 +70,7 @@ struct p2m_domain {
/* The root of the p2m tree. May be concatenated */
struct page_info *root;
- struct gstage_mode_desc mode;
+ const struct gstage_mode_desc *mode;
/* Back pointer to domain */
struct domain *domain;
@@ -215,9 +217,10 @@ static inline bool arch_acquire_resource_check(struct domain *d)
}
void guest_mm_init(void);
-unsigned char get_max_supported_mode(void);
-int p2m_init(struct domain *d);
+struct xen_domctl_createdomain;
+
+int p2m_init(struct domain *d, const struct xen_domctl_createdomain *config);
static inline void p2m_write_lock(struct p2m_domain *p2m)
{
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index d63697c89a1a..b0773083ff53 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -45,12 +45,27 @@ struct p2m_pte_ctx {
unsigned int level; /* Paging level at which the PTE resides. */
};
-static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
- .mode = HGATP_MODE_OFF,
- .paging_levels = 0,
- .name = "Bare",
+/* Values should be sorted by ->mode in this array */
+static const struct gstage_mode_desc modes[] = {
+ /*
+ * Based on the RISC-V spec:
+ * Bare mode is always supported, regardless of SXLEN.
+ * When SXLEN=32, the only other valid setting for MODE is Sv32.
+ * When SXLEN=64, three paged virtual-memory schemes are defined:
+ * Sv39, Sv48, and Sv57.
+ */
+ { HGATP_MODE_OFF, 0, "none" },
+#ifdef CONFIG_RISCV_32
+ { HGATP_MODE_SV32X4, 1, "sv32" },
+#else
+ { HGATP_MODE_SV39X4, 2, "sv39" },
+ { HGATP_MODE_SV48X4, 3, "sv48" },
+ { HGATP_MODE_SV57X4, 4, "sv57" },
+#endif
};
+const struct gstage_mode_desc * __ro_after_init max_gstage_mode = &modes[0];
+
static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
static inline void p2m_free_metadata_page(struct p2m_domain *p2m,
@@ -63,11 +78,6 @@ static inline void p2m_free_metadata_page(struct p2m_domain *p2m,
}
}
-unsigned char get_max_supported_mode(void)
-{
- return max_gstage_mode.mode;
-}
-
/*
* If anything is changed here, it may also require updates to
* p2m_{get,set}_type().
@@ -148,23 +158,6 @@ static pte_t *p2m_get_root_pointer(struct p2m_domain *p2m, gfn_t gfn)
static void __init gstage_mode_detect(void)
{
- static const struct gstage_mode_desc modes[] __initconst = {
- /*
- * Based on the RISC-V spec:
- * Bare mode is always supported, regardless of SXLEN.
- * When SXLEN=32, the only other valid setting for MODE is Sv32.
- * When SXLEN=64, three paged virtual-memory schemes are defined:
- * Sv39, Sv48, and Sv57.
- */
-#ifdef CONFIG_RISCV_32
- { HGATP_MODE_SV32X4, 2, "Sv32x4" }
-#else
- { HGATP_MODE_SV39X4, 3, "Sv39x4" },
- { HGATP_MODE_SV48X4, 4, "Sv48x4" },
- { HGATP_MODE_SV57X4, 5, "Sv57x4" },
-#endif
- };
-
for ( unsigned int mode_idx = ARRAY_SIZE(modes); mode_idx-- > 0; )
{
unsigned long mode = modes[mode_idx].mode;
@@ -173,16 +166,16 @@ static void __init gstage_mode_detect(void)
if ( MASK_EXTR(csr_read(CSR_HGATP), HGATP_MODE_MASK) == mode )
{
- max_gstage_mode = modes[mode_idx];
+ max_gstage_mode = &modes[mode_idx];
break;
}
}
- if ( max_gstage_mode.mode == HGATP_MODE_OFF )
+ if ( max_gstage_mode->mode == HGATP_MODE_OFF )
panic("Xen expects that G-stage won't be Bare mode\n");
- printk("Max supported G-stage mode is %s\n", max_gstage_mode.name);
+ printk("Max supported G-stage mode is %sx4\n", max_gstage_mode->name);
csr_write(CSR_HGATP, 0);
@@ -283,7 +276,7 @@ static void clear_and_clean_page(struct page_info *page, bool clean_dcache)
unsigned long construct_hgatp(const struct p2m_domain *p2m, uint16_t vmid)
{
return MASK_INSR(mfn_x(page_to_mfn(p2m->root)), HGATP_PPN_MASK) |
- MASK_INSR(p2m->mode.mode, HGATP_MODE_MASK) |
+ MASK_INSR(p2m->mode->mode, HGATP_MODE_MASK) |
MASK_INSR(vmid, HGATP_VMID_MASK);
}
@@ -331,8 +324,40 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
return 0;
}
-int p2m_init(struct domain *d)
+static const struct gstage_mode_desc * find_gstage_mode(const char *mmu_type)
{
+ for ( unsigned int mode_idx = 0; mode_idx < ARRAY_SIZE(modes); mode_idx++ )
+ {
+ if ( !strcasecmp(mmu_type, modes[mode_idx].name) )
+ {
+ if ( modes[mode_idx].mode == HGATP_MODE_OFF ||
+ modes[mode_idx].mode > max_gstage_mode->mode )
+ break;
+
+ return &modes[mode_idx];
+ }
+ }
+
+ ASSERT(modes[0].mode == HGATP_MODE_OFF);
+
+ dprintk(XENLOG_ERR, "Requested G-stage mode (%s) isn't supported\n",
+ mmu_type);
+
+ /*
+ * Return the Bare-mode sentinel. p2m_init() will reject it with
+ * -EINVAL, producing the appropriate domain-creation failure.
+ */
+ return &modes[0];
+}
+
+int p2m_init(struct domain *d, const struct xen_domctl_createdomain *config)
+{
+ /*
+ * TODO: This static is a temporary constraint: all guests must use the
+ * same MMU mode because p2m_gpa_bits is not yet per-domain.
+ * Drop this once per-domain p2m_gpa_bits is introduced.
+ */
+ static const struct gstage_mode_desc *m = &modes[0];
struct p2m_domain *p2m = p2m_get_hostp2m(d);
/*
@@ -341,6 +366,27 @@ int p2m_init(struct domain *d)
*/
p2m->domain = d;
+ if ( !config )
+ {
+ dprintk(XENLOG_ERR, "NULL config is passed\n");
+ return -EINVAL;
+ }
+
+ p2m->mode = find_gstage_mode(config->arch.gstage_mode);
+
+ if ( p2m->mode->mode == HGATP_MODE_OFF )
+ return -EINVAL;
+
+ if ( m->mode == HGATP_MODE_OFF )
+ m = p2m->mode;
+
+ if ( m->mode != p2m->mode->mode )
+ {
+ dprintk(XENLOG_ERR,
+ "Mode should be the same for all guests at the moment\n");
+ return -EINVAL;
+ }
+
paging_domain_init(d);
rwlock_init(&p2m->lock);
@@ -362,11 +408,6 @@ int p2m_init(struct domain *d)
# error "Add init of p2m->clean_dcache"
#endif
- /* TODO: don't hardcode used for a domain g-stage mode. */
- p2m->mode.mode = HGATP_MODE_SV39X4;
- p2m->mode.paging_levels = 2;
- safe_strcpy(p2m->mode.name, "Sv39x4");
-
return 0;
}
diff --git a/xen/arch/riscv/vmid.c b/xen/arch/riscv/vmid.c
index 8fbcd500f24d..11c7e9d6d6c8 100644
--- a/xen/arch/riscv/vmid.c
+++ b/xen/arch/riscv/vmid.c
@@ -52,7 +52,7 @@ static DEFINE_PER_CPU(struct vmid_data, vmid_data);
static unsigned int vmidlen_detect(void)
{
unsigned int vmid_bits;
- unsigned char gstage_mode = get_max_supported_mode();
+ unsigned char gstage_mode = max_gstage_mode->mode;
/*
* According to the RISC-V Privileged Architecture Spec:
diff --git a/xen/include/public/arch-riscv.h b/xen/include/public/arch-riscv.h
index 360d8e6871ba..5faf8924d926 100644
--- a/xen/include/public/arch-riscv.h
+++ b/xen/include/public/arch-riscv.h
@@ -56,6 +56,11 @@ typedef struct vcpu_guest_context vcpu_guest_context_t;
DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t);
struct xen_arch_domainconfig {
+ /*
+ * G-stage MMU mode for the guest (e.g. "sv39", "sv48", "sv57").
+ * Must be set; an empty string is invalid.
+ */
+ char gstage_mode[8];
};
#endif
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 09/12] xen: rename p2m_ipa_bits to p2m_gpa_bits
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (7 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 08/12] xen/riscv: rework G-stage mode handling Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits Oleksii Kurochko
` (2 subsequent siblings)
11 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Stefano Stabellini,
Julien Grall, Bertrand Marquis, Michal Orzel, Volodymyr Babchuk,
Jan Beulich
The IPA terminology is Arm-specific, so rename p2m_ipa_bits to
p2m_gpa_bits to use architecture-neutral naming in
xen/common/device-tree/ code.
No functional changes.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Introduce #define p2m_gpa_bits p2m_ipa_bits for Arm instead of
renaming of p2m_ipa_bits to p2m_gpa_bits to keep Arm part of
changes clearer and keep using Arm-specific terminolgy inside
Arm code.
---
Changes in v2:
- New patch.
---
xen/arch/arm/include/asm/p2m.h | 2 ++
xen/common/device-tree/domain-build.c | 2 +-
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/xen/arch/arm/include/asm/p2m.h b/xen/arch/arm/include/asm/p2m.h
index 010ce8c9ebbd..7957dbd96e57 100644
--- a/xen/arch/arm/include/asm/p2m.h
+++ b/xen/arch/arm/include/asm/p2m.h
@@ -14,6 +14,8 @@
/* Holds the bit size of IPAs in p2m tables. */
extern unsigned int p2m_ipa_bits;
+#define p2m_gpa_bits p2m_ipa_bits
+
#define MAX_VMID_8_BIT (1UL << 8)
#define MAX_VMID_16_BIT (1UL << 16)
diff --git a/xen/common/device-tree/domain-build.c b/xen/common/device-tree/domain-build.c
index e706a6173ba6..30625a33ef08 100644
--- a/xen/common/device-tree/domain-build.c
+++ b/xen/common/device-tree/domain-build.c
@@ -221,7 +221,7 @@ int __init find_unallocated_memory(const struct kernel_info *kinfo,
}
start = 0;
- end = (1ULL << p2m_ipa_bits) - 1;
+ end = (1ULL << p2m_gpa_bits) - 1;
res = rangeset_report_ranges(unalloc_mem, PFN_DOWN(start), PFN_DOWN(end),
cb, free_regions);
if ( res )
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (8 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 09/12] xen: rename p2m_ipa_bits to p2m_gpa_bits Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-21 9:46 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 11/12] xen/riscv: add definition of guest RAM banks Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 12/12] xen/riscv: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
11 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
common/device-tree/domain-build.c uses p2m_gpa_bits to determine the
upper bound of the GPA space when searching for unused regions, so it
must be defined when CONFIG_DOMAIN_BUILD_HELPERS=y.
The variable is initialised to PADDR_BITS and narrowed in p2m_init() to
the GPA width of the selected G-stage mode, allowing an external entity
(e.g. an IOMMU) to restrict it further if needed.
p2m_gpa_bits is a global rather than a per-domain value, which is
acceptable for now because all domains are required to use the same
G-stage MMU mode, as dom0less common code allocates it per all
domains.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Update initialization of p2m_gpa_bits and the comment above.
- Rework how p2m_gpa_bits is limited.
- Update the commit message.
---
Changes in v2:
- New patch.
---
xen/arch/riscv/include/asm/p2m.h | 10 ++++++++--
xen/arch/riscv/p2m.c | 17 ++++++++++++++++-
2 files changed, 24 insertions(+), 3 deletions(-)
diff --git a/xen/arch/riscv/include/asm/p2m.h b/xen/arch/riscv/include/asm/p2m.h
index b5b6a996baeb..0d1dace1a0d8 100644
--- a/xen/arch/riscv/include/asm/p2m.h
+++ b/xen/arch/riscv/include/asm/p2m.h
@@ -32,10 +32,13 @@
*/
#define P2M_LEVEL_ORDER(lvl) XEN_PT_LEVEL_ORDER(lvl)
-#define P2M_ROOT_EXTRA_BITS(p2m, lvl) (2 * ((lvl) == P2M_ROOT_LEVEL(p2m)))
+#define P2M_ROOT_EXTRA_BITS 2
+
+#define P2M_LEVEL_EXTRA_BITS(p2m, lvl) \
+ (P2M_ROOT_EXTRA_BITS * ((lvl) == P2M_ROOT_LEVEL(p2m)))
#define P2M_PAGETABLE_ENTRIES(p2m, lvl) \
- (BIT(PAGETABLE_ORDER + P2M_ROOT_EXTRA_BITS(p2m, lvl), UL))
+ (BIT(PAGETABLE_ORDER + P2M_LEVEL_EXTRA_BITS(p2m, lvl), UL))
#define P2M_TABLE_OFFSET(p2m, lvl) (P2M_PAGETABLE_ENTRIES(p2m, lvl) - 1UL)
@@ -44,6 +47,9 @@
#define P2M_LEVEL_MASK(p2m, lvl) \
(P2M_TABLE_OFFSET(p2m, lvl) << P2M_GFN_LEVEL_SHIFT(lvl))
+/* Holds the bit size of GPAs in p2m tables */
+extern unsigned int p2m_gpa_bits;
+
#define paddr_bits PADDR_BITS
/* Get host p2m table */
diff --git a/xen/arch/riscv/p2m.c b/xen/arch/riscv/p2m.c
index b0773083ff53..1f229e96d740 100644
--- a/xen/arch/riscv/p2m.c
+++ b/xen/arch/riscv/p2m.c
@@ -11,6 +11,8 @@
#include <xen/sections.h>
#include <xen/xvmalloc.h>
+#include <public/domctl.h>
+
#include <asm/cpufeature.h>
#include <asm/csr.h>
#include <asm/flushtlb.h>
@@ -66,6 +68,12 @@ static const struct gstage_mode_desc modes[] = {
const struct gstage_mode_desc * __ro_after_init max_gstage_mode = &modes[0];
+/*
+ * Set to the maximum configured support for GPA bits, so the number of GPA
+ * bits can be restricted by an external entity (e.g. IOMMU).
+ */
+unsigned int __ro_after_init p2m_gpa_bits = PADDR_BITS;
+
static void p2m_free_page(struct p2m_domain *p2m, struct page_info *pg);
static inline void p2m_free_metadata_page(struct p2m_domain *p2m,
@@ -359,6 +367,7 @@ int p2m_init(struct domain *d, const struct xen_domctl_createdomain *config)
*/
static const struct gstage_mode_desc *m = &modes[0];
struct p2m_domain *p2m = p2m_get_hostp2m(d);
+ unsigned int gpa_bits;
/*
* "Trivial" initialisation is now complete. Set the backpointer so the
@@ -408,6 +417,12 @@ int p2m_init(struct domain *d, const struct xen_domctl_createdomain *config)
# error "Add init of p2m->clean_dcache"
#endif
+ gpa_bits = P2M_GFN_LEVEL_SHIFT(p2m->mode->paging_levels + 1) +
+ P2M_ROOT_EXTRA_BITS;
+
+ if ( gpa_bits < p2m_gpa_bits )
+ p2m_gpa_bits = gpa_bits;
+
return 0;
}
@@ -1345,7 +1360,7 @@ static mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
{
unsigned int level = P2M_ROOT_LEVEL(p2m);
unsigned int gfn_limit_bits =
- P2M_LEVEL_ORDER(level + 1) + P2M_ROOT_EXTRA_BITS(p2m, level);
+ P2M_LEVEL_ORDER(level + 1) + P2M_LEVEL_EXTRA_BITS(p2m, level);
pte_t entry, *table;
int rc;
mfn_t mfn = INVALID_MFN;
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 11/12] xen/riscv: add definition of guest RAM banks
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (9 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
2026-04-21 10:24 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 12/12] xen/riscv: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
11 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
The dom0less solution uses defined RAM banks as compile-time constants,
so introduce macros to describe guest RAM banks.
The reason for 2 banks is that there is typically always a use case for
low memory under 4 GB, but the bank under 4 GB ends up being small because
there are other things under 4 GB it can conflict with (interrupt
controller, PCI BARs, etc.). So a second bank is added above that MMIO
region (starting at 8 GiB) to provide the remaining RAM; the gap between
the two banks also exercises code paths handling discontiguous memory.
For Sv32 guests (34-bit GPA, 16 GiB addressable), bank0 provides 2 GB
(2–4 GB) and the first 8 GB of bank1 (8–16 GB) is accessible.
Extended regions are useful for RISC-V: they could be used to provide a
"space" for Linux to map grant mappings.
Despite the fact that for every guest MMU mode the GPA could be up
to 56 bits wide (except Sv32 whose GPA is 34 bits), the combined size
of both banks is limited to 1018 GB as it is more than enough for most
use cases.
Add inclusion of asm/guest-layout.h to asm/domain.h to make dom0less
common code build happy.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
---
Changes in v3:
- Move GUEST_RAM*-related defines to asm/p2m.h instead of public header.
---
Changes in v2:
- New patch.
---
xen/arch/riscv/include/asm/domain.h | 1 +
xen/arch/riscv/include/asm/guest-layout.h | 23 +++++++++++++++++++++++
2 files changed, 24 insertions(+)
create mode 100644 xen/arch/riscv/include/asm/guest-layout.h
diff --git a/xen/arch/riscv/include/asm/domain.h b/xen/arch/riscv/include/asm/domain.h
index 6c48bf13111d..6044ce0feee0 100644
--- a/xen/arch/riscv/include/asm/domain.h
+++ b/xen/arch/riscv/include/asm/domain.h
@@ -7,6 +7,7 @@
#include <xen/xmalloc.h>
#include <public/hvm/params.h>
+#include <asm/guest-layout.h>
#include <asm/p2m.h>
#include <asm/vtimer.h>
diff --git a/xen/arch/riscv/include/asm/guest-layout.h b/xen/arch/riscv/include/asm/guest-layout.h
new file mode 100644
index 000000000000..68d95a09394c
--- /dev/null
+++ b/xen/arch/riscv/include/asm/guest-layout.h
@@ -0,0 +1,23 @@
+#ifndef ASM_RISCV_GUEST_LAYOUT_H
+#define ASM_RISCV_GUEST_LAYOUT_H
+
+#include <public/xen.h>
+
+#define GUEST_RAM_BANKS 2
+
+/*
+ * The way to find the extended regions (to be exposed to the guest as unused
+ * address space) relies on the fact that the regions reserved for the RAM
+ * below are big enough to also accommodate such regions.
+ */
+#define GUEST_RAM0_BASE xen_mk_ullong(0x80000000) /* 2GB of low RAM @ 2GB */
+#define GUEST_RAM0_SIZE xen_mk_ullong(0x80000000)
+
+#define GUEST_RAM1_BASE xen_mk_ullong(0x0200000000) /* 1016 GB of RAM @ 8GB */
+#define GUEST_RAM1_SIZE xen_mk_ullong(0xFE00000000)
+
+/* TODO: allocate these all dynamically */
+#define GUEST_RAM_BANK_BASES { GUEST_RAM0_BASE, GUEST_RAM1_BASE }
+#define GUEST_RAM_BANK_SIZES { GUEST_RAM0_SIZE, GUEST_RAM1_SIZE }
+
+#endif /* ASM_RISCV_GUEST_LAYOUT_H */
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* [PATCH v3 12/12] xen/riscv: enable DOMAIN_BUILD_HELPERS
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
` (10 preceding siblings ...)
2026-04-10 15:54 ` [PATCH v3 11/12] xen/riscv: add definition of guest RAM banks Oleksii Kurochko
@ 2026-04-10 15:54 ` Oleksii Kurochko
11 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-10 15:54 UTC (permalink / raw)
To: xen-devel
Cc: Romain Caritey, Oleksii Kurochko, Alistair Francis, Connor Davis,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Everything is ready to enable DOMAIN_BUILD_HELPER which are necessary
for dom0less common code. So enable it.
Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
---
Changes in v3:
- Add Acked-by: Jan Beulich <jbeulich@suse.com>.
---
Changes in v2:
- Move introduction of p2m_ipa_bits to separate patch.
- Move to separate patch introduction of guest banks constansts.
---
xen/arch/riscv/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/xen/arch/riscv/Kconfig b/xen/arch/riscv/Kconfig
index a5e87c1757f7..41426c205292 100644
--- a/xen/arch/riscv/Kconfig
+++ b/xen/arch/riscv/Kconfig
@@ -1,5 +1,6 @@
config RISCV
def_bool y
+ select DOMAIN_BUILD_HELPERS
select FUNCTION_ALIGNMENT_16B
select GENERIC_BUG_FRAME
select GENERIC_UART_INIT
--
2.53.0
^ permalink raw reply related [flat|nested] 34+ messages in thread
* Re: [PATCH v3 01/12] xen/riscv: implement get_page_from_gfn()
2026-04-10 15:54 ` [PATCH v3 01/12] xen/riscv: implement get_page_from_gfn() Oleksii Kurochko
@ 2026-04-20 14:17 ` Jan Beulich
0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2026-04-20 14:17 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> The function is implemented out-of-line rather than as a static inline,
> to avoid header ordering issues where struct domain is incomplete when
> asm/p2m.h is included, leading to build failures:
> In file included from ./arch/riscv/include/asm/domain.h:10,
> from ./include/xen/domain.h:16,
> from ./include/xen/sched.h:11,
> from ./include/xen/event.h:12,
> from common/cpu.c:3:
> ./arch/riscv/include/asm/p2m.h: In function 'get_page_from_gfn':
> ./arch/riscv/include/asm/p2m.h:50:33: error: invalid use of undefined type 'struct domain'
> 50 | #define p2m_get_hostp2m(d) (&(d)->arch.p2m)
> | ^~
> ./arch/riscv/include/asm/p2m.h:180:38: note: in expansion of macro 'p2m_get_hostp2m'
> 180 | return p2m_get_page_from_gfn(p2m_get_hostp2m(d), _gfn(gfn), t);
> | ^~~~~~~~~~~~~~~
> make[2]: *** [Rules.mk:253: common/cpu.o] Error 1
> make[1]: *** [build.mk:72: common] Error 2
> make: *** [Makefile:623: xen] Error 2
I still think this can and preferably would be sorted, but for now this is
good enough.
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
with a comment adjustment:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -1534,3 +1534,16 @@ void p2m_handle_vmenter(void)
> * won't be reused until need_flush is set to true.
> */
> }
> +
> +struct page_info *get_page_from_gfn(struct domain *d, unsigned long gfn,
> + p2m_type_t *t, p2m_query_t q)
> +{
> + /*
> + * system domains are domains which doesn't have p2m translation tables,
> + * so they can't use p2m_get_page_from_gfn() and extra care should be
> + * done for them.
> + */
This violates style and has two grammar issues. I'll take care of this
when committing.
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 02/12] xen: fix len type for guest copy functions
2026-04-10 15:54 ` [PATCH v3 02/12] xen: fix len type for guest copy functions Oleksii Kurochko
@ 2026-04-20 15:43 ` Jan Beulich
2026-04-20 15:44 ` Jan Beulich
1 sibling, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2026-04-20 15:43 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
Anthony PERARD, Roger Pau Monné, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> Widen the len argument of copy_to_guest_phys_flush_dcache() and the
> copy_to_guest_phys_cb typedef from unsigned int to unsigned long, as
> the function can be used to copy large blobs such as the initrd which
> may exceed 4GB. Update the internal copy_guest() len argument to
> unsigned long accordingly.
>
> Change the type for local variable size in copy_guest() to avoid
> compilation error because of type mismatch.
>
> raw_* wrappers above copy_guest() keep returning unsigned long to
> avoid type narrowing; it is not an issue for raw_*'s len argument
> to remain 'unsigned int' since the assignment to copy_guest()'s wider
> unsigned long parameter is safe and there is no raw_* users who
> are using a value bigger than what can fit into 'unsigned int'.
>
> Reported-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
With two suggestions: For one, the patch subject prefix may want to be (or
include) "arm:", to draw Arm maintainers' attention. And then ...
> --- a/xen/arch/arm/guestcopy.c
> +++ b/xen/arch/arm/guestcopy.c
> @@ -53,7 +53,7 @@ static struct page_info *translate_get_page(copy_info_t info, uint64_t addr,
> return page;
> }
>
> -static unsigned long copy_guest(void *buf, uint64_t addr, unsigned int len,
> +static unsigned long copy_guest(void *buf, uint64_t addr, unsigned long len,
> copy_info_t info, unsigned int flags)
> {
> /* XXX needs to handle faults */
> @@ -65,7 +65,7 @@ static unsigned long copy_guest(void *buf, uint64_t addr, unsigned int len,
> while ( len )
> {
> void *p;
> - unsigned int size = min(len, (unsigned int)PAGE_SIZE - offset);
> + unsigned long size = min_t(unsigned long, len, PAGE_SIZE - offset);
... I think we'd be better off avoiding min_t() whenever possible, i.e.
when min() can sensibly be used (preferably, unlike originally here, also
without a cast). PAGE_SIZE being of type long, "PAGE_SIZE + 0UL - offset"
would look to be an option. Yet of course Arm maintainers may dislike
that form ...
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 02/12] xen: fix len type for guest copy functions
2026-04-10 15:54 ` [PATCH v3 02/12] xen: fix len type for guest copy functions Oleksii Kurochko
2026-04-20 15:43 ` Jan Beulich
@ 2026-04-20 15:44 ` Jan Beulich
2026-04-22 10:02 ` Oleksii Kurochko
1 sibling, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-20 15:44 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
Anthony PERARD, Roger Pau Monné, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> Widen the len argument of copy_to_guest_phys_flush_dcache() and the
> copy_to_guest_phys_cb typedef from unsigned int to unsigned long, as
> the function can be used to copy large blobs such as the initrd which
> may exceed 4GB. Update the internal copy_guest() len argument to
> unsigned long accordingly.
>
> Change the type for local variable size in copy_guest() to avoid
> compilation error because of type mismatch.
>
> raw_* wrappers above copy_guest() keep returning unsigned long to
> avoid type narrowing; it is not an issue for raw_*'s len argument
> to remain 'unsigned int' since the assignment to copy_guest()'s wider
> unsigned long parameter is safe and there is no raw_* users who
> are using a value bigger than what can fit into 'unsigned int'.
>
> Reported-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Actually: You say "fix" in the subject. How about sorting out a correct
Fixes: tag then?
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 03/12] xen/riscv: implement copy_to_guest_phys()
2026-04-10 15:54 ` [PATCH v3 03/12] xen/riscv: implement copy_to_guest_phys() Oleksii Kurochko
@ 2026-04-20 15:46 ` Jan Beulich
0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2026-04-20 15:46 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> Introduce copy_to_guest_phys() for RISC-V, based on the Arm implementation.
>
> Add a generic copy_guest() helper for copying to and from guest physical
> (and potentially virtual addresses in the future), and implement
> translate_get_page() to translate a guest physical address into a struct
> page_info via the domain p2m.
>
> Compared to the Arm code:
> - Drop COPY_flush_dcache(), as no such use cases exist on RISC-V.
> - Do not implement the linear mapping case, which is currently unused.
> - Use PAGE_OFFSET() to initialize the local offset variable in copy_guest().
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
with ...
> --- /dev/null
> +++ b/xen/arch/riscv/guestcopy.c
> @@ -0,0 +1,116 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/domain_page.h>
> +#include <xen/page-size.h>
> +#include <xen/sched.h>
> +#include <xen/string.h>
> +
> +#include <asm/guest_access.h>
> +
> +#define COPY_from_guest 0U
> +#define COPY_to_guest BIT(0, U)
> +#define COPY_gpa 0U
> +#define COPY_gva BIT(1, U)
> +
> +typedef union
> +{
> + struct
> + {
> + struct vcpu *v;
> + } gva;
> +
> + struct
> + {
> + struct domain *d;
> + } gpa;
> +} copy_info_t;
> +
> +#define GVA_INFO(vcpu) ((copy_info_t) { .gva = { vcpu } })
> +#define GPA_INFO(domain) ((copy_info_t) { .gpa = { domain } })
> +
> +static struct page_info *translate_get_page(copy_info_t info, uint64_t addr,
> + bool gva, bool write)
> +{
> + p2m_type_t p2mt;
> + struct page_info *page;
> +
> + /*
> + * Not implemented yet.
> + *
> + * If gva == true, the operation will likely require a struct vcpu
> + * rather than just a struct domain. For this reason copy_info_t is
> + * already passed here instead of only struct domain.
> + */
> + BUG_ON(gva);
> +
> + page = get_page_from_gfn(info.gpa.d, paddr_to_pfn(addr), &p2mt, P2M_ALLOC);
> +
> + if ( !page )
> + return NULL;
> +
> + if ( write ? p2mt != p2m_ram_rw : !p2m_is_ram(p2mt) )
> + {
> + put_page(page);
> + return NULL;
> + }
> +
> + return page;
> +}
> +
> +static unsigned long copy_guest(void *buf, uint64_t addr, unsigned long len,
> + copy_info_t info, unsigned int flags)
> +{
> + unsigned int offset = PAGE_OFFSET(addr);
> +
> + BUILD_BUG_ON((sizeof(addr)) < sizeof(vaddr_t));
> + BUILD_BUG_ON((sizeof(addr)) < sizeof(paddr_t));
> +
> + while ( len )
> + {
> + void *p;
> + unsigned long size = min_t(unsigned long, len, PAGE_SIZE - offset);
... this adjusted to whatever the final shape is going to be on Arm.
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-10 15:54 ` [PATCH v3 05/12] xen/riscv: add kernel loading support Oleksii Kurochko
@ 2026-04-21 8:57 ` Jan Beulich
2026-04-22 11:45 ` Oleksii Kurochko
2026-04-21 9:01 ` Jan Beulich
1 sibling, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-21 8:57 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/include/asm/config.h
> +++ b/xen/arch/riscv/include/asm/config.h
> @@ -151,6 +151,19 @@
> extern unsigned long phys_offset; /* = load_start - XEN_VIRT_START */
> #endif
>
> +/*
> + * KERNEL_LOAD_ADDR_ALIGNMENT is defined based on paragraph of
> + * "Kernel location" of boot.rst:
> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
I.e. this is entirely Linux-centric? If so, maybe the patch subject should
then reflect this?
> --- /dev/null
> +++ b/xen/arch/riscv/kernel.c
> @@ -0,0 +1,230 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#include <xen/bug.h>
> +#include <xen/compiler.h>
> +#include <xen/errno.h>
> +#include <xen/fdt-kernel.h>
> +#include <xen/guest_access.h>
> +#include <xen/init.h>
> +#include <xen/libfdt/libfdt.h>
> +#include <xen/mm.h>
> +#include <xen/types.h>
> +#include <xen/vmap.h>
> +
> +#include <asm/setup.h>
> +
> +#define IMAGE64_MAGIC_V2 0x05435352 /* Magic number 2, le, "RSC\x05" */
> +
> +static void __init place_modules(struct kernel_info *info, paddr_t kernbase,
> + paddr_t kernend)
> +{
> + const struct boot_module *mod = info->bd.initrd;
> + const struct membanks *banks = kernel_info_get_mem_const(info);
> + const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0,
> + KERNEL_LOAD_ADDR_ALIGNMENT);
> + const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt),
> + KERNEL_LOAD_ADDR_ALIGNMENT);
> + const paddr_t modsize = initrd_len + dtb_len;
> + int bi;
Please can variables used for array indexing be of unsigned types? The use ...
> + BUG_ON(modsize < initrd_len);
> +
> + /*
> + * Place modules as high in RAM as possible, scanning banks from
> + * last to first so that the end of the last bank is preferred.
> + */
> + for ( bi = banks->nr_banks - 1; bi >= 0; bi-- )
... here can easily be replaced:
for ( bi = banks->nr_banks; bi-- > 0; )
Or you could have
unsigned int bi = banks->nr_banks;
...
while ( bi-- > 0 )
.
> + {
> + const struct membank *bank = &banks->bank[bi];
> + const paddr_t bank_end = bank->start + bank->size;
> + paddr_t modbase;
> +
> + if ( modsize > bank->size )
> + continue;
> +
> + modbase = ROUNDDOWN(bank_end - modsize, KERNEL_LOAD_ADDR_ALIGNMENT);
> +
> + if ( modbase < bank->start )
> + continue;
> +
> + /*
> + * If the kernel resides in this bank, ensure modules do not
> + * overlap with it.
> + */
> + if ( (kernbase >= bank->start) && (kernbase < bank_end) &&
> + (modbase < ROUNDUP(kernend, KERNEL_LOAD_ADDR_ALIGNMENT)) &&
> + (modbase + modsize > kernbase) )
> + continue;
Can't this be had with only two comparisons? Same bank or not doesn't really
matter - if it's different banks, there'll be no overlap anyway. So all you
need here is that the module range doesn't overlap the kernel range, entirely
independent of the bank.
What is dependent on the bank is that the bank may fit both kernel and module
even if there is an overlap as per your current calculation: You may be able
to place the module below the kernel if it doesn't fit above.
> +static paddr_t __init kernel_image_place(struct kernel_info *info)
> +{
> + paddr_t load_addr = INVALID_PADDR;
> + uint64_t image_size = info->image.image_size ?: info->image.len;
> + const struct membanks *banks = kernel_info_get_mem_const(info);
> + unsigned int nr_banks = banks->nr_banks;
> + unsigned int bi;
> +
> + dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);
> +
> + /*
> + * At the moment, RISC-V's Linux kernel should be always position
> + * independent based on "Per-MMU execution" of boot.rst:
> + * https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
> + *
> + * But just for the case when RISC-V's Linux kernel isn't position
> + * independent it is needed to take load address from
> + * info->image.start.
> + *
> + * If `start` is zero, the Image is position independent.
> + */
> + if ( likely(!info->image.start) )
> + {
> + for ( bi = 0; bi != nr_banks; bi++ )
> + {
> + const struct membank *bank = &banks->bank[bi];
> + paddr_t bank_start = bank->start;
> + /*
> + * According to boot.rst kernel load address should be properly
> + * aligned:
> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
> + *
> + * As Image in this case is PIC we can ignore
> + * info->image.text_offset.
> + */
> + paddr_t aligned_start = ROUNDUP(bank_start, KERNEL_LOAD_ADDR_ALIGNMENT);
> + paddr_t bank_end = bank_start + bank->size;
> + paddr_t bank_size;
> +
> + if ( aligned_start > bank_end )
> + continue;
> +
> + bank_size = bank_end - aligned_start;
> +
> + dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, bank->start);
> +
> + if ( image_size <= bank_size )
> + {
> + load_addr = aligned_start;
> + break;
> + }
> + }
> + }
> + else
> + {
> + load_addr = info->image.start + info->image.text_offset;
Why does stuff ahead of text_offset not need loading?
> + WARN_ON(!IS_ALIGNED(load_addr, KERNEL_LOAD_ADDR_ALIGNMENT));
> +
> + for ( bi = 0; bi != nr_banks; bi++ )
> + {
> + const struct membank *bank = &banks->bank[bi];
> + paddr_t bank_start = bank->start;
> + paddr_t bank_end = bank_start + bank->size;
> +
> + if ( (load_addr >= bank_start) && (load_addr < bank_end) &&
> + (bank_end - load_addr) >= image_size )
Do we have to fear overflow? (If so, shouldn't such an image be rejected
rather than an attempt being made to place it?) If not, simply:
if ( (load_addr >= bank_start) &&
(load_addr + image_size <= bank_end) )
Also, does image_size really only cover space starting from .text_offset,
rather than from .start?
> +static void __init kernel_image_load(struct kernel_info *info)
> +{
> + int rc;
> + paddr_t load_addr = kernel_image_place(info);
> + paddr_t paddr = info->image.kernel_addr;
> + paddr_t len = info->image.len;
> + paddr_t effective_size = info->image.image_size ?: len;
> + void *kernel;
> +
> + place_modules(info, load_addr, load_addr + effective_size);
> +
> + printk("Loading Image from %"PRIpaddr" to %"PRIpaddr"-%"PRIpaddr"\n",
> + paddr, load_addr, load_addr + effective_size);
As on earlier occasions: Please represent ranges as mathematical ones, to
disambiguate whether the bounds (the upper one in particular) are inclusive
or exclusive.
> +int __init kernel_image_probe(struct kernel_info *info, paddr_t addr,
> + paddr_t size)
> +{
> +#ifdef CONFIG_RISCV_64
> + return kernel_image64_probe(info, addr, size);
> +#else
> + return -EOPNOTSUPP;
Better #error, as you have it elsewhere (iirc)?
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-10 15:54 ` [PATCH v3 05/12] xen/riscv: add kernel loading support Oleksii Kurochko
2026-04-21 8:57 ` Jan Beulich
@ 2026-04-21 9:01 ` Jan Beulich
1 sibling, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2026-04-21 9:01 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> +/* Check if the image is a 64-bit Image */
> +static int __init kernel_image64_probe(struct kernel_info *info,
> + paddr_t addr, paddr_t size)
> +{
> + /* riscv/boot-image-header.rst */
Is this meant to refer to Linux'es Documentation/arch/riscv/boot-image-header.rst?
This needs making unambiguous.
Jan
> + struct {
> + uint32_t code0; /* Executable code */
> + uint32_t code1; /* Executable code */
> + uint64_t text_offset; /* Image load offset, little endian */
> + uint64_t image_size; /* Effective Image size, little endian */
> + uint64_t flags; /* kernel flags, little endian */
> + uint32_t version; /* Version of this header */
> + uint32_t res1; /* Reserved */
> + uint64_t res2; /* Reserved */
> + uint64_t magic; /* Deprecated: Magic number, little endian, "RISCV" */
> + uint32_t magic2; /* Magic number 2, little endian, "RSC\x05" */
> + uint32_t res3; /* Reserved for PE COFF offset */
> + } image;
> + uint64_t effective_size;
> +
> + if ( size < sizeof(image) )
> + return -EINVAL;
> +
> + copy_from_paddr(&image, addr, sizeof(image));
> +
> + /* Magic v1 is deprecated and may be removed. Only use v2 */
> + if ( le32_to_cpu(image.magic2) != IMAGE64_MAGIC_V2 )
> + return -EINVAL;
> +
> + effective_size = le64_to_cpu(image.image_size);
> +
> + if ( effective_size && size > effective_size )
> + return -EINVAL;
> +
> + info->image.kernel_addr = addr;
> + /* Actual size in the binary file */
> + info->image.len = size;
> + /* Total memory the kernel occupies at runtime */
> + info->image.image_size = effective_size;
> + info->image.text_offset = le64_to_cpu(image.text_offset);
> + info->image.start = 0;
> +
> + info->load = kernel_image_load;
> +
> + return 0;
> +}
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 07/12] xen: introduce domain-layout.h with common domain_use_host_layout()
2026-04-10 15:54 ` [PATCH v3 07/12] xen: introduce domain-layout.h with common domain_use_host_layout() Oleksii Kurochko
@ 2026-04-21 9:20 ` Jan Beulich
2026-04-22 14:38 ` Oleksii Kurochko
0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-21 9:20 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
Anthony PERARD, Roger Pau Monné, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> --- /dev/null
> +++ b/xen/include/xen/domain-layout.h
> @@ -0,0 +1,28 @@
> +#ifndef __XEN_DOMAIN_LAYOUT_H__
> +#define __XEN_DOMAIN_LAYOUT_H__
> +
> +#include <xen/domain.h>
> +#include <xen/paging.h>
> +#include <xen/sched.h>
> +
> +/*
> + * Is a domain using the host memory layout?
> + *
> + * domain_use_host_layout() is always False for PV guests.
> + *
> + * Direct-mapped domains (autotranslated domains with memory allocated
> + * contiguously and mapped 1:1 so that GFN == MFN) are always using the
> + * host memory layout to avoid address clashes.
What is "to avoid address clashes" about? If GFN == MFN, how could there
be clashes?
> + * The hardware domain will use the host layout (regardless of
> + * direct-mapped) because some OS may rely on specific address ranges
> + * for the devices. PV Dom0, like any other PV guests, has
> + * domain_use_host_layout() returning False.
This last sentence is somewhat redundant and somewhat in conflict with
what is said further up. If you did s/guests/domains (including Dom0)/
there, imo this sentence would best be dropped from down here.
With these adjustments:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 08/12] xen/riscv: rework G-stage mode handling
2026-04-10 15:54 ` [PATCH v3 08/12] xen/riscv: rework G-stage mode handling Oleksii Kurochko
@ 2026-04-21 9:39 ` Jan Beulich
2026-04-22 16:06 ` Oleksii Kurochko
0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-21 9:39 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> --- a/xen/arch/riscv/p2m.c
> +++ b/xen/arch/riscv/p2m.c
> @@ -45,12 +45,27 @@ struct p2m_pte_ctx {
> unsigned int level; /* Paging level at which the PTE resides. */
> };
>
> -static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
> - .mode = HGATP_MODE_OFF,
> - .paging_levels = 0,
> - .name = "Bare",
> +/* Values should be sorted by ->mode in this array */
> +static const struct gstage_mode_desc modes[] = {
As before, I'm of the clear opinion that this is too generic an identifier
for use at file scope.
> @@ -331,8 +324,40 @@ static int p2m_alloc_root_table(struct p2m_domain *p2m)
> return 0;
> }
>
> -int p2m_init(struct domain *d)
> +static const struct gstage_mode_desc * find_gstage_mode(const char *mmu_type)
Nit (style): Stray blank after *.
> {
> + for ( unsigned int mode_idx = 0; mode_idx < ARRAY_SIZE(modes); mode_idx++ )
Can't the variable be just idx or even i? Would likely help readability
some.
> + {
> + if ( !strcasecmp(mmu_type, modes[mode_idx].name) )
> + {
> + if ( modes[mode_idx].mode == HGATP_MODE_OFF ||
Instead of this, start the loop at index 1, with ...
> + modes[mode_idx].mode > max_gstage_mode->mode )
> + break;
> +
> + return &modes[mode_idx];
> + }
> + }
> +
> + ASSERT(modes[0].mode == HGATP_MODE_OFF);
... this moved up?
> + dprintk(XENLOG_ERR, "Requested G-stage mode (%s) isn't supported\n",
> + mmu_type);
> +
> + /*
> + * Return the Bare-mode sentinel. p2m_init() will reject it with
> + * -EINVAL, producing the appropriate domain-creation failure.
> + */
> + return &modes[0];
> +}
Yet better return NULL on error?
> +int p2m_init(struct domain *d, const struct xen_domctl_createdomain *config)
> +{
> + /*
> + * TODO: This static is a temporary constraint: all guests must use the
> + * same MMU mode because p2m_gpa_bits is not yet per-domain.
> + * Drop this once per-domain p2m_gpa_bits is introduced.
> + */
> + static const struct gstage_mode_desc *m = &modes[0];
It being temporary it may not matter much, but couldn't this be __ro_after_init?`
At least one domain needs creating during boot, so ...
> @@ -341,6 +366,27 @@ int p2m_init(struct domain *d)
> */
> p2m->domain = d;
>
> + if ( !config )
> + {
> + dprintk(XENLOG_ERR, "NULL config is passed\n");
> + return -EINVAL;
> + }
> +
> + p2m->mode = find_gstage_mode(config->arch.gstage_mode);
> +
> + if ( p2m->mode->mode == HGATP_MODE_OFF )
> + return -EINVAL;
> +
> + if ( m->mode == HGATP_MODE_OFF )
> + m = p2m->mode;
... this path won't be taken post-init.
> --- a/xen/include/public/arch-riscv.h
> +++ b/xen/include/public/arch-riscv.h
> @@ -56,6 +56,11 @@ typedef struct vcpu_guest_context vcpu_guest_context_t;
> DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t);
>
> struct xen_arch_domainconfig {
> + /*
> + * G-stage MMU mode for the guest (e.g. "sv39", "sv48", "sv57").
> + * Must be set; an empty string is invalid.
> + */
> + char gstage_mode[8];
> };
I have to say that I find it odd to use a string literal for this purpose.
Specifying the number of wanted address bits would feel more natural. Plus
the strings named aren't valid G-stage modes afaict - they lack the x4.
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits
2026-04-10 15:54 ` [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits Oleksii Kurochko
@ 2026-04-21 9:46 ` Jan Beulich
2026-04-23 13:47 ` Oleksii Kurochko
0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-21 9:46 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> common/device-tree/domain-build.c uses p2m_gpa_bits to determine the
> upper bound of the GPA space when searching for unused regions, so it
> must be defined when CONFIG_DOMAIN_BUILD_HELPERS=y.
>
> The variable is initialised to PADDR_BITS and narrowed in p2m_init() to
> the GPA width of the selected G-stage mode, allowing an external entity
> (e.g. an IOMMU) to restrict it further if needed.
>
> p2m_gpa_bits is a global rather than a per-domain value, which is
> acceptable for now because all domains are required to use the same
> G-stage MMU mode, as dom0less common code allocates it per all
> domains.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Pretty hesitantly:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 11/12] xen/riscv: add definition of guest RAM banks
2026-04-10 15:54 ` [PATCH v3 11/12] xen/riscv: add definition of guest RAM banks Oleksii Kurochko
@ 2026-04-21 10:24 ` Jan Beulich
2026-04-23 13:50 ` Oleksii Kurochko
0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-21 10:24 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 10.04.2026 17:54, Oleksii Kurochko wrote:
> The dom0less solution uses defined RAM banks as compile-time constants,
> so introduce macros to describe guest RAM banks.
>
> The reason for 2 banks is that there is typically always a use case for
> low memory under 4 GB, but the bank under 4 GB ends up being small because
> there are other things under 4 GB it can conflict with (interrupt
> controller, PCI BARs, etc.). So a second bank is added above that MMIO
> region (starting at 8 GiB) to provide the remaining RAM; the gap between
> the two banks also exercises code paths handling discontiguous memory.
> For Sv32 guests (34-bit GPA, 16 GiB addressable), bank0 provides 2 GB
> (2–4 GB) and the first 8 GB of bank1 (8–16 GB) is accessible.
>
> Extended regions are useful for RISC-V: they could be used to provide a
> "space" for Linux to map grant mappings.
>
> Despite the fact that for every guest MMU mode the GPA could be up
> to 56 bits wide (except Sv32 whose GPA is 34 bits), the combined size
> of both banks is limited to 1018 GB as it is more than enough for most
> use cases.
>
> Add inclusion of asm/guest-layout.h to asm/domain.h to make dom0less
> common code build happy.
>
> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
Again pretty hesitantly:
Acked-by: Jan Beulich <jbeulich@suse.com>
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 02/12] xen: fix len type for guest copy functions
2026-04-20 15:44 ` Jan Beulich
@ 2026-04-22 10:02 ` Oleksii Kurochko
0 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-22 10:02 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
Anthony PERARD, Roger Pau Monné, xen-devel
On 4/20/26 5:44 PM, Jan Beulich wrote:
> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>> Widen the len argument of copy_to_guest_phys_flush_dcache() and the
>> copy_to_guest_phys_cb typedef from unsigned int to unsigned long, as
>> the function can be used to copy large blobs such as the initrd which
>> may exceed 4GB. Update the internal copy_guest() len argument to
>> unsigned long accordingly.
>>
>> Change the type for local variable size in copy_guest() to avoid
>> compilation error because of type mismatch.
>>
>> raw_* wrappers above copy_guest() keep returning unsigned long to
>> avoid type narrowing; it is not an issue for raw_*'s len argument
>> to remain 'unsigned int' since the assignment to copy_guest()'s wider
>> unsigned long parameter is safe and there is no raw_* users who
>> are using a value bigger than what can fit into 'unsigned int'.
>>
>> Reported-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> Actually: You say "fix" in the subject. How about sorting out a correct
> Fixes: tag then?
I will add then before Reported-by:
Fixes: 2986481b3d9e6 ("xen/arm: guest_copy: Extend the prototype to pass
the vCPU")
Fixes: 5302bd490bea7 ("xen/arm: Introduce copy_to_guest_phys_flush_dcache")
Fixes: d07b7369aa65b ("xen/common: dom0less: introduce common
domain-build.c")
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-21 8:57 ` Jan Beulich
@ 2026-04-22 11:45 ` Oleksii Kurochko
2026-04-22 11:57 ` Jan Beulich
0 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-22 11:45 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/21/26 10:57 AM, Jan Beulich wrote:
> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/include/asm/config.h
>> +++ b/xen/arch/riscv/include/asm/config.h
>> @@ -151,6 +151,19 @@
>> extern unsigned long phys_offset; /* = load_start - XEN_VIRT_START */
>> #endif
>>
>> +/*
>> + * KERNEL_LOAD_ADDR_ALIGNMENT is defined based on paragraph of
>> + * "Kernel location" of boot.rst:
>> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
>
> I.e. this is entirely Linux-centric? If so, maybe the patch subject should
> then reflect this?
At least. for now - yes. I will add Linux kernel to the patch subject.
>
>> --- /dev/null
>> +++ b/xen/arch/riscv/kernel.c
>> @@ -0,0 +1,230 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +
>> +#include <xen/bug.h>
>> +#include <xen/compiler.h>
>> +#include <xen/errno.h>
>> +#include <xen/fdt-kernel.h>
>> +#include <xen/guest_access.h>
>> +#include <xen/init.h>
>> +#include <xen/libfdt/libfdt.h>
>> +#include <xen/mm.h>
>> +#include <xen/types.h>
>> +#include <xen/vmap.h>
>> +
>> +#include <asm/setup.h>
>> +
>> +#define IMAGE64_MAGIC_V2 0x05435352 /* Magic number 2, le, "RSC\x05" */
>> +
>> +static void __init place_modules(struct kernel_info *info, paddr_t kernbase,
>> + paddr_t kernend)
>> +{
>> + const struct boot_module *mod = info->bd.initrd;
>> + const struct membanks *banks = kernel_info_get_mem_const(info);
>> + const paddr_t initrd_len = ROUNDUP(mod ? mod->size : 0,
>> + KERNEL_LOAD_ADDR_ALIGNMENT);
>> + const paddr_t dtb_len = ROUNDUP(fdt_totalsize(info->fdt),
>> + KERNEL_LOAD_ADDR_ALIGNMENT);
>> + const paddr_t modsize = initrd_len + dtb_len;
>> + int bi;
>
> Please can variables used for array indexing be of unsigned types? The use ...
>
>> + BUG_ON(modsize < initrd_len);
>> +
>> + /*
>> + * Place modules as high in RAM as possible, scanning banks from
>> + * last to first so that the end of the last bank is preferred.
>> + */
>> + for ( bi = banks->nr_banks - 1; bi >= 0; bi-- )
>
> ... here can easily be replaced:
>
> for ( bi = banks->nr_banks; bi-- > 0; )
>
> Or you could have
>
> unsigned int bi = banks->nr_banks;
> ...
> while ( bi-- > 0 )
>
> .
I will use while() form.
>
>> + {
>> + const struct membank *bank = &banks->bank[bi];
>> + const paddr_t bank_end = bank->start + bank->size;
>> + paddr_t modbase;
>> +
>> + if ( modsize > bank->size )
>> + continue;
>> +
>> + modbase = ROUNDDOWN(bank_end - modsize, KERNEL_LOAD_ADDR_ALIGNMENT);
>> +
>> + if ( modbase < bank->start )
>> + continue;
>> +
>> + /*
>> + * If the kernel resides in this bank, ensure modules do not
>> + * overlap with it.
>> + */
>> + if ( (kernbase >= bank->start) && (kernbase < bank_end) &&
>> + (modbase < ROUNDUP(kernend, KERNEL_LOAD_ADDR_ALIGNMENT)) &&
>> + (modbase + modsize > kernbase) )
>> + continue;
>
> Can't this be had with only two comparisons? Same bank or not doesn't really
> matter - if it's different banks, there'll be no overlap anyway. So all you
> need here is that the module range doesn't overlap the kernel range, entirely
> independent of the bank.
>
> What is dependent on the bank is that the bank may fit both kernel and module
> even if there is an overlap as per your current calculation: You may be able
> to place the module below the kernel if it doesn't fit above.
I will drop the first check and update the comment:
/*
* If modules would overlap the kernel, try placing them below it.
*/
if ( (modbase < ROUNDUP(kernend, KERNEL_LOAD_ADDR_ALIGNMENT)) &&
(modbase + modsize > kernbase) )
{
modbase = ROUNDDOWN(kernbase - modsize,
KERNEL_LOAD_ADDR_ALIGNMENT);
if ( modbase < bank->start )
continue;
}
>> +static paddr_t __init kernel_image_place(struct kernel_info *info)
>> +{
>> + paddr_t load_addr = INVALID_PADDR;
>> + uint64_t image_size = info->image.image_size ?: info->image.len;
>> + const struct membanks *banks = kernel_info_get_mem_const(info);
>> + unsigned int nr_banks = banks->nr_banks;
>> + unsigned int bi;
>> +
>> + dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);
>> +
>> + /*
>> + * At the moment, RISC-V's Linux kernel should be always position
>> + * independent based on "Per-MMU execution" of boot.rst:
>> + * https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
>> + *
>> + * But just for the case when RISC-V's Linux kernel isn't position
>> + * independent it is needed to take load address from
>> + * info->image.start.
>> + *
>> + * If `start` is zero, the Image is position independent.
>> + */
>> + if ( likely(!info->image.start) )
>> + {
>> + for ( bi = 0; bi != nr_banks; bi++ )
>> + {
>> + const struct membank *bank = &banks->bank[bi];
>> + paddr_t bank_start = bank->start;
>> + /*
>> + * According to boot.rst kernel load address should be properly
>> + * aligned:
>> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
>> + *
>> + * As Image in this case is PIC we can ignore
>> + * info->image.text_offset.
>> + */
>> + paddr_t aligned_start = ROUNDUP(bank_start, KERNEL_LOAD_ADDR_ALIGNMENT);
>> + paddr_t bank_end = bank_start + bank->size;
>> + paddr_t bank_size;
>> +
>> + if ( aligned_start > bank_end )
>> + continue;
>> +
>> + bank_size = bank_end - aligned_start;
>> +
>> + dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, bank->start);
>> +
>> + if ( image_size <= bank_size )
>> + {
>> + load_addr = aligned_start;
>> + break;
>> + }
>> + }
>> + }
>> + else
>> + {
>> + load_addr = info->image.start + info->image.text_offset;
>
> Why does stuff ahead of text_offset not need loading?
Here we just calculating only a place where kernel will be loaded. The
full kernel image will be loaded in kernel_image_load().
>
>> + WARN_ON(!IS_ALIGNED(load_addr, KERNEL_LOAD_ADDR_ALIGNMENT));
>> +
>> + for ( bi = 0; bi != nr_banks; bi++ )
>> + {
>> + const struct membank *bank = &banks->bank[bi];
>> + paddr_t bank_start = bank->start;
>> + paddr_t bank_end = bank_start + bank->size;
>> +
>> + if ( (load_addr >= bank_start) && (load_addr < bank_end) &&
>> + (bank_end - load_addr) >= image_size )
>
> Do we have to fear overflow? (If so, shouldn't such an image be rejected
> rather than an attempt being made to place it?) If not, simply:
Just for a case. As a user may control load_addr and image_size it could
be some combination which will lead to overflow here.
>
> if ( (load_addr >= bank_start) &&
> (load_addr + image_size <= bank_end) )
I will add the following:
/*
* Reject a malformed image before the loop to avoid wrapping
* load_addr + image_size in the per-bank check below.
*
* image_size covers the kernel from _start (placed at load_addr =
* start + text_offset) through _end. The alignment gap
* [start, load_addr) is padding and need not lie within a bank.
*/
if ( image_size > (paddr_t)-1 - load_addr )
bi = nr_banks;
else
for ( bi = 0; bi != nr_banks; bi++ )
{
const struct membank *bank = &banks->bank[bi];
paddr_t bank_start = bank->start;
paddr_t bank_end = bank_start + bank->size;
if ( (load_addr >= bank_start) &&
(load_addr + image_size <= bank_end) )
break;
}
>
> Also, does image_size really only cover space starting from .text_offset,
> rather than from .start?
image_size covers total memory the kernel occupies at runtime.
>
>> +static void __init kernel_image_load(struct kernel_info *info)
>> +{
>> + int rc;
>> + paddr_t load_addr = kernel_image_place(info);
>> + paddr_t paddr = info->image.kernel_addr;
>> + paddr_t len = info->image.len;
>> + paddr_t effective_size = info->image.image_size ?: len;
>> + void *kernel;
>> +
>> + place_modules(info, load_addr, load_addr + effective_size);
>> +
>> + printk("Loading Image from %"PRIpaddr" to %"PRIpaddr"-%"PRIpaddr"\n",
>> + paddr, load_addr, load_addr + effective_size);
>
> As on earlier occasions: Please represent ranges as mathematical ones, to
> disambiguate whether the bounds (the upper one in particular) are inclusive
> or exclusive.
I will change to "... [%"PRIpaddr",%"PRIpaddr")\n".
>
>> +int __init kernel_image_probe(struct kernel_info *info, paddr_t addr,
>> + paddr_t size)
>> +{
>> +#ifdef CONFIG_RISCV_64
>> + return kernel_image64_probe(info, addr, size);
>> +#else
>> + return -EOPNOTSUPP;
>
> Better #error, as you have it elsewhere (iirc)?
Makes sense use:
# error "Only 64-bit RISC-V is supported"
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-22 11:45 ` Oleksii Kurochko
@ 2026-04-22 11:57 ` Jan Beulich
2026-04-22 13:47 ` Oleksii Kurochko
0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-22 11:57 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 22.04.2026 13:45, Oleksii Kurochko wrote:
> On 4/21/26 10:57 AM, Jan Beulich wrote:
>> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>>> +static paddr_t __init kernel_image_place(struct kernel_info *info)
>>> +{
>>> + paddr_t load_addr = INVALID_PADDR;
>>> + uint64_t image_size = info->image.image_size ?: info->image.len;
>>> + const struct membanks *banks = kernel_info_get_mem_const(info);
>>> + unsigned int nr_banks = banks->nr_banks;
>>> + unsigned int bi;
>>> +
>>> + dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);
>>> +
>>> + /*
>>> + * At the moment, RISC-V's Linux kernel should be always position
>>> + * independent based on "Per-MMU execution" of boot.rst:
>>> + * https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
>>> + *
>>> + * But just for the case when RISC-V's Linux kernel isn't position
>>> + * independent it is needed to take load address from
>>> + * info->image.start.
>>> + *
>>> + * If `start` is zero, the Image is position independent.
>>> + */
>>> + if ( likely(!info->image.start) )
>>> + {
>>> + for ( bi = 0; bi != nr_banks; bi++ )
>>> + {
>>> + const struct membank *bank = &banks->bank[bi];
>>> + paddr_t bank_start = bank->start;
>>> + /*
>>> + * According to boot.rst kernel load address should be properly
>>> + * aligned:
>>> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
>>> + *
>>> + * As Image in this case is PIC we can ignore
>>> + * info->image.text_offset.
>>> + */
>>> + paddr_t aligned_start = ROUNDUP(bank_start, KERNEL_LOAD_ADDR_ALIGNMENT);
>>> + paddr_t bank_end = bank_start + bank->size;
>>> + paddr_t bank_size;
>>> +
>>> + if ( aligned_start > bank_end )
>>> + continue;
>>> +
>>> + bank_size = bank_end - aligned_start;
>>> +
>>> + dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, bank->start);
>>> +
>>> + if ( image_size <= bank_size )
>>> + {
>>> + load_addr = aligned_start;
>>> + break;
>>> + }
>>> + }
>>> + }
>>> + else
>>> + {
>>> + load_addr = info->image.start + info->image.text_offset;
>>
>> Why does stuff ahead of text_offset not need loading?
>
> Here we just calculating only a place where kernel will be loaded. The
> full kernel image will be loaded in kernel_image_load().
Okay, but if you calculate an address where the full image won't fit,
how are things going to work?
>>> + WARN_ON(!IS_ALIGNED(load_addr, KERNEL_LOAD_ADDR_ALIGNMENT));
>>> +
>>> + for ( bi = 0; bi != nr_banks; bi++ )
>>> + {
>>> + const struct membank *bank = &banks->bank[bi];
>>> + paddr_t bank_start = bank->start;
>>> + paddr_t bank_end = bank_start + bank->size;
>>> +
>>> + if ( (load_addr >= bank_start) && (load_addr < bank_end) &&
>>> + (bank_end - load_addr) >= image_size )
>>
>> Do we have to fear overflow? (If so, shouldn't such an image be rejected
>> rather than an attempt being made to place it?) If not, simply:
>
> Just for a case. As a user may control load_addr and image_size it could
> be some combination which will lead to overflow here.
>
>>
>> if ( (load_addr >= bank_start) &&
>> (load_addr + image_size <= bank_end) )
>
> I will add the following:
> /*
> * Reject a malformed image before the loop to avoid wrapping
> * load_addr + image_size in the per-bank check below.
> *
> * image_size covers the kernel from _start (placed at load_addr =
> * start + text_offset) through _end. The alignment gap
> * [start, load_addr) is padding and need not lie within a bank.
> */
> if ( image_size > (paddr_t)-1 - load_addr )
> bi = nr_banks;
> else
> for ( bi = 0; bi != nr_banks; bi++ )
> {
> const struct membank *bank = &banks->bank[bi];
> paddr_t bank_start = bank->start;
> paddr_t bank_end = bank_start + bank->size;
>
> if ( (load_addr >= bank_start) &&
> (load_addr + image_size <= bank_end) )
> break;
> }
Please consider getting away without "else" (and hence with one level
less of indentation):
bi = image_size <= (paddr_t)-1 - load_addr ? 0 : nr_banks;
for ( ; bi != nr_banks; bi++ )
...
>> Also, does image_size really only cover space starting from .text_offset,
>> rather than from .start?
>
> image_size covers total memory the kernel occupies at runtime.
Which emphasizes the remark further up.
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-22 11:57 ` Jan Beulich
@ 2026-04-22 13:47 ` Oleksii Kurochko
2026-04-22 13:54 ` Jan Beulich
0 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-22 13:47 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/22/26 1:57 PM, Jan Beulich wrote:
> On 22.04.2026 13:45, Oleksii Kurochko wrote:
>> On 4/21/26 10:57 AM, Jan Beulich wrote:
>>> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>>>> +static paddr_t __init kernel_image_place(struct kernel_info *info)
>>>> +{
>>>> + paddr_t load_addr = INVALID_PADDR;
>>>> + uint64_t image_size = info->image.image_size ?: info->image.len;
>>>> + const struct membanks *banks = kernel_info_get_mem_const(info);
>>>> + unsigned int nr_banks = banks->nr_banks;
>>>> + unsigned int bi;
>>>> +
>>>> + dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);
>>>> +
>>>> + /*
>>>> + * At the moment, RISC-V's Linux kernel should be always position
>>>> + * independent based on "Per-MMU execution" of boot.rst:
>>>> + * https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
>>>> + *
>>>> + * But just for the case when RISC-V's Linux kernel isn't position
>>>> + * independent it is needed to take load address from
>>>> + * info->image.start.
>>>> + *
>>>> + * If `start` is zero, the Image is position independent.
>>>> + */
>>>> + if ( likely(!info->image.start) )
>>>> + {
>>>> + for ( bi = 0; bi != nr_banks; bi++ )
>>>> + {
>>>> + const struct membank *bank = &banks->bank[bi];
>>>> + paddr_t bank_start = bank->start;
>>>> + /*
>>>> + * According to boot.rst kernel load address should be properly
>>>> + * aligned:
>>>> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
>>>> + *
>>>> + * As Image in this case is PIC we can ignore
>>>> + * info->image.text_offset.
>>>> + */
>>>> + paddr_t aligned_start = ROUNDUP(bank_start, KERNEL_LOAD_ADDR_ALIGNMENT);
>>>> + paddr_t bank_end = bank_start + bank->size;
>>>> + paddr_t bank_size;
>>>> +
>>>> + if ( aligned_start > bank_end )
>>>> + continue;
>>>> +
>>>> + bank_size = bank_end - aligned_start;
>>>> +
>>>> + dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, bank->start);
>>>> +
>>>> + if ( image_size <= bank_size )
>>>> + {
>>>> + load_addr = aligned_start;
>>>> + break;
>>>> + }
>>>> + }
>>>> + }
>>>> + else
>>>> + {
>>>> + load_addr = info->image.start + info->image.text_offset;
>>>
>>> Why does stuff ahead of text_offset not need loading?
>>
>> Here we just calculating only a place where kernel will be loaded. The
>> full kernel image will be loaded in kernel_image_load().
>
> Okay, but if you calculate an address where the full image won't fit,
> how are things going to work?
If the full image won't fit than the necessary bank won't be found in
for() loop below and so this kernel will be rejected.
I expect that in the case when info->image.start is not 0 (so isn't
Image isn't PIC) Image want to be specifically load to info->image.start
+ info->image.text_offset. Is it wrong statement?
~ Oleksii
>
>>>> + WARN_ON(!IS_ALIGNED(load_addr, KERNEL_LOAD_ADDR_ALIGNMENT));
>>>> +
>>>> + for ( bi = 0; bi != nr_banks; bi++ )
>>>> + {
>>>> + const struct membank *bank = &banks->bank[bi];
>>>> + paddr_t bank_start = bank->start;
>>>> + paddr_t bank_end = bank_start + bank->size;
>>>> +
>>>> + if ( (load_addr >= bank_start) && (load_addr < bank_end) &&
>>>> + (bank_end - load_addr) >= image_size )
>>>
>>> Do we have to fear overflow? (If so, shouldn't such an image be rejected
>>> rather than an attempt being made to place it?) If not, simply:
>>
>> Just for a case. As a user may control load_addr and image_size it could
>> be some combination which will lead to overflow here.
>>
>>>
>>> if ( (load_addr >= bank_start) &&
>>> (load_addr + image_size <= bank_end) )
>>
>> I will add the following:
>> /*
>> * Reject a malformed image before the loop to avoid wrapping
>> * load_addr + image_size in the per-bank check below.
>> *
>> * image_size covers the kernel from _start (placed at load_addr =
>> * start + text_offset) through _end. The alignment gap
>> * [start, load_addr) is padding and need not lie within a bank.
>> */
>> if ( image_size > (paddr_t)-1 - load_addr )
>> bi = nr_banks;
>> else
>> for ( bi = 0; bi != nr_banks; bi++ )
>> {
>> const struct membank *bank = &banks->bank[bi];
>> paddr_t bank_start = bank->start;
>> paddr_t bank_end = bank_start + bank->size;
>>
>> if ( (load_addr >= bank_start) &&
>> (load_addr + image_size <= bank_end) )
>> break;
>> }
>
> Please consider getting away without "else" (and hence with one level
> less of indentation):
>
> bi = image_size <= (paddr_t)-1 - load_addr ? 0 : nr_banks;
> for ( ; bi != nr_banks; bi++ )
> ...
>
>>> Also, does image_size really only cover space starting from .text_offset,
>>> rather than from .start?
>>
>> image_size covers total memory the kernel occupies at runtime.
>
> Which emphasizes the remark further up.
>
> Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-22 13:47 ` Oleksii Kurochko
@ 2026-04-22 13:54 ` Jan Beulich
2026-04-22 14:32 ` Oleksii Kurochko
0 siblings, 1 reply; 34+ messages in thread
From: Jan Beulich @ 2026-04-22 13:54 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 22.04.2026 15:47, Oleksii Kurochko wrote:
> On 4/22/26 1:57 PM, Jan Beulich wrote:
>> On 22.04.2026 13:45, Oleksii Kurochko wrote:
>>> On 4/21/26 10:57 AM, Jan Beulich wrote:
>>>> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>>>>> +static paddr_t __init kernel_image_place(struct kernel_info *info)
>>>>> +{
>>>>> + paddr_t load_addr = INVALID_PADDR;
>>>>> + uint64_t image_size = info->image.image_size ?: info->image.len;
>>>>> + const struct membanks *banks = kernel_info_get_mem_const(info);
>>>>> + unsigned int nr_banks = banks->nr_banks;
>>>>> + unsigned int bi;
>>>>> +
>>>>> + dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);
>>>>> +
>>>>> + /*
>>>>> + * At the moment, RISC-V's Linux kernel should be always position
>>>>> + * independent based on "Per-MMU execution" of boot.rst:
>>>>> + * https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
>>>>> + *
>>>>> + * But just for the case when RISC-V's Linux kernel isn't position
>>>>> + * independent it is needed to take load address from
>>>>> + * info->image.start.
>>>>> + *
>>>>> + * If `start` is zero, the Image is position independent.
>>>>> + */
>>>>> + if ( likely(!info->image.start) )
>>>>> + {
>>>>> + for ( bi = 0; bi != nr_banks; bi++ )
>>>>> + {
>>>>> + const struct membank *bank = &banks->bank[bi];
>>>>> + paddr_t bank_start = bank->start;
>>>>> + /*
>>>>> + * According to boot.rst kernel load address should be properly
>>>>> + * aligned:
>>>>> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
>>>>> + *
>>>>> + * As Image in this case is PIC we can ignore
>>>>> + * info->image.text_offset.
>>>>> + */
>>>>> + paddr_t aligned_start = ROUNDUP(bank_start, KERNEL_LOAD_ADDR_ALIGNMENT);
>>>>> + paddr_t bank_end = bank_start + bank->size;
>>>>> + paddr_t bank_size;
>>>>> +
>>>>> + if ( aligned_start > bank_end )
>>>>> + continue;
>>>>> +
>>>>> + bank_size = bank_end - aligned_start;
>>>>> +
>>>>> + dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, bank->start);
>>>>> +
>>>>> + if ( image_size <= bank_size )
>>>>> + {
>>>>> + load_addr = aligned_start;
>>>>> + break;
>>>>> + }
>>>>> + }
>>>>> + }
>>>>> + else
>>>>> + {
>>>>> + load_addr = info->image.start + info->image.text_offset;
>>>>
>>>> Why does stuff ahead of text_offset not need loading?
>>>
>>> Here we just calculating only a place where kernel will be loaded. The
>>> full kernel image will be loaded in kernel_image_load().
>>
>> Okay, but if you calculate an address where the full image won't fit,
>> how are things going to work?
>
> If the full image won't fit than the necessary bank won't be found in
> for() loop below and so this kernel will be rejected.
>
> I expect that in the case when info->image.start is not 0 (so isn't
> Image isn't PIC) Image want to be specifically load to info->image.start
> + info->image.text_offset. Is it wrong statement?
I don't know, but the adding in of .text_offset looks questionable to me.
I simply don't know why such offsetting would be wanted / needed.
Jan
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 05/12] xen/riscv: add kernel loading support
2026-04-22 13:54 ` Jan Beulich
@ 2026-04-22 14:32 ` Oleksii Kurochko
0 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-22 14:32 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/22/26 3:54 PM, Jan Beulich wrote:
> On 22.04.2026 15:47, Oleksii Kurochko wrote:
>> On 4/22/26 1:57 PM, Jan Beulich wrote:
>>> On 22.04.2026 13:45, Oleksii Kurochko wrote:
>>>> On 4/21/26 10:57 AM, Jan Beulich wrote:
>>>>> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>>>>>> +static paddr_t __init kernel_image_place(struct kernel_info *info)
>>>>>> +{
>>>>>> + paddr_t load_addr = INVALID_PADDR;
>>>>>> + uint64_t image_size = info->image.image_size ?: info->image.len;
>>>>>> + const struct membanks *banks = kernel_info_get_mem_const(info);
>>>>>> + unsigned int nr_banks = banks->nr_banks;
>>>>>> + unsigned int bi;
>>>>>> +
>>>>>> + dprintk(XENLOG_DEBUG, "nr_banks(%u)\n", nr_banks);
>>>>>> +
>>>>>> + /*
>>>>>> + * At the moment, RISC-V's Linux kernel should be always position
>>>>>> + * independent based on "Per-MMU execution" of boot.rst:
>>>>>> + * https://docs.kernel.org/arch/riscv/boot.html#pre-mmu-execution
>>>>>> + *
>>>>>> + * But just for the case when RISC-V's Linux kernel isn't position
>>>>>> + * independent it is needed to take load address from
>>>>>> + * info->image.start.
>>>>>> + *
>>>>>> + * If `start` is zero, the Image is position independent.
>>>>>> + */
>>>>>> + if ( likely(!info->image.start) )
>>>>>> + {
>>>>>> + for ( bi = 0; bi != nr_banks; bi++ )
>>>>>> + {
>>>>>> + const struct membank *bank = &banks->bank[bi];
>>>>>> + paddr_t bank_start = bank->start;
>>>>>> + /*
>>>>>> + * According to boot.rst kernel load address should be properly
>>>>>> + * aligned:
>>>>>> + * https://docs.kernel.org/arch/riscv/boot.html#kernel-location
>>>>>> + *
>>>>>> + * As Image in this case is PIC we can ignore
>>>>>> + * info->image.text_offset.
>>>>>> + */
>>>>>> + paddr_t aligned_start = ROUNDUP(bank_start, KERNEL_LOAD_ADDR_ALIGNMENT);
>>>>>> + paddr_t bank_end = bank_start + bank->size;
>>>>>> + paddr_t bank_size;
>>>>>> +
>>>>>> + if ( aligned_start > bank_end )
>>>>>> + continue;
>>>>>> +
>>>>>> + bank_size = bank_end - aligned_start;
>>>>>> +
>>>>>> + dprintk(XENLOG_DEBUG, "bank[%u].start=%"PRIpaddr"\n", bi, bank->start);
>>>>>> +
>>>>>> + if ( image_size <= bank_size )
>>>>>> + {
>>>>>> + load_addr = aligned_start;
>>>>>> + break;
>>>>>> + }
>>>>>> + }
>>>>>> + }
>>>>>> + else
>>>>>> + {
>>>>>> + load_addr = info->image.start + info->image.text_offset;
>>>>>
>>>>> Why does stuff ahead of text_offset not need loading?
>>>>
>>>> Here we just calculating only a place where kernel will be loaded. The
>>>> full kernel image will be loaded in kernel_image_load().
>>>
>>> Okay, but if you calculate an address where the full image won't fit,
>>> how are things going to work?
>>
>> If the full image won't fit than the necessary bank won't be found in
>> for() loop below and so this kernel will be rejected.
>>
>> I expect that in the case when info->image.start is not 0 (so isn't
>> Image isn't PIC) Image want to be specifically load to info->image.start
>> + info->image.text_offset. Is it wrong statement?
>
> I don't know, but the adding in of .text_offset looks questionable to me.
> I simply don't know why such offsetting would be wanted / needed.
In the <linux>/Documentation/arch/riscv/boot-image-header.rst:
This header format is compliant with PE/COFF header and largely inspired
from ARM64 header. Thus, both ARM64 & RISC-V header can be combined into
one common header in future.
Than if to look at the comment near text_offset it is mentioned that:
/* Image load offset, little endian */
So it is basically Image load offset and that is why I expect it should
be added to calculation of load_addr.
Considering that boot image header is based on Arm64 I assume that an
explanation regarding text_offset for Arm64
(<linux>/Documentation/arch/arm64/booting.rst:) should be true for RISC-V:
```
The Image must be placed text_offset bytes from a 2MB aligned base
address anywhere in usable system RAM and called there. The region
between the 2 MB aligned base address and the start of the image has no
special significance to the kernel, and may be used for other purposes.
At least image_size bytes from the start of the image must be free for
use by the kernel.
```
~ Oleksii
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 07/12] xen: introduce domain-layout.h with common domain_use_host_layout()
2026-04-21 9:20 ` Jan Beulich
@ 2026-04-22 14:38 ` Oleksii Kurochko
0 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-22 14:38 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Stefano Stabellini, Julien Grall,
Bertrand Marquis, Michal Orzel, Volodymyr Babchuk, Andrew Cooper,
Anthony PERARD, Roger Pau Monné, xen-devel
On 4/21/26 11:20 AM, Jan Beulich wrote:
> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>> --- /dev/null
>> +++ b/xen/include/xen/domain-layout.h
>> @@ -0,0 +1,28 @@
>> +#ifndef __XEN_DOMAIN_LAYOUT_H__
>> +#define __XEN_DOMAIN_LAYOUT_H__
>> +
>> +#include <xen/domain.h>
>> +#include <xen/paging.h>
>> +#include <xen/sched.h>
>> +
>> +/*
>> + * Is a domain using the host memory layout?
>> + *
>> + * domain_use_host_layout() is always False for PV guests.
>> + *
>> + * Direct-mapped domains (autotranslated domains with memory allocated
>> + * contiguously and mapped 1:1 so that GFN == MFN) are always using the
>> + * host memory layout to avoid address clashes.
>
> What is "to avoid address clashes" about? If GFN == MFN, how could there
> be clashes?
>
>> + * The hardware domain will use the host layout (regardless of
>> + * direct-mapped) because some OS may rely on specific address ranges
>> + * for the devices. PV Dom0, like any other PV guests, has
>> + * domain_use_host_layout() returning False.
>
> This last sentence is somewhat redundant and somewhat in conflict with
> what is said further up. If you did s/guests/domains (including Dom0)/
> there, imo this sentence would best be dropped from down here.
I will change the comment to:
/*
* Is a domain using the host memory layout?
*
* domain_use_host_layout() is always False for PV domains (including
Dom0).
*
* Direct-mapped domains (autotranslated domains with memory allocated
* contiguously and mapped 1:1 so that GFN == MFN) must use the host
* memory layout since GFN == MFN by definition.
*
* The hardware domain will use the host layout (regardless of
* direct-mapped) because some OS may rely on specific address ranges
* for the devices.
*/
>
> With these adjustments:
> Acked-by: Jan Beulich <jbeulich@suse.com>
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 08/12] xen/riscv: rework G-stage mode handling
2026-04-21 9:39 ` Jan Beulich
@ 2026-04-22 16:06 ` Oleksii Kurochko
0 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-22 16:06 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/21/26 11:39 AM, Jan Beulich wrote:
> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>> --- a/xen/arch/riscv/p2m.c
>> +++ b/xen/arch/riscv/p2m.c
>> @@ -45,12 +45,27 @@ struct p2m_pte_ctx {
>> unsigned int level; /* Paging level at which the PTE resides. */
>> };
>>
>> -static struct gstage_mode_desc __ro_after_init max_gstage_mode = {
>> - .mode = HGATP_MODE_OFF,
>> - .paging_levels = 0,
>> - .name = "Bare",
>> +/* Values should be sorted by ->mode in this array */
>> +static const struct gstage_mode_desc modes[] = {
>
> As before, I'm of the clear opinion that this is too generic an identifier
> for use at file scope.
I can rename it to gstage_modes or p2m_modes to make it better suite
file scope.
>> --- a/xen/include/public/arch-riscv.h
>> +++ b/xen/include/public/arch-riscv.h
>> @@ -56,6 +56,11 @@ typedef struct vcpu_guest_context vcpu_guest_context_t;
>> DEFINE_XEN_GUEST_HANDLE(vcpu_guest_context_t);
>>
>> struct xen_arch_domainconfig {
>> + /*
>> + * G-stage MMU mode for the guest (e.g. "sv39", "sv48", "sv57").
>> + * Must be set; an empty string is invalid.
>> + */
>> + char gstage_mode[8];
>> };
>
> I have to say that I find it odd to use a string literal for this purpose.
> Specifying the number of wanted address bits would feel more natural. Plus
> the strings named aren't valid G-stage modes afaict - they lack the x4.
I will change it to unsigned char gstage_mode and put there on of the
value of HGATP_MODE_SV39X4, HGATP_MODE_SV48X4. Or just address bits as
you suggested will be fine too.
~ Oleksii
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits
2026-04-21 9:46 ` Jan Beulich
@ 2026-04-23 13:47 ` Oleksii Kurochko
2026-05-04 5:39 ` Jan Beulich
0 siblings, 1 reply; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-23 13:47 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/21/26 11:46 AM, Jan Beulich wrote:
> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>> common/device-tree/domain-build.c uses p2m_gpa_bits to determine the
>> upper bound of the GPA space when searching for unused regions, so it
>> must be defined when CONFIG_DOMAIN_BUILD_HELPERS=y.
>>
>> The variable is initialised to PADDR_BITS and narrowed in p2m_init() to
>> the GPA width of the selected G-stage mode, allowing an external entity
>> (e.g. an IOMMU) to restrict it further if needed.
>>
>> p2m_gpa_bits is a global rather than a per-domain value, which is
>> acceptable for now because all domains are required to use the same
>> G-stage MMU mode, as dom0less common code allocates it per all
>> domains.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> Pretty hesitantly:
Could you please clarify what specifically you don't like here?
> Acked-by: Jan Beulich <jbeulich@suse.com>
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 11/12] xen/riscv: add definition of guest RAM banks
2026-04-21 10:24 ` Jan Beulich
@ 2026-04-23 13:50 ` Oleksii Kurochko
0 siblings, 0 replies; 34+ messages in thread
From: Oleksii Kurochko @ 2026-04-23 13:50 UTC (permalink / raw)
To: Jan Beulich
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 4/21/26 12:24 PM, Jan Beulich wrote:
> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>> The dom0less solution uses defined RAM banks as compile-time constants,
>> so introduce macros to describe guest RAM banks.
>>
>> The reason for 2 banks is that there is typically always a use case for
>> low memory under 4 GB, but the bank under 4 GB ends up being small because
>> there are other things under 4 GB it can conflict with (interrupt
>> controller, PCI BARs, etc.). So a second bank is added above that MMIO
>> region (starting at 8 GiB) to provide the remaining RAM; the gap between
>> the two banks also exercises code paths handling discontiguous memory.
>> For Sv32 guests (34-bit GPA, 16 GiB addressable), bank0 provides 2 GB
>> (2–4 GB) and the first 8 GB of bank1 (8–16 GB) is accessible.
>>
>> Extended regions are useful for RISC-V: they could be used to provide a
>> "space" for Linux to map grant mappings.
>>
>> Despite the fact that for every guest MMU mode the GPA could be up
>> to 56 bits wide (except Sv32 whose GPA is 34 bits), the combined size
>> of both banks is limited to 1018 GB as it is more than enough for most
>> use cases.
>>
>> Add inclusion of asm/guest-layout.h to asm/domain.h to make dom0less
>> common code build happy.
>>
>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>
> Again pretty hesitantly:
I suppose beucase of hard-coded layout, right?
> Acked-by: Jan Beulich <jbeulich@suse.com>
Thanks.
~ Oleksii
^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits
2026-04-23 13:47 ` Oleksii Kurochko
@ 2026-05-04 5:39 ` Jan Beulich
0 siblings, 0 replies; 34+ messages in thread
From: Jan Beulich @ 2026-05-04 5:39 UTC (permalink / raw)
To: Oleksii Kurochko
Cc: Romain Caritey, Alistair Francis, Connor Davis, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Roger Pau Monné,
Stefano Stabellini, xen-devel
On 23.04.2026 15:47, Oleksii Kurochko wrote:
> On 4/21/26 11:46 AM, Jan Beulich wrote:
>> On 10.04.2026 17:54, Oleksii Kurochko wrote:
>>> common/device-tree/domain-build.c uses p2m_gpa_bits to determine the
>>> upper bound of the GPA space when searching for unused regions, so it
>>> must be defined when CONFIG_DOMAIN_BUILD_HELPERS=y.
>>>
>>> The variable is initialised to PADDR_BITS and narrowed in p2m_init() to
>>> the GPA width of the selected G-stage mode, allowing an external entity
>>> (e.g. an IOMMU) to restrict it further if needed.
>>>
>>> p2m_gpa_bits is a global rather than a per-domain value, which is
>>> acceptable for now because all domains are required to use the same
>>> G-stage MMU mode, as dom0less common code allocates it per all
>>> domains.
>>>
>>> Signed-off-by: Oleksii Kurochko <oleksii.kurochko@gmail.com>
>>
>> Pretty hesitantly:
>
> Could you please clarify what specifically you don't like here?
As per earlier comments, this not being per-guest, and implications thereof.
Jan
>> Acked-by: Jan Beulich <jbeulich@suse.com>
>
> Thanks.
>
> ~ Oleksii
^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2026-05-04 5:39 UTC | newest]
Thread overview: 34+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-04-10 15:54 [PATCH v3 00/12] RISCV: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 01/12] xen/riscv: implement get_page_from_gfn() Oleksii Kurochko
2026-04-20 14:17 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 02/12] xen: fix len type for guest copy functions Oleksii Kurochko
2026-04-20 15:43 ` Jan Beulich
2026-04-20 15:44 ` Jan Beulich
2026-04-22 10:02 ` Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 03/12] xen/riscv: implement copy_to_guest_phys() Oleksii Kurochko
2026-04-20 15:46 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 04/12] xen/dom0less: rename kernel_zimage_probe() to kernel_image_probe() Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 05/12] xen/riscv: add kernel loading support Oleksii Kurochko
2026-04-21 8:57 ` Jan Beulich
2026-04-22 11:45 ` Oleksii Kurochko
2026-04-22 11:57 ` Jan Beulich
2026-04-22 13:47 ` Oleksii Kurochko
2026-04-22 13:54 ` Jan Beulich
2026-04-22 14:32 ` Oleksii Kurochko
2026-04-21 9:01 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 06/12] xen: move declaration of fw_unreserved_regions() to common header Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 07/12] xen: introduce domain-layout.h with common domain_use_host_layout() Oleksii Kurochko
2026-04-21 9:20 ` Jan Beulich
2026-04-22 14:38 ` Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 08/12] xen/riscv: rework G-stage mode handling Oleksii Kurochko
2026-04-21 9:39 ` Jan Beulich
2026-04-22 16:06 ` Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 09/12] xen: rename p2m_ipa_bits to p2m_gpa_bits Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 10/12] xen/riscv: introduce p2m_gpa_bits Oleksii Kurochko
2026-04-21 9:46 ` Jan Beulich
2026-04-23 13:47 ` Oleksii Kurochko
2026-05-04 5:39 ` Jan Beulich
2026-04-10 15:54 ` [PATCH v3 11/12] xen/riscv: add definition of guest RAM banks Oleksii Kurochko
2026-04-21 10:24 ` Jan Beulich
2026-04-23 13:50 ` Oleksii Kurochko
2026-04-10 15:54 ` [PATCH v3 12/12] xen/riscv: enable DOMAIN_BUILD_HELPERS Oleksii Kurochko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.