* [RFC 00/38] Hyperlaunch domain builder
@ 2025-04-19 22:07 Daniel P. Smith
2025-04-19 22:07 ` [RFC 01/38] maintainers: add new section for hyperlaunch Daniel P. Smith
` (37 more replies)
0 siblings, 38 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, christopher.w.clark,
stefano.stabellini, agarciav
NOTE: This series is being sent as an RFC because it is a follow-on to the
hyperlaunch dom0 device tree series, which is currently going through rounds of
review. This iteration is based on v3 of that series, with the exception of one
fix that is included here and has already been incorporated into v4.
The Hyperlaunch domain builder series is the third series split out to introduce
the Hyperlaunch domain builder logic. These changes focus on adding the ability
to create multiple PVH domains at boot. The definitions for those domains will
come from the Device Tree capability introduced in the dom0 device tree series.
Documentation on Hyperlaunch:
https://wiki.xenproject.org/wiki/Hyperlaunch
Original Hyperlaunch v1 patch series:
https://lists.xenproject.org/archives/html/xen-devel/2022-07/msg00345.html
V/r,
Daniel P. Smith
Daniel P. Smith (38):
maintainers: add new section for hyperlaunch
x86/hyperlaunch: correct the naming of domain ramdisk field
x86/hyperlaunch: convert max vcpu determination to domain builder
x86/hyperlaunch: convert vcpu0 creation to domain builder
x86/hyperlaunch: move dom0 cpuid policy behind capability check
x86/hyperlaunch: add hardware domain capability support
x86/hyperlaunch: introduce pvh domain builder
x86/hyperlaunch: move initial hwdom setup to dom_construct_pvh
x86/boot: convert dom0 page calculation to use boot domain
x86/boot: refactor dom0 page calculation
x86/boot: generalize paging pages calculation
x86/boot: generalize compute number of domain pages
x86/hyperlaunch: move page computation to domain builder
x86/hyperlaunch: move pvh p2m init to domain builder
x86/hyperlaunch: move iommu init to domain builder
x86/boot: move and rename sched_setup_dom0_vcpus
x86/hyperlaunch: move pvh_setup_cpus to domain builder
x86/boot: rename pvh acpi setup function
x86/hyperlaunch: add domu memory map construction
x86/hyperlaunch: move populating p2m under domain builder
x86/hyperlaunch: move remaining pvh dom0 construction
x86/hyperlaunch: relocate pvh_steal_ram to domain builder
x86/hyperlaunch: add domu acpi construction
x86/boot: export command line processing
x86/hyperlaunch: convert create_dom0 to arch_create_dom
x86/hyperlaunch: remove dom0-isms from arch_create_dom
x86/hyperlaunch: introduce domain builder general dom creation
x86/hyperlaunch: add xenstore boot capabilities flag
x86/hyperlaunch: allocate console for domu
x86/hyperlaunch: allocate xenstore for domu
x86/hyperlaunch: move boot module discard to domain builder
x86/hyperlaunch: introduce concept of core domains
x86/boot: refactor bzimage parser to be re-entrant
x86/hyperlaunch: introduce multidomain kconfig option
x86/hyperlaunch: add multidomain construction logic
x86/hyperlaunch: enable unpausing multiple domains
x86/hyperlaunch: generalize domid assignment
tools: introduce hyperlaunch domain late init
.gitignore | 1 +
MAINTAINERS | 9 +
tools/helpers/Makefile | 12 +
tools/helpers/late-init-domains.c | 364 +++++++
tools/helpers/late-init-domains.h | 18 +
tools/helpers/xs-helpers.c | 117 +++
tools/helpers/xs-helpers.h | 26 +
xen/arch/x86/bzimage.c | 38 +-
xen/arch/x86/dom0_build.c | 120 +--
xen/arch/x86/domain-builder/Kconfig | 12 +
xen/arch/x86/domain-builder/Makefile | 1 +
xen/arch/x86/domain-builder/core.c | 127 ++-
xen/arch/x86/domain-builder/domain.c | 421 ++++++++
xen/arch/x86/domain-builder/fdt.c | 37 +-
xen/arch/x86/hvm/Makefile | 1 +
xen/arch/x86/hvm/dom0_build.c | 600 +-----------
xen/arch/x86/hvm/dom_build.c | 1063 +++++++++++++++++++++
xen/arch/x86/include/asm/boot-domain.h | 21 +-
xen/arch/x86/include/asm/bootinfo.h | 28 +-
xen/arch/x86/include/asm/bzimage.h | 5 +-
xen/arch/x86/include/asm/dom0_build.h | 19 +-
xen/arch/x86/include/asm/domain-builder.h | 29 +
xen/arch/x86/include/asm/setup.h | 4 +-
xen/arch/x86/pv/dom0_build.c | 19 +-
xen/arch/x86/setup.c | 207 +---
xen/common/sched/core.c | 12 -
xen/include/xen/sched.h | 1 -
27 files changed, 2412 insertions(+), 900 deletions(-)
create mode 100644 tools/helpers/late-init-domains.c
create mode 100644 tools/helpers/late-init-domains.h
create mode 100644 tools/helpers/xs-helpers.c
create mode 100644 tools/helpers/xs-helpers.h
create mode 100644 xen/arch/x86/domain-builder/domain.c
create mode 100644 xen/arch/x86/hvm/dom_build.c
--
2.30.2
* [RFC 01/38] maintainers: add new section for hyperlaunch
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 02/38] x86/hyperlaunch: correct the naming of domain ramdisk field Daniel P. Smith
` (36 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
Add a new section to MAINTAINERS for hyperlaunch, covering the files added so
far as part of the hyperlaunch work.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
MAINTAINERS | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index c11b82eca98f..7de671efe512 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -343,6 +343,15 @@ F: tools/misc/xenhypfs.c
F: xen/common/hypfs.c
F: xen/include/xen/hypfs.h
+HYPERLAUNCH
+M: Daniel P. Smith <dpsmith@apertussolutions.com>
+M: Christopher Clark <christopher.w.clark@gmail.com>
+S: Supported
+F: xen/arch/x86/domain-builder/
+F: xen/arch/x86/include/asm/bootinfo.h
+F: xen/arch/x86/include/asm/boot-domain.h
+F: xen/arch/x86/include/asm/domain-builder.h
+
IMX8QM/QXP SUPPORT
R: John Ernberg <john.ernberg@actia.se>
F: xen/arch/arm/platforms/imx8qm.c
--
2.30.2
* [RFC 02/38] x86/hyperlaunch: correct the naming of domain ramdisk field
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
2025-04-19 22:07 ` [RFC 01/38] maintainers: add new section for hyperlaunch Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder Daniel P. Smith
` (35 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
The ramdisk field was renamed to module without a sound justification. Doing so
creates an unnecessary indirection that causes more confusion than it provides
utility. The field is only ever populated by matching a boot module of type
BOOTMOD_RAMDISK, and every usage of the field assigns it to a variable named
initrd. The rename attempted to generalize the field name on the grounds that it
could be multiplexed for other module types, but no such usage exists. As a
result, there has been no consideration of how that multiplexing would work, or
how it would be deconflicted with the simultaneous presence of a ramdisk.
Rename the field back to ramdisk so that the current code flow is logical. Should
a use case arise later where additional modules need to be passed to a domain, a
more appropriate framework will be crafted, which will likely be more complicated
than simply renaming the field to something other than ramdisk.
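To illustrate why a dedicated ramdisk field is unambiguous, here is a minimal,
self-contained sketch of the population rule described above: the field is only
ever set from a module whose type is BOOTMOD_RAMDISK. The structures are heavily
simplified, and the helpers first_module_of_type() and attach_ramdisk() are
hypothetical; they are loosely modelled on, but are not, Xen's
first_boot_module_index().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for Xen's boot module types. */
enum bootmod_type { BOOTMOD_UNKNOWN, BOOTMOD_KERNEL, BOOTMOD_RAMDISK };

struct boot_module {
    enum bootmod_type type;
};

struct boot_domain {
    struct boot_module *kernel;
    struct boot_module *ramdisk;   /* only ever points at a BOOTMOD_RAMDISK */
};

/* Return the index of the first module of the given type, or nr if none. */
static size_t first_module_of_type(const struct boot_module *mods, size_t nr,
                                   enum bootmod_type type)
{
    size_t i;

    for ( i = 0; i < nr; i++ )
        if ( mods[i].type == type )
            return i;

    return nr;
}

/* Attach the first ramdisk module, if any, to the boot domain. */
static void attach_ramdisk(struct boot_domain *bd, struct boot_module *mods,
                           size_t nr)
{
    size_t idx = first_module_of_type(mods, nr, BOOTMOD_RAMDISK);

    bd->ramdisk = idx < nr ? &mods[idx] : NULL;

    /* Postcondition: the field never refers to a non-ramdisk module. */
    assert(bd->ramdisk == NULL || bd->ramdisk->type == BOOTMOD_RAMDISK);
}
```

Because the field can only hold a ramdisk, a consumer such as pvh_load_kernel()
can read it straight into an initrd variable without first checking a type tag.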
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/fdt.c | 4 ++--
xen/arch/x86/hvm/dom0_build.c | 2 +-
xen/arch/x86/include/asm/boot-domain.h | 2 +-
xen/arch/x86/pv/dom0_build.c | 2 +-
xen/arch/x86/setup.c | 2 +-
5 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/xen/arch/x86/domain-builder/fdt.c b/xen/arch/x86/domain-builder/fdt.c
index dbfbcffb0a9c..d2a7e010ea77 100644
--- a/xen/arch/x86/domain-builder/fdt.c
+++ b/xen/arch/x86/domain-builder/fdt.c
@@ -310,7 +310,7 @@ static int __init process_domain_node(
{
int idx;
- if ( bd->module )
+ if ( bd->ramdisk )
{
printk(XENLOG_ERR "Duplicate ramdisk module for domain %s)\n",
name);
@@ -328,7 +328,7 @@ static int __init process_domain_node(
printk(" ramdisk: boot module %d\n", idx);
bi->mods[idx].type = BOOTMOD_RAMDISK;
- bd->module = &bi->mods[idx];
+ bd->ramdisk = &bi->mods[idx];
continue;
}
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index ebad5a49b8d4..176b253d3c61 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -648,7 +648,7 @@ static int __init pvh_load_kernel(
{
struct domain *d = bd->d;
struct boot_module *image = bd->kernel;
- struct boot_module *initrd = bd->module;
+ struct boot_module *initrd = bd->ramdisk;
void *image_base = bootstrap_map_bm(image);
void *image_start = image_base + image->headroom;
unsigned long image_len = image->size;
diff --git a/xen/arch/x86/include/asm/boot-domain.h b/xen/arch/x86/include/asm/boot-domain.h
index 29a7d806de1a..b0f956cd734e 100644
--- a/xen/arch/x86/include/asm/boot-domain.h
+++ b/xen/arch/x86/include/asm/boot-domain.h
@@ -29,7 +29,7 @@ struct boot_domain {
unsigned int max_vcpus;
struct boot_module *kernel;
- struct boot_module *module;
+ struct boot_module *ramdisk;
const char *cmdline;
struct domain *d;
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index e1b78d47c218..3b2baf057b75 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -375,7 +375,7 @@ static int __init dom0_construct(const struct boot_domain *bd)
struct vcpu *v = d->vcpu[0];
struct boot_module *image = bd->kernel;
- struct boot_module *initrd = bd->module;
+ struct boot_module *initrd = bd->ramdisk;
void *image_base;
unsigned long image_len;
void *image_start;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 1e7bbc415e1d..6a939ccede3f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2165,7 +2165,7 @@ void asmlinkage __init noreturn __start_xen(void)
if ( !bi->hyperlaunch_enabled && initrdidx < MAX_NR_BOOTMODS )
{
bi->mods[initrdidx].type = BOOTMOD_RAMDISK;
- bi->domains[0].module = &bi->mods[initrdidx];
+ bi->domains[0].ramdisk = &bi->mods[initrdidx];
if ( first_boot_module_index(bi, BOOTMOD_UNKNOWN) < MAX_NR_BOOTMODS )
printk(XENLOG_WARNING
"Multiple initrd candidates, picking module #%u\n",
--
2.30.2
* [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
2025-04-19 22:07 ` [RFC 01/38] maintainers: add new section for hyperlaunch Daniel P. Smith
2025-04-19 22:07 ` [RFC 02/38] x86/hyperlaunch: correct the naming of domain ramdisk field Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-22 20:36 ` Jason Andryuk
2025-04-19 22:07 ` [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation " Daniel P. Smith
` (34 subsequent siblings)
37 siblings, 1 reply; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
The domain configuration may request more vcpus than are present in the system.
For dom0, dom0_max_vcpus() was used to clamp the count down to the physically
available vcpus. Introduce a generalized version, dom_max_vcpus(), that takes a
boot domain and sets its max vcpus to the lesser of the requested maximum and
the available vcpus. A request of zero is treated as a request for the limit,
and for the control domain the limit continues to come from dom0_max_vcpus().
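The clamping rule in dom_max_vcpus() can be modelled in isolation. The sketch
below is a simplified, standalone approximation: struct vcpu_limits and
clamp_max_vcpus() are hypothetical, and the three limit fields stand in for
MAX_VIRT_CPUS/HVM_MAX_VCPUS, the weight of cpupool0's valid CPU mask, and the
value returned by dom0_max_vcpus().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the limits consulted by dom_max_vcpus(). */
struct vcpu_limits {
    unsigned int mode_max;      /* per-mode maximum (PV vs HVM) */
    unsigned int online_cpus;   /* CPUs available in the boot cpupool */
    unsigned int dom0_limit;    /* result of the dom0 command line logic */
};

static unsigned int clamp_max_vcpus(unsigned int requested, bool is_control,
                                    const struct vcpu_limits *l)
{
    unsigned int limit;

    /* The control domain keeps the existing dom0 policy. */
    if ( is_control )
        limit = l->dom0_limit;
    else
        limit = l->mode_max < l->online_cpus ? l->mode_max : l->online_cpus;

    assert(limit > 0);

    /* A request of 0 means "no preference": fall back to the limit. */
    if ( requested == 0 || requested > limit )
        return limit;

    return requested;
}
```

For example, with 8 usable CPUs a non-control domain requesting 16 vcpus is
clamped to 8, while one requesting 2 keeps its request.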
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/Makefile | 1 +
xen/arch/x86/domain-builder/domain.c | 38 +++++++++++++++++++++++
xen/arch/x86/include/asm/domain-builder.h | 1 +
xen/arch/x86/setup.c | 4 +--
4 files changed, 42 insertions(+), 2 deletions(-)
create mode 100644 xen/arch/x86/domain-builder/domain.c
diff --git a/xen/arch/x86/domain-builder/Makefile b/xen/arch/x86/domain-builder/Makefile
index b10cd56b286b..67024b5cb213 100644
--- a/xen/arch/x86/domain-builder/Makefile
+++ b/xen/arch/x86/domain-builder/Makefile
@@ -1,2 +1,3 @@
obj-$(CONFIG_DOMAIN_BUILDER) += fdt.init.o
obj-y += core.init.o
+obj-y += domain.init.o
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
new file mode 100644
index 000000000000..f2277b9e3cf3
--- /dev/null
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2024, Apertus Solutions, LLC
+ */
+
+#include <xen/cpumask.h>
+#include <xen/domain.h>
+#include <xen/init.h>
+#include <xen/sched.h>
+
+#include <asm/bootinfo.h>
+
+unsigned int __init dom_max_vcpus(struct boot_domain *bd)
+{
+ unsigned int limit = bd->mode & BUILD_MODE_PARAVIRT ?
+ MAX_VIRT_CPUS : HVM_MAX_VCPUS;
+
+ if ( bd->capabilities & BUILD_CAPS_CONTROL )
+ limit = dom0_max_vcpus();
+ else
+ limit = min(limit,
+ (uint32_t)cpumask_weight(cpupool_valid_cpus(cpupool0)));
+
+ if ( bd->max_vcpus == 0 || bd->max_vcpus > limit )
+ bd->max_vcpus = limit;
+
+ return bd->max_vcpus;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index 7518b6ddf3b9..f37f73e2255b 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -8,5 +8,6 @@ int __init builder_get_cmdline(
struct boot_info *bi, int offset, char *cmdline, size_t size);
void builder_init(struct boot_info *bi);
+unsigned int dom_max_vcpus(struct boot_domain *bd);
#endif
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 6a939ccede3f..86bbd7c72ccd 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1010,18 +1010,18 @@ static struct domain *__init create_dom0(struct boot_info *bi)
char *cmdline = NULL;
size_t cmdline_size;
unsigned int create_flags = 0;
+ struct boot_domain *bd = &bi->domains[0];
struct xen_domctl_createdomain dom0_cfg = {
.flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
.max_evtchn_port = -1,
.max_grant_frames = -1,
.max_maptrack_frames = -1,
.grant_opts = XEN_DOMCTL_GRANT_version(opt_gnttab_max_version),
- .max_vcpus = dom0_max_vcpus(),
+ .max_vcpus = dom_max_vcpus(bd),
.arch = {
.misc_flags = opt_dom0_msr_relaxed ? XEN_X86_MSR_RELAXED : 0,
},
};
- struct boot_domain *bd = &bi->domains[0];
struct domain *d;
if ( opt_dom0_pvh ||
--
2.30.2
* [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation to domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (2 preceding siblings ...)
2025-04-19 22:07 ` [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-25 15:22 ` Alejandro Vallejo
2025-04-19 22:07 ` [RFC 05/38] x86/hyperlaunch: move dom0 cpuid policy behind capability check Daniel P. Smith
` (33 subsequent siblings)
37 siblings, 1 reply; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Convert alloc_dom0_vcpu0() into dom0_set_affinity(), so that it only sets up the
node affinity based on the command line parameters passed. At the same time,
introduce alloc_dom_vcpu0() as the replacement for alloc_dom0_vcpu0(). Have
alloc_dom_vcpu0() call dom0_set_affinity() when the boot domain is the control
domain; otherwise, enable automatic node affinity.
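The affinity decision made by alloc_dom_vcpu0() before it creates vcpu 0 can be
sketched as below. This is a hypothetical, standalone model: struct
dom_affinity replaces struct domain, a plain bitmask replaces the node mask, and
the command line inputs (dom0_nodes, dom0_nr_pxms in Xen) are passed in as
parameters. The vcpu_create() call itself is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of the fields touched by the patch. */
struct dom_affinity {
    unsigned long node_mask;   /* stand-in for d->node_affinity */
    bool auto_node_affinity;   /* stand-in for d->auto_node_affinity */
};

/*
 * Model of dom0_set_affinity(): use the nodes selected on the command line,
 * and only fall back to automatic affinity when no PXMs were given.
 */
static void model_dom0_set_affinity(struct dom_affinity *a,
                                    unsigned long cmdline_nodes,
                                    unsigned int nr_cmdline_pxms)
{
    a->node_mask = cmdline_nodes;
    a->auto_node_affinity = nr_cmdline_pxms == 0;
}

/* Model of the affinity handling in alloc_dom_vcpu0(). */
static void model_vcpu0_affinity(struct dom_affinity *a, bool is_control,
                                 unsigned long cmdline_nodes,
                                 unsigned int nr_cmdline_pxms)
{
    if ( is_control )
        model_dom0_set_affinity(a, cmdline_nodes, nr_cmdline_pxms);
    else
        a->auto_node_affinity = true;

    /* Non-control domains never inherit the dom0 command line nodes. */
    assert(is_control || a->auto_node_affinity);
}
```

The design keeps the dom0 command line options scoped to the control domain, so
additional boot domains are left to the scheduler's automatic placement.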
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 4 +---
xen/arch/x86/domain-builder/domain.c | 11 +++++++++++
xen/arch/x86/include/asm/dom0_build.h | 2 ++
xen/arch/x86/include/asm/domain-builder.h | 1 +
xen/arch/x86/setup.c | 5 +++--
5 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 7b3e31a08f7d..77386cd1e20c 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -254,12 +254,10 @@ unsigned int __init dom0_max_vcpus(void)
return max_vcpus;
}
-struct vcpu *__init alloc_dom0_vcpu0(struct domain *dom0)
+void __init dom0_set_affinity(struct domain *dom0)
{
dom0->node_affinity = dom0_nodes;
dom0->auto_node_affinity = !dom0_nr_pxms;
-
- return vcpu_create(dom0, 0);
}
#ifdef CONFIG_SHADOW_PAGING
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index f2277b9e3cf3..619d36ea0b87 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -9,6 +9,7 @@
#include <xen/sched.h>
#include <asm/bootinfo.h>
+#include <asm/dom0_build.h>
unsigned int __init dom_max_vcpus(struct boot_domain *bd)
{
@@ -27,6 +28,16 @@ unsigned int __init dom_max_vcpus(struct boot_domain *bd)
return bd->max_vcpus;
}
+struct vcpu *__init alloc_dom_vcpu0(struct boot_domain *bd)
+{
+ if ( bd->capabilities & BUILD_CAPS_CONTROL )
+ dom0_set_affinity(bd->d);
+ else
+ bd->d->auto_node_affinity = true;
+
+ return vcpu_create(bd->d, 0);
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index ff021c24af9d..426def4115ce 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -8,6 +8,8 @@
extern unsigned int dom0_memflags;
+void dom0_set_affinity(struct domain *dom0);
+
unsigned long dom0_compute_nr_pages(struct domain *d,
struct elf_dom_parms *parms,
unsigned long initrd_len);
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index f37f73e2255b..dd47e9ac0dc6 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -9,5 +9,6 @@ int __init builder_get_cmdline(
void builder_init(struct boot_info *bi);
unsigned int dom_max_vcpus(struct boot_domain *bd);
+struct vcpu *alloc_dom_vcpu0(struct boot_domain *bd);
#endif
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 86bbd7c72ccd..8ba9d592ed5a 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1048,9 +1048,11 @@ static struct domain *__init create_dom0(struct boot_info *bi)
if ( IS_ERR(d) )
panic("Error creating d%u: %ld\n", bd->domid, PTR_ERR(d));
+ bd->d = d;
+
init_dom0_cpuid_policy(d);
- if ( alloc_dom0_vcpu0(d) == NULL )
+ if ( alloc_dom_vcpu0(bd) == NULL )
panic("Error creating %pd vcpu 0\n", d);
cmdline_size = domain_cmdline_size(bi, bd);
@@ -1090,7 +1092,6 @@ static struct domain *__init create_dom0(struct boot_info *bi)
bd->cmdline = cmdline;
}
- bd->d = d;
if ( construct_dom0(bd) != 0 )
panic("Could not construct domain 0\n");
--
2.30.2
* [RFC 05/38] x86/hyperlaunch: move dom0 cpuid policy behind capability check
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (3 preceding siblings ...)
2025-04-19 22:07 ` [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation " Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 06/38] x86/hyperlaunch: add hardware domain capability support Daniel P. Smith
` (32 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
As a step toward converting create_dom0() into a generalized domain
construction function, gate the dom0-specific CPUID policy on the control
domain capability.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/setup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 8ba9d592ed5a..41e46f05a520 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1050,7 +1050,8 @@ static struct domain *__init create_dom0(struct boot_info *bi)
bd->d = d;
- init_dom0_cpuid_policy(d);
+ if ( bd->capabilities & BUILD_CAPS_CONTROL )
+ init_dom0_cpuid_policy(bd->d);
if ( alloc_dom_vcpu0(bd) == NULL )
panic("Error creating %pd vcpu 0\n", d);
--
2.30.2
* [RFC 06/38] x86/hyperlaunch: add hardware domain capability support
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (4 preceding siblings ...)
2025-04-19 22:07 ` [RFC 05/38] x86/hyperlaunch: move dom0 cpuid policy behind capability check Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 07/38] x86/hyperlaunch: introduce pvh domain builder Daniel P. Smith
` (31 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Introduce a hardware capability flag, BUILD_CAPS_HARDWARE, for the capabilities
property of a domain node in the hyperlaunch device tree. During domain
construction, use the flag to determine whether CDF_hardware is set in the
domain creation flags. XEN_DOMCTL_CDF_iommu is now only requested for a domain
that has this capability.
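The mapping from capability bits to creation flags can be sketched as a small
pure function. The BUILD_CAPS_* values mirror the ones this series adds to
boot-domain.h; MODEL_CDF_HARDWARE, MODEL_CFG_IOMMU, struct create_request and
caps_to_flags() are hypothetical stand-ins for CDF_hardware,
XEN_DOMCTL_CDF_iommu and the create_dom0() logic, not real Xen definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability bits, mirroring boot-domain.h in this series. */
#define BUILD_CAPS_CONTROL  (1U << 0)
#define BUILD_CAPS_HARDWARE (1U << 1)

/* Hypothetical stand-ins for CDF_hardware and XEN_DOMCTL_CDF_iommu. */
#define MODEL_CDF_HARDWARE  (1U << 0)
#define MODEL_CFG_IOMMU     (1U << 0)

struct create_request {
    unsigned int create_flags;  /* internal domain creation flags */
    uint32_t cfg_flags;         /* xen_domctl_createdomain.flags */
};

static struct create_request caps_to_flags(uint32_t caps, bool iommu_enabled)
{
    struct create_request req = { 0, 0 };

    /* Only a domain carrying the hardware capability gets hardware access. */
    if ( caps & BUILD_CAPS_HARDWARE )
    {
        req.create_flags |= MODEL_CDF_HARDWARE;
        if ( iommu_enabled )
            req.cfg_flags |= MODEL_CFG_IOMMU;
    }

    /* An IOMMU request implies the domain was marked as hardware. */
    assert(!(req.cfg_flags & MODEL_CFG_IOMMU) ||
           (req.create_flags & MODEL_CDF_HARDWARE));

    return req;
}
```

Under this model a control-only domain receives neither flag, which matches the
intent of separating the control and hardware roles.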
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/core.c | 3 ++-
xen/arch/x86/domain-builder/fdt.c | 2 ++
xen/arch/x86/include/asm/boot-domain.h | 1 +
xen/arch/x86/setup.c | 8 ++++++--
4 files changed, 11 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
index 6ab4e6fe5333..8d137ecaaf84 100644
--- a/xen/arch/x86/domain-builder/core.c
+++ b/xen/arch/x86/domain-builder/core.c
@@ -96,7 +96,8 @@ void __init builder_init(struct boot_info *bi)
i = first_boot_module_index(bi, BOOTMOD_UNKNOWN);
bi->mods[i].type = BOOTMOD_KERNEL;
bi->domains[0].kernel = &bi->mods[i];
- bi->domains[0].capabilities |= BUILD_CAPS_CONTROL;
+ bi->domains[0].capabilities |=
+ (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE);
bi->nr_domains = 1;
}
}
diff --git a/xen/arch/x86/domain-builder/fdt.c b/xen/arch/x86/domain-builder/fdt.c
index d2a7e010ea77..1af58ed4eb86 100644
--- a/xen/arch/x86/domain-builder/fdt.c
+++ b/xen/arch/x86/domain-builder/fdt.c
@@ -267,6 +267,8 @@ static int __init process_domain_node(
printk(" caps: ");
if ( bd->capabilities & BUILD_CAPS_CONTROL )
printk("c");
+ if ( bd->capabilities & BUILD_CAPS_HARDWARE )
+ printk(" h");
printk("\n");
}
}
diff --git a/xen/arch/x86/include/asm/boot-domain.h b/xen/arch/x86/include/asm/boot-domain.h
index b0f956cd734e..18d144de2fea 100644
--- a/xen/arch/x86/include/asm/boot-domain.h
+++ b/xen/arch/x86/include/asm/boot-domain.h
@@ -15,6 +15,7 @@ struct boot_domain {
#define BUILD_CAPS_NONE (0)
#define BUILD_CAPS_CONTROL (1 << 0)
+#define BUILD_CAPS_HARDWARE (1 << 1)
uint32_t capabilities;
/* On | Off */
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 41e46f05a520..760262ea3d4c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1035,8 +1035,12 @@ static struct domain *__init create_dom0(struct boot_info *bi)
XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
}
- if ( iommu_enabled )
- dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
+ if ( bd->capabilities & BUILD_CAPS_HARDWARE )
+ {
+ create_flags |= CDF_hardware;
+ if ( iommu_enabled )
+ dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
+ }
if ( bd->domid == DOMID_INVALID )
/* Create initial domain. Not d0 for pvshim. */
--
2.30.2
* [RFC 07/38] x86/hyperlaunch: introduce pvh domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (5 preceding siblings ...)
2025-04-19 22:07 ` [RFC 06/38] x86/hyperlaunch: add hardware domain capability support Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 08/38] x86/hyperlaunch: move initial hwdom setup to dom_construct_pvh Daniel P. Smith
` (30 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Introduce dom_construct_pvh() as a wrapper around dom0_construct_pvh(). This
function will be expanded as the dom0-specific construction functions are
generalized.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 5 ++--
xen/arch/x86/hvm/Makefile | 1 +
xen/arch/x86/hvm/dom_build.c | 31 +++++++++++++++++++++++
xen/arch/x86/include/asm/domain-builder.h | 2 ++
xen/arch/x86/include/asm/setup.h | 2 +-
5 files changed, 38 insertions(+), 3 deletions(-)
create mode 100644 xen/arch/x86/hvm/dom_build.c
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 77386cd1e20c..08fde953a1e0 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -15,6 +15,7 @@
#include <asm/amd.h>
#include <asm/bootinfo.h>
#include <asm/dom0_build.h>
+#include <asm/domain-builder.h>
#include <asm/guest.h>
#include <asm/hpet.h>
#include <asm/hvm/emulate.h>
@@ -613,7 +614,7 @@ int __init dom0_setup_permissions(struct domain *d)
return rc;
}
-int __init construct_dom0(const struct boot_domain *bd)
+int __init construct_dom0(struct boot_domain *bd)
{
int rc;
const struct domain *d = bd->d;
@@ -637,7 +638,7 @@ int __init construct_dom0(const struct boot_domain *bd)
opt_dom0_max_vcpus_max = bd->max_vcpus;
if ( is_hvm_domain(d) )
- rc = dom0_construct_pvh(bd);
+ rc = dom_construct_pvh(bd);
else if ( is_pv_domain(d) )
rc = dom0_construct_pv(bd);
else
diff --git a/xen/arch/x86/hvm/Makefile b/xen/arch/x86/hvm/Makefile
index 4c1fa5c6c2bf..2c1662b66897 100644
--- a/xen/arch/x86/hvm/Makefile
+++ b/xen/arch/x86/hvm/Makefile
@@ -5,6 +5,7 @@ obj-y += viridian/
obj-y += asid.o
obj-y += dm.o
obj-bin-y += dom0_build.init.o
+obj-bin-y += dom_build.init.o
obj-y += domain.o
obj-y += emulate.o
obj-$(CONFIG_GRANT_TABLE) += grant_table.o
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
new file mode 100644
index 000000000000..7206815d64a9
--- /dev/null
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * hvm/dom_build.c
+ *
+ * Dom builder for PVH guest.
+ *
+ * Copyright (C) 2017 Citrix Systems R&D
+ * Copyright (C) 2024 Apertus Solutions, LLC
+ */
+
+#include <xen/init.h>
+
+#include <asm/bootinfo.h>
+#include <asm/dom0_build.h>
+
+int __init dom_construct_pvh(struct boot_domain *bd)
+{
+ printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", bd->domid);
+
+ return dom0_construct_pvh(bd);
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index dd47e9ac0dc6..ccfa4bd82acd 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -11,4 +11,6 @@ void builder_init(struct boot_info *bi);
unsigned int dom_max_vcpus(struct boot_domain *bd);
struct vcpu *alloc_dom_vcpu0(struct boot_domain *bd);
+int dom_construct_pvh(struct boot_domain *bd);
+
#endif
diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index ac34c698551e..b517da6144de 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -27,7 +27,7 @@ void subarch_init_memory(void);
void init_IRQ(void);
struct boot_domain;
-int construct_dom0(const struct boot_domain *bd);
+int construct_dom0(struct boot_domain *bd);
void setup_io_bitmap(struct domain *d);
--
2.30.2
* [RFC 08/38] x86/hyperlaunch: move initial hwdom setup to dom_construct_pvh
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (6 preceding siblings ...)
2025-04-19 22:07 ` [RFC 07/38] x86/hyperlaunch: introduce pvh domain builder Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 09/38] x86/boot: convert dom0 page calculation to use boot domain Daniel P. Smith
` (29 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Relocate the initial block of hwdom setup code from dom0_construct_pvh() over
to dom_construct_pvh().
No functional change.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom0_build.c | 44 ------------------------------
xen/arch/x86/hvm/dom_build.c | 50 +++++++++++++++++++++++++++++++++++
2 files changed, 50 insertions(+), 44 deletions(-)
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 176b253d3c61..b4ffebdde00e 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1312,56 +1312,12 @@ static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
return 0;
}
-static void __hwdom_init pvh_setup_mmcfg(struct domain *d)
-{
- unsigned int i;
- int rc;
-
- for ( i = 0; i < pci_mmcfg_config_num; i++ )
- {
- rc = register_vpci_mmcfg_handler(d, pci_mmcfg_config[i].address,
- pci_mmcfg_config[i].start_bus_number,
- pci_mmcfg_config[i].end_bus_number,
- pci_mmcfg_config[i].pci_segment);
- if ( rc )
- printk("Unable to setup MMCFG handler at %#lx for segment %u\n",
- pci_mmcfg_config[i].address,
- pci_mmcfg_config[i].pci_segment);
- }
-}
-
int __init dom0_construct_pvh(const struct boot_domain *bd)
{
paddr_t entry, start_info;
struct domain *d = bd->d;
int rc;
- printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", d->domain_id);
-
- if ( bd->kernel == NULL )
- panic("Missing kernel boot module for %pd construction\n", d);
-
- if ( is_hardware_domain(d) )
- {
- /*
- * MMCFG initialization must be performed before setting domain
- * permissions, as the MCFG areas must not be part of the domain IOMEM
- * accessible regions.
- */
- pvh_setup_mmcfg(d);
-
- /*
- * Setup permissions early so that calls to add MMIO regions to the
- * p2m as part of vPCI setup don't fail due to permission checks.
- */
- rc = dom0_setup_permissions(d);
- if ( rc )
- {
- printk("%pd unable to setup permissions: %d\n", d, rc);
- return rc;
- }
- }
-
/*
* Craft dom0 physical memory map and set the paging allocation. This must
* be done before the iommu initializion, since iommu initialization code
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 7206815d64a9..b6e534226434 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -8,15 +8,65 @@
* Copyright (C) 2024 Apertus Solutions, LLC
*/
+#include <xen/acpi.h>
#include <xen/init.h>
+#include <xen/types.h>
+
+#include <acpi/actables.h>
#include <asm/bootinfo.h>
#include <asm/dom0_build.h>
+#include <asm/hvm/io.h>
+#include <asm/pci.h>
+
+static void __hwdom_init pvh_setup_mmcfg(struct domain *d)
+{
+ unsigned int i;
+ int rc;
+
+ for ( i = 0; i < pci_mmcfg_config_num; i++ )
+ {
+ rc = register_vpci_mmcfg_handler(d, pci_mmcfg_config[i].address,
+ pci_mmcfg_config[i].start_bus_number,
+ pci_mmcfg_config[i].end_bus_number,
+ pci_mmcfg_config[i].pci_segment);
+ if ( rc )
+ printk("Unable to setup MMCFG handler at %#lx for segment %u\n",
+ pci_mmcfg_config[i].address,
+ pci_mmcfg_config[i].pci_segment);
+ }
+}
int __init dom_construct_pvh(struct boot_domain *bd)
{
+ int rc;
+
printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", bd->domid);
+ if ( bd->kernel == NULL )
+ panic("Missing kernel boot module for %pd construction\n", bd->d);
+
+ if ( is_hardware_domain(bd->d) )
+ {
+ /*
+ * MMCFG initialization must be performed before setting domain
+ * permissions, as the MCFG areas must not be part of the domain IOMEM
+ * accessible regions.
+ */
+ pvh_setup_mmcfg(bd->d);
+
+ /*
+ * Setup permissions early so that calls to add MMIO regions to the
+ * p2m as part of vPCI setup don't fail due to permission checks.
+ */
+ rc = dom0_setup_permissions(bd->d);
+ if ( rc )
+ {
+ printk("%pd unable to setup permissions: %d\n", bd->d, rc);
+ return rc;
+ }
+ }
+
return dom0_construct_pvh(bd);
}
--
2.30.2
* [RFC 09/38] x86/boot: convert dom0 page calculation to use boot domain
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Rework dom0_compute_nr_pages() to consume a boot domain structure, which may
carry the requested memory pages, min pages, max pages, and a reference to the
initrd. Since the initrd size can now be taken from the boot domain, the
initrd_len parameter is dropped. This is safe for the PVH case, where 0 was
previously passed, because initrd_len is only used behind the is_pv_domain()
check.
It also introduces calculate_dom0_pages(), which applies any command line
override of the memory, min, and max page values and then clamps the memory
page count to the min/max limits and to the available memory.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 62 ++++++++++++++++-----------
xen/arch/x86/hvm/dom0_build.c | 12 +++---
xen/arch/x86/include/asm/dom0_build.h | 10 ++---
xen/arch/x86/pv/dom0_build.c | 6 +--
4 files changed, 51 insertions(+), 39 deletions(-)
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 08fde953a1e0..658d81ab598c 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -330,16 +330,37 @@ static unsigned long __init default_nr_pages(unsigned long avail)
: min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
}
-unsigned long __init dom0_compute_nr_pages(
- struct domain *d, struct elf_dom_parms *parms, unsigned long initrd_len)
+static void __init calculate_dom0_pages(
+ struct boot_domain *bd, unsigned long avail)
{
- nodeid_t node;
- unsigned long avail = 0, nr_pages, min_pages, max_pages, iommu_pages = 0;
+ unsigned long nr_pages = bd->mem_pages ?: default_nr_pages(avail);
/* The ordering of operands is to work around a clang5 issue. */
if ( CONFIG_DOM0_MEM[0] && !dom0_mem_set )
parse_dom0_mem(CONFIG_DOM0_MEM);
+ if ( dom0_mem_set )
+ {
+ nr_pages = get_memsize(&dom0_size, avail) ?: default_nr_pages(avail);
+ bd->min_pages = get_memsize(&dom0_min_size, avail);
+ bd->max_pages = get_memsize(&dom0_max_size, avail);
+ }
+
+ nr_pages = max(nr_pages, bd->min_pages);
+ nr_pages = min(nr_pages, bd->max_pages);
+ nr_pages = min(nr_pages, avail);
+
+ bd->mem_pages = nr_pages;
+}
+
+unsigned long __init dom0_compute_nr_pages(
+ struct boot_domain *bd, struct elf_dom_parms *parms)
+{
+ nodeid_t node;
+ struct domain *d = bd->d;
+ unsigned long initrd_len = bd->ramdisk ? bd->ramdisk->size : 0;
+ unsigned long avail = 0, iommu_pages = 0;
+
for_each_node_mask ( node, dom0_nodes )
avail += avail_domheap_pages_region(node, 0, 0) +
initial_images_nrpages(node);
@@ -366,17 +387,13 @@ unsigned long __init dom0_compute_nr_pages(
{
unsigned long cpu_pages;
- nr_pages = get_memsize(&dom0_size, avail) ?: default_nr_pages(avail);
-
/*
* Clamp according to min/max limits and available memory
* (preliminary).
*/
- nr_pages = max(nr_pages, get_memsize(&dom0_min_size, avail));
- nr_pages = min(nr_pages, get_memsize(&dom0_max_size, avail));
- nr_pages = min(nr_pages, avail);
+ calculate_dom0_pages(bd, avail);
- cpu_pages = dom0_paging_pages(d, nr_pages);
+ cpu_pages = dom0_paging_pages(d, bd->mem_pages);
if ( !iommu_use_hap_pt(d) )
avail -= cpu_pages;
@@ -384,18 +401,12 @@ unsigned long __init dom0_compute_nr_pages(
avail -= cpu_pages - iommu_pages;
}
- nr_pages = get_memsize(&dom0_size, avail) ?: default_nr_pages(avail);
- min_pages = get_memsize(&dom0_min_size, avail);
- max_pages = get_memsize(&dom0_max_size, avail);
-
/* Clamp according to min/max limits and available memory (final). */
- nr_pages = max(nr_pages, min_pages);
- nr_pages = min(nr_pages, max_pages);
- nr_pages = min(nr_pages, avail);
+ calculate_dom0_pages(bd, avail);
if ( is_pv_domain(d) &&
(parms->p2m_base == UNSET_ADDR) && !memsize_gt_zero(&dom0_size) &&
- (!memsize_gt_zero(&dom0_min_size) || (nr_pages > min_pages)) )
+ (!memsize_gt_zero(&dom0_min_size) || (bd->mem_pages > bd->min_pages)) )
{
/*
* Legacy Linux kernels (i.e. such without a XEN_ELFNOTE_INIT_P2M
@@ -412,7 +423,7 @@ unsigned long __init dom0_compute_nr_pages(
vend = round_pgup(parms->virt_kend);
if ( !parms->unmapped_initrd )
vend += round_pgup(initrd_len);
- end = vend + nr_pages * sizeof_long;
+ end = vend + bd->mem_pages * sizeof_long;
if ( end > vstart )
end += end - vstart;
@@ -420,16 +431,17 @@ unsigned long __init dom0_compute_nr_pages(
(sizeof_long < sizeof(end) && end > (1UL << (8 * sizeof_long))) )
{
end = sizeof_long >= sizeof(end) ? 0 : 1UL << (8 * sizeof_long);
- nr_pages = (end - vend) / (2 * sizeof_long);
- if ( memsize_gt_zero(&dom0_min_size) && nr_pages < min_pages )
- nr_pages = min_pages;
- printk("Dom0 memory clipped to %lu pages\n", nr_pages);
+ bd->mem_pages = (end - vend) / (2 * sizeof_long);
+ if ( memsize_gt_zero(&dom0_min_size) &&
+ bd->mem_pages < bd->min_pages )
+ bd->mem_pages = bd->min_pages;
+ printk("Dom0 memory clipped to %lu pages\n", bd->mem_pages);
}
}
- d->max_pages = min_t(unsigned long, max_pages, UINT_MAX);
+ d->max_pages = min_t(unsigned long, bd->max_pages, UINT_MAX);
- return nr_pages;
+ return bd->mem_pages;
}
static void __init process_dom0_ioports_disable(struct domain *dom0)
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index b4ffebdde00e..a818a9a772ed 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -398,15 +398,15 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
ASSERT(cur_pages == nr_pages);
}
-static void __init pvh_init_p2m(struct domain *d)
+static void __init pvh_init_p2m(struct boot_domain *bd)
{
- unsigned long nr_pages = dom0_compute_nr_pages(d, NULL, 0);
+ unsigned long nr_pages = dom0_compute_nr_pages(bd, NULL);
bool preempted;
- pvh_setup_e820(d, nr_pages);
+ pvh_setup_e820(bd->d, nr_pages);
do {
preempted = false;
- paging_set_allocation(d, dom0_paging_pages(d, nr_pages),
+ paging_set_allocation(bd->d, dom0_paging_pages(bd->d, nr_pages),
&preempted);
process_pending_softirqs();
} while ( preempted );
@@ -1312,7 +1312,7 @@ static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
return 0;
}
-int __init dom0_construct_pvh(const struct boot_domain *bd)
+int __init dom0_construct_pvh(struct boot_domain *bd)
{
paddr_t entry, start_info;
struct domain *d = bd->d;
@@ -1323,7 +1323,7 @@ int __init dom0_construct_pvh(const struct boot_domain *bd)
* be done before the iommu initializion, since iommu initialization code
* will likely add mappings required by devices to the p2m (ie: RMRRs).
*/
- pvh_init_p2m(d);
+ pvh_init_p2m(bd);
iommu_hwdom_init(d);
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index 426def4115ce..dcf71c032a17 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -10,14 +10,14 @@ extern unsigned int dom0_memflags;
void dom0_set_affinity(struct domain *dom0);
-unsigned long dom0_compute_nr_pages(struct domain *d,
- struct elf_dom_parms *parms,
- unsigned long initrd_len);
int dom0_setup_permissions(struct domain *d);
struct boot_domain;
-int dom0_construct_pv(const struct boot_domain *bd);
-int dom0_construct_pvh(const struct boot_domain *bd);
+unsigned long dom0_compute_nr_pages(
+ struct boot_domain *bd, struct elf_dom_parms *parms);
+
+int dom0_construct_pv(struct boot_domain *bd);
+int dom0_construct_pvh(struct boot_domain *bd);
unsigned long dom0_paging_pages(const struct domain *d,
unsigned long nr_pages);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 3b2baf057b75..350a60b1e8fd 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -355,7 +355,7 @@ static struct page_info * __init alloc_chunk(struct domain *d,
return page;
}
-static int __init dom0_construct(const struct boot_domain *bd)
+static int __init dom0_construct(struct boot_domain *bd)
{
unsigned int i;
int rc, order, machine;
@@ -503,7 +503,7 @@ static int __init dom0_construct(const struct boot_domain *bd)
}
}
- nr_pages = dom0_compute_nr_pages(d, &parms, initrd_len);
+ nr_pages = dom0_compute_nr_pages(bd, &parms);
#ifdef CONFIG_PV32
if ( elf_32bit(&elf) )
@@ -1070,7 +1070,7 @@ out:
return rc;
}
-int __init dom0_construct_pv(const struct boot_domain *bd)
+int __init dom0_construct_pv(struct boot_domain *bd)
{
unsigned long cr4 = read_cr4();
int rc;
--
2.30.2
* [RFC 10/38] x86/boot: refactor dom0 page calculation
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Care is needed when dom0 is constructed as PV using an older kernel that lacks
the XEN_ELFNOTE_INIT_P2M ELF note. The logic handling this case is currently
embedded directly in dom0_compute_nr_pages() and takes into account whether
the dom0 memory parameters were specified using the negative allocation
syntax. To prepare for generalizing domain page allocation, isolate this logic
in a separate function, dom0_pv_restrict_pages().
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 76 +++++++++++++++++++++------------------
1 file changed, 41 insertions(+), 35 deletions(-)
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 658d81ab598c..a007e424bbe3 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -353,12 +353,50 @@ static void __init calculate_dom0_pages(
bd->mem_pages = nr_pages;
}
+static void __init dom0_pv_restrict_pages(
+ struct boot_domain *bd, struct elf_dom_parms *parms)
+{
+ if ( (parms->p2m_base == UNSET_ADDR) && !memsize_gt_zero(&dom0_size) &&
+ (!memsize_gt_zero(&dom0_min_size) || (bd->mem_pages > bd->min_pages)) )
+ {
+ /*
+ * Legacy Linux kernels (i.e. such without a XEN_ELFNOTE_INIT_P2M
+ * note) require that there is enough virtual space beyond the initial
+ * allocation to set up their initial page tables. This space is
+ * roughly the same size as the p2m table, so make sure the initial
+ * allocation doesn't consume more than about half the space that's
+ * available between params.virt_base and the address space end.
+ */
+ unsigned long vstart, vend, end;
+ unsigned long initrd_len = bd->ramdisk ? bd->ramdisk->size : 0;
+ size_t sizeof_long = is_pv_32bit_domain(bd->d) ? sizeof(int) : sizeof(long);
+
+ vstart = parms->virt_base;
+ vend = round_pgup(parms->virt_kend);
+ if ( !parms->unmapped_initrd )
+ vend += round_pgup(initrd_len);
+ end = vend + bd->mem_pages * sizeof_long;
+
+ if ( end > vstart )
+ end += end - vstart;
+ if ( end <= vstart ||
+ (sizeof_long < sizeof(end) && end > (1UL << (8 * sizeof_long))) )
+ {
+ end = sizeof_long >= sizeof(end) ? 0 : 1UL << (8 * sizeof_long);
+ bd->mem_pages = (end - vend) / (2 * sizeof_long);
+ if ( memsize_gt_zero(&dom0_min_size) &&
+ bd->mem_pages < bd->min_pages )
+ bd->mem_pages = bd->min_pages;
+ printk("Dom0 memory clipped to %lu pages\n", bd->mem_pages);
+ }
+ }
+}
+
unsigned long __init dom0_compute_nr_pages(
struct boot_domain *bd, struct elf_dom_parms *parms)
{
nodeid_t node;
struct domain *d = bd->d;
- unsigned long initrd_len = bd->ramdisk ? bd->ramdisk->size : 0;
unsigned long avail = 0, iommu_pages = 0;
for_each_node_mask ( node, dom0_nodes )
@@ -404,40 +442,8 @@ unsigned long __init dom0_compute_nr_pages(
/* Clamp according to min/max limits and available memory (final). */
calculate_dom0_pages(bd, avail);
- if ( is_pv_domain(d) &&
- (parms->p2m_base == UNSET_ADDR) && !memsize_gt_zero(&dom0_size) &&
- (!memsize_gt_zero(&dom0_min_size) || (bd->mem_pages > bd->min_pages)) )
- {
- /*
- * Legacy Linux kernels (i.e. such without a XEN_ELFNOTE_INIT_P2M
- * note) require that there is enough virtual space beyond the initial
- * allocation to set up their initial page tables. This space is
- * roughly the same size as the p2m table, so make sure the initial
- * allocation doesn't consume more than about half the space that's
- * available between params.virt_base and the address space end.
- */
- unsigned long vstart, vend, end;
- size_t sizeof_long = is_pv_32bit_domain(d) ? sizeof(int) : sizeof(long);
-
- vstart = parms->virt_base;
- vend = round_pgup(parms->virt_kend);
- if ( !parms->unmapped_initrd )
- vend += round_pgup(initrd_len);
- end = vend + bd->mem_pages * sizeof_long;
-
- if ( end > vstart )
- end += end - vstart;
- if ( end <= vstart ||
- (sizeof_long < sizeof(end) && end > (1UL << (8 * sizeof_long))) )
- {
- end = sizeof_long >= sizeof(end) ? 0 : 1UL << (8 * sizeof_long);
- bd->mem_pages = (end - vend) / (2 * sizeof_long);
- if ( memsize_gt_zero(&dom0_min_size) &&
- bd->mem_pages < bd->min_pages )
- bd->mem_pages = bd->min_pages;
- printk("Dom0 memory clipped to %lu pages\n", bd->mem_pages);
- }
- }
+ if ( is_pv_domain(d) )
+ dom0_pv_restrict_pages(bd, parms);
d->max_pages = min_t(unsigned long, bd->max_pages, UINT_MAX);
--
2.30.2
* [RFC 11/38] x86/boot: generalize paging pages calculation
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Modeled after libxl__get_required_paging_memory(), refactor
dom0_paging_pages() so that it can also calculate the number of paging pages
required for a domain that is neither the control nor the hardware domain. As
part of the refactor, rename the function to dom_paging_pages() and move it
under the domain builder.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 17 +----------------
xen/arch/x86/domain-builder/domain.c | 20 ++++++++++++++++++++
xen/arch/x86/hvm/dom0_build.c | 3 ++-
xen/arch/x86/include/asm/dom0_build.h | 3 ---
xen/arch/x86/include/asm/domain-builder.h | 3 +++
xen/arch/x86/pv/dom0_build.c | 3 ++-
6 files changed, 28 insertions(+), 21 deletions(-)
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index a007e424bbe3..1413e8c634a7 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -305,21 +305,6 @@ boolean_param("ro-hpet", ro_hpet);
unsigned int __initdata dom0_memflags = MEMF_no_dma|MEMF_exact_node;
-unsigned long __init dom0_paging_pages(const struct domain *d,
- unsigned long nr_pages)
-{
- /* Keep in sync with libxl__get_required_paging_memory(). */
- unsigned long memkb = nr_pages * (PAGE_SIZE / 1024);
-
- memkb = 4 * (256 * d->max_vcpus +
- (is_pv_domain(d) ? opt_dom0_shadow || opt_pv_l1tf_hwdom
- : 1 + opt_dom0_shadow) *
- (memkb / 1024));
-
- return DIV_ROUND_UP(memkb, 1024) << (20 - PAGE_SHIFT);
-}
-
-
/*
* If allocation isn't specified, reserve 1/16th of available memory for
* things like DMA buffers. This reservation is clamped to a maximum of 128MB.
@@ -431,7 +416,7 @@ unsigned long __init dom0_compute_nr_pages(
*/
calculate_dom0_pages(bd, avail);
- cpu_pages = dom0_paging_pages(d, bd->mem_pages);
+ cpu_pages = dom_paging_pages(bd, bd->mem_pages);
if ( !iommu_use_hap_pt(d) )
avail -= cpu_pages;
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index 619d36ea0b87..87f538ae26c5 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -10,6 +10,26 @@
#include <asm/bootinfo.h>
#include <asm/dom0_build.h>
+#include <asm/paging.h>
+#include <asm/spec_ctrl.h>
+
+unsigned long __init dom_paging_pages(
+ const struct boot_domain *bd, unsigned long nr_pages)
+{
+ /* Keep in sync with libxl__get_required_paging_memory(). */
+ unsigned long memkb = bd->mem_pages * (PAGE_SIZE / 1024);
+ unsigned long factor = 0;
+
+ if ( bd->capabilities & (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE) )
+ factor = is_pv_domain(bd->d) ? opt_dom0_shadow || opt_pv_l1tf_hwdom
+ : 1 + opt_dom0_shadow;
+ else
+ factor = !is_pv_domain(bd->d) + !paging_mode_hap(bd->d);
+
+ memkb = 4 * (256 * bd->d->max_vcpus + (factor * (memkb / 1024)));
+
+ return DIV_ROUND_UP(memkb, 1024) << (20 - PAGE_SHIFT);
+}
unsigned int __init dom_max_vcpus(struct boot_domain *bd)
{
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index a818a9a772ed..d3ad90348a1f 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -19,6 +19,7 @@
#include <asm/bootinfo.h>
#include <asm/bzimage.h>
#include <asm/dom0_build.h>
+#include <asm/domain-builder.h>
#include <asm/hvm/support.h>
#include <asm/io_apic.h>
#include <asm/p2m.h>
@@ -406,7 +407,7 @@ static void __init pvh_init_p2m(struct boot_domain *bd)
pvh_setup_e820(bd->d, nr_pages);
do {
preempted = false;
- paging_set_allocation(bd->d, dom0_paging_pages(bd->d, nr_pages),
+ paging_set_allocation(bd->d, dom_paging_pages(bd, nr_pages),
&preempted);
process_pending_softirqs();
} while ( preempted );
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index dcf71c032a17..81717b49b4ae 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -19,9 +19,6 @@ unsigned long dom0_compute_nr_pages(
int dom0_construct_pv(struct boot_domain *bd);
int dom0_construct_pvh(struct boot_domain *bd);
-unsigned long dom0_paging_pages(const struct domain *d,
- unsigned long nr_pages);
-
void dom0_update_physmap(bool compat, unsigned long pfn,
unsigned long mfn, unsigned long vphysmap_s);
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index ccfa4bd82acd..8b0b224eba41 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -8,6 +8,9 @@ int __init builder_get_cmdline(
struct boot_info *bi, int offset, char *cmdline, size_t size);
void builder_init(struct boot_info *bi);
+
+unsigned long dom_paging_pages(
+ const struct boot_domain *d, unsigned long nr_pages);
unsigned int dom_max_vcpus(struct boot_domain *bd);
struct vcpu *alloc_dom_vcpu0(struct boot_domain *bd);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 350a60b1e8fd..f8844b858082 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -17,6 +17,7 @@
#include <asm/bootinfo.h>
#include <asm/bzimage.h>
#include <asm/dom0_build.h>
+#include <asm/domain-builder.h>
#include <asm/guest.h>
#include <asm/page.h>
#include <asm/pv/mm.h>
@@ -1043,7 +1044,7 @@ static int __init dom0_construct(struct boot_domain *bd)
{
bool preempted;
- nr_pt_pages = dom0_paging_pages(d, nr_pages);
+ nr_pt_pages = dom_paging_pages(bd, nr_pages);
do {
preempted = false;
--
2.30.2
* [RFC 12/38] x86/boot: generalize compute number of domain pages
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
The number of pages to allocate for a domain is based on the physical nodes on
which the domain may be scheduled. For dom0, the set of available nodes can be
narrowed with the dom0_nodes command line parameter.
Refactor dom0_compute_nr_pages() so that the dom0_nodes restriction is applied
only if the domain has the control domain or hardware domain capability flag
set. Likewise, only reserve the IOMMU page-table estimate when building the
hardware domain. In doing so, also rename the function to
dom_compute_nr_pages().
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 16 +++++++++++++---
xen/arch/x86/hvm/dom0_build.c | 2 +-
xen/arch/x86/include/asm/dom0_build.h | 2 +-
xen/arch/x86/pv/dom0_build.c | 2 +-
4 files changed, 16 insertions(+), 6 deletions(-)
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 1413e8c634a7..53ab1afbaa1f 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -377,14 +377,23 @@ static void __init dom0_pv_restrict_pages(
}
}
-unsigned long __init dom0_compute_nr_pages(
+unsigned long __init dom_compute_nr_pages(
struct boot_domain *bd, struct elf_dom_parms *parms)
{
nodeid_t node;
+ nodemask_t nodes = { 0 };
struct domain *d = bd->d;
unsigned long avail = 0, iommu_pages = 0;
- for_each_node_mask ( node, dom0_nodes )
+ nodes_or(nodes, nodes, node_online_map);
+
+ /* If building dom0 or hwdom, apply command line restriction. */
+ if ( bd->capabilities & (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE) )
+ nodes_and(nodes, nodes, dom0_nodes);
+
+ ASSERT(nodes_weight(nodes) != 0);
+
+ for_each_node_mask ( node, nodes )
avail += avail_domheap_pages_region(node, 0, 0) +
initial_images_nrpages(node);
@@ -396,7 +405,8 @@ unsigned long __init dom0_compute_nr_pages(
avail -= d->max_vcpus - 1;
/* Reserve memory for iommu_dom0_init() (rough estimate). */
- if ( is_iommu_enabled(d) && !iommu_hwdom_passthrough )
+ if ( is_hardware_domain(d) && is_iommu_enabled(d)
+ && !iommu_hwdom_passthrough )
{
unsigned int s;
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index d3ad90348a1f..f59af0e72810 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -401,7 +401,7 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
static void __init pvh_init_p2m(struct boot_domain *bd)
{
- unsigned long nr_pages = dom0_compute_nr_pages(bd, NULL);
+ unsigned long nr_pages = dom_compute_nr_pages(bd, NULL);
bool preempted;
pvh_setup_e820(bd->d, nr_pages);
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index 81717b49b4ae..7275bcf9ba6b 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -13,7 +13,7 @@ void dom0_set_affinity(struct domain *dom0);
int dom0_setup_permissions(struct domain *d);
struct boot_domain;
-unsigned long dom0_compute_nr_pages(
+unsigned long dom_compute_nr_pages(
struct boot_domain *bd, struct elf_dom_parms *parms);
int dom0_construct_pv(struct boot_domain *bd);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index f8844b858082..ad4d1cc3520c 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -504,7 +504,7 @@ static int __init dom0_construct(struct boot_domain *bd)
}
}
- nr_pages = dom0_compute_nr_pages(bd, &parms);
+ nr_pages = dom_compute_nr_pages(bd, &parms);
#ifdef CONFIG_PV32
if ( elf_32bit(&elf) )
--
2.30.2
* [RFC 13/38] x86/hyperlaunch: move page computation to domain builder
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Move dom_compute_nr_pages() to the domain builder. To do so, the variable
dom0_nodes and the functions calculate_dom0_pages() and
dom0_pv_restrict_pages() must be exported.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 75 +----------------------
xen/arch/x86/domain-builder/domain.c | 70 +++++++++++++++++++++
xen/arch/x86/include/asm/dom0_build.h | 4 +-
xen/arch/x86/include/asm/domain-builder.h | 5 ++
4 files changed, 81 insertions(+), 73 deletions(-)
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index 53ab1afbaa1f..e488a204372b 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -209,7 +209,7 @@ static int __init cf_check parse_dom0_nodes(const char *s)
custom_param("dom0_nodes", parse_dom0_nodes);
cpumask_t __initdata dom0_cpus;
-static nodemask_t __initdata dom0_nodes;
+nodemask_t __initdata dom0_nodes;
unsigned int __init dom0_max_vcpus(void)
{
@@ -315,8 +315,7 @@ static unsigned long __init default_nr_pages(unsigned long avail)
: min(avail / 16, 128UL << (20 - PAGE_SHIFT)));
}
-static void __init calculate_dom0_pages(
- struct boot_domain *bd, unsigned long avail)
+void __init calculate_dom0_pages(struct boot_domain *bd, unsigned long avail)
{
unsigned long nr_pages = bd->mem_pages ?: default_nr_pages(avail);
@@ -338,7 +337,7 @@ static void __init calculate_dom0_pages(
bd->mem_pages = nr_pages;
}
-static void __init dom0_pv_restrict_pages(
+void __init dom0_pv_restrict_pages(
struct boot_domain *bd, struct elf_dom_parms *parms)
{
if ( (parms->p2m_base == UNSET_ADDR) && !memsize_gt_zero(&dom0_size) &&
@@ -377,74 +376,6 @@ static void __init dom0_pv_restrict_pages(
}
}
-unsigned long __init dom_compute_nr_pages(
- struct boot_domain *bd, struct elf_dom_parms *parms)
-{
- nodeid_t node;
- nodemask_t nodes = { 0 };
- struct domain *d = bd->d;
- unsigned long avail = 0, iommu_pages = 0;
-
- nodes_or(nodes, nodes, node_online_map);
-
- /* If building dom0 or hwdom, apply command line restriction. */
- if ( bd->capabilities & (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE) )
- nodes_and(nodes, nodes, dom0_nodes);
-
- ASSERT(nodes_weight(nodes) != 0);
-
- for_each_node_mask ( node, nodes )
- avail += avail_domheap_pages_region(node, 0, 0) +
- initial_images_nrpages(node);
-
- /* Reserve memory for further dom0 vcpu-struct allocations... */
- avail -= (d->max_vcpus - 1UL)
- << get_order_from_bytes(sizeof(struct vcpu));
- /* ...and compat_l4's, if needed. */
- if ( is_pv_32bit_domain(d) )
- avail -= d->max_vcpus - 1;
-
- /* Reserve memory for iommu_dom0_init() (rough estimate). */
- if ( is_hardware_domain(d) && is_iommu_enabled(d)
- && !iommu_hwdom_passthrough )
- {
- unsigned int s;
-
- for ( s = 9; s < BITS_PER_LONG; s += 9 )
- iommu_pages += max_pdx >> s;
-
- avail -= iommu_pages;
- }
-
- if ( paging_mode_enabled(d) || opt_dom0_shadow || opt_pv_l1tf_hwdom )
- {
- unsigned long cpu_pages;
-
- /*
- * Clamp according to min/max limits and available memory
- * (preliminary).
- */
- calculate_dom0_pages(bd, avail);
-
- cpu_pages = dom_paging_pages(bd, bd->mem_pages);
-
- if ( !iommu_use_hap_pt(d) )
- avail -= cpu_pages;
- else if ( cpu_pages > iommu_pages )
- avail -= cpu_pages - iommu_pages;
- }
-
- /* Clamp according to min/max limits and available memory (final). */
- calculate_dom0_pages(bd, avail);
-
- if ( is_pv_domain(d) )
- dom0_pv_restrict_pages(bd, parms);
-
- d->max_pages = min_t(unsigned long, bd->max_pages, UINT_MAX);
-
- return bd->mem_pages;
-}
-
static void __init process_dom0_ioports_disable(struct domain *dom0)
{
unsigned long io_from, io_to;
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index 87f538ae26c5..bdb4d9745ff2 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -6,6 +6,8 @@
#include <xen/cpumask.h>
#include <xen/domain.h>
#include <xen/init.h>
+#include <xen/libelf.h>
+#include <xen/nodemask.h>
#include <xen/sched.h>
#include <asm/bootinfo.h>
@@ -31,6 +33,74 @@ unsigned long __init dom_paging_pages(
return DIV_ROUND_UP(memkb, 1024) << (20 - PAGE_SHIFT);
}
+unsigned long __init dom_compute_nr_pages(
+ struct boot_domain *bd, struct elf_dom_parms *parms)
+{
+ nodeid_t node;
+ nodemask_t nodes = { 0 };
+ struct domain *d = bd->d;
+ unsigned long avail = 0, iommu_pages = 0;
+
+ nodes_or(nodes, nodes, node_online_map);
+
+ /* If building dom0 or hwdom, apply command line restriction. */
+ if ( bd->capabilities & (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE) )
+ nodes_and(nodes, nodes, dom0_nodes);
+
+ ASSERT(nodes_weight(nodes) != 0);
+
+ for_each_node_mask ( node, nodes )
+ avail += avail_domheap_pages_region(node, 0, 0) +
+ initial_images_nrpages(node);
+
+ /* Reserve memory for further dom0 vcpu-struct allocations... */
+ avail -= (d->max_vcpus - 1UL)
+ << get_order_from_bytes(sizeof(struct vcpu));
+ /* ...and compat_l4's, if needed. */
+ if ( is_pv_32bit_domain(d) )
+ avail -= d->max_vcpus - 1;
+
+ /* Reserve memory for iommu_dom0_init() (rough estimate). */
+ if ( is_hardware_domain(d) && is_iommu_enabled(d)
+ && !iommu_hwdom_passthrough )
+ {
+ unsigned int s;
+
+ for ( s = 9; s < BITS_PER_LONG; s += 9 )
+ iommu_pages += max_pdx >> s;
+
+ avail -= iommu_pages;
+ }
+
+ if ( paging_mode_enabled(d) || opt_dom0_shadow || opt_pv_l1tf_hwdom )
+ {
+ unsigned long cpu_pages;
+
+ /*
+ * Clamp according to min/max limits and available memory
+ * (preliminary).
+ */
+ calculate_dom0_pages(bd, avail);
+
+ cpu_pages = dom_paging_pages(bd, bd->mem_pages);
+
+ if ( !iommu_use_hap_pt(d) )
+ avail -= cpu_pages;
+ else if ( cpu_pages > iommu_pages )
+ avail -= cpu_pages - iommu_pages;
+ }
+
+ /* Clamp according to min/max limits and available memory (final). */
+ calculate_dom0_pages(bd, avail);
+
+ if ( is_pv_domain(d) )
+ dom0_pv_restrict_pages(bd, parms);
+
+ d->max_pages = min_t(unsigned long, bd->max_pages, UINT_MAX);
+
+ return bd->mem_pages;
+}
+
unsigned int __init dom_max_vcpus(struct boot_domain *bd)
{
unsigned int limit = bd->mode & BUILD_MODE_PARAVIRT ?
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index 7275bcf9ba6b..43a402af15b7 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -7,13 +7,15 @@
#include <asm/setup.h>
extern unsigned int dom0_memflags;
+extern nodemask_t dom0_nodes;
void dom0_set_affinity(struct domain *dom0);
int dom0_setup_permissions(struct domain *d);
struct boot_domain;
-unsigned long dom_compute_nr_pages(
+void calculate_dom0_pages(struct boot_domain *bd, unsigned long avail);
+void dom0_pv_restrict_pages(
struct boot_domain *bd, struct elf_dom_parms *parms);
int dom0_construct_pv(struct boot_domain *bd);
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index 8b0b224eba41..f9b11e06ceb8 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -11,6 +11,11 @@ void builder_init(struct boot_info *bi);
unsigned long dom_paging_pages(
const struct boot_domain *d, unsigned long nr_pages);
+
+struct elf_dom_parms;
+unsigned long dom_compute_nr_pages(
+ struct boot_domain *bd, struct elf_dom_parms *parms);
+
unsigned int dom_max_vcpus(struct boot_domain *bd);
struct vcpu *alloc_dom_vcpu0(struct boot_domain *bd);
--
2.30.2
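The IOMMU reservation in dom_compute_nr_pages() above adds up max_pdx >> s for s = 9, 18, 27, ... while s is below BITS_PER_LONG. Below is a minimal standalone sketch of that arithmetic. The reading that each 9-bit step approximates one more level of 512-entry page tables is our interpretation of the "rough estimate" comment, not something the patch states. estimate_iommu_pages() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned long in bits; stands in for Xen's BITS_PER_LONG. */
#define SKETCH_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper mirroring the loop in dom_compute_nr_pages():
 * assuming 512-entry (9-bit) tables, level n needs roughly
 * max_pdx >> (9 * n) table pages, summed over every level that fits.
 */
static unsigned long estimate_iommu_pages(unsigned long max_pdx)
{
    unsigned long pages = 0;
    unsigned int s;

    for ( s = 9; s < SKETCH_BITS_PER_LONG; s += 9 )
        pages += max_pdx >> s;

    return pages;
}
```

For example, with 4 GiB of 4 KiB frames (max_pdx = 1 << 20), the sum is 2048 + 4 = 2052 pages. Deeper levels add nothing at that size.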
* [RFC 14/38] x86/hyperlaunch: move pvh p2m init to domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (12 preceding siblings ...)
2025-04-19 22:07 ` [RFC 13/38] x86/hyperlaunch: move page computation to domain builder Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 15/38] x86/hyperlaunch: move iommu " Daniel P. Smith
` (23 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Generalize pvh_init_p2m() for use on domUs and relocate it under the domain
builder. To support the move, pvh_setup_e820() is renamed to
dom0_pvh_setup_e820() and exported.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom0_build.c | 23 +----------------------
xen/arch/x86/hvm/dom_build.c | 25 +++++++++++++++++++++++++
xen/arch/x86/include/asm/dom0_build.h | 2 ++
3 files changed, 28 insertions(+), 22 deletions(-)
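pvh_init_p2m() retries paging_set_allocation() until the call reports that it was not preempted, and it processes pending softirqs between attempts. The sketch below mocks that retry shape in userspace. mock_set_allocation(), mock_init_pool() and the MOCK_BATCH limit are hypothetical stand-ins, not Xen APIs, and the real allocator's preemption criteria are not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the most pages the mock allocator adds per call. */
#define MOCK_BATCH 100UL

struct mock_pool {
    unsigned long pages;          /* current pool size */
    unsigned int softirq_calls;   /* times the "softirq" hook ran */
};

/*
 * Hypothetical stand-in for paging_set_allocation(): grow towards target,
 * but stop after MOCK_BATCH pages and flag preemption if work remains.
 */
static void mock_set_allocation(struct mock_pool *p, unsigned long target,
                                bool *preempted)
{
    unsigned long done = 0;

    while ( p->pages < target )
    {
        if ( done == MOCK_BATCH )
        {
            *preempted = true;
            return;
        }
        p->pages++;
        done++;
    }
}

/* Same loop shape as pvh_init_p2m(): retry until not preempted. */
static void mock_init_pool(struct mock_pool *p, unsigned long target)
{
    bool preempted;

    do {
        preempted = false;
        mock_set_allocation(p, target, &preempted);
        p->softirq_calls++;   /* stands in for process_pending_softirqs() */
    } while ( preempted );
}
```

Growing a pool to 250 pages takes three passes here (100 + 100 + 50), and softirqs get a chance to run after each one.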
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index f59af0e72810..4a371dfd66d7 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -323,7 +323,7 @@ static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
return 0;
}
-static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
+void __init dom0_pvh_setup_e820(struct domain *d, unsigned long nr_pages)
{
struct e820entry *entry, *entry_guest;
unsigned int i;
@@ -399,20 +399,6 @@ static __init void pvh_setup_e820(struct domain *d, unsigned long nr_pages)
ASSERT(cur_pages == nr_pages);
}
-static void __init pvh_init_p2m(struct boot_domain *bd)
-{
- unsigned long nr_pages = dom_compute_nr_pages(bd, NULL);
- bool preempted;
-
- pvh_setup_e820(bd->d, nr_pages);
- do {
- preempted = false;
- paging_set_allocation(bd->d, dom_paging_pages(bd, nr_pages),
- &preempted);
- process_pending_softirqs();
- } while ( preempted );
-}
-
static int __init pvh_populate_p2m(struct domain *d)
{
struct vcpu *v = d->vcpu[0];
@@ -1319,13 +1305,6 @@ int __init dom0_construct_pvh(struct boot_domain *bd)
struct domain *d = bd->d;
int rc;
- /*
- * Craft dom0 physical memory map and set the paging allocation. This must
- * be done before the iommu initializion, since iommu initialization code
- * will likely add mappings required by devices to the p2m (ie: RMRRs).
- */
- pvh_init_p2m(bd);
-
iommu_hwdom_init(d);
rc = pvh_populate_p2m(d);
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index b6e534226434..05064ae9ba96 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -10,13 +10,16 @@
#include <xen/acpi.h>
#include <xen/init.h>
+#include <xen/softirq.h>
#include <xen/types.h>
#include <acpi/actables.h>
#include <asm/bootinfo.h>
#include <asm/dom0_build.h>
+#include <asm/domain-builder.h>
#include <asm/hvm/io.h>
+#include <asm/paging.h>
#include <asm/pci.h>
static void __hwdom_init pvh_setup_mmcfg(struct domain *d)
@@ -37,6 +40,20 @@ static void __hwdom_init pvh_setup_mmcfg(struct domain *d)
}
}
+static void __init pvh_init_p2m(struct boot_domain *bd)
+{
+ unsigned long nr_pages = dom_compute_nr_pages(bd, NULL);
+ bool preempted;
+
+ dom0_pvh_setup_e820(bd->d, nr_pages);
+ do {
+ preempted = false;
+ paging_set_allocation(bd->d, dom_paging_pages(bd, nr_pages),
+ &preempted);
+ process_pending_softirqs();
+ } while ( preempted );
+}
+
int __init dom_construct_pvh(struct boot_domain *bd)
{
int rc;
@@ -67,6 +84,14 @@ int __init dom_construct_pvh(struct boot_domain *bd)
}
}
+ /*
+ * Craft domain physical memory map and set the paging allocation. This
+ * must be done before the IOMMU initialization, since IOMMU
+ * initialization code will likely add mappings required by devices to
+ * the p2m (e.g. RMRRs).
+ */
+ pvh_init_p2m(bd);
+
return dom0_construct_pvh(bd);
}
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index 43a402af15b7..e5debd5adf9f 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -18,6 +18,8 @@ void calculate_dom0_pages(struct boot_domain *bd, unsigned long avail);
void dom0_pv_restrict_pages(
struct boot_domain *bd, struct elf_dom_parms *parms);
+void dom0_pvh_setup_e820(struct domain *d, unsigned long nr_pages);
+
int dom0_construct_pv(struct boot_domain *bd);
int dom0_construct_pvh(struct boot_domain *bd);
--
2.30.2
* [RFC 15/38] x86/hyperlaunch: move iommu init to domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (13 preceding siblings ...)
2025-04-19 22:07 ` [RFC 14/38] x86/hyperlaunch: move pvh p2m init " Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus Daniel P. Smith
` (22 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Move the invocation of iommu_hwdom_init() into dom_construct_pvh() and guard
it with a hardware domain check.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom0_build.c | 2 --
xen/arch/x86/hvm/dom_build.c | 4 ++++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 4a371dfd66d7..a68474b7856a 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1305,8 +1305,6 @@ int __init dom0_construct_pvh(struct boot_domain *bd)
struct domain *d = bd->d;
int rc;
- iommu_hwdom_init(d);
-
rc = pvh_populate_p2m(d);
if ( rc )
{
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 05064ae9ba96..7fa007a6926d 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -9,6 +9,7 @@
*/
#include <xen/acpi.h>
+#include <xen/iommu.h>
#include <xen/init.h>
#include <xen/softirq.h>
#include <xen/types.h>
@@ -92,6 +93,9 @@ int __init dom_construct_pvh(struct boot_domain *bd)
*/
pvh_init_p2m(bd);
+ if ( is_hardware_domain(bd->d) )
+ iommu_hwdom_init(bd->d);
+
return dom0_construct_pvh(bd);
}
--
2.30.2
* [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (14 preceding siblings ...)
2025-04-19 22:07 ` [RFC 15/38] x86/hyperlaunch: move iommu " Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-20 9:36 ` Jürgen Groß
2025-04-19 22:07 ` [RFC 17/38] x86/hyperlaunch: move pvh_setup_cpus to domain builder Daniel P. Smith
` (21 subsequent siblings)
37 siblings, 1 reply; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, Dario Faggioli,
Juergen Gross, George Dunlap
Relocate sched_setup_dom0_vcpus(), which was guarded by #ifdef CONFIG_X86,
from common/sched to the hyperlaunch domain builder, and rename it to
alloc_dom_vcpus() to better reflect its purpose.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/domain.c | 10 ++++++++++
xen/arch/x86/hvm/dom0_build.c | 2 +-
xen/arch/x86/include/asm/domain-builder.h | 1 +
xen/arch/x86/pv/dom0_build.c | 2 +-
xen/common/sched/core.c | 12 ------------
xen/include/xen/sched.h | 1 -
6 files changed, 13 insertions(+), 15 deletions(-)
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index bdb4d9745ff2..764a1af4acd3 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -128,6 +128,16 @@ struct vcpu *__init alloc_dom_vcpu0(struct boot_domain *bd)
return vcpu_create(bd->d, 0);
}
+void __init alloc_dom_vcpus(struct domain *d)
+{
+ unsigned int i;
+
+ for ( i = 1; i < d->max_vcpus; i++ )
+ vcpu_create(d, i);
+
+ domain_update_node_affinity(d);
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index a68474b7856a..2ed2d5fd7326 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -834,7 +834,7 @@ static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
.cpu_regs.x86_32.tr_ar = 0x8b,
};
- sched_setup_dom0_vcpus(d);
+ alloc_dom_vcpus(d);
rc = arch_set_info_hvm_guest(v, &cpu_ctx);
if ( rc )
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index f9b11e06ceb8..af0a24b4edf5 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -18,6 +18,7 @@ unsigned long dom_compute_nr_pages(
unsigned int dom_max_vcpus(struct boot_domain *bd);
struct vcpu *alloc_dom_vcpu0(struct boot_domain *bd);
+void alloc_dom_vcpus(struct domain *d);
int dom_construct_pvh(struct boot_domain *bd);
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index ad4d1cc3520c..c3d269438313 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -827,7 +827,7 @@ static int __init dom0_construct(struct boot_domain *bd)
printk("Dom%u has maximum %u VCPUs\n", d->domain_id, d->max_vcpus);
- sched_setup_dom0_vcpus(d);
+ alloc_dom_vcpus(d);
d->arch.paging.mode = 0;
diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 9043414290a8..d679d766a4b6 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -3479,18 +3479,6 @@ void wait(void)
schedule();
}
-#ifdef CONFIG_X86
-void __init sched_setup_dom0_vcpus(struct domain *d)
-{
- unsigned int i;
-
- for ( i = 1; i < d->max_vcpus; i++ )
- vcpu_create(d, i);
-
- domain_update_node_affinity(d);
-}
-#endif
-
#ifdef CONFIG_COMPAT
#include "compat.c"
#endif
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 559d201e0c7e..4f184cd76206 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -1084,7 +1084,6 @@ static inline bool sched_has_urgent_vcpu(void)
}
void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value);
-void sched_setup_dom0_vcpus(struct domain *d);
int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu, uint8_t reason);
int vcpu_set_hard_affinity(struct vcpu *v, const cpumask_t *affinity);
int vcpu_affinity_domctl(struct domain *d, uint32_t cmd,
--
2.30.2
* [RFC 17/38] x86/hyperlaunch: move pvh_setup_cpus to domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (15 preceding siblings ...)
2025-04-19 22:07 ` [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus Daniel P. Smith
@ 2025-04-19 22:07 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 18/38] x86/boot: rename pvh acpi setup function Daniel P. Smith
` (20 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:07 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
pvh_setup_cpus() is general enough to be used by all HVM domains, not just
PVH, so move it to the domain builder and rename it to hvm_setup_cpus().
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom0_build.c | 45 +----------------------
xen/arch/x86/hvm/dom_build.c | 45 +++++++++++++++++++++++
xen/arch/x86/include/asm/domain-builder.h | 2 +
3 files changed, 48 insertions(+), 44 deletions(-)
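hvm_setup_cpus() starts the BSP in flat 32-bit protected mode, as described in docs/misc/pvh.pandoc, with code, data and task-register attributes 0xc9b, 0xc93 and 0x8b. The decoder below makes those constants readable. It assumes a packed 12-bit access-rights layout (type[3:0], S[4], DPL[6:5], P[7], AVL[8], L[9], D/B[10], G[11]), which we believe matches Xen's segment attribute encoding. Treat the layout as an assumption. struct seg_ar and decode_ar() are hypothetical names.

```c
#include <assert.h>

/* Decoded fields of a 12-bit segment access-rights value (assumed layout). */
struct seg_ar {
    unsigned int type, s, dpl, p, avl, l, db, g;
};

/* Hypothetical decoder: split a packed access-rights word into fields. */
static struct seg_ar decode_ar(unsigned int ar)
{
    struct seg_ar d = {
        .type = ar & 0xf,          /* segment type */
        .s    = (ar >> 4) & 1,     /* 1 = code/data, 0 = system */
        .dpl  = (ar >> 5) & 3,     /* descriptor privilege level */
        .p    = (ar >> 7) & 1,     /* present */
        .avl  = (ar >> 8) & 1,     /* available for software */
        .l    = (ar >> 9) & 1,     /* 64-bit code segment */
        .db   = (ar >> 10) & 1,    /* default operand size 32-bit */
        .g    = (ar >> 11) & 1,    /* 4 KiB limit granularity */
    };

    return d;
}
```

Under this layout, 0xc9b is a present, ring-0, 32-bit, page-granular execute/read code segment (type 0xb, accessed). 0xc93 is the matching read/write data segment (type 0x3). 0x8b is a present system segment of type 0xb, a busy 32-bit TSS.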
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 2ed2d5fd7326..a3b6afc5318a 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -808,49 +808,6 @@ static int __init pvh_load_kernel(
return 0;
}
-static int __init pvh_setup_cpus(struct domain *d, paddr_t entry,
- paddr_t start_info)
-{
- struct vcpu *v = d->vcpu[0];
- int rc;
- /*
- * This sets the vCPU state according to the state described in
- * docs/misc/pvh.pandoc.
- */
- vcpu_hvm_context_t cpu_ctx = {
- .mode = VCPU_HVM_MODE_32B,
- .cpu_regs.x86_32.ebx = start_info,
- .cpu_regs.x86_32.eip = entry,
- .cpu_regs.x86_32.cr0 = X86_CR0_PE | X86_CR0_ET,
- .cpu_regs.x86_32.cs_limit = ~0u,
- .cpu_regs.x86_32.ds_limit = ~0u,
- .cpu_regs.x86_32.es_limit = ~0u,
- .cpu_regs.x86_32.ss_limit = ~0u,
- .cpu_regs.x86_32.tr_limit = 0x67,
- .cpu_regs.x86_32.cs_ar = 0xc9b,
- .cpu_regs.x86_32.ds_ar = 0xc93,
- .cpu_regs.x86_32.es_ar = 0xc93,
- .cpu_regs.x86_32.ss_ar = 0xc93,
- .cpu_regs.x86_32.tr_ar = 0x8b,
- };
-
- alloc_dom_vcpus(d);
-
- rc = arch_set_info_hvm_guest(v, &cpu_ctx);
- if ( rc )
- {
- printk("Unable to setup Dom0 BSP context: %d\n", rc);
- return rc;
- }
-
- update_domain_wallclock_time(d);
-
- v->is_initialised = 1;
- clear_bit(_VPF_down, &v->pause_flags);
-
- return 0;
-}
-
static int __init cf_check acpi_count_intr_ovr(
struct acpi_subtable_header *header, const unsigned long end)
{
@@ -1319,7 +1276,7 @@ int __init dom0_construct_pvh(struct boot_domain *bd)
return rc;
}
- rc = pvh_setup_cpus(d, entry, start_info);
+ rc = hvm_setup_cpus(bd->d, entry, start_info);
if ( rc )
{
printk("Failed to setup Dom0 CPUs: %d\n", rc);
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 7fa007a6926d..078e42447b0a 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -16,6 +16,8 @@
#include <acpi/actables.h>
+#include <public/hvm/hvm_vcpu.h>
+
#include <asm/bootinfo.h>
#include <asm/dom0_build.h>
#include <asm/domain-builder.h>
@@ -55,6 +57,49 @@ static void __init pvh_init_p2m(struct boot_domain *bd)
} while ( preempted );
}
+int __init hvm_setup_cpus(
+ struct domain *d, paddr_t entry, paddr_t start_info)
+{
+ struct vcpu *v = d->vcpu[0];
+ int rc;
+ /*
+ * This sets the vCPU state according to the state described in
+ * docs/misc/pvh.pandoc.
+ */
+ vcpu_hvm_context_t cpu_ctx = {
+ .mode = VCPU_HVM_MODE_32B,
+ .cpu_regs.x86_32.ebx = start_info,
+ .cpu_regs.x86_32.eip = entry,
+ .cpu_regs.x86_32.cr0 = X86_CR0_PE | X86_CR0_ET,
+ .cpu_regs.x86_32.cs_limit = ~0u,
+ .cpu_regs.x86_32.ds_limit = ~0u,
+ .cpu_regs.x86_32.es_limit = ~0u,
+ .cpu_regs.x86_32.ss_limit = ~0u,
+ .cpu_regs.x86_32.tr_limit = 0x67,
+ .cpu_regs.x86_32.cs_ar = 0xc9b,
+ .cpu_regs.x86_32.ds_ar = 0xc93,
+ .cpu_regs.x86_32.es_ar = 0xc93,
+ .cpu_regs.x86_32.ss_ar = 0xc93,
+ .cpu_regs.x86_32.tr_ar = 0x8b,
+ };
+
+ alloc_dom_vcpus(d);
+
+ rc = arch_set_info_hvm_guest(v, &cpu_ctx);
+ if ( rc )
+ {
+ printk("Unable to setup %pd BSP context: %d\n", d, rc);
+ return rc;
+ }
+
+ update_domain_wallclock_time(d);
+
+ v->is_initialised = 1;
+ clear_bit(_VPF_down, &v->pause_flags);
+
+ return 0;
+}
+
int __init dom_construct_pvh(struct boot_domain *bd)
{
int rc;
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index af0a24b4edf5..7bc2fbe66bc2 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -9,6 +9,8 @@ int __init builder_get_cmdline(
void builder_init(struct boot_info *bi);
+int hvm_setup_cpus(struct domain *d, paddr_t entry, paddr_t start_info);
+
unsigned long dom_paging_pages(
const struct boot_domain *d, unsigned long nr_pages);
--
2.30.2
* [RFC 18/38] x86/boot: rename pvh acpi setup function
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (16 preceding siblings ...)
2025-04-19 22:07 ` [RFC 17/38] x86/hyperlaunch: move pvh_setup_cpus to domain builder Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 19/38] x86/hyperlaunch: add domu memory map construction Daniel P. Smith
` (19 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
pvh_setup_acpi() is dom0-specific, so rename it to dom0_pvh_setup_acpi(), and
export it so that it may be called by the domain builder.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom0_build.c | 4 ++--
xen/arch/x86/include/asm/dom0_build.h | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index a3b6afc5318a..8ea4c7bd2792 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -1125,7 +1125,7 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
return rc;
}
-static int __init pvh_setup_acpi(struct domain *d, paddr_t start_info)
+int __init dom0_pvh_setup_acpi(struct domain *d, paddr_t start_info)
{
unsigned long pfn, nr_pages;
paddr_t madt_paddr, xsdt_paddr, rsdp_paddr;
@@ -1283,7 +1283,7 @@ int __init dom0_construct_pvh(struct boot_domain *bd)
return rc;
}
- rc = pvh_setup_acpi(d, start_info);
+ rc = dom0_pvh_setup_acpi(bd->d, start_info);
if ( rc )
{
printk("Failed to setup Dom0 ACPI tables: %d\n", rc);
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index e5debd5adf9f..36f563bd9d5b 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -20,6 +20,8 @@ void dom0_pv_restrict_pages(
void dom0_pvh_setup_e820(struct domain *d, unsigned long nr_pages);
+int dom0_pvh_setup_acpi(struct domain *d, paddr_t start_info);
+
int dom0_construct_pv(struct boot_domain *bd);
int dom0_construct_pvh(struct boot_domain *bd);
--
2.30.2
* [RFC 19/38] x86/hyperlaunch: add domu memory map construction
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (17 preceding siblings ...)
2025-04-19 22:08 ` [RFC 18/38] x86/boot: rename pvh acpi setup function Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 20/38] x86/hyperlaunch: move populating p2m under domain builder Daniel P. Smith
` (18 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Introduce hvm_setup_e820() to build the e820 memory map for a general domU.
ACPI table size helpers are introduced to size the ACPI entry. A conditional
is added to the domain builder to select between hvm_setup_e820() and
dom0_pvh_setup_e820(), depending on whether it is building dom0 or a domU.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom_build.c | 149 ++++++++++++++++++++++++++++++++++-
1 file changed, 148 insertions(+), 1 deletion(-)
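hvm_setup_e820() splits nr_pages into low memory below 1MB, ACPI plus special pages, extended memory from 1MB up to the MMIO hole, and high memory above 4GB. The sketch below reproduces that arithmetic in isolation. It checks the fixed parts before subtracting, so a small nr_pages fails cleanly instead of wrapping. split_pages() and struct page_split are hypothetical. The 0xf0000000 hole start is our assumed value of HVM_BELOW_4G_MMIO_START, and acpi_pages is passed in rather than computed.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_PAGE_SHIFT 12
/* Assumed values mirroring lowmem_reserved_base, HVM_BELOW_4G_MMIO_START
 * and NR_SPECIAL_PAGES from the patch. */
#define SKETCH_LOWMEM_RESERVED_BASE 0x9e000UL
#define SKETCH_MMIO_START           0xf0000000UL
#define SKETCH_NR_SPECIAL_PAGES     8UL

struct page_split {
    unsigned long low, ext, acpi, special, high;
};

/* Returns false instead of underflowing when nr_pages is too small. */
static bool split_pages(unsigned long nr_pages, unsigned long acpi_pages,
                        struct page_split *out)
{
    unsigned long max_ext =
        (SKETCH_MMIO_START - 0x100000UL) >> SKETCH_PAGE_SHIFT;
    unsigned long fixed;

    out->low = SKETCH_LOWMEM_RESERVED_BASE >> SKETCH_PAGE_SHIFT;
    out->acpi = acpi_pages;
    out->special = SKETCH_NR_SPECIAL_PAGES;
    fixed = out->low + out->acpi + out->special;

    if ( nr_pages < fixed )
        return false;

    /* Extended memory is capped at the start of the MMIO hole... */
    out->ext = nr_pages - fixed;
    if ( out->ext > max_ext )
        out->ext = max_ext;

    /* ...and whatever is left over goes above 4GB. */
    out->high = nr_pages - (fixed + out->ext);

    return true;
}
```

With one ACPI page and 4 GiB of memory (1 << 20 pages), the sketch caps ext at 982784 pages and places the remaining 65625 pages above 4GB. The five parts still sum to nr_pages, which is the invariant hvm_setup_e820() asserts.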
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 078e42447b0a..658d3d170e64 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -16,6 +16,7 @@
#include <acpi/actables.h>
+#include <public/hvm/e820.h>
#include <public/hvm/hvm_vcpu.h>
#include <asm/bootinfo.h>
@@ -43,12 +44,158 @@ static void __hwdom_init pvh_setup_mmcfg(struct domain *d)
}
}
+static unsigned long __init hvm_size_acpi_madt(struct domain *d)
+{
+ unsigned long size = sizeof(struct acpi_table_madt);
+
+ size += sizeof(struct acpi_madt_local_apic) * d->max_vcpus;
+
+ return size;
+}
+
+static unsigned long __init hvm_size_acpi_xsdt(struct domain *d)
+{
+ unsigned long size = sizeof(struct acpi_table_xsdt);
+ /* Only adding the MADT table to the XSDT. */
+ unsigned int num_tables = 1;
+
+ /*
+ * struct acpi_table_xsdt already includes one array slot, so this
+ * reserves one spare entry; harmless, as the region is page-rounded.
+ */
+ size += num_tables * sizeof(uint64_t);
+
+ return size;
+}
+
+static unsigned long __init hvm_size_acpi_region(struct domain *d)
+{
+ unsigned long size = sizeof(struct acpi_table_rsdp);
+
+ size += hvm_size_acpi_xsdt(d);
+ size += hvm_size_acpi_madt(d);
+
+ return ROUNDUP(size, PAGE_SIZE);
+}
+
+/* From xenguest lib */
+#define END_SPECIAL_REGION 0xff000U
+#define NR_SPECIAL_PAGES 8
+#define START_SPECIAL_REGION (END_SPECIAL_REGION - NR_SPECIAL_PAGES)
+
+#define SPECIALPAGE_PAGING 0
+#define SPECIALPAGE_ACCESS 1
+#define SPECIALPAGE_SHARING 2
+#define SPECIALPAGE_BUFIOREQ 3
+#define SPECIALPAGE_XENSTORE 4
+#define SPECIALPAGE_IOREQ 5
+#define SPECIALPAGE_IDENT_PT 6
+#define SPECIALPAGE_CONSOLE 7
+#define special_pfn(x) (START_SPECIAL_REGION + (x))
+
+/*
+ * Allocation scheme, derived from xenlight/xenguest:
+ *
+ * | <4G MMIO Hole |
+ * [ Low Mem ][ RDM Mem ][ >1M Mem ][ ACPI ][ Special ][ High Mem ]
+ *
+ */
+static void __init hvm_setup_e820(struct domain *d, unsigned long nr_pages)
+{
+ const uint32_t lowmem_reserved_base = 0x9e000;
+ const uint32_t rdm_base = 0xa0000, rdm_size = 0x60;
+ unsigned long low_pages, ext_pages, mmio_pages, acpi_pages, high_pages = 0;
+ unsigned long max_ext_pages = (HVM_BELOW_4G_MMIO_START - MB(1)) >> PAGE_SHIFT,
+ page_count = 0;
+ unsigned int nr = 0, e820_entries = 5;
+
+ /* low pages: below 1MB */
+ low_pages = lowmem_reserved_base >> PAGE_SHIFT;
+ acpi_pages = hvm_size_acpi_region(d) >> PAGE_SHIFT;
+ mmio_pages = acpi_pages + NR_SPECIAL_PAGES;
+
+ if ( nr_pages < low_pages + mmio_pages )
+ panic("Insufficient memory for HVM/PVH domain (%pd)\n", d);
+
+ /* ext pages: from 1MB to mmio hole */
+ ext_pages = nr_pages - (low_pages + mmio_pages);
+ if ( ext_pages > max_ext_pages )
+ ext_pages = max_ext_pages;
+
+ /* high pages: above 4GB */
+ if ( nr_pages > (low_pages + mmio_pages + ext_pages) )
+ high_pages = nr_pages - (low_pages + mmio_pages + ext_pages);
+
+ /* If we should have a highmem range, add one more e820 entry */
+ if ( high_pages )
+ e820_entries++;
+
+ ASSERT(e820_entries < E820MAX);
+
+ d->arch.e820 = xzalloc_array(struct e820entry, e820_entries);
+ if ( !d->arch.e820 )
+ panic("Unable to allocate memory for boot domain e820 map\n");
+
+ /* usable: Low memory */
+ d->arch.e820[nr].addr = 0x000000;
+ d->arch.e820[nr].size = low_pages << PAGE_SHIFT;
+ d->arch.e820[nr].type = E820_RAM;
+ page_count += d->arch.e820[nr].size >> PAGE_SHIFT;
+ nr++;
+
+ /* reserved: lowmem reserved device memory */
+ d->arch.e820[nr].addr = rdm_base;
+ d->arch.e820[nr].size = rdm_size;
+ d->arch.e820[nr].type = E820_RESERVED;
+ nr++;
+
+ /* usable: extended memory from 1MB */
+ d->arch.e820[nr].addr = 0x100000;
+ d->arch.e820[nr].size = ext_pages << PAGE_SHIFT;
+ d->arch.e820[nr].type = E820_RAM;
+ page_count += d->arch.e820[nr].size >> PAGE_SHIFT;
+ nr++;
+
+ /* reserved: ACPI entry, ACPI_INFO_PHYSICAL_ADDRESS */
+ d->arch.e820[nr].addr = 0xFC000000;
+ d->arch.e820[nr].size = acpi_pages << PAGE_SHIFT;
+ d->arch.e820[nr].type = E820_ACPI;
+ page_count += d->arch.e820[nr].size >> PAGE_SHIFT;
+ nr++;
+
+ /* reserved: HVM special pages, X86_HVM_END_SPECIAL_REGION */
+ d->arch.e820[nr].addr = START_SPECIAL_REGION << PAGE_SHIFT;
+ d->arch.e820[nr].size = NR_SPECIAL_PAGES << PAGE_SHIFT;
+ d->arch.e820[nr].type = E820_RESERVED;
+ page_count += d->arch.e820[nr].size >> PAGE_SHIFT;
+ nr++;
+
+ /* usable: highmem */
+ if ( high_pages )
+ {
+ d->arch.e820[nr].addr = 0x100000000;
+ d->arch.e820[nr].size = high_pages << PAGE_SHIFT;
+ d->arch.e820[nr].type = E820_RAM;
+ page_count += d->arch.e820[nr].size >> PAGE_SHIFT;
+ nr++;
+ }
+
+ d->arch.nr_e820 = nr;
+
+ ASSERT(nr == e820_entries);
+ ASSERT(nr_pages == page_count);
+}
+
static void __init pvh_init_p2m(struct boot_domain *bd)
{
unsigned long nr_pages = dom_compute_nr_pages(bd, NULL);
bool preempted;
- dom0_pvh_setup_e820(bd->d, nr_pages);
+ if ( bd->capabilities & (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE) )
+ dom0_pvh_setup_e820(bd->d, nr_pages);
+ else
+ hvm_setup_e820(bd->d, nr_pages);
+
do {
preempted = false;
paging_set_allocation(bd->d, dom_paging_pages(bd, nr_pages),
--
2.30.2
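The ACPI region that hvm_size_acpi_region() sizes holds an RSDP, an XSDT with a single MADT entry, and a MADT with one Local APIC entry per vCPU, rounded up to a page. The sketch below redoes that sum using the structure sizes from the ACPI specification: a 36-byte RSDP and table header, a 44-byte MADT header, 8-byte Local APIC entries, and a 44-byte XSDT struct including its built-in slot. acpi_region_bytes() and the constant names are hypothetical.

```c
#include <assert.h>

#define SKETCH_PAGE_SIZE 4096UL

/* Assumed structure sizes from the ACPI specification (bytes). */
#define RSDP_SIZE        36UL  /* ACPI 2.0+ RSDP */
#define XSDT_STRUCT_SIZE 44UL  /* 36-byte header + one built-in 8-byte slot */
#define XSDT_ENTRY_SIZE  8UL   /* 64-bit table pointer */
#define MADT_HDR_SIZE    44UL  /* 36-byte header + LAPIC address + flags */
#define LAPIC_ENTRY_SIZE 8UL   /* Processor Local APIC structure */

/* Hypothetical mirror of hvm_size_acpi_region(). */
static unsigned long acpi_region_bytes(unsigned int max_vcpus)
{
    unsigned long size = RSDP_SIZE;

    /* XSDT, sized like hvm_size_acpi_xsdt() with num_tables = 1. */
    size += XSDT_STRUCT_SIZE + 1 * XSDT_ENTRY_SIZE;
    /* MADT with one Local APIC entry per vCPU. */
    size += MADT_HDR_SIZE + LAPIC_ENTRY_SIZE * max_vcpus;

    /* Round up to whole pages, like ROUNDUP(size, PAGE_SIZE). */
    return (size + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}
```

Under these sizes, a single ACPI page covers up to 495 vCPUs (36 + 52 + 44 + 495 * 8 = 4092 bytes). A 496th vCPU pushes the region to two pages.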
* [RFC 20/38] x86/hyperlaunch: move populating p2m under domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (18 preceding siblings ...)
2025-04-19 22:08 ` [RFC 19/38] x86/hyperlaunch: add domu memory map construction Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 21/38] x86/hyperlaunch: move remaining pvh dom0 construction Daniel P. Smith
` (17 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Introduce hvm_populate_p2m() for populating domU p2m maps. Rename
pvh_populate_p2m() to dom0_pvh_populate_p2m() and export it, along with
pvh_populate_memory_range(), which hvm_populate_p2m() uses. With these
adjustments, move the calls that populate the p2m maps under the domain
builder.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom0_build.c | 14 ++------
xen/arch/x86/hvm/dom_build.c | 39 +++++++++++++++++++++++
xen/arch/x86/include/asm/dom0_build.h | 1 +
xen/arch/x86/include/asm/domain-builder.h | 2 ++
4 files changed, 45 insertions(+), 11 deletions(-)
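hvm_populate_p2m() populates only three kinds of e820 entries: RAM, ACPI, and the entry whose start frame equals START_SPECIAL_REGION. The sketch below applies the same filter to a mock map and totals the frames that would be populated. The e820 type numbers (RAM = 1, reserved = 2, ACPI = 3) are the standard ones. The map contents, struct sketch_e820, and the helper names are hypothetical, and 0xfeff8 is START_SPECIAL_REGION as derived from the patch's macros.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT    12
#define SKETCH_E820_RAM      1
#define SKETCH_E820_RESERVED 2
#define SKETCH_E820_ACPI     3
/* START_SPECIAL_REGION as a frame number: 0xff000 - 8. */
#define SKETCH_START_SPECIAL_PFN 0xfeff8ULL

struct sketch_e820 {
    unsigned long long addr, size;
    unsigned int type;
};

/* Same predicate as hvm_populate_p2m(): RAM, ACPI, or the special pages. */
static int wants_population(const struct sketch_e820 *e)
{
    return e->type == SKETCH_E820_RAM || e->type == SKETCH_E820_ACPI ||
           (e->addr >> SKETCH_PAGE_SHIFT) == SKETCH_START_SPECIAL_PFN;
}

/* Total frames that would be handed to pvh_populate_memory_range(). */
static unsigned long long populated_pages(const struct sketch_e820 *map,
                                          size_t nr)
{
    unsigned long long total = 0;
    size_t i;

    for ( i = 0; i < nr; i++ )
        if ( wants_population(&map[i]) )
            total += map[i].size >> SKETCH_PAGE_SHIFT;

    return total;
}
```

The special-page entry is reserved in the e820 but still needs backing memory, so the filter matches it by address rather than by type. The low reserved device-memory entry is skipped.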
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 8ea4c7bd2792..73ce33fb17f1 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -88,9 +88,8 @@ static int __init modify_identity_mmio(struct domain *d, unsigned long pfn,
}
/* Populate a HVM memory range using the biggest possible order. */
-static int __init pvh_populate_memory_range(struct domain *d,
- unsigned long start,
- unsigned long nr_pages)
+int __init pvh_populate_memory_range(
+ struct domain *d, unsigned long start, unsigned long nr_pages)
{
static const struct {
unsigned long align;
@@ -399,7 +398,7 @@ void __init dom0_pvh_setup_e820(struct domain *d, unsigned long nr_pages)
ASSERT(cur_pages == nr_pages);
}
-static int __init pvh_populate_p2m(struct domain *d)
+int __init dom0_pvh_populate_p2m(struct domain *d)
{
struct vcpu *v = d->vcpu[0];
unsigned int i;
@@ -1262,13 +1261,6 @@ int __init dom0_construct_pvh(struct boot_domain *bd)
struct domain *d = bd->d;
int rc;
- rc = pvh_populate_p2m(d);
- if ( rc )
- {
- printk("Failed to setup Dom0 physical memory map\n");
- return rc;
- }
-
rc = pvh_load_kernel(bd, &entry, &start_info);
if ( rc )
{
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 658d3d170e64..9421dc431ba9 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -23,6 +23,8 @@
#include <asm/dom0_build.h>
#include <asm/domain-builder.h>
#include <asm/hvm/io.h>
+#include <asm/hvm/support.h>
+#include <asm/p2m.h>
#include <asm/paging.h>
#include <asm/pci.h>
@@ -247,6 +249,33 @@ int __init hvm_setup_cpus(
return 0;
}
+static int __init hvm_populate_p2m(struct domain *d)
+{
+ unsigned int i;
+
+ /* Populate memory map. */
+ for ( i = 0; i < d->arch.nr_e820; i++ )
+ {
+ int rc;
+ unsigned long addr, size;
+
+ if ( d->arch.e820[i].type != E820_RAM &&
+ d->arch.e820[i].type != E820_ACPI &&
+ PFN_DOWN(d->arch.e820[i].addr) != START_SPECIAL_REGION )
+ continue;
+
+ addr = PFN_DOWN(d->arch.e820[i].addr);
+ size = PFN_DOWN(d->arch.e820[i].size);
+
+ rc = pvh_populate_memory_range(d, addr, size);
+ if ( rc )
+ return rc;
+
+ }
+
+ return 0;
+}
+
int __init dom_construct_pvh(struct boot_domain *bd)
{
int rc;
@@ -288,6 +317,16 @@ int __init dom_construct_pvh(struct boot_domain *bd)
if ( is_hardware_domain(bd->d) )
iommu_hwdom_init(bd->d);
+ if ( is_control_domain(bd->d) || is_hardware_domain(bd->d) )
+ rc = dom0_pvh_populate_p2m(bd->d);
+ else
+ rc = hvm_populate_p2m(bd->d);
+ if ( rc )
+ {
+ printk("Failed to setup HVM/PVH %pd physical memory map\n", bd->d);
+ return rc;
+ }
+
return dom0_construct_pvh(bd);
}
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index 36f563bd9d5b..3819b3f4e7a4 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -21,6 +21,7 @@ void dom0_pv_restrict_pages(
void dom0_pvh_setup_e820(struct domain *d, unsigned long nr_pages);
int dom0_pvh_setup_acpi(struct domain *d, paddr_t start_info);
+int dom0_pvh_populate_p2m(struct domain *d);
int dom0_construct_pv(struct boot_domain *bd);
int dom0_construct_pvh(struct boot_domain *bd);
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index 7bc2fbe66bc2..fa6fa60841ab 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -10,6 +10,8 @@ int __init builder_get_cmdline(
void builder_init(struct boot_info *bi);
int hvm_setup_cpus(struct domain *d, paddr_t entry, paddr_t start_info);
+int pvh_populate_memory_range(
+ struct domain *d, unsigned long start, unsigned long nr_pages);
unsigned long dom_paging_pages(
const struct boot_domain *d, unsigned long nr_pages);
--
2.30.2
* [RFC 21/38] x86/hyperlaunch: move remaining pvh dom0 construction
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (19 preceding siblings ...)
2025-04-19 22:08 ` [RFC 20/38] x86/hyperlaunch: move populating p2m under domain builder Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 22/38] x86/hyperlaunch: relocate pvh_steal_ram to domain builder Daniel P. Smith
` (16 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Move pvh_load_kernel() and its helper functions to the domain builder. This
makes it possible to move the remaining logic of dom0_construct_pvh() to the
domain builder as well, after which the function can be dropped.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/hvm/dom0_build.c | 365 --------------------------
xen/arch/x86/hvm/dom_build.c | 362 ++++++++++++++++++++++++-
xen/arch/x86/include/asm/dom0_build.h | 1 -
3 files changed, 361 insertions(+), 367 deletions(-)
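One of the helpers being moved, find_memory(), scans the sorted e820 for the first RAM region above 1MB that can hold `size` bytes. If the region contains the already-loaded kernel, it tries the gap below the kernel and then the gap above it. The sketch below reimplements that search over a mock map so the three cases can be exercised. It passes the kernel bounds directly instead of a struct elf_binary. sketch_find_memory() and the SKETCH_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MB(x)      ((uint64_t)(x) << 20)
#define SKETCH_PAGE_SIZE  4096ULL
#define SKETCH_E820_RAM   1
#define SKETCH_INVALID    (~(uint64_t)0)   /* plays INVALID_PADDR */

struct sketch_e820 { uint64_t addr, size; unsigned int type; };

static uint64_t roundup_page(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Mirrors find_memory(): kernel occupies [kstart, kend) before alignment. */
static uint64_t sketch_find_memory(const struct sketch_e820 *map, size_t nr,
                                   uint64_t kstart, uint64_t kend,
                                   uint64_t size)
{
    size_t i;

    kstart &= ~(SKETCH_PAGE_SIZE - 1);
    kend = roundup_page(kend);

    for ( i = 0; i < nr; i++ )
    {
        uint64_t start, end = map[i].addr + map[i].size;

        /* Skip non-RAM and anything ending below 1MB (BDA/EBDA/IBFT). */
        if ( end <= SKETCH_MB(1) || map[i].type != SKETCH_E820_RAM )
            continue;

        start = roundup_page(map[i].addr);
        if ( start < SKETCH_MB(1) )
            start = SKETCH_MB(1);

        if ( end <= kstart || start >= kend )
        {
            if ( end - start >= size )   /* region does not hold the kernel */
                return start;
        }
        else if ( kstart > start && kstart - start >= size )
            return start;                /* gap below the kernel */
        else if ( kend < end && end - kend >= size )
            return kend;                 /* gap above the kernel */
    }

    return SKETCH_INVALID;
}
```

Take a kernel at 16MB to 20MB inside a 1MB to 64MB RAM region. An 8MB request fits below it at 1MB, a 32MB request only fits above it at 20MB, and a 64MB request fails.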
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 73ce33fb17f1..23b46ef86c9f 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -478,335 +478,6 @@ int __init dom0_pvh_populate_p2m(struct domain *d)
#undef MB1_PAGES
}
-static paddr_t __init find_memory(
- const struct domain *d, const struct elf_binary *elf, size_t size)
-{
- paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
- paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size,
- PAGE_SIZE);
- unsigned int i;
-
- /*
- * The memory map is sorted and all RAM regions starts and sizes are
- * aligned to page boundaries.
- */
- for ( i = 0; i < d->arch.nr_e820; i++ )
- {
- paddr_t start, end = d->arch.e820[i].addr + d->arch.e820[i].size;
-
- /* Don't use memory below 1MB, as it could overwrite BDA/EBDA/IBFT. */
- if ( end <= MB(1) || d->arch.e820[i].type != E820_RAM )
- continue;
-
- start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
-
- ASSERT(IS_ALIGNED(start, PAGE_SIZE) && IS_ALIGNED(end, PAGE_SIZE));
-
- /*
- * NB: Even better would be to use rangesets to determine a suitable
- * range, in particular in case a kernel requests multiple heavily
- * discontiguous regions (which right now we fold all into one big
- * region).
- */
- if ( end <= kernel_start || start >= kernel_end )
- {
- /* No overlap, just check whether the region is large enough. */
- if ( end - start >= size )
- return start;
- }
- /* Deal with the kernel already being loaded in the region. */
- else if ( kernel_start > start && kernel_start - start >= size )
- return start;
- else if ( kernel_end < end && end - kernel_end >= size )
- return kernel_end;
- }
-
- return INVALID_PADDR;
-}
-
-static bool __init check_load_address(
- const struct domain *d, const struct elf_binary *elf)
-{
- paddr_t kernel_start = (uintptr_t)elf->dest_base;
- paddr_t kernel_end = kernel_start + elf->dest_size;
- unsigned int i;
-
- /* Relies on a sorted memory map with adjacent entries merged. */
- for ( i = 0; i < d->arch.nr_e820; i++ )
- {
- paddr_t start = d->arch.e820[i].addr;
- paddr_t end = start + d->arch.e820[i].size;
-
- if ( start >= kernel_end )
- return false;
-
- if ( d->arch.e820[i].type == E820_RAM &&
- start <= kernel_start &&
- end >= kernel_end )
- return true;
- }
-
- return false;
-}
-
-/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
-static paddr_t __init find_kernel_memory(
- const struct domain *d, struct elf_binary *elf,
- const struct elf_dom_parms *parms)
-{
- paddr_t kernel_size = elf->dest_size;
- unsigned int align;
- unsigned int i;
-
- if ( parms->phys_align != UNSET_ADDR32 )
- align = parms->phys_align;
- else if ( elf->palign >= PAGE_SIZE )
- align = elf->palign;
- else
- align = MB(2);
-
- /* Search backwards to find the highest address. */
- for ( i = d->arch.nr_e820; i--; )
- {
- paddr_t start = d->arch.e820[i].addr;
- paddr_t end = start + d->arch.e820[i].size;
- paddr_t kstart, kend;
-
- if ( d->arch.e820[i].type != E820_RAM ||
- d->arch.e820[i].size < kernel_size )
- continue;
-
- if ( start > parms->phys_max )
- continue;
-
- if ( end - 1 > parms->phys_max )
- end = parms->phys_max + 1;
-
- kstart = (end - kernel_size) & ~(align - 1);
- kend = kstart + kernel_size;
-
- if ( kstart < parms->phys_min )
- return 0;
-
- if ( kstart >= start && kend <= end )
- return kstart;
- }
-
- return 0;
-}
-
-/* Check the kernel load address, and adjust if necessary and possible. */
-static bool __init check_and_adjust_load_address(
- const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
-{
- paddr_t reloc_base;
-
- if ( check_load_address(d, elf) )
- return true;
-
- if ( !parms->phys_reloc )
- {
- printk("%pd kernel: Address conflict and not relocatable\n", d);
- return false;
- }
-
- reloc_base = find_kernel_memory(d, elf, parms);
- if ( !reloc_base )
- {
- printk("%pd kernel: Failed find a load address\n", d);
- return false;
- }
-
- if ( opt_dom0_verbose )
- printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d,
- elf->dest_base, elf->dest_base + elf->dest_size - 1,
- reloc_base, reloc_base + elf->dest_size - 1);
-
- parms->phys_entry =
- reloc_base + (parms->phys_entry - (uintptr_t)elf->dest_base);
- elf->dest_base = (char *)reloc_base;
-
- return true;
-}
-
-static int __init pvh_load_kernel(
- const struct boot_domain *bd, paddr_t *entry, paddr_t *start_info_addr)
-{
- struct domain *d = bd->d;
- struct boot_module *image = bd->kernel;
- struct boot_module *initrd = bd->ramdisk;
- void *image_base = bootstrap_map_bm(image);
- void *image_start = image_base + image->headroom;
- unsigned long image_len = image->size;
- unsigned long initrd_len = initrd ? initrd->size : 0;
- const char *initrd_cmdline = NULL;
- struct elf_binary elf;
- struct elf_dom_parms parms;
- size_t extra_space;
- paddr_t last_addr;
- struct hvm_start_info start_info = { 0 };
- struct hvm_modlist_entry mod = { 0 };
- struct vcpu *v = d->vcpu[0];
- int rc;
-
- if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
- {
- printk("Error trying to detect bz compressed kernel\n");
- return rc;
- }
-
- if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
- {
- printk("Unable to init ELF\n");
- return rc;
- }
- if ( opt_dom0_verbose )
- elf_set_verbose(&elf);
- elf_parse_binary(&elf);
- if ( (rc = elf_xen_parse(&elf, &parms, true)) != 0 )
- {
- printk("Unable to parse kernel for ELFNOTES\n");
- if ( elf_check_broken(&elf) )
- printk("%pd kernel: broken ELF: %s\n", d, elf_check_broken(&elf));
- return rc;
- }
-
- if ( parms.phys_entry == UNSET_ADDR32 )
- {
- printk("Unable to find XEN_ELFNOTE_PHYS32_ENTRY address\n");
- return -EINVAL;
- }
-
- /* Copy the OS image and free temporary buffer. */
- elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
- elf.dest_size = parms.virt_kend - parms.virt_kstart;
-
- if ( !check_and_adjust_load_address(d, &elf, &parms) )
- return -ENOSPC;
-
- elf_set_vcpu(&elf, v);
- rc = elf_load_binary(&elf);
- if ( rc < 0 )
- {
- printk("Failed to load kernel: %d\n", rc);
- if ( elf_check_broken(&elf) )
- printk("%pd kernel: broken ELF: %s\n", d, elf_check_broken(&elf));
- return rc;
- }
-
- /*
- * Find a RAM region big enough (and that doesn't overlap with the loaded
- * kernel) in order to load the initrd and the metadata. Note it could be
- * split into smaller allocations, done as a single region in order to
- * simplify it.
- */
- extra_space = sizeof(start_info);
-
- if ( initrd )
- {
- size_t initrd_space = elf_round_up(&elf, initrd_len);
-
- if ( initrd->cmdline_pa )
- {
- initrd_cmdline = __va(initrd->cmdline_pa);
- if ( !*initrd_cmdline )
- initrd_cmdline = NULL;
- }
- if ( initrd_cmdline )
- initrd_space += strlen(initrd_cmdline) + 1;
-
- if ( initrd_space )
- extra_space += ROUNDUP(initrd_space, PAGE_SIZE) + sizeof(mod);
- else
- initrd = NULL;
- }
-
- if ( bd->cmdline )
- extra_space += elf_round_up(&elf, strlen(bd->cmdline) + 1);
-
- last_addr = find_memory(d, &elf, extra_space);
- if ( last_addr == INVALID_PADDR )
- {
- printk("Unable to find a memory region to load initrd and metadata\n");
- return -ENOMEM;
- }
-
- if ( initrd != NULL )
- {
- rc = hvm_copy_to_guest_phys(last_addr, __va(initrd->start),
- initrd_len, v);
- if ( rc )
- {
- printk("Unable to copy initrd to guest\n");
- return rc;
- }
-
- mod.paddr = last_addr;
- mod.size = initrd_len;
- last_addr += elf_round_up(&elf, initrd_len);
- if ( initrd_cmdline )
- {
- size_t len = strlen(initrd_cmdline) + 1;
-
- rc = hvm_copy_to_guest_phys(last_addr, initrd_cmdline, len, v);
- if ( rc )
- {
- printk("Unable to copy module command line\n");
- return rc;
- }
- mod.cmdline_paddr = last_addr;
- last_addr += len;
- }
- last_addr = ROUNDUP(last_addr, PAGE_SIZE);
- }
-
- /* Free temporary buffers. */
- free_boot_modules();
-
- if ( bd->cmdline )
- {
- rc = hvm_copy_to_guest_phys(last_addr, bd->cmdline,
- strlen(bd->cmdline) + 1, v);
- if ( rc )
- {
- printk("Unable to copy guest command line\n");
- return rc;
- }
- start_info.cmdline_paddr = last_addr;
- /*
- * Round up to 32/64 bits (depending on the guest kernel bitness) so
- * the modlist/start_info is aligned.
- */
- last_addr += elf_round_up(&elf, strlen(bd->cmdline) + 1);
- }
- if ( initrd != NULL )
- {
- rc = hvm_copy_to_guest_phys(last_addr, &mod, sizeof(mod), v);
- if ( rc )
- {
- printk("Unable to copy guest modules\n");
- return rc;
- }
- start_info.modlist_paddr = last_addr;
- start_info.nr_modules = 1;
- last_addr += sizeof(mod);
- }
-
- start_info.magic = XEN_HVM_START_MAGIC_VALUE;
- start_info.flags = SIF_PRIVILEGED | SIF_INITDOMAIN;
- rc = hvm_copy_to_guest_phys(last_addr, &start_info, sizeof(start_info), v);
- if ( rc )
- {
- printk("Unable to copy start info to guest\n");
- return rc;
- }
-
- *entry = parms.phys_entry;
- *start_info_addr = last_addr;
-
- return 0;
-}
-
static int __init cf_check acpi_count_intr_ovr(
struct acpi_subtable_header *header, const unsigned long end)
{
@@ -1255,42 +926,6 @@ int __init dom0_pvh_setup_acpi(struct domain *d, paddr_t start_info)
return 0;
}
-int __init dom0_construct_pvh(struct boot_domain *bd)
-{
- paddr_t entry, start_info;
- struct domain *d = bd->d;
- int rc;
-
- rc = pvh_load_kernel(bd, &entry, &start_info);
- if ( rc )
- {
- printk("Failed to load Dom0 kernel\n");
- return rc;
- }
-
- rc = hvm_setup_cpus(bd->d, entry, start_info);
- if ( rc )
- {
- printk("Failed to setup Dom0 CPUs: %d\n", rc);
- return rc;
- }
-
- rc = dom0_pvh_setup_acpi(bd->d, start_info);
- if ( rc )
- {
- printk("Failed to setup Dom0 ACPI tables: %d\n", rc);
- return rc;
- }
-
- if ( opt_dom0_verbose )
- {
- printk("Dom%u memory map:\n", d->domain_id);
- print_e820_memory_map(d->arch.e820, d->arch.nr_e820);
- }
-
- return 0;
-}
-
/*
* Local variables:
* mode: C
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 9421dc431ba9..2e47ca489a71 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -16,10 +16,12 @@
#include <acpi/actables.h>
+#include <public/arch-x86/hvm/start_info.h>
#include <public/hvm/e820.h>
#include <public/hvm/hvm_vcpu.h>
#include <asm/bootinfo.h>
+#include <asm/bzimage.h>
#include <asm/dom0_build.h>
#include <asm/domain-builder.h>
#include <asm/hvm/io.h>
@@ -276,8 +278,338 @@ static int __init hvm_populate_p2m(struct domain *d)
return 0;
}
+static paddr_t __init find_memory(
+ const struct domain *d, const struct elf_binary *elf, size_t size)
+{
+ paddr_t kernel_start = (paddr_t)elf->dest_base & PAGE_MASK;
+ paddr_t kernel_end = ROUNDUP((paddr_t)elf->dest_base + elf->dest_size,
+ PAGE_SIZE);
+ unsigned int i;
+
+ /*
+ * The memory map is sorted and all RAM region starts and sizes are
+ * aligned to page boundaries.
+ */
+ for ( i = 0; i < d->arch.nr_e820; i++ )
+ {
+ paddr_t start, end = d->arch.e820[i].addr + d->arch.e820[i].size;
+
+ /* Don't use memory below 1MB, as it could overwrite BDA/EBDA/IBFT. */
+ if ( end <= MB(1) || d->arch.e820[i].type != E820_RAM )
+ continue;
+
+ start = MAX(ROUNDUP(d->arch.e820[i].addr, PAGE_SIZE), MB(1));
+
+ ASSERT(IS_ALIGNED(start, PAGE_SIZE) && IS_ALIGNED(end, PAGE_SIZE));
+
+ /*
+ * NB: Even better would be to use rangesets to determine a suitable
+ * range, in particular in case a kernel requests multiple heavily
+ * discontiguous regions (which right now we fold all into one big
+ * region).
+ */
+ if ( end <= kernel_start || start >= kernel_end )
+ {
+ /* No overlap, just check whether the region is large enough. */
+ if ( end - start >= size )
+ return start;
+ }
+ /* Deal with the kernel already being loaded in the region. */
+ else if ( kernel_start > start && kernel_start - start >= size )
+ return start;
+ else if ( kernel_end < end && end - kernel_end >= size )
+ return kernel_end;
+ }
+
+ return INVALID_PADDR;
+}
+
+static bool __init check_load_address(
+ const struct domain *d, const struct elf_binary *elf)
+{
+ paddr_t kernel_start = (uintptr_t)elf->dest_base;
+ paddr_t kernel_end = kernel_start + elf->dest_size;
+ unsigned int i;
+
+ /* Relies on a sorted memory map with adjacent entries merged. */
+ for ( i = 0; i < d->arch.nr_e820; i++ )
+ {
+ paddr_t start = d->arch.e820[i].addr;
+ paddr_t end = start + d->arch.e820[i].size;
+
+ if ( start >= kernel_end )
+ return false;
+
+ if ( d->arch.e820[i].type == E820_RAM &&
+ start <= kernel_start &&
+ end >= kernel_end )
+ return true;
+ }
+
+ return false;
+}
+
+/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
+static paddr_t __init find_kernel_memory(
+ const struct domain *d, struct elf_binary *elf,
+ const struct elf_dom_parms *parms)
+{
+ paddr_t kernel_size = elf->dest_size;
+ unsigned int align;
+ unsigned int i;
+
+ if ( parms->phys_align != UNSET_ADDR32 )
+ align = parms->phys_align;
+ else if ( elf->palign >= PAGE_SIZE )
+ align = elf->palign;
+ else
+ align = MB(2);
+
+ /* Search backwards to find the highest address. */
+ for ( i = d->arch.nr_e820; i--; )
+ {
+ paddr_t start = d->arch.e820[i].addr;
+ paddr_t end = start + d->arch.e820[i].size;
+ paddr_t kstart, kend;
+
+ if ( d->arch.e820[i].type != E820_RAM ||
+ d->arch.e820[i].size < kernel_size )
+ continue;
+
+ if ( start > parms->phys_max )
+ continue;
+
+ if ( end - 1 > parms->phys_max )
+ end = parms->phys_max + 1;
+
+ kstart = (end - kernel_size) & ~(align - 1);
+ kend = kstart + kernel_size;
+
+ if ( kstart < parms->phys_min )
+ return 0;
+
+ if ( kstart >= start && kend <= end )
+ return kstart;
+ }
+
+ return 0;
+}
+
+/* Check the kernel load address, and adjust if necessary and possible. */
+static bool __init check_and_adjust_load_address(
+ const struct domain *d, struct elf_binary *elf, struct elf_dom_parms *parms)
+{
+ paddr_t reloc_base;
+
+ if ( check_load_address(d, elf) )
+ return true;
+
+ if ( !parms->phys_reloc )
+ {
+ printk("%pd kernel: Address conflict and not relocatable\n", d);
+ return false;
+ }
+
+ reloc_base = find_kernel_memory(d, elf, parms);
+ if ( !reloc_base )
+ {
+ printk("%pd kernel: Failed to find a load address\n", d);
+ return false;
+ }
+
+ if ( opt_dom0_verbose )
+ printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", d,
+ elf->dest_base, elf->dest_base + elf->dest_size - 1,
+ reloc_base, reloc_base + elf->dest_size - 1);
+
+ parms->phys_entry =
+ reloc_base + (parms->phys_entry - (uintptr_t)elf->dest_base);
+ elf->dest_base = (char *)reloc_base;
+
+ return true;
+}
+
+static int __init pvh_load_kernel(
+ const struct boot_domain *bd, paddr_t *entry, paddr_t *start_info_addr)
+{
+ struct domain *d = bd->d;
+ struct boot_module *image = bd->kernel;
+ struct boot_module *initrd = bd->ramdisk;
+ void *image_base = bootstrap_map_bm(image);
+ void *image_start = image_base + image->headroom;
+ unsigned long image_len = image->size;
+ unsigned long initrd_len = initrd ? initrd->size : 0;
+ const char *initrd_cmdline = NULL;
+ struct elf_binary elf;
+ struct elf_dom_parms parms;
+ size_t extra_space;
+ paddr_t last_addr;
+ struct hvm_start_info start_info = { 0 };
+ struct hvm_modlist_entry mod = { 0 };
+ struct vcpu *v = d->vcpu[0];
+ int rc;
+
+ if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
+ {
+ printk("Error trying to detect bz compressed kernel\n");
+ return rc;
+ }
+
+ if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
+ {
+ printk("Unable to init ELF\n");
+ return rc;
+ }
+ if ( opt_dom0_verbose )
+ elf_set_verbose(&elf);
+ elf_parse_binary(&elf);
+ if ( (rc = elf_xen_parse(&elf, &parms, true)) != 0 )
+ {
+ printk("Unable to parse kernel for ELFNOTES\n");
+ if ( elf_check_broken(&elf) )
+ printk("%pd kernel: broken ELF: %s\n", d, elf_check_broken(&elf));
+ return rc;
+ }
+
+ if ( parms.phys_entry == UNSET_ADDR32 )
+ {
+ printk("Unable to find XEN_ELFNOTE_PHYS32_ENTRY address\n");
+ return -EINVAL;
+ }
+
+ /* Copy the OS image and free temporary buffer. */
+ elf.dest_base = (void *)(parms.virt_kstart - parms.virt_base);
+ elf.dest_size = parms.virt_kend - parms.virt_kstart;
+
+ if ( !check_and_adjust_load_address(d, &elf, &parms) )
+ return -ENOSPC;
+
+ elf_set_vcpu(&elf, v);
+ rc = elf_load_binary(&elf);
+ if ( rc < 0 )
+ {
+ printk("Failed to load kernel: %d\n", rc);
+ if ( elf_check_broken(&elf) )
+ printk("%pd kernel: broken ELF: %s\n", d, elf_check_broken(&elf));
+ return rc;
+ }
+
+ /*
+ * Find a RAM region big enough (and that doesn't overlap with the loaded
+ * kernel) in order to load the initrd and the metadata. Note it could be
+ * split into smaller allocations, done as a single region in order to
+ * simplify it.
+ */
+ extra_space = sizeof(start_info);
+
+ if ( initrd )
+ {
+ size_t initrd_space = elf_round_up(&elf, initrd_len);
+
+ if ( initrd->cmdline_pa )
+ {
+ initrd_cmdline = __va(initrd->cmdline_pa);
+ if ( !*initrd_cmdline )
+ initrd_cmdline = NULL;
+ }
+ if ( initrd_cmdline )
+ initrd_space += strlen(initrd_cmdline) + 1;
+
+ if ( initrd_space )
+ extra_space += ROUNDUP(initrd_space, PAGE_SIZE) + sizeof(mod);
+ else
+ initrd = NULL;
+ }
+
+ if ( bd->cmdline )
+ extra_space += elf_round_up(&elf, strlen(bd->cmdline) + 1);
+
+ last_addr = find_memory(d, &elf, extra_space);
+ if ( last_addr == INVALID_PADDR )
+ {
+ printk("Unable to find a memory region to load initrd and metadata\n");
+ return -ENOMEM;
+ }
+
+ if ( initrd != NULL )
+ {
+ rc = hvm_copy_to_guest_phys(last_addr, __va(initrd->start),
+ initrd_len, v);
+ if ( rc )
+ {
+ printk("Unable to copy initrd to guest\n");
+ return rc;
+ }
+
+ mod.paddr = last_addr;
+ mod.size = initrd_len;
+ last_addr += elf_round_up(&elf, initrd_len);
+ if ( initrd_cmdline )
+ {
+ size_t len = strlen(initrd_cmdline) + 1;
+
+ rc = hvm_copy_to_guest_phys(last_addr, initrd_cmdline, len, v);
+ if ( rc )
+ {
+ printk("Unable to copy module command line\n");
+ return rc;
+ }
+ mod.cmdline_paddr = last_addr;
+ last_addr += len;
+ }
+ last_addr = ROUNDUP(last_addr, PAGE_SIZE);
+ }
+
+ /* Free temporary buffers. */
+ free_boot_modules();
+
+ if ( bd->cmdline )
+ {
+ rc = hvm_copy_to_guest_phys(last_addr, bd->cmdline,
+ strlen(bd->cmdline) + 1, v);
+ if ( rc )
+ {
+ printk("Unable to copy guest command line\n");
+ return rc;
+ }
+ start_info.cmdline_paddr = last_addr;
+ /*
+ * Round up to 32/64 bits (depending on the guest kernel bitness) so
+ * the modlist/start_info is aligned.
+ */
+ last_addr += elf_round_up(&elf, strlen(bd->cmdline) + 1);
+ }
+ if ( initrd != NULL )
+ {
+ rc = hvm_copy_to_guest_phys(last_addr, &mod, sizeof(mod), v);
+ if ( rc )
+ {
+ printk("Unable to copy guest modules\n");
+ return rc;
+ }
+ start_info.modlist_paddr = last_addr;
+ start_info.nr_modules = 1;
+ last_addr += sizeof(mod);
+ }
+
+ start_info.magic = XEN_HVM_START_MAGIC_VALUE;
+ start_info.flags = SIF_PRIVILEGED | SIF_INITDOMAIN;
+ rc = hvm_copy_to_guest_phys(last_addr, &start_info, sizeof(start_info), v);
+ if ( rc )
+ {
+ printk("Unable to copy start info to guest\n");
+ return rc;
+ }
+
+ *entry = parms.phys_entry;
+ *start_info_addr = last_addr;
+
+ return 0;
+}
+
int __init dom_construct_pvh(struct boot_domain *bd)
{
+ paddr_t entry, start_info;
int rc;
printk(XENLOG_INFO "*** Building a PVH Dom%d ***\n", bd->domid);
@@ -327,7 +659,35 @@ int __init dom_construct_pvh(struct boot_domain *bd)
return rc;
}
- return dom0_construct_pvh(bd);
+ rc = pvh_load_kernel(bd, &entry, &start_info);
+ if ( rc )
+ {
+ printk("Failed to load Dom0 kernel\n");
+ return rc;
+ }
+
+ rc = hvm_setup_cpus(bd->d, entry, start_info);
+ if ( rc )
+ {
+ printk("Failed to setup Dom0 CPUs: %d\n", rc);
+ return rc;
+ }
+
+ rc = dom0_pvh_setup_acpi(bd->d, start_info);
+ if ( rc )
+ {
+ printk("Failed to setup Dom0 ACPI tables: %d\n", rc);
+ return rc;
+ }
+
+ if ( opt_dom0_verbose )
+ {
+ printk("Dom%u memory map:\n", bd->domid);
+ print_e820_memory_map(bd->d->arch.e820, bd->d->arch.nr_e820);
+ }
+
+ printk("WARNING: PVH is an experimental mode with limited functionality\n");
+ return 0;
}
/*
diff --git a/xen/arch/x86/include/asm/dom0_build.h b/xen/arch/x86/include/asm/dom0_build.h
index 3819b3f4e7a4..6947aaa1dce3 100644
--- a/xen/arch/x86/include/asm/dom0_build.h
+++ b/xen/arch/x86/include/asm/dom0_build.h
@@ -24,7 +24,6 @@ int dom0_pvh_setup_acpi(struct domain *d, paddr_t start_info);
int dom0_pvh_populate_p2m(struct domain *d);
int dom0_construct_pv(struct boot_domain *bd);
-int dom0_construct_pvh(struct boot_domain *bd);
void dom0_update_physmap(bool compat, unsigned long pfn,
unsigned long mfn, unsigned long vphysmap_s);
--
2.30.2
* [RFC 22/38] x86/hyperlaunch: relocate pvh_steal_ram to domain builder
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
The functions pvh_steal_ram() and pvh_add_mem_range() are not PVH specific and
can be used on any HVM domain. Move them to the domain builder and rename them
to hvm_steal_ram() and hvm_add_mem_range() respectively.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
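A standalone sketch (not part of the patch) of the carving done by hvm_steal_ram(), on a simplified map. `struct e820_sketch`, `SKETCH_MB` and `steal_ram()` are hypothetical names, not Xen APIs. The sketch shrinks the highest suitable RAM entry in place; the size check before the subtraction is an extra guard added only here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_E820_RAM 1u
#define SKETCH_MB(x)    ((uint64_t)(x) << 20)

struct e820_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * Carve 'size' bytes, aligned to 'align', off the top of the highest RAM
 * entry that lies entirely below 'limit', never starting under 1 MiB.
 * The entry is shrunk in place; the stolen range is left out of the map.
 */
static int steal_ram(struct e820_sketch *map, unsigned int nr,
                     uint64_t size, uint64_t align, uint64_t limit,
                     uint64_t *addr)
{
    unsigned int i = nr;

    /* Treat alignment 0 as 1 so that ~(align - 1) does not wrap to 0. */
    align = align ? align : 1;
    assert(!(align & (align - 1)));

    while ( i-- )
    {
        struct e820_sketch *entry = &map[i];

        if ( entry->type != SKETCH_E820_RAM ||
             entry->addr + entry->size > limit )
            continue;

        /* Extra guard in this sketch against unsigned wrap-around. */
        if ( entry->size < size )
            continue;

        *addr = (entry->addr + entry->size - size) & ~(align - 1);
        if ( *addr < entry->addr || *addr < SKETCH_MB(1) )
            continue;

        entry->size = *addr - entry->addr;
        return 0;
    }

    return -ENOMEM;
}
```

Because the stolen range simply drops out of the map, callers such as pvh_setup_vmx_realmode_helpers() re-add it with the appropriate type through hvm_add_mem_range().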
xen/arch/x86/hvm/dom0_build.c | 106 +++-------------------
xen/arch/x86/hvm/dom_build.c | 84 +++++++++++++++++
xen/arch/x86/include/asm/domain-builder.h | 7 ++
3 files changed, 102 insertions(+), 95 deletions(-)
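hvm_add_mem_range() keeps the memory map sorted while either extending a neighbouring entry of the same type or inserting a new one. The sketch below follows the same approach on a fixed-size array, an assumption replacing the xzalloc_array() reallocation; all names are hypothetical. When extending an entry downwards it moves the start and grows the size together, so the entry's end stays where it was.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_E820_RAM      1u
#define SKETCH_E820_RESERVED 2u
#define SKETCH_E820_ACPI     3u
#define SKETCH_MAP_MAX       16u

struct e820_sketch {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

struct map_sketch {
    struct e820_sketch e[SKETCH_MAP_MAX];
    unsigned int nr;
};

/*
 * Insert [s, e) of 'type' into a sorted map: merge with an adjacent entry
 * of the same type when possible, fail with -EEXIST on overlap, otherwise
 * insert a new entry at its sorted position.
 */
static int add_mem_range(struct map_sketch *m, uint64_t s, uint64_t e,
                         uint32_t type)
{
    unsigned int i;

    assert(s < e);

    for ( i = 0; i < m->nr; i++ )
    {
        uint64_t rs = m->e[i].addr;
        uint64_t re = rs + m->e[i].size;

        /* [s, e) ends where entry i starts: grow entry i downwards. */
        if ( rs == e && m->e[i].type == type )
        {
            m->e[i].addr = s;
            m->e[i].size += e - s;      /* keep the entry's end in place */
            return 0;
        }

        /* [s, e) starts where entry i ends and fits before entry i + 1. */
        if ( re == s && m->e[i].type == type &&
             (i + 1 == m->nr || m->e[i + 1].addr >= e) )
        {
            m->e[i].size += e - s;
            return 0;
        }

        if ( rs >= e )      /* first entry above the range: insert here */
            break;

        if ( re > s )       /* overlaps entry i */
            return -EEXIST;
    }

    if ( m->nr == SKETCH_MAP_MAX )
        return -ENOMEM;

    if ( i < m->nr )
        memmove(&m->e[i + 1], &m->e[i], (m->nr - i) * sizeof(m->e[0]));
    m->e[i].addr = s;
    m->e[i].size = e - s;
    m->e[i].type = type;
    m->nr++;

    return 0;
}
```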
diff --git a/xen/arch/x86/hvm/dom0_build.c b/xen/arch/x86/hvm/dom0_build.c
index 23b46ef86c9f..11dfc54f5ab2 100644
--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -182,90 +182,6 @@ int __init pvh_populate_memory_range(
return 0;
}
-/* Steal RAM from the end of a memory region. */
-static int __init pvh_steal_ram(struct domain *d, unsigned long size,
- unsigned long align, paddr_t limit,
- paddr_t *addr)
-{
- unsigned int i = d->arch.nr_e820;
-
- /*
- * Alignment 0 should be set to 1, so it doesn't wrap around in the
- * calculations below.
- */
- align = align ? : 1;
- while ( i-- )
- {
- struct e820entry *entry = &d->arch.e820[i];
-
- if ( entry->type != E820_RAM || entry->addr + entry->size > limit )
- continue;
-
- *addr = (entry->addr + entry->size - size) & ~(align - 1);
- if ( *addr < entry->addr ||
- /* Don't steal from the low 1MB due to the copying done there. */
- *addr < MB(1) )
- continue;
-
- entry->size = *addr - entry->addr;
- return 0;
- }
-
- return -ENOMEM;
-}
-
-/* NB: memory map must be sorted at all times for this to work correctly. */
-static int __init pvh_add_mem_range(struct domain *d, uint64_t s, uint64_t e,
- unsigned int type)
-{
- struct e820entry *map;
- unsigned int i;
-
- for ( i = 0; i < d->arch.nr_e820; i++ )
- {
- uint64_t rs = d->arch.e820[i].addr;
- uint64_t re = rs + d->arch.e820[i].size;
-
- if ( rs == e && d->arch.e820[i].type == type )
- {
- d->arch.e820[i].addr = s;
- return 0;
- }
-
- if ( re == s && d->arch.e820[i].type == type &&
- (i + 1 == d->arch.nr_e820 || d->arch.e820[i + 1].addr >= e) )
- {
- d->arch.e820[i].size += e - s;
- return 0;
- }
-
- if ( rs >= e )
- break;
-
- if ( re > s )
- return -EEXIST;
- }
-
- map = xzalloc_array(struct e820entry, d->arch.nr_e820 + 1);
- if ( !map )
- {
- printk(XENLOG_WARNING "E820: out of memory to add region\n");
- return -ENOMEM;
- }
-
- memcpy(map, d->arch.e820, i * sizeof(*d->arch.e820));
- memcpy(map + i + 1, d->arch.e820 + i,
- (d->arch.nr_e820 - i) * sizeof(*d->arch.e820));
- map[i].addr = s;
- map[i].size = e - s;
- map[i].type = type;
- xfree(d->arch.e820);
- d->arch.e820 = map;
- d->arch.nr_e820++;
-
- return 0;
-}
-
static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
{
uint32_t rc, *ident_pt;
@@ -279,14 +195,14 @@ static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
* TSS structure (which accounts for the first 104b) doesn't cross
* a page boundary.
*/
- if ( !pvh_steal_ram(d, HVM_VM86_TSS_SIZE, 128, GB(4), &gaddr) )
+ if ( !hvm_steal_ram(d, HVM_VM86_TSS_SIZE, 128, GB(4), &gaddr) )
{
if ( hvm_copy_to_guest_phys(gaddr, NULL, HVM_VM86_TSS_SIZE, v) !=
HVMTRANS_okay )
printk("Unable to zero VM86 TSS area\n");
d->arch.hvm.params[HVM_PARAM_VM86_TSS_SIZED] =
VM86_TSS_UPDATED | ((uint64_t)HVM_VM86_TSS_SIZE << 32) | gaddr;
- if ( pvh_add_mem_range(d, gaddr, gaddr + HVM_VM86_TSS_SIZE,
+ if ( hvm_add_mem_range(d, gaddr, gaddr + HVM_VM86_TSS_SIZE,
E820_RESERVED) )
printk("Unable to set VM86 TSS as reserved in the memory map\n");
}
@@ -294,7 +210,7 @@ static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
printk("Unable to allocate VM86 TSS area\n");
/* Steal some more RAM for the identity page tables. */
- if ( pvh_steal_ram(d, PAGE_SIZE, PAGE_SIZE, GB(4), &gaddr) )
+ if ( hvm_steal_ram(d, PAGE_SIZE, PAGE_SIZE, GB(4), &gaddr) )
{
printk("Unable to find memory to stash the identity page tables\n");
return -ENOMEM;
@@ -316,7 +232,7 @@ static int __init pvh_setup_vmx_realmode_helpers(struct domain *d)
unmap_domain_page(ident_pt);
put_page(mfn_to_page(mfn));
d->arch.hvm.params[HVM_PARAM_IDENT_PT] = gaddr;
- if ( pvh_add_mem_range(d, gaddr, gaddr + PAGE_SIZE, E820_RESERVED) )
+ if ( hvm_add_mem_range(d, gaddr, gaddr + PAGE_SIZE, E820_RESERVED) )
printk("Unable to set identity page tables as reserved in the memory map\n");
return 0;
@@ -581,7 +497,7 @@ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
madt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, madt), size);
/* Place the new MADT in guest memory space. */
- if ( pvh_steal_ram(d, size, 0, GB(4), addr) )
+ if ( hvm_steal_ram(d, size, 0, GB(4), addr) )
{
printk("Unable to steal guest RAM for MADT\n");
rc = -ENOMEM;
@@ -589,7 +505,7 @@ static int __init pvh_setup_acpi_madt(struct domain *d, paddr_t *addr)
}
/* Mark this region as E820_ACPI. */
- if ( pvh_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
+ if ( hvm_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
printk("Unable to add MADT region to memory map\n");
rc = hvm_copy_to_guest_phys(*addr, madt, size, d->vcpu[0]);
@@ -769,7 +685,7 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
xsdt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, xsdt), size);
/* Place the new XSDT in guest memory space. */
- if ( pvh_steal_ram(d, size, 0, GB(4), addr) )
+ if ( hvm_steal_ram(d, size, 0, GB(4), addr) )
{
printk("Unable to find guest RAM for XSDT\n");
rc = -ENOMEM;
@@ -777,7 +693,7 @@ static int __init pvh_setup_acpi_xsdt(struct domain *d, paddr_t madt_addr,
}
/* Mark this region as E820_ACPI. */
- if ( pvh_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
+ if ( hvm_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
printk("Unable to add XSDT region to memory map\n");
rc = hvm_copy_to_guest_phys(*addr, xsdt, size, d->vcpu[0]);
@@ -823,7 +739,7 @@ int __init dom0_pvh_setup_acpi(struct domain *d, paddr_t start_info)
if ( strncmp(sig, ACPI_SIG_MADT, ACPI_NAME_SIZE)
? pvh_acpi_table_allowed(sig, addr, size)
: !acpi_memory_banned(addr, size) )
- pvh_add_mem_range(d, addr, addr + size, E820_ACPI);
+ hvm_add_mem_range(d, addr, addr + size, E820_ACPI);
}
/* Identity map ACPI e820 regions. */
@@ -892,14 +808,14 @@ int __init dom0_pvh_setup_acpi(struct domain *d, paddr_t start_info)
* the native RSDT, and should not be used for the Dom0 kernel's boot
* purposes (we keep it visible for post boot access).
*/
- if ( pvh_steal_ram(d, sizeof(rsdp), 0, GB(4), &rsdp_paddr) )
+ if ( hvm_steal_ram(d, sizeof(rsdp), 0, GB(4), &rsdp_paddr) )
{
printk("Unable to allocate guest RAM for RSDP\n");
return -ENOMEM;
}
/* Mark this region as E820_ACPI. */
- if ( pvh_add_mem_range(d, rsdp_paddr, rsdp_paddr + sizeof(rsdp),
+ if ( hvm_add_mem_range(d, rsdp_paddr, rsdp_paddr + sizeof(rsdp),
E820_ACPI) )
printk("Unable to add RSDP region to memory map\n");
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 2e47ca489a71..f997f293f329 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -278,6 +278,38 @@ static int __init hvm_populate_p2m(struct domain *d)
return 0;
}
+/* Steal RAM from the end of a memory region. */
+int __init hvm_steal_ram(
+ struct domain *d, unsigned long size, unsigned long align, paddr_t limit,
+ paddr_t *addr)
+{
+ unsigned int i = d->arch.nr_e820;
+
+ /*
+ * Alignment 0 should be set to 1, so it doesn't wrap around in the
+ * calculations below.
+ */
+ align = align ? : 1;
+ while ( i-- )
+ {
+ struct e820entry *entry = &d->arch.e820[i];
+
+ if ( entry->type != E820_RAM || entry->addr + entry->size > limit )
+ continue;
+
+ *addr = (entry->addr + entry->size - size) & ~(align - 1);
+ if ( *addr < entry->addr ||
+ /* Don't steal from the low 1MB due to the copying done there. */
+ *addr < MB(1) )
+ continue;
+
+ entry->size = *addr - entry->addr;
+ return 0;
+ }
+
+ return -ENOMEM;
+}
+
static paddr_t __init find_memory(
const struct domain *d, const struct elf_binary *elf, size_t size)
{
@@ -324,6 +356,58 @@ static paddr_t __init find_memory(
return INVALID_PADDR;
}
+/* NB: memory map must be sorted at all times for this to work correctly. */
+int __init hvm_add_mem_range(
+ struct domain *d, uint64_t s, uint64_t e, unsigned int type)
+{
+ struct e820entry *map;
+ unsigned int i;
+
+ for ( i = 0; i < d->arch.nr_e820; i++ )
+ {
+ uint64_t rs = d->arch.e820[i].addr;
+ uint64_t re = rs + d->arch.e820[i].size;
+
+ if ( rs == e && d->arch.e820[i].type == type )
+ {
+ d->arch.e820[i].addr = s;
+ return 0;
+ }
+
+ if ( re == s && d->arch.e820[i].type == type &&
+ (i + 1 == d->arch.nr_e820 || d->arch.e820[i + 1].addr >= e) )
+ {
+ d->arch.e820[i].size += e - s;
+ return 0;
+ }
+
+ if ( rs >= e )
+ break;
+
+ if ( re > s )
+ return -EEXIST;
+ }
+
+ map = xzalloc_array(struct e820entry, d->arch.nr_e820 + 1);
+ if ( !map )
+ {
+ printk(XENLOG_WARNING "E820: out of memory to add region\n");
+ return -ENOMEM;
+ }
+
+ memcpy(map, d->arch.e820, i * sizeof(*d->arch.e820));
+ memcpy(map + i + 1, d->arch.e820 + i,
+ (d->arch.nr_e820 - i) * sizeof(*d->arch.e820));
+ map[i].addr = s;
+ map[i].size = e - s;
+ map[i].type = type;
+ xfree(d->arch.e820);
+ d->arch.e820 = map;
+ d->arch.nr_e820++;
+
+ return 0;
+}
+
static bool __init check_load_address(
const struct domain *d, const struct elf_binary *elf)
{
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index fa6fa60841ab..ee91edc3c4d1 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -9,10 +9,17 @@ int __init builder_get_cmdline(
void builder_init(struct boot_info *bi);
+int hvm_add_mem_range(
+ struct domain *d, uint64_t s, uint64_t e, unsigned int type);
+
int hvm_setup_cpus(struct domain *d, paddr_t entry, paddr_t start_info);
int pvh_populate_memory_range(
struct domain *d, unsigned long start, unsigned long nr_pages);
+int hvm_steal_ram(
+ struct domain *d, unsigned long size, unsigned long align, paddr_t limit,
+ paddr_t *addr);
+
unsigned long dom_paging_pages(
const struct boot_domain *d, unsigned long nr_pages);
--
2.30.2
* [RFC 23/38] x86/hyperlaunch: add domu acpi construction
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Introduce hvm_setup_acpi(), which constructs the ACPI tables for a general
HVM domU guest.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
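The `checksum -= acpi_tb_checksum(...)` idiom in hvm_setup_acpi_madt() relies on the ACPI rule that all bytes of a table, checksum byte included, sum to zero modulo 256. A minimal sketch, assuming acpi_tb_checksum() returns the plain 8-bit byte sum (which is what the `-=` implies); `byte_sum()` and `fix_checksum()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Offset of the checksum byte in the standard ACPI table header. */
#define SKETCH_ACPI_CHECKSUM_OFF 9

/* 8-bit sum of all bytes, wrapping modulo 256. */
static uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;

    while ( len-- )
        sum = (uint8_t)(sum + *p++);

    return sum;
}

/*
 * Make the whole table sum to zero modulo 256 by subtracting the current
 * sum from the checksum byte; the old checksum value does not matter.
 */
static void fix_checksum(uint8_t *table, size_t len)
{
    assert(len > SKETCH_ACPI_CHECKSUM_OFF);
    table[SKETCH_ACPI_CHECKSUM_OFF] =
        (uint8_t)(table[SKETCH_ACPI_CHECKSUM_OFF] - byte_sum(table, len));
}
```

This is also why the native header can be copied verbatim first: whatever checksum it carried is corrected once the new length and entries are in place.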
xen/arch/x86/hvm/dom_build.c | 213 ++++++++++++++++++++++++++++++++++-
1 file changed, 212 insertions(+), 1 deletion(-)
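hvm_setup_acpi_madt() writes one Local APIC entry per vCPU, using APIC ID `2 * i` for vCPU `i`. The sketch below serializes those entries byte by byte so the layout is explicit. hvm_size_acpi_madt() is not part of this hunk, so the sizing used here (a 44-byte MADT header, i.e. the 36-byte ACPI header plus the 4-byte local APIC address and 4-byte flags, followed by one 8-byte entry per vCPU and nothing else) is an assumption, as are all the names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MADT_HDR_LEN    44u  /* ACPI header + address + flags */
#define SKETCH_LAPIC_ENTRY_LEN  8u
#define SKETCH_MADT_TYPE_LAPIC  0u
#define SKETCH_MADT_ENABLED     1u

/* Size of a MADT holding only one Local APIC entry per vCPU. */
static size_t madt_lapic_only_size(unsigned int nr_vcpus)
{
    return SKETCH_MADT_HDR_LEN + (size_t)nr_vcpus * SKETCH_LAPIC_ENTRY_LEN;
}

/*
 * Serialize the Local APIC entries after the MADT header, with APIC ID
 * 2 * i for vCPU i. Returns the number of entry bytes written.
 */
static size_t write_lapic_entries(uint8_t *madt, size_t buf_len,
                                  unsigned int nr_vcpus)
{
    uint8_t *p = madt + SKETCH_MADT_HDR_LEN;
    unsigned int i;

    assert(buf_len >= madt_lapic_only_size(nr_vcpus));
    assert(nr_vcpus <= 128);            /* 8-bit APIC ID must hold 2 * i */

    for ( i = 0; i < nr_vcpus; i++, p += SKETCH_LAPIC_ENTRY_LEN )
    {
        p[0] = SKETCH_MADT_TYPE_LAPIC;  /* type */
        p[1] = SKETCH_LAPIC_ENTRY_LEN;  /* length */
        p[2] = (uint8_t)i;              /* ACPI processor UID */
        p[3] = (uint8_t)(i * 2);        /* APIC ID */
        p[4] = SKETCH_MADT_ENABLED;     /* 32-bit flags, little endian */
        p[5] = p[6] = p[7] = 0;
    }

    return (size_t)(p - madt) - SKETCH_MADT_HDR_LEN;
}
```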
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index f997f293f329..c482d5c2d974 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -408,6 +408,214 @@ int __init hvm_add_mem_range(
return 0;
}
+static int __init hvm_setup_acpi_madt(
+ struct domain *d, struct acpi_table_madt *madt)
+{
+ struct acpi_table_header *table;
+ struct acpi_madt_local_apic *lapic;
+ acpi_status status;
+ unsigned long size = hvm_size_acpi_madt(d);
+ unsigned int i;
+
+ /* Copy the native MADT table header. */
+ status = acpi_get_table(ACPI_SIG_MADT, 0, &table);
+ if ( !ACPI_SUCCESS(status) )
+ {
+ printk("Failed to get MADT ACPI table, aborting.\n");
+ return -EINVAL;
+ }
+ madt->header = *table;
+ madt->address = APIC_DEFAULT_PHYS_BASE;
+ /*
+ * NB: this is currently set to 4, which is the revision in the ACPI
+ * spec 6.1. Sadly ACPICA doesn't provide revision numbers for the
+ * tables described in the headers.
+ */
+ madt->header.revision = min_t(unsigned char, table->revision, 4);
+
+ lapic = (void *)(madt + 1);
+
+ for ( i = 0; i < d->max_vcpus; i++ )
+ {
+ lapic->header.type = ACPI_MADT_TYPE_LOCAL_APIC;
+ lapic->header.length = sizeof(*lapic);
+ lapic->id = i * 2;
+ lapic->processor_id = i;
+ lapic->lapic_flags = ACPI_MADT_ENABLED;
+
+ lapic++;
+ }
+
+ madt->header.length = size;
+ /*
+ * Calling acpi_tb_checksum here is a layering violation, but
+ * introducing a wrapper for such simple usage seems overkill.
+ */
+ madt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, madt), size);
+
+ return 0;
+}
+
+static int __init hvm_setup_acpi_xsdt(
+ struct domain *d, struct acpi_table_xsdt *xsdt, paddr_t madt_addr)
+{
+ struct acpi_table_header *table;
+ struct acpi_table_rsdp *rsdp;
+ unsigned long size = hvm_size_acpi_xsdt(d);
+ paddr_t xsdt_paddr;
+
+ /*
+ * Restore the original DMAR table signature: it is filtered out of the
+ * new XSDT presented to the guest, so its signature no longer needs to
+ * be zapped.
+ */
+ acpi_dmar_reinstate();
+
+ /* Copy the native XSDT table header. */
+ rsdp = acpi_os_map_memory(acpi_os_get_root_pointer(), sizeof(*rsdp));
+ if ( !rsdp )
+ {
+ printk("Unable to map RSDP\n");
+ return -EINVAL;
+ }
+ xsdt_paddr = rsdp->xsdt_physical_address;
+ acpi_os_unmap_memory(rsdp, sizeof(*rsdp));
+ table = acpi_os_map_memory(xsdt_paddr, sizeof(*table));
+ if ( !table )
+ {
+ printk("Unable to map XSDT\n");
+ return -EINVAL;
+ }
+ xsdt->header = *table;
+ acpi_os_unmap_memory(table, sizeof(*table));
+
+ /* Add the custom MADT. */
+ xsdt->table_offset_entry[0] = madt_addr;
+
+ xsdt->header.revision = 1;
+ xsdt->header.length = size;
+ /*
+ * Calling acpi_tb_checksum here is a layering violation, but
+ * introducing a wrapper for such simple usage seems overkill.
+ */
+ xsdt->header.checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, xsdt), size);
+
+ return 0;
+}
+
+static int __init hvm_alloc_acpi_region(
+ struct domain *d, void **region, unsigned long size, paddr_t *addr)
+{
+ unsigned int i;
+
+ *addr = 0;
+
+ for ( i = 0; i < d->arch.nr_e820; i++ )
+ {
+ if ( d->arch.e820[i].type == E820_ACPI )
+ {
+ if ( d->arch.e820[i].size < size )
+ break;
+
+ *addr = d->arch.e820[i].addr;
+ break;
+ }
+ }
+
+ /* The e820 setup did not allocate ACPI region, steal one instead. */
+ if ( *addr == 0 )
+ {
+ if ( hvm_steal_ram(d, size, 0, GB(4), addr) )
+ {
+ printk("Unable to allocate guest RAM for RSDP\n");
+ return -ENOMEM;
+ }
+ if ( hvm_add_mem_range(d, *addr, *addr + size, E820_ACPI) )
+ {
+ printk("Unable to add RSDP region to memory map\n");
+ return -EFAULT;
+ }
+ }
+
+ *region = xzalloc_bytes(size);
+ if ( !*region )
+ return -ENOMEM;
+
+ return 0;
+}
+
+static int __init hvm_setup_acpi(struct domain *d, paddr_t start_info)
+{
+ paddr_t rsdp_paddr, xsdt_paddr, madt_paddr;
+ struct acpi_table_rsdp *rsdp;
+ unsigned long size = hvm_size_acpi_region(d);
+ void *table;
+ int rc;
+
+ rc = hvm_alloc_acpi_region(d, &table, size, &rsdp_paddr);
+ if ( rc < 0 )
+ return rc;
+
+ /* RSDP */
+ rsdp = table;
+ xsdt_paddr = rsdp_paddr + sizeof(struct acpi_table_rsdp);
+
+ *rsdp = (struct acpi_table_rsdp){
+ .signature = ACPI_SIG_RSDP,
+ .revision = 2,
+ .length = sizeof(struct acpi_table_rsdp),
+ .oem_id = "XenHL\0", /* Xen Hyperlaunch */
+ .xsdt_physical_address = xsdt_paddr,
+ };
+
+ rsdp->checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, rsdp),
+ ACPI_RSDP_REV0_SIZE);
+ rsdp->extended_checksum -= acpi_tb_checksum(ACPI_CAST_PTR(u8, rsdp),
+ sizeof(*rsdp));
+
+ /* XSDT */
+ table += sizeof(struct acpi_table_rsdp);
+ madt_paddr = xsdt_paddr + hvm_size_acpi_xsdt(d);
+
+ rc = hvm_setup_acpi_xsdt(d, table, madt_paddr);
+ if ( rc )
+ {
+ printk("Unable to construct XSDT\n");
+ goto out;
+ }
+
+
+ /* MADT */
+ table += hvm_size_acpi_xsdt(d);
+ rc = hvm_setup_acpi_madt(d, table);
+ if ( rc )
+ {
+ printk("Unable to construct MADT\n");
+ goto out;
+ }
+
+ /* Copy ACPI region into guest memory. */
+ rc = hvm_copy_to_guest_phys(rsdp_paddr, rsdp, size, d->vcpu[0]);
+ if ( rc )
+ {
+ printk("Unable to copy RSDP into guest memory\n");
+ goto out;
+ }
+
+ /* Copy RSDP address to start_info. */
+ rc = hvm_copy_to_guest_phys(
+ start_info + offsetof(struct hvm_start_info, rsdp_paddr), &rsdp_paddr,
+ sizeof(((struct hvm_start_info *) 0)->rsdp_paddr), d->vcpu[0]);
+ if ( rc )
+ printk("Unable to copy RSDP address to start info\n");
+
+ out:
+ if ( rsdp )
+ xfree(rsdp);
+
+ return rc;
+}
+
static bool __init check_load_address(
const struct domain *d, const struct elf_binary *elf)
{
@@ -757,7 +965,10 @@ int __init dom_construct_pvh(struct boot_domain *bd)
return rc;
}
- rc = dom0_pvh_setup_acpi(bd->d, start_info);
+ if ( is_control_domain(bd->d) || is_hardware_domain(bd->d) )
+ rc = dom0_pvh_setup_acpi(bd->d, start_info);
+ else
+ rc = hvm_setup_acpi(bd->d, start_info);
if ( rc )
{
printk("Failed to setup Dom0 ACPI tables: %d\n", rc);
--
2.30.2
* [RFC 24/38] x86/boot: export command line processing
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (22 preceding siblings ...)
2025-04-19 22:08 ` [RFC 23/38] x86/hyperlaunch: add domu acpi construction Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 25/38] x86/hyperlaunch: convert create_dom0 to arch_create_dom Daniel P. Smith
` (13 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Export the function cmdline_cook() so that it can be called outside of setup.c.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/include/asm/setup.h | 2 ++
xen/arch/x86/setup.c | 4 +---
2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index b517da6144de..4b8fbdc67e05 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -29,6 +29,8 @@ void init_IRQ(void);
struct boot_domain;
int construct_dom0(struct boot_domain *bd);
+const char *cmdline_cook(const char *p, const char *loader_name);
+
void setup_io_bitmap(struct domain *d);
extern struct boot_info xen_boot_info;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 760262ea3d4c..0c9f1cd98faa 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -291,8 +291,6 @@ static int __init cf_check parse_acpi_param(const char *s)
}
custom_param("acpi", parse_acpi_param);
-static const char *cmdline_cook(const char *p, const char *loader_name);
-
struct boot_info __initdata xen_boot_info = {
.loader = "unknown",
.cmdline = "",
@@ -950,7 +948,7 @@ static bool __init loader_is_grub2(const char *loader_name)
*
* Always returns a pointer within @p.
*/
-static const char *__init cmdline_cook(const char *p, const char *loader_name)
+const char *__init cmdline_cook(const char *p, const char *loader_name)
{
/* Strip leading whitespace. */
while ( *p == ' ' )
--
2.30.2
* [RFC 25/38] x86/hyperlaunch: convert create_dom0 to arch_create_dom
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (23 preceding siblings ...)
2025-04-19 22:08 ` [RFC 24/38] x86/boot: export command line processing Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 26/38] x86/hyperlaunch: remove dom0-isms from arch_create_dom Daniel P. Smith
` (12 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
The only consumer of the function domain_cmdline_size() and the acpi_param
parameter is create_dom0(). It is therefore reasonable to move
domain_cmdline_size() and the acpi_param parameter, along with its parsing
code, at the same time as create_dom0() is moved under the domain builder.
While moving create_dom0(), rename it to arch_create_dom(), as the function is
now generalized.
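domain_cmdline_size() reserves room up front for everything arch_create_dom() may append. That covers the text of " noapic", the text of " acpi=", the whole acpi_param buffer, and one byte for the terminator. Below is a standalone sketch of that reserve-then-append pattern. It assumes a 10-byte acpi_param buffer, as in the patch, and leaves out kextra for brevity. `append()`, `cmdline_size()` and `build_cmdline()` are hypothetical names. `append()` stands in for Xen's strlcat() because strlcat() is not available in every C library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ACPI_PARAM_LEN 10  /* mirrors "static char acpi_param[10]" */

/* Hypothetical strlcat() stand-in: append src, never overflowing size. */
static void append(char *dst, const char *src, size_t size)
{
    size_t used = strlen(dst);

    if ( used + 1 >= size )
        return;
    strncat(dst, src, size - used - 1);
}

/* Same arithmetic as domain_cmdline_size(): base text plus worst case. */
static size_t cmdline_size(const char *base)
{
    size_t s = strlen(base);

    if ( s == 0 )
        return 0;

    return s + strlen(" noapic") + strlen(" acpi=") + ACPI_PARAM_LEN + 1;
}

/*
 * Build "<base>[ noapic][ acpi=<param>]" into a heap buffer, skipping
 * options the user already supplied. Returns NULL on empty input or on
 * allocation failure; the caller frees the result.
 */
static char *build_cmdline(const char *base, bool noapic, const char *acpi)
{
    size_t size = cmdline_size(base);
    char *buf;

    if ( size == 0 || !(buf = calloc(size, 1)) )
        return NULL;

    append(buf, base, size);
    if ( noapic && !strstr(buf, "noapic") )
        append(buf, " noapic", size);
    if ( acpi[0] != '\0' && !strstr(buf, "acpi=") )
    {
        append(buf, " acpi=", size);
        append(buf, acpi, size);
    }

    return buf;
}
```

Because the worst case is budgeted before allocation, the later concatenations cannot truncate, so the bounded append only acts as a safety net.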
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/Makefile | 2 +-
xen/arch/x86/domain-builder/domain.c | 172 ++++++++++++++++++++++
xen/arch/x86/include/asm/domain-builder.h | 3 +
xen/arch/x86/setup.c | 167 +--------------------
4 files changed, 177 insertions(+), 167 deletions(-)
diff --git a/xen/arch/x86/domain-builder/Makefile b/xen/arch/x86/domain-builder/Makefile
index 67024b5cb213..cefe20abd2e6 100644
--- a/xen/arch/x86/domain-builder/Makefile
+++ b/xen/arch/x86/domain-builder/Makefile
@@ -1,3 +1,3 @@
obj-$(CONFIG_DOMAIN_BUILDER) += fdt.init.o
obj-y += core.init.o
-obj-y += domain.init.o
+obj-y += domain.o
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index 764a1af4acd3..a82c02250b4c 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -5,16 +5,65 @@
#include <xen/cpumask.h>
#include <xen/domain.h>
+#include <xen/err.h>
+#include <xen/grant_table.h>
#include <xen/init.h>
#include <xen/libelf.h>
#include <xen/nodemask.h>
+#include <xen/param.h>
#include <xen/sched.h>
#include <asm/bootinfo.h>
+#include <asm/cpu-policy.h>
#include <asm/dom0_build.h>
+#include <asm/domain-builder.h>
+#include <asm/io_apic.h>
#include <asm/paging.h>
+#include <asm/pv/shim.h>
#include <asm/spec_ctrl.h>
+bool __read_mostly acpi_disabled;
+bool __initdata acpi_force;
+static char __initdata acpi_param[10] = "";
+
+static int __init cf_check parse_acpi_param(const char *s)
+{
+ /* Interpret the parameter for use within Xen. */
+ if ( !parse_bool(s, NULL) )
+ {
+ disable_acpi();
+ }
+ else if ( !strcmp(s, "force") )
+ {
+ acpi_force = true;
+ acpi_ht = 1;
+ acpi_disabled = false;
+ }
+ else if ( !strcmp(s, "ht") )
+ {
+ if ( !acpi_force )
+ disable_acpi();
+ acpi_ht = 1;
+ }
+ else if ( !strcmp(s, "noirq") )
+ {
+ acpi_noirq_set();
+ }
+ else if ( !strcmp(s, "verbose") )
+ {
+ opt_acpi_verbose = true;
+ return 0;
+ }
+ else
+ return -EINVAL;
+
+ /* Save the parameter so it can be propagated to domain0. */
+ safe_strcpy(acpi_param, s);
+
+ return 0;
+}
+custom_param("acpi", parse_acpi_param);
+
unsigned long __init dom_paging_pages(
const struct boot_domain *bd, unsigned long nr_pages)
{
@@ -138,6 +187,129 @@ void __init alloc_dom_vcpus(struct domain *d)
domain_update_node_affinity(d);
}
+static size_t __init domain_cmdline_size(
+ struct boot_info *bi, struct boot_domain *bd)
+{
+ size_t s = bi->kextra ? strlen(bi->kextra) : 0;
+
+ if ( bd->kernel->fdt_cmdline )
+ s += builder_get_cmdline_size(bi, bd->kernel->cmdline_pa);
+ else
+ s += strlen(__va(bd->kernel->cmdline_pa));
+
+ if ( s == 0 )
+ return s;
+
+ /*
+ * Certain parameters from the Xen command line may be added to the dom0
+ * command line. Add additional space for the possible cases along with one
+ * extra char to hold \0.
+ */
+ s += strlen(" noapic") + strlen(" acpi=") + sizeof(acpi_param) + 1;
+
+ return s;
+}
+
+struct domain *__init arch_create_dom(
+ struct boot_info *bi, struct boot_domain *bd)
+{
+ char *cmdline = NULL;
+ size_t cmdline_size;
+ unsigned int create_flags = 0;
+ struct xen_domctl_createdomain dom0_cfg = {
+ .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
+ .max_evtchn_port = -1,
+ .max_grant_frames = -1,
+ .max_maptrack_frames = -1,
+ .grant_opts = XEN_DOMCTL_GRANT_version(opt_gnttab_max_version),
+ .max_vcpus = dom_max_vcpus(bd),
+ .arch = {
+ .misc_flags = opt_dom0_msr_relaxed ? XEN_X86_MSR_RELAXED : 0,
+ },
+ };
+ struct domain *d;
+
+ if ( opt_dom0_pvh ||
+ (bi->hyperlaunch_enabled && !(bd->mode & BUILD_MODE_PARAVIRT)) )
+ {
+ dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
+ ((hvm_hap_supported() && !opt_dom0_shadow) ?
+ XEN_DOMCTL_CDF_hap : 0));
+
+ dom0_cfg.arch.emulation_flags |=
+ XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
+ }
+
+ if ( bd->capabilities & BUILD_CAPS_HARDWARE )
+ {
+ create_flags |= CDF_hardware;
+ if ( iommu_enabled )
+ dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
+ }
+
+ if ( bd->domid == DOMID_INVALID )
+ /* Create initial domain. Not d0 for pvshim. */
+ bd->domid = get_initial_domain_id();
+ if ( bd->capabilities & BUILD_CAPS_CONTROL )
+ create_flags |= CDF_privileged | CDF_hardware;
+ d = domain_create(bd->domid, &dom0_cfg,
+ pv_shim ? 0 : create_flags);
+ if ( IS_ERR(d) )
+ panic("Error creating d%u: %ld\n", bd->domid, PTR_ERR(d));
+
+ bd->d = d;
+
+ if ( bd->capabilities & BUILD_CAPS_CONTROL )
+ init_dom0_cpuid_policy(bd->d);
+
+ if ( alloc_dom_vcpu0(bd) == NULL )
+ panic("Error creating %pd vcpu 0\n", d);
+
+ cmdline_size = domain_cmdline_size(bi, bd);
+ if ( cmdline_size )
+ {
+ if ( !(cmdline = xzalloc_array(char, cmdline_size)) )
+ panic("Error allocating cmdline buffer for %pd\n", d);
+
+ if ( bd->kernel->fdt_cmdline )
+ builder_get_cmdline(
+ bi, bd->kernel->cmdline_pa, cmdline, cmdline_size);
+ else
+ strlcpy(cmdline,
+ cmdline_cook(__va(bd->kernel->cmdline_pa),bi->loader),
+ cmdline_size);
+
+ if ( bi->kextra )
+ /* kextra always includes exactly one leading space. */
+ strlcat(cmdline, bi->kextra, cmdline_size);
+
+ /* Append any extra parameters. */
+ if ( skip_ioapic_setup && !strstr(cmdline, "noapic") )
+ strlcat(cmdline, " noapic", cmdline_size);
+
+ if ( (strlen(acpi_param) == 0) && acpi_disabled )
+ {
+ printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
+ safe_strcpy(acpi_param, "off");
+ }
+
+ if ( (strlen(acpi_param) != 0) && !strstr(cmdline, "acpi=") )
+ {
+ strlcat(cmdline, " acpi=", cmdline_size);
+ strlcat(cmdline, acpi_param, cmdline_size);
+ }
+ bd->kernel->cmdline_pa = 0;
+ bd->cmdline = cmdline;
+ }
+
+ if ( construct_dom0(bd) != 0 )
+ panic("Could not construct domain 0\n");
+
+ XFREE(cmdline);
+
+ return d;
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index ee91edc3c4d1..ccab02c3a1fd 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -9,6 +9,9 @@ int __init builder_get_cmdline(
void builder_init(struct boot_info *bi);
+struct domain *arch_create_dom(
+ struct boot_info *bi, struct boot_domain *bd);
+
int hvm_add_mem_range(
struct domain *d, uint64_t s, uint64_t e, unsigned int type);
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 0c9f1cd98faa..8f956b6eca4f 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -249,48 +249,6 @@ static int __init cf_check parse_smap_param(const char *s)
}
custom_param("smap", parse_smap_param);
-bool __read_mostly acpi_disabled;
-bool __initdata acpi_force;
-static char __initdata acpi_param[10] = "";
-
-static int __init cf_check parse_acpi_param(const char *s)
-{
- /* Interpret the parameter for use within Xen. */
- if ( !parse_bool(s, NULL) )
- {
- disable_acpi();
- }
- else if ( !strcmp(s, "force") )
- {
- acpi_force = true;
- acpi_ht = 1;
- acpi_disabled = false;
- }
- else if ( !strcmp(s, "ht") )
- {
- if ( !acpi_force )
- disable_acpi();
- acpi_ht = 1;
- }
- else if ( !strcmp(s, "noirq") )
- {
- acpi_noirq_set();
- }
- else if ( !strcmp(s, "verbose") )
- {
- opt_acpi_verbose = true;
- return 0;
- }
- else
- return -EINVAL;
-
- /* Save the parameter so it can be propagated to domain0. */
- safe_strcpy(acpi_param, s);
-
- return 0;
-}
-custom_param("acpi", parse_acpi_param);
-
struct boot_info __initdata xen_boot_info = {
.loader = "unknown",
.cmdline = "",
@@ -980,129 +938,6 @@ static unsigned int __init copy_bios_e820(struct e820entry *map, unsigned int li
return n;
}
-static size_t __init domain_cmdline_size(
- struct boot_info *bi, struct boot_domain *bd)
-{
- size_t s = bi->kextra ? strlen(bi->kextra) : 0;
-
- if ( bd->kernel->fdt_cmdline )
- s += builder_get_cmdline_size(bi, bd->kernel->cmdline_pa);
- else
- s += strlen(__va(bd->kernel->cmdline_pa));
-
- if ( s == 0 )
- return s;
-
- /*
- * Certain parameters from the Xen command line may be added to the dom0
- * command line. Add additional space for the possible cases along with one
- * extra char to hold \0.
- */
- s += strlen(" noapic") + strlen(" acpi=") + sizeof(acpi_param) + 1;
-
- return s;
-}
-
-static struct domain *__init create_dom0(struct boot_info *bi)
-{
- char *cmdline = NULL;
- size_t cmdline_size;
- unsigned int create_flags = 0;
- struct boot_domain *bd = &bi->domains[0];
- struct xen_domctl_createdomain dom0_cfg = {
- .flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
- .max_evtchn_port = -1,
- .max_grant_frames = -1,
- .max_maptrack_frames = -1,
- .grant_opts = XEN_DOMCTL_GRANT_version(opt_gnttab_max_version),
- .max_vcpus = dom_max_vcpus(bd),
- .arch = {
- .misc_flags = opt_dom0_msr_relaxed ? XEN_X86_MSR_RELAXED : 0,
- },
- };
- struct domain *d;
-
- if ( opt_dom0_pvh ||
- (bi->hyperlaunch_enabled && !(bd->mode & BUILD_MODE_PARAVIRT)) )
- {
- dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
- ((hvm_hap_supported() && !opt_dom0_shadow) ?
- XEN_DOMCTL_CDF_hap : 0));
-
- dom0_cfg.arch.emulation_flags |=
- XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
- }
-
- if ( bd->capabilities & BUILD_CAPS_HARDWARE )
- {
- create_flags |= CDF_hardware;
- if ( iommu_enabled )
- dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
- }
-
- if ( bd->domid == DOMID_INVALID )
- /* Create initial domain. Not d0 for pvshim. */
- bd->domid = get_initial_domain_id();
- if ( bd->capabilities & BUILD_CAPS_CONTROL )
- create_flags |= CDF_privileged | CDF_hardware;
- d = domain_create(bd->domid, &dom0_cfg,
- pv_shim ? 0 : create_flags);
- if ( IS_ERR(d) )
- panic("Error creating d%u: %ld\n", bd->domid, PTR_ERR(d));
-
- bd->d = d;
-
- if ( bd->capabilities & BUILD_CAPS_CONTROL )
- init_dom0_cpuid_policy(bd->d);
-
- if ( alloc_dom_vcpu0(bd) == NULL )
- panic("Error creating %pd vcpu 0\n", d);
-
- cmdline_size = domain_cmdline_size(bi, bd);
- if ( cmdline_size )
- {
- if ( !(cmdline = xzalloc_array(char, cmdline_size)) )
- panic("Error allocating cmdline buffer for %pd\n", d);
-
- if ( bd->kernel->fdt_cmdline )
- builder_get_cmdline(
- bi, bd->kernel->cmdline_pa, cmdline, cmdline_size);
- else
- strlcpy(cmdline,
- cmdline_cook(__va(bd->kernel->cmdline_pa),bi->loader),
- cmdline_size);
-
- if ( bi->kextra )
- /* kextra always includes exactly one leading space. */
- strlcat(cmdline, bi->kextra, cmdline_size);
-
- /* Append any extra parameters. */
- if ( skip_ioapic_setup && !strstr(cmdline, "noapic") )
- strlcat(cmdline, " noapic", cmdline_size);
-
- if ( (strlen(acpi_param) == 0) && acpi_disabled )
- {
- printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
- safe_strcpy(acpi_param, "off");
- }
-
- if ( (strlen(acpi_param) != 0) && !strstr(cmdline, "acpi=") )
- {
- strlcat(cmdline, " acpi=", cmdline_size);
- strlcat(cmdline, acpi_param, cmdline_size);
- }
- bd->kernel->cmdline_pa = 0;
- bd->cmdline = cmdline;
- }
-
- if ( construct_dom0(bd) != 0 )
- panic("Could not construct domain 0\n");
-
- XFREE(cmdline);
-
- return d;
-}
-
/* How much of the directmap is prebuilt at compile time. */
#define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
@@ -2180,7 +2015,7 @@ void asmlinkage __init noreturn __start_xen(void)
* We're going to setup domain0 using the module(s) that we stashed safely
* above our heap. The second module, if present, is an initrd ramdisk.
*/
- dom0 = create_dom0(bi);
+ dom0 = arch_create_dom(bi, &bi->domains[0]);
if ( !dom0 )
panic("Could not set up DOM0 guest OS\n");
--
2.30.2
* [RFC 26/38] x86/hyperlaunch: remove dom0-isms from arch_create_dom
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (24 preceding siblings ...)
2025-04-19 22:08 ` [RFC 25/38] x86/hyperlaunch: convert create_dom0 to arch_create_dom Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 27/38] x86/hyperlaunch: introduce domain builder general dom creation Daniel P. Smith
` (11 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Remove the dom0 naming from variables and isolate the control/hardware
domain-specific logic behind a capabilities check.
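After this change, only a domain with BUILD_CAPS_CONTROL or BUILD_CAPS_HARDWARE inherits parameters from the Xen command line (kextra, " noapic", " acpi="). Any other domain keeps exactly the command line from its own configuration. Below is a minimal sketch of that test. It reuses the BUILD_CAPS_* values defined in asm/boot-domain.h; `inherits_xen_params()` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in xen/arch/x86/include/asm/boot-domain.h. */
#define BUILD_CAPS_NONE      (0)
#define BUILD_CAPS_CONTROL   (1 << 0)
#define BUILD_CAPS_HARDWARE  (1 << 1)
#define BUILD_CAPS_XENSTORE  (1 << 2)

/*
 * Hypothetical helper: true if a boot domain should have Xen command line
 * parameters (kextra, " noapic", " acpi=...") appended to its own.
 * Either of the two capabilities is sufficient.
 */
static bool inherits_xen_params(uint32_t capabilities)
{
    return capabilities & (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE);
}
```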
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/domain.c | 47 +++++++++++++++-------------
1 file changed, 26 insertions(+), 21 deletions(-)
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index a82c02250b4c..13f99111a3a9 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -216,7 +216,7 @@ struct domain *__init arch_create_dom(
char *cmdline = NULL;
size_t cmdline_size;
unsigned int create_flags = 0;
- struct xen_domctl_createdomain dom0_cfg = {
+ struct xen_domctl_createdomain dom_cfg = {
.flags = IS_ENABLED(CONFIG_TBOOT) ? XEN_DOMCTL_CDF_s3_integrity : 0,
.max_evtchn_port = -1,
.max_grant_frames = -1,
@@ -232,11 +232,11 @@ struct domain *__init arch_create_dom(
if ( opt_dom0_pvh ||
(bi->hyperlaunch_enabled && !(bd->mode & BUILD_MODE_PARAVIRT)) )
{
- dom0_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
+ dom_cfg.flags |= (XEN_DOMCTL_CDF_hvm |
((hvm_hap_supported() && !opt_dom0_shadow) ?
XEN_DOMCTL_CDF_hap : 0));
- dom0_cfg.arch.emulation_flags |=
+ dom_cfg.arch.emulation_flags |=
XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
}
@@ -244,7 +244,7 @@ struct domain *__init arch_create_dom(
{
create_flags |= CDF_hardware;
if ( iommu_enabled )
- dom0_cfg.flags |= XEN_DOMCTL_CDF_iommu;
+ dom_cfg.flags |= XEN_DOMCTL_CDF_iommu;
}
if ( bd->domid == DOMID_INVALID )
@@ -252,7 +252,7 @@ struct domain *__init arch_create_dom(
bd->domid = get_initial_domain_id();
if ( bd->capabilities & BUILD_CAPS_CONTROL )
create_flags |= CDF_privileged | CDF_hardware;
- d = domain_create(bd->domid, &dom0_cfg,
+ d = domain_create(bd->domid, &dom_cfg,
pv_shim ? 0 : create_flags);
if ( IS_ERR(d) )
panic("Error creating d%u: %ld\n", bd->domid, PTR_ERR(d));
@@ -279,25 +279,30 @@ struct domain *__init arch_create_dom(
cmdline_cook(__va(bd->kernel->cmdline_pa),bi->loader),
cmdline_size);
- if ( bi->kextra )
- /* kextra always includes exactly one leading space. */
- strlcat(cmdline, bi->kextra, cmdline_size);
-
- /* Append any extra parameters. */
- if ( skip_ioapic_setup && !strstr(cmdline, "noapic") )
- strlcat(cmdline, " noapic", cmdline_size);
-
- if ( (strlen(acpi_param) == 0) && acpi_disabled )
+ /* Params from Xen cmd line apply only to control/hardware doms */
+ if ( bd->capabilities & (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE) )
{
- printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
- safe_strcpy(acpi_param, "off");
+ if ( bi->kextra )
+ /* kextra always includes exactly one leading space. */
+ strlcat(cmdline, bi->kextra, cmdline_size);
+
+ /* Append any extra parameters. */
+ if ( skip_ioapic_setup && !strstr(cmdline, "noapic") )
+ strlcat(cmdline, " noapic", cmdline_size);
+
+ if ( (strlen(acpi_param) == 0) && acpi_disabled )
+ {
+ printk("ACPI is disabled, notifying Domain 0 (acpi=off)\n");
+ safe_strcpy(acpi_param, "off");
+ }
+
+ if ( (strlen(acpi_param) != 0) && !strstr(cmdline, "acpi=") )
+ {
+ strlcat(cmdline, " acpi=", cmdline_size);
+ strlcat(cmdline, acpi_param, cmdline_size);
+ }
}
- if ( (strlen(acpi_param) != 0) && !strstr(cmdline, "acpi=") )
- {
- strlcat(cmdline, " acpi=", cmdline_size);
- strlcat(cmdline, acpi_param, cmdline_size);
- }
bd->kernel->cmdline_pa = 0;
bd->cmdline = cmdline;
}
--
2.30.2
* [RFC 27/38] x86/hyperlaunch: introduce domain builder general dom creation
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (25 preceding siblings ...)
2025-04-19 22:08 ` [RFC 26/38] x86/hyperlaunch: remove dom0-isms from arch_create_dom Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 28/38] x86/hyperlaunch: add xenstore boot capabilities flag Daniel P. Smith
` (10 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Introduce the builder_create_domains() function, which provides the domain
construction abstraction for the hyperlaunch domain builder.
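Alongside builder_create_domains(), the patch adds next_boot_domain_index(). __start_xen() uses it to choose which constructed domain is treated as dom0: the hardware domain if there is one, otherwise the control domain, otherwise index 0. Below is a standalone sketch of that selection with a trimmed-down boot_info. The MAX_NR_BOOTDOMS value, the struct layouts and `select_dom0_index()` are assumptions for illustration only.

```c
#include <stdint.h>

#define MAX_NR_BOOTDOMS      64   /* assumed value, for illustration only */
#define BUILD_CAPS_CONTROL   (1 << 0)
#define BUILD_CAPS_HARDWARE  (1 << 1)

/* Trimmed-down stand-ins for struct boot_domain / struct boot_info. */
struct boot_domain { uint32_t capabilities; };
struct boot_info {
    unsigned int nr_domains;
    struct boot_domain domains[MAX_NR_BOOTDOMS];
};

/* Same contract as the patch: an index, or MAX_NR_BOOTDOMS + 1 if none. */
static unsigned int next_boot_domain_index(
    const struct boot_info *bi, uint32_t cap, unsigned int start)
{
    unsigned int i;

    for ( i = start; i < bi->nr_domains; i++ )
        if ( bi->domains[i].capabilities & cap )
            return i;

    return MAX_NR_BOOTDOMS + 1;
}

/* Hypothetical helper: hardware domain, else control domain, else 0. */
static unsigned int select_dom0_index(const struct boot_info *bi)
{
    unsigned int i = next_boot_domain_index(bi, BUILD_CAPS_HARDWARE, 0);

    if ( i >= MAX_NR_BOOTDOMS )
        i = next_boot_domain_index(bi, BUILD_CAPS_CONTROL, 0);
    if ( i >= MAX_NR_BOOTDOMS )
        i = 0;

    return i;
}
```

Returning a value past the array bound, rather than a signed error, keeps the index unsigned. Callers then only need a single `>= MAX_NR_BOOTDOMS` comparison.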
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/core.c | 17 +++++++++++++++
xen/arch/x86/include/asm/bootinfo.h | 26 +++++++++++++++++++++++
xen/arch/x86/include/asm/domain-builder.h | 1 +
xen/arch/x86/setup.c | 23 +++++++++++++++++---
4 files changed, 64 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
index 8d137ecaaf84..2f0b8bd82c3a 100644
--- a/xen/arch/x86/domain-builder/core.c
+++ b/xen/arch/x86/domain-builder/core.c
@@ -9,6 +9,7 @@
#include <asm/bootinfo.h>
#include <asm/setup.h>
+#include <asm/domain-builder.h>
#include "fdt.h"
@@ -102,6 +103,22 @@ void __init builder_init(struct boot_info *bi)
}
}
+unsigned int __init builder_create_domains(struct boot_info *bi)
+{
+ unsigned int build_count = 0;
+ struct boot_domain *bd = &bi->domains[0];
+
+ if ( bd->kernel == NULL && (bd->capabilities & BUILD_CAPS_CONTROL) )
+ panic("%s: control domain missing kernel\n", __func__);
+
+
+ arch_create_dom(bi, bd);
+ if ( bd->d )
+ build_count++;
+
+ return build_count;
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/x86/include/asm/bootinfo.h b/xen/arch/x86/include/asm/bootinfo.h
index 5b2c93b1ef9e..430ae08cf5ef 100644
--- a/xen/arch/x86/include/asm/bootinfo.h
+++ b/xen/arch/x86/include/asm/bootinfo.h
@@ -132,6 +132,32 @@ static inline unsigned int __init next_boot_module_index(
(i) <= (b)->nr_modules; \
(i) = next_boot_module_index(b, t, i + 1) )
+/*
+ * next_boot_domain_index:
+ * Finds the next boot domain with any of the capabilities in cap,
+ * starting at array index start.
+ *
+ * Returns:
+ * Success - index into the bi->domains array
+ * Failure - a value greater than MAX_NR_BOOTDOMS
+ */
+static inline unsigned int __init next_boot_domain_index(
+ const struct boot_info *bi, uint32_t cap, unsigned int start)
+{
+ unsigned int i;
+
+ for ( i = start; i < bi->nr_domains; i++ )
+ {
+ if ( bi->domains[i].capabilities & cap )
+ return i;
+ }
+
+ return MAX_NR_BOOTDOMS + 1;
+}
+
+#define first_boot_domain_index(bi, cap) \
+ next_boot_domain_index(bi, cap, 0)
+
#endif /* X86_BOOTINFO_H */
/*
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index ccab02c3a1fd..5dc5661bec07 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -8,6 +8,7 @@ int __init builder_get_cmdline(
struct boot_info *bi, int offset, char *cmdline, size_t size);
void builder_init(struct boot_info *bi);
+unsigned int builder_create_domains(struct boot_info *bi);
struct domain *arch_create_dom(
struct boot_info *bi, struct boot_domain *bd);
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 8f956b6eca4f..da5a8e8d8ed3 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2015,9 +2015,26 @@ void asmlinkage __init noreturn __start_xen(void)
* We're going to setup domain0 using the module(s) that we stashed safely
* above our heap. The second module, if present, is an initrd ramdisk.
*/
- dom0 = arch_create_dom(bi, &bi->domains[0]);
- if ( !dom0 )
- panic("Could not set up DOM0 guest OS\n");
+ ret = builder_create_domains(bi);
+ if ( ret <= 0 )
+ panic("Could not set up boot-time domains\n");
+ else
+ printk(XENLOG_INFO "Constructed %d boot-time domains\n", ret);
+
+ /* Selection order: hardware domain, control domain, first domain */
+ i = first_boot_domain_index(bi, BUILD_CAPS_HARDWARE);
+ if ( i >= MAX_NR_BOOTDOMS )
+ {
+ i = first_boot_domain_index(bi, BUILD_CAPS_CONTROL);
+ if ( i >= MAX_NR_BOOTDOMS )
+ {
+ printk(XENLOG_WARNING
+ "No hwdom/ctrldom detected, using domain at index 0\n");
+ i = 0;
+ }
+ }
+
+ dom0 = bi->domains[i].d;
heap_init_late();
--
2.30.2
* [RFC 28/38] x86/hyperlaunch: add xenstore boot capabilities flag
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (26 preceding siblings ...)
2025-04-19 22:08 ` [RFC 27/38] x86/hyperlaunch: introduce domain builder general dom creation Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 29/38] x86/hyperlaunch: allocate console for domu Daniel P. Smith
` (9 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Add the ability to designate a domain as the Xenstore domain via the
hyperlaunch configuration.
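With this flag, each build capability checked in arch_create_dom() maps onto domain creation flags:

- BUILD_CAPS_XENSTORE sets XEN_DOMCTL_CDF_xs_domain in the domctl configuration.
- BUILD_CAPS_CONTROL sets CDF_privileged and CDF_hardware.
- BUILD_CAPS_HARDWARE sets CDF_hardware, plus XEN_DOMCTL_CDF_iommu when the IOMMU is enabled.

The sketch below models only that mapping. The numeric values of the SKETCH_* constants are placeholders, not Xen's real CDF_* or XEN_DOMCTL_CDF_* definitions, and `caps_to_flags()` is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define BUILD_CAPS_CONTROL   (1 << 0)
#define BUILD_CAPS_HARDWARE  (1 << 1)
#define BUILD_CAPS_XENSTORE  (1 << 2)

/* Placeholder bit values; Xen's real constants differ. */
#define SKETCH_CDF_privileged    (1u << 0)  /* internal create_flags */
#define SKETCH_CDF_hardware      (1u << 1)
#define SKETCH_DOMCTL_CDF_iommu  (1u << 0)  /* domctl config flags */
#define SKETCH_DOMCTL_CDF_xs     (1u << 1)

struct create_flags {
    unsigned int internal;  /* models domain_create()'s flags argument */
    unsigned int domctl;    /* models xen_domctl_createdomain.flags */
};

/* Hypothetical helper mirroring the capability checks in arch_create_dom(). */
static struct create_flags caps_to_flags(uint32_t caps, bool iommu_enabled)
{
    struct create_flags f = { 0, 0 };

    if ( caps & BUILD_CAPS_HARDWARE )
    {
        f.internal |= SKETCH_CDF_hardware;
        if ( iommu_enabled )
            f.domctl |= SKETCH_DOMCTL_CDF_iommu;
    }
    if ( caps & BUILD_CAPS_XENSTORE )
        f.domctl |= SKETCH_DOMCTL_CDF_xs;
    if ( caps & BUILD_CAPS_CONTROL )
        f.internal |= SKETCH_CDF_privileged | SKETCH_CDF_hardware;

    return f;
}
```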
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/core.c | 2 +-
xen/arch/x86/domain-builder/domain.c | 2 ++
xen/arch/x86/domain-builder/fdt.c | 2 ++
xen/arch/x86/include/asm/boot-domain.h | 1 +
4 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
index 2f0b8bd82c3a..e21f5efbfe62 100644
--- a/xen/arch/x86/domain-builder/core.c
+++ b/xen/arch/x86/domain-builder/core.c
@@ -98,7 +98,7 @@ void __init builder_init(struct boot_info *bi)
bi->mods[i].type = BOOTMOD_KERNEL;
bi->domains[0].kernel = &bi->mods[i];
bi->domains[0].capabilities |=
- (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE);
+ (BUILD_CAPS_CONTROL | BUILD_CAPS_HARDWARE | BUILD_CAPS_XENSTORE);
bi->nr_domains = 1;
}
}
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index 13f99111a3a9..c1b2e011aaa0 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -250,6 +250,8 @@ struct domain *__init arch_create_dom(
if ( bd->domid == DOMID_INVALID )
/* Create initial domain. Not d0 for pvshim. */
bd->domid = get_initial_domain_id();
+ if ( bd->capabilities & BUILD_CAPS_XENSTORE )
+ dom_cfg.flags |= XEN_DOMCTL_CDF_xs_domain;
if ( bd->capabilities & BUILD_CAPS_CONTROL )
create_flags |= CDF_privileged | CDF_hardware;
d = domain_create(bd->domid, &dom_cfg,
diff --git a/xen/arch/x86/domain-builder/fdt.c b/xen/arch/x86/domain-builder/fdt.c
index 1af58ed4eb86..9c6c0ae09a24 100644
--- a/xen/arch/x86/domain-builder/fdt.c
+++ b/xen/arch/x86/domain-builder/fdt.c
@@ -269,6 +269,8 @@ static int __init process_domain_node(
printk("c");
if ( bd->capabilities & BUILD_CAPS_HARDWARE )
printk(" h");
+ if ( bd->capabilities & BUILD_CAPS_XENSTORE )
+ printk(" x");
printk("\n");
}
}
diff --git a/xen/arch/x86/include/asm/boot-domain.h b/xen/arch/x86/include/asm/boot-domain.h
index 18d144de2fea..32f1f8fbc4e8 100644
--- a/xen/arch/x86/include/asm/boot-domain.h
+++ b/xen/arch/x86/include/asm/boot-domain.h
@@ -16,6 +16,7 @@ struct boot_domain {
#define BUILD_CAPS_NONE (0)
#define BUILD_CAPS_CONTROL (1 << 0)
#define BUILD_CAPS_HARDWARE (1 << 1)
+#define BUILD_CAPS_XENSTORE (1 << 2)
uint32_t capabilities;
/* On | Off */
--
2.30.2
* [RFC 29/38] x86/hyperlaunch: allocate console for domu
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (27 preceding siblings ...)
2025-04-19 22:08 ` [RFC 28/38] x86/hyperlaunch: add xenstore boot capabilities flag Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 30/38] x86/hyperlaunch: allocate xenstore " Daniel P. Smith
` (8 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
During domU construction, a page of memory and an event channel must be set up
for the console connection. In this commit, a page from the special page region
of the domU is set up as the console page along with an event channel. The page
address and event channel are recorded in the HVM parameters, so they can be
published in Xenstore once it is online.
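The 3072-byte offset used by alloc_console_page() follows from the console ring layout. The sketch below mirrors that layout in a local struct, demo_xencons_interface, which is a hypothetical stand-in assumed to match struct xencons_interface from Xen's public console header. It shows that the four 32-bit ring indices the patch zeroes start immediately after the 1024-byte input ring and the 2048-byte output ring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the console shared page, assumed to match
 * struct xencons_interface: a 1024-byte input (rx) ring, a 2048-byte
 * output (tx) ring, then four 32-bit ring indices.
 */
struct demo_xencons_interface {
    char in[1024];
    char out[2048];
    uint32_t in_cons, in_prod;
    uint32_t out_cons, out_prod;
};

/* The ring indices begin right after the two rings. */
static_assert(offsetof(struct demo_xencons_interface, in_cons) == 3072,
              "console ring indices start at byte 3072");

/* Offset within the guest page where the indices are cleared. */
static size_t demo_console_index_offset(void)
{
    return offsetof(struct demo_xencons_interface, in_cons);
}

/* Bytes covered by the indices, i.e. the size of a uint32_t[4] buffer. */
static size_t demo_console_index_bytes(void)
{
    return sizeof(struct demo_xencons_interface) -
           offsetof(struct demo_xencons_interface, in_cons);
}
```

Writing four zeroed words at that offset therefore resets the ring indices without touching the ring contents.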
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/domain.c | 56 ++++++++++++++++++++++++++
xen/arch/x86/hvm/dom_build.c | 34 ++++++++++++++++
xen/arch/x86/include/asm/boot-domain.h | 7 ++++
3 files changed, 97 insertions(+)
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index c1b2e011aaa0..7ce069a57c5d 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -6,6 +6,7 @@
#include <xen/cpumask.h>
#include <xen/domain.h>
#include <xen/err.h>
+#include <xen/event.h>
#include <xen/grant_table.h>
#include <xen/init.h>
#include <xen/libelf.h>
@@ -187,6 +188,58 @@ void __init alloc_dom_vcpus(struct domain *d)
domain_update_node_affinity(d);
}
+static int __init alloc_dom_evtchn(
+ const struct boot_domain *d, const struct boot_domain *r,
+ evtchn_alloc_unbound_t *ec)
+{
+ int rc;
+
+ ec->dom = d->domid;
+ ec->remote_dom = r->domid;
+
+ rc = evtchn_alloc_unbound(ec, 0);
+ if ( rc )
+ {
+ printk(XENLOG_WARNING "Failed allocating event channel for %pd\n",
+ d->d);
+ return rc;
+ }
+
+ return 0;
+}
+
+static int __init alloc_console_evtchn(
+ const struct boot_info *bi, struct boot_domain *bd)
+{
+ evtchn_alloc_unbound_t evtchn_req;
+ const struct boot_domain *hwdom;
+ int idx, rc;
+
+ idx = first_boot_domain_index(bi, BUILD_CAPS_HARDWARE);
+ if ( idx < 0 )
+ {
+ printk(XENLOG_WARNING "No backing hardware domain for %pd console\n",
+ bd->d);
+ return -EINVAL;
+ }
+
+ if ( bi->domains[idx].d )
+ hwdom = &bi->domains[idx];
+ else
+ {
+ printk(XENLOG_WARNING "Hardware domain for %pd console not constructed\n",
+ bd->d);
+ return -EINVAL;
+ }
+
+ if ( (rc = alloc_dom_evtchn(bd, hwdom, &evtchn_req)) < 0 )
+ return rc;
+
+ bd->console.evtchn = evtchn_req.port;
+
+ return 0;
+}
+
static size_t __init domain_cmdline_size(
struct boot_info *bi, struct boot_domain *bd)
{
@@ -309,6 +362,9 @@ struct domain *__init arch_create_dom(
bd->cmdline = cmdline;
}
+ if ( !(bd->capabilities & BUILD_CAPS_HARDWARE) )
+ alloc_console_evtchn(bi, bd);
+
if ( construct_dom0(bd) != 0 )
panic("Could not construct domain 0\n");
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index c482d5c2d974..934ae138e58f 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -9,6 +9,7 @@
*/
#include <xen/acpi.h>
+#include <xen/event.h>
#include <xen/iommu.h>
#include <xen/init.h>
#include <xen/softirq.h>
@@ -19,6 +20,7 @@
#include <public/arch-x86/hvm/start_info.h>
#include <public/hvm/e820.h>
#include <public/hvm/hvm_vcpu.h>
+#include <public/hvm/params.h>
#include <asm/bootinfo.h>
#include <asm/bzimage.h>
@@ -899,6 +901,35 @@ static int __init pvh_load_kernel(
return 0;
}
+static int __init alloc_console_page(struct boot_domain *bd)
+{
+ paddr_t con_addr = special_pfn(SPECIALPAGE_CONSOLE) << PAGE_SHIFT;
+ uint32_t fields[4] = { 0 };
+
+ if ( !port_is_valid(bd->d, bd->console.evtchn) )
+ {
+ printk("No event channel available for %pd console\n", bd->d);
+ return -EINVAL;
+ }
+
+ /*
+ * Clear the xencons_interface ring indices, which are located after the
+ * 1024-byte rx and 2048-byte tx buffers, i.e. at offset 3072.
+ */
+ if ( hvm_copy_to_guest_phys(con_addr + 3072, fields, sizeof(fields),
+ bd->d->vcpu[0]) )
+ {
+ printk("Unable to set console connection state\n");
+ return -EFAULT;
+ }
+
+ bd->console.gfn = PFN_DOWN(con_addr);
+ bd->d->arch.hvm.params[HVM_PARAM_CONSOLE_PFN] = bd->console.gfn;
+ bd->d->arch.hvm.params[HVM_PARAM_CONSOLE_EVTCHN] = bd->console.evtchn;
+
+ return 0;
+}
+
int __init dom_construct_pvh(struct boot_domain *bd)
{
paddr_t entry, start_info;
@@ -975,6 +1006,9 @@ int __init dom_construct_pvh(struct boot_domain *bd)
return rc;
}
+ if ( !is_hardware_domain(bd->d) )
+ alloc_console_page(bd);
+
if ( opt_dom0_verbose )
{
printk("Dom%u memory map:\n", bd->domid);
diff --git a/xen/arch/x86/include/asm/boot-domain.h b/xen/arch/x86/include/asm/boot-domain.h
index 32f1f8fbc4e8..cb6e1fab23ba 100644
--- a/xen/arch/x86/include/asm/boot-domain.h
+++ b/xen/arch/x86/include/asm/boot-domain.h
@@ -5,6 +5,9 @@
* Copyright (c) 2024 Christopher Clark <christopher.w.clark@gmail.com>
*/
+#include <public/xen.h>
+#include <public/event_channel.h>
+
#ifndef __XEN_X86_BOOTDOMAIN_H__
#define __XEN_X86_BOOTDOMAIN_H__
@@ -35,6 +38,10 @@ struct boot_domain {
const char *cmdline;
struct domain *d;
+ struct {
+ xen_pfn_t gfn;
+ evtchn_port_t evtchn;
+ } console;
};
#endif
--
2.30.2
* [RFC 30/38] x86/hyperlaunch: allocate xenstore for domu
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (28 preceding siblings ...)
2025-04-19 22:08 ` [RFC 29/38] x86/hyperlaunch: allocate console for domu Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 31/38] x86/hyperlaunch: move boot module discard to domain builder Daniel P. Smith
` (7 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
During domU construction, a page of memory and an event channel must be set up
for the xenstore connection. In this commit, a page from the special page region
of the domU is set up as the xenstore page along with an event channel. The page
address and event channel are recorded in the HVM parameters, so the domain
can be announced to Xenstore.
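The seven-word initializer written at offset 2048 by alloc_xenstore_page() can be derived from the xenstore page layout. The sketch below uses a hypothetical mirror struct, demo_xenstore_domain_interface, assumed to match the leading fields of struct xenstore_domain_interface, and hypothetical DEMO_ constants assumed to match XENSTORE_CONNECTED (0) and XENSTORE_RECONNECT (1). It shows that the connection field is the sixth 32-bit word after the two 1024-byte rings, which is why index 5 holds the 1.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match XENSTORE_CONNECTED / XENSTORE_RECONNECT. */
#define DEMO_XENSTORE_CONNECTED 0
#define DEMO_XENSTORE_RECONNECT 1

/* Hypothetical mirror of the leading xenstore shared-page fields. */
struct demo_xenstore_domain_interface {
    char req[1024];              /* requests to the xenstore daemon */
    char rsp[1024];              /* replies and watch events */
    uint32_t req_cons, req_prod;
    uint32_t rsp_cons, rsp_prod;
    uint32_t server_features;
    uint32_t connection;
    uint32_t error;
};

static_assert(offsetof(struct demo_xenstore_domain_interface, req_cons) == 2048,
              "xenstore indices start after the two 1024-byte rings");

/* Fill the seven words written at offset 2048, marking RECONNECT. */
static void demo_fill_xenstore_fields(uint32_t fields[7])
{
    size_t conn = (offsetof(struct demo_xenstore_domain_interface, connection) -
                   offsetof(struct demo_xenstore_domain_interface, req_cons)) /
                  sizeof(uint32_t);

    for ( size_t i = 0; i < 7; i++ )
        fields[i] = 0;
    fields[conn] = DEMO_XENSTORE_RECONNECT;
}
```

The result equals the patch's literal { 0, 0, 0, 0, 0, 1, 0 }, but deriving the index from offsetof() makes the dependency on the layout explicit.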
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/domain.c | 38 ++++++++++++++++++++++++--
xen/arch/x86/hvm/dom_build.c | 32 ++++++++++++++++++++++
xen/arch/x86/include/asm/boot-domain.h | 2 +-
3 files changed, 69 insertions(+), 3 deletions(-)
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index 7ce069a57c5d..b413a87cc9c9 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -240,6 +240,38 @@ static int __init alloc_console_evtchn(
return 0;
}
+static int __init alloc_store_evtchn(
+ const struct boot_info *bi, struct boot_domain *bd)
+{
+ evtchn_alloc_unbound_t evtchn_req;
+ const struct boot_domain *xsdom;
+ int idx, rc;
+
+ idx = first_boot_domain_index(bi, BUILD_CAPS_XENSTORE);
+ if ( idx < 0 )
+ {
+ printk(XENLOG_WARNING "No backing xenstore domain for %pd\n",
+ bd->d);
+ return -EINVAL;
+ }
+
+ if ( bi->domains[idx].d )
+ xsdom = &bi->domains[idx];
+ else
+ {
+ printk(XENLOG_WARNING "Xenstore domain for %pd not constructed\n",
+ bd->d);
+ return -EINVAL;
+ }
+
+ if ( (rc = alloc_dom_evtchn(bd, xsdom, &evtchn_req)) < 0 )
+ return rc;
+
+ bd->store.evtchn = evtchn_req.port;
+
+ return 0;
+}
+
static size_t __init domain_cmdline_size(
struct boot_info *bi, struct boot_domain *bd)
{
@@ -362,8 +394,10 @@ struct domain *__init arch_create_dom(
bd->cmdline = cmdline;
}
- if ( !(bd->capabilities & BUILD_CAPS_HARDWARE) )
- alloc_console_evtchn(bi, bd);
+ if ( !(bd->capabilities & BUILD_CAPS_XENSTORE) )
+ alloc_store_evtchn(bi, bd);
+ if ( !(bd->capabilities & BUILD_CAPS_HARDWARE) )
+ alloc_console_evtchn(bi, bd);
if ( construct_dom0(bd) != 0 )
panic("Could not construct domain 0\n");
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 934ae138e58f..2798542e5483 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -930,6 +930,35 @@ static int __init alloc_console_page(struct boot_domain *bd)
return 0;
}
+static int __init alloc_xenstore_page(struct boot_domain *bd)
+{
+ paddr_t xs_addr = special_pfn(SPECIALPAGE_XENSTORE) << PAGE_SHIFT;
+ uint32_t fields[7] = { 0, 0, 0, 0, 0, 1, 0 };
+
+ if ( !port_is_valid(bd->d, bd->store.evtchn) )
+ {
+ printk("No event channel available for %pd xenstore\n", bd->d);
+ return -EINVAL;
+ }
+
+ /*
+ * Set the connection field to XENSTORE_RECONNECT; the
+ * xenstore_domain_interface fields are located after the two 1024-byte rings.
+ */
+ if ( hvm_copy_to_guest_phys(xs_addr + 2048, fields, sizeof(fields),
+ bd->d->vcpu[0]) )
+ {
+ printk("Unable to set xenstore connection state\n");
+ return -EFAULT;
+ }
+
+ bd->store.gfn = gfn_x(gaddr_to_gfn(xs_addr));
+ bd->d->arch.hvm.params[HVM_PARAM_STORE_PFN] = bd->store.gfn;
+ bd->d->arch.hvm.params[HVM_PARAM_STORE_EVTCHN] = bd->store.evtchn;
+
+ return 0;
+}
+
int __init dom_construct_pvh(struct boot_domain *bd)
{
paddr_t entry, start_info;
@@ -1009,6 +1038,9 @@ int __init dom_construct_pvh(struct boot_domain *bd)
if ( !is_hardware_domain(bd->d) )
alloc_console_page(bd);
+ if ( !is_xenstore_domain(bd->d) )
+ alloc_xenstore_page(bd);
+
if ( opt_dom0_verbose )
{
printk("Dom%u memory map:\n", bd->domid);
diff --git a/xen/arch/x86/include/asm/boot-domain.h b/xen/arch/x86/include/asm/boot-domain.h
index cb6e1fab23ba..df2bfa0c94fa 100644
--- a/xen/arch/x86/include/asm/boot-domain.h
+++ b/xen/arch/x86/include/asm/boot-domain.h
@@ -41,7 +41,7 @@ struct boot_domain {
struct {
xen_pfn_t gfn;
evtchn_port_t evtchn;
- } console;
+ } console, store;
};
#endif
--
2.30.2
* [RFC 31/38] x86/hyperlaunch: move boot module discard to domain builder
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (29 preceding siblings ...)
2025-04-19 22:08 ` [RFC 30/38] x86/hyperlaunch: allocate xenstore " Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 32/38] x86/hyperlaunch: introduce concept of core domains Daniel P. Smith
` (6 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Relocate the call to discard boot modules from inside the domain
construction logic to the top-level domain builder. This ensures
modules are not discarded until after all domains have been
constructed.
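A toy model of why the placement matters: if each per-domain construction frees the shared boot modules, every domain built after the first finds them gone. All names here (demo_modules_mapped, demo_construct, demo_create_domains) are hypothetical and only model the module lifetime, not free_boot_modules() itself.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_DOMAINS 3

/* Hypothetical flag standing in for "boot modules are still mapped". */
static bool demo_modules_mapped;

/* Per-domain construction succeeds only while the modules are mapped. */
static bool demo_construct(int idx, bool free_inside)
{
    bool ok = demo_modules_mapped;

    (void)idx;
    if ( free_inside )
        demo_modules_mapped = false;   /* old placement: freed per domain */
    return ok;
}

/* Returns how many domains found their modules. */
static int demo_create_domains(bool free_inside)
{
    int built = 0;

    demo_modules_mapped = true;
    for ( int i = 0; i < DEMO_NR_DOMAINS; i++ )
        if ( demo_construct(i, free_inside) )
            built++;

    demo_modules_mapped = false;       /* new placement: freed once, at the end */
    return built;
}
```

With the free inside construction only the first domain succeeds; with a single free after the loop all three do.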
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/core.c | 5 ++++-
xen/arch/x86/hvm/dom_build.c | 3 ---
xen/arch/x86/pv/dom0_build.c | 3 ---
3 files changed, 4 insertions(+), 7 deletions(-)
diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
index e21f5efbfe62..901efce62a61 100644
--- a/xen/arch/x86/domain-builder/core.c
+++ b/xen/arch/x86/domain-builder/core.c
@@ -8,8 +8,8 @@
#include <xen/lib.h>
#include <asm/bootinfo.h>
-#include <asm/setup.h>
#include <asm/domain-builder.h>
+#include <asm/setup.h>
#include "fdt.h"
@@ -116,6 +116,9 @@ unsigned int __init builder_create_domains(struct boot_info *bi)
if ( bd->d )
build_count++;
+ /* Free temporary buffers. */
+ free_boot_modules();
+
return build_count;
}
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 2798542e5483..a847c2cb16d9 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -854,9 +854,6 @@ static int __init pvh_load_kernel(
last_addr = ROUNDUP(last_addr, PAGE_SIZE);
}
- /* Free temporary buffers. */
- free_boot_modules();
-
if ( bd->cmdline )
{
rc = hvm_copy_to_guest_phys(last_addr, bd->cmdline,
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index c3d269438313..8d941ff4486e 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -865,9 +865,6 @@ static int __init dom0_construct(struct boot_domain *bd)
init_hypercall_page(d, _p(parms.virt_hypercall));
}
- /* Free temporary buffers. */
- free_boot_modules();
-
/* Set up start info area. */
si = (start_info_t *)vstartinfo_start;
clear_page(si);
--
2.30.2
* [RFC 32/38] x86/hyperlaunch: introduce concept of core domains
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (30 preceding siblings ...)
2025-04-19 22:08 ` [RFC 31/38] x86/hyperlaunch: move boot module discard to domain builder Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-23 19:50 ` Jason Andryuk
2025-04-19 22:08 ` [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant Daniel P. Smith
` (5 subsequent siblings)
37 siblings, 1 reply; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
When constructing a domU, specifically its console and xenstore event
channels, the domid of the backing domain must be known. Therefore, the
control, hardware, and xenstore domains are deemed core domains and must be
constructed before any of the other domains.
This commit introduces the build_core_domains() function, which ensures the
core domains are constructed first.
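The ordering rule can be sketched as: for each core capability take the first domain holding it, build that domain only once even if it holds several capabilities, then build everything else in array order. The helpers and DEMO_CAPS_* values below are hypothetical; the "not found" result is a negative index, matching how alloc_console_evtchn() tests the return of first_boot_domain_index().

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_CAPS_CONTROL  (1u << 0)
#define DEMO_CAPS_HARDWARE (1u << 1)
#define DEMO_CAPS_XENSTORE (1u << 2)

/* Hypothetical helper: first index holding cap, or -1 if none does. */
static int demo_first_index(const unsigned int *caps, int nr, unsigned int cap)
{
    for ( int i = 0; i < nr; i++ )
        if ( caps[i] & cap )
            return i;
    return -1;
}

/* Is idx already present in order[0..n)? */
static int demo_seen(const int *order, int n, int idx)
{
    for ( int j = 0; j < n; j++ )
        if ( order[j] == idx )
            return 1;
    return 0;
}

/*
 * Fill order[] with the build order: core domains first (each once),
 * then the remaining domains. Returns the number of entries written.
 */
static int demo_build_order(const unsigned int *caps, int nr, int *order)
{
    static const unsigned int core[] = {
        DEMO_CAPS_CONTROL, DEMO_CAPS_HARDWARE, DEMO_CAPS_XENSTORE };
    int n = 0;

    for ( size_t c = 0; c < sizeof(core) / sizeof(core[0]); c++ )
    {
        int idx = demo_first_index(caps, nr, core[c]);

        if ( idx < 0 )                /* capability not defined */
            continue;
        if ( !demo_seen(order, n, idx) )
            order[n++] = idx;
    }

    for ( int i = 0; i < nr; i++ )    /* non-core domains, in array order */
        if ( !demo_seen(order, n, i) )
            order[n++] = i;

    return n;
}
```

For example, with a combined control/hardware domain at index 2 and a xenstore domain at index 1, the order is 2, 1, 0, 3.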
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/core.c | 68 ++++++++++++++++++++++++--
xen/arch/x86/include/asm/boot-domain.h | 2 +
2 files changed, 66 insertions(+), 4 deletions(-)
diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
index 901efce62a61..f693aa46d278 100644
--- a/xen/arch/x86/domain-builder/core.c
+++ b/xen/arch/x86/domain-builder/core.c
@@ -103,18 +103,78 @@ void __init builder_init(struct boot_info *bi)
}
}
+static int __init build_core_domains(struct boot_info *bi)
+{
+ int count = 0;
+ struct boot_domain *bd;
+ int hw, cd, xs;
+
+ cd = first_boot_domain_index(bi, BUILD_CAPS_CONTROL);
+ if ( cd < 0 )
+ printk(XENLOG_WARNING "No control domain was defined\n");
+ else
+ {
+ bd = &bi->domains[cd];
+
+ arch_create_dom(bi, bd);
+ if ( bd->d )
+ {
+ bd->constructed = true;
+ count++;
+ }
+ }
+
+ hw = first_boot_domain_index(bi, BUILD_CAPS_HARDWARE);
+ if ( hw < 0 )
+ printk(XENLOG_WARNING "No hardware domain was defined\n");
+ else
+ {
+ if ( hw != cd )
+ {
+ bd = &bi->domains[hw];
+
+ arch_create_dom(bi, bd);
+ if ( bd->d )
+ {
+ bd->constructed = true;
+ count++;
+ }
+ }
+ }
+
+ xs = first_boot_domain_index(bi, BUILD_CAPS_XENSTORE);
+ if ( xs < 0 )
+ printk(XENLOG_WARNING "No xenstore domain was defined\n");
+ else
+ {
+ if ( xs != cd && xs != hw )
+ {
+ bd = &bi->domains[xs];
+
+ arch_create_dom(bi, bd);
+ if ( bd->d )
+ {
+ bd->constructed = true;
+ count++;
+ }
+ }
+ }
+
+ return count;
+}
+
unsigned int __init builder_create_domains(struct boot_info *bi)
{
unsigned int build_count = 0;
struct boot_domain *bd = &bi->domains[0];
+ if ( bi->nr_domains == 0 )
+ panic("%s: no domains defined\n", __func__);
+
if ( bd->kernel == NULL && bd->capabilities & BUILD_CAPS_CONTROL )
panic("%s: control domain missing kernel\n", __func__);
-
- arch_create_dom(bi, bd);
- if ( bd->d )
- build_count++;
+ build_count = build_core_domains(bi);
/* Free temporary buffers. */
free_boot_modules();
diff --git a/xen/arch/x86/include/asm/boot-domain.h b/xen/arch/x86/include/asm/boot-domain.h
index df2bfa0c94fa..a574f4941ed3 100644
--- a/xen/arch/x86/include/asm/boot-domain.h
+++ b/xen/arch/x86/include/asm/boot-domain.h
@@ -42,6 +42,8 @@ struct boot_domain {
xen_pfn_t gfn;
evtchn_port_t evtchn;
} console, store;
+
+ bool constructed;
};
#endif
--
2.30.2
* [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (31 preceding siblings ...)
2025-04-19 22:08 ` [RFC 32/38] x86/hyperlaunch: introduce concept of core domains Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-23 19:27 ` Jason Andryuk
2025-04-19 22:08 ` [RFC 34/38] x86/hyperlaunch: introduce multidomain kconfig option Daniel P. Smith
` (4 subsequent siblings)
37 siblings, 1 reply; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
The bzimage logic uses the file-scope static orig_image_len to hold the original
module length of the kernel when the headroom is calculated. It then uses
orig_image_len to locate the start of the bzimage when the expansion is done.
This is an issue when more than one bzimage is processed by the headroom
calculation logic, as orig_image_len is left set to the length of the last
bzimage processed.
The boot module work introduced storing the headroom size on a per-module
basis. By passing in the headroom from the boot module, orig_image_len is no
longer needed to locate the beginning of the bzimage after the allocated
headroom. The bzimage functions are reworked accordingly, allowing the removal
of orig_image_len and enabling them to be reused for multiple kernel boot
modules.
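A toy model of the problem described above: when the headroom pass stashes the module length in file-scope state, a later parse of the first module sees the length of whichever module was processed last. Carrying the value in the per-module record avoids this. The struct, functions, and the placeholder headroom value are hypothetical and only illustrate the state handling, not the real decompression math.

```c
#include <assert.h>

/* Hypothetical boot module with only the fields this model needs. */
struct demo_module {
    unsigned long size;       /* original module length */
    unsigned long headroom;   /* stored per module, as in the series */
};

/* Old pattern: the headroom pass records the length in shared state. */
static unsigned long demo_orig_image_len;

static unsigned long demo_headroom_global(const struct demo_module *m)
{
    demo_orig_image_len = m->size;   /* overwritten on every call */
    return m->size * 2;              /* placeholder headroom value */
}

/* Old parse: uses whatever length the LAST headroom call recorded. */
static unsigned long demo_parse_len_global(const struct demo_module *m)
{
    (void)m;
    return demo_orig_image_len;
}

/* New parse: everything needed comes from the module being parsed. */
static unsigned long demo_parse_len_param(const struct demo_module *m)
{
    return m->size;
}
```

After computing headroom for two modules, the global variant reports the second module's length when parsing the first, while the per-module variant stays correct.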
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/bzimage.c | 38 ++++++++++++++++++------------
xen/arch/x86/hvm/dom_build.c | 3 ++-
xen/arch/x86/include/asm/bzimage.h | 5 ++--
xen/arch/x86/pv/dom0_build.c | 3 ++-
4 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index 66f648f311e4..32f0360d25b4 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -68,8 +68,6 @@ static __init int bzimage_check(struct setup_header *hdr, unsigned long len)
return 1;
}
-static unsigned long __initdata orig_image_len;
-
unsigned long __init bzimage_headroom(void *image_start,
unsigned long image_length)
{
@@ -90,7 +88,6 @@ unsigned long __init bzimage_headroom(void *image_start,
if ( elf_is_elfbinary(image_start, image_length) )
return 0;
- orig_image_len = image_length;
headroom = output_length(image_start, image_length);
if (gzip_check(image_start, image_length))
{
@@ -103,13 +100,20 @@ unsigned long __init bzimage_headroom(void *image_start,
return headroom;
}
-int __init bzimage_parse(void *image_base, void **image_start,
- unsigned long *image_len)
+int __init bzimage_parse(
+ void *image_base, void **image_start, unsigned long headroom,
+ unsigned long *image_len)
{
struct setup_header *hdr = (struct setup_header *)(*image_start);
int err = bzimage_check(hdr, *image_len);
- unsigned long output_len;
-
+ unsigned long module_len = *image_len;
+
+ /*
+ * Variable err will have one of three values:
+ * - < 0: an error occurred trying to inspect the contents
+ * - > 0: the image is a bzImage
+ * - == 0: not a bzImage, could be raw elf or elf.gz (vmlinuz.gz)
+ */
if ( err < 0 )
return err;
@@ -118,21 +122,25 @@ int __init bzimage_parse(void *image_base, void **image_start,
*image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
*image_len = hdr->payload_length;
}
-
- if ( elf_is_elfbinary(*image_start, *image_len) )
- return 0;
+ else
+ {
+ if ( elf_is_elfbinary(*image_start, *image_len) )
+ return 0;
+ else
+ *image_len = *image_len - headroom;
+ }
BUG_ON(!(image_base < *image_start));
- output_len = output_length(*image_start, orig_image_len);
-
- if ( (err = perform_gunzip(image_base, *image_start, orig_image_len)) > 0 )
- err = decompress(*image_start, orig_image_len, image_base);
+ if ( (err = perform_gunzip(image_base, *image_start, *image_len)) > 0 )
+ err = decompress(*image_start, *image_len, image_base);
if ( !err )
{
+ printk(XENLOG_ERR "%s(%d): decompression failed, resetting image start and len\n",
+ __func__, err);
*image_start = image_base;
- *image_len = output_len;
+ *image_len = module_len;
}
return err > 0 ? 0 : err;
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index a847c2cb16d9..4f614aea34c3 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -743,7 +743,8 @@ static int __init pvh_load_kernel(
struct vcpu *v = d->vcpu[0];
int rc;
- if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
+ if ( (rc = bzimage_parse(image_base, &image_start, image->headroom,
+ &image_len)) != 0 )
{
printk("Error trying to detect bz compressed kernel\n");
return rc;
diff --git a/xen/arch/x86/include/asm/bzimage.h b/xen/arch/x86/include/asm/bzimage.h
index 7ed69d39103d..24c7d4b8eb68 100644
--- a/xen/arch/x86/include/asm/bzimage.h
+++ b/xen/arch/x86/include/asm/bzimage.h
@@ -5,7 +5,8 @@
unsigned long bzimage_headroom(void *image_start, unsigned long image_length);
-int bzimage_parse(void *image_base, void **image_start,
- unsigned long *image_len);
+int bzimage_parse(
+ void *image_base, void **image_start, unsigned long headroom,
+ unsigned long *image_len);
#endif /* __X86_BZIMAGE_H__ */
diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index 8d941ff4486e..8b02f62892d4 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -427,7 +427,8 @@ static int __init dom0_construct(struct boot_domain *bd)
d->max_pages = ~0U;
- if ( (rc = bzimage_parse(image_base, &image_start, &image_len)) != 0 )
+ if ( (rc = bzimage_parse(image_base, &image_start, bd->kernel->headroom,
+ &image_len)) != 0 )
return rc;
if ( (rc = elf_init(&elf, image_start, image_len)) != 0 )
--
2.30.2
* [RFC 34/38] x86/hyperlaunch: introduce multidomain kconfig option
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (32 preceding siblings ...)
2025-04-19 22:08 ` [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 35/38] x86/hyperlaunch: add multidomain construction logic Daniel P. Smith
` (3 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
This adds the MULTIDOMAIN_BUILDER kconfig option, which will be used to allow
the domain construction path to be called multiple times. With the concept of
constructing multiple domains now introduced, rename construct_dom0() to
construct_dom() and raise MAX_NR_BOOTDOMS from 1 to 64.
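The relaxed sanity check in construct_dom() can be modelled as a predicate. In this hypothetical sketch the multidomain parameter stands in for IS_ENABLED(CONFIG_MULTIDOMAIN_BUILDER) and pv_shim for the global of the same name; a false result corresponds to the BUG_ON() firing.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the domid check in construct_dom():
 * without MULTIDOMAIN_BUILDER only domid 0 (or the PV shim's domain)
 * may be constructed; with it, any domid is accepted.
 */
static bool demo_domid_allowed(bool multidomain, bool pv_shim,
                               unsigned int domid)
{
    if ( !multidomain )
        return pv_shim || domid == 0;
    return true;
}
```

Because IS_ENABLED() evaluates to a compile-time constant, the check is compiled out entirely when the option is set.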
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/dom0_build.c | 9 +++++----
xen/arch/x86/domain-builder/Kconfig | 12 ++++++++++++
xen/arch/x86/domain-builder/domain.c | 2 +-
xen/arch/x86/include/asm/bootinfo.h | 2 +-
xen/arch/x86/include/asm/setup.h | 2 +-
5 files changed, 20 insertions(+), 7 deletions(-)
diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
index e488a204372b..8c8cb08a6f7e 100644
--- a/xen/arch/x86/dom0_build.c
+++ b/xen/arch/x86/dom0_build.c
@@ -558,15 +558,16 @@ int __init dom0_setup_permissions(struct domain *d)
return rc;
}
-int __init construct_dom0(struct boot_domain *bd)
+int __init construct_dom(struct boot_domain *bd)
{
int rc;
const struct domain *d = bd->d;
/* Sanity! */
- BUG_ON(!pv_shim && d->domain_id != 0);
- BUG_ON(d->vcpu[0] == NULL);
- BUG_ON(d->vcpu[0]->is_initialised);
+ if ( ! IS_ENABLED(CONFIG_MULTIDOMAIN_BUILDER) )
+ BUG_ON(!pv_shim && bd->d->domain_id != 0);
+ BUG_ON(bd->d->vcpu[0] == NULL);
+ BUG_ON(bd->d->vcpu[0]->is_initialised);
process_pending_softirqs();
diff --git a/xen/arch/x86/domain-builder/Kconfig b/xen/arch/x86/domain-builder/Kconfig
index 51d549c20df3..05d7859e6b53 100644
--- a/xen/arch/x86/domain-builder/Kconfig
+++ b/xen/arch/x86/domain-builder/Kconfig
@@ -12,4 +12,16 @@ config DOMAIN_BUILDER
If unsure, say N.
+config MULTIDOMAIN_BUILDER
+ bool "Multiple domain building (UNSUPPORTED)" if UNSUPPORTED
+ depends on DOMAIN_BUILDER
+ default n
+ help
+ Enables the domain builder capability to build multiple domains
+ using a flattened device tree.
+
+ This feature is currently experimental.
+
+ If unsure, say N.
+
endmenu
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index b413a87cc9c9..919df1ffb1c3 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -399,7 +399,7 @@ struct domain *__init arch_create_dom(
if ( !(bd->capabilities & BUILD_CAPS_HARDWARE) )
alloc_console_evtchn(bi, bd);
- if ( construct_dom0(bd) != 0 )
+ if ( construct_dom(bd) != 0 )
panic("Could not construct domain 0\n");
XFREE(cmdline);
diff --git a/xen/arch/x86/include/asm/bootinfo.h b/xen/arch/x86/include/asm/bootinfo.h
index 430ae08cf5ef..298cff303673 100644
--- a/xen/arch/x86/include/asm/bootinfo.h
+++ b/xen/arch/x86/include/asm/bootinfo.h
@@ -17,7 +17,7 @@
#define MAX_NR_BOOTMODS 63
/* Max number of boot domains that Xen can construct */
-#define MAX_NR_BOOTDOMS 1
+#define MAX_NR_BOOTDOMS 64
/* Boot module binary type / purpose */
enum bootmod_type {
diff --git a/xen/arch/x86/include/asm/setup.h b/xen/arch/x86/include/asm/setup.h
index 4b8fbdc67e05..3f6850d40d04 100644
--- a/xen/arch/x86/include/asm/setup.h
+++ b/xen/arch/x86/include/asm/setup.h
@@ -27,7 +27,7 @@ void subarch_init_memory(void);
void init_IRQ(void);
struct boot_domain;
-int construct_dom0(struct boot_domain *bd);
+int construct_dom(struct boot_domain *bd);
const char *cmdline_cook(const char *p, const char *loader_name);
--
2.30.2
* [RFC 35/38] x86/hyperlaunch: add multidomain construction logic
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (33 preceding siblings ...)
2025-04-19 22:08 ` [RFC 34/38] x86/hyperlaunch: introduce multidomain kconfig option Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 36/38] x86/hyperlaunch: enable unpausing mulitple domains Daniel P. Smith
` (2 subsequent siblings)
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Introduce the logic to loop over boot_info->domains and construct
each valid entry in the array.
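Besides the loop, this patch derives the PVH start_info flags from the domain's roles. A sketch of that derivation, with DEMO_SIF_* values assumed to match SIF_PRIVILEGED (1<<0) and SIF_INITDOMAIN (1<<1) from Xen's public xen.h: accumulating with |= lets a domain that is both control and hardware domain (the classic dom0) keep both bits, whereas plain assignment would drop SIF_PRIVILEGED.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to match SIF_PRIVILEGED / SIF_INITDOMAIN. */
#define DEMO_SIF_PRIVILEGED (1u << 0)
#define DEMO_SIF_INITDOMAIN (1u << 1)

/* Hypothetical helper: start_info flags for a domain's roles. */
static uint32_t demo_start_info_flags(bool is_control, bool is_hardware)
{
    uint32_t flags = 0;

    if ( is_control )
        flags |= DEMO_SIF_PRIVILEGED;
    if ( is_hardware )
        flags |= DEMO_SIF_INITDOMAIN;   /* OR in, do not overwrite */
    return flags;
}
```

An ordinary domU gets no flags, while a combined control/hardware domain gets both.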
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/core.c | 32 ++++++++++++++++++++++++----
xen/arch/x86/domain-builder/domain.c | 7 ++++--
xen/arch/x86/hvm/dom_build.c | 5 ++++-
xen/arch/x86/setup.c | 11 ++++++----
4 files changed, 44 insertions(+), 11 deletions(-)
diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
index f693aa46d278..2712703e17e6 100644
--- a/xen/arch/x86/domain-builder/core.c
+++ b/xen/arch/x86/domain-builder/core.c
@@ -166,16 +166,40 @@ static int __init build_core_domains(struct boot_info *bi)
unsigned int __init builder_create_domains(struct boot_info *bi)
{
unsigned int build_count = 0;
- struct boot_domain *bd = &bi->domains[0];
+ int i;
if ( bi->nr_domains == 0 )
panic("%s: no domains defined\n", __func__);
- if ( bd->kernel == NULL && bd->capabilities & BUILD_CAPS_CONTROL )
- panic("%s: control domain missing kernel\n", __func__);
-
build_count = build_core_domains(bi);
+ if ( ! IS_ENABLED(CONFIG_MULTIDOMAIN_BUILDER) )
+ goto out;
+
+ for ( i = 0; i < bi->nr_domains; i++ )
+ {
+ struct boot_domain *bd = &bi->domains[i];
+
+ if ( bd->constructed )
+ continue;
+
+ if ( bd->mode & BUILD_MODE_PARAVIRT )
+ {
+ printk(XENLOG_WARNING "PV domU not supported, skipping boot domain %d\n", i);
+ continue;
+ }
+
+ arch_create_dom(bi, bd);
+ if ( bd->d )
+ {
+ bd->constructed = true;
+ build_count++;
+ }
+ else
+ printk(XENLOG_WARNING "failed to construct boot domain %d\n", i);
+ }
+
+ out:
/* Free temporary buffers. */
free_boot_modules();
diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
index 919df1ffb1c3..7f5bee0dc759 100644
--- a/xen/arch/x86/domain-builder/domain.c
+++ b/xen/arch/x86/domain-builder/domain.c
@@ -321,8 +321,11 @@ struct domain *__init arch_create_dom(
((hvm_hap_supported() && !opt_dom0_shadow) ?
XEN_DOMCTL_CDF_hap : 0));
- dom_cfg.arch.emulation_flags |=
- XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
+ if ( bd->capabilities & BUILD_CAPS_HARDWARE )
+ dom_cfg.arch.emulation_flags |=
+ XEN_X86_EMU_LAPIC | XEN_X86_EMU_IOAPIC | XEN_X86_EMU_VPCI;
+ else
+ dom_cfg.arch.emulation_flags |= X86_EMU_LAPIC;
}
if ( bd->capabilities & BUILD_CAPS_HARDWARE )
diff --git a/xen/arch/x86/hvm/dom_build.c b/xen/arch/x86/hvm/dom_build.c
index 4f614aea34c3..db7bf9a6934a 100644
--- a/xen/arch/x86/hvm/dom_build.c
+++ b/xen/arch/x86/hvm/dom_build.c
@@ -885,7 +885,10 @@ static int __init pvh_load_kernel(
}
start_info.magic = XEN_HVM_START_MAGIC_VALUE;
- start_info.flags = SIF_PRIVILEGED | SIF_INITDOMAIN;
+ if ( is_control_domain(d) )
+ start_info.flags |= SIF_PRIVILEGED;
+ if ( is_hardware_domain(d) )
+ start_info.flags |= SIF_INITDOMAIN;
rc = hvm_copy_to_guest_phys(last_addr, &start_info, sizeof(start_info), v);
if ( rc )
{
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index da5a8e8d8ed3..09b1fc94426d 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1319,10 +1319,13 @@ void asmlinkage __init noreturn __start_xen(void)
xen->size = __2M_rwdata_end - _stext;
}
- bi->domains[0].kernel->headroom =
- bzimage_headroom(bootstrap_map_bm(bi->domains[0].kernel),
- bi->domains[0].kernel->size);
- bootstrap_unmap();
+ for ( i = 0; i < bi->nr_domains; i++ )
+ {
+ bi->domains[i].kernel->headroom =
+ bzimage_headroom(bootstrap_map_bm(bi->domains[i].kernel),
+ bi->domains[i].kernel->size);
+ bootstrap_unmap();
+ }
#ifndef highmem_start
/* Don't allow split below 4Gb. */
--
2.30.2
* [RFC 36/38] x86/hyperlaunch: enable unpausing mulitple domains
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (34 preceding siblings ...)
2025-04-19 22:08 ` [RFC 35/38] x86/hyperlaunch: add multidomain construction logic Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 37/38] x86/hyperlaunch: generalize domid assignment Daniel P. Smith
2025-04-19 22:08 ` [RFC 38/38] tools: introduce hyperlaunch domain late init Daniel P. Smith
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
This commit enables the domain builder to unpause the hardware domain and
all other domains that have been flagged to start on boot.
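The selection rule in builder_unpause_domains() can be sketched as follows: a domain is unpaused if it has the hardware capability or the start-on-boot mode bit, and the caller treats a count of zero as fatal. The struct and DEMO_* values are hypothetical mirrors of BUILD_CAPS_HARDWARE (1 << 1) and BUILD_MODE_START_ON_BOOT (1 << 3) from the patch, and the unpaused field stands in for domain_unpause_by_systemcontroller().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mirrors of the patch's bit definitions. */
#define DEMO_CAPS_HARDWARE      (1u << 1)
#define DEMO_MODE_START_ON_BOOT (1u << 3)

struct demo_boot_domain {
    unsigned int capabilities;
    unsigned int mode;
    int unpaused;              /* stands in for the real unpause call */
};

/* Unpause qualifying domains; returns how many were unpaused. */
static int demo_unpause_domains(struct demo_boot_domain *doms, size_t nr)
{
    int count = 0;

    for ( size_t i = 0; i < nr; i++ )
        if ( (doms[i].capabilities & DEMO_CAPS_HARDWARE) ||
             (doms[i].mode & DEMO_MODE_START_ON_BOOT) )
        {
            doms[i].unpaused = 1;
            count++;
        }

    return count;
}
```

A domain with neither bit stays paused, to be started later by the toolstack.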
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/core.c | 20 ++++++++++++++++++++
xen/arch/x86/include/asm/boot-domain.h | 8 +++++---
xen/arch/x86/include/asm/domain-builder.h | 1 +
xen/arch/x86/setup.c | 8 +++++++-
4 files changed, 33 insertions(+), 4 deletions(-)
diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
index 2712703e17e6..fa01b8390a23 100644
--- a/xen/arch/x86/domain-builder/core.c
+++ b/xen/arch/x86/domain-builder/core.c
@@ -6,6 +6,7 @@
#include <xen/init.h>
#include <xen/kconfig.h>
#include <xen/lib.h>
+#include <xen/sched.h>
#include <asm/bootinfo.h>
#include <asm/domain-builder.h>
@@ -206,6 +207,25 @@ unsigned int __init builder_create_domains(struct boot_info *bi)
return build_count;
}
+int __init builder_unpause_domains(struct boot_info *bi)
+{
+ int i, count = 0;
+
+ for ( i = 0; i < bi->nr_domains; i++ )
+ {
+ struct boot_domain *bd = &bi->domains[i];
+
+ if ( bd->capabilities & BUILD_CAPS_HARDWARE ||
+ bd->mode & BUILD_MODE_START_ON_BOOT )
+ {
+ domain_unpause_by_systemcontroller(bd->d);
+ count++;
+ }
+ }
+
+ return count;
+}
+
/*
* Local variables:
* mode: C
diff --git a/xen/arch/x86/include/asm/boot-domain.h b/xen/arch/x86/include/asm/boot-domain.h
index a574f4941ed3..b592aef84b27 100644
--- a/xen/arch/x86/include/asm/boot-domain.h
+++ b/xen/arch/x86/include/asm/boot-domain.h
@@ -22,9 +22,11 @@ struct boot_domain {
#define BUILD_CAPS_XENSTORE (1 << 2)
uint32_t capabilities;
- /* On | Off */
-#define BUILD_MODE_PARAVIRT (1 << 0) /* PV | PVH/HVM */
-#define BUILD_MODE_ENABLE_DM (1 << 1) /* HVM | PVH */
+ /* On | Off */
+#define BUILD_MODE_PARAVIRT (1 << 0) /* PV | PVH/HVM */
+#define BUILD_MODE_ENABLE_DM (1 << 1) /* HVM | PVH */
+#define BUILD_MODE_LONG (1 << 2) /* 64 BIT | 32 BIT */
+#define BUILD_MODE_START_ON_BOOT (1 << 3) /* UNPAUSED | PAUSED */
uint32_t mode;
unsigned long mem_pages;
diff --git a/xen/arch/x86/include/asm/domain-builder.h b/xen/arch/x86/include/asm/domain-builder.h
index 5dc5661bec07..df55cf52460c 100644
--- a/xen/arch/x86/include/asm/domain-builder.h
+++ b/xen/arch/x86/include/asm/domain-builder.h
@@ -9,6 +9,7 @@ int __init builder_get_cmdline(
void builder_init(struct boot_info *bi);
unsigned int builder_create_domains(struct boot_info *bi);
+int builder_unpause_domains(struct boot_info *bi);
struct domain *arch_create_dom(
struct boot_info *bi, struct boot_domain *bd);
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 09b1fc94426d..ba0dd427c81b 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -787,6 +787,7 @@ static inline bool using_2M_mapping(void)
static void noreturn init_done(void)
{
+ struct boot_info *bi = &xen_boot_info;
void *va;
unsigned long start, end;
int err;
@@ -800,7 +801,12 @@ static void noreturn init_done(void)
if ( IS_ENABLED(CONFIG_SELF_TESTS) && cpu_has_xen_shstk )
stub_selftest();
- domain_unpause_by_systemcontroller(dom0);
+ err = builder_unpause_domains(bi);
+ if ( err == 0 )
+ panic("domain builder: failed to schedule any domain to start\n");
+ else
+ printk("domain builder: unpaused %d of %d domains at boot\n", err,
+ bi->nr_domains);
/* MUST be done prior to removing .init data. */
unregister_init_virtual_region();
--
2.30.2
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [RFC 37/38] x86/hyperlaunch: generalize domid assignment
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (35 preceding siblings ...)
2025-04-19 22:08 ` [RFC 36/38] x86/hyperlaunch: enable unpausing multiple domains Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 38/38] tools: introduce hyperlaunch domain late init Daniel P. Smith
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Jan Beulich, Andrew Cooper, Roger Pau Monné
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
xen/arch/x86/domain-builder/fdt.c | 29 +++++++++++++++++++++++++++--
1 file changed, 27 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/domain-builder/fdt.c b/xen/arch/x86/domain-builder/fdt.c
index 9c6c0ae09a24..543a0c92adb4 100644
--- a/xen/arch/x86/domain-builder/fdt.c
+++ b/xen/arch/x86/domain-builder/fdt.c
@@ -15,6 +15,21 @@
#include "fdt.h"
+#define MAX_DOMID DOMID_FIRST_RESERVED
+static __initdata DECLARE_BITMAP(domid_alloc, MAX_DOMID);
+
+static domid_t __init find_next_domid(void)
+{
+ unsigned long n = find_next_zero_bit(domid_alloc, MAX_DOMID, 1);
+
+ if ( n == MAX_DOMID )
+ return DOMID_INVALID;
+
+ set_bit(n, domid_alloc);
+
+ return (domid_t) n;
+}
+
/*
* Unpacks a "reg" property into its address and size constituents.
*
@@ -186,11 +201,21 @@ static int __init process_domain_node(
printk(" failed processing domain id for domain %s\n", name);
return -EINVAL;
}
+
if ( val >= DOMID_FIRST_RESERVED )
{
- printk(" invalid domain id for domain %s\n", name);
- return -EINVAL;
+ if ( (val = find_next_domid()) == DOMID_INVALID )
+ {
+ printk(" unable to allocate domid for domain %s\n", name);
+ return -EFAULT;
+ }
}
+ else if ( test_and_set_bit(val, domid_alloc) )
+ {
+ printk(XENLOG_ERR "ERR: domid (%d) already assigned\n", val);
+ return -EFAULT;
+ }
+
bd->domid = (domid_t)val;
printk(" domid: %d\n", bd->domid);
}
--
2.30.2
^ permalink raw reply related [flat|nested] 56+ messages in thread
* [RFC 38/38] tools: introduce hyperlaunch domain late init
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
` (36 preceding siblings ...)
2025-04-19 22:08 ` [RFC 37/38] x86/hyperlaunch: generalize domid assignment Daniel P. Smith
@ 2025-04-19 22:08 ` Daniel P. Smith
37 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-19 22:08 UTC (permalink / raw)
To: xen-devel
Cc: Daniel P. Smith, jason.andryuk, stefano.stabellini, agarciav,
Andrew Cooper, Anthony PERARD, Michal Orzel, Jan Beulich,
Julien Grall, Roger Pau Monné, Stefano Stabellini
The late domain init helper performs the late Xenstore setup (Xenstore entries,
grant table seeding, and domain introduction) for domains that the hypervisor
created using hyperlaunch.
Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
---
.gitignore | 1 +
tools/helpers/Makefile | 12 +
tools/helpers/late-init-domains.c | 364 ++++++++++++++++++++++++++++++
tools/helpers/late-init-domains.h | 18 ++
tools/helpers/xs-helpers.c | 117 ++++++++++
tools/helpers/xs-helpers.h | 26 +++
6 files changed, 538 insertions(+)
create mode 100644 tools/helpers/late-init-domains.c
create mode 100644 tools/helpers/late-init-domains.h
create mode 100644 tools/helpers/xs-helpers.c
create mode 100644 tools/helpers/xs-helpers.h
diff --git a/.gitignore b/.gitignore
index 53f5df000383..7b0c390dbe0d 100644
--- a/.gitignore
+++ b/.gitignore
@@ -122,6 +122,7 @@ tools/flask/utils/flask-label-pci
tools/helpers/init-dom0less
tools/helpers/init-xenstore-domain
tools/helpers/xen-init-dom0
+tools/helpers/late-init-domains
tools/hotplug/common/hotplugpath.sh
tools/hotplug/FreeBSD/rc.d/xencommons
tools/hotplug/FreeBSD/rc.d/xendriverdomain
diff --git a/tools/helpers/Makefile b/tools/helpers/Makefile
index 09590eb5b6f0..26fa079e8b1f 100644
--- a/tools/helpers/Makefile
+++ b/tools/helpers/Makefile
@@ -14,6 +14,7 @@ ifeq ($(CONFIG_ARM),y)
TARGETS += init-dom0less
endif
endif
+TARGETS += late-init-domains
XEN_INIT_DOM0_OBJS = xen-init-dom0.o init-dom-json.o
$(XEN_INIT_DOM0_OBJS): CFLAGS += $(CFLAGS_libxentoollog)
@@ -39,6 +40,14 @@ $(INIT_DOM0LESS_OBJS): CFLAGS += $(CFLAGS_libxenctrl)
$(INIT_DOM0LESS_OBJS): CFLAGS += $(CFLAGS_libxenevtchn)
init-dom0less: LDLIBS += $(call xenlibs-ldlibs,ctrl evtchn toollog store light guest foreignmemory)
+LATE_INIT_DOMAINS_OBJS = late-init-domains.o xs-helpers.o init-dom-json.o
+$(LATE_INIT_DOMAINS_OBJS): CFLAGS += $(CFLAGS_libxentoollog)
+$(LATE_INIT_DOMAINS_OBJS): CFLAGS += $(CFLAGS_libxenguest)
+$(LATE_INIT_DOMAINS_OBJS): CFLAGS += $(CFLAGS_libxenlight)
+$(LATE_INIT_DOMAINS_OBJS): CFLAGS += $(CFLAGS_libxenctrl)
+$(LATE_INIT_DOMAINS_OBJS): CFLAGS += $(CFLAGS_libxenstore)
+late-init-domains: LDLIBS += $(call xenlibs-ldlibs,ctrl toollog store light guest)
+
.PHONY: all
all: $(TARGETS)
@@ -51,6 +60,9 @@ init-xenstore-domain: $(INIT_XENSTORE_DOMAIN_OBJS)
init-dom0less: $(INIT_DOM0LESS_OBJS)
$(CC) $(LDFLAGS) -o $@ $(INIT_DOM0LESS_OBJS) $(LDLIBS) $(APPEND_LDFLAGS)
+late-init-domains: $(LATE_INIT_DOMAINS_OBJS)
+ $(CC) $(LDFLAGS) -o $@ $(LATE_INIT_DOMAINS_OBJS) $(LDLIBS) $(APPEND_LDFLAGS)
+
.PHONY: install
install: all
$(INSTALL_DIR) $(DESTDIR)$(LIBEXEC_BIN)
diff --git a/tools/helpers/late-init-domains.c b/tools/helpers/late-init-domains.c
new file mode 100644
index 000000000000..06911d2e93d1
--- /dev/null
+++ b/tools/helpers/late-init-domains.c
@@ -0,0 +1,364 @@
+
+#include <errno.h>
+#include <getopt.h>
+#include <inttypes.h>
+#include <libxl.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <xenctrl.h>
+#include <xenguest.h>
+#include <xenstore.h>
+#include <xentoollog.h>
+#include <xen/io/xenbus.h>
+
+#include "init-dom-json.h"
+#include "late-init-domains.h"
+#include "xs-helpers.h"
+
+static struct option options[] = {
+ { "console", 1, NULL, 'c' },
+ { "xenstore", 1, NULL, 'x' },
+ { "force", 0, NULL, 'f' },
+ { "verbose", 0, NULL, 'v' },
+ { "help", 0, NULL, 'h' },
+ { NULL, 0, NULL, 0 }
+};
+
+static void usage(void)
+{
+ fprintf(stderr,
+"Usage:\n"
+"\n"
+"late-init-domains <options>\n"
+"\n"
+"where options may include:\n"
+"\n"
+" --console <con domid> configure the console\n"
+" --xenstore <xs domid> domain id of the xenstore domain\n"
+" --force force domain introduction even if xenstore entries exist\n"
+" -v[v[v]] verbosity constructing xenstore tree\n"
+" --help help message\n");
+}
+
+#define XS_DOM_PERM(x, d, k, v) \
+ ret = do_xs_write_dom_with_perm(x, d, k, v, perms, num_perms); \
+ if ( ret != 0 ) return ret \
+
+#define XS_DIR_PERM(x, p, k, v) \
+ ret = do_xs_write_dir_node_with_perm(x, p, k, v, perms, num_perms); \
+ if ( ret != 0 ) return ret \
+
+static int pages_from_hvm_params(
+ struct xc_interface_core *xch, libxl_dominfo *info,
+ struct system_pages *pgs)
+{
+ int ret;
+ domid_t domid = info->domid;
+
+ ret = xc_hvm_param_get(xch, domid, HVM_PARAM_STORE_EVTCHN,
+ &pgs->store.evtchn);
+ if (ret != 0) {
+ fprintf(stderr, "err: failed to get dom%d store evtchn\n", domid);
+ return ret;
+ }
+
+ ret = xc_hvm_param_get(xch, domid, HVM_PARAM_STORE_PFN,
+ &pgs->store.pfn);
+ if (ret < 0) {
+ fprintf(stderr, "err: failed to get dom%d store pfn\n", domid);
+ return ret;
+ }
+
+ if ( pgs->console.enabled )
+ {
+ ret = xc_hvm_param_get(xch, domid, HVM_PARAM_CONSOLE_EVTCHN,
+ &pgs->console.evtchn);
+ if (ret != 0) {
+ fprintf(stderr, "warn: console for dom%d not configured\n", domid);
+ pgs->console.evtchn = pgs->console.pfn = 0;
+ return 0;
+ }
+
+ ret = xc_hvm_param_get(xch, domid, HVM_PARAM_CONSOLE_PFN,
+ &pgs->console.pfn);
+ if (ret < 0) {
+ fprintf(stderr, "warn: console for dom%d not configured\n", domid);
+ pgs->console.evtchn = pgs->console.pfn = 0;
+ return 0;
+ }
+ }
+
+ return 0;
+}
+
+static int create_xs_entries(
+ struct xs_handle *xsh, struct system_pages *pgs, libxl_dominfo *di)
+{
+ char path[128], value[16];
+ struct xs_permissions perms[2] = {
+ {.id = pgs->store.be_domid, .perms = XS_PERM_NONE},
+ {.id = di->domid, .perms = XS_PERM_READ},
+ };
+ uint32_t num_perms = (sizeof(perms) / sizeof((perms)[0]));
+ int ret = 0;
+
+ while ( do_xs_start_transaction(xsh) == 0 )
+ {
+ XS_DOM_PERM(xsh, di->domid, "", "");
+
+ snprintf(value, 16, "%d", di->domid);
+ XS_DOM_PERM(xsh, di->domid, "domid", value);
+
+ XS_DOM_PERM(xsh, di->domid, "memory", "");
+ snprintf(value, 16, "%" PRIu64, di->current_memkb);
+ XS_DOM_PERM(xsh, di->domid, "memory/target", value);
+
+ snprintf(value, 16, "%" PRIu64, di->max_memkb);
+ XS_DOM_PERM(xsh, di->domid, "memory/static-max", value);
+
+ XS_DOM_PERM(xsh, di->domid, "store", "");
+ snprintf(value, 16, "%" PRIu64, pgs->store.evtchn);
+ XS_DOM_PERM(xsh, di->domid, "store/port", value);
+
+ snprintf(value, 16, "%" PRIu64, pgs->store.pfn);
+ XS_DOM_PERM(xsh, di->domid, "store/ring-ref", value);
+
+ if ( pgs->console.enabled && pgs->console.evtchn )
+ {
+ char be_path[64], fe_path[64];
+
+ snprintf(fe_path, 64, "/local/domain/%d/console", di->domid);
+ snprintf(be_path, 64, "/local/domain/%d/backend/console/%d/0",
+ pgs->console.be_domid, di->domid);
+
+ /* Backend entries */
+ XS_DIR_PERM(xsh, be_path, "", "");
+ snprintf(value, 16, "%d", di->domid);
+ XS_DIR_PERM(xsh, be_path, "frontend-id", value);
+ XS_DIR_PERM(xsh, be_path, "frontend", fe_path);
+ XS_DIR_PERM(xsh, be_path, "online", "1");
+ XS_DIR_PERM(xsh, be_path, "protocol", "vt100");
+
+ snprintf(value, 16, "%d", XenbusStateInitialising);
+ XS_DIR_PERM(xsh, be_path, "state", value);
+
+ /* Frontend entries */
+ XS_DOM_PERM(xsh, di->domid, "console", "");
+ snprintf(value, 16, "%d", pgs->console.be_domid);
+ XS_DIR_PERM(xsh, fe_path, "backend", be_path);
+ XS_DIR_PERM(xsh, fe_path, "backend-id", value);
+ XS_DIR_PERM(xsh, fe_path, "limit", "1048576");
+ XS_DIR_PERM(xsh, fe_path, "type", "xenconsoled");
+ XS_DIR_PERM(xsh, fe_path, "output", "pty");
+ XS_DIR_PERM(xsh, fe_path, "tty", "");
+
+ snprintf(value, 16, "%" PRIu64, pgs->console.evtchn);
+ XS_DIR_PERM(xsh, fe_path, "port", value);
+
+ snprintf(value, 16, "%" PRIu64, pgs->console.pfn);
+ XS_DIR_PERM(xsh, fe_path, "ring-ref", value);
+
+ }
+
+ snprintf(path, 128, "/libxl/%u", di->domid);
+ switch ( di->domain_type )
+ {
+ case LIBXL_DOMAIN_TYPE_PV:
+ XS_DIR_PERM(xsh, path, "type", "pv");
+ break;
+ case LIBXL_DOMAIN_TYPE_PVH:
+ XS_DIR_PERM(xsh, path, "type", "pvh");
+ break;
+ case LIBXL_DOMAIN_TYPE_HVM:
+ XS_DIR_PERM(xsh, path, "type", "hvm");
+ break;
+ default:
+ break;
+ }
+
+ ret = do_xs_end_transaction(xsh);
+ switch ( ret )
+ {
+ case 0:
+ break; /* proceed to loop break */
+ case -EAGAIN:
+ continue; /* try again */
+ default:
+ return ret; /* failed */
+ }
+
+ break;
+ }
+
+ return ret;
+}
+
+static bool init_domain(
+ struct xc_interface_core *xch, struct xs_handle *xsh,
+ struct system_pages *pgs, libxl_dominfo *di)
+{
+ xen_pfn_t con_pfn = 0L;
+ /* xc_dom_gnttab_seed() does nothing if front == back */
+ uint32_t con_domid = di->domid;
+ bool is_hvm = (di->domain_type == LIBXL_DOMAIN_TYPE_HVM ||
+ di->domain_type == LIBXL_DOMAIN_TYPE_PVH);
+ int ret;
+
+ if ( (ret = pages_from_hvm_params(xch, di, pgs)) != 0 )
+ {
+ fprintf(stderr, "error(%d): unable to fetch dom%d system pages\n", ret,
+ di->domid);
+ return false;
+ }
+
+ if ( pgs->console.enabled && pgs->console.evtchn )
+ {
+ con_domid = pgs->console.be_domid;
+ con_pfn = pgs->console.pfn;
+ }
+
+ ret = xc_dom_gnttab_seed(xch, di->domid, is_hvm, con_pfn,
+ pgs->store.pfn, con_domid, pgs->store.be_domid);
+ if ( ret != 0 )
+ {
+ fprintf(stderr, "error (%d) setting up grant tables for dom%d\n",
+ ret, di->domid);
+ return false;
+ }
+
+ libxl_uuid_generate(&di->uuid);
+ xc_domain_sethandle(xch, di->domid,
+ libxl_uuid_bytearray(&di->uuid));
+
+ if ( (ret = gen_stub_json_config(di->domid, &di->uuid)) != 0 )
+ fprintf(stderr, "warn(%d): unable to generate dom%d json stub\n", ret,
+ di->domid);
+
+ if ( (ret = create_xs_entries(xsh, pgs, di)) != 0 )
+ {
+ fprintf(stderr, "error(%d): unable to create dom%d xenstore entries\n",
+ ret, di->domid);
+ return false;
+ }
+
+ if ( !xs_introduce_domain(xsh, di->domid, pgs->store.pfn,
+ pgs->store.evtchn) )
+ {
+ fprintf(stderr, "error introducing dom%d\n", di->domid);
+ return false;
+ }
+
+ return true;
+}
+
+int main(int argc, char** argv)
+{
+ int opt, ret, i, nb_vm = 0, count = 0;
+ bool force = false;
+ struct xs_handle *xsh = NULL;
+ struct xc_interface_core *xch = NULL;
+ xentoollog_level minmsglevel = XTL_PROGRESS;
+ xentoollog_logger *logger = NULL;
+ libxl_dominfo *info = NULL;
+ libxl_ctx *ctx;
+ struct system_pages pages = { {0} };
+
+ while ( (opt = getopt_long(argc, argv, "c:x:fv", options, NULL)) != -1 )
+ {
+ switch ( opt )
+ {
+ case 'c':
+ pages.console.be_domid = strtol(optarg, NULL, 10);
+ pages.console.enabled = true;
+ break;
+ case 'x':
+ pages.store.be_domid = strtol(optarg, NULL, 10);
+ break;
+ case 'f':
+ force = true;
+ break;
+ case 'v':
+ if ( minmsglevel > 1 )
+ minmsglevel--;
+ break;
+ case 'h':
+ usage();
+ return 0;
+ default:
+ usage();
+ return 2;
+ }
+ }
+
+ if ( optind != argc )
+ {
+ usage();
+ return 1;
+ }
+
+ logger = (xentoollog_logger *)xtl_createlogger_stdiostream(stderr,
+ minmsglevel, 0);
+
+ xsh = xs_open(0);
+ xch = xc_interface_open(0, 0, 0);
+ if ( xsh == NULL || xch == NULL )
+ {
+ fprintf(stderr, "error: unable to connect to xs and/or xc interface\n");
+ ret = 1;
+ goto out;
+ }
+
+ ret = libxl_ctx_alloc(&ctx, LIBXL_VERSION, 0, NULL);
+ if (ret) {
+ fprintf(stderr, "cannot init xl context\n");
+ goto out;
+ }
+
+ info = libxl_list_domain(ctx, &nb_vm);
+ if (!info) {
+ fprintf(stderr, "libxl_list_domain failed.\n");
+ ret = 1;
+ goto out;
+ }
+
+ for (i = 0; i < nb_vm; i++) {
+ domid_t domid = info[i].domid;
+
+ /* Don't need to check for Dom0 */
+ if (!domid)
+ continue;
+
+ if ( xs_is_domain_introduced(xsh, domid) )
+ {
+ if ( !force )
+ continue;
+
+ fprintf(stderr, "warning: re-introducing domain %d\n", domid);
+ }
+
+ if ( init_domain(xch, xsh, &pages, &info[i]) )
+ count++;
+ }
+
+ printf("initialized %d out of %d domains\n", count, nb_vm);
+
+ ret = 0;
+
+out:
+ if ( info )
+ libxl_dominfo_list_free(info, nb_vm);
+
+ if ( xsh )
+ xs_close(xsh);
+
+ if ( xch )
+ xc_interface_close(xch);
+
+ if ( logger )
+ xtl_logger_destroy(logger);
+
+ return ret;
+}
diff --git a/tools/helpers/late-init-domains.h b/tools/helpers/late-init-domains.h
new file mode 100644
index 000000000000..8d071ef82ea0
--- /dev/null
+++ b/tools/helpers/late-init-domains.h
@@ -0,0 +1,18 @@
+#ifndef __LATE_INIT_DOMAINS_H
+#define __LATE_INIT_DOMAINS_H
+
+struct system_pages {
+ struct {
+ uint16_t be_domid;
+ uint64_t evtchn;
+ uint64_t pfn;
+ } store;
+ struct {
+ bool enabled;
+ uint16_t be_domid;
+ uint64_t evtchn;
+ uint64_t pfn;
+ } console;
+};
+
+#endif
diff --git a/tools/helpers/xs-helpers.c b/tools/helpers/xs-helpers.c
new file mode 100644
index 000000000000..a4d2bebbbd54
--- /dev/null
+++ b/tools/helpers/xs-helpers.c
@@ -0,0 +1,117 @@
+
+#include <errno.h>
+#include <stdio.h>
+#include <string.h>
+#include <xenstore.h>
+
+#define MAX_XS_PAATH 100
+
+static xs_transaction_t t_id = XBT_NULL;
+
+int do_xs_start_transaction(struct xs_handle *xsh)
+{
+ t_id = xs_transaction_start(xsh);
+ if (t_id == XBT_NULL)
+ return -errno;
+
+ return 0;
+}
+
+int do_xs_end_transaction(struct xs_handle *xsh)
+{
+ if ( t_id == XBT_NULL )
+ return -EINVAL;
+
+ if (!xs_transaction_end(xsh, t_id, false))
+ return -errno;
+
+ return 0;
+}
+
+int do_xs_write(struct xs_handle *xsh, char *path, char *val)
+{
+ if ( !xs_write(xsh, t_id, path, val, strlen(val)) )
+ {
+ fprintf(stderr, "failed write: %s\n", path);
+ return -errno;
+ }
+
+ return 0;
+}
+
+int do_xs_perms(
+ struct xs_handle *xsh, char *path, struct xs_permissions *perms,
+ uint32_t num_perms)
+{
+ if ( !xs_set_permissions(xsh, t_id, path, perms, num_perms) )
+ {
+ fprintf(stderr, "failed set perm: %s\n", path);
+ return -errno;
+ }
+
+ return 0;
+}
+
+int do_xs_write_dir_node_with_perm(
+ struct xs_handle *xsh, char *dir, char *node, char *val,
+ struct xs_permissions *perms, uint32_t num_perms)
+{
+ char full_path[MAX_XS_PAATH];
+ int ret = 0;
+
+ /*
+ * mainly for creating a value holding node, but
+ * also support creating directory nodes.
+ */
+ if ( strlen(node) != 0 )
+ snprintf(full_path, MAX_XS_PAATH, "%s/%s", dir, node);
+ else
+ snprintf(full_path, MAX_XS_PAATH, "%s", dir);
+
+ ret = do_xs_write(xsh, full_path, val);
+ if ( ret < 0 )
+ return ret;
+
+ if ( perms != NULL && num_perms > 0 )
+ ret = do_xs_perms(xsh, full_path, perms, num_perms);
+
+ return ret;
+}
+
+int do_xs_write_dir_node(
+ struct xs_handle *xsh, char *dir, char *node, char *val)
+{
+ return do_xs_write_dir_node_with_perm(xsh, dir, node, val, NULL, 0);
+}
+
+int do_xs_write_dom_with_perm(
+ struct xs_handle *xsh, uint32_t domid, char *path, char *val,
+ struct xs_permissions *perms, uint32_t num_perms)
+{
+ char full_path[MAX_XS_PAATH];
+ int ret = 0;
+
+ /*
+ * mainly for creating a value holding node, but
+ * also support creating directory nodes.
+ */
+ if ( strlen(path) != 0 )
+ snprintf(full_path, MAX_XS_PAATH, "/local/domain/%d/%s", domid, path);
+ else
+ snprintf(full_path, MAX_XS_PAATH, "/local/domain/%d", domid);
+
+ ret = do_xs_write(xsh, full_path, val);
+ if ( ret < 0 )
+ return ret;
+
+ if ( perms != NULL && num_perms > 0 )
+ ret = do_xs_perms(xsh, full_path, perms, num_perms);
+
+ return ret;
+}
+
+int do_xs_write_dom(
+ struct xs_handle *xsh, uint32_t domid, char *path, char *val)
+{
+ return do_xs_write_dom_with_perm(xsh, domid, path, val, NULL, 0);
+}
diff --git a/tools/helpers/xs-helpers.h b/tools/helpers/xs-helpers.h
new file mode 100644
index 000000000000..89585637d4bb
--- /dev/null
+++ b/tools/helpers/xs-helpers.h
@@ -0,0 +1,26 @@
+#ifndef __XS_HELPERS_H
+#define __XS_HELPERS_H
+
+#include <xenstore.h>
+
+int do_xs_start_transaction(struct xs_handle *xsh);
+int do_xs_end_transaction(struct xs_handle *xsh);
+
+int do_xs_write(struct xs_handle *xsh, char *path, char *val);
+int do_xs_perms(
+ struct xs_handle *xsh, char *path, struct xs_permissions *perms,
+ uint32_t num_perms);
+
+int do_xs_write_dir_node_with_perm(
+ struct xs_handle *xsh, char *dir, char *node, char *val,
+ struct xs_permissions *perms, uint32_t num_perms);
+int do_xs_write_dir_node(
+ struct xs_handle *xsh, char *dir, char *node, char *val);
+
+int do_xs_write_dom_with_perm(
+ struct xs_handle *xsh, uint32_t domid, char *path, char *val,
+ struct xs_permissions *perms, uint32_t num_perms);
+int do_xs_write_dom(
+ struct xs_handle *xsh, uint32_t domid, char *path, char *val);
+
+#endif
--
2.30.2
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus
2025-04-19 22:07 ` [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus Daniel P. Smith
@ 2025-04-20 9:36 ` Jürgen Groß
2025-04-22 12:38 ` Daniel P. Smith
2025-04-25 22:05 ` Daniel P. Smith
0 siblings, 2 replies; 56+ messages in thread
From: Jürgen Groß @ 2025-04-20 9:36 UTC (permalink / raw)
To: Daniel P. Smith, xen-devel
Cc: jason.andryuk, stefano.stabellini, agarciav, Jan Beulich,
Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Dario Faggioli, George Dunlap
On 20.04.25 00:07, Daniel P. Smith wrote:
> Relocated the function sched_setup_dom0_vcpus(), which was protected by an
> ifdef CONFIG_X86, from common/sched to the hyperlaunch domain builder. Rename
> it to alloc_dom_vcpus() to better reflect the purpose of the function.
I'd be in favor of naming it domain_vcpus_create().
Juergen
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus
2025-04-20 9:36 ` Jürgen Groß
@ 2025-04-22 12:38 ` Daniel P. Smith
2025-04-25 22:05 ` Daniel P. Smith
1 sibling, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-22 12:38 UTC (permalink / raw)
To: Jürgen Groß, xen-devel
Cc: jason.andryuk, stefano.stabellini, agarciav, Jan Beulich,
Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Dario Faggioli, George Dunlap
On 4/20/25 05:36, Jürgen Groß wrote:
> On 20.04.25 00:07, Daniel P. Smith wrote:
>> Relocated the function sched_setup_dom0_vcpus(), which was protected by an
>> ifdef CONFIG_X86, from common/sched to the hyperlaunch domain builder. Rename
>> it to alloc_dom_vcpus() to better reflect the purpose of the function.
>
> I'd be in favor of naming it domain_vcpus_create().
Yah, that's a much better name.
v/r,
dps
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder
2025-04-19 22:07 ` [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder Daniel P. Smith
@ 2025-04-22 20:36 ` Jason Andryuk
2025-04-25 15:05 ` Alejandro Vallejo
2025-04-25 22:03 ` Daniel P. Smith
0 siblings, 2 replies; 56+ messages in thread
From: Jason Andryuk @ 2025-04-22 20:36 UTC (permalink / raw)
To: Daniel P. Smith, xen-devel
Cc: stefano.stabellini, agarciav, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 2025-04-19 18:07, Daniel P. Smith wrote:
> The domain configuration may request more vcpus than are present in the system.
> For dom0, the function dom0_max_vcpus() was used to clamp down to physically
> available vcpus. Here we are introducing a generalized version,
> dom_max_vcpus(), that takes a boot domain and sets the max vcpus based on the
> lesser of the requested max and the available vcpus.
>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> ---
> diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
> new file mode 100644
> index 000000000000..f2277b9e3cf3
> --- /dev/null
> +++ b/xen/arch/x86/domain-builder/domain.c
> @@ -0,0 +1,38 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * Copyright (C) 2024, Apertus Solutions, LLC
> + */
> +
> +#include <xen/cpumask.h>
> +#include <xen/domain.h>
> +#include <xen/init.h>
> +#include <xen/sched.h>
> +
> +#include <asm/bootinfo.h>
> +
> +unsigned int __init dom_max_vcpus(struct boot_domain *bd)
> +{
> + unsigned int limit = bd->mode & BUILD_MODE_PARAVIRT ?
> + MAX_VIRT_CPUS : HVM_MAX_VCPUS;
> +
> + if ( bd->capabilities & BUILD_CAPS_CONTROL )
I added xen/include/public/bootfdt.h with DOMAIN_CAPS_CONTROL and the
other capabilities to provide common values.
> + limit = dom0_max_vcpus();
dom0_max_vcpus() applies Xen's dom0_max_vcpus command line option. That
is desirable for a traditional dom0. For a disaggregated, Hyperlaunch
system, I'm not sure it's appropriate. Considering there can be multiple
control domains, it's even more questionable.
Might it be better to only apply Xen "dom0" command line options to
non-hyperlaunch dom0? Or a domain with all of
BUILD_CAPS_CONTROL/HARDWARE/XENSTORE?
I guess it could stay as-is, but it seems unusual.
Regards,
Jason
^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC 33/38] x86/boot: refactor bzimage parser to be re-entrant
2025-04-19 22:08 ` [RFC 33/38] x86/boot: refactor bzimage parser to be re-entrant Daniel P. Smith
@ 2025-04-23 19:27 ` Jason Andryuk
2025-04-25 22:16 ` Daniel P. Smith
2025-04-26 1:53 ` Daniel P. Smith
0 siblings, 2 replies; 56+ messages in thread
From: Jason Andryuk @ 2025-04-23 19:27 UTC (permalink / raw)
To: Daniel P. Smith, xen-devel
Cc: stefano.stabellini, agarciav, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 2025-04-19 18:08, Daniel P. Smith wrote:
> The bzimage logic uses the unit global orig_image_len to hold the original
> module length for the kernel when the headroom is calculated. It then uses
> orig_image_len to locate the start of the bzimage when the expansion is done.
> This is an issue when more than one bzimage is processed by the headroom
> calculation logic, as it will leave orig_image_len set to the length of the
> last bzimage it processed.
>
> The boot module work introduced storing the headroom size on a per module
> basis. By passing in the headroom from the boot module, orig_image_len is no
> longer needed to locate the beginning of the bzimage after the allocated
> headroom. The bzimage functions are reworked as such, allowing the removal of
> orig_image_len and enabling them to be reused by multiple kernel boot modules.
>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> ---
> xen/arch/x86/bzimage.c | 38 ++++++++++++++++++------------
> xen/arch/x86/hvm/dom_build.c | 3 ++-
> xen/arch/x86/include/asm/bzimage.h | 5 ++--
> xen/arch/x86/pv/dom0_build.c | 3 ++-
> 4 files changed, 30 insertions(+), 19 deletions(-)
>
> diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
> index 66f648f311e4..32f0360d25b4 100644
> @@ -103,13 +100,20 @@ unsigned long __init bzimage_headroom(void *image_start,
> return headroom;
> }
>
> -int __init bzimage_parse(void *image_base, void **image_start,
> - unsigned long *image_len)
> +int __init bzimage_parse(
> + void *image_base, void **image_start, unsigned long headroom,
> + unsigned long *image_len)
> {
> struct setup_header *hdr = (struct setup_header *)(*image_start);
> int err = bzimage_check(hdr, *image_len);
> - unsigned long output_len;
> -
> + unsigned long module_len = *image_len;
> +
> + /*
> + * Variable err will have one of three values:
> + * - < 0: an error occurred trying to inspect the contents
> + * - > 0: the image is a bzImage
> + * - == 0: not a bzImage, could be raw elf or elf.gz (vmlinuz.gz)
> + */
This comment seems a little independent of this change, so maybe it
should be submitted separately. Also, I think a better placement would
be next to bzimage_check().
> if ( err < 0 )
> return err;
>
> @@ -118,21 +122,25 @@ int __init bzimage_parse(void *image_base, void **image_start,
> *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
> *image_len = hdr->payload_length;
@here
> }
> -
> - if ( elf_is_elfbinary(*image_start, *image_len) )
> - return 0;
> + else
> + {
> + if ( elf_is_elfbinary(*image_start, *image_len) )
> + return 0;
> + else
> + *image_len = *image_len - headroom;
> + }
I don't like this extra indentation which includes the return. If you
retain orig_image_len as a local variable, and set it above at "@here",
you can have a smaller diff and leave cleaner logic.
orig_image_len was previously a static variable, so it still held the value
set by bzimage_headroom(). Now that it is no longer static, we need to grab
the hdr->payload_length value when we have a bzImage. Otherwise
output_length() will read past the end of the module.
Below is the diff for bzimage.c with the change I suggest.
Regards,
Jason
diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index ac4fd428be..775a8b45ff 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -69,8 +69,6 @@ static __init int bzimage_check(struct setup_header *hdr, unsigned long len)
return 1;
}
-static unsigned long __initdata orig_image_len;
-
unsigned long __init bzimage_headroom(void *image_start,
unsigned long image_length)
{
@@ -91,7 +89,6 @@ unsigned long __init bzimage_headroom(void *image_start,
if ( elf_is_elfbinary(image_start, image_length) )
return 0;
- orig_image_len = image_length;
headroom = output_length(image_start, image_length);
if (gzip_check(image_start, image_length))
{
@@ -104,12 +101,15 @@ unsigned long __init bzimage_headroom(void *image_start,
return headroom;
}
-int __init bzimage_parse(void *image_base, void **image_start,
- unsigned long *image_len)
+int __init bzimage_parse(
+ void *image_base, void **image_start, unsigned long headroom,
+ unsigned long *image_len)
{
struct setup_header *hdr = (struct setup_header *)(*image_start);
int err = bzimage_check(hdr, *image_len);
- unsigned long output_len;
+ unsigned long output_len, orig_image_len;
+
+ orig_image_len = *image_len - headroom;
if ( err < 0 )
return err;
@@ -118,6 +118,7 @@ int __init bzimage_parse(void *image_base, void **image_start,
{
*image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
*image_len = hdr->payload_length;
+ orig_image_len = hdr->payload_length;
}
if ( elf_is_elfbinary(*image_start, *image_len) )
^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [RFC 32/38] x86/hyperlaunch: introduce concept of core domains
2025-04-19 22:08 ` [RFC 32/38] x86/hyperlaunch: introduce concept of core domains Daniel P. Smith
@ 2025-04-23 19:50 ` Jason Andryuk
2025-04-25 22:06 ` Daniel P. Smith
0 siblings, 1 reply; 56+ messages in thread
From: Jason Andryuk @ 2025-04-23 19:50 UTC (permalink / raw)
To: Daniel P. Smith, xen-devel
Cc: stefano.stabellini, agarciav, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 2025-04-19 18:08, Daniel P. Smith wrote:
> When constructing domU, and specifically the event channels for their console
> and xenstore event channels, the domid for the backing domain must be known.
> Therefore, the control, hardware, and xenstore domains are deemed as core
> domains, and must be constructed before any of the other domains.
>
> This commit introduces the build_core_domains() function that will ensure the
> core domains are constructed first.
>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> ---
> xen/arch/x86/domain-builder/core.c | 68 ++++++++++++++++++++++++--
> xen/arch/x86/include/asm/boot-domain.h | 2 +
> 2 files changed, 66 insertions(+), 4 deletions(-)
>
> diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
> index 901efce62a61..f693aa46d278 100644
> --- a/xen/arch/x86/domain-builder/core.c
> +++ b/xen/arch/x86/domain-builder/core.c
> @@ -103,18 +103,78 @@ void __init builder_init(struct boot_info *bi)
> }
> }
>
> +static int __init build_core_domains(struct boot_info *bi)
> +{
> + int count = 0;
> + struct boot_domain *bd;
> + int hw, cd, xs;
> +
> + cd = first_boot_domain_index(bi, BUILD_CAPS_CONTROL);
> + hw = first_boot_domain_index(bi, BUILD_CAPS_HARDWARE);
> + xs = first_boot_domain_index(bi, BUILD_CAPS_XENSTORE);
This order has issues if you actually have disaggregated domains.
Control and Hardware depend on Xenstore for Xenstore.
Control and Xenstore depend on Hardware for console support.
I re-worked the xenstore allocation to run after domain creation. I've
upstreamed that for dom0less (and ARM doesn't have to deal with consoles).
So if xenstore allocation is moved later, Hardware, Xenstore, then
Control works. But xenstore and console could both be handled after the
fact and then the construction order doesn't matter. The backend domid
is needed to construct the event channel and grant entry. With assigned
domids, alloc_store_evtchn()/alloc_console_evtchn() can operate on the
domids instead of expecting the domain to have been constructed.
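As an illustration of the ordering described above (Hardware, then Xenstore, then Control, once xenstore allocation is deferred), here is a minimal sketch that only computes a build order. The CAP_* bits and core_build_order() are hypothetical stand-ins for the series' BUILD_CAPS_* flags and first_boot_domain_index(); event-channel and grant setup are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; the series itself uses BUILD_CAPS_* values. */
#define CAP_CONTROL  (1u << 0)
#define CAP_HARDWARE (1u << 1)
#define CAP_XENSTORE (1u << 2)

/*
 * Fill order[] with indices into caps[]: the first hardware domain, then the
 * first xenstore domain, then the first control domain, then every remaining
 * domain in its original position. A domain holding several capabilities is
 * placed once, at its earliest slot. Returns the number of indices written.
 */
static size_t core_build_order(const unsigned int *caps, size_t n,
                               size_t *order)
{
    static const unsigned int prio[] = {
        CAP_HARDWARE, CAP_XENSTORE, CAP_CONTROL
    };
    unsigned char placed[64] = { 0 };
    size_t count = 0;

    assert(n <= sizeof(placed));

    for ( size_t p = 0; p < sizeof(prio) / sizeof(prio[0]); p++ )
        for ( size_t i = 0; i < n; i++ )
            if ( !placed[i] && (caps[i] & prio[p]) )
            {
                placed[i] = 1;
                order[count++] = i;
                break; /* only the first domain with this capability */
            }

    for ( size_t i = 0; i < n; i++ )
        if ( !placed[i] )
            order[count++] = i;

    return count;
}
```

With a classic all-in-one dom0 the order degenerates to "dom0 first", so the sketch does not change the non-disaggregated case.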
Regards,
Jason
* Re: [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder
2025-04-22 20:36 ` Jason Andryuk
@ 2025-04-25 15:05 ` Alejandro Vallejo
2025-04-25 22:03 ` Daniel P. Smith
1 sibling, 0 replies; 56+ messages in thread
From: Alejandro Vallejo @ 2025-04-25 15:05 UTC (permalink / raw)
To: Jason Andryuk, Daniel P. Smith, xen-devel
Cc: stefano.stabellini, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On Tue Apr 22, 2025 at 9:36 PM BST, Jason Andryuk wrote:
> On 2025-04-19 18:07, Daniel P. Smith wrote:
>> + limit = dom0_max_vcpus();
>
> dom0_max_vcpus() applies Xen's dom0_max_vcpus command line option. That
> is desirable for a traditional dom0. For a disaggregated, Hyperlaunch
> system, I'm not sure it's appropriate. Considering there can be multiple
> control domains, it's more questionable.
>
> Might it be better to only apply Xen "dom0" command line options to
> non-hyperlaunch dom0? Or a domain with all of
> BUILD_CAPS_CONTROL/HARDWARE/XENSTORE?
Alternatively, why not apply it to the hardware domain instead? That's
guaranteed to be (at most) one, and will still function appropriately
when doing non-DTB based boots.
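A minimal sketch of that alternative, assuming the dom0_max_vcpus-style limit is reduced to a plain number (0 meaning "not given"). The CAP_* bits, the SKETCH_* ceilings and sketch_max_vcpus() are hypothetical; the ceilings are placeholders rather than Xen's real MAX_VIRT_CPUS/HVM_MAX_VCPUS values, and the online-CPU clamping done by the real dom0_max_vcpus() is ignored.

```c
#include <stdbool.h>

/* Hypothetical capability bits (stand-ins for BUILD_CAPS_*). */
#define CAP_CONTROL  (1u << 0)
#define CAP_HARDWARE (1u << 1)

/* Placeholder ceilings; not Xen's actual values. */
#define SKETCH_PV_MAX_VCPUS  512u
#define SKETCH_HVM_MAX_VCPUS 128u

/*
 * Clamp a requested vcpu count. The command line limit is honoured only by
 * the hardware domain (of which there is at most one), not by every control
 * domain.
 */
static unsigned int sketch_max_vcpus(unsigned int caps, bool pv,
                                     unsigned int requested,
                                     unsigned int cmdline_limit)
{
    unsigned int limit = pv ? SKETCH_PV_MAX_VCPUS : SKETCH_HVM_MAX_VCPUS;

    if ( (caps & CAP_HARDWARE) && cmdline_limit && cmdline_limit < limit )
        limit = cmdline_limit;

    return requested < limit ? requested : limit;
}
```

A control-only domain then gets only the mode ceiling, while a non-DTB dom0 (which carries the hardware capability) still sees the command line limit.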
I'll make this adjustment while rebasing this rfc against my latest
hlaunch series.
>
> I guess it could stay as-is, but it seems unusual.
And would probably be particularly weird when it applies to all your
control domains and _not_ to your hardware domain, which incidentally is
the one domain with domid 0.
>
> Regards,
> Jason
Cheers,
Alejandro
* Re: [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation to domain builder
2025-04-19 22:07 ` [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation " Daniel P. Smith
@ 2025-04-25 15:22 ` Alejandro Vallejo
2025-04-25 22:04 ` Daniel P. Smith
0 siblings, 1 reply; 56+ messages in thread
From: Alejandro Vallejo @ 2025-04-25 15:22 UTC (permalink / raw)
To: Daniel P. Smith, xen-devel
Cc: jason.andryuk, stefano.stabellini, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On Sat Apr 19, 2025 at 11:07 PM BST, Daniel P. Smith wrote:
> Convert alloc_dom0_vcpu0() to dom0_set_affinity(), making it only set up the
> node affinity based on command line parameters passed. At the same time,
> introduce alloc_dom_vcpu0() as the replacement for alloc_dom0_vcpu0(). Then have
> alloc_dom_vcpu0() call dom0_set_affinity() when the boot domain is the control
> domain, otherwise set the affinity to auto.
>
> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
> ---
> xen/arch/x86/dom0_build.c | 4 +---
> xen/arch/x86/domain-builder/domain.c | 11 +++++++++++
> xen/arch/x86/include/asm/dom0_build.h | 2 ++
> xen/arch/x86/include/asm/domain-builder.h | 1 +
> xen/arch/x86/setup.c | 5 +++--
> 5 files changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
> index f2277b9e3cf3..619d36ea0b87 100644
> --- a/xen/arch/x86/domain-builder/domain.c
> +++ b/xen/arch/x86/domain-builder/domain.c
> @@ -9,6 +9,7 @@
> #include <xen/sched.h>
>
> #include <asm/bootinfo.h>
> +#include <asm/dom0_build.h>
>
> unsigned int __init dom_max_vcpus(struct boot_domain *bd)
> {
> @@ -27,6 +28,16 @@ unsigned int __init dom_max_vcpus(struct boot_domain *bd)
> return bd->max_vcpus;
> }
>
> +struct vcpu *__init alloc_dom_vcpu0(struct boot_domain *bd)
> +{
> + if ( bd->capabilities & BUILD_CAPS_CONTROL )
> + dom0_set_affinity(bd->d);
Similar as before, this probably wants to be DOMAIN_CAPS_HARDWARE?
I'll adjust while rebasing.
Cheers,
Alejandro
* Re: [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder
2025-04-22 20:36 ` Jason Andryuk
2025-04-25 15:05 ` Alejandro Vallejo
@ 2025-04-25 22:03 ` Daniel P. Smith
1 sibling, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-25 22:03 UTC (permalink / raw)
To: Jason Andryuk, xen-devel
Cc: stefano.stabellini, agarciav, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 4/22/25 16:36, Jason Andryuk wrote:
> On 2025-04-19 18:07, Daniel P. Smith wrote:
>> The domain configuration may request more vcpus than are present in the system.
>> For dom0, the function dom0_max_vcpus() was used to clamp down to physically
>> available vcpus. Here we are introducing a generalized version,
>> dom_max_vcpus(), that takes a boot domain and sets the max vcpus based on the
>> lesser of the requested max and the available vcpus.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> ---
>
>> diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
>> new file mode 100644
>> index 000000000000..f2277b9e3cf3
>> --- /dev/null
>> +++ b/xen/arch/x86/domain-builder/domain.c
>> @@ -0,0 +1,38 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Copyright (C) 2024, Apertus Solutions, LLC
>> + */
>> +
>> +#include <xen/cpumask.h>
>> +#include <xen/domain.h>
>> +#include <xen/init.h>
>> +#include <xen/sched.h>
>> +
>> +#include <asm/bootinfo.h>
>> +
>> +unsigned int __init dom_max_vcpus(struct boot_domain *bd)
>> +{
>> + unsigned int limit = bd->mode & BUILD_MODE_PARAVIRT ?
>> + MAX_VIRT_CPUS : HVM_MAX_VCPUS;
>> +
>> + if ( bd->capabilities & BUILD_CAPS_CONTROL )
>
> I added xen/include/public/bootfdt.h with DOMAIN_CAPS_CONTROL and the
> other capabilities to provide common values.
>
>> + limit = dom0_max_vcpus();
>
> dom0_max_vcpus() applies Xen's dom0_max_vcpus command line option. That
> is desirable for a traditional dom0. For a disaggregated, Hyperlaunch
> system, I'm not sure it's appropriate. Considering there can be multiple
> control domains, it's more questionable.
>
> Might it be better to only apply Xen "dom0" command line options to
> non-hyperlaunch dom0? Or a domain with all of
> BUILD_CAPS_CONTROL/HARDWARE/XENSTORE?
>
> I guess it could stay as-is, but it seems unusual.
The larger issue is that the cmdline params are going to need to be
addressed. I see three approaches. The first would be to only apply the
params when there is a single domain with ctrl/hw/xs all set. The
second would be to change the params to support multiple domain
statements, though that approach has the issue of needing to decide how
to support the legacy params.
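One possible reading of the first approach, as a minimal sketch: the legacy dom0_* options apply only when exactly one boot domain holds all three core capabilities, and only to that domain. The CAP_* bits and dom0_params_apply() are hypothetical; the real series would test BUILD_CAPS_* on struct boot_domain.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits (stand-ins for BUILD_CAPS_*). */
#define CAP_CONTROL   (1u << 0)
#define CAP_HARDWARE  (1u << 1)
#define CAP_XENSTORE  (1u << 2)
#define CAPS_ALL_CORE (CAP_CONTROL | CAP_HARDWARE | CAP_XENSTORE)

/*
 * Return true if the legacy dom0_* command line options should apply to the
 * boot domain at index idx: exactly one domain must carry control, hardware
 * and xenstore together, and idx must be that domain.
 */
static bool dom0_params_apply(const unsigned int *caps, size_t n, size_t idx)
{
    size_t all_core = 0;

    for ( size_t i = 0; i < n; i++ )
        if ( (caps[i] & CAPS_ALL_CORE) == CAPS_ALL_CORE )
            all_core++;

    return all_core == 1 && (caps[idx] & CAPS_ALL_CORE) == CAPS_ALL_CORE;
}
```

In a fully disaggregated layout no domain qualifies, so the options are simply ignored, which matches the "none" outcome discussed later in the thread.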
v/r,
dps
* Re: [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation to domain builder
2025-04-25 15:22 ` Alejandro Vallejo
@ 2025-04-25 22:04 ` Daniel P. Smith
2025-04-28 10:33 ` Alejandro Vallejo
0 siblings, 1 reply; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-25 22:04 UTC (permalink / raw)
To: Alejandro Vallejo, xen-devel
Cc: jason.andryuk, stefano.stabellini, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 4/25/25 11:22, Alejandro Vallejo wrote:
> On Sat Apr 19, 2025 at 11:07 PM BST, Daniel P. Smith wrote:
>> Convert alloc_dom0_vcpu0() to dom0_set_affinity(), making it only set up the
>> node affinity based on command line parameters passed. At the same time,
>> introduce alloc_dom_vcpu0() as the replacement for alloc_dom0_vcpu0(). Then have
>> alloc_dom_vcpu0() call dom0_set_affinity() when the boot domain is the control
>> domain, otherwise set the affinity to auto.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> ---
>> xen/arch/x86/dom0_build.c | 4 +---
>> xen/arch/x86/domain-builder/domain.c | 11 +++++++++++
>> xen/arch/x86/include/asm/dom0_build.h | 2 ++
>> xen/arch/x86/include/asm/domain-builder.h | 1 +
>> xen/arch/x86/setup.c | 5 +++--
>> 5 files changed, 18 insertions(+), 5 deletions(-)
>>
>> diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
>> index f2277b9e3cf3..619d36ea0b87 100644
>> --- a/xen/arch/x86/domain-builder/domain.c
>> +++ b/xen/arch/x86/domain-builder/domain.c
>> @@ -9,6 +9,7 @@
>> #include <xen/sched.h>
>>
>> #include <asm/bootinfo.h>
>> +#include <asm/dom0_build.h>
>>
>> unsigned int __init dom_max_vcpus(struct boot_domain *bd)
>> {
>> @@ -27,6 +28,16 @@ unsigned int __init dom_max_vcpus(struct boot_domain *bd)
>> return bd->max_vcpus;
>> }
>>
>> +struct vcpu *__init alloc_dom_vcpu0(struct boot_domain *bd)
>> +{
>> + if ( bd->capabilities & BUILD_CAPS_CONTROL )
>> + dom0_set_affinity(bd->d);
>
> Similar as before, this probably wants to be DOMAIN_CAPS_HARDWARE?
>
> I'll adjust while rebasing.
Does it?
v/r,
dps
* Re: [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus
2025-04-20 9:36 ` Jürgen Groß
2025-04-22 12:38 ` Daniel P. Smith
@ 2025-04-25 22:05 ` Daniel P. Smith
1 sibling, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-25 22:05 UTC (permalink / raw)
To: Jürgen Groß, xen-devel
Cc: jason.andryuk, stefano.stabellini, agarciav, Jan Beulich,
Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Dario Faggioli, George Dunlap
On 4/20/25 05:36, Jürgen Groß wrote:
> On 20.04.25 00:07, Daniel P. Smith wrote:
>> Relocated the function sched_setup_dom0_vcpus(), which was protected by an
>> ifdef CONFIG_X86, from common/sched to the hyperlaunch domain builder. Rename
>> it to alloc_dom_vcpus() to better reflect the purpose of the function.
>
> I'd be in favor naming it domain_vcpus_create().
I would add that since that name was based on the function name
alloc_dom_vcpu0(), I have also switched it so that both follow that scheme.
v/r,
dps
* Re: [RFC 32/38] x86/hyperlaunch: introduce concept of core domains
2025-04-23 19:50 ` Jason Andryuk
@ 2025-04-25 22:06 ` Daniel P. Smith
0 siblings, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-25 22:06 UTC (permalink / raw)
To: Jason Andryuk, xen-devel
Cc: stefano.stabellini, agarciav, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 4/23/25 15:50, Jason Andryuk wrote:
> On 2025-04-19 18:08, Daniel P. Smith wrote:
>> When constructing domU, and specifically the event channels for their console
>> and xenstore event channels, the domid for the backing domain must be known.
>> Therefore, the control, hardware, and xenstore domains are deemed as core
>> domains, and must be constructed before any of the other domains.
>>
>> This commit introduces the build_core_domains() function that will ensure the
>> core domains are constructed first.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> ---
>> xen/arch/x86/domain-builder/core.c | 68 ++++++++++++++++++++++++--
>> xen/arch/x86/include/asm/boot-domain.h | 2 +
>> 2 files changed, 66 insertions(+), 4 deletions(-)
>>
>> diff --git a/xen/arch/x86/domain-builder/core.c b/xen/arch/x86/domain-builder/core.c
>> index 901efce62a61..f693aa46d278 100644
>> --- a/xen/arch/x86/domain-builder/core.c
>> +++ b/xen/arch/x86/domain-builder/core.c
>> @@ -103,18 +103,78 @@ void __init builder_init(struct boot_info *bi)
>> }
>> }
>> +static int __init build_core_domains(struct boot_info *bi)
>> +{
>> + int count = 0;
>> + struct boot_domain *bd;
>> + int hw, cd, xs;
>> +
>> + cd = first_boot_domain_index(bi, BUILD_CAPS_CONTROL);
>
>> + hw = first_boot_domain_index(bi, BUILD_CAPS_HARDWARE);
>
>> + xs = first_boot_domain_index(bi, BUILD_CAPS_XENSTORE);
>
> This order has issues if you actually have disaggregated domains.
>
> Control and Hardware depend on Xenstore for Xenstore.
>
> Control and Xenstore depend on Hardware for console support.
>
> I re-worked the xenstore allocation to run after domain creation. I've
> upstreamed that for dom0less (and ARM doesn't have to deal with consoles).
>
> So if xenstore allocation is moved later, Hardware, Xenstore, then
> Control works. But xenstore and console could both be handled after the
> fact and then the construction order doesn't matter. The backend domid
> is needed to construct the event channel and grant entry. With assigned
> domids, alloc_store_evtchn()/alloc_console_evtchn() can operate on the
> domids instead of expecting the domain to have been constructed.
Actually, there is a larger issue here and this is going to be reworked.
V/r,
dps
* Re: [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant
2025-04-23 19:27 ` Jason Andryuk
@ 2025-04-25 22:16 ` Daniel P. Smith
2025-04-26 1:53 ` Daniel P. Smith
1 sibling, 0 replies; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-25 22:16 UTC (permalink / raw)
To: Jason Andryuk, xen-devel
Cc: stefano.stabellini, agarciav, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 4/23/25 15:27, Jason Andryuk wrote:
> On 2025-04-19 18:08, Daniel P. Smith wrote:
>> The bzimage logic uses the unit global orig_image_len to hold the original
>> module length for the kernel when the headroom is calculated. It then uses
>> orig_image_len to locate the start of the bzimage when the expansion is done.
>> This is an issue when more than one bzimage is processed by the headroom
>> calculation logic, as it will leave orig_image_len set to the length of the
>> last bzimage it processed.
>>
>> The boot module work introduced storing the headroom size on a per module
>> basis. By passing in the headroom from the boot module, orig_image_len is no
>> longer needed to locate the beginning of the bzimage after the allocated
>> headroom. The bzimage functions are reworked as such, allowing the removal of
>> orig_image_len and enabling them to be reused by multiple kernel boot
>> modules.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> ---
>> xen/arch/x86/bzimage.c | 38 ++++++++++++++++++------------
>> xen/arch/x86/hvm/dom_build.c | 3 ++-
>> xen/arch/x86/include/asm/bzimage.h | 5 ++--
>> xen/arch/x86/pv/dom0_build.c | 3 ++-
>> 4 files changed, 30 insertions(+), 19 deletions(-)
>>
>> diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
>> index 66f648f311e4..32f0360d25b4 100644
>
>> @@ -103,13 +100,20 @@ unsigned long __init bzimage_headroom(void *image_start,
>> return headroom;
>> }
>> -int __init bzimage_parse(void *image_base, void **image_start,
>> - unsigned long *image_len)
>> +int __init bzimage_parse(
>> + void *image_base, void **image_start, unsigned long headroom,
>> + unsigned long *image_len)
>> {
>> struct setup_header *hdr = (struct setup_header *)(*image_start);
>> int err = bzimage_check(hdr, *image_len);
>> - unsigned long output_len;
>> -
>> + unsigned long module_len = *image_len;
>> +
>> + /*
>> + * Variable err will have one of three values:
>> + * - < 0: a error occurred trying to inspect the contents
>> + * - > 0: the image is a bzImage
>> + * - == 0: not a bzImage, could be raw elf or elf.gz (vmlinuz.gz)
>> + */
>
> This comment seems a little independent of this change, so maybe it
> should be submitted separately. Also, I think a better placement would
> be next to bzimage_check().
>
>> if ( err < 0 )
>> return err;
>> @@ -118,21 +122,25 @@ int __init bzimage_parse(void *image_base, void **image_start,
>> *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
>> *image_len = hdr->payload_length;
>
> @here
>
>> }
>> -
>> - if ( elf_is_elfbinary(*image_start, *image_len) )
>> - return 0;
>> + else
>> + {
>> + if ( elf_is_elfbinary(*image_start, *image_len) )
>> + return 0;
>> + else
>> + *image_len = *image_len - headroom;
>> + }
>
> I don't like this extra indention which includes the return. If you
> retain orig_image_len as a local variable, and set it above at "@here",
> you can have a smaller diff and leave cleaner logic.
>
> orig_image_len previously was set as a static variable, so the correct
> value was set after bzimage_headroom. Now that it is no longer static,
> we need to grab the hdr->payload_length value when we have a bzImage.
> Otherwise output_length will read past the end of the module.
>
> Below is the diff for bzimage.c with the change I suggest.
>
I will review it again as I rebase it.
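To make the problem in the commit message concrete, here is a minimal model of the old file-scope orig_image_len next to a per-call derivation from the module's own headroom. All names are hypothetical, and module_len is assumed to include the headroom reserved in front of the image, as the patch's `*image_len - headroom` implies.

```c
#include <stdbool.h>

/* Models the old file-scope static written by every headroom pass. */
static unsigned long orig_image_len;

/* Old scheme: each headroom pass overwrites the single global slot. */
static void old_headroom_pass(unsigned long compressed_len)
{
    orig_image_len = compressed_len;
}

/* Old scheme: the parse pass reads back whatever was recorded last. */
static unsigned long old_parse_input_len(void)
{
    return orig_image_len;
}

/*
 * Per-call scheme: derive the compressed input length from per-module data.
 * For a bzImage only the payload is compressed input (hdr->payload_length);
 * otherwise it is everything after the headroom.
 */
static unsigned long new_parse_input_len(unsigned long module_len,
                                         unsigned long headroom,
                                         bool is_bzimage,
                                         unsigned long payload_len)
{
    return is_bzimage ? payload_len : module_len - headroom;
}
```

Running the headroom pass for two kernels before parsing the first one leaves the old scheme holding the second kernel's length, while the per-call version needs no shared state. The bzImage branch is also why hdr->payload_length has to be captured once the global is gone.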
v/r,
dps
* Re: [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant
2025-04-23 19:27 ` Jason Andryuk
2025-04-25 22:16 ` Daniel P. Smith
@ 2025-04-26 1:53 ` Daniel P. Smith
2025-04-28 6:41 ` Jan Beulich
1 sibling, 1 reply; 56+ messages in thread
From: Daniel P. Smith @ 2025-04-26 1:53 UTC (permalink / raw)
To: Jason Andryuk, xen-devel
Cc: stefano.stabellini, agarciav, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On 4/23/25 15:27, Jason Andryuk wrote:
> On 2025-04-19 18:08, Daniel P. Smith wrote:
>> The bzimage logic uses the unit global orig_image_len to hold the original
>> module length for the kernel when the headroom is calculated. It then uses
>> orig_image_len to locate the start of the bzimage when the expansion is done.
>> This is an issue when more than one bzimage is processed by the headroom
>> calculation logic, as it will leave orig_image_len set to the length of the
>> last bzimage it processed.
>>
>> The boot module work introduced storing the headroom size on a per module
>> basis. By passing in the headroom from the boot module, orig_image_len is no
>> longer needed to locate the beginning of the bzimage after the allocated
>> headroom. The bzimage functions are reworked as such, allowing the removal of
>> orig_image_len and enabling them to be reused by multiple kernel boot
>> modules.
>>
>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>> ---
>> xen/arch/x86/bzimage.c | 38 ++++++++++++++++++------------
>> xen/arch/x86/hvm/dom_build.c | 3 ++-
>> xen/arch/x86/include/asm/bzimage.h | 5 ++--
>> xen/arch/x86/pv/dom0_build.c | 3 ++-
>> 4 files changed, 30 insertions(+), 19 deletions(-)
>>
>> diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
>> index 66f648f311e4..32f0360d25b4 100644
>
>> @@ -103,13 +100,20 @@ unsigned long __init bzimage_headroom(void *image_start,
>> return headroom;
>> }
>> -int __init bzimage_parse(void *image_base, void **image_start,
>> - unsigned long *image_len)
>> +int __init bzimage_parse(
>> + void *image_base, void **image_start, unsigned long headroom,
>> + unsigned long *image_len)
>> {
>> struct setup_header *hdr = (struct setup_header *)(*image_start);
>> int err = bzimage_check(hdr, *image_len);
>> - unsigned long output_len;
>> -
>> + unsigned long module_len = *image_len;
>> +
>> + /*
>> + * Variable err will have one of three values:
>> + * - < 0: a error occurred trying to inspect the contents
>> + * - > 0: the image is a bzImage
>> + * - == 0: not a bzImage, could be raw elf or elf.gz (vmlinuz.gz)
>> + */
>
> This comment seems a little independent of this change, so maybe it
> should be submitted separately. Also, I think a better placement would
> be next to bzimage_check().
>
>> if ( err < 0 )
>> return err;
>> @@ -118,21 +122,25 @@ int __init bzimage_parse(void *image_base, void **image_start,
>> *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
>> *image_len = hdr->payload_length;
>
> @here
>
>> }
>> -
>> - if ( elf_is_elfbinary(*image_start, *image_len) )
>> - return 0;
>> + else
>> + {
>> + if ( elf_is_elfbinary(*image_start, *image_len) )
>> + return 0;
>> + else
>> + *image_len = *image_len - headroom;
>> + }
>
> I don't like this extra indention which includes the return. If you
> retain orig_image_len as a local variable, and set it above at "@here",
> you can have a smaller diff and leave cleaner logic.
Right, but I find it sillier to be checking every kernel for elf when we
know it's a bzImage. While the elf check is fairly simplistic, it is
still multiple value checks.
> orig_image_len previously was set as a static variable, so the correct
> value was set after bzimage_headroom. Now that it is no longer static,
> we need to grab the hdr->payload_length value when we have a bzImage.
> Otherwise output_length will read past the end of the module.
Yes, I am fully aware.
> Below is the diff for bzimage.c with the change I suggest.
I will provide an alternate version.
v/r,
dps
* Re: [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant
2025-04-26 1:53 ` Daniel P. Smith
@ 2025-04-28 6:41 ` Jan Beulich
2025-04-28 14:16 ` Jason Andryuk
0 siblings, 1 reply; 56+ messages in thread
From: Jan Beulich @ 2025-04-28 6:41 UTC (permalink / raw)
To: Daniel P. Smith
Cc: stefano.stabellini, agarciav, Andrew Cooper, Roger Pau Monné,
Jason Andryuk, xen-devel
On 26.04.2025 03:53, Daniel P. Smith wrote:
> On 4/23/25 15:27, Jason Andryuk wrote:
>> On 2025-04-19 18:08, Daniel P. Smith wrote:
>>> The bzimage logic uses the unit global orig_image_len to hold the original
>>> module length for the kernel when the headroom is calculated. It then uses
>>> orig_image_len to locate the start of the bzimage when the expansion is done.
>>> This is an issue when more than one bzimage is processed by the headroom
>>> calculation logic, as it will leave orig_image_len set to the length of the
>>> last bzimage it processed.
>>>
>>> The boot module work introduced storing the headroom size on a per module
>>> basis. By passing in the headroom from the boot module, orig_image_len is no
>>> longer needed to locate the beginning of the bzimage after the allocated
>>> headroom. The bzimage functions are reworked as such, allowing the removal of
>>> orig_image_len and enabling them to be reused by multiple kernel boot
>>> modules.
>>>
>>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>>> ---
>>> xen/arch/x86/bzimage.c | 38 ++++++++++++++++++------------
>>> xen/arch/x86/hvm/dom_build.c | 3 ++-
>>> xen/arch/x86/include/asm/bzimage.h | 5 ++--
>>> xen/arch/x86/pv/dom0_build.c | 3 ++-
>>> 4 files changed, 30 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
>>> index 66f648f311e4..32f0360d25b4 100644
>>
>>> @@ -103,13 +100,20 @@ unsigned long __init bzimage_headroom(void *image_start,
>>> return headroom;
>>> }
>>> -int __init bzimage_parse(void *image_base, void **image_start,
>>> - unsigned long *image_len)
>>> +int __init bzimage_parse(
>>> + void *image_base, void **image_start, unsigned long headroom,
>>> + unsigned long *image_len)
>>> {
>>> struct setup_header *hdr = (struct setup_header *)(*image_start);
>>> int err = bzimage_check(hdr, *image_len);
>>> - unsigned long output_len;
>>> -
>>> + unsigned long module_len = *image_len;
>>> +
>>> + /*
>>> + * Variable err will have one of three values:
>>> + * - < 0: a error occurred trying to inspect the contents
>>> + * - > 0: the image is a bzImage
>>> + * - == 0: not a bzImage, could be raw elf or elf.gz (vmlinuz.gz)
>>> + */
>>
>> This comment seems a little independent of this change, so maybe it
>> should be submitted separately. Also, I think a better placement would
>> be next to bzimage_check().
>>
>>> if ( err < 0 )
>>> return err;
>>> @@ -118,21 +122,25 @@ int __init bzimage_parse(void *image_base, void **image_start,
>>> *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
>>> *image_len = hdr->payload_length;
>>
>> @here
>>
>>> }
>>> -
>>> - if ( elf_is_elfbinary(*image_start, *image_len) )
>>> - return 0;
>>> + else
>>> + {
>>> + if ( elf_is_elfbinary(*image_start, *image_len) )
>>> + return 0;
>>> + else
>>> + *image_len = *image_len - headroom;
>>> + }
>>
>> I don't like this extra indention which includes the return. If you
>> retain orig_image_len as a local variable, and set it above at "@here",
>> you can have a smaller diff and leave cleaner logic.
>
> Right, but I find it sillier to be checking every kernel for elf when we
> know it's a bzImage. While the elf check is fairly simplistic, it is
> still multiple value checks.
Even without any extra local vars the above can be written as
else if ( elf_is_elfbinary(*image_start, *image_len) )
return 0;
else
*image_len = *image_len - headroom;
which would already address the "extra indentation" aspect.
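For illustration, a minimal model of the resulting flat else-if flow, with the helper results passed in as plain values. sketch_parse() and its parameters are hypothetical: check stands in for bzimage_check() (<0 error, >0 bzImage, 0 otherwise), is_elf for elf_is_elfbinary(), and *len is assumed to include the headroom on entry. Like the patch, it skips the ELF check once a bzImage header has been recognised.

```c
#include <stdbool.h>

/*
 * Returns <0 on error, 0 when there is nothing to decompress (plain ELF),
 * and 1 when *len has been updated to the compressed input length.
 */
static int sketch_parse(int check, bool is_elf, unsigned long headroom,
                        unsigned long payload_len, unsigned long *len)
{
    if ( check < 0 )
        return check;

    if ( check > 0 )
        *len = payload_len;      /* bzImage: ELF check skipped */
    else if ( is_elf )
        return 0;                /* raw ELF: no decompression needed */
    else
        *len = *len - headroom;  /* compressed kernel after the headroom */

    return 1;
}
```

Every branch stays at one indentation level and the early return for a raw ELF image is no longer nested inside an else block.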
Jan
* Re: [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation to domain builder
2025-04-25 22:04 ` Daniel P. Smith
@ 2025-04-28 10:33 ` Alejandro Vallejo
2025-04-28 11:01 ` Jan Beulich
0 siblings, 1 reply; 56+ messages in thread
From: Alejandro Vallejo @ 2025-04-28 10:33 UTC (permalink / raw)
To: Daniel P. Smith, xen-devel
Cc: jason.andryuk, stefano.stabellini, Jan Beulich, Andrew Cooper,
Roger Pau Monné
On Fri Apr 25, 2025 at 11:04 PM BST, Daniel P. Smith wrote:
> On 4/25/25 11:22, Alejandro Vallejo wrote:
>> On Sat Apr 19, 2025 at 11:07 PM BST, Daniel P. Smith wrote:
>>> Convert alloc_dom0_vcpu0() to dom0_set_affinity(), making it only set up the
>>> node affinity based on command line parameters passed. At the same time,
>>> introduce alloc_dom_vcpu0() as the replacement for alloc_dom0_vcpu0(). Then have
>>> alloc_dom_vcpu0() call dom0_set_affinity() when the boot domain is the control
>>> domain, otherwise set the affinity to auto.
>>>
>>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>>> ---
>>> xen/arch/x86/dom0_build.c | 4 +---
>>> xen/arch/x86/domain-builder/domain.c | 11 +++++++++++
>>> xen/arch/x86/include/asm/dom0_build.h | 2 ++
>>> xen/arch/x86/include/asm/domain-builder.h | 1 +
>>> xen/arch/x86/setup.c | 5 +++--
>>> 5 files changed, 18 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
>>> index f2277b9e3cf3..619d36ea0b87 100644
>>> --- a/xen/arch/x86/domain-builder/domain.c
>>> +++ b/xen/arch/x86/domain-builder/domain.c
>>> @@ -9,6 +9,7 @@
>>> #include <xen/sched.h>
>>>
>>> #include <asm/bootinfo.h>
>>> +#include <asm/dom0_build.h>
>>>
>>> unsigned int __init dom_max_vcpus(struct boot_domain *bd)
>>> {
>>> @@ -27,6 +28,16 @@ unsigned int __init dom_max_vcpus(struct boot_domain *bd)
>>> return bd->max_vcpus;
>>> }
>>>
>>> +struct vcpu *__init alloc_dom_vcpu0(struct boot_domain *bd)
>>> +{
>>> + if ( bd->capabilities & BUILD_CAPS_CONTROL )
>>> + dom0_set_affinity(bd->d);
>>
>> Similar as before, this probably wants to be DOMAIN_CAPS_HARDWARE?
>>
>> I'll adjust while rebasing.
>
> Does it?
>
> v/r,
> dps
The situation is similar later on when choosing a CPU policy. Why
mustn't the hardware domain get the same treatment as the control
domains? Using (DOMAIN_CAPS_CONTROL | DOMAIN_CAPS_HARDWARE) at the
very least seems warranted.
All these cases single-out dom0 when dom0 is both a control and a
hardware domain, but as Jason mentioned how is Xen meant to deal with
dom0_X arguments when dom0 is disaggregated? Either it applies to all
its constituents (with the plausible exception of a xenstore domain), or
just one (the hardware domain), or none. Only applying to control
domains and not the hardware domain doesn't look right (to me).
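The difference between the posted check and the suggested mask can be sketched as two predicates. The CAP_* bits and both function names are hypothetical stand-ins for BUILD_CAPS_*/DOMAIN_CAPS_* tests on the boot domain.

```c
#include <stdbool.h>

/* Hypothetical capability bits (stand-ins for BUILD_CAPS_* / DOMAIN_CAPS_*). */
#define CAP_CONTROL  (1u << 0)
#define CAP_HARDWARE (1u << 1)
#define CAP_XENSTORE (1u << 2)

/* Check as posted in the RFC: only control domains get dom0 affinity. */
static bool uses_dom0_affinity_rfc(unsigned int caps)
{
    return (caps & CAP_CONTROL) != 0;
}

/* Suggested alternative: a control or a hardware domain qualifies. */
static bool uses_dom0_affinity_suggested(unsigned int caps)
{
    return (caps & (CAP_CONTROL | CAP_HARDWARE)) != 0;
}
```

Both agree for a classic dom0 carrying every capability; they differ only for a standalone hardware domain, which is exactly the case raised above.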
Cheers,
Alejandro
* Re: [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation to domain builder
2025-04-28 10:33 ` Alejandro Vallejo
@ 2025-04-28 11:01 ` Jan Beulich
0 siblings, 0 replies; 56+ messages in thread
From: Jan Beulich @ 2025-04-28 11:01 UTC (permalink / raw)
To: Alejandro Vallejo
Cc: jason.andryuk, stefano.stabellini, Andrew Cooper,
Roger Pau Monné, Daniel P. Smith, xen-devel
On 28.04.2025 12:33, Alejandro Vallejo wrote:
> On Fri Apr 25, 2025 at 11:04 PM BST, Daniel P. Smith wrote:
>> On 4/25/25 11:22, Alejandro Vallejo wrote:
>>> On Sat Apr 19, 2025 at 11:07 PM BST, Daniel P. Smith wrote:
>>>> Convert alloc_dom0_vcpu0() to dom0_set_affinity(), making it only set up the
>>>> node affinity based on command line parameters passed. At the same time,
>>>> introduce alloc_dom_vcpu0() as the replacement for alloc_dom0_vcpu0(). Then have
>>>> alloc_dom_vcpu0() call dom0_set_affinity() when the boot domain is the control
>>>> domain, otherwise set the affinity to auto.
>>>>
>>>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>>>> ---
>>>> xen/arch/x86/dom0_build.c | 4 +---
>>>> xen/arch/x86/domain-builder/domain.c | 11 +++++++++++
>>>> xen/arch/x86/include/asm/dom0_build.h | 2 ++
>>>> xen/arch/x86/include/asm/domain-builder.h | 1 +
>>>> xen/arch/x86/setup.c | 5 +++--
>>>> 5 files changed, 18 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/xen/arch/x86/domain-builder/domain.c b/xen/arch/x86/domain-builder/domain.c
>>>> index f2277b9e3cf3..619d36ea0b87 100644
>>>> --- a/xen/arch/x86/domain-builder/domain.c
>>>> +++ b/xen/arch/x86/domain-builder/domain.c
>>>> @@ -9,6 +9,7 @@
>>>> #include <xen/sched.h>
>>>>
>>>> #include <asm/bootinfo.h>
>>>> +#include <asm/dom0_build.h>
>>>>
>>>> unsigned int __init dom_max_vcpus(struct boot_domain *bd)
>>>> {
>>>> @@ -27,6 +28,16 @@ unsigned int __init dom_max_vcpus(struct boot_domain *bd)
>>>> return bd->max_vcpus;
>>>> }
>>>>
>>>> +struct vcpu *__init alloc_dom_vcpu0(struct boot_domain *bd)
>>>> +{
>>>> + if ( bd->capabilities & BUILD_CAPS_CONTROL )
>>>> + dom0_set_affinity(bd->d);
>>>
>>> Similar as before, this probably wants to be DOMAIN_CAPS_HARDWARE?
>>>
>>> I'll adjust while rebasing.
>>
>> Does it?
>>
>> v/r,
>> dps
>
> The situation is similar later on when choosing a CPU policy. Why
> mustn't the hardware domain get the same treatment as the control
> domains? Using (DOMAIN_CAPS_CONTROL | DOMAIN_CAPS_HARDWARE) at the
> very least seems warranted.
>
> All these cases single-out dom0 when dom0 is both a control and a
> hardware domain, but as Jason mentioned how is Xen meant to deal with
> dom0_X arguments when dom0 is disaggregated? Either it applies to all
> its constituents (with the plausible exception of a xenstore domain),
This one-size-fits-all approach seems very unlikely to me to make sense, while
> or just one (the hardware domain), or none.
... either of these would. "None" in particular might if all config
information is coming from e.g. DT anyway in such a setup.
> Only applying to control
> domains and not the hardware domain doesn't look right (to me).
+1
Jan
* Re: [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant
2025-04-28 6:41 ` Jan Beulich
@ 2025-04-28 14:16 ` Jason Andryuk
0 siblings, 0 replies; 56+ messages in thread
From: Jason Andryuk @ 2025-04-28 14:16 UTC (permalink / raw)
To: Jan Beulich, Daniel P. Smith
Cc: stefano.stabellini, agarciav, Andrew Cooper, Roger Pau Monné,
xen-devel
On 2025-04-28 02:41, Jan Beulich wrote:
> On 26.04.2025 03:53, Daniel P. Smith wrote:
>> On 4/23/25 15:27, Jason Andryuk wrote:
>>> On 2025-04-19 18:08, Daniel P. Smith wrote:
>>>> The bzimage logic uses the unit global orig_image_len to hold the original
>>>> module length for the kernel when the headroom is calculated. It then uses
>>>> orig_image_len to locate the start of the bzimage when the expansion is done.
>>>> This is an issue when more than one bzimage is processed by the headroom
>>>> calculation logic, as it will leave orig_image_len set to the length of the
>>>> last bzimage it processed.
>>>>
>>>> The boot module work introduced storing the headroom size on a per module
>>>> basis. By passing in the headroom from the boot module, orig_image_len is no
>>>> longer needed to locate the beginning of the bzimage after the allocated
>>>> headroom. The bzimage functions are reworked as such, allowing the removal of
>>>> orig_image_len and enabling them to be reused by multiple kernel boot
>>>> modules.
>>>>
>>>> Signed-off-by: Daniel P. Smith <dpsmith@apertussolutions.com>
>>>> ---
>>>> xen/arch/x86/bzimage.c | 38 ++++++++++++++++++------------
>>>> xen/arch/x86/hvm/dom_build.c | 3 ++-
>>>> xen/arch/x86/include/asm/bzimage.h | 5 ++--
>>>> xen/arch/x86/pv/dom0_build.c | 3 ++-
>>>> 4 files changed, 30 insertions(+), 19 deletions(-)
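To illustrate the idea in the commit message, here is a minimal sketch in which each kernel module carries its own headroom, so locating the original image needs no file-scope state. The struct and helper names are hypothetical (this is not Xen's struct boot_module), and the sketch assumes, as the `*image_len - headroom` line later in the patch implies, that the length passed in covers the headroom plus the original image:

```c
#include <assert.h>

/* Hypothetical, simplified per-module record (not Xen's struct boot_module). */
struct kernel_module {
    unsigned char *base;    /* start of the mapping, headroom included */
    unsigned long size;     /* assumed: headroom + original image length */
    unsigned long headroom; /* as computed by a bzimage_headroom()-style pass */
};

/* The original image starts right after this module's headroom. */
static unsigned char *km_image_start(const struct kernel_module *km)
{
    return km->base + km->headroom;
}

/* Original image length, recovered from per-module data only. */
static unsigned long km_image_len(const struct kernel_module *km)
{
    return km->size - km->headroom;
}
```

Because both values come from the module itself, processing a second kernel cannot clobber the first one's bookkeeping, which a single global orig_image_len cannot guarantee.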
>>>>
>>>> diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
>>>> index 66f648f311e4..32f0360d25b4 100644
>>>
>>>> @@ -103,13 +100,20 @@ unsigned long __init bzimage_headroom(void *image_start,
>>>> return headroom;
>>>> }
>>>> -int __init bzimage_parse(void *image_base, void **image_start,
>>>> - unsigned long *image_len)
>>>> +int __init bzimage_parse(
>>>> + void *image_base, void **image_start, unsigned long headroom,
>>>> + unsigned long *image_len)
>>>> {
>>>> struct setup_header *hdr = (struct setup_header *)(*image_start);
>>>> int err = bzimage_check(hdr, *image_len);
>>>> - unsigned long output_len;
>>>> -
>>>> + unsigned long module_len = *image_len;
>>>> +
>>>> + /*
>>>> + * Variable err will have one of three values:
>>>> + * - < 0: an error occurred trying to inspect the contents
>>>> + * - > 0: the image is a bzImage
>>>> + * - == 0: not a bzImage, could be raw elf or elf.gz (vmlinuz.gz)
>>>> + */
>>>
>>> This comment seems a little independent of this change, so maybe it
>>> should be submitted separately. Also, I think a better placement would
>>> be next to bzimage_check().
>>>
>>>> if ( err < 0 )
>>>> return err;
>>>> @@ -118,21 +122,25 @@ int __init bzimage_parse(void *image_base, void **image_start,
>>>> *image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
>>>> *image_len = hdr->payload_length;
>>>
>>> @here
>>>
>>>> }
>>>> -
>>>> - if ( elf_is_elfbinary(*image_start, *image_len) )
>>>> - return 0;
>>>> + else
>>>> + {
>>>> + if ( elf_is_elfbinary(*image_start, *image_len) )
>>>> + return 0;
>>>> + else
>>>> + *image_len = *image_len - headroom;
>>>> + }
>>>
>>> I don't like this extra indentation, which includes the return. If you
>>> retain orig_image_len as a local variable, and set it above at "@here",
>>> you can have a smaller diff and leave cleaner logic.
>>
>> Right, but I find it sillier to be checking every kernel for elf when we
>> know it's a bzImage. While the elf check is fairly simplistic, it is
>> still multiple value checks.
bzimage_parse() needs to handle:
  - ELF
  - compressed ELF
  - bzImage
bzimage_check() just returns:
+ * Variable err will have one of three values:
+ * - < 0: an error occurred trying to inspect the contents
+ * - > 0: the image is a bzImage
+ * - == 0: not a bzImage, could be raw elf or elf.gz (vmlinuz.gz)
The elf_is_elfbinary() check is needed to detect plain ELF files and
skip decompression.
> Even without any extra local vars the above can be
>
> else if ( elf_is_elfbinary(*image_start, *image_len) )
> return 0;
> else
> *image_len = *image_len - headroom;
>
> which would already address the "extra indentation" aspect.
Yes, thanks. This actually helps illustrate how Dan's change is
incorrect. Dan's elimination of orig_image_len does make *image_len
correct later, but I think there are further issues.
if ( err > 0 ) <--- bzImage case
{
*image_start += (hdr->setup_sects + 1) * 512 + hdr->payload_offset;
*image_len = hdr->payload_length;
}
else if ( elf_is_elfbinary(*image_start, *image_len) ) <--- ELF
return 0;
else <--- compressed ELF
*image_len = *image_len - headroom;
My addition is in the bzImage case, since that is what I saw break and
then fixed. Setting orig_image_len in the else branch of the bzImage
check would not fix it.
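The three-way chain above can be modeled in isolation. This is a hedged sketch, not the patch itself: `err` follows the bzimage_check() convention quoted earlier, `is_elf` stands in for elf_is_elfbinary(), offsets replace the real pointer arithmetic, and the enum, struct and function names are hypothetical. The ELF test is only reached when the image is not a bzImage, which also addresses the concern about checking every bzImage for ELF:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the three formats bzimage_parse() must accept. */
enum kernel_format { KFMT_BZIMAGE, KFMT_ELF, KFMT_COMPRESSED_ELF };

/* Simplified view of the setup_header fields the bzImage branch reads. */
struct hdr_view {
    unsigned int setup_sects;
    unsigned long payload_offset;
    unsigned long payload_length;
};

/*
 * err: > 0 bzImage, 0 not a bzImage (callers bail out on err < 0 first).
 * Updates *offset and *len the way the corresponding branch would.
 */
static enum kernel_format classify_kernel(int err, bool is_elf,
                                          const struct hdr_view *hdr,
                                          unsigned long headroom,
                                          unsigned long *offset,
                                          unsigned long *len)
{
    if ( err > 0 )
    {
        /* Skip the boot sector plus setup_sects 512-byte setup sectors. */
        *offset += (hdr->setup_sects + 1) * 512UL + hdr->payload_offset;
        *len = hdr->payload_length;
        return KFMT_BZIMAGE;
    }

    if ( is_elf )
        return KFMT_ELF;            /* plain ELF: nothing to decompress */

    *len -= headroom;               /* compressed ELF: drop the headroom */
    return KFMT_COMPRESSED_ELF;
}
```

Written as an if / early return / fall-through chain, every return sits at the same indentation, which is the shape Jan's else-if suggestion produces.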
Later changes:
> - output_len = output_length(*image_start, orig_image_len);
You drop output_len...
> -
> - if ( (err = perform_gunzip(image_base, *image_start, orig_image_len)) > 0 )
> - err = decompress(*image_start, orig_image_len, image_base);
> + if ( (err = perform_gunzip(image_base, *image_start, *image_len)) > 0 )
> + err = decompress(*image_start, *image_len, image_base);
>
> if ( !err )
> {
> + printk(XENLOG_ERR "%s(%d): decompression failed, resetting image start and len\n",
> + __func__, err);
... but this is actually the success case...
> *image_start = image_base;
> - *image_len = output_len;
> + *image_len = module_len;
... it should be writing output_len.
> }
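A hedged sketch of the success path as described here: capture the decompressed length from the compressed image before decompressing (the output is written at image_base and may overwrite the input), and write that length back only when decompression succeeded. It assumes a gzip payload, whose last four bytes are the little-endian ISIZE field (uncompressed size modulo 2^32, per RFC 1952); the helper names are hypothetical and the decompressor itself is omitted:

```c
#include <assert.h>

/*
 * Uncompressed size from the gzip ISIZE trailer (little-endian, mod 2^32).
 * The caller must guarantee image_len >= 4.
 */
static unsigned long gzip_output_length(const unsigned char *image,
                                        unsigned long image_len)
{
    const unsigned char *p = image + image_len - 4;

    return (unsigned long)p[0] | ((unsigned long)p[1] << 8) |
           ((unsigned long)p[2] << 16) | ((unsigned long)p[3] << 24);
}

/*
 * Apply the result of a decompression attempt (0 == success). On success
 * the image now lives at image_base and is output_len bytes long; the
 * original module length must not be written back here.
 */
static void finish_decompress(int err, void *image_base, void **image_start,
                              unsigned long *image_len,
                              unsigned long output_len)
{
    if ( !err )
    {
        *image_start = image_base;
        *image_len = output_len;
    }
}
```

On failure, start and length are left untouched, so no "resetting" message belongs in the `!err` branch.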
Again, please just use the minimal change from the other email. It is
what I've been using, so it is tested and working. It is your original
conversion with the one-line fixup, and much easier to review.
Regards,
Jason
Thread overview: 56+ messages
2025-04-19 22:07 [RFC 00/38] Hyperlaunch domain builder Daniel P. Smith
2025-04-19 22:07 ` [RFC 01/38] maintainers: add new section for hyperlaunch Daniel P. Smith
2025-04-19 22:07 ` [RFC 02/38] x86/hyperlaunch: correct the naming of domain ramdisk field Daniel P. Smith
2025-04-19 22:07 ` [RFC 03/38] x86/hyperlaunch: convert max vcpu determination to domain builder Daniel P. Smith
2025-04-22 20:36 ` Jason Andryuk
2025-04-25 15:05 ` Alejandro Vallejo
2025-04-25 22:03 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 04/38] x86/hyperlaunch: convert vcpu0 creation " Daniel P. Smith
2025-04-25 15:22 ` Alejandro Vallejo
2025-04-25 22:04 ` Daniel P. Smith
2025-04-28 10:33 ` Alejandro Vallejo
2025-04-28 11:01 ` Jan Beulich
2025-04-19 22:07 ` [RFC 05/38] x86/hyperlaunch: move dom0 cpuid policy behind capability check Daniel P. Smith
2025-04-19 22:07 ` [RFC 06/38] x86/hyperlaunch: add hardware domain capability support Daniel P. Smith
2025-04-19 22:07 ` [RFC 07/38] x86/hyperlaunch: introduce pvh domain builder Daniel P. Smith
2025-04-19 22:07 ` [RFC 08/38] x86/hyperlaunch: move initial hwdom setup to dom_construct_pvh Daniel P. Smith
2025-04-19 22:07 ` [RFC 09/38] x86/boot: convert dom0 page calculation to use boot domain Daniel P. Smith
2025-04-19 22:07 ` [RFC 10/38] x86/boot: refactor dom0 page calculation Daniel P. Smith
2025-04-19 22:07 ` [RFC 11/38] x86/boot: generalize paging pages calculation Daniel P. Smith
2025-04-19 22:07 ` [RFC 12/38] x86/boot: generalize compute number of domain pages Daniel P. Smith
2025-04-19 22:07 ` [RFC 13/38] x86/hyperlaunch: move page computation to domain builder Daniel P. Smith
2025-04-19 22:07 ` [RFC 14/38] x86/hyperlaunch: move pvh p2m init " Daniel P. Smith
2025-04-19 22:07 ` [RFC 15/38] x86/hyperlaunch: move iommu " Daniel P. Smith
2025-04-19 22:07 ` [RFC 16/38] x86/boot: move and rename sched_setup_dom0_vcpus Daniel P. Smith
2025-04-20 9:36 ` Jürgen Groß
2025-04-22 12:38 ` Daniel P. Smith
2025-04-25 22:05 ` Daniel P. Smith
2025-04-19 22:07 ` [RFC 17/38] x86/hyperlaunch: move pvh_setup_cpus to domain builder Daniel P. Smith
2025-04-19 22:08 ` [RFC 18/38] x86/boot: rename pvh acpi setup function Daniel P. Smith
2025-04-19 22:08 ` [RFC 19/38] x86/hyperlaunch: add domu memory map construction Daniel P. Smith
2025-04-19 22:08 ` [RFC 20/38] x86/hyperlaunch: move populating p2m under domain builder Daniel P. Smith
2025-04-19 22:08 ` [RFC 21/38] x86/hyperlaunch: move remaining pvh dom0 construction Daniel P. Smith
2025-04-19 22:08 ` [RFC 22/38] x86/hyperlaunch: relocate pvh_steal_ram to domain builder Daniel P. Smith
2025-04-19 22:08 ` [RFC 23/38] x86/hyperlaunch: add domu acpi construction Daniel P. Smith
2025-04-19 22:08 ` [RFC 24/38] x86/boot: export command line processing Daniel P. Smith
2025-04-19 22:08 ` [RFC 25/38] x86/hyperlaunch: convert create_dom0 to arch_create_dom Daniel P. Smith
2025-04-19 22:08 ` [RFC 26/38] x86/hyperlaunch: remove dom0-isms from arch_create_dom Daniel P. Smith
2025-04-19 22:08 ` [RFC 27/38] x86/hyperlaunch: introduce domain builder general dom creation Daniel P. Smith
2025-04-19 22:08 ` [RFC 28/38] x86/hyperlaunch: add xenstore boot capabilities flag Daniel P. Smith
2025-04-19 22:08 ` [RFC 29/38] x86/hyperlaunch: allocate console for domu Daniel P. Smith
2025-04-19 22:08 ` [RFC 30/38] x86/hyperlaunch: allocate xenstore " Daniel P. Smith
2025-04-19 22:08 ` [RFC 31/38] x86/hyperlaunch: move boot module discard to domain builder Daniel P. Smith
2025-04-19 22:08 ` [RFC 32/38] x86/hyperlaunch: introduce concept of core domains Daniel P. Smith
2025-04-23 19:50 ` Jason Andryuk
2025-04-25 22:06 ` Daniel P. Smith
2025-04-19 22:08 ` [RFC 33/38] x86/boot: refactor bzimage parser to be re-enterant Daniel P. Smith
2025-04-23 19:27 ` Jason Andryuk
2025-04-25 22:16 ` Daniel P. Smith
2025-04-26 1:53 ` Daniel P. Smith
2025-04-28 6:41 ` Jan Beulich
2025-04-28 14:16 ` Jason Andryuk
2025-04-19 22:08 ` [RFC 34/38] x86/hyperlaunch: introduce multidomain kconfig option Daniel P. Smith
2025-04-19 22:08 ` [RFC 35/38] x86/hyperlaunch: add multidomain construction logic Daniel P. Smith
2025-04-19 22:08 ` [RFC 36/38] x86/hyperlaunch: enable unpausing mulitple domains Daniel P. Smith
2025-04-19 22:08 ` [RFC 37/38] x86/hyperlaunch: generalize domid assignment Daniel P. Smith
2025-04-19 22:08 ` [RFC 38/38] tools: introduce hyperlaunch domain late init Daniel P. Smith