* [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 9:05 ` Roger Pau Monné
2026-05-04 14:34 ` Jan Beulich
2026-03-13 16:35 ` [PATCH 06/17] hvmloader: Move pci devices setup to a separate function Thierry Escande
` (17 subsequent siblings)
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD
In order to factorize the common parts between i440 and q35 dsdt files,
this patch splits dsdt.asl and put the i440 specific parts into
dsdt_i400.asl.
This also makes use of #include directives instead of file
concatenations to build the asl files.
Also, the anycpu asl files generation makes use of makefile pattern
rules to avoid duplication for i440 and q35.
Becuase the LPC controller BDF (which differs between i440 and q35) must
be set at device declaration, it is still set in dsdt.asl by checking
for a MACHINE_TYPE_I440 macro defined in dsdt_i400.asl.
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/Makefile | 2 +-
tools/firmware/hvmloader/ovmf.c | 4 ++--
tools/firmware/hvmloader/seabios.c | 4 ++--
tools/firmware/hvmloader/util.c | 4 ++--
tools/firmware/hvmloader/util.h | 4 ++--
tools/libacpi/Makefile | 10 ++++-----
tools/libacpi/dsdt.asl | 25 ++++-----------------
tools/libacpi/dsdt_i440.asl | 36 ++++++++++++++++++++++++++++++
8 files changed, 53 insertions(+), 36 deletions(-)
create mode 100644 tools/libacpi/dsdt_i440.asl
diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index 21de72187d..bdc33a877f 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -78,7 +78,7 @@ rombios.o: roms.inc
smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
ACPI_PATH = ../../libacpi
-DSDT_FILES += dsdt_anycpu_qemu_xen.c
+DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c
ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
$(ACPI_OBJS): CFLAGS += -iquote . -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
CFLAGS += -I$(ACPI_PATH)
diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
index 23610a0717..d264a50c73 100644
--- a/tools/firmware/hvmloader/ovmf.c
+++ b/tools/firmware/hvmloader/ovmf.c
@@ -119,8 +119,8 @@ static void ovmf_load(const struct bios_config *config,
static void ovmf_acpi_build_tables(void)
{
struct acpi_config config = {
- .dsdt_anycpu = dsdt_anycpu_qemu_xen,
- .dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len,
+ .dsdt_anycpu = dsdt_i440_anycpu_qemu_xen,
+ .dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len,
.dsdt_15cpu = NULL,
.dsdt_15cpu_len = 0
};
diff --git a/tools/firmware/hvmloader/seabios.c b/tools/firmware/hvmloader/seabios.c
index 444d118ddb..74b0406b5a 100644
--- a/tools/firmware/hvmloader/seabios.c
+++ b/tools/firmware/hvmloader/seabios.c
@@ -90,8 +90,8 @@ static void seabios_acpi_build_tables(void)
{
uint32_t rsdp = (uint32_t)scratch_alloc(sizeof(struct acpi_20_rsdp), 0);
struct acpi_config config = {
- .dsdt_anycpu = dsdt_anycpu_qemu_xen,
- .dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len,
+ .dsdt_anycpu = dsdt_i440_anycpu_qemu_xen,
+ .dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len,
.dsdt_15cpu = NULL,
.dsdt_15cpu_len = 0,
};
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index e651342681..f1ed1eb48d 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -843,8 +843,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
s = xenstore_read("platform/device-model", "");
if ( !strncmp(s, "qemu_xen", 9) )
{
- config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
- config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
+ config->dsdt_anycpu = dsdt_i440_anycpu_qemu_xen;
+ config->dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len;
config->dsdt_15cpu = NULL;
config->dsdt_15cpu_len = 0;
}
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 765a013ddd..3c5eeff5e7 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -381,8 +381,8 @@ extern struct e820map memory_map;
bool check_overlap(uint64_t start, uint64_t size,
uint64_t reserved_start, uint64_t reserved_size);
-extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
-extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+extern const unsigned char dsdt_i440_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern const int dsdt_i440_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
unsigned long acpi_pages_allocated(void);
diff --git a/tools/libacpi/Makefile b/tools/libacpi/Makefile
index b21a64c6b4..d3d4bc9543 100644
--- a/tools/libacpi/Makefile
+++ b/tools/libacpi/Makefile
@@ -11,7 +11,7 @@ endif
MK_DSDT = $(ACPI_BUILD_DIR)/mk_dsdt
-C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c dsdt_pvh.c
+C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_pvh.c
C_SRC-$(CONFIG_ARM_64) = dsdt_anycpu_arm.c
DSDT_FILES ?= $(C_SRC-y)
C_SRC = $(addprefix $(ACPI_BUILD_DIR)/, $(DSDT_FILES))
@@ -39,18 +39,16 @@ $(H_SRC): $(ACPI_BUILD_DIR)/%.h: %.asl
$(MK_DSDT): mk_dsdt.c
$(HOSTCC) $(HOSTCFLAGS) $(MKDSDT_CFLAGS-y) $(CFLAGS_xeninclude) -D__XEN_TOOLS__ -o $@ mk_dsdt.c
-$(ACPI_BUILD_DIR)/dsdt_anycpu_qemu_xen.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
+$(ACPI_BUILD_DIR)/dsdt_%_anycpu_qemu_xen.asl: dsdt_%.asl dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
# Remove last bracket
awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
- cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
$(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
mv -f $@.$(TMP_SUFFIX) $@
# NB. awk invocation is a portable alternative to 'head -n -1'
-$(ACPI_BUILD_DIR)/dsdt_%cpu.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
+$(ACPI_BUILD_DIR)/dsdt_%cpu.asl: dsdt_i440.asl dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
# Remove last bracket
awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
- cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
$(MK_DSDT) --debug=$(debug) --maxcpu $* >> $@.$(TMP_SUFFIX)
mv -f $@.$(TMP_SUFFIX) $@
@@ -65,7 +63,7 @@ $(ACPI_BUILD_DIR)/dsdt_anycpu_arm.asl: $(MK_DSDT)
mv -f $@.$(TMP_SUFFIX) $@
$(C_SRC): $(ACPI_BUILD_DIR)/%.c: $(ACPI_BUILD_DIR)/%.asl
- $(IASL) -vs -p $(ACPI_BUILD_DIR)/$*.$(TMP_SUFFIX) -tc $<
+ $(IASL) -vs -I $(CURDIR) -p $(ACPI_BUILD_DIR)/$*.$(TMP_SUFFIX) -tc $<
sed -e 's/AmlCode/$*/g' -e 's/_aml_code//g' $(ACPI_BUILD_DIR)/$*.hex > $@.$(TMP_SUFFIX)
echo "int $*_len=sizeof($*);" >> $@.$(TMP_SUFFIX)
mv -f $@.$(TMP_SUFFIX) $@
diff --git a/tools/libacpi/dsdt.asl b/tools/libacpi/dsdt.asl
index 32b42f85ae..130826fdcc 100644
--- a/tools/libacpi/dsdt.asl
+++ b/tools/libacpi/dsdt.asl
@@ -5,8 +5,6 @@
* Copyright (c) 2004, Intel Corporation.
*/
-DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
-{
Name (\PMBS, 0x0C00)
Name (\PMLN, 0x08)
Name (\IOB1, 0x00)
@@ -199,7 +197,10 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
Device (ISA)
{
- Name (_ADR, 0x00010000) /* device 1, fn 0 */
+ /* Error will be raised if the machine type is not defined */
+ #ifdef MACHINE_TYPE_I440
+ Name (_ADR, 0x00010000) /* device 1, fn 0 */
+ #endif
OperationRegion(PIRQ, PCI_Config, 0x60, 0x4)
Scope(\) {
@@ -329,23 +330,6 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
})
}
- Device (FDC0)
- {
- Name (_HID, EisaId ("PNP0700"))
- Method (_STA, 0, NotSerialized)
- {
- Return (0x0F)
- }
-
- Name (_CRS, ResourceTemplate ()
- {
- IO (Decode16, 0x03F0, 0x03F0, 0x01, 0x06)
- IO (Decode16, 0x03F7, 0x03F7, 0x01, 0x01)
- IRQNoFlags () {6}
- DMA (Compatibility, NotBusMaster, Transfer8) {2}
- })
- }
-
Device (UAR1)
{
Name (_HID, EisaId ("PNP0501"))
@@ -444,4 +428,3 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
Method(_PIC, 1) {
Store(Arg0, PICD)
}
-}
diff --git a/tools/libacpi/dsdt_i440.asl b/tools/libacpi/dsdt_i440.asl
new file mode 100644
index 0000000000..e80c454ad9
--- /dev/null
+++ b/tools/libacpi/dsdt_i440.asl
@@ -0,0 +1,36 @@
+/* SPDX-License-Identifier: LGPL-2.1-only */
+/******************************************************************************
+ * DSDT for Xen with Qemu device model (for i440 machine)
+ *
+ * Copyright (c) 2004, Intel Corporation.
+ */
+
+DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
+{
+ #define MACHINE_TYPE_I440
+
+ #include "dsdt.asl"
+
+ Scope (\_SB.PCI0.ISA)
+ {
+ Device (FDC0)
+ {
+ Name (_HID, EisaId ("PNP0700"))
+
+ Method (_STA, 0, NotSerialized)
+ {
+ Return (0x0F)
+ }
+
+ Name (_CRS, ResourceTemplate ()
+ {
+ IO (Decode16, 0x03F0, 0x03F0, 0x01, 0x06)
+ IO (Decode16, 0x03F7, 0x03F7, 0x01, 0x01)
+ IRQNoFlags () {6}
+ DMA (Compatibility, NotBusMaster, Transfer8) {2}
+ })
+ }
+ }
+
+ #include "dsdt_acpi_info.asl"
+}
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts
2026-03-13 16:35 ` [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts Thierry Escande
@ 2026-04-28 9:05 ` Roger Pau Monné
2026-05-04 14:34 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 9:05 UTC (permalink / raw)
To: Thierry Escande; +Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
> In order to factorize the common parts between i440 and q35 dsdt files,
> this patch splits dsdt.asl and put the i440 specific parts into
> dsdt_i400.asl.
>
> This also makes use of #include directives instead of file
> concatenations to build the asl files.
>
> Also, the anycpu asl files generation makes use of makefile pattern
> rules to avoid duplication for i440 and q35.
>
> Becuase the LPC controller BDF (which differs between i440 and q35) must
s/Becuase/Because/
> be set at device declaration, it is still set in dsdt.asl by checking
> for a MACHINE_TYPE_I440 macro defined in dsdt_i400.asl.
s/i400/i440/
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/Makefile | 2 +-
> tools/firmware/hvmloader/ovmf.c | 4 ++--
> tools/firmware/hvmloader/seabios.c | 4 ++--
> tools/firmware/hvmloader/util.c | 4 ++--
> tools/firmware/hvmloader/util.h | 4 ++--
> tools/libacpi/Makefile | 10 ++++-----
> tools/libacpi/dsdt.asl | 25 ++++-----------------
> tools/libacpi/dsdt_i440.asl | 36 ++++++++++++++++++++++++++++++
> 8 files changed, 53 insertions(+), 36 deletions(-)
> create mode 100644 tools/libacpi/dsdt_i440.asl
>
> diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
> index 21de72187d..bdc33a877f 100644
> --- a/tools/firmware/hvmloader/Makefile
> +++ b/tools/firmware/hvmloader/Makefile
> @@ -78,7 +78,7 @@ rombios.o: roms.inc
> smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>
> ACPI_PATH = ../../libacpi
> -DSDT_FILES += dsdt_anycpu_qemu_xen.c
> +DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c
> ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
> $(ACPI_OBJS): CFLAGS += -iquote . -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
> CFLAGS += -I$(ACPI_PATH)
> diff --git a/tools/firmware/hvmloader/ovmf.c b/tools/firmware/hvmloader/ovmf.c
> index 23610a0717..d264a50c73 100644
> --- a/tools/firmware/hvmloader/ovmf.c
> +++ b/tools/firmware/hvmloader/ovmf.c
> @@ -119,8 +119,8 @@ static void ovmf_load(const struct bios_config *config,
> static void ovmf_acpi_build_tables(void)
> {
> struct acpi_config config = {
> - .dsdt_anycpu = dsdt_anycpu_qemu_xen,
> - .dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len,
> + .dsdt_anycpu = dsdt_i440_anycpu_qemu_xen,
> + .dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len,
> .dsdt_15cpu = NULL,
> .dsdt_15cpu_len = 0
> };
> diff --git a/tools/firmware/hvmloader/seabios.c b/tools/firmware/hvmloader/seabios.c
> index 444d118ddb..74b0406b5a 100644
> --- a/tools/firmware/hvmloader/seabios.c
> +++ b/tools/firmware/hvmloader/seabios.c
> @@ -90,8 +90,8 @@ static void seabios_acpi_build_tables(void)
> {
> uint32_t rsdp = (uint32_t)scratch_alloc(sizeof(struct acpi_20_rsdp), 0);
> struct acpi_config config = {
> - .dsdt_anycpu = dsdt_anycpu_qemu_xen,
> - .dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len,
> + .dsdt_anycpu = dsdt_i440_anycpu_qemu_xen,
> + .dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len,
> .dsdt_15cpu = NULL,
> .dsdt_15cpu_len = 0,
> };
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index e651342681..f1ed1eb48d 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -843,8 +843,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
> s = xenstore_read("platform/device-model", "");
> if ( !strncmp(s, "qemu_xen", 9) )
> {
> - config->dsdt_anycpu = dsdt_anycpu_qemu_xen;
> - config->dsdt_anycpu_len = dsdt_anycpu_qemu_xen_len;
> + config->dsdt_anycpu = dsdt_i440_anycpu_qemu_xen;
> + config->dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len;
> config->dsdt_15cpu = NULL;
> config->dsdt_15cpu_len = 0;
> }
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index 765a013ddd..3c5eeff5e7 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -381,8 +381,8 @@ extern struct e820map memory_map;
> bool check_overlap(uint64_t start, uint64_t size,
> uint64_t reserved_start, uint64_t reserved_size);
>
> -extern const unsigned char dsdt_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
> -extern const int dsdt_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
> +extern const unsigned char dsdt_i440_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
> +extern const int dsdt_i440_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
While there you can make this unsigned int, like the dsdt_anycpu_len
field.
>
> unsigned long acpi_pages_allocated(void);
>
> diff --git a/tools/libacpi/Makefile b/tools/libacpi/Makefile
> index b21a64c6b4..d3d4bc9543 100644
> --- a/tools/libacpi/Makefile
> +++ b/tools/libacpi/Makefile
> @@ -11,7 +11,7 @@ endif
>
> MK_DSDT = $(ACPI_BUILD_DIR)/mk_dsdt
>
> -C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_anycpu_qemu_xen.c dsdt_pvh.c
> +C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_pvh.c
> C_SRC-$(CONFIG_ARM_64) = dsdt_anycpu_arm.c
> DSDT_FILES ?= $(C_SRC-y)
> C_SRC = $(addprefix $(ACPI_BUILD_DIR)/, $(DSDT_FILES))
> @@ -39,18 +39,16 @@ $(H_SRC): $(ACPI_BUILD_DIR)/%.h: %.asl
> $(MK_DSDT): mk_dsdt.c
> $(HOSTCC) $(HOSTCFLAGS) $(MKDSDT_CFLAGS-y) $(CFLAGS_xeninclude) -D__XEN_TOOLS__ -o $@ mk_dsdt.c
>
> -$(ACPI_BUILD_DIR)/dsdt_anycpu_qemu_xen.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
> +$(ACPI_BUILD_DIR)/dsdt_%_anycpu_qemu_xen.asl: dsdt_%.asl dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
> # Remove last bracket
> awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
> - cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
> $(MK_DSDT) --debug=$(debug) --dm-version qemu-xen >> $@.$(TMP_SUFFIX)
> mv -f $@.$(TMP_SUFFIX) $@
>
> # NB. awk invocation is a portable alternative to 'head -n -1'
> -$(ACPI_BUILD_DIR)/dsdt_%cpu.asl: dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
> +$(ACPI_BUILD_DIR)/dsdt_%cpu.asl: dsdt_i440.asl dsdt.asl dsdt_acpi_info.asl $(MK_DSDT)
> # Remove last bracket
> awk 'NR > 1 {print s} {s=$$0}' $< > $@.$(TMP_SUFFIX)
> - cat dsdt_acpi_info.asl >> $@.$(TMP_SUFFIX)
> $(MK_DSDT) --debug=$(debug) --maxcpu $* >> $@.$(TMP_SUFFIX)
> mv -f $@.$(TMP_SUFFIX) $@
>
> @@ -65,7 +63,7 @@ $(ACPI_BUILD_DIR)/dsdt_anycpu_arm.asl: $(MK_DSDT)
> mv -f $@.$(TMP_SUFFIX) $@
>
> $(C_SRC): $(ACPI_BUILD_DIR)/%.c: $(ACPI_BUILD_DIR)/%.asl
> - $(IASL) -vs -p $(ACPI_BUILD_DIR)/$*.$(TMP_SUFFIX) -tc $<
> + $(IASL) -vs -I $(CURDIR) -p $(ACPI_BUILD_DIR)/$*.$(TMP_SUFFIX) -tc $<
> sed -e 's/AmlCode/$*/g' -e 's/_aml_code//g' $(ACPI_BUILD_DIR)/$*.hex > $@.$(TMP_SUFFIX)
> echo "int $*_len=sizeof($*);" >> $@.$(TMP_SUFFIX)
> mv -f $@.$(TMP_SUFFIX) $@
> diff --git a/tools/libacpi/dsdt.asl b/tools/libacpi/dsdt.asl
> index 32b42f85ae..130826fdcc 100644
> --- a/tools/libacpi/dsdt.asl
> +++ b/tools/libacpi/dsdt.asl
> @@ -5,8 +5,6 @@
> * Copyright (c) 2004, Intel Corporation.
> */
>
> -DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
> -{
> Name (\PMBS, 0x0C00)
> Name (\PMLN, 0x08)
> Name (\IOB1, 0x00)
> @@ -199,7 +197,10 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
>
> Device (ISA)
> {
> - Name (_ADR, 0x00010000) /* device 1, fn 0 */
> + /* Error will be raised if the machine type is not defined */
> + #ifdef MACHINE_TYPE_I440
> + Name (_ADR, 0x00010000) /* device 1, fn 0 */
> + #endif
I think we want the preprocessor directives not indented the same as
asl code, like we do in C, iow:
Device (ISA)
{
/* Error will be raised if the machine type is not defined */
#ifdef MACHINE_TYPE_I440
Name (_ADR, 0x00010000) /* device 1, fn 0 */
#endif
Same with the includes and defines below.
>
> OperationRegion(PIRQ, PCI_Config, 0x60, 0x4)
> Scope(\) {
> @@ -329,23 +330,6 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
> })
> }
>
> - Device (FDC0)
> - {
> - Name (_HID, EisaId ("PNP0700"))
> - Method (_STA, 0, NotSerialized)
> - {
> - Return (0x0F)
> - }
> -
> - Name (_CRS, ResourceTemplate ()
> - {
> - IO (Decode16, 0x03F0, 0x03F0, 0x01, 0x06)
> - IO (Decode16, 0x03F7, 0x03F7, 0x01, 0x01)
> - IRQNoFlags () {6}
> - DMA (Compatibility, NotBusMaster, Transfer8) {2}
> - })
> - }
I need to look at the Q35 changes, but ff the only differences are the
Name() method at the top and this, we might as well use an #ifdef
here, and there's no need to split into so many separate files? I
need to look at further patches.
The ACPI table generation stuff could certainly do with an overall
rework.
> Device (UAR1)
> {
> Name (_HID, EisaId ("PNP0501"))
> @@ -444,4 +428,3 @@ DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
> Method(_PIC, 1) {
> Store(Arg0, PICD)
> }
> -}
> diff --git a/tools/libacpi/dsdt_i440.asl b/tools/libacpi/dsdt_i440.asl
> new file mode 100644
> index 0000000000..e80c454ad9
> --- /dev/null
> +++ b/tools/libacpi/dsdt_i440.asl
> @@ -0,0 +1,36 @@
> +/* SPDX-License-Identifier: LGPL-2.1-only */
> +/******************************************************************************
> + * DSDT for Xen with Qemu device model (for i440 machine)
> + *
> + * Copyright (c) 2004, Intel Corporation.
> + */
> +
> +DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
> +{
> + #define MACHINE_TYPE_I440
IMO we probably want to define this outside of the DefinitionBlock
section, just after the copyright header.
> +
> + #include "dsdt.asl"
> +
> + Scope (\_SB.PCI0.ISA)
> + {
> + Device (FDC0)
> + {
> + Name (_HID, EisaId ("PNP0700"))
> +
> + Method (_STA, 0, NotSerialized)
> + {
> + Return (0x0F)
> + }
> +
> + Name (_CRS, ResourceTemplate ()
> + {
> + IO (Decode16, 0x03F0, 0x03F0, 0x01, 0x06)
> + IO (Decode16, 0x03F7, 0x03F7, 0x01, 0x01)
> + IRQNoFlags () {6}
> + DMA (Compatibility, NotBusMaster, Transfer8) {2}
> + })
> + }
> + }
> +
> + #include "dsdt_acpi_info.asl"
> +}
> --
> 2.51.0
>
>
>
> --
> Thierry Escande | Vates XCP-ng Developer
>
> XCP-ng & Xen Orchestra - Vates solutions
>
> web: https://vates.tech
>
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts
2026-03-13 16:35 ` [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts Thierry Escande
2026-04-28 9:05 ` Roger Pau Monné
@ 2026-05-04 14:34 ` Jan Beulich
2026-05-04 14:35 ` Jan Beulich
1 sibling, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 14:34 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> --- a/tools/firmware/hvmloader/Makefile
> +++ b/tools/firmware/hvmloader/Makefile
> @@ -78,7 +78,7 @@ rombios.o: roms.inc
> smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>
> ACPI_PATH = ../../libacpi
> -DSDT_FILES += dsdt_anycpu_qemu_xen.c
> +DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c
Instead of merely adding the i440 infix, could we perhaps replace the anycpu
one (which doesn't serve any purpose here anymore, afaics)?
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts
2026-05-04 14:34 ` Jan Beulich
@ 2026-05-04 14:35 ` Jan Beulich
0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 14:35 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 04.05.2026 16:34, Jan Beulich wrote:
> On 13.03.2026 17:35, Thierry Escande wrote:
>> --- a/tools/firmware/hvmloader/Makefile
>> +++ b/tools/firmware/hvmloader/Makefile
>> @@ -78,7 +78,7 @@ rombios.o: roms.inc
>> smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>>
>> ACPI_PATH = ../../libacpi
>> -DSDT_FILES += dsdt_anycpu_qemu_xen.c
>> +DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c
>
> Instead of merely adding the i440 infix, could we perhaps replace the anycpu
> one (which doesn't serve any purpose here anymore, afaics)?
Thinking about it, qemu and xen perhaps fall into the same category, and could
perhaps also be dropped.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 06/17] hvmloader: Move pci devices setup to a separate function
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
2026-03-13 16:35 ` [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 12:48 ` Roger Pau Monné
2026-05-04 14:52 ` Jan Beulich
2026-03-13 16:35 ` [PATCH 02/17] libacpi: new DSDT ACPI table for Q35 Thierry Escande
` (16 subsequent siblings)
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD
For readability and code simplification, this patch moves PCI-device
specific initializations out of the pci_setup() function to a new
function class_specific_pci_device_setup().
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/pci.c | 117 +++++++++++++++-------------
tools/firmware/hvmloader/pci_regs.h | 4 +
2 files changed, 68 insertions(+), 53 deletions(-)
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index c41c8d946a..a76d051bdf 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -84,12 +84,71 @@ static int find_next_rmrr(uint32_t base)
return next_rmrr;
}
+static void class_specific_pci_device_setup(uint16_t vendor_id,
+ uint16_t device_id,
+ uint16_t class,
+ uint8_t bus,
+ uint8_t devfn, uint8_t *vga_devfn)
+{
+ switch ( class )
+ {
+ case PCI_CLASS_DISPLAY_VGA:
+ /* If emulated VGA is found, preserve it as primary VGA. */
+ if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
+ {
+ *vga_devfn = devfn;
+ virtual_vga = VGA_std;
+ }
+ else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
+ {
+ *vga_devfn = devfn;
+ virtual_vga = VGA_cirrus;
+ }
+ else if ( virtual_vga == VGA_none )
+ {
+ *vga_devfn = devfn;
+ virtual_vga = VGA_pt;
+ if ( vendor_id == 0x8086 )
+ {
+ igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
+ /*
+ * Write the the OpRegion offset to give the opregion
+ * address to the device model. The device model will trap
+ * and map the OpRegion at the give address.
+ */
+ pci_writel(*vga_devfn, PCI_INTEL_OPREGION,
+ igd_opregion_pgbase << PAGE_SHIFT);
+ }
+ }
+ break;
+ case PCI_CLASS_BRIDGE_OTHER:
+ /* PIIX4 ACPI PM. Special device with special PCI config space. */
+ ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
+ pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
+ pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
+ pci_writew(devfn, 0x22, 0x0000);
+ pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
+ pci_writew(devfn, 0x3d, 0x0001);
+ pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
+ pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
+ break;
+ case PCI_CLASS_STORAGE_IDE:
+ if ( vendor_id == 0x8086 )
+ {
+ /* Intel ICHs since PIIX3: enable IDE legacy mode. */
+ pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
+ pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
+ }
+ break;
+ }
+}
+
void pci_setup(void)
{
uint8_t is_64bar, using_64bar, bar64_relocate = 0;
uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
- uint32_t vga_devfn = 256;
+ uint8_t vga_devfn = 0xff;
uint16_t class, vendor_id, device_id;
unsigned int bar, pin, link, isa_irq;
uint8_t pci_devfn_decode_type[256] = {};
@@ -170,57 +229,9 @@ void pci_setup(void)
ASSERT((devfn != PCI_ISA_DEVFN) ||
((vendor_id == 0x8086) && (device_id == 0x7000)));
- switch ( class )
- {
- case 0x0300:
- /* If emulated VGA is found, preserve it as primary VGA. */
- if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
- {
- vga_devfn = devfn;
- virtual_vga = VGA_std;
- }
- else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
- {
- vga_devfn = devfn;
- virtual_vga = VGA_cirrus;
- }
- else if ( virtual_vga == VGA_none )
- {
- vga_devfn = devfn;
- virtual_vga = VGA_pt;
- if ( vendor_id == 0x8086 )
- {
- igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
- /*
- * Write the the OpRegion offset to give the opregion
- * address to the device model. The device model will trap
- * and map the OpRegion at the give address.
- */
- pci_writel(vga_devfn, PCI_INTEL_OPREGION,
- igd_opregion_pgbase << PAGE_SHIFT);
- }
- }
- break;
- case 0x0680:
- /* PIIX4 ACPI PM. Special device with special PCI config space. */
- ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
- pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
- pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
- pci_writew(devfn, 0x22, 0x0000);
- pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
- pci_writew(devfn, 0x3d, 0x0001);
- pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
- pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
- break;
- case 0x0101:
- if ( vendor_id == 0x8086 )
- {
- /* Intel ICHs since PIIX3: enable IDE legacy mode. */
- pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
- pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
- }
- break;
- }
+ class_specific_pci_device_setup(vendor_id, device_id, class,
+ 0 /* virt_bus support TBD */,
+ devfn, &vga_devfn);
/*
* It is recommended that BAR programming be done whilst decode
@@ -583,7 +594,7 @@ void pci_setup(void)
((pci_hi_mem_start & -pci_hi_mem_start) - 1)) + 1;
}
- if ( vga_devfn != 256 )
+ if ( vga_devfn != 0xff )
{
/*
* VGA registers live in I/O space so ensure that primary VGA
diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
index 4d4dc0cd01..c94278855b 100644
--- a/tools/firmware/hvmloader/pci_regs.h
+++ b/tools/firmware/hvmloader/pci_regs.h
@@ -111,6 +111,10 @@
#define PCI_DEVICE_ID_INTEL_82441 0x1237
#define PCI_DEVICE_ID_INTEL_Q35_MCH 0x29c0
+#define PCI_CLASS_STORAGE_IDE 0x0101
+#define PCI_CLASS_DISPLAY_VGA 0x0300
+#define PCI_CLASS_BRIDGE_OTHER 0x0680
+
#endif /* __HVMLOADER_PCI_REGS_H__ */
/*
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 06/17] hvmloader: Move pci devices setup to a separate function
2026-03-13 16:35 ` [PATCH 06/17] hvmloader: Move pci devices setup to a separate function Thierry Escande
@ 2026-04-28 12:48 ` Roger Pau Monné
2026-05-04 14:52 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 12:48 UTC (permalink / raw)
To: Thierry Escande; +Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Fri, Mar 13, 2026 at 04:35:02PM +0000, Thierry Escande wrote:
> For readability and code simplification, this patch moves PCI-device
> specific initializations out of the pci_setup() function to a new
> function class_specific_pci_device_setup().
AFAICT this is a non-functional change. Should likely be mentioned in
the commit message to avoid any doubts.
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/pci.c | 117 +++++++++++++++-------------
> tools/firmware/hvmloader/pci_regs.h | 4 +
> 2 files changed, 68 insertions(+), 53 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index c41c8d946a..a76d051bdf 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -84,12 +84,71 @@ static int find_next_rmrr(uint32_t base)
> return next_rmrr;
> }
>
> +static void class_specific_pci_device_setup(uint16_t vendor_id,
> + uint16_t device_id,
> + uint16_t class,
It's a bit weird to pass the class value into the function, the value
is only used inside the function itself, and hence could be fetched
inside the function as the device BDF is provided as parameters?
> + uint8_t bus,
> + uint8_t devfn, uint8_t *vga_devfn)
> +{
> + switch ( class )
> + {
> + case PCI_CLASS_DISPLAY_VGA:
> + /* If emulated VGA is found, preserve it as primary VGA. */
> + if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
> + {
> + *vga_devfn = devfn;
> + virtual_vga = VGA_std;
> + }
> + else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
Since you introduce defines for the device classes, could you also
introduce defines for the vendor and device IDs used here?
> + {
> + *vga_devfn = devfn;
> + virtual_vga = VGA_cirrus;
> + }
> + else if ( virtual_vga == VGA_none )
> + {
> + *vga_devfn = devfn;
> + virtual_vga = VGA_pt;
> + if ( vendor_id == 0x8086 )
This one is PCI_VENDOR_ID_INTEL, also a couple of more instances below.
> + {
> + igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
> + /*
> + * Write the the OpRegion offset to give the opregion
> + * address to the device model. The device model will trap
> + * and map the OpRegion at the give address.
> + */
> + pci_writel(*vga_devfn, PCI_INTEL_OPREGION,
> + igd_opregion_pgbase << PAGE_SHIFT);
> + }
> + }
> + break;
Newlines after break statements.
> + case PCI_CLASS_BRIDGE_OTHER:
> + /* PIIX4 ACPI PM. Special device with special PCI config space. */
> + ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
> + pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
> + pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
> + pci_writew(devfn, 0x22, 0x0000);
> + pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
> + pci_writew(devfn, 0x3d, 0x0001);
> + pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
> + pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
> + break;
> + case PCI_CLASS_STORAGE_IDE:
> + if ( vendor_id == 0x8086 )
> + {
> + /* Intel ICHs since PIIX3: enable IDE legacy mode. */
> + pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
> + pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
> + }
> + break;
> + }
> +}
> +
> void pci_setup(void)
> {
> uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
> uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
> - uint32_t vga_devfn = 256;
> + uint8_t vga_devfn = 0xff;
> uint16_t class, vendor_id, device_id;
> unsigned int bar, pin, link, isa_irq;
> uint8_t pci_devfn_decode_type[256] = {};
> @@ -170,57 +229,9 @@ void pci_setup(void)
> ASSERT((devfn != PCI_ISA_DEVFN) ||
> ((vendor_id == 0x8086) && (device_id == 0x7000)));
>
> - switch ( class )
> - {
> - case 0x0300:
> - /* If emulated VGA is found, preserve it as primary VGA. */
> - if ( (vendor_id == 0x1234) && (device_id == 0x1111) )
> - {
> - vga_devfn = devfn;
> - virtual_vga = VGA_std;
> - }
> - else if ( (vendor_id == 0x1013) && (device_id == 0xb8) )
> - {
> - vga_devfn = devfn;
> - virtual_vga = VGA_cirrus;
> - }
> - else if ( virtual_vga == VGA_none )
> - {
> - vga_devfn = devfn;
> - virtual_vga = VGA_pt;
> - if ( vendor_id == 0x8086 )
> - {
> - igd_opregion_pgbase = mem_hole_alloc(IGD_OPREGION_PAGES);
> - /*
> - * Write the the OpRegion offset to give the opregion
> - * address to the device model. The device model will trap
> - * and map the OpRegion at the give address.
> - */
> - pci_writel(vga_devfn, PCI_INTEL_OPREGION,
> - igd_opregion_pgbase << PAGE_SHIFT);
> - }
> - }
> - break;
> - case 0x0680:
> - /* PIIX4 ACPI PM. Special device with special PCI config space. */
> - ASSERT((vendor_id == 0x8086) && (device_id == 0x7113));
> - pci_writew(devfn, 0x20, 0x0000); /* No smb bus IO enable */
> - pci_writew(devfn, 0xd2, 0x0000); /* No smb bus IO enable */
> - pci_writew(devfn, 0x22, 0x0000);
> - pci_writew(devfn, 0x3c, 0x0009); /* Hardcoded IRQ9 */
> - pci_writew(devfn, 0x3d, 0x0001);
> - pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
> - pci_writeb(devfn, 0x80, 0x01); /* enable PM io space */
> - break;
> - case 0x0101:
> - if ( vendor_id == 0x8086 )
> - {
> - /* Intel ICHs since PIIX3: enable IDE legacy mode. */
> - pci_writew(devfn, 0x40, 0x8000); /* enable IDE0 */
> - pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
> - }
> - break;
> - }
> + class_specific_pci_device_setup(vendor_id, device_id, class,
> + 0 /* virt_bus support TBD */,
> + devfn, &vga_devfn);
>
> /*
> * It is recommended that BAR programming be done whilst decode
> @@ -583,7 +594,7 @@ void pci_setup(void)
> ((pci_hi_mem_start & -pci_hi_mem_start) - 1)) + 1;
> }
>
> - if ( vga_devfn != 256 )
> + if ( vga_devfn != 0xff )
> {
> /*
> * VGA registers live in I/O space so ensure that primary VGA
> diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
> index 4d4dc0cd01..c94278855b 100644
> --- a/tools/firmware/hvmloader/pci_regs.h
> +++ b/tools/firmware/hvmloader/pci_regs.h
> @@ -111,6 +111,10 @@
> #define PCI_DEVICE_ID_INTEL_82441 0x1237
> #define PCI_DEVICE_ID_INTEL_Q35_MCH 0x29c0
>
> +#define PCI_CLASS_STORAGE_IDE 0x0101
> +#define PCI_CLASS_DISPLAY_VGA 0x0300
> +#define PCI_CLASS_BRIDGE_OTHER 0x0680
As mentioned in a previous patch, this would better be placed in a
pci_ids.h header.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 06/17] hvmloader: Move pci devices setup to a separate function
2026-03-13 16:35 ` [PATCH 06/17] hvmloader: Move pci devices setup to a separate function Thierry Escande
2026-04-28 12:48 ` Roger Pau Monné
@ 2026-05-04 14:52 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 14:52 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> void pci_setup(void)
> {
> uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
> uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
> - uint32_t vga_devfn = 256;
> + uint8_t vga_devfn = 0xff;
This change (and the related ones elsewhere) looks bogus to me: 0xff is
a valid devfn value, whereas 256 isn't.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 02/17] libacpi: new DSDT ACPI table for Q35
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
2026-03-13 16:35 ` [PATCH 01/17] libacpi: Split dsdt.asl file and extract i440 specific parts Thierry Escande
2026-03-13 16:35 ` [PATCH 06/17] hvmloader: Move pci devices setup to a separate function Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 10:17 ` Roger Pau Monné
2026-05-04 14:39 ` Jan Beulich
2026-03-13 16:35 ` [PATCH 05/17] hvmloader: add Q35 DSDT table loading Thierry Escande
` (15 subsequent siblings)
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Alexey Gerasimenko
This patch adds the DSDT table for Q35 (new tools/libacpi/dsdt_q35.asl
file). It only contains the specific Q35 parts that differ from i440).
At the moment, these are:
- BDF location of LPC Controller
- Minor changes related to FDC detection
- Addition of _OSC method to inform OSPM about PCIe features supported
As we are still using 4 PCI router links and their corresponding
device/register addresses are same (offset 0x60), no need to change PCI
routing descriptions.
Note that '15cpu' ACPI tables are only applicable to qemu-traditional
(which have no support for Q35), so we need to use 'anycpu' version only.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/Makefile | 2 +-
tools/libacpi/Makefile | 2 +-
tools/libacpi/dsdt.asl | 3 +
tools/libacpi/dsdt_q35.asl | 130 ++++++++++++++++++++++++++++++
4 files changed, 135 insertions(+), 2 deletions(-)
create mode 100644 tools/libacpi/dsdt_q35.asl
diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
index bdc33a877f..99f045efaa 100644
--- a/tools/firmware/hvmloader/Makefile
+++ b/tools/firmware/hvmloader/Makefile
@@ -78,7 +78,7 @@ rombios.o: roms.inc
smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
ACPI_PATH = ../../libacpi
-DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c
+DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c
ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
$(ACPI_OBJS): CFLAGS += -iquote . -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
CFLAGS += -I$(ACPI_PATH)
diff --git a/tools/libacpi/Makefile b/tools/libacpi/Makefile
index d3d4bc9543..e6c4a3fd8b 100644
--- a/tools/libacpi/Makefile
+++ b/tools/libacpi/Makefile
@@ -11,7 +11,7 @@ endif
MK_DSDT = $(ACPI_BUILD_DIR)/mk_dsdt
-C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_pvh.c
+C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c dsdt_pvh.c
C_SRC-$(CONFIG_ARM_64) = dsdt_anycpu_arm.c
DSDT_FILES ?= $(C_SRC-y)
C_SRC = $(addprefix $(ACPI_BUILD_DIR)/, $(DSDT_FILES))
diff --git a/tools/libacpi/dsdt.asl b/tools/libacpi/dsdt.asl
index 130826fdcc..dc764881c9 100644
--- a/tools/libacpi/dsdt.asl
+++ b/tools/libacpi/dsdt.asl
@@ -201,6 +201,9 @@
#ifdef MACHINE_TYPE_I440
Name (_ADR, 0x00010000) /* device 1, fn 0 */
#endif
+ #ifdef MACHINE_TYPE_Q35
+ Name (_ADR, 0x001f0000) /* device 31, fn 0 */
+ #endif
OperationRegion(PIRQ, PCI_Config, 0x60, 0x4)
Scope(\) {
diff --git a/tools/libacpi/dsdt_q35.asl b/tools/libacpi/dsdt_q35.asl
new file mode 100644
index 0000000000..7cefe63506
--- /dev/null
+++ b/tools/libacpi/dsdt_q35.asl
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: LGPL-2.1-only */
+/******************************************************************************
+ * DSDT for Xen with Qemu device model (for Q35 machine)
+ */
+
+DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
+{
+ #define MACHINE_TYPE_Q35
+
+ #include "dsdt.asl"
+
+ Scope (\_SB.PCI0)
+ {
+ /* _OSC, modified from ASL sample in ACPI spec */
+ Name (SUPP, 0) /* PCI _OSC Support Field value */
+ Name (CTRL, 0) /* PCI _OSC Control Field value */
+ Method (_OSC, 4) {
+ /* Create DWORD-addressable fields from the Capabilities Buffer */
+ CreateDWordField (Arg3, 0, CDW1)
+
+ /* Switch by UUID.
+ * Only PCI Host Bridge Device capabilities UUID used for now
+ */
+ If (LEqual (Arg0, ToUUID ("33DB4D5B-1FF7-401C-9657-7441C03DD766"))) {
+ /* Create DWORD-addressable fields from the Capabilities Buffer */
+ CreateDWordField (Arg3, 4, CDW2)
+ CreateDWordField (Arg3, 8, CDW3)
+
+ /* Save Capabilities DWORD2 & 3 */
+ Store (CDW2, SUPP)
+ Store (CDW3, CTRL)
+
+ /* Validate Revision DWORD */
+ If (LNotEqual (Arg1, One)) {
+ /* Unknown revision */
+ /* Support and Control DWORDs will be returned anyway */
+ Or (CDW1, 0x08, CDW1)
+ }
+
+ /* Control field bits are:
+ * bit 0 PCI Express Native Hot Plug control
+ * bit 1 SHPC Native Hot Plug control
+ * bit 2 PCI Express Native Power Management Events control
+ * bit 3 PCI Express Advanced Error Reporting control
+ * bit 4 PCI Express Capability Structure control
+ */
+
+ /* Always allow native PME, AER (no dependencies)
+ * Never allow SHPC (no SHPC controller in this system)
+ * Do not allow PCIe Capability Structure control for now
+ * Also, ACPI hotplug is used for now instead of PCIe
+ * Native Hot Plug
+ */
+ And (CTRL, 0x0C, CTRL)
+
+ If (LNotEqual (CDW3, CTRL)) {
+ /* Some of Capabilities bits were masked */
+ Or (CDW1, 0x10, CDW1)
+ }
+ /* Update DWORD3 in the buffer */
+ Store (CTRL, CDW3)
+ } Else {
+ Or (CDW1, 4, CDW1) /* Unrecognized UUID */
+ }
+ Return (Arg3)
+ }
+ /* end of _OSC */
+ }
+
+ /****************************************************************
+ * LPC ISA bridge
+ ****************************************************************/
+
+ Scope (\_SB.PCI0.ISA)
+ {
+ /*
+ LPC ISA bridge
+
+ PCI Interrupt Routing Register 2 (PIRQE..PIRQH) cannot be
+ used because of existing Xen IRQ limitations (4 PCI links
+ only)
+ */
+
+ /* LPC_I/O: I/O Decode Ranges Register */
+ OperationRegion (LPCD, PCI_Config, 0x80, 0x2)
+ Field (LPCD, AnyAcc, NoLock, Preserve) {
+ COMA, 3,
+ , 1,
+ COMB, 3,
+
+ Offset(0x01),
+ LPTD, 2,
+ , 2,
+ FDCD, 2
+ }
+
+ /* LPC_EN: LPC I/F Enables Register */
+ OperationRegion(LPCE, PCI_Config, 0x82, 0x2)
+ Field(LPCE, AnyAcc, NoLock, Preserve) {
+ CAEN, 1,
+ CBEN, 1,
+ LPEN, 1,
+ FDEN, 1
+ }
+
+ Device (FDC0)
+ {
+ Name (_HID, EisaId ("PNP0700"))
+ Method (_STA, 0, NotSerialized)
+ {
+ Store (FDEN, Local0)
+ If (LEqual (Local0, 0)) {
+ Return (0x00)
+ } Else {
+ Return (0x0F)
+ }
+ }
+
+ Name (_CRS, ResourceTemplate ()
+ {
+ IO (Decode16, 0x03F2, 0x03F2, 0x00, 0x04)
+ IO (Decode16, 0x03F7, 0x03F7, 0x00, 0x01)
+ IRQNoFlags () {6}
+ DMA (Compatibility, NotBusMaster, Transfer8) {2}
+ })
+ }
+ }
+
+ #include "dsdt_acpi_info.asl"
+}
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 02/17] libacpi: new DSDT ACPI table for Q35
2026-03-13 16:35 ` [PATCH 02/17] libacpi: new DSDT ACPI table for Q35 Thierry Escande
@ 2026-04-28 10:17 ` Roger Pau Monné
2026-05-04 14:39 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 10:17 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
> This patch adds the DSDT table for Q35 (new tools/libacpi/dsdt_q35.asl
> file). It only contains the specific Q35 parts that differ from i440).
> At the moment, these are:
>
> - BDF location of LPC Controller
> - Minor changes related to FDC detection
> - Addition of _OSC method to inform OSPM about PCIe features supported
>
> As we are still using 4 PCI router links and their corresponding
> device/register addresses are same (offset 0x60), no need to change PCI
> routing descriptions.
>
> Note that '15cpu' ACPI tables are only applicable to qemu-traditional
> (which have no support for Q35), so we need to use 'anycpu' version only.
Is the above statement fully accurate? It seems like 15cpu tables are
used with rombios, so the dependency is not on qemu-trad, but rather
rombios?
If it's truly only dependent on qemu-trad then we should remove those,
as we have removed qemu-trad.
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
If the first SoB if from Alexey, the From: should also match.
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/Makefile | 2 +-
> tools/libacpi/Makefile | 2 +-
> tools/libacpi/dsdt.asl | 3 +
> tools/libacpi/dsdt_q35.asl | 130 ++++++++++++++++++++++++++++++
> 4 files changed, 135 insertions(+), 2 deletions(-)
> create mode 100644 tools/libacpi/dsdt_q35.asl
>
> diff --git a/tools/firmware/hvmloader/Makefile b/tools/firmware/hvmloader/Makefile
> index bdc33a877f..99f045efaa 100644
> --- a/tools/firmware/hvmloader/Makefile
> +++ b/tools/firmware/hvmloader/Makefile
> @@ -78,7 +78,7 @@ rombios.o: roms.inc
> smbios.o: CFLAGS += -D__SMBIOS_DATE__="\"$(SMBIOS_REL_DATE)\""
>
> ACPI_PATH = ../../libacpi
> -DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c
> +DSDT_FILES += dsdt_i440_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c
> ACPI_OBJS = $(patsubst %.c,%.o,$(DSDT_FILES)) build.o static_tables.o
> $(ACPI_OBJS): CFLAGS += -iquote . -DLIBACPI_STDUTILS=\"$(CURDIR)/util.h\"
> CFLAGS += -I$(ACPI_PATH)
> diff --git a/tools/libacpi/Makefile b/tools/libacpi/Makefile
> index d3d4bc9543..e6c4a3fd8b 100644
> --- a/tools/libacpi/Makefile
> +++ b/tools/libacpi/Makefile
> @@ -11,7 +11,7 @@ endif
>
> MK_DSDT = $(ACPI_BUILD_DIR)/mk_dsdt
>
> -C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_pvh.c
> +C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c dsdt_pvh.c
> C_SRC-$(CONFIG_ARM_64) = dsdt_anycpu_arm.c
> DSDT_FILES ?= $(C_SRC-y)
> C_SRC = $(addprefix $(ACPI_BUILD_DIR)/, $(DSDT_FILES))
> diff --git a/tools/libacpi/dsdt.asl b/tools/libacpi/dsdt.asl
> index 130826fdcc..dc764881c9 100644
> --- a/tools/libacpi/dsdt.asl
> +++ b/tools/libacpi/dsdt.asl
> @@ -201,6 +201,9 @@
> #ifdef MACHINE_TYPE_I440
> Name (_ADR, 0x00010000) /* device 1, fn 0 */
> #endif
> + #ifdef MACHINE_TYPE_Q35
> + Name (_ADR, 0x001f0000) /* device 31, fn 0 */
> + #endif
You possibly want to do:
#ifdef ...
#elif defined(...)
#else
#error ...
#endif
But seeing the difference is only for the address, why not do:
#define ISA_DEV_SBDF 0x00010000
...
Name (_ADR, ISA_DEV_SBDF)
...
And avoid the ifdef mess?
>
> OperationRegion(PIRQ, PCI_Config, 0x60, 0x4)
> Scope(\) {
> diff --git a/tools/libacpi/dsdt_q35.asl b/tools/libacpi/dsdt_q35.asl
> new file mode 100644
> index 0000000000..7cefe63506
> --- /dev/null
> +++ b/tools/libacpi/dsdt_q35.asl
> @@ -0,0 +1,130 @@
> +/* SPDX-License-Identifier: LGPL-2.1-only */
> +/******************************************************************************
> + * DSDT for Xen with Qemu device model (for Q35 machine)
> + */
> +
> +DefinitionBlock ("DSDT.aml", "DSDT", 2, "Xen", "HVM", 0)
> +{
> + #define MACHINE_TYPE_Q35
> +
> + #include "dsdt.asl"
> +
> + Scope (\_SB.PCI0)
> + {
> + /* _OSC, modified from ASL sample in ACPI spec */
> + Name (SUPP, 0) /* PCI _OSC Support Field value */
> + Name (CTRL, 0) /* PCI _OSC Control Field value */
> + Method (_OSC, 4) {
> + /* Create DWORD-addressable fields from the Capabilities Buffer */
> + CreateDWordField (Arg3, 0, CDW1)
> +
> + /* Switch by UUID.
> + * Only PCI Host Bridge Device capabilities UUID used for now
Comment style, in Xen we use:
/*
* Switch by UIID.
* Only PCI Host Bridge Device capabilities UUID used for now.
*/
The opening and closing lines are standalone. Also missing a full
stop on the last line. The rest of the comments below also need
adjusting.
> + */
> + If (LEqual (Arg0, ToUUID ("33DB4D5B-1FF7-401C-9657-7441C03DD766"))) {
> + /* Create DWORD-addressable fields from the Capabilities Buffer */
> + CreateDWordField (Arg3, 4, CDW2)
> + CreateDWordField (Arg3, 8, CDW3)
> +
> + /* Save Capabilities DWORD2 & 3 */
> + Store (CDW2, SUPP)
> + Store (CDW3, CTRL)
> +
> + /* Validate Revision DWORD */
> + If (LNotEqual (Arg1, One)) {
> + /* Unknown revision */
> + /* Support and Control DWORDs will be returned anyway */
> + Or (CDW1, 0x08, CDW1)
> + }
> +
> + /* Control field bits are:
> + * bit 0 PCI Express Native Hot Plug control
> + * bit 1 SHPC Native Hot Plug control
> + * bit 2 PCI Express Native Power Management Events control
> + * bit 3 PCI Express Advanced Error Reporting control
> + * bit 4 PCI Express Capability Structure control
> + */
> +
> + /* Always allow native PME, AER (no dependencies)
> + * Never allow SHPC (no SHPC controller in this system)
> + * Do not allow PCIe Capability Structure control for now
> + * Also, ACPI hotplug is used for now instead of PCIe
> + * Native Hot Plug
> + */
> + And (CTRL, 0x0C, CTRL)
> +
> + If (LNotEqual (CDW3, CTRL)) {
> + /* Some of Capabilities bits were masked */
> + Or (CDW1, 0x10, CDW1)
> + }
> + /* Update DWORD3 in the buffer */
> + Store (CTRL, CDW3)
This looks equal to the QEMU code FWIW.
> + } Else {
> + Or (CDW1, 4, CDW1) /* Unrecognized UUID */
> + }
> + Return (Arg3)
> + }
> + /* end of _OSC */
> + }
> +
> + /****************************************************************
> + * LPC ISA bridge
> + ****************************************************************/
I would use a normal one-line comment here: /* LPCI ISA Bridge */
Has any of this been picked up from the QEMU asl files? Asking
because the above comment looks to be verbatim copied from the QEMU
file, and we then need to carry their copyright notice, which is not
done in this patch.
> +
> + Scope (\_SB.PCI0.ISA)
AFAICT this is adding more content to the ISA device already defined
in dsdt.asl?
> + {
> + /*
> + LPC ISA bridge
> +
> + PCI Interrupt Routing Register 2 (PIRQE..PIRQH) cannot be
> + used because of existing Xen IRQ limitations (4 PCI links
> + only)
> + */
Right, and PIRQA..PIRQD is already defined in the generic dsdt.asl.
Might be worth mentioning, otherwise the block looks incomplete.
> +
> + /* LPC_I/O: I/O Decode Ranges Register */
> + OperationRegion (LPCD, PCI_Config, 0x80, 0x2)
> + Field (LPCD, AnyAcc, NoLock, Preserve) {
> + COMA, 3,
> + , 1,
> + COMB, 3,
> +
> + Offset(0x01),
> + LPTD, 2,
> + , 2,
> + FDCD, 2
> + }
> +
> + /* LPC_EN: LPC I/F Enables Register */
> + OperationRegion(LPCE, PCI_Config, 0x82, 0x2)
> + Field(LPCE, AnyAcc, NoLock, Preserve) {
> + CAEN, 1,
> + CBEN, 1,
> + LPEN, 1,
> + FDEN, 1
> + }
> +
> + Device (FDC0)
> + {
> + Name (_HID, EisaId ("PNP0700"))
> + Method (_STA, 0, NotSerialized)
> + {
> + Store (FDEN, Local0)
> + If (LEqual (Local0, 0)) {
> + Return (0x00)
> + } Else {
> + Return (0x0F)
> + }
> + }
> +
> + Name (_CRS, ResourceTemplate ()
> + {
> + IO (Decode16, 0x03F2, 0x03F2, 0x00, 0x04)
> + IO (Decode16, 0x03F7, 0x03F7, 0x00, 0x01)
> + IRQNoFlags () {6}
> + DMA (Compatibility, NotBusMaster, Transfer8) {2}
> + })
> + }
> + }
This seem to match the blocks in QEMU, so it's likely fine.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 02/17] libacpi: new DSDT ACPI table for Q35
2026-03-13 16:35 ` [PATCH 02/17] libacpi: new DSDT ACPI table for Q35 Thierry Escande
2026-04-28 10:17 ` Roger Pau Monné
@ 2026-05-04 14:39 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 14:39 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Alexey Gerasimenko, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> --- a/tools/libacpi/Makefile
> +++ b/tools/libacpi/Makefile
> @@ -11,7 +11,7 @@ endif
>
> MK_DSDT = $(ACPI_BUILD_DIR)/mk_dsdt
>
> -C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_pvh.c
> +C_SRC-$(CONFIG_X86) = dsdt_anycpu.c dsdt_15cpu.c dsdt_i440_anycpu_qemu_xen.c dsdt_q35_anycpu_qemu_xen.c dsdt_pvh.c
This line's now definitely getting too long. Maybe with the suggested name
change (see patch 1, to be extended to here) it would remain tolerable as
a single line. Otherwise it wants splitting.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 05/17] hvmloader: add Q35 DSDT table loading
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (2 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 02/17] libacpi: new DSDT ACPI table for Q35 Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 11:08 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35) Thierry Escande
` (14 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Alexey Gerasimenko
This patch allows to select Q35 DSDT table in the function
hvmloader_acpi_build_tables(). The machine_type global variable is used
to select a proper table (i440/q35).
As we are bound to the qemu-xen device model for Q35, there is no need
to initialize config->dsdt_15cpu/config->dsdt_15cpu_len fields.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/util.c | 17 +++++++++++++++--
tools/firmware/hvmloader/util.h | 2 ++
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index f9116bea4d..45519ea583 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -885,8 +885,21 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
s = xenstore_read("platform/device-model", "");
if ( !strncmp(s, "qemu_xen", 9) )
{
- config->dsdt_anycpu = dsdt_i440_anycpu_qemu_xen;
- config->dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len;
+ switch ( machine_type )
+ {
+ case MACHINE_TYPE_Q35:
+ config->dsdt_anycpu = dsdt_q35_anycpu_qemu_xen;
+ config->dsdt_anycpu_len = dsdt_q35_anycpu_qemu_xen_len;
+ break;
+ case MACHINE_TYPE_I440:
+ config->dsdt_anycpu = dsdt_i440_anycpu_qemu_xen;
+ config->dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len;
+ break;
+ default:
+ /* Not likely to happen */
+ BUG();
+ }
+
config->dsdt_15cpu = NULL;
config->dsdt_15cpu_len = 0;
}
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 2f37504aca..4641ca0c46 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -393,7 +393,9 @@ bool check_overlap(uint64_t start, uint64_t size,
uint64_t reserved_start, uint64_t reserved_size);
extern const unsigned char dsdt_i440_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
+extern const unsigned char dsdt_q35_anycpu_qemu_xen[];
extern const int dsdt_i440_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
+extern const int dsdt_q35_anycpu_qemu_xen_len;
unsigned long acpi_pages_allocated(void);
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 05/17] hvmloader: add Q35 DSDT table loading
2026-03-13 16:35 ` [PATCH 05/17] hvmloader: add Q35 DSDT table loading Thierry Escande
@ 2026-04-28 11:08 ` Roger Pau Monné
0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 11:08 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:02PM +0000, Thierry Escande wrote:
> This patch allows to select Q35 DSDT table in the function
> hvmloader_acpi_build_tables(). The machine_type global variable is used
> to select a proper table (i440/q35).
>
> As we are bound to the qemu-xen device model for Q35, there is no need
> to initialize config->dsdt_15cpu/config->dsdt_15cpu_len fields.
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/util.c | 17 +++++++++++++++--
> tools/firmware/hvmloader/util.h | 2 ++
> 2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index f9116bea4d..45519ea583 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -885,8 +885,21 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
> s = xenstore_read("platform/device-model", "");
> if ( !strncmp(s, "qemu_xen", 9) )
Tying Q35 to xenstore seems a bit sub-optimal, as Q35 detection is
done purely from the PCI config space. It would be better if we could
avoid relying on the xenstore node, and hence also fixup at least
ovmf_acpi_build_tables() and seabios_acpi_build_tables() to use the
Q35 ACPI tables when Q35 is detected, regardless of the device-model
xenstore node.
> {
> - config->dsdt_anycpu = dsdt_i440_anycpu_qemu_xen;
> - config->dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len;
> + switch ( machine_type )
> + {
> + case MACHINE_TYPE_Q35:
> + config->dsdt_anycpu = dsdt_q35_anycpu_qemu_xen;
> + config->dsdt_anycpu_len = dsdt_q35_anycpu_qemu_xen_len;
> + break;
> + case MACHINE_TYPE_I440:
> + config->dsdt_anycpu = dsdt_i440_anycpu_qemu_xen;
> + config->dsdt_anycpu_len = dsdt_i440_anycpu_qemu_xen_len;
> + break;
> + default:
> + /* Not likely to happen */
> + BUG();
> + }
> +
> config->dsdt_15cpu = NULL;
> config->dsdt_15cpu_len = 0;
> }
> diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
> index 2f37504aca..4641ca0c46 100644
> --- a/tools/firmware/hvmloader/util.h
> +++ b/tools/firmware/hvmloader/util.h
> @@ -393,7 +393,9 @@ bool check_overlap(uint64_t start, uint64_t size,
> uint64_t reserved_start, uint64_t reserved_size);
>
> extern const unsigned char dsdt_i440_anycpu_qemu_xen[], dsdt_anycpu[], dsdt_15cpu[];
> +extern const unsigned char dsdt_q35_anycpu_qemu_xen[];
If I'm not mistaken, patch 2 introduces the dsdt_q35_anycpu_qemu_xen
symbol to the hvmloader build, so that you can reference it here?
At first I was going to comment that you are referencing a symbol not
included in the patch, but I think such symbol was previously added to
the build in patch 2 (and was unused there).
> extern const int dsdt_i440_anycpu_qemu_xen_len, dsdt_anycpu_len, dsdt_15cpu_len;
> +extern const int dsdt_q35_anycpu_qemu_xen_len;
unsigned int would be better here.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35)
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (3 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 05/17] hvmloader: add Q35 DSDT table loading Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 10:39 ` Roger Pau Monné
2026-05-04 14:43 ` Jan Beulich
2026-03-13 16:35 ` [PATCH 08/17] hvmloader: Extend PCI BAR struct Thierry Escande
` (13 subsequent siblings)
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Alexey Gerasimenko
This adds a new function init_pc_machine_type() which allows to
determine and set the emulated chipset type. Possible values are
MACHINE_TYPE_I440 and MACHINE_TYPE_Q35 and stored in the global variable
machine_type.
The machine_type variable will be used from multiple places in following
commits.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/hvmloader.c | 2 ++
tools/firmware/hvmloader/pci_regs.h | 4 +++
tools/firmware/hvmloader/util.c | 42 ++++++++++++++++++++++++++++
tools/firmware/hvmloader/util.h | 11 ++++++++
4 files changed, 59 insertions(+)
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 6d23150fc9..626cc53649 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -332,6 +332,8 @@ int main(void)
init_hypercalls();
+ init_pc_machine_type();
+
memory_map_setup();
xenbus_setup();
diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
index 7bf2d873ab..4d4dc0cd01 100644
--- a/tools/firmware/hvmloader/pci_regs.h
+++ b/tools/firmware/hvmloader/pci_regs.h
@@ -107,6 +107,10 @@
#define PCI_INTEL_OPREGION 0xfc /* 4 bits */
+#define PCI_VENDOR_ID_INTEL 0x8086
+#define PCI_DEVICE_ID_INTEL_82441 0x1237
+#define PCI_DEVICE_ID_INTEL_Q35_MCH 0x29c0
+
#endif /* __HVMLOADER_PCI_REGS_H__ */
/*
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index f1ed1eb48d..f9116bea4d 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -22,6 +22,7 @@
#include "hypercall.h"
#include "ctype.h"
#include "vnuma.h"
+#include "pci_regs.h"
#include <acpi2_0.h>
#include <libacpi.h>
#include <stdint.h>
@@ -648,6 +649,47 @@ void __bug(const char *file, int line)
crash();
}
+machine_type_t machine_type;
+
+void init_pc_machine_type(void)
+{
+ uint16_t vendor_id;
+ uint16_t device_id;
+
+ if ( machine_type != MACHINE_TYPE_UNDEFINED )
+ return;
+
+ vendor_id = pci_readw(0, PCI_VENDOR_ID);
+ device_id = pci_readw(0, PCI_DEVICE_ID);
+
+ /* only Intel platforms are emulated currently */
+ if ( vendor_id != PCI_VENDOR_ID_INTEL )
+ goto error;
+
+ switch ( device_id )
+ {
+ case PCI_DEVICE_ID_INTEL_82441:
+ machine_type = MACHINE_TYPE_I440;
+ printf("Detected i440 chipset\n");
+ break;
+
+ case PCI_DEVICE_ID_INTEL_Q35_MCH:
+ machine_type = MACHINE_TYPE_Q35;
+ printf("Detected Q35 chipset\n");
+ break;
+
+ default:
+ goto error;
+ }
+
+ return;
+
+error:
+ printf("Unknown emulated chipset encountered, VID=%04Xh, DID=%04Xh\n",
+ vendor_id, device_id);
+ BUG();
+}
+
static void validate_hvm_info(struct hvm_info_table *t)
{
uint8_t *ptr = (uint8_t *)t;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 3c5eeff5e7..2f37504aca 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -170,6 +170,17 @@ void pci_write(uint32_t devfn, uint32_t reg, uint32_t len, uint32_t val);
#define pci_writew(devfn, reg, val) pci_write(devfn, reg, 2, (uint16_t)(val))
#define pci_writel(devfn, reg, val) pci_write(devfn, reg, 4, (uint32_t)(val))
+/* Emulated machine types */
+typedef enum {
+ MACHINE_TYPE_UNDEFINED = 0,
+ MACHINE_TYPE_I440,
+ MACHINE_TYPE_Q35,
+} machine_type_t;
+
+extern machine_type_t machine_type;
+
+void init_pc_machine_type(void);
+
/* Get a pointer to the shared-info page */
struct shared_info *get_shared_info(void) __attribute__ ((const));
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35)
2026-03-13 16:35 ` [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35) Thierry Escande
@ 2026-04-28 10:39 ` Roger Pau Monné
2026-05-04 10:58 ` Jan Beulich
2026-05-04 14:43 ` Jan Beulich
1 sibling, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 10:39 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
> This adds a new function init_pc_machine_type() which allows to
> determine and set the emulated chipset type. Possible values are
> MACHINE_TYPE_I440 and MACHINE_TYPE_Q35 and stored in the global variable
> machine_type.
>
> The machine_type variable will be used from multiple places in following
> commits.
Is this initialization something that OVMF or SeaBIOS also does?
(maybe not for Xen ATM)
Asking myself because as said earlier we want to possibly get rid of
hvmloader, plus we will want ECAM support in PVH at some point.
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Same as previous patch, if the first SoB is from Alexey the From:
(patch author) should also match.
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/hvmloader.c | 2 ++
> tools/firmware/hvmloader/pci_regs.h | 4 +++
> tools/firmware/hvmloader/util.c | 42 ++++++++++++++++++++++++++++
> tools/firmware/hvmloader/util.h | 11 ++++++++
> 4 files changed, 59 insertions(+)
>
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index 6d23150fc9..626cc53649 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -332,6 +332,8 @@ int main(void)
>
> init_hypercalls();
>
> + init_pc_machine_type();
> +
> memory_map_setup();
>
> xenbus_setup();
> diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
> index 7bf2d873ab..4d4dc0cd01 100644
> --- a/tools/firmware/hvmloader/pci_regs.h
> +++ b/tools/firmware/hvmloader/pci_regs.h
> @@ -107,6 +107,10 @@
>
> #define PCI_INTEL_OPREGION 0xfc /* 4 bits */
>
> +#define PCI_VENDOR_ID_INTEL 0x8086
> +#define PCI_DEVICE_ID_INTEL_82441 0x1237
> +#define PCI_DEVICE_ID_INTEL_Q35_MCH 0x29c0
In Xen we have a separate file for vendor and device IDs, called
pci_ids.h. Maybe it would be better to use a similar approach in
hvmloader, and keep pci_regs.h only containing PCI register offsets.
> +
> #endif /* __HVMLOADER_PCI_REGS_H__ */
>
> /*
> diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
> index f1ed1eb48d..f9116bea4d 100644
> --- a/tools/firmware/hvmloader/util.c
> +++ b/tools/firmware/hvmloader/util.c
> @@ -22,6 +22,7 @@
> #include "hypercall.h"
> #include "ctype.h"
> #include "vnuma.h"
> +#include "pci_regs.h"
> #include <acpi2_0.h>
> #include <libacpi.h>
> #include <stdint.h>
> @@ -648,6 +649,47 @@ void __bug(const char *file, int line)
> crash();
> }
>
> +machine_type_t machine_type;
> +
> +void init_pc_machine_type(void)
Since detection is done based on PCI device IDs, it might be better
placed in pci.c, and so you don't need to include pci_regs.h in
util.c.
> +{
> + uint16_t vendor_id;
> + uint16_t device_id;
> +
> + if ( machine_type != MACHINE_TYPE_UNDEFINED )
> + return;
> +
> + vendor_id = pci_readw(0, PCI_VENDOR_ID);
> + device_id = pci_readw(0, PCI_DEVICE_ID);
> +
> + /* only Intel platforms are emulated currently */
> + if ( vendor_id != PCI_VENDOR_ID_INTEL )
> + goto error;
> +
> + switch ( device_id )
> + {
> + case PCI_DEVICE_ID_INTEL_82441:
> + machine_type = MACHINE_TYPE_I440;
> + printf("Detected i440 chipset\n");
> + break;
> +
> + case PCI_DEVICE_ID_INTEL_Q35_MCH:
> + machine_type = MACHINE_TYPE_Q35;
> + printf("Detected Q35 chipset\n");
> + break;
> +
> + default:
> + goto error;
> + }
> +
> + return;
> +
> +error:
> + printf("Unknown emulated chipset encountered, VID=%04Xh, DID=%04Xh\n",
We don't usually use the h suffix in hex numbers in hvmloader, it's
more common to prefix them with 0x, so I would recommend to use the %#06x
formatter instead.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35)
2026-04-28 10:39 ` Roger Pau Monné
@ 2026-05-04 10:58 ` Jan Beulich
0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 10:58 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Alexey Gerasimenko,
Thierry Escande
On 28.04.2026 12:39, Roger Pau Monné wrote:
> On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
>> --- a/tools/firmware/hvmloader/pci_regs.h
>> +++ b/tools/firmware/hvmloader/pci_regs.h
>> @@ -107,6 +107,10 @@
>>
>> #define PCI_INTEL_OPREGION 0xfc /* 4 bits */
>>
>> +#define PCI_VENDOR_ID_INTEL 0x8086
>> +#define PCI_DEVICE_ID_INTEL_82441 0x1237
>> +#define PCI_DEVICE_ID_INTEL_Q35_MCH 0x29c0
>
> In Xen we have a separate file for vendor and device IDs, called
> pci_ids.h. Maybe it would be better to use a similar approach in
> hvmloader, and keep pci_regs.h only containing PCI register offsets.
Can't hvmloader simply re-use Xen's header(s)?
>> --- a/tools/firmware/hvmloader/util.c
>> +++ b/tools/firmware/hvmloader/util.c
>> @@ -22,6 +22,7 @@
>> #include "hypercall.h"
>> #include "ctype.h"
>> #include "vnuma.h"
>> +#include "pci_regs.h"
>> #include <acpi2_0.h>
>> #include <libacpi.h>
>> #include <stdint.h>
>> @@ -648,6 +649,47 @@ void __bug(const char *file, int line)
>> crash();
>> }
>>
>> +machine_type_t machine_type;
>> +
>> +void init_pc_machine_type(void)
>
> Since detection is done based on PCI device IDs, it might be better
> placed in pci.c, and so you don't need to include pci_regs.h in
> util.c.
>
>> +{
>> + uint16_t vendor_id;
>> + uint16_t device_id;
>> +
>> + if ( machine_type != MACHINE_TYPE_UNDEFINED )
>> + return;
>> +
>> + vendor_id = pci_readw(0, PCI_VENDOR_ID);
>> + device_id = pci_readw(0, PCI_DEVICE_ID);
>> +
>> + /* only Intel platforms are emulated currently */
>> + if ( vendor_id != PCI_VENDOR_ID_INTEL )
>> + goto error;
>> +
>> + switch ( device_id )
>> + {
>> + case PCI_DEVICE_ID_INTEL_82441:
>> + machine_type = MACHINE_TYPE_I440;
>> + printf("Detected i440 chipset\n");
>> + break;
>> +
>> + case PCI_DEVICE_ID_INTEL_Q35_MCH:
>> + machine_type = MACHINE_TYPE_Q35;
>> + printf("Detected Q35 chipset\n");
>> + break;
>> +
>> + default:
>> + goto error;
>> + }
>> +
>> + return;
>> +
>> +error:
>> + printf("Unknown emulated chipset encountered, VID=%04Xh, DID=%04Xh\n",
>
> We don't usually use the h suffix in hex numbers in hvmloader, it's
> more common to prefix them with 0x, so I would recommend to use the %#06x
> formatter instead.
I'd generally advise against use of # with a width specifier, as that ends
up awkward for 0. That is, %#x is fine and generally to be preferred, but
for a specific with it might better be 0x%0<n>x (with n=4 here). Arguably
here we don't really expect either of the values to be 0, so the suggested
use may indeed be okay in this case (while still introducing an example
which later may be copied elsewhere without much thought).
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35)
2026-03-13 16:35 ` [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35) Thierry Escande
2026-04-28 10:39 ` Roger Pau Monné
@ 2026-05-04 14:43 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 14:43 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Alexey Gerasimenko, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> @@ -648,6 +649,47 @@ void __bug(const char *file, int line)
> crash();
> }
>
> +machine_type_t machine_type;
> +
> +void init_pc_machine_type(void)
> +{
> + uint16_t vendor_id;
> + uint16_t device_id;
> +
> + if ( machine_type != MACHINE_TYPE_UNDEFINED )
> + return;
> +
> + vendor_id = pci_readw(0, PCI_VENDOR_ID);
> + device_id = pci_readw(0, PCI_DEVICE_ID);
> +
> + /* only Intel platforms are emulated currently */
Nit: Comment style.
> + if ( vendor_id != PCI_VENDOR_ID_INTEL )
> + goto error;
> +
> + switch ( device_id )
> + {
> + case PCI_DEVICE_ID_INTEL_82441:
> + machine_type = MACHINE_TYPE_I440;
> + printf("Detected i440 chipset\n");
> + break;
> +
> + case PCI_DEVICE_ID_INTEL_Q35_MCH:
> + machine_type = MACHINE_TYPE_Q35;
> + printf("Detected Q35 chipset\n");
> + break;
> +
> + default:
> + goto error;
> + }
> +
> + return;
> +
> +error:
Nit: Labels indented by at least one blank please.
> + printf("Unknown emulated chipset encountered, VID=%04Xh, DID=%04Xh\n",
> + vendor_id, device_id);
> + BUG();
Can't this be moved up into the default case, thus avoiding "goto" and label
altogether?
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 08/17] hvmloader: Extend PCI BAR struct
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (4 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 03/17] hvmloader: add function to set the emulated machine type (i440/Q35) Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 13:31 ` Roger Pau Monné
2026-05-04 15:01 ` Jan Beulich
2026-03-13 16:35 ` [PATCH 07/17] hvmloader: add basic Q35 support Thierry Escande
` (12 subsequent siblings)
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Alexey Gerasimenko
For the upcoming allocation of the MMCONFIG range in MMIO hole, this
patch extends the 'bars' structure to make it universal for any
arbitrary BAR type. Either IO, MMIO, ROM or a chipset-specific resource.
One important new field is addr_mask, which tells which bits of the base
address can (should) be written. Different address types (ROM, MMIO BAR,
PCIEXBAR) will have different addr_mask values.
For every assignable BAR range we store its size, PCI device BDF (devfn
actually) to which it belongs, BAR type (mem/io/mem64) and corresponding
register offset in device PCI conf space.
Also, to reduce code complexity, all long mem/mem64 BAR flags checks are
replaced by simple bars[i] field probing, eg.:
- if ( (bar_reg == PCI_ROM_ADDRESS) ||
- ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
- PCI_BASE_ADDRESS_SPACE_MEMORY) )
+ if ( bars[i].is_mem )
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/pci.c | 58 ++++++++++++++++++++--------------
1 file changed, 35 insertions(+), 23 deletions(-)
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 91c7fd2171..6e6720adae 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -160,9 +160,10 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
void pci_setup(void)
{
- uint8_t is_64bar, using_64bar, bar64_relocate = 0;
+ uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
+ uint64_t addr_mask;
uint8_t vga_devfn = 0xff;
uint16_t class, vendor_id, device_id;
unsigned int bar, pin, link, isa_irq;
@@ -176,10 +177,13 @@ void pci_setup(void)
/* Create a list of device BARs in descending order of size. */
struct bars {
- uint32_t is_64bar;
uint32_t devfn;
uint32_t bar_reg;
uint64_t bar_sz;
+ uint64_t addr_mask; /* which bits of the base address can be written */
+ uint32_t bar_data; /* initial value - BAR flags here */
+ uint8_t is_64bar;
+ uint8_t is_mem;
} *bars = (struct bars *)scratch_start;
unsigned int i, nr_bars = 0;
uint64_t mmio_hole_size = 0;
@@ -278,13 +282,21 @@ void pci_setup(void)
bar_reg = PCI_ROM_ADDRESS;
bar_data = pci_readl(devfn, bar_reg);
+
+ is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
+ PCI_BASE_ADDRESS_SPACE_MEMORY) ||
+ (bar_reg == PCI_ROM_ADDRESS));
+
if ( bar_reg != PCI_ROM_ADDRESS )
{
- is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
- PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
- (PCI_BASE_ADDRESS_SPACE_MEMORY |
+ is_64bar = !!(is_mem &&
+ ((bar_data & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
PCI_BASE_ADDRESS_MEM_TYPE_64));
+
pci_writel(devfn, bar_reg, ~0);
+
+ addr_mask = is_mem ? PCI_BASE_ADDRESS_MEM_MASK
+ : PCI_BASE_ADDRESS_IO_MASK;
}
else
{
@@ -292,15 +304,16 @@ void pci_setup(void)
pci_writel(devfn, bar_reg,
(bar_data | PCI_ROM_ADDRESS_MASK) &
~PCI_ROM_ADDRESS_ENABLE);
+
+ addr_mask = PCI_ROM_ADDRESS_MASK;
}
+
bar_sz = pci_readl(devfn, bar_reg);
pci_writel(devfn, bar_reg, bar_data);
if ( bar_reg != PCI_ROM_ADDRESS )
- bar_sz &= (((bar_data & PCI_BASE_ADDRESS_SPACE) ==
- PCI_BASE_ADDRESS_SPACE_MEMORY) ?
- PCI_BASE_ADDRESS_MEM_MASK :
- (PCI_BASE_ADDRESS_IO_MASK & 0xffff));
+ bar_sz &= is_mem ? PCI_BASE_ADDRESS_MEM_MASK :
+ (PCI_BASE_ADDRESS_IO_MASK & 0xffff);
else
bar_sz &= PCI_ROM_ADDRESS_MASK;
if (is_64bar) {
@@ -314,6 +327,9 @@ void pci_setup(void)
if ( bar_sz == 0 )
continue;
+ /* leave only memtype/enable bits etc */
+ bar_data &= ~addr_mask;
+
if ( !xenpci_bar_uc &&
((bar_data & PCI_BASE_ADDRESS_SPACE) ==
PCI_BASE_ADDRESS_SPACE_MEMORY) &&
@@ -359,16 +375,17 @@ void pci_setup(void)
if ( i != nr_bars )
memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
- bars[i].is_64bar = is_64bar;
bars[i].devfn = devfn;
bars[i].bar_reg = bar_reg;
bars[i].bar_sz = bar_sz;
+ bars[i].is_64bar = is_64bar;
+ bars[i].is_mem = is_mem;
+ bars[i].addr_mask = addr_mask;
+ bars[i].bar_data = bar_data;
if ( is_64bar && bar_sz > BAR_RELOC_THRESH )
bar64_relocate = 1;
- else if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
- PCI_BASE_ADDRESS_SPACE_MEMORY) ||
- (bar_reg == PCI_ROM_ADDRESS) )
+ else if ( is_mem )
mmio_total += bar_sz;
nr_bars++;
@@ -531,10 +548,10 @@ void pci_setup(void)
using_64bar = bars[i].is_64bar && bar64_relocate &&
(mmio_total > (mem_resource.max - mem_resource.base) ||
bar_sz > BAR_RELOC_THRESH);
- bar_data = pci_readl(devfn, bar_reg);
- if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
- PCI_BASE_ADDRESS_SPACE_MEMORY )
+ bar_data = bars[i].bar_data;
+
+ if ( bars[i].is_mem )
{
/* Mapping high memory if PCI device is 64 bits bar */
if ( using_64bar ) {
@@ -544,11 +561,9 @@ void pci_setup(void)
if ( !pci_hi_mem_start )
pci_hi_mem_start = high_mem_resource.base;
resource = &high_mem_resource;
- bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
}
else {
resource = &mem_resource;
- bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
}
if ( bar_sz <= BAR_RELOC_THRESH )
mmio_total -= bar_sz;
@@ -556,7 +571,6 @@ void pci_setup(void)
else
{
resource = &io_resource;
- bar_data &= ~PCI_BASE_ADDRESS_IO_MASK;
}
base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
@@ -578,7 +592,7 @@ void pci_setup(void)
}
}
- bar_data |= (uint32_t)base;
+ bar_data |= (uint32_t) (base & bars[i].addr_mask);
bar_data_upper = (uint32_t)(base >> 32);
base += bar_sz;
@@ -600,9 +614,7 @@ void pci_setup(void)
PRIllx_arg(bar_sz),
bar_data_upper, bar_data);
- if ( (bar_reg == PCI_ROM_ADDRESS) ||
- ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
- PCI_BASE_ADDRESS_SPACE_MEMORY) )
+ if ( bars[i].is_mem )
pci_devfn_decode_type[devfn] |= PCI_COMMAND_MEMORY;
else
pci_devfn_decode_type[devfn] |= PCI_COMMAND_IO;
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 08/17] hvmloader: Extend PCI BAR struct
2026-03-13 16:35 ` [PATCH 08/17] hvmloader: Extend PCI BAR struct Thierry Escande
@ 2026-04-28 13:31 ` Roger Pau Monné
2026-05-04 15:01 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 13:31 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:03PM +0000, Thierry Escande wrote:
> For the upcoming allocation of the MMCONFIG range in MMIO hole, this
> patch extends the 'bars' structure to make it universal for any
> arbitrary BAR type. Either IO, MMIO, ROM or a chipset-specific resource.
>
> One important new field is addr_mask, which tells which bits of the base
> address can (should) be written. Different address types (ROM, MMIO BAR,
> PCIEXBAR) will have different addr_mask values.
>
> For every assignable BAR range we store its size, PCI device BDF (devfn
> actually) to which it belongs, BAR type (mem/io/mem64) and corresponding
> register offset in device PCI conf space.
>
> Also, to reduce code complexity, all long mem/mem64 BAR flags checks are
> replaced by simple bars[i] field probing, eg.:
> - if ( (bar_reg == PCI_ROM_ADDRESS) ||
> - ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> - PCI_BASE_ADDRESS_SPACE_MEMORY) )
> + if ( bars[i].is_mem )
I think this is also supposed to be a non-functional change, just
adding new fields and adjusting the code to make use of them?
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/pci.c | 58 ++++++++++++++++++++--------------
> 1 file changed, 35 insertions(+), 23 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 91c7fd2171..6e6720adae 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -160,9 +160,10 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
>
> void pci_setup(void)
> {
> - uint8_t is_64bar, using_64bar, bar64_relocate = 0;
> + uint8_t is_64bar, using_64bar, bar64_relocate = 0, is_mem;
The newly introduce fields want to be booleans types.
> uint32_t devfn, bar_reg, cmd, bar_data, bar_data_upper;
> uint64_t base, bar_sz, bar_sz_upper, mmio_total = 0;
> + uint64_t addr_mask;
> uint8_t vga_devfn = 0xff;
> uint16_t class, vendor_id, device_id;
> unsigned int bar, pin, link, isa_irq;
> @@ -176,10 +177,13 @@ void pci_setup(void)
>
> /* Create a list of device BARs in descending order of size. */
> struct bars {
> - uint32_t is_64bar;
> uint32_t devfn;
> uint32_t bar_reg;
> uint64_t bar_sz;
> + uint64_t addr_mask; /* which bits of the base address can be written */
> + uint32_t bar_data; /* initial value - BAR flags here */
Hm, that's just storing the flags of the BAR, given that you already
store the 64bit and memory flags, you just need the prefetch and ROM
enabled booleans to have the full set, and then you can remove the
bar_data field from the struct.
> + uint8_t is_64bar;
> + uint8_t is_mem;
Use bool types please for the is_ fields.
> } *bars = (struct bars *)scratch_start;
> unsigned int i, nr_bars = 0;
> uint64_t mmio_hole_size = 0;
> @@ -278,13 +282,21 @@ void pci_setup(void)
> bar_reg = PCI_ROM_ADDRESS;
>
> bar_data = pci_readl(devfn, bar_reg);
> +
> + is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> + PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> + (bar_reg == PCI_ROM_ADDRESS));
> +
> if ( bar_reg != PCI_ROM_ADDRESS )
> {
> - is_64bar = !!((bar_data & (PCI_BASE_ADDRESS_SPACE |
> - PCI_BASE_ADDRESS_MEM_TYPE_MASK)) ==
> - (PCI_BASE_ADDRESS_SPACE_MEMORY |
> + is_64bar = !!(is_mem &&
> + ((bar_data & PCI_BASE_ADDRESS_MEM_TYPE_MASK) ==
> PCI_BASE_ADDRESS_MEM_TYPE_64));
> +
> pci_writel(devfn, bar_reg, ~0);
> +
> + addr_mask = is_mem ? PCI_BASE_ADDRESS_MEM_MASK
> + : PCI_BASE_ADDRESS_IO_MASK;
> }
> else
> {
> @@ -292,15 +304,16 @@ void pci_setup(void)
> pci_writel(devfn, bar_reg,
> (bar_data | PCI_ROM_ADDRESS_MASK) &
> ~PCI_ROM_ADDRESS_ENABLE);
> +
> + addr_mask = PCI_ROM_ADDRESS_MASK;
> }
> +
> bar_sz = pci_readl(devfn, bar_reg);
> pci_writel(devfn, bar_reg, bar_data);
>
> if ( bar_reg != PCI_ROM_ADDRESS )
> - bar_sz &= (((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> - PCI_BASE_ADDRESS_SPACE_MEMORY) ?
> - PCI_BASE_ADDRESS_MEM_MASK :
> - (PCI_BASE_ADDRESS_IO_MASK & 0xffff));
> + bar_sz &= is_mem ? PCI_BASE_ADDRESS_MEM_MASK :
> + (PCI_BASE_ADDRESS_IO_MASK & 0xffff);
> else
> bar_sz &= PCI_ROM_ADDRESS_MASK;
> if (is_64bar) {
> @@ -314,6 +327,9 @@ void pci_setup(void)
> if ( bar_sz == 0 )
> continue;
>
> + /* leave only memtype/enable bits etc */
> + bar_data &= ~addr_mask;
> +
> if ( !xenpci_bar_uc &&
> ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> PCI_BASE_ADDRESS_SPACE_MEMORY) &&
> @@ -359,16 +375,17 @@ void pci_setup(void)
> if ( i != nr_bars )
> memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>
> - bars[i].is_64bar = is_64bar;
> bars[i].devfn = devfn;
> bars[i].bar_reg = bar_reg;
> bars[i].bar_sz = bar_sz;
> + bars[i].is_64bar = is_64bar;
> + bars[i].is_mem = is_mem;
> + bars[i].addr_mask = addr_mask;
> + bars[i].bar_data = bar_data;
>
> if ( is_64bar && bar_sz > BAR_RELOC_THRESH )
> bar64_relocate = 1;
> - else if ( ((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> - PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> - (bar_reg == PCI_ROM_ADDRESS) )
> + else if ( is_mem )
> mmio_total += bar_sz;
>
> nr_bars++;
> @@ -531,10 +548,10 @@ void pci_setup(void)
> using_64bar = bars[i].is_64bar && bar64_relocate &&
> (mmio_total > (mem_resource.max - mem_resource.base) ||
> bar_sz > BAR_RELOC_THRESH);
> - bar_data = pci_readl(devfn, bar_reg);
>
> - if ( (bar_data & PCI_BASE_ADDRESS_SPACE) ==
> - PCI_BASE_ADDRESS_SPACE_MEMORY )
> + bar_data = bars[i].bar_data;
> +
> + if ( bars[i].is_mem )
> {
> /* Mapping high memory if PCI device is 64 bits bar */
> if ( using_64bar ) {
> @@ -544,11 +561,9 @@ void pci_setup(void)
> if ( !pci_hi_mem_start )
> pci_hi_mem_start = high_mem_resource.base;
> resource = &high_mem_resource;
> - bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
> }
> else {
> resource = &mem_resource;
> - bar_data &= ~PCI_BASE_ADDRESS_MEM_MASK;
> }
> if ( bar_sz <= BAR_RELOC_THRESH )
> mmio_total -= bar_sz;
> @@ -556,7 +571,6 @@ void pci_setup(void)
> else
> {
> resource = &io_resource;
> - bar_data &= ~PCI_BASE_ADDRESS_IO_MASK;
> }
>
> base = (resource->base + bar_sz - 1) & ~(uint64_t)(bar_sz - 1);
> @@ -578,7 +592,7 @@ void pci_setup(void)
> }
> }
>
> - bar_data |= (uint32_t)base;
> + bar_data |= (uint32_t) (base & bars[i].addr_mask);
^ unintended space?
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 08/17] hvmloader: Extend PCI BAR struct
2026-03-13 16:35 ` [PATCH 08/17] hvmloader: Extend PCI BAR struct Thierry Escande
2026-04-28 13:31 ` Roger Pau Monné
@ 2026-05-04 15:01 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 15:01 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Alexey Gerasimenko, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> @@ -176,10 +177,13 @@ void pci_setup(void)
>
> /* Create a list of device BARs in descending order of size. */
> struct bars {
> - uint32_t is_64bar;
> uint32_t devfn;
> uint32_t bar_reg;
> uint64_t bar_sz;
> + uint64_t addr_mask; /* which bits of the base address can be written */
> + uint32_t bar_data; /* initial value - BAR flags here */
Nit: Comment style again (also elsewhere).
> @@ -278,13 +282,21 @@ void pci_setup(void)
> bar_reg = PCI_ROM_ADDRESS;
>
> bar_data = pci_readl(devfn, bar_reg);
> +
> + is_mem = !!(((bar_data & PCI_BASE_ADDRESS_SPACE) ==
> + PCI_BASE_ADDRESS_SPACE_MEMORY) ||
> + (bar_reg == PCI_ROM_ADDRESS));
Nit: Indentation (pending open parentheses want to be reflected by extra
indenting blanks). With, as requested by Roger, is_mem switched to bool,
the !! also can go away, and with it perhaps one pair of parentheses.
(I realize pre-existing code further down has similar issues, yet when
such code is touched - and even more so when new code is added - this
wants getting right, to aid readability.)
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 07/17] hvmloader: add basic Q35 support
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (5 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 08/17] hvmloader: Extend PCI BAR struct Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 13:15 ` Roger Pau Monné
2026-05-04 14:55 ` Jan Beulich
2026-03-13 16:35 ` [PATCH 10/17] hvmloader: Add support for HVMOP_set|get_ecam_space hypercalls Thierry Escande
` (11 subsequent siblings)
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Alexey Gerasimenko
The current hvmloader implementation assumes a fixed PCI-to-ISA bridge
at 00:01.0 (PIIX3). This patch introduces support for the ICH9 LPC
bridge used in Q35 machine types, which resides at 00:1f.0.
It also initializes PIRQA...{PIRQD, PIRQH} routing accordingly to the
emulated south bridge (either located on PCI_ISA_DEVFN or
PCI_ICH9_LPC_DEVFN).
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/config.h | 1 +
tools/firmware/hvmloader/pci.c | 34 ++++++++++++++++++++++++-----
tools/firmware/hvmloader/pci_regs.h | 1 +
3 files changed, 31 insertions(+), 5 deletions(-)
diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index c159db30ee..baaed91c7f 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -54,6 +54,7 @@ extern uint32_t *cpu_to_apicid;
#define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
#define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
+#define PCI_ICH9_LPC_DEVFN 0xf8 /* dev 31, fn 0 */
#define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index a76d051bdf..91c7fd2171 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -84,6 +84,10 @@ static int find_next_rmrr(uint32_t base)
return next_rmrr;
}
+#define SCI_EN_IOPORT (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x30)
+#define GBL_SMI_EN (1 << 0)
+#define APMC_EN (1 << 5)
+
static void class_specific_pci_device_setup(uint16_t vendor_id,
uint16_t device_id,
uint16_t class,
@@ -140,6 +144,17 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
}
break;
+ case PCI_CLASS_BRIDGE_ISA:
+ /* LPC bridge */
+ if ( vendor_id == 0x8086 && device_id == 0x2918 )
+ {
+ pci_writeb(devfn, 0x3c, 0x09); /* Hardcoded IRQ9 */
+ pci_writeb(devfn, 0x3d, 0x01);
+ pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
+ pci_writeb(devfn, 0x44, 0x80); /* enable PM io space */
+ outl(SCI_EN_IOPORT, inl(SCI_EN_IOPORT) | GBL_SMI_EN | APMC_EN);
+ }
+ break;
}
}
@@ -152,6 +167,7 @@ void pci_setup(void)
uint16_t class, vendor_id, device_id;
unsigned int bar, pin, link, isa_irq;
uint8_t pci_devfn_decode_type[256] = {};
+ int is_running_on_q35 = (machine_type == MACHINE_TYPE_Q35);
/* Resources assignable to PCI devices via BARs. */
struct resource {
@@ -209,7 +225,16 @@ void pci_setup(void)
{
do { isa_irq = (isa_irq + 1) & 15;
} while ( !(PCI_ISA_IRQ_MASK & (1U << isa_irq)) );
- pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
+
+ if ( is_running_on_q35 )
+ {
+ pci_writeb(PCI_ICH9_LPC_DEVFN, 0x60 + link, isa_irq);
+ }
+ else
+ {
+ pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
+ }
+
printf("PCI-ISA link %u routed to IRQ%u\n", link, isa_irq);
}
@@ -226,9 +251,6 @@ void pci_setup(void)
if ( (vendor_id == 0xffff) && (device_id == 0xffff) )
continue;
- ASSERT((devfn != PCI_ISA_DEVFN) ||
- ((vendor_id == 0x8086) && (device_id == 0x7000)));
-
class_specific_pci_device_setup(vendor_id, device_id, class,
0 /* virt_bus support TBD */,
devfn, &vga_devfn);
@@ -362,7 +384,9 @@ void pci_setup(void)
{
/* This is the barber's pole mapping used by Xen. */
link = ((pin - 1) + (devfn >> 3)) & 3;
- isa_irq = pci_readb(PCI_ISA_DEVFN, 0x60 + link);
+ isa_irq = pci_readb(is_running_on_q35 ?
+ PCI_ICH9_LPC_DEVFN : PCI_ISA_DEVFN,
+ 0x60 + link);
pci_writeb(devfn, PCI_INTERRUPT_LINE, isa_irq);
printf("pci dev %02x:%x INT%c->IRQ%u\n",
devfn>>3, devfn&7, 'A'+pin-1, isa_irq);
diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
index c94278855b..d217b8f1a4 100644
--- a/tools/firmware/hvmloader/pci_regs.h
+++ b/tools/firmware/hvmloader/pci_regs.h
@@ -114,6 +114,7 @@
#define PCI_CLASS_STORAGE_IDE 0x0101
#define PCI_CLASS_DISPLAY_VGA 0x0300
#define PCI_CLASS_BRIDGE_OTHER 0x0680
+#define PCI_CLASS_BRIDGE_ISA 0x0601
#endif /* __HVMLOADER_PCI_REGS_H__ */
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 07/17] hvmloader: add basic Q35 support
2026-03-13 16:35 ` [PATCH 07/17] hvmloader: add basic Q35 support Thierry Escande
@ 2026-04-28 13:15 ` Roger Pau Monné
2026-05-10 23:32 ` Alexey G
2026-05-04 14:55 ` Jan Beulich
1 sibling, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 13:15 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:02PM +0000, Thierry Escande wrote:
> The current hvmloader implementation assumes a fixed PCI-to-ISA bridge
> at 00:01.0 (PIIX3). This patch introduces support for the ICH9 LPC
> bridge used in Q35 machine types, which resides at 00:1f.0.
>
> It also initializes PIRQA...{PIRQD, PIRQH} routing accordingly to the
> emulated south bridge (either located on PCI_ISA_DEVFN or
> PCI_ICH9_LPC_DEVFN).
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/config.h | 1 +
> tools/firmware/hvmloader/pci.c | 34 ++++++++++++++++++++++++-----
> tools/firmware/hvmloader/pci_regs.h | 1 +
> 3 files changed, 31 insertions(+), 5 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> index c159db30ee..baaed91c7f 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -54,6 +54,7 @@ extern uint32_t *cpu_to_apicid;
>
> #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
> #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> +#define PCI_ICH9_LPC_DEVFN 0xf8 /* dev 31, fn 0 */
We should introduce a macro to generate BDFs more easily in the
hvmloader context. Having them as plain numbers is not very friendly.
>
> #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
>
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index a76d051bdf..91c7fd2171 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -84,6 +84,10 @@ static int find_next_rmrr(uint32_t base)
> return next_rmrr;
> }
>
> +#define SCI_EN_IOPORT (ACPI_PM1A_EVT_BLK_ADDRESS_V1 + 0x30)
> +#define GBL_SMI_EN (1 << 0)
> +#define APMC_EN (1 << 5)
> +
> static void class_specific_pci_device_setup(uint16_t vendor_id,
> uint16_t device_id,
> uint16_t class,
> @@ -140,6 +144,17 @@ static void class_specific_pci_device_setup(uint16_t vendor_id,
> pci_writew(devfn, 0x42, 0x8000); /* enable IDE1 */
> }
> break;
> + case PCI_CLASS_BRIDGE_ISA:
> + /* LPC bridge */
> + if ( vendor_id == 0x8086 && device_id == 0x2918 )
If you don't add parentheses around the equal comparisons please also
removing the existing ones when moving the code. Also device ID would
batter be a constant with a proper name.
> + {
> + pci_writeb(devfn, 0x3c, 0x09); /* Hardcoded IRQ9 */
> + pci_writeb(devfn, 0x3d, 0x01);
0x3c and 0x3d are PCI_INTERRUPT_LINE and PCI_INTERRUPT_PIN
respectively, let's please try to use the named constants when
available (or when they can be introduced).
> + pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
> + pci_writeb(devfn, 0x44, 0x80); /* enable PM io space */
> + outl(SCI_EN_IOPORT, inl(SCI_EN_IOPORT) | GBL_SMI_EN | APMC_EN);
Most of the above looks like black magic. It's not like the context
of this function is great, most of the existing stuff is poorly
documented. Can we get a bit more comments about what's this supposed
to do, and maybe a reference to the Intel specification that lists
where those PCI config space registers are coming from?
> + }
> + break;
> }
> }
>
> @@ -152,6 +167,7 @@ void pci_setup(void)
> uint16_t class, vendor_id, device_id;
> unsigned int bar, pin, link, isa_irq;
> uint8_t pci_devfn_decode_type[256] = {};
> + int is_running_on_q35 = (machine_type == MACHINE_TYPE_Q35);
Maybe you can avoid that and instead simply store the BDF of the ISA
device?
unsigned int isa_bdf = (machine_type == MACHINE_TYPE_Q35) ? PCI_ICH9_LPC_DEVFN
: PCI_ISA_DEVFN;
Or if you really need it to be a boolean, let's make it:
bool is_q35 = ...
It makes no sense for it to be an int, and the naming could be shorter
IMO.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 07/17] hvmloader: add basic Q35 support
2026-04-28 13:15 ` Roger Pau Monné
@ 2026-05-10 23:32 ` Alexey G
0 siblings, 0 replies; 59+ messages in thread
From: Alexey G @ 2026-05-10 23:32 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Thierry Escande, xen-devel, Jan Beulich, Andrew Cooper,
Anthony PERARD
On Tue, 28 Apr 2026 15:15:36 +0200
Roger Pau Monné <roger.pau@citrix.com> wrote:
>
>> + pci_writel(devfn, 0x40, ACPI_PM1A_EVT_BLK_ADDRESS_V1 | 1);
>> + pci_writeb(devfn, 0x44, 0x80); /* enable PM io space */
>> + outl(SCI_EN_IOPORT, inl(SCI_EN_IOPORT) | GBL_SMI_EN | APMC_EN);
>
>Most of the above looks like black magic. It's not like the context
>of this function is great, most of the existing stuff is poorly
>documented. Can we get a bit more comments about what's this supposed
>to do, and maybe a reference to the Intel specification that lists
>where those PCI config space registers are coming from?
This is a precondition for the (later) switch to ACPI.
First, we set the ACPI I/O registers base - it can be absolutely
arbitrary but we choose the same value that was used before for PIIX4 -
ACPI_PM1A_EVT_BLK_ADDRESS_V1. The "| 1" part is likely NOT necessary
and should be removed unless QEMU has a bug with its handling -
according to Intel datasheet, bit0 is "Hardwired to 1 to indicate I/O
space", no need to set it explicitly.
At this point we can use ACPI I/O register block at
ACPI_PM1A_EVT_BLK_ADDRESS_V1. We are only interested in one register
from this range - at offset +30h. SCI_EN_IOPORT is a wrong name
actually, it should be renamed to SMI_EN_IOPORT.
Finally, the line
"outl(SCI_EN_IOPORT, inl(SCI_EN_IOPORT) | GBL_SMI_EN | APMC_EN)"
enables SMI generation - we will use it later to perform a classic
APM -> ACPI switch via a write to the APM_CNT (0B2h) register to
trigger an SW SMI, followed by validating SCI_EN=1 to confirm the
successful switch.
Basically, all this setup is just a preparation for the step done in
the next patch - "hvmloader: add ACPI enabling for Q35".
What I don't remember is who was actually responsible for actual ACPI
enabling - either QEMU or firmware.
On real systems it is a bit more complicated - the APM -> ACPI switch
is done by an ACPI-aware OS itself (OSPM, how they call it). The OS
extracts information from ACPI tables and use it to find out what and
where to write in order to switch to ACPI. Under the hood it's still a
special value written to the SW SMI register, triggering a SMI handler
in the firmware. The actual hand-off to ACPI was done by the SMI
handler, including switching the PM interrupt from SMI to ACPI's SCI.
I'm not sure why hvmloader does this switch early instead. But at least
this APM -> ACPI switch flow matches the older (PIIX4) one. Perhaps
this is how it was handled by QEMU.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 07/17] hvmloader: add basic Q35 support
2026-03-13 16:35 ` [PATCH 07/17] hvmloader: add basic Q35 support Thierry Escande
2026-04-28 13:15 ` Roger Pau Monné
@ 2026-05-04 14:55 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 14:55 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD,
Alexey Gerasimenko, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> @@ -209,7 +225,16 @@ void pci_setup(void)
> {
> do { isa_irq = (isa_irq + 1) & 15;
> } while ( !(PCI_ISA_IRQ_MASK & (1U << isa_irq)) );
> - pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
> +
> + if ( is_running_on_q35 )
> + {
> + pci_writeb(PCI_ICH9_LPC_DEVFN, 0x60 + link, isa_irq);
> + }
> + else
> + {
> + pci_writeb(PCI_ISA_DEVFN, 0x60 + link, isa_irq);
> + }
Nit: Easier to read imo without the figure braces.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 10/17] hvmloader: Add support for HVMOP_set|get_ecam_space hypercalls
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (6 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 07/17] hvmloader: add basic Q35 support Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 14:14 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls Thierry Escande
` (10 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD
This patch adds wrappers in hvmloader for the hypercalls used to set and
get the ECAM space base address and size.
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/util.c | 26 ++++++++++++++++++++++++++
tools/firmware/hvmloader/util.h | 4 ++++
2 files changed, 30 insertions(+)
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index 45519ea583..ee7a09b5bc 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -690,6 +690,32 @@ error:
BUG();
}
+int hvm_set_ecam_space(uint64_t addr, uint32_t size)
+{
+ xen_hvm_ecam_space_t ecam;
+
+ ecam.domid = DOMID_SELF;
+ ecam.addr = addr;
+ ecam.size = size;
+ return hypercall_hvm_op(HVMOP_set_ecam_space, &ecam);
+}
+
+int hvm_get_ecam_space(uint64_t *addr, uint32_t *size)
+{
+ struct xen_hvm_ecam_space e = { };
+ int ret;
+
+ e.domid = DOMID_SELF;
+
+ ret = hypercall_hvm_op(HVMOP_get_ecam_space, &e);
+ if ( ret == 0 ) {
+ *addr = e.addr;
+ *size = e.size;
+ }
+
+ return ret;
+}
+
static void validate_hvm_info(struct hvm_info_table *t)
{
uint8_t *ptr = (uint8_t *)t;
diff --git a/tools/firmware/hvmloader/util.h b/tools/firmware/hvmloader/util.h
index 4641ca0c46..f63fdd3fbf 100644
--- a/tools/firmware/hvmloader/util.h
+++ b/tools/firmware/hvmloader/util.h
@@ -403,6 +403,10 @@ struct acpi_config;
void hvmloader_acpi_build_tables(struct acpi_config *config,
unsigned int physical);
+/* Pass ecam space information to Xen */
+int hvm_get_ecam_space(uint64_t *addr, uint32_t *size);
+int hvm_set_ecam_space(uint64_t addr, uint32_t size);
+
#endif /* __HVMLOADER_UTIL_H__ */
/*
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 10/17] hvmloader: Add support for HVMOP_set|get_ecam_space hypercalls
2026-03-13 16:35 ` [PATCH 10/17] hvmloader: Add support for HVMOP_set|get_ecam_space hypercalls Thierry Escande
@ 2026-04-28 14:14 ` Roger Pau Monné
0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 14:14 UTC (permalink / raw)
To: Thierry Escande; +Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Fri, Mar 13, 2026 at 04:35:03PM +0000, Thierry Escande wrote:
> This patch adds wrappers in hvmloader for the hypercalls used to set and
> get the ECAM space base address and size.
This would better be introduced with at least one user, as otherwise
it's just unreachable code. And then I'm not convinced we should use
hypercalls to set the ECAM position/size from hvmloader, if it can be
done using the native registers and QEMU can forward those to Xen.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (7 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 10/17] hvmloader: Add support for HVMOP_set|get_ecam_space hypercalls Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 13:59 ` Roger Pau Monné
2026-05-04 15:12 ` Jan Beulich
2026-03-13 16:35 ` [PATCH 14/17] libacpi: build ACPI MCFG table if requested Thierry Escande
` (9 subsequent siblings)
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini
This patch adds 2 HVMOP hypercalls, HVMOP_get|set_ecam_space, used to
set and get the base address and size of the PCIe ECAM space as
configured by hvmloader.
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
xen/arch/x86/hvm/hvm.c | 52 +++++++++++++++++++++++++++++++
xen/arch/x86/include/asm/domain.h | 4 +++
xen/include/public/hvm/hvm_op.h | 11 +++++++
3 files changed, 67 insertions(+)
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 4d37a93c57..a46dfa955d 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -5195,6 +5195,58 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
break;
+ case HVMOP_set_ecam_space: {
+ xen_hvm_ecam_space_t ecam;
+ struct domain *d;
+
+ if ( copy_from_guest( &ecam, guest_handle_cast(arg, xen_hvm_ecam_space_t), 1 ) )
+ return -EFAULT;
+
+ d = rcu_lock_domain_by_any_id(ecam.domid);
+ if ( d == NULL )
+ return -ESRCH;
+
+ if ( d->arch.ecam_addr ) {
+ rcu_unlock_domain(d);
+ return -EFAULT;
+ }
+
+ if ( (ecam.size >> 28) || (!ecam.addr) ) {
+ rcu_unlock_domain(d);
+ return -EINVAL;
+ }
+
+ d->arch.ecam_addr = ecam.addr;
+ d->arch.ecam_size = ecam.size;
+
+ rcu_unlock_domain(d);
+ break;
+ }
+
+ case HVMOP_get_ecam_space: {
+ xen_hvm_ecam_space_t ecam;
+ struct domain *d;
+
+ if ( copy_from_guest( &ecam, guest_handle_cast(arg, xen_hvm_ecam_space_t), 1 ) )
+ return -EFAULT;
+
+ d = rcu_lock_domain_by_any_id(ecam.domid);
+ if ( d == NULL )
+ return -ESRCH;
+
+ if ( ! d->arch.ecam_addr || ! d->arch.ecam_size ) {
+ rcu_unlock_domain(d);
+ return -EINVAL;
+ }
+
+ ecam.addr = d->arch.ecam_addr;
+ ecam.size = d->arch.ecam_size;
+ rc = __copy_to_guest(arg, &ecam, 1) ? -EFAULT : 0;
+
+ rcu_unlock_domain(d);
+ break;
+ }
+
default:
rc = -ENOSYS;
break;
diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
index ad7f6adb2c..24ec33fc4d 100644
--- a/xen/arch/x86/include/asm/domain.h
+++ b/xen/arch/x86/include/asm/domain.h
@@ -476,6 +476,10 @@ struct arch_domain
/* Emulated devices enabled bitmap. */
uint32_t emulation_flags;
+
+ /* PCI ECAM space emulation */
+ uint64_t ecam_addr;
+ uint32_t ecam_size;
} __cacheline_aligned;
#ifdef CONFIG_HVM
diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
index e22adf0319..c84febc37c 100644
--- a/xen/include/public/hvm/hvm_op.h
+++ b/xen/include/public/hvm/hvm_op.h
@@ -166,6 +166,17 @@ struct xen_hvm_get_mem_type {
typedef struct xen_hvm_get_mem_type xen_hvm_get_mem_type_t;
DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_mem_type_t);
+#define HVMOP_set_ecam_space 16
+#define HVMOP_get_ecam_space 17
+struct xen_hvm_ecam_space {
+ domid_t domid;
+ uint16_t pad[3]; /* align next field on 8-byte boundary */
+ uint64_t addr;
+ uint32_t size;
+};
+typedef struct xen_hvm_ecam_space xen_hvm_ecam_space_t;
+DEFINE_XEN_GUEST_HANDLE(xen_hvm_ecam_space_t);
+
/* Following tools-only interfaces may change in future. */
#if defined(__XEN__) || defined(__XEN_TOOLS__)
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls
2026-03-13 16:35 ` [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls Thierry Escande
@ 2026-04-28 13:59 ` Roger Pau Monné
2026-05-04 11:09 ` Jan Beulich
2026-05-04 15:12 ` Jan Beulich
1 sibling, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 13:59 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini
On Fri, Mar 13, 2026 at 04:35:03PM +0000, Thierry Escande wrote:
> This patch adds 2 HVMOP hypercalls, HVMOP_get|set_ecam_space, used to
> set and get the base address and size of the PCIe ECAM space as
> configured by hvmloader.
>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> xen/arch/x86/hvm/hvm.c | 52 +++++++++++++++++++++++++++++++
> xen/arch/x86/include/asm/domain.h | 4 +++
> xen/include/public/hvm/hvm_op.h | 11 +++++++
> 3 files changed, 67 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index 4d37a93c57..a46dfa955d 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -5195,6 +5195,58 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
> rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
> break;
>
> + case HVMOP_set_ecam_space: {
> + xen_hvm_ecam_space_t ecam;
> + struct domain *d;
> +
> + if ( copy_from_guest( &ecam, guest_handle_cast(arg, xen_hvm_ecam_space_t), 1 ) )
^ extra space, here and at the
closing parenthesis.
Line length is also past the 80 character limit, same below in
HVMOP_get_ecam_space.
> + return -EFAULT;
This operation (and the matching get variant) needs an XSM check.
> +
> + d = rcu_lock_domain_by_any_id(ecam.domid);
> + if ( d == NULL )
> + return -ESRCH;
> +
> + if ( d->arch.ecam_addr ) {
Coding style, opening braces should be on a new line.
> + rcu_unlock_domain(d);
> + return -EFAULT;
This would better return -EBUSY
> + }
You also need to check the padding fields are 0.
> +
> + if ( (ecam.size >> 28) || (!ecam.addr) ) {
^ the parenthesis here are
unneeded.
> + rcu_unlock_domain(d);
> + return -EINVAL;
> + }
> +
> + d->arch.ecam_addr = ecam.addr;
> + d->arch.ecam_size = ecam.size;
I'm a bit worried about a domain being able to set it's own ECAM hole,
assessing all the side-effects of this might be complex.
Won't the code here better check the region passed in the hypercall is
indeed not mapped in the p2m, so that trapping of ECAM accesses works
as expected?
Also, how does the ECAM hole get setup on native? I assume there are
some magic registers in the PCI config space of a platform device that
the firmware uses to position the ECAM space?
Are those trapped by QEMU, in which case won't it be better to do it
the native way (iow: with the config space registers), and let QEMU
forward it to Xen? It would then be QEMU the one to call
HVMOP_set_ecam_space (or whatever hypercall we end up using).
> +
> + rcu_unlock_domain(d);
> + break;
> + }
> +
> + case HVMOP_get_ecam_space: {
> + xen_hvm_ecam_space_t ecam;
> + struct domain *d;
> +
> + if ( copy_from_guest( &ecam, guest_handle_cast(arg, xen_hvm_ecam_space_t), 1 ) )
> + return -EFAULT;
> +
> + d = rcu_lock_domain_by_any_id(ecam.domid);
> + if ( d == NULL )
> + return -ESRCH;
> +
> + if ( ! d->arch.ecam_addr || ! d->arch.ecam_size ) {
> + rcu_unlock_domain(d);
> + return -EINVAL;
> + }
> +
> + ecam.addr = d->arch.ecam_addr;
> + ecam.size = d->arch.ecam_size;
> + rc = __copy_to_guest(arg, &ecam, 1) ? -EFAULT : 0;
> +
> + rcu_unlock_domain(d);
> + break;
> + }
> +
> default:
> rc = -ENOSYS;
> break;
> diff --git a/xen/arch/x86/include/asm/domain.h b/xen/arch/x86/include/asm/domain.h
> index ad7f6adb2c..24ec33fc4d 100644
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -476,6 +476,10 @@ struct arch_domain
>
> /* Emulated devices enabled bitmap. */
> uint32_t emulation_flags;
> +
> + /* PCI ECAM space emulation */
> + uint64_t ecam_addr;
> + uint32_t ecam_size;
This fields would better be in hvm_domain struct, and there you
already have the mmcfg_regions list, which we should aim to use for
the q35 introduced ECAM region.
> } __cacheline_aligned;
>
> #ifdef CONFIG_HVM
> diff --git a/xen/include/public/hvm/hvm_op.h b/xen/include/public/hvm/hvm_op.h
> index e22adf0319..c84febc37c 100644
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -166,6 +166,17 @@ struct xen_hvm_get_mem_type {
> typedef struct xen_hvm_get_mem_type xen_hvm_get_mem_type_t;
> DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_mem_type_t);
>
> +#define HVMOP_set_ecam_space 16
> +#define HVMOP_get_ecam_space 17
> +struct xen_hvm_ecam_space {
> + domid_t domid;
> + uint16_t pad[3]; /* align next field on 8-byte boundary */
> + uint64_t addr;
> + uint32_t size;
There's also a trailing uint32_t padding here on 64bit builds I think?
FWIW, you could do:
domid_t domid;
uint16_t pad;
uint32_t size
uint64_t addr;
As that would reduce the padding in the struct?
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls
2026-04-28 13:59 ` Roger Pau Monné
@ 2026-05-04 11:09 ` Jan Beulich
0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 11:09 UTC (permalink / raw)
To: Roger Pau Monné, Thierry Escande
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini
On 28.04.2026 15:59, Roger Pau Monné wrote:
> On Fri, Mar 13, 2026 at 04:35:03PM +0000, Thierry Escande wrote:
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -5195,6 +5195,58 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>> rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
>> break;
>>
>> + case HVMOP_set_ecam_space: {
>> + xen_hvm_ecam_space_t ecam;
>> + struct domain *d;
>> +
>> + if ( copy_from_guest( &ecam, guest_handle_cast(arg, xen_hvm_ecam_space_t), 1 ) )
> ^ extra space, here and at the
> closing parenthesis.
>
> Line length is also past the 80 character limit, same below in
> HVMOP_get_ecam_space.
>
>> + return -EFAULT;
>
> This operation (and the matching get variant) needs an XSM check.
>
>> +
>> + d = rcu_lock_domain_by_any_id(ecam.domid);
>> + if ( d == NULL )
>> + return -ESRCH;
>> +
>> + if ( d->arch.ecam_addr ) {
>
> Coding style, opening braces should be on a new line.
>
>> + rcu_unlock_domain(d);
>> + return -EFAULT;
>
> This would better return -EBUSY
I agree, yet I'd like to suggest that this may want changing further: If
one can "set" the address, shouldn't one also be able to "clear" it? That
could (pretty) naturally be expressed by ecam.addr being 0 in the request.
Which would then require permitting non-0 .ecam_addr in that specific
case.
>> + if ( (ecam.size >> 28) || (!ecam.addr) ) {
> ^ the parenthesis here are
> unneeded.
>
>> + rcu_unlock_domain(d);
>> + return -EINVAL;
>> + }
>> +
>> + d->arch.ecam_addr = ecam.addr;
>> + d->arch.ecam_size = ecam.size;
>
> I'm a bit worried about a domain being able to set it's own ECAM hole,
> assessing all the side-effects of this might be complex.
>
> Won't the code here better check the region passed in the hypercall is
> indeed not mapped in the p2m, so that trapping of ECAM accesses works
> as expected?
>
> Also, how does the ECAM hole get setup on native? I assume there are
> some magic registers in the PCI config space of a platform device that
> the firmware uses to position the ECAM space?
That may even be outside of any device's config space, e.g. custom MMIO.
I didn't check, but I guess that may also be the case for Q35.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls
2026-03-13 16:35 ` [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls Thierry Escande
2026-04-28 13:59 ` Roger Pau Monné
@ 2026-05-04 15:12 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 15:12 UTC (permalink / raw)
To: Thierry Escande
Cc: Andrew Cooper, Roger Pau Monné, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> This patch adds 2 HVMOP hypercalls, HVMOP_get|set_ecam_space, used to
> set and get the base address and size of the PCIe ECAM space as
> configured by hvmloader.
>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
Just in case we want to stick to these (see Roger's earlier comments
throughout the series), a few remarks here:
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -5195,6 +5195,58 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
> rc = current->hcall_compat ? compat_altp2m_op(arg) : do_altp2m_op(arg);
> break;
>
> + case HVMOP_set_ecam_space: {
> + xen_hvm_ecam_space_t ecam;
> + struct domain *d;
> +
> + if ( copy_from_guest( &ecam, guest_handle_cast(arg, xen_hvm_ecam_space_t), 1 ) )
> + return -EFAULT;
> +
> + d = rcu_lock_domain_by_any_id(ecam.domid);
> + if ( d == NULL )
> + return -ESRCH;
> +
> + if ( d->arch.ecam_addr ) {
> + rcu_unlock_domain(d);
> + return -EFAULT;
> + }
> +
> + if ( (ecam.size >> 28) || (!ecam.addr) ) {
> + rcu_unlock_domain(d);
> + return -EINVAL;
> + }
> +
> + d->arch.ecam_addr = ecam.addr;
> + d->arch.ecam_size = ecam.size;
Shorter (and easier to follow as well as less error prone as to the
rcu_unlock_domain())
if ( d->arch.ecam_addr )
rc = -E...;
else if ( (ecam.size >> 28) || !ecam.addr )
rc = -EINVAL;
else
{
d->arch.ecam_addr = ecam.addr;
d->arch.ecam_size = ecam.size;
}
all utilizing ...
> + rcu_unlock_domain(d);
... this.
The magic 28 also needs (a) explaining and/or (b) abstracting (a
suitably named #define might address both).
> --- a/xen/include/public/hvm/hvm_op.h
> +++ b/xen/include/public/hvm/hvm_op.h
> @@ -166,6 +166,17 @@ struct xen_hvm_get_mem_type {
> typedef struct xen_hvm_get_mem_type xen_hvm_get_mem_type_t;
> DEFINE_XEN_GUEST_HANDLE(xen_hvm_get_mem_type_t);
>
> +#define HVMOP_set_ecam_space 16
> +#define HVMOP_get_ecam_space 17
> +struct xen_hvm_ecam_space {
> + domid_t domid;
> + uint16_t pad[3]; /* align next field on 8-byte boundary */
The comment, as is, is wrong for 32-bit HVM guests: There ...
> + uint64_t addr;
... this is only 4-byte aligned, and hence the entire structure only
has 4-byte alignment, and hence the padding also only guarantees 4-
byte alignment.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 14/17] libacpi: build ACPI MCFG table if requested
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (8 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 09/17] xev/hvm: Add HVMOP_get|set_ecam_space hypercalls Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-29 10:13 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 13/17] libxl: Add xen-platform device for Q35 machine Thierry Escande
` (8 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Anthony PERARD, Alexey Gerasimenko
This adds construct_mcfg() function to libacpi which allows to build MCFG
table for a given mmconfig_addr/mmconfig_len pair if the ACPI_HAS_MCFG
flag was specified in acpi_config struct.
The maximum bus number is calculated from mmconfig_size using
MCFG_SIZE_TO_NUM_BUSES macro (1MByte of MMIO space per bus).
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/libacpi/acpi2_0.h | 17 ++++++++++++++++
tools/libacpi/build.c | 43 +++++++++++++++++++++++++++++++++++++++++
tools/libacpi/libacpi.h | 6 ++++++
3 files changed, 66 insertions(+)
diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
index 51623e2a8a..2b16bd636a 100644
--- a/tools/libacpi/acpi2_0.h
+++ b/tools/libacpi/acpi2_0.h
@@ -442,6 +442,21 @@ struct acpi_20_slit {
uint8_t entry[0];
};
+/*
+ * PCI Express Memory Mapped Configuration Description Table
+ */
+struct acpi_10_mcfg {
+ struct acpi_header header;
+ uint8_t reserved[8];
+ struct {
+ uint64_t base_address;
+ uint16_t pci_segment;
+ uint8_t start_pci_bus_num;
+ uint8_t end_pci_bus_num;
+ uint32_t reserved;
+ } entries[1];
+};
+
/*
* Table Signatures.
*/
@@ -457,6 +472,7 @@ struct acpi_20_slit {
#define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
#define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
#define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
+#define ACPI_MCFG_SIGNATURE ASCII32('M','C','F','G')
/*
* Table revision numbers.
@@ -472,6 +488,7 @@ struct acpi_20_slit {
#define ACPI_1_0_FADT_REVISION 0x01
#define ACPI_2_0_SRAT_REVISION 0x01
#define ACPI_2_0_SLIT_REVISION 0x01
+#define ACPI_1_0_MCFG_REVISION 0x01
#pragma pack ()
diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
index 95188e217e..90080c76c4 100644
--- a/tools/libacpi/build.c
+++ b/tools/libacpi/build.c
@@ -295,6 +295,37 @@ static struct acpi_20_slit *construct_slit(struct acpi_ctxt *ctxt,
return slit;
}
+static struct acpi_10_mcfg *construct_mcfg(struct acpi_ctxt *ctxt,
+ const struct acpi_config *config)
+{
+ struct acpi_10_mcfg *mcfg;
+
+ /* Warning: this code expects that we have only one PCI segment */
+ mcfg = ctxt->mem_ops.alloc(ctxt, sizeof(*mcfg), 16);
+ if ( !mcfg )
+ return NULL;
+
+ memset(mcfg, 0, sizeof(*mcfg));
+ mcfg->header.signature = ACPI_MCFG_SIGNATURE;
+ mcfg->header.revision = ACPI_1_0_MCFG_REVISION;
+ mcfg->header.creator_id = ACPI_CREATOR_ID;
+ mcfg->header.creator_revision = ACPI_CREATOR_REVISION;
+ mcfg->header.length = sizeof(*mcfg);
+ mcfg->header.oem_revision = ACPI_OEM_REVISION;
+ fixed_strcpy(mcfg->header.oem_id, ACPI_OEM_ID);
+ fixed_strcpy(mcfg->header.oem_table_id, ACPI_OEM_TABLE_ID);
+
+ mcfg->entries[0].base_address = config->mmconfig_addr;
+ mcfg->entries[0].pci_segment = 0;
+ mcfg->entries[0].start_pci_bus_num = 0;
+ mcfg->entries[0].end_pci_bus_num =
+ MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_size) - 1;
+
+ set_checksum(mcfg, offsetof(struct acpi_header, checksum), sizeof(*mcfg));
+
+ return mcfg;
+}
+
static int construct_passthrough_tables(struct acpi_ctxt *ctxt,
unsigned long *table_ptrs,
int nr_tables,
@@ -343,6 +374,7 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
struct acpi_20_waet *waet;
struct acpi_20_tcpa *tcpa;
struct acpi_20_tpm2 *tpm2;
+ struct acpi_10_mcfg *mcfg;
unsigned char *ssdt;
void *lasa;
@@ -402,6 +434,17 @@ static int construct_secondary_tables(struct acpi_ctxt *ctxt,
memcpy(ssdt, ssdt_laptop_slate, sizeof(ssdt_laptop_slate));
table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, ssdt);
}
+
+ /* MCFG */
+ if ( config->table_flags & ACPI_HAS_MCFG )
+ {
+ mcfg = construct_mcfg(ctxt, config);
+ if ( !mcfg )
+ return -1;
+
+ table_ptrs[nr_tables++] = ctxt->mem_ops.v2p(ctxt, mcfg);
+ }
+
/* TPM and its SSDT. */
if ( config->table_flags & ACPI_HAS_TPM )
{
diff --git a/tools/libacpi/libacpi.h b/tools/libacpi/libacpi.h
index deda39e5db..b2abda90b1 100644
--- a/tools/libacpi/libacpi.h
+++ b/tools/libacpi/libacpi.h
@@ -27,6 +27,9 @@
#define ACPI_HAS_8042 (1<<13)
#define ACPI_HAS_CMOS_RTC (1<<14)
#define ACPI_HAS_SSDT_LAPTOP_SLATE (1<<15)
+#define ACPI_HAS_MCFG (1<<16)
+
+#define MCFG_SIZE_TO_NUM_BUSES(size) ((size) >> 20)
struct xen_vmemrange;
struct acpi_numa {
@@ -89,6 +92,9 @@ struct acpi_config {
uint32_t ioapic_base_address;
uint16_t pci_isa_irq_mask;
uint8_t ioapic_id;
+
+ uint64_t mmconfig_addr;
+ uint32_t mmconfig_size;
};
int acpi_build_tables(struct acpi_ctxt *ctxt, struct acpi_config *config);
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 14/17] libacpi: build ACPI MCFG table if requested
2026-03-13 16:35 ` [PATCH 14/17] libacpi: build ACPI MCFG table if requested Thierry Escande
@ 2026-04-29 10:13 ` Roger Pau Monné
0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-29 10:13 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Anthony PERARD, Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:04PM +0000, Thierry Escande wrote:
> This adds construct_mcfg() function to libacpi which allows to build MCFG
> table for a given mmconfig_addr/mmconfig_len pair if the ACPI_HAS_MCFG
> flag was specified in acpi_config struct.
>
> The maximum bus number is calculated from mmconfig_size using
> MCFG_SIZE_TO_NUM_BUSES macro (1MByte of MMIO space per bus).
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/libacpi/acpi2_0.h | 17 ++++++++++++++++
> tools/libacpi/build.c | 43 +++++++++++++++++++++++++++++++++++++++++
> tools/libacpi/libacpi.h | 6 ++++++
> 3 files changed, 66 insertions(+)
>
> diff --git a/tools/libacpi/acpi2_0.h b/tools/libacpi/acpi2_0.h
> index 51623e2a8a..2b16bd636a 100644
> --- a/tools/libacpi/acpi2_0.h
> +++ b/tools/libacpi/acpi2_0.h
> @@ -442,6 +442,21 @@ struct acpi_20_slit {
> uint8_t entry[0];
> };
>
> +/*
> + * PCI Express Memory Mapped Configuration Description Table
> + */
> +struct acpi_10_mcfg {
> + struct acpi_header header;
> + uint8_t reserved[8];
> + struct {
> + uint64_t base_address;
> + uint16_t pci_segment;
> + uint8_t start_pci_bus_num;
> + uint8_t end_pci_bus_num;
> + uint32_t reserved;
> + } entries[1];
> +};
> +
> /*
> * Table Signatures.
> */
> @@ -457,6 +472,7 @@ struct acpi_20_slit {
> #define ACPI_2_0_WAET_SIGNATURE ASCII32('W','A','E','T')
> #define ACPI_2_0_SRAT_SIGNATURE ASCII32('S','R','A','T')
> #define ACPI_2_0_SLIT_SIGNATURE ASCII32('S','L','I','T')
> +#define ACPI_MCFG_SIGNATURE ASCII32('M','C','F','G')
>
> /*
> * Table revision numbers.
> @@ -472,6 +488,7 @@ struct acpi_20_slit {
> #define ACPI_1_0_FADT_REVISION 0x01
> #define ACPI_2_0_SRAT_REVISION 0x01
> #define ACPI_2_0_SLIT_REVISION 0x01
> +#define ACPI_1_0_MCFG_REVISION 0x01
>
> #pragma pack ()
>
> diff --git a/tools/libacpi/build.c b/tools/libacpi/build.c
> index 95188e217e..90080c76c4 100644
> --- a/tools/libacpi/build.c
> +++ b/tools/libacpi/build.c
> @@ -295,6 +295,37 @@ static struct acpi_20_slit *construct_slit(struct acpi_ctxt *ctxt,
> return slit;
> }
>
> +static struct acpi_10_mcfg *construct_mcfg(struct acpi_ctxt *ctxt,
> + const struct acpi_config *config)
> +{
> + struct acpi_10_mcfg *mcfg;
> +
> + /* Warning: this code expects that we have only one PCI segment */
Not only one PCI segment, but just one ECAM region. You could in
theory have multiple ECAM regions within a single PCI segment.
> + mcfg = ctxt->mem_ops.alloc(ctxt, sizeof(*mcfg), 16);
> + if ( !mcfg )
> + return NULL;
> +
> + memset(mcfg, 0, sizeof(*mcfg));
> + mcfg->header.signature = ACPI_MCFG_SIGNATURE;
> + mcfg->header.revision = ACPI_1_0_MCFG_REVISION;
> + mcfg->header.creator_id = ACPI_CREATOR_ID;
> + mcfg->header.creator_revision = ACPI_CREATOR_REVISION;
> + mcfg->header.length = sizeof(*mcfg);
> + mcfg->header.oem_revision = ACPI_OEM_REVISION;
> + fixed_strcpy(mcfg->header.oem_id, ACPI_OEM_ID);
> + fixed_strcpy(mcfg->header.oem_table_id, ACPI_OEM_TABLE_ID);
> +
> + mcfg->entries[0].base_address = config->mmconfig_addr;
> + mcfg->entries[0].pci_segment = 0;
> + mcfg->entries[0].start_pci_bus_num = 0;
> + mcfg->entries[0].end_pci_bus_num =
> + MCFG_SIZE_TO_NUM_BUSES(config->mmconfig_size) - 1;
You might want to check that mmconfig_addr and mmconfig_size are set
ahead of using them? Just in case some bogus toolstack/hvmloader sets
ACPI_HAS_MCFG without correctly populating the fields?
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 13/17] libxl: Add xen-platform device for Q35 machine
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (9 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 14/17] libacpi: build ACPI MCFG table if requested Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-03-13 16:35 ` [PATCH 12/17] libxl: Q35 support (new option device_model_machine) Thierry Escande
` (7 subsequent siblings)
18 siblings, 0 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel; +Cc: Thierry Escande, Anthony PERARD, Juergen Gross
Current Xen/QEMU method to control Xen Platform device is done by
setting the 'xen_platform_device' option value that modifies QEMU
emulated machine type, namely xenfv <--> pc.
In order to avoid multiplying machine types, this patch supplies
'-device xen-platform' directly to Qemu. To maintain backward
compatibility with existing Xen/QEMU setups, this is currently only
applicable to q35 machine. i440 emulation uses the old method (xenfv/pc
machine) to control Xen Platform device.
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/libs/light/libxl_dm.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index 36f2813cde..a64e4779d0 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -1811,6 +1811,12 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
flexarray_append(dm_args, state->dm_runas);
}
}
+
+ if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35 &&
+ libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
+ flexarray_append(dm_args, "-device");
+ flexarray_append(dm_args, "xen-platform");
+ }
}
flexarray_append(dm_args, NULL);
*args = (char **) flexarray_contents(dm_args);
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* [PATCH 12/17] libxl: Q35 support (new option device_model_machine)
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (10 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 13/17] libxl: Add xen-platform device for Q35 machine Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-29 10:01 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole Thierry Escande
` (6 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Anthony PERARD, Juergen Gross,
Alexey Gerasimenko
Provide a new domain config option to select the emulated machine type,
device_model_machine. It has following possible values:
- "i440" - i440 emulation (default)
- "q35" - emulate a Q35 machine. By default, the storage interface is
AHCI.
Note that omitting device_model_machine parameter means i440 system
by default, so the default behavior doesn't change for existing domain
config files.
Setting device_model_machine to "q35" sends '-machine q35,accel=xen'
argument to QEMU. Unlike i440, there is no separated machine type to
enable/disable Xen platform device, it is controlled via a machine
property only. See 'libxl: Add xen-platform device for Q35 machine'
patch for a detailed description.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/libs/light/libxl_dm.c | 16 ++++++++++------
tools/libs/light/libxl_types.idl | 7 +++++++
tools/xl/xl_parse.c | 14 ++++++++++++++
3 files changed, 31 insertions(+), 6 deletions(-)
diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index 511ec76a65..36f2813cde 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -1562,13 +1562,17 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
flexarray_append(dm_args, b_info->extra_pv[i]);
break;
case LIBXL_DOMAIN_TYPE_HVM:
- if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
- /* Switching here to the machine "pc" which does not add
- * the xen-platform device instead of the default "xenfv" machine.
- */
- machinearg = libxl__strdup(gc, "pc,accel=xen,suppress-vmdesc=on");
+ if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35) {
+ machinearg = libxl__sprintf(gc, "q35,accel=xen");
} else {
- machinearg = libxl__strdup(gc, "xenfv,suppress-vmdesc=on");
+ if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
+ /* Switching here to the machine "pc" which does not add
+ * the xen-platform device instead of the default "xenfv" machine.
+ */
+ machinearg = libxl__strdup(gc, "pc,accel=xen,suppress-vmdesc=on");
+ } else {
+ machinearg = libxl__strdup(gc, "xenfv,suppress-vmdesc=on");
+ }
}
if (b_info->u.hvm.mmio_hole_memkb) {
uint64_t max_ram_below_4g = (1ULL << 32) -
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index d64a573ff3..f9cd881b66 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -109,6 +109,12 @@ libxl_device_model_version = Enumeration("device_model_version", [
(2, "QEMU_XEN"), # Upstream based qemu-xen device model
])
+libxl_device_model_machine = Enumeration("device_model_machine", [
+ (0, "UNKNOWN"),
+ (1, "I440"),
+ (2, "Q35"),
+ ])
+
libxl_console_type = Enumeration("console_type", [
(0, "UNKNOWN"),
(1, "SERIAL"),
@@ -613,6 +619,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
("device_model_ssidref", uint32),
("device_model_ssid_label", string),
("device_model_user", string),
+ ("device_model_machine", libxl_device_model_machine),
# extra parameters pass directly to qemu, NULL terminated
("extra", libxl_string_list),
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 1a2ea8b5d5..a4346d1693 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -2813,6 +2813,20 @@ skip_usbdev:
if (!xlu_cfg_get_long (config, "stubdomain_memory", &l, 0))
b_info->stubdomain_memkb = l * 1024;
+ if (!xlu_cfg_get_string (config, "device_model_machine", &buf, 0)) {
+ if (!strcmp(buf, "i440")) {
+ b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_I440;
+ } else if (!strcmp(buf, "q35")) {
+ b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_Q35;
+ } else {
+ fprintf(stderr,
+ "Unknown device_model_machine \"%s\" specified\n", buf);
+ exit(1);
+ }
+ } else {
+ b_info->device_model_machine = LIBXL_DEVICE_MODEL_MACHINE_UNKNOWN;
+ }
+
#define parse_extra_args(type) \
e = xlu_cfg_get_list_as_string_list(config, "device_model_args"#type, \
&b_info->extra##type, 0); \
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 12/17] libxl: Q35 support (new option device_model_machine)
2026-03-13 16:35 ` [PATCH 12/17] libxl: Q35 support (new option device_model_machine) Thierry Escande
@ 2026-04-29 10:01 ` Roger Pau Monné
0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-29 10:01 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Anthony PERARD, Juergen Gross, Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:04PM +0000, Thierry Escande wrote:
> Provide a new domain config option to select the emulated machine type,
> device_model_machine. It has following possible values:
> - "i440" - i440 emulation (default)
> - "q35" - emulate a Q35 machine. By default, the storage interface is
> AHCI.
>
> Note that omitting device_model_machine parameter means i440 system
> by default, so the default behavior doesn't change for existing domain
> config files.
>
> Setting device_model_machine to "q35" sends '-machine q35,accel=xen'
> argument to QEMU. Unlike i440, there is no separated machine type to
> enable/disable Xen platform device, it is controlled via a machine
> property only. See 'libxl: Add xen-platform device for Q35 machine'
> patch for a detailed description.
Not an explicit objection to this patch, but I wonder what will we do
for PVH when we start exposing PCI devices. We cannot provide a fully
complete emulated Q35, but we do need to expose an MCFG for extended
config space. The current naming "device_model_machine" won't work
for PVH, as there's no device model there. But at the same time I
wonder whether what we end up exposing to PVH would resemble any
physical chipsets, or it's more likely going to be the minimum needed
to make PVH guests happy to access the PCI config space (and hence we
might end up emulating too little to match any chipset).
Thanks, Roger.
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/libs/light/libxl_dm.c | 16 ++++++++++------
> tools/libs/light/libxl_types.idl | 7 +++++++
> tools/xl/xl_parse.c | 14 ++++++++++++++
> 3 files changed, 31 insertions(+), 6 deletions(-)
>
> diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
> index 511ec76a65..36f2813cde 100644
> --- a/tools/libs/light/libxl_dm.c
> +++ b/tools/libs/light/libxl_dm.c
> @@ -1562,13 +1562,17 @@ static int libxl__build_device_model_args_new(libxl__gc *gc,
> flexarray_append(dm_args, b_info->extra_pv[i]);
> break;
> case LIBXL_DOMAIN_TYPE_HVM:
> - if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> - /* Switching here to the machine "pc" which does not add
> - * the xen-platform device instead of the default "xenfv" machine.
> - */
> - machinearg = libxl__strdup(gc, "pc,accel=xen,suppress-vmdesc=on");
> + if (b_info->device_model_machine == LIBXL_DEVICE_MODEL_MACHINE_Q35) {
> + machinearg = libxl__sprintf(gc, "q35,accel=xen");
> } else {
> - machinearg = libxl__strdup(gc, "xenfv,suppress-vmdesc=on");
> + if (!libxl_defbool_val(b_info->u.hvm.xen_platform_pci)) {
> + /* Switching here to the machine "pc" which does not add
> + * the xen-platform device instead of the default "xenfv" machine.
> + */
> + machinearg = libxl__strdup(gc, "pc,accel=xen,suppress-vmdesc=on");
> + } else {
> + machinearg = libxl__strdup(gc, "xenfv,suppress-vmdesc=on");
> + }
> }
> if (b_info->u.hvm.mmio_hole_memkb) {
> uint64_t max_ram_below_4g = (1ULL << 32) -
> diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
> index d64a573ff3..f9cd881b66 100644
> --- a/tools/libs/light/libxl_types.idl
> +++ b/tools/libs/light/libxl_types.idl
> @@ -109,6 +109,12 @@ libxl_device_model_version = Enumeration("device_model_version", [
> (2, "QEMU_XEN"), # Upstream based qemu-xen device model
> ])
>
> +libxl_device_model_machine = Enumeration("device_model_machine", [
> + (0, "UNKNOWN"),
> + (1, "I440"),
> + (2, "Q35"),
> + ])
> +
> libxl_console_type = Enumeration("console_type", [
> (0, "UNKNOWN"),
> (1, "SERIAL"),
> @@ -613,6 +619,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
> ("device_model_ssidref", uint32),
> ("device_model_ssid_label", string),
> ("device_model_user", string),
> + ("device_model_machine", libxl_device_model_machine),
This possibly wants to be inside the u.hvm sub-structure. I don't
think we want to use it for PVH, not with this current naming at
least.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (11 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 12/17] libxl: Q35 support (new option device_model_machine) Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-29 9:29 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 15/17] hvmloader: Set MCFG in ACPI table Thierry Escande
` (5 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Alexey Gerasimenko
The actual MMCONFIG size depends on the number of PCI buses available
which should be covered by ECAM. Possible options are 64MB, 128MB and
256MB. As Xen is limited to the bus 0 currently, the lowest possible
setting is used (64MB), defined via PCI_MAX_MCFG_BUSES in
hvmloader/config.h. When multiple PCI buses support for Xen will be
implemented, PCI_MAX_MCFG_BUSES may be replaced by a calculation of the
number of buses according to PCI devices enumeration.
The MMCONFIG entry is inserted into bars array in the same manner like
for any other BARs. In this case, the devfn field will point to MCH PCI
device and bar_reg will contain PCIEXBAR register offset. It will be
assigned a slot in the MMIO hole later in a very same way like for plain
PCI BARs, with respect to its size and alignment. At this point, the
actual base address and size of the ECAM space are passed to Xen using
the HVMOP_set_ecam_space hypercall.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/config.h | 4 +++
tools/firmware/hvmloader/pci.c | 55 +++++++++++++++++++++++++++++
tools/firmware/hvmloader/pci_regs.h | 7 ++++
3 files changed, 66 insertions(+)
diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
index baaed91c7f..aa3158bca5 100644
--- a/tools/firmware/hvmloader/config.h
+++ b/tools/firmware/hvmloader/config.h
@@ -55,6 +55,10 @@ extern uint32_t *cpu_to_apicid;
#define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
#define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
#define PCI_ICH9_LPC_DEVFN 0xf8 /* dev 31, fn 0 */
+#define PCI_MCH_DEVFN 0 /* bus 0, dev 0, func 0 */
+
+/* possible values are: 64, 128, 256 */
+#define PCI_MAX_MCFG_BUSES 64
#define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
index 6e6720adae..54c23ffdd8 100644
--- a/tools/firmware/hvmloader/pci.c
+++ b/tools/firmware/hvmloader/pci.c
@@ -413,6 +413,58 @@ void pci_setup(void)
pci_devfn_decode_type[devfn] |= PCI_COMMAND_MASTER;
}
+ /*
+ * Calculate MMCONFIG area size and squeeze it into the bars array
+ * for assigning a slot in the MMIO hole
+ */
+ if ( is_running_on_q35 )
+ {
+ /* disable PCIEXBAR decoding for now */
+ pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
+ pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);
+
+ switch ( PCI_MAX_MCFG_BUSES )
+ {
+ case 64:
+ bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
+ bar_sz = MB(64);
+ break;
+
+ case 128:
+ bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
+ bar_sz = MB(128);
+ break;
+
+ case 256:
+ bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
+ bar_sz = MB(256);
+ break;
+
+ default:
+ /* unsupported number of buses specified */
+ BUG();
+ }
+
+ addr_mask = ~(bar_sz - 1);
+
+ for ( i = 0; i < nr_bars; i++ )
+ if ( bars[i].bar_sz < bar_sz )
+ break;
+
+ if ( i != nr_bars )
+ memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
+
+ bars[i].is_mem = 1;
+ bars[i].devfn = PCI_MCH_DEVFN;
+ bars[i].bar_reg = PCI_MCH_PCIEXBAR;
+ bars[i].bar_sz = bar_sz;
+ bars[i].addr_mask = addr_mask;
+ bars[i].bar_data = bar_data;
+
+ mmio_total += bar_sz;
+ nr_bars++;
+ }
+
if ( mmio_hole_size )
{
uint64_t max_ram_below_4g = GB(4) - mmio_hole_size;
@@ -592,6 +644,9 @@ void pci_setup(void)
}
}
+ if ( bar_reg == PCI_MCH_PCIEXBAR )
+ hvm_set_ecam_space(base, bar_sz);
+
bar_data |= (uint32_t) (base & bars[i].addr_mask);
bar_data_upper = (uint32_t)(base >> 32);
base += bar_sz;
diff --git a/tools/firmware/hvmloader/pci_regs.h b/tools/firmware/hvmloader/pci_regs.h
index d217b8f1a4..86e04f8bbd 100644
--- a/tools/firmware/hvmloader/pci_regs.h
+++ b/tools/firmware/hvmloader/pci_regs.h
@@ -116,6 +116,13 @@
#define PCI_CLASS_BRIDGE_OTHER 0x0680
#define PCI_CLASS_BRIDGE_ISA 0x0601
+#define PCI_MCH_PCIEXBAR 0x60
+
+#define PCIEXBAR_64_BUSES 0x04
+#define PCIEXBAR_128_BUSES 0x02
+#define PCIEXBAR_256_BUSES 0x00
+#define PCIEXBAR_ENABLE 0x01
+
#endif /* __HVMLOADER_PCI_REGS_H__ */
/*
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole
2026-03-13 16:35 ` [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole Thierry Escande
@ 2026-04-29 9:29 ` Roger Pau Monné
2026-05-04 11:11 ` Jan Beulich
0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-29 9:29 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:04PM +0000, Thierry Escande wrote:
> The actual MMCONFIG size depends on the number of PCI buses available
> which should be covered by ECAM. Possible options are 64MB, 128MB and
> 256MB.
Are such values inherited from the real q35 impleemntation?
AFAICT the ACPI MCFG spec notes:
"The size of the memory mapped configuration region is indicated by
the start and end bus number fields in the Memory mapped Enhanced
configuration space base address allocation structure as shown in
Table 4-3. 0-255 is the range of allowed bus numbers supported for a
given PCI Segment Group."
So it's in principle possible to specify a MCFG that covers a single
bus, and then it would have a size of 256 * 4K = 1M. Which avoids
wasting 63M of MMIO space in the low MMIO hole that's already fairly
tight on space.
Is this limitation possibly inherited from the way the ECAM region
position and size must be notified to the chipset?
And further seeing the code below - I found the answer myself, it's
because the chipset only supports negotiation those ECAM sizes. We
could possibly expose a smaller region in MCFG, but doesn't seem like
a good move.
> As Xen is limited to the bus 0 currently, the lowest possible
> setting is used (64MB), defined via PCI_MAX_MCFG_BUSES in
> hvmloader/config.h. When multiple PCI buses support for Xen will be
> implemented, PCI_MAX_MCFG_BUSES may be replaced by a calculation of the
> number of buses according to PCI devices enumeration.
>
> The MMCONFIG entry is inserted into bars array in the same manner like
> for any other BARs. In this case, the devfn field will point to MCH PCI
> device and bar_reg will contain PCIEXBAR register offset. It will be
> assigned a slot in the MMIO hole later in a very same way like for plain
> PCI BARs, with respect to its size and alignment. At this point, the
> actual base address and size of the ECAM space are passed to Xen using
> the HVMOP_set_ecam_space hypercall.
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/config.h | 4 +++
> tools/firmware/hvmloader/pci.c | 55 +++++++++++++++++++++++++++++
> tools/firmware/hvmloader/pci_regs.h | 7 ++++
> 3 files changed, 66 insertions(+)
>
> diff --git a/tools/firmware/hvmloader/config.h b/tools/firmware/hvmloader/config.h
> index baaed91c7f..aa3158bca5 100644
> --- a/tools/firmware/hvmloader/config.h
> +++ b/tools/firmware/hvmloader/config.h
> @@ -55,6 +55,10 @@ extern uint32_t *cpu_to_apicid;
> #define PCI_ISA_DEVFN 0x08 /* dev 1, fn 0 */
> #define PCI_ISA_IRQ_MASK 0x0c20U /* ISA IRQs 5,10,11 are PCI connected */
> #define PCI_ICH9_LPC_DEVFN 0xf8 /* dev 31, fn 0 */
> +#define PCI_MCH_DEVFN 0 /* bus 0, dev 0, func 0 */
> +
> +/* possible values are: 64, 128, 256 */
> +#define PCI_MAX_MCFG_BUSES 64
>
> #define ACPI_TIS_HDR_ADDRESS 0xFED40F00UL
>
> diff --git a/tools/firmware/hvmloader/pci.c b/tools/firmware/hvmloader/pci.c
> index 6e6720adae..54c23ffdd8 100644
> --- a/tools/firmware/hvmloader/pci.c
> +++ b/tools/firmware/hvmloader/pci.c
> @@ -413,6 +413,58 @@ void pci_setup(void)
> pci_devfn_decode_type[devfn] |= PCI_COMMAND_MASTER;
> }
>
> + /*
> + * Calculate MMCONFIG area size and squeeze it into the bars array
> + * for assigning a slot in the MMIO hole
> + */
> + if ( is_running_on_q35 )
> + {
> + /* disable PCIEXBAR decoding for now */
> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);
> +
> + switch ( PCI_MAX_MCFG_BUSES )
> + {
> + case 64:
> + bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
> + bar_sz = MB(64);
> + break;
> +
> + case 128:
> + bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
> + bar_sz = MB(128);
> + break;
> +
> + case 256:
> + bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
> + bar_sz = MB(256);
> + break;
> +
> + default:
> + /* unsupported number of buses specified */
> + BUG();
> + }
> +
> + addr_mask = ~(bar_sz - 1);
> +
> + for ( i = 0; i < nr_bars; i++ )
> + if ( bars[i].bar_sz < bar_sz )
> + break;
> +
> + if ( i != nr_bars )
> + memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
> +
> + bars[i].is_mem = 1;
> + bars[i].devfn = PCI_MCH_DEVFN;
> + bars[i].bar_reg = PCI_MCH_PCIEXBAR;
> + bars[i].bar_sz = bar_sz;
> + bars[i].addr_mask = addr_mask;
> + bars[i].bar_data = bar_data;
> +
> + mmio_total += bar_sz;
> + nr_bars++;
> + }
I think it might be best if the ECAM fake BAR is the first element in
the bars array, so we ensure it's the first item to consume memory
from the low MMIO hole. Not sure how that will work with the current
sorting of the resources based on their size, but it's imperative for
hvmloader to attempt to position ECAM ahead of the other device
resources IMO.
> +
> if ( mmio_hole_size )
> {
> uint64_t max_ram_below_4g = GB(4) - mmio_hole_size;
> @@ -592,6 +644,9 @@ void pci_setup(void)
> }
> }
>
> + if ( bar_reg == PCI_MCH_PCIEXBAR )
> + hvm_set_ecam_space(base, bar_sz);
As noted in a previous patch, it would be better if it's QEMU (as part
of handling the PCI_MCH_PCIEXBAR writes) that notifies Xen of the ECAM
window placement.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole
2026-04-29 9:29 ` Roger Pau Monné
@ 2026-05-04 11:11 ` Jan Beulich
2026-05-04 12:23 ` Roger Pau Monné
0 siblings, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 11:11 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Alexey Gerasimenko,
Thierry Escande
On 29.04.2026 11:29, Roger Pau Monné wrote:
> On Fri, Mar 13, 2026 at 04:35:04PM +0000, Thierry Escande wrote:
>> --- a/tools/firmware/hvmloader/pci.c
>> +++ b/tools/firmware/hvmloader/pci.c
>> @@ -413,6 +413,58 @@ void pci_setup(void)
>> pci_devfn_decode_type[devfn] |= PCI_COMMAND_MASTER;
>> }
>>
>> + /*
>> + * Calculate MMCONFIG area size and squeeze it into the bars array
>> + * for assigning a slot in the MMIO hole
>> + */
>> + if ( is_running_on_q35 )
>> + {
>> + /* disable PCIEXBAR decoding for now */
>> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
>> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);
>> +
>> + switch ( PCI_MAX_MCFG_BUSES )
>> + {
>> + case 64:
>> + bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
>> + bar_sz = MB(64);
>> + break;
>> +
>> + case 128:
>> + bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
>> + bar_sz = MB(128);
>> + break;
>> +
>> + case 256:
>> + bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
>> + bar_sz = MB(256);
>> + break;
>> +
>> + default:
>> + /* unsupported number of buses specified */
>> + BUG();
>> + }
>> +
>> + addr_mask = ~(bar_sz - 1);
>> +
>> + for ( i = 0; i < nr_bars; i++ )
>> + if ( bars[i].bar_sz < bar_sz )
>> + break;
>> +
>> + if ( i != nr_bars )
>> + memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>> +
>> + bars[i].is_mem = 1;
>> + bars[i].devfn = PCI_MCH_DEVFN;
>> + bars[i].bar_reg = PCI_MCH_PCIEXBAR;
>> + bars[i].bar_sz = bar_sz;
>> + bars[i].addr_mask = addr_mask;
>> + bars[i].bar_data = bar_data;
>> +
>> + mmio_total += bar_sz;
>> + nr_bars++;
>> + }
>
> I think it might be best if the ECAM fake BAR is the first element in
> the bars array, so we ensure it's the first item to consume memory
> from the low MMIO hole. Not sure how that will work with the current
> sorting of the resources based on their size, but it's imperative for
> hvmloader to attempt to position ECAM ahead of the other device
> resources IMO.
Why would this be?
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole
2026-05-04 11:11 ` Jan Beulich
@ 2026-05-04 12:23 ` Roger Pau Monné
2026-05-04 12:36 ` Jan Beulich
0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2026-05-04 12:23 UTC (permalink / raw)
To: Jan Beulich
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Alexey Gerasimenko,
Thierry Escande
On Mon, May 04, 2026 at 01:11:44PM +0200, Jan Beulich wrote:
> On 29.04.2026 11:29, Roger Pau Monné wrote:
> > On Fri, Mar 13, 2026 at 04:35:04PM +0000, Thierry Escande wrote:
> >> --- a/tools/firmware/hvmloader/pci.c
> >> +++ b/tools/firmware/hvmloader/pci.c
> >> @@ -413,6 +413,58 @@ void pci_setup(void)
> >> pci_devfn_decode_type[devfn] |= PCI_COMMAND_MASTER;
> >> }
> >>
> >> + /*
> >> + * Calculate MMCONFIG area size and squeeze it into the bars array
> >> + * for assigning a slot in the MMIO hole
> >> + */
> >> + if ( is_running_on_q35 )
> >> + {
> >> + /* disable PCIEXBAR decoding for now */
> >> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
> >> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);
> >> +
> >> + switch ( PCI_MAX_MCFG_BUSES )
> >> + {
> >> + case 64:
> >> + bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
> >> + bar_sz = MB(64);
> >> + break;
> >> +
> >> + case 128:
> >> + bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
> >> + bar_sz = MB(128);
> >> + break;
> >> +
> >> + case 256:
> >> + bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
> >> + bar_sz = MB(256);
> >> + break;
> >> +
> >> + default:
> >> + /* unsupported number of buses specified */
> >> + BUG();
> >> + }
> >> +
> >> + addr_mask = ~(bar_sz - 1);
> >> +
> >> + for ( i = 0; i < nr_bars; i++ )
> >> + if ( bars[i].bar_sz < bar_sz )
> >> + break;
> >> +
> >> + if ( i != nr_bars )
> >> + memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
> >> +
> >> + bars[i].is_mem = 1;
> >> + bars[i].devfn = PCI_MCH_DEVFN;
> >> + bars[i].bar_reg = PCI_MCH_PCIEXBAR;
> >> + bars[i].bar_sz = bar_sz;
> >> + bars[i].addr_mask = addr_mask;
> >> + bars[i].bar_data = bar_data;
> >> +
> >> + mmio_total += bar_sz;
> >> + nr_bars++;
> >> + }
> >
> > I think it might be best if the ECAM fake BAR is the first element in
> > the bars array, so we ensure it's the first item to consume memory
> > from the low MMIO hole. Not sure how that will work with the current
> > sorting of the resources based on their size, but it's imperative for
> > hvmloader to attempt to position ECAM ahead of the other device
> > resources IMO.
>
> Why would this be?
I would assume it's best to have ECAM access in the low 4G (for 32bit
OSes) at the expense of some 32bit BARs possibly not fitting in the
32bit space. But the ECAM space could be placed above 4G, and 32bit
OSes might not care much about extended address space capabilities.
Should is_64bar be set for the MCFG "fake" BAR?
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole
2026-05-04 12:23 ` Roger Pau Monné
@ 2026-05-04 12:36 ` Jan Beulich
0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 12:36 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Alexey Gerasimenko,
Thierry Escande
On 04.05.2026 14:23, Roger Pau Monné wrote:
> On Mon, May 04, 2026 at 01:11:44PM +0200, Jan Beulich wrote:
>> On 29.04.2026 11:29, Roger Pau Monné wrote:
>>> On Fri, Mar 13, 2026 at 04:35:04PM +0000, Thierry Escande wrote:
>>>> --- a/tools/firmware/hvmloader/pci.c
>>>> +++ b/tools/firmware/hvmloader/pci.c
>>>> @@ -413,6 +413,58 @@ void pci_setup(void)
>>>> pci_devfn_decode_type[devfn] |= PCI_COMMAND_MASTER;
>>>> }
>>>>
>>>> + /*
>>>> + * Calculate MMCONFIG area size and squeeze it into the bars array
>>>> + * for assigning a slot in the MMIO hole
>>>> + */
>>>> + if ( is_running_on_q35 )
>>>> + {
>>>> + /* disable PCIEXBAR decoding for now */
>>>> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR, 0);
>>>> + pci_writel(PCI_MCH_DEVFN, PCI_MCH_PCIEXBAR + 4, 0);
>>>> +
>>>> + switch ( PCI_MAX_MCFG_BUSES )
>>>> + {
>>>> + case 64:
>>>> + bar_data = PCIEXBAR_64_BUSES | PCIEXBAR_ENABLE;
>>>> + bar_sz = MB(64);
>>>> + break;
>>>> +
>>>> + case 128:
>>>> + bar_data = PCIEXBAR_128_BUSES | PCIEXBAR_ENABLE;
>>>> + bar_sz = MB(128);
>>>> + break;
>>>> +
>>>> + case 256:
>>>> + bar_data = PCIEXBAR_256_BUSES | PCIEXBAR_ENABLE;
>>>> + bar_sz = MB(256);
>>>> + break;
>>>> +
>>>> + default:
>>>> + /* unsupported number of buses specified */
>>>> + BUG();
>>>> + }
>>>> +
>>>> + addr_mask = ~(bar_sz - 1);
>>>> +
>>>> + for ( i = 0; i < nr_bars; i++ )
>>>> + if ( bars[i].bar_sz < bar_sz )
>>>> + break;
>>>> +
>>>> + if ( i != nr_bars )
>>>> + memmove(&bars[i+1], &bars[i], (nr_bars-i) * sizeof(*bars));
>>>> +
>>>> + bars[i].is_mem = 1;
>>>> + bars[i].devfn = PCI_MCH_DEVFN;
>>>> + bars[i].bar_reg = PCI_MCH_PCIEXBAR;
>>>> + bars[i].bar_sz = bar_sz;
>>>> + bars[i].addr_mask = addr_mask;
>>>> + bars[i].bar_data = bar_data;
>>>> +
>>>> + mmio_total += bar_sz;
>>>> + nr_bars++;
>>>> + }
>>>
>>> I think it might be best if the ECAM fake BAR is the first element in
>>> the bars array, so we ensure it's the first item to consume memory
>>> from the low MMIO hole. Not sure how that will work with the current
>>> sorting of the resources based on their size, but it's imperative for
>>> hvmloader to attempt to position ECAM ahead of the other device
>>> resources IMO.
>>
>> Why would this be?
>
> I would assume it's best to have ECAM access in the low 4G (for 32bit
> OSes) at the expense of some 32bit BARs possibly not fitting in the
> 32bit space. But the ECAM space could be placed above 4G, and 32bit
> OSes might not care much about extended address space capabilities.
This doesn't require it to be 1st, though? Ideally we would try to put
it below 4G if possible (i.e. if all other BARs which aren't 64-bit
capable still fit), but fall back to putting it above 4G otherwise. In
all of this it would still be put in the slot that its size demands.
If we put it first, a significant gap could result between it and the
first "real" BAR. (Then again a significant gap could also result if
the number of buses to cover wasn't a power of 2.) Properly making use
of such a gap would further complicate the code, I expect.
> Should is_64bar be set for the MCFG "fake" BAR?
As per above, maybe when making a 2nd (retry) pass.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 15/17] hvmloader: Set MCFG in ACPI table
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (12 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 11/17] hvmloader: allocate MMCONFIG area in the MMIO hole Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-29 12:33 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 04/17] hvmloader: add ACPI enabling for Q35 Thierry Escande
` (4 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD
This patch extends hvmloader_acpi_build_tables() to detect if MMCONFIG
is available by obtaining its base address and size from the hypercall
HVMOP_get_ecam_space and sets the flag ACPI_HAS_MCFG in the ACPI config
if needed.
This also sets the MMCONFIG area in E820 map using the same method.
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/e820.c | 11 +++++++++++
tools/firmware/hvmloader/util.c | 9 +++++++++
2 files changed, 20 insertions(+)
diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
index 86d39544e8..ff5c270f57 100644
--- a/tools/firmware/hvmloader/e820.c
+++ b/tools/firmware/hvmloader/e820.c
@@ -158,6 +158,8 @@ int build_e820_table(struct e820entry *e820,
unsigned long acpi_mem_end = acpi_enabled ?
ACPI_MEMORY_DYNAMIC_START + (acpi_pages_allocated() << PAGE_SHIFT) :
RESERVED_MEMBASE;
+ uint64_t mmconfig_addr;
+ uint32_t mmconfig_size;
if ( !lowmem_reserved_base )
lowmem_reserved_base = 0xA0000;
@@ -260,6 +262,15 @@ int build_e820_table(struct e820entry *e820,
nr++;
}
+ /* mark MMCONFIG area */
+ if ( ! hvm_get_ecam_space(&mmconfig_addr, &mmconfig_size) )
+ {
+ e820[nr].addr = mmconfig_addr;
+ e820[nr].size = mmconfig_size;
+ e820[nr].type = E820_RESERVED;
+ nr++;
+ }
+
/* Low RAM goes here. Reserve space for special pages. */
BUG_ON(low_mem_end < MB(2));
diff --git a/tools/firmware/hvmloader/util.c b/tools/firmware/hvmloader/util.c
index ee7a09b5bc..2cd1cadfc3 100644
--- a/tools/firmware/hvmloader/util.c
+++ b/tools/firmware/hvmloader/util.c
@@ -903,6 +903,8 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
struct acpi_ctxt ctxt;
long long tpm_version;
char *end;
+ uint64_t mmconfig_addr;
+ uint32_t mmconfig_size;
/* Allocate and initialise the acpi info area. */
mem_hole_populate_ram(ACPI_INFO_PHYSICAL_ADDRESS >> PAGE_SHIFT, 1);
@@ -953,6 +955,13 @@ void hvmloader_acpi_build_tables(struct acpi_config *config,
config->pci_hi_len = pci_hi_mem_end - pci_hi_mem_start;
}
+ if ( ! hvm_get_ecam_space(&mmconfig_addr, &mmconfig_size) )
+ {
+ config->table_flags |= ACPI_HAS_MCFG;
+ config->mmconfig_addr = mmconfig_addr;
+ config->mmconfig_size = mmconfig_size;
+ }
+
s = xenstore_read("platform/generation-id", "0:0");
if ( s )
{
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 15/17] hvmloader: Set MCFG in ACPI table
2026-03-13 16:35 ` [PATCH 15/17] hvmloader: Set MCFG in ACPI table Thierry Escande
@ 2026-04-29 12:33 ` Roger Pau Monné
0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-29 12:33 UTC (permalink / raw)
To: Thierry Escande; +Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD
On Fri, Mar 13, 2026 at 04:35:05PM +0000, Thierry Escande wrote:
> This patch extends hvmloader_acpi_build_tables() to detect if MMCONFIG
> is available by obtaining its base address and size from the hypercall
> HVMOP_get_ecam_space and sets the flag ACPI_HAS_MCFG in the ACPI config
> if needed.
>
> This also sets the MMCONFIG area in E820 map using the same method.
>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> tools/firmware/hvmloader/e820.c | 11 +++++++++++
> tools/firmware/hvmloader/util.c | 9 +++++++++
> 2 files changed, 20 insertions(+)
>
> diff --git a/tools/firmware/hvmloader/e820.c b/tools/firmware/hvmloader/e820.c
> index 86d39544e8..ff5c270f57 100644
> --- a/tools/firmware/hvmloader/e820.c
> +++ b/tools/firmware/hvmloader/e820.c
> @@ -158,6 +158,8 @@ int build_e820_table(struct e820entry *e820,
> unsigned long acpi_mem_end = acpi_enabled ?
> ACPI_MEMORY_DYNAMIC_START + (acpi_pages_allocated() << PAGE_SHIFT) :
> RESERVED_MEMBASE;
> + uint64_t mmconfig_addr;
> + uint32_t mmconfig_size;
>
> if ( !lowmem_reserved_base )
> lowmem_reserved_base = 0xA0000;
> @@ -260,6 +262,15 @@ int build_e820_table(struct e820entry *e820,
> nr++;
> }
>
> + /* mark MMCONFIG area */
> + if ( ! hvm_get_ecam_space(&mmconfig_addr, &mmconfig_size) )
^ extra space
However, having to query the hypervisor for something that has been
set by hvmloader itself seems very inefficient. Just store the values
in global variables so they can be consumed from here?
Same for the usage below.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 04/17] hvmloader: add ACPI enabling for Q35
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (13 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 15/17] hvmloader: Set MCFG in ACPI table Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-28 10:48 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 17/17] docs: provide description for device_model_machine option Thierry Escande
` (3 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Alexey Gerasimenko
In order to turn on ACPI for OS, we need to write a chipset-specific value
to SMI_CMD register (sort of imitation of the APM->ACPI switch on real
systems). Modify acpi_enable_sci() function to support both i440 and Q35
emulation.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
tools/firmware/hvmloader/hvmloader.c | 19 +++++++++++++++++--
1 file changed, 17 insertions(+), 2 deletions(-)
diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
index 626cc53649..f6cc3fa4b9 100644
--- a/tools/firmware/hvmloader/hvmloader.c
+++ b/tools/firmware/hvmloader/hvmloader.c
@@ -258,9 +258,24 @@ static const struct bios_config *detect_bios(void)
static void acpi_enable_sci(void)
{
uint8_t pm1a_cnt_val;
+ uint8_t acpi_enable_val;
-#define PIIX4_SMI_CMD_IOPORT 0xb2
+#define SMI_CMD_IOPORT 0xb2
#define PIIX4_ACPI_ENABLE 0xf1
+#define ICH9_ACPI_ENABLE 0x02
+
+ switch ( machine_type )
+ {
+ case MACHINE_TYPE_Q35:
+ acpi_enable_val = ICH9_ACPI_ENABLE;
+ break;
+ case MACHINE_TYPE_I440:
+ acpi_enable_val = PIIX4_ACPI_ENABLE;
+ break;
+ default:
+ /* Not likely to happen */
+ BUG();
+ }
/*
* PIIX4 emulation in QEMU has SCI_EN=0 by default. We have no legacy
@@ -268,7 +283,7 @@ static void acpi_enable_sci(void)
*/
pm1a_cnt_val = inb(ACPI_PM1A_CNT_BLK_ADDRESS_V1);
if ( !(pm1a_cnt_val & ACPI_PM1C_SCI_EN) )
- outb(PIIX4_SMI_CMD_IOPORT, PIIX4_ACPI_ENABLE);
+ outb(SMI_CMD_IOPORT, acpi_enable_val);
pm1a_cnt_val = inb(ACPI_PM1A_CNT_BLK_ADDRESS_V1);
BUG_ON(!(pm1a_cnt_val & ACPI_PM1C_SCI_EN));
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 04/17] hvmloader: add ACPI enabling for Q35
2026-03-13 16:35 ` [PATCH 04/17] hvmloader: add ACPI enabling for Q35 Thierry Escande
@ 2026-04-28 10:48 ` Roger Pau Monné
2026-05-05 13:58 ` Alexey G
0 siblings, 1 reply; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 10:48 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Alexey Gerasimenko
On Fri, Mar 13, 2026 at 04:35:05PM +0000, Thierry Escande wrote:
> In order to turn on ACPI for OS, we need to write a chipset-specific value
> to SMI_CMD register (sort of imitation of the APM->ACPI switch on real
> systems). Modify acpi_enable_sci() function to support both i440 and Q35
> emulation.
>
> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
It's not great to add more stuff into hvmloader when we want to move
out of it, but it's also not helpful to tie the Q35 addition to the
removal of hvmloader.
> ---
> tools/firmware/hvmloader/hvmloader.c | 19 +++++++++++++++++--
> 1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/tools/firmware/hvmloader/hvmloader.c b/tools/firmware/hvmloader/hvmloader.c
> index 626cc53649..f6cc3fa4b9 100644
> --- a/tools/firmware/hvmloader/hvmloader.c
> +++ b/tools/firmware/hvmloader/hvmloader.c
> @@ -258,9 +258,24 @@ static const struct bios_config *detect_bios(void)
> static void acpi_enable_sci(void)
> {
> uint8_t pm1a_cnt_val;
> + uint8_t acpi_enable_val;
>
> -#define PIIX4_SMI_CMD_IOPORT 0xb2
> +#define SMI_CMD_IOPORT 0xb2
> #define PIIX4_ACPI_ENABLE 0xf1
> +#define ICH9_ACPI_ENABLE 0x02
> +
> + switch ( machine_type )
> + {
> + case MACHINE_TYPE_Q35:
> + acpi_enable_val = ICH9_ACPI_ENABLE;
> + break;
> + case MACHINE_TYPE_I440:
> + acpi_enable_val = PIIX4_ACPI_ENABLE;
> + break;
We might want to add a newline after the break statements.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 04/17] hvmloader: add ACPI enabling for Q35
2026-04-28 10:48 ` Roger Pau Monné
@ 2026-05-05 13:58 ` Alexey G
2026-05-05 14:25 ` Roger Pau Monné
0 siblings, 1 reply; 59+ messages in thread
From: Alexey G @ 2026-05-05 13:58 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Thierry Escande, xen-devel, Jan Beulich, Andrew Cooper,
Anthony PERARD
On Tue, 28 Apr 2026 12:48:23 +0200
Roger Pau Monné <roger.pau@citrix.com> wrote:
>On Fri, Mar 13, 2026 at 04:35:05PM +0000, Thierry Escande wrote:
>> In order to turn on ACPI for OS, we need to write a chipset-specific
>> value to SMI_CMD register (sort of imitation of the APM->ACPI switch
>> on real systems). Modify acpi_enable_sci() function to support both
>> i440 and Q35 emulation.
>>
>> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
>> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
>
>Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
>
>It's not great to add more stuff into hvmloader when we want to move
>out of it, but it's also not helpful to tie the Q35 addition to the
>removal of hvmloader.
I'm afraid the only option to get rid of hvmloader is to move its
responsibilities back into the firmware (SeaBIOS/OVMF). But if I
understand it right, the whole idea of introducing hvmloader originally
was to delegate Xen-specific parts of HVM guest initialization from
firmware to a component managed by Xen itself.
So, to some extent hvmloader can be considered as a part of the firmware
itself as it does things like PCI BAR allocation etc which are normally
done by the guest firmware, but with more knowledge of Xen specifics.
I guess this hvmloader/firmware split model was introduced to have more
freedom/maintainability/control - I suppose it's much faster and easier
to integrate Xen-specific changes to hvmloader directly then to
upstream them to SeaBIOS/OVMF codebases.
But other than moving hvmloader's responsibilities to the firmware we
can't do much I think - HVM guests expect to have full freedom over the
emulated platform. Among problems are non-standard (chipset-specific)
devices which also need to have assigned resources like MMIO ranges -
and Xen doesn't know anything about these devices and their resource
requirements (left alone how to configure them), yet they still need to
have correct BARs assigned with no conflicts with other PCI devices and
to contribute to MMIO hole sizing. This is something which cannot be
solved on the toolstack level unless Xen emulates the whole chipset and
knows about all emulated chipset devices - we limit ourselves to
MMCONFIG now but there are more configurable ranges like this.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 04/17] hvmloader: add ACPI enabling for Q35
2026-05-05 13:58 ` Alexey G
@ 2026-05-05 14:25 ` Roger Pau Monné
0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-05-05 14:25 UTC (permalink / raw)
To: Alexey G
Cc: Thierry Escande, xen-devel, Jan Beulich, Andrew Cooper,
Anthony PERARD
On Tue, May 05, 2026 at 03:58:16PM +0200, Alexey G wrote:
> On Tue, 28 Apr 2026 12:48:23 +0200
> Roger Pau Monné <roger.pau@citrix.com> wrote:
> >On Fri, Mar 13, 2026 at 04:35:05PM +0000, Thierry Escande wrote:
> >> In order to turn on ACPI for OS, we need to write a chipset-specific
> >> value to SMI_CMD register (sort of imitation of the APM->ACPI switch
> >> on real systems). Modify acpi_enable_sci() function to support both
> >> i440 and Q35 emulation.
> >>
> >> Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
> >> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> >
> >Reviewed-by: Roger Pau Monné <roger.pau@citrix.com>
> >
> >It's not great to add more stuff into hvmloader when we want to move
> >out of it, but it's also not helpful to tie the Q35 addition to the
> >removal of hvmloader.
>
> I'm afraid the only option to get rid of hvmloader is to move its
> responsibilities back into the firmware (SeaBIOS/OVMF). But if I
> understand it right, the whole idea of introducing hvmloader originally
> was to delegate Xen-specific parts of HVM guest initialization from
> firmware to a component managed by Xen itself.
Maybe originally? We then ended up having to add all sorts of Xen
specific logic into both SeaBIOS and OVMF, so much to OVMF that
there's a Xen-specific target/platform in OVMF.
> So, to some extent hvmloader can be considered as a part of the firmware
> itself as it does things like PCI BAR allocation etc which are normally
> done by the guest firmware, but with more knowledge of Xen specifics.
>
> I guess this hvmloader/firmware split model was introduced to have more
> freedom/maintainability/control - I suppose it's much faster and easier
> to integrate Xen-specific changes to hvmloader directly then to
> upstream them to SeaBIOS/OVMF codebases.
Faster to integrate possibly, but then we end up with IMO a lot of
duplication between what hvmloader does vs what we could offload to
OVMF.
> But other than moving hvmloader's responsibilities to the firmware we
> can't do much I think - HVM guests expect to have full freedom over the
> emulated platform. Among problems are non-standard (chipset-specific)
> devices which also need to have assigned resources like MMIO ranges -
> and Xen doesn't know anything about these devices and their resource
> requirements (left alone how to configure them), yet they still need to
> have correct BARs assigned with no conflicts with other PCI devices and
> to contribute to MMIO hole sizing. This is something which cannot be
> solved on the toolstack level unless Xen emulates the whole chipset and
> knows about all emulated chipset devices - we limit ourselves to
> MMCONFIG now but there are more configurable ranges like this.
I think we do want to move a lot of hvmloader responsibilities into
OVMF (maybe SeaBIOS if possible also, albeit that's a legacy firmware
by today standards anyway). Whether we would need to partially
offload some of what hvmloader does into the toolstack remains to be
seen.
Chipset initialization would possibly need to be done by OVMF, at
which point we might require PVH guests that want to use PCI
passthrough to also rely on OVMF instead of direct kernel boot.
Anyway, this is quite distant future.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread
* [PATCH 17/17] docs: provide description for device_model_machine option
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (14 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 04/17] hvmloader: add ACPI enabling for Q35 Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-29 12:43 ` Roger Pau Monné
2026-03-13 16:35 ` [PATCH 16/17] Handle PCIe ECAM space access from guests Thierry Escande
` (2 subsequent siblings)
18 siblings, 1 reply; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel; +Cc: Thierry Escande, Anthony PERARD, Alexey Gerasimenko
This patch adds description for 'device_model_machine' option which allows
to control which chipset will be emulated by device model.
Signed-off-by: Alexey Gerasimenko <x1917x@gmail.com>
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
docs/man/xl.cfg.5.pod.in | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 27c455210b..67a5bc54a5 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2963,6 +2963,33 @@ you have existing guests then, depending on the nature of the guest
Operating System, you may wish to force them to use the device
model which they were installed with.
+=item B<device_model_machine="STRING">
+
+Selects which chipset the device model should emulate for this
+guest.
+
+Valid options are:
+
+=over 4
+
+=item B<"i440">
+
+Use i440 emulation (a default setting)
+
+=item B<"q35">
+
+Use Q35/ICH9 emulation. This enables additional features for
+PCIe device passthrough
+
+=back
+
+Note that omitting device_model_machine parameter means i440 system
+by default, so the default behavior doesn't change for old domain
+config files.
+
+It is recommended to install the guest OS from scratch to avoid issues
+due to the emulated platform change.
+
=item B<device_model_override="PATH">
Override the path to the binary to be used as the device-model running in
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* [PATCH 16/17] Handle PCIe ECAM space access from guests
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (15 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 17/17] docs: provide description for device_model_machine option Thierry Escande
@ 2026-03-13 16:35 ` Thierry Escande
2026-04-29 12:42 ` Roger Pau Monné
2026-05-04 15:22 ` Jan Beulich
2026-03-15 22:43 ` [PATCH 00/17] Q35 initial support for HVM guests Alexey G
2026-04-28 7:48 ` Roger Pau Monné
18 siblings, 2 replies; 59+ messages in thread
From: Thierry Escande @ 2026-03-13 16:35 UTC (permalink / raw)
To: xen-devel
Cc: Thierry Escande, Jan Beulich, Andrew Cooper, Roger Pau Monné
This patch adds the logic to decode MMIO-based PCIe ECAM accesses. If
the IOREQ_TYPE_COPY request is within the ECAM address space configured
by hvmloader, the ioreq type is set to XEN_DMOP_IO_RANGE_PCI and the
sbdf decoded from the accessed address.
Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
---
xen/arch/x86/hvm/ioreq.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
index a5fa97e149..022fe05222 100644
--- a/xen/arch/x86/hvm/ioreq.c
+++ b/xen/arch/x86/hvm/ioreq.c
@@ -268,6 +268,8 @@ bool arch_ioreq_server_get_type_addr(const struct domain *d,
uint64_t *addr)
{
unsigned int cf8 = d->arch.hvm.pci_cf8;
+ unsigned long mmio_start = (p->type == IOREQ_TYPE_COPY) ?
+ ioreq_mmio_first_byte(p) : 0;
if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
return false;
@@ -298,6 +300,19 @@ bool arch_ioreq_server_get_type_addr(const struct domain *d,
*addr |= CF8_ADDR_HI(cf8);
}
}
+ else if ( p->type == IOREQ_TYPE_COPY &&
+ (mmio_start >= d->arch.ecam_addr &&
+ mmio_start < (d->arch.ecam_addr + d->arch.ecam_size)) )
+ {
+ pci_sbdf_t sbdf;
+ unsigned int reg = mmio_start & ~PAGE_MASK;
+
+ sbdf.bdf = (((mmio_start - d->arch.ecam_addr) & 0x0ffff000) >> 12);
+ sbdf.seg = 0;
+
+ *type = XEN_DMOP_IO_RANGE_PCI;
+ *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
+ }
else
{
*type = (p->type == IOREQ_TYPE_PIO) ?
--
2.51.0
--
Thierry Escande | Vates XCP-ng Developer
XCP-ng & Xen Orchestra - Vates solutions
web: https://vates.tech
^ permalink raw reply related [flat|nested] 59+ messages in thread* Re: [PATCH 16/17] Handle PCIe ECAM space access from guests
2026-03-13 16:35 ` [PATCH 16/17] Handle PCIe ECAM space access from guests Thierry Escande
@ 2026-04-29 12:42 ` Roger Pau Monné
2026-05-04 15:22 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-29 12:42 UTC (permalink / raw)
To: Thierry Escande; +Cc: xen-devel, Jan Beulich, Andrew Cooper
On Fri, Mar 13, 2026 at 04:35:05PM +0000, Thierry Escande wrote:
> This patch adds the logic to decode MMIO-based PCIe ECAM accesses. If
> the IOREQ_TYPE_COPY request is within the ECAM address space configured
> by hvmloader, the ioreq type is set to XEN_DMOP_IO_RANGE_PCI and the
> sbdf decoded from the accessed address.
>
> Signed-off-by: Thierry Escande <thierry.escande@vates.tech>
> ---
> xen/arch/x86/hvm/ioreq.c | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/ioreq.c b/xen/arch/x86/hvm/ioreq.c
> index a5fa97e149..022fe05222 100644
> --- a/xen/arch/x86/hvm/ioreq.c
> +++ b/xen/arch/x86/hvm/ioreq.c
> @@ -268,6 +268,8 @@ bool arch_ioreq_server_get_type_addr(const struct domain *d,
> uint64_t *addr)
> {
> unsigned int cf8 = d->arch.hvm.pci_cf8;
> + unsigned long mmio_start = (p->type == IOREQ_TYPE_COPY) ?
> + ioreq_mmio_first_byte(p) : 0;
>
> if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
> return false;
> @@ -298,6 +300,19 @@ bool arch_ioreq_server_get_type_addr(const struct domain *d,
> *addr |= CF8_ADDR_HI(cf8);
> }
> }
> + else if ( p->type == IOREQ_TYPE_COPY &&
> + (mmio_start >= d->arch.ecam_addr &&
> + mmio_start < (d->arch.ecam_addr + d->arch.ecam_size)) )
> + {
> + pci_sbdf_t sbdf;
> + unsigned int reg = mmio_start & ~PAGE_MASK;
> +
> + sbdf.bdf = (((mmio_start - d->arch.ecam_addr) & 0x0ffff000) >> 12);
> + sbdf.seg = 0;
> +
> + *type = XEN_DMOP_IO_RANGE_PCI;
> + *addr = ((uint64_t)sbdf.sbdf << 32) | reg;
The trapping & decoding here should better re-use the logic in the
vpci_mmcfg* handlers in x86/hvm/io.c. You might want to gate the call
to register_mmio_handler() to the domain having vPCI, so that
otherwise it will be handled by the IOREQ catch-all. It's a bit
hacky, but we already know that vPCI and IOREQs don't play well, and
it needs solving properly. Again I don't want to force you having to
do that just to get q35 merged. But we should at least aim to not
duplicate the data in the domain structures.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 16/17] Handle PCIe ECAM space access from guests
2026-03-13 16:35 ` [PATCH 16/17] Handle PCIe ECAM space access from guests Thierry Escande
2026-04-29 12:42 ` Roger Pau Monné
@ 2026-05-04 15:22 ` Jan Beulich
1 sibling, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 15:22 UTC (permalink / raw)
To: Thierry Escande; +Cc: Andrew Cooper, Roger Pau Monné, xen-devel
On 13.03.2026 17:35, Thierry Escande wrote:
> @@ -298,6 +300,19 @@ bool arch_ioreq_server_get_type_addr(const struct domain *d,
> *addr |= CF8_ADDR_HI(cf8);
> }
> }
> + else if ( p->type == IOREQ_TYPE_COPY &&
> + (mmio_start >= d->arch.ecam_addr &&
> + mmio_start < (d->arch.ecam_addr + d->arch.ecam_size)) )
> + {
> + pci_sbdf_t sbdf;
> + unsigned int reg = mmio_start & ~PAGE_MASK;
If you use PAGE_MASK here, ...
> + sbdf.bdf = (((mmio_start - d->arch.ecam_addr) & 0x0ffff000) >> 12);
... why not PAGE_SHIFT here? And why the masking by 0x0ffff000? Doesn't the
earlier range check make this unnecessary (as long as .ecam_size is plausible)?
Finally, nit: There's a stray blank after the =, and one pair of parentheses
could perhaps also be dropped.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (16 preceding siblings ...)
2026-03-13 16:35 ` [PATCH 16/17] Handle PCIe ECAM space access from guests Thierry Escande
@ 2026-03-15 22:43 ` Alexey G
2026-04-28 7:48 ` Roger Pau Monné
18 siblings, 0 replies; 59+ messages in thread
From: Alexey G @ 2026-03-15 22:43 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Roger Pau Monné,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini,
Juergen Gross
Hello,
I'm glad someone wants to commit these patches (and surprised that
they're still applicable after so many years), thank you for this
effort. Feel free to proceed, it would be good if you manage to upstream
them to Xen/QEMU code. But be prepared that it won't be an easy
task - the patches cross multiple areas of responsibility, so it will
require some effort to make all involved Xen/QEMU maintainers happy.
I don't work on virtualization/x86 anymore and I barely remember
anything after 8 years, so I probably won't be able to help much, but
I'll keep an eye on the email thread.
Some historical background for the Q35 patches:
The project I was working on was relying on Xen for PCIe device
passthrough (mostly GPUs, NICs and storage controllers) to HVM guests.
So PCIe passthrough and HVM were the top priority - it affected many
of my decisions.
IIRC, there were 2 major obstacles to successfully passthrough any PCIe device:
1. Even back then, there were **multiple PCIe devices whose drivers
were attempting to read/write registers from their device's PCIe
extended config space** (offsets above 100h). Supporting this feature
required to have MMCONFIG/ECAM working, which was something only
available for Q35 emulation at that time => hence Q35 support was
added, with mostly PCIe passthrough in mind. In the process I also
discovered that dreadful "PCIe topology check" issue which was bypassed
by presenting the passed through PCIe device to the OS as a chipset
built-in device. This solution was a bit hacky, but allowed to
successfully pass through PCIe devices to a Q35 HVM guest.
2. Some devices had mirrors of BAR registers' values _accessed through
a proprietary mechanism_, like reading them through device-specific
MMIO registers. As such, their drivers do not read a BAR value from
the PCI conf space but rather get it directly from eg. MMIO, whose
layout is completely unknown to us. This makes all BAR emulation in the
hypervisor useless for such device - the hypervisor returns one value
for BARs read via PCI conf space, but the driver sees the real values
as it bypasses the PCI conf space. Among such devices were Nvidia GPUs
BTW - but not including the "pro" models AFAIR, which were more
virtualization-friendly.
That "BAR desync" problem was tricky - I solved it by implementing an
option (in the domain config file) for a passed through device which,
when turned on, was basically enabling 1:1 matching between virtual and
physical BAR values for a given device, without affecting other devices
(be it PT or emulated). This way virtual physical addresses in BARs
match the real ones - hence the device driver sees the same values
either in the PCI conf space or proprietary registers.
But it wasn't that simple, unfortunately - having a specific "locked"
BAR value means we need to adjust the MMIO hole size for the guest
accordingly. A straightforward approach is to make the MMIO hole size
very big. This in turn brought another problems to solve:
2.1. when a recent (back then) Windows OS sees PCI BAR allocation which
is far from perfect - it can completely reallocate all BARs of all
devices to other, very different addresses. They were calling this
feature as PCIe "resource rebalancing" IIRC. This breaks 1:1 mirroring
of given device's virtual/physical BARs - it's ok to present BARs with
real physical addresses (the sneaky device driver knows them via MMIO
registers anyway), but allowing to modify values in BARs is a no go, of
course.
Luckily, this problem was solved by a specific PCI BAR allocation - the
idea was to keep the MMIO hole as small as possible while avoiding
large unused gaps inside it, not claimed by any BAR. It was implemented
inside hvmloader, which was populating the MMIO hole while taking into
account both fixed and freely modifiable BARs and then reported the new
RAM/MMIO hole layout back to Xen. This allowed to prevent the PCI BAR
reallocation from the OS - and hotplugging was still working thanks to
the high MMIO hole (above 4Gb).
2.2. after experimenting with dynamic resizing of the MMIO hole, I
realized that Xen and QEMU have their own vision of the system memory
layout which can get out of sync. And MMIO hole resize was creating
this bad situation in fact, giving some hard to debug/reproduce bugs
with unexpected guest memory corruption.
The way I fixed this memory mismatch was emulating the real Q35
facility for this - namely, chipset's REMAP register which was designed
precisely for this goal - to reconfigure the MMIO hole size/position
while relocating underlying RAM memory to another range (so no RAM is
wasted). As the chipset was emulated by QEMU and the whole idea of HVM
was to emulate real hardware as close as possible, this was the obvious
solution - we do it in the way like it's done in a real firmware and
then QEMU knows the RAM/MMIO hole layout, allowing to sync it with
Xen's. There were some other fixed issues relying on this feature -
AFAIR, I needed it also to make 'populate on demand' working with
(hotplugged?) PT devices.
I was planning to send patches for this feature too, after settling the
Q35 patches. I'll try to find the relevant code/notes, maybe they will
be helpful.
On Fri, 13 Mar 2026 16:35:01 +0000
"Thierry Escande" <thierry.escande@vates.tech> wrote:
>This series introduces initial Q35 chipset support for HVM guests, based on the
>patchset at [1] by Alexey Gerasimenko.
>
>Basic support means that this patchset allows to start an HVM guest that
>emulates a Q35 chipset via Qemu and implements access to PCIe extended
>configuration space for such devices emulated by Qemu.
>
>Support for PCIe device passthrough is not implemented yet. This is planned but
>implies modifications in the hypervisor and the firmwares, mainly for the
>support of multiple PCI buses.
>
>In order to create a Q35 guest, a new domain config option has been added,
>named 'device_model_machine'. Possible values are:
>- "i440" - i440 emulation (default)
>- "q35" - emulate a Q35 machine
>
>If the option is omitted it defaults to "i440", not impacting existing domain
>configuration files.
>
>DSDT files for Q35 and i440 are largely similar so the existing file dsdt.asl
>has been split with i440 and q35 specific parts put in seperated files.
>
>The PCIe MMCONFIG area is configured by hvmloader and its base address and size
>are set in Xen using a new pair of hypercalls HVMOP_get|set_ecam_space. Access
>to the MMCONFIG area from a guest is trapped by Xen and transfered to the
>emulator as XEN_DMOP_IO_RANGE_PCI ioreq type.
>
>[1] https://lore.kernel.org/xen-devel/cover.1520867740.git.x1917x@gmail.com/
>
>Thierry Escande (17):
> libacpi: Split dsdt.asl file and extract i440 specific parts
> libacpi: new DSDT ACPI table for Q35
> hvmloader: add function to set the emulated machine type (i440/Q35)
> hvmloader: add ACPI enabling for Q35
> hvmloader: add Q35 DSDT table loading
> hvmloader: Move pci devices setup to a separate function
> hvmloader: add basic Q35 support
> hvmloader: Extend PCI BAR struct
> xev/hvm: Add HVMOP_get|set_ecam_space hypercalls
> hvmloader: Add support for HVMOP_set|get_ecam_space hypercalls
> hvmloader: allocate MMCONFIG area in the MMIO hole
> libxl: Q35 support (new option device_model_machine)
> libxl: Add xen-platform device for Q35 machine
> libacpi: build ACPI MCFG table if requested
> hvmloader: Set MCFG in ACPI table
> Handle PCIe ECAM space access from guests
> docs: provide description for device_model_machine option
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-03-13 16:35 [PATCH 00/17] Q35 initial support for HVM guests Thierry Escande
` (17 preceding siblings ...)
2026-03-15 22:43 ` [PATCH 00/17] Q35 initial support for HVM guests Alexey G
@ 2026-04-28 7:48 ` Roger Pau Monné
2026-05-04 10:45 ` Jan Beulich
2026-05-05 13:07 ` Alexey G
18 siblings, 2 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-04-28 7:48 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Jan Beulich, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, Juergen Gross
On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
> This series introduces initial Q35 chipset support for HVM guests, based on the
> patchset at [1] by Alexey Gerasimenko.
>
> Basic support means that this patchset allows to start an HVM guest that
> emulates a Q35 chipset via Qemu and implements access to PCIe extended
> configuration space for such devices emulated by Qemu.
>
> Support for PCIe device passthrough is not implemented yet. This is planned but
> implies modifications in the hypervisor and the firmwares, mainly for the
> support of multiple PCI buses.
Why do you need multi bus support to expose PCIe capabilities? I'm
not seeing the relation between those two. You could still expose a
single bus on the MCFG table.
> In order to create a Q35 guest, a new domain config option has been added,
> named 'device_model_machine'. Possible values are:
> - "i440" - i440 emulation (default)
> - "q35" - emulate a Q35 machine
>
> If the option is omitted it defaults to "i440", not impacting existing domain
> configuration files.
>
> DSDT files for Q35 and i440 are largely similar so the existing file dsdt.asl
> has been split with i440 and q35 specific parts put in seperated files.
>
> The PCIe MMCONFIG area is configured by hvmloader and its base address and size
> are set in Xen using a new pair of hypercalls HVMOP_get|set_ecam_space.
I guess I will see how that looks like in the series, but the setting
of the ECAM region would better be done by the toolstack. Setting it
in hvmloader is possibly not the best placement, because it doesn't
run for PVH guests (and we will want ECAM support for PVH at some
point), and there's also a vague plan/intention to get rid of
hvmloader even for HVM guests eventually.
> Access
> to the MMCONFIG area from a guest is trapped by Xen and transfered to the
> emulator as XEN_DMOP_IO_RANGE_PCI ioreq type.
Thanks for doing this bit, IIRC from when this was last posted this
was the biggest shortcoming of the series when originally submitted.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-04-28 7:48 ` Roger Pau Monné
@ 2026-05-04 10:45 ` Jan Beulich
2026-05-05 5:48 ` Jan Beulich
2026-05-05 13:29 ` Alexey G
2026-05-05 13:07 ` Alexey G
1 sibling, 2 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-04 10:45 UTC (permalink / raw)
To: Roger Pau Monné
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Juergen Gross, Thierry Escande
On 28.04.2026 09:48, Roger Pau Monné wrote:
> On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
>> This series introduces initial Q35 chipset support for HVM guests, based on the
>> patchset at [1] by Alexey Gerasimenko.
>>
>> Basic support means that this patchset allows to start an HVM guest that
>> emulates a Q35 chipset via Qemu and implements access to PCIe extended
>> configuration space for such devices emulated by Qemu.
>>
>> Support for PCIe device passthrough is not implemented yet. This is planned but
>> implies modifications in the hypervisor and the firmwares, mainly for the
>> support of multiple PCI buses.
>
> Why do you need multi bus support to expose PCIe capabilities? I'm
> not seeing the relation between those two. You could still expose a
> single bus on the MCFG table.
Can a valid PCIe topology be expressed with just bus 0? If an endpoint
to be handed to a guest isn't root complex integrated, would it be valid
to make it appear so by putting it on bus 0?
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-05-04 10:45 ` Jan Beulich
@ 2026-05-05 5:48 ` Jan Beulich
2026-05-05 5:49 ` Jan Beulich
2026-05-05 13:29 ` Alexey G
1 sibling, 1 reply; 59+ messages in thread
From: Jan Beulich @ 2026-05-05 5:48 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Juergen Gross,
Roger Pau Monné
On 04.05.2026 12:45, Jan Beulich wrote:
> On 28.04.2026 09:48, Roger Pau Monné wrote:
>> On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
>>> This series introduces initial Q35 chipset support for HVM guests, based on the
>>> patchset at [1] by Alexey Gerasimenko.
>>>
>>> Basic support means that this patchset allows to start an HVM guest that
>>> emulates a Q35 chipset via Qemu and implements access to PCIe extended
>>> configuration space for such devices emulated by Qemu.
>>>
>>> Support for PCIe device passthrough is not implemented yet. This is planned but
>>> implies modifications in the hypervisor and the firmwares, mainly for the
>>> support of multiple PCI buses.
>>
>> Why do you need multi bus support to expose PCIe capabilities? I'm
>> not seeing the relation between those two. You could still expose a
>> single bus on the MCFG table.
>
> Can a valid PCIe topology be expressed with just bus 0? If an endpoint
> to be handed to a guest isn't root complex integrated, would it be valid
> to make it appear so by putting it on bus 0?
We discussed this with Roger on our x86 call, and we came to the agreement
that for the time being putting everything on bus 0 may be good enough.
Introducing of bridge(s) and anything else that's required for multi-bus
support will want to be a separate piece of work.
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-05-05 5:48 ` Jan Beulich
@ 2026-05-05 5:49 ` Jan Beulich
0 siblings, 0 replies; 59+ messages in thread
From: Jan Beulich @ 2026-05-05 5:49 UTC (permalink / raw)
To: Thierry Escande
Cc: xen-devel, Andrew Cooper, Anthony PERARD, Michal Orzel,
Julien Grall, Stefano Stabellini, Juergen Gross,
Roger Pau Monné
On 05.05.2026 07:48, Jan Beulich wrote:
> On 04.05.2026 12:45, Jan Beulich wrote:
>> On 28.04.2026 09:48, Roger Pau Monné wrote:
>>> On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
>>>> This series introduces initial Q35 chipset support for HVM guests, based on the
>>>> patchset at [1] by Alexey Gerasimenko.
>>>>
>>>> Basic support means that this patchset allows to start an HVM guest that
>>>> emulates a Q35 chipset via Qemu and implements access to PCIe extended
>>>> configuration space for such devices emulated by Qemu.
>>>>
>>>> Support for PCIe device passthrough is not implemented yet. This is planned but
>>>> implies modifications in the hypervisor and the firmwares, mainly for the
>>>> support of multiple PCI buses.
>>>
>>> Why do you need multi bus support to expose PCIe capabilities? I'm
>>> not seeing the relation between those two. You could still expose a
>>> single bus on the MCFG table.
>>
>> Can a valid PCIe topology be expressed with just bus 0? If an endpoint
>> to be handed to a guest isn't root complex integrated, would it be valid
>> to make it appear so by putting it on bus 0?
>
> We discussed this with Roger on our x86 call, and we came to the agreement
> that for the time being putting everything on bus 0 may be good enough.
> Introducing of bridge(s) and anything else that's required for multi-bus
> support will want to be a separate piece of work.
Oh, and: Where applicable this (deliberate) restriction (or should I say
mistreatment) will want calling out in description(s).
Jan
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-05-04 10:45 ` Jan Beulich
2026-05-05 5:48 ` Jan Beulich
@ 2026-05-05 13:29 ` Alexey G
1 sibling, 0 replies; 59+ messages in thread
From: Alexey G @ 2026-05-05 13:29 UTC (permalink / raw)
To: Jan Beulich, Juergen Gross
Cc: Roger Pau Monné, xen-devel, Andrew Cooper, Anthony PERARD,
Michal Orzel, Julien Grall, Stefano Stabellini, Thierry Escande
On Mon, 4 May 2026 12:45:23 +0200
Jan Beulich <jbeulich@suse.com> wrote:
>> Why do you need multi bus support to expose PCIe capabilities? I'm
>> not seeing the relation between those two. You could still expose a
>> single bus on the MCFG table.
>
>Can a valid PCIe topology be expressed with just bus 0? If an endpoint
>to be handed to a guest isn't root complex integrated, would it be
>valid to make it appear so by putting it on bus 0?
No, unfortunately, it will fail at least under Windows' pci.sys
driver. Unless they changed something in the past years. To place an
endpoint device to bus 0, we need to emulate that it's a
chipset-integrated device by trapping its PCIe Capabilities reads.
I found an old pci.sys + .pdb symbols which I used for debugging back
in 2017 and, if I remember correctly, the failing function was
`ExpressValidateFabricTopology()`.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-04-28 7:48 ` Roger Pau Monné
2026-05-04 10:45 ` Jan Beulich
@ 2026-05-05 13:07 ` Alexey G
2026-05-05 14:15 ` Roger Pau Monné
1 sibling, 1 reply; 59+ messages in thread
From: Alexey G @ 2026-05-05 13:07 UTC (permalink / raw)
To: Roger Pau Monné
Cc: Thierry Escande, xen-devel, Jan Beulich, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini,
Juergen Gross
On Tue, 28 Apr 2026 09:48:41 +0200
Roger Pau Monné <roger.pau@citrix.com> wrote:
>On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
>> This series introduces initial Q35 chipset support for HVM guests,
>> based on the patchset at [1] by Alexey Gerasimenko.
>>
>> Basic support means that this patchset allows to start an HVM guest
>> that emulates a Q35 chipset via Qemu and implements access to PCIe
>> extended configuration space for such devices emulated by Qemu.
>>
>> Support for PCIe device passthrough is not implemented yet. This is
>> planned but implies modifications in the hypervisor and the
>> firmwares, mainly for the support of multiple PCI buses.
>
>Why do you need multi bus support to expose PCIe capabilities? I'm
>not seeing the relation between those two. You could still expose a
>single bus on the MCFG table.
The problem with the PCIe bus is that it's very "topological" by design
- and it always wants a valid hierarchy.
Each PCIe device manifests itself (via its PCIe Capabilities entry)
as either a chipset-integrated device or a regular PCIe endpoint
device, which is the most common case. There are more types IIRC but
these are what we deal with mostly - both for PT devices and
QEMU-emulated ones.
But, being a PCIe endpoint means that the device must have some parent
device. It can be located below a PCIe switch or, in the simplest and
the most common case, below a PCIe Root Port device.
In both cases the 'parent' is a PCI-PCI bridge technically, with the
PCIe endpoint device being located on its secondary bus.
As the Q35 patch series was done with mostly PCIe device passthrough in
mind, this brings the main complication - in order to properly place a
passed through device on the PCIe bus, we need an emulated/real/hybrid
Root Port device.
A much lengthier description is in this patch message:
https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg01197.html
To summarize, we need this 'valid PCIe topology' nonsense just to make
Windows kernel (pci.sys driver specifically) not to discard our PT
device due to checking PCIe bus hierarchy above it.
This limitation was found/confirmed via debugging - luckily, pci.sys
had symbols and the main bad function which was failing had a very
speaking name - something like pcieCheckTopology or similar.
Emulating the "chipset-integrated device" in PT device's PCIe
Capabilities was a simple hack which allowed to bypass the requirement
to have a valid PCIe hierarchy with multiple buses. But the proper
future direction is implementing emulation of Root Ports or PCIe
switches I guess.
>> The PCIe MMCONFIG area is configured by hvmloader and its base
>> address and size are set in Xen using a new pair of hypercalls
>> HVMOP_get|set_ecam_space.
>
>I guess I will see how that looks like in the series, but the setting
>of the ECAM region would better be done by the toolstack. Setting it
>in hvmloader is possibly not the best placement, because it doesn't
>run for PVH guests (and we will want ECAM support for PVH at some
>point), and there's also a vague plan/intention to get rid of
>hvmloader even for HVM guests eventually.
This is the situation where the difference between HVM and PVH might be
very problematic I'm afraid. HVM guests assume full freedom over the
IO/MMIO resources setup inside their sandboxed environment.
It's not just Windows reallocating PCI BARs to its liking, but also
spans to the emulated chipset's resources. In worst case we could have
MMCONFIG reinitialization implemented even in Intel's Q35 drivers
installed inside an HVM guest. Fortunately, this is not what I remember
was the case, but in theory Q35 driver could have done things like this.
^ permalink raw reply [flat|nested] 59+ messages in thread
* Re: [PATCH 00/17] Q35 initial support for HVM guests
2026-05-05 13:07 ` Alexey G
@ 2026-05-05 14:15 ` Roger Pau Monné
0 siblings, 0 replies; 59+ messages in thread
From: Roger Pau Monné @ 2026-05-05 14:15 UTC (permalink / raw)
To: Alexey G
Cc: Thierry Escande, xen-devel, Jan Beulich, Andrew Cooper,
Anthony PERARD, Michal Orzel, Julien Grall, Stefano Stabellini,
Juergen Gross
On Tue, May 05, 2026 at 03:07:36PM +0200, Alexey G wrote:
> On Tue, 28 Apr 2026 09:48:41 +0200
> Roger Pau Monné <roger.pau@citrix.com> wrote:
>
> >On Fri, Mar 13, 2026 at 04:35:01PM +0000, Thierry Escande wrote:
> >> This series introduces initial Q35 chipset support for HVM guests,
> >> based on the patchset at [1] by Alexey Gerasimenko.
> >>
> >> Basic support means that this patchset allows to start an HVM guest
> >> that emulates a Q35 chipset via Qemu and implements access to PCIe
> >> extended configuration space for such devices emulated by Qemu.
> >>
> >> Support for PCIe device passthrough is not implemented yet. This is
> >> planned but implies modifications in the hypervisor and the
> >> firmwares, mainly for the support of multiple PCI buses.
> >
> >Why do you need multi bus support to expose PCIe capabilities? I'm
> >not seeing the relation between those two. You could still expose a
> >single bus on the MCFG table.
>
> The problem with the PCIe bus is that it's very "topological" by design
> - and it always wants a valid hierarchy.
>
> Each PCIe device manifests itself (via its PCIe Capabilities entry)
> as either a chipset-integrated device or a regular PCIe endpoint
> device, which is the most common case. There are more types IIRC but
> these are what we deal with mostly - both for PT devices and
> QEMU-emulated ones.
>
> But, being a PCIe endpoint means that the device must have some parent
> device. It can be located below a PCIe switch or, in the simplest and
> the most common case, below a PCIe Root Port device.
>
> In both cases the 'parent' is a PCI-PCI bridge technically, with the
> PCIe endpoint device being located on its secondary bus.
>
> As the Q35 patch series was done with mostly PCIe device passthrough in
> mind, this brings the main complication - in order to properly place a
> passed through device on the PCIe bus, we need an emulated/real/hybrid
> Root Port device.
>
> A much lengthier description is in this patch message:
> https://lists.xenproject.org/archives/html/xen-devel/2018-03/msg01197.html
>
> To summarize, we need this 'valid PCIe topology' nonsense just to make
> Windows kernel (pci.sys driver specifically) not to discard our PT
> device due to checking PCIe bus hierarchy above it.
>
> This limitation was found/confirmed via debugging - luckily, pci.sys
> had symbols and the main bad function which was failing had a very
> speaking name - something like pcieCheckTopology or similar.
>
> Emulating the "chipset-integrated device" in PT device's PCIe
> Capabilities was a simple hack which allowed to bypass the requirement
> to have a valid PCIe hierarchy with multiple buses. But the proper
> future direction is implementing emulation of Root Ports or PCIe
> switches I guess.
Oh, I see. We discussed this with Jan, it wasn't clear whether it
would be a strict requirement or not. We have our answer now, it is a
strict requirement for pass-through to Windows guests.
> >> The PCIe MMCONFIG area is configured by hvmloader and its base
> >> address and size are set in Xen using a new pair of hypercalls
> >> HVMOP_get|set_ecam_space.
> >
> >I guess I will see how that looks like in the series, but the setting
> >of the ECAM region would better be done by the toolstack. Setting it
> >in hvmloader is possibly not the best placement, because it doesn't
> >run for PVH guests (and we will want ECAM support for PVH at some
> >point), and there's also a vague plan/intention to get rid of
> >hvmloader even for HVM guests eventually.
>
> This is the situation where the difference between HVM and PVH might be
> very problematic I'm afraid. HVM guests assume full freedom over the
> IO/MMIO resources setup inside their sandboxed environment.
>
> It's not just Windows reallocating PCI BARs to its liking, but also
> spans to the emulated chipset's resources. In worst case we could have
> MMCONFIG reinitialization implemented even in Intel's Q35 drivers
> installed inside an HVM guest. Fortunately, this is not what I remember
> was the case, but in theory Q35 driver could have done things like this.
Indeed. In later patches my recommendation was to trap accesses to
the root complex registers that control the position and size of the
ECAM region, and forward those to the hypervisor, instead of hvmloader
using a side-band hypercall to set the position and size of the ECAM
region.
I've also discussed this with Jan, alternatively we could trap the
registers directly in Xen itself, but the Xen would need to know the
domain has an emulated q35, which we might need to do at some point,
but we are likely not there yet.
Thanks, Roger.
^ permalink raw reply [flat|nested] 59+ messages in thread