From: <mhonap@nvidia.com>
To: <alwilliamson@nvidia.com>, <skolothumtho@nvidia.com>,
<ankita@nvidia.com>, <mst@redhat.com>, <imammedo@redhat.com>,
<anisinha@redhat.com>, <eric.auger@redhat.com>,
<peter.maydell@linaro.org>, <shannon.zhaosl@gmail.com>,
<jonathan.cameron@huawei.com>, <fan.ni@samsung.com>,
<pbonzini@redhat.com>, <richard.henderson@linaro.org>,
<marcel.apfelbaum@gmail.com>, <clg@redhat.com>,
<cohuck@redhat.com>, <dan.j.williams@intel.com>,
<dave.jiang@intel.com>, <alejandro.lucero-palau@amd.com>
Cc: <vsethi@nvidia.com>, <cjia@nvidia.com>, <targupta@nvidia.com>,
<zhiw@nvidia.com>, <kjaju@nvidia.com>,
<linux-cxl@vger.kernel.org>, <kvm@vger.kernel.org>,
<qemu-devel@nongnu.org>, <qemu-arm@nongnu.org>,
"Manish Honap" <mhonap@nvidia.com>
Subject: [RFC 1/9] hw/arm/virt: Add CXL CFMWS PA window for device memory
Date: Mon, 27 Apr 2026 23:42:27 +0530
Message-ID: <20260427181235.3003865-2-mhonap@nvidia.com>
In-Reply-To: <20260427181235.3003865-1-mhonap@nvidia.com>

From: Manish Honap <mhonap@nvidia.com>

CXL VFIO passthrough needs a stable guest physical address (GPA) range
for device memory (device physical addresses, DPA) that falls inside a
CFMWS entry the guest discovers from the ACPI CEDT. Without a dedicated
range in the address map, the HDM decoder has nowhere to point.

Add VIRT_HIGH_CXL_MMIO immediately after the second PCIe MMIO window.
It gets its own highmem_cxl_mmio flag in VirtMachineState rather than
sharing highmem_cxl, so the two slots are independently controllable
even though both are currently tied to CXL bridge presence.

The base and size flow through GPEXConfig.cxl_mmio to
acpi_dsdt_add_gpex(), which carves out a QWord memory descriptor in the
first CXL root bridge's _CRS. The CFMWS window is system-wide, so only
the first CXL bridge gets the descriptor; subsequent ones would
produce duplicate resource claims for the same range.

build_crs() already emits the bridge's own 64-bit ranges into crs.
The CFMWS window is a separate system-wide range, so only that window
is appended as a new QWord descriptor; the bridge ranges are not
re-emitted. A warn_report() fires if the CFMWS window overlaps any
existing bridge 64-bit range, since that would indicate an address
layout conflict.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
---
 hw/arm/virt-acpi-build.c   |  5 +++++
 hw/arm/virt.c              |  9 +++++++++
 hw/pci-host/gpex-acpi.c    | 40 ++++++++++++++++++++++++++++++++++++++
 include/hw/arm/virt.h      |  2 ++
 include/hw/pci-host/gpex.h |  1 +
 5 files changed, 57 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 591cfc993c..863e0680fb 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -176,6 +176,11 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
         cfg.mmio64 = memmap[VIRT_HIGH_PCIE_MMIO];
     }
 
+    if (vms->highmem_cxl_mmio) {
+        cfg.cxl_mmio.base = memmap[VIRT_HIGH_CXL_MMIO].base;
+        cfg.cxl_mmio.size = memmap[VIRT_HIGH_CXL_MMIO].size;
+    }
+
     acpi_dsdt_add_gpex(scope, &cfg);
     QLIST_FOREACH(bus, &vms->bus->child, sibling) {
         if (pci_bus_is_cxl(bus)) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ec0d8475ca..fa07819401 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -211,6 +211,8 @@ static const MemMapEntry base_memmap[] = {
 #define DEFAULT_HIGH_PCIE_MMIO_SIZE_GB 512
 #define DEFAULT_HIGH_PCIE_MMIO_SIZE (DEFAULT_HIGH_PCIE_MMIO_SIZE_GB * GiB)
 
+#define DEFAULT_HIGH_CXL_MMIO_SIZE DEFAULT_HIGH_PCIE_MMIO_SIZE
+
 /*
  * Highmem IO Regions: This memory map is floating, located after the RAM.
  * Each MemMapEntry base (GPA) will be dynamically computed, depending on the
@@ -237,6 +239,11 @@ static MemMapEntry extended_memmap[] = {
     [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB },
     /* Second PCIe window */
     [VIRT_HIGH_PCIE_MMIO] = { 0x0, DEFAULT_HIGH_PCIE_MMIO_SIZE },
+    /*
+     * CXL CFMWS guest PA window - separate from PCIe MMIO so the two are
+     * independently sizeable. Same default size for now.
+     */
+    [VIRT_HIGH_CXL_MMIO] = { 0x0, DEFAULT_HIGH_CXL_MMIO_SIZE },
     /* Any CXL Fixed memory windows come here */
 };
 
@@ -1724,6 +1731,7 @@ static void create_cxl_host_reg_region(VirtMachineState *vms)
                           vms->memmap[VIRT_CXL_HOST].size);
     memory_region_add_subregion(sysmem, vms->memmap[VIRT_CXL_HOST].base, mr);
     vms->highmem_cxl = true;
+    vms->highmem_cxl_mmio = true;
 }
 
 static void create_platform_bus(VirtMachineState *vms)
@@ -1897,6 +1905,7 @@ static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
         &vms->highmem_cxl,
         &vms->highmem_ecam,
         &vms->highmem_mmio,
+        &vms->highmem_cxl_mmio,
     };
 
     assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST ==
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index d9820f9b41..7de57bbc46 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -7,6 +7,7 @@
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/acpi/cxl.h"
+#include "qemu/error-report.h"
 
 static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t irq,
                                           Aml *scope, uint8_t bus_num)
@@ -108,6 +109,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
     CrsRangeSet crs_range_set;
     CrsRangeEntry *entry;
     int i;
+    bool first_cxl = true;
 
     /* start to construct the tables for pxb */
     crs_range_set_init(&crs_range_set);
@@ -161,6 +163,44 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
          */
         crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
                         cfg->pio.base, 0, 0, 0);
+        if (is_cxl && first_cxl && cfg->cxl_mmio.size) {
+            uint64_t cfmws_end = cfg->cxl_mmio.base +
+                                 cfg->cxl_mmio.size - 1;
+
+            /*
+             * The CXL Fixed Memory Window (CFMWS) is a system-wide GPA
+             * range. Only the first CXL root bridge emits the QWord
+             * descriptor; adding it to every bridge would give the OS
+             * duplicate resource claims for the same range.
+             *
+             * build_crs() has already appended the bridge's own 64-bit
+             * ranges into crs. Do not copy them again here; only append
+             * the CFMWS window itself as a new QWord descriptor.
+             *
+             * Warn if the CFMWS window overlaps any range already claimed
+             * by the bridge; in the current address layout they should be
+             * disjoint, but catch it early if the layout ever changes.
+             */
+            for (i = 0; i < crs_range_set.mem_64bit_ranges->len; i++) {
+                entry = g_ptr_array_index(crs_range_set.mem_64bit_ranges,
+                                          i);
+                if (entry->base <= cfmws_end &&
+                    entry->limit >= cfg->cxl_mmio.base) {
+                    warn_report("CXL CFMWS [0x%"PRIx64"-0x%"PRIx64"] "
+                                "overlaps CXL root bridge 64-bit range "
+                                "[0x%"PRIx64"-0x%"PRIx64"]",
+                                cfg->cxl_mmio.base, cfmws_end,
+                                entry->base, entry->limit);
+                }
+            }
+            aml_append(crs,
+                aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
+                    AML_MAX_FIXED, AML_NON_CACHEABLE, AML_READ_WRITE,
+                    0x0000, cfg->cxl_mmio.base, cfmws_end, 0x0000,
+                    cfg->cxl_mmio.size));
+            first_cxl = false;
+        }
+
         aml_append(dev, aml_name_decl("_CRS", crs));
 
         if (is_cxl) {
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 5fcbd1c76f..88bb3c0bdf 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -91,6 +91,7 @@ enum {
     VIRT_CXL_HOST,
     VIRT_HIGH_PCIE_ECAM,
     VIRT_HIGH_PCIE_MMIO,
+    VIRT_HIGH_CXL_MMIO,
 };
 
 typedef enum VirtIOMMUType {
@@ -147,6 +148,7 @@ struct VirtMachineState {
     bool highmem;
     bool highmem_compact;
     bool highmem_cxl;
+    bool highmem_cxl_mmio; /* VIRT_HIGH_CXL_MMIO window; follows highmem_cxl */
     bool highmem_ecam;
     bool highmem_mmio;
     bool highmem_redists;
diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index 1da9c85bce..a7c2e2edf3 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -43,6 +43,7 @@ struct GPEXConfig {
MemMapEntry mmio32;
MemMapEntry mmio64;
MemMapEntry pio;
+ MemMapEntry cxl_mmio;
int irq;
PCIBus *bus;
bool pci_native_hotplug;
--
2.25.1
Thread overview: 10+ messages
2026-04-27 18:12 [RFC 0/9] QEMU: CXL Type-2 device passthrough via vfio-pci mhonap
2026-04-27 18:12 ` mhonap [this message]
2026-04-27 18:12 ` [RFC 2/9] cxl: Add preserve_config to pxb-cxl OSC method mhonap
2026-04-27 18:12 ` [RFC 3/9] linux-headers: Update vfio.h for CXL Type-2 device passthrough mhonap
2026-04-27 18:12 ` [RFC 4/9] hw/vfio/region: Add vfio_region_setup_with_ops() for custom region ops mhonap
2026-04-27 18:12 ` [RFC 5/9] hw/vfio/pci: Add CXL Type-2 device detection and region setup mhonap
2026-04-27 18:12 ` [RFC 6/9] hw/vfio/pci: Wire CXL component-register BAR with COMP_REGS overlay mhonap
2026-04-27 18:12 ` [RFC 7/9] hw/vfio+cxl: Program HDM decoder 0 at machine_done for firmware-committed devices mhonap
2026-04-27 18:12 ` [RFC 8/9] hw/arm/smmu-common: Allow pxb-cxl as SMMUv3 primary bus mhonap
2026-04-27 18:12 ` [RFC 9/9] vfio/listener: Skip DMA mapping for VFIO-owned RAM-device regions mhonap