From: <mhonap@nvidia.com>
To: <alwilliamson@nvidia.com>, <skolothumtho@nvidia.com>,
	<ankita@nvidia.com>, <mst@redhat.com>, <imammedo@redhat.com>,
	<anisinha@redhat.com>, <eric.auger@redhat.com>,
	<peter.maydell@linaro.org>, <shannon.zhaosl@gmail.com>,
	<jonathan.cameron@huawei.com>, <fan.ni@samsung.com>,
	<pbonzini@redhat.com>, <richard.henderson@linaro.org>,
	<marcel.apfelbaum@gmail.com>, <clg@redhat.com>,
	<cohuck@redhat.com>, <dan.j.williams@intel.com>,
	<dave.jiang@intel.com>, <alejandro.lucero-palau@amd.com>
Cc: <vsethi@nvidia.com>, <cjia@nvidia.com>, <targupta@nvidia.com>,
	<zhiw@nvidia.com>, <kjaju@nvidia.com>,
	<linux-cxl@vger.kernel.org>, <kvm@vger.kernel.org>,
	<qemu-devel@nongnu.org>, <qemu-arm@nongnu.org>,
	"Manish Honap" <mhonap@nvidia.com>
Subject: [RFC 1/9] hw/arm/virt: Add CXL FMWS PA window for device memory
Date: Mon, 27 Apr 2026 23:42:27 +0530	[thread overview]
Message-ID: <20260427181235.3003865-2-mhonap@nvidia.com> (raw)
In-Reply-To: <20260427181235.3003865-1-mhonap@nvidia.com>

From: Manish Honap <mhonap@nvidia.com>

CXL VFIO passthrough needs a stable guest physical address range into
which device memory (DPA) can be mapped, and that range must fall
inside a CFMWS entry the guest discovers from the ACPI CEDT. Without a
dedicated range in the address map, the HDM decoder has nowhere to
point.

Add VIRT_HIGH_CXL_MMIO immediately after the second PCIe MMIO window.
It gets its own highmem_cxl_mmio flag in VirtMachineState rather than
sharing highmem_cxl, so the two slots are independently controllable
even though both are currently tied to CXL bridge presence.

The base and size flow through GPEXConfig.cxl_mmio to
acpi_dsdt_add_gpex(), which carves out a QWord memory descriptor in the
first CXL root bridge's _CRS. Because the CFMWS window is system-wide,
only the first CXL bridge gets the descriptor; emitting it from
subsequent bridges would produce duplicate resource claims for the
same range.

build_crs() already emits the bridge's own 64-bit ranges into crs.
The CFMWS window is a separate system-wide range, so only that window
is appended as a new QWord descriptor; the bridge ranges are not
re-emitted. A warn_report() fires if the CFMWS window overlaps any
existing bridge 64-bit range, since that would indicate an address
layout conflict.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
---
 hw/arm/virt-acpi-build.c   |  5 +++++
 hw/arm/virt.c              |  9 +++++++++
 hw/pci-host/gpex-acpi.c    | 40 ++++++++++++++++++++++++++++++++++++++
 include/hw/arm/virt.h      |  2 ++
 include/hw/pci-host/gpex.h |  1 +
 5 files changed, 57 insertions(+)

diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c
index 591cfc993c..863e0680fb 100644
--- a/hw/arm/virt-acpi-build.c
+++ b/hw/arm/virt-acpi-build.c
@@ -176,6 +176,11 @@ static void acpi_dsdt_add_pci(Aml *scope, const MemMapEntry *memmap,
         cfg.mmio64 = memmap[VIRT_HIGH_PCIE_MMIO];
     }
 
+    if (vms->highmem_cxl) {
+        cfg.cxl_mmio.base = memmap[VIRT_HIGH_CXL_MMIO].base;
+        cfg.cxl_mmio.size = memmap[VIRT_HIGH_CXL_MMIO].size;
+    }
+
     acpi_dsdt_add_gpex(scope, &cfg);
     QLIST_FOREACH(bus, &vms->bus->child, sibling) {
         if (pci_bus_is_cxl(bus)) {
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ec0d8475ca..fa07819401 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -211,6 +211,8 @@ static const MemMapEntry base_memmap[] = {
 #define DEFAULT_HIGH_PCIE_MMIO_SIZE_GB 512
 #define DEFAULT_HIGH_PCIE_MMIO_SIZE (DEFAULT_HIGH_PCIE_MMIO_SIZE_GB * GiB)
 
+#define DEFAULT_HIGH_CXL_MMIO_SIZE  DEFAULT_HIGH_PCIE_MMIO_SIZE
+
 /*
  * Highmem IO Regions: This memory map is floating, located after the RAM.
  * Each MemMapEntry base (GPA) will be dynamically computed, depending on the
@@ -237,6 +239,11 @@ static MemMapEntry extended_memmap[] = {
     [VIRT_HIGH_PCIE_ECAM] =     { 0x0, 256 * MiB },
     /* Second PCIe window */
     [VIRT_HIGH_PCIE_MMIO] =     { 0x0, DEFAULT_HIGH_PCIE_MMIO_SIZE },
+    /*
+     * CXL FMWS guest PA window - separate from PCIe MMIO so the two are
+     * independently sizeable. Same default size for now.
+     */
+    [VIRT_HIGH_CXL_MMIO] =      { 0x0, DEFAULT_HIGH_CXL_MMIO_SIZE },
     /* Any CXL Fixed memory windows come here */
 };
 
@@ -1724,6 +1731,7 @@ static void create_cxl_host_reg_region(VirtMachineState *vms)
                        vms->memmap[VIRT_CXL_HOST].size);
     memory_region_add_subregion(sysmem, vms->memmap[VIRT_CXL_HOST].base, mr);
     vms->highmem_cxl = true;
+    vms->highmem_cxl_mmio = true;
 }
 
 static void create_platform_bus(VirtMachineState *vms)
@@ -1897,6 +1905,7 @@ static inline bool *virt_get_high_memmap_enabled(VirtMachineState *vms,
         &vms->highmem_cxl,
         &vms->highmem_ecam,
         &vms->highmem_mmio,
+        &vms->highmem_cxl_mmio,
     };
 
     assert(ARRAY_SIZE(extended_memmap) - VIRT_LOWMEMMAP_LAST ==
diff --git a/hw/pci-host/gpex-acpi.c b/hw/pci-host/gpex-acpi.c
index d9820f9b41..7de57bbc46 100644
--- a/hw/pci-host/gpex-acpi.c
+++ b/hw/pci-host/gpex-acpi.c
@@ -7,6 +7,7 @@
 #include "hw/pci/pci_bridge.h"
 #include "hw/pci/pcie_host.h"
 #include "hw/acpi/cxl.h"
+#include "qemu/error-report.h"
 
 static void acpi_dsdt_add_pci_route_table(Aml *dev, uint32_t irq,
                                           Aml *scope, uint8_t bus_num)
@@ -108,6 +109,7 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
     CrsRangeSet crs_range_set;
     CrsRangeEntry *entry;
     int i;
+    bool first_cxl = true;
 
     /* start to construct the tables for pxb */
     crs_range_set_init(&crs_range_set);
@@ -161,6 +163,44 @@ void acpi_dsdt_add_gpex(Aml *scope, struct GPEXConfig *cfg)
              */
             crs = build_crs(PCI_HOST_BRIDGE(BUS(bus)->parent), &crs_range_set,
                             cfg->pio.base, 0, 0, 0);
+            if (is_cxl && first_cxl && cfg->cxl_mmio.size) {
+                uint64_t cfmws_end = cfg->cxl_mmio.base +
+                                     cfg->cxl_mmio.size - 1;
+
+                /*
+                 * The CXL Fixed Memory Window (CFMWS) is a system-wide GPA
+                 * range.  Only the first CXL root bridge emits the QWord
+                 * descriptor; adding it to every bridge would give the OS
+                 * duplicate resource claims for the same range.
+                 *
+                 * build_crs() has already appended the bridge's own 64-bit
+                 * ranges into crs.  Do not copy them again here; only append
+                 * the CFMWS window itself as a new QWord descriptor.
+                 *
+                 * Warn if the CFMWS window overlaps any range already claimed
+                 * by the bridge; in the current address layout they should be
+                 * disjoint, but catch it early if the layout ever changes.
+                 */
+                for (i = 0; i < crs_range_set.mem_64bit_ranges->len; i++) {
+                    entry = g_ptr_array_index(crs_range_set.mem_64bit_ranges,
+                                              i);
+                    if (entry->base <= cfmws_end &&
+                        entry->limit >= cfg->cxl_mmio.base) {
+                        warn_report("CXL CFMWS [0x%"PRIx64"-0x%"PRIx64"] "
+                                    "overlaps CXL root bridge 64-bit range "
+                                    "[0x%"PRIx64"-0x%"PRIx64"]",
+                                    cfg->cxl_mmio.base, cfmws_end,
+                                    entry->base, entry->limit);
+                    }
+                }
+                aml_append(crs,
+                    aml_qword_memory(AML_POS_DECODE, AML_MIN_FIXED,
+                        AML_MAX_FIXED, AML_NON_CACHEABLE, AML_READ_WRITE,
+                        0x0000, cfg->cxl_mmio.base, cfmws_end, 0x0000,
+                        cfg->cxl_mmio.size));
+                first_cxl = false;
+            }
+
             aml_append(dev, aml_name_decl("_CRS", crs));
 
             if (is_cxl) {
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 5fcbd1c76f..88bb3c0bdf 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -91,6 +91,7 @@ enum {
     VIRT_CXL_HOST,
     VIRT_HIGH_PCIE_ECAM,
     VIRT_HIGH_PCIE_MMIO,
+    VIRT_HIGH_CXL_MMIO,
 };
 
 typedef enum VirtIOMMUType {
@@ -147,6 +148,7 @@ struct VirtMachineState {
     bool highmem;
     bool highmem_compact;
     bool highmem_cxl;
+    bool highmem_cxl_mmio;  /* VIRT_HIGH_CXL_MMIO window; follows highmem_cxl */
     bool highmem_ecam;
     bool highmem_mmio;
     bool highmem_redists;
diff --git a/include/hw/pci-host/gpex.h b/include/hw/pci-host/gpex.h
index 1da9c85bce..a7c2e2edf3 100644
--- a/include/hw/pci-host/gpex.h
+++ b/include/hw/pci-host/gpex.h
@@ -43,6 +43,7 @@ struct GPEXConfig {
     MemMapEntry mmio32;
     MemMapEntry mmio64;
     MemMapEntry pio;
+    MemMapEntry cxl_mmio;
     int         irq;
     PCIBus      *bus;
     bool        pci_native_hotplug;
-- 
2.25.1


Thread overview: 10+ messages
2026-04-27 18:12 [RFC 0/9] QEMU: CXL Type-2 device passthrough via vfio-pci mhonap
2026-04-27 18:12 ` mhonap [this message]
2026-04-27 18:12 ` [RFC 2/9] cxl: Add preserve_config to pxb-cxl OSC method mhonap
2026-04-27 18:12 ` [RFC 3/9] linux-headers: Update vfio.h for CXL Type-2 device passthrough mhonap
2026-04-27 18:12 ` [RFC 4/9] hw/vfio/region: Add vfio_region_setup_with_ops() for custom region ops mhonap
2026-04-27 18:12 ` [RFC 5/9] hw/vfio/pci: Add CXL Type-2 device detection and region setup mhonap
2026-04-27 18:12 ` [RFC 6/9] hw/vfio/pci: Wire CXL component-register BAR with COMP_REGS overlay mhonap
2026-04-27 18:12 ` [RFC 7/9] hw/vfio+cxl: Program HDM decoder 0 at machine_done for firmware-committed devices mhonap
2026-04-27 18:12 ` [RFC 8/9] hw/arm/smmu-common: Allow pxb-cxl as SMMUv3 primary bus mhonap
2026-04-27 18:12 ` [RFC 9/9] vfio/listener: Skip DMA mapping for VFIO-owned RAM-device regions mhonap
