From: <mhonap@nvidia.com>
To: <alwilliamson@nvidia.com>, <skolothumtho@nvidia.com>,
    <ankita@nvidia.com>, <mst@redhat.com>, <imammedo@redhat.com>,
    <anisinha@redhat.com>, <eric.auger@redhat.com>,
    <peter.maydell@linaro.org>, <shannon.zhaosl@gmail.com>,
    <jonathan.cameron@huawei.com>, <fan.ni@samsung.com>,
    <pbonzini@redhat.com>, <richard.henderson@linaro.org>,
    <marcel.apfelbaum@gmail.com>, <clg@redhat.com>,
    <cohuck@redhat.com>, <dan.j.williams@intel.com>,
    <dave.jiang@intel.com>, <alejandro.lucero-palau@amd.com>
Cc: <vsethi@nvidia.com>, <cjia@nvidia.com>, <targupta@nvidia.com>,
    <zhiw@nvidia.com>, <kjaju@nvidia.com>,
    <linux-cxl@vger.kernel.org>, <kvm@vger.kernel.org>,
    <qemu-devel@nongnu.org>, <qemu-arm@nongnu.org>,
    "Manish Honap" <mhonap@nvidia.com>
Subject: [RFC 6/9] hw/vfio/pci: Wire CXL component-register BAR with COMP_REGS overlay
Date: Mon, 27 Apr 2026 23:42:32 +0530
Message-ID: <20260427181235.3003865-7-mhonap@nvidia.com>
In-Reply-To: <20260427181235.3003865-1-mhonap@nvidia.com>
From: Manish Honap <mhonap@nvidia.com>

The CXL Component Register BAR contains two types of ranges that need
different handling:

- Accelerator register windows: passed through as direct hardware
  mmaps for performance. The kernel reports the real BAR size and
  lists the mmappable windows via VFIO_REGION_INFO_CAP_SPARSE_MMAP,
  excluding the HDM Decoder Capability block. vfio_region_mmap()
  creates a hardware-backed sub-region for each sparse area.

- HDM Decoder Capability block: guest accesses must go through
  emulated ops so that QEMU can observe and program decoder state.
  The kernel blocks direct mmap of this range.
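
As a rough illustration of the first bullet, the following sketch
computes the mmappable windows that remain once the HDM Decoder
Capability block is carved out of the BAR. It assumes a single
contiguous HDM block lying entirely inside the BAR and that the rest of
the BAR is mmappable; the window list the kernel actually reports may
differ. struct mmap_window and split_bar_around_hdm() are hypothetical
names, not VFIO UAPI or QEMU code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one sparse-mmap area: an offset/size pair
 * relative to the start of the BAR. This is not the VFIO UAPI struct. */
struct mmap_window {
    uint64_t offset;
    uint64_t size;
};

/*
 * Compute the mmappable windows of a BAR of bar_size bytes once the
 * HDM Decoder Capability block [hdm_off, hdm_off + hdm_size) is
 * excluded. Writes at most two windows to out[] and returns how many.
 * Assumption: one contiguous, non-empty HDM block inside the BAR.
 */
static int split_bar_around_hdm(uint64_t bar_size, uint64_t hdm_off,
                                uint64_t hdm_size, struct mmap_window out[2])
{
    uint64_t hdm_end = hdm_off + hdm_size;
    int n = 0;

    assert(hdm_size > 0 && hdm_end <= bar_size);

    if (hdm_off > 0) {              /* window below the HDM block */
        out[n].offset = 0;
        out[n].size = hdm_off;
        n++;
    }
    if (hdm_end < bar_size) {       /* window above the HDM block */
        out[n].offset = hdm_end;
        out[n].size = bar_size - hdm_end;
        n++;
    }
    return n;
}
```

For a 64 KiB BAR with the HDM block at offset 0x1000 and size 0x1000,
this yields two windows: [0x0, 0x1000) and [0x2000, 0x10000).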

vfio_bar_register(): after the normal mmap path, overlay the COMP_REGS
emulation region at hdm_regs_offset with priority 1. In QEMU's
MemoryRegion model, overlapping sibling subregions are resolved by
priority, and the default priority is 0. Priority 1 therefore ensures
that guest accesses to the HDM range always dispatch through the
emulated COMP_REGS ops instead of the underlying base BAR region, which
was added at the default priority, or any hardware-backed sub-region
within it.

vfio_pci_bars_exit(): remove the COMP_REGS overlay from the BAR
container before the base BAR region is detached from it.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
---
 hw/vfio/pci.c        | 26 ++++++++++++++++++++++++++
 hw/vfio/trace-events |  1 +
 2 files changed, 27 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 49ac661eb3..0270de61d2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1960,6 +1960,10 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
+    bool cxl_comp_regs_bar = (vdev->vbasedev.flags & VFIO_DEVICE_FLAGS_CXL) &&
+                             nr == vdev->cxl.hdm_regs_bar_index &&
+                             vdev->cxl.comp_regs_region.mem;
+
     bar->mr = g_new0(MemoryRegion, 1);
     name = g_strdup_printf("%s base BAR %d", vdev->vbasedev.name, nr);
     memory_region_init_io(bar->mr, OBJECT(vdev), NULL, NULL, name, bar->size);
@@ -1974,6 +1978,21 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
         }
     }
 
+    if (cxl_comp_regs_bar) {
+        /*
+         * Overlay the COMP_REGS emulation at hdm_regs_offset with priority 1.
+         * The kernel excludes the HDM Decoder Capability block from the
+         * sparse-mmap list, so vfio_region_mmap() creates hardware-backed
+         * sub-regions only for accelerator register windows. The emulated
+         * COMP_REGS region sits above those at priority 1, ensuring guest
+         * accesses to the HDM range always dispatch through the emulated ops.
+         */
+        memory_region_add_subregion_overlap(bar->mr, vdev->cxl.hdm_regs_offset,
+                                            vdev->cxl.comp_regs_region.mem, 1);
+        trace_vfio_cxl_bar_subregion(vdev->vbasedev.name, nr,
+                                     vdev->cxl.hdm_regs_offset);
+    }
+
     pci_register_bar(pdev, nr, bar->type, bar->mr);
 }
 
@@ -1993,9 +2012,16 @@ void vfio_pci_bars_exit(VFIOPCIDevice *vdev)
 
     for (i = 0; i < PCI_ROM_SLOT; i++) {
         VFIOBAR *bar = &vdev->bars[i];
+        bool use_comp_regs = (vdev->vbasedev.flags & VFIO_DEVICE_FLAGS_CXL) &&
+                             i == vdev->cxl.hdm_regs_bar_index &&
+                             vdev->cxl.comp_regs_region.mem;
 
         vfio_bar_quirk_exit(vdev, i);
         vfio_region_exit(&bar->region);
+        if (use_comp_regs && bar->mr) {
+            memory_region_del_subregion(bar->mr,
+                                        vdev->cxl.comp_regs_region.mem);
+        }
         if (bar->region.size) {
             memory_region_del_subregion(bar->mr, bar->region.mem);
         }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 3678481a8e..3bced3cebb 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -201,3 +201,4 @@ vfio_device_detach(const char *name, int group_id) " (%s) group %d"
 # pci.c CXL Type-2 passthrough
 vfio_cxl_setup_params(const char *name, uint8_t bar, uint64_t hdm_off, uint64_t hdm_sz, uint64_t dpa_sz) " (%s) hdm_bar=%u hdm_regs_offset=0x%"PRIx64" hdm_regs_size=0x%"PRIx64" dpa_size=0x%"PRIx64
 vfio_cxl_put_device(const char *name) " (%s) removing DPA region from system memory"
+vfio_cxl_bar_subregion(const char *name, int nr, uint64_t off) " (%s) BAR%d comp_regs overlay at BAR offset 0x%"PRIx64
--
2.25.1
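
To see what the new vfio_cxl_bar_subregion event prints, its format
string from hw/vfio/trace-events can be rendered with snprintf(). This
sketch covers only the argument portion of the event; any prefix added
by QEMU's trace backend is not modelled, and the helper name and the
example values are hypothetical.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: format the arguments of the vfio_cxl_bar_subregion
 * trace event with the same format string as its trace-events entry.
 * Returns what snprintf() returns (the untruncated length).
 */
static int format_cxl_bar_subregion(char *buf, size_t len, const char *name,
                                    int nr, uint64_t off)
{
    return snprintf(buf, len,
                    " (%s) BAR%d comp_regs overlay at BAR offset 0x%" PRIx64,
                    name, nr, off);
}
```

For a device named 0000:01:00.0 with the overlay in BAR 2 at offset
0x10000, the rendered text is
" (0000:01:00.0) BAR2 comp_regs overlay at BAR offset 0x10000".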