From: <mhonap@nvidia.com>
To: <alwilliamson@nvidia.com>, <skolothumtho@nvidia.com>,
	<ankita@nvidia.com>, <mst@redhat.com>, <imammedo@redhat.com>,
	<anisinha@redhat.com>, <eric.auger@redhat.com>,
	<peter.maydell@linaro.org>, <shannon.zhaosl@gmail.com>,
	<jonathan.cameron@huawei.com>, <fan.ni@samsung.com>,
	<pbonzini@redhat.com>, <richard.henderson@linaro.org>,
	<marcel.apfelbaum@gmail.com>, <clg@redhat.com>,
	<cohuck@redhat.com>, <dan.j.williams@intel.com>,
	<dave.jiang@intel.com>, <alejandro.lucero-palau@amd.com>
Cc: <vsethi@nvidia.com>, <cjia@nvidia.com>, <targupta@nvidia.com>,
	<zhiw@nvidia.com>, <kjaju@nvidia.com>,
	<linux-cxl@vger.kernel.org>, <kvm@vger.kernel.org>,
	<qemu-devel@nongnu.org>, <qemu-arm@nongnu.org>,
	"Manish Honap" <mhonap@nvidia.com>
Subject: [RFC 6/9] hw/vfio/pci: Wire CXL component-register BAR with COMP_REGS overlay
Date: Mon, 27 Apr 2026 23:42:32 +0530
Message-ID: <20260427181235.3003865-7-mhonap@nvidia.com>
In-Reply-To: <20260427181235.3003865-1-mhonap@nvidia.com>

From: Manish Honap <mhonap@nvidia.com>

The CXL Component Register BAR contains two types of ranges that need
different handling:

  - Accelerator register windows: passed through as direct hardware
    mmaps for performance. The kernel reports the real BAR size and
    lists mmappable windows via VFIO_REGION_INFO_CAP_SPARSE_MMAP,
    excluding the HDM Decoder Capability block. vfio_region_mmap()
    creates hardware-backed sub-regions for each sparse area.

  - HDM Decoder Capability block: guest accesses must go through
    emulated ops so QEMU can observe and program decoder state. The
    kernel blocks direct mmap of this range.

vfio_bar_register(): after the normal mmap path, overlay the COMP_REGS
emulation region at hdm_regs_offset with priority 1. In QEMU's
MemoryRegion model, overlapping subregions are resolved by priority,
and memory_region_add_subregion() uses the default priority 0. The
BAR's base VFIO region is added at offset 0 with that default and spans
the whole BAR, so it overlaps the HDM range. Priority 1 ensures guest
accesses to the HDM range always dispatch through the emulated
COMP_REGS ops rather than through the base region or any
hardware-backed sub-region inside it.

vfio_pci_bars_exit(): remove the COMP_REGS overlay from the BAR
container after vfio_region_exit() and before the base VFIO region is
deleted from it.

Signed-off-by: Zhi Wang <zhiw@nvidia.com>
Signed-off-by: Manish Honap <mhonap@nvidia.com>
---
 hw/vfio/pci.c        | 26 ++++++++++++++++++++++++++
 hw/vfio/trace-events |  1 +
 2 files changed, 27 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 49ac661eb3..0270de61d2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1960,6 +1960,10 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
         return;
     }
 
+    bool cxl_comp_regs_bar = (vdev->vbasedev.flags & VFIO_DEVICE_FLAGS_CXL) &&
+                              nr == vdev->cxl.hdm_regs_bar_index &&
+                              vdev->cxl.comp_regs_region.mem;
+
     bar->mr = g_new0(MemoryRegion, 1);
     name = g_strdup_printf("%s base BAR %d", vdev->vbasedev.name, nr);
     memory_region_init_io(bar->mr, OBJECT(vdev), NULL, NULL, name, bar->size);
@@ -1974,6 +1978,21 @@ static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
         }
     }
 
+    if (cxl_comp_regs_bar) {
+        /*
+         * Overlay the COMP_REGS emulation at hdm_regs_offset with priority 1.
+         * The kernel excludes the HDM Decoder Capability block from the
+         * sparse-mmap list, so vfio_region_mmap() creates hardware-backed
+         * sub-regions only for accelerator register windows. The emulated
+         * COMP_REGS region sits above those at priority 1, ensuring guest
+         * accesses to the HDM range always dispatch through the emulated ops.
+         */
+        memory_region_add_subregion_overlap(bar->mr, vdev->cxl.hdm_regs_offset,
+                                            vdev->cxl.comp_regs_region.mem, 1);
+        trace_vfio_cxl_bar_subregion(vdev->vbasedev.name, nr,
+                                     vdev->cxl.hdm_regs_offset);
+    }
+
     pci_register_bar(pdev, nr, bar->type, bar->mr);
 }
 
@@ -1993,9 +2012,16 @@ void vfio_pci_bars_exit(VFIOPCIDevice *vdev)
 
     for (i = 0; i < PCI_ROM_SLOT; i++) {
         VFIOBAR *bar = &vdev->bars[i];
+        bool use_comp_regs = (vdev->vbasedev.flags & VFIO_DEVICE_FLAGS_CXL) &&
+                             i == vdev->cxl.hdm_regs_bar_index &&
+                             vdev->cxl.comp_regs_region.mem;
 
         vfio_bar_quirk_exit(vdev, i);
         vfio_region_exit(&bar->region);
+        if (use_comp_regs && bar->mr) {
+            memory_region_del_subregion(bar->mr,
+                                        vdev->cxl.comp_regs_region.mem);
+        }
         if (bar->region.size) {
             memory_region_del_subregion(bar->mr, bar->region.mem);
         }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 3678481a8e..3bced3cebb 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -201,3 +201,4 @@ vfio_device_detach(const char *name, int group_id) " (%s) group %d"
 # pci.c CXL Type-2 passthrough
 vfio_cxl_setup_params(const char *name, uint8_t bar, uint64_t hdm_off, uint64_t hdm_sz, uint64_t dpa_sz) " (%s) hdm_bar=%u hdm_regs_offset=0x%"PRIx64" hdm_regs_size=0x%"PRIx64" dpa_size=0x%"PRIx64
 vfio_cxl_put_device(const char *name) " (%s) removing DPA region from system memory"
+vfio_cxl_bar_subregion(const char *name, int nr, uint64_t off) " (%s) BAR%d comp_regs overlay at BAR offset 0x%"PRIx64
-- 
2.25.1

