From: Shameer Kolothum <skolothumtho@nvidia.com>
To: <qemu-arm@nongnu.org>, <qemu-devel@nongnu.org>
Cc: <eric.auger@redhat.com>, <peter.maydell@linaro.org>,
<jgg@nvidia.com>, <nicolinc@nvidia.com>, <ddutile@redhat.com>,
<berrange@redhat.com>, <nathanc@nvidia.com>, <mochs@nvidia.com>,
<smostafa@google.com>, <wangzhou1@hisilicon.com>,
<jiangkunkun@huawei.com>, <jonathan.cameron@huawei.com>,
<zhangfei.gao@linaro.org>, <zhenzhong.duan@intel.com>,
<yi.l.liu@intel.com>, <kjaju@nvidia.com>,
"Michael S . Tsirkin" <mst@redhat.com>
Subject: [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly
Date: Thu, 20 Nov 2025 13:21:56 +0000 [thread overview]
Message-ID: <20251120132213.56581-17-skolothumtho@nvidia.com> (raw)
In-Reply-To: <20251120132213.56581-1-skolothumtho@nvidia.com>
For certain vIOMMU implementations, such as SMMUv3 in accelerated mode,
the translation tables are programmed directly into the physical SMMUv3
in a nested configuration. While QEMU knows where the guest tables live,
safely walking them in software would require trapping and ordering all
guest invalidations on every command queue. Without this, QEMU could race
with guest updates and walk stale or freed page tables.
This constraint is fundamental to the design of HW-accelerated vSMMU when
used with downstream vfio-pci endpoint devices, where QEMU must never walk
guest translation tables and must rely on the physical SMMU for
translation. Future accelerated vSMMU features, such as virtual CMDQ, will
also prevent trapping invalidations, reinforcing this restriction.
For vfio-pci endpoints behind such a vSMMU, the only translation QEMU
needs is for the MSI doorbell used when setting up KVM MSI route tables.
Instead of attempting a software walk, introduce an optional vIOMMU
callback that returns the MSI doorbell GPA directly.
kvm_arch_fixup_msi_route() uses this callback when available and ignores
the guest provided IOVA in that case.
If the vIOMMU does not implement the callback, we fall back to the
existing IOMMU based address space translation path.
This ensures correct MSI routing for accelerated SMMUv3 + VFIO passthrough
while avoiding unsafe software walks of guest translation tables.
Cc: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Shameer Kolothum <skolothumtho@nvidia.com>
---
hw/pci/pci.c | 17 +++++++++++++++++
include/hw/pci/pci.h | 17 +++++++++++++++++
target/arm/kvm.c | 18 +++++++++++++++++-
3 files changed, 51 insertions(+), 1 deletion(-)
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 55647a6928..201583603f 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2979,6 +2979,23 @@ bool pci_device_get_iommu_bus_devfn(PCIDevice *dev, PCIBus **piommu_bus,
return aliased;
}
+bool pci_device_iommu_msi_direct_gpa(PCIDevice *dev, hwaddr *out_doorbell)
+{
+ PCIBus *bus;
+ PCIBus *iommu_bus;
+ int devfn;
+
+ pci_device_get_iommu_bus_devfn(dev, &iommu_bus, &bus, &devfn);
+ if (iommu_bus) {
+ if (iommu_bus->iommu_ops->get_msi_direct_gpa) {
+ *out_doorbell = iommu_bus->iommu_ops->get_msi_direct_gpa(bus,
+ iommu_bus->iommu_opaque, devfn);
+ return true;
+ }
+ }
+ return false;
+}
+
AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
{
PCIBus *bus;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index dd1c4483a2..0964049044 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -664,6 +664,22 @@ typedef struct PCIIOMMUOps {
uint32_t pasid, bool priv_req, bool exec_req,
hwaddr addr, bool lpig, uint16_t prgi, bool is_read,
bool is_write);
+ /**
+ * @get_msi_direct_gpa: get the guest physical address of MSI doorbell
+ * for the device on a PCI bus.
+ *
+ * Optional callback. If implemented, it must return a valid guest
+ * physical address for the MSI doorbell
+ *
+ * @bus: the #PCIBus being accessed.
+ *
+ * @opaque: the data passed to pci_setup_iommu().
+ *
+ * @devfn: device and function number
+ *
+ * Returns: the guest physical address of the MSI doorbell.
+ */
+ uint64_t (*get_msi_direct_gpa)(PCIBus *bus, void *opaque, int devfn);
} PCIIOMMUOps;
bool pci_device_get_iommu_bus_devfn(PCIDevice *dev, PCIBus **piommu_bus,
@@ -672,6 +688,7 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
Error **errp);
void pci_device_unset_iommu_device(PCIDevice *dev);
+bool pci_device_iommu_msi_direct_gpa(PCIDevice *dev, hwaddr *out_doorbell);
/**
* pci_device_get_viommu_flags: get vIOMMU flags.
diff --git a/target/arm/kvm.c b/target/arm/kvm.c
index 0d57081e69..2372de6a6e 100644
--- a/target/arm/kvm.c
+++ b/target/arm/kvm.c
@@ -1620,26 +1620,42 @@ int kvm_arch_fixup_msi_route(struct kvm_irq_routing_entry *route,
return 0;
}
+ /*
+ * We do have an IOMMU address space, but for some vIOMMU implementations
+ * (e.g. accelerated SMMUv3) the translation tables are programmed into
+ * the physical SMMUv3 in the host (nested S1=guest, S2=host). QEMU cannot
+ * walk these tables in a safe way, so in that case we obtain the MSI
+ * doorbell GPA directly from the vIOMMU backend and ignore the gIOVA
+ * @address.
+ */
+ if (pci_device_iommu_msi_direct_gpa(dev, &doorbell_gpa)) {
+ goto set_doorbell;
+ }
+
/* MSI doorbell address is translated by an IOMMU */
- RCU_READ_LOCK_GUARD();
+ rcu_read_lock();
mr = address_space_translate(as, address, &xlat, &len, true,
MEMTXATTRS_UNSPECIFIED);
if (!mr) {
+ rcu_read_unlock();
return 1;
}
mrs = memory_region_find(mr, xlat, 1);
if (!mrs.mr) {
+ rcu_read_unlock();
return 1;
}
doorbell_gpa = mrs.offset_within_address_space;
memory_region_unref(mrs.mr);
+ rcu_read_unlock();
+set_doorbell:
route->u.msi.address_lo = doorbell_gpa;
route->u.msi.address_hi = doorbell_gpa >> 32;
--
2.43.0
next prev parent reply other threads:[~2025-11-20 13:27 UTC|newest]
Thread overview: 89+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-20 13:21 [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 01/33] backends/iommufd: Introduce iommufd_backend_alloc_viommu Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 02/33] backends/iommufd: Introduce iommufd_backend_alloc_vdev Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 03/33] hw/arm/smmu-common: Factor out common helper functions and export Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 04/33] hw/arm/smmu-common: Make iommu ops part of SMMUState Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 05/33] hw/arm/smmuv3-accel: Introduce smmuv3 accel device Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 06/33] hw/arm/smmuv3-accel: Initialize shared system address space Shameer Kolothum
2025-12-08 17:05 ` Eric Auger
2025-11-20 13:21 ` [PATCH v6 07/33] hw/pci/pci: Move pci_init_bus_master() after adding device to bus Shameer Kolothum
2025-11-20 20:44 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 08/33] hw/pci/pci: Add optional supports_address_space() callback Shameer Kolothum
2025-11-20 20:51 ` Nicolin Chen
2025-11-21 10:38 ` Shameer Kolothum
2025-11-21 17:28 ` Nicolin Chen
2025-11-21 17:32 ` Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 09/33] hw/pci-bridge/pci_expander_bridge: Move TYPE_PXB_PCIE_DEV to header Shameer Kolothum
2025-11-20 20:52 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 10/33] hw/arm/smmuv3-accel: Restrict accelerated SMMUv3 to vfio-pci endpoints with iommufd Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 11/33] hw/arm/smmuv3: Implement get_viommu_cap() callback Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 12/33] hw/arm/smmuv3-accel: Add set/unset_iommu_device callback Shameer Kolothum
2025-12-09 7:57 ` Eric Auger
2025-11-20 13:21 ` [PATCH v6 13/33] hw/arm/smmuv3: propagate smmuv3_cmdq_consume() errors to caller Shameer Kolothum
2025-11-20 20:59 ` Nicolin Chen
2025-12-04 16:28 ` Eric Auger
2025-11-20 13:21 ` [PATCH v6 14/33] hw/arm/smmuv3-accel: Add nested vSTE install/uninstall support Shameer Kolothum
2025-12-09 8:14 ` Eric Auger
2025-11-20 13:21 ` [PATCH v6 15/33] hw/arm/smmuv3-accel: Install SMMUv3 GBPA based hwpt Shameer Kolothum
2025-11-20 21:03 ` Nicolin Chen
2025-11-20 13:21 ` Shameer Kolothum [this message]
2025-11-20 21:05 ` [PATCH v6 16/33] hw/pci/pci: Introduce a callback to retrieve the MSI doorbell GPA directly Nicolin Chen
2025-12-04 16:38 ` Eric Auger
2025-12-04 18:57 ` Shameer Kolothum
2025-12-08 17:03 ` Eric Auger
2025-11-20 13:21 ` [PATCH v6 17/33] hw/arm/smmuv3: Add support for providing a direct MSI doorbell GPA Shameer Kolothum
2025-11-20 21:21 ` Nicolin Chen
2025-11-21 9:57 ` Shameer Kolothum
2025-11-21 17:56 ` Nicolin Chen
2025-11-24 8:05 ` Shameer Kolothum
2025-11-24 18:34 ` Nicolin Chen
2025-11-24 19:01 ` Shameer Kolothum
2025-11-24 20:08 ` Nicolin Chen
2025-11-20 13:21 ` [PATCH v6 18/33] hw/arm/smmuv3-accel: Add support to issue invalidation cmd to host Shameer Kolothum
2025-11-20 13:21 ` [PATCH v6 19/33] hw/arm/smmuv3: Initialize ID registers early during realize() Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 20/33] hw/arm/smmuv3-accel: Get host SMMUv3 hw info and validate Shameer Kolothum
2025-11-20 21:27 ` Nicolin Chen
2025-11-20 21:30 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 21/33] hw/pci-host/gpex: Allow to generate preserve boot config DSM #5 Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 22/33] hw/arm/virt: Set PCI preserve_config for accel SMMUv3 Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 23/33] tests/qtest/bios-tables-test: Prepare for IORT revison upgrade Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 24/33] hw/arm/virt-acpi-build: Add IORT RMR regions to handle MSI nested binding Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 25/33] tests/qtest/bios-tables-test: Update IORT blobs after revision upgrade Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 26/33] hw/arm/smmuv3: Add accel property for SMMUv3 device Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 27/33] hw/arm/smmuv3-accel: Add a property to specify RIL support Shameer Kolothum
2025-11-20 21:34 ` Nicolin Chen via
2025-11-21 10:04 ` Shameer Kolothum
2025-11-20 13:22 ` [PATCH v6 28/33] hw/arm/smmuv3-accel: Add support for ATS Shameer Kolothum
2025-11-20 21:40 ` Nicolin Chen
2025-11-24 12:00 ` Zhangfei Gao
2025-11-24 12:48 ` Shameer Kolothum
2025-12-08 17:36 ` Eric Auger
2025-11-20 13:22 ` [PATCH v6 29/33] hw/arm/smmuv3-accel: Add property to specify OAS bits Shameer Kolothum
2025-11-20 21:47 ` Nicolin Chen
2025-12-08 17:17 ` Eric Auger
2025-11-20 13:22 ` [PATCH v6 30/33] backends/iommufd: Retrieve PASID width from iommufd_backend_get_device_info() Shameer Kolothum
2025-11-20 21:50 ` Nicolin Chen
2025-11-20 13:22 ` [PATCH v6 31/33] Extend get_cap() callback to support PASID Shameer Kolothum
2025-11-20 21:56 ` Nicolin Chen
2025-12-08 17:20 ` Eric Auger
2025-11-20 13:22 ` [PATCH v6 32/33] vfio: Synthesize vPASID capability to VM Shameer Kolothum
2025-11-20 21:59 ` Nicolin Chen
2025-12-09 9:51 ` Eric Auger
2025-12-09 11:17 ` Yi Liu
2025-11-20 13:22 ` [PATCH v6 33/33] hw/arm/smmuv3-accel: Add support for PASID enable Shameer Kolothum
2025-11-20 22:09 ` Nicolin Chen
2025-11-21 10:22 ` Shameer Kolothum
2025-11-21 17:50 ` Nicolin Chen
2025-11-21 18:36 ` Nicolin Chen
2025-11-21 18:44 ` Jason Gunthorpe
2025-11-24 8:17 ` Shameer Kolothum
2025-11-20 17:06 ` [PATCH v6 00/33] hw/arm/virt: Add support for user-creatable accelerated SMMUv3 Nicolin Chen
2025-11-24 12:09 ` Zhangfei Gao
2025-12-08 10:08 ` Duan, Zhenzhong
2025-12-08 11:15 ` Shameer Kolothum
2025-12-09 2:30 ` Duan, Zhenzhong
2025-12-09 3:33 ` Yi Liu
2025-12-09 10:31 ` Cédric Le Goater
2025-12-10 15:07 ` Shameer Kolothum
2025-12-10 16:07 ` Cédric Le Goater
2025-12-10 16:18 ` Eric Auger
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20251120132213.56581-17-skolothumtho@nvidia.com \
--to=skolothumtho@nvidia.com \
--cc=berrange@redhat.com \
--cc=ddutile@redhat.com \
--cc=eric.auger@redhat.com \
--cc=jgg@nvidia.com \
--cc=jiangkunkun@huawei.com \
--cc=jonathan.cameron@huawei.com \
--cc=kjaju@nvidia.com \
--cc=mochs@nvidia.com \
--cc=mst@redhat.com \
--cc=nathanc@nvidia.com \
--cc=nicolinc@nvidia.com \
--cc=peter.maydell@linaro.org \
--cc=qemu-arm@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=smostafa@google.com \
--cc=wangzhou1@hisilicon.com \
--cc=yi.l.liu@intel.com \
--cc=zhangfei.gao@linaro.org \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).