qemu-devel.nongnu.org archive mirror
From: Zhenzhong Duan <zhenzhong.duan@intel.com>
To: qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com,
	eric.auger@redhat.com, mst@redhat.com, jasowang@redhat.com,
	peterx@redhat.com, ddutile@redhat.com, jgg@nvidia.com,
	nicolinc@nvidia.com, shameerali.kolothum.thodi@huawei.com,
	joao.m.martins@oracle.com, clement.mathieu--drif@eviden.com,
	kevin.tian@intel.com, yi.l.liu@intel.com, chao.p.peng@intel.com,
	Zhenzhong Duan <zhenzhong.duan@intel.com>
Subject: [PATCH v4 02/20] hw/pci: Introduce pci_device_get_viommu_cap()
Date: Tue, 29 Jul 2025 05:20:24 -0400
Message-ID: <20250729092043.785836-3-zhenzhong.duan@intel.com>
In-Reply-To: <20250729092043.785836-1-zhenzhong.duan@intel.com>

Introduce a new optional PCIIOMMUOps callback, get_viommu_cap(), which
allows retrieving the capabilities exposed by a vIOMMU. The first planned
capability is VIOMMU_CAP_HW_NESTED, which advertises support for the HW
nested-stage translation scheme. pci_device_get_viommu_cap() is a wrapper
that can be called on a PCI device potentially protected by a vIOMMU.
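The callback-plus-wrapper pattern described above can be sketched in plain C. This is a minimal, self-contained mock, not QEMU code: MockIOMMUOps, MockBus, MockVIOMMU, its x_flts field and all mock_* names are hypothetical stand-ins for PCIIOMMUOps, PCIBus and a vIOMMU device. It assumes the vIOMMU derives the bitmap purely from a user-set property, and that the wrapper returns 0 when there is no vIOMMU or no callback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))

enum {
    /* hardware nested stage-1 page table support */
    VIOMMU_CAP_HW_NESTED = BIT_ULL(0),
};

/* Hypothetical, trimmed-down stand-in for PCIIOMMUOps. */
typedef struct MockIOMMUOps {
    /* Optional: may be NULL if the vIOMMU exposes no capabilities. */
    uint64_t (*get_viommu_cap)(void *opaque);
} MockIOMMUOps;

/* Hypothetical stand-in for the PCIBus fields the wrapper uses. */
typedef struct MockBus {
    const MockIOMMUOps *iommu_ops;
    void *iommu_opaque; /* the data that would go to pci_setup_iommu() */
} MockBus;

/* Hypothetical vIOMMU state: only a user-set property matters. */
typedef struct MockVIOMMU {
    int x_flts; /* assumed user property enabling nested translation */
} MockVIOMMU;

/* vIOMMU side: report caps from configuration only, never from the host. */
static uint64_t mock_viommu_get_cap(void *opaque)
{
    const MockVIOMMU *s = opaque;

    assert(s != NULL);
    return s->x_flts ? VIOMMU_CAP_HW_NESTED : 0;
}

static const MockIOMMUOps mock_ops_with_cap = {
    .get_viommu_cap = mock_viommu_get_cap,
};

static const MockIOMMUOps mock_ops_without_cap = {
    .get_viommu_cap = NULL,
};

/* Wrapper side: 0 when there is no vIOMMU or it lacks the callback. */
static uint64_t mock_device_get_viommu_cap(const MockBus *iommu_bus)
{
    if (iommu_bus && iommu_bus->iommu_ops &&
        iommu_bus->iommu_ops->get_viommu_cap) {
        return iommu_bus->iommu_ops->get_viommu_cap(iommu_bus->iommu_opaque);
    }
    return 0;
}
```

A consumer such as VFIO would then test individual bits, e.g. `caps & VIOMMU_CAP_HW_NESTED`, rather than comparing the whole bitmap, so later bits can be added without affecting existing callers.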

get_viommu_cap() is designed to return a 64-bit bitmap of purely emulated
capabilities, which are determined only by the user's configuration; no
host capabilities are involved. The reasons are:

1. There can be more than one host IOMMU, each with different
   capabilities.
2. There can also be more than one vIOMMU, each with a different user
   configuration, e.g., Arm SMMUv3.
3. This is migration friendly: the return value is consistent between
   source and target.
4. It's too late for VFIO to call get_viommu_cap() after
   set_iommu_device(), because get_viommu_cap() is needed at the attach
   stage to decide whether or not to create a nested parent HWPT, while
   realizing the hiod needs the iommufd, devid and hwpt_id, which are
   only available after attach_device().
   See the sequence below:

     attach_device()
       get_viommu_cap()
       create hwpt
     ...
     create hiod
     set_iommu_device(hiod)
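The ordering constraint in point 4 can be illustrated with a small mock of the sequence above. This is a hypothetical sketch rather than VFIO code: MockVFIODevice, its user_nesting flag, the devid/hwpt_id values and all mock_* functions are assumptions. It shows that the HWPT type has to be chosen inside attach_device(), from the capability bitmap, before the hiod can be realized and handed to set_iommu_device().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(nr) (1ULL << (nr))

enum {
    VIOMMU_CAP_HW_NESTED = BIT_ULL(0),
};

/* Hypothetical stand-in for the device state touched by the flow. */
typedef struct MockVFIODevice {
    bool user_nesting;  /* assumed vIOMMU user configuration */
    uint32_t devid;     /* valid only after attach */
    uint32_t hwpt_id;   /* valid only after attach */
    bool nest_parent;   /* HWPT allocated as a nested parent? */
    bool attached;
    bool hiod_realized;
    bool hiod_set;
} MockVFIODevice;

/* Caps depend only on user config, so source and target agree. */
static uint64_t mock_get_viommu_cap(const MockVFIODevice *vdev)
{
    return vdev->user_nesting ? VIOMMU_CAP_HW_NESTED : 0;
}

static void mock_attach_device(MockVFIODevice *vdev)
{
    uint64_t caps = mock_get_viommu_cap(vdev);

    /* The HWPT type is fixed at creation, so caps are needed now. */
    vdev->nest_parent = (caps & VIOMMU_CAP_HW_NESTED) != 0;
    vdev->devid = 1;    /* hypothetical IDs handed out by iommufd */
    vdev->hwpt_id = 2;
    vdev->attached = true;
}

static void mock_realize_hiod(MockVFIODevice *vdev)
{
    /* hiod realize needs devid and hwpt_id, only known after attach. */
    assert(vdev->attached);
    vdev->hiod_realized = true;
}

static void mock_set_iommu_device(MockVFIODevice *vdev)
{
    /* The HWPT already exists here; querying caps now would be too late. */
    assert(vdev->hiod_realized);
    vdev->hiod_set = true;
}

static void mock_vfio_flow(MockVFIODevice *vdev)
{
    mock_attach_device(vdev);    /* get_viommu_cap() + create hwpt */
    mock_realize_hiod(vdev);     /* create hiod */
    mock_set_iommu_device(vdev); /* set_iommu_device(hiod) */
}
```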

Suggested-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
 MAINTAINERS          |  1 +
 include/hw/iommu.h   | 17 +++++++++++++++++
 include/hw/pci/pci.h | 25 +++++++++++++++++++++++++
 hw/pci/pci.c         | 11 +++++++++++
 4 files changed, 54 insertions(+)
 create mode 100644 include/hw/iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 37879ab64e..840cb1e604 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2304,6 +2304,7 @@ F: include/system/iommufd.h
 F: backends/host_iommu_device.c
 F: include/system/host_iommu_device.h
 F: include/qemu/chardev_open.h
+F: include/hw/iommu.h
 F: util/chardev_open.c
 F: docs/devel/vfio-iommufd.rst
 
diff --git a/include/hw/iommu.h b/include/hw/iommu.h
new file mode 100644
index 0000000000..021db50db5
--- /dev/null
+++ b/include/hw/iommu.h
@@ -0,0 +1,17 @@
+/*
+ * General vIOMMU capabilities, flags, etc
+ *
+ * Copyright (C) 2025 Intel Corporation.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef HW_IOMMU_H
+#define HW_IOMMU_H
+
+enum {
+    /* hardware nested stage-1 page table support */
+    VIOMMU_CAP_HW_NESTED = BIT_ULL(0),
+};
+
+#endif /* HW_IOMMU_H */
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 6b7d3ac8a3..d89aefc030 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -462,6 +462,21 @@ typedef struct PCIIOMMUOps {
      * @devfn: device and function number of the PCI device.
      */
     void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
+    /**
+     * @get_viommu_cap: get vIOMMU capabilities
+     *
+     * Optional callback. If not implemented, the vIOMMU doesn't
+     * support exposing capabilities to other subsystems, e.g., VFIO.
+     * The vIOMMU can choose which capabilities to expose.
+     *
+     * @opaque: the data passed to pci_setup_iommu().
+     *
+     * Returns: a 64-bit bitmap in which each bit represents a capability
+     * defined by VIOMMU_CAP_* in include/hw/iommu.h. These capabilities are
+     * theoretical: they are determined only by the user's configuration and
+     * are independent of the actual host capabilities they may depend on.
+     */
+    uint64_t (*get_viommu_cap)(void *opaque);
     /**
      * @get_iotlb_info: get properties required to initialize a device IOTLB.
      *
@@ -642,6 +657,16 @@ bool pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *hiod,
                                  Error **errp);
 void pci_device_unset_iommu_device(PCIDevice *dev);
 
+/**
+ * pci_device_get_viommu_cap: get vIOMMU capabilities.
+ *
+ * Returns a 64-bit bitmap with each bit representing a vIOMMU-exposed
+ * capability, or 0 if the vIOMMU doesn't support exposing capabilities.
+ *
+ * @dev: PCI device pointer.
+ */
+uint64_t pci_device_get_viommu_cap(PCIDevice *dev);
+
 /**
  * pci_iommu_get_iotlb_info: get properties required to initialize a
  * device IOTLB.
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c70b5ceeba..df1fb615a8 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2992,6 +2992,17 @@ void pci_device_unset_iommu_device(PCIDevice *dev)
     }
 }
 
+uint64_t pci_device_get_viommu_cap(PCIDevice *dev)
+{
+    PCIBus *iommu_bus;
+
+    pci_device_get_iommu_bus_devfn(dev, &iommu_bus, NULL, NULL);
+    if (iommu_bus && iommu_bus->iommu_ops->get_viommu_cap) {
+        return iommu_bus->iommu_ops->get_viommu_cap(iommu_bus->iommu_opaque);
+    }
+    return 0;
+}
+
 int pci_pri_request_page(PCIDevice *dev, uint32_t pasid, bool priv_req,
                          bool exec_req, hwaddr addr, bool lpig,
                          uint16_t prgi, bool is_read, bool is_write)
-- 
2.47.1



Thread overview: 41+ messages
2025-07-29  9:20 [PATCH v4 00/20] intel_iommu: Enable stage-1 translation for passthrough device Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 01/20] intel_iommu: Rename vtd_ce_get_rid2pasid_entry to vtd_ce_get_pasid_entry Zhenzhong Duan
2025-07-29  9:20 ` Zhenzhong Duan [this message]
2025-07-29 13:19   ` [PATCH v4 02/20] hw/pci: Introduce pci_device_get_viommu_cap() Cédric Le Goater
2025-07-30 10:51     ` Duan, Zhenzhong
2025-08-04 18:45       ` Nicolin Chen
2025-08-05  6:04         ` Duan, Zhenzhong
2025-08-13  6:43   ` Jim Shu
2025-08-13  6:57     ` Duan, Zhenzhong
2025-07-29  9:20 ` [PATCH v4 03/20] intel_iommu: Implement get_viommu_cap() callback Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 04/20] vfio/iommufd: Force creating nested parent domain Zhenzhong Duan
2025-07-29 13:44   ` Cédric Le Goater
2025-07-30 10:55     ` Duan, Zhenzhong
2025-07-30 14:00       ` Cédric Le Goater
2025-07-31  3:29         ` Duan, Zhenzhong
2025-07-29  9:20 ` [PATCH v4 05/20] hw/pci: Export pci_device_get_iommu_bus_devfn() and return bool Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 06/20] intel_iommu: Introduce a new structure VTDHostIOMMUDevice Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 07/20] intel_iommu: Check for compatibility with IOMMUFD backed device when x-flts=on Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 08/20] intel_iommu: Fail passthrough device under PCI bridge if x-flts=on Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 09/20] intel_iommu: Introduce two helpers vtd_as_from/to_iommu_pasid_locked Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 10/20] intel_iommu: Handle PASID entry removal and update Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 11/20] intel_iommu: Handle PASID entry addition Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 12/20] intel_iommu: Introduce a new pasid cache invalidation type FORCE_RESET Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 13/20] intel_iommu: Stick to system MR for IOMMUFD backed host device when x-fls=on Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 14/20] intel_iommu: Bind/unbind guest page table to host Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 15/20] intel_iommu: Replay pasid bindings after context cache invalidation Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 16/20] intel_iommu: Propagate PASID-based iotlb invalidation to host Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 17/20] intel_iommu: Replay all pasid bindings when either SRTP or TE bit is changed Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 18/20] vfio: Add a new element bypass_ro in VFIOContainerBase Zhenzhong Duan
2025-07-29 13:55   ` Cédric Le Goater
2025-07-30 10:58     ` Duan, Zhenzhong
2025-07-30 14:18       ` Cédric Le Goater
2025-07-31  3:30         ` Duan, Zhenzhong
2025-07-29  9:20 ` [PATCH v4 19/20] Workaround for ERRATA_772415_SPR17 Zhenzhong Duan
2025-07-29  9:20 ` [PATCH v4 20/20] intel_iommu: Enable host device when x-flts=on in scalable mode Zhenzhong Duan
2025-08-21  7:19 ` [PATCH v4 00/20] intel_iommu: Enable stage-1 translation for passthrough device Duan, Zhenzhong
2025-08-21  8:50   ` Yi Liu
2025-08-21  8:50     ` Eric Auger
2025-08-21  8:58       ` Duan, Zhenzhong
2025-08-21  8:50     ` Duan, Zhenzhong
2025-08-21  8:51   ` Michael S. Tsirkin
