From: Joao Martins <joao.m.martins@oracle.com>
To: qemu-devel@nongnu.org
Cc: Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>,
Zhenzhong Duan <zhenzhong.duan@intel.com>,
Alex Williamson <alex.williamson@redhat.com>,
Cedric Le Goater <clg@redhat.com>,
Jason Gunthorpe <jgg@nvidia.com>,
Avihai Horon <avihaih@nvidia.com>,
Joao Martins <joao.m.martins@oracle.com>
Subject: [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation
Date: Fri, 12 Jul 2024 12:46:57 +0100 [thread overview]
Message-ID: <20240712114704.8708-6-joao.m.martins@oracle.com> (raw)
In-Reply-To: <20240712114704.8708-1-joao.m.martins@oracle.com>
There's generally two modes of operation for IOMMUFD:
* The simple user API which intends to perform relatively simple things
with IOMMUs e.g. DPDK. It generally creates an IOAS and attach to VFIO
and mainly performs IOAS_MAP and UNMAP.
* The native IOMMUFD API where you have fine grained control of the
IOMMU domain and model it accordingly. This is where most new feature
are being steered to.
For dirty tracking 2) is required, as it needs to ensure that
the stage-2/parent IOMMU domain will only attach devices
that support dirty tracking (so far it is all homogeneous in x86, likely
not the case for smmuv3). Such invariant on dirty tracking provides a
useful guarantee to VMMs that will refuse incompatible device
attachments for IOMMU domains.
Dirty tracking insurance is enforced via HWPT_ALLOC, which is
responsible for creating an IOMMU domain. This is contrast to the
'simple API' where the IOMMU domain is created by IOMMUFD automatically
when it attaches to VFIO (usually referred as autodomains) but it has
the needed handling for mdevs.
To support dirty tracking with the advanced IOMMUFD API, it needs
similar logic, where IOMMU domains are created and devices attached to
compatible domains. Essentially mimmicing kernel
iommufd_device_auto_get_domain(). With mdevs given there's no IOMMU domain
it falls back to IOAS attach.
The auto domain logic allows different IOMMU domains to be created when
DMA dirty tracking is not desired (and VF can provide it), and others where
it is. Here is not used in this way here given how VFIODevice migration
state is initialized after the device attachment. But such mixed mode of
IOMMU dirty tracking + device dirty tracking is an improvement that can
be added on. Keep the 'all of nothing' of type1 approach that we have
been using so far between container vs device dirty tracking.
Signed-off-by: Joao Martins <joao.m.martins@oracle.com>
---
include/hw/vfio/vfio-common.h | 9 ++++
include/sysemu/iommufd.h | 5 +++
backends/iommufd.c | 30 +++++++++++++
hw/vfio/iommufd.c | 82 +++++++++++++++++++++++++++++++++++
backends/trace-events | 1 +
5 files changed, 127 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 7419466bca92..2dd468ce3c02 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -95,10 +95,17 @@ typedef struct VFIOHostDMAWindow {
typedef struct IOMMUFDBackend IOMMUFDBackend;
+typedef struct VFIOIOASHwpt {
+ uint32_t hwpt_id;
+ QLIST_HEAD(, VFIODevice) device_list;
+ QLIST_ENTRY(VFIOIOASHwpt) next;
+} VFIOIOASHwpt;
+
typedef struct VFIOIOMMUFDContainer {
VFIOContainerBase bcontainer;
IOMMUFDBackend *be;
uint32_t ioas_id;
+ QLIST_HEAD(, VFIOIOASHwpt) hwpt_list;
} VFIOIOMMUFDContainer;
OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
@@ -135,6 +142,8 @@ typedef struct VFIODevice {
HostIOMMUDevice *hiod;
int devid;
IOMMUFDBackend *iommufd;
+ VFIOIOASHwpt *hwpt;
+ QLIST_ENTRY(VFIODevice) hwpt_next;
} VFIODevice;
struct VFIODeviceOps {
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 57d502a1c79a..e917e7591d05 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -50,6 +50,11 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp);
+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt,
+ Error **errp);
#define TYPE_HOST_IOMMU_DEVICE_IOMMUFD TYPE_HOST_IOMMU_DEVICE "-iommufd"
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 2b3d51af26d2..5d3dfa917415 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -208,6 +208,36 @@ int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
return ret;
}
+bool iommufd_backend_alloc_hwpt(IOMMUFDBackend *be, uint32_t dev_id,
+ uint32_t pt_id, uint32_t flags,
+ uint32_t data_type, uint32_t data_len,
+ void *data_ptr, uint32_t *out_hwpt,
+ Error **errp)
+{
+ int ret, fd = be->fd;
+ struct iommu_hwpt_alloc alloc_hwpt = {
+ .size = sizeof(struct iommu_hwpt_alloc),
+ .flags = flags,
+ .dev_id = dev_id,
+ .pt_id = pt_id,
+ .data_type = data_type,
+ .data_len = data_len,
+ .data_uptr = (uint64_t)data_ptr,
+ };
+
+ ret = ioctl(fd, IOMMU_HWPT_ALLOC, &alloc_hwpt);
+ trace_iommufd_backend_alloc_hwpt(fd, dev_id, pt_id, flags, data_type,
+ data_len, (uint64_t)data_ptr,
+ alloc_hwpt.out_hwpt_id, ret);
+ if (ret) {
+ error_setg_errno(errp, errno, "Failed to allocate hwpt");
+ return false;
+ }
+
+ *out_hwpt = alloc_hwpt.out_hwpt_id;
+ return true;
+}
+
bool iommufd_backend_get_device_info(IOMMUFDBackend *be, uint32_t devid,
uint32_t *type, void *data, uint32_t len,
uint64_t *caps, Error **errp)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 077dea8f1b64..325c7598d5a1 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -212,10 +212,86 @@ static bool iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
return true;
}
+static bool iommufd_cdev_autodomains_get(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container,
+ Error **errp)
+{
+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
+ uint32_t flags = 0;
+ VFIOIOASHwpt *hwpt;
+ uint32_t hwpt_id;
+ int ret;
+
+ /* Try to find a domain */
+ QLIST_FOREACH(hwpt, &container->hwpt_list, next) {
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (ret) {
+ /* -EINVAL means the domain is incompatible with the device. */
+ if (ret == -EINVAL) {
+ /*
+ * It is an expected failure and it just means we will try
+ * another domain, or create one if no existing compatible
+ * domain is found. Hence why the error is discarded below.
+ */
+ error_free(*errp);
+ *errp = NULL;
+ continue;
+ }
+
+ return false;
+ } else {
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ return true;
+ }
+ }
+
+ if (!iommufd_backend_alloc_hwpt(iommufd, vbasedev->devid,
+ container->ioas_id, flags,
+ IOMMU_HWPT_DATA_NONE, 0, NULL,
+ &hwpt_id, errp)) {
+ return false;
+ }
+
+ hwpt = g_malloc0(sizeof(*hwpt));
+ hwpt->hwpt_id = hwpt_id;
+ QLIST_INIT(&hwpt->device_list);
+
+ ret = iommufd_cdev_attach_ioas_hwpt(vbasedev, hwpt->hwpt_id, errp);
+ if (ret) {
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ return false;
+ }
+
+ vbasedev->hwpt = hwpt;
+ QLIST_INSERT_HEAD(&hwpt->device_list, vbasedev, hwpt_next);
+ QLIST_INSERT_HEAD(&container->hwpt_list, hwpt, next);
+ return true;
+}
+
+static void iommufd_cdev_autodomains_put(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container)
+{
+ VFIOIOASHwpt *hwpt = vbasedev->hwpt;
+
+ QLIST_REMOVE(vbasedev, hwpt_next);
+ if (QLIST_EMPTY(&hwpt->device_list)) {
+ QLIST_REMOVE(hwpt, next);
+ iommufd_backend_free_id(container->be, hwpt->hwpt_id);
+ g_free(hwpt);
+ }
+}
+
static bool iommufd_cdev_attach_container(VFIODevice *vbasedev,
VFIOIOMMUFDContainer *container,
Error **errp)
{
+ /* mdevs aren't physical devices and will fail with auto domains */
+ if (!vbasedev->mdev) {
+ return iommufd_cdev_autodomains_get(vbasedev, container, errp);
+ }
+
return !iommufd_cdev_attach_ioas_hwpt(vbasedev, container->ioas_id, errp);
}
@@ -224,6 +300,11 @@ static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
{
Error *err = NULL;
+ if (vbasedev->hwpt) {
+ iommufd_cdev_autodomains_put(vbasedev, container);
+ return;
+ }
+
if (!iommufd_cdev_detach_ioas_hwpt(vbasedev, &err)) {
error_report_err(err);
}
@@ -354,6 +435,7 @@ static bool iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
container = VFIO_IOMMU_IOMMUFD(object_new(TYPE_VFIO_IOMMU_IOMMUFD));
container->be = vbasedev->iommufd;
container->ioas_id = ioas_id;
+ QLIST_INIT(&container->hwpt_list);
bcontainer = &container->bcontainer;
vfio_address_space_insert(space, bcontainer);
diff --git a/backends/trace-events b/backends/trace-events
index 211e6f374adc..4d8ac02fe7d6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -14,4 +14,5 @@ iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size
iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
+iommufd_backend_alloc_hwpt(int iommufd, uint32_t dev_id, uint32_t pt_id, uint32_t flags, uint32_t hwpt_type, uint32_t len, uint64_t data_ptr, uint32_t out_hwpt_id, int ret) " iommufd=%d dev_id=%u pt_id=%u flags=0x%x hwpt_type=%u len=%u data_ptr=0x%"PRIx64" out_hwpt=%u (%d)"
iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
--
2.17.2
next prev parent reply other threads:[~2024-07-12 11:49 UTC|newest]
Thread overview: 82+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-12 11:46 [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Joao Martins
2024-07-12 11:46 ` [PATCH v4 01/12] vfio/pci: Extract mdev check into an helper Joao Martins
2024-07-16 9:21 ` Cédric Le Goater
2024-07-16 9:33 ` Joao Martins
2024-07-12 11:46 ` [PATCH v4 02/12] vfio/iommufd: Don't initialize nor set a HOST_IOMMU_DEVICE with mdev Joao Martins
2024-07-16 9:21 ` Cédric Le Goater
2024-07-16 13:26 ` Eric Auger
2024-07-17 1:34 ` Duan, Zhenzhong
2024-07-12 11:46 ` [PATCH v4 03/12] backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities Joao Martins
2024-07-16 9:22 ` Cédric Le Goater
2024-07-16 13:34 ` Eric Auger
2024-07-12 11:46 ` [PATCH v4 04/12] vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() Joao Martins
2024-07-16 9:27 ` Cédric Le Goater
2024-07-16 13:36 ` Eric Auger
2024-07-17 1:37 ` Duan, Zhenzhong
2024-07-12 11:46 ` Joao Martins [this message]
2024-07-16 9:39 ` [PATCH v4 05/12] vfio/iommufd: Introduce auto domain creation Cédric Le Goater
2024-07-16 9:47 ` Joao Martins
2024-07-16 12:54 ` Cédric Le Goater
2024-07-16 16:04 ` Eric Auger
2024-07-16 16:44 ` Joao Martins
2024-07-16 16:46 ` Joao Martins
2024-07-17 2:52 ` Duan, Zhenzhong
2024-07-17 9:09 ` Joao Martins
2024-07-17 9:28 ` Cédric Le Goater
2024-07-17 9:31 ` Joao Martins
2024-07-18 13:47 ` Joao Martins
2024-07-19 6:06 ` Cédric Le Goater
2024-07-17 9:48 ` Duan, Zhenzhong
2024-07-17 9:53 ` Joao Martins
2024-07-16 17:32 ` Eric Auger
2024-07-17 2:18 ` Duan, Zhenzhong
2024-07-17 9:04 ` Joao Martins
2024-07-17 10:05 ` Duan, Zhenzhong
2024-07-17 11:04 ` Joao Martins
2024-07-18 7:44 ` Duan, Zhenzhong
2024-07-18 9:16 ` Joao Martins
2024-07-19 2:36 ` Duan, Zhenzhong
2024-07-12 11:46 ` [PATCH v4 06/12] vfio/{iommufd,container}: Remove caps::aw_bits Joao Martins
2024-07-16 10:19 ` Cédric Le Goater
2024-07-16 17:40 ` Eric Auger
2024-07-16 18:22 ` Joao Martins
2024-07-17 11:48 ` Eric Auger
2024-07-12 11:46 ` [PATCH v4 07/12] vfio/{iommufd, container}: Initialize HostIOMMUDeviceCaps during attach_device() Joao Martins via
2024-07-16 10:20 ` [PATCH v4 07/12] vfio/{iommufd,container}: " Cédric Le Goater
2024-07-16 10:40 ` Joao Martins
2024-07-17 2:05 ` Duan, Zhenzhong
2024-07-17 8:55 ` Joao Martins
2024-07-17 12:19 ` Eric Auger
2024-07-17 12:33 ` Joao Martins
2024-07-17 13:41 ` Eric Auger
2024-07-17 15:34 ` Joao Martins
2024-07-12 11:47 ` [PATCH v4 08/12] vfio/iommufd: Probe and request hwpt dirty tracking capability Joao Martins
2024-07-16 12:21 ` Cédric Le Goater
2024-07-17 12:27 ` Eric Auger
2024-07-17 12:38 ` Joao Martins
2024-07-17 13:43 ` Eric Auger
2024-07-12 11:47 ` [PATCH v4 09/12] vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support Joao Martins
2024-07-16 12:24 ` Cédric Le Goater
2024-07-17 2:24 ` Duan, Zhenzhong
2024-07-17 9:14 ` Joao Martins
2024-07-17 12:36 ` Eric Auger
2024-07-17 12:41 ` Joao Martins
2024-07-17 13:34 ` Eric Auger
2024-07-17 15:18 ` Joao Martins
2024-07-12 11:47 ` [PATCH v4 10/12] vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support Joao Martins
2024-07-16 12:31 ` Cédric Le Goater
2024-07-16 12:53 ` Cédric Le Goater
2024-07-17 12:50 ` Eric Auger
2024-07-12 11:47 ` [PATCH v4 11/12] vfio/migration: Don't block migration device dirty tracking is unsupported Joao Martins
2024-07-17 2:38 ` Duan, Zhenzhong
2024-07-17 9:20 ` Joao Martins
2024-07-17 15:35 ` Joao Martins
2024-07-17 16:02 ` Joao Martins
2024-07-17 16:54 ` Joao Martins
2024-07-18 7:20 ` Duan, Zhenzhong
2024-07-18 9:05 ` Joao Martins
2024-07-17 12:57 ` Eric Auger
2024-07-12 11:47 ` [PATCH v4 12/12] vfio/common: Allow disabling device dirty page tracking Joao Martins
2024-07-16 8:20 ` [PATCH v4 00/12] hw/iommufd: IOMMUFD Dirty Tracking Duan, Zhenzhong
2024-07-16 9:22 ` Joao Martins
2024-07-18 7:50 ` Duan, Zhenzhong
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240712114704.8708-6-joao.m.martins@oracle.com \
--to=joao.m.martins@oracle.com \
--cc=alex.williamson@redhat.com \
--cc=avihaih@nvidia.com \
--cc=clg@redhat.com \
--cc=eric.auger@redhat.com \
--cc=jgg@nvidia.com \
--cc=qemu-devel@nongnu.org \
--cc=yi.l.liu@intel.com \
--cc=zhenzhong.duan@intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).