* [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU
@ 2024-02-01 7:28 Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 01/18] Introduce a common abstract struct HostIOMMUDevice Zhenzhong Duan
` (17 more replies)
0 siblings, 18 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
Hi,
This enables vIOMMU to get host IOMMU cap/ecap information by introducing
a new set/unset_iommu_device interface, then vIOMMU could check or sync
with vIOMMU's own cap/ecap config.
It works by having device side, i.e. VFIO, register either an IOMMULegacyDevice
or IOMMUFDDevice to vIOMMU, which includes necessary data to archive that.
Currently only VFIO device is supported, but it could also be used for other
devices, i.e., VDPA.
For coldplugged device, we can get its host IOMMU cap/ecap during qemu init,
then check and sync into vIOMMU cap/ecap.
For hotplugged device, vIOMMU cap/ecap is frozen, we could only check with
vIOMMU cap/ecap, not allowed to update. IF check fails, hotplugged will fail.
This is also a prerequisite for incoming iommufd nesting series:
'intel_iommu: Enable stage-1 translation'.
I didn't implement cap/ecap sync for legacy VFIO backend, would like to see
what Eric want to put in IOMMULegacyDevice for virtio-iommu and if I can
utilize some of them.
PATCH1-3: Introduce HostIOMMUDevice and two sub class
PATCH4-5: Define HostIOMMUDevice instance in VFIODevice
PATCH6-9: Introdcue host_iommu_device_init callback to intialize HostIOMMUDevice
PATCH10-12: Introdcue set/unset_iommu_device to pass HostIOMMUDevice to vIOMMU
PATCH13-18: Implement cap/ecap check and sync in intel_iommu
Qemu code can be found at:
https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_rfcv2
Thanks
Zhenzhong
Changelog:
rfcv2:
- introduce common abstract HostIOMMUDevice and sub struct for different BEs (Eric, Cédric)
- remove iommufd_device.[ch] (Cédric)
- remove duplicate iommufd/devid define from VFIODevice (Eric)
- drop the p in aliased_pbus and aliased_pdevfn (Eric)
- assert devfn and iommu_bus in pci_device_get_iommu_bus_devfn (Cédric, Eric)
- use errp in iommufd_device_get_info (Eric)
- split and simplify cap/ecap check/sync code in intel_iommu.c (Cédric)
- move VTDHostIOMMUDevice declaration to intel_iommu_internal.h (Cédric)
- make '(vtd->cap_reg >> 16) & 0x3fULL' a MACRO and add missed '+1' (Cédric)
- block migration if vIOMMU cap/ecap updated based on host IOMMU cap/ecap
- add R-B
Yi Liu (3):
hw/pci: Introduce pci_device_set/unset_iommu_device()
intel_iommu: Add set/unset_iommu_device callback
intel_iommu: Add a framework to check and sync host IOMMU cap/ecap
Zhenzhong Duan (15):
Introduce a common abstract struct HostIOMMUDevice
backends/iommufd: Introduce IOMMUFDDevice
vfio: Introduce IOMMULegacyDevice
vfio: Add host iommu device instance into VFIODevice
vfio: Remove redundant iommufd and devid elements in VFIODevice
vfio: Introduce host_iommu_device_init callback
vfio/container: Implement host_iommu_device_init callback in legacy
mode
vfio/iommufd: Implement host_iommu_device_init callback in iommufd
mode
vfio/pci: Initialize host iommu device instance after attachment
vfio: Initialize host IOMMU device and pass to vIOMMU
intel_iommu: Extract out vtd_cap_init to initialize cap/ecap
backends/iommufd: Introduce helper function iommufd_device_get_info()
intel_iommu: Implement check and sync mechanism in iommufd mode
intel_iommu: Use mgaw instead of s->aw_bits
intel_iommu: Block migration if cap is updated
hw/i386/intel_iommu_internal.h | 15 ++
include/hw/i386/intel_iommu.h | 4 +
include/hw/pci/pci.h | 38 +++-
include/hw/vfio/vfio-common.h | 20 +-
include/hw/vfio/vfio-container-base.h | 1 +
include/sysemu/host_iommu_device.h | 22 ++
include/sysemu/iommufd.h | 18 ++
backends/iommufd.c | 31 ++-
hw/i386/acpi-build.c | 3 +-
hw/i386/intel_iommu.c | 279 ++++++++++++++++++++------
hw/pci/pci.c | 62 +++++-
hw/vfio/ap.c | 2 +-
hw/vfio/ccw.c | 2 +-
hw/vfio/common.c | 10 +-
hw/vfio/container.c | 7 +
hw/vfio/helpers.c | 2 +-
hw/vfio/iommufd.c | 32 +--
hw/vfio/pci.c | 25 ++-
hw/vfio/platform.c | 3 +-
19 files changed, 488 insertions(+), 88 deletions(-)
create mode 100644 include/sysemu/host_iommu_device.h
--
2.34.1
^ permalink raw reply [flat|nested] 35+ messages in thread
* [PATCH rfcv2 01/18] Introduce a common abstract struct HostIOMMUDevice
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 02/18] backends/iommufd: Introduce IOMMUFDDevice Zhenzhong Duan
` (16 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
HostIOMMUDevice will be inherited by two sub classes,
legacy and iommufd currently.
Introduce a helper function host_iommu_base_device_init to initialize it.
Suggested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/sysemu/host_iommu_device.h | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
create mode 100644 include/sysemu/host_iommu_device.h
diff --git a/include/sysemu/host_iommu_device.h b/include/sysemu/host_iommu_device.h
new file mode 100644
index 0000000000..fe80ab25fb
--- /dev/null
+++ b/include/sysemu/host_iommu_device.h
@@ -0,0 +1,22 @@
+#ifndef HOST_IOMMU_DEVICE_H
+#define HOST_IOMMU_DEVICE_H
+
+typedef enum HostIOMMUDevice_Type {
+ HID_LEGACY,
+ HID_IOMMUFD,
+ HID_MAX,
+} HostIOMMUDevice_Type;
+
+typedef struct HostIOMMUDevice {
+ HostIOMMUDevice_Type type;
+ size_t size;
+} HostIOMMUDevice;
+
+static inline void host_iommu_base_device_init(HostIOMMUDevice *dev,
+ HostIOMMUDevice_Type type,
+ size_t size)
+{
+ dev->type = type;
+ dev->size = size;
+}
+#endif
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 02/18] backends/iommufd: Introduce IOMMUFDDevice
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 01/18] Introduce a common abstract struct HostIOMMUDevice Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 03/18] vfio: Introduce IOMMULegacyDevice Zhenzhong Duan
` (15 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Yi Sun
IOMMUFDDevice represents a device in iommufd and can be used as
a communication interface between devices (i.e., VFIO, VDPA) and
vIOMMU.
Currently it includes only public iommufd handle and device id
which could be used by vIOMMU to get hw IOMMU information.
There will also be some elements in private field in future,
i.e., capability bits for dirty tracking; when nested translation
is supported in future, vIOMMU is going to have more iommufd related
operations like allocate hwpt for a device, attach/detach hwpt, etc.
So IOMMUFDDevice will be further extended with those needs.
IOMMUFDDevice is willingly not a QOM object because we don't want
it to be visible from the user interface.
Introduce a helper iommufd_device_init to initialize IOMMUFDDevice.
Originally-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/sysemu/iommufd.h | 14 ++++++++++++++
backends/iommufd.c | 6 ++++++
2 files changed, 20 insertions(+)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index 9af27ebd6c..c3f3469760 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -4,6 +4,7 @@
#include "qom/object.h"
#include "exec/hwaddr.h"
#include "exec/cpu-common.h"
+#include "sysemu/host_iommu_device.h"
#define TYPE_IOMMUFD_BACKEND "iommufd"
OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
@@ -33,4 +34,17 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly);
int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
hwaddr iova, ram_addr_t size);
+
+
+/* Abstraction of host IOMMUFD device */
+typedef struct IOMMUFDDevice {
+ HostIOMMUDevice base;
+ /* private: */
+
+ /* public: */
+ IOMMUFDBackend *iommufd;
+ uint32_t devid;
+} IOMMUFDDevice;
+
+void iommufd_device_init(IOMMUFDDevice *idev);
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 1ef683c7b0..d92791bba9 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -231,3 +231,9 @@ static void register_types(void)
}
type_init(register_types);
+
+void iommufd_device_init(IOMMUFDDevice *idev)
+{
+ host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
+ sizeof(IOMMUFDDevice));
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 03/18] vfio: Introduce IOMMULegacyDevice
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 01/18] Introduce a common abstract struct HostIOMMUDevice Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 02/18] backends/iommufd: Introduce IOMMUFDDevice Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice Zhenzhong Duan
` (14 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
Similar as IOMMUFDDevice, IOMMULegacyDevice represents a device in
legacy mode and can be used as a communication interface between
devices (i.e., VFIO, VDPA) and vIOMMU.
Currently it includes nothing legacy specific, but could be extended
with any wanted info of legacy mode when necessary.
IOMMULegacyDevice is willingly not a QOM object because we don't want
it to be visible from the user interface.
Suggested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-common.h | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 9b7ef7d02b..8bfb9cbe94 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -31,6 +31,7 @@
#endif
#include "sysemu/sysemu.h"
#include "hw/vfio/vfio-container-base.h"
+#include "sysemu/host_iommu_device.h"
#define VFIO_MSG_PREFIX "vfio %s: "
@@ -97,6 +98,11 @@ typedef struct VFIOIOMMUFDContainer {
uint32_t ioas_id;
} VFIOIOMMUFDContainer;
+/* Abstraction of host IOMMU legacy device */
+typedef struct IOMMULegacyDevice {
+ HostIOMMUDevice base;
+} IOMMULegacyDevice;
+
typedef struct VFIODeviceOps VFIODeviceOps;
typedef struct VFIODevice {
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (2 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 03/18] vfio: Introduce IOMMULegacyDevice Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-19 15:34 ` Eric Auger
2024-02-19 15:45 ` Eric Auger
2024-02-01 7:28 ` [PATCH rfcv2 05/18] vfio: Remove redundant iommufd and devid elements in VFIODevice Zhenzhong Duan
` (13 subsequent siblings)
17 siblings, 2 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
Either IOMMULegacyDevice or IOMMUFDDevice into VFIODevice, neither
both.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-common.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8bfb9cbe94..1bbad003ee 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -32,6 +32,7 @@
#include "sysemu/sysemu.h"
#include "hw/vfio/vfio-container-base.h"
#include "sysemu/host_iommu_device.h"
+#include "sysemu/iommufd.h"
#define VFIO_MSG_PREFIX "vfio %s: "
@@ -132,8 +133,18 @@ typedef struct VFIODevice {
bool dirty_tracking;
int devid;
IOMMUFDBackend *iommufd;
+ union {
+ HostIOMMUDevice base_hdev;
+ IOMMULegacyDevice legacy_dev;
+ IOMMUFDDevice iommufd_dev;
+ };
} VFIODevice;
+QEMU_BUILD_BUG_ON(offsetof(VFIODevice, legacy_dev.base) !=
+ offsetof(VFIODevice, base_hdev));
+QEMU_BUILD_BUG_ON(offsetof(VFIODevice, iommufd_dev.base) !=
+ offsetof(VFIODevice, base_hdev));
+
struct VFIODeviceOps {
void (*vfio_compute_needs_reset)(VFIODevice *vdev);
int (*vfio_hot_reset_multi)(VFIODevice *vdev);
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 05/18] vfio: Remove redundant iommufd and devid elements in VFIODevice
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (3 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 06/18] vfio: Introduce host_iommu_device_init callback Zhenzhong Duan
` (12 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Thomas Huth, Tony Krowiak,
Halil Pasic, Jason Herne, Eric Farman, Matthew Rosato,
open list:S390 general arch...
iommufd and devid in VFIODevice are redundant with the ones
in IOMMUFDDevice, so remove them.
Suggested-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-common.h | 2 --
hw/vfio/ap.c | 2 +-
hw/vfio/ccw.c | 2 +-
hw/vfio/common.c | 2 +-
hw/vfio/helpers.c | 2 +-
hw/vfio/iommufd.c | 26 ++++++++++++++------------
hw/vfio/pci.c | 2 +-
hw/vfio/platform.c | 3 ++-
8 files changed, 21 insertions(+), 20 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 1bbad003ee..24e3eaaf3d 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -131,8 +131,6 @@ typedef struct VFIODevice {
OnOffAuto pre_copy_dirty_page_tracking;
bool dirty_pages_supported;
bool dirty_tracking;
- int devid;
- IOMMUFDBackend *iommufd;
union {
HostIOMMUDevice base_hdev;
IOMMULegacyDevice legacy_dev;
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index e157aa1ff7..11526d93d4 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -198,7 +198,7 @@ static void vfio_ap_unrealize(DeviceState *dev)
static Property vfio_ap_properties[] = {
DEFINE_PROP_STRING("sysfsdev", VFIOAPDevice, vdev.sysfsdev),
#ifdef CONFIG_IOMMUFD
- DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd,
+ DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd_dev.iommufd,
TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
#endif
DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index 90e4a53437..b1b75ffa2a 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -667,7 +667,7 @@ static Property vfio_ccw_properties[] = {
DEFINE_PROP_STRING("sysfsdev", VFIOCCWDevice, vdev.sysfsdev),
DEFINE_PROP_BOOL("force-orb-pfch", VFIOCCWDevice, force_orb_pfch, false),
#ifdef CONFIG_IOMMUFD
- DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd,
+ DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd_dev.iommufd,
TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
#endif
DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 059bfdc07a..8b3b575c9d 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1505,7 +1505,7 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
const VFIOIOMMUClass *ops =
VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
- if (vbasedev->iommufd) {
+ if (vbasedev->iommufd_dev.iommufd) {
ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
}
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 6789870802..e5457ca326 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -626,7 +626,7 @@ int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
}
} else {
- if (!vbasedev->iommufd) {
+ if (!vbasedev->iommufd_dev.iommufd) {
error_setg(errp, "Use FD passing only with iommufd backend");
return -EINVAL;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 9bfddc1360..5d50549713 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -65,7 +65,7 @@ static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
static int iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
{
- IOMMUFDBackend *iommufd = vbasedev->iommufd;
+ IOMMUFDBackend *iommufd = vbasedev->iommufd_dev.iommufd;
struct vfio_device_bind_iommufd bind = {
.argsz = sizeof(bind),
.flags = 0,
@@ -96,9 +96,10 @@ static int iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
goto err_bind;
}
- vbasedev->devid = bind.out_devid;
+ vbasedev->iommufd_dev.devid = bind.out_devid;
trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
- vbasedev->fd, vbasedev->devid);
+ vbasedev->fd,
+ vbasedev->iommufd_dev.devid);
return ret;
err_bind:
iommufd_cdev_kvm_device_del(vbasedev);
@@ -111,7 +112,7 @@ static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
{
/* Unbind is automatically conducted when device fd is closed */
iommufd_cdev_kvm_device_del(vbasedev);
- iommufd_backend_disconnect(vbasedev->iommufd);
+ iommufd_backend_disconnect(vbasedev->iommufd_dev.iommufd);
}
static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
@@ -181,7 +182,7 @@ out_free_path:
static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
Error **errp)
{
- int ret, iommufd = vbasedev->iommufd->fd;
+ int ret, iommufd = vbasedev->iommufd_dev.iommufd->fd;
struct vfio_device_attach_iommufd_pt attach_data = {
.argsz = sizeof(attach_data),
.flags = 0,
@@ -203,7 +204,7 @@ static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, uint32_t id,
static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, Error **errp)
{
- int ret, iommufd = vbasedev->iommufd->fd;
+ int ret, iommufd = vbasedev->iommufd_dev.iommufd->fd;
struct vfio_device_detach_iommufd_pt detach_data = {
.argsz = sizeof(detach_data),
.flags = 0,
@@ -337,7 +338,7 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
QLIST_FOREACH(bcontainer, &space->containers, next) {
container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
if (bcontainer->ops != iommufd_vioc ||
- vbasedev->iommufd != container->be) {
+ vbasedev->iommufd_dev.iommufd != container->be) {
continue;
}
if (iommufd_cdev_attach_container(vbasedev, container, &err)) {
@@ -358,15 +359,16 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
}
/* Need to allocate a new dedicated container */
- ret = iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp);
+ ret = iommufd_backend_alloc_ioas(vbasedev->iommufd_dev.iommufd,
+ &ioas_id, errp);
if (ret < 0) {
goto err_alloc_ioas;
}
- trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
+ trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd_dev.iommufd->fd, ioas_id);
container = g_malloc0(sizeof(*container));
- container->be = vbasedev->iommufd;
+ container->be = vbasedev->iommufd_dev.iommufd;
container->ioas_id = ioas_id;
bcontainer = &container->bcontainer;
@@ -479,7 +481,7 @@ static VFIODevice *iommufd_cdev_pci_find_by_devid(__u32 devid)
if (vbasedev_iter->bcontainer->ops != iommufd_vioc) {
continue;
}
- if (devid == vbasedev_iter->devid) {
+ if (devid == vbasedev_iter->iommufd_dev.devid) {
return vbasedev_iter;
}
}
@@ -492,7 +494,7 @@ iommufd_cdev_dep_get_realized_vpdev(struct vfio_pci_dependent_device *dep_dev,
{
VFIODevice *vbasedev_tmp;
- if (dep_dev->devid == reset_dev->devid ||
+ if (dep_dev->devid == reset_dev->iommufd_dev.devid ||
dep_dev->devid == VFIO_PCI_DEVID_OWNED) {
return NULL;
}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4fa387f043..d1e1b8cb89 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3385,7 +3385,7 @@ static Property vfio_pci_dev_properties[] = {
DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
OFF_AUTOPCIBAR_OFF),
#ifdef CONFIG_IOMMUFD
- DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
+ DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd_dev.iommufd,
TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
#endif
DEFINE_PROP_END_OF_LIST(),
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index a8d9b7da63..823a17b9c5 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -646,7 +646,8 @@ static Property vfio_platform_dev_properties[] = {
mmap_timeout, 1100),
DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
#ifdef CONFIG_IOMMUFD
- DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
+ DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice,
+ vbasedev.iommufd_dev.iommufd,
TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
#endif
DEFINE_PROP_END_OF_LIST(),
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 06/18] vfio: Introduce host_iommu_device_init callback
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (4 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 05/18] vfio: Remove redundant iommufd and devid elements in VFIODevice Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode Zhenzhong Duan
` (11 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
Introduce host_iommu_device_init callback and a wrapper for it.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-common.h | 1 +
include/hw/vfio/vfio-container-base.h | 1 +
hw/vfio/common.c | 8 ++++++++
3 files changed, 10 insertions(+)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24e3eaaf3d..9c4b60c906 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -216,6 +216,7 @@ struct vfio_device_info *vfio_get_device_info(int fd);
int vfio_attach_device(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void vfio_detach_device(VFIODevice *vbasedev);
+void host_iommu_device_init(VFIODevice *vbasedev);
int vfio_kvm_device_add_fd(int fd, Error **errp);
int vfio_kvm_device_del_fd(int fd, Error **errp);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index b2813b0c11..c71f4abb2d 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -120,6 +120,7 @@ struct VFIOIOMMUClass {
int (*attach_device)(const char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void (*detach_device)(VFIODevice *vbasedev);
+ void (*host_iommu_device_init)(VFIODevice *vbasedev);
/* migration feature */
int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
bool start);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 8b3b575c9d..f7f85160be 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1521,3 +1521,11 @@ void vfio_detach_device(VFIODevice *vbasedev)
}
vbasedev->bcontainer->ops->detach_device(vbasedev);
}
+
+void host_iommu_device_init(VFIODevice *vbasedev)
+{
+ const VFIOIOMMUClass *ops = vbasedev->bcontainer->ops;
+
+ assert(ops->host_iommu_device_init);
+ ops->host_iommu_device_init(vbasedev);
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (5 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 06/18] vfio: Introduce host_iommu_device_init callback Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-19 17:13 ` Eric Auger
2024-02-01 7:28 ` [PATCH rfcv2 08/18] vfio/iommufd: Implement host_iommu_device_init callback in iommufd mode Zhenzhong Duan
` (10 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
This callback will be used to initialize base and public elements
in IOMMULegacyDevice.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/container.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index bd25b9fbad..8fafd4b4e5 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -1120,6 +1120,12 @@ out_single:
return ret;
}
+static void vfio_legacy_host_iommu_device_init(VFIODevice *vbasedev)
+{
+ host_iommu_base_device_init(&vbasedev->base_hdev, HID_LEGACY,
+ sizeof(IOMMULegacyDevice));
+}
+
static void vfio_iommu_legacy_class_init(ObjectClass *klass, void *data)
{
VFIOIOMMUClass *vioc = VFIO_IOMMU_CLASS(klass);
@@ -1132,6 +1138,7 @@ static void vfio_iommu_legacy_class_init(ObjectClass *klass, void *data)
vioc->set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking;
vioc->query_dirty_bitmap = vfio_legacy_query_dirty_bitmap;
vioc->pci_hot_reset = vfio_legacy_pci_hot_reset;
+ vioc->host_iommu_device_init = vfio_legacy_host_iommu_device_init;
};
static const TypeInfo types[] = {
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 08/18] vfio/iommufd: Implement host_iommu_device_init callback in iommufd mode
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (6 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 09/18] vfio/pci: Initialize host iommu device instance after attachment Zhenzhong Duan
` (9 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
This callback will be used to initialize base and public elements
in IOMMUFDDevice, with the exception of iommufd and devid which
are initialized early in attachment.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/iommufd.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 5d50549713..7d39d7a5fa 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -621,6 +621,11 @@ out_single:
return ret;
}
+static void vfio_cdev_host_iommu_device_init(VFIODevice *vbasedev)
+{
+ iommufd_device_init(&vbasedev->iommufd_dev);
+}
+
static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
{
VFIOIOMMUClass *vioc = VFIO_IOMMU_CLASS(klass);
@@ -630,6 +635,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
vioc->attach_device = iommufd_cdev_attach;
vioc->detach_device = iommufd_cdev_detach;
vioc->pci_hot_reset = iommufd_cdev_pci_hot_reset;
+ vioc->host_iommu_device_init = vfio_cdev_host_iommu_device_init;
};
static const TypeInfo types[] = {
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 09/18] vfio/pci: Initialize host iommu device instance after attachment
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (7 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 08/18] vfio/iommufd: Implement host_iommu_device_init callback in iommufd mode Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device() Zhenzhong Duan
` (8 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/pci.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d1e1b8cb89..dedb64fc08 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3006,6 +3006,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
goto error;
}
+ /* Initialize host iommu device after attachment succeed */
+ host_iommu_device_init(vbasedev);
+
vfio_populate_device(vdev, &err);
if (err) {
error_propagate(errp, err);
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device()
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (8 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 09/18] vfio/pci: Initialize host iommu device instance after attachment Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-19 17:41 ` Eric Auger
2024-02-01 7:28 ` [PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback Zhenzhong Duan
` (7 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Yi Sun, Zhenzhong Duan, Marcel Apfelbaum
From: Yi Liu <yi.l.liu@intel.com>
This adds pci_device_set/unset_iommu_device() to set/unset
HostIOMMUDevice for a given PCIe device. Caller of set
should fail if set operation fails.
Extract out pci_device_get_iommu_bus_devfn() to facilitate
implementation of pci_device_set/unset_iommu_device().
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/pci/pci.h | 38 ++++++++++++++++++++++++++-
hw/pci/pci.c | 62 +++++++++++++++++++++++++++++++++++++++++---
2 files changed, 96 insertions(+), 4 deletions(-)
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index fa6313aabc..5b471fd380 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -3,6 +3,7 @@
#include "exec/memory.h"
#include "sysemu/dma.h"
+#include "sysemu/host_iommu_device.h"
/* PCI includes legacy ISA access. */
#include "hw/isa/isa.h"
@@ -384,10 +385,45 @@ typedef struct PCIIOMMUOps {
*
* @devfn: device and function number
*/
- AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
+ AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
+ /**
+ * @set_iommu_device: set iommufd device for a PCI device to vIOMMU
+ *
+ * Optional callback, if not implemented in vIOMMU, then vIOMMU can't
+ * utilize iommufd specific features.
+ *
+ * Return true if iommufd device is accepted, or else return false with
+ * errp set.
+ *
+ * @bus: the #PCIBus of the PCI device.
+ *
+ * @opaque: the data passed to pci_setup_iommu().
+ *
+ * @devfn: device and function number of the PCI device.
+ *
+ * @dev: the data structure representing host assigned device.
+ *
+ */
+ int (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
+ HostIOMMUDevice *dev, Error **errp);
+ /**
+ * @unset_iommu_device: unset iommufd device for a PCI device from vIOMMU
+ *
+ * Optional callback.
+ *
+ * @bus: the #PCIBus of the PCI device.
+ *
+ * @opaque: the data passed to pci_setup_iommu().
+ *
+ * @devfn: device and function number of the PCI device.
+ */
+ void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
} PCIIOMMUOps;
AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
+int pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *base_dev,
+ Error **errp);
+void pci_device_unset_iommu_device(PCIDevice *dev);
/**
* pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 76080af580..8078307963 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2672,11 +2672,14 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data)
}
}
-AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+static void pci_device_get_iommu_bus_devfn(PCIDevice *dev,
+ PCIBus **aliased_bus,
+ PCIBus **piommu_bus,
+ int *aliased_devfn)
{
PCIBus *bus = pci_get_bus(dev);
PCIBus *iommu_bus = bus;
- uint8_t devfn = dev->devfn;
+ int devfn = dev->devfn;
while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
@@ -2717,13 +2720,66 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
iommu_bus = parent_bus;
}
- if (!pci_bus_bypass_iommu(bus) && iommu_bus->iommu_ops) {
+
+ assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
+ assert(iommu_bus);
+
+ if (pci_bus_bypass_iommu(bus) || !iommu_bus->iommu_ops) {
+ iommu_bus = NULL;
+ }
+
+ *piommu_bus = iommu_bus;
+
+ if (aliased_bus) {
+ *aliased_bus = bus;
+ }
+
+ if (aliased_devfn) {
+ *aliased_devfn = devfn;
+ }
+}
+
+AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
+{
+ PCIBus *bus;
+ PCIBus *iommu_bus;
+ int devfn;
+
+ pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
+ if (iommu_bus) {
return iommu_bus->iommu_ops->get_address_space(bus,
iommu_bus->iommu_opaque, devfn);
}
return &address_space_memory;
}
+int pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *base_dev,
+ Error **errp)
+{
+ PCIBus *iommu_bus;
+
+ pci_device_get_iommu_bus_devfn(dev, NULL, &iommu_bus, NULL);
+ if (iommu_bus && iommu_bus->iommu_ops->set_iommu_device) {
+ return iommu_bus->iommu_ops->set_iommu_device(pci_get_bus(dev),
+ iommu_bus->iommu_opaque,
+ dev->devfn, base_dev,
+ errp);
+ }
+ return 0;
+}
+
+void pci_device_unset_iommu_device(PCIDevice *dev)
+{
+ PCIBus *iommu_bus;
+
+ pci_device_get_iommu_bus_devfn(dev, NULL, &iommu_bus, NULL);
+ if (iommu_bus && iommu_bus->iommu_ops->unset_iommu_device) {
+ return iommu_bus->iommu_ops->unset_iommu_device(pci_get_bus(dev),
+ iommu_bus->iommu_opaque,
+ dev->devfn);
+ }
+}
+
void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
{
/*
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (9 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device() Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-19 17:46 ` Eric Auger
2024-02-01 7:28 ` [PATCH rfcv2 12/18] vfio: Initialize host IOMMU device and pass to vIOMMU Zhenzhong Duan
` (6 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Yi Sun, Zhenzhong Duan, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
From: Yi Liu <yi.l.liu@intel.com>
This adds set/unset_iommu_device() implementation in Intel vIOMMU.
In set call, a pointer to host IOMMU device info is stored in hash
table indexed by PCI BDF.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 14 +++++++
include/hw/i386/intel_iommu.h | 2 +
hw/i386/intel_iommu.c | 74 ++++++++++++++++++++++++++++++++++
3 files changed, 90 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index f8cf99bddf..3301f54b35 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -28,6 +28,8 @@
#ifndef HW_I386_INTEL_IOMMU_INTERNAL_H
#define HW_I386_INTEL_IOMMU_INTERNAL_H
#include "hw/i386/intel_iommu.h"
+#include "sysemu/host_iommu_device.h"
+#include "hw/vfio/vfio-common.h"
/*
* Intel IOMMU register specification
@@ -537,4 +539,16 @@ typedef struct VTDRootEntry VTDRootEntry;
#define VTD_SL_IGN_COM 0xbff0000000000000ULL
#define VTD_SL_TM (1ULL << 62)
+
+typedef struct VTDHostIOMMUDevice {
+ IntelIOMMUState *iommu_state;
+ PCIBus *bus;
+ uint8_t devfn;
+ union {
+ HostIOMMUDevice *dev;
+ IOMMULegacyDevice *ldev;
+ IOMMUFDDevice *idev;
+ };
+ QLIST_ENTRY(VTDHostIOMMUDevice) next;
+} VTDHostIOMMUDevice;
#endif
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index 7fa0a695c8..bbc7b96add 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -292,6 +292,8 @@ struct IntelIOMMUState {
/* list of registered notifiers */
QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
+ GHashTable *vtd_host_iommu_dev; /* VTDHostIOMMUDevice */
+
/* interrupt remapping */
bool intr_enabled; /* Whether guest enabled IR */
dma_addr_t intr_root; /* Interrupt remapping table pointer */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 1a07faddb4..9b62441439 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -237,6 +237,13 @@ static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
(key1->pasid == key2->pasid);
}
+static gboolean vtd_as_idev_equal(gconstpointer v1, gconstpointer v2)
+{
+ const struct vtd_as_key *key1 = v1;
+ const struct vtd_as_key *key2 = v2;
+
+ return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
+}
/*
* Note that we use pointer to PCIBus as the key, so hashing/shifting
* based on the pointer value is intended. Note that we deal with
@@ -3812,6 +3819,68 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
return vtd_dev_as;
}
+static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
+ HostIOMMUDevice *base_dev, Error **errp)
+{
+ IntelIOMMUState *s = opaque;
+ VTDHostIOMMUDevice *vtd_hdev;
+ struct vtd_as_key key = {
+ .bus = bus,
+ .devfn = devfn,
+ };
+ struct vtd_as_key *new_key;
+
+ assert(base_dev);
+
+ vtd_iommu_lock(s);
+
+ vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
+
+ if (vtd_hdev) {
+ error_setg(errp, "IOMMUFD device already exist");
+ vtd_iommu_unlock(s);
+ return -EEXIST;
+ }
+
+ vtd_hdev = g_malloc0(sizeof(VTDHostIOMMUDevice));
+ vtd_hdev->bus = bus;
+ vtd_hdev->devfn = (uint8_t)devfn;
+ vtd_hdev->iommu_state = s;
+ vtd_hdev->dev = base_dev;
+
+ new_key = g_malloc(sizeof(*new_key));
+ new_key->bus = bus;
+ new_key->devfn = devfn;
+
+ g_hash_table_insert(s->vtd_host_iommu_dev, new_key, vtd_hdev);
+
+ vtd_iommu_unlock(s);
+
+ return 0;
+}
+
+static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
+{
+ IntelIOMMUState *s = opaque;
+ VTDHostIOMMUDevice *vtd_hdev;
+ struct vtd_as_key key = {
+ .bus = bus,
+ .devfn = devfn,
+ };
+
+ vtd_iommu_lock(s);
+
+ vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
+ if (!vtd_hdev) {
+ vtd_iommu_unlock(s);
+ return;
+ }
+
+ g_hash_table_remove(s->vtd_host_iommu_dev, &key);
+
+ vtd_iommu_unlock(s);
+}
+
/* Unmap the whole range in the notifier's scope. */
static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
{
@@ -4107,6 +4176,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
static PCIIOMMUOps vtd_iommu_ops = {
.get_address_space = vtd_host_dma_iommu,
+ .set_iommu_device = vtd_dev_set_iommu_device,
+ .unset_iommu_device = vtd_dev_unset_iommu_device,
};
static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
@@ -4230,6 +4301,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
g_free, g_free);
s->vtd_address_spaces = g_hash_table_new_full(vtd_as_hash, vtd_as_equal,
g_free, g_free);
+ s->vtd_host_iommu_dev = g_hash_table_new_full(vtd_as_hash,
+ vtd_as_idev_equal,
+ g_free, g_free);
vtd_init(s);
pci_setup_iommu(bus, &vtd_iommu_ops, dev);
/* Pseudo address space under root PCI bus. */
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 12/18] vfio: Initialize host IOMMU device and pass to vIOMMU
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (10 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 13/18] intel_iommu: Extract out vtd_cap_init to initialize cap/ecap Zhenzhong Duan
` (5 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Yi Sun
Initialize host IOMMU device in vfio and pass to vIOMMU, so that vIOMMU
could get hw IOMMU information.
Support both iommufd and legacy backend.
Originally-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/pci.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index dedb64fc08..b23c5ea790 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3112,11 +3112,17 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vfio_bars_register(vdev);
- ret = vfio_add_capabilities(vdev, errp);
+ ret = pci_device_set_iommu_device(pdev, &vbasedev->base_hdev, errp);
if (ret) {
+ error_prepend(errp, "Failed to set iommu_device: ");
goto out_teardown;
}
+ ret = vfio_add_capabilities(vdev, errp);
+ if (ret) {
+ goto out_unset_idev;
+ }
+
if (vdev->vga) {
vfio_vga_quirk_setup(vdev);
}
@@ -3133,7 +3139,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
error_setg(errp,
"cannot support IGD OpRegion feature on hotplugged "
"device");
- goto out_teardown;
+ goto out_unset_idev;
}
ret = vfio_get_dev_region_info(vbasedev,
@@ -3142,13 +3148,13 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
if (ret) {
error_setg_errno(errp, -ret,
"does not support requested IGD OpRegion feature");
- goto out_teardown;
+ goto out_unset_idev;
}
ret = vfio_pci_igd_opregion_init(vdev, opregion, errp);
g_free(opregion);
if (ret) {
- goto out_teardown;
+ goto out_unset_idev;
}
}
@@ -3234,6 +3240,8 @@ out_deregister:
if (vdev->intx.mmap_timer) {
timer_free(vdev->intx.mmap_timer);
}
+out_unset_idev:
+ pci_device_unset_iommu_device(pdev);
out_teardown:
vfio_teardown_msi(vdev);
vfio_bars_exit(vdev);
@@ -3262,6 +3270,7 @@ static void vfio_instance_finalize(Object *obj)
static void vfio_exitfn(PCIDevice *pdev)
{
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIODevice *vbasedev = &vdev->vbasedev;
vfio_unregister_req_notifier(vdev);
vfio_unregister_err_notifier(vdev);
@@ -3276,7 +3285,8 @@ static void vfio_exitfn(PCIDevice *pdev)
vfio_teardown_msi(vdev);
vfio_pci_disable_rp_atomics(vdev);
vfio_bars_exit(vdev);
- vfio_migration_exit(&vdev->vbasedev);
+ vfio_migration_exit(vbasedev);
+ pci_device_unset_iommu_device(pdev);
}
static void vfio_pci_reset(DeviceState *dev)
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 13/18] intel_iommu: Extract out vtd_cap_init to initialize cap/ecap
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (11 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 12/18] vfio: Initialize host IOMMU device and pass to vIOMMU Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap Zhenzhong Duan
` (4 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Marcel Apfelbaum, Paolo Bonzini,
Richard Henderson, Eduardo Habkost
This is a prerequisite for host cap/ecap sync.
No functional change intended.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 93 ++++++++++++++++++++++++-------------------
1 file changed, 51 insertions(+), 42 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 9b62441439..ffa1ad6429 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -4003,30 +4003,10 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
return;
}
-/* Do the initialization. It will also be called when reset, so pay
- * attention when adding new initialization stuff.
- */
-static void vtd_init(IntelIOMMUState *s)
+static void vtd_cap_init(IntelIOMMUState *s)
{
X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
- memset(s->csr, 0, DMAR_REG_SIZE);
- memset(s->wmask, 0, DMAR_REG_SIZE);
- memset(s->w1cmask, 0, DMAR_REG_SIZE);
- memset(s->womask, 0, DMAR_REG_SIZE);
-
- s->root = 0;
- s->root_scalable = false;
- s->dmar_enabled = false;
- s->intr_enabled = false;
- s->iq_head = 0;
- s->iq_tail = 0;
- s->iq = 0;
- s->iq_size = 0;
- s->qi_enabled = false;
- s->iq_last_desc_type = VTD_INV_DESC_NONE;
- s->iq_dw = false;
- s->next_frcd_reg = 0;
s->cap = VTD_CAP_FRO | VTD_CAP_NFR | VTD_CAP_ND |
VTD_CAP_MAMV | VTD_CAP_PSI | VTD_CAP_SLLPS |
VTD_CAP_MGAW(s->aw_bits);
@@ -4043,27 +4023,6 @@ static void vtd_init(IntelIOMMUState *s)
}
s->ecap = VTD_ECAP_QI | VTD_ECAP_IRO;
- /*
- * Rsvd field masks for spte
- */
- vtd_spte_rsvd[0] = ~0ULL;
- vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits,
- x86_iommu->dt_supported);
- vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
- vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
- vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
-
- vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits,
- x86_iommu->dt_supported);
- vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
- x86_iommu->dt_supported);
-
- if (s->scalable_mode || s->snoop_control) {
- vtd_spte_rsvd[1] &= ~VTD_SPTE_SNP;
- vtd_spte_rsvd_large[2] &= ~VTD_SPTE_SNP;
- vtd_spte_rsvd_large[3] &= ~VTD_SPTE_SNP;
- }
-
if (x86_iommu_ir_supported(x86_iommu)) {
s->ecap |= VTD_ECAP_IR | VTD_ECAP_MHMV;
if (s->intr_eim == ON_OFF_AUTO_ON) {
@@ -4096,6 +4055,56 @@ static void vtd_init(IntelIOMMUState *s)
if (s->pasid) {
s->ecap |= VTD_ECAP_PASID;
}
+}
+
+/*
+ * Do the initialization. It will also be called when reset, so pay
+ * attention when adding new initialization stuff.
+ */
+static void vtd_init(IntelIOMMUState *s)
+{
+ X86IOMMUState *x86_iommu = X86_IOMMU_DEVICE(s);
+
+ memset(s->csr, 0, DMAR_REG_SIZE);
+ memset(s->wmask, 0, DMAR_REG_SIZE);
+ memset(s->w1cmask, 0, DMAR_REG_SIZE);
+ memset(s->womask, 0, DMAR_REG_SIZE);
+
+ s->root = 0;
+ s->root_scalable = false;
+ s->dmar_enabled = false;
+ s->intr_enabled = false;
+ s->iq_head = 0;
+ s->iq_tail = 0;
+ s->iq = 0;
+ s->iq_size = 0;
+ s->qi_enabled = false;
+ s->iq_last_desc_type = VTD_INV_DESC_NONE;
+ s->iq_dw = false;
+ s->next_frcd_reg = 0;
+
+ vtd_cap_init(s);
+
+ /*
+ * Rsvd field masks for spte
+ */
+ vtd_spte_rsvd[0] = ~0ULL;
+ vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits,
+ x86_iommu->dt_supported);
+ vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
+ vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
+ vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
+
+ vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits,
+ x86_iommu->dt_supported);
+ vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
+ x86_iommu->dt_supported);
+
+ if (s->scalable_mode || s->snoop_control) {
+ vtd_spte_rsvd[1] &= ~VTD_SPTE_SNP;
+ vtd_spte_rsvd_large[2] &= ~VTD_SPTE_SNP;
+ vtd_spte_rsvd_large[3] &= ~VTD_SPTE_SNP;
+ }
vtd_reset_caches(s);
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (12 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 13/18] intel_iommu: Extract out vtd_cap_init to initialize cap/ecap Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-19 17:51 ` Eric Auger
2024-02-01 7:28 ` [PATCH rfcv2 15/18] backends/iommufd: Introduce helper function iommufd_device_get_info() Zhenzhong Duan
` (3 subsequent siblings)
17 siblings, 1 reply; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Yi Sun, Zhenzhong Duan, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
From: Yi Liu <yi.l.liu@intel.com>
Add a framework to check and synchronize host IOMMU cap/ecap with
vIOMMU cap/ecap.
The sequence will be:
vtd_cap_init() initializes iommu->cap/ecap.
vtd_check_hdev() update iommu->cap/ecap based on host cap/ecap.
iommu->cap_frozen set when machine create done, iommu->cap/ecap become readonly.
Implementation details for different backends will be in following patches.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/i386/intel_iommu.h | 1 +
hw/i386/intel_iommu.c | 41 ++++++++++++++++++++++++++++++++++-
2 files changed, 41 insertions(+), 1 deletion(-)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index bbc7b96add..c71a133820 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -283,6 +283,7 @@ struct IntelIOMMUState {
uint64_t cap; /* The value of capability reg */
uint64_t ecap; /* The value of extended capability reg */
+ bool cap_frozen; /* cap/ecap become read-only after frozen */
uint32_t context_cache_gen; /* Should be in [1,MAX] */
GHashTable *iotlb; /* IOTLB */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ffa1ad6429..7ed2b79669 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3819,6 +3819,31 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
return vtd_dev_as;
}
+static int vtd_check_legacy_hdev(IntelIOMMUState *s,
+ IOMMULegacyDevice *ldev,
+ Error **errp)
+{
+ return 0;
+}
+
+static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
+ IOMMUFDDevice *idev,
+ Error **errp)
+{
+ return 0;
+}
+
+static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
+ Error **errp)
+{
+ HostIOMMUDevice *base_dev = vtd_hdev->dev;
+
+ if (base_dev->type == HID_LEGACY) {
+ return vtd_check_legacy_hdev(s, vtd_hdev->ldev, errp);
+ }
+ return vtd_check_iommufd_hdev(s, vtd_hdev->idev, errp);
+}
+
static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
HostIOMMUDevice *base_dev, Error **errp)
{
@@ -3829,6 +3854,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
.devfn = devfn,
};
struct vtd_as_key *new_key;
+ int ret;
assert(base_dev);
@@ -3848,6 +3874,13 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
vtd_hdev->iommu_state = s;
vtd_hdev->dev = base_dev;
+ ret = vtd_check_hdev(s, vtd_hdev, errp);
+ if (ret) {
+ g_free(vtd_hdev);
+ vtd_iommu_unlock(s);
+ return ret;
+ }
+
new_key = g_malloc(sizeof(*new_key));
new_key->bus = bus;
new_key->devfn = devfn;
@@ -4083,7 +4116,9 @@ static void vtd_init(IntelIOMMUState *s)
s->iq_dw = false;
s->next_frcd_reg = 0;
- vtd_cap_init(s);
+ if (!s->cap_frozen) {
+ vtd_cap_init(s);
+ }
/*
* Rsvd field masks for spte
@@ -4254,6 +4289,10 @@ static int vtd_machine_done_notify_one(Object *child, void *unused)
static void vtd_machine_done_hook(Notifier *notifier, void *unused)
{
+ IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
+
+ iommu->cap_frozen = true;
+
object_child_foreach_recursive(object_get_root(),
vtd_machine_done_notify_one, NULL);
}
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 15/18] backends/iommufd: Introduce helper function iommufd_device_get_info()
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (13 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 16/18] intel_iommu: Implement check and sync mechanism in iommufd mode Zhenzhong Duan
` (2 subsequent siblings)
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Yi Sun
Introduce a helper function iommufd_device_get_info() to get
host IOMMU related information through iommufd uAPI.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/sysemu/iommufd.h | 4 ++++
backends/iommufd.c | 25 ++++++++++++++++++++++++-
2 files changed, 28 insertions(+), 1 deletion(-)
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
index c3f3469760..ec8b80d8d9 100644
--- a/include/sysemu/iommufd.h
+++ b/include/sysemu/iommufd.h
@@ -4,6 +4,7 @@
#include "qom/object.h"
#include "exec/hwaddr.h"
#include "exec/cpu-common.h"
+#include <linux/iommufd.h>
#include "sysemu/host_iommu_device.h"
#define TYPE_IOMMUFD_BACKEND "iommufd"
@@ -47,4 +48,7 @@ typedef struct IOMMUFDDevice {
} IOMMUFDDevice;
void iommufd_device_init(IOMMUFDDevice *idev);
+int iommufd_device_get_info(IOMMUFDDevice *idev,
+ enum iommu_hw_info_type *type,
+ uint32_t len, void *data, Error **errp);
#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
index d92791bba9..1b0b991747 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -20,7 +20,6 @@
#include "monitor/monitor.h"
#include "trace.h"
#include <sys/ioctl.h>
-#include <linux/iommufd.h>
static void iommufd_backend_init(Object *obj)
{
@@ -237,3 +236,27 @@ void iommufd_device_init(IOMMUFDDevice *idev)
host_iommu_base_device_init(&idev->base, HID_IOMMUFD,
sizeof(IOMMUFDDevice));
}
+
+int iommufd_device_get_info(IOMMUFDDevice *idev,
+ enum iommu_hw_info_type *type,
+ uint32_t len, void *data, Error **errp)
+{
+ struct iommu_hw_info info = {
+ .size = sizeof(info),
+ .flags = 0,
+ .dev_id = idev->devid,
+ .data_len = len,
+ .__reserved = 0,
+ .data_uptr = (uintptr_t)data,
+ };
+ int ret;
+
+ ret = ioctl(idev->iommufd->fd, IOMMU_GET_HW_INFO, &info);
+ if (ret) {
+ error_setg_errno(errp, errno, "Failed to get hardware info");
+ } else {
+ *type = info.out_data_type;
+ }
+
+ return ret;
+}
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 16/18] intel_iommu: Implement check and sync mechanism in iommufd mode
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (14 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 15/18] backends/iommufd: Introduce helper function iommufd_device_get_info() Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 17/18] intel_iommu: Use mgaw instead of s->aw_bits Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated Zhenzhong Duan
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Yi Sun, Marcel Apfelbaum,
Paolo Bonzini, Richard Henderson, Eduardo Habkost
We use cap_frozen to mark cap/ecap read/writable or read-only,
At init stage, we allow to update cap/ecap based on host IOMMU
cap/ecap, but when machine create done, cap_frozen is set and
we only allow checking cap/ecap for compatibility.
Currently only stage-2 translation is supported which is backed by
shadow page table on host side. So we don't need exact matching of
each bit of cap/ecap between vIOMMU and host. However, we can still
ensure compatibility of host and vIOMMU's address width at least,
i.e., vIOMMU's mgaw <= host IOMMU mgaw, which is missed before.
When stage-1 translation is supported in future, a.k.a. scalable
modern mode, this mechanism will be further extended to check more
bits.
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu_internal.h | 1 +
include/hw/i386/intel_iommu.h | 1 +
hw/i386/intel_iommu.c | 29 +++++++++++++++++++++++++++++
3 files changed, 31 insertions(+)
diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
index 3301f54b35..33d2298dce 100644
--- a/hw/i386/intel_iommu_internal.h
+++ b/hw/i386/intel_iommu_internal.h
@@ -206,6 +206,7 @@
#define VTD_DOMAIN_ID_MASK ((1UL << VTD_DOMAIN_ID_SHIFT) - 1)
#define VTD_CAP_ND (((VTD_DOMAIN_ID_SHIFT - 4) / 2) & 7ULL)
#define VTD_ADDRESS_SIZE(aw) (1ULL << (aw))
+#define VTD_CAP_MGAW_MASK (0x3fULL << 16)
#define VTD_CAP_MGAW(aw) ((((aw) - 1) & 0x3fULL) << 16)
#define VTD_MAMV 18ULL
#define VTD_CAP_MAMV (VTD_MAMV << 48)
diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
index c71a133820..a0b530ebc6 100644
--- a/include/hw/i386/intel_iommu.h
+++ b/include/hw/i386/intel_iommu.h
@@ -47,6 +47,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(IntelIOMMUState, INTEL_IOMMU_DEVICE)
#define VTD_HOST_AW_48BIT 48
#define VTD_HOST_ADDRESS_WIDTH VTD_HOST_AW_39BIT
#define VTD_HAW_MASK(aw) ((1ULL << (aw)) - 1)
+#define VTD_MGAW_FROM_CAP(cap) (((cap >> 16) & 0x3fULL) + 1)
#define DMAR_REPORT_F_INTR (1)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 7ed2b79669..409f8a59c3 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -35,6 +35,7 @@
#include "sysemu/kvm.h"
#include "sysemu/dma.h"
#include "sysemu/sysemu.h"
+#include "sysemu/iommufd.h"
#include "hw/i386/apic_internal.h"
#include "kvm/kvm_i386.h"
#include "migration/vmstate.h"
@@ -3830,6 +3831,34 @@ static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
IOMMUFDDevice *idev,
Error **errp)
{
+ struct iommu_hw_info_vtd vtd;
+ enum iommu_hw_info_type type = IOMMU_HW_INFO_TYPE_INTEL_VTD;
+ long host_mgaw, viommu_mgaw = VTD_MGAW_FROM_CAP(s->cap);
+ uint64_t tmp_cap = s->cap;
+ int ret;
+
+ ret = iommufd_device_get_info(idev, &type, sizeof(vtd), &vtd, errp);
+ if (ret) {
+ return ret;
+ }
+
+ if (type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
+ error_setg(errp, "IOMMU hardware is not compatible");
+ return -EINVAL;
+ }
+
+ host_mgaw = VTD_MGAW_FROM_CAP(vtd.cap_reg);
+ if (viommu_mgaw > host_mgaw) {
+ if (s->cap_frozen) {
+ error_setg(errp, "mgaw %" PRId64 " > host mgaw %" PRId64,
+ viommu_mgaw, host_mgaw);
+ return -EINVAL;
+ }
+ tmp_cap &= ~VTD_CAP_MGAW_MASK;
+ tmp_cap |= VTD_CAP_MGAW(host_mgaw + 1);
+ }
+
+ s->cap = tmp_cap;
return 0;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 17/18] intel_iommu: Use mgaw instead of s->aw_bits
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (15 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 16/18] intel_iommu: Implement check and sync mechanism in iommufd mode Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated Zhenzhong Duan
17 siblings, 0 replies; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Igor Mammedov, Ani Sinha,
Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcel Apfelbaum
Because vIOMMU mgaw can be updated based on host IOMMU mgaw, s->aw_bits
does't necessarily represent the final mgaw now but the mgaw field in
s->cap does.
Replace reference to s->aw_bits with a MACRO S_AW_BITS to fetch mgaw
from s->cap. There are two exceptions on this, aw_bits value sanity
check and s->cap initialization.
ACPI DMAR table is also updated with right mgaw value.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/acpi-build.c | 3 ++-
hw/i386/intel_iommu.c | 44 ++++++++++++++++++++++---------------------
2 files changed, 25 insertions(+), 22 deletions(-)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index edc979379c..6467157686 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2159,7 +2159,8 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker, const char *oem_id,
acpi_table_begin(&table, table_data);
/* Host Address Width */
- build_append_int_noprefix(table_data, intel_iommu->aw_bits - 1, 1);
+ build_append_int_noprefix(table_data,
+ VTD_MGAW_FROM_CAP(intel_iommu->cap), 1);
build_append_int_noprefix(table_data, dmar_flags, 1); /* Flags */
g_array_append_vals(table_data, rsvd10, sizeof(rsvd10)); /* Reserved */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 409f8a59c3..72cc8b2c71 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -41,6 +41,8 @@
#include "migration/vmstate.h"
#include "trace.h"
+#define S_AW_BITS (VTD_MGAW_FROM_CAP(s->cap) + 1)
+
/* context entry operations */
#define VTD_CE_GET_RID2PASID(ce) \
((ce)->val[1] & VTD_SM_CONTEXT_ENTRY_RID2PASID_MASK)
@@ -1409,13 +1411,13 @@ static int vtd_root_entry_rsvd_bits_check(IntelIOMMUState *s,
{
/* Legacy Mode reserved bits check */
if (!s->root_scalable &&
- (re->hi || (re->lo & VTD_ROOT_ENTRY_RSVD(s->aw_bits))))
+ (re->hi || (re->lo & VTD_ROOT_ENTRY_RSVD(S_AW_BITS))))
goto rsvd_err;
/* Scalable Mode reserved bits check */
if (s->root_scalable &&
- ((re->lo & VTD_ROOT_ENTRY_RSVD(s->aw_bits)) ||
- (re->hi & VTD_ROOT_ENTRY_RSVD(s->aw_bits))))
+ ((re->lo & VTD_ROOT_ENTRY_RSVD(S_AW_BITS)) ||
+ (re->hi & VTD_ROOT_ENTRY_RSVD(S_AW_BITS))))
goto rsvd_err;
return 0;
@@ -1432,7 +1434,7 @@ static inline int vtd_context_entry_rsvd_bits_check(IntelIOMMUState *s,
{
if (!s->root_scalable &&
(ce->hi & VTD_CONTEXT_ENTRY_RSVD_HI ||
- ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(s->aw_bits))) {
+ ce->lo & VTD_CONTEXT_ENTRY_RSVD_LO(S_AW_BITS))) {
error_report_once("%s: invalid context entry: hi=%"PRIx64
", lo=%"PRIx64" (reserved nonzero)",
__func__, ce->hi, ce->lo);
@@ -1440,7 +1442,7 @@ static inline int vtd_context_entry_rsvd_bits_check(IntelIOMMUState *s,
}
if (s->root_scalable &&
- (ce->val[0] & VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(s->aw_bits) ||
+ (ce->val[0] & VTD_SM_CONTEXT_ENTRY_RSVD_VAL0(S_AW_BITS) ||
ce->val[1] & VTD_SM_CONTEXT_ENTRY_RSVD_VAL1 ||
ce->val[2] ||
ce->val[3])) {
@@ -1571,7 +1573,7 @@ static int vtd_sync_shadow_page_table_range(VTDAddressSpace *vtd_as,
.hook_fn = vtd_sync_shadow_page_hook,
.private = (void *)&vtd_as->iommu,
.notify_unmap = true,
- .aw = s->aw_bits,
+ .aw = S_AW_BITS,
.as = vtd_as,
.domain_id = vtd_get_domain_id(s, ce, vtd_as->pasid),
};
@@ -1990,7 +1992,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
}
ret_fr = vtd_iova_to_slpte(s, &ce, addr, is_write, &slpte, &level,
- &reads, &writes, s->aw_bits, pasid);
+ &reads, &writes, S_AW_BITS, pasid);
if (ret_fr) {
vtd_report_fault(s, -ret_fr, is_fpd_set, source_id,
addr, is_write, pasid != PCI_NO_PASID, pasid);
@@ -2004,7 +2006,7 @@ static bool vtd_do_iommu_translate(VTDAddressSpace *vtd_as, PCIBus *bus,
out:
vtd_iommu_unlock(s);
entry->iova = addr & page_mask;
- entry->translated_addr = vtd_get_slpte_addr(slpte, s->aw_bits) & page_mask;
+ entry->translated_addr = vtd_get_slpte_addr(slpte, S_AW_BITS) & page_mask;
entry->addr_mask = ~page_mask;
entry->perm = access_flags;
return true;
@@ -2021,7 +2023,7 @@ error:
static void vtd_root_table_setup(IntelIOMMUState *s)
{
s->root = vtd_get_quad_raw(s, DMAR_RTADDR_REG);
- s->root &= VTD_RTADDR_ADDR_MASK(s->aw_bits);
+ s->root &= VTD_RTADDR_ADDR_MASK(S_AW_BITS);
vtd_update_scalable_state(s);
@@ -2039,7 +2041,7 @@ static void vtd_interrupt_remap_table_setup(IntelIOMMUState *s)
uint64_t value = 0;
value = vtd_get_quad_raw(s, DMAR_IRTA_REG);
s->intr_size = 1UL << ((value & VTD_IRTA_SIZE_MASK) + 1);
- s->intr_root = value & VTD_IRTA_ADDR_MASK(s->aw_bits);
+ s->intr_root = value & VTD_IRTA_ADDR_MASK(S_AW_BITS);
s->intr_eime = value & VTD_IRTA_EIME;
/* Notify global invalidation */
@@ -2322,7 +2324,7 @@ static void vtd_handle_gcmd_qie(IntelIOMMUState *s, bool en)
trace_vtd_inv_qi_enable(en);
if (en) {
- s->iq = iqa_val & VTD_IQA_IQA_MASK(s->aw_bits);
+ s->iq = iqa_val & VTD_IQA_IQA_MASK(S_AW_BITS);
/* 2^(x+8) entries */
s->iq_size = 1UL << ((iqa_val & VTD_IQA_QS) + 8 - (s->iq_dw ? 1 : 0));
s->qi_enabled = true;
@@ -3958,12 +3960,12 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
* VT-d spec), otherwise we need to consider overflow of 64 bits.
*/
- if (end > VTD_ADDRESS_SIZE(s->aw_bits) - 1) {
+ if (end > VTD_ADDRESS_SIZE(S_AW_BITS) - 1) {
/*
* Don't need to unmap regions that is bigger than the whole
* VT-d supported address space size
*/
- end = VTD_ADDRESS_SIZE(s->aw_bits) - 1;
+ end = VTD_ADDRESS_SIZE(S_AW_BITS) - 1;
}
assert(start <= end);
@@ -3971,7 +3973,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
while (remain >= VTD_PAGE_SIZE) {
IOMMUTLBEvent event;
- uint64_t mask = dma_aligned_pow2_mask(start, end, s->aw_bits);
+ uint64_t mask = dma_aligned_pow2_mask(start, end, S_AW_BITS);
uint64_t size = mask + 1;
assert(size);
@@ -4050,7 +4052,7 @@ static void vtd_iommu_replay(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
.hook_fn = vtd_replay_hook,
.private = (void *)n,
.notify_unmap = false,
- .aw = s->aw_bits,
+ .aw = S_AW_BITS,
.as = vtd_as,
.domain_id = vtd_get_domain_id(s, &ce, vtd_as->pasid),
};
@@ -4153,15 +4155,15 @@ static void vtd_init(IntelIOMMUState *s)
* Rsvd field masks for spte
*/
vtd_spte_rsvd[0] = ~0ULL;
- vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(s->aw_bits,
+ vtd_spte_rsvd[1] = VTD_SPTE_PAGE_L1_RSVD_MASK(S_AW_BITS,
x86_iommu->dt_supported);
- vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(s->aw_bits);
- vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(s->aw_bits);
- vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(s->aw_bits);
+ vtd_spte_rsvd[2] = VTD_SPTE_PAGE_L2_RSVD_MASK(S_AW_BITS);
+ vtd_spte_rsvd[3] = VTD_SPTE_PAGE_L3_RSVD_MASK(S_AW_BITS);
+ vtd_spte_rsvd[4] = VTD_SPTE_PAGE_L4_RSVD_MASK(S_AW_BITS);
- vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(s->aw_bits,
+ vtd_spte_rsvd_large[2] = VTD_SPTE_LPAGE_L2_RSVD_MASK(S_AW_BITS,
x86_iommu->dt_supported);
- vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(s->aw_bits,
+ vtd_spte_rsvd_large[3] = VTD_SPTE_LPAGE_L3_RSVD_MASK(S_AW_BITS,
x86_iommu->dt_supported);
if (s->scalable_mode || s->snoop_control) {
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
` (16 preceding siblings ...)
2024-02-01 7:28 ` [PATCH rfcv2 17/18] intel_iommu: Use mgaw instead of s->aw_bits Zhenzhong Duan
@ 2024-02-01 7:28 ` Zhenzhong Duan
2024-02-13 10:55 ` Joao Martins
17 siblings, 1 reply; 35+ messages in thread
From: Zhenzhong Duan @ 2024-02-01 7:28 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun,
chao.p.peng, Zhenzhong Duan, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, Marcel Apfelbaum
When there is VFIO device and vIOMMU cap/ecap is updated based on host
IOMMU cap/ecap, migration should be blocked.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/intel_iommu.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 72cc8b2c71..7f9ff653b2 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -39,6 +39,7 @@
#include "hw/i386/apic_internal.h"
#include "kvm/kvm_i386.h"
#include "migration/vmstate.h"
+#include "migration/blocker.h"
#include "trace.h"
#define S_AW_BITS (VTD_MGAW_FROM_CAP(s->cap) + 1)
@@ -3829,6 +3830,8 @@ static int vtd_check_legacy_hdev(IntelIOMMUState *s,
return 0;
}
+static Error *vtd_mig_blocker;
+
static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
IOMMUFDDevice *idev,
Error **errp)
@@ -3860,8 +3863,17 @@ static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
tmp_cap |= VTD_CAP_MGAW(host_mgaw + 1);
}
- s->cap = tmp_cap;
- return 0;
+ if (s->cap != tmp_cap) {
+ if (vtd_mig_blocker == NULL) {
+ error_setg(&vtd_mig_blocker,
+ "cap/ecap update from host IOMMU block migration");
+ ret = migrate_add_blocker(&vtd_mig_blocker, errp);
+ }
+ if (!ret) {
+ s->cap = tmp_cap;
+ }
+ }
+ return ret;
}
static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
--
2.34.1
^ permalink raw reply related [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated
2024-02-01 7:28 ` [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated Zhenzhong Duan
@ 2024-02-13 10:55 ` Joao Martins
2024-02-27 2:41 ` Duan, Zhenzhong
0 siblings, 1 reply; 35+ messages in thread
From: Joao Martins @ 2024-02-13 10:55 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, eric.auger, peterx, jasowang, mst, jgg,
nicolinc, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcel Apfelbaum
On 01/02/2024 07:28, Zhenzhong Duan wrote:
> When there is VFIO device and vIOMMU cap/ecap is updated based on host
> IOMMU cap/ecap, migration should be blocked.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Is this really needed considering migration with vIOMMU is already blocked anyways?
> ---
> hw/i386/intel_iommu.c | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 72cc8b2c71..7f9ff653b2 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -39,6 +39,7 @@
> #include "hw/i386/apic_internal.h"
> #include "kvm/kvm_i386.h"
> #include "migration/vmstate.h"
> +#include "migration/blocker.h"
> #include "trace.h"
>
> #define S_AW_BITS (VTD_MGAW_FROM_CAP(s->cap) + 1)
> @@ -3829,6 +3830,8 @@ static int vtd_check_legacy_hdev(IntelIOMMUState *s,
> return 0;
> }
>
> +static Error *vtd_mig_blocker;
> +
> static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
> IOMMUFDDevice *idev,
> Error **errp)
> @@ -3860,8 +3863,17 @@ static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
> tmp_cap |= VTD_CAP_MGAW(host_mgaw + 1);
> }
>
> - s->cap = tmp_cap;
> - return 0;
> + if (s->cap != tmp_cap) {
> + if (vtd_mig_blocker == NULL) {
> + error_setg(&vtd_mig_blocker,
> + "cap/ecap update from host IOMMU block migration");
> + ret = migrate_add_blocker(&vtd_mig_blocker, errp);
> + }
> + if (!ret) {
> + s->cap = tmp_cap;
> + }
> + }
> + return ret;
> }
>
> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice
2024-02-01 7:28 ` [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice Zhenzhong Duan
@ 2024-02-19 15:34 ` Eric Auger
2024-02-19 15:45 ` Eric Auger
1 sibling, 0 replies; 35+ messages in thread
From: Eric Auger @ 2024-02-19 15:34 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, peterx, jasowang, mst, jgg, nicolinc,
joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
Hi Zhenzhong,
On 2/1/24 08:28, Zhenzhong Duan wrote:
> Either IOMMULegacyDevice or IOMMUFDDevice into VFIODevice, neither
> both.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> include/hw/vfio/vfio-common.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 8bfb9cbe94..1bbad003ee 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -32,6 +32,7 @@
> #include "sysemu/sysemu.h"
> #include "hw/vfio/vfio-container-base.h"
> #include "sysemu/host_iommu_device.h"
> +#include "sysemu/iommufd.h"
>
> #define VFIO_MSG_PREFIX "vfio %s: "
>
> @@ -132,8 +133,18 @@ typedef struct VFIODevice {
> bool dirty_tracking;
> int devid;
> IOMMUFDBackend *iommufd;
> + union {
> + HostIOMMUDevice base_hdev;
I don't think we want the base object above to be usable here
Thanks
Eric
> + IOMMULegacyDevice legacy_dev;
> + IOMMUFDDevice iommufd_dev;
> + };
> } VFIODevice;
>
> +QEMU_BUILD_BUG_ON(offsetof(VFIODevice, legacy_dev.base) !=
> + offsetof(VFIODevice, base_hdev));
> +QEMU_BUILD_BUG_ON(offsetof(VFIODevice, iommufd_dev.base) !=
> + offsetof(VFIODevice, base_hdev));
> +
> struct VFIODeviceOps {
> void (*vfio_compute_needs_reset)(VFIODevice *vdev);
> int (*vfio_hot_reset_multi)(VFIODevice *vdev);
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice
2024-02-01 7:28 ` [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice Zhenzhong Duan
2024-02-19 15:34 ` Eric Auger
@ 2024-02-19 15:45 ` Eric Auger
2024-02-26 6:16 ` Duan, Zhenzhong
1 sibling, 1 reply; 35+ messages in thread
From: Eric Auger @ 2024-02-19 15:45 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, peterx, jasowang, mst, jgg, nicolinc,
joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 2/1/24 08:28, Zhenzhong Duan wrote:
> Either IOMMULegacyDevice or IOMMUFDDevice into VFIODevice, neither
> both.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> include/hw/vfio/vfio-common.h | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 8bfb9cbe94..1bbad003ee 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -32,6 +32,7 @@
> #include "sysemu/sysemu.h"
> #include "hw/vfio/vfio-container-base.h"
> #include "sysemu/host_iommu_device.h"
> +#include "sysemu/iommufd.h"
>
> #define VFIO_MSG_PREFIX "vfio %s: "
>
> @@ -132,8 +133,18 @@ typedef struct VFIODevice {
> bool dirty_tracking;
> int devid;
> IOMMUFDBackend *iommufd;
> + union {
> + HostIOMMUDevice base_hdev;
> + IOMMULegacyDevice legacy_dev;
> + IOMMUFDDevice iommufd_dev;
I think you should rather have a HostIOMMUDevice handle.
host_iommu_device_init cb would allocate the right type of the derived object and you would store the base object pointer here.
Eric
> + };
> } VFIODevice;
>
> +QEMU_BUILD_BUG_ON(offsetof(VFIODevice, legacy_dev.base) !=
> + offsetof(VFIODevice, base_hdev));
> +QEMU_BUILD_BUG_ON(offsetof(VFIODevice, iommufd_dev.base) !=
> + offsetof(VFIODevice, base_hdev));
> +
> struct VFIODeviceOps {
> void (*vfio_compute_needs_reset)(VFIODevice *vdev);
> int (*vfio_hot_reset_multi)(VFIODevice *vdev);
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode
2024-02-01 7:28 ` [PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode Zhenzhong Duan
@ 2024-02-19 17:13 ` Eric Auger
2024-02-26 6:56 ` Duan, Zhenzhong
0 siblings, 1 reply; 35+ messages in thread
From: Eric Auger @ 2024-02-19 17:13 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, peterx, jasowang, mst, jgg, nicolinc,
joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
Hi Zhenzhong,
On 2/1/24 08:28, Zhenzhong Duan wrote:
> This callback will be used to initialize base and public elements
> in IOMMULegacyDevice.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/vfio/container.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index bd25b9fbad..8fafd4b4e5 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -1120,6 +1120,12 @@ out_single:
> return ret;
> }
>
> +static void vfio_legacy_host_iommu_device_init(VFIODevice *vbasedev)
> +{
> + host_iommu_base_device_init(&vbasedev->base_hdev, HID_LEGACY,
> + sizeof(IOMMULegacyDevice));
To me this should allocate a new
IOMMULegacyDevice and set the VFIODevice base_hdev handle to its base @
Thanks
Eric
> +}
> +
> static void vfio_iommu_legacy_class_init(ObjectClass *klass, void *data)
> {
> VFIOIOMMUClass *vioc = VFIO_IOMMU_CLASS(klass);
> @@ -1132,6 +1138,7 @@ static void vfio_iommu_legacy_class_init(ObjectClass *klass, void *data)
> vioc->set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking;
> vioc->query_dirty_bitmap = vfio_legacy_query_dirty_bitmap;
> vioc->pci_hot_reset = vfio_legacy_pci_hot_reset;
> + vioc->host_iommu_device_init = vfio_legacy_host_iommu_device_init;
> };
>
> static const TypeInfo types[] = {
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device()
2024-02-01 7:28 ` [PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device() Zhenzhong Duan
@ 2024-02-19 17:41 ` Eric Auger
2024-02-26 6:26 ` Duan, Zhenzhong
0 siblings, 1 reply; 35+ messages in thread
From: Eric Auger @ 2024-02-19 17:41 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, peterx, jasowang, mst, jgg, nicolinc,
joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Yi Sun, Marcel Apfelbaum
Hi Zhenzhong,
On 2/1/24 08:28, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> This adds pci_device_set/unset_iommu_device() to set/unset
> HostIOMMUDevice for a given PCIe device. Caller of set
> should fail if set operation fails.
>
> Extract out pci_device_get_iommu_bus_devfn() to facilitate
> implementation of pci_device_set/unset_iommu_device().
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> include/hw/pci/pci.h | 38 ++++++++++++++++++++++++++-
> hw/pci/pci.c | 62 +++++++++++++++++++++++++++++++++++++++++---
> 2 files changed, 96 insertions(+), 4 deletions(-)
>
> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
> index fa6313aabc..5b471fd380 100644
> --- a/include/hw/pci/pci.h
> +++ b/include/hw/pci/pci.h
> @@ -3,6 +3,7 @@
>
> #include "exec/memory.h"
> #include "sysemu/dma.h"
> +#include "sysemu/host_iommu_device.h"
>
> /* PCI includes legacy ISA access. */
> #include "hw/isa/isa.h"
> @@ -384,10 +385,45 @@ typedef struct PCIIOMMUOps {
> *
> * @devfn: device and function number
> */
> - AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
> + AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int devfn);
> + /**
> + * @set_iommu_device: set iommufd device for a PCI device to vIOMMU
attach a HostIOMMUDevice to a vIOMMU
> + *
> + * Optional callback, if not implemented in vIOMMU, then vIOMMU can't
> + * utilize iommufd specific features.
looks too iommufd specific. Then vIOMMU can't retrieve host information
from the associated HostIOMMUDevice
> + *
> + * Return true if iommufd device is accepted, or else return false with
s/accepted/attached
> + * errp set.
> + *
> + * @bus: the #PCIBus of the PCI device.
> + *
> + * @opaque: the data passed to pci_setup_iommu().
> + *
> + * @devfn: device and function number of the PCI device.
> + *
> + * @dev: the data structure representing host assigned device.
> + *
> + */
> + int (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
> + HostIOMMUDevice *dev, Error **errp);
> + /**
> + * @unset_iommu_device: unset iommufd device for a PCI device from vIOMMU
same suggestion here
> + *
> + * Optional callback.
> + *
> + * @bus: the #PCIBus of the PCI device.
> + *
> + * @opaque: the data passed to pci_setup_iommu().
> + *
> + * @devfn: device and function number of the PCI device.
> + */
> + void (*unset_iommu_device)(PCIBus *bus, void *opaque, int devfn);
> } PCIIOMMUOps;
>
> AddressSpace *pci_device_iommu_address_space(PCIDevice *dev);
> +int pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *base_dev,
> + Error **errp);
> +void pci_device_unset_iommu_device(PCIDevice *dev);
>
> /**
> * pci_setup_iommu: Initialize specific IOMMU handlers for a PCIBus
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 76080af580..8078307963 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -2672,11 +2672,14 @@ static void pci_device_class_base_init(ObjectClass *klass, void *data)
> }
> }
>
> -AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> +static void pci_device_get_iommu_bus_devfn(PCIDevice *dev,
> + PCIBus **aliased_bus,
> + PCIBus **piommu_bus,
> + int *aliased_devfn)
> {
> PCIBus *bus = pci_get_bus(dev);
> PCIBus *iommu_bus = bus;
> - uint8_t devfn = dev->devfn;
> + int devfn = dev->devfn;
>
> while (iommu_bus && !iommu_bus->iommu_ops && iommu_bus->parent_dev) {
> PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
> @@ -2717,13 +2720,66 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
>
> iommu_bus = parent_bus;
> }
> - if (!pci_bus_bypass_iommu(bus) && iommu_bus->iommu_ops) {
> +
> + assert(0 <= devfn && devfn < PCI_DEVFN_MAX);
> + assert(iommu_bus);
> +
> + if (pci_bus_bypass_iommu(bus) || !iommu_bus->iommu_ops) {
> + iommu_bus = NULL;
> + }
> +
> + *piommu_bus = iommu_bus;
> +
> + if (aliased_bus) {
> + *aliased_bus = bus;
> + }
> +
> + if (aliased_devfn) {
> + *aliased_devfn = devfn;
> + }
> +}
> +
> +AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
> +{
> + PCIBus *bus;
> + PCIBus *iommu_bus;
> + int devfn;
> +
> + pci_device_get_iommu_bus_devfn(dev, &bus, &iommu_bus, &devfn);
> + if (iommu_bus) {
> return iommu_bus->iommu_ops->get_address_space(bus,
> iommu_bus->iommu_opaque, devfn);
> }
> return &address_space_memory;
> }
>
> +int pci_device_set_iommu_device(PCIDevice *dev, HostIOMMUDevice *base_dev,
> + Error **errp)
> +{
> + PCIBus *iommu_bus;
> +
> + pci_device_get_iommu_bus_devfn(dev, NULL, &iommu_bus, NULL);
> + if (iommu_bus && iommu_bus->iommu_ops->set_iommu_device) {
> + return iommu_bus->iommu_ops->set_iommu_device(pci_get_bus(dev),
> + iommu_bus->iommu_opaque,
> + dev->devfn, base_dev,
> + errp);
> + }
> + return 0;
> +}
> +
> +void pci_device_unset_iommu_device(PCIDevice *dev)
> +{
> + PCIBus *iommu_bus;
> +
> + pci_device_get_iommu_bus_devfn(dev, NULL, &iommu_bus, NULL);
> + if (iommu_bus && iommu_bus->iommu_ops->unset_iommu_device) {
> + return iommu_bus->iommu_ops->unset_iommu_device(pci_get_bus(dev),
> + iommu_bus->iommu_opaque,
> + dev->devfn);
> + }
> +}
> +
> void pci_setup_iommu(PCIBus *bus, const PCIIOMMUOps *ops, void *opaque)
> {
> /*
Eric
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback
2024-02-01 7:28 ` [PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback Zhenzhong Duan
@ 2024-02-19 17:46 ` Eric Auger
2024-02-26 6:52 ` Duan, Zhenzhong
0 siblings, 1 reply; 35+ messages in thread
From: Eric Auger @ 2024-02-19 17:46 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, peterx, jasowang, mst, jgg, nicolinc,
joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Yi Sun, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcel Apfelbaum
On 2/1/24 08:28, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> This adds set/unset_iommu_device() implementation in Intel vIOMMU.
> In set call, a pointer to host IOMMU device info is stored in hash
> table indexed by PCI BDF.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/intel_iommu_internal.h | 14 +++++++
> include/hw/i386/intel_iommu.h | 2 +
> hw/i386/intel_iommu.c | 74 ++++++++++++++++++++++++++++++++++
> 3 files changed, 90 insertions(+)
>
> diff --git a/hw/i386/intel_iommu_internal.h b/hw/i386/intel_iommu_internal.h
> index f8cf99bddf..3301f54b35 100644
> --- a/hw/i386/intel_iommu_internal.h
> +++ b/hw/i386/intel_iommu_internal.h
> @@ -28,6 +28,8 @@
> #ifndef HW_I386_INTEL_IOMMU_INTERNAL_H
> #define HW_I386_INTEL_IOMMU_INTERNAL_H
> #include "hw/i386/intel_iommu.h"
> +#include "sysemu/host_iommu_device.h"
> +#include "hw/vfio/vfio-common.h"
>
> /*
> * Intel IOMMU register specification
> @@ -537,4 +539,16 @@ typedef struct VTDRootEntry VTDRootEntry;
> #define VTD_SL_IGN_COM 0xbff0000000000000ULL
> #define VTD_SL_TM (1ULL << 62)
>
> +
> +typedef struct VTDHostIOMMUDevice {
> + IntelIOMMUState *iommu_state;
> + PCIBus *bus;
> + uint8_t devfn;
> + union {
> + HostIOMMUDevice *dev;
> + IOMMULegacyDevice *ldev;
> + IOMMUFDDevice *idev;
> + };
again this looks really weird to me. Why don't we simply have
HostIOMMUDevice *dev;
> + QLIST_ENTRY(VTDHostIOMMUDevice) next;
> +} VTDHostIOMMUDevice;
> #endif
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index 7fa0a695c8..bbc7b96add 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -292,6 +292,8 @@ struct IntelIOMMUState {
> /* list of registered notifiers */
> QLIST_HEAD(, VTDAddressSpace) vtd_as_with_notifiers;
>
> + GHashTable *vtd_host_iommu_dev; /* VTDHostIOMMUDevice */
> +
> /* interrupt remapping */
> bool intr_enabled; /* Whether guest enabled IR */
> dma_addr_t intr_root; /* Interrupt remapping table pointer */
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 1a07faddb4..9b62441439 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -237,6 +237,13 @@ static gboolean vtd_as_equal(gconstpointer v1, gconstpointer v2)
> (key1->pasid == key2->pasid);
> }
>
> +static gboolean vtd_as_idev_equal(gconstpointer v1, gconstpointer v2)
> +{
> + const struct vtd_as_key *key1 = v1;
> + const struct vtd_as_key *key2 = v2;
> +
> + return (key1->bus == key2->bus) && (key1->devfn == key2->devfn);
> +}
> /*
> * Note that we use pointer to PCIBus as the key, so hashing/shifting
> * based on the pointer value is intended. Note that we deal with
> @@ -3812,6 +3819,68 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
> return vtd_dev_as;
> }
>
> +static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> + HostIOMMUDevice *base_dev, Error **errp)
> +{
> + IntelIOMMUState *s = opaque;
> + VTDHostIOMMUDevice *vtd_hdev;
> + struct vtd_as_key key = {
> + .bus = bus,
> + .devfn = devfn,
> + };
> + struct vtd_as_key *new_key;
> +
> + assert(base_dev);
> +
> + vtd_iommu_lock(s);
> +
> + vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
> +
> + if (vtd_hdev) {
> + error_setg(errp, "IOMMUFD device already exist");
> + vtd_iommu_unlock(s);
> + return -EEXIST;
> + }
> +
> + vtd_hdev = g_malloc0(sizeof(VTDHostIOMMUDevice));
> + vtd_hdev->bus = bus;
> + vtd_hdev->devfn = (uint8_t)devfn;
> + vtd_hdev->iommu_state = s;
> + vtd_hdev->dev = base_dev;
and here you set the base pointer.
> +
> + new_key = g_malloc(sizeof(*new_key));
> + new_key->bus = bus;
> + new_key->devfn = devfn;
> +
> + g_hash_table_insert(s->vtd_host_iommu_dev, new_key, vtd_hdev);
> +
> + vtd_iommu_unlock(s);
> +
> + return 0;
> +}
> +
> +static void vtd_dev_unset_iommu_device(PCIBus *bus, void *opaque, int devfn)
> +{
> + IntelIOMMUState *s = opaque;
> + VTDHostIOMMUDevice *vtd_hdev;
> + struct vtd_as_key key = {
> + .bus = bus,
> + .devfn = devfn,
> + };
> +
> + vtd_iommu_lock(s);
> +
> + vtd_hdev = g_hash_table_lookup(s->vtd_host_iommu_dev, &key);
> + if (!vtd_hdev) {
> + vtd_iommu_unlock(s);
> + return;
> + }
> +
> + g_hash_table_remove(s->vtd_host_iommu_dev, &key);
> +
> + vtd_iommu_unlock(s);
> +}
> +
> /* Unmap the whole range in the notifier's scope. */
> static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
> {
> @@ -4107,6 +4176,8 @@ static AddressSpace *vtd_host_dma_iommu(PCIBus *bus, void *opaque, int devfn)
>
> static PCIIOMMUOps vtd_iommu_ops = {
> .get_address_space = vtd_host_dma_iommu,
> + .set_iommu_device = vtd_dev_set_iommu_device,
> + .unset_iommu_device = vtd_dev_unset_iommu_device,
> };
>
> static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
> @@ -4230,6 +4301,9 @@ static void vtd_realize(DeviceState *dev, Error **errp)
> g_free, g_free);
> s->vtd_address_spaces = g_hash_table_new_full(vtd_as_hash, vtd_as_equal,
> g_free, g_free);
> + s->vtd_host_iommu_dev = g_hash_table_new_full(vtd_as_hash,
> + vtd_as_idev_equal,
> + g_free, g_free);
> vtd_init(s);
> pci_setup_iommu(bus, &vtd_iommu_ops, dev);
> /* Pseudo address space under root PCI bus. */
Eric
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap
2024-02-01 7:28 ` [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap Zhenzhong Duan
@ 2024-02-19 17:51 ` Eric Auger
2024-02-26 7:36 ` Duan, Zhenzhong
0 siblings, 1 reply; 35+ messages in thread
From: Eric Auger @ 2024-02-19 17:51 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, peterx, jasowang, mst, jgg, nicolinc,
joao.m.martins, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Yi Sun, Paolo Bonzini, Richard Henderson, Eduardo Habkost,
Marcel Apfelbaum
On 2/1/24 08:28, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> Add a framework to check and synchronize host IOMMU cap/ecap with
> vIOMMU cap/ecap.
>
> The sequence will be:
>
> vtd_cap_init() initializes iommu->cap/ecap.
> vtd_check_hdev() update iommu->cap/ecap based on host cap/ecap.
> iommu->cap_frozen set when machine create done, iommu->cap/ecap become readonly.
>
> Implementation details for different backends will be in following patches.
>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> include/hw/i386/intel_iommu.h | 1 +
> hw/i386/intel_iommu.c | 41 ++++++++++++++++++++++++++++++++++-
> 2 files changed, 41 insertions(+), 1 deletion(-)
>
> diff --git a/include/hw/i386/intel_iommu.h b/include/hw/i386/intel_iommu.h
> index bbc7b96add..c71a133820 100644
> --- a/include/hw/i386/intel_iommu.h
> +++ b/include/hw/i386/intel_iommu.h
> @@ -283,6 +283,7 @@ struct IntelIOMMUState {
>
> uint64_t cap; /* The value of capability reg */
> uint64_t ecap; /* The value of extended capability reg */
> + bool cap_frozen; /* cap/ecap become read-only after frozen */
>
> uint32_t context_cache_gen; /* Should be in [1,MAX] */
> GHashTable *iotlb; /* IOTLB */
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index ffa1ad6429..7ed2b79669 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -3819,6 +3819,31 @@ VTDAddressSpace *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
> return vtd_dev_as;
> }
>
> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
> + IOMMULegacyDevice *ldev,
> + Error **errp)
> +{
> + return 0;
> +}
> +
> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
> + IOMMUFDDevice *idev,
> + Error **errp)
> +{
> + return 0;
> +}
> +
> +static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice *vtd_hdev,
> + Error **errp)
> +{
> + HostIOMMUDevice *base_dev = vtd_hdev->dev;
> +
> + if (base_dev->type == HID_LEGACY) {
> + return vtd_check_legacy_hdev(s, vtd_hdev->ldev, errp);
> + }
> + return vtd_check_iommufd_hdev(s, vtd_hdev->idev, errp);
Couldn't we have HostIOMMUDevice ops instead of having this check here?
Eric
> +}
> +
> static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> HostIOMMUDevice *base_dev, Error **errp)
> {
> @@ -3829,6 +3854,7 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> .devfn = devfn,
> };
> struct vtd_as_key *new_key;
> + int ret;
>
> assert(base_dev);
>
> @@ -3848,6 +3874,13 @@ static int vtd_dev_set_iommu_device(PCIBus *bus, void *opaque, int devfn,
> vtd_hdev->iommu_state = s;
> vtd_hdev->dev = base_dev;
>
> + ret = vtd_check_hdev(s, vtd_hdev, errp);
> + if (ret) {
> + g_free(vtd_hdev);
> + vtd_iommu_unlock(s);
> + return ret;
> + }
> +
> new_key = g_malloc(sizeof(*new_key));
> new_key->bus = bus;
> new_key->devfn = devfn;
> @@ -4083,7 +4116,9 @@ static void vtd_init(IntelIOMMUState *s)
> s->iq_dw = false;
> s->next_frcd_reg = 0;
>
> - vtd_cap_init(s);
> + if (!s->cap_frozen) {
> + vtd_cap_init(s);
> + }
>
> /*
> * Rsvd field masks for spte
> @@ -4254,6 +4289,10 @@ static int vtd_machine_done_notify_one(Object *child, void *unused)
>
> static void vtd_machine_done_hook(Notifier *notifier, void *unused)
> {
> + IntelIOMMUState *iommu = INTEL_IOMMU_DEVICE(x86_iommu_get_default());
> +
> + iommu->cap_frozen = true;
> +
> object_child_foreach_recursive(object_get_root(),
> vtd_machine_done_notify_one, NULL);
> }
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice
2024-02-19 15:45 ` Eric Auger
@ 2024-02-26 6:16 ` Duan, Zhenzhong
0 siblings, 0 replies; 35+ messages in thread
From: Duan, Zhenzhong @ 2024-02-26 6:16 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, peterx@redhat.com,
jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, Tian, Kevin,
Liu, Yi L, Sun, Yi Y, Peng, Chao P
Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH rfcv2 04/18] vfio: Add host iommu device instance into
>VFIODevice
>
>
>
>On 2/1/24 08:28, Zhenzhong Duan wrote:
>> Either IOMMULegacyDevice or IOMMUFDDevice into VFIODevice, neither
>> both.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> include/hw/vfio/vfio-common.h | 11 +++++++++++
>> 1 file changed, 11 insertions(+)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-
>common.h
>> index 8bfb9cbe94..1bbad003ee 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -32,6 +32,7 @@
>> #include "sysemu/sysemu.h"
>> #include "hw/vfio/vfio-container-base.h"
>> #include "sysemu/host_iommu_device.h"
>> +#include "sysemu/iommufd.h"
>>
>> #define VFIO_MSG_PREFIX "vfio %s: "
>>
>> @@ -132,8 +133,18 @@ typedef struct VFIODevice {
>> bool dirty_tracking;
>> int devid;
>> IOMMUFDBackend *iommufd;
>> + union {
>> + HostIOMMUDevice base_hdev;
>> + IOMMULegacyDevice legacy_dev;
>> + IOMMUFDDevice iommufd_dev;
>I think you should rather have a HostIOMMUDevice handle.
>
>host_iommu_device_init cb would allocate the right type of the derived
>object and you would store the base object pointer here.
Sorry for late response, just back from vacation.
I didn't use a HostIOMMUDevice handle but a statically allocated one
Because in following patch:
'[PATCH rfcv2 05/18] vfio: Remove redundant iommufd and devid elements in VFIODevice'
iommufd and devid are removed from VFIODevice. I need a place to store
them early in attachment. The handle is dynamically allocated after
attachment and can't be used for that purpose.
I'll change to use HostIOMMUDevice handle and drop patch5 in rfcv3,
Let me know if you have other comments.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device()
2024-02-19 17:41 ` Eric Auger
@ 2024-02-26 6:26 ` Duan, Zhenzhong
0 siblings, 0 replies; 35+ messages in thread
From: Duan, Zhenzhong @ 2024-02-26 6:26 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, peterx@redhat.com,
jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, Tian, Kevin,
Liu, Yi L, Sun, Yi Y, Peng, Chao P, Yi Sun, Marcel Apfelbaum
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH rfcv2 10/18] hw/pci: Introduce
>pci_device_set/unset_iommu_device()
>
>Hi Zhenzhong,
>
>On 2/1/24 08:28, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> This adds pci_device_set/unset_iommu_device() to set/unset
>> HostIOMMUDevice for a given PCIe device. Caller of set
>> should fail if set operation fails.
>>
>> Extract out pci_device_get_iommu_bus_devfn() to facilitate
>> implementation of pci_device_set/unset_iommu_device().
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> include/hw/pci/pci.h | 38 ++++++++++++++++++++++++++-
>> hw/pci/pci.c | 62
>+++++++++++++++++++++++++++++++++++++++++---
>> 2 files changed, 96 insertions(+), 4 deletions(-)
>>
>> diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
>> index fa6313aabc..5b471fd380 100644
>> --- a/include/hw/pci/pci.h
>> +++ b/include/hw/pci/pci.h
>> @@ -3,6 +3,7 @@
>>
>> #include "exec/memory.h"
>> #include "sysemu/dma.h"
>> +#include "sysemu/host_iommu_device.h"
>>
>> /* PCI includes legacy ISA access. */
>> #include "hw/isa/isa.h"
>> @@ -384,10 +385,45 @@ typedef struct PCIIOMMUOps {
>> *
>> * @devfn: device and function number
>> */
>> - AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int
>devfn);
>> + AddressSpace * (*get_address_space)(PCIBus *bus, void *opaque, int
>devfn);
>> + /**
>> + * @set_iommu_device: set iommufd device for a PCI device to
>vIOMMU
>attach a HostIOMMUDevice to a vIOMMU
Will do.
>> + *
>> + * Optional callback, if not implemented in vIOMMU, then vIOMMU
>can't
>> + * utilize iommufd specific features.
>looks too iommufd specific. Then vIOMMU can't retrieve host information
>from the associated HostIOMMUDevice
Will do.
>> + *
>> + * Return true if iommufd device is accepted, or else return false with
>s/accepted/attached
Will do.
>> + * errp set.
>> + *
>> + * @bus: the #PCIBus of the PCI device.
>> + *
>> + * @opaque: the data passed to pci_setup_iommu().
>> + *
>> + * @devfn: device and function number of the PCI device.
>> + *
>> + * @dev: the data structure representing host assigned device.
>> + *
>> + */
>> + int (*set_iommu_device)(PCIBus *bus, void *opaque, int devfn,
>> + HostIOMMUDevice *dev, Error **errp);
>> + /**
>> + * @unset_iommu_device: unset iommufd device for a PCI device from
>vIOMMU
>same suggestion here
Will do.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback
2024-02-19 17:46 ` Eric Auger
@ 2024-02-26 6:52 ` Duan, Zhenzhong
0 siblings, 0 replies; 35+ messages in thread
From: Duan, Zhenzhong @ 2024-02-26 6:52 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, peterx@redhat.com,
jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, Tian, Kevin,
Liu, Yi L, Sun, Yi Y, Peng, Chao P, Yi Sun, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH rfcv2 11/18] intel_iommu: Add
>set/unset_iommu_device callback
>
>
>
>On 2/1/24 08:28, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> This adds set/unset_iommu_device() implementation in Intel vIOMMU.
>> In set call, a pointer to host IOMMU device info is stored in hash
>> table indexed by PCI BDF.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/i386/intel_iommu_internal.h | 14 +++++++
>> include/hw/i386/intel_iommu.h | 2 +
>> hw/i386/intel_iommu.c | 74
>++++++++++++++++++++++++++++++++++
>> 3 files changed, 90 insertions(+)
>>
>> diff --git a/hw/i386/intel_iommu_internal.h
>b/hw/i386/intel_iommu_internal.h
>> index f8cf99bddf..3301f54b35 100644
>> --- a/hw/i386/intel_iommu_internal.h
>> +++ b/hw/i386/intel_iommu_internal.h
>> @@ -28,6 +28,8 @@
>> #ifndef HW_I386_INTEL_IOMMU_INTERNAL_H
>> #define HW_I386_INTEL_IOMMU_INTERNAL_H
>> #include "hw/i386/intel_iommu.h"
>> +#include "sysemu/host_iommu_device.h"
>> +#include "hw/vfio/vfio-common.h"
>>
>> /*
>> * Intel IOMMU register specification
>> @@ -537,4 +539,16 @@ typedef struct VTDRootEntry VTDRootEntry;
>> #define VTD_SL_IGN_COM 0xbff0000000000000ULL
>> #define VTD_SL_TM (1ULL << 62)
>>
>> +
>> +typedef struct VTDHostIOMMUDevice {
>> + IntelIOMMUState *iommu_state;
>> + PCIBus *bus;
>> + uint8_t devfn;
>> + union {
>> + HostIOMMUDevice *dev;
>> + IOMMULegacyDevice *ldev;
>> + IOMMUFDDevice *idev;
>> + };
>again this looks really weird to me. Why don't we simply have
>
>HostIOMMUDevice *dev;
Sure, will do.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode
2024-02-19 17:13 ` Eric Auger
@ 2024-02-26 6:56 ` Duan, Zhenzhong
0 siblings, 0 replies; 35+ messages in thread
From: Duan, Zhenzhong @ 2024-02-26 6:56 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, peterx@redhat.com,
jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, Tian, Kevin,
Liu, Yi L, Sun, Yi Y, Peng, Chao P
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH rfcv2 07/18] vfio/container: Implement
>host_iommu_device_init callback in legacy mode
>
>Hi Zhenzhong,
>On 2/1/24 08:28, Zhenzhong Duan wrote:
>> This callback will be used to initialize base and public elements
>> in IOMMULegacyDevice.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> hw/vfio/container.c | 7 +++++++
>> 1 file changed, 7 insertions(+)
>>
>> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
>> index bd25b9fbad..8fafd4b4e5 100644
>> --- a/hw/vfio/container.c
>> +++ b/hw/vfio/container.c
>> @@ -1120,6 +1120,12 @@ out_single:
>> return ret;
>> }
>>
>> +static void vfio_legacy_host_iommu_device_init(VFIODevice *vbasedev)
>> +{
>> + host_iommu_base_device_init(&vbasedev->base_hdev, HID_LEGACY,
>> + sizeof(IOMMULegacyDevice));
>To me this should allocate a new
>
> IOMMULegacyDevice and set the VFIODevice base_hdev handle to its base
Sure, will do.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap
2024-02-19 17:51 ` Eric Auger
@ 2024-02-26 7:36 ` Duan, Zhenzhong
2024-02-27 17:06 ` Eric Auger
0 siblings, 1 reply; 35+ messages in thread
From: Duan, Zhenzhong @ 2024-02-26 7:36 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, peterx@redhat.com,
jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, Tian, Kevin,
Liu, Yi L, Sun, Yi Y, Peng, Chao P, Yi Sun, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Subject: Re: [PATCH rfcv2 14/18] intel_iommu: Add a framework to check
>and sync host IOMMU cap/ecap
>
>
>
>On 2/1/24 08:28, Zhenzhong Duan wrote:
>> From: Yi Liu <yi.l.liu@intel.com>
>>
>> Add a framework to check and synchronize host IOMMU cap/ecap with
>> vIOMMU cap/ecap.
>>
>> The sequence will be:
>>
>> vtd_cap_init() initializes iommu->cap/ecap.
>> vtd_check_hdev() update iommu->cap/ecap based on host cap/ecap.
>> iommu->cap_frozen set when machine create done, iommu->cap/ecap
>become readonly.
>>
>> Implementation details for different backends will be in following patches.
>>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> include/hw/i386/intel_iommu.h | 1 +
>> hw/i386/intel_iommu.c | 41
>++++++++++++++++++++++++++++++++++-
>> 2 files changed, 41 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/hw/i386/intel_iommu.h
>b/include/hw/i386/intel_iommu.h
>> index bbc7b96add..c71a133820 100644
>> --- a/include/hw/i386/intel_iommu.h
>> +++ b/include/hw/i386/intel_iommu.h
>> @@ -283,6 +283,7 @@ struct IntelIOMMUState {
>>
>> uint64_t cap; /* The value of capability reg */
>> uint64_t ecap; /* The value of extended capability reg */
>> + bool cap_frozen; /* cap/ecap become read-only after frozen */
>>
>> uint32_t context_cache_gen; /* Should be in [1,MAX] */
>> GHashTable *iotlb; /* IOTLB */
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index ffa1ad6429..7ed2b79669 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -3819,6 +3819,31 @@ VTDAddressSpace
>*vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>> return vtd_dev_as;
>> }
>>
>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>> + IOMMULegacyDevice *ldev,
>> + Error **errp)
>> +{
>> + return 0;
>> +}
>> +
>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>> + IOMMUFDDevice *idev,
>> + Error **errp)
>> +{
>> + return 0;
>> +}
>> +
>> +static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>*vtd_hdev,
>> + Error **errp)
>> +{
>> + HostIOMMUDevice *base_dev = vtd_hdev->dev;
>> +
>> + if (base_dev->type == HID_LEGACY) {
>> + return vtd_check_legacy_hdev(s, vtd_hdev->ldev, errp);
>> + }
>> + return vtd_check_iommufd_hdev(s, vtd_hdev->idev, errp);
>Couldn't we have HostIOMMUDevice ops instead of having this check here?
Hmm, not sure if this is deserved. If we define such ops, it has only one function
check_hdev and we still need to check base_dev->type to assign different
function to HostIOMMUDevice.ops.check_hdev in vtd_dev_set_iommu_device().
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated
2024-02-13 10:55 ` Joao Martins
@ 2024-02-27 2:41 ` Duan, Zhenzhong
2024-02-27 11:08 ` Joao Martins
0 siblings, 1 reply; 35+ messages in thread
From: Duan, Zhenzhong @ 2024-02-27 2:41 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, mst@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, Marcel Apfelbaum
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is
>updated
>
>On 01/02/2024 07:28, Zhenzhong Duan wrote:
>> When there is VFIO device and vIOMMU cap/ecap is updated based on
>host
>> IOMMU cap/ecap, migration should be blocked.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>
>Is this really needed considering migration with vIOMMU is already blocked
>anyways?
VFIO device can be hot unplugged, then blocker due to vIOMMU is removed,
but we still need a blocker for cap/ecap update.
Thanks
Zhenzhong
>
>> ---
>> hw/i386/intel_iommu.c | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>> index 72cc8b2c71..7f9ff653b2 100644
>> --- a/hw/i386/intel_iommu.c
>> +++ b/hw/i386/intel_iommu.c
>> @@ -39,6 +39,7 @@
>> #include "hw/i386/apic_internal.h"
>> #include "kvm/kvm_i386.h"
>> #include "migration/vmstate.h"
>> +#include "migration/blocker.h"
>> #include "trace.h"
>>
>> #define S_AW_BITS (VTD_MGAW_FROM_CAP(s->cap) + 1)
>> @@ -3829,6 +3830,8 @@ static int
>vtd_check_legacy_hdev(IntelIOMMUState *s,
>> return 0;
>> }
>>
>> +static Error *vtd_mig_blocker;
>> +
>> static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>> IOMMUFDDevice *idev,
>> Error **errp)
>> @@ -3860,8 +3863,17 @@ static int
>vtd_check_iommufd_hdev(IntelIOMMUState *s,
>> tmp_cap |= VTD_CAP_MGAW(host_mgaw + 1);
>> }
>>
>> - s->cap = tmp_cap;
>> - return 0;
>> + if (s->cap != tmp_cap) {
>> + if (vtd_mig_blocker == NULL) {
>> + error_setg(&vtd_mig_blocker,
>> + "cap/ecap update from host IOMMU block migration");
>> + ret = migrate_add_blocker(&vtd_mig_blocker, errp);
>> + }
>> + if (!ret) {
>> + s->cap = tmp_cap;
>> + }
>> + }
>> + return ret;
>> }
>>
>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>*vtd_hdev,
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated
2024-02-27 2:41 ` Duan, Zhenzhong
@ 2024-02-27 11:08 ` Joao Martins
2024-02-28 2:14 ` Duan, Zhenzhong
0 siblings, 1 reply; 35+ messages in thread
From: Joao Martins @ 2024-02-27 11:08 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, mst@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, Marcel Apfelbaum
On 27/02/2024 02:41, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is
>> updated
>>
>> On 01/02/2024 07:28, Zhenzhong Duan wrote:
>>> When there is VFIO device and vIOMMU cap/ecap is updated based on
>> host
>>> IOMMU cap/ecap, migration should be blocked.
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>
>> Is this really needed considering migration with vIOMMU is already blocked
>> anyways?
>
> VFIO device can be hot unplugged, then blocker due to vIOMMU is removed,
> but we still need a blocker for cap/ecap update.
>
Right which then the blocker gets re-added after you add one VFIO device. The
commit message refers xplicitly VFIO device, why would you care about blocking
migration on vIOMMU without vfio devices present? Maybe there's another reason
but that the commit messages doesn't cover? like guest MGAW being bigger than
host MGAW or something like that?
Joao
^ permalink raw reply [flat|nested] 35+ messages in thread
* Re: [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap
2024-02-26 7:36 ` Duan, Zhenzhong
@ 2024-02-27 17:06 ` Eric Auger
0 siblings, 0 replies; 35+ messages in thread
From: Eric Auger @ 2024-02-27 17:06 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, peterx@redhat.com,
jasowang@redhat.com, mst@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, Tian, Kevin,
Liu, Yi L, Sun, Yi Y, Peng, Chao P, Yi Sun, Paolo Bonzini,
Richard Henderson, Eduardo Habkost, Marcel Apfelbaum
Hi Zhenzhong,
On 2/26/24 08:36, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Subject: Re: [PATCH rfcv2 14/18] intel_iommu: Add a framework to check
>> and sync host IOMMU cap/ecap
>>
>>
>>
>> On 2/1/24 08:28, Zhenzhong Duan wrote:
>>> From: Yi Liu <yi.l.liu@intel.com>
>>>
>>> Add a framework to check and synchronize host IOMMU cap/ecap with
>>> vIOMMU cap/ecap.
>>>
>>> The sequence will be:
>>>
>>> vtd_cap_init() initializes iommu->cap/ecap.
>>> vtd_check_hdev() update iommu->cap/ecap based on host cap/ecap.
>>> iommu->cap_frozen set when machine create done, iommu->cap/ecap
>> become readonly.
>>> Implementation details for different backends will be in following patches.
>>>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>> include/hw/i386/intel_iommu.h | 1 +
>>> hw/i386/intel_iommu.c | 41
>> ++++++++++++++++++++++++++++++++++-
>>> 2 files changed, 41 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/hw/i386/intel_iommu.h
>> b/include/hw/i386/intel_iommu.h
>>> index bbc7b96add..c71a133820 100644
>>> --- a/include/hw/i386/intel_iommu.h
>>> +++ b/include/hw/i386/intel_iommu.h
>>> @@ -283,6 +283,7 @@ struct IntelIOMMUState {
>>>
>>> uint64_t cap; /* The value of capability reg */
>>> uint64_t ecap; /* The value of extended capability reg */
>>> + bool cap_frozen; /* cap/ecap become read-only after frozen */
>>>
>>> uint32_t context_cache_gen; /* Should be in [1,MAX] */
>>> GHashTable *iotlb; /* IOTLB */
>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>> index ffa1ad6429..7ed2b79669 100644
>>> --- a/hw/i386/intel_iommu.c
>>> +++ b/hw/i386/intel_iommu.c
>>> @@ -3819,6 +3819,31 @@ VTDAddressSpace
>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>> return vtd_dev_as;
>>> }
>>>
>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>> + IOMMULegacyDevice *ldev,
>>> + Error **errp)
>>> +{
>>> + return 0;
>>> +}
>>> +
>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>> + IOMMUFDDevice *idev,
>>> + Error **errp)
>>> +{
>>> + return 0;
>>> +}
>>> +
>>> +static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>> *vtd_hdev,
>>> + Error **errp)
>>> +{
>>> + HostIOMMUDevice *base_dev = vtd_hdev->dev;
>>> +
>>> + if (base_dev->type == HID_LEGACY) {
>>> + return vtd_check_legacy_hdev(s, vtd_hdev->ldev, errp);
>>> + }
>>> + return vtd_check_iommufd_hdev(s, vtd_hdev->idev, errp);
>> Couldn't we have HostIOMMUDevice ops instead of having this check here?
> Hmm, not sure if this is deserved. If we define such ops, it has only one function
> check_hdev and we still need to check base_dev->type to assign different
> function to HostIOMMUDevice.ops.check_hdev in vtd_dev_set_iommu_device().
OK maybe this is over engineered. Drop that comment for now
Thanks
Eric
>
> Thanks
> Zhenzhong
^ permalink raw reply [flat|nested] 35+ messages in thread
* RE: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated
2024-02-27 11:08 ` Joao Martins
@ 2024-02-28 2:14 ` Duan, Zhenzhong
0 siblings, 0 replies; 35+ messages in thread
From: Duan, Zhenzhong @ 2024-02-28 2:14 UTC (permalink / raw)
To: Joao Martins, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, mst@redhat.com,
jgg@nvidia.com, nicolinc@nvidia.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P, Paolo Bonzini, Richard Henderson,
Eduardo Habkost, Marcel Apfelbaum
>-----Original Message-----
>From: Joao Martins <joao.m.martins@oracle.com>
>Subject: Re: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is
>updated
>
>On 27/02/2024 02:41, Duan, Zhenzhong wrote:
>>
>>
>>> -----Original Message-----
>>> From: Joao Martins <joao.m.martins@oracle.com>
>>> Subject: Re: [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is
>>> updated
>>>
>>> On 01/02/2024 07:28, Zhenzhong Duan wrote:
>>>> When there is VFIO device and vIOMMU cap/ecap is updated based on
>>> host
>>>> IOMMU cap/ecap, migration should be blocked.
>>>>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>
>>> Is this really needed considering migration with vIOMMU is already
>blocked
>>> anyways?
>>
>> VFIO device can be hot unplugged, then blocker due to vIOMMU is
>removed,
>> but we still need a blocker for cap/ecap update.
>>
>
>Right which then the blocker gets re-added after you add one VFIO device.
>The
>commit message refers xplicitly VFIO device, why would you care about
>blocking
>migration on vIOMMU without vfio devices present? Maybe there's another
>reason
>but that the commit messages doesn't cover? like guest MGAW being bigger
>than
>host MGAW or something like that?
If qemu starts with cold plugged vfio device, that vfio device may update cap/ecap.
Even if that vfio device is unplugged at runtime, the changed cap/ecap is kept.
In this case source and dest will have incompatible cap/ecap config.
So I block migration if there is cap/ecap update on source side.
This patch is to deal with the case that there is cold plugged vfio device which is
unplugged at runtime and then migration happen.
Thanks
Zhenzhong
^ permalink raw reply [flat|nested] 35+ messages in thread
end of thread, other threads:[~2024-02-28 2:15 UTC | newest]
Thread overview: 35+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-02-01 7:28 [PATCH rfcv2 00/18] Check and sync host IOMMU cap/ecap with vIOMMU Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 01/18] Introduce a common abstract struct HostIOMMUDevice Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 02/18] backends/iommufd: Introduce IOMMUFDDevice Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 03/18] vfio: Introduce IOMMULegacyDevice Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 04/18] vfio: Add host iommu device instance into VFIODevice Zhenzhong Duan
2024-02-19 15:34 ` Eric Auger
2024-02-19 15:45 ` Eric Auger
2024-02-26 6:16 ` Duan, Zhenzhong
2024-02-01 7:28 ` [PATCH rfcv2 05/18] vfio: Remove redundant iommufd and devid elements in VFIODevice Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 06/18] vfio: Introduce host_iommu_device_init callback Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 07/18] vfio/container: Implement host_iommu_device_init callback in legacy mode Zhenzhong Duan
2024-02-19 17:13 ` Eric Auger
2024-02-26 6:56 ` Duan, Zhenzhong
2024-02-01 7:28 ` [PATCH rfcv2 08/18] vfio/iommufd: Implement host_iommu_device_init callback in iommufd mode Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 09/18] vfio/pci: Initialize host iommu device instance after attachment Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 10/18] hw/pci: Introduce pci_device_set/unset_iommu_device() Zhenzhong Duan
2024-02-19 17:41 ` Eric Auger
2024-02-26 6:26 ` Duan, Zhenzhong
2024-02-01 7:28 ` [PATCH rfcv2 11/18] intel_iommu: Add set/unset_iommu_device callback Zhenzhong Duan
2024-02-19 17:46 ` Eric Auger
2024-02-26 6:52 ` Duan, Zhenzhong
2024-02-01 7:28 ` [PATCH rfcv2 12/18] vfio: Initialize host IOMMU device and pass to vIOMMU Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 13/18] intel_iommu: Extract out vtd_cap_init to initialize cap/ecap Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 14/18] intel_iommu: Add a framework to check and sync host IOMMU cap/ecap Zhenzhong Duan
2024-02-19 17:51 ` Eric Auger
2024-02-26 7:36 ` Duan, Zhenzhong
2024-02-27 17:06 ` Eric Auger
2024-02-01 7:28 ` [PATCH rfcv2 15/18] backends/iommufd: Introduce helper function iommufd_device_get_info() Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 16/18] intel_iommu: Implement check and sync mechanism in iommufd mode Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 17/18] intel_iommu: Use mgaw instead of s->aw_bits Zhenzhong Duan
2024-02-01 7:28 ` [PATCH rfcv2 18/18] intel_iommu: Block migration if cap is updated Zhenzhong Duan
2024-02-13 10:55 ` Joao Martins
2024-02-27 2:41 ` Duan, Zhenzhong
2024-02-27 11:08 ` Joao Martins
2024-02-28 2:14 ` Duan, Zhenzhong
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).