* [PATCH v6 00/21] vfio: Adopt iommufd
@ 2023-11-14 10:09 Zhenzhong Duan
2023-11-14 10:09 ` [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
Hi,
Thanks, all, for the guidance and comments on the previous series; this is
the remaining part of the iommufd support.
Following Cédric's suggestion, the old config method for IOMMUFD is
replaced with Kconfig.
Following Jason's suggestion, the manual hwpt allocation is dropped in
favor of IOAS attach/detach.
Besides the existing tests, we also tested mdev with mtty for better coverage.
PATCH 1: Introduce iommufd object
PATCH 2-9: add IOMMUFD container and cdev support
PATCH 10-17: fd passing for cdev and linking to IOMMUFD
PATCH 18: make VFIOContainerBase parameter const
PATCH 19-21: Activate IOMMUFD for arm, s390x and x86 machines
We have done wide testing with different combinations, e.g.:
- PCI devices were tested
- FD passing and hot reset (with some tricks)
- device hotplug with legacy and iommufd backends
- with or without vIOMMU for legacy and iommufd backends
- devices linked to different iommufds
- VFIO migration with an E800 NIC passed through (no dirty sync support)
- platform, ccw and ap were only compile-tested due to environment limitations
- mdev passthrough with mtty, mixed with real devices and different BEs
Given some iommufd kernel limitations, the iommufd backend is
not yet fully on par with the legacy backend w.r.t. features like:
- p2p mappings (you will see related error traces)
- dirty page sync
- etc.
qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6
Based on vfio-next, commit id: 1a22fb936e
--------------------------------------------------------------------------
Below is some background on the design, along with a diagram:
With the introduction of iommufd, the Linux kernel provides a generic
interface for userspace drivers to propagate their DMA mappings to the
kernel for assigned devices. This series ports the VFIO devices onto the
/dev/iommu uapi and lets it coexist with the legacy implementation.
At QEMU level, interactions with the /dev/iommu are abstracted by a new
iommufd object (compiled in with the CONFIG_IOMMUFD option).
Any QEMU device (e.g. a vfio device) wishing to use /dev/iommu must be
linked with an iommufd object. In this series, the vfio-pci device is
granted such capability (other VFIO devices are not yet ready):
it gets a new optional parameter named iommufd which allows passing
an iommufd object:
-object iommufd,id=iommufd0
-device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
Note the /dev/iommu and vfio cdev can be externally opened by a
management layer. In such a case the fd is passed:
-object iommufd,id=iommufd0,fd=22
-device vfio-pci,iommufd=iommufd0,fd=23
If the fd parameter is not passed, the fd is opened by QEMU.
See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
for a detailed discussion of this requirement.
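As a hedged illustration of the fd-passing flow above, a management layer could pre-open both devices and hand the descriptors to QEMU roughly as follows. `launch_qemu()`, `build_args()` and the cdev path argument are hypothetical; a real launcher would also pass machine, memory and other options (omitted here). The fds are opened without O_CLOEXEC so that they are inherited across execv().

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical management-layer sketch: pre-open /dev/iommu and the vfio
 * cdev, then pass both fds to QEMU through the fd= properties. Opening
 * without O_CLOEXEC keeps the fds valid in the exec'ed QEMU process.
 */

/* Format the -object/-device arguments; returns 0 on success. */
static int build_args(char *obj, size_t olen, char *dev, size_t dlen,
                      int iommu_fd, int dev_fd)
{
    int a = snprintf(obj, olen, "iommufd,id=iommufd0,fd=%d", iommu_fd);
    int b = snprintf(dev, dlen, "vfio-pci,iommufd=iommufd0,fd=%d", dev_fd);

    /* Reject encoding errors and truncated output. */
    return (a < 0 || (size_t)a >= olen || b < 0 || (size_t)b >= dlen) ? -1 : 0;
}

/* Open both devices and exec QEMU; only returns on failure. */
int launch_qemu(const char *qemu_path, const char *cdev_path)
{
    char obj[64], dev[64];
    int iommu_fd = open("/dev/iommu", O_RDWR);
    int dev_fd = open(cdev_path, O_RDWR);

    if (iommu_fd >= 0 && dev_fd >= 0 &&
        build_args(obj, sizeof(obj), dev, sizeof(dev), iommu_fd, dev_fd) == 0) {
        char *const argv[] = { (char *)qemu_path, (char *)"-object", obj,
                               (char *)"-device", dev, NULL };
        execv(qemu_path, argv);
    }
    /* Reached only if opening a device or execv() failed. */
    if (iommu_fd >= 0) {
        close(iommu_fd);
    }
    if (dev_fd >= 0) {
        close(dev_fd);
    }
    return -1;
}
```

For example, `launch_qemu("/usr/bin/qemu-system-x86_64", "/dev/vfio/devices/vfio0")` would exec QEMU with `-object iommufd,id=iommufd0,fd=N -device vfio-pci,iommufd=iommufd0,fd=M`, where N and M are whatever descriptors open() returned.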
If no iommufd option is passed to the vfio-pci device, iommufd is not
used and the end-user gets the behavior based on the legacy vfio iommu
interfaces:
-device vfio-pci,host=0000:02:00.0
While the legacy kernel interface is group-centric, the new iommufd
interface is device-centric, relying on device fd and iommufd.
To support both interfaces in the QEMU VFIO device we reworked the vfio
container abstraction so that the generic VFIO code can use either
backend.
The VFIOContainer object becomes a base object derived into
a) the legacy VFIO container and
b) the new iommufd based container.
The base object implements generic code such as code related to
memory_listener and address space management, whereas the derived
objects implement callbacks specific to either BE, legacy or
iommufd. Indeed, each backend has its own way to set up the secure
context and the DMA management interface. The diagram below shows how
it looks with both BEs.
                      VFIO                          AddressSpace/Memory
    +-------+  +----------+  +-----+  +-----+
    |  pci  |  | platform |  |  ap |  | ccw |
    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
        |           |           |        |        |   AddressSpace       |
        |           |           |        |        +------------+---------+
    +---V-----------V-----------V--------V----+               /
    |           VFIOAddressSpace              | <------------+
    |                  |                      |  MemoryListener
    |          VFIOContainer list             |
    +-------+----------------------------+----+
            |                            |
            |                            |
    +-------V------+            +--------V----------+
    |   iommufd    |            |    vfio legacy    |
    |  container   |            |     container     |
    +-------+------+            +--------+----------+
            |                            |
            | /dev/iommu                 | /dev/vfio/vfio
            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
Userspace   |                            |
============+============================+===========================
Kernel      |  device fd                 |
            +---------------+            | group/container fd
            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
            |  ATTACH_IOAS) |            | device fd
            |               |            |
            |       +-------V------------V-----------------+
    iommufd |       |                vfio                  |
(map/unmap  |       +---------+--------------------+-------+
 ioas_copy) |                 |                    | map/unmap
            |                 |                    |
    +-------V------+    +-----V------+      +------V--------+
    | iommufd core |    |   device   |      |  vfio iommu   |
    +--------------+    +------------+      +---------------+
[Secure Context setup]
- iommufd BE: uses device fd and iommufd to set up the secure context
(bind_iommufd, attach_ioas)
- vfio legacy BE: uses group fd and container fd to set up the secure context
(set_container, set_iommu)
[Device access]
- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
- vfio legacy BE: device fd is retrieved from group fd ioctl
[DMA Mapping flow]
1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
2. VFIO populates DMA map/unmap via the container BEs
*) iommufd BE: uses iommufd
*) vfio legacy BE: uses container fd
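A sketch of the base/derived split and the DMA mapping flow described above: generic code (such as the MemoryListener handlers) only calls through a per-backend ops table, and each BE supplies its own map/unmap. `Container`, `IOMMUOps`, `region_add()` and `region_del()` are hypothetical simplifications of QEMU's VFIOContainerBase/VFIOIOMMUOps, and the ioctls are replaced by bookkeeping so the example runs without a kernel.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Illustrative sketch only: Container and IOMMUOps are simplified,
 * hypothetical stand-ins for QEMU's VFIOContainerBase and VFIOIOMMUOps.
 */
typedef struct Container Container;

typedef struct IOMMUOps {
    const char *name;
    int (*dma_map)(Container *c, uint64_t iova, uint64_t size);
    int (*dma_unmap)(Container *c, uint64_t iova, uint64_t size);
} IOMMUOps;

struct Container {
    const IOMMUOps *ops;        /* BE-specific callbacks */
    const IOMMUOps *last_user;  /* BE that served the last request */
    uint64_t mapped;            /* bytes currently mapped */
};

/* Shared bookkeeping standing in for the real ioctl() calls. */
static int account(Container *c, uint64_t size, int map)
{
    if (map) {
        c->mapped += size;
    } else {
        assert(c->mapped >= size);
        c->mapped -= size;
    }
    c->last_user = c->ops;
    return 0;
}

/* Legacy BE: real code issues VFIO_IOMMU_(UN)MAP_DMA on the container fd. */
static int legacy_map(Container *c, uint64_t iova, uint64_t size)
{
    (void)iova;
    return account(c, size, 1);
}

static int legacy_unmap(Container *c, uint64_t iova, uint64_t size)
{
    (void)iova;
    return account(c, size, 0);
}

/* iommufd BE: real code issues IOMMU_IOAS_(UN)MAP on the iommufd. */
static int iommufd_map(Container *c, uint64_t iova, uint64_t size)
{
    (void)iova;
    return account(c, size, 1);
}

static int iommufd_unmap(Container *c, uint64_t iova, uint64_t size)
{
    (void)iova;
    return account(c, size, 0);
}

static const IOMMUOps legacy_ops = { "legacy", legacy_map, legacy_unmap };
static const IOMMUOps iommufd_ops = { "iommufd", iommufd_map, iommufd_unmap };

/* Generic code (e.g. MemoryListener region_add/region_del) only uses ops. */
static int region_add(Container *c, uint64_t iova, uint64_t size)
{
    return c->ops->dma_map(c, iova, size);
}

static int region_del(Container *c, uint64_t iova, uint64_t size)
{
    return c->ops->dma_unmap(c, iova, size);
}
```

The point of the design is that adding a backend means adding one ops table; region_add() and region_del() never need to know which BE they are talking to.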
Changelog:
v6:
- simplify CONFIG_IOMMUFD checking code further (Cédric)
- check iommufd_cdev_kvm_device_add return value (Cédric)
- dirrectory -> directory (Cédric)
- propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric)
- introduce a helper vfio_device_set_fd (Cédric)
- Move #include "sysemu/iommufd.h" in platform.c (Cédric)
- simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas
- Keep Matthew's RB, as the related change is minor
v5:
- Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric)
- Add (uintptr_t) to info->allowed_iovas (Cédric)
- Switch to IOAS attach/detach and hide hwpt (Jason)
- move chardev_open.[h|c] under the IOMMUFD entry (Cédric)
- Move vfio_legacy_pci_hot_reset into container.c (Cédric)
- Add missed pgsizes initialization in vfio_get_info_iova_range
- split linking iommufd patch into three to be cleaner
- Fix comments on PCI BAR unmap
v4:
- add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
- add doc for default case without fd (Markus)
- Fix build issue reported by Markus and Cédric
- Simply use SPDX identifier in new file (Cédric)
- make vfio_container_init/destroy helper a separate patch (Cédric)
- make vrdl_list movement a separate patch (Cédric)
- add const for some callback parameters (Cédric)
- add g_assert in VFIOIOMMUOps callback (Cédric)
- introduce pci_hot_reset callback (Cédric)
- remove VFIOIOMMUSpaprOps (Cédric)
- initialize g_autofree to NULL (Cédric)
- adjust func name prefix and trace event in iommufd.c (Cédric)
- add RB
v3:
- Rename base container as VFIOContainerBase and legacy container as container (Cédric)
- Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
- Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
- Introduce vfio_iommu_spapr_ops (Cédric)
- Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
- patch19 and patch21 are split into two parts to facilitate review
v2:
- patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
- add fd passing to platform/ap/ccw vfio device
- add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
- rename char_dev.h to chardev_open.h for same naming scheme per Daniel
- add full copyright per Daniel and Jason
Note: the changelog below is from the full IOMMUFD series:
v1:
- Alloc hwpt instead of using auto hwpt
- elaborate iommufd code per Nicolin
- consolidate two patches and drop as.c
- typo fixes and function renames
rfcv4:
- rebase on top of v8.0.3
- Add one patch from Yi which is about vfio device add in kvm
- Remove IOAS_COPY optimization and focus on functions in this patchset
- Fix wrong name issue reported and fix suggested by Matthew
- Fix compilation issue reported and fix suggested by Nicolin
- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
granularity
- Add dev_iter_next() callback to avoid adding so many callbacks
at container scope, add VFIODevice.hwpt to support that
- Restore all functions back to common from container whenever possible,
mainly migration and reset related functions
- Add --enable/disable-iommufd config option, enabled by default in linux
- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
redundant code
- Add FD passing support for vfio device backed by IOMMUFD
- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
- Fix FD leak in vfio_get_devicefd()
rfcv3:
- rebase on top of v7.2.0
- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
VFIO backends
- Fix use after free in error path, reported by Alister
- Split common.c in several steps to ease the review
rfcv2:
- remove the first three patches of rfcv1
- add open cdev helper suggested by Jason
- remove the QOMification of the VFIOContainer and simply use standard ops
(David)
- add "-object iommufd" suggested by Alex
Thanks
Zhenzhong
Cédric Le Goater (3):
hw/arm: Activate IOMMUFD for virt machines
kconfig: Activate IOMMUFD for s390x machines
hw/i386: Activate IOMMUFD for q35 machines
Eric Auger (2):
backends/iommufd: Introduce the iommufd object
vfio/pci: Allow the selection of a given iommu backend
Yi Liu (2):
util/char_dev: Add open_cdev()
vfio/iommufd: Implement the iommufd backend
Zhenzhong Duan (14):
vfio/common: return early if space isn't empty
vfio/iommufd: Relax assert check for iommufd backend
vfio/iommufd: Add support for iova_ranges and pgsizes
vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
vfio/pci: Introduce a vfio pci hot reset interface
vfio/iommufd: Enable pci hot reset through iommufd cdev interface
vfio/pci: Make vfio cdev pre-openable by passing a file handle
vfio/platform: Allow the selection of a given iommu backend
vfio/platform: Make vfio cdev pre-openable by passing a file handle
vfio/ap: Allow the selection of a given iommu backend
vfio/ap: Make vfio cdev pre-openable by passing a file handle
vfio/ccw: Allow the selection of a given iommu backend
vfio/ccw: Make vfio cdev pre-openable by passing a file handle
vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps
callbacks
MAINTAINERS | 10 +
qapi/qom.json | 19 +
hw/vfio/pci.h | 6 +
include/hw/vfio/vfio-common.h | 26 +-
include/hw/vfio/vfio-container-base.h | 15 +-
include/qemu/chardev_open.h | 16 +
include/sysemu/iommufd.h | 44 ++
backends/iommufd.c | 228 ++++++++++
hw/vfio/ap.c | 29 +-
hw/vfio/ccw.c | 31 +-
hw/vfio/common.c | 24 +-
hw/vfio/container-base.c | 6 +-
hw/vfio/container.c | 208 ++++++++-
hw/vfio/helpers.c | 44 ++
hw/vfio/iommufd.c | 630 ++++++++++++++++++++++++++
hw/vfio/pci.c | 212 ++-------
hw/vfio/platform.c | 38 +-
util/chardev_open.c | 81 ++++
backends/Kconfig | 4 +
backends/meson.build | 1 +
backends/trace-events | 10 +
hw/arm/Kconfig | 1 +
hw/i386/Kconfig | 1 +
hw/s390x/Kconfig | 1 +
hw/vfio/meson.build | 3 +
hw/vfio/trace-events | 11 +
qemu-options.hx | 12 +
util/meson.build | 1 +
28 files changed, 1493 insertions(+), 219 deletions(-)
create mode 100644 include/qemu/chardev_open.h
create mode 100644 include/sysemu/iommufd.h
create mode 100644 backends/iommufd.c
create mode 100644 hw/vfio/iommufd.c
create mode 100644 util/chardev_open.c
--
2.34.1
* [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 13:28 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 02/21] util/char_dev: Add open_cdev() Zhenzhong Duan
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Paolo Bonzini, Eric Blake, Markus Armbruster,
Daniel P. Berrangé, Eduardo Habkost
From: Eric Auger <eric.auger@redhat.com>
Introduce an iommufd object which allows the interaction
with the host /dev/iommu device.
The /dev/iommu may already have been pre-opened outside of QEMU,
in which case the fd can be passed directly along with the
iommufd object:
-object iommufd,id=iommufd0,fd=22
This allows the iommufd object to be shared across several
subsystems (VFIO, VDPA, ...). For example, libvirt would open
the /dev/iommu once.
If no fd is passed along with the iommufd object, the /dev/iommu
is opened by the QEMU code.
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v6: remove redundant call, alloc_hwpt, get/put_ioas
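The reference counting in iommufd_backend_connect()/iommufd_backend_disconnect() below can be modelled in isolation as follows. This is a single-threaded sketch: the QemuMutex is dropped, and the real qemu_open_old("/dev/iommu")/close() calls are replaced by hypothetical fake_open()/fake_close() counters so the model runs anywhere.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified, hypothetical model of IOMMUFDBackend's bookkeeping. */
typedef struct {
    int fd;          /* -1 until opened internally or passed in */
    bool owned;      /* true if QEMU opened /dev/iommu itself */
    uint32_t users;  /* number of connected frontends */
} Backend;

static int opens, closes;  /* how often the fake device was opened/closed */

static int fake_open(void) { opens++; return 42; }
static void fake_close(int fd) { (void)fd; closes++; }

static void backend_init(Backend *be)
{
    be->fd = -1;
    be->owned = true;
    be->users = 0;
}

/* Equivalent of "-object iommufd,...,fd=N": the fd is not owned. */
static void backend_set_fd(Backend *be, int fd)
{
    be->fd = fd;
    be->owned = false;
}

static int backend_connect(Backend *be)
{
    if (be->users == UINT32_MAX) {
        return -E2BIG;             /* too many connections */
    }
    if (be->owned && !be->users) {
        be->fd = fake_open();      /* first user opens /dev/iommu */
    }
    be->users++;
    assert(be->fd >= 0);           /* every connected user sees a valid fd */
    return 0;
}

static void backend_disconnect(Backend *be)
{
    if (!be->users) {
        return;
    }
    be->users--;
    if (!be->users && be->owned) {
        fake_close(be->fd);        /* last user closes an owned fd */
        be->fd = -1;
    }
}
```

Only an internally opened fd is ever closed here; a pre-opened fd stays under the control of whoever passed it in (e.g. libvirt), which is why the fd= path clears `owned`.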
MAINTAINERS | 7 ++
qapi/qom.json | 19 ++++
include/sysemu/iommufd.h | 44 ++++++++
backends/iommufd.c | 228 +++++++++++++++++++++++++++++++++++++++
backends/Kconfig | 4 +
backends/meson.build | 1 +
backends/trace-events | 10 ++
qemu-options.hx | 12 +++
8 files changed, 325 insertions(+)
create mode 100644 include/sysemu/iommufd.h
create mode 100644 backends/iommufd.c
diff --git a/MAINTAINERS b/MAINTAINERS
index ff1238bb98..a4891f7bda 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c
F: docs/system/s390x/vfio-ap.rst
L: qemu-s390x@nongnu.org
+iommufd
+M: Yi Liu <yi.l.liu@intel.com>
+M: Eric Auger <eric.auger@redhat.com>
+S: Supported
+F: backends/iommufd.c
+F: include/sysemu/iommufd.h
+
vhost
M: Michael S. Tsirkin <mst@redhat.com>
S: Supported
diff --git a/qapi/qom.json b/qapi/qom.json
index c53ef978ff..1fd8555a75 100644
--- a/qapi/qom.json
+++ b/qapi/qom.json
@@ -794,6 +794,23 @@
{ 'struct': 'VfioUserServerProperties',
'data': { 'socket': 'SocketAddress', 'device': 'str' } }
+##
+# @IOMMUFDProperties:
+#
+# Properties for iommufd objects.
+#
+# @fd: file descriptor name previously passed via 'getfd' command,
+#     which represents a pre-opened /dev/iommu.  This allows the
+#     iommufd object to be shared across several subsystems
+#     (VFIO, VDPA, ...), and the file descriptor to be shared
+#     with other processes, e.g. DPDK.  (default: QEMU opens
+#     /dev/iommu by itself)
+#
+# Since: 8.2
+##
+{ 'struct': 'IOMMUFDProperties',
+  'data': { '*fd': 'str' } }
+
##
# @RngProperties:
#
@@ -934,6 +951,7 @@
'input-barrier',
{ 'name': 'input-linux',
'if': 'CONFIG_LINUX' },
+ 'iommufd',
'iothread',
'main-loop',
{ 'name': 'memory-backend-epc',
@@ -1003,6 +1021,7 @@
'input-barrier': 'InputBarrierProperties',
'input-linux': { 'type': 'InputLinuxProperties',
'if': 'CONFIG_LINUX' },
+ 'iommufd': 'IOMMUFDProperties',
'iothread': 'IothreadProperties',
'main-loop': 'MainLoopProperties',
'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties',
diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
new file mode 100644
index 0000000000..9b3a86f57d
--- /dev/null
+++ b/include/sysemu/iommufd.h
@@ -0,0 +1,44 @@
+#ifndef SYSEMU_IOMMUFD_H
+#define SYSEMU_IOMMUFD_H
+
+#include "qom/object.h"
+#include "qemu/thread.h"
+#include "exec/hwaddr.h"
+#include "exec/cpu-common.h"
+
+#define TYPE_IOMMUFD_BACKEND "iommufd"
+OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
+                    IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND(obj) \
+        OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_GET_CLASS(obj) \
+        OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
+#define IOMMUFD_BACKEND_CLASS(klass) \
+        OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
+struct IOMMUFDBackendClass {
+    ObjectClass parent_class;
+};
+
+struct IOMMUFDBackend {
+    Object parent;
+
+    /*< protected >*/
+    int fd;            /* /dev/iommu file descriptor */
+    bool owned;        /* is the /dev/iommu opened internally */
+    QemuMutex lock;
+    uint32_t users;
+
+    /*< public >*/
+};
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
+void iommufd_backend_disconnect(IOMMUFDBackend *be);
+
+int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
+                               Error **errp);
+void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly);
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size);
+#endif
diff --git a/backends/iommufd.c b/backends/iommufd.c
new file mode 100644
index 0000000000..ea3e2a8f85
--- /dev/null
+++ b/backends/iommufd.c
@@ -0,0 +1,228 @@
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ * Eric Auger <eric.auger@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/iommufd.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/module.h"
+#include "qom/object_interfaces.h"
+#include "qemu/error-report.h"
+#include "monitor/monitor.h"
+#include "trace.h"
+#include <sys/ioctl.h>
+#include <linux/iommufd.h>
+
+static void iommufd_backend_init(Object *obj)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+    be->fd = -1;
+    be->users = 0;
+    be->owned = true;
+    qemu_mutex_init(&be->lock);
+}
+
+static void iommufd_backend_finalize(Object *obj)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+
+    if (be->owned) {
+        close(be->fd);
+        be->fd = -1;
+    }
+}
+
+static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
+{
+    IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
+    int fd = -1;
+
+    fd = monitor_fd_param(monitor_cur(), str, errp);
+    if (fd == -1) {
+        error_prepend(errp, "Could not parse remote object fd %s:", str);
+        return;
+    }
+    qemu_mutex_lock(&be->lock);
+    be->fd = fd;
+    be->owned = false;
+    qemu_mutex_unlock(&be->lock);
+    trace_iommu_backend_set_fd(be->fd);
+}
+
+static void iommufd_backend_class_init(ObjectClass *oc, void *data)
+{
+    object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
+}
+
+int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
+{
+    int fd, ret = 0;
+
+    qemu_mutex_lock(&be->lock);
+    if (be->users == UINT32_MAX) {
+        error_setg(errp, "too many connections");
+        ret = -E2BIG;
+        goto out;
+    }
+    if (be->owned && !be->users) {
+        fd = qemu_open_old("/dev/iommu", O_RDWR);
+        if (fd < 0) {
+            error_setg_errno(errp, errno, "/dev/iommu opening failed");
+            ret = fd;
+            goto out;
+        }
+        be->fd = fd;
+    }
+    be->users++;
+out:
+    trace_iommufd_backend_connect(be->fd, be->owned,
+                                  be->users, ret);
+    qemu_mutex_unlock(&be->lock);
+    return ret;
+}
+
+void iommufd_backend_disconnect(IOMMUFDBackend *be)
+{
+    qemu_mutex_lock(&be->lock);
+    if (!be->users) {
+        goto out;
+    }
+    be->users--;
+    if (!be->users && be->owned) {
+        close(be->fd);
+        be->fd = -1;
+    }
+out:
+    trace_iommufd_backend_disconnect(be->fd, be->users);
+    qemu_mutex_unlock(&be->lock);
+}
+
+int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
+                               Error **errp)
+{
+    int ret, fd = be->fd;
+    struct iommu_ioas_alloc alloc_data = {
+        .size = sizeof(alloc_data),
+        .flags = 0,
+    };
+
+    ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
+    if (ret) {
+        error_setg_errno(errp, errno, "Failed to allocate ioas");
+        return ret;
+    }
+
+    *ioas_id = alloc_data.out_ioas_id;
+    trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
+
+    return ret;
+}
+
+void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id)
+{
+    int ret, fd = be->fd;
+    struct iommu_destroy des = {
+        .size = sizeof(des),
+        .id = id,
+    };
+
+    ret = ioctl(fd, IOMMU_DESTROY, &des);
+    trace_iommufd_backend_free_id(fd, id, ret);
+    if (ret) {
+        error_report("Failed to free id: %u %m", id);
+    }
+}
+
+int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
+                            ram_addr_t size, void *vaddr, bool readonly)
+{
+    int ret, fd = be->fd;
+    struct iommu_ioas_map map = {
+        .size = sizeof(map),
+        .flags = IOMMU_IOAS_MAP_READABLE |
+                 IOMMU_IOAS_MAP_FIXED_IOVA,
+        .ioas_id = ioas_id,
+        .__reserved = 0,
+        .user_va = (uintptr_t)vaddr,
+        .iova = iova,
+        .length = size,
+    };
+
+    if (!readonly) {
+        map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
+    }
+
+    ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
+    trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
+                                  vaddr, readonly, ret);
+    if (ret) {
+        ret = -errno;
+        error_report("IOMMU_IOAS_MAP failed: %m");
+    }
+    return ret;
+}
+
+int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
+                              hwaddr iova, ram_addr_t size)
+{
+    int ret, fd = be->fd;
+    struct iommu_ioas_unmap unmap = {
+        .size = sizeof(unmap),
+        .ioas_id = ioas_id,
+        .iova = iova,
+        .length = size,
+    };
+
+    ret = ioctl(fd, IOMMU_IOAS_UNMAP, &unmap);
+    /*
+     * IOMMUFD treats a mapping as a kind of object, so unmapping a
+     * nonexistent mapping is treated as deleting a nonexistent
+     * object and returns ENOENT. This differs from the legacy
+     * backend, which allows it. A vIOMMU may trigger a lot of
+     * redundant unmaps; to avoid flooding the log, treat them
+     * as success for IOMMUFD, just like the legacy backend.
+     */
+    if (ret && errno == ENOENT) {
+        trace_iommufd_backend_unmap_dma_non_exist(fd, ioas_id, iova, size, ret);
+        ret = 0;
+    } else {
+        trace_iommufd_backend_unmap_dma(fd, ioas_id, iova, size, ret);
+    }
+
+    if (ret) {
+        ret = -errno;
+        error_report("IOMMU_IOAS_UNMAP failed: %m");
+    }
+    return ret;
+}
+
+static const TypeInfo iommufd_backend_info = {
+    .name = TYPE_IOMMUFD_BACKEND,
+    .parent = TYPE_OBJECT,
+    .instance_size = sizeof(IOMMUFDBackend),
+    .instance_init = iommufd_backend_init,
+    .instance_finalize = iommufd_backend_finalize,
+    .class_size = sizeof(IOMMUFDBackendClass),
+    .class_init = iommufd_backend_class_init,
+    .interfaces = (InterfaceInfo[]) {
+        { TYPE_USER_CREATABLE },
+        { }
+    }
+};
+
+static void register_types(void)
+{
+    type_register_static(&iommufd_backend_info);
+}
+
+type_init(register_types);
diff --git a/backends/Kconfig b/backends/Kconfig
index f35abc1609..2cb23f62fa 100644
--- a/backends/Kconfig
+++ b/backends/Kconfig
@@ -1 +1,5 @@
source tpm/Kconfig
+
+config IOMMUFD
+    bool
+    depends on VFIO
diff --git a/backends/meson.build b/backends/meson.build
index 914c7c4afb..9a5cea480d 100644
--- a/backends/meson.build
+++ b/backends/meson.build
@@ -20,6 +20,7 @@ if have_vhost_user
system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
endif
system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
+system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c'))
if have_vhost_user_crypto
system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
endif
diff --git a/backends/trace-events b/backends/trace-events
index 652eb76a57..d45c6e31a6 100644
--- a/backends/trace-events
+++ b/backends/trace-events
@@ -5,3 +5,13 @@ dbus_vmstate_pre_save(void)
dbus_vmstate_post_load(int version_id) "version_id: %d"
dbus_vmstate_loading(const char *id) "id: %s"
dbus_vmstate_saving(const char *id) "id: %s"
+
+# iommufd.c
+iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
+iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
+iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
+iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
+iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
+iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
+iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
+iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
diff --git a/qemu-options.hx b/qemu-options.hx
index 42fd09e4de..70507c0ee6 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -5224,6 +5224,18 @@ SRST
The ``share`` boolean option is on by default with memfd.
+    ``-object iommufd,id=id[,fd=fd]``
+        Creates an iommufd backend which allows control of DMA mapping
+        through the /dev/iommu device.
+
+        The ``id`` parameter is a unique ID which frontends (such as
+        vfio-pci or vdpa) will use to connect with the iommufd backend.
+
+        The ``fd`` parameter is an optional pre-opened file descriptor
+        resulting from /dev/iommu opening. Usually the iommufd is shared
+        across all subsystems, bringing the benefit of centralized
+        reference counting.
+
``-object rng-builtin,id=id``
Creates a random number generator backend which obtains entropy
from QEMU builtin functions. The ``id`` parameter is a unique ID
--
2.34.1
* [PATCH v6 02/21] util/char_dev: Add open_cdev()
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
2023-11-14 10:09 ` [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 13:29 ` Cédric Le Goater
2023-11-15 13:23 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 03/21] vfio/common: return early if space isn't empty Zhenzhong Duan
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
From: Yi Liu <yi.l.liu@intel.com>
/dev/vfio/devices/vfioX may not exist. In that case it is still possible
to open /dev/char/$major:$minor instead. Add a helper function to abstract
the cdev open.
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
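To make the fallback concrete, here is a stand-alone sketch that mirrors open_cdev_internal()/open_cdev_robust() from this patch, using plain open() instead of QEMU's qemu_open_old(). It assumes a Linux system providing `<sys/sysmacros.h>` and, like the patch, that udev creates the /dev/char symlinks; the helper names are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() */
#include <sys/types.h>
#include <unistd.h>

/* Open @path and keep it only if it is the expected character device. */
static int open_checked(const char *path, dev_t cdev)
{
    struct stat st;
    int fd = open(path, O_RDWR);

    if (fd == -1) {
        return -1;
    }
    if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
        (cdev != 0 && st.st_rdev != cdev)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Build the udev-maintained /dev/char/MAJOR:MINOR alias for @cdev. */
static int char_alias(dev_t cdev, char *buf, size_t len)
{
    int n = snprintf(buf, len, "/dev/char/%u:%u", major(cdev), minor(cdev));

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Try @devpath first; if that fails and @cdev is known, try the alias. */
static int open_cdev_sketch(const char *devpath, dev_t cdev)
{
    char alias[64];
    int fd = open_checked(devpath, cdev);

    if (fd == -1 && cdev != 0 && char_alias(cdev, alias, sizeof(alias)) == 0) {
        fd = open_checked(alias, cdev);
    }
    return fd;
}
```

Checking st_rdev after opening guards against a stale or reused path pointing at a different device; passing cdev == 0 disables both that check and the fallback.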
MAINTAINERS | 3 ++
include/qemu/chardev_open.h | 16 ++++++++
util/chardev_open.c | 81 +++++++++++++++++++++++++++++++++++++
util/meson.build | 1 +
4 files changed, 101 insertions(+)
create mode 100644 include/qemu/chardev_open.h
create mode 100644 util/chardev_open.c
diff --git a/MAINTAINERS b/MAINTAINERS
index a4891f7bda..869ec3d5af 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2172,6 +2172,9 @@ M: Eric Auger <eric.auger@redhat.com>
S: Supported
F: backends/iommufd.c
F: include/sysemu/iommufd.h
+F: include/qemu/chardev_open.h
+F: util/chardev_open.c
+
vhost
M: Michael S. Tsirkin <mst@redhat.com>
diff --git a/include/qemu/chardev_open.h b/include/qemu/chardev_open.h
new file mode 100644
index 0000000000..64e8fcfdcb
--- /dev/null
+++ b/include/qemu/chardev_open.h
@@ -0,0 +1,16 @@
+/*
+ * QEMU Chardev Helper
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#ifndef QEMU_CHARDEV_OPEN_H
+#define QEMU_CHARDEV_OPEN_H
+
+int open_cdev(const char *devpath, dev_t cdev);
+#endif
diff --git a/util/chardev_open.c b/util/chardev_open.c
new file mode 100644
index 0000000000..f776429788
--- /dev/null
+++ b/util/chardev_open.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
+ * Copyright (C) 2023 Intel Corporation.
+ *
+ * This software is available to you under a choice of one of two
+ * licenses. You may choose to be licensed under the terms of the GNU
+ * General Public License (GPL) Version 2, available from the file
+ * COPYING in the main directory of this source tree, or the
+ * OpenIB.org BSD license below:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * - Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ * - Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ *
+ * Copied from
+ * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/chardev_open.h"
+
+static int open_cdev_internal(const char *path, dev_t cdev)
+{
+    struct stat st;
+    int fd;
+
+    fd = qemu_open_old(path, O_RDWR);
+    if (fd == -1) {
+        return -1;
+    }
+    if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
+        (cdev != 0 && st.st_rdev != cdev)) {
+        close(fd);
+        return -1;
+    }
+    return fd;
+}
+
+static int open_cdev_robust(dev_t cdev)
+{
+    g_autofree char *devpath = NULL;
+
+    /*
+     * This assumes that udev is being used and is creating the /dev/char/
+     * symlinks.
+     */
+    devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
+    return open_cdev_internal(devpath, cdev);
+}
+
+int open_cdev(const char *devpath, dev_t cdev)
+{
+    int fd;
+
+    fd = open_cdev_internal(devpath, cdev);
+    if (fd == -1 && cdev != 0) {
+        return open_cdev_robust(cdev);
+    }
+    return fd;
+}
diff --git a/util/meson.build b/util/meson.build
index c2322ef6e7..174c133368 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -108,6 +108,7 @@ if have_block
util_ss.add(files('filemonitor-stub.c'))
endif
util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
+ util_ss.add(when: 'CONFIG_LINUX', if_true: files('chardev_open.c'))
endif
if cpu == 'aarch64'
--
2.34.1
* [PATCH v6 03/21] vfio/common: return early if space isn't empty
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
2023-11-14 10:09 ` [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
2023-11-14 10:09 ` [PATCH v6 02/21] util/char_dev: Add open_cdev() Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 13:29 ` Cédric Le Goater
2023-11-15 13:28 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 04/21] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
This is a trivial optimization. If there is an active container in the
space, vfio_reset_handler will never be unregistered. So invert the check
of space->containers and return early.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/common.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 572ae7c934..934f4f5446 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
void vfio_put_address_space(VFIOAddressSpace *space)
{
- if (QLIST_EMPTY(&space->containers)) {
- QLIST_REMOVE(space, list);
- g_free(space);
+ if (!QLIST_EMPTY(&space->containers)) {
+ return;
}
+
+ QLIST_REMOVE(space, list);
+ g_free(space);
+
if (QLIST_EMPTY(&vfio_address_spaces)) {
qemu_unregister_reset(vfio_reset_handler, NULL);
}
--
2.34.1
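Here is why the early return is safe. The space being released is still linked on vfio_address_spaces, so while space->containers is non-empty that global list cannot be empty, and qemu_unregister_reset() is never reached anyway. The self-contained model below compares the old and new control flow under that invariant. It uses a hypothetical struct and is not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Simplified model (hypothetical, not QEMU code) of
 * vfio_put_address_space(). Invariant: the space being released is
 * still on the global vfio_address_spaces list when the call starts.
 */
struct model {
    int containers;     /* entries on space->containers */
    int other_spaces;   /* other entries on vfio_address_spaces */
    bool space_on_list; /* space is still linked on the global list */
    bool unregistered;  /* qemu_unregister_reset() was called */
};

/* Control flow before the patch. */
static void put_old(struct model *m)
{
    if (m->containers == 0) {
        m->space_on_list = false;   /* QLIST_REMOVE + g_free */
    }
    if (!m->space_on_list && m->other_spaces == 0) {
        m->unregistered = true;
    }
}

/* Control flow after the patch: bail out while containers remain. */
static void put_new(struct model *m)
{
    if (m->containers != 0) {
        return;
    }
    m->space_on_list = false;
    if (m->other_spaces == 0) {
        m->unregistered = true;
    }
}
```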
* [PATCH v6 04/21] vfio/iommufd: Implement the iommufd backend
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (2 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 03/21] vfio/common: return early if space isn't empty Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 13:36 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 05/21] vfio/iommufd: Relax assert check for " Zhenzhong Duan
` (18 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
From: Yi Liu <yi.l.liu@intel.com>
Add the iommufd backend. The IOMMUFD container class is implemented
based on the new /dev/iommu user API. This backend obviously depends
on CONFIG_IOMMUFD.
For now, the iommufd backend does not support dirty page sync because
the host kernel lacks support for it.
Co-authored-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v6: simplify CONFIG_IOMMUFD checking code
dirrectory -> directory
check iommufd_cdev_kvm_device_add return value
use prefix iommufd_cdev_* for all functions and trace event
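iommufd_cdev_attach() in the diff below first tries to reuse an existing container in the same address space. It skips containers that use other ops or a different iommufd backend, and only allocates a new IOAS when no attach succeeds. The reduced sketch below shows that lookup using hypothetical stand-in types rather than the real VFIOContainerBase/VFIOIOMMUFDContainer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for VFIOIOMMUOps / IOMMUFDBackend / container. */
struct ops { const char *name; };
struct backend { int fd; };
struct container {
    const struct ops *ops;
    const struct backend *be;
    int accepts_attach;          /* 1 if attaching this device succeeds */
    struct container *next;
};

static const struct ops legacy_ops = { "legacy" };
static const struct ops iommufd_ops = { "iommufd" };

/*
 * Mirror of the lookup loop in iommufd_cdev_attach(): return the first
 * container that uses the iommufd ops, shares the device's backend and
 * accepts the attach; NULL means "allocate a new IOAS-backed container".
 */
static struct container *find_reusable(struct container *head,
                                       const struct backend *be)
{
    for (struct container *c = head; c; c = c->next) {
        if (c->ops != &iommufd_ops || c->be != be) {
            continue;            /* different backend type or iommufd */
        }
        if (c->accepts_attach) {
            return c;            /* reuse: device shares this IOAS */
        }
        /* attach failed: the real code traces the error and moves on */
    }
    return NULL;
}
```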
include/hw/vfio/vfio-common.h | 11 +
hw/vfio/common.c | 6 +
hw/vfio/iommufd.c | 428 ++++++++++++++++++++++++++++++++++
hw/vfio/meson.build | 3 +
hw/vfio/trace-events | 10 +
5 files changed, 458 insertions(+)
create mode 100644 hw/vfio/iommufd.c
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 24ecc0e7ee..3dac5c167e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -89,6 +89,14 @@ typedef struct VFIOHostDMAWindow {
QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
} VFIOHostDMAWindow;
+typedef struct IOMMUFDBackend IOMMUFDBackend;
+
+typedef struct VFIOIOMMUFDContainer {
+ VFIOContainerBase bcontainer;
+ IOMMUFDBackend *be;
+ uint32_t ioas_id;
+} VFIOIOMMUFDContainer;
+
typedef struct VFIODeviceOps VFIODeviceOps;
typedef struct VFIODevice {
@@ -116,6 +124,8 @@ typedef struct VFIODevice {
OnOffAuto pre_copy_dirty_page_tracking;
bool dirty_pages_supported;
bool dirty_tracking;
+ int devid;
+ IOMMUFDBackend *iommufd;
} VFIODevice;
struct VFIODeviceOps {
@@ -201,6 +211,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
extern VFIOGroupList vfio_group_list;
extern VFIODeviceList vfio_device_list;
extern const VFIOIOMMUOps vfio_legacy_ops;
+extern const VFIOIOMMUOps vfio_iommufd_ops;
extern const MemoryListener vfio_memory_listener;
extern int vfio_kvm_device_fd;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 934f4f5446..6569732b7a 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -19,6 +19,7 @@
*/
#include "qemu/osdep.h"
+#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
#include <sys/ioctl.h>
#ifdef CONFIG_KVM
#include <linux/kvm.h>
@@ -1503,6 +1504,11 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
{
const VFIOIOMMUOps *ops = &vfio_legacy_ops;
+#ifdef CONFIG_IOMMUFD
+ if (vbasedev->iommufd) {
+ ops = &vfio_iommufd_ops;
+ }
+#endif
return ops->attach_device(name, vbasedev, as, errp);
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
new file mode 100644
index 0000000000..06282d885c
--- /dev/null
+++ b/hw/vfio/iommufd.c
@@ -0,0 +1,428 @@
+/*
+ * iommufd container backend
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ * Copyright Red Hat, Inc. 2023
+ *
+ * Authors: Yi Liu <yi.l.liu@intel.com>
+ * Eric Auger <eric.auger@redhat.com>
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include <sys/ioctl.h>
+#include <linux/vfio.h>
+#include <linux/iommufd.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "qemu/error-report.h"
+#include "trace.h"
+#include "qapi/error.h"
+#include "sysemu/iommufd.h"
+#include "hw/qdev-core.h"
+#include "sysemu/reset.h"
+#include "qemu/cutils.h"
+#include "qemu/chardev_open.h"
+
+static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
+ ram_addr_t size, void *vaddr, bool readonly)
+{
+ VFIOIOMMUFDContainer *container =
+ container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+ return iommufd_backend_map_dma(container->be,
+ container->ioas_id,
+ iova, size, vaddr, readonly);
+}
+
+static int iommufd_cdev_unmap(VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
+{
+ VFIOIOMMUFDContainer *container =
+ container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+
+ /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
+ return iommufd_backend_unmap_dma(container->be,
+ container->ioas_id, iova, size);
+}
+
+static int iommufd_cdev_kvm_device_add(VFIODevice *vbasedev, Error **errp)
+{
+ return vfio_kvm_device_add_fd(vbasedev->fd, errp);
+}
+
+static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
+{
+ Error *err = NULL;
+
+ if (vfio_kvm_device_del_fd(vbasedev->fd, &err)) {
+ error_report_err(err);
+ }
+}
+
+static int iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
+{
+ IOMMUFDBackend *iommufd = vbasedev->iommufd;
+ struct vfio_device_bind_iommufd bind = {
+ .argsz = sizeof(bind),
+ .flags = 0,
+ };
+ int ret;
+
+ ret = iommufd_backend_connect(iommufd, errp);
+ if (ret) {
+ return ret;
+ }
+
+ /*
+ * Add device to kvm-vfio to be prepared for the tracking
+ * in KVM. Especially for some emulated devices, it requires
+ * to have kvm information in the device open.
+ */
+ ret = iommufd_cdev_kvm_device_add(vbasedev, errp);
+ if (ret) {
+ goto err_kvm_device_add;
+ }
+
+ /* Bind device to iommufd */
+ bind.iommufd = iommufd->fd;
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
+ if (ret) {
+ error_setg_errno(errp, errno, "error bind device fd=%d to iommufd=%d",
+ vbasedev->fd, bind.iommufd);
+ goto err_bind;
+ }
+
+ vbasedev->devid = bind.out_devid;
+ trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
+ vbasedev->fd, vbasedev->devid);
+ return ret;
+err_bind:
+ iommufd_cdev_kvm_device_del(vbasedev);
+err_kvm_device_add:
+ iommufd_backend_disconnect(iommufd);
+ return ret;
+}
+
+static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
+{
+ /* Unbind is automatically conducted when device fd is closed */
+ iommufd_cdev_kvm_device_del(vbasedev);
+ iommufd_backend_disconnect(vbasedev->iommufd);
+}
+
+static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
+{
+ long int ret = -ENOTTY;
+ char *path, *vfio_dev_path = NULL, *vfio_path = NULL;
+ DIR *dir = NULL;
+ struct dirent *dent;
+ gchar *contents;
+ struct stat st;
+ gsize length;
+ int major, minor;
+ dev_t vfio_devt;
+
+ path = g_strdup_printf("%s/vfio-dev", sysfs_path);
+ if (stat(path, &st) < 0) {
+ error_setg_errno(errp, errno, "no such host device");
+ goto out_free_path;
+ }
+
+ dir = opendir(path);
+ if (!dir) {
+ error_setg_errno(errp, errno, "couldn't open directory %s", path);
+ goto out_free_path;
+ }
+
+ while ((dent = readdir(dir))) {
+ if (!strncmp(dent->d_name, "vfio", 4)) {
+ vfio_dev_path = g_strdup_printf("%s/%s/dev", path, dent->d_name);
+ break;
+ }
+ }
+
+ if (!vfio_dev_path) {
+ error_setg(errp, "failed to find vfio-dev/vfioX/dev");
+ goto out_close_dir;
+ }
+
+ if (!g_file_get_contents(vfio_dev_path, &contents, &length, NULL)) {
+ error_setg(errp, "failed to load \"%s\"", vfio_dev_path);
+ goto out_free_dev_path;
+ }
+
+ if (sscanf(contents, "%d:%d", &major, &minor) != 2) {
+ error_setg(errp, "failed to get major:minor for \"%s\"", vfio_dev_path);
+ goto out_free_dev_path;
+ }
+ g_free(contents);
+ vfio_devt = makedev(major, minor);
+
+ vfio_path = g_strdup_printf("/dev/vfio/devices/%s", dent->d_name);
+ ret = open_cdev(vfio_path, vfio_devt);
+ if (ret < 0) {
+ error_setg(errp, "Failed to open %s", vfio_path);
+ }
+
+ trace_iommufd_cdev_getfd(vfio_path, ret);
+ g_free(vfio_path);
+
+out_free_dev_path:
+ g_free(vfio_dev_path);
+out_close_dir:
+ closedir(dir);
+out_free_path:
+ if (*errp) {
+ error_prepend(errp, VFIO_MSG_PREFIX, path);
+ }
+ g_free(path);
+
+ return ret;
+}
+
+static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, bool is_ioas,
+ uint32_t id, Error **errp)
+{
+ int ret, iommufd = vbasedev->iommufd->fd;
+ struct vfio_device_attach_iommufd_pt attach_data = {
+ .argsz = sizeof(attach_data),
+ .flags = 0,
+ .pt_id = id,
+ };
+ const char *str = is_ioas ? "ioas" : "hwpt";
+
+ /* Attach device to an IOAS or hwpt within iommufd */
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
+ if (ret) {
+ error_setg_errno(errp, errno,
+ "[iommufd=%d] error attach %s (%d) to %s_id=%d",
+ iommufd, vbasedev->name, vbasedev->fd, str, id);
+ } else {
+ trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
+ vbasedev->fd, str, id);
+ }
+ return ret;
+}
+
+static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, bool is_ioas,
+ uint32_t id, Error **errp)
+{
+ int ret, iommufd = vbasedev->iommufd->fd;
+ struct vfio_device_detach_iommufd_pt detach_data = {
+ .argsz = sizeof(detach_data),
+ .flags = 0,
+ };
+ const char *str = is_ioas ? "ioas" : "hwpt";
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data);
+ if (ret) {
+ error_setg_errno(errp, errno, "detach %s from %s failed",
+ vbasedev->name, str);
+ } else {
+ trace_iommufd_cdev_detach_ioas_hwpt(iommufd, vbasedev->name, str, id);
+ }
+ return ret;
+}
+
+static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container,
+ Error **errp)
+{
+ return iommufd_cdev_attach_ioas_hwpt(vbasedev, true,
+ container->ioas_id, errp);
+}
+
+static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
+ VFIOIOMMUFDContainer *container)
+{
+ Error *err = NULL;
+
+ if (iommufd_cdev_detach_ioas_hwpt(vbasedev, true,
+ container->ioas_id, &err)) {
+ error_report_err(err);
+ }
+}
+
+static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer *container)
+{
+ VFIOContainerBase *bcontainer = &container->bcontainer;
+
+ if (!QLIST_EMPTY(&bcontainer->device_list)) {
+ return;
+ }
+ memory_listener_unregister(&bcontainer->listener);
+ vfio_container_destroy(bcontainer);
+ iommufd_backend_free_id(container->be, container->ioas_id);
+ g_free(container);
+}
+
+static int iommufd_cdev_ram_block_discard_disable(bool state)
+{
+ /*
+ * We support coordinated discarding of RAM via the RamDiscardManager.
+ */
+ return ram_block_uncoordinated_discard_disable(state);
+}
+
+static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
+ AddressSpace *as, Error **errp)
+{
+ VFIOContainerBase *bcontainer;
+ VFIOIOMMUFDContainer *container;
+ VFIOAddressSpace *space;
+ struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
+ int ret, devfd;
+ uint32_t ioas_id;
+ Error *err = NULL;
+
+ devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+ if (devfd < 0) {
+ return devfd;
+ }
+ vbasedev->fd = devfd;
+
+ ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
+ if (ret) {
+ goto err_connect_bind;
+ }
+
+ space = vfio_get_address_space(as);
+
+ /* try to attach to an existing container in this space */
+ QLIST_FOREACH(bcontainer, &space->containers, next) {
+ container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+ if (bcontainer->ops != &vfio_iommufd_ops ||
+ vbasedev->iommufd != container->be) {
+ continue;
+ }
+ if (iommufd_cdev_attach_container(vbasedev, container, &err)) {
+ const char *msg = error_get_pretty(err);
+
+ trace_iommufd_cdev_fail_attach_existing_container(msg);
+ error_free(err);
+ err = NULL;
+ } else {
+ ret = iommufd_cdev_ram_block_discard_disable(true);
+ if (ret) {
+ error_setg(errp,
+ "Cannot set discarding of RAM broken (%d)", ret);
+ goto err_discard_disable;
+ }
+ goto found_container;
+ }
+ }
+
+ /* Need to allocate a new dedicated container */
+ ret = iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp);
+ if (ret < 0) {
+ goto err_alloc_ioas;
+ }
+
+ trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
+
+ container = g_malloc0(sizeof(*container));
+ container->be = vbasedev->iommufd;
+ container->ioas_id = ioas_id;
+
+ bcontainer = &container->bcontainer;
+ vfio_container_init(bcontainer, space, &vfio_iommufd_ops);
+ QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
+
+ ret = iommufd_cdev_attach_container(vbasedev, container, errp);
+ if (ret) {
+ goto err_attach_container;
+ }
+
+ ret = iommufd_cdev_ram_block_discard_disable(true);
+ if (ret) {
+ goto err_discard_disable;
+ }
+
+ bcontainer->pgsizes = qemu_real_host_page_size();
+
+ bcontainer->listener = vfio_memory_listener;
+ memory_listener_register(&bcontainer->listener, bcontainer->space->as);
+
+ if (bcontainer->error) {
+ ret = -1;
+ error_propagate_prepend(errp, bcontainer->error,
+ "memory listener initialization failed: ");
+ goto err_listener_register;
+ }
+
+ bcontainer->initialized = true;
+
+found_container:
+ ret = ioctl(devfd, VFIO_DEVICE_GET_INFO, &dev_info);
+ if (ret) {
+ error_setg_errno(errp, errno, "error getting device info");
+ goto err_listener_register;
+ }
+
+ /*
+ * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
+ * for discarding incompatibility check as well?
+ */
+ if (vbasedev->ram_block_discard_allowed) {
+ iommufd_cdev_ram_block_discard_disable(false);
+ }
+
+ vbasedev->group = 0;
+ vbasedev->num_irqs = dev_info.num_irqs;
+ vbasedev->num_regions = dev_info.num_regions;
+ vbasedev->flags = dev_info.flags;
+ vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
+ vbasedev->bcontainer = bcontainer;
+ QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
+ QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+
+ trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
+ vbasedev->num_regions, vbasedev->flags);
+ return 0;
+
+err_listener_register:
+ iommufd_cdev_ram_block_discard_disable(false);
+err_discard_disable:
+ iommufd_cdev_detach_container(vbasedev, container);
+err_attach_container:
+ iommufd_cdev_container_destroy(container);
+err_alloc_ioas:
+ vfio_put_address_space(space);
+ iommufd_cdev_unbind_and_disconnect(vbasedev);
+err_connect_bind:
+ close(vbasedev->fd);
+ return ret;
+}
+
+static void iommufd_cdev_detach(VFIODevice *vbasedev)
+{
+ VFIOContainerBase *bcontainer = vbasedev->bcontainer;
+ VFIOIOMMUFDContainer *container;
+ VFIOAddressSpace *space = bcontainer->space;
+
+ QLIST_REMOVE(vbasedev, global_next);
+ QLIST_REMOVE(vbasedev, container_next);
+ vbasedev->bcontainer = NULL;
+
+ if (!vbasedev->ram_block_discard_allowed) {
+ iommufd_cdev_ram_block_discard_disable(false);
+ }
+
+ container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+ iommufd_cdev_detach_container(vbasedev, container);
+ iommufd_cdev_container_destroy(container);
+ vfio_put_address_space(space);
+
+ iommufd_cdev_unbind_and_disconnect(vbasedev);
+ close(vbasedev->fd);
+}
+
+const VFIOIOMMUOps vfio_iommufd_ops = {
+ .dma_map = iommufd_cdev_map,
+ .dma_unmap = iommufd_cdev_unmap,
+ .attach_device = iommufd_cdev_attach,
+ .detach_device = iommufd_cdev_detach,
+};
diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
index eb6ce6229d..e5d98b6adc 100644
--- a/hw/vfio/meson.build
+++ b/hw/vfio/meson.build
@@ -7,6 +7,9 @@ vfio_ss.add(files(
'spapr.c',
'migration.c',
))
+vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
+ 'iommufd.c',
+))
vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
'display.c',
'pci-quirks.c',
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 08a1f9dfa4..5d3e9e8cee 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -164,3 +164,13 @@ vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcop
vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
+
+#iommufd.c
+
+iommufd_cdev_connect_and_bind(int iommufd, const char *name, int devfd, int devid) " [iommufd=%d] Successfully bound device %s (fd=%d): output devid=%d"
+iommufd_cdev_getfd(const char *dev, int devfd) " %s (fd=%d)"
+iommufd_cdev_attach_ioas_hwpt(int iommufd, const char *name, int devfd, const char *str, int hwptid) " [iommufd=%d] Successfully attached device %s (%d) to %s_id=%d"
+iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name, const char *str, int hwptid) " [iommufd=%d] Successfully detached %s from %s_id=%d"
+iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
+iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
+iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
--
2.34.1
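iommufd_cdev_getfd() locates the cdev by reading <sysfs_path>/vfio-dev/vfioX/dev, which contains "major:minor". It builds a dev_t from that value before calling open_cdev(). The sketch below covers only the parsing step. parse_devt() is a hypothetical helper, and its negative-value check is an addition that is not present in the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/*
 * Sketch of the parsing step in iommufd_cdev_getfd(): turn the
 * "major:minor\n" contents of the sysfs dev file into a dev_t.
 * Returns 0 on success, -1 on malformed input.
 */
static int parse_devt(const char *contents, dev_t *out)
{
    int maj, min;

    assert(contents && out);
    if (sscanf(contents, "%d:%d", &maj, &min) != 2 || maj < 0 || min < 0) {
        return -1;
    }
    *out = makedev(maj, min);
    return 0;
}
```

Separately, as written in the patch, the sscanf() failure path jumps to out_free_dev_path without freeing contents.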
* [PATCH v6 05/21] vfio/iommufd: Relax assert check for iommufd backend
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (3 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 04/21] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-15 13:56 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes Zhenzhong Duan
` (17 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
Currently iommufd doesn't support dirty page sync, but that should not
block live migration when VFIO migration is force-enabled.
So in this case we allow set_dirty_page_tracking to be NULL.
Note we don't need the same change for query_dirty_bitmap because
query_dirty_bitmap is never called when dirty page sync isn't
supported.
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/container-base.c | 4 ++++
hw/vfio/container.c | 4 ----
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 71f7274973..eee2dcfe76 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -55,6 +55,10 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
bool start)
{
+ if (!bcontainer->dirty_pages_supported) {
+ return 0;
+ }
+
g_assert(bcontainer->ops->set_dirty_page_tracking);
return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
}
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 6bacf38222..ed2d721b2b 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -216,10 +216,6 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
.argsz = sizeof(dirty),
};
- if (!bcontainer->dirty_pages_supported) {
- return 0;
- }
-
if (start) {
dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
} else {
--
2.34.1
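With the capability check moved into vfio_container_set_dirty_page_tracking(), a backend that cannot track dirty pages may leave the callback NULL, and the g_assert() is never hit. Below is a simplified model of that dispatch. The types and the two ops tables are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct bcontainer;

/* Hypothetical subset of VFIOIOMMUOps: the callback may be NULL. */
struct iommu_ops {
    int (*set_dirty_page_tracking)(struct bcontainer *bc, bool start);
};

struct bcontainer {
    const struct iommu_ops *ops;
    bool dirty_pages_supported;
    int tracking_calls;            /* counts backend invocations */
};

/*
 * Mirrors vfio_container_set_dirty_page_tracking() after this patch:
 * the capability check lives in the common layer.
 */
static int set_dirty_tracking(struct bcontainer *bc, bool start)
{
    if (!bc->dirty_pages_supported) {
        return 0;                  /* nothing to do, not an error */
    }
    assert(bc->ops->set_dirty_page_tracking);
    return bc->ops->set_dirty_page_tracking(bc, start);
}

static int legacy_set_tracking(struct bcontainer *bc, bool start)
{
    (void)start;
    bc->tracking_calls++;
    return 0;
}

static const struct iommu_ops legacy = { legacy_set_tracking };
static const struct iommu_ops iommufd = { NULL };
```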
* [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (4 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 05/21] vfio/iommufd: Relax assert check for " Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 13:46 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 07/21] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
` (16 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
Some vIOMMUs, such as virtio-iommu, use IOVA ranges from the host side
to set up reserved ranges for passthrough devices, so that the guest
will not use an IOVA range beyond what the host supports.
Use an IOMMUFD uAPI to get the host-side IOVA ranges and pass them to
the vIOMMU, just like the legacy backend. If this fails, fall back to
the 64-bit IOVA range.
Also use out_iova_alignment returned from the uAPI as pgsizes instead
of qemu_real_host_page_size(), which is kept as a fallback.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v6: propagate iommufd_cdev_get_info_iova_range err and print as warning
hw/vfio/iommufd.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 54 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 06282d885c..e5bf528e89 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -267,6 +267,53 @@ static int iommufd_cdev_ram_block_discard_disable(bool state)
return ram_block_uncoordinated_discard_disable(state);
}
+static int iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
+ uint32_t ioas_id, Error **errp)
+{
+ VFIOContainerBase *bcontainer = &container->bcontainer;
+ struct iommu_ioas_iova_ranges *info;
+ struct iommu_iova_range *iova_ranges;
+ int ret, sz, fd = container->be->fd;
+
+ info = g_malloc0(sizeof(*info));
+ info->size = sizeof(*info);
+ info->ioas_id = ioas_id;
+
+ ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
+ if (ret && errno != EMSGSIZE) {
+ goto error;
+ }
+
+ sz = info->num_iovas * sizeof(struct iommu_iova_range);
+ info = g_realloc(info, sizeof(*info) + sz);
+ info->allowed_iovas = (uintptr_t)(info + 1);
+
+ ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
+ if (ret) {
+ goto error;
+ }
+
+ iova_ranges = (struct iommu_iova_range *)(uintptr_t)info->allowed_iovas;
+
+ for (int i = 0; i < info->num_iovas; i++) {
+ Range *range = g_new(Range, 1);
+
+ range_set_bounds(range, iova_ranges[i].start, iova_ranges[i].last);
+ bcontainer->iova_ranges =
+ range_list_insert(bcontainer->iova_ranges, range);
+ }
+ bcontainer->pgsizes = info->out_iova_alignment;
+
+ g_free(info);
+ return 0;
+
+error:
+ ret = -errno;
+ g_free(info);
+ error_setg_errno(errp, errno, "Cannot get IOVA ranges");
+ return ret;
+}
+
static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp)
{
@@ -341,7 +388,13 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
goto err_discard_disable;
}
- bcontainer->pgsizes = qemu_real_host_page_size();
+ ret = iommufd_cdev_get_info_iova_range(container, ioas_id, &err);
+ if (ret) {
+ warn_report_err(err);
+ err = NULL;
+ error_printf("Fallback to default 64bit IOVA range and 4K page size\n");
+ bcontainer->pgsizes = qemu_real_host_page_size();
+ }
bcontainer->listener = vfio_memory_listener;
memory_listener_register(&bcontainer->listener, bcontainer->space->as);
--
2.34.1
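iommufd_cdev_get_info_iova_range() uses a two-call pattern. The first IOMMU_IOAS_IOVA_RANGES call with no buffer is expected to fail with EMSGSIZE while reporting num_iovas. The buffer is then sized from that count and the call is repeated. The uAPI's `last` field is inclusive, which matches range_set_bounds(). Below is a self-contained sketch of the pattern against a fake ioctl. fake_ioctl(), get_ranges() and the sample windows are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the iommufd IOVA range structures. */
struct iova_range { uint64_t start, last; };   /* 'last' is inclusive */
struct iova_query {
    uint32_t num_iovas;            /* in: capacity, out: required count */
    struct iova_range *ranges;     /* caller-provided array */
};

/* Fake "kernel" with two usable windows around a reserved hole. */
static const struct iova_range host[] = {
    { 0x0, 0xfedfffff },
    { 0xfef00000, UINT64_MAX },
};

static int fake_ioctl(struct iova_query *q)
{
    uint32_t n = sizeof(host) / sizeof(host[0]);

    if (q->num_iovas < n) {
        q->num_iovas = n;          /* report the size needed */
        errno = EMSGSIZE;
        return -1;
    }
    for (uint32_t i = 0; i < n; i++) {
        q->ranges[i] = host[i];
    }
    q->num_iovas = n;
    return 0;
}

/* Probe, size the buffer from num_iovas, query again. Caller frees *out. */
static int get_ranges(struct iova_range **out, uint32_t *count)
{
    struct iova_query q = { 0, NULL };

    if (fake_ioctl(&q) && errno != EMSGSIZE) {
        return -errno;
    }
    q.ranges = calloc(q.num_iovas, sizeof(*q.ranges));
    if (!q.ranges) {
        return -ENOMEM;
    }
    if (fake_ioctl(&q)) {
        int ret = -errno;
        free(q.ranges);
        return ret;
    }
    *out = q.ranges;
    *count = q.num_iovas;
    return 0;
}
```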
* [PATCH v6 07/21] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (5 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-15 17:00 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
` (15 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
This helper will be used by both legacy and iommufd backends.
No functional changes intended.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
---
hw/vfio/pci.h | 3 +++
hw/vfio/pci.c | 54 +++++++++++++++++++++++++++++++++++----------------
2 files changed, 40 insertions(+), 17 deletions(-)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index fba8737ab2..1006061afb 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
+int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
+ struct vfio_pci_hot_reset_info **info_p);
+
int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c62c02f7b6..eb55e8ae88 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2445,22 +2445,13 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
return (strcmp(tmp, name) == 0);
}
-static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
+ struct vfio_pci_hot_reset_info **info_p)
{
- VFIOGroup *group;
struct vfio_pci_hot_reset_info *info;
- struct vfio_pci_dependent_device *devices;
- struct vfio_pci_hot_reset *reset;
- int32_t *fds;
- int ret, i, count;
- bool multi = false;
+ int ret, count;
- trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
-
- if (!single) {
- vfio_pci_pre_reset(vdev);
- }
- vdev->vbasedev.needs_reset = false;
+ assert(info_p && !*info_p);
info = g_malloc0(sizeof(*info));
info->argsz = sizeof(*info);
@@ -2468,24 +2459,53 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
if (ret && errno != ENOSPC) {
ret = -errno;
+ g_free(info);
if (!vdev->has_pm_reset) {
error_report("vfio: Cannot reset device %s, "
"no available reset mechanism.", vdev->vbasedev.name);
}
- goto out_single;
+ return ret;
}
count = info->count;
- info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
- info->argsz = sizeof(*info) + (count * sizeof(*devices));
- devices = &info->devices[0];
+ info = g_realloc(info, sizeof(*info) + (count * sizeof(info->devices[0])));
+ info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
if (ret) {
ret = -errno;
+ g_free(info);
error_report("vfio: hot reset info failed: %m");
+ return ret;
+ }
+
+ *info_p = info;
+ return 0;
+}
+
+static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
+{
+ VFIOGroup *group;
+ struct vfio_pci_hot_reset_info *info = NULL;
+ struct vfio_pci_dependent_device *devices;
+ struct vfio_pci_hot_reset *reset;
+ int32_t *fds;
+ int ret, i, count;
+ bool multi = false;
+
+ trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+ if (!single) {
+ vfio_pci_pre_reset(vdev);
+ }
+ vdev->vbasedev.needs_reset = false;
+
+ ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+ if (ret) {
goto out_single;
}
+ devices = &info->devices[0];
trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
--
2.34.1
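After the split, vfio_pci_get_pci_hot_reset_info() frees its allocation on every error path. *info_p must be NULL on entry and is only set on success, which is why vfio_pci_hot_reset() now initialises info to NULL. Below is a standalone sketch of that out-parameter contract. It uses a hypothetical reset_info type, and the ndev argument simulates the device count the kernel would report.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_pci_hot_reset_info. */
struct reset_info {
    unsigned count;
    int devices[];                 /* flexible array, like info->devices */
};

/*
 * Contract sketch: *info_p must be NULL on entry; on success it owns a
 * heap buffer the caller must free; on failure it stays NULL and a
 * negative errno is returned. ndev < 0 simulates an ioctl failure.
 */
static int get_reset_info(int ndev, struct reset_info **info_p)
{
    struct reset_info *info;

    assert(info_p && !*info_p);
    if (ndev < 0) {
        return -ENODEV;            /* nothing allocated, nothing to free */
    }
    info = malloc(sizeof(*info) + ndev * sizeof(info->devices[0]));
    if (!info) {
        return -ENOMEM;
    }
    info->count = ndev;
    for (int i = 0; i < ndev; i++) {
        info->devices[i] = i;
    }
    *info_p = info;
    return 0;
}
```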
* [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (6 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 07/21] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 13:51 ` Cédric Le Goater
2023-11-15 17:54 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface Zhenzhong Duan
` (14 subsequent siblings)
22 siblings, 2 replies; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
Legacy vfio pci and iommufd cdev use different processes to hot reset
a vfio device. Expand the current code to abstract out a pci_hot_reset
callback for legacy vfio; the same interface will also be used by
iommufd cdev vfio devices.
Rename vfio_pci_hot_reset to vfio_legacy_pci_hot_reset and move it
into container.c.
vfio_pci_[pre/post]_reset and vfio_pci_host_match are exported so
they can be called from the legacy and iommufd pci_hot_reset callbacks.
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v6: pci_hot_reset return -errno if fails
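The new pci_hot_reset hook lets generic PCI code dispatch the reset through the backend's VFIOIOMMUOps instead of hard-coding the group-based flow. Below is a reduced sketch of that dispatch. The types are hypothetical, and returning -ENOSYS for a missing hook is an assumption for illustration, not something this patch defines.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct device;

/* Hypothetical subset of VFIOIOMMUOps with the new PCI-specific hook. */
struct iommu_ops {
    int (*pci_hot_reset)(struct device *dev, bool single);
};

struct device {
    const struct iommu_ops *ops;   /* backend chosen at attach time */
    int resets;
};

static int legacy_pci_hot_reset(struct device *dev, bool single)
{
    (void)single;
    dev->resets++;                 /* group-fd based reset would go here */
    return 0;
}

static const struct iommu_ops legacy_ops = { legacy_pci_hot_reset };
static const struct iommu_ops no_reset_ops = { NULL };

/* Generic caller: dispatch through the ops table, -errno on failure. */
static int pci_hot_reset(struct device *dev, bool single)
{
    if (!dev->ops->pci_hot_reset) {
        return -ENOSYS;            /* assumption: missing-hook error code */
    }
    return dev->ops->pci_hot_reset(dev, single);
}
```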
hw/vfio/pci.h | 3 +
include/hw/vfio/vfio-container-base.h | 3 +
hw/vfio/container.c | 170 ++++++++++++++++++++++++++
hw/vfio/pci.c | 168 +------------------------
4 files changed, 182 insertions(+), 162 deletions(-)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 1006061afb..6e64a2654e 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev);
+void vfio_pci_post_reset(VFIOPCIDevice *vdev);
+bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name);
int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
struct vfio_pci_hot_reset_info **info_p);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 4b6f017c6f..45bb19c767 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -106,6 +106,9 @@ struct VFIOIOMMUOps {
int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
hwaddr iova, hwaddr size);
+ /* PCI specific */
+ int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
+
/* SPAPR specific */
int (*add_window)(VFIOContainerBase *bcontainer,
MemoryRegionSection *section,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index ed2d721b2b..1dbf9b9a17 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -33,6 +33,7 @@
#include "trace.h"
#include "qapi/error.h"
#include "migration/migration.h"
+#include "pci.h"
VFIOGroupList vfio_group_list =
QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -922,6 +923,174 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
vfio_put_group(group);
}
+static int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single)
+{
+ VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+ VFIOGroup *group;
+ struct vfio_pci_hot_reset_info *info = NULL;
+ struct vfio_pci_dependent_device *devices;
+ struct vfio_pci_hot_reset *reset;
+ int32_t *fds;
+ int ret, i, count;
+ bool multi = false;
+
+ trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+ if (!single) {
+ vfio_pci_pre_reset(vdev);
+ }
+ vdev->vbasedev.needs_reset = false;
+
+ ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+ if (ret) {
+ goto out_single;
+ }
+ devices = &info->devices[0];
+
+ trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
+
+ /* Verify that we have all the groups required */
+ for (i = 0; i < info->count; i++) {
+ PCIHostDeviceAddress host;
+ VFIOPCIDevice *tmp;
+ VFIODevice *vbasedev_iter;
+
+ host.domain = devices[i].segment;
+ host.bus = devices[i].bus;
+ host.slot = PCI_SLOT(devices[i].devfn);
+ host.function = PCI_FUNC(devices[i].devfn);
+
+ trace_vfio_pci_hot_reset_dep_devices(host.domain,
+ host.bus, host.slot, host.function, devices[i].group_id);
+
+ if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
+ continue;
+ }
+
+ QLIST_FOREACH(group, &vfio_group_list, next) {
+ if (group->groupid == devices[i].group_id) {
+ break;
+ }
+ }
+
+ if (!group) {
+ if (!vdev->has_pm_reset) {
+ error_report("vfio: Cannot reset device %s, "
+ "depends on group %d which is not owned.",
+ vdev->vbasedev.name, devices[i].group_id);
+ }
+ ret = -EPERM;
+ goto out;
+ }
+
+ /* Prep dependent devices for reset and clear our marker. */
+ QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+ if (!vbasedev_iter->dev->realized ||
+ vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+ continue;
+ }
+ tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+ if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
+ if (single) {
+ ret = -EINVAL;
+ goto out_single;
+ }
+ vfio_pci_pre_reset(tmp);
+ tmp->vbasedev.needs_reset = false;
+ multi = true;
+ break;
+ }
+ }
+ }
+
+ if (!single && !multi) {
+ ret = -EINVAL;
+ goto out_single;
+ }
+
+ /* Determine how many group fds need to be passed */
+ count = 0;
+ QLIST_FOREACH(group, &vfio_group_list, next) {
+ for (i = 0; i < info->count; i++) {
+ if (group->groupid == devices[i].group_id) {
+ count++;
+ break;
+ }
+ }
+ }
+
+ reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
+ reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
+ fds = &reset->group_fds[0];
+
+ /* Fill in group fds */
+ QLIST_FOREACH(group, &vfio_group_list, next) {
+ for (i = 0; i < info->count; i++) {
+ if (group->groupid == devices[i].group_id) {
+ fds[reset->count++] = group->fd;
+ break;
+ }
+ }
+ }
+
+ /* Bus reset! */
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+ if (ret) {
+ ret = -errno;
+ }
+ g_free(reset);
+
+ trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
+ ret ? strerror(-ret) : "Success");
+
+out:
+ /* Re-enable INTx on affected devices */
+ for (i = 0; i < info->count; i++) {
+ PCIHostDeviceAddress host;
+ VFIOPCIDevice *tmp;
+ VFIODevice *vbasedev_iter;
+
+ host.domain = devices[i].segment;
+ host.bus = devices[i].bus;
+ host.slot = PCI_SLOT(devices[i].devfn);
+ host.function = PCI_FUNC(devices[i].devfn);
+
+ if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
+ continue;
+ }
+
+ QLIST_FOREACH(group, &vfio_group_list, next) {
+ if (group->groupid == devices[i].group_id) {
+ break;
+ }
+ }
+
+ if (!group) {
+ break;
+ }
+
+ QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
+ if (!vbasedev_iter->dev->realized ||
+ vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+ continue;
+ }
+ tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+ if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
+ vfio_pci_post_reset(tmp);
+ break;
+ }
+ }
+ }
+out_single:
+ if (!single) {
+ vfio_pci_post_reset(vdev);
+ }
+ g_free(info);
+
+ return ret;
+}
+
const VFIOIOMMUOps vfio_legacy_ops = {
.dma_map = vfio_legacy_dma_map,
.dma_unmap = vfio_legacy_dma_unmap,
@@ -929,4 +1098,5 @@ const VFIOIOMMUOps vfio_legacy_ops = {
.detach_device = vfio_legacy_detach_device,
.set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
.query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
+ .pci_hot_reset = vfio_legacy_pci_hot_reset,
};
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index eb55e8ae88..d00c3472c7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2374,7 +2374,7 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
return 0;
}
-static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
+void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
{
PCIDevice *pdev = &vdev->pdev;
uint16_t cmd;
@@ -2411,7 +2411,7 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
}
-static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
+void vfio_pci_post_reset(VFIOPCIDevice *vdev)
{
Error *err = NULL;
int nr;
@@ -2435,7 +2435,7 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
vfio_quirk_reset(vdev);
}
-static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
+bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
{
char tmp[13];
@@ -2485,166 +2485,10 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
{
- VFIOGroup *group;
- struct vfio_pci_hot_reset_info *info = NULL;
- struct vfio_pci_dependent_device *devices;
- struct vfio_pci_hot_reset *reset;
- int32_t *fds;
- int ret, i, count;
- bool multi = false;
-
- trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
-
- if (!single) {
- vfio_pci_pre_reset(vdev);
- }
- vdev->vbasedev.needs_reset = false;
-
- ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
-
- if (ret) {
- goto out_single;
- }
- devices = &info->devices[0];
-
- trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
-
- /* Verify that we have all the groups required */
- for (i = 0; i < info->count; i++) {
- PCIHostDeviceAddress host;
- VFIOPCIDevice *tmp;
- VFIODevice *vbasedev_iter;
-
- host.domain = devices[i].segment;
- host.bus = devices[i].bus;
- host.slot = PCI_SLOT(devices[i].devfn);
- host.function = PCI_FUNC(devices[i].devfn);
-
- trace_vfio_pci_hot_reset_dep_devices(host.domain,
- host.bus, host.slot, host.function, devices[i].group_id);
-
- if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
- continue;
- }
-
- QLIST_FOREACH(group, &vfio_group_list, next) {
- if (group->groupid == devices[i].group_id) {
- break;
- }
- }
-
- if (!group) {
- if (!vdev->has_pm_reset) {
- error_report("vfio: Cannot reset device %s, "
- "depends on group %d which is not owned.",
- vdev->vbasedev.name, devices[i].group_id);
- }
- ret = -EPERM;
- goto out;
- }
-
- /* Prep dependent devices for reset and clear our marker. */
- QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
- if (!vbasedev_iter->dev->realized ||
- vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
- continue;
- }
- tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
- if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
- if (single) {
- ret = -EINVAL;
- goto out_single;
- }
- vfio_pci_pre_reset(tmp);
- tmp->vbasedev.needs_reset = false;
- multi = true;
- break;
- }
- }
- }
-
- if (!single && !multi) {
- ret = -EINVAL;
- goto out_single;
- }
-
- /* Determine how many group fds need to be passed */
- count = 0;
- QLIST_FOREACH(group, &vfio_group_list, next) {
- for (i = 0; i < info->count; i++) {
- if (group->groupid == devices[i].group_id) {
- count++;
- break;
- }
- }
- }
-
- reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
- reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
- fds = &reset->group_fds[0];
-
- /* Fill in group fds */
- QLIST_FOREACH(group, &vfio_group_list, next) {
- for (i = 0; i < info->count; i++) {
- if (group->groupid == devices[i].group_id) {
- fds[reset->count++] = group->fd;
- break;
- }
- }
- }
-
- /* Bus reset! */
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
- g_free(reset);
-
- trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
- ret ? strerror(errno) : "Success");
-
-out:
- /* Re-enable INTx on affected devices */
- for (i = 0; i < info->count; i++) {
- PCIHostDeviceAddress host;
- VFIOPCIDevice *tmp;
- VFIODevice *vbasedev_iter;
-
- host.domain = devices[i].segment;
- host.bus = devices[i].bus;
- host.slot = PCI_SLOT(devices[i].devfn);
- host.function = PCI_FUNC(devices[i].devfn);
-
- if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
- continue;
- }
-
- QLIST_FOREACH(group, &vfio_group_list, next) {
- if (group->groupid == devices[i].group_id) {
- break;
- }
- }
-
- if (!group) {
- break;
- }
-
- QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
- if (!vbasedev_iter->dev->realized ||
- vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
- continue;
- }
- tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
- if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
- vfio_pci_post_reset(tmp);
- break;
- }
- }
- }
-out_single:
- if (!single) {
- vfio_pci_post_reset(vdev);
- }
- g_free(info);
+ VFIODevice *vbasedev = &vdev->vbasedev;
+ const VFIOIOMMUOps *ops = vbasedev->bcontainer->ops;
- return ret;
+ return ops->pci_hot_reset(vbasedev, single);
}
/*
--
2.34.1
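After this patch, vfio_pci_hot_reset() no longer knows anything about groups: it forwards the request to the pci_hot_reset hook of whichever VFIOIOMMUOps the device's container uses. The sketch below shows that dispatch pattern in isolation. Device, IOMMUOps and the two toy backends are hypothetical stand-ins, not QEMU types. The NULL-hook assert is an addition of the sketch; the patch itself calls the hook unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-ins for VFIODevice and VFIOIOMMUOps. */
typedef struct Device Device;

typedef struct IOMMUOps {
    const char *name;
    /* Backend-specific PCI hot reset: returns 0 or a negative errno. */
    int (*pci_hot_reset)(Device *dev, bool single);
} IOMMUOps;

struct Device {
    const IOMMUOps *ops; /* plays the role of vbasedev->bcontainer->ops */
    int resets;          /* records which backend performed the reset */
};

/* Toy legacy backend: would collect group fds in real code. */
static int legacy_hot_reset(Device *dev, bool single)
{
    (void)single;
    dev->resets += 1;
    return 0;
}

/* Toy iommufd backend: would rely on device IDs in real code. */
static int iommufd_hot_reset(Device *dev, bool single)
{
    (void)single;
    dev->resets += 10;
    return 0;
}

static const IOMMUOps legacy_ops = { "legacy", legacy_hot_reset };
static const IOMMUOps iommufd_ops = { "iommufd", iommufd_hot_reset };

/* Generic entry point: it only knows about the ops table. */
static int pci_hot_reset(Device *dev, bool single)
{
    assert(dev->ops != NULL && dev->ops->pci_hot_reset != NULL);
    return dev->ops->pci_hot_reset(dev, single);
}
```

Keeping the group walk (legacy) and the device-ID walk (iommufd) behind one hook lets each backend keep its own ownership model without #ifdefs in pci.c.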
* [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (7 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-17 13:53 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 10/21] vfio/pci: Allow the selection of a given iommu backend Zhenzhong Duan
` (13 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
Add a new callback, iommufd_cdev_pci_hot_reset(), to perform the
iommufd-specific ownership check and the PCI hot reset operation.
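The two backends differ mainly in how ownership of the affected devices is proven to the kernel. The following is a sketch under stated assumptions. struct hot_reset mirrors the layout of struct vfio_pci_hot_reset (argsz, flags, count, group_fds[]). DEVID_OWNED and DEVID_NOT_OWNED are assumed to match the uapi values VFIO_PCI_DEVID_OWNED (0) and VFIO_PCI_DEVID_NOT_OWNED (-1). classify_dep() is a hypothetical helper that condenses the per-device decision taken before iommufd_cdev_pci_hot_reset() looks up the matching QEMU device. DEP_BLOCKED stands for the -EPERM path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as struct vfio_pci_hot_reset in <linux/vfio.h>. */
struct hot_reset {
    uint32_t argsz;
    uint32_t flags;
    uint32_t count;
    int32_t group_fds[];
};

/* Assumed to mirror VFIO_PCI_DEVID_OWNED / VFIO_PCI_DEVID_NOT_OWNED. */
#define DEVID_OWNED     ((uint32_t)0)
#define DEVID_NOT_OWNED ((uint32_t)-1)

/* Legacy backend: one group fd per affected group follows the header. */
static uint32_t legacy_reset_argsz(uint32_t ngroups)
{
    return (uint32_t)(sizeof(struct hot_reset) + ngroups * sizeof(int32_t));
}

/* iommufd backend: zero-length fd array, ownership comes from the iommufd. */
static uint32_t iommufd_reset_argsz(void)
{
    return (uint32_t)sizeof(struct hot_reset);
}

enum dep_action {
    DEP_SKIP,    /* ourselves, or owned but not bound to this iommufd */
    DEP_PREP,    /* another bound device: pre-reset it (multi reset) */
    DEP_BLOCKED, /* not owned by us: the reset must fail with -EPERM */
};

static enum dep_action classify_dep(uint32_t devid, uint32_t self_devid)
{
    if (devid == DEVID_NOT_OWNED) {
        return DEP_BLOCKED;
    }
    if (devid == self_devid || devid == DEVID_OWNED) {
        return DEP_SKIP;
    }
    return DEP_PREP;
}
```

With two affected groups the legacy argument is 12 + 2 * 4 = 20 bytes, while the iommufd argument is always the 12-byte header.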
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v6: pci_hot_reset returns -errno on failure
hw/vfio/iommufd.c | 145 +++++++++++++++++++++++++++++++++++++++++++
hw/vfio/trace-events | 1 +
2 files changed, 146 insertions(+)
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index e5bf528e89..3eec428162 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -24,6 +24,7 @@
#include "sysemu/reset.h"
#include "qemu/cutils.h"
#include "qemu/chardev_open.h"
+#include "pci.h"
static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly)
@@ -473,9 +474,153 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
close(vbasedev->fd);
}
+static VFIODevice *iommufd_cdev_pci_find_by_devid(__u32 devid)
+{
+ VFIODevice *vbasedev_iter;
+
+ QLIST_FOREACH(vbasedev_iter, &vfio_device_list, global_next) {
+ if (vbasedev_iter->bcontainer->ops != &vfio_iommufd_ops) {
+ continue;
+ }
+ if (devid == vbasedev_iter->devid) {
+ return vbasedev_iter;
+ }
+ }
+ return NULL;
+}
+
+static int iommufd_cdev_pci_hot_reset(VFIODevice *vbasedev, bool single)
+{
+ VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+ struct vfio_pci_hot_reset_info *info = NULL;
+ struct vfio_pci_dependent_device *devices;
+ struct vfio_pci_hot_reset *reset;
+ int ret, i;
+ bool multi = false;
+
+ trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
+
+ if (!single) {
+ vfio_pci_pre_reset(vdev);
+ }
+ vdev->vbasedev.needs_reset = false;
+
+ ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
+
+ if (ret) {
+ goto out_single;
+ }
+
+ assert(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID);
+
+ devices = &info->devices[0];
+
+ if (!(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED)) {
+ if (!vdev->has_pm_reset) {
+ for (i = 0; i < info->count; i++) {
+ if (devices[i].devid == VFIO_PCI_DEVID_NOT_OWNED) {
+ error_report("vfio: Cannot reset device %s, "
+ "depends on device %04x:%02x:%02x.%x "
+ "which is not owned.",
+ vdev->vbasedev.name, devices[i].segment,
+ devices[i].bus, PCI_SLOT(devices[i].devfn),
+ PCI_FUNC(devices[i].devfn));
+ }
+ }
+ }
+ ret = -EPERM;
+ goto out_single;
+ }
+
+ trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
+
+ for (i = 0; i < info->count; i++) {
+ VFIOPCIDevice *tmp;
+ VFIODevice *vbasedev_iter;
+
+ trace_iommufd_cdev_pci_hot_reset_dep_devices(devices[i].segment,
+ devices[i].bus,
+ PCI_SLOT(devices[i].devfn),
+ PCI_FUNC(devices[i].devfn),
+ devices[i].devid);
+
+ /*
+ * If a VFIO cdev device is resettable, all the dependent devices
+ * are either bound to same iommufd or within same iommu_groups as
+ * one of the iommufd bound devices.
+ */
+ assert(devices[i].devid != VFIO_PCI_DEVID_NOT_OWNED);
+
+ if (devices[i].devid == vdev->vbasedev.devid ||
+ devices[i].devid == VFIO_PCI_DEVID_OWNED) {
+ continue;
+ }
+
+ vbasedev_iter = iommufd_cdev_pci_find_by_devid(devices[i].devid);
+ if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
+ vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+ continue;
+ }
+ tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+ if (single) {
+ ret = -EINVAL;
+ goto out_single;
+ }
+ vfio_pci_pre_reset(tmp);
+ tmp->vbasedev.needs_reset = false;
+ multi = true;
+ }
+
+ if (!single && !multi) {
+ ret = -EINVAL;
+ goto out_single;
+ }
+
+ /* Use zero length array for hot reset with iommufd backend */
+ reset = g_malloc0(sizeof(*reset));
+ reset->argsz = sizeof(*reset);
+
+ /* Bus reset! */
+ ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
+ if (ret) {
+ ret = -errno;
+ }
+ g_free(reset);
+
+ trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
+ ret ? strerror(-ret) : "Success");
+
+ /* Re-enable INTx on affected devices */
+ for (i = 0; i < info->count; i++) {
+ VFIOPCIDevice *tmp;
+ VFIODevice *vbasedev_iter;
+
+ if (devices[i].devid == vdev->vbasedev.devid ||
+ devices[i].devid == VFIO_PCI_DEVID_OWNED) {
+ continue;
+ }
+
+ vbasedev_iter = iommufd_cdev_pci_find_by_devid(devices[i].devid);
+ if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
+ vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
+ continue;
+ }
+ tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
+ vfio_pci_post_reset(tmp);
+ }
+out_single:
+ if (!single) {
+ vfio_pci_post_reset(vdev);
+ }
+ g_free(info);
+
+ return ret;
+}
+
const VFIOIOMMUOps vfio_iommufd_ops = {
.dma_map = iommufd_cdev_map,
.dma_unmap = iommufd_cdev_unmap,
.attach_device = iommufd_cdev_attach,
.detach_device = iommufd_cdev_detach,
+ .pci_hot_reset = iommufd_cdev_pci_hot_reset,
};
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 5d3e9e8cee..d838232d5a 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -174,3 +174,4 @@ iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name, const char *str, in
iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
+iommufd_cdev_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int dev_id) "\t%04x:%02x:%02x.%x devid %d"
--
2.34.1
* [PATCH v6 10/21] vfio/pci: Allow the selection of a given iommu backend
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (8 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 13:57 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
` (12 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
From: Eric Auger <eric.auger@redhat.com>
Now that we support two types of IOMMU backends, let's add the capability
to select one of them. The choice depends on whether an iommufd object has
been linked with the vfio-pci device:
If the user wants to use the legacy backend, it shall not
link the vfio-pci device with any iommufd object:
-device vfio-pci,host=0000:02:00.0
This is called the legacy mode/backend.
If the user wants to use the iommufd backend (/dev/iommu) it
shall pass an iommufd object id in the vfio-pci device options:
-object iommufd,id=iommufd0
-device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
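In other words, the presence of the iommufd link is the only switch. A minimal sketch of that rule follows, using hypothetical stand-in types rather than QEMU's. In QEMU the decision is presumably taken when the device is attached to a container, which is not part of this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for IOMMUFDBackend and the two ops tables. */
typedef struct IOMMUFDBackend { int fd; } IOMMUFDBackend;
typedef struct BackendOps { const char *name; } BackendOps;

static const BackendOps legacy_backend = { "legacy" };
static const BackendOps iommufd_backend = { "iommufd" };

typedef struct Device {
    IOMMUFDBackend *iommufd; /* set only when iommufd=<id> is given */
} Device;

/* No link: legacy container/group path. Link: /dev/iommu path. */
static const BackendOps *select_backend(const Device *dev)
{
    assert(dev != NULL);
    return dev->iommufd ? &iommufd_backend : &legacy_backend;
}
```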
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Eric Auger <eric.auger@redhat.com>
Signed-off-by: Yi Liu <yi.l.liu@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/pci.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index d00c3472c7..c5984b0598 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -19,6 +19,7 @@
*/
#include "qemu/osdep.h"
+#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
#include <linux/vfio.h>
#include <sys/ioctl.h>
@@ -42,6 +43,7 @@
#include "qapi/error.h"
#include "migration/blocker.h"
#include "migration/qemu-file.h"
+#include "sysemu/iommufd.h"
#define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
@@ -3386,6 +3388,10 @@ static Property vfio_pci_dev_properties[] = {
* DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
* DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
*/
+#ifdef CONFIG_IOMMUFD
+ DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
DEFINE_PROP_END_OF_LIST(),
};
--
2.34.1
* [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (9 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 10/21] vfio/pci: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:08 ` Cédric Le Goater
2023-11-15 12:09 ` Philippe Mathieu-Daudé
2023-11-14 10:09 ` [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend Zhenzhong Duan
` (11 subsequent siblings)
22 siblings, 2 replies; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
This gives management tools like libvirt a chance to open the VFIO
cdev with privilege and pass the FD to QEMU. This way QEMU never needs
the privilege to open a VFIO or iommu cdev node.
Together with the earlier support for pre-opening the /dev/iommu device,
we now fully support passing a VFIO device to an unprivileged QEMU
from a management tool. This mode is no longer being considered for the
legacy backend, so remove the "TODO" comment.
Add helper functions vfio_device_set_fd() and vfio_device_get_name()
to set the fd and get the device name; they will also be used by other
VFIO devices.
There is no easy way to check whether a device is an mdev when its FD
is passed, so fail the x-balloon-allowed check unconditionally in this case.
There is also no easy way to get the BDF to use as the name when the FD
is passed, so we fake a name of the form VFIO_FD<fd>.
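The naming rule implemented by vfio_device_get_name() can be illustrated on its own. This is a sketch, not the QEMU helper. derive_name() is hypothetical: it skips the stat() of the sysfs path and the case where the user already supplied a name, and it takes the last path component with strrchr() instead of g_path_get_basename() (the two differ for paths with a trailing slash).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a QEMU API): fill @buf with the name a VFIO
 * device would get. Returns 0 on success or a negative errno value.
 */
static int derive_name(int fd, const char *sysfsdev, bool has_iommufd,
                       char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (fd < 0) {
        const char *slash;

        /* Path-based device: the name is the last path component. */
        if (sysfsdev == NULL) {
            return -ENOENT;
        }
        slash = strrchr(sysfsdev, '/');
        snprintf(buf, len, "%s", slash ? slash + 1 : sysfsdev);
        return 0;
    }

    /* A passed-in fd is only accepted together with an iommufd backend. */
    if (!has_iommufd) {
        return -EINVAL;
    }

    /* No BDF is known for a passed fd, so build a name from the fd number. */
    snprintf(buf, len, "VFIO_FD%d", fd);
    return 0;
}
```

For example, a sysfs path ending in 0000:02:00.0 yields that BDF as the name, while fd 5 with an iommufd backend yields VFIO_FD5.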
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v6: simplify CONFIG_IOMMUFD checking code
introduce a helper vfio_device_set_fd
include/hw/vfio/vfio-common.h | 3 +++
hw/vfio/helpers.c | 44 +++++++++++++++++++++++++++++++++++
hw/vfio/iommufd.c | 12 ++++++----
hw/vfio/pci.c | 28 ++++++++++++----------
4 files changed, 71 insertions(+), 16 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3dac5c167e..567e5f7bea 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -251,4 +251,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
hwaddr size);
int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
uint64_t size, ram_addr_t ram_addr);
+
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
+void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
#endif /* HW_VFIO_VFIO_COMMON_H */
diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
index 168847e7c5..986ef1095a 100644
--- a/hw/vfio/helpers.c
+++ b/hw/vfio/helpers.c
@@ -20,6 +20,7 @@
*/
#include "qemu/osdep.h"
+#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
#include <sys/ioctl.h>
#include "hw/vfio/vfio-common.h"
@@ -27,6 +28,7 @@
#include "trace.h"
#include "qapi/error.h"
#include "qemu/error-report.h"
+#include "monitor/monitor.h"
/*
* Common VFIO interrupt disable
@@ -609,3 +611,45 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
return ret;
}
+
+int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
+{
+ struct stat st;
+
+ if (vbasedev->fd < 0) {
+ if (stat(vbasedev->sysfsdev, &st) < 0) {
+ error_setg_errno(errp, errno, "no such host device");
+ error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
+ return -errno;
+ }
+ /* User may specify a name, e.g. a VFIO platform device */
+ if (!vbasedev->name) {
+ vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+ }
+ } else {
+ if (!vbasedev->iommufd) {
+ error_setg(errp, "Use FD passing only with iommufd backend");
+ return -EINVAL;
+ }
+ /*
+ * Give a name with fd so any function printing out vbasedev->name
+ * will not break.
+ */
+ if (!vbasedev->name) {
+ vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
+ }
+ }
+
+ return 0;
+}
+
+void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
+{
+ int fd = monitor_fd_param(monitor_cur(), str, errp);
+
+ if (fd < 0) {
+ error_prepend(errp, "Could not parse remote object fd %s: ", str);
+ return;
+ }
+ vbasedev->fd = fd;
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 3eec428162..e08a217057 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -326,11 +326,15 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
uint32_t ioas_id;
Error *err = NULL;
- devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
- if (devfd < 0) {
- return devfd;
+ if (vbasedev->fd < 0) {
+ devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
+ if (devfd < 0) {
+ return devfd;
+ }
+ vbasedev->fd = devfd;
+ } else {
+ devfd = vbasedev->fd;
}
- vbasedev->fd = devfd;
ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
if (ret) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index c5984b0598..b23b492cce 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2944,17 +2944,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
VFIODevice *vbasedev = &vdev->vbasedev;
char *tmp, *subsys;
Error *err = NULL;
- struct stat st;
int i, ret;
bool is_mdev;
char uuid[UUID_STR_LEN];
char *name;
- if (!vbasedev->sysfsdev) {
+ if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
if (!(~vdev->host.domain || ~vdev->host.bus ||
~vdev->host.slot || ~vdev->host.function)) {
error_setg(errp, "No provided host device");
error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F "
+#ifdef CONFIG_IOMMUFD
+ "or -device vfio-pci,fd=DEVICE_FD "
+#endif
"or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
return;
}
@@ -2964,13 +2966,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
vdev->host.slot, vdev->host.function);
}
- if (stat(vbasedev->sysfsdev, &st) < 0) {
- error_setg_errno(errp, errno, "no such host device");
- error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
+ if (vfio_device_get_name(vbasedev, errp)) {
return;
}
-
- vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
vbasedev->ops = &vfio_pci_ops;
vbasedev->type = VFIO_DEVICE_TYPE_PCI;
vbasedev->dev = DEVICE(vdev);
@@ -3330,6 +3328,7 @@ static void vfio_instance_init(Object *obj)
vdev->host.bus = ~0U;
vdev->host.slot = ~0U;
vdev->host.function = ~0U;
+ vdev->vbasedev.fd = -1;
vdev->nv_gpudirect_clique = 0xFF;
@@ -3383,11 +3382,6 @@ static Property vfio_pci_dev_properties[] = {
qdev_prop_nv_gpudirect_clique, uint8_t),
DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
OFF_AUTOPCIBAR_OFF),
- /*
- * TODO - support passed fds... is this necessary?
- * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
- * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
- */
#ifdef CONFIG_IOMMUFD
DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
@@ -3395,6 +3389,13 @@ static Property vfio_pci_dev_properties[] = {
DEFINE_PROP_END_OF_LIST(),
};
+#ifdef CONFIG_IOMMUFD
+static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
+{
+ vfio_device_set_fd(&VFIO_PCI(obj)->vbasedev, str, errp);
+}
+#endif
+
static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
@@ -3402,6 +3403,9 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
dc->reset = vfio_pci_reset;
device_class_set_props(dc, vfio_pci_dev_properties);
+#ifdef CONFIG_IOMMUFD
+ object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
+#endif
dc->desc = "VFIO-based PCI device assignment";
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
pdc->realize = vfio_realize;
--
2.34.1
* [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (10 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:03 ` Cédric Le Goater
2023-11-17 14:55 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 13/21] vfio/platform: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
` (10 subsequent siblings)
22 siblings, 2 replies; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
Now that we support two types of IOMMU backends, let's add the capability
to select one of them. The choice depends on whether an iommufd object has
been linked with the vfio-platform device:
If the user wants to use the legacy backend, it shall not
link the vfio-platform device with any iommufd object:
-device vfio-platform,host=XXX
This is called the legacy mode/backend.
If the user wants to use the iommufd backend (/dev/iommu) it
shall pass an iommufd object id in the vfio-platform device options:
-object iommufd,id=iommufd0
-device vfio-platform,host=XXX,iommufd=iommufd0
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
v6: Move #include "sysemu/iommufd.h" in platform.c
hw/vfio/platform.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 8e3d4ac458..98ae4bc655 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -15,11 +15,13 @@
*/
#include "qemu/osdep.h"
+#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
#include "qapi/error.h"
#include <sys/ioctl.h>
#include <linux/vfio.h>
#include "hw/vfio/vfio-platform.h"
+#include "sysemu/iommufd.h"
#include "migration/vmstate.h"
#include "qemu/error-report.h"
#include "qemu/lockable.h"
@@ -649,6 +651,10 @@ static Property vfio_platform_dev_properties[] = {
DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
mmap_timeout, 1100),
DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
+#ifdef CONFIG_IOMMUFD
+ DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
DEFINE_PROP_END_OF_LIST(),
};
--
2.34.1
* [PATCH v6 13/21] vfio/platform: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (11 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:22 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 14/21] vfio/ap: Allow the selection of a given iommu backend Zhenzhong Duan
` (9 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
This gives management tools like libvirt a chance to open the VFIO
cdev with privilege and pass the FD to QEMU. This way QEMU never needs
the privilege to open a VFIO or iommu cdev node.
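vfio_base_device_init() resolves the device source with the precedence given in its new comment: @fd first, then @sysfsdev, then @host, where a host name must be set and must not contain '/'. Initialising vbasedev.fd to -1 in vfio_platform_instance_init() is what lets a valid fd 0 be told apart from "not set". A hypothetical sketch of just that decision (pick_source() and enum dev_source are illustrative names, not QEMU code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum dev_source { SRC_FD, SRC_SYSFSDEV, SRC_HOST, SRC_INVALID };

/* Mirrors the branch order in vfio_base_device_init() (sketch only). */
static enum dev_source pick_source(int fd, const char *sysfsdev,
                                   const char *host)
{
    if (fd >= 0) {
        return SRC_FD;       /* a passed fd wins over everything else */
    }
    if (sysfsdev != NULL) {
        return SRC_SYSFSDEV; /* name derived from the sysfs path */
    }
    if (host != NULL && strchr(host, '/') == NULL) {
        return SRC_HOST;     /* sysfs path built from the host name */
    }
    return SRC_INVALID;      /* "wrong host device name" */
}
```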
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/vfio/platform.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index 98ae4bc655..a97d9c6234 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -531,14 +531,13 @@ static VFIODeviceOps vfio_platform_ops = {
*/
static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
{
- struct stat st;
int ret;
- /* @sysfsdev takes precedence over @host */
- if (vbasedev->sysfsdev) {
+ /* @fd takes precedence over @sysfsdev which takes precedence over @host */
+ if (vbasedev->fd < 0 && vbasedev->sysfsdev) {
g_free(vbasedev->name);
vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
- } else {
+ } else if (vbasedev->fd < 0) {
if (!vbasedev->name || strchr(vbasedev->name, '/')) {
error_setg(errp, "wrong host device name");
return -EINVAL;
@@ -548,10 +547,9 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
vbasedev->name);
}
- if (stat(vbasedev->sysfsdev, &st) < 0) {
- error_setg_errno(errp, errno,
- "failed to get the sysfs host device file status");
- return -errno;
+ ret = vfio_device_get_name(vbasedev, errp);
+ if (ret) {
+ return ret;
}
ret = vfio_attach_device(vbasedev->name, vbasedev,
@@ -658,6 +656,20 @@ static Property vfio_platform_dev_properties[] = {
DEFINE_PROP_END_OF_LIST(),
};
+static void vfio_platform_instance_init(Object *obj)
+{
+ VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
+
+ vdev->vbasedev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_platform_set_fd(Object *obj, const char *str, Error **errp)
+{
+ vfio_device_set_fd(&VFIO_PLATFORM_DEVICE(obj)->vbasedev, str, errp);
+}
+#endif
+
static void vfio_platform_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
@@ -665,6 +677,9 @@ static void vfio_platform_class_init(ObjectClass *klass, void *data)
dc->realize = vfio_platform_realize;
device_class_set_props(dc, vfio_platform_dev_properties);
+#ifdef CONFIG_IOMMUFD
+ object_class_property_add_str(klass, "fd", NULL, vfio_platform_set_fd);
+#endif
dc->vmsd = &vfio_platform_vmstate;
dc->desc = "VFIO-based platform device assignment";
sbc->connect_irq_notifier = vfio_start_irqfd_injection;
@@ -677,6 +692,7 @@ static const TypeInfo vfio_platform_dev_info = {
.name = TYPE_VFIO_PLATFORM,
.parent = TYPE_SYS_BUS_DEVICE,
.instance_size = sizeof(VFIOPlatformDevice),
+ .instance_init = vfio_platform_instance_init,
.class_init = vfio_platform_class_init,
.class_size = sizeof(VFIOPlatformDeviceClass),
};
--
2.34.1
* [PATCH v6 14/21] vfio/ap: Allow the selection of a given iommu backend
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (12 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 13/21] vfio/platform: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:03 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 15/21] vfio/ap: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
` (8 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Matthew Rosato, Tony Krowiak, Halil Pasic,
Jason Herne, Thomas Huth, open list:vfio-ap
Now that we support two types of IOMMU backends, let's add the capability
to select one of them. The choice depends on whether an iommufd object has
been linked with the vfio-ap device:
If the user wants to use the legacy backend, it shall not
link the vfio-ap device with any iommufd object:
-device vfio-ap,sysfsdev=/sys/bus/mdev/devices/XXX
This is called the legacy mode/backend.
If the user wants to use the iommufd backend (/dev/iommu) it
shall pass an iommufd object id in the vfio-ap device options:
-object iommufd,id=iommufd0
-device vfio-ap,sysfsdev=/sys/bus/mdev/devices/XXX,iommufd=iommufd0
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
hw/vfio/ap.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index bbf69ff55a..80629609ae 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -11,10 +11,12 @@
*/
#include "qemu/osdep.h"
+#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
#include <linux/vfio.h>
#include <sys/ioctl.h>
#include "qapi/error.h"
#include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
#include "hw/s390x/ap-device.h"
#include "qemu/error-report.h"
#include "qemu/event_notifier.h"
@@ -204,6 +206,10 @@ static void vfio_ap_unrealize(DeviceState *dev)
static Property vfio_ap_properties[] = {
DEFINE_PROP_STRING("sysfsdev", VFIOAPDevice, vdev.sysfsdev),
+#ifdef CONFIG_IOMMUFD
+ DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
DEFINE_PROP_END_OF_LIST(),
};
--
2.34.1
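For illustration, the selection rule described in this patch can be modeled as below. This is a hedged sketch, not QEMU code: every type and function here is a hypothetical, heavily simplified stand-in. It assumes the generic attach path only checks whether the "iommufd" link property was set and otherwise falls back to the legacy container. In QEMU the iommufd branch, like the property itself, is only built when CONFIG_IOMMUFD is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the QEMU objects involved. */
typedef struct IOMMUFDBackend {
    int fd;                    /* /dev/iommu descriptor */
} IOMMUFDBackend;

typedef struct VFIODevice {
    const char *sysfsdev;
    IOMMUFDBackend *iommufd;   /* non-NULL only when "iommufd=<id>" is given */
} VFIODevice;

typedef struct VFIOIOMMUOps {
    int (*attach_device)(VFIODevice *vbasedev);
} VFIOIOMMUOps;

static int legacy_attach(VFIODevice *vbasedev)
{
    printf("legacy container attach for %s\n", vbasedev->sysfsdev);
    return 0;
}

static int iommufd_attach(VFIODevice *vbasedev)
{
    assert(vbasedev->iommufd != NULL);
    printf("iommufd attach for %s (iommufd fd %d)\n",
           vbasedev->sysfsdev, vbasedev->iommufd->fd);
    return 0;
}

static const VFIOIOMMUOps legacy_ops = { .attach_device = legacy_attach };
static const VFIOIOMMUOps iommufd_ops = { .attach_device = iommufd_attach };

/* The backend is chosen purely by whether the link property was set. */
static const VFIOIOMMUOps *select_backend(const VFIODevice *vbasedev)
{
    return vbasedev->iommufd ? &iommufd_ops : &legacy_ops;
}
```

With "-device vfio-ap,sysfsdev=..." alone, iommufd stays NULL and the legacy ops are used; adding "iommufd=iommufd0" fills in the link and switches to the iommufd ops.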
* [PATCH v6 15/21] vfio/ap: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (13 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 14/21] vfio/ap: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend Zhenzhong Duan
` (7 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Matthew Rosato, Tony Krowiak, Halil Pasic,
Jason Herne, Thomas Huth, open list:vfio-ap
This gives management tools like libvirt a chance to open the VFIO
cdev with privileges and pass the FD to QEMU. This way, QEMU never
needs the privilege to open a VFIO or iommu cdev node.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
hw/vfio/ap.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 80629609ae..b21f92291e 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -160,7 +160,10 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
VFIOAPDevice *vapdev = VFIO_AP_DEVICE(dev);
VFIODevice *vbasedev = &vapdev->vdev;
- vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
+ if (vfio_device_get_name(vbasedev, errp)) {
+ return;
+ }
+
vbasedev->ops = &vfio_ap_ops;
vbasedev->type = VFIO_DEVICE_TYPE_AP;
vbasedev->dev = dev;
@@ -230,11 +233,28 @@ static const VMStateDescription vfio_ap_vmstate = {
.unmigratable = 1,
};
+static void vfio_ap_instance_init(Object *obj)
+{
+ VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
+
+ vapdev->vdev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_ap_set_fd(Object *obj, const char *str, Error **errp)
+{
+ vfio_device_set_fd(&VFIO_AP_DEVICE(obj)->vdev, str, errp);
+}
+#endif
+
static void vfio_ap_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
device_class_set_props(dc, vfio_ap_properties);
+#ifdef CONFIG_IOMMUFD
+ object_class_property_add_str(klass, "fd", NULL, vfio_ap_set_fd);
+#endif
dc->vmsd = &vfio_ap_vmstate;
dc->desc = "VFIO-based AP device assignment";
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
@@ -249,6 +269,7 @@ static const TypeInfo vfio_ap_info = {
.name = TYPE_VFIO_AP_DEVICE,
.parent = TYPE_AP_DEVICE,
.instance_size = sizeof(VFIOAPDevice),
+ .instance_init = vfio_ap_instance_init,
.class_init = vfio_ap_class_init,
};
--
2.34.1
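As a hedged illustration of the fd-passing model (not libvirt's or QEMU's actual code), the sketch below shows the generic POSIX mechanism a privileged manager can use to hand an already-open descriptor to an unprivileged process: SCM_RIGHTS ancillary data over a UNIX-domain socket. QEMU's QMP getfd command receives descriptors this way. On the QEMU side, the new "fd" property string is handed to vfio_device_set_fd(), and vfio_ap_instance_init() initializes vdev.fd to -1 so that "no fd passed" can be told apart from a real descriptor. The helper names send_fd() and recv_fd() are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one file descriptor over a connected UNIX-domain socket. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                 /* at least one byte of real data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;        /* forces correct alignment */
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg;

    memset(ctrl.buf, 0, sizeof(ctrl.buf));
    cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new descriptor or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg = {
        .msg_iov = &iov,
        .msg_iovlen = 1,
        .msg_control = ctrl.buf,
        .msg_controllen = sizeof(ctrl.buf),
    };
    struct cmsghdr *cmsg;
    int fd = -1;

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    }
    return fd;
}
```

A pipe can stand in for the VFIO cdev when trying this out: the received descriptor gets a new number in the receiving process but refers to the same open file, so no path has to be opened by the receiver.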
* [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (14 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 15/21] vfio/ap: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
2023-11-15 18:45 ` Eric Farman
2023-11-14 10:09 ` [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
` (6 subsequent siblings)
22 siblings, 2 replies; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Matthew Rosato, Eric Farman, Thomas Huth,
open list:vfio-ccw
Now that we support two types of iommu backends, add the capability
to select one of them. The choice depends on whether an iommufd object
has been linked with the vfio-ccw device.
If the user wants to use the legacy backend, they shall not
link the vfio-ccw device with any iommufd object:
-device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/XXX
This is called the legacy mode/backend.
If the user wants to use the iommufd backend (/dev/iommu), they
shall pass an iommufd object id in the vfio-ccw device options:
-object iommufd,id=iommufd0
-device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/XXX,iommufd=iommufd0
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
hw/vfio/ccw.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index d857bb8d0f..d2d58bb677 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -15,12 +15,14 @@
*/
#include "qemu/osdep.h"
+#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
#include <linux/vfio.h>
#include <linux/vfio_ccw.h>
#include <sys/ioctl.h>
#include "qapi/error.h"
#include "hw/vfio/vfio-common.h"
+#include "sysemu/iommufd.h"
#include "hw/s390x/s390-ccw.h"
#include "hw/s390x/vfio-ccw.h"
#include "hw/qdev-properties.h"
@@ -677,6 +679,10 @@ static void vfio_ccw_unrealize(DeviceState *dev)
static Property vfio_ccw_properties[] = {
DEFINE_PROP_STRING("sysfsdev", VFIOCCWDevice, vdev.sysfsdev),
DEFINE_PROP_BOOL("force-orb-pfch", VFIOCCWDevice, force_orb_pfch, false),
+#ifdef CONFIG_IOMMUFD
+ DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd,
+ TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
+#endif
DEFINE_PROP_END_OF_LIST(),
};
--
2.34.1
* [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (15 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
2023-11-15 18:46 ` Eric Farman
2023-11-14 10:09 ` [PATCH v6 18/21] vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
` (5 subsequent siblings)
22 siblings, 2 replies; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Matthew Rosato, Eric Farman, Thomas Huth,
open list:vfio-ccw
This gives management tools like libvirt a chance to open the VFIO
cdev with privileges and pass the FD to QEMU. This way, QEMU never
needs the privilege to open a VFIO or iommu cdev node.
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
hw/vfio/ccw.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index d2d58bb677..b116b10fe7 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -590,11 +590,12 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
}
}
+ if (vfio_device_get_name(vbasedev, errp)) {
+ return;
+ }
+
vbasedev->ops = &vfio_ccw_ops;
vbasedev->type = VFIO_DEVICE_TYPE_CCW;
- vbasedev->name = g_strdup_printf("%x.%x.%04x", vcdev->cdev.hostid.cssid,
- vcdev->cdev.hostid.ssid,
- vcdev->cdev.hostid.devid);
vbasedev->dev = dev;
/*
@@ -691,12 +692,29 @@ static const VMStateDescription vfio_ccw_vmstate = {
.unmigratable = 1,
};
+static void vfio_ccw_instance_init(Object *obj)
+{
+ VFIOCCWDevice *vcdev = VFIO_CCW(obj);
+
+ vcdev->vdev.fd = -1;
+}
+
+#ifdef CONFIG_IOMMUFD
+static void vfio_ccw_set_fd(Object *obj, const char *str, Error **errp)
+{
+ vfio_device_set_fd(&VFIO_CCW(obj)->vdev, str, errp);
+}
+#endif
+
static void vfio_ccw_class_init(ObjectClass *klass, void *data)
{
DeviceClass *dc = DEVICE_CLASS(klass);
S390CCWDeviceClass *cdc = S390_CCW_DEVICE_CLASS(klass);
device_class_set_props(dc, vfio_ccw_properties);
+#ifdef CONFIG_IOMMUFD
+ object_class_property_add_str(klass, "fd", NULL, vfio_ccw_set_fd);
+#endif
dc->vmsd = &vfio_ccw_vmstate;
dc->desc = "VFIO-based subchannel assignment";
set_bit(DEVICE_CATEGORY_MISC, dc->categories);
@@ -714,6 +732,7 @@ static const TypeInfo vfio_ccw_info = {
.name = TYPE_VFIO_CCW,
.parent = TYPE_S390_CCW,
.instance_size = sizeof(VFIOCCWDevice),
+ .instance_init = vfio_ccw_instance_init,
.class_init = vfio_ccw_class_init,
};
--
2.34.1
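This patch replaces the open-coded g_strdup_printf("%x.%x.%04x", ...) name with a call to vfio_device_get_name(), whose body is not part of this excerpt. The sketch below is a hypothetical, simplified model of what such a helper has to decide, written without GLib. Assumptions: when no fd was passed, the name is the last component of sysfsdev; a passed fd is only accepted together with an iommufd backend; and in that case a placeholder name is synthesized because there is no sysfs path to derive one from. The "VFIO_FD<n>" format is an assumption, and a real implementation would likely also check that the sysfs path exists.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical model of deriving a VFIO device name.
 * Returns 0 on success or a negative errno value.
 */
static int sketch_get_name(int fd, const char *sysfsdev, bool has_iommufd,
                           char *name, size_t len)
{
    assert(name != NULL && len > 0);

    if (fd < 0) {
        /* No pre-opened fd: use the last path component of sysfsdev. */
        const char *slash;

        if (!sysfsdev) {
            return -EINVAL;
        }
        slash = strrchr(sysfsdev, '/');
        snprintf(name, len, "%s", slash ? slash + 1 : sysfsdev);
        return 0;
    }

    /* A passed-in fd only makes sense together with an iommufd backend. */
    if (!has_iommufd) {
        return -EINVAL;
    }
    /* No sysfs path to derive from, so synthesize a placeholder name. */
    snprintf(name, len, "VFIO_FD%d", fd);
    return 0;
}
```

Keeping this logic in one shared helper is what lets the ap, ccw and platform realize functions drop their own name construction.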
* [PATCH v6 18/21] vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps callbacks
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (16 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-14 14:05 ` Cédric Le Goater
2023-11-17 14:58 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 19/21] hw/arm: Activate IOMMUFD for virt machines Zhenzhong Duan
` (4 subsequent siblings)
22 siblings, 2 replies; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan
Some of the callbacks in VFIOIOMMUOps take a VFIOContainerBase pointer,
but they only need read access to the sub-objects of VFIOContainerBase.
So make the VFIOContainerBase, VFIOContainer and VFIOIOMMUFDContainer
pointers const in these callbacks.
Local functions called by those callbacks also need the same change to
avoid build errors.
Suggested-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
include/hw/vfio/vfio-common.h | 12 ++++++----
include/hw/vfio/vfio-container-base.h | 12 ++++++----
hw/vfio/common.c | 9 +++----
hw/vfio/container-base.c | 2 +-
hw/vfio/container.c | 34 ++++++++++++++-------------
hw/vfio/iommufd.c | 8 +++----
6 files changed, 42 insertions(+), 35 deletions(-)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 567e5f7bea..7954531d05 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -244,13 +244,15 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
void vfio_migration_exit(VFIODevice *vbasedev);
int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
-bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer);
-bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer);
-int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+bool
+vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
+bool
+vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
+int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap, hwaddr iova,
hwaddr size);
-int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
- uint64_t size, ram_addr_t ram_addr);
+int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
+ uint64_t size, ram_addr_t ram_addr);
int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 45bb19c767..2ae297ccda 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -82,7 +82,7 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
MemoryRegionSection *section);
int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
bool start);
-int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap,
hwaddr iova, hwaddr size);
@@ -93,18 +93,20 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer);
struct VFIOIOMMUOps {
/* basic feature */
- int (*dma_map)(VFIOContainerBase *bcontainer,
+ int (*dma_map)(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
void *vaddr, bool readonly);
- int (*dma_unmap)(VFIOContainerBase *bcontainer,
+ int (*dma_unmap)(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
IOMMUTLBEntry *iotlb);
int (*attach_device)(const char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void (*detach_device)(VFIODevice *vbasedev);
/* migration feature */
- int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
- int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
+ int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
+ bool start);
+ int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
+ VFIOBitmap *vbmap,
hwaddr iova, hwaddr size);
/* PCI specific */
int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6569732b7a..08a3e57672 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -204,7 +204,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
return true;
}
-bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
+bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
{
VFIODevice *vbasedev;
@@ -221,7 +221,8 @@ bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
* Check if all VFIO devices are running and migration is active, which is
* essentially equivalent to the migration being in pre-copy phase.
*/
-bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer)
+bool
+vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer)
{
VFIODevice *vbasedev;
@@ -1139,7 +1140,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
return 0;
}
-int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap, hwaddr iova,
hwaddr size)
{
@@ -1162,7 +1163,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
return 0;
}
-int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
+int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
uint64_t size, ram_addr_t ram_addr)
{
bool all_device_dirty_tracking =
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index eee2dcfe76..1ffd25bbfa 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -63,7 +63,7 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
}
-int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap,
hwaddr iova, hwaddr size)
{
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 1dbf9b9a17..b22feb8ded 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -61,11 +61,11 @@ static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
}
}
-static int vfio_dma_unmap_bitmap(VFIOContainer *container,
+static int vfio_dma_unmap_bitmap(const VFIOContainer *container,
hwaddr iova, ram_addr_t size,
IOMMUTLBEntry *iotlb)
{
- VFIOContainerBase *bcontainer = &container->bcontainer;
+ const VFIOContainerBase *bcontainer = &container->bcontainer;
struct vfio_iommu_type1_dma_unmap *unmap;
struct vfio_bitmap *bitmap;
VFIOBitmap vbmap;
@@ -117,11 +117,12 @@ unmap_exit:
/*
* DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
*/
-static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
- ram_addr_t size, IOMMUTLBEntry *iotlb)
+static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
{
- VFIOContainer *container = container_of(bcontainer, VFIOContainer,
- bcontainer);
+ const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+ bcontainer);
struct vfio_iommu_type1_dma_unmap unmap = {
.argsz = sizeof(unmap),
.flags = 0,
@@ -174,11 +175,11 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
return 0;
}
-static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
+static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly)
{
- VFIOContainer *container = container_of(bcontainer, VFIOContainer,
- bcontainer);
+ const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+ bcontainer);
struct vfio_iommu_type1_dma_map map = {
.argsz = sizeof(map),
.flags = VFIO_DMA_MAP_FLAG_READ,
@@ -207,11 +208,12 @@ static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
return -errno;
}
-static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
- bool start)
+static int
+vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
+ bool start)
{
- VFIOContainer *container = container_of(bcontainer, VFIOContainer,
- bcontainer);
+ const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+ bcontainer);
int ret;
struct vfio_iommu_type1_dirty_bitmap dirty = {
.argsz = sizeof(dirty),
@@ -233,12 +235,12 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
return ret;
}
-static int vfio_legacy_query_dirty_bitmap(VFIOContainerBase *bcontainer,
+static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
VFIOBitmap *vbmap,
hwaddr iova, hwaddr size)
{
- VFIOContainer *container = container_of(bcontainer, VFIOContainer,
- bcontainer);
+ const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
+ bcontainer);
struct vfio_iommu_type1_dirty_bitmap *dbitmap;
struct vfio_iommu_type1_dirty_bitmap_get *range;
int ret;
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index e08a217057..bc45dd1842 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -26,10 +26,10 @@
#include "qemu/chardev_open.h"
#include "pci.h"
-static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
+static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly)
{
- VFIOIOMMUFDContainer *container =
+ const VFIOIOMMUFDContainer *container =
container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
return iommufd_backend_map_dma(container->be,
@@ -37,11 +37,11 @@ static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
iova, size, vaddr, readonly);
}
-static int iommufd_cdev_unmap(VFIOContainerBase *bcontainer,
+static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
IOMMUTLBEntry *iotlb)
{
- VFIOIOMMUFDContainer *container =
+ const VFIOIOMMUFDContainer *container =
container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
/* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
--
2.34.1
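A minimal standard-C sketch of the pattern this patch applies, using hypothetical names throughout (container_of_const is not a QEMU macro). QEMU's own container_of appears to cast its result to a plain "type *", which would silently drop the const qualifier; declaring the local "container" as const, as the patch does, keeps the read-only contract visible, and any helper it is passed to must then take a const pointer too, which is the build-error cascade the commit message mentions.

```c
#include <assert.h>
#include <stddef.h>

/* Standard-C container_of variant that preserves the const qualifier. */
#define container_of_const(ptr, type, member) \
    ((const type *)((const char *)(ptr) - offsetof(type, member)))

typedef struct ContainerBase {
    int pgsizes;                /* hypothetical read-only state */
} ContainerBase;

typedef struct LegacyContainer {
    int fd;
    ContainerBase bcontainer;   /* embedded base object, not first member */
} LegacyContainer;

typedef struct IOMMUOps {
    /* Callbacks only read the container, so they take a const pointer. */
    int (*query)(const ContainerBase *bcontainer);
} IOMMUOps;

static int legacy_query(const ContainerBase *bcontainer)
{
    const LegacyContainer *container;

    assert(bcontainer != NULL);
    container = container_of_const(bcontainer, LegacyContainer, bcontainer);

    /* "container->fd = 0;" would now be rejected by the compiler. */
    return container->fd + bcontainer->pgsizes;
}

static const IOMMUOps legacy_ops = { .query = legacy_query };
```

The derived container is recovered from the embedded base pointer exactly as before; the only difference is that the compiler now enforces that the callback does not modify it.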
* [PATCH v6 19/21] hw/arm: Activate IOMMUFD for virt machines
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (17 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 18/21] vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-16 9:17 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 20/21] kconfig: Activate IOMMUFD for s390x machines Zhenzhong Duan
` (3 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Paolo Bonzini, Peter Maydell,
open list:ARM TCG CPUs
From: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/arm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
index 3ada335a24..660f49db49 100644
--- a/hw/arm/Kconfig
+++ b/hw/arm/Kconfig
@@ -8,6 +8,7 @@ config ARM_VIRT
imply TPM_TIS_SYSBUS
imply TPM_TIS_I2C
imply NVDIMM
+ imply IOMMUFD
select ARM_GIC
select ACPI
select ARM_SMMUV3
--
2.34.1
* [PATCH v6 20/21] kconfig: Activate IOMMUFD for s390x machines
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (18 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 19/21] hw/arm: Activate IOMMUFD for virt machines Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-15 18:47 ` Eric Farman
2023-11-14 10:09 ` [PATCH v6 21/21] hw/i386: Activate IOMMUFD for q35 machines Zhenzhong Duan
` (2 subsequent siblings)
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Matthew Rosato, Paolo Bonzini, Halil Pasic,
Christian Borntraeger, Eric Farman, Thomas Huth,
Richard Henderson, David Hildenbrand, Ilya Leoshkevich,
open list:S390 Virtio-ccw
From: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
---
hw/s390x/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/s390x/Kconfig b/hw/s390x/Kconfig
index 4c068d7960..26ad104485 100644
--- a/hw/s390x/Kconfig
+++ b/hw/s390x/Kconfig
@@ -6,6 +6,7 @@ config S390_CCW_VIRTIO
imply VFIO_CCW
imply WDT_DIAG288
imply PCIE_DEVICES
+ imply IOMMUFD
select PCI_EXPRESS
select S390_FLIC
select S390_FLIC_KVM if KVM
--
2.34.1
* [PATCH v6 21/21] hw/i386: Activate IOMMUFD for q35 machines
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (19 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 20/21] kconfig: Activate IOMMUFD for s390x machines Zhenzhong Duan
@ 2023-11-14 10:09 ` Zhenzhong Duan
2023-11-16 9:17 ` Eric Auger
2023-11-14 14:51 ` [PATCH v6 00/21] vfio: Adopt iommufd Cédric Le Goater
2023-11-20 9:15 ` Eric Auger
22 siblings, 1 reply; 82+ messages in thread
From: Zhenzhong Duan @ 2023-11-14 10:09 UTC (permalink / raw)
To: qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Zhenzhong Duan, Paolo Bonzini, Michael S. Tsirkin,
Marcel Apfelbaum, Richard Henderson, Eduardo Habkost
From: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
---
hw/i386/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
index 55850791df..a1846be6f7 100644
--- a/hw/i386/Kconfig
+++ b/hw/i386/Kconfig
@@ -95,6 +95,7 @@ config Q35
imply E1000E_PCI_EXPRESS
imply VMPORT
imply VMMOUSE
+ imply IOMMUFD
select PC_PCI
select PC_ACPI
select PCI_EXPRESS_Q35
--
2.34.1
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-14 10:09 ` [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
@ 2023-11-14 13:28 ` Cédric Le Goater
2023-11-15 4:06 ` Duan, Zhenzhong
2023-11-15 12:52 ` Eric Auger
2023-11-17 11:09 ` Cédric Le Goater
2 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 13:28 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Paolo Bonzini, Eric Blake, Markus Armbruster,
Daniel P. Berrangé, Eduardo Habkost
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
>
> Introduce an iommufd object which allows the interaction
> with the host /dev/iommu device.
>
> The /dev/iommu may already have been pre-opened outside of qemu,
> in which case the fd can be passed directly along with the
> iommufd object:
>
> This allows the iommufd object to be shared across several
> subsystems (VFIO, VDPA, ...). For example, libvirt would open
> the /dev/iommu once.
>
> If no fd is passed along with the iommufd object, the /dev/iommu
> is opened by the qemu code.
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
I simplified the object declaration in include/sysemu/iommufd.h and
formatted /dev/iommu in qemu-options.hx. No need to resend.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
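The connect/disconnect logic in the quoted backends/iommufd.c is a reference-counted lazy open: when QEMU owns the descriptor, the first user opens /dev/iommu, later users share it, and the last user to disconnect closes it; when a pre-opened fd was handed in through the "fd" property, owned is false and the descriptor is never closed there. The sketch below models that behaviour under stated assumptions: it uses POSIX pthread mutexes instead of QemuMutex, the Backend type and backend_* functions are hypothetical, and the device path is a parameter so that /dev/null can stand in for /dev/iommu.

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical model of the shared iommufd backend object. */
typedef struct Backend {
    int fd;            /* -1 until opened (or until an fd is handed in) */
    bool owned;        /* true: we opened the node, so we close it */
    uint32_t users;    /* number of connected devices */
    pthread_mutex_t lock;
} Backend;

static void backend_init(Backend *be)
{
    be->fd = -1;
    be->owned = true;
    be->users = 0;
    pthread_mutex_init(&be->lock, NULL);
}

/* First user opens the node (if we own it); later users share the fd. */
static int backend_connect(Backend *be, const char *path)
{
    int ret = 0;

    assert(path != NULL);
    pthread_mutex_lock(&be->lock);
    if (be->users == UINT32_MAX) {
        ret = -1;                      /* user counter would overflow */
    } else {
        if (be->owned && be->users == 0) {
            be->fd = open(path, O_RDWR);
            if (be->fd < 0) {
                ret = -1;
            }
        }
        if (ret == 0) {
            be->users++;
        }
    }
    pthread_mutex_unlock(&be->lock);
    return ret;
}

/* Last user closes the node, but only if we opened it ourselves. */
static void backend_disconnect(Backend *be)
{
    pthread_mutex_lock(&be->lock);
    if (be->users > 0 && --be->users == 0 && be->owned) {
        close(be->fd);
        be->fd = -1;
    }
    pthread_mutex_unlock(&be->lock);
}
```

Two devices linked to the same iommufd object would therefore share one descriptor, which is what allows a single /dev/iommu to back several VFIO devices.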
> ---
> v6: remove redundant call, alloc_hwpt, get/put_ioas
>
> MAINTAINERS | 7 ++
> qapi/qom.json | 19 ++++
> include/sysemu/iommufd.h | 44 ++++++++
> backends/iommufd.c | 228 +++++++++++++++++++++++++++++++++++++++
> backends/Kconfig | 4 +
> backends/meson.build | 1 +
> backends/trace-events | 10 ++
> qemu-options.hx | 12 +++
> 8 files changed, 325 insertions(+)
> create mode 100644 include/sysemu/iommufd.h
> create mode 100644 backends/iommufd.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ff1238bb98..a4891f7bda 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c
> F: docs/system/s390x/vfio-ap.rst
> L: qemu-s390x@nongnu.org
>
> +iommufd
> +M: Yi Liu <yi.l.liu@intel.com>
> +M: Eric Auger <eric.auger@redhat.com>
> +S: Supported
> +F: backends/iommufd.c
> +F: include/sysemu/iommufd.h
> +
> vhost
> M: Michael S. Tsirkin <mst@redhat.com>
> S: Supported
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..1fd8555a75 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,23 @@
> { 'struct': 'VfioUserServerProperties',
> 'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>
> +##
> +# @IOMMUFDProperties:
> +#
> +# Properties for iommufd objects.
> +#
> +# @fd: file descriptor name previously passed via 'getfd' command,
> +# which represents a pre-opened /dev/iommu. This allows the
> +# iommufd object to be shared across several subsystems
> +# (VFIO, VDPA, ...), and the file descriptor to be shared
> +# with other processes, e.g. DPDK. (default: QEMU opens
> +# /dev/iommu by itself)
> +#
> +# Since: 8.2
> +##
> +{ 'struct': 'IOMMUFDProperties',
> + 'data': { '*fd': 'str' } }
> +
> ##
> # @RngProperties:
> #
> @@ -934,6 +951,7 @@
> 'input-barrier',
> { 'name': 'input-linux',
> 'if': 'CONFIG_LINUX' },
> + 'iommufd',
> 'iothread',
> 'main-loop',
> { 'name': 'memory-backend-epc',
> @@ -1003,6 +1021,7 @@
> 'input-barrier': 'InputBarrierProperties',
> 'input-linux': { 'type': 'InputLinuxProperties',
> 'if': 'CONFIG_LINUX' },
> + 'iommufd': 'IOMMUFDProperties',
> 'iothread': 'IothreadProperties',
> 'main-loop': 'MainLoopProperties',
> 'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties',
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> new file mode 100644
> index 0000000000..9b3a86f57d
> --- /dev/null
> +++ b/include/sysemu/iommufd.h
> @@ -0,0 +1,44 @@
> +#ifndef SYSEMU_IOMMUFD_H
> +#define SYSEMU_IOMMUFD_H
> +
> +#include "qom/object.h"
> +#include "qemu/thread.h"
> +#include "exec/hwaddr.h"
> +#include "exec/cpu-common.h"
> +
> +#define TYPE_IOMMUFD_BACKEND "iommufd"
> +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
> + IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND(obj) \
> + OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_CLASS(klass) \
> + OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
> +struct IOMMUFDBackendClass {
> + ObjectClass parent_class;
> +};
> +
> +struct IOMMUFDBackend {
> + Object parent;
> +
> + /*< protected >*/
> + int fd; /* /dev/iommu file descriptor */
> + bool owned; /* is the /dev/iommu opened internally */
> + QemuMutex lock;
> + uint32_t users;
> +
> + /*< public >*/
> +};
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
> +void iommufd_backend_disconnect(IOMMUFDBackend *be);
> +
> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
> + Error **errp);
> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly);
> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> + hwaddr iova, ram_addr_t size);
> +#endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> new file mode 100644
> index 0000000000..ea3e2a8f85
> --- /dev/null
> +++ b/backends/iommufd.c
> @@ -0,0 +1,228 @@
> +/*
> + * iommufd container backend
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + * Eric Auger <eric.auger@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/iommufd.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
> +#include "qemu/module.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "monitor/monitor.h"
> +#include "trace.h"
> +#include <sys/ioctl.h>
> +#include <linux/iommufd.h>
> +
> +static void iommufd_backend_init(Object *obj)
> +{
> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +
> + be->fd = -1;
> + be->users = 0;
> + be->owned = true;
> + qemu_mutex_init(&be->lock);
> +}
> +
> +static void iommufd_backend_finalize(Object *obj)
> +{
> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +
> + if (be->owned) {
> + close(be->fd);
> + be->fd = -1;
> + }
> +}
> +
> +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
> +{
> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> + int fd = -1;
> +
> + fd = monitor_fd_param(monitor_cur(), str, errp);
> + if (fd == -1) {
> + error_prepend(errp, "Could not parse remote object fd %s:", str);
> + return;
> + }
> + qemu_mutex_lock(&be->lock);
> + be->fd = fd;
> + be->owned = false;
> + qemu_mutex_unlock(&be->lock);
> + trace_iommu_backend_set_fd(be->fd);
> +}
> +
> +static void iommufd_backend_class_init(ObjectClass *oc, void *data)
> +{
> + object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
> +}
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
> +{
> + int fd, ret = 0;
> +
> + qemu_mutex_lock(&be->lock);
> + if (be->users == UINT32_MAX) {
> + error_setg(errp, "too many connections");
> + ret = -E2BIG;
> + goto out;
> + }
> + if (be->owned && !be->users) {
> + fd = qemu_open_old("/dev/iommu", O_RDWR);
> + if (fd < 0) {
> + error_setg_errno(errp, errno, "/dev/iommu opening failed");
> + ret = fd;
> + goto out;
> + }
> + be->fd = fd;
> + }
> + be->users++;
> +out:
> + trace_iommufd_backend_connect(be->fd, be->owned,
> + be->users, ret);
> + qemu_mutex_unlock(&be->lock);
> + return ret;
> +}
> +
> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
> +{
> + qemu_mutex_lock(&be->lock);
> + if (!be->users) {
> + goto out;
> + }
> + be->users--;
> + if (!be->users && be->owned) {
> + close(be->fd);
> + be->fd = -1;
> + }
> +out:
> + trace_iommufd_backend_disconnect(be->fd, be->users);
> + qemu_mutex_unlock(&be->lock);
> +}
> +
> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
> + Error **errp)
> +{
> + int ret, fd = be->fd;
> + struct iommu_ioas_alloc alloc_data = {
> + .size = sizeof(alloc_data),
> + .flags = 0,
> + };
> +
> + ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
> + if (ret) {
> + error_setg_errno(errp, errno, "Failed to allocate ioas");
> + return ret;
> + }
> +
> + *ioas_id = alloc_data.out_ioas_id;
> + trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
> +
> + return ret;
> +}
> +
> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id)
> +{
> + int ret, fd = be->fd;
> + struct iommu_destroy des = {
> + .size = sizeof(des),
> + .id = id,
> + };
> +
> + ret = ioctl(fd, IOMMU_DESTROY, &des);
> + trace_iommufd_backend_free_id(fd, id, ret);
> + if (ret) {
> + error_report("Failed to free id: %u %m", id);
> + }
> +}
> +
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly)
> +{
> + int ret, fd = be->fd;
> + struct iommu_ioas_map map = {
> + .size = sizeof(map),
> + .flags = IOMMU_IOAS_MAP_READABLE |
> + IOMMU_IOAS_MAP_FIXED_IOVA,
> + .ioas_id = ioas_id,
> + .__reserved = 0,
> + .user_va = (uintptr_t)vaddr,
> + .iova = iova,
> + .length = size,
> + };
> +
> + if (!readonly) {
> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
> + }
> +
> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
> + vaddr, readonly, ret);
> + if (ret) {
> + ret = -errno;
> + error_report("IOMMU_IOAS_MAP failed: %m");
> + }
> + return ret;
> +}
> +
> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> + hwaddr iova, ram_addr_t size)
> +{
> + int ret, fd = be->fd;
> + struct iommu_ioas_unmap unmap = {
> + .size = sizeof(unmap),
> + .ioas_id = ioas_id,
> + .iova = iova,
> + .length = size,
> + };
> +
> + ret = ioctl(fd, IOMMU_IOAS_UNMAP, &unmap);
> + /*
> + * IOMMUFD treats a mapping as an object, so unmapping a
> + * nonexistent mapping is treated as deleting a nonexistent
> + * object and returns ENOENT. The legacy backend allows it.
> + * A vIOMMU may trigger many redundant unmaps; to avoid
> + * flooding the log, treat them as success for IOMMUFD,
> + * just like the legacy backend.
> + */
> + if (ret && errno == ENOENT) {
> + trace_iommufd_backend_unmap_dma_non_exist(fd, ioas_id, iova, size, ret);
> + ret = 0;
> + } else {
> + trace_iommufd_backend_unmap_dma(fd, ioas_id, iova, size, ret);
> + }
> +
> + if (ret) {
> + ret = -errno;
> + error_report("IOMMU_IOAS_UNMAP failed: %m");
> + }
> + return ret;
> +}
> +
> +static const TypeInfo iommufd_backend_info = {
> + .name = TYPE_IOMMUFD_BACKEND,
> + .parent = TYPE_OBJECT,
> + .instance_size = sizeof(IOMMUFDBackend),
> + .instance_init = iommufd_backend_init,
> + .instance_finalize = iommufd_backend_finalize,
> + .class_size = sizeof(IOMMUFDBackendClass),
> + .class_init = iommufd_backend_class_init,
> + .interfaces = (InterfaceInfo[]) {
> + { TYPE_USER_CREATABLE },
> + { }
> + }
> +};
> +
> +static void register_types(void)
> +{
> + type_register_static(&iommufd_backend_info);
> +}
> +
> +type_init(register_types);
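The connect/disconnect pair above implements reference counting on a single /dev/iommu fd: the first user of a backend that owns its fd opens the device, the last user closes it, and an fd handed in through the ``fd`` property is never opened or closed by the backend. The following is a minimal standalone sketch of that policy, not QEMU code: `struct backend_sketch`, `backend_connect()`, `backend_disconnect()` and the fake open/close helpers are hypothetical names, and the mutex that the real code takes around each operation is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for IOMMUFDBackend; locking omitted for brevity. */
struct backend_sketch {
    int fd;          /* -1 when no /dev/iommu fd is held */
    uint32_t users;  /* number of connected devices */
    bool owned;      /* true if the backend opens the fd itself */
    int opens;       /* counts calls to the fake opener (for testing) */
    int closes;      /* counts calls to the fake closer (for testing) */
};

/* Fake open/close so the sketch runs without /dev/iommu. */
static int fake_open(struct backend_sketch *be) { be->opens++; return 42; }
static void fake_close(struct backend_sketch *be, int fd) { (void)fd; be->closes++; }

static int backend_connect(struct backend_sketch *be)
{
    if (be->users == UINT32_MAX) {
        return -E2BIG;  /* user counter would overflow */
    }
    /* Only the first user of an owned backend opens the device. */
    if (be->owned && be->users == 0) {
        int fd = fake_open(be);
        if (fd < 0) {
            return fd;
        }
        be->fd = fd;
    }
    be->users++;
    return 0;
}

static void backend_disconnect(struct backend_sketch *be)
{
    if (be->users == 0) {
        return;  /* unbalanced disconnect: ignore, like the patch */
    }
    be->users--;
    /* The last user closes the fd, but only if the backend owns it. */
    if (be->users == 0 && be->owned) {
        fake_close(be, be->fd);
        be->fd = -1;
    }
}
```

With a pre-opened fd (`owned == false`), connect and disconnect only move the counter, so the fd's lifetime stays with whoever passed it in.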
> diff --git a/backends/Kconfig b/backends/Kconfig
> index f35abc1609..2cb23f62fa 100644
> --- a/backends/Kconfig
> +++ b/backends/Kconfig
> @@ -1 +1,5 @@
> source tpm/Kconfig
> +
> +config IOMMUFD
> + bool
> + depends on VFIO
> diff --git a/backends/meson.build b/backends/meson.build
> index 914c7c4afb..9a5cea480d 100644
> --- a/backends/meson.build
> +++ b/backends/meson.build
> @@ -20,6 +20,7 @@ if have_vhost_user
> system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
> endif
> system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
> +system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c'))
> if have_vhost_user_crypto
> system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
> endif
> diff --git a/backends/trace-events b/backends/trace-events
> index 652eb76a57..d45c6e31a6 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -5,3 +5,13 @@ dbus_vmstate_pre_save(void)
> dbus_vmstate_post_load(int version_id) "version_id: %d"
> dbus_vmstate_loading(const char *id) "id: %s"
> dbus_vmstate_saving(const char *id) "id: %s"
> +
> +# iommufd.c
> +iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
> +iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
> +iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
> +iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
> +iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> +iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> +iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
> +iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
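The unmap path above (traced by iommufd_backend_unmap_dma_non_exist) treats ENOENT from IOMMU_IOAS_UNMAP as success, because a vIOMMU may issue redundant unmaps that the legacy backend silently accepts. A minimal sketch of that return-value policy follows; the helper `normalize_unmap_result()` is hypothetical and takes the ioctl() return value and the errno it left behind as explicit arguments, so errno is captured once, right after the call.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper mirroring the policy in iommufd_backend_unmap_dma():
 * 'ret' is the raw ioctl() return value and 'err' the errno it left behind.
 */
static int normalize_unmap_result(int ret, int err)
{
    if (ret && err == ENOENT) {
        /* Unmapping a nonexistent mapping: treat as success, like legacy VFIO. */
        return 0;
    }
    /* Any other failure is reported as a negative errno. */
    return ret ? -err : 0;
}
```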
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 42fd09e4de..70507c0ee6 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -5224,6 +5224,18 @@ SRST
>
> The ``share`` boolean option is on by default with memfd.
>
> + ``-object iommufd,id=id[,fd=fd]``
> + Creates an iommufd backend which allows control of DMA mapping
> + through the /dev/iommu device.
> +
> + The ``id`` parameter is a unique ID which frontends (such as
> + vfio-pci or vdpa) will use to connect with the iommufd backend.
> +
> + The ``fd`` parameter is an optional pre-opened file descriptor
> + obtained by opening /dev/iommu. Usually the iommufd is shared
> + across all subsystems, bringing the benefit of centralized
> + reference counting.
> +
> ``-object rng-builtin,id=id``
> Creates a random number generator backend which obtains entropy
> from QEMU builtin functions. The ``id`` parameter is a unique ID
* Re: [PATCH v6 02/21] util/char_dev: Add open_cdev()
2023-11-14 10:09 ` [PATCH v6 02/21] util/char_dev: Add open_cdev() Zhenzhong Duan
@ 2023-11-14 13:29 ` Cédric Le Goater
2023-11-15 13:23 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 13:29 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> /dev/vfio/devices/vfioX may not exist. In that case it is still possible
> to open /dev/char/$major:$minor instead. Add a helper function to
> abstract the cdev open.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> MAINTAINERS | 3 ++
> include/qemu/chardev_open.h | 16 ++++++++
> util/chardev_open.c | 81 +++++++++++++++++++++++++++++++++++++
> util/meson.build | 1 +
> 4 files changed, 101 insertions(+)
> create mode 100644 include/qemu/chardev_open.h
> create mode 100644 util/chardev_open.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a4891f7bda..869ec3d5af 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2172,6 +2172,9 @@ M: Eric Auger <eric.auger@redhat.com>
> S: Supported
> F: backends/iommufd.c
> F: include/sysemu/iommufd.h
> +F: include/qemu/chardev_open.h
> +F: util/chardev_open.c
> +
>
> vhost
> M: Michael S. Tsirkin <mst@redhat.com>
> diff --git a/include/qemu/chardev_open.h b/include/qemu/chardev_open.h
> new file mode 100644
> index 0000000000..64e8fcfdcb
> --- /dev/null
> +++ b/include/qemu/chardev_open.h
> @@ -0,0 +1,16 @@
> +/*
> + * QEMU Chardev Helper
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_CHARDEV_OPEN_H
> +#define QEMU_CHARDEV_OPEN_H
> +
> +int open_cdev(const char *devpath, dev_t cdev);
> +#endif
> diff --git a/util/chardev_open.c b/util/chardev_open.c
> new file mode 100644
> index 0000000000..f776429788
> --- /dev/null
> +++ b/util/chardev_open.c
> @@ -0,0 +1,81 @@
> +/*
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses. You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *
> + * Copied from
> + * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/chardev_open.h"
> +
> +static int open_cdev_internal(const char *path, dev_t cdev)
> +{
> + struct stat st;
> + int fd;
> +
> + fd = qemu_open_old(path, O_RDWR);
> + if (fd == -1) {
> + return -1;
> + }
> + if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
> + (cdev != 0 && st.st_rdev != cdev)) {
> + close(fd);
> + return -1;
> + }
> + return fd;
> +}
> +
> +static int open_cdev_robust(dev_t cdev)
> +{
> + g_autofree char *devpath = NULL;
> +
> + /*
> + * This assumes that udev is being used and is creating the /dev/char/
> + * symlinks.
> + */
> + devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
> + return open_cdev_internal(devpath, cdev);
> +}
> +
> +int open_cdev(const char *devpath, dev_t cdev)
> +{
> + int fd;
> +
> + fd = open_cdev_internal(devpath, cdev);
> + if (fd == -1 && cdev != 0) {
> + return open_cdev_robust(cdev);
> + }
> + return fd;
> +}
> diff --git a/util/meson.build b/util/meson.build
> index c2322ef6e7..174c133368 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -108,6 +108,7 @@ if have_block
> util_ss.add(files('filemonitor-stub.c'))
> endif
> util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
> + util_ss.add(when: 'CONFIG_LINUX', if_true: files('chardev_open.c'))
> endif
>
> if cpu == 'aarch64'
* Re: [PATCH v6 03/21] vfio/common: return early if space isn't empty
2023-11-14 10:09 ` [PATCH v6 03/21] vfio/common: return early if space isn't empty Zhenzhong Duan
@ 2023-11-14 13:29 ` Cédric Le Goater
2023-11-15 13:28 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 13:29 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> This is a trivial optimization. If there is an active container in the
> space, vfio_reset_handler will never be unregistered. So invert the
> check of space->containers and return early.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/common.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 572ae7c934..934f4f5446 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
>
> void vfio_put_address_space(VFIOAddressSpace *space)
> {
> - if (QLIST_EMPTY(&space->containers)) {
> - QLIST_REMOVE(space, list);
> - g_free(space);
> + if (!QLIST_EMPTY(&space->containers)) {
> + return;
> }
> +
> + QLIST_REMOVE(space, list);
> + g_free(space);
> +
> if (QLIST_EMPTY(&vfio_address_spaces)) {
> qemu_unregister_reset(vfio_reset_handler, NULL);
> }
* Re: [PATCH v6 04/21] vfio/iommufd: Implement the iommufd backend
2023-11-14 10:09 ` [PATCH v6 04/21] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
@ 2023-11-14 13:36 ` Cédric Le Goater
0 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 13:36 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> Add the iommufd backend. The IOMMUFD container class is implemented
> based on the new /dev/iommu user API. This backend obviously depends
> on CONFIG_IOMMUFD.
>
> So far, the iommufd backend doesn't support dirty page sync due to
> missing support in the host kernel.
>
> Co-authored-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> v6: simplify CONFIG_IOMMUFD checking code
> dirrectory -> directory
> check iommufd_cdev_kvm_device_add return value
> use prefix iommufd_cdev_* for all functions and trace event
>
> include/hw/vfio/vfio-common.h | 11 +
> hw/vfio/common.c | 6 +
> hw/vfio/iommufd.c | 428 ++++++++++++++++++++++++++++++++++
> hw/vfio/meson.build | 3 +
> hw/vfio/trace-events | 10 +
> 5 files changed, 458 insertions(+)
> create mode 100644 hw/vfio/iommufd.c
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 24ecc0e7ee..3dac5c167e 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -89,6 +89,14 @@ typedef struct VFIOHostDMAWindow {
> QLIST_ENTRY(VFIOHostDMAWindow) hostwin_next;
> } VFIOHostDMAWindow;
>
> +typedef struct IOMMUFDBackend IOMMUFDBackend;
> +
> +typedef struct VFIOIOMMUFDContainer {
> + VFIOContainerBase bcontainer;
> + IOMMUFDBackend *be;
> + uint32_t ioas_id;
> +} VFIOIOMMUFDContainer;
> +
> typedef struct VFIODeviceOps VFIODeviceOps;
>
> typedef struct VFIODevice {
> @@ -116,6 +124,8 @@ typedef struct VFIODevice {
> OnOffAuto pre_copy_dirty_page_tracking;
> bool dirty_pages_supported;
> bool dirty_tracking;
> + int devid;
> + IOMMUFDBackend *iommufd;
> } VFIODevice;
>
> struct VFIODeviceOps {
> @@ -201,6 +211,7 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
> extern VFIOGroupList vfio_group_list;
> extern VFIODeviceList vfio_device_list;
> extern const VFIOIOMMUOps vfio_legacy_ops;
> +extern const VFIOIOMMUOps vfio_iommufd_ops;
> extern const MemoryListener vfio_memory_listener;
> extern int vfio_kvm_device_fd;
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 934f4f5446..6569732b7a 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -19,6 +19,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
> #include <sys/ioctl.h>
> #ifdef CONFIG_KVM
> #include <linux/kvm.h>
> @@ -1503,6 +1504,11 @@ int vfio_attach_device(char *name, VFIODevice *vbasedev,
> {
> const VFIOIOMMUOps *ops = &vfio_legacy_ops;
>
> +#ifdef CONFIG_IOMMUFD
> + if (vbasedev->iommufd) {
> + ops = &vfio_iommufd_ops;
> + }
> +#endif
> return ops->attach_device(name, vbasedev, as, errp);
> }
>
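vfio_attach_device() keeps the legacy ops as the default and switches to vfio_iommufd_ops only when QEMU is built with CONFIG_IOMMUFD and the device was linked to an iommufd object. A minimal sketch of that function-pointer dispatch; `struct ops_sketch`, `struct device_sketch`, `select_ops()` and the `IOMMUFD_SKETCH_ENABLED` macro (standing in for CONFIG_IOMMUFD from CONFIG_DEVICES) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build switch standing in for CONFIG_IOMMUFD. */
#define IOMMUFD_SKETCH_ENABLED 1

/* Hypothetical, trimmed-down analogues of VFIOIOMMUOps and VFIODevice. */
struct ops_sketch {
    const char *name;
    int (*attach_device)(const char *devname);
};

struct device_sketch {
    void *iommufd;  /* non-NULL when the device is linked to an iommufd object */
};

static int legacy_attach(const char *devname) { (void)devname; return 1; }
static int iommufd_attach(const char *devname) { (void)devname; return 2; }

static const struct ops_sketch legacy_ops = { "legacy", legacy_attach };
static const struct ops_sketch iommufd_ops = { "iommufd", iommufd_attach };

/* Default to the legacy backend; switch only if an iommufd object is linked. */
static const struct ops_sketch *select_ops(const struct device_sketch *dev)
{
    const struct ops_sketch *ops = &legacy_ops;

#if IOMMUFD_SKETCH_ENABLED
    if (dev->iommufd) {
        ops = &iommufd_ops;
    }
#endif
    return ops;
}
```

When the switch is compiled out, every device silently uses the legacy path, which matches how the #ifdef in the hunk above behaves.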
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> new file mode 100644
> index 0000000000..06282d885c
> --- /dev/null
> +++ b/hw/vfio/iommufd.c
> @@ -0,0 +1,428 @@
> +/*
> + * iommufd container backend
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + * Eric Auger <eric.auger@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include <sys/ioctl.h>
> +#include <linux/vfio.h>
> +#include <linux/iommufd.h>
> +
> +#include "hw/vfio/vfio-common.h"
> +#include "qemu/error-report.h"
> +#include "trace.h"
> +#include "qapi/error.h"
> +#include "sysemu/iommufd.h"
> +#include "hw/qdev-core.h"
> +#include "sysemu/reset.h"
> +#include "qemu/cutils.h"
> +#include "qemu/chardev_open.h"
> +
> +static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly)
> +{
> + VFIOIOMMUFDContainer *container =
> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> +
> + return iommufd_backend_map_dma(container->be,
> + container->ioas_id,
> + iova, size, vaddr, readonly);
> +}
> +
> +static int iommufd_cdev_unmap(VFIOContainerBase *bcontainer,
> + hwaddr iova, ram_addr_t size,
> + IOMMUTLBEntry *iotlb)
> +{
> + VFIOIOMMUFDContainer *container =
> + container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> +
> + /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
> + return iommufd_backend_unmap_dma(container->be,
> + container->ioas_id, iova, size);
> +}
> +
> +static int iommufd_cdev_kvm_device_add(VFIODevice *vbasedev, Error **errp)
> +{
> + return vfio_kvm_device_add_fd(vbasedev->fd, errp);
> +}
> +
> +static void iommufd_cdev_kvm_device_del(VFIODevice *vbasedev)
> +{
> + Error *err = NULL;
> +
> + if (vfio_kvm_device_del_fd(vbasedev->fd, &err)) {
> + error_report_err(err);
> + }
> +}
> +
> +static int iommufd_cdev_connect_and_bind(VFIODevice *vbasedev, Error **errp)
> +{
> + IOMMUFDBackend *iommufd = vbasedev->iommufd;
> + struct vfio_device_bind_iommufd bind = {
> + .argsz = sizeof(bind),
> + .flags = 0,
> + };
> + int ret;
> +
> + ret = iommufd_backend_connect(iommufd, errp);
> + if (ret) {
> + return ret;
> + }
> +
> + /*
> + * Add the device to kvm-vfio to prepare for tracking in KVM.
> + * Some emulated devices in particular require the KVM
> + * information to be available when the device is opened.
> + */
> + ret = iommufd_cdev_kvm_device_add(vbasedev, errp);
> + if (ret) {
> + goto err_kvm_device_add;
> + }
> +
> + /* Bind device to iommufd */
> + bind.iommufd = iommufd->fd;
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_BIND_IOMMUFD, &bind);
> + if (ret) {
> + error_setg_errno(errp, errno, "error binding device fd=%d to iommufd=%d",
> + vbasedev->fd, bind.iommufd);
> + goto err_bind;
> + }
> +
> + vbasedev->devid = bind.out_devid;
> + trace_iommufd_cdev_connect_and_bind(bind.iommufd, vbasedev->name,
> + vbasedev->fd, vbasedev->devid);
> + return ret;
> +err_bind:
> + iommufd_cdev_kvm_device_del(vbasedev);
> +err_kvm_device_add:
> + iommufd_backend_disconnect(iommufd);
> + return ret;
> +}
> +
> +static void iommufd_cdev_unbind_and_disconnect(VFIODevice *vbasedev)
> +{
> + /* Unbinding happens automatically when the device fd is closed */
> + iommufd_cdev_kvm_device_del(vbasedev);
> + iommufd_backend_disconnect(vbasedev->iommufd);
> +}
> +
> +static int iommufd_cdev_getfd(const char *sysfs_path, Error **errp)
> +{
> + long int ret = -ENOTTY;
> + char *path, *vfio_dev_path = NULL, *vfio_path = NULL;
> + DIR *dir = NULL;
> + struct dirent *dent;
> + gchar *contents;
> + struct stat st;
> + gsize length;
> + int major, minor;
> + dev_t vfio_devt;
> +
> + path = g_strdup_printf("%s/vfio-dev", sysfs_path);
> + if (stat(path, &st) < 0) {
> + error_setg_errno(errp, errno, "no such host device");
> + goto out_free_path;
> + }
> +
> + dir = opendir(path);
> + if (!dir) {
> + error_setg_errno(errp, errno, "couldn't open directory %s", path);
> + goto out_free_path;
> + }
> +
> + while ((dent = readdir(dir))) {
> + if (!strncmp(dent->d_name, "vfio", 4)) {
> + vfio_dev_path = g_strdup_printf("%s/%s/dev", path, dent->d_name);
> + break;
> + }
> + }
> +
> + if (!vfio_dev_path) {
> + error_setg(errp, "failed to find vfio-dev/vfioX/dev");
> + goto out_close_dir;
> + }
> +
> + if (!g_file_get_contents(vfio_dev_path, &contents, &length, NULL)) {
> + error_setg(errp, "failed to load \"%s\"", vfio_dev_path);
> + goto out_free_dev_path;
> + }
> +
> + if (sscanf(contents, "%d:%d", &major, &minor) != 2) {
> + error_setg(errp, "failed to get major:minor for \"%s\"", vfio_dev_path);
> + goto out_free_dev_path;
> + }
> + g_free(contents);
> + vfio_devt = makedev(major, minor);
> +
> + vfio_path = g_strdup_printf("/dev/vfio/devices/%s", dent->d_name);
> + ret = open_cdev(vfio_path, vfio_devt);
> + if (ret < 0) {
> + error_setg(errp, "Failed to open %s", vfio_path);
> + }
> +
> + trace_iommufd_cdev_getfd(vfio_path, ret);
> + g_free(vfio_path);
> +
> +out_free_dev_path:
> + g_free(vfio_dev_path);
> +out_close_dir:
> + closedir(dir);
> +out_free_path:
> + if (*errp) {
> + error_prepend(errp, VFIO_MSG_PREFIX, path);
> + }
> + g_free(path);
> +
> + return ret;
> +}
> +
> +static int iommufd_cdev_attach_ioas_hwpt(VFIODevice *vbasedev, bool is_ioas,
> + uint32_t id, Error **errp)
> +{
> + int ret, iommufd = vbasedev->iommufd->fd;
> + struct vfio_device_attach_iommufd_pt attach_data = {
> + .argsz = sizeof(attach_data),
> + .flags = 0,
> + .pt_id = id,
> + };
> + const char *str = is_ioas ? "ioas" : "hwpt";
> +
> + /* Attach device to an IOAS or hwpt within iommufd */
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_ATTACH_IOMMUFD_PT, &attach_data);
> + if (ret) {
> + error_setg_errno(errp, errno,
> + "[iommufd=%d] error attaching %s (%d) to %s_id=%d",
> + iommufd, vbasedev->name, vbasedev->fd, str, id);
> + } else {
> + trace_iommufd_cdev_attach_ioas_hwpt(iommufd, vbasedev->name,
> + vbasedev->fd, str, id);
> + }
> + return ret;
> +}
> +
> +static int iommufd_cdev_detach_ioas_hwpt(VFIODevice *vbasedev, bool is_ioas,
> + uint32_t id, Error **errp)
> +{
> + int ret, iommufd = vbasedev->iommufd->fd;
> + struct vfio_device_detach_iommufd_pt detach_data = {
> + .argsz = sizeof(detach_data),
> + .flags = 0,
> + };
> + const char *str = is_ioas ? "ioas" : "hwpt";
> +
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_DETACH_IOMMUFD_PT, &detach_data);
> + if (ret) {
> + error_setg_errno(errp, errno, "detach %s from %s failed",
> + vbasedev->name, str);
> + } else {
> + trace_iommufd_cdev_detach_ioas_hwpt(iommufd, vbasedev->name, str, id);
> + }
> + return ret;
> +}
> +
> +static int iommufd_cdev_attach_container(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container,
> + Error **errp)
> +{
> + return iommufd_cdev_attach_ioas_hwpt(vbasedev, true,
> + container->ioas_id, errp);
> +}
> +
> +static void iommufd_cdev_detach_container(VFIODevice *vbasedev,
> + VFIOIOMMUFDContainer *container)
> +{
> + Error *err = NULL;
> +
> + if (iommufd_cdev_detach_ioas_hwpt(vbasedev, true,
> + container->ioas_id, &err)) {
> + error_report_err(err);
> + }
> +}
> +
> +static void iommufd_cdev_container_destroy(VFIOIOMMUFDContainer *container)
> +{
> + VFIOContainerBase *bcontainer = &container->bcontainer;
> +
> + if (!QLIST_EMPTY(&bcontainer->device_list)) {
> + return;
> + }
> + memory_listener_unregister(&bcontainer->listener);
> + vfio_container_destroy(bcontainer);
> + iommufd_backend_free_id(container->be, container->ioas_id);
> + g_free(container);
> +}
> +
> +static int iommufd_cdev_ram_block_discard_disable(bool state)
> +{
> + /*
> + * We support coordinated discarding of RAM via the RamDiscardManager.
> + */
> + return ram_block_uncoordinated_discard_disable(state);
> +}
> +
> +static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> + AddressSpace *as, Error **errp)
> +{
> + VFIOContainerBase *bcontainer;
> + VFIOIOMMUFDContainer *container;
> + VFIOAddressSpace *space;
> + struct vfio_device_info dev_info = { .argsz = sizeof(dev_info) };
> + int ret, devfd;
> + uint32_t ioas_id;
> + Error *err = NULL;
> +
> + devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> + if (devfd < 0) {
> + return devfd;
> + }
> + vbasedev->fd = devfd;
> +
> + ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
> + if (ret) {
> + goto err_connect_bind;
> + }
> +
> + space = vfio_get_address_space(as);
> +
> + /* try to attach to an existing container in this space */
> + QLIST_FOREACH(bcontainer, &space->containers, next) {
> + container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> + if (bcontainer->ops != &vfio_iommufd_ops ||
> + vbasedev->iommufd != container->be) {
> + continue;
> + }
> + if (iommufd_cdev_attach_container(vbasedev, container, &err)) {
> + const char *msg = error_get_pretty(err);
> +
> + trace_iommufd_cdev_fail_attach_existing_container(msg);
> + error_free(err);
> + err = NULL;
> + } else {
> + ret = iommufd_cdev_ram_block_discard_disable(true);
> + if (ret) {
> + error_setg(errp,
> + "Cannot set discarding of RAM broken (%d)", ret);
> + goto err_discard_disable;
> + }
> + goto found_container;
> + }
> + }
> +
> + /* Need to allocate a new dedicated container */
> + ret = iommufd_backend_alloc_ioas(vbasedev->iommufd, &ioas_id, errp);
> + if (ret < 0) {
> + goto err_alloc_ioas;
> + }
> +
> + trace_iommufd_cdev_alloc_ioas(vbasedev->iommufd->fd, ioas_id);
> +
> + container = g_malloc0(sizeof(*container));
> + container->be = vbasedev->iommufd;
> + container->ioas_id = ioas_id;
> +
> + bcontainer = &container->bcontainer;
> + vfio_container_init(bcontainer, space, &vfio_iommufd_ops);
> + QLIST_INSERT_HEAD(&space->containers, bcontainer, next);
> +
> + ret = iommufd_cdev_attach_container(vbasedev, container, errp);
> + if (ret) {
> + goto err_attach_container;
> + }
> +
> + ret = iommufd_cdev_ram_block_discard_disable(true);
> + if (ret) {
> + goto err_discard_disable;
> + }
> +
> + bcontainer->pgsizes = qemu_real_host_page_size();
> +
> + bcontainer->listener = vfio_memory_listener;
> + memory_listener_register(&bcontainer->listener, bcontainer->space->as);
> +
> + if (bcontainer->error) {
> + ret = -1;
> + error_propagate_prepend(errp, bcontainer->error,
> + "memory listener initialization failed: ");
> + goto err_listener_register;
> + }
> +
> + bcontainer->initialized = true;
> +
> +found_container:
> + ret = ioctl(devfd, VFIO_DEVICE_GET_INFO, &dev_info);
> + if (ret) {
> + error_setg_errno(errp, errno, "error getting device info");
> + goto err_listener_register;
> + }
> +
> + /*
> + * TODO: examine RAM_BLOCK_DISCARD stuff, should we do group level
> + * for discarding incompatibility check as well?
> + */
> + if (vbasedev->ram_block_discard_allowed) {
> + iommufd_cdev_ram_block_discard_disable(false);
> + }
> +
> + vbasedev->group = 0;
> + vbasedev->num_irqs = dev_info.num_irqs;
> + vbasedev->num_regions = dev_info.num_regions;
> + vbasedev->flags = dev_info.flags;
> + vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
> + vbasedev->bcontainer = bcontainer;
> + QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
> + QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
> +
> + trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
> + vbasedev->num_regions, vbasedev->flags);
> + return 0;
> +
> +err_listener_register:
> + iommufd_cdev_ram_block_discard_disable(false);
> +err_discard_disable:
> + iommufd_cdev_detach_container(vbasedev, container);
> +err_attach_container:
> + iommufd_cdev_container_destroy(container);
> +err_alloc_ioas:
> + vfio_put_address_space(space);
> + iommufd_cdev_unbind_and_disconnect(vbasedev);
> +err_connect_bind:
> + close(vbasedev->fd);
> + return ret;
> +}
> +
> +static void iommufd_cdev_detach(VFIODevice *vbasedev)
> +{
> + VFIOContainerBase *bcontainer = vbasedev->bcontainer;
> + VFIOIOMMUFDContainer *container;
> + VFIOAddressSpace *space = bcontainer->space;
> +
> + QLIST_REMOVE(vbasedev, global_next);
> + QLIST_REMOVE(vbasedev, container_next);
> + vbasedev->bcontainer = NULL;
> +
> + if (!vbasedev->ram_block_discard_allowed) {
> + iommufd_cdev_ram_block_discard_disable(false);
> + }
> +
> + container = container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> + iommufd_cdev_detach_container(vbasedev, container);
> + iommufd_cdev_container_destroy(container);
> + vfio_put_address_space(space);
> +
> + iommufd_cdev_unbind_and_disconnect(vbasedev);
> + close(vbasedev->fd);
> +}
> +
> +const VFIOIOMMUOps vfio_iommufd_ops = {
> + .dma_map = iommufd_cdev_map,
> + .dma_unmap = iommufd_cdev_unmap,
> + .attach_device = iommufd_cdev_attach,
> + .detach_device = iommufd_cdev_detach,
> +};
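iommufd_cdev_getfd() locates the device cdev by reading <sysfsdev>/vfio-dev/vfioX/dev, whose contents are the text "major:minor", and turning that into a dev_t for open_cdev(). A standalone sketch of the parsing step follows; `parse_devt()` and `is_vfio_entry()` are hypothetical helpers, and plain C strings replace the GLib calls. As a side observation on the quoted v6 code, the `goto out_free_dev_path` taken when sscanf() fails appears to skip `g_free(contents)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/sysmacros.h>  /* major(), minor(), makedev() on glibc */
#include <sys/types.h>

/* The patch picks the first directory entry whose name starts with "vfio". */
static bool is_vfio_entry(const char *name)
{
    return strncmp(name, "vfio", 4) == 0;
}

/* Parses the "major:minor\n" text of a sysfs "dev" file into a dev_t. */
static bool parse_devt(const char *contents, dev_t *out)
{
    unsigned int maj, min;

    if (sscanf(contents, "%u:%u", &maj, &min) != 2) {
        return false;  /* malformed contents: caller reports an error */
    }
    *out = makedev(maj, min);
    return true;
}
```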
> diff --git a/hw/vfio/meson.build b/hw/vfio/meson.build
> index eb6ce6229d..e5d98b6adc 100644
> --- a/hw/vfio/meson.build
> +++ b/hw/vfio/meson.build
> @@ -7,6 +7,9 @@ vfio_ss.add(files(
> 'spapr.c',
> 'migration.c',
> ))
> +vfio_ss.add(when: 'CONFIG_IOMMUFD', if_true: files(
> + 'iommufd.c',
> +))
> vfio_ss.add(when: 'CONFIG_VFIO_PCI', if_true: files(
> 'display.c',
> 'pci-quirks.c',
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 08a1f9dfa4..5d3e9e8cee 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -164,3 +164,13 @@ vfio_state_pending_estimate(const char *name, uint64_t precopy, uint64_t postcop
> vfio_state_pending_exact(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t stopcopy_size, uint64_t precopy_init_size, uint64_t precopy_dirty_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" stopcopy size 0x%"PRIx64" precopy initial size 0x%"PRIx64" precopy dirty size 0x%"PRIx64
> vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> vfio_vmstate_change_prepare(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
> +
> +#iommufd.c
> +
> +iommufd_cdev_connect_and_bind(int iommufd, const char *name, int devfd, int devid) " [iommufd=%d] Successfully bound device %s (fd=%d): output devid=%d"
> +iommufd_cdev_getfd(const char *dev, int devfd) " %s (fd=%d)"
> +iommufd_cdev_attach_ioas_hwpt(int iommufd, const char *name, int devfd, const char *str, int hwptid) " [iommufd=%d] Successfully attached device %s (%d) to %s_id=%d"
> +iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name, const char *str, int hwptid) " [iommufd=%d] Successfully detached %s from %s_id=%d"
> +iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
> +iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
> +iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
* Re: [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes
2023-11-14 10:09 ` [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes Zhenzhong Duan
@ 2023-11-14 13:46 ` Cédric Le Goater
2023-11-15 2:36 ` Duan, Zhenzhong
2023-11-15 16:25 ` Eric Auger
0 siblings, 2 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 13:46 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Some vIOMMUs, such as virtio-iommu, use the host-side IOVA ranges to
> set up reserved ranges for a passthrough device, so that the guest will
> not use an IOVA range beyond what the host supports.
>
> Use an IOMMUFD uAPI to get the host-side IOVA ranges and pass them to
> the vIOMMU, just like the legacy backend does. If this fails, fall back
> to a 64-bit IOVA range.
>
> Also use the out_iova_alignment returned by the uAPI as pgsizes instead
> of qemu_real_host_page_size(), which is kept as a fallback.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v6: propagate the iommufd_cdev_get_info_iova_range error and print it as a warning
>
> hw/vfio/iommufd.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++-
> 1 file changed, 54 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 06282d885c..e5bf528e89 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -267,6 +267,53 @@ static int iommufd_cdev_ram_block_discard_disable(bool state)
> return ram_block_uncoordinated_discard_disable(state);
> }
>
> +static int iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
> + uint32_t ioas_id, Error **errp)
> +{
> + VFIOContainerBase *bcontainer = &container->bcontainer;
> + struct iommu_ioas_iova_ranges *info;
> + struct iommu_iova_range *iova_ranges;
> + int ret, sz, fd = container->be->fd;
> +
> + info = g_malloc0(sizeof(*info));
> + info->size = sizeof(*info);
> + info->ioas_id = ioas_id;
> +
> + ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
> + if (ret && errno != EMSGSIZE) {
> + goto error;
> + }
> +
> + sz = info->num_iovas * sizeof(struct iommu_iova_range);
> + info = g_realloc(info, sizeof(*info) + sz);
> + info->allowed_iovas = (uintptr_t)(info + 1);
> +
> + ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
> + if (ret) {
> + goto error;
> + }
> +
> + iova_ranges = (struct iommu_iova_range *)(uintptr_t)info->allowed_iovas;
> +
> + for (int i = 0; i < info->num_iovas; i++) {
> + Range *range = g_new(Range, 1);
> +
> + range_set_bounds(range, iova_ranges[i].start, iova_ranges[i].last);
> + bcontainer->iova_ranges =
> + range_list_insert(bcontainer->iova_ranges, range);
> + }
> + bcontainer->pgsizes = info->out_iova_alignment;
> +
> + g_free(info);
> + return 0;
> +
> +error:
> + ret = -errno;
> + g_free(info);
> + error_setg_errno(errp, errno, "Cannot get IOVA ranges");
> + return ret;
> +}
> +
> static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp)
> {
> @@ -341,7 +388,13 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> goto err_discard_disable;
> }
>
> - bcontainer->pgsizes = qemu_real_host_page_size();
> + ret = iommufd_cdev_get_info_iova_range(container, ioas_id, &err);
> + if (ret) {
> + warn_report_err(err);
> + err = NULL;
> + error_printf("Fallback to default 64bit IOVA range and 4K page size\n");
This would be better:
error_append_hint(&err,
"Fallback to default 64bit IOVA range and 4K page size\n");
warn_report_err(err);
I will take care of it if you agree. With that,
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> + bcontainer->pgsizes = qemu_real_host_page_size();
> + }
>
> bcontainer->listener = vfio_memory_listener;
> memory_listener_register(&bcontainer->listener, bcontainer->space->as);
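As a reading aid for iommufd_cdev_get_info_iova_range() above: it calls IOMMU_IOAS_IOVA_RANGES once with no range array (tolerating EMSGSIZE) to learn num_iovas, grows the buffer, calls it again, and on any error the caller keeps qemu_real_host_page_size() as the page size. The standalone sketch below replays that two-pass pattern. The kernel is replaced by a hypothetical fake_iova_ranges_ioctl(), the structs are simplified stand-ins rather than the real uAPI layout, the range array is a separate allocation instead of being placed right after the header as the patch does, and "an empty list means the whole 64-bit space" is only this sketch's convention.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for the iommufd uAPI structs. */
struct range {
    uint64_t start, last;           /* inclusive bounds */
};

struct ranges_query {
    uint32_t num_iovas;             /* in: array capacity, out: ranges available */
    uint64_t out_iova_alignment;    /* out: minimum IOVA alignment */
    struct range *allowed;          /* caller-owned array */
};

/* Fake "kernel": two usable windows around a hole. Setting fake_fail_errno
 * simulates an ioctl that is not supported. */
static const struct range host_ranges[] = {
    { 0x0ULL, 0xfedfffffULL },
    { 0xfef00000ULL, 0xffffffffffffULL },
};
static int fake_fail_errno;

static int fake_iova_ranges_ioctl(struct ranges_query *q)
{
    const uint32_t avail = sizeof(host_ranges) / sizeof(host_ranges[0]);
    uint32_t copy = q->num_iovas < avail ? q->num_iovas : avail;

    if (fake_fail_errno) {
        errno = fake_fail_errno;
        return -1;
    }
    if (copy) {
        memcpy(q->allowed, host_ranges, copy * sizeof(host_ranges[0]));
    }
    q->num_iovas = avail;           /* always report the real count */
    q->out_iova_alignment = 4096;
    if (copy < avail) {
        errno = EMSGSIZE;           /* array too small: caller must retry */
        return -1;
    }
    return 0;
}

/* Two-pass query. Returns 0 on success, or -errno after applying the
 * fallback (no ranges, host page size). The caller frees *out. */
static int get_iova_ranges(struct range **out, size_t *n, uint64_t *pgsize,
                           uint64_t host_page_size)
{
    struct ranges_query q = { 0 };
    int err;

    assert(out && n && pgsize);

    /* Pass 1: empty array; EMSGSIZE only means "here is the count". */
    if (fake_iova_ranges_ioctl(&q) && errno != EMSGSIZE) {
        err = errno;
        goto fallback;
    }

    /* Pass 2: size the array from num_iovas and ask again. */
    q.allowed = calloc(q.num_iovas ? q.num_iovas : 1, sizeof(*q.allowed));
    if (!q.allowed) {
        err = ENOMEM;
        goto fallback;
    }
    if (fake_iova_ranges_ioctl(&q)) {
        err = errno;                /* saved before free() */
        free(q.allowed);
        goto fallback;
    }
    *out = q.allowed;
    *n = q.num_iovas;
    *pgsize = q.out_iova_alignment;
    return 0;

fallback:
    *out = NULL;
    *n = 0;
    *pgsize = host_page_size;
    return -err;
}
```

Saving errno into err before any cleanup keeps the reported error tied to the failing ioctl rather than to whatever runs afterwards.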
* Re: [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface
2023-11-14 10:09 ` [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
@ 2023-11-14 13:51 ` Cédric Le Goater
2023-11-15 2:55 ` Duan, Zhenzhong
2023-11-15 17:54 ` Eric Auger
1 sibling, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 13:51 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Legacy VFIO PCI and iommufd cdev use different processes to hot reset
> a VFIO device. Expand the current code to abstract out a pci_hot_reset
> callback for legacy VFIO; the same interface will also be used by the
> iommufd cdev VFIO device.
>
> Rename vfio_pci_hot_reset to vfio_legacy_pci_hot_reset and move it
> into container.c.
>
> vfio_pci_[pre/post]_reset and vfio_pci_host_match are exported so
> they can be called from both the legacy and iommufd pci_hot_reset callbacks.
vfio_pci_host_match() is never used outside of the legacy reset callback.
Do you have future plans for it?
Thanks,
C.
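Cédric's question concerns vfio_pci_host_match(), which the moved reset code calls for every dependent device reported by the kernel: it rebuilds a PCIHostDeviceAddress from segment, bus and devfn (via PCI_SLOT()/PCI_FUNC()) and compares it with a device name. A standalone sketch of that step, assuming the name is the canonical DDDD:BB:SS.F form, which is consistent with the char tmp[13] buffer visible in the diff; host_addr, host_from_dep() and host_match() are hypothetical names and the body is not copied from QEMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Same bit layout as PCI_SLOT()/PCI_FUNC(): devfn = slot << 3 | function. */
#define DEVFN_SLOT(devfn) (((devfn) >> 3) & 0x1f)
#define DEVFN_FUNC(devfn) ((devfn) & 0x07)

/* Hypothetical mirror of PCIHostDeviceAddress. */
struct host_addr {
    unsigned int domain, bus, slot, function;
};

/* Build an address from one dependent-device entry (segment, bus, devfn). */
static struct host_addr host_from_dep(uint16_t segment, uint8_t bus,
                                      uint8_t devfn)
{
    struct host_addr h = { segment, bus, DEVFN_SLOT(devfn), DEVFN_FUNC(devfn) };
    return h;
}

/* "DDDD:BB:SS.F" is 12 characters plus NUL, hence a 13-byte buffer. */
static bool host_match(const struct host_addr *a, const char *name)
{
    char tmp[13];

    snprintf(tmp, sizeof(tmp), "%04x:%02x:%02x.%1x",
             a->domain, a->bus, a->slot, a->function);
    return strcmp(tmp, name) == 0;
}
```

For example, devfn 0x1a on bus 0x3b decodes to slot 3, function 2 and matches "0000:3b:03.2".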
>
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v6: pci_hot_reset returns -errno on failure
>
> hw/vfio/pci.h | 3 +
> include/hw/vfio/vfio-container-base.h | 3 +
> hw/vfio/container.c | 170 ++++++++++++++++++++++++++
> hw/vfio/pci.c | 168 +------------------------
> 4 files changed, 182 insertions(+), 162 deletions(-)
>
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 1006061afb..6e64a2654e 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
>
> extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
>
> +void vfio_pci_pre_reset(VFIOPCIDevice *vdev);
> +void vfio_pci_post_reset(VFIOPCIDevice *vdev);
> +bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name);
> int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> struct vfio_pci_hot_reset_info **info_p);
>
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 4b6f017c6f..45bb19c767 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -106,6 +106,9 @@ struct VFIOIOMMUOps {
> int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
> int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size);
> + /* PCI specific */
> + int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
> +
> /* SPAPR specific */
> int (*add_window)(VFIOContainerBase *bcontainer,
> MemoryRegionSection *section,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index ed2d721b2b..1dbf9b9a17 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -33,6 +33,7 @@
> #include "trace.h"
> #include "qapi/error.h"
> #include "migration/migration.h"
> +#include "pci.h"
>
> VFIOGroupList vfio_group_list =
> QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -922,6 +923,174 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
> vfio_put_group(group);
> }
>
> +static int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single)
> +{
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> + VFIOGroup *group;
> + struct vfio_pci_hot_reset_info *info = NULL;
> + struct vfio_pci_dependent_device *devices;
> + struct vfio_pci_hot_reset *reset;
> + int32_t *fds;
> + int ret, i, count;
> + bool multi = false;
> +
> + trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> +
> + if (!single) {
> + vfio_pci_pre_reset(vdev);
> + }
> + vdev->vbasedev.needs_reset = false;
> +
> + ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> +
> + if (ret) {
> + goto out_single;
> + }
> + devices = &info->devices[0];
> +
> + trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
> +
> + /* Verify that we have all the groups required */
> + for (i = 0; i < info->count; i++) {
> + PCIHostDeviceAddress host;
> + VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
> +
> + host.domain = devices[i].segment;
> + host.bus = devices[i].bus;
> + host.slot = PCI_SLOT(devices[i].devfn);
> + host.function = PCI_FUNC(devices[i].devfn);
> +
> + trace_vfio_pci_hot_reset_dep_devices(host.domain,
> + host.bus, host.slot, host.function, devices[i].group_id);
> +
> + if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> + continue;
> + }
> +
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + if (group->groupid == devices[i].group_id) {
> + break;
> + }
> + }
> +
> + if (!group) {
> + if (!vdev->has_pm_reset) {
> + error_report("vfio: Cannot reset device %s, "
> + "depends on group %d which is not owned.",
> + vdev->vbasedev.name, devices[i].group_id);
> + }
> + ret = -EPERM;
> + goto out;
> + }
> +
> + /* Prep dependent devices for reset and clear our marker. */
> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> + if (!vbasedev_iter->dev->realized ||
> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> + if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> + if (single) {
> + ret = -EINVAL;
> + goto out_single;
> + }
> + vfio_pci_pre_reset(tmp);
> + tmp->vbasedev.needs_reset = false;
> + multi = true;
> + break;
> + }
> + }
> + }
> +
> + if (!single && !multi) {
> + ret = -EINVAL;
> + goto out_single;
> + }
> +
> + /* Determine how many group fds need to be passed */
> + count = 0;
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + for (i = 0; i < info->count; i++) {
> + if (group->groupid == devices[i].group_id) {
> + count++;
> + break;
> + }
> + }
> + }
> +
> + reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
> + reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
> + fds = &reset->group_fds[0];
> +
> + /* Fill in group fds */
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + for (i = 0; i < info->count; i++) {
> + if (group->groupid == devices[i].group_id) {
> + fds[reset->count++] = group->fd;
> + break;
> + }
> + }
> + }
> +
> + /* Bus reset! */
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> + g_free(reset);
> + if (ret) {
> + ret = -errno;
> + }
> +
> + trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
> + ret ? strerror(errno) : "Success");
> +
> +out:
> + /* Re-enable INTx on affected devices */
> + for (i = 0; i < info->count; i++) {
> + PCIHostDeviceAddress host;
> + VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
> +
> + host.domain = devices[i].segment;
> + host.bus = devices[i].bus;
> + host.slot = PCI_SLOT(devices[i].devfn);
> + host.function = PCI_FUNC(devices[i].devfn);
> +
> + if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> + continue;
> + }
> +
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + if (group->groupid == devices[i].group_id) {
> + break;
> + }
> + }
> +
> + if (!group) {
> + break;
> + }
> +
> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> + if (!vbasedev_iter->dev->realized ||
> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> + if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> + vfio_pci_post_reset(tmp);
> + break;
> + }
> + }
> + }
> +out_single:
> + if (!single) {
> + vfio_pci_post_reset(vdev);
> + }
> + g_free(info);
> +
> + return ret;
> +}
> +
> const VFIOIOMMUOps vfio_legacy_ops = {
> .dma_map = vfio_legacy_dma_map,
> .dma_unmap = vfio_legacy_dma_unmap,
> @@ -929,4 +1098,5 @@ const VFIOIOMMUOps vfio_legacy_ops = {
> .detach_device = vfio_legacy_detach_device,
> .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
> .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
> + .pci_hot_reset = vfio_legacy_pci_hot_reset,
> };
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index eb55e8ae88..d00c3472c7 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2374,7 +2374,7 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
> return 0;
> }
>
> -static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
> +void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
> {
> PCIDevice *pdev = &vdev->pdev;
> uint16_t cmd;
> @@ -2411,7 +2411,7 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
> vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
> }
>
> -static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
> +void vfio_pci_post_reset(VFIOPCIDevice *vdev)
> {
> Error *err = NULL;
> int nr;
> @@ -2435,7 +2435,7 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
> vfio_quirk_reset(vdev);
> }
>
> -static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
> +bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
> {
> char tmp[13];
>
> @@ -2485,166 +2485,10 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>
> static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> {
> - VFIOGroup *group;
> - struct vfio_pci_hot_reset_info *info = NULL;
> - struct vfio_pci_dependent_device *devices;
> - struct vfio_pci_hot_reset *reset;
> - int32_t *fds;
> - int ret, i, count;
> - bool multi = false;
> -
> - trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> -
> - if (!single) {
> - vfio_pci_pre_reset(vdev);
> - }
> - vdev->vbasedev.needs_reset = false;
> -
> - ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> -
> - if (ret) {
> - goto out_single;
> - }
> - devices = &info->devices[0];
> -
> - trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
> -
> - /* Verify that we have all the groups required */
> - for (i = 0; i < info->count; i++) {
> - PCIHostDeviceAddress host;
> - VFIOPCIDevice *tmp;
> - VFIODevice *vbasedev_iter;
> -
> - host.domain = devices[i].segment;
> - host.bus = devices[i].bus;
> - host.slot = PCI_SLOT(devices[i].devfn);
> - host.function = PCI_FUNC(devices[i].devfn);
> -
> - trace_vfio_pci_hot_reset_dep_devices(host.domain,
> - host.bus, host.slot, host.function, devices[i].group_id);
> -
> - if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> - continue;
> - }
> -
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - if (group->groupid == devices[i].group_id) {
> - break;
> - }
> - }
> -
> - if (!group) {
> - if (!vdev->has_pm_reset) {
> - error_report("vfio: Cannot reset device %s, "
> - "depends on group %d which is not owned.",
> - vdev->vbasedev.name, devices[i].group_id);
> - }
> - ret = -EPERM;
> - goto out;
> - }
> -
> - /* Prep dependent devices for reset and clear our marker. */
> - QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> - if (!vbasedev_iter->dev->realized ||
> - vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> - continue;
> - }
> - tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> - if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> - if (single) {
> - ret = -EINVAL;
> - goto out_single;
> - }
> - vfio_pci_pre_reset(tmp);
> - tmp->vbasedev.needs_reset = false;
> - multi = true;
> - break;
> - }
> - }
> - }
> -
> - if (!single && !multi) {
> - ret = -EINVAL;
> - goto out_single;
> - }
> -
> - /* Determine how many group fds need to be passed */
> - count = 0;
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - for (i = 0; i < info->count; i++) {
> - if (group->groupid == devices[i].group_id) {
> - count++;
> - break;
> - }
> - }
> - }
> -
> - reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
> - reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
> - fds = &reset->group_fds[0];
> -
> - /* Fill in group fds */
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - for (i = 0; i < info->count; i++) {
> - if (group->groupid == devices[i].group_id) {
> - fds[reset->count++] = group->fd;
> - break;
> - }
> - }
> - }
> -
> - /* Bus reset! */
> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> - g_free(reset);
> -
> - trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
> - ret ? strerror(errno) : "Success");
> -
> -out:
> - /* Re-enable INTx on affected devices */
> - for (i = 0; i < info->count; i++) {
> - PCIHostDeviceAddress host;
> - VFIOPCIDevice *tmp;
> - VFIODevice *vbasedev_iter;
> -
> - host.domain = devices[i].segment;
> - host.bus = devices[i].bus;
> - host.slot = PCI_SLOT(devices[i].devfn);
> - host.function = PCI_FUNC(devices[i].devfn);
> -
> - if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> - continue;
> - }
> -
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - if (group->groupid == devices[i].group_id) {
> - break;
> - }
> - }
> -
> - if (!group) {
> - break;
> - }
> -
> - QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> - if (!vbasedev_iter->dev->realized ||
> - vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> - continue;
> - }
> - tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> - if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> - vfio_pci_post_reset(tmp);
> - break;
> - }
> - }
> - }
> -out_single:
> - if (!single) {
> - vfio_pci_post_reset(vdev);
> - }
> - g_free(info);
> + VFIODevice *vbasedev = &vdev->vbasedev;
> + const VFIOIOMMUOps *ops = vbasedev->bcontainer->ops;
>
> - return ret;
> + return ops->pci_hot_reset(vbasedev, single);
> }
>
> /*
* Re: [PATCH v6 10/21] vfio/pci: Allow the selection of a given iommu backend
2023-11-14 10:09 ` [PATCH v6 10/21] vfio/pci: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 13:57 ` Cédric Le Goater
0 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 13:57 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
>
> Now that we support two types of IOMMU backends, let's add the capability
> to select one of them. This depends on whether an iommufd object has
> been linked with the vfio-pci device:
>
> If the user wants to use the legacy backend, it shall not
> link the vfio-pci device with any iommufd object:
>
> -device vfio-pci,host=0000:02:00.0
>
> This is called the legacy mode/backend.
>
> If the user wants to use the iommufd backend (/dev/iommu), it
> shall pass an iommufd object id in the vfio-pci device options:
>
> -object iommufd,id=iommufd0
> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/pci.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index d00c3472c7..c5984b0598 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -19,6 +19,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
> #include <linux/vfio.h>
> #include <sys/ioctl.h>
>
> @@ -42,6 +43,7 @@
> #include "qapi/error.h"
> #include "migration/blocker.h"
> #include "migration/qemu-file.h"
> +#include "sysemu/iommufd.h"
>
> #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
>
> @@ -3386,6 +3388,10 @@ static Property vfio_pci_dev_properties[] = {
> * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
> */
> +#ifdef CONFIG_IOMMUFD
> + DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
> + TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
> DEFINE_PROP_END_OF_LIST(),
> };
>
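The selection rule in this commit message reduces to: no iommufd= link means the legacy backend, and a link to an existing iommufd object means the iommufd backend. QEMU resolves the link through the DEFINE_PROP_LINK property added in the diff, not by parsing strings; the sketch below only models the decision on a -device option string, with a hypothetical, hard-coded registry of -object iommufd ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry of "-object iommufd,id=..." instances. */
static const char *const iommufd_objects[] = { "iommufd0" };

enum backend { BACKEND_LEGACY, BACKEND_IOMMUFD, BACKEND_ERROR };

static bool iommufd_object_exists(const char *id, size_t len)
{
    size_t count = sizeof(iommufd_objects) / sizeof(iommufd_objects[0]);

    for (size_t i = 0; i < count; i++) {
        if (strlen(iommufd_objects[i]) == len &&
            strncmp(iommufd_objects[i], id, len) == 0) {
            return true;
        }
    }
    return false;
}

/* Scan comma-separated -device options such as
 * "vfio-pci,host=0000:02:00.0,iommufd=iommufd0". */
static enum backend select_backend(const char *opts)
{
    static const char key[] = "iommufd=";
    const size_t klen = sizeof(key) - 1;
    const char *p = opts;

    while (p && *p) {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len >= klen && strncmp(p, key, klen) == 0) {
            /* A link must name an existing iommufd object. */
            return iommufd_object_exists(p + klen, len - klen)
                   ? BACKEND_IOMMUFD : BACKEND_ERROR;
        }
        p = end ? end + 1 : NULL;
    }
    return BACKEND_LEGACY;          /* no link: container/group path */
}
```

The same rule applies unchanged to the vfio-platform, vfio-ap and vfio-ccw patches that follow, since each only adds the same link property.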
* Re: [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend
2023-11-14 10:09 ` [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 14:03 ` Cédric Le Goater
2023-11-17 14:55 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:03 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Now that we support two types of IOMMU backends, let's add the capability
> to select one of them. This depends on whether an iommufd object has
> been linked with the vfio-platform device:
>
> If the user wants to use the legacy backend, it shall not
> link the vfio-platform device with any iommufd object:
>
> -device vfio-platform,host=XXX
>
> This is called the legacy mode/backend.
>
> If the user wants to use the iommufd backend (/dev/iommu), it
> shall pass an iommufd object id in the vfio-platform device options:
>
> -object iommufd,id=iommufd0
> -device vfio-platform,host=XXX,iommufd=iommufd0
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> v6: Move #include "sysemu/iommufd.h" into platform.c
>
> hw/vfio/platform.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index 8e3d4ac458..98ae4bc655 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -15,11 +15,13 @@
> */
>
> #include "qemu/osdep.h"
> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
> #include "qapi/error.h"
> #include <sys/ioctl.h>
> #include <linux/vfio.h>
>
> #include "hw/vfio/vfio-platform.h"
> +#include "sysemu/iommufd.h"
> #include "migration/vmstate.h"
> #include "qemu/error-report.h"
> #include "qemu/lockable.h"
> @@ -649,6 +651,10 @@ static Property vfio_platform_dev_properties[] = {
> DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
> mmap_timeout, 1100),
> DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
> +#ifdef CONFIG_IOMMUFD
> + DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
> + TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
> DEFINE_PROP_END_OF_LIST(),
> };
>
* Re: [PATCH v6 14/21] vfio/ap: Allow the selection of a given iommu backend
2023-11-14 10:09 ` [PATCH v6 14/21] vfio/ap: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 14:03 ` Cédric Le Goater
0 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:03 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Matthew Rosato, Tony Krowiak, Halil Pasic, Jason Herne,
Thomas Huth, open list:vfio-ap
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Now that we support two types of IOMMU backends, let's add the capability
> to select one of them. This depends on whether an iommufd object has
> been linked with the vfio-ap device:
>
> If the user wants to use the legacy backend, it shall not
> link the vfio-ap device with any iommufd object:
>
> -device vfio-ap,sysfsdev=/sys/bus/mdev/devices/XXX
>
> This is called the legacy mode/backend.
>
> If the user wants to use the iommufd backend (/dev/iommu), it
> shall pass an iommufd object id in the vfio-ap device options:
>
> -object iommufd,id=iommufd0
> -device vfio-ap,sysfsdev=/sys/bus/mdev/devices/XXX,iommufd=iommufd0
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/ap.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index bbf69ff55a..80629609ae 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -11,10 +11,12 @@
> */
>
> #include "qemu/osdep.h"
> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
> #include <linux/vfio.h>
> #include <sys/ioctl.h>
> #include "qapi/error.h"
> #include "hw/vfio/vfio-common.h"
> +#include "sysemu/iommufd.h"
> #include "hw/s390x/ap-device.h"
> #include "qemu/error-report.h"
> #include "qemu/event_notifier.h"
> @@ -204,6 +206,10 @@ static void vfio_ap_unrealize(DeviceState *dev)
>
> static Property vfio_ap_properties[] = {
> DEFINE_PROP_STRING("sysfsdev", VFIOAPDevice, vdev.sysfsdev),
> +#ifdef CONFIG_IOMMUFD
> + DEFINE_PROP_LINK("iommufd", VFIOAPDevice, vdev.iommufd,
> + TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
> DEFINE_PROP_END_OF_LIST(),
> };
>
* Re: [PATCH v6 15/21] vfio/ap: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 ` [PATCH v6 15/21] vfio/ap: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 14:04 ` Cédric Le Goater
0 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:04 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Matthew Rosato, Tony Krowiak, Halil Pasic, Jason Herne,
Thomas Huth, open list:vfio-ap
On 11/14/23 11:09, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the VFIO
> cdev with privilege and pass the FD to QEMU. This way QEMU never needs
> the privilege to open a VFIO or iommu cdev node itself.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/ap.c | 23 ++++++++++++++++++++++-
> 1 file changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index 80629609ae..b21f92291e 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -160,7 +160,10 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
> VFIOAPDevice *vapdev = VFIO_AP_DEVICE(dev);
> VFIODevice *vbasedev = &vapdev->vdev;
>
> - vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
> + if (vfio_device_get_name(vbasedev, errp)) {
> + return;
> + }
> +
> vbasedev->ops = &vfio_ap_ops;
> vbasedev->type = VFIO_DEVICE_TYPE_AP;
> vbasedev->dev = dev;
> @@ -230,11 +233,28 @@ static const VMStateDescription vfio_ap_vmstate = {
> .unmigratable = 1,
> };
>
> +static void vfio_ap_instance_init(Object *obj)
> +{
> + VFIOAPDevice *vapdev = VFIO_AP_DEVICE(obj);
> +
> + vapdev->vdev.fd = -1;
> +}
> +
> +#ifdef CONFIG_IOMMUFD
> +static void vfio_ap_set_fd(Object *obj, const char *str, Error **errp)
> +{
> + vfio_device_set_fd(&VFIO_AP_DEVICE(obj)->vdev, str, errp);
> +}
> +#endif
> +
> static void vfio_ap_class_init(ObjectClass *klass, void *data)
> {
> DeviceClass *dc = DEVICE_CLASS(klass);
>
> device_class_set_props(dc, vfio_ap_properties);
> +#ifdef CONFIG_IOMMUFD
> + object_class_property_add_str(klass, "fd", NULL, vfio_ap_set_fd);
> +#endif
> dc->vmsd = &vfio_ap_vmstate;
> dc->desc = "VFIO-based AP device assignment";
> set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> @@ -249,6 +269,7 @@ static const TypeInfo vfio_ap_info = {
> .name = TYPE_VFIO_AP_DEVICE,
> .parent = TYPE_AP_DEVICE,
> .instance_size = sizeof(VFIOAPDevice),
> + .instance_init = vfio_ap_instance_init,
> .class_init = vfio_ap_class_init,
> };
>
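With this patch a privileged manager can open the cdev itself and hand QEMU only the descriptor through the new "fd" property; vfio_ap_instance_init() sets vdev.fd to -1 so that "no fd was passed" stays distinguishable. The sketch below shows both halves under simplifying assumptions: the fd travels as a plain decimal number (QEMU's own vfio_device_set_fd() may accept other forms, which this sketch does not model), open_for_passing() and parse_passed_fd() are hypothetical helpers, and /dev/null stands in for a VFIO cdev.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Manager side (hypothetical): open the node with privilege and format the
 * number to put in the device's "fd" property. No O_CLOEXEC, so the
 * descriptor survives an exec() of the VMM. */
static int open_for_passing(const char *path, char *buf, size_t len)
{
    int fd = open(path, O_RDWR);

    if (fd >= 0) {
        snprintf(buf, len, "%d", fd);
    }
    return fd;
}

/* VMM side (hypothetical): accept only a plain decimal number that refers
 * to an open descriptor; returns the fd or -errno. */
static int parse_passed_fd(const char *str)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno || end == str || *end != '\0' || val < 0 || val > INT_MAX) {
        return -EINVAL;
    }
    if (fcntl((int)val, F_GETFD) < 0) {
        return -errno;              /* e.g. -EBADF if nothing is open there */
    }
    return (int)val;
}
```

Validating with fcntl(F_GETFD) catches a stale or mistyped number early, before any VFIO ioctl is issued on it.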
* Re: [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend
2023-11-14 10:09 ` [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend Zhenzhong Duan
@ 2023-11-14 14:04 ` Cédric Le Goater
2023-11-15 18:45 ` Eric Farman
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:04 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Matthew Rosato, Eric Farman, Thomas Huth, open list:vfio-ccw
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Now that we support two types of IOMMU backends, let's add the capability
> to select one of them. This depends on whether an iommufd object has
> been linked with the vfio-ccw device:
>
> If the user wants to use the legacy backend, it shall not
> link the vfio-ccw device with any iommufd object:
>
> -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/XXX
>
> This is called the legacy mode/backend.
>
> If the user wants to use the iommufd backend (/dev/iommu), it
> shall pass an iommufd object id in the vfio-ccw device options:
>
> -object iommufd,id=iommufd0
> -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/XXX,iommufd=iommufd0
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/ccw.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index d857bb8d0f..d2d58bb677 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -15,12 +15,14 @@
> */
>
> #include "qemu/osdep.h"
> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
> #include <linux/vfio.h>
> #include <linux/vfio_ccw.h>
> #include <sys/ioctl.h>
>
> #include "qapi/error.h"
> #include "hw/vfio/vfio-common.h"
> +#include "sysemu/iommufd.h"
> #include "hw/s390x/s390-ccw.h"
> #include "hw/s390x/vfio-ccw.h"
> #include "hw/qdev-properties.h"
> @@ -677,6 +679,10 @@ static void vfio_ccw_unrealize(DeviceState *dev)
> static Property vfio_ccw_properties[] = {
> DEFINE_PROP_STRING("sysfsdev", VFIOCCWDevice, vdev.sysfsdev),
> DEFINE_PROP_BOOL("force-orb-pfch", VFIOCCWDevice, force_orb_pfch, false),
> +#ifdef CONFIG_IOMMUFD
> + DEFINE_PROP_LINK("iommufd", VFIOCCWDevice, vdev.iommufd,
> + TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
> DEFINE_PROP_END_OF_LIST(),
> };
>
* Re: [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 ` [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 14:04 ` Cédric Le Goater
2023-11-15 18:46 ` Eric Farman
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:04 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Matthew Rosato, Eric Farman, Thomas Huth, open list:vfio-ccw
On 11/14/23 11:09, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the VFIO
> cdev with privilege and pass the FD to QEMU. This way QEMU never needs
> the privilege to open a VFIO or iommu cdev node itself.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/ccw.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index d2d58bb677..b116b10fe7 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -590,11 +590,12 @@ static void vfio_ccw_realize(DeviceState *dev, Error **errp)
> }
> }
>
> + if (vfio_device_get_name(vbasedev, errp)) {
> + return;
> + }
> +
> vbasedev->ops = &vfio_ccw_ops;
> vbasedev->type = VFIO_DEVICE_TYPE_CCW;
> - vbasedev->name = g_strdup_printf("%x.%x.%04x", vcdev->cdev.hostid.cssid,
> - vcdev->cdev.hostid.ssid,
> - vcdev->cdev.hostid.devid);
> vbasedev->dev = dev;
>
> /*
> @@ -691,12 +692,29 @@ static const VMStateDescription vfio_ccw_vmstate = {
> .unmigratable = 1,
> };
>
> +static void vfio_ccw_instance_init(Object *obj)
> +{
> + VFIOCCWDevice *vcdev = VFIO_CCW(obj);
> +
> + vcdev->vdev.fd = -1;
> +}
> +
> +#ifdef CONFIG_IOMMUFD
> +static void vfio_ccw_set_fd(Object *obj, const char *str, Error **errp)
> +{
> + vfio_device_set_fd(&VFIO_CCW(obj)->vdev, str, errp);
> +}
> +#endif
> +
> static void vfio_ccw_class_init(ObjectClass *klass, void *data)
> {
> DeviceClass *dc = DEVICE_CLASS(klass);
> S390CCWDeviceClass *cdc = S390_CCW_DEVICE_CLASS(klass);
>
> device_class_set_props(dc, vfio_ccw_properties);
> +#ifdef CONFIG_IOMMUFD
> + object_class_property_add_str(klass, "fd", NULL, vfio_ccw_set_fd);
> +#endif
> dc->vmsd = &vfio_ccw_vmstate;
> dc->desc = "VFIO-based subchannel assignment";
> set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> @@ -714,6 +732,7 @@ static const TypeInfo vfio_ccw_info = {
> .name = TYPE_VFIO_CCW,
> .parent = TYPE_S390_CCW,
> .instance_size = sizeof(VFIOCCWDevice),
> + .instance_init = vfio_ccw_instance_init,
> .class_init = vfio_ccw_class_init,
> };
>
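The pattern in this patch (also used by the PCI and platform variants) can be sketched on its own. instance_init sets the fd to the -1 sentinel, because 0 is a valid descriptor. The "fd" property setter only stores a value that parsed successfully. The names are hypothetical, and only the numeric path of monitor_fd_param() is modelled. As I understand it, that function also accepts the name of an fd previously handed to the monitor with getfd, which this sketch leaves out.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical stand-in for the fd-carrying part of VFIODevice. */
typedef struct SketchDevice {
    int fd; /* -1: no pre-opened fd, so the cdev is opened from sysfsdev */
} SketchDevice;

/* Mirrors the instance_init hooks: 0 is a valid fd, so -1 means "unset". */
static void sketch_instance_init(SketchDevice *dev)
{
    dev->fd = -1;
}

/*
 * Parse a decimal fd string such as "23"; return the fd or -1 on error.
 * Only the numeric case is modelled, not fds registered by name.
 */
static int sketch_parse_fd(const char *str)
{
    char *end;
    long val;

    if (str == NULL || !isdigit((unsigned char)str[0])) {
        return -1;
    }
    errno = 0;
    val = strtol(str, &end, 10);
    if (errno != 0 || *end != '\0' || val > INT_MAX) {
        return -1;
    }
    return (int)val;
}

/* Like vfio_device_set_fd(): only store the fd if parsing succeeded. */
static int sketch_set_fd(SketchDevice *dev, const char *str)
{
    int fd;

    assert(dev != NULL);
    fd = sketch_parse_fd(str);
    if (fd < 0) {
        return -1; /* dev->fd keeps its previous value */
    }
    dev->fd = fd;
    return 0;
}
```

Keeping -1 until a valid value arrives lets the realize and attach paths use a single "fd < 0" test to choose between opening the cdev themselves and using the passed descriptor.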
* Re: [PATCH v6 18/21] vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks
2023-11-14 10:09 ` [PATCH v6 18/21] vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
@ 2023-11-14 14:05 ` Cédric Le Goater
2023-11-17 14:58 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:05 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Some of the callbacks in VFIOIOMMUOps take a VFIOContainerBase pointer, but
> those callbacks only need read access to the sub-objects of VFIOContainerBase.
> So make the VFIOContainerBase, VFIOContainer and VFIOIOMMUFDContainer
> pointers const in these callbacks.
>
> Local functions called by those callbacks also need the same change to
> avoid build errors.
>
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/hw/vfio/vfio-common.h | 12 ++++++----
> include/hw/vfio/vfio-container-base.h | 12 ++++++----
> hw/vfio/common.c | 9 +++----
> hw/vfio/container-base.c | 2 +-
> hw/vfio/container.c | 34 ++++++++++++++-------------
> hw/vfio/iommufd.c | 8 +++----
> 6 files changed, 42 insertions(+), 35 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 567e5f7bea..7954531d05 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -244,13 +244,15 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
> void vfio_migration_exit(VFIODevice *vbasedev);
>
> int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
> -bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer);
> -bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer);
> -int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +bool
> +vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
> +bool
> +vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
> +int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap, hwaddr iova,
> hwaddr size);
> -int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
> - uint64_t size, ram_addr_t ram_addr);
> +int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> + uint64_t size, ram_addr_t ram_addr);
>
> int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
> void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 45bb19c767..2ae297ccda 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -82,7 +82,7 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
> MemoryRegionSection *section);
> int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> bool start);
> -int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size);
>
> @@ -93,18 +93,20 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer);
>
> struct VFIOIOMMUOps {
> /* basic feature */
> - int (*dma_map)(VFIOContainerBase *bcontainer,
> + int (*dma_map)(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> void *vaddr, bool readonly);
> - int (*dma_unmap)(VFIOContainerBase *bcontainer,
> + int (*dma_unmap)(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> IOMMUTLBEntry *iotlb);
> int (*attach_device)(const char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void (*detach_device)(VFIODevice *vbasedev);
> /* migration feature */
> - int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
> - int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
> + int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
> + bool start);
> + int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
> + VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size);
> /* PCI specific */
> int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6569732b7a..08a3e57672 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -204,7 +204,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
> return true;
> }
>
> -bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
> +bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
> {
> VFIODevice *vbasedev;
>
> @@ -221,7 +221,8 @@ bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
> * Check if all VFIO devices are running and migration is active, which is
> * essentially equivalent to the migration being in pre-copy phase.
> */
> -bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer)
> +bool
> +vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer)
> {
> VFIODevice *vbasedev;
>
> @@ -1139,7 +1140,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
> return 0;
> }
>
> -int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap, hwaddr iova,
> hwaddr size)
> {
> @@ -1162,7 +1163,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> return 0;
> }
>
> -int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
> +int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> uint64_t size, ram_addr_t ram_addr)
> {
> bool all_device_dirty_tracking =
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index eee2dcfe76..1ffd25bbfa 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -63,7 +63,7 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
> }
>
> -int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size)
> {
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 1dbf9b9a17..b22feb8ded 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -61,11 +61,11 @@ static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
> }
> }
>
> -static int vfio_dma_unmap_bitmap(VFIOContainer *container,
> +static int vfio_dma_unmap_bitmap(const VFIOContainer *container,
> hwaddr iova, ram_addr_t size,
> IOMMUTLBEntry *iotlb)
> {
> - VFIOContainerBase *bcontainer = &container->bcontainer;
> + const VFIOContainerBase *bcontainer = &container->bcontainer;
> struct vfio_iommu_type1_dma_unmap *unmap;
> struct vfio_bitmap *bitmap;
> VFIOBitmap vbmap;
> @@ -117,11 +117,12 @@ unmap_exit:
> /*
> * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> */
> -static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
> - ram_addr_t size, IOMMUTLBEntry *iotlb)
> +static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> + hwaddr iova, ram_addr_t size,
> + IOMMUTLBEntry *iotlb)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> struct vfio_iommu_type1_dma_unmap unmap = {
> .argsz = sizeof(unmap),
> .flags = 0,
> @@ -174,11 +175,11 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
> return 0;
> }
>
> -static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
> +static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> struct vfio_iommu_type1_dma_map map = {
> .argsz = sizeof(map),
> .flags = VFIO_DMA_MAP_FLAG_READ,
> @@ -207,11 +208,12 @@ static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
> return -errno;
> }
>
> -static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> - bool start)
> +static int
> +vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
> + bool start)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> int ret;
> struct vfio_iommu_type1_dirty_bitmap dirty = {
> .argsz = sizeof(dirty),
> @@ -233,12 +235,12 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> return ret;
> }
>
> -static int vfio_legacy_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> struct vfio_iommu_type1_dirty_bitmap *dbitmap;
> struct vfio_iommu_type1_dirty_bitmap_get *range;
> int ret;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index e08a217057..bc45dd1842 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -26,10 +26,10 @@
> #include "qemu/chardev_open.h"
> #include "pci.h"
>
> -static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
> +static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> {
> - VFIOIOMMUFDContainer *container =
> + const VFIOIOMMUFDContainer *container =
> container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>
> return iommufd_backend_map_dma(container->be,
> @@ -37,11 +37,11 @@ static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
> iova, size, vaddr, readonly);
> }
>
> -static int iommufd_cdev_unmap(VFIOContainerBase *bcontainer,
> +static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> IOMMUTLBEntry *iotlb)
> {
> - VFIOIOMMUFDContainer *container =
> + const VFIOIOMMUFDContainer *container =
> container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>
> /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
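This standalone sketch shows why the const has to ripple through. Once a callback receives a const base pointer, the pointer recovered with container_of() must also be const-qualified. So must every local helper it is handed to. Otherwise the compiler diagnoses the discarded qualifier, which is the build error the commit message mentions. The types and the legacy_of() helper are simplified, hypothetical stand-ins. QEMU's container_of() is a generic macro rather than a per-type function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for VFIOContainerBase and friends. */
typedef struct BaseContainer BaseContainer;

typedef struct Ops {
    /* Read-only access to the container is enough, so take a const pointer. */
    int (*dma_map)(const BaseContainer *bc, uint64_t iova, uint64_t size);
} Ops;

struct BaseContainer {
    const Ops *ops;
};

typedef struct LegacyContainer {
    int fd;             /* e.g. the legacy container fd */
    BaseContainer base; /* embedded base object, like bcontainer */
} LegacyContainer;

/* const-preserving container_of() for this one type. */
static const LegacyContainer *legacy_of(const BaseContainer *bc)
{
    return (const LegacyContainer *)((const char *)bc -
                                     offsetof(LegacyContainer, base));
}

/* A local helper must take const too, or passing c below discards it. */
static int legacy_fd(const LegacyContainer *c)
{
    return c->fd;
}

static int legacy_dma_map(const BaseContainer *bc, uint64_t iova,
                          uint64_t size)
{
    const LegacyContainer *c = legacy_of(bc);

    assert(size > 0);
    (void)iova;
    /* A real backend would issue an ioctl on the fd; this just returns it. */
    return legacy_fd(c);
}

static const Ops legacy_ops = { .dma_map = legacy_dma_map };
```

Here the const does not change the generated code. It documents that these callbacks only read the container and lets the compiler flag any callback that tries to modify it.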
* Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 ` [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 14:08 ` Cédric Le Goater
2023-11-15 12:09 ` Philippe Mathieu-Daudé
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:08 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the vfio
> cdev with privileges and pass the FD to QEMU. This way QEMU never needs
> the privilege to open a VFIO or iommu cdev node.
>
> Together with the earlier support for pre-opening the /dev/iommu device,
> we now have full support for a management tool passing a vfio device to
> an unprivileged QEMU. This mode is no longer considered for the legacy
> backend, so let's remove the "TODO" comment.
>
> Add helper functions vfio_device_set_fd() and vfio_device_get_name()
> to set the fd and get the device name; they will also be used by other
> vfio devices.
>
> There is no easy way to check if a device is an mdev with FD passing,
> so fail the x-balloon-allowed check unconditionally in this case.
>
> There is also no easy way to get the BDF as a name with FD passing, so
> we fake a name of the form VFIO_FD<fd>, e.g. VFIO_FD23.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v6: simplify CONFIG_IOMMUFD checking code
> introduce a helper vfio_device_set_fd
>
> include/hw/vfio/vfio-common.h | 3 +++
> hw/vfio/helpers.c | 44 +++++++++++++++++++++++++++++++++++
> hw/vfio/iommufd.c | 12 ++++++----
> hw/vfio/pci.c | 28 ++++++++++++----------
> 4 files changed, 71 insertions(+), 16 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 3dac5c167e..567e5f7bea 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -251,4 +251,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> hwaddr size);
> int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
> uint64_t size, ram_addr_t ram_addr);
> +
> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
> #endif /* HW_VFIO_VFIO_COMMON_H */
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index 168847e7c5..986ef1095a 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -20,6 +20,7 @@
> */
>
> #include "qemu/osdep.h"
> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
Unused now. I removed it.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> #include <sys/ioctl.h>
>
> #include "hw/vfio/vfio-common.h"
> @@ -27,6 +28,7 @@
> #include "trace.h"
> #include "qapi/error.h"
> #include "qemu/error-report.h"
> +#include "monitor/monitor.h"
>
> /*
> * Common VFIO interrupt disable
> @@ -609,3 +611,45 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
>
> return ret;
> }
> +
> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
> +{
> + struct stat st;
> +
> + if (vbasedev->fd < 0) {
> + if (stat(vbasedev->sysfsdev, &st) < 0) {
> + error_setg_errno(errp, errno, "no such host device");
> + error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
> + return -errno;
> + }
> + /* User may specify a name, e.g: VFIO platform device */
> + if (!vbasedev->name) {
> + vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
> + }
> + } else {
> + if (!vbasedev->iommufd) {
> + error_setg(errp, "Use FD passing only with iommufd backend");
> + return -EINVAL;
> + }
> + /*
> + * Give a name with fd so any function printing out vbasedev->name
> + * will not break.
> + */
> + if (!vbasedev->name) {
> + vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
> + }
> + }
> +
> + return 0;
> +}
> +
> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
> +{
> + int fd = monitor_fd_param(monitor_cur(), str, errp);
> +
> + if (fd < 0) {
> + error_prepend(errp, "Could not parse remote object fd %s:", str);
> + return;
> + }
> + vbasedev->fd = fd;
> +}
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 3eec428162..e08a217057 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -326,11 +326,15 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> uint32_t ioas_id;
> Error *err = NULL;
>
> - devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> - if (devfd < 0) {
> - return devfd;
> + if (vbasedev->fd < 0) {
> + devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> + if (devfd < 0) {
> + return devfd;
> + }
> + vbasedev->fd = devfd;
> + } else {
> + devfd = vbasedev->fd;
> }
> - vbasedev->fd = devfd;
>
> ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
> if (ret) {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c5984b0598..b23b492cce 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2944,17 +2944,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> VFIODevice *vbasedev = &vdev->vbasedev;
> char *tmp, *subsys;
> Error *err = NULL;
> - struct stat st;
> int i, ret;
> bool is_mdev;
> char uuid[UUID_STR_LEN];
> char *name;
>
> - if (!vbasedev->sysfsdev) {
> + if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
> if (!(~vdev->host.domain || ~vdev->host.bus ||
> ~vdev->host.slot || ~vdev->host.function)) {
> error_setg(errp, "No provided host device");
> error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F "
> +#ifdef CONFIG_IOMMUFD
> + "or -device vfio-pci,fd=DEVICE_FD "
> +#endif
> "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
> return;
> }
> @@ -2964,13 +2966,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> vdev->host.slot, vdev->host.function);
> }
>
> - if (stat(vbasedev->sysfsdev, &st) < 0) {
> - error_setg_errno(errp, errno, "no such host device");
> - error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
> + if (vfio_device_get_name(vbasedev, errp)) {
> return;
> }
> -
> - vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
> vbasedev->ops = &vfio_pci_ops;
> vbasedev->type = VFIO_DEVICE_TYPE_PCI;
> vbasedev->dev = DEVICE(vdev);
> @@ -3330,6 +3328,7 @@ static void vfio_instance_init(Object *obj)
> vdev->host.bus = ~0U;
> vdev->host.slot = ~0U;
> vdev->host.function = ~0U;
> + vdev->vbasedev.fd = -1;
>
> vdev->nv_gpudirect_clique = 0xFF;
>
> @@ -3383,11 +3382,6 @@ static Property vfio_pci_dev_properties[] = {
> qdev_prop_nv_gpudirect_clique, uint8_t),
> DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
> OFF_AUTOPCIBAR_OFF),
> - /*
> - * TODO - support passed fds... is this necessary?
> - * DEFINE_PROP_STRING("vfiofd", VFIOPCIDevice, vfiofd_name),
> - * DEFINE_PROP_STRING("vfiogroupfd, VFIOPCIDevice, vfiogroupfd_name),
> - */
> #ifdef CONFIG_IOMMUFD
> DEFINE_PROP_LINK("iommufd", VFIOPCIDevice, vbasedev.iommufd,
> TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> @@ -3395,6 +3389,13 @@ static Property vfio_pci_dev_properties[] = {
> DEFINE_PROP_END_OF_LIST(),
> };
>
> +#ifdef CONFIG_IOMMUFD
> +static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
> +{
> + vfio_device_set_fd(&VFIO_PCI(obj)->vbasedev, str, errp);
> +}
> +#endif
> +
> static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
> {
> DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -3402,6 +3403,9 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, void *data)
>
> dc->reset = vfio_pci_reset;
> device_class_set_props(dc, vfio_pci_dev_properties);
> +#ifdef CONFIG_IOMMUFD
> + object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
> +#endif
> dc->desc = "VFIO-based PCI device assignment";
> set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> pdc->realize = vfio_realize;
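The naming rules of vfio_device_get_name() can be condensed into a standalone sketch. Without a passed fd, the name defaults to the basename of the sysfs path, unless a name was already set. With a passed fd there is no path to derive a BDF from, so a placeholder VFIO_FD<n> name is built, and only the iommufd backend accepts that mode. The function and its parameters are hypothetical. Plain libc replaces the GLib helpers, and the stat() check and Error plumbing are left out. strrchr() serves as a simplified basename that does not handle trailing slashes the way g_path_get_basename() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical condensed form of the naming rules in vfio_device_get_name().
 * Writes the chosen name into buf; returns 0 on success, -1 on error.
 */
static int sketch_device_name(int fd, const char *sysfsdev, bool has_iommufd,
                              const char *user_name, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);

    if (fd < 0) {
        const char *slash;

        /* The real helper also stat()s sysfsdev and fails if it is absent. */
        if (sysfsdev == NULL) {
            return -1;
        }
        if (user_name != NULL) {
            /* e.g. a VFIO platform device may come with its own name */
            snprintf(buf, len, "%s", user_name);
            return 0;
        }
        /* Simplified basename: no special handling of trailing slashes. */
        slash = strrchr(sysfsdev, '/');
        snprintf(buf, len, "%s", slash != NULL ? slash + 1 : sysfsdev);
        return 0;
    }

    /* A passed fd is only accepted together with the iommufd backend. */
    if (!has_iommufd) {
        return -1;
    }
    if (user_name != NULL) {
        snprintf(buf, len, "%s", user_name);
    } else {
        /* No sysfs path, hence no BDF: fake a name from the fd number. */
        snprintf(buf, len, "VFIO_FD%d", fd);
    }
    return 0;
}
```

The placeholder exists only so that code printing vbasedev->name keeps working. It carries no information beyond the fd number.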
* Re: [PATCH v6 13/21] vfio/platform: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 ` [PATCH v6 13/21] vfio/platform: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
@ 2023-11-14 14:22 ` Cédric Le Goater
0 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:22 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the vfio
> cdev with privileges and pass the FD to QEMU. This way QEMU never needs
> the privilege to open a VFIO or iommu cdev node.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/platform.c | 32 ++++++++++++++++++++++++--------
> 1 file changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index 98ae4bc655..a97d9c6234 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -531,14 +531,13 @@ static VFIODeviceOps vfio_platform_ops = {
> */
> static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
> {
> - struct stat st;
> int ret;
>
> - /* @sysfsdev takes precedence over @host */
> - if (vbasedev->sysfsdev) {
> + /* @fd takes precedence over @sysfsdev which takes precedence over @host */
> + if (vbasedev->fd < 0 && vbasedev->sysfsdev) {
> g_free(vbasedev->name);
> vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
> - } else {
> + } else if (vbasedev->fd < 0) {
> if (!vbasedev->name || strchr(vbasedev->name, '/')) {
> error_setg(errp, "wrong host device name");
> return -EINVAL;
> @@ -548,10 +547,9 @@ static int vfio_base_device_init(VFIODevice *vbasedev, Error **errp)
> vbasedev->name);
> }
>
> - if (stat(vbasedev->sysfsdev, &st) < 0) {
> - error_setg_errno(errp, errno,
> - "failed to get the sysfs host device file status");
> - return -errno;
> + ret = vfio_device_get_name(vbasedev, errp);
> + if (ret) {
> + return ret;
> }
>
> ret = vfio_attach_device(vbasedev->name, vbasedev,
> @@ -658,6 +656,20 @@ static Property vfio_platform_dev_properties[] = {
> DEFINE_PROP_END_OF_LIST(),
> };
>
> +static void vfio_platform_instance_init(Object *obj)
> +{
> + VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(obj);
> +
> + vdev->vbasedev.fd = -1;
> +}
> +
> +#ifdef CONFIG_IOMMUFD
> +static void vfio_platform_set_fd(Object *obj, const char *str, Error **errp)
> +{
> + vfio_device_set_fd(&VFIO_PLATFORM_DEVICE(obj)->vbasedev, str, errp);
> +}
> +#endif
> +
> static void vfio_platform_class_init(ObjectClass *klass, void *data)
> {
> DeviceClass *dc = DEVICE_CLASS(klass);
> @@ -665,6 +677,9 @@ static void vfio_platform_class_init(ObjectClass *klass, void *data)
>
> dc->realize = vfio_platform_realize;
> device_class_set_props(dc, vfio_platform_dev_properties);
> +#ifdef CONFIG_IOMMUFD
> + object_class_property_add_str(klass, "fd", NULL, vfio_platform_set_fd);
> +#endif
> dc->vmsd = &vfio_platform_vmstate;
> dc->desc = "VFIO-based platform device assignment";
> sbc->connect_irq_notifier = vfio_start_irqfd_injection;
> @@ -677,6 +692,7 @@ static const TypeInfo vfio_platform_dev_info = {
> .name = TYPE_VFIO_PLATFORM,
> .parent = TYPE_SYS_BUS_DEVICE,
> .instance_size = sizeof(VFIOPlatformDevice),
> + .instance_init = vfio_platform_instance_init,
> .class_init = vfio_platform_class_init,
> .class_size = sizeof(VFIOPlatformDeviceClass),
> };
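The precedence stated in the new comment (@fd over @sysfsdev over @host) reads naturally as a small decision function. This is a hypothetical sketch. The /sys/bus/platform/devices/ prefix is an assumption about the path vfio_base_device_init() builds from the host name, because that line is elided from the hunk above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum {
    SRC_ERROR = -1,
    SRC_FD,       /* pre-opened cdev fd passed by a management tool */
    SRC_SYSFSDEV, /* explicit sysfs path */
    SRC_HOST,     /* bare host device name */
} DeviceSource;

/*
 * Hypothetical sketch of the precedence in vfio_base_device_init():
 * @fd > @sysfsdev > @host. For SRC_SYSFSDEV and SRC_HOST, the sysfs path
 * that would be used is written into path.
 */
static DeviceSource sketch_pick_source(int fd, const char *sysfsdev,
                                       const char *host, char *path,
                                       size_t len)
{
    assert(path != NULL && len > 0);

    if (fd >= 0) {
        return SRC_FD; /* nothing to look up in sysfs */
    }
    if (sysfsdev != NULL) {
        snprintf(path, len, "%s", sysfsdev);
        return SRC_SYSFSDEV;
    }
    /* Same sanity check as the patch: a host name must not contain '/'. */
    if (host == NULL || strchr(host, '/') != NULL) {
        return SRC_ERROR;
    }
    /* Assumed prefix for platform devices (see the lead-in). */
    snprintf(path, len, "/sys/bus/platform/devices/%s", host);
    return SRC_HOST;
}
```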
* Re: [PATCH v6 00/21] vfio: Adopt iommufd
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (20 preceding siblings ...)
2023-11-14 10:09 ` [PATCH v6 21/21] hw/i386: Activate IOMMUFD for q35 machines Zhenzhong Duan
@ 2023-11-14 14:51 ` Cédric Le Goater
2023-11-15 4:16 ` Duan, Zhenzhong
2023-11-20 9:15 ` Eric Auger
22 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-14 14:51 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
Hello Zhenzhong,
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Hi,
>
> Thanks all for the guidance and comments on the previous series; this is
> the remaining part of the iommufd support.
>
> Based on Cédric's suggestion, replace the old config method for IOMMUFD
> with Kconfig.
>
> Based on Jason's suggestion, drop the implementation that manually
> allocates the hwpt and switch to IOAS attach/detach.
>
> Besides the current tests, we also tested mdev with mtty for better coverage.
>
> PATCH 1: Introduce iommufd object
> PATCH 2-9: add IOMMUFD container and cdev support
> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
> PATCH 18: make VFIOContainerBase parameter const
> PATCH 19-21: Activate IOMMUFD for arm, s390x and x86
>
>
> We have done wide testing with different combinations, e.g.:
> - PCI devices were tested
> - FD passing and hot reset, with some tricks
> - device hotplug with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - devices linked to different iommufds
> - VFIO migration with an E800 net card (no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limits
> - mdev passthrough with mtty, mixed with real devices and different BEs
>
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - etc.
>
>
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6
> Based on vfio-next, commit id: 1a22fb936e
I just had a few comments, which I addressed myself in:
https://github.com/legoater/qemu/commits/vfio-8.2
Please take a look and I will possibly merge the modified v6 in vfio-next
after a few days or next week when people are back from LPC.
I would like to propose this series for an early merge in QEMU 9.0. So, we
have a few weeks to polish and test a bit more. I didn't get feedback from
ARM for instance (that's a message in a bottle for Eric :).
My last request would be to take some of the documentation, below in this
email, and include it in QEMU. It can be done later.
Thanks,
C.
>
> --------------------------------------------------------------------------
>
> Below is some background and a diagram of the design:
>
> With the introduction of iommufd, the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to the
> kernel for assigned devices. This series ports the VFIO devices onto
> the /dev/iommu uAPI and lets it coexist with the legacy implementation.
>
> At QEMU level, interactions with the /dev/iommu are abstracted by a new
> iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
> Any QEMU device (e.g. a vfio device) wishing to use /dev/iommu must be
> linked with an iommufd object. In this series, the vfio-pci device is
> granted such a capability (other VFIO devices are not yet ready):
>
> It gets a new optional parameter named iommufd, which allows passing
> an iommufd object:
>
> -object iommufd,id=iommufd0
> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
> Note that /dev/iommu and the vfio cdev can be opened externally by a
> management layer. In such a case, the fd is passed:
>
> -object iommufd,id=iommufd0,fd=22
> -device vfio-pci,iommufd=iommufd0,fd=23
>
> If the fd parameter is not passed, the fd is opened by QEMU.
> See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
> for a detailed discussion of this requirement.
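A management layer using the numeric fd= form has to keep the descriptors open across the exec() of QEMU. This is a minimal launcher-side sketch. It assumes the tool opens the nodes itself and then execs QEMU directly. It clears FD_CLOEXEC and formats option values like the example lines above. The function names are hypothetical, the exec() step is omitted, and this is not a description of how libvirt does it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Clear FD_CLOEXEC so that fd stays open in the exec()ed QEMU process. */
static int sketch_make_inheritable(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0) {
        return -1;
    }
    /* POSIX only promises "not -1" on success, so normalize to 0. */
    return fcntl(fd, F_SETFD, flags & ~FD_CLOEXEC) == -1 ? -1 : 0;
}

/*
 * Format the -object/-device option values for the given fd numbers.
 * Returns 0 on success, -1 on a bad id or a too-small buffer.
 */
static int sketch_format_args(const char *id, int iommu_fd, int dev_fd,
                              char *obj, size_t obj_len,
                              char *dev, size_t dev_len)
{
    int n;

    assert(id != NULL && iommu_fd >= 0 && dev_fd >= 0);
    /* QEMU's option syntax treats ',' as a separator, so reject it here. */
    if (strchr(id, ',') != NULL) {
        return -1;
    }
    n = snprintf(obj, obj_len, "iommufd,id=%s,fd=%d", id, iommu_fd);
    if (n < 0 || (size_t)n >= obj_len) {
        return -1;
    }
    n = snprintf(dev, dev_len, "vfio-pci,iommufd=%s,fd=%d", id, dev_fd);
    if (n < 0 || (size_t)n >= dev_len) {
        return -1;
    }
    return 0;
}
```

Only the descriptor flags matter for inheritance, so /dev/null can stand in for /dev/iommu and the vfio cdev when trying this out.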
>
> If no iommufd option is passed to the vfio-pci device, iommufd is not
> used and the end-user gets the behavior based on the legacy vfio iommu
> interfaces:
>
> -device vfio-pci,host=0000:02:00.0
>
> While the legacy kernel interface is group-centric, the new iommufd
> interface is device-centric, relying on device fd and iommufd.
>
> To support both interfaces in the QEMU VFIO device, we reworked the vfio
> container abstraction so that the generic VFIO code can use either
> backend.
>
> The VFIOContainer object becomes a base object derived into
> a) the legacy VFIO container and
> b) the new iommufd based container.
>
> The base object implements generic code, such as code related to the
> memory_listener and address space management, whereas the derived
> objects implement callbacks specific to each BE, legacy or iommufd.
> Indeed, each backend has its own way to set up the secure context and
> the DMA management interface. The diagram below shows what it looks
> like with both BEs.
>
>                         VFIO                        AddressSpace/Memory
>     +-------+  +----------+  +-----+  +-----+
>     |  pci  |  | platform |  |  ap |  | ccw |
>     +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>         |           |           |        |        |   AddressSpace       |
>         |           |           |        |        +------------+---------+
>     +---V-----------V-----------V--------V----+               /
>     |           VFIOAddressSpace              | <------------+
>     |                  |                      |  MemoryListener
>     |          VFIOContainer list             |
>     +-------+----------------------------+----+
>             |                            |
>             |                            |
>     +-------V------+            +--------V----------+
>     |   iommufd    |            |    vfio legacy    |
>     |  container   |            |     container     |
>     +-------+------+            +--------+----------+
>             |                            |
>             | /dev/iommu                 | /dev/vfio/vfio
>             | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
> Userspace   |                            |
> ============+============================+===========================
> Kernel      |  device fd                 |
>             +---------------+            | group/container fd
>             | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>             |  ATTACH_IOAS) |            | device fd
>             |               |            |
>             |       +-------V------------V-----------------+
>     iommufd |       |                vfio                  |
> (map/unmap  |       +---------+--------------------+-------+
> ioas_copy)  |                 |                    | map/unmap
>             |                 |                    |
>      +------V-------+   +-----V------+      +------V--------+
>      | iommufd core |   |   device   |      |  vfio iommu   |
>      +--------------+   +------------+      +---------------+
>
> [Secure Context setup]
> - iommufd BE: uses the device fd and iommufd to set up the secure context
> (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses the group fd and container fd to set up the secure context
> (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> 2. VFIO populates DMA map/unmap via the container BEs
> *) iommufd BE: uses iommufd
> *) vfio legacy BE: uses container fd
>
>
> Changelog:
> v6:
> - simplify CONFIG_IOMMUFD checking code further (Cédric)
> - check iommufd_cdev_kvm_device_add return value (Cédric)
> - dirrectory -> directory (Cédric)
> - propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric)
> - introduce a helper vfio_device_set_fd (Cédric)
> - Move #include "sysemu/iommufd.h" in platform.c (Cédric)
> - simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas
> - Dare to keep Matthew's RB as related change is minor
>
> v5:
> - Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric)
> - Add (uintptr_t) to info->allowed_iovas (Cédric)
> - Switch to IOAS attach/detach and hide hwpt (Jason)
> - move chardev_open.[h|c] under the IOMMUFD entry (Cédric)
> - Move vfio_legacy_pci_hot_reset into container.c (Cédric)
> - Add missed pgsizes initialization in vfio_get_info_iova_range
> - split linking iommufd patch into three to be cleaner
> - Fix comments on PCI BAR unmap
>
> v4:
> - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
> - add doc for default case without fd (Markus)
> - Fix build issue reported by Markus and Cédric
> - Simply use SPDX identifier in new file (Cédric)
> - make vfio_container_init/destroy helper a separate patch (Cédric)
> - make vrdl_list movement a separate patch (Cédric)
> - add const for some callback parameters (Cédric)
> - add g_assert in VFIOIOMMUOps callback (Cédric)
> - introduce pci_hot_reset callback (Cédric)
> - remove VFIOIOMMUSpaprOps (Cédric)
> - initialize g_autofree to NULL (Cédric)
> - adjust func name prefix and trace event in iommufd.c (Cédric)
> - add RB
>
> v3:
> - Rename base container as VFIOContainerBase and legacy container as container (Cédric)
> - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
> - Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
> - Introduce vfio_iommu_spapr_ops (Cédric)
> - Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
> - patch19 and patch21 are split into two parts to facilitate review
>
> v2:
> - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
> - add fd passing to platform/ap/ccw vfio device
> - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
> - rename char_dev.h to chardev_open.h for same naming scheme per Daniel
> - add full copyright per Daniel and Jason
>
>
> Note: the changelog entries below are from the full IOMMUFD series:
>
> v1:
> - Alloc hwpt instead of using auto hwpt
> - elaborate iommufd code per Nicolin
> - consolidate two patches and drop as.c
> - typo error fix and function rename
>
> rfcv4:
> - rebase on top of v8.0.3
> - Add one patch from Yi which is about vfio device add in kvm
> - Remove IOAS_COPY optimization and focus on functions in this patchset
> - Fix wrong name issue reported and fix suggested by Matthew
> - Fix compilation issue reported and fix suggested by Nicolin
> - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
> granularity
> - Add dev_iter_next() callback to avoid adding so many callback
> at container scope, add VFIODevice.hwpt to support that
> - Restore all functions back to common from container whenever possible,
> mainly migration and reset related functions
> - Add --enable/disable-iommufd config option, enabled by default in linux
> - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
> - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
> - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
> redundant code
> - Add FD passing support for vfio device backed by IOMMUFD
> - Fix hot unplug resource leak issue in vfio_legacy_detach_device()
> - Fix FD leak in vfio_get_devicefd()
>
> rfcv3:
> - rebase on top of v7.2.0
> - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
> VFIO backends
> - Fix use after free in error path, reported by Alister
> - Split common.c in several steps to ease the review
>
> rfcv2:
> - remove the first three patches of rfcv1
> - add open cdev helper suggested by Jason
> - remove the QOMification of the VFIOContainer and simply use standard ops
> (David)
> - add "-object iommufd" suggested by Alex
>
> Thanks
> Zhenzhong
>
>
> Cédric Le Goater (3):
> hw/arm: Activate IOMMUFD for virt machines
> kconfig: Activate IOMMUFD for s390x machines
> hw/i386: Activate IOMMUFD for q35 machines
>
> Eric Auger (2):
> backends/iommufd: Introduce the iommufd object
> vfio/pci: Allow the selection of a given iommu backend
>
> Yi Liu (2):
> util/char_dev: Add open_cdev()
> vfio/iommufd: Implement the iommufd backend
>
> Zhenzhong Duan (14):
> vfio/common: return early if space isn't empty
> vfio/iommufd: Relax assert check for iommufd backend
> vfio/iommufd: Add support for iova_ranges and pgsizes
> vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
> vfio/pci: Introduce a vfio pci hot reset interface
> vfio/iommufd: Enable pci hot reset through iommufd cdev interface
> vfio/pci: Make vfio cdev pre-openable by passing a file handle
> vfio/platform: Allow the selection of a given iommu backend
> vfio/platform: Make vfio cdev pre-openable by passing a file handle
> vfio/ap: Allow the selection of a given iommu backend
> vfio/ap: Make vfio cdev pre-openable by passing a file handle
> vfio/ccw: Allow the selection of a given iommu backend
> vfio/ccw: Make vfio cdev pre-openable by passing a file handle
> vfio: Make VFIOContainerBase pointer parameter const in VFIOIOMMUOps
> callbacks
>
> MAINTAINERS | 10 +
> qapi/qom.json | 19 +
> hw/vfio/pci.h | 6 +
> include/hw/vfio/vfio-common.h | 26 +-
> include/hw/vfio/vfio-container-base.h | 15 +-
> include/qemu/chardev_open.h | 16 +
> include/sysemu/iommufd.h | 44 ++
> backends/iommufd.c | 228 ++++++++++
> hw/vfio/ap.c | 29 +-
> hw/vfio/ccw.c | 31 +-
> hw/vfio/common.c | 24 +-
> hw/vfio/container-base.c | 6 +-
> hw/vfio/container.c | 208 ++++++++-
> hw/vfio/helpers.c | 44 ++
> hw/vfio/iommufd.c | 630 ++++++++++++++++++++++++++
> hw/vfio/pci.c | 212 ++-------
> hw/vfio/platform.c | 38 +-
> util/chardev_open.c | 81 ++++
> backends/Kconfig | 4 +
> backends/meson.build | 1 +
> backends/trace-events | 10 +
> hw/arm/Kconfig | 1 +
> hw/i386/Kconfig | 1 +
> hw/s390x/Kconfig | 1 +
> hw/vfio/meson.build | 3 +
> hw/vfio/trace-events | 11 +
> qemu-options.hx | 12 +
> util/meson.build | 1 +
> 28 files changed, 1493 insertions(+), 219 deletions(-)
> create mode 100644 include/qemu/chardev_open.h
> create mode 100644 include/sysemu/iommufd.h
> create mode 100644 backends/iommufd.c
> create mode 100644 hw/vfio/iommufd.c
> create mode 100644 util/chardev_open.c
>
* RE: [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes
2023-11-14 13:46 ` Cédric Le Goater
@ 2023-11-15 2:36 ` Duan, Zhenzhong
2023-11-15 16:25 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-15 2:36 UTC
To: Cédric Le Goater, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 14, 2023 9:46 PM
>Subject: Re: [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes
>
>On 11/14/23 11:09, Zhenzhong Duan wrote:
>> Some vIOMMUs, such as virtio-iommu, use the host's IOVA ranges to
>> set up reserved ranges for passthrough devices, so that the guest
>> will not use an IOVA range the host does not support.
>>
>> Use an IOMMUFD uAPI to get the host IOVA ranges and pass them to
>> the vIOMMU, just like the legacy backend; if this fails, fall back
>> to a 64-bit IOVA range.
>>
>> Also use the out_iova_alignment returned by the uAPI as pgsizes,
>> with qemu_real_host_page_size() as the fallback.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v6: propagate iommufd_cdev_get_info_iova_range err and print as warning
>>
>> hw/vfio/iommufd.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 06282d885c..e5bf528e89 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -267,6 +267,53 @@ static int iommufd_cdev_ram_block_discard_disable(bool state)
>> return ram_block_uncoordinated_discard_disable(state);
>> }
>>
>> +static int iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
>> + uint32_t ioas_id, Error **errp)
>> +{
>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>> + struct iommu_ioas_iova_ranges *info;
>> + struct iommu_iova_range *iova_ranges;
>> + int ret, sz, fd = container->be->fd;
>> +
>> + info = g_malloc0(sizeof(*info));
>> + info->size = sizeof(*info);
>> + info->ioas_id = ioas_id;
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> + if (ret && errno != EMSGSIZE) {
>> + goto error;
>> + }
>> +
>> + sz = info->num_iovas * sizeof(struct iommu_iova_range);
>> + info = g_realloc(info, sizeof(*info) + sz);
>> + info->allowed_iovas = (uintptr_t)(info + 1);
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> + if (ret) {
>> + goto error;
>> + }
>> +
>> + iova_ranges = (struct iommu_iova_range *)(uintptr_t)info->allowed_iovas;
>> +
>> + for (int i = 0; i < info->num_iovas; i++) {
>> + Range *range = g_new(Range, 1);
>> +
>> + range_set_bounds(range, iova_ranges[i].start, iova_ranges[i].last);
>> + bcontainer->iova_ranges =
>> + range_list_insert(bcontainer->iova_ranges, range);
>> + }
>> + bcontainer->pgsizes = info->out_iova_alignment;
>> +
>> + g_free(info);
>> + return 0;
>> +
>> +error:
>> + ret = -errno;
>> + g_free(info);
>> + error_setg_errno(errp, errno, "Cannot get IOVA ranges");
>> + return ret;
>> +}
>> +
>> static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp)
>> {
>> @@ -341,7 +388,13 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>> goto err_discard_disable;
>> }
>>
>> - bcontainer->pgsizes = qemu_real_host_page_size();
>> + ret = iommufd_cdev_get_info_iova_range(container, ioas_id, &err);
>> + if (ret) {
>> + warn_report_err(err);
>> + err = NULL;
>> + error_printf("Fallback to default 64bit IOVA range and 4K page size\n");
>
>This would be better :
>
> error_append_hint(&err,
> "Fallback to default 64bit IOVA range and 4K page size\n");
> warn_report_err(err);
>
>I will take care of it if you agree. With that,
Sure, thanks
BRs
Zhenzhong
>
>Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
>Thanks,
>
>C.
>
>
>> + bcontainer->pgsizes = qemu_real_host_page_size();
>> + }
>>
>> bcontainer->listener = vfio_memory_listener;
>> memory_listener_register(&bcontainer->listener, bcontainer->space->as);
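iommufd_cdev_get_info_iova_range() uses a two-pass query: the first IOMMU_IOAS_IOVA_RANGES call (with room for zero entries) is expected to fail with EMSGSIZE while reporting the real count, then the buffer is grown and the call repeated. The sketch below models only that pattern with a fake "kernel" function; `struct iova_query`, `fake_iova_ranges_ioctl()` and the range values are hypothetical, it uses a plain pointer where the real uAPI stores a `__u64` address, and it does not model `out_iova_alignment` or Cédric's `error_append_hint()` suggestion.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of struct iommu_iova_range (last is inclusive). */
struct iova_range {
    uint64_t start;
    uint64_t last;
};

/* Hypothetical query struct: num_iovas is capacity in, count out. */
struct iova_query {
    uint32_t num_iovas;
    struct iova_range *allowed_iovas;
};

/* Made-up host windows around an x86-style MSI hole. */
static const struct iova_range host_ranges[] = {
    { 0x0ULL, 0xfedfffffULL },
    { 0xfef00000ULL, 0xffffffffffffULL },
};
#define HOST_NR_RANGES 2u

static int fake_iova_ranges_ioctl(struct iova_query *q)
{
    uint32_t cap = q->num_iovas;

    q->num_iovas = HOST_NR_RANGES;      /* always report the real count */
    if (cap < HOST_NR_RANGES) {
        errno = EMSGSIZE;               /* buffer too small, retry */
        return -1;
    }
    for (uint32_t i = 0; i < HOST_NR_RANGES; i++) {
        q->allowed_iovas[i] = host_ranges[i];
    }
    return 0;
}

/* Returns the number of ranges (caller frees *out) or a negative errno. */
static int query_iova_ranges(struct iova_range **out)
{
    struct iova_query q = { .num_iovas = 0, .allowed_iovas = NULL };
    int ret = fake_iova_ranges_ioctl(&q);

    if (ret && errno != EMSGSIZE) {
        return -errno;                  /* a real failure, not "too small" */
    }
    q.allowed_iovas = calloc(q.num_iovas ? q.num_iovas : 1,
                             sizeof(*q.allowed_iovas));
    if (!q.allowed_iovas) {
        return -ENOMEM;
    }
    if (fake_iova_ranges_ioctl(&q)) {
        ret = -errno;                   /* capture errno before free() */
        free(q.allowed_iovas);
        return ret;
    }
    *out = q.allowed_iovas;
    return (int)q.num_iovas;
}
```

On failure the caller keeps the defaults, which is what the patch does with qemu_real_host_page_size() and a 64-bit range.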
* RE: [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface
2023-11-14 13:51 ` Cédric Le Goater
@ 2023-11-15 2:55 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-15 2:55 UTC
To: Cédric Le Goater, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 14, 2023 9:52 PM
>Subject: Re: [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface
>
>On 11/14/23 11:09, Zhenzhong Duan wrote:
>> Legacy vfio pci and iommufd cdev use different processes to hot
>> reset a vfio device. Expand the current code to abstract out a
>> pci_hot_reset callback for legacy vfio; the same interface will also
>> be used by iommufd cdev vfio devices.
>>
>> Rename vfio_pci_hot_reset to vfio_legacy_pci_hot_reset and move it
>> into container.c.
>>
>> vfio_pci_[pre/post]_reset and vfio_pci_host_match are exported so
>> they can be called from the legacy and iommufd pci_hot_reset callbacks.
>
>vfio_pci_host_match() is never used outside of the legacy reset cb.
>Do you have future plans ?
No future plans, I'm just following a rule to keep pci specific functions
in pci.c whenever possible. Maybe another rule is to make functions
static whenever possible. I'm fine with both😊
Thanks
Zhenzhong
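The export question above comes from the shape of the abstraction: each backend supplies its own pci_hot_reset callback, but both wrap the backend-specific step with the same shared pre/post reset helpers. A minimal sketch under that reading; `PciDev`, `HotResetOps` and the counters are hypothetical stand-ins, not the real VFIOPCIDevice or VFIOIOMMUOps code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; counters stand in for real side effects. */
typedef struct PciDev {
    int pre_resets;
    int post_resets;
    int legacy_resets;
    int cdev_resets;
} PciDev;

/* Shared helpers, playing the role of vfio_pci_pre_reset()/post_reset(). */
static void pci_pre_reset(PciDev *d)  { d->pre_resets++; }
static void pci_post_reset(PciDev *d) { d->post_resets++; }

typedef struct HotResetOps {
    int (*pci_hot_reset)(PciDev *d, bool single);
} HotResetOps;

static int legacy_pci_hot_reset(PciDev *d, bool single)
{
    (void)single;
    pci_pre_reset(d);
    d->legacy_resets++;     /* real code: reset using the group fds */
    pci_post_reset(d);
    return 0;
}

static int iommufd_cdev_pci_hot_reset(PciDev *d, bool single)
{
    (void)single;
    pci_pre_reset(d);
    d->cdev_resets++;       /* real code: the cdev/iommufd reset path */
    pci_post_reset(d);
    return 0;
}

static const HotResetOps legacy_reset_ops = { legacy_pci_hot_reset };
static const HotResetOps iommufd_reset_ops = { iommufd_cdev_pci_hot_reset };

/* Generic PCI code only calls through the callback. */
static int do_pci_hot_reset(const HotResetOps *ops, PciDev *d, bool single)
{
    assert(ops && ops->pci_hot_reset);
    return ops->pci_hot_reset(d, single);
}
```

Helpers used by both callbacks must be non-static; a helper used by only one backend (like vfio_pci_host_match() here) could stay static in that backend's file.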
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-14 13:28 ` Cédric Le Goater
@ 2023-11-15 4:06 ` Duan, Zhenzhong
2023-11-15 8:15 ` Cédric Le Goater
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-15 4:06 UTC
To: Cédric Le Goater, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P, Paolo Bonzini, Eric Blake,
Markus Armbruster, Daniel P. Berrangé, Eduardo Habkost
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 14, 2023 9:29 PM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>On 11/14/23 11:09, Zhenzhong Duan wrote:
>> From: Eric Auger <eric.auger@redhat.com>
>>
>> Introduce an iommufd object which allows the interaction
>> with the host /dev/iommu device.
>>
>> /dev/iommu may already have been pre-opened outside of QEMU,
>> in which case the fd can be passed directly along with the
>> iommufd object:
>>
>> This allows the iommufd object to be shared across several
>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>> /dev/iommu once.
>>
>> If no fd is passed along with the iommufd object, the /dev/iommu
>> is opened by the qemu code.
>>
>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>
>I simplified the object declaration in include/sysemu/iommufd.h and
>formatted /dev/iommu in qemu-options.hx. No need to resend.
Good catch, thanks! Could it be simplified further as below? This is minor.
OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
Thanks
Zhenzhong
>
>Reviewed-by: Cédric Le Goater <clg@redhat.com>
>
>Thanks,
>
>C.
>
>
>> ---
>> v6: remove redundant call, alloc_hwpt, get/put_ioas
>>
>> MAINTAINERS | 7 ++
>> qapi/qom.json | 19 ++++
>> include/sysemu/iommufd.h | 44 ++++++++
>> backends/iommufd.c | 228 +++++++++++++++++++++++++++++++++++++++
>> backends/Kconfig | 4 +
>> backends/meson.build | 1 +
>> backends/trace-events | 10 ++
>> qemu-options.hx | 12 +++
>> 8 files changed, 325 insertions(+)
>> create mode 100644 include/sysemu/iommufd.h
>> create mode 100644 backends/iommufd.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index ff1238bb98..a4891f7bda 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c
>> F: docs/system/s390x/vfio-ap.rst
>> L: qemu-s390x@nongnu.org
>>
>> +iommufd
>> +M: Yi Liu <yi.l.liu@intel.com>
>> +M: Eric Auger <eric.auger@redhat.com>
>> +S: Supported
>> +F: backends/iommufd.c
>> +F: include/sysemu/iommufd.h
>> +
>> vhost
>> M: Michael S. Tsirkin <mst@redhat.com>
>> S: Supported
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index c53ef978ff..1fd8555a75 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -794,6 +794,23 @@
>> { 'struct': 'VfioUserServerProperties',
>> 'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>
>> +##
>> +# @IOMMUFDProperties:
>> +#
>> +# Properties for iommufd objects.
>> +#
>> +# @fd: file descriptor name previously passed via 'getfd' command,
>> +# which represents a pre-opened /dev/iommu. This allows the
>> +# iommufd object to be shared across several subsystems
>> +# (VFIO, VDPA, ...), and the file descriptor to be shared
>> +# with other processes, e.g. DPDK. (default: QEMU opens
>> +# /dev/iommu by itself)
>> +#
>> +# Since: 8.2
>> +##
>> +{ 'struct': 'IOMMUFDProperties',
>> + 'data': { '*fd': 'str' } }
>> +
>> ##
>> # @RngProperties:
>> #
>> @@ -934,6 +951,7 @@
>> 'input-barrier',
>> { 'name': 'input-linux',
>> 'if': 'CONFIG_LINUX' },
>> + 'iommufd',
>> 'iothread',
>> 'main-loop',
>> { 'name': 'memory-backend-epc',
>> @@ -1003,6 +1021,7 @@
>> 'input-barrier': 'InputBarrierProperties',
>> 'input-linux': { 'type': 'InputLinuxProperties',
>> 'if': 'CONFIG_LINUX' },
>> + 'iommufd': 'IOMMUFDProperties',
>> 'iothread': 'IothreadProperties',
>> 'main-loop': 'MainLoopProperties',
>> 'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties',
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> new file mode 100644
>> index 0000000000..9b3a86f57d
>> --- /dev/null
>> +++ b/include/sysemu/iommufd.h
>> @@ -0,0 +1,44 @@
>> +#ifndef SYSEMU_IOMMUFD_H
>> +#define SYSEMU_IOMMUFD_H
>> +
>> +#include "qom/object.h"
>> +#include "qemu/thread.h"
>> +#include "exec/hwaddr.h"
>> +#include "exec/cpu-common.h"
>> +
>> +#define TYPE_IOMMUFD_BACKEND "iommufd"
>> +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
>> + IOMMUFD_BACKEND)
>> +#define IOMMUFD_BACKEND(obj) \
>> + OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
>> +#define IOMMUFD_BACKEND_GET_CLASS(obj) \
>> + OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
>> +#define IOMMUFD_BACKEND_CLASS(klass) \
>> + OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
>> +struct IOMMUFDBackendClass {
>> + ObjectClass parent_class;
>> +};
>> +
>> +struct IOMMUFDBackend {
>> + Object parent;
>> +
>> + /*< protected >*/
>> + int fd; /* /dev/iommu file descriptor */
>> + bool owned; /* is the /dev/iommu opened internally */
>> + QemuMutex lock;
>> + uint32_t users;
>> +
>> + /*< public >*/
>> +};
>> +
>> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
>> +void iommufd_backend_disconnect(IOMMUFDBackend *be);
>> +
>> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>> + Error **errp);
>> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>> + ram_addr_t size, void *vaddr, bool readonly);
>> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> + hwaddr iova, ram_addr_t size);
>> +#endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> new file mode 100644
>> index 0000000000..ea3e2a8f85
>> --- /dev/null
>> +++ b/backends/iommufd.c
>> @@ -0,0 +1,228 @@
>> +/*
>> + * iommufd container backend
>> + *
>> + * Copyright (C) 2023 Intel Corporation.
>> + * Copyright Red Hat, Inc. 2023
>> + *
>> + * Authors: Yi Liu <yi.l.liu@intel.com>
>> + * Eric Auger <eric.auger@redhat.com>
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "sysemu/iommufd.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>> +#include "qemu/module.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/error-report.h"
>> +#include "monitor/monitor.h"
>> +#include "trace.h"
>> +#include <sys/ioctl.h>
>> +#include <linux/iommufd.h>
>> +
>> +static void iommufd_backend_init(Object *obj)
>> +{
>> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +
>> + be->fd = -1;
>> + be->users = 0;
>> + be->owned = true;
>> + qemu_mutex_init(&be->lock);
>> +}
>> +
>> +static void iommufd_backend_finalize(Object *obj)
>> +{
>> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +
>> + if (be->owned) {
>> + close(be->fd);
>> + be->fd = -1;
>> + }
>> +}
>> +
>> +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
>> +{
>> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> + int fd = -1;
>> +
>> + fd = monitor_fd_param(monitor_cur(), str, errp);
>> + if (fd == -1) {
>> + error_prepend(errp, "Could not parse remote object fd %s:", str);
>> + return;
>> + }
>> + qemu_mutex_lock(&be->lock);
>> + be->fd = fd;
>> + be->owned = false;
>> + qemu_mutex_unlock(&be->lock);
>> + trace_iommu_backend_set_fd(be->fd);
>> +}
>> +
>> +static void iommufd_backend_class_init(ObjectClass *oc, void *data)
>> +{
>> + object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
>> +}
>> +
>> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
>> +{
>> + int fd, ret = 0;
>> +
>> + qemu_mutex_lock(&be->lock);
>> + if (be->users == UINT32_MAX) {
>> + error_setg(errp, "too many connections");
>> + ret = -E2BIG;
>> + goto out;
>> + }
>> + if (be->owned && !be->users) {
>> + fd = qemu_open_old("/dev/iommu", O_RDWR);
>> + if (fd < 0) {
>> + error_setg_errno(errp, errno, "/dev/iommu opening failed");
>> + ret = fd;
>> + goto out;
>> + }
>> + be->fd = fd;
>> + }
>> + be->users++;
>> +out:
>> + trace_iommufd_backend_connect(be->fd, be->owned,
>> + be->users, ret);
>> + qemu_mutex_unlock(&be->lock);
>> + return ret;
>> +}
>> +
>> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
>> +{
>> + qemu_mutex_lock(&be->lock);
>> + if (!be->users) {
>> + goto out;
>> + }
>> + be->users--;
>> + if (!be->users && be->owned) {
>> + close(be->fd);
>> + be->fd = -1;
>> + }
>> +out:
>> + trace_iommufd_backend_disconnect(be->fd, be->users);
>> + qemu_mutex_unlock(&be->lock);
>> +}
>> +
>> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>> + Error **errp)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_ioas_alloc alloc_data = {
>> + .size = sizeof(alloc_data),
>> + .flags = 0,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
>> + if (ret) {
>> + error_setg_errno(errp, errno, "Failed to allocate ioas");
>> + return ret;
>> + }
>> +
>> + *ioas_id = alloc_data.out_ioas_id;
>> + trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
>> +
>> + return ret;
>> +}
>> +
>> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_destroy des = {
>> + .size = sizeof(des),
>> + .id = id,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_DESTROY, &des);
>> + trace_iommufd_backend_free_id(fd, id, ret);
>> + if (ret) {
>> + error_report("Failed to free id: %u %m", id);
>> + }
>> +}
>> +
>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>> + ram_addr_t size, void *vaddr, bool readonly)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_ioas_map map = {
>> + .size = sizeof(map),
>> + .flags = IOMMU_IOAS_MAP_READABLE |
>> + IOMMU_IOAS_MAP_FIXED_IOVA,
>> + .ioas_id = ioas_id,
>> + .__reserved = 0,
>> + .user_va = (uintptr_t)vaddr,
>> + .iova = iova,
>> + .length = size,
>> + };
>> +
>> + if (!readonly) {
>> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>> + }
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
>> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
>> + vaddr, readonly, ret);
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_IOAS_MAP failed: %m");
>> + }
>> + return ret;
>> +}
>> +
>> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> + hwaddr iova, ram_addr_t size)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_ioas_unmap unmap = {
>> + .size = sizeof(unmap),
>> + .ioas_id = ioas_id,
>> + .iova = iova,
>> + .length = size,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_UNMAP, &unmap);
>> + /*
>> + * IOMMUFD treats a mapping as a kind of object, so unmapping a
>> + * nonexistent mapping is treated as deleting a nonexistent
>> + * object and returns ENOENT. This differs from the legacy
>> + * backend, which allows it. A vIOMMU may trigger a lot of
>> + * redundant unmaps; to avoid flooding the log, treat them
>> + * as success for IOMMUFD, just like the legacy backend.
>> + */
>> + if (ret && errno == ENOENT) {
>> + trace_iommufd_backend_unmap_dma_non_exist(fd, ioas_id, iova, size, ret);
>> + ret = 0;
>> + } else {
>> + trace_iommufd_backend_unmap_dma(fd, ioas_id, iova, size, ret);
>> + }
>> +
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_IOAS_UNMAP failed: %m");
>> + }
>> + return ret;
>> +}
>> +
>> +static const TypeInfo iommufd_backend_info = {
>> + .name = TYPE_IOMMUFD_BACKEND,
>> + .parent = TYPE_OBJECT,
>> + .instance_size = sizeof(IOMMUFDBackend),
>> + .instance_init = iommufd_backend_init,
>> + .instance_finalize = iommufd_backend_finalize,
>> + .class_size = sizeof(IOMMUFDBackendClass),
>> + .class_init = iommufd_backend_class_init,
>> + .interfaces = (InterfaceInfo[]) {
>> + { TYPE_USER_CREATABLE },
>> + { }
>> + }
>> +};
>> +
>> +static void register_types(void)
>> +{
>> + type_register_static(&iommufd_backend_info);
>> +}
>> +
>> +type_init(register_types);
>> diff --git a/backends/Kconfig b/backends/Kconfig
>> index f35abc1609..2cb23f62fa 100644
>> --- a/backends/Kconfig
>> +++ b/backends/Kconfig
>> @@ -1 +1,5 @@
>> source tpm/Kconfig
>> +
>> +config IOMMUFD
>> + bool
>> + depends on VFIO
>> diff --git a/backends/meson.build b/backends/meson.build
>> index 914c7c4afb..9a5cea480d 100644
>> --- a/backends/meson.build
>> +++ b/backends/meson.build
>> @@ -20,6 +20,7 @@ if have_vhost_user
>> system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
>> endif
>> system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
>> +system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c'))
>> if have_vhost_user_crypto
>> system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
>> endif
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 652eb76a57..d45c6e31a6 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -5,3 +5,13 @@ dbus_vmstate_pre_save(void)
>> dbus_vmstate_post_load(int version_id) "version_id: %d"
>> dbus_vmstate_loading(const char *id) "id: %s"
>> dbus_vmstate_saving(const char *id) "id: %s"
>> +
>> +# iommufd.c
>> +iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
>> +iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
>> +iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
>> +iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
>> +iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> +iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>> +iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
>> +iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index 42fd09e4de..70507c0ee6 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -5224,6 +5224,18 @@ SRST
>>
>> The ``share`` boolean option is on by default with memfd.
>>
>> + ``-object iommufd,id=id[,fd=fd]``
>> + Creates an iommufd backend which allows control of DMA mapping
>> + through the /dev/iommu device.
>> +
>> + The ``id`` parameter is a unique ID which frontends (such as
>> + vfio-pci or vdpa) will use to connect with the iommufd backend.
>> +
>> + The ``fd`` parameter is an optional pre-opened file descriptor
>> + obtained by opening /dev/iommu. Usually the iommufd is shared
>> + across all subsystems, bringing the benefit of centralized
>> + reference counting.
>> +
>> ``-object rng-builtin,id=id``
>> Creates a random number generator backend which obtains entropy
>> from QEMU builtin functions. The ``id`` parameter is a unique ID
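The connect/disconnect pair in backends/iommufd.c is a mutex-protected user count: when QEMU owns the fd, the first user opens /dev/iommu and the last one closes it; a pre-opened fd passed via `fd=` is never closed this way. A self-contained model of that logic follows; `Backend`, `fake_open()` and `fake_close()` are hypothetical stand-ins for IOMMUFDBackend and the real open/close calls, and open failure is not modelled.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for opening/closing /dev/iommu. */
static int opens, closes;
static int fake_open(void) { opens++; return 42; }
static void fake_close(int fd) { (void)fd; closes++; }

typedef struct Backend {
    int fd;
    bool owned;          /* true: we open /dev/iommu ourselves */
    uint32_t users;
    pthread_mutex_t lock;
} Backend;

/* passed_fd < 0 models "no fd= property given". */
static void backend_init(Backend *be, int passed_fd)
{
    be->fd = passed_fd;
    be->owned = passed_fd < 0;
    be->users = 0;
    pthread_mutex_init(&be->lock, NULL);
}

static int backend_connect(Backend *be)
{
    int ret = 0;

    pthread_mutex_lock(&be->lock);
    if (be->users == UINT32_MAX) {
        ret = -E2BIG;                  /* counter would overflow */
    } else {
        if (be->owned && !be->users) {
            be->fd = fake_open();      /* first user opens the device */
        }
        be->users++;
    }
    pthread_mutex_unlock(&be->lock);
    return ret;
}

static void backend_disconnect(Backend *be)
{
    pthread_mutex_lock(&be->lock);
    if (be->users) {
        be->users--;
        if (!be->users && be->owned) {
            fake_close(be->fd);        /* last user closes an owned fd */
            be->fd = -1;
        }
    }
    pthread_mutex_unlock(&be->lock);
}
```

Keeping the count in the backend object is what lets several VFIO devices share one /dev/iommu fd.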
* RE: [PATCH v6 00/21] vfio: Adopt iommufd
2023-11-14 14:51 ` [PATCH v6 00/21] vfio: Adopt iommufd Cédric Le Goater
@ 2023-11-15 4:16 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-15 4:16 UTC
To: Cédric Le Goater, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P
Hi Cédric,
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 14, 2023 10:52 PM
>Subject: Re: [PATCH v6 00/21] vfio: Adopt iommufd
>
>Hello Zhenzhong,
>
>On 11/14/23 11:09, Zhenzhong Duan wrote:
>> Hi,
>>
>> Thanks all for giving guides and comments on previous series, this is
>> the remaining part of the iommufd support.
>>
>> Based on Cédric's suggestion, replace old config method for IOMMUFD
>> with Kconfig.
>>
>> Based on Jason's suggestion, drop the implementation of manually
>> allocating hwpt and switch to IOAS attach/detach.
>>
>> Besides the current tests, we also tested mdev with mtty for better coverage.
>>
>> PATCH 1: Introduce iommufd object
>> PATCH 2-9: add IOMMUFD container and cdev support
>> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
>> PATCH 18: make VFIOContainerBase parameter const
>> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
>>
>>
>> We have done extensive testing with different combinations, e.g.:
>> - PCI devices were tested
>> - FD passing and hot reset with some trick.
>> - device hotplug test with legacy and iommufd backends
>> - with or without vIOMMU for legacy and iommufd backends
>> - devices linked to different iommufds
>> - VFIO migration with an E800 net card (no dirty sync support) passthrough
>> - platform, ccw and ap were only compile-tested due to environment limit
>> - mdev passthrough with mtty, mixed with real devices and different BEs
>>
>> Given some iommufd kernel limitations, the iommufd backend is
>> not yet fully on par with the legacy backend w.r.t. features like:
>> - p2p mappings (you will see related error traces)
>> - dirty page sync
>> - etc.
>>
>>
>> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6
>> Based on vfio-next, commit id: 1a22fb936e
>
>I just had a few comments that I addressed myself in :
>
> https://github.com/legoater/qemu/commits/vfio-8.2
>
>Please take a look and I will possibly merge the modified v6 in vfio-next
>after a few days or next week when people are back from LPC.
Good for me, only a minor suggestion to use OBJECT_DECLARE_TYPE.
>
>I would like to propose this series for an early merge in QEMU 9.0. So, we
>have a few weeks to polish and test a bit more. I didn't get feedback from
>ARM for instance (that's a message in a bottle for Eric :).
Glad to know, thanks very much for your active review and guidance
since my first version😊
I'll do more testing in the coming days.
>
>My last request would be to take some of the documentation, below in this
>email, and include it in QEMU. It can be done later.
Sure, will do.
BRs.
Zhenzhong
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-15 4:06 ` Duan, Zhenzhong
@ 2023-11-15 8:15 ` Cédric Le Goater
0 siblings, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-15 8:15 UTC
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P, Paolo Bonzini, Eric Blake,
Markus Armbruster, Daniel P. Berrangé, Eduardo Habkost
On 11/15/23 05:06, Duan, Zhenzhong wrote:
>
>
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Sent: Tuesday, November 14, 2023 9:29 PM
>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>
>> On 11/14/23 11:09, Zhenzhong Duan wrote:
>>> From: Eric Auger <eric.auger@redhat.com>
>>>
>>> Introduce an iommufd object which allows the interaction
>>> with the host /dev/iommu device.
>>>
>>> /dev/iommu may already have been pre-opened outside of QEMU,
>>> in which case the fd can be passed directly along with the
>>> iommufd object:
>>>
>>> This allows the iommufd object to be shared across several
>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>> /dev/iommu once.
>>>
>>> If no fd is passed along with the iommufd object, the /dev/iommu
>>> is opened by the qemu code.
>>>
>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>
>> I simplified the object declaration in include/sysemu/iommufd.h and
>> formatted /dev/iommu in qemu-options.hx. No need to resend.
>
> Good catch, thanks! Could it be simplified further as below? This is minor.
>
> OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass, IOMMUFD_BACKEND)
Indeed. Done.
Thanks,
C.
* Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 ` [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-14 14:08 ` Cédric Le Goater
@ 2023-11-15 12:09 ` Philippe Mathieu-Daudé
2023-11-15 13:05 ` Cédric Le Goater
1 sibling, 1 reply; 82+ messages in thread
From: Philippe Mathieu-Daudé @ 2023-11-15 12:09 UTC
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
Hi Zhenzhong,
On 14/11/23 11:09, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the vfio
> cdev with privilege and pass FD to qemu. This way qemu never needs
> to have privilege to open a VFIO or iommu cdev node.
>
> Together with the earlier support of pre-opening /dev/iommu device,
> now we have full support of passing a vfio device to unprivileged
> qemu by management tool. This mode is no longer considered for the
> legacy backend, so remove the "TODO" comment.
>
> Add helper functions vfio_device_set_fd() and vfio_device_get_name()
> to set the fd and get the device name; they will also be used by other
> vfio devices.
>
> There is no easy way to check if a device is mdev with FD passing,
> so fail the x-balloon-allowed check unconditionally in this case.
>
> There is also no easy way to get BDF as name with FD passing, so
> we fake a name by VFIO_FD[fd].
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v6: simplify CONFIG_IOMMUFD checking code
> introduce a helper vfio_device_set_fd
>
> include/hw/vfio/vfio-common.h | 3 +++
> hw/vfio/helpers.c | 44 +++++++++++++++++++++++++++++++++++
> hw/vfio/iommufd.c | 12 ++++++----
> hw/vfio/pci.c | 28 ++++++++++++----------
> 4 files changed, 71 insertions(+), 16 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 3dac5c167e..567e5f7bea 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -251,4 +251,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> hwaddr size);
> int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
> uint64_t size, ram_addr_t ram_addr);
> +
Please add bare documentation:
/* Returns 0 on success, or a negative errno. */
> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
Functions taking an Error** param should return a boolean, so:
/* Return: true on success, else false setting @errp with error. */
> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
> #endif /* HW_VFIO_VFIO_COMMON_H */
> @@ -609,3 +611,45 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
>
> return ret;
> }
> +
> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
> +{
> + struct stat st;
> +
> + if (vbasedev->fd < 0) {
> + if (stat(vbasedev->sysfsdev, &st) < 0) {
> + error_setg_errno(errp, errno, "no such host device");
> + error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
> + return -errno;
> + }
> + /* User may specify a name, e.g: VFIO platform device */
> + if (!vbasedev->name) {
> + vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
> + }
> + } else {
> + if (!vbasedev->iommufd) {
> + error_setg(errp, "Use FD passing only with iommufd backend");
> + return -EINVAL;
> + }
> + /*
> + * Give a name with fd so any function printing out vbasedev->name
> + * will not break.
> + */
> + if (!vbasedev->name) {
> + vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
> + }
> + }
> +
> + return 0;
> +}
> +
> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
bool vfio_device_set_fd(..., Error **errp)
> +{
> + int fd = monitor_fd_param(monitor_cur(), str, errp);
> +
> + if (fd < 0) {
> + error_prepend(errp, "Could not parse remote object fd %s:", str);
> + return;
return false;
> + }
> + vbasedev->fd = fd;
return true;
> +}
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 3eec428162..e08a217057 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -326,11 +326,15 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
> uint32_t ioas_id;
> Error *err = NULL;
>
> - devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> - if (devfd < 0) {
> - return devfd;
> + if (vbasedev->fd < 0) {
> + devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
> + if (devfd < 0) {
> + return devfd;
> + }
> + vbasedev->fd = devfd;
> + } else {
> + devfd = vbasedev->fd;
> }
> - vbasedev->fd = devfd;
>
> ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
> if (ret) {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c5984b0598..b23b492cce 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2944,17 +2944,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> VFIODevice *vbasedev = &vdev->vbasedev;
> char *tmp, *subsys;
> Error *err = NULL;
> - struct stat st;
> int i, ret;
> bool is_mdev;
> char uuid[UUID_STR_LEN];
> char *name;
>
> - if (!vbasedev->sysfsdev) {
> + if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
> if (!(~vdev->host.domain || ~vdev->host.bus ||
> ~vdev->host.slot || ~vdev->host.function)) {
> error_setg(errp, "No provided host device");
> error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F "
> +#ifdef CONFIG_IOMMUFD
> + "or -device vfio-pci,fd=DEVICE_FD "
> +#endif
> "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
> return;
> }
> @@ -2964,13 +2966,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
> vdev->host.slot, vdev->host.function);
> }
>
> - if (stat(vbasedev->sysfsdev, &st) < 0) {
> - error_setg_errno(errp, errno, "no such host device");
> - error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
> + if (vfio_device_get_name(vbasedev, errp)) {
Clearer as:
if (vfio_device_get_name(vbasedev, errp) < 0) {
> return;
> }
Regards,
Phil.
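For illustration only, here is a minimal standalone sketch of the naming fallback in vfio_device_get_name(). It carries the documented return convention suggested above, and callers check it with "< 0". DemoVFIODevice and demo_device_get_name() are hypothetical and are not QEMU API. Error reporting through errp is dropped in favour of the negative errno alone, and the strrchr()-based basename is only a rough stand-in for g_path_get_basename().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical, trimmed-down stand-in for VFIODevice. */
typedef struct DemoVFIODevice {
    int fd;               /* pre-opened cdev fd, or -1 if none was passed */
    const char *sysfsdev; /* sysfs path, used when no fd was passed */
    char *name;           /* user-provided or derived name (heap-allocated) */
    int has_iommufd;      /* nonzero if an iommufd backend is linked */
} DemoVFIODevice;

/* Returns 0 on success, or a negative errno. */
static int demo_device_get_name(DemoVFIODevice *d)
{
    if (d->fd < 0) {
        struct stat st;

        if (!d->sysfsdev) {
            return -EINVAL;
        }
        if (stat(d->sysfsdev, &st) < 0) {
            return -errno;   /* e.g. -ENOENT: no such host device */
        }
        if (!d->name) {
            /* Rough stand-in for g_path_get_basename(); ignores trailing '/'. */
            const char *base = strrchr(d->sysfsdev, '/');

            d->name = strdup(base ? base + 1 : d->sysfsdev);
        }
    } else {
        char buf[32];

        if (!d->has_iommufd) {
            return -EINVAL;  /* FD passing is only supported with iommufd */
        }
        if (!d->name) {
            /* Fake a name so code printing d->name keeps working. */
            snprintf(buf, sizeof(buf), "VFIO_FD%d", d->fd);
            d->name = strdup(buf);
        }
    }
    return d->name ? 0 : -ENOMEM;
}
```

A caller would then read `if (demo_device_get_name(&dev) < 0) { ... }`, which makes the negative-errno contract explicit at the call site.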
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-14 10:09 ` [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
2023-11-14 13:28 ` Cédric Le Goater
@ 2023-11-15 12:52 ` Eric Auger
2023-11-16 4:04 ` Duan, Zhenzhong
2023-11-17 11:09 ` Cédric Le Goater
2 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2023-11-15 12:52 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Paolo Bonzini, Eric Blake, Markus Armbruster,
Daniel P. Berrangé, Eduardo Habkost
Hi Zhenzhong,
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Eric Auger <eric.auger@redhat.com>
>
> Introduce an iommufd object which allows the interaction
> with the host /dev/iommu device.
>
> /dev/iommu may already have been opened outside of qemu,
> in which case the fd can be passed directly along with the
> iommufd object:
>
> This allows the iommufd object to be shared across several
> subsystems (VFIO, VDPA, ...). For example, libvirt would open
> the /dev/iommu once.
>
> If no fd is passed along with the iommufd object, the /dev/iommu
> is opened by the qemu code.
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v6: remove redundant call, alloc_hwpt, get/put_ioas
>
> MAINTAINERS | 7 ++
> qapi/qom.json | 19 ++++
> include/sysemu/iommufd.h | 44 ++++++++
> backends/iommufd.c | 228 +++++++++++++++++++++++++++++++++++++++
> backends/Kconfig | 4 +
> backends/meson.build | 1 +
> backends/trace-events | 10 ++
> qemu-options.hx | 12 +++
> 8 files changed, 325 insertions(+)
> create mode 100644 include/sysemu/iommufd.h
> create mode 100644 backends/iommufd.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ff1238bb98..a4891f7bda 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c
> F: docs/system/s390x/vfio-ap.rst
> L: qemu-s390x@nongnu.org
>
> +iommufd
> +M: Yi Liu <yi.l.liu@intel.com>
> +M: Eric Auger <eric.auger@redhat.com>
Zhenzhong, don't you want to be added here?
> +S: Supported
> +F: backends/iommufd.c
> +F: include/sysemu/iommufd.h
> +
> vhost
> M: Michael S. Tsirkin <mst@redhat.com>
> S: Supported
> diff --git a/qapi/qom.json b/qapi/qom.json
> index c53ef978ff..1fd8555a75 100644
> --- a/qapi/qom.json
> +++ b/qapi/qom.json
> @@ -794,6 +794,23 @@
> { 'struct': 'VfioUserServerProperties',
> 'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>
> +##
> +# @IOMMUFDProperties:
> +#
> +# Properties for iommufd objects.
> +#
> +# @fd: file descriptor name previously passed via 'getfd' command,
"previously passed via 'getfd' command": I wonder whether this applies here or whether it
was copied from the RemoteObjectProperties.fd doc.
> +# which represents a pre-opened /dev/iommu. This allows the
> +# iommufd object to be shared across several
> +# subsystems (VFIO, VDPA, ...), and the file descriptor to be shared
> +# with other processes, e.g. DPDK. (default: QEMU opens
> +# /dev/iommu by itself)
> +#
> +# Since: 8.2
> +##
> +{ 'struct': 'IOMMUFDProperties',
> + 'data': { '*fd': 'str' } }
> +
> ##
> # @RngProperties:
> #
> @@ -934,6 +951,7 @@
> 'input-barrier',
> { 'name': 'input-linux',
> 'if': 'CONFIG_LINUX' },
> + 'iommufd',
> 'iothread',
> 'main-loop',
> { 'name': 'memory-backend-epc',
> @@ -1003,6 +1021,7 @@
> 'input-barrier': 'InputBarrierProperties',
> 'input-linux': { 'type': 'InputLinuxProperties',
> 'if': 'CONFIG_LINUX' },
> + 'iommufd': 'IOMMUFDProperties',
> 'iothread': 'IothreadProperties',
> 'main-loop': 'MainLoopProperties',
> 'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties',
> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
> new file mode 100644
> index 0000000000..9b3a86f57d
> --- /dev/null
> +++ b/include/sysemu/iommufd.h
> @@ -0,0 +1,44 @@
> +#ifndef SYSEMU_IOMMUFD_H
> +#define SYSEMU_IOMMUFD_H
> +
> +#include "qom/object.h"
> +#include "qemu/thread.h"
> +#include "exec/hwaddr.h"
> +#include "exec/cpu-common.h"
> +
> +#define TYPE_IOMMUFD_BACKEND "iommufd"
> +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
> + IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND(obj) \
> + OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_GET_CLASS(obj) \
> + OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
> +#define IOMMUFD_BACKEND_CLASS(klass) \
> + OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
> +struct IOMMUFDBackendClass {
> + ObjectClass parent_class;
> +};
> +
> +struct IOMMUFDBackend {
> + Object parent;
> +
> + /*< protected >*/
> + int fd; /* /dev/iommu file descriptor */
> + bool owned; /* is the /dev/iommu opened internally */
> + QemuMutex lock;
> + uint32_t users;
> +
> + /*< public >*/
> +};
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
> +void iommufd_backend_disconnect(IOMMUFDBackend *be);
> +
> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
> + Error **errp);
> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly);
> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> + hwaddr iova, ram_addr_t size);
> +#endif
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> new file mode 100644
> index 0000000000..ea3e2a8f85
> --- /dev/null
> +++ b/backends/iommufd.c
> @@ -0,0 +1,228 @@
> +/*
> + * iommufd container backend
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + * Copyright Red Hat, Inc. 2023
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + * Eric Auger <eric.auger@redhat.com>
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#include "qemu/osdep.h"
> +#include "sysemu/iommufd.h"
> +#include "qapi/error.h"
> +#include "qapi/qmp/qerror.h"
> +#include "qemu/module.h"
> +#include "qom/object_interfaces.h"
> +#include "qemu/error-report.h"
> +#include "monitor/monitor.h"
> +#include "trace.h"
> +#include <sys/ioctl.h>
> +#include <linux/iommufd.h>
> +
> +static void iommufd_backend_init(Object *obj)
> +{
> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +
> + be->fd = -1;
> + be->users = 0;
> + be->owned = true;
> + qemu_mutex_init(&be->lock);
> +}
> +
> +static void iommufd_backend_finalize(Object *obj)
> +{
> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> +
> + if (be->owned) {
> + close(be->fd);
> + be->fd = -1;
> + }
> +}
> +
> +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
> +{
> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
> + int fd = -1;
> +
> + fd = monitor_fd_param(monitor_cur(), str, errp);
> + if (fd == -1) {
> + error_prepend(errp, "Could not parse remote object fd %s:", str);
> + return;
> + }
> + qemu_mutex_lock(&be->lock);
> + be->fd = fd;
> + be->owned = false;
> + qemu_mutex_unlock(&be->lock);
> + trace_iommu_backend_set_fd(be->fd);
> +}
> +
> +static void iommufd_backend_class_init(ObjectClass *oc, void *data)
> +{
> + object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
> +}
> +
> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
> +{
> + int fd, ret = 0;
> +
> + qemu_mutex_lock(&be->lock);
> + if (be->users == UINT32_MAX) {
> + error_setg(errp, "too many connections");
> + ret = -E2BIG;
> + goto out;
> + }
> + if (be->owned && !be->users) {
> + fd = qemu_open_old("/dev/iommu", O_RDWR);
> + if (fd < 0) {
> + error_setg_errno(errp, errno, "/dev/iommu opening failed");
> + ret = fd;
> + goto out;
> + }
> + be->fd = fd;
> + }
> + be->users++;
> +out:
> + trace_iommufd_backend_connect(be->fd, be->owned,
> + be->users, ret);
> + qemu_mutex_unlock(&be->lock);
> + return ret;
> +}
> +
> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
> +{
> + qemu_mutex_lock(&be->lock);
> + if (!be->users) {
> + goto out;
> + }
> + be->users--;
> + if (!be->users && be->owned) {
> + close(be->fd);
> + be->fd = -1;
> + }
> +out:
> + trace_iommufd_backend_disconnect(be->fd, be->users);
> + qemu_mutex_unlock(&be->lock);
> +}
> +
> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
> + Error **errp)
> +{
> + int ret, fd = be->fd;
> + struct iommu_ioas_alloc alloc_data = {
> + .size = sizeof(alloc_data),
> + .flags = 0,
> + };
> +
> + ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
> + if (ret) {
> + error_setg_errno(errp, errno, "Failed to allocate ioas");
> + return ret;
> + }
> +
> + *ioas_id = alloc_data.out_ioas_id;
> + trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
> +
> + return ret;
> +}
> +
> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id)
> +{
> + int ret, fd = be->fd;
> + struct iommu_destroy des = {
> + .size = sizeof(des),
> + .id = id,
> + };
> +
> + ret = ioctl(fd, IOMMU_DESTROY, &des);
> + trace_iommufd_backend_free_id(fd, id, ret);
> + if (ret) {
> + error_report("Failed to free id: %u %m", id);
> + }
> +}
> +
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly)
> +{
> + int ret, fd = be->fd;
> + struct iommu_ioas_map map = {
> + .size = sizeof(map),
> + .flags = IOMMU_IOAS_MAP_READABLE |
> + IOMMU_IOAS_MAP_FIXED_IOVA,
> + .ioas_id = ioas_id,
> + .__reserved = 0,
> + .user_va = (uintptr_t)vaddr,
> + .iova = iova,
> + .length = size,
> + };
> +
> + if (!readonly) {
> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
> + }
> +
> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
> + vaddr, readonly, ret);
> + if (ret) {
> + ret = -errno;
> + error_report("IOMMU_IOAS_MAP failed: %m");
> + }
> + return ret;
> +}
> +
> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
> + hwaddr iova, ram_addr_t size)
> +{
> + int ret, fd = be->fd;
> + struct iommu_ioas_unmap unmap = {
> + .size = sizeof(unmap),
> + .ioas_id = ioas_id,
> + .iova = iova,
> + .length = size,
> + };
> +
> + ret = ioctl(fd, IOMMU_IOAS_UNMAP, &unmap);
> + /*
> + * IOMMUFD treats a mapping as a kind of object, so unmapping a
> + * nonexistent mapping is treated as deleting a nonexistent
> + * object and returns ENOENT. This differs from the legacy
> + * backend, which allows it. vIOMMU may trigger a lot of
> + * redundant unmaps; to avoid flooding the log, treat them
> + * as success for IOMMUFD, just like the legacy backend.
> + */
> + if (ret && errno == ENOENT) {
> + trace_iommufd_backend_unmap_dma_non_exist(fd, ioas_id, iova, size, ret);
> + ret = 0;
> + } else {
> + trace_iommufd_backend_unmap_dma(fd, ioas_id, iova, size, ret);
> + }
> +
> + if (ret) {
> + ret = -errno;
> + error_report("IOMMU_IOAS_UNMAP failed: %m");
> + }
> + return ret;
> +}
> +
> +static const TypeInfo iommufd_backend_info = {
> + .name = TYPE_IOMMUFD_BACKEND,
> + .parent = TYPE_OBJECT,
> + .instance_size = sizeof(IOMMUFDBackend),
> + .instance_init = iommufd_backend_init,
> + .instance_finalize = iommufd_backend_finalize,
> + .class_size = sizeof(IOMMUFDBackendClass),
> + .class_init = iommufd_backend_class_init,
> + .interfaces = (InterfaceInfo[]) {
> + { TYPE_USER_CREATABLE },
> + { }
> + }
> +};
> +
> +static void register_types(void)
> +{
> + type_register_static(&iommufd_backend_info);
> +}
> +
> +type_init(register_types);
> diff --git a/backends/Kconfig b/backends/Kconfig
> index f35abc1609..2cb23f62fa 100644
> --- a/backends/Kconfig
> +++ b/backends/Kconfig
> @@ -1 +1,5 @@
> source tpm/Kconfig
> +
> +config IOMMUFD
> + bool
> + depends on VFIO
I don't know the state of the vDPA/iommufd integration, but an extra
dependency might need to be added here in the short term.
> diff --git a/backends/meson.build b/backends/meson.build
> index 914c7c4afb..9a5cea480d 100644
> --- a/backends/meson.build
> +++ b/backends/meson.build
> @@ -20,6 +20,7 @@ if have_vhost_user
> system_ss.add(when: 'CONFIG_VIRTIO', if_true: files('vhost-user.c'))
> endif
> system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost.c'))
> +system_ss.add(when: 'CONFIG_IOMMUFD', if_true: files('iommufd.c'))
> if have_vhost_user_crypto
> system_ss.add(when: 'CONFIG_VIRTIO_CRYPTO', if_true: files('cryptodev-vhost-user.c'))
> endif
> diff --git a/backends/trace-events b/backends/trace-events
> index 652eb76a57..d45c6e31a6 100644
> --- a/backends/trace-events
> +++ b/backends/trace-events
> @@ -5,3 +5,13 @@ dbus_vmstate_pre_save(void)
> dbus_vmstate_post_load(int version_id) "version_id: %d"
> dbus_vmstate_loading(const char *id) "id: %s"
> dbus_vmstate_saving(const char *id) "id: %s"
> +
> +# iommufd.c
> +iommufd_backend_connect(int fd, bool owned, uint32_t users, int ret) "fd=%d owned=%d users=%d (%d)"
> +iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
> +iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
> +iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
> +iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> +iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
> +iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas, int ret) " iommufd=%d ioas=%d (%d)"
> +iommufd_backend_free_id(int iommufd, uint32_t id, int ret) " iommufd=%d id=%d (%d)"
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 42fd09e4de..70507c0ee6 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -5224,6 +5224,18 @@ SRST
>
> The ``share`` boolean option is on by default with memfd.
>
> + ``-object iommufd,id=id[,fd=fd]``
> + Creates an iommufd backend which allows control of DMA mapping
> + through the /dev/iommu device.
> +
> + The ``id`` parameter is a unique ID which frontends (such as
> + vfio-pci or vdpa) will use to connect with the iommufd backend.
> +
> + The ``fd`` parameter is an optional pre-opened file descriptor
> + obtained by opening /dev/iommu. Usually the iommufd is shared
> + across all subsystems, bringing the benefit of centralized
> + reference counting.
> +
> ``-object rng-builtin,id=id``
> Creates a random number generator backend which obtains entropy
> from QEMU builtin functions. The ``id`` parameter is a unique ID
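The connect/disconnect pair in backends/iommufd.c implements simple reference counting. An owned fd is opened by the first user and closed by the last one. A pre-opened fd, passed with fd= and marked not owned, is never opened or closed by these two functions. Below is a hypothetical standalone sketch of just that logic. The QemuMutex is deliberately omitted, and fake open/close hooks stand in for /dev/iommu. DemoBackend, demo_connect() and demo_disconnect() are illustrative names, not QEMU API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, lock-free simplification of IOMMUFDBackend's counting. */
typedef struct DemoBackend {
    int fd;                     /* -1 while no fd is held */
    bool owned;                 /* true: backend opens/closes the fd itself */
    uint32_t users;             /* number of connected frontends */
    int (*open_dev)(void);      /* hypothetical hook for opening /dev/iommu */
    void (*close_dev)(int fd);  /* hypothetical hook for close() */
} DemoBackend;

/* Test doubles so the sketch runs without /dev/iommu. */
static int fake_open_calls, fake_close_calls;
static int fake_open(void) { fake_open_calls++; return 42; }
static void fake_close(int fd) { (void)fd; fake_close_calls++; }

static int demo_connect(DemoBackend *be)
{
    if (be->users == UINT32_MAX) {
        return -E2BIG;           /* counter would overflow */
    }
    if (be->owned && be->users == 0) {
        int fd = be->open_dev(); /* first user opens the device */

        if (fd < 0) {
            return fd;
        }
        be->fd = fd;
    }
    be->users++;
    return 0;
}

static void demo_disconnect(DemoBackend *be)
{
    if (be->users == 0) {
        return;                  /* unbalanced disconnect: ignore */
    }
    be->users--;
    if (be->users == 0 && be->owned) {
        be->close_dev(be->fd);   /* last user closes an owned fd */
        be->fd = -1;
    }
}
```

Because the pre-opened fd is left untouched on the last disconnect, the management tool that passed it keeps control of its lifetime.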
* Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
2023-11-15 12:09 ` Philippe Mathieu-Daudé
@ 2023-11-15 13:05 ` Cédric Le Goater
2023-11-16 2:15 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-15 13:05 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/15/23 13:09, Philippe Mathieu-Daudé wrote:
> Hi Zhenzhong,
>
> On 14/11/23 11:09, Zhenzhong Duan wrote:
>> This gives management tools like libvirt a chance to open the vfio
>> cdev with privilege and pass FD to qemu. This way qemu never needs
>> to have privilege to open a VFIO or iommu cdev node.
>>
>> Together with the earlier support of pre-opening /dev/iommu device,
>> now we have full support of passing a vfio device to unprivileged
>> qemu by management tool. This mode is no longer considered for the
>> legacy backend, so remove the "TODO" comment.
>>
>> Add helper functions vfio_device_set_fd() and vfio_device_get_name()
>> to set the fd and get the device name; they will also be used by other
>> vfio devices.
>>
>> There is no easy way to check if a device is mdev with FD passing,
>> so fail the x-balloon-allowed check unconditionally in this case.
>>
>> There is also no easy way to get BDF as name with FD passing, so
>> we fake a name by VFIO_FD[fd].
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v6: simplify CONFIG_IOMMUFD checking code
>> introduce a helper vfio_device_set_fd
>>
>> include/hw/vfio/vfio-common.h | 3 +++
>> hw/vfio/helpers.c | 44 +++++++++++++++++++++++++++++++++++
>> hw/vfio/iommufd.c | 12 ++++++----
>> hw/vfio/pci.c | 28 ++++++++++++----------
>> 4 files changed, 71 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>> index 3dac5c167e..567e5f7bea 100644
>> --- a/include/hw/vfio/vfio-common.h
>> +++ b/include/hw/vfio/vfio-common.h
>> @@ -251,4 +251,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
>> hwaddr size);
>> int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
>> uint64_t size, ram_addr_t ram_addr);
>> +
>
> Please add bare documentation:
>
> /* Returns 0 on success, or a negative errno. */
>
>> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>
> Functions taking an Error** param should return a boolean, so:
>
> /* Return: true on success, else false setting @errp with error. */
>
>> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
>> #endif /* HW_VFIO_VFIO_COMMON_H */
>
>
>> @@ -609,3 +611,45 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
>> return ret;
>> }
>> +
>> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
>> +{
>> + struct stat st;
>> +
>> + if (vbasedev->fd < 0) {
>> + if (stat(vbasedev->sysfsdev, &st) < 0) {
>> + error_setg_errno(errp, errno, "no such host device");
>> + error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
>> + return -errno;
>> + }
>> + /* User may specify a name, e.g: VFIO platform device */
>> + if (!vbasedev->name) {
>> + vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
>> + }
>> + } else {
>> + if (!vbasedev->iommufd) {
>> + error_setg(errp, "Use FD passing only with iommufd backend");
>> + return -EINVAL;
>> + }
>> + /*
>> + * Give a name with fd so any function printing out vbasedev->name
>> + * will not break.
>> + */
>> + if (!vbasedev->name) {
>> + vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
>> + }
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
>
> bool vfio_device_set_fd(..., Error **errp)
>
>> +{
>> + int fd = monitor_fd_param(monitor_cur(), str, errp);
>> +
>> + if (fd < 0) {
>> + error_prepend(errp, "Could not parse remote object fd %s:", str);
>> + return;
>
> return false;
>
>> + }
>> + vbasedev->fd = fd;
>
> return true;
If we had a QOM base device object, vfio_device_set_fd() would be passed
directly to object_class_property_add_str(), which expects a:
void (*set)(Object *, const char *, Error **)
I think it is fine to keep it as it is. We might have a QOM base device
object one day! Minor anyway.
Thanks,
C.
>
>> +}
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 3eec428162..e08a217057 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -326,11 +326,15 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>> uint32_t ioas_id;
>> Error *err = NULL;
>> - devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>> - if (devfd < 0) {
>> - return devfd;
>> + if (vbasedev->fd < 0) {
>> + devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>> + if (devfd < 0) {
>> + return devfd;
>> + }
>> + vbasedev->fd = devfd;
>> + } else {
>> + devfd = vbasedev->fd;
>> }
>> - vbasedev->fd = devfd;
>> ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
>> if (ret) {
>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>> index c5984b0598..b23b492cce 100644
>> --- a/hw/vfio/pci.c
>> +++ b/hw/vfio/pci.c
>> @@ -2944,17 +2944,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>> VFIODevice *vbasedev = &vdev->vbasedev;
>> char *tmp, *subsys;
>> Error *err = NULL;
>> - struct stat st;
>> int i, ret;
>> bool is_mdev;
>> char uuid[UUID_STR_LEN];
>> char *name;
>> - if (!vbasedev->sysfsdev) {
>> + if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
>> if (!(~vdev->host.domain || ~vdev->host.bus ||
>> ~vdev->host.slot || ~vdev->host.function)) {
>> error_setg(errp, "No provided host device");
>> error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F "
>> +#ifdef CONFIG_IOMMUFD
>> + "or -device vfio-pci,fd=DEVICE_FD "
>> +#endif
>> "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
>> return;
>> }
>> @@ -2964,13 +2966,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>> vdev->host.slot, vdev->host.function);
>> }
>> - if (stat(vbasedev->sysfsdev, &st) < 0) {
>> - error_setg_errno(errp, errno, "no such host device");
>> - error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
>> + if (vfio_device_get_name(vbasedev, errp)) {
>
> Clearer as:
>
> if (vfio_device_get_name(vbasedev, errp) < 0) {
>
>> return;
>> }
>
> Regards,
>
> Phil.
>
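To illustrate the trade-off discussed above, a bool-returning helper can still be plugged into a void property setter through a thin wrapper. This is a hypothetical sketch. DemoObject and DemoStrSetter only mimic the shape of the setter that object_class_property_add_str() expects, and a plain string replaces Error. Only numeric fds are parsed here, whereas monitor_fd_param() also resolves fd names registered with 'getfd'.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QOM object holding a device fd. */
typedef struct DemoObject {
    int fd;
} DemoObject;

/* Shape of a QOM-style string setter: no return value, errors via errp. */
typedef void (*DemoStrSetter)(DemoObject *obj, const char *value,
                              const char **errp);

/* bool-returning helper in the style suggested for vfio_device_set_fd(). */
static bool demo_set_fd(DemoObject *obj, const char *value, const char **errp)
{
    char *end;
    long fd;

    errno = 0;
    fd = strtol(value, &end, 10);
    if (errno || end == value || *end != '\0' || fd < 0 || fd > INT_MAX) {
        if (errp) {
            *errp = "Could not parse remote object fd";
        }
        return false;
    }
    obj->fd = (int)fd;
    return true;
}

/* Thin adapter matching the setter type; failure is already in *errp. */
static void demo_set_fd_prop(DemoObject *obj, const char *value,
                             const char **errp)
{
    (void)demo_set_fd(obj, value, errp);
}

static const DemoStrSetter demo_fd_setter = demo_set_fd_prop;
```

Direct callers get the convenient `if (!demo_set_fd(...))` form. A future property registration would only need the one-line adapter.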
* Re: [PATCH v6 02/21] util/char_dev: Add open_cdev()
2023-11-14 10:09 ` [PATCH v6 02/21] util/char_dev: Add open_cdev() Zhenzhong Duan
2023-11-14 13:29 ` Cédric Le Goater
@ 2023-11-15 13:23 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-15 13:23 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Yi Liu <yi.l.liu@intel.com>
>
> /dev/vfio/devices/vfioX may not exist. In that case it is still possible
> to open /dev/char/$major:$minor instead. Add helper function to abstract
> the cdev open.
>
> Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> MAINTAINERS | 3 ++
> include/qemu/chardev_open.h | 16 ++++++++
> util/chardev_open.c | 81 +++++++++++++++++++++++++++++++++++++
> util/meson.build | 1 +
> 4 files changed, 101 insertions(+)
> create mode 100644 include/qemu/chardev_open.h
> create mode 100644 util/chardev_open.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index a4891f7bda..869ec3d5af 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2172,6 +2172,9 @@ M: Eric Auger <eric.auger@redhat.com>
> S: Supported
> F: backends/iommufd.c
> F: include/sysemu/iommufd.h
> +F: include/qemu/chardev_open.h
> +F: util/chardev_open.c
> +
>
> vhost
> M: Michael S. Tsirkin <mst@redhat.com>
> diff --git a/include/qemu/chardev_open.h b/include/qemu/chardev_open.h
> new file mode 100644
> index 0000000000..64e8fcfdcb
> --- /dev/null
> +++ b/include/qemu/chardev_open.h
> @@ -0,0 +1,16 @@
> +/*
> + * QEMU Chardev Helper
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *
> + * This work is licensed under the terms of the GNU GPL, version 2. See
> + * the COPYING file in the top-level directory.
> + */
> +
> +#ifndef QEMU_CHARDEV_OPEN_H
> +#define QEMU_CHARDEV_OPEN_H
> +
> +int open_cdev(const char *devpath, dev_t cdev);
> +#endif
> diff --git a/util/chardev_open.c b/util/chardev_open.c
> new file mode 100644
> index 0000000000..f776429788
> --- /dev/null
> +++ b/util/chardev_open.c
> @@ -0,0 +1,81 @@
> +/*
> + * Copyright (c) 2019, Mellanox Technologies. All rights reserved.
> + * Copyright (C) 2023 Intel Corporation.
> + *
> + * This software is available to you under a choice of one of two
> + * licenses. You may choose to be licensed under the terms of the GNU
> + * General Public License (GPL) Version 2, available from the file
> + * COPYING in the main directory of this source tree, or the
> + * OpenIB.org BSD license below:
> + *
> + * Redistribution and use in source and binary forms, with or
> + * without modification, are permitted provided that the following
> + * conditions are met:
> + *
> + * - Redistributions of source code must retain the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer.
> + *
> + * - Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following
> + * disclaimer in the documentation and/or other materials
> + * provided with the distribution.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> + * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> + * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> + * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
> + * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
> + * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
> + * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
> + * SOFTWARE.
> + *
> + * Authors: Yi Liu <yi.l.liu@intel.com>
> + *
> + * Copied from
> + * https://github.com/linux-rdma/rdma-core/blob/master/util/open_cdev.c
> + *
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu/chardev_open.h"
> +
> +static int open_cdev_internal(const char *path, dev_t cdev)
> +{
> + struct stat st;
> + int fd;
> +
> + fd = qemu_open_old(path, O_RDWR);
> + if (fd == -1) {
> + return -1;
> + }
> + if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
> + (cdev != 0 && st.st_rdev != cdev)) {
> + close(fd);
> + return -1;
> + }
> + return fd;
> +}
> +
> +static int open_cdev_robust(dev_t cdev)
> +{
> + g_autofree char *devpath = NULL;
> +
> + /*
> + * This assumes that udev is being used and is creating the /dev/char/
> + * symlinks.
> + */
> + devpath = g_strdup_printf("/dev/char/%u:%u", major(cdev), minor(cdev));
> + return open_cdev_internal(devpath, cdev);
> +}
> +
> +int open_cdev(const char *devpath, dev_t cdev)
> +{
> + int fd;
> +
> + fd = open_cdev_internal(devpath, cdev);
> + if (fd == -1 && cdev != 0) {
> + return open_cdev_robust(cdev);
> + }
> + return fd;
> +}
> diff --git a/util/meson.build b/util/meson.build
> index c2322ef6e7..174c133368 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -108,6 +108,7 @@ if have_block
> util_ss.add(files('filemonitor-stub.c'))
> endif
> util_ss.add(when: 'CONFIG_LINUX', if_true: files('vfio-helpers.c'))
> + util_ss.add(when: 'CONFIG_LINUX', if_true: files('chardev_open.c'))
> endif
>
> if cpu == 'aarch64'
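The fallback in open_cdev() can be exercised outside QEMU. Below is a hypothetical standalone sketch that mirrors open_cdev_internal() and open_cdev_robust(). It uses plain open() (with O_CLOEXEC, a choice made here) in place of qemu_open_old(), and a fixed-size buffer instead of g_strdup_printf(). As in the patch, the /dev/char/MAJOR:MINOR path assumes udev creates those symlinks. The demo_* names are illustrative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>
#include <unistd.h>

/* Open @path; accept it only if it is a char device (matching @cdev if nonzero). */
static int demo_open_cdev_internal(const char *path, dev_t cdev)
{
    struct stat st;
    int fd = open(path, O_RDWR | O_CLOEXEC);

    if (fd == -1) {
        return -1;
    }
    if (fstat(fd, &st) || !S_ISCHR(st.st_mode) ||
        (cdev != 0 && st.st_rdev != cdev)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Build the udev-style /dev/char/MAJOR:MINOR path for @cdev into @buf. */
static int demo_cdev_char_path(char *buf, size_t len, dev_t cdev)
{
    return snprintf(buf, len, "/dev/char/%u:%u", major(cdev), minor(cdev));
}

/* Try @devpath first; if that fails and @cdev is known, try /dev/char/. */
static int demo_open_cdev(const char *devpath, dev_t cdev)
{
    char path[64];
    int fd = demo_open_cdev_internal(devpath, cdev);

    if (fd == -1 && cdev != 0) {
        demo_cdev_char_path(path, sizeof(path), cdev);
        fd = demo_open_cdev_internal(path, cdev);
    }
    return fd;
}
```

With cdev == 0 no fallback is attempted, because there is no major:minor to build the /dev/char/ path from.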
* Re: [PATCH v6 03/21] vfio/common: return early if space isn't empty
2023-11-14 10:09 ` [PATCH v6 03/21] vfio/common: return early if space isn't empty Zhenzhong Duan
2023-11-14 13:29 ` Cédric Le Goater
@ 2023-11-15 13:28 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-15 13:28 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> This is a trivial optimization. If there is an active container in the space,
> vfio_reset_handler will never be unregistered, so invert the check of
> space->containers and return early.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Thanks
Eric
> ---
> hw/vfio/common.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 572ae7c934..934f4f5446 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -1462,10 +1462,13 @@ VFIOAddressSpace *vfio_get_address_space(AddressSpace *as)
>
> void vfio_put_address_space(VFIOAddressSpace *space)
> {
> - if (QLIST_EMPTY(&space->containers)) {
> - QLIST_REMOVE(space, list);
> - g_free(space);
> + if (!QLIST_EMPTY(&space->containers)) {
> + return;
> }
> +
> + QLIST_REMOVE(space, list);
> + g_free(space);
> +
> if (QLIST_EMPTY(&vfio_address_spaces)) {
> qemu_unregister_reset(vfio_reset_handler, NULL);
> }
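The early return only restructures the logic: a space that still has containers stays on the global list, so that list cannot become empty and the reset handler cannot be unregistered on that path. A minimal sketch of the same bookkeeping, assuming the BSD <sys/queue.h> LIST macros as a stand-in for QEMU's QLIST and a counter in place of the containers list; all demo_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/queue.h>

struct demo_space {
    int nr_containers;              /* stands in for space->containers */
    LIST_ENTRY(demo_space) list;
};

static LIST_HEAD(, demo_space) demo_spaces = LIST_HEAD_INITIALIZER(demo_spaces);
static bool demo_reset_registered;  /* stands in for the reset handler */

static struct demo_space *demo_get_space(void)
{
    struct demo_space *space = calloc(1, sizeof(*space));

    assert(space);
    /* First space on the list: register the reset handler. */
    if (LIST_EMPTY(&demo_spaces)) {
        demo_reset_registered = true;
    }
    LIST_INSERT_HEAD(&demo_spaces, space, list);
    return space;
}

static void demo_put_space(struct demo_space *space)
{
    /* Still in use: keep it listed and leave the handler alone. */
    if (space->nr_containers > 0) {
        return;
    }

    LIST_REMOVE(space, list);
    free(space);

    /* Last space gone: unregister the reset handler. */
    if (LIST_EMPTY(&demo_spaces)) {
        demo_reset_registered = false;
    }
}
```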
* Re: [PATCH v6 05/21] vfio/iommufd: Relax assert check for iommufd backend
2023-11-14 10:09 ` [PATCH v6 05/21] vfio/iommufd: Relax assert check for " Zhenzhong Duan
@ 2023-11-15 13:56 ` Eric Auger
0 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-15 13:56 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Currently iommufd doesn't support dirty page sync, but that does
> not block live migration if VFIO migration is force-enabled.
>
> So in this case, allow set_dirty_page_tracking to be NULL.
> Note that query_dirty_bitmap doesn't need the same change: when
> dirty page sync isn't supported, query_dirty_bitmap is never
> called.
>
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> hw/vfio/container-base.c | 4 ++++
> hw/vfio/container.c | 4 ----
> 2 files changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 71f7274973..eee2dcfe76 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -55,6 +55,10 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
> int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> bool start)
> {
> + if (!bcontainer->dirty_pages_supported) {
> + return 0;
> + }
> +
> g_assert(bcontainer->ops->set_dirty_page_tracking);
> return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
> }
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 6bacf38222..ed2d721b2b 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -216,10 +216,6 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> .argsz = sizeof(dirty),
> };
>
> - if (!bcontainer->dirty_pages_supported) {
> - return 0;
> - }
> -
> if (start) {
> dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
> } else {
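Moving the dirty_pages_supported check into the common wrapper means a backend without dirty tracking can leave the callback NULL, and the assertion only fires if a backend claims support without providing it. A minimal sketch of that dispatch pattern; the demo_* types and the call counter are hypothetical stand-ins for VFIOContainerBase and VFIOIOMMUOps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_container;

struct demo_ops {
    /* May be NULL when the backend has no dirty tracking. */
    int (*set_dirty_page_tracking)(struct demo_container *c, bool start);
};

struct demo_container {
    const struct demo_ops *ops;
    bool dirty_pages_supported;
};

/* Common wrapper: the capability check lives here, once, for all backends. */
static int demo_set_dirty_page_tracking(struct demo_container *c, bool start)
{
    if (!c->dirty_pages_supported) {
        return 0;
    }
    assert(c->ops->set_dirty_page_tracking);
    return c->ops->set_dirty_page_tracking(c, start);
}

static int demo_legacy_calls;

static int demo_legacy_set_dirty(struct demo_container *c, bool start)
{
    (void)c;
    (void)start;
    demo_legacy_calls++;    /* a real backend would issue an ioctl here */
    return 0;
}

static const struct demo_ops demo_legacy_ops = {
    .set_dirty_page_tracking = demo_legacy_set_dirty,
};

static const struct demo_ops demo_iommufd_ops = {
    .set_dirty_page_tracking = NULL,    /* not implemented by this backend */
};
```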
* Re: [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes
2023-11-14 13:46 ` Cédric Le Goater
2023-11-15 2:36 ` Duan, Zhenzhong
@ 2023-11-15 16:25 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-15 16:25 UTC (permalink / raw)
To: Cédric Le Goater, Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, peterx, jasowang,
kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 14:46, Cédric Le Goater wrote:
> On 11/14/23 11:09, Zhenzhong Duan wrote:
>> Some vIOMMUs, such as virtio-iommu, use the host-side IOVA ranges to
>> set up reserved ranges for a passthrough device, so that the guest will
>> not use an IOVA range beyond what the host supports.
>>
>> Use an IOMMUFD uAPI to get the host-side IOVA ranges and pass them to
>> the vIOMMU, just like the legacy backend. If this fails, fall back to
>> the full 64-bit IOVA range.
>>
>> Also use the out_iova_alignment returned by the uAPI as pgsizes, falling
>> back to qemu_real_host_page_size() only if the query fails.
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v6: propagate iommufd_cdev_get_info_iova_range err and print as warning
>>
>> hw/vfio/iommufd.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 54 insertions(+), 1 deletion(-)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 06282d885c..e5bf528e89 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -267,6 +267,53 @@ static int iommufd_cdev_ram_block_discard_disable(bool state)
>> return ram_block_uncoordinated_discard_disable(state);
>> }
>> +static int iommufd_cdev_get_info_iova_range(VFIOIOMMUFDContainer *container,
>> + uint32_t ioas_id, Error **errp)
>> +{
>> + VFIOContainerBase *bcontainer = &container->bcontainer;
>> + struct iommu_ioas_iova_ranges *info;
>> + struct iommu_iova_range *iova_ranges;
>> + int ret, sz, fd = container->be->fd;
>> +
>> + info = g_malloc0(sizeof(*info));
>> + info->size = sizeof(*info);
>> + info->ioas_id = ioas_id;
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> + if (ret && errno != EMSGSIZE) {
>> + goto error;
>> + }
>> +
>> + sz = info->num_iovas * sizeof(struct iommu_iova_range);
>> + info = g_realloc(info, sizeof(*info) + sz);
>> + info->allowed_iovas = (uintptr_t)(info + 1);
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_IOVA_RANGES, info);
>> + if (ret) {
>> + goto error;
>> + }
>> +
>> + iova_ranges = (struct iommu_iova_range *)(uintptr_t)info->allowed_iovas;
>> +
>> + for (int i = 0; i < info->num_iovas; i++) {
>> + Range *range = g_new(Range, 1);
>> +
>> + range_set_bounds(range, iova_ranges[i].start, iova_ranges[i].last);
>> + bcontainer->iova_ranges =
>> + range_list_insert(bcontainer->iova_ranges, range);
>> + }
>> + bcontainer->pgsizes = info->out_iova_alignment;
>> +
>> + g_free(info);
>> + return 0;
>> +
>> +error:
>> + ret = -errno;
>> + g_free(info);
>> + error_setg_errno(errp, errno, "Cannot get IOVA ranges");
>> + return ret;
>> +}
>> +
>> static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>> AddressSpace *as, Error **errp)
>> {
>> @@ -341,7 +388,13 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>> goto err_discard_disable;
>> }
>> - bcontainer->pgsizes = qemu_real_host_page_size();
>> + ret = iommufd_cdev_get_info_iova_range(container, ioas_id, &err);
>> + if (ret) {
>> + warn_report_err(err);
>> + err = NULL;
>> + error_printf("Fallback to default 64bit IOVA range and 4K page size\n");
>
> This would be better :
>
> error_append_hint(&err,
> "Fallback to default 64bit IOVA range and 4K page size\n");
> warn_report_err(err);
>
> I will take care of it if you agree. With that,
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
With Cédric's suggestion,
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
>
> Thanks,
>
> C.
>
>
>> + bcontainer->pgsizes = qemu_real_host_page_size();
>> + }
>> bcontainer->listener = vfio_memory_listener;
>> memory_listener_register(&bcontainer->listener,
>> bcontainer->space->as);
>
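iommufd_cdev_get_info_iova_range() uses the usual two-call sizing idiom: the first IOMMU_IOAS_IOVA_RANGES call carries no array, fails with EMSGSIZE and reports num_iovas; the buffer is then grown and the call repeated. A standalone sketch of that idiom, assuming a fake query function in place of the ioctl and a flexible array member in place of the allowed_iovas user pointer of the real uAPI; all demo_* names and the example ranges are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_iova_range {
    uint64_t start;
    uint64_t last;                    /* inclusive bound */
};

struct demo_iova_ranges_info {
    uint32_t num_iovas;               /* in: array capacity, out: ranges available */
    uint64_t out_iova_alignment;
    struct demo_iova_range ranges[];  /* real uAPI: allowed_iovas user pointer */
};

/* Hypothetical host windows reported by the fake query. */
static const struct demo_iova_range demo_host_ranges[] = {
    { 0x0ULL, 0xfedfffffULL },
    { 0xfef00000ULL, 0xffffffffffffULL },
};

/* Fake query: always reports the count, fails with EMSGSIZE if it does not fit. */
static int demo_query(struct demo_iova_ranges_info *info)
{
    uint32_t n = sizeof(demo_host_ranges) / sizeof(demo_host_ranges[0]);
    uint32_t cap = info->num_iovas;

    info->num_iovas = n;
    info->out_iova_alignment = 4096;
    if (cap < n) {
        errno = EMSGSIZE;
        return -1;
    }
    memcpy(info->ranges, demo_host_ranges, sizeof(demo_host_ranges));
    return 0;
}

/* Returns a heap buffer the caller frees, or NULL with errno set. */
static struct demo_iova_ranges_info *demo_get_iova_ranges(void)
{
    struct demo_iova_ranges_info *info, *bigger;
    int saved_errno;

    info = calloc(1, sizeof(*info));          /* first pass: capacity 0 */
    if (!info) {
        return NULL;
    }
    if (demo_query(info) && errno != EMSGSIZE) {
        goto fail;
    }

    bigger = realloc(info, sizeof(*info) +
                           info->num_iovas * sizeof(info->ranges[0]));
    if (!bigger) {
        goto fail;
    }
    info = bigger;

    if (demo_query(info)) {                   /* second pass: must fit now */
        goto fail;
    }
    return info;

fail:
    saved_errno = errno;
    free(info);
    errno = saved_errno;
    return NULL;
}
```

On a NULL return, a caller following the patch would keep the default 64-bit IOVA range and use qemu_real_host_page_size() for pgsizes.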
* Re: [PATCH v6 07/21] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
2023-11-14 10:09 ` [PATCH v6 07/21] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
@ 2023-11-15 17:00 ` Eric Auger
0 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-15 17:00 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> This helper will be used by both legacy and iommufd backends.
>
> No functional changes intended.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> hw/vfio/pci.h | 3 +++
> hw/vfio/pci.c | 54 +++++++++++++++++++++++++++++++++++----------------
> 2 files changed, 40 insertions(+), 17 deletions(-)
>
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index fba8737ab2..1006061afb 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
>
> extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
>
> +int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> + struct vfio_pci_hot_reset_info **info_p);
> +
> int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp);
>
> int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index c62c02f7b6..eb55e8ae88 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2445,22 +2445,13 @@ static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
> return (strcmp(tmp, name) == 0);
> }
>
> -static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> + struct vfio_pci_hot_reset_info **info_p)
> {
> - VFIOGroup *group;
> struct vfio_pci_hot_reset_info *info;
> - struct vfio_pci_dependent_device *devices;
> - struct vfio_pci_hot_reset *reset;
> - int32_t *fds;
> - int ret, i, count;
> - bool multi = false;
> + int ret, count;
>
> - trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> -
> - if (!single) {
> - vfio_pci_pre_reset(vdev);
> - }
> - vdev->vbasedev.needs_reset = false;
> + assert(info_p && !*info_p);
>
> info = g_malloc0(sizeof(*info));
> info->argsz = sizeof(*info);
> @@ -2468,24 +2459,53 @@ static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> if (ret && errno != ENOSPC) {
> ret = -errno;
> + g_free(info);
> if (!vdev->has_pm_reset) {
> error_report("vfio: Cannot reset device %s, "
> "no available reset mechanism.", vdev->vbasedev.name);
> }
> - goto out_single;
> + return ret;
> }
>
> count = info->count;
> - info = g_realloc(info, sizeof(*info) + (count * sizeof(*devices)));
> - info->argsz = sizeof(*info) + (count * sizeof(*devices));
> - devices = &info->devices[0];
> + info = g_realloc(info, sizeof(*info) + (count * sizeof(info->devices[0])));
> + info->argsz = sizeof(*info) + (count * sizeof(info->devices[0]));
>
> ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_PCI_HOT_RESET_INFO, info);
> if (ret) {
> ret = -errno;
> + g_free(info);
> error_report("vfio: hot reset info failed: %m");
> + return ret;
> + }
> +
> + *info_p = info;
> + return 0;
> +}
> +
> +static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> +{
> + VFIOGroup *group;
> + struct vfio_pci_hot_reset_info *info = NULL;
> + struct vfio_pci_dependent_device *devices;
> + struct vfio_pci_hot_reset *reset;
> + int32_t *fds;
> + int ret, i, count;
> + bool multi = false;
> +
> + trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> +
> + if (!single) {
> + vfio_pci_pre_reset(vdev);
> + }
> + vdev->vbasedev.needs_reset = false;
> +
> + ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> +
> + if (ret) {
> goto out_single;
> }
> + devices = &info->devices[0];
>
> trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
>
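The extracted helper follows a common C ownership contract: the caller passes a pointer initialised to NULL, the helper stores the allocation only on success and frees it itself on every error path, so the caller can free unconditionally at its exit label. A minimal sketch of that contract, with a simulated failure instead of the VFIO_DEVICE_GET_PCI_HOT_RESET_INFO ioctl; the demo_* names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_reset_info {
    unsigned count;
    int devices[];              /* stands in for the dependent-device array */
};

/* Returns 0 and sets *info_p on success, or a negative errno. */
static int demo_get_reset_info(unsigned ndeps, bool fail,
                               struct demo_reset_info **info_p)
{
    struct demo_reset_info *info;

    assert(info_p && !*info_p);     /* same precondition as the patch */

    if (fail) {
        return -ENODEV;             /* simulated ioctl failure, nothing leaked */
    }
    info = calloc(1, sizeof(*info) + ndeps * sizeof(info->devices[0]));
    if (!info) {
        return -ENOMEM;
    }
    info->count = ndeps;
    for (unsigned i = 0; i < ndeps; i++) {
        info->devices[i] = (int)i;
    }
    *info_p = info;                 /* ownership moves to the caller */
    return 0;
}

/* Caller: frees unconditionally at the exit label, like out_single. */
static int demo_hot_reset(unsigned ndeps, bool fail)
{
    struct demo_reset_info *info = NULL;
    int ret = demo_get_reset_info(ndeps, fail, &info);

    if (ret) {
        goto out;
    }
    ret = (int)info->count;         /* pretend to act on each dependent device */
out:
    free(info);                     /* free(NULL) is a no-op */
    return ret;
}
```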
* Re: [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface
2023-11-14 10:09 ` [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
2023-11-14 13:51 ` Cédric Le Goater
@ 2023-11-15 17:54 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-15 17:54 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Legacy VFIO PCI and iommufd cdev use different processes to hot reset a
> VFIO device. Expand the current code to abstract out a pci_hot_reset
> callback for legacy VFIO; the same interface will also be used by
> iommufd cdev VFIO devices.
>
> Rename vfio_pci_hot_reset to vfio_legacy_pci_hot_reset and move it
> into container.c.
>
> vfio_pci_[pre/post]_reset and vfio_pci_host_match are exported so
> they can be called from the legacy and iommufd pci_hot_reset callbacks.
>
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> v6: pci_hot_reset return -errno if fails
>
> hw/vfio/pci.h | 3 +
> include/hw/vfio/vfio-container-base.h | 3 +
> hw/vfio/container.c | 170 ++++++++++++++++++++++++++
> hw/vfio/pci.c | 168 +------------------------
> 4 files changed, 182 insertions(+), 162 deletions(-)
>
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index 1006061afb..6e64a2654e 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -218,6 +218,9 @@ void vfio_probe_igd_bar4_quirk(VFIOPCIDevice *vdev, int nr);
>
> extern const PropertyInfo qdev_prop_nv_gpudirect_clique;
>
> +void vfio_pci_pre_reset(VFIOPCIDevice *vdev);
> +void vfio_pci_post_reset(VFIOPCIDevice *vdev);
> +bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name);
> int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
> struct vfio_pci_hot_reset_info **info_p);
>
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 4b6f017c6f..45bb19c767 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -106,6 +106,9 @@ struct VFIOIOMMUOps {
> int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
> int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size);
> + /* PCI specific */
> + int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
> +
> /* SPAPR specific */
> int (*add_window)(VFIOContainerBase *bcontainer,
> MemoryRegionSection *section,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index ed2d721b2b..1dbf9b9a17 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -33,6 +33,7 @@
> #include "trace.h"
> #include "qapi/error.h"
> #include "migration/migration.h"
> +#include "pci.h"
>
> VFIOGroupList vfio_group_list =
> QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -922,6 +923,174 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
> vfio_put_group(group);
> }
>
> +static int vfio_legacy_pci_hot_reset(VFIODevice *vbasedev, bool single)
> +{
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> + VFIOGroup *group;
> + struct vfio_pci_hot_reset_info *info = NULL;
> + struct vfio_pci_dependent_device *devices;
> + struct vfio_pci_hot_reset *reset;
> + int32_t *fds;
> + int ret, i, count;
> + bool multi = false;
> +
> + trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> +
> + if (!single) {
> + vfio_pci_pre_reset(vdev);
> + }
> + vdev->vbasedev.needs_reset = false;
> +
> + ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> +
> + if (ret) {
> + goto out_single;
> + }
> + devices = &info->devices[0];
> +
> + trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
> +
> + /* Verify that we have all the groups required */
> + for (i = 0; i < info->count; i++) {
> + PCIHostDeviceAddress host;
> + VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
> +
> + host.domain = devices[i].segment;
> + host.bus = devices[i].bus;
> + host.slot = PCI_SLOT(devices[i].devfn);
> + host.function = PCI_FUNC(devices[i].devfn);
> +
> + trace_vfio_pci_hot_reset_dep_devices(host.domain,
> + host.bus, host.slot, host.function, devices[i].group_id);
> +
> + if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> + continue;
> + }
> +
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + if (group->groupid == devices[i].group_id) {
> + break;
> + }
> + }
> +
> + if (!group) {
> + if (!vdev->has_pm_reset) {
> + error_report("vfio: Cannot reset device %s, "
> + "depends on group %d which is not owned.",
> + vdev->vbasedev.name, devices[i].group_id);
> + }
> + ret = -EPERM;
> + goto out;
> + }
> +
> + /* Prep dependent devices for reset and clear our marker. */
> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> + if (!vbasedev_iter->dev->realized ||
> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> + if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> + if (single) {
> + ret = -EINVAL;
> + goto out_single;
> + }
> + vfio_pci_pre_reset(tmp);
> + tmp->vbasedev.needs_reset = false;
> + multi = true;
> + break;
> + }
> + }
> + }
> +
> + if (!single && !multi) {
> + ret = -EINVAL;
> + goto out_single;
> + }
> +
> + /* Determine how many group fds need to be passed */
> + count = 0;
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + for (i = 0; i < info->count; i++) {
> + if (group->groupid == devices[i].group_id) {
> + count++;
> + break;
> + }
> + }
> + }
> +
> + reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
> + reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
> + fds = &reset->group_fds[0];
> +
> + /* Fill in group fds */
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + for (i = 0; i < info->count; i++) {
> + if (group->groupid == devices[i].group_id) {
> + fds[reset->count++] = group->fd;
> + break;
> + }
> + }
> + }
> +
> + /* Bus reset! */
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> + g_free(reset);
> + if (ret) {
> + ret = -errno;
> + }
> +
> + trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
> + ret ? strerror(errno) : "Success");
> +
> +out:
> + /* Re-enable INTx on affected devices */
> + for (i = 0; i < info->count; i++) {
> + PCIHostDeviceAddress host;
> + VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
> +
> + host.domain = devices[i].segment;
> + host.bus = devices[i].bus;
> + host.slot = PCI_SLOT(devices[i].devfn);
> + host.function = PCI_FUNC(devices[i].devfn);
> +
> + if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> + continue;
> + }
> +
> + QLIST_FOREACH(group, &vfio_group_list, next) {
> + if (group->groupid == devices[i].group_id) {
> + break;
> + }
> + }
> +
> + if (!group) {
> + break;
> + }
> +
> + QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> + if (!vbasedev_iter->dev->realized ||
> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> + if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> + vfio_pci_post_reset(tmp);
> + break;
> + }
> + }
> + }
> +out_single:
> + if (!single) {
> + vfio_pci_post_reset(vdev);
> + }
> + g_free(info);
> +
> + return ret;
> +}
> +
> const VFIOIOMMUOps vfio_legacy_ops = {
> .dma_map = vfio_legacy_dma_map,
> .dma_unmap = vfio_legacy_dma_unmap,
> @@ -929,4 +1098,5 @@ const VFIOIOMMUOps vfio_legacy_ops = {
> .detach_device = vfio_legacy_detach_device,
> .set_dirty_page_tracking = vfio_legacy_set_dirty_page_tracking,
> .query_dirty_bitmap = vfio_legacy_query_dirty_bitmap,
> + .pci_hot_reset = vfio_legacy_pci_hot_reset,
> };
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index eb55e8ae88..d00c3472c7 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -2374,7 +2374,7 @@ static int vfio_add_capabilities(VFIOPCIDevice *vdev, Error **errp)
> return 0;
> }
>
> -static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
> +void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
> {
> PCIDevice *pdev = &vdev->pdev;
> uint16_t cmd;
> @@ -2411,7 +2411,7 @@ static void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
> vfio_pci_write_config(pdev, PCI_COMMAND, cmd, 2);
> }
>
> -static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
> +void vfio_pci_post_reset(VFIOPCIDevice *vdev)
> {
> Error *err = NULL;
> int nr;
> @@ -2435,7 +2435,7 @@ static void vfio_pci_post_reset(VFIOPCIDevice *vdev)
> vfio_quirk_reset(vdev);
> }
>
> -static bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
> +bool vfio_pci_host_match(PCIHostDeviceAddress *addr, const char *name)
> {
> char tmp[13];
>
> @@ -2485,166 +2485,10 @@ int vfio_pci_get_pci_hot_reset_info(VFIOPCIDevice *vdev,
>
> static int vfio_pci_hot_reset(VFIOPCIDevice *vdev, bool single)
> {
> - VFIOGroup *group;
> - struct vfio_pci_hot_reset_info *info = NULL;
> - struct vfio_pci_dependent_device *devices;
> - struct vfio_pci_hot_reset *reset;
> - int32_t *fds;
> - int ret, i, count;
> - bool multi = false;
> -
> - trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> -
> - if (!single) {
> - vfio_pci_pre_reset(vdev);
> - }
> - vdev->vbasedev.needs_reset = false;
> -
> - ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> -
> - if (ret) {
> - goto out_single;
> - }
> - devices = &info->devices[0];
> -
> - trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
> -
> - /* Verify that we have all the groups required */
> - for (i = 0; i < info->count; i++) {
> - PCIHostDeviceAddress host;
> - VFIOPCIDevice *tmp;
> - VFIODevice *vbasedev_iter;
> -
> - host.domain = devices[i].segment;
> - host.bus = devices[i].bus;
> - host.slot = PCI_SLOT(devices[i].devfn);
> - host.function = PCI_FUNC(devices[i].devfn);
> -
> - trace_vfio_pci_hot_reset_dep_devices(host.domain,
> - host.bus, host.slot, host.function, devices[i].group_id);
> -
> - if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> - continue;
> - }
> -
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - if (group->groupid == devices[i].group_id) {
> - break;
> - }
> - }
> -
> - if (!group) {
> - if (!vdev->has_pm_reset) {
> - error_report("vfio: Cannot reset device %s, "
> - "depends on group %d which is not owned.",
> - vdev->vbasedev.name, devices[i].group_id);
> - }
> - ret = -EPERM;
> - goto out;
> - }
> -
> - /* Prep dependent devices for reset and clear our marker. */
> - QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> - if (!vbasedev_iter->dev->realized ||
> - vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> - continue;
> - }
> - tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> - if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> - if (single) {
> - ret = -EINVAL;
> - goto out_single;
> - }
> - vfio_pci_pre_reset(tmp);
> - tmp->vbasedev.needs_reset = false;
> - multi = true;
> - break;
> - }
> - }
> - }
> -
> - if (!single && !multi) {
> - ret = -EINVAL;
> - goto out_single;
> - }
> -
> - /* Determine how many group fds need to be passed */
> - count = 0;
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - for (i = 0; i < info->count; i++) {
> - if (group->groupid == devices[i].group_id) {
> - count++;
> - break;
> - }
> - }
> - }
> -
> - reset = g_malloc0(sizeof(*reset) + (count * sizeof(*fds)));
> - reset->argsz = sizeof(*reset) + (count * sizeof(*fds));
> - fds = &reset->group_fds[0];
> -
> - /* Fill in group fds */
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - for (i = 0; i < info->count; i++) {
> - if (group->groupid == devices[i].group_id) {
> - fds[reset->count++] = group->fd;
> - break;
> - }
> - }
> - }
> -
> - /* Bus reset! */
> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> - g_free(reset);
> -
> - trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
> - ret ? strerror(errno) : "Success");
> -
> -out:
> - /* Re-enable INTx on affected devices */
> - for (i = 0; i < info->count; i++) {
> - PCIHostDeviceAddress host;
> - VFIOPCIDevice *tmp;
> - VFIODevice *vbasedev_iter;
> -
> - host.domain = devices[i].segment;
> - host.bus = devices[i].bus;
> - host.slot = PCI_SLOT(devices[i].devfn);
> - host.function = PCI_FUNC(devices[i].devfn);
> -
> - if (vfio_pci_host_match(&host, vdev->vbasedev.name)) {
> - continue;
> - }
> -
> - QLIST_FOREACH(group, &vfio_group_list, next) {
> - if (group->groupid == devices[i].group_id) {
> - break;
> - }
> - }
> -
> - if (!group) {
> - break;
> - }
> -
> - QLIST_FOREACH(vbasedev_iter, &group->device_list, next) {
> - if (!vbasedev_iter->dev->realized ||
> - vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> - continue;
> - }
> - tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> - if (vfio_pci_host_match(&host, tmp->vbasedev.name)) {
> - vfio_pci_post_reset(tmp);
> - break;
> - }
> - }
> - }
> -out_single:
> - if (!single) {
> - vfio_pci_post_reset(vdev);
> - }
> - g_free(info);
> + VFIODevice *vbasedev = &vdev->vbasedev;
> + const VFIOIOMMUOps *ops = vbasedev->bcontainer->ops;
>
> - return ret;
> + return ops->pci_hot_reset(vbasedev, single);
> }
>
> /*
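After this patch vfio_pci_hot_reset() only dispatches through the container's ops table, and the backend callback recovers the enclosing VFIOPCIDevice from the VFIODevice pointer with container_of(). A minimal sketch of that pattern; the demo_* types, the local container_of definition and the return values are hypothetical, chosen only to make the dispatch observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local equivalent of container_of; QEMU provides its own macro. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_vbasedev;

struct demo_iommu_ops {
    int (*pci_hot_reset)(struct demo_vbasedev *vbasedev, bool single);
};

struct demo_vbasedev {                  /* stands in for VFIODevice */
    const struct demo_iommu_ops *ops;   /* stands in for bcontainer->ops */
    bool needs_reset;
};

struct demo_pci_device {                /* stands in for VFIOPCIDevice */
    int host_id;
    struct demo_vbasedev vbasedev;      /* embedded, deliberately not first */
};

/* Backend callback: receives the base device, recovers the PCI device. */
static int demo_legacy_pci_hot_reset(struct demo_vbasedev *vbasedev, bool single)
{
    struct demo_pci_device *vdev =
        demo_container_of(vbasedev, struct demo_pci_device, vbasedev);

    vdev->vbasedev.needs_reset = false;
    return single ? vdev->host_id : -vdev->host_id;
}

static const struct demo_iommu_ops demo_legacy_ops = {
    .pci_hot_reset = demo_legacy_pci_hot_reset,
};

/* Generic PCI code: no backend knowledge, just dispatch. */
static int demo_pci_hot_reset(struct demo_pci_device *vdev, bool single)
{
    const struct demo_iommu_ops *ops = vdev->vbasedev.ops;

    assert(ops->pci_hot_reset);  /* added in this sketch; the patch calls it directly */
    return ops->pci_hot_reset(&vdev->vbasedev, single);
}
```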
* Re: [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend
2023-11-14 10:09 ` [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
@ 2023-11-15 18:45 ` Eric Farman
1 sibling, 0 replies; 82+ messages in thread
From: Eric Farman @ 2023-11-15 18:45 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Matthew Rosato, Thomas Huth, open list:vfio-ccw
On Tue, 2023-11-14 at 18:09 +0800, Zhenzhong Duan wrote:
> Now that we support two types of IOMMU backends, add the capability
> to select one of them. This depends on whether an iommufd object has
> been linked with the vfio-ccw device:
>
> If the user wants to use the legacy backend, it shall not
> link the vfio-ccw device with any iommufd object:
>
> -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/XXX
>
> This is called the legacy mode/backend.
>
> If the user wants to use the iommufd backend (/dev/iommu) it
> shall pass an iommufd object id in the vfio-ccw device options:
>
> -object iommufd,id=iommufd0
> -device vfio-ccw,sysfsdev=/sys/bus/mdev/devices/XXX,iommufd=iommufd0
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> hw/vfio/ccw.c | 6 ++++++
> 1 file changed, 6 insertions(+)
Reviewed-by: Eric Farman <farman@linux.ibm.com>
* Re: [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle
2023-11-14 10:09 ` [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
@ 2023-11-15 18:46 ` Eric Farman
1 sibling, 0 replies; 82+ messages in thread
From: Eric Farman @ 2023-11-15 18:46 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Matthew Rosato, Thomas Huth, open list:vfio-ccw
On Tue, 2023-11-14 at 18:09 +0800, Zhenzhong Duan wrote:
> This gives management tools like libvirt a chance to open the vfio
> cdev with privilege and pass FD to qemu. This way qemu never needs
> to have privilege to open a VFIO or iommu cdev node.
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> hw/vfio/ccw.c | 25 ++++++++++++++++++++++---
> 1 file changed, 22 insertions(+), 3 deletions(-)
Reviewed-by: Eric Farman <farman@linux.ibm.com>
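One common way for a privileged manager to hand an already-open descriptor to an unprivileged process is a UNIX socket carrying SCM_RIGHTS ancillary data (QMP's getfd command receives descriptors this way; inheritance across exec is the other option). A minimal sketch of the transfer, with /dev/null standing in for a VFIO cdev; send_fd(), recv_fd() and demo_pass_devnull() are hypothetical helpers, not QEMU or libvirt API.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send @fd over a connected AF_UNIX socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 0;                  /* stream sockets need >= 1 data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;       /* guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns a new fd in this process, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr align;
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg;
    int fd;

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}

/* Round trip in one process: 0 if the received fd refers to the same
 * character device as the one that was sent. */
static int demo_pass_devnull(void)
{
    int sv[2], fd, got = -1, ret = -1;
    struct stat a, b;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -1;
    }
    fd = open("/dev/null", O_RDWR);
    if (fd >= 0 && send_fd(sv[0], fd) == 0 && (got = recv_fd(sv[1])) >= 0) {
        if (fstat(fd, &a) == 0 && fstat(got, &b) == 0 &&
            S_ISCHR(b.st_mode) && a.st_rdev == b.st_rdev) {
            ret = 0;
        }
        close(got);
    }
    if (fd >= 0) {
        close(fd);
    }
    close(sv[0]);
    close(sv[1]);
    return ret;
}
```

The receiver gets a new descriptor number referring to the same open file, so QEMU never has to open the device node itself.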
* Re: [PATCH v6 20/21] kconfig: Activate IOMMUFD for s390x machines
2023-11-14 10:09 ` [PATCH v6 20/21] kconfig: Activate IOMMUFD for s390x machines Zhenzhong Duan
@ 2023-11-15 18:47 ` Eric Farman
0 siblings, 0 replies; 82+ messages in thread
From: Eric Farman @ 2023-11-15 18:47 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Matthew Rosato, Paolo Bonzini, Halil Pasic, Christian Borntraeger,
Thomas Huth, Richard Henderson, David Hildenbrand,
Ilya Leoshkevich, open list:S390 Virtio-ccw
On Tue, 2023-11-14 at 18:09 +0800, Zhenzhong Duan wrote:
> From: Cédric Le Goater <clg@redhat.com>
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Reviewed-by: Matthew Rosato <mjrosato@linux.ibm.com>
> ---
> hw/s390x/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
Reviewed-by: Eric Farman <farman@linux.ibm.com>
* RE: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
2023-11-15 13:05 ` Cédric Le Goater
@ 2023-11-16 2:15 ` Duan, Zhenzhong
2023-11-16 7:25 ` Cédric Le Goater
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-16 2:15 UTC (permalink / raw)
To: Cédric Le Goater, Philippe Mathieu-Daudé,
qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P
Hi Philippe,
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Wednesday, November 15, 2023 9:05 PM
>Subject: Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
>
>On 11/15/23 13:09, Philippe Mathieu-Daudé wrote:
>> Hi Zhenzhong,
>>
>> On 14/11/23 11:09, Zhenzhong Duan wrote:
>>> This gives management tools like libvirt a chance to open the vfio
>>> cdev with privilege and pass FD to qemu. This way qemu never needs
>>> to have privilege to open a VFIO or iommu cdev node.
>>>
>>> Together with the earlier support of pre-opening /dev/iommu device,
>>> now we have full support of passing a vfio device to unprivileged
>>> qemu by a management tool. This mode is no longer considered for the
>>> legacy backend, so remove the "TODO" comment.
>>>
>>> Add helper functions vfio_device_set_fd() and vfio_device_get_name()
>>> to set fd and get device name, they will also be used by other vfio
>>> devices.
>>>
>>> There is no easy way to check if a device is mdev with FD passing,
>>> so fail the x-balloon-allowed check unconditionally in this case.
>>>
>>> There is also no easy way to get the BDF as the name with FD passing,
>>> so we fake a name of the form VFIO_FD[fd].
>>>
>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>> ---
>>> v6: simplify CONFIG_IOMMUFD checking code
>>> introduce a helper vfio_device_set_fd
>>>
>>> include/hw/vfio/vfio-common.h | 3 +++
>>> hw/vfio/helpers.c | 44 +++++++++++++++++++++++++++++++++++
>>> hw/vfio/iommufd.c | 12 ++++++----
>>> hw/vfio/pci.c | 28 ++++++++++++----------
>>> 4 files changed, 71 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
>>> index 3dac5c167e..567e5f7bea 100644
>>> --- a/include/hw/vfio/vfio-common.h
>>> +++ b/include/hw/vfio/vfio-common.h
>>> @@ -251,4 +251,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
>>> hwaddr size);
>>> int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
>>> uint64_t size, ram_addr_t ram_addr);
>>> +
>>
>> Please add bare documentation:
>>
>> /* Returns 0 on success, or a negative errno. */
>>
>>> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
Will do. I'd like to wait a few days to collect more suggested changes and RBs,
then send all these updates to Cédric at once before he pushes this series to vfio-next.
>>
>> Functions taking an Error** param should return a boolean, so:
>>
>> /* Return: true on success, else false setting @errp with error. */
>>
>>> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
>>> #endif /* HW_VFIO_VFIO_COMMON_H */
>>
>>
>>> @@ -609,3 +611,45 @@ bool vfio_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
>>> return ret;
>>> }
>>> +
>>> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp)
>>> +{
>>> + struct stat st;
>>> +
>>> + if (vbasedev->fd < 0) {
>>> + if (stat(vbasedev->sysfsdev, &st) < 0) {
>>> + error_setg_errno(errp, errno, "no such host device");
>>> + error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
>>> + return -errno;
>>> + }
>>> + /* User may specify a name, e.g: VFIO platform device */
>>> + if (!vbasedev->name) {
>>> + vbasedev->name = g_path_get_basename(vbasedev->sysfsdev);
>>> + }
>>> + } else {
>>> + if (!vbasedev->iommufd) {
>>> + error_setg(errp, "Use FD passing only with iommufd backend");
>>> + return -EINVAL;
>>> + }
>>> + /*
>>> + * Give a name with fd so any function printing out vbasedev->name
>>> + * will not break.
>>> + */
>>> + if (!vbasedev->name) {
>>> + vbasedev->name = g_strdup_printf("VFIO_FD%d", vbasedev->fd);
>>> + }
>>> + }
>>> +
>>> + return 0;
>>> +}
>>> +
>>> +void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
>>
>> bool vfio_device_set_fd(..., Error **errp)
>>
>>> +{
>>> + int fd = monitor_fd_param(monitor_cur(), str, errp);
>>> +
>>> + if (fd < 0) {
>>> + error_prepend(errp, "Could not parse remote object fd %s:", str);
>>> + return;
>>
>> return false;
>>
>>> + }
>>> + vbasedev->fd = fd;
>>
>> return true;
>
>If we had a QOM base device object, vfio_device_set_fd() would be passed
>directly to object_class_property_add_str() which expects a:
>
> void (*set)(Object *, const char *, Error **)
>
>I think it is fine to keep it as it is. We might have a QOM base device object
>one day! Minor anyway.
>
>Thanks,
>
>C.
>
>
>>
>>> +}
>>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>>> index 3eec428162..e08a217057 100644
>>> --- a/hw/vfio/iommufd.c
>>> +++ b/hw/vfio/iommufd.c
>>> @@ -326,11 +326,15 @@ static int iommufd_cdev_attach(const char *name, VFIODevice *vbasedev,
>>> uint32_t ioas_id;
>>> Error *err = NULL;
>>> - devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>>> - if (devfd < 0) {
>>> - return devfd;
>>> + if (vbasedev->fd < 0) {
>>> + devfd = iommufd_cdev_getfd(vbasedev->sysfsdev, errp);
>>> + if (devfd < 0) {
>>> + return devfd;
>>> + }
>>> + vbasedev->fd = devfd;
>>> + } else {
>>> + devfd = vbasedev->fd;
>>> }
>>> - vbasedev->fd = devfd;
>>> ret = iommufd_cdev_connect_and_bind(vbasedev, errp);
>>> if (ret) {
>>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
>>> index c5984b0598..b23b492cce 100644
>>> --- a/hw/vfio/pci.c
>>> +++ b/hw/vfio/pci.c
>>> @@ -2944,17 +2944,19 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>> VFIODevice *vbasedev = &vdev->vbasedev;
>>> char *tmp, *subsys;
>>> Error *err = NULL;
>>> - struct stat st;
>>> int i, ret;
>>> bool is_mdev;
>>> char uuid[UUID_STR_LEN];
>>> char *name;
>>> - if (!vbasedev->sysfsdev) {
>>> + if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
>>> if (!(~vdev->host.domain || ~vdev->host.bus ||
>>> ~vdev->host.slot || ~vdev->host.function)) {
>>> error_setg(errp, "No provided host device");
>>> error_append_hint(errp, "Use -device vfio-pci,host=DDDD:BB:DD.F "
>>> +#ifdef CONFIG_IOMMUFD
>>> + "or -device vfio-pci,fd=DEVICE_FD "
>>> +#endif
>>> "or -device vfio-pci,sysfsdev=PATH_TO_DEVICE\n");
>>> return;
>>> }
>>> @@ -2964,13 +2966,9 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>>> vdev->host.slot, vdev->host.function);
>>> }
>>> - if (stat(vbasedev->sysfsdev, &st) < 0) {
>>> - error_setg_errno(errp, errno, "no such host device");
>>> - error_prepend(errp, VFIO_MSG_PREFIX, vbasedev->sysfsdev);
>>> + if (vfio_device_get_name(vbasedev, errp)) {
>>
>> Clearer as:
>>
>> if (vfio_device_get_name(vbasedev, errp) < 0) {
>>
>>> return;
>>> }
Will do.
Thanks
Zhenzhong
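The review above asks for functions taking an `Error **` to return `bool`. A minimal sketch of that convention follows, assuming a hypothetical `SketchError` in place of QEMU's `Error` and plain `strtol()` in place of `monitor_fd_param()`, so only numeric fds are accepted here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's Error object. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Hypothetical stand-in for the VFIODevice field being set. */
typedef struct SketchDev {
    int fd;
} SketchDev;

/*
 * Return: true on success, else false setting @errp with an error.
 * Only decimal fd numbers are accepted in this sketch.
 */
bool sketch_device_set_fd(SketchDev *dev, const char *str, SketchError *errp)
{
    char *end;
    long v;

    assert(dev != NULL && str != NULL);
    errno = 0;
    v = strtol(str, &end, 10);
    if (errno || end == str || *end != '\0' || v < 0 || v > INT_MAX) {
        if (errp) {
            snprintf(errp->msg, sizeof(errp->msg),
                     "Could not parse remote object fd %s", str);
        }
        return false;           /* dev->fd is left untouched */
    }
    dev->fd = (int)v;
    return true;
}
```

Callers then test the return value directly, e.g. `if (!sketch_device_set_fd(&dev, str, &err)) { ... }`, instead of inspecting the error pointer. As Cédric notes, a QOM setter registered with `object_class_property_add_str()` must have the signature `void (*set)(Object *, const char *, Error **)`, so a bool-returning helper would need a thin `void` wrapper if it were ever used that way.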
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-15 12:52 ` Eric Auger
@ 2023-11-16 4:04 ` Duan, Zhenzhong
2023-11-16 8:32 ` Eric Auger
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-16 4:04 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, peterx@redhat.com,
jasowang@redhat.com, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
Peng, Chao P, Paolo Bonzini, Eric Blake, Markus Armbruster,
Daniel P. Berrangé, Eduardo Habkost
Hi Eric,
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Wednesday, November 15, 2023 8:53 PM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>Hi Zhenzhong,
>
>On 11/14/23 11:09, Zhenzhong Duan wrote:
>> From: Eric Auger <eric.auger@redhat.com>
>>
>> Introduce an iommufd object which allows the interaction
>> with the host /dev/iommu device.
>>
>> The /dev/iommu can have been already pre-opened outside of qemu,
>> in which case the fd can be passed directly along with the
>> iommufd object:
>>
>> This allows the iommufd object to be shared across several
>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>> the /dev/iommu once.
>>
>> If no fd is passed along with the iommufd object, the /dev/iommu
>> is opened by the qemu code.
>>
>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v6: remove redundant call, alloc_hwpt, get/put_ioas
>>
>> MAINTAINERS | 7 ++
>> qapi/qom.json | 19 ++++
>> include/sysemu/iommufd.h | 44 ++++++++
>> backends/iommufd.c | 228 +++++++++++++++++++++++++++++++++++++++
>> backends/Kconfig | 4 +
>> backends/meson.build | 1 +
>> backends/trace-events | 10 ++
>> qemu-options.hx | 12 +++
>> 8 files changed, 325 insertions(+)
>> create mode 100644 include/sysemu/iommufd.h
>> create mode 100644 backends/iommufd.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index ff1238bb98..a4891f7bda 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c
>> F: docs/system/s390x/vfio-ap.rst
>> L: qemu-s390x@nongnu.org
>>
>> +iommufd
>> +M: Yi Liu <yi.l.liu@intel.com>
>> +M: Eric Auger <eric.auger@redhat.com>
>Zhenzhong, don't you want to be added here?
>> +S: Supported
>> +F: backends/iommufd.c
>> +F: include/sysemu/iommufd.h
>> +
>> vhost
>> M: Michael S. Tsirkin <mst@redhat.com>
>> S: Supported
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index c53ef978ff..1fd8555a75 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -794,6 +794,23 @@
>> { 'struct': 'VfioUserServerProperties',
>> 'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>
>> +##
>> +# @IOMMUFDProperties:
>> +#
>> +# Properties for iommufd objects.
>> +#
>> +# @fd: file descriptor name previously passed via 'getfd' command,
>
>"previously passed via 'getfd' command", I wonder if this applies here or whether
>it is copy/paste of
>RemoteObjectProperties.fd doc?
Maybe copied 😊. I think this applies here because I use monitor_fd_param() to get the fd.
Or am I missing anything?
>
>> +# which represents a pre-opened /dev/iommu. This allows the
>> +# iommufd object to be shared across several subsystems
>> +# (VFIO, VDPA, ...), and the file descriptor to be shared
>> +# with other processes, e.g. DPDK. (default: QEMU opens
>> +# /dev/iommu by itself)
>> +#
>> +# Since: 8.2
>> +##
>> +{ 'struct': 'IOMMUFDProperties',
>> + 'data': { '*fd': 'str' } }
>> +
>> ##
>> # @RngProperties:
>> #
>> @@ -934,6 +951,7 @@
>> 'input-barrier',
>> { 'name': 'input-linux',
>> 'if': 'CONFIG_LINUX' },
>> + 'iommufd',
>> 'iothread',
>> 'main-loop',
>> { 'name': 'memory-backend-epc',
>> @@ -1003,6 +1021,7 @@
>> 'input-barrier': 'InputBarrierProperties',
>> 'input-linux': { 'type': 'InputLinuxProperties',
>> 'if': 'CONFIG_LINUX' },
>> + 'iommufd': 'IOMMUFDProperties',
>> 'iothread': 'IothreadProperties',
>> 'main-loop': 'MainLoopProperties',
>> 'memory-backend-epc': { 'type': 'MemoryBackendEpcProperties',
>> diff --git a/include/sysemu/iommufd.h b/include/sysemu/iommufd.h
>> new file mode 100644
>> index 0000000000..9b3a86f57d
>> --- /dev/null
>> +++ b/include/sysemu/iommufd.h
>> @@ -0,0 +1,44 @@
>> +#ifndef SYSEMU_IOMMUFD_H
>> +#define SYSEMU_IOMMUFD_H
>> +
>> +#include "qom/object.h"
>> +#include "qemu/thread.h"
>> +#include "exec/hwaddr.h"
>> +#include "exec/cpu-common.h"
>> +
>> +#define TYPE_IOMMUFD_BACKEND "iommufd"
>> +OBJECT_DECLARE_TYPE(IOMMUFDBackend, IOMMUFDBackendClass,
>> + IOMMUFD_BACKEND)
>> +#define IOMMUFD_BACKEND(obj) \
>> + OBJECT_CHECK(IOMMUFDBackend, (obj), TYPE_IOMMUFD_BACKEND)
>> +#define IOMMUFD_BACKEND_GET_CLASS(obj) \
>> + OBJECT_GET_CLASS(IOMMUFDBackendClass, (obj), TYPE_IOMMUFD_BACKEND)
>> +#define IOMMUFD_BACKEND_CLASS(klass) \
>> + OBJECT_CLASS_CHECK(IOMMUFDBackendClass, (klass), TYPE_IOMMUFD_BACKEND)
>> +struct IOMMUFDBackendClass {
>> + ObjectClass parent_class;
>> +};
>> +
>> +struct IOMMUFDBackend {
>> + Object parent;
>> +
>> + /*< protected >*/
>> + int fd; /* /dev/iommu file descriptor */
>> + bool owned; /* is the /dev/iommu opened internally */
>> + QemuMutex lock;
>> + uint32_t users;
>> +
>> + /*< public >*/
>> +};
>> +
>> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp);
>> +void iommufd_backend_disconnect(IOMMUFDBackend *be);
>> +
>> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>> + Error **errp);
>> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>> + ram_addr_t size, void *vaddr, bool readonly);
>> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> + hwaddr iova, ram_addr_t size);
>> +#endif
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> new file mode 100644
>> index 0000000000..ea3e2a8f85
>> --- /dev/null
>> +++ b/backends/iommufd.c
>> @@ -0,0 +1,228 @@
>> +/*
>> + * iommufd container backend
>> + *
>> + * Copyright (C) 2023 Intel Corporation.
>> + * Copyright Red Hat, Inc. 2023
>> + *
>> + * Authors: Yi Liu <yi.l.liu@intel.com>
>> + * Eric Auger <eric.auger@redhat.com>
>> + *
>> + * SPDX-License-Identifier: GPL-2.0-or-later
>> + */
>> +
>> +#include "qemu/osdep.h"
>> +#include "sysemu/iommufd.h"
>> +#include "qapi/error.h"
>> +#include "qapi/qmp/qerror.h"
>> +#include "qemu/module.h"
>> +#include "qom/object_interfaces.h"
>> +#include "qemu/error-report.h"
>> +#include "monitor/monitor.h"
>> +#include "trace.h"
>> +#include <sys/ioctl.h>
>> +#include <linux/iommufd.h>
>> +
>> +static void iommufd_backend_init(Object *obj)
>> +{
>> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +
>> + be->fd = -1;
>> + be->users = 0;
>> + be->owned = true;
>> + qemu_mutex_init(&be->lock);
>> +}
>> +
>> +static void iommufd_backend_finalize(Object *obj)
>> +{
>> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> +
>> + if (be->owned) {
>> + close(be->fd);
>> + be->fd = -1;
>> + }
>> +}
>> +
>> +static void iommufd_backend_set_fd(Object *obj, const char *str, Error **errp)
>> +{
>> + IOMMUFDBackend *be = IOMMUFD_BACKEND(obj);
>> + int fd = -1;
>> +
>> + fd = monitor_fd_param(monitor_cur(), str, errp);
>> + if (fd == -1) {
>> + error_prepend(errp, "Could not parse remote object fd %s:", str);
>> + return;
>> + }
>> + qemu_mutex_lock(&be->lock);
>> + be->fd = fd;
>> + be->owned = false;
>> + qemu_mutex_unlock(&be->lock);
>> + trace_iommu_backend_set_fd(be->fd);
>> +}
>> +
>> +static void iommufd_backend_class_init(ObjectClass *oc, void *data)
>> +{
>> + object_class_property_add_str(oc, "fd", NULL, iommufd_backend_set_fd);
>> +}
>> +
>> +int iommufd_backend_connect(IOMMUFDBackend *be, Error **errp)
>> +{
>> + int fd, ret = 0;
>> +
>> + qemu_mutex_lock(&be->lock);
>> + if (be->users == UINT32_MAX) {
>> + error_setg(errp, "too many connections");
>> + ret = -E2BIG;
>> + goto out;
>> + }
>> + if (be->owned && !be->users) {
>> + fd = qemu_open_old("/dev/iommu", O_RDWR);
>> + if (fd < 0) {
>> + error_setg_errno(errp, errno, "/dev/iommu opening failed");
>> + ret = fd;
>> + goto out;
>> + }
>> + be->fd = fd;
>> + }
>> + be->users++;
>> +out:
>> + trace_iommufd_backend_connect(be->fd, be->owned,
>> + be->users, ret);
>> + qemu_mutex_unlock(&be->lock);
>> + return ret;
>> +}
>> +
>> +void iommufd_backend_disconnect(IOMMUFDBackend *be)
>> +{
>> + qemu_mutex_lock(&be->lock);
>> + if (!be->users) {
>> + goto out;
>> + }
>> + be->users--;
>> + if (!be->users && be->owned) {
>> + close(be->fd);
>> + be->fd = -1;
>> + }
>> +out:
>> + trace_iommufd_backend_disconnect(be->fd, be->users);
>> + qemu_mutex_unlock(&be->lock);
>> +}
>> +
>> +int iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>> + Error **errp)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_ioas_alloc alloc_data = {
>> + .size = sizeof(alloc_data),
>> + .flags = 0,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_ALLOC, &alloc_data);
>> + if (ret) {
>> + error_setg_errno(errp, errno, "Failed to allocate ioas");
>> + return ret;
>> + }
>> +
>> + *ioas_id = alloc_data.out_ioas_id;
>> + trace_iommufd_backend_alloc_ioas(fd, *ioas_id, ret);
>> +
>> + return ret;
>> +}
>> +
>> +void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_destroy des = {
>> + .size = sizeof(des),
>> + .id = id,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_DESTROY, &des);
>> + trace_iommufd_backend_free_id(fd, id, ret);
>> + if (ret) {
>> + error_report("Failed to free id: %u %m", id);
>> + }
>> +}
>> +
>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>> + ram_addr_t size, void *vaddr, bool readonly)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_ioas_map map = {
>> + .size = sizeof(map),
>> + .flags = IOMMU_IOAS_MAP_READABLE |
>> + IOMMU_IOAS_MAP_FIXED_IOVA,
>> + .ioas_id = ioas_id,
>> + .__reserved = 0,
>> + .user_va = (uintptr_t)vaddr,
>> + .iova = iova,
>> + .length = size,
>> + };
>> +
>> + if (!readonly) {
>> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>> + }
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
>> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
>> + vaddr, readonly, ret);
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_IOAS_MAP failed: %m");
>> + }
>> + return ret;
>> +}
>> +
>> +int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> + hwaddr iova, ram_addr_t size)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_ioas_unmap unmap = {
>> + .size = sizeof(unmap),
>> + .ioas_id = ioas_id,
>> + .iova = iova,
>> + .length = size,
>> + };
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_UNMAP, &unmap);
>> + /*
>> + * IOMMUFD treats a mapping as an object, so unmapping a
>> + * nonexistent mapping is like deleting a nonexistent object
>> + * and returns ENOENT. The legacy backend allows it. A vIOMMU
>> + * may trigger many redundant unmaps, so to avoid flooding the
>> + * log, treat them as success for IOMMUFD, just like the
>> + * legacy backend.
>> + */
>> + if (ret && errno == ENOENT) {
>> + trace_iommufd_backend_unmap_dma_non_exist(fd, ioas_id, iova, size, ret);
>> + ret = 0;
>> + } else {
>> + trace_iommufd_backend_unmap_dma(fd, ioas_id, iova, size, ret);
>> + }
>> +
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_IOAS_UNMAP failed: %m");
>> + }
>> + return ret;
>> +}
>> +
>> +static const TypeInfo iommufd_backend_info = {
>> + .name = TYPE_IOMMUFD_BACKEND,
>> + .parent = TYPE_OBJECT,
>> + .instance_size = sizeof(IOMMUFDBackend),
>> + .instance_init = iommufd_backend_init,
>> + .instance_finalize = iommufd_backend_finalize,
>> + .class_size = sizeof(IOMMUFDBackendClass),
>> + .class_init = iommufd_backend_class_init,
>> + .interfaces = (InterfaceInfo[]) {
>> + { TYPE_USER_CREATABLE },
>> + { }
>> + }
>> +};
>> +
>> +static void register_types(void)
>> +{
>> + type_register_static(&iommufd_backend_info);
>> +}
>> +
>> +type_init(register_types);
>> diff --git a/backends/Kconfig b/backends/Kconfig
>> index f35abc1609..2cb23f62fa 100644
>> --- a/backends/Kconfig
>> +++ b/backends/Kconfig
>> @@ -1 +1,5 @@
>> source tpm/Kconfig
>> +
>> +config IOMMUFD
>> + bool
>> + depends on VFIO
>I don't know the state of vDPA/iommufd integration but this extra dependency
>might be added in the short term.
Thanks for the reminder. But I think it makes more sense for that series to relax
the check itself, e.g.
depends on VFIO || VHOST_VDPA
Thanks
Zhenzhong
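`iommufd_backend_connect()` and `iommufd_backend_disconnect()` in the patch above implement a small reference count. An owned backend opens `/dev/iommu` for its first user and closes it when the last user goes away, while an fd passed in through the `fd` property is never closed on these paths. A minimal sketch of the same pattern follows, assuming a hypothetical `SketchBackend` with a configurable `path` (so it can be exercised against `/dev/null`) and a pthread mutex in place of `QemuMutex`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical stand-in for IOMMUFDBackend. */
typedef struct SketchBackend {
    int fd;               /* device fd, -1 when closed */
    bool owned;           /* true: we open/close it; false: fd passed in */
    uint32_t users;       /* number of connected devices */
    const char *path;     /* "/dev/iommu" in QEMU; configurable here */
    pthread_mutex_t lock;
} SketchBackend;

/* Returns 0 on success, or a negative errno. */
int sketch_connect(SketchBackend *be)
{
    int ret = 0;

    assert(be != NULL);
    pthread_mutex_lock(&be->lock);
    if (be->users == UINT32_MAX) {
        ret = -E2BIG;                   /* counter would overflow */
    } else {
        if (be->owned && be->users == 0) {
            /* First user of an owned backend opens the device. */
            int fd = open(be->path, O_RDWR);
            if (fd < 0) {
                ret = -errno;
            } else {
                be->fd = fd;
            }
        }
        if (ret == 0) {
            be->users++;
        }
    }
    pthread_mutex_unlock(&be->lock);
    return ret;
}

void sketch_disconnect(SketchBackend *be)
{
    assert(be != NULL);
    pthread_mutex_lock(&be->lock);
    if (be->users > 0 && --be->users == 0 && be->owned) {
        /* Last user of an owned backend closes it; passed-in fds stay open. */
        close(be->fd);
        be->fd = -1;
    }
    pthread_mutex_unlock(&be->lock);
}
```

One difference from the quoted code: this sketch returns `-errno` when `open()` fails, whereas the quoted `iommufd_backend_connect()` returns `fd` itself (typically `-1`) from `qemu_open_old()`.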
* Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
2023-11-16 2:15 ` Duan, Zhenzhong
@ 2023-11-16 7:25 ` Cédric Le Goater
2023-11-16 7:43 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-16 7:25 UTC (permalink / raw)
To: Duan, Zhenzhong, Philippe Mathieu-Daudé,
qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P
>>> Please add bare documentation:
>>>
>>> /* Returns 0 on success, or a negative errno. */
>>>
>>>> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>
> Will do. I'd like to wait a few days to collect more suggested changes and RBs,
> then send all these updates to Cédric at once before he pushes this series to vfio-next.
Yep. Could you respin a v7 with all the comments on v6? I will
then apply it directly on vfio-next.
Please wait for Eric to finish looking at the platform part.
Thanks,
C.
* RE: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
2023-11-16 7:25 ` Cédric Le Goater
@ 2023-11-16 7:43 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-16 7:43 UTC (permalink / raw)
To: Cédric Le Goater, Philippe Mathieu-Daudé,
qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Thursday, November 16, 2023 3:25 PM
>Subject: Re: [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle
>
>>>> Please add bare documentation:
>>>>
>>>> /* Returns 0 on success, or a negative errno. */
>>>>
>>>>> +int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>>
>> Will do. I'd like to wait a few days to collect more suggested changes and RBs,
>> then send all these updates to Cédric at once before he pushes this series to vfio-next.
>
>Yep. Could you respin a v7 with all the comments on v6? I will
>then apply it directly on vfio-next.
Sure.
>
>Please wait for Eric to finish looking at the platform part.
Sure.
Thanks
Zhenzhong
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-16 4:04 ` Duan, Zhenzhong
@ 2023-11-16 8:32 ` Eric Auger
2023-11-16 8:47 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2023-11-16 8:32 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, peterx@redhat.com,
jasowang@redhat.com, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
Peng, Chao P, Paolo Bonzini, Eric Blake, Markus Armbruster,
Daniel P. Berrangé, Eduardo Habkost
Hi Zhenzhong,
On 11/16/23 05:04, Duan, Zhenzhong wrote:
> Hi Eric,
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Wednesday, November 15, 2023 8:53 PM
>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>
>> Hi Zhenzhong,
>>
>> On 11/14/23 11:09, Zhenzhong Duan wrote:
>>> From: Eric Auger <eric.auger@redhat.com>
>>>
>>> [...]
>>> +##
>>> +# @IOMMUFDProperties:
>>> +#
>>> +# Properties for iommufd objects.
>>> +#
>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>> "previously passed via 'getfd' command", I wonder if this applies here or whether
>> it is copy/paste of
>> RemoteObjectProperties.fd doc?
> Maybe copied 😊. I think this applies here because I use monitor_fd_param() to get the fd.
> Or am I missing anything?
This is a bit cryptic to me and I don't really understand what it means.
That does not mean it is incorrect, but I am curious about explanations
if anybody has some ;-)
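On the `getfd` wording: assuming QEMU's monitor code behaves as summarized here, `monitor_fd_param()` accepts two forms. A string starting with a digit is parsed as a raw fd number (for example one inherited from libvirt), while any other string is looked up as a name previously registered on the current monitor with the QMP `getfd` command, which is what the `@fd` doc refers to. The sketch below models only that dispatch; the `named_fd` table and `sketch_fd_param()` are hypothetical stand-ins for the monitor's fd list and `monitor_fd_param()`.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table standing in for fds registered via QMP 'getfd'. */
struct named_fd {
    const char *name;
    int fd;
};

/*
 * Resolve an "fd" property string. Leading digit: a raw fd number.
 * Otherwise: a name that must have been registered earlier, as with
 * 'getfd'. Returns the fd, or -1 if it cannot be resolved.
 */
int sketch_fd_param(const struct named_fd *table, size_t n, const char *str)
{
    assert(str != NULL);

    if (!isdigit((unsigned char)str[0])) {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(table[i].name, str) == 0) {
                return table[i].fd;
            }
        }
        return -1;                      /* unknown fd name */
    }

    char *end;
    errno = 0;
    long v = strtol(str, &end, 10);
    if (errno || *end != '\0' || v > INT_MAX) {
        return -1;                      /* not a valid fd number */
    }
    return (int)v;
}
```

In QEMU the name lookup additionally needs a current monitor (as with QMP `object-add`); when `monitor_cur()` returns NULL, e.g. for `-object` on the command line, presumably only numeric fds can be resolved. That detail is an assumption worth checking against the current monitor code.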
>
>>> +# which represents a pre-opened /dev/iommu. This allows the
>>> +# iommufd object to be shared across several subsystems
>>> +# (VFIO, VDPA, ...), and the file descriptor to be shared
>>> +# with other processes, e.g. DPDK. (default: QEMU opens
>>> +# /dev/iommu by itself)
>>> +#
>>> +# Since: 8.2
>>> +##
>>> +{ 'struct': 'IOMMUFDProperties',
>>> + 'data': { '*fd': 'str' } }
>>> +
>>> [...]
>>> diff --git a/backends/Kconfig b/backends/Kconfig
>>> index f35abc1609..2cb23f62fa 100644
>>> --- a/backends/Kconfig
>>> +++ b/backends/Kconfig
>>> @@ -1 +1,5 @@
>>> source tpm/Kconfig
>>> +
>>> +config IOMMUFD
>>> + bool
>>> + depends on VFIO
>> I don't know the state of vDPA/iommufd integration but this extra dependency
>> might be added in the short term.
> Thanks for the reminder. But I think it makes more sense for that series to relax
> the check itself, e.g.
> depends on VFIO || VHOST_VDPA
yeah we can leave it as it is for now
Eric
>
> Thanks
> Zhenzhong
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-16 8:32 ` Eric Auger
@ 2023-11-16 8:47 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-16 8:47 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, peterx@redhat.com,
jasowang@redhat.com, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
Peng, Chao P, Paolo Bonzini, Eric Blake, Markus Armbruster,
Daniel P. Berrangé, Eduardo Habkost
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Thursday, November 16, 2023 4:33 PM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>Hi Zhenzhong,
>
>On 11/16/23 05:04, Duan, Zhenzhong wrote:
>> Hi Eric,
>>
>>> -----Original Message-----
>>> From: Eric Auger <eric.auger@redhat.com>
>>> Sent: Wednesday, November 15, 2023 8:53 PM
>>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>>
>>> Hi Zhenzhong,
>>>
>>> On 11/14/23 11:09, Zhenzhong Duan wrote:
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>>
>>>> Introduce an iommufd object which allows the interaction
>>>> with the host /dev/iommu device.
>>>>
>>>> /dev/iommu may already have been opened outside of QEMU,
>>>> in which case the fd can be passed directly along with the
>>>> iommufd object:
>>>>
>>>> This allows the iommufd object to be shared across several
>>>> subsystems (VFIO, VDPA, ...). For example, libvirt would open
>>>> /dev/iommu only once.
>>>>
>>>> If no fd is passed along with the iommufd object, /dev/iommu
>>>> is opened by QEMU itself.
>>>>
>>>> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>> ---
>>>> v6: remove redundant call, alloc_hwpt, get/put_ioas
>>>>
>>>> MAINTAINERS | 7 ++
>>>> qapi/qom.json | 19 ++++
>>>> include/sysemu/iommufd.h | 44 ++++++++
>>>> backends/iommufd.c | 228 +++++++++++++++++++++++++++++++++++++++
>>>> backends/Kconfig | 4 +
>>>> backends/meson.build | 1 +
>>>> backends/trace-events | 10 ++
>>>> qemu-options.hx | 12 +++
>>>> 8 files changed, 325 insertions(+)
>>>> create mode 100644 include/sysemu/iommufd.h
>>>> create mode 100644 backends/iommufd.c
>>>>
>>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>>> index ff1238bb98..a4891f7bda 100644
>>>> --- a/MAINTAINERS
>>>> +++ b/MAINTAINERS
>>>> @@ -2166,6 +2166,13 @@ F: hw/vfio/ap.c
>>>> F: docs/system/s390x/vfio-ap.rst
>>>> L: qemu-s390x@nongnu.org
>>>>
>>>> +iommufd
>>>> +M: Yi Liu <yi.l.liu@intel.com>
>>>> +M: Eric Auger <eric.auger@redhat.com>
>>> Zhenzhong, don't you want to be added here?
Sorry, I missed this comment.
My pleasure; I'll add myself in v7.
>>>> +S: Supported
>>>> +F: backends/iommufd.c
>>>> +F: include/sysemu/iommufd.h
>>>> +
>>>> vhost
>>>> M: Michael S. Tsirkin <mst@redhat.com>
>>>> S: Supported
>>>> diff --git a/qapi/qom.json b/qapi/qom.json
>>>> index c53ef978ff..1fd8555a75 100644
>>>> --- a/qapi/qom.json
>>>> +++ b/qapi/qom.json
>>>> @@ -794,6 +794,23 @@
>>>> { 'struct': 'VfioUserServerProperties',
>>>> 'data': { 'socket': 'SocketAddress', 'device': 'str' } }
>>>>
>>>> +##
>>>> +# @IOMMUFDProperties:
>>>> +#
>>>> +# Properties for iommufd objects.
>>>> +#
>>>> +# @fd: file descriptor name previously passed via 'getfd' command,
>>> "previously passed via 'getfd' command": I wonder whether this applies here
>>> or whether it is a copy/paste of the RemoteObjectProperties.fd doc?
>> Maybe copied 😊 I think this applies here because I use monitor_fd_param to
>> get the fd. Or am I missing anything?
>This is a bit cryptic to me and I don't really understand what it means.
>It does not mean it is not correct but I am curious about explanations
>if anybody has some ;-)
I have only a weak understanding of this, so there may be errors 😊
QMP supports a command named 'getfd' that sends a pre-opened fd along with a name.
The fd is then stored in the monitor's fd list, and that name can later be used
to reference the fd in the list.
Thanks
Zhenzhong
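For reference, a minimal sketch of that mechanism in C, assuming a connected UNIX-domain socket to the QMP monitor on which capabilities negotiation has already been done: the client sends the 'getfd' command with the pre-opened fd attached as SCM_RIGHTS ancillary data, and the receiving side pulls the fd out of the control message. The helper names and the fd name "fd0" are hypothetical; QEMU's own monitor code is organized differently.

```c
#define _POSIX_C_SOURCE 200809L
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer big enough for exactly one passed fd, correctly aligned. */
typedef union {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
} FdCtrlBuf;

/*
 * Client side (hypothetical helper): send the text `msg` over the
 * connected UNIX socket `sock` with `fd` attached as SCM_RIGHTS data.
 * Returns 0 on success, -1 on error (errno set by sendmsg()).
 */
static int send_fd_with_msg(int sock, int fd, const char *msg)
{
    FdCtrlBuf ctrl;
    struct iovec iov = { .iov_base = (void *)msg, .iov_len = strlen(msg) };
    struct msghdr mh = { 0 };
    struct cmsghdr *cmsg;

    memset(&ctrl, 0, sizeof(ctrl));
    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof(ctrl.buf);

    cmsg = CMSG_FIRSTHDR(&mh);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &mh, 0) < 0 ? -1 : 0;
}

/*
 * Receiving side (hypothetical helper, roughly what a monitor must do
 * before it can store the fd under a name): read the message into `buf`
 * (NUL-terminated; buflen must be >= 1) and return the passed fd, or -1
 * if the read failed or no fd arrived.
 */
static int recv_fd_with_msg(int sock, char *buf, size_t buflen)
{
    FdCtrlBuf ctrl;
    struct iovec iov = { .iov_base = buf, .iov_len = buflen - 1 };
    struct msghdr mh = { 0 };
    struct cmsghdr *cmsg;
    ssize_t n;
    int fd = -1;

    mh.msg_iov = &iov;
    mh.msg_iovlen = 1;
    mh.msg_control = ctrl.buf;
    mh.msg_controllen = sizeof(ctrl.buf);

    n = recvmsg(sock, &mh, 0);
    if (n < 0) {
        return -1;
    }
    buf[n] = '\0';
    for (cmsg = CMSG_FIRSTHDR(&mh); cmsg; cmsg = CMSG_NXTHDR(&mh, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
        }
    }
    return fd;
}
```

The iommufd object's fd property can then refer to "fd0" by name, matching the "name previously passed via 'getfd' command" wording in the QAPI doc.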
* Re: [PATCH v6 19/21] hw/arm: Activate IOMMUFD for virt machines
2023-11-14 10:09 ` [PATCH v6 19/21] hw/arm: Activate IOMMUFD for virt machines Zhenzhong Duan
@ 2023-11-16 9:17 ` Eric Auger
0 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-16 9:17 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Paolo Bonzini, Peter Maydell, open list:ARM TCG CPUs
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Cédric Le Goater <clg@redhat.com>
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/arm/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/hw/arm/Kconfig b/hw/arm/Kconfig
> index 3ada335a24..660f49db49 100644
> --- a/hw/arm/Kconfig
> +++ b/hw/arm/Kconfig
> @@ -8,6 +8,7 @@ config ARM_VIRT
> imply TPM_TIS_SYSBUS
> imply TPM_TIS_I2C
> imply NVDIMM
> + imply IOMMUFD
> select ARM_GIC
> select ACPI
> select ARM_SMMUV3
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
* Re: [PATCH v6 21/21] hw/i386: Activate IOMMUFD for q35 machines
2023-11-14 10:09 ` [PATCH v6 21/21] hw/i386: Activate IOMMUFD for q35 machines Zhenzhong Duan
@ 2023-11-16 9:17 ` Eric Auger
0 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-16 9:17 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Paolo Bonzini, Michael S. Tsirkin, Marcel Apfelbaum,
Richard Henderson, Eduardo Habkost
On 11/14/23 11:09, Zhenzhong Duan wrote:
> From: Cédric Le Goater <clg@redhat.com>
>
> Signed-off-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> hw/i386/Kconfig | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/hw/i386/Kconfig b/hw/i386/Kconfig
> index 55850791df..a1846be6f7 100644
> --- a/hw/i386/Kconfig
> +++ b/hw/i386/Kconfig
> @@ -95,6 +95,7 @@ config Q35
> imply E1000E_PCI_EXPRESS
> imply VMPORT
> imply VMMOUSE
> + imply IOMMUFD
> select PC_PCI
> select PC_ACPI
> select PCI_EXPRESS_Q35
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-14 10:09 ` [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
2023-11-14 13:28 ` Cédric Le Goater
2023-11-15 12:52 ` Eric Auger
@ 2023-11-17 11:09 ` Cédric Le Goater
2023-11-17 11:39 ` Duan, Zhenzhong
2 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-17 11:09 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, jgg, nicolinc, joao.m.martins, eric.auger,
peterx, jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng,
Paolo Bonzini, Eric Blake, Markus Armbruster,
Daniel P. Berrangé, Eduardo Habkost
Hello,
> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
> + ram_addr_t size, void *vaddr, bool readonly)
> +{
> + int ret, fd = be->fd;
> + struct iommu_ioas_map map = {
> + .size = sizeof(map),
> + .flags = IOMMU_IOAS_MAP_READABLE |
> + IOMMU_IOAS_MAP_FIXED_IOVA,
> + .ioas_id = ioas_id,
> + .__reserved = 0,
> + .user_va = (uintptr_t)vaddr,
> + .iova = iova,
> + .length = size,
> + };
> +
> + if (!readonly) {
> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
> + }
> +
> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
> + vaddr, readonly, ret);
> + if (ret) {
> + ret = -errno;
> + error_report("IOMMU_IOAS_MAP failed: %m");
> + }
> + return ret;
> +}
When using a UEFI guest, QEMU reports errors when mapping regions
in the top PCI space:
iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000001000 size=0x3000 addr=0x7fce2c28b000 readonly=0 (-1)
qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000001000, 0x3000, 0x7fce2c28b000) = -22 (Invalid argument)
iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000004000 size=0x4000 addr=0x7fce2c980000 readonly=0 (-1)
qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000004000, 0x4000, 0x7fce2c980000) = -22 (Invalid argument)
This is because the IOMMUFD reserved IOVA areas are:
[ fee00000 - feefffff ]
[ 8000000000 - ffffffffffffffff ] (39 bits address space)
which were allocated when the device was initially attached.
The topology is basic. Something is wrong.
Thanks,
C.
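To make the failure concrete, a small sketch of the overlap test involved, using the inclusive reserved ranges listed above: both failing IOVAs (0x380000001000 and 0x380000004000) fall inside the second range, so a map at a fixed IOVA there cannot succeed. The type and function names are illustrative only; with the real uAPI, the usable (non-reserved) ranges of an IOAS can be queried with the IOMMU_IOAS_IOVA_RANGES ioctl.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive range, same convention as the reserved areas in the report. */
typedef struct {
    uint64_t start;
    uint64_t last;
} IovaRange;

/*
 * Return true if [iova, iova + size - 1] overlaps any reserved range.
 * Assumes size > 0 and that iova + size - 1 does not wrap around.
 */
static bool iova_hits_reserved(uint64_t iova, uint64_t size,
                               const IovaRange *resv, size_t n)
{
    uint64_t last = iova + size - 1;

    for (size_t i = 0; i < n; i++) {
        if (iova <= resv[i].last && last >= resv[i].start) {
            return true;
        }
    }
    return false;
}
```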
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-17 11:09 ` Cédric Le Goater
@ 2023-11-17 11:39 ` Duan, Zhenzhong
2023-11-17 12:56 ` Cédric Le Goater
2023-11-17 13:29 ` Eric Auger
0 siblings, 2 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-17 11:39 UTC (permalink / raw)
To: Cédric Le Goater, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P, Paolo Bonzini, Eric Blake,
Markus Armbruster, Daniel P. Berrangé, Eduardo Habkost,
Gerd Hoffmann, Kasireddy, Vivek, lersek@redhat.com
Hi Cédric,
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Friday, November 17, 2023 7:10 PM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>Hello,
>
>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>> + ram_addr_t size, void *vaddr, bool readonly)
>> +{
>> + int ret, fd = be->fd;
>> + struct iommu_ioas_map map = {
>> + .size = sizeof(map),
>> + .flags = IOMMU_IOAS_MAP_READABLE |
>> + IOMMU_IOAS_MAP_FIXED_IOVA,
>> + .ioas_id = ioas_id,
>> + .__reserved = 0,
>> + .user_va = (uintptr_t)vaddr,
>> + .iova = iova,
>> + .length = size,
>> + };
>> +
>> + if (!readonly) {
>> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>> + }
>> +
>> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
>> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
>> + vaddr, readonly, ret);
>> + if (ret) {
>> + ret = -errno;
>> + error_report("IOMMU_IOAS_MAP failed: %m");
>> + }
>> + return ret;
>> +}
>
>When using a UEFI guest, QEMU reports errors when mapping regions
>in the top PCI space :
>
> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000001000 size=0x3000 addr=0x7fce2c28b000 readonly=0 (-1)
> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000001000, 0x3000, 0x7fce2c28b000) = -22 (Invalid argument)
>
> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000004000 size=0x4000 addr=0x7fce2c980000 readonly=0 (-1)
> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000004000, 0x4000, 0x7fce2c980000) = -22 (Invalid argument)
>
>This is because IOMMUFD reserved IOVAs areas are :
>
> [ fee00000 - feefffff ]
> [ 8000000000 - ffffffffffffffff ] (39 bits address space)
>
>which were allocated when the device was initially attached.
>The topology is basic. Something is wrong.
Thanks for your report. This looks like a hardware limitation: the host
IOMMU address width (39 bits) is smaller than the guest physical address width.
A similar issue has been reported, with a fix submitted below; ccing related people.
https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
It looks like that fix will not work for hotplug, though.
Alternatively, the following QEMU command-line option may help:
"-cpu host,host-phys-bits-limit=39"
Thanks
Zhenzhong
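The numbers behind this, as a quick sketch (helper names are illustrative): a 39-bit host IOMMU can address at most 0x7fffffffff, which is why the reserved area in the report starts at 0x8000000000, while the last byte of the first failing mapping, 0x380000003fff, needs 46 bits. Assuming no vIOMMU, so that IOVAs equal guest physical addresses, capping the guest with host-phys-bits-limit=39 is meant to keep such mappings below that limit.

```c
#include <stdint.h>

/* Number of address bits needed to represent `addr` (0 for addr == 0). */
static unsigned int addr_bits_needed(uint64_t addr)
{
    unsigned int bits = 0;

    while (addr) {
        bits++;
        addr >>= 1;
    }
    return bits;
}

/* Highest address reachable with `width` bits (width in 1..64). */
static uint64_t max_addr_for_width(unsigned int width)
{
    return width >= 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}
```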
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-17 11:39 ` Duan, Zhenzhong
@ 2023-11-17 12:56 ` Cédric Le Goater
2023-11-17 13:29 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-17 12:56 UTC (permalink / raw)
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, eric.auger@redhat.com,
peterx@redhat.com, jasowang@redhat.com, Tian, Kevin, Liu, Yi L,
Sun, Yi Y, Peng, Chao P, Paolo Bonzini, Eric Blake,
Markus Armbruster, Daniel P. Berrangé, Eduardo Habkost,
Gerd Hoffmann, Kasireddy, Vivek, lersek@redhat.com
On 11/17/23 12:39, Duan, Zhenzhong wrote:
> Hi Cédric,
>
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Sent: Friday, November 17, 2023 7:10 PM
>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>
>> Hello,
>>
>>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>>> + ram_addr_t size, void *vaddr, bool readonly)
>>> +{
>>> + int ret, fd = be->fd;
>>> + struct iommu_ioas_map map = {
>>> + .size = sizeof(map),
>>> + .flags = IOMMU_IOAS_MAP_READABLE |
>>> + IOMMU_IOAS_MAP_FIXED_IOVA,
>>> + .ioas_id = ioas_id,
>>> + .__reserved = 0,
>>> + .user_va = (uintptr_t)vaddr,
>>> + .iova = iova,
>>> + .length = size,
>>> + };
>>> +
>>> + if (!readonly) {
>>> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>>> + }
>>> +
>>> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
>>> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
>>> + vaddr, readonly, ret);
>>> + if (ret) {
>>> + ret = -errno;
>>> + error_report("IOMMU_IOAS_MAP failed: %m");
>>> + }
>>> + return ret;
>>> +}
>>
>> When using a UEFI guest, QEMU reports errors when mapping regions
>> in the top PCI space :
>>
>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000001000 size=0x3000 addr=0x7fce2c28b000 readonly=0 (-1)
>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000001000, 0x3000, 0x7fce2c28b000) = -22 (Invalid argument)
>>
>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000004000 size=0x4000 addr=0x7fce2c980000 readonly=0 (-1)
>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000004000, 0x4000, 0x7fce2c980000) = -22 (Invalid argument)
>>
>> This is because IOMMUFD reserved IOVAs areas are :
>>
>> [ fee00000 - feefffff ]
>> [ 8000000000 - ffffffffffffffff ] (39 bits address space)
>>
>> which were allocated when the device was initially attached.
>> The topology is basic. Something is wrong.
>
> Thanks for your report. This looks a hardware limit of
> host IOMMU address width(39) < guest physical address width.
>
> A similar issue with a fix submitted below, ccing related people.
> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
> It looks the fix will not work for hotplug.
>
> Or below qemu cmdline may help:
> "-cpu host,host-phys-bits-limit=39"
Not really. The IOMMU_IOAS_MAP failure just becomes a "Bad address" (EFAULT) instead.
Thanks,
C.
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-17 11:39 ` Duan, Zhenzhong
2023-11-17 12:56 ` Cédric Le Goater
@ 2023-11-17 13:29 ` Eric Auger
2023-11-17 13:56 ` Cédric Le Goater
1 sibling, 1 reply; 82+ messages in thread
From: Eric Auger @ 2023-11-17 13:29 UTC (permalink / raw)
To: Duan, Zhenzhong, Cédric Le Goater, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
Hi Cédric,
On 11/17/23 12:39, Duan, Zhenzhong wrote:
> Hi Cédric,
>
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Sent: Friday, November 17, 2023 7:10 PM
>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>
>> Hello,
>>
>>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>>> + ram_addr_t size, void *vaddr, bool readonly)
>>> +{
>>> + int ret, fd = be->fd;
>>> + struct iommu_ioas_map map = {
>>> + .size = sizeof(map),
>>> + .flags = IOMMU_IOAS_MAP_READABLE |
>>> + IOMMU_IOAS_MAP_FIXED_IOVA,
>>> + .ioas_id = ioas_id,
>>> + .__reserved = 0,
>>> + .user_va = (uintptr_t)vaddr,
>>> + .iova = iova,
>>> + .length = size,
>>> + };
>>> +
>>> + if (!readonly) {
>>> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>>> + }
>>> +
>>> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
>>> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
>>> + vaddr, readonly, ret);
>>> + if (ret) {
>>> + ret = -errno;
>>> + error_report("IOMMU_IOAS_MAP failed: %m");
>>> + }
>>> + return ret;
>>> +}
>> When using a UEFI guest, QEMU reports errors when mapping regions
>> in the top PCI space :
>>
>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000001000 size=0x3000 addr=0x7fce2c28b000 readonly=0 (-1)
>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000001000, 0x3000, 0x7fce2c28b000) = -22 (Invalid argument)
>>
>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000004000 size=0x4000 addr=0x7fce2c980000 readonly=0 (-1)
>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000004000, 0x4000, 0x7fce2c980000) = -22 (Invalid argument)
>>
>> This is because IOMMUFD reserved IOVAs areas are :
>>
>> [ fee00000 - feefffff ]
>> [ 8000000000 - ffffffffffffffff ] (39 bits address space)
>>
>> which were allocated when the device was initially attached.
>> The topology is basic. Something is wrong.
>
> Thanks for your report. This looks a hardware limit of
> host IOMMU address width(39) < guest physical address width.
>
> A similar issue with a fix submitted below, ccing related people.
> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
> It looks the fix will not work for hotplug.
>
> Or below qemu cmdline may help:
> "-cpu host,host-phys-bits-limit=39"
Don't you have the same issue with the legacy VFIO code? You should.
Eric
>
> Thanks
> Zhenzhong
>
* Re: [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface
2023-11-14 10:09 ` [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface Zhenzhong Duan
@ 2023-11-17 13:53 ` Eric Auger
2023-11-20 4:15 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2023-11-17 13:53 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Add a new callback iommufd_cdev_pci_hot_reset to do iommufd specific
> check and reset operation.
nit: Implement the newly introduced pci_hot_reset callback?
>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> ---
> v6: pci_hot_reset return -errno if fails
>
> hw/vfio/iommufd.c | 145 +++++++++++++++++++++++++++++++++++++++++++
> hw/vfio/trace-events | 1 +
> 2 files changed, 146 insertions(+)
>
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index e5bf528e89..3eec428162 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -24,6 +24,7 @@
> #include "sysemu/reset.h"
> #include "qemu/cutils.h"
> #include "qemu/chardev_open.h"
> +#include "pci.h"
>
> static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> @@ -473,9 +474,153 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
> close(vbasedev->fd);
> }
>
> +static VFIODevice *iommufd_cdev_pci_find_by_devid(__u32 devid)
> +{
> + VFIODevice *vbasedev_iter;
> +
> + QLIST_FOREACH(vbasedev_iter, &vfio_device_list, global_next) {
> + if (vbasedev_iter->bcontainer->ops != &vfio_iommufd_ops) {
> + continue;
> + }
> + if (devid == vbasedev_iter->devid) {
> + return vbasedev_iter;
> + }
> + }
> + return NULL;
> +}
> +
> +static int iommufd_cdev_pci_hot_reset(VFIODevice *vbasedev, bool single)
> +{
> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
> + struct vfio_pci_hot_reset_info *info = NULL;
> + struct vfio_pci_dependent_device *devices;
> + struct vfio_pci_hot_reset *reset;
> + int ret, i;
> + bool multi = false;
> +
> + trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
> +
> + if (!single) {
> + vfio_pci_pre_reset(vdev);
> + }
> + vdev->vbasedev.needs_reset = false;
> +
> + ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
> +
> + if (ret) {
> + goto out_single;
> + }
> +
> + assert(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID);
> +
> + devices = &info->devices[0];
> +
> + if (!(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED)) {
> + if (!vdev->has_pm_reset) {
> + for (i = 0; i < info->count; i++) {
> + if (devices[i].devid == VFIO_PCI_DEVID_NOT_OWNED) {
> + error_report("vfio: Cannot reset device %s, "
> + "depends on device %04x:%02x:%02x.%x "
> + "which is not owned.",
> + vdev->vbasedev.name, devices[i].segment,
> + devices[i].bus, PCI_SLOT(devices[i].devfn),
> + PCI_FUNC(devices[i].devfn));
> + }
> + }
> + }
> + ret = -EPERM;
> + goto out_single;
> + }
> +
> + trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
> +
> + for (i = 0; i < info->count; i++) {
> + VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
> +
> + trace_iommufd_cdev_pci_hot_reset_dep_devices(devices[i].segment,
> + devices[i].bus,
> + PCI_SLOT(devices[i].devfn),
> + PCI_FUNC(devices[i].devfn),
> + devices[i].devid);
> +
> + /*
> + * If a VFIO cdev device is resettable, all the dependent devices
> + * are either bound to same iommufd or within same iommu_groups as
> + * one of the iommufd bound devices.
> + */
> + assert(devices[i].devid != VFIO_PCI_DEVID_NOT_OWNED);
> +
> + if (devices[i].devid == vdev->vbasedev.devid ||
> + devices[i].devid == VFIO_PCI_DEVID_OWNED) {
> + continue;
> + }
> +
> + vbasedev_iter = iommufd_cdev_pci_find_by_devid(devices[i].devid);
> + if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
> + if (single) {
> + ret = -EINVAL;
> + goto out_single;
> + }
> + vfio_pci_pre_reset(tmp);
> + tmp->vbasedev.needs_reset = false;
> + multi = true;
> + }
> +
> + if (!single && !multi) {
> + ret = -EINVAL;
> + goto out_single;
> + }
> +
> + /* Use zero length array for hot reset with iommufd backend */
> + reset = g_malloc0(sizeof(*reset));
> + reset->argsz = sizeof(*reset);
> +
> + /* Bus reset! */
> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
> + g_free(reset);
> + if (ret) {
> + ret = -errno;
> + }
> +
> + trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
> + ret ? strerror(errno) : "Success");
> +
> + /* Re-enable INTx on affected devices */
> + for (i = 0; i < info->count; i++) {
> + VFIOPCIDevice *tmp;
> + VFIODevice *vbasedev_iter;
> +
> + if (devices[i].devid == vdev->vbasedev.devid ||
> + devices[i].devid == VFIO_PCI_DEVID_OWNED) {
> + continue;
> + }
> +
> + vbasedev_iter = iommufd_cdev_pci_find_by_devid(devices[i].devid);
> + if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
> + continue;
> + }
> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
nit: I see this block of code is also used above for the pre_reset. It may
be worth introducing a helper? Could be done later, though.
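Something along these lines, perhaps. This is a minimal sketch that uses hypothetical, heavily reduced stand-ins for VFIODevice, VFIOPCIDevice and the device list so that it compiles on its own. The helper name dep_dev_get_vpci() is made up, and the placeholder constant values are assumptions rather than copies of QEMU or linux/vfio.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values (assumed); the real ones come from QEMU/linux/vfio.h. */
#define VFIO_PCI_DEVID_OWNED 0
#define VFIO_DEVICE_TYPE_PCI 0

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, reduced stand-ins for the QEMU types used in the patch. */
typedef struct VFIODevice {
    uint32_t devid;
    int type;
    bool realized;              /* models vbasedev->dev->realized */
} VFIODevice;

typedef struct VFIOPCIDevice {
    int pdev_placeholder;       /* stands in for fields preceding vbasedev */
    VFIODevice vbasedev;
} VFIOPCIDevice;

/* Models vfio_device_list restricted to iommufd-backed devices. */
static VFIODevice *iommufd_devs[8];
static size_t n_iommufd_devs;

static VFIODevice *find_by_devid(uint32_t devid)
{
    for (size_t i = 0; i < n_iommufd_devs; i++) {
        if (iommufd_devs[i]->devid == devid) {
            return iommufd_devs[i];
        }
    }
    return NULL;
}

/*
 * Hypothetical helper for the block shared by the pre-reset and
 * post-reset loops: return the dependent PCI device to act on, or NULL
 * if the entry is the device being reset, is only reported as owned,
 * or is not a realized VFIO PCI device.
 */
static VFIOPCIDevice *dep_dev_get_vpci(uint32_t dep_devid,
                                       const VFIODevice *reset_dev)
{
    VFIODevice *iter;

    if (dep_devid == reset_dev->devid || dep_devid == VFIO_PCI_DEVID_OWNED) {
        return NULL;
    }
    iter = find_by_devid(dep_devid);
    if (!iter || !iter->realized || iter->type != VFIO_DEVICE_TYPE_PCI) {
        return NULL;
    }
    return container_of(iter, VFIOPCIDevice, vbasedev);
}
```

Each loop in iommufd_cdev_pci_hot_reset() could then reduce to calling the helper and doing "continue" when it returns NULL.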
> + vfio_pci_post_reset(tmp);
> + }
> +out_single:
> + if (!single) {
> + vfio_pci_post_reset(vdev);
> + }
> + g_free(info);
> +
> + return ret;
> +}
> +
> const VFIOIOMMUOps vfio_iommufd_ops = {
> .dma_map = iommufd_cdev_map,
> .dma_unmap = iommufd_cdev_unmap,
> .attach_device = iommufd_cdev_attach,
> .detach_device = iommufd_cdev_detach,
> + .pci_hot_reset = iommufd_cdev_pci_hot_reset,
> };
> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> index 5d3e9e8cee..d838232d5a 100644
> --- a/hw/vfio/trace-events
> +++ b/hw/vfio/trace-events
> @@ -174,3 +174,4 @@ iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name, const char *str, in
> iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
> iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
> iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
> +iommufd_cdev_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int dev_id) "\t%04x:%02x:%02x.%x devid %d"
Otherwise looks good to me.
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-17 13:29 ` Eric Auger
@ 2023-11-17 13:56 ` Cédric Le Goater
2023-11-20 3:06 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-17 13:56 UTC (permalink / raw)
To: eric.auger, Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
On 11/17/23 14:29, Eric Auger wrote:
> Hi Cédric,
>
> On 11/17/23 12:39, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Sent: Friday, November 17, 2023 7:10 PM
>>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>>
>>> Hello,
>>>
>>>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>>>> + ram_addr_t size, void *vaddr, bool readonly)
>>>> +{
>>>> + int ret, fd = be->fd;
>>>> + struct iommu_ioas_map map = {
>>>> + .size = sizeof(map),
>>>> + .flags = IOMMU_IOAS_MAP_READABLE |
>>>> + IOMMU_IOAS_MAP_FIXED_IOVA,
>>>> + .ioas_id = ioas_id,
>>>> + .__reserved = 0,
>>>> + .user_va = (uintptr_t)vaddr,
>>>> + .iova = iova,
>>>> + .length = size,
>>>> + };
>>>> +
>>>> + if (!readonly) {
>>>> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>>>> + }
>>>> +
>>>> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
>>>> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
>>>> + vaddr, readonly, ret);
>>>> + if (ret) {
>>>> + ret = -errno;
>>>> + error_report("IOMMU_IOAS_MAP failed: %m");
>>>> + }
>>>> + return ret;
>>>> +}
>>> When using a UEFI guest, QEMU reports errors when mapping regions
>>> in the top PCI space :
>>>
>>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000001000 size=0x3000 addr=0x7fce2c28b000 readonly=0 (-1)
>>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000001000, 0x3000, 0x7fce2c28b000) = -22 (Invalid argument)
>>>
>>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000004000 size=0x4000 addr=0x7fce2c980000 readonly=0 (-1)
>>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150, 0x380000004000, 0x4000, 0x7fce2c980000) = -22 (Invalid argument)
>>>
>>> This is because IOMMUFD reserved IOVAs areas are :
>>>
>>> [ fee00000 - feefffff ]
>>> [ 8000000000 - ffffffffffffffff ] (39 bits address space)
>>>
>>> which were allocated when the device was initially attached.
>>> The topology is basic. Something is wrong.
>>
>> Thanks for your report. This looks a hardware limit of
>> host IOMMU address width(39) < guest physical address width.
>>
>> A similar issue with a fix submitted below, ccing related people.
>> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
>> It looks the fix will not work for hotplug.
>>
>> Or below qemu cmdline may help:
>> "-cpu host,host-phys-bits-limit=39"
>
> don't you have the same issue with legacy VFIO code, you should?
I tend to be lazy and use SeaBIOS for guests on the command line.
I do see the error with legacy VFIO and UEFI.
However, with the address-space-size workaround and iommufd, the
error is different: EFAULT now. It seems to be a page pinning issue.
Thanks,
C.
* Re: [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend
2023-11-14 10:09 ` [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend Zhenzhong Duan
2023-11-14 14:03 ` Cédric Le Goater
@ 2023-11-17 14:55 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-17 14:55 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
Hi Zhenzhong,
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Now we support two types of iommu backends, let's add the capability
> to select one of them. This depends on whether an iommufd object has
> been linked with the vfio-platform device:
>
> If the user wants to use the legacy backend, it shall not
> link the vfio-platform device with any iommufd object:
>
> -device vfio-platform,host=XXX
>
> This is called the legacy mode/backend.
>
> If the user wants to use the iommufd backend (/dev/iommu) it
> shall pass an iommufd object id in the vfio-platform device options:
>
> -object iommufd,id=iommufd0
> -device vfio-platform,host=XXX,iommufd=iommufd0
>
> Suggested-by: Alex Williamson <alex.williamson@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Eric
> ---
> v6: Move #include "sysemu/iommufd.h" in platform.c
>
> hw/vfio/platform.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index 8e3d4ac458..98ae4bc655 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -15,11 +15,13 @@
> */
>
> #include "qemu/osdep.h"
> +#include CONFIG_DEVICES /* CONFIG_IOMMUFD */
> #include "qapi/error.h"
> #include <sys/ioctl.h>
> #include <linux/vfio.h>
>
> #include "hw/vfio/vfio-platform.h"
> +#include "sysemu/iommufd.h"
> #include "migration/vmstate.h"
> #include "qemu/error-report.h"
> #include "qemu/lockable.h"
> @@ -649,6 +651,10 @@ static Property vfio_platform_dev_properties[] = {
> DEFINE_PROP_UINT32("mmap-timeout-ms", VFIOPlatformDevice,
> mmap_timeout, 1100),
> DEFINE_PROP_BOOL("x-irqfd", VFIOPlatformDevice, irqfd_allowed, true),
> +#ifdef CONFIG_IOMMUFD
> + DEFINE_PROP_LINK("iommufd", VFIOPlatformDevice, vbasedev.iommufd,
> + TYPE_IOMMUFD_BACKEND, IOMMUFDBackend *),
> +#endif
> DEFINE_PROP_END_OF_LIST(),
> };
>
* Re: [PATCH v6 18/21] vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks
2023-11-14 10:09 ` [PATCH v6 18/21] vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
2023-11-14 14:05 ` Cédric Le Goater
@ 2023-11-17 14:58 ` Eric Auger
1 sibling, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-17 14:58 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Some of the callbacks in VFIOIOMMUOps take a VFIOContainerBase pointer, but
> only need read access to the sub-object of VFIOContainerBase. So make the
> VFIOContainerBase, VFIOContainer and VFIOIOMMUFDContainer pointers const
> in these callbacks.
>
> Local functions called by those callbacks need the same change to avoid
> build errors.
>
> Suggested-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
Thanks
Eric
> ---
> include/hw/vfio/vfio-common.h | 12 ++++++----
> include/hw/vfio/vfio-container-base.h | 12 ++++++----
> hw/vfio/common.c | 9 +++----
> hw/vfio/container-base.c | 2 +-
> hw/vfio/container.c | 34 ++++++++++++++-------------
> hw/vfio/iommufd.c | 8 +++----
> 6 files changed, 42 insertions(+), 35 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 567e5f7bea..7954531d05 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -244,13 +244,15 @@ bool vfio_migration_realize(VFIODevice *vbasedev, Error **errp);
> void vfio_migration_exit(VFIODevice *vbasedev);
>
> int vfio_bitmap_alloc(VFIOBitmap *vbmap, hwaddr size);
> -bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer);
> -bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer);
> -int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +bool
> +vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer);
> +bool
> +vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer);
> +int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap, hwaddr iova,
> hwaddr size);
> -int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
> - uint64_t size, ram_addr_t ram_addr);
> +int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> + uint64_t size, ram_addr_t ram_addr);
>
> int vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
> void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 45bb19c767..2ae297ccda 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -82,7 +82,7 @@ void vfio_container_del_section_window(VFIOContainerBase *bcontainer,
> MemoryRegionSection *section);
> int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> bool start);
> -int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size);
>
> @@ -93,18 +93,20 @@ void vfio_container_destroy(VFIOContainerBase *bcontainer);
>
> struct VFIOIOMMUOps {
> /* basic feature */
> - int (*dma_map)(VFIOContainerBase *bcontainer,
> + int (*dma_map)(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> void *vaddr, bool readonly);
> - int (*dma_unmap)(VFIOContainerBase *bcontainer,
> + int (*dma_unmap)(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> IOMMUTLBEntry *iotlb);
> int (*attach_device)(const char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void (*detach_device)(VFIODevice *vbasedev);
> /* migration feature */
> - int (*set_dirty_page_tracking)(VFIOContainerBase *bcontainer, bool start);
> - int (*query_dirty_bitmap)(VFIOContainerBase *bcontainer, VFIOBitmap *vbmap,
> + int (*set_dirty_page_tracking)(const VFIOContainerBase *bcontainer,
> + bool start);
> + int (*query_dirty_bitmap)(const VFIOContainerBase *bcontainer,
> + VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size);
> /* PCI specific */
> int (*pci_hot_reset)(VFIODevice *vbasedev, bool single);
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 6569732b7a..08a3e57672 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -204,7 +204,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainerBase *bcontainer)
> return true;
> }
>
> -bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
> +bool vfio_devices_all_device_dirty_tracking(const VFIOContainerBase *bcontainer)
> {
> VFIODevice *vbasedev;
>
> @@ -221,7 +221,8 @@ bool vfio_devices_all_device_dirty_tracking(VFIOContainerBase *bcontainer)
> * Check if all VFIO devices are running and migration is active, which is
> * essentially equivalent to the migration being in pre-copy phase.
> */
> -bool vfio_devices_all_running_and_mig_active(VFIOContainerBase *bcontainer)
> +bool
> +vfio_devices_all_running_and_mig_active(const VFIOContainerBase *bcontainer)
> {
> VFIODevice *vbasedev;
>
> @@ -1139,7 +1140,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
> return 0;
> }
>
> -int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +int vfio_devices_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap, hwaddr iova,
> hwaddr size)
> {
> @@ -1162,7 +1163,7 @@ int vfio_devices_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> return 0;
> }
>
> -int vfio_get_dirty_bitmap(VFIOContainerBase *bcontainer, uint64_t iova,
> +int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
> uint64_t size, ram_addr_t ram_addr)
> {
> bool all_device_dirty_tracking =
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index eee2dcfe76..1ffd25bbfa 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -63,7 +63,7 @@ int vfio_container_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> return bcontainer->ops->set_dirty_page_tracking(bcontainer, start);
> }
>
> -int vfio_container_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +int vfio_container_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size)
> {
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 1dbf9b9a17..b22feb8ded 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -61,11 +61,11 @@ static int vfio_ram_block_discard_disable(VFIOContainer *container, bool state)
> }
> }
>
> -static int vfio_dma_unmap_bitmap(VFIOContainer *container,
> +static int vfio_dma_unmap_bitmap(const VFIOContainer *container,
> hwaddr iova, ram_addr_t size,
> IOMMUTLBEntry *iotlb)
> {
> - VFIOContainerBase *bcontainer = &container->bcontainer;
> + const VFIOContainerBase *bcontainer = &container->bcontainer;
> struct vfio_iommu_type1_dma_unmap *unmap;
> struct vfio_bitmap *bitmap;
> VFIOBitmap vbmap;
> @@ -117,11 +117,12 @@ unmap_exit:
> /*
> * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> */
> -static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
> - ram_addr_t size, IOMMUTLBEntry *iotlb)
> +static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> + hwaddr iova, ram_addr_t size,
> + IOMMUTLBEntry *iotlb)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> struct vfio_iommu_type1_dma_unmap unmap = {
> .argsz = sizeof(unmap),
> .flags = 0,
> @@ -174,11 +175,11 @@ static int vfio_legacy_dma_unmap(VFIOContainerBase *bcontainer, hwaddr iova,
> return 0;
> }
>
> -static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
> +static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> struct vfio_iommu_type1_dma_map map = {
> .argsz = sizeof(map),
> .flags = VFIO_DMA_MAP_FLAG_READ,
> @@ -207,11 +208,12 @@ static int vfio_legacy_dma_map(VFIOContainerBase *bcontainer, hwaddr iova,
> return -errno;
> }
>
> -static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> - bool start)
> +static int
> +vfio_legacy_set_dirty_page_tracking(const VFIOContainerBase *bcontainer,
> + bool start)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> int ret;
> struct vfio_iommu_type1_dirty_bitmap dirty = {
> .argsz = sizeof(dirty),
> @@ -233,12 +235,12 @@ static int vfio_legacy_set_dirty_page_tracking(VFIOContainerBase *bcontainer,
> return ret;
> }
>
> -static int vfio_legacy_query_dirty_bitmap(VFIOContainerBase *bcontainer,
> +static int vfio_legacy_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> VFIOBitmap *vbmap,
> hwaddr iova, hwaddr size)
> {
> - VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> - bcontainer);
> + const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> + bcontainer);
> struct vfio_iommu_type1_dirty_bitmap *dbitmap;
> struct vfio_iommu_type1_dirty_bitmap_get *range;
> int ret;
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index e08a217057..bc45dd1842 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -26,10 +26,10 @@
> #include "qemu/chardev_open.h"
> #include "pci.h"
>
> -static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
> +static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> {
> - VFIOIOMMUFDContainer *container =
> + const VFIOIOMMUFDContainer *container =
> container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>
> return iommufd_backend_map_dma(container->be,
> @@ -37,11 +37,11 @@ static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
> iova, size, vaddr, readonly);
> }
>
> -static int iommufd_cdev_unmap(VFIOContainerBase *bcontainer,
> +static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> IOMMUTLBEntry *iotlb)
> {
> - VFIOIOMMUFDContainer *container =
> + const VFIOIOMMUFDContainer *container =
> container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>
> /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
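To illustrate the const change above, here is a minimal, self-contained C sketch. It is not QEMU code: `ContainerBase`, `LegacyContainer` and the `get_fd` callback are simplified, hypothetical stand-ins for VFIOContainerBase, VFIOContainer and a read-only VFIOIOMMUOps callback. The simplified `container_of` casts to `type *`, so the const on the base pointer is not carried over by the macro itself; declaring the derived local `const`, as the patch does in each callback, restores it.

```c
#include <stddef.h>

/* Simplified container_of (an assumption, not QEMU's exact macro): the
 * cast yields a plain (type *), dropping any const on 'ptr'. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct ContainerBase ContainerBase;

typedef struct IOMMUOps {
    /* Read-only callback: the base pointer is const, as in the patch. */
    int (*get_fd)(const ContainerBase *bcontainer);
} IOMMUOps;

struct ContainerBase {
    const IOMMUOps *ops;
};

typedef struct LegacyContainer {
    ContainerBase bcontainer;   /* embedded base object */
    int fd;                     /* backend-specific state */
} LegacyContainer;

static int legacy_get_fd(const ContainerBase *bcontainer)
{
    /* Restore constness explicitly; the macro returns LegacyContainer *. */
    const LegacyContainer *container =
        container_of(bcontainer, LegacyContainer, bcontainer);

    /* container->fd = -1; would now be rejected by the compiler. */
    return container->fd;
}

static const IOMMUOps legacy_ops = {
    .get_fd = legacy_get_fd,
};

/* Generic code only ever sees the const base pointer. */
int container_get_fd(const ContainerBase *bcontainer)
{
    return bcontainer->ops->get_fd(bcontainer);
}
```

With the const local in place, an accidental write through `container` inside a read-only callback fails at compile time instead of silently modifying backend state.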
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-17 13:56 ` Cédric Le Goater
@ 2023-11-20 3:06 ` Duan, Zhenzhong
2023-11-20 8:24 ` Cédric Le Goater
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-20 3:06 UTC (permalink / raw)
To: Cédric Le Goater, eric.auger@redhat.com,
qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Friday, November 17, 2023 9:56 PM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>On 11/17/23 14:29, Eric Auger wrote:
>> Hi Cédric,
>>
>> On 11/17/23 12:39, Duan, Zhenzhong wrote:
>>> Hi Cédric,
>>>
>>>> -----Original Message-----
>>>> From: Cédric Le Goater <clg@redhat.com>
>>>> Sent: Friday, November 17, 2023 7:10 PM
>>>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>>>
>>>> Hello,
>>>>
>>>>> +int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>>>>> + ram_addr_t size, void *vaddr, bool readonly)
>>>>> +{
>>>>> + int ret, fd = be->fd;
>>>>> + struct iommu_ioas_map map = {
>>>>> + .size = sizeof(map),
>>>>> + .flags = IOMMU_IOAS_MAP_READABLE |
>>>>> + IOMMU_IOAS_MAP_FIXED_IOVA,
>>>>> + .ioas_id = ioas_id,
>>>>> + .__reserved = 0,
>>>>> + .user_va = (uintptr_t)vaddr,
>>>>> + .iova = iova,
>>>>> + .length = size,
>>>>> + };
>>>>> +
>>>>> + if (!readonly) {
>>>>> + map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>>>>> + }
>>>>> +
>>>>> + ret = ioctl(fd, IOMMU_IOAS_MAP, &map);
>>>>> + trace_iommufd_backend_map_dma(fd, ioas_id, iova, size,
>>>>> + vaddr, readonly, ret);
>>>>> + if (ret) {
>>>>> + ret = -errno;
>>>>> + error_report("IOMMU_IOAS_MAP failed: %m");
>>>>> + }
>>>>> + return ret;
>>>>> +}
>>>> When using a UEFI guest, QEMU reports errors when mapping regions
>>>> in the top PCI space :
>>>>
>>>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000001000
>>>> size=0x3000 addr=0x7fce2c28b000 readonly=0 (-1)
>>>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>>>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150,
>>>> 0x380000001000, 0x3000, 0x7fce2c28b000) = -22 (Invalid argument)
>>>>
>>>> iommufd_backend_map_dma iommufd=10 ioas=2 iova=0x380000004000
>>>> size=0x4000 addr=0x7fce2c980000 readonly=0 (-1)
>>>> qemu-system-x86_64: IOMMU_IOAS_MAP failed: Invalid argument
>>>> qemu-system-x86_64: vfio_container_dma_map(0x55a21b03a150,
>>>> 0x380000004000, 0x4000, 0x7fce2c980000) = -22 (Invalid argument)
>>>>
>>>> This is because IOMMUFD reserved IOVAs areas are :
>>>>
>>>> [ fee00000 - feefffff ]
>>>> [ 8000000000 - ffffffffffffffff ] (39 bits address space)
>>>>
>>>> which were allocated when the device was initially attached.
>>>> The topology is basic. Something is wrong.
>>>
>>> Thanks for your report. This looks like a hardware limitation: the
>>> host IOMMU address width (39 bits) is smaller than the guest physical address width.
>>>
>>> A similar issue was reported, with a fix submitted below; CCing related people.
>>> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
>>> It looks like the fix will not work for hotplug.
>>>
>>> Alternatively, the QEMU command line option below may help:
>>> "-cpu host,host-phys-bits-limit=39"
>>
>> Don't you have the same issue with the legacy VFIO code? You should.
>
>I tend to be lazy and use seabios for guests on the command line.
>I do see the error with legacy VFIO and uefi.
>
>However, with the address space size work-around and iommufd, the
>error is different: EFAULT now. It seems to be a page pinning issue.
Yes, this reminds me that iommufd does not support p2p mappings yet.
So EFAULT is expected. Maybe I should add a comment in docs/devel/vfio-iommufd.rst.
Thanks
Zhenzhong
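The numbers in this report can be checked with a little arithmetic. The sketch below is illustrative only: the helper names are hypothetical, and the reserved ranges are the two copied from Cédric's report. It tests whether a mapping overlaps a reserved IOVA range and how many address bits the last byte of the mapping needs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, last] range, as printed in the report. */
typedef struct IovaRange {
    uint64_t start;
    uint64_t last;
} IovaRange;

/* Reserved IOVA ranges reported for the 39-bit host IOMMU. */
static const IovaRange reserved[] = {
    { 0xfee00000ULL,   0xfeefffffULL },
    { 0x8000000000ULL, UINT64_MAX },
};

/* Address bits needed to reach the last byte of [iova, iova + size).
 * Assumes size > 0. */
unsigned iova_bits_needed(uint64_t iova, uint64_t size)
{
    uint64_t last = iova + size - 1;
    unsigned bits = 0;

    while (last) {
        bits++;
        last >>= 1;
    }
    return bits;
}

/* True if [iova, iova + size) overlaps any reserved range (size > 0). */
bool iova_overlaps_reserved(uint64_t iova, uint64_t size)
{
    uint64_t last = iova + size - 1;

    for (size_t i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++) {
        if (iova <= reserved[i].last && last >= reserved[i].start) {
            return true;
        }
    }
    return false;
}
```

For the first failing map, iova=0x380000001000 with size=0x3000 ends at 0x380000003fff, which needs 46 address bits, beyond the 39 bits of the host IOMMU and inside the second reserved range. That is consistent with the EINVAL above, and with the `-cpu host,host-phys-bits-limit=39` work-around, which should keep guest physical addresses below 0x8000000000.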
* RE: [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface
2023-11-17 13:53 ` Eric Auger
@ 2023-11-20 4:15 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-20 4:15 UTC (permalink / raw)
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, peterx@redhat.com,
jasowang@redhat.com, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
Peng, Chao P
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Friday, November 17, 2023 9:54 PM
>Subject: Re: [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface
>
>
>
>On 11/14/23 11:09, Zhenzhong Duan wrote:
>> Add a new callback iommufd_cdev_pci_hot_reset to do the iommufd-specific
>> checks and reset operation.
>
>nit: Implement the newly introduced pci_hot_reset callback?
Yes
>>
>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>> ---
>> v6: pci_hot_reset return -errno if fails
>>
>> hw/vfio/iommufd.c | 145 +++++++++++++++++++++++++++++++++++++++++++
>> hw/vfio/trace-events | 1 +
>> 2 files changed, 146 insertions(+)
>>
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index e5bf528e89..3eec428162 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -24,6 +24,7 @@
>> #include "sysemu/reset.h"
>> #include "qemu/cutils.h"
>> #include "qemu/chardev_open.h"
>> +#include "pci.h"
>>
>> static int iommufd_cdev_map(VFIOContainerBase *bcontainer, hwaddr iova,
>> ram_addr_t size, void *vaddr, bool readonly)
>> @@ -473,9 +474,153 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
>> close(vbasedev->fd);
>> }
>>
>> +static VFIODevice *iommufd_cdev_pci_find_by_devid(__u32 devid)
>> +{
>> + VFIODevice *vbasedev_iter;
>> +
>> + QLIST_FOREACH(vbasedev_iter, &vfio_device_list, global_next) {
>> + if (vbasedev_iter->bcontainer->ops != &vfio_iommufd_ops) {
>> + continue;
>> + }
>> + if (devid == vbasedev_iter->devid) {
>> + return vbasedev_iter;
>> + }
>> + }
>> + return NULL;
>> +}
>> +
>> +static int iommufd_cdev_pci_hot_reset(VFIODevice *vbasedev, bool single)
>> +{
>> + VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
>> + struct vfio_pci_hot_reset_info *info = NULL;
>> + struct vfio_pci_dependent_device *devices;
>> + struct vfio_pci_hot_reset *reset;
>> + int ret, i;
>> + bool multi = false;
>> +
>> + trace_vfio_pci_hot_reset(vdev->vbasedev.name, single ? "one" : "multi");
>> +
>> + if (!single) {
>> + vfio_pci_pre_reset(vdev);
>> + }
>> + vdev->vbasedev.needs_reset = false;
>> +
>> + ret = vfio_pci_get_pci_hot_reset_info(vdev, &info);
>> +
>> + if (ret) {
>> + goto out_single;
>> + }
>> +
>> + assert(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID);
>> +
>> + devices = &info->devices[0];
>> +
>> + if (!(info->flags & VFIO_PCI_HOT_RESET_FLAG_DEV_ID_OWNED)) {
>> + if (!vdev->has_pm_reset) {
>> + for (i = 0; i < info->count; i++) {
>> + if (devices[i].devid == VFIO_PCI_DEVID_NOT_OWNED) {
>> + error_report("vfio: Cannot reset device %s, "
>> + "depends on device %04x:%02x:%02x.%x "
>> + "which is not owned.",
>> + vdev->vbasedev.name, devices[i].segment,
>> + devices[i].bus, PCI_SLOT(devices[i].devfn),
>> + PCI_FUNC(devices[i].devfn));
>> + }
>> + }
>> + }
>> + ret = -EPERM;
>> + goto out_single;
>> + }
>> +
>> + trace_vfio_pci_hot_reset_has_dep_devices(vdev->vbasedev.name);
>> +
>> + for (i = 0; i < info->count; i++) {
>> + VFIOPCIDevice *tmp;
>> + VFIODevice *vbasedev_iter;
>> +
>> + trace_iommufd_cdev_pci_hot_reset_dep_devices(devices[i].segment,
>> + devices[i].bus,
>> + PCI_SLOT(devices[i].devfn),
>> + PCI_FUNC(devices[i].devfn),
>> + devices[i].devid);
>> +
>> + /*
>> + * If a VFIO cdev device is resettable, all the dependent devices
>> + * are either bound to same iommufd or within same iommu_groups as
>> + * one of the iommufd bound devices.
>> + */
>> + assert(devices[i].devid != VFIO_PCI_DEVID_NOT_OWNED);
>> +
>> + if (devices[i].devid == vdev->vbasedev.devid ||
>> + devices[i].devid == VFIO_PCI_DEVID_OWNED) {
>> + continue;
>> + }
>> +
>> + vbasedev_iter = iommufd_cdev_pci_find_by_devid(devices[i].devid);
>> + if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
>> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>> + continue;
>> + }
>> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>> + if (single) {
>> + ret = -EINVAL;
>> + goto out_single;
>> + }
>> + vfio_pci_pre_reset(tmp);
>> + tmp->vbasedev.needs_reset = false;
>> + multi = true;
>> + }
>> +
>> + if (!single && !multi) {
>> + ret = -EINVAL;
>> + goto out_single;
>> + }
>> +
>> + /* Use zero length array for hot reset with iommufd backend */
>> + reset = g_malloc0(sizeof(*reset));
>> + reset->argsz = sizeof(*reset);
>> +
>> + /* Bus reset! */
>> + ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_PCI_HOT_RESET, reset);
>> + g_free(reset);
>> + if (ret) {
>> + ret = -errno;
>> + }
>> +
>> + trace_vfio_pci_hot_reset_result(vdev->vbasedev.name,
>> + ret ? strerror(errno) : "Success");
>> +
>> + /* Re-enable INTx on affected devices */
>> + for (i = 0; i < info->count; i++) {
>> + VFIOPCIDevice *tmp;
>> + VFIODevice *vbasedev_iter;
>> +
>> + if (devices[i].devid == vdev->vbasedev.devid ||
>> + devices[i].devid == VFIO_PCI_DEVID_OWNED) {
>> + continue;
>> + }
>> +
>> + vbasedev_iter = iommufd_cdev_pci_find_by_devid(devices[i].devid);
>> + if (!vbasedev_iter || !vbasedev_iter->dev->realized ||
>> + vbasedev_iter->type != VFIO_DEVICE_TYPE_PCI) {
>> + continue;
>> + }
>> + tmp = container_of(vbasedev_iter, VFIOPCIDevice, vbasedev);
>nit: I see this block of code is also used above for the pre_reset. It may
>be interesting to introduce a helper? Could be done later, though.
Will do in v7.
Thanks
Zhenzhong
>> + vfio_pci_post_reset(tmp);
>> + }
>> +out_single:
>> + if (!single) {
>> + vfio_pci_post_reset(vdev);
>> + }
>> + g_free(info);
>> +
>> + return ret;
>> +}
>> +
>> const VFIOIOMMUOps vfio_iommufd_ops = {
>> .dma_map = iommufd_cdev_map,
>> .dma_unmap = iommufd_cdev_unmap,
>> .attach_device = iommufd_cdev_attach,
>> .detach_device = iommufd_cdev_detach,
>> + .pci_hot_reset = iommufd_cdev_pci_hot_reset,
>> };
>> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
>> index 5d3e9e8cee..d838232d5a 100644
>> --- a/hw/vfio/trace-events
>> +++ b/hw/vfio/trace-events
>> @@ -174,3 +174,4 @@ iommufd_cdev_detach_ioas_hwpt(int iommufd, const char *name, const char *str, in
>> iommufd_cdev_fail_attach_existing_container(const char *msg) " %s"
>> iommufd_cdev_alloc_ioas(int iommufd, int ioas_id) " [iommufd=%d] new IOMMUFD container with ioasid=%d"
>> iommufd_cdev_device_info(char *name, int devfd, int num_irqs, int num_regions, int flags) " %s (%d) num_irqs=%d num_regions=%d flags=%d"
>> +iommufd_cdev_pci_hot_reset_dep_devices(int domain, int bus, int slot, int function, int dev_id) "\t%04x:%02x:%02x.%x devid %d"
>Otherwise looks good to me.
>
>Reviewed-by: Eric Auger <eric.auger@redhat.com>
>
>
>Eric
>
>
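Eric's nit could be addressed with a lookup helper shared by the pre-reset and post-reset loops. The sketch below models it with simplified, hypothetical types (`Dev`, `find_dependent_pci`) and a stand-in value for the "owned" devid constant; it omits the backend (ops) check done by iommufd_cdev_pci_find_by_devid() and is not the v7 code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in value for this sketch only; the real constant comes from
 * <linux/vfio.h>. */
#define DEVID_OWNED 0u

enum { DEV_TYPE_PCI = 0, DEV_TYPE_PLATFORM = 1 };

/* Simplified stand-in for VFIODevice. */
typedef struct Dev {
    uint32_t devid;
    bool realized;
    int type;
} Dev;

/*
 * Hypothetical helper factoring out the block repeated in both loops:
 * skip the device itself and DEVID_OWNED entries (as the patch does),
 * then return the realized PCI device behind 'devid', or NULL.
 */
Dev *find_dependent_pci(Dev *devs, size_t n, uint32_t self_devid,
                        uint32_t devid)
{
    if (devid == self_devid || devid == DEVID_OWNED) {
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        if (devs[i].devid != devid) {
            continue;
        }
        if (!devs[i].realized || devs[i].type != DEV_TYPE_PCI) {
            return NULL;
        }
        return &devs[i];
    }
    return NULL;   /* not found in the device list */
}
```

Each loop would then reduce to one call followed by either vfio_pci_pre_reset() or vfio_pci_post_reset() on the returned device, keeping the `single` check in the pre-reset loop only.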
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-20 3:06 ` Duan, Zhenzhong
@ 2023-11-20 8:24 ` Cédric Le Goater
2023-11-20 10:07 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-20 8:24 UTC (permalink / raw)
To: Duan, Zhenzhong, eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
>>>> A similar issue with a fix submitted below, ccing related people.
>>>> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
>>>> It looks the fix will not work for hotplug.
>>>>
>>>> Or below qemu cmdline may help:
>>>> "-cpu host,host-phys-bits-limit=39"
>>>
>>> don't you have the same issue with legacy VFIO code, you should?
>>
>> I tend to be lazy and use seabios for guests on the command line.
>> I do see the error with legacy VFIO and uefi.
>>
>> However, with the address space size work-around and iommufd, the
>> error is different, an EFAULT now. Some page pinning issue it seems.
>
> Yes, this reminds me of iommufd not supporting p2p mapping yet.
OK. Should we turn this error into a warning? The code needs
at least a comment.
> So EFAULT is expected. Maybe I should add a comment in docs/devel/vfio-iommufd.rst
Yes. It would be good to have a list of gaps and effects in the
documentation. See Jason's presentation at LPC.
https://lpc.events/event/17/contributions/1418/attachments/1297/2607/LPC2023_iommufd.pdf
Thanks,
C.
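One way to make Cédric's question concrete is a small policy function deciding whether a failed map is reported as an error or a warning. This is a hypothetical sketch: `classify_dma_map_result` and the `is_device_memory` flag are assumptions, and how the caller would learn that a region is device memory (a p2p mapping) is not shown.

```c
#include <errno.h>
#include <stdbool.h>

typedef enum {
    MAP_OK,
    MAP_WARN,    /* report as a warning and continue */
    MAP_ERROR,   /* report as an error */
} MapOutcome;

/*
 * Hypothetical policy: 'ret' is 0 or -errno from the map call. Downgrade
 * -EFAULT to a warning only for device memory (p2p), which this thread
 * says iommufd cannot map yet; everything else stays an error.
 */
MapOutcome classify_dma_map_result(int ret, bool is_device_memory)
{
    if (ret == 0) {
        return MAP_OK;
    }
    if (ret == -EFAULT && is_device_memory) {
        return MAP_WARN;
    }
    return MAP_ERROR;
}
```

Only the p2p case that is expected to fail is downgraded; an EINVAL from a reserved IOVA range, as in the report earlier in this thread, would still be an error.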
* Re: [PATCH v6 00/21] vfio: Adopt iommufd
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
` (21 preceding siblings ...)
2023-11-14 14:51 ` [PATCH v6 00/21] vfio: Adopt iommufd Cédric Le Goater
@ 2023-11-20 9:15 ` Eric Auger
2023-11-20 10:09 ` Duan, Zhenzhong
22 siblings, 1 reply; 82+ messages in thread
From: Eric Auger @ 2023-11-20 9:15 UTC (permalink / raw)
To: Zhenzhong Duan, qemu-devel
Cc: alex.williamson, clg, jgg, nicolinc, joao.m.martins, peterx,
jasowang, kevin.tian, yi.l.liu, yi.y.sun, chao.p.peng
Hi Zhenzhong,
On 11/14/23 11:09, Zhenzhong Duan wrote:
> Hi,
>
> Thanks, all, for the guidance and comments on the previous series; this is
> the remaining part of the iommufd support.
>
> Based on Cédric's suggestion, replace old config method for IOMMUFD
> with Kconfig.
>
> Based on Jason's suggestion, drop the implementation of manually
> allocating hwpt and switch to IOAS attach/detach.
>
> Besides the current tests, we also tested mdev with mtty for better coverage.
>
> PATCH 1: Introduce iommufd object
> PATCH 2-9: add IOMMUFD container and cdev support
> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
> PATCH 18: make VFIOContainerBase parameter const
> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
>
>
> We have done wide testing with different combinations, e.g.:
> - PCI devices were tested
> - FD passing and hot reset with some tricks
> - device hotplug test with legacy and iommufd backends
> - with or without vIOMMU for legacy and iommufd backends
> - devices linked to different iommufds
> - VFIO migration with an E800 net card (no dirty sync support) passthrough
> - platform, ccw and ap were only compile-tested due to environment limitations
> - mdev passthrough tested with mtty, mixed with real devices and different BEs
>
> Given some iommufd kernel limitations, the iommufd backend is
> not yet fully on par with the legacy backend w.r.t. features like:
> - p2p mappings (you will see related error traces)
> - dirty page sync
> - etc.
Feel free to add my T-b:
Tested-by: Eric Auger <eric.auger@redhat.com>
Thanks
Eric
>
>
> qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6
> Based on vfio-next, commit id: 1a22fb936e
>
> --------------------------------------------------------------------------
>
> Below are some background and graph about the design:
>
> With the introduction of iommufd, the Linux kernel provides a generic
> interface for userspace drivers to propagate their DMA mappings to kernel
> for assigned devices. This series does the porting of the VFIO devices
> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>
> At QEMU level, interactions with the /dev/iommu are abstracted by a new
> iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
> Any QEMU device (e.g. a vfio device) wishing to use /dev/iommu must be
> linked with an iommufd object. In this series, the vfio-pci device is
> granted such a capability (other VFIO devices are not yet ready):
>
> It gets a new optional parameter named iommufd, which allows passing
> an iommufd object:
>
> -object iommufd,id=iommufd0
> -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
> Note the /dev/iommu and vfio cdev can be externally opened by a
> management layer. In such a case the fd is passed:
>
> -object iommufd,id=iommufd0,fd=22
> -device vfio-pci,iommufd=iommufd0,fd=23
>
> If the fd parameter is not passed, the fd is opened by QEMU.
> See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
> for a detailed discussion of this requirement.
>
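As an illustration of the fd-passing case, a management layer could open the device nodes itself and hand the fd numbers to QEMU. This sketch assumes hypothetical helper names (`open_for_qemu`, `format_iommufd_opts`); it only opens a node and formats the option strings shown above, and leaves out the exec() of QEMU.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Open a device node (e.g. /dev/iommu or /dev/vfio/devices/vfioX) without
 * O_CLOEXEC so the fd survives exec() into QEMU. Returns -1 on failure,
 * with errno set by open(). */
int open_for_qemu(const char *path)
{
    return open(path, O_RDWR);
}

/* Format the "-object" and "-device" option values for externally opened
 * fds. Returns false if either buffer is too small. */
bool format_iommufd_opts(const char *id, int iommufd_fd, int dev_fd,
                         char *obj, size_t obj_len,
                         char *dev, size_t dev_len)
{
    int n = snprintf(obj, obj_len, "iommufd,id=%s,fd=%d", id, iommufd_fd);

    if (n < 0 || (size_t)n >= obj_len) {
        return false;
    }
    n = snprintf(dev, dev_len, "vfio-pci,iommufd=%s,fd=%d", id, dev_fd);
    return n >= 0 && (size_t)n < dev_len;
}
```

The fd numbers are whatever open() returned; 22 and 23 in the example above are just sample values, and formatting them yields exactly the `-object` and `-device` arguments shown there.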
> If no iommufd option is passed to the vfio-pci device, iommufd is not
> used and the end-user gets the behavior based on the legacy vfio iommu
> interfaces:
>
> -device vfio-pci,host=0000:02:00.0
>
> While the legacy kernel interface is group-centric, the new iommufd
> interface is device-centric, relying on device fd and iommufd.
>
> To support both interfaces in the QEMU VFIO device we reworked the vfio
> container abstraction so that the generic VFIO code can use either
> backend.
>
> The VFIOContainer object becomes a base object derived into
> a) the legacy VFIO container and
> b) the new iommufd based container.
>
> The base object implements generic code such as code related to
> memory_listener and address space management, whereas the derived
> objects implement callbacks specific to each backend (legacy or
> iommufd). Indeed, each backend has its own way to set up the secure
> context and DMA management interface. The diagram below shows how it
> looks with both BEs.
>
>                        VFIO                       AddressSpace/Memory
>     +-------+  +----------+  +-----+  +-----+
>     |  pci  |  | platform |  |  ap |  | ccw |
>     +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>         |           |           |        |        |   AddressSpace       |
>         |           |           |        |        +------------+---------+
>     +---V-----------V-----------V--------V----+               /
>     |           VFIOAddressSpace              | <------------+
>     |                  |                      |  MemoryListener
>     |          VFIOContainer list             |
>     +-------+----------------------------+----+
>             |                            |
>             |                            |
>     +-------V------+            +--------V----------+
>     |   iommufd    |            |    vfio legacy    |
>     |  container   |            |     container     |
>     +-------+------+            +--------+----------+
>             |                            |
>             | /dev/iommu                 | /dev/vfio/vfio
>             | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
> Userspace   |                            |
> ============+============================+===========================
> Kernel      |  device fd                 |
>             +---------------+            | group/container fd
>             | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>             |  ATTACH_IOAS) |            | device fd
>             |               |            |
>             |       +-------V------------V-----------------+
>     iommufd |       |                vfio                  |
> (map/unmap  |       +---------+--------------------+-------+
>  ioas_copy) |                 |                    | map/unmap
>             |                 |                    |
>     +-------V------+    +-----V------+      +------V--------+
>     | iommufd core |    |  device    |      |  vfio iommu   |
>     +--------------+    +------------+      +---------------+
>
> [Secure Context setup]
> - iommufd BE: uses device fd and iommufd to set up the secure context
> (bind_iommufd, attach_ioas)
> - vfio legacy BE: uses group fd and container fd to set up the secure context
> (set_container, set_iommu)
> [Device access]
> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
> - vfio legacy BE: device fd is retrieved from group fd ioctl
> [DMA Mapping flow]
> 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
> 2. VFIO populates DMA map/unmap via the container BEs
> *) iommufd BE: uses iommufd
> *) vfio legacy BE: uses container fd
>
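The container split and the DMA mapping flow above can be summarized in a small C model. Everything here is a hypothetical, reduced stand-in (one `dma_map` callback, counters instead of ioctls, made-up function names): the backend is chosen by whether an iommufd link is set, and a region-add notification is forwarded to every container of the address space.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct ContainerBase ContainerBase;

/* Reduced stand-in for VFIOIOMMUOps: a single const-base callback. */
typedef struct Ops {
    const char *name;
    int (*dma_map)(const ContainerBase *b, uint64_t iova, uint64_t size);
} Ops;

struct ContainerBase {
    const Ops *ops;   /* backend-specific callbacks */
};

static unsigned legacy_calls, iommufd_calls;

static int legacy_dma_map(const ContainerBase *b, uint64_t iova, uint64_t size)
{
    (void)b; (void)iova; (void)size;
    legacy_calls++;    /* real code: VFIO_IOMMU_MAP_DMA on the container fd */
    return 0;
}

static int iommufd_dma_map(const ContainerBase *b, uint64_t iova, uint64_t size)
{
    (void)b; (void)iova; (void)size;
    iommufd_calls++;   /* real code: IOMMU_IOAS_MAP on /dev/iommu */
    return 0;
}

static const Ops legacy_ops  = { "legacy",  legacy_dma_map };
static const Ops iommufd_ops = { "iommufd", iommufd_dma_map };

/* Backend choice: iommufd only if the device is linked to an iommufd
 * object, otherwise the legacy container. */
const Ops *select_ops(bool has_iommufd_link)
{
    return has_iommufd_link ? &iommufd_ops : &legacy_ops;
}

/* Steps 1 and 2 of the flow: a region-add notification is forwarded to
 * each container in the address space's list via its backend. */
int region_add(ContainerBase *const *containers, size_t n,
               uint64_t iova, uint64_t size)
{
    for (size_t i = 0; i < n; i++) {
        int ret = containers[i]->ops->dma_map(containers[i], iova, size);
        if (ret) {
            return ret;
        }
    }
    return 0;
}
```

The generic loop never needs to know which backend it is talking to; adding a backend means adding another `Ops` table, which is the point of the VFIOContainerBase split described above.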
>
> Changelog:
> v6:
> - simplify CONFIG_IOMMUFD checking code further (Cédric)
> - check iommufd_cdev_kvm_device_add return value (Cédric)
> - dirrectory -> directory (Cédric)
> - propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric)
> - introduce a helper vfio_device_set_fd (Cédric)
> - Move #include "sysemu/iommufd.h" in platform.c (Cédric)
> - simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas
> - Dare to keep Matthew's RB as related change is minor
>
> v5:
> - Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric)
> - Add (uintptr_t) to info->allowed_iovas (Cédric)
> - Switch to IOAS attach/detach and hide hwpt (Jason)
> - move chardev_open.[h|c] under the IOMMUFD entry (Cédric)
> - Move vfio_legacy_pci_hot_reset into container.c (Cédric)
> - Add missed pgsizes initialization in vfio_get_info_iova_range
> - split linking iommufd patch into three to be cleaner
> - Fix comments on PCI BAR unmap
>
> v4:
> - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus)
> - add doc for default case without fd (Markus)
> - Fix build issue reported by Markus and Cédric
> - Simply use SPDX identifier in new file (Cédric)
> - make vfio_container_init/destroy helper a separate patch (Cédric)
> - make vrdl_list movement a separate patch (Cédric)
> - add const for some callback parameters (Cédric)
> - add g_assert in VFIOIOMMUOps callback (Cédric)
> - introduce pci_hot_reset callback (Cédric)
> - remove VFIOIOMMUSpaprOps (Cédric)
> - initialize g_autofree to NULL (Cédric)
> - adjust func name prefix and trace event in iommufd.c (Cédric)
> - add RB
>
> v3:
> - Rename base container as VFIOContainerBase and legacy container as container (Cédric)
> - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric)
> - Cleanup container.c by introducing spapr backend and move spapr code out (Cédric)
> - Introduce vfio_iommu_spapr_ops (Cédric)
> - Add doc of iommufd in qom.json and have iommufd member sorted (Markus)
> - patch19 and patch21 are split into two parts to facilitate review
>
> v2:
> - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric
> - add fd passing to platform/ap/ccw vfio device
> - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric
> - rename char_dev.h to chardev_open.h for same naming scheme per Daniel
> - add full copyright per Daniel and Jason
>
>
> Note: the changelogs below are from the full IOMMUFD series:
>
> v1:
> - Alloc hwpt instead of using auto hwpt
> - elaborate iommufd code per Nicolin
> - consolidate two patches and drop as.c
> - typo error fix and function rename
>
> rfcv4:
> - rebase on top of v8.0.3
> - Add one patch from Yi which is about vfio device add in kvm
> - Remove IOAS_COPY optimization and focus on functions in this patchset
> - Fix wrong name issue reported and fix suggested by Matthew
> - Fix compilation issue reported and fix suggested by Nicolin
> - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
> granularity
> - Add dev_iter_next() callback to avoid adding so many callback
> at container scope, add VFIODevice.hwpt to support that
> - Restore all functions back to common from container whenever possible,
> mainly migration and reset related functions
> - Add --enable/disable-iommufd config option, enabled by default in linux
> - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
> - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
> - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
> redundant code
> - Add FD passing support for vfio device backed by IOMMUFD
> - Fix hot unplug resource leak issue in vfio_legacy_detach_device()
> - Fix FD leak in vfio_get_devicefd()
>
> rfcv3:
> - rebase on top of v7.2.0
> - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
> VFIO backends
> - Fix use after free in error path, reported by Alister
> - Split common.c in several steps to ease the review
>
> rfcv2:
> - remove the first three patches of rfcv1
> - add open cdev helper suggested by Jason
> - remove the QOMification of the VFIOContainer and simply use standard ops
> (David)
> - add "-object iommufd" suggested by Alex
>
> Thanks
> Zhenzhong
>
>
> Cédric Le Goater (3):
> hw/arm: Activate IOMMUFD for virt machines
> kconfig: Activate IOMMUFD for s390x machines
> hw/i386: Activate IOMMUFD for q35 machines
>
> Eric Auger (2):
> backends/iommufd: Introduce the iommufd object
> vfio/pci: Allow the selection of a given iommu backend
>
> Yi Liu (2):
> util/char_dev: Add open_cdev()
> vfio/iommufd: Implement the iommufd backend
>
> Zhenzhong Duan (14):
> vfio/common: return early if space isn't empty
> vfio/iommufd: Relax assert check for iommufd backend
> vfio/iommufd: Add support for iova_ranges and pgsizes
> vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info
> vfio/pci: Introduce a vfio pci hot reset interface
> vfio/iommufd: Enable pci hot reset through iommufd cdev interface
> vfio/pci: Make vfio cdev pre-openable by passing a file handle
> vfio/platform: Allow the selection of a given iommu backend
> vfio/platform: Make vfio cdev pre-openable by passing a file handle
> vfio/ap: Allow the selection of a given iommu backend
> vfio/ap: Make vfio cdev pre-openable by passing a file handle
> vfio/ccw: Allow the selection of a given iommu backend
> vfio/ccw: Make vfio cdev pre-openable by passing a file handle
> vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps
> callbacks
>
> MAINTAINERS | 10 +
> qapi/qom.json | 19 +
> hw/vfio/pci.h | 6 +
> include/hw/vfio/vfio-common.h | 26 +-
> include/hw/vfio/vfio-container-base.h | 15 +-
> include/qemu/chardev_open.h | 16 +
> include/sysemu/iommufd.h | 44 ++
> backends/iommufd.c | 228 ++++++++++
> hw/vfio/ap.c | 29 +-
> hw/vfio/ccw.c | 31 +-
> hw/vfio/common.c | 24 +-
> hw/vfio/container-base.c | 6 +-
> hw/vfio/container.c | 208 ++++++++-
> hw/vfio/helpers.c | 44 ++
> hw/vfio/iommufd.c | 630 ++++++++++++++++++++++++++
> hw/vfio/pci.c | 212 ++-------
> hw/vfio/platform.c | 38 +-
> util/chardev_open.c | 81 ++++
> backends/Kconfig | 4 +
> backends/meson.build | 1 +
> backends/trace-events | 10 +
> hw/arm/Kconfig | 1 +
> hw/i386/Kconfig | 1 +
> hw/s390x/Kconfig | 1 +
> hw/vfio/meson.build | 3 +
> hw/vfio/trace-events | 11 +
> qemu-options.hx | 12 +
> util/meson.build | 1 +
> 28 files changed, 1493 insertions(+), 219 deletions(-)
> create mode 100644 include/qemu/chardev_open.h
> create mode 100644 include/sysemu/iommufd.h
> create mode 100644 backends/iommufd.c
> create mode 100644 hw/vfio/iommufd.c
> create mode 100644 util/chardev_open.c
>
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-20 8:24 ` Cédric Le Goater
@ 2023-11-20 10:07 ` Duan, Zhenzhong
2023-11-20 17:08 ` Cédric Le Goater
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-20 10:07 UTC
To: Cédric Le Goater, eric.auger@redhat.com,
qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
Hi Cédric,
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Monday, November 20, 2023 4:25 PM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>>>>> A similar issue with a fix submitted below, ccing related people.
>>>>> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
>>>>> It looks the fix will not work for hotplug.
>>>>>
>>>>> Or below qemu cmdline may help:
>>>>> "-cpu host,host-phys-bits-limit=39"
>>>>
>>>> don't you have the same issue with legacy VFIO code, you should?
>>>
>>> I tend to be lazy and use seabios for guests on the command line.
>>> I do see the error with legacy VFIO and uefi.
>>>
>>> However, with the address space size work-around and iommufd, the
>>> error is different, an EFAULT now. Some page pinning issue it seems.
>>
>> Yes, this reminds me of iommufd not supporting p2p mapping yet.
>
>OK. Should we transform this error in a warning ? The code needs
>at least a comment.
Makes sense, though I'm not sure whether other corner cases also return EFAULT.
I plan below change in v7:
diff --git a/backends/iommufd.c b/backends/iommufd.c
index 53fdac4cc0..ba58a0eb0d 100644
--- a/backends/iommufd.c
+++ b/backends/iommufd.c
@@ -178,7 +178,13 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
                                   vaddr, readonly, ret);
     if (ret) {
         ret = -errno;
-        error_report("IOMMU_IOAS_MAP failed: %m");
+
+        /* TODO: Not support mapping hardware PCI BAR region for now. */
+        if (errno == EFAULT) {
+            warn_report("IOMMU_IOAS_MAP failed: %m, PCI BAR?");
+        } else {
+            error_report("IOMMU_IOAS_MAP failed: %m");
+        }
     }
     return ret;
 }
I couldn't change the vfio_container_dma_map print to a warning because, for the legacy container, it's a real error.
So the output after the fix is:
qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
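The downgrade-on-EFAULT logic in the proposed v7 change can be sketched in isolation as below. This is a standalone illustration, not QEMU code: MapSeverity, classify_map_result() and report_map_result() are hypothetical names, fprintf() stands in for QEMU's warn_report()/error_report(), and treating EFAULT as "probably a PCI BAR" is only a heuristic, since other paths may also return EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical severity levels for this sketch; QEMU defines no such type. */
typedef enum {
    MAP_OK,    /* the mapping succeeded */
    MAP_WARN,  /* assumed known limitation: a PCI BAR (p2p) mapping */
    MAP_ERROR, /* any other failure is treated as a real error */
} MapSeverity;

/*
 * Classify a map result that follows the "ret = -errno" convention of
 * iommufd_backend_map_dma(): 0 on success, a negative errno on failure.
 * Treating -EFAULT as a PCI BAR is a heuristic, not a guarantee.
 */
static MapSeverity classify_map_result(int ret)
{
    assert(ret <= 0); /* a positive value would break the convention */
    if (ret == 0) {
        return MAP_OK;
    }
    return ret == -EFAULT ? MAP_WARN : MAP_ERROR;
}

/* Print a message; fprintf() stands in for warn_report()/error_report(). */
static void report_map_result(int ret)
{
    switch (classify_map_result(ret)) {
    case MAP_OK:
        break;
    case MAP_WARN:
        fprintf(stderr, "warning: IOMMU_IOAS_MAP failed: %s, PCI BAR?\n",
                strerror(-ret));
        break;
    case MAP_ERROR:
        fprintf(stderr, "IOMMU_IOAS_MAP failed: %s\n", strerror(-ret));
        break;
    }
}
```

For example, report_map_result(-EFAULT) prints "warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?" with glibc, mirroring the warning line above, while report_map_result(-EINVAL) prints a plain error. Keying the decision on the saved ret rather than re-reading errno keeps it valid even if a logging call in between has overwritten errno.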
>
>> So EFAULT is expected. Maybe I should add a comment in docs/devel/vfio-iommufd.rst
>
>Yes. It would be good to have a list of gaps and effects in the
>documentation. See Jason's presentation at LPC.
>
>
>https://lpc.events/event/17/contributions/1418/attachments/1297/2607/LPC2023_iommufd.pdf
I see, PCI Peer to Peer and POWER/SPAPR are related to qemu iommufd implementation.
For POWER/SPAPR, we have "Supported platform" section.
Below are other gaps I can think of for now:
Gaps:
1. dirty page sync, WIP (Joao)
2. p2p dma not supported yet.
3. fd passing with mdev does not support RAM discard (vfio-pci), as there is no way to tell from an fd that the device is an mdev.
Thanks
Zhenzhong
* RE: [PATCH v6 00/21] vfio: Adopt iommufd
2023-11-20 9:15 ` Eric Auger
@ 2023-11-20 10:09 ` Duan, Zhenzhong
2023-11-20 11:22 ` Eric Auger
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-20 10:09 UTC
To: eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, peterx@redhat.com,
jasowang@redhat.com, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
Peng, Chao P
>-----Original Message-----
>From: Eric Auger <eric.auger@redhat.com>
>Sent: Monday, November 20, 2023 5:15 PM
>Subject: Re: [PATCH v6 00/21] vfio: Adopt iommufd
>
>Hi Zhenzhong,
>
>On 11/14/23 11:09, Zhenzhong Duan wrote:
>> Hi,
>>
>> Thanks all for giving guides and comments on previous series, this is
>> the remaining part of the iommufd support.
>>
>> Based on Cédric's suggestion, replace old config method for IOMMUFD
>> with Kconfig.
>>
>> Based on Jason's suggestion, drop the implementation of manually
>> allocating hwpt and switch to IOAS attach/detach.
>>
>> Beside current test, we also tested mdev with mtty for better cover range.
>>
>> PATCH 1: Introduce iommufd object
>> PATCH 2-9: add IOMMUFD container and cdev support
>> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
>> PATCH 18: make VFIOContainerBase parameter const
>> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
>>
>>
>> We have done wide test with different combinations, e.g:
>> - PCI device were tested
>> - FD passing and hot reset with some trick.
>> - device hotplug test with legacy and iommufd backends
>> - with or without vIOMMU for legacy and iommufd backends
>> - devices linked to different iommufds
>> - VFIO migration with an E800 net card (no dirty sync support) passthrough
>> - platform, ccw and ap were only compile-tested due to environment limit
>> - test mdev pass through with mtty and mix with real device and different BE
>>
>> Given some iommufd kernel limitations, the iommufd backend is
>> not yet fully on par with the legacy backend w.r.t. features like:
>> - p2p mappings (you will see related error traces)
>> - dirty page sync
>> - and etc.
>
>Feel free to add my T-b:
>Tested-by: Eric Auger <eric.auger@redhat.com>
Thanks Eric, do you mean all the patches or just the ARM part?
BRs.
Zhenzhong
* Re: [PATCH v6 00/21] vfio: Adopt iommufd
2023-11-20 10:09 ` Duan, Zhenzhong
@ 2023-11-20 11:22 ` Eric Auger
0 siblings, 0 replies; 82+ messages in thread
From: Eric Auger @ 2023-11-20 11:22 UTC
To: Duan, Zhenzhong, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, clg@redhat.com, jgg@nvidia.com,
nicolinc@nvidia.com, joao.m.martins@oracle.com, peterx@redhat.com,
jasowang@redhat.com, Tian, Kevin, Liu, Yi L, Sun, Yi Y,
Peng, Chao P
Hi Zhenzhong
On 11/20/23 11:09, Duan, Zhenzhong wrote:
>
>> -----Original Message-----
>> From: Eric Auger <eric.auger@redhat.com>
>> Sent: Monday, November 20, 2023 5:15 PM
>> Subject: Re: [PATCH v6 00/21] vfio: Adopt iommufd
>>
>> Hi Zhenzhong,
>>
>> On 11/14/23 11:09, Zhenzhong Duan wrote:
>>> Hi,
>>>
>>> Thanks all for giving guides and comments on previous series, this is
>>> the remaining part of the iommufd support.
>>>
>>> Based on Cédric's suggestion, replace old config method for IOMMUFD
>>> with Kconfig.
>>>
>>> Based on Jason's suggestion, drop the implementation of manually
>>> allocating hwpt and switch to IOAS attach/detach.
>>>
>>> Beside current test, we also tested mdev with mtty for better cover range.
>>>
>>> PATCH 1: Introduce iommufd object
>>> PATCH 2-9: add IOMMUFD container and cdev support
>>> PATCH 10-17: fd passing for cdev and linking to IOMMUFD
>>> PATCH 18: make VFIOContainerBase parameter const
>>> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86
>>>
>>>
>>> We have done wide test with different combinations, e.g:
>>> - PCI device were tested
>>> - FD passing and hot reset with some trick.
>>> - device hotplug test with legacy and iommufd backends
>>> - with or without vIOMMU for legacy and iommufd backends
>>> - devices linked to different iommufds
>>> - VFIO migration with an E800 net card (no dirty sync support) passthrough
>>> - platform, ccw and ap were only compile-tested due to environment limit
>>> - test mdev pass through with mtty and mix with real device and different BE
>>>
>>> Given some iommufd kernel limitations, the iommufd backend is
>>> not yet fully on par with the legacy backend w.r.t. features like:
>>> - p2p mappings (you will see related error traces)
>>> - dirty page sync
>>> - and etc.
>> Feel free to add my T-b:
>> Tested-by: Eric Auger <eric.auger@redhat.com>
> Thanks Eric, do you mean all the patches or just the ARM part?
Yeah sorry I failed to give details. I have tested on ARM with vfio-pci
atm. So all the generic patches + ARM virt / PCI specific ones. As for
VFIO-PLATFORM I need to resurrect a new environment because I have some
trouble with AMD overdrive which does not expose iommu groups atm. I need
to figure this out or create a new vfio-platform environment to test.
Working on it ...
Thanks
Eric
>
> BRs.
> Zhenzhong
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-20 10:07 ` Duan, Zhenzhong
@ 2023-11-20 17:08 ` Cédric Le Goater
2023-11-21 3:26 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-20 17:08 UTC
To: Duan, Zhenzhong, eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
Hello Zhenzhong
On 11/20/23 11:07, Duan, Zhenzhong wrote:
> Hi Cédric,
>
>> -----Original Message-----
>> From: Cédric Le Goater <clg@redhat.com>
>> Sent: Monday, November 20, 2023 4:25 PM
>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>
>>>>>> A similar issue with a fix submitted below, ccing related people.
>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
>>>>>> It looks the fix will not work for hotplug.
>>>>>>
>>>>>> Or below qemu cmdline may help:
>>>>>> "-cpu host,host-phys-bits-limit=39"
>>>>>
>>>>> don't you have the same issue with legacy VFIO code, you should?
>>>>
>>>> I tend to be lazy and use seabios for guests on the command line.
>>>> I do see the error with legacy VFIO and uefi.
>>>>
>>>> However, with the address space size work-around and iommufd, the
>>>> error is different, an EFAULT now. Some page pinning issue it seems.
>>>
>>> Yes, this reminds me of iommufd not supporting p2p mapping yet.
>>
>> OK. Should we transform this error in a warning ? The code needs
>> at least a comment.
>
> Makes sense, though I'm not sure whether other corner cases also return EFAULT.
yep. That's the problem.
> I plan below change in v7:
>
> diff --git a/backends/iommufd.c b/backends/iommufd.c
> index 53fdac4cc0..ba58a0eb0d 100644
> --- a/backends/iommufd.c
> +++ b/backends/iommufd.c
> @@ -178,7 +178,13 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>                                    vaddr, readonly, ret);
>      if (ret) {
>          ret = -errno;
> -        error_report("IOMMU_IOAS_MAP failed: %m");
> +
> +        /* TODO: Not support mapping hardware PCI BAR region for now. */
> +        if (errno == EFAULT) {
> +            warn_report("IOMMU_IOAS_MAP failed: %m, PCI BAR?");
> +        } else {
> +            error_report("IOMMU_IOAS_MAP failed: %m");
> +        }
>      }
>      return ret;
>  }
>
> I couldn't change the vfio_container_dma_map print to a warning because, for the legacy container, it's a real error.
> So the output after the fix is:
>
> qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
> qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
I am OK with that. Let's see what the others have to say.
>>
>>> So EFAULT is expected. Maybe I should add a comment in docs/devel/vfio-iommufd.rst
>>
>> Yes. It would be good to have a list of gaps and effects in the
>> documentation. See Jason's presentation at LPC.
>>
>>
>> https://lpc.events/event/17/contributions/1418/attachments/1297/2607/LPC2023_iommufd.pdf
>
> I see, PCI Peer to Peer and POWER/SPAPR are related to qemu iommufd implementation.
> For POWER/SPAPR, we have "Supported platform" section.
yes.
> Below are other gaps I can think of for now:
>
> Gaps:
> 1. dirty page sync, WIP (Joao)
> 2. p2p dma not supported yet.
> 3. fd passing with mdev does not support RAM discard (vfio-pci), as there is no way to tell from an fd that the device is an mdev.
Call the section Caveats maybe?
Thanks,
C.
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-20 17:08 ` Cédric Le Goater
@ 2023-11-21 3:26 ` Duan, Zhenzhong
2023-11-21 8:05 ` Cédric Le Goater
0 siblings, 1 reply; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-21 3:26 UTC
To: Cédric Le Goater, eric.auger@redhat.com,
qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 21, 2023 1:09 AM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>Hello Zhenzhong
>
>On 11/20/23 11:07, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Sent: Monday, November 20, 2023 4:25 PM
>>> Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>>>
>>>>>>> A similar issue with a fix submitted below, ccing related people.
>>>>>>> https://lists.gnu.org/archive/html/qemu-devel/2023-11/msg02937.html
>>>>>>> It looks the fix will not work for hotplug.
>>>>>>>
>>>>>>> Or below qemu cmdline may help:
>>>>>>> "-cpu host,host-phys-bits-limit=39"
>>>>>>
>>>>>> don't you have the same issue with legacy VFIO code, you should?
>>>>>
>>>>> I tend to be lazy and use seabios for guests on the command line.
>>>>> I do see the error with legacy VFIO and uefi.
>>>>>
>>>>> However, with the address space size work-around and iommufd, the
>>>>> error is different, an EFAULT now. Some page pinning issue it seems.
>>>>
>>>> Yes, this reminds me of iommufd not supporting p2p mapping yet.
>>>
>>> OK. Should we transform this error in a warning ? The code needs
>>> at least a comment.
>>
>> Makes sense, though I'm not sure whether other corner cases also return EFAULT.
>
>yep. That's the problem.
>
>> I plan below change in v7:
>>
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 53fdac4cc0..ba58a0eb0d 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -178,7 +178,13 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>>                                    vaddr, readonly, ret);
>>      if (ret) {
>>          ret = -errno;
>> -        error_report("IOMMU_IOAS_MAP failed: %m");
>> +
>> +        /* TODO: Not support mapping hardware PCI BAR region for now. */
>> +        if (errno == EFAULT) {
>> +            warn_report("IOMMU_IOAS_MAP failed: %m, PCI BAR?");
>> +        } else {
>> +            error_report("IOMMU_IOAS_MAP failed: %m");
>> +        }
>>      }
>>      return ret;
>>  }
>>
>> I couldn't change the vfio_container_dma_map print to a warning because, for the legacy container, it's a real error.
>> So the output after the fix is:
>>
>> qemu-system-x86_64: warning: IOMMU_IOAS_MAP failed: Bad address, PCI BAR?
>> qemu-system-x86_64: vfio_container_dma_map(0x560cb6cb1620, 0xe000000021000, 0x3000, 0x7f32ed55c000) = -14 (Bad address)
>
>I am OK with that. Let's see what the others have to say.
>
>>>
>>>> So EFAULT is expected. Maybe I should add a comment in docs/devel/vfio-iommufd.rst
>>>
>>> Yes. It would be good to have a list of gaps and effects in the
>>> documentation. See Jason's presentation at LPC.
>>>
>>>
>>>
>>> https://lpc.events/event/17/contributions/1418/attachments/1297/2607/LPC2023_iommufd.pdf
>>
>> I see, PCI Peer to Peer and POWER/SPAPR are related to qemu iommufd implementation.
>> For POWER/SPAPR, we have "Supported platform" section.
>
>yes.
>
>> Below are other gaps I can think of for now:
>>
>> Gaps:
>> 1. dirty page sync, WIP (Joao)
>> 2. p2p dma not supported yet.
>> 3. fd passing with mdev does not support RAM discard (vfio-pci), as there is no way to tell from an fd that the device is an mdev.
>
>Call the section Caveats maybe?
Got it.
Thanks
Zhenzhong
* Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-21 3:26 ` Duan, Zhenzhong
@ 2023-11-21 8:05 ` Cédric Le Goater
2023-11-21 8:39 ` Duan, Zhenzhong
0 siblings, 1 reply; 82+ messages in thread
From: Cédric Le Goater @ 2023-11-21 8:05 UTC
To: Duan, Zhenzhong, eric.auger@redhat.com, qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
Hello Zhenzhong,
>>> Below are other gaps I can think of for now:
>>>
>>> Gaps:
>>> 1. dirty page sync, WIP (Joao)
>>> 2. p2p dma not supported yet.
>>> 3. fd passing with mdev does not support RAM discard (vfio-pci), as there is no way to tell from an fd that the device is an mdev.
>>
>> Call the section Caveats maybe?
>
> Got it.
It looks like v7 should be ready by rc2 (next week). I would then merge
in vfio-next and wait a week before sending a QEMU-9.0 PR.
Thanks,
C.
* RE: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
2023-11-21 8:05 ` Cédric Le Goater
@ 2023-11-21 8:39 ` Duan, Zhenzhong
0 siblings, 0 replies; 82+ messages in thread
From: Duan, Zhenzhong @ 2023-11-21 8:39 UTC
To: Cédric Le Goater, eric.auger@redhat.com,
qemu-devel@nongnu.org
Cc: alex.williamson@redhat.com, jgg@nvidia.com, nicolinc@nvidia.com,
joao.m.martins@oracle.com, peterx@redhat.com, jasowang@redhat.com,
Tian, Kevin, Liu, Yi L, Sun, Yi Y, Peng, Chao P, Paolo Bonzini,
Eric Blake, Markus Armbruster, Daniel P. Berrangé,
Eduardo Habkost, Gerd Hoffmann, Kasireddy, Vivek,
lersek@redhat.com
>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Sent: Tuesday, November 21, 2023 4:06 PM
>Subject: Re: [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object
>
>Hello Zhenzhong,
>
>>>> Below are other gaps I can think of for now:
>>>>
>>>> Gaps:
>>>> 1. dirty page sync, WIP (Joao)
>>>> 2. p2p dma not supported yet.
>>>> 3. fd passing with mdev does not support RAM discard (vfio-pci), as there is no way to tell from an fd that the device is an mdev.
>>>
>>> Call the section Caveats maybe?
>>
>> Got it.
>
>It looks like v7 should be ready by rc2 (next week). I would then merge
>in vfio-next and wait a week before sending a QEMU-9.0 PR.
Got it, I'll send it out soon.
Thanks
Zhenzhong
end of thread
Thread overview: 82+ messages
2023-11-14 10:09 [PATCH v6 00/21] vfio: Adopt iommufd Zhenzhong Duan
2023-11-14 10:09 ` [PATCH v6 01/21] backends/iommufd: Introduce the iommufd object Zhenzhong Duan
2023-11-14 13:28 ` Cédric Le Goater
2023-11-15 4:06 ` Duan, Zhenzhong
2023-11-15 8:15 ` Cédric Le Goater
2023-11-15 12:52 ` Eric Auger
2023-11-16 4:04 ` Duan, Zhenzhong
2023-11-16 8:32 ` Eric Auger
2023-11-16 8:47 ` Duan, Zhenzhong
2023-11-17 11:09 ` Cédric Le Goater
2023-11-17 11:39 ` Duan, Zhenzhong
2023-11-17 12:56 ` Cédric Le Goater
2023-11-17 13:29 ` Eric Auger
2023-11-17 13:56 ` Cédric Le Goater
2023-11-20 3:06 ` Duan, Zhenzhong
2023-11-20 8:24 ` Cédric Le Goater
2023-11-20 10:07 ` Duan, Zhenzhong
2023-11-20 17:08 ` Cédric Le Goater
2023-11-21 3:26 ` Duan, Zhenzhong
2023-11-21 8:05 ` Cédric Le Goater
2023-11-21 8:39 ` Duan, Zhenzhong
2023-11-14 10:09 ` [PATCH v6 02/21] util/char_dev: Add open_cdev() Zhenzhong Duan
2023-11-14 13:29 ` Cédric Le Goater
2023-11-15 13:23 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 03/21] vfio/common: return early if space isn't empty Zhenzhong Duan
2023-11-14 13:29 ` Cédric Le Goater
2023-11-15 13:28 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 04/21] vfio/iommufd: Implement the iommufd backend Zhenzhong Duan
2023-11-14 13:36 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 05/21] vfio/iommufd: Relax assert check for " Zhenzhong Duan
2023-11-15 13:56 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 06/21] vfio/iommufd: Add support for iova_ranges and pgsizes Zhenzhong Duan
2023-11-14 13:46 ` Cédric Le Goater
2023-11-15 2:36 ` Duan, Zhenzhong
2023-11-15 16:25 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 07/21] vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info Zhenzhong Duan
2023-11-15 17:00 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 08/21] vfio/pci: Introduce a vfio pci hot reset interface Zhenzhong Duan
2023-11-14 13:51 ` Cédric Le Goater
2023-11-15 2:55 ` Duan, Zhenzhong
2023-11-15 17:54 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 09/21] vfio/iommufd: Enable pci hot reset through iommufd cdev interface Zhenzhong Duan
2023-11-17 13:53 ` Eric Auger
2023-11-20 4:15 ` Duan, Zhenzhong
2023-11-14 10:09 ` [PATCH v6 10/21] vfio/pci: Allow the selection of a given iommu backend Zhenzhong Duan
2023-11-14 13:57 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 11/21] vfio/pci: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-14 14:08 ` Cédric Le Goater
2023-11-15 12:09 ` Philippe Mathieu-Daudé
2023-11-15 13:05 ` Cédric Le Goater
2023-11-16 2:15 ` Duan, Zhenzhong
2023-11-16 7:25 ` Cédric Le Goater
2023-11-16 7:43 ` Duan, Zhenzhong
2023-11-14 10:09 ` [PATCH v6 12/21] vfio/platform: Allow the selection of a given iommu backend Zhenzhong Duan
2023-11-14 14:03 ` Cédric Le Goater
2023-11-17 14:55 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 13/21] vfio/platform: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-14 14:22 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 14/21] vfio/ap: Allow the selection of a given iommu backend Zhenzhong Duan
2023-11-14 14:03 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 15/21] vfio/ap: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
2023-11-14 10:09 ` [PATCH v6 16/21] vfio/ccw: Allow the selection of a given iommu backend Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
2023-11-15 18:45 ` Eric Farman
2023-11-14 10:09 ` [PATCH v6 17/21] vfio/ccw: Make vfio cdev pre-openable by passing a file handle Zhenzhong Duan
2023-11-14 14:04 ` Cédric Le Goater
2023-11-15 18:46 ` Eric Farman
2023-11-14 10:09 ` [PATCH v6 18/21] vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks Zhenzhong Duan
2023-11-14 14:05 ` Cédric Le Goater
2023-11-17 14:58 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 19/21] hw/arm: Activate IOMMUFD for virt machines Zhenzhong Duan
2023-11-16 9:17 ` Eric Auger
2023-11-14 10:09 ` [PATCH v6 20/21] kconfig: Activate IOMMUFD for s390x machines Zhenzhong Duan
2023-11-15 18:47 ` Eric Farman
2023-11-14 10:09 ` [PATCH v6 21/21] hw/i386: Activate IOMMUFD for q35 machines Zhenzhong Duan
2023-11-16 9:17 ` Eric Auger
2023-11-14 14:51 ` [PATCH v6 00/21] vfio: Adopt iommufd Cédric Le Goater
2023-11-15 4:16 ` Duan, Zhenzhong
2023-11-20 9:15 ` Eric Auger
2023-11-20 10:09 ` Duan, Zhenzhong
2023-11-20 11:22 ` Eric Auger