* [PATCH v3 00/15] vfio: preparation for vfio-user
From: John Levon @ 2025-05-07 15:20 UTC
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon
Hi, this series is against the vfio-next tree:
https://github.com/legoater/qemu/commits/vfio-next
The series contains patches to vfio to prepare for the vfio-user
implementation. A previous version of these patches can be found at
https://patchew.org/QEMU/20250430194003.2793823-1-john.levon@nutanix.com/
The patches have been rebased on vfio-next and incorporate the code review
comments from the previous version.
An old version of the full vfio-user series can be found at
https://lore.kernel.org/all/7dd34008-e0f1-4eed-a77e-55b1f68fbe69@redhat.com/T/
("[PATCH v8 00/28] vfio-user client"). Please see that series for justification
and context.
thanks
john
John Levon (15):
vfio: add vfio_device_prepare()
vfio: add vfio_device_unprepare()
vfio: add vfio_attach_device_by_iommu_type()
vfio: add vfio_device_get_irq_info() helper
vfio: consistently handle return value for helpers
vfio: add strread/writeerror()
vfio: add vfio_pci_config_space_read/write()
vfio: add unmap_all flag to DMA unmap callback
vfio: implement unmap all for DMA unmap callbacks
vfio: add device IO ops vector
vfio: add region info cache
vfio: add read/write to device IO ops vector
vfio: add vfio-pci-base class
vfio/container: pass listener_begin/commit callbacks
vfio/container: pass MemoryRegion to DMA operations
hw/vfio/pci.h | 10 +-
include/hw/vfio/vfio-container-base.h | 21 ++-
include/hw/vfio/vfio-device.h | 82 ++++++++
include/system/memory.h | 4 +-
hw/vfio/ap.c | 19 +-
hw/vfio/ccw.c | 25 ++-
hw/vfio/container-base.c | 14 +-
hw/vfio/container.c | 62 ++++---
hw/vfio/device.c | 183 ++++++++++++++++--
hw/vfio/igd.c | 10 +-
hw/vfio/iommufd.c | 35 ++--
hw/vfio/listener.c | 82 +++++---
hw/vfio/pci.c | 257 ++++++++++++++++----------
hw/vfio/platform.c | 6 +-
hw/vfio/region.c | 19 +-
hw/virtio/vhost-vdpa.c | 2 +-
system/memory.c | 7 +-
17 files changed, 603 insertions(+), 235 deletions(-)
--
2.43.0
* [PATCH v3 01/15] vfio: add vfio_device_prepare()
From: John Levon @ 2025-05-07 15:20 UTC
Factor out the initialization code shared by the legacy and iommufd vfio
implementations into a common helper.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-device.h | 3 +++
hw/vfio/container.c | 14 ++------------
hw/vfio/device.c | 14 ++++++++++++++
hw/vfio/iommufd.c | 9 +--------
4 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 81c95bb51e..081929ca4b 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -134,6 +134,9 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
extern VFIODeviceList vfio_device_list;
#ifdef CONFIG_LINUX
+void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
+ struct vfio_device_info *info);
+
int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
struct vfio_region_info **info);
int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index a761f0958b..d30c1a141d 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -826,18 +826,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
}
}
+ vfio_device_prepare(vbasedev, &group->container->bcontainer, info);
+
vbasedev->fd = fd;
vbasedev->group = group;
QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
- vbasedev->num_irqs = info->num_irqs;
- vbasedev->num_regions = info->num_regions;
- vbasedev->flags = info->flags;
-
trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
- vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET);
-
return true;
}
@@ -890,7 +886,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
int groupid = vfio_device_get_groupid(vbasedev, errp);
VFIODevice *vbasedev_iter;
VFIOGroup *group;
- VFIOContainerBase *bcontainer;
if (groupid < 0) {
return false;
@@ -919,11 +914,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
goto device_put_exit;
}
- bcontainer = &group->container->bcontainer;
- vbasedev->bcontainer = bcontainer;
- QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
- QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
-
return true;
device_put_exit:
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index d625a7c4db..f3b9902d21 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -398,3 +398,17 @@ void vfio_device_detach(VFIODevice *vbasedev)
}
VFIO_IOMMU_GET_CLASS(vbasedev->bcontainer)->detach_device(vbasedev);
}
+
+void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
+ struct vfio_device_info *info)
+{
+ vbasedev->num_irqs = info->num_irqs;
+ vbasedev->num_regions = info->num_regions;
+ vbasedev->flags = info->flags;
+ vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET);
+
+ vbasedev->bcontainer = bcontainer;
+ QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
+
+ QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 232c06dd15..83033c352a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -588,14 +588,7 @@ found_container:
iommufd_cdev_ram_block_discard_disable(false);
}
- vbasedev->group = 0;
- vbasedev->num_irqs = dev_info.num_irqs;
- vbasedev->num_regions = dev_info.num_regions;
- vbasedev->flags = dev_info.flags;
- vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
- vbasedev->bcontainer = bcontainer;
- QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
- QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+ vfio_device_prepare(vbasedev, bcontainer, &dev_info);
trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
vbasedev->num_regions, vbasedev->flags);
--
2.43.0
* [PATCH v3 02/15] vfio: add vfio_device_unprepare()
From: John Levon @ 2025-05-07 15:20 UTC
Add a helper that's the inverse of vfio_device_prepare().
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-device.h | 2 ++
hw/vfio/container.c | 6 +++---
hw/vfio/device.c | 7 +++++++
hw/vfio/iommufd.c | 4 +---
4 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 081929ca4b..342c4ba3bf 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -137,6 +137,8 @@ extern VFIODeviceList vfio_device_list;
void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
struct vfio_device_info *info);
+void vfio_device_unprepare(VFIODevice *vbasedev);
+
int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
struct vfio_region_info **info);
int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index d30c1a141d..cf23aa799f 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -927,10 +927,10 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
{
VFIOGroup *group = vbasedev->group;
- QLIST_REMOVE(vbasedev, global_next);
- QLIST_REMOVE(vbasedev, container_next);
- vbasedev->bcontainer = NULL;
trace_vfio_device_detach(vbasedev->name, group->groupid);
+
+ vfio_device_unprepare(vbasedev);
+
object_unref(vbasedev->hiod);
vfio_device_put(vbasedev);
vfio_group_put(group);
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index f3b9902d21..31c441a3df 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -412,3 +412,10 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
}
+
+void vfio_device_unprepare(VFIODevice *vbasedev)
+{
+ QLIST_REMOVE(vbasedev, container_next);
+ QLIST_REMOVE(vbasedev, global_next);
+ vbasedev->bcontainer = NULL;
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 83033c352a..62ecb758f1 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -615,9 +615,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
VFIOIOMMUFDContainer *container = container_of(bcontainer,
VFIOIOMMUFDContainer,
bcontainer);
- QLIST_REMOVE(vbasedev, global_next);
- QLIST_REMOVE(vbasedev, container_next);
- vbasedev->bcontainer = NULL;
+ vfio_device_unprepare(vbasedev);
if (!vbasedev->ram_block_discard_allowed) {
iommufd_cdev_ram_block_discard_disable(false);
--
2.43.0
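For readers following along, the prepare/unprepare pairing from patches 01 and 02 can be sketched outside QEMU roughly as below. This is only an illustration under stated assumptions: it uses the BSD LIST_* macros from <sys/queue.h> in place of QEMU's QLIST_*, and demo_device, demo_container, demo_prepare() and demo_unprepare() are hypothetical names that do not appear in the series.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

struct demo_container;

/* Hypothetical stand-in for VFIODevice. */
struct demo_device {
    struct demo_container *container;
    unsigned int num_irqs;
    LIST_ENTRY(demo_device) container_next;
};

/* Hypothetical stand-in for VFIOContainerBase. */
struct demo_container {
    LIST_HEAD(, demo_device) device_list;
};

/* Copy the device info and link the device into its container. */
static void demo_prepare(struct demo_device *dev, struct demo_container *c,
                         unsigned int num_irqs)
{
    assert(dev->container == NULL);

    dev->num_irqs = num_irqs;
    dev->container = c;
    LIST_INSERT_HEAD(&c->device_list, dev, container_next);
}

/* Exact inverse of demo_prepare(): unlink, then clear the back pointer. */
static void demo_unprepare(struct demo_device *dev)
{
    assert(dev->container != NULL);

    LIST_REMOVE(dev, container_next);
    dev->container = NULL;
}
```

Keeping both halves in one file makes it harder for the legacy and iommufd detach paths to drift apart from what the attach paths set up.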
* [PATCH v3 03/15] vfio: add vfio_attach_device_by_iommu_type()
From: John Levon @ 2025-05-07 15:20 UTC
Allow attachment by explicitly passing a TYPE_VFIO_IOMMU_* string;
vfio-user will use this later.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-device.h | 3 +++
hw/vfio/device.c | 22 +++++++++++++++-------
2 files changed, 18 insertions(+), 7 deletions(-)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 342c4ba3bf..8b1437ba66 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -127,6 +127,9 @@ bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
const char *typename, Error **errp);
bool vfio_device_attach(char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
+bool vfio_device_attach_by_iommu_type(const char *iommu_type, char *name,
+ VFIODevice *vbasedev, AddressSpace *as,
+ Error **errp);
void vfio_device_detach(VFIODevice *vbasedev);
VFIODevice *vfio_get_vfio_device(Object *obj);
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 31c441a3df..9673b0717e 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -376,21 +376,29 @@ VFIODevice *vfio_get_vfio_device(Object *obj)
}
}
-bool vfio_device_attach(char *name, VFIODevice *vbasedev,
- AddressSpace *as, Error **errp)
+bool vfio_device_attach_by_iommu_type(const char *iommu_type, char *name,
+ VFIODevice *vbasedev, AddressSpace *as,
+ Error **errp)
{
const VFIOIOMMUClass *ops =
- VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
-
- if (vbasedev->iommufd) {
- ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
- }
+ VFIO_IOMMU_CLASS(object_class_by_name(iommu_type));
assert(ops);
return ops->attach_device(name, vbasedev, as, errp);
}
+bool vfio_device_attach(char *name, VFIODevice *vbasedev,
+ AddressSpace *as, Error **errp)
+{
+ const char *iommu_type = vbasedev->iommufd ?
+ TYPE_VFIO_IOMMU_IOMMUFD :
+ TYPE_VFIO_IOMMU_LEGACY;
+
+ return vfio_device_attach_by_iommu_type(iommu_type, name, vbasedev,
+ as, errp);
+}
+
void vfio_device_detach(VFIODevice *vbasedev)
{
if (!vbasedev->bcontainer) {
--
2.43.0
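The split in patch 03 (an explicit "by type" entry point plus a thin default wrapper) can be sketched as follows. This is not QEMU code: the lookup table stands in for object_class_by_name(), and every demo_* name and type string below is a hypothetical placeholder.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for VFIOIOMMUClass. */
struct demo_iommu_ops {
    const char *type_name;
    bool (*attach)(const char *dev_name);
};

/* Records which backend handled the most recent attach. */
static const char *demo_last_backend;

static bool demo_legacy_attach(const char *dev_name)
{
    demo_last_backend = "legacy";
    return dev_name != NULL;
}

static bool demo_iommufd_attach(const char *dev_name)
{
    demo_last_backend = "iommufd";
    return dev_name != NULL;
}

static const struct demo_iommu_ops demo_ops_table[] = {
    { "demo-iommu-legacy", demo_legacy_attach },
    { "demo-iommu-iommufd", demo_iommufd_attach },
};

/* Look up a backend by its type string; NULL if unknown. */
static const struct demo_iommu_ops *demo_ops_by_name(const char *type_name)
{
    size_t n = sizeof(demo_ops_table) / sizeof(demo_ops_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (strcmp(demo_ops_table[i].type_name, type_name) == 0) {
            return &demo_ops_table[i];
        }
    }
    return NULL;
}

/* Explicit entry point: the caller names the backend type. */
static bool demo_attach_by_type(const char *type_name, const char *dev_name)
{
    const struct demo_iommu_ops *ops = demo_ops_by_name(type_name);

    assert(ops);
    return ops->attach(dev_name);
}

/* Default entry point: pick the type from a flag, then delegate. */
static bool demo_attach(bool use_iommufd, const char *dev_name)
{
    const char *type_name = use_iommufd ? "demo-iommu-iommufd"
                                        : "demo-iommu-legacy";

    return demo_attach_by_type(type_name, dev_name);
}
```

A third backend only needs to call the explicit entry point with its own type string; the default wrapper does not have to learn about it.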
* [PATCH v3 04/15] vfio: add vfio_device_get_irq_info() helper
From: John Levon @ 2025-05-07 15:20 UTC
Add a helper similar to vfio_device_get_region_info() and use it
everywhere.
Replace a couple of needless allocations with stack variables.
As a side-effect, this fixes a minor error reporting issue in the call
from vfio_msix_early_setup().
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-device.h | 3 +++
hw/vfio/ap.c | 19 ++++++++++---------
hw/vfio/ccw.c | 20 +++++++++++---------
hw/vfio/device.c | 15 +++++++++++++++
hw/vfio/pci.c | 23 +++++++++++------------
hw/vfio/platform.c | 6 +++---
6 files changed, 53 insertions(+), 33 deletions(-)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 8b1437ba66..a7eaaa31e7 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -147,6 +147,9 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
uint32_t subtype, struct vfio_region_info **info);
bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
+
+int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
+ struct vfio_irq_info *info);
#endif
/* Returns 0 on success, or a negative errno. */
diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 1207c08d8d..785c0a0197 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -74,10 +74,10 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
unsigned int irq, Error **errp)
{
int fd;
- size_t argsz;
+ int ret;
IOHandler *fd_read;
EventNotifier *notifier;
- g_autofree struct vfio_irq_info *irq_info = NULL;
+ struct vfio_irq_info irq_info;
VFIODevice *vdev = &vapdev->vdev;
switch (irq) {
@@ -96,14 +96,15 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
return false;
}
- argsz = sizeof(*irq_info);
- irq_info = g_malloc0(argsz);
- irq_info->index = irq;
- irq_info->argsz = argsz;
+ ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
+
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "vfio: Error getting irq info");
+ return false;
+ }
- if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
- irq_info) < 0 || irq_info->count < 1) {
- error_setg_errno(errp, errno, "vfio: Error getting irq info");
+ if (irq_info.count < 1) {
+ error_setg(errp, "vfio: Error getting irq info, count=0");
return false;
}
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index fde0c3fbef..ab3fabf991 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -376,8 +376,8 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
Error **errp)
{
VFIODevice *vdev = &vcdev->vdev;
- g_autofree struct vfio_irq_info *irq_info = NULL;
- size_t argsz;
+ struct vfio_irq_info irq_info;
+ int ret;
int fd;
EventNotifier *notifier;
IOHandler *fd_read;
@@ -406,13 +406,15 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
return false;
}
- argsz = sizeof(*irq_info);
- irq_info = g_malloc0(argsz);
- irq_info->index = irq;
- irq_info->argsz = argsz;
- if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
- irq_info) < 0 || irq_info->count < 1) {
- error_setg_errno(errp, errno, "vfio: Error getting irq info");
+ ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
+
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "vfio: Error getting irq info");
+ return false;
+ }
+
+ if (irq_info.count < 1) {
+ error_setg(errp, "vfio: Error getting irq info, count=0");
return false;
}
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 9673b0717e..5d837092cb 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -185,6 +185,21 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
return false;
}
+int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
+ struct vfio_irq_info *info)
+{
+ int ret;
+
+ memset(info, 0, sizeof(*info));
+
+ info->argsz = sizeof(*info);
+ info->index = index;
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
+
+ return ret < 0 ? -errno : ret;
+}
+
int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
struct vfio_region_info **info)
{
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e1fab21b47..5ccfc67aef 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1555,8 +1555,7 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
uint16_t ctrl;
uint32_t table, pba;
int ret, fd = vdev->vbasedev.fd;
- struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
- .index = VFIO_PCI_MSIX_IRQ_INDEX };
+ struct vfio_irq_info irq_info;
VFIOMSIXInfo *msix;
pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
@@ -1593,7 +1592,8 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+ ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX,
+ &irq_info);
if (ret < 0) {
error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
g_free(msix);
@@ -2736,7 +2736,7 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
{
VFIODevice *vbasedev = &vdev->vbasedev;
g_autofree struct vfio_region_info *reg_info = NULL;
- struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+ struct vfio_irq_info irq_info;
int i, ret = -1;
/* Sanity check device */
@@ -2797,12 +2797,10 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
}
}
- irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
-
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+ ret = vfio_device_get_irq_info(vbasedev, VFIO_PCI_ERR_IRQ_INDEX, &irq_info);
if (ret) {
/* This can fail for an old kernel or legacy PCI dev */
- trace_vfio_populate_device_get_irq_info_failure(strerror(errno));
+ trace_vfio_populate_device_get_irq_info_failure(strerror(-ret));
} else if (irq_info.count == 1) {
vdev->pci_aer = true;
} else {
@@ -2911,17 +2909,18 @@ static void vfio_req_notifier_handler(void *opaque)
static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
{
- struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
- .index = VFIO_PCI_REQ_IRQ_INDEX };
+ struct vfio_irq_info irq_info;
Error *err = NULL;
int32_t fd;
+ int ret;
if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
return;
}
- if (ioctl(vdev->vbasedev.fd,
- VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count < 1) {
+ ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX,
+ &irq_info);
+ if (ret < 0 || irq_info.count < 1) {
return;
}
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index ffb3681607..9a21f2e50a 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -474,10 +474,10 @@ static bool vfio_populate_device(VFIODevice *vbasedev, Error **errp)
QSIMPLEQ_INIT(&vdev->pending_intp_queue);
for (i = 0; i < vbasedev->num_irqs; i++) {
- struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+ struct vfio_irq_info irq;
+
+ ret = vfio_device_get_irq_info(vbasedev, i, &irq);
- irq.index = i;
- ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
if (ret) {
error_setg_errno(errp, -ret, "failed to get device irq info");
goto irq_err;
--
2.43.0
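The error-reporting change in the ap.c and ccw.c hunks above separates two failure modes that used to share one errno-based message. A rough standalone sketch of that shape is below; it assumes fstat() as a stand-in for the VFIO ioctl so that it builds without kernel headers, and demo_get_info() and demo_check() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: zero the output, query, return 0 or -errno. */
static int demo_get_info(int fd, struct stat *info)
{
    assert(info != NULL);

    memset(info, 0, sizeof(*info));

    return fstat(fd, info) < 0 ? -errno : 0;
}

/* Returns 0 on success, -1 on failure with a message in msg. */
static int demo_check(int fd, char *msg, size_t len)
{
    struct stat info;
    int ret = demo_get_info(fd, &info);

    if (ret < 0) {
        /* The call itself failed: report the error it returned. */
        snprintf(msg, len, "error getting info: %s", strerror(-ret));
        return -1;
    }

    if (info.st_size < 1) {
        /* The call succeeded, so there is no errno worth printing. */
        snprintf(msg, len, "error getting info, size=0");
        return -1;
    }

    msg[0] = '\0';
    return 0;
}
```

With the old combined check, the second case would have printed whatever errno happened to hold from an earlier, unrelated call.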
* [PATCH v3 05/15] vfio: consistently handle return value for helpers
From: John Levon @ 2025-05-07 15:20 UTC
Various bits of code that call vfio device APIs should consistently use
the "return -errno" approach for passing errors back, rather than
presuming errno is (still) set correctly.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
hw/vfio/pci.c | 33 ++++++++++++++++++++-------------
1 file changed, 20 insertions(+), 13 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5ccfc67aef..866cf58d04 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -398,7 +398,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
- return ret;
+ return ret < 0 ? -errno : ret;
}
static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
@@ -459,7 +459,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
g_free(irq_set);
- return ret;
+ return ret < 0 ? -errno : ret;
}
static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
@@ -581,7 +581,8 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
vfio_device_irq_disable(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
ret = vfio_enable_vectors(vdev, true);
if (ret) {
- error_report("vfio: failed to enable vectors, %d", ret);
+ error_report("vfio: failed to enable vectors, %s",
+ strerror(-ret));
}
} else {
Error *err = NULL;
@@ -695,7 +696,8 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
if (vdev->nr_vectors) {
ret = vfio_enable_vectors(vdev, true);
if (ret) {
- error_report("vfio: failed to enable vectors, %d", ret);
+ error_report("vfio: failed to enable vectors, %s",
+ strerror(-ret));
}
} else {
/*
@@ -712,7 +714,8 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
*/
ret = vfio_enable_msix_no_vec(vdev);
if (ret) {
- error_report("vfio: failed to enable MSI-X, %d", ret);
+ error_report("vfio: failed to enable MSI-X, %s",
+ strerror(-ret));
}
}
@@ -765,7 +768,8 @@ retry:
ret = vfio_enable_vectors(vdev, false);
if (ret) {
if (ret < 0) {
- error_report("vfio: Error: Failed to setup MSI fds: %m");
+ error_report("vfio: Error: Failed to setup MSI fds: %s",
+ strerror(-ret));
} else {
error_report("vfio: Error: Failed to enable %d "
"MSI vectors, retry with %d", vdev->nr_vectors, ret);
@@ -882,17 +886,21 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
{
g_autofree struct vfio_region_info *reg_info = NULL;
+ VFIODevice *vbasedev = &vdev->vbasedev;
uint64_t size;
off_t off = 0;
ssize_t bytes;
+ int ret;
+
+ ret = vfio_device_get_region_info(vbasedev, VFIO_PCI_ROM_REGION_INDEX,
+ &reg_info);
- if (vfio_device_get_region_info(&vdev->vbasedev,
- VFIO_PCI_ROM_REGION_INDEX, &reg_info)) {
- error_report("vfio: Error getting ROM info: %m");
+ if (ret != 0) {
+ error_report("vfio: Error getting ROM info: %s", strerror(-ret));
return;
}
- trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info->size,
+ trace_vfio_pci_load_rom(vbasedev->name, (unsigned long)reg_info->size,
(unsigned long)reg_info->offset,
(unsigned long)reg_info->flags);
@@ -901,8 +909,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
if (!vdev->rom_size) {
vdev->rom_read_failed = true;
- error_report("vfio-pci: Cannot read device rom at "
- "%s", vdev->vbasedev.name);
+ error_report("vfio-pci: Cannot read device rom at %s", vbasedev->name);
error_printf("Device option ROM contents are probably invalid "
"(check dmesg).\nSkip option ROM probe with rombar=0, "
"or load from file with romfile=\n");
@@ -913,7 +920,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
memset(vdev->rom, 0xff, size);
while (size) {
- bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
+ bytes = pread(vbasedev->fd, vdev->rom + off,
size, vdev->rom_offset + off);
if (bytes == 0) {
break;
--
2.43.0
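As a minimal illustration of why patch 05 prefers "return -errno" over reading errno later, consider the sketch below. It assumes a plain read() wrapper; demo_read() is a hypothetical name and not something this series adds.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: returns the number of bytes read, or -errno
 * captured immediately after the failing call.
 */
static int demo_read(int fd, void *buf, size_t len)
{
    ssize_t ret;

    assert(buf != NULL);

    ret = read(fd, buf, len);

    return ret < 0 ? -errno : (int)ret;
}
```

Because the error travels in the return value, a caller can log, free memory or call other helpers before reporting strerror(-ret), none of which is safe with a "%m" format or a late read of errno.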
* [PATCH v3 06/15] vfio: add strread/writeerror()
From: John Levon @ 2025-05-07 15:20 UTC
Add simple helpers to correctly report failures from read/write routines
using the return -errno style.
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-device.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index a7eaaa31e7..4a32202943 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -115,6 +115,20 @@ struct VFIODeviceOps {
int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
};
+/*
+ * Given a return value of either a short number of bytes read or -errno,
+ * construct a meaningful error message.
+ */
+#define strreaderror(ret) \
+ (ret < 0 ? strerror(-ret) : "short read")
+
+/*
+ * Given a return value of either a short number of bytes written or -errno,
+ * construct a meaningful error message.
+ */
+#define strwriteerror(ret) \
+ (ret < 0 ? strerror(-ret) : "short write")
+
void vfio_device_irq_disable(VFIODevice *vbasedev, int index);
void vfio_device_irq_unmask(VFIODevice *vbasedev, int index);
void vfio_device_irq_mask(VFIODevice *vbasedev, int index);
--
2.43.0
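To show how a caller might consume a "bytes or -errno" return value together with a strreaderror()-style helper, here is a small standalone sketch. It is written as an inline-style function rather than a macro purely for the sake of the example; demo_strreaderror() and demo_describe_read() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Map a short byte count or a -errno value (e.g. -EIO) to a message. */
static const char *demo_strreaderror(int ret)
{
    return ret < 0 ? strerror(-ret) : "short read";
}

/*
 * Compare the result of a read against the wanted length.
 * Returns 0 if they match, otherwise -1 with a message in msg.
 */
static int demo_describe_read(char *msg, size_t len, int ret, int wanted)
{
    assert(msg != NULL && len > 0);

    if (ret == wanted) {
        msg[0] = '\0';
        return 0;
    }

    snprintf(msg, len, "read failed: %s", demo_strreaderror(ret));
    return -1;
}
```

The caller only has to test "ret != wanted"; the helper decides whether the mismatch was a real error or a short transfer.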
* [PATCH v3 07/15] vfio: add vfio_pci_config_space_read/write()
From: John Levon @ 2025-05-07 15:20 UTC
Add helpers that access config space and return an -errno style
value.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
hw/vfio/pci.c | 123 ++++++++++++++++++++++++++++++++------------------
1 file changed, 80 insertions(+), 43 deletions(-)
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 866cf58d04..f65c9463ce 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -967,6 +967,28 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
}
}
+/* "Raw" read of underlying config space. */
+static int vfio_pci_config_space_read(VFIOPCIDevice *vdev, off_t offset,
+ uint32_t size, void *data)
+{
+ ssize_t ret;
+
+ ret = pread(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
+
+ return ret < 0 ? -errno : (int)ret;
+}
+
+/* "Raw" write of underlying config space. */
+static int vfio_pci_config_space_write(VFIOPCIDevice *vdev, off_t offset,
+ uint32_t size, void *data)
+{
+ ssize_t ret;
+
+ ret = pwrite(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
+
+ return ret < 0 ? -errno : (int)ret;
+}
+
static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
{
VFIOPCIDevice *vdev = opaque;
@@ -1019,10 +1041,9 @@ static const MemoryRegionOps vfio_rom_ops = {
static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
{
+ VFIODevice *vbasedev = &vdev->vbasedev;
uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
- off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
char *name;
- int fd = vdev->vbasedev.fd;
if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
/* Since pci handles romfile, just print a message and return */
@@ -1039,11 +1060,12 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
* Use the same size ROM BAR as the physical device. The contents
* will get filled in later when the guest tries to read it.
*/
- if (pread(fd, &orig, 4, offset) != 4 ||
- pwrite(fd, &size, 4, offset) != 4 ||
- pread(fd, &size, 4, offset) != 4 ||
- pwrite(fd, &orig, 4, offset) != 4) {
- error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
+ if (vfio_pci_config_space_read(vdev, PCI_ROM_ADDRESS, 4, &orig) != 4 ||
+ vfio_pci_config_space_write(vdev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
+ vfio_pci_config_space_read(vdev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
+ vfio_pci_config_space_write(vdev, PCI_ROM_ADDRESS, 4, &orig) != 4) {
+
+ error_report("%s(%s) ROM access failed", __func__, vbasedev->name);
return;
}
@@ -1223,6 +1245,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
{
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIODevice *vbasedev = &vdev->vbasedev;
uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -1235,12 +1258,12 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
ssize_t ret;
- ret = pread(vdev->vbasedev.fd, &phys_val, len,
- vdev->config_offset + addr);
+ ret = vfio_pci_config_space_read(vdev, addr, len, &phys_val);
if (ret != len) {
- error_report("%s(%s, 0x%x, 0x%x) failed: %m",
- __func__, vdev->vbasedev.name, addr, len);
- return -errno;
+ error_report("%s(%s, 0x%x, 0x%x) failed: %s",
+ __func__, vbasedev->name, addr, len,
+ strreaderror(ret));
+ return -1;
}
phys_val = le32_to_cpu(phys_val);
}
@@ -1256,15 +1279,18 @@ void vfio_pci_write_config(PCIDevice *pdev,
uint32_t addr, uint32_t val, int len)
{
VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIODevice *vbasedev = &vdev->vbasedev;
uint32_t val_le = cpu_to_le32(val);
+ int ret;
trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
/* Write everything to VFIO, let it filter out what we can't write */
- if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
- != len) {
- error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %m",
- __func__, vdev->vbasedev.name, addr, val, len);
+ ret = vfio_pci_config_space_write(vdev, addr, len, &val_le);
+ if (ret != len) {
+ error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %s",
+ __func__, vbasedev->name, addr, val, len,
+ strwriteerror(ret));
}
/* MSI/MSI-X Enabling/Disabling */
@@ -1352,9 +1378,11 @@ static bool vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
int ret, entries;
Error *err = NULL;
- if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
- vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
- error_setg_errno(errp, errno, "failed reading MSI PCI_CAP_FLAGS");
+ ret = vfio_pci_config_space_read(vdev, pos + PCI_CAP_FLAGS,
+ sizeof(ctrl), &ctrl);
+ if (ret != sizeof(ctrl)) {
+ error_setg(errp, "failed reading MSI PCI_CAP_FLAGS: %s",
+ strreaderror(ret));
return false;
}
ctrl = le16_to_cpu(ctrl);
@@ -1561,30 +1589,35 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
uint8_t pos;
uint16_t ctrl;
uint32_t table, pba;
- int ret, fd = vdev->vbasedev.fd;
struct vfio_irq_info irq_info;
VFIOMSIXInfo *msix;
+ int ret;
pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
if (!pos) {
return true;
}
- if (pread(fd, &ctrl, sizeof(ctrl),
- vdev->config_offset + pos + PCI_MSIX_FLAGS) != sizeof(ctrl)) {
- error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS");
+ ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_FLAGS,
+ sizeof(ctrl), &ctrl);
+ if (ret != sizeof(ctrl)) {
+ error_setg(errp, "failed to read PCI MSIX FLAGS: %s",
+ strreaderror(ret));
return false;
}
- if (pread(fd, &table, sizeof(table),
- vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
- error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE");
+ ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_TABLE,
+ sizeof(table), &table);
+ if (ret != sizeof(table)) {
+ error_setg(errp, "failed to read PCI MSIX TABLE: %s",
+ strreaderror(ret));
return false;
}
- if (pread(fd, &pba, sizeof(pba),
- vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
- error_setg_errno(errp, errno, "failed to read PCI MSIX PBA");
+ ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_PBA,
+ sizeof(pba), &pba);
+ if (ret != sizeof(pba)) {
+ error_setg(errp, "failed to read PCI MSIX PBA: %s", strreaderror(ret));
return false;
}
@@ -1744,10 +1777,10 @@ static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
}
/* Determine what type of BAR this is for registration */
- ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
- vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
+ ret = vfio_pci_config_space_read(vdev, PCI_BASE_ADDRESS_0 + (4 * nr),
+ sizeof(pci_bar), &pci_bar);
if (ret != sizeof(pci_bar)) {
- error_report("vfio: Failed to read BAR %d (%m)", nr);
+ error_report("vfio: Failed to read BAR %d: %s", nr, strreaderror(ret));
return;
}
@@ -2450,21 +2483,23 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
void vfio_pci_post_reset(VFIOPCIDevice *vdev)
{
+ VFIODevice *vbasedev = &vdev->vbasedev;
Error *err = NULL;
- int nr;
+ int ret, nr;
if (!vfio_intx_enable(vdev, &err)) {
error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
}
for (nr = 0; nr < PCI_NUM_REGIONS - 1; ++nr) {
- off_t addr = vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr);
+ off_t addr = PCI_BASE_ADDRESS_0 + (4 * nr);
uint32_t val = 0;
uint32_t len = sizeof(val);
- if (pwrite(vdev->vbasedev.fd, &val, len, addr) != len) {
- error_report("%s(%s) reset bar %d failed: %m", __func__,
- vdev->vbasedev.name, nr);
+ ret = vfio_pci_config_space_write(vdev, addr, len, &val);
+ if (ret != len) {
+ error_report("%s(%s) reset bar %d failed: %s", __func__,
+ vbasedev->name, nr, strwriteerror(ret));
}
}
@@ -3101,6 +3136,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
int i, ret;
char uuid[UUID_STR_LEN];
g_autofree char *name = NULL;
+ uint32_t config_space_size;
if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
if (!(~vdev->host.domain || ~vdev->host.bus ||
@@ -3155,13 +3191,14 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
goto error;
}
+ config_space_size = MIN(pci_config_size(&vdev->pdev), vdev->config_size);
+
/* Get a copy of config space */
- ret = pread(vbasedev->fd, vdev->pdev.config,
- MIN(pci_config_size(&vdev->pdev), vdev->config_size),
- vdev->config_offset);
- if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
- ret = ret < 0 ? -errno : -EFAULT;
- error_setg_errno(errp, -ret, "failed to read device config space");
+ ret = vfio_pci_config_space_read(vdev, 0, config_space_size,
+ vdev->pdev.config);
+ if (ret < (int)config_space_size) {
+ ret = ret < 0 ? -ret : EFAULT;
+ error_setg_errno(errp, ret, "failed to read device config space");
goto error;
}
--
2.43.0
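
Note on the error-handling convention used by the hunks above: callers now compare the helper's return value against the requested length and format it with strreaderror()/strwriteerror(), rather than relying on errno and %m. The following is only a minimal sketch of that convention. It assumes the helper returns the byte count on success and a negative errno on failure; demo_config_read() and demo_strreaderror() are hypothetical stand-ins, not the functions this series adds (those are defined in earlier patches not shown here).

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a config space read helper: returns the
 * number of bytes read, or -errno on failure, so the caller never has
 * to inspect errno itself.
 */
ssize_t demo_config_read(int fd, off_t base, uint32_t addr,
                         uint32_t len, void *buf)
{
    ssize_t ret;

    assert(buf != NULL);

    ret = pread(fd, buf, len, base + addr);

    return ret < 0 ? -errno : ret;
}

/*
 * Hypothetical stand-in for an error-string helper: a negative value is
 * treated as -errno, anything else as a short read.
 */
const char *demo_strreaderror(ssize_t ret)
{
    return ret < 0 ? strerror(-ret) : "short read";
}
```

A caller would then check "ret != len" and print demo_strreaderror(ret), which covers both the failed and the short-read case with one message.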
* [PATCH v3 08/15] vfio: add unmap_all flag to DMA unmap callback
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (6 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 07/15] vfio: add vfio_pci_config_space_read/write() John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-09 10:07 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
` (7 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon
We'll use this parameter shortly; this just adds the plumbing.
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-container-base.h | 15 +++++++++++++--
hw/vfio/container-base.c | 4 ++--
hw/vfio/container.c | 8 ++++++--
hw/vfio/iommufd.c | 6 +++++-
hw/vfio/listener.c | 8 ++++----
5 files changed, 30 insertions(+), 11 deletions(-)
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 5527e02722..59f07d26e8 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -81,7 +81,7 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
void *vaddr, bool readonly);
int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- IOMMUTLBEntry *iotlb);
+ IOMMUTLBEntry *iotlb, bool unmap_all);
bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
MemoryRegionSection *section,
Error **errp);
@@ -120,9 +120,20 @@ struct VFIOIOMMUClass {
int (*dma_map)(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
void *vaddr, bool readonly);
+ /**
+ * @dma_unmap
+ *
+ * Unmap an address range from the container.
+ *
+ * @bcontainer: #VFIOContainerBase to use for unmap
+ * @iova: start address to unmap
+ * @size: size of the range to unmap
+ * @iotlb: The IOMMU TLB mapping entry (or NULL)
+ * @unmap_all: if set, unmap the entire address space
+ */
int (*dma_unmap)(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- IOMMUTLBEntry *iotlb);
+ IOMMUTLBEntry *iotlb, bool unmap_all);
bool (*attach_device)(const char *name, VFIODevice *vbasedev,
AddressSpace *as, Error **errp);
void (*detach_device)(VFIODevice *vbasedev);
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 09340fd97a..3ff473a45c 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -85,12 +85,12 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- IOMMUTLBEntry *iotlb)
+ IOMMUTLBEntry *iotlb, bool unmap_all)
{
VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
g_assert(vioc->dma_unmap);
- return vioc->dma_unmap(bcontainer, iova, size, iotlb);
+ return vioc->dma_unmap(bcontainer, iova, size, iotlb, unmap_all);
}
bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index cf23aa799f..d5f4e66f1c 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -124,7 +124,7 @@ unmap_exit:
*/
static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- IOMMUTLBEntry *iotlb)
+ IOMMUTLBEntry *iotlb, bool unmap_all)
{
const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
bcontainer);
@@ -138,6 +138,10 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
int ret;
Error *local_err = NULL;
+ if (unmap_all) {
+ return -ENOTSUP;
+ }
+
if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
bcontainer->dirty_pages_supported) {
@@ -205,7 +209,7 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
*/
if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
(errno == EBUSY &&
- vfio_legacy_dma_unmap(bcontainer, iova, size, NULL) == 0 &&
+ vfio_legacy_dma_unmap(bcontainer, iova, size, NULL, false) == 0 &&
ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
return 0;
}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 62ecb758f1..6b2764c044 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -46,11 +46,15 @@ static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- IOMMUTLBEntry *iotlb)
+ IOMMUTLBEntry *iotlb, bool unmap_all)
{
const VFIOIOMMUFDContainer *container =
container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+ if (unmap_all) {
+ return -ENOTSUP;
+ }
+
/* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
return iommufd_backend_unmap_dma(container->be,
container->ioas_id, iova, size);
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index 6f77e18a7a..c5183700db 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -172,7 +172,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
}
} else {
ret = vfio_container_dma_unmap(bcontainer, iova,
- iotlb->addr_mask + 1, iotlb);
+ iotlb->addr_mask + 1, iotlb, false);
if (ret) {
error_setg(&local_err,
"vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
@@ -201,7 +201,7 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
int ret;
/* Unmap with a single call. */
- ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL);
+ ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL, false);
if (ret) {
error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
strerror(-ret));
@@ -638,7 +638,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
/* The unmap ioctl doesn't accept a full 64-bit span. */
llsize = int128_rshift(llsize, 1);
ret = vfio_container_dma_unmap(bcontainer, iova,
- int128_get64(llsize), NULL);
+ int128_get64(llsize), NULL, false);
if (ret) {
error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
"0x%"HWADDR_PRIx") = %d (%s)",
@@ -648,7 +648,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
iova += int128_get64(llsize);
}
ret = vfio_container_dma_unmap(bcontainer, iova,
- int128_get64(llsize), NULL);
+ int128_get64(llsize), NULL, false);
if (ret) {
error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
"0x%"HWADDR_PRIx") = %d (%s)",
--
2.43.0
* [PATCH v3 09/15] vfio: implement unmap all for DMA unmap callbacks
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (7 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 08/15] vfio: add unmap_all flag to DMA unmap callback John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-09 10:08 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 10/15] vfio: add device IO ops vector John Levon
` (6 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon
Handle unmap_all in the DMA unmap handlers rather than in the caller.
Signed-off-by: John Levon <john.levon@nutanix.com>
---
hw/vfio/container.c | 41 +++++++++++++++++++++++++++++++----------
hw/vfio/iommufd.c | 15 ++++++++++++++-
hw/vfio/listener.c | 19 ++++++-------------
3 files changed, 51 insertions(+), 24 deletions(-)
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index d5f4e66f1c..a9f0dbaec4 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -119,12 +119,9 @@ unmap_exit:
return ret;
}
-/*
- * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
- */
-static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
- hwaddr iova, ram_addr_t size,
- IOMMUTLBEntry *iotlb, bool unmap_all)
+static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
{
const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
bcontainer);
@@ -138,10 +135,6 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
int ret;
Error *local_err = NULL;
- if (unmap_all) {
- return -ENOTSUP;
- }
-
if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
bcontainer->dirty_pages_supported) {
@@ -185,6 +178,34 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
return 0;
}
+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb, bool unmap_all)
+{
+ int ret;
+
+ if (unmap_all) {
+ /* The unmap ioctl doesn't accept a full 64-bit span. */
+ Int128 llsize = int128_rshift(int128_2_64(), 1);
+
+ ret = vfio_legacy_dma_unmap_one(bcontainer, 0, int128_get64(llsize),
+ iotlb);
+
+ if (ret == 0) {
+ ret = vfio_legacy_dma_unmap_one(bcontainer, int128_get64(llsize),
+ int128_get64(llsize), iotlb);
+ }
+
+ } else {
+ ret = vfio_legacy_dma_unmap_one(bcontainer, iova, size, iotlb);
+ }
+
+ return ret;
+}
+
static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
ram_addr_t size, void *vaddr, bool readonly)
{
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 6b2764c044..af1c7ab10a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -51,8 +51,21 @@ static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
const VFIOIOMMUFDContainer *container =
container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
+ /* unmap in halves */
if (unmap_all) {
- return -ENOTSUP;
+ Int128 llsize = int128_rshift(int128_2_64(), 1);
+ int ret;
+
+ ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
+ 0, int128_get64(llsize));
+
+ if (ret == 0) {
+ ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
+ int128_get64(llsize),
+ int128_get64(llsize));
+ }
+
+ return ret;
}
/* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index c5183700db..e7ade7d62e 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -634,21 +634,14 @@ static void vfio_listener_region_del(MemoryListener *listener,
}
if (try_unmap) {
+ bool unmap_all = false;
+
if (int128_eq(llsize, int128_2_64())) {
- /* The unmap ioctl doesn't accept a full 64-bit span. */
- llsize = int128_rshift(llsize, 1);
- ret = vfio_container_dma_unmap(bcontainer, iova,
- int128_get64(llsize), NULL, false);
- if (ret) {
- error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
- "0x%"HWADDR_PRIx") = %d (%s)",
- bcontainer, iova, int128_get64(llsize), ret,
- strerror(-ret));
- }
- iova += int128_get64(llsize);
+ unmap_all = true;
+ llsize = int128_zero();
}
- ret = vfio_container_dma_unmap(bcontainer, iova,
- int128_get64(llsize), NULL, false);
+ ret = vfio_container_dma_unmap(bcontainer, iova, int128_get64(llsize),
+ NULL, unmap_all);
if (ret) {
error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
"0x%"HWADDR_PRIx") = %d (%s)",
--
2.43.0
* [PATCH v3 10/15] vfio: add device IO ops vector
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (8 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-09 10:09 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 11/15] vfio: add region info cache John Levon
` (5 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon, John Johnson,
Elena Ufimtseva, Jagannathan Raman
For vfio-user, device operations such as IRQ handling and region
reads/writes are implemented in userspace over the control socket, not
via ioctl() to the vfio kernel driver; add an ops vector to generalize
this, and implement vfio_device_io_ops_ioctl for interacting with the
kernel vfio driver.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-device.h | 38 ++++++++++++++++++
hw/vfio/container-base.c | 6 +--
hw/vfio/device.c | 74 +++++++++++++++++++++++++++++------
hw/vfio/listener.c | 13 +++---
hw/vfio/pci.c | 10 ++---
5 files changed, 114 insertions(+), 27 deletions(-)
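
Illustration only, not part of the patch: the ops vector pattern described above, reduced to a single operation. Callers dispatch through dev->io_ops, and every backend reports failure as a negative errno, so call sites no longer touch errno. All Demo* and demo_* names are hypothetical; the "request" field and the counting mock backend are assumptions made to keep the sketch self-contained, and the mock merely stands in for a socket-based implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/ioctl.h>

typedef struct DemoDevice DemoDevice;

/* Hypothetical, reduced ops vector with a single operation. */
typedef struct DemoIOOps {
    int (*set_irqs)(DemoDevice *dev, void *irqs);
} DemoIOOps;

struct DemoDevice {
    int fd;                  /* device fd for the ioctl backend */
    unsigned long request;   /* hypothetical ioctl request code */
    const DemoIOOps *io_ops; /* selected backend */
    int sent;                /* requests counted by the mock backend */
};

/* Kernel-style backend: issue the ioctl, convert failure to -errno. */
int demo_ioctl_set_irqs(DemoDevice *dev, void *irqs)
{
    int ret = ioctl(dev->fd, dev->request, irqs);

    return ret < 0 ? -errno : ret;
}

/* Mock backend standing in for a message sent over a socket. */
int demo_mock_set_irqs(DemoDevice *dev, void *irqs)
{
    (void)irqs;
    dev->sent++;
    return 0;
}

const DemoIOOps demo_io_ops_ioctl = { .set_irqs = demo_ioctl_set_irqs };
const DemoIOOps demo_io_ops_mock = { .set_irqs = demo_mock_set_irqs };

/* Call site: identical regardless of which backend is installed. */
int demo_irq_mask(DemoDevice *dev, void *irq_set)
{
    assert(dev->io_ops != NULL && dev->io_ops->set_irqs != NULL);

    return dev->io_ops->set_irqs(dev, irq_set);
}
```
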
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 4a32202943..7e1e81e76b 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -41,6 +41,7 @@ enum {
};
typedef struct VFIODeviceOps VFIODeviceOps;
+typedef struct VFIODeviceIOOps VFIODeviceIOOps;
typedef struct VFIOMigration VFIOMigration;
typedef struct IOMMUFDBackend IOMMUFDBackend;
@@ -66,6 +67,7 @@ typedef struct VFIODevice {
OnOffAuto migration_multifd_transfer;
bool migration_events;
VFIODeviceOps *ops;
+ VFIODeviceIOOps *io_ops;
unsigned int num_irqs;
unsigned int num_regions;
unsigned int flags;
@@ -151,6 +153,42 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
extern VFIODeviceList vfio_device_list;
#ifdef CONFIG_LINUX
+/*
+ * How devices communicate with the server. The default option is through
+ * ioctl() to the kernel VFIO driver, but vfio-user can use a socket to a remote
+ * process.
+ */
+struct VFIODeviceIOOps {
+ /**
+ * @device_feature
+ *
+ * Fill in feature info for the given device.
+ */
+ int (*device_feature)(VFIODevice *vdev, struct vfio_device_feature *);
+
+ /**
+ * @get_region_info
+ *
+ * Fill in @info with information on the region given by @info->index.
+ */
+ int (*get_region_info)(VFIODevice *vdev,
+ struct vfio_region_info *info);
+
+ /**
+ * @get_irq_info
+ *
+ * Fill in @irq with information on the IRQ given by @irq->index.
+ */
+ int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
+
+ /**
+ * @set_irqs
+ *
+ * Configure IRQs as defined by @irqs.
+ */
+ int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
+};
+
void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
struct vfio_device_info *info);
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 3ff473a45c..1c6ca94b60 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -198,11 +198,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
feature->flags = VFIO_DEVICE_FEATURE_GET |
VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
- if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
- return -errno;
- }
-
- return 0;
+ return vbasedev->io_ops->device_feature(vbasedev, feature);
}
static int vfio_container_iommu_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 5d837092cb..40a196bfb9 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -82,7 +82,7 @@ void vfio_device_irq_disable(VFIODevice *vbasedev, int index)
.count = 0,
};
- ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
}
void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
@@ -95,7 +95,7 @@ void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
.count = 1,
};
- ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
}
void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
@@ -108,7 +108,7 @@ void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
.count = 1,
};
- ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+ vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
}
static inline const char *action_to_str(int action)
@@ -167,7 +167,7 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
pfd = (int32_t *)&irq_set->data;
*pfd = fd;
- if (!ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
+ if (!vbasedev->io_ops->set_irqs(vbasedev, irq_set)) {
return true;
}
@@ -188,22 +188,19 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
struct vfio_irq_info *info)
{
- int ret;
-
memset(info, 0, sizeof(*info));
info->argsz = sizeof(*info);
info->index = index;
- ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
-
- return ret < 0 ? -errno : ret;
+ return vbasedev->io_ops->get_irq_info(vbasedev, info);
}
int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
struct vfio_region_info **info)
{
size_t argsz = sizeof(struct vfio_region_info);
+ int ret;
*info = g_malloc0(argsz);
@@ -211,10 +208,11 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
retry:
(*info)->argsz = argsz;
- if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
+ ret = vbasedev->io_ops->get_region_info(vbasedev, *info);
+ if (ret != 0) {
g_free(*info);
*info = NULL;
- return -errno;
+ return ret;
}
if ((*info)->argsz > argsz) {
@@ -320,11 +318,14 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
vbasedev->fd = fd;
}
+static VFIODeviceIOOps vfio_device_io_ops_ioctl;
+
void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
DeviceState *dev, bool ram_discard)
{
vbasedev->type = type;
vbasedev->ops = ops;
+ vbasedev->io_ops = &vfio_device_io_ops_ioctl;
vbasedev->dev = dev;
vbasedev->fd = -1;
@@ -442,3 +443,54 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
QLIST_REMOVE(vbasedev, global_next);
vbasedev->bcontainer = NULL;
}
+
+/*
+ * Traditional ioctl() based io
+ */
+
+static int vfio_device_io_device_feature(VFIODevice *vbasedev,
+ struct vfio_device_feature *feature)
+{
+ int ret;
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+
+ return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_get_region_info(VFIODevice *vbasedev,
+ struct vfio_region_info *info)
+{
+ int ret;
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
+
+ return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_get_irq_info(VFIODevice *vbasedev,
+ struct vfio_irq_info *info)
+{
+ int ret;
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
+
+ return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
+ struct vfio_irq_set *irqs)
+{
+ int ret;
+
+ ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs);
+
+ return ret < 0 ? -errno : ret;
+}
+
+static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
+ .device_feature = vfio_device_io_device_feature,
+ .get_region_info = vfio_device_io_get_region_info,
+ .get_irq_info = vfio_device_io_get_irq_info,
+ .set_irqs = vfio_device_io_set_irqs,
+};
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index e7ade7d62e..2b93ca55b6 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -794,13 +794,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
+ int ret;
+
if (!vbasedev->dirty_tracking) {
continue;
}
- if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+ ret = vbasedev->io_ops->device_feature(vbasedev, feature);
+
+ if (ret != 0) {
warn_report("%s: Failed to stop DMA logging, err %d (%s)",
- vbasedev->name, -errno, strerror(errno));
+ vbasedev->name, -ret, strerror(-ret));
}
vbasedev->dirty_tracking = false;
}
@@ -901,10 +905,9 @@ static bool vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
continue;
}
- ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+ ret = vbasedev->io_ops->device_feature(vbasedev, feature);
if (ret) {
- ret = -errno;
- error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
+ error_setg_errno(errp, -ret, "%s: Failed to start DMA logging",
vbasedev->name);
goto out;
}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index f65c9463ce..da2ffc9bf3 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -381,7 +381,7 @@ static void vfio_msi_interrupt(void *opaque)
static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
{
g_autofree struct vfio_irq_set *irq_set = NULL;
- int ret = 0, argsz;
+ int argsz;
int32_t *fd;
argsz = sizeof(*irq_set) + sizeof(*fd);
@@ -396,9 +396,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
fd = (int32_t *)&irq_set->data;
*fd = -1;
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
-
- return ret < 0 ? -errno : ret;
+ return vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
}
static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
@@ -455,11 +453,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
fds[i] = fd;
}
- ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+ ret = vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
g_free(irq_set);
- return ret < 0 ? -errno : ret;
+ return ret;
}
static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
--
2.43.0
* [PATCH v3 11/15] vfio: add region info cache
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (9 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 10/15] vfio: add device IO ops vector John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-09 10:09 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 12/15] vfio: add read/write to device IO ops vector John Levon
` (4 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon, John Johnson,
Elena Ufimtseva, Jagannathan Raman
Instead of requesting region information on demand with
VFIO_DEVICE_GET_REGION_INFO, maintain a cache. This will become
necessary for performance with vfio-user, where the call becomes a
message over the control socket and so has higher overhead than the
traditional path.
We will also need it to generalize region accesses: once accesses go
through the ops vector, we can't use ->config_offset for configuration
space accesses, but must look up the region offset (if relevant) each
time.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-device.h | 1 +
hw/vfio/ccw.c | 5 -----
hw/vfio/device.c | 25 +++++++++++++++++++++----
hw/vfio/igd.c | 10 +++++-----
hw/vfio/pci.c | 6 +++---
hw/vfio/region.c | 2 +-
6 files changed, 31 insertions(+), 18 deletions(-)
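
Illustration only, not part of the patch: the ownership change the cache implies. The first lookup of a region fills a per-device slot, later lookups return the same pointer, and callers no longer free the result because the cache releases everything at teardown. All Demo* and demo_* names are hypothetical, demo_query_region() stands in for the expensive ioctl or socket query, and the bounds check is an addition of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

typedef struct DemoRegionInfo {
    unsigned int index;
} DemoRegionInfo;

typedef struct DemoCacheDev {
    unsigned int num_regions;
    DemoRegionInfo **reginfo;  /* one slot per region, NULL until queried */
    unsigned int queries;      /* counts slow-path lookups */
} DemoCacheDev;

void demo_cache_init(DemoCacheDev *dev, unsigned int num_regions)
{
    dev->num_regions = num_regions;
    dev->queries = 0;
    dev->reginfo = calloc(num_regions, sizeof(*dev->reginfo));
    assert(dev->reginfo != NULL);
}

/* Stand-in for the expensive query (ioctl or socket message). */
DemoRegionInfo *demo_query_region(DemoCacheDev *dev, unsigned int index)
{
    DemoRegionInfo *info = calloc(1, sizeof(*info));

    if (info != NULL) {
        info->index = index;
        dev->queries++;
    }
    return info;
}

/* The returned pointer is owned by the cache; callers must not free it. */
int demo_get_region_info(DemoCacheDev *dev, unsigned int index,
                         DemoRegionInfo **info)
{
    *info = NULL;

    if (index >= dev->num_regions) {
        return -EINVAL;
    }

    if (dev->reginfo[index] == NULL) {
        dev->reginfo[index] = demo_query_region(dev, index);
        if (dev->reginfo[index] == NULL) {
            return -ENOMEM;
        }
    }

    *info = dev->reginfo[index];
    return 0;
}

/* Teardown frees every cached entry, then the slot array itself. */
void demo_cache_fini(DemoCacheDev *dev)
{
    unsigned int i;

    for (i = 0; i < dev->num_regions; i++) {
        free(dev->reginfo[i]);
    }
    free(dev->reginfo);
    dev->reginfo = NULL;
}
```

This is why the hunks below drop g_free(info) and g_autofree from the callers: freeing a cached entry would leave a dangling pointer in the slot array.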
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 7e1e81e76b..4fff3dcee3 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -83,6 +83,7 @@ typedef struct VFIODevice {
IOMMUFDBackend *iommufd;
VFIOIOASHwpt *hwpt;
QLIST_ENTRY(VFIODevice) hwpt_next;
+ struct vfio_region_info **reginfo;
} VFIODevice;
struct VFIODeviceOps {
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index ab3fabf991..cea9d6e005 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -504,7 +504,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
vcdev->io_region_offset = info->offset;
vcdev->io_region = g_malloc0(info->size);
- g_free(info);
/* check for the optional async command region */
ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
@@ -517,7 +516,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
}
vcdev->async_cmd_region_offset = info->offset;
vcdev->async_cmd_region = g_malloc0(info->size);
- g_free(info);
}
ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
@@ -530,7 +528,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
}
vcdev->schib_region_offset = info->offset;
vcdev->schib_region = g_malloc(info->size);
- g_free(info);
}
ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
@@ -544,7 +541,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
}
vcdev->crw_region_offset = info->offset;
vcdev->crw_region = g_malloc(info->size);
- g_free(info);
}
return true;
@@ -554,7 +550,6 @@ out_err:
g_free(vcdev->schib_region);
g_free(vcdev->async_cmd_region);
g_free(vcdev->io_region);
- g_free(info);
return false;
}
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 40a196bfb9..77b0675abe 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -202,6 +202,12 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
size_t argsz = sizeof(struct vfio_region_info);
int ret;
+ /* check cache */
+ if (vbasedev->reginfo[index] != NULL) {
+ *info = vbasedev->reginfo[index];
+ return 0;
+ }
+
*info = g_malloc0(argsz);
(*info)->index = index;
@@ -222,6 +228,9 @@ retry:
goto retry;
}
+ /* fill cache */
+ vbasedev->reginfo[index] = *info;
+
return 0;
}
@@ -240,7 +249,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
hdr = vfio_get_region_info_cap(*info, VFIO_REGION_INFO_CAP_TYPE);
if (!hdr) {
- g_free(*info);
continue;
}
@@ -252,8 +260,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
if (cap_type->type == type && cap_type->subtype == subtype) {
return 0;
}
-
- g_free(*info);
}
*info = NULL;
@@ -262,7 +268,7 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
{
- g_autofree struct vfio_region_info *info = NULL;
+ struct vfio_region_info *info = NULL;
bool ret = false;
if (!vfio_device_get_region_info(vbasedev, region, &info)) {
@@ -435,10 +441,21 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+
+ vbasedev->reginfo = g_new0(struct vfio_region_info *,
+ vbasedev->num_regions);
}
void vfio_device_unprepare(VFIODevice *vbasedev)
{
+ int i;
+
+ for (i = 0; i < vbasedev->num_regions; i++) {
+ g_free(vbasedev->reginfo[i]);
+ }
+ g_free(vbasedev->reginfo);
+ vbasedev->reginfo = NULL;
+
QLIST_REMOVE(vbasedev, container_next);
QLIST_REMOVE(vbasedev, global_next);
vbasedev->bcontainer = NULL;
diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index 3ee1a73b57..e7952d15a0 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -349,8 +349,8 @@ static int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev,
static bool vfio_pci_igd_setup_lpc_bridge(VFIOPCIDevice *vdev, Error **errp)
{
- g_autofree struct vfio_region_info *host = NULL;
- g_autofree struct vfio_region_info *lpc = NULL;
+ struct vfio_region_info *host = NULL;
+ struct vfio_region_info *lpc = NULL;
PCIDevice *lpc_bridge;
int ret;
@@ -510,7 +510,7 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
static bool vfio_pci_igd_config_quirk(VFIOPCIDevice *vdev, Error **errp)
{
- g_autofree struct vfio_region_info *opregion = NULL;
+ struct vfio_region_info *opregion = NULL;
int ret, gen;
uint64_t gms_size = 0;
uint64_t *bdsm_size;
@@ -551,7 +551,7 @@ static bool vfio_pci_igd_config_quirk(VFIOPCIDevice *vdev, Error **errp)
* - OpRegion
* - Same LPC bridge and Host bridge VID/DID/SVID/SSID as host
*/
- g_autofree struct vfio_region_info *rom = NULL;
+ struct vfio_region_info *rom = NULL;
legacy_mode_enabled = true;
info_report("IGD legacy mode enabled, "
@@ -681,7 +681,7 @@ error:
*/
static bool vfio_pci_kvmgt_config_quirk(VFIOPCIDevice *vdev, Error **errp)
{
- g_autofree struct vfio_region_info *opregion = NULL;
+ struct vfio_region_info *opregion = NULL;
int gen;
if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index da2ffc9bf3..9136cf52c8 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -883,8 +883,8 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
{
- g_autofree struct vfio_region_info *reg_info = NULL;
VFIODevice *vbasedev = &vdev->vbasedev;
+ struct vfio_region_info *reg_info = NULL;
uint64_t size;
off_t off = 0;
ssize_t bytes;
@@ -2710,7 +2710,7 @@ static VFIODeviceOps vfio_pci_ops = {
bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
{
VFIODevice *vbasedev = &vdev->vbasedev;
- g_autofree struct vfio_region_info *reg_info = NULL;
+ struct vfio_region_info *reg_info = NULL;
int ret;
ret = vfio_device_get_region_info(vbasedev, VFIO_PCI_VGA_REGION_INDEX, &reg_info);
@@ -2775,7 +2775,7 @@ bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
{
VFIODevice *vbasedev = &vdev->vbasedev;
- g_autofree struct vfio_region_info *reg_info = NULL;
+ struct vfio_region_info *reg_info = NULL;
struct vfio_irq_info irq_info;
int i, ret = -1;
diff --git a/hw/vfio/region.c b/hw/vfio/region.c
index 04bf9eb098..ef2630cac3 100644
--- a/hw/vfio/region.c
+++ b/hw/vfio/region.c
@@ -182,7 +182,7 @@ static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
int index, const char *name)
{
- g_autofree struct vfio_region_info *info = NULL;
+ struct vfio_region_info *info = NULL;
int ret;
ret = vfio_device_get_region_info(vbasedev, index, &info);
--
2.43.0
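
The ownership change made by this patch can be sketched in isolation: once the lookup helper caches its result, the cache owns the allocation, so callers must stop freeing it (which is why the g_free() and g_autofree uses are dropped above). The following is a minimal standalone sketch using plain malloc rather than GLib. struct region_info, struct device, query_backend() and the device_*() functions are hypothetical stand-ins, not QEMU APIs, and the bounds check on the index is an addition of this sketch rather than something the patch does.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct vfio_region_info. */
struct region_info {
    unsigned index;
    unsigned long offset;
};

/* Hypothetical stand-in for VFIODevice. */
struct device {
    unsigned num_regions;
    struct region_info **reginfo;   /* one cached pointer per region */
    unsigned backend_queries;       /* counts simulated backend calls */
};

/* Simulated backend query (the code in the patch issues an ioctl). */
static struct region_info *query_backend(struct device *dev, unsigned index)
{
    struct region_info *info = calloc(1, sizeof(*info));

    if (info) {
        info->index = index;
        info->offset = (unsigned long)index << 20;
        dev->backend_queries++;
    }
    return info;
}

/* Allocate an empty cache with one slot per region. */
static int device_prepare(struct device *dev, unsigned num_regions)
{
    dev->num_regions = num_regions;
    dev->backend_queries = 0;
    dev->reginfo = calloc(num_regions, sizeof(*dev->reginfo));
    return dev->reginfo ? 0 : -ENOMEM;
}

/* On success *info is owned by the cache; callers must not free it. */
static int device_get_region_info(struct device *dev, unsigned index,
                                  struct region_info **info)
{
    if (index >= dev->num_regions) {
        return -EINVAL;
    }
    if (dev->reginfo[index] == NULL) {
        dev->reginfo[index] = query_backend(dev, index);
        if (dev->reginfo[index] == NULL) {
            return -ENOMEM;
        }
    }
    *info = dev->reginfo[index];
    return 0;
}

/* The cache is the single owner, so it frees every entry exactly once. */
static void device_unprepare(struct device *dev)
{
    for (unsigned i = 0; i < dev->num_regions; i++) {
        free(dev->reginfo[i]);
    }
    free(dev->reginfo);
    dev->reginfo = NULL;
    dev->num_regions = 0;
}
```

With a single owner, a repeated lookup of the same index returns the same pointer and costs no further backend query.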
* [PATCH v3 12/15] vfio: add read/write to device IO ops vector
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (10 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 11/15] vfio: add region info cache John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-09 10:14 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 13/15] vfio: add vfio-pci-base class John Levon
` (3 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon
Now that we have the region info cache, add ->region_read/write device I/O
operations and use them in place of explicit pread()/pwrite() system calls.
---
include/hw/vfio/vfio-device.h | 18 ++++++++++++++++++
hw/vfio/device.c | 34 ++++++++++++++++++++++++++++++++++
hw/vfio/pci.c | 28 ++++++++++++++--------------
hw/vfio/region.c | 17 +++++++++++------
4 files changed, 77 insertions(+), 20 deletions(-)
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 4fff3dcee3..8bcb3c19f6 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -188,6 +188,24 @@ struct VFIODeviceIOOps {
* Configure IRQs as defined by @irqs.
*/
int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
+
+ /**
+ * @region_read
+ *
+ * Read @size bytes from the region @nr at offset @off into the buffer
+ * @data.
+ */
+ int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
+ void *data);
+
+ /**
+ * @region_write
+ *
+ * Write @size bytes to the region @nr at offset @off from the buffer
+ * @data.
+ */
+ int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
+ void *data);
};
void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 77b0675abe..0b2cd90d64 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -505,9 +505,43 @@ static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
return ret < 0 ? -errno : ret;
}
+static int vfio_device_io_region_read(VFIODevice *vbasedev, uint8_t index,
+ off_t off, uint32_t size, void *data)
+{
+ struct vfio_region_info *info;
+ int ret;
+
+ ret = vfio_device_get_region_info(vbasedev, index, &info);
+ if (ret != 0) {
+ return ret;
+ }
+
+ ret = pread(vbasedev->fd, data, size, info->offset + off);
+
+ return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_region_write(VFIODevice *vbasedev, uint8_t index,
+ off_t off, uint32_t size, void *data)
+{
+ struct vfio_region_info *info;
+ int ret;
+
+ ret = vfio_device_get_region_info(vbasedev, index, &info);
+ if (ret != 0) {
+ return ret;
+ }
+
+ ret = pwrite(vbasedev->fd, data, size, info->offset + off);
+
+ return ret < 0 ? -errno : ret;
+}
+
static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
.device_feature = vfio_device_io_device_feature,
.get_region_info = vfio_device_io_get_region_info,
.get_irq_info = vfio_device_io_get_irq_info,
.set_irqs = vfio_device_io_set_irqs,
+ .region_read = vfio_device_io_region_read,
+ .region_write = vfio_device_io_region_write,
};
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 9136cf52c8..1236de315d 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -918,18 +918,22 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
memset(vdev->rom, 0xff, size);
while (size) {
- bytes = pread(vbasedev->fd, vdev->rom + off,
- size, vdev->rom_offset + off);
+ bytes = vbasedev->io_ops->region_read(vbasedev,
+ VFIO_PCI_ROM_REGION_INDEX,
+ off, size, vdev->rom + off);
+
if (bytes == 0) {
break;
} else if (bytes > 0) {
off += bytes;
size -= bytes;
} else {
- if (errno == EINTR || errno == EAGAIN) {
+ if (bytes == -EINTR || bytes == -EAGAIN) {
continue;
}
- error_report("vfio: Error reading device ROM: %m");
+ error_report("vfio: Error reading device ROM: %s",
+ strreaderror(bytes));
+
break;
}
}
@@ -969,22 +973,18 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
static int vfio_pci_config_space_read(VFIOPCIDevice *vdev, off_t offset,
uint32_t size, void *data)
{
- ssize_t ret;
-
- ret = pread(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
-
- return ret < 0 ? -errno : (int)ret;
+ return vdev->vbasedev.io_ops->region_read(&vdev->vbasedev,
+ VFIO_PCI_CONFIG_REGION_INDEX,
+ offset, size, data);
}
/* "Raw" write of underlying config space. */
static int vfio_pci_config_space_write(VFIOPCIDevice *vdev, off_t offset,
uint32_t size, void *data)
{
- ssize_t ret;
-
- ret = pwrite(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
-
- return ret < 0 ? -errno : (int)ret;
+ return vdev->vbasedev.io_ops->region_write(&vdev->vbasedev,
+ VFIO_PCI_CONFIG_REGION_INDEX,
+ offset, size, data);
}
static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
diff --git a/hw/vfio/region.c b/hw/vfio/region.c
index ef2630cac3..34752c3f65 100644
--- a/hw/vfio/region.c
+++ b/hw/vfio/region.c
@@ -45,6 +45,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
uint32_t dword;
uint64_t qword;
} buf;
+ int ret;
switch (size) {
case 1:
@@ -64,11 +65,13 @@ void vfio_region_write(void *opaque, hwaddr addr,
break;
}
- if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+ ret = vbasedev->io_ops->region_write(vbasedev, region->nr,
+ addr, size, &buf);
+ if (ret != size) {
error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
- ",%d) failed: %m",
+ ",%d) failed: %s",
__func__, vbasedev->name, region->nr,
- addr, data, size);
+ addr, data, size, strwriteerror(ret));
}
trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
@@ -96,11 +99,13 @@ uint64_t vfio_region_read(void *opaque,
uint64_t qword;
} buf;
uint64_t data = 0;
+ int ret;
- if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
- error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+ ret = vbasedev->io_ops->region_read(vbasedev, region->nr, addr, size, &buf);
+ if (ret != size) {
+ error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s",
__func__, vbasedev->name, region->nr,
- addr, size);
+ addr, size, strreaderror(ret));
return (uint64_t)-1;
}
switch (size) {
--
2.43.0
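
The return convention used by the new ->region_read/->region_write callbacks (a byte count on success, -errno on failure, with a short transfer reported separately by the caller) can be sketched as follows. This is a minimal POSIX sketch, not the QEMU code: region_read() and read_error_string() are hypothetical names, and the string helper restates the strreaderror() macro from patch 06 of this series as a function.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read up to @size bytes at @region_offset + @off from @fd into @data.
 * Returns the number of bytes read, or -errno on failure.
 */
static int region_read(int fd, off_t region_offset, off_t off,
                       uint32_t size, void *data)
{
    ssize_t ret;

    assert(data != NULL);

    ret = pread(fd, data, size, region_offset + off);

    /* errno is only meaningful when pread() reported failure. */
    return ret < 0 ? -errno : (int)ret;
}

/* Describe a failed or short read under the convention above. */
static const char *read_error_string(int ret)
{
    return ret < 0 ? strerror(-ret) : "short read";
}
```

A caller compares the result against the requested size and, on mismatch, passes it to the string helper; that one code path covers both a negative errno and a short read.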
* [PATCH v3 13/15] vfio: add vfio-pci-base class
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (11 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 12/15] vfio: add read/write to device IO ops vector John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-09 10:14 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 14/15] vfio/container: pass listener_begin/commit callbacks John Levon
` (2 subsequent siblings)
15 siblings, 1 reply; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon, John Johnson,
Elena Ufimtseva, Jagannathan Raman
Split out parts of TYPE_VFIO_PCI into a base TYPE_VFIO_PCI_BASE. We have
not yet introduced another subclass, so all the properties remain in
TYPE_VFIO_PCI.
Note that currently there is no need for additional data for
TYPE_VFIO_PCI, so it shares the same C struct type as
TYPE_VFIO_PCI_BASE, VFIOPCIDevice.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
hw/vfio/pci.h | 10 +++++++-
hw/vfio/device.c | 2 +-
hw/vfio/pci.c | 62 +++++++++++++++++++++++++++++++-----------------
3 files changed, 50 insertions(+), 24 deletions(-)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f835b1dbc2..5ce0fb916f 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -118,8 +118,16 @@ typedef struct VFIOMSIXInfo {
bool noresize;
} VFIOMSIXInfo;
+/*
+ * TYPE_VFIO_PCI_BASE is an abstract type used to share code
+ * between VFIO implementations that use a kernel driver
+ * and those that use user sockets.
+ */
+#define TYPE_VFIO_PCI_BASE "vfio-pci-base"
+OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI_BASE)
+
#define TYPE_VFIO_PCI "vfio-pci"
-OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI)
+/* TYPE_VFIO_PCI shares struct VFIOPCIDevice. */
struct VFIOPCIDevice {
PCIDevice pdev;
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 0b2cd90d64..9fba2c7272 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -392,7 +392,7 @@ bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
VFIODevice *vfio_get_vfio_device(Object *obj)
{
if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
- return &VFIO_PCI(obj)->vbasedev;
+ return &VFIO_PCI_BASE(obj)->vbasedev;
} else {
return NULL;
}
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1236de315d..a1bfdfe375 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -241,7 +241,7 @@ static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
static void vfio_intx_routing_notifier(PCIDevice *pdev)
{
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
PCIINTxRoute route;
if (vdev->interrupt != VFIO_INT_INTx) {
@@ -514,7 +514,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
MSIMessage *msg, IOHandler *handler)
{
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
VFIOMSIVector *vector;
int ret;
bool resizing = !!(vdev->nr_vectors < nr + 1);
@@ -620,7 +620,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
{
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
VFIOMSIVector *vector = &vdev->msi_vectors[nr];
trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
@@ -1196,7 +1196,7 @@ static const MemoryRegionOps vfio_vga_ops = {
*/
static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
{
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
VFIORegion *region = &vdev->bars[bar].region;
MemoryRegion *mmap_mr, *region_mr, *base_mr;
PCIIORegion *r;
@@ -1242,7 +1242,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
*/
uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
{
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
@@ -1276,7 +1276,7 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
void vfio_pci_write_config(PCIDevice *pdev,
uint32_t addr, uint32_t val, int len)
{
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
uint32_t val_le = cpu_to_le32(val);
int ret;
@@ -3129,7 +3129,7 @@ static bool vfio_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
static void vfio_realize(PCIDevice *pdev, Error **errp)
{
ERRP_GUARD();
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
int i, ret;
char uuid[UUID_STR_LEN];
@@ -3300,7 +3300,7 @@ error:
static void vfio_instance_finalize(Object *obj)
{
- VFIOPCIDevice *vdev = VFIO_PCI(obj);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
vfio_display_finalize(vdev);
vfio_bars_finalize(vdev);
@@ -3318,7 +3318,7 @@ static void vfio_instance_finalize(Object *obj)
static void vfio_exitfn(PCIDevice *pdev)
{
- VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
VFIODevice *vbasedev = &vdev->vbasedev;
vfio_unregister_req_notifier(vdev);
@@ -3342,7 +3342,7 @@ static void vfio_exitfn(PCIDevice *pdev)
static void vfio_pci_reset(DeviceState *dev)
{
- VFIOPCIDevice *vdev = VFIO_PCI(dev);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
trace_vfio_pci_reset(vdev->vbasedev.name);
@@ -3382,7 +3382,7 @@ post_reset:
static void vfio_instance_init(Object *obj)
{
PCIDevice *pci_dev = PCI_DEVICE(obj);
- VFIOPCIDevice *vdev = VFIO_PCI(obj);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
VFIODevice *vbasedev = &vdev->vbasedev;
device_add_bootindex_property(obj, &vdev->bootindex,
@@ -3403,6 +3403,31 @@ static void vfio_instance_init(Object *obj)
pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
}
+static void vfio_pci_base_dev_class_init(ObjectClass *klass, const void *data)
+{
+ DeviceClass *dc = DEVICE_CLASS(klass);
+ PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
+
+ dc->desc = "VFIO PCI base device";
+ set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+ pdc->exit = vfio_exitfn;
+ pdc->config_read = vfio_pci_read_config;
+ pdc->config_write = vfio_pci_write_config;
+}
+
+static const TypeInfo vfio_pci_base_dev_info = {
+ .name = TYPE_VFIO_PCI_BASE,
+ .parent = TYPE_PCI_DEVICE,
+ .instance_size = 0,
+ .abstract = true,
+ .class_init = vfio_pci_base_dev_class_init,
+ .interfaces = (const InterfaceInfo[]) {
+ { INTERFACE_PCIE_DEVICE },
+ { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+ { }
+ },
+};
+
static PropertyInfo vfio_pci_migration_multifd_transfer_prop;
static const Property vfio_pci_dev_properties[] = {
@@ -3473,7 +3498,8 @@ static const Property vfio_pci_dev_properties[] = {
#ifdef CONFIG_IOMMUFD
static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
{
- vfio_device_set_fd(&VFIO_PCI(obj)->vbasedev, str, errp);
+ VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
+ vfio_device_set_fd(&vdev->vbasedev, str, errp);
}
#endif
@@ -3488,11 +3514,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
#endif
dc->desc = "VFIO-based PCI device assignment";
- set_bit(DEVICE_CATEGORY_MISC, dc->categories);
pdc->realize = vfio_realize;
- pdc->exit = vfio_exitfn;
- pdc->config_read = vfio_pci_read_config;
- pdc->config_write = vfio_pci_write_config;
object_class_property_set_description(klass, /* 1.3 */
"host",
@@ -3617,16 +3639,11 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
static const TypeInfo vfio_pci_dev_info = {
.name = TYPE_VFIO_PCI,
- .parent = TYPE_PCI_DEVICE,
+ .parent = TYPE_VFIO_PCI_BASE,
.instance_size = sizeof(VFIOPCIDevice),
.class_init = vfio_pci_dev_class_init,
.instance_init = vfio_instance_init,
.instance_finalize = vfio_instance_finalize,
- .interfaces = (const InterfaceInfo[]) {
- { INTERFACE_PCIE_DEVICE },
- { INTERFACE_CONVENTIONAL_PCI_DEVICE },
- { }
- },
};
static const Property vfio_pci_dev_nohotplug_properties[] = {
@@ -3673,6 +3690,7 @@ static void register_vfio_pci_dev_type(void)
vfio_pci_migration_multifd_transfer_prop = qdev_prop_on_off_auto;
vfio_pci_migration_multifd_transfer_prop.realized_set_allowed = true;
+ type_register_static(&vfio_pci_base_dev_info);
type_register_static(&vfio_pci_dev_info);
type_register_static(&vfio_pci_nohotplug_dev_info);
}
--
2.43.0
* [PATCH v3 14/15] vfio/container: pass listener_begin/commit callbacks
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (12 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 13/15] vfio: add vfio-pci-base class John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-07 15:20 ` [PATCH v3 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
2025-05-09 10:24 ` [PATCH v3 00/15] vfio: preparation for vfio-user Cédric Le Goater
15 siblings, 0 replies; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon
The vfio-user container will later need to hook into these callbacks;
set up vfio to use them, and optionally pass them through to the
container.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-container-base.h | 2 ++
hw/vfio/listener.c | 28 +++++++++++++++++++++++++++
2 files changed, 30 insertions(+)
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 59f07d26e8..3d392b0fd8 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -117,6 +117,8 @@ struct VFIOIOMMUClass {
/* basic feature */
bool (*setup)(VFIOContainerBase *bcontainer, Error **errp);
+ void (*listener_begin)(VFIOContainerBase *bcontainer);
+ void (*listener_commit)(VFIOContainerBase *bcontainer);
int (*dma_map)(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
void *vaddr, bool readonly);
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index 2b93ca55b6..bfacb3d8d9 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -411,6 +411,32 @@ static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
return true;
}
+static void vfio_listener_begin(MemoryListener *listener)
+{
+ VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+ listener);
+ void (*listener_begin)(VFIOContainerBase *bcontainer);
+
+ listener_begin = VFIO_IOMMU_GET_CLASS(bcontainer)->listener_begin;
+
+ if (listener_begin) {
+ listener_begin(bcontainer);
+ }
+}
+
+static void vfio_listener_commit(MemoryListener *listener)
+{
+ VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+ listener);
+ void (*listener_commit)(VFIOContainerBase *bcontainer);
+
+ listener_commit = VFIO_IOMMU_GET_CLASS(bcontainer)->listener_commit;
+
+ if (listener_commit) {
+ listener_commit(bcontainer);
+ }
+}
+
static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
{
/*
@@ -1161,6 +1187,8 @@ static void vfio_listener_log_sync(MemoryListener *listener,
static const MemoryListener vfio_memory_listener = {
.name = "vfio",
+ .begin = vfio_listener_begin,
+ .commit = vfio_listener_commit,
.region_add = vfio_listener_region_add,
.region_del = vfio_listener_region_del,
.log_global_start = vfio_listener_log_global_start,
--
2.43.0
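
The optional-hook pattern this patch relies on can be sketched on its own: the listener looks up each callback in the class and calls it only if the backend provides one. This is a minimal sketch under stated assumptions; struct container_class, struct container, the dispatch_*() functions and the counting hooks are hypothetical stand-ins for VFIOIOMMUClass, VFIOContainerBase and the listener callbacks.

```c
#include <stddef.h>

struct container;

/* Hypothetical stand-in for VFIOIOMMUClass: both hooks are optional. */
struct container_class {
    void (*listener_begin)(struct container *c);
    void (*listener_commit)(struct container *c);
};

/* Hypothetical stand-in for VFIOContainerBase. */
struct container {
    const struct container_class *klass;
    int begin_calls;
    int commit_calls;
};

/* Invoke the begin hook if the backend installed one. */
static void dispatch_begin(struct container *c)
{
    if (c->klass->listener_begin) {
        c->klass->listener_begin(c);
    }
}

/* Invoke the commit hook if the backend installed one. */
static void dispatch_commit(struct container *c)
{
    if (c->klass->listener_commit) {
        c->klass->listener_commit(c);
    }
}

/* Example hooks that only count how often they were called. */
static void count_begin(struct container *c)
{
    c->begin_calls++;
}

static void count_commit(struct container *c)
{
    c->commit_calls++;
}

/* One backend that provides both hooks, and one that provides none. */
static const struct container_class hooked_class = {
    .listener_begin = count_begin,
    .listener_commit = count_commit,
};

static const struct container_class plain_class = {
    .listener_begin = NULL,
    .listener_commit = NULL,
};
```

Because each dispatcher checks its own pointer, a backend can implement either hook, both, or neither without changes to the listener.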
* [PATCH v3 15/15] vfio/container: pass MemoryRegion to DMA operations
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (13 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 14/15] vfio/container: pass listener_begin/commit callbacks John Levon
@ 2025-05-07 15:20 ` John Levon
2025-05-09 10:22 ` Cédric Le Goater
2025-05-09 10:24 ` [PATCH v3 00/15] vfio: preparation for vfio-user Cédric Le Goater
15 siblings, 1 reply; 27+ messages in thread
From: John Levon @ 2025-05-07 15:20 UTC (permalink / raw)
To: qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Cédric Le Goater, Thomas Huth, Tony Krowiak,
Michael S. Tsirkin, Paolo Bonzini, Eric Farman, David Hildenbrand,
qemu-s390x, Jason Herne, John Levon, John Johnson,
Jagannathan Raman, Elena Ufimtseva
Pass through the MemoryRegion to DMA operation handlers of vfio
containers. The vfio-user container will need this later.
Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
include/hw/vfio/vfio-container-base.h | 4 ++--
include/system/memory.h | 4 +++-
hw/vfio/container-base.c | 4 ++--
hw/vfio/container.c | 3 ++-
hw/vfio/iommufd.c | 3 ++-
hw/vfio/listener.c | 18 +++++++++++-------
hw/virtio/vhost-vdpa.c | 2 +-
system/memory.c | 7 ++++++-
8 files changed, 29 insertions(+), 16 deletions(-)
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 3d392b0fd8..359b483963 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -78,7 +78,7 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
int vfio_container_dma_map(VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- void *vaddr, bool readonly);
+ void *vaddr, bool readonly, MemoryRegion *mrp);
int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
IOMMUTLBEntry *iotlb, bool unmap_all);
@@ -121,7 +121,7 @@ struct VFIOIOMMUClass {
void (*listener_commit)(VFIOContainerBase *bcontainer);
int (*dma_map)(const VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- void *vaddr, bool readonly);
+ void *vaddr, bool readonly, MemoryRegion *mrp);
/**
* @dma_unmap
*
diff --git a/include/system/memory.h b/include/system/memory.h
index fbbf4cf911..eca1d9f32e 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -746,13 +746,15 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
* @read_only: indicates if writes are allowed
* @mr_has_discard_manager: indicates memory is controlled by a
* RamDiscardManager
+ * @mrp: if non-NULL, fill in with MemoryRegion
* @errp: pointer to Error*, to store an error if it happens.
*
* Return: true on success, else false setting @errp with error.
*/
bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
ram_addr_t *ram_addr, bool *read_only,
- bool *mr_has_discard_manager, Error **errp);
+ bool *mr_has_discard_manager, MemoryRegion **mrp,
+ Error **errp);
typedef struct CoalescedMemoryRange CoalescedMemoryRange;
typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 1c6ca94b60..a677bb6694 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -75,12 +75,12 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
int vfio_container_dma_map(VFIOContainerBase *bcontainer,
hwaddr iova, ram_addr_t size,
- void *vaddr, bool readonly)
+ void *vaddr, bool readonly, MemoryRegion *mrp)
{
VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
g_assert(vioc->dma_map);
- return vioc->dma_map(bcontainer, iova, size, vaddr, readonly);
+ return vioc->dma_map(bcontainer, iova, size, vaddr, readonly, mrp);
}
int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index a9f0dbaec4..98d6b9f90c 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -207,7 +207,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
}
static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
- ram_addr_t size, void *vaddr, bool readonly)
+ ram_addr_t size, void *vaddr, bool readonly,
+ MemoryRegion *mrp)
{
const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
bcontainer);
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index af1c7ab10a..a2518c4a5d 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -34,7 +34,8 @@
TYPE_HOST_IOMMU_DEVICE_IOMMUFD "-vfio"
static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
- ram_addr_t size, void *vaddr, bool readonly)
+ ram_addr_t size, void *vaddr, bool readonly,
+ MemoryRegion *mrp)
{
const VFIOIOMMUFDContainer *container =
container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index bfacb3d8d9..71f336a31c 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -93,12 +93,12 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
/* Called with rcu_read_lock held. */
static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
ram_addr_t *ram_addr, bool *read_only,
- Error **errp)
+ MemoryRegion **mrp, Error **errp)
{
bool ret, mr_has_discard_manager;
ret = memory_get_xlat_addr(iotlb, vaddr, ram_addr, read_only,
- &mr_has_discard_manager, errp);
+ &mr_has_discard_manager, mrp, errp);
if (ret && mr_has_discard_manager) {
/*
* Malicious VMs might trigger discarding of IOMMU-mapped memory. The
@@ -126,6 +126,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
VFIOContainerBase *bcontainer = giommu->bcontainer;
hwaddr iova = iotlb->iova + giommu->iommu_offset;
+ MemoryRegion *mrp;
void *vaddr;
int ret;
Error *local_err = NULL;
@@ -150,7 +151,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
bool read_only;
- if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &local_err)) {
+ if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &mrp,
+ &local_err)) {
error_report_err(local_err);
goto out;
}
@@ -163,7 +165,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
*/
ret = vfio_container_dma_map(bcontainer, iova,
iotlb->addr_mask + 1, vaddr,
- read_only);
+ read_only, mrp);
if (ret) {
error_report("vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
"0x%"HWADDR_PRIx", %p) = %d (%s)",
@@ -233,7 +235,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
vaddr = memory_region_get_ram_ptr(section->mr) + start;
ret = vfio_container_dma_map(bcontainer, iova, next - start,
- vaddr, section->readonly);
+ vaddr, section->readonly, section->mr);
if (ret) {
/* Rollback */
vfio_ram_discard_notify_discard(rdl, section);
@@ -557,7 +559,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
}
ret = vfio_container_dma_map(bcontainer, iova, int128_get64(llsize),
- vaddr, section->readonly);
+ vaddr, section->readonly, section->mr);
if (ret) {
error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
"0x%"HWADDR_PRIx", %p) = %d (%s)",
@@ -1021,7 +1023,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
}
rcu_read_lock();
- if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
+ if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, NULL,
+ &local_err)) {
+ error_report_err(local_err);
goto out_unlock;
}
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 1ab2c11fa8..4c4b3d1371 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -228,7 +228,7 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
bool read_only;
- if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL,
+ if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL, NULL,
&local_err)) {
error_report_err(local_err);
return;
diff --git a/system/memory.c b/system/memory.c
index 71434e7ad0..79671943ce 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2176,7 +2176,8 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
/* Called with rcu_read_lock held. */
bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
ram_addr_t *ram_addr, bool *read_only,
- bool *mr_has_discard_manager, Error **errp)
+ bool *mr_has_discard_manager, MemoryRegion **mrp,
+ Error **errp)
{
MemoryRegion *mr;
hwaddr xlat;
@@ -2241,6 +2242,10 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
*read_only = !writable || mr->readonly;
}
+ if (mrp != NULL) {
+ *mrp = mr;
+ }
+
return true;
}
--
2.43.0
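
The optional out-parameter convention that this patch extends to @mrp can be sketched independently of QEMU: each result pointer is written only when the caller passes a non-NULL address, so existing callers can pass NULL for results they do not need. In this minimal sketch, struct region and lookup() are hypothetical stand-ins for MemoryRegion and memory_get_xlat_addr(), and the identity translation is purely illustrative.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for MemoryRegion. */
struct region {
    bool readonly;
};

/*
 * Translate @addr within @mr. Every out-parameter is optional: a caller
 * passes NULL for any result it does not need.
 * Returns true on success, false if there is no region to translate in.
 */
static bool lookup(struct region *mr, unsigned long addr,
                   unsigned long *xlat, bool *read_only,
                   struct region **mrp)
{
    if (mr == NULL) {
        return false;
    }
    if (xlat != NULL) {
        *xlat = addr;   /* identity mapping, for illustration only */
    }
    if (read_only != NULL) {
        *read_only = mr->readonly;
    }
    if (mrp != NULL) {
        *mrp = mr;
    }
    return true;
}
```

A caller that needs the region passes the address of a local pointer; a caller that does not, such as the vhost-vdpa notifier above, passes NULL.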
* Re: [PATCH v3 06/15] vfio: add strread/writeerror()
2025-05-07 15:20 ` [PATCH v3 06/15] vfio: add strread/writeerror() John Levon
@ 2025-05-09 10:05 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:05 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne
On 5/7/25 17:20, John Levon wrote:
> Add simple helpers to correctly report failures from read/write routines
> using the return -errno style.
>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
> include/hw/vfio/vfio-device.h | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index a7eaaa31e7..4a32202943 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -115,6 +115,20 @@ struct VFIODeviceOps {
> int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
> };
>
> +/*
> + * Given a return value of either a short number of bytes read or -errno,
> + * construct a meaningful error message.
> + */
> +#define strreaderror(ret) \
> + (ret < 0 ? strerror(-ret) : "short read")
> +
> +/*
> + * Given a return value of either a short number of bytes written or -errno,
> + * construct a meaningful error message.
> + */
> +#define strwriteerror(ret) \
> + (ret < 0 ? strerror(-ret) : "short write")
> +
> void vfio_device_irq_disable(VFIODevice *vbasedev, int index);
> void vfio_device_irq_unmask(VFIODevice *vbasedev, int index);
> void vfio_device_irq_mask(VFIODevice *vbasedev, int index);
I am not thrilled with either the naming or the location (why not use
hw/vfio/vfio-helpers.h instead?), but this is minor and we can refine it
later.
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
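For readers following the thread, here is a minimal sketch of how a caller
might use such a helper with the "return -errno" style. The demo_* names and
the in-memory backing buffer are hypothetical and exist only for this
illustration; the macro argument is also parenthesized here, which is a
defensive variation and not what the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Same idea as the macro in the patch, with the argument parenthesized. */
#define strreaderror(ret) \
    ((ret) < 0 ? strerror(-(ret)) : "short read")

/* Hypothetical in-memory "device", used only for this illustration. */
static const unsigned char demo_backing[8] = { 0, 1, 2, 3, 4, 5, 6, 7 };

/*
 * Hypothetical read routine in the "return -errno" style: returns the
 * number of bytes copied (possibly fewer than requested) or -errno.
 */
static int demo_read(size_t off, void *buf, size_t len)
{
    size_t avail;

    assert(buf != NULL);

    if (off > sizeof(demo_backing)) {
        return -EINVAL;
    }

    avail = sizeof(demo_backing) - off;
    if (len > avail) {
        len = avail;
    }

    memcpy(buf, demo_backing + off, len);
    return (int)len;
}

/* Returns NULL on a complete read, else a message for the error report. */
static const char *demo_read_check(size_t off, void *buf, size_t len)
{
    int ret = demo_read(off, buf, len);

    if (ret < 0 || (size_t)ret != len) {
        return strreaderror(ret);
    }

    return NULL;
}
```

A caller would pass the returned string straight to its error report, so a
short transfer and a failed transfer both produce a readable message without
touching errno.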
* Re: [PATCH v3 08/15] vfio: add unmap_all flag to DMA unmap callback
2025-05-07 15:20 ` [PATCH v3 08/15] vfio: add unmap_all flag to DMA unmap callback John Levon
@ 2025-05-09 10:07 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:07 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne
On 5/7/25 17:20, John Levon wrote:
> We'll use this parameter shortly; this just adds the plumbing.
>
> Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/hw/vfio/vfio-container-base.h | 15 +++++++++++++--
> hw/vfio/container-base.c | 4 ++--
> hw/vfio/container.c | 8 ++++++--
> hw/vfio/iommufd.c | 6 +++++-
> hw/vfio/listener.c | 8 ++++----
> 5 files changed, 30 insertions(+), 11 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 5527e02722..59f07d26e8 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -81,7 +81,7 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
> void *vaddr, bool readonly);
> int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - IOMMUTLBEntry *iotlb);
> + IOMMUTLBEntry *iotlb, bool unmap_all);
> bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
> MemoryRegionSection *section,
> Error **errp);
> @@ -120,9 +120,20 @@ struct VFIOIOMMUClass {
> int (*dma_map)(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> void *vaddr, bool readonly);
> + /**
> + * @dma_unmap
> + *
> + * Unmap an address range from the container.
> + *
> + * @bcontainer: #VFIOContainerBase to use for unmap
> + * @iova: start address to unmap
> + * @size: size of the range to unmap
> + * @iotlb: The IOMMU TLB mapping entry (or NULL)
> + * @unmap_all: if set, unmap the entire address space
> + */
> int (*dma_unmap)(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - IOMMUTLBEntry *iotlb);
> + IOMMUTLBEntry *iotlb, bool unmap_all);
> bool (*attach_device)(const char *name, VFIODevice *vbasedev,
> AddressSpace *as, Error **errp);
> void (*detach_device)(VFIODevice *vbasedev);
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 09340fd97a..3ff473a45c 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -85,12 +85,12 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>
> int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - IOMMUTLBEntry *iotlb)
> + IOMMUTLBEntry *iotlb, bool unmap_all)
> {
> VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>
> g_assert(vioc->dma_unmap);
> - return vioc->dma_unmap(bcontainer, iova, size, iotlb);
> + return vioc->dma_unmap(bcontainer, iova, size, iotlb, unmap_all);
> }
>
> bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index cf23aa799f..d5f4e66f1c 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -124,7 +124,7 @@ unmap_exit:
> */
> static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - IOMMUTLBEntry *iotlb)
> + IOMMUTLBEntry *iotlb, bool unmap_all)
> {
> const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> bcontainer);
> @@ -138,6 +138,10 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> int ret;
> Error *local_err = NULL;
>
> + if (unmap_all) {
> + return -ENOTSUP;
> + }
> +
> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
> bcontainer->dirty_pages_supported) {
> @@ -205,7 +209,7 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> */
> if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
> (errno == EBUSY &&
> - vfio_legacy_dma_unmap(bcontainer, iova, size, NULL) == 0 &&
> + vfio_legacy_dma_unmap(bcontainer, iova, size, NULL, false) == 0 &&
> ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
> return 0;
> }
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 62ecb758f1..6b2764c044 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -46,11 +46,15 @@ static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>
> static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - IOMMUTLBEntry *iotlb)
> + IOMMUTLBEntry *iotlb, bool unmap_all)
> {
> const VFIOIOMMUFDContainer *container =
> container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>
> + if (unmap_all) {
> + return -ENOTSUP;
> + }
> +
> /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
> return iommufd_backend_unmap_dma(container->be,
> container->ioas_id, iova, size);
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index 6f77e18a7a..c5183700db 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -172,7 +172,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> }
> } else {
> ret = vfio_container_dma_unmap(bcontainer, iova,
> - iotlb->addr_mask + 1, iotlb);
> + iotlb->addr_mask + 1, iotlb, false);
> if (ret) {
> error_setg(&local_err,
> "vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> @@ -201,7 +201,7 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
> int ret;
>
> /* Unmap with a single call. */
> - ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL);
> + ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL, false);
> if (ret) {
> error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
> strerror(-ret));
> @@ -638,7 +638,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
> /* The unmap ioctl doesn't accept a full 64-bit span. */
> llsize = int128_rshift(llsize, 1);
> ret = vfio_container_dma_unmap(bcontainer, iova,
> - int128_get64(llsize), NULL);
> + int128_get64(llsize), NULL, false);
> if (ret) {
> error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> "0x%"HWADDR_PRIx") = %d (%s)",
> @@ -648,7 +648,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
> iova += int128_get64(llsize);
> }
> ret = vfio_container_dma_unmap(bcontainer, iova,
> - int128_get64(llsize), NULL);
> + int128_get64(llsize), NULL, false);
> if (ret) {
> error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> "0x%"HWADDR_PRIx") = %d (%s)",
* Re: [PATCH v3 09/15] vfio: implement unmap all for DMA unmap callbacks
2025-05-07 15:20 ` [PATCH v3 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
@ 2025-05-09 10:08 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:08 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne
On 5/7/25 17:20, John Levon wrote:
> Handle unmap_all in the DMA unmap handlers rather than in the caller.
>
> Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/container.c | 41 +++++++++++++++++++++++++++++++----------
> hw/vfio/iommufd.c | 15 ++++++++++++++-
> hw/vfio/listener.c | 19 ++++++-------------
> 3 files changed, 51 insertions(+), 24 deletions(-)
>
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index d5f4e66f1c..a9f0dbaec4 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -119,12 +119,9 @@ unmap_exit:
> return ret;
> }
>
> -/*
> - * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> - */
> -static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> - hwaddr iova, ram_addr_t size,
> - IOMMUTLBEntry *iotlb, bool unmap_all)
> +static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
> + hwaddr iova, ram_addr_t size,
> + IOMMUTLBEntry *iotlb)
> {
> const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> bcontainer);
> @@ -138,10 +135,6 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> int ret;
> Error *local_err = NULL;
>
> - if (unmap_all) {
> - return -ENOTSUP;
> - }
> -
> if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
> if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
> bcontainer->dirty_pages_supported) {
> @@ -185,6 +178,34 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> return 0;
> }
>
> +/*
> + * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> + */
> +static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> + hwaddr iova, ram_addr_t size,
> + IOMMUTLBEntry *iotlb, bool unmap_all)
> +{
> + int ret;
> +
> + if (unmap_all) {
> + /* The unmap ioctl doesn't accept a full 64-bit span. */
> + Int128 llsize = int128_rshift(int128_2_64(), 1);
> +
> + ret = vfio_legacy_dma_unmap_one(bcontainer, 0, int128_get64(llsize),
> + iotlb);
> +
> + if (ret == 0) {
> + ret = vfio_legacy_dma_unmap_one(bcontainer, int128_get64(llsize),
> + int128_get64(llsize), iotlb);
> + }
> +
> + } else {
> + ret = vfio_legacy_dma_unmap_one(bcontainer, iova, size, iotlb);
> + }
> +
> + return ret;
> +}
> +
> static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> ram_addr_t size, void *vaddr, bool readonly)
> {
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 6b2764c044..af1c7ab10a 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -51,8 +51,21 @@ static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
> const VFIOIOMMUFDContainer *container =
> container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>
> + /* unmap in halves */
> if (unmap_all) {
> - return -ENOTSUP;
> + Int128 llsize = int128_rshift(int128_2_64(), 1);
> + int ret;
> +
> + ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
> + 0, int128_get64(llsize));
> +
> + if (ret == 0) {
> + ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
> + int128_get64(llsize),
> + int128_get64(llsize));
> + }
> +
> + return ret;
> }
>
> /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index c5183700db..e7ade7d62e 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -634,21 +634,14 @@ static void vfio_listener_region_del(MemoryListener *listener,
> }
>
> if (try_unmap) {
> + bool unmap_all = false;
> +
> if (int128_eq(llsize, int128_2_64())) {
> - /* The unmap ioctl doesn't accept a full 64-bit span. */
> - llsize = int128_rshift(llsize, 1);
> - ret = vfio_container_dma_unmap(bcontainer, iova,
> - int128_get64(llsize), NULL, false);
> - if (ret) {
> - error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> - "0x%"HWADDR_PRIx") = %d (%s)",
> - bcontainer, iova, int128_get64(llsize), ret,
> - strerror(-ret));
> - }
> - iova += int128_get64(llsize);
> + unmap_all = true;
> + llsize = int128_zero();
> }
> - ret = vfio_container_dma_unmap(bcontainer, iova,
> - int128_get64(llsize), NULL, false);
> + ret = vfio_container_dma_unmap(bcontainer, iova, int128_get64(llsize),
> + NULL, unmap_all);
> if (ret) {
> error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> "0x%"HWADDR_PRIx") = %d (%s)",
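The unmap-all handling quoted above comes down to one observation: a span of
2^64 bytes cannot be expressed in a 64-bit size, so the handler issues two
requests of 2^63 bytes each and stops at the first failure. A minimal sketch
of that logic follows, assuming a hypothetical recording backend (DemoBackend
and the demo_* functions are not part of QEMU).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend that records the ranges it was asked to unmap. */
typedef struct DemoBackend {
    uint64_t iova[2];
    uint64_t size[2];
    int calls;
    int fail_on_call;   /* 1-based call number to fail on; 0 = never */
} DemoBackend;

/* Unmap a single range; returns 0 on success or -errno on failure. */
static int demo_unmap_one(DemoBackend *be, uint64_t iova, uint64_t size)
{
    int idx;

    assert(be != NULL);

    idx = be->calls++;
    if (be->fail_on_call == idx + 1) {
        return -EINVAL;
    }

    if (idx < 2) {
        be->iova[idx] = iova;
        be->size[idx] = size;
    }

    return 0;
}

/* Unmap either one range or, if @unmap_all is set, the whole space. */
static int demo_unmap(DemoBackend *be, uint64_t iova, uint64_t size,
                      bool unmap_all)
{
    if (unmap_all) {
        /* 2^64 does not fit in 64 bits, so unmap two halves of 2^63. */
        const uint64_t half = UINT64_C(1) << 63;
        int ret = demo_unmap_one(be, 0, half);

        if (ret == 0) {
            ret = demo_unmap_one(be, half, half);
        }

        return ret;
    }

    return demo_unmap_one(be, iova, size);
}
```

Keeping the split inside the handler means the caller only sets a flag, and a
backend that can express "everything" in one request is free to do so.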
* Re: [PATCH v3 10/15] vfio: add device IO ops vector
2025-05-07 15:20 ` [PATCH v3 10/15] vfio: add device IO ops vector John Levon
@ 2025-05-09 10:09 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:09 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne,
John Johnson, Elena Ufimtseva, Jagannathan Raman
On 5/7/25 17:20, John Levon wrote:
> For vfio-user, device operations such as IRQ handling and region
> read/writes are implemented in userspace over the control socket, not
> ioctl() to the vfio kernel driver; add an ops vector to generalize this,
> and implement vfio_device_io_ops_ioctl for interacting with the kernel
> vfio driver.
>
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/hw/vfio/vfio-device.h | 38 ++++++++++++++++++
> hw/vfio/container-base.c | 6 +--
> hw/vfio/device.c | 74 +++++++++++++++++++++++++++++------
> hw/vfio/listener.c | 13 +++---
> hw/vfio/pci.c | 10 ++---
> 5 files changed, 114 insertions(+), 27 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 4a32202943..7e1e81e76b 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -41,6 +41,7 @@ enum {
> };
>
> typedef struct VFIODeviceOps VFIODeviceOps;
> +typedef struct VFIODeviceIOOps VFIODeviceIOOps;
> typedef struct VFIOMigration VFIOMigration;
>
> typedef struct IOMMUFDBackend IOMMUFDBackend;
> @@ -66,6 +67,7 @@ typedef struct VFIODevice {
> OnOffAuto migration_multifd_transfer;
> bool migration_events;
> VFIODeviceOps *ops;
> + VFIODeviceIOOps *io_ops;
> unsigned int num_irqs;
> unsigned int num_regions;
> unsigned int flags;
> @@ -151,6 +153,42 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
> extern VFIODeviceList vfio_device_list;
>
> #ifdef CONFIG_LINUX
> +/*
> + * How devices communicate with the server. The default option is through
> + * ioctl() to the kernel VFIO driver, but vfio-user can use a socket to a remote
> + * process.
> + */
> +struct VFIODeviceIOOps {
> + /**
> + * @device_feature
> + *
> + * Fill in feature info for the given device.
> + */
> + int (*device_feature)(VFIODevice *vdev, struct vfio_device_feature *);
> +
> + /**
> + * @get_region_info
> + *
> + * Fill in @info with information on the region given by @info->index.
> + */
> + int (*get_region_info)(VFIODevice *vdev,
> + struct vfio_region_info *info);
> +
> + /**
> + * @get_irq_info
> + *
> + * Fill in @irq with information on the IRQ given by @irq->index.
> + */
> + int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
> +
> + /**
> + * @set_irqs
> + *
> + * Configure IRQs as defined by @irqs.
> + */
> + int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
> +};
> +
> void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
> struct vfio_device_info *info);
>
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 3ff473a45c..1c6ca94b60 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -198,11 +198,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
> feature->flags = VFIO_DEVICE_FEATURE_GET |
> VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
>
> - if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> - return -errno;
> - }
> -
> - return 0;
> + return vbasedev->io_ops->device_feature(vbasedev, feature);
> }
>
> static int vfio_container_iommu_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 5d837092cb..40a196bfb9 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -82,7 +82,7 @@ void vfio_device_irq_disable(VFIODevice *vbasedev, int index)
> .count = 0,
> };
>
> - ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> + vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
> }
>
> void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
> @@ -95,7 +95,7 @@ void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
> .count = 1,
> };
>
> - ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> + vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
> }
>
> void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
> @@ -108,7 +108,7 @@ void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
> .count = 1,
> };
>
> - ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> + vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
> }
>
> static inline const char *action_to_str(int action)
> @@ -167,7 +167,7 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
> pfd = (int32_t *)&irq_set->data;
> *pfd = fd;
>
> - if (!ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
> + if (!vbasedev->io_ops->set_irqs(vbasedev, irq_set)) {
> return true;
> }
>
> @@ -188,22 +188,19 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
> int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
> struct vfio_irq_info *info)
> {
> - int ret;
> -
> memset(info, 0, sizeof(*info));
>
> info->argsz = sizeof(*info);
> info->index = index;
>
> - ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> -
> - return ret < 0 ? -errno : ret;
> + return vbasedev->io_ops->get_irq_info(vbasedev, info);
> }
>
> int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
> struct vfio_region_info **info)
> {
> size_t argsz = sizeof(struct vfio_region_info);
> + int ret;
>
> *info = g_malloc0(argsz);
>
> @@ -211,10 +208,11 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
> retry:
> (*info)->argsz = argsz;
>
> - if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
> + ret = vbasedev->io_ops->get_region_info(vbasedev, *info);
> + if (ret != 0) {
> g_free(*info);
> *info = NULL;
> - return -errno;
> + return ret;
> }
>
> if ((*info)->argsz > argsz) {
> @@ -320,11 +318,14 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
> vbasedev->fd = fd;
> }
>
> +static VFIODeviceIOOps vfio_device_io_ops_ioctl;
> +
> void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
> DeviceState *dev, bool ram_discard)
> {
> vbasedev->type = type;
> vbasedev->ops = ops;
> + vbasedev->io_ops = &vfio_device_io_ops_ioctl;
> vbasedev->dev = dev;
> vbasedev->fd = -1;
>
> @@ -442,3 +443,54 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
> QLIST_REMOVE(vbasedev, global_next);
> vbasedev->bcontainer = NULL;
> }
> +
> +/*
> + * Traditional ioctl() based io
> + */
> +
> +static int vfio_device_io_device_feature(VFIODevice *vbasedev,
> + struct vfio_device_feature *feature)
> +{
> + int ret;
> +
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +
> + return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_get_region_info(VFIODevice *vbasedev,
> + struct vfio_region_info *info)
> +{
> + int ret;
> +
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
> +
> + return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_get_irq_info(VFIODevice *vbasedev,
> + struct vfio_irq_info *info)
> +{
> + int ret;
> +
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> +
> + return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
> + struct vfio_irq_set *irqs)
> +{
> + int ret;
> +
> + ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs);
> +
> + return ret < 0 ? -errno : ret;
> +}
> +
> +static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
> + .device_feature = vfio_device_io_device_feature,
> + .get_region_info = vfio_device_io_get_region_info,
> + .get_irq_info = vfio_device_io_get_irq_info,
> + .set_irqs = vfio_device_io_set_irqs,
> +};
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index e7ade7d62e..2b93ca55b6 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -794,13 +794,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
> VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
>
> QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
> + int ret;
> +
> if (!vbasedev->dirty_tracking) {
> continue;
> }
>
> - if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> + ret = vbasedev->io_ops->device_feature(vbasedev, feature);
> +
> + if (ret != 0) {
> warn_report("%s: Failed to stop DMA logging, err %d (%s)",
> - vbasedev->name, -errno, strerror(errno));
> + vbasedev->name, -ret, strerror(-ret));
> }
> vbasedev->dirty_tracking = false;
> }
> @@ -901,10 +905,9 @@ static bool vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
> continue;
> }
>
> - ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> + ret = vbasedev->io_ops->device_feature(vbasedev, feature);
> if (ret) {
> - ret = -errno;
> - error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
> + error_setg_errno(errp, -ret, "%s: Failed to start DMA logging",
> vbasedev->name);
> goto out;
> }
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index f65c9463ce..da2ffc9bf3 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -381,7 +381,7 @@ static void vfio_msi_interrupt(void *opaque)
> static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
> {
> g_autofree struct vfio_irq_set *irq_set = NULL;
> - int ret = 0, argsz;
> + int argsz;
> int32_t *fd;
>
> argsz = sizeof(*irq_set) + sizeof(*fd);
> @@ -396,9 +396,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
> fd = (int32_t *)&irq_set->data;
> *fd = -1;
>
> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> -
> - return ret < 0 ? -errno : ret;
> + return vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
> }
>
> static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> @@ -455,11 +453,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> fds[i] = fd;
> }
>
> - ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> + ret = vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
>
> g_free(irq_set);
>
> - return ret < 0 ? -errno : ret;
> + return ret;
> }
>
> static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
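As a rough illustration of the ops-vector idea in this patch: callers go
through a table of function pointers that return 0 or -errno, so the transport
behind them (ioctl() to the kernel, or messages over a socket) can be swapped
per device. The sketch below is a much-reduced analogue; DemoDevice, DemoIOOps
and both backends are hypothetical, and the "remote" backend only counts
messages instead of using a real socket.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef struct DemoDevice DemoDevice;

/* Hypothetical, much-reduced analogue of a device I/O ops vector. */
typedef struct DemoIOOps {
    /* Returns 0 on success or -errno on failure. */
    int (*get_irq_count)(DemoDevice *dev, unsigned int index,
                         unsigned int *count);
} DemoIOOps;

struct DemoDevice {
    const DemoIOOps *io_ops;
    unsigned int num_irqs;
    unsigned int irq_count[2];
    unsigned int messages;      /* requests sent by the "remote" backend */
};

static int demo_lookup(DemoDevice *dev, unsigned int index,
                       unsigned int *count)
{
    if (index >= dev->num_irqs) {
        return -EINVAL;
    }

    *count = dev->irq_count[index];
    return 0;
}

/* Stand-in for the direct path (an ioctl() in the real code). */
static int demo_local_get_irq_count(DemoDevice *dev, unsigned int index,
                                    unsigned int *count)
{
    return demo_lookup(dev, index, count);
}

/* Stand-in for a message-based path: counts one message per request. */
static int demo_remote_get_irq_count(DemoDevice *dev, unsigned int index,
                                     unsigned int *count)
{
    dev->messages++;
    return demo_lookup(dev, index, count);
}

static const DemoIOOps demo_io_ops_local = {
    .get_irq_count = demo_local_get_irq_count,
};

static const DemoIOOps demo_io_ops_remote = {
    .get_irq_count = demo_remote_get_irq_count,
};

/* Callers use this helper and never name a transport directly. */
static int demo_get_irq_count(DemoDevice *dev, unsigned int index,
                              unsigned int *count)
{
    assert(dev->io_ops != NULL && dev->io_ops->get_irq_count != NULL);
    return dev->io_ops->get_irq_count(dev, index, count);
}
```

Because every op already returns -errno, call sites stop reading errno
themselves, which matches the error-reporting changes in the listener.c hunks
of this patch.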
* Re: [PATCH v3 11/15] vfio: add region info cache
2025-05-07 15:20 ` [PATCH v3 11/15] vfio: add region info cache John Levon
@ 2025-05-09 10:09 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:09 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne,
John Johnson, Elena Ufimtseva, Jagannathan Raman
On 5/7/25 17:20, John Levon wrote:
> Instead of requesting region information on demand with
> VFIO_DEVICE_GET_REGION_INFO, maintain a cache: this will become
> necessary for performance for vfio-user, where this call becomes a
> message over the control socket, so is of higher overhead than the
> traditional path.
>
> We will also need it to generalize region accesses, as that means we
> can't use ->config_offset for configuration space accesses, but must
> look up the region offset (if relevant) each time.
>
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> include/hw/vfio/vfio-device.h | 1 +
> hw/vfio/ccw.c | 5 -----
> hw/vfio/device.c | 25 +++++++++++++++++++++----
> hw/vfio/igd.c | 10 +++++-----
> hw/vfio/pci.c | 6 +++---
> hw/vfio/region.c | 2 +-
> 6 files changed, 31 insertions(+), 18 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 7e1e81e76b..4fff3dcee3 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -83,6 +83,7 @@ typedef struct VFIODevice {
> IOMMUFDBackend *iommufd;
> VFIOIOASHwpt *hwpt;
> QLIST_ENTRY(VFIODevice) hwpt_next;
> + struct vfio_region_info **reginfo;
> } VFIODevice;
>
> struct VFIODeviceOps {
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index ab3fabf991..cea9d6e005 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -504,7 +504,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
>
> vcdev->io_region_offset = info->offset;
> vcdev->io_region = g_malloc0(info->size);
> - g_free(info);
>
> /* check for the optional async command region */
> ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
> @@ -517,7 +516,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
> }
> vcdev->async_cmd_region_offset = info->offset;
> vcdev->async_cmd_region = g_malloc0(info->size);
> - g_free(info);
> }
>
> ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
> @@ -530,7 +528,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
> }
> vcdev->schib_region_offset = info->offset;
> vcdev->schib_region = g_malloc(info->size);
> - g_free(info);
> }
>
> ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
> @@ -544,7 +541,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
> }
> vcdev->crw_region_offset = info->offset;
> vcdev->crw_region = g_malloc(info->size);
> - g_free(info);
> }
>
> return true;
> @@ -554,7 +550,6 @@ out_err:
> g_free(vcdev->schib_region);
> g_free(vcdev->async_cmd_region);
> g_free(vcdev->io_region);
> - g_free(info);
> return false;
> }
>
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 40a196bfb9..77b0675abe 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -202,6 +202,12 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
> size_t argsz = sizeof(struct vfio_region_info);
> int ret;
>
> + /* check cache */
> + if (vbasedev->reginfo[index] != NULL) {
> + *info = vbasedev->reginfo[index];
> + return 0;
> + }
> +
> *info = g_malloc0(argsz);
>
> (*info)->index = index;
> @@ -222,6 +228,9 @@ retry:
> goto retry;
> }
>
> + /* fill cache */
> + vbasedev->reginfo[index] = *info;
> +
> return 0;
> }
>
> @@ -240,7 +249,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
>
> hdr = vfio_get_region_info_cap(*info, VFIO_REGION_INFO_CAP_TYPE);
> if (!hdr) {
> - g_free(*info);
> continue;
> }
>
> @@ -252,8 +260,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
> if (cap_type->type == type && cap_type->subtype == subtype) {
> return 0;
> }
> -
> - g_free(*info);
> }
>
> *info = NULL;
> @@ -262,7 +268,7 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
>
> bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
> {
> - g_autofree struct vfio_region_info *info = NULL;
> + struct vfio_region_info *info = NULL;
> bool ret = false;
>
> if (!vfio_device_get_region_info(vbasedev, region, &info)) {
> @@ -435,10 +441,21 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
> QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
>
> QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
> +
> + vbasedev->reginfo = g_new0(struct vfio_region_info *,
> + vbasedev->num_regions);
> }
>
> void vfio_device_unprepare(VFIODevice *vbasedev)
> {
> + int i;
> +
> + for (i = 0; i < vbasedev->num_regions; i++) {
> + g_free(vbasedev->reginfo[i]);
> + }
> + g_free(vbasedev->reginfo);
> + vbasedev->reginfo = NULL;
> +
> QLIST_REMOVE(vbasedev, container_next);
> QLIST_REMOVE(vbasedev, global_next);
> vbasedev->bcontainer = NULL;
> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> index 3ee1a73b57..e7952d15a0 100644
> --- a/hw/vfio/igd.c
> +++ b/hw/vfio/igd.c
> @@ -349,8 +349,8 @@ static int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev,
>
> static bool vfio_pci_igd_setup_lpc_bridge(VFIOPCIDevice *vdev, Error **errp)
> {
> - g_autofree struct vfio_region_info *host = NULL;
> - g_autofree struct vfio_region_info *lpc = NULL;
> + struct vfio_region_info *host = NULL;
> + struct vfio_region_info *lpc = NULL;
> PCIDevice *lpc_bridge;
> int ret;
>
> @@ -510,7 +510,7 @@ void vfio_probe_igd_bar0_quirk(VFIOPCIDevice *vdev, int nr)
>
> static bool vfio_pci_igd_config_quirk(VFIOPCIDevice *vdev, Error **errp)
> {
> - g_autofree struct vfio_region_info *opregion = NULL;
> + struct vfio_region_info *opregion = NULL;
> int ret, gen;
> uint64_t gms_size = 0;
> uint64_t *bdsm_size;
> @@ -551,7 +551,7 @@ static bool vfio_pci_igd_config_quirk(VFIOPCIDevice *vdev, Error **errp)
> * - OpRegion
> * - Same LPC bridge and Host bridge VID/DID/SVID/SSID as host
> */
> - g_autofree struct vfio_region_info *rom = NULL;
> + struct vfio_region_info *rom = NULL;
>
> legacy_mode_enabled = true;
> info_report("IGD legacy mode enabled, "
> @@ -681,7 +681,7 @@ error:
> */
> static bool vfio_pci_kvmgt_config_quirk(VFIOPCIDevice *vdev, Error **errp)
> {
> - g_autofree struct vfio_region_info *opregion = NULL;
> + struct vfio_region_info *opregion = NULL;
> int gen;
>
> if (!vfio_pci_is(vdev, PCI_VENDOR_ID_INTEL, PCI_ANY_ID) ||
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index da2ffc9bf3..9136cf52c8 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -883,8 +883,8 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
>
> static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
> {
> - g_autofree struct vfio_region_info *reg_info = NULL;
> VFIODevice *vbasedev = &vdev->vbasedev;
> + struct vfio_region_info *reg_info = NULL;
> uint64_t size;
> off_t off = 0;
> ssize_t bytes;
> @@ -2710,7 +2710,7 @@ static VFIODeviceOps vfio_pci_ops = {
> bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
> {
> VFIODevice *vbasedev = &vdev->vbasedev;
> - g_autofree struct vfio_region_info *reg_info = NULL;
> + struct vfio_region_info *reg_info = NULL;
> int ret;
>
> ret = vfio_device_get_region_info(vbasedev, VFIO_PCI_VGA_REGION_INDEX, &reg_info);
> @@ -2775,7 +2775,7 @@ bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
> static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
> {
> VFIODevice *vbasedev = &vdev->vbasedev;
> - g_autofree struct vfio_region_info *reg_info = NULL;
> + struct vfio_region_info *reg_info = NULL;
> struct vfio_irq_info irq_info;
> int i, ret = -1;
>
> diff --git a/hw/vfio/region.c b/hw/vfio/region.c
> index 04bf9eb098..ef2630cac3 100644
> --- a/hw/vfio/region.c
> +++ b/hw/vfio/region.c
> @@ -182,7 +182,7 @@ static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
> int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
> int index, const char *name)
> {
> - g_autofree struct vfio_region_info *info = NULL;
> + struct vfio_region_info *info = NULL;
> int ret;
>
> ret = vfio_device_get_region_info(vbasedev, index, &info);
* Re: [PATCH v3 12/15] vfio: add read/write to device IO ops vector
2025-05-07 15:20 ` [PATCH v3 12/15] vfio: add read/write to device IO ops vector John Levon
@ 2025-05-09 10:14 ` Cédric Le Goater
2025-05-09 10:32 ` John Levon
0 siblings, 1 reply; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:14 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne
On 5/7/25 17:20, John Levon wrote:
> Now we have the region info cache, add ->region_read/write device I/O
> operations instead of explicit pread()/pwrite() system calls.
No S-o-b. Please reply with one.
Thanks,
C.
> ---
> include/hw/vfio/vfio-device.h | 18 ++++++++++++++++++
> hw/vfio/device.c | 34 ++++++++++++++++++++++++++++++++++
> hw/vfio/pci.c | 28 ++++++++++++++--------------
> hw/vfio/region.c | 17 +++++++++++------
> 4 files changed, 77 insertions(+), 20 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 4fff3dcee3..8bcb3c19f6 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -188,6 +188,24 @@ struct VFIODeviceIOOps {
> * Configure IRQs as defined by @irqs.
> */
> int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
> +
> + /**
> + * @region_read
> + *
> + * Read @size bytes from the region @nr at offset @off into the buffer
> + * @data.
> + */
> + int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
> + void *data);
> +
> + /**
> + * @region_write
> + *
> + * Write @size bytes to the region @nr at offset @off from the buffer
> + * @data.
> + */
> + int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
> + void *data);
> };
>
> void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 77b0675abe..0b2cd90d64 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -505,9 +505,43 @@ static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
> return ret < 0 ? -errno : ret;
> }
>
> +static int vfio_device_io_region_read(VFIODevice *vbasedev, uint8_t index,
> + off_t off, uint32_t size, void *data)
> +{
> + struct vfio_region_info *info;
> + int ret;
> +
> + ret = vfio_device_get_region_info(vbasedev, index, &info);
> + if (ret != 0) {
> + return ret;
> + }
> +
> + ret = pread(vbasedev->fd, data, size, info->offset + off);
> +
> + return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_region_write(VFIODevice *vbasedev, uint8_t index,
> + off_t off, uint32_t size, void *data)
> +{
> + struct vfio_region_info *info;
> + int ret;
> +
> + ret = vfio_device_get_region_info(vbasedev, index, &info);
> + if (ret != 0) {
> + return ret;
> + }
> +
> + ret = pwrite(vbasedev->fd, data, size, info->offset + off);
> +
> + return ret < 0 ? -errno : ret;
> +}
> +
> static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
> .device_feature = vfio_device_io_device_feature,
> .get_region_info = vfio_device_io_get_region_info,
> .get_irq_info = vfio_device_io_get_irq_info,
> .set_irqs = vfio_device_io_set_irqs,
> + .region_read = vfio_device_io_region_read,
> + .region_write = vfio_device_io_region_write,
> };
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 9136cf52c8..1236de315d 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -918,18 +918,22 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
> memset(vdev->rom, 0xff, size);
>
> while (size) {
> - bytes = pread(vbasedev->fd, vdev->rom + off,
> - size, vdev->rom_offset + off);
> + bytes = vbasedev->io_ops->region_read(vbasedev,
> + VFIO_PCI_ROM_REGION_INDEX,
> + off, size, vdev->rom + off);
> +
> if (bytes == 0) {
> break;
> } else if (bytes > 0) {
> off += bytes;
> size -= bytes;
> } else {
> - if (errno == EINTR || errno == EAGAIN) {
> + if (bytes == -EINTR || bytes == -EAGAIN) {
> continue;
> }
> - error_report("vfio: Error reading device ROM: %m");
> + error_report("vfio: Error reading device ROM: %s",
> + strreaderror(bytes));
> +
> break;
> }
> }
> @@ -969,22 +973,18 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
> static int vfio_pci_config_space_read(VFIOPCIDevice *vdev, off_t offset,
> uint32_t size, void *data)
> {
> - ssize_t ret;
> -
> - ret = pread(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
> -
> - return ret < 0 ? -errno : (int)ret;
> + return vdev->vbasedev.io_ops->region_read(&vdev->vbasedev,
> + VFIO_PCI_CONFIG_REGION_INDEX,
> + offset, size, data);
> }
>
> /* "Raw" write of underlying config space. */
> static int vfio_pci_config_space_write(VFIOPCIDevice *vdev, off_t offset,
> uint32_t size, void *data)
> {
> - ssize_t ret;
> -
> - ret = pwrite(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
> -
> - return ret < 0 ? -errno : (int)ret;
> + return vdev->vbasedev.io_ops->region_write(&vdev->vbasedev,
> + VFIO_PCI_CONFIG_REGION_INDEX,
> + offset, size, data);
> }
>
> static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
> diff --git a/hw/vfio/region.c b/hw/vfio/region.c
> index ef2630cac3..34752c3f65 100644
> --- a/hw/vfio/region.c
> +++ b/hw/vfio/region.c
> @@ -45,6 +45,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
> uint32_t dword;
> uint64_t qword;
> } buf;
> + int ret;
>
> switch (size) {
> case 1:
> @@ -64,11 +65,13 @@ void vfio_region_write(void *opaque, hwaddr addr,
> break;
> }
>
> - if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
> + ret = vbasedev->io_ops->region_write(vbasedev, region->nr,
> + addr, size, &buf);
> + if (ret != size) {
> error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
> - ",%d) failed: %m",
> + ",%d) failed: %s",
> __func__, vbasedev->name, region->nr,
> - addr, data, size);
> + addr, data, size, strwriteerror(ret));
> }
>
> trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
> @@ -96,11 +99,13 @@ uint64_t vfio_region_read(void *opaque,
> uint64_t qword;
> } buf;
> uint64_t data = 0;
> + int ret;
>
> - if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
> - error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
> + ret = vbasedev->io_ops->region_read(vbasedev, region->nr, addr, size, &buf);
> + if (ret != size) {
> + error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s",
> __func__, vbasedev->name, region->nr,
> - addr, size);
> + addr, size, strreaderror(ret));
> return (uint64_t)-1;
> }
> switch (size) {
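The patch above routes region accesses through an ops vector and reports failures as a negative errno value instead of relying on the global errno. As a rough, self-contained sketch of that pattern (not the QEMU code): every `Demo*`/`demo_*` name below is hypothetical, a temporary file stands in for the device fd, and a fixed offset table stands in for the region info cache.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700 /* expose pread(), pwrite() and mkstemp() */
#endif
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define DEMO_NR_REGIONS 2

typedef struct DemoDevice DemoDevice;

/* Hypothetical stand-in for VFIODeviceIOOps: one callback per operation. */
typedef struct DemoIOOps {
    int (*region_read)(DemoDevice *dev, uint8_t nr, off_t off,
                       uint32_t size, void *data);
    int (*region_write)(DemoDevice *dev, uint8_t nr, off_t off,
                        uint32_t size, void *data);
} DemoIOOps;

struct DemoDevice {
    int fd;                               /* plain file standing in for the device fd */
    off_t region_offset[DEMO_NR_REGIONS]; /* stands in for the region info cache */
    const DemoIOOps *io_ops;
};

/* fd-backed read: returns the byte count, or -errno on failure. */
static int demo_fd_region_read(DemoDevice *dev, uint8_t nr, off_t off,
                               uint32_t size, void *data)
{
    ssize_t ret;

    if (nr >= DEMO_NR_REGIONS) {
        return -EINVAL;
    }
    ret = pread(dev->fd, data, size, dev->region_offset[nr] + off);
    return ret < 0 ? -errno : (int)ret;
}

/* fd-backed write: returns the byte count, or -errno on failure. */
static int demo_fd_region_write(DemoDevice *dev, uint8_t nr, off_t off,
                                uint32_t size, void *data)
{
    ssize_t ret;

    if (nr >= DEMO_NR_REGIONS) {
        return -EINVAL;
    }
    ret = pwrite(dev->fd, data, size, dev->region_offset[nr] + off);
    return ret < 0 ? -errno : (int)ret;
}

static const DemoIOOps demo_fd_ops = {
    .region_read = demo_fd_region_read,
    .region_write = demo_fd_region_write,
};

/* Back the fake device with an unlinked temporary file. */
static int demo_device_open(DemoDevice *dev)
{
    char path[] = "/tmp/vfio-demo-XXXXXX";
    int fd = mkstemp(path);

    if (fd < 0) {
        return -errno;
    }
    unlink(path);
    dev->fd = fd;
    dev->region_offset[0] = 0;
    dev->region_offset[1] = 4096;
    dev->io_ops = &demo_fd_ops;
    return 0;
}

/*
 * Read up to @size bytes from the start of region @nr, retrying on
 * -EINTR/-EAGAIN the way the ROM loading loop does. Returns the number
 * of bytes read (short at end of data) or a negative errno value.
 */
static int demo_region_read_all(DemoDevice *dev, uint8_t nr, void *buf,
                                uint32_t size)
{
    uint8_t *p = buf;
    off_t off = 0;

    assert(dev->io_ops && dev->io_ops->region_read);
    while (size) {
        int bytes = dev->io_ops->region_read(dev, nr, off, size, p + off);

        if (bytes == 0) {
            break;
        } else if (bytes > 0) {
            off += bytes;
            size -= bytes;
        } else if (bytes != -EINTR && bytes != -EAGAIN) {
            return bytes;
        }
    }
    return (int)off;
}
```

Callers only ever touch `dev->io_ops`, so a second backend (a socket-based one, say) could be swapped in by pointing `io_ops` at a different table, with no change to `demo_region_read_all()`.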
* Re: [PATCH v3 13/15] vfio: add vfio-pci-base class
2025-05-07 15:20 ` [PATCH v3 13/15] vfio: add vfio-pci-base class John Levon
@ 2025-05-09 10:14 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:14 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne,
John Johnson, Elena Ufimtseva, Jagannathan Raman
On 5/7/25 17:20, John Levon wrote:
> Split out parts of TYPE_VFIO_PCI into a base TYPE_VFIO_PCI_BASE,
> although we have not yet introduced another subclass, so all the
> properties have remained in TYPE_VFIO_PCI.
>
> Note that currently there is no need for additional data for
> TYPE_VFIO_PCI, so it shares the same C struct type as
> TYPE_VFIO_PCI_BASE, VFIOPCIDevice.
>
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
Reviewed-by: Cédric Le Goater <clg@redhat.com>
Thanks,
C.
> ---
> hw/vfio/pci.h | 10 +++++++-
> hw/vfio/device.c | 2 +-
> hw/vfio/pci.c | 62 +++++++++++++++++++++++++++++++-----------------
> 3 files changed, 50 insertions(+), 24 deletions(-)
>
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index f835b1dbc2..5ce0fb916f 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -118,8 +118,16 @@ typedef struct VFIOMSIXInfo {
> bool noresize;
> } VFIOMSIXInfo;
>
> +/*
> + * TYPE_VFIO_PCI_BASE is an abstract type used to share code
> + * between VFIO implementations that use a kernel driver
> + * with those that use user sockets.
> + */
> +#define TYPE_VFIO_PCI_BASE "vfio-pci-base"
> +OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI_BASE)
> +
> #define TYPE_VFIO_PCI "vfio-pci"
> -OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI)
> +/* TYPE_VFIO_PCI shares struct VFIOPCIDevice. */
>
> struct VFIOPCIDevice {
> PCIDevice pdev;
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 0b2cd90d64..9fba2c7272 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -392,7 +392,7 @@ bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
> VFIODevice *vfio_get_vfio_device(Object *obj)
> {
> if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
> - return &VFIO_PCI(obj)->vbasedev;
> + return &VFIO_PCI_BASE(obj)->vbasedev;
> } else {
> return NULL;
> }
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 1236de315d..a1bfdfe375 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -241,7 +241,7 @@ static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
>
> static void vfio_intx_routing_notifier(PCIDevice *pdev)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> PCIINTxRoute route;
>
> if (vdev->interrupt != VFIO_INT_INTx) {
> @@ -514,7 +514,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
> static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> MSIMessage *msg, IOHandler *handler)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> VFIOMSIVector *vector;
> int ret;
> bool resizing = !!(vdev->nr_vectors < nr + 1);
> @@ -620,7 +620,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
>
> static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> VFIOMSIVector *vector = &vdev->msi_vectors[nr];
>
> trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
> @@ -1196,7 +1196,7 @@ static const MemoryRegionOps vfio_vga_ops = {
> */
> static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> VFIORegion *region = &vdev->bars[bar].region;
> MemoryRegion *mmap_mr, *region_mr, *base_mr;
> PCIIORegion *r;
> @@ -1242,7 +1242,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
> */
> uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
> uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
>
> @@ -1276,7 +1276,7 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
> void vfio_pci_write_config(PCIDevice *pdev,
> uint32_t addr, uint32_t val, int len)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
> uint32_t val_le = cpu_to_le32(val);
> int ret;
> @@ -3129,7 +3129,7 @@ static bool vfio_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
> static void vfio_realize(PCIDevice *pdev, Error **errp)
> {
> ERRP_GUARD();
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
> int i, ret;
> char uuid[UUID_STR_LEN];
> @@ -3300,7 +3300,7 @@ error:
>
> static void vfio_instance_finalize(Object *obj)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(obj);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
>
> vfio_display_finalize(vdev);
> vfio_bars_finalize(vdev);
> @@ -3318,7 +3318,7 @@ static void vfio_instance_finalize(Object *obj)
>
> static void vfio_exitfn(PCIDevice *pdev)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> VFIODevice *vbasedev = &vdev->vbasedev;
>
> vfio_unregister_req_notifier(vdev);
> @@ -3342,7 +3342,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>
> static void vfio_pci_reset(DeviceState *dev)
> {
> - VFIOPCIDevice *vdev = VFIO_PCI(dev);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
>
> trace_vfio_pci_reset(vdev->vbasedev.name);
>
> @@ -3382,7 +3382,7 @@ post_reset:
> static void vfio_instance_init(Object *obj)
> {
> PCIDevice *pci_dev = PCI_DEVICE(obj);
> - VFIOPCIDevice *vdev = VFIO_PCI(obj);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
> VFIODevice *vbasedev = &vdev->vbasedev;
>
> device_add_bootindex_property(obj, &vdev->bootindex,
> @@ -3403,6 +3403,31 @@ static void vfio_instance_init(Object *obj)
> pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
> }
>
> +static void vfio_pci_base_dev_class_init(ObjectClass *klass, const void *data)
> +{
> + DeviceClass *dc = DEVICE_CLASS(klass);
> + PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
> +
> + dc->desc = "VFIO PCI base device";
> + set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> + pdc->exit = vfio_exitfn;
> + pdc->config_read = vfio_pci_read_config;
> + pdc->config_write = vfio_pci_write_config;
> +}
> +
> +static const TypeInfo vfio_pci_base_dev_info = {
> + .name = TYPE_VFIO_PCI_BASE,
> + .parent = TYPE_PCI_DEVICE,
> + .instance_size = 0,
> + .abstract = true,
> + .class_init = vfio_pci_base_dev_class_init,
> + .interfaces = (const InterfaceInfo[]) {
> + { INTERFACE_PCIE_DEVICE },
> + { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> + { }
> + },
> +};
> +
> static PropertyInfo vfio_pci_migration_multifd_transfer_prop;
>
> static const Property vfio_pci_dev_properties[] = {
> @@ -3473,7 +3498,8 @@ static const Property vfio_pci_dev_properties[] = {
> #ifdef CONFIG_IOMMUFD
> static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
> {
> - vfio_device_set_fd(&VFIO_PCI(obj)->vbasedev, str, errp);
> + VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
> + vfio_device_set_fd(&vdev->vbasedev, str, errp);
> }
> #endif
>
> @@ -3488,11 +3514,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
> object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
> #endif
> dc->desc = "VFIO-based PCI device assignment";
> - set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> pdc->realize = vfio_realize;
> - pdc->exit = vfio_exitfn;
> - pdc->config_read = vfio_pci_read_config;
> - pdc->config_write = vfio_pci_write_config;
>
> object_class_property_set_description(klass, /* 1.3 */
> "host",
> @@ -3617,16 +3639,11 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
>
> static const TypeInfo vfio_pci_dev_info = {
> .name = TYPE_VFIO_PCI,
> - .parent = TYPE_PCI_DEVICE,
> + .parent = TYPE_VFIO_PCI_BASE,
> .instance_size = sizeof(VFIOPCIDevice),
> .class_init = vfio_pci_dev_class_init,
> .instance_init = vfio_instance_init,
> .instance_finalize = vfio_instance_finalize,
> - .interfaces = (const InterfaceInfo[]) {
> - { INTERFACE_PCIE_DEVICE },
> - { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> - { }
> - },
> };
>
> static const Property vfio_pci_dev_nohotplug_properties[] = {
> @@ -3673,6 +3690,7 @@ static void register_vfio_pci_dev_type(void)
> vfio_pci_migration_multifd_transfer_prop = qdev_prop_on_off_auto;
> vfio_pci_migration_multifd_transfer_prop.realized_set_allowed = true;
>
> + type_register_static(&vfio_pci_base_dev_info);
> type_register_static(&vfio_pci_dev_info);
> type_register_static(&vfio_pci_nohotplug_dev_info);
> }
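To illustrate the split described in the commit message, here is a toy model in plain C of an abstract base type whose class initializer runs before the concrete subclass's. It is only a sketch and deliberately does not use QEMU's QOM API: `DemoTypeInfo`, `DemoClass` and every `demo_*` function are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical class struct: callbacks shared by all instances of a type. */
typedef struct DemoClass {
    const char *desc;
    int (*config_read)(void); /* set by the base type */
    bool (*realize)(void);    /* set by the concrete subclass */
} DemoClass;

/* Hypothetical, much reduced analogue of a TypeInfo entry. */
typedef struct DemoTypeInfo {
    const char *name;
    const struct DemoTypeInfo *parent;
    bool abstract;
    void (*class_init)(DemoClass *klass);
} DemoTypeInfo;

static int demo_config_read(void)
{
    return 0x8086;
}

static bool demo_realize(void)
{
    return true;
}

/* Shared behaviour lives in the base class initializer. */
static void demo_base_class_init(DemoClass *klass)
{
    klass->desc = "demo PCI base device";
    klass->config_read = demo_config_read;
}

/* The subclass overrides the description and adds what only it needs. */
static void demo_kernel_class_init(DemoClass *klass)
{
    klass->desc = "demo kernel-backed device";
    klass->realize = demo_realize;
}

static const DemoTypeInfo demo_base_info = {
    .name = "demo-pci-base",
    .parent = NULL,
    .abstract = true,
    .class_init = demo_base_class_init,
};

static const DemoTypeInfo demo_kernel_info = {
    .name = "demo-pci",
    .parent = &demo_base_info,
    .class_init = demo_kernel_class_init,
};

/* Run class_init callbacks from the root type down to @ti. */
static void demo_class_build(const DemoTypeInfo *ti, DemoClass *klass)
{
    if (ti->parent) {
        demo_class_build(ti->parent, klass);
    }
    if (ti->class_init) {
        ti->class_init(klass);
    }
}

/* Fill in @klass for @ti; abstract types cannot be instantiated. */
static bool demo_type_new(const DemoTypeInfo *ti, DemoClass *klass)
{
    if (ti->abstract) {
        return false;
    }
    *klass = (DemoClass){0};
    demo_class_build(ti, klass);
    return true;
}
```

In this model, a second concrete type would only need its own `DemoTypeInfo` with `.parent = &demo_base_info` and a `class_init` that sets `realize`; it would pick up `config_read` from the base without repeating it.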
* Re: [PATCH v3 15/15] vfio/container: pass MemoryRegion to DMA operations
2025-05-07 15:20 ` [PATCH v3 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
@ 2025-05-09 10:22 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:22 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne,
John Johnson, Jagannathan Raman, Elena Ufimtseva
John,
On 5/7/25 17:20, John Levon wrote:
> Pass through the MemoryRegion to DMA operation handlers of vfio
> containers. The vfio-user container will need this later.
I think the subject and commit log do not reflect the important
part, which is to extend the memory_get_xlat_addr() parameters
with a 'MemoryRegion **' parameter for vfio-user usage (and why).
Could you please rephrase and resend as a standalone patch, putting
the system/memory, virtio and vfio maintainers in Cc:?
Thanks,
C.
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
> include/hw/vfio/vfio-container-base.h | 4 ++--
> include/system/memory.h | 4 +++-
> hw/vfio/container-base.c | 4 ++--
> hw/vfio/container.c | 3 ++-
> hw/vfio/iommufd.c | 3 ++-
> hw/vfio/listener.c | 18 +++++++++++-------
> hw/virtio/vhost-vdpa.c | 2 +-
> system/memory.c | 7 ++++++-
> 8 files changed, 29 insertions(+), 16 deletions(-)
>
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 3d392b0fd8..359b483963 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -78,7 +78,7 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
>
> int vfio_container_dma_map(VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - void *vaddr, bool readonly);
> + void *vaddr, bool readonly, MemoryRegion *mrp);
> int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> IOMMUTLBEntry *iotlb, bool unmap_all);
> @@ -121,7 +121,7 @@ struct VFIOIOMMUClass {
> void (*listener_commit)(VFIOContainerBase *bcontainer);
> int (*dma_map)(const VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - void *vaddr, bool readonly);
> + void *vaddr, bool readonly, MemoryRegion *mrp);
> /**
> * @dma_unmap
> *
> diff --git a/include/system/memory.h b/include/system/memory.h
> index fbbf4cf911..eca1d9f32e 100644
> --- a/include/system/memory.h
> +++ b/include/system/memory.h
> @@ -746,13 +746,15 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> * @read_only: indicates if writes are allowed
> * @mr_has_discard_manager: indicates memory is controlled by a
> * RamDiscardManager
> + * @mrp: if non-NULL, fill in with MemoryRegion
> * @errp: pointer to Error*, to store an error if it happens.
> *
> * Return: true on success, else false setting @errp with error.
> */
> bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> ram_addr_t *ram_addr, bool *read_only,
> - bool *mr_has_discard_manager, Error **errp);
> + bool *mr_has_discard_manager, MemoryRegion **mrp,
> + Error **errp);
>
> typedef struct CoalescedMemoryRange CoalescedMemoryRange;
> typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 1c6ca94b60..a677bb6694 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -75,12 +75,12 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
>
> int vfio_container_dma_map(VFIOContainerBase *bcontainer,
> hwaddr iova, ram_addr_t size,
> - void *vaddr, bool readonly)
> + void *vaddr, bool readonly, MemoryRegion *mrp)
> {
> VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>
> g_assert(vioc->dma_map);
> - return vioc->dma_map(bcontainer, iova, size, vaddr, readonly);
> + return vioc->dma_map(bcontainer, iova, size, vaddr, readonly, mrp);
> }
>
> int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index a9f0dbaec4..98d6b9f90c 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -207,7 +207,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> }
>
> static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> - ram_addr_t size, void *vaddr, bool readonly)
> + ram_addr_t size, void *vaddr, bool readonly,
> + MemoryRegion *mrp)
> {
> const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
> bcontainer);
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index af1c7ab10a..a2518c4a5d 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -34,7 +34,8 @@
> TYPE_HOST_IOMMU_DEVICE_IOMMUFD "-vfio"
>
> static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> - ram_addr_t size, void *vaddr, bool readonly)
> + ram_addr_t size, void *vaddr, bool readonly,
> + MemoryRegion *mrp)
> {
> const VFIOIOMMUFDContainer *container =
> container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index bfacb3d8d9..71f336a31c 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -93,12 +93,12 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
> /* Called with rcu_read_lock held. */
> static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> ram_addr_t *ram_addr, bool *read_only,
> - Error **errp)
> + MemoryRegion **mrp, Error **errp)
> {
> bool ret, mr_has_discard_manager;
>
> ret = memory_get_xlat_addr(iotlb, vaddr, ram_addr, read_only,
> - &mr_has_discard_manager, errp);
> + &mr_has_discard_manager, mrp, errp);
> if (ret && mr_has_discard_manager) {
> /*
> * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> @@ -126,6 +126,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
> VFIOContainerBase *bcontainer = giommu->bcontainer;
> hwaddr iova = iotlb->iova + giommu->iommu_offset;
> + MemoryRegion *mrp;
> void *vaddr;
> int ret;
> Error *local_err = NULL;
> @@ -150,7 +151,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> bool read_only;
>
> - if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &local_err)) {
> + if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &mrp,
> + &local_err)) {
> error_report_err(local_err);
> goto out;
> }
> @@ -163,7 +165,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> */
> ret = vfio_container_dma_map(bcontainer, iova,
> iotlb->addr_mask + 1, vaddr,
> - read_only);
> + read_only, mrp);
> if (ret) {
> error_report("vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
> "0x%"HWADDR_PRIx", %p) = %d (%s)",
> @@ -233,7 +235,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
> vaddr = memory_region_get_ram_ptr(section->mr) + start;
>
> ret = vfio_container_dma_map(bcontainer, iova, next - start,
> - vaddr, section->readonly);
> + vaddr, section->readonly, section->mr);
> if (ret) {
> /* Rollback */
> vfio_ram_discard_notify_discard(rdl, section);
> @@ -557,7 +559,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
> }
>
> ret = vfio_container_dma_map(bcontainer, iova, int128_get64(llsize),
> - vaddr, section->readonly);
> + vaddr, section->readonly, section->mr);
> if (ret) {
> error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
> "0x%"HWADDR_PRIx", %p) = %d (%s)",
> @@ -1021,7 +1023,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> }
>
> rcu_read_lock();
> - if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
> + if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, NULL,
> + &local_err)) {
> + error_report_err(local_err);
> goto out_unlock;
> }
>
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 1ab2c11fa8..4c4b3d1371 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -228,7 +228,7 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
> if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> bool read_only;
>
> - if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL,
> + if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL, NULL,
> &local_err)) {
> error_report_err(local_err);
> return;
> diff --git a/system/memory.c b/system/memory.c
> index 71434e7ad0..79671943ce 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2176,7 +2176,8 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
> /* Called with rcu_read_lock held. */
> bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> ram_addr_t *ram_addr, bool *read_only,
> - bool *mr_has_discard_manager, Error **errp)
> + bool *mr_has_discard_manager, MemoryRegion **mrp,
> + Error **errp)
> {
> MemoryRegion *mr;
> hwaddr xlat;
> @@ -2241,6 +2242,10 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> *read_only = !writable || mr->readonly;
> }
>
> + if (mrp != NULL) {
> + *mrp = mr;
> + }
> +
> return true;
> }
>
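The key change in this patch is the extra optional out-parameter: callers that need the MemoryRegion pass a pointer, and everyone else passes NULL. A minimal stand-alone sketch of that calling convention follows; `DemoRegion`, `demo_regions` and `demo_get_xlat()` are hypothetical stand-ins, not the memory API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a MemoryRegion. */
typedef struct DemoRegion {
    const char *name;
    bool readonly;
} DemoRegion;

static DemoRegion demo_regions[] = {
    { "ram", false },
    { "rom", true },
};

/*
 * Look up region @index. Each out-parameter is optional: it is filled
 * in only when the caller passes a non-NULL pointer, so existing callers
 * can pass NULL for the values they do not need.
 * Returns true on success; on failure the out-parameters are untouched.
 */
static bool demo_get_xlat(size_t index, bool *read_only, DemoRegion **mrp)
{
    DemoRegion *mr;

    if (index >= sizeof(demo_regions) / sizeof(demo_regions[0])) {
        return false;
    }
    mr = &demo_regions[index];

    if (read_only != NULL) {
        *read_only = mr->readonly;
    }
    if (mrp != NULL) {
        *mrp = mr;
    }
    return true;
}
```

This mirrors how the vhost-vdpa caller in the diff simply passes one more NULL, while the vfio map notifier passes `&mrp` and forwards the result.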
* Re: [PATCH v3 00/15] vfio: preparation for vfio-user
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
` (14 preceding siblings ...)
2025-05-07 15:20 ` [PATCH v3 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
@ 2025-05-09 10:24 ` Cédric Le Goater
2025-05-09 12:45 ` Cédric Le Goater
15 siblings, 1 reply; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:24 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne
On 5/7/25 17:20, John Levon wrote:
> Hi, this series is against the vfio-next tree:
> https://github.com/legoater/qemu/commits/vfio-next
>
> The series contains patches to vfio to prepare for the vfio-user
> implementation. A previous version of these patches can be found at
> https://patchew.org/QEMU/20250430194003.2793823-1-john.levon@nutanix.com/
>
> The changes have been rebased on vfio-next, and include changes from previous
> series code review comments.
>
> An old version of the full vfio-user series can be found at
> https://lore.kernel.org/all/7dd34008-e0f1-4eed-a77e-55b1f68fbe69@redhat.com/T/
> ("[PATCH v8 00/28] vfio-user client"). Please see that series for justification
> and context.
>
> thanks
> john
>
> John Levon (15):
> vfio: add vfio_device_prepare()
> vfio: add vfio_device_unprepare()
> vfio: add vfio_attach_device_by_iommu_type()
> vfio: add vfio_device_get_irq_info() helper
> vfio: consistently handle return value for helpers
> vfio: add strread/writeerror()
> vfio: add vfio_pci_config_space_read/write()
> vfio: add unmap_all flag to DMA unmap callback
> vfio: implement unmap all for DMA unmap callbacks
> vfio: add device IO ops vector
> vfio: add region info cache
> vfio: add read/write to device IO ops vector
> vfio: add vfio-pci-base class
> vfio/container: pass listener_begin/commit callbacks
> vfio/container: pass MemoryRegion to DMA operations
>
> hw/vfio/pci.h | 10 +-
> include/hw/vfio/vfio-container-base.h | 21 ++-
> include/hw/vfio/vfio-device.h | 82 ++++++++
> include/system/memory.h | 4 +-
> hw/vfio/ap.c | 19 +-
> hw/vfio/ccw.c | 25 ++-
> hw/vfio/container-base.c | 14 +-
> hw/vfio/container.c | 62 ++++---
> hw/vfio/device.c | 183 ++++++++++++++++--
> hw/vfio/igd.c | 10 +-
> hw/vfio/iommufd.c | 35 ++--
> hw/vfio/listener.c | 82 +++++---
> hw/vfio/pci.c | 257 ++++++++++++++++----------
> hw/vfio/platform.c | 6 +-
> hw/vfio/region.c | 19 +-
> hw/virtio/vhost-vdpa.c | 2 +-
> system/memory.c | 7 +-
> 17 files changed, 603 insertions(+), 235 deletions(-)
>
I am waiting for an update of patch 12 to apply 01-14 to vfio-next.
Patch 15 should be addressed independently.
Thanks,
C.
* Re: [PATCH v3 12/15] vfio: add read/write to device IO ops vector
2025-05-09 10:14 ` Cédric Le Goater
@ 2025-05-09 10:32 ` John Levon
0 siblings, 0 replies; 27+ messages in thread
From: John Levon @ 2025-05-09 10:32 UTC (permalink / raw)
To: Cédric Le Goater
Cc: qemu-devel, Philippe Mathieu-Daudé, Halil Pasic,
Tomita Moeko, Matthew Rosato, Stefano Garzarella, Alex Williamson,
Peter Xu, Thomas Huth, Tony Krowiak, Michael S. Tsirkin,
Paolo Bonzini, Eric Farman, David Hildenbrand, qemu-s390x,
Jason Herne
On Fri, May 09, 2025 at 12:14:02PM +0200, Cédric Le Goater wrote:
> On 5/7/25 17:20, John Levon wrote:
> > Now we have the region info cache, add ->region_read/write device I/O
> > operations instead of explicit pread()/pwrite() system calls.
>
> No S-o-b. Please reply with one.
Apologies.
Signed-off-by: John Levon <john.levon@nutanix.com>
regards
john
* Re: [PATCH v3 00/15] vfio: preparation for vfio-user
2025-05-09 10:24 ` [PATCH v3 00/15] vfio: preparation for vfio-user Cédric Le Goater
@ 2025-05-09 12:45 ` Cédric Le Goater
0 siblings, 0 replies; 27+ messages in thread
From: Cédric Le Goater @ 2025-05-09 12:45 UTC (permalink / raw)
To: John Levon, qemu-devel
Cc: Philippe Mathieu-Daudé, Halil Pasic, Tomita Moeko,
Matthew Rosato, Stefano Garzarella, Alex Williamson, Peter Xu,
Thomas Huth, Tony Krowiak, Michael S. Tsirkin, Paolo Bonzini,
Eric Farman, David Hildenbrand, qemu-s390x, Jason Herne
On 5/9/25 12:24, Cédric Le Goater wrote:
> On 5/7/25 17:20, John Levon wrote:
>> Hi, this series is against the vfio-next tree:
>> https://github.com/legoater/qemu/commits/vfio-next
>>
>> The series contains patches to vfio to prepare for the vfio-user
>> implementation. A previous version of these patches can be found at
>> https://patchew.org/QEMU/20250430194003.2793823-1-john.levon@nutanix.com/
>>
>> The changes have been rebased on vfio-next, and include changes from previous
>> series code review comments.
>>
>> An old version of the full vfio-user series can be found at
>> https://lore.kernel.org/all/7dd34008-e0f1-4eed-a77e-55b1f68fbe69@redhat.com/T/
>> ("[PATCH v8 00/28] vfio-user client"). Please see that series for justification
>> and context.
>>
>> thanks
>> john
>>
>> John Levon (15):
>> vfio: add vfio_device_prepare()
>> vfio: add vfio_device_unprepare()
>> vfio: add vfio_attach_device_by_iommu_type()
>> vfio: add vfio_device_get_irq_info() helper
>> vfio: consistently handle return value for helpers
>> vfio: add strread/writeerror()
>> vfio: add vfio_pci_config_space_read/write()
>> vfio: add unmap_all flag to DMA unmap callback
>> vfio: implement unmap all for DMA unmap callbacks
>> vfio: add device IO ops vector
>> vfio: add region info cache
>> vfio: add read/write to device IO ops vector
>> vfio: add vfio-pci-base class
>> vfio/container: pass listener_begin/commit callbacks
>> vfio/container: pass MemoryRegion to DMA operations
>>
>> hw/vfio/pci.h | 10 +-
>> include/hw/vfio/vfio-container-base.h | 21 ++-
>> include/hw/vfio/vfio-device.h | 82 ++++++++
>> include/system/memory.h | 4 +-
>> hw/vfio/ap.c | 19 +-
>> hw/vfio/ccw.c | 25 ++-
>> hw/vfio/container-base.c | 14 +-
>> hw/vfio/container.c | 62 ++++---
>> hw/vfio/device.c | 183 ++++++++++++++++--
>> hw/vfio/igd.c | 10 +-
>> hw/vfio/iommufd.c | 35 ++--
>> hw/vfio/listener.c | 82 +++++---
>> hw/vfio/pci.c | 257 ++++++++++++++++----------
>> hw/vfio/platform.c | 6 +-
>> hw/vfio/region.c | 19 +-
>> hw/virtio/vhost-vdpa.c | 2 +-
>> system/memory.c | 7 +-
>> 17 files changed, 603 insertions(+), 235 deletions(-)
>>
>
> I am waiting for an update of patch 12 to apply 01-14 to vfio-next.
Applied 01-14 to vfio-next.
Thanks,
C.
end of thread (newest message: 2025-05-09 12:47 UTC)
Thread overview: 27+ messages
2025-05-07 15:20 [PATCH v3 00/15] vfio: preparation for vfio-user John Levon
2025-05-07 15:20 ` [PATCH v3 01/15] vfio: add vfio_device_prepare() John Levon
2025-05-07 15:20 ` [PATCH v3 02/15] vfio: add vfio_device_unprepare() John Levon
2025-05-07 15:20 ` [PATCH v3 03/15] vfio: add vfio_attach_device_by_iommu_type() John Levon
2025-05-07 15:20 ` [PATCH v3 04/15] vfio: add vfio_device_get_irq_info() helper John Levon
2025-05-07 15:20 ` [PATCH v3 05/15] vfio: consistently handle return value for helpers John Levon
2025-05-07 15:20 ` [PATCH v3 06/15] vfio: add strread/writeerror() John Levon
2025-05-09 10:05 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 07/15] vfio: add vfio_pci_config_space_read/write() John Levon
2025-05-07 15:20 ` [PATCH v3 08/15] vfio: add unmap_all flag to DMA unmap callback John Levon
2025-05-09 10:07 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
2025-05-09 10:08 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 10/15] vfio: add device IO ops vector John Levon
2025-05-09 10:09 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 11/15] vfio: add region info cache John Levon
2025-05-09 10:09 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 12/15] vfio: add read/write to device IO ops vector John Levon
2025-05-09 10:14 ` Cédric Le Goater
2025-05-09 10:32 ` John Levon
2025-05-07 15:20 ` [PATCH v3 13/15] vfio: add vfio-pci-base class John Levon
2025-05-09 10:14 ` Cédric Le Goater
2025-05-07 15:20 ` [PATCH v3 14/15] vfio/container: pass listener_begin/commit callbacks John Levon
2025-05-07 15:20 ` [PATCH v3 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
2025-05-09 10:22 ` Cédric Le Goater
2025-05-09 10:24 ` [PATCH v3 00/15] vfio: preparation for vfio-user Cédric Le Goater
2025-05-09 12:45 ` Cédric Le Goater