* [PATCH v2 00/15] vfio: preparation for vfio-user
From: John Levon @ 2025-04-30 19:39 UTC
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

Hi, this series is against the vfio-next tree:
https://github.com/legoater/qemu/commits/vfio-next

This series contains vfio changes in preparation for the vfio-user
implementation. A previous version of these patches can be found at
https://lore.kernel.org/all/20250409134814.478903-1-john.levon@nutanix.com/

The changes have been rebased on vfio-next and address the code review
comments on the previous series.

An old version of the full vfio-user series can be found at
https://lore.kernel.org/all/7dd34008-e0f1-4eed-a77e-55b1f68fbe69@redhat.com/T/
("[PATCH v8 00/28] vfio-user client"). Please see that series for justification
and context.

thanks
john

John Levon (15):
  vfio: add vfio_prepare_device()
  vfio: add vfio_unprepare_device()
  vfio: add vfio_attach_device_by_iommu_type()
  vfio: add vfio_device_get_irq_info() helper
  vfio: consistently handle return value for helpers
  include/qemu: add strread/writeerror()
  vfio: add vfio_pci_config_space_read/write()
  vfio: add unmap_all flag to DMA unmap callback
  vfio: implement unmap all for DMA unmap callbacks
  vfio: add device IO ops vector
  vfio: add region info cache
  vfio: add read/write to device IO ops vector
  vfio: add vfio-pci-base class
  vfio/container: pass listener_begin/commit callbacks
  vfio/container: pass MemoryRegion to DMA operations

 hw/vfio/ap.c                          |  19 +-
 hw/vfio/ccw.c                         |  25 ++-
 hw/vfio/container-base.c              |  14 +-
 hw/vfio/container.c                   |  66 ++++---
 hw/vfio/device.c                      | 192 +++++++++++++++++--
 hw/vfio/igd.c                         |   8 +-
 hw/vfio/iommufd.c                     |  35 ++--
 hw/vfio/listener.c                    |  82 +++++---
 hw/vfio/pci.c                         | 257 ++++++++++++++++----------
 hw/vfio/pci.h                         |  12 +-
 hw/vfio/platform.c                    |   6 +-
 hw/vfio/region.c                      |  19 +-
 hw/virtio/vhost-vdpa.c                |   2 +-
 include/hw/vfio/vfio-container-base.h |  10 +-
 include/hw/vfio/vfio-device.h         |  67 +++++++
 include/qemu/error-report.h           |  14 ++
 include/system/memory.h               |   4 +-
 system/memory.c                       |   7 +-
 18 files changed, 604 insertions(+), 235 deletions(-)

-- 
2.43.0




* [PATCH v2 01/15] vfio: add vfio_prepare_device()
From: John Levon @ 2025-04-30 19:39 UTC
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

Factor out initialization code shared by the legacy and iommufd vfio
backends into a common helper.
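
For illustration only, here is a standalone sketch of what the common helper does. The types (sketch_device, sketch_container, sketch_dev_info, SKETCH_FLAGS_RESET) are hypothetical, simplified stand-ins for VFIODevice, VFIOContainerBase, struct vfio_device_info and VFIO_DEVICE_FLAGS_RESET, and plain singly linked lists replace QEMU's QLIST macros:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for VFIO_DEVICE_FLAGS_RESET. */
#define SKETCH_FLAGS_RESET (1u << 0)

/* Simplified stand-in for struct vfio_device_info. */
struct sketch_dev_info {
    unsigned int flags;
    unsigned int num_regions;
    unsigned int num_irqs;
};

struct sketch_device;

/* Simplified stand-in for VFIOContainerBase. */
struct sketch_container {
    struct sketch_device *device_list;
};

/* Simplified stand-in for VFIODevice. */
struct sketch_device {
    unsigned int flags;
    unsigned int num_regions;
    unsigned int num_irqs;
    bool reset_works;
    struct sketch_container *container;
    struct sketch_device *container_next; /* link in container->device_list */
    struct sketch_device *global_next;    /* link in sketch_global_list */
};

/* Stand-in for the global vfio_device_list. */
static struct sketch_device *sketch_global_list;

/*
 * The setup that used to be duplicated in both backends: copy the
 * kernel-reported device info, record the owning container, and push the
 * device onto the head of the per-container and global lists.
 */
static void sketch_device_prepare(struct sketch_device *dev,
                                  struct sketch_container *container,
                                  const struct sketch_dev_info *info)
{
    dev->num_irqs = info->num_irqs;
    dev->num_regions = info->num_regions;
    dev->flags = info->flags;
    dev->reset_works = (info->flags & SKETCH_FLAGS_RESET) != 0;

    dev->container = container;
    dev->container_next = container->device_list;
    container->device_list = dev;

    dev->global_next = sketch_global_list;
    sketch_global_list = dev;
}
```

Since both backends call the same function, another backend (such as the planned vfio-user one) could reuse it once it has filled in its own info structure.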

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/container.c           | 14 ++------------
 hw/vfio/device.c              | 14 ++++++++++++++
 hw/vfio/iommufd.c             |  9 +--------
 include/hw/vfio/vfio-device.h |  3 +++
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 77ff56b43f..aa9d5b731b 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -811,18 +811,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
         }
     }
 
+    vfio_device_prepare(vbasedev, &group->container->bcontainer, info);
+
     vbasedev->fd = fd;
     vbasedev->group = group;
     QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
 
-    vbasedev->num_irqs = info->num_irqs;
-    vbasedev->num_regions = info->num_regions;
-    vbasedev->flags = info->flags;
-
     trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
 
-    vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET);
-
     return true;
 }
 
@@ -875,7 +871,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
     int groupid = vfio_device_get_groupid(vbasedev, errp);
     VFIODevice *vbasedev_iter;
     VFIOGroup *group;
-    VFIOContainerBase *bcontainer;
 
     if (groupid < 0) {
         return false;
@@ -904,11 +899,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
         goto device_put_exit;
     }
 
-    bcontainer = &group->container->bcontainer;
-    vbasedev->bcontainer = bcontainer;
-    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
-    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
-
     return true;
 
 device_put_exit:
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index d625a7c4db..f3b9902d21 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -398,3 +398,17 @@ void vfio_device_detach(VFIODevice *vbasedev)
     }
     VFIO_IOMMU_GET_CLASS(vbasedev->bcontainer)->detach_device(vbasedev);
 }
+
+void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
+                         struct vfio_device_info *info)
+{
+    vbasedev->num_irqs = info->num_irqs;
+    vbasedev->num_regions = info->num_regions;
+    vbasedev->flags = info->flags;
+    vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET);
+
+    vbasedev->bcontainer = bcontainer;
+    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
+
+    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 232c06dd15..83033c352a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -588,14 +588,7 @@ found_container:
         iommufd_cdev_ram_block_discard_disable(false);
     }
 
-    vbasedev->group = 0;
-    vbasedev->num_irqs = dev_info.num_irqs;
-    vbasedev->num_regions = dev_info.num_regions;
-    vbasedev->flags = dev_info.flags;
-    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
-    vbasedev->bcontainer = bcontainer;
-    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
-    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+    vfio_device_prepare(vbasedev, bcontainer, &dev_info);
 
     trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
                                    vbasedev->num_regions, vbasedev->flags);
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 81c95bb51e..9cb5671ab5 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -130,6 +130,9 @@ bool vfio_device_attach(char *name, VFIODevice *vbasedev,
 void vfio_device_detach(VFIODevice *vbasedev);
 VFIODevice *vfio_get_vfio_device(Object *obj);
 
+void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
+                         struct vfio_device_info *info);
+
 typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIODeviceList vfio_device_list;
 
-- 
2.43.0




* [PATCH v2 02/15] vfio: add vfio_unprepare_device()
From: John Levon @ 2025-04-30 19:39 UTC
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

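Add a helper that is the inverse of vfio_device_prepare(): it unlinks
the device from the container and global device lists and clears its
container pointer.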
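
A standalone sketch of the prepare/unprepare pairing, for illustration only; all names are hypothetical. The list macros imitate the BSD-style design that QEMU's QLIST_* macros follow: each element stores the address of the pointer that currently points at it, so it can be unlinked in constant time without walking the list.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_device;

/* Link: next element, plus the address of the pointer pointing at us. */
struct sketch_link {
    struct sketch_device *next;
    struct sketch_device **prev;
};

struct sketch_container {
    struct sketch_device *device_list;
};

struct sketch_device {
    struct sketch_container *container;
    struct sketch_link container_link;
    struct sketch_link global_link;
};

static struct sketch_device *sketch_global_list;

#define SKETCH_INSERT_HEAD(head, elm, field) do {              \
    (elm)->field.next = *(head);                                \
    if (*(head) != NULL) {                                      \
        (*(head))->field.prev = &(elm)->field.next;             \
    }                                                           \
    *(head) = (elm);                                            \
    (elm)->field.prev = (head);                                 \
} while (0)

#define SKETCH_REMOVE(elm, field) do {                          \
    if ((elm)->field.next != NULL) {                            \
        (elm)->field.next->field.prev = (elm)->field.prev;      \
    }                                                           \
    *(elm)->field.prev = (elm)->field.next;                     \
} while (0)

static void sketch_device_prepare(struct sketch_device *dev,
                                  struct sketch_container *container)
{
    dev->container = container;
    SKETCH_INSERT_HEAD(&container->device_list, dev, container_link);
    SKETCH_INSERT_HEAD(&sketch_global_list, dev, global_link);
}

/* Inverse of sketch_device_prepare(): unlink from both lists. */
static void sketch_device_unprepare(struct sketch_device *dev)
{
    SKETCH_REMOVE(dev, container_link);
    SKETCH_REMOVE(dev, global_link);
    dev->container = NULL;
}
```

The unprepare side has to undo both insertions; removing the device from only one list would leave a stale pointer in the other.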

Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/container.c           | 6 +++---
 hw/vfio/device.c              | 7 +++++++
 hw/vfio/iommufd.c             | 4 +---
 include/hw/vfio/vfio-device.h | 1 +
 4 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index aa9d5b731b..1dfdc312bd 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -912,10 +912,10 @@ static void vfio_legacy_detach_device(VFIODevice *vbasedev)
 {
     VFIOGroup *group = vbasedev->group;
 
-    QLIST_REMOVE(vbasedev, global_next);
-    QLIST_REMOVE(vbasedev, container_next);
-    vbasedev->bcontainer = NULL;
     trace_vfio_device_detach(vbasedev->name, group->groupid);
+
+    vfio_device_unprepare(vbasedev);
+
     object_unref(vbasedev->hiod);
     vfio_device_put(vbasedev);
     vfio_group_put(group);
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index f3b9902d21..31c441a3df 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -412,3 +412,10 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
 
     QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
 }
+
+void vfio_device_unprepare(VFIODevice *vbasedev)
+{
+    QLIST_REMOVE(vbasedev, container_next);
+    QLIST_REMOVE(vbasedev, global_next);
+    vbasedev->bcontainer = NULL;
+}
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 83033c352a..62ecb758f1 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -615,9 +615,7 @@ static void iommufd_cdev_detach(VFIODevice *vbasedev)
     VFIOIOMMUFDContainer *container = container_of(bcontainer,
                                                    VFIOIOMMUFDContainer,
                                                    bcontainer);
-    QLIST_REMOVE(vbasedev, global_next);
-    QLIST_REMOVE(vbasedev, container_next);
-    vbasedev->bcontainer = NULL;
+    vfio_device_unprepare(vbasedev);
 
     if (!vbasedev->ram_block_discard_allowed) {
         iommufd_cdev_ram_block_discard_disable(false);
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 9cb5671ab5..6d2a112734 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -132,6 +132,7 @@ VFIODevice *vfio_get_vfio_device(Object *obj);
 
 void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
                          struct vfio_device_info *info);
+void vfio_device_unprepare(VFIODevice *vbasedev);
 
 typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIODeviceList vfio_device_list;
-- 
2.43.0




* [PATCH v2 03/15] vfio: add vfio_attach_device_by_iommu_type()
From: John Levon @ 2025-04-30 19:39 UTC
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

Allow attachment by explicitly passing a TYPE_VFIO_IOMMU_* string;
vfio-user will use this later.
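
A self-contained sketch of the dispatch, for illustration. The type-name strings are hypothetical, and a small lookup table stands in for QOM's object_class_by_name(). The existing entry point still chooses legacy or iommufd, while a new backend can request its own type by name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the TYPE_VFIO_IOMMU_* type names. */
#define SKETCH_IOMMU_LEGACY  "sketch-iommu-legacy"
#define SKETCH_IOMMU_IOMMUFD "sketch-iommu-iommufd"
#define SKETCH_IOMMU_USER    "sketch-iommu-user"   /* hypothetical vfio-user */

struct sketch_device {
    bool iommufd;             /* stand-in for "vbasedev->iommufd != NULL" */
    const char *attached_by;  /* which backend attached the device */
};

/* Stand-in for VFIOIOMMUClass, reduced to its attach hook. */
struct sketch_iommu_class {
    const char *name;
    bool (*attach_device)(struct sketch_device *dev);
};

static bool legacy_attach(struct sketch_device *dev)
{
    dev->attached_by = SKETCH_IOMMU_LEGACY;
    return true;
}

static bool iommufd_attach(struct sketch_device *dev)
{
    dev->attached_by = SKETCH_IOMMU_IOMMUFD;
    return true;
}

static bool user_attach(struct sketch_device *dev)
{
    dev->attached_by = SKETCH_IOMMU_USER;
    return true;
}

static const struct sketch_iommu_class sketch_classes[] = {
    { SKETCH_IOMMU_LEGACY, legacy_attach },
    { SKETCH_IOMMU_IOMMUFD, iommufd_attach },
    { SKETCH_IOMMU_USER, user_attach },
};

/* Stand-in for object_class_by_name(): NULL if no class has that name. */
static const struct sketch_iommu_class *sketch_class_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof(sketch_classes) / sizeof(sketch_classes[0]); i++) {
        if (strcmp(sketch_classes[i].name, name) == 0) {
            return &sketch_classes[i];
        }
    }
    return NULL;
}

/* New entry point: the caller names the IOMMU backend explicitly. */
static bool sketch_attach_by_iommu_type(const char *iommu_type,
                                        struct sketch_device *dev)
{
    const struct sketch_iommu_class *ops = sketch_class_by_name(iommu_type);

    assert(ops);
    return ops->attach_device(dev);
}

/* Old entry point: unchanged behaviour, now a thin wrapper. */
static bool sketch_attach(struct sketch_device *dev)
{
    const char *iommu_type = dev->iommufd ? SKETCH_IOMMU_IOMMUFD
                                          : SKETCH_IOMMU_LEGACY;

    return sketch_attach_by_iommu_type(iommu_type, dev);
}
```

Keeping the old function as a thin wrapper means existing callers need no changes.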

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/device.c              | 22 +++++++++++++++-------
 include/hw/vfio/vfio-device.h |  3 +++
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 31c441a3df..9673b0717e 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -376,21 +376,29 @@ VFIODevice *vfio_get_vfio_device(Object *obj)
     }
 }
 
-bool vfio_device_attach(char *name, VFIODevice *vbasedev,
-                        AddressSpace *as, Error **errp)
+bool vfio_device_attach_by_iommu_type(const char *iommu_type, char *name,
+                                      VFIODevice *vbasedev, AddressSpace *as,
+                                      Error **errp)
 {
     const VFIOIOMMUClass *ops =
-        VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_LEGACY));
-
-    if (vbasedev->iommufd) {
-        ops = VFIO_IOMMU_CLASS(object_class_by_name(TYPE_VFIO_IOMMU_IOMMUFD));
-    }
+        VFIO_IOMMU_CLASS(object_class_by_name(iommu_type));
 
     assert(ops);
 
     return ops->attach_device(name, vbasedev, as, errp);
 }
 
+bool vfio_device_attach(char *name, VFIODevice *vbasedev,
+                        AddressSpace *as, Error **errp)
+{
+    const char *iommu_type = vbasedev->iommufd ?
+                             TYPE_VFIO_IOMMU_IOMMUFD :
+                             TYPE_VFIO_IOMMU_LEGACY;
+
+    return vfio_device_attach_by_iommu_type(iommu_type, name, vbasedev,
+                                            as, errp);
+}
+
 void vfio_device_detach(VFIODevice *vbasedev)
 {
     if (!vbasedev->bcontainer) {
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 6d2a112734..666a0b50b4 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -127,6 +127,9 @@ bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
                                          const char *typename, Error **errp);
 bool vfio_device_attach(char *name, VFIODevice *vbasedev,
                         AddressSpace *as, Error **errp);
+bool vfio_device_attach_by_iommu_type(const char *iommu_type, char *name,
+                                      VFIODevice *vbasedev, AddressSpace *as,
+                                      Error **errp);
 void vfio_device_detach(VFIODevice *vbasedev);
 VFIODevice *vfio_get_vfio_device(Object *obj);
 
-- 
2.43.0




* [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper
From: John Levon @ 2025-04-30 19:39 UTC
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

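Add a helper similar to vfio_device_get_region_info() and use it at
every VFIO_DEVICE_GET_IRQ_INFO call site.

Replace a couple of needless allocations with stack variables.

As a side-effect, this fixes a minor error reporting issue in
vfio_msix_early_setup(), which passed the negated ioctl() return value
(always 1 on failure) to error_setg_errno() rather than the actual errno.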
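
A sketch of the helper pattern, assuming a hypothetical transport callback in place of the real ioctl(fd, VFIO_DEVICE_GET_IRQ_INFO, info) so that it runs without a VFIO device. sketch_irq_info mirrors the field layout of struct vfio_irq_info. Returning -ENOENT for a zero count in sketch_check_irq() is an illustrative choice only; the patch reports that case with error_setg() and no errno.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the field layout of struct vfio_irq_info. */
struct sketch_irq_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t count;
};

/* Hypothetical transport hook: like ioctl(), returns -1 and sets errno. */
typedef int (*sketch_get_irq_info_fn)(struct sketch_irq_info *info);

/* Returns 0 on success or a negative errno; *info is always initialized. */
static int sketch_device_get_irq_info(sketch_get_irq_info_fn transport,
                                      int index, struct sketch_irq_info *info)
{
    int ret;

    memset(info, 0, sizeof(*info));
    info->argsz = sizeof(*info);
    info->index = index;

    ret = transport(info);

    return ret < 0 ? -errno : ret;
}

/* Fake transport: index 0 has one interrupt, every other index has none. */
static int fake_ok(struct sketch_irq_info *info)
{
    info->count = info->index == 0 ? 1 : 0;
    return 0;
}

/* Fake transport that always fails with ENODEV. */
static int fake_fail(struct sketch_irq_info *info)
{
    (void)info;
    errno = ENODEV;
    return -1;
}

/* Caller pattern: transport errors and "count < 1" are handled separately. */
static int sketch_check_irq(sketch_get_irq_info_fn transport, int index)
{
    struct sketch_irq_info info;
    int ret = sketch_device_get_irq_info(transport, index, &info);

    if (ret < 0) {
        return ret;        /* a real caller would report strerror(-ret) */
    }
    if (info.count < 1) {
        return -ENOENT;    /* illustrative choice for "count=0" */
    }
    return 0;
}
```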

Reviewed-by: Cédric Le Goater <clg@redhat.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/ap.c                  | 19 ++++++++++---------
 hw/vfio/ccw.c                 | 20 +++++++++++---------
 hw/vfio/device.c              | 15 +++++++++++++++
 hw/vfio/pci.c                 | 23 +++++++++++------------
 hw/vfio/platform.c            |  6 +++---
 include/hw/vfio/vfio-device.h |  3 +++
 6 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
index 4f88f80c54..4f97260dac 100644
--- a/hw/vfio/ap.c
+++ b/hw/vfio/ap.c
@@ -139,10 +139,10 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
                                           unsigned int irq, Error **errp)
 {
     int fd;
-    size_t argsz;
+    int ret;
     IOHandler *fd_read;
     EventNotifier *notifier;
-    g_autofree struct vfio_irq_info *irq_info = NULL;
+    struct vfio_irq_info irq_info;
     VFIODevice *vdev = &vapdev->vdev;
 
     switch (irq) {
@@ -165,14 +165,15 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
         return false;
     }
 
-    argsz = sizeof(*irq_info);
-    irq_info = g_malloc0(argsz);
-    irq_info->index = irq;
-    irq_info->argsz = argsz;
+    ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
+
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "vfio: Error getting irq info");
+        return false;
+    }
 
-    if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
-              irq_info) < 0 || irq_info->count < 1) {
-        error_setg_errno(errp, errno, "vfio: Error getting irq info");
+    if (irq_info.count < 1) {
+        error_setg(errp, "vfio: Error getting irq info, count=0");
         return false;
     }
 
diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index fde0c3fbef..ab3fabf991 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -376,8 +376,8 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
                                            Error **errp)
 {
     VFIODevice *vdev = &vcdev->vdev;
-    g_autofree struct vfio_irq_info *irq_info = NULL;
-    size_t argsz;
+    struct vfio_irq_info irq_info;
+    int ret;
     int fd;
     EventNotifier *notifier;
     IOHandler *fd_read;
@@ -406,13 +406,15 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
         return false;
     }
 
-    argsz = sizeof(*irq_info);
-    irq_info = g_malloc0(argsz);
-    irq_info->index = irq;
-    irq_info->argsz = argsz;
-    if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
-              irq_info) < 0 || irq_info->count < 1) {
-        error_setg_errno(errp, errno, "vfio: Error getting irq info");
+    ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
+
+    if (ret < 0) {
+        error_setg_errno(errp, -ret, "vfio: Error getting irq info");
+        return false;
+    }
+
+    if (irq_info.count < 1) {
+        error_setg(errp, "vfio: Error getting irq info, count=0");
         return false;
     }
 
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 9673b0717e..5d837092cb 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -185,6 +185,21 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
     return false;
 }
 
+int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
+                             struct vfio_irq_info *info)
+{
+    int ret;
+
+    memset(info, 0, sizeof(*info));
+
+    info->argsz = sizeof(*info);
+    info->index = index;
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
+
+    return ret < 0 ? -errno : ret;
+}
+
 int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
                                 struct vfio_region_info **info)
 {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6908bcc0d3..407cf43387 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -1555,8 +1555,7 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
     uint16_t ctrl;
     uint32_t table, pba;
     int ret, fd = vdev->vbasedev.fd;
-    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
-                                      .index = VFIO_PCI_MSIX_IRQ_INDEX };
+    struct vfio_irq_info irq_info;
     VFIOMSIXInfo *msix;
 
     pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
@@ -1593,7 +1592,8 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
     msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
     msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
 
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX,
+                                   &irq_info);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
         g_free(msix);
@@ -2736,7 +2736,7 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
     VFIODevice *vbasedev = &vdev->vbasedev;
     g_autofree struct vfio_region_info *reg_info = NULL;
-    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
+    struct vfio_irq_info irq_info;
     int i, ret = -1;
 
     /* Sanity check device */
@@ -2797,12 +2797,10 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
         }
     }
 
-    irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
-
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
+    ret = vfio_device_get_irq_info(vbasedev, VFIO_PCI_ERR_IRQ_INDEX, &irq_info);
     if (ret) {
         /* This can fail for an old kernel or legacy PCI dev */
-        trace_vfio_populate_device_get_irq_info_failure(strerror(errno));
+        trace_vfio_populate_device_get_irq_info_failure(strerror(-ret));
     } else if (irq_info.count == 1) {
         vdev->pci_aer = true;
     } else {
@@ -2911,17 +2909,18 @@ static void vfio_req_notifier_handler(void *opaque)
 
 static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
 {
-    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
-                                      .index = VFIO_PCI_REQ_IRQ_INDEX };
+    struct vfio_irq_info irq_info;
     Error *err = NULL;
     int32_t fd;
+    int ret;
 
     if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
         return;
     }
 
-    if (ioctl(vdev->vbasedev.fd,
-              VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count < 1) {
+    ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX,
+                                   &irq_info);
+    if (ret < 0 || irq_info.count < 1) {
         return;
     }
 
diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
index ffb3681607..9a21f2e50a 100644
--- a/hw/vfio/platform.c
+++ b/hw/vfio/platform.c
@@ -474,10 +474,10 @@ static bool vfio_populate_device(VFIODevice *vbasedev, Error **errp)
     QSIMPLEQ_INIT(&vdev->pending_intp_queue);
 
     for (i = 0; i < vbasedev->num_irqs; i++) {
-        struct vfio_irq_info irq = { .argsz = sizeof(irq) };
+        struct vfio_irq_info irq;
+
+        ret = vfio_device_get_irq_info(vbasedev, i, &irq);
 
-        irq.index = i;
-        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
         if (ret) {
             error_setg_errno(errp, -ret, "failed to get device irq info");
             goto irq_err;
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 666a0b50b4..5b833868c9 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -146,6 +146,9 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
 int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
                                      uint32_t subtype, struct vfio_region_info **info);
 bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
+
+int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
+                             struct vfio_irq_info *info);
 #endif
 
 /* Returns 0 on success, or a negative errno. */
-- 
2.43.0




* [PATCH v2 05/15] vfio: consistently handle return value for helpers
From: John Levon @ 2025-04-30 19:39 UTC
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

Various bits of code that call vfio device APIs should consistently use
the "return -errno" approach for passing errors back, rather than
presuming errno is (still) set correctly.
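
A minimal illustration of why the value is captured as -errno at the point of failure: errno is only meaningful immediately after the failing call, and any cleanup that runs afterwards executes more library code before the caller can inspect it. The function names are hypothetical, and pread() on an invalid descriptor is used to force a failure.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Reads via a bounce buffer; returns bytes read or -errno. */
static ssize_t sketch_read_at(int fd, void *dst, size_t len, off_t offset)
{
    char *bounce = malloc(len ? len : 1);
    ssize_t ret;

    if (bounce == NULL) {
        return -ENOMEM;
    }

    ret = pread(fd, bounce, len, offset);
    if (ret < 0) {
        /*
         * Capture errno now: it is only meaningful right after the failing
         * call, and the cleanup below runs more library code first.
         */
        ret = -errno;
    } else {
        memcpy(dst, bounce, (size_t)ret);
    }

    free(bounce);
    return ret;
}

/* Caller side: the error travels in the return value, not in errno. */
static const char *sketch_describe(ssize_t ret)
{
    return ret < 0 ? strerror((int)-ret) : "ok";
}
```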

Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/pci.c | 33 ++++++++++++++++++++-------------
 1 file changed, 20 insertions(+), 13 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 407cf43387..768c48d7ad 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -398,7 +398,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
 
     ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
 
-    return ret;
+    return ret < 0 ? -errno : ret;
 }
 
 static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
@@ -459,7 +459,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
 
     g_free(irq_set);
 
-    return ret;
+    return ret < 0 ? -errno : ret;
 }
 
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
@@ -581,7 +581,8 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
             vfio_device_irq_disable(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
             ret = vfio_enable_vectors(vdev, true);
             if (ret) {
-                error_report("vfio: failed to enable vectors, %d", ret);
+                error_report("vfio: failed to enable vectors, %s",
+                             strerror(-ret));
             }
         } else {
             Error *err = NULL;
@@ -695,7 +696,8 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
     if (vdev->nr_vectors) {
         ret = vfio_enable_vectors(vdev, true);
         if (ret) {
-            error_report("vfio: failed to enable vectors, %d", ret);
+            error_report("vfio: failed to enable vectors, %s",
+                         strerror(-ret));
         }
     } else {
         /*
@@ -712,7 +714,8 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
          */
         ret = vfio_enable_msix_no_vec(vdev);
         if (ret) {
-            error_report("vfio: failed to enable MSI-X, %d", ret);
+            error_report("vfio: failed to enable MSI-X, %s",
+                         strerror(-ret));
         }
     }
 
@@ -765,7 +768,8 @@ retry:
     ret = vfio_enable_vectors(vdev, false);
     if (ret) {
         if (ret < 0) {
-            error_report("vfio: Error: Failed to setup MSI fds: %m");
+            error_report("vfio: Error: Failed to setup MSI fds: %s",
+                         strerror(-ret));
         } else {
             error_report("vfio: Error: Failed to enable %d "
                          "MSI vectors, retry with %d", vdev->nr_vectors, ret);
@@ -882,17 +886,21 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
     g_autofree struct vfio_region_info *reg_info = NULL;
+    VFIODevice *vbasedev = &vdev->vbasedev;
     uint64_t size;
     off_t off = 0;
     ssize_t bytes;
+    int ret;
+
+    ret = vfio_device_get_region_info(vbasedev, VFIO_PCI_ROM_REGION_INDEX,
+                                      &reg_info);
 
-    if (vfio_device_get_region_info(&vdev->vbasedev,
-                                    VFIO_PCI_ROM_REGION_INDEX, &reg_info)) {
-        error_report("vfio: Error getting ROM info: %m");
+    if (ret != 0) {
+        error_report("vfio: Error getting ROM info: %s", strerror(-ret));
         return;
     }
 
-    trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info->size,
+    trace_vfio_pci_load_rom(vbasedev->name, (unsigned long)reg_info->size,
                             (unsigned long)reg_info->offset,
                             (unsigned long)reg_info->flags);
 
@@ -901,8 +909,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 
     if (!vdev->rom_size) {
         vdev->rom_read_failed = true;
-        error_report("vfio-pci: Cannot read device rom at "
-                    "%s", vdev->vbasedev.name);
+        error_report("vfio-pci: Cannot read device rom at %s", vbasedev->name);
         error_printf("Device option ROM contents are probably invalid "
                     "(check dmesg).\nSkip option ROM probe with rombar=0, "
                     "or load from file with romfile=\n");
@@ -913,7 +920,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     memset(vdev->rom, 0xff, size);
 
     while (size) {
-        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
+        bytes = pread(vbasedev->fd, vdev->rom + off,
                       size, vdev->rom_offset + off);
         if (bytes == 0) {
             break;
-- 
2.43.0




* [PATCH v2 06/15] include/qemu: add strread/writeerror()
From: John Levon @ 2025-04-30 19:39 UTC
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

Add simple helpers that turn the result of a read/write routine using
the "return -errno" style into a meaningful error message: strerror()
for a negative errno, or "short read"/"short write" otherwise.
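
A short usage sketch. The macros are local, hypothetically named copies of the helpers, with every use of the argument parenthesized so that an expression such as n - expected expands correctly; sketch_check_read() is a hypothetical caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

#define sketch_strreaderror(ret) \
    ((ret) < 0 ? strerror(-(ret)) : "short read")
#define sketch_strwriteerror(ret) \
    ((ret) < 0 ? strerror(-(ret)) : "short write")

/*
 * Hypothetical caller: ret is the result of a -errno style read of len
 * bytes. Returns 0 on a full read, otherwise 1 with a message in msg.
 */
static int sketch_check_read(int ret, int len, char *msg, size_t msglen)
{
    if (ret == len) {
        return 0;
    }
    snprintf(msg, msglen, "config read failed: %s", sketch_strreaderror(ret));
    return 1;
}
```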

Signed-off-by: John Levon <john.levon@nutanix.com>
---
 include/qemu/error-report.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h
index 3ae2357fda..67afe5a020 100644
--- a/include/qemu/error-report.h
+++ b/include/qemu/error-report.h
@@ -70,6 +70,20 @@ void error_init(const char *argv0);
                               fmt, ##__VA_ARGS__);      \
     })
 
+/*
+ * Given a return value of either a short number of bytes read or -errno,
+ * construct a meaningful error message.
+ */
+#define strreaderror(ret) \
+    ((ret) < 0 ? strerror(-(ret)) : "short read")
+
+/*
+ * Given a return value of either a short number of bytes written or -errno,
+ * construct a meaningful error message.
+ */
+#define strwriteerror(ret) \
+    ((ret) < 0 ? strerror(-(ret)) : "short write")
+
 extern bool message_with_timestamp;
 extern bool error_with_guestname;
 extern const char *error_guest_name;
-- 
2.43.0




* [PATCH v2 07/15] vfio: add vfio_pci_config_space_read/write()
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (5 preceding siblings ...)
  2025-04-30 19:39 ` [PATCH v2 06/15] include/qemu: add strread/writeerror() John Levon
@ 2025-04-30 19:39 ` John Levon
  2025-05-05  9:45   ` Cédric Le Goater
  2025-04-30 19:39 ` [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback John Levon
                   ` (8 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

Add helpers that access config space and return either the number of
bytes transferred or -errno.
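
A minimal sketch of the same pread()/pwrite() wrapper pattern outside QEMU.
struct fake_pci_dev is a hypothetical stand-in for VFIOPCIDevice holding only
the fd and the offset of the config space region; the offset passed by the
caller is relative to the start of config space, as in the helpers below.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for VFIOPCIDevice. */
struct fake_pci_dev {
    int fd;                /* device (or region) file descriptor */
    off_t config_offset;   /* where config space starts within fd */
};

/* Read "size" bytes at config-space "offset": bytes read or -errno. */
static int config_space_read(struct fake_pci_dev *dev, off_t offset,
                             uint32_t size, void *data)
{
    ssize_t ret = pread(dev->fd, data, size, dev->config_offset + offset);

    /* Narrowing to int is fine for config-space sized accesses. */
    return ret < 0 ? -errno : (int)ret;
}

/* Write "size" bytes at config-space "offset": bytes written or -errno. */
static int config_space_write(struct fake_pci_dev *dev, off_t offset,
                              uint32_t size, const void *data)
{
    ssize_t ret = pwrite(dev->fd, data, size, dev->config_offset + offset);

    return ret < 0 ? -errno : (int)ret;
}
```

Callers compare the result against the requested size: a negative value
carries the errno, while a smaller positive value is a short transfer.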

Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/pci.c | 123 ++++++++++++++++++++++++++++++++------------------
 1 file changed, 80 insertions(+), 43 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 768c48d7ad..8455010d62 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -967,6 +967,28 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     }
 }
 
+/* "Raw" read of underlying config space. */
+static int vfio_pci_config_space_read(VFIOPCIDevice *vdev, off_t offset,
+                                      uint32_t size, void *data)
+{
+    ssize_t ret;
+
+    ret = pread(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
+
+    return ret < 0 ? -errno : (int)ret;
+}
+
+/* "Raw" write of underlying config space. */
+static int vfio_pci_config_space_write(VFIOPCIDevice *vdev, off_t offset,
+                                       uint32_t size, void *data)
+{
+    ssize_t ret;
+
+    ret = pwrite(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
+
+    return ret < 0 ? -errno : (int)ret;
+}
+
 static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
 {
     VFIOPCIDevice *vdev = opaque;
@@ -1019,10 +1041,9 @@ static const MemoryRegionOps vfio_rom_ops = {
 
 static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
 {
+    VFIODevice *vbasedev = &vdev->vbasedev;
     uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
-    off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
     char *name;
-    int fd = vdev->vbasedev.fd;
 
     if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
         /* Since pci handles romfile, just print a message and return */
@@ -1039,11 +1060,12 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
      * Use the same size ROM BAR as the physical device.  The contents
      * will get filled in later when the guest tries to read it.
      */
-    if (pread(fd, &orig, 4, offset) != 4 ||
-        pwrite(fd, &size, 4, offset) != 4 ||
-        pread(fd, &size, 4, offset) != 4 ||
-        pwrite(fd, &orig, 4, offset) != 4) {
-        error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
+    if (vfio_pci_config_space_read(vdev, PCI_ROM_ADDRESS, 4, &orig) != 4 ||
+        vfio_pci_config_space_write(vdev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
+        vfio_pci_config_space_read(vdev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
+        vfio_pci_config_space_write(vdev, PCI_ROM_ADDRESS, 4, &orig) != 4) {
+
+        error_report("%s(%s) ROM access failed", __func__, vbasedev->name);
         return;
     }
 
@@ -1223,6 +1245,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 {
     VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
     uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
 
     memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
@@ -1235,12 +1258,12 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
     if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
         ssize_t ret;
 
-        ret = pread(vdev->vbasedev.fd, &phys_val, len,
-                    vdev->config_offset + addr);
+        ret = vfio_pci_config_space_read(vdev, addr, len, &phys_val);
         if (ret != len) {
-            error_report("%s(%s, 0x%x, 0x%x) failed: %m",
-                         __func__, vdev->vbasedev.name, addr, len);
-            return -errno;
+            error_report("%s(%s, 0x%x, 0x%x) failed: %s",
+                         __func__, vbasedev->name, addr, len,
+                         strreaderror(ret));
+            return -1;
         }
         phys_val = le32_to_cpu(phys_val);
     }
@@ -1256,15 +1279,18 @@ void vfio_pci_write_config(PCIDevice *pdev,
                            uint32_t addr, uint32_t val, int len)
 {
     VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIODevice *vbasedev = &vdev->vbasedev;
     uint32_t val_le = cpu_to_le32(val);
+    int ret;
 
     trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
     /* Write everything to VFIO, let it filter out what we can't write */
-    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
-                != len) {
-        error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %m",
-                     __func__, vdev->vbasedev.name, addr, val, len);
+    ret = vfio_pci_config_space_write(vdev, addr, len, &val_le);
+    if (ret != len) {
+        error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %s",
+                     __func__, vbasedev->name, addr, val, len,
+                     strwriteerror(ret));
     }
 
     /* MSI/MSI-X Enabling/Disabling */
@@ -1352,9 +1378,11 @@ static bool vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
     int ret, entries;
     Error *err = NULL;
 
-    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
-              vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
-        error_setg_errno(errp, errno, "failed reading MSI PCI_CAP_FLAGS");
+    ret = vfio_pci_config_space_read(vdev, pos + PCI_CAP_FLAGS,
+                                     sizeof(ctrl), &ctrl);
+    if (ret != sizeof(ctrl)) {
+        error_setg(errp, "failed reading MSI PCI_CAP_FLAGS: %s",
+                   strreaderror(ret));
         return false;
     }
     ctrl = le16_to_cpu(ctrl);
@@ -1561,30 +1589,35 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
     uint8_t pos;
     uint16_t ctrl;
     uint32_t table, pba;
-    int ret, fd = vdev->vbasedev.fd;
     struct vfio_irq_info irq_info;
     VFIOMSIXInfo *msix;
+    int ret;
 
     pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
     if (!pos) {
         return true;
     }
 
-    if (pread(fd, &ctrl, sizeof(ctrl),
-              vdev->config_offset + pos + PCI_MSIX_FLAGS) != sizeof(ctrl)) {
-        error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS");
+    ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_FLAGS,
+                                     sizeof(ctrl), &ctrl);
+    if (ret != sizeof(ctrl)) {
+        error_setg(errp, "failed to read PCI MSIX FLAGS: %s",
+                   strreaderror(ret));
         return false;
     }
 
-    if (pread(fd, &table, sizeof(table),
-              vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
-        error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE");
+    ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_TABLE,
+                                     sizeof(table), &table);
+    if (ret != sizeof(table)) {
+        error_setg(errp, "failed to read PCI MSIX TABLE: %s",
+                   strreaderror(ret));
         return false;
     }
 
-    if (pread(fd, &pba, sizeof(pba),
-              vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
-        error_setg_errno(errp, errno, "failed to read PCI MSIX PBA");
+    ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_PBA,
+                                     sizeof(pba), &pba);
+    if (ret != sizeof(pba)) {
+        error_setg(errp, "failed to read PCI MSIX PBA: %s", strreaderror(ret));
         return false;
     }
 
@@ -1744,10 +1777,10 @@ static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
     }
 
     /* Determine what type of BAR this is for registration */
-    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
-                vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
+    ret = vfio_pci_config_space_read(vdev, PCI_BASE_ADDRESS_0 + (4 * nr),
+                                     sizeof(pci_bar), &pci_bar);
     if (ret != sizeof(pci_bar)) {
-        error_report("vfio: Failed to read BAR %d (%m)", nr);
+        error_report("vfio: Failed to read BAR %d: %s", nr, strreaderror(ret));
         return;
     }
 
@@ -2450,21 +2483,23 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
 
 void vfio_pci_post_reset(VFIOPCIDevice *vdev)
 {
+    VFIODevice *vbasedev = &vdev->vbasedev;
     Error *err = NULL;
-    int nr;
+    int ret, nr;
 
     if (!vfio_intx_enable(vdev, &err)) {
         error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
     }
 
     for (nr = 0; nr < PCI_NUM_REGIONS - 1; ++nr) {
-        off_t addr = vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr);
+        off_t addr = PCI_BASE_ADDRESS_0 + (4 * nr);
         uint32_t val = 0;
         uint32_t len = sizeof(val);
 
-        if (pwrite(vdev->vbasedev.fd, &val, len, addr) != len) {
-            error_report("%s(%s) reset bar %d failed: %m", __func__,
-                         vdev->vbasedev.name, nr);
+        ret = vfio_pci_config_space_write(vdev, addr, len, &val);
+        if (ret != len) {
+            error_report("%s(%s) reset bar %d failed: %s", __func__,
+                         vbasedev->name, nr, strwriteerror(ret));
         }
     }
 
@@ -3101,6 +3136,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
     int i, ret;
     char uuid[UUID_STR_LEN];
     g_autofree char *name = NULL;
+    uint32_t config_space_size;
 
     if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
         if (!(~vdev->host.domain || ~vdev->host.bus ||
@@ -3155,13 +3191,14 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
         goto error;
     }
 
+    config_space_size = MIN(pci_config_size(&vdev->pdev), vdev->config_size);
+
     /* Get a copy of config space */
-    ret = pread(vbasedev->fd, vdev->pdev.config,
-                MIN(pci_config_size(&vdev->pdev), vdev->config_size),
-                vdev->config_offset);
-    if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
-        ret = ret < 0 ? -errno : -EFAULT;
-        error_setg_errno(errp, -ret, "failed to read device config space");
+    ret = vfio_pci_config_space_read(vdev, 0, config_space_size,
+                                     vdev->pdev.config);
+    if (ret < (int)config_space_size) {
+        ret = ret < 0 ? -ret : EFAULT;
+        error_setg_errno(errp, ret, "failed to read device config space");
         goto error;
     }
 
-- 
2.43.0




* [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (6 preceding siblings ...)
  2025-04-30 19:39 ` [PATCH v2 07/15] vfio: add vfio_pci_config_space_read/write() John Levon
@ 2025-04-30 19:39 ` John Levon
  2025-05-05 12:06   ` Cédric Le Goater
  2025-04-30 19:39 ` [PATCH v2 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

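Add an unmap_all parameter to the DMA unmap callback. It is not used yet:
this patch only adds the plumbing, and both the legacy and iommufd backends
return -ENOTSUP if it is set.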
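
A minimal sketch of this kind of plumbing: a class callback gains a flag, the
generic wrapper forwards it unchanged, and a backend that does not implement
it yet rejects it with -ENOTSUP. IommuOps, backend_unmap() and
container_dma_unmap() are hypothetical, trimmed-down analogues of
VFIOIOMMUClass, the backend callbacks and vfio_container_dma_unmap().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical analogue of the VFIOIOMMUClass dma_unmap callback. */
typedef struct IommuOps {
    int (*dma_unmap)(uint64_t iova, uint64_t size, bool unmap_all);
} IommuOps;

/* Backend that has gained the parameter but not the behaviour. */
static int backend_unmap(uint64_t iova, uint64_t size, bool unmap_all)
{
    if (unmap_all) {
        return -ENOTSUP;
    }
    (void)iova;
    (void)size;
    return 0;   /* pretend the ranged unmap succeeded */
}

/* Generic wrapper: forwards the flag unchanged to the backend. */
static int container_dma_unmap(const IommuOps *ops, uint64_t iova,
                               uint64_t size, bool unmap_all)
{
    assert(ops->dma_unmap);
    return ops->dma_unmap(iova, size, unmap_all);
}

static const IommuOps backend_ops = { .dma_unmap = backend_unmap };
```

Existing callers simply pass false, so behaviour is unchanged until a later
patch implements the flag.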

Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/container-base.c              | 4 ++--
 hw/vfio/container.c                   | 8 ++++++--
 hw/vfio/iommufd.c                     | 6 +++++-
 hw/vfio/listener.c                    | 8 ++++----
 include/hw/vfio/vfio-container-base.h | 4 ++--
 5 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 09340fd97a..3ff473a45c 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -85,12 +85,12 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
 
 int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
                              hwaddr iova, ram_addr_t size,
-                             IOMMUTLBEntry *iotlb)
+                             IOMMUTLBEntry *iotlb, bool unmap_all)
 {
     VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
 
     g_assert(vioc->dma_unmap);
-    return vioc->dma_unmap(bcontainer, iova, size, iotlb);
+    return vioc->dma_unmap(bcontainer, iova, size, iotlb, unmap_all);
 }
 
 bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 1dfdc312bd..766ba5a275 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -124,7 +124,7 @@ unmap_exit:
  */
 static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
                                  hwaddr iova, ram_addr_t size,
-                                 IOMMUTLBEntry *iotlb)
+                                 IOMMUTLBEntry *iotlb, bool unmap_all)
 {
     const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
                                                   bcontainer);
@@ -138,6 +138,10 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
     int ret;
     Error *local_err = NULL;
 
+    if (unmap_all) {
+        return -ENOTSUP;
+    }
+
     if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
         if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
             bcontainer->dirty_pages_supported) {
@@ -205,7 +209,7 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
      */
     if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
         (errno == EBUSY &&
-         vfio_legacy_dma_unmap(bcontainer, iova, size, NULL) == 0 &&
+         vfio_legacy_dma_unmap(bcontainer, iova, size, NULL, false) == 0 &&
          ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
         return 0;
     }
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 62ecb758f1..6b2764c044 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -46,11 +46,15 @@ static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
 
 static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
                               hwaddr iova, ram_addr_t size,
-                              IOMMUTLBEntry *iotlb)
+                              IOMMUTLBEntry *iotlb, bool unmap_all)
 {
     const VFIOIOMMUFDContainer *container =
         container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
 
+    if (unmap_all) {
+        return -ENOTSUP;
+    }
+
     /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
     return iommufd_backend_unmap_dma(container->be,
                                      container->ioas_id, iova, size);
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index 6f77e18a7a..c5183700db 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -172,7 +172,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
         }
     } else {
         ret = vfio_container_dma_unmap(bcontainer, iova,
-                                       iotlb->addr_mask + 1, iotlb);
+                                       iotlb->addr_mask + 1, iotlb, false);
         if (ret) {
             error_setg(&local_err,
                        "vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
@@ -201,7 +201,7 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
     int ret;
 
     /* Unmap with a single call. */
-    ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL);
+    ret = vfio_container_dma_unmap(bcontainer, iova, size, NULL, false);
     if (ret) {
         error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
                      strerror(-ret));
@@ -638,7 +638,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
             /* The unmap ioctl doesn't accept a full 64-bit span. */
             llsize = int128_rshift(llsize, 1);
             ret = vfio_container_dma_unmap(bcontainer, iova,
-                                           int128_get64(llsize), NULL);
+                                           int128_get64(llsize), NULL, false);
             if (ret) {
                 error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                              "0x%"HWADDR_PRIx") = %d (%s)",
@@ -648,7 +648,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
             iova += int128_get64(llsize);
         }
         ret = vfio_container_dma_unmap(bcontainer, iova,
-                                       int128_get64(llsize), NULL);
+                                       int128_get64(llsize), NULL, false);
         if (ret) {
             error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%s)",
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 5527e02722..92cee54d11 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -81,7 +81,7 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            void *vaddr, bool readonly);
 int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
                              hwaddr iova, ram_addr_t size,
-                             IOMMUTLBEntry *iotlb);
+                             IOMMUTLBEntry *iotlb, bool unmap_all);
 bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
                                        MemoryRegionSection *section,
                                        Error **errp);
@@ -122,7 +122,7 @@ struct VFIOIOMMUClass {
                    void *vaddr, bool readonly);
     int (*dma_unmap)(const VFIOContainerBase *bcontainer,
                      hwaddr iova, ram_addr_t size,
-                     IOMMUTLBEntry *iotlb);
+                     IOMMUTLBEntry *iotlb, bool unmap_all);
     bool (*attach_device)(const char *name, VFIODevice *vbasedev,
                           AddressSpace *as, Error **errp);
     void (*detach_device)(VFIODevice *vbasedev);
-- 
2.43.0




* [PATCH v2 09/15] vfio: implement unmap all for DMA unmap callbacks
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (7 preceding siblings ...)
  2025-04-30 19:39 ` [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback John Levon
@ 2025-04-30 19:39 ` John Levon
  2025-05-05 11:28   ` Cédric Le Goater
  2025-04-30 19:39 ` [PATCH v2 10/15] vfio: add device IO ops vector John Levon
                   ` (6 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

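Handle unmap_all in the DMA unmap handlers rather than in the caller.
Because a full 64-bit span cannot be passed as a single 64-bit size, both
the legacy and iommufd backends unmap the address space in two halves.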
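
A minimal sketch of the two-halves approach, assuming a single-range unmap
whose size is a uint64_t and so cannot express 2^64 bytes. unmap_one() is a
hypothetical mock that only records the ranges it receives (and rejects more
than two calls so its log cannot overflow); it is not a QEMU or kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Records the ranges passed to the mock single-range unmap. */
typedef struct UnmapLog {
    int calls;
    uint64_t iova[2];
    uint64_t size[2];
} UnmapLog;

/* Hypothetical mock: 0 on success, -errno on failure. */
static int unmap_one(UnmapLog *log, uint64_t iova, uint64_t size)
{
    if (log->calls >= 2) {
        return -EINVAL;   /* log is full */
    }
    log->iova[log->calls] = iova;
    log->size[log->calls] = size;
    log->calls++;
    return 0;
}

/* Unmap [iova, iova + size) or, with unmap_all, the whole 2^64 space. */
static int dma_unmap(UnmapLog *log, uint64_t iova, uint64_t size,
                     bool unmap_all)
{
    if (unmap_all) {
        const uint64_t half = UINT64_C(1) << 63;   /* 2^64 / 2 */
        int ret = unmap_one(log, 0, half);

        if (ret == 0) {
            ret = unmap_one(log, half, half);
        }
        return ret;   /* propagate the helper's -errno directly */
    }
    return unmap_one(log, iova, size);
}
```

Returning the helper's result directly, rather than re-reading errno, avoids
reporting a stale value if anything between the failing call and the return
has changed errno.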

Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/container.c | 45 +++++++++++++++++++++++++++++++++++----------
 hw/vfio/iommufd.c   | 15 ++++++++++++++-
 hw/vfio/listener.c  | 19 ++++++-------------
 3 files changed, 55 insertions(+), 24 deletions(-)

diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 766ba5a275..1000f3c241 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -119,12 +119,9 @@ unmap_exit:
     return ret;
 }
 
-/*
- * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
- */
-static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
-                                 hwaddr iova, ram_addr_t size,
-                                 IOMMUTLBEntry *iotlb, bool unmap_all)
+static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
+                                     hwaddr iova, ram_addr_t size,
+                                     IOMMUTLBEntry *iotlb)
 {
     const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
                                                   bcontainer);
@@ -138,10 +135,6 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
     int ret;
     Error *local_err = NULL;
 
-    if (unmap_all) {
-        return -ENOTSUP;
-    }
-
     if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
         if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
             bcontainer->dirty_pages_supported) {
@@ -185,6 +178,38 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
     return 0;
 }
 
+/*
+ * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
+ */
+static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
+                                 hwaddr iova, ram_addr_t size,
+                                 IOMMUTLBEntry *iotlb, bool unmap_all)
+{
+    int ret;
+
+    if (unmap_all) {
+        /* The unmap ioctl doesn't accept a full 64-bit span. */
+        Int128 llsize = int128_rshift(int128_2_64(), 1);
+
+        ret = vfio_legacy_dma_unmap_one(bcontainer, 0, int128_get64(llsize),
+                                        iotlb);
+
+        if (ret == 0) {
+            ret = vfio_legacy_dma_unmap_one(bcontainer, int128_get64(llsize),
+                                            int128_get64(llsize), iotlb);
+        }
+
+    } else {
+        ret = vfio_legacy_dma_unmap_one(bcontainer, iova, size, iotlb);
+    }
+
+    if (ret != 0) {
+        return ret;
+    }
+
+    return 0;
+}
+
 static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
                                ram_addr_t size, void *vaddr, bool readonly)
 {
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index 6b2764c044..af1c7ab10a 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -51,8 +51,21 @@ static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
     const VFIOIOMMUFDContainer *container =
         container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
 
+    /* unmap in halves */
     if (unmap_all) {
-        return -ENOTSUP;
+        Int128 llsize = int128_rshift(int128_2_64(), 1);
+        int ret;
+
+        ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
+                                        0, int128_get64(llsize));
+
+        if (ret == 0) {
+            ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
+                                            int128_get64(llsize),
+                                            int128_get64(llsize));
+        }
+
+        return ret;
     }
 
     /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index c5183700db..e7ade7d62e 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -634,21 +634,14 @@ static void vfio_listener_region_del(MemoryListener *listener,
     }
 
     if (try_unmap) {
+        bool unmap_all = false;
+
         if (int128_eq(llsize, int128_2_64())) {
-            /* The unmap ioctl doesn't accept a full 64-bit span. */
-            llsize = int128_rshift(llsize, 1);
-            ret = vfio_container_dma_unmap(bcontainer, iova,
-                                           int128_get64(llsize), NULL, false);
-            if (ret) {
-                error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
-                             "0x%"HWADDR_PRIx") = %d (%s)",
-                             bcontainer, iova, int128_get64(llsize), ret,
-                             strerror(-ret));
-            }
-            iova += int128_get64(llsize);
+            unmap_all = true;
+            llsize = int128_zero();
         }
-        ret = vfio_container_dma_unmap(bcontainer, iova,
-                                       int128_get64(llsize), NULL, false);
+        ret = vfio_container_dma_unmap(bcontainer, iova, int128_get64(llsize),
+                                       NULL, unmap_all);
         if (ret) {
             error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx") = %d (%s)",
-- 
2.43.0




* [PATCH v2 10/15] vfio: add device IO ops vector
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (8 preceding siblings ...)
  2025-04-30 19:39 ` [PATCH v2 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
@ 2025-04-30 19:39 ` John Levon
  2025-05-05 12:21   ` Cédric Le Goater
  2025-05-06 10:01   ` Cédric Le Goater
  2025-04-30 19:39 ` [PATCH v2 11/15] vfio: add region info cache John Levon
                   ` (5 subsequent siblings)
  15 siblings, 2 replies; 41+ messages in thread
From: John Levon @ 2025-04-30 19:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon, John Johnson, Elena Ufimtseva, Jagannathan Raman

For vfio-user, device operations such as IRQ handling and region
reads/writes are carried out in userspace over the control socket, rather
than via ioctl() calls to the vfio kernel driver. Add an ops vector to
abstract these operations, and implement vfio_device_io_ops_ioctl for
interacting with the kernel vfio driver.
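
A minimal sketch of the ops-vector pattern: generic code calls through a
per-device table of function pointers, and each transport adapts its own
error convention to 0/-errno. DevIOOps, Dev and every function below are
hypothetical, trimmed-down stand-ins (fake_ioctl() imitates ioctl()'s
-1/errno convention without issuing any real request); they are not the
QEMU definitions.

```c
#include <assert.h>
#include <errno.h>

typedef struct Dev Dev;

/* Hypothetical analogue of VFIODeviceIOOps: each op returns 0 or -errno. */
typedef struct DevIOOps {
    int (*set_irqs)(Dev *dev, unsigned int index, int enable);
} DevIOOps;

struct Dev {
    const DevIOOps *io_ops;
    int fd;            /* only used by the "kernel" transport */
    int irq_state[4];  /* only used by the "socket" transport */
};

/* Stand-in for ioctl(): -1 with errno set on failure. */
static int fake_ioctl(int fd, unsigned int index, int enable)
{
    (void)enable;
    if (fd < 0) {
        errno = EBADF;
        return -1;
    }
    if (index >= 4) {
        errno = EINVAL;
        return -1;
    }
    return 0;
}

/* "Kernel" transport: converts -1/errno into -errno. */
static int kernel_set_irqs(Dev *dev, unsigned int index, int enable)
{
    int ret = fake_ioctl(dev->fd, index, enable);

    return ret < 0 ? -errno : ret;
}

/* "Socket" transport: a local update standing in for a request message. */
static int socket_set_irqs(Dev *dev, unsigned int index, int enable)
{
    if (index >= 4) {
        return -EINVAL;
    }
    dev->irq_state[index] = enable;
    return 0;
}

static const DevIOOps kernel_io_ops = { .set_irqs = kernel_set_irqs };
static const DevIOOps socket_io_ops = { .set_irqs = socket_set_irqs };

/* Generic code only ever goes through the ops vector. */
static int dev_irq_enable(Dev *dev, unsigned int index)
{
    return dev->io_ops->set_irqs(dev, index, 1);
}
```

Because the error convention is normalized inside each transport, callers
such as dev_irq_enable() do not need to know whether errno is meaningful.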

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/container-base.c      |  6 +--
 hw/vfio/device.c              | 77 ++++++++++++++++++++++++++++++-----
 hw/vfio/listener.c            | 13 +++---
 hw/vfio/pci.c                 | 10 ++---
 include/hw/vfio/vfio-device.h | 38 +++++++++++++++++
 5 files changed, 117 insertions(+), 27 deletions(-)

diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 3ff473a45c..1c6ca94b60 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -198,11 +198,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
     feature->flags = VFIO_DEVICE_FEATURE_GET |
                      VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
 
-    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
-        return -errno;
-    }
-
-    return 0;
+    return vbasedev->io_ops->device_feature(vbasedev, feature);
 }
 
 static int vfio_container_iommu_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 5d837092cb..468fb50eac 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -82,7 +82,7 @@ void vfio_device_irq_disable(VFIODevice *vbasedev, int index)
         .count = 0,
     };
 
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
 }
 
 void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
@@ -95,7 +95,7 @@ void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
         .count = 1,
     };
 
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
 }
 
 void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
@@ -108,7 +108,7 @@ void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
         .count = 1,
     };
 
-    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
+    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
 }
 
 static inline const char *action_to_str(int action)
@@ -155,6 +155,7 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
     int argsz;
     const char *name;
     int32_t *pfd;
+    int ret;
 
     argsz = sizeof(*irq_set) + sizeof(*pfd);
 
@@ -167,7 +168,9 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
     pfd = (int32_t *)&irq_set->data;
     *pfd = fd;
 
-    if (!ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
+    ret = vbasedev->io_ops->set_irqs(vbasedev, irq_set);
+
+    if (!ret) {
         return true;
     }
 
@@ -188,22 +191,19 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
 int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
                              struct vfio_irq_info *info)
 {
-    int ret;
-
     memset(info, 0, sizeof(*info));
 
     info->argsz = sizeof(*info);
     info->index = index;
 
-    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
-
-    return ret < 0 ? -errno : ret;
+    return vbasedev->io_ops->get_irq_info(vbasedev, info);
 }
 
 int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
                                 struct vfio_region_info **info)
 {
     size_t argsz = sizeof(struct vfio_region_info);
+    int ret;
 
     *info = g_malloc0(argsz);
 
@@ -211,10 +211,11 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
 retry:
     (*info)->argsz = argsz;
 
-    if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
+    ret = vbasedev->io_ops->get_region_info(vbasedev, *info);
+    if (ret != 0) {
         g_free(*info);
         *info = NULL;
-        return -errno;
+        return ret;
     }
 
     if ((*info)->argsz > argsz) {
@@ -320,11 +321,14 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
     vbasedev->fd = fd;
 }
 
+static VFIODeviceIOOps vfio_device_io_ops_ioctl;
+
 void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
                       DeviceState *dev, bool ram_discard)
 {
     vbasedev->type = type;
     vbasedev->ops = ops;
+    vbasedev->io_ops = &vfio_device_io_ops_ioctl;
     vbasedev->dev = dev;
     vbasedev->fd = -1;
 
@@ -442,3 +446,54 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
     QLIST_REMOVE(vbasedev, global_next);
     vbasedev->bcontainer = NULL;
 }
+
+/*
+ * Traditional ioctl()-based I/O
+ */
+
+static int vfio_device_io_device_feature(VFIODevice *vbasedev,
+                                         struct vfio_device_feature *feature)
+{
+    int ret;
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+
+    return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_get_region_info(VFIODevice *vbasedev,
+                                          struct vfio_region_info *info)
+{
+    int ret;
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
+
+    return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_get_irq_info(VFIODevice *vbasedev,
+                                       struct vfio_irq_info *info)
+{
+    int ret;
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
+
+    return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
+                                   struct vfio_irq_set *irqs)
+{
+    int ret;
+
+    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs);
+
+    return ret < 0 ? -errno : ret;
+}
+
+static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
+    .device_feature = vfio_device_io_device_feature,
+    .get_region_info = vfio_device_io_get_region_info,
+    .get_irq_info = vfio_device_io_get_irq_info,
+    .set_irqs = vfio_device_io_set_irqs,
+};
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index e7ade7d62e..2b93ca55b6 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -794,13 +794,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
                      VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
 
     QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
+        int ret;
+
         if (!vbasedev->dirty_tracking) {
             continue;
         }
 
-        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
+        ret = vbasedev->io_ops->device_feature(vbasedev, feature);
+
+        if (ret != 0) {
             warn_report("%s: Failed to stop DMA logging, err %d (%s)",
-                        vbasedev->name, -errno, strerror(errno));
+                        vbasedev->name, -ret, strerror(-ret));
         }
         vbasedev->dirty_tracking = false;
     }
@@ -901,10 +905,9 @@ static bool vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
             continue;
         }
 
-        ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
+        ret = vbasedev->io_ops->device_feature(vbasedev, feature);
         if (ret) {
-            ret = -errno;
-            error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
+            error_setg_errno(errp, -ret, "%s: Failed to start DMA logging",
                              vbasedev->name);
             goto out;
         }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8455010d62..bbf95215cc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -381,7 +381,7 @@ static void vfio_msi_interrupt(void *opaque)
 static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
 {
     g_autofree struct vfio_irq_set *irq_set = NULL;
-    int ret = 0, argsz;
+    int argsz;
     int32_t *fd;
 
     argsz = sizeof(*irq_set) + sizeof(*fd);
@@ -396,9 +396,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
     fd = (int32_t *)&irq_set->data;
     *fd = -1;
 
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
-
-    return ret < 0 ? -errno : ret;
+    return vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
 }
 
 static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
@@ -455,11 +453,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
         fds[i] = fd;
     }
 
-    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
+    ret = vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
 
     g_free(irq_set);
 
-    return ret < 0 ? -errno : ret;
+    return ret;
 }
 
 static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index 5b833868c9..e89ed02c0e 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -41,6 +41,7 @@ enum {
 };
 
 typedef struct VFIODeviceOps VFIODeviceOps;
+typedef struct VFIODeviceIOOps VFIODeviceIOOps;
 typedef struct VFIOMigration VFIOMigration;
 
 typedef struct IOMMUFDBackend IOMMUFDBackend;
@@ -66,6 +67,7 @@ typedef struct VFIODevice {
     OnOffAuto migration_multifd_transfer;
     bool migration_events;
     VFIODeviceOps *ops;
+    VFIODeviceIOOps *io_ops;
     unsigned int num_irqs;
     unsigned int num_regions;
     unsigned int flags;
@@ -141,6 +143,42 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
 extern VFIODeviceList vfio_device_list;
 
 #ifdef CONFIG_LINUX
+/*
+ * How devices communicate with the server.  The default option is through
+ * ioctl() to the kernel VFIO driver, but vfio-user can use a socket to a remote
+ * process.
+ */
+struct VFIODeviceIOOps {
+    /**
+     * @device_feature
+     *
+     * Fill in feature info for the given device.
+     */
+    int (*device_feature)(VFIODevice *vdev, struct vfio_device_feature *);
+
+    /**
+     * @get_region_info
+     *
+     * Fill in @info with information on the region given by @info->index.
+     */
+    int (*get_region_info)(VFIODevice *vdev,
+                           struct vfio_region_info *info);
+
+    /**
+     * @get_irq_info
+     *
+     * Fill in @irq with information on the IRQ given by @irq->index.
+     */
+    int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
+
+    /**
+     * @set_irqs
+     *
+     * Configure IRQs as defined by @irqs.
+     */
+    int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
+};
+
 int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
                                 struct vfio_region_info **info);
 int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
-- 
2.43.0




* [PATCH v2 11/15] vfio: add region info cache
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (9 preceding siblings ...)
  2025-04-30 19:39 ` [PATCH v2 10/15] vfio: add device IO ops vector John Levon
@ 2025-04-30 19:39 ` John Levon
  2025-05-05 12:26   ` Cédric Le Goater
  2025-04-30 19:40 ` [PATCH v2 12/15] vfio: add read/write to device IO ops vector John Levon
                   ` (4 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:39 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon, John Johnson, Elena Ufimtseva, Jagannathan Raman

Instead of requesting region information on demand with
VFIO_DEVICE_GET_REGION_INFO, maintain a cache. This will be necessary
for performance with vfio-user, where the call becomes a message over
the control socket and therefore has a higher overhead than the
traditional ioctl() path.

We will also need it to generalize region accesses: generic accessors
cannot use ->config_offset for configuration space accesses, but must
look up the region offset (if relevant) on every access.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/ccw.c                 |  5 -----
 hw/vfio/device.c              | 27 +++++++++++++++++++++++----
 hw/vfio/igd.c                 |  8 ++++----
 hw/vfio/pci.c                 |  6 +++---
 hw/vfio/region.c              |  2 +-
 include/hw/vfio/vfio-device.h |  1 +
 6 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
index ab3fabf991..cea9d6e005 100644
--- a/hw/vfio/ccw.c
+++ b/hw/vfio/ccw.c
@@ -504,7 +504,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
 
     vcdev->io_region_offset = info->offset;
     vcdev->io_region = g_malloc0(info->size);
-    g_free(info);
 
     /* check for the optional async command region */
     ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
@@ -517,7 +516,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
         }
         vcdev->async_cmd_region_offset = info->offset;
         vcdev->async_cmd_region = g_malloc0(info->size);
-        g_free(info);
     }
 
     ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
@@ -530,7 +528,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
         }
         vcdev->schib_region_offset = info->offset;
         vcdev->schib_region = g_malloc(info->size);
-        g_free(info);
     }
 
     ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
@@ -544,7 +541,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
         }
         vcdev->crw_region_offset = info->offset;
         vcdev->crw_region = g_malloc(info->size);
-        g_free(info);
     }
 
     return true;
@@ -554,7 +550,6 @@ out_err:
     g_free(vcdev->schib_region);
     g_free(vcdev->async_cmd_region);
     g_free(vcdev->io_region);
-    g_free(info);
     return false;
 }
 
diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index 468fb50eac..d08c0ab536 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -205,6 +205,12 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
     size_t argsz = sizeof(struct vfio_region_info);
     int ret;
 
+    /* check cache */
+    if (vbasedev->reginfo[index] != NULL) {
+        *info = vbasedev->reginfo[index];
+        return 0;
+    }
+
     *info = g_malloc0(argsz);
 
     (*info)->index = index;
@@ -225,6 +231,9 @@ retry:
         goto retry;
     }
 
+    /* fill cache */
+    vbasedev->reginfo[index] = *info;
+
     return 0;
 }
 
@@ -243,7 +252,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
 
         hdr = vfio_get_region_info_cap(*info, VFIO_REGION_INFO_CAP_TYPE);
         if (!hdr) {
-            g_free(*info);
             continue;
         }
 
@@ -255,8 +263,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
         if (cap_type->type == type && cap_type->subtype == subtype) {
             return 0;
         }
-
-        g_free(*info);
     }
 
     *info = NULL;
@@ -265,7 +271,7 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
 
 bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
 {
-    g_autofree struct vfio_region_info *info = NULL;
+    struct vfio_region_info *info = NULL;
     bool ret = false;
 
     if (!vfio_device_get_region_info(vbasedev, region, &info)) {
@@ -438,10 +444,23 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
     QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
 
     QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
+
+    if (vbasedev->reginfo == NULL) {
+        vbasedev->reginfo = g_new0(struct vfio_region_info *,
+                                   vbasedev->num_regions);
+    }
 }
 
 void vfio_device_unprepare(VFIODevice *vbasedev)
 {
+    int i;
+
+    for (i = 0; i < vbasedev->num_regions; i++) {
+        g_free(vbasedev->reginfo[i]);
+    }
+    g_free(vbasedev->reginfo);
+    vbasedev->reginfo = NULL;
+
     QLIST_REMOVE(vbasedev, container_next);
     QLIST_REMOVE(vbasedev, global_next);
     vbasedev->bcontainer = NULL;
diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
index d7e4728fdc..c7db74cde4 100644
--- a/hw/vfio/igd.c
+++ b/hw/vfio/igd.c
@@ -191,7 +191,7 @@ static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
 
 static bool vfio_pci_igd_setup_opregion(VFIOPCIDevice *vdev, Error **errp)
 {
-    g_autofree struct vfio_region_info *opregion = NULL;
+    struct vfio_region_info *opregion = NULL;
     int ret;
 
     /* Hotplugging is not supported for opregion access */
@@ -355,8 +355,8 @@ static int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev,
 
 static bool vfio_pci_igd_setup_lpc_bridge(VFIOPCIDevice *vdev, Error **errp)
 {
-    g_autofree struct vfio_region_info *host = NULL;
-    g_autofree struct vfio_region_info *lpc = NULL;
+    struct vfio_region_info *host = NULL;
+    struct vfio_region_info *lpc = NULL;
     PCIDevice *lpc_bridge;
     int ret;
 
@@ -532,7 +532,7 @@ static bool vfio_pci_igd_config_quirk(VFIOPCIDevice *vdev, Error **errp)
          * - OpRegion
          * - Same LPC bridge and Host bridge VID/DID/SVID/SSID as host
          */
-        g_autofree struct vfio_region_info *rom = NULL;
+        struct vfio_region_info *rom = NULL;
 
         legacy_mode_enabled = true;
         info_report("IGD legacy mode enabled, "
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bbf95215cc..1aeb4d91d2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -883,8 +883,8 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
 
 static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 {
-    g_autofree struct vfio_region_info *reg_info = NULL;
     VFIODevice *vbasedev = &vdev->vbasedev;
+    struct vfio_region_info *reg_info = NULL;
     uint64_t size;
     off_t off = 0;
     ssize_t bytes;
@@ -2710,7 +2710,7 @@ static VFIODeviceOps vfio_pci_ops = {
 bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 {
     VFIODevice *vbasedev = &vdev->vbasedev;
-    g_autofree struct vfio_region_info *reg_info = NULL;
+    struct vfio_region_info *reg_info = NULL;
     int ret;
 
     ret = vfio_device_get_region_info(vbasedev, VFIO_PCI_VGA_REGION_INDEX, &reg_info);
@@ -2775,7 +2775,7 @@ bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
 static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
 {
     VFIODevice *vbasedev = &vdev->vbasedev;
-    g_autofree struct vfio_region_info *reg_info = NULL;
+    struct vfio_region_info *reg_info = NULL;
     struct vfio_irq_info irq_info;
     int i, ret = -1;
 
diff --git a/hw/vfio/region.c b/hw/vfio/region.c
index 04bf9eb098..ef2630cac3 100644
--- a/hw/vfio/region.c
+++ b/hw/vfio/region.c
@@ -182,7 +182,7 @@ static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
 int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
                       int index, const char *name)
 {
-    g_autofree struct vfio_region_info *info = NULL;
+    struct vfio_region_info *info = NULL;
     int ret;
 
     ret = vfio_device_get_region_info(vbasedev, index, &info);
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index e89ed02c0e..b4a28c2a54 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -83,6 +83,7 @@ typedef struct VFIODevice {
     IOMMUFDBackend *iommufd;
     VFIOIOASHwpt *hwpt;
     QLIST_ENTRY(VFIODevice) hwpt_next;
+    struct vfio_region_info **reginfo;
 } VFIODevice;
 
 struct VFIODeviceOps {
-- 
2.43.0
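
The cache above is a per-device array of pointers, one slot per region, filled lazily on first lookup and freed in vfio_device_unprepare(). Ownership of each buffer moves to the cache, which is why the patch drops `g_free()` and `g_autofree` at the call sites. The block below is a minimal sketch of that ownership model. `SketchCachedDevice`, `SketchRegionInfo` and `query_region()` are hypothetical. `query_region()` stands in for the ioctl or vfio-user message, and the argsz retry for capability chains is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, simplified region info record. */
typedef struct SketchRegionInfo {
    unsigned index;
    unsigned long offset;
    unsigned long size;
} SketchRegionInfo;

typedef struct SketchCachedDevice {
    unsigned num_regions;
    SketchRegionInfo **reginfo; /* one slot per region, NULL until queried */
    unsigned queries;           /* counts round trips to the "server" */
} SketchCachedDevice;

/* Stand-in for the expensive query (ioctl or socket message). */
static int query_region(SketchCachedDevice *dev, unsigned index,
                        SketchRegionInfo *out)
{
    dev->queries++;
    out->index = index;
    out->offset = index * 0x10000UL;
    out->size = 0x1000;
    return 0;
}

static int sketch_prepare(SketchCachedDevice *dev, unsigned num_regions)
{
    dev->num_regions = num_regions;
    dev->queries = 0;
    dev->reginfo = calloc(num_regions, sizeof(*dev->reginfo));
    return dev->reginfo ? 0 : -ENOMEM;
}

/* The returned pointer is owned by the cache: callers must not free it. */
static int sketch_get_region_info(SketchCachedDevice *dev, unsigned index,
                                  SketchRegionInfo **info)
{
    int ret;

    assert(dev->reginfo != NULL); /* sketch_prepare() must run first */
    if (index >= dev->num_regions) {
        return -EINVAL;
    }
    if (dev->reginfo[index] != NULL) {  /* cache hit: no round trip */
        *info = dev->reginfo[index];
        return 0;
    }
    *info = malloc(sizeof(**info));
    if (*info == NULL) {
        return -ENOMEM;
    }
    ret = query_region(dev, index, *info);
    if (ret != 0) {
        free(*info);
        *info = NULL;
        return ret;
    }
    dev->reginfo[index] = *info;        /* fill cache */
    return 0;
}

/* All cached entries are released together, at unprepare time. */
static void sketch_unprepare(SketchCachedDevice *dev)
{
    for (unsigned i = 0; i < dev->num_regions; i++) {
        free(dev->reginfo[i]);
    }
    free(dev->reginfo);
    dev->reginfo = NULL;
}
```

Two lookups of the same index return the same pointer and cost one query. A caller that still freed the result would leave a dangling entry in the cache.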




* [PATCH v2 12/15] vfio: add read/write to device IO ops vector
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (10 preceding siblings ...)
  2025-04-30 19:39 ` [PATCH v2 11/15] vfio: add region info cache John Levon
@ 2025-04-30 19:40 ` John Levon
  2025-05-05 12:39   ` Cédric Le Goater
  2025-04-30 19:40 ` [PATCH v2 13/15] vfio: add vfio-pci-base class John Levon
                   ` (3 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

Now that we have the region info cache, add ->region_read/write device
I/O operations to replace the explicit pread()/pwrite() system calls.
---
 hw/vfio/device.c              | 38 +++++++++++++++++++++++++++++++++++
 hw/vfio/pci.c                 | 28 +++++++++++++-------------
 hw/vfio/region.c              | 17 ++++++++++------
 include/hw/vfio/vfio-device.h | 18 +++++++++++++++++
 4 files changed, 81 insertions(+), 20 deletions(-)

diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index d08c0ab536..ceb7bbebda 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -510,9 +510,47 @@ static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
     return ret < 0 ? -errno : ret;
 }
 
+static int vfio_device_io_region_read(VFIODevice *vbasedev, uint8_t index,
+                                      off_t off, uint32_t size, void *data)
+{
+    struct vfio_region_info *info = vbasedev->reginfo[index];
+    int ret;
+
+    if (info == NULL) {
+        ret = vfio_device_get_region_info(vbasedev, index, &info);
+        if (ret != 0) {
+            return ret;
+        }
+    }
+
+    ret = pread(vbasedev->fd, data, size, info->offset + off);
+
+    return ret < 0 ? -errno : ret;
+}
+
+static int vfio_device_io_region_write(VFIODevice *vbasedev, uint8_t index,
+                                       off_t off, uint32_t size, void *data)
+{
+    struct vfio_region_info *info = vbasedev->reginfo[index];
+    int ret;
+
+    if (info == NULL) {
+        ret = vfio_device_get_region_info(vbasedev, index, &info);
+        if (ret != 0) {
+            return ret;
+        }
+    }
+
+    ret = pwrite(vbasedev->fd, data, size, info->offset + off);
+
+    return ret < 0 ? -errno : ret;
+}
+
 static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
     .device_feature = vfio_device_io_device_feature,
     .get_region_info = vfio_device_io_get_region_info,
     .get_irq_info = vfio_device_io_get_irq_info,
     .set_irqs = vfio_device_io_set_irqs,
+    .region_read = vfio_device_io_region_read,
+    .region_write = vfio_device_io_region_write,
 };
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 1aeb4d91d2..5e811d5d6a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -918,18 +918,22 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
     memset(vdev->rom, 0xff, size);
 
     while (size) {
-        bytes = pread(vbasedev->fd, vdev->rom + off,
-                      size, vdev->rom_offset + off);
+        bytes = vbasedev->io_ops->region_read(vbasedev,
+                                              VFIO_PCI_ROM_REGION_INDEX,
+                                              off, size, vdev->rom + off);
+
         if (bytes == 0) {
             break;
         } else if (bytes > 0) {
             off += bytes;
             size -= bytes;
         } else {
-            if (errno == EINTR || errno == EAGAIN) {
+            if (bytes == -EINTR || bytes == -EAGAIN) {
                 continue;
             }
-            error_report("vfio: Error reading device ROM: %m");
+            error_report("vfio: Error reading device ROM: %s",
+                         strreaderror(bytes));
+
             break;
         }
     }
@@ -969,22 +973,18 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
 static int vfio_pci_config_space_read(VFIOPCIDevice *vdev, off_t offset,
                                       uint32_t size, void *data)
 {
-    ssize_t ret;
-
-    ret = pread(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
-
-    return ret < 0 ? -errno : (int)ret;
+    return vdev->vbasedev.io_ops->region_read(&vdev->vbasedev,
+                                              VFIO_PCI_CONFIG_REGION_INDEX,
+                                              offset, size, data);
 }
 
 /* "Raw" write of underlying config space. */
 static int vfio_pci_config_space_write(VFIOPCIDevice *vdev, off_t offset,
                                        uint32_t size, void *data)
 {
-    ssize_t ret;
-
-    ret = pwrite(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
-
-    return ret < 0 ? -errno : (int)ret;
+    return vdev->vbasedev.io_ops->region_write(&vdev->vbasedev,
+                                               VFIO_PCI_CONFIG_REGION_INDEX,
+                                               offset, size, data);
 }
 
 static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
diff --git a/hw/vfio/region.c b/hw/vfio/region.c
index ef2630cac3..34752c3f65 100644
--- a/hw/vfio/region.c
+++ b/hw/vfio/region.c
@@ -45,6 +45,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
         uint32_t dword;
         uint64_t qword;
     } buf;
+    int ret;
 
     switch (size) {
     case 1:
@@ -64,11 +65,13 @@ void vfio_region_write(void *opaque, hwaddr addr,
         break;
     }
 
-    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
+    ret = vbasedev->io_ops->region_write(vbasedev, region->nr,
+                                         addr, size, &buf);
+    if (ret != size) {
         error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
-                     ",%d) failed: %m",
+                     ",%d) failed: %s",
                      __func__, vbasedev->name, region->nr,
-                     addr, data, size);
+                     addr, data, size, strwriteerror(ret));
     }
 
     trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
@@ -96,11 +99,13 @@ uint64_t vfio_region_read(void *opaque,
         uint64_t qword;
     } buf;
     uint64_t data = 0;
+    int ret;
 
-    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
-        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
+    ret = vbasedev->io_ops->region_read(vbasedev, region->nr, addr, size, &buf);
+    if (ret != size) {
+        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s",
                      __func__, vbasedev->name, region->nr,
-                     addr, size);
+                     addr, size, strreaderror(ret));
         return (uint64_t)-1;
     }
     switch (size) {
diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
index b4a28c2a54..d3ab13ca6a 100644
--- a/include/hw/vfio/vfio-device.h
+++ b/include/hw/vfio/vfio-device.h
@@ -178,6 +178,24 @@ struct VFIODeviceIOOps {
      * Configure IRQs as defined by @irqs.
      */
     int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
+
+    /**
+     * @region_read
+     *
+     * Read @size bytes from the region @nr at offset @off into the buffer
+     * @data.
+     */
+    int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
+                       void *data);
+
+    /**
+     * @region_write
+     *
+     * Write @size bytes to the region @nr at offset @off from the buffer
+     * @data.
+     */
+    int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
+                        void *data);
 };
 
 int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
-- 
2.43.0
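
With the region_read hook, callers pass a region index and a region-relative offset. The ioctl backend translates these to an absolute file offset (`info->offset + off`) and returns either a byte count or -errno. The ROM loader then retries on -EINTR/-EAGAIN instead of inspecting errno. The block below sketches both pieces against an ordinary file, assuming a hypothetical `SketchRegions` table of per-region base offsets in place of the cached region info.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>      /* tmpfile(), fileno() in the usage example */
#include <string.h>     /* memcmp() in the usage example */
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical: base offset of each region within the device fd. */
typedef struct SketchRegions {
    int fd;
    unsigned num_regions;
    const off_t *offsets;
} SketchRegions;

/* Region-relative read: returns bytes read or a negative errno. */
static int sketch_region_read(SketchRegions *r, uint8_t nr, off_t off,
                              uint32_t size, void *data)
{
    ssize_t ret;

    if (nr >= r->num_regions) {
        return -EINVAL;
    }
    ret = pread(r->fd, data, size, r->offsets[nr] + off);
    return ret < 0 ? -errno : (int)ret;
}

/* Read up to @size bytes, retrying on -EINTR/-EAGAIN like the ROM loader. */
static int sketch_region_read_all(SketchRegions *r, uint8_t nr,
                                  void *buf, uint32_t size)
{
    uint8_t *p = buf;
    off_t off = 0;

    assert(buf != NULL);
    while (size) {
        int bytes = sketch_region_read(r, nr, off, size, p + off);

        if (bytes == 0) {
            break;                      /* end of data */
        } else if (bytes > 0) {
            off += bytes;
            size -= bytes;
        } else if (bytes == -EINTR || bytes == -EAGAIN) {
            continue;                   /* transient: try again */
        } else {
            return bytes;               /* hard error, already -errno */
        }
    }
    return (int)off;
}
```

The caller never computes a file offset itself. This is what lets a non-fd transport implement the same hook with its own addressing scheme.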




* [PATCH v2 13/15] vfio: add vfio-pci-base class
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (11 preceding siblings ...)
  2025-04-30 19:40 ` [PATCH v2 12/15] vfio: add read/write to device IO ops vector John Levon
@ 2025-04-30 19:40 ` John Levon
  2025-05-05 12:42   ` Cédric Le Goater
  2025-04-30 19:40 ` [PATCH v2 14/15] vfio/container: pass listener_begin/commit callbacks John Levon
                   ` (2 subsequent siblings)
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon, John Johnson, Elena Ufimtseva, Jagannathan Raman

Split out parts of TYPE_VFIO_PCI into an abstract base type,
TYPE_VFIO_PCI_BASE. No other subclass has been introduced yet, so all
the properties remain in TYPE_VFIO_PCI.

Note that currently there is no need for additional data for
TYPE_VFIO_PCI, so it shares the same C struct type as
TYPE_VFIO_PCI_BASE, VFIOPCIDevice.

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/device.c |  2 +-
 hw/vfio/pci.c    | 62 +++++++++++++++++++++++++++++++-----------------
 hw/vfio/pci.h    | 12 ++++++++--
 3 files changed, 51 insertions(+), 25 deletions(-)

diff --git a/hw/vfio/device.c b/hw/vfio/device.c
index ceb7bbebda..70d75b271f 100644
--- a/hw/vfio/device.c
+++ b/hw/vfio/device.c
@@ -395,7 +395,7 @@ bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
 VFIODevice *vfio_get_vfio_device(Object *obj)
 {
     if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
-        return &VFIO_PCI(obj)->vbasedev;
+        return &VFIO_PCI_BASE(obj)->vbasedev;
     } else {
         return NULL;
     }
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5e811d5d6a..8d29b4552f 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -241,7 +241,7 @@ static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
 
 static void vfio_intx_routing_notifier(PCIDevice *pdev)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     PCIINTxRoute route;
 
     if (vdev->interrupt != VFIO_INT_INTx) {
@@ -514,7 +514,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
 static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
                                    MSIMessage *msg, IOHandler *handler)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIOMSIVector *vector;
     int ret;
     bool resizing = !!(vdev->nr_vectors < nr + 1);
@@ -620,7 +620,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
 
 static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIOMSIVector *vector = &vdev->msi_vectors[nr];
 
     trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
@@ -1196,7 +1196,7 @@ static const MemoryRegionOps vfio_vga_ops = {
  */
 static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIORegion *region = &vdev->bars[bar].region;
     MemoryRegion *mmap_mr, *region_mr, *base_mr;
     PCIIORegion *r;
@@ -1242,7 +1242,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
  */
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIODevice *vbasedev = &vdev->vbasedev;
     uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
 
@@ -1276,7 +1276,7 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
 void vfio_pci_write_config(PCIDevice *pdev,
                            uint32_t addr, uint32_t val, int len)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIODevice *vbasedev = &vdev->vbasedev;
     uint32_t val_le = cpu_to_le32(val);
     int ret;
@@ -3129,7 +3129,7 @@ static bool vfio_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
     ERRP_GUARD();
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIODevice *vbasedev = &vdev->vbasedev;
     int i, ret;
     char uuid[UUID_STR_LEN];
@@ -3300,7 +3300,7 @@ error:
 
 static void vfio_instance_finalize(Object *obj)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(obj);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
 
     vfio_display_finalize(vdev);
     vfio_bars_finalize(vdev);
@@ -3318,7 +3318,7 @@ static void vfio_instance_finalize(Object *obj)
 
 static void vfio_exitfn(PCIDevice *pdev)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
     VFIODevice *vbasedev = &vdev->vbasedev;
 
     vfio_unregister_req_notifier(vdev);
@@ -3342,7 +3342,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 
 static void vfio_pci_reset(DeviceState *dev)
 {
-    VFIOPCIDevice *vdev = VFIO_PCI(dev);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
 
     trace_vfio_pci_reset(vdev->vbasedev.name);
 
@@ -3382,7 +3382,7 @@ post_reset:
 static void vfio_instance_init(Object *obj)
 {
     PCIDevice *pci_dev = PCI_DEVICE(obj);
-    VFIOPCIDevice *vdev = VFIO_PCI(obj);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
     VFIODevice *vbasedev = &vdev->vbasedev;
 
     device_add_bootindex_property(obj, &vdev->bootindex,
@@ -3403,6 +3403,31 @@ static void vfio_instance_init(Object *obj)
     pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
 }
 
+static void vfio_pci_base_dev_class_init(ObjectClass *klass, const void *data)
+{
+    DeviceClass *dc = DEVICE_CLASS(klass);
+    PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
+
+    dc->desc = "VFIO PCI base device";
+    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+    pdc->exit = vfio_exitfn;
+    pdc->config_read = vfio_pci_read_config;
+    pdc->config_write = vfio_pci_write_config;
+}
+
+static const TypeInfo vfio_pci_base_dev_info = {
+    .name = TYPE_VFIO_PCI_BASE,
+    .parent = TYPE_PCI_DEVICE,
+    .instance_size = 0,
+    .abstract = true,
+    .class_init = vfio_pci_base_dev_class_init,
+    .interfaces = (const InterfaceInfo[]) {
+        { INTERFACE_PCIE_DEVICE },
+        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
+        { }
+    },
+};
+
 static PropertyInfo vfio_pci_migration_multifd_transfer_prop;
 
 static const Property vfio_pci_dev_properties[] = {
@@ -3473,7 +3498,8 @@ static const Property vfio_pci_dev_properties[] = {
 #ifdef CONFIG_IOMMUFD
 static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
 {
-    vfio_device_set_fd(&VFIO_PCI(obj)->vbasedev, str, errp);
+    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
+    vfio_device_set_fd(&vdev->vbasedev, str, errp);
 }
 #endif
 
@@ -3488,11 +3514,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
     object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
 #endif
     dc->desc = "VFIO-based PCI device assignment";
-    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
     pdc->realize = vfio_realize;
-    pdc->exit = vfio_exitfn;
-    pdc->config_read = vfio_pci_read_config;
-    pdc->config_write = vfio_pci_write_config;
 
     object_class_property_set_description(klass, /* 1.3 */
                                           "host",
@@ -3617,16 +3639,11 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
 
 static const TypeInfo vfio_pci_dev_info = {
     .name = TYPE_VFIO_PCI,
-    .parent = TYPE_PCI_DEVICE,
+    .parent = TYPE_VFIO_PCI_BASE,
     .instance_size = sizeof(VFIOPCIDevice),
     .class_init = vfio_pci_dev_class_init,
     .instance_init = vfio_instance_init,
     .instance_finalize = vfio_instance_finalize,
-    .interfaces = (const InterfaceInfo[]) {
-        { INTERFACE_PCIE_DEVICE },
-        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
-        { }
-    },
 };
 
 static const Property vfio_pci_dev_nohotplug_properties[] = {
@@ -3673,6 +3690,7 @@ static void register_vfio_pci_dev_type(void)
     vfio_pci_migration_multifd_transfer_prop = qdev_prop_on_off_auto;
     vfio_pci_migration_multifd_transfer_prop.realized_set_allowed = true;
 
+    type_register_static(&vfio_pci_base_dev_info);
     type_register_static(&vfio_pci_dev_info);
     type_register_static(&vfio_pci_nohotplug_dev_info);
 }
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index f835b1dbc2..32a65cc1ae 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -118,8 +118,13 @@ typedef struct VFIOMSIXInfo {
     bool noresize;
 } VFIOMSIXInfo;
 
-#define TYPE_VFIO_PCI "vfio-pci"
-OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI)
+/*
+ * TYPE_VFIO_PCI_BASE is an abstract type used to share code
+ * between VFIO implementations that use a kernel driver
+ * and those that use user sockets.
+ */
+#define TYPE_VFIO_PCI_BASE "vfio-pci-base"
+OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI_BASE)
 
 struct VFIOPCIDevice {
     PCIDevice pdev;
@@ -187,6 +192,9 @@ struct VFIOPCIDevice {
     Notifier irqchip_change_notifier;
 };
 
+#define TYPE_VFIO_PCI "vfio-pci"
+/* TYPE_VFIO_PCI shares struct VFIOPCIDevice. */
+
 /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
 static inline bool vfio_pci_is(VFIOPCIDevice *vdev, uint32_t vendor, uint32_t device)
 {
-- 
2.43.0
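
A VFIO_PCI_BASE() cast works on a vfio-pci instance because a QOM type check conceptually asks whether the target type is the object's type or one of its ancestors. Marking the base as abstract prevents it from being instantiated directly. The block below is a heavily simplified sketch of that parent-chain check. The `SketchType` registry and its helpers are hypothetical and omit interfaces, class structs and everything else object_dynamic_cast() handles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, stripped-down type registry: each type names its parent. */
typedef struct SketchType {
    const char *name;
    const char *parent;   /* NULL for the root of this sketch */
    bool abstract;        /* abstract types cannot be instantiated */
} SketchType;

static const SketchType sketch_types[] = {
    { "pci-device",    NULL,            true  },
    { "vfio-pci-base", "pci-device",    true  },
    { "vfio-pci",      "vfio-pci-base", false },
};

static const SketchType *sketch_type_lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(sketch_types) / sizeof(sketch_types[0]); i++) {
        if (strcmp(sketch_types[i].name, name) == 0) {
            return &sketch_types[i];
        }
    }
    return NULL;
}

/* True if @type is @target or derives from it via the parent chain. */
static bool sketch_type_is_a(const char *type, const char *target)
{
    const SketchType *t;

    assert(type != NULL && target != NULL);
    t = sketch_type_lookup(type);
    while (t != NULL) {
        if (strcmp(t->name, target) == 0) {
            return true;
        }
        t = t->parent ? sketch_type_lookup(t->parent) : NULL;
    }
    return false;
}

static bool sketch_can_instantiate(const char *type)
{
    const SketchType *t = sketch_type_lookup(type);

    return t != NULL && !t->abstract;
}
```

In this model a vfio-pci object passes a check against "vfio-pci-base", but not the other way round. That asymmetry is why code shared by all VFIO PCI variants can cast to the base type.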




* [PATCH v2 14/15] vfio/container: pass listener_begin/commit callbacks
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (12 preceding siblings ...)
  2025-04-30 19:40 ` [PATCH v2 13/15] vfio: add vfio-pci-base class John Levon
@ 2025-04-30 19:40 ` John Levon
  2025-05-05 12:43   ` Cédric Le Goater
  2025-04-30 19:40 ` [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
  2025-05-05 12:51 ` [PATCH v2 00/15] vfio: preparation for vfio-user Cédric Le Goater
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon

The vfio-user container will later need to hook into these callbacks;
set up vfio to use them, and optionally pass them through to the
container.
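
The begin/commit wiring follows a common pattern: recover the container from the embedded listener with container_of(), then call the class hook only if the backend provides one. A minimal sketch with hypothetical stand-in types (`Listener`, `Container`, `ContainerClass`); note that the commit path must fetch `listener_commit`, not `listener_begin`:

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as QEMU's container_of(): recover the outer struct from a member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct Listener { int unused; } Listener;   /* stand-in for MemoryListener */

typedef struct Container Container;
typedef struct ContainerClass {
    void (*listener_begin)(Container *c);    /* optional */
    void (*listener_commit)(Container *c);   /* optional */
} ContainerClass;

struct Container {
    const ContainerClass *klass;
    Listener listener;      /* embedded, as in VFIOContainerBase */
    int begins, commits;    /* bookkeeping for the sketch */
};

static void listener_begin(Listener *l)
{
    Container *c = container_of(l, Container, listener);
    void (*fn)(Container *);

    assert(c->klass);
    fn = c->klass->listener_begin;
    if (fn) {               /* backends that don't care leave the hook NULL */
        fn(c);
    }
}

static void listener_commit(Listener *l)
{
    Container *c = container_of(l, Container, listener);
    void (*fn)(Container *);

    assert(c->klass);
    fn = c->klass->listener_commit;   /* the commit hook, not the begin hook */
    if (fn) {
        fn(c);
    }
}

static void count_begin(Container *c)  { c->begins++; }
static void count_commit(Container *c) { c->commits++; }
```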

Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/listener.c                    | 28 +++++++++++++++++++++++++++
 include/hw/vfio/vfio-container-base.h |  2 ++
 2 files changed, 30 insertions(+)

diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index 2b93ca55b6..bfacb3d8d9 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -411,6 +411,32 @@ static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
     return true;
 }
 
+static void vfio_listener_begin(MemoryListener *listener)
+{
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 listener);
+    void (*listener_begin)(VFIOContainerBase *bcontainer);
+
+    listener_begin = VFIO_IOMMU_GET_CLASS(bcontainer)->listener_begin;
+
+    if (listener_begin) {
+        listener_begin(bcontainer);
+    }
+}
+
+static void vfio_listener_commit(MemoryListener *listener)
+{
+    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
+                                                 listener);
+    void (*listener_commit)(VFIOContainerBase *bcontainer);
+
+    listener_commit = VFIO_IOMMU_GET_CLASS(bcontainer)->listener_commit;
+
+    if (listener_commit) {
+        listener_commit(bcontainer);
+    }
+}
+
 static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
 {
     /*
@@ -1161,6 +1187,8 @@ static void vfio_listener_log_sync(MemoryListener *listener,
 
 static const MemoryListener vfio_memory_listener = {
     .name = "vfio",
+    .begin = vfio_listener_begin,
+    .commit = vfio_listener_commit,
     .region_add = vfio_listener_region_add,
     .region_del = vfio_listener_region_del,
     .log_global_start = vfio_listener_log_global_start,
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index 92cee54d11..e29f7126c5 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -117,6 +117,8 @@ struct VFIOIOMMUClass {
 
     /* basic feature */
     bool (*setup)(VFIOContainerBase *bcontainer, Error **errp);
+    void (*listener_begin)(VFIOContainerBase *bcontainer);
+    void (*listener_commit)(VFIOContainerBase *bcontainer);
     int (*dma_map)(const VFIOContainerBase *bcontainer,
                    hwaddr iova, ram_addr_t size,
                    void *vaddr, bool readonly);
-- 
2.43.0




* [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (13 preceding siblings ...)
  2025-04-30 19:40 ` [PATCH v2 14/15] vfio/container: pass listener_begin/commit callbacks John Levon
@ 2025-04-30 19:40 ` John Levon
  2025-05-05 12:46   ` Cédric Le Goater
  2025-05-05 12:51 ` [PATCH v2 00/15] vfio: preparation for vfio-user Cédric Le Goater
  15 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-04-30 19:40 UTC (permalink / raw)
  To: qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Tony Krowiak, Alex Williamson,
	Stefano Garzarella, Thomas Huth, Paolo Bonzini, Halil Pasic,
	John Levon, John Johnson, Jagannathan Raman, Elena Ufimtseva

Pass through the MemoryRegion to DMA operation handlers of vfio
containers. The vfio-user container will need this later.
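
memory_get_xlat_addr() treats the new @mrp argument as an optional output: it is filled in only when the caller passes a non-NULL pointer, which is why vhost-vdpa and the dirty-tracking path can simply pass NULL. A self-contained sketch of that convention, using a hypothetical `Region` type and `xlat()` function rather than the real QEMU API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a MemoryRegion. */
typedef struct Region {
    const char *name;
    uint64_t base, size;
    bool readonly;
} Region;

/*
 * Translate addr to a region-relative offset. Every output pointer is
 * optional: callers pass NULL for results they don't need, mirroring how
 * memory_get_xlat_addr() handles @mrp.
 */
static bool xlat(const Region *regions, size_t n, uint64_t addr,
                 uint64_t *offset, bool *read_only, const Region **mrp)
{
    for (size_t i = 0; i < n; i++) {
        const Region *r = &regions[i];

        if (addr >= r->base && addr - r->base < r->size) {
            if (offset != NULL) {
                *offset = addr - r->base;
            }
            if (read_only != NULL) {
                *read_only = r->readonly;
            }
            if (mrp != NULL) {
                *mrp = r;
            }
            return true;
        }
    }
    return false;   /* outputs are left untouched on failure */
}
```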

Originally-by: John Johnson <john.g.johnson@oracle.com>
Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
Signed-off-by: John Levon <john.levon@nutanix.com>
---
 hw/vfio/container-base.c              |  4 ++--
 hw/vfio/container.c                   |  3 ++-
 hw/vfio/iommufd.c                     |  3 ++-
 hw/vfio/listener.c                    | 18 +++++++++++-------
 hw/virtio/vhost-vdpa.c                |  2 +-
 include/hw/vfio/vfio-container-base.h |  4 ++--
 include/system/memory.h               |  4 +++-
 system/memory.c                       |  7 ++++++-
 8 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
index 1c6ca94b60..a677bb6694 100644
--- a/hw/vfio/container-base.c
+++ b/hw/vfio/container-base.c
@@ -75,12 +75,12 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
 
 int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            hwaddr iova, ram_addr_t size,
-                           void *vaddr, bool readonly)
+                           void *vaddr, bool readonly, MemoryRegion *mrp)
 {
     VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
 
     g_assert(vioc->dma_map);
-    return vioc->dma_map(bcontainer, iova, size, vaddr, readonly);
+    return vioc->dma_map(bcontainer, iova, size, vaddr, readonly, mrp);
 }
 
 int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
diff --git a/hw/vfio/container.c b/hw/vfio/container.c
index 1000f3c241..aaaca33c8e 100644
--- a/hw/vfio/container.c
+++ b/hw/vfio/container.c
@@ -211,7 +211,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
 }
 
 static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
-                               ram_addr_t size, void *vaddr, bool readonly)
+                               ram_addr_t size, void *vaddr, bool readonly,
+                               MemoryRegion *mrp)
 {
     const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
                                                   bcontainer);
diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
index af1c7ab10a..a2518c4a5d 100644
--- a/hw/vfio/iommufd.c
+++ b/hw/vfio/iommufd.c
@@ -34,7 +34,8 @@
             TYPE_HOST_IOMMU_DEVICE_IOMMUFD "-vfio"
 
 static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
-                            ram_addr_t size, void *vaddr, bool readonly)
+                            ram_addr_t size, void *vaddr, bool readonly,
+                            MemoryRegion *mrp)
 {
     const VFIOIOMMUFDContainer *container =
         container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
index bfacb3d8d9..71f336a31c 100644
--- a/hw/vfio/listener.c
+++ b/hw/vfio/listener.c
@@ -93,12 +93,12 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
 /* Called with rcu_read_lock held.  */
 static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                                ram_addr_t *ram_addr, bool *read_only,
-                               Error **errp)
+                               MemoryRegion **mrp, Error **errp)
 {
     bool ret, mr_has_discard_manager;
 
     ret = memory_get_xlat_addr(iotlb, vaddr, ram_addr, read_only,
-                               &mr_has_discard_manager, errp);
+                               &mr_has_discard_manager, mrp, errp);
     if (ret && mr_has_discard_manager) {
         /*
          * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
@@ -126,6 +126,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
     VFIOContainerBase *bcontainer = giommu->bcontainer;
     hwaddr iova = iotlb->iova + giommu->iommu_offset;
+    MemoryRegion *mrp;
     void *vaddr;
     int ret;
     Error *local_err = NULL;
@@ -150,7 +151,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
         bool read_only;
 
-        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &local_err)) {
+        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &mrp,
+                                &local_err)) {
             error_report_err(local_err);
             goto out;
         }
@@ -163,7 +165,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
          */
         ret = vfio_container_dma_map(bcontainer, iova,
                                      iotlb->addr_mask + 1, vaddr,
-                                     read_only);
+                                     read_only, mrp);
         if (ret) {
             error_report("vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
                          "0x%"HWADDR_PRIx", %p) = %d (%s)",
@@ -233,7 +235,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
         vaddr = memory_region_get_ram_ptr(section->mr) + start;
 
         ret = vfio_container_dma_map(bcontainer, iova, next - start,
-                                     vaddr, section->readonly);
+                                     vaddr, section->readonly, section->mr);
         if (ret) {
             /* Rollback */
             vfio_ram_discard_notify_discard(rdl, section);
@@ -557,7 +559,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
     }
 
     ret = vfio_container_dma_map(bcontainer, iova, int128_get64(llsize),
-                                 vaddr, section->readonly);
+                                 vaddr, section->readonly, section->mr);
     if (ret) {
         error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
                    "0x%"HWADDR_PRIx", %p) = %d (%s)",
@@ -1021,7 +1023,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     }
 
     rcu_read_lock();
-    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
+    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, NULL,
+                            &local_err)) {
+        error_report_err(local_err);
         goto out_unlock;
     }
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 1ab2c11fa8..4c4b3d1371 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -228,7 +228,7 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
     if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
         bool read_only;
 
-        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL,
+        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL, NULL,
                                   &local_err)) {
             error_report_err(local_err);
             return;
diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
index e29f7126c5..09b72e9969 100644
--- a/include/hw/vfio/vfio-container-base.h
+++ b/include/hw/vfio/vfio-container-base.h
@@ -78,7 +78,7 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
 
 int vfio_container_dma_map(VFIOContainerBase *bcontainer,
                            hwaddr iova, ram_addr_t size,
-                           void *vaddr, bool readonly);
+                           void *vaddr, bool readonly, MemoryRegion *mrp);
 int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
                              hwaddr iova, ram_addr_t size,
                              IOMMUTLBEntry *iotlb, bool unmap_all);
@@ -121,7 +121,7 @@ struct VFIOIOMMUClass {
     void (*listener_commit)(VFIOContainerBase *bcontainer);
     int (*dma_map)(const VFIOContainerBase *bcontainer,
                    hwaddr iova, ram_addr_t size,
-                   void *vaddr, bool readonly);
+                   void *vaddr, bool readonly, MemoryRegion *mrp);
     int (*dma_unmap)(const VFIOContainerBase *bcontainer,
                      hwaddr iova, ram_addr_t size,
                      IOMMUTLBEntry *iotlb, bool unmap_all);
diff --git a/include/system/memory.h b/include/system/memory.h
index fbbf4cf911..eca1d9f32e 100644
--- a/include/system/memory.h
+++ b/include/system/memory.h
@@ -746,13 +746,15 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
  * @read_only: indicates if writes are allowed
  * @mr_has_discard_manager: indicates memory is controlled by a
  *                          RamDiscardManager
+ * @mrp: if non-NULL, fill in with MemoryRegion
  * @errp: pointer to Error*, to store an error if it happens.
  *
  * Return: true on success, else false setting @errp with error.
  */
 bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                           ram_addr_t *ram_addr, bool *read_only,
-                          bool *mr_has_discard_manager, Error **errp);
+                          bool *mr_has_discard_manager, MemoryRegion **mrp,
+                          Error **errp);
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
 typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
diff --git a/system/memory.c b/system/memory.c
index 71434e7ad0..79671943ce 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -2176,7 +2176,8 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
 /* Called with rcu_read_lock held.  */
 bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
                           ram_addr_t *ram_addr, bool *read_only,
-                          bool *mr_has_discard_manager, Error **errp)
+                          bool *mr_has_discard_manager, MemoryRegion **mrp,
+                          Error **errp)
 {
     MemoryRegion *mr;
     hwaddr xlat;
@@ -2241,6 +2242,10 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
         *read_only = !writable || mr->readonly;
     }
 
+    if (mrp != NULL) {
+        *mrp = mr;
+    }
+
     return true;
 }
 
-- 
2.43.0




* Re: [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper
  2025-04-30 19:39 ` [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper John Levon
@ 2025-05-01 11:53   ` Anthony Krowiak
  2025-05-05  9:19   ` Cédric Le Goater
  1 sibling, 0 replies; 41+ messages in thread
From: Anthony Krowiak @ 2025-05-01 11:53 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Cédric Le Goater,
	Tomita Moeko, Markus Armbruster, Matthew Rosato, Eric Farman,
	David Hildenbrand, Philippe Mathieu-Daudé,
	Michael S. Tsirkin, Alex Williamson, Stefano Garzarella,
	Thomas Huth, Paolo Bonzini, Halil Pasic




On 4/30/25 3:39 PM, John Levon wrote:
> Add a helper similar to vfio_device_get_region_info() and use it
> everywhere.
>
> Replace a couple of needless allocations with stack variables.
>
> As a side-effect, this fixes a minor error reporting issue in the call
> from vfio_msix_early_setup().
>
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/ap.c                  | 19 ++++++++++---------
>   hw/vfio/ccw.c                 | 20 +++++++++++---------
>   hw/vfio/device.c              | 15 +++++++++++++++
>   hw/vfio/pci.c                 | 23 +++++++++++------------
>   hw/vfio/platform.c            |  6 +++---
>   include/hw/vfio/vfio-device.h |  3 +++
>   6 files changed, 53 insertions(+), 33 deletions(-)
>
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index 4f88f80c54..4f97260dac 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -139,10 +139,10 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
>                                             unsigned int irq, Error **errp)
>   {
>       int fd;
> -    size_t argsz;
> +    int ret;
>       IOHandler *fd_read;
>       EventNotifier *notifier;
> -    g_autofree struct vfio_irq_info *irq_info = NULL;
> +    struct vfio_irq_info irq_info;
>       VFIODevice *vdev = &vapdev->vdev;
>   
>       switch (irq) {
> @@ -165,14 +165,15 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
>           return false;
>       }
>   
> -    argsz = sizeof(*irq_info);
> -    irq_info = g_malloc0(argsz);
> -    irq_info->index = irq;
> -    irq_info->argsz = argsz;
> +    ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
> +
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "vfio: Error getting irq info");
> +        return false;
> +    }
>   
> -    if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
> -              irq_info) < 0 || irq_info->count < 1) {
> -        error_setg_errno(errp, errno, "vfio: Error getting irq info");
> +    if (irq_info.count < 1) {
> +        error_setg(errp, "vfio: Error getting irq info, count=0");
>           return false;
>       }

The changes above look good to me.

>   
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index fde0c3fbef..ab3fabf991 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -376,8 +376,8 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
>                                              Error **errp)
>   {
>       VFIODevice *vdev = &vcdev->vdev;
> -    g_autofree struct vfio_irq_info *irq_info = NULL;
> -    size_t argsz;
> +    struct vfio_irq_info irq_info;
> +    int ret;
>       int fd;
>       EventNotifier *notifier;
>       IOHandler *fd_read;
> @@ -406,13 +406,15 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
>           return false;
>       }
>   
> -    argsz = sizeof(*irq_info);
> -    irq_info = g_malloc0(argsz);
> -    irq_info->index = irq;
> -    irq_info->argsz = argsz;
> -    if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
> -              irq_info) < 0 || irq_info->count < 1) {
> -        error_setg_errno(errp, errno, "vfio: Error getting irq info");
> +    ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
> +
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "vfio: Error getting irq info");
> +        return false;
> +    }
> +
> +    if (irq_info.count < 1) {
> +        error_setg(errp, "vfio: Error getting irq info, count=0");
>           return false;
>       }
>   
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 9673b0717e..5d837092cb 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -185,6 +185,21 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>       return false;
>   }
>   
> +int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
> +                             struct vfio_irq_info *info)
> +{
> +    int ret;
> +
> +    memset(info, 0, sizeof(*info));
> +
> +    info->argsz = sizeof(*info);
> +    info->index = index;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
>   int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>                                   struct vfio_region_info **info)
>   {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6908bcc0d3..407cf43387 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1555,8 +1555,7 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       uint16_t ctrl;
>       uint32_t table, pba;
>       int ret, fd = vdev->vbasedev.fd;
> -    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
> -                                      .index = VFIO_PCI_MSIX_IRQ_INDEX };
> +    struct vfio_irq_info irq_info;
>       VFIOMSIXInfo *msix;
>   
>       pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
> @@ -1593,7 +1592,8 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
>       msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX,
> +                                   &irq_info);
>       if (ret < 0) {
>           error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
>           g_free(msix);
> @@ -2736,7 +2736,7 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>   {
>       VFIODevice *vbasedev = &vdev->vbasedev;
>       g_autofree struct vfio_region_info *reg_info = NULL;
> -    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> +    struct vfio_irq_info irq_info;
>       int i, ret = -1;
>   
>       /* Sanity check device */
> @@ -2797,12 +2797,10 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>           }
>       }
>   
> -    irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
> -
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    ret = vfio_device_get_irq_info(vbasedev, VFIO_PCI_ERR_IRQ_INDEX, &irq_info);
>       if (ret) {
>           /* This can fail for an old kernel or legacy PCI dev */
> -        trace_vfio_populate_device_get_irq_info_failure(strerror(errno));
> +        trace_vfio_populate_device_get_irq_info_failure(strerror(-ret));
>       } else if (irq_info.count == 1) {
>           vdev->pci_aer = true;
>       } else {
> @@ -2911,17 +2909,18 @@ static void vfio_req_notifier_handler(void *opaque)
>   
>   static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
>   {
> -    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
> -                                      .index = VFIO_PCI_REQ_IRQ_INDEX };
> +    struct vfio_irq_info irq_info;
>       Error *err = NULL;
>       int32_t fd;
> +    int ret;
>   
>       if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
>           return;
>       }
>   
> -    if (ioctl(vdev->vbasedev.fd,
> -              VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count < 1) {
> +    ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX,
> +                                   &irq_info);
> +    if (ret < 0 || irq_info.count < 1) {
>           return;
>       }
>   
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index ffb3681607..9a21f2e50a 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -474,10 +474,10 @@ static bool vfio_populate_device(VFIODevice *vbasedev, Error **errp)
>       QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>   
>       for (i = 0; i < vbasedev->num_irqs; i++) {
> -        struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> +        struct vfio_irq_info irq;
> +
> +        ret = vfio_device_get_irq_info(vbasedev, i, &irq);
>   
> -        irq.index = i;
> -        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>           if (ret) {
>               error_setg_errno(errp, -ret, "failed to get device irq info");
>               goto irq_err;
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 666a0b50b4..5b833868c9 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -146,6 +146,9 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>   int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
>                                        uint32_t subtype, struct vfio_region_info **info);
>   bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
> +
> +int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
> +                                struct vfio_irq_info *info);
>   #endif
>   
>   /* Returns 0 on success, or a negative errno. */
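
The new helper follows the convention in that header comment: return 0 (or a non-negative value) on success and a negative errno on failure, so callers can pass `-ret` to error_setg_errno() or strerror() directly. The old vfio_msix_early_setup() code passed `-ret` from a raw ioctl(), which returns -1 on failure, so it always reported errno 1. A minimal sketch of the wrapping pattern, with a hypothetical `read_first_byte()` standing in for the ioctl() call:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical wrapper: reads one byte from fd. Returns the number of
 * bytes read (0 or 1) on success, or -errno on failure, so the caller
 * never has to look at errno itself.
 */
static int read_first_byte(int fd, unsigned char *out)
{
    ssize_t ret;

    memset(out, 0, sizeof(*out));   /* like the memset() of the info struct */
    ret = read(fd, out, 1);
    return ret < 0 ? -errno : (int)ret;
}
```

A caller then reports the failure with `strerror(-ret)`, as the updated trace call in vfio_populate_device() does.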




* Re: [PATCH v2 01/15] vfio: add vfio_prepare_device()
  2025-04-30 19:39 ` [PATCH v2 01/15] vfio: add vfio_prepare_device() John Levon
@ 2025-05-05  8:35   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05  8:35 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

John,

On 4/30/25 21:39, John Levon wrote:
> Commonize some initialization code shared by the legacy and iommufd vfio
> implementations.
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/container.c           | 14 ++------------
>   hw/vfio/device.c              | 14 ++++++++++++++
>   hw/vfio/iommufd.c             |  9 +--------
>   include/hw/vfio/vfio-device.h |  3 +++
>   4 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 77ff56b43f..aa9d5b731b 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -811,18 +811,14 @@ static bool vfio_device_get(VFIOGroup *group, const char *name,
>           }
>       }
>   
> +    vfio_device_prepare(vbasedev, &group->container->bcontainer, info);
> +
>       vbasedev->fd = fd;
>       vbasedev->group = group;
>       QLIST_INSERT_HEAD(&group->device_list, vbasedev, next);
>   
> -    vbasedev->num_irqs = info->num_irqs;
> -    vbasedev->num_regions = info->num_regions;
> -    vbasedev->flags = info->flags;
> -
>       trace_vfio_device_get(name, info->flags, info->num_regions, info->num_irqs);
>   
> -    vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET);
> -
>       return true;
>   }
>   
> @@ -875,7 +871,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>       int groupid = vfio_device_get_groupid(vbasedev, errp);
>       VFIODevice *vbasedev_iter;
>       VFIOGroup *group;
> -    VFIOContainerBase *bcontainer;
>   
>       if (groupid < 0) {
>           return false;
> @@ -904,11 +899,6 @@ static bool vfio_legacy_attach_device(const char *name, VFIODevice *vbasedev,
>           goto device_put_exit;
>       }
>   
> -    bcontainer = &group->container->bcontainer;
> -    vbasedev->bcontainer = bcontainer;
> -    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
> -    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
> -
>       return true;
>   
>   device_put_exit:
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index d625a7c4db..f3b9902d21 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -398,3 +398,17 @@ void vfio_device_detach(VFIODevice *vbasedev)
>       }
>       VFIO_IOMMU_GET_CLASS(vbasedev->bcontainer)->detach_device(vbasedev);
>   }
> +
> +void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
> +                         struct vfio_device_info *info)
> +{
> +    vbasedev->num_irqs = info->num_irqs;
> +    vbasedev->num_regions = info->num_regions;
> +    vbasedev->flags = info->flags;
> +    vbasedev->reset_works = !!(info->flags & VFIO_DEVICE_FLAGS_RESET);
> +
> +    vbasedev->bcontainer = bcontainer;
> +    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
> +
> +    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
> +}
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 232c06dd15..83033c352a 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -588,14 +588,7 @@ found_container:
>           iommufd_cdev_ram_block_discard_disable(false);
>       }
>   
> -    vbasedev->group = 0;
> -    vbasedev->num_irqs = dev_info.num_irqs;
> -    vbasedev->num_regions = dev_info.num_regions;
> -    vbasedev->flags = dev_info.flags;
> -    vbasedev->reset_works = !!(dev_info.flags & VFIO_DEVICE_FLAGS_RESET);
> -    vbasedev->bcontainer = bcontainer;
> -    QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
> -    QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
> +    vfio_device_prepare(vbasedev, bcontainer, &dev_info);
>   
>       trace_iommufd_cdev_device_info(vbasedev->name, devfd, vbasedev->num_irqs,
>                                      vbasedev->num_regions, vbasedev->flags);
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 81c95bb51e..9cb5671ab5 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -130,6 +130,9 @@ bool vfio_device_attach(char *name, VFIODevice *vbasedev,
>   void vfio_device_detach(VFIODevice *vbasedev);
>   VFIODevice *vfio_get_vfio_device(Object *obj);
>   
> +void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
> +                         struct vfio_device_info *info);
> +
>   typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
>   extern VFIODeviceList vfio_device_list;
>   
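
vfio_device_prepare() gathers the assignments that both backends used to duplicate. A reduced sketch with hypothetical stand-in structs (`dev_info`, `dev`) and flag (`DEV_FLAGS_RESET`) shows the field copying, including the `!!` idiom that turns a flag test into 0 or 1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bit; the real one is VFIO_DEVICE_FLAGS_RESET. */
#define DEV_FLAGS_RESET (1u << 0)

/* Stand-in for struct vfio_device_info, as returned by either backend. */
struct dev_info {
    uint32_t flags;
    uint32_t num_regions;
    uint32_t num_irqs;
};

/* Stand-in for the VFIODevice fields that the helper fills in. */
struct dev {
    uint32_t flags, num_regions, num_irqs;
    bool reset_works;
};

/* One helper replaces two copies of the same assignments. */
static void dev_prepare(struct dev *d, const struct dev_info *info)
{
    d->num_irqs = info->num_irqs;
    d->num_regions = info->num_regions;
    d->flags = info->flags;
    /* !! mirrors the original; assigning to bool would normalise anyway. */
    d->reset_works = !!(info->flags & DEV_FLAGS_RESET);
}
```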

Please add to your .git/config file :

[diff]
	orderFile = /path/to/qemu/scripts/git.orderfile

Thanks,

C.






* Re: [PATCH v2 02/15] vfio: add vfio_unprepare_device()
  2025-04-30 19:39 ` [PATCH v2 02/15] vfio: add vfio_unprepare_device() John Levon
@ 2025-05-05  9:18   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05  9:18 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:39, John Levon wrote:
> Add a helper that's the inverse of vfio_prepare_device().
> 
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/container.c           | 6 +++---
>   hw/vfio/device.c              | 7 +++++++
>   hw/vfio/iommufd.c             | 4 +---
>   include/hw/vfio/vfio-device.h | 1 +
>   4 files changed, 12 insertions(+), 6 deletions(-)
> 
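
vfio_unprepare_device() is described as the inverse of the prepare step, which links the device into both the per-container list and the global vfio_device_list. The diff body is not shown here, so the following only illustrates the list handling. It is a minimal sketch using the BSD <sys/queue.h> LIST macros (QEMU's QLIST macros are derived from them), with hypothetical names `dev`, `dev_prepare()` and `dev_unprepare()`:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* A device sits on two intrusive lists at once, like VFIODevice. */
struct dev {
    int id;
    LIST_ENTRY(dev) container_next;   /* per-container list */
    LIST_ENTRY(dev) global_next;      /* global device list */
};
LIST_HEAD(dev_list, dev);

static struct dev_list global_list = LIST_HEAD_INITIALIZER(global_list);

/* prepare: link the device into both lists. */
static void dev_prepare(struct dev_list *container_list, struct dev *d)
{
    LIST_INSERT_HEAD(container_list, d, container_next);
    LIST_INSERT_HEAD(&global_list, d, global_next);
}

/* unprepare: the exact inverse, unlinking the device from both lists. */
static void dev_unprepare(struct dev *d)
{
    LIST_REMOVE(d, global_next);
    LIST_REMOVE(d, container_next);
}

static size_t dev_count_global(void)
{
    struct dev *d;
    size_t n = 0;

    LIST_FOREACH(d, &global_list, global_next) {
        n++;
    }
    return n;
}
```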

Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.





* Re: [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper
  2025-04-30 19:39 ` [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper John Levon
  2025-05-01 11:53   ` Anthony Krowiak
@ 2025-05-05  9:19   ` Cédric Le Goater
  2025-05-06 11:38     ` John Levon
  1 sibling, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05  9:19 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:39, John Levon wrote:
> Add a helper similar to vfio_device_get_region_info() and use it
> everywhere.
> 
> Replace a couple of needless allocations with stack variables.
> 
> As a side-effect, this fixes a minor error reporting issue in the call
> from vfio_msix_early_setup().
> 
> Reviewed-by: Cédric Le Goater <clg@redhat.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/ap.c                  | 19 ++++++++++---------
>   hw/vfio/ccw.c                 | 20 +++++++++++---------
>   hw/vfio/device.c              | 15 +++++++++++++++
>   hw/vfio/pci.c                 | 23 +++++++++++------------
>   hw/vfio/platform.c            |  6 +++---
>   include/hw/vfio/vfio-device.h |  3 +++
>   6 files changed, 53 insertions(+), 33 deletions(-)
> 
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index 4f88f80c54..4f97260dac 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -139,10 +139,10 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
>                                             unsigned int irq, Error **errp)
>   {
>       int fd;
> -    size_t argsz;
> +    int ret;
>       IOHandler *fd_read;
>       EventNotifier *notifier;
> -    g_autofree struct vfio_irq_info *irq_info = NULL;
> +    struct vfio_irq_info irq_info;
>       VFIODevice *vdev = &vapdev->vdev;
>   
>       switch (irq) {
> @@ -165,14 +165,15 @@ static bool vfio_ap_register_irq_notifier(VFIOAPDevice *vapdev,
>           return false;
>       }
>   
> -    argsz = sizeof(*irq_info);
> -    irq_info = g_malloc0(argsz);
> -    irq_info->index = irq;
> -    irq_info->argsz = argsz;
> +    ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
> +
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "vfio: Error getting irq info");
> +        return false;
> +    }
>   
> -    if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
> -              irq_info) < 0 || irq_info->count < 1) {
> -        error_setg_errno(errp, errno, "vfio: Error getting irq info");
> +    if (irq_info.count < 1) {
> +        error_setg(errp, "vfio: Error getting irq info, count=0");
>           return false;
>       }
>   
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index fde0c3fbef..ab3fabf991 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -376,8 +376,8 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
>                                              Error **errp)
>   {
>       VFIODevice *vdev = &vcdev->vdev;
> -    g_autofree struct vfio_irq_info *irq_info = NULL;
> -    size_t argsz;
> +    struct vfio_irq_info irq_info;
> +    int ret;
>       int fd;
>       EventNotifier *notifier;
>       IOHandler *fd_read;
> @@ -406,13 +406,15 @@ static bool vfio_ccw_register_irq_notifier(VFIOCCWDevice *vcdev,
>           return false;
>       }
>   
> -    argsz = sizeof(*irq_info);
> -    irq_info = g_malloc0(argsz);
> -    irq_info->index = irq;
> -    irq_info->argsz = argsz;
> -    if (ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO,
> -              irq_info) < 0 || irq_info->count < 1) {
> -        error_setg_errno(errp, errno, "vfio: Error getting irq info");
> +    ret = vfio_device_get_irq_info(vdev, irq, &irq_info);
> +
> +    if (ret < 0) {
> +        error_setg_errno(errp, -ret, "vfio: Error getting irq info");
> +        return false;
> +    }
> +
> +    if (irq_info.count < 1) {
> +        error_setg(errp, "vfio: Error getting irq info, count=0");
>           return false;
>       }
>   
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 9673b0717e..5d837092cb 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -185,6 +185,21 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>       return false;
>   }
>   
> +int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
> +                             struct vfio_irq_info *info)
> +{
> +    int ret;
> +
> +    memset(info, 0, sizeof(*info));
> +
> +    info->argsz = sizeof(*info);
> +    info->index = index;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
>   int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>                                   struct vfio_region_info **info)
>   {
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 6908bcc0d3..407cf43387 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -1555,8 +1555,7 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       uint16_t ctrl;
>       uint32_t table, pba;
>       int ret, fd = vdev->vbasedev.fd;
> -    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
> -                                      .index = VFIO_PCI_MSIX_IRQ_INDEX };
> +    struct vfio_irq_info irq_info;
>       VFIOMSIXInfo *msix;
>   
>       pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
> @@ -1593,7 +1592,8 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
>       msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX,
> +                                   &irq_info);
>       if (ret < 0) {
>           error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
>           g_free(msix);
> @@ -2736,7 +2736,7 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>   {
>       VFIODevice *vbasedev = &vdev->vbasedev;
>       g_autofree struct vfio_region_info *reg_info = NULL;
> -    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> +    struct vfio_irq_info irq_info;
>       int i, ret = -1;
>   
>       /* Sanity check device */
> @@ -2797,12 +2797,10 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>           }
>       }
>   
> -    irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
> -
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    ret = vfio_device_get_irq_info(vbasedev, VFIO_PCI_ERR_IRQ_INDEX, &irq_info);
>       if (ret) {
>           /* This can fail for an old kernel or legacy PCI dev */
> -        trace_vfio_populate_device_get_irq_info_failure(strerror(errno));
> +        trace_vfio_populate_device_get_irq_info_failure(strerror(-ret));
>       } else if (irq_info.count == 1) {
>           vdev->pci_aer = true;
>       } else {
> @@ -2911,17 +2909,18 @@ static void vfio_req_notifier_handler(void *opaque)
>   
>   static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
>   {
> -    struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
> -                                      .index = VFIO_PCI_REQ_IRQ_INDEX };
> +    struct vfio_irq_info irq_info;
>       Error *err = NULL;
>       int32_t fd;
> +    int ret;
>   
>       if (!(vdev->features & VFIO_FEATURE_ENABLE_REQ)) {
>           return;
>       }
>   
> -    if (ioctl(vdev->vbasedev.fd,
> -              VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count < 1) {
> +    ret = vfio_device_get_irq_info(&vdev->vbasedev, VFIO_PCI_REQ_IRQ_INDEX,
> +                                   &irq_info);
> +    if (ret < 0 || irq_info.count < 1) {
>           return;
>       }
>   
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index ffb3681607..9a21f2e50a 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -474,10 +474,10 @@ static bool vfio_populate_device(VFIODevice *vbasedev, Error **errp)
>       QSIMPLEQ_INIT(&vdev->pending_intp_queue);
>   
>       for (i = 0; i < vbasedev->num_irqs; i++) {
> -        struct vfio_irq_info irq = { .argsz = sizeof(irq) };
> +        struct vfio_irq_info irq;
> +
> +        ret = vfio_device_get_irq_info(vbasedev, i, &irq);
>   
> -        irq.index = i;
> -        ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, &irq);
>           if (ret) {
>               error_setg_errno(errp, -ret, "failed to get device irq info");
>               goto irq_err;
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 666a0b50b4..5b833868c9 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -146,6 +146,9 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>   int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
>                                        uint32_t subtype, struct vfio_region_info **info);
>   bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type);
> +
> +int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
> +                                struct vfio_irq_info *info);

This is breaking the Windows build.


Thanks,

C.



>   #endif
>   
>   /* Returns 0 on success, or a negative errno. */



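The calling convention adopted by vfio_device_get_irq_info() can be sketched outside QEMU: zero a caller-provided stack struct, set argsz and index, issue the call, and return 0 or a negative errno. The struct layout, the simulated ioctl and every sketch_* name below are hypothetical stand-ins, not QEMU or kernel API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vfio_irq_info. */
struct sketch_irq_info {
    uint32_t argsz;
    uint32_t flags;
    uint32_t index;
    uint32_t count;
};

/*
 * Simulated VFIO_DEVICE_GET_IRQ_INFO: like ioctl(2) it returns -1 and sets
 * errno on failure. The simulated device has 3 IRQ indexes of 1 vector each.
 */
static int sketch_fake_ioctl(struct sketch_irq_info *info)
{
    if (info->argsz < sizeof(*info) || info->index >= 3) {
        errno = EINVAL;
        return -1;
    }
    info->count = 1;
    return 0;
}

/* Fill in argsz/index on the caller's stack struct; return 0 or -errno. */
static int sketch_get_irq_info(int index, struct sketch_irq_info *info)
{
    int ret;

    memset(info, 0, sizeof(*info));
    info->argsz = sizeof(*info);
    info->index = index;

    ret = sketch_fake_ioctl(info);

    return ret < 0 ? -errno : ret;
}
```

Callers then handle two distinct failures separately, as the patch does: ret < 0 (report with -ret) and a successful call that reports count < 1.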

* Re: [PATCH v2 05/15] vfio: consistently handle return value for helpers
  2025-04-30 19:39 ` [PATCH v2 05/15] vfio: consistently handle return value for helpers John Levon
@ 2025-05-05  9:32   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05  9:32 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:39, John Levon wrote:
> Various bits of code that call vfio device APIs should consistently use
> the "return -errno" approach for passing errors back, rather than
> presuming errno is (still) set correctly.
> 
> Signed-off-by: John Levon <john.levon@nutanix.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/pci.c | 33 ++++++++++++++++++++-------------
>   1 file changed, 20 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 407cf43387..768c48d7ad 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -398,7 +398,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
>   
>       ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
>   
> -    return ret;
> +    return ret < 0 ? -errno : ret;
>   }
>   
>   static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> @@ -459,7 +459,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>   
>       g_free(irq_set);
>   
> -    return ret;
> +    return ret < 0 ? -errno : ret;
>   }
>   
>   static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
> @@ -581,7 +581,8 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>               vfio_device_irq_disable(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
>               ret = vfio_enable_vectors(vdev, true);
>               if (ret) {
> -                error_report("vfio: failed to enable vectors, %d", ret);
> +                error_report("vfio: failed to enable vectors, %s",
> +                             strerror(-ret));
>               }
>           } else {
>               Error *err = NULL;
> @@ -695,7 +696,8 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>       if (vdev->nr_vectors) {
>           ret = vfio_enable_vectors(vdev, true);
>           if (ret) {
> -            error_report("vfio: failed to enable vectors, %d", ret);
> +            error_report("vfio: failed to enable vectors, %s",
> +                         strerror(-ret));
>           }
>       } else {
>           /*
> @@ -712,7 +714,8 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
>            */
>           ret = vfio_enable_msix_no_vec(vdev);
>           if (ret) {
> -            error_report("vfio: failed to enable MSI-X, %d", ret);
> +            error_report("vfio: failed to enable MSI-X, %s",
> +                         strerror(-ret));
>           }
>       }
>   
> @@ -765,7 +768,8 @@ retry:
>       ret = vfio_enable_vectors(vdev, false);
>       if (ret) {
>           if (ret < 0) {
> -            error_report("vfio: Error: Failed to setup MSI fds: %m");
> +            error_report("vfio: Error: Failed to setup MSI fds: %s",
> +                         strerror(-ret));
>           } else {
>               error_report("vfio: Error: Failed to enable %d "
>                            "MSI vectors, retry with %d", vdev->nr_vectors, ret);
> @@ -882,17 +886,21 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
>   static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>   {
>       g_autofree struct vfio_region_info *reg_info = NULL;
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint64_t size;
>       off_t off = 0;
>       ssize_t bytes;
> +    int ret;
> +
> +    ret = vfio_device_get_region_info(vbasedev, VFIO_PCI_ROM_REGION_INDEX,
> +                                      &reg_info);
>   
> -    if (vfio_device_get_region_info(&vdev->vbasedev,
> -                                    VFIO_PCI_ROM_REGION_INDEX, &reg_info)) {
> -        error_report("vfio: Error getting ROM info: %m");
> +    if (ret != 0) {
> +        error_report("vfio: Error getting ROM info: %s", strerror(-ret));
>           return;
>       }
>   
> -    trace_vfio_pci_load_rom(vdev->vbasedev.name, (unsigned long)reg_info->size,
> +    trace_vfio_pci_load_rom(vbasedev->name, (unsigned long)reg_info->size,
>                               (unsigned long)reg_info->offset,
>                               (unsigned long)reg_info->flags);
>   
> @@ -901,8 +909,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>   
>       if (!vdev->rom_size) {
>           vdev->rom_read_failed = true;
> -        error_report("vfio-pci: Cannot read device rom at "
> -                    "%s", vdev->vbasedev.name);
> +        error_report("vfio-pci: Cannot read device rom at %s", vbasedev->name);
>           error_printf("Device option ROM contents are probably invalid "
>                       "(check dmesg).\nSkip option ROM probe with rombar=0, "
>                       "or load from file with romfile=\n");
> @@ -913,7 +920,7 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>       memset(vdev->rom, 0xff, size);
>   
>       while (size) {
> -        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
> +        bytes = pread(vbasedev->fd, vdev->rom + off,
>                         size, vdev->rom_offset + off);
>           if (bytes == 0) {
>               break;



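The reason for the "return -errno" style is that errno is only meaningful immediately after the failing call; anything executed afterwards may overwrite it. A minimal simulation, in which the sketch_* functions are hypothetical and the clobbering is forced deliberately:

```c
#include <assert.h>
#include <errno.h>

/* Simulated failing device access: returns -1 and sets errno. */
static int sketch_device_call(void)
{
    errno = EIO;
    return -1;
}

/* Simulated later step (e.g. cleanup) that happens to overwrite errno. */
static void sketch_cleanup(void)
{
    errno = EBADF;
}

/* Old style: return the raw -1; the caller inspects errno afterwards. */
static int sketch_op_raw(void)
{
    int ret = sketch_device_call();

    sketch_cleanup();
    return ret;             /* errno now says EBADF, not the real EIO */
}

/* -errno style: capture the error right after the failing call. */
static int sketch_op_errno(void)
{
    int ret = sketch_device_call();

    if (ret < 0) {
        ret = -errno;       /* captured before anything can clobber it */
    }
    sketch_cleanup();
    return ret;
}
```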

* Re: [PATCH v2 06/15] include/qemu: add strread/writeerror()
  2025-04-30 19:39 ` [PATCH v2 06/15] include/qemu: add strread/writeerror() John Levon
@ 2025-05-05  9:37   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05  9:37 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:39, John Levon wrote:
> Add simple helpers to correctly report failures from read/write routines
> using the return -errno style.

I would keep these helpers under vfio for the moment.

Their use is a bit context-specific and making them common requires
more work, which would likely make them less useful.


Thanks,

C.





> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   include/qemu/error-report.h | 14 ++++++++++++++
>   1 file changed, 14 insertions(+)
> 
> diff --git a/include/qemu/error-report.h b/include/qemu/error-report.h
> index 3ae2357fda..67afe5a020 100644
> --- a/include/qemu/error-report.h
> +++ b/include/qemu/error-report.h
> @@ -70,6 +70,20 @@ void error_init(const char *argv0);
>                                 fmt, ##__VA_ARGS__);      \
>       })
>   
> +/*
> + * Given a return value of either a short number of bytes read or -errno,
> + * construct a meaningful error message.
> + */
> +#define strreaderror(ret) \
> +    (ret < 0 ? strerror(-ret) : "short read")
> +
> +/*
> + * Given a return value of either a short number of bytes written or -errno,
> + * construct a meaningful error message.
> + */
> +#define strwriteerror(ret) \
> +    (ret < 0 ? strerror(-ret) : "short write")
> +
>   extern bool message_with_timestamp;
>   extern bool error_with_guestname;
>   extern const char *error_guest_name;



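If the helpers stay local to vfio, one option is static functions rather than macros. The macros as posted do not parenthesize their argument, so an expression such as a - b would expand to strerror(-a - b). A sketch with hypothetical sketch_* names:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical function forms of strreaderror()/strwriteerror(): ret is
 * either a short (non-negative) byte count or a negative errno.
 */
static const char *sketch_strreaderror(ssize_t ret)
{
    return ret < 0 ? strerror((int)-ret) : "short read";
}

static const char *sketch_strwriteerror(ssize_t ret)
{
    return ret < 0 ? strerror((int)-ret) : "short write";
}
```

As functions, the argument is evaluated once and is converted to a fixed type, so no extra parentheses are needed at the call site.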

* Re: [PATCH v2 07/15] vfio: add vfio_pci_config_space_read/write()
  2025-04-30 19:39 ` [PATCH v2 07/15] vfio: add vfio_pci_config_space_read/write() John Levon
@ 2025-05-05  9:45   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05  9:45 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:39, John Levon wrote:
> Add these helpers that access config space and return an -errno style
> return.
> 
> Signed-off-by: John Levon <john.levon@nutanix.com>

Looks ok.


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/pci.c | 123 ++++++++++++++++++++++++++++++++------------------
>   1 file changed, 80 insertions(+), 43 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 768c48d7ad..8455010d62 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -967,6 +967,28 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>       }
>   }
>   
> +/* "Raw" read of underlying config space. */
> +static int vfio_pci_config_space_read(VFIOPCIDevice *vdev, off_t offset,
> +                                      uint32_t size, void *data)
> +{
> +    ssize_t ret;
> +
> +    ret = pread(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
> +
> +    return ret < 0 ? -errno : (int)ret;
> +}
> +
> +/* "Raw" write of underlying config space. */
> +static int vfio_pci_config_space_write(VFIOPCIDevice *vdev, off_t offset,
> +                                       uint32_t size, void *data)
> +{
> +    ssize_t ret;
> +
> +    ret = pwrite(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
> +
> +    return ret < 0 ? -errno : (int)ret;
> +}
> +
>   static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
>   {
>       VFIOPCIDevice *vdev = opaque;
> @@ -1019,10 +1041,9 @@ static const MemoryRegionOps vfio_rom_ops = {
>   
>   static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>   {
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
> -    off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
>       char *name;
> -    int fd = vdev->vbasedev.fd;
>   
>       if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
>           /* Since pci handles romfile, just print a message and return */
> @@ -1039,11 +1060,12 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>        * Use the same size ROM BAR as the physical device.  The contents
>        * will get filled in later when the guest tries to read it.
>        */
> -    if (pread(fd, &orig, 4, offset) != 4 ||
> -        pwrite(fd, &size, 4, offset) != 4 ||
> -        pread(fd, &size, 4, offset) != 4 ||
> -        pwrite(fd, &orig, 4, offset) != 4) {
> -        error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
> +    if (vfio_pci_config_space_read(vdev, PCI_ROM_ADDRESS, 4, &orig) != 4 ||
> +        vfio_pci_config_space_write(vdev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
> +        vfio_pci_config_space_read(vdev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
> +        vfio_pci_config_space_write(vdev, PCI_ROM_ADDRESS, 4, &orig) != 4) {
> +
> +        error_report("%s(%s) ROM access failed", __func__, vbasedev->name);
>           return;
>       }
>   
> @@ -1223,6 +1245,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
>   uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>   {
>       VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
>   
>       memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
> @@ -1235,12 +1258,12 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>       if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
>           ssize_t ret;
>   
> -        ret = pread(vdev->vbasedev.fd, &phys_val, len,
> -                    vdev->config_offset + addr);
> +        ret = vfio_pci_config_space_read(vdev, addr, len, &phys_val);
>           if (ret != len) {
> -            error_report("%s(%s, 0x%x, 0x%x) failed: %m",
> -                         __func__, vdev->vbasedev.name, addr, len);
> -            return -errno;
> +            error_report("%s(%s, 0x%x, 0x%x) failed: %s",
> +                         __func__, vbasedev->name, addr, len,
> +                         strreaderror(ret));
> +            return -1;
>           }
>           phys_val = le32_to_cpu(phys_val);
>       }
> @@ -1256,15 +1279,18 @@ void vfio_pci_write_config(PCIDevice *pdev,
>                              uint32_t addr, uint32_t val, int len)
>   {
>       VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t val_le = cpu_to_le32(val);
> +    int ret;
>   
>       trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>   
>       /* Write everything to VFIO, let it filter out what we can't write */
> -    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
> -                != len) {
> -        error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %m",
> -                     __func__, vdev->vbasedev.name, addr, val, len);
> +    ret = vfio_pci_config_space_write(vdev, addr, len, &val_le);
> +    if (ret != len) {
> +        error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %s",
> +                     __func__, vbasedev->name, addr, val, len,
> +                    strwriteerror(ret));
>       }
>   
>       /* MSI/MSI-X Enabling/Disabling */
> @@ -1352,9 +1378,11 @@ static bool vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
>       int ret, entries;
>       Error *err = NULL;
>   
> -    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
> -              vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
> -        error_setg_errno(errp, errno, "failed reading MSI PCI_CAP_FLAGS");
> +    ret = vfio_pci_config_space_read(vdev, pos + PCI_CAP_FLAGS,
> +                                     sizeof(ctrl), &ctrl);
> +    if (ret != sizeof(ctrl)) {
> +        error_setg(errp, "failed reading MSI PCI_CAP_FLAGS: %s",
> +                   strreaderror(ret));
>           return false;
>       }
>       ctrl = le16_to_cpu(ctrl);
> @@ -1561,30 +1589,35 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       uint8_t pos;
>       uint16_t ctrl;
>       uint32_t table, pba;
> -    int ret, fd = vdev->vbasedev.fd;
>       struct vfio_irq_info irq_info;
>       VFIOMSIXInfo *msix;
> +    int ret;
>   
>       pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
>       if (!pos) {
>           return true;
>       }
>   
> -    if (pread(fd, &ctrl, sizeof(ctrl),
> -              vdev->config_offset + pos + PCI_MSIX_FLAGS) != sizeof(ctrl)) {
> -        error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS");
> +    ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_FLAGS,
> +                                     sizeof(ctrl), &ctrl);
> +    if (ret != sizeof(ctrl)) {
> +        error_setg(errp, "failed to read PCI MSIX FLAGS: %s",
> +                   strreaderror(ret));
>           return false;
>       }
>   
> -    if (pread(fd, &table, sizeof(table),
> -              vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
> -        error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE");
> +    ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_TABLE,
> +                                     sizeof(table), &table);
> +    if (ret != sizeof(table)) {
> +        error_setg(errp, "failed to read PCI MSIX TABLE: %s",
> +                   strreaderror(ret));
>           return false;
>       }
>   
> -    if (pread(fd, &pba, sizeof(pba),
> -              vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
> -        error_setg_errno(errp, errno, "failed to read PCI MSIX PBA");
> +    ret = vfio_pci_config_space_read(vdev, pos + PCI_MSIX_PBA,
> +                                     sizeof(pba), &pba);
> +    if (ret != sizeof(pba)) {
> +        error_setg(errp, "failed to read PCI MSIX PBA: %s", strreaderror(ret));
>           return false;
>       }
>   
> @@ -1744,10 +1777,10 @@ static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
>       }
>   
>       /* Determine what type of BAR this is for registration */
> -    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
> -                vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
> +    ret = vfio_pci_config_space_read(vdev, PCI_BASE_ADDRESS_0 + (4 * nr),
> +                                     sizeof(pci_bar), &pci_bar);
>       if (ret != sizeof(pci_bar)) {
> -        error_report("vfio: Failed to read BAR %d (%m)", nr);
> +        error_report("vfio: Failed to read BAR %d: %s", nr, strreaderror(ret));
>           return;
>       }
>   
> @@ -2450,21 +2483,23 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
>   
>   void vfio_pci_post_reset(VFIOPCIDevice *vdev)
>   {
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       Error *err = NULL;
> -    int nr;
> +    int ret, nr;
>   
>       if (!vfio_intx_enable(vdev, &err)) {
>           error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>       }
>   
>       for (nr = 0; nr < PCI_NUM_REGIONS - 1; ++nr) {
> -        off_t addr = vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr);
> +        off_t addr = PCI_BASE_ADDRESS_0 + (4 * nr);
>           uint32_t val = 0;
>           uint32_t len = sizeof(val);
>   
> -        if (pwrite(vdev->vbasedev.fd, &val, len, addr) != len) {
> -            error_report("%s(%s) reset bar %d failed: %m", __func__,
> -                         vdev->vbasedev.name, nr);
> +        ret = vfio_pci_config_space_write(vdev, addr, len, &val);
> +        if (ret != len) {
> +            error_report("%s(%s) reset bar %d failed: %s", __func__,
> +                         vbasedev->name, nr, strwriteerror(ret));
>           }
>       }
>   
> @@ -3101,6 +3136,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>       int i, ret;
>       char uuid[UUID_STR_LEN];
>       g_autofree char *name = NULL;
> +    uint32_t config_space_size;
>   
>       if (vbasedev->fd < 0 && !vbasedev->sysfsdev) {
>           if (!(~vdev->host.domain || ~vdev->host.bus ||
> @@ -3155,13 +3191,14 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
>           goto error;
>       }
>   
> +    config_space_size = MIN(pci_config_size(&vdev->pdev), vdev->config_size);
> +
>       /* Get a copy of config space */
> -    ret = pread(vbasedev->fd, vdev->pdev.config,
> -                MIN(pci_config_size(&vdev->pdev), vdev->config_size),
> -                vdev->config_offset);
> -    if (ret < (int)MIN(pci_config_size(&vdev->pdev), vdev->config_size)) {
> -        ret = ret < 0 ? -errno : -EFAULT;
> -        error_setg_errno(errp, -ret, "failed to read device config space");
> +    ret = vfio_pci_config_space_read(vdev, 0, config_space_size,
> +                                     vdev->pdev.config);
> +    if (ret < (int)config_space_size) {
> +        ret = ret < 0 ? -ret : EFAULT;
> +        error_setg_errno(errp, ret, "failed to read device config space");
>           goto error;
>       }
>   


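The config space helpers take an offset relative to config space, add the region base, and return either the byte count or a negative errno, so callers can distinguish a short read from a failed one. A minimal sketch using a temporary file as a stand-in for the device region; sketch_config_read() and sketch_make_region() are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Read size bytes at offset within config space: bytes read or -errno. */
static int sketch_config_read(int fd, off_t region_base, off_t offset,
                              uint32_t size, void *data)
{
    ssize_t ret = pread(fd, data, size, region_base + offset);

    return ret < 0 ? -errno : (int)ret;
}

/*
 * Temporary file standing in for a device region: 16 bytes of padding,
 * then 8 bytes of sample "config space". Returns an fd, or -1 on error.
 */
static int sketch_make_region(FILE **keep)
{
    static const uint8_t bytes[24] = {
        [16] = 0x86, 0x80, 0x10, 0x15, 0xaa, 0xbb, 0xcc, 0xdd,
    };
    FILE *f = tmpfile();

    if (!f) {
        return -1;
    }
    if (fwrite(bytes, 1, sizeof(bytes), f) != sizeof(bytes) ||
        fflush(f) != 0) {
        fclose(f);
        return -1;
    }
    *keep = f;
    return fileno(f);
}
```

A read that runs past the end of the region returns a positive but short count, which is the case strreaderror() reports as "short read".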

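The four accesses in vfio_pci_size_rom() (save the ROM BAR, write PCI_ROM_ADDRESS_MASK, read it back, restore the original) follow standard PCI BAR sizing: the device hardwires the low address bits to zero, so the read-back value encodes the ROM size. A minimal simulation, assuming a device model with a power-of-two ROM; the register model and sketch_* names are hypothetical, and the decode shown is the usual ~(value & mask) + 1.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ROM_ADDRESS_MASK (~0x7ffU)  /* same value as PCI_ROM_ADDRESS_MASK */

/*
 * Simulated expansion ROM BAR. rom_size is a power of two of at least
 * 2 KiB, or 0 for no ROM. Bit 0 is the enable bit.
 */
struct sketch_rom_bar {
    uint32_t reg;
    uint32_t rom_size;
};

static void sketch_bar_write(struct sketch_rom_bar *bar, uint32_t val)
{
    /* Address bits below the ROM size are hardwired to zero. */
    bar->reg = (val & ~(bar->rom_size - 1)) | (val & 1U);
}

static uint32_t sketch_bar_read(const struct sketch_rom_bar *bar)
{
    return bar->reg;
}

/* Save, write the mask, read back, restore; then decode the size. */
static uint32_t sketch_size_rom(struct sketch_rom_bar *bar)
{
    uint32_t orig = sketch_bar_read(bar);
    uint32_t size;

    sketch_bar_write(bar, SKETCH_ROM_ADDRESS_MASK);
    size = sketch_bar_read(bar);
    sketch_bar_write(bar, orig);

    return ~(size & SKETCH_ROM_ADDRESS_MASK) + 1;
}
```

For a 64 KiB ROM the read-back value is 0xffff0000, and ~0xffff0000 + 1 = 0x10000. A device without a ROM reads back 0, which decodes to a size of 0.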

* Re: [PATCH v2 09/15] vfio: implement unmap all for DMA unmap callbacks
  2025-04-30 19:39 ` [PATCH v2 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
@ 2025-05-05 11:28   ` Cédric Le Goater
  2025-05-07 11:47     ` John Levon
  0 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 11:28 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:39, John Levon wrote:
> Handle unmap_all in the DMA unmap handlers rather than in the caller.
> 
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/container.c | 45 +++++++++++++++++++++++++++++++++++----------
>   hw/vfio/iommufd.c   | 15 ++++++++++++++-
>   hw/vfio/listener.c  | 19 ++++++-------------
>   3 files changed, 55 insertions(+), 24 deletions(-)
> 
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 766ba5a275..1000f3c241 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -119,12 +119,9 @@ unmap_exit:
>       return ret;
>   }
>   
> -/*
> - * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> - */
> -static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> -                                 hwaddr iova, ram_addr_t size,
> -                                 IOMMUTLBEntry *iotlb, bool unmap_all)
> +static int vfio_legacy_dma_unmap_one(const VFIOContainerBase *bcontainer,
> +                                     hwaddr iova, ram_addr_t size,
> +                                     IOMMUTLBEntry *iotlb)
>   {
>       const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>                                                     bcontainer);
> @@ -138,10 +135,6 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>       int ret;
>       Error *local_err = NULL;
>   
> -    if (unmap_all) {
> -        return -ENOTSUP;
> -    }
> -
>       if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>           if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>               bcontainer->dirty_pages_supported) {
> @@ -185,6 +178,38 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>       return 0;
>   }
>   
> +/*
> + * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> + */
> +static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> +                                 hwaddr iova, ram_addr_t size,
> +                                 IOMMUTLBEntry *iotlb, bool unmap_all)
> +{
> +    int ret;
> +
> +    if (unmap_all) {
> +        /* The unmap ioctl doesn't accept a full 64-bit span. */
> +        Int128 llsize = int128_rshift(int128_2_64(), 1);
> +
> +        ret = vfio_legacy_dma_unmap_one(bcontainer, 0, int128_get64(llsize),
> +                                        iotlb);
> +
> +        if (ret == 0) {
> +            ret = vfio_legacy_dma_unmap_one(bcontainer, int128_get64(llsize),
> +                                            int128_get64(llsize), iotlb);
> +        }
> +
> +    } else {
> +        ret = vfio_legacy_dma_unmap_one(bcontainer, iova, size, iotlb);
> +    }
> +
> +    if (ret != 0) {
> +        return -errno;
> +    }

The ret value should already be an errno, shouldn't it?


Thanks,

C.



> +    return 0;
> +}
> +
>   static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>                                  ram_addr_t size, void *vaddr, bool readonly)
>   {
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 6b2764c044..af1c7ab10a 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -51,8 +51,21 @@ static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
>       const VFIOIOMMUFDContainer *container =
>           container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>   
> +    /* unmap in halves */
>       if (unmap_all) {
> -        return -ENOTSUP;
> +        Int128 llsize = int128_rshift(int128_2_64(), 1);
> +        int ret;
> +
> +        ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
> +                                        0, int128_get64(llsize));
> +
> +        if (ret == 0) {
> +            ret = iommufd_backend_unmap_dma(container->be, container->ioas_id,
> +                                            int128_get64(llsize),
> +                                            int128_get64(llsize));
> +        }
> +
> +        return ret;
>       }
>   
>       /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index c5183700db..e7ade7d62e 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -634,21 +634,14 @@ static void vfio_listener_region_del(MemoryListener *listener,
>       }
>   
>       if (try_unmap) {
> +        bool unmap_all = false;
> +
>           if (int128_eq(llsize, int128_2_64())) {
> -            /* The unmap ioctl doesn't accept a full 64-bit span. */
> -            llsize = int128_rshift(llsize, 1);
> -            ret = vfio_container_dma_unmap(bcontainer, iova,
> -                                           int128_get64(llsize), NULL, false);
> -            if (ret) {
> -                error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> -                             "0x%"HWADDR_PRIx") = %d (%s)",
> -                             bcontainer, iova, int128_get64(llsize), ret,
> -                             strerror(-ret));
> -            }
> -            iova += int128_get64(llsize);
> +            unmap_all = true;
> +            llsize = int128_zero();
>           }
> -        ret = vfio_container_dma_unmap(bcontainer, iova,
> -                                       int128_get64(llsize), NULL, false);
> +        ret = vfio_container_dma_unmap(bcontainer, iova, int128_get64(llsize),
> +                                       NULL, unmap_all);
>           if (ret) {
>               error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx") = %d (%s)",




* Re: [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback
  2025-04-30 19:39 ` [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback John Levon
@ 2025-05-05 12:06   ` Cédric Le Goater
  2025-05-05 13:26     ` John Levon
  0 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:06 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:39, John Levon wrote:
> We'll use this parameter shortly; this just adds the plumbing.

I am not sure the 'unmap_all' name reflects what the dma_unmap()
handler does.

> 
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/container-base.c              | 4 ++--
>   hw/vfio/container.c                   | 8 ++++++--
>   hw/vfio/iommufd.c                     | 6 +++++-
>   hw/vfio/listener.c                    | 8 ++++----
>   include/hw/vfio/vfio-container-base.h | 4 ++--
>   5 files changed, 19 insertions(+), 11 deletions(-)
> 
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 09340fd97a..3ff473a45c 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -85,12 +85,12 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>   
>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                hwaddr iova, ram_addr_t size,
> -                             IOMMUTLBEntry *iotlb)
> +                             IOMMUTLBEntry *iotlb, bool unmap_all)
>   {
>       VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>   
>       g_assert(vioc->dma_unmap);
> -    return vioc->dma_unmap(bcontainer, iova, size, iotlb);
> +    return vioc->dma_unmap(bcontainer, iova, size, iotlb, unmap_all);
>   }
>   
>   bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 1dfdc312bd..766ba5a275 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -124,7 +124,7 @@ unmap_exit:
>    */
>   static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>                                    hwaddr iova, ram_addr_t size,
> -                                 IOMMUTLBEntry *iotlb)
> +                                 IOMMUTLBEntry *iotlb, bool unmap_all)
>   {
>       const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>                                                     bcontainer);
> @@ -138,6 +138,10 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>       int ret;
>       Error *local_err = NULL;
>   
> +    if (unmap_all) {
> +        return -ENOTSUP;
> +    }
> +
>       if (iotlb && vfio_container_dirty_tracking_is_started(bcontainer)) {
>           if (!vfio_container_devices_dirty_tracking_is_supported(bcontainer) &&
>               bcontainer->dirty_pages_supported) {
> @@ -205,7 +209,7 @@ static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>        */
>       if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
>           (errno == EBUSY &&
> -         vfio_legacy_dma_unmap(bcontainer, iova, size, NULL) == 0 &&
> +         vfio_legacy_dma_unmap(bcontainer, iova, size, NULL, false) == 0 &&
>            ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
>           return 0;
>       }
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index 62ecb758f1..6b2764c044 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -46,11 +46,15 @@ static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>   
>   static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
>                                 hwaddr iova, ram_addr_t size,
> -                              IOMMUTLBEntry *iotlb)
> +                              IOMMUTLBEntry *iotlb, bool unmap_all)
>   {
>       const VFIOIOMMUFDContainer *container =
>           container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>   
> +    if (unmap_all) {
> +        return -ENOTSUP;
> +    }
> +
>       /* TODO: Handle dma_unmap_bitmap with iotlb args (migration) */
>       return iommufd_backend_unmap_dma(container->be,
>                                        container->ioas_id, iova, size);
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index 6f77e18a7a..c5183700db 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -172,7 +172,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>           }
>       } else {
>           ret = vfio_container_dma_unmap(bcontainer, iova,
> -                                       iotlb->addr_mask + 1, iotlb);
> +                                       iotlb->addr_mask + 1, iotlb, false);
>           if (ret) {
>               error_setg(&local_err,
>                          "vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
> @@ -201,7 +201,7 @@ static void vfio_ram_discard_notify_discard(RamDiscardListener *rdl,
>       int ret;
>   
>       /* Unmap with a single call. */
> -    ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL);
> +    ret = vfio_container_dma_unmap(bcontainer, iova, size , NULL, false);
>       if (ret) {
>           error_report("%s: vfio_container_dma_unmap() failed: %s", __func__,
>                        strerror(-ret));
> @@ -638,7 +638,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>               /* The unmap ioctl doesn't accept a full 64-bit span. */
>               llsize = int128_rshift(llsize, 1);
>               ret = vfio_container_dma_unmap(bcontainer, iova,
> -                                           int128_get64(llsize), NULL);
> +                                           int128_get64(llsize), NULL, false);
>               if (ret) {
>                   error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                                "0x%"HWADDR_PRIx") = %d (%s)",
> @@ -648,7 +648,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
>               iova += int128_get64(llsize);
>           }
>           ret = vfio_container_dma_unmap(bcontainer, iova,
> -                                       int128_get64(llsize), NULL);
> +                                       int128_get64(llsize), NULL, false);
>           if (ret) {
>               error_report("vfio_container_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx") = %d (%s)",
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 5527e02722..92cee54d11 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -81,7 +81,7 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>                              void *vaddr, bool readonly);
>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                hwaddr iova, ram_addr_t size,
> -                             IOMMUTLBEntry *iotlb);
> +                             IOMMUTLBEntry *iotlb, bool unmap_all);
>   bool vfio_container_add_section_window(VFIOContainerBase *bcontainer,
>                                          MemoryRegionSection *section,
>                                          Error **errp);
> @@ -122,7 +122,7 @@ struct VFIOIOMMUClass {
>                      void *vaddr, bool readonly);
>       int (*dma_unmap)(const VFIOContainerBase *bcontainer,
>                        hwaddr iova, ram_addr_t size,
> -                     IOMMUTLBEntry *iotlb);
> +                     IOMMUTLBEntry *iotlb, bool unmap_all);
>       bool (*attach_device)(const char *name, VFIODevice *vbasedev,
>                             AddressSpace *as, Error **errp);
>       void (*detach_device)(VFIODevice *vbasedev);

Please add documentation to dma_unmap().
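One possible shape for that documentation, written as a compilable sketch in the style of the VFIODeviceIOOps comments from patch 10. The wording and all *Sketch types are hypothetical. The described behaviour comes from the quoted code:

- the legacy backend syncs dirty pages when @iotlb is set and dirty tracking has started;
- the follow-up patch unmaps the whole space for @unmap_all;
- failures return a negative errno, which listener.c consumes with strerror(-ret).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct IOTLBEntrySketch IOTLBEntrySketch;   /* opaque stand-in */

typedef struct ContainerSketch {
    uint64_t mapped_bytes;   /* toy state: bytes currently mapped */
} ContainerSketch;

typedef struct IOMMUClassSketch {
    /**
     * @dma_unmap
     *
     * Unmap [@iova, @iova + @size) from @bcontainer.
     *
     * If @iotlb is non-NULL and dirty tracking has started, the backend
     * may sync dirty pages for the range before unmapping it.
     *
     * If @unmap_all is true, @iova and @size are ignored and the entire
     * IOVA space of @bcontainer is unmapped.
     *
     * Returns 0 on success or a negative errno on failure.
     */
    int (*dma_unmap)(ContainerSketch *bcontainer, uint64_t iova,
                     uint64_t size, IOTLBEntrySketch *iotlb, bool unmap_all);
} IOMMUClassSketch;

/* Toy backend following the contract documented above. */
static int toy_dma_unmap(ContainerSketch *bcontainer, uint64_t iova,
                         uint64_t size, IOTLBEntrySketch *iotlb,
                         bool unmap_all)
{
    (void)iova;
    (void)iotlb;
    if (unmap_all) {
        bcontainer->mapped_bytes = 0;
        return 0;
    }
    if (size > bcontainer->mapped_bytes) {
        return -EINVAL;
    }
    bcontainer->mapped_bytes -= size;
    return 0;
}

/* Dispatcher in the style of vfio_container_dma_unmap(). */
static int container_dma_unmap(const IOMMUClassSketch *cls,
                               ContainerSketch *bcontainer, uint64_t iova,
                               uint64_t size, IOTLBEntrySketch *iotlb,
                               bool unmap_all)
{
    assert(cls->dma_unmap);
    return cls->dma_unmap(bcontainer, iova, size, iotlb, unmap_all);
}

static const IOMMUClassSketch toy_class = { .dma_unmap = toy_dma_unmap };
```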

Thanks,

C.





* Re: [PATCH v2 10/15] vfio: add device IO ops vector
  2025-04-30 19:39 ` [PATCH v2 10/15] vfio: add device IO ops vector John Levon
@ 2025-05-05 12:21   ` Cédric Le Goater
  2025-05-06 10:01   ` Cédric Le Goater
  1 sibling, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:21 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic, John Johnson, Elena Ufimtseva, Jagannathan Raman

On 4/30/25 21:39, John Levon wrote:
> For vfio-user, device operations such as IRQ handling and region
> reads/writes are implemented in userspace over the control socket rather
> than via ioctl() to the vfio kernel driver. Add an ops vector to
> generalize this, and implement vfio_device_io_ops_ioctl for interacting
> with the kernel vfio driver.
> 
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/container-base.c      |  6 +--
>   hw/vfio/device.c              | 77 ++++++++++++++++++++++++++++++-----
>   hw/vfio/listener.c            | 13 +++---
>   hw/vfio/pci.c                 | 10 ++---
>   include/hw/vfio/vfio-device.h | 38 +++++++++++++++++
>   5 files changed, 117 insertions(+), 27 deletions(-)
> 
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 3ff473a45c..1c6ca94b60 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -198,11 +198,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
>       feature->flags = VFIO_DEVICE_FEATURE_GET |
>                        VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
>   
> -    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -        return -errno;
> -    }
> -
> -    return 0;
> +    return vbasedev->io_ops->device_feature(vbasedev, feature);
>   }
>   
>   static int vfio_container_iommu_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 5d837092cb..468fb50eac 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -82,7 +82,7 @@ void vfio_device_irq_disable(VFIODevice *vbasedev, int index)
>           .count = 0,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
>   }
>   
>   void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
> @@ -95,7 +95,7 @@ void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
>           .count = 1,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
>   }
>   
>   void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
> @@ -108,7 +108,7 @@ void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
>           .count = 1,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
>   }
>   
>   static inline const char *action_to_str(int action)
> @@ -155,6 +155,7 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>       int argsz;
>       const char *name;
>       int32_t *pfd;
> +    int ret;

Why add a 'ret' variable here?
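A small sketch of the ops-vector dispatch this patch introduces, showing that the callback's result can be tested directly at the call site. All names are hypothetical stand-ins for VFIODeviceIOOps, vfio_irq_set and vfio_device_irq_set_signaling(). A local is only worth keeping if the error path needs the code, for example to report -ret instead of errno, which a non-ioctl backend such as vfio-user may not set:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct irq_set_sketch { int index; int count; };   /* stand-in for vfio_irq_set */

typedef struct DeviceSketch DeviceSketch;

typedef struct DeviceIOOpsSketch {
    /* Returns 0 on success or a negative errno. */
    int (*set_irqs)(DeviceSketch *dev, struct irq_set_sketch *irqs);
} DeviceIOOpsSketch;

struct DeviceSketch {
    const DeviceIOOpsSketch *io_ops;   /* chosen once at init time */
    int last_index;                    /* toy state for the example */
};

/* Backend A: stands in for the ioctl() path. */
static int ioctl_set_irqs(DeviceSketch *dev, struct irq_set_sketch *irqs)
{
    dev->last_index = irqs->index;
    return 0;
}

/* Backend B: stands in for a socket path whose peer went away. */
static int socket_set_irqs(DeviceSketch *dev, struct irq_set_sketch *irqs)
{
    (void)dev;
    (void)irqs;
    return -ECONNRESET;
}

static const DeviceIOOpsSketch ioctl_ops = { .set_irqs = ioctl_set_irqs };
static const DeviceIOOpsSketch socket_ops = { .set_irqs = socket_set_irqs };

/* Caller: tests the callback result directly, no extra local needed. */
static bool device_set_signaling(DeviceSketch *dev, int index)
{
    struct irq_set_sketch irqs = { .index = index, .count = 1 };

    assert(dev->io_ops && dev->io_ops->set_irqs);
    if (!dev->io_ops->set_irqs(dev, &irqs)) {
        return true;
    }
    return false;
}
```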

The rest looks good.


Thanks,

C.



>       argsz = sizeof(*irq_set) + sizeof(*pfd);
>   
> @@ -167,7 +168,9 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>       pfd = (int32_t *)&irq_set->data;
>       *pfd = fd;
>   
> -    if (!ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
> +    ret = vbasedev->io_ops->set_irqs(vbasedev, irq_set);
> +
> +    if (!ret) {
>           return true;
>       }
>   
> @@ -188,22 +191,19 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>   int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
>                                struct vfio_irq_info *info)
>   {
> -    int ret;
> -
>       memset(info, 0, sizeof(*info));
>   
>       info->argsz = sizeof(*info);
>       info->index = index;
>   
> -    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> -
> -    return ret < 0 ? -errno : ret;
> +    return vbasedev->io_ops->get_irq_info(vbasedev, info);
>   }
>   
>   int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>                                   struct vfio_region_info **info)
>   {
>       size_t argsz = sizeof(struct vfio_region_info);
> +    int ret;
>   
>       *info = g_malloc0(argsz);
>   
> @@ -211,10 +211,11 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>   retry:
>       (*info)->argsz = argsz;
>   
> -    if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
> +    ret = vbasedev->io_ops->get_region_info(vbasedev, *info);
> +    if (ret != 0) {
>           g_free(*info);
>           *info = NULL;
> -        return -errno;
> +        return ret;
>       }
>   
>       if ((*info)->argsz > argsz) {
> @@ -320,11 +321,14 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
>       vbasedev->fd = fd;
>   }
>   
> +static VFIODeviceIOOps vfio_device_io_ops_ioctl;
> +
>   void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
>                         DeviceState *dev, bool ram_discard)
>   {
>       vbasedev->type = type;
>       vbasedev->ops = ops;
> +    vbasedev->io_ops = &vfio_device_io_ops_ioctl;
>       vbasedev->dev = dev;
>       vbasedev->fd = -1;
>   
> @@ -442,3 +446,54 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
>       QLIST_REMOVE(vbasedev, global_next);
>       vbasedev->bcontainer = NULL;
>   }
> +
> +/*
> + * Traditional ioctl() based io
> + */
> +
> +static int vfio_device_io_device_feature(VFIODevice *vbasedev,
> +                                         struct vfio_device_feature *feature)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_get_region_info(VFIODevice *vbasedev,
> +                                          struct vfio_region_info *info)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_get_irq_info(VFIODevice *vbasedev,
> +                                       struct vfio_irq_info *info)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
> +                                   struct vfio_irq_set *irqs)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
> +    .device_feature = vfio_device_io_device_feature,
> +    .get_region_info = vfio_device_io_get_region_info,
> +    .get_irq_info = vfio_device_io_get_irq_info,
> +    .set_irqs = vfio_device_io_set_irqs,
> +};
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index e7ade7d62e..2b93ca55b6 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -794,13 +794,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
>                        VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
>   
>       QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
> +        int ret;
> +
>           if (!vbasedev->dirty_tracking) {
>               continue;
>           }
>   
> -        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        ret = vbasedev->io_ops->device_feature(vbasedev, feature);
> +
> +        if (ret != 0) {
>               warn_report("%s: Failed to stop DMA logging, err %d (%s)",
> -                        vbasedev->name, -errno, strerror(errno));
> +                        vbasedev->name, -ret, strerror(-ret));
>           }
>           vbasedev->dirty_tracking = false;
>       }
> @@ -901,10 +905,9 @@ static bool vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
>               continue;
>           }
>   
> -        ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +        ret = vbasedev->io_ops->device_feature(vbasedev, feature);
>           if (ret) {
> -            ret = -errno;
> -            error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
> +            error_setg_errno(errp, -ret, "%s: Failed to start DMA logging",
>                                vbasedev->name);
>               goto out;
>           }
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8455010d62..bbf95215cc 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -381,7 +381,7 @@ static void vfio_msi_interrupt(void *opaque)
>   static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
>   {
>       g_autofree struct vfio_irq_set *irq_set = NULL;
> -    int ret = 0, argsz;
> +    int argsz;
>       int32_t *fd;
>   
>       argsz = sizeof(*irq_set) + sizeof(*fd);
> @@ -396,9 +396,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
>       fd = (int32_t *)&irq_set->data;
>       *fd = -1;
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> -
> -    return ret < 0 ? -errno : ret;
> +    return vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
>   }
>   
>   static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> @@ -455,11 +453,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>           fds[i] = fd;
>       }
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
>   
>       g_free(irq_set);
>   
> -    return ret < 0 ? -errno : ret;
> +    return ret;
>   }
>   
>   static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 5b833868c9..e89ed02c0e 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -41,6 +41,7 @@ enum {
>   };
>   
>   typedef struct VFIODeviceOps VFIODeviceOps;
> +typedef struct VFIODeviceIOOps VFIODeviceIOOps;
>   typedef struct VFIOMigration VFIOMigration;
>   
>   typedef struct IOMMUFDBackend IOMMUFDBackend;
> @@ -66,6 +67,7 @@ typedef struct VFIODevice {
>       OnOffAuto migration_multifd_transfer;
>       bool migration_events;
>       VFIODeviceOps *ops;
> +    VFIODeviceIOOps *io_ops;
>       unsigned int num_irqs;
>       unsigned int num_regions;
>       unsigned int flags;
> @@ -141,6 +143,42 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
>   extern VFIODeviceList vfio_device_list;
>   
>   #ifdef CONFIG_LINUX
> +/*
> + * How devices communicate with the server.  The default option is through
> + * ioctl() to the kernel VFIO driver, but vfio-user can use a socket to a remote
> + * process.
> + */
> +struct VFIODeviceIOOps {
> +    /**
> +     * @device_feature
> +     *
> +     * Fill in feature info for the given device.
> +     */
> +    int (*device_feature)(VFIODevice *vdev, struct vfio_device_feature *);
> +
> +    /**
> +     * @get_region_info
> +     *
> +     * Fill in @info with information on the region given by @info->index.
> +     */
> +    int (*get_region_info)(VFIODevice *vdev,
> +                           struct vfio_region_info *info);
> +
> +    /**
> +     * @get_irq_info
> +     *
> +     * Fill in @irq with information on the IRQ given by @irq->index.
> +     */
> +    int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
> +
> +    /**
> +     * @set_irqs
> +     *
> +     * Configure IRQs as defined by @irqs.
> +     */
> +    int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
> +};
> +
>   int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>                                   struct vfio_region_info **info);
>   int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,




* Re: [PATCH v2 11/15] vfio: add region info cache
  2025-04-30 19:39 ` [PATCH v2 11/15] vfio: add region info cache John Levon
@ 2025-05-05 12:26   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:26 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic, John Johnson, Elena Ufimtseva, Jagannathan Raman

On 4/30/25 21:39, John Levon wrote:
> Instead of requesting region information on demand with
> VFIO_DEVICE_GET_REGION_INFO, maintain a cache. This will become
> necessary for performance with vfio-user, where this call becomes a
> message over the control socket and so has higher overhead than the
> traditional path.
> 
> We will also need it to generalize region accesses: doing so means we
> can no longer use ->config_offset for configuration space accesses, but
> must look up the region offset (if relevant) each time.

This looks cleaner. One comment below,

> 
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/ccw.c                 |  5 -----
>   hw/vfio/device.c              | 27 +++++++++++++++++++++++----
>   hw/vfio/igd.c                 |  8 ++++----
>   hw/vfio/pci.c                 |  6 +++---
>   hw/vfio/region.c              |  2 +-
>   include/hw/vfio/vfio-device.h |  1 +
>   6 files changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index ab3fabf991..cea9d6e005 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -504,7 +504,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
>   
>       vcdev->io_region_offset = info->offset;
>       vcdev->io_region = g_malloc0(info->size);
> -    g_free(info);
>   
>       /* check for the optional async command region */
>       ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
> @@ -517,7 +516,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
>           }
>           vcdev->async_cmd_region_offset = info->offset;
>           vcdev->async_cmd_region = g_malloc0(info->size);
> -        g_free(info);
>       }
>   
>       ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
> @@ -530,7 +528,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
>           }
>           vcdev->schib_region_offset = info->offset;
>           vcdev->schib_region = g_malloc(info->size);
> -        g_free(info);
>       }
>   
>       ret = vfio_device_get_region_info_type(vdev, VFIO_REGION_TYPE_CCW,
> @@ -544,7 +541,6 @@ static bool vfio_ccw_get_region(VFIOCCWDevice *vcdev, Error **errp)
>           }
>           vcdev->crw_region_offset = info->offset;
>           vcdev->crw_region = g_malloc(info->size);
> -        g_free(info);
>       }
>   
>       return true;
> @@ -554,7 +550,6 @@ out_err:
>       g_free(vcdev->schib_region);
>       g_free(vcdev->async_cmd_region);
>       g_free(vcdev->io_region);
> -    g_free(info);
>       return false;
>   }
>   
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 468fb50eac..d08c0ab536 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -205,6 +205,12 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>       size_t argsz = sizeof(struct vfio_region_info);
>       int ret;
>   
> +    /* check cache */
> +    if (vbasedev->reginfo[index] != NULL) {
> +        *info = vbasedev->reginfo[index];
> +        return 0;
> +    }
> +
>       *info = g_malloc0(argsz);
>   
>       (*info)->index = index;
> @@ -225,6 +231,9 @@ retry:
>           goto retry;
>       }
>   
> +    /* fill cache */
> +    vbasedev->reginfo[index] = *info;
> +
>       return 0;
>   }
>   
> @@ -243,7 +252,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
>   
>           hdr = vfio_get_region_info_cap(*info, VFIO_REGION_INFO_CAP_TYPE);
>           if (!hdr) {
> -            g_free(*info);
>               continue;
>           }
>   
> @@ -255,8 +263,6 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
>           if (cap_type->type == type && cap_type->subtype == subtype) {
>               return 0;
>           }
> -
> -        g_free(*info);
>       }
>   
>       *info = NULL;
> @@ -265,7 +271,7 @@ int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
>   
>   bool vfio_device_has_region_cap(VFIODevice *vbasedev, int region, uint16_t cap_type)
>   {
> -    g_autofree struct vfio_region_info *info = NULL;
> +    struct vfio_region_info *info = NULL;
>       bool ret = false;
>   
>       if (!vfio_device_get_region_info(vbasedev, region, &info)) {
> @@ -438,10 +444,23 @@ void vfio_device_prepare(VFIODevice *vbasedev, VFIOContainerBase *bcontainer,
>       QLIST_INSERT_HEAD(&bcontainer->device_list, vbasedev, container_next);
>   
>       QLIST_INSERT_HEAD(&vfio_device_list, vbasedev, global_next);
> +
> +    if (vbasedev->reginfo == NULL) {

I don't think this test is necessary, is it?
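A sketch of the cache lifecycle, assuming (as the quoted code suggests) that prepare runs once per device before any lookup, and that unprepare frees every entry and resets the array to NULL. Under that assumption reginfo is always NULL on entry to prepare, which appears to be the point behind the question.

Some details in the sketch differ from the patch:

- The types and helpers are hypothetical.
- calloc()/free() replace g_new0()/g_free().
- The index bounds check is an addition not present in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct region_info_sketch { uint32_t index; uint64_t offset; uint64_t size; };

typedef struct DeviceSketch {
    unsigned int num_regions;
    struct region_info_sketch **reginfo;   /* cache: one slot per region */
    int queries;                           /* counts backend round trips */
} DeviceSketch;

/* Hypothetical backend query (stands in for VFIO_DEVICE_GET_REGION_INFO). */
static int backend_get_region_info(DeviceSketch *dev,
                                   struct region_info_sketch *info)
{
    dev->queries++;
    info->offset = (uint64_t)info->index << 40;   /* arbitrary layout */
    info->size = 4096;
    return 0;
}

/* prepare: allocate the cache once; reginfo is NULL on a fresh device. */
static int device_prepare(DeviceSketch *dev)
{
    dev->reginfo = calloc(dev->num_regions, sizeof(*dev->reginfo));
    return dev->reginfo ? 0 : -ENOMEM;
}

/* Returns a borrowed pointer owned by the cache: callers must not free it. */
static int get_region_info(DeviceSketch *dev, unsigned int index,
                           struct region_info_sketch **info)
{
    int ret;

    assert(dev->reginfo != NULL);   /* device_prepare() must have run */
    if (index >= dev->num_regions) {
        return -EINVAL;
    }
    if (dev->reginfo[index] != NULL) {   /* cache hit: no round trip */
        *info = dev->reginfo[index];
        return 0;
    }
    *info = calloc(1, sizeof(**info));
    if (*info == NULL) {
        return -ENOMEM;
    }
    (*info)->index = index;
    ret = backend_get_region_info(dev, *info);
    if (ret != 0) {
        free(*info);
        *info = NULL;
        return ret;
    }
    dev->reginfo[index] = *info;         /* fill cache */
    return 0;
}

/* unprepare: the cache owns every entry, so free them all here. */
static void device_unprepare(DeviceSketch *dev)
{
    for (unsigned int i = 0; i < dev->num_regions; i++) {
        free(dev->reginfo[i]);
    }
    free(dev->reginfo);
    dev->reginfo = NULL;
}
```

Because lookups hand out borrowed pointers, the call sites can no longer use g_autofree, which matches the hunks above that drop it.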


Thanks,

C.



> +        vbasedev->reginfo = g_new0(struct vfio_region_info *,
> +                                   vbasedev->num_regions);
> +    }
>   }
>   
>   void vfio_device_unprepare(VFIODevice *vbasedev)
>   {
> +    int i;
> +
> +    for (i = 0; i < vbasedev->num_regions; i++) {
> +        g_free(vbasedev->reginfo[i]);
> +    }
> +    g_free(vbasedev->reginfo);
> +    vbasedev->reginfo = NULL;
> +
>       QLIST_REMOVE(vbasedev, container_next);
>       QLIST_REMOVE(vbasedev, global_next);
>       vbasedev->bcontainer = NULL;
> diff --git a/hw/vfio/igd.c b/hw/vfio/igd.c
> index d7e4728fdc..c7db74cde4 100644
> --- a/hw/vfio/igd.c
> +++ b/hw/vfio/igd.c
> @@ -191,7 +191,7 @@ static bool vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev,
>   
>   static bool vfio_pci_igd_setup_opregion(VFIOPCIDevice *vdev, Error **errp)
>   {
> -    g_autofree struct vfio_region_info *opregion = NULL;
> +    struct vfio_region_info *opregion = NULL;
>       int ret;
>   
>       /* Hotplugging is not supported for opregion access */
> @@ -355,8 +355,8 @@ static int vfio_pci_igd_lpc_init(VFIOPCIDevice *vdev,
>   
>   static bool vfio_pci_igd_setup_lpc_bridge(VFIOPCIDevice *vdev, Error **errp)
>   {
> -    g_autofree struct vfio_region_info *host = NULL;
> -    g_autofree struct vfio_region_info *lpc = NULL;
> +    struct vfio_region_info *host = NULL;
> +    struct vfio_region_info *lpc = NULL;
>       PCIDevice *lpc_bridge;
>       int ret;
>   
> @@ -532,7 +532,7 @@ static bool vfio_pci_igd_config_quirk(VFIOPCIDevice *vdev, Error **errp)
>            * - OpRegion
>            * - Same LPC bridge and Host bridge VID/DID/SVID/SSID as host
>            */
> -        g_autofree struct vfio_region_info *rom = NULL;
> +        struct vfio_region_info *rom = NULL;
>   
>           legacy_mode_enabled = true;
>           info_report("IGD legacy mode enabled, "
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index bbf95215cc..1aeb4d91d2 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -883,8 +883,8 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
>   
>   static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>   {
> -    g_autofree struct vfio_region_info *reg_info = NULL;
>       VFIODevice *vbasedev = &vdev->vbasedev;
> +    struct vfio_region_info *reg_info = NULL;
>       uint64_t size;
>       off_t off = 0;
>       ssize_t bytes;
> @@ -2710,7 +2710,7 @@ static VFIODeviceOps vfio_pci_ops = {
>   bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
>   {
>       VFIODevice *vbasedev = &vdev->vbasedev;
> -    g_autofree struct vfio_region_info *reg_info = NULL;
> +    struct vfio_region_info *reg_info = NULL;
>       int ret;
>   
>       ret = vfio_device_get_region_info(vbasedev, VFIO_PCI_VGA_REGION_INDEX, &reg_info);
> @@ -2775,7 +2775,7 @@ bool vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
>   static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>   {
>       VFIODevice *vbasedev = &vdev->vbasedev;
> -    g_autofree struct vfio_region_info *reg_info = NULL;
> +    struct vfio_region_info *reg_info = NULL;
>       struct vfio_irq_info irq_info;
>       int i, ret = -1;
>   
> diff --git a/hw/vfio/region.c b/hw/vfio/region.c
> index 04bf9eb098..ef2630cac3 100644
> --- a/hw/vfio/region.c
> +++ b/hw/vfio/region.c
> @@ -182,7 +182,7 @@ static int vfio_setup_region_sparse_mmaps(VFIORegion *region,
>   int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
>                         int index, const char *name)
>   {
> -    g_autofree struct vfio_region_info *info = NULL;
> +    struct vfio_region_info *info = NULL;
>       int ret;
>   
>       ret = vfio_device_get_region_info(vbasedev, index, &info);
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index e89ed02c0e..b4a28c2a54 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -83,6 +83,7 @@ typedef struct VFIODevice {
>       IOMMUFDBackend *iommufd;
>       VFIOIOASHwpt *hwpt;
>       QLIST_ENTRY(VFIODevice) hwpt_next;
> +    struct vfio_region_info **reginfo;
>   } VFIODevice;
>   
>   struct VFIODeviceOps {




* Re: [PATCH v2 12/15] vfio: add read/write to device IO ops vector
  2025-04-30 19:40 ` [PATCH v2 12/15] vfio: add read/write to device IO ops vector John Levon
@ 2025-05-05 12:39   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:39 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:40, John Levon wrote:
> Now that we have the region info cache, add ->region_read/write device I/O
> operations to replace the explicit pread()/pwrite() system calls.
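This patch uses a specific calling convention: each ->region_read returns either the byte count or a negative errno, and callers such as vfio_pci_load_rom() retry on -EINTR/-EAGAIN. The standalone sketch below illustrates that convention. IOOps, Dev, fake_region_read() and read_all() are hypothetical names, not QEMU API. The fake backend fails once with -EINTR and then returns short reads, so that every branch of the loop runs.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

typedef struct Dev Dev;

/* Hypothetical ops vector: returns bytes transferred, or a negative errno. */
typedef struct IOOps {
    int (*region_read)(Dev *dev, uint8_t nr, off_t off, uint32_t size,
                       void *data);
} IOOps;

struct Dev {
    const IOOps *io_ops;
    const uint8_t *backing;   /* fake region contents */
    size_t backing_len;
    int calls;
};

/* Fake backend: fails the first call with -EINTR, then copies at most 4 bytes. */
static int fake_region_read(Dev *dev, uint8_t nr, off_t off, uint32_t size,
                            void *data)
{
    size_t n;

    (void)nr;
    if (dev->calls++ == 0) {
        return -EINTR;
    }
    if ((size_t)off >= dev->backing_len) {
        return 0;                       /* end of region */
    }
    n = dev->backing_len - (size_t)off;
    if (n > size) {
        n = size;
    }
    if (n > 4) {
        n = 4;                          /* simulate a short read */
    }
    memcpy(data, dev->backing + off, n);
    return (int)n;
}

static const IOOps fake_ops = { .region_read = fake_region_read };

/*
 * Read up to @size bytes through the ops vector, retrying on -EINTR/-EAGAIN
 * and advancing on short reads. Returns the bytes read, or a negative errno.
 */
static int read_all(Dev *dev, uint8_t nr, void *buf, uint32_t size)
{
    uint32_t off = 0;

    assert(dev->io_ops && dev->io_ops->region_read);
    while (off < size) {
        int ret = dev->io_ops->region_read(dev, nr, off, size - off,
                                           (uint8_t *)buf + off);
        if (ret == 0) {
            break;                      /* EOF */
        } else if (ret > 0) {
            off += (uint32_t)ret;
        } else if (ret == -EINTR || ret == -EAGAIN) {
            continue;                   /* transient, try again */
        } else {
            return ret;                 /* hard error, already -errno */
        }
    }
    return (int)off;
}
```

Because errors travel in the return value rather than in errno, the loop never reads errno. That lets a non-ioctl backend (such as a socket-based one) implement the same hook.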
> ---
>   hw/vfio/device.c              | 38 +++++++++++++++++++++++++++++++++++
>   hw/vfio/pci.c                 | 28 +++++++++++++-------------
>   hw/vfio/region.c              | 17 ++++++++++------
>   include/hw/vfio/vfio-device.h | 18 +++++++++++++++++
>   4 files changed, 81 insertions(+), 20 deletions(-)
> 
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index d08c0ab536..ceb7bbebda 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -510,9 +510,47 @@ static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
>       return ret < 0 ? -errno : ret;
>   }
>   
> +static int vfio_device_io_region_read(VFIODevice *vbasedev, uint8_t index,
> +                                      off_t off, uint32_t size, void *data)
> +{
> +    struct vfio_region_info *info = vbasedev->reginfo[index];

Why not rely on vfio_device_get_region_info() to fill the cache and
return the cached struct vfio_region_info?
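Here is a minimal sketch of that suggestion. It assumes the cache is a per-device array, indexed by region number, that owns its entries. Callers therefore stop freeing the result, matching the g_autofree removals in the region info cache patch. struct region_info, struct device, backend_query() and NUM_REGIONS are hypothetical stand-ins for struct vfio_region_info, VFIODevice, the get-region-info ioctl and the real region count.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Stand-in for struct vfio_region_info: only the fields used here. */
struct region_info {
    uint32_t index;
    uint64_t offset;
    uint64_t size;
};

#define NUM_REGIONS 9   /* hypothetical; the real count comes from the device */

/* Hypothetical device with a lazily filled per-region cache. */
struct device {
    struct region_info *reginfo[NUM_REGIONS];
    int queries;        /* counts backend lookups, for illustration */
};

/* Hypothetical backend query standing in for the get-region-info ioctl. */
static int backend_query(struct device *dev, int index,
                         struct region_info **out)
{
    struct region_info *info = calloc(1, sizeof(*info));

    if (info == NULL) {
        return -ENOMEM;
    }
    info->index = (uint32_t)index;
    info->offset = (uint64_t)index << 40;   /* made-up layout */
    info->size = 4096;
    dev->queries++;
    *out = info;
    return 0;
}

/*
 * Return the cached entry, querying the backend only on first use.
 * The cache owns the memory, so callers must not free *info.
 */
static int get_region_info(struct device *dev, int index,
                           struct region_info **info)
{
    int ret;

    assert(info != NULL);
    if (index < 0 || index >= NUM_REGIONS) {
        return -EINVAL;
    }
    if (dev->reginfo[index] == NULL) {
        ret = backend_query(dev, index, &dev->reginfo[index]);
        if (ret != 0) {
            return ret;
        }
    }
    *info = dev->reginfo[index];
    return 0;
}

/* An accessor can then call the helper unconditionally, with no NULL check. */
static int region_offset(struct device *dev, int index, uint64_t off,
                         uint64_t *out)
{
    struct region_info *info;
    int ret = get_region_info(dev, index, &info);

    if (ret != 0) {
        return ret;
    }
    *out = info->offset + off;
    return 0;
}
```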

> +    int ret;
> +
> +    if (info == NULL) {
> +        ret = vfio_device_get_region_info(vbasedev, index, &info);
> +        if (ret != 0) {
> +            return ret;
> +        }
> +    }
> +
> +    ret = pread(vbasedev->fd, data, size, info->offset + off);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_region_write(VFIODevice *vbasedev, uint8_t index,
> +                                       off_t off, uint32_t size, void *data)
> +{
> +    struct vfio_region_info *info = vbasedev->reginfo[index];

same here.

The rest looks good.


Thanks,

C.



> +    int ret;
> +
> +    if (info == NULL) {
> +        ret = vfio_device_get_region_info(vbasedev, index, &info);
> +        if (ret != 0) {
> +            return ret;
> +        }
> +    }
> +
> +    ret = pwrite(vbasedev->fd, data, size, info->offset + off);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
>   static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
>       .device_feature = vfio_device_io_device_feature,
>       .get_region_info = vfio_device_io_get_region_info,
>       .get_irq_info = vfio_device_io_get_irq_info,
>       .set_irqs = vfio_device_io_set_irqs,
> +    .region_read = vfio_device_io_region_read,
> +    .region_write = vfio_device_io_region_write,
>   };
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 1aeb4d91d2..5e811d5d6a 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -918,18 +918,22 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>       memset(vdev->rom, 0xff, size);
>   
>       while (size) {
> -        bytes = pread(vbasedev->fd, vdev->rom + off,
> -                      size, vdev->rom_offset + off);
> +        bytes = vbasedev->io_ops->region_read(vbasedev,
> +                                              VFIO_PCI_ROM_REGION_INDEX,
> +                                              off, size, vdev->rom + off);
> +
>           if (bytes == 0) {
>               break;
>           } else if (bytes > 0) {
>               off += bytes;
>               size -= bytes;
>           } else {
> -            if (errno == EINTR || errno == EAGAIN) {
> +            if (bytes == -EINTR || bytes == -EAGAIN) {
>                   continue;
>               }
> -            error_report("vfio: Error reading device ROM: %m");
> +            error_report("vfio: Error reading device ROM: %s",
> +                         strreaderror(bytes));
> +
>               break;
>           }
>       }
> @@ -969,22 +973,18 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>   static int vfio_pci_config_space_read(VFIOPCIDevice *vdev, off_t offset,
>                                         uint32_t size, void *data)
>   {
> -    ssize_t ret;
> -
> -    ret = pread(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
> -
> -    return ret < 0 ? -errno : (int)ret;
> +    return vdev->vbasedev.io_ops->region_read(&vdev->vbasedev,
> +                                              VFIO_PCI_CONFIG_REGION_INDEX,
> +                                              offset, size, data);
>   }
>   
>   /* "Raw" write of underlying config space. */
>   static int vfio_pci_config_space_write(VFIOPCIDevice *vdev, off_t offset,
>                                          uint32_t size, void *data)
>   {
> -    ssize_t ret;
> -
> -    ret = pwrite(vdev->vbasedev.fd, data, size, vdev->config_offset + offset);
> -
> -    return ret < 0 ? -errno : (int)ret;
> +    return vdev->vbasedev.io_ops->region_write(&vdev->vbasedev,
> +                                               VFIO_PCI_CONFIG_REGION_INDEX,
> +                                               offset, size, data);
>   }
>   
>   static uint64_t vfio_rom_read(void *opaque, hwaddr addr, unsigned size)
> diff --git a/hw/vfio/region.c b/hw/vfio/region.c
> index ef2630cac3..34752c3f65 100644
> --- a/hw/vfio/region.c
> +++ b/hw/vfio/region.c
> @@ -45,6 +45,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
>           uint32_t dword;
>           uint64_t qword;
>       } buf;
> +    int ret;
>   
>       switch (size) {
>       case 1:
> @@ -64,11 +65,13 @@ void vfio_region_write(void *opaque, hwaddr addr,
>           break;
>       }
>   
> -    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
> +    ret = vbasedev->io_ops->region_write(vbasedev, region->nr,
> +                                         addr, size, &buf);
> +    if (ret != size) {
>           error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
> -                     ",%d) failed: %m",
> +                     ",%d) failed: %s",
>                        __func__, vbasedev->name, region->nr,
> -                     addr, data, size);
> +                     addr, data, size, strwriteerror(ret));
>       }
>   
>       trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
> @@ -96,11 +99,13 @@ uint64_t vfio_region_read(void *opaque,
>           uint64_t qword;
>       } buf;
>       uint64_t data = 0;
> +    int ret;
>   
> -    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
> -        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
> +    ret = vbasedev->io_ops->region_read(vbasedev, region->nr, addr, size, &buf);
> +    if (ret != size) {
> +        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s",
>                        __func__, vbasedev->name, region->nr,
> -                     addr, size);
> +                     addr, size, strreaderror(ret));
>           return (uint64_t)-1;
>       }
>       switch (size) {
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index b4a28c2a54..d3ab13ca6a 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -178,6 +178,24 @@ struct VFIODeviceIOOps {
>        * Configure IRQs as defined by @irqs.
>        */
>       int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
> +
> +    /**
> +     * @region_read
> +     *
> +     * Read @size bytes from the region @nr at offset @off into the buffer
> +     * @data.
> +     */
> +    int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
> +                       void *data);
> +
> +    /**
> +     * @region_write
> +     *
> +     * Write @size bytes to the region @nr at offset @off from the buffer
> +     * @data.
> +     */
> +    int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
> +                        void *data);
>   };
>   
>   int vfio_device_get_region_info(VFIODevice *vbasedev, int index,




* Re: [PATCH v2 13/15] vfio: add vfio-pci-base class
  2025-04-30 19:40 ` [PATCH v2 13/15] vfio: add vfio-pci-base class John Levon
@ 2025-05-05 12:42   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:42 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic, John Johnson, Elena Ufimtseva, Jagannathan Raman

On 4/30/25 21:40, John Levon wrote:
> Split out parts of TYPE_VFIO_PCI into a base type, TYPE_VFIO_PCI_BASE.
> No other subclass has been introduced yet, so all the properties
> remain in TYPE_VFIO_PCI.
> 
> Note that currently there is no need for additional data for
> TYPE_VFIO_PCI, so it shares the same C struct type as
> TYPE_VFIO_PCI_BASE, VFIOPCIDevice.
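The patch switches shared code from VFIO_PCI() to VFIO_PCI_BASE() while both types keep one instance struct. To show why that works, here is a toy model of an abstract parent type. TypeDesc, Device, type_is_a(), device_cast() and device_init() are hypothetical. QEMU's real QOM machinery (TypeInfo, object_dynamic_cast(), OBJECT_DECLARE_SIMPLE_TYPE) is considerably richer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a QOM-like type hierarchy. */
typedef struct TypeDesc {
    const char *name;
    const struct TypeDesc *parent;
    bool abstract;
} TypeDesc;

static const TypeDesc pci_device_type = { "pci-device", NULL, true };
static const TypeDesc vfio_pci_base_type = { "vfio-pci-base", &pci_device_type, true };
static const TypeDesc vfio_pci_type = { "vfio-pci", &vfio_pci_base_type, false };

/* One instance struct shared by base and concrete type, like VFIOPCIDevice. */
typedef struct Device {
    const TypeDesc *type;
    int fd;
} Device;

/* True if @t is @target or inherits from it. */
static bool type_is_a(const TypeDesc *t, const TypeDesc *target)
{
    for (; t != NULL; t = t->parent) {
        if (t == target) {
            return true;
        }
    }
    return false;
}

/* Checked cast: NULL unless the instance is @target or a subtype of it. */
static Device *device_cast(Device *dev, const TypeDesc *target)
{
    assert(dev != NULL && target != NULL);
    return type_is_a(dev->type, target) ? dev : NULL;
}

/* Abstract types may be parents but cannot be instantiated directly. */
static bool device_init(Device *dev, const TypeDesc *type)
{
    if (type->abstract) {
        return false;
    }
    dev->type = type;
    dev->fd = -1;
    return true;
}
```

A vfio-pci instance passes the base-type check, so code written against the base keeps working for every future subclass.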


Looks better. One nit below.


> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/device.c |  2 +-
>   hw/vfio/pci.c    | 62 +++++++++++++++++++++++++++++++-----------------
>   hw/vfio/pci.h    | 12 ++++++++--
>   3 files changed, 51 insertions(+), 25 deletions(-)
> 
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index ceb7bbebda..70d75b271f 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -395,7 +395,7 @@ bool vfio_device_hiod_create_and_realize(VFIODevice *vbasedev,
>   VFIODevice *vfio_get_vfio_device(Object *obj)
>   {
>       if (object_dynamic_cast(obj, TYPE_VFIO_PCI)) {
> -        return &VFIO_PCI(obj)->vbasedev;
> +        return &VFIO_PCI_BASE(obj)->vbasedev;
>       } else {
>           return NULL;
>       }
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 5e811d5d6a..8d29b4552f 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -241,7 +241,7 @@ static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
>   
>   static void vfio_intx_routing_notifier(PCIDevice *pdev)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       PCIINTxRoute route;
>   
>       if (vdev->interrupt != VFIO_INT_INTx) {
> @@ -514,7 +514,7 @@ static void vfio_update_kvm_msi_virq(VFIOMSIVector *vector, MSIMessage msg,
>   static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
>                                      MSIMessage *msg, IOHandler *handler)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       VFIOMSIVector *vector;
>       int ret;
>       bool resizing = !!(vdev->nr_vectors < nr + 1);
> @@ -620,7 +620,7 @@ static int vfio_msix_vector_use(PCIDevice *pdev,
>   
>   static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       VFIOMSIVector *vector = &vdev->msi_vectors[nr];
>   
>       trace_vfio_msix_vector_release(vdev->vbasedev.name, nr);
> @@ -1196,7 +1196,7 @@ static const MemoryRegionOps vfio_vga_ops = {
>    */
>   static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       VFIORegion *region = &vdev->bars[bar].region;
>       MemoryRegion *mmap_mr, *region_mr, *base_mr;
>       PCIIORegion *r;
> @@ -1242,7 +1242,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
>    */
>   uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
>   
> @@ -1276,7 +1276,7 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>   void vfio_pci_write_config(PCIDevice *pdev,
>                              uint32_t addr, uint32_t val, int len)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t val_le = cpu_to_le32(val);
>       int ret;
> @@ -3129,7 +3129,7 @@ static bool vfio_interrupt_setup(VFIOPCIDevice *vdev, Error **errp)
>   static void vfio_realize(PCIDevice *pdev, Error **errp)
>   {
>       ERRP_GUARD();
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       VFIODevice *vbasedev = &vdev->vbasedev;
>       int i, ret;
>       char uuid[UUID_STR_LEN];
> @@ -3300,7 +3300,7 @@ error:
>   
>   static void vfio_instance_finalize(Object *obj)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(obj);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
>   
>       vfio_display_finalize(vdev);
>       vfio_bars_finalize(vdev);
> @@ -3318,7 +3318,7 @@ static void vfio_instance_finalize(Object *obj)
>   
>   static void vfio_exitfn(PCIDevice *pdev)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
>       VFIODevice *vbasedev = &vdev->vbasedev;
>   
>       vfio_unregister_req_notifier(vdev);
> @@ -3342,7 +3342,7 @@ static void vfio_exitfn(PCIDevice *pdev)
>   
>   static void vfio_pci_reset(DeviceState *dev)
>   {
> -    VFIOPCIDevice *vdev = VFIO_PCI(dev);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(dev);
>   
>       trace_vfio_pci_reset(vdev->vbasedev.name);
>   
> @@ -3382,7 +3382,7 @@ post_reset:
>   static void vfio_instance_init(Object *obj)
>   {
>       PCIDevice *pci_dev = PCI_DEVICE(obj);
> -    VFIOPCIDevice *vdev = VFIO_PCI(obj);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
>       VFIODevice *vbasedev = &vdev->vbasedev;
>   
>       device_add_bootindex_property(obj, &vdev->bootindex,
> @@ -3403,6 +3403,31 @@ static void vfio_instance_init(Object *obj)
>       pci_dev->cap_present |= QEMU_PCI_CAP_EXPRESS;
>   }
>   
> +static void vfio_pci_base_dev_class_init(ObjectClass *klass, const void *data)
> +{
> +    DeviceClass *dc = DEVICE_CLASS(klass);
> +    PCIDeviceClass *pdc = PCI_DEVICE_CLASS(klass);
> +
> +    dc->desc = "VFIO PCI base device";
> +    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
> +    pdc->exit = vfio_exitfn;
> +    pdc->config_read = vfio_pci_read_config;
> +    pdc->config_write = vfio_pci_write_config;
> +}
> +
> +static const TypeInfo vfio_pci_base_dev_info = {
> +    .name = TYPE_VFIO_PCI_BASE,
> +    .parent = TYPE_PCI_DEVICE,
> +    .instance_size = 0,
> +    .abstract = true,
> +    .class_init = vfio_pci_base_dev_class_init,
> +    .interfaces = (const InterfaceInfo[]) {
> +        { INTERFACE_PCIE_DEVICE },
> +        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> +        { }
> +    },
> +};
> +
>   static PropertyInfo vfio_pci_migration_multifd_transfer_prop;
>   
>   static const Property vfio_pci_dev_properties[] = {
> @@ -3473,7 +3498,8 @@ static const Property vfio_pci_dev_properties[] = {
>   #ifdef CONFIG_IOMMUFD
>   static void vfio_pci_set_fd(Object *obj, const char *str, Error **errp)
>   {
> -    vfio_device_set_fd(&VFIO_PCI(obj)->vbasedev, str, errp);
> +    VFIOPCIDevice *vdev = VFIO_PCI_BASE(obj);
> +    vfio_device_set_fd(&vdev->vbasedev, str, errp);
>   }
>   #endif
>   
> @@ -3488,11 +3514,7 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
>       object_class_property_add_str(klass, "fd", NULL, vfio_pci_set_fd);
>   #endif
>       dc->desc = "VFIO-based PCI device assignment";
> -    set_bit(DEVICE_CATEGORY_MISC, dc->categories);
>       pdc->realize = vfio_realize;
> -    pdc->exit = vfio_exitfn;
> -    pdc->config_read = vfio_pci_read_config;
> -    pdc->config_write = vfio_pci_write_config;
>   
>       object_class_property_set_description(klass, /* 1.3 */
>                                             "host",
> @@ -3617,16 +3639,11 @@ static void vfio_pci_dev_class_init(ObjectClass *klass, const void *data)
>   
>   static const TypeInfo vfio_pci_dev_info = {
>       .name = TYPE_VFIO_PCI,
> -    .parent = TYPE_PCI_DEVICE,
> +    .parent = TYPE_VFIO_PCI_BASE,
>       .instance_size = sizeof(VFIOPCIDevice),
>       .class_init = vfio_pci_dev_class_init,
>       .instance_init = vfio_instance_init,
>       .instance_finalize = vfio_instance_finalize,
> -    .interfaces = (const InterfaceInfo[]) {
> -        { INTERFACE_PCIE_DEVICE },
> -        { INTERFACE_CONVENTIONAL_PCI_DEVICE },
> -        { }
> -    },
>   };
>   
>   static const Property vfio_pci_dev_nohotplug_properties[] = {
> @@ -3673,6 +3690,7 @@ static void register_vfio_pci_dev_type(void)
>       vfio_pci_migration_multifd_transfer_prop = qdev_prop_on_off_auto;
>       vfio_pci_migration_multifd_transfer_prop.realized_set_allowed = true;
>   
> +    type_register_static(&vfio_pci_base_dev_info);
>       type_register_static(&vfio_pci_dev_info);
>       type_register_static(&vfio_pci_nohotplug_dev_info);
>   }
> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
> index f835b1dbc2..32a65cc1ae 100644
> --- a/hw/vfio/pci.h
> +++ b/hw/vfio/pci.h
> @@ -118,8 +118,13 @@ typedef struct VFIOMSIXInfo {
>       bool noresize;
>   } VFIOMSIXInfo;
>   
> -#define TYPE_VFIO_PCI "vfio-pci"
> -OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI)
> +/*
> + * TYPE_VFIO_PCI_BASE is an abstract type used to share code
> + * between VFIO implementations that use a kernel driver
> + * with those that use user sockets.
> + */
> +#define TYPE_VFIO_PCI_BASE "vfio-pci-base"
> +OBJECT_DECLARE_SIMPLE_TYPE(VFIOPCIDevice, VFIO_PCI_BASE)
>   
>   struct VFIOPCIDevice {
>       PCIDevice pdev;
> @@ -187,6 +192,9 @@ struct VFIOPCIDevice {
>       Notifier irqchip_change_notifier;
>   };
>   
> +#define TYPE_VFIO_PCI "vfio-pci"
> +/* TYPE_VFIO_PCI shares struct VFIOPCIDevice. */

Please keep the TYPE_VFIO definitions together.


Thanks,

C.


> +
>   /* Use uin32_t for vendor & device so PCI_ANY_ID expands and cannot match hw */
>   static inline bool vfio_pci_is(VFIOPCIDevice *vdev, uint32_t vendor, uint32_t device)
>   {




* Re: [PATCH v2 14/15] vfio/container: pass listener_begin/commit callbacks
  2025-04-30 19:40 ` [PATCH v2 14/15] vfio/container: pass listener_begin/commit callbacks John Levon
@ 2025-05-05 12:43   ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:43 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 4/30/25 21:40, John Levon wrote:
> The vfio-user container will later need to hook into these callbacks;
> set up vfio to use them, and optionally pass them through to the
> container.
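Here is a minimal sketch of the dispatch pattern. It assumes the listener is embedded in the container and recovered with container_of(). Per-backend hooks may be NULL, and backends that do not need them simply leave them unset. Listener, Container, ContainerClass and the count_* hooks are hypothetical stand-ins for MemoryListener, VFIOContainerBase and VFIOIOMMUClass.

```c
#include <assert.h>
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef struct Listener {
    void (*begin)(struct Listener *l);
    void (*commit)(struct Listener *l);
} Listener;

typedef struct Container Container;

/* Per-backend hooks; either may be NULL. */
typedef struct ContainerClass {
    void (*listener_begin)(Container *c);
    void (*listener_commit)(Container *c);
} ContainerClass;

struct Container {
    const ContainerClass *klass;
    Listener listener;          /* embedded, recovered via container_of() */
    int begins, commits;
};

static void container_listener_begin(Listener *l)
{
    Container *c = container_of(l, Container, listener);
    void (*hook)(Container *) = c->klass->listener_begin;

    if (hook) {
        hook(c);
    }
}

static void container_listener_commit(Listener *l)
{
    Container *c = container_of(l, Container, listener);
    /* The commit wrapper must dispatch to the commit hook. */
    void (*hook)(Container *) = c->klass->listener_commit;

    if (hook) {
        hook(c);
    }
}

static void count_begin(Container *c) { c->begins++; }
static void count_commit(Container *c) { c->commits++; }

static const ContainerClass counting_class = { count_begin, count_commit };
static const ContainerClass no_hooks_class = { NULL, NULL };

static void container_init(Container *c, const ContainerClass *klass)
{
    assert(klass != NULL);
    c->klass = klass;
    c->listener.begin = container_listener_begin;
    c->listener.commit = container_listener_commit;
    c->begins = c->commits = 0;
}
```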
> 
> Signed-off-by: John Levon <john.levon@nutanix.com>


Reviewed-by: Cédric Le Goater <clg@redhat.com>

Thanks,

C.


> ---
>   hw/vfio/listener.c                    | 28 +++++++++++++++++++++++++++
>   include/hw/vfio/vfio-container-base.h |  2 ++
>   2 files changed, 30 insertions(+)
> 
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index 2b93ca55b6..bfacb3d8d9 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -411,6 +411,32 @@ static bool vfio_get_section_iova_range(VFIOContainerBase *bcontainer,
>       return true;
>   }
>   
> +static void vfio_listener_begin(MemoryListener *listener)
> +{
> +    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> +                                                 listener);
> +    void (*listener_begin)(VFIOContainerBase *bcontainer);
> +
> +    listener_begin = VFIO_IOMMU_GET_CLASS(bcontainer)->listener_begin;
> +
> +    if (listener_begin) {
> +        listener_begin(bcontainer);
> +    }
> +}
> +
> +static void vfio_listener_commit(MemoryListener *listener)
> +{
> +    VFIOContainerBase *bcontainer = container_of(listener, VFIOContainerBase,
> +                                                 listener);
> +    void (*listener_commit)(VFIOContainerBase *bcontainer);
> +
> +    listener_commit = VFIO_IOMMU_GET_CLASS(bcontainer)->listener_commit;
> +
> +    if (listener_commit) {
> +        listener_commit(bcontainer);
> +    }
> +}
> +
>   static void vfio_device_error_append(VFIODevice *vbasedev, Error **errp)
>   {
>       /*
> @@ -1161,6 +1187,8 @@ static void vfio_listener_log_sync(MemoryListener *listener,
>   
>   static const MemoryListener vfio_memory_listener = {
>       .name = "vfio",
> +    .begin = vfio_listener_begin,
> +    .commit = vfio_listener_commit,
>       .region_add = vfio_listener_region_add,
>       .region_del = vfio_listener_region_del,
>       .log_global_start = vfio_listener_log_global_start,
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index 92cee54d11..e29f7126c5 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -117,6 +117,8 @@ struct VFIOIOMMUClass {
>   
>       /* basic feature */
>       bool (*setup)(VFIOContainerBase *bcontainer, Error **errp);
> +    void (*listener_begin)(VFIOContainerBase *bcontainer);
> +    void (*listener_commit)(VFIOContainerBase *bcontainer);
>       int (*dma_map)(const VFIOContainerBase *bcontainer,
>                      hwaddr iova, ram_addr_t size,
>                      void *vaddr, bool readonly);




* Re: [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations
  2025-04-30 19:40 ` [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
@ 2025-05-05 12:46   ` Cédric Le Goater
  2025-05-07 13:58     ` John Levon
  0 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:46 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic, John Johnson, Jagannathan Raman, Elena Ufimtseva

On 4/30/25 21:40, John Levon wrote:
> Pass through the MemoryRegion to DMA operation handlers of vfio
> containers. The vfio-user container will need this later.
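The new @mrp argument follows the optional out-parameter convention that memory_get_xlat_addr() already uses for its other outputs. Each output is written only when the caller passes a non-NULL pointer, so callers such as vhost-vdpa can simply pass NULL. The toy translate() function and the Region and Mapping types are hypothetical, chosen only to show the convention.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for MemoryRegion and a translation entry. */
typedef struct Region {
    const char *name;
    bool readonly;
} Region;

typedef struct Mapping {
    uint64_t iova;
    void *vaddr;
    Region *mr;
} Mapping;

/*
 * Look up @iova. Every output pointer is optional: callers pass NULL for
 * anything they do not need. On failure, no output is touched.
 */
static bool translate(const Mapping *map, size_t n, uint64_t iova,
                      void **vaddr, bool *read_only, Region **mrp)
{
    assert(map != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (map[i].iova != iova) {
            continue;
        }
        if (vaddr) {
            *vaddr = map[i].vaddr;
        }
        if (read_only) {
            *read_only = map[i].mr->readonly;
        }
        if (mrp) {
            *mrp = map[i].mr;
        }
        return true;
    }
    return false;
}
```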
> 
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>

You should add the system/memory maintainers as Cc: entries in this
patch.

Thanks,

C.


> ---
>   hw/vfio/container-base.c              |  4 ++--
>   hw/vfio/container.c                   |  3 ++-
>   hw/vfio/iommufd.c                     |  3 ++-
>   hw/vfio/listener.c                    | 18 +++++++++++-------
>   hw/virtio/vhost-vdpa.c                |  2 +-
>   include/hw/vfio/vfio-container-base.h |  4 ++--
>   include/system/memory.h               |  4 +++-
>   system/memory.c                       |  7 ++++++-
>   8 files changed, 29 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 1c6ca94b60..a677bb6694 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -75,12 +75,12 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
>   
>   int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>                              hwaddr iova, ram_addr_t size,
> -                           void *vaddr, bool readonly)
> +                           void *vaddr, bool readonly, MemoryRegion *mrp)
>   {
>       VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>   
>       g_assert(vioc->dma_map);
> -    return vioc->dma_map(bcontainer, iova, size, vaddr, readonly);
> +    return vioc->dma_map(bcontainer, iova, size, vaddr, readonly, mrp);
>   }
>   
>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/container.c b/hw/vfio/container.c
> index 1000f3c241..aaaca33c8e 100644
> --- a/hw/vfio/container.c
> +++ b/hw/vfio/container.c
> @@ -211,7 +211,8 @@ static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
>   }
>   
>   static int vfio_legacy_dma_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> -                               ram_addr_t size, void *vaddr, bool readonly)
> +                               ram_addr_t size, void *vaddr, bool readonly,
> +                               MemoryRegion *mrp)
>   {
>       const VFIOContainer *container = container_of(bcontainer, VFIOContainer,
>                                                     bcontainer);
> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
> index af1c7ab10a..a2518c4a5d 100644
> --- a/hw/vfio/iommufd.c
> +++ b/hw/vfio/iommufd.c
> @@ -34,7 +34,8 @@
>               TYPE_HOST_IOMMU_DEVICE_IOMMUFD "-vfio"
>   
>   static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
> -                            ram_addr_t size, void *vaddr, bool readonly)
> +                            ram_addr_t size, void *vaddr, bool readonly,
> +                            MemoryRegion *mrp)
>   {
>       const VFIOIOMMUFDContainer *container =
>           container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index bfacb3d8d9..71f336a31c 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -93,12 +93,12 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>   /* Called with rcu_read_lock held.  */
>   static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>                                  ram_addr_t *ram_addr, bool *read_only,
> -                               Error **errp)
> +                               MemoryRegion **mrp, Error **errp)
>   {
>       bool ret, mr_has_discard_manager;
>   
>       ret = memory_get_xlat_addr(iotlb, vaddr, ram_addr, read_only,
> -                               &mr_has_discard_manager, errp);
> +                               &mr_has_discard_manager, mrp, errp);
>       if (ret && mr_has_discard_manager) {
>           /*
>            * Malicious VMs might trigger discarding of IOMMU-mapped memory. The
> @@ -126,6 +126,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>       VFIOGuestIOMMU *giommu = container_of(n, VFIOGuestIOMMU, n);
>       VFIOContainerBase *bcontainer = giommu->bcontainer;
>       hwaddr iova = iotlb->iova + giommu->iommu_offset;
> +    MemoryRegion *mrp;
>       void *vaddr;
>       int ret;
>       Error *local_err = NULL;
> @@ -150,7 +151,8 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>       if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>           bool read_only;
>   
> -        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &local_err)) {
> +        if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, &mrp,
> +                                &local_err)) {
>               error_report_err(local_err);
>               goto out;
>           }
> @@ -163,7 +165,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>            */
>           ret = vfio_container_dma_map(bcontainer, iova,
>                                        iotlb->addr_mask + 1, vaddr,
> -                                     read_only);
> +                                     read_only, mrp);
>           if (ret) {
>               error_report("vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
>                            "0x%"HWADDR_PRIx", %p) = %d (%s)",
> @@ -233,7 +235,7 @@ static int vfio_ram_discard_notify_populate(RamDiscardListener *rdl,
>           vaddr = memory_region_get_ram_ptr(section->mr) + start;
>   
>           ret = vfio_container_dma_map(bcontainer, iova, next - start,
> -                                     vaddr, section->readonly);
> +                                     vaddr, section->readonly, section->mr);
>           if (ret) {
>               /* Rollback */
>               vfio_ram_discard_notify_discard(rdl, section);
> @@ -557,7 +559,7 @@ static void vfio_listener_region_add(MemoryListener *listener,
>       }
>   
>       ret = vfio_container_dma_map(bcontainer, iova, int128_get64(llsize),
> -                                 vaddr, section->readonly);
> +                                 vaddr, section->readonly, section->mr);
>       if (ret) {
>           error_setg(&err, "vfio_container_dma_map(%p, 0x%"HWADDR_PRIx", "
>                      "0x%"HWADDR_PRIx", %p) = %d (%s)",
> @@ -1021,7 +1023,9 @@ static void vfio_iommu_map_dirty_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>       }
>   
>       rcu_read_lock();
> -    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, &local_err)) {
> +    if (!vfio_get_xlat_addr(iotlb, NULL, &translated_addr, NULL, NULL,
> +                            &local_err)) {
> +        error_report_err(local_err);
>           goto out_unlock;
>       }
>   
> diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> index 1ab2c11fa8..4c4b3d1371 100644
> --- a/hw/virtio/vhost-vdpa.c
> +++ b/hw/virtio/vhost-vdpa.c
> @@ -228,7 +228,7 @@ static void vhost_vdpa_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
>       if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
>           bool read_only;
>   
> -        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL,
> +        if (!memory_get_xlat_addr(iotlb, &vaddr, NULL, &read_only, NULL, NULL,
>                                     &local_err)) {
>               error_report_err(local_err);
>               return;
> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
> index e29f7126c5..09b72e9969 100644
> --- a/include/hw/vfio/vfio-container-base.h
> +++ b/include/hw/vfio/vfio-container-base.h
> @@ -78,7 +78,7 @@ void vfio_address_space_insert(VFIOAddressSpace *space,
>   
>   int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>                              hwaddr iova, ram_addr_t size,
> -                           void *vaddr, bool readonly);
> +                           void *vaddr, bool readonly, MemoryRegion *mrp);
>   int vfio_container_dma_unmap(VFIOContainerBase *bcontainer,
>                                hwaddr iova, ram_addr_t size,
>                                IOMMUTLBEntry *iotlb, bool unmap_all);
> @@ -121,7 +121,7 @@ struct VFIOIOMMUClass {
>       void (*listener_commit)(VFIOContainerBase *bcontainer);
>       int (*dma_map)(const VFIOContainerBase *bcontainer,
>                      hwaddr iova, ram_addr_t size,
> -                   void *vaddr, bool readonly);
> +                   void *vaddr, bool readonly, MemoryRegion *mrp);
>       int (*dma_unmap)(const VFIOContainerBase *bcontainer,
>                        hwaddr iova, ram_addr_t size,
>                        IOMMUTLBEntry *iotlb, bool unmap_all);
> diff --git a/include/system/memory.h b/include/system/memory.h
> index fbbf4cf911..eca1d9f32e 100644
> --- a/include/system/memory.h
> +++ b/include/system/memory.h
> @@ -746,13 +746,15 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
>    * @read_only: indicates if writes are allowed
>    * @mr_has_discard_manager: indicates memory is controlled by a
>    *                          RamDiscardManager
> + * @mrp: if non-NULL, fill in with MemoryRegion
>    * @errp: pointer to Error*, to store an error if it happens.
>    *
>    * Return: true on success, else false setting @errp with error.
>    */
>   bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>                             ram_addr_t *ram_addr, bool *read_only,
> -                          bool *mr_has_discard_manager, Error **errp);
> +                          bool *mr_has_discard_manager, MemoryRegion **mrp,
> +                          Error **errp);
>   
>   typedef struct CoalescedMemoryRange CoalescedMemoryRange;
>   typedef struct MemoryRegionIoeventfd MemoryRegionIoeventfd;
> diff --git a/system/memory.c b/system/memory.c
> index 71434e7ad0..79671943ce 100644
> --- a/system/memory.c
> +++ b/system/memory.c
> @@ -2176,7 +2176,8 @@ void ram_discard_manager_unregister_listener(RamDiscardManager *rdm,
>   /* Called with rcu_read_lock held.  */
>   bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>                             ram_addr_t *ram_addr, bool *read_only,
> -                          bool *mr_has_discard_manager, Error **errp)
> +                          bool *mr_has_discard_manager, MemoryRegion **mrp,
> +                          Error **errp)
>   {
>       MemoryRegion *mr;
>       hwaddr xlat;
> @@ -2241,6 +2242,10 @@ bool memory_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
>           *read_only = !writable || mr->readonly;
>       }
>   
> +    if (mrp != NULL) {
> +        *mrp = mr;
> +    }
> +
>       return true;
>   }
>   
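The new `@mrp` argument follows the common C idiom of an optional out-parameter. Callers that do not need the value pass NULL, and the callee writes through the pointer only when it is non-NULL, so existing callers keep working unchanged. The block below is a minimal sketch of that idiom, not QEMU code. `Region` and `lookup_region()` are hypothetical stand-ins for `MemoryRegion` and `memory_get_xlat_addr()`, and the linear table search replaces QEMU's real address translation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for QEMU's MemoryRegion. */
typedef struct Region {
    const char *name;
    uint64_t base;
    uint64_t size;
    bool readonly;
} Region;

/*
 * Hypothetical lookup modelled on the optional @mrp parameter above.
 * @read_only and @mrp are optional out-parameters. Each is written only
 * when the caller passes a non-NULL pointer. Returns false if no region
 * in @table covers @addr.
 */
static bool lookup_region(Region *table, size_t n, uint64_t addr,
                          bool *read_only, Region **mrp)
{
    assert(table != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        Region *r = &table[i];

        if (addr >= r->base && addr - r->base < r->size) {
            if (read_only != NULL) {
                *read_only = r->readonly;
            }
            if (mrp != NULL) {
                *mrp = r;
            }
            return true;
        }
    }
    return false;
}
```

A caller that only cares about writability passes `&ro, NULL`. A vfio-user style caller would also pass `&mr` to learn which region backs the address.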




* Re: [PATCH v2 00/15] vfio: preparation for vfio-user
  2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
                   ` (14 preceding siblings ...)
  2025-04-30 19:40 ` [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
@ 2025-05-05 12:51 ` Cédric Le Goater
  15 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 12:51 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

Hello John,

On 4/30/25 21:39, John Levon wrote:
> Hi, this series is against the vfio-next tree:
> https://github.com/legoater/qemu/commits/vfio-next
> 
> The series contains patches to vfio to prepare for the vfio-user
> implementation. A previous version of these patches can be found at
> https://lore.kernel.org/all/20250409134814.478903-1-john.levon@nutanix.com/
> 
> The changes have been rebased on vfio-next, and include changes from previous
> series code review comments.
> 
> An old version of the full vfio-user series can be found at
> https://lore.kernel.org/all/7dd34008-e0f1-4eed-a77e-55b1f68fbe69@redhat.com/T/
> ("[PATCH v8 00/28] vfio-user client"). Please see that series for justification
> and context.

We are nearly there. Please address the little issues in v3, the
build breakage being the most important. The last patch is not under
the VFIO jurisdiction, though.


Thanks,

C.






  
> thanks
> john
> 
> John Levon (15):
>    vfio: add vfio_prepare_device()
>    vfio: add vfio_unprepare_device()
>    vfio: add vfio_attach_device_by_iommu_type()
>    vfio: add vfio_device_get_irq_info() helper
>    vfio: consistently handle return value for helpers
>    include/qemu: add strread/writeerror()
>    vfio: add vfio_pci_config_space_read/write()
>    vfio: add unmap_all flag to DMA unmap callback
>    vfio: implement unmap all for DMA unmap callbacks
>    vfio: add device IO ops vector
>    vfio: add region info cache
>    vfio: add read/write to device IO ops vector
>    vfio: add vfio-pci-base class
>    vfio/container: pass listener_begin/commit callbacks
>    vfio/container: pass MemoryRegion to DMA operations
> 
>   hw/vfio/ap.c                          |  19 +-
>   hw/vfio/ccw.c                         |  25 ++-
>   hw/vfio/container-base.c              |  14 +-
>   hw/vfio/container.c                   |  66 ++++---
>   hw/vfio/device.c                      | 192 +++++++++++++++++--
>   hw/vfio/igd.c                         |   8 +-
>   hw/vfio/iommufd.c                     |  35 ++--
>   hw/vfio/listener.c                    |  82 +++++---
>   hw/vfio/pci.c                         | 257 ++++++++++++++++----------
>   hw/vfio/pci.h                         |  12 +-
>   hw/vfio/platform.c                    |   6 +-
>   hw/vfio/region.c                      |  19 +-
>   hw/virtio/vhost-vdpa.c                |   2 +-
>   include/hw/vfio/vfio-container-base.h |  10 +-
>   include/hw/vfio/vfio-device.h         |  67 +++++++
>   include/qemu/error-report.h           |  14 ++
>   include/system/memory.h               |   4 +-
>   system/memory.c                       |   7 +-
>   18 files changed, 604 insertions(+), 235 deletions(-)
> 




* Re: [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback
  2025-05-05 12:06   ` Cédric Le Goater
@ 2025-05-05 13:26     ` John Levon
  2025-05-05 21:05       ` Cédric Le Goater
  0 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-05-05 13:26 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On Mon, May 05, 2025 at 02:06:03PM +0200, Cédric Le Goater wrote:

> On 4/30/25 21:39, John Levon wrote:
> > We'll use this parameter shortly; this just adds the plumbing.
> 
> I am not sure the 'unmap_all' name reflects what the dma_unmap()
> handler does.

FWIW the vfio API flag that reflects this is already called
VFIO_DMA_UNMAP_FLAG_ALL so there's precedent for the name.

It unmaps the entire address space, right? Do you have a suggestion for a better
name?

regards
john



* Re: [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback
  2025-05-05 13:26     ` John Levon
@ 2025-05-05 21:05       ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-05 21:05 UTC (permalink / raw)
  To: John Levon
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 5/5/25 15:26, John Levon wrote:
> On Mon, May 05, 2025 at 02:06:03PM +0200, Cédric Le Goater wrote:
> 
>> On 4/30/25 21:39, John Levon wrote:
>>> We'll use this parameter shortly; this just adds the plumbing.
>>
>> I am not sure the 'unmap_all' name reflects what the dma_unmap()
>> handler does.
> 
> FWIW the vfio API flag that reflects this is already called
> VFIO_DMA_UNMAP_FLAG_ALL so there's precedent for the name.
> 
> It unmaps the entire address space, right? 

yes but the unmap is split in two.

> Do you have a suggestion for a better name?

no. Let's move on.
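The split mentioned above comes from integer widths. The whole 64-bit IOVA space spans 2^64 bytes, which a 64-bit size field cannot hold. The legacy `unmap_all` path therefore issues two requests of 2^63 bytes each. The block below is a minimal arithmetic sketch. `UnmapRange` and `split_unmap_all()` are hypothetical helpers; only the two-halves layout is taken from the `vfio_legacy_dma_unmap()` hunk quoted in the patch 09 review.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of one unmap request. */
typedef struct UnmapRange {
    uint64_t iova;
    uint64_t size;
} UnmapRange;

/*
 * The full IOVA space is 2^64 bytes, one more than UINT64_MAX, so no
 * single (iova, size) pair with a 64-bit size can describe it. Two halves
 * of 2^63 bytes each keep every size representable. Returns the number of
 * ranges written to @out.
 */
static int split_unmap_all(UnmapRange out[2])
{
    const uint64_t half = UINT64_C(1) << 63;

    out[0].iova = 0;
    out[0].size = half;
    out[1].iova = half;
    out[1].size = half;

    /* The halves are contiguous and end exactly at the top of the space. */
    assert(out[1].iova == out[0].iova + out[0].size);
    assert(out[1].iova + (out[1].size - 1) == UINT64_MAX);
    return 2;
}
```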


Thanks,

C.





* Re: [PATCH v2 10/15] vfio: add device IO ops vector
  2025-04-30 19:39 ` [PATCH v2 10/15] vfio: add device IO ops vector John Levon
  2025-05-05 12:21   ` Cédric Le Goater
@ 2025-05-06 10:01   ` Cédric Le Goater
  1 sibling, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-06 10:01 UTC (permalink / raw)
  To: John Levon, qemu-devel
  Cc: Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic, John Johnson, Elena Ufimtseva, Jagannathan Raman

On 4/30/25 21:39, John Levon wrote:
> For vfio-user, device operations such as IRQ handling and region
> read/writes are implemented in userspace over the control socket, not
> ioctl() to the vfio kernel driver; add an ops vector to generalize this,
> and implement vfio_device_io_ops_ioctl for interacting with the kernel
> vfio driver.
> 
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/container-base.c      |  6 +--
>   hw/vfio/device.c              | 77 ++++++++++++++++++++++++++++++-----
>   hw/vfio/listener.c            | 13 +++---
>   hw/vfio/pci.c                 | 10 ++---
>   include/hw/vfio/vfio-device.h | 38 +++++++++++++++++
>   5 files changed, 117 insertions(+), 27 deletions(-)
> 
> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
> index 3ff473a45c..1c6ca94b60 100644
> --- a/hw/vfio/container-base.c
> +++ b/hw/vfio/container-base.c
> @@ -198,11 +198,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
>       feature->flags = VFIO_DEVICE_FEATURE_GET |
>                        VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
>   
> -    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -        return -errno;
> -    }
> -
> -    return 0;
> +    return vbasedev->io_ops->device_feature(vbasedev, feature);
>   }
>   
>   static int vfio_container_iommu_query_dirty_bitmap(const VFIOContainerBase *bcontainer,
> diff --git a/hw/vfio/device.c b/hw/vfio/device.c
> index 5d837092cb..468fb50eac 100644
> --- a/hw/vfio/device.c
> +++ b/hw/vfio/device.c
> @@ -82,7 +82,7 @@ void vfio_device_irq_disable(VFIODevice *vbasedev, int index)
>           .count = 0,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
>   }
>   
>   void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
> @@ -95,7 +95,7 @@ void vfio_device_irq_unmask(VFIODevice *vbasedev, int index)
>           .count = 1,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
>   }
>   
>   void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
> @@ -108,7 +108,7 @@ void vfio_device_irq_mask(VFIODevice *vbasedev, int index)
>           .count = 1,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io_ops->set_irqs(vbasedev, &irq_set);
>   }


FYI, Coverity reports issues for the above routines:

*** CID 1609636:  Error handling issues  (CHECKED_RETURN)
/builds/qemu-project/qemu/hw/vfio/device.c: 98 in vfio_device_irq_unmask()
** CID 1609635:  Error handling issues  (CHECKED_RETURN)
/builds/qemu-project/qemu/hw/vfio/device.c: 111 in vfio_device_irq_mask()
** CID 1609633:  Error handling issues  (CHECKED_RETURN)
/builds/qemu-project/qemu/hw/vfio/device.c: 85 in vfio_device_irq_disable()

Something to address after this change. Not critical.
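One way to address those CHECKED_RETURN reports is to check the `set_irqs` result and report failures. The listener.c hunk in this patch already does this with `strerror(-ret)`. The block below is a minimal sketch under stated assumptions. `FakeDevice`, `FakeIOOps` and `fake_device_irq_mask()` are hypothetical, trimmed-down stand-ins; the real callback takes a `struct vfio_irq_set *`. Whether the real helpers should warn, return the error, or both is left open.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for VFIODevice and its io_ops. */
typedef struct FakeDevice FakeDevice;

typedef struct FakeIOOps {
    /* Returns 0 on success or a negative errno, like the ioctl backend. */
    int (*set_irqs)(FakeDevice *dev, int index, int action);
} FakeIOOps;

struct FakeDevice {
    const char *name;
    const FakeIOOps *io_ops;
};

/*
 * Hypothetical mask helper. Unlike the quoted vfio_device_irq_mask(), it
 * checks the callback's return value and reports a failure instead of
 * dropping it silently. It also returns the error so callers can react.
 */
static int fake_device_irq_mask(FakeDevice *dev, int index)
{
    int ret;

    assert(dev != NULL && dev->io_ops != NULL);

    ret = dev->io_ops->set_irqs(dev, index, /* action: mask */ 1);
    if (ret < 0) {
        fprintf(stderr, "%s: failed to mask IRQ index %d: %s\n",
                dev->name, index, strerror(-ret));
    }
    return ret;
}

/* Two fake backends for trying the helper out. */
static int set_irqs_ok(FakeDevice *dev, int index, int action)
{
    (void)dev; (void)index; (void)action;
    return 0;
}

static int set_irqs_fail(FakeDevice *dev, int index, int action)
{
    (void)dev; (void)index; (void)action;
    return -EINVAL;
}

static const FakeIOOps ok_ops = { set_irqs_ok };
static const FakeIOOps fail_ops = { set_irqs_fail };
```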


Thanks,

C.



>   
>   static inline const char *action_to_str(int action)
> @@ -155,6 +155,7 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>       int argsz;
>       const char *name;
>       int32_t *pfd;
> +    int ret;
>   
>       argsz = sizeof(*irq_set) + sizeof(*pfd);
>   
> @@ -167,7 +168,9 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>       pfd = (int32_t *)&irq_set->data;
>       *pfd = fd;
>   
> -    if (!ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
> +    ret = vbasedev->io_ops->set_irqs(vbasedev, irq_set);
> +
> +    if (!ret) {
>           return true;
>       }
>   
> @@ -188,22 +191,19 @@ bool vfio_device_irq_set_signaling(VFIODevice *vbasedev, int index, int subindex
>   int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
>                                struct vfio_irq_info *info)
>   {
> -    int ret;
> -
>       memset(info, 0, sizeof(*info));
>   
>       info->argsz = sizeof(*info);
>       info->index = index;
>   
> -    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> -
> -    return ret < 0 ? -errno : ret;
> +    return vbasedev->io_ops->get_irq_info(vbasedev, info);
>   }
>   
>   int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>                                   struct vfio_region_info **info)
>   {
>       size_t argsz = sizeof(struct vfio_region_info);
> +    int ret;
>   
>       *info = g_malloc0(argsz);
>   
> @@ -211,10 +211,11 @@ int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>   retry:
>       (*info)->argsz = argsz;
>   
> -    if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
> +    ret = vbasedev->io_ops->get_region_info(vbasedev, *info);
> +    if (ret != 0) {
>           g_free(*info);
>           *info = NULL;
> -        return -errno;
> +        return ret;
>       }
>   
>       if ((*info)->argsz > argsz) {
> @@ -320,11 +321,14 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
>       vbasedev->fd = fd;
>   }
>   
> +static VFIODeviceIOOps vfio_device_io_ops_ioctl;
> +
>   void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
>                         DeviceState *dev, bool ram_discard)
>   {
>       vbasedev->type = type;
>       vbasedev->ops = ops;
> +    vbasedev->io_ops = &vfio_device_io_ops_ioctl;
>       vbasedev->dev = dev;
>       vbasedev->fd = -1;
>   
> @@ -442,3 +446,54 @@ void vfio_device_unprepare(VFIODevice *vbasedev)
>       QLIST_REMOVE(vbasedev, global_next);
>       vbasedev->bcontainer = NULL;
>   }
> +
> +/*
> + * Traditional ioctl() based io
> + */
> +
> +static int vfio_device_io_device_feature(VFIODevice *vbasedev,
> +                                         struct vfio_device_feature *feature)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_get_region_info(VFIODevice *vbasedev,
> +                                          struct vfio_region_info *info)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_get_irq_info(VFIODevice *vbasedev,
> +                                       struct vfio_irq_info *info)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_device_io_set_irqs(VFIODevice *vbasedev,
> +                                   struct vfio_irq_set *irqs)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static VFIODeviceIOOps vfio_device_io_ops_ioctl = {
> +    .device_feature = vfio_device_io_device_feature,
> +    .get_region_info = vfio_device_io_get_region_info,
> +    .get_irq_info = vfio_device_io_get_irq_info,
> +    .set_irqs = vfio_device_io_set_irqs,
> +};
> diff --git a/hw/vfio/listener.c b/hw/vfio/listener.c
> index e7ade7d62e..2b93ca55b6 100644
> --- a/hw/vfio/listener.c
> +++ b/hw/vfio/listener.c
> @@ -794,13 +794,17 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
>                        VFIO_DEVICE_FEATURE_DMA_LOGGING_STOP;
>   
>       QLIST_FOREACH(vbasedev, &bcontainer->device_list, container_next) {
> +        int ret;
> +
>           if (!vbasedev->dirty_tracking) {
>               continue;
>           }
>   
> -        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        ret = vbasedev->io_ops->device_feature(vbasedev, feature);
> +
> +        if (ret != 0) {
>               warn_report("%s: Failed to stop DMA logging, err %d (%s)",
> -                        vbasedev->name, -errno, strerror(errno));
> +                        vbasedev->name, -ret, strerror(-ret));
>           }
>           vbasedev->dirty_tracking = false;
>       }
> @@ -901,10 +905,9 @@ static bool vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
>               continue;
>           }
>   
> -        ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +        ret = vbasedev->io_ops->device_feature(vbasedev, feature);
>           if (ret) {
> -            ret = -errno;
> -            error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
> +            error_setg_errno(errp, -ret, "%s: Failed to start DMA logging",
>                                vbasedev->name);
>               goto out;
>           }
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 8455010d62..bbf95215cc 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -381,7 +381,7 @@ static void vfio_msi_interrupt(void *opaque)
>   static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
>   {
>       g_autofree struct vfio_irq_set *irq_set = NULL;
> -    int ret = 0, argsz;
> +    int argsz;
>       int32_t *fd;
>   
>       argsz = sizeof(*irq_set) + sizeof(*fd);
> @@ -396,9 +396,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
>       fd = (int32_t *)&irq_set->data;
>       *fd = -1;
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> -
> -    return ret < 0 ? -errno : ret;
> +    return vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
>   }
>   
>   static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> @@ -455,11 +453,11 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>           fds[i] = fd;
>       }
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = vdev->vbasedev.io_ops->set_irqs(&vdev->vbasedev, irq_set);
>   
>       g_free(irq_set);
>   
> -    return ret < 0 ? -errno : ret;
> +    return ret;
>   }
>   
>   static void vfio_add_kvm_msi_virq(VFIOPCIDevice *vdev, VFIOMSIVector *vector,
> diff --git a/include/hw/vfio/vfio-device.h b/include/hw/vfio/vfio-device.h
> index 5b833868c9..e89ed02c0e 100644
> --- a/include/hw/vfio/vfio-device.h
> +++ b/include/hw/vfio/vfio-device.h
> @@ -41,6 +41,7 @@ enum {
>   };
>   
>   typedef struct VFIODeviceOps VFIODeviceOps;
> +typedef struct VFIODeviceIOOps VFIODeviceIOOps;
>   typedef struct VFIOMigration VFIOMigration;
>   
>   typedef struct IOMMUFDBackend IOMMUFDBackend;
> @@ -66,6 +67,7 @@ typedef struct VFIODevice {
>       OnOffAuto migration_multifd_transfer;
>       bool migration_events;
>       VFIODeviceOps *ops;
> +    VFIODeviceIOOps *io_ops;
>       unsigned int num_irqs;
>       unsigned int num_regions;
>       unsigned int flags;
> @@ -141,6 +143,42 @@ typedef QLIST_HEAD(VFIODeviceList, VFIODevice) VFIODeviceList;
>   extern VFIODeviceList vfio_device_list;
>   
>   #ifdef CONFIG_LINUX
> +/*
> + * How devices communicate with the server.  The default option is through
> + * ioctl() to the kernel VFIO driver, but vfio-user can use a socket to a remote
> + * process.
> + */
> +struct VFIODeviceIOOps {
> +    /**
> +     * @device_feature
> +     *
> +     * Fill in feature info for the given device.
> +     */
> +    int (*device_feature)(VFIODevice *vdev, struct vfio_device_feature *);
> +
> +    /**
> +     * @get_region_info
> +     *
> +     * Fill in @info with information on the region given by @info->index.
> +     */
> +    int (*get_region_info)(VFIODevice *vdev,
> +                           struct vfio_region_info *info);
> +
> +    /**
> +     * @get_irq_info
> +     *
> +     * Fill in @irq with information on the IRQ given by @info->index.
> +     */
> +    int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
> +
> +    /**
> +     * @set_irqs
> +     *
> +     * Configure IRQs as defined by @irqs.
> +     */
> +    int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
> +};
> +
>   int vfio_device_get_region_info(VFIODevice *vbasedev, int index,
>                                   struct vfio_region_info **info);
>   int vfio_device_get_region_info_type(VFIODevice *vbasedev, uint32_t type,
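The ops vector replaces each direct ioctl() with a function pointer that returns 0 or a negative errno. As a result, generic code such as vfio_device_get_region_info() no longer needs to know how a request is transported. The block below is a minimal sketch of that shape, combined with the grow-and-retry handling of `argsz`. `RegionInfo`, `Dev`, `DevIOOps` and the fake backend are hypothetical simplifications of `struct vfio_region_info`, `VFIODevice` and `VFIODeviceIOOps`. The fake backend assumes the kernel-style convention: when capability data does not fit, it reports a larger required `argsz` and returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct vfio_region_info. */
typedef struct RegionInfo {
    uint32_t argsz;   /* in: buffer size; out: size the backend needs */
    uint32_t index;
    uint64_t size;
    uint8_t caps[];   /* variable-length capability data */
} RegionInfo;

typedef struct Dev Dev;

/* Backend vector: each op returns 0 or a negative errno. */
typedef struct DevIOOps {
    int (*get_region_info)(Dev *dev, RegionInfo *info);
} DevIOOps;

struct Dev {
    const DevIOOps *io_ops;
    uint32_t caps_len;  /* capability bytes the fake device reports */
};

/* Fake backend with 8 regions; asks for a bigger buffer if caps don't fit. */
static int fake_get_region_info(Dev *dev, RegionInfo *info)
{
    uint32_t needed = (uint32_t)sizeof(*info) + dev->caps_len;

    if (info->index > 7) {
        return -EINVAL;
    }
    info->size = 0x1000;
    if (info->argsz < needed) {
        info->argsz = needed;  /* report the required size, return success */
        return 0;
    }
    memset(info->caps, 0xab, dev->caps_len);
    return 0;
}

static const DevIOOps fake_ops = { fake_get_region_info };

/*
 * Generic caller: dispatches through io_ops and grows the buffer until
 * the backend is satisfied. On success the caller must free(*out).
 */
static int dev_get_region_info(Dev *dev, uint32_t index, RegionInfo **out)
{
    uint32_t argsz = sizeof(RegionInfo);
    RegionInfo *info;
    int ret;

    assert(out != NULL);
    info = calloc(1, argsz);
    if (info == NULL) {
        return -ENOMEM;
    }
    for (;;) {
        info->argsz = argsz;
        info->index = index;
        ret = dev->io_ops->get_region_info(dev, info);
        if (ret != 0) {
            free(info);
            *out = NULL;
            return ret;
        }
        if (info->argsz <= argsz) {
            break;
        }
        argsz = info->argsz;
        RegionInfo *bigger = realloc(info, argsz);
        if (bigger == NULL) {
            free(info);
            *out = NULL;
            return -ENOMEM;
        }
        info = bigger;
    }
    *out = info;
    return 0;
}
```

A vfio-user backend would plug a different `get_region_info` into the same vector while the retry loop stays unchanged. The sketch trusts the backend to converge. A production version might cap the number of retries.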




* Re: [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper
  2025-05-05  9:19   ` Cédric Le Goater
@ 2025-05-06 11:38     ` John Levon
  2025-05-06 12:27       ` Cédric Le Goater
  0 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-05-06 11:38 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On Mon, May 05, 2025 at 11:19:30AM +0200, Cédric Le Goater wrote:

> > +int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
> > +                                struct vfio_irq_info *info);
> 
> This is breaking the windows build.

Sorry, I forgot to set up cross-compile. I've done so now, and it was actually
vfio_device_prepare() that was broken, I think, as it was outside of the linux
ifdef.

regards
john



* Re: [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper
  2025-05-06 11:38     ` John Levon
@ 2025-05-06 12:27       ` Cédric Le Goater
  2025-05-06 12:35         ` John Levon
  0 siblings, 1 reply; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-06 12:27 UTC (permalink / raw)
  To: John Levon
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On 5/6/25 13:38, John Levon wrote:
> On Mon, May 05, 2025 at 11:19:30AM +0200, Cédric Le Goater wrote:
> 
>>> +int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
>>> +                                struct vfio_irq_info *info);
>>
>> This is breaking the windows build.
> 
> Sorry, I forgot to set up cross-compile. I've done so now, and it was actually
> vfio_device_prepare() that was broken, I think, as it was outside of the linux
> ifdef.


All references to 'struct vfio_irq_info *info' should be under
the Linux ifdef.

C.




* Re: [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper
  2025-05-06 12:27       ` Cédric Le Goater
@ 2025-05-06 12:35         ` John Levon
  0 siblings, 0 replies; 41+ messages in thread
From: John Levon @ 2025-05-06 12:35 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On Tue, May 06, 2025 at 02:27:21PM +0200, Cédric Le Goater wrote:

> On 5/6/25 13:38, John Levon wrote:
> > On Mon, May 05, 2025 at 11:19:30AM +0200, Cédric Le Goater wrote:
> > 
> > > > +int vfio_device_get_irq_info(VFIODevice *vbasedev, int index,
> > > > +                                struct vfio_irq_info *info);
> > > 
> > > This is breaking the windows build.
> > 
> > Sorry, I forgot to set up cross-compile. I've done so now, and it was actually
> > vfio_device_prepare() that was broken, I think, as it was outside of the linux
> > ifdef.
> 
> All references to 'struct vfio_irq_info *info' should be under
> the Linux ifdef.

Yes, I think it already was (this is just before the CONFIG_LINUX #endif). The
other patch was definitely broken.

regards
john



* Re: [PATCH v2 09/15] vfio: implement unmap all for DMA unmap callbacks
  2025-05-05 11:28   ` Cédric Le Goater
@ 2025-05-07 11:47     ` John Levon
  0 siblings, 0 replies; 41+ messages in thread
From: John Levon @ 2025-05-07 11:47 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic

On Mon, May 05, 2025 at 01:28:02PM +0200, Cédric Le Goater wrote:

> > +static int vfio_legacy_dma_unmap(const VFIOContainerBase *bcontainer,
> > +                                 hwaddr iova, ram_addr_t size,
> > +                                 IOMMUTLBEntry *iotlb, bool unmap_all)
> > +{
> > +    int ret;
> > +
> > +    if (unmap_all) {
> > +        /* The unmap ioctl doesn't accept a full 64-bit span. */
> > +        Int128 llsize = int128_rshift(int128_2_64(), 1);
> > +
> > +        ret = vfio_legacy_dma_unmap_one(bcontainer, 0, int128_get64(llsize),
> > +                                        iotlb);
> > +
> > +        if (ret == 0) {
> > +            ret = vfio_legacy_dma_unmap_one(bcontainer, int128_get64(llsize),
> > +                                            int128_get64(llsize), iotlb);
> > +        }
> > +
> > +    } else {
> > +        ret = vfio_legacy_dma_unmap_one(bcontainer, iova, size, iotlb);
> > +    }
> > +
> > +    if (ret != 0) {
> > +        return -errno;
> > +    }
> 
> the ret value should already be an errno. Shouldn't it ?

Yes, this can just be "return ret", thanks.
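Returning `ret` directly is the safer choice because the helper has already converted the failure into a negative errno. Reading `errno` again in the caller only works if nothing has modified it in between. The block below is a minimal sketch contrasting the two forms. `unmap_one()` is a hypothetical helper, assumed to follow the same 0-or-negative-errno convention as `vfio_legacy_dma_unmap_one()`.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: returns 0 or a negative errno; never sets errno. */
static int unmap_one(unsigned long long iova, unsigned long long size)
{
    (void)iova;
    if (size == 0) {
        return -EINVAL;
    }
    return 0;
}

/* Preferred: propagate the helper's value unchanged. */
static int unmap(unsigned long long iova, unsigned long long size)
{
    int ret = unmap_one(iova, size);

    assert(ret <= 0);  /* convention: 0 or a negative errno */
    return ret;
}

/* The pattern being removed: reads errno again after the helper returned. */
static int unmap_rereads_errno(unsigned long long iova,
                               unsigned long long size)
{
    int ret = unmap_one(iova, size);

    if (ret != 0) {
        return -errno;  /* wrong: unmap_one() never set errno */
    }
    return 0;
}
```

In this sketch the helper never sets `errno`, so with `errno` at 0 the re-reading variant returns 0 and the error is lost. In the real code the failed ioctl() would usually have set `errno`, but depending on that is fragile.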

regards
john



* Re: [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations
  2025-05-05 12:46   ` Cédric Le Goater
@ 2025-05-07 13:58     ` John Levon
  2025-05-09 10:28       ` Cédric Le Goater
  0 siblings, 1 reply; 41+ messages in thread
From: John Levon @ 2025-05-07 13:58 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic, John Johnson, Jagannathan Raman, Elena Ufimtseva

On Mon, May 05, 2025 at 02:46:12PM +0200, Cédric Le Goater wrote:

> On 4/30/25 21:40, John Levon wrote:
> > Pass through the MemoryRegion to DMA operation handlers of vfio
> > containers. The vfio-user container will need this later.
> > 
> > Originally-by: John Johnson <john.g.johnson@oracle.com>
> > Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> > Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> > Signed-off-by: John Levon <john.levon@nutanix.com>
> 
> You should add the system/memory maintainers as Cc: entries in this
> patch.

Double-checked, and they are already:

$ ./scripts/get_maintainer.pl -f system/memory.c
Paolo Bonzini <pbonzini@redhat.com> (supporter:Memory API)
Peter Xu <peterx@redhat.com> (supporter:Memory API)
David Hildenbrand <david@redhat.com> (supporter:Memory API)
"Philippe Mathieu-Daudé" <philmd@linaro.org> (reviewer:Memory API)
qemu-devel@nongnu.org (open list:All patches CC here)

regards
john



* Re: [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations
  2025-05-07 13:58     ` John Levon
@ 2025-05-09 10:28       ` Cédric Le Goater
  0 siblings, 0 replies; 41+ messages in thread
From: Cédric Le Goater @ 2025-05-09 10:28 UTC (permalink / raw)
  To: John Levon
  Cc: qemu-devel, Peter Xu, qemu-s390x, Jason Herne, Tomita Moeko,
	Markus Armbruster, Matthew Rosato, Eric Farman, David Hildenbrand,
	Philippe Mathieu-Daudé, Michael S. Tsirkin, Tony Krowiak,
	Alex Williamson, Stefano Garzarella, Thomas Huth, Paolo Bonzini,
	Halil Pasic, John Johnson, Jagannathan Raman, Elena Ufimtseva

On 5/7/25 15:58, John Levon wrote:
> On Mon, May 05, 2025 at 02:46:12PM +0200, Cédric Le Goater wrote:
> 
>> On 4/30/25 21:40, John Levon wrote:
>>> Pass through the MemoryRegion to DMA operation handlers of vfio
>>> containers. The vfio-user container will need this later.
>>>
>>> Originally-by: John Johnson <john.g.johnson@oracle.com>
>>> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
>>> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
>>> Signed-off-by: John Levon <john.levon@nutanix.com>
>>
>> You should add the system/memory maintainers as Cc: entries in this
>> patch.
> 
> Double-checked, and they are already:
> 
> $ ./scripts/get_maintainer.pl -f system/memory.c
> Paolo Bonzini <pbonzini@redhat.com> (supporter:Memory API)
> Peter Xu <peterx@redhat.com> (supporter:Memory API)
> David Hildenbrand <david@redhat.com> (supporter:Memory API)
> "Philippe Mathieu-Daudé" <philmd@linaro.org> (reviewer:Memory API)
> qemu-devel@nongnu.org (open list:All patches CC here)

Adding them as Cc: entries on this patch itself is more likely to get
their attention. The subject should be changed too, because this is not
a VFIO change but a system/memory one.


Thanks,

C.


  
> 
> regards
> john
> 




end of thread

Thread overview: 41+ messages
2025-04-30 19:39 [PATCH v2 00/15] vfio: preparation for vfio-user John Levon
2025-04-30 19:39 ` [PATCH v2 01/15] vfio: add vfio_prepare_device() John Levon
2025-05-05  8:35   ` Cédric Le Goater
2025-04-30 19:39 ` [PATCH v2 02/15] vfio: add vfio_unprepare_device() John Levon
2025-05-05  9:18   ` Cédric Le Goater
2025-04-30 19:39 ` [PATCH v2 03/15] vfio: add vfio_attach_device_by_iommu_type() John Levon
2025-04-30 19:39 ` [PATCH v2 04/15] vfio: add vfio_device_get_irq_info() helper John Levon
2025-05-01 11:53   ` Anthony Krowiak
2025-05-05  9:19   ` Cédric Le Goater
2025-05-06 11:38     ` John Levon
2025-05-06 12:27       ` Cédric Le Goater
2025-05-06 12:35         ` John Levon
2025-04-30 19:39 ` [PATCH v2 05/15] vfio: consistently handle return value for helpers John Levon
2025-05-05  9:32   ` Cédric Le Goater
2025-04-30 19:39 ` [PATCH v2 06/15] include/qemu: add strread/writeerror() John Levon
2025-05-05  9:37   ` Cédric Le Goater
2025-04-30 19:39 ` [PATCH v2 07/15] vfio: add vfio_pci_config_space_read/write() John Levon
2025-05-05  9:45   ` Cédric Le Goater
2025-04-30 19:39 ` [PATCH v2 08/15] vfio: add unmap_all flag to DMA unmap callback John Levon
2025-05-05 12:06   ` Cédric Le Goater
2025-05-05 13:26     ` John Levon
2025-05-05 21:05       ` Cédric Le Goater
2025-04-30 19:39 ` [PATCH v2 09/15] vfio: implement unmap all for DMA unmap callbacks John Levon
2025-05-05 11:28   ` Cédric Le Goater
2025-05-07 11:47     ` John Levon
2025-04-30 19:39 ` [PATCH v2 10/15] vfio: add device IO ops vector John Levon
2025-05-05 12:21   ` Cédric Le Goater
2025-05-06 10:01   ` Cédric Le Goater
2025-04-30 19:39 ` [PATCH v2 11/15] vfio: add region info cache John Levon
2025-05-05 12:26   ` Cédric Le Goater
2025-04-30 19:40 ` [PATCH v2 12/15] vfio: add read/write to device IO ops vector John Levon
2025-05-05 12:39   ` Cédric Le Goater
2025-04-30 19:40 ` [PATCH v2 13/15] vfio: add vfio-pci-base class John Levon
2025-05-05 12:42   ` Cédric Le Goater
2025-04-30 19:40 ` [PATCH v2 14/15] vfio/container: pass listener_begin/commit callbacks John Levon
2025-05-05 12:43   ` Cédric Le Goater
2025-04-30 19:40 ` [PATCH v2 15/15] vfio/container: pass MemoryRegion to DMA operations John Levon
2025-05-05 12:46   ` Cédric Le Goater
2025-05-07 13:58     ` John Levon
2025-05-09 10:28       ` Cédric Le Goater
2025-05-05 12:51 ` [PATCH v2 00/15] vfio: preparation for vfio-user Cédric Le Goater
