qemu-devel.nongnu.org archive mirror
From: "Cédric Le Goater" <clg@redhat.com>
To: John Levon <john.levon@nutanix.com>, qemu-devel@nongnu.org
Cc: "Jason Herne" <jjherne@linux.ibm.com>,
	"Thanos Makatos" <thanos.makatos@nutanix.com>,
	"Halil Pasic" <pasic@linux.ibm.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>,
	"Eric Farman" <farman@linux.ibm.com>,
	"Tony Krowiak" <akrowiak@linux.ibm.com>,
	"Thomas Huth" <thuth@redhat.com>,
	qemu-s390x@nongnu.org, "Matthew Rosato" <mjrosato@linux.ibm.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>,
	"Stefano Garzarella" <sgarzare@redhat.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"David Hildenbrand" <david@redhat.com>,
	"Peter Xu" <peterx@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	"Jagannathan Raman" <jag.raman@oracle.com>,
	"John Johnson" <john.g.johnson@oracle.com>,
	"Elena Ufimtseva" <elena.ufimtseva@oracle.com>
Subject: Re: [PATCH v8 10/28] vfio: add device IO ops vector
Date: Fri, 4 Apr 2025 16:36:47 +0200
Message-ID: <18613967-0bd5-4a9d-b2d5-37cd179ea0e6@redhat.com>
In-Reply-To: <20250219144858.266455-11-john.levon@nutanix.com>

On 2/19/25 15:48, John Levon wrote:
> From: Jagannathan Raman <jag.raman@oracle.com>
> 
> For vfio-user, device operations such as IRQ handling and region
> read/writes are implemented in userspace over the control socket, not
> ioctl() or read()/write() to the vfio kernel driver; add an ops vector
> to generalize this, and implement vfio_dev_io_ioctl for interacting
> with the kernel vfio driver.
> 
> The ops consistently use the "-errno" return style, as the vfio-user
> implementations get their errors from response messages not from the
> kernel; adjust the callers to handle this as necessary.

Please adjust the callers before introducing the new ops.
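To illustrate the "-errno" convention the commit message describes, here is a minimal, self-contained sketch. `Dev`, `DevIO`, `fake_ioctl()` and the two backends are hypothetical reduced stand-ins, not the real VFIODevice/VFIODeviceIO. The ioctl-style backend converts the -1/errno result into -errno. The message-style backend, like vfio-user, takes its status from a reply rather than from errno. Because of this, callers look only at the return value and never at errno.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical, reduced stand-ins for VFIODevice and VFIODeviceIO. */
typedef struct Dev Dev;

typedef struct DevIO {
    /* Returns 0 on success or a negative errno value on failure. */
    int (*set_irqs)(Dev *dev, unsigned index);
} DevIO;

struct Dev {
    const DevIO *io;
    int fail_with;   /* simulated failure (positive errno), 0 for success */
};

/* Stand-in for ioctl(): returns -1 and sets errno on failure. */
static int fake_ioctl(int fail_with)
{
    if (fail_with) {
        errno = fail_with;
        return -1;
    }
    return 0;
}

/* Kernel-style backend: converts the -1/errno result to -errno. */
static int ioctl_set_irqs(Dev *dev, unsigned index)
{
    int ret;

    (void)index;
    ret = fake_ioctl(dev->fail_with);
    return ret < 0 ? -errno : ret;
}

/* Socket-style backend: the status comes from a reply message, not errno. */
static int msg_set_irqs(Dev *dev, unsigned index)
{
    (void)index;
    return dev->fail_with ? -dev->fail_with : 0;
}

static const DevIO ioctl_io = { .set_irqs = ioctl_set_irqs };
static const DevIO msg_io = { .set_irqs = msg_set_irqs };

/* Caller: only inspects the returned value, never errno. */
static const char *set_irqs_status(Dev *dev, unsigned index)
{
    int ret = dev->io->set_irqs(dev, index);

    return ret < 0 ? strerror(-ret) : "ok";
}
```

Both backends report the same failure as -EBUSY, so a caller written against the ops vector handles both without any backend-specific errno handling.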
  
> Originally-by: John Johnson <john.g.johnson@oracle.com>
> Signed-off-by: Elena Ufimtseva <elena.ufimtseva@oracle.com>
> Signed-off-by: Jagannathan Raman <jag.raman@oracle.com>
> Signed-off-by: John Levon <john.levon@nutanix.com>
> ---
>   hw/vfio/ap.c                  |   2 +-
>   hw/vfio/ccw.c                 |   2 +-
>   hw/vfio/common.c              |  13 +--
>   hw/vfio/helpers.c             | 110 ++++++++++++++++++++++---
>   hw/vfio/pci.c                 | 147 ++++++++++++++++++++++------------
>   hw/vfio/platform.c            |   2 +-
>   include/hw/vfio/vfio-common.h |  27 ++++++-
>   7 files changed, 227 insertions(+), 76 deletions(-)
> 
> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
> index 30b08ad375..1adce1ab40 100644
> --- a/hw/vfio/ap.c
> +++ b/hw/vfio/ap.c
> @@ -228,7 +228,7 @@ static void vfio_ap_instance_init(Object *obj)
>        * handle ram_block_discard_disable().
>        */
>       vfio_device_init(vbasedev, VFIO_DEVICE_TYPE_AP, &vfio_ap_ops,
> -                     DEVICE(vapdev), true);
> +                     &vfio_dev_io_ioctl, DEVICE(vapdev), true);

Hmm, most of these parameters should really be VFIODeviceClass attributes,
but that class doesn't exist. I don't see a nicer alternative, so let's keep it.

>   
>       /* AP device is mdev type device */
>       vbasedev->mdev = true;
> diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> index 22378d50bc..8c16648819 100644
> --- a/hw/vfio/ccw.c
> +++ b/hw/vfio/ccw.c
> @@ -682,7 +682,7 @@ static void vfio_ccw_instance_init(Object *obj)
>        * ram_block_discard_disable().
>        */
>       vfio_device_init(vbasedev, VFIO_DEVICE_TYPE_CCW, &vfio_ccw_ops,
> -                     DEVICE(vcdev), true);
> +                     &vfio_dev_io_ioctl, DEVICE(vcdev), true);
>   }
>   
>   #ifdef CONFIG_IOMMUFD
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 1866b3d3c5..cc0c0f7fc7 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -971,7 +971,7 @@ static void vfio_devices_dma_logging_stop(VFIOContainerBase *bcontainer)
>               continue;
>           }
>   
> -        if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> +        if (vbasedev->io->device_feature(vbasedev, feature)) {
>               warn_report("%s: Failed to stop DMA logging, err %d (%s)",
>                           vbasedev->name, -errno, strerror(errno));
>           }
> @@ -1074,10 +1074,9 @@ static bool vfio_devices_dma_logging_start(VFIOContainerBase *bcontainer,
>               continue;
>           }
>   
> -        ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +        ret = vbasedev->io->device_feature(vbasedev, feature);
>           if (ret) {
> -            ret = -errno;
> -            error_setg_errno(errp, errno, "%s: Failed to start DMA logging",
> +            error_setg_errno(errp, -ret, "%s: Failed to start DMA logging",
>                                vbasedev->name);
>               goto out;
>           }
> @@ -1145,6 +1144,7 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
>       struct vfio_device_feature *feature = (struct vfio_device_feature *)buf;
>       struct vfio_device_feature_dma_logging_report *report =
>           (struct vfio_device_feature_dma_logging_report *)feature->data;
> +    int ret;
>   
>       report->iova = iova;
>       report->length = size;
> @@ -1155,8 +1155,9 @@ static int vfio_device_dma_logging_report(VFIODevice *vbasedev, hwaddr iova,
>       feature->flags = VFIO_DEVICE_FEATURE_GET |
>                        VFIO_DEVICE_FEATURE_DMA_LOGGING_REPORT;
>   
> -    if (ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature)) {
> -        return -errno;
> +    ret = vbasedev->io->device_feature(vbasedev, feature);
> +    if (ret) {
> +        return ret;
>       }
>   
>       return 0;
> diff --git a/hw/vfio/helpers.c b/hw/vfio/helpers.c
> index 94bbc5747c..bef1540295 100644
> --- a/hw/vfio/helpers.c
> +++ b/hw/vfio/helpers.c
> @@ -44,7 +44,7 @@ void vfio_disable_irqindex(VFIODevice *vbasedev, int index)
>           .count = 0,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io->set_irqs(vbasedev, &irq_set);
>   }
>   
>   void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
> @@ -57,7 +57,7 @@ void vfio_unmask_single_irqindex(VFIODevice *vbasedev, int index)
>           .count = 1,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io->set_irqs(vbasedev, &irq_set);
>   }
>   
>   void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
> @@ -70,7 +70,7 @@ void vfio_mask_single_irqindex(VFIODevice *vbasedev, int index)
>           .count = 1,
>       };
>   
> -    ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, &irq_set);
> +    vbasedev->io->set_irqs(vbasedev, &irq_set);
>   }
>   
>   static inline const char *action_to_str(int action)
> @@ -117,6 +117,7 @@ bool vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
>       int argsz;
>       const char *name;
>       int32_t *pfd;
> +    int ret;
>   
>       argsz = sizeof(*irq_set) + sizeof(*pfd);
>   
> @@ -129,7 +130,9 @@ bool vfio_set_irq_signaling(VFIODevice *vbasedev, int index, int subindex,
>       pfd = (int32_t *)&irq_set->data;
>       *pfd = fd;
>   
> -    if (!ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irq_set)) {
> +    ret = vbasedev->io->set_irqs(vbasedev, irq_set);
> +
> +    if (!ret) {
>           return true;
>       }
>   
> @@ -161,6 +164,7 @@ void vfio_region_write(void *opaque, hwaddr addr,
>           uint32_t dword;
>           uint64_t qword;
>       } buf;
> +    int ret;
>   
>       switch (size) {
>       case 1:
> @@ -180,11 +184,12 @@ void vfio_region_write(void *opaque, hwaddr addr,
>           break;
>       }
>   
> -    if (pwrite(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
> +    ret = vbasedev->io->region_write(vbasedev, region->nr, addr, size, &buf);
> +    if (ret != size) {
>           error_report("%s(%s:region%d+0x%"HWADDR_PRIx", 0x%"PRIx64
> -                     ",%d) failed: %m",
> +                     ",%d) failed: %s",
>                        __func__, vbasedev->name, region->nr,
> -                     addr, data, size);
> +                     addr, data, size, ret < 0 ? strerror(-ret) : "short write");
>       }
>   
>       trace_vfio_region_write(vbasedev->name, region->nr, addr, data, size);
> @@ -212,11 +217,13 @@ uint64_t vfio_region_read(void *opaque,
>           uint64_t qword;
>       } buf;
>       uint64_t data = 0;
> +    int ret;
>   
> -    if (pread(vbasedev->fd, &buf, size, region->fd_offset + addr) != size) {
> -        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %m",
> +    ret = vbasedev->io->region_read(vbasedev, region->nr, addr, size, &buf);
> +    if (ret != size) {
> +        error_report("%s(%s:region%d+0x%"HWADDR_PRIx", %d) failed: %s",
>                        __func__, vbasedev->name, region->nr,
> -                     addr, size);
> +                     addr, size, ret < 0 ? strerror(-ret) : "short read");
>           return (uint64_t)-1;
>       }
>       switch (size) {
> @@ -561,6 +568,7 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
>                            struct vfio_region_info **info)
>   {
>       size_t argsz = sizeof(struct vfio_region_info);
> +    int ret;
>   
>       /* create region cache */
>       if (vbasedev->regions == NULL) {
> @@ -579,10 +587,11 @@ int vfio_get_region_info(VFIODevice *vbasedev, int index,
>   retry:
>       (*info)->argsz = argsz;
>   
> -    if (ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, *info)) {
> +    ret = vbasedev->io->get_region_info(vbasedev, *info);
> +    if (ret != 0) {
>           g_free(*info);
>           *info = NULL;
> -        return -errno;
> +        return ret;
>       }
>   
>       if ((*info)->argsz > argsz) {
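The `retry:` logic in vfio_get_region_info() works as follows. It starts with the minimal argsz. If the callee reports a larger argsz, the caller grows the buffer and asks again. With the new op already returning -errno, the error path can simply propagate `ret`. Below is a hedged, self-contained sketch of that pattern. `struct info`, `mock_get_info()` and `get_info()` are hypothetical. The mock roughly mirrors how the kernel reports a larger argsz when trailing capability data does not fit. Plain realloc() stands in for g_realloc().

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reduced vfio_region_info: header plus trailing data whose
 * total size the callee reports back through argsz. */
struct info {
    uint32_t argsz;
    uint32_t cap_len;   /* bytes of trailing data actually filled in */
};

/* Mock callee: needs 64 trailing bytes; if argsz is too small it only
 * reports the required size and fills in nothing beyond the header. */
static int mock_get_info(struct info *info)
{
    const uint32_t need = sizeof(*info) + 64;

    if (info->argsz < sizeof(*info)) {
        return -EINVAL;
    }
    info->cap_len = info->argsz < need ? 0 : 64;
    info->argsz = need;
    return 0;
}

/* Caller: start minimal, grow and retry when argsz comes back larger.
 * Returns 0 or -errno; on success the caller frees *out. */
static int get_info(struct info **out, int *calls)
{
    size_t argsz = sizeof(struct info);
    struct info *info = calloc(1, argsz);
    int ret;

    if (!info) {
        return -ENOMEM;
    }
retry:
    info->argsz = (uint32_t)argsz;
    (*calls)++;
    ret = mock_get_info(info);
    if (ret != 0) {
        free(info);
        *out = NULL;
        return ret;          /* already -errno, no errno lookup needed */
    }
    if (info->argsz > argsz) {
        struct info *bigger;

        argsz = info->argsz;
        bigger = realloc(info, argsz);
        if (!bigger) {
            free(info);
            *out = NULL;
            return -ENOMEM;
        }
        info = bigger;
        goto retry;
    }
    *out = info;
    return 0;
}
```

With this mock, get_info() makes two calls: the first learns the required size, and the second fills in the trailing data.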
> @@ -689,11 +698,12 @@ void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp)
>   }
>   
>   void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
> -                      DeviceState *dev, bool ram_discard)
> +                      VFIODeviceIO *io, DeviceState *dev, bool ram_discard)
>   {
>       vbasedev->type = type;
>       vbasedev->ops = ops;
>       vbasedev->dev = dev;
> +    vbasedev->io = io;
>       vbasedev->fd = -1;
>   
>       vbasedev->ram_block_discard_allowed = ram_discard;
> @@ -749,3 +759,77 @@ VFIODevice *vfio_get_vfio_device(Object *obj)
>           return NULL;
>       }
>   }
> +
> +/*
> + * Traditional ioctl() based io
> + */
> +
> +static int vfio_io_device_feature(VFIODevice *vbasedev,
> +                                  struct vfio_device_feature *feature)

Use a 'vfio_device' prefix? e.g. vfio_device_io_device_feature. Minor, since
it's local to the file.

> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_FEATURE, feature);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_io_get_region_info(VFIODevice *vbasedev,
> +                                   struct vfio_region_info *info)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_REGION_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_io_get_irq_info(VFIODevice *vbasedev,
> +                                struct vfio_irq_info *info)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_GET_IRQ_INFO, info);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_io_set_irqs(VFIODevice *vbasedev, struct vfio_irq_set *irqs)
> +{
> +    int ret;
> +
> +    ret = ioctl(vbasedev->fd, VFIO_DEVICE_SET_IRQS, irqs);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_io_region_read(VFIODevice *vbasedev, uint8_t index, off_t off,
> +                               uint32_t size, void *data)
> +{
> +    struct vfio_region_info *info = vbasedev->regions[index];
> +    int ret;
> +
> +    ret = pread(vbasedev->fd, data, size, info->offset + off);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +static int vfio_io_region_write(VFIODevice *vbasedev, uint8_t index, off_t off,
> +                                uint32_t size, void *data)
> +{
> +    struct vfio_region_info *info = vbasedev->regions[index];
> +    int ret;
> +
> +    ret = pwrite(vbasedev->fd, data, size, info->offset + off);
> +
> +    return ret < 0 ? -errno : ret;
> +}
> +
> +VFIODeviceIO vfio_dev_io_ioctl = {

vfio_device_io_ops_ioctl

> +    .device_feature = vfio_io_device_feature,
> +    .get_region_info = vfio_io_get_region_info,
> +    .get_irq_info = vfio_io_get_irq_info,
> +    .set_irqs = vfio_io_set_irqs,
> +    .region_read = vfio_io_region_read,
> +    .region_write = vfio_io_region_write,
> +};
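The ioctl backend's region_read() converts a region-relative offset into an absolute file offset (`info->offset + off`) and returns -errno on failure. The sketch below shows that behaviour against an ordinary temporary file instead of a VFIO device fd. `struct region` and `region_read()` are hypothetical reduced names. pread(), tmpfile() and fileno() are standard POSIX/C calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reduced region descriptor: only the file offset is kept. */
struct region {
    off_t offset;   /* where the region starts within the device fd */
};

/*
 * Same shape as vfio_io_region_read() above: a region-relative offset is
 * turned into an absolute file offset, and failures become -errno.
 */
static int region_read(int fd, const struct region *r, off_t off,
                       uint32_t size, void *data)
{
    ssize_t ret = pread(fd, data, size, r->offset + off);

    return ret < 0 ? -errno : (int)ret;
}
```

For example, with a file containing "HEADERpayload" and a region starting at offset 6, reading 7 bytes at region offset 0 returns 7 and yields "payload". Reading from an invalid fd returns -EBADF rather than -1.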
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index 812743e9dd..a9cc9366fb 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -45,6 +45,14 @@
>   #include "migration/qemu-file.h"
>   #include "system/iommufd.h"
>   
> +/* convenience macros for PCI config space */
> +#define VDEV_CONFIG_READ(vbasedev, off, size, data) \
> +    ((vbasedev)->io->region_read((vbasedev), VFIO_PCI_CONFIG_REGION_INDEX, \
> +                                 (off), (size), (data)))
> +#define VDEV_CONFIG_WRITE(vbasedev, off, size, data) \
> +    ((vbasedev)->io->region_write((vbasedev), VFIO_PCI_CONFIG_REGION_INDEX, \
> +                                  (off), (size), (data)))
> +

Please introduce these helpers in a separate patch.
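The VDEV_CONFIG_READ() helper routes a config-space access through whatever io backend the device was initialised with. Here is a minimal sketch of that dispatch with an in-memory config space. `Dev`, `DevIO`, `mem_region_read()` and `CONFIG_REGION_INDEX` are hypothetical, reduced names. The index value is arbitrary and only stands in for VFIO_PCI_CONFIG_REGION_INDEX.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical index standing in for VFIO_PCI_CONFIG_REGION_INDEX. */
#define CONFIG_REGION_INDEX 7

typedef struct Dev Dev;

typedef struct DevIO {
    int (*region_read)(Dev *dev, uint8_t nr, off_t off, uint32_t size,
                       void *data);
} DevIO;

struct Dev {
    const DevIO *io;
    uint8_t config[256];    /* simulated PCI config space */
};

/* Same shape as the VDEV_CONFIG_READ() macro in the patch. */
#define VDEV_CONFIG_READ(vbasedev, off, size, data) \
    ((vbasedev)->io->region_read((vbasedev), CONFIG_REGION_INDEX, \
                                 (off), (size), (data)))

/* In-memory backend: bounds-checked copy, -errno on error. */
static int mem_region_read(Dev *dev, uint8_t nr, off_t off, uint32_t size,
                           void *data)
{
    if (nr != CONFIG_REGION_INDEX || off < 0 ||
        (size_t)off + size > sizeof(dev->config)) {
        return -EINVAL;
    }
    memcpy(data, dev->config + off, size);
    return (int)size;
}

static const DevIO mem_io = { .region_read = mem_region_read };
```

A caller written as `VDEV_CONFIG_READ(&d, 0, 2, id)` does not know or care whether the bytes come from pread() on a kernel fd or from a vfio-user reply.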

>   #define TYPE_VFIO_PCI_NOHOTPLUG "vfio-pci-nohotplug"
>   
>   /* Protected by BQL */
> @@ -379,6 +387,7 @@ static void vfio_msi_interrupt(void *opaque)
>   static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
>   {
>       g_autofree struct vfio_irq_set *irq_set = NULL;
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       int ret = 0, argsz;
>       int32_t *fd;
>   
> @@ -394,7 +403,7 @@ static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)
>       fd = (int32_t *)&irq_set->data;
>       *fd = -1;
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = vbasedev->io->set_irqs(vbasedev, irq_set);
>   
>       return ret;
>   }
> @@ -453,7 +462,7 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
>           fds[i] = fd;
>       }
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> +    ret = vdev->vbasedev.io->set_irqs(&vdev->vbasedev, irq_set);
>   
>       g_free(irq_set);
>   
> @@ -763,7 +772,8 @@ retry:
>       ret = vfio_enable_vectors(vdev, false);
>       if (ret) {
>           if (ret < 0) {
> -            error_report("vfio: Error: Failed to setup MSI fds: %m");
> +            error_report("vfio: Error: Failed to setup MSI fds: %s",
> +                         strerror(-ret));
>           } else {
>               error_report("vfio: Error: Failed to enable %d "
>                            "MSI vectors, retry with %d", vdev->nr_vectors, ret);
> @@ -879,14 +889,17 @@ static void vfio_update_msi(VFIOPCIDevice *vdev)
>   
>   static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>   {
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       struct vfio_region_info *reg_info = NULL;
>       uint64_t size;
>       off_t off = 0;
>       ssize_t bytes;
> +    int ret;
>   
> -    if (vfio_get_region_info(&vdev->vbasedev,
> -                             VFIO_PCI_ROM_REGION_INDEX, &reg_info)) {
> -        error_report("vfio: Error getting ROM info: %m");
> +    ret = vfio_get_region_info(vbasedev, VFIO_PCI_ROM_REGION_INDEX, &reg_info);
> +
> +    if (ret != 0) {
> +        error_report("vfio: Error getting ROM info: %s", strerror(-ret));
>           return;
>       }
>   
> @@ -911,18 +924,19 @@ static void vfio_pci_load_rom(VFIOPCIDevice *vdev)
>       memset(vdev->rom, 0xff, size);
>   
>       while (size) {
> -        bytes = pread(vdev->vbasedev.fd, vdev->rom + off,
> -                      size, vdev->rom_offset + off);
> +        bytes = vbasedev->io->region_read(vbasedev, VFIO_PCI_ROM_REGION_INDEX,
> +                                          off, size, vdev->rom + off);
>           if (bytes == 0) {
>               break;
>           } else if (bytes > 0) {
>               off += bytes;
>               size -= bytes;
>           } else {
> -            if (errno == EINTR || errno == EAGAIN) {
> +            if (bytes == -EINTR || bytes == -EAGAIN) {
>                   continue;
>               }
> -            error_report("vfio: Error reading device ROM: %m");
> +            error_report("vfio: Error reading device ROM: %s",
> +                         strerror(-bytes));
>               break;
>           }
>       }
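The converted ROM loop accepts short reads and retries on -EINTR/-EAGAIN, now testing the returned value instead of errno. A hedged, self-contained sketch of the same loop follows, driven by a mock backend that returns at most 3 bytes per call and fails once with -EINTR. `read_fn`, `read_all()`, `struct mock` and `mock_read()` are hypothetical. Unlike the loop above, which just breaks on a hard error, this sketch returns the error to its caller.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical read callback using the same -errno convention as
 * region_read: bytes read, 0 at end of data, or a negative errno. */
typedef int (*read_fn)(void *opaque, off_t off, uint32_t size, void *data);

/* Read up to size bytes into buf, accepting short reads and retrying on
 * -EINTR/-EAGAIN. Returns the byte count, or a negative errno. */
static long read_all(read_fn fn, void *opaque, uint8_t *buf, uint32_t size)
{
    off_t off = 0;

    while (size) {
        int bytes = fn(opaque, off, size, buf + off);

        if (bytes == 0) {
            break;                  /* end of data: leave the rest untouched */
        } else if (bytes > 0) {
            off += bytes;
            size -= bytes;
        } else if (bytes != -EINTR && bytes != -EAGAIN) {
            return bytes;           /* hard error, already -errno */
        }
    }
    return (long)off;
}

/* Mock backend: at most 3 bytes per call, and one -EINTR up front. */
struct mock {
    const uint8_t *src;
    uint32_t len;
    int eintr_once;
};

static int mock_read(void *opaque, off_t off, uint32_t size, void *data)
{
    struct mock *m = opaque;
    uint32_t n = size < 3 ? size : 3;

    if (m->eintr_once) {
        m->eintr_once = 0;
        return -EINTR;
    }
    if (off >= (off_t)m->len) {
        return 0;
    }
    if ((uint32_t)off + n > m->len) {
        n = m->len - (uint32_t)off;
    }
    memcpy(data, m->src + off, n);
    return (int)n;
}
```

Reading a 7-byte source into an 8-byte buffer pre-filled with 0xff returns 7 and leaves the last byte at 0xff. This matches how the ROM buffer is pre-filled with memset(0xff).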
> @@ -1010,10 +1024,9 @@ static const MemoryRegionOps vfio_rom_ops = {
>   
>   static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>   {
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t orig, size = cpu_to_le32((uint32_t)PCI_ROM_ADDRESS_MASK);
> -    off_t offset = vdev->config_offset + PCI_ROM_ADDRESS;
>       char *name;
> -    int fd = vdev->vbasedev.fd;
>   
>       if (vdev->pdev.romfile || !vdev->pdev.rom_bar) {
>           /* Since pci handles romfile, just print a message and return */
> @@ -1030,11 +1043,12 @@ static void vfio_pci_size_rom(VFIOPCIDevice *vdev)
>        * Use the same size ROM BAR as the physical device.  The contents
>        * will get filled in later when the guest tries to read it.
>        */
> -    if (pread(fd, &orig, 4, offset) != 4 ||
> -        pwrite(fd, &size, 4, offset) != 4 ||
> -        pread(fd, &size, 4, offset) != 4 ||
> -        pwrite(fd, &orig, 4, offset) != 4) {
> -        error_report("%s(%s) failed: %m", __func__, vdev->vbasedev.name);
> +    if (VDEV_CONFIG_READ(vbasedev, PCI_ROM_ADDRESS, 4, &orig) != 4 ||
> +        VDEV_CONFIG_WRITE(vbasedev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
> +        VDEV_CONFIG_READ(vbasedev, PCI_ROM_ADDRESS, 4, &size) != 4 ||
> +        VDEV_CONFIG_WRITE(vbasedev, PCI_ROM_ADDRESS, 4, &orig) != 4) {
> +
> +        error_report("%s(%s) ROM access failed", __func__, vbasedev->name);
>           return;
>       }
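The four config accesses above implement standard PCI BAR sizing: save the ROM BAR, write the address mask, read back which address bits stick, then restore the original value. The read-back value determines the ROM size. Below is a small sketch of that arithmetic. `rom_bar_size()` is a hypothetical name, and `ROM_ADDRESS_MASK` reproduces the value of PCI_ROM_ADDRESS_MASK (~0x7ff). The input is assumed to be already converted to CPU endianness.

```c
#include <stdint.h>

#define ROM_ADDRESS_MASK 0xfffff800u  /* same value as PCI_ROM_ADDRESS_MASK */

/* Given the ROM BAR value read back after writing the address mask,
 * return the ROM size in bytes; 0 means no ROM BAR is implemented
 * (~0 + 1 wraps to 0 in 32-bit unsigned arithmetic). */
static uint32_t rom_bar_size(uint32_t readback)
{
    return ~(readback & ROM_ADDRESS_MASK) + 1;
}
```

For example, a read-back of 0xfffe0000 means a 128 KiB ROM (0x20000 bytes). The low enable bit (bit 0) is masked off, so 0xffff0001 still gives 0x10000.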
>   
> @@ -1214,6 +1228,7 @@ static void vfio_sub_page_bar_update_mapping(PCIDevice *pdev, int bar)
>   uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>   {
>       VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t emu_bits = 0, emu_val = 0, phys_val = 0, val;
>   
>       memcpy(&emu_bits, vdev->emulated_config_bits + addr, len);
> @@ -1226,12 +1241,13 @@ uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len)
>       if (~emu_bits & (0xffffffffU >> (32 - len * 8))) {
>           ssize_t ret;
>   
> -        ret = pread(vdev->vbasedev.fd, &phys_val, len,
> -                    vdev->config_offset + addr);
> +        ret = VDEV_CONFIG_READ(vbasedev, addr, len, &phys_val);
>           if (ret != len) {
> -            error_report("%s(%s, 0x%x, 0x%x) failed: %m",
> -                         __func__, vdev->vbasedev.name, addr, len);
> -            return -errno;
> +            const char *err = ret < 0 ? strerror(-ret) : "short read";
> +
> +            error_report("%s(%s, 0x%x, 0x%x) failed: %s",
> +                         __func__, vbasedev->name, addr, len, err);
> +            return -1;
>           }
>           phys_val = le32_to_cpu(phys_val);
>       }
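The `ret < 0 ? strerror(-ret) : "short read"` pattern is repeated in many callers in this patch. A possible follow-up would factor it into one helper. The sketch below is purely illustrative: `io_err_str()` is a hypothetical name and is not part of the patch.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical helper: describe the result of a region access that was
 * expected to transfer `expected` bytes; ret follows the -errno style. */
static const char *io_err_str(int ret, int expected, int is_write)
{
    if (ret < 0) {
        return strerror(-ret);
    }
    if (ret != expected) {
        return is_write ? "short write" : "short read";
    }
    return "success";
}
```

A caller would then write `error_report("... failed: %s", io_err_str(ret, len, 0));` instead of open-coding the conditional.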
> @@ -1247,15 +1263,19 @@ void vfio_pci_write_config(PCIDevice *pdev,
>                              uint32_t addr, uint32_t val, int len)
>   {
>       VFIOPCIDevice *vdev = VFIO_PCI_BASE(pdev);
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint32_t val_le = cpu_to_le32(val);
> +    int ret;
>   
>       trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
>   
>       /* Write everything to VFIO, let it filter out what we can't write */
> -    if (pwrite(vdev->vbasedev.fd, &val_le, len, vdev->config_offset + addr)
> -                != len) {
> -        error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %m",
> -                     __func__, vdev->vbasedev.name, addr, val, len);
> +    ret = VDEV_CONFIG_WRITE(vbasedev, addr, len, &val_le);
> +    if (ret != len) {
> +        const char *err = ret < 0 ? strerror(-ret) : "short write";
> +
> +        error_report("%s(%s, 0x%x, 0x%x, 0x%x) failed: %s",
> +                     __func__, vbasedev->name, addr, val, len, err);
>       }
>   
>       /* MSI/MSI-X Enabling/Disabling */
> @@ -1343,9 +1363,12 @@ static bool vfio_msi_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
>       int ret, entries;
>       Error *err = NULL;
>   
> -    if (pread(vdev->vbasedev.fd, &ctrl, sizeof(ctrl),
> -              vdev->config_offset + pos + PCI_CAP_FLAGS) != sizeof(ctrl)) {
> -        error_setg_errno(errp, errno, "failed reading MSI PCI_CAP_FLAGS");
> +    ret = VDEV_CONFIG_READ(&vdev->vbasedev, pos + PCI_CAP_FLAGS,
> +                           sizeof(ctrl), &ctrl);
> +    if (ret != sizeof(ctrl)) {
> +        const char *errmsg = ret < 0 ? strerror(-ret) : "short read";
> +
> +        error_setg(errp, "failed reading MSI PCI_CAP_FLAGS: %s", errmsg);
>           return false;
>       }
>       ctrl = le16_to_cpu(ctrl);
> @@ -1549,34 +1572,43 @@ static bool vfio_pci_relocate_msix(VFIOPCIDevice *vdev, Error **errp)
>    */
>   static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>   {
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       uint8_t pos;
>       uint16_t ctrl;
>       uint32_t table, pba;
> -    int ret, fd = vdev->vbasedev.fd;
>       struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
>                                         .index = VFIO_PCI_MSIX_IRQ_INDEX };
>       VFIOMSIXInfo *msix;
> +    int ret;
>   
>       pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX);
>       if (!pos) {
>           return true;
>       }
>   
> -    if (pread(fd, &ctrl, sizeof(ctrl),
> -              vdev->config_offset + pos + PCI_MSIX_FLAGS) != sizeof(ctrl)) {
> -        error_setg_errno(errp, errno, "failed to read PCI MSIX FLAGS");
> +    ret = VDEV_CONFIG_READ(vbasedev, pos + PCI_MSIX_FLAGS,
> +                           sizeof(ctrl), &ctrl);
> +    if (ret != sizeof(ctrl)) {
> +        const char *err = ret < 0 ? strerror(-ret) : "short read";
> +
> +        error_setg(errp, "failed to read PCI MSIX FLAGS: %s", err);
>           return false;
>       }
>   
> -    if (pread(fd, &table, sizeof(table),
> -              vdev->config_offset + pos + PCI_MSIX_TABLE) != sizeof(table)) {
> -        error_setg_errno(errp, errno, "failed to read PCI MSIX TABLE");
> +    ret = VDEV_CONFIG_READ(vbasedev, pos + PCI_MSIX_TABLE,
> +                           sizeof(table), &table);
> +    if (ret != sizeof(table)) {
> +        const char *err = ret < 0 ? strerror(-ret) : "short read";
> +
> +        error_setg(errp, "failed to read PCI MSIX TABLE: %s", err);
>           return false;
>       }
>   
> -    if (pread(fd, &pba, sizeof(pba),
> -              vdev->config_offset + pos + PCI_MSIX_PBA) != sizeof(pba)) {
> -        error_setg_errno(errp, errno, "failed to read PCI MSIX PBA");
> +    ret = VDEV_CONFIG_READ(vbasedev, pos + PCI_MSIX_PBA, sizeof(pba), &pba);
> +    if (ret != sizeof(pba)) {
> +        const char *err = ret < 0 ? strerror(-ret) : "short read";
> +
> +        error_setg(errp, "failed to read PCI MSIX PBA: %s", err);
>           return false;
>       }
>   
> @@ -1591,7 +1623,7 @@ static bool vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
>       msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
>       msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    ret = vdev->vbasedev.io->get_irq_info(&vdev->vbasedev, &irq_info);
>       if (ret < 0) {
>           error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
>           g_free(msix);
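After the three config reads, vfio_msix_early_setup() splits the MSI-X Table and PBA registers into a BAR indicator (low 3 bits) and an offset. It then derives the entry count from the control word's QSIZE field. Here is a sketch of that decoding. `struct msix_layout` and `decode_msix()` are hypothetical names, the two macros reproduce the values of PCI_MSIX_FLAGS_QSIZE and PCI_MSIX_FLAGS_BIRMASK, and the inputs are assumed to be already converted with le16_to_cpu()/le32_to_cpu().

```c
#include <stdint.h>

#define MSIX_FLAGS_QSIZE 0x07ffu   /* same value as PCI_MSIX_FLAGS_QSIZE */
#define MSIX_BIR_MASK    0x7u      /* same value as PCI_MSIX_FLAGS_BIRMASK */

/* Hypothetical decoded view of the MSI-X capability registers. */
struct msix_layout {
    uint8_t table_bar;
    uint8_t pba_bar;
    uint32_t table_offset;
    uint32_t pba_offset;
    uint16_t entries;
};

static struct msix_layout decode_msix(uint16_t ctrl, uint32_t table,
                                      uint32_t pba)
{
    struct msix_layout l;

    l.table_bar = table & MSIX_BIR_MASK;       /* BAR holding the table */
    l.table_offset = table & ~MSIX_BIR_MASK;   /* offset within that BAR */
    l.pba_bar = pba & MSIX_BIR_MASK;
    l.pba_offset = pba & ~MSIX_BIR_MASK;
    l.entries = (ctrl & MSIX_FLAGS_QSIZE) + 1; /* QSIZE is N - 1 */
    return l;
}
```

For example, ctrl = 0x001f, table = 0x00002003 and pba = 0x00003003 describe 32 vectors, with the table at offset 0x2000 and the PBA at offset 0x3000, both in BAR 3.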
> @@ -1735,10 +1767,12 @@ static void vfio_bar_prepare(VFIOPCIDevice *vdev, int nr)
>       }
>   
>       /* Determine what type of BAR this is for registration */
> -    ret = pread(vdev->vbasedev.fd, &pci_bar, sizeof(pci_bar),
> -                vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr));
> +    ret = VDEV_CONFIG_READ(&vdev->vbasedev, PCI_BASE_ADDRESS_0 + (4 * nr),
> +                           sizeof(pci_bar), &pci_bar);
>       if (ret != sizeof(pci_bar)) {
> -        error_report("vfio: Failed to read BAR %d (%m)", nr);
> +        const char *err = ret < 0 ? strerror(-ret) : "short read";
> +
> +        error_report("vfio: Failed to read BAR %d: %s", nr, err);
>           return;
>       }
>   
> @@ -2438,21 +2472,25 @@ void vfio_pci_pre_reset(VFIOPCIDevice *vdev)
>   
>   void vfio_pci_post_reset(VFIOPCIDevice *vdev)
>   {
> +    VFIODevice *vbasedev = &vdev->vbasedev;
>       Error *err = NULL;
> -    int nr;
> +    int ret, nr;
>   
>       if (!vfio_intx_enable(vdev, &err)) {
>           error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
>       }
>   
>       for (nr = 0; nr < PCI_NUM_REGIONS - 1; ++nr) {
> -        off_t addr = vdev->config_offset + PCI_BASE_ADDRESS_0 + (4 * nr);
> +        off_t addr = PCI_BASE_ADDRESS_0 + (4 * nr);
>           uint32_t val = 0;
>           uint32_t len = sizeof(val);
>   
> -        if (pwrite(vdev->vbasedev.fd, &val, len, addr) != len) {
> -            error_report("%s(%s) reset bar %d failed: %m", __func__,
> -                         vdev->vbasedev.name, nr);
> +        ret = VDEV_CONFIG_WRITE(vbasedev, addr, len, &val);
> +        if (ret != len) {
> +            const char *errmsg = ret < 0 ? strerror(-ret) : "short write";
> +
> +            error_report("%s(%s) reset bar %d failed: %s", __func__,
> +                         vbasedev->name, nr, errmsg);
>           }
>       }
>   
> @@ -2794,10 +2832,10 @@ static bool vfio_populate_device(VFIOPCIDevice *vdev, Error **errp)
>   
>       irq_info.index = VFIO_PCI_ERR_IRQ_INDEX;
>   
> -    ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> +    ret = vbasedev->io->get_irq_info(vbasedev, &irq_info);
>       if (ret) {
>           /* This can fail for an old kernel or legacy PCI dev */
> -        trace_vfio_populate_device_get_irq_info_failure(strerror(errno));
> +        trace_vfio_populate_device_get_irq_info_failure(strerror(-ret));
>       } else if (irq_info.count == 1) {
>           vdev->pci_aer = true;
>       } else {
> @@ -2915,8 +2953,11 @@ static void vfio_register_req_notifier(VFIOPCIDevice *vdev)
>           return;
>       }
>   
> -    if (ioctl(vdev->vbasedev.fd,
> -              VFIO_DEVICE_GET_IRQ_INFO, &irq_info) < 0 || irq_info.count < 1) {
> +    if (vdev->vbasedev.io->get_irq_info(&vdev->vbasedev, &irq_info) < 0) {
> +        return;
> +    }
> +
> +    if (irq_info.count < 1) {
>           return;
>       }
>   
> @@ -3368,7 +3409,7 @@ static void vfio_instance_init(Object *obj)
>       vdev->host.function = ~0U;
>   
>       vfio_device_init(vbasedev, VFIO_DEVICE_TYPE_PCI, &vfio_pci_ops,
> -                     DEVICE(vdev), false);
> +                     &vfio_dev_io_ioctl, DEVICE(vdev), false);
>   
>       vdev->nv_gpudirect_clique = 0xFF;
>   
> diff --git a/hw/vfio/platform.c b/hw/vfio/platform.c
> index f491f4dc95..51534fd941 100644
> --- a/hw/vfio/platform.c
> +++ b/hw/vfio/platform.c
> @@ -648,7 +648,7 @@ static void vfio_platform_instance_init(Object *obj)
>       VFIODevice *vbasedev = &vdev->vbasedev;
>   
>       vfio_device_init(vbasedev, VFIO_DEVICE_TYPE_PLATFORM, &vfio_platform_ops,
> -                     DEVICE(vdev), false);
> +                     &vfio_dev_io_ioctl, DEVICE(vdev), false);
>   }
>   
>   #ifdef CONFIG_IOMMUFD
> diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
> index 304030e71d..3512556590 100644
> --- a/include/hw/vfio/vfio-common.h
> +++ b/include/hw/vfio/vfio-common.h
> @@ -116,6 +116,7 @@ typedef struct VFIOIOMMUFDContainer {
>   OBJECT_DECLARE_SIMPLE_TYPE(VFIOIOMMUFDContainer, VFIO_IOMMU_IOMMUFD);
>   
>   typedef struct VFIODeviceOps VFIODeviceOps;
> +typedef struct VFIODeviceIO VFIODeviceIO;

I suggest VFIODeviceIOOps
    
>   typedef struct VFIODevice {
>       QLIST_ENTRY(VFIODevice) next;
> @@ -136,6 +137,7 @@ typedef struct VFIODevice {
>       OnOffAuto enable_migration;
>       bool migration_events;
>       VFIODeviceOps *ops;
> +    VFIODeviceIO *io;

io_ops

>       unsigned int num_irqs;
>       unsigned int num_regions;
>       unsigned int flags;
> @@ -186,6 +188,29 @@ struct VFIODeviceOps {
>       int (*vfio_load_config)(VFIODevice *vdev, QEMUFile *f);
>   };
>   
> +#ifdef CONFIG_LINUX
> +
> +/*
> + * How devices communicate with the server.  The default option is through
> + * ioctl() to the kernel VFIO driver, but vfio-user can use a socket to a remote
> + * process.
> + */
> +struct VFIODeviceIO {
> +    int (*device_feature)(VFIODevice *vdev, struct vfio_device_feature *);
> +    int (*get_region_info)(VFIODevice *vdev,
> +                           struct vfio_region_info *info);
> +    int (*get_irq_info)(VFIODevice *vdev, struct vfio_irq_info *irq);
> +    int (*set_irqs)(VFIODevice *vdev, struct vfio_irq_set *irqs);
> +    int (*region_read)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
> +                       void *data);
> +    int (*region_write)(VFIODevice *vdev, uint8_t nr, off_t off, uint32_t size,
> +                        void *data);
> +};
> +
> +extern VFIODeviceIO vfio_dev_io_ioctl;

vfio_dev_io_ops_ioctl

> +
> +#endif /* CONFIG_LINUX */
> +
>   typedef struct VFIOGroup {
>       int fd;
>       int groupid;
> @@ -317,6 +342,6 @@ int vfio_get_dirty_bitmap(const VFIOContainerBase *bcontainer, uint64_t iova,
>   bool vfio_device_get_name(VFIODevice *vbasedev, Error **errp);
>   void vfio_device_set_fd(VFIODevice *vbasedev, const char *str, Error **errp);
>   void vfio_device_init(VFIODevice *vbasedev, int type, VFIODeviceOps *ops,
> -                      DeviceState *dev, bool ram_discard);
> +                      VFIODeviceIO *io, DeviceState *dev, bool ram_discard);

VFIODeviceIOOps *io_ops

>   int vfio_device_get_aw_bits(VFIODevice *vdev);
>   #endif /* HW_VFIO_VFIO_COMMON_H */

Just minor changes. Looks good.

Thanks,

C.





Thread overview: 64+ messages
2025-02-19 14:48 [PATCH v8 00/28] vfio-user client John Levon
2025-02-19 14:48 ` [PATCH v8 01/28] vfio/container: pass MemoryRegion to DMA operations John Levon
2025-04-02 16:44   ` Cédric Le Goater
2025-02-19 14:48 ` [PATCH v8 02/28] vfio/container: pass listener_begin/commit callbacks John Levon
2025-04-02 12:30   ` Cédric Le Goater
2025-02-19 14:48 ` [PATCH v8 03/28] vfio/container: support VFIO_DMA_UNMAP_FLAG_ALL John Levon
2025-04-02 16:49   ` Cédric Le Goater
2025-04-03  9:45     ` John Levon
2025-04-04 15:43       ` Cédric Le Goater
2025-02-19 14:48 ` [PATCH v8 04/28] vfio: add vfio_attach_device_by_iommu_type() John Levon
2025-04-02 16:52   ` Cédric Le Goater
2025-02-19 14:48 ` [PATCH v8 05/28] vfio: add vfio_prepare_device() John Levon
2025-04-03  9:19   ` Cédric Le Goater
2025-04-03  9:34     ` John Levon
2025-04-04 15:41       ` Cédric Le Goater
2025-04-04 15:45         ` John Levon
2025-02-19 14:48 ` [PATCH v8 06/28] vfio: refactor out vfio_interrupt_setup() John Levon
2025-04-03  9:23   ` Cédric Le Goater
2025-04-03  9:38     ` John Levon
2025-02-19 14:48 ` [PATCH v8 07/28] vfio: refactor out vfio_pci_config_setup() John Levon
2025-04-03  9:30   ` Cédric Le Goater
2025-02-19 14:48 ` [PATCH v8 08/28] vfio: add region cache John Levon
2025-04-03 15:46   ` Cédric Le Goater
2025-04-03 16:00     ` John Levon
2025-04-04 16:57       ` Cédric Le Goater
2025-04-04 17:18         ` John Levon
2025-04-08 13:48           ` John Levon
2025-02-19 14:48 ` [PATCH v8 09/28] vfio: split out VFIOKernelPCIDevice John Levon
2025-04-03 17:13   ` Cédric Le Goater
2025-04-03 18:08     ` John Levon
2025-04-04 12:49       ` Cédric Le Goater
2025-04-04 14:21         ` John Levon
2025-04-04 14:48           ` Cédric Le Goater
2025-04-04 15:44             ` John Levon
2025-02-19 14:48 ` [PATCH v8 10/28] vfio: add device IO ops vector John Levon
2025-04-04 14:36   ` Cédric Le Goater [this message]
2025-04-04 15:53     ` John Levon
2025-02-19 14:48 ` [PATCH v8 11/28] vfio-user: introduce vfio-user protocol specification John Levon
2025-02-19 14:48 ` [PATCH v8 12/28] vfio-user: add vfio-user class and container John Levon
2025-02-19 14:48 ` [PATCH v8 13/28] vfio-user: connect vfio proxy to remote server John Levon
2025-02-19 14:48 ` [PATCH v8 14/28] vfio-user: implement message receive infrastructure John Levon
2025-02-19 14:48 ` [PATCH v8 15/28] vfio-user: implement message send infrastructure John Levon
2025-02-19 14:48 ` [PATCH v8 16/28] vfio-user: implement VFIO_USER_DEVICE_GET_INFO John Levon
2025-02-19 14:48 ` [PATCH v8 17/28] vfio-user: implement VFIO_USER_DEVICE_GET_REGION_INFO John Levon
2025-02-19 14:48 ` [PATCH v8 18/28] vfio-user: implement VFIO_USER_REGION_READ/WRITE John Levon
2025-02-19 14:48 ` [PATCH v8 19/28] vfio-user: set up PCI in vfio_user_pci_realize() John Levon
2025-02-19 14:48 ` [PATCH v8 20/28] vfio-user: implement VFIO_USER_DEVICE_GET/SET_IRQ* John Levon
2025-02-19 14:48 ` [PATCH v8 21/28] vfio-user: forward MSI-X PBA BAR accesses to server John Levon
2025-02-19 14:48 ` [PATCH v8 22/28] vfio-user: set up container access to the proxy John Levon
2025-02-19 14:48 ` [PATCH v8 23/28] vfio-user: implement VFIO_USER_DEVICE_RESET John Levon
2025-02-19 14:48 ` [PATCH v8 24/28] vfio-user: implement VFIO_USER_DMA_MAP/UNMAP John Levon
2025-02-19 14:48 ` [PATCH v8 25/28] vfio-user: implement VFIO_USER_DMA_READ/WRITE John Levon
2025-02-19 14:48 ` [PATCH v8 26/28] vfio-user: add 'no-direct-dma' option John Levon
2025-02-19 14:48 ` [PATCH v8 27/28] vfio-user: add 'x-msg-timeout' option John Levon
2025-02-19 14:48 ` [PATCH v8 28/28] vfio-user: add coalesced posted writes John Levon
2025-02-28 17:09 ` [PATCH v8 00/28] vfio-user client Jag Raman
2025-03-03 11:19   ` John Levon
2025-03-03 15:39     ` Jag Raman
2025-03-14 14:25 ` Cédric Le Goater
2025-03-14 14:48   ` Steven Sistare
2025-03-18 10:00     ` Cédric Le Goater
2025-03-14 15:13   ` John Levon
2025-03-18 10:02     ` Cédric Le Goater
2025-04-04 17:21 ` Cédric Le Goater
