qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Steven Sistare <steven.sistare@oracle.com>
To: "Cédric Le Goater" <clg@redhat.com>, qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Yi Liu <yi.l.liu@intel.com>, Eric Auger <eric.auger@redhat.com>,
	Zhenzhong Duan <zhenzhong.duan@intel.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>
Subject: Re: [PATCH V1 19/26] vfio/iommufd: use IOMMU_IOAS_MAP_FILE
Date: Wed, 5 Feb 2025 17:01:42 -0500	[thread overview]
Message-ID: <a649d9a2-f9d2-4366-9abf-6fa7c4321c7e@oracle.com> (raw)
In-Reply-To: <94093d0a-f9bb-4327-b793-2f3145c7cba2@redhat.com>

On 2/5/2025 12:23 PM, Cédric Le Goater wrote:
> On 1/29/25 15:43, Steve Sistare wrote:
>> Use IOMMU_IOAS_MAP_FILE when the mapped region is backed by a file.
>> Such a mapping can be preserved without modification during CPR,
>> because it depends on the file's address space, which does not change,
>> rather than on the process's address space, which does change.
>>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>>   backends/iommufd.c                    | 36 +++++++++++++++++++++++++++++++++++
>>   backends/trace-events                 |  1 +
>>   hw/vfio/container-base.c              |  9 +++++++++
>>   hw/vfio/iommufd.c                     | 13 +++++++++++++
>>   include/exec/cpu-common.h             |  1 +
>>   include/hw/vfio/vfio-container-base.h |  3 +++
>>   include/system/iommufd.h              |  3 +++
>>   system/physmem.c                      |  5 +++++
>>   8 files changed, 71 insertions(+)
>>
>> diff --git a/backends/iommufd.c b/backends/iommufd.c
>> index 7b4fc8e..6d29221 100644
>> --- a/backends/iommufd.c
>> +++ b/backends/iommufd.c
>> @@ -174,6 +174,42 @@ int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>>       return ret;
>>   }
>> +int iommufd_backend_map_file_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> +                                 hwaddr iova, ram_addr_t size,
>> +                                 int mfd, unsigned long start, bool readonly)
> 
> Please introduce a patch for this new routine.

OK.

>> +{
>> +    int ret, fd = be->fd;
>> +    struct iommu_ioas_map_file map = {
>> +        .size = sizeof(map),
>> +        .flags = IOMMU_IOAS_MAP_READABLE |
>> +                 IOMMU_IOAS_MAP_FIXED_IOVA,
>> +        .ioas_id = ioas_id,
>> +        .fd = mfd,
>> +        .start = start,
>> +        .iova = iova,
>> +        .length = size,
>> +    };
>> +
>> +    if (!readonly) {
>> +        map.flags |= IOMMU_IOAS_MAP_WRITEABLE;
>> +    }
>> +
>> +    ret = ioctl(fd, IOMMU_IOAS_MAP_FILE, &map);
>> +    trace_iommufd_backend_map_file_dma(fd, ioas_id, iova, size, mfd, start,
>> +                                       readonly, ret);
>> +    if (ret) {
>> +        ret = -errno;
>> +
>> +        /* TODO: Not support mapping hardware PCI BAR region for now. */
>> +        if (errno == EFAULT) {
>> +            warn_report("IOMMU_IOAS_MAP_FILE failed: %m, PCI BAR?");
> 
> I am not sure this warning can occur when the PCI BARs are mmaped
> in an VM with incompatible address spaces. My attempts produced EINVAL.
> Let's keep it for now until it is clarified.
> 
> 
>> +        } else {
>> +            error_report("IOMMU_IOAS_MAP_FILE failed: %m");
> 
> please remove this error report. It's redundant with the callers which
> will report the same.

These warnings and errors are copied as-is from iommufd_backend_map_dma.
I aim to be bug-for-bug compatible until the issues with mapping BARs
are resolved.

>> +        }
>> +    }
>> +    return ret;
>> +}
>> +
>>   int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>>                                 hwaddr iova, ram_addr_t size)
>>   {
>> diff --git a/backends/trace-events b/backends/trace-events
>> index 40811a3..f478e18 100644
>> --- a/backends/trace-events
>> +++ b/backends/trace-events
>> @@ -11,6 +11,7 @@ iommufd_backend_connect(int fd, bool owned, uint32_t users) "fd=%d owned=%d user
>>   iommufd_backend_disconnect(int fd, uint32_t users) "fd=%d users=%d"
>>   iommu_backend_set_fd(int fd) "pre-opened /dev/iommu fd=%d"
>>   iommufd_backend_map_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, void *vaddr, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" addr=%p readonly=%d (%d)"
>> +iommufd_backend_map_file_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int fd, unsigned long start, bool readonly, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" fd=%d start=%ld readonly=%d (%d)"
>>   iommufd_backend_unmap_dma_non_exist(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " Unmap nonexistent mapping: iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>   iommufd_backend_unmap_dma(int iommufd, uint32_t ioas, uint64_t iova, uint64_t size, int ret) " iommufd=%d ioas=%d iova=0x%"PRIx64" size=0x%"PRIx64" (%d)"
>>   iommufd_backend_alloc_ioas(int iommufd, uint32_t ioas) " iommufd=%d ioas=%d"
>> diff --git a/hw/vfio/container-base.c b/hw/vfio/container-base.c
>> index 302cd4c..fbaf04a 100644
>> --- a/hw/vfio/container-base.c
>> +++ b/hw/vfio/container-base.c
>> @@ -21,7 +21,16 @@ int vfio_container_dma_map(VFIOContainerBase *bcontainer,
>>                              RAMBlock *rb)
>>   {
>>       VFIOIOMMUClass *vioc = VFIO_IOMMU_GET_CLASS(bcontainer);
>> +    int mfd = rb ? qemu_ram_get_fd(rb) : -1;
>> +    if (mfd >= 0 && vioc->dma_map_file) {
>> +        unsigned long start = vaddr - qemu_ram_get_host_addr(rb);
>> +        unsigned long offset = qemu_ram_get_fd_offset(rb);
>> +
>> +        vioc->dma_map_file(bcontainer, iova, size, mfd, start + offset,
>> +                           readonly);
>> +        return 0;
> 
> This is CPR related. Please add a dma_map_file helper and move the
> code abolve to a cpr file.

This is not specific to CPR.  It has value that is independent of CPR,
by representing mappings in the kernel using file mappings with folios
rather than struct pages.  It would be proposed even if CPR did not exist.

>> +    }
>>       g_assert(vioc->dma_map);
>>       return vioc->dma_map(bcontainer, iova, size, vaddr, readonly);
>>   }
>> diff --git a/hw/vfio/iommufd.c b/hw/vfio/iommufd.c
>> index 42ba63f..a3e7edb 100644
>> --- a/hw/vfio/iommufd.c
>> +++ b/hw/vfio/iommufd.c
>> @@ -38,6 +38,18 @@ static int iommufd_cdev_map(const VFIOContainerBase *bcontainer, hwaddr iova,
>>                                      iova, size, vaddr, readonly);
>>   }
>> +static int iommufd_cdev_map_file(const VFIOContainerBase *bcontainer,
>> +                                 hwaddr iova, ram_addr_t size,
>> +                                 int fd, unsigned long start, bool readonly)
>> +{
>> +    const VFIOIOMMUFDContainer *container =
>> +        container_of(bcontainer, VFIOIOMMUFDContainer, bcontainer);
>> +
>> +    return iommufd_backend_map_file_dma(container->be,
>> +                                        container->ioas_id,
>> +                                        iova, size, fd, start, readonly);
>> +}
>> +
>>   static int iommufd_cdev_unmap(const VFIOContainerBase *bcontainer,
>>                                 hwaddr iova, ram_addr_t size,
>>                                 IOMMUTLBEntry *iotlb)
>> @@ -806,6 +818,7 @@ static void vfio_iommu_iommufd_class_init(ObjectClass *klass, void *data)
>>       vioc->hiod_typename = TYPE_HOST_IOMMU_DEVICE_IOMMUFD_VFIO;
>>       vioc->dma_map = iommufd_cdev_map;
>> +    vioc->dma_map_file = iommufd_cdev_map_file;
>>       vioc->dma_unmap = iommufd_cdev_unmap;
>>       vioc->attach_device = iommufd_cdev_attach;
>>       vioc->detach_device = iommufd_cdev_detach;
>> diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
>> index b1d76d6..0cab252 100644
>> --- a/include/exec/cpu-common.h
>> +++ b/include/exec/cpu-common.h
>> @@ -95,6 +95,7 @@ void qemu_ram_unset_idstr(RAMBlock *block);
>>   const char *qemu_ram_get_idstr(RAMBlock *rb);
>>   void *qemu_ram_get_host_addr(RAMBlock *rb);
>>   ram_addr_t qemu_ram_get_offset(RAMBlock *rb);
>> +ram_addr_t qemu_ram_get_fd_offset(RAMBlock *rb);
>>   ram_addr_t qemu_ram_get_used_length(RAMBlock *rb);
>>   ram_addr_t qemu_ram_get_max_length(RAMBlock *rb);
>>   bool qemu_ram_is_shared(RAMBlock *rb);
>> diff --git a/include/hw/vfio/vfio-container-base.h b/include/hw/vfio/vfio-container-base.h
>> index d82e256..4daa5f8 100644
>> --- a/include/hw/vfio/vfio-container-base.h
>> +++ b/include/hw/vfio/vfio-container-base.h
>> @@ -115,6 +115,9 @@ struct VFIOIOMMUClass {
>>       int (*dma_map)(const VFIOContainerBase *bcontainer,
>>                      hwaddr iova, ram_addr_t size,
>>                      void *vaddr, bool readonly);
>> +    int (*dma_map_file)(const VFIOContainerBase *bcontainer,
>> +                        hwaddr iova, ram_addr_t size,
>> +                        int fd, unsigned long start, bool readonly);
>>       int (*dma_unmap)(const VFIOContainerBase *bcontainer,
>>                        hwaddr iova, ram_addr_t size,
>>                        IOMMUTLBEntry *iotlb);
>> diff --git a/include/system/iommufd.h b/include/system/iommufd.h
>> index cbab75b..ac700b8 100644
>> --- a/include/system/iommufd.h
>> +++ b/include/system/iommufd.h
>> @@ -43,6 +43,9 @@ void iommufd_backend_disconnect(IOMMUFDBackend *be);
>>   bool iommufd_backend_alloc_ioas(IOMMUFDBackend *be, uint32_t *ioas_id,
>>                                   Error **errp);
>>   void iommufd_backend_free_id(IOMMUFDBackend *be, uint32_t id);
>> +int iommufd_backend_map_file_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> +                                 hwaddr iova, ram_addr_t size, int fd,
>> +                                 unsigned long start, bool readonly);
>>   int iommufd_backend_map_dma(IOMMUFDBackend *be, uint32_t ioas_id, hwaddr iova,
>>                               ram_addr_t size, void *vaddr, bool readonly);
>>   int iommufd_backend_unmap_dma(IOMMUFDBackend *be, uint32_t ioas_id,
>> diff --git a/system/physmem.c b/system/physmem.c
>> index 0bcfc6c..c41a80b 100644
>> --- a/system/physmem.c
>> +++ b/system/physmem.c
>> @@ -1569,6 +1569,11 @@ ram_addr_t qemu_ram_get_offset(RAMBlock *rb)
>>       return rb->offset;
>>   }
>> +ram_addr_t qemu_ram_get_fd_offset(RAMBlock *rb)
>> +{
>> +    return rb->fd_offset;
>> +}
> 
> Should go in its own patch.

OK.

- Steve

>>   ram_addr_t qemu_ram_get_used_length(RAMBlock *rb)
>>   {
>>       return rb->used_length;
> 
> 
> Thanks,
> 
> C.



  reply	other threads:[~2025-02-05 22:02 UTC|newest]

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-01-29 14:42 [PATCH V1 00/26] Live update: vfio and iommufd Steve Sistare
2025-01-29 14:42 ` [PATCH V1 01/26] migration: cpr helpers Steve Sistare
2025-01-29 14:42 ` [PATCH V1 02/26] migration: lower handler priority Steve Sistare
2025-02-03 16:21   ` Fabiano Rosas
2025-02-03 16:58   ` Peter Xu
2025-02-06 13:39     ` Steven Sistare
2025-01-29 14:42 ` [PATCH V1 03/26] vfio: vfio_find_ram_discard_listener Steve Sistare
2025-02-03 16:57   ` Cédric Le Goater
2025-01-29 14:43 ` [PATCH V1 04/26] vfio/container: register container for cpr Steve Sistare
2025-02-03 17:01   ` Cédric Le Goater
2025-02-03 22:26     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 05/26] vfio/container: preserve descriptors Steve Sistare
2025-02-03 17:48   ` Cédric Le Goater
2025-02-03 22:26     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 06/26] vfio/container: preserve DMA mappings Steve Sistare
2025-02-03 18:25   ` Cédric Le Goater
2025-02-03 22:27     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 07/26] vfio/container: recover from unmap-all-vaddr failure Steve Sistare
2025-02-04 14:10   ` Cédric Le Goater
2025-02-04 16:13     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 08/26] pci: skip reset during cpr Steve Sistare
2025-02-04 14:14   ` Cédric Le Goater
2025-02-04 16:13     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 09/26] pci: export msix_is_pending Steve Sistare
2025-01-29 14:43 ` [PATCH V1 10/26] vfio-pci: refactor for cpr Steve Sistare
2025-02-04 14:39   ` Cédric Le Goater
2025-02-04 16:14     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 11/26] vfio-pci: skip reset during cpr Steve Sistare
2025-02-04 14:56   ` Cédric Le Goater
2025-02-04 16:15     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 12/26] vfio-pci: preserve MSI Steve Sistare
2025-02-05 16:48   ` Cédric Le Goater
2025-02-06 14:41     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 13/26] vfio-pci: preserve INTx Steve Sistare
2025-02-05 17:13   ` Cédric Le Goater
2025-02-06 14:43     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 14/26] migration: close kvm after cpr Steve Sistare
2025-01-29 14:43 ` [PATCH V1 15/26] migration: cpr_get_fd_param helper Steve Sistare
2025-01-29 14:43 ` [PATCH V1 16/26] vfio: return mr from vfio_get_xlat_addr Steve Sistare
2025-02-04 15:47   ` Cédric Le Goater
2025-02-04 17:42     ` Steven Sistare
2025-02-16 23:19       ` John Levon
2025-01-29 14:43 ` [PATCH V1 17/26] vfio: pass ramblock to vfio_container_dma_map Steve Sistare
2025-01-29 14:43 ` [PATCH V1 18/26] vfio/iommufd: define iommufd_cdev_make_hwpt Steve Sistare
2025-02-04 16:22   ` Cédric Le Goater
2025-02-04 17:42     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 19/26] vfio/iommufd: use IOMMU_IOAS_MAP_FILE Steve Sistare
2025-02-05 17:23   ` Cédric Le Goater
2025-02-05 22:01     ` Steven Sistare [this message]
2025-01-29 14:43 ` [PATCH V1 20/26] vfio/iommufd: export iommufd_cdev_get_info_iova_range Steve Sistare
2025-02-05 17:33   ` Cédric Le Goater
2025-02-05 22:01     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 21/26] iommufd: change process ioctl Steve Sistare
2025-02-05 17:34   ` Cédric Le Goater
2025-02-05 22:02     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 22/26] vfio/iommufd: invariant device name Steve Sistare
2025-02-05 17:42   ` Cédric Le Goater
2025-02-05 22:02     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 23/26] vfio/iommufd: register container for cpr Steve Sistare
2025-02-05 17:45   ` Cédric Le Goater
2025-02-05 22:03     ` Steven Sistare
2025-01-29 14:43 ` [PATCH V1 24/26] vfio/iommufd: preserve descriptors Steve Sistare
2025-01-29 14:43 ` [PATCH V1 25/26] vfio/iommufd: reconstruct device Steve Sistare
2025-01-29 14:43 ` [PATCH V1 26/26] iommufd: preserve DMA mappings Steve Sistare

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a649d9a2-f9d2-4366-9abf-6fa7c4321c7e@oracle.com \
    --to=steven.sistare@oracle.com \
    --cc=alex.williamson@redhat.com \
    --cc=clg@redhat.com \
    --cc=eric.auger@redhat.com \
    --cc=farosas@suse.de \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=yi.l.liu@intel.com \
    --cc=zhenzhong.duan@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).