qemu-devel.nongnu.org archive mirror
From: "Cédric Le Goater" <clg@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>, qemu-devel@nongnu.org
Cc: Alex Williamson <alex.williamson@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Jason Wang <jasowang@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Eduardo Habkost <eduardo@habkost.net>,
	Avihai Horon <avihaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Yi Liu <yi.l.liu@intel.com>
Subject: Re: [PATCH v4 00/15] vfio: VFIO migration support with vIOMMU
Date: Thu, 6 Jun 2024 17:43:35 +0200
Message-ID: <088a0db6-ae69-4d85-a817-1685d4053d17@redhat.com>
In-Reply-To: <20230622214845.3980-1-joao.m.martins@oracle.com>

Hello Joao,

On 6/22/23 23:48, Joao Martins wrote:
> Hey,
> 
> This series introduces support for vIOMMU with VFIO device migration,
> particularly related to how we do the dirty page tracking.
> 
> Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
> provide DMA translation services for guests to provide some form of
> guest kernel managed DMA e.g. for nested virt based usage; (1) is especially
> required for big VMs with VFs with more than 255 vcpus. We tackle both
> and remove the migration blocker when vIOMMU is present provided the
> conditions are met. I have both use-cases here in one series, but I am happy
> to tackle them in separate series.
> 
> As I found out we don't necessarily need to expose the whole vIOMMU
> functionality in order to just support interrupt remapping. x86 IOMMUs
> on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
> Linux guests with commit c40aaaac10 and since qemu commit 8646d9c773d8)
> can instantiate an IOMMU just for interrupt remapping without needing to
> advertise/support DMA translation. AMD IOMMU in theory can provide
> the same, but Linux doesn't quite support the IR-only part there yet,
> only intel-iommu.
> 
> The series is organized as follows:
> 
> Patches 1-5: Today we can't gather vIOMMU details before the guest
> establishes their first DMA mapping via the vIOMMU. So these first four
> patches add a way for vIOMMUs to be asked of their properties at start
> of day. I chose the least churn possible way for now (as opposed to a
> treewide conversion) and allow easy conversion a posteriori. As
> suggested by Peter Xu[7], I have resurrected Yi's patches[5][6] which
> allow us to fetch PCI backing vIOMMU attributes, without necessarily
> tying the caller (VFIO or anyone else) to an IOMMU MR like I
> was doing in v3.
> 
> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
> DMA translation allowed. Today the 'dma-translation' attribute is
> x86-iommu only, but the way this series is structured nothing stops
> other vIOMMUs from supporting it too as long as they use
> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
> are handled. The blocker is thus relaxed when vIOMMUs are able to
> toggle/report the DMA_TRANSLATION attribute. With the patches up to this set,
> we've then tackled item (1) of the second paragraph.
> 
> Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
> IOVA address space, leveraging the logic we use to compose the dirty ranges.
> The blocker is once again relaxed for vIOMMUs that advertise their IOVA
> addressing limits. This tackles item (2). So far I mainly use it with
> intel-iommu, although I have a small set of patches for virtio-iommu per
> Alex's suggestion in v2.
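
The two relaxations described above (patches 6-8 and patches 9-15) can be read as one decision. The sketch below is only an illustration of that reading, not code from the series: the ViommuInfo struct, its fields and viommu_blocks_migration() are hypothetical names, and the convention that a max_iova of 0 means "no limit advertised" is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what the caller knows about the vIOMMU
 * sitting in front of a device. Not a QEMU structure. */
typedef struct {
    bool present;         /* a vIOMMU is configured for the device */
    bool dma_translation; /* the guest may program DMA translation */
    uint64_t max_iova;    /* highest addressable IOVA, 0 if not advertised (assumed) */
} ViommuInfo;

/* Returns true when migration would still have to be blocked. */
static bool viommu_blocks_migration(const ViommuInfo *v)
{
    if (!v->present) {
        return false;     /* no vIOMMU, nothing to relax */
    }
    if (!v->dma_translation) {
        return false;     /* item (1): interrupt remapping only */
    }
    /* item (2): the whole IOVA space can only be tracked when the
     * vIOMMU advertises its addressing limit */
    return v->max_iova == 0;
}
```

With these assumptions, an interrupt-remapping-only vIOMMU never blocks, and a translating vIOMMU blocks only when it reports no IOVA limit.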
> 
> Comments, suggestions welcome. Thanks for the review!


I spent some time refreshing your series on upstream QEMU (see [1]) and
gave migration a try with a CX-7 VF. LGTM. It doesn't seem we are far
from acceptance in QEMU 9.1. Are we?

First, I will resend these with the changes I made:

   vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
   vfio/common: Move dirty tracking ranges update to helper()

I guess PCIIOMMUOps::get_iommu_attr needs a close review. Is
IOMMU_ATTR_DMA_TRANSLATION a must-have?

The rest is mostly VFIO internals for dirty tracking.

Thanks,

C.

[1] https://github.com/legoater/qemu/commits/vfio-9.1


> 
> Regards,
> 	Joao
> 
> Changes since v3[8]:
> * Pick up Yi's patches[5][6], and rework the first four patches.
>    These are a bit better split, and make the new iommu_ops *optional*
>    as opposed to a treewide conversion. Rather than returning an IOMMU MR
>    and letting VFIO operate on it to fetch attributes, we instead let the
>    underlying IOMMU driver fetch the desired IOMMU MR and ask for the
>    desired IOMMU attribute. Callers only care about PCI Device backing
>    vIOMMU attributes regardless of its topology/association. (Peter Xu)
>    These patches are a bit better split compared to the original ones,
>    and I've kept all the same authorship and noted the changes from
>    the original where applicable.
> * Because of the rework of the first four patches, switch to
>    individual attributes in the VFIOSpace that track dma_translation
>    and the max_iova. All are expected to be unused when zero to retain
>    the defaults of today in common code.
> * Improve the migration blocker message of the last patch to be
>    more obvious that vIOMMU migration blocker is added when no vIOMMU
>    address space limits are advertised. (Patch 15)
> * Cast to uintptr_t in IOMMUAttr data in intel-iommu (Philippe).
> * Switch to MAKE_64BIT_MASK() instead of plain left shift (Philippe).
> * Change diffstat of patches with scripts/git.orderfile (Philippe).
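
On the MAKE_64BIT_MASK() item above: one common reason to prefer a mask macro over a plain left shift is that (1ULL << width) - 1 is undefined in C when width is 64. The sketch below shows the difference. SKETCH_MAKE_64BIT_MASK is a local stand-in written from memory of the macro's shape in QEMU's bitops.h (an assumption, check the tree), and max_iova_from_width() is a hypothetical helper, not the function used in the patch.

```c
#include <stdint.h>

/* Local stand-in modelled on QEMU's MAKE_64BIT_MASK(shift, length).
 * Assumed shape; only valid for 1 <= length <= 64. */
#define SKETCH_MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Hypothetical helper: highest IOVA reachable with an address width
 * of aw_bits bits (1..64). */
static uint64_t max_iova_from_width(unsigned aw_bits)
{
    /* The plain-shift form (1ULL << aw_bits) - 1 would be undefined
     * behaviour for aw_bits == 64; the mask form shifts right by
     * 64 - aw_bits instead, which stays in range. */
    return SKETCH_MAKE_64BIT_MASK(0, aw_bits);
}
```

For a 48-bit address width this yields 0xFFFFFFFFFFFF, and a 64-bit width yields UINT64_MAX without relying on an out-of-range shift.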
> 
> Changes since v2[3]:
> * New patches 1-9 to be able to handle vIOMMUs without DMA translation, and
> introduce ways to know various IOMMU model attributes via the IOMMU MR. This
> is partly meant to address a comment in previous versions where we can't
> access the IOMMU MR prior to the DMA mapping happening. Before this series
> vfio giommu_list is only tracking 'mapped GIOVA' and that is controlled by the
> guest. They also better tackle the IOMMU usage for interrupt-remapping
> only purposes.
> * Dropped Peter Xu ack on patch 9 given that the code changed a bit.
> * Adjust patch 14 to account for the VFIO bitmaps no longer being pointers.
> * The patches that existed in v2 of vIOMMU dirty tracking are mostly
> untouched, except patch 12 which was greatly simplified.
> 
> Changes since v1[4]:
> - Rebased on latest master branch. As part of it, made some changes in
>    pre-copy to adjust it to Juan's new patches:
>    1. Added a new patch that passes threshold_size parameter to
>       .state_pending_{estimate,exact}() handlers.
>    2. Added a new patch that refactors vfio_save_block().
>    3. Changed the pre-copy patch to cache and report pending pre-copy
>       size in the .state_pending_estimate() handler.
> - Removed unnecessary P2P code. This should be added later on when P2P
>    support is added. (Alex)
> - Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
>    (patch #11). (Alex)
> - Stored vfio_devices_all_device_dirty_tracking()'s value in a local
>    variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
> - Refactored the viommu device dirty tracking ranges creation code to
>    make it clearer (patch #15).
> - Changed overflow check in vfio_iommu_range_is_device_tracked() to
>    emphasize that we specifically check for 2^64 wrap around (patch #15).
> - Added R-bs / Acks.
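
The 2^64 wrap-around check mentioned in the v1 changelog above can be illustrated as follows. This is a sketch of the general technique only; range_is_tracked() and its parameters are hypothetical and do not reproduce vfio_iommu_range_is_device_tracked() from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true when [iova, iova + size) fits inside the
 * tracked range [0, max_iova] and its end does not wrap past 2^64. */
static bool range_is_tracked(uint64_t iova, uint64_t size, uint64_t max_iova)
{
    uint64_t last;

    if (size == 0) {
        return false;            /* empty ranges are not tracked here */
    }
    /* Unsigned arithmetic wraps modulo 2^64, so a wrapped end address
     * shows up as last < iova. */
    last = iova + (size - 1);
    if (last < iova) {
        return false;
    }
    return last <= max_iova;
}
```

Comparing the last byte address rather than iova + size keeps a range that ends exactly at 2^64 - 1 valid while still rejecting ranges that wrap.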
> 
> [0] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
> [1] https://lore.kernel.org/qemu-devel/c66d2d8e-f042-964a-a797-a3d07c260a3b@oracle.com/
> [2] https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection
> [3] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
> [4] https://lore.kernel.org/qemu-devel/20230126184948.10478-1-avihaih@nvidia.com/
> [5] https://lore.kernel.org/all/20210302203827.437645-5-yi.l.liu@intel.com/
> [6] https://lore.kernel.org/all/20210302203827.437645-6-yi.l.liu@intel.com/
> [7] https://lore.kernel.org/qemu-devel/ZH9Kr6mrKNqUgcYs@x1n/
> [8] https://lore.kernel.org/qemu-devel/20230530175937.24202-1-joao.m.martins@oracle.com/
> 
> Avihai Horon (4):
>    memory/iommu: Add IOMMU_ATTR_MAX_IOVA attribute
>    intel-iommu: Implement IOMMU_ATTR_MAX_IOVA get_attr() attribute
>    vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
>    vfio/common: Optimize device dirty page tracking with vIOMMU
> 
> Joao Martins (7):
>    memory/iommu: Add IOMMU_ATTR_DMA_TRANSLATION attribute
>    intel-iommu: Implement get_attr() method
>    vfio/common: Track whether DMA Translation is enabled on the vIOMMU
>    vfio/common: Relax vIOMMU detection when DMA translation is off
>    vfio/common: Move dirty tracking ranges update to helper
>    vfio/common: Support device dirty page tracking with vIOMMU
>    vfio/common: Block migration with vIOMMUs without address width limits
> 
> Yi Liu (4):
>    hw/pci: Add a pci_setup_iommu_ops() helper
>    hw/pci: Refactor pci_device_iommu_address_space()
>    hw/pci: Introduce pci_device_iommu_get_attr()
>    intel-iommu: Switch to pci_setup_iommu_ops()
> 
>   include/exec/memory.h         |   4 +-
>   include/hw/pci/pci.h          |  11 ++
>   include/hw/pci/pci_bus.h      |   1 +
>   include/hw/vfio/vfio-common.h |   2 +
>   hw/i386/intel_iommu.c         |  53 +++++++-
>   hw/pci/pci.c                  |  58 +++++++-
>   hw/vfio/common.c              | 241 ++++++++++++++++++++++++++--------
>   hw/vfio/pci.c                 |  22 +++-
>   8 files changed, 329 insertions(+), 63 deletions(-)
> 


