qemu-devel.nongnu.org archive mirror
From: Joao Martins <joao.m.martins@oracle.com>
To: "Duan, Zhenzhong" <zhenzhong.duan@intel.com>,
	"Cédric Le Goater" <clg@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>
Cc: "Liu, Yi L" <yi.l.liu@intel.com>,
	Eric Auger <eric.auger@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Avihai Horon <avihaih@nvidia.com>
Subject: Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
Date: Thu, 11 Jul 2024 11:44:24 +0100
Message-ID: <e8a510f0-b76b-4f72-ae99-a18a4bfdfb17@oracle.com>
In-Reply-To: <SJ0PR11MB6744373FFC0993E12FB3A29492A52@SJ0PR11MB6744.namprd11.prod.outlook.com>

On 11/07/2024 11:22, Duan, Zhenzhong wrote:
> 
> 
>> -----Original Message-----
>> From: Joao Martins <joao.m.martins@oracle.com>
>> Subject: Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking
>>
>> On 11/07/2024 08:41, Cédric Le Goater wrote:
>>> Hello Joao,
>>>
>>> On 7/8/24 4:34 PM, Joao Martins wrote:
>>>> This small series adds support for IOMMU dirty tracking via the
>>>> IOMMUFD backend. The hardware capability is available on most recent
>>>> x86 hardware. The series is organized as follows:
>>>>
>>>> * Patch 1: Fixes a regression in mdev support with IOMMUFD. This
>>>>             one is independent of the series, but I happened to come
>>>>             across it while testing mdev with this series
>>>>
>>>> * Patch 2: Adds support to iommufd_backend_get_device_info() for capabilities
>>>>
>>>> * Patches 3 - 7: IOMMUFD backend support for dirty tracking;
>>>>
>>>> Introduce auto domains -- Patch 4 goes into more detail, but the gist is
>>>> that we will find and attach a device to a compatible IOMMU domain, or
>>>> allocate a new hardware pagetable *or* rely on kernel IOAS attach (for
>>>> mdevs). Afterwards the workflow is relatively simple:
>>>>
>>>> 1) Probe device and allow dirty tracking in the HWPT
>>>> 2) Toggling dirty tracking on/off
>>>> 3) Read-and-clear of Dirty IOVAs
>>>>
>>>> The heuristics selected for (1) were to always request the HWPT for
>>>> dirty tracking if supported, or rely on device dirty page tracking. This
>>>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty
>>>> tracking even if we ask during hwpt allocation.
>>>>
>>>> The unmap case is deferred until further vIOMMU support with migration
>>>> is added[3] which will then introduce the usage of
>>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP ioctl in the
>>>> dma unmap bitmap flow.
>>>>
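For readers following along, the three-step workflow quoted above (probe and
allocate, toggle, read-and-clear) can be pictured with a toy model. This is
only a sketch under stated assumptions: it does not call the kernel or QEMU,
and every name in it (SketchHwpt, the sketch_* functions,
SKETCH_GET_DIRTY_NO_CLEAR) is hypothetical. The no-clear flag merely mirrors
the idea behind IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag: report dirty bits without clearing them. */
#define SKETCH_GET_DIRTY_NO_CLEAR 0x1u

/* Toy model of a hardware pagetable that covers 64 pages. */
typedef struct SketchHwpt {
    bool dirty_capable;  /* step 1: decided once, at allocation time */
    bool dirty_enabled;  /* step 2: toggled on/off at runtime */
    uint64_t dirty;      /* one dirty bit per page */
} SketchHwpt;

/* Step 1: "allocate" the pagetable, requesting dirty tracking if supported. */
static void sketch_hwpt_alloc(SketchHwpt *h, bool hw_supports_dirty)
{
    assert(h != NULL);
    h->dirty_capable = hw_supports_dirty;
    h->dirty_enabled = false;
    h->dirty = 0;
}

/* Step 2: toggle tracking; fails if the pagetable was not made capable. */
static int sketch_set_dirty_tracking(SketchHwpt *h, bool enable)
{
    if (!h->dirty_capable) {
        return -EOPNOTSUPP;
    }
    h->dirty_enabled = enable;
    return 0;
}

/* Simulated device DMA write: only recorded while tracking is enabled. */
static void sketch_dma_write(SketchHwpt *h, unsigned int pfn)
{
    if (h->dirty_enabled && pfn < 64) {
        h->dirty |= UINT64_C(1) << pfn;
    }
}

/* Step 3: read the bitmap and clear it, unless the no-clear flag is set. */
static int sketch_get_dirty_bitmap(SketchHwpt *h, uint32_t flags,
                                   uint64_t *out)
{
    assert(out != NULL);
    if (!h->dirty_capable) {
        return -EOPNOTSUPP;
    }
    *out = h->dirty;
    if (!(flags & SKETCH_GET_DIRTY_NO_CLEAR)) {
        h->dirty = 0;
    }
    return 0;
}
```

In this model a write made before tracking is enabled is never reported, and
a read with the no-clear flag leaves the bits in place for a later
read-and-clear.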
>>>> * Patches 8-10: Don't block live migration where there's no VF dirty
>>>> tracker, considering that we have IOMMU dirty tracking.
>>>>
>>>> Comments and feedback appreciated.
>>>>
>>>> Cheers,
>>>>      Joao
>>>>
>>>> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there
>>>> are some bugs fixed there with regards to IOMMU hugepage dirty tracking.
>>>>
>>>> Changes since RFCv2[4]:
>>>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING even if
>>>> we end up not actually toggling dirty tracking. (Avihai)
>>>> * Fix error handling widely in auto domains logic and all patches (Avihai)
>>>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong)
>>>> * New patches 1 and 2 taking into consideration previous comments.
>>>> * Store hwpt::flags to know if we have dirty tracking (Avihai)
>>>> * New patch 8, which allows querying dirty tracking support after
>>>> provisioning. This is a cleaner way to check IOMMU dirty tracking support
>>>> when vfio::migration is initialized, as opposed to RFCv2 via device caps.
>>>> The device caps way is still used because at vfio attach we don't yet
>>>> have a fully initialized migration state.
>>>> * Adopt error propagation in query/set dirty tracking
>>>> * Misc improvements overall broadly and Avihai
>>>> * Drop hugepages as it's a bit unrelated; I can pursue that patch
>>>> separately. The main motivation is to provide a way to test
>>>> without hugepages similar to what vfio_type1_iommu.disable_hugepages=1
>>>> does.
>>>>
>>>> Changes since RFCv1[2]:
>>>> * Remove intel/amd dirty tracking emulation enabling
>>>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking
>>>> [Will pursue these two in separate series]
>>>> * Introduce auto domains support
>>>> * Enforce dirty tracking following the IOMMUFD UAPI for this
>>>> * Add support for toggling hugepages in IOMMUFD
>>>> * Auto enable support when VF supports migration to use IOMMU
>>>> when it doesn't have VF dirty tracking
>>>> * Add a parameter to toggle VF dirty tracking
>>>>
>>>> [0]
>>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1-zhenzhong.duan@intel.com/
>>>> [1]
>>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10-zhenzhong.duan@intel.com/
>>>> [2]
>>>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1-joao.m.martins@oracle.com/
>>>> [3]
>>>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1-joao.m.martins@oracle.com/
>>>> [4]
>>>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1-joao.m.martins@oracle.com/
>>>>
>>>> Joao Martins (10):
>>>>    vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure
>>>>    backends/iommufd: Extend iommufd_backend_get_device_info() to fetch HW capabilities
>>>>    vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt()
>>>>    vfio/iommufd: Introduce auto domain creation
>>>>    vfio/iommufd: Probe and request hwpt dirty tracking capability
>>>>    vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking support
>>>>    vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap support
>>>>    vfio/iommufd: Parse hw_caps and store dirty tracking support
>>>>    vfio/migration: Don't block migration device dirty tracking is unsupported
>>>>    vfio/common: Allow disabling device dirty page tracking
>>>>
>>>>   include/hw/vfio/vfio-common.h      |  11 ++
>>>>   include/sysemu/host_iommu_device.h |   2 +
>>>>   include/sysemu/iommufd.h           |  12 +-
>>>>   backends/iommufd.c                 |  81 ++++++++++-
>>>>   hw/vfio/common.c                   |   3 +
>>>>   hw/vfio/iommufd.c                  | 217 +++++++++++++++++++++++++++--
>>>>   hw/vfio/migration.c                |   7 +-
>>>>   hw/vfio/pci.c                      |   3 +
>>>>   backends/trace-events              |   3 +
>>>>   9 files changed, 325 insertions(+), 14 deletions(-)
>>>
>>>
>>> I am a bit confused with all the inline proposals. Would you mind
>>> resending a v4 please ?
>>>
>>
>> Yeap, I'll send it out today, or worst case tomorrow morning.
>>
>>> Regarding my comments on error handling,
>>>
>>> The error should be set in case of failure, which means a routine
>>> cannot return 'false' or '-errno' without also setting the 'Error **'
>>> parameter.
>>>
>>> If the returned value needs to be interpreted in some ways, for a
>>> retry or any reason, then it makes sense to use an int, else please
>>> use a bool. This is to avoid random negative values being interpreted
>>> as an errno when they are not.
>>>
>> OK, I'll retain the Error* creation even when expecting to test the errno.
>>
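To make the convention above concrete, here is a minimal standalone sketch.
It deliberately does not use QEMU's real Error API: SketchError,
sketch_error_set() and the other sketch_* names are hypothetical stand-ins,
and treating -EINVAL as "incompatible, try the next candidate" is an
assumption made purely for illustration. The point it shows is that both
the bool and the int flavours set the error on failure, and that int is
used only where the caller interprets the value.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Minimal stand-in for an error object; NOT QEMU's real Error API. */
typedef struct SketchError {
    char msg[128];
} SketchError;

/* Hypothetical helper: record a message unless the caller passed NULL. */
static void sketch_error_set(SketchError **errp, const char *msg)
{
    if (!errp) {
        return;
    }
    assert(*errp == NULL);  /* an already-set error must not be overwritten */
    *errp = calloc(1, sizeof(**errp));
    if (*errp) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* bool flavour: the caller only needs to know success or failure. */
static bool sketch_enable_tracking(bool supported, SketchError **errp)
{
    if (!supported) {
        sketch_error_set(errp, "dirty tracking is not supported");
        return false;
    }
    return true;
}

/* int flavour: the caller inspects the errno, yet the error is still set. */
static int sketch_attach(int simulated_errno, SketchError **errp)
{
    if (simulated_errno) {
        sketch_error_set(errp, strerror(simulated_errno));
        return -simulated_errno;
    }
    return 0;
}

/* Try each candidate in turn; only -EINVAL is treated as retryable. */
static bool sketch_attach_any(const int *outcomes, size_t n,
                              SketchError **errp)
{
    for (size_t i = 0; i < n; i++) {
        SketchError *local = NULL;
        int ret = sketch_attach(outcomes[i], &local);

        if (ret == 0) {
            return true;
        }
        if (ret != -EINVAL) {
            if (errp) {
                *errp = local;  /* propagate the low-level error upwards */
            } else {
                free(local);
            }
            return false;
        }
        free(local);            /* expected failure: discard it and retry */
    }
    sketch_error_set(errp, "no compatible candidate found");
    return false;
}
```

The caller of sketch_attach_any() gets a plain bool back, because by that
point the errno has already been interpreted and only the error object
carries the details.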
>>> With VFIO migration support, low level errors (from the adapter FW
>>> through the VFIO PCI variant driver) now reach to the core migration
>>> subsystem. It is preferable to propagate this error, possibly literal,
>>> to the VMM, monitor or libvirt. It's not fully symmetric today because
>>> the log_global_stop handler for dirty tracking enablement is not
>>> addressed. Anyhow, an effort on error reporting needs to be made and
>>> any use of error_report() in a low level function is a sign for
>>> improvement.
>>>
>> Gotcha. My earlier comment was mostly that it sounded like there was no
>> place for returning -errno, but it seems it's not that binary and the
>> Error* is the thing that really matters here.
>>
>>> I think it would have value to probe early the host IOMMU device for
>>> its HW features. If the results were cached in the HostIOMMUDevice
>>> struct, it would then remove unnecessary and redundant calls to the
>>> host kernel and avoid error handling in complex code paths. I hope
>>> this is feasible. I haven't looked closely tbh.
>>>
>> OK, I'll post in this series what I had inline[0], as that's what I did.
>>
>> [0]
>> https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133-59170c471e24@oracle.com/
>>
>> The gotcha in my opinion is that I cache IOMMUFD specific data returned by
>> the GET_HW_INFO ioctl inside a new HostIOMMUDeviceCaps::iommufd. The
>> reason being that vfio_device_get_aw_bits() has a hidden assumption that
>> the container is already populated with the list of allowed iova ranges,
>> which is not true for the first device. So rather than have a partial set
>> of caps initialized, I essentially ended up fetching the raw caps and
>> storing them, and serializing caps into named features (e.g.
>> caps::aw_bits) in HostIOMMUDevice::realize().
> 
> Another way is to call vfio_device_get_aw_bits() and return its result directly
> in get_cap(), then no need to initialize caps::aw_bits.
> This way host IOMMU device can be moved ahead as Cédric suggested.

Oh, yes, that's a great alternative. Let me adopt that instead, so we don't
need to make such large structural changes.
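
A rough sketch of the idea as I understand it, assuming hypothetical types
and names throughout (SketchContainer, SketchHostIOMMUDevice,
sketch_get_cap() and friends are not the QEMU structures): the address-width
capability is computed from the container's IOVA ranges at query time
rather than cached at realize time, so it does not matter whether the
ranges were populated when the host IOMMU device was created. The "unknown"
value of 64 is also just an assumption for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability identifiers, for illustration only. */
enum { SKETCH_CAP_IOMMU_TYPE, SKETCH_CAP_AW_BITS };

/* Assumed value reported while no IOVA ranges are known yet. */
#define SKETCH_AW_BITS_UNKNOWN 64

typedef struct SketchRange {
    uint64_t start;
    uint64_t last;   /* inclusive upper bound */
} SketchRange;

typedef struct SketchContainer {
    const SketchRange *ranges;
    size_t nr_ranges;
} SketchContainer;

typedef struct SketchHostIOMMUDevice {
    int type;
    const SketchContainer *container;
} SketchHostIOMMUDevice;

/* Derive the address width from the highest usable IOVA in the container. */
static int sketch_get_aw_bits(const SketchContainer *c)
{
    uint64_t last = 0;
    int bits = 0;

    assert(c != NULL);
    if (c->nr_ranges == 0) {
        return SKETCH_AW_BITS_UNKNOWN;
    }
    for (size_t i = 0; i < c->nr_ranges; i++) {
        if (c->ranges[i].last > last) {
            last = c->ranges[i].last;
        }
    }
    while (last) {   /* count the bits needed to represent 'last' */
        bits++;
        last >>= 1;
    }
    return bits;
}

/* Capability query: aw_bits is computed on demand, nothing is cached. */
static int sketch_get_cap(const SketchHostIOMMUDevice *hiod, int cap)
{
    switch (cap) {
    case SKETCH_CAP_IOMMU_TYPE:
        return hiod->type;
    case SKETCH_CAP_AW_BITS:
        return sketch_get_aw_bits(hiod->container);
    default:
        return -EINVAL;
    }
}
```

With this shape, a query made before the container has any ranges returns
the "unknown" value, and the same query made later reflects the ranges that
were added in the meantime, with no re-initialization step in between.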



