From: Alex Williamson <alex.williamson@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: "Jason Gunthorpe" <jgg@nvidia.com>,
"Avihai Horon" <avihaih@nvidia.com>,
qemu-devel@nongnu.org, "Cédric Le Goater" <clg@redhat.com>,
"Juan Quintela" <quintela@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
"Michael S. Tsirkin" <mst@redhat.com>,
"Peter Xu" <peterx@redhat.com>,
"Jason Wang" <jasowang@redhat.com>,
"Marcel Apfelbaum" <marcel.apfelbaum@gmail.com>,
"Paolo Bonzini" <pbonzini@redhat.com>,
"Richard Henderson" <richard.henderson@linaro.org>,
"Eduardo Habkost" <eduardo@habkost.net>,
"David Hildenbrand" <david@redhat.com>,
"Philippe Mathieu-Daudé" <philmd@linaro.org>,
"Yishai Hadas" <yishaih@nvidia.com>,
"Maor Gottlieb" <maorg@nvidia.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Tarun Gupta" <targupta@nvidia.com>
Subject: Re: [PATCH v2 17/20] vfio/common: Support device dirty page tracking with vIOMMU
Date: Fri, 24 Feb 2023 08:56:34 -0700 [thread overview]
Message-ID: <20230224085634.149e3ad2.alex.williamson@redhat.com> (raw)
In-Reply-To: <c66d2d8e-f042-964a-a797-a3d07c260a3b@oracle.com>
On Fri, 24 Feb 2023 12:53:26 +0000
Joao Martins <joao.m.martins@oracle.com> wrote:
> On 24/02/2023 11:25, Joao Martins wrote:
> > On 23/02/2023 23:26, Jason Gunthorpe wrote:
> >> On Thu, Feb 23, 2023 at 03:33:09PM -0700, Alex Williamson wrote:
> >>> On Thu, 23 Feb 2023 16:55:54 -0400
> >>> Jason Gunthorpe <jgg@nvidia.com> wrote:
> >>>> On Thu, Feb 23, 2023 at 01:06:33PM -0700, Alex Williamson wrote:
> >>>> Or even better figure out how to get interrupt remapping without IOMMU
> >>>> support :\
> >>>
> >>> -machine q35,default_bus_bypass_iommu=on,kernel-irqchip=split \
> >>> -device intel-iommu,caching-mode=on,intremap=on
> >>
> >> Joao?
> >>
> >> If this works lets just block migration if the vIOMMU is turned on..
> >
> > At first glance, this looked like my regular iommu incantation.
> >
> > But reading the code, this ::bypass_iommu (new to me) apparently controls
> > whether the vIOMMU is bypassed for the PCI devices, all the way to avoiding
> > their enumeration in the IVRS/DMAR ACPI tables. And I see VFIO double-checks
> > whether the PCI device is within the IOMMU address space (or bypassed) prior
> > to DMA maps and such.
> >
> > You can see from the other email that all of the other options in my head
> > were either a bit inconvenient or risky. I wasn't aware of this option, for
> > what it's worth -- much simpler, should work!
> >
>
> I say *should*, but on second thought interrupt remapping may still be
> required by one of these IOMMU-bypassed devices -- say, to set interrupt
> affinities to vCPUs above 255? I was trying this out with more than 255
> vCPUs and a couple of VFs, and at first glance those VFs fail to probe
> (these are CX6 VFs).
>
> The setup works without the parameter, but adding
> default_bus_bypass_iommu=on makes VF init fail:
>
> [ 32.412733] mlx5_core 0000:00:02.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
> [ 32.416242] mlx5_core 0000:00:02.0: mlx5_load:1204:(pid 3361): Failed to alloc IRQs
> [ 33.227852] mlx5_core 0000:00:02.0: probe_one:1684:(pid 3361): mlx5_init_one failed with error code -19
> [ 33.242182] mlx5_core 0000:00:03.0: firmware version: 22.31.1660
> [ 33.415876] mlx5_core 0000:00:03.0: Rate limit: 127 rates are supported, range: 0Mbps to 97656Mbps
> [ 33.448016] mlx5_core 0000:00:03.0: mlx5_load:1204:(pid 3361): Failed to alloc IRQs
> [ 34.207532] mlx5_core 0000:00:03.0: probe_one:1684:(pid 3361): mlx5_init_one failed with error code -19
>
> I haven't yet dug into why it fails.

Hmm, I was thinking this would only affect DMA, but on second thought I
think the DRHD also describes the interrupt remapping hardware, and while
interrupt remapping is an optional feature of the DRHD, DMA remapping is
always supported, afaict.  I saw IR vectors in /proc/interrupts and thought
it worked, but indeed an assigned device is having trouble getting vectors.
>
> > And avoiding vIOMMU simplifies the whole patchset too, if it's OK to add a live
> > migration blocker if `bypass_iommu` is off for any PCI device.
> >
>
> Still we could have for starters a live migration blocker until we revisit the
> vIOMMU case ... should we deem that the default_bus_bypass_iommu=on or the
> others I suggested as non-options?

I'm very uncomfortable presuming a vIOMMU usage model, especially when it
leads to potentially untracked DMA if our assumptions are violated.  We
could use a MemoryListener on the IOVA space to record a high water mark,
but we'd need to keep monitoring that mark while we're in pre-copy, and I
don't think anyone would consider it supportable for a migratable VM to
suddenly become unmigratable due to a random IOVA allocation.  That leads
me to think that a machine option to limit the vIOMMU address space, and
testing that limit against the device prior to declaring migration support
of the device, is possibly our best option.

Is that feasible?  Do all the vIOMMU models have a means to limit the IOVA
space?  How does QEMU learn a limit for a given device?  We probably also
need to think about whether there are devices that can even support the
guest physical memory ranges once we start relocating RAM to arbitrary
addresses (ex. HyperTransport).  Can we infer anything from the vCPU
virtual address space, or is that still an unreasonable range to track for
devices?  Thanks,
Alex