From: Joao Martins <joao.m.martins@oracle.com>
To: Zhangfei Gao <zhangfei.gao@gmail.com>
Cc: qemu-devel@nongnu.org,
	Alex Williamson <alex.williamson@redhat.com>,
	Cedric Le Goater <clg@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Philippe Mathieu-Daude <philmd@linaro.org>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Marcel Apfelbaum <marcel.apfelbaum@gmail.com>,
	Jason Wang <jasowang@redhat.com>,
	Richard Henderson <richard.henderson@linaro.org>,
	Eduardo Habkost <eduardo@habkost.net>,
	Avihai Horon <avihaih@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Yi Liu <yi.l.liu@intel.com>
Subject: Re: [PATCH v4 00/15] vfio: VFIO migration support with vIOMMU
Date: Tue, 21 Jan 2025 16:42:46 +0000
Message-ID: <62d4bffa-a912-48b4-9a7c-b16b21bffb7a@oracle.com>
In-Reply-To: <CAMj5Bki73PNZdZvNAsK1YJiWGMeZugQCZ18QPekCM5EN61QqBg@mail.gmail.com>

On 07/01/2025 06:55, Zhangfei Gao wrote:
> Hi, Joao
> 
> On Fri, Jun 23, 2023 at 5:51 AM Joao Martins <joao.m.martins@oracle.com> wrote:
>>
>> Hey,
>>
>> This series introduces support for vIOMMU with VFIO device migration,
>> particularly related to how we do the dirty page tracking.
>>
>> Today vIOMMUs serve two purposes: 1) enable interrupt remapping 2)
>> provide DMA translation services for guests to provide some form of
>> guest kernel managed DMA e.g. for nested virt based usage; (1) is especially
>> required for big VMs with VFs with more than 255 vcpus. We tackle both
>> and remove the migration blocker when vIOMMU is present provided the
>> conditions are met. I have both use-cases here in one series, but I am happy
>> to tackle them in separate series.
>>
>> As I found out we don't necessarily need to expose the whole vIOMMU
>> functionality in order to just support interrupt remapping. x86 IOMMUs
>> on Windows Server 2018[2] and Linux >=5.10, with qemu 7.1+ (or really
>> Linux guests with commit c40aaaac10 and since qemu commit 8646d9c773d8)
>> can instantiate an IOMMU just for interrupt remapping without needing to
>> be advertised/support DMA translation. AMD IOMMU in theory can provide
>> the same, but Linux doesn't quite support the IR-only part there yet,
>> only intel-iommu.
>>
>> The series is organized as follows:
>>
>> Patches 1-5: Today we can't gather vIOMMU details before the guest
>> establishes their first DMA mapping via the vIOMMU. So these first four
>> patches add a way for vIOMMUs to be asked of their properties at start
>> of day. I choose the least churn possible way for now (as opposed to a
>> treewide conversion) and allow easy conversion a posteriori. As
>> suggested by Peter Xu[7], I have resurrected Yi's patches[5][6] which
>> allow us to fetch PCI backing vIOMMU attributes, without necessarily
>> tying the caller (VFIO or anyone else) to an IOMMU MR like I
>> was doing in v3.
>>
>> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
>> DMA translation allowed. Today the 'dma-translation' attribute is
>> x86-iommu only, but the way this series is structured nothing stops
>> other vIOMMUs from supporting it too as long as they use
>> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr attributes
>> are handled. The blocker is thus relaxed when vIOMMUs are able to
>> toggle/report the DMA_TRANSLATION attribute. With the patches up to this set,
>> we've then tackled item (1) of the second paragraph.
> 
> I don't understand how the device page table is handled.
> 
> Does this mean that after live migration, the page table built by the vIOMMU
> will be rebuilt in the target guest via pci_setup_iommu_ops?

AFAIU it is supposed to be done after loading the vIOMMU vmstate, when enabling
the vIOMMU-related MRs. When walking the different 'emulated' address spaces,
it will replay all mappings (and skip non-present parts of the address space).

The trick in making this work largely depends on the individual vIOMMU
implementation (and this emulated vIOMMU stuff shouldn't be confused with IOMMU
nesting, btw!). In the Intel case (and AMD will be similar), the root table
pointer that's part of the vmstate leads to all the device pagetables, which are
just guest memory that gets migrated over and is enough to resolve VT-d/IVRS
page walks.

The somewhat hard-to-follow part is that the replay walks the whole DMAR memory
region and notifies IOMMU MR listeners only where there's a present PTE,
skipping everything else. So by the end of enabling the MRs, the IOTLB has been
reconstructed. You would still have to work through the flow for the vIOMMU you
are using, though.
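
To make the "notify only on present PTEs" idea a bit more concrete, here is a
minimal standalone sketch in C. It is not QEMU code: every name in it (ToyTable,
toy_walk_level, toy_replay, toy_record_hook) and the tiny 3-level, 4-entry
table geometry are made up for illustration. It only loosely mirrors the shape
of the recursive walk plus the replay hook, assuming a leaf entry maps one 4K
page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy geometry, far smaller than a real IOMMU page table. */
#define TOY_LEVELS      3
#define TOY_INDEX_BITS  2
#define TOY_ENTRIES     (1 << TOY_INDEX_BITS)
#define TOY_PAGE_SHIFT  12
#define TOY_LOG_MAX     16

typedef struct ToyTable ToyTable;

typedef struct ToyEntry {
    bool present;       /* non-present entries are skipped by the walk */
    ToyTable *next;     /* next-level table, used when level > 1 */
    uint64_t target;    /* translated address, used at the leaf level */
} ToyEntry;

struct ToyTable {
    ToyEntry e[TOY_ENTRIES];
};

/* Stand-in for "notify the IOMMU MR listener about one mapping". */
typedef void (*toy_replay_hook)(uint64_t iova, uint64_t target, void *opaque);

/* A listener that just records what it was told to map. */
typedef struct ToyLog {
    size_t count;
    uint64_t iova[TOY_LOG_MAX];
    uint64_t target[TOY_LOG_MAX];
} ToyLog;

static void toy_record_hook(uint64_t iova, uint64_t target, void *opaque)
{
    ToyLog *log = opaque;

    if (log->count < TOY_LOG_MAX) {
        log->iova[log->count] = iova;
        log->target[log->count] = target;
        log->count++;
    }
}

/* Walk one level; recurse on present non-leaf entries, notify on leaves. */
static size_t toy_walk_level(const ToyTable *table, int level, uint64_t base,
                             toy_replay_hook hook, void *opaque)
{
    size_t notified = 0;
    unsigned shift = TOY_PAGE_SHIFT + (unsigned)(level - 1) * TOY_INDEX_BITS;

    assert(level >= 1 && level <= TOY_LEVELS);
    for (int i = 0; i < TOY_ENTRIES; i++) {
        const ToyEntry *entry = &table->e[i];
        uint64_t iova = base + ((uint64_t)i << shift);

        if (!entry->present) {
            continue;               /* hole in the address space: no notify */
        }
        if (level == 1) {
            hook(iova, entry->target, opaque);
            notified++;
        } else if (entry->next) {
            notified += toy_walk_level(entry->next, level - 1, iova,
                                       hook, opaque);
        }
    }
    return notified;
}

/* Replay the whole toy address space from the root table. */
static size_t toy_replay(const ToyTable *root, toy_replay_hook hook,
                         void *opaque)
{
    return toy_walk_level(root, TOY_LEVELS, 0, hook, opaque);
}
```

Calling toy_replay() on a root table with only a couple of present leaf entries
invokes the hook just for those entries, which is the property that lets the
destination rebuild its mappings purely from the migrated guest memory.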

The replay in intel-iommu is triggered via more or less this stack trace
(innermost frame first) for a present PTE:

vfio_iommu_map_notify
memory_region_notify_iommu_one
vtd_replay_hook
vtd_page_walk_one
vtd_page_walk_level
vtd_page_walk_level
vtd_page_walk_level
vtd_page_walk
vtd_iommu_replay
memory_region_iommu_replay
vfio_listener_region_add
address_space_update_topology_pass
address_space_set_flatview
memory_region_transaction_commit
vtd_switch_address_space
vtd_switch_address_space_all
vtd_post_load
vmstate_load_state
vmstate_load
qemu_loadvm_section_start_full
qemu_loadvm_state_main
qemu_loadvm_state
process_incoming_migration_co


