From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: cohuck@redhat.com, cjia@nvidia.com, aik@ozlabs.ru,
Zhengxiao.zx@Alibaba-inc.com, shuangtai.tst@alibaba-inc.com,
qemu-devel@nongnu.org, peterx@redhat.com, eauger@redhat.com,
yi.l.liu@intel.com, quintela@redhat.com, ziye.yang@intel.com,
armbru@redhat.com, mlevitsk@redhat.com, pasic@linux.ibm.com,
felipe@nutanix.com, zhi.a.wang@intel.com, kevin.tian@intel.com,
yan.y.zhao@intel.com, dgilbert@redhat.com,
changpeng.liu@intel.com, eskultet@redhat.com, Ken.Xue@amd.com,
jonathan.davies@nutanix.com, pbonzini@redhat.com
Subject: Re: [PATCH QEMU v25 14/17] vfio: Add vfio_listener_log_sync to mark dirty pages
Date: Thu, 25 Jun 2020 11:57:12 -0600
Message-ID: <20200625115712.630a592f@x1.home>
In-Reply-To: <33d35c69-437a-b7b8-821f-6356d559e2ef@nvidia.com>
On Thu, 25 Jun 2020 20:13:39 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:
> On 6/25/2020 12:25 AM, Alex Williamson wrote:
> > On Sun, 21 Jun 2020 01:51:23 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >
> >> vfio_listener_log_sync gets the list of dirty pages from the container
> >> using the VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and marks those pages dirty
> >> when all devices are stopped and saving state.
> >> Return early for the RAM block section of a mapped MMIO region.
> >>
> >> Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
> >> Reviewed-by: Neo Jia <cjia@nvidia.com>
> >> ---
> >> hw/vfio/common.c | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >> hw/vfio/trace-events | 1 +
> >> 2 files changed, 131 insertions(+)
> >>
> >> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> >> index 6921a78e9ba5..0518cf228ed5 100644
> >> --- a/hw/vfio/common.c
> >> +++ b/hw/vfio/common.c
> >> @@ -29,6 +29,7 @@
> >> #include "hw/vfio/vfio.h"
> >> #include "exec/address-spaces.h"
> >> #include "exec/memory.h"
> >> +#include "exec/ram_addr.h"
> >> #include "hw/hw.h"
> >> #include "qemu/error-report.h"
> >> #include "qemu/main-loop.h"
> >> @@ -38,6 +39,7 @@
> >> #include "sysemu/reset.h"
> >> #include "trace.h"
> >> #include "qapi/error.h"
> >> +#include "migration/migration.h"
> >>
> >> VFIOGroupList vfio_group_list =
> >> QLIST_HEAD_INITIALIZER(vfio_group_list);
> >> @@ -288,6 +290,28 @@ const MemoryRegionOps vfio_region_ops = {
> >> };
> >>
> >> /*
> >> + * Device state interfaces
> >> + */
> >> +
> >> +static bool vfio_devices_are_stopped_and_saving(void)
> >> +{
> >> + VFIOGroup *group;
> >> + VFIODevice *vbasedev;
> >> +
> >> + QLIST_FOREACH(group, &vfio_group_list, next) {
> >
> > Should this be passed the container in order to iterate
> > container->group_list?
> >
> >> + QLIST_FOREACH(vbasedev, &group->device_list, next) {
> >> + if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
> >> + !(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
> >> + continue;
> >> + } else {
> >> + return false;
> >> + }
> >> + }
> >> + }
> >> + return true;
> >> +}
> >> +
> >> +/*
> >> * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> >> */
> >> static int vfio_dma_unmap(VFIOContainer *container,
> >> @@ -852,9 +876,115 @@ static void vfio_listener_region_del(MemoryListener *listener,
> >> }
> >> }
> >>
> >> +static int vfio_get_dirty_bitmap(VFIOContainer *container, uint64_t iova,
> >> + uint64_t size, ram_addr_t ram_addr)
> >> +{
> >> + struct vfio_iommu_type1_dirty_bitmap *dbitmap;
> >> + struct vfio_iommu_type1_dirty_bitmap_get *range;
> >> + uint64_t pages;
> >> + int ret;
> >> +
> >> + dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
> >> +
> >> + dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
> >> + dbitmap->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;
> >> + range = (struct vfio_iommu_type1_dirty_bitmap_get *)&dbitmap->data;
> >> + range->iova = iova;
> >> + range->size = size;
> >> +
> >> + /*
> >> + * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
> >> + * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap's pgsize to
> >> + * TARGET_PAGE_SIZE.
> >> + */
> >> + range->bitmap.pgsize = TARGET_PAGE_SIZE;
> >> +
> >> + pages = TARGET_PAGE_ALIGN(range->size) >> TARGET_PAGE_BITS;
> >> + range->bitmap.size = ROUND_UP(pages, sizeof(__u64) * BITS_PER_BYTE) /
> >> + BITS_PER_BYTE;
> >> + range->bitmap.data = g_try_malloc0(range->bitmap.size);
> >> + if (!range->bitmap.data) {
> >> + ret = -ENOMEM;
> >> + goto err_out;
> >> + }
> >> +
> >> + ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
> >> + if (ret) {
> >> + error_report("Failed to get dirty bitmap for iova: 0x%llx "
> >> + "size: 0x%llx err: %d",
> >> + range->iova, range->size, errno);
> >> + goto err_out;
> >> + }
> >> +
> >> + cpu_physical_memory_set_dirty_lebitmap((uint64_t *)range->bitmap.data,
> >> + ram_addr, pages);
> >> +
> >> + trace_vfio_get_dirty_bitmap(container->fd, range->iova, range->size,
> >> + range->bitmap.size, ram_addr);
> >> +err_out:
> >> + g_free(range->bitmap.data);
> >> + g_free(dbitmap);
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static int vfio_sync_dirty_bitmap(MemoryListener *listener,
> >> + MemoryRegionSection *section)
> >> +{
> >> + VFIOContainer *container = container_of(listener, VFIOContainer, listener);
> >> + VFIOGuestIOMMU *giommu = NULL;
> >> + ram_addr_t ram_addr;
> >> + uint64_t iova, size;
> >> + int ret = 0;
> >> +
> >> + if (memory_region_is_iommu(section->mr)) {
> >> +
> >> + QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> >> + if (MEMORY_REGION(giommu->iommu) == section->mr &&
> >> + giommu->n.start == section->offset_within_region) {
> >> + VFIOIovaRange *iova_range;
> >> +
> >> + QLIST_FOREACH(iova_range, &giommu->iova_list, next) {
> >> + ret = vfio_get_dirty_bitmap(container, iova_range->iova,
> >> + iova_range->size, iova_range->ram_addr);
> >> + if (ret) {
> >> + break;
> >> + }
> >> + }
> >> + break;
> >> + }
> >> + }
> >> +
> >> + } else {
> >> + iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> >> + size = int128_get64(section->size);
> >> +
> >> + ram_addr = memory_region_get_ram_addr(section->mr) +
> >> + section->offset_within_region + iova -
> >> + TARGET_PAGE_ALIGN(section->offset_within_address_space);
> >> +
> >> + ret = vfio_get_dirty_bitmap(container, iova, size, ram_addr);
> >> + }
> >> +
> >> + return ret;
> >> +}
> >> +
> >> +static void vfio_listerner_log_sync(MemoryListener *listener,
> >> + MemoryRegionSection *section)
> >> +{
> >> + if (vfio_listener_skipped_section(section)) {
> >> + return;
> >> + }
> >> +
> >> + if (vfio_devices_are_stopped_and_saving()) {
> >> + vfio_sync_dirty_bitmap(listener, section);
> >> + }
> >
> >
> > How do we decide that this is the best policy for all devices? For
> > example if a device does not support page pinning or some future means
> > of marking dirtied pages, this is clearly the right choice, but when
> > these are supported, aren't we deferring all dirty logging info until
> > the final stage? Thanks,
> >
>
> Yes, for now we are deferring all dirty logging to the stop-and-copy
> phase. In future, whenever hardware support for dirty page tracking
> gets added, a flag will be added to the migration capability in the
> VFIO_IOMMU_GET_INFO capability list. Based on that we can decide to
> get dirty pages in an earlier phase of migration.
So the flag that you're expecting to add would indicate that the IOMMU
is reporting actual page dirtying, not just assuming pinned pages are
dirty, as we have now. It's too bad we can't collect the
previously-but-not-currently-pinned pages during the iterative phase.
Thanks,
Alex