From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: mcrossley@nvidia.com, cjia@nvidia.com,
Cornelia Huck <cohuck@redhat.com>,
qemu-devel@nongnu.org, dnigam@nvidia.com, philmd@redhat.com
Subject: Re: [PATCH v1] docs/devel: Add VFIO device migration documentation
Date: Wed, 4 Nov 2020 05:45:27 -0700
Message-ID: <20201104054527.22bbace7@x1.home>
In-Reply-To: <a27dee38-2fa9-a6ae-de30-eb7b57629393@nvidia.com>

On Wed, 4 Nov 2020 13:25:40 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:
> On 11/4/2020 1:57 AM, Alex Williamson wrote:
> > On Wed, 4 Nov 2020 01:18:12 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >
> >> On 10/30/2020 12:35 AM, Alex Williamson wrote:
> >>> On Thu, 29 Oct 2020 23:11:16 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>
> >>
> >> <snip>
> >>
> >>>>>> +System memory dirty pages tracking
> >>>>>> +----------------------------------
> >>>>>> +
> >>>>>> +A ``log_sync`` memory listener callback is added to mark system memory pages
> >>>>>
> >>>>> s/is added to mark/marks those/
> >>>>>
> >>>>>> +as dirty which are used for DMA by VFIO device. Dirty pages bitmap is queried
> >>>>>
> >>>>> s/by/by the/
> >>>>> s/Dirty/The dirty/
> >>>>>
> >>>>>> +per container. All pages pinned by vendor driver through vfio_pin_pages()
> >>>>>
> >>>>> s/by/by the/
> >>>>>
> >>>>>> +external API have to be marked as dirty during migration. When there are CPU
> >>>>>> +writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
> >>>>>> +by vendor driver can also be written by device. There is currently no device
> >>>>>
> >>>>> s/by/by the/ (x2)
> >>>>>
> >>>>>> +which has hardware support for dirty page tracking. So all pages which are
> >>>>>> +pinned by vendor driver are considered as dirty.
> >>>>>> +Dirty pages are tracked when device is in stop-and-copy phase because if pages
> >>>>>> +are marked dirty during pre-copy phase and content is transfered from source to
> >>>>>> +destination, there is no way to know newly dirtied pages from the point they
> >>>>>> +were copied earlier until device stops. To avoid repeated copy of same content,
> >>>>>> +pinned pages are marked dirty only during stop-and-copy phase.
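
(For reference, the log_sync hook described in the patch text above looks
roughly like this in QEMU. This is a minimal sketch modelled on the era's
hw/vfio/common.c; vfio_sync_dirty_bitmap() stands in for the real helper
chain, and section filtering plus error handling are omitted.)

/*
 * Per-container log_sync callback: when the migration core
 * synchronizes dirty logs, query the kernel's dirty bitmap for this
 * section and fold it into QEMU's RAM dirty page tracking.
 */
static void vfio_listener_log_sync(MemoryListener *listener,
                                   MemoryRegionSection *section)
{
    VFIOContainer *container = container_of(listener, VFIOContainer,
                                            listener);

    if (!container->dirty_pages_supported) {
        return;
    }

    /* Issues VFIO_IOMMU_DIRTY_PAGES and feeds the result to
     * cpu_physical_memory_set_dirty_lebitmap(). */
    vfio_sync_dirty_bitmap(container, section);
}

static MemoryListener vfio_memory_listener = {
    /* .region_add, .region_del, .log_start, .log_stop, ... */
    .log_sync = vfio_listener_log_sync,
};
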
> >>>>
> >>>>
> >>>>> Let me take a quick stab at rewriting this paragraph (not sure if I
> >>>>> understood it correctly):
> >>>>>
> >>>>> "Dirty pages are tracked when the device is in the stop-and-copy phase.
> >>>>> During the pre-copy phase, it is not possible to distinguish a dirty
> >>>>> page that has been transferred from the source to the destination from
> >>>>> newly dirtied pages, which would lead to repeated copying of the same
> >>>>> content. Therefore, pinned pages are only marked dirty during the
> >>>>> stop-and-copy phase." ?
> >>>>>
> >>>>
> >> I think the above rephrasing only covers repeated copying in the
> >> pre-copy phase. I used "copied earlier until device stops" to cover
> >> both pre-copy and stop-and-copy, up until the device stops.
> >>>
> >>>
> >>> Now I'm confused, I thought we had abandoned the idea that we can only
> >>> report pinned pages during stop-and-copy. Doesn't the device need to
> >>> expose its dirty memory footprint during the iterative phase regardless
> >>> of whether that causes repeat copies? If QEMU iterates and sees that
> >>> all memory is still dirty, it may have transferred more data, but it
> >>> can actually predict if it can achieve its downtime tolerances. Which
> >>> is more important, less data transfer or predictability? Thanks,
> >>>
> >>
> >> Even if QEMU copies and transfers the content of all system memory
> >> pages during pre-copy (the worst case with an IOMMU-backed mdev
> >> device whose vendor driver is not smart enough to pin pages
> >> explicitly, so all system memory pages are marked dirty), its
> >> prediction of the downtime tolerance will still not be correct,
> >> because during stop-and-copy all pages need to be copied again, as
> >> the device can write to any of those pinned pages.
> >
> > I think you're only reiterating my point. If QEMU copies all of guest
> > memory during the iterative phase and each time it sees that all memory
> > is dirty, such as if CPUs or devices (including assigned devices) are
> > dirtying pages as fast as it copies them (or continuously mark them
> > dirty), then QEMU can predict that downtime will require copying all
> > pages.
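
(The prediction here is essentially remaining dirty bytes divided by
observed bandwidth, a simplified form of what QEMU's migration code
computes each iteration; the function name below is my own:)

#include <stdint.h>

/* If pinned pages are reported dirty on every pass, dirty_bytes never
 * shrinks below the pinned footprint, so the estimate converges
 * honestly instead of collapsing to near zero during iteration and
 * then spiking at stop-and-copy. */
static uint64_t expected_downtime_ms(uint64_t dirty_bytes,
                                     uint64_t bandwidth_bytes_per_sec)
{
    return dirty_bytes * 1000 / bandwidth_bytes_per_sec;
}
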
>
> But as of now there is no way to know whether the device has dirtied
> pages during the iterative phase.

This claim doesn't make sense: pinned pages are considered persistently
dirtied, both during the iterative phase and while the VM is stopped.
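
(Concretely, userspace fetches the per-container dirty bitmap along
these lines. Struct and flag names are from the type1 UAPI in
linux/vfio.h as of Linux 5.8; the helper itself is an illustrative
sketch, not QEMU's actual code.)

#include <linux/vfio.h>
#include <stdlib.h>
#include <sys/ioctl.h>

/*
 * Query the per-container dirty bitmap for [iova, iova + size).
 * Pages pinned through vfio_pin_pages() are reported set on every
 * query for as long as they remain pinned, i.e. persistently dirty.
 */
static int query_dirty_bitmap(int container_fd, __u64 iova, __u64 size,
                              __u64 pgsize, __u64 *data, __u64 data_bytes)
{
    struct vfio_iommu_type1_dirty_bitmap *dbitmap;
    struct vfio_iommu_type1_dirty_bitmap_get *range;
    int ret;

    dbitmap = calloc(1, sizeof(*dbitmap) + sizeof(*range));
    if (!dbitmap) {
        return -1;
    }

    dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
    dbitmap->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;

    range = (struct vfio_iommu_type1_dirty_bitmap_get *)&dbitmap->data;
    range->iova = iova;               /* start of the IOVA range */
    range->size = size;               /* length of the range in bytes */
    range->bitmap.pgsize = pgsize;    /* page size the bitmap tracks */
    range->bitmap.size = data_bytes;  /* bitmap buffer size in bytes */
    range->bitmap.data = data;        /* one bit per pgsize page */

    ret = ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, dbitmap);
    free(dbitmap);
    return ret;
}
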
> > If instead devices don't mark dirty pages until the VM is
> > stopped, then QEMU might iterate through the memory copy and predict
> > a short downtime because not much memory is dirty, only to be
> > surprised that all of memory is suddenly dirty. At that point it's
> > too late: the VM is already stopped, and the predicted short downtime
> > takes far longer than expected. This is exactly why we made the
> > kernel interface mark pinned
> > pages persistently dirty when it was proposed that we only report
> > pinned pages once. Thanks,
> >
>
> Since there is no way to know whether the device dirtied pages during
> the iterative phase, QEMU should query pinned pages only in the
> stop-and-copy phase.

As above, I don't believe this is true.

> Whenever there is hardware support or some software mechanism to
> report pages dirtied by the device, we will add a capability bit to
> the migration capabilities, and based on that bit QEMU or the user
> space app can decide whether to query dirty pages in the iterative
> phase.

Yes, we could advertise support for fine-grained dirty page tracking,
but I completely disagree that we should consider pinned pages clean
until suddenly exposing them as dirty once the VM is stopped. Thanks,

Alex
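
(For concreteness, the capability-bit gating being discussed would look
roughly like this in the log_sync path. The fine_grained_dirty_tracking
bit is hypothetical, nothing like it exists in today's UAPI; the helper
names follow the era's hw/vfio/common.c.)

static void vfio_listener_log_sync(MemoryListener *listener,
                                   MemoryRegionSection *section)
{
    VFIOContainer *container = container_of(listener, VFIOContainer,
                                            listener);

    /* The proposed gate: fetch the bitmap during the iterative phase
     * only when the device can report exactly what it dirtied
     * (hypothetical capability); without it, the bitmap is fetched
     * only once all devices are stopped and saving, which is the
     * behavior objected to above for pinned pages. */
    if (container->fine_grained_dirty_tracking /* hypothetical */ ||
        vfio_devices_all_stopped_and_saving(container)) {
        vfio_sync_dirty_bitmap(container, section);
    }
}
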