From: Alex Williamson <alex.williamson@redhat.com>
To: Kirti Wankhede <kwankhede@nvidia.com>
Cc: mcrossley@nvidia.com, cjia@nvidia.com,
	Cornelia Huck <cohuck@redhat.com>,
	qemu-devel@nongnu.org, dnigam@nvidia.com, philmd@redhat.com
Subject: Re: [PATCH v1] docs/devel: Add VFIO device migration documentation
Date: Wed, 4 Nov 2020 05:45:27 -0700
Message-ID: <20201104054527.22bbace7@x1.home>
In-Reply-To: <a27dee38-2fa9-a6ae-de30-eb7b57629393@nvidia.com>

On Wed, 4 Nov 2020 13:25:40 +0530
Kirti Wankhede <kwankhede@nvidia.com> wrote:

> On 11/4/2020 1:57 AM, Alex Williamson wrote:
> > On Wed, 4 Nov 2020 01:18:12 +0530
> > Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >   
> >> On 10/30/2020 12:35 AM, Alex Williamson wrote:  
> >>> On Thu, 29 Oct 2020 23:11:16 +0530
> >>> Kirti Wankhede <kwankhede@nvidia.com> wrote:
> >>>      
> >>
> >> <snip>
> >>  
> >>>>>> +System memory dirty pages tracking
> >>>>>> +----------------------------------
> >>>>>> +
> >>>>>> +A ``log_sync`` memory listener callback is added to mark system memory pages  
> >>>>>
> >>>>> s/is added to mark/marks those/
> >>>>>         
> >>>>>> +as dirty which are used for DMA by VFIO device. Dirty pages bitmap is queried  
> >>>>>
> >>>>> s/by/by the/
> >>>>> s/Dirty/The dirty/
> >>>>>         
> >>>>>> +per container. All pages pinned by vendor driver through vfio_pin_pages()  
> >>>>>
> >>>>> s/by/by the/
> >>>>>         
> >>>>>> +external API have to be marked as dirty during migration. When there are CPU
> >>>>>> +writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
> >>>>>> +by vendor driver can also be written by device. There is currently no device  
> >>>>>
> >>>>> s/by/by the/ (x2)
> >>>>>         
> >>>>>> +which has hardware support for dirty page tracking. So all pages which are
> >>>>>> +pinned by vendor driver are considered as dirty.
> >>>>>> +Dirty pages are tracked when device is in stop-and-copy phase because if pages
> >>>>>> +are marked dirty during pre-copy phase and content is transfered from source to
> >>>>>> +destination, there is no way to know newly dirtied pages from the point they
> >>>>>> +were copied earlier until device stops. To avoid repeated copy of same content,
> >>>>>> +pinned pages are marked dirty only during stop-and-copy phase.  
> >>>>
> >>>>     
> >>>>> Let me take a quick stab at rewriting this paragraph (not sure if I
> >>>>> understood it correctly):
> >>>>>
> >>>>> "Dirty pages are tracked when the device is in the stop-and-copy phase.
> >>>>> During the pre-copy phase, it is not possible to distinguish a dirty
> >>>>> page that has been transferred from the source to the destination from
> >>>>> newly dirtied pages, which would lead to repeated copying of the same
> >>>>> content. Therefore, pinned pages are only marked dirty during the
> >>>>> stop-and-copy phase." ?
> >>>>>         
> >>>>
> >>>> I think the above rephrasing only talks about repeated copying in the
> >>>> pre-copy phase. I used "copied earlier until device stops" to indicate
> >>>> both pre-copy and stop-and-copy, up to the point the device stops.
> >>>
> >>>
> >>> Now I'm confused; I thought we had abandoned the idea that we can only
> >>> report pinned pages during stop-and-copy.  Doesn't the device need to
> >>> expose its dirty memory footprint during the iterative phase regardless
> >>> of whether that causes repeat copies?  If QEMU iterates and sees that
> >>> all memory is still dirty, it may have transferred more data, but it
> >>> can actually predict whether it can achieve its downtime tolerances.
> >>> Which is more important, less data transfer or predictability?  Thanks,
> >>>      
> >>
> >> Even if QEMU copies and transfers the content of all system memory pages
> >> during pre-copy (the worst case is an IOMMU-backed mdev device whose
> >> vendor driver is not smart enough to pin pages explicitly, so all system
> >> memory pages are marked dirty), its prediction of the downtime tolerance
> >> will still not be correct, because during stop-and-copy all pages need
> >> to be copied again, as the device can write to any of those pinned pages.
> > 
> > I think you're only reiterating my point.  If QEMU copies all of guest
> > memory during the iterative phase and each time it sees that all memory
> > is dirty, such as if CPUs or devices (including assigned devices) are
> > dirtying pages as fast as it copies them (or continuously marks them
> > dirty), then QEMU can predict that downtime will require copying all
> > pages.   
> 
> But as of now there is no way to know if the device has dirtied pages
> during the iterative phase.


This claim doesn't make sense; pinned pages are considered persistently
dirtied, both during the iterative phase and while the VM is stopped.
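
For reference, a minimal userspace sketch (not QEMU's actual implementation)
of how the per-container dirty bitmap is queried through the type1 IOMMU
VFIO_IOMMU_DIRTY_PAGES ioctl; with the current interface, pages pinned by the
vendor driver via vfio_pin_pages() come back as set bits on every query,
i.e. persistently dirty:

#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/*
 * Query the dirty bitmap for one IOVA range.  Dirty tracking must already
 * have been started with VFIO_IOMMU_DIRTY_PAGES_FLAG_START.  The caller
 * provides a bitmap buffer with one bit per pgsize page.
 */
static int query_dirty_bitmap(int container_fd, uint64_t iova, uint64_t size,
                              uint64_t pgsize, void *bitmap,
                              uint64_t bitmap_bytes)
{
    struct vfio_iommu_type1_dirty_bitmap *db;
    struct vfio_iommu_type1_dirty_bitmap_get *range;
    size_t argsz = sizeof(*db) + sizeof(*range);
    int ret;

    db = calloc(1, argsz);
    if (!db) {
        return -ENOMEM;
    }
    db->argsz = argsz;
    db->flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP;

    range = (struct vfio_iommu_type1_dirty_bitmap_get *)db->data;
    range->iova = iova;
    range->size = size;
    range->bitmap.pgsize = pgsize;       /* tracking granularity, e.g. 4K */
    range->bitmap.size = bitmap_bytes;   /* bitmap buffer size in bytes */
    range->bitmap.data = bitmap;         /* filled in by the kernel */

    /* Pinned pages are reported as dirty here on every call. */
    ret = ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, db);
    free(db);
    return ret;
}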

 
> > If instead devices don't mark dirty pages until the VM is
> > stopped, then QEMU might iterate through the memory copy and predict a
> > short downtime because not much memory is dirty, only to be surprised
> > that all of memory is suddenly dirty.  At that point it's too late: the
> > VM is already stopped, and the predicted short downtime takes far longer
> > than expected.  This is exactly why we made the kernel interface mark
> > pinned pages persistently dirty when it was proposed that we only report
> > pinned pages once.  Thanks,
> >   
> 
> Since there is no way to know if the device dirtied pages during the
> iterative phase, QEMU should query pinned pages in the stop-and-copy phase.


As above, I don't believe this is true.
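
To make the predictability point concrete, here is a rough illustration (the
helper and numbers are invented for this example, not QEMU code) of the
estimate QEMU can only make if pinned pages are reported dirty throughout the
iterative phase:

#include <stdbool.h>
#include <stdint.h>

/*
 * Estimate whether stopping the VM now would meet the downtime limit.
 * dirty_bytes must include persistently dirty pinned pages; if those were
 * reported clean until the VM stops, this check would pass too early and
 * the actual downtime would blow past the limit.
 */
static bool downtime_within_limit(uint64_t dirty_bytes, uint64_t bandwidth_bps,
                                  uint64_t downtime_limit_ms)
{
    /* Assumes bandwidth_bps > 0 and that dirty_bytes * 1000 fits in 64 bits. */
    uint64_t expected_downtime_ms = dirty_bytes * 1000 / bandwidth_bps;

    return expected_downtime_ms <= downtime_limit_ms;
}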


> Whenever there is hardware support or some software mechanism to report
> pages dirtied by the device, we will add a capability bit to the migration
> capabilities, and based on that bit the QEMU/user space app can decide
> whether to query dirty pages in the iterative phase.


Yes, we could advertise support for fine-grained dirty page tracking, but I
completely disagree that we should consider pinned pages clean until suddenly
exposing them as dirty once the VM is stopped.
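
As a purely hypothetical sketch of what such a capability check might look
like (no such flag exists today; the name below is invented for illustration):

#include <stdbool.h>
#include <stdint.h>

/* Invented capability bit -- not part of any current VFIO/QEMU interface. */
#define VFIO_MIGRATION_CAP_FINE_DIRTY_TRACKING (1u << 0)

/*
 * Without the capability, pinned pages stay persistently dirty, so querying
 * in pre-copy only re-reports them; with it, the device narrows the report
 * to pages it actually wrote, making iterative queries worthwhile.
 */
static bool query_dirty_in_precopy(uint32_t migration_caps)
{
    return migration_caps & VFIO_MIGRATION_CAP_FINE_DIRTY_TRACKING;
}

Thanks,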

Alex


