qemu-devel.nongnu.org archive mirror
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Christoffer Dall <christoffer.dall@linaro.org>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	Wei Huang <wei@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Juan Quintela <quintela@redhat.com>,
	Andrew Jones <drjones@redhat.com>
Subject: Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
Date: Wed, 31 Jan 2018 15:23:56 +0000
Message-ID: <20180131152356.GH2521@work-vm>
In-Reply-To: <CAKv+Gu-NAcUTpEUXCLudHTCFAf=Yb28q3AET9d4ED9A9z7dMfQ@mail.gmail.com>

* Ard Biesheuvel (ard.biesheuvel@linaro.org) wrote:
> On 31 January 2018 at 09:53, Christoffer Dall
> <christoffer.dall@linaro.org> wrote:
> > On Mon, Jan 29, 2018 at 10:32:12AM +0000, Marc Zyngier wrote:
> >> On 29/01/18 10:04, Peter Maydell wrote:
> >> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >> >> * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> >> >>>> * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> >>>>> I think the correct fix here is that your test code should turn
> >> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
> >> >>>>> for Arm KVM guests (for the same reason that VGA device video memory
> >> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
> >> >>>>> Normal Cacheable, and then everything should work fine.
> >> >>>>
> >> >>>> Does this cause problems with migrating at just the wrong point during
> >> >>>> a VM boot?
> >> >>>
> >> >>> It wouldn't surprise me if it did, but I don't think I've ever
> >> >>> tried to provoke that problem...
> >> >>
> >> >> If you think it'll get the RAM contents wrong, it might be best to fail
> >> >> the migration if you can detect the cache is disabled in the guest.
> >> >
> >> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
> >> > in the guest's system registers, and refuse migration if it's off...
> >> >
> >> > (cc'd Marc, Christoffer to check that I don't have the wrong end
> >> > of the stick about how thin the ice is in the period before the
> >> > guest turns on its MMU...)
> >>
> >> Once MMU and caches are on, we should be in a reasonable place for QEMU
> >> to have a consistent view of the memory. The trick is to prevent the
> >> vcpus from changing that. A guest could perfectly well turn off its MMU at
> >> any given time if it needs to (and it is actually required on some HW if
> >> you want to mitigate headlining CVEs), and KVM won't know about that.
> >>
> >
> > (Clarification: KVM can detect this if it bothers to check the VCPU's
> > system registers, but we don't trap to KVM when the VCPU turns off its
> > caches, right?)
> >
> >> You may have to pause the vcpus before starting the migration, or
> >> introduce a new KVM feature that would automatically pause a vcpu that
> >> is trying to disable its MMU while the migration is on. This would
> >> involve trapping all the virtual memory related system registers, with
> >> an obvious cost. But that cost would be limited to the time it takes to
> >> migrate the memory, so maybe that's acceptable.
> >>
> > Is that even sufficient?
> >
> > What if the following happened. (1) guest turns off MMU, (2) guest
> > writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
> > guest ram.  QEMU's view of guest ram is now incorrect (stale,
> > incoherent, ...).
> >
> > I'm also not really sure if pausing one VCPU because it turned off its
> > MMU will go very well when trying to migrate a large VM (wouldn't this
> > ask for all the other VCPUs beginning to complain that the stopped VCPU
> > appears to be dead?).  As a short-term 'fix' it's probably better to
> > refuse migration if you detect that a VCPU had begun turning off its
> > MMU.
> >
> > On the larger scale of things, this appears to me to be another case of
> > us really needing some way to coherently access memory between QEMU and
> > the VM, but in the case of the VCPU turning off the MMU prior to
> > migration, we don't even know where it may have written data, and I'm
> > therefore not really sure what the 'proper' solution would be.
> >
> > (cc'ing Ard who has thought about this problem before in the context
> > of UEFI and VGA.)
> >
> 
> Actually, the VGA case is much simpler because the host is not
> expected to write to the framebuffer, only read from it, and the guest
> is not expected to create a cacheable mapping for it, so any
> incoherency can be trivially solved by cache invalidation on the host
> side. (Note that this has nothing to do with DMA coherency, but only
> with PCI MMIO BARs that are backed by DRAM in the host.)
> 
> In the migration case, it is much more complicated, and I think
> capturing the state of the VM in a way that takes incoherency between
> caches and main memory into account is simply infeasible (i.e., the
> act of recording the state of guest RAM via a cached mapping may evict
> clean cachelines that are out of sync, and so it is impossible to
> record both the cached *and* the delta with the uncached state)
> 
> I wonder how difficult it would be to
> a) enable trapping of the MMU system register when a guest CPU is
> found to have its MMU off at migration time
> b) allow the guest CPU to complete whatever it thinks it needs to be
> doing with the MMU off
> c) once it re-enables the MMU, proceed with capturing the memory state
> 
> Full disclosure: I know very little about KVM migration ...

The difficulty is that migration is 'live', i.e. the guest keeps running
while we're copying its RAM across. That means a guest might do any of
these MMU things multiple times, so if we wait for it to be in the right
state, will it go back to being wrong? And how long do we wait?
(It's not a bad hack if that's the best we can do, though.)
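For illustration, a minimal sketch of the check Peter suggested: refuse (or postpone) migration while any vCPU has SCTLR_EL1.M (bit 0, stage 1 MMU enable) or SCTLR_EL1.C (bit 2, data cacheability control) clear. The struct vcpu_snapshot and both function names are hypothetical, not existing QEMU or KVM APIs, and the sketch assumes the caller has already read each vCPU's SCTLR_EL1. Because the guest can flip these bits at any time, a passing result is only a snapshot: it would have to be re-evaluated on every iteration and again once the vCPUs are stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural bit positions in SCTLR_EL1. */
#define SCTLR_M (UINT64_C(1) << 0)  /* stage 1 MMU enable */
#define SCTLR_C (UINT64_C(1) << 2)  /* data cacheability control */

/* Hypothetical per-vCPU snapshot; not a real QEMU structure. */
struct vcpu_snapshot {
    uint64_t sctlr_el1;
};

/* True if this vCPU runs with both its MMU and data caches enabled. */
static bool vcpu_memory_view_coherent(const struct vcpu_snapshot *cpu)
{
    return (cpu->sctlr_el1 & (SCTLR_M | SCTLR_C)) == (SCTLR_M | SCTLR_C);
}

/*
 * True only if every vCPU currently has MMU and caches on.
 * This says nothing about the next instant: the guest may clear the
 * bits right after we look, so a caller must re-check later.
 */
static bool migration_cache_state_ok(const struct vcpu_snapshot *cpus,
                                     size_t n)
{
    assert(cpus != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!vcpu_memory_view_coherent(&cpus[i])) {
            return false;  /* at least one vCPU may be bypassing the cache */
        }
    }
    return true;
}
```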

Now of course 'live' itself sounds scary for consistency, but the only
thing we really require is that a page is marked dirty at some point
after it has been written to, so that we send it again, and that we
eventually send a correct version. It's fine for us to send
inconsistent versions along the way as long as the final version we
send is the right one.
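To make that requirement concrete, a toy model of pre-copy migration: one uint32_t stands in for a page and a bool per page stands in for the dirty log. All names are hypothetical; this is not QEMU's RAM migration code. Note that it assumes the dirty log sees every guest write and that reading a source page returns what the guest actually wrote, which is exactly the assumption that breaks when the guest writes with its MMU or caches off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NPAGES 8

/* Toy model of pre-copy live migration. */
struct toy_vm {
    uint32_t src[NPAGES];  /* guest RAM on the source */
    uint32_t dst[NPAGES];  /* what the destination has received so far */
    bool dirty[NPAGES];    /* dirty log: set on every guest write */
};

/* Guest write: update RAM and mark the page dirty. */
static void guest_write(struct toy_vm *vm, size_t page, uint32_t value)
{
    assert(page < NPAGES);
    vm->src[page] = value;
    vm->dirty[page] = true;
}

/* Start migration: every page starts out dirty, so all get sent once. */
static void start_migration(struct toy_vm *vm)
{
    for (size_t i = 0; i < NPAGES; i++) {
        vm->dirty[i] = true;
    }
}

/* One pass: send each dirty page and clear its bit. Returns pages sent. */
static size_t send_dirty_pages(struct toy_vm *vm)
{
    size_t sent = 0;
    for (size_t i = 0; i < NPAGES; i++) {
        if (vm->dirty[i]) {
            vm->dirty[i] = false;     /* clear first: a later write re-dirties */
            vm->dst[i] = vm->src[i];  /* possibly an intermediate version */
            sent++;
        }
    }
    return sent;
}

/* Final stop-and-copy with vCPUs paused; true if destination matches. */
static bool complete_migration(struct toy_vm *vm)
{
    send_dirty_pages(vm);
    return memcmp(vm->src, vm->dst, sizeof vm->src) == 0;
}
```

Usage: call start_migration(), run a few send_dirty_pages() passes while guest_write() keeps happening, then call complete_migration() with the guest paused. Stale copies sent in earlier passes do no harm, because any page written after its last send is still marked dirty and gets sent again at the end.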

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Thread overview: 29+ messages
2018-01-24 21:22 [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64 Wei Huang
2018-01-25 20:05 ` Dr. David Alan Gilbert
2018-01-26 15:47   ` Wei Huang
2018-01-26 16:39     ` Peter Maydell
2018-01-26 17:08       ` Wei Huang
2018-01-26 19:46       ` Dr. David Alan Gilbert
2018-01-28 15:08         ` Peter Maydell
2018-01-29  9:53           ` Dr. David Alan Gilbert
2018-01-29 10:04             ` Peter Maydell
2018-01-29 10:19               ` Dr. David Alan Gilbert
2018-01-29 10:32               ` Marc Zyngier
2018-01-31  9:53                 ` Christoffer Dall
2018-01-31 15:18                   ` Ard Biesheuvel
2018-01-31 15:23                     ` Dr. David Alan Gilbert [this message]
2018-01-31 16:53                     ` Christoffer Dall
2018-01-31 16:59                       ` Ard Biesheuvel
2018-01-31 17:39                         ` Christoffer Dall
2018-01-31 18:00                           ` Ard Biesheuvel
2018-01-31 19:12                             ` Christoffer Dall
2018-01-31 20:15                               ` Ard Biesheuvel
2018-02-01  9:17                                 ` Christoffer Dall
2018-02-01  9:33                                   ` Ard Biesheuvel
2018-02-01  9:59                                     ` Christoffer Dall
2018-02-01 10:09                                       ` Ard Biesheuvel
2018-02-01 10:42                                       ` Andrew Jones
2018-02-01 10:48                                         ` Christoffer Dall
2018-02-01 12:25                                           ` Andrew Jones
2018-02-01 14:04                                             ` Christoffer Dall
2018-02-01 20:01     ` Dr. David Alan Gilbert