From: Christoffer Dall <christoffer.dall@linaro.org>
To: Andrew Jones <drjones@redhat.com>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Peter Maydell <peter.maydell@linaro.org>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Wei Huang <wei@redhat.com>,
	QEMU Developers <qemu-devel@nongnu.org>,
	Juan Quintela <quintela@redhat.com>
Subject: Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
Date: Thu, 1 Feb 2018 11:48:31 +0100
Message-ID: <20180201104831.GB21802@cbox>
In-Reply-To: <20180201104222.bwbrsfp2kmkwd7zp@kamzik.brq.redhat.com>

On Thu, Feb 01, 2018 at 11:42:22AM +0100, Andrew Jones wrote:
> On Thu, Feb 01, 2018 at 10:59:54AM +0100, Christoffer Dall wrote:
> > On Thu, Feb 1, 2018 at 10:33 AM, Ard Biesheuvel
> > <ard.biesheuvel@linaro.org> wrote:
> > > On 1 February 2018 at 09:17, Christoffer Dall
> > > <christoffer.dall@linaro.org> wrote:
> > >> On Wed, Jan 31, 2018 at 9:15 PM, Ard Biesheuvel
> > >> <ard.biesheuvel@linaro.org> wrote:
> > >>> On 31 January 2018 at 19:12, Christoffer Dall
> > >>> <christoffer.dall@linaro.org> wrote:
> > >>>> On Wed, Jan 31, 2018 at 7:00 PM, Ard Biesheuvel
> > >>>> <ard.biesheuvel@linaro.org> wrote:
> > >>>>> On 31 January 2018 at 17:39, Christoffer Dall
> > >>>>> <christoffer.dall@linaro.org> wrote:
> > >>>>>> On Wed, Jan 31, 2018 at 5:59 PM, Ard Biesheuvel
> > >>>>>> <ard.biesheuvel@linaro.org> wrote:
> > >>>>>>> On 31 January 2018 at 16:53, Christoffer Dall
> > >>>>>>> <christoffer.dall@linaro.org> wrote:
> > >>>>>>>> On Wed, Jan 31, 2018 at 4:18 PM, Ard Biesheuvel
> > >>>>>>>> <ard.biesheuvel@linaro.org> wrote:
> > >>>>>>>>> On 31 January 2018 at 09:53, Christoffer Dall
> > >>>>>>>>> <christoffer.dall@linaro.org> wrote:
> > >>>>>>>>>> On Mon, Jan 29, 2018 at 10:32:12AM +0000, Marc Zyngier wrote:
> > >>>>>>>>>>> On 29/01/18 10:04, Peter Maydell wrote:
> > >>>>>>>>>>> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > >>>>>>>>>>> >> * Peter Maydell (peter.maydell@linaro.org) wrote:
> > >>>>>>>>>>> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote:
> > >>>>>>>>>>> >>>> * Peter Maydell (peter.maydell@linaro.org) wrote:
> > >>>>>>>>>>> >>>>> I think the correct fix here is that your test code should turn
> > >>>>>>>>>>> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
> > >>>>>>>>>>> >>>>> for Arm KVM guests (for the same reason that VGA device video memory
> > >>>>>>>>>>> >>>>> doesn't work). If it's RAM, your guest has to arrange to map it as
> > >>>>>>>>>>> >>>>> Normal Cacheable, and then everything should work fine.
> > >>>>>>>>>>> >>>>
> > >>>>>>>>>>> >>>> Does this cause problems with migrating at just the wrong point during
> > >>>>>>>>>>> >>>> a VM boot?
> > >>>>>>>>>>> >>>
> > >>>>>>>>>>> >>> It wouldn't surprise me if it did, but I don't think I've ever
> > >>>>>>>>>>> >>> tried to provoke that problem...
> > >>>>>>>>>>> >>
> > >>>>>>>>>>> >> If you think it'll get the RAM contents wrong, it might be best to fail
> > >>>>>>>>>>> >> the migration if you can detect the cache is disabled in the guest.
> > >>>>>>>>>>> >
> > >>>>>>>>>>> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
> > >>>>>>>>>>> > in the guest's system registers, and refuse migration if it's off...
> > >>>>>>>>>>> >
> > >>>>>>>>>>> > (cc'd Marc, Christoffer to check that I don't have the wrong end
> > >>>>>>>>>>> > of the stick about how thin the ice is in the period before the
> > >>>>>>>>>>> > guest turns on its MMU...)
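
[ Just to make the above concrete: a rough, untested sketch of what such a
check could look like from userspace is below.  It reads SCTLR_EL1 via
KVM_GET_ONE_REG and looks at the M (bit 0) and C (bit 2) bits; the helper
name and the vcpu_fd parameter are made up for illustration, and it assumes
an arm64 host so that ARM64_SYS_REG() is available from the uapi headers. ]

  /* Untested sketch: refuse to start a migration while the MMU or the
   * D-cache is off.  SCTLR_EL1 is op0=3 op1=0 CRn=1 CRm=0 op2=0; bit 0
   * is M (MMU enable) and bit 2 is C (data cache enable).  'vcpu_fd' is
   * a hypothetical open KVM vCPU file descriptor.
   */
  #include <linux/kvm.h>
  #include <stdbool.h>
  #include <stdint.h>
  #include <sys/ioctl.h>

  static bool vcpu_mmu_and_cache_on(int vcpu_fd)
  {
      uint64_t sctlr = 0;
      struct kvm_one_reg reg = {
          .id   = ARM64_SYS_REG(3, 0, 1, 0, 0),   /* SCTLR_EL1 */
          .addr = (uint64_t)&sctlr,
      };

      if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0) {
          return false;   /* be conservative if the read fails */
      }
      return (sctlr & (1u << 0)) && (sctlr & (1u << 2));
  }
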
> > >>>>>>>>>>>
> > >>>>>>>>>>> Once MMU and caches are on, we should be in a reasonable place for QEMU
> > >>>>>>>>>>> to have a consistent view of the memory. The trick is to prevent the
> > >>>>>>>>>>> vcpus from changing that. A guest can perfectly well turn off its MMU at
> > >>>>>>>>>>> any given time if it needs to (and doing so is actually required on some HW
> > >>>>>>>>>>> if you want to mitigate headlining CVEs), and KVM won't know about that.
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> (Clarification: KVM can detect this if it bothers to check the VCPU's
> > >>>>>>>>>> system registers, but we don't trap to KVM when the VCPU turns off its
> > >>>>>>>>>> caches, right?)
> > >>>>>>>>>>
> > >>>>>>>>>>> You may have to pause the vcpus before starting the migration, or
> > >>>>>>>>>>> introduce a new KVM feature that would automatically pause a vcpu that
> > >>>>>>>>>>> is trying to disable its MMU while the migration is on. This would
> > >>>>>>>>>>> involve trapping all the virtual memory related system registers, with
> > >>>>>>>>>>> an obvious cost. But that cost would be limited to the time it takes to
> > >>>>>>>>>>> migrate the memory, so maybe that's acceptable.
> > >>>>>>>>>>>
> > >>>>>>>>>> Is that even sufficient?
> > >>>>>>>>>>
> > >>>>>>>>>> What if the following happened: (1) the guest turns off its MMU, (2) the
> > >>>>>>>>>> guest writes some data directly to RAM, (3) QEMU stops the VCPU, (4) QEMU
> > >>>>>>>>>> reads guest RAM.  QEMU's view of guest RAM is now incorrect (stale,
> > >>>>>>>>>> incoherent, ...).
> > >>>>>>>>>>
> > >>>>>>>>>> I'm also not really sure that pausing one VCPU because it turned off its
> > >>>>>>>>>> MMU will go very well when trying to migrate a large VM (wouldn't the
> > >>>>>>>>>> other VCPUs start complaining that the stopped VCPU appears to be
> > >>>>>>>>>> dead?).  As a short-term 'fix' it's probably better to refuse migration
> > >>>>>>>>>> if you detect that a VCPU has turned off its MMU.
> > >>>>>>>>>>
> > >>>>>>>>>> On the larger scale of things, this appears to me to be another case of
> > >>>>>>>>>> us really needing some way to coherently access memory between QEMU and
> > >>>>>>>>>> the VM, but in the case of the VCPU turning off the MMU prior to
> > >>>>>>>>>> migration, we don't even know where it may have written data, and I'm
> > >>>>>>>>>> therefore not really sure what the 'proper' solution would be.
> > >>>>>>>>>>
> > >>>>>>>>>> (cc'ing Ard, who has thought about this problem before in the context
> > >>>>>>>>>> of UEFI and VGA.)
> > >>>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> Actually, the VGA case is much simpler because the host is not
> > >>>>>>>>> expected to write to the framebuffer, only read from it, and the guest
> > >>>>>>>>> is not expected to create a cacheable mapping for it, so any
> > >>>>>>>>> incoherency can be trivially solved by cache invalidation on the host
> > >>>>>>>>> side. (Note that this has nothing to do with DMA coherency, but only
> > >>>>>>>>> with PCI MMIO BARs that are backed by DRAM in the host)
> > >>>>>>>>
> > >>>>>>>> In the case of the running guest, the host will also only read from the
> > >>>>>>>> cached mapping.  Of course, at restore, the host will also write
> > >>>>>>>> through a cached mapping, but shouldn't the latter case be solvable by
> > >>>>>>>> having KVM clean the cache lines when faulting in any page?
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> We are still talking about the contents of the framebuffer, right? In
> > >>>>>>> that case, yes, afaict
> > >>>>>>>
> > >>>>>>
> > >>>>>> I was talking about normal RAM actually... not sure if that changes anything?
> > >>>>>>
> > >>>>>
> > >>>>> The main difference is that with a framebuffer BAR, it is pointless
> > >>>>> for the guest to map it cacheable, given that the purpose of a
> > >>>>> framebuffer is its side effects, which are not guaranteed to occur
> > >>>>> in a timely fashion if the mapping is cacheable.
> > >>>>>
> > >>>>> If we are talking about normal RAM, then why are we discussing it here
> > >>>>> and not down there?
> > >>>>>
> > >>>>
> > >>>> Because I was trying to figure out how the challenge of accessing the
> > >>>> VGA framebuffer differs from the challenge of accessing guest RAM
> > >>>> which may have been written by the guest with the MMU off.
> > >>>>
> > >>>> To a first approximation, they are extremely similar because the guest is
> > >>>> writing uncached to memory, which the host now has to access via a
> > >>>> cached mapping.
> > >>>>
> > >>>> But I'm guessing that a "clean+invalidate before read on the host"
> > >>>> solution will result in terrible performance for a framebuffer and
> > >>>> therefore isn't a good solution for that problem...
> > >>>>
> > >>>
> > >>> That highly depends on where 'not working' resides on the performance
> > >>> scale. Currently, VGA on KVM simply does not work at all, and so
> > >>> working but slow would be a huge improvement over the current
> > >>> situation.
> > >>>
> > >>> Also, the performance hit is caused by the fact that the data needs to
> > >>> make a round trip to memory, and the invalidation (without cleaning)
> > >>> performed by the host shouldn't make that much worse than it
> > >>> fundamentally is to begin with.
> > >>>
> > >>
> > >> True, but Marc pointed out (on IRC) that the cache maintenance
> > >> instructions are broadcast across all CPUs and may therefore adversely
> > >> affect the performance of your entire system when running an emulated
> > >> VGA framebuffer on a subset of the host CPUs.  However, he also
> > >> pointed out that the necessary cache maintenance can be done in EL0
> 
> I had some memory of this not being the case, but I just checked and
> indeed 'dc civac' works from EL0 with SCTLR_EL1.UCI=1. Note that the
> gcc builtin __builtin___clear_cache() does not use that instruction
> (it uses 'dc cvau'), so we'd need to implement a wrapper ourselves
> (maybe that's why I thought it wasn't available...)
> 
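[ For the record, an untested sketch of such a wrapper is below.  It assumes
the kernel sets SCTLR_EL1.UCI (so 'dc civac' doesn't trap at EL0) and
SCTLR_EL1.UCT (so CTR_EL0 is readable from EL0), which Linux does; the
function name is made up for illustration. ]

  /* Untested sketch: clean+invalidate a buffer to the point of coherency
   * from EL0 using 'dc civac'.
   */
  #include <stddef.h>
  #include <stdint.h>

  void dcache_clean_inval_poc(void *start, size_t len)
  {
      uint64_t ctr;

      /* CTR_EL0.DminLine (bits [19:16]) is log2 of the smallest D-cache
       * line size, in 4-byte words.
       */
      asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
      size_t line = 4UL << ((ctr >> 16) & 0xf);

      uintptr_t p   = (uintptr_t)start & ~(line - 1);
      uintptr_t end = (uintptr_t)start + len;

      for (; p < end; p += line) {
          asm volatile("dc civac, %0" : : "r" (p) : "memory");
      }
      asm volatile("dsb ish" : : : "memory");
  }
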
> > >> and we wouldn't even need a new ioctl or anything else from KVM or the
> > >> kernel to do this.  I think we should measure the effect of this
> > >> before dismissing it though.
> > >>
> > >
> > > If you could use dirty page tracking to only perform the cache
> > > invalidation when the framebuffer memory has been updated, you can at
> > > least limit the impact to cases where the framebuffer is actually
> > > used, rather than sitting idle with a nice wallpaper image.
> 
> Yes, this is the exact approach I took back when I experimented with
> this. I must have screwed up my PoC in some way (like using the gcc
> builtin), because it wasn't working for me...
> 
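[ An untested sketch of what that could look like on the userspace side is
below.  It assumes the framebuffer is backed by its own memslot registered
with KVM_MEM_LOG_DIRTY_PAGES, reuses the dcache_clean_inval_poc() helper
sketched above, and takes hypothetical fb_slot/fb_hva/fb_pages parameters;
in QEMU proper this would of course go through the memory API rather than
the raw ioctl. ]

  /* Untested sketch: clean+invalidate only the framebuffer pages the
   * guest has dirtied since the last scanout.
   */
  #include <linux/kvm.h>
  #include <stdint.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>

  #define FB_PAGE_SIZE 4096UL

  void dcache_clean_inval_poc(void *start, size_t len);

  static void sync_dirty_framebuffer(int vm_fd, uint32_t fb_slot,
                                     uint8_t *fb_hva, size_t fb_pages)
  {
      unsigned long *bitmap = calloc((fb_pages + 63) / 64, sizeof(*bitmap));
      struct kvm_dirty_log log = {
          .slot = fb_slot,
          .dirty_bitmap = bitmap,
      };

      /* KVM_GET_DIRTY_LOG returns and clears the per-page dirty bitmap
       * for the given memslot.
       */
      if (bitmap && ioctl(vm_fd, KVM_GET_DIRTY_LOG, &log) == 0) {
          for (size_t i = 0; i < fb_pages; i++) {
              if (bitmap[i / 64] & (1UL << (i % 64))) {
                  /* pay the cache maintenance cost only for dirty pages */
                  dcache_clean_inval_poc(fb_hva + i * FB_PAGE_SIZE,
                                         FB_PAGE_SIZE);
              }
          }
      }
      free(bitmap);
  }
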
Does that mean you have some code you feel like reviving and using to send
out an RFC? ;)

-Christoffer

Thread overview: 29+ messages
2018-01-24 21:22 [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64 Wei Huang
2018-01-25 20:05 ` Dr. David Alan Gilbert
2018-01-26 15:47   ` Wei Huang
2018-01-26 16:39     ` Peter Maydell
2018-01-26 17:08       ` Wei Huang
2018-01-26 19:46       ` Dr. David Alan Gilbert
2018-01-28 15:08         ` Peter Maydell
2018-01-29  9:53           ` Dr. David Alan Gilbert
2018-01-29 10:04             ` Peter Maydell
2018-01-29 10:19               ` Dr. David Alan Gilbert
2018-01-29 10:32               ` Marc Zyngier
2018-01-31  9:53                 ` Christoffer Dall
2018-01-31 15:18                   ` Ard Biesheuvel
2018-01-31 15:23                     ` Dr. David Alan Gilbert
2018-01-31 16:53                     ` Christoffer Dall
2018-01-31 16:59                       ` Ard Biesheuvel
2018-01-31 17:39                         ` Christoffer Dall
2018-01-31 18:00                           ` Ard Biesheuvel
2018-01-31 19:12                             ` Christoffer Dall
2018-01-31 20:15                               ` Ard Biesheuvel
2018-02-01  9:17                                 ` Christoffer Dall
2018-02-01  9:33                                   ` Ard Biesheuvel
2018-02-01  9:59                                     ` Christoffer Dall
2018-02-01 10:09                                       ` Ard Biesheuvel
2018-02-01 10:42                                       ` Andrew Jones
2018-02-01 10:48                                         ` Christoffer Dall [this message]
2018-02-01 12:25                                           ` Andrew Jones
2018-02-01 14:04                                             ` Christoffer Dall
2018-02-01 20:01     ` Dr. David Alan Gilbert
