Date: Wed, 31 Jan 2018 15:23:56 +0000
From: "Dr. David Alan Gilbert"
Message-ID: <20180131152356.GH2521@work-vm>
References: <20180125200525.GC2442@work-vm>
 <6e27688f-1350-62f5-23ff-7855ea502be9@redhat.com>
 <20180126194559.GI2610@work-vm>
 <20180129095330.GA2408@work-vm>
 <20180131095351.GY21802@cbox>
Subject: Re: [Qemu-devel] [PATCH V1 1/1] tests: Add migration test for aarch64
To: Ard Biesheuvel
Cc: Christoffer Dall, Marc Zyngier, Peter Maydell, Wei Huang,
 QEMU Developers, Juan Quintela, Andrew Jones

* Ard Biesheuvel (ard.biesheuvel@linaro.org) wrote:
> On 31 January 2018 at 09:53, Christoffer Dall wrote:
> > On Mon, Jan 29, 2018 at 10:32:12AM +0000, Marc Zyngier wrote:
> >> On 29/01/18 10:04, Peter Maydell wrote:
> >> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert wrote:
> >> >> * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert wrote:
> >> >>>> * Peter Maydell (peter.maydell@linaro.org) wrote:
> >> >>>>> I think the correct fix here is that your test code should turn
> >> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
> >> >>>>> for Arm KVM guests (for the same reason that VGA device video memory
> >> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
> >> >>>>> Normal Cacheable, and then everything should work fine.
> >> >>>>
> >> >>>> Does this cause problems with migrating at just the wrong point during
> >> >>>> a VM boot?
> >> >>>
> >> >>> It wouldn't surprise me if it did, but I don't think I've ever
> >> >>> tried to provoke that problem...
> >> >>
> >> >> If you think it'll get the RAM contents wrong, it might be best to fail
> >> >> the migration if you can detect the cache is disabled in the guest.
> >> >
> >> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
> >> > in the guest's system registers, and refuse migration if it's off...
> >> >
> >> > (cc'd Marc, Christoffer to check that I don't have the wrong end
> >> > of the stick about how thin the ice is in the period before the
> >> > guest turns on its MMU...)
> >>
> >> Once MMU and caches are on, we should be in a reasonable place for QEMU
> >> to have a consistent view of the memory. The trick is to prevent the
> >> vcpus from changing that. A guest could perfectly well turn off its MMU
> >> at any given time if it needs to (and it is actually required on some HW
> >> if you want to mitigate headlining CVEs), and KVM won't know about that.
> >>
> >
> > (Clarification: KVM can detect this if it bothers to check the VCPU's
> > system registers, but we don't trap to KVM when the VCPU turns off its
> > caches, right?)
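
Just to make that idea concrete - and this is only a rough sketch of what
such a check could look like, not QEMU's actual code: userspace can read
SCTLR_EL1 through KVM_GET_ONE_REG and test the M bit, so a migration
blocker could key off something like the following (arm64 host assumed,
helper name made up):

  /*
   * Sketch only: ask KVM for the vcpu's SCTLR_EL1 and test the MMU
   * enable bit.  vcpu_fd is the fd returned by KVM_CREATE_VCPU.
   * SCTLR_EL1 is encoded as op0=3, op1=0, CRn=1, CRm=0, op2=0;
   * bit 0 (M) is the stage 1 MMU enable, bit 2 (C) the data cache enable.
   */
  #include <stdbool.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>            /* provides ARM64_SYS_REG() on arm64 */

  #define SCTLR_EL1_ID   ARM64_SYS_REG(3, 0, 1, 0, 0)
  #define SCTLR_EL1_M    (1ULL << 0)

  static bool vcpu_mmu_enabled(int vcpu_fd)
  {
      uint64_t sctlr = 0;
      struct kvm_one_reg reg = {
          .id   = SCTLR_EL1_ID,
          .addr = (uint64_t)(unsigned long)&sctlr,
      };

      if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0) {
          return false;             /* can't tell; caller decides */
      }
      return sctlr & SCTLR_EL1_M;
  }

If any vcpu reported false at migration-start time, setup could fail with
a clear error rather than risk sending a silently inconsistent image.
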
> >
> >> You may have to pause the vcpus before starting the migration, or
> >> introduce a new KVM feature that would automatically pause a vcpu that
> >> is trying to disable its MMU while the migration is on. This would
> >> involve trapping all the virtual memory related system registers, with
> >> an obvious cost. But that cost would be limited to the time it takes to
> >> migrate the memory, so maybe that's acceptable.
> >>
> > Is that even sufficient?
> >
> > What if the following happened: (1) guest turns off MMU, (2) guest
> > writes some data directly to ram, (3) qemu stops the vcpu, (4) qemu
> > reads guest ram. QEMU's view of guest ram is now incorrect (stale,
> > incoherent, ...).
> >
> > I'm also not really sure that pausing one VCPU because it turned off
> > its MMU will go very well when trying to migrate a large VM (wouldn't
> > all the other VCPUs start to complain that the stopped VCPU appears to
> > be dead?). As a short-term 'fix' it's probably better to refuse
> > migration if you detect that a VCPU had begun turning off its MMU.
> >
> > On the larger scale of things, this appears to me to be another case of
> > us really needing some way to coherently access memory between QEMU and
> > the VM, but in the case of the VCPU turning off the MMU prior to
> > migration, we don't even know where it may have written data, and I'm
> > therefore not really sure what the 'proper' solution would be.
> >
> > (cc'ing Ard who has thought about this problem before in the context
> > of UEFI and VGA.)
> >
>
> Actually, the VGA case is much simpler because the host is not
> expected to write to the framebuffer, only read from it, and the guest
> is not expected to create a cacheable mapping for it, so any
> incoherency can be trivially solved by cache invalidation on the host
> side. (Note that this has nothing to do with DMA coherency, but only
> with PCI MMIO BARs that are backed by DRAM in the host.)
>
> In the migration case, it is much more complicated, and I think
> capturing the state of the VM in a way that takes incoherency between
> caches and main memory into account is simply infeasible (i.e., the
> act of recording the state of guest RAM via a cached mapping may evict
> clean cachelines that are out of sync, and so it is impossible to
> record both the cached state *and* the delta with the uncached state).
>
> I wonder how difficult it would be to
> a) enable trapping of the MMU system register when a guest CPU is
>    found to have its MMU off at migration time
> b) allow the guest CPU to complete whatever it thinks it needs to be
>    doing with the MMU off
> c) once it re-enables the MMU, proceed with capturing the memory state
>
> Full disclosure: I know very little about KVM migration ...

The difficulty is that migration is 'live' - i.e. the guest is running
while we're copying the data across; that means a guest might do any of
these MMU things multiple times, so if we wait for it to be right, will
it go back to being wrong? How long do you wait? (It's not a bad hack
if that's the best we can do, though.)

Now of course 'live' itself sounds scary for consistency, but the only
thing we really require is that a page is marked dirty some time after
it's been written to, so that we cause it to be sent again, and that we
eventually send a correct version; it's fine for us to send inconsistent
versions in the meantime, as long as the final version we send is the
right one.

Dave
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
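
P.S. To make the 'mark dirty, resend, converge' requirement above a bit
more concrete, the pre-copy loop has roughly the shape below. This is a
toy sketch, not the actual migration code; fetch_dirty_bitmap(),
send_page() and stop_all_vcpus() are made-up helpers, and the stop
threshold is a policy knob.

  /*
   * Toy pre-copy loop.  The only invariant it relies on is the one
   * described above: a write to a page eventually shows up in the dirty
   * bitmap, so the page is sent again; interim copies may be
   * inconsistent, only the final copy (taken with the vcpus stopped)
   * has to be right.  A real implementation first sends all of RAM in a
   * bulk pass; that is glossed over here.
   */
  #include <stddef.h>

  #define BITS_PER_LONG   (8 * sizeof(unsigned long))

  extern size_t fetch_dirty_bitmap(unsigned long *bitmap); /* hypothetical */
  extern void send_page(size_t page_index);                /* hypothetical */
  extern void stop_all_vcpus(void);                        /* hypothetical */

  static void send_dirty_pages(unsigned long *bitmap, size_t nr_pages)
  {
      for (size_t i = 0; i < nr_pages; i++) {
          if (bitmap[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG))) {
              send_page(i);         /* may be resent on a later pass */
          }
      }
  }

  void precopy_ram(unsigned long *bitmap, size_t nr_pages, size_t threshold)
  {
      size_t dirty;

      /* Iterative passes while the guest keeps running ("live"). */
      do {
          dirty = fetch_dirty_bitmap(bitmap);   /* also clears the log */
          send_dirty_pages(bitmap, nr_pages);
      } while (dirty > threshold);

      /* Final pass with the vcpus stopped: this copy must be consistent. */
      stop_all_vcpus();
      fetch_dirty_bitmap(bitmap);
      send_dirty_pages(bitmap, nr_pages);
  }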