From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [patch 02/10] KVM: MMU: move local TLB flush to mmu_set_spte Date: Mon, 22 Sep 2008 23:28:11 +0300 Message-ID: <48D7FFDB.1010708@redhat.com> References: <20080921001046.GA8593@dmt.cnet> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Cc: kvm-devel , "David S. Ahern" To: Marcelo Tosatti Return-path: Received: from mx2.redhat.com ([66.187.237.31]:34014 "EHLO mx2.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752425AbYIVU3O (ORCPT ); Mon, 22 Sep 2008 16:29:14 -0400 In-Reply-To: <20080921001046.GA8593@dmt.cnet> Sender: kvm-owner@vger.kernel.org List-ID: Marcelo Tosatti wrote: >> I think we had cases where the spte.pfn contents changed, for example >> when a large page was replaced by a normal page, >> > > True. And the TLB is not flushed now for large->normal replace, in case > the pte thats faulting is read-only. The local (and remote) TLB's must > be flushed on large->normal replace. > > Can you prepare a patch for that, for -stable? >> and also: >> >> } else if (pfn != spte_to_pfn(*shadow_pte)) { >> > > That one is likely to crash the guest anyway, so I don't see the need > for a flush there: > > >>> Did you find out what's causing the errors in the first place (if >>> zap is not used)? It worries me greatly. >>> >> Yes, the problem is that the rmap code does not handle the qemu >> process >> mappings from vanishing while there is a present rmap. If that >> happens, >> and there is a fault for a gfn whose qemu mapping has been removed, a >> different physical zero page will be allocated: >> >> rmap a -> gfn 0 -> physical host page 0 >> mapping for gfn 0 gets removed >> guest faults in gfn 0 through the same pte "chain" >> rmap a -> gfn 0 -> physical host page 1 >> >> When instantiating the shadow mapping for the second time, the >> "is_rmap_pte" check succeeds, so we release the reference grabbed by >> gfn_to_page() at mmu_set_spte(). We now have a shadow mapping >> pointing >> to a physical page without having an additional reference on that >> page. >> >> The following makes the host not crash under such a condition, but >> the condition itself is invalid leading to inconsistent state on the >> guest. >> So IMHO it shouldnt be allowed to happen in the first place. >> And it isn't, with mmu notifiers. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.