From mboxrd@z Thu Jan  1 00:00:00 1970
From: Marcelo Tosatti <mtosatti@redhat.com>
Subject: Re: [patch 02/10] KVM: MMU: move local TLB flush to mmu_set_spte
Date: Sat, 20 Sep 2008 21:10:46 -0300
Message-ID: <20080921001046.GA8593@dmt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: kvm-devel <kvm@vger.kernel.org>,
	"David S. Ahern" <daahern@cisco.com>
To: avi@redhat.com
Return-path: <kvm-owner@vger.kernel.org>
Received: from mx2.redhat.com ([66.187.237.31]:50961 "EHLO mx2.redhat.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1751578AbYIUAMH (ORCPT <rfc822;kvm@vger.kernel.org>);
	Sat, 20 Sep 2008 20:12:07 -0400
Content-Disposition: inline
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Fri, Sep 19, 2008 at 05:21:09PM -0700, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Since the sync page path can collapse flushes.
>>
>> Also only flush if the spte was writable before.
>>
>> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
>>
>> @@ -1241,9 +1239,12 @@ static void mmu_set_spte(struct kvm_vcpu
>>  		}
>>  	}
>>  	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
>> -		      dirty, largepage, gfn, pfn, speculative))
>> +		      dirty, largepage, gfn, pfn, speculative)) {
>>  		if (write_fault)
>>  			*ptwrite = 1;
>> +		if (was_writeble)
>> +			kvm_x86_ops->tlb_flush(vcpu);
>> +	}
>>    
>
> I think we had cases where the spte.pfn contents changed, for example  
> when a large page was replaced by a normal page,

True. And the TLB is not flushed now for large->normal replace, in case
the pte thats faulting is read-only. The local (and remote) TLB's must
be flushed on large->normal replace.

(BTW the largepage patch is wrong, will reply to that soon).

> and also:
>
>        } else if (pfn != spte_to_pfn(*shadow_pte)) {

That one is likely to crash the guest anyway, so I don't see the need
for a flush there:

> > Did you find out what's causing the errors in the first place (if
> > zap is not used)?  It worries me greatly.
> Yes, the problem is that the rmap code does not handle the qemu
> process
> mappings from vanishing while there is a present rmap. If that
> happens,
> and there is a fault for a gfn whose qemu mapping has been removed, a
> different physical zero page will be allocated:
> 
>      rmap a -> gfn 0 -> physical host page 0
>      mapping for gfn 0 gets removed
>      guest faults in gfn 0 through the same pte "chain"
>      rmap a -> gfn 0 -> physical host page 1
> 
> When instantiating the shadow mapping for the second time, the
> "is_rmap_pte" check succeeds, so we release the reference grabbed by
> gfn_to_page() at mmu_set_spte(). We now have a shadow mapping
> pointing
> to a physical page without having an additional reference on that
> page.
> 
> The following makes the host not crash under such a condition, but
> the condition itself is invalid leading to inconsistent state on the
> guest.
> So IMHO it shouldnt be allowed to happen in the first place.