From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xiao Guangrong
Subject: Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
Date: Wed, 15 Feb 2012 19:37:38 +0800
Message-ID: <4F3B9902.5000306@linux.vnet.ibm.com>
References: <20120210152831.6ac3ac87.yoshikawa.takuya@oss.ntt.co.jp>
 <20120214171044.GK9440@redhat.com>
 <20120214172947.GA22362@amt.cnet>
 <20120214185356.GQ9440@redhat.com>
 <20120214194342.GA24117@amt.cnet>
 <4F3B7867.7000807@redhat.com>
 <4F3B7F1F.40402@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, Andrea Arcangeli, Takuya Yoshikawa, kvm@vger.kernel.org
To: Avi Kivity
In-Reply-To: <4F3B7F1F.40402@redhat.com>

On 02/15/2012 05:47 PM, Avi Kivity wrote:
> On 02/15/2012 11:18 AM, Avi Kivity wrote:
>> On 02/14/2012 09:43 PM, Marcelo Tosatti wrote:
>>> Also it should not be necessary for these flushes to be inside mmu_lock
>>> on EPT/NPT case (since there is no write protection there).
>>
>> We do write protect with TDP, if nested virt is active.
>> The question is
>> whether we have indirect pages or not, not whether TDP is active or not
>> (even without TDP, if you don't enable paging in the guest, you don't
>> have to write protect).
>>
>>> But it would
>>> be awkward to differentiate the unlock position based on EPT/NPT.
>>
>> I would really like to move the IPI back out of the lock.
>>
>> How about something like a sequence lock:
>>
>> 	spin_lock(mmu_lock);
>> 	need_flush = write_protect_stuff();
>> 	atomic_add(kvm->want_flush_counter, need_flush);
>> 	spin_unlock(mmu_lock);
>>
>> 	while ((done = atomic_read(kvm->done_flush_counter)) <
>> 	       (want = atomic_read(kvm->want_flush_counter))) {
>> 		kvm_make_request(flush);
>> 		atomic_cmpxchg(kvm->done_flush_counter, done, want);
>> 	}
>>
>> This (or maybe a corrected and optimized version) ensures that any
>> need_flush cannot pass the while () barrier, no matter which thread
>> encounters it first. However it violates the "do not invent new locking
>> techniques" commandment. Can we map it to some existing method?
>
> There is no need to advance 'want' in the loop. So we could do
>
> 	/* must be called with mmu_lock held */
> 	void kvm_mmu_defer_remote_flush(kvm, need_flush)
> 	{
> 		if (need_flush)
> 			++kvm->flush_counter.want;
> 	}
>
> 	/* may be called without mmu_lock */
> 	void kvm_mmu_commit_remote_flush(kvm)
> 	{
> 		want = ACCESS_ONCE(kvm->flush_counter.want);
> 		while ((done = atomic_read(kvm->flush_counter.done)) < want) {
> 			kvm_make_request(flush);
> 			atomic_cmpxchg(kvm->flush_counter.done, done, want);
> 		}
> 	}

Hmm, we already have kvm->tlbs_dirty, so we can do it like this:

	#define SPTE_INVALID_UNCLEAN	(1ull << 63)

In the invalid-page path:

	lock mmu_lock

	if (spte is invalid)
		kvm->tlbs_dirty |= SPTE_INVALID_UNCLEAN;

	need_tlb_flush = kvm->tlbs_dirty;

	unlock mmu_lock

	if (need_tlb_flush)
		kvm_flush_remote_tlbs();

And in the page write-protect path:

	lock mmu_lock

	if (a spte was changed to read-only ||
	    (kvm->tlbs_dirty & SPTE_INVALID_UNCLEAN))
		kvm_flush_remote_tlbs();

	unlock mmu_lock

How about this?