From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xiao Guangrong
Subject: Re: [PATCH 1/2] KVM: mmu_notifier: Flush TLBs before releasing mmu_lock
Date: Thu, 16 Feb 2012 12:50:18 +0800
Message-ID: <4F3C8B0A.4020800@linux.vnet.ibm.com>
References: <20120210152831.6ac3ac87.yoshikawa.takuya@oss.ntt.co.jp>
	<20120214171044.GK9440@redhat.com>
	<20120214172947.GA22362@amt.cnet>
	<20120214185356.GQ9440@redhat.com>
	<20120214194342.GA24117@amt.cnet>
	<4F3B7867.7000807@redhat.com>
	<4F3B7F1F.40402@redhat.com>
	<4F3B9902.5000306@linux.vnet.ibm.com>
	<4F3BBC35.6010502@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: Marcelo Tosatti, Andrea Arcangeli, Takuya Yoshikawa, kvm@vger.kernel.org
To: Avi Kivity
Return-path:
Received: from e28smtp08.in.ibm.com ([122.248.162.8]:47706 "EHLO e28smtp08.in.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756014Ab2BPEua
	(ORCPT ); Wed, 15 Feb 2012 23:50:30 -0500
Received: from /spool/local by e28smtp08.in.ibm.com with IBM ESMTP SMTP Gateway:
	Authorized Use Only! Violators will be prosecuted
	for from ; Thu, 16 Feb 2012 10:20:27 +0530
Received: from d28av04.in.ibm.com (d28av04.in.ibm.com [9.184.220.66])
	by d28relay03.in.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id q1G4oL3v4259950
	for ; Thu, 16 Feb 2012 10:20:22 +0530
Received: from d28av04.in.ibm.com (loopback [127.0.0.1])
	by d28av04.in.ibm.com (8.14.4/8.13.1/NCO v10.0 AVout) with ESMTP id q1G4oKln007052
	for ; Thu, 16 Feb 2012 15:50:21 +1100
In-Reply-To: <4F3BBC35.6010502@redhat.com>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 02/15/2012 10:07 PM, Avi Kivity wrote:
> On 02/15/2012 01:37 PM, Xiao Guangrong wrote:
>>>> I would really like to move the IPI back out of the lock.
>>>>
>>>> How about something like a sequence lock:
>>>>
>>>>     spin_lock(mmu_lock)
>>>>     need_flush = write_protect_stuff();
>>>>     atomic_add(kvm->want_flush_counter, need_flush);
>>>>     spin_unlock(mmu_lock);
>>>>
>>>>     while ((done = atomic_read(kvm->done_flush_counter)) <
>>>>            (want = atomic_read(kvm->want_flush_counter))) {
>>>>         kvm_make_request(flush)
>>>>         atomic_cmpxchg(kvm->done_flush_counter, done, want)
>>>>     }
>>>>
>>>> This (or maybe a corrected and optimized version) ensures that any
>>>> need_flush cannot pass the while () barrier, no matter which thread
>>>> encounters it first.  However it violates the "do not invent new
>>>> locking techniques" commandment.  Can we map it to some existing
>>>> method?
>>>
>>> There is no need to advance 'want' in the loop.  So we could do
>>>
>>>     /* must call with mmu_lock held */
>>>     void kvm_mmu_defer_remote_flush(kvm, need_flush)
>>>     {
>>>         if (need_flush)
>>>             ++kvm->flush_counter.want;
>>>     }
>>>
>>>     /* may call without mmu_lock */
>>>     void kvm_mmu_commit_remote_flush(kvm)
>>>     {
>>>         want = ACCESS_ONCE(kvm->flush_counter.want);
>>>         while ((done = atomic_read(kvm->flush_counter.done)) < want) {
>>>             kvm_make_request(flush);
>>>             atomic_cmpxchg(kvm->flush_counter.done, done, want);
>>>         }
>>>     }
>>
>> Hmm, we already have kvm->tlbs_dirty, so we can do it like this:
>>
>>     #define SPTE_INVALID_UNCLEAN (1ull << 63)
>>
>> In the invalid page path:
>>
>>     lock mmu_lock
>>     if (spte is invalid)
>>         kvm->tlbs_dirty |= SPTE_INVALID_UNCLEAN;
>>     need_tlb_flush = kvm->tlbs_dirty;
>>     unlock mmu_lock
>>     if (need_tlb_flush)
>>         kvm_flush_remote_tlbs()
>>
>> And in the page write-protect path:
>>
>>     lock mmu_lock
>>     if (it has a spte changed to readonly ||
>>         kvm->tlbs_dirty & SPTE_INVALID_UNCLEAN)
>>         kvm_flush_remote_tlbs()
>>     unlock mmu_lock
>>
>> How about this?
>
> Well, it still has flushes inside the lock.  And it seems to be more
> complicated, but maybe that's because I thought of my idea and didn't
> fully grok yours yet.
Oh, I did not mean flushing all TLBs outside of mmu_lock, only the
invalid page path.  There are still some paths that need to flush TLBs
inside mmu_lock (like sync_children and get_page).

In your code:

>>> /* must call with mmu_lock held */
>>> void kvm_mmu_defer_remote_flush(kvm, need_flush)
>>> {
>>>     if (need_flush)
>>>         ++kvm->flush_counter.want;
>>> }
>>>
>>> /* may call without mmu_lock */
>>> void kvm_mmu_commit_remote_flush(kvm)
>>> {
>>>     want = ACCESS_ONCE(kvm->flush_counter.want);
>>>     while ((done = atomic_read(kvm->flush_counter.done)) < want) {
>>>         kvm_make_request(flush);
>>>         atomic_cmpxchg(kvm->flush_counter.done, done, want);
>>>     }
>>> }

I think we do not need to handle every TLB-flush request here, since
all of these requests can be delayed to the point where mmu_lock is
released.  We can simply do:

    void kvm_mmu_defer_remote_flush(kvm, need_flush)
    {
        if (need_flush)
            ++kvm->tlbs_dirty;
    }

    void kvm_mmu_commit_remote_flush(struct kvm *kvm)
    {
        int dirty_count = kvm->tlbs_dirty;

        smp_mb();
        if (!dirty_count)
            return;

        if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
            ++kvm->stat.remote_tlb_flush;
        cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
    }

If this is OK, we only need a small change to the current code, since
kvm_mmu_commit_remote_flush() is very similar to
kvm_flush_remote_tlbs().