From: Lan Tianyu
Subject: Re: [PATCH] KVM: Remove redundant smp_mb() in the kvm_mmu_commit_zap_page()
Date: Wed, 09 Mar 2016 15:18:40 +0800
Message-ID: <56DFCE50.2000807@intel.com>
References: <1457055312-27067-1-git-send-email-tianyu.lan@intel.com> <56D9354E.9040908@intel.com> <56D94BFE.1080406@redhat.com> <56DE8F1A.9000401@intel.com> <56DEEF69.20208@redhat.com>
In-Reply-To: <56DEEF69.20208@redhat.com>
To: Paolo Bonzini, Thomas Gleixner
Cc: gleb@kernel.org, mingo@redhat.com, hpa@zytor.com, x86@kernel.org,
    kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Xiao Guangrong

On 2016-03-08 23:27, Paolo Bonzini wrote:
> Unfortunately that patch added a bad memory barrier: 1) it lacks a
> comment; 2) it lacks obvious pairing; 3) it is an smp_mb() after a read,
> so it's not even obvious that this memory barrier has to do with the
> immediately preceding read of kvm->tlbs_dirty.  It is also not
> documented in Documentation/virtual/kvm/mmu.txt (Guangrong documented
> most of his other work there, back in 2013, but not this one :)).
>
> The cmpxchg is ordered anyway against the read, because 1) x86 has
> implicit ordering between earlier loads and later stores; 2) even
> store-load barriers are unnecessary for accesses to the same variable
> (in this case kvm->tlbs_dirty).
>
> So offhand, I cannot say what it orders.  There are four possibilities:
>
> 1) it orders the read of tlbs_dirty with the read of mode.  In this
> case, an smp_rmb() would have been enough, and it's not clear where
> the matching smp_wmb() is.
>
> 2) it orders the read of tlbs_dirty with the KVM_REQ_TLB_FLUSH request.
> In this case an smp_load_acquire would be better.
>
> 3) it does the same as kvm_mmu_commit_zap_page's smp_mb() but for other
> callers of kvm_flush_remote_tlbs().  In this case, we know what the
> matching memory barrier is (walk_shadow_page_lockless_*).
>
> 4) it is completely unnecessary.
>
> My guess is either (2) or (3), but I am not sure.  We know that
> anticipating kvm->tlbs_dirty should be safe; worst case, it causes the
> cmpxchg to fail and an extra TLB flush on the next call to the MMU
> notifier.  But I'm not sure of what happens if the processor moves the
> read later.

I found that the smp_mb() in kvm_mmu_commit_zap_page() was removed by
commit 5befdc38 and that commit was reverted by commit a086f6a1e. Both
the removal and the revert hinged on whether kvm_flush_remote_tlbs() is
called under mmu_lock. vcpu->mode and the request bits are not always
accessed under mmu_lock, so I think the smp_mb() was not meant to order
against mode or requests when it was introduced.

>> The smp_mb() in kvm_mmu_commit_zap_page() was introduced by commit
>> c142786c6, which was merged later than commit a4ee1ca4. It pairs with
>> the smp_mb() in walk_shadow_page_lockless_begin/end() to keep order
>> between modifications of the page tables and reading mode.
>
> Yes; it also pairs with the smp_mb__after_srcu_read_unlock() in
> vcpu_enter_guest.
>
>> The smp_mb() in kvm_make_all_cpus_request() was introduced by commit
>> 6b7e2d09. It keeps order between setting the request bit and reading
>> mode.
>
> Yes.
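To double-check my understanding of that pairing, here is a rough
userspace sketch of the protocol (C11 atomics stand in for the kernel's
smp_mb(); all the names below are simplified stand-ins, not the real
KVM code):

/* Userspace model of the request/mode pairing; C11 fences stand in
 * for smp_mb().  Simplified stand-in, not the real KVM code. */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum { OUTSIDE_GUEST_MODE, IN_GUEST_MODE };
#define REQ_TLB_FLUSH (1 << 0)

static atomic_int mode;		/* models vcpu->mode */
static atomic_int requests;	/* models vcpu->requests */

/* Flush side, as in kvm_make_all_cpus_request(): set the request bit,
 * full barrier, then read mode to decide whether to kick the vcpu. */
static bool make_request_and_check_kick(void)
{
	atomic_fetch_or_explicit(&requests, REQ_TLB_FLUSH,
				 memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	return atomic_load_explicit(&mode, memory_order_relaxed)
	       == IN_GUEST_MODE;	/* true: send the kick IPI */
}

/* Entry side, as in vcpu_enter_guest(): write mode, full barrier, then
 * read the request bits (and, implicitly, the page tables).  One
 * barrier serves both orderings. */
static bool try_enter_guest(void)
{
	atomic_store_explicit(&mode, IN_GUEST_MODE, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */
	if (atomic_load_explicit(&requests, memory_order_relaxed)
	    & REQ_TLB_FLUSH)
		return false;	/* bail out and flush before entering */
	return true;
}

int main(void)
{
	/* Single-threaded demo of the invariant: with both fences in
	 * place, at least one side observes the other's write, so a
	 * vcpu can never run with a pending flush that nobody kicks. */
	printf("kick needed: %d\n", make_request_and_check_kick());
	printf("entered guest: %d\n", try_enter_guest());
	return 0;
}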
>>> So you can:
>>>
>>> 1) add a comment to kvm_flush_remote_tlbs like:
>>>
>>> 	/*
>>> 	 * We want to publish modifications to the page tables before reading
>>> 	 * mode. Pairs with a memory barrier in arch-specific code.
>>> 	 * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
>>> 	 * - powerpc: smp_mb in kvmppc_prepare_to_enter.
>>> 	 */
>>>
>>> 2) add a comment to vcpu_enter_guest and kvmppc_prepare_to_enter, saying
>>> that the memory barrier also orders the write to mode against any reads
>>> from the page tables done while the VCPU is running.  In other words, on
>>> entry a single memory barrier achieves two purposes (write ->mode before
>>> reading requests, write ->mode before reading page tables).
>>
>> These sound good.
>>
>>> The same should be true in kvm_flush_remote_tlbs().  So you may investigate
>>> removing the barrier from kvm_flush_remote_tlbs, because
>>> kvm_make_all_cpus_request already includes a memory barrier.  Like
>>> Thomas suggested, leave a comment in kvm_flush_remote_tlbs(),
>>> saying which memory barrier you are relying on and for what.
>>
>> If we remove the smp_mb() in kvm_flush_remote_tlbs(), we need to
>> leave comments both in kvm_flush_remote_tlbs() and
>> kvm_mmu_commit_zap_page(), right?
>
> Yes.  In fact, instead of removing it, I would change it to
>
> 	smp_mb__before_atomic();
>
> with a comment that points to the addition of the barrier in commit
> a4ee1ca4.  Unless Guangrong can enlighten us. :)

How about the following comments?

Comment for kvm_mmu_commit_zap_page():

	/*
	 * We need to make sure everyone sees our modifications to
	 * the page tables and sees changes to vcpu->mode here. The
	 * barrier in kvm_flush_remote_tlbs() helps us achieve both.
	 * Otherwise, we have to wait for all vcpus to exit guest mode
	 * and/or finish lockless shadow page table walks.
	 */
	kvm_flush_remote_tlbs(kvm);

Comment for kvm_flush_remote_tlbs():

	/*
	 * We want to publish modifications to the page tables before
	 * reading mode. Pairs with a memory barrier in arch-specific
	 * code.
	 * - x86: smp_mb__after_srcu_read_unlock in vcpu_enter_guest.
	 * - powerpc: smp_mb in kvmppc_prepare_to_enter.
	 */
	smp_mb__before_atomic();

>>> And finally, the memory barrier in kvm_make_all_cpus_request can become
>>> smp_mb__after_atomic, which is free on x86.
>>
>> I found you have done this in your tree :)
>>
>> commit 5b06b60b72b73cbe3805d2a64d223e0264bd0479
>> Author: Paolo Bonzini
>> Date:   Wed Feb 24 12:44:17 2016 +0100
>>
>>     KVM: mark memory barrier with smp_mb__after_atomic
>>
>>     Signed-off-by: Paolo Bonzini
>
> Yeah, but I reverted it because it makes sense to do it together with
> your patch.
>
> Paolo

--
Best regards
Tianyu Lan