From mboxrd@z Thu Jan 1 00:00:00 1970
From: Xiao Guangrong
Subject: Re: [PATCH 2/7] KVM: MMU: document clear_spte_count
Date: Wed, 19 Jun 2013 20:25:46 +0800
Message-ID: <51C1A34A.7080201@linux.vnet.ibm.com>
References: <1371632965-20077-1-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
 <1371632965-20077-3-git-send-email-xiaoguangrong@linux.vnet.ibm.com>
 <51C196E9.2080508@redhat.com>
 <51C19BA6.2060501@linux.vnet.ibm.com>
 <51C19C4C.3000800@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Cc: gleb@redhat.com, avi.kivity@gmail.com, mtosatti@redhat.com,
 linux-kernel@vger.kernel.org, kvm@vger.kernel.org
To: Paolo Bonzini
Return-path:
In-Reply-To: <51C19C4C.3000800@redhat.com>
Sender: linux-kernel-owner@vger.kernel.org
List-Id: kvm.vger.kernel.org

On 06/19/2013 07:55 PM, Paolo Bonzini wrote:
> On 19/06/2013 13:53, Xiao Guangrong wrote:
>> On 06/19/2013 07:32 PM, Paolo Bonzini wrote:
>>> On 19/06/2013 11:09, Xiao Guangrong wrote:
>>>> Document it to Documentation/virtual/kvm/mmu.txt
>>>
>>> While reviewing the docs, I looked at the code.
>>>
>>> Why can't this happen?
>>>
>>>   CPU 1: __get_spte_lockless        CPU 2: __update_clear_spte_slow
>>> ------------------------------------------------------------------------------
>>>                                     write low
>>>   read count
>>>   read low
>>>   read high
>>>                                     write high
>>>   check low and count
>>>                                     update count
>>>
>>> The check passes, but CPU 1 read a "torn" SPTE.
>>
>> In this case, CPU 1 will read the "new low bits" and the "old high bits", right?
>> the P bit in the low bits is cleared when do __update_clear_spte_slow, i.e, it is
>> not present, so the whole value is ignored.
>
> Indeed that's what the comment says, too.  But then why do we need the
> count at all?  The spte that is read is exactly the same before and
> after the count is updated.

The count is there to detect the spte being repeatedly marked present, so
that the lockless side never sees a present-to-present change. Otherwise,
we can get this:

Say spte = 0xa11110001 (high 32 bits = 0xa, low 32 bits = 0x11110001)

  CPU 1: __get_spte_lockless          CPU 2: __update_clear_spte_slow
----------------------------------------------------------------------
  read low: low = 0x11110001
                                      clear the spte, then spte = 0x0ull
  read high: high = 0x0
                                      set spte to 0xb11110001
                                      (high 32 bits = 0xb,
                                       low 32 bits = 0x11110001)
  read low: 0x11110001 and see
  it is not changed.

In this case, CPU 1 sees that the low bits have not changed, so it accepts
the torn value (high = 0x0, low = 0x11110001) and tries to access the
memory at 0x11110000.

BTW, we are using the TLB flush to protect lockless walking; the count can
be dropped after improving kvm_set_pte_rmapp, which is the only place that
changes an spte from present to present without a TLB flush.
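
To make the example above concrete, here is a minimal user-space sketch
that replays the same interleaving on a single thread. It is not the
kernel code: struct split_spte, struct shadow_page, writer_clear(),
writer_set() and replay_race() are hypothetical stand-ins made up for
illustration, and there are no memory barriers because nothing here
actually runs concurrently. It only shows which final check accepts the
torn value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit spte accessed as two 32-bit halves. */
struct split_spte {
	uint32_t low;	/* contains the present bit (bit 0) */
	uint32_t high;
};

/* Hypothetical stand-in for the shadow page that owns the spte. */
struct shadow_page {
	struct split_spte spte;
	int clear_spte_count;	/* bumped each time a present spte is cleared */
};

/* Writer: clear a present spte, then record the clear in the count. */
static void writer_clear(struct shadow_page *sp)
{
	sp->spte.low = 0;
	sp->spte.high = 0;
	sp->clear_spte_count++;
}

/* Writer: install a new present spte (high half first, then low). */
static void writer_set(struct shadow_page *sp, uint64_t val)
{
	sp->spte.high = (uint32_t)(val >> 32);
	sp->spte.low = (uint32_t)val;
}

/*
 * Replay the interleaving from the diagram above.
 * Returns true if the reader's final check accepts the value it
 * assembled; *seen receives that value.
 */
static bool replay_race(bool use_count, uint64_t *seen)
{
	struct shadow_page sp = {
		.spte = { .low = 0x11110001u, .high = 0xa },
		.clear_spte_count = 0,
	};

	int count = sp.clear_spte_count;	/* reader: read count */
	uint32_t low = sp.spte.low;		/* reader: read low */
	writer_clear(&sp);			/* writer: spte = 0x0ull */
	uint32_t high = sp.spte.high;		/* reader: read high (0x0) */
	writer_set(&sp, 0xb11110001ULL);	/* writer: present again */

	bool low_same = (low == sp.spte.low);	/* reader: re-read low */
	bool count_same = (count == sp.clear_spte_count);

	*seen = ((uint64_t)high << 32) | low;
	return use_count ? (low_same && count_same) : low_same;
}
```

With use_count == false the check passes and the reader is left holding
0x11110001, which is neither the old nor the new spte. With
use_count == true the changed count makes the check fail, so a real
reader would retry.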