From: Xiao Guangrong
Subject: Re: [PATCH v3 07/15] KVM: MMU: introduce nulls desc
Date: Thu, 28 Nov 2013 16:53:50 +0800
Message-ID: <5297049E.3020800@linux.vnet.ibm.com>
In-Reply-To: <20131126193148.GA18071@amt.cnet>
To: Marcelo Tosatti
Cc: Gleb Natapov, avi.kivity@gmail.com, "pbonzini@redhat.com Bonzini", linux-kernel@vger.kernel.org, kvm@vger.kernel.org, Eric Dumazet, Peter Zijlstra

On 11/27/2013 03:31 AM, Marcelo Tosatti wrote:
> On Tue, Nov 26, 2013 at 11:21:37AM +0800, Xiao Guangrong wrote:
>> On 11/26/2013 02:12 AM, Marcelo Tosatti wrote:
>>> On Mon, Nov 25, 2013 at 02:29:03PM +0800, Xiao Guangrong wrote:
>>>>>> Also, there is no guarantee of termination (as long as sptes are
>>>>>> deleted with the correct timing). BTW, can't see any guarantee of
>>>>>> termination for rculist nulls either (a writer can race with a lockless
>>>>>> reader indefinitely, restarting the lockless walk every time).
>>>>>
>>>>> Hmm, that can be avoided by checking the dirty-bitmap before the rewalk.
>>>>> That is, if the dirty-bitmap has been set during lockless write-protection,
>>>>> it's unnecessary to write-protect its sptes. What do you think?
>>>> This idea is based on the fact that the number of rmap entries is limited
>>>> by RMAP_RECYCLE_THRESHOLD. So, when a new spte is added to the rmap, we
>>>> can break the rewalk at once; when one is deleted, we rewalk at most
>>>> RMAP_RECYCLE_THRESHOLD times.
>>>
>>> Please explain in more detail.
>>
>> Okay.
>>
>> My proposal is like this:
>>
>> pte_list_walk_lockless()
>> {
>> restart:
>>
>> +	if (__test_bit(slot->arch.dirty_bitmap, gfn-index))
>> +		return;
>>
>> 	code-doing-lockless-walking;
>> 	......
>> }
>>
>> Before doing the lockless walk, we check the dirty-bitmap first. If it
>> is set, we can simply skip write-protection for the gfn; that covers the
>> case where a new spte is being added to the rmap while we access the
>> rmap locklessly.
>
> The dirty bit could be set after the check.
>
>> For the case of deleting sptes from the rmap, the number of entries is
>> limited by RMAP_RECYCLE_THRESHOLD, so the walk is not endless.
>
> It can shrink and grow while lockless walk is performed.

Yes, indeed.

Hmm, another idea to fix this is to encode the position in the reserved
bits of the desc->more pointer, for example:

         +------+    +------+    +------+
rmapp -> |Desc 0| -> |Desc 1| -> |Desc 2|
         +------+    +------+    +------+

There are 3 descs on the rmap, and:

rmapp = &desc0 | 1UL | 3UL << 50;
desc0->more = desc1 | 2UL << 50;
desc1->more = desc2 | 1UL << 50;
desc2->more = &rmapp | 1UL; (The nulls pointer)

We walk to the next desc only if the "position" of the current desc is >=
the position of the next desc. This ensures that the walk always reaches
the last desc.

To avoid doing too many rewalks, we fall back to the slow path (walking
with the lock held) once the walk has been retried more than N times.
Thanks to all you guys on Thanksgiving Day. :)