From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Paul E. McKenney" Subject: Re: [RFC] KVM: x86: conditionally acquire/release slots_lock on entry/exit Date: Sun, 13 Sep 2009 09:26:52 -0700 Message-ID: <20090913162652.GF6867@linux.vnet.ibm.com> References: <20090827012000.762063112@localhost.localdomain> <20090827155450.GA6312@amt.cnet> <4A96B404.5010500@redhat.com> <20090827225940.GA13571@amt.cnet> <4A977E3C.7050604@redhat.com> <20090910223003.GA658@amt.cnet> <4AAD12F9.5020601@redhat.com> Reply-To: paulmck@linux.vnet.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Marcelo Tosatti , kvm@vger.kernel.org, Gleb Natapov To: Avi Kivity Return-path: Received: from e6.ny.us.ibm.com ([32.97.182.146]:59071 "EHLO e6.ny.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753308AbZIMQ0p (ORCPT ); Sun, 13 Sep 2009 12:26:45 -0400 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e6.ny.us.ibm.com (8.14.3/8.13.1) with ESMTP id n8DGVGsM013144 for ; Sun, 13 Sep 2009 12:31:16 -0400 Received: from d01av04.pok.ibm.com (d01av04.pok.ibm.com [9.56.224.64]) by d01relay04.pok.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id n8DGQmDB251700 for ; Sun, 13 Sep 2009 12:26:48 -0400 Received: from d01av04.pok.ibm.com (loopback [127.0.0.1]) by d01av04.pok.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id n8DGQmW5010468 for ; Sun, 13 Sep 2009 12:26:48 -0400 Content-Disposition: inline In-Reply-To: <4AAD12F9.5020601@redhat.com> Sender: kvm-owner@vger.kernel.org List-ID: On Sun, Sep 13, 2009 at 06:42:49PM +0300, Avi Kivity wrote: > On 09/11/2009 01:30 AM, Marcelo Tosatti wrote: >> >>> We don't need to stop vcpus, just kick them out of guest mode to let >>> them notice the new data. SRCU does that well. >>> >> Two problems: >> >> 1. The removal of memslots/aliases and zapping of mmu (to remove any >> shadow pages with stale sp->gfn) must be atomic from the POV of the >> pagefault path. Otherwise something like this can happen: >> >> fault path set_memory_region >> > > srcu_read_lock() > >> walk_addr fetches and validates >> table_gfns >> delete memslot >> synchronize_srcu >> >> fetch creates shadow >> > > srcu_read_unlock() > >> page with nonexistant sp->gfn >> > > I think synchronize_srcu() will be deferred until the fault path is > complete (and srcu_read_unlock() runs). Copying someone who knows for > sure. Yes, synchronize_srcu() will block until srcu_read_unlock() in this scenario, assuming that the same srcu_struct is used by both. >> OR >> >> mmu_alloc_roots path set_memory_region >> > > srcu_read_lock() > >> >> delete memslot >> root_gfn = vcpu->arch.cr3<< PAGE_SHIFT >> mmu_check_root(root_gfn) synchronize_rcu >> kvm_mmu_get_page() >> > > srcu_read_unlock() > >> kvm_mmu_zap_all >> > > Ditto, srcu_read_lock() protects us. Yep! >> Accesses between kvm_mmu_get_page and kvm_mmu_zap_all window can see >> shadow pages with stale gfn. >> >> But, if you still think its worthwhile to use RCU, at least handling >> gfn_to_memslot / unalias_gfn errors _and_ adding mmu_notifier_retry >> invalidation to set_memory_region path will be necessary (so that >> gfn_to_pfn validation, in the fault path, is discarded in case >> of memslot/alias update). > > It really is worthwhile to reuse complex infrastructure instead of writing > new infrastructure. Marcelo, in your first example, is your concern that the fault path needs to detect the memslot deletion? Or that the use of sp->gfn "leaks" out of the SRCU read-side critical section? Thanx, Paul >> 2. Another complication is that memslot creation and kvm_iommu_map_pages >> are not atomic. >> >> create memslot >> synchronize_srcu >> <----- vcpu grabs gfn reference without >> iommu mapping. >> kvm_iommu_map_pages >> >> Which can be solved by changing kvm_iommu_map_pages (and new gfn_to_pfn >> helper) to use base_gfn,npages,hva information from somewhere else other >> than visible kvm->memslots (so that when the slot becomes visible its >> already iommu mapped). > > Yes. It can accept a memslots structure instead of deriving it from > kvm->memslots. Then we do a rcu_assign_pointer() to switch the tables. > >> So it appears to me using RCU introduces more complications / subtle >> details than mutual exclusion here. The new request bit which the >> original patch introduces is limited to enabling/disabling conditional >> acquision of slots_lock (calling it a "new locking protocol" is unfair) >> to improve write acquision latency. >> > > It's true that it is not a new locking protocol. But I feel it is > worthwhile to try to use rcu for this; at least it will make it easier for > newcomers (provided they understand rcu). > > > -- > error compiling committee.c: too many arguments to function >