From mboxrd@z Thu Jan 1 00:00:00 1970 From: Marcelo Tosatti Subject: Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly Date: Wed, 15 Aug 2012 14:59:20 -0300 Message-ID: <20120815175920.GB18452@amt.cnet> References: <20120806100603.GD8980@bloggs.ozlabs.ibm.com> <20120809181612.GA12285@amt.cnet> <20120810003439.GB26420@bloggs.ozlabs.ibm.com> <20120810012532.GA15142@amt.cnet> <20120810110909.fd044ae1.yoshikawa.takuya@oss.ntt.co.jp> <20120810183553.GA16345@amt.cnet> <20120811003754.GA6016@bloggs.ozlabs.ibm.com> <20120813163411.GA22134@amt.cnet> <20120813220450.GB15395@amt.cnet> <502B6B2D.2060906@redhat.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Cc: Paul Mackerras , Gleb Natapov , Alex Williamson , Takuya Yoshikawa , Alexander Graf , kvm-ppc@vger.kernel.org, kvm@vger.kernel.org To: Avi Kivity Return-path: Content-Disposition: inline In-Reply-To: <502B6B2D.2060906@redhat.com> Sender: kvm-ppc-owner@vger.kernel.org List-Id: kvm.vger.kernel.org On Wed, Aug 15, 2012 at 12:26:05PM +0300, Avi Kivity wrote: > On 08/14/2012 01:04 AM, Marcelo Tosatti wrote: > > On Mon, Aug 13, 2012 at 01:34:11PM -0300, Marcelo Tosatti wrote: > >> On Sat, Aug 11, 2012 at 10:37:54AM +1000, Paul Mackerras wrote: > >> > On Fri, Aug 10, 2012 at 03:35:53PM -0300, Marcelo Tosatti wrote: > >> > > There's no plan. I just wanted to confirm this before converting > >> > > to per-memslot flush. > >> > > > >> > > 1) __kvm_set_memory_region line 807: > >> > > > >> > > * - gfn_to_hva (kvm_read_guest, gfn_to_pfn) > >> > > * - kvm_is_visible_gfn (mmu_check_roots) > >> > > */ > >> > > kvm_arch_flush_shadow(kvm); > >> > > kfree(old_memslots); > >> > > } > >> > > > >> > > This can be converted to per-memslot flush. > >> > > >> > Yes, that would be good. > >> > > >> > > 2) __kvm_set_memory_region line 846: > >> > > > >> > > /* > >> > > * If the new memory slot is created, we need to clear all > >> > > * mmio sptes. > >> > > */ > >> > > if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT) > >> > > kvm_arch_flush_shadow(kvm); > >> > > > >> > > This must flush all translations in the new and old GPA ranges: > >> > > a) to remove any mmio sptes that existed in the new GPA range > >> > > (see ce88decffd17bf9f373cc233c for a description). > >> > > >> > I assume that mmio sptes are ones that are created for accesses to > >> > guest physical addresses where there is no memslot. On Book3S HV we > >> > trap those accesses (on POWER7 at least) but we don't create any > >> > hardware PTEs for them. So I don't have anything to do here. > >> > > >> > > b) to remove any translations in the old range (this is > >> > > necessary because change of GPA base is allowed). > >> > > >> > In fact I need to remove any translations in the old range *before* > >> > the new memslot gets committed, whereas this happens after that. > >> > This is why I was doing the flush in kvm_arch_prepare_memory_region. > >> > >> I see. The problem with that is, given that the fault path (reader > >> of memslots) is protected only by SRCU, the following window is open: > >> > >> A) kvm_arch_prepare_memory_region > >> B) rcu_assign_pointer(kvm->memslots, slots) > >> C) kvm_arch_commit_memory_region > >> > >> The window between A and B where a guest pagefault can instantiate a new > >> translation using the old memslot information (while you already removed > >> any translations in the old range). > >> > >> > >> Avi, Gleb, Alex, do you know why it is necessary to support change of > >> GPA base again? > >> > >> Without taking into consideration backwards compatibility, userspace > >> can first delete the slot and later create a new one. > > > > Current userspace does not, and can't see why older userspace would. If > > we break it and someone complains, it can be fixed. > > > > So, please: > > > > 1) Disallow change of base GPA (can do that in the "Check for overlaps" loop > > in __kvm_set_memory_region), returning EINVAL. > > 2) Introduce kvm_arch_invalidate_memslot (replacing first > > kvm_arch_flush_shadow). > > 3) Introduce kvm_arch_postcommit_memslot (replacing the > > second kvm_arch_flush_shadow). > > > Is there a reason why the kernel can't do the same? First remove the > old slot, then add the new one? > > We break atomicity, but it's less likely that guests rely on it than an > old qemu not relying on gpa-changing memory update. The guest should not expect memory accesses to an address to behave sanely while changing a BAR anyway. Yes, compatibility for change of GPA base can be done in the kernel. I can look into it next week if nobody has done so at that point. Thanks Avi.