From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: [patch 2/5] KVM: MMU: allow pinning spte translations (TDP-only)
Date: Thu, 19 Jun 2014 11:01:06 +0300
Message-ID: <53A298C2.4040005@gmail.com>
References: <20140618231203.846608908@amt.cnet> <20140618231521.569025131@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: pbonzini@redhat.com, xiaoguangrong@linux.vnet.ibm.com, gleb@kernel.org
To: mtosatti@redhat.com, kvm@vger.kernel.org, ak@linux.intel.com
Return-path:
Received: from mail-we0-f172.google.com ([74.125.82.172]:44909 "EHLO
	mail-we0-f172.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750975AbaFSIBK (ORCPT );
	Thu, 19 Jun 2014 04:01:10 -0400
Received: by mail-we0-f172.google.com with SMTP id u57so1966299wes.3 for ;
	Thu, 19 Jun 2014 01:01:08 -0700 (PDT)
In-Reply-To: <20140618231521.569025131@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID:

On 06/19/2014 02:12 AM, mtosatti@redhat.com wrote:
> Allow vcpus to pin spte translations by:
>
> 1) Creating a per-vcpu list of pinned ranges.
> 2) On mmu reload request:
> 	- Fault ranges.
> 	- Mark sptes with a pinned bit.
> 	- Mark shadow pages as pinned.
>
> 3) Then modify the following actions:
> 	- Page age => skip spte flush.
> 	- MMU notifiers => force mmu reload request (which kicks cpu out of
> 	  guest mode).
> 	- GET_DIRTY_LOG => force mmu reload request.
> 	- SLAB shrinker => skip shadow page deletion.
>
> TDP-only.
>
>
> +int kvm_mmu_register_pinned_range(struct kvm_vcpu *vcpu,
> +				   gfn_t base_gfn, unsigned long npages)
> +{
> +	struct kvm_pinned_page_range *p;
> +
> +	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
> +	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
> +		if (p->base_gfn == base_gfn && p->npages == npages) {
> +			mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
> +			return -EEXIST;
> +		}
> +	}
> +	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
> +
> +	if (vcpu->arch.nr_pinned_ranges >=
> +	    KVM_MAX_PER_VCPU_PINNED_RANGE)
> +		return -ENOSPC;
> +
> +	p = kzalloc(sizeof(struct kvm_pinned_page_range), GFP_KERNEL);
> +	if (!p)
> +		return -ENOMEM;
> +
> +	vcpu->arch.nr_pinned_ranges++;
> +
> +	trace_kvm_mmu_register_pinned_range(vcpu->vcpu_id, base_gfn, npages);
> +
> +	INIT_LIST_HEAD(&p->link);
> +	p->base_gfn = base_gfn;
> +	p->npages = npages;
> +	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
> +	list_add(&p->link, &vcpu->arch.pinned_mmu_pages);
> +	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
> +	kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> +
> +	return 0;
> +}
> +

What happens if ranges overlap (within a vcpu, or across vcpus)? Or if a 
range overflows and wraps around 0? Or if it does not refer to RAM?

Looks like you're limiting the number of ranges, but not the number of 
pages, so a guest can lock all of its memory.

> +
> +/*
> + * Pin KVM MMU page translations. This guarantees, for valid
> + * addresses registered by kvm_mmu_register_pinned_range (valid address
> + * meaning address which possesses sufficient information for fault to
> + * be resolved), valid translations exist while in guest mode and
> + * therefore no VM-exits due to faults will occur.
> + *
> + * Failure to instantiate pages will abort guest entry.
> + *
> + * Page frames should be pinned with get_page in advance.
> + *
> + * Pinning is not guaranteed while executing as L2 guest.

Does this undermine security?
> + *
> + */
> +
> +static void kvm_mmu_pin_pages(struct kvm_vcpu *vcpu)
> +{
> +	struct kvm_pinned_page_range *p;
> +
> +	if (is_guest_mode(vcpu))
> +		return;
> +
> +	if (!vcpu->arch.mmu.direct_map)
> +		return;
> +
> +	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
> +
> +	mutex_lock(&vcpu->arch.pinned_mmu_mutex);

Is the mutex actually needed? It seems it's only taken in vcpu context, 
so the vcpu mutex should be sufficient.

> +	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
> +		gfn_t gfn_offset;
> +
> +		for (gfn_offset = 0; gfn_offset < p->npages; gfn_offset++) {
> +			gfn_t gfn = p->base_gfn + gfn_offset;
> +			int r;
> +			bool pinned = false;
> +
> +			r = vcpu->arch.mmu.page_fault(vcpu, gfn << PAGE_SHIFT,
> +						      PFERR_WRITE_MASK, false,
> +						      true, &pinned);
> +			/* MMU notifier sequence window: retry */
> +			if (!r && !pinned)
> +				kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
> +			if (r) {
> +				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
> +				break;
> +			}
> +
> +		}
> +	}
> +	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
> +}
> +
>  int kvm_mmu_load(struct kvm_vcpu *vcpu)
>  {
>  	int r;
> @@ -3916,6 +4101,7 @@
>  		goto out;
>  	/* set_cr3() should ensure TLB has been flushed */
>  	vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
> +	kvm_mmu_pin_pages(vcpu);
>  out:
>  	return r;
>  }

I don't see where you unpin pages, so even if you limit the number of 
pinned pages, a guest can pin all of memory by iterating over all of 
memory and pinning it a chunk at a time. You might try something 
similar to guest MTRR handling.