From: Avi Kivity <avi.kivity@gmail.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kvm@vger.kernel.org, ak@linux.intel.com, pbonzini@redhat.com,
	xiaoguangrong@linux.vnet.ibm.com, gleb@kernel.org
Subject: Re: [patch 2/5] KVM: MMU: allow pinning spte translations (TDP-only)
Date: Sun, 22 Jun 2014 16:35:24 +0300
Message-ID: <53A6DB9C.7040107@gmail.com>
In-Reply-To: <20140619182627.GA32410@amt.cnet>


On 06/19/2014 09:26 PM, Marcelo Tosatti wrote:
> On Thu, Jun 19, 2014 at 11:01:06AM +0300, Avi Kivity wrote:
>> On 06/19/2014 02:12 AM, mtosatti@redhat.com wrote:
>>> Allow vcpus to pin spte translations by:
>>>
>>> 1) Creating a per-vcpu list of pinned ranges.
>>> 2) On mmu reload request:
>>> 	- Fault ranges.
>>> 	- Mark sptes with a pinned bit.
>>> 	- Mark shadow pages as pinned.
>>>
>>> 3) Then modify the following actions:
>>> 	- Page age => skip spte flush.
>>> 	- MMU notifiers => force mmu reload request (which kicks cpu out of
>>> 				guest mode).
>>> 	- GET_DIRTY_LOG => force mmu reload request.
>>> 	- SLAB shrinker => skip shadow page deletion.
>>>
>>> TDP-only.
>>>
>>> +int kvm_mmu_register_pinned_range(struct kvm_vcpu *vcpu,
>>> +				  gfn_t base_gfn, unsigned long npages)
>>> +{
>>> +	struct kvm_pinned_page_range *p;
>>> +
>>> +	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
>>> +	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
>>> +		if (p->base_gfn == base_gfn && p->npages == npages) {
>>> +			mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
>>> +			return -EEXIST;
>>> +		}
>>> +	}
>>> +	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
>>> +
>>> +	if (vcpu->arch.nr_pinned_ranges >=
>>> +	    KVM_MAX_PER_VCPU_PINNED_RANGE)
>>> +		return -ENOSPC;
>>> +
>>> +	p = kzalloc(sizeof(struct kvm_pinned_page_range), GFP_KERNEL);
>>> +	if (!p)
>>> +		return -ENOMEM;
>>> +
>>> +	vcpu->arch.nr_pinned_ranges++;
>>> +
>>> +	trace_kvm_mmu_register_pinned_range(vcpu->vcpu_id, base_gfn, npages);
>>> +
>>> +	INIT_LIST_HEAD(&p->link);
>>> +	p->base_gfn = base_gfn;
>>> +	p->npages = npages;
>>> +	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
>>> +	list_add(&p->link, &vcpu->arch.pinned_mmu_pages);
>>> +	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
>>> +	kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>>> +
>>> +	return 0;
>>> +}
>>> +
>> What happens if ranges overlap (within a vcpu, cross-vcpu)?
> The page(s) are faulted multiple times if ranges overlap within a vcpu.
>
> I see no reason to disallow overlapping ranges. Do you?

Not really.  Just making sure nothing horrible happens.

>
>> Or if a range overflows and wraps around 0?
> Pagefault fails on vm-entry -> KVM_REQ_TRIPLE_FAULT.
>
> Will double check for overflows to make sure.

Will the loop terminate?
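
A minimal sketch of the kind of check that would settle the question at
registration time, so the pin loop only ever sees a sane range. This is
not code from the patch: the name pinned_range_is_sane and the 4 KiB
page shift are assumptions for illustration. It rejects a range that is
empty, wraps around 0, or contains a gfn that would lose high bits in
gfn << PAGE_SHIFT.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

typedef uint64_t gfn_t;

/*
 * Hypothetical helper: return true if [base_gfn, base_gfn + npages) is
 * non-empty, does not wrap around 0, and every gfn in it can be shifted
 * into a guest physical address without overflow.
 */
bool pinned_range_is_sane(gfn_t base_gfn, uint64_t npages)
{
	const gfn_t max_gfn = UINT64_MAX >> SKETCH_PAGE_SHIFT;

	if (npages == 0)
		return false;
	if (base_gfn > max_gfn)
		return false;
	/* Count of gfns from base_gfn up to and including max_gfn. */
	if (npages > max_gfn - base_gfn + 1)
		return false;
	return true;
}
```

The comparison is written as a subtraction on the limit rather than as
base_gfn + npages, so the check itself cannot overflow.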

>> Looks like you're limiting the number of ranges, but not the number
>> of pages, so a guest can lock all of its memory.
> Yes. The page pinning at get_page time can also lock all of
> guest memory.

I'm sure that can't be good.  Maybe subject this pinning to the task 
mlock limit.
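
Roughly what is meant by accounting against the mlock limit, as a
userspace model rather than kernel code. The struct pin_account and both
function names are hypothetical, and the limit is passed in as a plain
byte count (in the kernel it would presumably come from the task's
RLIMIT_MEMLOCK). A 4 KiB page size is assumed.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical accounting state; not a structure from the patch. */
struct pin_account {
	uint64_t pinned_pages;	/* pages currently charged */
	uint64_t limit_bytes;	/* e.g. the RLIMIT_MEMLOCK soft limit */
};

/* Charge npages; return false and charge nothing if the limit is hit. */
bool pin_account_charge(struct pin_account *acct, uint64_t npages)
{
	uint64_t limit_pages = acct->limit_bytes >> SKETCH_PAGE_SHIFT;

	if (acct->pinned_pages > limit_pages)
		return false;
	/* Subtraction form avoids overflow in pinned_pages + npages. */
	if (npages > limit_pages - acct->pinned_pages)
		return false;
	acct->pinned_pages += npages;
	return true;
}

/* Release a previous charge, clamping at zero. */
void pin_account_uncharge(struct pin_account *acct, uint64_t npages)
{
	if (npages > acct->pinned_pages)
		npages = acct->pinned_pages;
	acct->pinned_pages -= npages;
}
```

With this shape, limiting the number of pages (not just the number of
ranges) falls out of the charge step, and unregistering a range has an
obvious place to give the pages back.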

>
>>> +
>>> +/*
>>> + * Pin KVM MMU page translations. This guarantees, for valid
>>> + * addresses registered by kvm_mmu_register_pinned_range (valid address
>>> + * meaning address which possesses sufficient information for fault to
>>> + * be resolved), valid translations exist while in guest mode and
>>> + * therefore no VM-exits due to faults will occur.
>>> + *
>>> + * Failure to instantiate pages will abort guest entry.
>>> + *
>>> + * Page frames should be pinned with get_page in advance.
>>> + *
>>> + * Pinning is not guaranteed while executing as L2 guest.
>> Does this undermine security?
> PEBS writes should not be enabled when L2 guest is executing.

What prevents L1 from setting up PEBS MSRs for L2?

>>> +	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
>>> +		gfn_t gfn_offset;
>>> +
>>> +		for (gfn_offset = 0; gfn_offset < p->npages; gfn_offset++) {
>>> +			gfn_t gfn = p->base_gfn + gfn_offset;
>>> +			int r;
>>> +			bool pinned = false;
>>> +
>>> +			r = vcpu->arch.mmu.page_fault(vcpu, gfn << PAGE_SHIFT,
>>> +						     PFERR_WRITE_MASK, false,
>>> +						     true, &pinned);
>>> +			/* MMU notifier sequence window: retry */
>>> +			if (!r && !pinned)
>>> +				kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
>>> +			if (r) {
>>> +				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
>>> +				break;
>>> +			}
>>> +
>>> +		}
>>> +	}
>>> +	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
>>> +}
>>> +
>>>   int kvm_mmu_load(struct kvm_vcpu *vcpu)
>>>   {
>>>   	int r;
>>> @@ -3916,6 +4101,7 @@
>>>   		goto out;
>>>   	/* set_cr3() should ensure TLB has been flushed */
>>>   	vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
>>> +	kvm_mmu_pin_pages(vcpu);
>>>   out:
>>>   	return r;
>>>   }
>>>
>> I don't see where  you unpin pages, so even if you limit the number
>> of pinned pages, a guest can pin all of memory by iterating over all
>> of memory and pinning it a chunk at a time.
> The caller should be responsible for limiting number of pages pinned it
> is pinning the struct pages?

The caller would be the debug store data area MSR callbacks. How would 
they know what the limit is?

>
> And in that case, should remove any limiting from this interface, as
> that is confusing.
>
>> You might try something similar to guest MTRR handling.
> Please be more verbose.
>

mtrr_state already provides physical range attributes that are looked up 
on every fault, so I thought you could get the pinned attribute from 
there. But I guess that's too late, you want to pre-fault everything.
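
For illustration only, the per-fault lookup idea looks something like
the following. This is a sketch, not the mtrr_state code and not part of
the patch: struct pinned_range mirrors the base_gfn/npages pair from the
quoted code, and gfn_is_pinned is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical mirror of the base_gfn/npages pair in the quoted patch. */
struct pinned_range {
	gfn_t base_gfn;
	uint64_t npages;
};

/* Return true if gfn falls inside any of the nr registered ranges. */
bool gfn_is_pinned(const struct pinned_range *ranges, size_t nr, gfn_t gfn)
{
	for (size_t i = 0; i < nr; i++) {
		/* Subtraction form avoids overflow in base_gfn + npages. */
		if (gfn >= ranges[i].base_gfn &&
		    gfn - ranges[i].base_gfn < ranges[i].npages)
			return true;
	}
	return false;
}
```

A lookup like this only tells the fault path that a gfn should be
pinned once a fault happens, which is why it does not help when the
translations have to exist before guest entry.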


