Re: [PATCH v2 2/5] KVM: MMU: adjust page size early if gfn used as page table

From: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>,
	LKML <linux-kernel@vger.kernel.org>, KVM <kvm@vger.kernel.org>
Subject: Re: [PATCH v2 2/5] KVM: MMU: adjust page size early if gfn used as page table
Date: Thu, 13 Dec 2012 03:23:26 +0800
Message-ID: <50C8D9AE.5050706@linux.vnet.ibm.com>
In-Reply-To: <20121212005723.GA2898@amt.cnet>

On 12/12/2012 08:57 AM, Marcelo Tosatti wrote:
> On Mon, Dec 10, 2012 at 05:13:03PM +0800, Xiao Guangrong wrote:
>> We have two issues in current code:
>> - if target gfn is used as its page table, guest will refault then kvm will use
>>   small page size to map it. We need two #PF to fix its shadow page table
>>
>> - sometimes, say a exception is triggered during vm-exit caused by #PF
>>   (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
>>   by the target gfn before go into page fault path, it will cause infinite
>>   loop:
>>   delete shadow pages shadowed by the gfn -> try to use large page size to map
>>   the gfn -> retry the access ->...
>>
>> To fix these, We can adjust page size early if the target gfn is used as page
>> table
>>
>> Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
>> ---
>>  arch/x86/kvm/mmu.c         |   13 ++++---------
>>  arch/x86/kvm/paging_tmpl.h |   33 ++++++++++++++++++++++++++++++++-
>>  2 files changed, 36 insertions(+), 10 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2a3c890..54fc61e 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2380,15 +2380,10 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
>>  	if (pte_access & ACC_WRITE_MASK) {
>>
>>  		/*
>> -		 * There are two cases:
>> -		 * - the one is other vcpu creates new sp in the window
>> -		 *   between mapping_level() and acquiring mmu-lock.
>> -		 * - the another case is the new sp is created by itself
>> -		 *   (page-fault path) when guest uses the target gfn as
>> -		 *   its page table.
>> -		 * Both of these cases can be fixed by allowing guest to
>> -		 * retry the access, it will refault, then we can establish
>> -		 * the mapping by using small page.
>> +		 * Other vcpu creates new sp in the window between
>> +		 * mapping_level() and acquiring mmu-lock. We can
>> +		 * allow guest to retry the access, the mapping can
>> +		 * be fixed if guest refault.
>>  		 */
>>  		if (level > PT_PAGE_TABLE_LEVEL &&
>>  		    has_wrprotected_page(vcpu->kvm, gfn, level))
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index ec481e9..32d77ff 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -491,6 +491,36 @@ out_gpte_changed:
>>  	return 0;
>>  }
>>
>> + /*
>> + * To see whether the mapped gfn can write its page table in the current
>> + * mapping.
>> + *
>> + * It is the helper function of FNAME(page_fault). When guest uses large page
>> + * size to map the writable gfn which is used as current page table, we should
>> + * force kvm to use small page size to map it because new shadow page will be
>> + * created when kvm establishes shadow page table that stop kvm using large
>> + * page size. Do it early can avoid unnecessary #PF and emulation.
>> + *
>> + * Note: the PDPT page table is not checked for PAE-32 bit guest. It is ok
>> + * since the PDPT is always shadowed, that means, we can not use large page
>> + * size to map the gfn which is used as PDPT.
>> + */
>> +static bool
>> +FNAME(mapped_gfn_can_write_current_pagetable)(struct guest_walker *walker)
>> +{
>> +	int level;
>> +	gfn_t mask = ~(KVM_PAGES_PER_HPAGE(walker->level) - 1);
>> +
>> +	if (!(walker->pte_access & ACC_WRITE_MASK))
>> +		return false;
>> +
>> +	for (level = walker->level; level <= walker->max_level; level++)
>> +		if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))
>> +			return true;
> 
> XOR won't work. Just check with sums and integer comparison, ie.
> walker->gfn + KVM_PAGES_PER_HPAGE(walker->level).

That cannot work, since walker->gfn is not aligned to the large page size. For
example, if the guest uses a large page to map 0x123000000 to physical address
range 0-2M and then faults on 0x123001000, walker->gfn = 0x1 (guest physical
address 0x1000).

The code "if (!((walker->gfn ^ walker->table_gfn[level - 1]) & mask))" is the
same as "if ((walker->gfn & mask) == (walker->table_gfn[level - 1] & mask))":
if any page in the large page area is used as a page table, we should use 4K
page size to map it.

In the above example, if table_gfn is in the area [0, 2M), kvm is forced to use
4K page size.
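
To make the equivalence concrete, here is a minimal standalone sketch (not
kernel code). It assumes 4K base pages and 2M large pages, i.e. 512 frames per
large page; the helper names and the constant are hypothetical, and the range
check is only my reading of the "sums and integer comparison" suggestion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Assumption: 4K base pages and 2M large pages, so 512 frames per large page. */
#define PAGES_PER_2M_HPAGE 512ULL

/* XOR form, shaped like the check in the patch. */
static bool same_hpage_xor(gfn_t gfn, gfn_t table_gfn)
{
	gfn_t mask = ~(PAGES_PER_2M_HPAGE - 1);

	return !((gfn ^ table_gfn) & mask);
}

/* Masked-equality form; the parentheses matter because == binds tighter than &. */
static bool same_hpage_eq(gfn_t gfn, gfn_t table_gfn)
{
	gfn_t mask = ~(PAGES_PER_2M_HPAGE - 1);

	return (gfn & mask) == (table_gfn & mask);
}

/* Range check built directly on the unaligned gfn, shown only for contrast. */
static bool in_range_from_unaligned_gfn(gfn_t gfn, gfn_t table_gfn)
{
	return table_gfn >= gfn && table_gfn < gfn + PAGES_PER_2M_HPAGE;
}

static void check_example(void)
{
	gfn_t gfn = 0x1000 >> 12; /* frame 1: guest physical address 0x1000 */

	/* A page table at frame 0 shares the large page [0, 2M) with gfn. */
	assert(same_hpage_xor(gfn, 0));
	assert(same_hpage_eq(gfn, 0));

	/* Frame 512 starts the next large page, so it must not match. */
	assert(!same_hpage_xor(gfn, 512));
	assert(!same_hpage_eq(gfn, 512));

	/* The unaligned range check misses frame 0 and wrongly accepts frame 512. */
	assert(!in_range_from_unaligned_gfn(gfn, 0));
	assert(in_range_from_unaligned_gfn(gfn, 512));
}
```

Calling check_example() passes all assertions: the two masked forms agree, and
a comparison that starts from the unaligned gfn only becomes correct once the
gfn is first rounded down to the large page boundary.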

> 
> Moreover, its confusing to have it checked at this level. What about
> doing at reexecute_instruction?

Hmm, this patch is trying to fix the bug described in the changelog:

======
 - sometimes, say a exception is triggered during vm-exit caused by #PF
   (see handle_exception() in vmx.c), we remove all the shadow pages shadowed
   by the target gfn before go into page fault path, it will cause infinite
   loop:
   delete shadow pages shadowed by the gfn -> try to use large page size to map
   the gfn -> retry the access ->...
======

It is caused by this code in handle_exception():

	if (is_page_fault(intr_info)) {
		/* EPT won't cause page fault directly */
		BUG_ON(enable_ept);
		cr2 = vmcs_readl(EXIT_QUALIFICATION);
		trace_kvm_page_fault(cr2, error_code);

		if (kvm_event_needs_reinjection(vcpu))
			kvm_mmu_unprotect_page_virt(vcpu, cr2);
		return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
	}

This bug was introduced in commit c219346325.

Another way to fix it is to make this change:
@@ -2395,7 +2395,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
                 */
                if (level > PT_PAGE_TABLE_LEVEL &&
                    has_wrprotected_page(vcpu->kvm, gfn, level))
-                       goto done;
+                       return 1;

The disadvantage of this approach is that it causes unnecessary emulation. For
example, if 0-2M is mapped in the guest and only page 0 is used as a page table,
any write to [4K, 2M) will need to be emulated.
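
A rough way to see the cost is the toy model below. It is not kernel code: it
just counts how many 4K frames of one 2M region would have writes emulated
under each approach. The function names and the array representation are
hypothetical, and 512 frames per large page is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 2M large page made of 4K frames. */
#define PAGES_PER_2M_HPAGE 512

/*
 * "return 1" alternative: if any frame in the region is a write-protected
 * page table, writes to every frame in the region are emulated.
 */
static size_t emulated_frames_large_mapping(const bool is_page_table[PAGES_PER_2M_HPAGE])
{
	for (size_t i = 0; i < PAGES_PER_2M_HPAGE; i++)
		if (is_page_table[i])
			return PAGES_PER_2M_HPAGE;
	return 0;
}

/*
 * 4K mapping: only the frames that really are page tables stay
 * write-protected, so only writes to those are emulated.
 */
static size_t emulated_frames_small_mapping(const bool is_page_table[PAGES_PER_2M_HPAGE])
{
	size_t count = 0;

	for (size_t i = 0; i < PAGES_PER_2M_HPAGE; i++)
		if (is_page_table[i])
			count++;
	return count;
}
```

With only frame 0 marked as a page table, the first function reports 512
frames and the second reports 1, so 511 frames would be emulated
unnecessarily.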

What do you think?

Thread overview: 21+ messages
2012-12-10  9:11 [PATCH v2 0/5] KVM: x86: improve reexecute_instruction Xiao Guangrong
2012-12-10  9:12 ` [PATCH v2 1/5] KVM: MMU: move adjusting pte access for softmmu to FNAME(page_fault) Xiao Guangrong
2012-12-11 23:47   ` Marcelo Tosatti
2012-12-12 18:53     ` Xiao Guangrong
2012-12-10  9:13 ` [PATCH v2 2/5] KVM: MMU: adjust page size early if gfn used as page table Xiao Guangrong
2012-12-12  0:57   ` Marcelo Tosatti
2012-12-12 19:23     ` Xiao Guangrong [this message]
2012-12-13 22:37       ` Marcelo Tosatti
2012-12-10  9:13 ` [PATCH v2 3/5] KVM: x86: clean up reexecute_instruction Xiao Guangrong
2012-12-10  9:14 ` [PATCH v2 4/5] KVM: x86: let reexecute_instruction work for tdp Xiao Guangrong
2012-12-10  9:14 ` [PATCH v2 5/5] KVM: x86: improve reexecute_instruction Xiao Guangrong
2012-12-12  1:09   ` Marcelo Tosatti
2012-12-12 19:29     ` Xiao Guangrong
2012-12-13 23:02       ` Marcelo Tosatti
2012-12-14  3:40         ` Xiao Guangrong
2012-12-11 23:36 ` [PATCH v2 0/5] " Marcelo Tosatti
2012-12-12 20:05   ` Xiao Guangrong
2012-12-13 22:54     ` Marcelo Tosatti
2012-12-14  4:50       ` Xiao Guangrong
2012-12-15  1:05         ` Marcelo Tosatti
2012-12-23 11:46           ` Gleb Natapov
