* [PATCH 1/3] KVM: Introduce a helper to check if gfn is in memslot @ 2011-04-18 18:32 Takuya Yoshikawa 2011-04-18 18:34 ` [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte Takuya Yoshikawa 2011-04-18 18:38 ` [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk Takuya Yoshikawa 0 siblings, 2 replies; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-18 18:32 UTC (permalink / raw) To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> This will be used later. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> --- include/linux/kvm_host.h | 5 +++++ virt/kvm/kvm_main.c | 6 ++---- 2 files changed, 7 insertions(+), 4 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 0bc3d37..9101698 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -612,6 +612,11 @@ static inline unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, return slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE; } +static inline bool gfn_in_memslot(struct kvm_memory_slot *slot, gfn_t gfn) +{ + return (gfn >= slot->base_gfn) && (gfn < slot->base_gfn + slot->npages); +} + static inline gpa_t gfn_to_gpa(gfn_t gfn) { return (gpa_t)gfn << PAGE_SHIFT; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 5814645..6df199d 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -940,8 +940,7 @@ static struct kvm_memory_slot *__gfn_to_memslot(struct kvm_memslots *slots, for (i = 0; i < slots->nmemslots; ++i) { struct kvm_memory_slot *memslot = &slots->memslots[i]; - if (gfn >= memslot->base_gfn - && gfn < memslot->base_gfn + memslot->npages) + if (gfn_in_memslot(memslot, gfn)) return memslot; } return NULL; @@ -964,8 +963,7 @@ int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn) if (memslot->flags & KVM_MEMSLOT_INVALID) continue; - if (gfn >= memslot->base_gfn - && gfn < memslot->base_gfn + memslot->npages) + if (gfn_in_memslot(memslot, gfn)) return 1; } return 0; -- 1.7.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
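A side note on the new helper's semantics: a slot covers the half-open gfn range [base_gfn, base_gfn + npages), so the last valid gfn is base_gfn + npages - 1. A standalone sketch illustrating this, with simplified stand-in types rather than the kernel's real gfn_t and struct kvm_memory_slot:

#include <stdbool.h>
#include <stdio.h>

typedef unsigned long long gfn_t;

struct kvm_memory_slot {
	gfn_t base_gfn;
	unsigned long npages;
};

/* A slot covers the half-open range [base_gfn, base_gfn + npages). */
static bool gfn_in_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
{
	return (gfn >= slot->base_gfn) && (gfn < slot->base_gfn + slot->npages);
}

int main(void)
{
	struct kvm_memory_slot slot = { .base_gfn = 0x100, .npages = 16 };

	printf("%d %d %d\n",
	       gfn_in_memslot(&slot, 0x100),   /* 1: first page of the slot */
	       gfn_in_memslot(&slot, 0x10f),   /* 1: last page of the slot */
	       gfn_in_memslot(&slot, 0x110));  /* 0: one past the end */
	return 0;
}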
* [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte 2011-04-18 18:32 [PATCH 1/3] KVM: Introduce a helper to check if gfn is in memslot Takuya Yoshikawa @ 2011-04-18 18:34 ` Takuya Yoshikawa 2011-04-20 9:07 ` Avi Kivity 2011-04-18 18:38 ` [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk Takuya Yoshikawa 1 sibling, 1 reply; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-18 18:34 UTC (permalink / raw) To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> This will be optimized later. Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> --- arch/x86/kvm/paging_tmpl.h | 12 +++++++++--- 1 files changed, 9 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 74f8567..109939a 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) return access; } +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, + gfn_t table_gfn, int offset, pt_element_t *ptep) +{ + return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, + offset, sizeof(*ptep), + PFERR_USER_MASK | PFERR_WRITE_MASK); +} + /* * Fetch a guest pte for a guest virtual address */ @@ -160,9 +168,7 @@ walk: walker->table_gfn[walker->level - 1] = table_gfn; walker->pte_gpa[walker->level - 1] = pte_gpa; - if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, &pte, - offset, sizeof(pte), - PFERR_USER_MASK|PFERR_WRITE_MASK)) { + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset, &pte)) { present = false; break; } -- 1.7.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte 2011-04-18 18:34 ` [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte Takuya Yoshikawa @ 2011-04-20 9:07 ` Avi Kivity 2011-04-20 9:35 ` Roedel, Joerg 0 siblings, 1 reply; 27+ messages in thread From: Avi Kivity @ 2011-04-20 9:07 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya, Roedel, Joerg On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote: > From: Takuya Yoshikawa<yoshikawa.takuya@oss.ntt.co.jp> > > This will be optimized later. > > Signed-off-by: Takuya Yoshikawa<yoshikawa.takuya@oss.ntt.co.jp> > --- > arch/x86/kvm/paging_tmpl.h | 12 +++++++++--- > 1 files changed, 9 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > index 74f8567..109939a 100644 > --- a/arch/x86/kvm/paging_tmpl.h > +++ b/arch/x86/kvm/paging_tmpl.h > @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) > return access; > } > > +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > + gfn_t table_gfn, int offset, pt_element_t *ptep) > +{ > + return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, > + offset, sizeof(*ptep), > + PFERR_USER_MASK | PFERR_WRITE_MASK); > +} > + > /* > * Fetch a guest pte for a guest virtual address > */ > @@ -160,9 +168,7 @@ walk: > walker->table_gfn[walker->level - 1] = table_gfn; > walker->pte_gpa[walker->level - 1] = pte_gpa; > > - if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte, > - offset, sizeof(pte), > - PFERR_USER_MASK|PFERR_WRITE_MASK)) { > + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) { > present = false; > break; > } I think it's better to avoid a separate function for this. The reason is I'd like to use ptep_user for cmpxchg_gpte() later on in walk_addr_generic(), so we use the same calculation for both read and write. So please just inline the new code in walk_addr_generic(). In fact there's probably a bug there for nested npt - we use gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn. Joerg, am I right here? -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 27+ messages in thread
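For reference, the shape Avi is asking for might look roughly like this: compute the user pointer to the gpte once in the walker and reuse exactly that pointer for both the read and the later accessed-bit cmpxchg. This is only a simplified userspace model of the idea; pt_element_t, the direct dereference, and the GCC atomic builtin are stand-ins, not the real kernel code, which at this point still kmaps the guest page for the cmpxchg:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t pt_element_t;

#define PT_ACCESSED_MASK (1ULL << 5)	/* accessed bit, as on x86 */

static int walk_one_level(pt_element_t *ptep_user, pt_element_t *pte_out)
{
	pt_element_t pte, old;

	pte = *ptep_user;		/* models get_user(pte, ptep_user) */

	if (!(pte & PT_ACCESSED_MASK)) {
		/* The very same pointer is reused for the compare-exchange,
		 * so read and update are guaranteed to hit one address. */
		old = __sync_val_compare_and_swap(ptep_user, pte,
						  pte | PT_ACCESSED_MASK);
		if (old != pte)
			return 1;	/* lost a race: caller re-walks */
		pte |= PT_ACCESSED_MASK;
	}

	*pte_out = pte;
	return 0;
}

int main(void)
{
	pt_element_t table[512] = { [7] = 0x1000 | 1 };	/* present gpte */
	pt_element_t pte;

	if (!walk_one_level(&table[7], &pte))
		printf("pte=%#llx, accessed bit set: %d\n",
		       (unsigned long long)pte,
		       !!(table[7] & PT_ACCESSED_MASK));
	return 0;
}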
* Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte 2011-04-20 9:07 ` Avi Kivity @ 2011-04-20 9:35 ` Roedel, Joerg 2011-04-20 10:05 ` Avi Kivity 0 siblings, 1 reply; 27+ messages in thread From: Roedel, Joerg @ 2011-04-20 9:35 UTC (permalink / raw) To: Avi Kivity Cc: Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org, yoshikawa.takuya@oss.ntt.co.jp On Wed, Apr 20, 2011 at 05:07:12AM -0400, Avi Kivity wrote: > On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote: > > From: Takuya Yoshikawa<yoshikawa.takuya@oss.ntt.co.jp> > > > > This will be optimized later. > > > > Signed-off-by: Takuya Yoshikawa<yoshikawa.takuya@oss.ntt.co.jp> > > --- > > arch/x86/kvm/paging_tmpl.h | 12 +++++++++--- > > 1 files changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > > index 74f8567..109939a 100644 > > --- a/arch/x86/kvm/paging_tmpl.h > > +++ b/arch/x86/kvm/paging_tmpl.h > > @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) > > return access; > > } > > > > +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > > + gfn_t table_gfn, int offset, pt_element_t *ptep) > > +{ > > + return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, > > + offset, sizeof(*ptep), > > + PFERR_USER_MASK | PFERR_WRITE_MASK); > > +} > > + > > /* > > * Fetch a guest pte for a guest virtual address > > */ > > @@ -160,9 +168,7 @@ walk: > > walker->table_gfn[walker->level - 1] = table_gfn; > > walker->pte_gpa[walker->level - 1] = pte_gpa; > > > > - if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte, > > - offset, sizeof(pte), > > - PFERR_USER_MASK|PFERR_WRITE_MASK)) { > > + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) { > > present = false; > > break; > > } > > > I think it's better to avoid a separate function for this. The reason > is I'd like to use ptep_user for cmpxchg_gpte() later on in > walk_addr_generic(), so we use the same calculation for both read and > write. So please just inline the new code in walk_addr_generic(). > > In fact there's probably a bug there for nested npt - we use > gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn. > Joerg, am I right here? This patch seems only to introduce another wrapper around kvm_read_guest_page_mmu(), so I don't see a problem in this patch. The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or l2-gfn (by calling mmu->translate_gpa). Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte 2011-04-20 9:35 ` Roedel, Joerg @ 2011-04-20 10:05 ` Avi Kivity 2011-04-20 11:06 ` Roedel, Joerg 0 siblings, 1 reply; 27+ messages in thread From: Avi Kivity @ 2011-04-20 10:05 UTC (permalink / raw) To: Roedel, Joerg Cc: Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org, yoshikawa.takuya@oss.ntt.co.jp On 04/20/2011 12:35 PM, Roedel, Joerg wrote: > On Wed, Apr 20, 2011 at 05:07:12AM -0400, Avi Kivity wrote: > > On 04/18/2011 09:34 PM, Takuya Yoshikawa wrote: > > > From: Takuya Yoshikawa<yoshikawa.takuya@oss.ntt.co.jp> > > > > > > This will be optimized later. > > > > > > Signed-off-by: Takuya Yoshikawa<yoshikawa.takuya@oss.ntt.co.jp> > > > --- > > > arch/x86/kvm/paging_tmpl.h | 12 +++++++++--- > > > 1 files changed, 9 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > > > index 74f8567..109939a 100644 > > > --- a/arch/x86/kvm/paging_tmpl.h > > > +++ b/arch/x86/kvm/paging_tmpl.h > > > @@ -109,6 +109,14 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) > > > return access; > > > } > > > > > > +static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > > > + gfn_t table_gfn, int offset, pt_element_t *ptep) > > > +{ > > > + return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, > > > + offset, sizeof(*ptep), > > > + PFERR_USER_MASK | PFERR_WRITE_MASK); > > > +} > > > + > > > /* > > > * Fetch a guest pte for a guest virtual address > > > */ > > > @@ -160,9 +168,7 @@ walk: > > > walker->table_gfn[walker->level - 1] = table_gfn; > > > walker->pte_gpa[walker->level - 1] = pte_gpa; > > > > > > - if (kvm_read_guest_page_mmu(vcpu, mmu, table_gfn,&pte, > > > - offset, sizeof(pte), > > > - PFERR_USER_MASK|PFERR_WRITE_MASK)) { > > > + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) { > > > present = false; > > > break; > > > } > > > > > > I think it's better to avoid a separate function for this. The reason > > is I'd like to use ptep_user for cmpxchg_gpte() later on in > > walk_addr_generic(), so we use the same calculation for both read and > > write. So please just inline the new code in walk_addr_generic(). > > > > In fact there's probably a bug there for nested npt - we use > > gfn_to_page(table_gfn), but table_gfn is actually an ngfn, not a gfn. > > Joerg, am I right here? > > This patch seems only to introduce another wrapper around > kvm_read_guest_page_mmu(), so I don't see a problem in this patch. By patch 3, ptep_user will be computed in this function and no longer available for setting the accessed bit later on. > The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or > l2-gfn (by calling mmu->translate_gpa). But cmpxchg_gpte() does not. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte 2011-04-20 10:05 ` Avi Kivity @ 2011-04-20 11:06 ` Roedel, Joerg 2011-04-20 11:18 ` Avi Kivity 0 siblings, 1 reply; 27+ messages in thread From: Roedel, Joerg @ 2011-04-20 11:06 UTC (permalink / raw) To: Avi Kivity Cc: Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org, yoshikawa.takuya@oss.ntt.co.jp On Wed, Apr 20, 2011 at 06:05:08AM -0400, Avi Kivity wrote: > On 04/20/2011 12:35 PM, Roedel, Joerg wrote: > > This patch seems only to introduce another wrapper around > > kvm_read_guest_page_mmu(), so I don't see a problem in this patch. > > By patch 3, ptep_user will be computed in this function and no longer > available for setting the accessed bit later on. > > > The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or > > l2-gfn (by calling mmu->translate_gpa). > > But cmpxchg_gpte() does not. You are right, cmpxchg_gpte needs to handle this too. But the bug was not > introduced by this patch set; it was there before. The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a fix soon. Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte 2011-04-20 11:06 ` Roedel, Joerg @ 2011-04-20 11:18 ` Avi Kivity 2011-04-20 13:33 ` [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too Roedel, Joerg 2011-04-21 1:07 ` [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte Takuya Yoshikawa 0 siblings, 2 replies; 27+ messages in thread From: Avi Kivity @ 2011-04-20 11:18 UTC (permalink / raw) To: Roedel, Joerg Cc: Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org, yoshikawa.takuya@oss.ntt.co.jp On 04/20/2011 02:06 PM, Roedel, Joerg wrote: > On Wed, Apr 20, 2011 at 06:05:08AM -0400, Avi Kivity wrote: > > On 04/20/2011 12:35 PM, Roedel, Joerg wrote: > > > > This patch seems only to introduce another wrapper around > > > kvm_read_guest_page_mmu(), so I don't see a problem in this patch. > > > > By patch 3, ptep_user will be computed in this function and no longer > > available for setting the accessed bit later on. > > > > > The kvm_read_guest_page_mmu takes care whether it gets a l1-gfn or > > > l2-gfn (by calling mmu->translate_gpa). > > > > But cmpxchg_gpte() does not. > > You are right, cmpxchg_gpte needs to handle this too. But the bug was not > introduced by this patch set; it was there before. Correct. The reason I don't want the helper is so we can use ptep_user in both places (not for efficiency, just to make sure it's exactly the same value). > The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a > fix soon. Thanks. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 27+ messages in thread
* [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too 2011-04-20 11:18 ` Avi Kivity @ 2011-04-20 13:33 ` Roedel, Joerg 2011-04-21 1:02 ` Takuya Yoshikawa 0 siblings, 1 reply; 27+ messages in thread From: Roedel, Joerg @ 2011-04-20 13:33 UTC (permalink / raw) To: Avi Kivity Cc: Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org, yoshikawa.takuya@oss.ntt.co.jp On Wed, Apr 20, 2011 at 07:18:12AM -0400, Avi Kivity wrote: > On 04/20/2011 02:06 PM, Roedel, Joerg wrote: > > The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a > > fix soon. > > Thanks. Here is a fix for review. I am out of office starting in about one hour, until next Tuesday, so the corrections will most likely not happen before then :) The patch is tested with npt and shadow paging as well as with npt-on-npt (64 bit with kvm). Regards, Joerg >From 6b1dcd9f17bbd482061180001d1f45c3adcef430 Mon Sep 17 00:00:00 2001 From: Joerg Roedel <joerg.roedel@amd.com> Date: Wed, 20 Apr 2011 15:22:21 +0200 Subject: [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too This patch makes the cmpxchg_gpte() function aware of the difference between l1-gfns and l2-gfns when nested virtualization is in use. This fixes a potential data-corruption problem in the l1-guest and makes the code work correct (at least as correct as the hardware which is emulated in this code) again. Cc: stable@kernel.org Signed-off-by: Joerg Roedel <joerg.roedel@amd.com> --- arch/x86/kvm/paging_tmpl.h | 30 +++++++++++++++++++++++------- 1 files changed, 23 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 74f8567..e442bf4 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -78,15 +78,21 @@ static gfn_t gpte_to_gfn_lvl(pt_element_t gpte, int lvl) return (gpte & PT_LVL_ADDR_MASK(lvl)) >> PAGE_SHIFT; } -static bool FNAME(cmpxchg_gpte)(struct kvm *kvm, +static int FNAME(cmpxchg_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gfn_t table_gfn, unsigned index, pt_element_t orig_pte, pt_element_t new_pte) { pt_element_t ret; pt_element_t *table; struct page *page; + gpa_t gpa; - page = gfn_to_page(kvm, table_gfn); + gpa = mmu->translate_gpa(vcpu, table_gfn << PAGE_SHIFT, + PFERR_USER_MASK|PFERR_WRITE_MASK); + if (gpa == UNMAPPED_GVA) + return -EFAULT; + + page = gfn_to_page(vcpu->kvm, gpa_to_gfn(gpa)); table = kmap_atomic(page, KM_USER0); ret = CMPXCHG(&table[index], orig_pte, new_pte); @@ -192,11 +198,17 @@ walk: #endif if (!eperm && !rsvd_fault && !(pte & PT_ACCESSED_MASK)) { + int ret; trace_kvm_mmu_set_accessed_bit(table_gfn, index, sizeof(pte)); - if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, - index, pte, pte|PT_ACCESSED_MASK)) + ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, + index, pte, pte|PT_ACCESSED_MASK); + if (ret < 0) { + present = false; + break; + } else if (ret) goto walk; + mark_page_dirty(vcpu->kvm, table_gfn); pte |= PT_ACCESSED_MASK; } @@ -245,13 +257,17 @@ walk: goto error; if (write_fault && !is_dirty_gpte(pte)) { - bool ret; + int ret; trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); - ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, + ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, index, pte, pte|PT_DIRTY_MASK); - if (ret) + if (ret < 0) { + present = false; + goto error; + } if (ret) goto walk; + mark_page_dirty(vcpu->kvm, table_gfn); pte |= PT_DIRTY_MASK; walker->ptes[walker->level - 1] = pte; -- 1.7.1 -- AMD Operating System 
Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 ^ permalink raw reply related [flat|nested] 27+ messages in thread
* Re: [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too 2011-04-20 13:33 ` [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too Roedel, Joerg @ 2011-04-21 1:02 ` Takuya Yoshikawa 2011-04-21 8:11 ` Avi Kivity 0 siblings, 1 reply; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-21 1:02 UTC (permalink / raw) To: Roedel, Joerg Cc: Avi Kivity, Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org On Wed, 20 Apr 2011 15:33:16 +0200 "Roedel, Joerg" <Joerg.Roedel@amd.com> wrote: > @@ -245,13 +257,17 @@ walk: > goto error; > > if (write_fault && !is_dirty_gpte(pte)) { > - bool ret; > + int ret; > > trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); > - ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, > + ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, index, pte, > pte|PT_DIRTY_MASK); > - if (ret) > + if (ret < 0) { > + present = false; > + goto error; > + } if (ret) > goto walk; Preferably else if or another line ? :) Takuya > + > mark_page_dirty(vcpu->kvm, table_gfn); > pte |= PT_DIRTY_MASK; > walker->ptes[walker->level - 1] = pte; > -- > 1.7.1 > ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too 2011-04-21 1:02 ` Takuya Yoshikawa @ 2011-04-21 8:11 ` Avi Kivity 0 siblings, 0 replies; 27+ messages in thread From: Avi Kivity @ 2011-04-21 8:11 UTC (permalink / raw) To: Takuya Yoshikawa Cc: Roedel, Joerg, Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org On 04/21/2011 04:02 AM, Takuya Yoshikawa wrote: > On Wed, 20 Apr 2011 15:33:16 +0200 > "Roedel, Joerg"<Joerg.Roedel@amd.com> wrote: > > > @@ -245,13 +257,17 @@ walk: > > goto error; > > > > if (write_fault&& !is_dirty_gpte(pte)) { > > - bool ret; > > + int ret; > > > > trace_kvm_mmu_set_dirty_bit(table_gfn, index, sizeof(pte)); > > - ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte, > > + ret = FNAME(cmpxchg_gpte)(vcpu, mmu, table_gfn, index, pte, > > pte|PT_DIRTY_MASK); > > - if (ret) > > + if (ret< 0) { > > + present = false; > > + goto error; > > + } if (ret) > > goto walk; > > Preferably else if or another line ? :) Yes, I added an 'else' and applied. Thanks. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte 2011-04-20 11:18 ` Avi Kivity 2011-04-20 13:33 ` [PATCH] KVM: MMU: Make cmpxchg_gpte aware of nesting too Roedel, Joerg @ 2011-04-21 1:07 ` Takuya Yoshikawa 1 sibling, 0 replies; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-21 1:07 UTC (permalink / raw) To: Avi Kivity Cc: Roedel, Joerg, Takuya Yoshikawa, mtosatti@redhat.com, kvm@vger.kernel.org On Wed, 20 Apr 2011 14:18:12 +0300 Avi Kivity <avi@redhat.com> wrote: > Correct. The reason I don't want the helper is so we can use ptep_user > in both places (not for efficiency, just to make sure it's exactly the > same value). > Thank you for your explanation; now I've got the picture! I will send a new patch taking into account your advice. Takuya > > The cmpxchg_gpte function treats all table_gfns as l1-gfns. I'll send a > fix soon. > ^ permalink raw reply [flat|nested] 27+ messages in thread
* [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-18 18:32 [PATCH 1/3] KVM: Introduce a helper to check if gfn is in memslot Takuya Yoshikawa 2011-04-18 18:34 ` [PATCH 2/3] KVM: MMU: Introduce a helper to read guest pte Takuya Yoshikawa @ 2011-04-18 18:38 ` Takuya Yoshikawa 2011-04-18 18:52 ` Joerg Roedel ` (2 more replies) 1 sibling, 3 replies; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-18 18:38 UTC (permalink / raw) To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> We optimize multi level guest page table walk as follows: 1. We cache the memslot which, probably, includes the next guest page tables to avoid searching for it many times. 2. We use get_user() instead of copy_from_user(). Note that this is kind of a restricted way of Xiao's more generic work: "KVM: optimize memslots searching and cache GPN to GFN." With this patch applied, paging64_walk_addr_generic() has improved as the following tracing results show. Before: 3.169 us | paging64_walk_addr_generic(); 1.880 us | paging64_walk_addr_generic(); 1.243 us | paging64_walk_addr_generic(); 1.517 us | paging64_walk_addr_generic(); 3.009 us | paging64_walk_addr_generic(); 1.814 us | paging64_walk_addr_generic(); 1.340 us | paging64_walk_addr_generic(); 1.659 us | paging64_walk_addr_generic(); 1.748 us | paging64_walk_addr_generic(); 1.488 us | paging64_walk_addr_generic(); After: 1.714 us | paging64_walk_addr_generic(); 0.806 us | paging64_walk_addr_generic(); 0.664 us | paging64_walk_addr_generic(); 0.619 us | paging64_walk_addr_generic(); 0.645 us | paging64_walk_addr_generic(); 0.605 us | paging64_walk_addr_generic(); 1.388 us | paging64_walk_addr_generic(); 0.753 us | paging64_walk_addr_generic(); 0.594 us | paging64_walk_addr_generic(); 0.833 us | paging64_walk_addr_generic(); Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> --- arch/x86/kvm/paging_tmpl.h | 37 ++++++++++++++++++++++++++++++++----- 1 files changed, 32 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 109939a..614aa3f 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -109,12 +109,37 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) return access; } +/* + * Read the guest PTE refered to by table_gfn and offset and put it into ptep. + * + * *slot_hint, if not NULL, should point to a memslot which probably includes + * the guest PTE. The actual memslot will be put back into this so that + * callers can cache it. 
+ */ static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, - gfn_t table_gfn, int offset, pt_element_t *ptep) + gfn_t table_gfn, int offset, pt_element_t *ptep, + struct kvm_memory_slot **slot_hint) { - return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, - offset, sizeof(*ptep), - PFERR_USER_MASK | PFERR_WRITE_MASK); + unsigned long addr; + pt_element_t __user *ptep_user; + gfn_t real_gfn; + + real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn), + PFERR_USER_MASK | PFERR_WRITE_MASK); + if (real_gfn == UNMAPPED_GVA) + return -EFAULT; + + real_gfn = gpa_to_gfn(real_gfn); + + if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn)) + *slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn); + + addr = gfn_to_hva_memslot(*slot_hint, real_gfn); + if (kvm_is_error_hva(addr)) + return -EFAULT; + + ptep_user = (pt_element_t __user *)((void *)addr + offset); + return get_user(*ptep, ptep_user); } /* @@ -130,6 +155,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, gpa_t pte_gpa; bool eperm, present, rsvd_fault; int offset, write_fault, user_fault, fetch_fault; + struct kvm_memory_slot *slot_cache = NULL; write_fault = access & PFERR_WRITE_MASK; user_fault = access & PFERR_USER_MASK; @@ -168,7 +194,8 @@ walk: walker->table_gfn[walker->level - 1] = table_gfn; walker->pte_gpa[walker->level - 1] = pte_gpa; - if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset, &pte)) { + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, + offset, &pte, &slot_cache)) { present = false; break; } -- 1.7.1 ^ permalink raw reply related [flat|nested] 27+ messages in thread
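The slot_hint parameter is essentially a last-hit cache: page-table pages overwhelmingly sit in one big memslot, so remembering the previous answer makes most lookups hit without searching. A generic userspace sketch of the pattern, with mock types and a plain linear search standing in for gfn_to_memslot():

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef unsigned long long gfn_t;

struct memslot { gfn_t base_gfn; unsigned long npages; };

static bool gfn_in_slot(const struct memslot *s, gfn_t gfn)
{
	return gfn >= s->base_gfn && gfn < s->base_gfn + s->npages;
}

/* Full lookup, standing in for gfn_to_memslot(). */
static struct memslot *slot_lookup(struct memslot *slots, size_t n, gfn_t gfn)
{
	for (size_t i = 0; i < n; i++)
		if (gfn_in_slot(&slots[i], gfn))
			return &slots[i];
	return NULL;
}

/*
 * Last-hit cache: try *hint first, fall back to the full search and
 * refresh the hint on a miss. This mirrors what the patch does with
 * slot_hint across the levels of a single page-table walk.
 */
static struct memslot *slot_lookup_cached(struct memslot *slots, size_t n,
					  gfn_t gfn, struct memslot **hint)
{
	if (!*hint || !gfn_in_slot(*hint, gfn))
		*hint = slot_lookup(slots, n, gfn);
	return *hint;
}

int main(void)
{
	struct memslot slots[] = {
		{ 0x0, 0xa0 },
		{ 0x100, 0x7ff00 },	/* the big RAM slot */
	};
	struct memslot *hint = NULL;

	/* A walk touches several gptes in the same slot: only the first
	 * lookup searches, the rest hit the cached hint. */
	for (gfn_t gfn = 0x200; gfn < 0x204; gfn++)
		slot_lookup_cached(slots, 2, gfn, &hint);
	printf("hint base_gfn=%#llx\n", hint ? hint->base_gfn : 0);
	return 0;
}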
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-18 18:38 ` [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk Takuya Yoshikawa @ 2011-04-18 18:52 ` Joerg Roedel 2011-04-19 1:24 ` Takuya Yoshikawa 2011-04-19 1:42 ` Xiao Guangrong 2011-04-20 9:02 ` Avi Kivity 2 siblings, 1 reply; 27+ messages in thread From: Joerg Roedel @ 2011-04-18 18:52 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: avi, mtosatti, kvm, yoshikawa.takuya On Tue, Apr 19, 2011 at 03:38:14AM +0900, Takuya Yoshikawa wrote: > From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> > > We optimize multi level guest page table walk as follows: > > 1. We cache the memslot which, probably, includes the next guest page > tables to avoid searching for it many times. > 2. We use get_user() instead of copy_from_user(). > > Note that this is kind of a restricted way of Xiao's more generic > work: "KVM: optimize memslots searching and cache GPN to GFN." > > With this patch applied, paging64_walk_addr_generic() has improved > as the following tracing results show. > > Before: > 3.169 us | paging64_walk_addr_generic(); > 1.880 us | paging64_walk_addr_generic(); > 1.243 us | paging64_walk_addr_generic(); > 1.517 us | paging64_walk_addr_generic(); > 3.009 us | paging64_walk_addr_generic(); > 1.814 us | paging64_walk_addr_generic(); > 1.340 us | paging64_walk_addr_generic(); > 1.659 us | paging64_walk_addr_generic(); > 1.748 us | paging64_walk_addr_generic(); > 1.488 us | paging64_walk_addr_generic(); > > After: > 1.714 us | paging64_walk_addr_generic(); > 0.806 us | paging64_walk_addr_generic(); > 0.664 us | paging64_walk_addr_generic(); > 0.619 us | paging64_walk_addr_generic(); > 0.645 us | paging64_walk_addr_generic(); > 0.605 us | paging64_walk_addr_generic(); > 1.388 us | paging64_walk_addr_generic(); > 0.753 us | paging64_walk_addr_generic(); > 0.594 us | paging64_walk_addr_generic(); > 0.833 us | paging64_walk_addr_generic(); Nice optimization! What scenarios have you used to test it? Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-18 18:52 ` Joerg Roedel @ 2011-04-19 1:24 ` Takuya Yoshikawa 2011-04-19 6:20 ` Joerg Roedel 0 siblings, 1 reply; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-19 1:24 UTC (permalink / raw) To: Joerg Roedel; +Cc: Takuya Yoshikawa, avi, mtosatti, kvm Joerg Roedel <joro@8bytes.org> wrote: > Nice optimization! What scenarios have you used to test it? I used my desktop Phenom II box, running the latest qemu-kvm, so NPT was probably ON by default. The guest was running a .ogg movie during that test. I am not an MMU expert, so I would be glad to know what scenarios should be tested for this patch! Can I test nested SVM easily, for example? Thanks, Takuya ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-19 1:24 ` Takuya Yoshikawa @ 2011-04-19 6:20 ` Joerg Roedel 0 siblings, 0 replies; 27+ messages in thread From: Joerg Roedel @ 2011-04-19 6:20 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, avi, mtosatti, kvm On Tue, Apr 19, 2011 at 10:24:10AM +0900, Takuya Yoshikawa wrote: > Can I test nested SVM easily, for example? Yes, nested SVM would be good to test too with those changes. We should make sure not to break anything there. Regards, Joerg ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-18 18:38 ` [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk Takuya Yoshikawa 2011-04-18 18:52 ` Joerg Roedel @ 2011-04-19 1:42 ` Xiao Guangrong 2011-04-19 3:47 ` Takuya Yoshikawa 2011-04-20 9:02 ` Avi Kivity 2 siblings, 1 reply; 27+ messages in thread From: Xiao Guangrong @ 2011-04-19 1:42 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: avi, mtosatti, kvm, yoshikawa.takuya On 04/19/2011 02:38 AM, Takuya Yoshikawa wrote: > From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> > > We optimize multi level guest page table walk as follows: > > 1. We cache the memslot which, probably, includes the next guest page > tables to avoid searching for it many times. Yeah, the hit is very high, after optimizing the algorithm of memslots (http://lwn.net/Articles/429308/), maybe the advantage is not so significant, could you apply this patchset and test again please? > 2. We use get_user() instead of copy_from_user(). > > Note that this is kind of a restricted way of Xiao's more generic > work: "KVM: optimize memslots searching and cache GPN to GFN." > > With this patch applied, paging64_walk_addr_generic() has improved > as the following tracing results show. > > Before: > 3.169 us | paging64_walk_addr_generic(); > 1.880 us | paging64_walk_addr_generic(); > 1.243 us | paging64_walk_addr_generic(); > 1.517 us | paging64_walk_addr_generic(); > 3.009 us | paging64_walk_addr_generic(); > 1.814 us | paging64_walk_addr_generic(); > 1.340 us | paging64_walk_addr_generic(); > 1.659 us | paging64_walk_addr_generic(); > 1.748 us | paging64_walk_addr_generic(); > 1.488 us | paging64_walk_addr_generic(); > > After: > 1.714 us | paging64_walk_addr_generic(); > 0.806 us | paging64_walk_addr_generic(); > 0.664 us | paging64_walk_addr_generic(); > 0.619 us | paging64_walk_addr_generic(); > 0.645 us | paging64_walk_addr_generic(); > 0.605 us | paging64_walk_addr_generic(); > 1.388 us | paging64_walk_addr_generic(); > 0.753 us | paging64_walk_addr_generic(); > 0.594 us | paging64_walk_addr_generic(); > 0.833 us | paging64_walk_addr_generic(); > > Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> > --- > arch/x86/kvm/paging_tmpl.h | 37 ++++++++++++++++++++++++++++++++----- > 1 files changed, 32 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h > index 109939a..614aa3f 100644 > --- a/arch/x86/kvm/paging_tmpl.h > +++ b/arch/x86/kvm/paging_tmpl.h > @@ -109,12 +109,37 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu *vcpu, pt_element_t gpte) > return access; > } > > +/* > + * Read the guest PTE refered to by table_gfn and offset and put it into ptep. > + * > + * *slot_hint, if not NULL, should point to a memslot which probably includes > + * the guest PTE. The actual memslot will be put back into this so that > + * callers can cache it. 
> + */ > static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > - gfn_t table_gfn, int offset, pt_element_t *ptep) > + gfn_t table_gfn, int offset, pt_element_t *ptep, > + struct kvm_memory_slot **slot_hint) > { > - return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, > - offset, sizeof(*ptep), > - PFERR_USER_MASK | PFERR_WRITE_MASK); > + unsigned long addr; > + pt_element_t __user *ptep_user; > + gfn_t real_gfn; > + > + real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn), > + PFERR_USER_MASK | PFERR_WRITE_MASK); > + if (real_gfn == UNMAPPED_GVA) > + return -EFAULT; > + > + real_gfn = gpa_to_gfn(real_gfn); > + > + if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn)) > + *slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn); > + You forgot to check the result. (if *slot_hint == NULL)? ... ;-) ^ permalink raw reply [flat|nested] 27+ messages in thread
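For illustration, the missing check Xiao points out would presumably be a small fixup on top of patch 3, along these lines (a hypothetical hunk for discussion, not a posted patch):

 	if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn))
 		*slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn);
 
+	/* gfn_to_memslot() returns NULL when no slot covers real_gfn. */
+	if (!(*slot_hint))
+		return -EFAULT;
+
 	addr = gfn_to_hva_memslot(*slot_hint, real_gfn);
 	if (kvm_is_error_hva(addr))
 		return -EFAULT;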
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-19 1:42 ` Xiao Guangrong @ 2011-04-19 3:47 ` Takuya Yoshikawa 2011-04-20 9:09 ` Avi Kivity 0 siblings, 1 reply; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-19 3:47 UTC (permalink / raw) To: Xiao Guangrong; +Cc: Takuya Yoshikawa, avi, mtosatti, kvm Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> wrote: > > We optimize multi level guest page table walk as follows: > > > > 1. We cache the memslot which, probably, includes the next guest page > > tables to avoid searching for it many times. > > Yeah, the hit is very high, after optimizing the algorithm of memslots > (http://lwn.net/Articles/429308/), maybe the advantage is not so significant, > could you apply this patchset and test again please? Any sorting strategy, including tree-based ones, has tradeoffs. Compared to that, what I wanted to do here was to improve the table walk locally without sacrificing other things. Of course, my strategy depends on the assumption that the page tables will be in the same slot with very high probability. So if a certain algorithm seems to be adopted, yes, I will test based on that. IIRC, no practically good algorithm has been found yet, right? > > > 2. We use get_user() instead of copy_from_user(). > > > > Note that this is kind of a restricted way of Xiao's more generic > > work: "KVM: optimize memslots searching and cache GPN to GFN." > > > > With this patch applied, paging64_walk_addr_generic() has improved > > as the following tracing results show. > > > > + > > + if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn)) > > + *slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn); > > + > > You forgot to check the result. (if *slot_hint == NULL)? ... ;-) Thank you! I will check later. Takuya ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-19 3:47 ` Takuya Yoshikawa @ 2011-04-20 9:09 ` Avi Kivity 0 siblings, 0 replies; 27+ messages in thread From: Avi Kivity @ 2011-04-20 9:09 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: Xiao Guangrong, Takuya Yoshikawa, mtosatti, kvm On 04/19/2011 06:47 AM, Takuya Yoshikawa wrote: > So if a certain algorithm seems to be adopted, yes, I will test based > on that. IIRC, no practically good algorithm has been found yet, > right? I think a simple sort based on size will provide the same optimization (just the cache, not get_user()) without any downsides. Most memory in a guest is usually in just one or two slots; that's the reason for the high hit rate. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 27+ messages in thread
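Sorting by size works because a guest's memory is normally dominated by one or two huge slots; putting those first means the plain linear search in __gfn_to_memslot() hits on the first probe almost every time. A userspace sketch of the idea, with assumed mock types and qsort standing in for whatever the kernel would run once per memslot update:

#include <stdlib.h>

typedef unsigned long long gfn_t;

struct memslot { gfn_t base_gfn; unsigned long npages; };

/* Descending by npages: the biggest slots get probed first. */
static int slot_cmp(const void *a, const void *b)
{
	const struct memslot *x = a, *y = b;

	if (x->npages != y->npages)
		return x->npages < y->npages ? 1 : -1;
	return 0;
}

/* Run once when the memslot array changes, not on every lookup. */
static void sort_memslots(struct memslot *slots, size_t n)
{
	qsort(slots, n, sizeof(*slots), slot_cmp);
}

int main(void)
{
	struct memslot slots[] = {
		{ 0xc0000, 0x40 }, { 0x100, 0x7ff00 }, { 0x0, 0xa0 },
	};

	sort_memslots(slots, 3);
	/* slots[0] is now the big RAM slot, so an unmodified linear
	 * search finds most gfns on its first iteration. */
	return 0;
}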
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-18 18:38 ` [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk Takuya Yoshikawa 2011-04-18 18:52 ` Joerg Roedel 2011-04-19 1:42 ` Xiao Guangrong @ 2011-04-20 9:02 ` Avi Kivity 2011-04-29 2:46 ` Andi Kleen 2 siblings, 1 reply; 27+ messages in thread From: Avi Kivity @ 2011-04-20 9:02 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya On 04/18/2011 09:38 PM, Takuya Yoshikawa wrote: > From: Takuya Yoshikawa<yoshikawa.takuya@oss.ntt.co.jp> > > We optimize multi level guest page table walk as follows: > > 1. We cache the memslot which, probably, includes the next guest page > tables to avoid searching for it many times. > 2. We use get_user() instead of copy_from_user(). > > Note that this is kind of a restricted way of Xiao's more generic > work: "KVM: optimize memslots searching and cache GPN to GFN." Good optimization. copy_from_user() really isn't optimized for short buffers, I expect much of the improvement comes from that. > +/* > + * Read the guest PTE refered to by table_gfn and offset and put it into ptep. > + * > + * *slot_hint, if not NULL, should point to a memslot which probably includes > + * the guest PTE. The actual memslot will be put back into this so that > + * callers can cache it. > + */ Please drop the slot_hint optimization. First, it belongs in a separate patch. Second, I prefer to see a generic slot sort instead of an ad-hoc cache. > static int FNAME(read_guest_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, > - gfn_t table_gfn, int offset, pt_element_t *ptep) > + gfn_t table_gfn, int offset, pt_element_t *ptep, > + struct kvm_memory_slot **slot_hint) > { > - return kvm_read_guest_page_mmu(vcpu, mmu, table_gfn, ptep, > - offset, sizeof(*ptep), > - PFERR_USER_MASK | PFERR_WRITE_MASK); > + unsigned long addr; > + pt_element_t __user *ptep_user; > + gfn_t real_gfn; > + > + real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn), > + PFERR_USER_MASK | PFERR_WRITE_MASK); > + if (real_gfn == UNMAPPED_GVA) > + return -EFAULT; > + > + real_gfn = gpa_to_gfn(real_gfn); > + > + if (!(*slot_hint) || !gfn_in_memslot(*slot_hint, real_gfn)) > + *slot_hint = gfn_to_memslot(vcpu->kvm, real_gfn); > + > + addr = gfn_to_hva_memslot(*slot_hint, real_gfn); > + if (kvm_is_error_hva(addr)) > + return -EFAULT; > + > + ptep_user = (pt_element_t __user *)((void *)addr + offset); > + return get_user(*ptep, ptep_user); > } > > /* > @@ -130,6 +155,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, > gpa_t pte_gpa; > bool eperm, present, rsvd_fault; > int offset, write_fault, user_fault, fetch_fault; > + struct kvm_memory_slot *slot_cache = NULL; > > write_fault = access& PFERR_WRITE_MASK; > user_fault = access& PFERR_USER_MASK; > @@ -168,7 +194,8 @@ walk: > walker->table_gfn[walker->level - 1] = table_gfn; > walker->pte_gpa[walker->level - 1] = pte_gpa; > > - if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, offset,&pte)) { > + if (FNAME(read_guest_pte)(vcpu, mmu, table_gfn, > + offset,&pte,&slot_cache)) { > present = false; > break; > } -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-20 9:02 ` Avi Kivity @ 2011-04-29 2:46 ` Andi Kleen 2011-04-29 5:38 ` Takuya Yoshikawa 0 siblings, 1 reply; 27+ messages in thread From: Andi Kleen @ 2011-04-29 2:46 UTC (permalink / raw) To: Avi Kivity; +Cc: Takuya Yoshikawa, mtosatti, kvm, yoshikawa.takuya Avi Kivity <avi@redhat.com> writes: > > Good optimization. copy_from_user() really isn't optimized for short > buffers, I expect much of the improvement comes from that. Actually it is equivalent to get_user for the lengths supported by get_user, assuming you pass in a constant length. You probably do not. -Andi -- ak@linux.intel.com -- Speaking for myself only ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-29 2:46 ` Andi Kleen @ 2011-04-29 5:38 ` Takuya Yoshikawa 2011-04-29 6:30 ` Takuya Yoshikawa 2011-04-29 6:59 ` Andi Kleen 0 siblings, 2 replies; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-29 5:38 UTC (permalink / raw) To: Andi Kleen; +Cc: Avi Kivity, mtosatti, kvm, yoshikawa.takuya On Thu, 28 Apr 2011 19:46:00 -0700 Andi Kleen <andi@firstfloor.org> wrote: > Avi Kivity <avi@redhat.com> writes: > > > > Good optimization. copy_from_user() really isn't optimized for short > > buffers, I expect much of the improvement comes from that. > > Actually it is equivalent to get_user for the lengths supported by > get_user, assuming you pass in a constant length. You probably do not. > > -Andi Weird, I actually measured some improvement even after dropping all the parts other than the get_user() usage. The only reason I can guess is the reduction of some function calls by inlining some functions. Takuya ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-29 5:38 ` Takuya Yoshikawa @ 2011-04-29 6:30 ` Takuya Yoshikawa 0 siblings, 0 replies; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-29 6:30 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: Andi Kleen, Avi Kivity, mtosatti, kvm, yoshikawa.takuya On Fri, 29 Apr 2011 14:38:08 +0900 Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote: > On Thu, 28 Apr 2011 19:46:00 -0700 > Andi Kleen <andi@firstfloor.org> wrote: > > > Avi Kivity <avi@redhat.com> writes: > > > > > > Good optimization. copy_from_user() really isn't optimized for short > > > buffers, I expect much of the improvement comes from that. > > > > Actually it is equivalent to get_user for the lengths supported by > > get_user, assuming you pass in a constant length. You probably do not. > > > > -Andi > > > Weird, I actually measured some improvement even after dropping all the parts > other than the get_user() usage. > > The only reason I can guess is the reduction of some function calls > by inlining some functions. A bit to clarify: the original path was kvm_read_guest_page_mmu() -> kvm_read_guest_page() -> copy_from_user(). Takuya ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-29 5:38 ` Takuya Yoshikawa 2011-04-29 6:30 ` Takuya Yoshikawa @ 2011-04-29 6:59 ` Andi Kleen 2011-04-29 13:51 ` Takuya Yoshikawa 1 sibling, 1 reply; 27+ messages in thread From: Andi Kleen @ 2011-04-29 6:59 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: Andi Kleen, Avi Kivity, mtosatti, kvm, yoshikawa.takuya > The only reason I can guess is the reduction of some function calls > by inlining some functions. Yes, at one time cfu was inline too and just checked for the right sizes and then used g*u, but that got lost in the "icache over everything else" mania which has unfortunately been en vogue for quite some time in kernel land (aka measuring patches only based on their impact on the .text size, not the actual performance). But you might get better gains by fixing this general c*u() regression. -Andi -- ak@linux.intel.com -- Speaking for myself only. ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-29 6:59 ` Andi Kleen @ 2011-04-29 13:51 ` Takuya Yoshikawa 2011-04-29 16:05 ` Andi Kleen 0 siblings, 1 reply; 27+ messages in thread From: Takuya Yoshikawa @ 2011-04-29 13:51 UTC (permalink / raw) To: Andi Kleen; +Cc: Avi Kivity, mtosatti, kvm, yoshikawa.takuya Andi Kleen <andi@firstfloor.org> wrote: > > The only reason I can guess is the reduction of some function calls > > by inlining some functions. > > Yes, at one time cfu was inline too and just checked for the right > sizes and then used g*u, but that got lost in the "icache over everything else" mania which has unfortunately been en vogue for quite some time in kernel land (aka > measuring patches only based on their impact on the .text > size, not the actual performance). > > But you might get better gains by fixing this general > c*u() regression. > What I mentioned was not only cfu but a chain of 3 function calls. Originally, we were doing the following function calls: walk_addr_generic() ---1 kvm_read_guest_page_mmu() ---2 kvm_read_guest_page() ---3 copy_from_user() ---4 And now, with my patch applied, we are no longer using the generic kvm_read_guest_page_mmu(), and some address calculations are done in walk_addr_generic(): walk_addr_generic() ---1' get_user() ---2' The length is passed in from walk_addr_generic(). Do you think the following case would not differ so much from (1' 2')? walk_addr_generic() ---1'' copy_from_user() ---2'' This would satisfy your "assuming you pass in a constant length" condition and be almost the same as get_user()? Takuya ^ permalink raw reply [flat|nested] 27+ messages in thread
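Concretely, the (1'' 2'') variant would presumably read the gpte straight from the walker with a compile-time-constant length, something like the following fragment (hypothetical, based on the ptep_user computed in patch 3):

	pt_element_t pte;

	/*
	 * sizeof(pte) is a compile-time constant here, so on x86-64 the
	 * copy_from_user() fast path (shown in the next message) should
	 * degenerate to the same code as get_user(pte, ptep_user).
	 */
	if (copy_from_user(&pte, ptep_user, sizeof(pte)))
		return -EFAULT;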
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-29 13:51 ` Takuya Yoshikawa @ 2011-04-29 16:05 ` Andi Kleen 2011-05-01 13:32 ` Avi Kivity 0 siblings, 1 reply; 27+ messages in thread From: Andi Kleen @ 2011-04-29 16:05 UTC (permalink / raw) To: Takuya Yoshikawa; +Cc: Andi Kleen, Avi Kivity, mtosatti, kvm, yoshikawa.takuya > Do you think the following case would not differ so much from (1' 2')? > > walk_addr_generic() ---1'' > copy_from_user() ---2'' Yes, it should be the same, and it is cleaner. If you do a make .../foo.i and look at the code coming out of the preprocessor, you'll see it expands to:

if (!__builtin_constant_p(size))
	return copy_user_generic(dst, (__force void *)src, size);
switch (size) {
case 1:__get_user_asm(*(u8 *)dst, (u8 __user *)src,
	      ret, "b", "b", "=q", 1);
	return ret;
case 2: ..
case 4: ..
case 8: ..
case 10: ..
case 16: ..
}

Ok, it looks like the 32bit kernel only handles 1/2/4. Maybe that was the problem if you ran on 32bit. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-04-29 16:05 ` Andi Kleen @ 2011-05-01 13:32 ` Avi Kivity 2011-05-01 20:51 ` Andi Kleen 0 siblings, 1 reply; 27+ messages in thread From: Avi Kivity @ 2011-05-01 13:32 UTC (permalink / raw) To: Andi Kleen; +Cc: Takuya Yoshikawa, mtosatti, kvm, yoshikawa.takuya On 04/29/2011 07:05 PM, Andi Kleen wrote: > > Do you think the following case would not differ so much from (1' 2')? > > > > walk_addr_generic() ---1'' > > copy_from_user() ---2'' > > Yes, it should be the same, and it is cleaner. > > If you do a make .../foo.i and look at the code coming out of the > preprocessor, you'll see it expands to: > > if (!__builtin_constant_p(size)) > return copy_user_generic(dst, (__force void *)src, size); > switch (size) { > case 1:__get_user_asm(*(u8 *)dst, (u8 __user *)src, > ret, "b", "b", "=q", 1); > return ret; > case 2: .. > case 4: .. > case 8: .. > case 10: .. > case 16: .. > } > > Ok, it looks like the 32bit kernel only handles 1/2/4. Maybe that > was the problem if you ran on 32bit. I'm happy with a slower copy_from_user() for that particular case. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 27+ messages in thread
* Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk 2011-05-01 13:32 ` Avi Kivity @ 2011-05-01 20:51 ` Andi Kleen 0 siblings, 0 replies; 27+ messages in thread From: Andi Kleen @ 2011-05-01 20:51 UTC (permalink / raw) To: Avi Kivity; +Cc: Andi Kleen, Takuya Yoshikawa, mtosatti, kvm, yoshikawa.takuya >> Ok, it looks like the 32bit kernel only handles 1/2/4. Maybe that >> was the problem if you ran on 32bit. > > I'm happy with a slower copy_from_user() for that particular case. It wouldn't be hard to fix. -Andi ^ permalink raw reply [flat|nested] 27+ messages in thread