* KVM: MMU: optimize set_spte for page sync
From: Marcelo Tosatti @ 2008-11-21 18:49 UTC
  To: Avi Kivity; +Cc: kvm-devel


The write protect verification in set_spte is unnecessary for page sync.

It's guaranteed that, if the unsync spte was writable, the target page
does not have a write-protected shadow (if it had, the spte would have
been write protected under mmu_lock by rmap_write_protect before).

The same reasoning applies to mark_page_dirty: the gfn has already been
marked dirty via the pagefault path.

The cost of the hash table and memslot lookups is quite significant if
the workload is pagetable-write intensive, resulting in increased
mmu_lock contention.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1529,7 +1529,7 @@ static int kvm_unsync_page(struct kvm_vc
 }
 
 static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
-				  bool can_unsync)
+				  bool sync_page)
 {
 	struct kvm_mmu_page *shadow;
 
@@ -1539,7 +1539,7 @@ static int mmu_need_write_protect(struct
 			return 1;
 		if (shadow->unsync)
 			return 0;
-		if (can_unsync && oos_shadow)
+		if (!sync_page && oos_shadow)
 			return kvm_unsync_page(vcpu, shadow);
 		return 1;
 	}
@@ -1550,7 +1550,7 @@ static int set_spte(struct kvm_vcpu *vcp
 		    unsigned pte_access, int user_fault,
 		    int write_fault, int dirty, int largepage,
 		    gfn_t gfn, pfn_t pfn, bool speculative,
-		    bool can_unsync)
+		    bool sync_page)
 {
 	u64 spte;
 	int ret = 0;
@@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcpu *vcp
 
 		spte |= PT_WRITABLE_MASK;
 
-		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
+		/*
+ 	 	 * Optimization: for pte sync, if spte was writable the hash
+ 	 	 * lookup is unnecessary (and expensive). Write protection
+ 	 	 * is responsibility of mmu_get_page / kvm_sync_page.
+ 	 	 * Same reasoning can be applied to dirty page accounting.
+ 	 	 */
+		if (sync_page && is_writeble_pte(*shadow_pte))
+			goto set_pte;
+
+		if (mmu_need_write_protect(vcpu, gfn, sync_page)) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);
 			ret = 1;
@@ -1648,7 +1657,7 @@ static void mmu_set_spte(struct kvm_vcpu
 		}
 	}
 	if (set_spte(vcpu, shadow_pte, pte_access, user_fault, write_fault,
-		      dirty, largepage, gfn, pfn, speculative, true)) {
+		      dirty, largepage, gfn, pfn, speculative, false)) {
 		if (write_fault)
 			*ptwrite = 1;
 		kvm_x86_ops->tlb_flush(vcpu);
Index: kvm/arch/x86/kvm/paging_tmpl.h
===================================================================
--- kvm.orig/arch/x86/kvm/paging_tmpl.h
+++ kvm/arch/x86/kvm/paging_tmpl.h
@@ -580,7 +580,7 @@ static int FNAME(sync_page)(struct kvm_v
 		pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
 		set_spte(vcpu, &sp->spt[i], pte_access, 0, 0,
 			 is_dirty_pte(gpte), 0, gfn,
-			 spte_to_pfn(sp->spt[i]), true, false);
+			 spte_to_pfn(sp->spt[i]), true, true);
 	}
 
 	return !nr_present;
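
For context on the cost argument above: the write protect check walks a
hash bucket of mmu_page_hash, and mark_page_dirty scans the memslot
array, both while mmu_lock is held. Below is a standalone, hedged C
sketch of roughly what those two lookups involve; all types and names
are simplified stand-ins, not the actual kernel code.

#include <stddef.h>

typedef unsigned long gfn_t;

#define NR_MMU_HASH_BUCKETS 1024

struct shadow_page {
	gfn_t gfn;
	int metaphysical;		/* stand-in for role.metaphysical */
	struct shadow_page *hash_next;	/* stand-in for hlist linkage */
};

struct memslot {
	gfn_t base_gfn;
	unsigned long npages;
	unsigned char *dirty_bitmap;
};

/* 1. Hash-bucket walk, in the spirit of mmu_need_write_protect():
 *    every writable mapping pays for this walk under mmu_lock. */
static struct shadow_page *
lookup_shadow(struct shadow_page **hash, gfn_t gfn)
{
	struct shadow_page *sp;

	for (sp = hash[gfn % NR_MMU_HASH_BUCKETS]; sp; sp = sp->hash_next)
		if (sp->gfn == gfn && !sp->metaphysical)
			return sp;	/* gfn is shadowed */
	return NULL;
}

/* 2. Memslot scan, in the spirit of mark_page_dirty(): a linear search
 *    over the slot array just to set one bit in a dirty bitmap. */
static void mark_dirty(struct memslot *slots, int nslots, gfn_t gfn)
{
	int i;

	for (i = 0; i < nslots; i++) {
		if (gfn >= slots[i].base_gfn &&
		    gfn < slots[i].base_gfn + slots[i].npages) {
			unsigned long rel = gfn - slots[i].base_gfn;

			slots[i].dirty_bitmap[rel / 8] |= 1u << (rel % 8);
			return;
		}
	}
}

On the sync path a previously writable spte already implies both
outcomes (no write-protected shadow exists, and the gfn is already
dirty), so both walks can be skipped, which is what the early exit
added above does.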


* Re: KVM: MMU: optimize set_spte for page sync
From: Avi Kivity @ 2008-11-23 10:36 UTC
  To: Marcelo Tosatti; +Cc: kvm-devel

Marcelo Tosatti wrote:

> The cost of the hash table and memslot lookups is quite significant if
> the workload is pagetable-write intensive, resulting in increased
> mmu_lock contention.
>
> @@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcpu *vcp
>  
>  		spte |= PT_WRITABLE_MASK;
>  
> -		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
> +		/*
> + 	 	 * Optimization: for pte sync, if spte was writable the hash
> + 	 	 * lookup is unnecessary (and expensive). Write protection
> + 	 	 * is responsibility of mmu_get_page / kvm_sync_page.
> + 	 	 * Same reasoning can be applied to dirty page accounting.
> + 	 	 */
> +		if (sync_page && is_writeble_pte(*shadow_pte))
> +			goto set_pte;
>   

What if *shadow_pte points at a different page?  Is that possible?


-- 
error compiling committee.c: too many arguments to function



* Re: KVM: MMU: optimize set_spte for page sync
From: Marcelo Tosatti @ 2008-11-24 12:04 UTC
  To: Avi Kivity; +Cc: kvm-devel

On Sun, Nov 23, 2008 at 12:36:29PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>
>> The cost of the hash table and memslot lookups is quite significant if
>> the workload is pagetable-write intensive, resulting in increased
>> mmu_lock contention.
>>
>> @@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcpu *vcp
>>   		spte |= PT_WRITABLE_MASK;
>>  -		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
>> +		/*
>> + 	 	 * Optimization: for pte sync, if spte was writable the hash
>> + 	 	 * lookup is unnecessary (and expensive). Write protection
>> + 	 	 * is responsibility of mmu_get_page / kvm_sync_page.
>> + 	 	 * Same reasoning can be applied to dirty page accounting.
>> + 	 	 */
>> +		if (sync_page && is_writeble_pte(*shadow_pte))
>> +			goto set_pte;
>>   
>
> What if *shadow_pte points at a different page?  Is that possible?

To a different gfn? Then sync_page will have nuked the spte:

                if (gpte_to_gfn(gpte) != gfn || !is_present_pte(gpte) ||
                    !(gpte & PT_ACCESSED_MASK)) {
                        u64 nonpresent;
                        ..
                        set_shadow_pte(&sp->spt[i], nonpresent);
                }

Otherwise:

/*
 * Using the cached information from sp->gfns is safe because:
 * - The spte has a reference to the struct page, so the pfn for a given
 *   gfn can't change unless all sptes pointing to it are nuked first.
 * ...
 */


* Re: KVM: MMU: optimize set_spte for page sync
From: Marcelo Tosatti @ 2008-11-24 13:23 UTC
  To: Avi Kivity; +Cc: kvm-devel

On Mon, Nov 24, 2008 at 01:04:23PM +0100, Marcelo Tosatti wrote:
> On Sun, Nov 23, 2008 at 12:36:29PM +0200, Avi Kivity wrote:
> > Marcelo Tosatti wrote:
> >
> >> The cost of the hash table and memslot lookups is quite significant if
> >> the workload is pagetable-write intensive, resulting in increased
> >> mmu_lock contention.
> >>
> >> @@ -1593,7 +1593,16 @@ static int set_spte(struct kvm_vcpu *vcp
> >>   		spte |= PT_WRITABLE_MASK;
> >>  -		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
> >> +		/*
> >> + 	 	 * Optimization: for pte sync, if spte was writable the hash
> >> + 	 	 * lookup is unnecessary (and expensive). Write protection
> >> + 	 	 * is responsibility of mmu_get_page / kvm_sync_page.
> >> + 	 	 * Same reasoning can be applied to dirty page accounting.
> >> + 	 	 */
> >> +		if (sync_page && is_writeble_pte(*shadow_pte))
> >> +			goto set_pte;
> >>   
> >
> > What if *shadow_pte points at a different page?  Is that possible?
>
> To a different gfn? Then sync_page will have nuked the spte:
> 
>                 if (gpte_to_gfn(gpte) != gfn || !is_present_pte(gpte) ||
>                     !(gpte & PT_ACCESSED_MASK)) {
>                         u64 nonpresent;
>                         ..
>                         set_shadow_pte(&sp->spt[i], nonpresent);
>                 }
> 
> Otherwise:
> 
> /*
>  * Using the cached information from sp->gfns is safe because:
>  * - The spte has a reference to the struct page, so the pfn for a given
>  * gfn can't change unless all sptes pointing to it are nuked first.

*shadow_pte can point to a different page if the guest updates the
pagetable and there is a fault before resync; the fault updates the
spte with the new gfn (and pfn) via mmu_set_spte. In that case the gfn
cache is updated, since:

                    } else if (pfn != spte_to_pfn(*shadow_pte)) {
                        printk("hfn old %lx new %lx\n",
                                 spte_to_pfn(*shadow_pte), pfn);
                        rmap_remove(vcpu->kvm, shadow_pte);




* Re: KVM: MMU: optimize set_spte for page sync
From: Avi Kivity @ 2008-11-25 14:38 UTC
  To: Marcelo Tosatti; +Cc: kvm-devel

Marcelo Tosatti wrote:
> *shadow_pte can point to a different page if the guest updates the
> pagetable and there is a fault before resync; the fault updates the
> spte with the new gfn (and pfn) via mmu_set_spte. In that case the gfn
> cache is updated, since:
>
>                     } else if (pfn != spte_to_pfn(*shadow_pte)) {
>                         printk("hfn old %lx new %lx\n",
>                                  spte_to_pfn(*shadow_pte), pfn);
>                         rmap_remove(vcpu->kvm, shadow_pte);
>   

Okay.  Please resend, but without the reversal of can_unsync; it will
make for a more readable patch.  If you like, send a follow-on that only
does the reversal.

-- 
error compiling committee.c: too many arguments to function



* Re: KVM: MMU: optimize set_spte for page sync
From: Marcelo Tosatti @ 2008-11-25 14:58 UTC
  To: Avi Kivity; +Cc: kvm-devel

On Tue, Nov 25, 2008 at 04:38:13PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> *shadow_pte can point to a different page if the guest updates the
>> pagetable and there is a fault before resync; the fault updates the
>> spte with the new gfn (and pfn) via mmu_set_spte. In that case the gfn
>> cache is updated, since:
>>
>>                     } else if (pfn != spte_to_pfn(*shadow_pte)) {
>>                         printk("hfn old %lx new %lx\n",
>>                                  spte_to_pfn(*shadow_pte), pfn);
>>                         rmap_remove(vcpu->kvm, shadow_pte);
>>   
>
> Okay.  Please resend, but without the reversal of can_unsync; it will
> make for a more readable patch.  If you like, send a follow-on that
> only does the reversal.

Here it goes.

KVM: MMU: optimize set_spte for page sync

The write protect verification in set_spte is unnecessary for page sync.

It's guaranteed that, if the unsync spte was writable, the target page
does not have a write-protected shadow (if it had, the spte would have
been write protected under mmu_lock by rmap_write_protect before).

The same reasoning applies to mark_page_dirty: the gfn has already been
marked dirty via the pagefault path.

The cost of the hash table and memslot lookups is quite significant if
the workload is pagetable-write intensive, resulting in increased
mmu_lock contention.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: kvm/arch/x86/kvm/mmu.c
===================================================================
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1593,6 +1593,15 @@ static int set_spte(struct kvm_vcpu *vcp
 
 		spte |= PT_WRITABLE_MASK;
 
+		/*
+ 	 	 * Optimization: for pte sync, if spte was writable the hash
+ 	 	 * lookup is unnecessary (and expensive). Write protection
+ 	 	 * is responsibility of mmu_get_page / kvm_sync_page.
+ 	 	 * Same reasoning can be applied to dirty page accounting.
+ 	 	 */
+		if (!can_unsync && is_writeble_pte(*shadow_pte))
+			goto set_pte;
+
 		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
 			pgprintk("%s: found shadow page for %lx, marking ro\n",
 				 __func__, gfn);



* Re: KVM: MMU: optimize set_spte for page sync
From: Avi Kivity @ 2008-11-26 11:12 UTC
  To: Marcelo Tosatti; +Cc: kvm-devel

Marcelo Tosatti wrote:
>
> Here it goes.
>
> KVM: MMU: optimize set_spte for page sync
>
> The write protect verification in set_spte is unnecessary for page sync.
>
> It's guaranteed that, if the unsync spte was writable, the target page
> does not have a write-protected shadow (if it had, the spte would have
> been write protected under mmu_lock by rmap_write_protect before).
>
> The same reasoning applies to mark_page_dirty: the gfn has already been
> marked dirty via the pagefault path.
>
> The cost of the hash table and memslot lookups is quite significant if
> the workload is pagetable-write intensive, resulting in increased
> mmu_lock contention.
>
>   

Applied, thanks.

-- 
error compiling committee.c: too many arguments to function



* Re: KVM: MMU: optimize set_spte for page sync
From: Marcelo Tosatti @ 2008-12-09 15:52 UTC
  To: Avi Kivity; +Cc: kvm-devel, Eduardo Habkost, Chris Wright

On Wed, Nov 26, 2008 at 01:12:20PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>
>> Here it goes.
>>
>> KVM: MMU: optimize set_spte for page sync
>>
>> The write protect verification in set_spte is unnecessary for page sync.
>>
>> It's guaranteed that, if the unsync spte was writable, the target page
>> does not have a write-protected shadow (if it had, the spte would have
>> been write protected under mmu_lock by rmap_write_protect before).
>>
>> The same reasoning applies to mark_page_dirty: the gfn has already been
>> marked dirty via the pagefault path.
>>
>> The cost of the hash table and memslot lookups is quite significant if
>> the workload is pagetable-write intensive, resulting in increased
>> mmu_lock contention.
>>
>>   
>
> Applied, thanks.

Avi,

Do you have any objections to submitting this patch for 2.6.28? The hash
lookup kills performance for pagetable-write + context-switch intensive
workloads.



* Re: KVM: MMU: optimize set_spte for page sync
From: Avi Kivity @ 2008-12-10  9:14 UTC
  To: Marcelo Tosatti; +Cc: kvm-devel, Eduardo Habkost, Chris Wright

Marcelo Tosatti wrote:
> Do you have any objections to submitting this patch for 2.6.28? The hash
> lookup kills performance for pagetable-write + context-switch intensive
> workloads.
>   

Can you quantify?

2.6.28 is out of the question; 2.6.28.stable is possible, depending on
the actual performance difference.  I'd hate to introduce a correctness
problem just to improve performance.

-- 
error compiling committee.c: too many arguments to function



* Re: KVM: MMU: optimize set_spte for page sync
From: Marcelo Tosatti @ 2008-12-10 12:54 UTC
  To: Avi Kivity; +Cc: kvm-devel, Eduardo Habkost, Chris Wright

On Wed, Dec 10, 2008 at 11:14:42AM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> Do you have any objections to submitting this patch for 2.6.28? The hash
>> lookup kills performance for pagetable-write + context-switch intensive
>> workloads.
>>   
>
> Can you quantify?
>
> 2.6.28 is out of the question; 2.6.28.stable is possible, depending on
> the actual performance difference.  I'd hate to introduce a correctness
> problem just to improve performance.

Around 10% improvement on some workloads.


