* [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
@ 2012-04-28 10:05 Takuya Yoshikawa
2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
0 siblings, 2 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-28 10:05 UTC (permalink / raw)
To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya
1. Problem
During live migration, if the guest tries to take mmu_lock at the same
time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be
forced to wait a long time; this is not restricted to page faults caused
by GET_DIRTY_LOG's write protection.
2. Measurement
- Server:
Xeon: 8 cores (2 CPUs), 24GB memory
- One VM was being migrated locally to the opposite numa node:
Source (active) VM: bound to node 0
Target (incoming) VM: bound to node 1
This binding was for reducing extra noise.
- The guest inside it:
3 VCPUs, 11GB memory
- Workload:
On VCPUs 2 and 3, three threads each wrote endlessly to 3GB of
anonymous memory (9GB in total) at maximum speed.
I had checked that GET_DIRTY_LOG was forced to write protect more than
2 million pages. So the 9GB memory was almost always kept dirty to be
sent.
In parallel, on VCPU 1, I measured memory write latency: how long it
takes to write one byte to each page of 1GB of anonymous memory.
- Result:
With the current KVM, I saw a 1.5ms worst-case latency: this
corresponds well with the expected mmu_lock hold time.
Here, you may think that this is too small compared to the numbers I
reported before using dirty-log-perf, but those were measured on a
32-bit host on a Core i3 box, which was much slower than server machines.
Although having 10GB of dirty memory is a bit extreme for guests
with less than 16GB of memory, much larger guests, e.g. 128GB guests,
may see latency longer than 1.5ms.
3. Solution
The time spent in GET_DIRTY_LOG itself is small compared to QEMU's
other work, so we should focus on alleviating the worst-case latency
first.

The solution is very simple and was originally suggested by Marcelo:
  "Conditionally reschedule when there is contention."

With this rescheduling (see the following patch), the worst-case
latency dropped from 1.5ms to 800us in the same test.
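For illustration only, the batching pattern described above can be sketched as a user-space analogue (not the actual KVM code): do a fixed batch of work per lock hold, then yield the CPU if needed. Here `sched_yield()` stands in for the unlock + cond_resched() + relock sequence, and `protect_pages`/`RESCHED_BATCH` are made-up names; 8192 matches the value chosen in the patch.

```c
/* User-space sketch of the "reschedule every N units of work" pattern.
 * All names are illustrative; this is not the KVM implementation. */
#include <sched.h>

#define RESCHED_BATCH 8192

/* Returns how many times we yielded while "protecting" total pages. */
long protect_pages(long total)
{
    long done = 0, since_resched = 0, yields = 0;

    while (done < total) {
        ++done;                    /* stands in for one __rmap_write_protect() */
        if (++since_resched > RESCHED_BATCH) {
            sched_yield();         /* stands in for unlock + cond_resched() + relock */
            ++yields;
            since_resched = 0;
        }
    }
    return yields;
}
```

The point of the batch size is to bound the lock hold time per batch (about 100us for 8K pages on the tested Xeon) while keeping the per-page overhead of the check negligible.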
4. TODO
The patch handles kvm_vm_ioctl_get_dirty_log() only, so the write
protection done by kvm_mmu_slot_remove_write_access(), which is called
when we enable dirty page logging, can still cause the same problem.

My plan is to replace it with rmap-based protection after this.
Thanks,
Takuya
---
Takuya Yoshikawa (1):
KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
arch/x86/include/asm/kvm_host.h | 6 +++---
arch/x86/kvm/mmu.c | 12 +++++++++---
arch/x86/kvm/x86.c | 22 +++++++++++++++++-----
3 files changed, 29 insertions(+), 11 deletions(-)
--
1.7.5.4
^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-28 10:05 [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
@ 2012-04-28 10:07 ` Takuya Yoshikawa
  2012-04-29 11:27   ` Avi Kivity
  2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
  1 sibling, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-28 10:07 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya

From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

get_dirty_log() needs to hold mmu_lock during write protecting dirty
pages and this can be long when there are many dirty pages to protect.

As the guest can get faulted during that time, and of course other mmu
works can also happen, this may result in a severe latency problem which
would prevent the system from scaling.

This patch mitigates this by checking mmu_lock contention for every 8K
dirty pages we protect: write protecting this number of pages took about
one hundred microseconds on a usual Xeon server.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
v1->v2:
  Replaced spin_needbreak() with spin_is_contended() to make this work
  even when CONFIG_PREEMPT=no.
  Expanded the reasoning behind the check as a comment.
  Raised the value from 2048 to 8192 based on a new test result.

 arch/x86/include/asm/kvm_host.h |    6 +++---
 arch/x86/kvm/mmu.c              |   12 +++++++++---
 arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69e39bc..b249a8b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,9 +716,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask);
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07424cf..c4abfc1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
  *
  * Used when we do not need to care about huge page mappings: e.g. during dirty
  * logging we do not have any such mappings.
+ *
+ * Returns the number of pages protected by this.
  */
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask)
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask)
 {
 	unsigned long *rmapp;
+	int nr_protected = 0;
 
 	while (mask) {
 		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
 		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
+		++nr_protected;
 
 		/* clear the first set bit */
 		mask &= mask - 1;
 	}
+
+	return nr_protected;
 }
 
 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4de705c..aaae01a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long n, i;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
+	int nr_protected = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -3121,15 +3121,27 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		if (!dirty_bitmap[i])
 			continue;
 
-		is_dirty = true;
-
 		mask = xchg(&dirty_bitmap[i], 0);
 		dirty_bitmap_buffer[i] = mask;
 
 		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
+								offset, mask);
+		if (nr_protected > 8192) {
+			/*
+			 * Write protecting many pages takes long time, so we
+			 * check mmu_lock contention for avoiding ms latency.
+			 */
+			if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
+				kvm_flush_remote_tlbs(kvm);
+				spin_unlock(&kvm->mmu_lock);
+				cond_resched();
+				spin_lock(&kvm->mmu_lock);
+			}
+			nr_protected = 0;
+		}
 	}
 
-	if (is_dirty)
+	if (nr_protected)
 		kvm_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
@ 2012-04-29 11:27   ` Avi Kivity
  2012-04-29 12:17     ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 11:27 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/28/2012 01:07 PM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
>
> get_dirty_log() needs to hold mmu_lock during write protecting dirty
> pages and this can be long when there are many dirty pages to protect.
>
> As the guest can get faulted during that time, and of course other mmu
> works can also happen, this may result in a severe latency problem which
> would prevent the system from scaling.
>
> This patch mitigates this by checking mmu_lock contention for every 8K
> dirty pages we protect: write protecting this number of pages took about
> one hundred microseconds on a usual Xeon server.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
> v1->v2:
>   Replaced spin_needbreak() with spin_is_contended() to make this work
>   even when CONFIG_PREEMPT=no.
>   Expanded the reasoning behind the check as a comment.
>   Raised the value from 2048 to 8192 based on a new test result.
>
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
>  3 files changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 69e39bc..b249a8b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -716,9 +716,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask);
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask);
>  void kvm_mmu_zap_all(struct kvm *kvm);
>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 07424cf..c4abfc1 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
>   *
>   * Used when we do not need to care about huge page mappings: e.g. during dirty
>   * logging we do not have any such mappings.
> + *
> + * Returns the number of pages protected by this.
>   */
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask)
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask)
>  {
>  	unsigned long *rmapp;
> +	int nr_protected = 0;
>
>  	while (mask) {
>  		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
>  		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
> +		++nr_protected;
>
>  		/* clear the first set bit */
>  		mask &= mask - 1;
>  	}
> +
> +	return nr_protected;
>  }
>
>  static int rmap_write_protect(struct kvm *kvm, u64 gfn)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4de705c..aaae01a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  	unsigned long n, i;
>  	unsigned long *dirty_bitmap;
>  	unsigned long *dirty_bitmap_buffer;
> -	bool is_dirty = false;
> +	int nr_protected = 0;
>
>  	mutex_lock(&kvm->slots_lock);
>
> @@ -3121,15 +3121,27 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  		if (!dirty_bitmap[i])
>  			continue;
>
> -		is_dirty = true;
> -
>  		mask = xchg(&dirty_bitmap[i], 0);
>  		dirty_bitmap_buffer[i] = mask;
>
>  		offset = i * BITS_PER_LONG;
> -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
> +								offset, mask);
> +		if (nr_protected > 8192) {
> +			/*
> +			 * Write protecting many pages takes long time, so we
> +			 * check mmu_lock contention for avoiding ms latency.
> +			 */
> +			if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
> +				kvm_flush_remote_tlbs(kvm);

Do we really need to flush the TLB here?

Suppose we don't.  So some pages could still be written to using old TLB
entries, but that's okay, since we're reporting those pages as dirty
anyway.  In fact we might be saving userspace another pass at the page,
if it won't be written afterwards.  A flush is only needed to prevent
unlogged writes after we return the dirty bitmap.

If this reasoning is correct, we can replace the whole thing with
cond_resched_lock().

Oh, but this might trick a later rmap_write_protect() into thinking that
no write protection and tlb flush is needed.  So we should touch
kvm->tlbs_dirty.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
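For illustration, the bookkeeping Avi describes in the message above, dropping the lock without flushing while marking the TLBs dirty so a later write-protect path still flushes, can be modeled in a few lines of user-space C. This is a toy model, not the KVM code: the function names are made up, and the real `kvm->tlbs_dirty` handling involves atomics and remote flush requests.

```c
/* Toy model of deferred-flush bookkeeping.  The dirty-log path may
 * drop the lock without flushing, provided it marks the TLBs dirty;
 * later paths must not conclude "already protected, no flush needed"
 * while a flush is pending.  All names are illustrative. */
#include <stdbool.h>

static bool tlbs_dirty;       /* stands in for kvm->tlbs_dirty */
static long remote_flushes;   /* counts simulated kvm_flush_remote_tlbs() */

static void flush_remote_tlbs(void)
{
    ++remote_flushes;
    tlbs_dirty = false;
}

/* get_dirty_log path: drop the lock without flushing, but record it */
static void drop_lock_without_flush(void)
{
    tlbs_dirty = true;
}

/* a later rmap_write_protect()-like path must honour the pending flush */
static void write_protect_path(bool need_flush)
{
    if (need_flush || tlbs_dirty)
        flush_remote_tlbs();
}
```

The invariant being illustrated: stale translations may survive a lock break, but never survive past a point where correctness depends on the pages being write protected.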
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 11:27   ` Avi Kivity
@ 2012-04-29 12:17     ` Takuya Yoshikawa
  2012-04-29 12:59       ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 12:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 14:27:30 +0300
Avi Kivity <avi@redhat.com> wrote:

> > +			if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
> > +				kvm_flush_remote_tlbs(kvm);
>
> Do we really need to flush the TLB here?
>
> Suppose we don't.  So some pages could still be written to using old TLB
> entries, but that's okay, since we're reporting those pages as dirty
> anyway.  In fact we might be saving userspace another pass at the page,
> if it won't be written afterwards.  A flush is only needed to prevent
> unlogged writes after we return the dirty bitmap.
>
> If this reasoning is correct, we can replace the whole thing with
> cond_resched_lock().

Correct for dirty logging.

Actually, that was the reason I once introduced a rmap-write-protection
race when I first did rmap-based write protection.  I did not think
about other paths which conditionally flush TLBs.

For this patch, TLB flush is needed.

> Oh, but this might trick a later rmap_write_protect() into thinking that
> no write protection and tlb flush is needed.  So we should touch
> kvm->tlbs_dirty.

The problem was making others think that
"already protected, no need to flush."

As we discussed before, we need to add some tricks to de-couple mmu_lock
and TLB flush.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 12:17     ` Takuya Yoshikawa
@ 2012-04-29 12:59       ` Avi Kivity
  2012-04-29 14:24         ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 12:59 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 03:17 PM, Takuya Yoshikawa wrote:
> > Oh, but this might trick a later rmap_write_protect() into thinking that
> > no write protection and tlb flush is needed.  So we should touch
> > kvm->tlbs_dirty.
>
> The problem was making others think that
> "already protected, no need to flush."
>
> As we discussed before, we need to add some tricks to de-couple mmu_lock
> and TLB flush.

Ok, let's discuss them (we can apply the patch independently).  Do you
have something in mind?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 12:59       ` Avi Kivity
@ 2012-04-29 14:24         ` Takuya Yoshikawa
  2012-04-29 14:39           ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 14:24 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 15:59:18 +0300
Avi Kivity <avi@redhat.com> wrote:

> > As we discussed before, we need to add some tricks to de-couple mmu_lock
> > and TLB flush.
>
> Ok, let's discuss them (we can apply the patch independently).  Do you
> have something in mind?

How about your own idea?

  http://www.spinics.net/lists/kvm/msg68550.html

===
> > How about something like a sequence lock:
> >
> >
> >     spin_lock(mmu_lock)
> >     need_flush = write_protect_stuff();
> >     atomic_add(kvm->want_flush_counter, need_flush);
> >     spin_unlock(mmu_lock);
> >
> >     while ((done = atomic_read(kvm->done_flush_counter)) < (want =
> >             atomic_read(kvm->want_flush_counter)) {
> >          kvm_make_request(flush)
> >          atomic_cmpxchg(kvm->done_flush_counter, done, want)
> >     }
> >
> > This (or maybe a corrected and optimized version) ensures that any
> > need_flush cannot pass the while () barrier, no matter which thread
> > encounters it first.  However it violates the "do not invent new locking
> > techniques" commandment.  Can we map it to some existing method?
>
> There is no need to advance 'want' in the loop.  So we could do
>
>   /* must call with mmu_lock held */
>   void kvm_mmu_defer_remote_flush(kvm, need_flush)
>   {
>       if (need_flush)
>           ++kvm->flush_counter.want;
>   }
>
>   /* may call without mmu_lock */
>   void kvm_mmu_commit_remote_flush(kvm)
>   {
>       want = ACCESS_ONCE(kvm->flush_counter.want)
>       while ((done = atomic_read(kvm->flush_counter.done) < want) {
>            kvm_make_request(flush)
>            atomic_cmpxchg(kvm->flush_counter.done, done, want)
>       }
>   }
===

	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
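The want/done counter protocol quoted in the message above can be written out as a compilable user-space sketch with C11 atomics. This is an illustration of the protocol only, with made-up names; in the real proposal the "flush" would be kvm_make_request() plus an IPI to the vcpus, and multiple committers would race on the compare-and-exchange.

```c
/* Sketch of the deferred/committed remote-flush counter protocol.
 * Illustrative names; not kernel code. */
#include <stdatomic.h>
#include <stdbool.h>

struct flush_counter {
    atomic_long want;   /* flushes requested (under the lock) */
    atomic_long done;   /* flushes known to have completed */
};

static long flushes;    /* counts simulated remote TLB flushes */

/* must be called with the lock held in the real protocol */
void defer_remote_flush(struct flush_counter *fc, bool need_flush)
{
    if (need_flush)
        atomic_fetch_add(&fc->want, 1);
}

/* may be called without the lock */
void commit_remote_flush(struct flush_counter *fc)
{
    long want = atomic_load(&fc->want);
    long done;

    while ((done = atomic_load(&fc->done)) < want) {
        ++flushes;  /* stands in for the actual remote flush */
        /* only one committer advances done past this value */
        atomic_compare_exchange_strong(&fc->done, &done, want);
    }
}
```

The key property: any deferred flush observed in `want` cannot escape past a `commit_remote_flush()` barrier, whichever thread reaches it first, yet repeated commits with no new work are free.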
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:24         ` Takuya Yoshikawa
@ 2012-04-29 14:39           ` Avi Kivity
  2012-04-29 14:55             ` Takuya Yoshikawa
  2012-05-01  3:07             ` Marcelo Tosatti
  0 siblings, 2 replies; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 14:39 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 05:24 PM, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 15:59:18 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > As we discussed before, we need to add some tricks to de-couple mmu_lock
> > > and TLB flush.
> >
> > Ok, let's discuss them (we can apply the patch independently).  Do you
> > have something in mind?
>
> How about your own idea?

Looks pretty good, if I do say so myself.  I'll take a shot at
implementing it.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:39           ` Avi Kivity
@ 2012-04-29 14:55             ` Takuya Yoshikawa
  2012-04-29 15:00               ` Avi Kivity
  2012-05-01  3:04               ` Marcelo Tosatti
  0 siblings, 2 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 14:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 17:39:35 +0300
Avi Kivity <avi@redhat.com> wrote:

> > How about your own idea?
>
> Looks pretty good, if I do say so myself.  I'll take a shot at
> implementing it.

Looking forward to it!

After your work, 8192 in my patch may better be lowered a bit.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:55             ` Takuya Yoshikawa
@ 2012-04-29 15:00               ` Avi Kivity
  2012-04-29 15:13                 ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 15:00 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 05:55 PM, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 17:39:35 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > How about your own idea?
> >
> > Looks pretty good, if I do say so myself.  I'll take a shot at
> > implementing it.
>
> Looking forward to it!
> After your work, 8192 in my patch may better be lowered a bit.

Why not remove it altogether?  Just change it to cond_resched_lock().

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 15:00               ` Avi Kivity
@ 2012-04-29 15:13                 ` Takuya Yoshikawa
  2012-04-29 15:20                   ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 15:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 18:00:03 +0300
Avi Kivity <avi@redhat.com> wrote:

> > After your work, 8192 in my patch may better be lowered a bit.
>
> Why not remove it altogether?  Just change it to cond_resched_lock().

Two concerns:

- too many checks may slow down GET_DIRTY_LOG.
- cond_resched_lock() uses spin_needbreak(), and I am not sure if
  this still does rescheduling for alleviating mmu_lock contention
  without CONFIG_PREEMPT.

I need to read the code a bit more.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 15:13                 ` Takuya Yoshikawa
@ 2012-04-29 15:20                   ` Avi Kivity
  2012-04-30 14:06                     ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 15:20 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 06:13 PM, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 18:00:03 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > After your work, 8192 in my patch may better be lowered a bit.
> >
> > Why not remove it altogether?  Just change it to cond_resched_lock().
>
> Two concerns:
>
> - too many checks may slow down GET_DIRTY_LOG.

Yes.  My expectation is that it's okay.  If there is no contention, then
the lock cache line will be local, and the check is cheap.  If there is
contention, then we need to drop it.

> - cond_resched_lock() uses spin_needbreak(), and I am not sure if
>   this still does rescheduling for alleviating mmu_lock contention
>   without CONFIG_PREEMPT.

IMO that's a feature.  If kernel policy is not to lockbreak when
!CONFIG_PREEMPT, then we should follow it, whether it's the right thing
or not.  If it's the wrong thing, change it in the kernel proper.

> I need to read the code a bit more.

while (1)
	need_to_read_the_code_a_bit_more();

unfortunately.  I keep finding new things in there, including stuff I
wrote myself.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 15:20                   ` Avi Kivity
@ 2012-04-30 14:06                     ` Takuya Yoshikawa
  0 siblings, 0 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-30 14:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 18:20:37 +0300
Avi Kivity <avi@redhat.com> wrote:

> > - cond_resched_lock() uses spin_needbreak(), and I am not sure if
> >   this still does rescheduling for alleviating mmu_lock contention
> >   without CONFIG_PREEMPT.
>
> IMO that's a feature.  If kernel policy is not to lockbreak when
> !CONFIG_PREEMPT, then we should follow it, whether it's the right thing
> or not.  If it's the wrong thing, change it in the kernel proper.

Agreed.

Anyway, we need to alleviate mmu_lock contention on server machines,
where we usually do not select CONFIG_PREEMPT.  So I need to check the
purpose of this API.

> while (1)
> 	need_to_read_the_code_a_bit_more();

It's a pleasure!

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:55             ` Takuya Yoshikawa
  2012-04-29 15:00               ` Avi Kivity
@ 2012-05-01  3:04               ` Marcelo Tosatti
  2012-05-01 13:14                 ` Takuya Yoshikawa
  1 sibling, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2012-05-01 3:04 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, kvm, yoshikawa.takuya

On Sun, Apr 29, 2012 at 11:55:50PM +0900, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 17:39:35 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > How about your own idea?
> >
> > Looks pretty good, if I do say so myself.  I'll take a shot at
> > implementing it.
>
> Looking forward to it!
> After your work, 8192 in my patch may better be lowered a bit.

Why not simply use spin_is_contended() again?  Are you afraid of
GET_DIRTY_LOG being starved by page faults?

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-05-01  3:04               ` Marcelo Tosatti
@ 2012-05-01 13:14                 ` Takuya Yoshikawa
  0 siblings, 0 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-05-01 13:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, yoshikawa.takuya

On Tue, 1 May 2012 00:04:47 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> > Looking forward to it!
> > After your work, 8192 in my patch may better be lowered a bit.
>
> Why not simply use spin_is_contended again?  Are you afraid of
> GET_DIRTY_LOG starved by pagefaults?

No, but I am not so confident.

I personally tested some extreme cases like "cond_resched for every
iteration" and did not see any significant slowdown.  That's all I
know now.

I also think we should use spin_is_contended() again.

What I am thinking now is whether it is possible to change
cond_resched_lock() to satisfy our need like:

  cond_resched_lock(lock, spin_is_contended(lock)); // we want this
  cond_resched_lock(lock, spin_needbreak(lock));    // same as current
  cond_resched_lock(lock, false);                   // never check contention

Although I have checked all callers, it is not certain whether they do
not want to check lock contention when CONFIG_PREEMPT=no.

I will send an RFC patch to get comments, if possible.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:39           ` Avi Kivity
  2012-04-29 14:55             ` Takuya Yoshikawa
@ 2012-05-01  3:07             ` Marcelo Tosatti
  1 sibling, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2012-05-01 3:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Takuya Yoshikawa, kvm, yoshikawa.takuya

On Sun, Apr 29, 2012 at 05:39:35PM +0300, Avi Kivity wrote:
> On 04/29/2012 05:24 PM, Takuya Yoshikawa wrote:
> > On Sun, 29 Apr 2012 15:59:18 +0300
> > Avi Kivity <avi@redhat.com> wrote:
> >
> > > > As we discussed before, we need to add some tricks to de-couple mmu_lock
> > > > and TLB flush.
> > >
> > > Ok, let's discuss them (we can apply the patch independently).  Do you
> > > have something in mind?
> >
> > How about your own idea?
>
> Looks pretty good, if I do say so myself.  I'll take a shot at
> implementing it.

Decoupling would also simplify Xiao's lockless faults.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
  2012-04-28 10:05 [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
  2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
@ 2012-05-02 11:24 ` Takuya Yoshikawa
  2012-05-02 11:33   ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-05-02 11:24 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: avi, mtosatti, kvm, yoshikawa.takuya, qemu-devel

While checking mmu_lock contention, I noticed that QEMU's
memory_region_get_dirty() was using an unexpectedly large amount of
CPU time.

Thanks,
	Takuya

=============================
 perf top -t ${QEMU_TID}
=============================
 51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
 16.73%  qemu-system-x86_64  [.] ram_save_remaining
  7.25%  qemu-system-x86_64  [.] cpu_physical_memory_reset_dirty
  3.49%  [kvm]               [k] __rmap_write_protect
  2.85%  [kvm]               [k] mmu_spte_update
  2.20%  [kernel]            [k] copy_user_generic_string
  2.16%  libc-2.13.so        [.] 0x874e9
  1.71%  qemu-system-x86_64  [.] memory_region_set_dirty
  1.20%  qemu-system-x86_64  [.] kvm_physical_sync_dirty_bitmap
  1.00%  [kernel]            [k] __lock_acquire.isra.31
  0.66%  [kvm]               [k] rmap_get_next
  0.58%  [kvm]               [k] rmap_get_first
  0.54%  [kvm]               [k] kvm_mmu_write_protect_pt_masked
  0.54%  [kvm]               [k] spte_has_volatile_bits
  0.42%  [kernel]            [k] lock_release
  0.37%  [kernel]            [k] tcp_sendmsg
  0.33%  [kernel]            [k] alloc_pages_current
  0.29%  [kernel]            [k] native_read_tsc
  0.29%  qemu-system-x86_64  [.] ram_save_block
  0.25%  [kernel]            [k] lock_is_held
  0.25%  [kernel]            [k] __ticket_spin_trylock
  0.21%  [kernel]            [k] lock_acquire

On Sat, 28 Apr 2012 19:05:44 +0900
Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:

> 1. Problem
> During live migration, if the guest tries to take mmu_lock at the same
> time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be
> forced to wait long time; this is not restricted to page faults caused
> by GET_DIRTY_LOG's write protection.
>
> 2. Measurement
> - Server:
>     Xeon: 8 cores(2 CPUs), 24GB memory
>
> - One VM was being migrated locally to the opposite numa node:
>     Source(active) VM: binded to node 0
>     Target(incoming) VM: binded to node 1
>
>   This binding was for reducing extra noise.
>
> - The guest inside it:
>     3 VCPUs, 11GB memory
>
> - Workload:
>     On VCPU 2 and 3, there were 3 threads and each of them was endlessly
>     writing to 3GB, in total 9GB, anonymous memory at its maximum speed.
>
>     I had checked that GET_DIRTY_LOG was forced to write protect more than
>     2 million pages.  So the 9GB memory was almost always kept dirty to be
>     sent.
>
>     In parallel, on VCPU 1, I checked memory write latency: how long it
>     takes to write to one byte of each page in 1GB anonymous memory.
>
> - Result:
>     With the current KVM, I could see 1.5ms worst case latency: this
>     corresponds well with the expected mmu_lock hold time.
>
>     Here, you may think that this is too small compared to the numbers I
>     reported before, using dirty-log-perf, but that was done on 32-bit
>     host on a core-i3 box which was much slower than server machines.
>
>     Although having 10GB dirty memory pages is a bit extreme for guests
>     with less than 16GB memory, much larger guests, e.g. 128GB guests, may
>     see latency longer than 1.5ms.
>
> 3. Solution
> GET_DIRTY_LOG time is very limited compared to other works in QEMU,
> so we should focus on alleviating the worst case latency first.
>
> The solution is very simple and originally suggested by Marcelo:
>   "Conditionally reschedule when there is a contention."
>
> By this rescheduling, see the following patch, the worst case latency
> changed from 1.5ms to 800us for the same test.
>
> 4. TODO
> The patch treats kvm_vm_ioctl_get_dirty_log() only, so the write
> protection by kvm_mmu_slot_remove_write_access(), which is called when
> we enable dirty page logging, can cause the same problem.
>
> My plan is to replace it with rmap-based protection after this.
>
> Thanks,
> 	Takuya
>
> ---
> Takuya Yoshikawa (1):
>   KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
>
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
>  3 files changed, 29 insertions(+), 11 deletions(-)
>
> -- 
> 1.7.5.4

-- 
Takuya Yoshikawa <takuya.yoshikawa@gmail.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
  2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
@ 2012-05-02 11:33   ` Avi Kivity
  2012-05-02 14:20     ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-05-02 11:33 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya, qemu-devel

On 05/02/2012 02:24 PM, Takuya Yoshikawa wrote:
> During checking mmu_lock contention, I noticed that QEMU's
> memory_region_get_dirty() was using unexpectedly much CPU time.
>
> Thanks,
> 	Takuya
>
> =============================
>  perf top -t ${QEMU_TID}
> =============================
>  51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
>  16.73%  qemu-system-x86_64  [.] ram_save_remaining

memory_region_get_dirty() is called from ram_save_remaining().  Looks
like quadratic behaviour here: we send a few pages in
ram_save_remaining(), then walk the entire dirty bitmap to calculate
expected_time().

We should probably calculate expected_time once per iteration.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
  2012-05-02 11:33   ` Avi Kivity
@ 2012-05-02 14:20     ` Takuya Yoshikawa
  0 siblings, 0 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-05-02 14:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya, qemu-devel

On Wed, 02 May 2012 14:33:55 +0300
Avi Kivity <avi@redhat.com> wrote:

> > =============================
> >  perf top -t ${QEMU_TID}
> > =============================
> >  51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
> >  16.73%  qemu-system-x86_64  [.] ram_save_remaining
>
> memory_region_get_dirty() is called from ram_save_remaining().  Looks
> like quadratic behaviour here: we send a few pages in
> ram_save_remaining(), then walk the entire dirty bitmap to calculate
> expected_time().
>
> We should probably calculate expected_time once per iteration.

There seem to be many unnecessary/repeated calculations, not restricted
to this expected_time one.

	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
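The kind of fix being suggested above, computing the remaining-dirty count incrementally instead of rescanning the whole bitmap every time, can be sketched as follows. This is illustrative only (not the actual QEMU code), and all names are made up:

```c
/* Illustrative sketch: keep a running count of set dirty bits so that
 * "how many pages remain" is O(1) instead of an O(n) bitmap walk on
 * every call.  Names are hypothetical, not QEMU's. */
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
enum { NPAGES = 256 };

static unsigned long dirty_bitmap[NPAGES / BITS_PER_WORD];
static long nr_dirty;   /* cached count, replaces the per-call rescan */

static void set_dirty(unsigned long page)
{
    unsigned long *w = &dirty_bitmap[page / BITS_PER_WORD];
    unsigned long bit = 1UL << (page % BITS_PER_WORD);

    if (!(*w & bit)) {  /* only count 0 -> 1 transitions */
        *w |= bit;
        ++nr_dirty;
    }
}

static void clear_dirty(unsigned long page)
{
    unsigned long *w = &dirty_bitmap[page / BITS_PER_WORD];
    unsigned long bit = 1UL << (page % BITS_PER_WORD);

    if (*w & bit) {     /* only count 1 -> 0 transitions */
        *w &= ~bit;
        --nr_dirty;
    }
}

/* O(1) replacement for a bitmap-walking ram_save_remaining() */
static long ram_save_remaining_cached(void)
{
    return nr_dirty;
}
```

With the count maintained at set/clear time, the expected_time calculation in the migration loop no longer forces a full bitmap scan per page sent, removing the quadratic behaviour Avi points out.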
end of thread, other threads: [~2012-05-02 14:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-28 10:05 [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
2012-04-29 11:27   ` Avi Kivity
2012-04-29 12:17     ` Takuya Yoshikawa
2012-04-29 12:59       ` Avi Kivity
2012-04-29 14:24         ` Takuya Yoshikawa
2012-04-29 14:39           ` Avi Kivity
2012-04-29 14:55             ` Takuya Yoshikawa
2012-04-29 15:00               ` Avi Kivity
2012-04-29 15:13                 ` Takuya Yoshikawa
2012-04-29 15:20                   ` Avi Kivity
2012-04-30 14:06                     ` Takuya Yoshikawa
2012-05-01  3:04               ` Marcelo Tosatti
2012-05-01 13:14                 ` Takuya Yoshikawa
2012-05-01  3:07             ` Marcelo Tosatti
2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
2012-05-02 11:33   ` Avi Kivity
2012-05-02 14:20     ` Takuya Yoshikawa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox