* [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
@ 2012-04-28 10:05 Takuya Yoshikawa
2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
0 siblings, 2 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-28 10:05 UTC (permalink / raw)
To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya
1. Problem
During live migration, if the guest tries to take mmu_lock at the same
time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be
forced to wait a long time; this is not restricted to page faults caused
by GET_DIRTY_LOG's write protection.
2. Measurement
- Server:
Xeon: 8 cores (2 CPUs), 24GB memory
- One VM was being migrated locally to the opposite numa node:
Source (active) VM: bound to node 0
Target (incoming) VM: bound to node 1
This binding was for reducing extra noise.
- The guest inside it:
3 VCPUs, 11GB memory
- Workload:
On VCPUs 2 and 3, three threads each wrote endlessly to 3GB of
anonymous memory (9GB in total) at maximum speed.
I had checked that GET_DIRTY_LOG was forced to write protect more than
2 million pages. So the 9GB memory was almost always kept dirty to be
sent.
In parallel, on VCPU 1, I measured memory write latency: how long it
takes to write one byte to each page of 1GB of anonymous memory.
- Result:
With the current KVM, I saw a 1.5ms worst-case latency: this
corresponds well with the expected mmu_lock hold time.
Here, you may think that this is too small compared to the numbers I
reported before using dirty-log-perf, but those were measured on a
32-bit host on a Core i3 box, which was much slower than server machines.
Although having 10GB of dirty memory is a bit extreme for guests
with less than 16GB of memory, much larger guests, e.g. 128GB guests,
may see latency longer than 1.5ms.
3. Solution
The time spent in GET_DIRTY_LOG itself is small compared to QEMU's
other work, so we should focus on alleviating the worst-case latency
first.

The solution is very simple and was originally suggested by Marcelo:
  "Conditionally reschedule when there is contention."

With this rescheduling (see the following patch), the worst-case
latency dropped from 1.5ms to 800us in the same test.
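For illustration only, the batching pattern described above can be sketched as a user-space analogue (not the actual KVM code): do a fixed batch of work per lock hold, then yield the CPU if needed. Here `sched_yield()` stands in for the unlock + cond_resched() + relock sequence, and `protect_pages`/`RESCHED_BATCH` are made-up names; 8192 matches the value chosen in the patch.

```c
/* User-space sketch of the "reschedule every N units of work" pattern.
 * All names are illustrative; this is not the KVM implementation. */
#include <sched.h>

#define RESCHED_BATCH 8192

/* Returns how many times we yielded while "protecting" total pages. */
long protect_pages(long total)
{
    long done = 0, since_resched = 0, yields = 0;

    while (done < total) {
        ++done;                    /* stands in for one __rmap_write_protect() */
        if (++since_resched > RESCHED_BATCH) {
            sched_yield();         /* stands in for unlock + cond_resched() + relock */
            ++yields;
            since_resched = 0;
        }
    }
    return yields;
}
```

The point of the batch size is to bound the lock hold time per batch (about 100us for 8K pages on the tested Xeon) while keeping the per-page overhead of the check negligible.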
4. TODO
The patch handles kvm_vm_ioctl_get_dirty_log() only, so the write
protection done by kvm_mmu_slot_remove_write_access(), which is called
when we enable dirty page logging, can still cause the same problem.

My plan is to replace it with rmap-based protection after this.
Thanks,
Takuya
---
Takuya Yoshikawa (1):
KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
arch/x86/include/asm/kvm_host.h | 6 +++---
arch/x86/kvm/mmu.c | 12 +++++++++---
arch/x86/kvm/x86.c | 22 +++++++++++++++++-----
3 files changed, 29 insertions(+), 11 deletions(-)
--
1.7.5.4
^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-28 10:05 [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
@ 2012-04-28 10:07 ` Takuya Yoshikawa
  2012-04-29 11:27   ` Avi Kivity
  2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
  1 sibling, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-28 10:07 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, yoshikawa.takuya

From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>

get_dirty_log() needs to hold mmu_lock during write protecting dirty
pages and this can be long when there are many dirty pages to protect.

As the guest can get faulted during that time, and of course other mmu
works can also happen, this may result in a severe latency problem which
would prevent the system from scaling.

This patch mitigates this by checking mmu_lock contention for every 8K
dirty pages we protect: write protecting this number of pages took about
one hundred microseconds on a usual Xeon server.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
v1->v2:
  Replaced spin_needbreak() with spin_is_contended() to make this work
  even when CONFIG_PREEMPT=no.
  Expanded the reasoning behind the check as a comment.
  Raised the value from 2048 to 8192 based on a new test result.

 arch/x86/include/asm/kvm_host.h |    6 +++---
 arch/x86/kvm/mmu.c              |   12 +++++++++---
 arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
 3 files changed, 29 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 69e39bc..b249a8b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -716,9 +716,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask);
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 07424cf..c4abfc1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
  *
  * Used when we do not need to care about huge page mappings: e.g. during dirty
  * logging we do not have any such mappings.
+ *
+ * Returns the number of pages protected by this.
  */
-void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
-				     struct kvm_memory_slot *slot,
-				     gfn_t gfn_offset, unsigned long mask)
+int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
+				    struct kvm_memory_slot *slot,
+				    gfn_t gfn_offset, unsigned long mask)
 {
 	unsigned long *rmapp;
+	int nr_protected = 0;
 
 	while (mask) {
 		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
 		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
+		++nr_protected;
 
 		/* clear the first set bit */
 		mask &= mask - 1;
 	}
+
+	return nr_protected;
 }
 
 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4de705c..aaae01a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 	unsigned long n, i;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_buffer;
-	bool is_dirty = false;
+	int nr_protected = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -3121,15 +3121,27 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 		if (!dirty_bitmap[i])
 			continue;
 
-		is_dirty = true;
-
 		mask = xchg(&dirty_bitmap[i], 0);
 		dirty_bitmap_buffer[i] = mask;
 
 		offset = i * BITS_PER_LONG;
-		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
+		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
+								offset, mask);
+		if (nr_protected > 8192) {
+			/*
+			 * Write protecting many pages takes long time, so we
+			 * check mmu_lock contention for avoiding ms latency.
+			 */
+			if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
+				kvm_flush_remote_tlbs(kvm);
+				spin_unlock(&kvm->mmu_lock);
+				cond_resched();
+				spin_lock(&kvm->mmu_lock);
+			}
+			nr_protected = 0;
+		}
 	}
 
-	if (is_dirty)
+	if (nr_protected)
 		kvm_flush_remote_tlbs(kvm);
 
 	spin_unlock(&kvm->mmu_lock);
-- 
1.7.5.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
@ 2012-04-29 11:27   ` Avi Kivity
  2012-04-29 12:17     ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 11:27 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/28/2012 01:07 PM, Takuya Yoshikawa wrote:
> From: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
>
> get_dirty_log() needs to hold mmu_lock during write protecting dirty
> pages and this can be long when there are many dirty pages to protect.
>
> As the guest can get faulted during that time, and of course other mmu
> works can also happen, this may result in a severe latency problem which
> would prevent the system from scaling.
>
> This patch mitigates this by checking mmu_lock contention for every 8K
> dirty pages we protect: write protecting this number of pages took about
> one hundred microseconds on a usual Xeon server.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
> v1->v2:
>   Replaced spin_needbreak() with spin_is_contended() to make this work
>   even when CONFIG_PREEMPT=no.
>   Expanded the reasoning behind the check as a comment.
>   Raised the value from 2048 to 8192 based on a new test result.
>
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
>  3 files changed, 29 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 69e39bc..b249a8b 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -716,9 +716,9 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
>
>  int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask);
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask);
>  void kvm_mmu_zap_all(struct kvm *kvm);
>  unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>  void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 07424cf..c4abfc1 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1081,20 +1081,26 @@ static int __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp, int level
>   *
>   * Used when we do not need to care about huge page mappings: e.g. during dirty
>   * logging we do not have any such mappings.
> + *
> + * Returns the number of pages protected by this.
>   */
> -void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> -				     struct kvm_memory_slot *slot,
> -				     gfn_t gfn_offset, unsigned long mask)
> +int kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +				    struct kvm_memory_slot *slot,
> +				    gfn_t gfn_offset, unsigned long mask)
>  {
>  	unsigned long *rmapp;
> +	int nr_protected = 0;
>
>  	while (mask) {
>  		rmapp = &slot->rmap[gfn_offset + __ffs(mask)];
>  		__rmap_write_protect(kvm, rmapp, PT_PAGE_TABLE_LEVEL);
> +		++nr_protected;
>
>  		/* clear the first set bit */
>  		mask &= mask - 1;
>  	}
> +
> +	return nr_protected;
>  }
>
>  static int rmap_write_protect(struct kvm *kvm, u64 gfn)
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 4de705c..aaae01a 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3092,7 +3092,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  	unsigned long n, i;
>  	unsigned long *dirty_bitmap;
>  	unsigned long *dirty_bitmap_buffer;
> -	bool is_dirty = false;
> +	int nr_protected = 0;
>
>  	mutex_lock(&kvm->slots_lock);
>
> @@ -3121,15 +3121,27 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  		if (!dirty_bitmap[i])
>  			continue;
>
> -		is_dirty = true;
> -
>  		mask = xchg(&dirty_bitmap[i], 0);
>  		dirty_bitmap_buffer[i] = mask;
>
>  		offset = i * BITS_PER_LONG;
> -		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> +		nr_protected += kvm_mmu_write_protect_pt_masked(kvm, memslot,
> +								offset, mask);
> +		if (nr_protected > 8192) {
> +			/*
> +			 * Write protecting many pages takes long time, so we
> +			 * check mmu_lock contention for avoiding ms latency.
> +			 */
> +			if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
> +				kvm_flush_remote_tlbs(kvm);

Do we really need to flush the TLB here?

Suppose we don't.  So some pages could still be written to using old TLB
entries, but that's okay, since we're reporting those pages as dirty
anyway.  In fact we might be saving userspace another pass at the page,
if it won't be written afterwards.  A flush is only needed to prevent
unlogged writes after we return the dirty bitmap.

If this reasoning is correct, we can replace the whole thing with
cond_resched_lock().

Oh, but this might trick a later rmap_write_protect() into thinking that
no write protection and tlb flush is needed.  So we should touch
kvm->tlbs_dirty.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
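For illustration, the bookkeeping Avi describes in the message above, dropping the lock without flushing while marking the TLBs dirty so a later write-protect path still flushes, can be modeled in a few lines of user-space C. This is a toy model, not the KVM code: the function names are made up, and the real `kvm->tlbs_dirty` handling involves atomics and remote flush requests.

```c
/* Toy model of deferred-flush bookkeeping.  The dirty-log path may
 * drop the lock without flushing, provided it marks the TLBs dirty;
 * later paths must not conclude "already protected, no flush needed"
 * while a flush is pending.  All names are illustrative. */
#include <stdbool.h>

static bool tlbs_dirty;       /* stands in for kvm->tlbs_dirty */
static long remote_flushes;   /* counts simulated kvm_flush_remote_tlbs() */

static void flush_remote_tlbs(void)
{
    ++remote_flushes;
    tlbs_dirty = false;
}

/* get_dirty_log path: drop the lock without flushing, but record it */
static void drop_lock_without_flush(void)
{
    tlbs_dirty = true;
}

/* a later rmap_write_protect()-like path must honour the pending flush */
static void write_protect_path(bool need_flush)
{
    if (need_flush || tlbs_dirty)
        flush_remote_tlbs();
}
```

The invariant being illustrated: stale translations may survive a lock break, but never survive past a point where correctness depends on the pages being write protected.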
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 11:27   ` Avi Kivity
@ 2012-04-29 12:17     ` Takuya Yoshikawa
  2012-04-29 12:59       ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 12:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 14:27:30 +0300
Avi Kivity <avi@redhat.com> wrote:

> > +			if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
> > +				kvm_flush_remote_tlbs(kvm);
>
> Do we really need to flush the TLB here?
>
> Suppose we don't.  So some pages could still be written to using old TLB
> entries, but that's okay, since we're reporting those pages as dirty
> anyway.  In fact we might be saving userspace another pass at the page,
> if it won't be written afterwards.  A flush is only needed to prevent
> unlogged writes after we return the dirty bitmap.
>
> If this reasoning is correct, we can replace the whole thing with
> cond_resched_lock().

Correct for dirty logging.

Actually, that was the reason I once introduced a rmap-write-protection
race when I first did rmap-based write protection.  I did not think
about other paths which conditionally flush TLBs.

For this patch, TLB flush is needed.

> Oh, but this might trick a later rmap_write_protect() into thinking that
> no write protection and tlb flush is needed.  So we should touch
> kvm->tlbs_dirty.

The problem was making others think that
"already protected, no need to flush."

As we discussed before, we need to add some tricks to de-couple mmu_lock
and TLB flush.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 12:17     ` Takuya Yoshikawa
@ 2012-04-29 12:59       ` Avi Kivity
  2012-04-29 14:24         ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 12:59 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 03:17 PM, Takuya Yoshikawa wrote:
> > Oh, but this might trick a later rmap_write_protect() into thinking that
> > no write protection and tlb flush is needed.  So we should touch
> > kvm->tlbs_dirty.
>
> The problem was making others think that
> "already protected, no need to flush."
>
> As we discussed before, we need to add some tricks to de-couple mmu_lock
> and TLB flush.

Ok, let's discuss them (we can apply the patch independently).  Do you
have something in mind?

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 12:59       ` Avi Kivity
@ 2012-04-29 14:24         ` Takuya Yoshikawa
  2012-04-29 14:39           ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 14:24 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 15:59:18 +0300
Avi Kivity <avi@redhat.com> wrote:

> > As we discussed before, we need to add some tricks to de-couple mmu_lock
> > and TLB flush.
>
> Ok, let's discuss them (we can apply the patch independently).  Do you
> have something in mind?

How about your own idea?

  http://www.spinics.net/lists/kvm/msg68550.html

===
> > How about something like a sequence lock:
> >
> >
> >     spin_lock(mmu_lock)
> >     need_flush = write_protect_stuff();
> >     atomic_add(kvm->want_flush_counter, need_flush);
> >     spin_unlock(mmu_lock);
> >
> >     while ((done = atomic_read(kvm->done_flush_counter)) < (want =
> >             atomic_read(kvm->want_flush_counter)) {
> >          kvm_make_request(flush)
> >          atomic_cmpxchg(kvm->done_flush_counter, done, want)
> >     }
> >
> > This (or maybe a corrected and optimized version) ensures that any
> > need_flush cannot pass the while () barrier, no matter which thread
> > encounters it first.  However it violates the "do not invent new locking
> > techniques" commandment.  Can we map it to some existing method?
>
> There is no need to advance 'want' in the loop.  So we could do
>
>   /* must call with mmu_lock held */
>   void kvm_mmu_defer_remote_flush(kvm, need_flush)
>   {
>       if (need_flush)
>           ++kvm->flush_counter.want;
>   }
>
>   /* may call without mmu_lock */
>   void kvm_mmu_commit_remote_flush(kvm)
>   {
>       want = ACCESS_ONCE(kvm->flush_counter.want)
>       while ((done = atomic_read(kvm->flush_counter.done) < want) {
>            kvm_make_request(flush)
>            atomic_cmpxchg(kvm->flush_counter.done, done, want)
>       }
>   }
===

	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
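The want/done counter protocol quoted in the message above can be written out as a compilable user-space sketch with C11 atomics. This is an illustration of the protocol only, with made-up names; in the real proposal the "flush" would be kvm_make_request() plus an IPI to the vcpus, and multiple committers would race on the compare-and-exchange.

```c
/* Sketch of the deferred/committed remote-flush counter protocol.
 * Illustrative names; not kernel code. */
#include <stdatomic.h>
#include <stdbool.h>

struct flush_counter {
    atomic_long want;   /* flushes requested (under the lock) */
    atomic_long done;   /* flushes known to have completed */
};

static long flushes;    /* counts simulated remote TLB flushes */

/* must be called with the lock held in the real protocol */
void defer_remote_flush(struct flush_counter *fc, bool need_flush)
{
    if (need_flush)
        atomic_fetch_add(&fc->want, 1);
}

/* may be called without the lock */
void commit_remote_flush(struct flush_counter *fc)
{
    long want = atomic_load(&fc->want);
    long done;

    while ((done = atomic_load(&fc->done)) < want) {
        ++flushes;  /* stands in for the actual remote flush */
        /* only one committer advances done past this value */
        atomic_compare_exchange_strong(&fc->done, &done, want);
    }
}
```

The key property: any deferred flush observed in `want` cannot escape past a `commit_remote_flush()` barrier, whichever thread reaches it first, yet repeated commits with no new work are free.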
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:24         ` Takuya Yoshikawa
@ 2012-04-29 14:39           ` Avi Kivity
  2012-04-29 14:55             ` Takuya Yoshikawa
  2012-05-01  3:07             ` Marcelo Tosatti
  0 siblings, 2 replies; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 14:39 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 05:24 PM, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 15:59:18 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > As we discussed before, we need to add some tricks to de-couple mmu_lock
> > > and TLB flush.
> >
> > Ok, let's discuss them (we can apply the patch independently).  Do you
> > have something in mind?
>
> How about your own idea?

Looks pretty good, if I do say so myself.  I'll take a shot at
implementing it.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:39           ` Avi Kivity
@ 2012-04-29 14:55             ` Takuya Yoshikawa
  2012-04-29 15:00               ` Avi Kivity
  2012-05-01  3:04               ` Marcelo Tosatti
  0 siblings, 2 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 14:55 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 17:39:35 +0300
Avi Kivity <avi@redhat.com> wrote:

> > How about your own idea?
>
> Looks pretty good, if I do say so myself.  I'll take a shot at
> implementing it.

Looking forward to it!

After your work, 8192 in my patch may better be lowered a bit.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:55             ` Takuya Yoshikawa
@ 2012-04-29 15:00               ` Avi Kivity
  2012-04-29 15:13                 ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 15:00 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 05:55 PM, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 17:39:35 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > How about your own idea?
> >
> > Looks pretty good, if I do say so myself.  I'll take a shot at
> > implementing it.
>
> Looking forward to it!
> After your work, 8192 in my patch may better be lowered a bit.

Why not remove it altogether?  Just change it to cond_resched_lock().

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 15:00               ` Avi Kivity
@ 2012-04-29 15:13                 ` Takuya Yoshikawa
  2012-04-29 15:20                   ` Avi Kivity
  0 siblings, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-29 15:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 18:00:03 +0300
Avi Kivity <avi@redhat.com> wrote:

> > After your work, 8192 in my patch may better be lowered a bit.
>
> Why not remove it altogether?  Just change it to cond_resched_lock().

Two concerns:

- too many checks may slow down GET_DIRTY_LOG.
- cond_resched_lock() uses spin_needbreak(), and I am not sure if
  this still does rescheduling for alleviating mmu_lock contention
  without CONFIG_PREEMPT.

I need to read the code a bit more.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 15:13                 ` Takuya Yoshikawa
@ 2012-04-29 15:20                   ` Avi Kivity
  2012-04-30 14:06                     ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-04-29 15:20 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya

On 04/29/2012 06:13 PM, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 18:00:03 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > After your work, 8192 in my patch may better be lowered a bit.
> >
> > Why not remove it altogether?  Just change it to cond_resched_lock().
>
> Two concerns:
>
> - too many checks may slow down GET_DIRTY_LOG.

Yes.  My expectation is that it's okay.  If there is no contention, then
the lock cache line will be local, and the check is cheap.  If there is
contention, then we need to drop it.

> - cond_resched_lock() uses spin_needbreak(), and I am not sure if
>   this still does rescheduling for alleviating mmu_lock contention
>   without CONFIG_PREEMPT.

IMO that's a feature.  If kernel policy is not to lockbreak when
!CONFIG_PREEMPT, then we should follow it, whether it's the right thing
or not.  If it's the wrong thing, change it in the kernel proper.

> I need to read the code a bit more.

while (1)
	need_to_read_the_code_a_bit_more();

unfortunately.  I keep finding new things in there, including stuff I
wrote myself.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 15:20                   ` Avi Kivity
@ 2012-04-30 14:06                     ` Takuya Yoshikawa
  0 siblings, 0 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-04-30 14:06 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya

On Sun, 29 Apr 2012 18:20:37 +0300
Avi Kivity <avi@redhat.com> wrote:

> > - cond_resched_lock() uses spin_needbreak(), and I am not sure if
> >   this still does rescheduling for alleviating mmu_lock contention
> >   without CONFIG_PREEMPT.
>
> IMO that's a feature.  If kernel policy is not to lockbreak when
> !CONFIG_PREEMPT, then we should follow it, whether it's the right thing
> or not.  If it's the wrong thing, change it in the kernel proper.

Agreed.

Anyway, we need to alleviate mmu_lock contention on server machines,
where we usually do not select CONFIG_PREEMPT.  So I need to check the
purpose of this API.

> while (1)
> 	need_to_read_the_code_a_bit_more();

It's a pleasure!

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:55             ` Takuya Yoshikawa
  2012-04-29 15:00               ` Avi Kivity
@ 2012-05-01  3:04               ` Marcelo Tosatti
  2012-05-01 13:14                 ` Takuya Yoshikawa
  1 sibling, 1 reply; 18+ messages in thread
From: Marcelo Tosatti @ 2012-05-01 3:04 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, kvm, yoshikawa.takuya

On Sun, Apr 29, 2012 at 11:55:50PM +0900, Takuya Yoshikawa wrote:
> On Sun, 29 Apr 2012 17:39:35 +0300
> Avi Kivity <avi@redhat.com> wrote:
>
> > > How about your own idea?
> >
> > Looks pretty good, if I do say so myself.  I'll take a shot at
> > implementing it.
>
> Looking forward to it!
> After your work, 8192 in my patch may better be lowered a bit.

Why not simply use spin_is_contended() again?  Are you afraid of
GET_DIRTY_LOG being starved by page faults?

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-05-01  3:04               ` Marcelo Tosatti
@ 2012-05-01 13:14                 ` Takuya Yoshikawa
  0 siblings, 0 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-05-01 13:14 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Avi Kivity, kvm, yoshikawa.takuya

On Tue, 1 May 2012 00:04:47 -0300
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> > Looking forward to it!
> > After your work, 8192 in my patch may better be lowered a bit.
>
> Why not simply use spin_is_contended again?  Are you afraid of
> GET_DIRTY_LOG starved by pagefaults?

No, but I am not so confident.

I personally tested some extreme cases like "cond_resched for every
iteration" and did not see any significant slowdown.  That's all I
know now.

I also think we should use spin_is_contended() again.

What I am thinking now is whether it is possible to change
cond_resched_lock() to satisfy our need like:

  cond_resched_lock(lock, spin_is_contended(lock)); // we want this
  cond_resched_lock(lock, spin_needbreak(lock));    // same as current
  cond_resched_lock(lock, false);                   // never check contention

Although I have checked all callers, it is not certain whether they do
not want to check lock contention when CONFIG_PREEMPT=no.

I will send an RFC patch to get comments, if possible.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
  2012-04-29 14:39           ` Avi Kivity
  2012-04-29 14:55             ` Takuya Yoshikawa
@ 2012-05-01  3:07             ` Marcelo Tosatti
  1 sibling, 0 replies; 18+ messages in thread
From: Marcelo Tosatti @ 2012-05-01 3:07 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Takuya Yoshikawa, kvm, yoshikawa.takuya

On Sun, Apr 29, 2012 at 05:39:35PM +0300, Avi Kivity wrote:
> On 04/29/2012 05:24 PM, Takuya Yoshikawa wrote:
> > On Sun, 29 Apr 2012 15:59:18 +0300
> > Avi Kivity <avi@redhat.com> wrote:
> >
> > > > As we discussed before, we need to add some tricks to de-couple mmu_lock
> > > > and TLB flush.
> > >
> > > Ok, let's discuss them (we can apply the patch independently).  Do you
> > > have something in mind?
> >
> > How about your own idea?
>
> Looks pretty good, if I do say so myself.  I'll take a shot at
> implementing it.

Decoupling would also simplify Xiao's lockless faults.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
  2012-04-28 10:05 [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
  2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
@ 2012-05-02 11:24 ` Takuya Yoshikawa
  2012-05-02 11:33   ` Avi Kivity
  1 sibling, 1 reply; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-05-02 11:24 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: avi, mtosatti, kvm, yoshikawa.takuya, qemu-devel

While checking mmu_lock contention, I noticed that QEMU's
memory_region_get_dirty() was using an unexpectedly large amount of
CPU time.

Thanks,
	Takuya

=============================
 perf top -t ${QEMU_TID}
=============================
 51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
 16.73%  qemu-system-x86_64  [.] ram_save_remaining
  7.25%  qemu-system-x86_64  [.] cpu_physical_memory_reset_dirty
  3.49%  [kvm]               [k] __rmap_write_protect
  2.85%  [kvm]               [k] mmu_spte_update
  2.20%  [kernel]            [k] copy_user_generic_string
  2.16%  libc-2.13.so        [.] 0x874e9
  1.71%  qemu-system-x86_64  [.] memory_region_set_dirty
  1.20%  qemu-system-x86_64  [.] kvm_physical_sync_dirty_bitmap
  1.00%  [kernel]            [k] __lock_acquire.isra.31
  0.66%  [kvm]               [k] rmap_get_next
  0.58%  [kvm]               [k] rmap_get_first
  0.54%  [kvm]               [k] kvm_mmu_write_protect_pt_masked
  0.54%  [kvm]               [k] spte_has_volatile_bits
  0.42%  [kernel]            [k] lock_release
  0.37%  [kernel]            [k] tcp_sendmsg
  0.33%  [kernel]            [k] alloc_pages_current
  0.29%  [kernel]            [k] native_read_tsc
  0.29%  qemu-system-x86_64  [.] ram_save_block
  0.25%  [kernel]            [k] lock_is_held
  0.25%  [kernel]            [k] __ticket_spin_trylock
  0.21%  [kernel]            [k] lock_acquire

On Sat, 28 Apr 2012 19:05:44 +0900
Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:

> 1. Problem
> During live migration, if the guest tries to take mmu_lock at the same
> time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be
> forced to wait long time; this is not restricted to page faults caused
> by GET_DIRTY_LOG's write protection.
>
> 2. Measurement
> - Server:
>     Xeon: 8 cores(2 CPUs), 24GB memory
>
> - One VM was being migrated locally to the opposite numa node:
>     Source(active) VM: binded to node 0
>     Target(incoming) VM: binded to node 1
>
>   This binding was for reducing extra noise.
>
> - The guest inside it:
>     3 VCPUs, 11GB memory
>
> - Workload:
>     On VCPU 2 and 3, there were 3 threads and each of them was endlessly
>     writing to 3GB, in total 9GB, anonymous memory at its maximum speed.
>
>     I had checked that GET_DIRTY_LOG was forced to write protect more than
>     2 million pages.  So the 9GB memory was almost always kept dirty to be
>     sent.
>
>     In parallel, on VCPU 1, I checked memory write latency: how long it
>     takes to write to one byte of each page in 1GB anonymous memory.
>
> - Result:
>     With the current KVM, I could see 1.5ms worst case latency: this
>     corresponds well with the expected mmu_lock hold time.
>
>     Here, you may think that this is too small compared to the numbers I
>     reported before, using dirty-log-perf, but that was done on 32-bit
>     host on a core-i3 box which was much slower than server machines.
>
>     Although having 10GB dirty memory pages is a bit extreme for guests
>     with less than 16GB memory, much larger guests, e.g. 128GB guests, may
>     see latency longer than 1.5ms.
>
> 3. Solution
> GET_DIRTY_LOG time is very limited compared to other works in QEMU,
> so we should focus on alleviating the worst case latency first.
>
> The solution is very simple and originally suggested by Marcelo:
>   "Conditionally reschedule when there is a contention."
>
> By this rescheduling, see the following patch, the worst case latency
> changed from 1.5ms to 800us for the same test.
>
> 4. TODO
> The patch treats kvm_vm_ioctl_get_dirty_log() only, so the write
> protection by kvm_mmu_slot_remove_write_access(), which is called when
> we enable dirty page logging, can cause the same problem.
>
> My plan is to replace it with rmap-based protection after this.
>
> Thanks,
> 	Takuya
>
> ---
> Takuya Yoshikawa (1):
>   KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
>
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
>  3 files changed, 29 insertions(+), 11 deletions(-)
>
> -- 
> 1.7.5.4

-- 
Takuya Yoshikawa <takuya.yoshikawa@gmail.com>

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
  2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
@ 2012-05-02 11:33   ` Avi Kivity
  2012-05-02 14:20     ` Takuya Yoshikawa
  0 siblings, 1 reply; 18+ messages in thread
From: Avi Kivity @ 2012-05-02 11:33 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, yoshikawa.takuya, qemu-devel

On 05/02/2012 02:24 PM, Takuya Yoshikawa wrote:
> During checking mmu_lock contention, I noticed that QEMU's
> memory_region_get_dirty() was using unexpectedly much CPU time.
>
> Thanks,
> 	Takuya
>
> =============================
>  perf top -t ${QEMU_TID}
> =============================
>  51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
>  16.73%  qemu-system-x86_64  [.] ram_save_remaining

memory_region_get_dirty() is called from ram_save_remaining().  Looks
like quadratic behaviour here: we send a few pages in
ram_save_remaining(), then walk the entire dirty bitmap to calculate
expected_time().

We should probably calculate expected_time once per iteration.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
  2012-05-02 11:33   ` Avi Kivity
@ 2012-05-02 14:20     ` Takuya Yoshikawa
  0 siblings, 0 replies; 18+ messages in thread
From: Takuya Yoshikawa @ 2012-05-02 14:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, yoshikawa.takuya, qemu-devel

On Wed, 02 May 2012 14:33:55 +0300
Avi Kivity <avi@redhat.com> wrote:

> > =============================
> >  perf top -t ${QEMU_TID}
> > =============================
> >  51.52%  qemu-system-x86_64  [.] memory_region_get_dirty
> >  16.73%  qemu-system-x86_64  [.] ram_save_remaining
>
> memory_region_get_dirty() is called from ram_save_remaining().  Looks
> like quadratic behaviour here: we send a few pages in
> ram_save_remaining(), then walk the entire dirty bitmap to calculate
> expected_time().
>
> We should probably calculate expected_time once per iteration.

There seem to be many unnecessary/repeated calculations, not restricted
to this expected_time one.

	Takuya

^ permalink raw reply	[flat|nested] 18+ messages in thread
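The kind of fix being suggested above, computing the remaining-dirty count incrementally instead of rescanning the whole bitmap every time, can be sketched as follows. This is illustrative only (not the actual QEMU code), and all names are made up:

```c
/* Illustrative sketch: keep a running count of set dirty bits so that
 * "how many pages remain" is O(1) instead of an O(n) bitmap walk on
 * every call.  Names are hypothetical, not QEMU's. */
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
enum { NPAGES = 256 };

static unsigned long dirty_bitmap[NPAGES / BITS_PER_WORD];
static long nr_dirty;   /* cached count, replaces the per-call rescan */

static void set_dirty(unsigned long page)
{
    unsigned long *w = &dirty_bitmap[page / BITS_PER_WORD];
    unsigned long bit = 1UL << (page % BITS_PER_WORD);

    if (!(*w & bit)) {  /* only count 0 -> 1 transitions */
        *w |= bit;
        ++nr_dirty;
    }
}

static void clear_dirty(unsigned long page)
{
    unsigned long *w = &dirty_bitmap[page / BITS_PER_WORD];
    unsigned long bit = 1UL << (page % BITS_PER_WORD);

    if (*w & bit) {     /* only count 1 -> 0 transitions */
        *w &= ~bit;
        --nr_dirty;
    }
}

/* O(1) replacement for a bitmap-walking ram_save_remaining() */
static long ram_save_remaining_cached(void)
{
    return nr_dirty;
}
```

With the count maintained at set/clear time, the expected_time calculation in the migration loop no longer forces a full bitmap scan per page sent, removing the quadratic behaviour Avi points out.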
end of thread, other threads: [~2012-05-02 14:20 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-28 10:05 [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
2012-04-28 10:07 ` [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched() Takuya Yoshikawa
2012-04-29 11:27   ` Avi Kivity
2012-04-29 12:17     ` Takuya Yoshikawa
2012-04-29 12:59       ` Avi Kivity
2012-04-29 14:24         ` Takuya Yoshikawa
2012-04-29 14:39           ` Avi Kivity
2012-04-29 14:55             ` Takuya Yoshikawa
2012-04-29 15:00               ` Avi Kivity
2012-04-29 15:13                 ` Takuya Yoshikawa
2012-04-29 15:20                   ` Avi Kivity
2012-04-30 14:06                     ` Takuya Yoshikawa
2012-05-01  3:04               ` Marcelo Tosatti
2012-05-01 13:14                 ` Takuya Yoshikawa
2012-05-01  3:07             ` Marcelo Tosatti
2012-05-02 11:24 ` Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging Takuya Yoshikawa
2012-05-02 11:33   ` Avi Kivity
2012-05-02 14:20     ` Takuya Yoshikawa
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox