Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging

From: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
To: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Cc: avi@redhat.com, mtosatti@redhat.com, kvm@vger.kernel.org,
	yoshikawa.takuya@oss.ntt.co.jp, qemu-devel@nongnu.org
Subject: Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging
Date: Wed, 2 May 2012 20:24:14 +0900
Message-ID: <20120502202414.37d760fbb1135bb8acb0f0db@gmail.com>
In-Reply-To: <20120428190544.7dc2bfd281054c1fcac5a14e@gmail.com>

While checking mmu_lock contention, I noticed that QEMU's
memory_region_get_dirty() was using an unexpectedly large share of CPU
time.

Thanks,
	Takuya

=============================
perf top -t ${QEMU_TID}
=============================
 51.52%  qemu-system-x86_64       [.] memory_region_get_dirty
 16.73%  qemu-system-x86_64       [.] ram_save_remaining
  7.25%  qemu-system-x86_64       [.] cpu_physical_memory_reset_dirty
  3.49%  [kvm]                    [k] __rmap_write_protect
  2.85%  [kvm]                    [k] mmu_spte_update
  2.20%  [kernel]                 [k] copy_user_generic_string
  2.16%  libc-2.13.so             [.] 0x874e9
  1.71%  qemu-system-x86_64       [.] memory_region_set_dirty
  1.20%  qemu-system-x86_64       [.] kvm_physical_sync_dirty_bitmap
  1.00%  [kernel]                 [k] __lock_acquire.isra.31
  0.66%  [kvm]                    [k] rmap_get_next
  0.58%  [kvm]                    [k] rmap_get_first
  0.54%  [kvm]                    [k] kvm_mmu_write_protect_pt_masked
  0.54%  [kvm]                    [k] spte_has_volatile_bits
  0.42%  [kernel]                 [k] lock_release
  0.37%  [kernel]                 [k] tcp_sendmsg
  0.33%  [kernel]                 [k] alloc_pages_current
  0.29%  [kernel]                 [k] native_read_tsc
  0.29%  qemu-system-x86_64       [.] ram_save_block
  0.25%  [kernel]                 [k] lock_is_held
  0.25%  [kernel]                 [k] __ticket_spin_trylock
  0.21%  [kernel]                 [k] lock_acquire
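
As a rough cross-check of the profile above, the samples can be summed per
object. This is only a sketch: the regular expression assumes the four-column
layout shown here (percentage, object, [.] or [k], symbol), and the helper
name share_by_object is made up for this example.

```python
import re
from collections import defaultdict

# The perf top output from this message, copied verbatim.
PERF_TOP = """\
 51.52%  qemu-system-x86_64       [.] memory_region_get_dirty
 16.73%  qemu-system-x86_64       [.] ram_save_remaining
  7.25%  qemu-system-x86_64       [.] cpu_physical_memory_reset_dirty
  3.49%  [kvm]                    [k] __rmap_write_protect
  2.85%  [kvm]                    [k] mmu_spte_update
  2.20%  [kernel]                 [k] copy_user_generic_string
  2.16%  libc-2.13.so             [.] 0x874e9
  1.71%  qemu-system-x86_64       [.] memory_region_set_dirty
  1.20%  qemu-system-x86_64       [.] kvm_physical_sync_dirty_bitmap
  1.00%  [kernel]                 [k] __lock_acquire.isra.31
  0.66%  [kvm]                    [k] rmap_get_next
  0.58%  [kvm]                    [k] rmap_get_first
  0.54%  [kvm]                    [k] kvm_mmu_write_protect_pt_masked
  0.54%  [kvm]                    [k] spte_has_volatile_bits
  0.42%  [kernel]                 [k] lock_release
  0.37%  [kernel]                 [k] tcp_sendmsg
  0.33%  [kernel]                 [k] alloc_pages_current
  0.29%  [kernel]                 [k] native_read_tsc
  0.29%  qemu-system-x86_64       [.] ram_save_block
  0.25%  [kernel]                 [k] lock_is_held
  0.25%  [kernel]                 [k] __ticket_spin_trylock
  0.21%  [kernel]                 [k] lock_acquire
"""

# percentage, object name, user/kernel marker, symbol
LINE = re.compile(r"^\s*([\d.]+)%\s+(\S+)\s+\[([.k])\]\s+(\S+)$")


def share_by_object(text):
    """Sum the sample percentages per object (binary, module or kernel)."""
    totals = defaultdict(float)
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            totals[m.group(2)] += float(m.group(1))
    return {obj: round(pct, 2) for obj, pct in totals.items()}


shares = share_by_object(PERF_TOP)
for obj, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{pct:6.2f}%  {obj}")
```

Summed this way, the QEMU user-space symbols account for about 78.7% of the
listed samples and the [kvm] symbols for about 8.7%.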


On Sat, 28 Apr 2012 19:05:44 +0900
Takuya Yoshikawa <takuya.yoshikawa@gmail.com> wrote:

> 1. Problem
>   During live migration, if the guest tries to take mmu_lock at the same
>   time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be
>   forced to wait a long time; this is not restricted to page faults caused
>   by GET_DIRTY_LOG's write protection.
> 
> 2. Measurement
> - Server:
>   Xeon: 8 cores(2 CPUs), 24GB memory
> 
> - One VM was being migrated locally to the opposite NUMA node:
>   Source(active)   VM: bound to node 0
>   Target(incoming) VM: bound to node 1
> 
>   This binding was to reduce extra noise.
> 
> - The guest inside it:
>   3 VCPUs, 11GB memory
> 
> - Workload:
>   On VCPUs 2 and 3, there were 3 threads, each endlessly writing to 3GB
>   of anonymous memory (9GB in total) at its maximum speed.
> 
>   I had checked that GET_DIRTY_LOG was forced to write protect more than
>   2 million pages, so the 9GB of memory was almost always dirty and
>   waiting to be sent.
> 
>   In parallel, on VCPU 1, I checked memory write latency: how long it
>   takes to write to one byte of each page in 1GB anonymous memory.
> 
> - Result:
>   With the current KVM, I could see 1.5ms worst case latency: this
>   corresponds well with the expected mmu_lock hold time.
> 
>   You may think this is too small compared to the numbers I reported
>   before using dirty-log-perf, but that was done on a 32-bit host on a
>   Core i3 box, which was much slower than server machines.
> 
> 
>   Although having 10GB of dirty memory is a bit extreme for guests
>   with less than 16GB memory, much larger guests, e.g. 128GB guests, may
>   see latency longer than 1.5ms.
> 
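
To sanity-check the "more than 2 million pages" figure, here is a small
sketch. It assumes 4 KiB pages and that "GB" means GiB; neither is stated in
the report, and the helper name pages_for is made up for this example.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB base pages on the x86 host
GIB = 1 << 30      # assumption: "GB" in the report means GiB


def pages_for(gib, page_size=PAGE_SIZE):
    """Number of pages needed to back `gib` GiB of memory."""
    return gib * GIB // page_size


writer_pages = pages_for(9)    # the three 3GB writers
probe_pages = pages_for(1)     # the 1GB latency probe on VCPU 1
large_guest = pages_for(128)   # the 128GB guest mentioned as an example

print(writer_pages, probe_pages, large_guest)
```

Under those assumptions the 9GB written by the workload corresponds to
2,359,296 pages, which agrees with the figure quoted above.
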
> 3. Solution
>   GET_DIRTY_LOG time is very limited compared to other work in QEMU,
>   so we should focus on alleviating the worst case latency first.
> 
>   The solution is very simple and originally suggested by Marcelo:
>     "Conditionally reschedule when there is a contention."
> 
>   With this rescheduling (see the following patch), the worst case
>   latency dropped from 1.5ms to 800us for the same test.
> 
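
The idea quoted above can be illustrated with a user-space analogy. This is
not the kernel patch: protect_pages, is_contended and protect_one are
hypothetical names, a plain threading.Lock stands in for mmu_lock, and
contention is faked by a counter so that the run is deterministic.

```python
import threading


def protect_pages(pages, lock, is_contended, protect_one):
    """Call protect_one() on every page while holding `lock`.

    After each page, ask is_contended(); if someone is waiting, drop the
    lock briefly so the waiter can run, then take it again and continue.
    Returns how many times the lock was dropped.
    """
    yields = 0
    lock.acquire()
    try:
        for page in pages:
            protect_one(page)
            if is_contended():
                lock.release()   # give a waiter the chance to get in
                yields += 1
                lock.acquire()   # resume the long-running loop
    finally:
        lock.release()
    return yields


# Fake contention: pretend a waiter shows up on every third check.
calls = {"n": 0}


def every_third():
    calls["n"] += 1
    return calls["n"] % 3 == 0


done = []
mmu_lock = threading.Lock()   # stand-in only, not the real mmu_lock
yields = protect_pages(range(9), mmu_lock, every_third, done.append)
print(yields, len(done))
```

The point of the pattern is that the longest uninterrupted hold is bounded by
the work done between two contention checks instead of by the whole loop.
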
> 4. TODO
>   The patch handles only kvm_vm_ioctl_get_dirty_log(), so the write
>   protection by kvm_mmu_slot_remove_write_access(), which is called when
>   we enable dirty page logging, can cause the same problem.
> 
>   My plan is to replace it with rmap-based protection after this.
> 
> 
> Thanks,
> 	Takuya
> 
> ---
> Takuya Yoshikawa (1):
>   KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
> 
>  arch/x86/include/asm/kvm_host.h |    6 +++---
>  arch/x86/kvm/mmu.c              |   12 +++++++++---
>  arch/x86/kvm/x86.c              |   22 +++++++++++++++++-----
>  3 files changed, 29 insertions(+), 11 deletions(-)
> 
> -- 
> 1.7.5.4
> 


-- 
Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
