kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
	avi@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4 changelog-v2] KVM: Switch to srcu-less get_dirty_log()
Date: Tue, 6 Mar 2012 08:15:40 -0300	[thread overview]
Message-ID: <20120306111540.GA29914@amt.cnet> (raw)
In-Reply-To: <20120303142148.2689454b30dc86d84c4a19f5@gmail.com>

On Sat, Mar 03, 2012 at 02:21:48PM +0900, Takuya Yoshikawa wrote:
> We have seen some problems of the current implementation of
> get_dirty_log() which uses synchronize_srcu_expedited() for updating
> dirty bitmaps; e.g. it is noticeable that this sometimes gives us ms
> order of latency when we use VGA displays.
> 
> Furthermore the recent discussion on the following thread
>     "srcu: Implement call_srcu()"
>     http://lkml.org/lkml/2012/1/31/211
> also motivated us to implement get_dirty_log() without SRCU.
> 
> This patch achieves this goal without sacrificing the performance of
> both VGA and live migration: in practice the new code is much faster
> than the old one unless we have too many dirty pages.
> 
> Implementation:
> 
> The key part of the implementation is the use of xchg() operation for
> clearing dirty bits atomically.  Since this allows us to update only
> BITS_PER_LONG pages at once, we need to iterate over the dirty bitmap
> until every dirty bit is cleared again for the next call.
> 
> Although some people may worry about the problem of using the atomic
> memory instruction many times to the concurrently accessible bitmap,
> it is usually accessed with mmu_lock held and we rarely see concurrent
> accesses: so what we need to care about is the pure xchg() overheads.
> 
> Another point to note is that we do not use for_each_set_bit() to check
> which ones in each BITS_PER_LONG pages are actually dirty.  Instead we
> simply use __ffs() in a loop.  This is much faster than repeatedly call
> find_next_bit().
> 
> Performance:
> 
> The dirty-log-perf unit test showed nice improvements, some times faster
> than before, except for some extreme cases; for such cases the speed of
> getting dirty page information is much faster than we process it in the
> userspace.
> 
> For real workloads, both VGA and live migration, we have observed pure
> improvements: when the guest was reading a file during live migration,
> we originally saw a few ms of latency, but with the new method the
> latency was less than 200us.
> 
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
>  arch/x86/kvm/x86.c |  116 +++++++++++++++++++--------------------------------
>  1 files changed, 43 insertions(+), 73 deletions(-)
> 
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3bc1922..0748bab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3041,55 +3041,32 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
>  }
>  
>  /**
> - * write_protect_slot - write protect a slot for dirty logging
> - * @kvm: the kvm instance
> - * @memslot: the slot we protect
> - * @dirty_bitmap: the bitmap indicating which pages are dirty
> - * @nr_dirty_pages: the number of dirty pages
> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> + * @kvm: kvm instance
> + * @log: slot id and address to which we copy the log
>   *
> - * We have two ways to find all sptes to protect:
> - * 1. Use kvm_mmu_slot_remove_write_access() which walks all shadow pages and
> - *    checks ones that have a spte mapping a page in the slot.
> - * 2. Use kvm_mmu_rmap_write_protect() for each gfn found in the bitmap.
> + * We need to keep it in mind that VCPU threads can write to the bitmap
> + * concurrently.  So, to avoid losing data, we keep the following order for
> + * each bit:
>   *
> - * Generally speaking, if there are not so many dirty pages compared to the
> - * number of shadow pages, we should use the latter.
> + *   1. Take a snapshot of the bit and clear it if needed.
> + *   2. Write protect the corresponding page.
> + *   3. Flush TLB's if needed.
> + *   4. Copy the snapshot to the userspace.
>   *
> - * Note that letting others write into a page marked dirty in the old bitmap
> - * by using the remaining tlb entry is not a problem.  That page will become
> - * write protected again when we flush the tlb and then be reported dirty to
> - * the user space by copying the old bitmap.
> + * Between 2 and 3, the guest may write to the page using the remaining TLB
> + * entry.  This is not a problem because the page will be reported dirty at
> + * step 4 using the snapshot taken before and step 3 ensures that successive
> + * writes will be logged for the next call.
>   */
> -static void write_protect_slot(struct kvm *kvm,
> -			       struct kvm_memory_slot *memslot,
> -			       unsigned long *dirty_bitmap,
> -			       unsigned long nr_dirty_pages)
> -{
> -	spin_lock(&kvm->mmu_lock);
> -
> -	/* Not many dirty pages compared to # of shadow pages. */
> -	if (nr_dirty_pages < kvm->arch.n_used_mmu_pages) {
> -		gfn_t offset;
> -
> -		for_each_set_bit(offset, dirty_bitmap, memslot->npages)
> -			kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, 1);
> -
> -		kvm_flush_remote_tlbs(kvm);
> -	} else
> -		kvm_mmu_slot_remove_write_access(kvm, memslot->id);
> -
> -	spin_unlock(&kvm->mmu_lock);
> -}
> -
> -/*
> - * Get (and clear) the dirty memory log for a memory slot.
> - */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> -				      struct kvm_dirty_log *log)
> +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
>  	int r;
>  	struct kvm_memory_slot *memslot;
> -	unsigned long n, nr_dirty_pages;
> +	unsigned long n, i;
> +	unsigned long *dirty_bitmap;
> +	unsigned long *dirty_bitmap_buffer;
> +	bool is_dirty = false;
>  
>  	mutex_lock(&kvm->slots_lock);
>  
> @@ -3098,49 +3075,42 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  		goto out;
>  
>  	memslot = id_to_memslot(kvm->memslots, log->slot);
> +
> +	dirty_bitmap = memslot->dirty_bitmap;
>  	r = -ENOENT;
> -	if (!memslot->dirty_bitmap)
> +	if (!dirty_bitmap)
>  		goto out;
>  
>  	n = kvm_dirty_bitmap_bytes(memslot);
> -	nr_dirty_pages = memslot->nr_dirty_pages;
>  
> -	/* If nothing is dirty, don't bother messing with page tables. */
> -	if (nr_dirty_pages) {
> -		struct kvm_memslots *slots, *old_slots;
> -		unsigned long *dirty_bitmap, *dirty_bitmap_head;
> +	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> +	memset(dirty_bitmap_buffer, 0, n);
>  
> -		dirty_bitmap = memslot->dirty_bitmap;
> -		dirty_bitmap_head = memslot->dirty_bitmap_head;
> -		if (dirty_bitmap == dirty_bitmap_head)
> -			dirty_bitmap_head += n / sizeof(long);
> -		memset(dirty_bitmap_head, 0, n);
> +	spin_lock(&kvm->mmu_lock);

It is not clear why mmu_lock is needed. Dropping it across the xchg loop
should be similar to srcu implementation, in that concurrent updates
will be visible only on the next get_dirty call? Well, it is necessary
anyway for write protecting the sptes.

A cond_resched_lock() would alleviate the potentially long held 
times for mmu_lock (can you measure it with large memslots?)

Otherwise looks nice.

>  
> -		r = -ENOMEM;
> -		slots = kmemdup(kvm->memslots, sizeof(*kvm->memslots), GFP_KERNEL);
> -		if (!slots)
> -			goto out;
> +	for (i = 0; i < n / sizeof(long); i++) {
> +		unsigned long mask;
> +		gfn_t offset;
>  
> -		memslot = id_to_memslot(slots, log->slot);
> -		memslot->nr_dirty_pages = 0;
> -		memslot->dirty_bitmap = dirty_bitmap_head;
> -		update_memslots(slots, NULL);
> +		if (!dirty_bitmap[i])
> +			continue;
>  
> -		old_slots = kvm->memslots;
> -		rcu_assign_pointer(kvm->memslots, slots);
> -		synchronize_srcu_expedited(&kvm->srcu);
> -		kfree(old_slots);
> +		is_dirty = true;
>  
> -		write_protect_slot(kvm, memslot, dirty_bitmap, nr_dirty_pages);
> +		mask = xchg(&dirty_bitmap[i], 0);
> +		dirty_bitmap_buffer[i] = mask;
>  
> -		r = -EFAULT;
> -		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
> -			goto out;
> -	} else {
> -		r = -EFAULT;
> -		if (clear_user(log->dirty_bitmap, n))
> -			goto out;
> +		offset = i * BITS_PER_LONG;
> +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
>  	}
> +	if (is_dirty)
> +		kvm_flush_remote_tlbs(kvm);
> +
> +	spin_unlock(&kvm->mmu_lock);
> +
> +	r = -EFAULT;
> +	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> +		goto out;
>  
>  	r = 0;
>  out:
> -- 
> 1.7.5.4
> 
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  reply	other threads:[~2012-03-06 11:15 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-03-01 10:30 [PATCH 0/4] KVM: srcu-less dirty logging -v2 Takuya Yoshikawa
2012-03-01 10:31 ` [PATCH 1/4] KVM: MMU: Split the main body of rmap_write_protect() off from others Takuya Yoshikawa
2012-03-12  7:39   ` Takuya Yoshikawa
2012-03-12  7:52     ` Takuya Yoshikawa
2012-03-01 10:32 ` [PATCH 2/4] KVM: Avoid checking huge page mappings in get_dirty_log() Takuya Yoshikawa
2012-03-02  2:56   ` Takuya Yoshikawa
2012-03-02  5:11     ` Takuya Yoshikawa
2012-03-12 18:04   ` Avi Kivity
2012-03-13  9:20     ` Takuya Yoshikawa
2012-03-13 10:12       ` Avi Kivity
2012-03-13 23:04   ` Marcelo Tosatti
2012-03-14  1:04     ` Takuya Yoshikawa
2012-03-14  5:34     ` Takuya Yoshikawa
2012-03-14 10:58       ` Marcelo Tosatti
2012-03-01 10:33 ` [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log() Takuya Yoshikawa
2012-03-03  5:21   ` [PATCH 3/4 changelog-v2] " Takuya Yoshikawa
2012-03-06 11:15     ` Marcelo Tosatti [this message]
2012-03-06 14:43       ` Takuya Yoshikawa
2012-03-06 15:01         ` Marcelo Tosatti
2012-03-06 15:23           ` Takuya Yoshikawa
2012-03-06 15:28             ` Marcelo Tosatti
2012-03-07  8:07               ` Takuya Yoshikawa
2012-03-07 23:25                 ` Marcelo Tosatti
2012-03-08  1:35                   ` Takuya Yoshikawa
2012-03-09  0:08                     ` Marcelo Tosatti
2012-03-12 12:05                       ` Avi Kivity
2012-03-07  8:18               ` Takuya Yoshikawa
2012-03-07 23:20                 ` Marcelo Tosatti
2012-03-16  5:03   ` [PATCH 3/4] " Xiao Guangrong
2012-03-16  6:55     ` Takuya Yoshikawa
2012-03-16  7:30       ` Xiao Guangrong
2012-03-16  7:55         ` Takuya Yoshikawa
2012-03-16  8:28           ` Xiao Guangrong
2012-03-16  9:44             ` Takuya Yoshikawa
2012-03-19  9:34               ` Xiao Guangrong
2012-03-19 10:15                 ` Takuya Yoshikawa
2012-03-01 10:34 ` [PATCH 4/4] KVM: Remove unused dirty_bitmap_head and nr_dirty_pages Takuya Yoshikawa
2012-03-03  5:12 ` [PATCH 0/4] KVM: srcu-less dirty logging -v2 Takuya Yoshikawa
2012-03-20 14:43 ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20120306111540.GA29914@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=avi@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=takuya.yoshikawa@gmail.com \
    --cc=yoshikawa.takuya@oss.ntt.co.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).