From: Marcelo Tosatti <mtosatti@redhat.com>
To: Takuya Yoshikawa <takuya.yoshikawa@gmail.com>
Cc: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>,
avi@redhat.com, kvm@vger.kernel.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/4 changelog-v2] KVM: Switch to srcu-less get_dirty_log()
Date: Tue, 6 Mar 2012 08:15:40 -0300 [thread overview]
Message-ID: <20120306111540.GA29914@amt.cnet> (raw)
In-Reply-To: <20120303142148.2689454b30dc86d84c4a19f5@gmail.com>
On Sat, Mar 03, 2012 at 02:21:48PM +0900, Takuya Yoshikawa wrote:
> We have seen some problems of the current implementation of
> get_dirty_log() which uses synchronize_srcu_expedited() for updating
> dirty bitmaps; e.g. it is noticeable that this sometimes gives us ms
> order of latency when we use VGA displays.
>
> Furthermore the recent discussion on the following thread
> "srcu: Implement call_srcu()"
> http://lkml.org/lkml/2012/1/31/211
> also motivated us to implement get_dirty_log() without SRCU.
>
> This patch achieves this goal without sacrificing the performance of
> both VGA and live migration: in practice the new code is much faster
> than the old one unless we have too many dirty pages.
>
> Implementation:
>
> The key part of the implementation is the use of xchg() operation for
> clearing dirty bits atomically. Since this allows us to update only
> BITS_PER_LONG pages at once, we need to iterate over the dirty bitmap
> until every dirty bit is cleared again for the next call.
>
> Although some people may worry about the problem of using the atomic
> memory instruction many times to the concurrently accessible bitmap,
> it is usually accessed with mmu_lock held and we rarely see concurrent
> accesses: so what we need to care about is the pure xchg() overheads.
>
> Another point to note is that we do not use for_each_set_bit() to check
> which ones in each BITS_PER_LONG pages are actually dirty. Instead we
> simply use __ffs() in a loop. This is much faster than repeatedly call
> find_next_bit().
>
> Performance:
>
> The dirty-log-perf unit test showed nice improvements, some times faster
> than before, except for some extreme cases; for such cases the speed of
> getting dirty page information is much faster than we process it in the
> userspace.
>
> For real workloads, both VGA and live migration, we have observed pure
> improvements: when the guest was reading a file during live migration,
> we originally saw a few ms of latency, but with the new method the
> latency was less than 200us.
>
> Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
> ---
> arch/x86/kvm/x86.c | 116 +++++++++++++++++++--------------------------------
> 1 files changed, 43 insertions(+), 73 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 3bc1922..0748bab 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3041,55 +3041,32 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
> }
>
> /**
> - * write_protect_slot - write protect a slot for dirty logging
> - * @kvm: the kvm instance
> - * @memslot: the slot we protect
> - * @dirty_bitmap: the bitmap indicating which pages are dirty
> - * @nr_dirty_pages: the number of dirty pages
> + * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
> + * @kvm: kvm instance
> + * @log: slot id and address to which we copy the log
> *
> - * We have two ways to find all sptes to protect:
> - * 1. Use kvm_mmu_slot_remove_write_access() which walks all shadow pages and
> - * checks ones that have a spte mapping a page in the slot.
> - * 2. Use kvm_mmu_rmap_write_protect() for each gfn found in the bitmap.
> + * We need to keep it in mind that VCPU threads can write to the bitmap
> + * concurrently. So, to avoid losing data, we keep the following order for
> + * each bit:
> *
> - * Generally speaking, if there are not so many dirty pages compared to the
> - * number of shadow pages, we should use the latter.
> + * 1. Take a snapshot of the bit and clear it if needed.
> + * 2. Write protect the corresponding page.
> + * 3. Flush TLB's if needed.
> + * 4. Copy the snapshot to the userspace.
> *
> - * Note that letting others write into a page marked dirty in the old bitmap
> - * by using the remaining tlb entry is not a problem. That page will become
> - * write protected again when we flush the tlb and then be reported dirty to
> - * the user space by copying the old bitmap.
> + * Between 2 and 3, the guest may write to the page using the remaining TLB
> + * entry. This is not a problem because the page will be reported dirty at
> + * step 4 using the snapshot taken before and step 3 ensures that successive
> + * writes will be logged for the next call.
> */
> -static void write_protect_slot(struct kvm *kvm,
> - struct kvm_memory_slot *memslot,
> - unsigned long *dirty_bitmap,
> - unsigned long nr_dirty_pages)
> -{
> - spin_lock(&kvm->mmu_lock);
> -
> - /* Not many dirty pages compared to # of shadow pages. */
> - if (nr_dirty_pages < kvm->arch.n_used_mmu_pages) {
> - gfn_t offset;
> -
> - for_each_set_bit(offset, dirty_bitmap, memslot->npages)
> - kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, 1);
> -
> - kvm_flush_remote_tlbs(kvm);
> - } else
> - kvm_mmu_slot_remove_write_access(kvm, memslot->id);
> -
> - spin_unlock(&kvm->mmu_lock);
> -}
> -
> -/*
> - * Get (and clear) the dirty memory log for a memory slot.
> - */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> - struct kvm_dirty_log *log)
> +int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> {
> int r;
> struct kvm_memory_slot *memslot;
> - unsigned long n, nr_dirty_pages;
> + unsigned long n, i;
> + unsigned long *dirty_bitmap;
> + unsigned long *dirty_bitmap_buffer;
> + bool is_dirty = false;
>
> mutex_lock(&kvm->slots_lock);
>
> @@ -3098,49 +3075,42 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> goto out;
>
> memslot = id_to_memslot(kvm->memslots, log->slot);
> +
> + dirty_bitmap = memslot->dirty_bitmap;
> r = -ENOENT;
> - if (!memslot->dirty_bitmap)
> + if (!dirty_bitmap)
> goto out;
>
> n = kvm_dirty_bitmap_bytes(memslot);
> - nr_dirty_pages = memslot->nr_dirty_pages;
>
> - /* If nothing is dirty, don't bother messing with page tables. */
> - if (nr_dirty_pages) {
> - struct kvm_memslots *slots, *old_slots;
> - unsigned long *dirty_bitmap, *dirty_bitmap_head;
> + dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> + memset(dirty_bitmap_buffer, 0, n);
>
> - dirty_bitmap = memslot->dirty_bitmap;
> - dirty_bitmap_head = memslot->dirty_bitmap_head;
> - if (dirty_bitmap == dirty_bitmap_head)
> - dirty_bitmap_head += n / sizeof(long);
> - memset(dirty_bitmap_head, 0, n);
> + spin_lock(&kvm->mmu_lock);
It is not clear why mmu_lock is needed. Dropping it across the xchg loop
should be similar to srcu implementation, in that concurrent updates
will be visible only on the next get_dirty call? Well, it is necessary
anyway for write protecting the sptes.
A cond_resched_lock() would alleviate the potentially long held
times for mmu_lock (can you measure it with large memslots?)
Otherwise looks nice.
>
> - r = -ENOMEM;
> - slots = kmemdup(kvm->memslots, sizeof(*kvm->memslots), GFP_KERNEL);
> - if (!slots)
> - goto out;
> + for (i = 0; i < n / sizeof(long); i++) {
> + unsigned long mask;
> + gfn_t offset;
>
> - memslot = id_to_memslot(slots, log->slot);
> - memslot->nr_dirty_pages = 0;
> - memslot->dirty_bitmap = dirty_bitmap_head;
> - update_memslots(slots, NULL);
> + if (!dirty_bitmap[i])
> + continue;
>
> - old_slots = kvm->memslots;
> - rcu_assign_pointer(kvm->memslots, slots);
> - synchronize_srcu_expedited(&kvm->srcu);
> - kfree(old_slots);
> + is_dirty = true;
>
> - write_protect_slot(kvm, memslot, dirty_bitmap, nr_dirty_pages);
> + mask = xchg(&dirty_bitmap[i], 0);
> + dirty_bitmap_buffer[i] = mask;
>
> - r = -EFAULT;
> - if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
> - goto out;
> - } else {
> - r = -EFAULT;
> - if (clear_user(log->dirty_bitmap, n))
> - goto out;
> + offset = i * BITS_PER_LONG;
> + kvm_mmu_write_protect_pt_masked(kvm, memslot, offset, mask);
> }
> + if (is_dirty)
> + kvm_flush_remote_tlbs(kvm);
> +
> + spin_unlock(&kvm->mmu_lock);
> +
> + r = -EFAULT;
> + if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> + goto out;
>
> r = 0;
> out:
> --
> 1.7.5.4
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
next prev parent reply other threads:[~2012-03-06 11:15 UTC|newest]
Thread overview: 39+ messages / expand[flat|nested] mbox.gz Atom feed top
2012-03-01 10:30 [PATCH 0/4] KVM: srcu-less dirty logging -v2 Takuya Yoshikawa
2012-03-01 10:31 ` [PATCH 1/4] KVM: MMU: Split the main body of rmap_write_protect() off from others Takuya Yoshikawa
2012-03-12 7:39 ` Takuya Yoshikawa
2012-03-12 7:52 ` Takuya Yoshikawa
2012-03-01 10:32 ` [PATCH 2/4] KVM: Avoid checking huge page mappings in get_dirty_log() Takuya Yoshikawa
2012-03-02 2:56 ` Takuya Yoshikawa
2012-03-02 5:11 ` Takuya Yoshikawa
2012-03-12 18:04 ` Avi Kivity
2012-03-13 9:20 ` Takuya Yoshikawa
2012-03-13 10:12 ` Avi Kivity
2012-03-13 23:04 ` Marcelo Tosatti
2012-03-14 1:04 ` Takuya Yoshikawa
2012-03-14 5:34 ` Takuya Yoshikawa
2012-03-14 10:58 ` Marcelo Tosatti
2012-03-01 10:33 ` [PATCH 3/4] KVM: Switch to srcu-less get_dirty_log() Takuya Yoshikawa
2012-03-03 5:21 ` [PATCH 3/4 changelog-v2] " Takuya Yoshikawa
2012-03-06 11:15 ` Marcelo Tosatti [this message]
2012-03-06 14:43 ` Takuya Yoshikawa
2012-03-06 15:01 ` Marcelo Tosatti
2012-03-06 15:23 ` Takuya Yoshikawa
2012-03-06 15:28 ` Marcelo Tosatti
2012-03-07 8:07 ` Takuya Yoshikawa
2012-03-07 23:25 ` Marcelo Tosatti
2012-03-08 1:35 ` Takuya Yoshikawa
2012-03-09 0:08 ` Marcelo Tosatti
2012-03-12 12:05 ` Avi Kivity
2012-03-07 8:18 ` Takuya Yoshikawa
2012-03-07 23:20 ` Marcelo Tosatti
2012-03-16 5:03 ` [PATCH 3/4] " Xiao Guangrong
2012-03-16 6:55 ` Takuya Yoshikawa
2012-03-16 7:30 ` Xiao Guangrong
2012-03-16 7:55 ` Takuya Yoshikawa
2012-03-16 8:28 ` Xiao Guangrong
2012-03-16 9:44 ` Takuya Yoshikawa
2012-03-19 9:34 ` Xiao Guangrong
2012-03-19 10:15 ` Takuya Yoshikawa
2012-03-01 10:34 ` [PATCH 4/4] KVM: Remove unused dirty_bitmap_head and nr_dirty_pages Takuya Yoshikawa
2012-03-03 5:12 ` [PATCH 0/4] KVM: srcu-less dirty logging -v2 Takuya Yoshikawa
2012-03-20 14:43 ` Avi Kivity
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20120306111540.GA29914@amt.cnet \
--to=mtosatti@redhat.com \
--cc=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=takuya.yoshikawa@gmail.com \
--cc=yoshikawa.takuya@oss.ntt.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.