From: Avi Kivity <avi@redhat.com>
To: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: mtosatti@redhat.com, laijs@cn.fujitsu.com, kvm@vger.kernel.org,
takuya.yoshikawa@gmail.com
Subject: Re: [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap
Date: Thu, 18 Nov 2010 15:06:03 +0200 [thread overview]
Message-ID: <4CE524BB.9000007@redhat.com> (raw)
In-Reply-To: <20101118141559.8580766c.yoshikawa.takuya@oss.ntt.co.jp>
On 11/18/2010 07:15 AM, Takuya Yoshikawa wrote:
> Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using
> rmap: "kvm: rework remove-write-access for a slot"
> http://www.spinics.net/lists/kvm/msg35871.html
>
> One problem pointed out there was that this approach might hurt cache locality
> and make things slow down.
>
> But if we restrict the story to dirty logging, we notice that only small
> portion of pages are actually needed to be write protected.
>
> For example, I have confirmed that even when we are playing with tools like
> x11perf, dirty ratio of the frame buffer bitmap is almost always less than 10%.
>
> In the case of live-migration, we will see more sparseness in the usual
> workload because the RAM size is really big.
>
> So this patch uses his approach with small modification to use switched out
> dirty bitmap as a hint to restrict the rmap travel.
>
> We can also use this to selectively write protect pages to reduce unwanted page
> faults in the future.
>
Looks like a good approach. Any measurements?
>
> +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
> +{
> + u64 *spte = rmap_next(kvm, rmapp, NULL);
> +
> + while (spte) {
> + /* avoid RMW */
> + if (is_writable_pte(*spte))
> + *spte&= ~PT_WRITABLE_MASK;
This is racy, *spte can be modified concurrently by hardware.
update_spte() can be used for this.
> + spte = rmap_next(kvm, rmapp, spte);
> + }
> +}
> +
> +/*
> + * Write protect the pages set dirty in a given bitmap.
> + */
> +void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
> + struct kvm_memory_slot *memslot,
> + unsigned long *dirty_bitmap)
> +{
> + int i;
> + unsigned long gfn_offset;
> +
> + for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
> + rmapp_remove_write_access(kvm,&memslot->rmap[gfn_offset]);
> +
> + for (i = 0; i< KVM_NR_PAGE_SIZES - 1; i++) {
> + unsigned long gfn = memslot->base_gfn + gfn_offset;
> + unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
> + int idx = gfn / huge - memslot->base_gfn / huge;
Better to use a shift than a division here.
> +
> + if (!(gfn_offset || (gfn % huge)))
> + break;
Why?
> + rmapp_remove_write_access(kvm,
> + &memslot->lpage_info[i][idx].rmap_pde);
> + }
> + }
> + kvm_flush_remote_tlbs(kvm);
> +}
> +
> void kvm_mmu_zap_all(struct kvm *kvm)
> {
> struct kvm_mmu_page *sp, *node;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 038d719..3556b4d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
> }
>
> /*
> + * Check the dirty bit ratio of a given memslot.
> + * 0: clean
> + * 1: sparse
> + * 2: dense
> + */
> +static int dirty_bitmap_density(struct kvm_memory_slot *memslot)
> +{
> + if (!memslot->num_dirty_bits)
> + return 0;
> + if (memslot->num_dirty_bits< memslot->npages / 128)
> + return 1;
> + return 2;
> +}
Use an enum please.
> +
> +/*
> * Get (and clear) the dirty memory log for a memory slot.
> */
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> struct kvm_dirty_log *log)
> {
> - int r;
> + int r, density;
> struct kvm_memory_slot *memslot;
> unsigned long n;
>
> @@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> n = kvm_dirty_bitmap_bytes(memslot);
>
> /* If nothing is dirty, don't bother messing with page tables. */
> - if (memslot->num_dirty_bits) {
> + density = dirty_bitmap_density(memslot);
> + if (density) {
> struct kvm_memslots *slots, *old_slots;
> unsigned long *dirty_bitmap;
>
> @@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> kfree(old_slots);
>
> spin_lock(&kvm->mmu_lock);
> - kvm_mmu_slot_remove_write_access(kvm, log->slot);
> + if (density == 2)
> + kvm_mmu_slot_remove_write_access(kvm, log->slot);
> + else
> + kvm_mmu_slot_remove_write_access_mask(kvm,
> + &slots->memslots[log->slot],
> + dirty_bitmap);
> spin_unlock(&kvm->mmu_lock);
wrt. O(1) write protection: hard to tell if the two methods can
coexist. For direct mapped shadow pages (i.e. ep/npt) I think we can
use the mask to speed up clearing of an individual sp's sptes.
--
error compiling committee.c: too many arguments to function
next prev parent reply other threads:[~2010-11-18 13:06 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2010-11-18 5:12 [RFC PATCH 0/2] KVM: dirty logging optimization Takuya Yoshikawa
2010-11-18 5:14 ` [RFC PATCH 1/2] KVM: count the number of dirty bits for each memslot Takuya Yoshikawa
2010-11-18 12:54 ` Avi Kivity
2010-11-18 13:10 ` Jan Kiszka
2010-11-18 13:14 ` Avi Kivity
2010-11-18 5:15 ` [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap Takuya Yoshikawa
2010-11-18 13:06 ` Avi Kivity [this message]
2010-11-19 8:30 ` Takuya Yoshikawa
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4CE524BB.9000007@redhat.com \
--to=avi@redhat.com \
--cc=kvm@vger.kernel.org \
--cc=laijs@cn.fujitsu.com \
--cc=mtosatti@redhat.com \
--cc=takuya.yoshikawa@gmail.com \
--cc=yoshikawa.takuya@oss.ntt.co.jp \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.