From: Avi Kivity <avi@redhat.com>
To: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: mtosatti@redhat.com, laijs@cn.fujitsu.com, kvm@vger.kernel.org,
takuya.yoshikawa@gmail.com
Subject: Re: [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap
Date: Thu, 18 Nov 2010 15:06:03 +0200 [thread overview]
Message-ID: <4CE524BB.9000007@redhat.com> (raw)
In-Reply-To: <20101118141559.8580766c.yoshikawa.takuya@oss.ntt.co.jp>
On 11/18/2010 07:15 AM, Takuya Yoshikawa wrote:
> Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using
> rmap: "kvm: rework remove-write-access for a slot"
> http://www.spinics.net/lists/kvm/msg35871.html
>
> One problem pointed out there was that this approach might hurt cache
> locality and slow things down.
>
> But if we restrict the story to dirty logging, we notice that only a small
> portion of the pages actually need to be write protected.
>
> For example, I have confirmed that even when playing with tools like
> x11perf, the dirty ratio of the frame buffer bitmap is almost always less
> than 10%.
>
> In the case of live migration, we will see even more sparseness in typical
> workloads because the RAM size is really big.
>
> So this patch uses his approach with a small modification: the switched-out
> dirty bitmap is used as a hint to restrict the rmap traversal.
>
> We can also use this to selectively write protect pages to reduce unwanted page
> faults in the future.
>
Looks like a good approach. Any measurements?
>
> +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
> +{
> +        u64 *spte = rmap_next(kvm, rmapp, NULL);
> +
> +        while (spte) {
> +                /* avoid RMW */
> +                if (is_writable_pte(*spte))
> +                        *spte &= ~PT_WRITABLE_MASK;
This is racy: *spte can be modified concurrently by hardware (e.g.
accessed/dirty bit updates). update_spte() can be used for this.
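For reference, the kind of lost-update problem update_spte() guards against can be sketched in user space with a C11 compare-and-swap retry loop. This is only a minimal sketch with hypothetical names, not the actual KVM helper:

```c
#include <stdatomic.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ULL << 1)   /* x86 PTE writable bit */

/*
 * Hardware can set accessed/dirty bits in the spte at any time, so a
 * plain read-modify-write may silently drop them.  Retry with cmpxchg
 * until the writable bit is cleared against a consistent snapshot.
 */
static uint64_t clear_writable(_Atomic uint64_t *spte)
{
        uint64_t old = atomic_load(spte);

        /* On failure, 'old' is reloaded with the current value; retry. */
        while (!atomic_compare_exchange_weak(spte, &old,
                                             old & ~PT_WRITABLE_MASK))
                ;
        return old;     /* previous value; A/D bits survive in *spte */
}
```

Returning the previous value also lets the caller check whether the spte was actually writable before clearing, as the `/* avoid RMW */` test above does.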
> +                spte = rmap_next(kvm, rmapp, spte);
> +        }
> +}
> +
> +/*
> + * Write protect the pages set dirty in a given bitmap.
> + */
> +void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
> +                                           struct kvm_memory_slot *memslot,
> +                                           unsigned long *dirty_bitmap)
> +{
> +        int i;
> +        unsigned long gfn_offset;
> +
> +        for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
> +                rmapp_remove_write_access(kvm, &memslot->rmap[gfn_offset]);
> +
> +                for (i = 0; i < KVM_NR_PAGE_SIZES - 1; i++) {
> +                        unsigned long gfn = memslot->base_gfn + gfn_offset;
> +                        unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
> +                        int idx = gfn / huge - memslot->base_gfn / huge;
Better to use a shift than a division here.
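Since KVM_PAGES_PER_HPAGE() is always a power of two, both divisions can become shifts. A small stand-alone sketch of the equivalent computation (the helper name and the `huge_shift` parameter are illustrative, not from the patch):

```c
#include <stdint.h>

/*
 * Index into a large-page rmap array: equivalent to
 * gfn / huge - base_gfn / huge when huge == 1UL << huge_shift,
 * but without an integer division.
 */
static unsigned long lpage_index(unsigned long gfn, unsigned long base_gfn,
                                 unsigned int huge_shift)
{
        return (gfn >> huge_shift) - (base_gfn >> huge_shift);
}
```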
> +
> +                        if (!(gfn_offset || (gfn % huge)))
> +                                break;
Why?
> +                        rmapp_remove_write_access(kvm,
> +                                &memslot->lpage_info[i][idx].rmap_pde);
> +                }
> +        }
> +        kvm_flush_remote_tlbs(kvm);
> +}
> +
> void kvm_mmu_zap_all(struct kvm *kvm)
> {
>          struct kvm_mmu_page *sp, *node;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 038d719..3556b4d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
> }
>
> /*
> + * Check the dirty bit ratio of a given memslot.
> + * 0: clean
> + * 1: sparse
> + * 2: dense
> + */
> +static int dirty_bitmap_density(struct kvm_memory_slot *memslot)
> +{
> +        if (!memslot->num_dirty_bits)
> +                return 0;
> +        if (memslot->num_dirty_bits < memslot->npages / 128)
> +                return 1;
> +        return 2;
> +}
Use an enum please.
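With named constants, the helper might read like the following sketch. The enum and function names are only an illustration of the suggestion, not an actual follow-up patch, and the counts are passed directly to keep the example self-contained:

```c
/* Named values for the dirty bitmap density classification. */
enum dirty_density {
        DIRTY_CLEAN,    /* no bits set at all */
        DIRTY_SPARSE,   /* fewer than 1/128 of the pages dirty */
        DIRTY_DENSE,    /* everything else */
};

static enum dirty_density dirty_density(unsigned long num_dirty_bits,
                                        unsigned long npages)
{
        if (!num_dirty_bits)
                return DIRTY_CLEAN;
        if (num_dirty_bits < npages / 128)
                return DIRTY_SPARSE;
        return DIRTY_DENSE;
}
```

This also makes the later `if (density == 2)` test read as `if (density == DIRTY_DENSE)`.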
> +
> +/*
> * Get (and clear) the dirty memory log for a memory slot.
> */
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>                                  struct kvm_dirty_log *log)
> {
> -        int r;
> +        int r, density;
>          struct kvm_memory_slot *memslot;
>          unsigned long n;
>
> @@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>          n = kvm_dirty_bitmap_bytes(memslot);
>
>          /* If nothing is dirty, don't bother messing with page tables. */
> -        if (memslot->num_dirty_bits) {
> +        density = dirty_bitmap_density(memslot);
> +        if (density) {
>                  struct kvm_memslots *slots, *old_slots;
>                  unsigned long *dirty_bitmap;
>
> @@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>          kfree(old_slots);
>
>          spin_lock(&kvm->mmu_lock);
> -        kvm_mmu_slot_remove_write_access(kvm, log->slot);
> +        if (density == 2)
> +                kvm_mmu_slot_remove_write_access(kvm, log->slot);
> +        else
> +                kvm_mmu_slot_remove_write_access_mask(kvm,
> +                                &slots->memslots[log->slot],
> +                                dirty_bitmap);
>          spin_unlock(&kvm->mmu_lock);
Wrt. O(1) write protection: hard to tell whether the two methods can
coexist. For direct-mapped shadow pages (i.e. EPT/NPT) I think we can
use the mask to speed up clearing of an individual sp's sptes.
--
error compiling committee.c: too many arguments to function