From mboxrd@z Thu Jan 1 00:00:00 1970 From: Avi Kivity Subject: Re: [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap Date: Thu, 18 Nov 2010 15:06:03 +0200 Message-ID: <4CE524BB.9000007@redhat.com> References: <20101118141232.d1d25679.yoshikawa.takuya@oss.ntt.co.jp> <20101118141559.8580766c.yoshikawa.takuya@oss.ntt.co.jp> Mime-Version: 1.0 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Cc: mtosatti@redhat.com, laijs@cn.fujitsu.com, kvm@vger.kernel.org, takuya.yoshikawa@gmail.com To: Takuya Yoshikawa Return-path: Received: from mx1.redhat.com ([209.132.183.28]:55494 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751944Ab0KRNGN (ORCPT ); Thu, 18 Nov 2010 08:06:13 -0500 In-Reply-To: <20101118141559.8580766c.yoshikawa.takuya@oss.ntt.co.jp> Sender: kvm-owner@vger.kernel.org List-ID: On 11/18/2010 07:15 AM, Takuya Yoshikawa wrote: > Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using > rmap: "kvm: rework remove-write-access for a slot" > http://www.spinics.net/lists/kvm/msg35871.html > > One problem pointed out there was that this approach might hurt cache locality > and make things slow down. > > But if we restrict the story to dirty logging, we notice that only small > portion of pages are actually needed to be write protected. > > For example, I have confirmed that even when we are playing with tools like > x11perf, dirty ratio of the frame buffer bitmap is almost always less than 10%. > > In the case of live-migration, we will see more sparseness in the usual > workload because the RAM size is really big. > > So this patch uses his approach with small modification to use switched out > dirty bitmap as a hint to restrict the rmap travel. > > We can also use this to selectively write protect pages to reduce unwanted page > faults in the future. > Looks like a good approach. Any measurements? > > +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp) > +{ > + u64 *spte = rmap_next(kvm, rmapp, NULL); > + > + while (spte) { > + /* avoid RMW */ > + if (is_writable_pte(*spte)) > + *spte&= ~PT_WRITABLE_MASK; This is racy, *spte can be modified concurrently by hardware. update_spte() can be used for this. > + spte = rmap_next(kvm, rmapp, spte); > + } > +} > + > +/* > + * Write protect the pages set dirty in a given bitmap. > + */ > +void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm, > + struct kvm_memory_slot *memslot, > + unsigned long *dirty_bitmap) > +{ > + int i; > + unsigned long gfn_offset; > + > + for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) { > + rmapp_remove_write_access(kvm,&memslot->rmap[gfn_offset]); > + > + for (i = 0; i< KVM_NR_PAGE_SIZES - 1; i++) { > + unsigned long gfn = memslot->base_gfn + gfn_offset; > + unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2); > + int idx = gfn / huge - memslot->base_gfn / huge; Better to use a shift than a division here. > + > + if (!(gfn_offset || (gfn % huge))) > + break; Why? > + rmapp_remove_write_access(kvm, > + &memslot->lpage_info[i][idx].rmap_pde); > + } > + } > + kvm_flush_remote_tlbs(kvm); > +} > + > void kvm_mmu_zap_all(struct kvm *kvm) > { > struct kvm_mmu_page *sp, *node; > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 038d719..3556b4d 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm, > } > > /* > + * Check the dirty bit ratio of a given memslot. > + * 0: clean > + * 1: sparse > + * 2: dense > + */ > +static int dirty_bitmap_density(struct kvm_memory_slot *memslot) > +{ > + if (!memslot->num_dirty_bits) > + return 0; > + if (memslot->num_dirty_bits< memslot->npages / 128) > + return 1; > + return 2; > +} Use an enum please. > + > +/* > * Get (and clear) the dirty memory log for a memory slot. > */ > int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, > struct kvm_dirty_log *log) > { > - int r; > + int r, density; > struct kvm_memory_slot *memslot; > unsigned long n; > > @@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, > n = kvm_dirty_bitmap_bytes(memslot); > > /* If nothing is dirty, don't bother messing with page tables. */ > - if (memslot->num_dirty_bits) { > + density = dirty_bitmap_density(memslot); > + if (density) { > struct kvm_memslots *slots, *old_slots; > unsigned long *dirty_bitmap; > > @@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, > kfree(old_slots); > > spin_lock(&kvm->mmu_lock); > - kvm_mmu_slot_remove_write_access(kvm, log->slot); > + if (density == 2) > + kvm_mmu_slot_remove_write_access(kvm, log->slot); > + else > + kvm_mmu_slot_remove_write_access_mask(kvm, > + &slots->memslots[log->slot], > + dirty_bitmap); > spin_unlock(&kvm->mmu_lock); wrt. O(1) write protection: hard to tell if the two methods can coexist. For direct mapped shadow pages (i.e. ep/npt) I think we can use the mask to speed up clearing of an individual sp's sptes. -- error compiling committee.c: too many arguments to function