From: Avi Kivity <avi@redhat.com>
To: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Cc: mtosatti@redhat.com, laijs@cn.fujitsu.com, kvm@vger.kernel.org,
takuya.yoshikawa@gmail.com
Subject: Re: [RFC PATCH 2/2] KVM: selective write protection using dirty bitmap
Date: Thu, 18 Nov 2010 15:06:03 +0200 [thread overview]
Message-ID: <4CE524BB.9000007@redhat.com> (raw)
In-Reply-To: <20101118141559.8580766c.yoshikawa.takuya@oss.ntt.co.jp>
On 11/18/2010 07:15 AM, Takuya Yoshikawa wrote:
> Lai Jiangshan once tried to rewrite kvm_mmu_slot_remove_write_access() using
> rmap: "kvm: rework remove-write-access for a slot"
> http://www.spinics.net/lists/kvm/msg35871.html
>
> One problem pointed out there was that this approach might hurt cache
> locality and slow things down.
>
> But if we restrict the story to dirty logging, we notice that only a small
> portion of the pages actually need to be write protected.
>
> For example, I have confirmed that even when playing with tools like
> x11perf, the dirty ratio of the frame buffer bitmap is almost always less
> than 10%.
>
> In the case of live migration, we will see even more sparseness in typical
> workloads because the RAM size is really big.
>
> So this patch uses his approach with a small modification: the switched-out
> dirty bitmap is used as a hint to restrict the rmap traversal.
>
> We can also use this to selectively write protect pages to reduce unwanted page
> faults in the future.
>
Looks like a good approach. Any measurements?
>
> +static void rmapp_remove_write_access(struct kvm *kvm, unsigned long *rmapp)
> +{
> +        u64 *spte = rmap_next(kvm, rmapp, NULL);
> +
> +        while (spte) {
> +                /* avoid RMW */
> +                if (is_writable_pte(*spte))
> +                        *spte &= ~PT_WRITABLE_MASK;
This is racy: *spte can be modified concurrently by hardware (e.g.
accessed/dirty bit updates). update_spte() can be used for this.
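For reference, the kind of lost-update problem update_spte() guards against can be sketched in user space with a C11 compare-and-swap retry loop. This is only a minimal sketch with hypothetical names, not the actual KVM helper:

```c
#include <stdatomic.h>
#include <stdint.h>

#define PT_WRITABLE_MASK (1ULL << 1)   /* x86 PTE writable bit */

/*
 * Hardware can set accessed/dirty bits in the spte at any time, so a
 * plain read-modify-write may silently drop them.  Retry with cmpxchg
 * until the writable bit is cleared against a consistent snapshot.
 */
static uint64_t clear_writable(_Atomic uint64_t *spte)
{
        uint64_t old = atomic_load(spte);

        /* On failure, 'old' is reloaded with the current value; retry. */
        while (!atomic_compare_exchange_weak(spte, &old,
                                             old & ~PT_WRITABLE_MASK))
                ;
        return old;     /* previous value; A/D bits survive in *spte */
}
```

Returning the previous value also lets the caller check whether the spte was actually writable before clearing, as the `/* avoid RMW */` test above does.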
> +                spte = rmap_next(kvm, rmapp, spte);
> +        }
> +}
> +
> +/*
> + * Write protect the pages set dirty in a given bitmap.
> + */
> +void kvm_mmu_slot_remove_write_access_mask(struct kvm *kvm,
> +                                           struct kvm_memory_slot *memslot,
> +                                           unsigned long *dirty_bitmap)
> +{
> +        int i;
> +        unsigned long gfn_offset;
> +
> +        for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
> +                rmapp_remove_write_access(kvm, &memslot->rmap[gfn_offset]);
> +
> +                for (i = 0; i < KVM_NR_PAGE_SIZES - 1; i++) {
> +                        unsigned long gfn = memslot->base_gfn + gfn_offset;
> +                        unsigned long huge = KVM_PAGES_PER_HPAGE(i + 2);
> +                        int idx = gfn / huge - memslot->base_gfn / huge;
Better to use a shift than a division here.
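Since KVM_PAGES_PER_HPAGE() is always a power of two, both divisions can become shifts. A small stand-alone sketch of the equivalent computation (the helper name and the `huge_shift` parameter are illustrative, not from the patch):

```c
#include <stdint.h>

/*
 * Index into a large-page rmap array: equivalent to
 * gfn / huge - base_gfn / huge when huge == 1UL << huge_shift,
 * but without an integer division.
 */
static unsigned long lpage_index(unsigned long gfn, unsigned long base_gfn,
                                 unsigned int huge_shift)
{
        return (gfn >> huge_shift) - (base_gfn >> huge_shift);
}
```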
> +
> +                        if (!(gfn_offset || (gfn % huge)))
> +                                break;
Why?
> +                        rmapp_remove_write_access(kvm,
> +                                &memslot->lpage_info[i][idx].rmap_pde);
> +                }
> +        }
> +        kvm_flush_remote_tlbs(kvm);
> +}
> +
> void kvm_mmu_zap_all(struct kvm *kvm)
> {
>          struct kvm_mmu_page *sp, *node;
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 038d719..3556b4d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3194,12 +3194,27 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
> }
>
> /*
> + * Check the dirty bit ratio of a given memslot.
> + * 0: clean
> + * 1: sparse
> + * 2: dense
> + */
> +static int dirty_bitmap_density(struct kvm_memory_slot *memslot)
> +{
> +        if (!memslot->num_dirty_bits)
> +                return 0;
> +        if (memslot->num_dirty_bits < memslot->npages / 128)
> +                return 1;
> +        return 2;
> +}
Use an enum please.
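With named constants, the helper might read like the following sketch. The enum and function names are only an illustration of the suggestion, not an actual follow-up patch, and the counts are passed directly to keep the example self-contained:

```c
/* Named values for the dirty bitmap density classification. */
enum dirty_density {
        DIRTY_CLEAN,    /* no bits set at all */
        DIRTY_SPARSE,   /* fewer than 1/128 of the pages dirty */
        DIRTY_DENSE,    /* everything else */
};

static enum dirty_density dirty_density(unsigned long num_dirty_bits,
                                        unsigned long npages)
{
        if (!num_dirty_bits)
                return DIRTY_CLEAN;
        if (num_dirty_bits < npages / 128)
                return DIRTY_SPARSE;
        return DIRTY_DENSE;
}
```

This also makes the later `if (density == 2)` test read as `if (density == DIRTY_DENSE)`.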
> +
> +/*
> * Get (and clear) the dirty memory log for a memory slot.
> */
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>                                  struct kvm_dirty_log *log)
> {
> -        int r;
> +        int r, density;
>          struct kvm_memory_slot *memslot;
>          unsigned long n;
>
> @@ -3217,7 +3232,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>          n = kvm_dirty_bitmap_bytes(memslot);
>
>          /* If nothing is dirty, don't bother messing with page tables. */
> -        if (memslot->num_dirty_bits) {
> +        density = dirty_bitmap_density(memslot);
> +        if (density) {
>                  struct kvm_memslots *slots, *old_slots;
>                  unsigned long *dirty_bitmap;
>
> @@ -3242,7 +3258,12 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>          kfree(old_slots);
>
>          spin_lock(&kvm->mmu_lock);
> -        kvm_mmu_slot_remove_write_access(kvm, log->slot);
> +        if (density == 2)
> +                kvm_mmu_slot_remove_write_access(kvm, log->slot);
> +        else
> +                kvm_mmu_slot_remove_write_access_mask(kvm,
> +                                &slots->memslots[log->slot],
> +                                dirty_bitmap);
>          spin_unlock(&kvm->mmu_lock);
Wrt. O(1) write protection: hard to tell whether the two methods can
coexist. For direct-mapped shadow pages (i.e. EPT/NPT) I think we can
use the mask to speed up clearing of an individual sp's sptes.
--
error compiling committee.c: too many arguments to function