Re: [PATCH 3/5] live migration support for VM dirty log management

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Marc Zyngier <marc.zyngier@arm.com>
To: Mario Smarduch <m.smarduch@samsung.com>
Cc: "kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	"christoffer.dall@linaro.org" <christoffer.dall@linaro.org>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	이정석 <jays.lee@samsung.com>, 정성진 <sungjinn.chung@samsung.com>
Subject: Re: [PATCH 3/5] live migration support for VM dirty log management
Date: Thu, 17 Apr 2014 09:00:18 +0100	[thread overview]
Message-ID: <87a9bkxu5p.fsf@approximate.cambridge.arm.com> (raw)
In-Reply-To: <534F2F9B.80004@samsung.com> (Mario Smarduch's message of "Thu, 17 Apr 2014 02:34:19 +0100")

On Thu, Apr 17 2014 at  2:34:19 am BST, Mario Smarduch <m.smarduch@samsung.com> wrote:
> Add support for dirty bitmap management. Wanted to make it generic but function
> does a couple things different then the x86 version.
>
>
> Signed-off-by: Mario Smarduch <m.smarduch@samsung.com>
> ---
>  arch/arm/include/asm/kvm_host.h |    3 ++
>  arch/arm/kvm/arm.c              |   71 ++++++++++++++++++++++++++++++++++++++-
>  arch/arm/kvm/mmu.c              |   53 +++++++++++++++++++++++++++++
>  3 files changed, 126 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 7ac1fdc..16ed4e4 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -230,5 +230,8 @@ int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
>  
>  void kvm_tlb_flush_vm(struct kvm *kvm);
>  void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +		struct kvm_memory_slot *slot,
> +		gfn_t gfn_offset, unsigned long mask);
>  
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7714cc6..7882343 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -785,9 +785,78 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>  	}
>  }
>  
> +
> +/**
> + * kvm_mmu_slot_remove_access - retrieves the log of dirty pages for a memslot.
> + *	It's itteratively during migration to retrieve pages written since
> + *	last call. In the process write protects ptes that are dirty for next
> + *	time, holds the mmu_lock while write protecting dirty pages.
> + *
> + * @kvm:        The KVM pointer
> + * @log:        Bitmap of dirty pages return.
> + */

So let's play the difference game with x86:

>  int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
>  {
> -	return -EINVAL;
> +	int r;
> +	struct kvm_memory_slot *memslot;
> +	unsigned long n, i;
> +	unsigned long *dirty_bitmap;
> +	unsigned long *dirty_bitmap_buffer;
> +	bool is_dirty = false;
> +	gfn_t offset;

You've moved offset out of the loop.

> +	mutex_lock(&kvm->slots_lock);
> +	r = -EINVAL;
> +
> +	/* Return with error code will cause migration to abort, this happens
> +	 * when initial write protection of VM to manage dirty pages fails
> +	 */
> +	if (kvm->arch.migration_in_progress == -1)
> +		goto out;

You've added this test. How does x86 cope with the same case?

> +	if (log->slot >= KVM_USER_MEM_SLOTS)
> +		goto out;
> +
> +	memslot = id_to_memslot(kvm->memslots, log->slot);
> +	dirty_bitmap = memslot->dirty_bitmap;
> +
> +	r = -ENOENT;
> +	if (!dirty_bitmap)
> +		goto out;
> +
> +	n = kvm_dirty_bitmap_bytes(memslot);
> +	dirty_bitmap_buffer = dirty_bitmap + n / sizeof(long);
> +	memset(dirty_bitmap_buffer, 0, n);
> +
> +	spin_lock(&kvm->mmu_lock);
> +	for (i = 0; i < n / sizeof(long); i++) {
> +		unsigned long mask;
> +
> +		if (!dirty_bitmap[i])
> +			continue;
> +
> +		is_dirty = true;
> +		offset = i * BITS_PER_LONG;
> +		kvm_mmu_write_protect_pt_masked(kvm, memslot, offset,
> +							dirty_bitmap[i]);
> +		mask = dirty_bitmap[i];
> +		dirty_bitmap_buffer[i] = mask;
> +		dirty_bitmap[i] = 0;

You've expanded the xchg macro, and moved it around.

> +	}
> +
> +	if (is_dirty)
> +		kvm_tlb_flush_vm(kvm);

This can be easily abstracted to be a kvm_flush_remote_tlbs on x86, and
a HW broadcast on ARM.

> +
> +	spin_unlock(&kvm->mmu_lock);
> +	r = -EFAULT;
> +
> +	if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
> +		goto out;
> +
> +	r = 0;
> +out:
> +	mutex_unlock(&kvm->slots_lock);
> +	return r;
>  }

So only two small differences that can be easily factored out. I stand
by my initial analysis, and ask again you move this function to the
generic code as a weak symbol. There isn't any value in duplicating it.

>  static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index b85ab56..47bec1c 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -773,6 +773,59 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)
>  	spin_unlock(&kvm->mmu_lock);
>  }
>  
> +
> +/**
> + * kvm_mmu_write_protected_pt_masked - after migration thread write protects
> + *  the entire VM address space itterative calls are made to get diry pages
> + *  as the VM pages are being migrated. New dirty pages may be subset
> + *  of initial WPed VM or new writes faulted in. Here write protect new
> + *  dirty pages again in preparation of next dirty log read. This function is
> + *  called as a result KVM_GET_DIRTY_LOG ioctl, to determine what pages
> + *  need to be migrated.
> + *   'kvm->mmu_lock' must be  held to protect against concurrent modification
> + *   of page tables (2nd stage fault, mmu modifiers, device writes)
> + *
> + * @kvm:        The KVM pointer
> + * @slot:	The memory slot the dirty log is retrieved for
> + * @gfn_offset: The gfn offset in memory slot
> + * @mask:       The mask of dirty pages at offset 'gnf_offset in this memory
> + *              slot to be writ protect
> + */
> +
> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> +		struct kvm_memory_slot *slot,
> +		gfn_t gfn_offset, unsigned long mask)
> +{
> +	phys_addr_t ipa;
> +	pgd_t *pgdp = kvm->arch.pgd, *pgd;
> +	pud_t *pud;
> +	pmd_t *pmd;
> +	pte_t *pte, new_pte;
> +
> +	/* walk set bits in the mask and write protect corresponding pages */
> +	while (mask) {
> +		ipa = (slot->base_gfn + gfn_offset + __ffs(mask)) << PAGE_SHIFT;
> +		pgd = pgdp + pgd_index(ipa);
> +		if (!pgd_present(*pgd))
> +			goto update_mask;

I think something is wrong in your logic. If there is no PGD, it means a
whole 1GB isn't present. Yet you're just clearing one bit from the mask
and doing it again. As you're only looking at BITS_PER_LONG contiguous pages at a
time, it is likely that the same thing will happen for the other pages,
and you're just wasting precious CPU cycles here.

> +		pud = pud_offset(pgd, ipa);
> +		if (!pud_present(*pud))
> +			goto update_mask;
> +		pmd = pmd_offset(pud, ipa);
> +		if (!pmd_present(*pmd))
> +			goto update_mask;
> +		pte = pte_offset_kernel(pmd, ipa);
> +		if (!pte_present(*pte))
> +			goto update_mask;
> +		if ((*pte & L_PTE_S2_RDWR) == L_PTE_S2_RDONLY)
> +			goto update_mask;
> +		new_pte = pfn_pte(pte_pfn(*pte), PAGE_S2);
> +		*pte = new_pte;

I'd like to see these two lines in a separate function (something like
"stage2_mark_pte_ro")...

> +update_mask:
> +		mask &= mask - 1;
> +	}
> +}
> +
>  static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
>  			  struct kvm_memory_slot *memslot,
>  			  unsigned long fault_status)

-- 
Jazz is not dead. It just smells funny.

next prev parent reply	other threads:[~2014-04-17  8:00 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-17  1:34 [PATCH 3/5] live migration support for VM dirty log management Mario Smarduch
2014-04-17  8:00 ` Marc Zyngier [this message]
2014-04-18  3:10   ` Mario Smarduch
2014-04-18  8:23     ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87a9bkxu5p.fsf@approximate.cambridge.arm.com \
    --to=marc.zyngier@arm.com \
    --cc=christoffer.dall@linaro.org \
    --cc=jays.lee@samsung.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.cs.columbia.edu \
    --cc=m.smarduch@samsung.com \
    --cc=sungjinn.chung@samsung.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.