* [PATCH 0/4] KVM: Dirty logging optimization using rmap
@ 2011-11-14  9:20 Takuya Yoshikawa
  2011-11-14  9:21 ` [PATCH 1/4] KVM: MMU: Clean up BUG_ON() conditions in rmap_write_protect() Takuya Yoshikawa
                   ` (5 more replies)
  0 siblings, 6 replies; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-14  9:20 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, takuya.yoshikawa

This is a revised version of my previous work.  I hope that
the patches are more self-explanatory than before.

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PATCH 1/4] KVM: MMU: Clean up BUG_ON() conditions in rmap_write_protect()
  2011-11-14  9:20 [PATCH 0/4] KVM: Dirty logging optimization using rmap Takuya Yoshikawa
@ 2011-11-14  9:21 ` Takuya Yoshikawa
  2011-11-14  9:22 ` [PATCH 2/4] KVM: MMU: Split gfn_to_rmap() into two functions Takuya Yoshikawa
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-14  9:21 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, takuya.yoshikawa

Remove redundant checks and use is_large_pte() macro.
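
For reference, is_large_pte() is just a test of the page-size bit in the
spte; roughly:

	static int is_large_pte(u64 pte)
	{
		return pte & PT_PAGE_SIZE_MASK;
	}

Together with the BUG_ON(!(*spte & PT_PRESENT_MASK)) kept on the line
above, this is equivalent to the old combined mask check.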

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a9b3a32..973f254 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1027,7 +1027,6 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {
-		BUG_ON(!spte);
 		BUG_ON(!(*spte & PT_PRESENT_MASK));
 		rmap_printk("rmap_write_protect: spte %p %llx\n", spte, *spte);
 		if (is_writable_pte(*spte)) {
@@ -1043,9 +1042,8 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 		rmapp = gfn_to_rmap(kvm, gfn, i);
 		spte = rmap_next(kvm, rmapp, NULL);
 		while (spte) {
-			BUG_ON(!spte);
 			BUG_ON(!(*spte & PT_PRESENT_MASK));
-			BUG_ON((*spte & (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK)) != (PT_PAGE_SIZE_MASK|PT_PRESENT_MASK));
+			BUG_ON(!is_large_pte(*spte));
 			pgprintk("rmap_write_protect(large): spte %p %llx %lld\n", spte, *spte, gfn);
 			if (is_writable_pte(*spte)) {
 				drop_spte(kvm, spte);
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 2/4] KVM: MMU: Split gfn_to_rmap() into two functions
  2011-11-14  9:20 [PATCH 0/4] KVM: Dirty logging optimization using rmap Takuya Yoshikawa
  2011-11-14  9:21 ` [PATCH 1/4] KVM: MMU: Clean up BUG_ON() conditions in rmap_write_protect() Takuya Yoshikawa
@ 2011-11-14  9:22 ` Takuya Yoshikawa
  2011-11-14  9:23 ` [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging Takuya Yoshikawa
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-14  9:22 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, takuya.yoshikawa

rmap_write_protect() calls gfn_to_rmap() for each level with gfn fixed.
This results in calling gfn_to_memslot() repeatedly with that gfn.

This patch introduces __gfn_to_rmap() which takes the slot as an
argument to avoid this.

This is also needed for the following dirty logging optimization.
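
As background, gfn_to_memslot() at this point boils down to a linear
scan over the slot array; a simplified model (not the exact kernel
code):

	/* simplified sketch of the lookup we now do only once */
	static struct kvm_memory_slot *find_slot(struct kvm_memslots *slots,
						 gfn_t gfn)
	{
		int i;

		for (i = 0; i < slots->nmemslots; ++i) {
			struct kvm_memory_slot *s = &slots->memslots[i];

			if (gfn >= s->base_gfn &&
			    gfn < s->base_gfn + s->npages)
				return s;
		}
		return NULL;
	}

Doing this once in rmap_write_protect() and passing the result down is
strictly cheaper than redoing it for every mapping level.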

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/mmu.c |   26 +++++++++++++++++---------
 1 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 973f254..fa71085 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -958,23 +958,29 @@ static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
 	}
 }
 
-/*
- * Take gfn and return the reverse mapping to it.
- */
-static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
+static unsigned long *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level,
+				    struct kvm_memory_slot *slot)
 {
-	struct kvm_memory_slot *slot;
 	struct kvm_lpage_info *linfo;
 
-	slot = gfn_to_memslot(kvm, gfn);
 	if (likely(level == PT_PAGE_TABLE_LEVEL))
 		return &slot->rmap[gfn - slot->base_gfn];
 
 	linfo = lpage_info_slot(gfn, slot, level);
-
 	return &linfo->rmap_pde;
 }
 
+/*
+ * Take gfn and return the reverse mapping to it.
+ */
+static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
+{
+	struct kvm_memory_slot *slot;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	return __gfn_to_rmap(kvm, gfn, level, slot);
+}
+
 static bool rmap_can_add(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu_memory_cache *cache;
@@ -1019,12 +1025,14 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 
 static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 {
+	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
 	u64 *spte;
 	int i, write_protected = 0;
 
-	rmapp = gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL);
+	slot = gfn_to_memslot(kvm, gfn);
 
+	rmapp = __gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL, slot);
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {
 		BUG_ON(!(*spte & PT_PRESENT_MASK));
@@ -1039,7 +1047,7 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 	/* check for huge page mappings */
 	for (i = PT_DIRECTORY_LEVEL;
 	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
-		rmapp = gfn_to_rmap(kvm, gfn, i);
+		rmapp = __gfn_to_rmap(kvm, gfn, i, slot);
 		spte = rmap_next(kvm, rmapp, NULL);
 		while (spte) {
 			BUG_ON(!(*spte & PT_PRESENT_MASK));
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-11-14  9:20 [PATCH 0/4] KVM: Dirty logging optimization using rmap Takuya Yoshikawa
  2011-11-14  9:21 ` [PATCH 1/4] KVM: MMU: Clean up BUG_ON() conditions in rmap_write_protect() Takuya Yoshikawa
  2011-11-14  9:22 ` [PATCH 2/4] KVM: MMU: Split gfn_to_rmap() into two functions Takuya Yoshikawa
@ 2011-11-14  9:23 ` Takuya Yoshikawa
  2011-11-14 10:07   ` Avi Kivity
  2011-11-14  9:24 ` [PATCH 4/4] KVM: Optimize dirty logging by rmap_write_protect() Takuya Yoshikawa
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-14  9:23 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, takuya.yoshikawa

Needed for the next patch which uses this number to decide how to write
protect a slot.
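
The counting relies on __test_and_set_bit_le() returning the old bit
value, so a page is counted only the first time it is dirtied and
nr_dirty_pages always equals the number of set bits in the bitmap.
A simplified model of the update (ignoring the little-endian bitops):

	/* sketch: not the real __test_and_set_bit_le() */
	static bool test_and_set_bit_model(unsigned long *map, unsigned long nr)
	{
		unsigned long mask = 1UL << (nr % BITS_PER_LONG);
		unsigned long *word = map + nr / BITS_PER_LONG;
		bool was_set = *word & mask;

		*word |= mask;
		return was_set;
	}

	if (!test_and_set_bit_model(memslot->dirty_bitmap, rel_gfn))
		memslot->nr_dirty_pages++;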

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/kvm/x86.c       |    9 +++------
 include/linux/kvm_host.h |    1 +
 virt/kvm/kvm_main.c      |    4 +++-
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9eff4af..5fa6801 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3467,10 +3467,9 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 				      struct kvm_dirty_log *log)
 {
-	int r, i;
+	int r;
 	struct kvm_memory_slot *memslot;
 	unsigned long n;
-	unsigned long is_dirty = 0;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -3485,11 +3484,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 
 	n = kvm_dirty_bitmap_bytes(memslot);
 
-	for (i = 0; !is_dirty && i < n/sizeof(long); i++)
-		is_dirty = memslot->dirty_bitmap[i];
-
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (is_dirty) {
+	if (memslot->nr_dirty_pages) {
 		struct kvm_memslots *slots, *old_slots;
 		unsigned long *dirty_bitmap;
 
@@ -3504,6 +3500,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 			goto out;
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
 		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
+		slots->memslots[log->slot].nr_dirty_pages = 0;
 		slots->generation++;
 
 		old_slots = kvm->memslots;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c6a2ec9..7c654aa 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -181,6 +181,7 @@ struct kvm_memory_slot {
 	unsigned long *rmap;
 	unsigned long *dirty_bitmap;
 	unsigned long *dirty_bitmap_head;
+	unsigned long nr_dirty_pages;
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 	unsigned long userspace_addr;
 	int user_alloc;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4c5b9a2..af5c988 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -625,6 +625,7 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
 		return -ENOMEM;
 
 	memslot->dirty_bitmap_head = memslot->dirty_bitmap;
+	memslot->nr_dirty_pages = 0;
 	return 0;
 }
 #endif /* !CONFIG_S390 */
@@ -1491,7 +1492,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
 	if (memslot && memslot->dirty_bitmap) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;
 
-		__set_bit_le(rel_gfn, memslot->dirty_bitmap);
+		if (!__test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap))
+			memslot->nr_dirty_pages++;
 	}
 }
 
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* [PATCH 4/4] KVM: Optimize dirty logging by rmap_write_protect()
  2011-11-14  9:20 [PATCH 0/4] KVM: Dirty logging optimization using rmap Takuya Yoshikawa
                   ` (2 preceding siblings ...)
  2011-11-14  9:23 ` [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging Takuya Yoshikawa
@ 2011-11-14  9:24 ` Takuya Yoshikawa
  2011-11-14 10:22   ` Avi Kivity
  2011-11-14 10:25 ` [PATCH 0/4] KVM: Dirty logging optimization using rmap Avi Kivity
  2011-11-17  9:28 ` Avi Kivity
  5 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-14  9:24 UTC (permalink / raw)
  To: avi, mtosatti; +Cc: kvm, takuya.yoshikawa

Currently, write protecting a slot needs to walk all the shadow pages
and check the ones which have a pte mapping a page in it.

The walk is overly heavy when there are not many dirty pages in that
slot, and checking all the shadow pages results in unwanted cache
pollution.

To mitigate this problem, we use rmap_write_protect() and check only
the sptes which can be reached from gfns marked in the dirty bitmap,
when the number of dirty pages is less than that of shadow pages.

This criterion is reasonable in its meaning and worked well in our
test: write protection became several times faster than before when
the ratio of dirty pages was low, and was not worse even when the
ratio was near the criterion.

Note that the locking for this write protection becomes fine grained.
The reason why this is safe is described in the comments.
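
The new write_protect_slot() iterates over dirty gfns with
for_each_set_bit(), which expands to roughly:

	#define for_each_set_bit(bit, addr, size)			\
		for ((bit) = find_first_bit((addr), (size));		\
		     (bit) < (size);					\
		     (bit) = find_next_bit((addr), (size), (bit) + 1))

so the cost of the rmap-based path scales with the number of dirty
pages rather than with the number of shadow pages.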

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |    2 +
 arch/x86/kvm/mmu.c              |   14 +++++++---
 arch/x86/kvm/x86.c              |   58 ++++++++++++++++++++++++++++++++++-----
 3 files changed, 63 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6d83264..69b6525 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -648,6 +648,8 @@ void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 
 int kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
 void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
+int kvm_mmu_rmap_write_protect(struct kvm *kvm, u64 gfn,
+			       struct kvm_memory_slot *slot);
 void kvm_mmu_zap_all(struct kvm *kvm);
 unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
 void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fa71085..aecdea2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1023,15 +1023,13 @@ static void drop_spte(struct kvm *kvm, u64 *sptep)
 		rmap_remove(kvm, sptep);
 }
 
-static int rmap_write_protect(struct kvm *kvm, u64 gfn)
+int kvm_mmu_rmap_write_protect(struct kvm *kvm, u64 gfn,
+			       struct kvm_memory_slot *slot)
 {
-	struct kvm_memory_slot *slot;
 	unsigned long *rmapp;
 	u64 *spte;
 	int i, write_protected = 0;
 
-	slot = gfn_to_memslot(kvm, gfn);
-
 	rmapp = __gfn_to_rmap(kvm, gfn, PT_PAGE_TABLE_LEVEL, slot);
 	spte = rmap_next(kvm, rmapp, NULL);
 	while (spte) {
@@ -1066,6 +1064,14 @@ static int rmap_write_protect(struct kvm *kvm, u64 gfn)
 	return write_protected;
 }
 
+static int rmap_write_protect(struct kvm *kvm, u64 gfn)
+{
+	struct kvm_memory_slot *slot;
+
+	slot = gfn_to_memslot(kvm, gfn);
+	return kvm_mmu_rmap_write_protect(kvm, gfn, slot);
+}
+
 static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
 			   unsigned long data)
 {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5fa6801..1985ea1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3461,6 +3461,50 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
 	return 0;
 }
 
+/**
+ * write_protect_slot - write protect a slot for dirty logging
+ * @kvm: the kvm instance
+ * @memslot: the slot we protect
+ * @dirty_bitmap: the bitmap indicating which pages are dirty
+ * @nr_dirty_pages: the number of dirty pages
+ *
+ * We have two ways to find all sptes to protect:
+ * 1. Use kvm_mmu_slot_remove_write_access() which walks all shadow pages and
+ *    checks ones that have a spte mapping a page in the slot.
+ * 2. Use kvm_mmu_rmap_write_protect() for each gfn found in the bitmap.
+ *
+ * Generally speaking, if there are not so many dirty pages compared to the
+ * number of shadow pages, we should use the latter.
+ *
+ * Note that letting others write into a page marked dirty in the old bitmap
+ * by using the remaining tlb entry is not a problem.  That page will become
+ * write protected again when we flush the tlb and then be reported dirty to
+ * the user space by copying the old bitmap.
+ */
+static void write_protect_slot(struct kvm *kvm,
+			       struct kvm_memory_slot *memslot,
+			       unsigned long *dirty_bitmap,
+			       unsigned long nr_dirty_pages)
+{
+	/* Not many dirty pages compared to # of shadow pages. */
+	if (nr_dirty_pages < kvm->arch.n_used_mmu_pages) {
+		unsigned long gfn_offset;
+
+		for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
+			unsigned long gfn = memslot->base_gfn + gfn_offset;
+
+			spin_lock(&kvm->mmu_lock);
+			kvm_mmu_rmap_write_protect(kvm, gfn, memslot);
+			spin_unlock(&kvm->mmu_lock);
+		}
+		kvm_flush_remote_tlbs(kvm);
+	} else {
+		spin_lock(&kvm->mmu_lock);
+		kvm_mmu_slot_remove_write_access(kvm, memslot->id);
+		spin_unlock(&kvm->mmu_lock);
+	}
+}
+
 /*
  * Get (and clear) the dirty memory log for a memory slot.
  */
@@ -3469,7 +3513,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 {
 	int r;
 	struct kvm_memory_slot *memslot;
-	unsigned long n;
+	unsigned long n, nr_dirty_pages;
 
 	mutex_lock(&kvm->slots_lock);
 
@@ -3483,9 +3527,10 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		goto out;
 
 	n = kvm_dirty_bitmap_bytes(memslot);
+	nr_dirty_pages = memslot->nr_dirty_pages;
 
 	/* If nothing is dirty, don't bother messing with page tables. */
-	if (memslot->nr_dirty_pages) {
+	if (nr_dirty_pages) {
 		struct kvm_memslots *slots, *old_slots;
 		unsigned long *dirty_bitmap;
 
@@ -3499,8 +3544,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		if (!slots)
 			goto out;
 		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
-		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
-		slots->memslots[log->slot].nr_dirty_pages = 0;
+		memslot = &slots->memslots[log->slot];
+		memslot->dirty_bitmap = dirty_bitmap;
+		memslot->nr_dirty_pages = 0;
 		slots->generation++;
 
 		old_slots = kvm->memslots;
@@ -3509,9 +3555,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
 		dirty_bitmap = old_slots->memslots[log->slot].dirty_bitmap;
 		kfree(old_slots);
 
-		spin_lock(&kvm->mmu_lock);
-		kvm_mmu_slot_remove_write_access(kvm, log->slot);
-		spin_unlock(&kvm->mmu_lock);
+		write_protect_slot(kvm, memslot, dirty_bitmap, nr_dirty_pages);
 
 		r = -EFAULT;
 		if (copy_to_user(log->dirty_bitmap, dirty_bitmap, n))
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-11-14  9:23 ` [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging Takuya Yoshikawa
@ 2011-11-14 10:07   ` Avi Kivity
  2011-11-14 10:24     ` Avi Kivity
  2011-12-20  4:29     ` Takuya Yoshikawa
  0 siblings, 2 replies; 40+ messages in thread
From: Avi Kivity @ 2011-11-14 10:07 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, takuya.yoshikawa

On 11/14/2011 11:23 AM, Takuya Yoshikawa wrote:
> Needed for the next patch which uses this number to decide how to write
> protect a slot.
>
>  	/* If nothing is dirty, don't bother messing with page tables. */
> -	if (is_dirty) {
> +	if (memslot->nr_dirty_pages) {
>  		struct kvm_memslots *slots, *old_slots;
>  		unsigned long *dirty_bitmap;
>  
> @@ -3504,6 +3500,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
>  			goto out;
>  		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
>  		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
> +		slots->memslots[log->slot].nr_dirty_pages = 0;
>  		slots->generation++;
>  
>  #endif /* !CONFIG_S390 */
> @@ -1491,7 +1492,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	if (memslot && memslot->dirty_bitmap) {
>  		unsigned long rel_gfn = gfn - memslot->base_gfn;
>  
> -		__set_bit_le(rel_gfn, memslot->dirty_bitmap);
> +		if (!__test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap))
> +			memslot->nr_dirty_pages++;
>  	}
>  }
>  

The two assignments to ->nr_dirty_pages can race, no?  Nothing protects it.

btw mark_page_dirty() itself seems to assume mmu_lock protection that
doesn't exist.  Marcelo?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/4] KVM: Optimize dirty logging by rmap_write_protect()
  2011-11-14  9:24 ` [PATCH 4/4] KVM: Optimize dirty logging by rmap_write_protect() Takuya Yoshikawa
@ 2011-11-14 10:22   ` Avi Kivity
  2011-11-14 10:29     ` Takuya Yoshikawa
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2011-11-14 10:22 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, takuya.yoshikawa

On 11/14/2011 11:24 AM, Takuya Yoshikawa wrote:
> Currently, write protecting a slot needs to walk all the shadow pages
> and check the ones which have a pte mapping a page in it.
>
> The walk is overly heavy when there are not many dirty pages in that
> slot, and checking all the shadow pages results in unwanted cache
> pollution.
>
> To mitigate this problem, we use rmap_write_protect() and check only
> the sptes which can be reached from gfns marked in the dirty bitmap,
> when the number of dirty pages is less than that of shadow pages.
>
> This criterion is reasonable in its meaning and worked well in our
> test: write protection became several times faster than before when
> the ratio of dirty pages was low, and was not worse even when the
> ratio was near the criterion.
>
> Note that the locking for this write protection becomes fine grained.
> The reason why this is safe is described in the comments.
>
>  
> +/**
> + * write_protect_slot - write protect a slot for dirty logging
> + * @kvm: the kvm instance
> + * @memslot: the slot we protect
> + * @dirty_bitmap: the bitmap indicating which pages are dirty
> + * @nr_dirty_pages: the number of dirty pages
> + *
> + * We have two ways to find all sptes to protect:
> + * 1. Use kvm_mmu_slot_remove_write_access() which walks all shadow pages and
> + *    checks ones that have a spte mapping a page in the slot.
> + * 2. Use kvm_mmu_rmap_write_protect() for each gfn found in the bitmap.
> + *
> + * Generally speaking, if there are not so many dirty pages compared to the
> + * number of shadow pages, we should use the latter.
> + *
> + * Note that letting others write into a page marked dirty in the old bitmap
> + * by using the remaining tlb entry is not a problem.  That page will become
> + * write protected again when we flush the tlb and then be reported dirty to
> + * the user space by copying the old bitmap.
> + */
> +static void write_protect_slot(struct kvm *kvm,
> +			       struct kvm_memory_slot *memslot,
> +			       unsigned long *dirty_bitmap,
> +			       unsigned long nr_dirty_pages)
> +{
> +	/* Not many dirty pages compared to # of shadow pages. */
> +	if (nr_dirty_pages < kvm->arch.n_used_mmu_pages) {

Seems a reasonable heuristic.  In particular, this is always true for
vga, yes?  That will get the code exercised.

> +		unsigned long gfn_offset;
> +
> +		for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
> +			unsigned long gfn = memslot->base_gfn + gfn_offset;
> +
> +			spin_lock(&kvm->mmu_lock);
> +			kvm_mmu_rmap_write_protect(kvm, gfn, memslot);
> +			spin_unlock(&kvm->mmu_lock);
> +		}
> +		kvm_flush_remote_tlbs(kvm);
> +	} else {
> +		spin_lock(&kvm->mmu_lock);
> +		kvm_mmu_slot_remove_write_access(kvm, memslot->id);
> +		spin_unlock(&kvm->mmu_lock);
> +	}
> +}
> +
>

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-11-14 10:07   ` Avi Kivity
@ 2011-11-14 10:24     ` Avi Kivity
  2011-12-20  4:29     ` Takuya Yoshikawa
  1 sibling, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2011-11-14 10:24 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, takuya.yoshikawa

On 11/14/2011 12:07 PM, Avi Kivity wrote:
> On 11/14/2011 11:23 AM, Takuya Yoshikawa wrote:
> > Needed for the next patch which uses this number to decide how to write
> > protect a slot.
> >
> >  	/* If nothing is dirty, don't bother messing with page tables. */
> > -	if (is_dirty) {
> > +	if (memslot->nr_dirty_pages) {
> >  		struct kvm_memslots *slots, *old_slots;
> >  		unsigned long *dirty_bitmap;
> >  
> > @@ -3504,6 +3500,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> >  			goto out;
> >  		memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
> >  		slots->memslots[log->slot].dirty_bitmap = dirty_bitmap;
> > +		slots->memslots[log->slot].nr_dirty_pages = 0;
> >  		slots->generation++;
> >  
> >  #endif /* !CONFIG_S390 */
> > @@ -1491,7 +1492,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >  	if (memslot && memslot->dirty_bitmap) {
> >  		unsigned long rel_gfn = gfn - memslot->base_gfn;
> >  
> > -		__set_bit_le(rel_gfn, memslot->dirty_bitmap);
> > +		if (!__test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap))
> > +			memslot->nr_dirty_pages++;
> >  	}
> >  }
> >  
>
> The two assignments to ->nr_dirty_pages can race, no?  Nothing protects it.
>

er, it can't, it's before the rcu_assign_pointer().
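
i.e. the sequence in kvm_vm_ioctl_get_dirty_log() is, abridged:

	memcpy(slots, kvm->memslots, sizeof(struct kvm_memslots));
	slots->memslots[log->slot].nr_dirty_pages = 0;	/* private copy */
	slots->generation++;

	old_slots = kvm->memslots;
	rcu_assign_pointer(kvm->memslots, slots);	/* publish */

so the reset only touches a copy that no reader can see yet.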

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-14  9:20 [PATCH 0/4] KVM: Dirty logging optimization using rmap Takuya Yoshikawa
                   ` (3 preceding siblings ...)
  2011-11-14  9:24 ` [PATCH 4/4] KVM: Optimize dirty logging by rmap_write_protect() Takuya Yoshikawa
@ 2011-11-14 10:25 ` Avi Kivity
  2011-11-14 10:56   ` Takuya Yoshikawa
  2011-11-17  9:28 ` Avi Kivity
  5 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2011-11-14 10:25 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, takuya.yoshikawa

On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
> This is a revised version of my previous work.  I hope that
> the patches are more self-explanatory than before.
>

It looks good.  I'll let Marcelo (or anyone else?) review it as well
before applying.

Do you have performance measurements?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 4/4] KVM: Optimize dirty logging by rmap_write_protect()
  2011-11-14 10:22   ` Avi Kivity
@ 2011-11-14 10:29     ` Takuya Yoshikawa
  0 siblings, 0 replies; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-14 10:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, takuya.yoshikawa

(2011/11/14 19:22), Avi Kivity wrote:

>> + *
>> + * Generally speaking, if there are not so many dirty pages compared to the
>> + * number of shadow pages, we should use the latter.
>> + *
>> + * Note that letting others write into a page marked dirty in the old bitmap
>> + * by using the remaining tlb entry is not a problem.  That page will become
>> + * write protected again when we flush the tlb and then be reported dirty to
>> + * the user space by copying the old bitmap.
>> + */
>> +static void write_protect_slot(struct kvm *kvm,
>> +			       struct kvm_memory_slot *memslot,
>> +			       unsigned long *dirty_bitmap,
>> +			       unsigned long nr_dirty_pages)
>> +{
>> +	/* Not many dirty pages compared to # of shadow pages. */
>> +	if (nr_dirty_pages < kvm->arch.n_used_mmu_pages) {
>
> Seems a reasonable heuristic.  In particular, this is always true for
> vga, yes?  That will get the code exercised.

Almost always yes, once the guest warms up and shadow pages are allocated.

	Takuya

>
>> +		unsigned long gfn_offset;
>> +
>> +		for_each_set_bit(gfn_offset, dirty_bitmap, memslot->npages) {
>> +			unsigned long gfn = memslot->base_gfn + gfn_offset;
>> +
>> +			spin_lock(&kvm->mmu_lock);
>> +			kvm_mmu_rmap_write_protect(kvm, gfn, memslot);
>> +			spin_unlock(&kvm->mmu_lock);
>> +		}
>> +		kvm_flush_remote_tlbs(kvm);
>> +	} else {
>> +		spin_lock(&kvm->mmu_lock);
>> +		kvm_mmu_slot_remove_write_access(kvm, memslot->id);
>> +		spin_unlock(&kvm->mmu_lock);
>> +	}
>> +}
>> +
>>
>


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-14 10:25 ` [PATCH 0/4] KVM: Dirty logging optimization using rmap Avi Kivity
@ 2011-11-14 10:56   ` Takuya Yoshikawa
  2011-11-14 12:39     ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-14 10:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, takuya.yoshikawa

(2011/11/14 19:25), Avi Kivity wrote:
> On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
>> This is a revised version of my previous work.  I hope that
>> the patches are more self-explanatory than before.
>>
>
> It looks good.  I'll let Marcelo (or anyone else?) review it as well
> before applying.
>
> Do you have performance measurements?
>

For VGA, 30-40us became 3-5us when the display was quiet, with a
sufficiently warmed-up guest.

Near the criterion, the number was not different much from the
original version.

For live migration, I forgot the number but the result was good.
But my test cases did not cover every pattern, so I changed
the criterion to be a bit conservative.

	More tests may be able to find a better criterion.
	I am not in a hurry about this, so it is OK to add some tests
	before merging this.

But what I did not like was holding the spin lock for more than 100us
or so with the original version.  With this version, at least, the
problem should be mitigated somewhat.

	Takuya


One note:

kvm-unit-tests' dirty logging test was broken on 32-bit boxes: a compile error.
I changed an "idt" to "boot_idt" and used it.

I do not know kvm-unit-tests well, so I want somebody to fix that officially.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-14 10:56   ` Takuya Yoshikawa
@ 2011-11-14 12:39     ` Avi Kivity
  2011-11-16  4:28       ` Takuya Yoshikawa
  2011-11-16  8:17       ` Takuya Yoshikawa
  0 siblings, 2 replies; 40+ messages in thread
From: Avi Kivity @ 2011-11-14 12:39 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, takuya.yoshikawa

On 11/14/2011 12:56 PM, Takuya Yoshikawa wrote:
> (2011/11/14 19:25), Avi Kivity wrote:
>> On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
>>> This is a revised version of my previous work.  I hope that
>>> the patches are more self-explanatory than before.
>>>
>>
>> It looks good.  I'll let Marcelo (or anyone else?) review it as well
>> before applying.
>>
>> Do you have performance measurements?
>>
>
> For VGA, 30-40us became 3-5us when the display was quiet, with a
>> sufficiently warmed-up guest.
>

That's a nice improvement.

> Near the criterion, the number was not different much from the
> original version.
>
> For live migration, I forgot the number but the result was good.
>> But my test cases did not cover every pattern, so I changed
> the criterion to be a bit conservative.
>
>     More tests may be able to find a better criterion.
>     I am not in a hurry about this, so it is OK to add some tests
>     before merging this.

I think we can merge it as is; it's clear we get an improvement.

>
> But what I did not like was holding the spin lock for more than 100us
> or so with the original version.  With this version, at least, the
> problem should be mitigated somewhat.

There was a patchset from Peter Zijlstra that converted mmu notifiers to
be preemptible; with that, we can convert the mmu spinlock to a mutex.
I'll see what happened to it.

There is a third method of doing write protection, and that is by
write-protecting at the higher levels of the paging hierarchy.  The
advantage there is that write protection is O(1) no matter how large the
guest is, or the number of dirty pages.

To write protect all guest memory, we just write protect the 512 PTEs at
the very top, and leave the rest alone.  When the guest writes to a
page, we allow writes for the top-level PTE that faulted, and
write-protect all the PTEs that it points to.

We can combine it with your method by having a small bitmap (say, just
64 bits) per shadow page.  Each bit represents 8 PTEs (total 512 PTEs)
and is set if any of those PTEs are writeable.
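
A minimal sketch, assuming a hypothetical wp_bitmap field in struct
kvm_mmu_page:

	/* one bit summarizes the writability of a group of 8 sptes */
	static bool group_has_writable_sptes(struct kvm_mmu_page *sp, int i)
	{
		return sp->wp_bitmap & (1ULL << (i / 8));
	}

When pushing write protection down a level, we would skip the groups
whose bit is clear instead of scanning all 512 sptes.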

>
>     Takuya
>
>
> One note:
>
> kvm-unit-tests' dirty logging test was broken for 32-bit box: compile
> error.
> I changed an "idt" to "boot_idt" and used it.
>
> I do not know kvm-unit-tests well, so I want somebody to fix that
> officially.

I'll look into it.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-14 12:39     ` Avi Kivity
@ 2011-11-16  4:28       ` Takuya Yoshikawa
  2011-11-16  9:06         ` Avi Kivity
  2011-11-16  8:17       ` Takuya Yoshikawa
  1 sibling, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-16  4:28 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, takuya.yoshikawa

(2011/11/14 21:39), Avi Kivity wrote:
> There was a patchset from Peter Zijlstra that converted mmu notifiers to
> be preemptible; with that, we can convert the mmu spinlock to a mutex.
> I'll see what happened to it.

Interesting!

> There is a third method of doing write protection, and that is by
> write-protecting at the higher levels of the paging hierarchy.  The
> advantage there is that write protection is O(1) no matter how large the
> guest is, or the number of dirty pages.
>
> To write protect all guest memory, we just write protect the 512 PTEs at
> the very top, and leave the rest alone.  When the guest writes to a
> page, we allow writes for the top-level PTE that faulted, and
> write-protect all the PTEs that it points to.

One important point is that the guest, not the GET_DIRTY_LOG caller, will
pay for the write protection at fault time.

For live migration, it may be good because we have to make the guest memory
converge anyway.

> We can combine it with your method by having a small bitmap (say, just
> 64 bits) per shadow page.  Each bit represents 8 PTEs (total 512 PTEs)
> and is set if any of those PTEs are writeable.

Yes, there seem to be some good ways to make every case work well.

	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-14 12:39     ` Avi Kivity
  2011-11-16  4:28       ` Takuya Yoshikawa
@ 2011-11-16  8:17       ` Takuya Yoshikawa
  1 sibling, 0 replies; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-16  8:17 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, takuya.yoshikawa, kvm, qemu-devel

Adding qemu-devel to Cc.

(2011/11/14 21:39), Avi Kivity wrote:
> On 11/14/2011 12:56 PM, Takuya Yoshikawa wrote:
>> (2011/11/14 19:25), Avi Kivity wrote:
>>> On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
>>>> This is a revised version of my previous work.  I hope that
>>>> the patches are more self-explanatory than before.
>>>>
>>>
>>> It looks good.  I'll let Marcelo (or anyone else?) review it as well
>>> before applying.
>>>
>>> Do you have performance measurements?
>>>
>>
>> For VGA, 30-40us became 3-5us when the display was quiet, with a
>> sufficiently warmed-up guest.
>>
>
> That's a nice improvement.
>
>> Near the criterion, the number was not different much from the
>> original version.
>>
>> For live migration, I forgot the number but the result was good.
>> But my test cases did not cover every pattern, so I changed
>> the criterion to be a bit conservative.
>>
>>      More tests may be able to find a better criterion.
>>      I am not in a hurry about this, so it is OK to add some tests
>>      before merging this.
>
> I think we can merge it as is; it's clear we get an improvement.


I did a simple test to show numbers!

Here, a 4GB guest was migrated locally while a file was being copied
inside it.


Case 1 corresponds to the original method and case 2 to the optimized one.

The small numbers are, most likely, from VGA:

	Case 1. about 30us
	Case 2. about 3us

Other numbers are from the system RAM (triggered by live migration):

	Case 1. about 500us, 2000us
	Case 2. about  80us, 2000us (not exactly averaged, see below for details)
	* The 2000us cases are when rmap was not used, so they equal case 1.

So I can say that my patch worked well for both VGA and live migration.

	Takuya


=== measurement snippet ===

Case 1. kvm_mmu_slot_remove_write_access() only (same as the original method):

  qemu-system-x86-25413 [000]  6546.215009: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.215010: funcgraph_entry:      ! 2039.512 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.217051: funcgraph_exit:       ! 2040.487 us |  }
  qemu-system-x86-25413 [002]  6546.217347: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [002]  6546.217349: funcgraph_entry:      ! 571.121 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [002]  6546.217921: funcgraph_exit:       ! 572.525 us |  }
  qemu-system-x86-25413 [000]  6546.314583: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.314585: funcgraph_entry:      + 29.598 us  |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.314616: funcgraph_exit:       + 31.053 us  |  }
  qemu-system-x86-25413 [000]  6546.314784: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.314785: funcgraph_entry:      ! 2002.591 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.316788: funcgraph_exit:       ! 2003.537 us |  }
  qemu-system-x86-25413 [000]  6546.317082: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.317083: funcgraph_entry:      ! 624.445 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.317709: funcgraph_exit:       ! 625.861 us |  }
  qemu-system-x86-25413 [000]  6546.414261: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.414263: funcgraph_entry:      + 29.593 us  |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.414293: funcgraph_exit:       + 30.944 us  |  }
  qemu-system-x86-25413 [000]  6546.414528: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.414529: funcgraph_entry:      ! 1990.363 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.416520: funcgraph_exit:       ! 1991.370 us |  }
  qemu-system-x86-25413 [000]  6546.416775: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.416776: funcgraph_entry:      ! 594.333 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.417371: funcgraph_exit:       ! 595.415 us |  }
  qemu-system-x86-25413 [000]  6546.514133: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.514135: funcgraph_entry:      + 24.032 us  |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.514160: funcgraph_exit:       + 25.074 us  |  }
  qemu-system-x86-25413 [000]  6546.514312: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.514313: funcgraph_entry:      ! 2035.365 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.516349: funcgraph_exit:       ! 2036.298 us |  }
  qemu-system-x86-25413 [000]  6546.516642: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.516643: funcgraph_entry:      ! 598.308 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.517242: funcgraph_exit:       ! 599.344 us |  }
  qemu-system-x86-25413 [000]  6546.613895: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.613897: funcgraph_entry:      + 27.765 us  |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.613926: funcgraph_exit:       + 29.051 us  |  }
  qemu-system-x86-25413 [000]  6546.614052: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.614053: funcgraph_entry:      ! 1401.083 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.615454: funcgraph_exit:       ! 1401.778 us |  }
  qemu-system-x86-25413 [000]  6546.615656: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.615657: funcgraph_entry:      ! 405.810 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.616063: funcgraph_exit:       ! 406.415 us |  }
  qemu-system-x86-25413 [001]  6546.713523: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [001]  6546.713525: funcgraph_entry:      + 33.166 us  |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [001]  6546.713559: funcgraph_exit:       + 34.644 us  |  }
  qemu-system-x86-25413 [003]  6546.713688: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [003]  6546.713691: funcgraph_entry:      ! 1872.491 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [003]  6546.715564: funcgraph_exit:       ! 1874.775 us |  }
  qemu-system-x86-25413 [001]  6546.715829: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [001]  6546.715830: funcgraph_entry:      + 25.002 us  |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [001]  6546.715856: funcgraph_exit:       + 26.177 us  |  }
  qemu-system-x86-25413 [001]  6546.715969: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [001]  6546.715970: funcgraph_entry:      ! 604.399 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [001]  6546.716575: funcgraph_exit:       ! 605.502 us |  }
  qemu-system-x86-25413 [000]  6546.813387: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.813389: funcgraph_entry:      + 32.248 us  |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.813422: funcgraph_exit:       + 33.592 us  |  }
  qemu-system-x86-25413 [000]  6546.813565: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-25413 [000]  6546.813566: funcgraph_entry:      ! 1970.585 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-25413 [000]  6546.815537: funcgraph_exit:       ! 1971.752 us |  }


Case 2. + rmap (my method):

  qemu-system-x86-7772  [000]  6096.185229: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6096.185230: funcgraph_entry:      ! 2090.108 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6096.187322: funcgraph_exit:       ! 2091.865 us |  }
  qemu-system-x86-7772  [001]  6096.187634: funcgraph_entry:        6.623 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.187765: funcgraph_entry:      ! 110.571 us |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.284811: funcgraph_entry:        2.343 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.284971: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6096.284972: funcgraph_entry:      ! 1999.656 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6096.286973: funcgraph_exit:       ! 2000.955 us |  }
  qemu-system-x86-7772  [000]  6096.287255: funcgraph_entry:      + 79.547 us  |  write_protect_slot();
  qemu-system-x86-7772  [001]  6096.384401: funcgraph_entry:        4.977 us   |  write_protect_slot();
  qemu-system-x86-7772  [001]  6096.384512: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [001]  6096.384513: funcgraph_entry:      ! 1887.579 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [001]  6096.386401: funcgraph_exit:       ! 1889.068 us |  }
  qemu-system-x86-7772  [001]  6096.386631: funcgraph_entry:      + 80.816 us  |  write_protect_slot();
  qemu-system-x86-7772  [001]  6096.484212: funcgraph_entry:        4.249 us   |  write_protect_slot();
  qemu-system-x86-7772  [001]  6096.484280: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [001]  6096.484280: funcgraph_entry:      ! 1872.626 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [001]  6096.486154: funcgraph_exit:       ! 1873.982 us |  }
  qemu-system-x86-7772  [001]  6096.486398: funcgraph_entry:      + 99.259 us  |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.584165: funcgraph_entry:        2.354 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.584450: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6096.584451: funcgraph_entry:      ! 2012.011 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6096.586464: funcgraph_exit:       ! 2013.625 us |  }
  qemu-system-x86-7772  [000]  6096.586791: funcgraph_entry:      + 74.855 us  |  write_protect_slot();
  qemu-system-x86-7772  [001]  6096.683636: funcgraph_entry:        2.386 us   |  write_protect_slot();
  qemu-system-x86-7772  [001]  6096.683749: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [001]  6096.683749: funcgraph_entry:      ! 1922.750 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [001]  6096.685674: funcgraph_exit:       ! 1924.311 us |  }
  qemu-system-x86-7772  [000]  6096.685926: funcgraph_entry:      + 76.410 us  |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.783620: funcgraph_entry:        2.195 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.783715: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6096.783716: funcgraph_entry:      ! 2110.459 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6096.785827: funcgraph_exit:       ! 2111.781 us |  }
  qemu-system-x86-7772  [000]  6096.786243: funcgraph_entry:        3.493 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.786334: funcgraph_entry:      ! 186.657 us |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.883166: funcgraph_entry:        2.289 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.883376: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6096.883376: funcgraph_entry:      ! 2031.813 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6096.885410: funcgraph_exit:       ! 2033.311 us |  }
  qemu-system-x86-7772  [000]  6096.885696: funcgraph_entry:      + 53.929 us  |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.983096: funcgraph_entry:        2.177 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6096.983288: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6096.983288: funcgraph_entry:      ! 2096.257 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6096.985386: funcgraph_exit:       ! 2097.685 us |  }
  qemu-system-x86-7772  [000]  6096.985697: funcgraph_entry:      + 78.442 us  |  write_protect_slot();
  qemu-system-x86-7772  [000]  6097.082658: funcgraph_entry:        2.286 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6097.082800: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6097.082801: funcgraph_entry:      ! 2049.809 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6097.084851: funcgraph_exit:       ! 2051.107 us |  }
  qemu-system-x86-7772  [000]  6097.085290: funcgraph_entry:      + 69.201 us  |  write_protect_slot();
  qemu-system-x86-7772  [000]  6097.182384: funcgraph_entry:        2.239 us   |  write_protect_slot();
  qemu-system-x86-7772  [000]  6097.182545: funcgraph_entry:                   |  write_protect_slot() {
  qemu-system-x86-7772  [000]  6097.182546: funcgraph_entry:      ! 2158.647 us |    kvm_mmu_slot_remove_write_access();
  qemu-system-x86-7772  [000]  6097.184706: funcgraph_exit:       ! 2160.028 us |  }

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-16  4:28       ` Takuya Yoshikawa
@ 2011-11-16  9:06         ` Avi Kivity
  2011-11-29 10:01           ` Xiao Guangrong
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2011-11-16  9:06 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, takuya.yoshikawa

On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
> (2011/11/14 21:39), Avi Kivity wrote:
>> There was a patchset from Peter Zijlstra that converted mmu notifiers to
>> be preemptible; with that, we can convert the mmu spinlock to a mutex.
>> I'll see what happened to it.
>
> Interesting!
>
>> There is a third method of doing write protection, and that is by
>> write-protecting at the higher levels of the paging hierarchy.  The
>> advantage there is that write protection is O(1) no matter how large the
>> guest is, or the number of dirty pages.
>>
>> To write protect all guest memory, we just write protect the 512 PTEs at
>> the very top, and leave the rest alone.  When the guest writes to a
>> page, we allow writes for the top-level PTE that faulted, and
>> write-protect all the PTEs that it points to.
>
> One important point is that the guest, not the GET_DIRTY_LOG caller, will
> pay for the write protection at fault time.

I don't think there is a significant difference.  The number of write
faults does not change.  The amount of work done per fault does, but not
by much, thanks to the writeable bitmap.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-14  9:20 [PATCH 0/4] KVM: Dirty logging optimization using rmap Takuya Yoshikawa
                   ` (4 preceding siblings ...)
  2011-11-14 10:25 ` [PATCH 0/4] KVM: Dirty logging optimization using rmap Avi Kivity
@ 2011-11-17  9:28 ` Avi Kivity
  5 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2011-11-17  9:28 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: mtosatti, kvm, takuya.yoshikawa

On 11/14/2011 11:20 AM, Takuya Yoshikawa wrote:
> This is a revised version of my previous work.  I hope that
> the patches are more self-explanatory than before.
>
>

Thanks, applied.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-16  9:06         ` Avi Kivity
@ 2011-11-29 10:01           ` Xiao Guangrong
  2011-11-29 10:09             ` Xiao Guangrong
  0 siblings, 1 reply; 40+ messages in thread
From: Xiao Guangrong @ 2011-11-29 10:01 UTC (permalink / raw)
  To: kvm

Avi Kivity <avi <at> redhat.com> writes:

> 
> On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
> > (2011/11/14 21:39), Avi Kivity wrote:
> >> There was a patchset from Peter Zijlstra that converted mmu notifiers to
> >> be preemptible; with that, we can convert the mmu spinlock to a mutex.
> >> I'll see what happened to it.
> >
> > Interesting!
> >
> >> There is a third method of doing write protection, and that is by
> >> write-protecting at the higher levels of the paging hierarchy.  The
> >> advantage there is that write protection is O(1) no matter how large the
> >> guest is, or the number of dirty pages.
> >>
> >> To write protect all guest memory, we just write protect the 512 PTEs at
> >> the very top, and leave the rest alone.  When the guest writes to a
> >> page, we allow writes for the top-level PTE that faulted, and
> >> write-protect all the PTEs that it points to.
> >
> > One important point is that the guest, not the GET_DIRTY_LOG caller,
> > will pay for the write protection at fault time.
> 
> I don't think there is a significant difference.  The number of write
> faults does not change.  The amount of work done per fault does, but not
> by much, thanks to the writeable bitmap.
> 

Avi,

I think this needs more thought when only a few pages need to be write
protected.

For example, with a framebuffer-based device used by Xwindow, only ~64M
worth of pages needs to be write protected, but in your way, the guest
will get write page faults on all memory? Hmm?

Are there some tricks that I missed?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 10:01           ` Xiao Guangrong
@ 2011-11-29 10:09             ` Xiao Guangrong
  2011-11-29 10:35               ` Takuya Yoshikawa
  0 siblings, 1 reply; 40+ messages in thread
From: Xiao Guangrong @ 2011-11-29 10:09 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, Takuya Yoshikawa, KVM

Sorry, CC list is lost. :(

On 11/29/2011 06:01 PM, Xiao Guangrong wrote:

> Avi Kivity <avi <at> redhat.com> writes:
> 
>>
>> On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
>>> (2011/11/14 21:39), Avi Kivity wrote:
>>>> There was a patchset from Peter Zijlstra that converted mmu notifiers to
>>>> be preemptible; with that, we can convert the mmu spinlock to a mutex.
>>>> I'll see what happened to it.
>>>
>>> Interesting!
>>>
>>>> There is a third method of doing write protection, and that is by
>>>> write-protecting at the higher levels of the paging hierarchy.  The
>>>> advantage there is that write protection is O(1) no matter how large the
>>>> guest is, or the number of dirty pages.
>>>>
>>>> To write protect all guest memory, we just write protect the 512 PTEs at
>>>> the very top, and leave the rest alone.  When the guest writes to a
>>>> page, we allow writes for the top-level PTE that faulted, and
>>>> write-protect all the PTEs that it points to.
>>>
>>> One important point is that the guest, not the GET_DIRTY_LOG caller,
>>> will pay for the write protection at fault time.
>>
>> I don't think there is a significant difference.  The number of write
>> faults does not change.  The amount of work done per fault does, but not
>> by much, thanks to the writeable bitmap.
>>
> 
> Avi,
> 
> I think this needs more thought when only a few pages need to be write
> protected.
> 
> For example, with a framebuffer-based device used by Xwindow, only ~64M
> worth of pages needs to be write protected, but in your way, the guest
> will get write page faults on all memory? Hmm?
> 
> Are there some tricks that I missed?
> 



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 10:09             ` Xiao Guangrong
@ 2011-11-29 10:35               ` Takuya Yoshikawa
  2011-11-29 11:20                 ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-29 10:35 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Avi Kivity, Marcelo Tosatti, KVM

(2011/11/29 19:09), Xiao Guangrong wrote:
> Sorry, CC list is lost. :(
>
> On 11/29/2011 06:01 PM, Xiao Guangrong wrote:
>
>>> Avi Kivity <avi <at> redhat.com> writes:
>>
>>>
>>> On 11/16/2011 06:28 AM, Takuya Yoshikawa wrote:
>>>> (2011/11/14 21:39), Avi Kivity wrote:
>>>>> There was a patchset from Peter Zijlstra that converted mmu notifiers to
>>>>> be preemptible; with that, we can convert the mmu spinlock to a mutex.
>>>>> I'll see what happened to it.
>>>>
>>>> Interesting!
>>>>
>>>>> There is a third method of doing write protection, and that is by
>>>>> write-protecting at the higher levels of the paging hierarchy.  The
>>>>> advantage there is that write protection is O(1) no matter how large the
>>>>> guest is, or the number of dirty pages.
>>>>>
>>>>> To write protect all guest memory, we just write protect the 512 PTEs at
>>>>> the very top, and leave the rest alone.  When the guest writes to a
>>>>> page, we allow writes for the top-level PTE that faulted, and
>>>>> write-protect all the PTEs that it points to.
>>>>
>>>> One important point is that the guest, not the GET_DIRTY_LOG caller,
>>>> will pay for the write protection at fault time.
>>>
>>> I don't think there is a significant difference.  The number of write
>>> faults does not change.  The amount of work done per fault does, but not
>>> by much, thanks to the writeable bitmap.
>>>
>>
>> Avi,
>>
>> I think this needs more thought when only a few pages need to be write
>> protected.
>>
>> For example, with a framebuffer-based device used by Xwindow, only ~64M
>> worth of pages needs to be write protected, but in your way, the guest
>> will get write page faults on all memory? Hmm?
>>
>> Are there some tricks that I missed?

Do you mean write protecting slot by slot is difficult in the case of O(1)?

	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 10:35               ` Takuya Yoshikawa
@ 2011-11-29 11:20                 ` Avi Kivity
  2011-11-29 11:56                   ` Xiao Guangrong
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2011-11-29 11:20 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Xiao Guangrong, Marcelo Tosatti, KVM

On 11/29/2011 12:35 PM, Takuya Yoshikawa wrote:
>>> I think this needs more thought when only a few pages need to be
>>> write protected.
>>>
>>> For example, with a framebuffer-based device used by Xwindow, only
>>> ~64M worth of pages needs to be write protected, but in your way, the
>>> guest will get write page faults on all memory? Hmm?
>>>
>>> Are there some tricks that I missed?
>
>
> Do you mean write protecting slot by slot is difficult in the case of
> O(1)?

Well, O(1) as outlined protects everything.  So if we protect a small
slot, all the others get a penalty.

We used to have a bitmap in a shadow page with a bit set for every slot
pointed to by the page.  If we extend this to non-leaf pages (so, when
we set a bit, we propagate it through its parent_ptes list), then we do
the following on write fault:

  for each level, from root towards leaf
     find index
     if pte(index) is write-protected:
        remove write protection
        if pte(index) points to a leaf
           mark page dirty
        else
           for each pte in page pointed to by pte(index):
              if it is not dirty already, and its slots bitmap
                    indicates it needs write protection:
                 write protect pte

or something.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 11:20                 ` Avi Kivity
@ 2011-11-29 11:56                   ` Xiao Guangrong
  2011-11-29 12:01                     ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Xiao Guangrong @ 2011-11-29 11:56 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Takuya Yoshikawa, Marcelo Tosatti, KVM

On 11/29/2011 07:20 PM, Avi Kivity wrote:


> We used to have a bitmap in a shadow page with a bit set for every slot
> pointed to by the page.  If we extend this to non-leaf pages (so, when
> we set a bit, we propagate it through its parent_ptes list), then we do
> the following on write fault:
> 


Thanks for the detail.

Um, propagating the slot bit to parent ptes is a little slow; in particular,
it is pure overhead for non-Xwindow guests, which are dirty logged only
during migration (I guess most Linux guests run in this mode, and migration
is not frequent). No?


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 11:56                   ` Xiao Guangrong
@ 2011-11-29 12:01                     ` Avi Kivity
  2011-11-29 14:03                       ` Avi Kivity
  2011-11-30  7:03                       ` Xiao Guangrong
  0 siblings, 2 replies; 40+ messages in thread
From: Avi Kivity @ 2011-11-29 12:01 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Takuya Yoshikawa, Marcelo Tosatti, KVM

On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>
>
> > We used to have a bitmap in a shadow page with a bit set for every slot
> > pointed to by the page.  If we extend this to non-leaf pages (so, when
> > we set a bit, we propagate it through its parent_ptes list), then we do
> > the following on write fault:
> > 
>
>
> Thanks for the detail.
>
> Um, propagating the slot bit to parent ptes is a little slow; in particular,
> it is pure overhead for non-Xwindow guests, which are dirty logged only
> during migration (I guess most Linux guests run in this mode, and migration
> is not frequent). No?

You need to propagate very infrequently.  The first pte added to a page
will need to propagate, but the second (if from the same slot, which is
likely) will already have the bit set in the page, so we're assured it's
set in all its parents.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 12:01                     ` Avi Kivity
@ 2011-11-29 14:03                       ` Avi Kivity
  2011-11-30  5:02                         ` Takuya Yoshikawa
  2011-11-30  7:10                         ` Xiao Guangrong
  2011-11-30  7:03                       ` Xiao Guangrong
  1 sibling, 2 replies; 40+ messages in thread
From: Avi Kivity @ 2011-11-29 14:03 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Takuya Yoshikawa, Marcelo Tosatti, KVM

On 11/29/2011 02:01 PM, Avi Kivity wrote:
> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
> > On 11/29/2011 07:20 PM, Avi Kivity wrote:
> >
> >
> > > We used to have a bitmap in a shadow page with a bit set for every slot
> > > pointed to by the page.  If we extend this to non-leaf pages (so, when
> > > we set a bit, we propagate it through its parent_ptes list), then we do
> > > the following on write fault:
> > > 
> >
> >
> > Thanks for the detail.
> >
> > Um, propagating the slot bit to parent ptes is a little slow; in particular,
> > it is pure overhead for non-Xwindow guests, which are dirty logged only
> > during migration (I guess most Linux guests run in this mode, and migration
> > is not frequent). No?
>
> You need to propagate very infrequently.  The first pte added to a page
> will need to propagate, but the second (if from the same slot, which is
> likely) will already have the bit set in the page, so we're assured it's
> set in all its parents.

btw, if you plan to work on this, let's agree on pseudocode/data
structures first to minimize churn.  I'll also want this documented in
mmu.txt.  Of course we can still end up with something different than
planned, but let's at least try to think of the issues in advance.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 14:03                       ` Avi Kivity
@ 2011-11-30  5:02                         ` Takuya Yoshikawa
  2011-11-30  5:15                           ` Takuya Yoshikawa
  2011-11-30  7:10                         ` Xiao Guangrong
  1 sibling, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-30  5:02 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Xiao Guangrong, Marcelo Tosatti, KVM, quintela, qemu-devel,
	Takuya Yoshikawa

CCing qemu devel, Juan,

(2011/11/29 23:03), Avi Kivity wrote:
> On 11/29/2011 02:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>
>>>
>>>> We used to have a bitmap in a shadow page with a bit set for every slot
>>>> pointed to by the page.  If we extend this to non-leaf pages (so, when
>>>> we set a bit, we propagate it through its parent_ptes list), then we do
>>>> the following on write fault:
>>>>
>>>
>>>
>>> Thanks for the detail.
>>>
>>> Um, propagating the slot bit to parent ptes is a little slow; in particular,
>>> it is pure overhead for non-Xwindow guests, which are dirty logged only
>>> during migration (I guess most Linux guests run in this mode, and migration
>>> is not frequent). No?
>>
>> You need to propagate very infrequently.  The first pte added to a page
>> will need to propagate, but the second (if from the same slot, which is
>> likely) will already have the bit set in the page, so we're assured it's
>> set in all its parents.
>
> btw, if you plan to work on this, let's agree on pseudocode/data
> structures first to minimize churn.  I'll also want this documented in
> mmu.txt.  Of course we can still end up with something different than
> planned, but let's at least try to think of the issues in advance.
>

I want to hear the overall view as well.

Now we are trying to improve the case where there are too many dirty pages
during live migration.

I did some measurements of live migration some months ago on a dedicated
10Gbps line, with two servers directly connected, and confirmed that
transferring even a few MBs of memory took latency on the order of
milliseconds, even after excluding other QEMU-side overheads: this matches
a simple back-of-the-envelope calculation.

In another test, I found that even a relatively normal workload needed a
few seconds of pause at the final stage.

	Juan, do you have more data?

So, the current scheme is not scalable with respect to the number of dirty
pages, and administrators should arrange not to migrate during such
workloads if possible.

	Server consolidation at night will be OK, but dynamic load balancing
	may not work well under such restrictions: I am now more interested
	in the former.

With that in mind, I set the goal at 1K dirty pages, i.e. 4MB of memory,
when I did the rmap optimization.  Now it takes a few ms or so to write
protect that number of pages, IIRC: that is not so bad compared to the
overall latency?


So, though I like the O(1) method, I want to hear about the expected
improvements in a bit more detail, if possible.

IIUC, even though O(1) is O(1) at the time of GET DIRTY LOG, it still needs
O(N) write protections with respect to the total number of dirty pages:
the cost is distributed, but each page fault that should be logged still
does some write protection?

In general, what kind of improvements are actually needed for live migration?

Thanks,
	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-30  5:02                         ` Takuya Yoshikawa
@ 2011-11-30  5:15                           ` Takuya Yoshikawa
  2011-12-01 15:18                             ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-11-30  5:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Xiao Guangrong, Marcelo Tosatti, KVM, quintela, qemu-devel,
	Takuya Yoshikawa

(2011/11/30 14:02), Takuya Yoshikawa wrote:

> IIUC, even though O(1) is O(1) at the time of GET DIRTY LOG, it still needs
> O(N) write protections with respect to the total number of dirty pages:
> the cost is distributed, but each page fault that should be logged still
> does some write protection?

Sorry, I was not precise.  It depends on the level, and it is not completely
distributed.  But I think it is O(N), and the total cost will not change so
much, I guess.

	Takuya

>
> In general, what kind of improvements are actually needed for live migration?
>
> Thanks,
> Takuya


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 12:01                     ` Avi Kivity
  2011-11-29 14:03                       ` Avi Kivity
@ 2011-11-30  7:03                       ` Xiao Guangrong
  2011-12-01 15:11                         ` Avi Kivity
  1 sibling, 1 reply; 40+ messages in thread
From: Xiao Guangrong @ 2011-11-30  7:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Takuya Yoshikawa, Marcelo Tosatti, KVM

On 11/29/2011 08:01 PM, Avi Kivity wrote:

> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>
>>
>>> We used to have a bitmap in a shadow page with a bit set for every slot
>>> pointed to by the page.  If we extend this to non-leaf pages (so, when
>>> we set a bit, we propagate it through its parent_ptes list), then we do
>>> the following on write fault:
>>>
>>
>>
>> Thanks for the detail.
>>
>> Um, propagating the slot bit to parent ptes is a little slow; in particular,
>> it is pure overhead for non-Xwindow guests, which are dirty logged only
>> during migration (I guess most Linux guests run in this mode, and migration
>> is not frequent). No?
> 
> You need to propagate very infrequently.  The first pte added to a page
> will need to propagate, but the second (if from the same slot, which is
> likely) will already have the bit set in the page, so we're assured it's
> set in all its parents.
> 


What will happen if a guest page is unmapped or a shadow page is zapped?
It should immediately clear the "slot" bit of the shadow page and its
parents; that means it has to propagate this "clear slot bit" event to all
parents, in the softmmu case.  Zapping shadow pages happens frequently, so
maybe that is unacceptable?

It is not like the "unsync" bit, which can be lazily cleared, because all
bits in the hierarchy can be cleared on a cr3 reload.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-29 14:03                       ` Avi Kivity
  2011-11-30  5:02                         ` Takuya Yoshikawa
@ 2011-11-30  7:10                         ` Xiao Guangrong
  1 sibling, 0 replies; 40+ messages in thread
From: Xiao Guangrong @ 2011-11-30  7:10 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Takuya Yoshikawa, Marcelo Tosatti, KVM

On 11/29/2011 10:03 PM, Avi Kivity wrote:

> On 11/29/2011 02:01 PM, Avi Kivity wrote:
>> On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
>>> On 11/29/2011 07:20 PM, Avi Kivity wrote:
>>>
>>>
>>>> We used to have a bitmap in a shadow page with a bit set for every slot
>>>> pointed to by the page.  If we extend this to non-leaf pages (so, when
>>>> we set a bit, we propagate it through its parent_ptes list), then we do
>>>> the following on write fault:
>>>>
>>>
>>>
>>> Thanks for the detail.
>>>
>>> Um, propagating the slot bit to parent ptes is a little slow; in particular,
>>> it is pure overhead for non-Xwindow guests, which are dirty logged only
>>> during migration (I guess most Linux guests run in this mode, and migration
>>> is not frequent). No?
>>
>> You need to propagate very infrequently.  The first pte added to a page
>> will need to propagate, but the second (if from the same slot, which is
>> likely) will already have the bit set in the page, so we're assured it's
>> set in all its parents.
> 
> btw, if you plan to work on this, let's agree on pseudocode/data
> structures first to minimize churn.  I'll also want this documented in
> mmu.txt.  Of course we can still end up with something different than
> planned, but let's at least try to think of the issues in advance.
> 


Yeap, this work is interesting; I will keep researching it. ;)


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-30  7:03                       ` Xiao Guangrong
@ 2011-12-01 15:11                         ` Avi Kivity
  0 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2011-12-01 15:11 UTC (permalink / raw)
  To: Xiao Guangrong; +Cc: Takuya Yoshikawa, Marcelo Tosatti, KVM

On 11/30/2011 09:03 AM, Xiao Guangrong wrote:
> On 11/29/2011 08:01 PM, Avi Kivity wrote:
>
> > On 11/29/2011 01:56 PM, Xiao Guangrong wrote:
> >> On 11/29/2011 07:20 PM, Avi Kivity wrote:
> >>
> >>
> >>> We used to have a bitmap in a shadow page with a bit set for every slot
> >>> pointed to by the page.  If we extend this to non-leaf pages (so, when
> >>> we set a bit, we propagate it through its parent_ptes list), then we do
> >>> the following on write fault:
> >>>
> >>
> >>
> >> Thanks for the detail.
> >>
> >> Um, propagating the slot bit to parent ptes is a little slow; in
> >> particular, it is pure overhead for non-Xwindow guests, which are dirty
> >> logged only during migration (I guess most Linux guests run in this
> >> mode, and migration is not frequent). No?
> > 
> > You need to propagate very infrequently.  The first pte added to a page
> > will need to propagate, but the second (if from the same slot, which is
> > likely) will already have the bit set in the page, so we're assured it's
> > set in all its parents.
> > 
>
>
> What will happen if a guest page is unmapped or a shadow page is zapped?
> It should immediately clear the "slot" bit of the shadow page and its
> parents; that means it has to propagate this "clear slot bit" event to all
> parents, in the softmmu case.  Zapping shadow pages happens frequently, so
> maybe that is unacceptable?

You can keep the bit set.  A clear bit means there are exactly zero
pages from the slot in this mmu page or its descendants.  A set bit
means there are zero or more pages.  If we have a bit accidentally set,
nothing bad happens.
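
In other words, the zap and unmap paths can leave the bitmap alone; only
the fault-time walk consults it, and a stale set bit merely costs one
redundant write-protect pass over that subtree.  A sketch of the fault-side
test, with hypothetical helper names:

	/*
	 * Conservative test: a clear bit proves this subtree maps no
	 * pages from the slot; a set bit may be stale, which is harmless.
	 */
	if (test_bit(slot_id, sp->slot_bitmap))
		write_protect_children(kvm, sp, slot_id);  /* illustrative */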

> It is not like the "unsync" bit, which can be lazily cleared, because all
> bits in the hierarchy can be cleared on a cr3 reload.

With tdp (and without nested virt) the mappings never change anyway. 
With shadow, they do change.  Not sure how to handle that at the higher
levels.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-11-30  5:15                           ` Takuya Yoshikawa
@ 2011-12-01 15:18                             ` Avi Kivity
  2011-12-03  4:37                               ` Takuya Yoshikawa
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2011-12-01 15:18 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: Xiao Guangrong, Marcelo Tosatti, KVM, quintela, qemu-devel,
	Takuya Yoshikawa

On 11/30/2011 07:15 AM, Takuya Yoshikawa wrote:
> (2011/11/30 14:02), Takuya Yoshikawa wrote:
>
>> IIUC, even though O(1) is O(1) at the time of GET DIRTY LOG, it still
>> needs O(N) write protections with respect to the total number of dirty
>> pages: the cost is distributed, but each page fault that should be
>> logged still does some write protection?
>
> Sorry, I was not precise.  It depends on the level, and it is not
> completely distributed.  But I think it is O(N), and the total cost will
> not change so much, I guess.

That's true.  But some applications do require low latency, and the
current code can impose a lot of time with the mmu spinlock held.

The total amount of work actually increases slightly, from O(N) to O(N
log N), but since the tree is so wide, the overhead is small.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-12-01 15:18                             ` Avi Kivity
@ 2011-12-03  4:37                               ` Takuya Yoshikawa
  2011-12-04 10:20                                 ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-12-03  4:37 UTC (permalink / raw)
  To: Avi Kivity
  Cc: KVM, quintela, Marcelo Tosatti, qemu-devel, Xiao Guangrong,
	Takuya Yoshikawa

Avi Kivity <avi@redhat.com> wrote:
> That's true.  But some applications do require low latency, and the
> current code can impose a lot of time with the mmu spinlock held.
> 
> The total amount of work actually increases slightly, from O(N) to O(N
> log N), but since the tree is so wide, the overhead is small.
> 

Controlling the latency can be achieved by letting user space limit the
number of dirty pages to scan, without hacking the core mmu code.

	The fact that we cannot transfer so many pages on the network at
	once suggests this is reasonable.

With the rmap write protection method in KVM, the only thing we need is
a new GET_DIRTY_LOG api which takes a [gfn_start, gfn_end] range to scan,
or optionally max_write_protections; see the sketch below.

	I remember that someone suggested splitting the slot at KVM forum.
	Same effect with less effort.
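
A minimal sketch of what such an api could look like -- the struct and
field names below are hypothetical, not an existing KVM ABI:

	/* Range-limited variant of KVM_GET_DIRTY_LOG (illustrative). */
	struct kvm_dirty_log_range {
		__u32 slot;
		__u32 pad;
		__u64 first_gfn;		/* start of range to scan */
		__u64 num_pages;		/* length of range in pages */
		__u64 max_write_protections;	/* 0 = no limit */
		union {
			void __user *dirty_bitmap; /* out: one bit per page */
			__u64 padding;
		};
	};

User space could then walk a big slot in chunks, transferring each chunk's
dirty pages before asking for the next, which bounds both the mmu_lock hold
time and the window between sampling and transfer.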

QEMU can also avoid unwanted page faults by using this api wisely.

	E.g. you can use this for the "Interactivity improvements" TODO on
	the KVM wiki, I think.

Furthermore, QEMU may be able to use multiple threads for the memory
copy task.

	Each thread has its own range of memory to copy, and does
	GET_DIRTY_LOG independently.  This will make it easy to add
	further optimizations in QEMU.

In summary, my impression is that the main cause of the current latency
problem is not KVM's write protection but the strategy that tries to
handle the whole large slot in one go.

What do you think?

	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 0/4] KVM: Dirty logging optimization using rmap
  2011-12-03  4:37                               ` Takuya Yoshikawa
@ 2011-12-04 10:20                                 ` Avi Kivity
  0 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2011-12-04 10:20 UTC (permalink / raw)
  To: Takuya Yoshikawa
  Cc: KVM, quintela, Marcelo Tosatti, qemu-devel, Xiao Guangrong,
	Takuya Yoshikawa

On 12/03/2011 06:37 AM, Takuya Yoshikawa wrote:
> Avi Kivity <avi@redhat.com> wrote:
> > That's true.  But some applications do require low latency, and the
> > current code can impose a lot of time with the mmu spinlock held.
> > 
> > The total amount of work actually increases slightly, from O(N) to O(N
> > log N), but since the tree is so wide, the overhead is small.
> > 
>
> Controlling the latency can be achieved by letting user space limit the
> number of dirty pages to scan, without hacking the core mmu code.
>
> 	The fact that we cannot transfer so many pages on the network at
> 	once suggests this is reasonable.

That is true.  Write protecting everything at once means that there is a
large window between sampling the dirty log and transferring the
page.  Any writes within that window cause a re-transfer, even when they
should not.

>
> With the rmap write protection method in KVM, the only thing we need is
> a new GET_DIRTY_LOG api which takes a [gfn_start, gfn_end] range to scan,
> or optionally max_write_protections.

Right.

>
> 	I remember that someone suggested splitting the slot at KVM forum.
> 	Same effect with less effort.
>
> QEMU can also avoid unwanted page faults by using this api wisely.
>
> 	E.g. you can use this for the "Interactivity improvements" TODO on
> 	the KVM wiki, I think.
>
> Furthermore, QEMU may be able to use multiple threads for the memory
> copy task.
>
> 	Each thread has its own range of memory to copy, and does
> 	GET_DIRTY_LOG independently.  This will make it easy to add
> 	further optimizations in QEMU.
>
> In summary, my impression is that the main cause of the current latency
> problem is not KVM's write protection but the strategy that tries to
> handle the whole large slot in one go.
>
> What do you think?

I agree.  Maybe O(1) write protection has a place, but it is secondary
to fine-grained dirty logging, and if we implement it, it should be
after your idea, and further measurements.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-11-14 10:07   ` Avi Kivity
  2011-11-14 10:24     ` Avi Kivity
@ 2011-12-20  4:29     ` Takuya Yoshikawa
  2011-12-23 11:14       ` Marcelo Tosatti
  1 sibling, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-12-20  4:29 UTC (permalink / raw)
  To: Avi Kivity; +Cc: mtosatti, kvm, takuya.yoshikawa

(2011/11/14 19:07), Avi Kivity wrote:
>> @@ -1491,7 +1492,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
>>   	if (memslot && memslot->dirty_bitmap) {
>>   		unsigned long rel_gfn = gfn - memslot->base_gfn;
>>
>> -		__set_bit_le(rel_gfn, memslot->dirty_bitmap);
>> +		if (!__test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap))
>> +			memslot->nr_dirty_pages++;
>>   	}
>>   }
>>

[snip]

> btw mark_page_dirty() itself seems to assume mmu_lock protection that
> doesn't exist.  Marcelo?
>

I want to hear the answer to this question.

Though I am reading the code myself, I cannot understand it thoroughly yet.
I wish there were an mmu_lock entry in locking.txt ...

	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-20  4:29     ` Takuya Yoshikawa
@ 2011-12-23 11:14       ` Marcelo Tosatti
  2011-12-24  2:52         ` Takuya Yoshikawa
  0 siblings, 1 reply; 40+ messages in thread
From: Marcelo Tosatti @ 2011-12-23 11:14 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Avi Kivity, kvm, takuya.yoshikawa

On Tue, Dec 20, 2011 at 01:29:35PM +0900, Takuya Yoshikawa wrote:
> (2011/11/14 19:07), Avi Kivity wrote:
> >>@@ -1491,7 +1492,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot,
> >>  	if (memslot && memslot->dirty_bitmap) {
> >>  		unsigned long rel_gfn = gfn - memslot->base_gfn;
> >>
> >>-		__set_bit_le(rel_gfn, memslot->dirty_bitmap);
> >>+		if (!__test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap))
> >>+			memslot->nr_dirty_pages++;
> >>  	}
> >>  }
> >>
> 
> [snip]
> 
> >btw mark_page_dirty() itself seems to assume mmu_lock protection that
> >doesn't exist.  Marcelo?
> >

Not mmu_lock protection, kvm->srcu protection.

> I want to hear the answer to this question.
> 
> Though I am reading the code myself, I cannot understand it thoroughly yet.
> I wish there were an mmu_lock entry in locking.txt ...

Agreed.


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-23 11:14       ` Marcelo Tosatti
@ 2011-12-24  2:52         ` Takuya Yoshikawa
  2011-12-27 13:50           ` Marcelo Tosatti
  0 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-12-24  2:52 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Takuya Yoshikawa, Avi Kivity, kvm

On Fri, 23 Dec 2011 09:14:51 -0200
Marcelo Tosatti <mtosatti@redhat.com> wrote:

> > >btw mark_page_dirty() itself seems to assume mmu_lock protection that
> > >doesn't exist.  Marcelo?
> > >
> 
> Not mmu_lock protection, kvm->srcu protection.

But that just protects slot readers against updates; two or more threads
can still call mark_page_dirty() concurrently?

What I am worrying about here is the atomicity of bitmap updates.

	commit c8240bd6f0b4b1b21ffd36dd44114d05c7afe0c0
	Author: Alexander Graf <agraf@suse.de>
	Date:   Fri Oct 30 05:47:26 2009 +0000

		Use Little Endian for Dirty Bitmap

changed set_bit() to the non-atomic version, and nothing protects the
dirty bits if mmu_lock is not held.

	The changelog has no explanation of why using the non-atomic
	version is safe.  A comment in the code may be worthwhile if it
	really is safe.

I want to see some clear reasoning now if possible.

	Takuya

> 
> > I want to hear the answer to this question.
> > 
> > Though I am reading the code myself, I cannot understand it thoroughly yet.
> > I wish there were an mmu_lock entry in locking.txt ...
> 
> Agreed.
> 

-- 
Takuya Yoshikawa <takuya.yoshikawa@gmail.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-24  2:52         ` Takuya Yoshikawa
@ 2011-12-27 13:50           ` Marcelo Tosatti
  2011-12-27 14:03             ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Marcelo Tosatti @ 2011-12-27 13:50 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Takuya Yoshikawa, Avi Kivity, kvm

On Sat, Dec 24, 2011 at 11:52:51AM +0900, Takuya Yoshikawa wrote:
> On Fri, 23 Dec 2011 09:14:51 -0200
> Marcelo Tosatti <mtosatti@redhat.com> wrote:
> 
> > > >btw mark_page_dirty() itself seems to assume mmu_lock protection that
> > > >doesn't exist.  Marcelo?
> > > >
> > 
> > Not mmu_lock protection, kvm->srcu protection.
> 
> But that just protects slot readers against updates; two or more threads
> can still call mark_page_dirty() concurrently?

Yes, two or more threads can call mark_page_dirty() concurrently.

> What I am worrying about here is the atomicity of bitmap updates.
> 
> 	commit c8240bd6f0b4b1b21ffd36dd44114d05c7afe0c0
> 	Author: Alexander Graf <agraf@suse.de>
> 	Date:   Fri Oct 30 05:47:26 2009 +0000
> 
> 		Use Little Endian for Dirty Bitmap
> 
> changed set_bit() to the non-atomic version, and nothing protects the
> dirty bits if mmu_lock is not held.
> 
> 	The changelog has no explanation of why using the non-atomic
> 	version is safe.  A comment in the code may be worthwhile if it
> 	really is safe.

Atomicity of updates to memslot->dirty_bitmap should not be necessary,
because of RCU updates to the memslot pointer (note memslot =
gfn_to_memslot(kvm, gfn); in mark_page_dirty).

The order is:

- update page data.
- grab memslot pointer.
- set page bit in memslot.
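
A rough sketch of that order, in mark_page_dirty()-like code, with the
srcu read-side critical section implicit in the caller:

	/* 1. the guest write has already updated the page contents */
	memslot = gfn_to_memslot(kvm, gfn);	/* 2. rcu-protected lookup */
	if (memslot && memslot->dirty_bitmap)	/* 3. mark in that slot */
		__set_bit_le(gfn - memslot->base_gfn, memslot->dirty_bitmap);

so the dirty bit lands in the memslot that is current at the time of the
write.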

> I want to see some clear reasoning now if possible.

The reasoning is stated above, but reviewing the code to make sure it is
correct is not a bad idea.

> 	Takuya
> 
> > 
> > > I want to hear the answer to this question.
> > > 
> > > Though I am reading the code myself, I cannot understand it thoroughly yet.
> > > I wish there were an mmu_lock entry in locking.txt ...
> > 
> > Agreed.
> > 
> 
> -- 
> Takuya Yoshikawa <takuya.yoshikawa@gmail.com>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-27 13:50           ` Marcelo Tosatti
@ 2011-12-27 14:03             ` Avi Kivity
  2011-12-27 15:03               ` Takuya Yoshikawa
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2011-12-27 14:03 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Takuya Yoshikawa, Takuya Yoshikawa, kvm

On 12/27/2011 03:50 PM, Marcelo Tosatti wrote:
> On Sat, Dec 24, 2011 at 11:52:51AM +0900, Takuya Yoshikawa wrote:
> > On Fri, 23 Dec 2011 09:14:51 -0200
> > Marcelo Tosatti <mtosatti@redhat.com> wrote:
> > 
> > > > >btw mark_page_dirty() itself seems to assume mmu_lock protection that
> > > > >doesn't exist.  Marcelo?
> > > > >
> > > 
> > > Not mmu_lock protection, kvm->srcu protection.
> > 
> > But that just protects slot readers against updates; two or more threads
> > can still call mark_page_dirty() concurrently?
>
> Yes, two or more threads can call mark_page_dirty() concurrently.
>
> > What I am worrying about here is the atomicity of bitmap updates.
> > 
> > 	commit c8240bd6f0b4b1b21ffd36dd44114d05c7afe0c0
> > 	Author: Alexander Graf <agraf@suse.de>
> > 	Date:   Fri Oct 30 05:47:26 2009 +0000
> > 
> > 		Use Little Endian for Dirty Bitmap
> > 
> > changed set_bit() to the non-atomic version, and nothing protects the
> > dirty bits if mmu_lock is not held.
> > 
> > 	The changelog has no explanation of why using the non-atomic
> > 	version is safe.  A comment in the code may be worthwhile if it
> > 	really is safe.
>
> Atomicity of updates to memslot->dirty_bitmap should not be necessary,
> because of RCU updates to the memslot pointer (note memslot =
> gfn_to_memslot(kvm, gfn); in mark_page_dirty).
>
> The order is:
>
> - update page data.
> - grab memslot pointer.
> - set page bit in memslot.
>

What if both grab the same memslot pointer, but set different bits?

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-27 14:03             ` Avi Kivity
@ 2011-12-27 15:03               ` Takuya Yoshikawa
  2011-12-27 15:06                 ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-12-27 15:03 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Takuya Yoshikawa, kvm

Avi Kivity <avi@redhat.com> wrote:

> > Atomicity of updates to memslot->dirty_bitmap should not be necessary,
> > because of RCU updates to the memslot pointer (note memslot =
> > gfn_to_memslot(kvm, gfn); in mark_page_dirty).
> >
> > The order is:
> >
> > - update page data.
> > - grab memslot pointer.
> > - set page bit in memslot.
> >
> 
> What if both grab the same memslot pointer, but set different bits?

(Sorry for my bad wording, that is what I wanted to explain.)

There seem to be possible concurrent __set_bit()'s to the same word,
which will result in bit loss.
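
A minimal illustration of the lost update, assuming two vcpus dirty two
gfns whose bits fall into the same bitmap word:

	/*
	 * Both vcpus read the word, OR in different bits, and write it
	 * back; whichever store lands last overwrites the other:
	 *
	 *   vcpu0: old = *word;             vcpu1: old = *word;
	 *   vcpu0: *word = old | BIT(0);
	 *                                   vcpu1: *word = old | BIT(1);
	 *
	 * and BIT(0) is lost.  An atomic read-modify-write, e.g.
	 * test_and_set_bit_le(), would avoid the loss (the new
	 * nr_dirty_pages counter would still need its own protection).
	 */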

I was thinking of changing the kvm-unit-tests' dirty log test to write
to two consecutive pages concurrently.  Would that be good for debugging
this?

	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-27 15:03               ` Takuya Yoshikawa
@ 2011-12-27 15:06                 ` Avi Kivity
  2011-12-27 15:15                   ` Takuya Yoshikawa
  0 siblings, 1 reply; 40+ messages in thread
From: Avi Kivity @ 2011-12-27 15:06 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Marcelo Tosatti, Takuya Yoshikawa, kvm

On 12/27/2011 05:03 PM, Takuya Yoshikawa wrote:
> Avi Kivity <avi@redhat.com> wrote:
>
> > > Atomicity of updates to memslot->dirty_bitmap should not be necessary,
> > > because of RCU updates to the memslot pointer (note memslot =
> > > gfn_to_memslot(kvm, gfn); in mark_page_dirty).
> > >
> > > The order is:
> > >
> > > - update page data.
> > > - grab memslot pointer.
> > > - set page bit in memslot.
> > >
> > 
> > What if both grab the same memslot pointer, but set different bits?
>
> (Sorry for my bad wording, that is what I wanted to explain.)
>
> There seem to be possible concurrent __set_bit()'s to the same word,
> which will result in bit loss.
>
> I was thinking of changing the kvm-unit-tests' dirty log test to write
> to two consecutive pages concurrently.  Would that be good for debugging this?

Yes, it already tests some concurrency issues.  Please do write the
test; we should do this a lot more often.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-27 15:06                 ` Avi Kivity
@ 2011-12-27 15:15                   ` Takuya Yoshikawa
  2011-12-27 15:18                     ` Avi Kivity
  0 siblings, 1 reply; 40+ messages in thread
From: Takuya Yoshikawa @ 2011-12-27 15:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Marcelo Tosatti, Takuya Yoshikawa, kvm

On Tue, 27 Dec 2011 17:06:47 +0200
Avi Kivity <avi@redhat.com> wrote:

> > There seem to be possible concurrent __set_bit()'s to the same word,
> > which will result in bit loss.
> >
> > I was thinking of changing the kvm-unit-tests' dirty log test to write
> > to two consecutive pages concurrently.  Would that be good for debugging this?
> 
> Yes, it already tests some concurrency issues.  Please do write the
> test; we should do this a lot more often.
> 

OK, I will write the test.

If I can find a bit loss that way, I will send a fix patch: this is an
important issue for my current work.

	Takuya

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging
  2011-12-27 15:15                   ` Takuya Yoshikawa
@ 2011-12-27 15:18                     ` Avi Kivity
  0 siblings, 0 replies; 40+ messages in thread
From: Avi Kivity @ 2011-12-27 15:18 UTC (permalink / raw)
  To: Takuya Yoshikawa; +Cc: Marcelo Tosatti, Takuya Yoshikawa, kvm

On 12/27/2011 05:15 PM, Takuya Yoshikawa wrote:
> On Tue, 27 Dec 2011 17:06:47 +0200
> Avi Kivity <avi@redhat.com> wrote:
>
> > > There seem to be possible concurrent __set_bit()'s to the same word,
> > > which will result in bit loss.
> > >
> > > I was thinking of changing the kvm-unit-tests' dirty log test to write
> > > to two consecutive pages concurrently.  Would that be good for debugging this?
> > 
> > Yes, it already tests some concurrency issues.  Please do write the
> > test; we should do this a lot more often.
> > 
>
> OK, I will write the test.
>
> If I can find a bit loss that way, I will send a fix patch: this is an
> important issue for my current work.

I'm pretty sure that the issue is real; if the test doesn't find an
error, it's probably not trying hard enough.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2011-12-27 15:18 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-11-14  9:20 [PATCH 0/4] KVM: Dirty logging optimization using rmap Takuya Yoshikawa
2011-11-14  9:21 ` [PATCH 1/4] KVM: MMU: Clean up BUG_ON() conditions in rmap_write_protect() Takuya Yoshikawa
2011-11-14  9:22 ` [PATCH 2/4] KVM: MMU: Split gfn_to_rmap() into two functions Takuya Yoshikawa
2011-11-14  9:23 ` [PATCH 3/4] KVM: Count the number of dirty pages for dirty logging Takuya Yoshikawa
2011-11-14 10:07   ` Avi Kivity
2011-11-14 10:24     ` Avi Kivity
2011-12-20  4:29     ` Takuya Yoshikawa
2011-12-23 11:14       ` Marcelo Tosatti
2011-12-24  2:52         ` Takuya Yoshikawa
2011-12-27 13:50           ` Marcelo Tosatti
2011-12-27 14:03             ` Avi Kivity
2011-12-27 15:03               ` Takuya Yoshikawa
2011-12-27 15:06                 ` Avi Kivity
2011-12-27 15:15                   ` Takuya Yoshikawa
2011-12-27 15:18                     ` Avi Kivity
2011-11-14  9:24 ` [PATCH 4/4] KVM: Optimize dirty logging by rmap_write_protect() Takuya Yoshikawa
2011-11-14 10:22   ` Avi Kivity
2011-11-14 10:29     ` Takuya Yoshikawa
2011-11-14 10:25 ` [PATCH 0/4] KVM: Dirty logging optimization using rmap Avi Kivity
2011-11-14 10:56   ` Takuya Yoshikawa
2011-11-14 12:39     ` Avi Kivity
2011-11-16  4:28       ` Takuya Yoshikawa
2011-11-16  9:06         ` Avi Kivity
2011-11-29 10:01           ` Xiao Guangrong
2011-11-29 10:09             ` Xiao Guangrong
2011-11-29 10:35               ` Takuya Yoshikawa
2011-11-29 11:20                 ` Avi Kivity
2011-11-29 11:56                   ` Xiao Guangrong
2011-11-29 12:01                     ` Avi Kivity
2011-11-29 14:03                       ` Avi Kivity
2011-11-30  5:02                         ` Takuya Yoshikawa
2011-11-30  5:15                           ` Takuya Yoshikawa
2011-12-01 15:18                             ` Avi Kivity
2011-12-03  4:37                               ` Takuya Yoshikawa
2011-12-04 10:20                                 ` Avi Kivity
2011-11-30  7:10                         ` Xiao Guangrong
2011-11-30  7:03                       ` Xiao Guangrong
2011-12-01 15:11                         ` Avi Kivity
2011-11-16  8:17       ` Takuya Yoshikawa
2011-11-17  9:28 ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).