kvm.vger.kernel.org archive mirror
* [PATCH v1 0/2] KVM: s390: fix storage attributes migration
@ 2018-01-15 17:03 Claudio Imbrenda
  2018-01-15 17:03 ` [PATCH v1 1/2] KVM: s390x: some utility functions for migration Claudio Imbrenda
  2018-01-15 17:03 ` [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots Claudio Imbrenda
  0 siblings, 2 replies; 10+ messages in thread
From: Claudio Imbrenda @ 2018-01-15 17:03 UTC (permalink / raw)
  To: kvm; +Cc: borntraeger, cohuck, pbonzini, david, schwidefsky

This patchset fixes the migration of storage attributes.

Each memory slot has a dirty bitmap with two bits per page, used to keep
track of dirty pages during migration. Only one of the two bits is
actually used; the other would otherwise be wasted. With this patch, the
second bit is used to keep track of dirty storage attributes. This means
that we no longer need to allocate and manage an additional bitmap, and no
extra work is needed to keep track of memory slots. The code does get a
little more complicated when accounting for the memory slots, but overall
it should be more robust.
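
In essence, the storage attribute (CMMA) bitmap of a memslot lives right
after its regular dirty bitmap. A minimal sketch of how it is located
(this mirrors the helper introduced in patch 1; the name is only
illustrative):

	/* the CMMA bitmap starts right after the per-memslot dirty bitmap */
	static inline unsigned long *cmma_bitmap(struct kvm_memory_slot *ms)
	{
		return ms->dirty_bitmap +
		       kvm_dirty_bitmap_bytes(ms) / sizeof(*ms->dirty_bitmap);
	}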

The first patch simply introduces two auxiliary functions, while the
second patch does the actual job.

Claudio Imbrenda (2):
  KVM: s390x: some utility functions for migration
  KVM: s390: Fix storage attributes migration with memory slots

 arch/s390/include/asm/kvm_host.h |   9 +-
 arch/s390/kvm/kvm-s390.c         | 286 +++++++++++++++++++++++++--------------
 arch/s390/kvm/priv.c             |  33 +++--
 3 files changed, 208 insertions(+), 120 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v1 1/2] KVM: s390x: some utility functions for migration
  2018-01-15 17:03 [PATCH v1 0/2] KVM: s390: fix storage attributes migration Claudio Imbrenda
@ 2018-01-15 17:03 ` Claudio Imbrenda
  2018-01-16 18:03   ` David Hildenbrand
  2018-01-15 17:03 ` [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots Claudio Imbrenda
  1 sibling, 1 reply; 10+ messages in thread
From: Claudio Imbrenda @ 2018-01-15 17:03 UTC (permalink / raw)
  To: kvm; +Cc: borntraeger, cohuck, pbonzini, david, schwidefsky

These are some utility functions that will be used later on for storage
attributes migration.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
---
 arch/s390/kvm/kvm-s390.c | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 6f17031..100ea15 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -764,6 +764,14 @@ static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
 		kvm_s390_sync_request(req, vcpu);
 }
 
+static inline unsigned long *_cmma_bitmap(struct kvm_memory_slot *ms)
+{
+	unsigned long long len;
+
+	len = kvm_dirty_bitmap_bytes(ms) / sizeof(*ms->dirty_bitmap);
+	return ms->dirty_bitmap + len;
+}
+
 /*
  * Must be called with kvm->srcu held to avoid races on memslots, and with
  * kvm->lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
@@ -1512,6 +1520,38 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
 #define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
 
 /*
+ * Similar to gfn_to_memslot, but returns a memslot also when the address falls
+ * in a hole. In that case a memslot near the hole is returned.
+ */
+static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	int start = 0, end = slots->used_slots;
+	int slot = atomic_read(&slots->lru_slot);
+	struct kvm_memory_slot *memslots = slots->memslots;
+
+	if (gfn >= memslots[slot].base_gfn &&
+	    gfn < memslots[slot].base_gfn + memslots[slot].npages)
+		return slot;
+
+	while (start < end) {
+		slot = start + (end - start) / 2;
+
+		if (gfn >= memslots[slot].base_gfn)
+			end = slot;
+		else
+			start = slot + 1;
+	}
+
+	if (gfn >= memslots[start].base_gfn &&
+	    gfn < memslots[start].base_gfn + memslots[start].npages) {
+		atomic_set(&slots->lru_slot, start);
+	}
+
+	return start;
+}
+
+/*
  * This function searches for the next page with dirty CMMA attributes, and
  * saves the attributes in the buffer up to either the end of the buffer or
  * until a block of at least KVM_S390_MAX_BIT_DISTANCE clean bits is found;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots
  2018-01-15 17:03 [PATCH v1 0/2] KVM: s390: fix storage attributes migration Claudio Imbrenda
  2018-01-15 17:03 ` [PATCH v1 1/2] KVM: s390x: some utility functions for migration Claudio Imbrenda
@ 2018-01-15 17:03 ` Claudio Imbrenda
  2018-01-17  9:17   ` Janosch Frank
  2018-01-17 13:01   ` Janosch Frank
  1 sibling, 2 replies; 10+ messages in thread
From: Claudio Imbrenda @ 2018-01-15 17:03 UTC (permalink / raw)
  To: kvm; +Cc: borntraeger, cohuck, pbonzini, david, schwidefsky

This is a fix for several issues that were found in the original code
for storage attributes migration.

Now no bitmap is allocated to keep track of dirty storage attributes;
the extra bits of the per-memslot bitmap that are always present anyway
are now used for this purpose.

The code has also been refactored a little to improve readability.

Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and set guest storage attributes")
---
 arch/s390/include/asm/kvm_host.h |   9 +-
 arch/s390/kvm/kvm-s390.c         | 246 ++++++++++++++++++++++-----------------
 arch/s390/kvm/priv.c             |  33 ++++--
 3 files changed, 168 insertions(+), 120 deletions(-)

diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index 819838c..2ca73b0 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -763,12 +763,6 @@ struct kvm_s390_vsie {
 	struct page *pages[KVM_MAX_VCPUS];
 };
 
-struct kvm_s390_migration_state {
-	unsigned long bitmap_size;	/* in bits (number of guest pages) */
-	atomic64_t dirty_pages;		/* number of dirty pages */
-	unsigned long *pgste_bitmap;
-};
-
 struct kvm_arch{
 	void *sca;
 	int use_esca;
@@ -796,7 +790,8 @@ struct kvm_arch{
 	struct kvm_s390_vsie vsie;
 	u8 epdx;
 	u64 epoch;
-	struct kvm_s390_migration_state *migration_state;
+	atomic_t migration_mode;
+	atomic64_t cmma_dirty_pages;
 	/* subset of available cpu features enabled by user space */
 	DECLARE_BITMAP(cpu_feat, KVM_S390_VM_CPU_FEAT_NR_BITS);
 	struct kvm_s390_gisa *gisa;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 100ea15..c8e1cce 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -778,52 +778,38 @@ static inline unsigned long *_cmma_bitmap(struct kvm_memory_slot *ms)
  */
 static int kvm_s390_vm_start_migration(struct kvm *kvm)
 {
-	struct kvm_s390_migration_state *mgs;
-	struct kvm_memory_slot *ms;
-	/* should be the only one */
 	struct kvm_memslots *slots;
-	unsigned long ram_pages;
+	struct kvm_memory_slot *ms;
+	unsigned long ram_pages = 0;
 	int slotnr;
 
 	/* migration mode already enabled */
-	if (kvm->arch.migration_state)
+	if (atomic_read(&kvm->arch.migration_mode))
 		return 0;
-
 	slots = kvm_memslots(kvm);
 	if (!slots || !slots->used_slots)
 		return -EINVAL;
 
-	mgs = kzalloc(sizeof(*mgs), GFP_KERNEL);
-	if (!mgs)
-		return -ENOMEM;
-	kvm->arch.migration_state = mgs;
-
 	if (kvm->arch.use_cmma) {
-		/*
-		 * Get the last slot. They should be sorted by base_gfn, so the
-		 * last slot is also the one at the end of the address space.
-		 * We have verified above that at least one slot is present.
-		 */
-		ms = slots->memslots + slots->used_slots - 1;
-		/* round up so we only use full longs */
-		ram_pages = roundup(ms->base_gfn + ms->npages, BITS_PER_LONG);
-		/* allocate enough bytes to store all the bits */
-		mgs->pgste_bitmap = vmalloc(ram_pages / 8);
-		if (!mgs->pgste_bitmap) {
-			kfree(mgs);
-			kvm->arch.migration_state = NULL;
-			return -ENOMEM;
-		}
-
-		mgs->bitmap_size = ram_pages;
-		atomic64_set(&mgs->dirty_pages, ram_pages);
 		/* mark all the pages in active slots as dirty */
 		for (slotnr = 0; slotnr < slots->used_slots; slotnr++) {
 			ms = slots->memslots + slotnr;
-			bitmap_set(mgs->pgste_bitmap, ms->base_gfn, ms->npages);
+			/*
+			 * The second half of the bitmap is only used on x86,
+			 * and would be wasted otherwise, so we put it to good
+			 * use here to keep track of the state of the storage
+			 * attributes.
+			 */
+			memset(_cmma_bitmap(ms), 0xFF,
+				kvm_dirty_bitmap_bytes(ms));
+			ram_pages += ms->npages;
 		}
 
+		atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
+		atomic_set(&kvm->arch.migration_mode, 1);
 		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
+	} else {
+		atomic_set(&kvm->arch.migration_mode, 1);
 	}
 	return 0;
 }
@@ -834,19 +820,12 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
  */
 static int kvm_s390_vm_stop_migration(struct kvm *kvm)
 {
-	struct kvm_s390_migration_state *mgs;
-
 	/* migration mode already disabled */
-	if (!kvm->arch.migration_state)
+	if (!atomic_read(&kvm->arch.migration_mode))
 		return 0;
-	mgs = kvm->arch.migration_state;
-	kvm->arch.migration_state = NULL;
-
-	if (kvm->arch.use_cmma) {
+	atomic_set(&kvm->arch.migration_mode, 0);
+	if (kvm->arch.use_cmma)
 		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
-		vfree(mgs->pgste_bitmap);
-	}
-	kfree(mgs);
 	return 0;
 }
 
@@ -876,7 +855,7 @@ static int kvm_s390_vm_set_migration(struct kvm *kvm,
 static int kvm_s390_vm_get_migration(struct kvm *kvm,
 				     struct kvm_device_attr *attr)
 {
-	u64 mig = (kvm->arch.migration_state != NULL);
+	u64 mig = atomic_read(&kvm->arch.migration_mode);
 
 	if (attr->attr != KVM_S390_VM_MIGRATION_STATUS)
 		return -ENXIO;
@@ -1551,6 +1530,112 @@ static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
 	return start;
 }
 
+static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
+			      u8 *res, unsigned long bufsize)
+{
+	unsigned long pgstev, cur, hva, i = 0;
+	int r, ret = 0;
+
+	cur = args->start_gfn;
+	while (i < bufsize) {
+		hva = gfn_to_hva(kvm, cur);
+		if (kvm_is_error_hva(hva)) {
+			if (!i)
+				ret = -EFAULT;
+			break;
+		}
+		r = get_pgste(kvm->mm, hva, &pgstev);
+		if (r < 0)
+			pgstev = 0;
+		res[i++] = (pgstev >> 24) & 0x43;
+		cur++;
+	}
+	args->count = i;
+
+	return ret;
+}
+
+static unsigned long kvm_s390_next_dirty_cmma(struct kvm *kvm,
+					      unsigned long cur)
+{
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *ms;
+	int slotidx;
+
+	slotidx = gfn_to_memslot_approx(kvm, cur);
+	ms = slots->memslots + slotidx;
+
+	if (ms->base_gfn + ms->npages <= cur) {
+		slotidx--;
+		/* If we are above the highest slot, wrap around */
+		if (slotidx < 0)
+			slotidx = slots->used_slots - 1;
+
+		ms = slots->memslots + slotidx;
+		cur = ms->base_gfn;
+	}
+	cur = find_next_bit(_cmma_bitmap(ms), ms->npages, cur - ms->base_gfn);
+	while ((slotidx > 0) && (cur >= ms->npages)) {
+		slotidx--;
+		ms = slots->memslots + slotidx;
+		cur = find_next_bit(_cmma_bitmap(ms), ms->npages,
+				    cur - ms->base_gfn);
+	}
+	return cur + ms->base_gfn;
+}
+
+static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
+			     u8 *res, unsigned long bufsize)
+{
+	unsigned long next, mem_end, cur, hva, pgstev, i = 0;
+	struct kvm_memslots *slots = kvm_memslots(kvm);
+	struct kvm_memory_slot *ms;
+	int r, ret = 0;
+
+	cur = kvm_s390_next_dirty_cmma(kvm, args->start_gfn);
+	ms = gfn_to_memslot(kvm, cur);
+	args->count = 0;
+	args->start_gfn = 0;
+	if (!ms)
+		return 0;
+	next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
+	mem_end = slots->memslots[0].base_gfn + slots->memslots[0].npages;
+
+	args->start_gfn = cur;
+	while (i < bufsize) {
+		hva = gfn_to_hva(kvm, cur);
+		if (kvm_is_error_hva(hva))
+			break;
+		/* decrement only if we actually flipped the bit to 0 */
+		if (test_and_clear_bit(cur - ms->base_gfn, _cmma_bitmap(ms)))
+			atomic64_dec(&kvm->arch.cmma_dirty_pages);
+		r = get_pgste(kvm->mm, hva, &pgstev);
+		if (r < 0)
+			pgstev = 0;
+		/* save the value */
+		res[i++] = (pgstev >> 24) & 0x43;
+		/*
+		 * if the next bit is too far away, stop.
+		 * if we reached the previous "next", find the next one
+		 */
+		if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
+			break;
+		if (cur == next)
+			next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
+		/* reached the end of the bitmap or of the buffer, stop */
+		if ((next >= mem_end) || (next - args->start_gfn >= bufsize))
+			break;
+		cur++;
+		if (cur - ms->base_gfn >= ms->npages) {
+			ms = gfn_to_memslot(kvm, cur);
+			if (!ms)
+				break;
+		}
+	}
+	args->count = i;
+	return ret;
+}
+
 /*
  * This function searches for the next page with dirty CMMA attributes, and
  * saves the attributes in the buffer up to either the end of the buffer or
@@ -1562,90 +1647,47 @@ static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
 static int kvm_s390_get_cmma_bits(struct kvm *kvm,
 				  struct kvm_s390_cmma_log *args)
 {
-	struct kvm_s390_migration_state *s = kvm->arch.migration_state;
-	unsigned long bufsize, hva, pgstev, i, next, cur;
-	int srcu_idx, peek, r = 0, rr;
+	unsigned long bufsize;
+	int srcu_idx, peek, s, rr, r = 0;
 	u8 *res;
 
-	cur = args->start_gfn;
-	i = next = pgstev = 0;
+	s = atomic_read(&kvm->arch.migration_mode);
 
 	if (unlikely(!kvm->arch.use_cmma))
 		return -ENXIO;
 	/* Invalid/unsupported flags were specified */
-	if (args->flags & ~KVM_S390_CMMA_PEEK)
+	if (unlikely(args->flags & ~KVM_S390_CMMA_PEEK))
 		return -EINVAL;
 	/* Migration mode query, and we are not doing a migration */
 	peek = !!(args->flags & KVM_S390_CMMA_PEEK);
-	if (!peek && !s)
+	if (unlikely(!peek && !s))
 		return -EINVAL;
 	/* CMMA is disabled or was not used, or the buffer has length zero */
 	bufsize = min(args->count, KVM_S390_CMMA_SIZE_MAX);
-	if (!bufsize || !kvm->mm->context.use_cmma) {
+	if (unlikely(!bufsize || !kvm->mm->context.use_cmma)) {
 		memset(args, 0, sizeof(*args));
 		return 0;
 	}
-
-	if (!peek) {
-		/* We are not peeking, and there are no dirty pages */
-		if (!atomic64_read(&s->dirty_pages)) {
-			memset(args, 0, sizeof(*args));
-			return 0;
-		}
-		cur = find_next_bit(s->pgste_bitmap, s->bitmap_size,
-				    args->start_gfn);
-		if (cur >= s->bitmap_size)	/* nothing found, loop back */
-			cur = find_next_bit(s->pgste_bitmap, s->bitmap_size, 0);
-		if (cur >= s->bitmap_size) {	/* again! (very unlikely) */
-			memset(args, 0, sizeof(*args));
-			return 0;
-		}
-		next = find_next_bit(s->pgste_bitmap, s->bitmap_size, cur + 1);
+	/* We are not peeking, and there are no dirty pages */
+	if (!peek && !atomic64_read(&kvm->arch.cmma_dirty_pages)) {
+		memset(args, 0, sizeof(*args));
+		return 0;
 	}
 
 	res = vmalloc(bufsize);
 	if (!res)
 		return -ENOMEM;
 
-	args->start_gfn = cur;
-
 	down_read(&kvm->mm->mmap_sem);
 	srcu_idx = srcu_read_lock(&kvm->srcu);
-	while (i < bufsize) {
-		hva = gfn_to_hva(kvm, cur);
-		if (kvm_is_error_hva(hva)) {
-			r = -EFAULT;
-			break;
-		}
-		/* decrement only if we actually flipped the bit to 0 */
-		if (!peek && test_and_clear_bit(cur, s->pgste_bitmap))
-			atomic64_dec(&s->dirty_pages);
-		r = get_pgste(kvm->mm, hva, &pgstev);
-		if (r < 0)
-			pgstev = 0;
-		/* save the value */
-		res[i++] = (pgstev >> 24) & 0x43;
-		/*
-		 * if the next bit is too far away, stop.
-		 * if we reached the previous "next", find the next one
-		 */
-		if (!peek) {
-			if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
-				break;
-			if (cur == next)
-				next = find_next_bit(s->pgste_bitmap,
-						     s->bitmap_size, cur + 1);
-		/* reached the end of the bitmap or of the buffer, stop */
-			if ((next >= s->bitmap_size) ||
-			    (next >= args->start_gfn + bufsize))
-				break;
-		}
-		cur++;
-	}
+	if (peek)
+		r = kvm_s390_peek_cmma(kvm, args, res, bufsize);
+	else
+		r = kvm_s390_get_cmma(kvm, args, res, bufsize);
 	srcu_read_unlock(&kvm->srcu, srcu_idx);
 	up_read(&kvm->mm->mmap_sem);
-	args->count = i;
-	args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
+
+	args->remaining = s ? atomic64_read(&kvm->arch.cmma_dirty_pages) : 0;
 
 	rr = copy_to_user((void __user *)args->values, res, args->count);
 	if (rr)
@@ -2096,10 +2138,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_s390_destroy_adapters(kvm);
 	kvm_s390_clear_float_irqs(kvm);
 	kvm_s390_vsie_destroy(kvm);
-	if (kvm->arch.migration_state) {
-		vfree(kvm->arch.migration_state->pgste_bitmap);
-		kfree(kvm->arch.migration_state);
-	}
 	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
 }
 
diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
index 572496c..321d6b2 100644
--- a/arch/s390/kvm/priv.c
+++ b/arch/s390/kvm/priv.c
@@ -954,9 +954,11 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
 	return 0;
 }
 
-static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
+/*
+ * Must be called with relevant read locks held (kvm->mm->mmap_sem, kvm->srcu)
+ */
+static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc)
 {
-	struct kvm_s390_migration_state *ms = vcpu->kvm->arch.migration_state;
 	int r1, r2, nappended, entries;
 	unsigned long gfn, hva, res, pgstev, ptev;
 	unsigned long *cbrlo;
@@ -965,7 +967,6 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
 	 * We don't need to set SD.FPF.SK to 1 here, because if we have a
 	 * machine check here we either handle it or crash
 	 */
-
 	kvm_s390_get_regs_rre(vcpu, &r1, &r2);
 	gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
 	hva = gfn_to_hva(vcpu->kvm, gfn);
@@ -1007,9 +1008,17 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
 	}
 
 	if (orc) {
-		/* increment only if we are really flipping the bit to 1 */
-		if (!test_and_set_bit(gfn, ms->pgste_bitmap))
-			atomic64_inc(&ms->dirty_pages);
+		struct kvm_memory_slot *ms = gfn_to_memslot(vcpu->kvm, gfn);
+		unsigned long *bm;
+
+		if (ms) {
+			/* The cmma bitmap is right after the memory bitmap */
+			bm = ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms)
+						/ sizeof(*ms->dirty_bitmap);
+			/* Increment only if we are really flipping the bit */
+			if (!test_and_set_bit(gfn - ms->base_gfn, bm))
+				atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages);
+		}
 	}
 
 	return nappended;
@@ -1038,7 +1047,7 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 						: ESSA_SET_STABLE_IF_RESIDENT))
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	if (likely(!vcpu->kvm->arch.migration_state)) {
+	if (likely(!atomic_read(&vcpu->kvm->arch.migration_mode))) {
 		/*
 		 * CMMA is enabled in the KVM settings, but is disabled in
 		 * the SIE block and in the mm_context, and we are not doing
@@ -1066,10 +1075,16 @@ static int handle_essa(struct kvm_vcpu *vcpu)
 		/* Retry the ESSA instruction */
 		kvm_s390_retry_instr(vcpu);
 	} else {
-		/* Account for the possible extra cbrl entry */
-		i = do_essa(vcpu, orc);
+		int srcu_idx;
+
+		down_read(&vcpu->kvm->mm->mmap_sem);
+		srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+		i = __do_essa(vcpu, orc);
+		srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+		up_read(&vcpu->kvm->mm->mmap_sem);
 		if (i < 0)
 			return i;
+		/* Account for the possible extra cbrl entry */
 		entries += i;
 	}
 	vcpu->arch.sie_block->cbrlo &= PAGE_MASK;	/* reset nceo */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] KVM: s390x: some utility functions for migration
  2018-01-15 17:03 ` [PATCH v1 1/2] KVM: s390x: some utility functions for migration Claudio Imbrenda
@ 2018-01-16 18:03   ` David Hildenbrand
  2018-01-17  9:50     ` Claudio Imbrenda
  2018-01-17 12:16     ` Christian Borntraeger
  0 siblings, 2 replies; 10+ messages in thread
From: David Hildenbrand @ 2018-01-16 18:03 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm; +Cc: borntraeger, cohuck, pbonzini, schwidefsky

On 15.01.2018 18:03, Claudio Imbrenda wrote:
> These are some utility functions that will be used later on for storage
> attributes migration.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c | 40 ++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> index 6f17031..100ea15 100644
> --- a/arch/s390/kvm/kvm-s390.c
> +++ b/arch/s390/kvm/kvm-s390.c
> @@ -764,6 +764,14 @@ static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
>  		kvm_s390_sync_request(req, vcpu);
>  }
>  
> +static inline unsigned long *_cmma_bitmap(struct kvm_memory_slot *ms)

I think you can get rid of the "_" here. And usually we use two _ ?

> +{
> +	unsigned long long len;
> +
> +	len = kvm_dirty_bitmap_bytes(ms) / sizeof(*ms->dirty_bitmap);

return (void *) ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms);

?

> +	return ms->dirty_bitmap + len;
> +}


> +
>  /*
>   * Must be called with kvm->srcu held to avoid races on memslots, and with
>   * kvm->lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
> @@ -1512,6 +1520,38 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>  #define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
>  
>  /*
> + * Similar to gfn_to_memslot, but returns a memslot also when the address falls
> + * in a hole. In that case a memslot near the hole is returned.
> + */
> +static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
> +{
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	int start = 0, end = slots->used_slots;
> +	int slot = atomic_read(&slots->lru_slot);
> +	struct kvm_memory_slot *memslots = slots->memslots;
> +
> +	if (gfn >= memslots[slot].base_gfn &&
> +	    gfn < memslots[slot].base_gfn + memslots[slot].npages)
> +		return slot;
> +
> +	while (start < end) {
> +		slot = start + (end - start) / 2;
> +
> +		if (gfn >= memslots[slot].base_gfn)
> +			end = slot;
> +		else
> +			start = slot + 1;
> +	}
> +
> +	if (gfn >= memslots[start].base_gfn &&
> +	    gfn < memslots[start].base_gfn + memslots[start].npages) {
> +		atomic_set(&slots->lru_slot, start);
> +	}
> +
> +	return start;
> +}

This looks ugly, hope we can avoid this ....

> +
> +/*
>   * This function searches for the next page with dirty CMMA attributes, and
>   * saves the attributes in the buffer up to either the end of the buffer or
>   * until a block of at least KVM_S390_MAX_BIT_DISTANCE clean bits is found;
> 


-- 

Thanks,

David / dhildenb

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots
  2018-01-15 17:03 ` [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots Claudio Imbrenda
@ 2018-01-17  9:17   ` Janosch Frank
  2018-01-17 12:10     ` Claudio Imbrenda
  2018-01-17 13:01   ` Janosch Frank
  1 sibling, 1 reply; 10+ messages in thread
From: Janosch Frank @ 2018-01-17  9:17 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm; +Cc: borntraeger, cohuck, pbonzini, david, schwidefsky


On 15.01.2018 18:03, Claudio Imbrenda wrote:
> This is a fix for several issues that were found in the original code
> for storage attributes migration.
> 
> Now no bitmap is allocated to keep track of dirty storage attributes;
> the extra bits of the per-memslot bitmap that are always present anyway
> are now used for this purpose.
> 
> The code has also been refactored a little to improve readability.
> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
> Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and set guest storage attributes")

Did Christian's patches go into stable, and should yours too?

> ---

>  static int kvm_s390_vm_start_migration(struct kvm *kvm)
>  {
[...]
>  	if (kvm->arch.use_cmma) {
[...]
> -
> -		mgs->bitmap_size = ram_pages;
> -		atomic64_set(&mgs->dirty_pages, ram_pages);
>  		/* mark all the pages in active slots as dirty */
>  		for (slotnr = 0; slotnr < slots->used_slots; slotnr++) {
>  			ms = slots->memslots + slotnr;
> -			bitmap_set(mgs->pgste_bitmap, ms->base_gfn, ms->npages);
> +			/*
> +			 * The second half of the bitmap is only used on x86,
> +			 * and would be wasted otherwise, so we put it to good
> +			 * use here to keep track of the state of the storage
> +			 * attributes.
> +			 */
> +			memset(_cmma_bitmap(ms), 0xFF,
> +				kvm_dirty_bitmap_bytes(ms));

On your branch, the second line is not indented properly.
Also s/0xFF/0xff/

> +			ram_pages += ms->npages;
>  		}
> 
> +		atomic64_set(&kvm->arch.cmma_dirty_pages, ram_pages);
> +		atomic_set(&kvm->arch.migration_mode, 1);
>  		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_START_MIGRATION);
> +	} else {
> +		atomic_set(&kvm->arch.migration_mode, 1);

For stop migration, you do that unconditionally, why don't you move this
in front of the if?

>  	}
>  	return 0;
>  }
> @@ -834,19 +820,12 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
>   */
>  static int kvm_s390_vm_stop_migration(struct kvm *kvm)
>  {
> -	struct kvm_s390_migration_state *mgs;
> -
>  	/* migration mode already disabled */
> -	if (!kvm->arch.migration_state)
> +	if (!atomic_read(&kvm->arch.migration_mode))
>  		return 0;
> -	mgs = kvm->arch.migration_state;
> -	kvm->arch.migration_state = NULL;
> -
> -	if (kvm->arch.use_cmma) {
> +	atomic_set(&kvm->arch.migration_mode, 0);
> +	if (kvm->arch.use_cmma)
>  		kvm_s390_sync_request_broadcast(kvm, KVM_REQ_STOP_MIGRATION);
> -		vfree(mgs->pgste_bitmap);
> -	}
> -	kfree(mgs);
>  	return 0;
>  }
> 
The rest below will take me some time.
However, it is much more readable than before!



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] KVM: s390x: some utility functions for migration
  2018-01-16 18:03   ` David Hildenbrand
@ 2018-01-17  9:50     ` Claudio Imbrenda
  2018-01-17 12:16     ` Christian Borntraeger
  1 sibling, 0 replies; 10+ messages in thread
From: Claudio Imbrenda @ 2018-01-17  9:50 UTC (permalink / raw)
  To: David Hildenbrand; +Cc: kvm, borntraeger, cohuck, pbonzini, schwidefsky

On Tue, 16 Jan 2018 19:03:00 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 15.01.2018 18:03, Claudio Imbrenda wrote:
> > These are some utilty functions that will be used later on for
> > storage attributes migration.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> > ---
> >  arch/s390/kvm/kvm-s390.c | 40
> > ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 40
> > insertions(+)
> > 
> > diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
> > index 6f17031..100ea15 100644
> > --- a/arch/s390/kvm/kvm-s390.c
> > +++ b/arch/s390/kvm/kvm-s390.c
> > @@ -764,6 +764,14 @@ static void
> > kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
> > kvm_s390_sync_request(req, vcpu); }
> >  
> > +static inline unsigned long *_cmma_bitmap(struct kvm_memory_slot
> > *ms)  
> 
> I think you can get rid of the "_" here. And usually we use two _ ?

will fix

> > +{
> > +	unsigned long long len;
> > +
> > +	len = kvm_dirty_bitmap_bytes(ms) /
> > sizeof(*ms->dirty_bitmap);  
> 
> return (void *) ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms);
> 
> ?

I really don't like pointer arithmetic on void pointers, especially
when the base pointer we are working with is already of the correct
type. (also, it's an extension, standard C doesn't even allow it)

do you really think it improves readability that much?
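
For illustration, the two variants under discussion are (the second one
relies on the GCC extension that arithmetic on void * advances in bytes):

	/* typed pointer arithmetic: steps in units of unsigned long */
	return ms->dirty_bitmap +
	       kvm_dirty_bitmap_bytes(ms) / sizeof(*ms->dirty_bitmap);

	/* void * arithmetic (GCC extension): steps in bytes */
	return (void *)ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms);

Both end up pointing at the second half of the dirty bitmap.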

> > +	return ms->dirty_bitmap + len;
> > +}  
> 
> 
> > +
> >  /*
> >   * Must be called with kvm->srcu held to avoid races on memslots,
> > and with
> >   * kvm->lock to avoid races with ourselves and
> > kvm_s390_vm_stop_migration. @@ -1512,6 +1520,38 @@ static long
> > kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
> > #define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX) 
> >  /*
> > + * Similar to gfn_to_memslot, but returns a memslot also when the
> > address falls
> > + * in a hole. In that case a memslot near the hole is returned.
> > + */
> > +static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
> > +{
> > +	struct kvm_memslots *slots = kvm_memslots(kvm);
> > +	int start = 0, end = slots->used_slots;
> > +	int slot = atomic_read(&slots->lru_slot);
> > +	struct kvm_memory_slot *memslots = slots->memslots;
> > +
> > +	if (gfn >= memslots[slot].base_gfn &&
> > +	    gfn < memslots[slot].base_gfn + memslots[slot].npages)
> > +		return slot;
> > +
> > +	while (start < end) {
> > +		slot = start + (end - start) / 2;
> > +
> > +		if (gfn >= memslots[slot].base_gfn)
> > +			end = slot;
> > +		else
> > +			start = slot + 1;
> > +	}
> > +
> > +	if (gfn >= memslots[start].base_gfn &&
> > +	    gfn < memslots[start].base_gfn +
> > memslots[start].npages) {
> > +		atomic_set(&slots->lru_slot, start);
> > +	}
> > +
> > +	return start;
> > +}  
> 
> This looks ugly, hope we can avoid this ....

This is actually a copy-paste of search_memslots (which is called by
gfn_to_memslot). We need this because the existing functions return
NULL when a slot is not found, but we need to return some memslot also
when the requested address falls in a hole. 

> > +
> > +/*
> >   * This function searches for the next page with dirty CMMA
> > attributes, and
> >   * saves the attributes in the buffer up to either the end of the
> > buffer or
> >   * until a block of at least KVM_S390_MAX_BIT_DISTANCE clean bits
> > is found; 
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots
  2018-01-17  9:17   ` Janosch Frank
@ 2018-01-17 12:10     ` Claudio Imbrenda
  0 siblings, 0 replies; 10+ messages in thread
From: Claudio Imbrenda @ 2018-01-17 12:10 UTC (permalink / raw)
  To: Janosch Frank; +Cc: kvm, borntraeger, cohuck, pbonzini, david, schwidefsky

On Wed, 17 Jan 2018 10:17:56 +0100
Janosch Frank <frankja@linux.vnet.ibm.com> wrote:

> On 15.01.2018 18:03, Claudio Imbrenda wrote:
> > This is a fix for several issues that were found in the original
> > code for storage attributes migration.
> > 
> > Now no bitmap is allocated to keep track of dirty storage
> > attributes; the extra bits of the per-memslot bitmap that are
> > always present anyway are now used for this purpose.
> > 
> > The code has also been refactored a little to improve readability.
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> > Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation,
> > migration mode") Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and
> > set guest storage attributes")  
> 
> Did Christians patches go into stable, and should yours too?

I don't know, and I don't know, but I'd bet probably yes

> > ---  
> 
> >  static int kvm_s390_vm_start_migration(struct kvm *kvm)
> >  {  
> [...]
> >  	if (kvm->arch.use_cmma) {  
> [...]
> > -
> > -		mgs->bitmap_size = ram_pages;
> > -		atomic64_set(&mgs->dirty_pages, ram_pages);
> >  		/* mark all the pages in active slots as dirty */
> >  		for (slotnr = 0; slotnr < slots->used_slots;
> > slotnr++) { ms = slots->memslots + slotnr;
> > -			bitmap_set(mgs->pgste_bitmap,
> > ms->base_gfn, ms->npages);
> > +			/*
> > +			 * The second half of the bitmap is only
> > used on x86,
> > +			 * and would be wasted otherwise, so we
> > put it to good
> > +			 * use here to keep track of the state of
> > the storage
> > +			 * attributes.
> > +			 */
> > +			memset(_cmma_bitmap(ms), 0xFF,
> > +				kvm_dirty_bitmap_bytes(ms));  
> 
> On your branch, the second line is not indented properly.
> Also s/0xFF/0xff/

will fix

> > +			ram_pages += ms->npages;
> >  		}
> > 
> > +		atomic64_set(&kvm->arch.cmma_dirty_pages,
> > ram_pages);
> > +		atomic_set(&kvm->arch.migration_mode, 1);
> >  		kvm_s390_sync_request_broadcast(kvm,
> > KVM_REQ_START_MIGRATION);
> > +	} else {
> > +		atomic_set(&kvm->arch.migration_mode, 1);  
> 
> For stop migration, you do that unconditionally, why don't you move
> this in front of the if?

I want the bitmaps and the count of pages to be properly set up before
we enable migration mode.
We can intercept ESSA also during normal operation, and I don't want
migration_mode to be set before we're done setting up bitmaps and page
count, because the code in the ESSA handler could access both before
they're properly set up.

thread1: migration_mode=1
thread2: essa handler: migration_mode==1 -> set bit in bitmap
thread1: fill bitmap, cmma_dirty_pages=N
thread2: cmma_dirty_pages=N+1

Now the number of dirty pages is one more than the maximum number of
pages, which could be a problem for userspace. This should not happen in
normal operation.
It can still happen if the process maliciously calls
kvm_s390_vm_start_migration several times in parallel, but this is not
our problem any longer.

In any case the race won't cause illegal memory
accesses, crashes, leaks or security issues, so I think we can ignore
the issue.

> >  	}
> >  	return 0;
> >  }
> > @@ -834,19 +820,12 @@ static int kvm_s390_vm_start_migration(struct
> > kvm *kvm) */
> >  static int kvm_s390_vm_stop_migration(struct kvm *kvm)
> >  {
> > -	struct kvm_s390_migration_state *mgs;
> > -
> >  	/* migration mode already disabled */
> > -	if (!kvm->arch.migration_state)
> > +	if (!atomic_read(&kvm->arch.migration_mode))
> >  		return 0;
> > -	mgs = kvm->arch.migration_state;
> > -	kvm->arch.migration_state = NULL;
> > -
> > -	if (kvm->arch.use_cmma) {
> > +	atomic_set(&kvm->arch.migration_mode, 0);
> > +	if (kvm->arch.use_cmma)
> >  		kvm_s390_sync_request_broadcast(kvm,
> > KVM_REQ_STOP_MIGRATION);
> > -		vfree(mgs->pgste_bitmap);
> > -	}
> > -	kfree(mgs);
> >  	return 0;
> >  }
> >   
> The rest below will take me some time.
> However, it is much more readable than before!
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 1/2] KVM: s390x: some utility functions for migration
  2018-01-16 18:03   ` David Hildenbrand
  2018-01-17  9:50     ` Claudio Imbrenda
@ 2018-01-17 12:16     ` Christian Borntraeger
  1 sibling, 0 replies; 10+ messages in thread
From: Christian Borntraeger @ 2018-01-17 12:16 UTC (permalink / raw)
  To: David Hildenbrand, Claudio Imbrenda, kvm; +Cc: cohuck, pbonzini, schwidefsky



On 01/16/2018 07:03 PM, David Hildenbrand wrote:
> On 15.01.2018 18:03, Claudio Imbrenda wrote:
> >> These are some utility functions that will be used later on for storage
>> attributes migration.
>>
>> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c | 40 ++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 40 insertions(+)
>>
>> diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
>> index 6f17031..100ea15 100644
>> --- a/arch/s390/kvm/kvm-s390.c
>> +++ b/arch/s390/kvm/kvm-s390.c
>> @@ -764,6 +764,14 @@ static void kvm_s390_sync_request_broadcast(struct kvm *kvm, int req)
>>  		kvm_s390_sync_request(req, vcpu);
>>  }
>>  
>> +static inline unsigned long *_cmma_bitmap(struct kvm_memory_slot *ms)
> 
> I think you can get rid of the "_" here. And usually we use two _ ?
> 
>> +{
>> +	unsigned long long len;
>> +
>> +	len = kvm_dirty_bitmap_bytes(ms) / sizeof(*ms->dirty_bitmap);
> 
> return (void *) ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms);
> 
> ?

Relying on pointer arithmetics for (void *) being 1 is a gcc extension.
If possible I would like to avoid that. 

> 
>> +	return ms->dirty_bitmap + len;
>> +}
> 
> 
>> +
>>  /*
>>   * Must be called with kvm->srcu held to avoid races on memslots, and with
>>   * kvm->lock to avoid races with ourselves and kvm_s390_vm_stop_migration.
>> @@ -1512,6 +1520,38 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
>>  #define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)
>>  
>>  /*
>> + * Similar to gfn_to_memslot, but returns a memslot also when the address falls
>> + * in a hole. In that case a memslot near the hole is returned.
>> + */
>> +static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
>> +{
>> +	struct kvm_memslots *slots = kvm_memslots(kvm);
>> +	int start = 0, end = slots->used_slots;
>> +	int slot = atomic_read(&slots->lru_slot);
>> +	struct kvm_memory_slot *memslots = slots->memslots;
>> +
>> +	if (gfn >= memslots[slot].base_gfn &&
>> +	    gfn < memslots[slot].base_gfn + memslots[slot].npages)
>> +		return slot;
>> +
>> +	while (start < end) {
>> +		slot = start + (end - start) / 2;
>> +
>> +		if (gfn >= memslots[slot].base_gfn)
>> +			end = slot;
>> +		else
>> +			start = slot + 1;
>> +	}
>> +
>> +	if (gfn >= memslots[start].base_gfn &&
>> +	    gfn < memslots[start].base_gfn + memslots[start].npages) {
>> +		atomic_set(&slots->lru_slot, start);
>> +	}
>> +
>> +	return start;
>> +}
> 
> This looks ugly, hope we can avoid this ....
> 
>> +
>> +/*
>>   * This function searches for the next page with dirty CMMA attributes, and
>>   * saves the attributes in the buffer up to either the end of the buffer or
>>   * until a block of at least KVM_S390_MAX_BIT_DISTANCE clean bits is found;
>>
> 
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots
  2018-01-15 17:03 ` [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots Claudio Imbrenda
  2018-01-17  9:17   ` Janosch Frank
@ 2018-01-17 13:01   ` Janosch Frank
  2018-01-17 17:37     ` Claudio Imbrenda
  1 sibling, 1 reply; 10+ messages in thread
From: Janosch Frank @ 2018-01-17 13:01 UTC (permalink / raw)
  To: Claudio Imbrenda, kvm; +Cc: borntraeger, cohuck, pbonzini, david, schwidefsky


On 15.01.2018 18:03, Claudio Imbrenda wrote:
> This is a fix for several issues that were found in the original code
> for storage attributes migration.
> 
> Now no bitmap is allocated to keep track of dirty storage attributes;
> the extra bits of the per-memslot bitmap that are always present anyway
> are now used for this purpose.
> 
> The code has also been refactored a little to improve readability.

Second pass.

> 
> Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation, migration mode")
> Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and set guest storage attributes")
> ---

> 
> +static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
> +			      u8 *res, unsigned long bufsize)
> +{
> +	unsigned long pgstev, cur, hva, i = 0;
> +	int r, ret = 0;
> +
> +	cur = args->start_gfn;

s/cur/cur_gfn/ for a lot of occasions.

> +	while (i < bufsize) {
> +		hva = gfn_to_hva(kvm, cur);
> +		if (kvm_is_error_hva(hva)) {
> +			if (!i)
> +				ret = -EFAULT;

Short comment, something like this?
If we end up with a failure right away, return an error; if we
encounter it after successful reads, pass the successes down.

> +			break;
> +		}
> +		r = get_pgste(kvm->mm, hva, &pgstev);
> +		if (r < 0)
> +			pgstev = 0;
> +		res[i++] = (pgstev >> 24) & 0x43;
> +		cur++;
> +	}
> +	args->count = i;

If you feel comfortable with it you could count with args->count.

> +
> +	return ret;
> +}
> +
> +static unsigned long kvm_s390_next_dirty_cmma(struct kvm *kvm,
> +					      unsigned long cur)
> +{
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	struct kvm_memory_slot *ms;
> +	int slotidx;
> +
> +	slotidx = gfn_to_memslot_approx(kvm, cur);
> +	ms = slots->memslots + slotidx;

I prefer proper indexing.

> +
> +	if (ms->base_gfn + ms->npages <= cur) {

That's the hole handling, right?

> +		slotidx--;
> +		/* If we are above the highest slot, wrap around */
> +		if (slotidx < 0)
> +			slotidx = slots->used_slots - 1;
> +
> +		ms = slots->memslots + slotidx;
> +		cur = ms->base_gfn;
> +	}
> +	cur = find_next_bit(_cmma_bitmap(ms), ms->npages, cur - ms->base_gfn);

So cur is now the index of the next set bit, right? I do not like that.

Your next dirty bit is in another memslot. :)

> +	while ((slotidx > 0) && (cur >= ms->npages)) {
> +		slotidx--;
> +		ms = slots->memslots + slotidx;
> +		cur = find_next_bit(_cmma_bitmap(ms), ms->npages,
> +				    cur - ms->base_gfn);
> +	}
> +	return cur + ms->base_gfn;

I'd switch that, personal preference.

> +}
> +
> +static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
> +			     u8 *res, unsigned long bufsize)
> +{
> +	unsigned long next, mem_end, cur, hva, pgstev, i = 0;
> +	struct kvm_memslots *slots = kvm_memslots(kvm);
> +	struct kvm_memory_slot *ms;
> +	int r, ret = 0;
> +
> +	cur = kvm_s390_next_dirty_cmma(kvm, args->start_gfn);
> +	ms = gfn_to_memslot(kvm, cur);
> +	args->count = 0;
> +	args->start_gfn = 0;
> +	if (!ms)
> +		return 0;
> +	next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
> +	mem_end = slots->memslots[0].base_gfn + slots->memslots[0].npages;
> +
> +	args->start_gfn = cur;
> +	while (i < bufsize) {
> +		hva = gfn_to_hva(kvm, cur);
> +		if (kvm_is_error_hva(hva))
> +			break;
> +		/* decrement only if we actually flipped the bit to 0 */
> +		if (test_and_clear_bit(cur - ms->base_gfn, _cmma_bitmap(ms)))
> +			atomic64_dec(&kvm->arch.cmma_dirty_pages);
> +		r = get_pgste(kvm->mm, hva, &pgstev);
> +		if (r < 0)
> +			pgstev = 0;
> +		/* save the value */
> +		res[i++] = (pgstev >> 24) & 0x43;
> +		/*
> +		 * if the next bit is too far away, stop.
> +		 * if we reached the previous "next", find the next one
> +		 */
> +		if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
> +			break;
> +		if (cur == next)
> +			next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
> +		/* reached the end of the bitmap or of the buffer, stop */
> +		if ((next >= mem_end) || (next - args->start_gfn >= bufsize))
> +			break;
> +		cur++;
> +		if (cur - ms->base_gfn >= ms->npages) {
> +			ms = gfn_to_memslot(kvm, cur);
> +			if (!ms)
> +				break;
> +		}
> +	}
> +	args->count = i;
> +	return ret;

s/ret/0/

> +}
> +
>  /*
>   * This function searches for the next page with dirty CMMA attributes, and
>   * saves the attributes in the buffer up to either the end of the buffer or
> @@ -1562,90 +1647,47 @@ static int gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn)
>  static int kvm_s390_get_cmma_bits(struct kvm *kvm,
>  				  struct kvm_s390_cmma_log *args)
>  {
> -	struct kvm_s390_migration_state *s = kvm->arch.migration_state;
> -	unsigned long bufsize, hva, pgstev, i, next, cur;
> -	int srcu_idx, peek, r = 0, rr;
> +	unsigned long bufsize;
> +	int srcu_idx, peek, s, rr, r = 0;
>  	u8 *res;
> 
> -	cur = args->start_gfn;
> -	i = next = pgstev = 0;
> +	s = atomic_read(&kvm->arch.migration_mode);

s?
Please do not overuse single char variable names!

> +	if (peek)
> +		r = kvm_s390_peek_cmma(kvm, args, res, bufsize);
> +	else
> +		r = kvm_s390_get_cmma(kvm, args, res, bufsize);
>  	srcu_read_unlock(&kvm->srcu, srcu_idx);
>  	up_read(&kvm->mm->mmap_sem);
> -	args->count = i;
> -	args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
> +
> +	args->remaining = s ? atomic64_read(&kvm->arch.cmma_dirty_pages) : 0;

So, I didn't yet have time to read the QEMU part so this might be
trivial, but what if s == 0 but we didn't transfer all pgstes with this
invocation? There would be leftover data which would not be retrieved.

> 
>  	rr = copy_to_user((void __user *)args->values, res, args->count);
>  	if (rr)

Please get rid of rr, we don't really need it here.

> @@ -2096,10 +2138,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
>  	kvm_s390_destroy_adapters(kvm);
>  	kvm_s390_clear_float_irqs(kvm);
>  	kvm_s390_vsie_destroy(kvm);
> -	if (kvm->arch.migration_state) {
> -		vfree(kvm->arch.migration_state->pgste_bitmap);
> -		kfree(kvm->arch.migration_state);
> -	}
>  	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
>  }
> 
> diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> index 572496c..321d6b2 100644
> --- a/arch/s390/kvm/priv.c
> +++ b/arch/s390/kvm/priv.c
> @@ -954,9 +954,11 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
>  	return 0;
>  }
> 
> -static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
> +/*
> + * Must be called with relevant read locks held (kvm->mm->mmap_sem, kvm->srcu)
> + */
> +static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc)
>  {
> -	struct kvm_s390_migration_state *ms = vcpu->kvm->arch.migration_state;
>  	int r1, r2, nappended, entries;
>  	unsigned long gfn, hva, res, pgstev, ptev;
>  	unsigned long *cbrlo;
> @@ -965,7 +967,6 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
>  	 * We don't need to set SD.FPF.SK to 1 here, because if we have a
>  	 * machine check here we either handle it or crash
>  	 */
> -
>  	kvm_s390_get_regs_rre(vcpu, &r1, &r2);
>  	gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
>  	hva = gfn_to_hva(vcpu->kvm, gfn);
> @@ -1007,9 +1008,17 @@ static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
>  	}
> 
>  	if (orc) {
> -		/* increment only if we are really flipping the bit to 1 */
> -		if (!test_and_set_bit(gfn, ms->pgste_bitmap))
> -			atomic64_inc(&ms->dirty_pages);
> +		struct kvm_memory_slot *ms = gfn_to_memslot(vcpu->kvm, gfn);
> +		unsigned long *bm;
> +
> +		if (ms) {
> +			/* The cmma bitmap is right after the memory bitmap */
> +			bm = ms->dirty_bitmap + kvm_dirty_bitmap_bytes(ms)
> +						/ sizeof(*ms->dirty_bitmap);

Hrm, maybe we could put the utility function in kvm-s390.h so we can
also use it here.



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots
  2018-01-17 13:01   ` Janosch Frank
@ 2018-01-17 17:37     ` Claudio Imbrenda
  0 siblings, 0 replies; 10+ messages in thread
From: Claudio Imbrenda @ 2018-01-17 17:37 UTC (permalink / raw)
  To: Janosch Frank; +Cc: kvm, borntraeger, cohuck, pbonzini, david, schwidefsky

On Wed, 17 Jan 2018 14:01:11 +0100
Janosch Frank <frankja@linux.vnet.ibm.com> wrote:

> On 15.01.2018 18:03, Claudio Imbrenda wrote:
> > This is a fix for several issues that were found in the original
> > code for storage attributes migration.
> > 
> > Now no bitmap is allocated to keep track of dirty storage
> > attributes; the extra bits of the per-memslot bitmap that are
> > always present anyway are now used for this purpose.
> > 
> > The code has also been refactored a little to improve readability.  
> 
> Second pass.
> 
> > 
> > Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com>
> > Fixes: 190df4a212a ("KVM: s390: CMMA tracking, ESSA emulation,
> > migration mode") Fixes: 4036e3874a1 ("KVM: s390: ioctls to get and
> > set guest storage attributes") ---  
> 
> > 
> > +static int kvm_s390_peek_cmma(struct kvm *kvm, struct
> > kvm_s390_cmma_log *args,
> > +			      u8 *res, unsigned long bufsize)
> > +{
> > +	unsigned long pgstev, cur, hva, i = 0;
> > +	int r, ret = 0;
> > +
> > +	cur = args->start_gfn;  
> 
> s/cur/cur_gfn/ for a lot of occasions.

will be fixed

> 
> > +	while (i < bufsize) {
> > +		hva = gfn_to_hva(kvm, cur);
> > +		if (kvm_is_error_hva(hva)) {
> > +			if (!i)
> > +				ret = -EFAULT;  
> 
> Short comment, something like:?
> If we end up with a failure right away return an error, if we
> encounter it after successful reads pass the successes down.

will be added

> > +			break;
> > +		}
> > +		r = get_pgste(kvm->mm, hva, &pgstev);
> > +		if (r < 0)
> > +			pgstev = 0;
> > +		res[i++] = (pgstev >> 24) & 0x43;
> > +		cur++;
> > +	}
> > +	args->count = i;  
> 
> If you feel comfortable with it you could count with args->count.

makes sense
 
> > +
> > +	return ret;
> > +}
> > +
> > +static unsigned long kvm_s390_next_dirty_cmma(struct kvm *kvm,
> > +					      unsigned long cur)
> > +{
> > +	struct kvm_memslots *slots = kvm_memslots(kvm);
> > +	struct kvm_memory_slot *ms;
> > +	int slotidx;
> > +
> > +	slotidx = gfn_to_memslot_approx(kvm, cur);
> > +	ms = slots->memslots + slotidx;  
> 
> I prefer proper indexing.

meh, I think &slots->memslots[slotidx] would look even uglier

> > +
> > +	if (ms->base_gfn + ms->npages <= cur) {  
> 
> That's the hole handling, right?

yes
 
> > +		slotidx--;
> > +		/* If we are above the highest slot, wrap around */
> > +		if (slotidx < 0)
> > +			slotidx = slots->used_slots - 1;
> > +
> > +		ms = slots->memslots + slotidx;
> > +		cur = ms->base_gfn;
> > +	}
> > +	cur = find_next_bit(_cmma_bitmap(ms), ms->npages, cur -
> > ms->base_gfn);  
> 
> So cur is now the index of the next set bit, right? I do not like
> that.

I'll add a separate variable

> Your next dirty bit is in another memslot. :)
> 
> > +	while ((slotidx > 0) && (cur >= ms->npages)) {
> > +		slotidx--;
> > +		ms = slots->memslots + slotidx;
> > +		cur = find_next_bit(_cmma_bitmap(ms), ms->npages,
> > +				    cur - ms->base_gfn);

should be 0, not cur - ms->base_gfn, since we're moving to the next
memslot

> > +	}
> > +	return cur + ms->base_gfn;  
> 
> I'd switch that, personal preference.

yeah looks better switched

> > +}
> > +
> > +static int kvm_s390_get_cmma(struct kvm *kvm, struct
> > kvm_s390_cmma_log *args,
> > +			     u8 *res, unsigned long bufsize)
> > +{
> > +	unsigned long next, mem_end, cur, hva, pgstev, i = 0;
> > +	struct kvm_memslots *slots = kvm_memslots(kvm);
> > +	struct kvm_memory_slot *ms;
> > +	int r, ret = 0;
> > +
> > +	cur = kvm_s390_next_dirty_cmma(kvm, args->start_gfn);
> > +	ms = gfn_to_memslot(kvm, cur);
> > +	args->count = 0;
> > +	args->start_gfn = 0;
> > +	if (!ms)
> > +		return 0;
> > +	next = kvm_s390_next_dirty_cmma(kvm, cur + 1);
> > +	mem_end = slots->memslots[0].base_gfn +
> > slots->memslots[0].npages; +
> > +	args->start_gfn = cur;
> > +	while (i < bufsize) {
> > +		hva = gfn_to_hva(kvm, cur);
> > +		if (kvm_is_error_hva(hva))
> > +			break;
> > +		/* decrement only if we actually flipped the bit
> > to 0 */
> > +		if (test_and_clear_bit(cur - ms->base_gfn,
> > _cmma_bitmap(ms)))
> > +			atomic64_dec(&kvm->arch.cmma_dirty_pages);
> > +		r = get_pgste(kvm->mm, hva, &pgstev);
> > +		if (r < 0)
> > +			pgstev = 0;
> > +		/* save the value */
> > +		res[i++] = (pgstev >> 24) & 0x43;
> > +		/*
> > +		 * if the next bit is too far away, stop.
> > +		 * if we reached the previous "next", find the
> > next one
> > +		 */
> > +		if (next > cur + KVM_S390_MAX_BIT_DISTANCE)
> > +			break;
> > +		if (cur == next)
> > +			next = kvm_s390_next_dirty_cmma(kvm, cur +
> > 1);
> > +		/* reached the end of the bitmap or of the buffer,
> > stop */
> > +		if ((next >= mem_end) || (next - args->start_gfn
> > >= bufsize))
> > +			break;
> > +		cur++;
> > +		if (cur - ms->base_gfn >= ms->npages) {
> > +			ms = gfn_to_memslot(kvm, cur);
> > +			if (!ms)
> > +				break;
> > +		}
> > +	}
> > +	args->count = i;
> > +	return ret;  
> 
> s/ret/0/

will be fixed
 
> > +}
> > +
> >  /*
> >   * This function searches for the next page with dirty CMMA
> > attributes, and
> >   * saves the attributes in the buffer up to either the end of the
> > buffer or @@ -1562,90 +1647,47 @@ static int
> > gfn_to_memslot_approx(struct kvm *kvm, gfn_t gfn) static int
> > kvm_s390_get_cmma_bits(struct kvm *kvm, struct kvm_s390_cmma_log
> > *args) {
> > -	struct kvm_s390_migration_state *s =
> > kvm->arch.migration_state;
> > -	unsigned long bufsize, hva, pgstev, i, next, cur;
> > -	int srcu_idx, peek, r = 0, rr;
> > +	unsigned long bufsize;
> > +	int srcu_idx, peek, s, rr, r = 0;
> >  	u8 *res;
> > 
> > -	cur = args->start_gfn;
> > -	i = next = pgstev = 0;
> > +	s = atomic_read(&kvm->arch.migration_mode);  
> 
> s?
> Please do not overuse single char variable names!

I'll do a refactoring of all those variable names so that they will be
less confusing
 
> > +	if (peek)
> > +		r = kvm_s390_peek_cmma(kvm, args, res, bufsize);
> > +	else
> > +		r = kvm_s390_get_cmma(kvm, args, res, bufsize);
> >  	srcu_read_unlock(&kvm->srcu, srcu_idx);
> >  	up_read(&kvm->mm->mmap_sem);
> > -	args->count = i;
> > -	args->remaining = s ? atomic64_read(&s->dirty_pages) : 0;
> > +
> > +	args->remaining = s ?
> > atomic64_read(&kvm->arch.cmma_dirty_pages) : 0;  
> 
> So, I didn't yet have time to read the QEMU part so this might be
> trivial, but what if:
> s = 0 but we didn't transfer all pgstes with this invocation.
> There would be leftover data which would not be retrieved

s == 0 only if we are not in migration mode. So if qemu decides to stop
migration mode before it's done retrieving all the values, it's not our
fault, it's qemu's choice.

Migration mode can only be changed through the start/stop migration
attributes.

> > 
> >  	rr = copy_to_user((void __user *)args->values, res,
> > args->count); if (rr)  
> 
> Please get rid of we don't really need it here.
> 
> > @@ -2096,10 +2138,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> >  	kvm_s390_destroy_adapters(kvm);
> >  	kvm_s390_clear_float_irqs(kvm);
> >  	kvm_s390_vsie_destroy(kvm);
> > -	if (kvm->arch.migration_state) {
> > -		vfree(kvm->arch.migration_state->pgste_bitmap);
> > -		kfree(kvm->arch.migration_state);
> > -	}
> >  	KVM_EVENT(3, "vm 0x%pK destroyed", kvm);
> >  }
> > 
> > diff --git a/arch/s390/kvm/priv.c b/arch/s390/kvm/priv.c
> > index 572496c..321d6b2 100644
> > --- a/arch/s390/kvm/priv.c
> > +++ b/arch/s390/kvm/priv.c
> > @@ -954,9 +954,11 @@ static int handle_pfmf(struct kvm_vcpu *vcpu)
> >  	return 0;
> >  }
> > 
> > -static inline int do_essa(struct kvm_vcpu *vcpu, const int orc)
> > +/*
> > + * Must be called with relevant read locks held
> > (kvm->mm->mmap_sem, kvm->srcu)
> > + */
> > +static inline int __do_essa(struct kvm_vcpu *vcpu, const int orc)
> >  {
> > -	struct kvm_s390_migration_state *ms =
> > vcpu->kvm->arch.migration_state; int r1, r2, nappended, entries;
> >  	unsigned long gfn, hva, res, pgstev, ptev;
> >  	unsigned long *cbrlo;
> > @@ -965,7 +967,6 @@ static inline int do_essa(struct kvm_vcpu
> > *vcpu, const int orc)
> >  	 * We don't need to set SD.FPF.SK to 1 here, because if we
> > have a
> >  	 * machine check here we either handle it or crash
> >  	 */
> > -
> >  	kvm_s390_get_regs_rre(vcpu, &r1, &r2);
> >  	gfn = vcpu->run->s.regs.gprs[r2] >> PAGE_SHIFT;
> >  	hva = gfn_to_hva(vcpu->kvm, gfn);
> > @@ -1007,9 +1008,17 @@ static inline int do_essa(struct kvm_vcpu
> > *vcpu, const int orc) }
> > 
> >  	if (orc) {
> > -		/* increment only if we are really flipping the
> > bit to 1 */
> > -		if (!test_and_set_bit(gfn, ms->pgste_bitmap))
> > -			atomic64_inc(&ms->dirty_pages);
> > +		struct kvm_memory_slot *ms =
> > gfn_to_memslot(vcpu->kvm, gfn);
> > +		unsigned long *bm;
> > +
> > +		if (ms) {
> > +			/* The cmma bitmap is right after the
> > memory bitmap */
> > +			bm = ms->dirty_bitmap +
> > kvm_dirty_bitmap_bytes(ms)
> > +						/
> > sizeof(*ms->dirty_bitmap);  
> 
> Hrm, maybe we could put the utility function in kvm-s390.h so we can
> also use it here.

I had considered that too, but in the end I didn't do it. It probably
makes sense to have it in the .h, so I'll fix it.
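
Just as a sketch (assuming the helper moves to kvm-s390.h and keeps a
name along the lines of cmma_bitmap()), the hunk in __do_essa() could
then shrink to something like:

	if (orc) {
		struct kvm_memory_slot *ms = gfn_to_memslot(vcpu->kvm, gfn);

		/* increment only if we really flip the bit to 1 */
		if (ms && !test_and_set_bit(gfn - ms->base_gfn,
					    cmma_bitmap(ms)))
			atomic64_inc(&vcpu->kvm->arch.cmma_dirty_pages);
	}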
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-01-17 17:37 UTC | newest]

Thread overview: 10+ messages
2018-01-15 17:03 [PATCH v1 0/2] KVM: s390: fix storage attributes migration Claudio Imbrenda
2018-01-15 17:03 ` [PATCH v1 1/2] KVM: s390x: some utility functions for migration Claudio Imbrenda
2018-01-16 18:03   ` David Hildenbrand
2018-01-17  9:50     ` Claudio Imbrenda
2018-01-17 12:16     ` Christian Borntraeger
2018-01-15 17:03 ` [PATCH v1 2/2] KVM: s390: Fix storage attributes migration with memory slots Claudio Imbrenda
2018-01-17  9:17   ` Janosch Frank
2018-01-17 12:10     ` Claudio Imbrenda
2018-01-17 13:01   ` Janosch Frank
2018-01-17 17:37     ` Claudio Imbrenda
