kvm.vger.kernel.org archive mirror
* [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes
@ 2009-05-05 14:39 ehrhardt
  2009-05-05 14:39 ` [PATCH 1/6] kvm-s390: Fix memory slot versus run ehrhardt
                   ` (5 more replies)
  0 siblings, 6 replies; 26+ messages in thread
From: ehrhardt @ 2009-05-05 14:39 UTC (permalink / raw)
  To: Avi Kivity, kvm; +Cc: ehrhardt, Christian Borntraeger, Carsten Otte

From: Christian Ehrhardt <ehrhardt@de.ibm.com>

This is a collection of fixes for kvm-s390 that originate from several tests
made over the last few months. They have been tested for a while now and
should be ready to be merged.

All six patches were created either by Carsten Otte or Christian Borntraeger.
I'm just the one who stumbled across the filled patch queue and cleaned the
patches up for submission. The patches themselves have proper tags crediting
the creator etc.

* Not sending patches for a few weeks makes one somewhat forgetful - I beg the
pardon of everyone on cc who has now received this twice after I added the kvm
list this time.

Patches included:
[PATCH 1/6] kvm-s390: Fix memory slot versus run
[PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle
[PATCH 3/6] kvm-s390: optimize float int lock: spin_lock_bh --> spin_lock
[PATCH 4/6] kvm-s390: Unlink vcpu on destroy
[PATCH 5/6] kvm-s390: Sanity check on validity intercept
[PATCH 6/6] kvm-s390: Verify memory in kvm run

Overall-Diffstat:
 arch/s390/include/asm/kvm_host.h |    5 ++-
 arch/s390/kvm/intercept.c        |   28 ++++++++++++-------
 arch/s390/kvm/interrupt.c        |   55 ++++++++++++++++++++++++---------------
 arch/s390/kvm/kvm-s390.c         |   50 ++++++++++++++++++++++++++++-------
 arch/s390/kvm/kvm-s390.h         |    4 ++
 arch/s390/kvm/priv.c             |    4 +-
 arch/s390/kvm/sigp.c             |   16 +++++------
 7 files changed, 110 insertions(+), 52 deletions(-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-05 14:39 [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes ehrhardt
@ 2009-05-05 14:39 ` ehrhardt
  2009-05-06 12:01   ` Avi Kivity
  2009-05-05 14:39 ` [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle ehrhardt
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: ehrhardt @ 2009-05-05 14:39 UTC (permalink / raw)
  To: Avi Kivity, kvm; +Cc: ehrhardt, Christian Borntraeger, Carsten Otte

From: Carsten Otte <cotte@de.ibm.com>

This patch fixes an incorrectness in the kvm backend for s390.
In case virtual cpus are being created before the corresponding
memory slot is being registered, we need to update the sie
control blocks for the virtual cpus. In order to do that, we
use the vcpu->mutex to lock out kvm_run and friends. This way
we can ensure a consistent update of the memory for the entire
smp configuration.

Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c |   24 ++++++++++++++++++++----
 1 file changed, 20 insertions(+), 4 deletions(-)

Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -657,6 +657,8 @@ int kvm_arch_set_memory_region(struct kv
 				struct kvm_memory_slot old,
 				int user_alloc)
 {
+	int i;
+
 	/* A few sanity checks. We can have exactly one memory slot which has
 	   to start at guest virtual zero and which has to be located at a
 	   page boundary in userland and which has to end at a page boundary.
@@ -676,13 +678,27 @@ int kvm_arch_set_memory_region(struct kv
 	if (mem->memory_size & (PAGE_SIZE - 1))
 		return -EINVAL;
 
+	/* lock all vcpus */
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i])
+			mutex_lock(&kvm->vcpus[i]->mutex);
+	}
+
 	kvm->arch.guest_origin = mem->userspace_addr;
 	kvm->arch.guest_memsize = mem->memory_size;
 
-	/* FIXME: we do want to interrupt running CPUs and update their memory
-	   configuration now to avoid race conditions. But hey, changing the
-	   memory layout while virtual CPUs are running is usually bad
-	   programming practice. */
+	/* update sie control blocks, and unlock all vcpus */
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm->vcpus[i]->arch.sie_block->gmsor =
+				kvm->arch.guest_origin;
+			kvm->vcpus[i]->arch.sie_block->gmslm =
+				kvm->arch.guest_memsize +
+				kvm->arch.guest_origin +
+				VIRTIODESCSPACE - 1ul;
+			mutex_unlock(&kvm->vcpus[i]->mutex);
+		}
+	}
 
 	return 0;
 }


* [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle
  2009-05-05 14:39 [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes ehrhardt
  2009-05-05 14:39 ` [PATCH 1/6] kvm-s390: Fix memory slot versus run ehrhardt
@ 2009-05-05 14:39 ` ehrhardt
  2009-05-06 12:10   ` Avi Kivity
  2009-05-05 14:39 ` [PATCH 3/6] kvm-s390: optimize float int lock: spin_lock_bh --> spin_lock ehrhardt
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 26+ messages in thread
From: ehrhardt @ 2009-05-05 14:39 UTC (permalink / raw)
  To: Avi Kivity, kvm; +Cc: ehrhardt, Christian Borntraeger, Carsten Otte

From: Christian Borntraeger <borntraeger@de.ibm.com>

This patch reworks the s390 clock comparator wakeup to hrtimer. The clock
comparator is a per-cpu value that is compared against the TOD clock. If
ckc <= TOD an external interrupt 1004 is triggered. Since the clock comparator
and the TOD clock have a much higher resolution than jiffies we should use
hrtimers to trigger the wakeup. This speeds up guest nanosleep for small
values.

Since hrtimer callbacks run in hard-irq context, I added a tasklet to do
the actual work with interrupts enabled.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 include/asm/kvm_host.h |    5 ++++-
 kvm/interrupt.c        |   35 +++++++++++++++++++++++++----------
 kvm/kvm-s390.c         |    7 +++++--
 kvm/kvm-s390.h         |    4 +++-
 4 files changed, 37 insertions(+), 14 deletions(-)

Index: kvm/arch/s390/include/asm/kvm_host.h
===================================================================
--- kvm.orig/arch/s390/include/asm/kvm_host.h	2009-05-05 15:58:45.000000000 +0200
+++ kvm/arch/s390/include/asm/kvm_host.h	2009-05-05 16:16:49.000000000 +0200
@@ -13,6 +13,8 @@
 
 #ifndef ASM_KVM_HOST_H
 #define ASM_KVM_HOST_H
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
 #include <linux/kvm_host.h>
 #include <asm/debug.h>
 #include <asm/cpuid.h>
@@ -210,7 +212,8 @@
 	s390_fp_regs      guest_fpregs;
 	unsigned int      guest_acrs[NUM_ACRS];
 	struct kvm_s390_local_interrupt local_int;
-	struct timer_list ckc_timer;
+	struct hrtimer    ckc_timer;
+	struct tasklet_struct tasklet;
 	union  {
 		cpuid_t	  cpu_id;
 		u64	  stidp_data;
Index: kvm/arch/s390/kvm/interrupt.c
===================================================================
--- kvm.orig/arch/s390/kvm/interrupt.c	2009-05-05 15:58:45.000000000 +0200
+++ kvm/arch/s390/kvm/interrupt.c	2009-05-05 16:18:02.000000000 +0200
@@ -12,6 +12,8 @@
 
 #include <asm/lowcore.h>
 #include <asm/uaccess.h>
+#include <linux/hrtimer.h>
+#include <linux/interrupt.h>
 #include <linux/kvm_host.h>
 #include <linux/signal.h>
 #include "kvm-s390.h"
@@ -361,12 +363,12 @@
 		return 0;
 	}
 
-	sltime = (vcpu->arch.sie_block->ckc - now) / (0xf4240000ul / HZ) + 1;
+	sltime = (vcpu->arch.sie_block->ckc - now)/4096*1000;
 
-	vcpu->arch.ckc_timer.expires = jiffies + sltime;
-
-	add_timer(&vcpu->arch.ckc_timer);
-	VCPU_EVENT(vcpu, 5, "enabled wait timer:%llx jiffies", sltime);
+	hrtimer_start(&vcpu->arch.ckc_timer, ktime_set(0, sltime),
+		      HRTIMER_MODE_REL);
+	VCPU_EVENT(vcpu, 5, "enabled wait via clock comparator: %llx ns",
+		   sltime);
 no_timer:
 	spin_lock_bh(&vcpu->arch.local_int.float_int->lock);
 	spin_lock_bh(&vcpu->arch.local_int.lock);
@@ -389,21 +391,34 @@
 	remove_wait_queue(&vcpu->wq, &wait);
 	spin_unlock_bh(&vcpu->arch.local_int.lock);
 	spin_unlock_bh(&vcpu->arch.local_int.float_int->lock);
-	del_timer(&vcpu->arch.ckc_timer);
+	hrtimer_try_to_cancel(&vcpu->arch.ckc_timer);
 	return 0;
 }
 
-void kvm_s390_idle_wakeup(unsigned long data)
+void kvm_s390_tasklet(unsigned long parm)
 {
-	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
+	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
 
-	spin_lock_bh(&vcpu->arch.local_int.lock);
+	spin_lock(&vcpu->arch.local_int.lock);
 	vcpu->arch.local_int.timer_due = 1;
 	if (waitqueue_active(&vcpu->arch.local_int.wq))
 		wake_up_interruptible(&vcpu->arch.local_int.wq);
-	spin_unlock_bh(&vcpu->arch.local_int.lock);
+	spin_unlock(&vcpu->arch.local_int.lock);
 }
 
+/*
+ * low level hrtimer wake routine. Because this runs in hardirq context
+ * we schedule a tasklet to do the real work.
+ */
+enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer)
+{
+	struct kvm_vcpu *vcpu;
+
+	vcpu = container_of(timer, struct kvm_vcpu, arch.ckc_timer);
+	tasklet_schedule(&vcpu->arch.tasklet);
+
+	return HRTIMER_NORESTART;
+}
 
 void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu)
 {
Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c	2009-05-05 16:16:48.000000000 +0200
+++ kvm/arch/s390/kvm/kvm-s390.c	2009-05-05 16:16:49.000000000 +0200
@@ -15,6 +15,7 @@
 #include <linux/compiler.h>
 #include <linux/err.h>
 #include <linux/fs.h>
+#include <linux/hrtimer.h>
 #include <linux/init.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
@@ -286,8 +287,10 @@
 	vcpu->arch.sie_block->gmsor = vcpu->kvm->arch.guest_origin;
 	vcpu->arch.sie_block->ecb   = 2;
 	vcpu->arch.sie_block->eca   = 0xC1002001U;
-	setup_timer(&vcpu->arch.ckc_timer, kvm_s390_idle_wakeup,
-		 (unsigned long) vcpu);
+	hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
+	tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet,
+		     (unsigned long) vcpu);
+	vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup;
 	get_cpu_id(&vcpu->arch.cpu_id);
 	vcpu->arch.cpu_id.version = 0xff;
 	return 0;
Index: kvm/arch/s390/kvm/kvm-s390.h
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.h	2009-05-05 15:58:45.000000000 +0200
+++ kvm/arch/s390/kvm/kvm-s390.h	2009-05-05 16:16:49.000000000 +0200
@@ -14,6 +14,7 @@
 #ifndef ARCH_S390_KVM_S390_H
 #define ARCH_S390_KVM_S390_H
 
+#include <linux/hrtimer.h>
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 
@@ -41,7 +42,8 @@
 }
 
 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu);
-void kvm_s390_idle_wakeup(unsigned long data);
+enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer);
+void kvm_s390_tasklet(unsigned long parm);
 void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu);
 int kvm_s390_inject_vm(struct kvm *kvm,
 		struct kvm_s390_interrupt *s390int);


* [PATCH 3/6] kvm-s390: optimize float int lock: spin_lock_bh --> spin_lock
  2009-05-05 14:39 [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes ehrhardt
  2009-05-05 14:39 ` [PATCH 1/6] kvm-s390: Fix memory slot versus run ehrhardt
  2009-05-05 14:39 ` [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle ehrhardt
@ 2009-05-05 14:39 ` ehrhardt
  2009-05-05 14:39 ` [PATCH 4/6] kvm-s390: Unlink vcpu on destroy ehrhardt
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 26+ messages in thread
From: ehrhardt @ 2009-05-05 14:39 UTC (permalink / raw)
  To: Avi Kivity, kvm; +Cc: ehrhardt, Christian Borntraeger, Carsten Otte

From: Christian Borntraeger <borntraeger@de.ibm.com>

The floating interrupt lock is only taken in process context. We can
replace all spin_lock_bh with standard spin_lock calls.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
---
 arch/s390/kvm/interrupt.c |   20 ++++++++++----------
 arch/s390/kvm/kvm-s390.c  |    4 ++--
 arch/s390/kvm/priv.c      |    4 ++--
 arch/s390/kvm/sigp.c      |   16 ++++++++--------
 4 files changed, 22 insertions(+), 22 deletions(-)

Index: kvm/arch/s390/kvm/interrupt.c
===================================================================
--- kvm.orig/arch/s390/kvm/interrupt.c
+++ kvm/arch/s390/kvm/interrupt.c
@@ -301,13 +301,13 @@ int kvm_cpu_has_interrupt(struct kvm_vcp
 	}
 
 	if ((!rc) && atomic_read(&fi->active)) {
-		spin_lock_bh(&fi->lock);
+		spin_lock(&fi->lock);
 		list_for_each_entry(inti, &fi->list, list)
 			if (__interrupt_is_deliverable(vcpu, inti)) {
 				rc = 1;
 				break;
 			}
-		spin_unlock_bh(&fi->lock);
+		spin_unlock(&fi->lock);
 	}
 
 	if ((!rc) && (vcpu->arch.sie_block->ckc <
@@ -368,7 +368,7 @@ int kvm_s390_handle_wait(struct kvm_vcpu
 	hrtimer_start(&vcpu->arch.ckc_timer, ktime_set (0, sltime) , HRTIMER_MODE_REL);
 	VCPU_EVENT(vcpu, 5, "enabled wait via clock comparator: %llx ns", sltime);
 no_timer:
-	spin_lock_bh(&vcpu->arch.local_int.float_int->lock);
+	spin_lock(&vcpu->arch.local_int.float_int->lock);
 	spin_lock_bh(&vcpu->arch.local_int.lock);
 	add_wait_queue(&vcpu->arch.local_int.wq, &wait);
 	while (list_empty(&vcpu->arch.local_int.list) &&
@@ -377,18 +377,18 @@ no_timer:
 		!signal_pending(current)) {
 		set_current_state(TASK_INTERRUPTIBLE);
 		spin_unlock_bh(&vcpu->arch.local_int.lock);
-		spin_unlock_bh(&vcpu->arch.local_int.float_int->lock);
+		spin_unlock(&vcpu->arch.local_int.float_int->lock);
 		vcpu_put(vcpu);
 		schedule();
 		vcpu_load(vcpu);
-		spin_lock_bh(&vcpu->arch.local_int.float_int->lock);
+		spin_lock(&vcpu->arch.local_int.float_int->lock);
 		spin_lock_bh(&vcpu->arch.local_int.lock);
 	}
 	__unset_cpu_idle(vcpu);
 	__set_current_state(TASK_RUNNING);
 	remove_wait_queue(&vcpu->wq, &wait);
 	spin_unlock_bh(&vcpu->arch.local_int.lock);
-	spin_unlock_bh(&vcpu->arch.local_int.float_int->lock);
+	spin_unlock(&vcpu->arch.local_int.float_int->lock);
 	hrtimer_try_to_cancel(&vcpu->arch.ckc_timer);
 	return 0;
 }
@@ -455,7 +455,7 @@ void kvm_s390_deliver_pending_interrupts
 	if (atomic_read(&fi->active)) {
 		do {
 			deliver = 0;
-			spin_lock_bh(&fi->lock);
+			spin_lock(&fi->lock);
 			list_for_each_entry_safe(inti, n, &fi->list, list) {
 				if (__interrupt_is_deliverable(vcpu, inti)) {
 					list_del(&inti->list);
@@ -466,7 +466,7 @@ void kvm_s390_deliver_pending_interrupts
 			}
 			if (list_empty(&fi->list))
 				atomic_set(&fi->active, 0);
-			spin_unlock_bh(&fi->lock);
+			spin_unlock(&fi->lock);
 			if (deliver) {
 				__do_deliver_interrupt(vcpu, inti);
 				kfree(inti);
@@ -531,7 +531,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
 
 	mutex_lock(&kvm->lock);
 	fi = &kvm->arch.float_int;
-	spin_lock_bh(&fi->lock);
+	spin_lock(&fi->lock);
 	list_add_tail(&inti->list, &fi->list);
 	atomic_set(&fi->active, 1);
 	sigcpu = find_first_bit(fi->idle_mask, KVM_MAX_VCPUS);
@@ -548,7 +548,7 @@ int kvm_s390_inject_vm(struct kvm *kvm,
 	if (waitqueue_active(&li->wq))
 		wake_up_interruptible(&li->wq);
 	spin_unlock_bh(&li->lock);
-	spin_unlock_bh(&fi->lock);
+	spin_unlock(&fi->lock);
 	mutex_unlock(&kvm->lock);
 	return 0;
 }
Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -323,11 +323,11 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st
 	spin_lock_init(&vcpu->arch.local_int.lock);
 	INIT_LIST_HEAD(&vcpu->arch.local_int.list);
 	vcpu->arch.local_int.float_int = &kvm->arch.float_int;
-	spin_lock_bh(&kvm->arch.float_int.lock);
+	spin_lock(&kvm->arch.float_int.lock);
 	kvm->arch.float_int.local_int[id] = &vcpu->arch.local_int;
 	init_waitqueue_head(&vcpu->arch.local_int.wq);
 	vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags;
-	spin_unlock_bh(&kvm->arch.float_int.lock);
+	spin_unlock(&kvm->arch.float_int.lock);
 
 	rc = kvm_vcpu_init(vcpu, kvm, id);
 	if (rc)
Index: kvm/arch/s390/kvm/priv.c
===================================================================
--- kvm.orig/arch/s390/kvm/priv.c
+++ kvm/arch/s390/kvm/priv.c
@@ -204,11 +204,11 @@ static void handle_stsi_3_2_2(struct kvm
 	int cpus = 0;
 	int n;
 
-	spin_lock_bh(&fi->lock);
+	spin_lock(&fi->lock);
 	for (n = 0; n < KVM_MAX_VCPUS; n++)
 		if (fi->local_int[n])
 			cpus++;
-	spin_unlock_bh(&fi->lock);
+	spin_unlock(&fi->lock);
 
 	/* deal with other level 3 hypervisors */
 	if (stsi(mem, 3, 2, 2) == -ENOSYS)
Index: kvm/arch/s390/kvm/sigp.c
===================================================================
--- kvm.orig/arch/s390/kvm/sigp.c
+++ kvm/arch/s390/kvm/sigp.c
@@ -52,7 +52,7 @@ static int __sigp_sense(struct kvm_vcpu 
 	if (cpu_addr >= KVM_MAX_VCPUS)
 		return 3; /* not operational */
 
-	spin_lock_bh(&fi->lock);
+	spin_lock(&fi->lock);
 	if (fi->local_int[cpu_addr] == NULL)
 		rc = 3; /* not operational */
 	else if (atomic_read(fi->local_int[cpu_addr]->cpuflags)
@@ -64,7 +64,7 @@ static int __sigp_sense(struct kvm_vcpu 
 		*reg |= SIGP_STAT_STOPPED;
 		rc = 1; /* status stored */
 	}
-	spin_unlock_bh(&fi->lock);
+	spin_unlock(&fi->lock);
 
 	VCPU_EVENT(vcpu, 4, "sensed status of cpu %x rc %x", cpu_addr, rc);
 	return rc;
@@ -86,7 +86,7 @@ static int __sigp_emergency(struct kvm_v
 
 	inti->type = KVM_S390_INT_EMERGENCY;
 
-	spin_lock_bh(&fi->lock);
+	spin_lock(&fi->lock);
 	li = fi->local_int[cpu_addr];
 	if (li == NULL) {
 		rc = 3; /* not operational */
@@ -102,7 +102,7 @@ static int __sigp_emergency(struct kvm_v
 	spin_unlock_bh(&li->lock);
 	rc = 0; /* order accepted */
 unlock:
-	spin_unlock_bh(&fi->lock);
+	spin_unlock(&fi->lock);
 	VCPU_EVENT(vcpu, 4, "sent sigp emerg to cpu %x", cpu_addr);
 	return rc;
 }
@@ -123,7 +123,7 @@ static int __sigp_stop(struct kvm_vcpu *
 
 	inti->type = KVM_S390_SIGP_STOP;
 
-	spin_lock_bh(&fi->lock);
+	spin_lock(&fi->lock);
 	li = fi->local_int[cpu_addr];
 	if (li == NULL) {
 		rc = 3; /* not operational */
@@ -142,7 +142,7 @@ static int __sigp_stop(struct kvm_vcpu *
 	spin_unlock_bh(&li->lock);
 	rc = 0; /* order accepted */
 unlock:
-	spin_unlock_bh(&fi->lock);
+	spin_unlock(&fi->lock);
 	VCPU_EVENT(vcpu, 4, "sent sigp stop to cpu %x", cpu_addr);
 	return rc;
 }
@@ -188,7 +188,7 @@ static int __sigp_set_prefix(struct kvm_
 	if (!inti)
 		return 2; /* busy */
 
-	spin_lock_bh(&fi->lock);
+	spin_lock(&fi->lock);
 	li = fi->local_int[cpu_addr];
 
 	if ((cpu_addr >= KVM_MAX_VCPUS) || (li == NULL)) {
@@ -220,7 +220,7 @@ static int __sigp_set_prefix(struct kvm_
 out_li:
 	spin_unlock_bh(&li->lock);
 out_fi:
-	spin_unlock_bh(&fi->lock);
+	spin_unlock(&fi->lock);
 	return rc;
 }
 


* [PATCH 4/6] kvm-s390: Unlink vcpu on destroy
  2009-05-05 14:39 [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes ehrhardt
                   ` (2 preceding siblings ...)
  2009-05-05 14:39 ` [PATCH 3/6] kvm-s390: optimize float int lock: spin_lock_bh --> spin_lock ehrhardt
@ 2009-05-05 14:39 ` ehrhardt
  2009-05-06 12:11   ` Avi Kivity
  2009-05-05 14:39 ` [PATCH 5/6] kvm-s390: Sanity check on validity intercept ehrhardt
  2009-05-05 14:39 ` [PATCH 6/6] kvm-s390: Verify memory in kvm run ehrhardt
  5 siblings, 1 reply; 26+ messages in thread
From: ehrhardt @ 2009-05-05 14:39 UTC (permalink / raw)
  To: Avi Kivity, kvm; +Cc: ehrhardt, Christian Borntraeger, Carsten Otte

From: Carsten Otte <cotte@de.ibm.com>

This patch makes sure we do unlink a vcpu's sie control block
from the system control area in kvm_arch_vcpu_destroy. This
prevents illegal accesses to the sie control block from other
virtual cpus after free.

Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -195,6 +195,9 @@ out_nokvm:
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
 	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
+	if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
+		(__u64) vcpu->arch.sie_block)
+		vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
 	free_page((unsigned long)(vcpu->arch.sie_block));
 	kvm_vcpu_uninit(vcpu);
 	kfree(vcpu);
@@ -307,8 +310,10 @@ struct kvm_vcpu *kvm_arch_vcpu_create(st
 
 	vcpu->arch.sie_block->icpua = id;
 	BUG_ON(!kvm->arch.sca);
-	BUG_ON(kvm->arch.sca->cpu[id].sda);
-	kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
+	if (!kvm->arch.sca->cpu[id].sda)
+		kvm->arch.sca->cpu[id].sda = (__u64) vcpu->arch.sie_block;
+	else
+		BUG_ON(!kvm->vcpus[id]); /* vcpu does already exist */
 	vcpu->arch.sie_block->scaoh = (__u32)(((__u64)kvm->arch.sca) >> 32);
 	vcpu->arch.sie_block->scaol = (__u32)(__u64)kvm->arch.sca;
 


* [PATCH 5/6] kvm-s390: Sanity check on validity intercept
  2009-05-05 14:39 [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes ehrhardt
                   ` (3 preceding siblings ...)
  2009-05-05 14:39 ` [PATCH 4/6] kvm-s390: Unlink vcpu on destroy ehrhardt
@ 2009-05-05 14:39 ` ehrhardt
  2009-05-05 14:39 ` [PATCH 6/6] kvm-s390: Verify memory in kvm run ehrhardt
  5 siblings, 0 replies; 26+ messages in thread
From: ehrhardt @ 2009-05-05 14:39 UTC (permalink / raw)
  To: Avi Kivity, kvm; +Cc: ehrhardt, Christian Borntraeger, Carsten Otte

From: Carsten Otte <cotte@de.ibm.com>

This patch adds a sanity check on the content of the guest
prefix register before faulting in the cpu lowcore that it
refers to. Without this fix, a guest with incorrect content
in the prefix register might end up in an endless loop in
which SIE complains about the missing lowcore.

Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 arch/s390/kvm/intercept.c |   28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

Index: kvm/arch/s390/kvm/intercept.c
===================================================================
--- kvm.orig/arch/s390/kvm/intercept.c
+++ kvm/arch/s390/kvm/intercept.c
@@ -154,17 +154,25 @@ static int handle_stop(struct kvm_vcpu *
 static int handle_validity(struct kvm_vcpu *vcpu)
 {
 	int viwhy = vcpu->arch.sie_block->ipb >> 16;
+	int rc;
+
 	vcpu->stat.exit_validity++;
-	if (viwhy == 0x37) {
-		fault_in_pages_writeable((char __user *)
-					 vcpu->kvm->arch.guest_origin +
-					 vcpu->arch.sie_block->prefix,
-					 PAGE_SIZE);
-		return 0;
-	}
-	VCPU_EVENT(vcpu, 2, "unhandled validity intercept code %d",
-		   viwhy);
-	return -ENOTSUPP;
+	if ((viwhy == 0x37) && (vcpu->arch.sie_block->prefix
+		<= vcpu->kvm->arch.guest_memsize - 2*PAGE_SIZE)){
+		rc = fault_in_pages_writeable((char __user *)
+			 vcpu->kvm->arch.guest_origin +
+			 vcpu->arch.sie_block->prefix,
+			 2*PAGE_SIZE);
+		if (rc)
+			/* user will receive sigsegv, exit to user */
+			rc = -ENOTSUPP;
+	} else
+		rc = -ENOTSUPP;
+
+	if (rc)
+		VCPU_EVENT(vcpu, 2, "unhandled validity intercept code %d",
+			   viwhy);
+	return rc;
 }
 
 static int handle_instruction(struct kvm_vcpu *vcpu)


* [PATCH 6/6] kvm-s390: Verify memory in kvm run
  2009-05-05 14:39 [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes ehrhardt
                   ` (4 preceding siblings ...)
  2009-05-05 14:39 ` [PATCH 5/6] kvm-s390: Sanity check on validity intercept ehrhardt
@ 2009-05-05 14:39 ` ehrhardt
  5 siblings, 0 replies; 26+ messages in thread
From: ehrhardt @ 2009-05-05 14:39 UTC (permalink / raw)
  To: Avi Kivity, kvm; +Cc: ehrhardt, Christian Borntraeger, Carsten Otte

From: Carsten Otte <cotte@de.ibm.com>

This check verifies that the guest we're trying to run in KVM_RUN
has some memory assigned to it. Without this check, the guest
enters an endless exception loop.

Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
Signed-off-by: Carsten Otte <cotte@de.ibm.com>
---
 arch/s390/kvm/kvm-s390.c |    6 ++++++
 1 file changed, 6 insertions(+)

Index: kvm/arch/s390/kvm/kvm-s390.c
===================================================================
--- kvm.orig/arch/s390/kvm/kvm-s390.c
+++ kvm/arch/s390/kvm/kvm-s390.c
@@ -478,6 +478,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_v
 
 	vcpu_load(vcpu);
 
+	/* verify, that memory has been registered */
+	if (!vcpu->kvm->arch.guest_memsize) {
+		vcpu_put(vcpu);
+		return -EINVAL;
+	}
+
 	if (vcpu->sigset_active)
 		sigprocmask(SIG_SETMASK, &vcpu->sigset, &sigsaved);
 


* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-05 14:39 ` [PATCH 1/6] kvm-s390: Fix memory slot versus run ehrhardt
@ 2009-05-06 12:01   ` Avi Kivity
  2009-05-11 13:00     ` Christian Ehrhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-06 12:01 UTC (permalink / raw)
  To: ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

ehrhardt@linux.vnet.ibm.com wrote:
> From: Carsten Otte <cotte@de.ibm.com>
>
> This patch fixes an incorrectness in the kvm backend for s390.
> In case virtual cpus are being created before the corresponding
> memory slot is being registered, we need to update the sie
> control blocks for the virtual cpus. In order to do that, we
> use the vcpu->mutex to lock out kvm_run and friends. This way
> we can ensure a consistent update of the memory for the entire
> smp configuration.
> @@ -657,6 +657,8 @@ int kvm_arch_set_memory_region(struct kv
>  				struct kvm_memory_slot old,
>  				int user_alloc)
>  {
> +	int i;
> +
>  	/* A few sanity checks. We can have exactly one memory slot which has
>  	   to start at guest virtual zero and which has to be located at a
>  	   page boundary in userland and which has to end at a page boundary.
> @@ -676,13 +678,27 @@ int kvm_arch_set_memory_region(struct kv
>  	if (mem->memory_size & (PAGE_SIZE - 1))
>  		return -EINVAL;
>  
> +	/* lock all vcpus */
> +	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
> +		if (kvm->vcpus[i])
> +			mutex_lock(&kvm->vcpus[i]->mutex);
> +	}
> +
>  	

Can't that livelock?  Nothing requires a vcpu to ever exit, and if the 
cpu it's running on has no other load and no interrupts, it could 
remain in guest mode indefinitely, and then the ioctl will hang, 
waiting for something to happen.

On x86, we use slots_lock to protect memory slots.  When we change the 
global memory configuration, we set a bit in vcpu->requests, and send an 
IPI to all cpus that are currently in guest mode for our guest.  This 
forces the cpu back to host mode.  On the next entry, vcpu_run notices 
vcpu->requests has the bit set and reloads the mmu configuration.  Of 
course, all this may be overkill for s390.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle
  2009-05-05 14:39 ` [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle ehrhardt
@ 2009-05-06 12:10   ` Avi Kivity
  2009-05-06 12:36     ` Christian Borntraeger
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-06 12:10 UTC (permalink / raw)
  To: ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

ehrhardt@linux.vnet.ibm.com wrote:
> From: Christian Borntraeger <borntraeger@de.ibm.com>
>
> This patch reworks the s390 clock comparator wakeup to hrtimer. The clock
> comparator is a per-cpu value that is compared against the TOD clock. If
> ckc <= TOD an external interrupt 1004 is triggered. Since the clock comparator
> and the TOD clock have a much higher resolution than jiffies we should use
> hrtimers to trigger the wakeup. This speeds up guest nanosleep for small
> values.
>
> Since hrtimer callbacks run in hard-irq context, I added a tasklet to do
> the actual work with interrupts enabled.
>
>  
> -void kvm_s390_idle_wakeup(unsigned long data)
> +void kvm_s390_tasklet(unsigned long parm)
>  {
> -	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
>  
> -	spin_lock_bh(&vcpu->arch.local_int.lock);
> +	spin_lock(&vcpu->arch.local_int.lock);
>  	vcpu->arch.local_int.timer_due = 1;
>  	if (waitqueue_active(&vcpu->arch.local_int.wq))
>  		wake_up_interruptible(&vcpu->arch.local_int.wq);
> -	spin_unlock_bh(&vcpu->arch.local_int.lock);
> +	spin_unlock(&vcpu->arch.local_int.lock);
>  }
>   

Why can't this be done from the timer context (after adjusting the locks)?

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 4/6] kvm-s390: Unlink vcpu on destroy
  2009-05-05 14:39 ` [PATCH 4/6] kvm-s390: Unlink vcpu on destroy ehrhardt
@ 2009-05-06 12:11   ` Avi Kivity
  2009-05-11 13:00     ` Christian Ehrhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-06 12:11 UTC (permalink / raw)
  To: ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

ehrhardt@linux.vnet.ibm.com wrote:
> From: Carsten Otte <cotte@de.ibm.com>
>
> This patch makes sure we do unlink a vcpu's sie control block
> from the system control area in kvm_arch_vcpu_destroy. This
> prevents illegal accesses to the sie control block from other
> virtual cpus after free.
>
> Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
> Signed-off-by: Carsten Otte <cotte@de.ibm.com>
> ---
>  arch/s390/kvm/kvm-s390.c |    9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> Index: kvm/arch/s390/kvm/kvm-s390.c
> ===================================================================
> --- kvm.orig/arch/s390/kvm/kvm-s390.c
> +++ kvm/arch/s390/kvm/kvm-s390.c
> @@ -195,6 +195,9 @@ out_nokvm:
>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>  {
>  	VCPU_EVENT(vcpu, 3, "%s", "free cpu");
> +	if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
> +		(__u64) vcpu->arch.sie_block)
> +		vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
>  	free_page((unsigned long)(vcpu->arch.sie_block));
>
>   

If this is accessed by hardware on a different cpu, don't you need a 
memory barrier here?


-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle
  2009-05-06 12:10   ` Avi Kivity
@ 2009-05-06 12:36     ` Christian Borntraeger
  2009-05-07 10:19       ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Borntraeger @ 2009-05-06 12:36 UTC (permalink / raw)
  To: Avi Kivity; +Cc: ehrhardt, kvm, Carsten Otte

On Wednesday 06 May 2009 14:10:07, Avi Kivity wrote:
> ehrhardt@linux.vnet.ibm.com wrote:
> > From: Christian Borntraeger <borntraeger@de.ibm.com>
> >
> > This patch reworks the s390 clock comparator wakeup to hrtimer. The clock
> > comparator is a per-cpu value that is compared against the TOD clock. If
> > ckc <= TOD an external interrupt 1004 is triggered. Since the clock comparator
> > and the TOD clock have a much higher resolution than jiffies we should use
> > hrtimers to trigger the wakeup. This speeds up guest nanosleep for small
> > values.
> >
> > Since hrtimer callbacks run in hard-irq context, I added a tasklet to do
> > the actual work with interrupts enabled.
> >
> >  
> > -void kvm_s390_idle_wakeup(unsigned long data)
> > +void kvm_s390_tasklet(unsigned long parm)
> >  {
> > -	struct kvm_vcpu *vcpu = (struct kvm_vcpu *)data;
> > +	struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm;
> >  
> > -	spin_lock_bh(&vcpu->arch.local_int.lock);
> > +	spin_lock(&vcpu->arch.local_int.lock);
> >  	vcpu->arch.local_int.timer_due = 1;
> >  	if (waitqueue_active(&vcpu->arch.local_int.wq))
> >  		wake_up_interruptible(&vcpu->arch.local_int.wq);
> > -	spin_unlock_bh(&vcpu->arch.local_int.lock);
> > +	spin_unlock(&vcpu->arch.local_int.lock);
> >  }
> >   
> 
> Why can't this be done from the timer context (after adjusting the locks)?

It can be done; in fact I did that in my first version. The thing is, we would need to change all local_int.locks to irq-safe spinlocks, since standard timers are softirqs while hrtimers are hardirqs. Enabling and disabling irqs is a relatively expensive operation on s390 (but locks via compare and swap are quite cheap). Since we take this specific lock in lots of places, it sits on some hot paths. The idle wakeup, on the other hand, is not that critical.

Christian


* Re: [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle
  2009-05-06 12:36     ` Christian Borntraeger
@ 2009-05-07 10:19       ` Avi Kivity
  2009-05-07 10:34         ` Christian Borntraeger
  2009-05-20 15:48         ` Hollis Blanchard
  0 siblings, 2 replies; 26+ messages in thread
From: Avi Kivity @ 2009-05-07 10:19 UTC (permalink / raw)
  To: Christian Borntraeger; +Cc: ehrhardt, kvm, Carsten Otte

Christian Borntraeger wrote:
>> Why can't this be done from the timer context (after adjusting the locks)?
>>     
>
> It can be done; in fact I did that in my first version. The thing is, we would need to change all local_int.locks to irq-safe spinlocks, since standard timers are softirqs while hrtimers are hardirqs. Enabling and disabling irqs is a relatively expensive operation on s390 (but locks via compare and swap are quite cheap). Since we take this specific lock in lots of places, it sits on some hot paths. The idle wakeup, on the other hand, is not that critical.
>   

Makes sense.

Back when irqsave/irqrestore were expensive on x86 (P4 days, I think it 
was ~100 cycles) there was talk of using a software irq disable flag 
instead of using the hardware support.  So

- local_irq_disable() sets a flag

- interrupt handlers check the flag, if set they schedule the interrupt 
handler and return immediately

- local_irq_restore() clears the flag and fires any scheduled handlers

Now these operations are pretty cheap on x86, but maybe that can apply 
to s390.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle
  2009-05-07 10:19       ` Avi Kivity
@ 2009-05-07 10:34         ` Christian Borntraeger
  2009-05-20 15:48         ` Hollis Blanchard
  1 sibling, 0 replies; 26+ messages in thread
From: Christian Borntraeger @ 2009-05-07 10:34 UTC (permalink / raw)
  To: Avi Kivity; +Cc: ehrhardt, kvm, Carsten Otte

On Thursday 07 May 2009 12:19:32, Avi Kivity wrote:
> Makes sense.
> 
> Back when irqsave/irqrestore were expensive on x86 (P4 days, I think it 
> was ~100 cycles) there was talk of using a software irq disable flag 
> instead of using the hardware support.  So
> 
> - local_irq_disable() sets a flag
> 
> - interrupt handlers check the flag, if set they schedule the interrupt 
> handler and return immediately
> 
> - local_irq_restore() clears the flag and fires any scheduled handlers
> 
> Now these operations are pretty cheap on x86, but maybe that can apply 
> to s390.

Interesting idea. Nevertheless, I don't think it would improve our situation. The affected instructions (stosm and stnsm) are more expensive than compare and swap, but it's nowhere near the ~100 cycles of the P4.

Christian


* Re: [PATCH 4/6] kvm-s390: Unlink vcpu on destroy
  2009-05-06 12:11   ` Avi Kivity
@ 2009-05-11 13:00     ` Christian Ehrhardt
  0 siblings, 0 replies; 26+ messages in thread
From: Christian Ehrhardt @ 2009-05-11 13:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> ehrhardt@linux.vnet.ibm.com wrote:
>> From: Carsten Otte <cotte@de.ibm.com>
>>
>> This patch makes sure we do unlink a vcpu's sie control block
>> from the system control area in kvm_arch_vcpu_destroy. This
>> prevents illegal accesses to the sie control block from other
>> virtual cpus after free.
>>
>> Reported-by: Mijo Safradin <mijo@linux.vnet.ibm.com>
>> Signed-off-by: Carsten Otte <cotte@de.ibm.com>
>> ---
>>  arch/s390/kvm/kvm-s390.c |    9 +++++++--
>>  1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> Index: kvm/arch/s390/kvm/kvm-s390.c
>> ===================================================================
>> --- kvm.orig/arch/s390/kvm/kvm-s390.c
>> +++ kvm/arch/s390/kvm/kvm-s390.c
>> @@ -195,6 +195,9 @@ out_nokvm:
>>  void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
>>  {
>>      VCPU_EVENT(vcpu, 3, "%s", "free cpu");
>> +    if (vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda ==
>> +        (__u64) vcpu->arch.sie_block)
>> +        vcpu->kvm->arch.sca->cpu[vcpu->vcpu_id].sda = 0;
>>      free_page((unsigned long)(vcpu->arch.sie_block));
>>
>>   
>
> If this is accessed by hardware on a different cpu, don't you need a 
> memory barrier here?
>
>
Right, will be in v2

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-06 12:01   ` Avi Kivity
@ 2009-05-11 13:00     ` Christian Ehrhardt
  2009-05-11 13:15       ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Ehrhardt @ 2009-05-11 13:00 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> ehrhardt@linux.vnet.ibm.com wrote:
>> From: Carsten Otte <cotte@de.ibm.com>
>>
>> This patch fixes an incorrectness in the kvm backend for s390.
>> In case virtual cpus are being created before the corresponding
>> memory slot is being registered, we need to update the sie
>> control blocks for the virtual cpus. In order to do that, we
>> use the vcpu->mutex to lock out kvm_run and friends. This way
>> we can ensure a consistent update of the memory for the entire
>> smp configuration.
>> @@ -657,6 +657,8 @@ int kvm_arch_set_memory_region(struct kv
>>                  struct kvm_memory_slot old,
>>                  int user_alloc)
>>  {
>> +    int i;
>> +
>>      /* A few sanity checks. We can have exactly one memory slot 
>> which has
>>         to start at guest virtual zero and which has to be located at a
>>         page boundary in userland and which has to end at a page 
>> boundary.
>> @@ -676,13 +678,27 @@ int kvm_arch_set_memory_region(struct kv
>>      if (mem->memory_size & (PAGE_SIZE - 1))
>>          return -EINVAL;
>>  
>> +    /* lock all vcpus */
>> +    for (i = 0; i < KVM_MAX_VCPUS; ++i) {
>> +        if (kvm->vcpus[i])
>> +            mutex_lock(&kvm->vcpus[i]->mutex);
>> +    }
>> +
>>     
>
> Can't that livelock?  Nothing requires a vcpu to ever exit, and if the 
> cpu it's running on has no other load and no interrupts, it 
> could remain in guest mode indefinitely, and then the ioctl will hang, 
> waiting for something to happen.
>
Yes it could wait indefinitely - good spot.

> On x86, we use slots_lock to protect memory slots.  When we change the 
> global memory configuration, we set a bit in vcpu->requests, and send 
> an IPI to all cpus that are currently in guest mode for our guest.  
> This forces the cpu back to host mode.  On the next entry, vcpu_run 
> notices vcpu->requests has the bit set and reloads the mmu 
> configuration.  Of course, all this may be overkill for s390.
>
I thought about implementing it with slots_lock, vcpu->request, etc but 
it really looks like overkill for s390.
At least today we can assume that we only have one memslot. Therefore a 
set_memslot with already created vcpus will still not interfere with 
running vcpus (they can't run without a memslot, and since we have only 
one they won't be running).
Anyway, the code is prepared to "meet" running vcpus, because that might 
be different in the future. To prevent the livelock issue I changed the 
code to use mutex_trylock; in case I can't get the lock I explicitly let 
the vcpu exit from the guest.

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-11 13:00     ` Christian Ehrhardt
@ 2009-05-11 13:15       ` Avi Kivity
  2009-05-11 13:46         ` Christian Ehrhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-11 13:15 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

Christian Ehrhardt wrote:
>
>> On x86, we use slots_lock to protect memory slots.  When we change 
>> the global memory configuration, we set a bit in vcpu->requests, and 
>> send an IPI to all cpus that are currently in guest mode for our 
>> guest.  This forces the cpu back to host mode.  On the next entry, 
>> vcpu_run notices vcpu->requests has the bit set and reloads the mmu 
>> configuration.  Of course, all this may be overkill for s390.
>>
> I thought about implementing it with slots_lock, vcpu->request, etc 
> but it really looks like overkill for s390.

We could make (some of) it common code, so it won't look so bad.  
There's value in having all kvm ports do things similarly; though of 
course we shouldn't force the solution when it isn't really needed.

vcpu->requests is useful whenever we modify global VM state that needs 
to be seen by all vcpus in host mode; see  kvm_reload_remote_mmus().

> At least today we can assume that we only have one memslot. Therefore 
> a set_memslot with already created vcpu's will still not interfere 
> with running vcpus (they can't run without memslot and since we have 
> only one they won't run).
> Anyway, the code is prepared to "meet" running vcpus, because it 
> might be different in future. To prevent the livelock issue I changed 
> the code using mutex_trylock and in case I can't get the lock I 
> explicitly let the vcpu exit from guest.

Why not do it unconditionally?

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-11 13:15       ` Avi Kivity
@ 2009-05-11 13:46         ` Christian Ehrhardt
  2009-05-11 14:02           ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Ehrhardt @ 2009-05-11 13:46 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>
>>> On x86, we use slots_lock to protect memory slots.  When we change 
>>> the global memory configuration, we set a bit in vcpu->requests, and 
>>> send an IPI to all cpus that are currently in guest mode for our 
>>> guest.  This forces the cpu back to host mode.  On the next entry, 
>>> vcpu_run notices vcpu->requests has the bit set and reloads the mmu 
>>> configuration.  Of course, all this may be overkill for s390.
>>>
>> I thought about implementing it with slots_lock, vcpu->request, etc 
>> but it really looks like overkill for s390.
>
> We could make (some of) it common code, so it won't look so bad.  
> There's value in having all kvm ports do things similarly; though of 
> course we shouldn't force the solution when it isn't really needed.
>
> vcpu->requests is useful whenever we modify global VM state that needs 
> to be seen by all vcpus in host mode; see  kvm_reload_remote_mmus().
Yeah, I read that code after your first hint in this thread, and I agree 
that merging some of this into common code might be good.
But in my opinion not now, for this bugfix patch (the intention is just 
to prevent a user being able to crash the host via vcpu create, set mem 
and vcpu run, in that order).
It might be a good point to further streamline this once we use the same 
userspace code, but I think it doesn't make sense yet.
>
>> At least today we can assume that we only have one memslot. Therefore 
>> a set_memslot with already created vcpu's will still not interfere 
>> with running vcpus (they can't run without memslot and since we have 
>> only one they won't run).
>> Anyway, the code is prepared to "meet" running vcpus, because it 
>> might be different in future. To prevent the livelock issue I changed 
>> the code using mutex_trylock and in case I can't get the lock I 
>> explicitly let the vcpu exit from guest.
>
> Why not do it unconditionally?
>
Hmm, I might have written that misleadingly - eventually it's a loop 
until it gets the lock:
  while !trylock
    kick vcpu out of guest
    schedule

There is no reason to kick out guests where I got the lock cleanly, as 
far as I see.
Especially as I expect the vcpus not to be running in the common case, as 
I explained above (they can't run without a memslot + we only have one => 
no vcpu will run).


-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-11 13:46         ` Christian Ehrhardt
@ 2009-05-11 14:02           ` Avi Kivity
  2009-05-11 14:42             ` Christian Ehrhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-11 14:02 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

Christian Ehrhardt wrote:
>>> I thought about implementing it with slots_lock, vcpu->request, etc 
>>> but it really looks like overkill for s390.
>>
>> We could make (some of) it common code, so it won't look so bad.  
>> There's value in having all kvm ports do things similarly; though of 
>> course we shouldn't force the solution when it isn't really needed.
>>
>> vcpu->requests is useful whenever we modify global VM state that 
>> needs to be seen by all vcpus in host mode; see  
>> kvm_reload_remote_mmus().
> yeah I read that code after your first hint in that thread, and I 
> agree that merging some of this into common code might be good.
> But in my opinion not now for this bugfix patch (the intention is just 
> to prevent a user being able to crash the host via vcpu create, set 
> mem and vcpu run, in that order).
> It might be a good point to further streamline this once we use the 
> same userspace code, but I think it doesn't make sense yet.

Sure, don't mix bugfixes with infrastructure changes, when possible.

>>> At least today we can assume that we only have one memslot. 
>>> Therefore a set_memslot with already created vcpu's will still not 
>>> interfere with running vcpus (they can't run without memslot and 
>>> since we have only one they won't run).
>>> Anyway, the code is prepared to "meet" running vcpus, because it 
>>> might be different in future. To prevent the livelock issue I 
>>> changed the code using mutex_trylock and in case I can't get the 
>>> lock I explicitly let the vcpu exit from guest.
>>
>> Why not do it unconditionally?
>>
> Hmm, I might have written that misleadingly - eventually it's a loop 
> until it gets the lock:
>  while !trylock
>    kick vcpu out of guest
>    schedule
>
> There is no reason to kick out guests where I got the lock cleanly as 
> far as I see.
> Especially as I expect the vcpus not running in the common case as I 
> explained above (can't run without memslot + we only have one => no 
> vcpu will run).

Still livelockable, unless you stop the vcpu from entering the guest 
immediately.

That's why vcpu->requests is so powerful.  Not only you kick the vcpu 
out of guest mode, you force it to synchronize when it tries to enter again.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-11 14:02           ` Avi Kivity
@ 2009-05-11 14:42             ` Christian Ehrhardt
  2009-05-11 15:01               ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Ehrhardt @ 2009-05-11 14:42 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>>> I thought about implementing it with slots_lock, vcpu->request, etc 
>>>> but it really looks like overkill for s390.
>>>
>>> We could make (some of) it common code, so it won't look so bad.  
>>> There's value in having all kvm ports do things similarly; though of 
>>> course we shouldn't force the solution when it isn't really needed.
>>>
>>> vcpu->requests is useful whenever we modify global VM state that 
>>> needs to be seen by all vcpus in host mode; see  
>>> kvm_reload_remote_mmus().
>> yeah I read that code after your first hint in that thread, and I 
>> agree that merging some of this into common code might be good.
>> But in my opinion not now for this bugfix patch (the intention is 
>> just to prevent a user being able to crash the host via vcpu 
>> create, set mem and vcpu run, in that order).
>> It might be a good point to further streamline this once we use the 
>> same userspace code, but I think it doesn't make sense yet.
>
> Sure, don't mix bugfixes with infrastructure changes, when possible.
>
>>>> At least today we can assume that we only have one memslot. 
>>>> Therefore a set_memslot with already created vcpu's will still not 
>>>> interfere with running vcpus (they can't run without memslot and 
>>>> since we have only one they won't run).
>>>> Anyway, the code is prepared to "meet" running vcpus, because it 
>>>> might be different in future. To prevent the livelock issue I 
>>>> changed the code using mutex_trylock and in case I can't get the 
>>>> lock I explicitly let the vcpu exit from guest.
>>>
>>> Why not do it unconditionally?
>>>
>> Hmm, I might have written that misleadingly - eventually it's a loop 
>> until it gets the lock:
>>  while !trylock
>>    kick vcpu out of guest
>>    schedule
>>
>> There is no reason to kick out guests where I got the lock cleanly as 
>> far as I see.
>> Especially as I expect the vcpus not running in the common case as I 
>> explained above (can't run without memslot + we only have one => no 
>> vcpu will run).
>
> Still livelockable, unless you stop the vcpu from entering the guest 
> immediately.
>
> That's why vcpu->requests is so powerful.  Not only you kick the vcpu 
> out of guest mode, you force it to synchronize when it tries to enter 
> again.
>

The bad thing about vcpu->requests in that case is that I don't want 
their async behaviour; I want the memory slot updated in all vcpus when 
the ioctl returns.
Looking at a vcpu->requests based solution I don't find the synchronization 
I need. The changes to vcpu->arch.guest_origin/guest_memsize and the 
changes to vcpu->arch.sie_block->gmsor/gmslm need to happen while the 
vcpu is not running.
Therefore I want the vcpu lock _before_ I update both structs, otherwise 
it could be racy (at least on s390).

On the other hand, while it is very++ unlikely to happen, you are still 
right that it could theoretically livelock there.
I might use vcpu->requests to not enter vcpu_run again after such a 
"kick" out of guest state.
It would be checked on vcpu_run entry and could then drop the lock, call 
schedule, relock and check the flag again until it is cleared.
I'm not yet happy with this solution, as I expect it to end up in 
something like a reference count which then would not fit into the 
existing vcpu->requests flags :-/

As I mentioned above, the changes to vcpu->arch and vcpu->arch->sie_block 
have to happen exclusively, with the vcpu not running.
If I could find some "transport" for the information I have on 
set_memory_slot (origin/size) until the next vcpu_run entry, I could do 
both changes there synchronously.
In that case I could really use your suggested solution with 
vcpu->requests: kick out unconditionally and set the values on the next 
(re-)entry.

Hmm... maybe I can find all I need on reentry in vcpu->kvm->memslots[].
If I can change it that way it will definitely require some testing.
... to be continued :-)

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-11 14:42             ` Christian Ehrhardt
@ 2009-05-11 15:01               ` Avi Kivity
  2009-05-12  9:15                 ` Christian Ehrhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-11 15:01 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

Christian Ehrhardt wrote:
>
> The bad thing about vcpu->requests in that case is that I don't want 
> their async behaviour; I want the memory slot updated in all vcpus 
> when the ioctl returns.

You mean, the hardware can access the vcpu control block even when the 
vcpu is not running?

> Looking at vcpu->request based solution I don't find the 
> synchronization I need. The changes to  
> vcpu->arch.guest_origin/guest_memsize and the changes to 
> vcpu->arch.sie_block->gmsor/gmslm need to happen without the vcpu 
> running.
> Therefore I want the vcpu lock _before_ I update both structs, 
> otherwise it could be racy (at least on s390).
>
> On the other hand while it is very++ unlikely to happen you are still 
> right that it could theoretically livelock there.
> I might use vcpu->requests to not enter vcpu_run again after such a 
> "kick" out of guest state.
> It would be checked on vcpu_run enter and could then drop the lock, 
> call schedule, relock and check the flag again until it is cleared.
> I'm not yet happy with this solution as I expect it to end up in 
> something like a reference count which then would not fit into the 
> existing vcpu->request flags :-/
>
> As I mentioned above the changes to vcpu->arch and 
> vcpu->arch->sie_block have to be exclusive with the vcpu not running.
> If I would find something as "transport" for the information I have on 
> set_memory_slot (origin/size) until the next vcpu_run entry I could do 
> both changes there synchronously.

The information can be found in kvm->memslots.

> In that case I could really use your suggested solution with 
> vcpu->request, kick out unconditionally and set values on next 
> (re-)entry.
>
> Hmmm .. Maybe I can find all I need on reentry in vcpu->kvm->memslots[].

Err...

> If I can change it that way it will definitely require some testing.
> ... to be continued :-)

I definitely recommend it -- would bring s390 more in line with the 
other ports (I know it's a backward step for you :)

Note our plan is to change slots_lock to RCU, so it's even better if you 
use memslots.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-11 15:01               ` Avi Kivity
@ 2009-05-12  9:15                 ` Christian Ehrhardt
  2009-05-12 11:35                   ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Ehrhardt @ 2009-05-12  9:15 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>
>> The bad thing about vcpu->requests in that case is that I don't want 
>> their async behaviour; I want the memory slot updated in all vcpus 
>> when the ioctl returns.
>
> You mean, the hardware can access the vcpu control block even when the 
> vcpu is not running? 
No, hardware only uses it with a running vcpu, but I realised my own 
mistake while changing the code to vcpu->requests style.
For s390 I need to update the kvm->arch and *all* 
vcpu->arch->sie_block... data synchronously.
That makes the "per vcpu resync on next entry" approach not feasible.

On the other hand I realized at the same moment that the livelock should 
be no issue for us, because as I mentioned:
a) only one memslot
b) a vcpu can't run without a memslot
So I don't even need to kick out vcpus; they just should not be running.
Until we ever support multiple slots, or updates of the existing single 
slot, this should be ok, and so should the bugfix patch.
To avoid a theoretical deadlock in case other code changes (badly), it 
should be fair to acquire the locks with mutex_trylock and return 
-EINVAL if we did not get all of them.

[...]
>> If I can change it that way it will definitely require some testing.
>> ... to be continued :-)
>
> I definitely recommend it -- would bring s390 more in line with the 
> other ports (I know it's a backward step for you :)
>
> Note our plan is to change slots_lock to RCU, so it's even better if 
> you use memslots.
As long as we have the special conditions mentioned above, I think it's 
ok to implement it the way I do it now.
I agree that if we ever support multiple memslots we should strive for a 
common solution.

p.s. the second patch in the series ensures that a vcpu really never 
runs without a memslot being set, as this was another bug we had.

-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-12  9:15                 ` Christian Ehrhardt
@ 2009-05-12 11:35                   ` Avi Kivity
  2009-05-12 13:33                     ` Christian Ehrhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-12 11:35 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

Christian Ehrhardt wrote:
> Avi Kivity wrote:
>> Christian Ehrhardt wrote:
>>>
>>> The bad thing about vcpu->requests in that case is that I don't want 
>>> their async behaviour; I want the memory slot updated in all vcpus 
>>> when the ioctl returns.
>>
>> You mean, the hardware can access the vcpu control block even when 
>> the vcpu is not running? 
> No, hardware only uses it with a running vcpu, but I realised my own 
> fault while changing the code to vcpu->request style.
> For s390 I need to update the KVM->arch and *all* 
> vcpu->arch->sie_block... data synchronously.

Out of interest, can you explain why?

> That makes the "per vcpu resync on next entry" approach not feasible.
>
> On the other hand I realized at the same moment that the livelock 
> should be no issue for us, because as I mentioned:
> a) only one memslot
> b) a vcpu can't run without memslot
> So I don't even need to kick out vcpus; they just should not be running.
> Until we ever support multiple slots, or updates of the existing 
> single slot, this should be ok, and so should the bugfix patch.
> To avoid a theoretical deadlock in case other code changes (badly), it 
> should be fair to acquire the locks with mutex_trylock and return 
> -EINVAL if we did not get all of them.

OK.


-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-12 11:35                   ` Avi Kivity
@ 2009-05-12 13:33                     ` Christian Ehrhardt
  2009-05-17 22:31                       ` Avi Kivity
  0 siblings, 1 reply; 26+ messages in thread
From: Christian Ehrhardt @ 2009-05-12 13:33 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> Christian Ehrhardt wrote:
>> Avi Kivity wrote:
>>> Christian Ehrhardt wrote:
>>>>
>>>> The bad thing about vcpu->requests in that case is that I don't 
>>>> want their async behaviour; I want the memory slot updated in all 
>>>> vcpus when the ioctl returns.
>>>
>>> You mean, the hardware can access the vcpu control block even when 
>>> the vcpu is not running? 
>> No, hardware only uses it with a running vcpu, but I realised my own 
>> fault while changing the code to vcpu->request style.
>> For s390 I need to update the KVM->arch and *all* 
>> vcpu->arch->sie_block... data synchronously.
>
> Out of interest, can you explain why?
Sure I'll try to give an example.

a) The whole guest has "one" memory slot representing all its memory. 
Therefore some important values like guest_origin and guest_memsize (one 
slot, so it's just addr+size) are kept at VM level in kvm->arch.
b) We fortunately have cool hardware support for "nearly everything"(tm) 
:-) In this case for example we set in vcpu->arch.sie_block the values 
for origin and size translated into a "limit" to get memory management 
virtualization support.
c) we have other code e.g. all our copy_from/to_guest stuff that uses 
the kvm->arch values

If we allowed e.g. updates of a memslot (or, as the patch intends, 
hardened the set_memory_region code against inconsiderate code changes 
in other sections), it might happen that we set the kvm->arch 
information but not the vcpu->arch->sie_block data until the next 
reentry. Concurrently, the running vcpu could cause some kind of fault 
that involves a copy_from/to_guest. That way we could end up with 
potentially invalid handling of that fault (fault handling and the 
running guest would use different userspace addresses until they are 
synced on the next vcpu reentry) - it's theoretical, I know, but it 
might cause some issues that would be hard to find.

On the other hand, for the long term I wanted to note that all our 
copy_from/to_guest functions are per vcpu, so when we some day implement 
updateable memslots, multiple memslots, or even just fill some "free 
time"(tm) and streamline our code, we could redesign that origin/size 
storage. This could be done multiple ways: either store it per vcpu, or 
use a lock for the kvm->arch level variables - both ways, and maybe 
more, could then use the vcpu->requests based approach, but 
unfortunately that is neither part of this patch nor of the current 
effort.

The really good thing is that, because of our discussion about this, I 
now have a really detailed idea of how I can improve that code aside 
from this bugfix patch (let's hope not too far in the future).

>> That makes the "per vcpu resync on next entry" approach not feasible.
>>
>> On the other hand I realized at the same moment that the livelock 
>> should be no issue for us, because as I mentioned:
>> a) only one memslot
>> b) a vcpu can't run without memslot
>> So I don't even need to kick out vcpus; they just should not be 
>> running.
>> Until we ever support multiple slots, or updates of the existing 
>> single slot, this should be ok, and so should the bugfix patch.
>> To avoid a theoretical deadlock in case other code changes (badly), 
>> it should be fair to acquire the locks with mutex_trylock and return 
>> -EINVAL if we did not get all of them.
>
> OK.
>
>


-- 

Grüsse / regards, 
Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization



* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-12 13:33                     ` Christian Ehrhardt
@ 2009-05-17 22:31                       ` Avi Kivity
  2009-05-20 12:05                         ` Christian Ehrhardt
  0 siblings, 1 reply; 26+ messages in thread
From: Avi Kivity @ 2009-05-17 22:31 UTC (permalink / raw)
  To: Christian Ehrhardt; +Cc: kvm, Christian Borntraeger, Carsten Otte

Christian Ehrhardt wrote:
>>>>> The bad thing about vcpu->requests in that case is that I don't 
>>>>> want their async behaviour; I want the memory slot updated in all 
>>>>> vcpus when the ioctl returns.
>>>>
>>>> You mean, the hardware can access the vcpu control block even when 
>>>> the vcpu is not running? 
>>> No, hardware only uses it with a running vcpu, but I realised my own 
>>> fault while changing the code to vcpu->request style.
>>> For s390 I need to update the KVM->arch and *all* 
>>> vcpu->arch->sie_block... data synchronously.
>>
>> Out of interest, can you explain why?
> Sure I'll try to give an example.
>
> a) The whole guest has "one" memory slot representing all its memory. 
> Therefore some important values like guest_origin and guest_memsize 
> (one slot so it's just addr+size) are kept at VM level in kvm->arch.

It should really be kept in kvm->memslots[0]->{userspace_addr, npages}.  
This is common to all architectures.

> b) We fortunately have cool hardware support for "nearly 
> everything"(tm) :-) In this case, for example, we set in 
> vcpu->arch.sie_block the values for origin and size (translated into a 
> "limit") to get memory-management virtualization support.

x86 has something analogous; shadow or nested page tables are also 
per-vcpu and accessed by the hardware while the guest is running.

> c) we have other code e.g. all our copy_from/to_guest stuff that uses 
> the kvm->arch values

You want to drop these and use kvm_read_guest() / kvm_write_guest().

> If we would allow e.g. updates of a memslot (or, as the patch proposes, 
> to harden the set_memory_region code against inconsiderate code 
> changes in other sections) it might happen that we set the kvm->arch 
> information but not the vcpu->arch->sie_block data until the next 
> reentry. Now concurrently the running vcpu could cause some kind of 
> fault that involves a copy_from/to_guest. That way we could end up 
> with potentially invalid handling of that fault (fault handling and 
> the running guest would use different userspace addresses until they are 
> synced on the next vcpu reentry) - it's theoretical, I know, but it might 
> cause some issues that would be hard to find.

I agree it should be protected.  Here's how we do it in arch-independent 
code:

- code that looks up memory slots takes slots_lock for read (future: rcu)
- code that changes memory slots takes slots_lock for write, and 
requests an mmu reload (includes an IPI to force the vcpu out of guest mode)

Now, once we begin changing a slot no one can touch memory or reenter 
the guest until we are done.

> On the other hand, for the long term I wanted to note that all our 
> copy_from/to_guest functions are per vcpu, so when we some day 
> implement updatable memslots, multiple memslots, or even just fill 
> "free time"(tm) and streamline our code, we could redesign that 
> origin/size storage. This could be done in multiple ways: either just 
> store it per vcpu, or with a lock for the kvm->arch level variables - 
> both ways, and maybe more, could then use the vcpu->request based 
> approach, but unfortunately that is neither part of this patch nor of 
> the current effort.

I think we should keep that in generic code.  All of that applies to x86 
(and ia64 and ppc), if I understand you correctly, and if I understand 
the other archs correctly (don't place a large bet).

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 1/6] kvm-s390: Fix memory slot versus run
  2009-05-17 22:31                       ` Avi Kivity
@ 2009-05-20 12:05                         ` Christian Ehrhardt
  0 siblings, 0 replies; 26+ messages in thread
From: Christian Ehrhardt @ 2009-05-20 12:05 UTC (permalink / raw)
  To: Avi Kivity; +Cc: kvm, Christian Borntraeger, Carsten Otte

Avi Kivity wrote:
> Christian Ehrhardt wrote:
>>>>>> The bad thing on vcpu->request in that case is that I don't want 
>>>>>> the async behaviour of vcpu->requests in that case, I want the 
>>>>>> memory slot updated in all vcpu's when the ioctl is returning.
>>>>>
>>>>> You mean, the hardware can access the vcpu control block even when 
>>>>> the vcpu is not running? 
>>>> No, hardware only uses it with a running vcpu, but I realised my 
>>>> own mistake while changing the code to vcpu->request style.
>>>> For s390 I need to update the KVM->arch and *all* 
>>>> vcpu->arch->sie_block... data synchronously.
>>>
>>> Out of interest, can you explain why?
>> Sure I'll try to give an example.
>>
>> a) The whole guest has "one" memory slot representing all its 
>> memory. Therefore some important values like guest_origin and 
>> guest_memsize (one slot so it's just addr+size) are kept at VM level 
>> in kvm->arch.
>
> It should really be kept in kvm->memslots[0]->{userspace_addr, 
> npages}.  This is common to all architectures.
As I said, I wanted to do that, but due to the need to relocate my work 
environment to a new laptop I was a bit stalled the last few days.
A patch series implementing it in a streamlined way (storing in memslots 
only, slots_lock, vcpu->request, ...) will soon appear on the list.
> [...]
>> c) we have other code e.g. all our copy_from/to_guest stuff that uses 
>> the kvm->arch values
>
> You want to drop these and use kvm_read_guest() / kvm_write_guest().
I put it on my "low-prio-but-very-useful todo list" to take a look at 
that too as one of the next opportunities to streamline our code.

[...]

-- 

Grüsse / regards, Christian Ehrhardt
IBM Linux Technology Center, Open Virtualization

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle
  2009-05-07 10:19       ` Avi Kivity
  2009-05-07 10:34         ` Christian Borntraeger
@ 2009-05-20 15:48         ` Hollis Blanchard
  1 sibling, 0 replies; 26+ messages in thread
From: Hollis Blanchard @ 2009-05-20 15:48 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Christian Borntraeger, ehrhardt, kvm, Carsten Otte

On Thu, 2009-05-07 at 13:19 +0300, Avi Kivity wrote:
> Christian Borntraeger wrote:
> >> Why can't this be done from the timer context (after adjusting the locks)?
> >>     
> >
> > It can be done, in fact I did that in my first version. The thing is, we would need to change all local_int.locks to spinlock_irqs, since standard timers are softirqs while hrtimers are hardirqs. Enabling and disabling irqs is a relatively expensive operation on s390 (but locks via compare and swap are quite cheap). Since we take this specific lock in lots of places, it is on some hot paths. The idle wakeup on the other hand is not that critical.
> >   
> 
> Makes sense.
> 
> Back when irqsave/irqrestore were expensive on x86 (P4 days, I think it 
> was ~100 cycles) there was talk of using a software irq disable flag 
> instead of using the hardware support.  So
> 
> - local_irq_disable() sets a flag
> 
> - interrupt handlers check the flag, if set they schedule the interrupt 
> handler and return immediately
> 
> - local_irq_restore() clears the flag and fires any scheduled handlers
> 
> Now these operations are pretty cheap on x86, but maybe that can apply 
> to s390.

I don't know how the cycle counts compare, but FWIW normal ppc64 Linux
does almost exactly this (and calls it "lazy irq disabling"). The only
difference is a step 2.5 that really disables interrupts, because some
are level-triggered.

-- 
Hollis Blanchard
IBM Linux Technology Center


^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2009-05-20 15:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-05 14:39 [PATCH 0/6] kvm-s390: collection of kvm-s390 fixes ehrhardt
2009-05-05 14:39 ` [PATCH 1/6] kvm-s390: Fix memory slot versus run ehrhardt
2009-05-06 12:01   ` Avi Kivity
2009-05-11 13:00     ` Christian Ehrhardt
2009-05-11 13:15       ` Avi Kivity
2009-05-11 13:46         ` Christian Ehrhardt
2009-05-11 14:02           ` Avi Kivity
2009-05-11 14:42             ` Christian Ehrhardt
2009-05-11 15:01               ` Avi Kivity
2009-05-12  9:15                 ` Christian Ehrhardt
2009-05-12 11:35                   ` Avi Kivity
2009-05-12 13:33                     ` Christian Ehrhardt
2009-05-17 22:31                       ` Avi Kivity
2009-05-20 12:05                         ` Christian Ehrhardt
2009-05-05 14:39 ` [PATCH 2/6] kvm-s390: use hrtimer for clock wakeup from idle ehrhardt
2009-05-06 12:10   ` Avi Kivity
2009-05-06 12:36     ` Christian Borntraeger
2009-05-07 10:19       ` Avi Kivity
2009-05-07 10:34         ` Christian Borntraeger
2009-05-20 15:48         ` Hollis Blanchard
2009-05-05 14:39 ` [PATCH 3/6] kvm-s390: optimize float int lock: spin_lock_bh --> spin_lock ehrhardt
2009-05-05 14:39 ` [PATCH 4/6] kvm-s390: Unlink vcpu on destroy ehrhardt
2009-05-06 12:11   ` Avi Kivity
2009-05-11 13:00     ` Christian Ehrhardt
2009-05-05 14:39 ` [PATCH 5/6] kvm-s390: Sanity check on validity intercept ehrhardt
2009-05-05 14:39 ` [PATCH 6/6] kvm-s390: Verify memory in kvm run ehrhardt
