* [PATCH 0/10] HV KVM fixes, reposted
@ 2012-09-21 5:16 Paul Mackerras
2012-09-21 5:33 ` [PATCH 01/10] KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas Paul Mackerras
` (9 more replies)
0 siblings, 10 replies; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:16 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
This is a repost of 10 patches, out of a series of 12 that I posted
more than three weeks ago, which have had no comments but have not
yet been applied. They have been rediffed against Alex Graf's current
kvm-ppc-next branch.
This series contains various fixes collected during the process of
getting reboot of Book3S HV guests to work correctly, plus some needed
for Ben H's forthcoming series to implement in-kernel XICS (interrupt
controller) emulation. As part of getting reboot to work, we have
relaxed the previous policy where, on POWER7, a virtual core would
only run when all of the vcpus in it were ready to run (or were idle).
Now a virtual core will run as soon as any one of its vcpus is ready.
This avoids the problem where the guest wouldn't run after reboot
because userspace (qemu) had stopped all vcpus except vcpu 0.
Please apply.
Paul.
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH 01/10] KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
@ 2012-09-21 5:33 ` Paul Mackerras
2012-09-24 12:23 ` Alexander Graf
2012-09-21 5:35 ` [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online Paul Mackerras
` (8 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:33 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
The PAPR paravirtualization interface lets a guest register three
different types of per-vCPU buffer areas in its memory for communication
with the hypervisor. These are called virtual processor areas (VPAs).
Currently the hypercalls to register and unregister VPAs are handled
by KVM in the kernel, and userspace has no way to know about or save
and restore these registrations across a migration.
This adds get and set ioctls to allow userspace to see what addresses
have been registered, and to register or unregister them. This will
be needed for guest hibernation and migration, and is also needed so
that userspace can unregister them on reset (otherwise we corrupt
guest memory after reboot by writing to the VPAs registered by the
previous kernel). We also add a capability to indicate that the
ioctls are supported.
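For illustration only (this sketch is not part of the patch; the vcpu
fd plumbing and error checking are assumed), userspace could use the
new ioctls along these lines to save the registrations and unregister
them around a reset:

	struct kvm_ppc_vpa vpa, none = { 0 };

	/* read out the current registrations so they can be saved */
	ioctl(vcpu_fd, KVM_PPC_GET_VPA_INFO, &vpa);

	/* on reset: zero addresses unregister all three areas */
	ioctl(vcpu_fd, KVM_PPC_SET_VPA_INFO, &none);

	/* on the destination of a migration: restore what was saved */
	ioctl(vcpu_fd, KVM_PPC_SET_VPA_INFO, &vpa);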
This also fixes a bug where we were calling init_vpa unconditionally,
leading to an oops when unregistering the VPA.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
Documentation/virtual/kvm/api.txt | 32 +++++++++++++++++++++
arch/powerpc/include/asm/kvm_ppc.h | 3 ++
arch/powerpc/kvm/book3s_hv.c | 54 +++++++++++++++++++++++++++++++++++-
arch/powerpc/kvm/powerpc.c | 26 +++++++++++++++++
include/linux/kvm.h | 11 ++++++++
5 files changed, 125 insertions(+), 1 deletion(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a12f4e4..76a07a6 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1992,6 +1992,38 @@ return the hash table order in the parameter. (If the guest is using
the virtualized real-mode area (VRMA) facility, the kernel will
re-create the VRMA HPTEs on the next KVM_RUN of any vcpu.)
+4.77 KVM_PPC_GET_VPA_INFO
+
+Capability: KVM_CAP_PPC_VPA
+Architectures: powerpc
+Type: vcpu ioctl
+Parameters: Pointer to struct kvm_ppc_vpa (out)
+Returns: 0 on success, -1 on error
+
+This populates and returns a structure containing the guest physical
+addresses and sizes of the three per-virtual-processor areas that the
+guest can register with the hypervisor under the PAPR
+paravirtualization interface, namely the Virtual Processor Area, the
+SLB (Segment Lookaside Buffer) Shadow Area, and the Dispatch Trace
+Log.
+
+4.78 KVM_PPC_SET_VPA_INFO
+
+Capability: KVM_CAP_PPC_VPA
+Architectures: powerpc
+Type: vcpu ioctl
+Parameters: Pointer to struct kvm_ppc_vpa (in)
+Returns: 0 on success, -1 on error
+
+This sets the guest physical addresses and sizes of the three
+per-virtual-processor areas that the guest can register with the
+hypervisor under the PAPR paravirtualization interface, namely the
+Virtual Processor Area, the SLB (Segment Lookaside Buffer) Shadow
+Area, and the Dispatch Trace Log. Providing an address of zero for
+any of these areas causes the kernel to unregister any previously
+registered area; a non-zero address replaces any previously registered
+area.
+
5. The kvm_run structure
------------------------
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 3fb980d..2c94cb3 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -205,6 +205,9 @@ int kvmppc_set_sregs_ivor(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs);
int kvm_vcpu_ioctl_get_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
int kvm_vcpu_ioctl_set_one_reg(struct kvm_vcpu *vcpu, struct kvm_one_reg *reg);
+int kvm_vcpu_get_vpa_info(struct kvm_vcpu *vcpu, struct kvm_ppc_vpa *vpa);
+int kvm_vcpu_set_vpa_info(struct kvm_vcpu *vcpu, struct kvm_ppc_vpa *vpa);
+
void kvmppc_set_pid(struct kvm_vcpu *vcpu, u32 pid);
#ifdef CONFIG_KVM_BOOK3S_64_HV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 38c7f1b..bebf9cb 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -143,6 +143,57 @@ static void init_vpa(struct kvm_vcpu *vcpu, struct lppaca *vpa)
vpa->yield_count = 1;
}
+int kvm_vcpu_get_vpa_info(struct kvm_vcpu *vcpu, struct kvm_ppc_vpa *vpa)
+{
+ spin_lock(&vcpu->arch.vpa_update_lock);
+ vpa->vpa_addr = vcpu->arch.vpa.next_gpa;
+ vpa->slb_shadow_addr = vcpu->arch.slb_shadow.next_gpa;
+ vpa->slb_shadow_size = vcpu->arch.slb_shadow.len;
+ vpa->dtl_addr = vcpu->arch.dtl.next_gpa;
+ vpa->dtl_size = vcpu->arch.dtl.len;
+ spin_unlock(&vcpu->arch.vpa_update_lock);
+ return 0;
+}
+
+static inline void set_vpa(struct kvmppc_vpa *v, unsigned long addr,
+ unsigned long len)
+{
+ if (v->next_gpa != addr || v->len != len) {
+ v->next_gpa = addr;
+ v->len = addr ? len : 0;
+ v->update_pending = 1;
+ }
+}
+
+int kvm_vcpu_set_vpa_info(struct kvm_vcpu *vcpu, struct kvm_ppc_vpa *vpa)
+{
+ /* check that addresses are cacheline aligned */
+ if ((vpa->vpa_addr & (L1_CACHE_BYTES - 1)) ||
+ (vpa->slb_shadow_addr & (L1_CACHE_BYTES - 1)) ||
+ (vpa->dtl_addr & (L1_CACHE_BYTES - 1)))
+ return -EINVAL;
+
+ /* DTL must be at least 1 entry long, if being set */
+ if (vpa->dtl_addr) {
+ if (vpa->dtl_size < sizeof(struct dtl_entry))
+ return -EINVAL;
+ vpa->dtl_size -= vpa->dtl_size % sizeof(struct dtl_entry);
+ }
+
+ /* DTL and SLB shadow require VPA */
+ if (!vpa->vpa_addr && (vpa->slb_shadow_addr || vpa->dtl_addr))
+ return -EINVAL;
+
+ spin_lock(&vcpu->arch.vpa_update_lock);
+ set_vpa(&vcpu->arch.vpa, vpa->vpa_addr, sizeof(struct lppaca));
+ set_vpa(&vcpu->arch.slb_shadow, vpa->slb_shadow_addr,
+ vpa->slb_shadow_size);
+ set_vpa(&vcpu->arch.dtl, vpa->dtl_addr, vpa->dtl_size);
+ spin_unlock(&vcpu->arch.vpa_update_lock);
+
+ return 0;
+}
+
/* Length for a per-processor buffer is passed in at offset 4 in the buffer */
struct reg_vpa {
u32 dummy;
@@ -321,7 +372,8 @@ static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
spin_lock(&vcpu->arch.vpa_update_lock);
if (vcpu->arch.vpa.update_pending) {
kvmppc_update_vpa(vcpu, &vcpu->arch.vpa);
- init_vpa(vcpu, vcpu->arch.vpa.pinned_addr);
+ if (vcpu->arch.vpa.pinned_addr)
+ init_vpa(vcpu, vcpu->arch.vpa.pinned_addr);
}
if (vcpu->arch.dtl.update_pending) {
kvmppc_update_vpa(vcpu, &vcpu->arch.dtl);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8443e23..2b08564 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -349,6 +349,12 @@ int kvm_dev_ioctl_check_extension(long ext)
r = 1;
#else
r = 0;
+ break;
+#endif
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ case KVM_CAP_PPC_VPA:
+ r = 1;
+ break;
#endif
break;
case KVM_CAP_NR_VCPUS:
@@ -826,6 +832,26 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break;
}
#endif
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+ case KVM_PPC_GET_VPA_INFO: {
+ struct kvm_ppc_vpa vpa;
+ r = kvm_vcpu_get_vpa_info(vcpu, &vpa);
+ if (r)
+ break;
+ r = 0;
+ if (copy_to_user(argp, &vpa, sizeof(vpa)))
+ r = -EFAULT;
+ break;
+ }
+ case KVM_PPC_SET_VPA_INFO: {
+ struct kvm_ppc_vpa vpa;
+ r = -EFAULT;
+ if (copy_from_user(&vpa, argp, sizeof(vpa)))
+ break;
+ r = kvm_vcpu_set_vpa_info(vcpu, &vpa);
+ break;
+ }
+#endif
default:
r = -EINVAL;
}
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 99c3c50..e7509bd 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -629,6 +629,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_READONLY_MEM 81
#endif
#define KVM_CAP_PPC_BOOKE_WATCHDOG 82
+#define KVM_CAP_PPC_VPA 83
#ifdef KVM_CAP_IRQ_ROUTING
@@ -915,6 +916,8 @@ struct kvm_s390_ucas_mapping {
#define KVM_SET_ONE_REG _IOW(KVMIO, 0xac, struct kvm_one_reg)
/* VM is being stopped by host */
#define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
+#define KVM_PPC_GET_VPA_INFO _IOR(KVMIO, 0xae, struct kvm_ppc_vpa)
+#define KVM_PPC_SET_VPA_INFO _IOW(KVMIO, 0xaf, struct kvm_ppc_vpa)
#define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0)
#define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1)
@@ -966,4 +969,12 @@ struct kvm_assigned_msix_entry {
__u16 padding[3];
};
+struct kvm_ppc_vpa {
+ __u64 vpa_addr;
+ __u64 slb_shadow_addr;
+ __u64 dtl_addr;
+ __u32 slb_shadow_size;
+ __u32 dtl_size;
+};
+
#endif /* __LINUX_KVM_H */
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
2012-09-21 5:33 ` [PATCH 01/10] KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas Paul Mackerras
@ 2012-09-21 5:35 ` Paul Mackerras
2012-09-24 12:26 ` Alexander Graf
2012-09-27 1:01 ` Benjamin Herrenschmidt
2012-09-21 5:35 ` [PATCH 03/10] KVM: PPC: Book3S HV: Fix updates of vcpu->cpu Paul Mackerras
` (7 subsequent siblings)
9 siblings, 2 replies; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:35 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt
When a Book3S HV KVM guest is running, we need the host to be in
single-thread mode, that is, each core (or at least each core where
the KVM guest could run) must be running only one active hardware
thread. This is because of the hardware restriction
in POWER processors that all of the hardware threads in the core
must be in the same logical partition. Complying with this restriction
is much easier if, from the host kernel's point of view, only one
hardware thread is active.
This adds two hooks in the SMP hotplug code to allow the KVM code to
make sure that secondary threads (i.e. hardware threads other than
thread 0) cannot come online while any KVM guest exists. The KVM
code still has to check that any core where it runs a guest has the
secondary threads offline, but having done that check it can now be
sure that they will not come online while the guest is running.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/smp.h | 8 +++++++
arch/powerpc/kernel/smp.c | 46 ++++++++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv.c | 12 +++++++++--
3 files changed, 64 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index ebc24dc..b625a1a 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -66,6 +66,14 @@ void generic_cpu_die(unsigned int cpu);
void generic_mach_cpu_die(void);
void generic_set_cpu_dead(unsigned int cpu);
int generic_check_cpu_restart(unsigned int cpu);
+
+extern void inhibit_secondary_onlining(void);
+extern void uninhibit_secondary_onlining(void);
+
+#else /* HOTPLUG_CPU */
+static inline void inhibit_secondary_onlining(void) {}
+static inline void uninhibit_secondary_onlining(void) {}
+
#endif
#ifdef CONFIG_PPC64
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 0321007..c45f51d 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -410,6 +410,45 @@ int generic_check_cpu_restart(unsigned int cpu)
{
return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
}
+
+static atomic_t secondary_inhibit_count;
+
+/*
+ * Don't allow secondary CPU threads to come online
+ */
+void inhibit_secondary_onlining(void)
+{
+ /*
+ * This makes secondary_inhibit_count stable during cpu
+ * online/offline operations.
+ */
+ get_online_cpus();
+
+ atomic_inc(&secondary_inhibit_count);
+ put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
+
+/*
+ * Allow secondary CPU threads to come online again
+ */
+void uninhibit_secondary_onlining(void)
+{
+ get_online_cpus();
+ atomic_dec(&secondary_inhibit_count);
+ put_online_cpus();
+}
+EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
+
+static int secondaries_inhibited(void)
+{
+ return atomic_read(&secondary_inhibit_count);
+}
+
+#else /* HOTPLUG_CPU */
+
+#define secondaries_inhibited() 0
+
#endif
static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
@@ -428,6 +467,13 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle)
{
int rc, c;
+ /*
+ * Don't allow secondary threads to come online if inhibited
+ */
+ if (threads_per_core > 1 && secondaries_inhibited() &&
+ cpu % threads_per_core != 0)
+ return -EBUSY;
+
if (smp_ops == NULL ||
(smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
return -EINVAL;
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bebf9cb..6fe1410 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -47,6 +47,7 @@
#include <asm/page.h>
#include <asm/hvcall.h>
#include <asm/switch_to.h>
+#include <asm/smp.h>
#include <linux/gfp.h>
#include <linux/vmalloc.h>
#include <linux/highmem.h>
@@ -918,8 +919,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
/*
* Make sure we are running on thread 0, and that
* secondary threads are offline.
- * XXX we should also block attempts to bring any
- * secondary threads online.
*/
if (threads_per_core > 1 && !on_primary_thread()) {
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
@@ -1632,11 +1631,20 @@ int kvmppc_core_init_vm(struct kvm *kvm)
kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
spin_lock_init(&kvm->arch.slot_phys_lock);
+
+ /*
+ * Don't allow secondary CPU threads to come online
+ * while any KVM VMs exist.
+ */
+ inhibit_secondary_onlining();
+
return 0;
}
void kvmppc_core_destroy_vm(struct kvm *kvm)
{
+ uninhibit_secondary_onlining();
+
if (kvm->arch.rma) {
kvm_release_rma(kvm->arch.rma);
kvm->arch.rma = NULL;
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 03/10] KVM: PPC: Book3S HV: Fix updates of vcpu->cpu
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
2012-09-21 5:33 ` [PATCH 01/10] KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas Paul Mackerras
2012-09-21 5:35 ` [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online Paul Mackerras
@ 2012-09-21 5:35 ` Paul Mackerras
2012-09-24 12:52 ` Alexander Graf
2012-09-21 5:36 ` [PATCH 04/10] KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs Paul Mackerras
` (6 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:35 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt
This removes the powerpc "generic" updates of vcpu->cpu in load and
put, and moves them to the various backends.
The reason is that "HV" KVM does its own thing with that field
and the generic updates might corrupt it. For all the vCPU
threads of a core, the field always contains the CPU# of the
-first- HW CPU of the core (the one that's online from a host
Linux perspective).
However, the preempt notifiers are called on the vcpu threads
while the guest is running (because those tasks sleep on our
private waitqueue), causing the put callback to be invoked and
potentially clobbering the value.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_pr.c | 3 ++-
arch/powerpc/kvm/booke.c | 2 ++
arch/powerpc/kvm/powerpc.c | 2 --
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 4d0667a..bf3ec5d 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -64,7 +64,7 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
svcpu->slb_max = to_book3s(vcpu)->slb_shadow_max;
svcpu_put(svcpu);
#endif
-
+ vcpu->cpu = smp_processor_id();
#ifdef CONFIG_PPC_BOOK3S_32
current->thread.kvm_shadow_vcpu = to_book3s(vcpu)->shadow_vcpu;
#endif
@@ -84,6 +84,7 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
kvmppc_giveup_ext(vcpu, MSR_FP);
kvmppc_giveup_ext(vcpu, MSR_VEC);
kvmppc_giveup_ext(vcpu, MSR_VSX);
+ vcpu->cpu = -1;
}
int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3a6490f..69d047c 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1509,12 +1509,14 @@ void kvmppc_decrementer_func(unsigned long data)
void kvmppc_booke_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
+ vcpu->cpu = smp_processor_id();
current->thread.kvm_vcpu = vcpu;
}
void kvmppc_booke_vcpu_put(struct kvm_vcpu *vcpu)
{
current->thread.kvm_vcpu = NULL;
+ vcpu->cpu = -1;
}
int __init kvmppc_booke_init(void)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2b08564..fd73763 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -510,7 +510,6 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
mtspr(SPRN_VRSAVE, vcpu->arch.vrsave);
#endif
kvmppc_core_vcpu_load(vcpu, cpu);
- vcpu->cpu = smp_processor_id();
}
void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -519,7 +518,6 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
#ifdef CONFIG_BOOKE
vcpu->arch.vrsave = mfspr(SPRN_VRSAVE);
#endif
- vcpu->cpu = -1;
}
int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 04/10] KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
` (2 preceding siblings ...)
2012-09-21 5:35 ` [PATCH 03/10] KVM: PPC: Book3S HV: Fix updates of vcpu->cpu Paul Mackerras
@ 2012-09-21 5:36 ` Paul Mackerras
2012-09-24 12:52 ` Alexander Graf
2012-09-21 5:36 ` [PATCH 05/10] KVM: PPC: Book3S HV: Fix some races in starting secondary threads Paul Mackerras
` (5 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:36 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt
When making a vcpu non-runnable we incorrectly changed the
thread IDs of all other threads on the core; just remove that
code.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_hv.c | 6 ------
1 file changed, 6 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 6fe1410..a917603 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -759,17 +759,11 @@ extern void xics_wake_cpu(int cpu);
static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
struct kvm_vcpu *vcpu)
{
- struct kvm_vcpu *v;
-
if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
return;
vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
--vc->n_runnable;
++vc->n_busy;
- /* decrement the physical thread id of each following vcpu */
- v = vcpu;
- list_for_each_entry_continue(v, &vc->runnable_threads, arch.run_list)
- --v->arch.ptid;
list_del(&vcpu->arch.run_list);
}
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 05/10] KVM: PPC: Book3S HV: Fix some races in starting secondary threads
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
` (3 preceding siblings ...)
2012-09-21 5:36 ` [PATCH 04/10] KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs Paul Mackerras
@ 2012-09-21 5:36 ` Paul Mackerras
2012-09-21 5:37 ` [PATCH 06/10] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock Paul Mackerras
` (4 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:36 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
Subsequent patches implementing in-kernel XICS emulation will make it
possible for IPIs to arrive at secondary threads at arbitrary times.
This fixes some races in how we start the secondary threads that
could otherwise lead to occasional crashes of the host kernel.
This makes sure that (a) we have grabbed all the secondary threads,
and verified that they are no longer in the kernel, before we start
any thread, (b) that the secondary thread loads its vcpu pointer
after clearing the IPI that woke it up (so we don't miss a wakeup),
and (c) that the secondary thread clears its vcpu pointer before
incrementing the nap count. It also removes unnecessary setting
of the vcpu and vcore pointers in the paca in kvmppc_core_vcpu_load.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_hv.c | 41 ++++++++++++++++++-------------
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 11 ++++++---
2 files changed, 32 insertions(+), 20 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a917603..cd3dc12 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -64,8 +64,6 @@ void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
- local_paca->kvm_hstate.kvm_vcpu = vcpu;
- local_paca->kvm_hstate.kvm_vcore = vc;
if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
vc->stolen_tb += mftb() - vc->preempt_tb;
}
@@ -776,6 +774,7 @@ static int kvmppc_grab_hwthread(int cpu)
/* Ensure the thread won't go into the kernel if it wakes */
tpaca->kvm_hstate.hwthread_req = 1;
+ tpaca->kvm_hstate.kvm_vcpu = NULL;
/*
* If the thread is already executing in the kernel (e.g. handling
@@ -825,7 +824,6 @@ static void kvmppc_start_thread(struct kvm_vcpu *vcpu)
smp_wmb();
#if defined(CONFIG_PPC_ICP_NATIVE) && defined(CONFIG_SMP)
if (vcpu->arch.ptid) {
- kvmppc_grab_hwthread(cpu);
xics_wake_cpu(cpu);
++vc->n_woken;
}
@@ -851,7 +849,8 @@ static void kvmppc_wait_for_nap(struct kvmppc_vcore *vc)
/*
* Check that we are on thread 0 and that any other threads in
- * this core are off-line.
+ * this core are off-line. Then grab the threads so they can't
+ * enter the kernel.
*/
static int on_primary_thread(void)
{
@@ -863,6 +862,17 @@ static int on_primary_thread(void)
while (++thr < threads_per_core)
if (cpu_online(cpu + thr))
return 0;
+
+ /* Grab all hw threads so they can't go into the kernel */
+ for (thr = 1; thr < threads_per_core; ++thr) {
+ if (kvmppc_grab_hwthread(cpu + thr)) {
+ /* Couldn't grab one; let the others go */
+ do {
+ kvmppc_release_hwthread(cpu + thr);
+ } while (--thr > 0);
+ return 0;
+ }
+ }
return 1;
}
@@ -911,16 +921,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
}
/*
- * Make sure we are running on thread 0, and that
- * secondary threads are offline.
- */
- if (threads_per_core > 1 && !on_primary_thread()) {
- list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
- vcpu->arch.ret = -EBUSY;
- goto out;
- }
-
- /*
* Assign physical thread IDs, first to non-ceded vcpus
* and then to ceded ones.
*/
@@ -939,15 +939,22 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
if (vcpu->arch.ceded)
vcpu->arch.ptid = ptid++;
+ /*
+ * Make sure we are running on thread 0, and that
+ * secondary threads are offline.
+ */
+ if (threads_per_core > 1 && !on_primary_thread()) {
+ list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
+ vcpu->arch.ret = -EBUSY;
+ goto out;
+ }
+
vc->stolen_tb += mftb() - vc->preempt_tb;
vc->pcpu = smp_processor_id();
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
kvmppc_start_thread(vcpu);
kvmppc_create_dtl_entry(vcpu, vc);
}
- /* Grab any remaining hw threads so they can't go into the kernel */
- for (i = ptid; i < threads_per_core; ++i)
- kvmppc_grab_hwthread(vc->pcpu + i);
preempt_disable();
spin_unlock(&vc->lock);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 44b72fe..1e90ef6 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -134,8 +134,11 @@ kvm_start_guest:
27: /* XXX should handle hypervisor maintenance interrupts etc. here */
+ /* reload vcpu pointer after clearing the IPI */
+ ld r4,HSTATE_KVM_VCPU(r13)
+ cmpdi r4,0
/* if we have no vcpu to run, go back to sleep */
- beq cr1,kvm_no_guest
+ beq kvm_no_guest
/* were we napping due to cede? */
lbz r0,HSTATE_NAPPING(r13)
@@ -1587,6 +1590,10 @@ secondary_too_late:
.endr
secondary_nap:
+ /* Clear our vcpu pointer so we don't come back in early */
+ li r0, 0
+ std r0, HSTATE_KVM_VCPU(r13)
+ lwsync
/* Clear any pending IPI - assume we're a secondary thread */
ld r5, HSTATE_XICS_PHYS(r13)
li r7, XICS_XIRR
@@ -1612,8 +1619,6 @@ secondary_nap:
kvm_no_guest:
li r0, KVM_HWTHREAD_IN_NAP
stb r0, HSTATE_HWTHREAD_STATE(r13)
- li r0, 0
- std r0, HSTATE_KVM_VCPU(r13)
li r3, LPCR_PECE0
mfspr r4, SPRN_LPCR
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 06/10] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
` (4 preceding siblings ...)
2012-09-21 5:36 ` [PATCH 05/10] KVM: PPC: Book3S HV: Fix some races in starting secondary threads Paul Mackerras
@ 2012-09-21 5:37 ` Paul Mackerras
2012-09-24 12:48 ` Alexander Graf
2012-09-21 5:37 ` [PATCH 07/10] KVM: PPC: Book3S HV: Fixes for late-joining threads Paul Mackerras
` (3 subsequent siblings)
9 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:37 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock. This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.
Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call to it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.
In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that. The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.
The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_asm.h | 1 +
arch/powerpc/kvm/book3s_hv.c | 64 +++++++++++++++++-------------------
2 files changed, 31 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 76fdcfe..fb99a21 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -123,6 +123,7 @@
#define RESUME_GUEST_NV RESUME_FLAG_NV
#define RESUME_HOST RESUME_FLAG_HOST
#define RESUME_HOST_NV (RESUME_FLAG_HOST|RESUME_FLAG_NV)
+#define RESUME_PAGE_FAULT (1<<2)
#define KVM_GUEST_MODE_NONE 0
#define KVM_GUEST_MODE_GUEST 1
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index cd3dc12..bd3c5c1 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -466,7 +466,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
struct task_struct *tsk)
{
int r = RESUME_HOST;
- int srcu_idx;
vcpu->stat.sum_exits++;
@@ -526,16 +525,12 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
* have been handled already.
*/
case BOOK3S_INTERRUPT_H_DATA_STORAGE:
- srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = kvmppc_book3s_hv_page_fault(run, vcpu,
- vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
- srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ r = RESUME_PAGE_FAULT;
break;
case BOOK3S_INTERRUPT_H_INST_STORAGE:
- srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = kvmppc_book3s_hv_page_fault(run, vcpu,
- kvmppc_get_pc(vcpu), 0);
- srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
+ vcpu->arch.fault_dsisr = 0;
+ r = RESUME_PAGE_FAULT;
break;
/*
* This occurs if the guest executes an illegal instruction.
@@ -880,22 +875,24 @@ static int on_primary_thread(void)
* Run a set of guest threads on a physical core.
* Called with vc->lock held.
*/
-static int kvmppc_run_core(struct kvmppc_vcore *vc)
+static void kvmppc_run_core(struct kvmppc_vcore *vc)
{
struct kvm_vcpu *vcpu, *vcpu0, *vnext;
long ret;
u64 now;
int ptid, i, need_vpa_update;
int srcu_idx;
+ struct kvm_vcpu *vcpus_to_update[threads_per_core];
/* don't start if any threads have a signal pending */
need_vpa_update = 0;
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
if (signal_pending(vcpu->arch.run_task))
- return 0;
- need_vpa_update |= vcpu->arch.vpa.update_pending |
- vcpu->arch.slb_shadow.update_pending |
- vcpu->arch.dtl.update_pending;
+ return;
+ if (vcpu->arch.vpa.update_pending ||
+ vcpu->arch.slb_shadow.update_pending ||
+ vcpu->arch.dtl.update_pending)
+ vcpus_to_update[need_vpa_update++] = vcpu;
}
/*
@@ -915,8 +912,8 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
*/
if (need_vpa_update) {
spin_unlock(&vc->lock);
- list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
- kvmppc_update_vpas(vcpu);
+ for (i = 0; i < need_vpa_update; ++i)
+ kvmppc_update_vpas(vcpus_to_update[i]);
spin_lock(&vc->lock);
}
@@ -933,8 +930,10 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
vcpu->arch.ptid = ptid++;
}
}
- if (!vcpu0)
- return 0; /* nothing to run */
+ if (!vcpu0) {
+ vc->vcore_state = VCORE_INACTIVE;
+ return; /* nothing to run; should never happen */
+ }
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
if (vcpu->arch.ceded)
vcpu->arch.ptid = ptid++;
@@ -987,6 +986,7 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
preempt_enable();
kvm_resched(vcpu);
+ spin_lock(&vc->lock);
now = get_tb();
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
/* cancel pending dec exception if dec is positive */
@@ -1010,7 +1010,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
}
}
- spin_lock(&vc->lock);
out:
vc->vcore_state = VCORE_INACTIVE;
vc->preempt_tb = mftb();
@@ -1021,8 +1020,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
wake_up(&vcpu->arch.cpu_run);
}
}
-
- return 1;
}
/*
@@ -1046,20 +1043,11 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state)
static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
{
DEFINE_WAIT(wait);
- struct kvm_vcpu *v;
- int all_idle = 1;
prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
vc->vcore_state = VCORE_SLEEPING;
spin_unlock(&vc->lock);
- list_for_each_entry(v, &vc->runnable_threads, arch.run_list) {
- if (!v->arch.ceded || v->arch.pending_exceptions) {
- all_idle = 0;
- break;
- }
- }
- if (all_idle)
- schedule();
+ schedule();
finish_wait(&vc->wq, &wait);
spin_lock(&vc->lock);
vc->vcore_state = VCORE_INACTIVE;
@@ -1115,7 +1103,8 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
vc->runner = vcpu;
n_ceded = 0;
list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
- n_ceded += v->arch.ceded;
+ if (!v->arch.pending_exceptions)
+ n_ceded += v->arch.ceded;
if (n_ceded == vc->n_runnable)
kvmppc_vcore_blocked(vc);
else
@@ -1136,8 +1125,9 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
}
if (signal_pending(current)) {
- if (vc->vcore_state == VCORE_RUNNING ||
- vc->vcore_state == VCORE_EXITING) {
+ while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
+ (vc->vcore_state == VCORE_RUNNING ||
+ vc->vcore_state == VCORE_EXITING)) {
spin_unlock(&vc->lock);
kvmppc_wait_for_exec(vcpu, TASK_UNINTERRUPTIBLE);
spin_lock(&vc->lock);
@@ -1157,6 +1147,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
int r;
+ int srcu_idx;
if (!vcpu->arch.sane) {
run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
@@ -1195,6 +1186,11 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
!(vcpu->arch.shregs.msr & MSR_PR)) {
r = kvmppc_pseries_do_hcall(vcpu);
kvmppc_core_prepare_to_enter(vcpu);
+ } else if (r == RESUME_PAGE_FAULT) {
+ srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+ r = kvmppc_book3s_hv_page_fault(run, vcpu,
+ vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
+ srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
}
} while (r == RESUME_GUEST);
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 07/10] KVM: PPC: Book3S HV: Fixes for late-joining threads
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
` (5 preceding siblings ...)
2012-09-21 5:37 ` [PATCH 06/10] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock Paul Mackerras
@ 2012-09-21 5:37 ` Paul Mackerras
2012-09-21 5:38 ` [PATCH 08/10] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run Paul Mackerras
` (2 subsequent siblings)
9 siblings, 0 replies; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:37 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
If a thread in a virtual core becomes runnable while other threads
in the same virtual core are already running in the guest, it is
possible for the latecomer to join the others on the core without
first pulling them all out of the guest. Currently this only happens
rarely, when a vcpu is first started. This fixes some bugs and
omissions in the code in this case.
First, we need to check for VPA updates for the latecomer and make
a DTL entry for it. Secondly, if it comes along while the master
vcpu is doing a VPA update, we don't need to do anything since the
master will pick it up in kvmppc_run_core. To handle this correctly
we introduce a new vcore state, VCORE_STARTING. Thirdly, there is
a race because we currently clear the hardware thread's hwthread_req
before waiting to see it get to nap. A latecomer thread could have
its hwthread_req cleared before it gets to test it, and therefore
never increment the nap_count, leading to messages about wait_for_nap
timeouts.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_host.h | 7 ++++---
arch/powerpc/kvm/book3s_hv.c | 14 +++++++++++---
2 files changed, 15 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 68f5a30..218534d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -289,9 +289,10 @@ struct kvmppc_vcore {
/* Values for vcore_state */
#define VCORE_INACTIVE 0
-#define VCORE_RUNNING 1
-#define VCORE_EXITING 2
-#define VCORE_SLEEPING 3
+#define VCORE_SLEEPING 1
+#define VCORE_STARTING 2
+#define VCORE_RUNNING 3
+#define VCORE_EXITING 4
/*
* Struct used to manage memory for a virtual processor area
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index bd3c5c1..8e84625 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -368,6 +368,11 @@ static void kvmppc_update_vpa(struct kvm_vcpu *vcpu, struct kvmppc_vpa *vpap)
static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
{
+ if (!(vcpu->arch.vpa.update_pending ||
+ vcpu->arch.slb_shadow.update_pending ||
+ vcpu->arch.dtl.update_pending))
+ return;
+
spin_lock(&vcpu->arch.vpa_update_lock);
if (vcpu->arch.vpa.update_pending) {
kvmppc_update_vpa(vcpu, &vcpu->arch.vpa);
@@ -902,7 +907,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
vc->n_woken = 0;
vc->nap_count = 0;
vc->entry_exit_count = 0;
- vc->vcore_state = VCORE_RUNNING;
+ vc->vcore_state = VCORE_STARTING;
vc->in_guest = 0;
vc->napping_threads = 0;
@@ -955,6 +960,7 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
kvmppc_create_dtl_entry(vcpu, vc);
}
+ vc->vcore_state = VCORE_RUNNING;
preempt_disable();
spin_unlock(&vc->lock);
@@ -963,8 +969,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
srcu_idx = srcu_read_lock(&vcpu0->kvm->srcu);
__kvmppc_vcore_entry(NULL, vcpu0);
- for (i = 0; i < threads_per_core; ++i)
- kvmppc_release_hwthread(vc->pcpu + i);
spin_lock(&vc->lock);
/* disable sending of IPIs on virtual external irqs */
@@ -973,6 +977,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
/* wait for secondary threads to finish writing their state to memory */
if (vc->nap_count < vc->n_woken)
kvmppc_wait_for_nap(vc);
+ for (i = 0; i < threads_per_core; ++i)
+ kvmppc_release_hwthread(vc->pcpu + i);
/* prevent other vcpu threads from doing kvmppc_start_thread() now */
vc->vcore_state = VCORE_EXITING;
spin_unlock(&vc->lock);
@@ -1063,6 +1069,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
kvm_run->exit_reason = 0;
vcpu->arch.ret = RESUME_GUEST;
vcpu->arch.trap = 0;
+ kvmppc_update_vpas(vcpu);
/*
* Synchronize with other threads in this virtual core
@@ -1086,6 +1093,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
if (vc->vcore_state == VCORE_RUNNING &&
VCORE_EXIT_COUNT(vc) == 0) {
vcpu->arch.ptid = vc->n_runnable - 1;
+ kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
}
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 08/10] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
` (6 preceding siblings ...)
2012-09-21 5:37 ` [PATCH 07/10] KVM: PPC: Book3S HV: Fixes for late-joining threads Paul Mackerras
@ 2012-09-21 5:38 ` Paul Mackerras
2012-09-21 5:38 ` [PATCH 09/10] KVM: PPC: Book3S HV: Fix accounting of stolen time Paul Mackerras
2012-09-21 5:39 ` [PATCH 10/10] KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation Paul Mackerras
9 siblings, 0 replies; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:38 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
Currently the Book3S HV code implements a policy on multi-threaded
processors (i.e. POWER7) that requires all of the active vcpus in a
virtual core to be ready to run before we run the virtual core.
However, that causes problems on reset, because reset stops all vcpus
except vcpu 0, and can also reduce throughput since all four threads
in a virtual core have to wait whenever any one of them hits a
hypervisor page fault.
This relaxes the policy, allowing the virtual core to run as soon as
any vcpu in it is runnable. With this, the KVMPPC_VCPU_STOPPED state
and the KVMPPC_VCPU_BUSY_IN_HOST state have been combined into a single
KVMPPC_VCPU_NOTREADY state, since we no longer need to distinguish
between them.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_host.h | 5 +--
arch/powerpc/kvm/book3s_hv.c | 74 ++++++++++++++++++-----------------
2 files changed, 40 insertions(+), 39 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 218534d..1e8cbd1 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -563,9 +563,8 @@ struct kvm_vcpu_arch {
};
/* Values for vcpu->arch.state */
-#define KVMPPC_VCPU_STOPPED 0
-#define KVMPPC_VCPU_BUSY_IN_HOST 1
-#define KVMPPC_VCPU_RUNNABLE 2
+#define KVMPPC_VCPU_NOTREADY 0
+#define KVMPPC_VCPU_RUNNABLE 1
/* Values for vcpu->arch.io_gpr */
#define KVM_MMIO_REG_MASK 0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8e84625..dc34a69 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -669,10 +669,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
kvmppc_mmu_book3s_hv_init(vcpu);
- /*
- * We consider the vcpu stopped until we see the first run ioctl for it.
- */
- vcpu->arch.state = KVMPPC_VCPU_STOPPED;
+ vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
init_waitqueue_head(&vcpu->arch.cpu_run);
@@ -759,9 +756,8 @@ static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
{
if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
return;
- vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
+ vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
--vc->n_runnable;
- ++vc->n_busy;
list_del(&vcpu->arch.run_list);
}
@@ -1062,7 +1058,6 @@ static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
{
int n_ceded;
- int prev_state;
struct kvmppc_vcore *vc;
struct kvm_vcpu *v, *vn;
@@ -1079,7 +1074,6 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
vcpu->arch.ceded = 0;
vcpu->arch.run_task = current;
vcpu->arch.kvm_run = kvm_run;
- prev_state = vcpu->arch.state;
vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
list_add_tail(&vcpu->arch.run_list, &vc->runnable_threads);
++vc->n_runnable;
@@ -1089,35 +1083,26 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
* If the vcore is already running, we may be able to start
* this thread straight away and have it join in.
*/
- if (prev_state == KVMPPC_VCPU_STOPPED) {
+ if (!signal_pending(current)) {
if (vc->vcore_state == VCORE_RUNNING &&
VCORE_EXIT_COUNT(vc) == 0) {
vcpu->arch.ptid = vc->n_runnable - 1;
kvmppc_create_dtl_entry(vcpu, vc);
kvmppc_start_thread(vcpu);
+ } else if (vc->vcore_state == VCORE_SLEEPING) {
+ wake_up(&vc->wq);
}
- } else if (prev_state == KVMPPC_VCPU_BUSY_IN_HOST)
- --vc->n_busy;
+ }
while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
!signal_pending(current)) {
- if (vc->n_busy || vc->vcore_state != VCORE_INACTIVE) {
+ if (vc->vcore_state != VCORE_INACTIVE) {
spin_unlock(&vc->lock);
kvmppc_wait_for_exec(vcpu, TASK_INTERRUPTIBLE);
spin_lock(&vc->lock);
continue;
}
- vc->runner = vcpu;
- n_ceded = 0;
- list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
- if (!v->arch.pending_exceptions)
- n_ceded += v->arch.ceded;
- if (n_ceded == vc->n_runnable)
- kvmppc_vcore_blocked(vc);
- else
- kvmppc_run_core(vc);
-
list_for_each_entry_safe(v, vn, &vc->runnable_threads,
arch.run_list) {
kvmppc_core_prepare_to_enter(v);
@@ -1129,23 +1114,40 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
wake_up(&v->arch.cpu_run);
}
}
+ if (!vc->n_runnable || vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
+ break;
+ vc->runner = vcpu;
+ n_ceded = 0;
+ list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
+ if (!v->arch.pending_exceptions)
+ n_ceded += v->arch.ceded;
+ if (n_ceded == vc->n_runnable)
+ kvmppc_vcore_blocked(vc);
+ else
+ kvmppc_run_core(vc);
vc->runner = NULL;
}
- if (signal_pending(current)) {
- while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
- (vc->vcore_state == VCORE_RUNNING ||
- vc->vcore_state == VCORE_EXITING)) {
- spin_unlock(&vc->lock);
- kvmppc_wait_for_exec(vcpu, TASK_UNINTERRUPTIBLE);
- spin_lock(&vc->lock);
- }
- if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
- kvmppc_remove_runnable(vc, vcpu);
- vcpu->stat.signal_exits++;
- kvm_run->exit_reason = KVM_EXIT_INTR;
- vcpu->arch.ret = -EINTR;
- }
+ while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
+ (vc->vcore_state == VCORE_RUNNING ||
+ vc->vcore_state == VCORE_EXITING)) {
+ spin_unlock(&vc->lock);
+ kvmppc_wait_for_exec(vcpu, TASK_UNINTERRUPTIBLE);
+ spin_lock(&vc->lock);
+ }
+
+ if (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE) {
+ kvmppc_remove_runnable(vc, vcpu);
+ vcpu->stat.signal_exits++;
+ kvm_run->exit_reason = KVM_EXIT_INTR;
+ vcpu->arch.ret = -EINTR;
+ }
+
+ if (vc->n_runnable && vc->vcore_state == VCORE_INACTIVE) {
+ /* Wake up some vcpu to run the core */
+ v = list_first_entry(&vc->runnable_threads,
+ struct kvm_vcpu, arch.run_list);
+ wake_up(&v->arch.cpu_run);
}
spin_unlock(&vc->lock);
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 09/10] KVM: PPC: Book3S HV: Fix accounting of stolen time
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
` (7 preceding siblings ...)
2012-09-21 5:38 ` [PATCH 08/10] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run Paul Mackerras
@ 2012-09-21 5:38 ` Paul Mackerras
2012-09-27 6:05 ` [PATCH v2 " Paul Mackerras
2012-09-21 5:39 ` [PATCH 10/10] KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation Paul Mackerras
9 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:38 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.
In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis. Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we need to count the time when a vcpu task
is preempted while it is not running as part of a vcore as stolen also.
To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state. Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock. This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.
Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped. Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.
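As a rough sketch of the resulting calculation (illustrative only;
it condenses the DTL-entry code added below), the stolen time
reported in a DTL entry becomes the sum of the vcore-level and
vcpu-level components accumulated since the vcpu's last dispatch:

	now = mftb();
	core_stolen = vcore_stolen_time(vc, now);
	stolen = core_stolen - vcpu->arch.stolen_logged;  /* vcore share */
	vcpu->arch.stolen_logged = core_stolen;
	stolen += vcpu->arch.busy_stolen;  /* preempted while BUSY_IN_HOST */
	vcpu->arch.busy_stolen = 0;
	dt->enqueue_to_dispatch_time = stolen;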
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/include/asm/kvm_host.h | 5 ++
arch/powerpc/kvm/book3s_hv.c | 127 ++++++++++++++++++++++++++++++-----
2 files changed, 117 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1e8cbd1..3093896 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -559,12 +559,17 @@ struct kvm_vcpu_arch {
unsigned long dtl_index;
u64 stolen_logged;
struct kvmppc_vpa slb_shadow;
+
+ spinlock_t tbacct_lock;
+ u64 busy_stolen;
+ u64 busy_preempt;
#endif
};
/* Values for vcpu->arch.state */
#define KVMPPC_VCPU_NOTREADY 0
#define KVMPPC_VCPU_RUNNABLE 1
+#define KVMPPC_VCPU_BUSY_IN_HOST 2
/* Values for vcpu->arch.io_gpr */
#define KVM_MMIO_REG_MASK 0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index dc34a69..f953f73 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -57,23 +57,74 @@
/* #define EXIT_DEBUG_SIMPLE */
/* #define EXIT_DEBUG_INT */
+/* Used as a "null" value for timebase values */
+#define TB_NIL (~(u64)0)
+
static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
+/*
+ * We use the vcpu_load/put functions to measure stolen time.
+ * Stolen time is counted as time when either the vcpu is able to
+ * run as part of a virtual core, but the task running the vcore
+ * is preempted or sleeping, or when the vcpu needs something done
+ * in the kernel by the task running the vcpu, but that task is
+ * preempted or sleeping. Those two things have to be counted
+ * separately, since one of the vcpu tasks will take on the job
+ * of running the core, and the other vcpu tasks in the vcore will
+ * sleep waiting for it to do that, but that sleep shouldn't count
+ * as stolen time.
+ *
+ * Hence we accumulate stolen time when the vcpu can run as part of
+ * a vcore using vc->stolen_tb, and the stolen time when the vcpu
+ * needs its task to do other things in the kernel (for example,
+ * service a page fault) in busy_stolen. We don't accumulate
+ * stolen time for a vcore when it is inactive, or for a vcpu
+ * when it is in state RUNNING or NOTREADY. NOTREADY is a bit of
+ * a misnomer; it means that the vcpu task is not executing in
+ * the KVM_VCPU_RUN ioctl, i.e. it is in userspace or elsewhere in
+ * the kernel. We don't have any way of dividing up that time
+ * between time that the vcpu is genuinely stopped, time that
+ * the task is actively working on behalf of the vcpu, and time
+ * that the task is preempted, so we don't count any of it as
+ * stolen.
+ *
+ * Updates to busy_stolen are protected by arch.tbacct_lock;
+ * updates to vc->stolen_tb are protected by the arch.tbacct_lock
+ * of the vcpu that has taken responsibility for running the vcore
+ * (i.e. vc->runner). The stolen times are measured in units of
+ * timebase ticks. (Note that the != TB_NIL checks below are
+ * purely defensive; they should never fail.)
+ */
+
void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
- if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
+ spin_lock(&vcpu->arch.tbacct_lock);
+ if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE &&
+ vc->preempt_tb != TB_NIL) {
vc->stolen_tb += mftb() - vc->preempt_tb;
+ vc->preempt_tb = TB_NIL;
+ }
+ if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST &&
+ vcpu->arch.busy_preempt != TB_NIL) {
+ vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt;
+ vcpu->arch.busy_preempt = TB_NIL;
+ }
+ spin_unlock(&vcpu->arch.tbacct_lock);
}
void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
+ spin_lock(&vcpu->arch.tbacct_lock);
if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
vc->preempt_tb = mftb();
+ if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST)
+ vcpu->arch.busy_preempt = mftb();
+ spin_unlock(&vcpu->arch.tbacct_lock);
}
void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
@@ -389,24 +440,61 @@ static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
spin_unlock(&vcpu->arch.vpa_update_lock);
}
+/*
+ * Return the accumulated stolen time for the vcore up until `now'.
+ * The caller should hold the vcore lock.
+ */
+static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 now)
+{
+ u64 p;
+
+ /*
+ * If we are the task running the vcore, then since we hold
+ * the vcore lock, we can't be preempted, so stolen_tb/preempt_tb
+ * can't be updated, so we don't need the tbacct_lock.
+ * If the vcore is inactive, it can't become active (since we
+ * hold the vcore lock), so the vcpu load/put functions won't
+ * update stolen_tb/preempt_tb, and we don't need tbacct_lock.
+ */
+ if (vc->vcore_state != VCORE_INACTIVE &&
+ vc->runner->arch.run_task != current) {
+ spin_lock(&vc->runner->arch.tbacct_lock);
+ p = vc->stolen_tb;
+ if (vc->preempt_tb != TB_NIL)
+ p += now - vc->preempt_tb;
+ spin_unlock(&vc->runner->arch.tbacct_lock);
+ } else {
+ p = vc->stolen_tb;
+ }
+ return p;
+}
+
static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
struct kvmppc_vcore *vc)
{
struct dtl_entry *dt;
struct lppaca *vpa;
- unsigned long old_stolen;
+ unsigned long stolen;
+ unsigned long core_stolen;
+ u64 now;
dt = vcpu->arch.dtl_ptr;
vpa = vcpu->arch.vpa.pinned_addr;
- old_stolen = vcpu->arch.stolen_logged;
- vcpu->arch.stolen_logged = vc->stolen_tb;
+ now = mftb();
+ core_stolen = vcore_stolen_time(vc, now);
+ stolen = core_stolen - vcpu->arch.stolen_logged;
+ vcpu->arch.stolen_logged = core_stolen;
+ spin_lock(&vcpu->arch.tbacct_lock);
+ stolen += vcpu->arch.busy_stolen;
+ vcpu->arch.busy_stolen = 0;
+ spin_unlock(&vcpu->arch.tbacct_lock);
if (!dt || !vpa)
return;
memset(dt, 0, sizeof(struct dtl_entry));
dt->dispatch_reason = 7;
dt->processor_id = vc->pcpu + vcpu->arch.ptid;
- dt->timebase = mftb();
- dt->enqueue_to_dispatch_time = vc->stolen_tb - old_stolen;
+ dt->timebase = now;
+ dt->enqueue_to_dispatch_time = stolen;
dt->srr0 = kvmppc_get_pc(vcpu);
dt->srr1 = vcpu->arch.shregs.msr;
++dt;
@@ -666,6 +754,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
vcpu->arch.pvr = mfspr(SPRN_PVR);
kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
spin_lock_init(&vcpu->arch.vpa_update_lock);
+ spin_lock_init(&vcpu->arch.tbacct_lock);
+ vcpu->arch.busy_preempt = TB_NIL;
kvmppc_mmu_book3s_hv_init(vcpu);
@@ -681,7 +771,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
INIT_LIST_HEAD(&vcore->runnable_threads);
spin_lock_init(&vcore->lock);
init_waitqueue_head(&vcore->wq);
- vcore->preempt_tb = mftb();
+ vcore->preempt_tb = TB_NIL;
}
kvm->arch.vcores[core] = vcore;
}
@@ -694,7 +784,6 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
++vcore->num_threads;
spin_unlock(&vcore->lock);
vcpu->arch.vcore = vcore;
- vcpu->arch.stolen_logged = vcore->stolen_tb;
vcpu->arch.cpu_type = KVM_CPU_3S_64;
kvmppc_sanity_check(vcpu);
@@ -754,9 +843,17 @@ extern void xics_wake_cpu(int cpu);
static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
struct kvm_vcpu *vcpu)
{
+ u64 now;
+
if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
return;
- vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
+ spin_lock(&vcpu->arch.tbacct_lock);
+ now = mftb();
+ vcpu->arch.busy_stolen += vcore_stolen_time(vc, now) -
+ vcpu->arch.stolen_logged;
+ vcpu->arch.busy_preempt = now;
+ vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
+ spin_unlock(&vcpu->arch.tbacct_lock);
--vc->n_runnable;
list_del(&vcpu->arch.run_list);
}
@@ -931,10 +1028,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
vcpu->arch.ptid = ptid++;
}
}
- if (!vcpu0) {
- vc->vcore_state = VCORE_INACTIVE;
- return; /* nothing to run; should never happen */
- }
+ if (!vcpu0)
+ goto out; /* nothing to run; should never happen */
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
if (vcpu->arch.ceded)
vcpu->arch.ptid = ptid++;
@@ -949,7 +1044,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
goto out;
}
- vc->stolen_tb += mftb() - vc->preempt_tb;
vc->pcpu = smp_processor_id();
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
kvmppc_start_thread(vcpu);
@@ -1014,7 +1108,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
out:
vc->vcore_state = VCORE_INACTIVE;
- vc->preempt_tb = mftb();
list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
arch.run_list) {
if (vcpu->arch.ret != RESUME_GUEST) {
@@ -1074,7 +1167,9 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
vcpu->arch.ceded = 0;
vcpu->arch.run_task = current;
vcpu->arch.kvm_run = kvm_run;
+ vcpu->arch.stolen_logged = vcore_stolen_time(vc, mftb());
vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
+ vcpu->arch.busy_preempt = TB_NIL;
list_add_tail(&vcpu->arch.run_list, &vc->runnable_threads);
++vc->n_runnable;
@@ -1188,6 +1283,7 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
flush_vsx_to_thread(current);
vcpu->arch.wqp = &vcpu->arch.vcore->wq;
vcpu->arch.pgdir = current->mm->pgd;
+ vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
do {
r = kvmppc_run_vcpu(run, vcpu);
@@ -1205,6 +1301,7 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
} while (r == RESUME_GUEST);
out:
+ vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
atomic_dec(&vcpu->kvm->arch.vcpus_running);
return r;
}
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH 10/10] KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
` (8 preceding siblings ...)
2012-09-21 5:38 ` [PATCH 09/10] KVM: PPC: Book3S HV: Fix accounting of stolen time Paul Mackerras
@ 2012-09-21 5:39 ` Paul Mackerras
2012-09-24 12:52 ` Alexander Graf
9 siblings, 1 reply; 20+ messages in thread
From: Paul Mackerras @ 2012-09-21 5:39 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
In the case where the host kernel is using a 64kB base page size and
the guest uses a 4k HPTE (hashed page table entry) to map an emulated
MMIO device, we were calculating the guest physical address wrongly.
We were calculating a gfn as the guest physical address shifted right
16 bits (PAGE_SHIFT) but then only adding back in 12 bits from the
effective address, since the HPTE had a 4k page size. Thus the gpa
reported to userspace was missing 4 bits.
Instead, we now compute the guest physical address from the HPTE
without reference to the host page size, and then compute the gfn
by shifting the gpa right PAGE_SHIFT bits.
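For illustration, a made-up numeric example of the bug and the fix (the
addresses here are hypothetical; the real computation is in the diff below).
Suppose the host uses 64kB pages (PAGE_SHIFT = 16) and the guest's 4kB HPTE
maps emulated MMIO at guest physical 0x30045000, with the faulting access
0x10 bytes into that page:
	unsigned long offset  = 0x010;	/* ea & (psize - 1), psize = 4kB */
	/* old: gfn built with the host 64kB shift drops bits 12..15 */
	unsigned long gpa_old = ((0x30045000UL >> 16) << 16) | offset;	/* 0x30040010 */
	/* new: take the page address straight from the HPTE's RPN */
	unsigned long gpa_new = 0x30045000UL | offset;			/* 0x30045010 */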
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f598366..7a4aae9 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -571,7 +571,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
struct kvm *kvm = vcpu->kvm;
unsigned long *hptep, hpte[3], r;
unsigned long mmu_seq, psize, pte_size;
- unsigned long gfn, hva, pfn;
+ unsigned long gpa, gfn, hva, pfn;
struct kvm_memory_slot *memslot;
unsigned long *rmap;
struct revmap_entry *rev;
@@ -609,15 +609,14 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
/* Translate the logical address and get the page */
psize = hpte_page_size(hpte[0], r);
- gfn = hpte_rpn(r, psize);
+ gpa = (r & HPTE_R_RPN & ~(psize - 1)) | (ea & (psize - 1));
+ gfn = gpa >> PAGE_SHIFT;
memslot = gfn_to_memslot(kvm, gfn);
/* No memslot means it's an emulated MMIO region */
- if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID)) {
- unsigned long gpa = (gfn << PAGE_SHIFT) | (ea & (psize - 1));
+ if (!memslot || (memslot->flags & KVM_MEMSLOT_INVALID))
return kvmppc_hv_emulate_mmio(run, vcpu, gpa, ea,
dsisr & DSISR_ISSTORE);
- }
if (!kvm->arch.using_mmu_notifiers)
return -EFAULT; /* should never get here */
--
1.7.10
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH 01/10] KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas
2012-09-21 5:33 ` [PATCH 01/10] KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas Paul Mackerras
@ 2012-09-24 12:23 ` Alexander Graf
0 siblings, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2012-09-24 12:23 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm, kvm-ppc
On 21.09.2012, at 07:33, Paul Mackerras wrote:
> The PAPR paravirtualization interface lets guests register three
> different types of per-vCPU buffer areas in its memory for communication
> with the hypervisor. These are called virtual processor areas (VPAs).
> Currently the hypercalls to register and unregister VPAs are handled
> by KVM in the kernel, and userspace has no way to know about or save
> and restore these registrations across a migration.
>
> This adds get and set ioctls to allow userspace to see what addresses
> have been registered, and to register or unregister them. This will
> be needed for guest hibernation and migration, and is also needed so
> that userspace can unregister them on reset (otherwise we corrupt
> guest memory after reboot by writing to the VPAs registered by the
> previous kernel). We also add a capability to indicate that the
> ioctls are supported.
>
> This also fixes a bug where we were calling init_vpa unconditionally,
> leading to an oops when unregistering the VPA.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Do you think it'd be possible to map these onto ONE_REG as well? I'm slightly wary of adding an interface that is restricted to a limited number of entries. What if some future PAPR spec adds another VPA to the game? We'd then have to do a completely new ioctl for it.
However, we could instead have 3 REGs:
64-bit VPA_ADDR
128-bit VPA_SLB
128-bit VPA_DTL
where you'd have to set VPA_ADDR first, then the other two. That gives us nice extensibility for the future too, right? We could just add another REG and everyone's happy.
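A rough sketch of how those three REGs might be encoded with the existing
one_reg size flags (the names and index values below are illustrative only,
not a final ABI):
/* Illustrative only: hypothetical ONE_REG IDs for the three VPAs */
#define KVM_REG_PPC_VPA_ADDR	(KVM_REG_PPC | KVM_REG_SIZE_U64  | 0x82)
#define KVM_REG_PPC_VPA_SLB	(KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x83)
#define KVM_REG_PPC_VPA_DTL	(KVM_REG_PPC | KVM_REG_SIZE_U128 | 0x84)
Presumably userspace would set VPA_ADDR first, and each 128-bit value would
carry the buffer address plus its length.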
Alex
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
2012-09-21 5:35 ` [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online Paul Mackerras
@ 2012-09-24 12:26 ` Alexander Graf
2012-09-27 1:01 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2012-09-24 12:26 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt
On 21.09.2012, at 07:35, Paul Mackerras wrote:
> When a Book3S HV KVM guest is running, we need the host to be in
> single-thread mode, that is, all of the cores (or at least all of
> the cores where the KVM guest could run) to be running only one
> active hardware thread. This is because of the hardware restriction
> in POWER processors that all of the hardware threads in the core
> must be in the same logical partition. Complying with this restriction
> is much easier if, from the host kernel's point of view, only one
> hardware thread is active.
>
> This adds two hooks in the SMP hotplug code to allow the KVM code to
> make sure that secondary threads (i.e. hardware threads other than
> thread 0) cannot come online while any KVM guest exists. The KVM
> code still has to check that any core where it runs a guest has the
> secondary threads offline, but having done that check it can now be
> sure that they will not come online while the guest is running.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Ben, since this touches generic ppc code, could you please ack?
Alex
> ---
> arch/powerpc/include/asm/smp.h | 8 +++++++
> arch/powerpc/kernel/smp.c | 46 ++++++++++++++++++++++++++++++++++++++++
> arch/powerpc/kvm/book3s_hv.c | 12 +++++++++--
> 3 files changed, 64 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index ebc24dc..b625a1a 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -66,6 +66,14 @@ void generic_cpu_die(unsigned int cpu);
> void generic_mach_cpu_die(void);
> void generic_set_cpu_dead(unsigned int cpu);
> int generic_check_cpu_restart(unsigned int cpu);
> +
> +extern void inhibit_secondary_onlining(void);
> +extern void uninhibit_secondary_onlining(void);
> +
> +#else /* HOTPLUG_CPU */
> +static inline void inhibit_secondary_onlining(void) {}
> +static inline void uninhibit_secondary_onlining(void) {}
> +
> #endif
>
> #ifdef CONFIG_PPC64
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 0321007..c45f51d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -410,6 +410,45 @@ int generic_check_cpu_restart(unsigned int cpu)
> {
> return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
> }
> +
> +static atomic_t secondary_inhibit_count;
> +
> +/*
> + * Don't allow secondary CPU threads to come online
> + */
> +void inhibit_secondary_onlining(void)
> +{
> + /*
> + * This makes secondary_inhibit_count stable during cpu
> + * online/offline operations.
> + */
> + get_online_cpus();
> +
> + atomic_inc(&secondary_inhibit_count);
> + put_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
> +
> +/*
> + * Allow secondary CPU threads to come online again
> + */
> +void uninhibit_secondary_onlining(void)
> +{
> + get_online_cpus();
> + atomic_dec(&secondary_inhibit_count);
> + put_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
> +
> +static int secondaries_inhibited(void)
> +{
> + return atomic_read(&secondary_inhibit_count);
> +}
> +
> +#else /* HOTPLUG_CPU */
> +
> +#define secondaries_inhibited() 0
> +
> #endif
>
> static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
> @@ -428,6 +467,13 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle)
> {
> int rc, c;
>
> + /*
> + * Don't allow secondary threads to come online if inhibited
> + */
> + if (threads_per_core > 1 && secondaries_inhibited() &&
> + cpu % threads_per_core != 0)
> + return -EBUSY;
> +
> if (smp_ops == NULL ||
> (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
> return -EINVAL;
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index bebf9cb..6fe1410 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -47,6 +47,7 @@
> #include <asm/page.h>
> #include <asm/hvcall.h>
> #include <asm/switch_to.h>
> +#include <asm/smp.h>
> #include <linux/gfp.h>
> #include <linux/vmalloc.h>
> #include <linux/highmem.h>
> @@ -918,8 +919,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
> /*
> * Make sure we are running on thread 0, and that
> * secondary threads are offline.
> - * XXX we should also block attempts to bring any
> - * secondary threads online.
> */
> if (threads_per_core > 1 && !on_primary_thread()) {
> list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
> @@ -1632,11 +1631,20 @@ int kvmppc_core_init_vm(struct kvm *kvm)
>
> kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
> spin_lock_init(&kvm->arch.slot_phys_lock);
> +
> + /*
> + * Don't allow secondary CPU threads to come online
> + * while any KVM VMs exist.
> + */
> + inhibit_secondary_onlining();
> +
> return 0;
> }
>
> void kvmppc_core_destroy_vm(struct kvm *kvm)
> {
> + uninhibit_secondary_onlining();
> +
> if (kvm->arch.rma) {
> kvm_release_rma(kvm->arch.rma);
> kvm->arch.rma = NULL;
> --
> 1.7.10
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 06/10] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock
2012-09-21 5:37 ` [PATCH 06/10] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock Paul Mackerras
@ 2012-09-24 12:48 ` Alexander Graf
2012-09-27 6:00 ` [PATCH v2 06/10] KVM: PPC: Book3S " Paul Mackerras
0 siblings, 1 reply; 20+ messages in thread
From: Alexander Graf @ 2012-09-24 12:48 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm, kvm-ppc
On 21.09.2012, at 07:37, Paul Mackerras wrote:
> There were a few places where we were traversing the list of runnable
> threads in a virtual core, i.e. vc->runnable_threads, without holding
> the vcore spinlock. This extends the places where we hold the vcore
> spinlock to cover everywhere that we traverse that list.
>
> Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
> this moves the call of it from kvmppc_handle_exit out to
> kvmppc_vcpu_run, where we don't hold the vcore lock.
>
> In kvmppc_vcore_blocked, we don't actually need to check whether
> all vcpus are ceded and don't have any pending exceptions, since the
> caller has already done that. The caller (kvmppc_run_vcpu) wasn't
> actually checking for pending exceptions, so we add that.
>
> The change of if to while in kvmppc_run_vcpu is to make sure that we
> never call kvmppc_remove_runnable() when the vcore state is RUNNING or
> EXITING.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
> ---
> arch/powerpc/include/asm/kvm_asm.h | 1 +
> arch/powerpc/kvm/book3s_hv.c | 64 +++++++++++++++++-------------------
> 2 files changed, 31 insertions(+), 34 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
> index 76fdcfe..fb99a21 100644
> --- a/arch/powerpc/include/asm/kvm_asm.h
> +++ b/arch/powerpc/include/asm/kvm_asm.h
> @@ -123,6 +123,7 @@
> #define RESUME_GUEST_NV RESUME_FLAG_NV
> #define RESUME_HOST RESUME_FLAG_HOST
> #define RESUME_HOST_NV (RESUME_FLAG_HOST|RESUME_FLAG_NV)
> +#define RESUME_PAGE_FAULT (1<<2)
I would actually prefer if you could move this to core specific code. How about
#define RESUME_ARCH1 (1 << 2)
and then in book3s_hv.c:
#define RESUME_PAGE_FAULT (RESUME_GUEST | RESUME_ARCH1)
Alex
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 03/10] KVM: PPC: Book3S HV: Fix updates of vcpu->cpu
2012-09-21 5:35 ` [PATCH 03/10] KVM: PPC: Book3S HV: Fix updates of vcpu->cpu Paul Mackerras
@ 2012-09-24 12:52 ` Alexander Graf
0 siblings, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2012-09-24 12:52 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt
On 21.09.2012, at 07:35, Paul Mackerras wrote:
> This removes the powerpc "generic" updates of vcpu->cpu in load and
> put, and moves them to the various backends.
>
> The reason is that "HV" KVM does its own sauce with that field
> and the generic updates might corrupt it. The field contains the
> CPU# of the -first- HW CPU of the core always for all the VCPU
> threads of a core (the one that's online from a host Linux
> perspective).
>
> However, the preempt notifiers are going to be called on the
> threads VCPUs when they are running (due to them sleeping on our
> private waitqueue) causing unload to be called, potentially
> clobbering the value.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Thanks, applied to kvm-ppc-next.
Alex
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 04/10] KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs
2012-09-21 5:36 ` [PATCH 04/10] KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs Paul Mackerras
@ 2012-09-24 12:52 ` Alexander Graf
0 siblings, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2012-09-24 12:52 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm, kvm-ppc, Benjamin Herrenschmidt
On 21.09.2012, at 07:36, Paul Mackerras wrote:
> When making a vcpu non-runnable we incorrectly changed the
> thread IDs of all other threads on the core, just remove that
> code.
>
> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Thanks, applied to kvm-ppc-next.
Alex
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 10/10] KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation
2012-09-21 5:39 ` [PATCH 10/10] KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation Paul Mackerras
@ 2012-09-24 12:52 ` Alexander Graf
0 siblings, 0 replies; 20+ messages in thread
From: Alexander Graf @ 2012-09-24 12:52 UTC (permalink / raw)
To: Paul Mackerras; +Cc: kvm, kvm-ppc
On 21.09.2012, at 07:39, Paul Mackerras wrote:
> In the case where the host kernel is using a 64kB base page size and
> the guest uses a 4k HPTE (hashed page table entry) to map an emulated
> MMIO device, we were calculating the guest physical address wrongly.
> We were calculating a gfn as the guest physical address shifted right
> 16 bits (PAGE_SHIFT) but then only adding back in 12 bits from the
> effective address, since the HPTE had a 4k page size. Thus the gpa
> reported to userspace was missing 4 bits.
>
> Instead, we now compute the guest physical address from the HPTE
> without reference to the host page size, and then compute the gfn
> by shifting the gpa right PAGE_SHIFT bits.
>
> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Thanks, applied to kvm-ppc-next.
Alex
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online
2012-09-21 5:35 ` [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online Paul Mackerras
2012-09-24 12:26 ` Alexander Graf
@ 2012-09-27 1:01 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 20+ messages in thread
From: Benjamin Herrenschmidt @ 2012-09-27 1:01 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Alexander Graf, kvm, kvm-ppc
On Fri, 2012-09-21 at 15:35 +1000, Paul Mackerras wrote:
> When a Book3S HV KVM guest is running, we need the host to be in
> single-thread mode, that is, all of the cores (or at least all of
> the cores where the KVM guest could run) to be running only one
> active hardware thread. This is because of the hardware restriction
> in POWER processors that all of the hardware threads in the core
> must be in the same logical partition. Complying with this restriction
> is much easier if, from the host kernel's point of view, only one
> hardware thread is active.
>
> This adds two hooks in the SMP hotplug code to allow the KVM code to
> make sure that secondary threads (i.e. hardware threads other than
> thread 0) cannot come online while any KVM guest exists. The KVM
> code still has to check that any core where it runs a guest has the
> secondary threads offline, but having done that check it can now be
> sure that they will not come online while the guest is running.
>
> Signed-off-by: Paul Mackerras <paulus@samba.org>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> ---
> arch/powerpc/include/asm/smp.h | 8 +++++++
> arch/powerpc/kernel/smp.c | 46 ++++++++++++++++++++++++++++++++++++++++
> arch/powerpc/kvm/book3s_hv.c | 12 +++++++++--
> 3 files changed, 64 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index ebc24dc..b625a1a 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -66,6 +66,14 @@ void generic_cpu_die(unsigned int cpu);
> void generic_mach_cpu_die(void);
> void generic_set_cpu_dead(unsigned int cpu);
> int generic_check_cpu_restart(unsigned int cpu);
> +
> +extern void inhibit_secondary_onlining(void);
> +extern void uninhibit_secondary_onlining(void);
> +
> +#else /* HOTPLUG_CPU */
> +static inline void inhibit_secondary_onlining(void) {}
> +static inline void uninhibit_secondary_onlining(void) {}
> +
> #endif
>
> #ifdef CONFIG_PPC64
> diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
> index 0321007..c45f51d 100644
> --- a/arch/powerpc/kernel/smp.c
> +++ b/arch/powerpc/kernel/smp.c
> @@ -410,6 +410,45 @@ int generic_check_cpu_restart(unsigned int cpu)
> {
> return per_cpu(cpu_state, cpu) == CPU_UP_PREPARE;
> }
> +
> +static atomic_t secondary_inhibit_count;
> +
> +/*
> + * Don't allow secondary CPU threads to come online
> + */
> +void inhibit_secondary_onlining(void)
> +{
> + /*
> + * This makes secondary_inhibit_count stable during cpu
> + * online/offline operations.
> + */
> + get_online_cpus();
> +
> + atomic_inc(&secondary_inhibit_count);
> + put_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(inhibit_secondary_onlining);
> +
> +/*
> + * Allow secondary CPU threads to come online again
> + */
> +void uninhibit_secondary_onlining(void)
> +{
> + get_online_cpus();
> + atomic_dec(&secondary_inhibit_count);
> + put_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(uninhibit_secondary_onlining);
> +
> +static int secondaries_inhibited(void)
> +{
> + return atomic_read(&secondary_inhibit_count);
> +}
> +
> +#else /* HOTPLUG_CPU */
> +
> +#define secondaries_inhibited() 0
> +
> #endif
>
> static void cpu_idle_thread_init(unsigned int cpu, struct task_struct *idle)
> @@ -428,6 +467,13 @@ int __cpuinit __cpu_up(unsigned int cpu, struct task_struct *tidle)
> {
> int rc, c;
>
> + /*
> + * Don't allow secondary threads to come online if inhibited
> + */
> + if (threads_per_core > 1 && secondaries_inhibited() &&
> + cpu % threads_per_core != 0)
> + return -EBUSY;
> +
> if (smp_ops == NULL ||
> (smp_ops->cpu_bootable && !smp_ops->cpu_bootable(cpu)))
> return -EINVAL;
> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index bebf9cb..6fe1410 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -47,6 +47,7 @@
> #include <asm/page.h>
> #include <asm/hvcall.h>
> #include <asm/switch_to.h>
> +#include <asm/smp.h>
> #include <linux/gfp.h>
> #include <linux/vmalloc.h>
> #include <linux/highmem.h>
> @@ -918,8 +919,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
> /*
> * Make sure we are running on thread 0, and that
> * secondary threads are offline.
> - * XXX we should also block attempts to bring any
> - * secondary threads online.
> */
> if (threads_per_core > 1 && !on_primary_thread()) {
> list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
> @@ -1632,11 +1631,20 @@ int kvmppc_core_init_vm(struct kvm *kvm)
>
> kvm->arch.using_mmu_notifiers = !!cpu_has_feature(CPU_FTR_ARCH_206);
> spin_lock_init(&kvm->arch.slot_phys_lock);
> +
> + /*
> + * Don't allow secondary CPU threads to come online
> + * while any KVM VMs exist.
> + */
> + inhibit_secondary_onlining();
> +
> return 0;
> }
>
> void kvmppc_core_destroy_vm(struct kvm *kvm)
> {
> + uninhibit_secondary_onlining();
> +
> if (kvm->arch.rma) {
> kvm_release_rma(kvm->arch.rma);
> kvm->arch.rma = NULL;
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v2 06/10] KVM: PPC: Book3S HV: Don't access runnable threads list without vcore lock
2012-09-24 12:48 ` Alexander Graf
@ 2012-09-27 6:00 ` Paul Mackerras
0 siblings, 0 replies; 20+ messages in thread
From: Paul Mackerras @ 2012-09-27 6:00 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
There were a few places where we were traversing the list of runnable
threads in a virtual core, i.e. vc->runnable_threads, without holding
the vcore spinlock. This extends the places where we hold the vcore
spinlock to cover everywhere that we traverse that list.
Since we possibly need to sleep inside kvmppc_book3s_hv_page_fault,
this moves the call of it from kvmppc_handle_exit out to
kvmppc_vcpu_run, where we don't hold the vcore lock.
In kvmppc_vcore_blocked, we don't actually need to check whether
all vcpus are ceded and don't have any pending exceptions, since the
caller has already done that. The caller (kvmppc_run_vcpu) wasn't
actually checking for pending exceptions, so we add that.
The change of if to while in kvmppc_run_vcpu is to make sure that we
never call kvmppc_remove_runnable() when the vcore state is RUNNING or
EXITING.
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
v2: move RESUME_PAGE_FAULT defn to book3s_hv.c
arch/powerpc/include/asm/kvm_asm.h | 1 +
arch/powerpc/kvm/book3s_hv.c | 67 ++++++++++++++++++------------------
2 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 76fdcfe..aabcdba 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -118,6 +118,7 @@
#define RESUME_FLAG_NV (1<<0) /* Reload guest nonvolatile state? */
#define RESUME_FLAG_HOST (1<<1) /* Resume host? */
+#define RESUME_FLAG_ARCH1 (1<<2)
#define RESUME_GUEST 0
#define RESUME_GUEST_NV RESUME_FLAG_NV
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 77dec0f..3a737a4 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -57,6 +57,9 @@
/* #define EXIT_DEBUG_SIMPLE */
/* #define EXIT_DEBUG_INT */
+/* Used to indicate that a guest page fault needs to be handled */
+#define RESUME_PAGE_FAULT (RESUME_GUEST | RESUME_FLAG_ARCH1)
+
static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
@@ -431,7 +434,6 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
struct task_struct *tsk)
{
int r = RESUME_HOST;
- int srcu_idx;
vcpu->stat.sum_exits++;
@@ -491,16 +493,12 @@ static int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
* have been handled already.
*/
case BOOK3S_INTERRUPT_H_DATA_STORAGE:
- srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = kvmppc_book3s_hv_page_fault(run, vcpu,
- vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
- srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ r = RESUME_PAGE_FAULT;
break;
case BOOK3S_INTERRUPT_H_INST_STORAGE:
- srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
- r = kvmppc_book3s_hv_page_fault(run, vcpu,
- kvmppc_get_pc(vcpu), 0);
- srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
+ vcpu->arch.fault_dar = kvmppc_get_pc(vcpu);
+ vcpu->arch.fault_dsisr = 0;
+ r = RESUME_PAGE_FAULT;
break;
/*
* This occurs if the guest executes an illegal instruction.
@@ -984,22 +982,24 @@ static int on_primary_thread(void)
* Run a set of guest threads on a physical core.
* Called with vc->lock held.
*/
-static int kvmppc_run_core(struct kvmppc_vcore *vc)
+static void kvmppc_run_core(struct kvmppc_vcore *vc)
{
struct kvm_vcpu *vcpu, *vcpu0, *vnext;
long ret;
u64 now;
int ptid, i, need_vpa_update;
int srcu_idx;
+ struct kvm_vcpu *vcpus_to_update[threads_per_core];
/* don't start if any threads have a signal pending */
need_vpa_update = 0;
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
if (signal_pending(vcpu->arch.run_task))
- return 0;
- need_vpa_update |= vcpu->arch.vpa.update_pending |
- vcpu->arch.slb_shadow.update_pending |
- vcpu->arch.dtl.update_pending;
+ return;
+ if (vcpu->arch.vpa.update_pending ||
+ vcpu->arch.slb_shadow.update_pending ||
+ vcpu->arch.dtl.update_pending)
+ vcpus_to_update[need_vpa_update++] = vcpu;
}
/*
@@ -1019,8 +1019,8 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
*/
if (need_vpa_update) {
spin_unlock(&vc->lock);
- list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
- kvmppc_update_vpas(vcpu);
+ for (i = 0; i < need_vpa_update; ++i)
+ kvmppc_update_vpas(vcpus_to_update[i]);
spin_lock(&vc->lock);
}
@@ -1037,8 +1037,10 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
vcpu->arch.ptid = ptid++;
}
}
- if (!vcpu0)
- return 0; /* nothing to run */
+ if (!vcpu0) {
+ vc->vcore_state = VCORE_INACTIVE;
+ return; /* nothing to run; should never happen */
+ }
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
if (vcpu->arch.ceded)
vcpu->arch.ptid = ptid++;
@@ -1091,6 +1093,7 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
preempt_enable();
kvm_resched(vcpu);
+ spin_lock(&vc->lock);
now = get_tb();
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
/* cancel pending dec exception if dec is positive */
@@ -1114,7 +1117,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
}
}
- spin_lock(&vc->lock);
out:
vc->vcore_state = VCORE_INACTIVE;
vc->preempt_tb = mftb();
@@ -1125,8 +1127,6 @@ static int kvmppc_run_core(struct kvmppc_vcore *vc)
wake_up(&vcpu->arch.cpu_run);
}
}
-
- return 1;
}
/*
@@ -1150,20 +1150,11 @@ static void kvmppc_wait_for_exec(struct kvm_vcpu *vcpu, int wait_state)
static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc)
{
DEFINE_WAIT(wait);
- struct kvm_vcpu *v;
- int all_idle = 1;
prepare_to_wait(&vc->wq, &wait, TASK_INTERRUPTIBLE);
vc->vcore_state = VCORE_SLEEPING;
spin_unlock(&vc->lock);
- list_for_each_entry(v, &vc->runnable_threads, arch.run_list) {
- if (!v->arch.ceded || v->arch.pending_exceptions) {
- all_idle = 0;
- break;
- }
- }
- if (all_idle)
- schedule();
+ schedule();
finish_wait(&vc->wq, &wait);
spin_lock(&vc->lock);
vc->vcore_state = VCORE_INACTIVE;
@@ -1219,7 +1210,8 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
vc->runner = vcpu;
n_ceded = 0;
list_for_each_entry(v, &vc->runnable_threads, arch.run_list)
- n_ceded += v->arch.ceded;
+ if (!v->arch.pending_exceptions)
+ n_ceded += v->arch.ceded;
if (n_ceded == vc->n_runnable)
kvmppc_vcore_blocked(vc);
else
@@ -1240,8 +1232,9 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
}
if (signal_pending(current)) {
- if (vc->vcore_state == VCORE_RUNNING ||
- vc->vcore_state == VCORE_EXITING) {
+ while (vcpu->arch.state == KVMPPC_VCPU_RUNNABLE &&
+ (vc->vcore_state == VCORE_RUNNING ||
+ vc->vcore_state == VCORE_EXITING)) {
spin_unlock(&vc->lock);
kvmppc_wait_for_exec(vcpu, TASK_UNINTERRUPTIBLE);
spin_lock(&vc->lock);
@@ -1261,6 +1254,7 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
{
int r;
+ int srcu_idx;
if (!vcpu->arch.sane) {
run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
@@ -1299,6 +1293,11 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
!(vcpu->arch.shregs.msr & MSR_PR)) {
r = kvmppc_pseries_do_hcall(vcpu);
kvmppc_core_prepare_to_enter(vcpu);
+ } else if (r == RESUME_PAGE_FAULT) {
+ srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
+ r = kvmppc_book3s_hv_page_fault(run, vcpu,
+ vcpu->arch.fault_dar, vcpu->arch.fault_dsisr);
+ srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx);
}
} while (r == RESUME_GUEST);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v2 09/10] KVM: PPC: Book3S HV: Fix accounting of stolen time
2012-09-21 5:38 ` [PATCH 09/10] KVM: PPC: Book3S HV: Fix accounting of stolen time Paul Mackerras
@ 2012-09-27 6:05 ` Paul Mackerras
0 siblings, 0 replies; 20+ messages in thread
From: Paul Mackerras @ 2012-09-27 6:05 UTC (permalink / raw)
To: Alexander Graf; +Cc: kvm, kvm-ppc
Currently the code that accounts stolen time tends to overestimate the
stolen time, and will sometimes report more stolen time in a DTL
(dispatch trace log) entry than has elapsed since the last DTL entry.
This can cause guests to underflow the user or system time measured
for some tasks, leading to ridiculous CPU percentages and total runtimes
being reported by top and other utilities.
In addition, the current code was designed for the previous policy where
a vcore would only run when all the vcpus in it were runnable, and so
only counted stolen time on a per-vcore basis. Now that a vcore can
run while some of the vcpus in it are doing other things in the kernel
(e.g. handling a page fault), we also need to count as stolen the time
when a vcpu task is preempted while it is not running as part of a vcore.
To do this, we bring back the BUSY_IN_HOST vcpu state and extend the
vcpu_load/put functions to count preemption time while the vcpu is
in that state. Handling the transitions between the RUNNING and
BUSY_IN_HOST states requires checking and updating two variables
(accumulated time stolen and time last preempted), so we add a new
spinlock, vcpu->arch.tbacct_lock. This protects both the per-vcpu
stolen/preempt-time variables, and the per-vcore variables while this
vcpu is running the vcore.
Finally, we now don't count time spent in userspace as stolen time.
The task could be executing in userspace on behalf of the vcpu, or
it could be preempted, or the vcpu could be genuinely stopped. Since
we have no way of dividing up the time between these cases, we don't
count any of it as stolen.
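Put compactly, when a DTL entry is written the accounting described above
combines two buckets; a condensed sketch (not a literal excerpt -- the real
code is in the diff below):
	u64 now = mftb();
	/* time stolen from the vcore while this vcpu was RUNNABLE ... */
	u64 core_stolen = vcore_stolen_time(vc, now);
	u64 stolen = core_stolen - vcpu->arch.stolen_logged;
	vcpu->arch.stolen_logged = core_stolen;
	/* ... plus time lost while BUSY_IN_HOST (e.g. servicing a page
	 * fault), accumulated by vcpu_load/put under tbacct_lock */
	spin_lock(&vcpu->arch.tbacct_lock);
	stolen += vcpu->arch.busy_stolen;
	vcpu->arch.busy_stolen = 0;
	spin_unlock(&vcpu->arch.tbacct_lock);
	dt->enqueue_to_dispatch_time = stolen;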
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
v2: just rediffed for context changes, no real change from previous version.
arch/powerpc/include/asm/kvm_host.h | 5 ++
arch/powerpc/kvm/book3s_hv.c | 127 ++++++++++++++++++++++++++++++-----
2 files changed, 117 insertions(+), 15 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 1e8cbd1..3093896 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -559,12 +559,17 @@ struct kvm_vcpu_arch {
unsigned long dtl_index;
u64 stolen_logged;
struct kvmppc_vpa slb_shadow;
+
+ spinlock_t tbacct_lock;
+ u64 busy_stolen;
+ u64 busy_preempt;
#endif
};
/* Values for vcpu->arch.state */
#define KVMPPC_VCPU_NOTREADY 0
#define KVMPPC_VCPU_RUNNABLE 1
+#define KVMPPC_VCPU_BUSY_IN_HOST 2
/* Values for vcpu->arch.io_gpr */
#define KVM_MMIO_REG_MASK 0x001f
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 61d2934..8b3c470 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -60,23 +60,74 @@
/* Used to indicate that a guest page fault needs to be handled */
#define RESUME_PAGE_FAULT (RESUME_GUEST | RESUME_FLAG_ARCH1)
+/* Used as a "null" value for timebase values */
+#define TB_NIL (~(u64)0)
+
static void kvmppc_end_cede(struct kvm_vcpu *vcpu);
static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu);
+/*
+ * We use the vcpu_load/put functions to measure stolen time.
+ * Stolen time is counted as time when either the vcpu is able to
+ * run as part of a virtual core, but the task running the vcore
+ * is preempted or sleeping, or when the vcpu needs something done
+ * in the kernel by the task running the vcpu, but that task is
+ * preempted or sleeping. Those two things have to be counted
+ * separately, since one of the vcpu tasks will take on the job
+ * of running the core, and the other vcpu tasks in the vcore will
+ * sleep waiting for it to do that, but that sleep shouldn't count
+ * as stolen time.
+ *
+ * Hence we accumulate stolen time when the vcpu can run as part of
+ * a vcore using vc->stolen_tb, and the stolen time when the vcpu
+ * needs its task to do other things in the kernel (for example,
+ * service a page fault) in busy_stolen. We don't accumulate
+ * stolen time for a vcore when it is inactive, or for a vcpu
+ * when it is in state RUNNING or NOTREADY. NOTREADY is a bit of
+ * a misnomer; it means that the vcpu task is not executing in
+ * the KVM_VCPU_RUN ioctl, i.e. it is in userspace or elsewhere in
+ * the kernel. We don't have any way of dividing up that time
+ * between time that the vcpu is genuinely stopped, time that
+ * the task is actively working on behalf of the vcpu, and time
+ * that the task is preempted, so we don't count any of it as
+ * stolen.
+ *
+ * Updates to busy_stolen are protected by arch.tbacct_lock;
+ * updates to vc->stolen_tb are protected by the arch.tbacct_lock
+ * of the vcpu that has taken responsibility for running the vcore
+ * (i.e. vc->runner). The stolen times are measured in units of
+ * timebase ticks. (Note that the != TB_NIL checks below are
+ * purely defensive; they should never fail.)
+ */
+
void kvmppc_core_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
- if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
+ spin_lock(&vcpu->arch.tbacct_lock);
+ if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE &&
+ vc->preempt_tb != TB_NIL) {
vc->stolen_tb += mftb() - vc->preempt_tb;
+ vc->preempt_tb = TB_NIL;
+ }
+ if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST &&
+ vcpu->arch.busy_preempt != TB_NIL) {
+ vcpu->arch.busy_stolen += mftb() - vcpu->arch.busy_preempt;
+ vcpu->arch.busy_preempt = TB_NIL;
+ }
+ spin_unlock(&vcpu->arch.tbacct_lock);
}
void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
{
struct kvmppc_vcore *vc = vcpu->arch.vcore;
+ spin_lock(&vcpu->arch.tbacct_lock);
if (vc->runner == vcpu && vc->vcore_state != VCORE_INACTIVE)
vc->preempt_tb = mftb();
+ if (vcpu->arch.state == KVMPPC_VCPU_BUSY_IN_HOST)
+ vcpu->arch.busy_preempt = mftb();
+ spin_unlock(&vcpu->arch.tbacct_lock);
}
void kvmppc_set_msr(struct kvm_vcpu *vcpu, u64 msr)
@@ -357,24 +408,61 @@ static void kvmppc_update_vpas(struct kvm_vcpu *vcpu)
spin_unlock(&vcpu->arch.vpa_update_lock);
}
+/*
+ * Return the accumulated stolen time for the vcore up until `now'.
+ * The caller should hold the vcore lock.
+ */
+static u64 vcore_stolen_time(struct kvmppc_vcore *vc, u64 now)
+{
+ u64 p;
+
+ /*
+ * If we are the task running the vcore, then since we hold
+ * the vcore lock, we can't be preempted, so stolen_tb/preempt_tb
+ * can't be updated, so we don't need the tbacct_lock.
+ * If the vcore is inactive, it can't become active (since we
+ * hold the vcore lock), so the vcpu load/put functions won't
+ * update stolen_tb/preempt_tb, and we don't need tbacct_lock.
+ */
+ if (vc->vcore_state != VCORE_INACTIVE &&
+ vc->runner->arch.run_task != current) {
+ spin_lock(&vc->runner->arch.tbacct_lock);
+ p = vc->stolen_tb;
+ if (vc->preempt_tb != TB_NIL)
+ p += now - vc->preempt_tb;
+ spin_unlock(&vc->runner->arch.tbacct_lock);
+ } else {
+ p = vc->stolen_tb;
+ }
+ return p;
+}
+
static void kvmppc_create_dtl_entry(struct kvm_vcpu *vcpu,
struct kvmppc_vcore *vc)
{
struct dtl_entry *dt;
struct lppaca *vpa;
- unsigned long old_stolen;
+ unsigned long stolen;
+ unsigned long core_stolen;
+ u64 now;
dt = vcpu->arch.dtl_ptr;
vpa = vcpu->arch.vpa.pinned_addr;
- old_stolen = vcpu->arch.stolen_logged;
- vcpu->arch.stolen_logged = vc->stolen_tb;
+ now = mftb();
+ core_stolen = vcore_stolen_time(vc, now);
+ stolen = core_stolen - vcpu->arch.stolen_logged;
+ vcpu->arch.stolen_logged = core_stolen;
+ spin_lock(&vcpu->arch.tbacct_lock);
+ stolen += vcpu->arch.busy_stolen;
+ vcpu->arch.busy_stolen = 0;
+ spin_unlock(&vcpu->arch.tbacct_lock);
if (!dt || !vpa)
return;
memset(dt, 0, sizeof(struct dtl_entry));
dt->dispatch_reason = 7;
dt->processor_id = vc->pcpu + vcpu->arch.ptid;
- dt->timebase = mftb();
- dt->enqueue_to_dispatch_time = vc->stolen_tb - old_stolen;
+ dt->timebase = now;
+ dt->enqueue_to_dispatch_time = stolen;
dt->srr0 = kvmppc_get_pc(vcpu);
dt->srr1 = vcpu->arch.shregs.msr;
++dt;
@@ -773,6 +861,8 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
vcpu->arch.pvr = mfspr(SPRN_PVR);
kvmppc_set_pvr(vcpu, vcpu->arch.pvr);
spin_lock_init(&vcpu->arch.vpa_update_lock);
+ spin_lock_init(&vcpu->arch.tbacct_lock);
+ vcpu->arch.busy_preempt = TB_NIL;
kvmppc_mmu_book3s_hv_init(vcpu);
@@ -788,7 +878,7 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
INIT_LIST_HEAD(&vcore->runnable_threads);
spin_lock_init(&vcore->lock);
init_waitqueue_head(&vcore->wq);
- vcore->preempt_tb = mftb();
+ vcore->preempt_tb = TB_NIL;
}
kvm->arch.vcores[core] = vcore;
}
@@ -801,7 +891,6 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
++vcore->num_threads;
spin_unlock(&vcore->lock);
vcpu->arch.vcore = vcore;
- vcpu->arch.stolen_logged = vcore->stolen_tb;
vcpu->arch.cpu_type = KVM_CPU_3S_64;
kvmppc_sanity_check(vcpu);
@@ -861,9 +950,17 @@ extern void xics_wake_cpu(int cpu);
static void kvmppc_remove_runnable(struct kvmppc_vcore *vc,
struct kvm_vcpu *vcpu)
{
+ u64 now;
+
if (vcpu->arch.state != KVMPPC_VCPU_RUNNABLE)
return;
- vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
+ spin_lock(&vcpu->arch.tbacct_lock);
+ now = mftb();
+ vcpu->arch.busy_stolen += vcore_stolen_time(vc, now) -
+ vcpu->arch.stolen_logged;
+ vcpu->arch.busy_preempt = now;
+ vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
+ spin_unlock(&vcpu->arch.tbacct_lock);
--vc->n_runnable;
list_del(&vcpu->arch.run_list);
}
@@ -1038,10 +1135,8 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
vcpu->arch.ptid = ptid++;
}
}
- if (!vcpu0) {
- vc->vcore_state = VCORE_INACTIVE;
- return; /* nothing to run; should never happen */
- }
+ if (!vcpu0)
+ goto out; /* nothing to run; should never happen */
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list)
if (vcpu->arch.ceded)
vcpu->arch.ptid = ptid++;
@@ -1056,7 +1151,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
goto out;
}
- vc->stolen_tb += mftb() - vc->preempt_tb;
vc->pcpu = smp_processor_id();
list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) {
kvmppc_start_thread(vcpu);
@@ -1121,7 +1215,6 @@ static void kvmppc_run_core(struct kvmppc_vcore *vc)
out:
vc->vcore_state = VCORE_INACTIVE;
- vc->preempt_tb = mftb();
list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads,
arch.run_list) {
if (vcpu->arch.ret != RESUME_GUEST) {
@@ -1181,7 +1274,9 @@ static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
vcpu->arch.ceded = 0;
vcpu->arch.run_task = current;
vcpu->arch.kvm_run = kvm_run;
+ vcpu->arch.stolen_logged = vcore_stolen_time(vc, mftb());
vcpu->arch.state = KVMPPC_VCPU_RUNNABLE;
+ vcpu->arch.busy_preempt = TB_NIL;
list_add_tail(&vcpu->arch.run_list, &vc->runnable_threads);
++vc->n_runnable;
@@ -1295,6 +1390,7 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
flush_vsx_to_thread(current);
vcpu->arch.wqp = &vcpu->arch.vcore->wq;
vcpu->arch.pgdir = current->mm->pgd;
+ vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST;
do {
r = kvmppc_run_vcpu(run, vcpu);
@@ -1312,6 +1408,7 @@ int kvmppc_vcpu_run(struct kvm_run *run, struct kvm_vcpu *vcpu)
} while (r == RESUME_GUEST);
out:
+ vcpu->arch.state = KVMPPC_VCPU_NOTREADY;
atomic_dec(&vcpu->kvm->arch.vcpus_running);
return r;
}
--
1.7.10.4
^ permalink raw reply related [flat|nested] 20+ messages in thread
Thread overview: 20+ messages
2012-09-21 5:16 [PATCH 0/10] HV KVM fixes, reposted Paul Mackerras
2012-09-21 5:33 ` [PATCH 01/10] KVM: PPC: Book3S HV: Provide a way for userspace to get/set per-vCPU areas Paul Mackerras
2012-09-24 12:23 ` Alexander Graf
2012-09-21 5:35 ` [PATCH 02/10] KVM: PPC: Book3S HV: Allow KVM guests to stop secondary threads coming online Paul Mackerras
2012-09-24 12:26 ` Alexander Graf
2012-09-27 1:01 ` Benjamin Herrenschmidt
2012-09-21 5:35 ` [PATCH 03/10] KVM: PPC: Book3S HV: Fix updates of vcpu->cpu Paul Mackerras
2012-09-24 12:52 ` Alexander Graf
2012-09-21 5:36 ` [PATCH 04/10] KVM: PPC: Book3S HV: Remove bogus update of physical thread IDs Paul Mackerras
2012-09-24 12:52 ` Alexander Graf
2012-09-21 5:36 ` [PATCH 05/10] KVM: PPC: Book3S HV: Fix some races in starting secondary threads Paul Mackerras
2012-09-21 5:37 ` [PATCH 06/10] KVM: PPC: Book3s HV: Don't access runnable threads list without vcore lock Paul Mackerras
2012-09-24 12:48 ` Alexander Graf
2012-09-27 6:00 ` [PATCH v2 06/10] KVM: PPC: Book3S " Paul Mackerras
2012-09-21 5:37 ` [PATCH 07/10] KVM: PPC: Book3S HV: Fixes for late-joining threads Paul Mackerras
2012-09-21 5:38 ` [PATCH 08/10] KVM: PPC: Book3S HV: Run virtual core whenever any vcpus in it can run Paul Mackerras
2012-09-21 5:38 ` [PATCH 09/10] KVM: PPC: Book3S HV: Fix accounting of stolen time Paul Mackerras
2012-09-27 6:05 ` [PATCH v2 " Paul Mackerras
2012-09-21 5:39 ` [PATCH 10/10] KVM: PPC: Book3S HV: Fix calculation of guest phys address for MMIO emulation Paul Mackerras
2012-09-24 12:52 ` Alexander Graf