public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Bug fixes for HV KVM, v2
@ 2014-05-26  9:48 Paul Mackerras
  2014-05-26  9:48 ` [PATCH 1/8] KVM: PPC: Book3S: Add ONE_REG register names that were missed Paul Mackerras
                   ` (8 more replies)
  0 siblings, 9 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

This series of patches fixes a few bugs that have been found in
testing HV KVM recently.  It also adds workarounds for a couple of
POWER8 PMU bugs, fixes the definition of KVM_REG_PPC_WORT, and adds
some things that were missing from Documentation/virtual/kvm/api.txt.

The patches are against Alex Graf's kvm-ppc-queue branch.

Please apply for 3.16.  The first couple would be safe to go into 3.15
as well, and probably should.

Thanks,
Paul.

 Documentation/virtual/kvm/api.txt       |  6 +++
 arch/powerpc/include/asm/reg.h          | 12 +++--
 arch/powerpc/include/uapi/asm/kvm.h     |  2 +-
 arch/powerpc/kvm/book3s_64_mmu_hv.c     | 93 ++++++++++++++++++++++++++-------
 arch/powerpc/kvm/book3s_hv_rm_mmu.c     |  3 +-
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 60 ++++++++++++++++++++-
 6 files changed, 147 insertions(+), 29 deletions(-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/8] KVM: PPC: Book3S: Add ONE_REG register names that were missed
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-26  9:48 ` [PATCH 2/8] KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number Paul Mackerras
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

Commit 3b7834743f9 ("KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg") added definitions for several KVM_REG_PPC_* symbols
but missed adding some to api.txt.  This adds them.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 Documentation/virtual/kvm/api.txt | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 2014ff1..d4c1188 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1794,6 +1794,11 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_MMCR0     | 64
   PPC   | KVM_REG_PPC_MMCR1     | 64
   PPC   | KVM_REG_PPC_MMCRA     | 64
+  PPC   | KVM_REG_PPC_MMCR2     | 64
+  PPC   | KVM_REG_PPC_MMCRS     | 64
+  PPC   | KVM_REG_PPC_SIAR      | 64
+  PPC   | KVM_REG_PPC_SDAR      | 64
+  PPC   | KVM_REG_PPC_SIER      | 64
   PPC   | KVM_REG_PPC_PMC1      | 32
   PPC   | KVM_REG_PPC_PMC2      | 32
   PPC   | KVM_REG_PPC_PMC3      | 32
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/8] KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
  2014-05-26  9:48 ` [PATCH 1/8] KVM: PPC: Book3S: Add ONE_REG register names that were missed Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-26  9:48 ` [PATCH 3/8] KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() Paul Mackerras
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

Commit b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8
SPRs") added a definition of KVM_REG_PPC_WORT with the same register
number as the existing KVM_REG_PPC_VRSAVE (though in fact the
definitions are not identical because of the different register sizes.)

For clarity, this moves KVM_REG_PPC_WORT to the next unused number,
and also adds it to api.txt.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 Documentation/virtual/kvm/api.txt   | 1 +
 arch/powerpc/include/uapi/asm/kvm.h | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index d4c1188..6b0225d 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1873,6 +1873,7 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_PPR	| 64
   PPC   | KVM_REG_PPC_ARCH_COMPAT 32
   PPC   | KVM_REG_PPC_DABRX     | 32
+  PPC   | KVM_REG_PPC_WORT      | 64
   PPC   | KVM_REG_PPC_TM_GPR0	| 64
           ...
   PPC   | KVM_REG_PPC_TM_GPR31	| 64
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index a6665be..2bc4a94 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -545,7 +545,6 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_TCSCR	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb1)
 #define KVM_REG_PPC_PID		(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb2)
 #define KVM_REG_PPC_ACOP	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb3)
-#define KVM_REG_PPC_WORT	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb4)
 
 #define KVM_REG_PPC_VRSAVE	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb4)
 #define KVM_REG_PPC_LPCR	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb5)
@@ -555,6 +554,7 @@ struct kvm_get_htab_header {
 #define KVM_REG_PPC_ARCH_COMPAT	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb7)
 
 #define KVM_REG_PPC_DABRX	(KVM_REG_PPC | KVM_REG_SIZE_U32 | 0xb8)
+#define KVM_REG_PPC_WORT	(KVM_REG_PPC | KVM_REG_SIZE_U64 | 0xb9)
 
 /* Transactional Memory checkpointed state:
  * This is all GPRs, all VSX regs and a subset of SPRs
-- 
2.0.0.rc2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/8] KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates()
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
  2014-05-26  9:48 ` [PATCH 1/8] KVM: PPC: Book3S: Add ONE_REG register names that were missed Paul Mackerras
  2014-05-26  9:48 ` [PATCH 2/8] KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-26  9:48 ` [PATCH 4/8] KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address Paul Mackerras
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

The global_invalidates() function contains a check that is intended
to tell whether we are currently executing in the context of a hypercall
issued by the guest.  The reason is that the optimization of using a
local TLB invalidate instruction is only valid in that context.  The
check was testing local_paca->kvm_hstate.kvm_vcore, which gets set
when entering the guest but no longer gets cleared when exiting the
guest.  To fix this, we use the kvm_vcpu field instead, which does
get cleared when exiting the guest, by the kvmppc_release_hwthread()
calls inside kvmppc_run_core().

The effect of having the check wrong was that when kvmppc_do_h_remove()
got called from htab_write() on the destination machine during a
migration, it cleared the current cpu's bit in kvm->arch.need_tlb_flush.
This meant that when the guest started running in the destination VM,
it may miss out on doing a complete TLB flush, and therefore may end
up using stale TLB entries from a previous guest that used the same
LPID value.

This should make migration more reliable.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rm_mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 1d6c56a..ac840c6 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -42,13 +42,14 @@ static int global_invalidates(struct kvm *kvm, unsigned long flags)
 
 	/*
 	 * If there is only one vcore, and it's currently running,
+	 * as indicated by local_paca->kvm_hstate.kvm_vcpu being set,
 	 * we can use tlbiel as long as we mark all other physical
 	 * cores as potentially having stale TLB entries for this lpid.
 	 * If we're not using MMU notifiers, we never take pages away
 	 * from the guest, so we can use tlbiel if requested.
 	 * Otherwise, don't use tlbiel.
 	 */
-	if (kvm->arch.online_vcores == 1 && local_paca->kvm_hstate.kvm_vcore)
+	if (kvm->arch.online_vcores == 1 && local_paca->kvm_hstate.kvm_vcpu)
 		global = 0;
 	else if (kvm->arch.using_mmu_notifiers)
 		global = 1;
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/8] KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
                   ` (2 preceding siblings ...)
  2014-05-26  9:48 ` [PATCH 3/8] KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-26  9:48 ` [PATCH 5/8] KVM: PPC: Book3S HV: Fix dirty map for hugepages Paul Mackerras
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

Currently, when a huge page is faulted in for a guest, we select the
rmap chain to insert the HPTE into based on the guest physical address
that the guest tried to access.  Since there is an rmap chain for each
system page, there are many rmap chains for the area covered by a huge
page (e.g. 256 for 16MB pages when PAGE_SIZE = 64kB), and the huge-page
HPTE could end up in any one of them.

For consistency, and to make the huge-page HPTEs easier to find, we now
put huge-page HPTEs in the rmap chain corresponding to the base address
of the huge page.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index f32896f..4e22ecb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -585,6 +585,7 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	struct kvm *kvm = vcpu->kvm;
 	unsigned long *hptep, hpte[3], r;
 	unsigned long mmu_seq, psize, pte_size;
+	unsigned long gpa_base, gfn_base;
 	unsigned long gpa, gfn, hva, pfn;
 	struct kvm_memory_slot *memslot;
 	unsigned long *rmap;
@@ -623,7 +624,9 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 
 	/* Translate the logical address and get the page */
 	psize = hpte_page_size(hpte[0], r);
-	gpa = (r & HPTE_R_RPN & ~(psize - 1)) | (ea & (psize - 1));
+	gpa_base = r & HPTE_R_RPN & ~(psize - 1);
+	gfn_base = gpa_base >> PAGE_SHIFT;
+	gpa = gpa_base | (ea & (psize - 1));
 	gfn = gpa >> PAGE_SHIFT;
 	memslot = gfn_to_memslot(kvm, gfn);
 
@@ -635,6 +638,13 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	if (!kvm->arch.using_mmu_notifiers)
 		return -EFAULT;		/* should never get here */
 
+	/*
+	 * This should never happen, because of the slot_is_aligned()
+	 * check in kvmppc_do_h_enter().
+	 */
+	if (gfn_base < memslot->base_gfn)
+		return -EFAULT;
+
 	/* used to check for invalidations in progress */
 	mmu_seq = kvm->mmu_notifier_seq;
 	smp_rmb();
@@ -727,7 +737,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		goto out_unlock;
 	hpte[0] = (hpte[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
 
-	rmap = &memslot->arch.rmap[gfn - memslot->base_gfn];
+	/* Always put the HPTE in the rmap chain for the page base address */
+	rmap = &memslot->arch.rmap[gfn_base - memslot->base_gfn];
 	lock_rmap(rmap);
 
 	/* Check if we might have been invalidated; let the guest retry if so */
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 5/8] KVM: PPC: Book3S HV: Fix dirty map for hugepages
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
                   ` (3 preceding siblings ...)
  2014-05-26  9:48 ` [PATCH 4/8] KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-26  9:48 ` [PATCH 6/8] KVM: PPC: Book3S HV: Make sure we don't miss dirty pages Paul Mackerras
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

From: Alexey Kardashevskiy <aik@ozlabs.ru>

The dirty map that we construct for the KVM_GET_DIRTY_LOG ioctl has
one bit per system page (4K/64K).  Currently, we only set one bit in
the map for each HPT entry with the Change bit set, even if the HPT is
for a large page (e.g., 16MB).  Userspace then considers only the
first system page dirty, though in fact the guest may have modified
anywhere in the large page.

To fix this, we make kvm_test_clear_dirty() return the actual number
of pages that are dirty (and rename it to kvm_test_clear_dirty_npages()
to emphasize that that's what it returns).  In kvmppc_hv_get_dirty_log()
we then set that many bits in the dirty map.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 4e22ecb..96c9044 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1060,22 +1060,27 @@ void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte)
 	kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
 }
 
-static int kvm_test_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
+/*
+ * Returns the number of system pages that are dirty.
+ * This can be more than 1 if we find a huge-page HPTE.
+ */
+static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 {
 	struct revmap_entry *rev = kvm->arch.revmap;
 	unsigned long head, i, j;
+	unsigned long n;
 	unsigned long *hptep;
-	int ret = 0;
+	int npages_dirty = 0;
 
  retry:
 	lock_rmap(rmapp);
 	if (*rmapp & KVMPPC_RMAP_CHANGED) {
 		*rmapp &= ~KVMPPC_RMAP_CHANGED;
-		ret = 1;
+		npages_dirty = 1;
 	}
 	if (!(*rmapp & KVMPPC_RMAP_PRESENT)) {
 		unlock_rmap(rmapp);
-		return ret;
+		return npages_dirty;
 	}
 
 	i = head = *rmapp & KVMPPC_RMAP_INDEX;
@@ -1106,13 +1111,16 @@ static int kvm_test_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
 				rev[i].guest_rpte |= HPTE_R_C;
 				note_hpte_modification(kvm, &rev[i]);
 			}
-			ret = 1;
+			n = hpte_page_size(hptep[0], hptep[1]);
+			n = (n + PAGE_SIZE - 1) >> PAGE_SHIFT;
+			if (n > npages_dirty)
+				npages_dirty = n;
 		}
 		hptep[0] &= ~HPTE_V_HVLOCK;
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
-	return ret;
+	return npages_dirty;
 }
 
 static void harvest_vpa_dirty(struct kvmppc_vpa *vpa,
@@ -1136,15 +1144,22 @@ static void harvest_vpa_dirty(struct kvmppc_vpa *vpa,
 long kvmppc_hv_get_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot,
 			     unsigned long *map)
 {
-	unsigned long i;
+	unsigned long i, j;
 	unsigned long *rmapp;
 	struct kvm_vcpu *vcpu;
 
 	preempt_disable();
 	rmapp = memslot->arch.rmap;
 	for (i = 0; i < memslot->npages; ++i) {
-		if (kvm_test_clear_dirty(kvm, rmapp) && map)
-			__set_bit_le(i, map);
+		int npages = kvm_test_clear_dirty_npages(kvm, rmapp);
+		/*
+		 * Note that if npages > 0 then i must be a multiple of npages,
+		 * since we always put huge-page HPTEs in the rmap chain
+		 * corresponding to their page base address.
+		 */
+		if (npages && map)
+			for (j = i; npages; ++j, --npages)
+				__set_bit_le(j, map);
 		++rmapp;
 	}
 
-- 
2.0.0.rc2


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 6/8] KVM: PPC: Book3S HV: Make sure we don't miss dirty pages
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
                   ` (4 preceding siblings ...)
  2014-05-26  9:48 ` [PATCH 5/8] KVM: PPC: Book3S HV: Fix dirty map for hugepages Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-26  9:48 ` [PATCH 7/8] KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs Paul Mackerras
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

Current, when testing whether a page is dirty (when constructing the
bitmap for the KVM_GET_DIRTY_LOG ioctl), we test the C (changed) bit
in the HPT entries mapping the page, and if it is 0, we consider the
page to be clean.  However, the Power ISA doesn't require processors
to set the C bit to 1 immediately when writing to a page, and in fact
allows them to delay the writeback of the C bit until they receive a
TLB invalidation for the page.  Thus it is possible that the page
could be dirty and we miss it.

Now, if there are vcpus running, this is not serious since the
collection of the dirty log is racy already - some vcpu could dirty
the page just after we check it.  But if there are no vcpus running we
should return definitive results, in case we are in the final phase of
migrating the guest.

Also, if the permission bits in the HPTE don't allow writing, then we
know that no CPU can set C.  If the HPTE was previously writable and
the page was modified, any C bit writeback would have been flushed out
by the tlbie that we did when changing the HPTE to read-only.

Otherwise we need to do a TLB invalidation even if the C bit is 0, and
then check the C bit.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_64_mmu_hv.c | 47 +++++++++++++++++++++++++++++--------
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 96c9044..8056107 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -1060,6 +1060,11 @@ void kvm_set_spte_hva_hv(struct kvm *kvm, unsigned long hva, pte_t pte)
 	kvm_handle_hva(kvm, hva, kvm_unmap_rmapp);
 }
 
+static int vcpus_running(struct kvm *kvm)
+{
+	return atomic_read(&kvm->arch.vcpus_running) != 0;
+}
+
 /*
  * Returns the number of system pages that are dirty.
  * This can be more than 1 if we find a huge-page HPTE.
@@ -1069,6 +1074,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 	struct revmap_entry *rev = kvm->arch.revmap;
 	unsigned long head, i, j;
 	unsigned long n;
+	unsigned long v, r;
 	unsigned long *hptep;
 	int npages_dirty = 0;
 
@@ -1088,7 +1094,22 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		hptep = (unsigned long *) (kvm->arch.hpt_virt + (i << 4));
 		j = rev[i].forw;
 
-		if (!(hptep[1] & HPTE_R_C))
+		/*
+		 * Checking the C (changed) bit here is racy since there
+		 * is no guarantee about when the hardware writes it back.
+		 * If the HPTE is not writable then it is stable since the
+		 * page can't be written to, and we would have done a tlbie
+		 * (which forces the hardware to complete any writeback)
+		 * when making the HPTE read-only.
+		 * If vcpus are running then this call is racy anyway
+		 * since the page could get dirtied subsequently, so we
+		 * expect there to be a further call which would pick up
+		 * any delayed C bit writeback.
+		 * Otherwise we need to do the tlbie even if C==0 in
+		 * order to pick up any delayed writeback of C.
+		 */
+		if (!(hptep[1] & HPTE_R_C) &&
+		    (!hpte_is_writable(hptep[1]) || vcpus_running(kvm)))
 			continue;
 
 		if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
@@ -1100,23 +1121,29 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
 		}
 
 		/* Now check and modify the HPTE */
-		if ((hptep[0] & HPTE_V_VALID) && (hptep[1] & HPTE_R_C)) {
-			/* need to make it temporarily absent to clear C */
-			hptep[0] |= HPTE_V_ABSENT;
-			kvmppc_invalidate_hpte(kvm, hptep, i);
-			hptep[1] &= ~HPTE_R_C;
-			eieio();
-			hptep[0] = (hptep[0] & ~HPTE_V_ABSENT) | HPTE_V_VALID;
+		if (!(hptep[0] & HPTE_V_VALID))
+			continue;
+
+		/* need to make it temporarily absent so C is stable */
+		hptep[0] |= HPTE_V_ABSENT;
+		kvmppc_invalidate_hpte(kvm, hptep, i);
+		v = hptep[0];
+		r = hptep[1];
+		if (r & HPTE_R_C) {
+			hptep[1] = r & ~HPTE_R_C;
 			if (!(rev[i].guest_rpte & HPTE_R_C)) {
 				rev[i].guest_rpte |= HPTE_R_C;
 				note_hpte_modification(kvm, &rev[i]);
 			}
-			n = hpte_page_size(hptep[0], hptep[1]);
+			n = hpte_page_size(v, r);
 			n = (n + PAGE_SIZE - 1) >> PAGE_SHIFT;
 			if (n > npages_dirty)
 				npages_dirty = n;
+			eieio();
 		}
-		hptep[0] &= ~HPTE_V_HVLOCK;
+		v &= ~(HPTE_V_ABSENT | HPTE_V_HVLOCK);
+		v |= HPTE_V_VALID;
+		hptep[0] = v;
 	} while ((i = j) != head);
 
 	unlock_rmap(rmapp);
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 7/8] KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
                   ` (5 preceding siblings ...)
  2014-05-26  9:48 ` [PATCH 6/8] KVM: PPC: Book3S HV: Make sure we don't miss dirty pages Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-26  9:48 ` [PATCH 8/8] KVM: PPC: Book3S HV: Fix machine check delivery to guest Paul Mackerras
  2014-05-28 12:14 ` [PATCH 0/8] Bug fixes for HV KVM, v2 Alexander Graf
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

This adds workarounds for two hardware bugs in the POWER8 performance
monitor unit (PMU), both related to interrupt generation.  The effect
of these bugs is that PMU interrupts can get lost, leading to tools
such as perf reporting fewer counts and samples than they should.

The first bug relates to the PMAO (perf. mon. alert occurred) bit in
MMCR0; setting it should cause an interrupt, but doesn't.  The other
bug relates to the PMAE (perf. mon. alert enable) bit in MMCR0.
Setting PMAE when a counter is negative and counter negative
conditions are enabled to cause alerts should cause an alert, but
doesn't.

The workaround for the first bug is to create conditions where a
counter will overflow, whenever we are about to restore a MMCR0
value that has PMAO set (and PMAO_SYNC clear).  The workaround for
the second bug is to freeze all counters using MMCR2 before reading
MMCR0.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/reg.h          | 12 ++++---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 59 +++++++++++++++++++++++++++++++--
 2 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
index e5d2e0b..4852bcf 100644
--- a/arch/powerpc/include/asm/reg.h
+++ b/arch/powerpc/include/asm/reg.h
@@ -670,18 +670,20 @@
 #define   MMCR0_PROBLEM_DISABLE MMCR0_FCP
 #define   MMCR0_FCM1	0x10000000UL /* freeze counters while MSR mark = 1 */
 #define   MMCR0_FCM0	0x08000000UL /* freeze counters while MSR mark = 0 */
-#define   MMCR0_PMXE	0x04000000UL /* performance monitor exception enable */
-#define   MMCR0_FCECE	0x02000000UL /* freeze ctrs on enabled cond or event */
+#define   MMCR0_PMXE	ASM_CONST(0x04000000) /* perf mon exception enable */
+#define   MMCR0_FCECE	ASM_CONST(0x02000000) /* freeze ctrs on enabled cond or event */
 #define   MMCR0_TBEE	0x00400000UL /* time base exception enable */
 #define   MMCR0_BHRBA	0x00200000UL /* BHRB Access allowed in userspace */
 #define   MMCR0_EBE	0x00100000UL /* Event based branch enable */
 #define   MMCR0_PMCC	0x000c0000UL /* PMC control */
 #define   MMCR0_PMCC_U6	0x00080000UL /* PMC1-6 are R/W by user (PR) */
 #define   MMCR0_PMC1CE	0x00008000UL /* PMC1 count enable*/
-#define   MMCR0_PMCjCE	0x00004000UL /* PMCj count enable*/
+#define   MMCR0_PMCjCE	ASM_CONST(0x00004000) /* PMCj count enable*/
 #define   MMCR0_TRIGGER	0x00002000UL /* TRIGGER enable */
-#define   MMCR0_PMAO_SYNC 0x00000800UL /* PMU interrupt is synchronous */
-#define   MMCR0_PMAO	0x00000080UL /* performance monitor alert has occurred, set to 0 after handling exception */
+#define   MMCR0_PMAO_SYNC ASM_CONST(0x00000800) /* PMU intr is synchronous */
+#define   MMCR0_C56RUN	ASM_CONST(0x00000100) /* PMC5/6 count when RUN=0 */
+/* performance monitor alert has occurred, set to 0 after handling exception */
+#define   MMCR0_PMAO	ASM_CONST(0x00000080)
 #define   MMCR0_SHRFC	0x00000040UL /* SHRre freeze conditions between threads */
 #define   MMCR0_FC56	0x00000010UL /* freeze counters 5 and 6 */
 #define   MMCR0_FCTI	0x00000008UL /* freeze counters in tags inactive mode */
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index ffbb871..60fe8ba 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -86,6 +86,12 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
 	lbz	r4, LPPACA_PMCINUSE(r3)
 	cmpwi	r4, 0
 	beq	23f			/* skip if not */
+BEGIN_FTR_SECTION
+	ld	r3, HSTATE_MMCR(r13)
+	andi.	r4, r3, MMCR0_PMAO_SYNC | MMCR0_PMAO
+	cmpwi	r4, MMCR0_PMAO
+	beql	kvmppc_fix_pmao
+END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
 	lwz	r3, HSTATE_PMC(r13)
 	lwz	r4, HSTATE_PMC + 4(r13)
 	lwz	r5, HSTATE_PMC + 8(r13)
@@ -726,6 +732,12 @@ skip_tm:
 	sldi	r3, r3, 31		/* MMCR0_FC (freeze counters) bit */
 	mtspr	SPRN_MMCR0, r3		/* freeze all counters, disable ints */
 	isync
+BEGIN_FTR_SECTION
+	ld	r3, VCPU_MMCR(r4)
+	andi.	r5, r3, MMCR0_PMAO_SYNC | MMCR0_PMAO
+	cmpwi	r5, MMCR0_PMAO
+	beql	kvmppc_fix_pmao
+END_FTR_SECTION_IFSET(CPU_FTR_PMAO_BUG)
 	lwz	r3, VCPU_PMC(r4)	/* always load up guest PMU registers */
 	lwz	r5, VCPU_PMC + 4(r4)	/* to prevent information leak */
 	lwz	r6, VCPU_PMC + 8(r4)
@@ -1324,6 +1336,30 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 25:
 	/* Save PMU registers if requested */
 	/* r8 and cr0.eq are live here */
+BEGIN_FTR_SECTION
+	/*
+	 * POWER8 seems to have a hardware bug where setting
+	 * MMCR0[PMAE] along with MMCR0[PMC1CE] and/or MMCR0[PMCjCE]
+	 * when some counters are already negative doesn't seem
+	 * to cause a performance monitor alert (and hence interrupt).
+	 * The effect of this is that when saving the PMU state,
+	 * if there is no PMU alert pending when we read MMCR0
+	 * before freezing the counters, but one becomes pending
+	 * before we read the counters, we lose it.
+	 * To work around this, we need a way to freeze the counters
+	 * before reading MMCR0.  Normally, freezing the counters
+	 * is done by writing MMCR0 (to set MMCR0[FC]) which
+	 * unavoidably writes MMCR0[PMA0] as well.  On POWER8,
+	 * we can also freeze the counters using MMCR2, by writing
+	 * 1s to all the counter freeze condition bits (there are
+	 * 9 bits each for 6 counters).
+	 */
+	li	r3, -1			/* set all freeze bits */
+	clrrdi	r3, r3, 10
+	mfspr	r10, SPRN_MMCR2
+	mtspr	SPRN_MMCR2, r3
+	isync
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	li	r3, 1
 	sldi	r3, r3, 31		/* MMCR0_FC (freeze counters) bit */
 	mfspr	r4, SPRN_MMCR0		/* save MMCR0 */
@@ -1347,6 +1383,9 @@ END_FTR_SECTION_IFSET(CPU_FTR_ARCH_206)
 	std	r4, VCPU_MMCR(r9)
 	std	r5, VCPU_MMCR + 8(r9)
 	std	r6, VCPU_MMCR + 16(r9)
+BEGIN_FTR_SECTION
+	std	r10, VCPU_MMCR + 24(r9)
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
 	std	r7, VCPU_SIAR(r9)
 	std	r8, VCPU_SDAR(r9)
 	mfspr	r3, SPRN_PMC1
@@ -1370,12 +1409,10 @@ BEGIN_FTR_SECTION
 	stw	r11, VCPU_PMC + 28(r9)
 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_201)
 BEGIN_FTR_SECTION
-	mfspr	r4, SPRN_MMCR2
 	mfspr	r5, SPRN_SIER
 	mfspr	r6, SPRN_SPMC1
 	mfspr	r7, SPRN_SPMC2
 	mfspr	r8, SPRN_MMCRS
-	std	r4, VCPU_MMCR + 24(r9)
 	std	r5, VCPU_SIER(r9)
 	stw	r6, VCPU_PMC + 24(r9)
 	stw	r7, VCPU_PMC + 28(r9)
@@ -2311,3 +2348,21 @@ kvmppc_msr_interrupt:
 	li	r0, 1
 1:	rldimi	r11, r0, MSR_TS_S_LG, 63 - MSR_TS_T_LG
 	blr
+
+/*
+ * This works around a hardware bug on POWER8E processors, where
+ * writing a 1 to the MMCR0[PMAO] bit doesn't generate a
+ * performance monitor interrupt.  Instead, when we need to have
+ * an interrupt pending, we have to arrange for a counter to overflow.
+ */
+kvmppc_fix_pmao:
+	li	r3, 0
+	mtspr	SPRN_MMCR2, r3
+	lis	r3, (MMCR0_PMXE | MMCR0_FCECE)@h
+	ori	r3, r3, MMCR0_PMCjCE | MMCR0_C56RUN
+	mtspr	SPRN_MMCR0, r3
+	lis	r3, 0x7fff
+	ori	r3, r3, 0xffff
+	mtspr	SPRN_PMC6, r3
+	isync
+	blr
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 8/8] KVM: PPC: Book3S HV: Fix machine check delivery to guest
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
                   ` (6 preceding siblings ...)
  2014-05-26  9:48 ` [PATCH 7/8] KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs Paul Mackerras
@ 2014-05-26  9:48 ` Paul Mackerras
  2014-05-28 12:14 ` [PATCH 0/8] Bug fixes for HV KVM, v2 Alexander Graf
  8 siblings, 0 replies; 10+ messages in thread
From: Paul Mackerras @ 2014-05-26  9:48 UTC (permalink / raw)
  To: Alexander Graf; +Cc: kvm-ppc, kvm

The code that delivered a machine check to the guest after handling
it in real mode failed to load up r11 before calling kvmppc_msr_interrupt,
which needs the old MSR value in r11 so it can see the transactional
state there.  This adds the missing load.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 60fe8ba..220aefb 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -2144,6 +2144,7 @@ machine_check_realmode:
 	beq	mc_cont
 	/* If not, deliver a machine check.  SRR0/1 are already set */
 	li	r10, BOOK3S_INTERRUPT_MACHINE_CHECK
+	ld	r11, VCPU_MSR(r9)
 	bl	kvmppc_msr_interrupt
 	b	fast_interrupt_c_return
 
-- 
2.0.0.rc2

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH 0/8] Bug fixes for HV KVM, v2
  2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
                   ` (7 preceding siblings ...)
  2014-05-26  9:48 ` [PATCH 8/8] KVM: PPC: Book3S HV: Fix machine check delivery to guest Paul Mackerras
@ 2014-05-28 12:14 ` Alexander Graf
  8 siblings, 0 replies; 10+ messages in thread
From: Alexander Graf @ 2014-05-28 12:14 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: kvm-ppc, kvm


On 26.05.14 11:48, Paul Mackerras wrote:
> This series of patches fixes a few bugs that have been found in
> testing HV KVM recently.  It also adds workarounds for a couple of
> POWER8 PMU bugs, fixes the definition of KVM_REG_PPC_WORT, and adds
> some things that were missing from Documentation/virtual/kvm/api.txt.
>
> The patches are against Alex Graf's kvm-ppc-queue branch.
>
> Please apply for 3.16.  The first couple would be safe to go into 3.15
> as well, and probably should.

Thanks, applied all to kvm-ppc-queue. I don't think it's necessary for 
the first few to really go into 3.15 - if user space uses a header from 
there it will just get unimplemented ONE_REG returns for WORT.


Alex

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-05-28 12:14 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-05-26  9:48 [PATCH 0/8] Bug fixes for HV KVM, v2 Paul Mackerras
2014-05-26  9:48 ` [PATCH 1/8] KVM: PPC: Book3S: Add ONE_REG register names that were missed Paul Mackerras
2014-05-26  9:48 ` [PATCH 2/8] KVM: PPC: Book3S: Move KVM_REG_PPC_WORT to an unused register number Paul Mackerras
2014-05-26  9:48 ` [PATCH 3/8] KVM: PPC: Book3S HV: Fix check for running inside guest in global_invalidates() Paul Mackerras
2014-05-26  9:48 ` [PATCH 4/8] KVM: PPC: Book3S HV: Put huge-page HPTEs in rmap chain for base address Paul Mackerras
2014-05-26  9:48 ` [PATCH 5/8] KVM: PPC: Book3S HV: Fix dirty map for hugepages Paul Mackerras
2014-05-26  9:48 ` [PATCH 6/8] KVM: PPC: Book3S HV: Make sure we don't miss dirty pages Paul Mackerras
2014-05-26  9:48 ` [PATCH 7/8] KVM: PPC: Book3S HV: Work around POWER8 performance monitor bugs Paul Mackerras
2014-05-26  9:48 ` [PATCH 8/8] KVM: PPC: Book3S HV: Fix machine check delivery to guest Paul Mackerras
2014-05-28 12:14 ` [PATCH 0/8] Bug fixes for HV KVM, v2 Alexander Graf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox