public inbox for kvm@vger.kernel.org
* [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte
@ 2013-08-01 11:12 Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 1/6 v2] powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN Bharat Bhushan
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Bharat Bhushan @ 2013-08-01 11:12 UTC (permalink / raw)
  To: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood; +Cc: Bharat Bhushan

From: Bharat Bhushan <bharat.bhushan@freescale.com>

The first patch is a typo fix: book3e defines _PAGE_LENDIAN while it should
be _PAGE_ENDIAN. This suggests that the define was never exercised :-)

The second and third patches allow the guest to control the "G" (Guarded)
and "E" (Endian) TLB attributes respectively.

The rest of the patches set the caching attributes (TLB.WIMGE) from the
corresponding Linux pte.

v1->v2
 - Earlier the caching attributes (WIMGE) were set based on whether the page
   is RAM or not. Now we get these attributes from the corresponding Linux PTE.

Bharat Bhushan (6):
  powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN
  kvm: powerpc: allow guest control "E" attribute in mas2
  kvm: powerpc: allow guest control "G" attribute in mas2
  powerpc: move linux pte/hugepte search to more generic file
  kvm: powerpc: booke: Add linux pte lookup like booke3s
  kvm: powerpc: use caching attributes as per linux pte

 arch/powerpc/include/asm/kvm_booke.h     |   73 ++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/kvm_host.h      |    2 +-
 arch/powerpc/include/asm/pgtable-ppc64.h |   36 ---------------
 arch/powerpc/include/asm/pgtable.h       |   37 +++++++++++++++
 arch/powerpc/include/asm/pte-book3e.h    |    2 +-
 arch/powerpc/kvm/booke.c                 |    2 +-
 arch/powerpc/kvm/e500.h                  |   10 +++--
 arch/powerpc/kvm/e500_mmu_host.c         |   31 +++++++-----
 8 files changed, 137 insertions(+), 56 deletions(-)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/6 v2] powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN
  2013-08-01 11:12 [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte Bharat Bhushan
@ 2013-08-01 11:12 ` Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 2/6 v2] kvm: powerpc: allow guest control "E" attribute in mas2 Bharat Bhushan
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Bharat Bhushan @ 2013-08-01 11:12 UTC (permalink / raw)
  To: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood
  Cc: Bharat Bhushan, Bharat Bhushan

For book3e, _PAGE_ENDIAN is not defined. In fact, what is defined
is "_PAGE_LENDIAN", which is wrong and should be _PAGE_ENDIAN.
There are no compilation errors because
arch/powerpc/include/asm/pte-common.h defines _PAGE_ENDIAN as 0
when it is not defined anywhere else.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v1->v2
 - no change

 arch/powerpc/include/asm/pte-book3e.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/pte-book3e.h b/arch/powerpc/include/asm/pte-book3e.h
index 0156702..576ad88 100644
--- a/arch/powerpc/include/asm/pte-book3e.h
+++ b/arch/powerpc/include/asm/pte-book3e.h
@@ -40,7 +40,7 @@
 #define _PAGE_U1	0x010000
 #define _PAGE_U0	0x020000
 #define _PAGE_ACCESSED	0x040000
-#define _PAGE_LENDIAN	0x080000
+#define _PAGE_ENDIAN	0x080000
 #define _PAGE_GUARDED	0x100000
 #define _PAGE_COHERENT	0x200000 /* M: enforce memory coherence */
 #define _PAGE_NO_CACHE	0x400000 /* I: cache inhibit */
-- 
1.7.0.4


* [PATCH 2/6 v2] kvm: powerpc: allow guest control "E" attribute in mas2
  2013-08-01 11:12 [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 1/6 v2] powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN Bharat Bhushan
@ 2013-08-01 11:12 ` Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 3/6 v2] kvm: powerpc: allow guest control "G" " Bharat Bhushan
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 27+ messages in thread
From: Bharat Bhushan @ 2013-08-01 11:12 UTC (permalink / raw)
  To: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood
  Cc: Bharat Bhushan, Bharat Bhushan

"E" bit in MAS2 bit indicates whether the page is accessed
in Little-Endian or Big-Endian byte order.
There is no reason to stop guest setting  "E", so allow him."

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v1->v2
 - no change

 arch/powerpc/kvm/e500.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index c2e5e98..277cb18 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
 #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
 #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
 #define MAS2_ATTRIB_MASK \
-	  (MAS2_X0 | MAS2_X1)
+	  (MAS2_X0 | MAS2_X1 | MAS2_E)
 #define MAS3_ATTRIB_MASK \
 	  (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
 	   | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
-- 
1.7.0.4


* [PATCH 3/6 v2] kvm: powerpc: allow guest control "G" attribute in mas2
  2013-08-01 11:12 [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 1/6 v2] powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 2/6 v2] kvm: powerpc: allow guest control "E" attribute in mas2 Bharat Bhushan
@ 2013-08-01 11:12 ` Bharat Bhushan
  2013-08-02  6:39   ` "tiejun.chen"
  2013-08-01 11:12 ` [PATCH 4/6 v2] powerpc: move linux pte/hugepte search to more generic file Bharat Bhushan
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Bharat Bhushan @ 2013-08-01 11:12 UTC (permalink / raw)
  To: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood
  Cc: Bharat Bhushan, Bharat Bhushan

"G" bit in MAS2 indicates whether the page is Guarded.
There is no reason to stop guest setting  "E", so allow him.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v1->v2
 - no change

 arch/powerpc/kvm/e500.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 277cb18..4fd9650 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
 #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
 #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
 #define MAS2_ATTRIB_MASK \
-	  (MAS2_X0 | MAS2_X1 | MAS2_E)
+	  (MAS2_X0 | MAS2_X1 | MAS2_E | MAS2_G)
 #define MAS3_ATTRIB_MASK \
 	  (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
 	   | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
-- 
1.7.0.4


* [PATCH 4/6 v2] powerpc: move linux pte/hugepte search to more generic file
  2013-08-01 11:12 [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte Bharat Bhushan
                   ` (2 preceding siblings ...)
  2013-08-01 11:12 ` [PATCH 3/6 v2] kvm: powerpc: allow guest control "G" " Bharat Bhushan
@ 2013-08-01 11:12 ` Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s Bharat Bhushan
  2013-08-01 11:12 ` [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte Bharat Bhushan
  5 siblings, 0 replies; 27+ messages in thread
From: Bharat Bhushan @ 2013-08-01 11:12 UTC (permalink / raw)
  To: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood
  Cc: Bharat Bhushan, Bharat Bhushan

The Linux pte search functions find_linux_pte_or_hugepte() and
find_linux_pte() no longer have anything specific to 64-bit,
so move them from pgtable-ppc64.h to asm/pgtable.h.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v1->v2
 - This is a new change in this version

 arch/powerpc/include/asm/pgtable-ppc64.h |   36 -----------------------------
 arch/powerpc/include/asm/pgtable.h       |   37 ++++++++++++++++++++++++++++++
 2 files changed, 37 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index e3d55f6..d257d98 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -340,42 +340,6 @@ static inline void __ptep_set_access_flags(pte_t *ptep, pte_t entry)
 void pgtable_cache_add(unsigned shift, void (*ctor)(void *));
 void pgtable_cache_init(void);
 
-/*
- * find_linux_pte returns the address of a linux pte for a given
- * effective address and directory.  If not found, it returns zero.
- */
-static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
-{
-	pgd_t *pg;
-	pud_t *pu;
-	pmd_t *pm;
-	pte_t *pt = NULL;
-
-	pg = pgdir + pgd_index(ea);
-	if (!pgd_none(*pg)) {
-		pu = pud_offset(pg, ea);
-		if (!pud_none(*pu)) {
-			pm = pmd_offset(pu, ea);
-			if (pmd_present(*pm))
-				pt = pte_offset_kernel(pm, ea);
-		}
-	}
-	return pt;
-}
-
-#ifdef CONFIG_HUGETLB_PAGE
-pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
-				 unsigned *shift);
-#else
-static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
-					       unsigned *shift)
-{
-	if (shift)
-		*shift = 0;
-	return find_linux_pte(pgdir, ea);
-}
-#endif /* !CONFIG_HUGETLB_PAGE */
-
 #endif /* __ASSEMBLY__ */
 
 #endif /* _ASM_POWERPC_PGTABLE_PPC64_H_ */
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index b6293d2..690c8c2 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -217,6 +217,43 @@ extern int gup_hugepd(hugepd_t *hugepd, unsigned pdshift, unsigned long addr,
 
 extern int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 		       unsigned long end, int write, struct page **pages, int *nr);
+
+/*
+ * find_linux_pte returns the address of a linux pte for a given
+ * effective address and directory.  If not found, it returns zero.
+ */
+static inline pte_t *find_linux_pte(pgd_t *pgdir, unsigned long ea)
+{
+	pgd_t *pg;
+	pud_t *pu;
+	pmd_t *pm;
+	pte_t *pt = NULL;
+
+	pg = pgdir + pgd_index(ea);
+	if (!pgd_none(*pg)) {
+		pu = pud_offset(pg, ea);
+		if (!pud_none(*pu)) {
+			pm = pmd_offset(pu, ea);
+			if (pmd_present(*pm))
+				pt = pte_offset_kernel(pm, ea);
+		}
+	}
+	return pt;
+}
+
+#ifdef CONFIG_HUGETLB_PAGE
+pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+				 unsigned *shift);
+#else
+static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
+					       unsigned *shift)
+{
+	if (shift)
+		*shift = 0;
+	return find_linux_pte(pgdir, ea);
+}
+#endif /* !CONFIG_HUGETLB_PAGE */
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
-- 
1.7.0.4


* [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-01 11:12 [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte Bharat Bhushan
                   ` (3 preceding siblings ...)
  2013-08-01 11:12 ` [PATCH 4/6 v2] powerpc: move linux pte/hugepte search to more generic file Bharat Bhushan
@ 2013-08-01 11:12 ` Bharat Bhushan
  2013-08-02  6:37   ` "tiejun.chen"
  2013-08-02 22:58   ` Scott Wood
  2013-08-01 11:12 ` [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte Bharat Bhushan
  5 siblings, 2 replies; 27+ messages in thread
From: Bharat Bhushan @ 2013-08-01 11:12 UTC (permalink / raw)
  To: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood
  Cc: Bharat Bhushan, Bharat Bhushan

KVM needs to look up the Linux pte to get the TLB attributes (WIMGE).
This is similar to what book3s does.
This will be used in follow-up patches.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v1->v2
 - This is a new change in this version

 arch/powerpc/include/asm/kvm_booke.h |   73 ++++++++++++++++++++++++++++++++++
 1 files changed, 73 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
index d3c1eb3..903624d 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -102,4 +102,77 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
 {
 	return vcpu->arch.shared->msr;
 }
+
+/*
+ * Lock and read a linux PTE.  If it's present and writable, atomically
+ * set dirty and referenced bits and return the PTE, otherwise return 0.
+ */
+static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
+{
+	pte_t pte;
+
+#ifdef PTE_ATOMIC_UPDATES
+	pte_t tmp;
+        /* wait until _PAGE_BUSY is clear then set it atomically */
+#ifdef CONFIG_PPC64
+	__asm__ __volatile__ (
+		"1:	ldarx	%0,0,%3\n"
+		"	andi.	%1,%0,%4\n"
+		"	bne-	1b\n"
+		"	ori	%1,%0,%4\n"
+		"	stdcx.	%1,0,%3\n"
+		"	bne-	1b"
+		: "=&r" (pte), "=&r" (tmp), "=m" (*p)
+		: "r" (p), "i" (_PAGE_BUSY)
+		: "cc");
+#else
+        __asm__ __volatile__ (
+                "1:     lwarx   %0,0,%3\n"
+                "       andi.   %1,%0,%4\n"
+                "       bne-    1b\n"
+                "       ori     %1,%0,%4\n"
+                "       stwcx.  %1,0,%3\n"
+                "       bne-    1b"
+                : "=&r" (pte), "=&r" (tmp), "=m" (*p)
+                : "r" (p), "i" (_PAGE_BUSY)
+                : "cc");
+#endif
+#else
+	pte = pte_val(*p);
+#endif
+
+	if (pte_present(pte)) {
+		pte = pte_mkyoung(pte);
+		if (writing && pte_write(pte))
+			pte = pte_mkdirty(pte);
+	}
+
+	*p = pte;	/* clears _PAGE_BUSY */
+
+	return pte;
+}
+
+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+			      int writing, unsigned long *pte_sizep)
+{
+	pte_t *ptep;
+	unsigned long ps = *pte_sizep;
+	unsigned int shift;
+
+	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+	if (!ptep)
+		return __pte(0);
+	if (shift)
+		*pte_sizep = 1ul << shift;
+	else
+		*pte_sizep = PAGE_SIZE;
+
+	if (ps > *pte_sizep)
+		return __pte(0);
+	if (!pte_present(*ptep))
+		return __pte(0);
+
+	return kvmppc_read_update_linux_pte(ptep, writing);
+}
+
 #endif /* __ASM_KVM_BOOKE_H__ */
-- 
1.7.0.4


* [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
  2013-08-01 11:12 [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte Bharat Bhushan
                   ` (4 preceding siblings ...)
  2013-08-01 11:12 ` [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s Bharat Bhushan
@ 2013-08-01 11:12 ` Bharat Bhushan
  2013-08-02  6:24   ` "tiejun.chen"
  2013-08-02 23:34   ` Scott Wood
  5 siblings, 2 replies; 27+ messages in thread
From: Bharat Bhushan @ 2013-08-01 11:12 UTC (permalink / raw)
  To: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood
  Cc: Bharat Bhushan, Bharat Bhushan

KVM uses the same WIM TLB attributes as the corresponding qemu pte.
For this we now search the Linux pte for the requested page and
get the caching/coherency attributes from that pte.

Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
---
v1->v2
 - Use Linux pte for wimge rather than RAM/no-RAM mechanism

 arch/powerpc/include/asm/kvm_host.h |    2 +-
 arch/powerpc/kvm/booke.c            |    2 +-
 arch/powerpc/kvm/e500.h             |    8 +++++---
 arch/powerpc/kvm/e500_mmu_host.c    |   31 ++++++++++++++++++-------------
 4 files changed, 25 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 3328353..583d405 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -535,6 +535,7 @@ struct kvm_vcpu_arch {
 #endif
 	gpa_t paddr_accessed;
 	gva_t vaddr_accessed;
+	pgd_t *pgdir;
 
 	u8 io_gpr; /* GPR used as IO source/target */
 	u8 mmio_is_bigendian;
@@ -592,7 +593,6 @@ struct kvm_vcpu_arch {
 	struct list_head run_list;
 	struct task_struct *run_task;
 	struct kvm_run *kvm_run;
-	pgd_t *pgdir;
 
 	spinlock_t vpa_update_lock;
 	struct kvmppc_vpa vpa;
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 17722d8..ebcccc2 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 #endif
 
 	kvmppc_fix_ee_before_entry();
-
+	vcpu->arch.pgdir = current->mm->pgd;
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
 	/* No need for kvm_guest_exit. It's done in handle_exit.
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index 4fd9650..fc4b2f6 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -31,11 +31,13 @@ enum vcpu_ftr {
 #define E500_TLB_NUM   2
 
 /* entry is mapped somewhere in host TLB */
-#define E500_TLB_VALID		(1 << 0)
+#define E500_TLB_VALID		(1 << 31)
 /* TLB1 entry is mapped by host TLB1, tracked by bitmaps */
-#define E500_TLB_BITMAP		(1 << 1)
+#define E500_TLB_BITMAP		(1 << 30)
 /* TLB1 entry is mapped by host TLB0 */
-#define E500_TLB_TLB0		(1 << 2)
+#define E500_TLB_TLB0		(1 << 29)
+/* Lower 5 bits have WIMGE value */
+#define E500_TLB_WIMGE_MASK	(0x1f)
 
 struct tlbe_ref {
 	pfn_t pfn;		/* valid only for TLB0, except briefly */
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index 1c6a9d7..9b10b0b 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -64,15 +64,6 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
 	return mas3;
 }
 
-static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
-{
-#ifdef CONFIG_SMP
-	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
-#else
-	return mas2 & MAS2_ATTRIB_MASK;
-#endif
-}
-
 /*
  * writing shadow tlb entry to host TLB
  */
@@ -248,10 +239,12 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
 
 static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
 					 struct kvm_book3e_206_tlb_entry *gtlbe,
-					 pfn_t pfn)
+					 pfn_t pfn, int wimg)
 {
 	ref->pfn = pfn;
 	ref->flags |= E500_TLB_VALID;
+	/* Use guest supplied MAS2_G and MAS2_E */
+	ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
 
 	if (tlbe_is_writable(gtlbe))
 		kvm_set_pfn_dirty(pfn);
@@ -312,8 +305,7 @@ static void kvmppc_e500_setup_stlbe(
 
 	/* Force IPROT=0 for all guest mappings. */
 	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
-	stlbe->mas2 = (gvaddr & MAS2_EPN) |
-		      e500_shadow_mas2_attrib(gtlbe->mas2, pr);
+	stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_WIMGE_MASK);
 	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
 			e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
 
@@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 	unsigned long hva;
 	int pfnmap = 0;
 	int tsize = BOOK3E_PAGESZ_4K;
+	pte_t pte;
+	int wimg = 0;
 
 	/*
 	 * Translate guest physical to true physical, acquiring
@@ -437,6 +431,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 
 	if (likely(!pfnmap)) {
 		unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
+		pgd_t *pgdir;
+
 		pfn = gfn_to_pfn_memslot(slot, gfn);
 		if (is_error_noslot_pfn(pfn)) {
 			printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
@@ -447,9 +443,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 		/* Align guest and physical address to page map boundaries */
 		pfn &= ~(tsize_pages - 1);
 		gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
+		pgdir = vcpu_e500->vcpu.arch.pgdir;
+		pte = lookup_linux_pte(pgdir, hva, 1, &tsize_pages);
+		if (pte_present(pte)) {
+			wimg = (pte >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
+		} else {
+			printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n",
+					(long)gfn, pfn);
+			return -EINVAL;
+		}
 	}
 
-	kvmppc_e500_ref_setup(ref, gtlbe, pfn);
+	kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
 
 	kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
 				ref, gvaddr, stlbe);
-- 
1.7.0.4


* Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
  2013-08-01 11:12 ` [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte Bharat Bhushan
@ 2013-08-02  6:24   ` "tiejun.chen"
  2013-08-02 23:34   ` Scott Wood
  1 sibling, 0 replies; 27+ messages in thread
From: "“tiejun.chen”" @ 2013-08-02  6:24 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood,
	Bharat Bhushan

On 08/01/2013 07:12 PM, Bharat Bhushan wrote:
> KVM uses the same WIM TLB attributes as the corresponding qemu pte.
> For this we now search the Linux pte for the requested page and
> get the caching/coherency attributes from that pte.
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> v1->v2
>   - Use Linux pte for wimge rather than RAM/no-RAM mechanism
>
>   arch/powerpc/include/asm/kvm_host.h |    2 +-
>   arch/powerpc/kvm/booke.c            |    2 +-
>   arch/powerpc/kvm/e500.h             |    8 +++++---
>   arch/powerpc/kvm/e500_mmu_host.c    |   31 ++++++++++++++++++-------------
>   4 files changed, 25 insertions(+), 18 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index 3328353..583d405 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -535,6 +535,7 @@ struct kvm_vcpu_arch {
>   #endif
>   	gpa_t paddr_accessed;
>   	gva_t vaddr_accessed;
> +	pgd_t *pgdir;
>
>   	u8 io_gpr; /* GPR used as IO source/target */
>   	u8 mmio_is_bigendian;
> @@ -592,7 +593,6 @@ struct kvm_vcpu_arch {
>   	struct list_head run_list;
>   	struct task_struct *run_task;
>   	struct kvm_run *kvm_run;
> -	pgd_t *pgdir;
>
>   	spinlock_t vpa_update_lock;
>   	struct kvmppc_vpa vpa;
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 17722d8..ebcccc2 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>   #endif
>
>   	kvmppc_fix_ee_before_entry();
> -
> +	vcpu->arch.pgdir = current->mm->pgd;
>   	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
>
>   	/* No need for kvm_guest_exit. It's done in handle_exit.
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index 4fd9650..fc4b2f6 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -31,11 +31,13 @@ enum vcpu_ftr {
>   #define E500_TLB_NUM   2
>
>   /* entry is mapped somewhere in host TLB */
> -#define E500_TLB_VALID		(1 << 0)
> +#define E500_TLB_VALID		(1 << 31)
>   /* TLB1 entry is mapped by host TLB1, tracked by bitmaps */
> -#define E500_TLB_BITMAP		(1 << 1)
> +#define E500_TLB_BITMAP		(1 << 30)
>   /* TLB1 entry is mapped by host TLB0 */
> -#define E500_TLB_TLB0		(1 << 2)
> +#define E500_TLB_TLB0		(1 << 29)
> +/* Lower 5 bits have WIMGE value */
> +#define E500_TLB_WIMGE_MASK	(0x1f)
>
>   struct tlbe_ref {
>   	pfn_t pfn;		/* valid only for TLB0, except briefly */
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index 1c6a9d7..9b10b0b 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -64,15 +64,6 @@ static inline u32 e500_shadow_mas3_attrib(u32 mas3, int usermode)
>   	return mas3;
>   }
>
> -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
> -{
> -#ifdef CONFIG_SMP
> -	return (mas2 & MAS2_ATTRIB_MASK) | MAS2_M;
> -#else
> -	return mas2 & MAS2_ATTRIB_MASK;
> -#endif
> -}
> -
>   /*
>    * writing shadow tlb entry to host TLB
>    */
> @@ -248,10 +239,12 @@ static inline int tlbe_is_writable(struct kvm_book3e_206_tlb_entry *tlbe)
>
>   static inline void kvmppc_e500_ref_setup(struct tlbe_ref *ref,
>   					 struct kvm_book3e_206_tlb_entry *gtlbe,
> -					 pfn_t pfn)
> +					 pfn_t pfn, int wimg)
>   {
>   	ref->pfn = pfn;
>   	ref->flags |= E500_TLB_VALID;
> +	/* Use guest supplied MAS2_G and MAS2_E */
> +	ref->flags |= (gtlbe->mas2 & MAS2_ATTRIB_MASK) | wimg;
>
>   	if (tlbe_is_writable(gtlbe))
>   		kvm_set_pfn_dirty(pfn);
> @@ -312,8 +305,7 @@ static void kvmppc_e500_setup_stlbe(
>
>   	/* Force IPROT=0 for all guest mappings. */
>   	stlbe->mas1 = MAS1_TSIZE(tsize) | get_tlb_sts(gtlbe) | MAS1_VALID;
> -	stlbe->mas2 = (gvaddr & MAS2_EPN) |
> -		      e500_shadow_mas2_attrib(gtlbe->mas2, pr);
> +	stlbe->mas2 = (gvaddr & MAS2_EPN) | (ref->flags & E500_TLB_WIMGE_MASK);
>   	stlbe->mas7_3 = ((u64)pfn << PAGE_SHIFT) |
>   			e500_shadow_mas3_attrib(gtlbe->mas7_3, pr);
>
> @@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>   	unsigned long hva;
>   	int pfnmap = 0;
>   	int tsize = BOOK3E_PAGESZ_4K;
> +	pte_t pte;
> +	int wimg = 0;
>
>   	/*
>   	 * Translate guest physical to true physical, acquiring
> @@ -437,6 +431,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>
>   	if (likely(!pfnmap)) {
>   		unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
> +		pgd_t *pgdir;
> +
>   		pfn = gfn_to_pfn_memslot(slot, gfn);
>   		if (is_error_noslot_pfn(pfn)) {
>   			printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
> @@ -447,9 +443,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>   		/* Align guest and physical address to page map boundaries */
>   		pfn &= ~(tsize_pages - 1);
>   		gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
> +		pgdir = vcpu_e500->vcpu.arch.pgdir;
> +		pte = lookup_linux_pte(pgdir, hva, 1, &tsize_pages);
> +		if (pte_present(pte)) {
> +			wimg = (pte >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;

This should be
		wimg = (pte_val(pte) >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;

Tiejun

> +		} else {
> +			printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n",
> +					(long)gfn, pfn);
> +			return -EINVAL;
> +		}
>   	}
>
> -	kvmppc_e500_ref_setup(ref, gtlbe, pfn);
> +	kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
>
>   	kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
>   				ref, gvaddr, stlbe);
>


* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-01 11:12 ` [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s Bharat Bhushan
@ 2013-08-02  6:37   ` "tiejun.chen"
  2013-08-02 22:58   ` Scott Wood
  1 sibling, 0 replies; 27+ messages in thread
From: "“tiejun.chen”" @ 2013-08-02  6:37 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood,
	Bharat Bhushan

On 08/01/2013 07:12 PM, Bharat Bhushan wrote:
> KVM needs to look up the Linux pte to get the TLB attributes (WIMGE).
> This is similar to what book3s does.
> This will be used in follow-up patches.
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> v1->v2
>   - This is a new change in this version
>
>   arch/powerpc/include/asm/kvm_booke.h |   73 ++++++++++++++++++++++++++++++++++
>   1 files changed, 73 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
> index d3c1eb3..903624d 100644
> --- a/arch/powerpc/include/asm/kvm_booke.h
> +++ b/arch/powerpc/include/asm/kvm_booke.h
> @@ -102,4 +102,77 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
>   {
>   	return vcpu->arch.shared->msr;
>   }
> +
> +/*
> + * Lock and read a linux PTE.  If it's present and writable, atomically
> + * set dirty and referenced bits and return the PTE, otherwise return 0.
> + */
> +static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
> +{
> +	pte_t pte;
> +
> +#ifdef PTE_ATOMIC_UPDATES
> +	pte_t tmp;
> +        /* wait until _PAGE_BUSY is clear then set it atomically */
> +#ifdef CONFIG_PPC64
> +	__asm__ __volatile__ (
> +		"1:	ldarx	%0,0,%3\n"
> +		"	andi.	%1,%0,%4\n"
> +		"	bne-	1b\n"
> +		"	ori	%1,%0,%4\n"
> +		"	stdcx.	%1,0,%3\n"
> +		"	bne-	1b"
> +		: "=&r" (pte), "=&r" (tmp), "=m" (*p)
> +		: "r" (p), "i" (_PAGE_BUSY)
> +		: "cc");
> +#else
> +        __asm__ __volatile__ (
> +                "1:     lwarx   %0,0,%3\n"
> +                "       andi.   %1,%0,%4\n"
> +                "       bne-    1b\n"
> +                "       ori     %1,%0,%4\n"
> +                "       stwcx.  %1,0,%3\n"
> +                "       bne-    1b"
> +                : "=&r" (pte), "=&r" (tmp), "=m" (*p)
> +                : "r" (p), "i" (_PAGE_BUSY)
> +                : "cc");
> +#endif
> +#else
> +	pte = pte_val(*p);
> +#endif
> +
> +	if (pte_present(pte)) {
> +		pte = pte_mkyoung(pte);
> +		if (writing && pte_write(pte))
> +			pte = pte_mkdirty(pte);
> +	}
> +
> +	*p = pte;	/* clears _PAGE_BUSY */
> +
> +	return pte;
> +}
> +
> +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> +			      int writing, unsigned long *pte_sizep)

Looks like this function is the same as the book3s one, so why not make it common :)

Tiejun

> +{
> +	pte_t *ptep;
> +	unsigned long ps = *pte_sizep;
> +	unsigned int shift;
> +
> +	ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
> +	if (!ptep)
> +		return __pte(0);
> +	if (shift)
> +		*pte_sizep = 1ul << shift;
> +	else
> +		*pte_sizep = PAGE_SIZE;
> +
> +	if (ps > *pte_sizep)
> +		return __pte(0);
> +	if (!pte_present(*ptep))
> +		return __pte(0);
> +
> +	return kvmppc_read_update_linux_pte(ptep, writing);
> +}
> +
>   #endif /* __ASM_KVM_BOOKE_H__ */
>


* Re: [PATCH 3/6 v2] kvm: powerpc: allow guest control "G" attribute in mas2
  2013-08-01 11:12 ` [PATCH 3/6 v2] kvm: powerpc: allow guest control "G" " Bharat Bhushan
@ 2013-08-02  6:39   ` "tiejun.chen"
  0 siblings, 0 replies; 27+ messages in thread
From: "“tiejun.chen”" @ 2013-08-02  6:39 UTC (permalink / raw)
  To: Bharat Bhushan
  Cc: benh, agraf, kvm-ppc, kvm, linuxppc-dev, scottwood,
	Bharat Bhushan

On 08/01/2013 07:12 PM, Bharat Bhushan wrote:
> "G" bit in MAS2 indicates whether the page is Guarded.
> There is no reason to stop guest setting  "E", so allow him.

Could we merge patches 2 and 3 into one?

Tiejun

>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> v1->v2
>   - no change
>
>   arch/powerpc/kvm/e500.h |    2 +-
>   1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
> index 277cb18..4fd9650 100644
> --- a/arch/powerpc/kvm/e500.h
> +++ b/arch/powerpc/kvm/e500.h
> @@ -117,7 +117,7 @@ static inline struct kvmppc_vcpu_e500 *to_e500(struct kvm_vcpu *vcpu)
>   #define E500_TLB_USER_PERM_MASK (MAS3_UX|MAS3_UR|MAS3_UW)
>   #define E500_TLB_SUPER_PERM_MASK (MAS3_SX|MAS3_SR|MAS3_SW)
>   #define MAS2_ATTRIB_MASK \
> -	  (MAS2_X0 | MAS2_X1 | MAS2_E)
> +	  (MAS2_X0 | MAS2_X1 | MAS2_E | MAS2_G)
>   #define MAS3_ATTRIB_MASK \
>   	  (MAS3_U0 | MAS3_U1 | MAS3_U2 | MAS3_U3 \
>   	   | E500_TLB_USER_PERM_MASK | E500_TLB_SUPER_PERM_MASK)
>


* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-01 11:12 ` [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s Bharat Bhushan
  2013-08-02  6:37   ` "tiejun.chen"
@ 2013-08-02 22:58   ` Scott Wood
  2013-08-02 23:16     ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 27+ messages in thread
From: Scott Wood @ 2013-08-02 22:58 UTC (permalink / raw)
  To: Bharat Bhushan; +Cc: benh, agraf, kvm-ppc, kvm, linuxppc-dev, Bharat Bhushan

On Thu, 2013-08-01 at 16:42 +0530, Bharat Bhushan wrote:
> KVM needs to look up the Linux pte to get the TLB attributes (WIMGE).
> This is similar to what book3s does.
> This will be used in follow-up patches.
> 
> Signed-off-by: Bharat Bhushan <bharat.bhushan@freescale.com>
> ---
> v1->v2
>  - This is a new change in this version
> 
>  arch/powerpc/include/asm/kvm_booke.h |   73 ++++++++++++++++++++++++++++++++++
>  1 files changed, 73 insertions(+), 0 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/kvm_booke.h b/arch/powerpc/include/asm/kvm_booke.h
> index d3c1eb3..903624d 100644
> --- a/arch/powerpc/include/asm/kvm_booke.h
> +++ b/arch/powerpc/include/asm/kvm_booke.h
> @@ -102,4 +102,77 @@ static inline ulong kvmppc_get_msr(struct kvm_vcpu *vcpu)
>  {
>  	return vcpu->arch.shared->msr;
>  }
> +
> +/*
> + * Lock and read a linux PTE.  If it's present and writable, atomically
> + * set dirty and referenced bits and return the PTE, otherwise return 0.
> + */
> +static inline pte_t kvmppc_read_update_linux_pte(pte_t *p, int writing)
> +{
> +	pte_t pte;
> +
> +#ifdef PTE_ATOMIC_UPDATES
> +	pte_t tmp;
> +        /* wait until _PAGE_BUSY is clear then set it atomically */

_PAGE_BUSY is 0 on book3e.
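To illustrate the consequence, here is a minimal user-space model of the ldarx/andi./ori/stdcx. loop (not kernel code; C11 atomics stand in for the reservation, and `_PAGE_BUSY` is defined locally): when `_PAGE_BUSY` is 0, the busy test can never loop and the `ori` is a no-op, so the "lock" silently degenerates to a plain atomic read.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed stand-in: on book3e, _PAGE_BUSY is defined as 0. */
#define _PAGE_BUSY 0UL

/* User-space model of the ldarx/andi./ori/stdcx. sequence above. */
static unsigned long pte_lock_model(_Atomic unsigned long *p)
{
    unsigned long pte;

    for (;;) {
        pte = atomic_load(p);           /* ldarx */
        if (pte & _PAGE_BUSY)           /* andi. ; bne- 1b */
            continue;                   /* never taken when the mask is 0 */
        if (atomic_compare_exchange_strong(p, &pte,
                                           pte | _PAGE_BUSY)) /* ori ; stdcx. */
            break;
    }
    return pte;                         /* word unchanged: no bit was set */
}
```

So on book3e the sequence provides no mutual exclusion at all, which is why the code cannot assume _PAGE_BUSY semantics there.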

> +#ifdef CONFIG_PPC64
> +	__asm__ __volatile__ (
> +		"1:	ldarx	%0,0,%3\n"
> +		"	andi.	%1,%0,%4\n"
> +		"	bne-	1b\n"
> +		"	ori	%1,%0,%4\n"
> +		"	stdcx.	%1,0,%3\n"
> +		"	bne-	1b"
> +		: "=&r" (pte), "=&r" (tmp), "=m" (*p)
> +		: "r" (p), "i" (_PAGE_BUSY)
> +		: "cc");
> +#else
> +        __asm__ __volatile__ (
> +                "1:     lwarx   %0,0,%3\n"
> +                "       andi.   %1,%0,%4\n"
> +                "       bne-    1b\n"
> +                "       ori     %1,%0,%4\n"
> +                "       stwcx.  %1,0,%3\n"
> +                "       bne-    1b"
> +                : "=&r" (pte), "=&r" (tmp), "=m" (*p)
> +                : "r" (p), "i" (_PAGE_BUSY)
> +                : "cc");
> +#endif

What about 64-bit PTEs on 32-bit kernels?

In any case, this code does not belong in KVM.  It should be in the main
PPC mm code, even if KVM is the only user.

-Scott

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-02 22:58   ` Scott Wood
@ 2013-08-02 23:16     ` Benjamin Herrenschmidt
  2013-08-03  2:58       ` Bhushan Bharat-R65777
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-02 23:16 UTC (permalink / raw)
  To: Scott Wood
  Cc: Bharat Bhushan, agraf, kvm-ppc, kvm, linuxppc-dev, Bharat Bhushan

On Fri, 2013-08-02 at 17:58 -0500, Scott Wood wrote:
> 
> What about 64-bit PTEs on 32-bit kernels?
> 
> In any case, this code does not belong in KVM.  It should be in the
> main
> PPC mm code, even if KVM is the only user.

Also, don't we do similar things in Book3S KVM? At the very least that
stuff should become common. And yes, I agree, it should probably also
move to pgtable*.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
  2013-08-01 11:12 ` [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte Bharat Bhushan
  2013-08-02  6:24   ` "tiejun.chen"
@ 2013-08-02 23:34   ` Scott Wood
  2013-08-03  3:11     ` Bhushan Bharat-R65777
  1 sibling, 1 reply; 27+ messages in thread
From: Scott Wood @ 2013-08-02 23:34 UTC (permalink / raw)
  To: Bharat Bhushan; +Cc: benh, agraf, kvm-ppc, kvm, linuxppc-dev, Bharat Bhushan

On Thu, Aug 01, 2013 at 04:42:38PM +0530, Bharat Bhushan wrote:
> diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> index 17722d8..ebcccc2 100644
> --- a/arch/powerpc/kvm/booke.c
> +++ b/arch/powerpc/kvm/booke.c
> @@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
>  #endif
>  
>  	kvmppc_fix_ee_before_entry();
> -
> +	vcpu->arch.pgdir = current->mm->pgd;
>  	ret = __kvmppc_vcpu_run(kvm_run, vcpu);

kvmppc_fix_ee_before_entry() is supposed to be the last thing that
happens before __kvmppc_vcpu_run().

> @@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  	unsigned long hva;
>  	int pfnmap = 0;
>  	int tsize = BOOK3E_PAGESZ_4K;
> +	pte_t pte;
> +	int wimg = 0;
>  
>  	/*
>  	 * Translate guest physical to true physical, acquiring
> @@ -437,6 +431,8 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  
>  	if (likely(!pfnmap)) {
>  		unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
> +		pgd_t *pgdir;
> +
>  		pfn = gfn_to_pfn_memslot(slot, gfn);
>  		if (is_error_noslot_pfn(pfn)) {
>  			printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
> @@ -447,9 +443,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  		/* Align guest and physical address to page map boundaries */
>  		pfn &= ~(tsize_pages - 1);
>  		gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
> +		pgdir = vcpu_e500->vcpu.arch.pgdir;
> +		pte = lookup_linux_pte(pgdir, hva, 1, &tsize_pages);
> +		if (pte_present(pte)) {
> +			wimg = (pte >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
> +		} else {
> +			printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n",
> +					(long)gfn, pfn);
> +			return -EINVAL;
> +		}
>  	}

How does wimg get set in the pfnmap case?

Could you explain why we need to set dirty/referenced on the PTE, when we
didn't need to do that before?  All we're getting from the PTE is wimg. 
We have MMU notifiers to take care of the page being unmapped, and we've
already marked the page itself as dirty if the TLB entry is writeable.

-Scott

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-02 23:16     ` Benjamin Herrenschmidt
@ 2013-08-03  2:58       ` Bhushan Bharat-R65777
  2013-08-03  4:24         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 27+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-03  2:58 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Wood Scott-B07421
  Cc: agraf@suse.de, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org



> -----Original Message-----
> From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> Sent: Saturday, August 03, 2013 4:47 AM
> To: Wood Scott-B07421
> Cc: Bhushan Bharat-R65777; agraf@suse.de; kvm-ppc@vger.kernel.org;
> kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
> booke3s
> 
> On Fri, 2013-08-02 at 17:58 -0500, Scott Wood wrote:
> >
> > What about 64-bit PTEs on 32-bit kernels?
> >
> > In any case, this code does not belong in KVM.  It should be in the
> > main PPC mm code, even if KVM is the only user.
> 
> Also, don't we do similar things in Book3S KVM? At the very least that stuff
> should become common. And yes, I agree, it should probably also move to pgtable*.

One of the problems I saw was that if I put this code in asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other friend functions (on which this code depends) are defined in pgtable.h. And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it defines pte_present() and friends.

OK, I will move this into asm/pgtable*.h. Initially I fought with myself over putting this code in pgtable*, but finally ended up doing it here (got biased by book3s :)).

Thanks
-Bharat

> 
> Cheers,
> Ben.
> 
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
  2013-08-02 23:34   ` Scott Wood
@ 2013-08-03  3:11     ` Bhushan Bharat-R65777
  2013-08-03  4:25       ` Benjamin Herrenschmidt
  2013-08-05 16:30       ` Scott Wood
  0 siblings, 2 replies; 27+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-03  3:11 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: benh@kernel.crashing.org, agraf@suse.de, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Saturday, August 03, 2013 5:05 AM
> To: Bhushan Bharat-R65777
> Cc: benh@kernel.crashing.org; agraf@suse.de; kvm-ppc@vger.kernel.org;
> kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org; Bhushan Bharat-R65777
> Subject: Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux
> pte
> 
> On Thu, Aug 01, 2013 at 04:42:38PM +0530, Bharat Bhushan wrote:
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
> > 17722d8..ebcccc2 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -697,7 +697,7 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run,
> > struct kvm_vcpu *vcpu)  #endif
> >
> >  	kvmppc_fix_ee_before_entry();
> > -
> > +	vcpu->arch.pgdir = current->mm->pgd;
> >  	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
> 
> kvmppc_fix_ee_before_entry() is supposed to be the last thing that happens
> before __kvmppc_vcpu_run().
> 
> > @@ -332,6 +324,8 @@ static inline int kvmppc_e500_shadow_map(struct
> kvmppc_vcpu_e500 *vcpu_e500,
> >  	unsigned long hva;
> >  	int pfnmap = 0;
> >  	int tsize = BOOK3E_PAGESZ_4K;
> > +	pte_t pte;
> > +	int wimg = 0;
> >
> >  	/*
> >  	 * Translate guest physical to true physical, acquiring @@ -437,6
> > +431,8 @@ static inline int kvmppc_e500_shadow_map(struct
> > kvmppc_vcpu_e500 *vcpu_e500,
> >
> >  	if (likely(!pfnmap)) {
> >  		unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
> > +		pgd_t *pgdir;
> > +
> >  		pfn = gfn_to_pfn_memslot(slot, gfn);
> >  		if (is_error_noslot_pfn(pfn)) {
> >  			printk(KERN_ERR "Couldn't get real page for gfn %lx!\n", @@
> -447,9
> > +443,18 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500
> *vcpu_e500,
> >  		/* Align guest and physical address to page map boundaries */
> >  		pfn &= ~(tsize_pages - 1);
> >  		gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
> > +		pgdir = vcpu_e500->vcpu.arch.pgdir;
> > +		pte = lookup_linux_pte(pgdir, hva, 1, &tsize_pages);
> > +		if (pte_present(pte)) {
> > +			wimg = (pte >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
> > +		} else {
> > +			printk(KERN_ERR "pte not present: gfn %lx, pfn %lx\n",
> > +					(long)gfn, pfn);
> > +			return -EINVAL;
> > +		}
> >  	}
> 
> How does wimg get set in the pfnmap case?

Pfnmap pages are not kernel-managed, right? So should we set I+G there?

> 
> Could you explain why we need to set dirty/referenced on the PTE, when we didn't
> need to do that before? All we're getting from the PTE is wimg.
> We have MMU notifiers to take care of the page being unmapped, and we've already
> marked the page itself as dirty if the TLB entry is writeable.

I pulled this code from book3s.

Ben, can you describe why we need this on book3s ?

Thanks
-Bharat
> 
> -Scott

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-03  2:58       ` Bhushan Bharat-R65777
@ 2013-08-03  4:24         ` Benjamin Herrenschmidt
  2013-08-05 14:27           ` Bhushan Bharat-R65777
  0 siblings, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-03  4:24 UTC (permalink / raw)
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421, agraf@suse.de, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

On Sat, 2013-08-03 at 02:58 +0000, Bhushan Bharat-R65777 wrote:
> One of the problems I saw was that if I put this code in
> asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other
> friend functions (on which this code depends) are defined in pgtable.h.
> And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it
> defines pte_present() and friends.
> 
> OK, I will move this into asm/pgtable*.h. Initially I fought with myself
> over putting this code in pgtable*, but finally ended up doing it here
> (got biased by book3s :)).

Is there a reason why these routines can not be completely generic
in pgtable.h ?

Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
  2013-08-03  3:11     ` Bhushan Bharat-R65777
@ 2013-08-03  4:25       ` Benjamin Herrenschmidt
  2013-08-05 16:28         ` Scott Wood
  2013-08-05 16:30       ` Scott Wood
  1 sibling, 1 reply; 27+ messages in thread
From: Benjamin Herrenschmidt @ 2013-08-03  4:25 UTC (permalink / raw)
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421, agraf@suse.de, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org

On Sat, 2013-08-03 at 03:11 +0000, Bhushan Bharat-R65777 wrote:
> 
> > 
> > Could you explain why we need to set dirty/referenced on the PTE, when we didn't
> > need to do that before? All we're getting from the PTE is wimg.
> > We have MMU notifiers to take care of the page being unmapped, and we've already
> > marked the page itself as dirty if the TLB entry is writeable.
> 
> I pulled this code from book3s.
> 
> Ben, can you describe why we need this on book3s ?

If you let the guest write to the page you must set the dirty bit on the PTE
(or the struct page, at least one of them), similar with accessed on any access.

If you don't, the VM might swap the page out without writing it back to disk
for example, assuming it contains no modified data.

Cheers,
Ben.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-03  4:24         ` Benjamin Herrenschmidt
@ 2013-08-05 14:27           ` Bhushan Bharat-R65777
  2013-08-05 19:19             ` Scott Wood
  0 siblings, 1 reply; 27+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-05 14:27 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Wood Scott-B07421, agraf@suse.de, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org



> -----Original Message-----
> From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> Sent: Saturday, August 03, 2013 9:54 AM
> To: Bhushan Bharat-R65777
> Cc: Wood Scott-B07421; agraf@suse.de; kvm-ppc@vger.kernel.org;
> kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
> booke3s
> 
> On Sat, 2013-08-03 at 02:58 +0000, Bhushan Bharat-R65777 wrote:
> > One of the problems I saw was that if I put this code in
> > asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other
> > friend functions (on which this code depends) are defined in pgtable.h.
> > And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it
> > defines pte_present() and friends.
> >
> > OK, I will move this into asm/pgtable*.h. Initially I fought with myself
> > over putting this code in pgtable*, but finally ended up doing it here
> > (got biased by book3s :)).
> 
> Is there a reason why these routines can not be completely generic in pgtable.h
> ?

How about the generic function:

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d257d98..21daf28 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct *mm,
        return old;
 }

+static inline unsigned long pte_read(pte_t *p)
+{
+#ifdef PTE_ATOMIC_UPDATES
+       pte_t pte;
+       pte_t tmp;
+       __asm__ __volatile__ (
+       "1:     ldarx   %0,0,%3\n"
+       "       andi.   %1,%0,%4\n"
+       "       bne-    1b\n"
+       "       ori     %1,%0,%4\n"
+       "       stdcx.  %1,0,%3\n"
+       "       bne-    1b"
+       : "=&r" (pte), "=&r" (tmp), "=m" (*p)
+       : "r" (p), "i" (_PAGE_BUSY)
+       : "cc");
+
+       return pte;
+#else
+       return pte_val(*p);
+#endif
+}
 static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
                                              unsigned long addr, pte_t *ptep)
 {
diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
index 690c8c2..dad712c 100644
--- a/arch/powerpc/include/asm/pgtable.h
+++ b/arch/powerpc/include/asm/pgtable.h
@@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
 }
 #endif /* !CONFIG_HUGETLB_PAGE */

+static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
+                                    int writing, unsigned long *pte_sizep)
+{
+       pte_t *ptep;
+       pte_t pte;
+       unsigned long ps = *pte_sizep;
+       unsigned int shift;
+
+       ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
+       if (!ptep)
+               return __pte(0);
+       if (shift)
+               *pte_sizep = 1ul << shift;
+       else
+               *pte_sizep = PAGE_SIZE;
+
+       if (ps > *pte_sizep)
+               return __pte(0);
+
+       if (!pte_present(*ptep))
+               return __pte(0);
+
+#ifdef CONFIG_PPC64
+       /* Lock PTE (set _PAGE_BUSY) and read */
+       pte = pte_read(ptep);
+#else
+       pte = pte_val(*ptep);
+#endif
+       if (pte_present(pte)) {
+               pte = pte_mkyoung(pte);
+               if (writing && pte_write(pte))
+                       pte = pte_mkdirty(pte);
+       }
+
+       *ptep = __pte(pte); /* 64bit: Also unlock pte (clear _PAGE_BUSY) */
+
+       return pte;
+}
+
 #endif /* __ASSEMBLY__ */

 #endif /* __KERNEL__ */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
  2013-08-03  4:25       ` Benjamin Herrenschmidt
@ 2013-08-05 16:28         ` Scott Wood
  0 siblings, 0 replies; 27+ messages in thread
From: Scott Wood @ 2013-08-05 16:28 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Bhushan Bharat-R65777, Wood Scott-B07421, agraf@suse.de,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org

On Sat, 2013-08-03 at 14:25 +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2013-08-03 at 03:11 +0000, Bhushan Bharat-R65777 wrote:
> > 
> > > 
> > > Could you explain why we need to set dirty/referenced on the PTE, when we didn't
> > > need to do that before? All we're getting from the PTE is wimg.
> > > We have MMU notifiers to take care of the page being unmapped, and we've already
> > > marked the page itself as dirty if the TLB entry is writeable.
> > 
> > I pulled this code from book3s.
> > 
> > Ben, can you describe why we need this on book3s ?
> 
> If you let the guest write to the page you must set the dirty bit on the PTE
> (or the struct page, at least one of them), similar with accessed on any access.
> 
> If you don't, the VM might swap the page out without writing it back to disk
> for example, assuming it contains no modified data.

We've already marked the page itself as dirty using kvm_set_pfn_dirty(),
and if the VM swaps it out we'll get an MMU notifier callback.  If we
marked the PTE dirty/accessed instead, is there any guarantee it will
stay marked dirty/accessed until the next MMU notifier?

-Scott

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte
  2013-08-03  3:11     ` Bhushan Bharat-R65777
  2013-08-03  4:25       ` Benjamin Herrenschmidt
@ 2013-08-05 16:30       ` Scott Wood
  1 sibling, 0 replies; 27+ messages in thread
From: Scott Wood @ 2013-08-05 16:30 UTC (permalink / raw)
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421, benh@kernel.crashing.org, agraf@suse.de,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org

On Fri, 2013-08-02 at 22:11 -0500, Bhushan Bharat-R65777 wrote:
> > How does wimg get set in the pfnmap case?
> 
> Pfnmap pages are not kernel-managed, right? So should we set I+G there?

It could depend on ppc_md.phys_mem_access_prot().  Can't you pull it
from the PTE regardless of pfnmap?
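One possible shape for what is being suggested, as a user-space sketch (the shift and mask values below are assumed stand-ins for the real e500 definitions, and "no PTE" is modeled as a zero value): take WIMGE from the Linux PTE when one is found, and fall back to cache-inhibited + guarded for pfnmap ranges where no usable PTE attribute is available.

```c
#include <assert.h>

/* Assumed stand-ins for the real e500 MAS2/PTE definitions. */
#define MAS2_I          0x08UL
#define MAS2_G          0x02UL
#define MAS2_WIMGE_MASK 0x1fUL
#define PTE_WIMGE_SHIFT 3

/*
 * Derive TLB WIMGE bits for a mapping: prefer the Linux PTE, else
 * default to I+G for pfnmap (non-kernel-managed) pages.
 * pte_val == 0 models "no PTE found / not present".
 */
static unsigned long wimg_for_mapping(unsigned long pte_val, int pfnmap)
{
    if (pte_val)
        return (pte_val >> PTE_WIMGE_SHIFT) & MAS2_WIMGE_MASK;
    return pfnmap ? (MAS2_I | MAS2_G) : 0;
}
```

This is only a sketch of the fallback policy under discussion, not the actual kvmppc_e500_shadow_map() logic.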

-Scott

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-05 14:27           ` Bhushan Bharat-R65777
@ 2013-08-05 19:19             ` Scott Wood
  2013-08-06  1:12               ` Bhushan Bharat-R65777
                                 ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Scott Wood @ 2013-08-05 19:19 UTC (permalink / raw)
  To: Bhushan Bharat-R65777
  Cc: Benjamin Herrenschmidt, Wood Scott-B07421, agraf@suse.de,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org

On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
> 
> > -----Original Message-----
> > From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> > Sent: Saturday, August 03, 2013 9:54 AM
> > To: Bhushan Bharat-R65777
> > Cc: Wood Scott-B07421; agraf@suse.de; kvm-ppc@vger.kernel.org;
> > kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
> > booke3s
> > 
> > On Sat, 2013-08-03 at 02:58 +0000, Bhushan Bharat-R65777 wrote:
> > > One of the problems I saw was that if I put this code in
> > > asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other
> > > friend functions (on which this code depends) are defined in pgtable.h.
> > > And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h before it
> > > defines pte_present() and friends.
> > >
> > > OK, I will move this into asm/pgtable*.h. Initially I fought with myself
> > > over putting this code in pgtable*, but finally ended up doing it here
> > > (got biased by book3s :)).
> > 
> > Is there a reason why these routines can not be completely generic in pgtable.h
> > ?
> 
> How about the generic function:
> 
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index d257d98..21daf28 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct *mm,
>         return old;
>  }
> 
> +static inline unsigned long pte_read(pte_t *p)
> +{
> +#ifdef PTE_ATOMIC_UPDATES
> +       pte_t pte;
> +       pte_t tmp;
> +       __asm__ __volatile__ (
> +       "1:     ldarx   %0,0,%3\n"
> +       "       andi.   %1,%0,%4\n"
> +       "       bne-    1b\n"
> +       "       ori     %1,%0,%4\n"
> +       "       stdcx.  %1,0,%3\n"
> +       "       bne-    1b"
> +       : "=&r" (pte), "=&r" (tmp), "=m" (*p)
> +       : "r" (p), "i" (_PAGE_BUSY)
> +       : "cc");
> +
> +       return pte;
> +#else
> +       return pte_val(*p);
> +#endif
> +}
>  static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
>                                               unsigned long addr, pte_t *ptep)

Please leave a blank line between functions.

>  {
> diff --git a/arch/powerpc/include/asm/pgtable.h b/arch/powerpc/include/asm/pgtable.h
> index 690c8c2..dad712c 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -254,6 +254,45 @@ static inline pte_t *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,
>  }
>  #endif /* !CONFIG_HUGETLB_PAGE */
> 
> +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> +                                    int writing, unsigned long *pte_sizep)

The name implies that it just reads the PTE.  Setting accessed/dirty
shouldn't be an undocumented side-effect.  Why can't the caller do that
(or a different function that the caller calls afterward if desired)?  

Though even then you have the undocumented side effect of locking the
PTE on certain targets.

> +{
> +       pte_t *ptep;
> +       pte_t pte;
> +       unsigned long ps = *pte_sizep;
> +       unsigned int shift;
> +
> +       ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
> +       if (!ptep)
> +               return __pte(0);
> +       if (shift)
> +               *pte_sizep = 1ul << shift;
> +       else
> +               *pte_sizep = PAGE_SIZE;
> +
> +       if (ps > *pte_sizep)
> +               return __pte(0);
> +
> +       if (!pte_present(*ptep))
> +               return __pte(0);
> +
> +#ifdef CONFIG_PPC64
> +       /* Lock PTE (set _PAGE_BUSY) and read */
> +       pte = pte_read(ptep);
> +#else
> +       pte = pte_val(*ptep);
> +#endif

What about 32-bit platforms that need atomic PTEs?

-Scott

^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-05 19:19             ` Scott Wood
@ 2013-08-06  1:12               ` Bhushan Bharat-R65777
  2013-08-06  7:02               ` Bhushan Bharat-R65777
  2013-08-06 14:46               ` Bhushan Bharat-R65777
  2 siblings, 0 replies; 27+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-06  1:12 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: Benjamin Herrenschmidt, agraf@suse.de, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, August 06, 2013 12:49 AM
> To: Bhushan Bharat-R65777
> Cc: Benjamin Herrenschmidt; Wood Scott-B07421; agraf@suse.de; kvm-
> ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
> booke3s
> 
> On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -----Original Message-----
> > > From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> > > Sent: Saturday, August 03, 2013 9:54 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: Wood Scott-B07421; agraf@suse.de; kvm-ppc@vger.kernel.org;
> > > kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> > > Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
> > > lookup like booke3s
> > >
> > > On Sat, 2013-08-03 at 02:58 +0000, Bhushan Bharat-R65777 wrote:
> > > > One of the problems I saw was that if I put this code in
> > > > asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and the other
> > > > friend functions (on which this code depends) are defined in pgtable.h.
> > > > And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
> > > > before it defines pte_present() and friends.
> > > >
> > > > OK, I will move this into asm/pgtable*.h. Initially I fought with
> > > > myself over putting this code in pgtable*, but finally ended up doing
> > > > it here (got biased by book3s :)).
> > >
> > > Is there a reason why these routines can not be completely generic
> > > in pgtable.h ?
> >
> > How about the generic function:
> >
> > diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
> > b/arch/powerpc/include/asm/pgtable-ppc64.h
> > index d257d98..21daf28 100644
> > --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> > +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> > @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct
> *mm,
> >         return old;
> >  }
> >
> > +static inline unsigned long pte_read(pte_t *p) { #ifdef
> > +PTE_ATOMIC_UPDATES
> > +       pte_t pte;
> > +       pte_t tmp;
> > +       __asm__ __volatile__ (
> > +       "1:     ldarx   %0,0,%3\n"
> > +       "       andi.   %1,%0,%4\n"
> > +       "       bne-    1b\n"
> > +       "       ori     %1,%0,%4\n"
> > +       "       stdcx.  %1,0,%3\n"
> > +       "       bne-    1b"
> > +       : "=&r" (pte), "=&r" (tmp), "=m" (*p)
> > +       : "r" (p), "i" (_PAGE_BUSY)
> > +       : "cc");
> > +
> > +       return pte;
> > +#else
> > +       return pte_val(*p);
> > +#endif
> > +}
> >  static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
> >                                               unsigned long addr,
> > pte_t *ptep)
> 
> Please leave a blank line between functions.
> 
> >  {
> > diff --git a/arch/powerpc/include/asm/pgtable.h
> > b/arch/powerpc/include/asm/pgtable.h
> > index 690c8c2..dad712c 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -254,6 +254,45 @@ static inline pte_t
> > *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }  #endif
> > /* !CONFIG_HUGETLB_PAGE */
> >
> > +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> > +                                    int writing, unsigned long
> > +*pte_sizep)
> 
> The name implies that it just reads the PTE.  Setting accessed/dirty shouldn't
> be an undocumented side-effect.

Ok, will rename and document.

> Why can't the caller do that (or a different
> function that the caller calls afterward if desired)?

The current implementation in book3s is:
 1) find a pte/hugepte
 2) return null if pte not present
 3) take _PAGE_BUSY lock
 4) set accessed/dirty
 5) clear _PAGE_BUSY.

What I tried was:
1) find a pte/hugepte
2) return null if pte not present
3) return the pte (do not take the lock, i.e. do not set _PAGE_BUSY)

4) then the caller calls __ptep_set_access_flags() to atomically update the dirty/accessed flags in the pte.

- but the benchmark results were not good
- also, can there be a race, since we do not take the lock in step 3 and update it in step 4?
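The race in question is the classic lost-update window between the unlocked read (step 3) and the later write-back (step 4). A minimal user-space model (assumed flag values; the kernel uses pte_update()/__ptep_set_access_flags() rather than C11 atomics): step 4 stays safe only if it is itself a compare-and-swap loop, so any concurrent change between the two steps forces a retry instead of being silently overwritten.

```c
#include <assert.h>
#include <stdatomic.h>

/* Assumed stand-in values for the real PTE flag bits. */
#define PAGE_ACCESSED 0x100UL
#define PAGE_DIRTY    0x080UL

/* Step 4 modeled as a CAS loop: set accessed (and dirty, when writing)
 * without losing a concurrent update made since the unlocked read. */
static unsigned long set_access_flags(_Atomic unsigned long *ptep, int writing)
{
    unsigned long old, new_pte;

    do {
        old = atomic_load(ptep);
        new_pte = old | PAGE_ACCESSED;
        if (writing)
            new_pte |= PAGE_DIRTY;
    } while (!atomic_compare_exchange_strong(ptep, &old, new_pte));
    return new_pte;
}
```

If step 4 were a plain read-modify-write instead of a CAS loop, a concurrent clear of the dirty bit (e.g. by the VM) could be lost; the retry is what closes that window.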
  
> 
> Though even then you have the undocumented side effect of locking the PTE on
> certain targets.
> 
> > +{
> > +       pte_t *ptep;
> > +       pte_t pte;
> > +       unsigned long ps = *pte_sizep;
> > +       unsigned int shift;
> > +
> > +       ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
> > +       if (!ptep)
> > +               return __pte(0);
> > +       if (shift)
> > +               *pte_sizep = 1ul << shift;
> > +       else
> > +               *pte_sizep = PAGE_SIZE;
> > +
> > +       if (ps > *pte_sizep)
> > +               return __pte(0);
> > +
> > +       if (!pte_present(*ptep))
> > +               return __pte(0);
> > +
> > +#ifdef CONFIG_PPC64
> > +       /* Lock PTE (set _PAGE_BUSY) and read */
> > +       pte = pte_read(ptep);
> > +#else
> > +       pte = pte_val(*ptep);
> > +#endif
> 
> What about 32-bit platforms that need atomic PTEs?

I called __ptep_set_access_flags() for both 32-bit and 64-bit (for 64-bit I was not calling pte_read()), which handles atomic updates. Somehow the benchmark results were not good; I will try again.

Thanks
-Bharat
> 
> -Scott
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-05 19:19             ` Scott Wood
  2013-08-06  1:12               ` Bhushan Bharat-R65777
@ 2013-08-06  7:02               ` Bhushan Bharat-R65777
  2013-08-07  0:24                 ` Paul Mackerras
  2013-08-06 14:46               ` Bhushan Bharat-R65777
  2 siblings, 1 reply; 27+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-06  7:02 UTC (permalink / raw)
  To: Wood Scott-B07421, Paul Mackerras
  Cc: Benjamin Herrenschmidt, agraf@suse.de, kvm-ppc@vger.kernel.org,
	kvm@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Bhushan Bharat-R65777



> -----Original Message-----
> From: Bhushan Bharat-R65777
> Sent: Tuesday, August 06, 2013 6:42 AM
> To: Wood Scott-B07421
> Cc: Benjamin Herrenschmidt; agraf@suse.de; kvm-ppc@vger.kernel.org;
> kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
> booke3s
> 
> 
> 
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Tuesday, August 06, 2013 12:49 AM
> > To: Bhushan Bharat-R65777
> > Cc: Benjamin Herrenschmidt; Wood Scott-B07421; agraf@suse.de; kvm-
> > ppc@vger.kernel.org; kvm@vger.kernel.org;
> > linuxppc-dev@lists.ozlabs.org
> > Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup
> > like booke3s
> >
> > On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
> > >
> > > > -----Original Message-----
> > > > From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> > > > Sent: Saturday, August 03, 2013 9:54 AM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: Wood Scott-B07421; agraf@suse.de; kvm-ppc@vger.kernel.org;
> > > > kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> > > > Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
> > > > lookup like booke3s
> > > >
> > > > On Sat, 2013-08-03 at 02:58 +0000, Bhushan Bharat-R65777 wrote:
> > > > > One of the problems I saw was that if I put this code in
> > > > > asm/pgtable-32.h and asm/pgtable-64.h, then pte_present() and
> > > > > the other friend functions (on which this code depends) are
> > > > > defined in pgtable.h.
> > > > > And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
> > > > > before it defines pte_present() and friends.
> > > > >
> > > > > OK, I will move this into asm/pgtable*.h. Initially I fought with
> > > > > myself over putting this code in pgtable*, but finally ended up
> > > > > doing it here (got biased by book3s :)).
> > > >
> > > > Is there a reason why these routines can not be completely generic
> > > > in pgtable.h ?
> > >
> > > How about the generic function:
> > >
> > > diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
> > > b/arch/powerpc/include/asm/pgtable-ppc64.h
> > > index d257d98..21daf28 100644
> > > --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> > > +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> > > @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct
> > > mm_struct
> > *mm,
> > >         return old;
> > >  }
> > >
> > > +static inline unsigned long pte_read(pte_t *p)
> > > +{
> > > +#ifdef PTE_ATOMIC_UPDATES
> > > +       unsigned long pte;
> > > +       unsigned long tmp;
> > > +       __asm__ __volatile__ (
> > > +       "1:     ldarx   %0,0,%3\n"
> > > +       "       andi.   %1,%0,%4\n"
> > > +       "       bne-    1b\n"
> > > +       "       ori     %1,%0,%4\n"
> > > +       "       stdcx.  %1,0,%3\n"
> > > +       "       bne-    1b"
> > > +       : "=&r" (pte), "=&r" (tmp), "=m" (*p)
> > > +       : "r" (p), "i" (_PAGE_BUSY)
> > > +       : "cc");
> > > +
> > > +       return pte;
> > > +#else
> > > +       return pte_val(*p);
> > > +#endif
> > > +}
> > >  static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
> > >                                               unsigned long addr,
> > > pte_t *ptep)
> >
> > Please leave a blank line between functions.
> >
> > >  {
> > > diff --git a/arch/powerpc/include/asm/pgtable.h
> > > b/arch/powerpc/include/asm/pgtable.h
> > > index 690c8c2..dad712c 100644
> > > --- a/arch/powerpc/include/asm/pgtable.h
> > > +++ b/arch/powerpc/include/asm/pgtable.h
> > > @@ -254,6 +254,45 @@ static inline pte_t
> > > *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }
> > > #endif
> > > /* !CONFIG_HUGETLB_PAGE */
> > >
> > > +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> > > +                                    int writing, unsigned long
> > > +*pte_sizep)
> >
> > The name implies that it just reads the PTE.  Setting accessed/dirty
> > shouldn't be an undocumented side-effect.
> 
> Ok, will rename and document.
> 
> > Why can't the caller do that (or a different function that the caller
> > calls afterward if desired)?
> 
> The current implementation in book3s is;
>  1) find a pte/hugepte
>  2) return null if pte not present
>  3) take _PAGE_BUSY lock
>  4) set accessed/dirty
>  5) clear _PAGE_BUSY.
> 
> What I tried was
> 1) find a pte/hugepte
> 2) return null if pte not present
> 3) return pte (not take lock by not setting _PAGE_BUSY)
> 
> 4) then the caller calls __ptep_set_access_flags() to atomically update
> the dirty/accessed flags in the pte.
> 
> - but the benchmark results were not good
> - Also, can there be a race, since we do not take the lock in step 3 and
>   update in step 4?
> 
> >
> > Though even then you have the undocumented side effect of locking the
> > PTE on certain targets.
> >
> > > +{
> > > +       pte_t *ptep;
> > > +       pte_t pte;
> > > +       unsigned long ps = *pte_sizep;
> > > +       unsigned int shift;
> > > +
> > > +       ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
> > > +       if (!ptep)
> > > +               return __pte(0);
> > > +       if (shift)
> > > +               *pte_sizep = 1ul << shift;
> > > +       else
> > > +               *pte_sizep = PAGE_SIZE;
> > > +
> > > +       if (ps > *pte_sizep)
> > > +               return __pte(0);
> > > +
> > > +       if (!pte_present(*ptep))
> > > +               return __pte(0);
> > > +
> > > +#ifdef CONFIG_PPC64
> > > +       /* Lock PTE (set _PAGE_BUSY) and read */
> > > +       pte = pte_read(ptep);
> > > +#else
> > > +       pte = pte_val(*ptep);
> > > +#endif
> >
> > What about 32-bit platforms that need atomic PTEs?
> 
> I called __ptep_set_access_flags() for both 32/64-bit (for 64-bit I was not
> calling pte_read()), which handles atomic updates. Somehow the benchmark
> results were not good; I will try again.

Hi Paul,

I am trying to make the Linux pte search and update code generic so that it can be used across powerpc as well.

I am not sure which of the two options below is correct; please help

--------------------------- (1) This --------------------------
/*
 * Lock and read a linux PTE.  If it's present and writable, atomically
 * set the dirty and referenced bits; return the (possibly updated) PTE.
 */
static inline pte_t atomic_read_update_pte_flags(pte_t *ptep, int writing)
{
        pte_t pte;
#ifdef CONFIG_PPC64
        /* Lock the PTE (set _PAGE_BUSY) and read it.
         * XXX: NOTE: Some architectures (book3e) do not have pte locking,
         * so this is not generic in that sense for all architectures.
         */
        pte = __pte(pte_read(ptep));
#else
        /* XXX: I do not see any read locking on 32-bit architectures */
        pte = *ptep;
#endif
        if (pte_present(pte)) {
                pte = pte_mkyoung(pte);
                if (writing && pte_write(pte))
                        pte = pte_mkdirty(pte);
        }

        *ptep = pte;    /* 64-bit: also unlocks the pte (clears _PAGE_BUSY) */

        return pte;
}


--------------------- (2) OR THIS ----------------------------------

/*
 * Read a linux PTE.  If it's present and writable, atomically
 * set the dirty and referenced bits; return the (possibly updated) PTE.
 */
static inline pte_t atomic_read_update_pte_flags(pte_t *ptep, int writing)
{
        pte_t pte;

        pte = *ptep;
        if (pte_present(pte)) {
                pte = pte_mkyoung(pte);
                if (writing && pte_write(pte))
                        pte = pte_mkdirty(pte);
        }

        __ptep_set_access_flags(ptep, pte);

        return *ptep;
}

Thanks
-Bharat

> 
> Thanks
> -Bharat
> >
> > -Scott
> >


^ permalink raw reply	[flat|nested] 27+ messages in thread

* RE: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-05 19:19             ` Scott Wood
  2013-08-06  1:12               ` Bhushan Bharat-R65777
  2013-08-06  7:02               ` Bhushan Bharat-R65777
@ 2013-08-06 14:46               ` Bhushan Bharat-R65777
  2 siblings, 0 replies; 27+ messages in thread
From: Bhushan Bharat-R65777 @ 2013-08-06 14:46 UTC (permalink / raw)
  To: Wood Scott-B07421
  Cc: linuxppc-dev@lists.ozlabs.org, agraf@suse.de,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, August 06, 2013 12:49 AM
> To: Bhushan Bharat-R65777
> Cc: Benjamin Herrenschmidt; Wood Scott-B07421; agraf@suse.de; kvm-
> ppc@vger.kernel.org; kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
> booke3s
> 
> On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -----Original Message-----
> > > From: Benjamin Herrenschmidt [mailto:benh@kernel.crashing.org]
> > > Sent: Saturday, August 03, 2013 9:54 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: Wood Scott-B07421; agraf@suse.de; kvm-ppc@vger.kernel.org;
> > > kvm@vger.kernel.org; linuxppc-dev@lists.ozlabs.org
> > > Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
> > > lookup like booke3s
> > >
> > > On Sat, 2013-08-03 at 02:58 +0000, Bhushan Bharat-R65777 wrote:
> > > > One of the problems I saw was that if I put this code in
> > > > asm/pgtable-32.h and asm/pgtable-64.h then pte_present() and other
> > > > friend functions (on which this code depends) are defined in pgtable.h.
> > > > And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
> > > > before it defines pte_present() and friends.
> > > >
> > > > Ok, I moved this into asm/pgtable*.h; initially I fought with
> > > > myself to take this code into pgtable*, but finally ended up doing
> > > > it here (got biased by book3s :)).
> > >
> > > Is there a reason why these routines can not be completely generic
> > > in pgtable.h ?
> >
> > How about the generic function:
> >
> > diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
> > b/arch/powerpc/include/asm/pgtable-ppc64.h
> > index d257d98..21daf28 100644
> > --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> > +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> > @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct
> *mm,
> >         return old;
> >  }
> >
> > +static inline unsigned long pte_read(pte_t *p)
> > +{
> > +#ifdef PTE_ATOMIC_UPDATES
> > +       unsigned long pte;
> > +       unsigned long tmp;
> > +       __asm__ __volatile__ (
> > +       "1:     ldarx   %0,0,%3\n"
> > +       "       andi.   %1,%0,%4\n"
> > +       "       bne-    1b\n"
> > +       "       ori     %1,%0,%4\n"
> > +       "       stdcx.  %1,0,%3\n"
> > +       "       bne-    1b"
> > +       : "=&r" (pte), "=&r" (tmp), "=m" (*p)
> > +       : "r" (p), "i" (_PAGE_BUSY)
> > +       : "cc");
> > +
> > +       return pte;
> > +#else
> > +       return pte_val(*p);
> > +#endif
> > +}
> >  static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
> >                                               unsigned long addr,
> > pte_t *ptep)
> 
> Please leave a blank line between functions.
> 
> >  {
> > diff --git a/arch/powerpc/include/asm/pgtable.h
> > b/arch/powerpc/include/asm/pgtable.h
> > index 690c8c2..dad712c 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -254,6 +254,45 @@ static inline pte_t
> > *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }  #endif
> > /* !CONFIG_HUGETLB_PAGE */
> >
> > +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> > +                                    int writing, unsigned long
> > +*pte_sizep)
> 
> The name implies that it just reads the PTE.  Setting accessed/dirty shouldn't
> be an undocumented side-effect.  Why can't the caller do that (or a different
> function that the caller calls afterward if desired)?

Scott, I sent the next version of the patch based on the above idea. Now I think we do not need to update the pte flags on booke,
so we do not need the kvmppc_read_update_linux_pte() machinery from book3s.

-Bharat

> 
> Though even then you have the undocumented side effect of locking the PTE on
> certain targets.
> 
> > +{
> > +       pte_t *ptep;
> > +       pte_t pte;
> > +       unsigned long ps = *pte_sizep;
> > +       unsigned int shift;
> > +
> > +       ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
> > +       if (!ptep)
> > +               return __pte(0);
> > +       if (shift)
> > +               *pte_sizep = 1ul << shift;
> > +       else
> > +               *pte_sizep = PAGE_SIZE;
> > +
> > +       if (ps > *pte_sizep)
> > +               return __pte(0);
> > +
> > +       if (!pte_present(*ptep))
> > +               return __pte(0);
> > +
> > +#ifdef CONFIG_PPC64
> > +       /* Lock PTE (set _PAGE_BUSY) and read */
> > +       pte = pte_read(ptep);
> > +#else
> > +       pte = pte_val(*ptep);
> > +#endif
> 
> What about 32-bit platforms that need atomic PTEs?
> 
> -Scott
> 

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-06  7:02               ` Bhushan Bharat-R65777
@ 2013-08-07  0:24                 ` Paul Mackerras
  2013-08-07  1:11                   ` Scott Wood
  0 siblings, 1 reply; 27+ messages in thread
From: Paul Mackerras @ 2013-08-07  0:24 UTC (permalink / raw)
  To: Bhushan Bharat-R65777
  Cc: Wood Scott-B07421, Benjamin Herrenschmidt, agraf@suse.de,
	kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org

On Tue, Aug 06, 2013 at 07:02:48AM +0000, Bhushan Bharat-R65777 wrote:
> 
> I am trying to make the Linux pte search and update code generic so that it can be used across powerpc as well.
> 
> I am not sure which of the two options below is correct; please help

Given that the BookE code uses gfn_to_pfn_memslot() to get the host
pfn, and then kvm_set_pfn_dirty(pfn) on pages that you're going to let
the guest write to, I don't think you need to set the dirty and/or
accessed bits in the Linux PTE explicitly.  If you care about the
WIMGE bits you can do find_linux_pte_or_hugepte() and just look at the
PTE, but you really should be using mmu_notifier_retry() to guard
against concurrent changes to the Linux PTE.  See the HV KVM code or
patch 21 of my recent series to see how it's used.  You probably
should be calling kvm_set_pfn_accessed() as well.

Paul.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-07  0:24                 ` Paul Mackerras
@ 2013-08-07  1:11                   ` Scott Wood
  2013-08-07  1:47                     ` Paul Mackerras
  0 siblings, 1 reply; 27+ messages in thread
From: Scott Wood @ 2013-08-07  1:11 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Bhushan Bharat-R65777, Wood Scott-B07421, Benjamin Herrenschmidt,
	agraf@suse.de, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org

On Wed, 2013-08-07 at 10:24 +1000, Paul Mackerras wrote:
> On Tue, Aug 06, 2013 at 07:02:48AM +0000, Bhushan Bharat-R65777 wrote:
> > 
> > I am trying to make the Linux pte search and update code generic so that it can be used across powerpc as well.
> > 
> > I am not sure which of the two options below is correct; please help
> 
> Given that the BookE code uses gfn_to_pfn_memslot() to get the host
> pfn, and then kvm_set_pfn_dirty(pfn) on pages that you're going to let
> the guest write to, I don't think you need to set the dirty and/or
> accessed bits in the Linux PTE explicitly.  If you care about the
> WIMGE bits you can do find_linux_pte_or_hugepte() and just look at the
> PTE, but you really should be using mmu_notifier_retry() to guard
> against concurrent changes to the Linux PTE.  See the HV KVM code or
> patch 21 of my recent series to see how it's used. 

Hmm... we only get a callback on invalidate_range_start(), not
invalidate_range_end() (and even if we did get a callback for the
latter, it'd probably be racy).  So we may have a problem here
regardless of getting WIMG from the PTE, unless it's guaranteed that
hva_to_pfn() will fail after invalidate_range_start().

>  You probably should be calling kvm_set_pfn_accessed() as well.

Yeah...  I think it'll only affect the quality of page-out decisions (as
opposed to corruption and such), but still it should be fixed.

-Scott

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s
  2013-08-07  1:11                   ` Scott Wood
@ 2013-08-07  1:47                     ` Paul Mackerras
  0 siblings, 0 replies; 27+ messages in thread
From: Paul Mackerras @ 2013-08-07  1:47 UTC (permalink / raw)
  To: Scott Wood
  Cc: Bhushan Bharat-R65777, Wood Scott-B07421, Benjamin Herrenschmidt,
	agraf@suse.de, kvm-ppc@vger.kernel.org, kvm@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org

On Tue, Aug 06, 2013 at 08:11:34PM -0500, Scott Wood wrote:
> On Wed, 2013-08-07 at 10:24 +1000, Paul Mackerras wrote:
> > On Tue, Aug 06, 2013 at 07:02:48AM +0000, Bhushan Bharat-R65777 wrote:
> > > 
> > > I am trying to make the Linux pte search and update code generic so that it can be used across powerpc as well.
> > > 
> > > I am not sure which of the two options below is correct; please help
> > 
> > Given that the BookE code uses gfn_to_pfn_memslot() to get the host
> > pfn, and then kvm_set_pfn_dirty(pfn) on pages that you're going to let
> > the guest write to, I don't think you need to set the dirty and/or
> > accessed bits in the Linux PTE explicitly.  If you care about the
> > WIMGE bits you can do find_linux_pte_or_hugepte() and just look at the
> > PTE, but you really should be using mmu_notifier_retry() to guard
> > against concurrent changes to the Linux PTE.  See the HV KVM code or
> > patch 21 of my recent series to see how it's used. 
> 
> Hmm... we only get a callback on invalidate_range_start(), not
> invalidate_range_end() (and even if we did get a callback for the
> latter, it'd probably be racy).  So we may have a problem here
> regardless of getting WIMG from the PTE, unless it's guaranteed that
> hva_to_pfn() will fail after invalidate_range_start().

No, it's not guaranteed.  You have to use mmu_notifier_retry().  It
tells you if either (a) some sort of invalidation has happened since
you snapshotted kvm->mmu_notifier_seq, or (b) an
invalidate_range_start...end sequence is currently in progress.  In
either case you should discard any PTE or pfn information you
collected and retry.

> >  You probably should be calling kvm_set_pfn_accessed() as well.
> 
> Yeah...  I think it'll only affect the quality of page-out decisions (as
> opposed to corruption and such), but still it should be fixed.

Right.

Paul.

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2013-08-07  1:47 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-01 11:12 [PATCH 0/6 v2] kvm: powerpc: use cache attributes from linux pte Bharat Bhushan
2013-08-01 11:12 ` [PATCH 1/6 v2] powerpc: book3e: _PAGE_LENDIAN must be _PAGE_ENDIAN Bharat Bhushan
2013-08-01 11:12 ` [PATCH 2/6 v2] kvm: powerpc: allow guest control "E" attribute in mas2 Bharat Bhushan
2013-08-01 11:12 ` [PATCH 3/6 v2] kvm: powerpc: allow guest control "G" " Bharat Bhushan
2013-08-02  6:39   ` "tiejun.chen"
2013-08-01 11:12 ` [PATCH 4/6 v2] powerpc: move linux pte/hugepte search to more generic file Bharat Bhushan
2013-08-01 11:12 ` [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s Bharat Bhushan
2013-08-02  6:37   ` "tiejun.chen"
2013-08-02 22:58   ` Scott Wood
2013-08-02 23:16     ` Benjamin Herrenschmidt
2013-08-03  2:58       ` Bhushan Bharat-R65777
2013-08-03  4:24         ` Benjamin Herrenschmidt
2013-08-05 14:27           ` Bhushan Bharat-R65777
2013-08-05 19:19             ` Scott Wood
2013-08-06  1:12               ` Bhushan Bharat-R65777
2013-08-06  7:02               ` Bhushan Bharat-R65777
2013-08-07  0:24                 ` Paul Mackerras
2013-08-07  1:11                   ` Scott Wood
2013-08-07  1:47                     ` Paul Mackerras
2013-08-06 14:46               ` Bhushan Bharat-R65777
2013-08-01 11:12 ` [PATCH 6/6 v2] kvm: powerpc: use caching attributes as per linux pte Bharat Bhushan
2013-08-02  6:24   ` "tiejun.chen"
2013-08-02 23:34   ` Scott Wood
2013-08-03  3:11     ` Bhushan Bharat-R65777
2013-08-03  4:25       ` Benjamin Herrenschmidt
2013-08-05 16:28         ` Scott Wood
2013-08-05 16:30       ` Scott Wood
