LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v7 14/28] powerpc: Add a probe_kernel_read_inst() function
From: Alistair Popple @ 2020-05-04  9:24 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-15-jniethe5@gmail.com>

> @@ -524,7 +524,10 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned
> long addr) struct module *mod = rec->arch.mod;
> 
>  	/* read where this goes */
> -	if (probe_kernel_read(op, ip, sizeof(op)))
> +	if (probe_kernel_read_inst(op, ip))
> +		return -EFAULT;
> +
> +	if (probe_kernel_read_inst(op + 1, ip + 4))
>  		return -EFAULT;

I had to double check the above for what happens when we introduce prefix 
instructions but it looks mostly correct. There does however look to be a 
corner case that could alter the behaviour once prefix instructions are 
introduced.

With prefix instructions probe_kernel_read_inst() will read 8 bytes if the first 
4 bytes are a valid prefix. Therefore the above could end up trying to read 12 
bytes in total if the ip is a normal instruction and ip+4 is prefixed.

Obviously this is never going to match the expected nop sequence, and prefixed 
instructions shouldn't cross page boundaries so the extra 4 bytes should never 
be the cause of a fault either. The only difference we might see is 
ftrace_make_call() incorrectly returning -EFAULT instead of -EINVAL for an 
invalid (ie. crossing a 64 byte boundary) prefix instruction sequence.

In practice this doesn't seem like it would cause any real issues and the rest 
of the patch does not appear to change any existing behaviour.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

> 
>  	if (!expected_nop_sequence(ip, op[0], op[1])) {
> @@ -587,7 +590,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long
> addr) unsigned long ip = rec->ip;
> 
>  	/* read where this goes */
> -	if (probe_kernel_read(&op, (void *)ip, MCOUNT_INSN_SIZE))
> +	if (probe_kernel_read_inst(&op, (void *)ip))
>  		return -EFAULT;
> 
>  	/* It should be pointing to a nop */
> @@ -643,7 +646,7 @@ static int __ftrace_make_call_kernel(struct dyn_ftrace
> *rec, unsigned long addr) }
> 
>  	/* Make sure we have a nop */
> -	if (probe_kernel_read(&op, ip, sizeof(op))) {
> +	if (probe_kernel_read_inst(&op, ip)) {
>  		pr_err("Unable to read ftrace location %p\n", ip);
>  		return -EFAULT;
>  	}
> @@ -721,7 +724,7 @@ __ftrace_modify_call(struct dyn_ftrace *rec, unsigned
> long old_addr, }
> 
>  	/* read where this goes */
> -	if (probe_kernel_read(&op, (void *)ip, sizeof(int))) {
> +	if (probe_kernel_read_inst(&op, (void *)ip)) {
>  		pr_err("Fetching opcode failed.\n");
>  		return -EFAULT;
>  	}
> diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
> index eaf786afad2b..08dedd927268 100644
> --- a/arch/powerpc/lib/inst.c
> +++ b/arch/powerpc/lib/inst.c
> @@ -16,3 +16,14 @@ int probe_user_read_inst(struct ppc_inst *inst,
>  	*inst = ppc_inst(val);
>  	return err;
>  }
> +
> +int probe_kernel_read_inst(struct ppc_inst *inst,
> +			   struct ppc_inst *src)
> +{
> +	unsigned int val;
> +	int err;
> +
> +	err = probe_kernel_read(&val, src, sizeof(val));
> +	*inst = ppc_inst(val);
> +	return err;
> +}

^ permalink raw reply

* [PATCH v2 2/2] powerpc/64s/hash: add torture_hpt kernel boot option to increase hash faults
From: Nicholas Piggin @ 2020-05-04  9:06 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin
In-Reply-To: <20200504090658.44996-1-npiggin@gmail.com>

This option increases the number of hash misses by limiting the number of
kernel HPT entries, by accessing the address immediately after installing
the PTE, then removing it again (except in the case of CI entries that
must not be accessed, these are removed on the next hash fault).

This helps stress test difficult to hit paths in the kernel.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
v2:
- Add a bit more to the changelog.
- Add some kuap stuff for when Aneesh's kaup for hash patches are
  merged.

 .../admin-guide/kernel-parameters.txt         |  9 +++
 arch/powerpc/include/asm/book3s/64/mmu-hash.h | 10 +++
 arch/powerpc/mm/book3s64/hash_4k.c            |  3 +
 arch/powerpc/mm/book3s64/hash_64k.c           |  8 ++
 arch/powerpc/mm/book3s64/hash_utils.c         | 76 ++++++++++++++++++-
 5 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 5a34b7dd9ebe..1ec6a32a717a 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -876,6 +876,15 @@
 			them frequently to increase the rate of SLB faults
 			on kernel addresses.
 
+	torture_hpt	[PPC]
+			Limits the number of kernel HPT entries in the hash
+			page table to increase the rate of hash page table
+			faults on kernel addresses.
+
+			This may hang when run on processors / emulators which
+			do not have a TLB, or flush it more often than
+			required, QEMU seems to have problems.
+
 	disable=	[IPV6]
 			See Documentation/networking/ipv6.txt.
 
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 758de1e0f676..539e3d91eac4 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -324,6 +324,16 @@ static inline bool torture_slb(void)
 	return static_branch_unlikely(&torture_slb_key);
 }
 
+extern bool torture_hpt_enabled;
+DECLARE_STATIC_KEY_FALSE(torture_hpt_key);
+static inline bool torture_hpt(void)
+{
+	return static_branch_unlikely(&torture_hpt_key);
+}
+
+void hpt_do_torture(unsigned long ea, unsigned long access,
+		    unsigned long rflags, unsigned long hpte_group);
+
 /*
  * This computes the AVPN and B fields of the first dword of a HPTE,
  * for use when we want to match an existing PTE.  The bottom 7 bits
diff --git a/arch/powerpc/mm/book3s64/hash_4k.c b/arch/powerpc/mm/book3s64/hash_4k.c
index 22e787123cdf..54e4ff8c558d 100644
--- a/arch/powerpc/mm/book3s64/hash_4k.c
+++ b/arch/powerpc/mm/book3s64/hash_4k.c
@@ -118,6 +118,9 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 		}
 		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | H_PAGE_HASHPTE;
 		new_pte |= pte_set_hidx(ptep, rpte, 0, slot, PTRS_PER_PTE);
+
+		if (torture_hpt())
+			hpt_do_torture(ea, access, rflags, hpte_group);
 	}
 	*ptep = __pte(new_pte & ~H_PAGE_BUSY);
 	return 0;
diff --git a/arch/powerpc/mm/book3s64/hash_64k.c b/arch/powerpc/mm/book3s64/hash_64k.c
index 7084ce2951e6..19ea0fc145a9 100644
--- a/arch/powerpc/mm/book3s64/hash_64k.c
+++ b/arch/powerpc/mm/book3s64/hash_64k.c
@@ -216,6 +216,9 @@ int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid,
 	new_pte |= pte_set_hidx(ptep, rpte, subpg_index, slot, PTRS_PER_PTE);
 	new_pte |= H_PAGE_HASHPTE;
 
+	if (torture_hpt())
+		hpt_do_torture(ea, access, rflags, hpte_group);
+
 	*ptep = __pte(new_pte & ~H_PAGE_BUSY);
 	return 0;
 }
@@ -327,7 +330,12 @@ int __hash_page_64K(unsigned long ea, unsigned long access,
 
 		new_pte = (new_pte & ~_PAGE_HPTEFLAGS) | H_PAGE_HASHPTE;
 		new_pte |= pte_set_hidx(ptep, rpte, 0, slot, PTRS_PER_PTE);
+
+		if (torture_hpt())
+			hpt_do_torture(ea, access, rflags, hpte_group);
 	}
+
 	*ptep = __pte(new_pte & ~H_PAGE_BUSY);
+
 	return 0;
 }
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 9c487b5782ef..845da1e8ca4f 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -353,8 +353,12 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend,
 	return ret;
 }
 
-static bool disable_1tb_segments = false;
+static bool disable_1tb_segments __read_mostly = false;
 bool torture_slb_enabled __read_mostly = false;
+bool torture_hpt_enabled __read_mostly = false;
+
+/* per-CPU array allocated if we enable torture_hpt. */
+static unsigned long *torture_hpt_last_group;
 
 static int __init parse_disable_1tb_segments(char *p)
 {
@@ -370,6 +374,13 @@ static int __init parse_torture_slb(char *p)
 }
 early_param("torture_slb", parse_torture_slb);
 
+static int __init parse_torture_hpt(char *p)
+{
+	torture_hpt_enabled = true;
+	return 0;
+}
+early_param("torture_hpt", parse_torture_hpt);
+
 static int __init htab_dt_scan_seg_sizes(unsigned long node,
 					 const char *uname, int depth,
 					 void *data)
@@ -863,6 +874,7 @@ static void __init hash_init_partition_table(phys_addr_t hash_table,
 }
 
 DEFINE_STATIC_KEY_FALSE(torture_slb_key);
+DEFINE_STATIC_KEY_FALSE(torture_hpt_key);
 
 static void __init htab_initialize(void)
 {
@@ -882,6 +894,15 @@ static void __init htab_initialize(void)
 
 	if (torture_slb_enabled)
 		static_branch_enable(&torture_slb_key);
+	if (torture_hpt_enabled) {
+		unsigned long tmp;
+		static_branch_enable(&torture_hpt_key);
+		tmp = memblock_phys_alloc_range(sizeof(unsigned long) * NR_CPUS,
+						  0,
+						  0, MEMBLOCK_ALLOC_ANYWHERE);
+		memset((void *)tmp, 0xff, sizeof(unsigned long) * NR_CPUS);
+		torture_hpt_last_group = __va(tmp);
+	}
 
 	/*
 	 * Calculate the required size of the htab.  We want the number of
@@ -1901,6 +1922,59 @@ long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
 	return slot;
 }
 
+void hpt_do_torture(unsigned long ea, unsigned long access,
+		    unsigned long rflags, unsigned long hpte_group)
+{
+	unsigned long last_group;
+	int cpu = raw_smp_processor_id();
+
+	last_group = torture_hpt_last_group[cpu];
+	if (last_group != -1UL) {
+		while (mmu_hash_ops.hpte_remove(last_group) != -1)
+			;
+		torture_hpt_last_group[cpu] = -1UL;
+	}
+
+#define QEMU_WORKAROUND	0
+
+	if (ea >= PAGE_OFFSET) {
+		if (!QEMU_WORKAROUND && (access & (_PAGE_READ|_PAGE_WRITE)) &&
+		    !(rflags & (HPTE_R_I|HPTE_R_G))) {
+			/* prefetch / prefetchw does not seem to set up a TLB
+			 * entry with the powerpc systemsim (mambo) emulator,
+			 * though it works with real hardware. An alternative
+			 * approach that would work more reliably on quirky
+			 * emulators like QEMU may be to remember the last
+			 * insertion and remove that, rather than removing the
+			 * current insertion. Then no prefetch is required.
+			 */
+			if ((access & _PAGE_WRITE) && (access & _PAGE_READ)) {
+				void __user *mem = (void *)(ea & ~0x3);
+
+				allow_read_write_user(mem, mem, sizeof(atomic_t));
+				atomic_add(0, (atomic_t *)mem);
+				prevent_read_write_user(mem, mem, sizeof(atomic_t));
+
+			} else if (access & _PAGE_READ) {
+				void __user *mem = (void *)ea;
+
+				allow_read_from_user(mem, 1);
+				(void)*(volatile char *)mem;
+				prevent_read_from_user(mem, 1);
+			}
+
+			mb();
+
+			while (mmu_hash_ops.hpte_remove(hpte_group) != -1)
+				;
+		} else {
+			/* Can't prefetch cache-inhibited so clear next time. */
+			torture_hpt_last_group[cpu] = hpte_group;
+		}
+	}
+}
+
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
 {
-- 
2.23.0


^ permalink raw reply related

* [PATCH v2 1/2] powerpc/64s/hash: add torture_slb kernel boot option to increase SLB faults
From: Nicholas Piggin @ 2020-05-04  9:06 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Aneesh Kumar K . V, Nicholas Piggin

This option increases the number of SLB misses by limiting the number of
kernel SLB entries, and increased flushing of cached lookaside information.
This helps stress test difficult to hit paths in the kernel.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
v2:
- Address some comments from Aneesh about function names and types

 .../admin-guide/kernel-parameters.txt         |   5 +
 arch/powerpc/include/asm/book3s/64/mmu-hash.h |   7 +
 arch/powerpc/mm/book3s64/hash_utils.c         |  13 ++
 arch/powerpc/mm/book3s64/slb.c                | 152 ++++++++++++------
 4 files changed, 132 insertions(+), 45 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index f2a93c8679e8..5a34b7dd9ebe 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -871,6 +871,11 @@
 			can be useful when debugging issues that require an SLB
 			miss to occur.
 
+	torture_slb	[PPC]
+			Limits the number of kernel SLB entries, and flushes
+			them frequently to increase the rate of SLB faults
+			on kernel addresses.
+
 	disable=	[IPV6]
 			See Documentation/networking/ipv6.txt.
 
diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 3fa1b962dc27..758de1e0f676 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -317,6 +317,13 @@ extern unsigned long tce_alloc_start, tce_alloc_end;
  */
 extern int mmu_ci_restrictions;
 
+extern bool torture_slb_enabled;
+DECLARE_STATIC_KEY_FALSE(torture_slb_key);
+static inline bool torture_slb(void)
+{
+	return static_branch_unlikely(&torture_slb_key);
+}
+
 /*
  * This computes the AVPN and B fields of the first dword of a HPTE,
  * for use when we want to match an existing PTE.  The bottom 7 bits
diff --git a/arch/powerpc/mm/book3s64/hash_utils.c b/arch/powerpc/mm/book3s64/hash_utils.c
index 8ed2411c3f39..9c487b5782ef 100644
--- a/arch/powerpc/mm/book3s64/hash_utils.c
+++ b/arch/powerpc/mm/book3s64/hash_utils.c
@@ -354,6 +354,7 @@ int htab_remove_mapping(unsigned long vstart, unsigned long vend,
 }
 
 static bool disable_1tb_segments = false;
+bool torture_slb_enabled __read_mostly = false;
 
 static int __init parse_disable_1tb_segments(char *p)
 {
@@ -362,6 +363,13 @@ static int __init parse_disable_1tb_segments(char *p)
 }
 early_param("disable_1tb_segments", parse_disable_1tb_segments);
 
+static int __init parse_torture_slb(char *p)
+{
+	torture_slb_enabled = true;
+	return 0;
+}
+early_param("torture_slb", parse_torture_slb);
+
 static int __init htab_dt_scan_seg_sizes(unsigned long node,
 					 const char *uname, int depth,
 					 void *data)
@@ -854,6 +862,8 @@ static void __init hash_init_partition_table(phys_addr_t hash_table,
 	pr_info("Partition table %p\n", partition_tb);
 }
 
+DEFINE_STATIC_KEY_FALSE(torture_slb_key);
+
 static void __init htab_initialize(void)
 {
 	unsigned long table;
@@ -870,6 +880,9 @@ static void __init htab_initialize(void)
 		printk(KERN_INFO "Using 1TB segments\n");
 	}
 
+	if (torture_slb_enabled)
+		static_branch_enable(&torture_slb_key);
+
 	/*
 	 * Calculate the required size of the htab.  We want the number of
 	 * PTEGs to equal one half the number of real pages.
diff --git a/arch/powerpc/mm/book3s64/slb.c b/arch/powerpc/mm/book3s64/slb.c
index 716204aee3da..cf41c4b63d95 100644
--- a/arch/powerpc/mm/book3s64/slb.c
+++ b/arch/powerpc/mm/book3s64/slb.c
@@ -68,7 +68,7 @@ static void assert_slb_presence(bool present, unsigned long ea)
 	 * slbfee. requires bit 24 (PPC bit 39) be clear in RB. Hardware
 	 * ignores all other bits from 0-27, so just clear them all.
 	 */
-	ea &= ~((1UL << 28) - 1);
+	ea &= ~((1UL << SID_SHIFT) - 1);
 	asm volatile(__PPC_SLBFEE_DOT(%0, %1) : "=r"(tmp) : "r"(ea) : "cr0");
 
 	WARN_ON(present == (tmp == 0));
@@ -153,14 +153,42 @@ void slb_flush_all_realmode(void)
 	asm volatile("slbmte %0,%0; slbia" : : "r" (0));
 }
 
+static __always_inline void __slb_flush_and_restore_bolted(bool preserve_kernel_lookaside)
+{
+	struct slb_shadow *p = get_slb_shadow();
+	unsigned long ksp_esid_data, ksp_vsid_data;
+	u32 ih;
+
+	/*
+	 * SLBIA IH=1 on ISA v2.05 and newer processors may preserve lookaside
+	 * information created with Class=0 entries, which we use for kernel
+	 * SLB entries (the SLB entries themselves are still invalidated).
+	 *
+	 * Older processors will ignore this optimisation. Over-invalidation
+	 * is fine because we never rely on lookaside information existing.
+	 */
+	if (preserve_kernel_lookaside)
+		ih = 1;
+	else
+		ih = 0;
+
+	ksp_esid_data = be64_to_cpu(p->save_area[KSTACK_INDEX].esid);
+	ksp_vsid_data = be64_to_cpu(p->save_area[KSTACK_INDEX].vsid);
+
+	asm volatile(PPC_SLBIA(%0)"	\n"
+		     "slbmte	%1, %2	\n"
+		     :: "i" (ih),
+			"r" (ksp_vsid_data),
+			"r" (ksp_esid_data)
+		     : "memory");
+}
+
 /*
  * This flushes non-bolted entries, it can be run in virtual mode. Must
  * be called with interrupts disabled.
  */
 void slb_flush_and_restore_bolted(void)
 {
-	struct slb_shadow *p = get_slb_shadow();
-
 	BUILD_BUG_ON(SLB_NUM_BOLTED != 2);
 
 	WARN_ON(!irqs_disabled());
@@ -171,13 +199,10 @@ void slb_flush_and_restore_bolted(void)
 	 */
 	hard_irq_disable();
 
-	asm volatile("isync\n"
-		     "slbia\n"
-		     "slbmte  %0, %1\n"
-		     "isync\n"
-		     :: "r" (be64_to_cpu(p->save_area[KSTACK_INDEX].vsid)),
-			"r" (be64_to_cpu(p->save_area[KSTACK_INDEX].esid))
-		     : "memory");
+	isync();
+	__slb_flush_and_restore_bolted(false);
+	isync();
+
 	assert_slb_presence(true, get_paca()->kstack);
 
 	get_paca()->slb_cache_ptr = 0;
@@ -400,6 +425,30 @@ void preload_new_slb_context(unsigned long start, unsigned long sp)
 	local_irq_enable();
 }
 
+static void slb_cache_slbie_kernel(unsigned int index)
+{
+	unsigned long slbie_data = get_paca()->slb_cache[index];
+	unsigned long ksp = get_paca()->kstack;
+
+	slbie_data <<= SID_SHIFT;
+	slbie_data |= 0xc000000000000000ULL;
+	if ((ksp & slb_esid_mask(mmu_kernel_ssize)) == slbie_data)
+		return;
+	slbie_data |= mmu_kernel_ssize << SLBIE_SSIZE_SHIFT;
+
+	asm volatile("slbie %0" : : "r" (slbie_data));
+}
+
+static void slb_cache_slbie_user(unsigned int index)
+{
+	unsigned long slbie_data = get_paca()->slb_cache[index];
+
+	slbie_data <<= SID_SHIFT;
+	slbie_data |= user_segment_size(slbie_data) << SLBIE_SSIZE_SHIFT;
+	slbie_data |= SLBIE_C; /* user slbs have C=1 */
+
+	asm volatile("slbie %0" : : "r" (slbie_data));
+}
 
 /* Flush all user entries from the segment table of the current processor. */
 void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
@@ -414,8 +463,14 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 	 * which would update the slb_cache/slb_cache_ptr fields in the PACA.
 	 */
 	hard_irq_disable();
-	asm volatile("isync" : : : "memory");
-	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
+	isync();
+	if (torture_slb()) {
+		__slb_flush_and_restore_bolted(false);
+		isync();
+		get_paca()->slb_cache_ptr = 0;
+		get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
+
+	} else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
 		/*
 		 * SLBIA IH=3 invalidates all Class=1 SLBEs and their
 		 * associated lookaside structures, which matches what
@@ -423,47 +478,29 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 		 * cache.
 		 */
 		asm volatile(PPC_SLBIA(3));
+
 	} else {
 		unsigned long offset = get_paca()->slb_cache_ptr;
 
 		if (!mmu_has_feature(MMU_FTR_NO_SLBIE_B) &&
 		    offset <= SLB_CACHE_ENTRIES) {
-			unsigned long slbie_data = 0;
-
-			for (i = 0; i < offset; i++) {
-				unsigned long ea;
-
-				ea = (unsigned long)
-					get_paca()->slb_cache[i] << SID_SHIFT;
-				/*
-				 * Could assert_slb_presence(true) here, but
-				 * hypervisor or machine check could have come
-				 * in and removed the entry at this point.
-				 */
-
-				slbie_data = ea;
-				slbie_data |= user_segment_size(slbie_data)
-						<< SLBIE_SSIZE_SHIFT;
-				slbie_data |= SLBIE_C; /* user slbs have C=1 */
-				asm volatile("slbie %0" : : "r" (slbie_data));
-			}
+			/*
+			 * Could assert_slb_presence(true) here, but
+			 * hypervisor or machine check could have come
+			 * in and removed the entry at this point.
+			 */
+
+			for (i = 0; i < offset; i++)
+				slb_cache_slbie_user(i);
 
 			/* Workaround POWER5 < DD2.1 issue */
 			if (!cpu_has_feature(CPU_FTR_ARCH_207S) && offset == 1)
-				asm volatile("slbie %0" : : "r" (slbie_data));
+				slb_cache_slbie_user(0);
 
 		} else {
-			struct slb_shadow *p = get_slb_shadow();
-			unsigned long ksp_esid_data =
-				be64_to_cpu(p->save_area[KSTACK_INDEX].esid);
-			unsigned long ksp_vsid_data =
-				be64_to_cpu(p->save_area[KSTACK_INDEX].vsid);
-
-			asm volatile(PPC_SLBIA(1) "\n"
-				     "slbmte	%0,%1\n"
-				     "isync"
-				     :: "r"(ksp_vsid_data),
-					"r"(ksp_esid_data));
+			/* Flush but retain kernel lookaside information */
+			__slb_flush_and_restore_bolted(true);
+			isync();
 
 			get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
 		}
@@ -503,7 +540,7 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
 	 * address accesses by the kernel (user mode won't happen until
 	 * rfid, which is safe).
 	 */
-	asm volatile("isync" : : : "memory");
+	isync();
 }
 
 void slb_set_size(u16 size)
@@ -571,6 +608,9 @@ static void slb_cache_update(unsigned long esid_data)
 	if (cpu_has_feature(CPU_FTR_ARCH_300))
 		return; /* ISAv3.0B and later does not use slb_cache */
 
+	if (torture_slb())
+		return;
+
 	/*
 	 * Now update slb cache entries
 	 */
@@ -580,7 +620,7 @@ static void slb_cache_update(unsigned long esid_data)
 		 * We have space in slb cache for optimized switch_slb().
 		 * Top 36 bits from esid_data as per ISA
 		 */
-		local_paca->slb_cache[slb_cache_index++] = esid_data >> 28;
+		local_paca->slb_cache[slb_cache_index++] = esid_data >> SID_SHIFT;
 		local_paca->slb_cache_ptr++;
 	} else {
 		/*
@@ -671,6 +711,28 @@ static long slb_insert_entry(unsigned long ea, unsigned long context,
 	 * accesses user memory before it returns to userspace with rfid.
 	 */
 	assert_slb_presence(false, ea);
+	if (torture_slb()) {
+		int slb_cache_index = local_paca->slb_cache_ptr;
+
+		/*
+		 * torture_slb() does not use slb cache, repurpose as a
+		 * cache of inserted (non-bolted) kernel SLB entries. All
+		 * non-bolted kernel entries are flushed on any user fault,
+		 * or if there are already 3 non-boled kernel entries.
+		 */
+		BUILD_BUG_ON(SLB_CACHE_ENTRIES < 3);
+		if (!kernel || slb_cache_index == 3) {
+			int i;
+
+			for (i = 0; i < slb_cache_index; i++)
+				slb_cache_slbie_kernel(i);
+			slb_cache_index = 0;
+		}
+
+		if (kernel)
+			local_paca->slb_cache[slb_cache_index++] = esid_data >> SID_SHIFT;
+		local_paca->slb_cache_ptr = slb_cache_index;
+	}
 	asm volatile("slbmte %0, %1" : : "r" (vsid_data), "r" (esid_data));
 
 	barrier();
-- 
2.23.0


^ permalink raw reply related

* Re: [PATCH v6 1/4] powerpc: Document details on H_SCM_HEALTH hcall
From: Vaibhav Jain @ 2020-05-04  8:52 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev, linux-nvdimm; +Cc: Aneesh Kumar K . V
In-Reply-To: <87r1w5echa.fsf@mpe.ellerman.id.au>

Thanks for looking into this patch Mpe,

Michael Ellerman <mpe@ellerman.id.au> writes:

> Vaibhav Jain <vaibhav@linux.ibm.com> writes:
>
>> Add documentation to 'papr_hcalls.rst' describing the bitmap flags
>> that are returned from H_SCM_HEALTH hcall as per the PAPR-SCM
>> specification.
>>
>> Cc: Dan Williams <dan.j.williams@intel.com>
>> Cc: Michael Ellerman <mpe@ellerman.id.au>
>> Cc: "Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>
>> Signed-off-by: Vaibhav Jain <vaibhav@linux.ibm.com>
>> ---
>> Changelog:
>>
>> v5..v6
>> * New patch in the series
>> ---
>>  Documentation/powerpc/papr_hcalls.rst | 43 ++++++++++++++++++++++++---
>>  1 file changed, 39 insertions(+), 4 deletions(-)
>>
>> diff --git a/Documentation/powerpc/papr_hcalls.rst b/Documentation/powerpc/papr_hcalls.rst
>> index 3493631a60f8..9a5ba5eaf323 100644
>> --- a/Documentation/powerpc/papr_hcalls.rst
>> +++ b/Documentation/powerpc/papr_hcalls.rst
>> @@ -220,13 +220,48 @@ from the LPAR memory.
>>  **H_SCM_HEALTH**
>>  
>>  | Input: drcIndex
>> -| Out: *health-bitmap, health-bit-valid-bitmap*
>> +| Out: *health-bitmap (r4), health-bit-valid-bitmap (r5)*
>>  | Return Value: *H_Success, H_Parameter, H_Hardware*
>>  
>>  Given a DRC Index return the info on predictive failure and overall health of
>> -the NVDIMM. The asserted bits in the health-bitmap indicate a single predictive
>> -failure and health-bit-valid-bitmap indicate which bits in health-bitmap are
>> -valid.
>> +the NVDIMM. The asserted bits in the health-bitmap indicate one or more states
>> +(described in table below) of the NVDIMM and health-bit-valid-bitmap indicate
>> +which bits in health-bitmap are valid.
>> +
>> +Health Bitmap Flags:
>> +
>> ++------+-----------------------------------------------------------------------+
>> +|  Bit |               Definition                                              |
>> ++======+=======================================================================+
>> +|  00  | SCM device is unable to persist memory contents.                      |
>> +|      | If the system is powered down, nothing will be saved.                 |
>> ++------+-----------------------------------------------------------------------+
>
> Are these correct bit numbers or backward IBM big endian bit numbers?
>
> ie. which bit is LSB?
These bit numbers index to a 64-bit dword laid in IBM big endian
format. So LSB would be at the located at a higher address. For example
0xC400000000000000 indicates bits 0, 1, and 5 are valid.

>
> cheers
> _______________________________________________
> Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
> To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

^ permalink raw reply

* Re: [PATCH v7 13/28] powerpc: Add a probe_user_read_inst() function
From: Alistair Popple @ 2020-05-04  8:30 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-14-jniethe5@gmail.com>

Looks good.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

On Friday, 1 May 2020 1:42:05 PM AEST Jordan Niethe wrote:
> Introduce a probe_user_read_inst() function to use in cases where
> probe_user_read() is used for getting an instruction. This will be more
> useful for prefixed instructions.
> 
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
> v6: - New to series
> ---
>  arch/powerpc/include/asm/inst.h |  3 +++
>  arch/powerpc/lib/Makefile       |  2 +-
>  arch/powerpc/lib/inst.c         | 18 ++++++++++++++++++
>  arch/powerpc/mm/fault.c         |  2 +-
>  4 files changed, 23 insertions(+), 2 deletions(-)
>  create mode 100644 arch/powerpc/lib/inst.c
> 
> diff --git a/arch/powerpc/include/asm/inst.h
> b/arch/powerpc/include/asm/inst.h index 552e953bf04f..3e9a58420151 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -37,4 +37,7 @@ static inline bool ppc_inst_equal(struct ppc_inst x,
> struct ppc_inst y) return ppc_inst_val(x) == ppc_inst_val(y);
>  }
> 
> +int probe_user_read_inst(struct ppc_inst *inst,
> +			 struct ppc_inst *nip);
> +
>  #endif /* _ASM_INST_H */
> diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
> index b8de3be10eb4..546591848219 100644
> --- a/arch/powerpc/lib/Makefile
> +++ b/arch/powerpc/lib/Makefile
> @@ -16,7 +16,7 @@ CFLAGS_code-patching.o += -DDISABLE_BRANCH_PROFILING
>  CFLAGS_feature-fixups.o += -DDISABLE_BRANCH_PROFILING
>  endif
> 
> -obj-y += alloc.o code-patching.o feature-fixups.o pmem.o
> +obj-y += alloc.o code-patching.o feature-fixups.o pmem.o inst.o
> 
>  ifndef CONFIG_KASAN
>  obj-y	+=	string.o memcmp_$(BITS).o
> diff --git a/arch/powerpc/lib/inst.c b/arch/powerpc/lib/inst.c
> new file mode 100644
> index 000000000000..eaf786afad2b
> --- /dev/null
> +++ b/arch/powerpc/lib/inst.c
> @@ -0,0 +1,18 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + *  Copyright 2020, IBM Corporation.
> + */
> +
> +#include <linux/uaccess.h>
> +#include <asm/inst.h>
> +
> +int probe_user_read_inst(struct ppc_inst *inst,
> +			 struct ppc_inst *nip)
> +{
> +	unsigned int val;
> +	int err;
> +
> +	err = probe_user_read(&val, nip, sizeof(val));
> +	*inst = ppc_inst(val);
> +	return err;
> +}
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 4a50f125ec18..f3a943eae305 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -281,7 +281,7 @@ static bool bad_stack_expansion(struct pt_regs *regs,
> unsigned long address, access_ok(nip, sizeof(*nip))) {
>  			struct ppc_inst inst;
> 
> -			if (!probe_user_read(&inst, nip, sizeof(inst)))
> +			if (!probe_user_read_inst(&inst, (struct ppc_inst __user *)nip))
>  				return !store_updates_sp(inst);
>  			*must_retry = true;
>  		}





^ permalink raw reply

* Re: [PATCH v7 12/28] powerpc: Use a function for reading instructions
From: Alistair Popple @ 2020-05-04  8:26 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-13-jniethe5@gmail.com>

> diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
> index 31c870287f2b..6893d40a48c5 100644
> --- a/arch/powerpc/kernel/uprobes.c
> +++ b/arch/powerpc/kernel/uprobes.c
> @@ -174,7 +174,7 @@ bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe,
> struct pt_regs *regs) * emulate_step() returns 1 if the insn was
> successfully emulated. * For all other cases, we need to single-step in
> hardware.
>  	 */
> -	ret = emulate_step(regs, auprobe->insn);
> +	ret = emulate_step(regs, ppc_inst_read(&auprobe->insn));

I'm not a uprobe expert so I don't follow why we need this read here but the 
rest of the patch looked ok in that it shouldn't change behaviour (and in 
practice neither should the above) so:

Reviewed-by: Alistair Popple <alistair@popple.id.au>

>  	if (ret > 0)
>  		return true;
> 




^ permalink raw reply

* Re: [PATCH 1/2] powerpc/64s/hash: add torture_slb kernel boot option to increase SLB faults
From: Nicholas Piggin @ 2020-05-04  8:19 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <877dxsma7p.fsf@linux.ibm.com>

Excerpts from Aneesh Kumar K.V's message of May 4, 2020 5:27 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> This option increases the number of SLB misses by limiting the number of
>> kernel SLB entries, and increased flushing of cached lookaside information.
>> This helps stress test difficult to hit paths in the kernel.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> 
> ....
> 
>> +{
>> +	unsigned long slbie_data = get_paca()->slb_cache[index];
>> +	unsigned long ksp = get_paca()->kstack;
>> +
>> +	slbie_data <<= SID_SHIFT;
>> +	slbie_data |= 0xc000000000000000ULL;
>> +	if ((ksp & slb_esid_mask(mmu_kernel_ssize)) == slbie_data)
>> +		return;
>> +	slbie_data |= mmu_kernel_ssize << SLBIE_SSIZE_SHIFT;
>> +
>> +	asm volatile("slbie %0" : : "r" (slbie_data));
>> +}
>> +
>> +static void slb_cache_slbie(unsigned int index)
> 
> May be slb_cache_slbie_user()? Similar to _kernel above?

Yeah that'd help.

>> +{
>> +	unsigned long slbie_data = get_paca()->slb_cache[index];
>> +
>> +	slbie_data <<= SID_SHIFT;
>> +	slbie_data |= user_segment_size(slbie_data) << SLBIE_SSIZE_SHIFT;
>> +	slbie_data |= SLBIE_C; /* user slbs have C=1 */
>> +
>> +	asm volatile("slbie %0" : : "r" (slbie_data));
>> +}
>>  
>>  /* Flush all user entries from the segment table of the current processor. */
>>  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>> @@ -414,8 +449,14 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>>  	 * which would update the slb_cache/slb_cache_ptr fields in the PACA.
>>  	 */
>>  	hard_irq_disable();
>> -	asm volatile("isync" : : : "memory");
>> -	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
>> +	isync();
>> +	if (torture_slb()) {
>> +		__slb_flush_and_restore_bolted(0);
> 
> s/0/SLIBA_IH_ALL or something like that? 

IH isn't so simple. 0 isn't all, it's clear all SLBE except index zero, 
and flush all EA lookaside data.

Maybe callers should use a bool parameter though to flush kernel 
lookaside data.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH 2/2] powerpc/64s/hash: add torture_hpt kernel boot option to increase hash faults
From: Nicholas Piggin @ 2020-05-04  8:06 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev
In-Reply-To: <874kswm9s6.fsf@linux.ibm.com>

Excerpts from Aneesh Kumar K.V's message of May 4, 2020 5:36 pm:
> Nicholas Piggin <npiggin@gmail.com> writes:
> 
>> This option increases the number of hash misses by limiting the number of
>> kernel HPT entries. This helps stress test difficult to hit paths in the
>> kernel.
>>
> 
> It would nice if we can explain in commit message how we are limiting
> the number of HPT entries.

"limiting the number of kernel HPT entries by removing them as soon as 
possible after they are installed"?

> 
>> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
>> ---
>>  .../admin-guide/kernel-parameters.txt         |  9 +++
>>  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 10 +++
>>  arch/powerpc/mm/book3s64/hash_4k.c            |  3 +
>>  arch/powerpc/mm/book3s64/hash_64k.c           |  8 +++
>>  arch/powerpc/mm/book3s64/hash_utils.c         | 66 ++++++++++++++++++-
>>  5 files changed, 95 insertions(+), 1 deletion(-)
> 
> ....
> 
>   
>> +void hpt_do_torture(unsigned long ea, unsigned long access,
>> +		    unsigned long rflags, unsigned long hpte_group)
>> +{
>> +	unsigned long last_group;
>> +	int cpu = raw_smp_processor_id();
>> +
>> +	last_group = torture_hpt_last_group[cpu];
>> +	if (last_group != -1UL) {
>> +		while (mmu_hash_ops.hpte_remove(last_group) != -1)
>> +			;
>> +		torture_hpt_last_group[cpu] = -1UL;
>> +	}
>> +
>> +#define QEMU_WORKAROUND	0
>> +
>> +	if (ea >= PAGE_OFFSET) {
>> +		if (!QEMU_WORKAROUND && (access & (_PAGE_READ|_PAGE_WRITE)) &&
>> +		    !(rflags & (HPTE_R_I|HPTE_R_G))) {
>> +			/* prefetch / prefetchw does not seem to set up a TLB
>> +			 * entry with the powerpc systemsim (mambo) emulator,
>> +			 * though it works with real hardware. An alternative
>> +			 * approach that would work more reliably on quirky
>> +			 * emulators like QEMU may be to remember the last
>> +			 * insertion and remove that, rather than removing the
>> +			 * current insertion. Then no prefetch is required.
>> +			 */
>> +			if ((access & _PAGE_WRITE) && (access & _PAGE_READ))
>> +				atomic_add(0, (atomic_t *)(ea & ~0x3));
>> +			else if (access & _PAGE_READ)
>> +				*(volatile char *)ea;
>> +
>> +			mb();
>> +
>> +			while (mmu_hash_ops.hpte_remove(hpte_group) != -1)
>> +				;
> 
> Do we get similar hpte faults rate, if we remove everything except the
> current inserted entry?. If so that would largely simplify the code.

Well it would remove this one branch at least. It does actually help 
cause more faults and helps (in theory) irritate cases where you have 
two accesses to a vmalloc page in the kernel, where the first is okay
to take a fault but the second is buggy.

I actually like the interesting behaviour it exposes in emulators too. 
We should really fix mambo to have prefetches bring in TLBs, and fix 
qemu to bring in TLBs more like hardware, and it could have TLB vs PTE 
consistency checks to catch bugs like mambo does.

So I prefer leaving this in.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH v7 08/28] powerpc: Use a function for getting the instruction op code
From: Alistair Popple @ 2020-05-04  8:01 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-9-jniethe5@gmail.com>

Looks good to me in that it doesn't look to change the behaviour of any 
existing code.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

On Friday, 1 May 2020 1:42:00 PM AEST Jordan Niethe wrote:
> In preparation for using a data type for instructions that can not be
> directly used with the '>>' operator use a function for getting the op
> code of an instruction.
> 
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
> v4: New to series
> v6: - Rename ppc_inst_primary() to ppc_inst_primary_opcode()
>     - Use in vecemu.c, fault.c, sstep.c
>     - Move this patch after the ppc_inst_val() patch
> ---
>  arch/powerpc/include/asm/inst.h  | 5 +++++
>  arch/powerpc/kernel/align.c      | 2 +-
>  arch/powerpc/kernel/vecemu.c     | 3 ++-
>  arch/powerpc/lib/code-patching.c | 4 ++--
>  arch/powerpc/lib/sstep.c         | 2 +-
>  arch/powerpc/mm/fault.c          | 3 ++-
>  6 files changed, 13 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/inst.h
> b/arch/powerpc/include/asm/inst.h index 8a9e73bfbd27..442a95f20de7 100644
> --- a/arch/powerpc/include/asm/inst.h
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -13,4 +13,9 @@ static inline u32 ppc_inst_val(u32 x)
>  	return x;
>  }
> 
> +static inline int ppc_inst_primary_opcode(u32 x)
> +{
> +	return ppc_inst_val(x) >> 26;
> +}
> +
>  #endif /* _ASM_INST_H */
> diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
> index 44921001f84a..47dbba81a227 100644
> --- a/arch/powerpc/kernel/align.c
> +++ b/arch/powerpc/kernel/align.c
> @@ -314,7 +314,7 @@ int fix_alignment(struct pt_regs *regs)
>  	}
> 
>  #ifdef CONFIG_SPE
> -	if ((ppc_inst_val(instr) >> 26) == 0x4) {
> +	if (ppc_inst_primary_opcode(instr) == 0x4) {
>  		int reg = (ppc_inst_val(instr) >> 21) & 0x1f;
>  		PPC_WARN_ALIGNMENT(spe, regs);
>  		return emulate_spe(regs, reg, instr);
> diff --git a/arch/powerpc/kernel/vecemu.c b/arch/powerpc/kernel/vecemu.c
> index 1f5e3b4c8ae4..a544590b90e5 100644
> --- a/arch/powerpc/kernel/vecemu.c
> +++ b/arch/powerpc/kernel/vecemu.c
> @@ -10,6 +10,7 @@
>  #include <asm/processor.h>
>  #include <asm/switch_to.h>
>  #include <linux/uaccess.h>
> +#include <asm/inst.h>
> 
>  /* Functions in vector.S */
>  extern void vaddfp(vector128 *dst, vector128 *a, vector128 *b);
> @@ -268,7 +269,7 @@ int emulate_altivec(struct pt_regs *regs)
>  		return -EFAULT;
> 
>  	word = ppc_inst_val(instr);
> -	if ((word >> 26) != 4)
> +	if (ppc_inst_primary_opcode(instr) != 4)
>  		return -EINVAL;		/* not an altivec instruction */
>  	vd = (word >> 21) & 0x1f;
>  	va = (word >> 16) & 0x1f;
> diff --git a/arch/powerpc/lib/code-patching.c
> b/arch/powerpc/lib/code-patching.c index baa849b1a1f9..f5c6dcbac44b 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -231,7 +231,7 @@ bool is_offset_in_branch_range(long offset)
>   */
>  bool is_conditional_branch(unsigned int instr)
>  {
> -	unsigned int opcode = instr >> 26;
> +	unsigned int opcode = ppc_inst_primary_opcode(instr);
> 
>  	if (opcode == 16)       /* bc, bca, bcl, bcla */
>  		return true;
> @@ -289,7 +289,7 @@ int create_cond_branch(unsigned int *instr, const
> unsigned int *addr,
> 
>  static unsigned int branch_opcode(unsigned int instr)
>  {
> -	return (instr >> 26) & 0x3F;
> +	return ppc_inst_primary_opcode(instr) & 0x3F;
>  }
> 
>  static int instr_is_branch_iform(unsigned int instr)
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index 14c93ee4ffc8..7f7be154da7e 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -1175,7 +1175,7 @@ int analyse_instr(struct instruction_op *op, const
> struct pt_regs *regs, word = ppc_inst_val(instr);
>  	op->type = COMPUTE;
> 
> -	opcode = instr >> 26;
> +	opcode = ppc_inst_primary_opcode(instr);
>  	switch (opcode) {
>  	case 16:	/* bc */
>  		op->type = BRANCH;
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 9364921870df..0e7e145d5cad 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -41,6 +41,7 @@
>  #include <asm/siginfo.h>
>  #include <asm/debug.h>
>  #include <asm/kup.h>
> +#include <asm/inst.h>
> 
>  /*
>   * Check whether the instruction inst is a store using
> @@ -52,7 +53,7 @@ static bool store_updates_sp(unsigned int inst)
>  	if (((ppc_inst_val(inst) >> 16) & 0x1f) != 1)
>  		return false;
>  	/* check major opcode */
> -	switch (inst >> 26) {
> +	switch (ppc_inst_primary_opcode(inst)) {
>  	case OP_STWU:
>  	case OP_STBU:
>  	case OP_STHU:





^ permalink raw reply

* Re: [PATCH 2/2] powerpc/64s/hash: add torture_hpt kernel boot option to increase hash faults
From: Aneesh Kumar K.V @ 2020-05-04  7:36 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200503082236.17991-2-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> This option increases the number of hash misses by limiting the number of
> kernel HPT entries. This helps stress test difficult to hit paths in the
> kernel.
>

It would nice if we can explain in commit message how we are limiting
the number of HPT entries.

> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  .../admin-guide/kernel-parameters.txt         |  9 +++
>  arch/powerpc/include/asm/book3s/64/mmu-hash.h | 10 +++
>  arch/powerpc/mm/book3s64/hash_4k.c            |  3 +
>  arch/powerpc/mm/book3s64/hash_64k.c           |  8 +++
>  arch/powerpc/mm/book3s64/hash_utils.c         | 66 ++++++++++++++++++-
>  5 files changed, 95 insertions(+), 1 deletion(-)

....

  
> +void hpt_do_torture(unsigned long ea, unsigned long access,
> +		    unsigned long rflags, unsigned long hpte_group)
> +{
> +	unsigned long last_group;
> +	int cpu = raw_smp_processor_id();
> +
> +	last_group = torture_hpt_last_group[cpu];
> +	if (last_group != -1UL) {
> +		while (mmu_hash_ops.hpte_remove(last_group) != -1)
> +			;
> +		torture_hpt_last_group[cpu] = -1UL;
> +	}
> +
> +#define QEMU_WORKAROUND	0
> +
> +	if (ea >= PAGE_OFFSET) {
> +		if (!QEMU_WORKAROUND && (access & (_PAGE_READ|_PAGE_WRITE)) &&
> +		    !(rflags & (HPTE_R_I|HPTE_R_G))) {
> +			/* prefetch / prefetchw does not seem to set up a TLB
> +			 * entry with the powerpc systemsim (mambo) emulator,
> +			 * though it works with real hardware. An alternative
> +			 * approach that would work more reliably on quirky
> +			 * emulators like QEMU may be to remember the last
> +			 * insertion and remove that, rather than removing the
> +			 * current insertion. Then no prefetch is required.
> +			 */
> +			if ((access & _PAGE_WRITE) && (access & _PAGE_READ))
> +				atomic_add(0, (atomic_t *)(ea & ~0x3));
> +			else if (access & _PAGE_READ)
> +				*(volatile char *)ea;
> +
> +			mb();
> +
> +			while (mmu_hash_ops.hpte_remove(hpte_group) != -1)
> +				;

Do we get similar hpte faults rate, if we remove everything except the
current inserted entry?. If so that would largely simplify the code.

> +		} else {
> +			/* Can't prefetch cache-inhibited so clear next time. */
> +			torture_hpt_last_group[cpu] = hpte_group;
> +		}
> +	}
> +}
> +
> +
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  static void kernel_map_linear_page(unsigned long vaddr, unsigned long lmi)
>  {
> -- 
> 2.23.0

-aneesh

^ permalink raw reply

* Re: Build regressions/improvements in v5.7-rc4
From: Geert Uytterhoeven @ 2020-05-04  7:32 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: Deucher, Alexander, Timothy Pearson, linuxppc-dev
In-Reply-To: <20200504072229.31214-1-geert@linux-m68k.org>

On Mon, May 4, 2020 at 9:24 AM Geert Uytterhoeven <geert@linux-m68k.org> wrote:
> JFYI, when comparing v5.7-rc4[1] to v5.7-rc3[3], the summaries are:
>   - build errors: +3/-123

> [1] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/0e698dfa282211e414076f9dc7e83c1c288314fd/ (all 239 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/branch/linus/head/6a8b55ed4056ea5559ebe4f6a4b247f627870d4c/ (all 239 configs)

It's back:

  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:
error: implicit declaration of function 'cpu_has_feature'
[-Werror=implicit-function-declaration]:  => 626:2
  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:
error: implicit declaration of function 'disable_kernel_vsx'
[-Werror=implicit-function-declaration]:  => 662:2
  + /kisskb/src/drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:
error: implicit declaration of function 'enable_kernel_vsx'
[-Werror=implicit-function-declaration]:  => 626:2

powerpc-gcc4.6/ppc64_book3e_allmodconfig

powerpc-gcc9/ppc64_book3e_allmodconfig builds fine, as it doesn't have
DRM_AMD_DC_DCN, enabled due to:

    select DRM_AMD_DC_DCN if (X86 || PPC64) && !(KCOV_INSTRUMENT_ALL
&& KCOV_ENABLE_COMPARISONS)

Gr{oetje,eeting}s,

                        Geert

-- 
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH 1/2] powerpc/64s/hash: add torture_slb kernel boot option to increase SLB faults
From: Aneesh Kumar K.V @ 2020-05-04  7:27 UTC (permalink / raw)
  To: Nicholas Piggin, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20200503082236.17991-1-npiggin@gmail.com>

Nicholas Piggin <npiggin@gmail.com> writes:

> This option increases the number of SLB misses by limiting the number of
> kernel SLB entries, and increased flushing of cached lookaside information.
> This helps stress test difficult to hit paths in the kernel.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>

....

> +{
> +	unsigned long slbie_data = get_paca()->slb_cache[index];
> +	unsigned long ksp = get_paca()->kstack;
> +
> +	slbie_data <<= SID_SHIFT;
> +	slbie_data |= 0xc000000000000000ULL;
> +	if ((ksp & slb_esid_mask(mmu_kernel_ssize)) == slbie_data)
> +		return;
> +	slbie_data |= mmu_kernel_ssize << SLBIE_SSIZE_SHIFT;
> +
> +	asm volatile("slbie %0" : : "r" (slbie_data));
> +}
> +
> +static void slb_cache_slbie(unsigned int index)

May be slb_cache_slbie_user()? Similar to _kernel above?

> +{
> +	unsigned long slbie_data = get_paca()->slb_cache[index];
> +
> +	slbie_data <<= SID_SHIFT;
> +	slbie_data |= user_segment_size(slbie_data) << SLBIE_SSIZE_SHIFT;
> +	slbie_data |= SLBIE_C; /* user slbs have C=1 */
> +
> +	asm volatile("slbie %0" : : "r" (slbie_data));
> +}
>  
>  /* Flush all user entries from the segment table of the current processor. */
>  void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
> @@ -414,8 +449,14 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>  	 * which would update the slb_cache/slb_cache_ptr fields in the PACA.
>  	 */
>  	hard_irq_disable();
> -	asm volatile("isync" : : : "memory");
> -	if (cpu_has_feature(CPU_FTR_ARCH_300)) {
> +	isync();
> +	if (torture_slb()) {
> +		__slb_flush_and_restore_bolted(0);

s/0/SLIBA_IH_ALL or something like that? 


> +		isync();
> +		get_paca()->slb_cache_ptr = 0;
> +		get_paca()->slb_kern_bitmap = (1U << SLB_NUM_BOLTED) - 1;
> +
> +	} else if (cpu_has_feature(CPU_FTR_ARCH_300)) {
>  		/*
>  		 * SLBIA IH=3 invalidates all Class=1 SLBEs and their
>  		 * associated lookaside structures, which matches what
> @@ -423,47 +464,36 @@ void switch_slb(struct task_struct *tsk, struct mm_struct *mm)
>  		 * cache.
>  		 */
>  		asm volatile(PPC_SLBIA(3));


-aneesh

^ permalink raw reply

* Re: [PATCH] powerpc/mce: Add MCE notification chain
From: Ganesh @ 2020-05-04  6:39 UTC (permalink / raw)
  To: mpe, linuxppc-dev; +Cc: mahesh, santosh, arbab, npiggin, aneesh.kumar
In-Reply-To: <20200330071219.12284-1-ganeshgr@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 470 bytes --]

On 3/30/20 12:42 PM, Ganesh Goudar wrote:

> From: Santosh S <santosh@fossix.org>
>
> Introduce notification chain which lets know about uncorrected memory
> errors(UE). This would help prospective users in pmem or nvdimm subsystem
> to track bad blocks for better handling of persistent memory allocations.
>
> Signed-off-by: Santosh S <santosh@fossix.org>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---

Hi mpe, Do you have any comments on this patch?


[-- Attachment #2: Type: text/html, Size: 1030 bytes --]

^ permalink raw reply

* Re: [PATCH v7 06/28] powerpc: Use a macro for creating instructions from u32s
From: Alistair Popple @ 2020-05-04  5:54 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-7-jniethe5@gmail.com>

I haven't reviewed our existing code to figure out if we've caught all places a 
unsigned int is used as an instruction, but what's here looks correct and I 
assume any missed cases will likely get caught by the compiler when the 
datatype is actually introduced. At least past experience trying to build this 
series suggests that assumption is reasonably accurate :-)

Michael - this requires a minor fix up when merged on top of Ravi's 2nd DAWR 
series, let me know if you need it.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

On Friday, 1 May 2020 1:41:58 PM AEST Jordan Niethe wrote:
> In preparation for instructions having a more complex data type start
> using a macro, ppc_inst(), for making an instruction out of a u32.  A
> macro is used so that instructions can be used as initializer elements.
> Currently this does nothing, but it will allow for creating a data type
> that can represent prefixed instructions.
> 
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
> v4: New to series
> v5: - Rename PPC_INST() -> ppc_inst().
>     - Use on epapr_paravirt.c, kgdb.c
> v6: - Use in setup_32.c
>     - epapr_paravirt.c: early_init_dt_scan_epapr(): move the use of
>       ppc_inst() earlier.
> ---
>  arch/powerpc/include/asm/code-patching.h |  3 +-
>  arch/powerpc/include/asm/inst.h          | 11 +++++
>  arch/powerpc/kernel/align.c              |  1 +
>  arch/powerpc/kernel/epapr_paravirt.c     |  3 +-
>  arch/powerpc/kernel/hw_breakpoint.c      |  3 +-
>  arch/powerpc/kernel/jump_label.c         |  3 +-
>  arch/powerpc/kernel/kgdb.c               |  5 ++-
>  arch/powerpc/kernel/kprobes.c            |  5 ++-
>  arch/powerpc/kernel/module_64.c          |  3 +-
>  arch/powerpc/kernel/optprobes.c          | 31 ++++++-------
>  arch/powerpc/kernel/security.c           |  9 ++--
>  arch/powerpc/kernel/setup_32.c           |  2 +-
>  arch/powerpc/kernel/trace/ftrace.c       | 25 ++++++-----
>  arch/powerpc/kernel/uprobes.c            |  1 +
>  arch/powerpc/kvm/emulate_loadstore.c     |  2 +-
>  arch/powerpc/lib/code-patching.c         | 57 ++++++++++++------------
>  arch/powerpc/lib/feature-fixups.c        | 39 ++++++++--------
>  arch/powerpc/lib/test_emulate_step.c     | 39 ++++++++--------
>  arch/powerpc/xmon/xmon.c                 |  7 +--
>  19 files changed, 138 insertions(+), 111 deletions(-)
>  create mode 100644 arch/powerpc/include/asm/inst.h
> 
> diff --git a/arch/powerpc/include/asm/code-patching.h
> b/arch/powerpc/include/asm/code-patching.h index 351dda7215b6..48e021957ee5
> 100644
> --- a/arch/powerpc/include/asm/code-patching.h
> +++ b/arch/powerpc/include/asm/code-patching.h
> @@ -11,6 +11,7 @@
>  #include <linux/string.h>
>  #include <linux/kallsyms.h>
>  #include <asm/asm-compat.h>
> +#include <asm/inst.h>
> 
>  /* Flags for create_branch:
>   * "b"   == create_branch(addr, target, 0);
> @@ -48,7 +49,7 @@ static inline int patch_branch_site(s32 *site, unsigned
> long target, int flags) static inline int modify_instruction(unsigned int
> *addr, unsigned int clr, unsigned int set)
>  {
> -	return patch_instruction(addr, (*addr & ~clr) | set);
> +	return patch_instruction(addr, ppc_inst((*addr & ~clr) | set));
>  }
> 
>  static inline int modify_instruction_site(s32 *site, unsigned int clr,
> unsigned int set) diff --git a/arch/powerpc/include/asm/inst.h
> b/arch/powerpc/include/asm/inst.h new file mode 100644
> index 000000000000..5298ba33b6e5
> --- /dev/null
> +++ b/arch/powerpc/include/asm/inst.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +#ifndef _ASM_INST_H
> +#define _ASM_INST_H
> +
> +/*
> + * Instruction data type for POWER
> + */
> +
> +#define ppc_inst(x) (x)
> +
> +#endif /* _ASM_INST_H */
> diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
> index 92045ed64976..86e9bf62f18c 100644
> --- a/arch/powerpc/kernel/align.c
> +++ b/arch/powerpc/kernel/align.c
> @@ -24,6 +24,7 @@
>  #include <asm/disassemble.h>
>  #include <asm/cpu_has_feature.h>
>  #include <asm/sstep.h>
> +#include <asm/inst.h>
> 
>  struct aligninfo {
>  	unsigned char len;
> diff --git a/arch/powerpc/kernel/epapr_paravirt.c
> b/arch/powerpc/kernel/epapr_paravirt.c index 9d32158ce36f..e8eb72a65572
> 100644
> --- a/arch/powerpc/kernel/epapr_paravirt.c
> +++ b/arch/powerpc/kernel/epapr_paravirt.c
> @@ -11,6 +11,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/code-patching.h>
>  #include <asm/machdep.h>
> +#include <asm/inst.h>
> 
>  #if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>  extern void epapr_ev_idle(void);
> @@ -36,7 +37,7 @@ static int __init early_init_dt_scan_epapr(unsigned long
> node, return -1;
> 
>  	for (i = 0; i < (len / 4); i++) {
> -		u32 inst = be32_to_cpu(insts[i]);
> +		u32 inst = ppc_inst(be32_to_cpu(insts[i]));
>  		patch_instruction(epapr_hypercall_start + i, inst);
>  #if !defined(CONFIG_64BIT) || defined(CONFIG_PPC_BOOK3E_64)
>  		patch_instruction(epapr_ev_idle_start + i, inst);
> diff --git a/arch/powerpc/kernel/hw_breakpoint.c
> b/arch/powerpc/kernel/hw_breakpoint.c index 72f461bd70fb..46e09ac8b84a
> 100644
> --- a/arch/powerpc/kernel/hw_breakpoint.c
> +++ b/arch/powerpc/kernel/hw_breakpoint.c
> @@ -24,6 +24,7 @@
>  #include <asm/debug.h>
>  #include <asm/debugfs.h>
>  #include <asm/hvcall.h>
> +#include <asm/inst.h>
>  #include <linux/uaccess.h>
> 
>  /*
> @@ -243,7 +244,7 @@ dar_range_overlaps(unsigned long dar, int size, struct
> arch_hw_breakpoint *info) static bool stepping_handler(struct pt_regs
> *regs, struct perf_event *bp, struct arch_hw_breakpoint *info)
>  {
> -	unsigned int instr = 0;
> +	unsigned int instr = ppc_inst(0);
>  	int ret, type, size;
>  	struct instruction_op op;
>  	unsigned long addr = info->address;
> diff --git a/arch/powerpc/kernel/jump_label.c
> b/arch/powerpc/kernel/jump_label.c index ca37702bde97..daa4afce7ec8 100644
> --- a/arch/powerpc/kernel/jump_label.c
> +++ b/arch/powerpc/kernel/jump_label.c
> @@ -6,6 +6,7 @@
>  #include <linux/kernel.h>
>  #include <linux/jump_label.h>
>  #include <asm/code-patching.h>
> +#include <asm/inst.h>
> 
>  void arch_jump_label_transform(struct jump_entry *entry,
>  			       enum jump_label_type type)
> @@ -15,5 +16,5 @@ void arch_jump_label_transform(struct jump_entry *entry,
>  	if (type == JUMP_LABEL_JMP)
>  		patch_branch(addr, entry->target, 0);
>  	else
> -		patch_instruction(addr, PPC_INST_NOP);
> +		patch_instruction(addr, ppc_inst(PPC_INST_NOP));
>  }
> diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
> index 7dd55eb1259d..a6b38a19133f 100644
> --- a/arch/powerpc/kernel/kgdb.c
> +++ b/arch/powerpc/kernel/kgdb.c
> @@ -26,6 +26,7 @@
>  #include <asm/debug.h>
>  #include <asm/code-patching.h>
>  #include <linux/slab.h>
> +#include <asm/inst.h>
> 
>  /*
>   * This table contains the mapping between PowerPC hardware trap types, and
> @@ -424,7 +425,7 @@ int kgdb_arch_set_breakpoint(struct kgdb_bkpt *bpt) if
> (err)
>  		return err;
> 
> -	err = patch_instruction(addr, BREAK_INSTR);
> +	err = patch_instruction(addr, ppc_inst(BREAK_INSTR));
>  	if (err)
>  		return -EFAULT;
> 
> @@ -439,7 +440,7 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
>  	unsigned int instr = *(unsigned int *)bpt->saved_instr;
>  	unsigned int *addr = (unsigned int *)bpt->bpt_addr;
> 
> -	err = patch_instruction(addr, instr);
> +	err = patch_instruction(addr, ppc_inst(instr));
>  	if (err)
>  		return -EFAULT;
> 
> diff --git a/arch/powerpc/kernel/kprobes.c b/arch/powerpc/kernel/kprobes.c
> index 81efb605113e..2378a7ed4438 100644
> --- a/arch/powerpc/kernel/kprobes.c
> +++ b/arch/powerpc/kernel/kprobes.c
> @@ -23,6 +23,7 @@
>  #include <asm/cacheflush.h>
>  #include <asm/sstep.h>
>  #include <asm/sections.h>
> +#include <asm/inst.h>
>  #include <linux/uaccess.h>
> 
>  DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
> @@ -138,13 +139,13 @@ NOKPROBE_SYMBOL(arch_prepare_kprobe);
> 
>  void arch_arm_kprobe(struct kprobe *p)
>  {
> -	patch_instruction(p->addr, BREAKPOINT_INSTRUCTION);
> +	patch_instruction(p->addr, ppc_inst(BREAKPOINT_INSTRUCTION));
>  }
>  NOKPROBE_SYMBOL(arch_arm_kprobe);
> 
>  void arch_disarm_kprobe(struct kprobe *p)
>  {
> -	patch_instruction(p->addr, p->opcode);
> +	patch_instruction(p->addr, ppc_inst(p->opcode));
>  }
>  NOKPROBE_SYMBOL(arch_disarm_kprobe);
> 
> diff --git a/arch/powerpc/kernel/module_64.c
> b/arch/powerpc/kernel/module_64.c index 007606a48fd9..7fd6b29edcb2 100644
> --- a/arch/powerpc/kernel/module_64.c
> +++ b/arch/powerpc/kernel/module_64.c
> @@ -20,6 +20,7 @@
>  #include <linux/sort.h>
>  #include <asm/setup.h>
>  #include <asm/sections.h>
> +#include <asm/inst.h>
> 
>  /* FIXME: We don't do .init separately.  To do this, we'd need to have
>     a separate r2 value in the init and core section, and stub between
> @@ -506,7 +507,7 @@ static int restore_r2(const char *name, u32
> *instruction, struct module *me) * "link" branches and they don't return,
> so they don't need the r2 * restore afterwards.
>  	 */
> -	if (!instr_is_relative_link_branch(*prev_insn))
> +	if (!instr_is_relative_link_branch(ppc_inst(*prev_insn)))
>  		return 1;
> 
>  	if (*instruction != PPC_INST_NOP) {
> diff --git a/arch/powerpc/kernel/optprobes.c
> b/arch/powerpc/kernel/optprobes.c index 445b3dad82dc..3b33ebf18859 100644
> --- a/arch/powerpc/kernel/optprobes.c
> +++ b/arch/powerpc/kernel/optprobes.c
> @@ -16,6 +16,7 @@
>  #include <asm/code-patching.h>
>  #include <asm/sstep.h>
>  #include <asm/ppc-opcode.h>
> +#include <asm/inst.h>
> 
>  #define TMPL_CALL_HDLR_IDX	\
>  	(optprobe_template_call_handler - optprobe_template_entry)
> @@ -147,13 +148,13 @@ void arch_remove_optimized_kprobe(struct
> optimized_kprobe *op) void patch_imm32_load_insns(unsigned int val,
> kprobe_opcode_t *addr) {
>  	/* addis r4,0,(insn)@h */
> -	patch_instruction(addr, PPC_INST_ADDIS | ___PPC_RT(4) |
> -			  ((val >> 16) & 0xffff));
> +	patch_instruction(addr, ppc_inst(PPC_INST_ADDIS | ___PPC_RT(4) |
> +			  ((val >> 16) & 0xffff)));
>  	addr++;
> 
>  	/* ori r4,r4,(insn)@l */
> -	patch_instruction(addr, PPC_INST_ORI | ___PPC_RA(4) |
> -			  ___PPC_RS(4) | (val & 0xffff));
> +	patch_instruction(addr, ppc_inst(PPC_INST_ORI | ___PPC_RA(4) |
> +			  ___PPC_RS(4) | (val & 0xffff)));
>  }
> 
>  /*
> @@ -163,28 +164,28 @@ void patch_imm32_load_insns(unsigned int val,
> kprobe_opcode_t *addr) void patch_imm64_load_insns(unsigned long val,
> kprobe_opcode_t *addr) {
>  	/* lis r3,(op)@highest */
> -	patch_instruction(addr, PPC_INST_ADDIS | ___PPC_RT(3) |
> -			  ((val >> 48) & 0xffff));
> +	patch_instruction(addr, ppc_inst(PPC_INST_ADDIS | ___PPC_RT(3) |
> +			  ((val >> 48) & 0xffff)));
>  	addr++;
> 
>  	/* ori r3,r3,(op)@higher */
> -	patch_instruction(addr, PPC_INST_ORI | ___PPC_RA(3) |
> -			  ___PPC_RS(3) | ((val >> 32) & 0xffff));
> +	patch_instruction(addr, ppc_inst(PPC_INST_ORI | ___PPC_RA(3) |
> +			  ___PPC_RS(3) | ((val >> 32) & 0xffff)));
>  	addr++;
> 
>  	/* rldicr r3,r3,32,31 */
> -	patch_instruction(addr, PPC_INST_RLDICR | ___PPC_RA(3) |
> -			  ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31));
> +	patch_instruction(addr, ppc_inst(PPC_INST_RLDICR | ___PPC_RA(3) |
> +			  ___PPC_RS(3) | __PPC_SH64(32) | __PPC_ME64(31)));
>  	addr++;
> 
>  	/* oris r3,r3,(op)@h */
> -	patch_instruction(addr, PPC_INST_ORIS | ___PPC_RA(3) |
> -			  ___PPC_RS(3) | ((val >> 16) & 0xffff));
> +	patch_instruction(addr, ppc_inst(PPC_INST_ORIS | ___PPC_RA(3) |
> +			  ___PPC_RS(3) | ((val >> 16) & 0xffff)));
>  	addr++;
> 
>  	/* ori r3,r3,(op)@l */
> -	patch_instruction(addr, PPC_INST_ORI | ___PPC_RA(3) |
> -			  ___PPC_RS(3) | (val & 0xffff));
> +	patch_instruction(addr, ppc_inst(PPC_INST_ORI | ___PPC_RA(3) |
> +			  ___PPC_RS(3) | (val & 0xffff)));
>  }
> 
>  int arch_prepare_optimized_kprobe(struct optimized_kprobe *op, struct
> kprobe *p) @@ -230,7 +231,7 @@ int arch_prepare_optimized_kprobe(struct
> optimized_kprobe *op, struct kprobe *p) size = (TMPL_END_IDX *
> sizeof(kprobe_opcode_t)) / sizeof(int);
>  	pr_devel("Copying template to %p, size %lu\n", buff, size);
>  	for (i = 0; i < size; i++) {
> -		rc = patch_instruction(buff + i, *(optprobe_template_entry + i));
> +		rc = patch_instruction(buff + i, ppc_inst(*(optprobe_template_entry +
> i))); if (rc < 0)
>  			goto error;
>  	}
> diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
> index bd70f5be1c27..81a288b1a603 100644
> --- a/arch/powerpc/kernel/security.c
> +++ b/arch/powerpc/kernel/security.c
> @@ -14,6 +14,7 @@
>  #include <asm/debugfs.h>
>  #include <asm/security_features.h>
>  #include <asm/setup.h>
> +#include <asm/inst.h>
> 
> 
>  u64 powerpc_security_features __read_mostly = SEC_FTR_DEFAULT;
> @@ -403,9 +404,9 @@ static void toggle_count_cache_flush(bool enable)
>  		enable = false;
> 
>  	if (!enable) {
> -		patch_instruction_site(&patch__call_flush_count_cache, PPC_INST_NOP);
> +		patch_instruction_site(&patch__call_flush_count_cache,
> ppc_inst(PPC_INST_NOP)); #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> -		patch_instruction_site(&patch__call_kvm_flush_link_stack, 
PPC_INST_NOP);
> +		patch_instruction_site(&patch__call_kvm_flush_link_stack,
> ppc_inst(PPC_INST_NOP)); #endif
>  		pr_info("link-stack-flush: software flush disabled.\n");
>  		link_stack_flush_enabled = false;
> @@ -428,7 +429,7 @@ static void toggle_count_cache_flush(bool enable)
> 
>  	// If we just need to flush the link stack, patch an early return
>  	if (!security_ftr_enabled(SEC_FTR_FLUSH_COUNT_CACHE)) {
> -		patch_instruction_site(&patch__flush_link_stack_return, PPC_INST_BLR);
> +		patch_instruction_site(&patch__flush_link_stack_return,
> ppc_inst(PPC_INST_BLR)); no_count_cache_flush();
>  		return;
>  	}
> @@ -439,7 +440,7 @@ static void toggle_count_cache_flush(bool enable)
>  		return;
>  	}
> 
> -	patch_instruction_site(&patch__flush_count_cache_return, PPC_INST_BLR);
> +	patch_instruction_site(&patch__flush_count_cache_return,
> ppc_inst(PPC_INST_BLR)); count_cache_flush_type = COUNT_CACHE_FLUSH_HW;
>  	pr_info("count-cache-flush: hardware assisted flush sequence enabled\n");
>  }
> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
> index 3a43e8e847c8..0536e4aed330 100644
> --- a/arch/powerpc/kernel/setup_32.c
> +++ b/arch/powerpc/kernel/setup_32.c
> @@ -85,7 +85,7 @@ notrace void __init machine_init(u64 dt_ptr)
>  	/* Enable early debugging if any specified (see udbg.h) */
>  	udbg_early_init();
> 
> -	patch_instruction_site(&patch__memcpy_nocache, PPC_INST_NOP);
> +	patch_instruction_site(&patch__memcpy_nocache, ppc_inst(PPC_INST_NOP));
> 
>  	create_cond_branch(&insn, addr, branch_target(addr), 0x820000);
>  	patch_instruction(addr, insn);	/* replace b by bne cr0 */
> diff --git a/arch/powerpc/kernel/trace/ftrace.c
> b/arch/powerpc/kernel/trace/ftrace.c index 828c5f64ca1e..0318e1ed6248
> 100644
> --- a/arch/powerpc/kernel/trace/ftrace.c
> +++ b/arch/powerpc/kernel/trace/ftrace.c
> @@ -27,6 +27,7 @@
>  #include <asm/code-patching.h>
>  #include <asm/ftrace.h>
>  #include <asm/syscall.h>
> +#include <asm/inst.h>
> 
> 
>  #ifdef CONFIG_DYNAMIC_FTRACE
> @@ -161,7 +162,7 @@ __ftrace_make_nop(struct module *mod,
> 
>  #ifdef CONFIG_MPROFILE_KERNEL
>  	/* When using -mkernel_profile there is no load to jump over */
> -	pop = PPC_INST_NOP;
> +	pop = ppc_inst(PPC_INST_NOP);
> 
>  	if (probe_kernel_read(&op, (void *)(ip - 4), 4)) {
>  		pr_err("Fetching instruction at %lx failed.\n", ip - 4);
> @@ -169,7 +170,7 @@ __ftrace_make_nop(struct module *mod,
>  	}
> 
>  	/* We expect either a mflr r0, or a std r0, LRSAVE(r1) */
> -	if (op != PPC_INST_MFLR && op != PPC_INST_STD_LR) {
> +	if (op != ppc_inst(PPC_INST_MFLR) && op != ppc_inst(PPC_INST_STD_LR)) {
>  		pr_err("Unexpected instruction %08x around bl _mcount\n", op);
>  		return -EINVAL;
>  	}
> @@ -188,7 +189,7 @@ __ftrace_make_nop(struct module *mod,
>  	 * Use a b +8 to jump over the load.
>  	 */
> 
> -	pop = PPC_INST_BRANCH | 8;	/* b +8 */
> +	pop = ppc_inst(PPC_INST_BRANCH | 8);	/* b +8 */
> 
>  	/*
>  	 * Check what is in the next instruction. We can see ld r2,40(r1), but
> @@ -199,7 +200,7 @@ __ftrace_make_nop(struct module *mod,
>  		return -EFAULT;
>  	}
> 
> -	if (op != PPC_INST_LD_TOC) {
> +	if (op != ppc_inst(PPC_INST_LD_TOC)) {
>  		pr_err("Expected %08x found %08x\n", PPC_INST_LD_TOC, op);
>  		return -EINVAL;
>  	}
> @@ -275,7 +276,7 @@ __ftrace_make_nop(struct module *mod,
>  		return -EINVAL;
>  	}
> 
> -	op = PPC_INST_NOP;
> +	op = ppc_inst(PPC_INST_NOP);
> 
>  	if (patch_instruction((unsigned int *)ip, op))
>  		return -EPERM;
> @@ -420,7 +421,7 @@ static int __ftrace_make_nop_kernel(struct dyn_ftrace
> *rec, unsigned long addr) }
>  	}
> 
> -	if (patch_instruction((unsigned int *)ip, PPC_INST_NOP)) {
> +	if (patch_instruction((unsigned int *)ip, ppc_inst(PPC_INST_NOP))) {
>  		pr_err("Patching NOP failed.\n");
>  		return -EPERM;
>  	}
> @@ -442,7 +443,7 @@ int ftrace_make_nop(struct module *mod,
>  	if (test_24bit_addr(ip, addr)) {
>  		/* within range */
>  		old = ftrace_call_replace(ip, addr, 1);
> -		new = PPC_INST_NOP;
> +		new = ppc_inst(PPC_INST_NOP);
>  		return ftrace_modify_code(ip, old, new);
>  	} else if (core_kernel_text(ip))
>  		return __ftrace_make_nop_kernel(rec, addr);
> @@ -496,7 +497,7 @@ expected_nop_sequence(void *ip, unsigned int op0,
> unsigned int op1) * The load offset is different depending on the ABI. For
> simplicity * just mask it out when doing the compare.
>  	 */
> -	if ((op0 != 0x48000008) || ((op1 & 0xffff0000) != 0xe8410000))
> +	if ((op0 != ppc_inst(0x48000008)) || ((op1 & 0xffff0000) != 0xe8410000))
>  		return 0;
>  	return 1;
>  }
> @@ -505,7 +506,7 @@ static int
>  expected_nop_sequence(void *ip, unsigned int op0, unsigned int op1)
>  {
>  	/* look for patched "NOP" on ppc64 with -mprofile-kernel */
> -	if (op0 != PPC_INST_NOP)
> +	if (op0 != ppc_inst(PPC_INST_NOP))
>  		return 0;
>  	return 1;
>  }
> @@ -588,7 +589,7 @@ __ftrace_make_call(struct dyn_ftrace *rec, unsigned long
> addr) return -EFAULT;
> 
>  	/* It should be pointing to a nop */
> -	if (op != PPC_INST_NOP) {
> +	if (op != ppc_inst(PPC_INST_NOP)) {
>  		pr_err("Expected NOP but have %x\n", op);
>  		return -EINVAL;
>  	}
> @@ -645,7 +646,7 @@ static int __ftrace_make_call_kernel(struct dyn_ftrace
> *rec, unsigned long addr) return -EFAULT;
>  	}
> 
> -	if (op != PPC_INST_NOP) {
> +	if (op != ppc_inst(PPC_INST_NOP)) {
>  		pr_err("Unexpected call sequence at %p: %x\n", ip, op);
>  		return -EINVAL;
>  	}
> @@ -676,7 +677,7 @@ int ftrace_make_call(struct dyn_ftrace *rec, unsigned
> long addr) */
>  	if (test_24bit_addr(ip, addr)) {
>  		/* within range */
> -		old = PPC_INST_NOP;
> +		old = ppc_inst(PPC_INST_NOP);
>  		new = ftrace_call_replace(ip, addr, 1);
>  		return ftrace_modify_code(ip, old, new);
>  	} else if (core_kernel_text(ip))
> diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
> index 1cfef0e5fec5..31c870287f2b 100644
> --- a/arch/powerpc/kernel/uprobes.c
> +++ b/arch/powerpc/kernel/uprobes.c
> @@ -14,6 +14,7 @@
>  #include <linux/kdebug.h>
> 
>  #include <asm/sstep.h>
> +#include <asm/inst.h>
> 
>  #define UPROBE_TRAP_NR	UINT_MAX
> 
> diff --git a/arch/powerpc/kvm/emulate_loadstore.c
> b/arch/powerpc/kvm/emulate_loadstore.c index 1139bc56e004..135d0e686622
> 100644
> --- a/arch/powerpc/kvm/emulate_loadstore.c
> +++ b/arch/powerpc/kvm/emulate_loadstore.c
> @@ -95,7 +95,7 @@ int kvmppc_emulate_loadstore(struct kvm_vcpu *vcpu)
> 
>  	emulated = EMULATE_FAIL;
>  	vcpu->arch.regs.msr = vcpu->arch.shared->msr;
> -	if (analyse_instr(&op, &vcpu->arch.regs, inst) == 0) {
> +	if (analyse_instr(&op, &vcpu->arch.regs, ppc_inst(inst)) == 0) {
>  		int type = op.type & INSTR_TYPE_MASK;
>  		int size = GETSIZE(op.type);
> 
> diff --git a/arch/powerpc/lib/code-patching.c
> b/arch/powerpc/lib/code-patching.c index 6ed3301c0582..6c30ddadd971 100644
> --- a/arch/powerpc/lib/code-patching.c
> +++ b/arch/powerpc/lib/code-patching.c
> @@ -17,6 +17,7 @@
>  #include <asm/page.h>
>  #include <asm/code-patching.h>
>  #include <asm/setup.h>
> +#include <asm/inst.h>
> 
>  static int __patch_instruction(unsigned int *exec_addr, unsigned int instr,
> unsigned int *patch_addr)
> @@ -414,37 +415,37 @@ static void __init test_branch_iform(void)
>  	addr = (unsigned long)&instr;
> 
>  	/* The simplest case, branch to self, no flags */
> -	check(instr_is_branch_iform(0x48000000));
> +	check(instr_is_branch_iform(ppc_inst(0x48000000)));
>  	/* All bits of target set, and flags */
> -	check(instr_is_branch_iform(0x4bffffff));
> +	check(instr_is_branch_iform(ppc_inst(0x4bffffff)));
>  	/* High bit of opcode set, which is wrong */
> -	check(!instr_is_branch_iform(0xcbffffff));
> +	check(!instr_is_branch_iform(ppc_inst(0xcbffffff)));
>  	/* Middle bits of opcode set, which is wrong */
> -	check(!instr_is_branch_iform(0x7bffffff));
> +	check(!instr_is_branch_iform(ppc_inst(0x7bffffff)));
> 
>  	/* Simplest case, branch to self with link */
> -	check(instr_is_branch_iform(0x48000001));
> +	check(instr_is_branch_iform(ppc_inst(0x48000001)));
>  	/* All bits of targets set */
> -	check(instr_is_branch_iform(0x4bfffffd));
> +	check(instr_is_branch_iform(ppc_inst(0x4bfffffd)));
>  	/* Some bits of targets set */
> -	check(instr_is_branch_iform(0x4bff00fd));
> +	check(instr_is_branch_iform(ppc_inst(0x4bff00fd)));
>  	/* Must be a valid branch to start with */
> -	check(!instr_is_branch_iform(0x7bfffffd));
> +	check(!instr_is_branch_iform(ppc_inst(0x7bfffffd)));
> 
>  	/* Absolute branch to 0x100 */
> -	instr = 0x48000103;
> +	instr = ppc_inst(0x48000103);
>  	check(instr_is_branch_to_addr(&instr, 0x100));
>  	/* Absolute branch to 0x420fc */
> -	instr = 0x480420ff;
> +	instr = ppc_inst(0x480420ff);
>  	check(instr_is_branch_to_addr(&instr, 0x420fc));
>  	/* Maximum positive relative branch, + 20MB - 4B */
> -	instr = 0x49fffffc;
> +	instr = ppc_inst(0x49fffffc);
>  	check(instr_is_branch_to_addr(&instr, addr + 0x1FFFFFC));
>  	/* Smallest negative relative branch, - 4B */
> -	instr = 0x4bfffffc;
> +	instr = ppc_inst(0x4bfffffc);
>  	check(instr_is_branch_to_addr(&instr, addr - 4));
>  	/* Largest negative relative branch, - 32 MB */
> -	instr = 0x4a000000;
> +	instr = ppc_inst(0x4a000000);
>  	check(instr_is_branch_to_addr(&instr, addr - 0x2000000));
> 
>  	/* Branch to self, with link */
> @@ -478,7 +479,7 @@ static void __init test_branch_iform(void)
>  	/* Check flags are masked correctly */
>  	err = create_branch(&instr, &instr, addr, 0xFFFFFFFC);
>  	check(instr_is_branch_to_addr(&instr, addr));
> -	check(instr == 0x48000000);
> +	check(instr == ppc_inst(0x48000000));
>  }
> 
>  static void __init test_create_function_call(void)
> @@ -505,28 +506,28 @@ static void __init test_branch_bform(void)
>  	addr = (unsigned long)iptr;
> 
>  	/* The simplest case, branch to self, no flags */
> -	check(instr_is_branch_bform(0x40000000));
> +	check(instr_is_branch_bform(ppc_inst(0x40000000)));
>  	/* All bits of target set, and flags */
> -	check(instr_is_branch_bform(0x43ffffff));
> +	check(instr_is_branch_bform(ppc_inst(0x43ffffff)));
>  	/* High bit of opcode set, which is wrong */
> -	check(!instr_is_branch_bform(0xc3ffffff));
> +	check(!instr_is_branch_bform(ppc_inst(0xc3ffffff)));
>  	/* Middle bits of opcode set, which is wrong */
> -	check(!instr_is_branch_bform(0x7bffffff));
> +	check(!instr_is_branch_bform(ppc_inst(0x7bffffff)));
> 
>  	/* Absolute conditional branch to 0x100 */
> -	instr = 0x43ff0103;
> +	instr = ppc_inst(0x43ff0103);
>  	check(instr_is_branch_to_addr(&instr, 0x100));
>  	/* Absolute conditional branch to 0x20fc */
> -	instr = 0x43ff20ff;
> +	instr = ppc_inst(0x43ff20ff);
>  	check(instr_is_branch_to_addr(&instr, 0x20fc));
>  	/* Maximum positive relative conditional branch, + 32 KB - 4B */
> -	instr = 0x43ff7ffc;
> +	instr = ppc_inst(0x43ff7ffc);
>  	check(instr_is_branch_to_addr(&instr, addr + 0x7FFC));
>  	/* Smallest negative relative conditional branch, - 4B */
> -	instr = 0x43fffffc;
> +	instr = ppc_inst(0x43fffffc);
>  	check(instr_is_branch_to_addr(&instr, addr - 4));
>  	/* Largest negative relative conditional branch, - 32 KB */
> -	instr = 0x43ff8000;
> +	instr = ppc_inst(0x43ff8000);
>  	check(instr_is_branch_to_addr(&instr, addr - 0x8000));
> 
>  	/* All condition code bits set & link */
> @@ -563,7 +564,7 @@ static void __init test_branch_bform(void)
>  	/* Check flags are masked correctly */
>  	err = create_cond_branch(&instr, iptr, addr, 0xFFFFFFFC);
>  	check(instr_is_branch_to_addr(&instr, addr));
> -	check(instr == 0x43FF0000);
> +	check(instr == ppc_inst(0x43FF0000));
>  }
> 
>  static void __init test_translate_branch(void)
> @@ -597,7 +598,7 @@ static void __init test_translate_branch(void)
>  	patch_instruction(q, instr);
>  	check(instr_is_branch_to_addr(p, addr));
>  	check(instr_is_branch_to_addr(q, addr));
> -	check(*q == 0x4a000000);
> +	check(*q == ppc_inst(0x4a000000));
> 
>  	/* Maximum positive case, move x to x - 32 MB + 4 */
>  	p = buf + 0x2000000;
> @@ -608,7 +609,7 @@ static void __init test_translate_branch(void)
>  	patch_instruction(q, instr);
>  	check(instr_is_branch_to_addr(p, addr));
>  	check(instr_is_branch_to_addr(q, addr));
> -	check(*q == 0x49fffffc);
> +	check(*q == ppc_inst(0x49fffffc));
> 
>  	/* Jump to x + 16 MB moved to x + 20 MB */
>  	p = buf;
> @@ -654,7 +655,7 @@ static void __init test_translate_branch(void)
>  	patch_instruction(q, instr);
>  	check(instr_is_branch_to_addr(p, addr));
>  	check(instr_is_branch_to_addr(q, addr));
> -	check(*q == 0x43ff8000);
> +	check(*q == ppc_inst(0x43ff8000));
> 
>  	/* Maximum positive case, move x to x - 32 KB + 4 */
>  	p = buf + 0x8000;
> @@ -666,7 +667,7 @@ static void __init test_translate_branch(void)
>  	patch_instruction(q, instr);
>  	check(instr_is_branch_to_addr(p, addr));
>  	check(instr_is_branch_to_addr(q, addr));
> -	check(*q == 0x43ff7ffc);
> +	check(*q == ppc_inst(0x43ff7ffc));
> 
>  	/* Jump to x + 12 KB moved to x + 20 KB */
>  	p = buf;
> diff --git a/arch/powerpc/lib/feature-fixups.c
> b/arch/powerpc/lib/feature-fixups.c index b129d7b4e7dd..6e7479b8887a 100644
> --- a/arch/powerpc/lib/feature-fixups.c
> +++ b/arch/powerpc/lib/feature-fixups.c
> @@ -21,6 +21,7 @@
>  #include <asm/setup.h>
>  #include <asm/security_features.h>
>  #include <asm/firmware.h>
> +#include <asm/inst.h>
> 
>  struct fixup_entry {
>  	unsigned long	mask;
> @@ -89,7 +90,7 @@ static int patch_feature_section(unsigned long value,
> struct fixup_entry *fcur) }
> 
>  	for (; dest < end; dest++)
> -		raw_patch_instruction(dest, PPC_INST_NOP);
> +		raw_patch_instruction(dest, ppc_inst(PPC_INST_NOP));
> 
>  	return 0;
>  }
> @@ -146,15 +147,15 @@ static void do_stf_entry_barrier_fixups(enum
> stf_barrier_type types)
> 
>  		pr_devel("patching dest %lx\n", (unsigned long)dest);
> 
> -		patch_instruction(dest, instrs[0]);
> +		patch_instruction(dest, ppc_inst(instrs[0]));
> 
>  		if (types & STF_BARRIER_FALLBACK)
>  			patch_branch(dest + 1, (unsigned long)&stf_barrier_fallback,
>  				     BRANCH_SET_LINK);
>  		else
> -			patch_instruction(dest + 1, instrs[1]);
> +			patch_instruction(dest + 1, ppc_inst(instrs[1]));
> 
> -		patch_instruction(dest + 2, instrs[2]);
> +		patch_instruction(dest + 2, ppc_inst(instrs[2]));
>  	}
> 
>  	printk(KERN_DEBUG "stf-barrier: patched %d entry locations (%s
> barrier)\n", i, @@ -207,12 +208,12 @@ static void
> do_stf_exit_barrier_fixups(enum stf_barrier_type types)
> 
>  		pr_devel("patching dest %lx\n", (unsigned long)dest);
> 
> -		patch_instruction(dest, instrs[0]);
> -		patch_instruction(dest + 1, instrs[1]);
> -		patch_instruction(dest + 2, instrs[2]);
> -		patch_instruction(dest + 3, instrs[3]);
> -		patch_instruction(dest + 4, instrs[4]);
> -		patch_instruction(dest + 5, instrs[5]);
> +		patch_instruction(dest, ppc_inst(instrs[0]));
> +		patch_instruction(dest + 1, ppc_inst(instrs[1]));
> +		patch_instruction(dest + 2, ppc_inst(instrs[2]));
> +		patch_instruction(dest + 3, ppc_inst(instrs[3]));
> +		patch_instruction(dest + 4, ppc_inst(instrs[4]));
> +		patch_instruction(dest + 5, ppc_inst(instrs[5]));
>  	}
>  	printk(KERN_DEBUG "stf-barrier: patched %d exit locations (%s barrier)
\n",
> i, (types == STF_BARRIER_NONE)                  ? "no" :
> @@ -260,9 +261,9 @@ void do_rfi_flush_fixups(enum l1d_flush_type types)
> 
>  		pr_devel("patching dest %lx\n", (unsigned long)dest);
> 
> -		patch_instruction(dest, instrs[0]);
> -		patch_instruction(dest + 1, instrs[1]);
> -		patch_instruction(dest + 2, instrs[2]);
> +		patch_instruction(dest, ppc_inst(instrs[0]));
> +		patch_instruction(dest + 1, ppc_inst(instrs[1]));
> +		patch_instruction(dest + 2, ppc_inst(instrs[2]));
>  	}
> 
>  	printk(KERN_DEBUG "rfi-flush: patched %d locations (%s flush)\n", i,
> @@ -295,7 +296,7 @@ void do_barrier_nospec_fixups_range(bool enable, void
> *fixup_start, void *fixup_ dest = (void *)start + *start;
> 
>  		pr_devel("patching dest %lx\n", (unsigned long)dest);
> -		patch_instruction(dest, instr);
> +		patch_instruction(dest, ppc_inst(instr));
>  	}
> 
>  	printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i);
> @@ -338,8 +339,8 @@ void do_barrier_nospec_fixups_range(bool enable, void
> *fixup_start, void *fixup_ dest = (void *)start + *start;
> 
>  		pr_devel("patching dest %lx\n", (unsigned long)dest);
> -		patch_instruction(dest, instr[0]);
> -		patch_instruction(dest + 1, instr[1]);
> +		patch_instruction(dest, ppc_inst(instr[0]));
> +		patch_instruction(dest + 1, ppc_inst(instr[1]));
>  	}
> 
>  	printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i);
> @@ -353,7 +354,7 @@ static void patch_btb_flush_section(long *curr)
>  	end = (void *)curr + *(curr + 1);
>  	for (; start < end; start++) {
>  		pr_devel("patching dest %lx\n", (unsigned long)start);
> -		patch_instruction(start, PPC_INST_NOP);
> +		patch_instruction(start, ppc_inst(PPC_INST_NOP));
>  	}
>  }
> 
> @@ -382,7 +383,7 @@ void do_lwsync_fixups(unsigned long value, void
> *fixup_start, void *fixup_end)
> 
>  	for (; start < end; start++) {
>  		dest = (void *)start + *start;
> -		raw_patch_instruction(dest, PPC_INST_LWSYNC);
> +		raw_patch_instruction(dest, ppc_inst(PPC_INST_LWSYNC));
>  	}
>  }
> 
> @@ -400,7 +401,7 @@ static void do_final_fixups(void)
>  	length = (__end_interrupts - _stext) / sizeof(int);
> 
>  	while (length--) {
> -		raw_patch_instruction(dest, *src);
> +		raw_patch_instruction(dest, ppc_inst(*src));
>  		src++;
>  		dest++;
>  	}
> diff --git a/arch/powerpc/lib/test_emulate_step.c
> b/arch/powerpc/lib/test_emulate_step.c index 53df4146dd32..85d62f16d07a
> 100644
> --- a/arch/powerpc/lib/test_emulate_step.c
> +++ b/arch/powerpc/lib/test_emulate_step.c
> @@ -11,6 +11,7 @@
>  #include <asm/sstep.h>
>  #include <asm/ppc-opcode.h>
>  #include <asm/code-patching.h>
> +#include <asm/inst.h>
> 
>  #define IMM_L(i)		((uintptr_t)(i) & 0xffff)
>  #define IMM_DS(i)		((uintptr_t)(i) & 0xfffc)
> @@ -19,40 +20,40 @@
>   * Defined with TEST_ prefix so it does not conflict with other
>   * definitions.
>   */
> -#define TEST_LD(r, base, i)	(PPC_INST_LD | ___PPC_RT(r) |		\
> +#define TEST_LD(r, base, i)	ppc_inst(PPC_INST_LD | ___PPC_RT(r) |		\
>  					___PPC_RA(base) | IMM_DS(i))
> -#define TEST_LWZ(r, base, i)	(PPC_INST_LWZ | ___PPC_RT(r) |		\
> +#define TEST_LWZ(r, base, i)	ppc_inst(PPC_INST_LWZ | ___PPC_RT(r) |		\
>  					___PPC_RA(base) | IMM_L(i))
> -#define TEST_LWZX(t, a, b)	(PPC_INST_LWZX | ___PPC_RT(t) |		\
> +#define TEST_LWZX(t, a, b)	ppc_inst(PPC_INST_LWZX | ___PPC_RT(t) |		\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_STD(r, base, i)	(PPC_INST_STD | ___PPC_RS(r) |		\
> +#define TEST_STD(r, base, i)	ppc_inst(PPC_INST_STD | ___PPC_RS(r) |		\
>  					___PPC_RA(base) | IMM_DS(i))
> -#define TEST_LDARX(t, a, b, eh)	(PPC_INST_LDARX | ___PPC_RT(t) |	\
> +#define TEST_LDARX(t, a, b, eh)	ppc_inst(PPC_INST_LDARX | ___PPC_RT(t) |	\
>  					___PPC_RA(a) | ___PPC_RB(b) |	\
>  					__PPC_EH(eh))
> -#define TEST_STDCX(s, a, b)	(PPC_INST_STDCX | ___PPC_RS(s) |	\
> +#define TEST_STDCX(s, a, b)	ppc_inst(PPC_INST_STDCX | ___PPC_RS(s) |	\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_LFSX(t, a, b)	(PPC_INST_LFSX | ___PPC_RT(t) |		\
> +#define TEST_LFSX(t, a, b)	ppc_inst(PPC_INST_LFSX | ___PPC_RT(t) |		\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_STFSX(s, a, b)	(PPC_INST_STFSX | ___PPC_RS(s) |	\
> +#define TEST_STFSX(s, a, b)	ppc_inst(PPC_INST_STFSX | ___PPC_RS(s) |	\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_LFDX(t, a, b)	(PPC_INST_LFDX | ___PPC_RT(t) |		\
> +#define TEST_LFDX(t, a, b)	ppc_inst(PPC_INST_LFDX | ___PPC_RT(t) |		\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_STFDX(s, a, b)	(PPC_INST_STFDX | ___PPC_RS(s) |	\
> +#define TEST_STFDX(s, a, b)	ppc_inst(PPC_INST_STFDX | ___PPC_RS(s) |	\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_LVX(t, a, b)	(PPC_INST_LVX | ___PPC_RT(t) |		\
> +#define TEST_LVX(t, a, b)	ppc_inst(PPC_INST_LVX | ___PPC_RT(t) |		\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_STVX(s, a, b)	(PPC_INST_STVX | ___PPC_RS(s) |		\
> +#define TEST_STVX(s, a, b)	ppc_inst(PPC_INST_STVX | ___PPC_RS(s) |		\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_LXVD2X(s, a, b)	(PPC_INST_LXVD2X | VSX_XX1((s), R##a, R##b))
> -#define TEST_STXVD2X(s, a, b)	(PPC_INST_STXVD2X | VSX_XX1((s), R##a, R##b))
> -#define TEST_ADD(t, a, b)	(PPC_INST_ADD | ___PPC_RT(t) |		\
> +#define TEST_LXVD2X(s, a, b)	ppc_inst(PPC_INST_LXVD2X | VSX_XX1((s), R##a,
> R##b)) +#define TEST_STXVD2X(s, a, b)	ppc_inst(PPC_INST_STXVD2X |
> VSX_XX1((s), R##a, R##b)) +#define TEST_ADD(t, a, b)	ppc_inst(PPC_INST_ADD
> | ___PPC_RT(t) |		\ ___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_ADD_DOT(t, a, b)	(PPC_INST_ADD | ___PPC_RT(t) |		\
> +#define TEST_ADD_DOT(t, a, b)	ppc_inst(PPC_INST_ADD | ___PPC_RT(t) |		
\
>  					___PPC_RA(a) | ___PPC_RB(b) | 0x1)
> -#define TEST_ADDC(t, a, b)	(PPC_INST_ADDC | ___PPC_RT(t) |		\
> +#define TEST_ADDC(t, a, b)	ppc_inst(PPC_INST_ADDC | ___PPC_RT(t) |		\
>  					___PPC_RA(a) | ___PPC_RB(b))
> -#define TEST_ADDC_DOT(t, a, b)	(PPC_INST_ADDC | ___PPC_RT(t) |		\
> +#define TEST_ADDC_DOT(t, a, b)	ppc_inst(PPC_INST_ADDC | ___PPC_RT(t) |		\
>  					___PPC_RA(a) | ___PPC_RB(b) | 0x1)
> 
>  #define MAX_SUBTESTS	16
> @@ -472,7 +473,7 @@ static struct compute_test compute_tests[] = {
>  		.subtests = {
>  			{
>  				.descr = "R0 = LONG_MAX",
> -				.instr = PPC_INST_NOP,
> +				.instr = ppc_inst(PPC_INST_NOP),
>  				.regs = {
>  					.gpr[0] = LONG_MAX,
>  				}
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index e122f0c8a044..7e60327a9483 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -54,6 +54,7 @@
>  #include <asm/firmware.h>
>  #include <asm/code-patching.h>
>  #include <asm/sections.h>
> +#include <asm/inst.h>
> 
>  #ifdef CONFIG_PPC64
>  #include <asm/hvcall.h>
> @@ -946,7 +947,7 @@ static void remove_bpts(void)
>  		if ((bp->enabled & (BP_TRAP|BP_CIABR)) != BP_TRAP)
>  			continue;
>  		if (mread(bp->address, &instr, 4) == 4
> -		    && instr == bpinstr
> +		    && instr == ppc_inst(bpinstr)
>  		    && patch_instruction(
>  			(unsigned int *)bp->address, bp->instr[0]) != 0)
>  			printf("Couldn't remove breakpoint at %lx\n",
> @@ -2847,7 +2848,7 @@ generic_inst_dump(unsigned long adr, long count, int
> praddr, {
>  	int nr, dotted;
>  	unsigned long first_adr;
> -	unsigned int inst, last_inst = 0;
> +	unsigned int inst, last_inst = ppc_inst(0);
>  	unsigned char val[4];
> 
>  	dotted = 0;
> @@ -2860,7 +2861,7 @@ generic_inst_dump(unsigned long adr, long count, int
> praddr, }
>  			break;
>  		}
> -		inst = GETWORD(val);
> +		inst = ppc_inst(GETWORD(val));
>  		if (adr > first_adr && inst == last_inst) {
>  			if (!dotted) {
>  				printf(" ...\n");





^ permalink raw reply

* Re: [PATCH v7 02/28] powerpc/xmon: Move breakpoint instructions to own array
From: Jordan Niethe @ 2020-05-04  5:52 UTC (permalink / raw)
  To: Alistair Popple
  Cc: Nicholas Piggin, Balamuruhan S, naveen.n.rao, linuxppc-dev,
	Daniel Axtens
In-Reply-To: <2060873.nxaIYQu1l1@townsend>

On Mon, May 4, 2020 at 3:41 PM Alistair Popple <alistair@popple.id.au> wrote:
>
> On Friday, 1 May 2020 1:41:54 PM AEST Jordan Niethe wrote:
> > To execute an instruction out of line after a breakpoint, the NIP is set
> > to the address of struct bpt::instr. Here a copy of the instruction that
> > was replaced with a breakpoint is kept, along with a trap so normal flow
> > can be resumed after XOLing. The struct bpt's are located within the
> > data section. This is problematic as the data section may be marked as
> > no execute.
> >
> > Instead of each struct bpt holding the instructions to be XOL'd, make a
> > new array, bpt_table[], with enough space to hold instructions for the
> > number of supported breakpoints. A later patch will move this to the
> > text section.
> > Make struct bpt::instr a pointer to the instructions in bpt_table[]
> > associated with that breakpoint. This association is a simple mapping:
> > bpts[n] -> bpt_table[n * words per breakpoint]. Currently we only need
> > the copied instruction followed by a trap, so 2 words per breakpoint.
> >
> > Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> > ---
> > v4: New to series
> > v5: - Do not use __section(), use a .space directive in .S file
> >     - Simplify in_breakpoint_table() calculation
> >     - Define BPT_SIZE
> > v6: - Seperate moving to text section
> > ---
> >  arch/powerpc/xmon/xmon.c | 21 ++++++++++++---------
> >  1 file changed, 12 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> > index f91ae2c9adbe..6ba7f66c1dd0 100644
> > --- a/arch/powerpc/xmon/xmon.c
> > +++ b/arch/powerpc/xmon/xmon.c
> > @@ -98,7 +98,7 @@ static long *xmon_fault_jmp[NR_CPUS];
> >  /* Breakpoint stuff */
> >  struct bpt {
> >       unsigned long   address;
> > -     unsigned int    instr[2];
> > +     unsigned int    *instr;
> >       atomic_t        ref_count;
> >       int             enabled;
> >       unsigned long   pad;
> > @@ -117,6 +117,10 @@ static unsigned bpinstr = 0x7fe00008;    /* trap */
> >
> >  #define BP_NUM(bp)   ((bp) - bpts + 1)
> >
> > +#define BPT_SIZE       (sizeof(unsigned int) * 2)
> > +#define BPT_WORDS      (BPT_SIZE / sizeof(unsigned int))
>
> Minor nit-pick but IMHO this would be more logical if you defined BPT_WORDS
> first like so:
>
> #define BPT_WORDS      (2)
> #define BPT_SIZE       (sizeof(unsigned int) * BPT_WORDS)
>
> Otherwise this looks good and I think the offset calculations below are correct
> so:
What I was thinking was BPT_SIZE  would later be defined in terms the
instruction type and by doing it this way
 BPT_WORDS  would be correct for the the instruction type if like on
ppc32, it did not include a suffix.
>
> Reviewed-by: Alistair Popple <alistair@popple.id.au>
>
> > +static unsigned int bpt_table[NBPTS * BPT_WORDS];
> > +
> >  /* Prototypes */
> >  static int cmds(struct pt_regs *);
> >  static int mread(unsigned long, void *, int);
> > @@ -854,15 +858,13 @@ static struct bpt *in_breakpoint_table(unsigned long
> > nip, unsigned long *offp) {
> >       unsigned long off;
> >
> > -     off = nip - (unsigned long) bpts;
> > -     if (off >= sizeof(bpts))
> > +     off = nip - (unsigned long) bpt_table;
> > +     if (off >= sizeof(bpt_table))
> >               return NULL;
> > -     off %= sizeof(struct bpt);
> > -     if (off != offsetof(struct bpt, instr[0])
> > -         && off != offsetof(struct bpt, instr[1]))
> > +     *offp = off % BPT_SIZE;
> > +     if (*offp != 0 && *offp != 4)
> >               return NULL;
> > -     *offp = off - offsetof(struct bpt, instr[0]);
> > -     return (struct bpt *) (nip - off);
> > +     return bpts + (off / BPT_SIZE);
> >  }
> >
> >  static struct bpt *new_breakpoint(unsigned long a)
> > @@ -877,7 +879,8 @@ static struct bpt *new_breakpoint(unsigned long a)
> >       for (bp = bpts; bp < &bpts[NBPTS]; ++bp) {
> >               if (!bp->enabled && atomic_read(&bp->ref_count) == 0) {
> >                       bp->address = a;
> > -                     patch_instruction(&bp->instr[1], bpinstr);
> > +                     bp->instr = bpt_table + ((bp - bpts) * BPT_WORDS);
> > +                     patch_instruction(bp->instr + 1, bpinstr);
> >                       return bp;
> >               }
> >       }
>
>
>
>

^ permalink raw reply

* Re: [PATCH v7 04/28] powerpc/xmon: Use bitwise calculations in_breakpoint_table()
From: Alistair Popple @ 2020-05-04  5:41 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-5-jniethe5@gmail.com>

I probably would have just folded this change into patch 2 but it looks fine to 
me.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

On Friday, 1 May 2020 1:41:56 PM AEST Jordan Niethe wrote:
> A modulo operation is used for calculating the current offset from a
> breakpoint within the breakpoint table. As instruction lengths are
> always a power of 2, this can be replaced with a bitwise 'and'. The
> current check for word alignment can be replaced with checking that the
> lower 2 bits are not set.
> 
> Suggested-by: Christophe Leroy <christophe.leroy@c-s.fr>
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
> v6: New to series
> ---
>  arch/powerpc/xmon/xmon.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index bbfea22f4a96..e122f0c8a044 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -857,8 +857,8 @@ static struct bpt *in_breakpoint_table(unsigned long
> nip, unsigned long *offp) off = nip - (unsigned long) bpt_table;
>  	if (off >= sizeof(bpt_table))
>  		return NULL;
> -	*offp = off % BPT_SIZE;
> -	if (*offp != 0 && *offp != 4)
> +	*offp = off & (BPT_SIZE - 1);
> +	if (off & 3)
>  		return NULL;
>  	return bpts + (off / BPT_SIZE);
>  }





^ permalink raw reply

* Re: [PATCH v7 02/28] powerpc/xmon: Move breakpoint instructions to own array
From: Alistair Popple @ 2020-05-04  5:41 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-3-jniethe5@gmail.com>

On Friday, 1 May 2020 1:41:54 PM AEST Jordan Niethe wrote:
> To execute an instruction out of line after a breakpoint, the NIP is set
> to the address of struct bpt::instr. Here a copy of the instruction that
> was replaced with a breakpoint is kept, along with a trap so normal flow
> can be resumed after XOLing. The struct bpt's are located within the
> data section. This is problematic as the data section may be marked as
> no execute.
> 
> Instead of each struct bpt holding the instructions to be XOL'd, make a
> new array, bpt_table[], with enough space to hold instructions for the
> number of supported breakpoints. A later patch will move this to the
> text section.
> Make struct bpt::instr a pointer to the instructions in bpt_table[]
> associated with that breakpoint. This association is a simple mapping:
> bpts[n] -> bpt_table[n * words per breakpoint]. Currently we only need
> the copied instruction followed by a trap, so 2 words per breakpoint.
> 
> Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
> ---
> v4: New to series
> v5: - Do not use __section(), use a .space directive in .S file
>     - Simplify in_breakpoint_table() calculation
>     - Define BPT_SIZE
> v6: - Seperate moving to text section
> ---
>  arch/powerpc/xmon/xmon.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c
> index f91ae2c9adbe..6ba7f66c1dd0 100644
> --- a/arch/powerpc/xmon/xmon.c
> +++ b/arch/powerpc/xmon/xmon.c
> @@ -98,7 +98,7 @@ static long *xmon_fault_jmp[NR_CPUS];
>  /* Breakpoint stuff */
>  struct bpt {
>  	unsigned long	address;
> -	unsigned int	instr[2];
> +	unsigned int	*instr;
>  	atomic_t	ref_count;
>  	int		enabled;
>  	unsigned long	pad;
> @@ -117,6 +117,10 @@ static unsigned bpinstr = 0x7fe00008;	/* trap */
> 
>  #define BP_NUM(bp)	((bp) - bpts + 1)
> 
> +#define BPT_SIZE       (sizeof(unsigned int) * 2)
> +#define BPT_WORDS      (BPT_SIZE / sizeof(unsigned int))

Minor nit-pick but IMHO this would be more logical if you defined BPT_WORDS 
first like so:

#define BPT_WORDS      (2)
#define BPT_SIZE       (sizeof(unsigned int) * BPT_WORDS)

Otherwise this looks good and I think the offset calculations below are correct 
so:

Reviewed-by: Alistair Popple <alistair@popple.id.au>

> +static unsigned int bpt_table[NBPTS * BPT_WORDS];
> +
>  /* Prototypes */
>  static int cmds(struct pt_regs *);
>  static int mread(unsigned long, void *, int);
> @@ -854,15 +858,13 @@ static struct bpt *in_breakpoint_table(unsigned long
> nip, unsigned long *offp) {
>  	unsigned long off;
> 
> -	off = nip - (unsigned long) bpts;
> -	if (off >= sizeof(bpts))
> +	off = nip - (unsigned long) bpt_table;
> +	if (off >= sizeof(bpt_table))
>  		return NULL;
> -	off %= sizeof(struct bpt);
> -	if (off != offsetof(struct bpt, instr[0])
> -	    && off != offsetof(struct bpt, instr[1]))
> +	*offp = off % BPT_SIZE;
> +	if (*offp != 0 && *offp != 4)
>  		return NULL;
> -	*offp = off - offsetof(struct bpt, instr[0]);
> -	return (struct bpt *) (nip - off);
> +	return bpts + (off / BPT_SIZE);
>  }
> 
>  static struct bpt *new_breakpoint(unsigned long a)
> @@ -877,7 +879,8 @@ static struct bpt *new_breakpoint(unsigned long a)
>  	for (bp = bpts; bp < &bpts[NBPTS]; ++bp) {
>  		if (!bp->enabled && atomic_read(&bp->ref_count) == 0) {
>  			bp->address = a;
> -			patch_instruction(&bp->instr[1], bpinstr);
> +			bp->instr = bpt_table + ((bp - bpts) * BPT_WORDS);
> +			patch_instruction(bp->instr + 1, bpinstr);
>  			return bp;
>  		}
>  	}





^ permalink raw reply

* Re: [PATCH V2 00/11] Subject: Remove duplicated kmap code
From: Al Viro @ 2020-05-04  5:33 UTC (permalink / raw)
  To: Ira Weiny
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Huang Rui, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200504050447.GA979899@iweiny-DESK2.sc.intel.com>

On Sun, May 03, 2020 at 10:04:47PM -0700, Ira Weiny wrote:

> Grepping for 'asm/highmem.h' and investigations don't reveal any issues...  But
> you do have me worried.  That said 0-day has been crunching on multiple
> versions of this series without issues such as this (save the mips issue
> above).
> 
> I have to say it would be nice if the relation between linux/highmem.h and
> asm/highmem.h was more straightforward.

IIRC, the headache was due to asm/pgtable.h on several architectures and
asm/cacheflush.h on parisc.

<digs the notes out>

||         IOW, there's one in linux/highmem.h (conditional on !CONFIG_HIGHMEM,
|| !ARCH_HAS_KMAP) and several per-architecture variants, usually declared in
|| their asm/highmem.h.  In three of those (microblaze, parisc and powerpc) these
|| are inlines (parisc one identical to linux/highmem.h, lives in asm/cacheflush.h,
|| powerpc and microblaze ones calling kmap_atomic_prot() which is defined in
|| arch/$ARCH/mm/highmem.c).
|| 
||         parisc case is weird - essentially, they want to call 
|| flush_kernel_dcache_page_addr() at the places where kunmap/kunmap_atomic
|| is done.  And they do so despite not selecting HIGHMEM, with definitions
|| in usual place.  They do have ARCH_HAS_KMAP defined, which prevents
|| getting buggered in linux/highmem.h.  ARCH_HAS_KMAP is parisc-unique,
|| BTW, and checked only in linux/highmem.h.
|| 
||         All genuine arch-specific variants are defined in (or call functions
|| defined in) translation units that are only included CONFIG_HIGHMEM builds.
|| 
||         It would be tempting to consolidate those, e.g. by adding __kmap_atomic()
|| and __kmap_atomic_prot() without that boilerplate, with universal kmap_atomic()
|| and kmap_atomic_prot() in linux/highmem.h.  Potential problem with that would
|| be something that pulls ash/highmem.h (or asm/cacheflush.h in case of parisc)
|| directly and uses kmap_atomic/kmap_atomic_prot.  There's not a lot places
|| pulling asm/highmem.h, and many of them are not even in includes:
|| 
|| arch/arm/include/asm/efi.h:13:#include <asm/highmem.h>
|| arch/arm/mm/dma-mapping.c:31:#include <asm/highmem.h>
|| arch/arm/mm/flush.c:14:#include <asm/highmem.h>
|| arch/arm/mm/mmu.c:27:#include <asm/highmem.h>
|| arch/mips/include/asm/pgtable-32.h:22:#include <asm/highmem.h>
|| arch/mips/mm/cache.c:19:#include <asm/highmem.h>
|| arch/mips/mm/fault.c:28:#include <asm/highmem.h>                /* For VMALLOC_END */
|| arch/nds32/include/asm/pgtable.h:60:#include <asm/highmem.h>
|| arch/x86/kernel/setup_percpu.c:20:#include <asm/highmem.h>
|| include/linux/highmem.h:35:#include <asm/highmem.h>
|| 
|| Users of asm/cacheflush.h are rather more numerous; however, anything
|| outside of parisc-specific code has to pull linux/highmem.h, or it won't see
|| the definitions of kmap_atomic/kmap_atomic_prot at all.  arch/parisc itself
|| has no callers of those.
|| 
|| Outside of arch/* there is a plenty of callers.  However, the following is
|| true: all instances of kmap_atomic or kmap_atomic_prot outside of arch/*
|| are either inside the linux/highmem.h or are preceded by include of
|| linux/highmem.h on any build that sees them (there is a common include
|| chain that is conditional upon CONFIG_BLOCK, but it's only needed in
|| drivers that are BLOCK-dependent).  It was not fun to verify, to put
|| it mildly...
|| 
|| So for parisc we have no problem - leaving __kmap_atomic()/__kmap_atomic_prot()
|| in asm/cachefile.h and adding universal wrappers in linux/highmem.h will be
|| fine.  For other architectures the things might be trickier.
|| 
|| * arc: all users in arch/arc/ are within arch/arc/mm/{cache,highmem}.c;
|| both pull linux/highmem.h.  We are fine.
|| 
|| * arm: much, much worse.  We have several files that pull linux/highmem.h:
|| arch/arm/mm/cache-feroceon-l2.c, arch/arm/mm/cache-xsc3l2.c,
|| arch/arm/mm/copypage-*.c, arch/arm/mm/dma-mapping.c, arch/arm/mm/flush.c,
|| arch/arm/mm/highmem.c, arch/arm/probes/uprobes/core.c,
|| arch/arm/include/asm/kvm_mmu.h (kmap_atomic_pfn()).
|| Those are fine, but we also have this:
|| arch/arm/include/asm/pgtable.h:200:#define __pte_map(pmd)               (pte_t *)kmap_atomic(pmd_page(*(pmd)))
|| arch/arm/include/asm/pgtable.h:208:#define pte_offset_map(pmd,addr)     (__pte_map(pmd) + pte_index(addr))
|| and sure as hell, asm/pgtable.h does *NOT* pull linux/highmem.h.
|| 
|| Fortunately, the users of pte_offset_map() (__pte_map() has no other users)
|| are few, both in arch/arm and outside of arch.  All arm ones are pulling
|| linux/highmem (arch/arm/mm/{pgd,fault*}.c).  Outside of arch we have several
|| that pull highmem.h (by way of rmap.h or pagemap.h, usually):
||         fs/userfaultfd.c, mm/gup.c, mm/hmm.c, mm/huge_memory.c,
||         mm/khugepaged.c, mm/memory-failure.c, mm/memory.c, mm/migrate.c,
||         mm/mremap.c, mm/page_vma_mapped.c, mm/swap_state.c, mm/swapfile.c,
||         mm/vmalloc.c
|| and then there are these in linux/mm.h:
|| 
|| #define pte_offset_map_lock(mm, pmd, address, ptlp)     \
|| ({                                                      \
||         spinlock_t *__ptl = pte_lockptr(mm, pmd);       \
||         pte_t *__pte = pte_offset_map(pmd, address);    \
||         *(ptlp) = __ptl;                                \
||         spin_lock(__ptl);                               \
||         __pte;                                          \
|| })
|| #define pte_alloc_map(mm, pmd, address)                 \
||         (pte_alloc(mm, pmd) ? NULL : pte_offset_map(pmd, address))
|| #define pte_alloc_map_lock(mm, pmd, address, ptlp)      \
||         (pte_alloc(mm, pmd) ?                   \
||                  NULL : pte_offset_map_lock(mm, pmd, address, ptlp))
|| 
||         These have two users in arch/arm (arch/arm/mm/pgd.c and
|| arch/arm/lib/uaccess_with_memcpy.c, both pulling highmem.h).  Outside of
|| arch there are several new files (plus a lot of what we'd already seen
|| in mm/*.c, unsurprisingly):
||         fs/proc/task_mmu.c, mm/ksm.c, mm/madvise.c, mm/memcontrol.c,
||         mm/mempolicy.c, mm/mincore.c, mm/mprotect.c, mm/pagewalk.c,
||         mm/shmem.c, mm/userfaultfd.c,
|| all pulling linux/highmem.h, as pretty much all core VM does.  So we are
|| still fine.
|| 
|| * csky: users in arch/csky/abiv2/cacheflush.c, arch/csky/mm/dma-mapping.c,
|| arch/csky/mm/highmem.c, all pulling linux/highmem.h
|| 
|| * microblaze: users in arch/microblaze/mm/highmem.c (pulls linux/highmem.h) and,
|| arch/microblaze/include/asm/pgtable.h, this:
|| #define pte_offset_map(dir, addr)               \
||         ((pte_t *) kmap_atomic(pmd_page(*(dir))) + pte_index(addr))
||         One pte_offset_map user in arch/microblaze:
|| arch/microblaze/kernel/signal.c:207:    ptep = pte_offset_map(pmdp, address);
|| Messy, but doesn't require any changes (we have asm/pgalloc.h included
|| there, and that pull linux/highmem.h).
||         Outside of arch we'd already sorted it out when looking at arm.
|| 
|| * mips: users in arch/mips/kernel/crash_dump.c, arch/mips/kernel/uprobes.c,
|| arch/mips/mm/c-r4k.c, arch/mips/mm/dma-noncoherent.c, arch/mips/mm/highmem.c,
|| and arch/mips/mm/init.c (all pulling linux/highmem.h) plus this
|| arch/mips/mm/cache.c, which relies upon asm/highmem.h.  This can be switched
|| to linux/highmem.h.  On !CONFIG_HIGHMEM builds the call of kmap_atomic() in
|| there is eliminated, since it's conditional upon PageHighMem().  IOW, even
|| though we get a call of (inexistent) out-of-line version, it's not going to
|| survive into object file.  With linux/highmem.h use it will be an equally
|| eliminated call of inlined version.
|| XXX: arch/mips/mm/cache.c
|| 
|| * nds32: users in arch/nds32/kernel/dma.c, arch/nds32/mm/cacheflush.c and
|| arch/nds32/mm/highmem.c, all pulling linux/highmem.h
|| 
|| * powerpc: users in arch/powerpc/kvm/book3s_pr.c,
|| arch/powerpc/kvm/e500_mmu_host.c, arch/powerpc/mm/dma-noncoherent.c,
|| arch/powerpc/mm/highmem.c and arch/powerpc/mm/mem.c, all pulling
|| linux/highmem.h, a user in arch/powerpc/mm/hugetlbpage.c pulling it
|| via asm/tlb.h -> linux/pagemap.h -> linux/highmem.h and
|| macros for pte_offset_map in arch/powerpc/include/asm/*/32/pgtable.h.
|| Users of that within arch/powerpc are either 64bit-only or
|| pull linux/highmem.h (arch/powerpc/mm/pgtable_32.c and
|| arch/powerpc/xmon/xmon.c).  Users outside of arch - same as for arm.
|| 
|| * sparc: users in arch/sparc/kernel/uprobes.c and arch/sparc/mm/highmem.c
|| (both pulling linux/highmem.h directly) + arch/sparc/mm/init_64.c pulling
|| it via linux/pagemap.h.  Strangely, arch/sparc/mm/io-unit.c and
|| arch/sparc/mm/iommu.c both include linux/highmem.h with odd comment
|| that seems to indicate that once upon a time pte_offset_map() used to
|| requite kmap_atomic() there...  Right, it used to - until 2002.
|| These includes are pointless, then...
|| 
|| * x86: users in arch/x86/kernel/crash_dump_32.c, arch/x86/kvm/svm.c,
|| arch/x86/lib/usercopy_64.c, arch/x86/mm/highmem_32.c and arch/x86/mm/iomap_32.c,
|| all pulling linux/highmem.h, users in paging_tmpl.h (included from
|| arch/x86/kvm/mmu/mmu.c, which has pulled linux/highmem.h prior to that)
|| and definition of pte_offset_map() (in asm/pgtable_32.h)
|| Users of pte_offset_map() and friends in arch/x86 are in
|| arch/x86/kernel/vm86_32.c and arch/x86/mm/dump_pagetables.c (both
|| pulling linux/highmem.h), in arch/x86/mm/mem_encrypt_identity.c
|| (64bit-only, pte_offset_map() doesn't use kmap_atomic() there) and
|| arch/x86/kernel/tboot.c (pulls linux/highmem.h via asm/pgalloc.h
|| and linux/pagemap.h)
|| 
|| * xtensa: users in arch/xtensa/kernel/pci-dma.c, arch/xtensa/mm/highmem.c,
|| arch/xtensa/mm/cache.c and arch/xtensa/platforms/iss/simdisk.c (all pull
|| linux/highmem.h).

^ permalink raw reply

* Re: [PATCH] powerpc/powernv: Fix a warning message
From: Gautham R Shenoy @ 2020-05-04  5:05 UTC (permalink / raw)
  To: Christophe JAILLET
  Cc: ego, maddy, Rafael J. Wysocki, kernel-janitors, cclaudio, npiggin,
	linux-kernel, zhangshaokun, Viresh Kumar, atrajeev, paulus, tglx,
	linuxppc-dev, akshay.adiga
In-Reply-To: <20200502115949.139000-1-christophe.jaillet@wanadoo.fr>

Hello Christophe,

On Sat, May 02, 2020 at 01:59:49PM +0200, Christophe JAILLET wrote:
> Fix a cut'n'paste error in a warning message. This should be
> 'cpu-idle-state-residency-ns' to match the property searched in the
> previous 'of_property_read_u32_array()'
> 
> Fixes: 9c7b185ab2fe ("powernv/cpuidle: Parse dt idle properties into global structure")
> Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>

Thanks for catching this.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
>  arch/powerpc/platforms/powernv/idle.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/powernv/idle.c b/arch/powerpc/platforms/powernv/idle.c
> index 78599bca66c2..2dd467383a88 100644
> --- a/arch/powerpc/platforms/powernv/idle.c
> +++ b/arch/powerpc/platforms/powernv/idle.c
> @@ -1270,7 +1270,7 @@ static int pnv_parse_cpuidle_dt(void)
>  	/* Read residencies */
>  	if (of_property_read_u32_array(np, "ibm,cpu-idle-state-residency-ns",
>  				       temp_u32, nr_idle_states)) {
> -		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-latencies-ns in DT\n");
> +		pr_warn("cpuidle-powernv: missing ibm,cpu-idle-state-residency-ns in DT\n");
>  		rc = -EINVAL;
>  		goto out;
>  	}
> -- 
> 2.25.1
> 

^ permalink raw reply

* Re: [PATCH V2 00/11] Subject: Remove duplicated kmap code
From: Ira Weiny @ 2020-05-04  5:04 UTC (permalink / raw)
  To: Al Viro
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Huang Rui, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200504013509.GU23230@ZenIV.linux.org.uk>

On Mon, May 04, 2020 at 02:35:09AM +0100, Al Viro wrote:
> On Sun, May 03, 2020 at 06:09:01PM -0700, ira.weiny@intel.com wrote:
> > From: Ira Weiny <ira.weiny@intel.com>
> > 
> > The kmap infrastructure has been copied almost verbatim to every architecture.
> > This series consolidates obvious duplicated code by defining core functions
> > which call into the architectures only when needed.
> > 
> > Some of the k[un]map_atomic() implementations have some similarities but the
> > similarities were not sufficient to warrant further changes.
> > 
> > In addition we remove a duplicate implementation of kmap() in DRM.
> > 
> > Testing was done by 0day to cover all the architectures I can't readily
> > build/test.
> 
> OK...  Looking through my old notes on kmap unification (this winter, never
> went anywhere),
> 
> * arch/mips/mm/cache.c ought to use linux/highmem.h, not asm/highmem.h
> I suspect that your series doesn't build on some configs there.  Hadn't
> verified that, though.

Yes patch 6 makes the change because kmap_atomic() was no longer declared in
asm/highmem.h.  I'm pretty sure 0-day caught that ...  but I seem to remember
noticing some oddness in that file and I did go through it by hand.

> 
> * kmap_atomic_to_page() is dead, but not quite gone - csky and nds32 brought
> the damn thing back (nds32 - only an extern).  It needs killin'...

Easy enough. Added as a follow on patch.

> 
> * parisc is (arguably) abusing kunmap()/kunmap_atomic() for cache flushing.
> Replace the bulk of its highmem.h with
> #define ARCH_HAS_FLUSH_ON_KUNMAP
> #define arch_before_kunmap flush_kernel_dcache_page_addr
> and have default kunmap()/kunmap_atomic() do
> #ifdef ARCH_HAS_FLUSH_ON_KUNMAP
> 	arch_before_kunmap(page_address(page));
> #endif
> and
> #ifdef ARCH_HAS_FLUSH_ON_KUNMAP
> 	arch_before_kunmap(addr);
> #endif
> resp.  Kills ARCH_HAS_KMAP along with ifdefs on it, makes parisc use somewhat
> less hacky.

Agreed.  Done in a follow on patch.

> 
> I'd suggest checking various configs on mips - that's likely to cause headache.
> Said that, my analysis of include chains back then is pretty much worthless
> by now - I really hate the amount of indirect include chains leading to that
> sucker on some, but not all configs ;-/  IIRC, the proof that everything
> using kmap*/kunmap* would pull linux/highmem.h regardless of config took several
> hours of digging, ran for several pages and had been hopelessly brittle.
> arch/mips/mm/cache.c was the only exception caught by it, but these days
> there might be more.

Grepping for 'asm/highmem.h' and investigations don't reveal any issues...  But
you do have me worried.  That said 0-day has been crunching on multiple
versions of this series without issues such as this (save the mips issue
above).

I have to say it would be nice if the relation between linux/highmem.h and
asm/highmem.h was more straightforward.

Ira


^ permalink raw reply

* [PATCH V18 2/2] mm/debug: Add tests validating architecture page table helpers
From: Anshuman Khandual @ 2020-05-04  4:01 UTC (permalink / raw)
  To: linux-mm
  Cc: Heiko Carstens, Paul Mackerras, H. Peter Anvin, linux-riscv,
	Will Deacon, linux-arch, linux-s390, x86, Mike Rapoport,
	Christian Borntraeger, Ingo Molnar, Catalin Marinas,
	linux-snps-arc, Vasily Gorbik, Anshuman Khandual, Borislav Petkov,
	Paul Walmsley, Kirill A . Shutemov, Thomas Gleixner,
	linux-arm-kernel, Vineet Gupta, linux-kernel, Palmer Dabbelt,
	Qian Cai, Andrew Morton, linuxppc-dev
In-Reply-To: <1588564865-31160-1-git-send-email-anshuman.khandual@arm.com>

This adds tests which will validate architecture page table helpers and
other accessors in their compliance with expected generic MM semantics.
This will help various architectures in validating changes to existing
page table helpers or addition of new ones.

This test covers basic page table entry transformations including but not
limited to old, young, dirty, clean, write, write protect etc at various
level along with populating intermediate entries with next page table page
and validating them.

Test page table pages are allocated from system memory with required size
and alignments. The mapped pfns at page table levels are derived from a
real pfn representing a valid kernel text symbol. This test gets called
via late_initcall().

This test gets built and run when CONFIG_DEBUG_VM_PGTABLE is selected. Any
architecture, which is willing to subscribe this test will need to select
ARCH_HAS_DEBUG_VM_PGTABLE. For now this is limited to arc, arm64, x86, s390
and powerpc platforms where the test is known to build and run successfully
Going forward, other architectures too can subscribe the test after fixing
any build or runtime problems with their page table helpers.

Folks interested in making sure that a given platform's page table helpers
conform to expected generic MM semantics should enable the above config
which will just trigger this test during boot. Any non conformity here will
be reported as an warning which would need to be fixed. This test will help
catch any changes to the agreed upon semantics expected from generic MM and
enable platforms to accommodate it thereafter.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: x86@kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	# s390
Tested-by: Christophe Leroy <christophe.leroy@c-s.fr>	# ppc32
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 .../debug/debug-vm-pgtable/arch-support.txt   |  34 ++
 arch/arc/Kconfig                              |   1 +
 arch/arm64/Kconfig                            |   1 +
 arch/powerpc/Kconfig                          |   1 +
 arch/s390/Kconfig                             |   1 +
 arch/x86/Kconfig                              |   1 +
 lib/Kconfig.debug                             |  22 +
 mm/Makefile                                   |   1 +
 mm/debug_vm_pgtable.c                         | 382 ++++++++++++++++++
 9 files changed, 444 insertions(+)
 create mode 100644 Documentation/features/debug/debug-vm-pgtable/arch-support.txt
 create mode 100644 mm/debug_vm_pgtable.c

diff --git a/Documentation/features/debug/debug-vm-pgtable/arch-support.txt b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
new file mode 100644
index 000000000000..c527d05c0459
--- /dev/null
+++ b/Documentation/features/debug/debug-vm-pgtable/arch-support.txt
@@ -0,0 +1,34 @@
+#
+# Feature name:          debug-vm-pgtable
+#         Kconfig:       ARCH_HAS_DEBUG_VM_PGTABLE
+#         description:   arch supports pgtable tests for semantics compliance
+#
+    -----------------------
+    |         arch |status|
+    -----------------------
+    |       alpha: | TODO |
+    |         arc: |  ok  |
+    |         arm: | TODO |
+    |       arm64: |  ok  |
+    |         c6x: | TODO |
+    |        csky: | TODO |
+    |       h8300: | TODO |
+    |     hexagon: | TODO |
+    |        ia64: | TODO |
+    |        m68k: | TODO |
+    |  microblaze: | TODO |
+    |        mips: | TODO |
+    |       nds32: | TODO |
+    |       nios2: | TODO |
+    |    openrisc: | TODO |
+    |      parisc: | TODO |
+    |     powerpc: |  ok  |
+    |       riscv: | TODO |
+    |        s390: |  ok  |
+    |          sh: | TODO |
+    |       sparc: | TODO |
+    |          um: | TODO |
+    |   unicore32: | TODO |
+    |         x86: |  ok  |
+    |      xtensa: | TODO |
+    -----------------------
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig
index ff306246d0f8..471ef22216c4 100644
--- a/arch/arc/Kconfig
+++ b/arch/arc/Kconfig
@@ -6,6 +6,7 @@
 config ARC
 	def_bool y
 	select ARC_TIMERS
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DMA_PREP_COHERENT
 	select ARCH_HAS_PTE_SPECIAL
 	select ARCH_HAS_SETUP_DMA_OPS
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 40fb05d96c60..0efb46abaf3d 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -10,6 +10,7 @@ config ARM64
 	select ACPI_SPCR_TABLE if ACPI
 	select ACPI_PPTT if ACPI
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_DMA_PREP_COHERENT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 924c541a9260..4e89f22cdd27 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -116,6 +116,7 @@ config PPC
 	#
 	select ARCH_32BIT_OFF_T if PPC32
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 2167bce993ff..8206b2c19aa8 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -59,6 +59,7 @@ config KASAN_SHADOW_OFFSET
 config S390
 	def_bool y
 	select ARCH_BINFMT_ELF_STATE
+	select ARCH_HAS_DEBUG_VM_PGTABLE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FORTIFY_SOURCE
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1197b5596d5a..65f545c130f6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -59,6 +59,7 @@ config X86
 	select ARCH_CLOCKSOURCE_INIT
 	select ARCH_HAS_ACPI_TABLE_UPGRADE	if ACPI
 	select ARCH_HAS_DEBUG_VIRTUAL
+	select ARCH_HAS_DEBUG_VM_PGTABLE	if !X86_PAE
 	select ARCH_HAS_DEVMEM_IS_ALLOWED
 	select ARCH_HAS_ELF_RANDOMIZE
 	select ARCH_HAS_FAST_MULTIPLIER
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 21d9c5f6e7ec..f37d01054bc3 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -653,6 +653,12 @@ config SCHED_STACK_END_CHECK
 	  data corruption or a sporadic crash at a later stage once the region
 	  is examined. The runtime overhead introduced is minimal.
 
+config ARCH_HAS_DEBUG_VM_PGTABLE
+	bool
+	help
+	  An architecture should select this when it can successfully
+	  build and run DEBUG_VM_PGTABLE.
+
 config DEBUG_VM
 	bool "Debug VM"
 	depends on DEBUG_KERNEL
@@ -688,6 +694,22 @@ config DEBUG_VM_PGFLAGS
 
 	  If unsure, say N.
 
+config DEBUG_VM_PGTABLE
+	bool "Debug arch page table for semantics compliance"
+	depends on MMU
+	depends on ARCH_HAS_DEBUG_VM_PGTABLE
+	default y if DEBUG_VM
+	help
+	  This option provides a debug method which can be used to test
+	  architecture page table helper functions on various platforms in
+	  verifying if they comply with expected generic MM semantics. This
+	  will help architecture code in making sure that any changes or
+	  new additions of these helpers still conform to expected
+	  semantics of the generic MM. Platforms will have to opt in for
+	  this through ARCH_HAS_DEBUG_VM_PGTABLE.
+
+	  If unsure, say N.
+
 config ARCH_HAS_DEBUG_VIRTUAL
 	bool
 
diff --git a/mm/Makefile b/mm/Makefile
index fccd3756b25f..662fd1504646 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -88,6 +88,7 @@ obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o
 obj-$(CONFIG_DEBUG_KMEMLEAK) += kmemleak.o
 obj-$(CONFIG_DEBUG_KMEMLEAK_TEST) += kmemleak-test.o
 obj-$(CONFIG_DEBUG_RODATA_TEST) += rodata_test.o
+obj-$(CONFIG_DEBUG_VM_PGTABLE) += debug_vm_pgtable.o
 obj-$(CONFIG_PAGE_OWNER) += page_owner.o
 obj-$(CONFIG_CLEANCACHE) += cleancache.o
 obj-$(CONFIG_MEMORY_ISOLATION) += page_isolation.o
diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
new file mode 100644
index 000000000000..188c18908964
--- /dev/null
+++ b/mm/debug_vm_pgtable.c
@@ -0,0 +1,382 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This kernel test validates architecture page table helpers and
+ * accessors and helps in verifying their continued compliance with
+ * expected generic MM semantics.
+ *
+ * Copyright (C) 2019 ARM Ltd.
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#define pr_fmt(fmt) "debug_vm_pgtable: %s: " fmt, __func__
+
+#include <linux/gfp.h>
+#include <linux/highmem.h>
+#include <linux/hugetlb.h>
+#include <linux/kernel.h>
+#include <linux/kconfig.h>
+#include <linux/mm.h>
+#include <linux/mman.h>
+#include <linux/mm_types.h>
+#include <linux/module.h>
+#include <linux/pfn_t.h>
+#include <linux/printk.h>
+#include <linux/random.h>
+#include <linux/spinlock.h>
+#include <linux/swap.h>
+#include <linux/swapops.h>
+#include <linux/start_kernel.h>
+#include <linux/sched/mm.h>
+#include <asm/pgalloc.h>
+#include <asm/pgtable.h>
+
+#define VMFLAGS	(VM_READ|VM_WRITE|VM_EXEC)
+
+/*
+ * On s390 platform, the lower 4 bits are used to identify given page table
+ * entry type. But these bits might affect the ability to clear entries with
+ * pxx_clear() because of how dynamic page table folding works on s390. So
+ * while loading up the entries do not change the lower 4 bits. It does not
+ * have affect any other platform.
+ */
+#define S390_MASK_BITS	4
+#define RANDOM_ORVALUE	GENMASK(BITS_PER_LONG - 1, S390_MASK_BITS)
+#define RANDOM_NZVALUE	GENMASK(7, 0)
+
+static void __init pte_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pte_t pte = pfn_pte(pfn, prot);
+
+	WARN_ON(!pte_same(pte, pte));
+	WARN_ON(!pte_young(pte_mkyoung(pte_mkold(pte))));
+	WARN_ON(!pte_dirty(pte_mkdirty(pte_mkclean(pte))));
+	WARN_ON(!pte_write(pte_mkwrite(pte_wrprotect(pte))));
+	WARN_ON(pte_young(pte_mkold(pte_mkyoung(pte))));
+	WARN_ON(pte_dirty(pte_mkclean(pte_mkdirty(pte))));
+	WARN_ON(pte_write(pte_wrprotect(pte_mkwrite(pte))));
+}
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pmd_t pmd = pfn_pmd(pfn, prot);
+
+	WARN_ON(!pmd_same(pmd, pmd));
+	WARN_ON(!pmd_young(pmd_mkyoung(pmd_mkold(pmd))));
+	WARN_ON(!pmd_dirty(pmd_mkdirty(pmd_mkclean(pmd))));
+	WARN_ON(!pmd_write(pmd_mkwrite(pmd_wrprotect(pmd))));
+	WARN_ON(pmd_young(pmd_mkold(pmd_mkyoung(pmd))));
+	WARN_ON(pmd_dirty(pmd_mkclean(pmd_mkdirty(pmd))));
+	WARN_ON(pmd_write(pmd_wrprotect(pmd_mkwrite(pmd))));
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pmd_bad().
+	 */
+	WARN_ON(!pmd_bad(pmd_mkhuge(pmd)));
+}
+
+#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pud_t pud = pfn_pud(pfn, prot);
+
+	WARN_ON(!pud_same(pud, pud));
+	WARN_ON(!pud_young(pud_mkyoung(pud_mkold(pud))));
+	WARN_ON(!pud_write(pud_mkwrite(pud_wrprotect(pud))));
+	WARN_ON(pud_write(pud_wrprotect(pud_mkwrite(pud))));
+	WARN_ON(pud_young(pud_mkold(pud_mkyoung(pud))));
+
+	if (mm_pmd_folded(mm))
+		return;
+
+	/*
+	 * A huge page does not point to next level page table
+	 * entry. Hence this must qualify as pud_bad().
+	 */
+	WARN_ON(!pud_bad(pud_mkhuge(pud)));
+}
+#else  /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif /* CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
+#else  /* !CONFIG_TRANSPARENT_HUGEPAGE */
+static void __init pmd_basic_tests(unsigned long pfn, pgprot_t prot) { }
+static void __init pud_basic_tests(unsigned long pfn, pgprot_t prot) { }
+#endif /* CONFIG_TRANSPARENT_HUGEPAGE */
+
+static void __init p4d_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	p4d_t p4d;
+
+	memset(&p4d, RANDOM_NZVALUE, sizeof(p4d_t));
+	WARN_ON(!p4d_same(p4d, p4d));
+}
+
+static void __init pgd_basic_tests(unsigned long pfn, pgprot_t prot)
+{
+	pgd_t pgd;
+
+	memset(&pgd, RANDOM_NZVALUE, sizeof(pgd_t));
+	WARN_ON(!pgd_same(pgd, pgd));
+}
+
+#ifndef __PAGETABLE_PUD_FOLDED
+static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp)
+{
+	pud_t pud = READ_ONCE(*pudp);
+
+	if (mm_pmd_folded(mm))
+		return;
+
+	pud = __pud(pud_val(pud) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pudp, pud);
+	pud_clear(pudp);
+	pud = READ_ONCE(*pudp);
+	WARN_ON(!pud_none(pud));
+}
+
+static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
+				      pmd_t *pmdp)
+{
+	pud_t pud;
+
+	if (mm_pmd_folded(mm))
+		return;
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pud_bad().
+	 */
+	pmd_clear(pmdp);
+	pud_clear(pudp);
+	pud_populate(mm, pudp, pmdp);
+	pud = READ_ONCE(*pudp);
+	WARN_ON(pud_bad(pud));
+}
+#else  /* !__PAGETABLE_PUD_FOLDED */
+static void __init pud_clear_tests(struct mm_struct *mm, pud_t *pudp) { }
+static void __init pud_populate_tests(struct mm_struct *mm, pud_t *pudp,
+				      pmd_t *pmdp)
+{
+}
+#endif /* PAGETABLE_PUD_FOLDED */
+
+#ifndef __PAGETABLE_P4D_FOLDED
+static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp)
+{
+	p4d_t p4d = READ_ONCE(*p4dp);
+
+	if (mm_pud_folded(mm))
+		return;
+
+	p4d = __p4d(p4d_val(p4d) | RANDOM_ORVALUE);
+	WRITE_ONCE(*p4dp, p4d);
+	p4d_clear(p4dp);
+	p4d = READ_ONCE(*p4dp);
+	WARN_ON(!p4d_none(p4d));
+}
+
+static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp,
+				      pud_t *pudp)
+{
+	p4d_t p4d;
+
+	if (mm_pud_folded(mm))
+		return;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as p4d_bad().
+	 */
+	pud_clear(pudp);
+	p4d_clear(p4dp);
+	p4d_populate(mm, p4dp, pudp);
+	p4d = READ_ONCE(*p4dp);
+	WARN_ON(p4d_bad(p4d));
+}
+
+static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp)
+{
+	pgd_t pgd = READ_ONCE(*pgdp);
+
+	if (mm_p4d_folded(mm))
+		return;
+
+	pgd = __pgd(pgd_val(pgd) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pgdp, pgd);
+	pgd_clear(pgdp);
+	pgd = READ_ONCE(*pgdp);
+	WARN_ON(!pgd_none(pgd));
+}
+
+static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
+				      p4d_t *p4dp)
+{
+	pgd_t pgd;
+
+	if (mm_p4d_folded(mm))
+		return;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pgd_bad().
+	 */
+	p4d_clear(p4dp);
+	pgd_clear(pgdp);
+	pgd_populate(mm, pgdp, p4dp);
+	pgd = READ_ONCE(*pgdp);
+	WARN_ON(pgd_bad(pgd));
+}
+#else  /* !__PAGETABLE_P4D_FOLDED */
+static void __init p4d_clear_tests(struct mm_struct *mm, p4d_t *p4dp) { }
+static void __init pgd_clear_tests(struct mm_struct *mm, pgd_t *pgdp) { }
+static void __init p4d_populate_tests(struct mm_struct *mm, p4d_t *p4dp,
+				      pud_t *pudp)
+{
+}
+static void __init pgd_populate_tests(struct mm_struct *mm, pgd_t *pgdp,
+				      p4d_t *p4dp)
+{
+}
+#endif /* PAGETABLE_P4D_FOLDED */
+
+static void __init pte_clear_tests(struct mm_struct *mm, pte_t *ptep,
+				   unsigned long vaddr)
+{
+	pte_t pte = READ_ONCE(*ptep);
+
+	pte = __pte(pte_val(pte) | RANDOM_ORVALUE);
+	set_pte_at(mm, vaddr, ptep, pte);
+	barrier();
+	pte_clear(mm, vaddr, ptep);
+	pte = READ_ONCE(*ptep);
+	WARN_ON(!pte_none(pte));
+}
+
+static void __init pmd_clear_tests(struct mm_struct *mm, pmd_t *pmdp)
+{
+	pmd_t pmd = READ_ONCE(*pmdp);
+
+	pmd = __pmd(pmd_val(pmd) | RANDOM_ORVALUE);
+	WRITE_ONCE(*pmdp, pmd);
+	pmd_clear(pmdp);
+	pmd = READ_ONCE(*pmdp);
+	WARN_ON(!pmd_none(pmd));
+}
+
+static void __init pmd_populate_tests(struct mm_struct *mm, pmd_t *pmdp,
+				      pgtable_t pgtable)
+{
+	pmd_t pmd;
+
+	/*
+	 * This entry points to next level page table page.
+	 * Hence this must not qualify as pmd_bad().
+	 */
+	pmd_clear(pmdp);
+	pmd_populate(mm, pmdp, pgtable);
+	pmd = READ_ONCE(*pmdp);
+	WARN_ON(pmd_bad(pmd));
+}
+
+static unsigned long __init get_random_vaddr(void)
+{
+	unsigned long random_vaddr, random_pages, total_user_pages;
+
+	total_user_pages = (TASK_SIZE - FIRST_USER_ADDRESS) / PAGE_SIZE;
+
+	random_pages = get_random_long() % total_user_pages;
+	random_vaddr = FIRST_USER_ADDRESS + random_pages * PAGE_SIZE;
+
+	return random_vaddr;
+}
+
+static int __init debug_vm_pgtable(void)
+{
+	struct mm_struct *mm;
+	pgd_t *pgdp;
+	p4d_t *p4dp, *saved_p4dp;
+	pud_t *pudp, *saved_pudp;
+	pmd_t *pmdp, *saved_pmdp, pmd;
+	pte_t *ptep;
+	pgtable_t saved_ptep;
+	pgprot_t prot;
+	phys_addr_t paddr;
+	unsigned long vaddr, pte_aligned, pmd_aligned;
+	unsigned long pud_aligned, p4d_aligned, pgd_aligned;
+	spinlock_t *uninitialized_var(ptl);
+
+	pr_info("Validating architecture page table helpers\n");
+	prot = vm_get_page_prot(VMFLAGS);
+	vaddr = get_random_vaddr();
+	mm = mm_alloc();
+	if (!mm) {
+		pr_err("mm_struct allocation failed\n");
+		return 1;
+	}
+
+	/*
+	 * PFN for mapping at PTE level is determined from a standard kernel
+	 * text symbol. But pfns for higher page table levels are derived by
+	 * masking lower bits of this real pfn. These derived pfns might not
+	 * exist on the platform but that does not really matter as pfn_pxx()
+	 * helpers will still create appropriate entries for the test. This
+	 * helps avoid large memory block allocations to be used for mapping
+	 * at higher page table levels.
+	 */
+	paddr = __pa_symbol(&start_kernel);
+
+	pte_aligned = (paddr & PAGE_MASK) >> PAGE_SHIFT;
+	pmd_aligned = (paddr & PMD_MASK) >> PAGE_SHIFT;
+	pud_aligned = (paddr & PUD_MASK) >> PAGE_SHIFT;
+	p4d_aligned = (paddr & P4D_MASK) >> PAGE_SHIFT;
+	pgd_aligned = (paddr & PGDIR_MASK) >> PAGE_SHIFT;
+	WARN_ON(!pfn_valid(pte_aligned));
+
+	pgdp = pgd_offset(mm, vaddr);
+	p4dp = p4d_alloc(mm, pgdp, vaddr);
+	pudp = pud_alloc(mm, p4dp, vaddr);
+	pmdp = pmd_alloc(mm, pudp, vaddr);
+	ptep = pte_alloc_map_lock(mm, pmdp, vaddr, &ptl);
+
+	/*
+	 * Save all the page table page addresses as the page table
+	 * entries will be used for testing with random or garbage
+	 * values. These saved addresses will be used for freeing
+	 * page table pages.
+	 */
+	pmd = READ_ONCE(*pmdp);
+	saved_p4dp = p4d_offset(pgdp, 0UL);
+	saved_pudp = pud_offset(p4dp, 0UL);
+	saved_pmdp = pmd_offset(pudp, 0UL);
+	saved_ptep = pmd_pgtable(pmd);
+
+	pte_basic_tests(pte_aligned, prot);
+	pmd_basic_tests(pmd_aligned, prot);
+	pud_basic_tests(pud_aligned, prot);
+	p4d_basic_tests(p4d_aligned, prot);
+	pgd_basic_tests(pgd_aligned, prot);
+
+	pte_clear_tests(mm, ptep, vaddr);
+	pmd_clear_tests(mm, pmdp);
+	pud_clear_tests(mm, pudp);
+	p4d_clear_tests(mm, p4dp);
+	pgd_clear_tests(mm, pgdp);
+
+	pte_unmap_unlock(ptep, ptl);
+
+	pmd_populate_tests(mm, pmdp, saved_ptep);
+	pud_populate_tests(mm, pudp, saved_pmdp);
+	p4d_populate_tests(mm, p4dp, saved_pudp);
+	pgd_populate_tests(mm, pgdp, saved_p4dp);
+
+	p4d_free(mm, saved_p4dp);
+	pud_free(mm, saved_pudp);
+	pmd_free(mm, saved_pmdp);
+	pte_free(mm, saved_ptep);
+
+	mm_dec_nr_puds(mm);
+	mm_dec_nr_pmds(mm);
+	mm_dec_nr_ptes(mm);
+	mmdrop(mm);
+	return 0;
+}
+late_initcall(debug_vm_pgtable);
-- 
2.20.1


^ permalink raw reply related

* [PATCH V18 0/2] mm/debug: Add tests validating architecture page table helpers
From: Anshuman Khandual @ 2020-05-04  4:01 UTC (permalink / raw)
  To: linux-mm
  Cc: Heiko Carstens, Paul Mackerras, H. Peter Anvin, linux-riscv,
	Will Deacon, linux-arch, linux-s390, x86, Mike Rapoport,
	Christian Borntraeger, Ingo Molnar, Catalin Marinas,
	linux-snps-arc, Vasily Gorbik, Anshuman Khandual, Borislav Petkov,
	Paul Walmsley, Kirill A . Shutemov, Thomas Gleixner,
	linux-arm-kernel, Vineet Gupta, linux-kernel, Palmer Dabbelt,
	Andrew Morton, linuxppc-dev

This adds a test validation for architecture exported page table helpers.
Patch adds basic transformation tests at various levels of the page table.

This test was originally suggested by Catalin during arm64 THP migration
RFC discussion earlier. Going forward it can include more specific tests
with respect to various generic MM functions like THP, HugeTLB etc and
platform specific tests.

https://lore.kernel.org/linux-mm/20190628102003.GA56463@arrakis.emea.arm.com/

Needs to be applied on linux V5.7-rc4

Changes in V18:

- Stopped enabling CONFIG_DEBUG_VM_PGTABLE via CONFIG_EXPERT
- Dropped the exclude list (IA64 and ARM) as CONFIG_EXPERT route no longer available
- Updated CONFIG_DEBUG_VM_PGTABLE's help section as required
- Updated the commit message for [PATCH 2/2] as required

Changes in V17: (https://patchwork.kernel.org/project/linux-mm/list/?series=274401)

- debug_vm_pgtable() is now called from late_initcall() per Linus
- Explicitly enable DEBUG_VM_PGTABLE when ARCH_HAS_DEBUG_VM_PGTABLE and DEBUG_VM
- Added #ifdef documentation per Gerald
- Dropped page table helper semantics documentation (will be added via later patches)
- Split the X86 changes defining mm_p4d_folded() into a new prerequisite patch

Changes in V16: (https://patchwork.kernel.org/patch/11431277/)

- Replaced WRITE_ONCE() with set_pte_at() with a new barrier() in pte_clear_tests() per Qian
- Enabled all powerpc platforms and updated the feature list

Changes in V15: (https://patchwork.kernel.org/patch/11422803/)

- Replaced __pa() with __pa_symbol() (https://patchwork.kernel.org/patch/11407715/) 
- Replaced pte_alloc_map() with pte_alloc_map_lock() per Qian
- Replaced pte_unmap() with pte_unmap_unlock() per Qian
- Added address to pte_clear_tests() and passed it down till pte_clear() per Qian

Changes in V14: (https://patchwork.kernel.org/project/linux-mm/list/?series=242305)

- Disabled DEBUG_VM_PGTABLE for IA64 and ARM (32 Bit) per Andrew and Christophe
- Updated DEBUG_VM_PGTABLE documentation wrt EXPERT and disabled platforms
- Updated RANDOM_[OR|NZ]VALUE open encodings with GENMASK() per Catalin
- Updated s390 constraint bits from 12 to 4 (S390_MASK_BITS) per Gerald
- Updated in-code documentation for RANDOM_ORVALUE per Gerald
- Updated pxx_basic_tests() to use invert functions first per Catalin
- Dropped ARCH_HAS_4LEVEL_HACK check from pud_basic_tests()
- Replaced __ARCH_HAS_[4|5]LEVEL_HACK with __PAGETABLE_[PUD|P4D]_FOLDED per Catalin
- Trimmed the CC list on the commit message per Catalin

Changes in V13: (https://patchwork.kernel.org/project/linux-mm/list/?series=237125)

- Subscribed s390 platform and updated debug-vm-pgtable/arch-support.txt per Gerald
- Dropped keyword 'extern' from debug_vm_pgtable() declaration per Christophe
- Moved debug_vm_pgtable() declarations to <linux/mmdebug.h> per Christophe
- Moved debug_vm_pgtable() call site into kernel_init() per Christophe
- Changed CONFIG_DEBUG_VM_PGTABLE rules per Christophe
- Updated commit to include new supported platforms and changed config selection

Changes in V12: (https://patchwork.kernel.org/project/linux-mm/list/?series=233905)

- Replaced __mmdrop() with mmdrop()
- Enable ARCH_HAS_DEBUG_VM_PGTABLE on X86 for non CONFIG_X86_PAE platforms as the
  test procedure interfere with pre-allocated PMDs attached to the PGD resulting
  in runtime failures with VM_BUG_ON()

Changes in V11: (https://patchwork.kernel.org/project/linux-mm/list/?series=221135)

- Rebased the patch on V5.4

Changes in V10: (https://patchwork.kernel.org/project/linux-mm/list/?series=205529)

- Always enable DEBUG_VM_PGTABLE when DEBUG_VM is enabled per Ingo
- Added tags from Ingo

Changes in V9: (https://patchwork.kernel.org/project/linux-mm/list/?series=201429)

- Changed feature support enumeration for powerpc platforms per Christophe
- Changed config wrapper for basic_[pmd|pud]_tests() to enable ARC platform
- Enabled the test on ARC platform

Changes in V8: (https://patchwork.kernel.org/project/linux-mm/list/?series=194297)

- Enabled ARCH_HAS_DEBUG_VM_PGTABLE on PPC32 platform per Christophe
- Updated feature documentation as DEBUG_VM_PGTABLE is now enabled on PPC32 platform
- Moved ARCH_HAS_DEBUG_VM_PGTABLE earlier to indent it with DEBUG_VM per Christophe
- Added an information message in debug_vm_pgtable() per Christophe
- Dropped random_vaddr boundary condition checks per Christophe and Qian
- Replaced virt_addr_valid() check with pfn_valid() check in debug_vm_pgtable()
- Slightly changed pr_fmt(fmt) information

Changes in V7: (https://patchwork.kernel.org/project/linux-mm/list/?series=193051)

- Memory allocation and free routines for mapped pages have been droped
- Mapped pfns are derived from standard kernel text symbol per Matthew
- Moved debug_vm_pgtaable() after page_alloc_init_late() per Michal and Qian 
- Updated the commit message per Michal
- Updated W=1 GCC warning problem on x86 per Qian Cai
- Addition of new alloc_contig_pages() helper has been submitted separately

Changes in V6: (https://patchwork.kernel.org/project/linux-mm/list/?series=187589)

- Moved alloc_gigantic_page_order() into mm/page_alloc.c per Michal
- Moved alloc_gigantic_page_order() within CONFIG_CONTIG_ALLOC in the test
- Folded Andrew's include/asm-generic/pgtable.h fix into the test patch 2/2

Changes in V5: (https://patchwork.kernel.org/project/linux-mm/list/?series=185991)

- Redefined and moved X86 mm_p4d_folded() into a different header per Kirill/Ingo
- Updated the config option comment per Ingo and dropped 'kernel module' reference
- Updated the commit message and dropped 'kernel module' reference
- Changed DEBUG_ARCH_PGTABLE_TEST into DEBUG_VM_PGTABLE per Ingo
- Moved config option from mm/Kconfig.debug into lib/Kconfig.debug
- Renamed core test function arch_pgtable_tests() as debug_vm_pgtable()
- Renamed mm/arch_pgtable_test.c as mm/debug_vm_pgtable.c
- debug_vm_pgtable() gets called from kernel_init_freeable() after init_mm_internals()
- Added an entry in Documentation/features/debug/ per Ingo
- Enabled the test on arm64 and x86 platforms for now

Changes in V4: (https://patchwork.kernel.org/project/linux-mm/list/?series=183465)

- Disable DEBUG_ARCH_PGTABLE_TEST for ARM and IA64 platforms

Changes in V3: (https://lore.kernel.org/patchwork/project/lkml/list/?series=411216)

- Changed test trigger from module format into late_initcall()
- Marked all functions with __init to be freed after completion
- Changed all __PGTABLE_PXX_FOLDED checks as mm_pxx_folded()
- Folded in PPC32 fixes from Christophe

Changes in V2:

https://lore.kernel.org/linux-mm/1568268173-31302-1-git-send-email-anshuman.khandual@arm.com/T/#t

- Fixed small typo error in MODULE_DESCRIPTION()
- Fixed m64k build problems for lvalue concerns in pmd_xxx_tests()
- Fixed dynamic page table level folding problems on x86 as per Kirril
- Fixed second pointers during pxx_populate_tests() per Kirill and Gerald
- Allocate and free pte table with pte_alloc_one/pte_free per Kirill
- Modified pxx_clear_tests() to accommodate s390 lower 12 bits situation
- Changed RANDOM_NZVALUE value from 0xbe to 0xff
- Changed allocation, usage, free sequence for saved_ptep
- Renamed VMA_FLAGS as VMFLAGS
- Implemented a new method for random vaddr generation
- Implemented some other cleanups
- Dropped extern reference to mm_alloc()
- Created and exported new alloc_gigantic_page_order()
- Dropped the custom allocator and used new alloc_gigantic_page_order()

Changes in V1:

https://lore.kernel.org/linux-mm/1567497706-8649-1-git-send-email-anshuman.khandual@arm.com/

- Added fallback mechanism for PMD aligned memory allocation failure

Changes in RFC V2:

https://lore.kernel.org/linux-mm/1565335998-22553-1-git-send-email-anshuman.khandual@arm.com/T/#u

- Moved test module and it's config from lib/ to mm/
- Renamed config TEST_ARCH_PGTABLE as DEBUG_ARCH_PGTABLE_TEST
- Renamed file from test_arch_pgtable.c to arch_pgtable_test.c
- Added relevant MODULE_DESCRIPTION() and MODULE_AUTHOR() details
- Dropped loadable module config option
- Basic tests now use memory blocks with required size and alignment
- PUD aligned memory block gets allocated with alloc_contig_range()
- If PUD aligned memory could not be allocated it falls back on PMD aligned
  memory block from page allocator and pud_* tests are skipped
- Clear and populate tests now operate on real in memory page table entries
- Dummy mm_struct gets allocated with mm_alloc()
- Dummy page table entries get allocated with [pud|pmd|pte]_alloc_[map]()
- Simplified [p4d|pgd]_basic_tests(), now has random values in the entries

Original RFC V1:

https://lore.kernel.org/linux-mm/1564037723-26676-1-git-send-email-anshuman.khandual@arm.com/

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mike Rapoport <rppt@linux.ibm.com>
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: linux-snps-arc@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-s390@vger.kernel.org
Cc: linux-riscv@lists.infradead.org
Cc: x86@kernel.org
Cc: linux-arch@vger.kernel.org
Cc: linux-kernel@vger.kernel.org


Anshuman Khandual (2):
  x86/mm: Define mm_p4d_folded()
  mm/debug: Add tests validating architecture page table helpers

 .../debug/debug-vm-pgtable/arch-support.txt   |  34 ++
 arch/arc/Kconfig                              |   1 +
 arch/arm64/Kconfig                            |   1 +
 arch/powerpc/Kconfig                          |   1 +
 arch/s390/Kconfig                             |   1 +
 arch/x86/Kconfig                              |   1 +
 arch/x86/include/asm/pgtable_64.h             |   6 +
 lib/Kconfig.debug                             |  22 +
 mm/Makefile                                   |   1 +
 mm/debug_vm_pgtable.c                         | 382 ++++++++++++++++++
 10 files changed, 450 insertions(+)
 create mode 100644 Documentation/features/debug/debug-vm-pgtable/arch-support.txt
 create mode 100644 mm/debug_vm_pgtable.c

-- 
2.20.1


^ permalink raw reply

* Re: [PATCH v7 05/28] powerpc: Change calling convention for create_branch() et. al.
From: Alistair Popple @ 2020-05-04  2:55 UTC (permalink / raw)
  To: Jordan Niethe; +Cc: npiggin, bala24, naveen.n.rao, linuxppc-dev, dja
In-Reply-To: <20200501034220.8982-6-jniethe5@gmail.com>

On Friday, 1 May 2020 1:41:57 PM AEST Jordan Niethe wrote:
> create_branch(), create_cond_branch() and translate_branch() return the
> instruction that they create, or return 0 to signal an error. Separate
> these concerns in preparation for an instruction type that is not just
> an unsigned int.  Fill the created instruction to a pointer passed as
> the first parameter to the function and use a non-zero return value to
> signify an error.

We're not adding any new checks for error cases here, but this patch doesn't 
change existing behaviour as far as I can tell and adding checks would be 
difficult (and could introduce other bugs) so would be best done as a separate 
series anyway if required at all.

> @@ -403,6 +407,7 @@ static void __init test_trampoline(void)
> 
>  static void __init test_branch_iform(void)
>  {
> +	int err;
>  	unsigned int instr;
>  	unsigned long addr;
> 
> @@ -443,35 +448,35 @@ static void __init test_branch_iform(void)
>  	check(instr_is_branch_to_addr(&instr, addr - 0x2000000));
> 
>  	/* Branch to self, with link */
> -	instr = create_branch(&instr, addr, BRANCH_SET_LINK);
> +	err = create_branch(&instr, &instr, addr, BRANCH_SET_LINK);
>  	check(instr_is_branch_to_addr(&instr, addr));

The pointer alias above initially caught my eye, but it's ok because 
create_branch() doesn't actually dereference the second instance. Arguably the 
argument type could be changed to an unsigned long but then we'd just end up 
with more casts so this is ok to me at least.

Reviewed-by: Alistair Popple <alistair@popple.id.au>

^ permalink raw reply

* Re: [PATCH V2 00/11] Subject: Remove duplicated kmap code
From: Al Viro @ 2020-05-04  1:35 UTC (permalink / raw)
  To: ira.weiny
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, linux-mips,
	James E.J. Bottomley, Max Filippov, Huang Rui, Paul Mackerras,
	H. Peter Anvin, sparclinux, Dan Williams, Helge Deller, x86,
	linux-csky, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Thomas Gleixner,
	linux-arm-kernel, Chris Zankel, Thomas Bogendoerfer, linux-parisc,
	linux-kernel, Christian Koenig, Andrew Morton, linuxppc-dev,
	David S. Miller
In-Reply-To: <20200504010912.982044-1-ira.weiny@intel.com>

On Sun, May 03, 2020 at 06:09:01PM -0700, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> The kmap infrastructure has been copied almost verbatim to every architecture.
> This series consolidates obvious duplicated code by defining core functions
> which call into the architectures only when needed.
> 
> Some of the k[un]map_atomic() implementations have some similarities but the
> similarities were not sufficient to warrant further changes.
> 
> In addition we remove a duplicate implementation of kmap() in DRM.
> 
> Testing was done by 0day to cover all the architectures I can't readily
> build/test.

OK...  Looking through my old notes on kmap unification (this winter, never
went anywhere),

* arch/mips/mm/cache.c ought to use linux/highmem.h, not asm/highmem.h
I suspect that your series doesn't build on some configs there.  Hadn't
verified that, though.

* kmap_atomic_to_page() is dead, but not quite gone - csky and nds32 brought
the damn thing back (nds32 - only an extern).  It needs killin'...

* parisc is (arguably) abusing kunmap()/kunmap_atomic() for cache flushing.
Replace the bulk of its highmem.h with
#define ARCH_HAS_FLUSH_ON_KUNMAP
#define arch_before_kunmap flush_kernel_dcache_page_addr
and have default kunmap()/kunmap_atomic() do
#ifdef ARCH_HAS_FLUSH_ON_KUNMAP
	arch_before_kunmap(page_address(page));
#endif
and
#ifdef ARCH_HAS_FLUSH_ON_KUNMAP
	arch_before_kunmap(addr);
#endif
resp.  Kills ARCH_HAS_KMAP along with ifdefs on it, makes parisc use somewhat
less hacky.

I'd suggest checking various configs on mips - that's likely to cause headache.
Said that, my analysis of include chains back then is pretty much worthless
by now - I really hate the amount of indirect include chains leading to that
sucker on some, but not all configs ;-/  IIRC, the proof that everything
using kmap*/kunmap* would pull linux/highmem.h regardless of config took several
hours of digging, ran for several pages and had been hopelessly brittle.
arch/mips/mm/cache.c was the only exception caught by it, but these days
there might be more.

^ permalink raw reply

* [PATCH V2 02/11] arch/xtensa: Move kmap build bug out of the way
From: ira.weiny @ 2020-05-04  1:09 UTC (permalink / raw)
  To: linux-kernel, Andrew Morton, Christian Koenig, Huang Rui
  Cc: Peter Zijlstra, Dave Hansen, dri-devel, James E.J. Bottomley,
	Max Filippov, Paul Mackerras, H. Peter Anvin, sparclinux,
	Ira Weiny, Thomas Gleixner, Helge Deller, x86, linux-csky,
	Christoph Hellwig, Ingo Molnar, linux-snps-arc, linux-xtensa,
	Borislav Petkov, Andy Lutomirski, Dan Williams, linux-arm-kernel,
	Chris Zankel, Thomas Bogendoerfer, linux-parisc, linux-mips,
	linuxppc-dev, David S. Miller
In-Reply-To: <20200504010912.982044-1-ira.weiny@intel.com>

From: Ira Weiny <ira.weiny@intel.com>

Move the kmap() build bug to kmap_init() to facilitate patches to lift
kmap() to the core.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>

---
Changes from V1:
	combine code onto 1 line.
---
 arch/xtensa/include/asm/highmem.h | 5 -----
 arch/xtensa/mm/highmem.c          | 4 ++++
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/xtensa/include/asm/highmem.h b/arch/xtensa/include/asm/highmem.h
index 413848cc1e56..a9587c85be85 100644
--- a/arch/xtensa/include/asm/highmem.h
+++ b/arch/xtensa/include/asm/highmem.h
@@ -68,11 +68,6 @@ void kunmap_high(struct page *page);
 
 static inline void *kmap(struct page *page)
 {
-	/* Check if this memory layout is broken because PKMAP overlaps
-	 * page table.
-	 */
-	BUILD_BUG_ON(PKMAP_BASE <
-		     TLBTEMP_BASE_1 + TLBTEMP_SIZE);
 	might_sleep();
 	if (!PageHighMem(page))
 		return page_address(page);
diff --git a/arch/xtensa/mm/highmem.c b/arch/xtensa/mm/highmem.c
index 184ceadccc1a..da734a2ed641 100644
--- a/arch/xtensa/mm/highmem.c
+++ b/arch/xtensa/mm/highmem.c
@@ -88,6 +88,10 @@ void __init kmap_init(void)
 {
 	unsigned long kmap_vstart;
 
+	/* Check if this memory layout is broken because PKMAP overlaps
+	 * page table.
+	 */
+	BUILD_BUG_ON(PKMAP_BASE < TLBTEMP_BASE_1 + TLBTEMP_SIZE);
 	/* cache the first kmap pte */
 	kmap_vstart = __fix_to_virt(FIX_KMAP_BEGIN);
 	kmap_pte = kmap_get_fixmap_pte(kmap_vstart);
-- 
2.25.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox