LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2 09/11] powerpc/smp: Optimize update_mask_by_l2
From: Qian Cai @ 2020-10-07 14:23 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling, Peter Zijlstra,
	LKML, Nicholas Piggin, Ingo Molnar, Oliver O'Halloran,
	Satheesh Rajendran, linuxppc-dev, Valentin Schneider
In-Reply-To: <20201007141745.GM12031@linux.vnet.ibm.com>

On Wed, 2020-10-07 at 19:47 +0530, Srikar Dronamraju wrote:
> Can you confirm if CONFIG_CPUMASK_OFFSTACK is enabled in your config?

Yes, https://gitlab.com/cailca/linux-mm/-/blob/master/powerpc.config

We tested here almost daily on linux-next.


^ permalink raw reply

* Re: [PATCH v2 09/11] powerpc/smp: Optimize update_mask_by_l2
From: Srikar Dronamraju @ 2020-10-07 14:17 UTC (permalink / raw)
  To: Qian Cai
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling, Peter Zijlstra,
	LKML, Nicholas Piggin, Ingo Molnar, Oliver O'Halloran,
	Satheesh Rajendran, linuxppc-dev, Valentin Schneider
In-Reply-To: <f848a6761de05d655d847130e77b23b2bb39aa26.camel@redhat.com>

* Qian Cai <cai@redhat.com> [2020-10-07 09:05:42]:

Hi Qian,

Thanks for testing and reporting the failure.

> On Mon, 2020-09-21 at 15:26 +0530, Srikar Dronamraju wrote:
> > All threads of a SMT4 core can either be part of this CPU's l2-cache
> > mask or not related to this CPU l2-cache mask. Use this relation to
> > reduce the number of iterations needed to find all the CPUs that share
> > the same l2-cache.
> > 
> > Use a temporary mask to iterate through the CPUs that may share l2_cache
> > mask. Also instead of setting one CPU at a time into cpu_l2_cache_mask,
> > copy the SMT4/sub mask at one shot.
> > 
> ...
> >  static bool update_mask_by_l2(int cpu)
> >  {
> > +	struct cpumask *(*submask_fn)(int) = cpu_sibling_mask;
> >  	struct device_node *l2_cache, *np;
> > +	cpumask_var_t mask;
> >  	int i;
> >  
> >  	l2_cache = cpu_to_l2cache(cpu);
> > @@ -1240,22 +1264,37 @@ static bool update_mask_by_l2(int cpu)
> >  		return false;
> >  	}
> >  
> > -	cpumask_set_cpu(cpu, cpu_l2_cache_mask(cpu));
> > -	for_each_cpu_and(i, cpu_online_mask, cpu_cpu_mask(cpu)) {
> > +	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));
> 
> Shouldn't this be GFP_ATOMIC? Otherwise, during the CPU hotplugging, we have,

Can you confirm if CONFIG_CPUMASK_OFFSTACK is enabled in your config?
Because if !CONFIG_CPUMASK_OFFSTACK, then alloc_cpumask_var_node would do
nothing but return true.

Regarding CONFIG_CPUMASK_OFFSTACK, not sure how much powerpc was tested
with that config enabled.

Please refer to
http://lore.kernel.org/lkml/87o8nv51bg.fsf@mpe.ellerman.id.au/t/#u
And we do have an issue to track the same.
https://github.com/linuxppc/issues/issues/321 for enabling/ testing /
verifying if CONFIG_CPUMASK_OFFSTACK works. I also dont see any
powerpc kconfig enabling this.

I do agree with your suggestion that we could substitute
GFP_ATOMIC/GFP_KERNEL.

> 
> (irqs were disabled in do_idle())
> 
> [  335.420001][    T0] BUG: sleeping function called from invalid context at mm/slab.h:494
> [  335.420003][    T0] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88
> [  335.420005][    T0] no locks held by swapper/88/0.
> [  335.420007][    T0] irq event stamp: 18074448
> [  335.420015][    T0] hardirqs last  enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110
> [  335.420019][    T0] hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0
> do_idle at kernel/sched/idle.c:253 (discriminator 1)
> [  335.420023][    T0] softirqs last  enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0
> [  335.420026][    T0] softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0
> [  335.420030][    T0] CPU: 88 PID: 0 Comm: swapper/88 Tainted: G        W         5.9.0-rc8-next-20201007 #1
> [  335.420032][    T0] Call Trace:
> [  335.420037][    T0] [c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable)
> [  335.420043][    T0] [c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310
> [  335.420048][    T0] [c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190
> [  335.420051][    T0] [c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0
> slab_alloc_node at mm/slub.c:2817
> (inlined by) __kmalloc_node at mm/slub.c:4013
> [  335.420054][    T0] [c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80
> kmalloc_node at include/linux/slab.h:577
> (inlined by) alloc_cpumask_var_node at lib/cpumask.c:116
> [  335.420060][    T0] [c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800
> update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267
> (inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387
> (inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420
> [  335.420063][    T0] [c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14
> 
> > +	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));
> > +
> > +	if (has_big_cores)
> > +		submask_fn = cpu_smallcore_mask;
> > +
> > +	/* Update l2-cache mask with all the CPUs that are part of submask */
> > +	or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
> > +
> > +	/* Skip all CPUs already part of current CPU l2-cache mask */
> > +	cpumask_andnot(mask, mask, cpu_l2_cache_mask(cpu));
> > +
> > +	for_each_cpu(i, mask) {
> >  		/*
> >  		 * when updating the marks the current CPU has not been marked
> >  		 * online, but we need to update the cache masks
> >  		 */
> >  		np = cpu_to_l2cache(i);
> > -		if (!np)
> > -			continue;
> >  
> > -		if (np == l2_cache)
> > -			set_cpus_related(cpu, i, cpu_l2_cache_mask);
> > +		/* Skip all CPUs already part of current CPU l2-cache */
> > +		if (np == l2_cache) {
> > +			or_cpumasks_related(cpu, i, submask_fn,
> > cpu_l2_cache_mask);
> > +			cpumask_andnot(mask, mask, submask_fn(i));
> > +		} else {
> > +			cpumask_andnot(mask, mask, cpu_l2_cache_mask(i));
> > +		}
> >  
> >  		of_node_put(np);
> >  	}
> >  	of_node_put(l2_cache);
> > +	free_cpumask_var(mask);
> >  
> >  	return true;
> >  }
> 

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH v2 09/11] powerpc/smp: Optimize update_mask_by_l2
From: Qian Cai @ 2020-10-07 13:05 UTC (permalink / raw)
  To: Srikar Dronamraju, Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling, Peter Zijlstra,
	LKML, Nicholas Piggin, Ingo Molnar, Oliver O'Halloran,
	Satheesh Rajendran, linuxppc-dev, Valentin Schneider
In-Reply-To: <20200921095653.9701-10-srikar@linux.vnet.ibm.com>

On Mon, 2020-09-21 at 15:26 +0530, Srikar Dronamraju wrote:
> All threads of a SMT4 core can either be part of this CPU's l2-cache
> mask or not related to this CPU l2-cache mask. Use this relation to
> reduce the number of iterations needed to find all the CPUs that share
> the same l2-cache.
> 
> Use a temporary mask to iterate through the CPUs that may share l2_cache
> mask. Also instead of setting one CPU at a time into cpu_l2_cache_mask,
> copy the SMT4/sub mask at one shot.
> 
...
>  static bool update_mask_by_l2(int cpu)
>  {
> +	struct cpumask *(*submask_fn)(int) = cpu_sibling_mask;
>  	struct device_node *l2_cache, *np;
> +	cpumask_var_t mask;
>  	int i;
>  
>  	l2_cache = cpu_to_l2cache(cpu);
> @@ -1240,22 +1264,37 @@ static bool update_mask_by_l2(int cpu)
>  		return false;
>  	}
>  
> -	cpumask_set_cpu(cpu, cpu_l2_cache_mask(cpu));
> -	for_each_cpu_and(i, cpu_online_mask, cpu_cpu_mask(cpu)) {
> +	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));

Shouldn't this be GFP_ATOMIC? Otherwise, during the CPU hotplugging, we have,

(irqs were disabled in do_idle())

[  335.420001][    T0] BUG: sleeping function called from invalid context at mm/slab.h:494
[  335.420003][    T0] in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88
[  335.420005][    T0] no locks held by swapper/88/0.
[  335.420007][    T0] irq event stamp: 18074448
[  335.420015][    T0] hardirqs last  enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110
[  335.420019][    T0] hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0
do_idle at kernel/sched/idle.c:253 (discriminator 1)
[  335.420023][    T0] softirqs last  enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0
[  335.420026][    T0] softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0
[  335.420030][    T0] CPU: 88 PID: 0 Comm: swapper/88 Tainted: G        W         5.9.0-rc8-next-20201007 #1
[  335.420032][    T0] Call Trace:
[  335.420037][    T0] [c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable)
[  335.420043][    T0] [c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310
[  335.420048][    T0] [c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190
[  335.420051][    T0] [c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0
slab_alloc_node at mm/slub.c:2817
(inlined by) __kmalloc_node at mm/slub.c:4013
[  335.420054][    T0] [c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80
kmalloc_node at include/linux/slab.h:577
(inlined by) alloc_cpumask_var_node at lib/cpumask.c:116
[  335.420060][    T0] [c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800
update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267
(inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387
(inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420
[  335.420063][    T0] [c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14

> +	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));
> +
> +	if (has_big_cores)
> +		submask_fn = cpu_smallcore_mask;
> +
> +	/* Update l2-cache mask with all the CPUs that are part of submask */
> +	or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
> +
> +	/* Skip all CPUs already part of current CPU l2-cache mask */
> +	cpumask_andnot(mask, mask, cpu_l2_cache_mask(cpu));
> +
> +	for_each_cpu(i, mask) {
>  		/*
>  		 * when updating the marks the current CPU has not been marked
>  		 * online, but we need to update the cache masks
>  		 */
>  		np = cpu_to_l2cache(i);
> -		if (!np)
> -			continue;
>  
> -		if (np == l2_cache)
> -			set_cpus_related(cpu, i, cpu_l2_cache_mask);
> +		/* Skip all CPUs already part of current CPU l2-cache */
> +		if (np == l2_cache) {
> +			or_cpumasks_related(cpu, i, submask_fn,
> cpu_l2_cache_mask);
> +			cpumask_andnot(mask, mask, submask_fn(i));
> +		} else {
> +			cpumask_andnot(mask, mask, cpu_l2_cache_mask(i));
> +		}
>  
>  		of_node_put(np);
>  	}
>  	of_node_put(l2_cache);
> +	free_cpumask_var(mask);
>  
>  	return true;
>  }


^ permalink raw reply

* Re: [PATCH 2/2] sparc: Check VMA range in sparc_validate_prot()
From: Christoph Hellwig @ 2020-10-07 12:36 UTC (permalink / raw)
  To: Jann Horn
  Cc: Catalin Marinas, linuxppc-dev, linux-kernel, Christoph Hellwig,
	linux-mm, Khalid Aziz, Paul Mackerras, sparclinux, Anthony Yznaga,
	Andrew Morton, Will Deacon, David S. Miller, linux-arm-kernel
In-Reply-To: <20201007073932.865218-2-jannh@google.com>

> +++ b/arch/sparc/include/asm/mman.h
> @@ -60,31 +60,41 @@ static inline int sparc_validate_prot(unsigned long prot, unsigned long addr,
>  	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
>  		return 0;
>  	if (prot & PROT_ADI) {
> +		struct vm_area_struct *vma, *next;
> +

I'd split all the ADI logic into a separate, preferable out of line
helper.

> +			/* reached the end of the range without errors? */
> +			if (addr+len <= vma->vm_end)

missing whitespaces around the arithmetic operator.

^ permalink raw reply

* Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Christoph Hellwig @ 2020-10-07 12:35 UTC (permalink / raw)
  To: Jann Horn
  Cc: Catalin Marinas, linuxppc-dev, linux-kernel, Christoph Hellwig,
	linux-mm, Khalid Aziz, Dave Kleikamp, Paul Mackerras, sparclinux,
	Anthony Yznaga, Andrew Morton, Will Deacon, David S. Miller,
	linux-arm-kernel
In-Reply-To: <20201007073932.865218-1-jannh@google.com>

On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index 078608ec2e92..b1fabb97d138 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>  {
>  	long ret = -EINVAL;
>  
> -	if (!arch_validate_prot(prot, addr))
> +	if (!arch_validate_prot(prot, addr, len))

This call isn't under mmap lock.  I also find it rather weird as the
generic code only calls arch_validate_prot from mprotect, only powerpc
also calls it from mmap.

This seems to go back to commit ef3d3246a0d0
("powerpc/mm: Add Strong Access Ordering support")

^ permalink raw reply

* [PATCH v3 4/4] powerpc/lmb-size: Use addr #size-cells value when fetching lmb-size
From: Aneesh Kumar K.V @ 2020-10-07 11:48 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: nathanl, Aneesh Kumar K.V
In-Reply-To: <20201007114836.282468-1-aneesh.kumar@linux.ibm.com>

Make it consistent with other usages.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/mm/book3s64/radix_pgtable.c        |  7 ++++---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 13 +++++++++----
 2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 78c5afe98359..f8e9eb49d46b 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -498,7 +498,7 @@ static int __init probe_memory_block_size(unsigned long node, const char *uname,
 					  depth, void *data)
 {
 	unsigned long *mem_block_size = (unsigned long *)data;
-	const __be64 *prop;
+	const __be32 *prop;
 	int len;
 
 	if (depth != 1)
@@ -508,13 +508,14 @@ static int __init probe_memory_block_size(unsigned long node, const char *uname,
 		return 0;
 
 	prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
-	if (!prop || len < sizeof(__be64))
+
+	if (!prop || len < dt_root_size_cells * sizeof(__be32))
 		/*
 		 * Nothing in the device tree
 		 */
 		*mem_block_size = MIN_MEMORY_BLOCK_SIZE;
 	else
-		*mem_block_size = be64_to_cpup(prop);
+		*mem_block_size = of_read_number(prop, dt_root_size_cells);
 	return 1;
 }
 
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 843db91e39aa..f8aef06b29ec 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -30,12 +30,17 @@ unsigned long pseries_memory_block_size(void)
 
 	np = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
 	if (np) {
-		const __be64 *size;
+		int len;
+		int size_cells;
+		const __be32 *prop;
 
-		size = of_get_property(np, "ibm,lmb-size", NULL);
-		if (size)
-			memblock_size = be64_to_cpup(size);
+		size_cells = of_n_size_cells(np);
+
+		prop = of_get_property(np, "ibm,lmb-size", &len);
+		if (prop && len >= size_cells * sizeof(__be32))
+			memblock_size = of_read_number(prop, size_cells);
 		of_node_put(np);
+
 	} else  if (machine_is(pseries)) {
 		/* This fallback really only applies to pseries */
 		unsigned int memzero_size = 0;
-- 
2.26.2


^ permalink raw reply related

* [PATCH v3 3/4] powerpc/book3s64/radix: Make radix_mem_block_size 64bit
From: Aneesh Kumar K.V @ 2020-10-07 11:48 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: nathanl, Aneesh Kumar K.V
In-Reply-To: <20201007114836.282468-1-aneesh.kumar@linux.ibm.com>

Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic")
make sure different variables tracking lmb_size are updated to be 64 bit.

Fixes: af9d00e93a4f ("powerpc/mm/radix: Create separate mappings for hot-plugged memory")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/mmu.h | 2 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
index ddc414ab3c4d..e0b52940e43c 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu.h
@@ -70,7 +70,7 @@ extern unsigned int mmu_base_pid;
 /*
  * memory block size used with radix translation.
  */
-extern unsigned int __ro_after_init radix_mem_block_size;
+extern unsigned long __ro_after_init radix_mem_block_size;
 
 #define PRTB_SIZE_SHIFT	(mmu_pid_bits + 4)
 #define PRTB_ENTRIES	(1ul << mmu_pid_bits)
diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
index 5c8adeb8c955..78c5afe98359 100644
--- a/arch/powerpc/mm/book3s64/radix_pgtable.c
+++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
@@ -34,7 +34,7 @@
 
 unsigned int mmu_pid_bits;
 unsigned int mmu_base_pid;
-unsigned int radix_mem_block_size __ro_after_init;
+unsigned long radix_mem_block_size __ro_after_init;
 
 static __ref void *early_alloc_pgtable(unsigned long size, int nid,
 			unsigned long region_start, unsigned long region_end)
-- 
2.26.2


^ permalink raw reply related

* [PATCH v3 2/4] powerpc/memhotplug: Make lmb size 64bit
From: Aneesh Kumar K.V @ 2020-10-07 11:48 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: nathanl, Aneesh Kumar K.V, stable
In-Reply-To: <20201007114836.282468-1-aneesh.kumar@linux.ibm.com>

Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic")
make sure different variables tracking lmb_size are updated to be 64 bit.

This was found by code audit.

Cc: stable@vger.kernel.org
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 .../platforms/pseries/hotplug-memory.c        | 43 +++++++++++++------
 1 file changed, 29 insertions(+), 14 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 0ea976d1cac4..843db91e39aa 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -277,7 +277,7 @@ static int dlpar_offline_lmb(struct drmem_lmb *lmb)
 	return dlpar_change_lmb_state(lmb, false);
 }
 
-static int pseries_remove_memblock(unsigned long base, unsigned int memblock_size)
+static int pseries_remove_memblock(unsigned long base, unsigned long memblock_size)
 {
 	unsigned long block_sz, start_pfn;
 	int sections_per_block;
@@ -308,10 +308,11 @@ static int pseries_remove_memblock(unsigned long base, unsigned int memblock_siz
 
 static int pseries_remove_mem_node(struct device_node *np)
 {
-	const __be32 *regs;
+	const __be32 *prop;
 	unsigned long base;
-	unsigned int lmb_size;
+	unsigned long lmb_size;
 	int ret = -EINVAL;
+	int addr_cells, size_cells;
 
 	/*
 	 * Check to see if we are actually removing memory
@@ -322,12 +323,19 @@ static int pseries_remove_mem_node(struct device_node *np)
 	/*
 	 * Find the base address and size of the memblock
 	 */
-	regs = of_get_property(np, "reg", NULL);
-	if (!regs)
+	prop = of_get_property(np, "reg", NULL);
+	if (!prop)
 		return ret;
 
-	base = be64_to_cpu(*(unsigned long *)regs);
-	lmb_size = be32_to_cpu(regs[3]);
+	addr_cells = of_n_addr_cells(np);
+	size_cells = of_n_size_cells(np);
+
+	/*
+	 * "reg" property represents (addr,size) tuple.
+	 */
+	base = of_read_number(prop, addr_cells);
+	prop += addr_cells;
+	lmb_size = of_read_number(prop, size_cells);
 
 	pseries_remove_memblock(base, lmb_size);
 	return 0;
@@ -564,7 +572,7 @@ static int dlpar_memory_remove_by_ic(u32 lmbs_to_remove, u32 drc_index)
 
 #else
 static inline int pseries_remove_memblock(unsigned long base,
-					  unsigned int memblock_size)
+					  unsigned long memblock_size)
 {
 	return -EOPNOTSUPP;
 }
@@ -886,10 +894,11 @@ int dlpar_memory(struct pseries_hp_errorlog *hp_elog)
 
 static int pseries_add_mem_node(struct device_node *np)
 {
-	const __be32 *regs;
+	const __be32 *prop;
 	unsigned long base;
-	unsigned int lmb_size;
+	unsigned long lmb_size;
 	int ret = -EINVAL;
+	int addr_cells, size_cells;
 
 	/*
 	 * Check to see if we are actually adding memory
@@ -900,12 +909,18 @@ static int pseries_add_mem_node(struct device_node *np)
 	/*
 	 * Find the base and size of the memblock
 	 */
-	regs = of_get_property(np, "reg", NULL);
-	if (!regs)
+	prop = of_get_property(np, "reg", NULL);
+	if (!prop)
 		return ret;
 
-	base = be64_to_cpu(*(unsigned long *)regs);
-	lmb_size = be32_to_cpu(regs[3]);
+	addr_cells = of_n_addr_cells(np);
+	size_cells = of_n_size_cells(np);
+	/*
+	 * "reg" property represents (addr,size) tuple.
+	 */
+	base = of_read_number(prop, addr_cells);
+	prop += addr_cells;
+	lmb_size = of_read_number(prop, size_cells);
 
 	/*
 	 * Update memory region to represent the memory add
-- 
2.26.2


^ permalink raw reply related

* [PATCH v3 1/4] powerpc/drmem: Make lmb_size 64 bit
From: Aneesh Kumar K.V @ 2020-10-07 11:48 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: nathanl, Aneesh Kumar K.V, stable
In-Reply-To: <20201007114836.282468-1-aneesh.kumar@linux.ibm.com>

Similar to commit 89c140bbaeee ("pseries: Fix 64 bit logical memory block panic")
make sure different variables tracking lmb_size are updated to be 64 bit.

This was found by code audit.

Cc: stable@vger.kernel.org
Acked-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/drmem.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/drmem.h b/arch/powerpc/include/asm/drmem.h
index 030a19d92213..bf2402fed3e0 100644
--- a/arch/powerpc/include/asm/drmem.h
+++ b/arch/powerpc/include/asm/drmem.h
@@ -20,7 +20,7 @@ struct drmem_lmb {
 struct drmem_lmb_info {
 	struct drmem_lmb        *lmbs;
 	int                     n_lmbs;
-	u32                     lmb_size;
+	u64                     lmb_size;
 };
 
 extern struct drmem_lmb_info *drmem_info;
@@ -80,7 +80,7 @@ struct of_drconf_cell_v2 {
 #define DRCONF_MEM_RESERVED	0x00000080
 #define DRCONF_MEM_HOTREMOVABLE	0x00000100
 
-static inline u32 drmem_lmb_size(void)
+static inline u64 drmem_lmb_size(void)
 {
 	return drmem_info->lmb_size;
 }
-- 
2.26.2


^ permalink raw reply related

* [PATCH v3 0/4] Enable usage of larger LMB ( > 4G)
From: Aneesh Kumar K.V @ 2020-10-07 11:48 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: nathanl, Aneesh Kumar K.V

Changes from v2:
* Don't use root addr and size cells during runtime. Walk up the
  device tree and use the first addr and size cells value (of_n_addr_cells()/
  of_n_size_cells())

Aneesh Kumar K.V (4):
  powerpc/drmem: Make lmb_size 64 bit
  powerpc/memhotplug: Make lmb size 64bit
  powerpc/book3s64/radix: Make radix_mem_block_size 64bit
  powerpc/lmb-size: Use addr #size-cells value when fetching lmb-size

 arch/powerpc/include/asm/book3s/64/mmu.h      |  2 +-
 arch/powerpc/include/asm/drmem.h              |  4 +-
 arch/powerpc/mm/book3s64/radix_pgtable.c      |  9 +--
 .../platforms/pseries/hotplug-memory.c        | 56 +++++++++++++------
 4 files changed, 46 insertions(+), 25 deletions(-)

-- 
2.26.2


^ permalink raw reply

* [PATCH] powerpc/powernv/elog: Reduce elog message severity
From: Vasant Hegde @ 2020-10-07 10:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Vasant Hegde, Mahesh Salgaonkar

OPAL interrupts kernel whenever it has new error log. Kernel calls
interrupt handler (elog_event()) to retrieve event. elog_event makes
OPAL API call (opal_get_elog_size()) to retrieve elog info.

In some case before kernel makes opal_get_elog_size() call, it gets interrupt
again. So second time when elog_event() calls opal_get_elog_size API OPAL
returns error. Its safe to ignore this error. Hence reduce the severity
of log message.

CC: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-elog.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
index 62ef7ad995da..67f435bb1ec4 100644
--- a/arch/powerpc/platforms/powernv/opal-elog.c
+++ b/arch/powerpc/platforms/powernv/opal-elog.c
@@ -247,7 +247,7 @@ static irqreturn_t elog_event(int irq, void *data)

 	rc = opal_get_elog_size(&id, &size, &type);
 	if (rc != OPAL_SUCCESS) {
-		pr_err("ELOG: OPAL log info read failed\n");
+		pr_debug("ELOG: OPAL log info read failed\n");
 		return IRQ_HANDLED;
 	}

-- 
2.26.2

^ permalink raw reply related

* [PATCH] powerpc/powernv/dump: Fix race while processing OPAL dump
From: Vasant Hegde @ 2020-10-07 10:17 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Vasant Hegde, Mahesh Salgaonkar

Every dump reported by OPAL is exported to userspace through a sysfs
interface and notified using kobject_uevent(). The userspace daemon
(opal_errd) then reads the dump and acknowledges that the dump is
saved safely to disk. Once acknowledged the kernel removes the
respective sysfs file entry causing respective resources to be
released including kobject.

However it's possible the userspace daemon may already be scanning
dump entries when a new sysfs dump entry is created by the kernel.
User daemon may read this new entry and ack it even before kernel can
notify userspace about it through kobject_uevent() call. If that
happens then we have a potential race between
dump_ack_store->kobject_put() and kobject_uevent which can lead to
use-after-free of a kernfs object resulting in a kernel crash.

This patch fixes this race by protecting the sysfs file
creation/notification by holding a reference count on kobject until we
safely send kobject_uevent().

The function create_dump_obj() returns the dump object which if used
by caller function will end up in use-after-free problem again.
However, the return value of create_dump_obj() function isn't being
used today and there is no need as well. Hence change it to return
void to make this fix complete.

Fixes: c7e64b9c ("powerpc/powernv Platform dump interface")
CC: Mahesh Salgaonkar <mahesh@linux.ibm.com>
Signed-off-by: Vasant Hegde <hegdevasant@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-dump.c | 31 +++++++++++++++++-----
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/opal-dump.c b/arch/powerpc/platforms/powernv/opal-dump.c
index 543c816fa99e..7e6eeedec32b 100644
--- a/arch/powerpc/platforms/powernv/opal-dump.c
+++ b/arch/powerpc/platforms/powernv/opal-dump.c
@@ -318,15 +318,14 @@ static ssize_t dump_attr_read(struct file *filep, struct kobject *kobj,
 	return count;
 }

-static struct dump_obj *create_dump_obj(uint32_t id, size_t size,
-					uint32_t type)
+static void create_dump_obj(uint32_t id, size_t size, uint32_t type)
 {
 	struct dump_obj *dump;
 	int rc;

 	dump = kzalloc(sizeof(*dump), GFP_KERNEL);
 	if (!dump)
-		return NULL;
+		return;

 	dump->kobj.kset = dump_kset;

@@ -346,21 +345,39 @@ static struct dump_obj *create_dump_obj(uint32_t id, size_t size,
 	rc = kobject_add(&dump->kobj, NULL, "0x%x-0x%x", type, id);
 	if (rc) {
 		kobject_put(&dump->kobj);
-		return NULL;
+		return;
 	}

+	/*
+	 * As soon as the sysfs file for this dump is created/activated there is
+	 * a chance the opal_errd daemon (or any userspace) might read and
+	 * acknowledge the dump before kobject_uevent() is called. If that
+	 * happens then there is a potential race between
+	 * dump_ack_store->kobject_put() and kobject_uevent() which leads to a
+	 * use-after-free of a kernfs object resulting in a kernel crash.
+	 *
+	 * To avoid that, we need to take a reference on behalf of the bin file,
+	 * so that our reference remains valid while we call kobject_uevent().
+	 * We then drop our reference before exiting the function, leaving the
+	 * bin file to drop the last reference (if it hasn't already).
+	 */
+
+	/* Take a reference for the bin file */
+	kobject_get(&dump->kobj);
 	rc = sysfs_create_bin_file(&dump->kobj, &dump->dump_attr);
 	if (rc) {
 		kobject_put(&dump->kobj);
-		return NULL;
+		/* Drop reference count taken for bin file */
+		kobject_put(&dump->kobj);
+		return;
 	}

 	pr_info("%s: New platform dump. ID = 0x%x Size %u\n",
 		__func__, dump->id, dump->size);

 	kobject_uevent(&dump->kobj, KOBJ_ADD);
-
-	return dump;
+	/* Drop reference count taken for bin file */
+	kobject_put(&dump->kobj);
 }

 static irqreturn_t process_dump(int irq, void *data)
-- 
2.26.2

^ permalink raw reply related

* [PATCH] powerpc/security: Fix link stack flush instruction
From: Nicholas Piggin @ 2020-10-07  8:06 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Nicholas Piggin

The inline execution path for the hardware assisted branch flush
instruction failed to set CTR to the correct value before bcctr,
causing a crash when the feature is enabled.

Fixes: 4d24e21cc694 ("powerpc/security: Allow for processors that flush the link stack using the special bcctr")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ++-
 arch/powerpc/kernel/entry_64.S            |  8 ++++--
 arch/powerpc/kernel/security.c            | 34 ++++++++++++++++-------
 3 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index de14b1a34d56..9652756b0694 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -144,7 +144,9 @@ void _kvmppc_restore_tm_pr(struct kvm_vcpu *vcpu, u64 guest_msr);
 void _kvmppc_save_tm_pr(struct kvm_vcpu *vcpu, u64 guest_msr);
 
 /* Patch sites */
-extern s32 patch__call_flush_branch_caches;
+extern s32 patch__call_flush_branch_caches1;
+extern s32 patch__call_flush_branch_caches2;
+extern s32 patch__call_flush_branch_caches3;
 extern s32 patch__flush_count_cache_return;
 extern s32 patch__flush_link_stack_return;
 extern s32 patch__call_kvm_flush_link_stack;
diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 733e40eba4eb..2f3846192ec7 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -430,7 +430,11 @@ _ASM_NOKPROBE_SYMBOL(save_nvgprs);
 
 #define FLUSH_COUNT_CACHE	\
 1:	nop;			\
-	patch_site 1b, patch__call_flush_branch_caches
+	patch_site 1b, patch__call_flush_branch_caches1; \
+1:	nop;			\
+	patch_site 1b, patch__call_flush_branch_caches2; \
+1:	nop;			\
+	patch_site 1b, patch__call_flush_branch_caches3
 
 .macro nops number
 	.rept \number
@@ -512,7 +516,7 @@ _GLOBAL(_switch)
 
 	kuap_check_amr r9, r10
 
-	FLUSH_COUNT_CACHE
+	FLUSH_COUNT_CACHE	/* Clobbers r9, ctr */
 
 	/*
 	 * On SMP kernels, care must be taken because a task may be
diff --git a/arch/powerpc/kernel/security.c b/arch/powerpc/kernel/security.c
index c9876aab3142..e4e1a94ccf6a 100644
--- a/arch/powerpc/kernel/security.c
+++ b/arch/powerpc/kernel/security.c
@@ -430,30 +430,44 @@ device_initcall(stf_barrier_debugfs_init);
 
 static void update_branch_cache_flush(void)
 {
+	u32 *site;
+
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
+	site = &patch__call_kvm_flush_link_stack;
 	// This controls the branch from guest_exit_cont to kvm_flush_link_stack
 	if (link_stack_flush_type == BRANCH_CACHE_FLUSH_NONE) {
-		patch_instruction_site(&patch__call_kvm_flush_link_stack,
-				       ppc_inst(PPC_INST_NOP));
+		patch_instruction_site(site, ppc_inst(PPC_INST_NOP));
 	} else {
 		// Could use HW flush, but that could also flush count cache
-		patch_branch_site(&patch__call_kvm_flush_link_stack,
-				  (u64)&kvm_flush_link_stack, BRANCH_SET_LINK);
+		patch_branch_site(site, (u64)&kvm_flush_link_stack, BRANCH_SET_LINK);
 	}
 #endif
 
+	// Patch out the bcctr first, then nop the rest
+	site = &patch__call_flush_branch_caches3;
+	patch_instruction_site(site, ppc_inst(PPC_INST_NOP));
+	site = &patch__call_flush_branch_caches2;
+	patch_instruction_site(site, ppc_inst(PPC_INST_NOP));
+	site = &patch__call_flush_branch_caches1;
+	patch_instruction_site(site, ppc_inst(PPC_INST_NOP));
+
 	// This controls the branch from _switch to flush_branch_caches
 	if (count_cache_flush_type == BRANCH_CACHE_FLUSH_NONE &&
 	    link_stack_flush_type == BRANCH_CACHE_FLUSH_NONE) {
-		patch_instruction_site(&patch__call_flush_branch_caches,
-				       ppc_inst(PPC_INST_NOP));
+		// Nothing to be done
+
 	} else if (count_cache_flush_type == BRANCH_CACHE_FLUSH_HW &&
 		   link_stack_flush_type == BRANCH_CACHE_FLUSH_HW) {
-		patch_instruction_site(&patch__call_flush_branch_caches,
-				       ppc_inst(PPC_INST_BCCTR_FLUSH));
+		// Patch in the bcctr last
+		site = &patch__call_flush_branch_caches1;
+		patch_instruction_site(site, ppc_inst(0x39207fff)); // li r9,0x7fff
+		site = &patch__call_flush_branch_caches2;
+		patch_instruction_site(site, ppc_inst(0x7d2903a6)); // mtctr r9
+		site = &patch__call_flush_branch_caches3;
+		patch_instruction_site(site, ppc_inst(PPC_INST_BCCTR_FLUSH));
+
 	} else {
-		patch_branch_site(&patch__call_flush_branch_caches,
-				  (u64)&flush_branch_caches, BRANCH_SET_LINK);
+		patch_branch_site(site, (u64)&flush_branch_caches, BRANCH_SET_LINK);
 
 		// If we just need to flush the link stack, early return
 		if (count_cache_flush_type == BRANCH_CACHE_FLUSH_NONE) {
-- 
2.23.0


^ permalink raw reply related

* [PATCH 2/2] sparc: Check VMA range in sparc_validate_prot()
From: Jann Horn @ 2020-10-07  7:39 UTC (permalink / raw)
  To: David S. Miller, sparclinux, Andrew Morton, linux-mm
  Cc: Catalin Marinas, linuxppc-dev, linux-kernel, Christoph Hellwig,
	Khalid Aziz, Paul Mackerras, Anthony Yznaga, Will Deacon,
	linux-arm-kernel
In-Reply-To: <20201007073932.865218-1-jannh@google.com>

sparc_validate_prot() is called from do_mprotect_pkey() as
arch_validate_prot(); it tries to ensure that an mprotect() call can't
enable ADI on incompatible VMAs.
The current implementation only checks that the VMA at the start address
matches the rules for ADI mappings; instead, check all VMAs that will be
affected by mprotect().

(This hook is called before mprotect() makes sure that the specified range
is actually covered by VMAs, and mprotect() returns specific error codes
when that's not the case. In order for mprotect() to still generate the
same error codes for mprotect(<unmapped_ptr>, <len>, ...|PROT_ADI), we need
to *accept* cases where the range is not fully covered by VMAs.)

Cc: stable@vger.kernel.org
Fixes: 74a04967482f ("sparc64: Add support for ADI (Application Data Integrity)")
Signed-off-by: Jann Horn <jannh@google.com>
---
compile-tested only, I don't have a Sparc ADI setup - might be nice if some
Sparc person could test this?

 arch/sparc/include/asm/mman.h | 50 +++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 20 deletions(-)

diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index e85222c76585..6dced75567c3 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -60,31 +60,41 @@ static inline int sparc_validate_prot(unsigned long prot, unsigned long addr,
 	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
 		return 0;
 	if (prot & PROT_ADI) {
+		struct vm_area_struct *vma, *next;
+
 		if (!adi_capable())
 			return 0;
 
-		if (addr) {
-			struct vm_area_struct *vma;
+		vma = find_vma(current->mm, addr);
+		/* if @addr is unmapped, let mprotect() deal with it */
+		if (!vma || vma->vm_start > addr)
+			return 1;
+		while (1) {
+			/* ADI can not be enabled on PFN
+			 * mapped pages
+			 */
+			if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
+				return 0;
 
-			vma = find_vma(current->mm, addr);
-			if (vma) {
-				/* ADI can not be enabled on PFN
-				 * mapped pages
-				 */
-				if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
-					return 0;
+			/* Mergeable pages can become unmergeable
+			 * if ADI is enabled on them even if they
+			 * have identical data on them. This can be
+			 * because ADI enabled pages with identical
+			 * data may still not have identical ADI
+			 * tags on them. Disallow ADI on mergeable
+			 * pages.
+			 */
+			if (vma->vm_flags & VM_MERGEABLE)
+				return 0;
 
-				/* Mergeable pages can become unmergeable
-				 * if ADI is enabled on them even if they
-				 * have identical data on them. This can be
-				 * because ADI enabled pages with identical
-				 * data may still not have identical ADI
-				 * tags on them. Disallow ADI on mergeable
-				 * pages.
-				 */
-				if (vma->vm_flags & VM_MERGEABLE)
-					return 0;
-			}
+			/* reached the end of the range without errors? */
+			if (addr+len <= vma->vm_end)
+				return 1;
+			next = vma->vm_next;
+			/* if a VMA hole follows, let mprotect() deal with it */
+			if (!next || next->vm_start != vma->vm_end)
+				return 1;
+			vma = next;
 		}
 	}
 	return 1;
-- 
2.28.0.806.g8561365e88-goog


^ permalink raw reply related

* [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Jann Horn @ 2020-10-07  7:39 UTC (permalink / raw)
  To: David S. Miller, sparclinux, Andrew Morton, linux-mm
  Cc: Catalin Marinas, linuxppc-dev, linux-kernel, Christoph Hellwig,
	Khalid Aziz, Paul Mackerras, Anthony Yznaga, Will Deacon,
	linux-arm-kernel

arch_validate_prot() is a hook that can validate whether a given set of
protection flags is valid in an mprotect() operation. It is given the set
of protection flags and the address being modified.

However, the address being modified can currently not actually be used in
a meaningful way because:

1. Only the address is given, but not the length, and the operation can
   span multiple VMAs. Therefore, the callee can't actually tell which
   virtual address range, or which VMAs, are being targeted.
2. The mmap_lock is not held, meaning that if the callee were to check
   the VMA at @addr, that VMA would be unrelated to the one the
   operation is performed on.

Currently, custom arch_validate_prot() handlers are defined by
arm64, powerpc and sparc.
arm64 and powerpc don't care about the address range, they just check the
flags against CPU support masks.
sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
the mmap_lock.

Change the function signature to also take a length, and move the
arch_validate_prot() call in mm/mprotect.c down into the locked region.

Cc: stable@vger.kernel.org
Fixes: 9035cf9a97e4 ("mm: Add address parameter to arch_validate_prot()")
Suggested-by: Khalid Aziz <khalid.aziz@oracle.com>
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Jann Horn <jannh@google.com>
---
 arch/arm64/include/asm/mman.h   | 4 ++--
 arch/powerpc/include/asm/mman.h | 3 ++-
 arch/powerpc/kernel/syscalls.c  | 2 +-
 arch/sparc/include/asm/mman.h   | 6 ++++--
 include/linux/mman.h            | 3 ++-
 mm/mprotect.c                   | 6 ++++--
 6 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
index 081ec8de9ea6..0876a87986dc 100644
--- a/arch/arm64/include/asm/mman.h
+++ b/arch/arm64/include/asm/mman.h
@@ -23,7 +23,7 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
 static inline bool arch_validate_prot(unsigned long prot,
-	unsigned long addr __always_unused)
+	unsigned long addr __always_unused, unsigned long len __always_unused)
 {
 	unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
 
@@ -32,6 +32,6 @@ static inline bool arch_validate_prot(unsigned long prot,
 
 	return (prot & ~supported) == 0;
 }
-#define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
+#define arch_validate_prot(prot, addr, len) arch_validate_prot(prot, addr, len)
 
 #endif /* ! __ASM_MMAN_H__ */
diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
index 7cb6d18f5cd6..65dd9b594985 100644
--- a/arch/powerpc/include/asm/mman.h
+++ b/arch/powerpc/include/asm/mman.h
@@ -36,7 +36,8 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
 }
 #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
 
-static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
+				      unsigned long len)
 {
 	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
 		return false;
diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
index 078608ec2e92..b1fabb97d138 100644
--- a/arch/powerpc/kernel/syscalls.c
+++ b/arch/powerpc/kernel/syscalls.c
@@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
 {
 	long ret = -EINVAL;
 
-	if (!arch_validate_prot(prot, addr))
+	if (!arch_validate_prot(prot, addr, len))
 		goto out;
 
 	if (shift) {
diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
index f94532f25db1..e85222c76585 100644
--- a/arch/sparc/include/asm/mman.h
+++ b/arch/sparc/include/asm/mman.h
@@ -52,9 +52,11 @@ static inline pgprot_t sparc_vm_get_page_prot(unsigned long vm_flags)
 	return (vm_flags & VM_SPARC_ADI) ? __pgprot(_PAGE_MCD_4V) : __pgprot(0);
 }
 
-#define arch_validate_prot(prot, addr) sparc_validate_prot(prot, addr)
-static inline int sparc_validate_prot(unsigned long prot, unsigned long addr)
+#define arch_validate_prot(prot, addr, len) sparc_validate_prot(prot, addr, len)
+static inline int sparc_validate_prot(unsigned long prot, unsigned long addr,
+				      unsigned long len)
 {
+	mmap_assert_write_locked(current->mm);
 	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
 		return 0;
 	if (prot & PROT_ADI) {
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 6f34c33075f9..5b4d554d3189 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -96,7 +96,8 @@ static inline void vm_unacct_memory(long pages)
  *
  * Returns true if the prot flags are valid
  */
-static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
+static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
+				      unsigned long len)
 {
 	return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
 }
diff --git a/mm/mprotect.c b/mm/mprotect.c
index ce8b8a5eacbb..e2d6b51acbf8 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -533,14 +533,16 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
 	end = start + len;
 	if (end <= start)
 		return -ENOMEM;
-	if (!arch_validate_prot(prot, start))
-		return -EINVAL;
 
 	reqprot = prot;
 
 	if (mmap_write_lock_killable(current->mm))
 		return -EINTR;
 
+	error = -EINVAL;
+	if (!arch_validate_prot(prot, start, len))
+		goto out;
+
 	/*
 	 * If userspace did not allocate the pkey, do not let
 	 * them use it here.

base-commit: c85fb28b6f999db9928b841f63f1beeb3074eeca
-- 
2.28.0.806.g8561365e88-goog


^ permalink raw reply related

* Re: [PATCH] crypto: talitos - Fix sparse warnings
From: Herbert Xu @ 2020-10-07  6:50 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linuxppc-dev, Kim Phillips, Linux Crypto Mailing List,
	Horia Geantă
In-Reply-To: <20201003191553.Horde.qhVjpQA-iJND7COibFfWZQ7@messagerie.c-s.fr>

On Sat, Oct 03, 2020 at 07:15:53PM +0200, Christophe Leroy wrote:
>
> The following changes fix the sparse warnings with less churn:

Yes that works too.  Can you please submit this patch?

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [powerpc:next-test 76/183] arch/powerpc/kernel/vdso32/gettimeofday.S:40: Error: syntax error; found `@', expected `,'
From: Christophe Leroy @ 2020-10-07  6:03 UTC (permalink / raw)
  To: kernel test robot, Michael Ellerman
  Cc: clang-built-linux, kbuild-all, linuxppc-dev
In-Reply-To: <202010070441.K8Bb46Rt-lkp@intel.com>



Le 06/10/2020 à 22:41, kernel test robot a écrit :
> Hi Michael,
> 
> First bad commit (maybe != root cause):
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
> head:   72cdd117c449896c707fc6cfe5b90978160697d0
> commit: 231b232df8f67e7d37af01259c21f2a131c3911e [76/183] powerpc/64: Make VDSO32 track COMPAT on 64-bit
> config: powerpc-randconfig-r033-20201005 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 1127662c6dc2a276839c75a42238b11a3ad00f32)

There has been already a discussion on this, see 
https://groups.google.com/g/clang-built-linux/c/ayNmi3HoNdY/m/ROdg7avVBwAJ

This apparently is a clang issue. The commit mentionned here is only exposing the issue, but the 
issue is not in the commit itself.

Regardless, this error should go away when we switch to the generic C VDSO.

Christophe

^ permalink raw reply

* Re: [PATCH v2 3/4] powerpc/memhotplug: Make lmb size 64bit
From: Michael Ellerman @ 2020-10-07  5:56 UTC (permalink / raw)
  To: Nathan Lynch, Aneesh Kumar K.V; +Cc: linuxppc-dev, stable
In-Reply-To: <87pn7lxe5k.fsf@linux.ibm.com>

Nathan Lynch <nathanl@linux.ibm.com> writes:
> "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com> writes:
>> @@ -322,12 +322,16 @@ static int pseries_remove_mem_node(struct device_node *np)
>>  	/*
>>  	 * Find the base address and size of the memblock
>>  	 */
>> -	regs = of_get_property(np, "reg", NULL);
>> -	if (!regs)
>> +	prop = of_get_property(np, "reg", NULL);
>> +	if (!prop)
>>  		return ret;
>>  
>> -	base = be64_to_cpu(*(unsigned long *)regs);
>> -	lmb_size = be32_to_cpu(regs[3]);
>> +	/*
>> +	 * "reg" property represents (addr,size) tuple.
>> +	 */
>> +	base = of_read_number(prop, mem_addr_cells);
>> +	prop += mem_addr_cells;
>> +	lmb_size = of_read_number(prop, mem_size_cells);
>
> Would of_n_size_cells() and of_n_addr_cells() work here?

Yes that should work and be cleaner.

cheers

^ permalink raw reply

* [PATCH v2] powerpc/mm: Update tlbiel loop on POWER10
From: Aneesh Kumar K.V @ 2020-10-07  5:33 UTC (permalink / raw)
  To: linuxppc-dev, mpe; +Cc: Aneesh Kumar K.V, npiggin

With POWER10, single tlbiel instruction invalidates all the congruence
class of the TLB and hence we need to issue only one tlbiel with SET=0.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_hv.c         |  7 ++++++-
 arch/powerpc/kvm/book3s_hv_builtin.c | 11 ++++++++++-
 arch/powerpc/mm/book3s64/radix_tlb.c | 23 ++++++++++++++++-------
 3 files changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3bd3118c7633..00b5c5981db5 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4939,7 +4939,12 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
 	 * Work out how many sets the TLB has, for the use of
 	 * the TLB invalidation loop in book3s_hv_rmhandlers.S.
 	 */
-	if (radix_enabled())
+	if (cpu_has_feature(CPU_FTR_ARCH_31)) {
+		/*
+		 * P10 will flush all the congruence class with a single tlbiel
+		 */
+		kvm->arch.tlb_sets = 1;
+	} else if (radix_enabled())
 		kvm->arch.tlb_sets = POWER9_TLB_SETS_RADIX;	/* 128 */
 	else if (cpu_has_feature(CPU_FTR_ARCH_300))
 		kvm->arch.tlb_sets = POWER9_TLB_SETS_HASH;	/* 256 */
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 073617ce83e0..2803a4b01109 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -702,6 +702,7 @@ static void wait_for_sync(struct kvm_split_mode *sip, int phase)
 
 void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip)
 {
+	int num_sets;
 	unsigned long rb, set;
 
 	/* wait for every other thread to get to real mode */
@@ -712,11 +713,19 @@ void kvmhv_p9_set_lpcr(struct kvm_split_mode *sip)
 	mtspr(SPRN_LPID, sip->lpidr_req);
 	isync();
 
+	/*
+	 * P10 will flush all the congruence class with a single tlbiel
+	 */
+	if (cpu_has_feature(CPU_FTR_ARCH_31))
+		num_sets =  1;
+	else
+		num_sets = POWER9_TLB_SETS_RADIX;
+
 	/* Invalidate the TLB on thread 0 */
 	if (local_paca->kvm_hstate.tid == 0) {
 		sip->do_set = 0;
 		asm volatile("ptesync" : : : "memory");
-		for (set = 0; set < POWER9_TLB_SETS_RADIX; ++set) {
+		for (set = 0; set < num_sets; ++set) {
 			rb = TLBIEL_INVAL_SET_LPID +
 				(set << TLBIEL_INVAL_SET_SHIFT);
 			asm volatile(PPC_TLBIEL(%0, %1, 0, 0, 0) : :
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index 143b4fd396f0..9e76ba766b3c 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -56,14 +56,21 @@ static void tlbiel_all_isa300(unsigned int num_sets, unsigned int is)
 	if (early_cpu_has_feature(CPU_FTR_HVMODE)) {
 		/* MSR[HV] should flush partition scope translations first. */
 		tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 0);
-		for (set = 1; set < num_sets; set++)
-			tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 0);
+
+		if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) {
+			for (set = 1; set < num_sets; set++)
+				tlbiel_radix_set_isa300(set, is, 0,
+							RIC_FLUSH_TLB, 0);
+		}
 	}
 
 	/* Flush process scoped entries. */
 	tlbiel_radix_set_isa300(0, is, 0, RIC_FLUSH_ALL, 1);
-	for (set = 1; set < num_sets; set++)
-		tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
+
+	if (!early_cpu_has_feature(CPU_FTR_ARCH_31)) {
+		for (set = 1; set < num_sets; set++)
+			tlbiel_radix_set_isa300(set, is, 0, RIC_FLUSH_TLB, 1);
+	}
 
 	asm volatile("ptesync": : :"memory");
 }
@@ -300,9 +307,11 @@ static __always_inline void _tlbiel_pid(unsigned long pid, unsigned long ric)
 		return;
 	}
 
-	/* For the remaining sets, just flush the TLB */
-	for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
-		__tlbiel_pid(pid, set, RIC_FLUSH_TLB);
+	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+		/* For the remaining sets, just flush the TLB */
+		for (set = 1; set < POWER9_TLB_SETS_RADIX ; set++)
+			__tlbiel_pid(pid, set, RIC_FLUSH_TLB);
+	}
 
 	asm volatile("ptesync": : :"memory");
 	asm volatile(PPC_RADIX_INVALIDATE_ERAT_USER "; isync" : : :"memory");
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace
From: Andrew Donnellan @ 2020-10-07  5:10 UTC (permalink / raw)
  To: Sasha Levin, linuxppc-dev; +Cc: nathanl, leobras.c, stable, dja
In-Reply-To: <20200826135348.AD06422B4B@mail.kernel.org>

On 26/8/20 11:53 pm, Sasha Levin wrote:
> How should we proceed with this patch?

mpe: I believe we came to the conclusion that we shouldn't put this in 
stable just yet?

-- 
Andrew Donnellan              OzLabs, ADL Canberra
ajd@linux.ibm.com             IBM Australia Limited

^ permalink raw reply

* [PATCH 2/2] powerpc/pseries/eeh: Fix use of uninitialised variable
From: Oliver O'Halloran @ 2020-10-07  4:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran
In-Reply-To: <20201007040903.819081-1-oohall@gmail.com>

If the RTAS call to query the PE address for a device fails we jump the
err: label where an error message is printed along with the return code.
However, the printed return code is from the "ret" variable which isn't set
at that point since we assigned the result to "addr" instead. Fix this by
consistently using the "ret" variable for the result of the RTAS call
helpers an dropping the "addr" local variable"

Fixes: 98ba956f6a38 ("powerpc/pseries/eeh: Rework device EEH PE determination")
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
 arch/powerpc/platforms/pseries/eeh_pseries.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index d8fd5f7b2143..cf024fa37bda 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -363,7 +363,6 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 {
 	struct eeh_pe pe, *parent;
 	struct eeh_dev *edev;
-	int addr;
 	u32 pcie_flags;
 	int ret;
 
@@ -422,8 +421,8 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 	}
 
 	/* first up, find the pe_config_addr for the PE containing the device */
-	addr = pseries_eeh_get_pe_config_addr(pdn);
-	if (addr < 0) {
+	ret = pseries_eeh_get_pe_config_addr(pdn);
+	if (ret < 0) {
 		eeh_edev_dbg(edev, "Unable to find pe_config_addr\n");
 		goto err;
 	}
@@ -431,7 +430,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 	/* Try enable EEH on the fake PE */
 	memset(&pe, 0, sizeof(struct eeh_pe));
 	pe.phb = pdn->phb;
-	pe.addr = addr;
+	pe.addr = ret;
 
 	eeh_edev_dbg(edev, "Enabling EEH on device\n");
 	ret = eeh_ops->set_option(&pe, EEH_OPT_ENABLE);
@@ -440,7 +439,7 @@ void pseries_eeh_init_edev(struct pci_dn *pdn)
 		goto err;
 	}
 
-	edev->pe_config_addr = addr;
+	edev->pe_config_addr = pe.addr;
 
 	eeh_add_flag(EEH_ENABLED);
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 1/2] powerpc/eeh: Delete eeh_pe->config_addr
From: Oliver O'Halloran @ 2020-10-07  4:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Oliver O'Halloran

The eeh_pe->config_addr field was supposed to be removed in
commit 35d64734b643 ("powerpc/eeh: Clean up PE addressing") which made it
largely unused. Finish the job.

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
---
mpe: the Fixes SHA is from ppc/next, fold it into that if you want.
---
 arch/powerpc/include/asm/eeh.h | 1 -
 arch/powerpc/kernel/eeh.c      | 2 +-
 arch/powerpc/kernel/eeh_pe.c   | 4 ++--
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index dd6a4ac6c713..b1a5bba2e0b9 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -73,7 +73,6 @@ struct pci_dn;
 struct eeh_pe {
 	int type;			/* PE type: PHB/Bus/Device	*/
 	int state;			/* PE EEH dependent mode	*/
-	int config_addr;		/* Traditional PCI address	*/
 	int addr;			/* PE configuration address	*/
 	struct pci_controller *phb;	/* Associated PHB		*/
 	struct pci_bus *bus;		/* Top PCI bus for bus PE	*/
diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
index 87de8b798b2d..0e160dffcb86 100644
--- a/arch/powerpc/kernel/eeh.c
+++ b/arch/powerpc/kernel/eeh.c
@@ -466,7 +466,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev)
 		return 0;
 	}
 
-	if (!pe->addr && !pe->config_addr) {
+	if (!pe->addr) {
 		eeh_stats.no_cfg_addr++;
 		return 0;
 	}
diff --git a/arch/powerpc/kernel/eeh_pe.c b/arch/powerpc/kernel/eeh_pe.c
index 61b7d4019051..845e024321d4 100644
--- a/arch/powerpc/kernel/eeh_pe.c
+++ b/arch/powerpc/kernel/eeh_pe.c
@@ -354,8 +354,8 @@ int eeh_pe_tree_insert(struct eeh_dev *edev, struct eeh_pe *new_pe_parent)
 		pr_err("%s: out of memory!\n", __func__);
 		return -ENOMEM;
 	}
-	pe->addr	= edev->pe_config_addr;
-	pe->config_addr	= edev->bdfn;
+
+	pe->addr = edev->pe_config_addr;
 
 	/*
 	 * Put the new EEH PE into hierarchy tree. If the parent
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v5] powerpc/powernv/elog: Fix race while processing OPAL error log event.
From: Michael Ellerman @ 2020-10-07  3:25 UTC (permalink / raw)
  To: Michael Ellerman, linuxppc-dev; +Cc: aneesh.kumar, oohall, hegdevasant
In-Reply-To: <20201006122051.190176-1-mpe@ellerman.id.au>

On Tue, 6 Oct 2020 23:20:51 +1100, Michael Ellerman wrote:
> Every error log reported by OPAL is exported to userspace through a
> sysfs interface and notified using kobject_uevent(). The userspace
> daemon (opal_errd) then reads the error log and acknowledges the error
> log is saved safely to disk. Once acknowledged the kernel removes the
> respective sysfs file entry causing respective resources to be
> released including kobject.
> 
> [...]

Applied to powerpc/next.

[1/1] powerpc/powernv/elog: Fix race while processing OPAL error log event.
      https://git.kernel.org/powerpc/c/aea948bb80b478ddc2448f7359d574387521a52d

cheers

^ permalink raw reply

* Re: [PATCH v4] pseries/hotplug-memory: hot-add: skip redundant LMB lookup
From: Michael Ellerman @ 2020-10-07  3:25 UTC (permalink / raw)
  To: Scott Cheloha, linuxppc-dev
  Cc: Nathan Lynch, Laurent Dufour, Michal Suchanek, David Hildenbrand,
	Rick Lindsley
In-Reply-To: <20200916145122.3408129-1-cheloha@linux.ibm.com>

On Wed, 16 Sep 2020 09:51:22 -0500, Scott Cheloha wrote:
> During memory hot-add, dlpar_add_lmb() calls memory_add_physaddr_to_nid()
> to determine which node id (nid) to use when later calling __add_memory().
> 
> This is wasteful.  On pseries, memory_add_physaddr_to_nid() finds an
> appropriate nid for a given address by looking up the LMB containing the
> address and then passing that LMB to of_drconf_to_nid_single() to get the
> nid.  In dlpar_add_lmb() we get this address from the LMB itself.
> 
> [...]

Applied to powerpc/next.

[1/1] pseries/hotplug-memory: hot-add: skip redundant LMB lookup
      https://git.kernel.org/powerpc/c/72cdd117c449896c707fc6cfe5b90978160697d0

cheers

^ permalink raw reply

* Re: [PATCH -next] powerpc/papr_scm: Fix warnings about undeclared variable
From: Michael Ellerman @ 2020-10-07  3:21 UTC (permalink / raw)
  To: Wang Wensheng, linuxppc-dev
  Cc: santosh, aneesh.kumar, linux-kernel, paulus, vaibhav,
	dan.j.williams, ira.weiny
In-Reply-To: <20200918085951.44983-1-wangwensheng4@huawei.com>

On Fri, 18 Sep 2020 08:59:51 +0000, Wang Wensheng wrote:
> Build the kernel with 'make C=2':
> arch/powerpc/platforms/pseries/papr_scm.c:825:1: warning: symbol
> 'dev_attr_perf_stats' was not declared. Should it be static?

Applied to powerpc/next.

[1/1] powerpc/papr_scm: Fix warnings about undeclared variable
      https://git.kernel.org/powerpc/c/4366337490cbe5a35b50e83755be629a502ec7e2

cheers

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox