LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH v3 06/11] powerpc/smp: Stop passing mask to update_mask_by_l2
From: Srikar Dronamraju @ 2020-10-07 18:37 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Michael Neuling,
	Srikar Dronamraju, Peter Zijlstra, LKML, Nicholas Piggin,
	Valentin Schneider, Qian Cai, Oliver O'Halloran,
	Satheesh Rajendran, linuxppc-dev, Ingo Molnar
In-Reply-To: <20201007183800.27415-1-srikar@linux.vnet.ibm.com>

update_mask_by_l2 is called only once. But it passes cpu_l2_cache_mask
as parameter. Instead of passing cpu_l2_cache_mask, use it directly in
update_mask_by_l2.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Anton Blanchard <anton@ozlabs.org>
Cc: Oliver O'Halloran <oohall@gmail.com>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Qian Cai <cai@redhat.com>
---
 arch/powerpc/kernel/smp.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index c860c4950c9f..441c9c64b1e3 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1218,7 +1218,7 @@ static struct device_node *cpu_to_l2cache(int cpu)
 	return cache;
 }
 
-static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
+static bool update_mask_by_l2(int cpu)
 {
 	struct device_node *l2_cache, *np;
 	int i;
@@ -1240,7 +1240,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
 		return false;
 	}
 
-	cpumask_set_cpu(cpu, mask_fn(cpu));
+	cpumask_set_cpu(cpu, cpu_l2_cache_mask(cpu));
 	for_each_cpu_and(i, cpu_online_mask, cpu_cpu_mask(cpu)) {
 		/*
 		 * when updating the marks the current CPU has not been marked
@@ -1251,7 +1251,7 @@ static bool update_mask_by_l2(int cpu, struct cpumask *(*mask_fn)(int))
 			continue;
 
 		if (np == l2_cache)
-			set_cpus_related(cpu, i, mask_fn);
+			set_cpus_related(cpu, i, cpu_l2_cache_mask);
 
 		of_node_put(np);
 	}
@@ -1315,7 +1315,7 @@ static void add_cpu_to_masks(int cpu)
 			set_cpus_related(i, cpu, cpu_sibling_mask);
 
 	add_cpu_to_smallcore_masks(cpu);
-	update_mask_by_l2(cpu, cpu_l2_cache_mask);
+	update_mask_by_l2(cpu);
 
 	if (has_coregroup_support()) {
 		int coregroup_id = cpu_to_coregroup_id(cpu);
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Khalid Aziz @ 2020-10-07 20:14 UTC (permalink / raw)
  To: Jann Horn, David S. Miller, sparclinux, Andrew Morton, linux-mm
  Cc: Catalin Marinas, linuxppc-dev, linux-kernel, Christoph Hellwig,
	Paul Mackerras, Anthony Yznaga, Will Deacon, linux-arm-kernel
In-Reply-To: <20201007073932.865218-1-jannh@google.com>

On 10/7/20 1:39 AM, Jann Horn wrote:
> arch_validate_prot() is a hook that can validate whether a given set of
> protection flags is valid in an mprotect() operation. It is given the set
> of protection flags and the address being modified.
> 
> However, the address being modified can currently not actually be used in
> a meaningful way because:
> 
> 1. Only the address is given, but not the length, and the operation can
>    span multiple VMAs. Therefore, the callee can't actually tell which
>    virtual address range, or which VMAs, are being targeted.
> 2. The mmap_lock is not held, meaning that if the callee were to check
>    the VMA at @addr, that VMA would be unrelated to the one the
>    operation is performed on.
> 
> Currently, custom arch_validate_prot() handlers are defined by
> arm64, powerpc and sparc.
> arm64 and powerpc don't care about the address range, they just check the
> flags against CPU support masks.
> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
> the mmap_lock.
> 
> Change the function signature to also take a length, and move the
> arch_validate_prot() call in mm/mprotect.c down into the locked region.
> 
> Cc: stable@vger.kernel.org
> Fixes: 9035cf9a97e4 ("mm: Add address parameter to arch_validate_prot()")
> Suggested-by: Khalid Aziz <khalid.aziz@oracle.com>
> Suggested-by: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
>  arch/arm64/include/asm/mman.h   | 4 ++--
>  arch/powerpc/include/asm/mman.h | 3 ++-
>  arch/powerpc/kernel/syscalls.c  | 2 +-
>  arch/sparc/include/asm/mman.h   | 6 ++++--
>  include/linux/mman.h            | 3 ++-
>  mm/mprotect.c                   | 6 ++++--
>  6 files changed, 15 insertions(+), 9 deletions(-)


This looks good to me.

As Chris pointed out, the call to arch_validate_prot() from do_mmap2()
is made without holding mmap_lock. Lock is not acquired until
vm_mmap_pgoff(). This variance is uncomfortable but I am more
uncomfortable forcing all implementations of validate_prot to require
mmap_lock be held when non-sparc implementations do not have such need
yet. Since do_mmap2() is in powerpc specific code, for now this patch
solves a current problem. That leaves open the question of should
generic mmap call arch_validate_prot and return EINVAL for invalid
combination of protection bits, but that is better addressed in a
separate patch.

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>

> 
> diff --git a/arch/arm64/include/asm/mman.h b/arch/arm64/include/asm/mman.h
> index 081ec8de9ea6..0876a87986dc 100644
> --- a/arch/arm64/include/asm/mman.h
> +++ b/arch/arm64/include/asm/mman.h
> @@ -23,7 +23,7 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
>  #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
>  
>  static inline bool arch_validate_prot(unsigned long prot,
> -	unsigned long addr __always_unused)
> +	unsigned long addr __always_unused, unsigned long len __always_unused)
>  {
>  	unsigned long supported = PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM;
>  
> @@ -32,6 +32,6 @@ static inline bool arch_validate_prot(unsigned long prot,
>  
>  	return (prot & ~supported) == 0;
>  }
> -#define arch_validate_prot(prot, addr) arch_validate_prot(prot, addr)
> +#define arch_validate_prot(prot, addr, len) arch_validate_prot(prot, addr, len)
>  
>  #endif /* ! __ASM_MMAN_H__ */
> diff --git a/arch/powerpc/include/asm/mman.h b/arch/powerpc/include/asm/mman.h
> index 7cb6d18f5cd6..65dd9b594985 100644
> --- a/arch/powerpc/include/asm/mman.h
> +++ b/arch/powerpc/include/asm/mman.h
> @@ -36,7 +36,8 @@ static inline pgprot_t arch_vm_get_page_prot(unsigned long vm_flags)
>  }
>  #define arch_vm_get_page_prot(vm_flags) arch_vm_get_page_prot(vm_flags)
>  
> -static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
> +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
> +				      unsigned long len)
>  {
>  	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_SAO))
>  		return false;
> diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
> index 078608ec2e92..b1fabb97d138 100644
> --- a/arch/powerpc/kernel/syscalls.c
> +++ b/arch/powerpc/kernel/syscalls.c
> @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>  {
>  	long ret = -EINVAL;
>  
> -	if (!arch_validate_prot(prot, addr))
> +	if (!arch_validate_prot(prot, addr, len))
>  		goto out;
>  
>  	if (shift) {
> diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
> index f94532f25db1..e85222c76585 100644
> --- a/arch/sparc/include/asm/mman.h
> +++ b/arch/sparc/include/asm/mman.h
> @@ -52,9 +52,11 @@ static inline pgprot_t sparc_vm_get_page_prot(unsigned long vm_flags)
>  	return (vm_flags & VM_SPARC_ADI) ? __pgprot(_PAGE_MCD_4V) : __pgprot(0);
>  }
>  
> -#define arch_validate_prot(prot, addr) sparc_validate_prot(prot, addr)
> -static inline int sparc_validate_prot(unsigned long prot, unsigned long addr)
> +#define arch_validate_prot(prot, addr, len) sparc_validate_prot(prot, addr, len)
> +static inline int sparc_validate_prot(unsigned long prot, unsigned long addr,
> +				      unsigned long len)
>  {
> +	mmap_assert_write_locked(current->mm);
>  	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
>  		return 0;
>  	if (prot & PROT_ADI) {
> diff --git a/include/linux/mman.h b/include/linux/mman.h
> index 6f34c33075f9..5b4d554d3189 100644
> --- a/include/linux/mman.h
> +++ b/include/linux/mman.h
> @@ -96,7 +96,8 @@ static inline void vm_unacct_memory(long pages)
>   *
>   * Returns true if the prot flags are valid
>   */
> -static inline bool arch_validate_prot(unsigned long prot, unsigned long addr)
> +static inline bool arch_validate_prot(unsigned long prot, unsigned long addr,
> +				      unsigned long len)
>  {
>  	return (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM)) == 0;
>  }
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index ce8b8a5eacbb..e2d6b51acbf8 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -533,14 +533,16 @@ static int do_mprotect_pkey(unsigned long start, size_t len,
>  	end = start + len;
>  	if (end <= start)
>  		return -ENOMEM;
> -	if (!arch_validate_prot(prot, start))
> -		return -EINVAL;
>  
>  	reqprot = prot;
>  
>  	if (mmap_write_lock_killable(current->mm))
>  		return -EINTR;
>  
> +	error = -EINVAL;
> +	if (!arch_validate_prot(prot, start, len))
> +		goto out;
> +
>  	/*
>  	 * If userspace did not allocate the pkey, do not let
>  	 * them use it here.
> 
> base-commit: c85fb28b6f999db9928b841f63f1beeb3074eeca
> 



^ permalink raw reply

* Re: [PATCH 2/2] sparc: Check VMA range in sparc_validate_prot()
From: Khalid Aziz @ 2020-10-07 20:15 UTC (permalink / raw)
  To: Jann Horn, David S. Miller, sparclinux, Andrew Morton, linux-mm
  Cc: Catalin Marinas, linuxppc-dev, linux-kernel, Christoph Hellwig,
	Paul Mackerras, Anthony Yznaga, Will Deacon, linux-arm-kernel
In-Reply-To: <20201007073932.865218-2-jannh@google.com>

On 10/7/20 1:39 AM, Jann Horn wrote:
> sparc_validate_prot() is called from do_mprotect_pkey() as
> arch_validate_prot(); it tries to ensure that an mprotect() call can't
> enable ADI on incompatible VMAs.
> The current implementation only checks that the VMA at the start address
> matches the rules for ADI mappings; instead, check all VMAs that will be
> affected by mprotect().
> 
> (This hook is called before mprotect() makes sure that the specified range
> is actually covered by VMAs, and mprotect() returns specific error codes
> when that's not the case. In order for mprotect() to still generate the
> same error codes for mprotect(<unmapped_ptr>, <len>, ...|PROT_ADI), we need
> to *accept* cases where the range is not fully covered by VMAs.)
> 
> Cc: stable@vger.kernel.org
> Fixes: 74a04967482f ("sparc64: Add support for ADI (Application Data Integrity)")
> Signed-off-by: Jann Horn <jannh@google.com>
> ---
> compile-tested only, I don't have a Sparc ADI setup - might be nice if some
> Sparc person could test this?
> 
>  arch/sparc/include/asm/mman.h | 50 +++++++++++++++++++++--------------
>  1 file changed, 30 insertions(+), 20 deletions(-)


Looks good to me.

Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>


> 
> diff --git a/arch/sparc/include/asm/mman.h b/arch/sparc/include/asm/mman.h
> index e85222c76585..6dced75567c3 100644
> --- a/arch/sparc/include/asm/mman.h
> +++ b/arch/sparc/include/asm/mman.h
> @@ -60,31 +60,41 @@ static inline int sparc_validate_prot(unsigned long prot, unsigned long addr,
>  	if (prot & ~(PROT_READ | PROT_WRITE | PROT_EXEC | PROT_SEM | PROT_ADI))
>  		return 0;
>  	if (prot & PROT_ADI) {
> +		struct vm_area_struct *vma, *next;
> +
>  		if (!adi_capable())
>  			return 0;
>  
> -		if (addr) {
> -			struct vm_area_struct *vma;
> +		vma = find_vma(current->mm, addr);
> +		/* if @addr is unmapped, let mprotect() deal with it */
> +		if (!vma || vma->vm_start > addr)
> +			return 1;
> +		while (1) {
> +			/* ADI can not be enabled on PFN
> +			 * mapped pages
> +			 */
> +			if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> +				return 0;
>  
> -			vma = find_vma(current->mm, addr);
> -			if (vma) {
> -				/* ADI can not be enabled on PFN
> -				 * mapped pages
> -				 */
> -				if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
> -					return 0;
> +			/* Mergeable pages can become unmergeable
> +			 * if ADI is enabled on them even if they
> +			 * have identical data on them. This can be
> +			 * because ADI enabled pages with identical
> +			 * data may still not have identical ADI
> +			 * tags on them. Disallow ADI on mergeable
> +			 * pages.
> +			 */
> +			if (vma->vm_flags & VM_MERGEABLE)
> +				return 0;
>  
> -				/* Mergeable pages can become unmergeable
> -				 * if ADI is enabled on them even if they
> -				 * have identical data on them. This can be
> -				 * because ADI enabled pages with identical
> -				 * data may still not have identical ADI
> -				 * tags on them. Disallow ADI on mergeable
> -				 * pages.
> -				 */
> -				if (vma->vm_flags & VM_MERGEABLE)
> -					return 0;
> -			}
> +			/* reached the end of the range without errors? */
> +			if (addr+len <= vma->vm_end)
> +				return 1;
> +			next = vma->vm_next;
> +			/* if a VMA hole follows, let mprotect() deal with it */
> +			if (!next || next->vm_start != vma->vm_end)
> +				return 1;
> +			vma = next;
>  		}
>  	}
>  	return 1;
> 



^ permalink raw reply

* Re: [PATCH] powerpc/pci: Fix PHB removal/rescan on PowerNV
From: Oliver O'Halloran @ 2020-10-08  2:23 UTC (permalink / raw)
  To: Cédric Le Goater; +Cc: Alexey Kardashevskiy, linuxppc-dev, linux-pci
In-Reply-To: <20200925092258.525079-1-clg@kaod.org>

On Fri, Sep 25, 2020 at 7:23 PM Cédric Le Goater <clg@kaod.org> wrote:
>
> To fix an issue with PHB hotplug on pSeries machine (HPT/XIVE), commit
> 3a3181e16fbd introduced a PPC specific pcibios_remove_bus() routine to
> clear all interrupt mappings when the bus is removed. This routine
> frees an array allocated in pcibios_scan_phb().
>
> This broke PHB hotplug on PowerNV because, when a PHB is removed and
> re-scanned through sysfs, the PCI layer un-assigns and re-assigns
> resources to the PHB but does not destroy and recreate the PCI
> controller structure. Since pcibios_remove_bus() does not clear the
> 'irq_map' array pointer, a second removal of the PHB will try to free
> the array a second time and corrupt memory.

"PHB hotplug" and "hot-plugging devices under a PHB" are different
things. What you're saying here doesn't make a whole lot of sense to
me unless you're conflating the two. The distinction is important
since on pseries we can use DLPAR to add and remove actual PHBs (i.e.
the pci_controller) at runtime, but there's no corresponding mechanism
on PowerNV.

> Free the 'irq_map' array in pcibios_free_controller() to fix
> corruption and clear interrupt mapping after it has been
> disposed. This to avoid filling up the array with successive
> remove/rescan of a bus.

Even with this patch I think we're still broken. With this patch
applied the init path is something like:

per-phb init:
    allocate phb->irq_map array
per-bus init:
    nothing
per-device init:
    pcibios_bus_add_device()
       pci_read_irq_line()
            pci_irq_map_register(pci_dev, virq)
               *record the device's interrupt in phb->irq_map*

And the teardown path:

per-device teardown:
    nothing
per-bus teardown:
    pcibios_remove_bus()
        pci_irq_map_dispose()
            *walk phb->irq_map and dispose of each mapped interrupt*
per-phb teardown:
    free(phb->irq_map)

There's a lot of asymmetry here, which is a problem in itself, but the
real issue is that when removing *any* pci_bus under a PHB we dispose
of the LSI\ for *every* device under that PHB. Not good.

Ideally we should be fixing this by having the per-device teardown
handle disposing the mapping. Unfortunately, there's no pcibios hook
that's called when removing a pci_dev. However, we can register a bus
notifier which will be called when the pci_dev is removed from its bus
and we already do that for the per-device EEH teardown and to handle
IOMMU TCE invalidation when the device is removed.

^ permalink raw reply

* [PATCH] powerpc/smp: Use GFP_ATOMIC while allocating tmp mask
From: Srikar Dronamraju @ 2020-10-08  3:42 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Srikar Dronamraju, Qian Cai, LKML,
	Valentin Schneider, Peter Zijlstra, linuxppc-dev, Ingo Molnar

Qian Cai reported a regression where CPU Hotplug fails with the latest
powerpc/next

BUG: sleeping function called from invalid context at mm/slab.h:494
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88
no locks held by swapper/88/0.
irq event stamp: 18074448
hardirqs last  enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110
hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0
do_idle at kernel/sched/idle.c:253 (discriminator 1)
softirqs last  enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0
softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0
CPU: 88 PID: 0 Comm: swapper/88 Tainted: G        W         5.9.0-rc8-next-20201007 #1
Call Trace:
[c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable)
[c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310
[c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190
[c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0
slab_alloc_node at mm/slub.c:2817
(inlined by) __kmalloc_node at mm/slub.c:4013
[c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80
kmalloc_node at include/linux/slab.h:577
(inlined by) alloc_cpumask_var_node at lib/cpumask.c:116
[c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800
update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267
(inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387
(inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420
[c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14

Allocating a temporary mask while performing a CPU Hotplug operation
with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable
function from a atomic context. Fix this by allocating the temporary
mask with GFP_ATOMIC flag.

If there is a failure to allocate a mask, scheduler is going to observe
that this CPU's topology is broken. Instead of having to speculate why
the topology is broken, add a WARN_ON_ONCE.

Fixes: 70a94089d7f7 ("powerpc/smp: Optimize update_coregroup_mask")
Fixes: 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2")
Reported-by: Qian Cai <cai@redhat.com>
Suggested-by: Qian Cai <cai@redhat.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Qian Cai <cai@redhat.com>
---
 arch/powerpc/kernel/smp.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 0dc1b85..1268558 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1264,7 +1264,8 @@ static bool update_mask_by_l2(int cpu)
 		return false;
 	}

-	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));
+	/* In CPU-hotplug path, hence use GFP_ATOMIC */
+	WARN_ON_ONCE(!alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu)));
 	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));

 	if (has_big_cores)
@@ -1344,7 +1345,8 @@ static void update_coregroup_mask(int cpu)
 	int coregroup_id = cpu_to_coregroup_id(cpu);
 	int i;

-	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));
+	/* In CPU-hotplug path, hence use GFP_ATOMIC */
+	WARN_ON_ONCE(!alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu)));
 	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));

 	if (shared_caches)
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] powerpc/pci: Fix PHB removal/rescan on PowerNV
From: Cédric Le Goater @ 2020-10-08  4:37 UTC (permalink / raw)
  To: Oliver O'Halloran; +Cc: Alexey Kardashevskiy, linuxppc-dev, linux-pci
In-Reply-To: <CAOSf1CGW7ocYm2BXFiy9Nmi+G+xwVcqQzTqPo_nss_tmpG_V=w@mail.gmail.com>

On 10/8/20 4:23 AM, Oliver O'Halloran wrote:
> On Fri, Sep 25, 2020 at 7:23 PM Cédric Le Goater <clg@kaod.org> wrote:
>>
>> To fix an issue with PHB hotplug on pSeries machine (HPT/XIVE), commit
>> 3a3181e16fbd introduced a PPC specific pcibios_remove_bus() routine to
>> clear all interrupt mappings when the bus is removed. This routine
>> frees an array allocated in pcibios_scan_phb().
>>
>> This broke PHB hotplug on PowerNV because, when a PHB is removed and
>> re-scanned through sysfs, the PCI layer un-assigns and re-assigns
>> resources to the PHB but does not destroy and recreate the PCI
>> controller structure. Since pcibios_remove_bus() does not clear the
>> 'irq_map' array pointer, a second removal of the PHB will try to free
>> the array a second time and corrupt memory.
> 
> "PHB hotplug" and "hot-plugging devices under a PHB" are different
> things. What you're saying here doesn't make a whole lot of sense to
> me unless you're conflating the two. The distinction is important
> since on pseries we can use DLPAR to add and remove actual PHBs (i.e.
> the pci_controller) at runtime, but there's no corresponding mechanism
> on PowerNV.

And it's even different on QEMU. 

>> Free the 'irq_map' array in pcibios_free_controller() to fix
>> corruption and clear interrupt mapping after it has been
>> disposed. This to avoid filling up the array with successive
>> remove/rescan of a bus.
> 
> Even with this patch I think we're still broken. With this patch
> applied the init path is something like:
> 
> per-phb init:
>     allocate phb->irq_map array
> per-bus init:
>     nothing
> per-device init:
>     pcibios_bus_add_device()
>        pci_read_irq_line()
>             pci_irq_map_register(pci_dev, virq)
>                *record the device's interrupt in phb->irq_map*
> 
> And the teardown path:
> 
> per-device teardown:
>     nothing
> per-bus teardown:
>     pcibios_remove_bus()
>         pci_irq_map_dispose()
>             *walk phb->irq_map and dispose of each mapped interrupt*
> per-phb teardown:
>     free(phb->irq_map)
> 
> There's a lot of asymmetry here, which is a problem in itself, but the
> real issue is that when removing *any* pci_bus under a PHB we dispose
> of the LSI\ for *every* device under that PHB. Not good.
> 
> Ideally we should be fixing this by having the per-device teardown
> handle disposing the mapping. Unfortunately, there's no pcibios hook
> that's called when removing a pci_dev. However, we can register a bus
> notifier which will be called when the pci_dev is removed from its bus
> and we already do that for the per-device EEH teardown and to handle
> IOMMU TCE invalidation when the device is removed.

I lack the knowledge here and I think some else should take over,
as I am not doing a good job. 

Michael, can you drop the initial patch again :/ It is better not
to merge anything.

Thanks,

C. 



^ permalink raw reply

* Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Christoph Hellwig @ 2020-10-08  6:21 UTC (permalink / raw)
  To: Jann Horn
  Cc: Dave Kleikamp, Will Deacon, Linux-MM, kernel list,
	Christoph Hellwig, Catalin Marinas, Khalid Aziz, Paul Mackerras,
	sparclinux, Anthony Yznaga, Andrew Morton, linuxppc-dev,
	David S. Miller, Linux ARM
In-Reply-To: <CAG48ez3kjTeVtQcjQerYYRs7sX5qq3O7SU-FEaYLNXisFmAeOg@mail.gmail.com>

On Wed, Oct 07, 2020 at 04:42:55PM +0200, Jann Horn wrote:
> > > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
> > >  {
> > >       long ret = -EINVAL;
> > >
> > > -     if (!arch_validate_prot(prot, addr))
> > > +     if (!arch_validate_prot(prot, addr, len))
> >
> > This call isn't under mmap lock.  I also find it rather weird as the
> > generic code only calls arch_validate_prot from mprotect, only powerpc
> > also calls it from mmap.
> >
> > This seems to go back to commit ef3d3246a0d0
> > ("powerpc/mm: Add Strong Access Ordering support")
> 
> I'm _guessing_ the idea in the generic case might be that mmap()
> doesn't check unknown bits in the protection flags, and therefore
> maybe people wanted to avoid adding new error cases that could be
> caused by random high bits being set? So while the mprotect() case
> checks the flags and refuses unknown values, the mmap() code just lets
> the architecture figure out which bits are actually valid to set (via
> arch_calc_vm_prot_bits()) and silently ignores the rest?
> 
> And powerpc apparently decided that they do want to error out on bogus
> prot values passed to their version of mmap(), and in exchange, assume
> in arch_calc_vm_prot_bits() that the protection bits are valid?

The problem really is that now programs behave different on powerpc
compared to all other architectures.

> powerpc's arch_validate_prot() doesn't actually need the mmap lock, so
> I think this is fine-ish for now (as in, while the code is a bit
> unclean, I don't think I'm making it worse, and I don't think it's
> actually buggy). In theory, we could move the arch_validate_prot()
> call over into the mmap guts, where we're holding the lock, and gate
> it on the architecture or on some feature CONFIG that powerpc can
> activate in its Kconfig. But I'm not sure whether that'd be helping or
> making things worse, so when I sent this patch, I deliberately left
> the powerpc stuff as-is.

For now I'd just duplicate the trivial logic from arch_validate_prot
in the powerpc version of do_mmap2 and add a comment that this check
causes a gratious incompatibility to all other architectures.  And then
hope that the powerpc maintainers fix it up :)

^ permalink raw reply

* [PATCH v4 1/4] powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31 is set
From: Ravi Bangoria @ 2020-10-08  7:27 UTC (permalink / raw)
  To: mpe; +Cc: ravi.bangoria, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201008072726.233086-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

Unconditional emulation of prefixed instructions will allow
emulation of them on Power10 predecessors which might cause
issues. Restrict that.

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/lib/sstep.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e9dcaba9a4f8..e6242744c71b 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -1346,6 +1346,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 	switch (opcode) {
 #ifdef __powerpc64__
 	case 1:
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+
 		prefix_r = GET_PREFIX_R(word);
 		ra = GET_PREFIX_RA(suffix);
 		rd = (suffix >> 21) & 0x1f;
@@ -2733,6 +2736,9 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 		}
 		break;
 	case 1: /* Prefixed instructions */
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+
 		prefix_r = GET_PREFIX_R(word);
 		ra = GET_PREFIX_RA(suffix);
 		op->update_reg = ra;
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 0/4] powerpc/sstep: VSX 32-byte vector paired load/store instructions
From: Ravi Bangoria @ 2020-10-08  7:27 UTC (permalink / raw)
  To: mpe; +Cc: ravi.bangoria, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev

VSX vector paired instructions operates with octword (32-byte)
operand for loads and stores between storage and a pair of two
sequential Vector-Scalar Registers (VSRs). There are 4 word
instructions and 2 prefixed instructions that provides this
32-byte storage access operations - lxvp, lxvpx, stxvp, stxvpx,
plxvp, pstxvp.

Emulation infrastructure doesn't have support for these instructions,
to operate with 32-byte storage access and to operate with 2 VSX
registers. This patch series enables the instruction emulation
support and adds test cases for them respectively.

v3: https://lore.kernel.org/linuxppc-dev/20200731081637.1837559-1-bala24@linux.ibm.com/ 

Changes in v4:
-------------
* Patch #1 is (kind of) new.
* Patch #2 now enables both analyse_instr() and emulate_step()
  unlike prev series where both were in separate patches.
* Patch #2 also has important fix for emulation on LE.
* Patch #3 and #4. Added XSP/XTP, D0/D1 instruction operands,
  removed *_EX_OP, __PPC_T[P][X] macros which are incorrect,
  and adhered to PPC_RAW_* convention.
* Added `CPU_FTR_ARCH_31` check in testcases to avoid failing
  in p8/p9.
* Some consmetic changes.
* Rebased to powerpc/next

Changes in v3:
-------------
Worked on review comments and suggestions from Ravi and Naveen,

* Fix the do_vsx_load() to handle vsx instructions if MSR_FP/MSR_VEC
  cleared in exception conditions and it reaches to read/write to
  thread_struct member fp_state/vr_state respectively.
* Fix wrongly used `__vector128 v[2]` in struct vsx_reg as it should
  hold a single vsx register size.
* Remove unnecessary `VSX_CHECK_VEC` flag set and condition to check
  `VSX_LDLEFT` that is not applicable for these vsx instructions.
* Fix comments in emulate_vsx_load() that were misleading.
* Rebased on latest powerpc next branch.

Changes in v2:
-------------
* Fix suggestion from Sandipan, wrap ISA 3.1 instructions with
  cpu_has_feature(CPU_FTR_ARCH_31) check.
* Rebase on latest powerpc next branch.

Balamuruhan S (4):
  powerpc/sstep: Emulate prefixed instructions only when CPU_FTR_ARCH_31
    is set
  powerpc/sstep: Support VSX vector paired storage access instructions
  powerpc/ppc-opcode: Add encoding macros for VSX vector paired
    instructions
  powerpc/sstep: Add testcases for VSX vector paired load/store
    instructions

 arch/powerpc/include/asm/ppc-opcode.h |  13 ++
 arch/powerpc/lib/sstep.c              | 152 +++++++++++++--
 arch/powerpc/lib/test_emulate_step.c  | 270 ++++++++++++++++++++++++++
 3 files changed, 414 insertions(+), 21 deletions(-)

-- 
2.26.2

^ permalink raw reply

* [PATCH v4 2/4] powerpc/sstep: Support VSX vector paired storage access instructions
From: Ravi Bangoria @ 2020-10-08  7:27 UTC (permalink / raw)
  To: mpe; +Cc: ravi.bangoria, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201008072726.233086-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

VSX Vector Paired instructions loads/stores an octword (32 bytes)
from/to storage into two sequential VSRs. Add emulation support
for these new instructions:
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/lib/sstep.c | 146 +++++++++++++++++++++++++++++++++------
 1 file changed, 125 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index e6242744c71b..e39ee1651636 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -32,6 +32,10 @@ extern char system_call_vectored_emulate[];
 #define XER_OV32	0x00080000U
 #define XER_CA32	0x00040000U
 
+#ifdef CONFIG_VSX
+#define VSX_REGISTER_XTP(rd)   ((((rd) & 1) << 5) | ((rd) & 0xfe))
+#endif
+
 #ifdef CONFIG_PPC_FPU
 /*
  * Functions in ldstfp.S
@@ -279,6 +283,19 @@ static nokprobe_inline void do_byte_reverse(void *ptr, int nb)
 		up[1] = tmp;
 		break;
 	}
+	case 32: {
+		unsigned long *up = (unsigned long *)ptr;
+		unsigned long tmp;
+
+		tmp = byterev_8(up[0]);
+		up[0] = byterev_8(up[3]);
+		up[3] = tmp;
+		tmp = byterev_8(up[2]);
+		up[2] = byterev_8(up[1]);
+		up[1] = tmp;
+		break;
+	}
+
 #endif
 	default:
 		WARN_ON_ONCE(1);
@@ -709,6 +726,8 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 	reg->d[0] = reg->d[1] = 0;
 
 	switch (op->element_size) {
+	case 32:
+		/* [p]lxvp[x] */
 	case 16:
 		/* whole vector; lxv[x] or lxvl[l] */
 		if (size == 0)
@@ -717,7 +736,7 @@ void emulate_vsx_load(struct instruction_op *op, union vsx_reg *reg,
 		if (IS_LE && (op->vsx_flags & VSX_LDLEFT))
 			rev = !rev;
 		if (rev)
-			do_byte_reverse(reg, 16);
+			do_byte_reverse(reg, size);
 		break;
 	case 8:
 		/* scalar loads, lxvd2x, lxvdsx */
@@ -793,6 +812,20 @@ void emulate_vsx_store(struct instruction_op *op, const union vsx_reg *reg,
 	size = GETSIZE(op->type);
 
 	switch (op->element_size) {
+	case 32:
+		/* [p]stxvp[x] */
+		if (size == 0)
+			break;
+		if (rev) {
+			/* reverse 32 bytes */
+			buf.d[0] = byterev_8(reg->d[3]);
+			buf.d[1] = byterev_8(reg->d[2]);
+			buf.d[2] = byterev_8(reg->d[1]);
+			buf.d[3] = byterev_8(reg->d[0]);
+			reg = &buf;
+		}
+		memcpy(mem, reg, size);
+		break;
 	case 16:
 		/* stxv, stxvx, stxvl, stxvll */
 		if (size == 0)
@@ -861,28 +894,43 @@ static nokprobe_inline int do_vsx_load(struct instruction_op *op,
 				       bool cross_endian)
 {
 	int reg = op->reg;
-	u8 mem[16];
-	union vsx_reg buf;
+	int i, j, nr_vsx_regs;
+	u8 mem[32];
+	union vsx_reg buf[2];
 	int size = GETSIZE(op->type);
 
 	if (!address_ok(regs, ea, size) || copy_mem_in(mem, ea, size, regs))
 		return -EFAULT;
 
-	emulate_vsx_load(op, &buf, mem, cross_endian);
+	nr_vsx_regs = size / sizeof(__vector128);
+	emulate_vsx_load(op, buf, mem, cross_endian);
 	preempt_disable();
 	if (reg < 32) {
 		/* FP regs + extensions */
 		if (regs->msr & MSR_FP) {
-			load_vsrn(reg, &buf);
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				load_vsrn(reg + i, &buf[j].v);
+			}
 		} else {
-			current->thread.fp_state.fpr[reg][0] = buf.d[0];
-			current->thread.fp_state.fpr[reg][1] = buf.d[1];
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				current->thread.fp_state.fpr[reg + i][0] = buf[j].d[0];
+				current->thread.fp_state.fpr[reg + i][1] = buf[j].d[1];
+			}
 		}
 	} else {
-		if (regs->msr & MSR_VEC)
-			load_vsrn(reg, &buf);
-		else
-			current->thread.vr_state.vr[reg - 32] = buf.v;
+		if (regs->msr & MSR_VEC) {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				load_vsrn(reg + i, &buf[j].v);
+			}
+		} else {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				current->thread.vr_state.vr[reg - 32 + i] = buf[j].v;
+			}
+		}
 	}
 	preempt_enable();
 	return 0;
@@ -893,30 +941,45 @@ static nokprobe_inline int do_vsx_store(struct instruction_op *op,
 					bool cross_endian)
 {
 	int reg = op->reg;
-	u8 mem[16];
-	union vsx_reg buf;
+	int i, j, nr_vsx_regs;
+	u8 mem[32];
+	union vsx_reg buf[2];
 	int size = GETSIZE(op->type);
 
 	if (!address_ok(regs, ea, size))
 		return -EFAULT;
 
+	nr_vsx_regs = size / sizeof(__vector128);
 	preempt_disable();
 	if (reg < 32) {
 		/* FP regs + extensions */
 		if (regs->msr & MSR_FP) {
-			store_vsrn(reg, &buf);
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				store_vsrn(reg + i, &buf[j].v);
+			}
 		} else {
-			buf.d[0] = current->thread.fp_state.fpr[reg][0];
-			buf.d[1] = current->thread.fp_state.fpr[reg][1];
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				buf[j].d[0] = current->thread.fp_state.fpr[reg + i][0];
+				buf[j].d[1] = current->thread.fp_state.fpr[reg + i][1];
+			}
 		}
 	} else {
-		if (regs->msr & MSR_VEC)
-			store_vsrn(reg, &buf);
-		else
-			buf.v = current->thread.vr_state.vr[reg - 32];
+		if (regs->msr & MSR_VEC) {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				store_vsrn(reg + i, &buf[j].v);
+			}
+		} else {
+			for (i = 0; i < nr_vsx_regs; i++) {
+				j = IS_LE ? nr_vsx_regs - i - 1 : i;
+				buf[j].v = current->thread.vr_state.vr[reg - 32 + i];
+			}
+		}
 	}
 	preempt_enable();
-	emulate_vsx_store(op, &buf, mem, cross_endian);
+	emulate_vsx_store(op, buf, mem, cross_endian);
 	return  copy_mem_out(mem, ea, size, regs);
 }
 #endif /* CONFIG_VSX */
@@ -2403,6 +2466,14 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			op->vsx_flags = VSX_SPLAT;
 			break;
 
+		case 333:       /* lxvpx */
+			if (!cpu_has_feature(CPU_FTR_ARCH_31))
+				return -1;
+			op->reg = VSX_REGISTER_XTP(rd);
+			op->type = MKOP(LOAD_VSX, 0, 32);
+			op->element_size = 32;
+			break;
+
 		case 364:	/* lxvwsx */
 			op->reg = rd | ((word & 1) << 5);
 			op->type = MKOP(LOAD_VSX, 0, 4);
@@ -2431,6 +2502,13 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 				VSX_CHECK_VEC;
 			break;
 		}
+		case 461:       /* stxvpx */
+			if (!cpu_has_feature(CPU_FTR_ARCH_31))
+				return -1;
+			op->reg = VSX_REGISTER_XTP(rd);
+			op->type = MKOP(STORE_VSX, 0, 32);
+			op->element_size = 32;
+			break;
 		case 524:	/* lxsspx */
 			op->reg = rd | ((word & 1) << 5);
 			op->type = MKOP(LOAD_VSX, 0, 4);
@@ -2672,6 +2750,22 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 #endif
 
 #ifdef CONFIG_VSX
+	case 6:
+		if (!cpu_has_feature(CPU_FTR_ARCH_31))
+			return -1;
+		op->ea = dqform_ea(word, regs);
+		op->reg = VSX_REGISTER_XTP(rd);
+		op->element_size = 32;
+		switch (word & 0xf) {
+		case 0:         /* lxvp */
+			op->type = MKOP(LOAD_VSX, 0, 32);
+			break;
+		case 1:         /* stxvp */
+			op->type = MKOP(STORE_VSX, 0, 32);
+			break;
+		}
+		break;
+
 	case 61:	/* stfdp, lxv, stxsd, stxssp, stxv */
 		switch (word & 7) {
 		case 0:		/* stfdp with LSB of DS field = 0 */
@@ -2803,12 +2897,22 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
 			case 57:	/* pld */
 				op->type = MKOP(LOAD, PREFIXED, 8);
 				break;
+			case 58:        /* plxvp */
+				op->reg = VSX_REGISTER_XTP(rd);
+				op->type = MKOP(LOAD_VSX, PREFIXED, 32);
+				op->element_size = 32;
+				break;
 			case 60:        /* stq */
 				op->type = MKOP(STORE, PREFIXED, 16);
 				break;
 			case 61:	/* pstd */
 				op->type = MKOP(STORE, PREFIXED, 8);
 				break;
+			case 62:        /* pstxvp */
+				op->reg = VSX_REGISTER_XTP(rd);
+				op->type = MKOP(STORE_VSX, PREFIXED, 32);
+				op->element_size = 32;
+				break;
 			}
 			break;
 		case 1: /* Type 01 Eight-Byte Register-to-Register */
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 3/4] powerpc/ppc-opcode: Add encoding macros for VSX vector paired instructions
From: Ravi Bangoria @ 2020-10-08  7:27 UTC (permalink / raw)
  To: mpe; +Cc: ravi.bangoria, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201008072726.233086-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

Add instruction encodings, DQ, D0, D1 immediate, XTP, XSP operands as
macros for new VSX vector paired instructions,
  * Load VSX Vector Paired (lxvp)
  * Load VSX Vector Paired Indexed (lxvpx)
  * Prefixed Load VSX Vector Paired (plxvp)
  * Store VSX Vector Paired (stxvp)
  * Store VSX Vector Paired Indexed (stxvpx)
  * Prefixed Store VSX Vector Paired (pstxvp)

Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/include/asm/ppc-opcode.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index a6e3700c4566..5e7918ca4fb7 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -78,6 +78,9 @@
 
 #define IMM_L(i)               ((uintptr_t)(i) & 0xffff)
 #define IMM_DS(i)              ((uintptr_t)(i) & 0xfffc)
+#define IMM_DQ(i)              ((uintptr_t)(i) & 0xfff0)
+#define IMM_D0(i)              (((uintptr_t)(i) >> 16) & 0x3ffff)
+#define IMM_D1(i)              IMM_L(i)
 
 /*
  * 16-bit immediate helper macros: HA() is for use with sign-extending instrs
@@ -295,6 +298,8 @@
 #define __PPC_XB(b)	((((b) & 0x1f) << 11) | (((b) & 0x20) >> 4))
 #define __PPC_XS(s)	((((s) & 0x1f) << 21) | (((s) & 0x20) >> 5))
 #define __PPC_XT(s)	__PPC_XS(s)
+#define __PPC_XSP(s)	((((s) & 0x1e) | (((s) >> 5) & 0x1)) << 21)
+#define __PPC_XTP(s)	__PPC_XSP(s)
 #define __PPC_T_TLB(t)	(((t) & 0x3) << 21)
 #define __PPC_WC(w)	(((w) & 0x3) << 21)
 #define __PPC_WS(w)	(((w) & 0x1f) << 11)
@@ -395,6 +400,14 @@
 #define PPC_RAW_XVCPSGNDP(t, a, b)	((0xf0000780 | VSX_XX3((t), (a), (b))))
 #define PPC_RAW_VPERMXOR(vrt, vra, vrb, vrc) \
 	((0x1000002d | ___PPC_RT(vrt) | ___PPC_RA(vra) | ___PPC_RB(vrb) | (((vrc) & 0x1f) << 6)))
+#define PPC_RAW_LXVP(xtp, a, i)		(0x18000000 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_STXVP(xsp, a, i)	(0x18000001 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_DQ(i))
+#define PPC_RAW_LXVPX(xtp, a, b)	(0x7c00029a | __PPC_XTP(xtp) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_STXVPX(xsp, a, b)	(0x7c00039a | __PPC_XSP(xsp) | ___PPC_RA(a) | ___PPC_RB(b))
+#define PPC_RAW_PLXVP(xtp, i, a, pr) \
+	((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xe8000000 | __PPC_XTP(xtp) | ___PPC_RA(a) | IMM_D1(i)))
+#define PPC_RAW_PSTXVP(xsp, i, a, pr) \
+	((PPC_PREFIX_8LS | __PPC_PRFX_R(pr) | IMM_D0(i)) << 32 | (0xf8000000 | __PPC_XSP(xsp) | ___PPC_RA(a) | IMM_D1(i)))
 #define PPC_RAW_NAP			(0x4c000364)
 #define PPC_RAW_SLEEP			(0x4c0003a4)
 #define PPC_RAW_WINKLE			(0x4c0003e4)
-- 
2.26.2


^ permalink raw reply related

* [PATCH v4 4/4] powerpc/sstep: Add testcases for VSX vector paired load/store instructions
From: Ravi Bangoria @ 2020-10-08  7:27 UTC (permalink / raw)
  To: mpe; +Cc: ravi.bangoria, bala24, paulus, sandipan, naveen.n.rao,
	linuxppc-dev
In-Reply-To: <20201008072726.233086-1-ravi.bangoria@linux.ibm.com>

From: Balamuruhan S <bala24@linux.ibm.com>

Add testcases for VSX vector paired load/store instructions.
Sample o/p:

  emulate_step_test: lxvp           : PASS
  emulate_step_test: stxvp          : PASS
  emulate_step_test: lxvpx          : PASS
  emulate_step_test: stxvpx         : PASS
  emulate_step_test: plxvp          : PASS
  emulate_step_test: pstxvp         : PASS

Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
---
 arch/powerpc/lib/test_emulate_step.c | 270 +++++++++++++++++++++++++++
 1 file changed, 270 insertions(+)

diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 0a201b771477..783d1b85ecfe 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -612,6 +612,273 @@ static void __init test_lxvd2x_stxvd2x(void)
 }
 #endif /* CONFIG_VSX */
 
+#ifdef CONFIG_VSX
+static void __init test_lxvp_stxvp(void)
+{
+	struct pt_regs regs;
+	union {
+		vector128 a;
+		u32 b[4];
+	} c[2];
+	u32 cached_b[8];
+	int stepped = -1;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+		show_result("lxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		show_result("stxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		return;
+	}
+
+	init_pt_regs(&regs);
+
+	/*** lxvp ***/
+
+	cached_b[0] = c[0].b[0] = 18233;
+	cached_b[1] = c[0].b[1] = 34863571;
+	cached_b[2] = c[0].b[2] = 834;
+	cached_b[3] = c[0].b[3] = 6138911;
+	cached_b[4] = c[1].b[0] = 1234;
+	cached_b[5] = c[1].b[1] = 5678;
+	cached_b[6] = c[1].b[2] = 91011;
+	cached_b[7] = c[1].b[3] = 121314;
+
+	regs.gpr[4] = (unsigned long)&c[0].a;
+
+	/*
+	 * lxvp XTp,DQ(RA)
+	 * XTp = 32xTX + 2xTp
+	 * let TX=1 Tp=1 RA=4 DQ=0
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_LXVP(34, 4, 0)));
+
+	if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("lxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("lxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("lxvp", "FAIL");
+	}
+
+	/*** stxvp ***/
+
+	c[0].b[0] = 21379463;
+	c[0].b[1] = 87;
+	c[0].b[2] = 374234;
+	c[0].b[3] = 4;
+	c[1].b[0] = 90;
+	c[1].b[1] = 122;
+	c[1].b[2] = 555;
+	c[1].b[3] = 32144;
+
+	/*
+	 * stxvp XSp,DQ(RA)
+	 * XSp = 32xSX + 2xSp
+	 * let SX=1 Sp=1 RA=4 DQ=0
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_STXVP(34, 4, 0)));
+
+	if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] &&
+	    cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+	    cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+	    cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+	    cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("stxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("stxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("stxvp", "FAIL");
+	}
+}
+#else
+static void __init test_lxvp_stxvp(void)
+{
+	show_result("lxvp", "SKIP (CONFIG_VSX is not set)");
+	show_result("stxvp", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
+#ifdef CONFIG_VSX
+static void __init test_lxvpx_stxvpx(void)
+{
+	struct pt_regs regs;
+	union {
+		vector128 a;
+		u32 b[4];
+	} c[2];
+	u32 cached_b[8];
+	int stepped = -1;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+		show_result("lxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+		show_result("stxvpx", "SKIP (!CPU_FTR_ARCH_31)");
+		return;
+	}
+
+	init_pt_regs(&regs);
+
+	/*** lxvpx ***/
+
+	cached_b[0] = c[0].b[0] = 18233;
+	cached_b[1] = c[0].b[1] = 34863571;
+	cached_b[2] = c[0].b[2] = 834;
+	cached_b[3] = c[0].b[3] = 6138911;
+	cached_b[4] = c[1].b[0] = 1234;
+	cached_b[5] = c[1].b[1] = 5678;
+	cached_b[6] = c[1].b[2] = 91011;
+	cached_b[7] = c[1].b[3] = 121314;
+
+	regs.gpr[3] = (unsigned long)&c[0].a;
+	regs.gpr[4] = 0;
+
+	/*
+	 * lxvpx XTp,RA,RB
+	 * XTp = 32xTX + 2xTp
+	 * let TX=1 Tp=1 RA=3 RB=4
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_LXVPX(34, 3, 4)));
+
+	if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("lxvpx", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("lxvpx", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("lxvpx", "FAIL");
+	}
+
+	/*** stxvpx ***/
+
+	c[0].b[0] = 21379463;
+	c[0].b[1] = 87;
+	c[0].b[2] = 374234;
+	c[0].b[3] = 4;
+	c[1].b[0] = 90;
+	c[1].b[1] = 122;
+	c[1].b[2] = 555;
+	c[1].b[3] = 32144;
+
+	/*
+	 * stxvpx XSp,RA,RB
+	 * XSp = 32xSX + 2xSp
+	 * let SX=1 Sp=1 RA=3 RB=4
+	 */
+	stepped = emulate_step(&regs, ppc_inst(PPC_RAW_STXVPX(34, 3, 4)));
+
+	if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] &&
+	    cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+	    cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+	    cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+	    cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("stxvpx", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("stxvpx", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("stxvpx", "FAIL");
+	}
+}
+#else
+static void __init test_lxvpx_stxvpx(void)
+{
+	show_result("lxvpx", "SKIP (CONFIG_VSX is not set)");
+	show_result("stxvpx", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
+#ifdef CONFIG_VSX
+static void __init test_plxvp_pstxvp(void)
+{
+	struct ppc_inst instr;
+	struct pt_regs regs;
+	union {
+		vector128 a;
+		u32 b[4];
+	} c[2];
+	u32 cached_b[8];
+	int stepped = -1;
+
+	if (!cpu_has_feature(CPU_FTR_ARCH_31)) {
+		show_result("plxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		show_result("pstxvp", "SKIP (!CPU_FTR_ARCH_31)");
+		return;
+	}
+
+	/*** plxvp ***/
+
+	cached_b[0] = c[0].b[0] = 18233;
+	cached_b[1] = c[0].b[1] = 34863571;
+	cached_b[2] = c[0].b[2] = 834;
+	cached_b[3] = c[0].b[3] = 6138911;
+	cached_b[4] = c[1].b[0] = 1234;
+	cached_b[5] = c[1].b[1] = 5678;
+	cached_b[6] = c[1].b[2] = 91011;
+	cached_b[7] = c[1].b[3] = 121314;
+
+	init_pt_regs(&regs);
+	regs.gpr[3] = (unsigned long)&c[0].a;
+
+	/*
+	 * plxvp XTp,D(RA),R
+	 * XTp = 32xTX + 2xTp
+	 * let RA=3 R=0 D=d0||d1=0 R=0 Tp=1 TX=1
+	 */
+	instr = ppc_inst_prefix(PPC_RAW_PLXVP(34, 0, 3, 0) >> 32,
+			PPC_RAW_PLXVP(34, 0, 3, 0) & 0xffffffff);
+
+	stepped = emulate_step(&regs, instr);
+	if (stepped == 1 && cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("plxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("plxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("plxvp", "FAIL");
+	}
+
+	/*** pstxvp ***/
+
+	c[0].b[0] = 21379463;
+	c[0].b[1] = 87;
+	c[0].b[2] = 374234;
+	c[0].b[3] = 4;
+	c[1].b[0] = 90;
+	c[1].b[1] = 122;
+	c[1].b[2] = 555;
+	c[1].b[3] = 32144;
+
+	/*
+	 * pstxvp XSp,D(RA),R
+	 * XSp = 32xSX + 2xSp
+	 * let RA=3 D=d0||d1=0 R=0 Sp=1 SX=1
+	 */
+	instr = ppc_inst_prefix(PPC_RAW_PSTXVP(34, 0, 3, 0) >> 32,
+			PPC_RAW_PSTXVP(34, 0, 3, 0) & 0xffffffff);
+
+	stepped = emulate_step(&regs, instr);
+
+	if (stepped == 1 && cached_b[0] == c[0].b[0] && cached_b[1] == c[0].b[1] &&
+	    cached_b[2] == c[0].b[2] && cached_b[3] == c[0].b[3] &&
+	    cached_b[4] == c[1].b[0] && cached_b[5] == c[1].b[1] &&
+	    cached_b[6] == c[1].b[2] && cached_b[7] == c[1].b[3] &&
+	    cpu_has_feature(CPU_FTR_VSX)) {
+		show_result("pstxvp", "PASS");
+	} else {
+		if (!cpu_has_feature(CPU_FTR_VSX))
+			show_result("pstxvp", "PASS (!CPU_FTR_VSX)");
+		else
+			show_result("pstxvp", "FAIL");
+	}
+}
+#else
+static void __init test_plxvp_pstxvp(void)
+{
+	show_result("plxvp", "SKIP (CONFIG_VSX is not set)");
+	show_result("pstxvp", "SKIP (CONFIG_VSX is not set)");
+}
+#endif /* CONFIG_VSX */
+
 static void __init run_tests_load_store(void)
 {
 	test_ld();
@@ -628,6 +895,9 @@ static void __init run_tests_load_store(void)
 	test_plfd_pstfd();
 	test_lvx_stvx();
 	test_lxvd2x_stxvd2x();
+	test_lxvp_stxvp();
+	test_lxvpx_stxvpx();
+	test_plxvp_pstxvp();
 }
 
 struct compute_test {
-- 
2.26.2


^ permalink raw reply related

* [PATCH] mm: Avoid using set_pte_at when updating a present pte
From: Aneesh Kumar K.V @ 2020-10-08  9:25 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: Michal Hocko, Jan Kara, Jason Gunthorpe, Aneesh Kumar K.V,
	Hugh Dickins, linux-kernel, Peter Xu, linux-mm, npiggin,
	John Hubbard, Kirill Shutemov, Andrew Morton, Linus Torvalds

This avoids the below warning

WARNING: CPU: 0 PID: 30613 at arch/powerpc/mm/pgtable.c:185 set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 30613 Comm: syz-executor.0 Not tainted 5.9.0-rc8-syzkaller-00156-gc85fb28b6f99 #0
Call Trace:
 [c0000000001cd1f0] panic+0x29c/0x75c kernel/panic.c:231
 [c0000000001cce24] __warn+0x104/0x1b8 kernel/panic.c:600
 [c000000000d829e4] report_bug+0x1d4/0x380 lib/bug.c:198
 [c000000000036800] program_check_exception+0x4e0/0x750 arch/powerpc/kernel/traps.c:1508
 [c0000000000098a8] program_check_common_virt+0x308/0x360
--- interrupt: 700 at set_pte_at+0x2a8/0x3a0 arch/powerpc/mm/pgtable.c:185
    LR = set_pte_at+0x2a4/0x3a0 arch/powerpc/mm/pgtable.c:185
 [c0000000005d2a7c] copy_present_page mm/memory.c:857 [inline]
 [c0000000005d2a7c] copy_present_pte mm/memory.c:899 [inline]
 [c0000000005d2a7c] copy_pte_range mm/memory.c:1014 [inline]
 [c0000000005d2a7c] copy_pmd_range mm/memory.c:1092 [inline]
 [c0000000005d2a7c] copy_pud_range mm/memory.c:1127 [inline]
 [c0000000005d2a7c] copy_p4d_range mm/memory.c:1150 [inline]
 [c0000000005d2a7c] copy_page_range+0x1f6c/0x2cc0 mm/memory.c:1212
 [c0000000001c63cc] dup_mmap kernel/fork.c:592 [inline]
 [c0000000001c63cc] dup_mm+0x77c/0xab0 kernel/fork.c:1355
 [c0000000001c8f70] copy_mm kernel/fork.c:1411 [inline]
 [c0000000001c8f70] copy_process+0x1f00/0x2740 kernel/fork.c:2070
 [c0000000001c9b54] _do_fork+0xc4/0x10b0 kernel/fork.c:2429
 [c0000000001caf54] __do_sys_clone3+0x1d4/0x2b0 kernel/fork.c:27

Architecture like ppc64 expects set_pte_at to be not used for updating a
valid pte. This is further explained in
commit 56eecdb912b5 ("mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit")

Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index fcfc4ca36eba..bfe202ef6244 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -854,7 +854,7 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	 * source pte back to being writable.
 	 */
 	if (pte_write(pte))
-		set_pte_at(src_mm, addr, src_pte, pte);
+		ptep_set_access_flags(vma, addr, src_pte, pte, 1);
 
 	new_page = *prealloc;
 	if (!new_page)
-- 
2.26.2


^ permalink raw reply related

* [RFC PATCH] mm: Fetch the dirty bit before we reset the pte
From: Aneesh Kumar K.V @ 2020-10-08  9:26 UTC (permalink / raw)
  To: linuxppc-dev, mpe
  Cc: Michal Hocko, Jan Kara, Jason Gunthorpe, Aneesh Kumar K.V,
	Hugh Dickins, linux-kernel, Peter Xu, linux-mm, npiggin,
	John Hubbard, Kirill Shutemov, Andrew Morton, Linus Torvalds

In copy_present_page, after we mark the pte non-writable, we should
check for previous dirty bit updates and make sure we don't lose the dirty
bit on reset.

Also, avoid marking the pte write-protected again if copy_present_page
already marked it write-protected.

Cc: Peter Xu <peterx@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Kirill Shutemov <kirill@shutemov.name>
Cc: Hugh Dickins <hughd@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 mm/memory.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index bfe202ef6244..f57b1f04d50a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -848,6 +848,9 @@ copy_present_page(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (likely(!page_maybe_dma_pinned(page)))
 		return 1;
 
+	if (pte_dirty(*src_pte))
+		pte = pte_mkdirty(pte);
+
 	/*
 	 * Uhhuh. It looks like the page might be a pinned page,
 	 * and we actually need to copy it. Now we can set the
@@ -904,6 +907,11 @@ copy_present_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		if (retval <= 0)
 			return retval;
 
+		/*
+		 * Fetch the src pte value again, copy_present_page
+		 * could modify it.
+		 */
+		pte = *src_pte;
 		get_page(page);
 		page_dup_rmap(page, false);
 		rss[mm_counter(page)]++;
-- 
2.26.2


^ permalink raw reply related

* [PATCH] crypto: talitos - Endianess in current_desc_hdr()
From: Christophe Leroy @ 2020-10-08  9:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller; +Cc: linuxppc-dev, linux-crypto, linux-kernel

current_desc_hdr() compares the value of the current descriptor
with the next_desc member of the talitos_desc struct.

While the current descriptor is obtained from in_be32() which
return CPU ordered bytes, next_desc member is in big endian order.

Convert the current descriptor into big endian before comparing it
with next_desc.

This fixes a sparse warning.

Fixes: 37b5e8897eb5 ("crypto: talitos - chain in buffered data for ahash on SEC1")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 drivers/crypto/talitos.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index f9f0d34d49f3..992d58a4dbf1 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -478,7 +478,7 @@ static __be32 current_desc_hdr(struct device *dev, int ch)

 	iter = tail;
 	while (priv->chan[ch].fifo[iter].dma_desc != cur_desc &&
-	       priv->chan[ch].fifo[iter].desc->next_desc != cur_desc) {
+	       priv->chan[ch].fifo[iter].desc->next_desc != cpu_to_be32(cur_desc)) {
 		iter = (iter + 1) & (priv->fifo_len - 1);
 		if (iter == tail) {
 			dev_err(dev, "couldn't locate current descriptor\n");
@@ -486,7 +486,7 @@ static __be32 current_desc_hdr(struct device *dev, int ch)
 		}
 	}

-	if (priv->chan[ch].fifo[iter].desc->next_desc == cur_desc) {
+	if (priv->chan[ch].fifo[iter].desc->next_desc == cpu_to_be32(cur_desc)) {
 		struct talitos_edesc *edesc;

 		edesc = container_of(priv->chan[ch].fifo[iter].desc,
-- 
2.25.0

^ permalink raw reply related

* [PATCH] crypto: talitos - Fix return type of current_desc_hdr()
From: Christophe Leroy @ 2020-10-08  9:34 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller; +Cc: linuxppc-dev, linux-crypto, linux-kernel

current_desc_hdr() returns a u32 but in fact this is a __be32,
leading to a lot of sparse warnings.

Change the return type to __be32 and ensure it is handled as
sure by the caller.

Fixes: 3e721aeb3df3 ("crypto: talitos - handle descriptor not found in error path")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 drivers/crypto/talitos.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 7c547352a862..f9f0d34d49f3 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -460,7 +460,7 @@ DEF_TALITOS2_DONE(ch1_3, TALITOS2_ISR_CH_1_3_DONE)
 /*
  * locate current (offending) descriptor
  */
-static u32 current_desc_hdr(struct device *dev, int ch)
+static __be32 current_desc_hdr(struct device *dev, int ch)
 {
 	struct talitos_private *priv = dev_get_drvdata(dev);
 	int tail, iter;
@@ -501,13 +501,13 @@ static u32 current_desc_hdr(struct device *dev, int ch)
 /*
  * user diagnostics; report root cause of error based on execution unit status
  */
-static void report_eu_error(struct device *dev, int ch, u32 desc_hdr)
+static void report_eu_error(struct device *dev, int ch, __be32 desc_hdr)
 {
 	struct talitos_private *priv = dev_get_drvdata(dev);
 	int i;
 
 	if (!desc_hdr)
-		desc_hdr = in_be32(priv->chan[ch].reg + TALITOS_DESCBUF);
+		desc_hdr = cpu_to_be32(in_be32(priv->chan[ch].reg + TALITOS_DESCBUF));
 
 	switch (desc_hdr & DESC_HDR_SEL0_MASK) {
 	case DESC_HDR_SEL0_AFEU:
-- 
2.25.0


^ permalink raw reply related

* mm: Question about the use of 'accessed' flags and pte_young() helper
From: Christophe Leroy @ 2020-10-08  9:49 UTC (permalink / raw)
  To: linux-mm, linux-kernel@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, Aneesh Kumar K.V, Nicholas Piggin

In a 10 years old commit 
(https://github.com/linuxppc/linux/commit/d069cb4373fe0d451357c4d3769623a7564dfa9f), powerpc 8xx has 
made the handling of PTE accessed bit conditional to CONFIG_SWAP.
Since then, this has been extended to some other powerpc variants.

That commit means that when CONFIG_SWAP is not selected, the accessed bit is not set by SW TLB miss 
handlers, leading to pte_young() returning garbage, or should I say possibly returning false 
allthough a page has been accessed since its access flag was reset.

Looking at various mm/ places, pte_young() is used independent of CONFIG_SWAP

Is it still valid the not manage accessed flags when CONFIG_SWAP is not selected ?
If yes, should pte_young() always return true in that case ?

While we are at it, I'm wondering whether powerpc should redefine arch_faults_on_old_pte()
On some variants of powerpc, accessed flag is managed by HW. On others, it is managed by SW TLB miss 
handlers via page fault handling.

Thanks
Christophe

^ permalink raw reply

* Re: [PATCH] crypto: talitos - Fix sparse warnings
From: Christophe Leroy @ 2020-10-08  9:50 UTC (permalink / raw)
  To: Herbert Xu
  Cc: linuxppc-dev, Kim Phillips, Linux Crypto Mailing List,
	Horia Geantă
In-Reply-To: <20201007065048.GA25944@gondor.apana.org.au>



Le 07/10/2020 à 08:50, Herbert Xu a écrit :
> On Sat, Oct 03, 2020 at 07:15:53PM +0200, Christophe Leroy wrote:
>>
>> The following changes fix the sparse warnings with less churn:
> 
> Yes that works too.  Can you please submit this patch?
> 

This fixed two independant commits from the past. I sent out two fix patches.

Christophe

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace
From: Michael Ellerman @ 2020-10-08 10:06 UTC (permalink / raw)
  To: Andrew Donnellan, Sasha Levin, linuxppc-dev
  Cc: nathanl, leobras.c, stable, dja
In-Reply-To: <421cba41-20bf-f874-c81a-8b7f9944c845@linux.ibm.com>

Andrew Donnellan <ajd@linux.ibm.com> writes:
> On 26/8/20 11:53 pm, Sasha Levin wrote:
>> How should we proceed with this patch?
>
> mpe: I believe we came to the conclusion that we shouldn't put this in 
> stable just yet?

Yeah.

Let's give it a little time to get some wider testing before we backport
it.

cheers

^ permalink raw reply

* Re: [PATCH v2 1/2] powerpc/rtas: Restrict RTAS requests from userspace
From: Michael Ellerman @ 2020-10-08 10:07 UTC (permalink / raw)
  To: Andrew Donnellan, Sasha Levin, linuxppc-dev
  Cc: nathanl, leobras.c, stable, dja
In-Reply-To: <87v9fl117r.fsf@mpe.ellerman.id.au>

Michael Ellerman <mpe@ellerman.id.au> writes:
> Andrew Donnellan <ajd@linux.ibm.com> writes:
>> On 26/8/20 11:53 pm, Sasha Levin wrote:
>>> How should we proceed with this patch?
>>
>> mpe: I believe we came to the conclusion that we shouldn't put this in 
>> stable just yet?
>
> Yeah.
>
> Let's give it a little time to get some wider testing before we backport
> it.

So my fault for not dropping the Cc: stable on the commit, sorry.

cheers

^ permalink raw reply

* Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Catalin Marinas @ 2020-10-08 10:12 UTC (permalink / raw)
  To: Jann Horn
  Cc: linuxppc-dev, linux-kernel, Christoph Hellwig, linux-mm,
	Khalid Aziz, Paul Mackerras, sparclinux, Anthony Yznaga,
	Andrew Morton, Will Deacon, David S. Miller, linux-arm-kernel
In-Reply-To: <20201007073932.865218-1-jannh@google.com>

On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
> arch_validate_prot() is a hook that can validate whether a given set of
> protection flags is valid in an mprotect() operation. It is given the set
> of protection flags and the address being modified.
> 
> However, the address being modified can currently not actually be used in
> a meaningful way because:
> 
> 1. Only the address is given, but not the length, and the operation can
>    span multiple VMAs. Therefore, the callee can't actually tell which
>    virtual address range, or which VMAs, are being targeted.
> 2. The mmap_lock is not held, meaning that if the callee were to check
>    the VMA at @addr, that VMA would be unrelated to the one the
>    operation is performed on.
> 
> Currently, custom arch_validate_prot() handlers are defined by
> arm64, powerpc and sparc.
> arm64 and powerpc don't care about the address range, they just check the
> flags against CPU support masks.
> sparc's arch_validate_prot() attempts to look at the VMA, but doesn't take
> the mmap_lock.
> 
> Change the function signature to also take a length, and move the
> arch_validate_prot() call in mm/mprotect.c down into the locked region.

For arm64 mte, I noticed the arch_validate_prot() issue with multiple
vmas and addressed this in a different way:

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=c462ac288f2c97e0c1d9ff6a65955317e799f958
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=0042090548740921951f31fc0c20dcdb96638cb0

Both patches queued for 5.10.

Basically, arch_calc_vm_prot_bits() returns a VM_MTE if PROT_MTE has
been requested. The newly introduced arch_validate_flags() will check
the VM_MTE flag against what the system supports and this covers both
mmap() and mprotect(). Note that arch_validate_prot() only handles the
latter and I don't think it's sufficient for SPARC ADI. For arm64 MTE we
definitely wanted mmap() flags to be validated.

In addition, there's a new arch_calc_vm_flag_bits() which allows us to
set a VM_MTE_ALLOWED on a vma if the conditions are right (MAP_ANONYMOUS
or shmem_mmap():

https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/commit/?h=for-next/mte&id=b3fbbea4c00220f62e6f7e2514466e6ee81f7f60

-- 
Catalin

^ permalink raw reply

* Re: [PATCH 1/2] mm/mprotect: Call arch_validate_prot under mmap_lock and with length
From: Michael Ellerman @ 2020-10-08 10:34 UTC (permalink / raw)
  To: Jann Horn, Christoph Hellwig, Benjamin Herrenschmidt,
	Paul Mackerras, linuxppc-dev
  Cc: Dave Kleikamp, Catalin Marinas, kernel list, Linux-MM,
	Khalid Aziz, sparclinux, Anthony Yznaga, Andrew Morton,
	Will Deacon, David S. Miller, Linux ARM
In-Reply-To: <CAG48ez3kjTeVtQcjQerYYRs7sX5qq3O7SU-FEaYLNXisFmAeOg@mail.gmail.com>

Jann Horn <jannh@google.com> writes:
> On Wed, Oct 7, 2020 at 2:35 PM Christoph Hellwig <hch@infradead.org> wrote:
>> On Wed, Oct 07, 2020 at 09:39:31AM +0200, Jann Horn wrote:
>> > diff --git a/arch/powerpc/kernel/syscalls.c b/arch/powerpc/kernel/syscalls.c
>> > index 078608ec2e92..b1fabb97d138 100644
>> > --- a/arch/powerpc/kernel/syscalls.c
>> > +++ b/arch/powerpc/kernel/syscalls.c
>> > @@ -43,7 +43,7 @@ static inline long do_mmap2(unsigned long addr, size_t len,
>> >  {
>> >       long ret = -EINVAL;
>> >
>> > -     if (!arch_validate_prot(prot, addr))
>> > +     if (!arch_validate_prot(prot, addr, len))
>>
>> This call isn't under mmap lock.  I also find it rather weird as the
>> generic code only calls arch_validate_prot from mprotect, only powerpc
>> also calls it from mmap.
>>
>> This seems to go back to commit ef3d3246a0d0
>> ("powerpc/mm: Add Strong Access Ordering support")
>
> I'm _guessing_ the idea in the generic case might be that mmap()
> doesn't check unknown bits in the protection flags, and therefore
> maybe people wanted to avoid adding new error cases that could be
> caused by random high bits being set?

I suspect it's just that when we added it we updated our do_mmap2() and
didn't touch the generic version because we didn't need to. ie. it's not
intentional it's just a buglet.

I think this is the original submission:

  https://lore.kernel.org/linuxppc-dev/20080610220055.10257.84465.sendpatchset@norville.austin.ibm.com/

Which only calls arch_validate_prot() from mprotect and the powerpc
code, and there was no discussion about adding it elsewhere.

> So while the mprotect() case
> checks the flags and refuses unknown values, the mmap() code just lets
> the architecture figure out which bits are actually valid to set (via
> arch_calc_vm_prot_bits()) and silently ignores the rest?
>
> And powerpc apparently decided that they do want to error out on bogus
> prot values passed to their version of mmap(), and in exchange, assume
> in arch_calc_vm_prot_bits() that the protection bits are valid?

I don't think we really decided that, it just happened by accident and
no one noticed/complained.

Seems userspace is pretty well behaved when it comes to passing prot
values to mmap().

> powerpc's arch_validate_prot() doesn't actually need the mmap lock, so
> I think this is fine-ish for now (as in, while the code is a bit
> unclean, I don't think I'm making it worse, and I don't think it's
> actually buggy). In theory, we could move the arch_validate_prot()
> call over into the mmap guts, where we're holding the lock, and gate
> it on the architecture or on some feature CONFIG that powerpc can
> activate in its Kconfig. But I'm not sure whether that'd be helping or
> making things worse, so when I sent this patch, I deliberately left
> the powerpc stuff as-is.

I think what you've done is fine, and anything more elaborate is not
worth the effort.

cheers

^ permalink raw reply

* [PATCH 0/4] powerpc/perf: Power PMU fixes for power10 DD1
From: Athira Rajeev @ 2020-10-08 10:52 UTC (permalink / raw)
  To: mpe; +Cc: mikey, maddy, linuxppc-dev

The patch series addresses PMU fixes for power10 DD1

Patch1 introduces a new power pmu flag to include
conditional code changes for power10 DD1.
Patch2 and Patch3 includes fixes in core-book3s to address
issues with marked events during sampling.
Patch4 includes fix to drop kernel samples while
userspace profiling.

Athira Rajeev (4):
  powerpc/perf: Add new power pmu flag "PPMU_P10_DD1" for power10 DD1
  powerpc/perf: Using SIER[CMPL] instead of SIER[SIAR_VALID]
  powerpc/perf: Use the address from SIAR register to set cpumode flags
  powerpc/perf: Exclude kernel samples while counting events in user
    space.

 arch/powerpc/include/asm/perf_event_server.h |  1 +
 arch/powerpc/perf/core-book3s.c              | 35 +++++++++++++++++++++++++++-
 arch/powerpc/perf/power10-pmu.c              |  6 +++++
 3 files changed, 41 insertions(+), 1 deletion(-)

-- 
1.8.3.1


^ permalink raw reply

* [PATCH 1/4] powerpc/perf: Add new power pmu flag "PPMU_P10_DD1" for power10 DD1
From: Athira Rajeev @ 2020-10-08 10:52 UTC (permalink / raw)
  To: mpe; +Cc: mikey, maddy, linuxppc-dev
In-Reply-To: <1602154329-2092-1-git-send-email-atrajeev@linux.vnet.ibm.com>

Add a new power PMU flag "PPMU_P10_DD1" which can be
used to conditionally add any code path for power10 DD1 processor
version. Also modify power10 PMU driver code to set this
flag only for DD1, based on the Processor Version Register (PVR)
value.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/perf_event_server.h | 1 +
 arch/powerpc/perf/power10-pmu.c              | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index f6acabb..3b7baba 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -82,6 +82,7 @@ struct power_pmu {
 #define PPMU_ARCH_207S		0x00000080 /* PMC is architecture v2.07S */
 #define PPMU_NO_SIAR		0x00000100 /* Do not use SIAR */
 #define PPMU_ARCH_31		0x00000200 /* Has MMCR3, SIER2 and SIER3 */
+#define PPMU_P10_DD1		0x00000400 /* Is power10 DD1 processor version */
 
 /*
  * Values for flags to get_alternatives()
diff --git a/arch/powerpc/perf/power10-pmu.c b/arch/powerpc/perf/power10-pmu.c
index 8314865..47d930a 100644
--- a/arch/powerpc/perf/power10-pmu.c
+++ b/arch/powerpc/perf/power10-pmu.c
@@ -404,6 +404,7 @@ static void power10_config_bhrb(u64 pmu_bhrb_filter)
 
 int init_power10_pmu(void)
 {
+	unsigned int pvr;
 	int rc;
 
 	/* Comes from cpu_specs[] */
@@ -411,6 +412,11 @@ int init_power10_pmu(void)
 	    strcmp(cur_cpu_spec->oprofile_cpu_type, "ppc64/power10"))
 		return -ENODEV;
 
+	pvr = mfspr(SPRN_PVR);
+	/* Add the ppmu flag for power10 DD1 */
+	if ((PVR_CFG(pvr) == 1))
+		power10_pmu.flags |= PPMU_P10_DD1;
+
 	/* Set the PERF_REG_EXTENDED_MASK here */
 	PERF_REG_EXTENDED_MASK = PERF_REG_PMU_MASK_31;
 
-- 
1.8.3.1


^ permalink raw reply related

* [PATCH 3/4] powerpc/perf: Use the address from SIAR register to set cpumode flags
From: Athira Rajeev @ 2020-10-08 10:52 UTC (permalink / raw)
  To: mpe; +Cc: mikey, maddy, linuxppc-dev
In-Reply-To: <1602154329-2092-1-git-send-email-atrajeev@linux.vnet.ibm.com>

While setting the processor mode for any sample, `perf_get_misc_flags`
expects the privilege level to differentiate the userspace and kernel
address. On power10 DD1, there is an issue that causes [MSR_HV MSR_PR] bits
of Sampled Instruction Event Register (SIER) not to be set for marked
events. Hence add a check to use the address in Sampled Instruction Address
Register (SIAR) to identify the privilege level.

Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
 arch/powerpc/perf/core-book3s.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index d766090..c018004 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -250,11 +250,25 @@ static inline u32 perf_flags_from_msr(struct pt_regs *regs)
 static inline u32 perf_get_misc_flags(struct pt_regs *regs)
 {
 	bool use_siar = regs_use_siar(regs);
+	unsigned long mmcra = regs->dsisr;
+	int marked = mmcra & MMCRA_SAMPLE_ENABLE;
 
 	if (!use_siar)
 		return perf_flags_from_msr(regs);
 
 	/*
+	 * Check the address in SIAR to identify the
+	 * privilege levels since the SIER[MSR_HV, MSR_PR]
+	 * bits are not set for marked events in power10
+	 * DD1.
+	 */
+	if (marked && (ppmu->flags & PPMU_P10_DD1)) {
+		if (is_kernel_addr(mfspr(SPRN_SIAR)))
+			return PERF_RECORD_MISC_KERNEL;
+		return PERF_RECORD_MISC_USER;
+	}
+
+	/*
 	 * If we don't have flags in MMCRA, rather than using
 	 * the MSR, we intuit the flags from the address in
 	 * SIAR which should give slightly more reliable
-- 
1.8.3.1


^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox