LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 5/5] powerpc/pmem: Avoid the barrier in flush routines
From: Aneesh Kumar K.V @ 2020-05-13  3:47 UTC (permalink / raw)
  To: linuxppc-dev, mpe, linux-nvdimm
  Cc: alistair, dan.j.williams, oohall, Aneesh Kumar K.V
In-Reply-To: <20200513034705.172983-1-aneesh.kumar@linux.ibm.com>

nvdimm expect the flush routines to just mark the cache clean. The barrier
that mark the store globally visible is done in nvdimm_flush().

Update the papr_scm driver to a simplified nvdim_flush callback that do
only the required barrier.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/lib/pmem.c                   | 34 +++++++++++++++++------
 arch/powerpc/platforms/pseries/papr_scm.c | 13 +++++++++
 2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/lib/pmem.c b/arch/powerpc/lib/pmem.c
index 076d75efb57a..3ef15cfa925b 100644
--- a/arch/powerpc/lib/pmem.c
+++ b/arch/powerpc/lib/pmem.c
@@ -9,7 +9,7 @@
 
 #include <asm/cacheflush.h>
 
-static inline void clean_pmem_range_isa310(unsigned long start, unsigned long stop)
+static inline void __clean_pmem_range(unsigned long start, unsigned long stop)
 {
 	unsigned long shift = l1_dcache_shift();
 	unsigned long bytes = l1_dcache_bytes();
@@ -18,13 +18,22 @@ static inline void clean_pmem_range_isa310(unsigned long start, unsigned long st
 	unsigned long i;
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
-		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+		dcbf(addr);
+}
 
+static inline void __flush_pmem_range(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
 
-	asm volatile(PPC_PHWSYNC ::: "memory");
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		dcbf(addr);
 }
 
-static inline void flush_pmem_range_isa310(unsigned long start, unsigned long stop)
+static inline void clean_pmem_range_isa310(unsigned long start, unsigned long stop)
 {
 	unsigned long shift = l1_dcache_shift();
 	unsigned long bytes = l1_dcache_bytes();
@@ -33,24 +42,33 @@ static inline void flush_pmem_range_isa310(unsigned long start, unsigned long st
 	unsigned long i;
 
 	for (i = 0; i < size >> shift; i++, addr += bytes)
-		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
+		asm volatile(PPC_DCBSTPS(%0, %1): :"i"(0), "r"(addr): "memory");
+}
 
+static inline void flush_pmem_range_isa310(unsigned long start, unsigned long stop)
+{
+	unsigned long shift = l1_dcache_shift();
+	unsigned long bytes = l1_dcache_bytes();
+	void *addr = (void *)(start & ~(bytes - 1));
+	unsigned long size = stop - (unsigned long)addr + (bytes - 1);
+	unsigned long i;
 
-	asm volatile(PPC_PHWSYNC ::: "memory");
+	for (i = 0; i < size >> shift; i++, addr += bytes)
+		asm volatile(PPC_DCBFPS(%0, %1): :"i"(0), "r"(addr): "memory");
 }
 
 static inline void clean_pmem_range(unsigned long start, unsigned long stop)
 {
 	if (cpu_has_feature(CPU_FTR_ARCH_31))
 		return clean_pmem_range_isa310(start, stop);
-	return flush_dcache_range(start, stop);
+	return __clean_pmem_range(start, stop);
 }
 
 static inline void flush_pmem_range(unsigned long start, unsigned long stop)
 {
 	if (cpu_has_feature(CPU_FTR_ARCH_31))
 		return flush_pmem_range_isa310(start, stop);
-	return flush_dcache_range(start, stop);
+	return __flush_pmem_range(start, stop);
 }
 
 /*
diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c
index f35592423380..ad506e7003c9 100644
--- a/arch/powerpc/platforms/pseries/papr_scm.c
+++ b/arch/powerpc/platforms/pseries/papr_scm.c
@@ -285,6 +285,18 @@ static int papr_scm_ndctl(struct nvdimm_bus_descriptor *nd_desc,
 
 	return 0;
 }
+/*
+ * We have made sure the pmem writes are done such that before calling this
+ * all the caches are flushed/clean. We use dcbf/dcbfps to ensure this. Here
+ * we just need to add the necessary barrier to make sure the above flushes
+ * are have updated persistent storage before any data access or data transfer
+ * caused by subsequent instructions is initiated.
+ */
+static int papr_scm_flush_sync(struct nd_region *nd_region, struct bio *bio)
+{
+	arch_pmem_flush_barrier();
+	return 0;
+}
 
 static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 {
@@ -340,6 +352,7 @@ static int papr_scm_nvdimm_init(struct papr_scm_priv *p)
 	ndr_desc.mapping = &mapping;
 	ndr_desc.num_mappings = 1;
 	ndr_desc.nd_set = &p->nd_set;
+	ndr_desc.flush = papr_scm_flush_sync;
 
 	if (p->is_volatile)
 		p->region = nvdimm_volatile_region_create(p->bus, &ndr_desc);
-- 
2.26.2


^ permalink raw reply related

* Re: [PATCH v4 2/3] powerpc/numa: Prefer node id queried from vphn
From: Gautham R Shenoy @ 2020-05-13  3:58 UTC (permalink / raw)
  To: Srikar Dronamraju
  Cc: Gautham R Shenoy, Michal Hocko, David Hildenbrand, Linus Torvalds,
	linux-kernel, linux-mm, Satheesh Rajendran, Mel Gorman,
	Kirill A. Shutemov, Andrew Morton, linuxppc-dev,
	Christopher Lameter, Vlastimil Babka
In-Reply-To: <20200512132937.19295-3-srikar@linux.vnet.ibm.com>

On Tue, May 12, 2020 at 06:59:36PM +0530, Srikar Dronamraju wrote:
> Node id queried from the static device tree may not
> be correct. For example: it may always show 0 on a shared processor.
> Hence prefer the node id queried from vphn and fallback on the device tree
> based node id if vphn query fails.
> 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-mm@kvack.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
> Cc: Christopher Lameter <cl@linux.com>
> Cc: Michael Ellerman <mpe@ellerman.id.au>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
> Cc: David Hildenbrand <david@redhat.com>
> Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

Looks good to me.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>

> ---
> Changelog v2:->v3:
> - Resolved comments from Gautham.
> Link v2: https://lore.kernel.org/linuxppc-dev/20200428093836.27190-1-srikar@linux.vnet.ibm.com/t/#u
> 
> Changelog v1:->v2:
> - Rebased to v5.7-rc3
> 
>  arch/powerpc/mm/numa.c | 16 ++++++++--------
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index b3615b7..2815313 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -719,20 +719,20 @@ static int __init parse_numa_properties(void)
>  	 */
>  	for_each_present_cpu(i) {
>  		struct device_node *cpu;
> -		int nid;
> -
> -		cpu = of_get_cpu_node(i, NULL);
> -		BUG_ON(!cpu);
> -		nid = of_node_to_nid_single(cpu);
> -		of_node_put(cpu);
> +		int nid = vphn_get_nid(i);
> 
>  		/*
>  		 * Don't fall back to default_nid yet -- we will plug
>  		 * cpus into nodes once the memory scan has discovered
>  		 * the topology.
>  		 */
> -		if (nid < 0)
> -			continue;
> -		node_set_online(nid);
> +		if (nid == NUMA_NO_NODE) {
> +			cpu = of_get_cpu_node(i, NULL);
> +			BUG_ON(!cpu);
> +			nid = of_node_to_nid_single(cpu);
> +			of_node_put(cpu);
> +		}
> +
> +		if (likely(nid > 0))
> +			node_set_online(nid);
>  	}
> 
>  	get_n_mem_cells(&n_mem_addr_cells, &n_mem_size_cells);
> -- 
> 1.8.3.1
> 

^ permalink raw reply

* Re: [PATCH] powerpc/kvm: silence kmemleak false positives
From: Michael Ellerman @ 2020-05-13  4:00 UTC (permalink / raw)
  To: Catalin Marinas; +Cc: linux-kernel, kvm-ppc, Qian Cai, linuxppc-dev
In-Reply-To: <20200511112829.GB19176@gaia>

Catalin Marinas <catalin.marinas@arm.com> writes:
> On Mon, May 11, 2020 at 09:15:55PM +1000, Michael Ellerman wrote:
>> Qian Cai <cai@lca.pw> writes:
>> > kvmppc_pmd_alloc() and kvmppc_pte_alloc() allocate some memory but then
>> > pud_populate() and pmd_populate() will use __pa() to reference the newly
>> > allocated memory. The same is in xive_native_provision_pages().
>> >
>> > Since kmemleak is unable to track the physical memory resulting in false
>> > positives, silence those by using kmemleak_ignore().
>> 
>> There is kmemleak_alloc_phys(), which according to the docs can be used
>> for tracking a phys address.
>
> This won't help. While kmemleak_alloc_phys() allows passing a physical
> address, it doesn't track physical address references to this object. It
> still expects VA pointing to it, otherwise the object would be reported
> as a leak.

OK, thanks for clarifying that.

> We currently only call this from the memblock code with a min_count of
> 0, meaning it will not be reported as a leak if no references are found.
>
> We don't have this issue with page tables on other architectures since
> most of them use whole page allocations which aren't tracked by
> kmemleak. These powerpc functions use kmem_cache_alloc() which would be
> tracked automatically by kmemleak. While we could add a phys alias to
> kmemleak (another search tree), I think the easiest is as per Qian's
> patch, just ignore those objects.

Agreed.

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/kvm: silence kmemleak false positives
From: Michael Ellerman @ 2020-05-13  4:05 UTC (permalink / raw)
  To: Qian Cai; +Cc: linux-kernel, kvm-ppc, Qian Cai, catalin.marinas, linuxppc-dev
In-Reply-To: <20200509015538.3183-1-cai@lca.pw>

Qian Cai <cai@lca.pw> writes:
> kvmppc_pmd_alloc() and kvmppc_pte_alloc() allocate some memory but then
> pud_populate() and pmd_populate() will use __pa() to reference the newly
> allocated memory. The same is in xive_native_provision_pages().

Can you please split this into two patches, one for the KVM cases and
one for xive.

That way the KVM patch can go via the kvm-ppc tree, and I'll take the
xive one via powerpc.

> Since kmemleak is unable to track the physical memory resulting in false
> positives, silence those by using kmemleak_ignore().
>
> unreferenced object 0xc000201c382a1000 (size 4096):
>   comm "qemu-kvm", pid 124828, jiffies 4295733767 (age 341.250s)
>   hex dump (first 32 bytes):
>     c0 00 20 09 f4 60 03 87 c0 00 20 10 72 a0 03 87  .. ..`.... .r...
>     c0 00 20 0e 13 a0 03 87 c0 00 20 1b dc c0 03 87  .. ....... .....
>   backtrace:
>     [<000000004cc2790f>] kvmppc_create_pte+0x838/0xd20 [kvm_hv]
>     kvmppc_pmd_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:366
>     (inlined by) kvmppc_create_pte at arch/powerpc/kvm/book3s_64_mmu_radix.c:590
>     [<00000000d123c49a>] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
>     [<00000000bb549087>] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
>     [<0000000086dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
>     [<000000005ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
>     [<00000000d22162ff>] kvmppc_vcpu_run+0x34/0x48 [kvm]
>     [<00000000d6953bc4>] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
>     [<000000002543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
>     [<0000000048155cd6>] ksys_ioctl+0xd8/0x130
>     [<0000000041ffeaa7>] sys_ioctl+0x28/0x40
>     [<000000004afc4310>] system_call_exception+0x114/0x1e0
>     [<00000000fb70a873>] system_call_common+0xf0/0x278
> unreferenced object 0xc0002001f0c03900 (size 256):
>   comm "qemu-kvm", pid 124830, jiffies 4295735235 (age 326.570s)
>   hex dump (first 32 bytes):
>     c0 00 20 10 fa a0 03 87 c0 00 20 10 fa a1 03 87  .. ....... .....
>     c0 00 20 10 fa a2 03 87 c0 00 20 10 fa a3 03 87  .. ....... .....
>   backtrace:
>     [<0000000023f675b8>] kvmppc_create_pte+0x854/0xd20 [kvm_hv]
>     kvmppc_pte_alloc at arch/powerpc/kvm/book3s_64_mmu_radix.c:356
>     (inlined by) kvmppc_create_pte at arch/powerpc/kvm/book3s_64_mmu_radix.c:593
>     [<00000000d123c49a>] kvmppc_book3s_instantiate_page+0x2e0/0x8c0 [kvm_hv]
>     [<00000000bb549087>] kvmppc_book3s_radix_page_fault+0x1b4/0x2b0 [kvm_hv]
>     [<0000000086dddc0e>] kvmppc_book3s_hv_page_fault+0x214/0x12a0 [kvm_hv]
>     [<000000005ae9ccc2>] kvmppc_vcpu_run_hv+0xc5c/0x15f0 [kvm_hv]
>     [<00000000d22162ff>] kvmppc_vcpu_run+0x34/0x48 [kvm]
>     [<00000000d6953bc4>] kvm_arch_vcpu_ioctl_run+0x314/0x420 [kvm]
>     [<000000002543dd54>] kvm_vcpu_ioctl+0x33c/0x950 [kvm]
>     [<0000000048155cd6>] ksys_ioctl+0xd8/0x130
>     [<0000000041ffeaa7>] sys_ioctl+0x28/0x40
>     [<000000004afc4310>] system_call_exception+0x114/0x1e0
>     [<00000000fb70a873>] system_call_common+0xf0/0x278
> unreferenced object 0xc000201b53e90000 (size 65536):
>   comm "qemu-kvm", pid 124557, jiffies 4295650285 (age 364.370s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<00000000acc2fb77>] xive_native_alloc_vp_block+0x168/0x210
>     xive_native_provision_pages at arch/powerpc/sysdev/xive/native.c:645
>     (inlined by) xive_native_alloc_vp_block at arch/powerpc/sysdev/xive/native.c:674
>     [<000000004d5c7964>] kvmppc_xive_compute_vp_id+0x20c/0x3b0 [kvm]
>     [<0000000055317cd2>] kvmppc_xive_connect_vcpu+0xa4/0x4a0 [kvm]
>     [<0000000093dfc014>] kvm_arch_vcpu_ioctl+0x388/0x508 [kvm]
>     [<00000000d25aea0f>] kvm_vcpu_ioctl+0x15c/0x950 [kvm]
>     [<0000000048155cd6>] ksys_ioctl+0xd8/0x130
>     [<0000000041ffeaa7>] sys_ioctl+0x28/0x40
>     [<000000004afc4310>] system_call_exception+0x114/0x1e0
>     [<00000000fb70a873>] system_call_common+0xf0/0x278
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>  arch/powerpc/kvm/book3s_64_mmu_radix.c | 16 ++++++++++++++--
>  arch/powerpc/sysdev/xive/native.c      |  4 ++++
>  2 files changed, 18 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> index aa12cd4078b3..bc6c1aa3d0e9 100644
> --- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
> +++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
> @@ -353,7 +353,13 @@ static struct kmem_cache *kvm_pmd_cache;

This should probably also have an include of <linux/kmemleak.h> ?

>  static pte_t *kvmppc_pte_alloc(void)
>  {
> -	return kmem_cache_alloc(kvm_pte_cache, GFP_KERNEL);
> +	pte_t *pte;
> +
> +	pte = kmem_cache_alloc(kvm_pte_cache, GFP_KERNEL);
> +	/* pmd_populate() will only reference _pa(pte). */
> +	kmemleak_ignore(pte);
> +
> +	return pte;
>  }
>  
>  static void kvmppc_pte_free(pte_t *ptep)


cheers

^ permalink raw reply

* Re: [PATCH] powerpc/book3s64/radix/tlb: Determine hugepage flush correctly
From: Nicholas Piggin @ 2020-05-13  4:30 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe; +Cc: Bharata B Rao
In-Reply-To: <20200513030616.152288-1-aneesh.kumar@linux.ibm.com>

Excerpts from Aneesh Kumar K.V's message of May 13, 2020 1:06 pm:
> With a 64K page size flush with start and end value as below
> (start, end) = (721f680d0000, 721f680e0000) results in
> (hstart, hend) = (721f68200000, 721f68000000)
> 
> Avoid doing a __tlbie_va_range with the wrong hstart and hend value in this
> case.
> 
> __tlbie_va_range will skip the actual tlbie operation for start > end.
> But we still end up doing the tlbie fixup.

Hm, good catch.

> Reported-by: Bharata B Rao <bharata@linux.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
> ---
>  arch/powerpc/mm/book3s64/radix_tlb.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
> index 758ade2c2b6e..d592f9e1c037 100644
> --- a/arch/powerpc/mm/book3s64/radix_tlb.c
> +++ b/arch/powerpc/mm/book3s64/radix_tlb.c
> @@ -884,10 +884,10 @@ static inline void __radix__flush_tlb_range(struct mm_struct *mm,
>  		if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE)) {
>  			hstart = (start + PMD_SIZE - 1) & PMD_MASK;
>  			hend = end & PMD_MASK;
> -			if (hstart == hend)
> -				hflush = false;
> -			else
> +			if (hstart < hend)
>  				hflush = true;
> +			else
> +				hflush = false;

We can probably get rid of the else part since it is already false.

Otherwise

Reviewed-by: Nicholas Piggin <npiggin@gmail.com>


^ permalink raw reply

* Re: [PATCH v2 1/1] powerpc/crash: Use NMI context for printk when starting to crash
From: Nicholas Piggin @ 2020-05-13  4:36 UTC (permalink / raw)
  To: Alexios Zavras, Benjamin Herrenschmidt, Christophe Leroy,
	Greg Kroah-Hartman, Enrico Weigelt, Leonardo Bras,
	Michael Ellerman, Paul Mackerras, Thomas Gleixner
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20200512214533.93878-1-leobras.c@gmail.com>

Excerpts from Leonardo Bras's message of May 13, 2020 7:45 am:
> Currently, if printk lock (logbuf_lock) is held by other thread during
> crash, there is a chance of deadlocking the crash on next printk, and
> blocking a possibly desired kdump.
> 
> At the start of default_machine_crash_shutdown, make printk enter
> NMI context, as it will use per-cpu buffers to store the message,
> and avoid locking logbuf_lock.

printk_nmi_enter is used in one other place outside nmi_enter.

Is there a different/better way to handle this? What do other 
architectures do?

Other subsystems get put into an nmi-mode when we call nmi_enter
(lockdep, ftrace, rcu etc). It seems like those would be useful for 
similar reasons, so at least explaining why that is not used in a 
comment would be good.

Aside from that, I welcome any effort to make our crashes more reliable
so thanks for working on this stuff.

Thanks,
Nick

> 
> Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
> Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
> 
> ---
> Changes since v1:
> - Added in-code comment explaining the need of context change
> - Function moved to the start of default_machine_crash_shutdown,
>   to avoid locking any printk on crashing routine.
> - Title was 'Use NMI context for printk after crashing other CPUs'
> 
> ---
>  arch/powerpc/kexec/crash.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/arch/powerpc/kexec/crash.c b/arch/powerpc/kexec/crash.c
> index d488311efab1..c9a889880214 100644
> --- a/arch/powerpc/kexec/crash.c
> +++ b/arch/powerpc/kexec/crash.c
> @@ -311,6 +311,9 @@ void default_machine_crash_shutdown(struct pt_regs *regs)
>  	unsigned int i;
>  	int (*old_handler)(struct pt_regs *regs);
>  
> +	/* Avoid hardlocking with irresponsive CPU holding logbuf_lock */
> +	printk_nmi_enter();
> +
>  	/*
>  	 * This function is only called after the system
>  	 * has panicked or is otherwise in a critical state.
> -- 
> 2.25.4
> 
> 

^ permalink raw reply

* [PATCH v2 1/2] powerpc/rtas: Move type/struct definitions from rtas.h into rtas-types.h
From: Leonardo Bras @ 2020-05-13  4:40 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Allison Randal, Leonardo Bras, Greg Kroah-Hartman,
	Thomas Gleixner, Nicholas Piggin, Nathan Lynch, Gautham R. Shenoy,
	Nadav Amit
  Cc: Leonardo Bras, linuxppc-dev, linux-kernel

In order to get any rtas* struct into other headers, including rtas.h
may cause a lot of errors, regarding include dependency needed for
inline functions.

Create rtas-types.h and move there all type/struct definitions
from rtas.h, then include rtas-types.h into rtas.h.

Also, as suggested by checkpath.pl, replace uint8_t for u8, and keep
the same type pattern for the whole file, as they are the same
according to powerpc/boot/types.h.

Signed-off-by: Leonardo Bras <leobras.c@gmail.com>
---
 arch/powerpc/include/asm/rtas-types.h | 124 ++++++++++++++++++++++++++
 arch/powerpc/include/asm/rtas.h       | 118 +-----------------------
 2 files changed, 125 insertions(+), 117 deletions(-)
 create mode 100644 arch/powerpc/include/asm/rtas-types.h

diff --git a/arch/powerpc/include/asm/rtas-types.h b/arch/powerpc/include/asm/rtas-types.h
new file mode 100644
index 000000000000..59b0b4b25b7a
--- /dev/null
+++ b/arch/powerpc/include/asm/rtas-types.h
@@ -0,0 +1,124 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _POWERPC_RTAS_TYPES_H
+#define _POWERPC_RTAS_TYPES_H
+#ifdef __KERNEL__
+
+typedef __be32 rtas_arg_t;
+
+struct rtas_args {
+	__be32 token;
+	__be32 nargs;
+	__be32 nret;
+	rtas_arg_t args[16];
+	rtas_arg_t *rets;     /* Pointer to return values in args[]. */
+};
+
+struct rtas_t {
+	unsigned long entry;		/* physical address pointer */
+	unsigned long base;		/* physical address pointer */
+	unsigned long size;
+	arch_spinlock_t lock;
+	struct rtas_args args;
+	struct device_node *dev;	/* virtual address pointer */
+};
+
+struct rtas_suspend_me_data {
+	atomic_t working; /* number of cpus accessing this struct */
+	atomic_t done;
+	int token; /* ibm,suspend-me */
+	atomic_t error;
+	struct completion *complete; /* wait on this until working == 0 */
+};
+
+struct rtas_error_log {
+	/* Byte 0 */
+	u8		byte0;			/* Architectural version */
+
+	/* Byte 1 */
+	u8		byte1;
+	/* XXXXXXXX
+	 * XXX		3: Severity level of error
+	 *    XX	2: Degree of recovery
+	 *      X	1: Extended log present?
+	 *       XX	2: Reserved
+	 */
+
+	/* Byte 2 */
+	u8		byte2;
+	/* XXXXXXXX
+	 * XXXX		4: Initiator of event
+	 *     XXXX	4: Target of failed operation
+	 */
+	u8		byte3;			/* General event or error*/
+	__be32		extended_log_length;	/* length in bytes */
+	unsigned char	buffer[1];		/* Start of extended log */
+						/* Variable length.      */
+};
+
+/* RTAS general extended event log, Version 6. The extended log starts
+ * from "buffer" field of struct rtas_error_log defined above.
+ */
+struct rtas_ext_event_log_v6 {
+	/* Byte 0 */
+	u8 byte0;
+	/* XXXXXXXX
+	 * X		1: Log valid
+	 *  X		1: Unrecoverable error
+	 *   X		1: Recoverable (correctable or successfully retried)
+	 *    X		1: Bypassed unrecoverable error (degraded operation)
+	 *     X	1: Predictive error
+	 *      X	1: "New" log (always 1 for data returned from RTAS)
+	 *       X	1: Big Endian
+	 *        X	1: Reserved
+	 */
+
+	/* Byte 1 */
+	u8 byte1;			/* reserved */
+
+	/* Byte 2 */
+	u8 byte2;
+	/* XXXXXXXX
+	 * X		1: Set to 1 (indicating log is in PowerPC format)
+	 *  XXX		3: Reserved
+	 *     XXXX	4: Log format used for bytes 12-2047
+	 */
+
+	/* Byte 3 */
+	u8 byte3;			/* reserved */
+	/* Byte 4-11 */
+	u8 reserved[8];			/* reserved */
+	/* Byte 12-15 */
+	__be32  company_id;		/* Company ID of the company	*/
+					/* that defines the format for	*/
+					/* the vendor specific log type	*/
+	/* Byte 16-end of log */
+	u8 vendor_log[1];		/* Start of vendor specific log	*/
+					/* Variable length.		*/
+};
+
+/* Vendor specific Platform Event Log Format, Version 6, section header */
+struct pseries_errorlog {
+	__be16 id;			/* 0x00 2-byte ASCII section ID	*/
+	__be16 length;			/* 0x02 Section length in bytes	*/
+	u8 version;			/* 0x04 Section version		*/
+	u8 subtype;			/* 0x05 Section subtype		*/
+	__be16 creator_component;	/* 0x06 Creator component ID	*/
+	u8 data[];			/* 0x08 Start of section data	*/
+};
+
+/* RTAS pseries hotplug errorlog section */
+struct pseries_hp_errorlog {
+	u8	resource;
+	u8	action;
+	u8	id_type;
+	u8	reserved;
+	union {
+		__be32	drc_index;
+		__be32	drc_count;
+		struct { __be32 count, index; } ic;
+		char	drc_name[1];
+	} _drc_u;
+};
+
+#endif /* __KERNEL__ */
+#endif /* _POWERPC_RTAS_TYPES_H */
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index 3c1887351c71..c35c5350b7e4 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -5,6 +5,7 @@
 
 #include <linux/spinlock.h>
 #include <asm/page.h>
+#include <asm/rtas-types.h>
 #include <linux/time.h>
 #include <linux/cpumask.h>
 
@@ -42,33 +43,6 @@
  *
  */
 
-typedef __be32 rtas_arg_t;
-
-struct rtas_args {
-	__be32 token;
-	__be32 nargs;
-	__be32 nret; 
-	rtas_arg_t args[16];
-	rtas_arg_t *rets;     /* Pointer to return values in args[]. */
-};  
-
-struct rtas_t {
-	unsigned long entry;		/* physical address pointer */
-	unsigned long base;		/* physical address pointer */
-	unsigned long size;
-	arch_spinlock_t lock;
-	struct rtas_args args;
-	struct device_node *dev;	/* virtual address pointer */
-};
-
-struct rtas_suspend_me_data {
-	atomic_t working; /* number of cpus accessing this struct */
-	atomic_t done;
-	int token; /* ibm,suspend-me */
-	atomic_t error;
-	struct completion *complete; /* wait on this until working == 0 */
-};
-
 /* RTAS event classes */
 #define RTAS_INTERNAL_ERROR		0x80000000 /* set bit 0 */
 #define RTAS_EPOW_WARNING		0x40000000 /* set bit 1 */
@@ -148,31 +122,6 @@ struct rtas_suspend_me_data {
 /* RTAS check-exception vector offset */
 #define RTAS_VECTOR_EXTERNAL_INTERRUPT	0x500
 
-struct rtas_error_log {
-	/* Byte 0 */
-	uint8_t		byte0;			/* Architectural version */
-
-	/* Byte 1 */
-	uint8_t		byte1;
-	/* XXXXXXXX
-	 * XXX		3: Severity level of error
-	 *    XX	2: Degree of recovery
-	 *      X	1: Extended log present?
-	 *       XX	2: Reserved
-	 */
-
-	/* Byte 2 */
-	uint8_t		byte2;
-	/* XXXXXXXX
-	 * XXXX		4: Initiator of event
-	 *     XXXX	4: Target of failed operation
-	 */
-	uint8_t		byte3;			/* General event or error*/
-	__be32		extended_log_length;	/* length in bytes */
-	unsigned char	buffer[1];		/* Start of extended log */
-						/* Variable length.      */
-};
-
 static inline uint8_t rtas_error_severity(const struct rtas_error_log *elog)
 {
 	return (elog->byte1 & 0xE0) >> 5;
@@ -212,47 +161,6 @@ uint32_t rtas_error_extended_log_length(const struct rtas_error_log *elog)
 
 #define RTAS_V6EXT_COMPANY_ID_IBM	(('I' << 24) | ('B' << 16) | ('M' << 8))
 
-/* RTAS general extended event log, Version 6. The extended log starts
- * from "buffer" field of struct rtas_error_log defined above.
- */
-struct rtas_ext_event_log_v6 {
-	/* Byte 0 */
-	uint8_t byte0;
-	/* XXXXXXXX
-	 * X		1: Log valid
-	 *  X		1: Unrecoverable error
-	 *   X		1: Recoverable (correctable or successfully retried)
-	 *    X		1: Bypassed unrecoverable error (degraded operation)
-	 *     X	1: Predictive error
-	 *      X	1: "New" log (always 1 for data returned from RTAS)
-	 *       X	1: Big Endian
-	 *        X	1: Reserved
-	 */
-
-	/* Byte 1 */
-	uint8_t byte1;			/* reserved */
-
-	/* Byte 2 */
-	uint8_t byte2;
-	/* XXXXXXXX
-	 * X		1: Set to 1 (indicating log is in PowerPC format)
-	 *  XXX		3: Reserved
-	 *     XXXX	4: Log format used for bytes 12-2047
-	 */
-
-	/* Byte 3 */
-	uint8_t byte3;			/* reserved */
-	/* Byte 4-11 */
-	uint8_t reserved[8];		/* reserved */
-	/* Byte 12-15 */
-	__be32  company_id;		/* Company ID of the company	*/
-					/* that defines the format for	*/
-					/* the vendor specific log type	*/
-	/* Byte 16-end of log */
-	uint8_t vendor_log[1];		/* Start of vendor specific log	*/
-					/* Variable length.		*/
-};
-
 static
 inline uint8_t rtas_ext_event_log_format(struct rtas_ext_event_log_v6 *ext_log)
 {
@@ -287,16 +195,6 @@ inline uint32_t rtas_ext_event_company_id(struct rtas_ext_event_log_v6 *ext_log)
 #define PSERIES_ELOG_SECT_ID_HOTPLUG		(('H' << 8) | 'P')
 #define PSERIES_ELOG_SECT_ID_MCE		(('M' << 8) | 'C')
 
-/* Vendor specific Platform Event Log Format, Version 6, section header */
-struct pseries_errorlog {
-	__be16 id;			/* 0x00 2-byte ASCII section ID	*/
-	__be16 length;			/* 0x02 Section length in bytes	*/
-	uint8_t version;		/* 0x04 Section version		*/
-	uint8_t subtype;		/* 0x05 Section subtype		*/
-	__be16 creator_component;	/* 0x06 Creator component ID	*/
-	uint8_t data[];			/* 0x08 Start of section data	*/
-};
-
 static
 inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
 {
@@ -309,20 +207,6 @@ inline uint16_t pseries_errorlog_length(struct pseries_errorlog *sect)
 	return be16_to_cpu(sect->length);
 }
 
-/* RTAS pseries hotplug errorlog section */
-struct pseries_hp_errorlog {
-	u8	resource;
-	u8	action;
-	u8	id_type;
-	u8	reserved;
-	union {
-		__be32	drc_index;
-		__be32	drc_count;
-		struct { __be32 count, index; } ic;
-		char	drc_name[1];
-	} _drc_u;
-};
-
 #define PSERIES_HP_ELOG_RESOURCE_CPU	1
 #define PSERIES_HP_ELOG_RESOURCE_MEM	2
 #define PSERIES_HP_ELOG_RESOURCE_SLOT	3
-- 
2.25.4


^ permalink raw reply related

* [PATCH v2 2/2] powerpc/rtas: Implement reentrant rtas call
From: Leonardo Bras @ 2020-05-13  4:40 UTC (permalink / raw)
  To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras,
	Allison Randal, Leonardo Bras, Greg Kroah-Hartman,
	Thomas Gleixner, Nicholas Piggin, Nathan Lynch, Gautham R. Shenoy,
	Nadav Amit
  Cc: Leonardo Bras, linuxppc-dev, linux-kernel
In-Reply-To: <20200513044025.105379-1-leobras.c@gmail.com>

Implement rtas_call_reentrant() for reentrant rtas-calls:
"ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".

On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
items 2 and 3 say:

2 - For the PowerPC External Interrupt option: The * call must be
reentrant to the number of processors on the platform.
3 - For the PowerPC External Interrupt option: The * argument call
buffer for each simultaneous call must be physically unique.

So, these rtas-calls can be called in a lockless way, if using
a different buffer for each call.

This can be useful to avoid deadlocks in crashing, where rtas-calls are
needed, but some other thread crashed holding the rtas.lock.

This is an example backtrace of deadlock noticed:

  #0 arch_spin_lock
  #1  lock_rtas () 
  #2  rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
  #3  ics_rtas_mask_real_irq (hw_irq=4100) 
  #4  machine_kexec_mask_interrupts
  #5  default_machine_crash_shutdown
  #6  machine_crash_shutdown 
  #7  __crash_kexec
  #8  crash_kexec
  #9  oops_end


Signed-off-by: Leonardo Bras <leobras.c@gmail.com>

---
Changes since v1:
- Moved buffer from stack to PACA (as suggested by Paul Mackerras)
- Added missing output bits
- Improve documentation following kernel-doc format (as suggested by
  Nathan Lynch)
---
 arch/powerpc/include/asm/paca.h     |  2 ++
 arch/powerpc/include/asm/rtas.h     |  1 +
 arch/powerpc/kernel/rtas.c          | 42 +++++++++++++++++++++++++++++
 arch/powerpc/sysdev/xics/ics-rtas.c | 22 +++++++--------
 4 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index e3cc9eb9204d..5a76ba50b40f 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -29,6 +29,7 @@
 #include <asm/hmi.h>
 #include <asm/cpuidle.h>
 #include <asm/atomic.h>
+#include <asm/rtas-types.h>
 
 #include <asm-generic/mmiowb_types.h>
 
@@ -270,6 +271,7 @@ struct paca_struct {
 #ifdef CONFIG_MMIOWB
 	struct mmiowb_state mmiowb_state;
 #endif
+	struct rtas_args reentrant_args;
 } ____cacheline_aligned;
 
 extern void copy_mm_to_paca(struct mm_struct *mm);
diff --git a/arch/powerpc/include/asm/rtas.h b/arch/powerpc/include/asm/rtas.h
index c35c5350b7e4..fa7509c85881 100644
--- a/arch/powerpc/include/asm/rtas.h
+++ b/arch/powerpc/include/asm/rtas.h
@@ -236,6 +236,7 @@ extern struct rtas_t rtas;
 extern int rtas_token(const char *service);
 extern int rtas_service_present(const char *service);
 extern int rtas_call(int token, int, int, int *, ...);
+int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...);
 void rtas_call_unlocked(struct rtas_args *args, int token, int nargs,
 			int nret, ...);
 extern void __noreturn rtas_restart(char *cmd);
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index c5fa251b8950..d426b5c4856c 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -41,6 +41,7 @@
 #include <asm/time.h>
 #include <asm/mmu.h>
 #include <asm/topology.h>
+#include <asm/paca.h>
 
 /* This is here deliberately so it's only used in this file */
 void enter_rtas(unsigned long);
@@ -483,6 +484,47 @@ int rtas_call(int token, int nargs, int nret, int *outputs, ...)
 }
 EXPORT_SYMBOL(rtas_call);
 
+/**
+ * rtas_call_reentrant() - Used for reentrant rtas calls
+ * @token:	Token for desired reentrant RTAS call
+ * @nargs:	Number of Input Parameters
+ * @nret:	Number of Output Parameters
+ * @outputs:	Array of outputs
+ * @...:	Inputs for desired RTAS call
+ *
+ * According to LoPAR documentation, only "ibm,int-on", "ibm,int-off",
+ * "ibm,get-xive" and "ibm,set-xive" are currently reentrant.
+ * Reentrant calls need their own rtas_args buffer, so not using rtas.args, but
+ * PACA one instead.
+ *
+ * Return:	-1 on error,
+ *		First output value of RTAS call if (nret > 0),
+ *		0 otherwise,
+ */
+
+int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
+{
+	va_list list;
+	struct rtas_args *args;
+	int i;
+
+	if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
+		return -1;
+
+	/* We use the per-cpu (PACA) rtas args buffer */
+	args = &local_paca->reentrant_args;
+
+	va_start(list, outputs);
+	va_rtas_call_unlocked(args, token, nargs, nret, list);
+	va_end(list);
+
+	if (nret > 1 && outputs)
+		for (i = 0; i < nret - 1; ++i)
+			outputs[i] = be32_to_cpu(args->rets[i + 1]);
+
+	return (nret > 0) ? be32_to_cpu(args->rets[0]) : 0;
+}
+
 /* For RTAS_BUSY (-2), delay for 1 millisecond.  For an extended busy status
  * code of 990n, perform the hinted delay of 10^n (last digit) milliseconds.
  */
diff --git a/arch/powerpc/sysdev/xics/ics-rtas.c b/arch/powerpc/sysdev/xics/ics-rtas.c
index 6aabc74688a6..4cf18000f07c 100644
--- a/arch/powerpc/sysdev/xics/ics-rtas.c
+++ b/arch/powerpc/sysdev/xics/ics-rtas.c
@@ -50,8 +50,8 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
 
 	server = xics_get_irq_server(d->irq, irq_data_get_affinity_mask(d), 0);
 
-	call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq, server,
-				DEFAULT_PRIORITY);
+	call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
+					  server, DEFAULT_PRIORITY);
 	if (call_status != 0) {
 		printk(KERN_ERR
 			"%s: ibm_set_xive irq %u server %x returned %d\n",
@@ -60,7 +60,7 @@ static void ics_rtas_unmask_irq(struct irq_data *d)
 	}
 
 	/* Now unmask the interrupt (often a no-op) */
-	call_status = rtas_call(ibm_int_on, 1, 1, NULL, hw_irq);
+	call_status = rtas_call_reentrant(ibm_int_on, 1, 1, NULL, hw_irq);
 	if (call_status != 0) {
 		printk(KERN_ERR "%s: ibm_int_on irq=%u returned %d\n",
 			__func__, hw_irq, call_status);
@@ -91,7 +91,7 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
 	if (hw_irq == XICS_IPI)
 		return;
 
-	call_status = rtas_call(ibm_int_off, 1, 1, NULL, hw_irq);
+	call_status = rtas_call_reentrant(ibm_int_off, 1, 1, NULL, hw_irq);
 	if (call_status != 0) {
 		printk(KERN_ERR "%s: ibm_int_off irq=%u returned %d\n",
 			__func__, hw_irq, call_status);
@@ -99,8 +99,8 @@ static void ics_rtas_mask_real_irq(unsigned int hw_irq)
 	}
 
 	/* Have to set XIVE to 0xff to be able to remove a slot */
-	call_status = rtas_call(ibm_set_xive, 3, 1, NULL, hw_irq,
-				xics_default_server, 0xff);
+	call_status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL, hw_irq,
+					  xics_default_server, 0xff);
 	if (call_status != 0) {
 		printk(KERN_ERR "%s: ibm_set_xive(0xff) irq=%u returned %d\n",
 			__func__, hw_irq, call_status);
@@ -131,7 +131,7 @@ static int ics_rtas_set_affinity(struct irq_data *d,
 	if (hw_irq == XICS_IPI || hw_irq == XICS_IRQ_SPURIOUS)
 		return -1;
 
-	status = rtas_call(ibm_get_xive, 1, 3, xics_status, hw_irq);
+	status = rtas_call_reentrant(ibm_get_xive, 1, 3, xics_status, hw_irq);
 
 	if (status) {
 		printk(KERN_ERR "%s: ibm,get-xive irq=%u returns %d\n",
@@ -146,8 +146,8 @@ static int ics_rtas_set_affinity(struct irq_data *d,
 		return -1;
 	}
 
-	status = rtas_call(ibm_set_xive, 3, 1, NULL,
-			   hw_irq, irq_server, xics_status[1]);
+	status = rtas_call_reentrant(ibm_set_xive, 3, 1, NULL,
+				     hw_irq, irq_server, xics_status[1]);
 
 	if (status) {
 		printk(KERN_ERR "%s: ibm,set-xive irq=%u returns %d\n",
@@ -179,7 +179,7 @@ static int ics_rtas_map(struct ics *ics, unsigned int virq)
 		return -EINVAL;
 
 	/* Check if RTAS knows about this interrupt */
-	rc = rtas_call(ibm_get_xive, 1, 3, status, hw_irq);
+	rc = rtas_call_reentrant(ibm_get_xive, 1, 3, status, hw_irq);
 	if (rc)
 		return -ENXIO;
 
@@ -198,7 +198,7 @@ static long ics_rtas_get_server(struct ics *ics, unsigned long vec)
 {
 	int rc, status[2];
 
-	rc = rtas_call(ibm_get_xive, 1, 3, status, vec);
+	rc = rtas_call_reentrant(ibm_get_xive, 1, 3, status, vec);
 	if (rc)
 		return -1;
 	return status[0];
-- 
2.25.4


^ permalink raw reply related

* Re: [PATCH v4 2/2] powerpc/64s/hash: Add stress_hpt kernel boot option to increase hash faults
From: Nicholas Piggin @ 2020-05-13  4:50 UTC (permalink / raw)
  To: linuxppc-dev, Michael Ellerman
In-Reply-To: <20200511125825.3081305-2-mpe@ellerman.id.au>

Excerpts from Michael Ellerman's message of May 11, 2020 10:58 pm:
> +void hpt_do_stress(unsigned long ea, unsigned long access,
> +		   unsigned long rflags, unsigned long hpte_group)
> +{
> +	unsigned long last_group;
> +	int cpu = raw_smp_processor_id();
> +
> +	last_group = stress_hpt_last_group[cpu];
> +	if (last_group != -1UL) {
> +		while (mmu_hash_ops.hpte_remove(last_group) != -1)
> +			;

This seems to have issues causing hangs and livelocking, particularly on 
SMP guests. I think another CPU taking a fault and inserting an HPTE
into this group can get stuck when it has its entry removed before it can
return from the fault and load its TLB, so it faults again.

The hpte_remove hypercall must be slow enough on a guest that this loop 
doesn't finish before the other CPU comes in and puts another entry in 
there. Which explains why I didn't see it on bare metal.

Removing the loop doesn't end up generating a lot of faults because most 
HPTEs are userspace, so they end up overwhelming the kernel HPTE 
removal.

Using hpte_invalidate to invalidate the specific entry might be the go,
although it removes some element of randomness at least on PowerNV -- 
the kernel TLB is still there and will fault "some time" when the TLB 
entry drops out.

Maybe hpt stress should go into the hash implementation. I'm thinking 
about what to do.

Better drop this patch for now, but the SLB one should be good to go.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH 1/1] powerpc/rtas: Implement reentrant rtas call
From: Leonardo Bras @ 2020-05-13  4:31 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Gautham R. Shenoy, Greg Kroah-Hartman, linux-kernel,
	Nicholas Piggin, Paul Mackerras, Nadav Amit, Thomas Gleixner,
	linuxppc-dev, Allison Randal
In-Reply-To: <87ftdb87jf.fsf@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3666 bytes --]

Hello Nathan, thanks for the feedback!

On Fri, 2020-04-10 at 14:28 -0500, Nathan Lynch wrote:
> Leonardo Bras <leonardo@linux.ibm.com> writes:
> > Implement rtas_call_reentrant() for reentrant rtas-calls:
> > "ibm,int-on", "ibm,int-off",ibm,get-xive" and  "ibm,set-xive".
> > 
> > On LoPAPR Version 1.1 (March 24, 2016), from 7.3.10.1 to 7.3.10.4,
> > items 2 and 3 say:
> > 
> > 2 - For the PowerPC External Interrupt option: The * call must be
> > reentrant to the number of processors on the platform.
> > 3 - For the PowerPC External Interrupt option: The * argument call
> > buffer for each simultaneous call must be physically unique.
> > 
> > So, these rtas-calls can be called in a lockless way, if using
> > a different buffer for each call.
> > 

> From the language in the spec it's clear that these calls are intended
> to be reentrant with respect to themselves, but it's less clear to me
> that they are safe to call simultaneously with respect to each other or
> arbitrary other RTAS methods.

In my viewpoint, being reentrant to themselves, without being reentrant
to others would be very difficult to do, considering the way the
rtas_call is crafted to work.

I mean, I have no experience in rtas code, it's my viewpoint. In my
thoughts there is something like this:

common_path -> selects function by token -> reentrant function
					|-> non-reentrant function

If there is one function that is reentrant, it means the common_path
and function selection by token would need to be reentrant too.

> > This can be useful to avoid deadlocks in crashing, where rtas-calls are
> > needed, but some other thread crashed holding the rtas.lock.
> 
> Are these calls commonly used in the crash-handling path? Is this
> addressing a real issue you've seen?
> 

Yes, I noticed deadlocks during crashes, like this one:
#0 arch_spin_lock
#1  lock_rtas () 
#2  rtas_call (token=8204, nargs=1, nret=1, outputs=0x0)
#3  ics_rtas_mask_real_irq (hw_irq=4100) 
#4  machine_kexec_mask_interrupts
#5  default_machine_crash_shutdown
#6  machine_crash_shutdown 
#7  __crash_kexec
#8  crash_kexec
#9  oops_end

On ics_rtas_mask_real_irq() we have both ibm_int_off and ibm_set_xive,
so it makes sense to also add ibm_int_on and ibm_get_xive as reentrant
too.

Full discussion available on this thread:
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200401000020.590447-1-leonardo@linux.ibm.com/

> 
> > +/*
> > + * Used for reentrant rtas calls.
> > + * According to LoPAR documentation, only "ibm,int-on", "ibm,int-off",
> > + * "ibm,get-xive" and "ibm,set-xive" are currently reentrant.
> > + * Reentrant calls need their own rtas_args buffer, so not using rtas.args.
> > + */
> 
> Please use kernel-doc format in new code.

Sure, v2 is going to be fixed.

> 
> 
> > +int rtas_call_reentrant(int token, int nargs, int nret, int *outputs, ...)
> > +{
> > +	va_list list;
> > +	struct rtas_args rtas_args;
> > +
> > +	if (!rtas.entry || token == RTAS_UNKNOWN_SERVICE)
> > +		return -1;
> > +
> > +	va_start(list, outputs);
> > +	va_rtas_call_unlocked(&rtas_args, token, nargs, nret, list);
> > +	va_end(list);
> 
> No, I don't think you can place the RTAS argument buffer on the stack:
> 
>   7.2.7, Software Implementation Note:
>   | The OS must be aware that the effective address range for RTAS is 4
>   | GB when instantiated in 32-bit mode and the OS should not pass RTAS
>   | addresses or blocks of data which might fall outside of this range.

Agree, moved to PACA.

I will send a v2 soon, it will be a 2-patch patchset.

Best regards,
Leonardo Bras

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply

* Re: [PATCH v2 1/1] powerpc/crash: Use NMI context for printk when starting to crash
From: Leonardo Bras @ 2020-05-13  5:16 UTC (permalink / raw)
  To: Nicholas Piggin, Alexios Zavras, Benjamin Herrenschmidt,
	Christophe Leroy, Greg Kroah-Hartman, Enrico Weigelt,
	Michael Ellerman, Paul Mackerras, Thomas Gleixner
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1589344247.2akwhmzwhg.astroid@bobo.none>

Hello Nick, thanks for your feedback.
Comments inline:

On Wed, 2020-05-13 at 14:36 +1000, Nicholas Piggin wrote:
> Excerpts from Leonardo Bras's message of May 13, 2020 7:45 am:
> > Currently, if printk lock (logbuf_lock) is held by other thread during
> > crash, there is a chance of deadlocking the crash on next printk, and
> > blocking a possibly desired kdump.
> > 
> > At the start of default_machine_crash_shutdown, make printk enter
> > NMI context, as it will use per-cpu buffers to store the message,
> > and avoid locking logbuf_lock.
> 
> printk_nmi_enter is used in one other place outside nmi_enter.
> 
> Is there a different/better way to handle this? What do other 
> architectures do?

To be honest, I was unaware of nmi_enter() and I have yet to study what
other architectures do here.

> Other subsystems get put into an nmi-mode when we call nmi_enter
> (lockdep, ftrace, rcu etc). It seems like those would be useful for 
> similar reasons, so at least explaining why that is not used in a 
> comment would be good.

My reasoning for using printk_nmi_enter() here was only to keep it from
using printk regular buffer (and locking logbuf_lock) at this point of
the crash.

I have yet to see how nmi_enter() extra functions would happen to
interfere with the crash at this point. 

(In a quick look at x86, (native_machine_crash_shutdown) I could not
see it using any printk, so it may not be necessary).

> Aside from that, I welcome any effort to make our crashes more reliable
> so thanks for working on this stuff.
> 
> Thanks,
> Nick

Thank you, it means a lot.

Leonardo Bras


^ permalink raw reply

* Re: [PATCH 03/12] mm: reorder includes after introduction of linux/pgtable.h
From: Mike Rapoport @ 2020-05-13  5:21 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Rich Felker, linux-ia64, linux-sh, Catalin Marinas,
	Heiko Carstens, linux-mips, Max Filippov, Guo Ren, linux-csky,
	sparclinux, linux-hexagon, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, linux-arch, linux-s390, linux-c6x-dev, Brian Cain,
	Helge Deller, x86, Russell King, Ley Foon Tan, Mike Rapoport,
	Ingo Molnar, Geert Uytterhoeven, linux-parisc, Mark Salter,
	Matt Turner, linux-snps-arc, linux-xtensa, Arnd Bergmann,
	linux-alpha, linux-um, linux-m68k, Tony Luck, Borislav Petkov,
	Greentime Hu, Paul Walmsley, Stafford Horne, Guan Xuetao,
	linux-arm-kernel, Chris Zankel, Michal Simek, Thomas Bogendoerfer,
	Yoshinori Sato, Nick Hu, linux-mm, Vineet Gupta, linux-kernel,
	openrisc, Thomas Gleixner, Richard Weinberger, Andrew Morton,
	linuxppc-dev, David S. Miller
In-Reply-To: <20200512192013.GY16070@bombadil.infradead.org>

On Tue, May 12, 2020 at 12:20:13PM -0700, Matthew Wilcox wrote:
> On Tue, May 12, 2020 at 09:44:13PM +0300, Mike Rapoport wrote:
> > diff --git a/arch/alpha/kernel/proto.h b/arch/alpha/kernel/proto.h
> > index a093cd45ec79..701a05090141 100644
> > --- a/arch/alpha/kernel/proto.h
> > +++ b/arch/alpha/kernel/proto.h
> > @@ -2,8 +2,6 @@
> >  #include <linux/interrupt.h>
> >  #include <linux/io.h>
> >  
> > -#include <linux/pgtable.h>
> > -
> >  /* Prototypes of functions used across modules here in this directory.  */
> >  
> >  #define vucp	volatile unsigned char  *
> 
> Looks like your script has a bug if linux/pgtable.h is the last include
> in the file?

Script indeed cannot handle all the corner case, but this is not one of
them.
I've started initially to look into removing asm/pgtable.h if it was not
needed, but I've run out of patience very soon. This file is what
sneaked in from that attempt.

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH 08/12] mm: pgtable: add shortcuts for accessing kernel PMD and PTE
From: Mike Rapoport @ 2020-05-13  5:23 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Rich Felker, linux-ia64, linux-sh, Catalin Marinas,
	Heiko Carstens, linux-mips, Max Filippov, Guo Ren, linux-csky,
	sparclinux, linux-hexagon, linux-riscv, Vincent Chen, Will Deacon,
	Greg Ungerer, linux-arch, linux-s390, linux-c6x-dev, Brian Cain,
	Helge Deller, x86, Russell King, Ley Foon Tan, Mike Rapoport,
	Ingo Molnar, Geert Uytterhoeven, linux-parisc, Mark Salter,
	Matt Turner, linux-snps-arc, linux-xtensa, Arnd Bergmann,
	linux-alpha, linux-um, linux-m68k, Tony Luck, Borislav Petkov,
	Greentime Hu, Paul Walmsley, Stafford Horne, Guan Xuetao,
	linux-arm-kernel, Chris Zankel, Michal Simek, Thomas Bogendoerfer,
	Yoshinori Sato, Nick Hu, linux-mm, Vineet Gupta, linux-kernel,
	openrisc, Thomas Gleixner, Richard Weinberger, Andrew Morton,
	linuxppc-dev, David S. Miller
In-Reply-To: <20200512192441.GZ16070@bombadil.infradead.org>

On Tue, May 12, 2020 at 12:24:41PM -0700, Matthew Wilcox wrote:
> On Tue, May 12, 2020 at 09:44:18PM +0300, Mike Rapoport wrote:
> > +++ b/include/linux/pgtable.h
> > @@ -28,6 +28,24 @@
> >  #define USER_PGTABLES_CEILING	0UL
> >  #endif
> >  
> > +/* FIXME: */
> 
> Fix you what?  Add documentation?

Ouch, indeed :)

> > +static inline pmd_t *pmd_off(struct mm_struct *mm, unsigned long va)
> > +{
> > +	return pmd_offset(pud_offset(p4d_offset(pgd_offset(mm, va), va), va), va);
> > +}

-- 
Sincerely yours,
Mike.

^ permalink raw reply

* Re: [PATCH 1/1] powerpc/rtas: Implement reentrant rtas call
From: Leonardo Bras @ 2020-05-13  5:23 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: Gautham R. Shenoy, Greg Kroah-Hartman, linux-kernel,
	Nicholas Piggin, Paul Mackerras, Nadav Amit, Thomas Gleixner,
	linuxppc-dev, Allison Randal
In-Reply-To: <87ftdb87jf.fsf@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 191 bytes --]

v2: 
http://patchwork.ozlabs.org/project/linuxppc-dev/patch/20200513044025.105379-2-leobras.c@gmail.com/

(Series:
http://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=176534) 

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 862 bytes --]

^ permalink raw reply

* Re: [PATCH v8 5/5] powerpc/hv-24x7: Update post_mobility_fixup() to handle migration
From: kajoljain @ 2020-05-13  6:20 UTC (permalink / raw)
  To: Nathan Lynch
  Cc: ravi.bangoria, maddy, anju, peterz, gregkh, suka,
	alexander.shishkin, mingo, mpetlan, yao.jin, ak, mamatha4, acme,
	jmario, namhyung, linuxppc-dev, jolsa, kan.liang
In-Reply-To: <87eerq2rcc.fsf@linux.ibm.com>



On 5/12/20 1:10 AM, Nathan Lynch wrote:
> Hello,
> 
> Kajol Jain <kjain@linux.ibm.com> writes:
>> Function 'read_sys_info_pseries()' is added to get system parameter
>> values like number of sockets and chips per socket.
>> and it gets these details via rtas_call with token
>> "PROCESSOR_MODULE_INFO".
>>
>> Incase lpar migrate from one system to another, system
>> parameter details like chips per sockets or number of sockets might
>> change. So, it needs to be re-initialized otherwise, these values
>> corresponds to previous system values.
>> This patch adds a call to 'read_sys_info_pseries()' from
>> 'post-mobility_fixup()' to re-init the physsockets and physchips values
>>
>> Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
>> ---
>>  arch/powerpc/platforms/pseries/mobility.c | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
> 
> Please cc me on patches for this code, thanks.

Hi Nathan,
	Thanks for reviewing the patch. I will cc you on next version of this patchset.
> 
> I see no technical problems with how this patch handles partition
> migration. However:
> 
> "Update post_mobility_fixup() to handle migration" is not an appropriate
> summary for this change. post_mobility_fixup() already handles
> migration. A better summary would be
> 
> "powerpc/pseries: update hv-24x7 info after migration"

Will update.

> 
> 
>> diff --git a/arch/powerpc/platforms/pseries/mobility.c b/arch/powerpc/platforms/pseries/mobility.c
>> index b571285f6c14..0fb8f1e6e9d2 100644
>> --- a/arch/powerpc/platforms/pseries/mobility.c
>> +++ b/arch/powerpc/platforms/pseries/mobility.c
>> @@ -42,6 +42,12 @@ struct update_props_workarea {
>>  #define MIGRATION_SCOPE	(1)
>>  #define PRRN_SCOPE -2
>>  
>> +#ifdef CONFIG_HV_PERF_CTRS
>> +void read_sys_info_pseries(void);
>> +#else
>> +static inline void read_sys_info_pseries(void) { }
>> +#endif
> 
> This should go in a header.
> 
> 
>>  static int mobility_rtas_call(int token, char *buf, s32 scope)
>>  {
>>  	int rc;
>> @@ -371,6 +377,16 @@ void post_mobility_fixup(void)
>>  	/* Possibly switch to a new RFI flush type */
>>  	pseries_setup_rfi_flush();
>>  
>> +	/*
>> +	 * In case an Lpar migrates from one system to another, system
>> +	 * parameter details like chips per sockets, cores per chip and
>> +	 * number of sockets details might change.
>> +	 * So, they needs to be re-initialized otherwise the
>> +	 * values will correspond to the previous system.
>> +	 * Call read_sys_info_pseries() to reinitialise the values.
>> +	 */
> 
> This is needlessly verbose; any literate reader of this code knows this
> is used immediately after resuming from a suspend (migration). If you
> give your hook a more descriptive name, the comment becomes unnecessary.
> 

Yes make sense, will update.

Thanks,
Kajol Jain

> 
>> +	read_sys_info_pseries();
>> +
> 

^ permalink raw reply

* Re: [PATCH 19/31] riscv: use asm-generic/cacheflush.h
From: Christoph Hellwig @ 2020-05-13  6:23 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: linux-ia64, linux-sh, zippel, linux-mips, linux-mm, sparclinux,
	linux-riscv, Christoph Hellwig, linux-arch, linux-c6x-dev,
	linux-hexagon, x86, linux-xtensa, Arnd Bergmann, jeyu, linux-um,
	linux-m68k, openrisc, linux-arm-kernel, monstr, linux-kernel,
	linux-alpha, linux-fsdevel, akpm, linuxppc-dev
In-Reply-To: <mhng-8adbedbc-0f91-4291-9471-2df5eb7b802b@palmerdabbelt-glaptop1>

On Tue, May 12, 2020 at 04:00:26PM -0700, Palmer Dabbelt wrote:
> Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
> Acked-by: Palmer Dabbelt <palmerdabbelt@google.com>
>
> Were you trying to get these all in at once, or do you want me to take it into
> my tree?

Except for the small fixups at the beginning of the series this needs
to go in together.  I'll have to do at least another resend, and after
that I hope Andrew is going to pick it up.

^ permalink raw reply

* Re: [PATCH] powerpc/kvm: silence kmemleak false positives
From: Qian Cai @ 2020-05-13  6:24 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linux-kernel, kvm-ppc, catalin.marinas, linuxppc-dev
In-Reply-To: <87h7wkbhu4.fsf@mpe.ellerman.id.au>



> On May 13, 2020, at 12:04 AM, Michael Ellerman <mpe@ellerman.id.au> wrote:
> 
> This should probably also have an include of <linux/kmemleak.h> ?

No, asm/book3s/64/pgalloc.h has already had it and since this is book3s_64_mmu_radix.c, it will include it eventually from,

asm/pgalloc.h
  asm/book3s/pgalloc.h

^ permalink raw reply

* Re: [PATCH v2 4/5] powerpc/pmem/of_pmem: Update of_pmem to use the new barrier instruction.
From: kbuild test robot @ 2020-05-13  6:44 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev, mpe, linux-nvdimm
  Cc: alistair, dan.j.williams, kbuild-all, oohall, Aneesh Kumar K.V
In-Reply-To: <20200513034705.172983-4-aneesh.kumar@linux.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 3435 bytes --]

Hi "Aneesh,

I love your patch! Yet something to improve:

[auto build test ERROR on powerpc/next]
[also build test ERROR on linux-nvdimm/libnvdimm-for-next v5.7-rc5 next-20200512]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Aneesh-Kumar-K-V/powerpc-pmem-Add-new-instructions-for-persistent-storage-and-sync/20200513-133938
base:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
config: powerpc-storcenter_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day GCC_VERSION=9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

WARNING: unmet direct dependencies detected for PPC_INDIRECT_PCI
Depends on PCI
Selected by
- MPC10X_BRIDGE
In file included from include/linux/highmem.h:12,
from include/linux/pagemap.h:11,
from include/linux/blkdev.h:16,
from include/linux/blk-cgroup.h:23,
from include/linux/writeback.h:14,
from include/linux/memcontrol.h:22,
from include/linux/swap.h:9,
from include/linux/suspend.h:5,
from arch/powerpc/kernel/asm-offsets.c:23:
arch/powerpc/include/asm/cacheflush.h: In function 'arch_pmem_flush_barrier':
>> arch/powerpc/include/asm/cacheflush.h:126:22: error: 'CPU_FTR_ARCH_31' undeclared (first use in this function); did you mean
126 | if (cpu_has_feature(CPU_FTR_ARCH_31))
| ^~~~~~~~~~~~~~~
| CPU_FTR_ARCH_300
arch/powerpc/include/asm/cacheflush.h:126:22: note: each undeclared identifier is reported only once for each function it appears in
Makefile arch block certs crypto drivers fs include init ipc kernel lib mm net scripts security sound source usr virt [scripts/Makefile.build:100: arch/powerpc/kernel/asm-offsets.s] Error 1
Target '__build' not remade because of errors.
Makefile arch block certs crypto drivers fs include init ipc kernel lib mm net scripts security sound source usr virt [Makefile:1141: prepare0] Error 2
Target 'prepare' not remade because of errors.
make: Makefile arch block certs crypto drivers fs include init ipc kernel lib mm net scripts security sound source usr virt [Makefile:180: sub-make] Error 2

vim +/CPU_FTR_ARCH_31 +126 arch/powerpc/include/asm/cacheflush.h

   113	
   114	#define copy_to_user_page(vma, page, vaddr, dst, src, len) \
   115		do { \
   116			memcpy(dst, src, len); \
   117			flush_icache_user_range(vma, page, vaddr, len); \
   118		} while (0)
   119	#define copy_from_user_page(vma, page, vaddr, dst, src, len) \
   120		memcpy(dst, src, len)
   121	
   122	
   123	#define arch_pmem_flush_barrier arch_pmem_flush_barrier
   124	static inline void  arch_pmem_flush_barrier(void)
   125	{
 > 126		if (cpu_has_feature(CPU_FTR_ARCH_31))
   127			asm volatile(PPC_PHWSYNC ::: "memory");
   128		else
   129			asm volatile("hwsync" ::: "memory");
   130	}
   131	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 14417 bytes --]

^ permalink raw reply

* Re: [PATCH] tty: hvc: Fix data abort due to race in hvc_open
From: Greg KH @ 2020-05-13  7:04 UTC (permalink / raw)
  To: rananta; +Cc: linuxppc-dev, andrew, Jiri Slaby, linux-kernel
In-Reply-To: <417b1d320bda37410788430979dd708d@codeaurora.org>

On Tue, May 12, 2020 at 02:39:50PM -0700, rananta@codeaurora.org wrote:
> On 2020-05-12 01:25, Greg KH wrote:
> > On Tue, May 12, 2020 at 09:22:15AM +0200, Jiri Slaby wrote:
> > > On 11. 05. 20, 9:39, Greg KH wrote:
> > > > On Mon, May 11, 2020 at 12:23:58AM -0700, rananta@codeaurora.org wrote:
> > > >> On 2020-05-09 23:48, Greg KH wrote:
> > > >>> On Sat, May 09, 2020 at 06:30:56PM -0700, rananta@codeaurora.org wrote:
> > > >>>> On 2020-05-06 02:48, Greg KH wrote:
> > > >>>>> On Mon, Apr 27, 2020 at 08:26:01PM -0700, Raghavendra Rao Ananta wrote:
> > > >>>>>> Potentially, hvc_open() can be called in parallel when two tasks calls
> > > >>>>>> open() on /dev/hvcX. In such a scenario, if the
> > > >>>>>> hp->ops->notifier_add()
> > > >>>>>> callback in the function fails, where it sets the tty->driver_data to
> > > >>>>>> NULL, the parallel hvc_open() can see this NULL and cause a memory
> > > >>>>>> abort.
> > > >>>>>> Hence, serialize hvc_open and check if tty->private_data is NULL
> > > >>>>>> before
> > > >>>>>> proceeding ahead.
> > > >>>>>>
> > > >>>>>> The issue can be easily reproduced by launching two tasks
> > > >>>>>> simultaneously
> > > >>>>>> that does nothing but open() and close() on /dev/hvcX.
> > > >>>>>> For example:
> > > >>>>>> $ ./simple_open_close /dev/hvc0 & ./simple_open_close /dev/hvc0 &
> > > >>>>>>
> > > >>>>>> Signed-off-by: Raghavendra Rao Ananta <rananta@codeaurora.org>
> > > >>>>>> ---
> > > >>>>>>  drivers/tty/hvc/hvc_console.c | 16 ++++++++++++++--
> > > >>>>>>  1 file changed, 14 insertions(+), 2 deletions(-)
> > > >>>>>>
> > > >>>>>> diff --git a/drivers/tty/hvc/hvc_console.c
> > > >>>>>> b/drivers/tty/hvc/hvc_console.c
> > > >>>>>> index 436cc51c92c3..ebe26fe5ac09 100644
> > > >>>>>> --- a/drivers/tty/hvc/hvc_console.c
> > > >>>>>> +++ b/drivers/tty/hvc/hvc_console.c
> > > >>>>>> @@ -75,6 +75,8 @@ static LIST_HEAD(hvc_structs);
> > > >>>>>>   */
> > > >>>>>>  static DEFINE_MUTEX(hvc_structs_mutex);
> > > >>>>>>
> > > >>>>>> +/* Mutex to serialize hvc_open */
> > > >>>>>> +static DEFINE_MUTEX(hvc_open_mutex);
> > > >>>>>>  /*
> > > >>>>>>   * This value is used to assign a tty->index value to a hvc_struct
> > > >>>>>> based
> > > >>>>>>   * upon order of exposure via hvc_probe(), when we can not match it
> > > >>>>>> to
> > > >>>>>> @@ -346,16 +348,24 @@ static int hvc_install(struct tty_driver
> > > >>>>>> *driver, struct tty_struct *tty)
> > > >>>>>>   */
> > > >>>>>>  static int hvc_open(struct tty_struct *tty, struct file * filp)
> > > >>>>>>  {
> > > >>>>>> -	struct hvc_struct *hp = tty->driver_data;
> > > >>>>>> +	struct hvc_struct *hp;
> > > >>>>>>  	unsigned long flags;
> > > >>>>>>  	int rc = 0;
> > > >>>>>>
> > > >>>>>> +	mutex_lock(&hvc_open_mutex);
> > > >>>>>> +
> > > >>>>>> +	hp = tty->driver_data;
> > > >>>>>> +	if (!hp) {
> > > >>>>>> +		rc = -EIO;
> > > >>>>>> +		goto out;
> > > >>>>>> +	}
> > > >>>>>> +
> > > >>>>>>  	spin_lock_irqsave(&hp->port.lock, flags);
> > > >>>>>>  	/* Check and then increment for fast path open. */
> > > >>>>>>  	if (hp->port.count++ > 0) {
> > > >>>>>>  		spin_unlock_irqrestore(&hp->port.lock, flags);
> > > >>>>>>  		hvc_kick();
> > > >>>>>> -		return 0;
> > > >>>>>> +		goto out;
> > > >>>>>>  	} /* else count == 0 */
> > > >>>>>>  	spin_unlock_irqrestore(&hp->port.lock, flags);
> > > >>>>>
> > > >>>>> Wait, why isn't this driver just calling tty_port_open() instead of
> > > >>>>> trying to open-code all of this?
> > > >>>>>
> > > >>>>> Keeping a single mutext for open will not protect it from close, it will
> > > >>>>> just slow things down a bit.  There should already be a tty lock held by
> > > >>>>> the tty core for open() to keep it from racing things, right?
> > > >>>> The tty lock should have been held, but not likely across
> > > >>>> ->install() and
> > > >>>> ->open() callbacks, thus resulting in a race between hvc_install() and
> > > >>>> hvc_open(),
> > > >>>
> > > >>> How?  The tty lock is held in install, and should not conflict with
> > > >>> open(), otherwise, we would be seeing this happen in all tty drivers,
> > > >>> right?
> > > >>>
> > > >> Well, I was expecting the same, but IIRC, I see that the open() was being
> > > >> called in parallel for the same device node.
> > > >
> > > > So open and install are happening at the same time?  And the tty_lock()
> > > > does not protect the needed fields from being protected properly?  If
> > > > not, what fields are being touched without the lock?
> > > >
> > > >> Is it expected that the tty core would allow only one thread to
> > > >> access the dev-node, while blocking the other, or is it the client
> > > >> driver's responsibility to handle the exclusiveness?
> > > >
> > > > The tty core should handle this correctly, for things that can mess
> > > > stuff up (like install and open at the same time).  A driver should not
> > > > have to worry about that.
> > > >
> > > >>>> where hvc_install() sets a data and the hvc_open() clears it.
> > > >>>> hvc_open()
> > > >>>> doesn't
> > > >>>> check if the data was set to NULL and proceeds.
> > > >>>
> > > >>> What data is being set that hvc_open is checking?
> > > >> hvc_install sets tty->private_data to hp, while hvc_open sets it to NULL (in
> > > >> one of the paths).
> > > >
> > > > I see no use of private_data in drivers/tty/hvc/ so what exactly are you
> > > > referring to?
> > > 
> > > He likely means tty->driver_data. And there exactly lays the issue.
> > > 
> > > commit bdb498c20040616e94b05c31a0ceb3e134b7e829
> > > Author: Jiri Slaby <jslaby@suse.cz>
> > > Date:   Tue Aug 7 21:48:04 2012 +0200
> > > 
> > >     TTY: hvc_console, add tty install
> > > 
> > > added hvc_install but did not move 'tty->driver_data = NULL;' from
> > > hvc_open's fail path to hvc_cleanup.
> > > 
> > > IOW hvc_open now NULLs tty->driver_data even for another task which
> > > opened the tty earlier. The same holds for
> > > "tty_port_tty_set(&hp->port,
> > > NULL);" there. And actually "tty_port_put(&hp->port);" is also
> > > incorrect
> > > for the 2nd task opening the tty.
> > > 
> > > So, a mutex with tty->driver_data check in open is not definitely the
> > > way to go. This mess needs to be sorted out properly. Sure, a good
> > > start
> > > would be a conversion to tty_port_open. Right after dropping "tty:
> > > hvc:
> > > Fix data abort due to race in hvc_open" from tty/tty-next :).
> > 
> > I've now reverted this commit so we can start from a "clean" place.
> > 
> > > What I *don't* understand is why hp->ops->notifier_add fails, given
> > > the
> > > open does not allow multiple opens anyway?
> > 
> > I don't understand that either.  Raghavendra, can you show a real trace
> > for this issue that shows this?
> > 
> Let me know if this helps:
> 
> [  265.332900] Unable to handle kernel NULL pointer dereference at virtual
> address 00000000000000a8
> [  265.332920] Mem abort info:
> [  265.332934]   ESR = 0x96000006
> [  265.332950]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  265.332963]   SET = 0, FnV = 0
> [  265.332975]   EA = 0, S1PTW = 0
> [  265.332985] Data abort info:
> [  265.332997]   ISV = 0, ISS = 0x00000006
> [  265.333008]   CM = 0, WnR = 0
> [  265.333025] user pgtable: 4k pages, 39-bit VAs, pgdp=00000001620f3000
> [  265.333038] [00000000000000a8] pgd=00000001620f2003,
> pud=00000001620f2003, pmd=0000000000000000
> [  265.333071] Internal error: Oops: 96000006 [#1] PREEMPT SMP
> [  265.333424] CPU: 1 PID: 5653 Comm: stress-ng-dev Tainted: G S      W  O
> 5.4.12-g04866e0 #1
> [  265.333458] pstate: 80400085 (Nzcv daIf +PAN -UAO)
> [  265.333499] pc : _raw_spin_lock_irqsave+0x40/0x7c
> [  265.333517] lr : _raw_spin_lock_irqsave+0x38/0x7c
> [  265.333530] sp : ffffffc02436ba40
> [  265.333542] x29: ffffffc02436ba40 x28: 0000000000020800
> [  265.333562] x27: ffffffdfb4046490 x26: ffffff8101b83400
> [  265.333580] x25: ffffff80e163ad00 x24: ffffffdfb45c7798
> [  265.333598] x23: ffffff8101b83668 x22: ffffffdfb4974000
> [  265.333617] x21: 0000000000000001 x20: 00000000000000a8
> [  265.333634] x19: 0000000000000000 x18: ffffff80e0b0d460
> [  265.333652] x17: 0000000000000000 x16: 0000000001000000
> [  265.333670] x15: 0000000001000000 x14: 00000000f8000000
> [  265.333688] x13: 0000000000000000 x12: 0000000000000001
> [  265.333706] x11: 17f5f16765f64600 x10: 17f5f16765f64600
> [  265.333724] x9 : ffffffdfb3444244 x8 : 0000000000000000
> [  265.333741] x7 : 0000000000000000 x6 : 0000000000000000
> [  265.333759] x5 : 0000000000000000 x4 : 0000000000000002
> [  265.333776] x3 : ffffffc02436b9c0 x2 : ffffffdfb40456e0
> [  265.333794] x1 : ffffffc02436b9c0 x0 : ffffffdfb3444244
> [  265.333812] Call trace:
> [  265.333831]  _raw_spin_lock_irqsave+0x40/0x7c
> [  265.333859]  hvc_open$61deaf328f140fd7df47c115ec866fa5+0x28/0x174
> [  265.333882]  tty_open$86bd494905ebe22944bf63b711173de3+0x3d0/0x584
> [  265.333921]  chrdev_open$4083aaa799bca8e0e1e0c8dc1947aa96+0x1c4/0x248
> [  265.333940]  do_dentry_open+0x258/0x3b0
> [  265.333956]  vfs_open+0x2c/0x38
> [  265.333975]  path_openat+0x898/0xedc
> [  265.333991]  do_filp_open+0x78/0x124
> [  265.334006]  do_sys_open+0x13c/0x298
> [  265.334022]  __arm64_sys_openat+0x28/0x34
> [  265.334044]  el0_svc_common+0xb8/0x1b4
> [  265.334059]  el0_svc_handler+0x6c/0x88
> [  265.334079]  el0_svc+0x8/0xc
> [  265.334110] Code: 52800035 97b9fec7 aa1f03e8 f9800291 (885ffe81)
> [  265.334130] ---[ end trace ac90e3099a98e99f ]---
> [  265.334146] Kernel panic - not syncing: Fatal exception

Hm, do you have a strace showing the close happening at the same time?
What about install()?

And what line in hvc_open() does that offset correspond to?

thanks,

greg k-h

^ permalink raw reply

* Re: powerpc/pci: [PATCH 1/1]: PCIE PHB reset
From: Sam Bobroff @ 2020-05-13  7:23 UTC (permalink / raw)
  To: wenxiong; +Cc: brking, oohall, linuxppc-dev, wenxiong
In-Reply-To: <1588857037-25950-1-git-send-email-wenxiong@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 6544 bytes --]

On Thu, May 07, 2020 at 08:10:37AM -0500, wenxiong@linux.vnet.ibm.com wrote:
> From: Wen Xiong <wenxiong@linux.vnet.ibm.com>
> 
> Several device drivers hit EEH(Extended Error handling) when triggering
> kdump on Pseries PowerVM. This patch implemented a reset of the PHBs
> in pci general code. PHB reset stop all PCI transactions from previous
> kernel. We have tested the patch in several enviroments:
> - direct slot adapters
> - adapters under the switch
> - a VF adapter in PowerVM
> - a VF adapter/adapter in KVM guest.
> 
> Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>

One other thing:

I tested the patch and this line is logged for each emulated PHB:

[    3.337057] pseries_get_pdn_addr: Failed to get address for PHB#0-PE# option=1 config_addr=0

And it's not really an error -- QEMU's emulated PHBs don't have EEH
support.

It's not a big deal, because there are other similar messages already
present but it would probably be better if this were suppressed for the
case where there's no support.

Cheers,
Sam.

> ---
>  arch/powerpc/platforms/pseries/pci.c | 153 +++++++++++++++++++++++++++
>  1 file changed, 153 insertions(+)
> 
> diff --git a/arch/powerpc/platforms/pseries/pci.c b/arch/powerpc/platforms/pseries/pci.c
> index 911534b89c85..aac7f00696d2 100644
> --- a/arch/powerpc/platforms/pseries/pci.c
> +++ b/arch/powerpc/platforms/pseries/pci.c
> @@ -11,6 +11,8 @@
>  #include <linux/kernel.h>
>  #include <linux/pci.h>
>  #include <linux/string.h>
> +#include <linux/crash_dump.h>
> +#include <linux/delay.h>
>  
>  #include <asm/eeh.h>
>  #include <asm/pci-bridge.h>
> @@ -354,3 +356,154 @@ int pseries_root_bridge_prepare(struct pci_host_bridge *bridge)
>  
>  	return 0;
>  }
> +
> +/**
> + * pseries_get_pdn_addr - Retrieve PHB address
> + * @pe: EEH PE
> + *
> + * Retrieve the assocated PHB address. Actually, there're 2 RTAS
> + * function calls dedicated for the purpose. We need implement
> + * it through the new function and then the old one. Besides,
> + * you should make sure the config address is figured out from
> + * FDT node before calling the function.
> + *
> + */
> +static int pseries_get_pdn_addr(struct pci_controller *phb)
> +{
> +	int ret = -1;
> +	int rets[3];
> +	int ibm_get_config_addr_info;
> +	int ibm_get_config_addr_info2;
> +	int config_addr = 0;
> +	struct pci_dn *root_pdn, *pdn;
> +
> +	ibm_get_config_addr_info2   = rtas_token("ibm,get-config-addr-info2");
> +	ibm_get_config_addr_info    = rtas_token("ibm,get-config-addr-info");
> +
> +	root_pdn = PCI_DN(phb->dn);
> +	pdn = list_first_entry(&root_pdn->child_list, struct pci_dn, list);
> +	config_addr = (pdn->busno << 16) | (pdn->devfn << 8);
> +
> +	if (ibm_get_config_addr_info2 != RTAS_UNKNOWN_SERVICE) {
> +		/*
> +		 * First of all, we need to make sure there has one PE
> +		 * associated with the device. Otherwise, PE address is
> +		 * meaningless.
> +		 */
> +		ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
> +			config_addr, BUID_HI(pdn->phb->buid),
> +			BUID_LO(pdn->phb->buid), 1);
> +		if (ret || (rets[0] == 0)) {
> +			pr_warn("%s: Failed to get address for PHB#%x-PE# "
> +				"option=%d config_addr=%x\n",
> +				__func__, pdn->phb->global_number, 1, rets[0]);
> +			return -1;
> +		}
> +
> +		/* Retrieve the associated PE config address */
> +		ret = rtas_call(ibm_get_config_addr_info2, 4, 2, rets,
> +			config_addr, BUID_HI(pdn->phb->buid),
> +			BUID_LO(pdn->phb->buid), 0);
> +		if (ret) {
> +			pr_warn("%s: Failed to get address for PHB#%x-PE# "
> +				"option=%d config_addr=%x\n",
> +				__func__, pdn->phb->global_number, 0, rets[0]);
> +			return -1;
> +		}
> +		return rets[0];
> +	}
> +
> +	if (ibm_get_config_addr_info != RTAS_UNKNOWN_SERVICE) {
> +		ret = rtas_call(ibm_get_config_addr_info, 4, 2, rets,
> +			config_addr, BUID_HI(pdn->phb->buid),
> +			BUID_LO(pdn->phb->buid), 0);
> +		if (ret || rets[0]) {
> +			pr_warn("%s: Failed to get address for PHB#%x-PE# "
> +				"config_addr=%x\n",
> +				__func__, pdn->phb->global_number, rets[0]);
> +			return -1;
> +		}
> +		return rets[0];
> +	}
> +
> +	return ret;
> +}
> +
> +static int __init pseries_phb_reset(void)
> +{
> +	struct pci_controller *phb;
> +	int config_addr;
> +	int ibm_set_slot_reset;
> +	int ibm_configure_pe;
> +	int ret;
> +
> +	if (is_kdump_kernel() || reset_devices) {
> +		pr_info("Issue PHB reset ...\n");
> +		ibm_set_slot_reset = rtas_token("ibm,set-slot-reset");
> +		ibm_configure_pe = rtas_token("ibm,configure-pe");
> +
> +		if (ibm_set_slot_reset == RTAS_UNKNOWN_SERVICE ||
> +				ibm_configure_pe == RTAS_UNKNOWN_SERVICE) {
> +			pr_info("%s: EEH functionality not supported\n",
> +				__func__);
> +		}
> +
> +		list_for_each_entry(phb, &hose_list, list_node) {
> +			config_addr = pseries_get_pdn_addr(phb);
> +			if (config_addr == -1)
> +				continue;
> +
> +			ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
> +				config_addr, BUID_HI(phb->buid),
> +				BUID_LO(phb->buid), EEH_RESET_FUNDAMENTAL);
> +
> +			/* If fundamental-reset not supported, try hot-reset */
> +			if (ret == -8)
> +				ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
> +					config_addr, BUID_HI(phb->buid),
> +					BUID_LO(phb->buid), EEH_RESET_HOT);
> +
> +			if (ret) {
> +				pr_err("%s: fail with rtas_call fundamental reset=%d\n",
> +					__func__, ret);
> +				continue;
> +			}
> +		}
> +		msleep(EEH_PE_RST_SETTLE_TIME);
> +
> +		list_for_each_entry(phb, &hose_list, list_node) {
> +			config_addr = pseries_get_pdn_addr(phb);
> +			if (config_addr == -1)
> +				continue;
> +
> +			ret = rtas_call(ibm_set_slot_reset, 4, 1, NULL,
> +				config_addr, BUID_HI(phb->buid),
> +				BUID_LO(phb->buid), EEH_RESET_DEACTIVATE);
> +			if (ret) {
> +				pr_err("%s: fail with rtas_call deactive=%d\n",
> +					__func__, ret);
> +				continue;
> +			}
> +		}
> +		msleep(EEH_PE_RST_SETTLE_TIME);
> +
> +		list_for_each_entry(phb, &hose_list, list_node) {
> +			config_addr = pseries_get_pdn_addr(phb);
> +			if (config_addr == -1)
> +				continue;
> +
> +			ret = rtas_call(ibm_configure_pe, 3, 1, NULL,
> +				config_addr, BUID_HI(phb->buid),
> +				BUID_LO(phb->buid));
> +			if (ret) {
> +				pr_err("%s: fail with rtas_call configure_pe =%d\n",
> +					__func__, ret);
> +				continue;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +postcore_initcall(pseries_phb_reset);
> +
> -- 
> 2.18.1
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH v4 3/3] mm/page_alloc: Keep memoryless cpuless node 0 offline
From: Srikar Dronamraju @ 2020-05-13  7:46 UTC (permalink / raw)
  To: Christopher Lameter
  Cc: Gautham R Shenoy, Michal Hocko, David Hildenbrand, Linus Torvalds,
	linux-kernel, linux-mm, Satheesh Rajendran, Mel Gorman,
	Kirill A. Shutemov, Andrew Morton, linuxppc-dev, Vlastimil Babka
In-Reply-To: <alpine.DEB.2.22.394.2005121627340.98180@www.lameter.com>

* Christopher Lameter <cl@linux.com> [2020-05-12 16:31:26]:

> On Tue, 12 May 2020, Srikar Dronamraju wrote:
> 
> > +#ifdef CONFIG_NUMA
> > +	[N_ONLINE] = NODE_MASK_NONE,
> 
> Again. Same issue as before. If you do this then you do a global change
> for all architectures. You need to put something in the early boot
> sequence (in a non architecture specific way) that sets the first node
> online by default.
> 

I did respond to that earlier.

> You have fixed the issue in your earlier patches for the powerpc
> archicture. What about the other architectures?
> 
> Or did I miss something?
> 

Here are my assumptions, please do correct me if any of them are wrong.
1. My other patches for Powerpc, don't change when the nodes are being
onlined. They only change how the cpu_to_node numbering of the offline cpus.
In this respect Powerpc due to its PAPR compliance may be slightly unique
from other archs where the cpu binding of the node is not known till CPUs
are onlined.

2. Currently the nodes are onlined (in all arch specific code) as soon as
they are detected. This is unconditional onlining as in there are no checks
to see the node number is 0. i.e I don't see any special checks that
restrict or allow node 0 from being onlined / offlined. Its considered no
special than any other online node.

3. If we were to expect node 0 to be always online, then why do we have
first_online_node. We could always hard code it to 0.

4. I tried enabling CONFIG_MEMORYLESS_NODE on x86, but that's seems to be
not possible. And it looks to me that something like that is only possible
on powerpc and IA64.

5. Without my patch on a regular numa system, node 0 would be onlined by
default during structure initialization. When the nodes get detected, node 0
and other nodes would again be onlined. The only drawback being if node 0
wasn't suppose to be online, it will still end up being marked online.
With the proposed patch, when the nodes get detected, any nodes detected
would be onlined.

I think the node onlining is already pretty early in boot. I don't know of
any other mechanism to move the onlining further up and in a non
architecture specific way. However if you have ideas, please do let me know.

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* Re: [PATCH v4 01/22] powerpc/pkeys: Avoid using lockless page table walk
From: Michael Ellerman @ 2020-05-13  8:49 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev; +Cc: Aneesh Kumar K.V, Ram Pai, npiggin
In-Reply-To: <20200505071729.54912-2-aneesh.kumar@linux.ibm.com>

On Tue, 2020-05-05 at 07:17:08 UTC, "Aneesh Kumar K.V" wrote:
> Fetch pkey from vma instead of linux page table. Also document the fact that in
> some cases the pkey returned in siginfo won't be the same as the one we took
> keyfault on. Even with linux page table walk, we can end up in a similar scenario.
> 
> Cc: Ram Pai <linuxram@linux.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/fe4a6856cb4f4353a6cb8d3629bcfe9204e3d57d

cheers

^ permalink raw reply

* Re: [PATCH 06/12] m68k/mm: move {cache,nocahe}_page() definitions close to their user
From: Greg Ungerer @ 2020-05-13 11:19 UTC (permalink / raw)
  To: Mike Rapoport, linux-kernel
  Cc: Rich Felker, linux-ia64, linux-sh, Catalin Marinas,
	Heiko Carstens, Max Filippov, Guo Ren, linux-csky, sparclinux,
	linux-hexagon, linux-riscv, Vincent Chen, Will Deacon,
	Thomas Gleixner, linux-arch, linux-s390, linux-c6x-dev,
	Brian Cain, Helge Deller, x86, Russell King, Ley Foon Tan,
	Mike Rapoport, Ingo Molnar, Geert Uytterhoeven, linux-parisc,
	Mark Salter, Matt Turner, linux-snps-arc, linux-xtensa,
	Arnd Bergmann, linux-alpha, linux-um, linux-m68k, Tony Luck,
	Borislav Petkov, Greentime Hu, Paul Walmsley, Stafford Horne,
	Guan Xuetao, linux-arm-kernel, Chris Zankel, Michal Simek,
	Thomas Bogendoerfer, Yoshinori Sato, Nick Hu, linux-mm,
	Vineet Gupta, linux-mips, openrisc, Richard Weinberger,
	Andrew Morton, linuxppc-dev, David S. Miller
In-Reply-To: <20200512184422.12418-7-rppt@kernel.org>

Hi Mike,

On 13/5/20 4:44 am, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
> 
> The cache_page() and nocache_page() functions are only used by the morotola
                                                                      ^^^^^^^^
                                                                      motorola

> MMU variant for setting caching attributes for the page table pages.
> 
> Move the definitions of these functions from
> arch/m68k/include/asm/motorola_pgtable.h closer to their usage in
> arch/m68k/mm/motorola.c and drop unused definition in
> arch/m68k/include/asm/mcf_pgtable.h.
> 
> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>

Acked-by: Greg Ungerer <gerg@linux-m68k.org>

Regards
Greg


> ---
>   arch/m68k/include/asm/mcf_pgtable.h      | 40 ---------------------
>   arch/m68k/include/asm/motorola_pgtable.h | 44 ------------------------
>   arch/m68k/mm/motorola.c                  | 43 +++++++++++++++++++++++
>   3 files changed, 43 insertions(+), 84 deletions(-)
> 
> diff --git a/arch/m68k/include/asm/mcf_pgtable.h b/arch/m68k/include/asm/mcf_pgtable.h
> index 0031cd387b75..737e826294f3 100644
> --- a/arch/m68k/include/asm/mcf_pgtable.h
> +++ b/arch/m68k/include/asm/mcf_pgtable.h
> @@ -328,46 +328,6 @@ extern pgd_t kernel_pg_dir[PTRS_PER_PGD];
>   #define pte_offset_kernel(dir, address) \
>   	((pte_t *) __pmd_page(*(dir)) + __pte_offset(address))
>   
> -/*
> - * Disable caching for page at given kernel virtual address.
> - */
> -static inline void nocache_page(void *vaddr)
> -{
> -	pgd_t *dir;
> -	p4d_t *p4dp;
> -	pud_t *pudp;
> -	pmd_t *pmdp;
> -	pte_t *ptep;
> -	unsigned long addr = (unsigned long) vaddr;
> -
> -	dir = pgd_offset_k(addr);
> -	p4dp = p4d_offset(dir, addr);
> -	pudp = pud_offset(p4dp, addr);
> -	pmdp = pmd_offset(pudp, addr);
> -	ptep = pte_offset_kernel(pmdp, addr);
> -	*ptep = pte_mknocache(*ptep);
> -}
> -
> -/*
> - * Enable caching for page at given kernel virtual address.
> - */
> -static inline void cache_page(void *vaddr)
> -{
> -	pgd_t *dir;
> -	p4d_t *p4dp;
> -	pud_t *pudp;
> -	pmd_t *pmdp;
> -	pte_t *ptep;
> -	unsigned long addr = (unsigned long) vaddr;
> -
> -	dir = pgd_offset_k(addr);
> -	p4dp = p4d_offset(dir, addr);
> -	pudp = pud_offset(p4dp, addr);
> -	pmdp = pmd_offset(pudp, addr);
> -	ptep = pte_offset_kernel(pmdp, addr);
> -	*ptep = pte_mkcache(*ptep);
> -}
> -
>   /*
>    * Encode and de-code a swap entry (must be !pte_none(e) && !pte_present(e))
>    */
> diff --git a/arch/m68k/include/asm/motorola_pgtable.h b/arch/m68k/include/asm/motorola_pgtable.h
> index 9e5a3de21e15..e1594acf7c7e 100644
> --- a/arch/m68k/include/asm/motorola_pgtable.h
> +++ b/arch/m68k/include/asm/motorola_pgtable.h
> @@ -227,50 +227,6 @@ static inline pte_t *pte_offset_kernel(pmd_t *pmdp, unsigned long address)
>   #define pte_offset_map(pmdp,address) ((pte_t *)__pmd_page(*pmdp) + (((address) >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)))
>   #define pte_unmap(pte)		((void)0)
>   
> -/* Prior to calling these routines, the page should have been flushed
> - * from both the cache and ATC, or the CPU might not notice that the
> - * cache setting for the page has been changed. -jskov
> - */
> -static inline void nocache_page(void *vaddr)
> -{
> -	unsigned long addr = (unsigned long)vaddr;
> -
> -	if (CPU_IS_040_OR_060) {
> -		pgd_t *dir;
> -		p4d_t *p4dp;
> -		pud_t *pudp;
> -		pmd_t *pmdp;
> -		pte_t *ptep;
> -
> -		dir = pgd_offset_k(addr);
> -		p4dp = p4d_offset(dir, addr);
> -		pudp = pud_offset(p4dp, addr);
> -		pmdp = pmd_offset(pudp, addr);
> -		ptep = pte_offset_kernel(pmdp, addr);
> -		*ptep = pte_mknocache(*ptep);
> -	}
> -}
> -
> -static inline void cache_page(void *vaddr)
> -{
> -	unsigned long addr = (unsigned long)vaddr;
> -
> -	if (CPU_IS_040_OR_060) {
> -		pgd_t *dir;
> -		p4d_t *p4dp;
> -		pud_t *pudp;
> -		pmd_t *pmdp;
> -		pte_t *ptep;
> -
> -		dir = pgd_offset_k(addr);
> -		p4dp = p4d_offset(dir, addr);
> -		pudp = pud_offset(p4dp, addr);
> -		pmdp = pmd_offset(pudp, addr);
> -		ptep = pte_offset_kernel(pmdp, addr);
> -		*ptep = pte_mkcache(*ptep);
> -	}
> -}
> -
>   /* Encode and de-code a swap entry (must be !pte_none(e) && !pte_present(e)) */
>   #define __swp_type(x)		(((x).val >> 4) & 0xff)
>   #define __swp_offset(x)		((x).val >> 12)
> diff --git a/arch/m68k/mm/motorola.c b/arch/m68k/mm/motorola.c
> index 904c2a663977..8e5e74121a78 100644
> --- a/arch/m68k/mm/motorola.c
> +++ b/arch/m68k/mm/motorola.c
> @@ -45,6 +45,49 @@ unsigned long mm_cachebits;
>   EXPORT_SYMBOL(mm_cachebits);
>   #endif
>   
> +/* Prior to calling these routines, the page should have been flushed
> + * from both the cache and ATC, or the CPU might not notice that the
> + * cache setting for the page has been changed. -jskov
> + */
> +static inline void nocache_page(void *vaddr)
> +{
> +	unsigned long addr = (unsigned long)vaddr;
> +
> +	if (CPU_IS_040_OR_060) {
> +		pgd_t *dir;
> +		p4d_t *p4dp;
> +		pud_t *pudp;
> +		pmd_t *pmdp;
> +		pte_t *ptep;
> +
> +		dir = pgd_offset_k(addr);
> +		p4dp = p4d_offset(dir, addr);
> +		pudp = pud_offset(p4dp, addr);
> +		pmdp = pmd_offset(pudp, addr);
> +		ptep = pte_offset_kernel(pmdp, addr);
> +		*ptep = pte_mknocache(*ptep);
> +	}
> +}
> +
> +static inline void cache_page(void *vaddr)
> +{
> +	unsigned long addr = (unsigned long)vaddr;
> +
> +	if (CPU_IS_040_OR_060) {
> +		pgd_t *dir;
> +		p4d_t *p4dp;
> +		pud_t *pudp;
> +		pmd_t *pmdp;
> +		pte_t *ptep;
> +
> +		dir = pgd_offset_k(addr);
> +		p4dp = p4d_offset(dir, addr);
> +		pudp = pud_offset(p4dp, addr);
> +		pmdp = pmd_offset(pudp, addr);
> +		ptep = pte_offset_kernel(pmdp, addr);
> +		*ptep = pte_mkcache(*ptep);
> +	}
> +}
>   
>   /*
>    * Motorola 680x0 user's manual recommends using uncached memory for address
> 

^ permalink raw reply

* Re: [PATCH] powerpc/32s: Fix build failure with CONFIG_PPC_KUAP_DEBUG
From: Michael Ellerman @ 2020-05-13 12:43 UTC (permalink / raw)
  To: Michael Ellerman, Christophe Leroy, Paul Mackerras,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <ea599546f2a7771bde551393889e44e6b2632332.1587368807.git.christophe.leroy@c-s.fr>

On Mon, 20 Apr 2020 07:47:05 +0000 (UTC), Christophe Leroy wrote:
> gpr2 is not a parametre of kuap_check(), it doesn't exist.
> 
> Use gpr instead.

Applied to powerpc/fixes.

[1/1] powerpc/32s: Fix build failure with CONFIG_PPC_KUAP_DEBUG
      https://git.kernel.org/powerpc/c/4833ce06e6855d526234618b746ffb71d6612c9a

cheers

^ permalink raw reply

* Re: [PATCH fixes] powerpc/40x: Make more space for system call exception
From: Michael Ellerman @ 2020-05-13 12:43 UTC (permalink / raw)
  To: Michael Ellerman, Christophe Leroy, Paul Mackerras,
	Benjamin Herrenschmidt
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <633165d72f75b4ef4c0901aebe99d3915c93e9a2.1589043863.git.christophe.leroy@csgroup.eu>

On Sat, 9 May 2020 17:05:11 +0000 (UTC), Christophe Leroy wrote:
> When CONFIG_VIRT_CPU_ACCOUNTING is selected, system call exception
> handler doesn't fit below 0xd00 and build fails.
> 
> As exception 0xd00 doesn't exist and is never generated by 40x,
> comment it out in order to get more space for system call exception.

Applied to powerpc/fixes.

[1/1] powerpc/40x: Make more space for system call exception
      https://git.kernel.org/powerpc/c/249c9b0cd193d983c3a0b00f3fd3b92333bfeebe

cheers

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox