LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH AUTOSEL 5.9 26/39] soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
From: Sasha Levin @ 2020-12-03 13:28 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yi Wang, Sasha Levin, Hao Si, Li Yang, Lin Chen, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201203132834.930999-1-sashal@kernel.org>

From: Hao Si <si.hao@zte.com.cn>

[ Upstream commit 2663b3388551230cbc4606a40fabf3331ceb59e4 ]

The local variable 'cpumask_t mask' is in the stack memory, and its address
is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
But the memory area where this variable is located is at risk of being
modified.

During LTP testing, the following error was generated:

Unable to handle kernel paging request at virtual address ffff000012e9b790
Mem abort info:
  ESR = 0x96000007
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000007
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000075ac5e07
[ffff000012e9b790] pgd=00000027dbffe003, pud=00000027dbffd003,
pmd=00000027b6d61003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in: xt_conntrack
Process read_all (pid: 20171, stack limit = 0x0000000044ea4095)
CPU: 14 PID: 20171 Comm: read_all Tainted: G    B   W
Hardware name: NXP Layerscape LX2160ARDB (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO)
pc : irq_affinity_hint_proc_show+0x54/0xb0
lr : irq_affinity_hint_proc_show+0x4c/0xb0
sp : ffff00001138bc10
x29: ffff00001138bc10 x28: 0000ffffd131d1e0
x27: 00000000007000c0 x26: ffff8025b9480dc0
x25: ffff8025b9480da8 x24: 00000000000003ff
x23: ffff8027334f8300 x22: ffff80272e97d000
x21: ffff80272e97d0b0 x20: ffff8025b9480d80
x19: ffff000009a49000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000040
x11: 0000000000000000 x10: ffff802735b79b88
x9 : 0000000000000000 x8 : 0000000000000000
x7 : ffff000009a49848 x6 : 0000000000000003
x5 : 0000000000000000 x4 : ffff000008157d6c
x3 : ffff00001138bc10 x2 : ffff000012e9b790
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 irq_affinity_hint_proc_show+0x54/0xb0
 seq_read+0x1b0/0x440
 proc_reg_read+0x80/0xd8
 __vfs_read+0x60/0x178
 vfs_read+0x94/0x150
 ksys_read+0x74/0xf0
 __arm64_sys_read+0x24/0x30
 el0_svc_common.constprop.0+0xd8/0x1a0
 el0_svc_handler+0x34/0x88
 el0_svc+0x10/0x14
Code: f9001bbf 943e0732 f94066c2 b4000062 (f9400041)
---[ end trace b495bdcb0b3b732b ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
Kernel Offset: disabled
CPU features: 0x0,21006008
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---

Fix it by using 'cpumask_of(cpu)' to get the cpumask.

Signed-off-by: Hao Si <si.hao@zte.com.cn>
Signed-off-by: Lin Chen <chen.lin5@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/soc/fsl/dpio/dpio-driver.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c
index 7b642c330977f..7f397b4ad878d 100644
--- a/drivers/soc/fsl/dpio/dpio-driver.c
+++ b/drivers/soc/fsl/dpio/dpio-driver.c
@@ -95,7 +95,6 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 {
 	int error;
 	struct fsl_mc_device_irq *irq;
-	cpumask_t mask;
 
 	irq = dpio_dev->irqs[0];
 	error = devm_request_irq(&dpio_dev->dev,
@@ -112,9 +111,7 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 	}
 
 	/* set the affinity hint */
-	cpumask_clear(&mask);
-	cpumask_set_cpu(cpu, &mask);
-	if (irq_set_affinity_hint(irq->msi_desc->irq, &mask))
+	if (irq_set_affinity_hint(irq->msi_desc->irq, cpumask_of(cpu)))
 		dev_err(&dpio_dev->dev,
 			"irq_set_affinity failed irq %d cpu %d\n",
 			irq->msi_desc->irq, cpu);
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing
From: Sasha Levin @ 2020-12-03 13:28 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Mark Rutland, Sasha Levin, linux-ia64, linux-parisc, linux-s390,
	Peter Zijlstra, linux-hexagon, linux-sh, linux-um, linux-mips,
	linux-csky, sparclinux, openrisc, Sven Schnelle, linux-alpha,
	uclinux-h8-devel, linux-riscv, linuxppc-dev, linux-arm-kernel
In-Reply-To: <20201203132834.930999-1-sashal@kernel.org>

From: Peter Zijlstra <peterz@infradead.org>

[ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ]

We call arch_cpu_idle() with RCU disabled, but then use
local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.

Switch all arch_cpu_idle() implementations to use
raw_local_irq_{en,dis}able() and carefully manage the
lockdep,rcu,tracing state like we do in entry.

(XXX: we really should change arch_cpu_idle() to not return with
interrupts enabled)

Reported-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Tested-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/alpha/kernel/process.c      |  2 +-
 arch/arm/kernel/process.c        |  2 +-
 arch/arm64/kernel/process.c      |  2 +-
 arch/csky/kernel/process.c       |  2 +-
 arch/h8300/kernel/process.c      |  2 +-
 arch/hexagon/kernel/process.c    |  2 +-
 arch/ia64/kernel/process.c       |  2 +-
 arch/microblaze/kernel/process.c |  2 +-
 arch/mips/kernel/idle.c          | 12 ++++++------
 arch/nios2/kernel/process.c      |  2 +-
 arch/openrisc/kernel/process.c   |  2 +-
 arch/parisc/kernel/process.c     |  2 +-
 arch/powerpc/kernel/idle.c       |  4 ++--
 arch/riscv/kernel/process.c      |  2 +-
 arch/s390/kernel/idle.c          |  6 +++---
 arch/sh/kernel/idle.c            |  2 +-
 arch/sparc/kernel/leon_pmc.c     |  4 ++--
 arch/sparc/kernel/process_32.c   |  2 +-
 arch/sparc/kernel/process_64.c   |  4 ++--
 arch/um/kernel/process.c         |  2 +-
 arch/x86/include/asm/mwait.h     |  2 --
 arch/x86/kernel/process.c        | 12 +++++++-----
 kernel/sched/idle.c              | 28 +++++++++++++++++++++++++++-
 23 files changed, 64 insertions(+), 38 deletions(-)

diff --git a/arch/alpha/kernel/process.c b/arch/alpha/kernel/process.c
index 7462a79110024..4c7b0414a3ff3 100644
--- a/arch/alpha/kernel/process.c
+++ b/arch/alpha/kernel/process.c
@@ -57,7 +57,7 @@ EXPORT_SYMBOL(pm_power_off);
 void arch_cpu_idle(void)
 {
 	wtint(0);
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_dead(void)
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 8e6ace03e960b..9f199b1e83839 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -71,7 +71,7 @@ void arch_cpu_idle(void)
 		arm_pm_idle();
 	else
 		cpu_do_idle();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_prepare(void)
diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 2da5f3f9d345f..f7c42a7d09b66 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -124,7 +124,7 @@ void arch_cpu_idle(void)
 	 * tricks
 	 */
 	cpu_do_idle();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 #ifdef CONFIG_HOTPLUG_CPU
diff --git a/arch/csky/kernel/process.c b/arch/csky/kernel/process.c
index f730869e21eed..69af6bc87e647 100644
--- a/arch/csky/kernel/process.c
+++ b/arch/csky/kernel/process.c
@@ -102,6 +102,6 @@ void arch_cpu_idle(void)
 #ifdef CONFIG_CPU_PM_STOP
 	asm volatile("stop\n");
 #endif
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 #endif
diff --git a/arch/h8300/kernel/process.c b/arch/h8300/kernel/process.c
index 83ce3caf73139..a2961c7b2332c 100644
--- a/arch/h8300/kernel/process.c
+++ b/arch/h8300/kernel/process.c
@@ -57,7 +57,7 @@ asmlinkage void ret_from_kernel_thread(void);
  */
 void arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 	__asm__("sleep");
 }
 
diff --git a/arch/hexagon/kernel/process.c b/arch/hexagon/kernel/process.c
index dfd322c5ce83a..20962601a1b47 100644
--- a/arch/hexagon/kernel/process.c
+++ b/arch/hexagon/kernel/process.c
@@ -44,7 +44,7 @@ void arch_cpu_idle(void)
 {
 	__vmwait();
 	/*  interrupts wake us up, but irqs are still disabled */
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
diff --git a/arch/ia64/kernel/process.c b/arch/ia64/kernel/process.c
index f19cb97c00987..1b2769260688d 100644
--- a/arch/ia64/kernel/process.c
+++ b/arch/ia64/kernel/process.c
@@ -252,7 +252,7 @@ void arch_cpu_idle(void)
 	if (mark_idle)
 		(*mark_idle)(1);
 
-	safe_halt();
+	raw_safe_halt();
 
 	if (mark_idle)
 		(*mark_idle)(0);
diff --git a/arch/microblaze/kernel/process.c b/arch/microblaze/kernel/process.c
index a9e46e525cd0a..f99860771ff48 100644
--- a/arch/microblaze/kernel/process.c
+++ b/arch/microblaze/kernel/process.c
@@ -149,5 +149,5 @@ int dump_fpu(struct pt_regs *regs, elf_fpregset_t *fpregs)
 
 void arch_cpu_idle(void)
 {
-       local_irq_enable();
+       raw_local_irq_enable();
 }
diff --git a/arch/mips/kernel/idle.c b/arch/mips/kernel/idle.c
index 5bc3b04693c7d..18e69ebf5691d 100644
--- a/arch/mips/kernel/idle.c
+++ b/arch/mips/kernel/idle.c
@@ -33,19 +33,19 @@ static void __cpuidle r3081_wait(void)
 {
 	unsigned long cfg = read_c0_conf();
 	write_c0_conf(cfg | R30XX_CONF_HALT);
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 static void __cpuidle r39xx_wait(void)
 {
 	if (!need_resched())
 		write_c0_conf(read_c0_conf() | TX39_CONF_HALT);
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void __cpuidle r4k_wait(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 	__r4k_wait();
 }
 
@@ -64,7 +64,7 @@ void __cpuidle r4k_wait_irqoff(void)
 		"	.set	arch=r4000	\n"
 		"	wait			\n"
 		"	.set	pop		\n");
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
@@ -84,7 +84,7 @@ static void __cpuidle rm7k_wait_irqoff(void)
 		"	wait						\n"
 		"	mtc0	$1, $12		# stalls until W stage	\n"
 		"	.set	pop					\n");
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
@@ -257,7 +257,7 @@ void arch_cpu_idle(void)
 	if (cpu_wait)
 		cpu_wait();
 	else
-		local_irq_enable();
+		raw_local_irq_enable();
 }
 
 #ifdef CONFIG_CPU_IDLE
diff --git a/arch/nios2/kernel/process.c b/arch/nios2/kernel/process.c
index 88a4ec03edab4..f5cc55a88d310 100644
--- a/arch/nios2/kernel/process.c
+++ b/arch/nios2/kernel/process.c
@@ -33,7 +33,7 @@ EXPORT_SYMBOL(pm_power_off);
 
 void arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
diff --git a/arch/openrisc/kernel/process.c b/arch/openrisc/kernel/process.c
index 0ff391f00334c..3c98728cce249 100644
--- a/arch/openrisc/kernel/process.c
+++ b/arch/openrisc/kernel/process.c
@@ -79,7 +79,7 @@ void machine_power_off(void)
  */
 void arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 	if (mfspr(SPR_UPR) & SPR_UPR_PMP)
 		mtspr(SPR_PMR, mfspr(SPR_PMR) | SPR_PMR_DME);
 }
diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
index f196d96e2f9f5..a92a23d6acd93 100644
--- a/arch/parisc/kernel/process.c
+++ b/arch/parisc/kernel/process.c
@@ -169,7 +169,7 @@ void __cpuidle arch_cpu_idle_dead(void)
 
 void __cpuidle arch_cpu_idle(void)
 {
-	local_irq_enable();
+	raw_local_irq_enable();
 
 	/* nop on real hardware, qemu will idle sleep. */
 	asm volatile("or %%r10,%%r10,%%r10\n":::);
diff --git a/arch/powerpc/kernel/idle.c b/arch/powerpc/kernel/idle.c
index 422e31d2f5a2b..8df35f1329a42 100644
--- a/arch/powerpc/kernel/idle.c
+++ b/arch/powerpc/kernel/idle.c
@@ -60,9 +60,9 @@ void arch_cpu_idle(void)
 		 * interrupts enabled, some don't.
 		 */
 		if (irqs_disabled())
-			local_irq_enable();
+			raw_local_irq_enable();
 	} else {
-		local_irq_enable();
+		raw_local_irq_enable();
 		/*
 		 * Go into low thread priority and possibly
 		 * low power mode.
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 2b97c493427c9..308e1d95ecbf0 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -36,7 +36,7 @@ extern asmlinkage void ret_from_kernel_thread(void);
 void arch_cpu_idle(void)
 {
 	wait_for_interrupt();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void show_regs(struct pt_regs *regs)
diff --git a/arch/s390/kernel/idle.c b/arch/s390/kernel/idle.c
index f7f1e64e0d980..2b85096964f84 100644
--- a/arch/s390/kernel/idle.c
+++ b/arch/s390/kernel/idle.c
@@ -33,10 +33,10 @@ void enabled_wait(void)
 		PSW_MASK_IO | PSW_MASK_EXT | PSW_MASK_MCHECK;
 	clear_cpu_flag(CIF_NOHZ_DELAY);
 
-	local_irq_save(flags);
+	raw_local_irq_save(flags);
 	/* Call the assembler magic in entry.S */
 	psw_idle(idle, psw_mask);
-	local_irq_restore(flags);
+	raw_local_irq_restore(flags);
 
 	/* Account time spent with enabled wait psw loaded as idle time. */
 	raw_write_seqcount_begin(&idle->seqcount);
@@ -123,7 +123,7 @@ void arch_cpu_idle_enter(void)
 void arch_cpu_idle(void)
 {
 	enabled_wait();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 void arch_cpu_idle_exit(void)
diff --git a/arch/sh/kernel/idle.c b/arch/sh/kernel/idle.c
index 0dc0f52f9bb8d..f59814983bd59 100644
--- a/arch/sh/kernel/idle.c
+++ b/arch/sh/kernel/idle.c
@@ -22,7 +22,7 @@ static void (*sh_idle)(void);
 void default_idle(void)
 {
 	set_bl_bit();
-	local_irq_enable();
+	raw_local_irq_enable();
 	/* Isn't this racy ? */
 	cpu_sleep();
 	clear_bl_bit();
diff --git a/arch/sparc/kernel/leon_pmc.c b/arch/sparc/kernel/leon_pmc.c
index 065e2d4b72908..396f46bca52eb 100644
--- a/arch/sparc/kernel/leon_pmc.c
+++ b/arch/sparc/kernel/leon_pmc.c
@@ -50,7 +50,7 @@ static void pmc_leon_idle_fixup(void)
 	register unsigned int address = (unsigned int)leon3_irqctrl_regs;
 
 	/* Interrupts need to be enabled to not hang the CPU */
-	local_irq_enable();
+	raw_local_irq_enable();
 
 	__asm__ __volatile__ (
 		"wr	%%g0, %%asr19\n"
@@ -66,7 +66,7 @@ static void pmc_leon_idle_fixup(void)
 static void pmc_leon_idle(void)
 {
 	/* Interrupts need to be enabled to not hang the CPU */
-	local_irq_enable();
+	raw_local_irq_enable();
 
 	/* For systems without power-down, this will be no-op */
 	__asm__ __volatile__ ("wr	%g0, %asr19\n\t");
diff --git a/arch/sparc/kernel/process_32.c b/arch/sparc/kernel/process_32.c
index adfcaeab3ddc5..a023637359154 100644
--- a/arch/sparc/kernel/process_32.c
+++ b/arch/sparc/kernel/process_32.c
@@ -74,7 +74,7 @@ void arch_cpu_idle(void)
 {
 	if (sparc_idle)
 		(*sparc_idle)();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /* XXX cli/sti -> local_irq_xxx here, check this works once SMP is fixed. */
diff --git a/arch/sparc/kernel/process_64.c b/arch/sparc/kernel/process_64.c
index a75093b993f9a..6f8c7822fc065 100644
--- a/arch/sparc/kernel/process_64.c
+++ b/arch/sparc/kernel/process_64.c
@@ -62,11 +62,11 @@ void arch_cpu_idle(void)
 {
 	if (tlb_type != hypervisor) {
 		touch_nmi_watchdog();
-		local_irq_enable();
+		raw_local_irq_enable();
 	} else {
 		unsigned long pstate;
 
-		local_irq_enable();
+		raw_local_irq_enable();
 
                 /* The sun4v sleeping code requires that we have PSTATE.IE cleared over
                  * the cpu sleep hypervisor call.
diff --git a/arch/um/kernel/process.c b/arch/um/kernel/process.c
index 26b5e243d3fc0..495f101792b3d 100644
--- a/arch/um/kernel/process.c
+++ b/arch/um/kernel/process.c
@@ -217,7 +217,7 @@ void arch_cpu_idle(void)
 {
 	cpu_tasks[current_thread_info()->cpu].pid = os_getpid();
 	um_idle_sleep();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 int __cant_sleep(void) {
diff --git a/arch/x86/include/asm/mwait.h b/arch/x86/include/asm/mwait.h
index e039a933aca3c..29dd27b5a339d 100644
--- a/arch/x86/include/asm/mwait.h
+++ b/arch/x86/include/asm/mwait.h
@@ -88,8 +88,6 @@ static inline void __mwaitx(unsigned long eax, unsigned long ebx,
 
 static inline void __sti_mwait(unsigned long eax, unsigned long ecx)
 {
-	trace_hardirqs_on();
-
 	mds_idle_clear_cpu_buffers();
 	/* "mwait %eax, %ecx;" */
 	asm volatile("sti; .byte 0x0f, 0x01, 0xc9;"
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index ba4593a913fab..145a7ac0c19aa 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -685,7 +685,7 @@ void arch_cpu_idle(void)
  */
 void __cpuidle default_idle(void)
 {
-	safe_halt();
+	raw_safe_halt();
 }
 #if defined(CONFIG_APM_MODULE) || defined(CONFIG_HALTPOLL_CPUIDLE_MODULE)
 EXPORT_SYMBOL(default_idle);
@@ -736,6 +736,8 @@ void stop_this_cpu(void *dummy)
 /*
  * AMD Erratum 400 aware idle routine. We handle it the same way as C3 power
  * states (local apic timer and TSC stop).
+ *
+ * XXX this function is completely buggered vs RCU and tracing.
  */
 static void amd_e400_idle(void)
 {
@@ -757,9 +759,9 @@ static void amd_e400_idle(void)
 	 * The switch back from broadcast mode needs to be called with
 	 * interrupts disabled.
 	 */
-	local_irq_disable();
+	raw_local_irq_disable();
 	tick_broadcast_exit();
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /*
@@ -801,9 +803,9 @@ static __cpuidle void mwait_idle(void)
 		if (!need_resched())
 			__sti_mwait(0, 0);
 		else
-			local_irq_enable();
+			raw_local_irq_enable();
 	} else {
-		local_irq_enable();
+		raw_local_irq_enable();
 	}
 	__current_clr_polling();
 }
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index f324dc36fc43d..dee807ffad11b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -78,7 +78,7 @@ void __weak arch_cpu_idle_dead(void) { }
 void __weak arch_cpu_idle(void)
 {
 	cpu_idle_force_poll = 1;
-	local_irq_enable();
+	raw_local_irq_enable();
 }
 
 /**
@@ -94,9 +94,35 @@ void __cpuidle default_idle_call(void)
 
 		trace_cpu_idle(1, smp_processor_id());
 		stop_critical_timings();
+
+		/*
+		 * arch_cpu_idle() is supposed to enable IRQs, however
+		 * we can't do that because of RCU and tracing.
+		 *
+		 * Trace IRQs enable here, then switch off RCU, and have
+		 * arch_cpu_idle() use raw_local_irq_enable(). Note that
+		 * rcu_idle_enter() relies on lockdep IRQ state, so switch that
+		 * last -- this is very similar to the entry code.
+		 */
+		trace_hardirqs_on_prepare();
+		lockdep_hardirqs_on_prepare(_THIS_IP_);
 		rcu_idle_enter();
+		lockdep_hardirqs_on(_THIS_IP_);
+
 		arch_cpu_idle();
+
+		/*
+		 * OK, so IRQs are enabled here, but RCU needs them disabled to
+		 * turn itself back on.. funny thing is that disabling IRQs
+		 * will cause tracing, which needs RCU. Jump through hoops to
+		 * make it 'work'.
+		 */
+		raw_local_irq_disable();
+		lockdep_hardirqs_off(_THIS_IP_);
 		rcu_idle_exit();
+		lockdep_hardirqs_on(_THIS_IP_);
+		raw_local_irq_enable();
+
 		start_critical_timings();
 		trace_cpu_idle(PWR_EVENT_EXIT, smp_processor_id());
 	}
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 05/23] powerpc: Drop -me200 addition to build flags
From: Sasha Levin @ 2020-12-03 13:29 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, kernel test robot, Németh Márton,
	Nick Desaulniers, Scott Wood, linuxppc-dev
In-Reply-To: <20201203132935.931362-1-sashal@kernel.org>

From: Michael Ellerman <mpe@ellerman.id.au>

[ Upstream commit e02152ba2810f7c88cb54e71cda096268dfa9241 ]

Currently a build with CONFIG_E200=y will fail with:

  Error: invalid switch -me200
  Error: unrecognized option -me200

Upstream binutils has never supported an -me200 option. Presumably it
was supported at some point by either a fork or Freescale internal
binutils.

We can't support code that we can't even build test, so drop the
addition of -me200 to the build flags, so we can at least build with
CONFIG_E200=y.

Reported-by: Németh Márton <nm127@freemail.hu>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Scott Wood <oss@buserror.net>
Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 37ac731a556b8..9f73fb6b1cc91 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -250,7 +250,6 @@ KBUILD_CFLAGS		+= $(call cc-option,-mno-string)
 
 cpu-as-$(CONFIG_4xx)		+= -Wa,-m405
 cpu-as-$(CONFIG_ALTIVEC)	+= $(call as-option,-Wa$(comma)-maltivec)
-cpu-as-$(CONFIG_E200)		+= -Wa,-me200
 cpu-as-$(CONFIG_E500)		+= -Wa,-me500
 
 # When using '-many -mpower4' gas will first try and find a matching power4
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 12/23] ibmvnic: skip tx timeout reset while in resetting
From: Sasha Levin @ 2020-12-03 13:29 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, netdev, Dany Madden, Lijun Pan, Brian King,
	Jakub Kicinski, linuxppc-dev
In-Reply-To: <20201203132935.931362-1-sashal@kernel.org>

From: Lijun Pan <ljp@linux.ibm.com>

[ Upstream commit 855a631a4c11458a9cef1ab79c1530436aa95fae ]

Sometimes it takes longer than 5 seconds (watchdog timeout) to complete
failover, migration, and other resets. In stead of scheduling another
timeout reset, we wait for the current one to complete.

Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index e53994ca3142c..af1d9d2150616 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2266,6 +2266,12 @@ static void ibmvnic_tx_timeout(struct net_device *dev)
 {
 	struct ibmvnic_adapter *adapter = netdev_priv(dev);
 
+	if (test_bit(0, &adapter->resetting)) {
+		netdev_err(adapter->netdev,
+			   "Adapter is resetting, skip timeout reset\n");
+		return;
+	}
+
 	ibmvnic_reset(adapter, VNIC_RESET_TIMEOUT);
 }
 
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.4 15/23] soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
From: Sasha Levin @ 2020-12-03 13:29 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yi Wang, Sasha Levin, Hao Si, Li Yang, Lin Chen, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201203132935.931362-1-sashal@kernel.org>

From: Hao Si <si.hao@zte.com.cn>

[ Upstream commit 2663b3388551230cbc4606a40fabf3331ceb59e4 ]

The local variable 'cpumask_t mask' is in the stack memory, and its address
is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
But the memory area where this variable is located is at risk of being
modified.

During LTP testing, the following error was generated:

Unable to handle kernel paging request at virtual address ffff000012e9b790
Mem abort info:
  ESR = 0x96000007
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000007
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000075ac5e07
[ffff000012e9b790] pgd=00000027dbffe003, pud=00000027dbffd003,
pmd=00000027b6d61003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in: xt_conntrack
Process read_all (pid: 20171, stack limit = 0x0000000044ea4095)
CPU: 14 PID: 20171 Comm: read_all Tainted: G    B   W
Hardware name: NXP Layerscape LX2160ARDB (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO)
pc : irq_affinity_hint_proc_show+0x54/0xb0
lr : irq_affinity_hint_proc_show+0x4c/0xb0
sp : ffff00001138bc10
x29: ffff00001138bc10 x28: 0000ffffd131d1e0
x27: 00000000007000c0 x26: ffff8025b9480dc0
x25: ffff8025b9480da8 x24: 00000000000003ff
x23: ffff8027334f8300 x22: ffff80272e97d000
x21: ffff80272e97d0b0 x20: ffff8025b9480d80
x19: ffff000009a49000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000040
x11: 0000000000000000 x10: ffff802735b79b88
x9 : 0000000000000000 x8 : 0000000000000000
x7 : ffff000009a49848 x6 : 0000000000000003
x5 : 0000000000000000 x4 : ffff000008157d6c
x3 : ffff00001138bc10 x2 : ffff000012e9b790
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 irq_affinity_hint_proc_show+0x54/0xb0
 seq_read+0x1b0/0x440
 proc_reg_read+0x80/0xd8
 __vfs_read+0x60/0x178
 vfs_read+0x94/0x150
 ksys_read+0x74/0xf0
 __arm64_sys_read+0x24/0x30
 el0_svc_common.constprop.0+0xd8/0x1a0
 el0_svc_handler+0x34/0x88
 el0_svc+0x10/0x14
Code: f9001bbf 943e0732 f94066c2 b4000062 (f9400041)
---[ end trace b495bdcb0b3b732b ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
Kernel Offset: disabled
CPU features: 0x0,21006008
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---

Fix it by using 'cpumask_of(cpu)' to get the cpumask.

Signed-off-by: Hao Si <si.hao@zte.com.cn>
Signed-off-by: Lin Chen <chen.lin5@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/soc/fsl/dpio/dpio-driver.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c
index 7b642c330977f..7f397b4ad878d 100644
--- a/drivers/soc/fsl/dpio/dpio-driver.c
+++ b/drivers/soc/fsl/dpio/dpio-driver.c
@@ -95,7 +95,6 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 {
 	int error;
 	struct fsl_mc_device_irq *irq;
-	cpumask_t mask;
 
 	irq = dpio_dev->irqs[0];
 	error = devm_request_irq(&dpio_dev->dev,
@@ -112,9 +111,7 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 	}
 
 	/* set the affinity hint */
-	cpumask_clear(&mask);
-	cpumask_set_cpu(cpu, &mask);
-	if (irq_set_affinity_hint(irq->msi_desc->irq, &mask))
+	if (irq_set_affinity_hint(irq->msi_desc->irq, cpumask_of(cpu)))
 		dev_err(&dpio_dev->dev,
 			"irq_set_affinity failed irq %d cpu %d\n",
 			irq->msi_desc->irq, cpu);
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 04/14] powerpc: Drop -me200 addition to build flags
From: Sasha Levin @ 2020-12-03 13:30 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, kernel test robot, Németh Márton,
	Nick Desaulniers, Scott Wood, linuxppc-dev
In-Reply-To: <20201203133010.931600-1-sashal@kernel.org>

From: Michael Ellerman <mpe@ellerman.id.au>

[ Upstream commit e02152ba2810f7c88cb54e71cda096268dfa9241 ]

Currently a build with CONFIG_E200=y will fail with:

  Error: invalid switch -me200
  Error: unrecognized option -me200

Upstream binutils has never supported an -me200 option. Presumably it
was supported at some point by either a fork or Freescale internal
binutils.

We can't support code that we can't even build test, so drop the
addition of -me200 to the build flags, so we can at least build with
CONFIG_E200=y.

Reported-by: Németh Márton <nm127@freemail.hu>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Scott Wood <oss@buserror.net>
Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 8954108df4570..f51e21ea53492 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -251,7 +251,6 @@ endif
 
 cpu-as-$(CONFIG_4xx)		+= -Wa,-m405
 cpu-as-$(CONFIG_ALTIVEC)	+= $(call as-option,-Wa$(comma)-maltivec)
-cpu-as-$(CONFIG_E200)		+= -Wa,-me200
 cpu-as-$(CONFIG_E500)		+= -Wa,-me500
 
 # When using '-many -mpower4' gas will first try and find a matching power4
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH 1/2] ASoC: fsl-asoc-card: Add support for si476x codec
From: Mark Brown @ 2020-12-03 14:18 UTC (permalink / raw)
  To: devicetree, linux-kernel, perex, Shengjiu Wang, linuxppc-dev,
	timur, Xiubo.Lee, nicoleotsuka, alsa-devel, lgirdwood, robh+dt,
	tiwai, festevam
In-Reply-To: <1606708668-28786-1-git-send-email-shengjiu.wang@nxp.com>

On Mon, 30 Nov 2020 11:57:47 +0800, Shengjiu Wang wrote:
> The si476x codec is used for FM radio function on i.MX6
> auto board, it only supports recording function.

Applied to

   https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound.git for-next

Thanks!

[1/2] ASoC: fsl-asoc-card: Add support for si476x codec
      commit: 77f1ff751037fcd39c8fc37b3c3796fb139fb388
[2/2] ASoC: bindings: fsl-asoc-card: add compatible string for si476x codec
      commit: 0b3355b070434f9901f641aac9000df93e2c96ad

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

^ permalink raw reply

* Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m
From: Uladzislau Rezki @ 2020-12-03 14:34 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: rcu, Uladzislau Rezki, linuxppc-dev, Paul E . McKenney,
	Daniel Axtens
In-Reply-To: <87sg8nv277.fsf@mpe.ellerman.id.au>

On Thu, Dec 03, 2020 at 05:22:20PM +1100, Michael Ellerman wrote:
> Uladzislau Rezki <urezki@gmail.com> writes:
> > On Thu, Dec 03, 2020 at 01:03:32AM +1100, Michael Ellerman wrote:
> ...
> >> 
> >> The SMP bringup stalls because _cpu_up() is blocked trying to take
> >> cpu_hotplug_lock for writing:
> >> 
> >> [  401.403132][    T0] task:swapper/0       state:D stack:12512 pid:    1 ppid:     0 flags:0x00000800
> >> [  401.403502][    T0] Call Trace:
> >> [  401.403907][    T0] [c0000000062c37d0] [c0000000062c3830] 0xc0000000062c3830 (unreliable)
> >> [  401.404068][    T0] [c0000000062c39b0] [c000000000019d70] __switch_to+0x2e0/0x4a0
> >> [  401.404189][    T0] [c0000000062c3a10] [c000000000b87228] __schedule+0x288/0x9b0
> >> [  401.404257][    T0] [c0000000062c3ad0] [c000000000b879b8] schedule+0x68/0x120
> >> [  401.404324][    T0] [c0000000062c3b00] [c000000000184ad4] percpu_down_write+0x164/0x170
> >> [  401.404390][    T0] [c0000000062c3b50] [c000000000116b68] _cpu_up+0x68/0x280
> >> [  401.404475][    T0] [c0000000062c3bb0] [c000000000116e70] cpu_up+0xf0/0x140
> >> [  401.404546][    T0] [c0000000062c3c30] [c00000000011776c] bringup_nonboot_cpus+0xac/0xf0
> >> [  401.404643][    T0] [c0000000062c3c80] [c000000000eea1b8] smp_init+0x40/0xcc
> >> [  401.404727][    T0] [c0000000062c3ce0] [c000000000ec43dc] kernel_init_freeable+0x1e0/0x3a0
> >> [  401.404799][    T0] [c0000000062c3db0] [c000000000011ec4] kernel_init+0x24/0x150
> >> [  401.404958][    T0] [c0000000062c3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
> >> 
> >> It can't get it because kprobe_optimizer() has taken it for read and is now
> >> blocked waiting for synchronize_rcu_tasks():
> >> 
> >> [  401.418808][    T0] task:kworker/0:1     state:D stack:13392 pid:   12 ppid:     2 flags:0x00000800
> >> [  401.418951][    T0] Workqueue: events kprobe_optimizer
> >> [  401.419078][    T0] Call Trace:
> >> [  401.419121][    T0] [c0000000062ef650] [c0000000062ef710] 0xc0000000062ef710 (unreliable)
> >> [  401.419213][    T0] [c0000000062ef830] [c000000000019d70] __switch_to+0x2e0/0x4a0
> >> [  401.419281][    T0] [c0000000062ef890] [c000000000b87228] __schedule+0x288/0x9b0
> >> [  401.419347][    T0] [c0000000062ef950] [c000000000b879b8] schedule+0x68/0x120
> >> [  401.419415][    T0] [c0000000062ef980] [c000000000b8e664] schedule_timeout+0x2a4/0x340
> >> [  401.419484][    T0] [c0000000062efa80] [c000000000b894ec] wait_for_completion+0x9c/0x170
> >> [  401.419552][    T0] [c0000000062efae0] [c0000000001ac85c] __wait_rcu_gp+0x19c/0x210
> >> [  401.419619][    T0] [c0000000062efb40] [c0000000001ac90c] synchronize_rcu_tasks_generic+0x3c/0x70
> >> [  401.419690][    T0] [c0000000062efbe0] [c00000000022a3dc] kprobe_optimizer+0x1dc/0x470
> >> [  401.419757][    T0] [c0000000062efc60] [c000000000136684] process_one_work+0x2f4/0x530
> >> [  401.419823][    T0] [c0000000062efd20] [c000000000138d28] worker_thread+0x78/0x570
> >> [  401.419891][    T0] [c0000000062efdb0] [c000000000142424] kthread+0x194/0x1a0
> >> [  401.419976][    T0] [c0000000062efe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
> >> 
> >> But why is the synchronize_rcu_tasks() not completing?
> >> 
> > I think that it is because RCU is not fully initialized by that time.
> 
> Yeah that would explain it :)
> 
> > The 36dadef23fcc ("kprobes: Init kprobes in early_initcall") patch
> > switches to early_initcall() that has a higher priority sequence than
> > core_initcall() that is used to complete an RCU setup in the rcu_set_runtime_mode().
> 
> I was looking at debug_lockdep_rcu_enabled(), which is:
> 
> noinstr int notrace debug_lockdep_rcu_enabled(void)
> {
> 	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
> 	       current->lockdep_recursion == 0;
> }
> 
> That is not firing any warnings for me because rcu_scheduler_active is:
> 
> (gdb) p/x rcu_scheduler_active
> $1 = 0x1
> 
> Which is:
> 
> #define RCU_SCHEDULER_INIT	1
> 

Agree with that.

> But that's different to RCU_SCHEDULER_RUNNING, which is set in
> rcu_set_runtime_mode() as you mentioned:
> 
> static int __init rcu_set_runtime_mode(void)
> {
> 	rcu_test_sync_prims();
> 	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
> 	kfree_rcu_scheduler_running();
> 	rcu_test_sync_prims();
> 	return 0;
> }
> 
BTW, since you can reproduce it and have a test setup, could you please
check that:

<snip>
-core_initcall(rcu_set_runtime_mode);
+early_initcall(rcu_set_runtime_mode);
<snip>

Just in case. The the synchronize_rcu_tasks() gets blocked:

<snip>
void __wait_rcu_gp(bool checktiny, int n, call_rcu_func_t *crcu_array,
    if (crcu_array[j] == crcu_array[i])
                                break;
        if (j == i) {
            wait_for_completion(&rs_array[i].completion); <--- here
            destroy_rcu_head_on_stack(&rs_array[i].head);
        }
...
<snip>

but that is obvious when looking at your full traces.

> The comment on rcu_scheduler_active implies that once we're at
> RCU_SCHEDULER_INIT things should work:
> 
> /*
>  * The rcu_scheduler_active variable is initialized to the value
>  * RCU_SCHEDULER_INACTIVE and transitions RCU_SCHEDULER_INIT just before the
>  * first task is spawned.  So when this variable is RCU_SCHEDULER_INACTIVE,
>  * RCU can assume that there is but one task, allowing RCU to (for example)
>  * optimize synchronize_rcu() to a simple barrier().  When this variable
>  * is RCU_SCHEDULER_INIT, RCU must actually do all the hard work required
>  * to detect real grace periods.  This variable is also used to suppress
>  * boot-time false positives from lockdep-RCU error checking.  Finally, it
>  * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU
>  * is fully initialized, including all of its kthreads having been spawned.
>  */
> 
> 
> So I'm not sure, the comments and the debug checks imply that it is OK
> for kprobes to be using RCU this early.
>
Sounds like it should be possible.

> 
> I guess I'll keep digging.
> 
Thank you! I also will dig further with that even though i do not have a setup
for reproducing it.

--
Vlad Rezki

^ permalink raw reply

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Rik van Riel @ 2020-12-03 14:26 UTC (permalink / raw)
  To: Matthew Wilcox, Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, Dave Hansen, Will Deacon, X86 ML, LKML,
	Nicholas Piggin, Linux-MM, Mathieu Desnoyers, Catalin Marinas,
	linuxppc-dev
In-Reply-To: <20201203123129.GH11935@casper.infradead.org>

[-- Attachment #1: Type: text/plain, Size: 771 bytes --]

On Thu, 2020-12-03 at 12:31 +0000, Matthew Wilcox wrote:

> And this just makes me think RCU freeing of mm_struct.  I'm sure it's
> more complicated than that (then, or now), but if an anonymous
> process
> is borrowing a freed mm, and the mm is freed by RCU then it will not
> go
> away until the task context switches.  When we context switch back to
> the anon task, it'll borrow some other task's MM and won't even
> notice
> that the MM it was using has gone away.

One major complication here is that most of the
active_mm borrowing is done by the idle task,
but RCU does not wait for idle tasks to context
switch.

That means RCU, as it is today, is not a
mechanism that mm_struct freeing could just
piggyback off.

-- 
All Rights Reversed.

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply

* Re: [PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing
From: Heiko Carstens @ 2020-12-03 14:54 UTC (permalink / raw)
  To: Sasha Levin
  Cc: Mark Rutland, uclinux-h8-devel, linux-ia64, linux-parisc,
	linux-s390, Peter Zijlstra, linux-hexagon, linux-sh, linux-um,
	linux-kernel, stable, linux-mips, openrisc, linux-csky,
	Sven Schnelle, linux-alpha, sparclinux, linux-riscv, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201203132834.930999-27-sashal@kernel.org>

On Thu, Dec 03, 2020 at 08:28:21AM -0500, Sasha Levin wrote:
> From: Peter Zijlstra <peterz@infradead.org>
> 
> [ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ]
> 
> We call arch_cpu_idle() with RCU disabled, but then use
> local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
> 
> Switch all arch_cpu_idle() implementations to use
> raw_local_irq_{en,dis}able() and carefully manage the
> lockdep,rcu,tracing state like we do in entry.
> 
> (XXX: we really should change arch_cpu_idle() to not return with
> interrupts enabled)
> 
> Reported-by: Sven Schnelle <svens@linux.ibm.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> Tested-by: Mark Rutland <mark.rutland@arm.com>
> Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
> Signed-off-by: Sasha Levin <sashal@kernel.org>

This patch broke s390 irq state tracing. A patch to fix this is
scheduled to be merged upstream today (hopefully).
Therefore I think this patch should not yet go into 5.9 stable.

^ permalink raw reply

* [PATCH AUTOSEL 4.19 10/14] soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
From: Sasha Levin @ 2020-12-03 13:30 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yi Wang, Sasha Levin, Hao Si, Li Yang, Lin Chen, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201203133010.931600-1-sashal@kernel.org>

From: Hao Si <si.hao@zte.com.cn>

[ Upstream commit 2663b3388551230cbc4606a40fabf3331ceb59e4 ]

The local variable 'cpumask_t mask' is in the stack memory, and its address
is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
But the memory area where this variable is located is at risk of being
modified.

During LTP testing, the following error was generated:

Unable to handle kernel paging request at virtual address ffff000012e9b790
Mem abort info:
  ESR = 0x96000007
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000007
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000075ac5e07
[ffff000012e9b790] pgd=00000027dbffe003, pud=00000027dbffd003,
pmd=00000027b6d61003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in: xt_conntrack
Process read_all (pid: 20171, stack limit = 0x0000000044ea4095)
CPU: 14 PID: 20171 Comm: read_all Tainted: G    B   W
Hardware name: NXP Layerscape LX2160ARDB (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO)
pc : irq_affinity_hint_proc_show+0x54/0xb0
lr : irq_affinity_hint_proc_show+0x4c/0xb0
sp : ffff00001138bc10
x29: ffff00001138bc10 x28: 0000ffffd131d1e0
x27: 00000000007000c0 x26: ffff8025b9480dc0
x25: ffff8025b9480da8 x24: 00000000000003ff
x23: ffff8027334f8300 x22: ffff80272e97d000
x21: ffff80272e97d0b0 x20: ffff8025b9480d80
x19: ffff000009a49000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000040
x11: 0000000000000000 x10: ffff802735b79b88
x9 : 0000000000000000 x8 : 0000000000000000
x7 : ffff000009a49848 x6 : 0000000000000003
x5 : 0000000000000000 x4 : ffff000008157d6c
x3 : ffff00001138bc10 x2 : ffff000012e9b790
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 irq_affinity_hint_proc_show+0x54/0xb0
 seq_read+0x1b0/0x440
 proc_reg_read+0x80/0xd8
 __vfs_read+0x60/0x178
 vfs_read+0x94/0x150
 ksys_read+0x74/0xf0
 __arm64_sys_read+0x24/0x30
 el0_svc_common.constprop.0+0xd8/0x1a0
 el0_svc_handler+0x34/0x88
 el0_svc+0x10/0x14
Code: f9001bbf 943e0732 f94066c2 b4000062 (f9400041)
---[ end trace b495bdcb0b3b732b ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
Kernel Offset: disabled
CPU features: 0x0,21006008
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---

Fix it by using 'cpumask_of(cpu)' to get the cpumask.

Signed-off-by: Hao Si <si.hao@zte.com.cn>
Signed-off-by: Lin Chen <chen.lin5@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/soc/fsl/dpio/dpio-driver.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c
index b60b77bfaffae..ea6f8904c01b5 100644
--- a/drivers/soc/fsl/dpio/dpio-driver.c
+++ b/drivers/soc/fsl/dpio/dpio-driver.c
@@ -53,7 +53,6 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 	struct dpio_priv *priv;
 	int error;
 	struct fsl_mc_device_irq *irq;
-	cpumask_t mask;
 
 	priv = dev_get_drvdata(&dpio_dev->dev);
 
@@ -72,9 +71,7 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 	}
 
 	/* set the affinity hint */
-	cpumask_clear(&mask);
-	cpumask_set_cpu(cpu, &mask);
-	if (irq_set_affinity_hint(irq->msi_desc->irq, &mask))
+	if (irq_set_affinity_hint(irq->msi_desc->irq, cpumask_of(cpu)))
 		dev_err(&dpio_dev->dev,
 			"irq_set_affinity failed irq %d cpu %d\n",
 			irq->msi_desc->irq, cpu);
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Alexander Gordeev @ 2020-12-03 17:03 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, linuxppc-dev, Arnd Bergmann, Vasily Gorbik,
	Christian Borntraeger, Peter Zijlstra, Catalin Marinas,
	Heiko Carstens, X86 ML, LKML, Nicholas Piggin, Linux-MM,
	Dave Hansen, Mathieu Desnoyers, Will Deacon
In-Reply-To: <CALCETrXAR_9EGaOF8ymVkZycxgZkYk0dR+NjEpTfVzdcS3sOVw@mail.gmail.com>

On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> other arch folk: there's some background here:
> 
> https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
> 
> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski <luto@kernel.org> wrote:
> >
> > On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@kernel.org> wrote:
> > >
> > > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> > > >
> > > > On big systems, the mm refcount can become highly contented when doing
> > > > a lot of context switching with threaded applications (particularly
> > > > switching between the idle thread and an application thread).
> > > >
> > > > Abandoning lazy tlb slows switching down quite a bit in the important
> > > > user->idle->user cases, so so instead implement a non-refcounted scheme
> > > > that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
> > > > any remaining lazy ones.
> > > >
> > > > Shootdown IPIs are some concern, but they have not been observed to be
> > > > a big problem with this scheme (the powerpc implementation generated
> > > > 314 additional interrupts on a 144 CPU system during a kernel compile).
> > > > There are a number of strategies that could be employed to reduce IPIs
> > > > if they turn out to be a problem for some workload.
> > >
> > > I'm still wondering whether we can do even better.
> > >
> >
> > Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
> > the TLB.  On x86, this will shoot down all lazies as long as even a
> > single pagetable was freed.  (Or at least it will if we don't have a
> > serious bug, but the code seems okay.  We'll hit pmd_free_tlb, which
> > sets tlb->freed_tables, which will trigger the IPI.)  So, on
> > architectures like x86, the shootdown approach should be free.  The
> > only way it ought to have any excess IPIs is if we have CPUs in
> > mm_cpumask() that don't need IPI to free pagetables, which could
> > happen on paravirt.
> 
> Indeed, on x86, we do this:
> 
> [   11.558844]  flush_tlb_mm_range.cold+0x18/0x1d
> [   11.559905]  tlb_finish_mmu+0x10e/0x1a0
> [   11.561068]  exit_mmap+0xc8/0x1a0
> [   11.561932]  mmput+0x29/0xd0
> [   11.562688]  do_exit+0x316/0xa90
> [   11.563588]  do_group_exit+0x34/0xb0
> [   11.564476]  __x64_sys_exit_group+0xf/0x10
> [   11.565512]  do_syscall_64+0x34/0x50
> 
> and we have info->freed_tables set.
> 
> What are the architectures that have large systems like?
> 
> x86: we already zap lazies, so it should cost basically nothing to do
> a little loop at the end of __mmput() to make sure that no lazies are
> left.  If we care about paravirt performance, we could implement one
> of the optimizations I mentioned above to fix up the refcounts instead
> of sending an IPI to any remaining lazies.
> 
> arm64: AFAICT arm64's flush uses magic arm64 hardware support for
> remote flushes, so any lazy mm references will still exist after
> exit_mmap().  (arm64 uses lazy TLB, right?)  So this is kind of like
> the x86 paravirt case.  Are there large enough arm64 systems that any
> of this matters?
> 
> s390x: The code has too many acronyms for me to understand it fully,
> but I think it's more or less the same situation as arm64.  How big do
> s390x systems come?
> 
> power: Ridiculously complicated, seems to vary by system and kernel config.
> 
> So, Nick, your unconditional IPI scheme is apparently a big
> improvement for power, and it should be an improvement and have low
> cost for x86.  On arm64 and s390x it will add more IPIs on process
> exit but reduce contention on context switching depending on how lazy

s390 does not invalidate TLBs per-CPU explicitly - we have special
instructions for that. Those in turn initiate signalling to other
CPUs, completely transparent to OS.

Apart from mm_count, I am struggling to realize how the suggested
scheme could change the the contention on s390 in connection with
TLB. Could you clarify a bit here, please?

> TLB works.  I suppose we could try it for all architectures without
> any further optimizations.  Or we could try one of the perhaps
> excessively clever improvements I linked above.  arm64, s390x people,
> what do you think?

I do not immediately see anything in the series that would harm
performance on s390.

We however use mm_cpumask to distinguish between local and global TLB
flushes. With this series it looks like mm_cpumask is *required* to
be consistent with lazy users. And that is something quite diffucult
for us to adhere (at least in the foreseeable future).

But actually keeping track of lazy users in a cpumask is something
the generic code would rather do AFAICT.

Thanks!

^ permalink raw reply

* Re: [PATCH AUTOSEL 5.9 27/39] sched/idle: Fix arch_cpu_idle() vs tracing
From: Peter Zijlstra @ 2020-12-03 17:10 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Sasha Levin, Mark Rutland, linux-ia64, linux-parisc, linux-s390,
	linux-hexagon, linux-sh, linux-um, linux-kernel, stable,
	linux-mips, openrisc, sparclinux, linux-csky, Sven Schnelle,
	linux-alpha, uclinux-h8-devel, linux-riscv, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201203145442.GC9994@osiris>

On Thu, Dec 03, 2020 at 03:54:42PM +0100, Heiko Carstens wrote:
> On Thu, Dec 03, 2020 at 08:28:21AM -0500, Sasha Levin wrote:
> > From: Peter Zijlstra <peterz@infradead.org>
> > 
> > [ Upstream commit 58c644ba512cfbc2e39b758dd979edd1d6d00e27 ]
> > 
> > We call arch_cpu_idle() with RCU disabled, but then use
> > local_irq_{en,dis}able(), which invokes tracing, which relies on RCU.
> > 
> > Switch all arch_cpu_idle() implementations to use
> > raw_local_irq_{en,dis}able() and carefully manage the
> > lockdep,rcu,tracing state like we do in entry.
> > 
> > (XXX: we really should change arch_cpu_idle() to not return with
> > interrupts enabled)
> > 
> > Reported-by: Sven Schnelle <svens@linux.ibm.com>
> > Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> > Reviewed-by: Mark Rutland <mark.rutland@arm.com>
> > Tested-by: Mark Rutland <mark.rutland@arm.com>
> > Link: https://lkml.kernel.org/r/20201120114925.594122626@infradead.org
> > Signed-off-by: Sasha Levin <sashal@kernel.org>
> 
> This patch broke s390 irq state tracing. A patch to fix this is
> scheduled to be merged upstream today (hopefully).
> Therefore I think this patch should not yet go into 5.9 stable.

Agreed.

^ permalink raw reply

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Andy Lutomirski @ 2020-12-03 17:14 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-arch, linuxppc-dev, Arnd Bergmann, Vasily Gorbik,
	Dave Hansen, Peter Zijlstra, Catalin Marinas, Heiko Carstens,
	X86 ML, LKML, Nicholas Piggin, Linux-MM, Christian Borntraeger,
	Mathieu Desnoyers, Andy Lutomirski, Will Deacon
In-Reply-To: <20201203170332.GA27195@oc3871087118.ibm.com>



> On Dec 3, 2020, at 9:09 AM, Alexander Gordeev <agordeev@linux.ibm.com> wrote:
> 
> On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
>> other arch folk: there's some background here:

> 
>> 
>> power: Ridiculously complicated, seems to vary by system and kernel config.
>> 
>> So, Nick, your unconditional IPI scheme is apparently a big
>> improvement for power, and it should be an improvement and have low
>> cost for x86.  On arm64 and s390x it will add more IPIs on process
>> exit but reduce contention on context switching depending on how lazy
> 
> s390 does not invalidate TLBs per-CPU explicitly - we have special
> instructions for that. Those in turn initiate signalling to other
> CPUs, completely transparent to OS.

Just to make sure I understand: this means that you broadcast flushes to all CPUs, not just a subset?

> 
> Apart from mm_count, I am struggling to realize how the suggested
> scheme could change the the contention on s390 in connection with
> TLB. Could you clarify a bit here, please?

I’m just talking about mm_count. Maintaining mm_count is quite expensive on some workloads.

> 
>> TLB works.  I suppose we could try it for all architectures without
>> any further optimizations.  Or we could try one of the perhaps
>> excessively clever improvements I linked above.  arm64, s390x people,
>> what do you think?
> 
> I do not immediately see anything in the series that would harm
> performance on s390.
> 
> We however use mm_cpumask to distinguish between local and global TLB
> flushes. With this series it looks like mm_cpumask is *required* to
> be consistent with lazy users. And that is something quite diffucult
> for us to adhere (at least in the foreseeable future).

You don’t actually need to maintain mm_cpumask — we could scan all CPUs instead.

> 
> But actually keeping track of lazy users in a cpumask is something
> the generic code would rather do AFAICT.

The problem is that arches don’t agree on what the contents of mm_cpumask should be.  Tracking a mask of exactly what the arch wants in generic code is a nontrivial operation.

^ permalink raw reply

* Re: [PATCH] powerpc/hotplug: assign hot added LMB to the right node
From: Greg KH @ 2020-12-03 18:25 UTC (permalink / raw)
  To: Laurent Dufour
  Cc: Nathan Lynch, Scott Cheloha, linux-kernel, stable, Paul Mackerras,
	linuxppc-dev
In-Reply-To: <20201203101514.33591-1-ldufour@linux.ibm.com>

On Thu, Dec 03, 2020 at 11:15:14AM +0100, Laurent Dufour wrote:
> This patch applies to 5.9 and earlier kernels only.
> 
> Since 5.10, this has been fortunately fixed by the commit
> e5e179aa3a39 ("pseries/drmem: don't cache node id in drmem_lmb struct").

Why can't we just backport that patch instead?  It's almost always
better to do that than to have a one-off patch, as almost always those
have bugs in them.

thanks,

greg k-h

^ permalink raw reply

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Alexander Gordeev @ 2020-12-03 18:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, linuxppc-dev, Arnd Bergmann, Vasily Gorbik,
	Dave Hansen, Peter Zijlstra, Catalin Marinas, Heiko Carstens,
	X86 ML, LKML, Nicholas Piggin, Linux-MM, Christian Borntraeger,
	Mathieu Desnoyers, Andy Lutomirski, Will Deacon
In-Reply-To: <E6BC2596-6087-49F2-8758-CA5598998BBE@amacapital.net>

On Thu, Dec 03, 2020 at 09:14:22AM -0800, Andy Lutomirski wrote:
> 
> 
> > On Dec 3, 2020, at 9:09 AM, Alexander Gordeev <agordeev@linux.ibm.com> wrote:
> > 
> > On Mon, Nov 30, 2020 at 10:31:51AM -0800, Andy Lutomirski wrote:
> >> other arch folk: there's some background here:
> 
> > 
> >> 
> >> power: Ridiculously complicated, seems to vary by system and kernel config.
> >> 
> >> So, Nick, your unconditional IPI scheme is apparently a big
> >> improvement for power, and it should be an improvement and have low
> >> cost for x86.  On arm64 and s390x it will add more IPIs on process
> >> exit but reduce contention on context switching depending on how lazy
> > 
> > s390 does not invalidate TLBs per-CPU explicitly - we have special
> > instructions for that. Those in turn initiate signalling to other
> > CPUs, completely transparent to OS.
> 
> Just to make sure I understand: this means that you broadcast flushes to all CPUs, not just a subset?

Correct.
If mm has one CPU attached we flush TLB only for that CPU.
If mm has more than one CPUs attached we flush all CPUs' TLBs.

In fact, details are bit more complicated, since the hardware
is able to flush subsets of TLB entries depending on provided
parameters (e.g page tables used to create that entries).
But we can not select a CPU subset.

> > Apart from mm_count, I am struggling to realize how the suggested
> > scheme could change the the contention on s390 in connection with
> > TLB. Could you clarify a bit here, please?
> 
> I’m just talking about mm_count. Maintaining mm_count is quite expensive on some workloads.
> 
> > 
> >> TLB works.  I suppose we could try it for all architectures without
> >> any further optimizations.  Or we could try one of the perhaps
> >> excessively clever improvements I linked above.  arm64, s390x people,
> >> what do you think?
> > 
> > I do not immediately see anything in the series that would harm
> > performance on s390.
> > 
> > We however use mm_cpumask to distinguish between local and global TLB
> > flushes. With this series it looks like mm_cpumask is *required* to
> > be consistent with lazy users. And that is something quite diffucult
> > for us to adhere (at least in the foreseeable future).
> 
> You don’t actually need to maintain mm_cpumask — we could scan all CPUs instead.
> 
> > 
> > But actually keeping track of lazy users in a cpumask is something
> > the generic code would rather do AFAICT.
> 
> The problem is that arches don’t agree on what the contents of mm_cpumask should be.  Tracking a mask of exactly what the arch wants in generic code is a nontrivial operation.

It could be yet another cpumask or the CPU scan you mentioned.
Just wanted to make sure there is no new requirement for an arch
to maintain mm_cpumask ;)

Thanks, Andy!

^ permalink raw reply

* Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses
From: Qian Cai @ 2020-12-03 17:17 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev; +Cc: cmr, spoorts2, npiggin
In-Reply-To: <20201119231333.361771-4-dja@axtens.net>

On Fri, 2020-11-20 at 10:13 +1100, Daniel Axtens wrote:
> From: Nicholas Piggin <npiggin@gmail.com>
> 
> IBM Power9 processors can speculatively operate on data in the L1 cache
> before it has been completely validated, via a way-prediction mechanism. It
> is not possible for an attacker to determine the contents of impermissible
> memory using this method, since these systems implement a combination of
> hardware and software security measures to prevent scenarios where
> protected data could be leaked.
> 
> However these measures don't address the scenario where an attacker induces
> the operating system to speculatively execute instructions using data that
> the attacker controls. This can be used for example to speculatively bypass
> "kernel user access prevention" techniques, as discovered by Anthony
> Steinhauser of Google's Safeside Project. This is not an attack by itself,
> but there is a possibility it could be used in conjunction with
> side-channels or other weaknesses in the privileged code to construct an
> attack.
> 
> This issue can be mitigated by flushing the L1 cache between privilege
> boundaries of concern. This patch flushes the L1 cache after user accesses.
> 
> This is part of the fix for CVE-2020-4788.
> 
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> Signed-off-by: Daniel Axtens <dja@axtens.net>
> ---
>  .../admin-guide/kernel-parameters.txt         |  4 +
>  .../powerpc/include/asm/book3s/64/kup-radix.h | 66 ++++++++------
>  arch/powerpc/include/asm/exception-64s.h      |  3 +
>  arch/powerpc/include/asm/feature-fixups.h     |  9 ++
>  arch/powerpc/include/asm/kup.h                | 19 +++--
>  arch/powerpc/include/asm/security_features.h  |  3 +
>  arch/powerpc/include/asm/setup.h              |  1 +
>  arch/powerpc/kernel/exceptions-64s.S          | 85 ++++++-------------
>  arch/powerpc/kernel/setup_64.c                | 62 ++++++++++++++
>  arch/powerpc/kernel/vmlinux.lds.S             |  7 ++
>  arch/powerpc/lib/feature-fixups.c             | 50 +++++++++++
>  arch/powerpc/platforms/powernv/setup.c        | 10 ++-
>  arch/powerpc/platforms/pseries/setup.c        |  4 +
>  13 files changed, 233 insertions(+), 90 deletions(-)
[]
> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> index 3ee1ec60be84..97c2394e7dea 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
[]
> +static inline bool
> +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> +{
> +	return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
> +		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
> +		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
> +}

A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost
endlessly on Power8 NV.

.config: 
https://cailca.coding.net/public/linux/mm/git/files/master/powerpc.config

[  391.734028][ T1986] Bug: Read fault blocked by AMR!
[  391.734032][ T1986] WARNING: CPU: 80 PID: 1986 at arch/powerpc/include/asm/book3s/64/kup-radix.h:145 do_page_fault+0x8fc/0xb70
[  391.734232][ T1986] Modules linked in: kvm_hv kvm ip_tables x_tables sd_mod ahci libahci tg3 libata firmware_class libphy dm_mirror dm_region_hash dm_log dm_mod
[  391.734425][ T1986] CPU: 80 PID: 1986 Comm: bash Tainted: G        W         5.10.0-rc6-next-20201203+ #3
[  391.734535][ T1986] NIP:  c00000000004dd1c LR: c00000000004dd18 CTR: 0000000000000000
[  391.734648][ T1986] REGS: c00020003a0bf3a0 TRAP: 0700   Tainted: G        W          (5.10.0-rc6-next-20201203+)
[  391.734768][ T1986] MSR:  9000000000021033 <SF,HV,ME,IR,DR,RI,LE>  CR: 48222284  XER: 00000000
[  391.734906][ T1986] CFAR: c0000000000bb05c IRQMASK: 1 
[  391.734906][ T1986] GPR00: c00000000004dd18 c00020003a0bf630 c000000007fe0d00 000000000000001f 
[  391.734906][ T1986] GPR04: c000000000f1cc58 0000000000000003 0000000000000027 c000201cc6207218 
[  391.734906][ T1986] GPR08: 0000000000000023 0000000000000002 c00020004753bd80 c000000007f1cee8 
[  391.734906][ T1986] GPR12: 0000000000002000 c000201fff7f8380 0000000040000000 0000000110929798 
[  391.734906][ T1986] GPR16: 0000000110929724 00000001108c6988 000000011085f290 000000011092d568 
[  391.734906][ T1986] GPR20: 00000001229f1f80 0000000000000001 0000000000000001 c000000000aa8dc8 
[  391.734906][ T1986] GPR24: c000000000ab4a00 c00020001cc8c880 0000000000000000 0000000000000000 
[  391.734906][ T1986] GPR28: c00000000801aa18 0000000000000160 c00020003a0bf760 0000000000000300 
[  391.735865][ T1986] NIP [c00000000004dd1c] do_page_fault+0x8fc/0xb70
[  391.735947][ T1986] LR [c00000000004dd18] do_page_fault+0x8f8/0xb70
[  391.736033][ T1986] Call Trace:
[  391.736072][ T1986] [c00020003a0bf630] [c00000000004dd18] do_page_fault+0x8f8/0xb70 (unreliable)
[  391.736181][ T1986] [c00020003a0bf6f0] [c00000000000c1b8] handle_page_fault+0x10/0x2c
[  391.736294][ T1986] --- interrupt: 300 at copy_from_kernel_nofault+0x68/0x190
[  391.736294][ T1986]     LR = copy_from_kernel_nofault+0x40/0x190
[  391.736441][ T1986] [c00020003a0bf9f0] [c00020003a0bfa30] 0xc00020003a0bfa30 (unreliable)
[  391.736565][ T1986] [c00020003a0bfa30] [c0000000000edd98] print_worker_info+0xe8/0x1c0
[  391.736672][ T1986] [c00020003a0bfaf0] [c000000000104b0c] sched_show_task+0x2dc/0x350
[  391.736807][ T1986] [c00020003a0bfb70] [c000000000112cd8] show_state_filter+0x148/0x320
[  391.736899][ T1986] [c00020003a0bfbe0] [c00000000070a3f4] sysrq_handle_showstate+0x24/0x40
[  391.736995][ T1986] [c00020003a0bfc00] [c00000000070add4] __handle_sysrq+0x164/0x280
[  391.737111][ T1986] [c00020003a0bfca0] [c00000000070b03c] write_sysrq_trigger+0xfc/0x13c
[  391.737233][ T1986] [c00020003a0bfce0] [c0000000004c579c] proc_reg_write+0x10c/0x1b0
[  391.737327][ T1986] [c00020003a0bfd10] [c0000000003e5ec4] vfs_write+0xf4/0x480
[  391.737431][ T1986] [c00020003a0bfd70] [c0000000003e642c] ksys_write+0x7c/0x140
[  391.737536][ T1986] [c00020003a0bfdc0] [c00000000002c578] system_call_exception+0xf8/0x1d0
[  391.737623][ T1986] [c00020003a0bfe20] [c00000000000d1c8] system_call_common+0xe8/0x218
[  391.737708][ T1986] Instruction dump:
[  391.737777][ T1986] 60000000 2fbb0000 e93e0168 419e007c 2fa90000 3c82f8ac 388467a0 409cfed4 
[  391.737913][ T1986] 3c62f8ac 386368a8 4806d2e1 60000000 <0fe00000> e80100d0 3ae0000b eb410090 
[  391.738041][ T1986] irq event stamp: 126198
[  391.738077][ T1986] hardirqs last  enabled at (126197): [<c00000000002cbf4>] interrupt_exit_kernel_prepare+0xb4/0x250
[  391.738196][ T1986] hardirqs last disabled at (126198): [<c00000000000897c>] data_access_common_virt+0x16c/0x180
[  391.738327][ T1986] softirqs last  enabled at (126196): [<c000000000948f08>] __do_softirq+0x388/0x704
[  391.738427][ T1986] softirqs last disabled at (126191): [<c0000000000c75b8>] irq_exit+0x198/0x1c0
[  392.177827][ T1986] ---[ end trace 8eaf99b33f09def0 ]---
[  392.177934][ T1986] Workqueue:  0x0 (mm_percpu_wq)
[  392.177994][ T1986] Call Trace:
[  392.178048][ T1986] [c00000002afbf9d0] [c000000000ab0ee8] __func__.4060+0x125178/0x185


^ permalink raw reply

* Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses
From: Qian Cai @ 2020-12-03 17:31 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev; +Cc: cmr, spoorts2, npiggin
In-Reply-To: <e82f315e08fe9f13ce4e94259968e0782ebb57a3.camel@redhat.com>

On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote:
> A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost
> endlessly on Power8 NV.

Correction -- POWER9 NV.


^ permalink raw reply

* Re: [PATCH 3/7] powerpc/64s: flush L1D after user accesses
From: Qian Cai @ 2020-12-03 18:50 UTC (permalink / raw)
  To: Daniel Axtens, linuxppc-dev; +Cc: cmr, spoorts2, npiggin
In-Reply-To: <e82f315e08fe9f13ce4e94259968e0782ebb57a3.camel@redhat.com>

On Thu, 2020-12-03 at 12:17 -0500, Qian Cai wrote:
> []
> > +static inline bool
> > +bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
> > +{
> > +	return WARN(mmu_has_feature(MMU_FTR_RADIX_KUAP) &&
> > +		    (regs->kuap & (is_write ? AMR_KUAP_BLOCK_WRITE : AMR_KUAP_BLOCK_READ)),
> > +		    "Bug: %s fault blocked by AMR!", is_write ? "Write" : "Read");
> > +}
> 
> A simple "echo t > /proc/sysrq-trigger" will trigger this warning almost
> endlessly on POWER9 NV.

I have just realized the patch just moved this warning around, so the issue was
pre-existent. Since I have not tested sysrq-t regularly, I am not sure when it
started to break. So far, I have reverted some of those for testing which did
not help, i.e., the sysrq-t issue remains.

16852975f0f  Revert "powerpc/64s: Use early_mmu_has_feature() in set_kuap()"
129e240ead32 Revert "powerpc: Implement user_access_save() and user_access_restore()"
edb0046c842c Revert "powerpc/64s/kuap: Add missing isync to KUAP restore paths"
2d46ee87ce44 Revert "powerpc/64/kuap: Conditionally restore AMR in interrupt exit"
c1e0e805fc57 Revert "powerpc/64s/kuap: Conditionally restore AMR in kuap_restore_amr asm"
7f30b7aaf23a Revert "selftests/powerpc: rfi_flush: disable entry flush if present"
bc9b9967a100 Revert "powerpc/64s: flush L1D on kernel entry"
b77e7b54f5eb Revert "powerpc/64s: flush L1D after user accesses"
22dddf532c64 Revert "powerpc: Only include kup-radix.h for 64-bit Book3S"
2679d155c46a Revert "selftests/powerpc: entry flush test"
87954b9b4243 Revert "selftests/powerpc: refactor entry and rfi_flush tests"
342d82bd4c5d Revert "powerpc/64s: rename pnv|pseries_setup_rfi_flush to _setup_security_mitigations"


^ permalink raw reply

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Nicholas Piggin @ 2020-12-03 22:13 UTC (permalink / raw)
  To: Andy Lutomirski, Peter Zijlstra
  Cc: linux-arch, X86 ML, Arnd Bergmann, Linux-MM, Catalin Marinas,
	Rik van Riel, LKML, Dave Hansen, Will Deacon, Mathieu Desnoyers,
	linuxppc-dev
In-Reply-To: <20201203084448.GF2414@hirez.programming.kicks-ass.net>

Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
> 
>> power: same as ARM, except that the loop may be rather larger since
>> the systems are bigger.  But I imagine it's still faster than Nick's
>> approach -- a cmpxchg to a remote cacheline should still be faster than
>> an IPI shootdown. 
> 
> While a single atomic might be cheaper than an IPI, the comparison
> doesn't work out nicely. You do the xchg() on every unlazy, while the
> IPI would be once per process exit.
> 
> So over the life of the process, it might do very many unlazies, adding
> up to a total cost far in excess of what the single IPI would've been.

Yeah this is the concern, I looked at things that add cost to the
idle switch code and it gets hard to justify the scalability improvement
when you slow these fundmaental things down even a bit.

I still think working on the assumption that IPIs = scary expensive 
might not be correct. An IPI itself is, but you only issue them when 
you've left a lazy mm on another CPU which just isn't that often.

Thanks,
Nick

^ permalink raw reply

* [PATCH] ASoC: fsl_aud2htx: mark PM functions as __maybe_unused
From: Arnd Bergmann @ 2020-12-03 22:28 UTC (permalink / raw)
  To: Timur Tabi, Nicolin Chen, Xiubo Li, Liam Girdwood, Mark Brown,
	Jaroslav Kysela, Takashi Iwai, Shengjiu Wang
  Cc: alsa-devel, Arnd Bergmann, Fabio Estevam, linuxppc-dev,
	linux-kernel, Shengjiu Wang

From: Arnd Bergmann <arnd@arndb.de>

When CONFIG_PM is disabled, we get a warning for unused functions:

sound/soc/fsl/fsl_aud2htx.c:261:12: error: unused function 'fsl_aud2htx_runtime_suspend' [-Werror,-Wunused-function]
static int fsl_aud2htx_runtime_suspend(struct device *dev)
sound/soc/fsl/fsl_aud2htx.c:271:12: error: unused function 'fsl_aud2htx_runtime_resume' [-Werror,-Wunused-function]
static int fsl_aud2htx_runtime_resume(struct device *dev)

Mark these as __maybe_unused to avoid the warning without adding
an #ifdef.

Fixes: 8a24c834c053 ("ASoC: fsl_aud2htx: Add aud2htx module driver")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 sound/soc/fsl/fsl_aud2htx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/sound/soc/fsl/fsl_aud2htx.c b/sound/soc/fsl/fsl_aud2htx.c
index 4091ccc7c3e9..d70d5e75a30c 100644
--- a/sound/soc/fsl/fsl_aud2htx.c
+++ b/sound/soc/fsl/fsl_aud2htx.c
@@ -258,7 +258,7 @@ static int fsl_aud2htx_remove(struct platform_device *pdev)
 	return 0;
 }
 
-static int fsl_aud2htx_runtime_suspend(struct device *dev)
+static int __maybe_unused fsl_aud2htx_runtime_suspend(struct device *dev)
 {
 	struct fsl_aud2htx *aud2htx = dev_get_drvdata(dev);
 
@@ -268,7 +268,7 @@ static int fsl_aud2htx_runtime_suspend(struct device *dev)
 	return 0;
 }
 
-static int fsl_aud2htx_runtime_resume(struct device *dev)
+static int __maybe_unused fsl_aud2htx_runtime_resume(struct device *dev)
 {
 	struct fsl_aud2htx *aud2htx = dev_get_drvdata(dev);
 	int ret;
-- 
2.27.0


^ permalink raw reply related

* Re: [PATCH 0/6] Add documentation for Documentation/features at the built docs
From: Jonathan Corbet @ 2020-12-03 22:36 UTC (permalink / raw)
  To: Mauro Carvalho Chehab
  Cc: Rich Felker, linux-ia64, Linux Doc Mailing List, x86, linux-mips,
	James E.J. Bottomley, Paul Mackerras, H. Peter Anvin, linux-riscv,
	Will Deacon, Thomas Gleixner, Jonas Bonn, linux-s390,
	Yoshinori Sato, Helge Deller, linux-sh, Christian Borntraeger,
	Ingo Molnar, Catalin Marinas, Fenghua Yu, Albert Ou, Kees Cook,
	Vasily Gorbik, Heiko Carstens, Jonathan Neuschäfer,
	Stefan Kristiansson, Tony Luck, Borislav Petkov, Paul Walmsley,
	Stafford Horne, Daniel W. S. Almeida, linux-arm-kernel,
	Thomas Bogendoerfer, linux-parisc, Andrew Cooper, linux-kernel,
	openrisc, Palmer Dabbelt, Masami Hiramatsu, Greg Kroah-Hartman,
	linuxppc-dev
In-Reply-To: <cover.1606748711.git.mchehab+huawei@kernel.org>

On Mon, 30 Nov 2020 16:36:29 +0100
Mauro Carvalho Chehab <mchehab+huawei@kernel.org> wrote:

> This series got already submitted last year:
> 
>    https://lore.kernel.org/lkml/cover.1561222784.git.mchehab+samsung@kernel.org/
> 
> Yet, on that time, there were too many other patches related to ReST
> conversion floating around. So, at the end, I guess this one got missed.
> 
> So, I did a rebase on the top of upstream, and added a few new changes.

OK, I've gone ahead and applied these; it gains me a new trivial conflict
with x86, but so be it...

That said, I think that the RST table formatting could be *way* improved.
The current tables are all white space and hard to make sense of.  What if
we condensed the information?  Just looking at the first entry in
Documentation/admin-guide/features.html, perhaps it could look like:

    FEATURE	KCONFIG/DESCRIPTION		STATUS

    cBPF-JIT	HAVE_CBPF_JIT			TODO: alpha, arc, arm...
    						ok: mips, powerpc, ...
		arch supports cBPF JIT
		optimizations

The result would be far more compact and easy to read, IMO.  I may get
around to giving this a try if (hint :) nobody else gets there first.

Thanks,

jon

^ permalink raw reply

* Re: [PATCH] ASoC: fsl_aud2htx: mark PM functions as __maybe_unused
From: Nicolin Chen @ 2020-12-03 22:36 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: alsa-devel, linuxppc-dev, Arnd Bergmann, Timur Tabi,
	Liam Girdwood, Shengjiu Wang, Shengjiu Wang, Xiubo Li,
	Takashi Iwai, Jaroslav Kysela, Mark Brown, Fabio Estevam,
	linux-kernel
In-Reply-To: <20201203222900.1042578-1-arnd@kernel.org>

On Thu, Dec 03, 2020 at 11:28:47PM +0100, Arnd Bergmann wrote:
> From: Arnd Bergmann <arnd@arndb.de>
> 
> When CONFIG_PM is disabled, we get a warning for unused functions:
> 
> sound/soc/fsl/fsl_aud2htx.c:261:12: error: unused function 'fsl_aud2htx_runtime_suspend' [-Werror,-Wunused-function]
> static int fsl_aud2htx_runtime_suspend(struct device *dev)
> sound/soc/fsl/fsl_aud2htx.c:271:12: error: unused function 'fsl_aud2htx_runtime_resume' [-Werror,-Wunused-function]
> static int fsl_aud2htx_runtime_resume(struct device *dev)
> 
> Mark these as __maybe_unused to avoid the warning without adding
> an #ifdef.
> 
> Fixes: 8a24c834c053 ("ASoC: fsl_aud2htx: Add aud2htx module driver")
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>

Acked-by: Nicolin Chen <nicoleotsuka@gmail.com>

^ permalink raw reply

* Re: [PATCH kernel v2] vfio/pci/nvlink2: Do not attempt NPU2 setup on POWER8NVL NPU
From: Alex Williamson @ 2020-12-03 23:48 UTC (permalink / raw)
  To: Alexey Kardashevskiy
  Cc: kvm, kvm-ppc, stable, Leonardo Augusto Guimarães Garcia,
	linuxppc-dev, David Gibson
In-Reply-To: <20201122073950.15684-1-aik@ozlabs.ru>

On Sun, 22 Nov 2020 18:39:50 +1100
Alexey Kardashevskiy <aik@ozlabs.ru> wrote:

> We execute certain NPU2 setup code (such as mapping an LPID to a device
> in NPU2) unconditionally if an Nvlink bridge is detected. However this
> cannot succeed on POWER8NVL machines as the init helpers return an error
> other than ENODEV which means the device is there is and setup failed so
> vfio_pci_enable() fails and pass through is not possible.
> 
> This changes the two NPU2 related init helpers to return -ENODEV if
> there is no "memory-region" device tree property as this is
> the distinction between NPU and NPU2.
> 
> Tested on
> - POWER9 pvr=004e1201, Ubuntu 19.04 host, Ubuntu 18.04 vm,
>   NVIDIA GV100 10de:1db1 driver 418.39
> - POWER8 pvr=004c0100, RHEL 7.6 host, Ubuntu 16.10 vm,
>   NVIDIA P100 10de:15f9 driver 396.47
> 
> Fixes: 7f92891778df ("vfio_pci: Add NVIDIA GV100GL [Tesla V100 SXM2] subdriver")
> Cc: stable@vger.kernel.org # 5.0
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * updated commit log with tested configs and replaced P8+ with POWER8NVL for clarity
> ---
>  drivers/vfio/pci/vfio_pci_nvlink2.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)

Thanks, applies to vfio next branch for v5.11.

Alex


> 
> diff --git a/drivers/vfio/pci/vfio_pci_nvlink2.c b/drivers/vfio/pci/vfio_pci_nvlink2.c
> index 65c61710c0e9..9adcf6a8f888 100644
> --- a/drivers/vfio/pci/vfio_pci_nvlink2.c
> +++ b/drivers/vfio/pci/vfio_pci_nvlink2.c
> @@ -231,7 +231,7 @@ int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev)
>  		return -EINVAL;
>  
>  	if (of_property_read_u32(npu_node, "memory-region", &mem_phandle))
> -		return -EINVAL;
> +		return -ENODEV;
>  
>  	mem_node = of_find_node_by_phandle(mem_phandle);
>  	if (!mem_node)
> @@ -393,7 +393,7 @@ int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
>  	int ret;
>  	struct vfio_pci_npu2_data *data;
>  	struct device_node *nvlink_dn;
> -	u32 nvlink_index = 0;
> +	u32 nvlink_index = 0, mem_phandle = 0;
>  	struct pci_dev *npdev = vdev->pdev;
>  	struct device_node *npu_node = pci_device_to_OF_node(npdev);
>  	struct pci_controller *hose = pci_bus_to_host(npdev->bus);
> @@ -408,6 +408,9 @@ int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
>  	if (!pnv_pci_get_gpu_dev(vdev->pdev))
>  		return -ENODEV;
>  
> +	if (of_property_read_u32(npu_node, "memory-region", &mem_phandle))
> +		return -ENODEV;
> +
>  	/*
>  	 * NPU2 normally has 8 ATSD registers (for concurrency) and 6 links
>  	 * so we can allocate one register per link, using nvlink index as


^ permalink raw reply

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Andy Lutomirski @ 2020-12-04  2:17 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, X86 ML, Arnd Bergmann, Linux-MM, Peter Zijlstra,
	Catalin Marinas, Rik van Riel, LKML, Dave Hansen, Will Deacon,
	Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <1607033145.hcppy9ndl4.astroid@bobo.none>


> On Dec 3, 2020, at 2:13 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
> 
> Excerpts from Peter Zijlstra's message of December 3, 2020 6:44 pm:
>>> On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
>>> 
>>> power: same as ARM, except that the loop may be rather larger since
>>> the systems are bigger.  But I imagine it's still faster than Nick's
>>> approach -- a cmpxchg to a remote cacheline should still be faster than
>>> an IPI shootdown. 
>> 
>> While a single atomic might be cheaper than an IPI, the comparison
>> doesn't work out nicely. You do the xchg() on every unlazy, while the
>> IPI would be once per process exit.
>> 
>> So over the life of the process, it might do very many unlazies, adding
>> up to a total cost far in excess of what the single IPI would've been.
> 
> Yeah this is the concern, I looked at things that add cost to the
> idle switch code and it gets hard to justify the scalability improvement
> when you slow these fundmaental things down even a bit.

v2 fixes this and is generally much nicer. I’ll send it out in a couple hours.

> 
> I still think working on the assumption that IPIs = scary expensive 
> might not be correct. An IPI itself is, but you only issue them when 
> you've left a lazy mm on another CPU which just isn't that often.
> 
> Thanks,
> Nick

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox