LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 5.9 26/39] soc: fsl: dpio: Get the cpumask through cpumask_of(cpu)
From: Sasha Levin @ 2020-12-03 13:28 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Yi Wang, Sasha Levin, Hao Si, Li Yang, Lin Chen, linuxppc-dev,
	linux-arm-kernel
In-Reply-To: <20201203132834.930999-1-sashal@kernel.org>

From: Hao Si <si.hao@zte.com.cn>

[ Upstream commit 2663b3388551230cbc4606a40fabf3331ceb59e4 ]

The local variable 'cpumask_t mask' is in the stack memory, and its address
is assigned to 'desc->affinity' in 'irq_set_affinity_hint()'.
But the memory area where this variable is located is at risk of being
modified.

During LTP testing, the following error was generated:

Unable to handle kernel paging request at virtual address ffff000012e9b790
Mem abort info:
  ESR = 0x96000007
  Exception class = DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
Data abort info:
  ISV = 0, ISS = 0x00000007
  CM = 0, WnR = 0
swapper pgtable: 4k pages, 48-bit VAs, pgdp = 0000000075ac5e07
[ffff000012e9b790] pgd=00000027dbffe003, pud=00000027dbffd003,
pmd=00000027b6d61003, pte=0000000000000000
Internal error: Oops: 96000007 [#1] PREEMPT SMP
Modules linked in: xt_conntrack
Process read_all (pid: 20171, stack limit = 0x0000000044ea4095)
CPU: 14 PID: 20171 Comm: read_all Tainted: G    B   W
Hardware name: NXP Layerscape LX2160ARDB (DT)
pstate: 80000085 (Nzcv daIf -PAN -UAO)
pc : irq_affinity_hint_proc_show+0x54/0xb0
lr : irq_affinity_hint_proc_show+0x4c/0xb0
sp : ffff00001138bc10
x29: ffff00001138bc10 x28: 0000ffffd131d1e0
x27: 00000000007000c0 x26: ffff8025b9480dc0
x25: ffff8025b9480da8 x24: 00000000000003ff
x23: ffff8027334f8300 x22: ffff80272e97d000
x21: ffff80272e97d0b0 x20: ffff8025b9480d80
x19: ffff000009a49000 x18: 0000000000000000
x17: 0000000000000000 x16: 0000000000000000
x15: 0000000000000000 x14: 0000000000000000
x13: 0000000000000000 x12: 0000000000000040
x11: 0000000000000000 x10: ffff802735b79b88
x9 : 0000000000000000 x8 : 0000000000000000
x7 : ffff000009a49848 x6 : 0000000000000003
x5 : 0000000000000000 x4 : ffff000008157d6c
x3 : ffff00001138bc10 x2 : ffff000012e9b790
x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
 irq_affinity_hint_proc_show+0x54/0xb0
 seq_read+0x1b0/0x440
 proc_reg_read+0x80/0xd8
 __vfs_read+0x60/0x178
 vfs_read+0x94/0x150
 ksys_read+0x74/0xf0
 __arm64_sys_read+0x24/0x30
 el0_svc_common.constprop.0+0xd8/0x1a0
 el0_svc_handler+0x34/0x88
 el0_svc+0x10/0x14
Code: f9001bbf 943e0732 f94066c2 b4000062 (f9400041)
---[ end trace b495bdcb0b3b732b ]---
Kernel panic - not syncing: Fatal exception
SMP: stopping secondary CPUs
SMP: failed to stop secondary CPUs 0,2-4,6,8,11,13-15
Kernel Offset: disabled
CPU features: 0x0,21006008
Memory Limit: none
---[ end Kernel panic - not syncing: Fatal exception ]---

Fix it by using 'cpumask_of(cpu)' to get the cpumask.

Signed-off-by: Hao Si <si.hao@zte.com.cn>
Signed-off-by: Lin Chen <chen.lin5@zte.com.cn>
Signed-off-by: Yi Wang <wang.yi59@zte.com.cn>
Signed-off-by: Li Yang <leoyang.li@nxp.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/soc/fsl/dpio/dpio-driver.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/soc/fsl/dpio/dpio-driver.c b/drivers/soc/fsl/dpio/dpio-driver.c
index 7b642c330977f..7f397b4ad878d 100644
--- a/drivers/soc/fsl/dpio/dpio-driver.c
+++ b/drivers/soc/fsl/dpio/dpio-driver.c
@@ -95,7 +95,6 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 {
 	int error;
 	struct fsl_mc_device_irq *irq;
-	cpumask_t mask;
 
 	irq = dpio_dev->irqs[0];
 	error = devm_request_irq(&dpio_dev->dev,
@@ -112,9 +111,7 @@ static int register_dpio_irq_handlers(struct fsl_mc_device *dpio_dev, int cpu)
 	}
 
 	/* set the affinity hint */
-	cpumask_clear(&mask);
-	cpumask_set_cpu(cpu, &mask);
-	if (irq_set_affinity_hint(irq->msi_desc->irq, &mask))
+	if (irq_set_affinity_hint(irq->msi_desc->irq, cpumask_of(cpu)))
 		dev_err(&dpio_dev->dev,
 			"irq_set_affinity failed irq %d cpu %d\n",
 			irq->msi_desc->irq, cpu);
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.9 18/39] ibmvnic: skip tx timeout reset while in resetting
From: Sasha Levin @ 2020-12-03 13:28 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, netdev, Dany Madden, Lijun Pan, Brian King,
	Jakub Kicinski, linuxppc-dev
In-Reply-To: <20201203132834.930999-1-sashal@kernel.org>

From: Lijun Pan <ljp@linux.ibm.com>

[ Upstream commit 855a631a4c11458a9cef1ab79c1530436aa95fae ]

Sometimes it takes longer than 5 seconds (watchdog timeout) to complete
failover, migration, and other resets. In stead of scheduling another
timeout reset, we wait for the current one to complete.

Suggested-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Lijun Pan <ljp@linux.ibm.com>
Reviewed-by: Dany Madden <drt@linux.ibm.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c
index 81ec233926acb..37012aa594f41 100644
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -2368,6 +2368,12 @@ static void ibmvnic_tx_timeout(struct net_device *dev, unsigned int txqueue)
 {
 	struct ibmvnic_adapter *adapter = netdev_priv(dev);
 
+	if (test_bit(0, &adapter->resetting)) {
+		netdev_err(adapter->netdev,
+			   "Adapter is resetting, skip timeout reset\n");
+		return;
+	}
+
 	ibmvnic_reset(adapter, VNIC_RESET_TIMEOUT);
 }
 
-- 
2.27.0


^ permalink raw reply related

* [PATCH AUTOSEL 5.9 09/39] powerpc: Drop -me200 addition to build flags
From: Sasha Levin @ 2020-12-03 13:28 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, kernel test robot, Németh Márton,
	Nick Desaulniers, Scott Wood, linuxppc-dev
In-Reply-To: <20201203132834.930999-1-sashal@kernel.org>

From: Michael Ellerman <mpe@ellerman.id.au>

[ Upstream commit e02152ba2810f7c88cb54e71cda096268dfa9241 ]

Currently a build with CONFIG_E200=y will fail with:

  Error: invalid switch -me200
  Error: unrecognized option -me200

Upstream binutils has never supported an -me200 option. Presumably it
was supported at some point by either a fork or Freescale internal
binutils.

We can't support code that we can't even build test, so drop the
addition of -me200 to the build flags, so we can at least build with
CONFIG_E200=y.

Reported-by: Németh Márton <nm127@freemail.hu>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Scott Wood <oss@buserror.net>
Link: https://lore.kernel.org/r/20201116120913.165317-1-mpe@ellerman.id.au
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/powerpc/Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 3e8da9cf2eb9d..e6643d5699fef 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -249,7 +249,6 @@ KBUILD_CFLAGS		+= $(call cc-option,-mno-string)
 cpu-as-$(CONFIG_40x)		+= -Wa,-m405
 cpu-as-$(CONFIG_44x)		+= -Wa,-m440
 cpu-as-$(CONFIG_ALTIVEC)	+= $(call as-option,-Wa$(comma)-maltivec)
-cpu-as-$(CONFIG_E200)		+= -Wa,-me200
 cpu-as-$(CONFIG_E500)		+= -Wa,-me500
 
 # When using '-many -mpower4' gas will first try and find a matching power4
-- 
2.27.0


^ permalink raw reply related

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Matthew Wilcox @ 2020-12-03 12:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, Rik van Riel, Will Deacon, X86 ML,
	Dave Hansen, LKML, Nicholas Piggin, Linux-MM, Mathieu Desnoyers,
	Catalin Marinas, linuxppc-dev
In-Reply-To: <7c4bcc0a464ca60be1e0aeba805a192be0ee81e5.1606972194.git.luto@kernel.org>

On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:
> This code compiles, but I haven't even tried to boot it.  The earlier
> part of the series isn't terribly interesting -- it's a handful of
> cleanups that remove all reads of ->active_mm from arch/x86.  I've
> been meaning to do that for a while, and now I did it.  But, with
> that done, I think we can move to a totally different lazy mm refcounting
> model.

I went back and read Documentation/vm/active_mm.rst recently.

I think it's useful to think about how this would have been handled if
we'd had RCU at the time.  Particularly:

Linus wrote:
> To support all that, the "struct mm_struct" now has two counters: a
> "mm_users" counter that is how many "real address space users" there are,
> and a "mm_count" counter that is the number of "lazy" users (ie anonymous
> users) plus one if there are any real users.

And this just makes me think RCU freeing of mm_struct.  I'm sure it's
more complicated than that (then, or now), but if an anonymous process
is borrowing a freed mm, and the mm is freed by RCU then it will not go
away until the task context switches.  When we context switch back to
the anon task, it'll borrow some other task's MM and won't even notice
that the MM it was using has gone away.

^ permalink raw reply

* Re: [PATCH] powerpc/mm: Don't see NULL pointer dereference as a KUAP fault
From: Michael Ellerman @ 2020-12-03 11:55 UTC (permalink / raw)
  To: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <8b865b93d25c15c8e6d41e71c368bfc28da4489d.1606816701.git.christophe.leroy@csgroup.eu>

Christophe Leroy <christophe.leroy@csgroup.eu> writes:
> Sometimes, NULL pointer dereferences are expected. Even when they
> are accidental they are unlikely an exploit attempt because the
> first page is never mapped.

The first page can be mapped if mmap_min_addr is 0.

Blocking all faults to the first page would potentially break any
program that does that.

Also if there is something mapped at 0 it's a good chance it is an
exploit attempt :)

cheers


> The exemple below shows what we get when invoking the "show task"
> sysrq handler, by writing 't' in /proc/sysrq-trigger
>
> [ 1117.202054] ------------[ cut here ]------------
> [ 1117.202102] Bug: fault blocked by AP register !
> [ 1117.202261] WARNING: CPU: 0 PID: 377 at arch/powerpc/include/asm/nohash/32/kup-8xx.h:66 do_page_fault+0x4a8/0x5ec
> [ 1117.202310] Modules linked in:
> [ 1117.202428] CPU: 0 PID: 377 Comm: sh Tainted: G        W         5.10.0-rc5-s3k-dev-01340-g83f53be2de31-dirty #4175
> [ 1117.202499] NIP:  c0012048 LR: c0012048 CTR: 00000000
> [ 1117.202573] REGS: cacdbb88 TRAP: 0700   Tainted: G        W          (5.10.0-rc5-s3k-dev-01340-g83f53be2de31-dirty)
> [ 1117.202625] MSR:  00021032 <ME,IR,DR,RI>  CR: 24082222  XER: 20000000
> [ 1117.202899]
> [ 1117.202899] GPR00: c0012048 cacdbc40 c2929290 00000023 c092e554 00000001 c09865e8 c092e640
> [ 1117.202899] GPR08: 00001032 00000000 00000000 00014efc 28082224 100d166a 100a0920 00000000
> [ 1117.202899] GPR16: 100cac0c 100b0000 1080c3fc 1080d685 100d0000 100d0000 00000000 100a0900
> [ 1117.202899] GPR24: 100d0000 c07892ec 00000000 c0921510 c21f4440 0000005c c0000000 cacdbc80
> [ 1117.204362] NIP [c0012048] do_page_fault+0x4a8/0x5ec
> [ 1117.204461] LR [c0012048] do_page_fault+0x4a8/0x5ec
> [ 1117.204509] Call Trace:
> [ 1117.204609] [cacdbc40] [c0012048] do_page_fault+0x4a8/0x5ec (unreliable)
> [ 1117.204771] [cacdbc70] [c00112f0] handle_page_fault+0x8/0x34
> [ 1117.204911] --- interrupt: 301 at copy_from_kernel_nofault+0x70/0x1c0
> [ 1117.204979] NIP:  c010dbec LR: c010dbac CTR: 00000001
> [ 1117.205053] REGS: cacdbc80 TRAP: 0301   Tainted: G        W          (5.10.0-rc5-s3k-dev-01340-g83f53be2de31-dirty)
> [ 1117.205104] MSR:  00009032 <EE,ME,IR,DR,RI>  CR: 28082224  XER: 00000000
> [ 1117.205416] DAR: 0000005c DSISR: c0000000
> [ 1117.205416] GPR00: c0045948 cacdbd38 c2929290 00000001 00000017 00000017 00000027 0000000f
> [ 1117.205416] GPR08: c09926ec 00000000 00000000 3ffff000 24082224
> [ 1117.206106] NIP [c010dbec] copy_from_kernel_nofault+0x70/0x1c0
> [ 1117.206202] LR [c010dbac] copy_from_kernel_nofault+0x30/0x1c0
> [ 1117.206258] --- interrupt: 301
> [ 1117.206372] [cacdbd38] [c004bbb0] kthread_probe_data+0x44/0x70 (unreliable)
> [ 1117.206561] [cacdbd58] [c0045948] print_worker_info+0xe0/0x194
> [ 1117.206717] [cacdbdb8] [c00548ac] sched_show_task+0x134/0x168
> [ 1117.206851] [cacdbdd8] [c005a268] show_state_filter+0x70/0x100
> [ 1117.206989] [cacdbe08] [c039baa0] sysrq_handle_showstate+0x14/0x24
> [ 1117.207122] [cacdbe18] [c039bf18] __handle_sysrq+0xac/0x1d0
> [ 1117.207257] [cacdbe48] [c039c0c0] write_sysrq_trigger+0x4c/0x74
> [ 1117.207407] [cacdbe68] [c01fba48] proc_reg_write+0xb4/0x114
> [ 1117.207550] [cacdbe88] [c0179968] vfs_write+0x12c/0x478
> [ 1117.207686] [cacdbf08] [c0179e60] ksys_write+0x78/0x128
> [ 1117.207826] [cacdbf38] [c00110d0] ret_from_syscall+0x0/0x34
> [ 1117.207938] --- interrupt: c01 at 0xfd4e784
> [ 1117.208008] NIP:  0fd4e784 LR: 0fe0f244 CTR: 10048d38
> [ 1117.208083] REGS: cacdbf48 TRAP: 0c01   Tainted: G        W          (5.10.0-rc5-s3k-dev-01340-g83f53be2de31-dirty)
> [ 1117.208134] MSR:  0000d032 <EE,PR,ME,IR,DR,RI>  CR: 44002222  XER: 00000000
> [ 1117.208470]
> [ 1117.208470] GPR00: 00000004 7fc34090 77bfb4e0 00000001 1080fa40 00000002 7400000f fefefeff
> [ 1117.208470] GPR08: 7f7f7f7f 10048d38 1080c414 7fc343c0 00000000
> [ 1117.209104] NIP [0fd4e784] 0xfd4e784
> [ 1117.209180] LR [0fe0f244] 0xfe0f244
> [ 1117.209236] --- interrupt: c01
> [ 1117.209274] Instruction dump:
> [ 1117.209353] 714a4000 418200f0 73ca0001 40820084 73ca0032 408200f8 73c90040 4082ff60
> [ 1117.209727] 0fe00000 3c60c082 386399f4 48013b65 <0fe00000> 80010034 3860000b 7c0803a6
> [ 1117.210102] ---[ end trace 1927c0323393af3e ]---
>
> So, avoid the big KUAP warning by bailing out of bad_kernel_fault()
> before calling bad_kuap_fault() when address references the first
> page.
>
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  arch/powerpc/mm/fault.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> index 0add963a849b..be2b4318206f 100644
> --- a/arch/powerpc/mm/fault.c
> +++ b/arch/powerpc/mm/fault.c
> @@ -198,6 +198,10 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
>  {
>  	int is_exec = TRAP(regs) == 0x400;
>  
> +	// Kernel fault on first page is likely a NULL pointer dereference
> +	if (address < PAGE_SIZE)
> +		return true;
> +
>  	/* NX faults set DSISR_PROTFAULT on the 8xx, DSISR_NOEXEC_OR_G on others */
>  	if (is_exec && (error_code & (DSISR_NOEXEC_OR_G | DSISR_KEYFAULT |
>  				      DSISR_PROTFAULT))) {
> -- 
> 2.25.0

^ permalink raw reply

* Re: [PATCH] EDAC, mv64x60: Fix error return code in mv64x60_pci_err_probe()
From: Borislav Petkov @ 2020-12-03 11:39 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: cj.chengjian, linux-kernel, Wang ShaoBo, Paul Mackerras,
	huawei.libin, james.morse, mchehab, linuxppc-dev, linux-edac
In-Reply-To: <87pn3ruo2q.fsf@mpe.ellerman.id.au>

On Thu, Dec 03, 2020 at 10:27:25PM +1100, Michael Ellerman wrote:
> It's dead code, so drop it.
> 
> I can send a patch if no one else wants to.

Yes please.

I love patches removing code! :-)

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

^ permalink raw reply

* Re: [PATCH] EDAC, mv64x60: Fix error return code in mv64x60_pci_err_probe()
From: Michael Ellerman @ 2020-12-03 11:27 UTC (permalink / raw)
  To: Borislav Petkov, Benjamin Herrenschmidt, Paul Mackerras
  Cc: cj.chengjian, linux-kernel, Wang ShaoBo, james.morse,
	huawei.libin, mchehab, linuxppc-dev, linux-edac
In-Reply-To: <20201202112515.GC2951@zn.tnic>

Borislav Petkov <bp@alien8.de> writes:
> On Tue, Nov 24, 2020 at 02:30:09PM +0800, Wang ShaoBo wrote:
>> Fix to return -ENODEV error code when edac_pci_add_device() failed instaed
>> of 0 in mv64x60_pci_err_probe(), as done elsewhere in this function.
>> 
>> Fixes: 4f4aeeabc061 ("drivers-edac: add marvell mv64x60 driver")
>> Signed-off-by: Wang ShaoBo <bobo.shaobowang@huawei.com>
>> ---
>>  drivers/edac/mv64x60_edac.c | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/drivers/edac/mv64x60_edac.c b/drivers/edac/mv64x60_edac.c
>> index 3c68bb525d5d..456b9ca1fe8d 100644
>> --- a/drivers/edac/mv64x60_edac.c
>> +++ b/drivers/edac/mv64x60_edac.c
>> @@ -168,6 +168,7 @@ static int mv64x60_pci_err_probe(struct platform_device *pdev)
>>  
>>  	if (edac_pci_add_device(pci, pdata->edac_idx) > 0) {
>>  		edac_dbg(3, "failed edac_pci_add_device()\n");
>> +		res = -ENODEV;
>>  		goto err;
>>  	}
>
> That driver depends on MV64X60 and I don't see anything in the tree
> enabling it and I can't select it AFAICT:
>
> config MV64X60
>         bool
>         select PPC_INDIRECT_PCI
>         select CHECK_CACHE_COHERENCY

It was selected by PPC_C2K, but that was dropped in:

  92c8c16f3457 ("powerpc/embedded6xx: Remove C2K board support")

> PPC folks, what do we do here?
>
> If not used anymore, I'd love to have one less EDAC driver.

It's dead code, so drop it.

I can send a patch if no one else wants to.

cheers

^ permalink raw reply

* [PATCH] powerpc/hotplug: assign hot added LMB to the right node
From: Laurent Dufour @ 2020-12-03 10:15 UTC (permalink / raw)
  To: linuxppc-dev
  Cc: Nathan Lynch, Scott Cheloha, linux-kernel, stable, Paul Mackerras

This patch applies to 5.9 and earlier kernels only.

Since 5.10, this has been fortunately fixed by the commit
e5e179aa3a39 ("pseries/drmem: don't cache node id in drmem_lmb struct").

When LMBs are added to a running system, the node id assigned to the LMB is
fetched from the temporary DT node provided by the hypervisor.

However, LMBs added are always assigned to the first online node. This is a
mistake and this is because hot_add_drconf_scn_to_nid() called by
lmb_set_nid() is checking for the LMB flags DRCONF_MEM_ASSIGNED which is
set later in dlpar_add_lmb().

To fix this issue, simply set that flag earlier in dlpar_add_lmb().

Note, this code has been rewrote in 5.10 and thus this fix has no meaning
since this version.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Scott Cheloha <cheloha@linux.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: stable@vger.kernel.org
---
 arch/powerpc/platforms/pseries/hotplug-memory.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c
index e54dcbd04b2f..92d83915c629 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -663,12 +663,14 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 		return rc;
 	}
 
+	lmb->flags |= DRCONF_MEM_ASSIGNED;
 	lmb_set_nid(lmb);
 	block_sz = memory_block_size_bytes();
 
 	/* Add the memory */
 	rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
 	if (rc) {
+		lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 		invalidate_lmb_associativity_index(lmb);
 		return rc;
 	}
@@ -676,10 +678,9 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
 	rc = dlpar_online_lmb(lmb);
 	if (rc) {
 		__remove_memory(lmb->nid, lmb->base_addr, block_sz);
+		lmb->flags &= ~DRCONF_MEM_ASSIGNED;
 		invalidate_lmb_associativity_index(lmb);
 		lmb_clear_nid(lmb);
-	} else {
-		lmb->flags |= DRCONF_MEM_ASSIGNED;
 	}
 
 	return rc;
-- 
2.29.2


^ permalink raw reply related

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Peter Zijlstra @ 2020-12-03  8:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-arch, Arnd Bergmann, Rik van Riel, Will Deacon, X86 ML,
	Dave Hansen, LKML, Nicholas Piggin, Linux-MM, Mathieu Desnoyers,
	Catalin Marinas, linuxppc-dev
In-Reply-To: <7c4bcc0a464ca60be1e0aeba805a192be0ee81e5.1606972194.git.luto@kernel.org>

On Wed, Dec 02, 2020 at 09:25:51PM -0800, Andy Lutomirski wrote:

> power: same as ARM, except that the loop may be rather larger since
> the systems are bigger.  But I imagine it's still faster than Nick's
> approach -- a cmpxchg to a remote cacheline should still be faster than
> an IPI shootdown. 

While a single atomic might be cheaper than an IPI, the comparison
doesn't work out nicely. You do the xchg() on every unlazy, while the
IPI would be once per process exit.

So over the life of the process, it might do very many unlazies, adding
up to a total cost far in excess of what the single IPI would've been.


And while I appreciate all the work to get rid of the active_mm
accounting; the worry I have with pushing this all into arch code is
that it will be so very easy to get this subtly wrong.

^ permalink raw reply

* Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts
From: Alexey Kardashevskiy @ 2020-12-03  8:13 UTC (permalink / raw)
  To: Christophe Leroy, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <7d09d218-2703-a37e-bf47-1cc47ee467b7@csgroup.eu>



On 03/12/2020 19:03, Christophe Leroy wrote:
> 
> 
> Le 03/12/2020 à 06:47, Alexey Kardashevskiy a écrit :
>> When interrupted in raw_copy_from_user()/... after user memory access
>> is enabled, a nested handler may also access user memory (perf is
>> one example) and when it does so, it calls prevent_read_from_user()
>> which prevents the upper handler from accessing user memory.
>>
>> This saves/restores AMR when replaying interrupts.
>>
>> get_kuap/set_kuap have stubs for disabled KUAP on RADIX but there are
>> none for hash-only configs (BOOK3E) so this adds stubs and moves
>> AMR_KUAP_BLOCK_xxx.
>>
>> Found by syzkaller. More likely to break with enabled
>> CONFIG_DEBUG_ATOMIC_SLEEP, the call chain is
>> timer_interrupt -> ktime_get -> read_seqcount_begin -> local_irq_restore.
>>
>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>> ---
>> Changes:
>> v2:
>> * fixed compile on hash
>> * moved get/set to arch_local_irq_restore
>> * block KUAP before replaying
>>
>>
>> ---
>>
>> This is an example:
>>
>> ------------[ cut here ]------------
>> Bug: Read fault blocked by AMR!
>> WARNING: CPU: 0 PID: 1603 at 
>> /home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145 
>> __do_page_fau
>>
>> Modules linked in:
>> CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24
>> NIP:  c00000000009ece8 LR: c00000000009ece4 CTR: 0000000000000000
>> REGS: c00000000dc63560 TRAP: 0700   Not tainted  
>> (5.10.0-rc6_v5.10-rc6_a+fstn1)
>> MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28002888  XER: 20040000
>> CFAR: c0000000001fa928 IRQMASK: 1
>> GPR00: c00000000009ece4 c00000000dc637f0 c000000002397600 
>> 000000000000001f
>> GPR04: c0000000020eb318 0000000000000000 c00000000dc63494 
>> 0000000000000027
>> GPR08: c00000007fe4de68 c00000000dfe9180 0000000000000000 
>> 0000000000000001
>> GPR12: 0000000000002000 c0000000030a0000 0000000000000000 
>> 0000000000000000
>> GPR16: 0000000000000000 0000000000000000 0000000000000000 
>> bfffffffffffffff
>> GPR20: 0000000000000000 c0000000134a4020 c0000000019c2218 
>> 0000000000000fe0
>> GPR24: 0000000000000000 0000000000000000 c00000000d106200 
>> 0000000040000000
>> GPR28: 0000000000000000 0000000000000300 c00000000dc63910 
>> c000000001946730
>> NIP [c00000000009ece8] __do_page_fault+0xb38/0xde0
>> LR [c00000000009ece4] __do_page_fault+0xb34/0xde0
>> Call Trace:
>> [c00000000dc637f0] [c00000000009ece4] __do_page_fault+0xb34/0xde0 
>> (unreliable)
>> [c00000000dc638a0] [c00000000000c968] handle_page_fault+0x10/0x2c
>> --- interrupt: 300 at strncpy_from_user+0x290/0x440
>>      LR = strncpy_from_user+0x284/0x440
>> [c00000000dc63ba0] [c000000000c3dcb0] strncpy_from_user+0x2f0/0x440 
>> (unreliable)
>> [c00000000dc63c30] [c00000000068b888] getname_flags+0x88/0x2c0
>> [c00000000dc63c90] [c000000000662a44] do_sys_openat2+0x2d4/0x5f0
>> [c00000000dc63d30] [c00000000066560c] do_sys_open+0xcc/0x140
>> [c00000000dc63dc0] [c000000000045e10] system_call_exception+0x160/0x240
>> [c00000000dc63e20] [c00000000000da60] system_call_common+0xf0/0x27c
>> Instruction dump:
>> 409c0048 3fe2ff5b 3bfff128 fac10060 fae10068 482f7a85 60000000 3c62ff5b
>> 7fe4fb78 3863f250 4815bbd9 60000000 <0fe00000> 3c62ff5b 3863f2b8 4815c8b5
>> irq event stamp: 254
>> hardirqs last  enabled at (253): [<c000000000019550>] 
>> arch_local_irq_restore+0xa0/0x150
>> hardirqs last disabled at (254): [<c000000000008a10>] 
>> data_access_common_virt+0x1b0/0x1d0
>> softirqs last  enabled at (0): [<c0000000001f6d5c>] 
>> copy_process+0x78c/0x2120
>> softirqs last disabled at (0): [<0000000000000000>] 0x0
>> ---[ end trace ba98aec5151f3aeb ]---
>> ---
>>   arch/powerpc/include/asm/book3s/64/kup-radix.h |  3 ---
>>   arch/powerpc/include/asm/kup.h                 | 10 ++++++++++
>>   arch/powerpc/kernel/irq.c                      |  6 ++++++
>>   3 files changed, 16 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h 
>> b/arch/powerpc/include/asm/book3s/64/kup-radix.h
>> index a39e2d193fdc..4ad607461b75 100644
>> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
>> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
>> @@ -5,9 +5,6 @@
>>   #include <linux/const.h>
>>   #include <asm/reg.h>
>> -#define AMR_KUAP_BLOCK_READ    UL(0x4000000000000000)
>> -#define AMR_KUAP_BLOCK_WRITE    UL(0x8000000000000000)
>> -#define AMR_KUAP_BLOCKED    (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
>>   #define AMR_KUAP_SHIFT        62
>>   #ifdef __ASSEMBLY__
>> diff --git a/arch/powerpc/include/asm/kup.h 
>> b/arch/powerpc/include/asm/kup.h
>> index 0d93331d0fab..e63930767a96 100644
>> --- a/arch/powerpc/include/asm/kup.h
>> +++ b/arch/powerpc/include/asm/kup.h
>> @@ -14,6 +14,10 @@
>>   #define KUAP_CURRENT_WRITE    8
>>   #define KUAP_CURRENT        (KUAP_CURRENT_READ | KUAP_CURRENT_WRITE)
>> +#define AMR_KUAP_BLOCK_READ    UL(0x4000000000000000)
>> +#define AMR_KUAP_BLOCK_WRITE    UL(0x8000000000000000)
>> +#define AMR_KUAP_BLOCKED    (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
>> +
> 
> Those macro are specific to BOOK3S_64, they have nothing to do in 
> asm/kup.h, must stay in the file included just below
> 
>>   #ifdef CONFIG_PPC_BOOK3S_64
>>   #include <asm/book3s/64/kup-radix.h>
>>   #endif
>> @@ -64,6 +68,12 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long 
>> address, bool is_write)
>>   }
>>   static inline void kuap_check_amr(void) { }
>> +static inline unsigned long get_kuap(void)
>> +{
>> +    return AMR_KUAP_BLOCKED;
>> +}
>> +
> 
> The above is not generic, it is specific to book3s 64, AMR doesn't exist 
> on book3s/32 or on 8xx.


ah ok, I did not want more ifdefs there but looks like it is 
unavoidable, I'll repost if there are no other comments. Thanks,


-- 
Alexey

^ permalink raw reply

* Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts
From: Alexey Kardashevskiy @ 2020-12-03  8:10 UTC (permalink / raw)
  To: Aneesh Kumar K.V, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <87zh2vjsxg.fsf@linux.ibm.com>



On 03/12/2020 17:38, Aneesh Kumar K.V wrote:
> Alexey Kardashevskiy <aik@ozlabs.ru> writes:
> 
>> When interrupted in raw_copy_from_user()/... after user memory access
>> is enabled, a nested handler may also access user memory (perf is
>> one example) and when it does so, it calls prevent_read_from_user()
>> which prevents the upper handler from accessing user memory.
>>
>> This saves/restores AMR when replaying interrupts.
>>
>> get_kuap/set_kuap have stubs for disabled KUAP on RADIX but there are
>> none for hash-only configs (BOOK3E) so this adds stubs and moves
>> AMR_KUAP_BLOCK_xxx.
>>
>> Found by syzkaller. More likely to break with enabled
>> CONFIG_DEBUG_ATOMIC_SLEEP, the call chain is
>> timer_interrupt -> ktime_get -> read_seqcount_begin -> local_irq_restore.
> 
> Can you test this with https://github.com/kvaneesh/linux/commits/hash-kuap-reworked-2

Yup, broken:

[    8.813115] ------------[ cut here ]------------ 

[    8.813499] Bug: Read fault blocked by AMR! 

[    8.813532] WARNING: CPU: 0 PID: 1113 at 
/home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup.h:369 
__do_page_fault+0xd
34/0xdf0 

[    8.814248] Modules linked in: 

[    8.814459] CPU: 0 PID: 1113 Comm: amr Not tainted 
5.10.0-rc4_aneesh--hash-kuap-reworked-2_a+fstn1 #61
[    8.815075] NIP:  c0000000000a4674 LR: c0000000000a4670 CTR: 
0000000000000000
[    8.815632] REGS: c00000000d44f530 TRAP: 0700   Not tainted 
(5.10.0-rc4_aneesh--hash-kuap-reworked-2_a+fstn1)



> We do save restore AMR on interrupt entry and exit.

This is nested interrupt. we get interrupted while copying to/from user. 
The first handler has AMR off, then it goes to strncpy_from_user, 
enables AMR, page fault happens and another interrupt arrives - and if 
the new interrupt also wants strncpy_from_user/co, it will enable AMR 
again (which is ok), do copy and then disable AMR; and then we go back 
and resume the first strncpy_from_user which thinks AMR is enabling 
while it is not. I just see how strncpy_from_user is called while 
strncpy_from_user is running. Nick understands the mechanics better.

I can give you an initramdisk which crashes in a millisecond, ping me in 
slack.


-- 
Alexey

^ permalink raw reply

* Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts
From: Christophe Leroy @ 2020-12-03  8:03 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: Nicholas Piggin
In-Reply-To: <20201203054724.44838-1-aik@ozlabs.ru>



Le 03/12/2020 à 06:47, Alexey Kardashevskiy a écrit :
> When interrupted in raw_copy_from_user()/... after user memory access
> is enabled, a nested handler may also access user memory (perf is
> one example) and when it does so, it calls prevent_read_from_user()
> which prevents the upper handler from accessing user memory.
> 
> This saves/restores AMR when replaying interrupts.
> 
> get_kuap/set_kuap have stubs for disabled KUAP on RADIX but there are
> none for hash-only configs (BOOK3E) so this adds stubs and moves
> AMR_KUAP_BLOCK_xxx.
> 
> Found by syzkaller. More likely to break with enabled
> CONFIG_DEBUG_ATOMIC_SLEEP, the call chain is
> timer_interrupt -> ktime_get -> read_seqcount_begin -> local_irq_restore.
> 
> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> ---
> Changes:
> v2:
> * fixed compile on hash
> * moved get/set to arch_local_irq_restore
> * block KUAP before replaying
> 
> 
> ---
> 
> This is an example:
> 
> ------------[ cut here ]------------
> Bug: Read fault blocked by AMR!
> WARNING: CPU: 0 PID: 1603 at /home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145 __do_page_fau
> 
> Modules linked in:
> CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24
> NIP:  c00000000009ece8 LR: c00000000009ece4 CTR: 0000000000000000
> REGS: c00000000dc63560 TRAP: 0700   Not tainted  (5.10.0-rc6_v5.10-rc6_a+fstn1)
> MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28002888  XER: 20040000
> CFAR: c0000000001fa928 IRQMASK: 1
> GPR00: c00000000009ece4 c00000000dc637f0 c000000002397600 000000000000001f
> GPR04: c0000000020eb318 0000000000000000 c00000000dc63494 0000000000000027
> GPR08: c00000007fe4de68 c00000000dfe9180 0000000000000000 0000000000000001
> GPR12: 0000000000002000 c0000000030a0000 0000000000000000 0000000000000000
> GPR16: 0000000000000000 0000000000000000 0000000000000000 bfffffffffffffff
> GPR20: 0000000000000000 c0000000134a4020 c0000000019c2218 0000000000000fe0
> GPR24: 0000000000000000 0000000000000000 c00000000d106200 0000000040000000
> GPR28: 0000000000000000 0000000000000300 c00000000dc63910 c000000001946730
> NIP [c00000000009ece8] __do_page_fault+0xb38/0xde0
> LR [c00000000009ece4] __do_page_fault+0xb34/0xde0
> Call Trace:
> [c00000000dc637f0] [c00000000009ece4] __do_page_fault+0xb34/0xde0 (unreliable)
> [c00000000dc638a0] [c00000000000c968] handle_page_fault+0x10/0x2c
> --- interrupt: 300 at strncpy_from_user+0x290/0x440
>      LR = strncpy_from_user+0x284/0x440
> [c00000000dc63ba0] [c000000000c3dcb0] strncpy_from_user+0x2f0/0x440 (unreliable)
> [c00000000dc63c30] [c00000000068b888] getname_flags+0x88/0x2c0
> [c00000000dc63c90] [c000000000662a44] do_sys_openat2+0x2d4/0x5f0
> [c00000000dc63d30] [c00000000066560c] do_sys_open+0xcc/0x140
> [c00000000dc63dc0] [c000000000045e10] system_call_exception+0x160/0x240
> [c00000000dc63e20] [c00000000000da60] system_call_common+0xf0/0x27c
> Instruction dump:
> 409c0048 3fe2ff5b 3bfff128 fac10060 fae10068 482f7a85 60000000 3c62ff5b
> 7fe4fb78 3863f250 4815bbd9 60000000 <0fe00000> 3c62ff5b 3863f2b8 4815c8b5
> irq event stamp: 254
> hardirqs last  enabled at (253): [<c000000000019550>] arch_local_irq_restore+0xa0/0x150
> hardirqs last disabled at (254): [<c000000000008a10>] data_access_common_virt+0x1b0/0x1d0
> softirqs last  enabled at (0): [<c0000000001f6d5c>] copy_process+0x78c/0x2120
> softirqs last disabled at (0): [<0000000000000000>] 0x0
> ---[ end trace ba98aec5151f3aeb ]---
> ---
>   arch/powerpc/include/asm/book3s/64/kup-radix.h |  3 ---
>   arch/powerpc/include/asm/kup.h                 | 10 ++++++++++
>   arch/powerpc/kernel/irq.c                      |  6 ++++++
>   3 files changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> index a39e2d193fdc..4ad607461b75 100644
> --- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
> +++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
> @@ -5,9 +5,6 @@
>   #include <linux/const.h>
>   #include <asm/reg.h>
>   
> -#define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
> -#define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
> -#define AMR_KUAP_BLOCKED	(AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
>   #define AMR_KUAP_SHIFT		62
>   
>   #ifdef __ASSEMBLY__
> diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
> index 0d93331d0fab..e63930767a96 100644
> --- a/arch/powerpc/include/asm/kup.h
> +++ b/arch/powerpc/include/asm/kup.h
> @@ -14,6 +14,10 @@
>   #define KUAP_CURRENT_WRITE	8
>   #define KUAP_CURRENT		(KUAP_CURRENT_READ | KUAP_CURRENT_WRITE)
>   
> +#define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
> +#define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
> +#define AMR_KUAP_BLOCKED	(AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
> +

Those macro are specific to BOOK3S_64, they have nothing to do in asm/kup.h, must stay in the file 
included just below

>   #ifdef CONFIG_PPC_BOOK3S_64
>   #include <asm/book3s/64/kup-radix.h>
>   #endif
> @@ -64,6 +68,12 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
>   }
>   
>   static inline void kuap_check_amr(void) { }
> +static inline unsigned long get_kuap(void)
> +{
> +	return AMR_KUAP_BLOCKED;
> +}
> +

The above is not generic, it is specific to book3s 64, AMR doesn't exist on book3s/32 or on 8xx.

> +static inline void set_kuap(unsigned long value) { }
>   
>   /*
>    * book3s/64/kup-radix.h defines these functions for the !KUAP case to flush
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index 7d0f7682d01d..d9fd46da04d6 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -314,6 +314,7 @@ void replay_soft_interrupts(void)
>   notrace void arch_local_irq_restore(unsigned long mask)
>   {
>   	unsigned char irq_happened;
> +	unsigned long kuap_state;
>   
>   	/* Write the new soft-enabled value */
>   	irq_soft_mask_set(mask);
> @@ -373,9 +374,14 @@ notrace void arch_local_irq_restore(unsigned long mask)
>   	irq_soft_mask_set(IRQS_ALL_DISABLED);
>   	trace_hardirqs_off();
>   
> +	kuap_state = get_kuap();
> +	set_kuap(AMR_KUAP_BLOCKED);
> +
>   	replay_soft_interrupts();
>   	local_paca->irq_happened = 0;
>   
> +	set_kuap(kuap_state);
> +
>   	trace_hardirqs_on();
>   	irq_soft_mask_set(IRQS_ENABLED);
>   	__hard_irq_enable();
> 

^ permalink raw reply

* Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts
From: kernel test robot @ 2020-12-03  7:41 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev
  Cc: Alexey Kardashevskiy, kbuild-all, Nicholas Piggin
In-Reply-To: <20201203054724.44838-1-aik@ozlabs.ru>

[-- Attachment #1: Type: text/plain, Size: 9162 bytes --]

Hi Alexey,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[also build test WARNING on v5.10-rc6 next-20201201]
[cannot apply to powerpc/next]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201203-135010
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 34816d20f173a90389c8a7e641166d8ea9dce70a
config: powerpc-wii_defconfig (attached as .config)
compiler: powerpc-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/d2a99260ef6d241abe6fb961ee3ed84bbc5db7f5
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Alexey-Kardashevskiy/powerpc-kuap-Restore-AMR-after-replaying-soft-interrupts/20201203-135010
        git checkout d2a99260ef6d241abe6fb961ee3ed84bbc5db7f5
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=powerpc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   In file included from arch/powerpc/include/asm/uaccess.h:9,
                    from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs.h:33,
                    from include/linux/compat.h:17,
                    from arch/powerpc/kernel/asm-offsets.c:14:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long unsigned int' to 'long unsigned int' changes value from '13835058055282163712' to '0' [-Woverflow]
      19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
         |                          ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 'AMR_KUAP_BLOCKED'
      73 |  return AMR_KUAP_BLOCKED;
         |         ^~~~~~~~~~~~~~~~
--
   In file included from arch/powerpc/include/asm/uaccess.h:9,
                    from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs.h:33,
                    from include/linux/huge_mm.h:8,
                    from include/linux/mm.h:687,
                    from include/linux/ring_buffer.h:5,
                    from include/linux/trace_events.h:6,
                    from include/trace/trace_events.h:21,
                    from include/trace/define_trace.h:102,
                    from include/trace/events/sched.h:656,
                    from kernel/sched/core.c:10:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long unsigned int' to 'long unsigned int' changes value from '13835058055282163712' to '0' [-Woverflow]
      19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
         |                          ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 'AMR_KUAP_BLOCKED'
      73 |  return AMR_KUAP_BLOCKED;
         |         ^~~~~~~~~~~~~~~~
   kernel/sched/core.c: In function 'ttwu_stat':
   kernel/sched/core.c:2419:13: warning: variable 'rq' set but not used [-Wunused-but-set-variable]
    2419 |  struct rq *rq;
         |             ^~
--
   In file included from arch/powerpc/include/asm/uaccess.h:9,
                    from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/sched/cputime.h:5,
                    from kernel/sched/sched.h:11,
                    from kernel/sched/fair.c:23:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long unsigned int' to 'long unsigned int' changes value from '13835058055282163712' to '0' [-Woverflow]
      19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
         |                          ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 'AMR_KUAP_BLOCKED'
      73 |  return AMR_KUAP_BLOCKED;
         |         ^~~~~~~~~~~~~~~~
   kernel/sched/fair.c: At top level:
   kernel/sched/fair.c:5368:6: warning: no previous prototype for 'init_cfs_bandwidth' [-Wmissing-prototypes]
    5368 | void init_cfs_bandwidth(struct cfs_bandwidth *cfs_b) {}
         |      ^~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11150:6: warning: no previous prototype for 'free_fair_sched_group' [-Wmissing-prototypes]
   11150 | void free_fair_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11152:5: warning: no previous prototype for 'alloc_fair_sched_group' [-Wmissing-prototypes]
   11152 | int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent)
         |     ^~~~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11157:6: warning: no previous prototype for 'online_fair_sched_group' [-Wmissing-prototypes]
   11157 | void online_fair_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~~~~~
   kernel/sched/fair.c:11159:6: warning: no previous prototype for 'unregister_fair_sched_group' [-Wmissing-prototypes]
   11159 | void unregister_fair_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from arch/powerpc/include/asm/uaccess.h:9,
                    from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/sched/cputime.h:5,
                    from kernel/sched/sched.h:11,
                    from kernel/sched/rt.c:6:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long unsigned int' to 'long unsigned int' changes value from '13835058055282163712' to '0' [-Woverflow]
      19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
         |                          ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 'AMR_KUAP_BLOCKED'
      73 |  return AMR_KUAP_BLOCKED;
         |         ^~~~~~~~~~~~~~~~
   kernel/sched/rt.c: At top level:
   kernel/sched/rt.c:253:6: warning: no previous prototype for 'free_rt_sched_group' [-Wmissing-prototypes]
     253 | void free_rt_sched_group(struct task_group *tg) { }
         |      ^~~~~~~~~~~~~~~~~~~
   kernel/sched/rt.c:255:5: warning: no previous prototype for 'alloc_rt_sched_group' [-Wmissing-prototypes]
     255 | int alloc_rt_sched_group(struct task_group *tg, struct task_group *parent)
         |     ^~~~~~~~~~~~~~~~~~~~
   kernel/sched/rt.c:668:6: warning: no previous prototype for 'sched_rt_bandwidth_account' [-Wmissing-prototypes]
     668 | bool sched_rt_bandwidth_account(struct rt_rq *rt_rq)
         |      ^~~~~~~~~~~~~~~~~~~~~~~~~~
--
   In file included from arch/powerpc/include/asm/uaccess.h:9,
                    from include/linux/uaccess.h:11,
                    from include/linux/sched/task.h:11,
                    from include/linux/sched/signal.h:9,
                    from include/linux/rcuwait.h:6,
                    from include/linux/percpu-rwsem.h:7,
                    from include/linux/fs.h:33,
                    from include/linux/compat.h:17,
                    from arch/powerpc/kernel/asm-offsets.c:14:
   arch/powerpc/include/asm/kup.h: In function 'get_kuap':
>> arch/powerpc/include/asm/kup.h:19:26: warning: conversion from 'long long unsigned int' to 'long unsigned int' changes value from '13835058055282163712' to '0' [-Woverflow]
      19 | #define AMR_KUAP_BLOCKED (AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
         |                          ^
   arch/powerpc/include/asm/kup.h:73:9: note: in expansion of macro 'AMR_KUAP_BLOCKED'
      73 |  return AMR_KUAP_BLOCKED;
         |         ^~~~~~~~~~~~~~~~

vim +19 arch/powerpc/include/asm/kup.h

    16	
    17	#define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
    18	#define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
  > 19	#define AMR_KUAP_BLOCKED	(AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
    20	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 16271 bytes --]

^ permalink raw reply

* Re: [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: kernel test robot @ 2020-12-03  7:11 UTC (permalink / raw)
  To: Andy Lutomirski, Nicholas Piggin
  Cc: linux-arch, kbuild-all, Arnd Bergmann, Will Deacon, X86 ML, LKML,
	Linux-MM, Mathieu Desnoyers, linuxppc-dev
In-Reply-To: <7c4bcc0a464ca60be1e0aeba805a192be0ee81e5.1606972194.git.luto@kernel.org>

[-- Attachment #1: Type: text/plain, Size: 2383 bytes --]

Hi Andy,

I love your patch! Yet something to improve:

[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/x86/mm soc/for-next linus/master v5.10-rc6 next-20201201]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Andy-Lutomirski/x86-mm-Lightweight-lazy-mm-refcounting/20201203-132706
base:   https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git 238c91115cd05c71447ea071624a4c9fe661f970
config: m68k-randconfig-s032-20201203 (attached as .config)
compiler: m68k-linux-gcc (GCC) 9.3.0
reproduce:
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # apt-get install sparse
        # sparse version: v0.6.3-179-ga00755aa-dirty
        # https://github.com/0day-ci/linux/commit/0d9b23b22e621d8e588095b4d0f9f39110a57901
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Andy-Lutomirski/x86-mm-Lightweight-lazy-mm-refcounting/20201203-132706
        git checkout 0d9b23b22e621d8e588095b4d0f9f39110a57901
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=m68k 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   m68k-linux-ld: kernel/fork.o: in function `__mmput':
>> fork.c:(.text+0x214): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `mmput_async_fn':
   fork.c:(.text+0x990): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `mmput':
   fork.c:(.text+0xa68): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `dup_mm.isra.0':
   fork.c:(.text+0x1192): undefined reference to `arch_fixup_lazy_mm_refs'
   m68k-linux-ld: kernel/fork.o: in function `mm_access':
   fork.c:(.text+0x13c6): undefined reference to `arch_fixup_lazy_mm_refs'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 31262 bytes --]

^ permalink raw reply

* Re: [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts
From: Aneesh Kumar K.V @ 2020-12-03  6:38 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: Alexey Kardashevskiy, Nicholas Piggin
In-Reply-To: <20201203054724.44838-1-aik@ozlabs.ru>

Alexey Kardashevskiy <aik@ozlabs.ru> writes:

> When interrupted in raw_copy_from_user()/... after user memory access
> is enabled, a nested handler may also access user memory (perf is
> one example) and when it does so, it calls prevent_read_from_user()
> which prevents the upper handler from accessing user memory.
>
> This saves/restores AMR when replaying interrupts.
>
> get_kuap/set_kuap have stubs for disabled KUAP on RADIX but there are
> none for hash-only configs (BOOK3E) so this adds stubs and moves
> AMR_KUAP_BLOCK_xxx.
>
> Found by syzkaller. More likely to break with enabled
> CONFIG_DEBUG_ATOMIC_SLEEP, the call chain is
> timer_interrupt -> ktime_get -> read_seqcount_begin -> local_irq_restore.

Can you test this with https://github.com/kvaneesh/linux/commits/hash-kuap-reworked-2

We do save restore AMR on interrupt entry and exit.

-aneesh


^ permalink raw reply

* Re: powerpc 5.10-rcN boot failures with RCU_SCALE_TEST=m
From: Michael Ellerman @ 2020-12-03  6:22 UTC (permalink / raw)
  To: Uladzislau Rezki, Paul E . McKenney
  Cc: rcu, linuxppc-dev, Paul E . McKenney, Daniel Axtens
In-Reply-To: <20201202143955.GA12300@pc636>

Uladzislau Rezki <urezki@gmail.com> writes:
> On Thu, Dec 03, 2020 at 01:03:32AM +1100, Michael Ellerman wrote:
...
>> 
>> The SMP bringup stalls because _cpu_up() is blocked trying to take
>> cpu_hotplug_lock for writing:
>> 
>> [  401.403132][    T0] task:swapper/0       state:D stack:12512 pid:    1 ppid:     0 flags:0x00000800
>> [  401.403502][    T0] Call Trace:
>> [  401.403907][    T0] [c0000000062c37d0] [c0000000062c3830] 0xc0000000062c3830 (unreliable)
>> [  401.404068][    T0] [c0000000062c39b0] [c000000000019d70] __switch_to+0x2e0/0x4a0
>> [  401.404189][    T0] [c0000000062c3a10] [c000000000b87228] __schedule+0x288/0x9b0
>> [  401.404257][    T0] [c0000000062c3ad0] [c000000000b879b8] schedule+0x68/0x120
>> [  401.404324][    T0] [c0000000062c3b00] [c000000000184ad4] percpu_down_write+0x164/0x170
>> [  401.404390][    T0] [c0000000062c3b50] [c000000000116b68] _cpu_up+0x68/0x280
>> [  401.404475][    T0] [c0000000062c3bb0] [c000000000116e70] cpu_up+0xf0/0x140
>> [  401.404546][    T0] [c0000000062c3c30] [c00000000011776c] bringup_nonboot_cpus+0xac/0xf0
>> [  401.404643][    T0] [c0000000062c3c80] [c000000000eea1b8] smp_init+0x40/0xcc
>> [  401.404727][    T0] [c0000000062c3ce0] [c000000000ec43dc] kernel_init_freeable+0x1e0/0x3a0
>> [  401.404799][    T0] [c0000000062c3db0] [c000000000011ec4] kernel_init+0x24/0x150
>> [  401.404958][    T0] [c0000000062c3e20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
>> 
>> It can't get it because kprobe_optimizer() has taken it for read and is now
>> blocked waiting for synchronize_rcu_tasks():
>> 
>> [  401.418808][    T0] task:kworker/0:1     state:D stack:13392 pid:   12 ppid:     2 flags:0x00000800
>> [  401.418951][    T0] Workqueue: events kprobe_optimizer
>> [  401.419078][    T0] Call Trace:
>> [  401.419121][    T0] [c0000000062ef650] [c0000000062ef710] 0xc0000000062ef710 (unreliable)
>> [  401.419213][    T0] [c0000000062ef830] [c000000000019d70] __switch_to+0x2e0/0x4a0
>> [  401.419281][    T0] [c0000000062ef890] [c000000000b87228] __schedule+0x288/0x9b0
>> [  401.419347][    T0] [c0000000062ef950] [c000000000b879b8] schedule+0x68/0x120
>> [  401.419415][    T0] [c0000000062ef980] [c000000000b8e664] schedule_timeout+0x2a4/0x340
>> [  401.419484][    T0] [c0000000062efa80] [c000000000b894ec] wait_for_completion+0x9c/0x170
>> [  401.419552][    T0] [c0000000062efae0] [c0000000001ac85c] __wait_rcu_gp+0x19c/0x210
>> [  401.419619][    T0] [c0000000062efb40] [c0000000001ac90c] synchronize_rcu_tasks_generic+0x3c/0x70
>> [  401.419690][    T0] [c0000000062efbe0] [c00000000022a3dc] kprobe_optimizer+0x1dc/0x470
>> [  401.419757][    T0] [c0000000062efc60] [c000000000136684] process_one_work+0x2f4/0x530
>> [  401.419823][    T0] [c0000000062efd20] [c000000000138d28] worker_thread+0x78/0x570
>> [  401.419891][    T0] [c0000000062efdb0] [c000000000142424] kthread+0x194/0x1a0
>> [  401.419976][    T0] [c0000000062efe20] [c00000000000daf0] ret_from_kernel_thread+0x5c/0x6c
>> 
>> But why is the synchronize_rcu_tasks() not completing?
>> 
> I think that it is because RCU is not fully initialized by that time.

Yeah that would explain it :)

> The 36dadef23fcc ("kprobes: Init kprobes in early_initcall") patch
> switches to early_initcall() that has a higher priority sequence than
> core_initcall() that is used to complete an RCU setup in the rcu_set_runtime_mode().

I was looking at debug_lockdep_rcu_enabled(), which is:

noinstr int notrace debug_lockdep_rcu_enabled(void)
{
	return rcu_scheduler_active != RCU_SCHEDULER_INACTIVE && debug_locks &&
	       current->lockdep_recursion == 0;
}

That is not firing any warnings for me because rcu_scheduler_active is:

(gdb) p/x rcu_scheduler_active
$1 = 0x1

Which is:

#define RCU_SCHEDULER_INIT	1

But that's different to RCU_SCHEDULER_RUNNING, which is set in
rcu_set_runtime_mode() as you mentioned:

static int __init rcu_set_runtime_mode(void)
{
	rcu_test_sync_prims();
	rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
	kfree_rcu_scheduler_running();
	rcu_test_sync_prims();
	return 0;
}

The comment on rcu_scheduler_active implies that once we're at
RCU_SCHEDULER_INIT things should work:

/*
 * The rcu_scheduler_active variable is initialized to the value
 * RCU_SCHEDULER_INACTIVE and transitions RCU_SCHEDULER_INIT just before the
 * first task is spawned.  So when this variable is RCU_SCHEDULER_INACTIVE,
 * RCU can assume that there is but one task, allowing RCU to (for example)
 * optimize synchronize_rcu() to a simple barrier().  When this variable
 * is RCU_SCHEDULER_INIT, RCU must actually do all the hard work required
 * to detect real grace periods.  This variable is also used to suppress
 * boot-time false positives from lockdep-RCU error checking.  Finally, it
 * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU
 * is fully initialized, including all of its kthreads having been spawned.
 */


So I'm not sure, the comments and the debug checks imply that it is OK
for kprobes to be using RCU this early.

I guess I'll keep digging.

cheers


^ permalink raw reply

* Re: [powerpc:next-test 121/184] arch/powerpc/kernel/firmware.c:31:9-10: WARNING: return of 0/1 in function 'check_kvm_guest' with return type bool
From: Srikar Dronamraju @ 2020-12-03  6:12 UTC (permalink / raw)
  To: kernel test robot; +Cc: linuxppc-dev, kbuild-all, Nicholas Piggin
In-Reply-To: <202012030759.zuEULDQ3-lkp@intel.com>

Hi, 

Thanks for the report.

* kernel test robot <lkp@intel.com> [2020-12-03 07:25:06]:

> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next-test
> head:   fb003959777a635dea8910cf71109b612c7f940c
> commit: 77354ecf8473208a5cc5f20a501760f7d6d631cd [121/184] powerpc: Rename is_kvm_guest() to check_kvm_guest()

Sorry for the nit pick, but the commit should have been
Commit 107c55005fbd ("powerpc/pseries: Add KVM guest doorbell restrictions")

because thats the patch that introduced is_kvm_guest.
my patch was just renaming is_kvm_guest to check_kvm_guest.

> config: powerpc-randconfig-c003-20201202 (attached as .config)
> compiler: powerpc64le-linux-gcc (GCC) 9.3.0
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot <lkp@intel.com>
> 
> 
> "coccinelle warnings: (new ones prefixed by >>)"
> >> arch/powerpc/kernel/firmware.c:31:9-10: WARNING: return of 0/1 in function 'check_kvm_guest' with return type bool
> 
> Please review and possibly fold the followup patch.
> 

But I agree we can still fold this the followup patch into the
Rename is_kvm_guest() to check_kvm_guest().

> ---
> 0-DAY CI Kernel Test Service, Intel Corporation
> https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org



-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply

* [PATCH kernel v2] powerpc/kuap: Restore AMR after replaying soft interrupts
From: Alexey Kardashevskiy @ 2020-12-03  5:47 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Nicholas Piggin

When interrupted in raw_copy_from_user()/... after user memory access
is enabled, a nested handler may also access user memory (perf is
one example) and when it does so, it calls prevent_read_from_user()
which prevents the upper handler from accessing user memory.

This saves/restores AMR when replaying interrupts.

get_kuap/set_kuap have stubs for disabled KUAP on RADIX but there are
none for hash-only configs (BOOK3E) so this adds stubs and moves
AMR_KUAP_BLOCK_xxx.

Found by syzkaller. More likely to break with enabled
CONFIG_DEBUG_ATOMIC_SLEEP, the call chain is
timer_interrupt -> ktime_get -> read_seqcount_begin -> local_irq_restore.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v2:
* fixed compile on hash
* moved get/set to arch_local_irq_restore
* block KUAP before replaying


---

This is an example:

------------[ cut here ]------------
Bug: Read fault blocked by AMR!
WARNING: CPU: 0 PID: 1603 at /home/aik/p/kernel/arch/powerpc/include/asm/book3s/64/kup-radix.h:145 __do_page_fau

Modules linked in:
CPU: 0 PID: 1603 Comm: amr Not tainted 5.10.0-rc6_v5.10-rc6_a+fstn1 #24
NIP:  c00000000009ece8 LR: c00000000009ece4 CTR: 0000000000000000
REGS: c00000000dc63560 TRAP: 0700   Not tainted  (5.10.0-rc6_v5.10-rc6_a+fstn1)
MSR:  8000000000021033 <SF,ME,IR,DR,RI,LE>  CR: 28002888  XER: 20040000
CFAR: c0000000001fa928 IRQMASK: 1
GPR00: c00000000009ece4 c00000000dc637f0 c000000002397600 000000000000001f
GPR04: c0000000020eb318 0000000000000000 c00000000dc63494 0000000000000027
GPR08: c00000007fe4de68 c00000000dfe9180 0000000000000000 0000000000000001
GPR12: 0000000000002000 c0000000030a0000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 bfffffffffffffff
GPR20: 0000000000000000 c0000000134a4020 c0000000019c2218 0000000000000fe0
GPR24: 0000000000000000 0000000000000000 c00000000d106200 0000000040000000
GPR28: 0000000000000000 0000000000000300 c00000000dc63910 c000000001946730
NIP [c00000000009ece8] __do_page_fault+0xb38/0xde0
LR [c00000000009ece4] __do_page_fault+0xb34/0xde0
Call Trace:
[c00000000dc637f0] [c00000000009ece4] __do_page_fault+0xb34/0xde0 (unreliable)
[c00000000dc638a0] [c00000000000c968] handle_page_fault+0x10/0x2c
--- interrupt: 300 at strncpy_from_user+0x290/0x440
    LR = strncpy_from_user+0x284/0x440
[c00000000dc63ba0] [c000000000c3dcb0] strncpy_from_user+0x2f0/0x440 (unreliable)
[c00000000dc63c30] [c00000000068b888] getname_flags+0x88/0x2c0
[c00000000dc63c90] [c000000000662a44] do_sys_openat2+0x2d4/0x5f0
[c00000000dc63d30] [c00000000066560c] do_sys_open+0xcc/0x140
[c00000000dc63dc0] [c000000000045e10] system_call_exception+0x160/0x240
[c00000000dc63e20] [c00000000000da60] system_call_common+0xf0/0x27c
Instruction dump:
409c0048 3fe2ff5b 3bfff128 fac10060 fae10068 482f7a85 60000000 3c62ff5b
7fe4fb78 3863f250 4815bbd9 60000000 <0fe00000> 3c62ff5b 3863f2b8 4815c8b5
irq event stamp: 254
hardirqs last  enabled at (253): [<c000000000019550>] arch_local_irq_restore+0xa0/0x150
hardirqs last disabled at (254): [<c000000000008a10>] data_access_common_virt+0x1b0/0x1d0
softirqs last  enabled at (0): [<c0000000001f6d5c>] copy_process+0x78c/0x2120
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace ba98aec5151f3aeb ]---
---
 arch/powerpc/include/asm/book3s/64/kup-radix.h |  3 ---
 arch/powerpc/include/asm/kup.h                 | 10 ++++++++++
 arch/powerpc/kernel/irq.c                      |  6 ++++++
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/kup-radix.h b/arch/powerpc/include/asm/book3s/64/kup-radix.h
index a39e2d193fdc..4ad607461b75 100644
--- a/arch/powerpc/include/asm/book3s/64/kup-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/kup-radix.h
@@ -5,9 +5,6 @@
 #include <linux/const.h>
 #include <asm/reg.h>
 
-#define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
-#define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
-#define AMR_KUAP_BLOCKED	(AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
 #define AMR_KUAP_SHIFT		62
 
 #ifdef __ASSEMBLY__
diff --git a/arch/powerpc/include/asm/kup.h b/arch/powerpc/include/asm/kup.h
index 0d93331d0fab..e63930767a96 100644
--- a/arch/powerpc/include/asm/kup.h
+++ b/arch/powerpc/include/asm/kup.h
@@ -14,6 +14,10 @@
 #define KUAP_CURRENT_WRITE	8
 #define KUAP_CURRENT		(KUAP_CURRENT_READ | KUAP_CURRENT_WRITE)
 
+#define AMR_KUAP_BLOCK_READ	UL(0x4000000000000000)
+#define AMR_KUAP_BLOCK_WRITE	UL(0x8000000000000000)
+#define AMR_KUAP_BLOCKED	(AMR_KUAP_BLOCK_READ | AMR_KUAP_BLOCK_WRITE)
+
 #ifdef CONFIG_PPC_BOOK3S_64
 #include <asm/book3s/64/kup-radix.h>
 #endif
@@ -64,6 +68,12 @@ bad_kuap_fault(struct pt_regs *regs, unsigned long address, bool is_write)
 }
 
 static inline void kuap_check_amr(void) { }
+static inline unsigned long get_kuap(void)
+{
+	return AMR_KUAP_BLOCKED;
+}
+
+static inline void set_kuap(unsigned long value) { }
 
 /*
  * book3s/64/kup-radix.h defines these functions for the !KUAP case to flush
diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
index 7d0f7682d01d..d9fd46da04d6 100644
--- a/arch/powerpc/kernel/irq.c
+++ b/arch/powerpc/kernel/irq.c
@@ -314,6 +314,7 @@ void replay_soft_interrupts(void)
 notrace void arch_local_irq_restore(unsigned long mask)
 {
 	unsigned char irq_happened;
+	unsigned long kuap_state;
 
 	/* Write the new soft-enabled value */
 	irq_soft_mask_set(mask);
@@ -373,9 +374,14 @@ notrace void arch_local_irq_restore(unsigned long mask)
 	irq_soft_mask_set(IRQS_ALL_DISABLED);
 	trace_hardirqs_off();
 
+	kuap_state = get_kuap();
+	set_kuap(AMR_KUAP_BLOCKED);
+
 	replay_soft_interrupts();
 	local_paca->irq_happened = 0;
 
+	set_kuap(kuap_state);
+
 	trace_hardirqs_on();
 	irq_soft_mask_set(IRQS_ENABLED);
 	__hard_irq_enable();
-- 
2.17.1


^ permalink raw reply related

* Re: [PATCH kernel] powerpc/perf: Stop crashing with generic_compat_pmu
From: Madhavan Srinivasan @ 2020-12-03  5:27 UTC (permalink / raw)
  To: Alexey Kardashevskiy, linuxppc-dev; +Cc: Madhavan Srinivasan
In-Reply-To: <848255d6-5c8f-d4c2-a865-e3a7ffce7fdb@ozlabs.ru>


On 12/2/20 8:31 AM, Alexey Kardashevskiy wrote:
> Hi Maddy,
>
> I just noticed that I still have "powerpc/perf: Add checks for 
> reserved values" in my pile (pushed here 
> https://github.com/aik/linux/commit/61e1bc3f2e19d450e2e2d39174d422160b21957b 
> ), do we still need it? The lockups I saw were fixed by 
> https://github.com/aik/linux/commit/17899eaf88d689 but it is hardly a 
> replacement. Thanks,

sorry missed this. Will look at this again. Since we will need 
generation specific checks for the reserve field.

Maddy

>
>
> On 04/06/2020 02:34, Madhavan Srinivasan wrote:
>>
>>
>> On 6/2/20 8:26 AM, Alexey Kardashevskiy wrote:
>>> The bhrb_filter_map ("The  Branch History  Rolling  Buffer") 
>>> callback is
>>> only defined in raw CPUs' power_pmu structs. The "architected" CPUs use
>>> generic_compat_pmu which does not have this callback and crashed occur.
>>>
>>> This add a NULL pointer check for bhrb_filter_map() which behaves as if
>>> the callback returned an error.
>>>
>>> This does not add the same check for config_bhrb() as the only caller
>>> checks for cpuhw->bhrb_users which remains zero if bhrb_filter_map==0.
>>
>> Changes looks fine.
>> Reviewed-by: Madhavan Srinivasan <maddy@linux.ibm.com>
>>
>> The commit be80e758d0c2e ('powerpc/perf: Add generic compat mode pmu 
>> driver')
>> which introduced generic_compat_pmu was merged in v5.2.  So we need to
>> CC stable starting from 5.2 :( .  My bad,  sorry.
>>
>> Maddy
>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>>   arch/powerpc/perf/core-book3s.c | 19 ++++++++++++++-----
>>>   1 file changed, 14 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/arch/powerpc/perf/core-book3s.c 
>>> b/arch/powerpc/perf/core-book3s.c
>>> index 3dcfecf858f3..36870569bf9c 100644
>>> --- a/arch/powerpc/perf/core-book3s.c
>>> +++ b/arch/powerpc/perf/core-book3s.c
>>> @@ -1515,9 +1515,16 @@ static int power_pmu_add(struct perf_event 
>>> *event, int ef_flags)
>>>       ret = 0;
>>>    out:
>>>       if (has_branch_stack(event)) {
>>> -        power_pmu_bhrb_enable(event);
>>> -        cpuhw->bhrb_filter = ppmu->bhrb_filter_map(
>>> -                    event->attr.branch_sample_type);
>>> +        u64 bhrb_filter = -1;
>>> +
>>> +        if (ppmu->bhrb_filter_map)
>>> +            bhrb_filter = ppmu->bhrb_filter_map(
>>> +                event->attr.branch_sample_type);
>>> +
>>> +        if (bhrb_filter != -1) {
>>> +            cpuhw->bhrb_filter = bhrb_filter;
>>> +            power_pmu_bhrb_enable(event); /* Does bhrb_users++ */
>>> +        }
>>>       }
>>>
>>>       perf_pmu_enable(event->pmu);
>>> @@ -1839,7 +1846,6 @@ static int power_pmu_event_init(struct 
>>> perf_event *event)
>>>       int n;
>>>       int err;
>>>       struct cpu_hw_events *cpuhw;
>>> -    u64 bhrb_filter;
>>>
>>>       if (!ppmu)
>>>           return -ENOENT;
>>> @@ -1945,7 +1951,10 @@ static int power_pmu_event_init(struct 
>>> perf_event *event)
>>>       err = power_check_constraints(cpuhw, events, cflags, n + 1);
>>>
>>>       if (has_branch_stack(event)) {
>>> -        bhrb_filter = ppmu->bhrb_filter_map(
>>> +        u64 bhrb_filter = -1;
>>> +
>>> +        if (ppmu->bhrb_filter_map)
>>> +            bhrb_filter = ppmu->bhrb_filter_map(
>>>                       event->attr.branch_sample_type);
>>>
>>>           if (bhrb_filter == -1) {
>>
>

^ permalink raw reply

* [MOCKUP] x86/mm: Lightweight lazy mm refcounting
From: Andy Lutomirski @ 2020-12-03  5:25 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Arnd Bergmann, Rik van Riel, Will Deacon, X86 ML,
	Dave Hansen, LKML, Linux-MM, Mathieu Desnoyers, Andy Lutomirski,
	Catalin Marinas, linuxppc-dev

For context, this is part of a series here:

https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/commit/?h=x86/mm&id=7c4bcc0a464ca60be1e0aeba805a192be0ee81e5

This code compiles, but I haven't even tried to boot it.  The earlier
part of the series isn't terribly interesting -- it's a handful of
cleanups that remove all reads of ->active_mm from arch/x86.  I've
been meaning to do that for a while, and now I did it.  But, with
that done, I think we can move to a totally different lazy mm refcounting
model.

So this patch is a mockup of what this could look like.  The algorithm
involves no atomics at all in the context switch path except for a
single atomic_long_xchg() of a percpu variable when going from lazy
mode to nonlazy mode.  Even that could be optimized -- I suspect it could
be replaced with non-atomic code if mm_users > 0.  Instead, on mm exit,
there's a single loop over all CPUs on which that mm could be lazily
loaded that atomic_long_cmpxchg_relaxed()'s a remote percpu variable to
tell the CPU to kindly mmdrop() the mm when it reschedules.  All cpus
that don't have the mm lazily loaded are ignored because they can't
have lazy references, and no cpus can gain lazy references after
mm_users hits zero.  (I need to verify that all the relevant barriers
are in place.  I suspect that they are on x86 but that I'm short an
smp_mb() on arches for which _relaxed atomics are genuinely relaxed.)

Here's how I think it fits with various arches:

x86: On bare metal (i.e. paravirt flush unavailable), the loop won't do
much.  The existing TLB shootdown when user tables are freed will
empty mm_cpumask of everything but the calling CPU.  So x86 ends up
pretty close to as good as we can get short of reworking mm_cpumask() itself.

arm64: with s/for_each_cpu/for_each_online_cpu, I think it will give
good performance.  The mmgrab()/mmdrop() overhead goes away, and,
on smallish systems, the cost of the loop should be low.

power: same as ARM, except that the loop may be rather larger since
the systems are bigger.  But I imagine it's still faster than Nick's
approach -- a cmpxchg to a remote cacheline should still be faster than
an IPI shootdown.  (Nick, don't benchmark this patch -- at least the
mm_users optimization mentioned above should be done, but also the
mmgrab() and mmdrop() aren't actually removed.)

Other arches: I don't know.  Further research is required.

What do you all think?


As mentioned, there are several things blatantly wrong with this patch:

The coding stype is not up to kernel standars.  I have prototypes in the
wrong places and other hacks.

mms are likely to be freed with IRQs off.  I think this is safe, but it's
suboptimal.

This whole thing is in arch/x86.  The core algorithm ought to move outside
arch/, but doing so without making a mess might take some thought.  It
doesn't help that different architectures have very different ideas
of what mm_cpumask() does.

Finally, this patch has no benefit by itself.  I didn't remove the
active_mm refounting, so the big benefit of removing mmgrab() and
mmdrop() calls on transitions to and from lazy mode isn't there.
There is no point at all in benchmarking this patch as is.  That
being said, getting rid of the active_mm refcounting shouldn't be
so hard, since x86 (in this tree) no longer uses active_mm at all.

I should contemplate whether the calling CPU is special in
arch_fixup_lazy_mm_refs().  On a very very quick think, it's not, but
it needs more thought.

Signed-off-by: Andy Lutomirski <luto@kernel.org>

 arch/x86/include/asm/tlbflush.h | 20 ++++++++
 arch/x86/mm/tlb.c               | 81 +++++++++++++++++++++++++++++++--
 kernel/fork.c                   |  5 ++
 3 files changed, 101 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 8c87a2e0b660..efcd4f125f2c 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -124,6 +124,26 @@ struct tlb_state {
 	 */
 	unsigned short user_pcid_flush_mask;
 
+	/*
+	 * Lazy mm tracking.
+	 *
+	 *  - If this is NULL, it means that any mm_struct referenced by
+	 *    this CPU is kept alive by a real reference count.
+	 *
+	 *  - If this is nonzero but the low bit is clear, it points to
+	 *    an mm_struct that must not be freed out from under this
+	 *    CPU.
+	 *
+	 *  - If the low bit is set, it still points to an mm_struct,
+	 *    but some other CPU has mmgrab()'d it on our behalf, and we
+	 *    must mmdrop() it when we're done with it.
+	 *
+	 * See lazy_mm_grab() and related functions for the precise
+	 * access rules.
+	 */
+	atomic_long_t		lazy_mm;
+
+
 	/*
 	 * Access to this CR4 shadow and to H/W CR4 is protected by
 	 * disabling interrupts when modifying either one.
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index e27300fc865b..00f5bace534b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,6 +8,7 @@
 #include <linux/export.h>
 #include <linux/cpu.h>
 #include <linux/debugfs.h>
+#include <linux/sched/mm.h>
 
 #include <asm/tlbflush.h>
 #include <asm/mmu_context.h>
@@ -420,6 +421,64 @@ void cr4_update_pce(void *ignored)
 static inline void cr4_update_pce_mm(struct mm_struct *mm) { }
 #endif
 
+static void lazy_mm_grab(struct mm_struct *mm)
+{
+	atomic_long_t *lazy_mm = this_cpu_ptr(&cpu_tlbstate.lazy_mm);
+
+	WARN_ON_ONCE(atomic_long_read(lazy_mm) != 0);
+	atomic_long_set(lazy_mm, (unsigned long)mm);
+}
+
+static void lazy_mm_drop(void)
+{
+	atomic_long_t *lazy_mm = this_cpu_ptr(&cpu_tlbstate.lazy_mm);
+
+	unsigned long prev = atomic_long_xchg(lazy_mm, 0);
+	if (prev & 1)
+		mmdrop((struct mm_struct *)(prev & ~1UL));
+}
+
+void arch_fixup_lazy_mm_refs(struct mm_struct *mm)
+{
+	int cpu;
+
+	/*
+	 * mm_users is zero, so no new lazy refs will be taken.
+	 */
+	WARN_ON_ONCE(atomic_read(&mm->mm_users) != 0);
+
+	for_each_cpu(cpu, mm_cpumask(mm)) {
+		atomic_long_t *lazy_mm = per_cpu_ptr(&cpu_tlbstate.lazy_mm, cpu);
+		unsigned long old;
+
+		// Hmm, is this check actually useful?
+		if (atomic_long_read(lazy_mm) != (unsigned long)mm)
+			continue;
+
+		// XXX: we could optimize this by adding a bunch to
+		// mm_count at the beginning and subtracting the unused refs
+		// back off at the end.
+		mmgrab(mm);
+
+		// XXX: is relaxed okay here?  We need to be sure that the
+		// remote CPU has observed mm_users == 0.  This is just
+		// x86 for now, but we might want to move it into a library
+		// and use it elsewhere.
+		old = atomic_long_cmpxchg_relaxed(lazy_mm, (unsigned long)mm,
+						  (unsigned long)mm | 1);
+		if (old == (unsigned long)mm) {
+			/* The remote CPU now owns the reference we grabbed. */
+		} else {
+			/*
+			 * We raced!  The remote CPU switched mms and no longer
+			 * needs its reference.  We didn't transfer ownership
+			 * of the reference, so drop it.
+			 */
+			mmdrop(mm);
+		}
+	}
+}
+
 void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 			struct task_struct *tsk)
 {
@@ -587,16 +646,24 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
 		cr4_update_pce_mm(next);
 		switch_ldt(real_prev, next);
 	}
+
+	// XXX: this can end up in __mmdrop().  Is this okay with IRQs off?
+	// It might be nicer to defer this to a later stage in the scheduler
+	// with IRQs on.
+	if (was_lazy)
+		lazy_mm_drop();
 }
 
 /*
- * Please ignore the name of this function.  It should be called
- * switch_to_kernel_thread().
+ * Please don't think too hard about the name of this function.  It
+ * should be called something like switch_to_kernel_mm().
  *
  * enter_lazy_tlb() is a hint from the scheduler that we are entering a
- * kernel thread or other context without an mm.  Acceptable implementations
- * include doing nothing whatsoever, switching to init_mm, or various clever
- * lazy tricks to try to minimize TLB flushes.
+ * kernel thread or other context without an mm.  Acceptable
+ * implementations include switching to init_mm, or various clever lazy
+ * tricks to try to minimize TLB flushes.  We are, however, required to
+ * either stop referencing the previous mm or to take some action to
+ * keep it from being freed out from under us.
  *
  * The scheduler reserves the right to call enter_lazy_tlb() several times
  * in a row.  It will notify us that we're going back to a real mm by
@@ -607,7 +674,11 @@ void enter_lazy_tlb(struct mm_struct *mm, struct task_struct *tsk)
 	if (this_cpu_read(cpu_tlbstate.loaded_mm) == &init_mm)
 		return;
 
+	if (this_cpu_read(cpu_tlbstate.is_lazy))
+		return;  /* nothing to do */
+
 	this_cpu_write(cpu_tlbstate.is_lazy, true);
+	lazy_mm_grab(mm);
 }
 
 /*
diff --git a/kernel/fork.c b/kernel/fork.c
index da8d360fb032..4d68162c1d02 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1066,6 +1066,9 @@ struct mm_struct *mm_alloc(void)
 	return mm_init(mm, current, current_user_ns());
 }
 
+// XXX: move to a header
+extern void arch_fixup_lazy_mm_refs(struct mm_struct *mm);
+
 static inline void __mmput(struct mm_struct *mm)
 {
 	VM_BUG_ON(atomic_read(&mm->mm_users));
@@ -1084,6 +1087,8 @@ static inline void __mmput(struct mm_struct *mm)
 	}
 	if (mm->binfmt)
 		module_put(mm->binfmt->module);
+
+	arch_fixup_lazy_mm_refs(mm);
 	mmdrop(mm);
 }
 
-- 
2.28.0


^ permalink raw reply related

* Re: [PATCH 2/8] x86: use exit_lazy_tlb rather than membarrier_mm_sync_core_before_usermode
From: Andy Lutomirski @ 2020-12-03  5:09 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, Arnd Bergmann, Peter Zijlstra, X86 ML, LKML, Linux-MM,
	Mathieu Desnoyers, Andy Lutomirski, linuxppc-dev
In-Reply-To: <1606876327.dyrhkih2kh.astroid@bobo.none>

On Tue, Dec 1, 2020 at 6:50 PM Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Excerpts from Andy Lutomirski's message of November 29, 2020 3:55 am:
> > On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
> >>
> >> And get rid of the generic sync_core_before_usermode facility. This is
> >> functionally a no-op in the core scheduler code, but it also catches
> >>
> >> This helper is the wrong way around I think. The idea that membarrier
> >> state requires a core sync before returning to user is the easy one
> >> that does not need hiding behind membarrier calls. The gap in core
> >> synchronization due to x86's sysret/sysexit and lazy tlb mode, is the
> >> tricky detail that is better put in x86 lazy tlb code.
> >>
> >> Consider if an arch did not synchronize core in switch_mm either, then
> >> membarrier_mm_sync_core_before_usermode would be in the wrong place
> >> but arch specific mmu context functions would still be the right place.
> >> There is also a exit_lazy_tlb case that is not covered by this call, which
> >> could be a bugs (kthread use mm the membarrier process's mm then context
> >> switch back to the process without switching mm or lazy mm switch).
> >>
> >> This makes lazy tlb code a bit more modular.
> >
> > I have a couple of membarrier fixes that I want to send out today or
> > tomorrow, and they might eliminate the need for this patch.  Let me
> > think about this a little bit.  I'll cc you.  The existing code is way
> > to subtle and the comments are far too confusing for me to be quickly
> > confident about any of my conclusions :)
> >
>
> Thanks for the head's up. I'll have to have a better look through them
> but I don't know that it eliminates the need for this entirely although
> it might close some gaps and make this not a bug fix. The problem here
> is x86 code wanted something to be called when a lazy mm is unlazied,
> but it missed some spots and also the core scheduler doesn't need to
> know about those x86 details if it has this generic call that annotates
> the lazy handling better.

I'll send v3 tomorrow.  They add more sync_core_before_usermode() callers.

Having looked at your patches a bunch and the membarrier code a bunch,
I don't think I like the approach of pushing this logic out into new
core functions called by arch code.  Right now, even with my
membarrier patches applied, understanding how (for example) the x86
switch_mm_irqs_off() plus the scheduler code provides the barriers
that membarrier needs is quite complicated, and it's not clear to me
that the code is even correct.  At the very least I'm pretty sure that
the x86 comments are misleading.  With your patches, someone trying to
audit the code would have to follow core code calling into arch code
and back out into core code to figure out what's going on.  I think
the result is worse.

I wrote this incomplete patch which takes the opposite approach (sorry
for whitespace damage):

commit 928b5c60e93f475934892d6e0b357ebf0a2bf87d
Author: Andy Lutomirski <luto@kernel.org>
Date:   Wed Dec 2 17:24:02 2020 -0800

    [WIP] x86/mm: Handle unlazying membarrier core sync in the arch code

    The core scheduler isn't a great place for
    membarrier_mm_sync_core_before_usermode() -- the core scheduler
    doesn't actually know whether we are lazy.  With the old code, if a
    CPU is running a membarrier-registered task, goes idle, gets unlazied
    via a TLB shootdown IPI, and switches back to the
    membarrier-registered task, it will do an unnecessary core sync.

    Conveniently, x86 is the only architecture that does anything in this
    hook, so we can just move the code.

    XXX: actually delete the old code.

    Signed-off-by: Andy Lutomirski <luto@kernel.org>

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 3338a1feccf9..e27300fc865b 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -496,6 +496,8 @@ void switch_mm_irqs_off(struct mm_struct *prev,
struct mm_struct *next,
          * from one thread in a process to another thread in the same
          * process. No TLB flush required.
          */
+
+        // XXX: why is this okay wrt membarrier?
         if (!was_lazy)
             return;

@@ -508,12 +510,24 @@ void switch_mm_irqs_off(struct mm_struct *prev,
struct mm_struct *next,
         smp_mb();
         next_tlb_gen = atomic64_read(&next->context.tlb_gen);
         if (this_cpu_read(cpu_tlbstate.ctxs[prev_asid].tlb_gen) ==
-                next_tlb_gen)
+            next_tlb_gen) {
+            /*
+             * We're reactivating an mm, and membarrier might
+             * need to serialize.  Tell membarrier.
+             */
+
+            // XXX: I can't understand the logic in
+            // membarrier_mm_sync_core_before_usermode().  What's
+            // the mm check for?
+            membarrier_mm_sync_core_before_usermode(next);
             return;
+        }

         /*
          * TLB contents went out of date while we were in lazy
          * mode. Fall through to the TLB switching code below.
+         * No need for an explicit membarrier invocation -- the CR3
+         * write will serialize.
          */
         new_asid = prev_asid;
         need_flush = true;

^ permalink raw reply related

* [PATCH] powerpc/book3s_hv_uvmem: Check for failed page migration
From: Alistair Popple @ 2020-12-03  5:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alistair Popple, bharata

migrate_vma_pages() may still clear MIGRATE_PFN_MIGRATE on pages which
are not able to be migrated. Drivers may safely copy data prior to
calling migrate_vma_pages() however a remote mapping must not be
established until after migrate_vma_pages() has returned as the
migration could still fail.

UV_PAGE_IN_in both copies and maps the data page, therefore it should
only be called after checking the results of migrate_vma_pages().

Signed-off-by: Alistair Popple <alistair@popple.id.au>
---
 arch/powerpc/kvm/book3s_hv_uvmem.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 84e5a2dc8be5..08aa6a90c525 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -762,7 +762,10 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
 		goto out_finalize;
 	}
 
-	if (pagein) {
+	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
+	migrate_vma_pages(&mig);
+
+	if ((*mig.src & MIGRATE_PFN_MIGRATE) && pagein) {
 		pfn = *mig.src >> MIGRATE_PFN_SHIFT;
 		spage = migrate_pfn_to_page(*mig.src);
 		if (spage) {
@@ -773,8 +776,6 @@ static int kvmppc_svm_page_in(struct vm_area_struct *vma,
 		}
 	}
 
-	*mig.dst = migrate_pfn(page_to_pfn(dpage)) | MIGRATE_PFN_LOCKED;
-	migrate_vma_pages(&mig);
 out_finalize:
 	migrate_vma_finalize(&mig);
 	return ret;
-- 
2.20.1


^ permalink raw reply related

* Re: [PATCH 6/8] lazy tlb: shoot lazies, a non-refcounting lazy tlb option
From: Andy Lutomirski @ 2020-12-03  5:05 UTC (permalink / raw)
  To: Nicholas Piggin
  Cc: linux-arch, X86 ML, Arnd Bergmann, Vasily Gorbik, Dave Hansen,
	Peter Zijlstra, Catalin Marinas, Heiko Carstens, linuxppc-dev,
	LKML, Linux-MM, Christian Borntraeger, Mathieu Desnoyers,
	Andy Lutomirski, Will Deacon
In-Reply-To: <1606879302.tdngvs3yq4.astroid@bobo.none>

> On Dec 1, 2020, at 7:47 PM, Nicholas Piggin <npiggin@gmail.com> wrote:
>
> Excerpts from Andy Lutomirski's message of December 1, 2020 4:31 am:
>> other arch folk: there's some background here:
>>
>> https://lkml.kernel.org/r/CALCETrVXUbe8LfNn-Qs+DzrOQaiw+sFUg1J047yByV31SaTOZw@mail.gmail.com
>>
>>> On Sun, Nov 29, 2020 at 12:16 PM Andy Lutomirski <luto@kernel.org> wrote:
>>>
>>> On Sat, Nov 28, 2020 at 7:54 PM Andy Lutomirski <luto@kernel.org> wrote:
>>>>
>>>> On Sat, Nov 28, 2020 at 8:02 AM Nicholas Piggin <npiggin@gmail.com> wrote:
>>>>>
>>>>> On big systems, the mm refcount can become highly contented when doing
>>>>> a lot of context switching with threaded applications (particularly
>>>>> switching between the idle thread and an application thread).
>>>>>
>>>>> Abandoning lazy tlb slows switching down quite a bit in the important
>>>>> user->idle->user cases, so so instead implement a non-refcounted scheme
>>>>> that causes __mmdrop() to IPI all CPUs in the mm_cpumask and shoot down
>>>>> any remaining lazy ones.
>>>>>
>>>>> Shootdown IPIs are some concern, but they have not been observed to be
>>>>> a big problem with this scheme (the powerpc implementation generated
>>>>> 314 additional interrupts on a 144 CPU system during a kernel compile).
>>>>> There are a number of strategies that could be employed to reduce IPIs
>>>>> if they turn out to be a problem for some workload.
>>>>
>>>> I'm still wondering whether we can do even better.
>>>>
>>>
>>> Hold on a sec.. __mmput() unmaps VMAs, frees pagetables, and flushes
>>> the TLB.  On x86, this will shoot down all lazies as long as even a
>>> single pagetable was freed.  (Or at least it will if we don't have a
>>> serious bug, but the code seems okay.  We'll hit pmd_free_tlb, which
>>> sets tlb->freed_tables, which will trigger the IPI.)  So, on
>>> architectures like x86, the shootdown approach should be free.  The
>>> only way it ought to have any excess IPIs is if we have CPUs in
>>> mm_cpumask() that don't need IPI to free pagetables, which could
>>> happen on paravirt.
>>
>> Indeed, on x86, we do this:
>>
>> [   11.558844]  flush_tlb_mm_range.cold+0x18/0x1d
>> [   11.559905]  tlb_finish_mmu+0x10e/0x1a0
>> [   11.561068]  exit_mmap+0xc8/0x1a0
>> [   11.561932]  mmput+0x29/0xd0
>> [   11.562688]  do_exit+0x316/0xa90
>> [   11.563588]  do_group_exit+0x34/0xb0
>> [   11.564476]  __x64_sys_exit_group+0xf/0x10
>> [   11.565512]  do_syscall_64+0x34/0x50
>>
>> and we have info->freed_tables set.
>>
>> What are the architectures that have large systems like?
>>
>> x86: we already zap lazies, so it should cost basically nothing to do
>
> This is not zapping lazies, this is freeing the user page tables.
>
> "lazy mm" is where a switch to a kernel thread takes on the
> previous mm for its kernel mapping rather than switch to init_mm.

The intent of the code is to flush the TLB after freeing user pages
tables, but, on bare metal, lazies get zapped as a side effect.

Anyway, I'm going to send out a mockup of an alternative approach shortly.

^ permalink raw reply

* [PATCH] powerpc: add security.config, enforcing lockdown=integrity
From: Daniel Axtens @ 2020-12-03  4:28 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Daniel Axtens

It's sometimes handy to have a config that boots a bit like a system
under secure boot (forcing lockdown=integrity, without needing any
extra stuff like a command line option).

This config file allows that, and also turns on a few assorted security
and hardening options for good measure.

Suggested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 arch/powerpc/configs/security.config | 15 +++++++++++++++
 1 file changed, 15 insertions(+)
 create mode 100644 arch/powerpc/configs/security.config

diff --git a/arch/powerpc/configs/security.config b/arch/powerpc/configs/security.config
new file mode 100644
index 000000000000..1c91a35c6a73
--- /dev/null
+++ b/arch/powerpc/configs/security.config
@@ -0,0 +1,15 @@
+# This is the equivalent of booting with lockdown=integrity
+CONFIG_SECURITY=y
+CONFIG_SECURITYFS=y
+CONFIG_SECURITY_LOCKDOWN_LSM=y
+CONFIG_SECURITY_LOCKDOWN_LSM_EARLY=y
+CONFIG_LOCK_DOWN_KERNEL_FORCE_INTEGRITY=y
+
+# These are some general, reasonably inexpensive hardening options
+CONFIG_HARDENED_USERCOPY=y
+CONFIG_FORTIFY_SOURCE=y
+CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y
+
+# UBSAN bounds checking is very cheap and good for hardening
+CONFIG_UBSAN=y
+# CONFIG_UBSAN_MISC is not set
\ No newline at end of file
-- 
2.25.1


^ permalink raw reply related

* Re: [PATCH 2/2] ASoC: fsl: Add imx-hdmi machine driver
From: Shengjiu Wang @ 2020-12-03  3:51 UTC (permalink / raw)
  To: Nicolin Chen
  Cc: open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS,
	alsa-devel, Timur Tabi, Xiubo Li, Fabio Estevam, Shengjiu Wang,
	Liam Girdwood, Takashi Iwai, Rob Herring, Mark Brown,
	linuxppc-dev, linux-kernel
In-Reply-To: <20201202201955.GB1498@Asurada-Nvidia>

On Thu, Dec 3, 2020 at 4:23 AM Nicolin Chen <nicoleotsuka@gmail.com> wrote:
>
> On Fri, Nov 27, 2020 at 01:30:21PM +0800, Shengjiu Wang wrote:
> > The driver is initially designed for sound card using HDMI
> > interface on i.MX platform. There is internal HDMI IP or
> > external HDMI modules connect with SAI or AUD2HTX interface.
> > It supports both transmitter and receiver devices.
> >
> > Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com>
> > ---
> >  sound/soc/fsl/Kconfig    |  12 ++
> >  sound/soc/fsl/Makefile   |   2 +
> >  sound/soc/fsl/imx-hdmi.c | 235 +++++++++++++++++++++++++++++++++++++++
> >  3 files changed, 249 insertions(+)
> >  create mode 100644 sound/soc/fsl/imx-hdmi.c
>
> > diff --git a/sound/soc/fsl/imx-hdmi.c b/sound/soc/fsl/imx-hdmi.c
> > new file mode 100644
> > index 000000000000..ac164514b1b2
> > --- /dev/null
> > +++ b/sound/soc/fsl/imx-hdmi.c
>
> > +static int imx_hdmi_hw_params(struct snd_pcm_substream *substream,
> > +                           struct snd_pcm_hw_params *params)
> > +{
> > +     struct snd_soc_pcm_runtime *rtd = substream->private_data;
> > +     struct imx_hdmi_data *data = snd_soc_card_get_drvdata(rtd->card);
> > +     bool tx = substream->stream == SNDRV_PCM_STREAM_PLAYBACK;
> > +     struct snd_soc_dai *cpu_dai = asoc_rtd_to_cpu(rtd, 0);
> > +     struct snd_soc_card *card = rtd->card;
> > +     struct device *dev = card->dev;
> > +     int ret;
> > +
> > +     /* set cpu DAI configuration */
> > +     ret = snd_soc_dai_set_sysclk(cpu_dai, data->cpu_priv.sysclk_id[tx],
> > +                                  8 * data->cpu_priv.slot_width * params_rate(params),
>
> Looks like fixed 2 slots being used, judging by the set_tdm_slot
> call below. Then...why "8 *"? Probably need a line of comments?

The master clock always 256 * rate, when slot_width=32.  so use
the 8 * slot_width.  will add comments.

>
> > +                                  tx ? SND_SOC_CLOCK_OUT : SND_SOC_CLOCK_IN);
> > +     if (ret && ret != -ENOTSUPP) {
> > +             dev_err(dev, "failed to set cpu sysclk: %d\n", ret);
> > +             return ret;
> > +     }
> > +
> > +     ret = snd_soc_dai_set_tdm_slot(cpu_dai, 0, 0, 2, data->cpu_priv.slot_width);
>
> May have a local variable to cache slot_width.

ok.

>
> > +static int imx_hdmi_probe(struct platform_device *pdev)
>
> > +     data->dai.name = "i.MX HDMI";
> > +     data->dai.stream_name = "i.MX HDMI";
> > +     data->dai.cpus->dai_name = dev_name(&cpu_pdev->dev);
> > +     data->dai.platforms->of_node = cpu_np;
> > +     data->dai.ops = &imx_hdmi_ops;
> > +     data->dai.playback_only = true;
> > +     data->dai.capture_only = false;
> > +     data->dai.init = imx_hdmi_init;
> > +
> > +
> > +     if (of_property_read_bool(np, "hdmi-out")) {
> > +             data->dai.playback_only = true;
> > +             data->dai.capture_only = false;
> > +             data->dai.codecs->dai_name = "i2s-hifi";
> > +             data->dai.codecs->name = "hdmi-audio-codec.1";
> > +             data->dai.dai_fmt = data->dai_fmt |
> > +                                 SND_SOC_DAIFMT_NB_NF |
> > +                                 SND_SOC_DAIFMT_CBS_CFS;
> > +     }
> > +
> > +     if (of_property_read_bool(np, "hdmi-in")) {
> > +             data->dai.playback_only = false;
> > +             data->dai.capture_only = true;
> > +             data->dai.codecs->dai_name = "i2s-hifi";
> > +             data->dai.codecs->name = "hdmi-audio-codec.2";
> > +             data->dai.dai_fmt = data->dai_fmt |
> > +                                 SND_SOC_DAIFMT_NB_NF |
> > +                                 SND_SOC_DAIFMT_CBM_CFM;
> > +     }
> > +
> > +     if ((data->dai.playback_only && data->dai.capture_only) ||
> > +         (!data->dai.playback_only && !data->dai.capture_only)) {
> > +             dev_err(&pdev->dev, "Wrongly enable HDMI DAI link\n");
> > +             goto fail;
> > +     }
>
> Seems that this condition check can never be true, given that:
> 1. By default: playback_only=true && capture_only=false
> 2. Conditionally overwritten: playback_only=true && capture_only=false
> 3. Conditionally overwritten: playback_only=false && capture_only=true
>
> If I understand it correctly, probably should be something like:
>         bool hdmi_out = of_property_read_bool(np, "hdmi-out");
>         bool hdmi_in = of_property_read_bool(np, "hdmi-in");
>
>         if ((hdmi_out && hdmi_in) || (!hdmi_out || !hdmi_in))
>                 // "Invalid HDMI DAI link"; goto fail;
>
>         if (hdmi_out) {
>                 // ...
>         } else if (hdmi_in) {
>                 // ...
>         } else // No need of this line if two properties are exclusive
>

Good catch, will update it.

> > +     data->card.num_links = 1;
> > +     data->card.dai_link = &data->dai;
> > +
> > +     platform_set_drvdata(pdev, &data->card);
>
> Why pass card pointer?

Seems it duplicates with dev_set_drvdata(card->dev, card);
in snd_soc_register_card.  will remove it.

best regards
wang shengjiu

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox