LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH AUTOSEL 5.4 55/80] scsi: ibmvfc: Fix error return in ibmvfc_probe()
From: Sasha Levin @ 2020-10-18 19:22 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Tyrel Datwyler, Martin K . Petersen, linux-scsi,
	Jing Xiangfeng, linuxppc-dev
In-Reply-To: <20201018192231.4054535-1-sashal@kernel.org>

From: Jing Xiangfeng <jingxiangfeng@huawei.com>

[ Upstream commit 5e48a084f4e824e1b624d3fd7ddcf53d2ba69e53 ]

Fix to return error code PTR_ERR() from the error handling case instead of
0.

Link: https://lore.kernel.org/r/20200907083949.154251-1-jingxiangfeng@huawei.com
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index df897df5cafee..8a76284b59b08 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4788,6 +4788,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	if (IS_ERR(vhost->work_thread)) {
 		dev_err(dev, "Couldn't create kernel thread: %ld\n",
 			PTR_ERR(vhost->work_thread));
+		rc = PTR_ERR(vhost->work_thread);
 		goto free_host_mem;
 	}
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.19 42/56] scsi: ibmvfc: Fix error return in ibmvfc_probe()
From: Sasha Levin @ 2020-10-18 19:24 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Tyrel Datwyler, Martin K . Petersen, linux-scsi,
	Jing Xiangfeng, linuxppc-dev
In-Reply-To: <20201018192417.4055228-1-sashal@kernel.org>

From: Jing Xiangfeng <jingxiangfeng@huawei.com>

[ Upstream commit 5e48a084f4e824e1b624d3fd7ddcf53d2ba69e53 ]

Fix to return error code PTR_ERR() from the error handling case instead of
0.

Link: https://lore.kernel.org/r/20200907083949.154251-1-jingxiangfeng@huawei.com
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 71d53bb239e25..090ab377f65e5 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4795,6 +4795,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	if (IS_ERR(vhost->work_thread)) {
 		dev_err(dev, "Couldn't create kernel thread: %ld\n",
 			PTR_ERR(vhost->work_thread));
+		rc = PTR_ERR(vhost->work_thread);
 		goto free_host_mem;
 	}
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.14 39/52] scsi: ibmvfc: Fix error return in ibmvfc_probe()
From: Sasha Levin @ 2020-10-18 19:25 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Tyrel Datwyler, Martin K . Petersen, linux-scsi,
	Jing Xiangfeng, linuxppc-dev
In-Reply-To: <20201018192530.4055730-1-sashal@kernel.org>

From: Jing Xiangfeng <jingxiangfeng@huawei.com>

[ Upstream commit 5e48a084f4e824e1b624d3fd7ddcf53d2ba69e53 ]

Fix to return error code PTR_ERR() from the error handling case instead of
0.

Link: https://lore.kernel.org/r/20200907083949.154251-1-jingxiangfeng@huawei.com
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 34612add3829f..dbacd9830d3df 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4797,6 +4797,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	if (IS_ERR(vhost->work_thread)) {
 		dev_err(dev, "Couldn't create kernel thread: %ld\n",
 			PTR_ERR(vhost->work_thread));
+		rc = PTR_ERR(vhost->work_thread);
 		goto free_host_mem;
 	}
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.9 32/41] scsi: ibmvfc: Fix error return in ibmvfc_probe()
From: Sasha Levin @ 2020-10-18 19:26 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Tyrel Datwyler, Martin K . Petersen, linux-scsi,
	Jing Xiangfeng, linuxppc-dev
In-Reply-To: <20201018192635.4056198-1-sashal@kernel.org>

From: Jing Xiangfeng <jingxiangfeng@huawei.com>

[ Upstream commit 5e48a084f4e824e1b624d3fd7ddcf53d2ba69e53 ]

Fix to return error code PTR_ERR() from the error handling case instead of
0.

Link: https://lore.kernel.org/r/20200907083949.154251-1-jingxiangfeng@huawei.com
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 54dea767dfde9..04b3ac17531db 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4804,6 +4804,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	if (IS_ERR(vhost->work_thread)) {
 		dev_err(dev, "Couldn't create kernel thread: %ld\n",
 			PTR_ERR(vhost->work_thread));
+		rc = PTR_ERR(vhost->work_thread);
 		goto free_host_mem;
 	}
 
-- 
2.25.1


^ permalink raw reply related

* [PATCH AUTOSEL 4.4 24/33] scsi: ibmvfc: Fix error return in ibmvfc_probe()
From: Sasha Levin @ 2020-10-18 19:27 UTC (permalink / raw)
  To: linux-kernel, stable
  Cc: Sasha Levin, Tyrel Datwyler, Martin K . Petersen, linux-scsi,
	Jing Xiangfeng, linuxppc-dev
In-Reply-To: <20201018192728.4056577-1-sashal@kernel.org>

From: Jing Xiangfeng <jingxiangfeng@huawei.com>

[ Upstream commit 5e48a084f4e824e1b624d3fd7ddcf53d2ba69e53 ]

Fix to return error code PTR_ERR() from the error handling case instead of
0.

Link: https://lore.kernel.org/r/20200907083949.154251-1-jingxiangfeng@huawei.com
Acked-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 0526a47e30a3f..db80ab8335dfb 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -4790,6 +4790,7 @@ static int ibmvfc_probe(struct vio_dev *vdev, const struct vio_device_id *id)
 	if (IS_ERR(vhost->work_thread)) {
 		dev_err(dev, "Couldn't create kernel thread: %ld\n",
 			PTR_ERR(vhost->work_thread));
+		rc = PTR_ERR(vhost->work_thread);
 		goto free_host_mem;
 	}
 
-- 
2.25.1


^ permalink raw reply related

* [Bug 209733] New: Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-10-18 23:09 UTC (permalink / raw)
  To: linuxppc-dev

https://bugzilla.kernel.org/show_bug.cgi?id=209733

            Bug ID: 209733
           Summary: Starting new KVM virtual machines on PPC64 starts to
                    hang after box is up for a while
           Product: Platform Specific/Hardware
           Version: 2.5
    Kernel Version: >=5.8
          Hardware: PPC-64
                OS: Linux
              Tree: Mainline
            Status: NEW
          Severity: high
          Priority: P1
         Component: PPC-64
          Assignee: platform_ppc-64@kernel-bugs.osdl.org
          Reporter: cam@neo-zeon.de
        Regression: No

Issue occurs with 5.8.14, 5.8.16, and 5.9.1.  Does NOT occur with 5.7.x. I
suspect it occurs with all of 5.8, but I haven't confirmed this yet.

After the box has been up for a "while", starting new VM's fails. Completely
shutting down existing VM's and then starting them back up will also fail in
the same way.

What is a while? Could be 2 days, might be 9. I'll update as the pattern
becomes more clear.

libvirt is generally used, but when running kvm manually with strace, kvm
always gets stuck here:
ioctl(11, KVM_PPC_ALLOCATE_HTAB, 0x7fffea0bade4

Maybe the kernel is trying to find the memory needed to allocate the Hashed
Page Table but is unable to do so? Maybe there's a memory leak?

Before this issue starts occurring, I have confirmed I am able to run the exact
same kvm command manually:
sudo -u libvirt-qemu qemu-system-ppc64 -enable-kvm -m 8192 -nographic -vga none
-drive file=/var/lib/libvirt/images/test.qcow2,format=qcow2 -mem-prealloc -smp
4

Nothing in dmesg, nothing useful in the logs.

This box's configuration:
Debian 10 stable
2x 18 core POWER9 (144 threads)
512g physical memory
Raptor Talos II motherboard
radix MMU disabled

Unfortunately, I cannot test the affected box with the Radix MMU enabled
because I have some important VM's that won't run unless it is disabled.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* [Bug 209733] Starting new KVM virtual machines on PPC64 starts to hang after box is up for a while
From: bugzilla-daemon @ 2020-10-18 23:13 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <bug-209733-206035@https.bugzilla.kernel.org/>

https://bugzilla.kernel.org/show_bug.cgi?id=209733

Cameron (cam@neo-zeon.de) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|PPC-64                      |kvm
            Version|2.5                         |unspecified
            Product|Platform Specific/Hardware  |Virtualization

-- 
You are receiving this mail because:
You are watching the assignee of the bug.

^ permalink raw reply

* Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
From: Nicholas Piggin @ 2020-10-19  1:00 UTC (permalink / raw)
  To: Michal Suchánek; +Cc: ro, linuxppc-dev, Hari Bathini
In-Reply-To: <20201016201410.GH29778@kitsune.suse.cz>

Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am:
> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
>> > Michal Suchánek <msuchanek@suse.de> writes:
>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
>> >>> > Hello,
>> >>> > 
>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
>> >>> > Reimplement book3s idle code in C").
>> >>> > 
>> >>> > The symptom is host locking up completely after some hours of KVM
>> >>> > workload with messages like
>> >>> > 
>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>> >>> > 
>> >>> > printed before the host locks up.
>> >>> > 
>> >>> > The machines run sandboxed builds which is a mixed workload resulting in
>> >>> > IO/single core/mutiple core load over time and there are periods of no
>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM
>> >>> > setup/terdown is somewhat excercised as well.
>> >>> > 
>> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
>> >>> > 
>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
>> >>> > stable.
>> >>> > 
>> >>> > Config is attached.
>> >>> > 
>> >>> > I cannot easily revert this commit, especially if I want to use the same
>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
>> >>> > only to the new idle code.
>> >>> > 
>> >>> > Any idea what can be the problem?
>> >>> 
>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on
>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
>> >>> BMC unfortunately.
>> >>
>> >> It may be possible to set up fadump with a later kernel version that
>> >> supports it on powernv and dump the whole kernel.
>> > 
>> > Your firmware won't support it AFAIK.
>> > 
>> > You could try kdump, but if we have CPUs stuck in KVM then there's a
>> > good chance it won't work :/
>> 
>> I haven't had any luck yet reproducing this still. Testing with sub 
>> cores of various different combinations, etc. I'll keep trying though.
> 
> Hello,
> 
> I tried running some KVM guests to simulate the workload and what I get
> is guests failing to start with a rcu stall. Tried both 5.3 and 5.9
> kernel and qemu 4.2.1 and 5.1.0
> 
> To start some guests I run
> 
> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults -nographic -serial mon:telnet::444$i,server,wait & done
> 
> To simulate some workload I run
> 
> xz -zc9T0 < /dev/zero > /dev/null &
> while true; do
>     killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
> done &
> 
> on the host and add a job that executes this to the ramdisk. However, most
> guests never get to the point where the job is executed.
> 
> Any idea what might be the problem?

I would say try without pv queued spin locks (but if the same thing is 
happening with 5.3 then it must be something else I guess). 

I'll try to test a similar setup on a POWER8 here.

Thanks,
Nick

> 
> In the past I was able to boot guests quite realiably.
> 
> This is boot log of one of the VMs
> 
> Trying ::1...
> Connected to localhost.
> Escape character is '^]'.
> 
> 
> SLOF **********************************************************************
> QEMU Starting
>  Build Date = Jul 17 2020 11:15:24
>  FW Version = git-e18ddad8516ff2cf
>  Press "s" to enter Open Firmware.
> 
> Populating /vdevice methods
> Populating /vdevice/vty@71000000
> Populating /vdevice/nvram@71000001
> Populating /pci@800000020000000
> No NVRAM common partition, re-initializing...
> Scanning USB 
> Using default console: /vdevice/vty@71000000
> Detected RAM kernel at 400000 (27c8620 bytes) 
>      
>   Welcome to Open Firmware
> 
>   Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
>   This program and the accompanying materials are made available
>   under the terms of the BSD License available at
>   http://www.opensource.org/licenses/bsd-license.php
> 
> Booting from memory...
> OF stdout device is: /vdevice/vty@71000000
> Preparing to boot Linux version 5.9.0-1.g11733e1-default (geeko@buildhost) (gcc (SUSE Linux) 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.34.0.20200325-1) #1 SMP Sun Oct 11 22:20:46 UTC 2020 (11733e1)
> Detected machine type: 0000000000000101
> command line:  
> Max number of cores passed to firmware: 2048 (NR_CPUS = 2048)
> Calling ibm,client-architecture-support... done
> memory layout at init:
>   memory_limit : 0000000000000000 (16 MB aligned)
>   alloc_bottom : 0000000003810000
>   alloc_top    : 0000000030000000
>   alloc_top_hi : 0000000080000000
>   rmo_top      : 0000000030000000
>   ram_top      : 0000000080000000
> instantiating rtas at 0x000000002fff0000... done
> prom_hold_cpus: skipped
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000003820000 -> 0x0000000003820a2c
> Device tree struct  0x0000000003830000 -> 0x0000000003840000
> Quiescing Open Firmware ...
> Booting Linux via __start() @ 0x0000000000400000 ...
> [    0.000000] hash-mmu: Page sizes from device-tree:
> [    0.000000] hash-mmu: base_shift=12: shift=12, sllp=0x0000, avpnm=0x00000000, tlbiel=1, penc=0
> [    0.000000] hash-mmu: base_shift=16: shift=16, sllp=0x0110, avpnm=0x00000000, tlbiel=1, penc=1
> [    0.000000] Using 1TB segments
> [    0.000000] hash-mmu: Initializing hash mmu with SLB
> [    0.000000] Linux version 5.9.0-1.g11733e1-default (geeko@buildhost) (gcc (SUSE Linux) 10.2.1 20200825 [revision c0746a1beb1ba073c7981eb09f55b3d993b32e5c], GNU ld (GNU Binutils; openSUSE Tumbleweed) 2.34.0.20200325-1) #1 SMP Sun Oct 11 22:20:46 UTC 2020 (11733e1)
> [    0.000000] Found initrd at 0xc000000002be0000:0xc000000003804784
> [    0.000000] Using pSeries machine description
> [    0.000000] printk: bootconsole [udbg0] enabled
> [    0.000000] Partition configured for 8 cpus.
> [    0.000000] CPU maps initialized for 1 thread per core
> [    0.000000] -----------------------------------------------------
> [    0.000000] phys_mem_size     = 0x80000000
> [    0.000000] dcache_bsize      = 0x80
> [    0.000000] icache_bsize      = 0x80
> [    0.000000] cpu_features      = 0x000002eb8f4d91a7
> [    0.000000]   possible        = 0x000ffbfbcf5fb1a7
> [    0.000000]   always          = 0x00000003800081a1
> [    0.000000] cpu_user_features = 0xdc0065c2 0xae000000
> [    0.000000] mmu_features      = 0x78006001
> [    0.000000] firmware_features = 0x00000085455a445f
> [    0.000000] vmalloc start     = 0xc008000000000000
> [    0.000000] IO start          = 0xc00a000000000000
> [    0.000000] vmemmap start     = 0xc00c000000000000
> [    0.000000] hash-mmu: ppc64_pft_size    = 0x18
> [    0.000000] hash-mmu: htab_hash_mask    = 0x1ffff
> [    0.000000] -----------------------------------------------------
> [    0.000000] numa:   NODE_DATA [mem 0x7ffa8900-0x7ffaffff]
> [    0.000000] rfi-flush: fallback displacement flush available
> [    0.000000] rfi-flush: ori type flush available
> [    0.000000] rfi-flush: mttrig type flush available
> [    0.000000] count-cache-flush: hardware flush enabled.
> [    0.000000] link-stack-flush: software flush enabled.
> [    0.000000] stf-barrier: hwsync barrier available
> [    0.000000] PCI host bridge /pci@800000020000000  ranges:
> [    0.000000]   IO 0x0000200000000000..0x000020000000ffff -> 0x0000000000000000
> [    0.000000]  MEM 0x0000200080000000..0x00002000ffffffff -> 0x0000000080000000 
> [    0.000000]  MEM 0x0000210000000000..0x000021ffffffffff -> 0x0000210000000000 
> [    0.000000] PCI: OF: PROBE_ONLY disabled
> [    0.000000] PPC64 nvram contains 65536 bytes
> [    0.000000] PV qspinlock hash table entries: 4096 (order: 0, 65536 bytes, linear)
> [    0.000000] barrier-nospec: using ORI speculation barrier
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000007fffffff]
> [    0.000000]   Device   empty
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000007fffffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000007fffffff]
> [    0.000000] percpu: Embedded 11 pages/cpu s629400 r0 d91496 u1048576
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 32736
> [    0.000000] Policy zone: Normal
> [    0.000000] Kernel command line: 
> [    0.000000] Dentry cache hash table entries: 262144 (order: 5, 2097152 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 131072 (order: 4, 1048576 bytes, linear)
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Memory: 0K/2097152K available (15104K kernel code, 1984K rwdata, 7040K rodata, 5824K init, 10721K bss, 131968K reserved, 0K cma-reserved)
> [    0.000000] random: get_random_u64 called from kmem_cache_open+0x3c/0x330 with crng_init=0
> [    0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=8, Nodes=1
> [    0.000000] ftrace: allocating 37469 entries in 14 pages
> [    0.000000] ftrace: allocated 14 pages with 3 groups
> [    0.000000] rcu: Hierarchical RCU implementation.
> [    0.000000] rcu: 	RCU event tracing is enabled.
> [    0.000000] rcu: 	RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=8.
> [    0.000000] 	Trampoline variant of Tasks RCU enabled.
> [    0.000000] 	Rude variant of Tasks RCU enabled.
> [    0.000000] 	Tracing variant of Tasks RCU enabled.
> [    0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
> [    0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
> [    0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
> [    0.000001] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
> [    0.001107] clocksource: timebase mult[1f40000] shift[24] registered
> [    0.001819] Console: colour dummy device 80x25
> [    0.002296] printk: console [hvc0] enabled
> [    0.002296] printk: console [hvc0] enabled
> [    0.002762] printk: bootconsole [udbg0] disabled
> [    0.002762] printk: bootconsole [udbg0] disabled
> [    0.003323] pid_max: default: 32768 minimum: 301
> [    0.003435] LSM: Security Framework initializing
> [    0.003529] AppArmor: AppArmor initialized
> [    0.003597] Mount-cache hash table entries: 8192 (order: 0, 65536 bytes, linear)
> [    0.003660] Mountpoint-cache hash table entries: 8192 (order: 0, 65536 bytes, linear)
> [    0.004286] EEH: pSeries platform initialized
> [    0.004329] POWER8 performance monitor hardware support registered
> [    0.004380] power8-pmu: PMAO restore workaround active.
> [    0.004434] rcu: Hierarchical SRCU implementation.
> [    0.004660] smp: Bringing up secondary CPUs ...
> [    0.007882] smp: Brought up 1 node, 8 CPUs
> [    0.007927] numa: Node 0 CPUs: 0-7
> [    0.007958] Using standard scheduler topology
> [    0.009002] node 0 deferred pages initialised in 0ms
> [    0.009491] devtmpfs: initialized
> [    0.018942] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> [    0.019031] futex hash table entries: 2048 (order: 2, 262144 bytes, linear)
> [    0.019229] pinctrl core: initialized pinctrl subsystem
> [    0.019602] NET: Registered protocol family 16
> [    0.019805] audit: initializing netlink subsys (disabled)
> [    0.020018] audit: type=2000 audit(1602878417.010:1): state=initialized audit_enabled=0 res=1
> [    0.020091] thermal_sys: Registered thermal governor 'fair_share'
> [    0.020092] thermal_sys: Registered thermal governor 'bang_bang'
> [    0.020180] thermal_sys: Registered thermal governor 'step_wise'
> [    0.020228] thermal_sys: Registered thermal governor 'user_space'
> [    0.020292] cpuidle: using governor ladder
> [    0.020372] cpuidle: using governor menu
> [    0.020521] pstore: Registered nvram as persistent store backend
> Linux ppc64le
> #1 SMP Sun Oct 1[    0.024636] PCI: Probing PCI hardware
> [    0.024713] PCI host bridge to bus 0000:00
> [    0.024746] pci_bus 0000:00: root bus resource [io  0x10000-0x1ffff] (bus address [0x0000-0xffff])
> [    0.024814] pci_bus 0000:00: root bus resource [mem 0x200080000000-0x2000ffffffff] (bus address [0x80000000-0xffffffff])
> [    0.024891] pci_bus 0000:00: root bus resource [mem 0x210000000000-0x21ffffffffff]
> [    0.024950] pci_bus 0000:00: root bus resource [bus 00-ff]
> [    0.026710] IOMMU table initialized, virtual merging enabled
> [    0.026769] pci_bus 0000:00: resource 4 [io  0x10000-0x1ffff]
> [    0.026820] pci_bus 0000:00: resource 5 [mem 0x200080000000-0x2000ffffffff]
> [    0.026869] pci_bus 0000:00: resource 6 [mem 0x210000000000-0x21ffffffffff]
> [    0.026918] EEH: No capable adapters found: recovery disabled.
> [    0.031232] iommu: Default domain type: Translated 
> [    0.031358] vgaarb: loaded
> [    0.031439] pps_core: LinuxPPS API ver. 1 registered
> [    0.031479] pps_core: Software ver. 5.3.6 - Copyright 2005-2007 Rodolfo Giometti <giometti@linux.it>
> [    0.031548] PTP clock support registered
> [    0.031585] EDAC MC: Ver: 3.0.0
> [    0.031861] NetLabel: Initializing
> [    0.031893] NetLabel:  domain hash size = 128
> [    0.031932] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
> [    0.031992] NetLabel:  unlabeled traffic allowed by default
> [    0.045283] clocksource: Switched to clocksource timebase
> [    0.061403] VFS: Disk quotas dquot_6.6.0
> [    0.061497] VFS: Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
> [    0.061581] hugetlbfs: disabling because there are no supported hugepage sizes
> [    0.061727] AppArmor: AppArmor Filesystem Enabled
> [    0.063692] random: fast init done
> [    0.065139] NET: Registered protocol family 2
> [    0.065324] tcp_listen_portaddr_hash hash table entries: 4096 (order: 0, 65536 bytes, linear)
> [    0.065401] TCP established hash table entries: 16384 (order: 1, 131072 bytes, linear)
> [    0.065482] TCP bind hash table entries: 16384 (order: 2, 262144 bytes, linear)
> [    0.065564] TCP: Hash tables configured (established 16384 bind 16384)
> [    0.065658] MPTCP token hash table entries: 4096 (order: 0, 98304 bytes, linear)
> [    0.065729] UDP hash table entries: 2048 (order: 0, 65536 bytes, linear)
> [    0.065785] UDP-Lite hash table entries: 2048 (order: 0, 65536 bytes, linear)
> [    0.065893] NET: Registered protocol family 1
> [    0.065937] NET: Registered protocol family 44
> [    0.065979] PCI: CLS 0 bytes, default 128
> [    0.066053] Trying to unpack rootfs image as initramfs...
> [    0.095306] rcu: INFO: rcu_sched self-detected stall on CPU
> [    0.095492] rcu: 	4-...!: (1 GPs behind) idle=0d6/0/0x1 softirq=6/7 fqs=0 
> [    0.095647] 	(t=13508 jiffies g=-1187 q=215)
> [    0.095769] rcu: rcu_sched kthread starved for 13508 jiffies! g-1187 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x402 ->cpu=0
> [    0.096015] rcu: 	Unless rcu_sched kthread gets sufficient CPU time, OOM is now expected behavior.
> [    0.096233] rcu: RCU grace-period kthread stack dump:
> [    0.096360] task:rcu_sched       state:I stack:    0 pid:   10 ppid:     2 flags:0x00000808
> [    0.096551] Call Trace:
> [    0.096621] [c000000007aab820] [c000000007aab890] 0xc000000007aab890 (unreliable)
> [    0.096812] [c000000007aaba00] [c00000000001c25c] __switch_to+0x11c/0x200
> [    0.096971] [c000000007aaba60] [c000000000ea50ac] __schedule+0x23c/0x800
> [    0.097136] [c000000007aabb30] [c000000000ea56e4] schedule+0x74/0x140
> [    0.097294] [c000000007aabb60] [c000000000eab6b0] schedule_timeout+0xa0/0x1c0
> [    0.097483] [c000000007aabc30] [c00000000021f00c] rcu_gp_fqs_loop+0x48c/0x610
> [    0.097674] [c000000007aabcf0] [c000000000222f40] rcu_gp_kthread+0x200/0x290
> [    0.097859] [c000000007aabdb0] [c000000000190e60] kthread+0x190/0x1a0
> [    0.098018] [c000000007aabe20] [c00000000000d3d0] ret_from_kernel_thread+0x5c/0x6c
> [    0.098211] Sending NMI from CPU 4 to CPUs 0:
> [    6.719253] CPU 0 didn't respond to backtrace IPI, inspecting paca.
> [    6.722181] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 1 (swapper/0)
> [    6.722297] Back trace of paca->saved_r1 (0xc000000007a8f140) (possibly stale):
> [    6.722412] Call Trace:
> [    6.722489] [c000000007a8f140] [0000000000000075] 0x75 (unreliable)
> [    6.722581] [c000000007a8f1a0] [c000000007a8f280] 0xc000000007a8f280
> [    6.722680] [c000000007a8f250] [c000000000254e30] tick_do_update_jiffies64.part.0+0x100/0x1f0
> [    6.722785] [c000000007a8f290] [c00000000025505c] tick_sched_timer+0x13c/0x140
> [    6.722855] [c000000007a8f2d0] [c00000000023ac50] __run_hrtimer+0xb0/0x360
> [    6.722916] [c000000007a8f320] [c00000000023afd4] __hrtimer_run_queues+0xd4/0x1a0
> [    6.722986] [c000000007a8f380] [c00000000023c144] hrtimer_interrupt+0x124/0x300
> [    6.723064] [c000000007a8f430] [c0000000000272b4] timer_interrupt+0x104/0x2d0
> [    6.723141] [c000000007a8f490] [c000000000009aa0] decrementer_common_virt+0x190/0x1a0
> [    6.723227] --- interrupt: 900 at lzma_main+0x1d4/0x2a0
> [    6.723227]     LR = lzma_main+0xcc/0x2a0
> [    6.723317] [c000000007a8f790] [c0000000008dec00] lzma_main+0x1f0/0x2a0 (unreliable)
> [    6.723393] [c000000007a8f7d0] [c0000000008dee90] lzma2_lzma+0x1e0/0x390
> [    6.723448] [c000000007a8f820] [c0000000008df150] xz_dec_lzma2_run+0x110/0x6e0
> [    6.723523] [c000000007a8f8c0] [c0000000008dce58] dec_block+0x218/0x2b0
> [    6.723578] [c000000007a8f920] [c0000000008dd390] dec_main+0x1a0/0x590
> [    6.723645] [c000000007a8f9b0] [c0000000008dd7d4] xz_dec_run+0x54/0x180
> [    6.723714] [c000000007a8f9f0] [c00000000160333c] unxz+0x1e8/0x408
> [    6.723786] [c000000007a8faa0] [c0000000015b6f58] unpack_to_rootfs+0x1e4/0x378
> [    6.723851] [c000000007a8fb50] [c0000000015b7264] populate_rootfs+0x98/0x184
> [    6.723926] [c000000007a8fbd0] [c000000000011ee0] do_one_initcall+0x60/0x2b0
> [    6.723992] [c000000007a8fca0] [c0000000015b4f34] do_initcalls+0x140/0x18c
> [    6.724062] [c000000007a8fd50] [c0000000015b520c] kernel_init_freeable+0x1f0/0x25c
> [    6.724129] [c000000007a8fdb0] [c000000000012534] kernel_init+0x2c/0x168
> [    6.724183] [c000000007a8fe20] [c00000000000d3d0] ret_from_kernel_thread+0x5c/0x6c
> [    6.724315] Sending NMI from CPU 4 to CPUs 1:
> [    6.724569] CPU 1 didn't respond to backtrace IPI, inspecting paca.
> [    6.724582] NMI backtrace for cpu 1
> [    6.724640] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 0 (swapper/1)
> [    6.724801] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-1.g11733e1-default #1 openSUSE Tumbleweed (unreleased)
> [    6.724838] Back trace of paca->saved_r1 (0xc000000007aff720) (possibly stale):
> [    6.724841] Call Trace:
> [    6.724957] NIP:  c0000000000fb260 LR: c0000000000fdf58 CTR: c00000000012e014
> [    6.725027] Sending NMI from CPU 4 to CPUs 2:
> [    6.725049] REGS: c000000007ad7b00 TRAP: 0500   Not tainted  (5.9.0-1.g11733e1-default)
> [    6.725052] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48000224  XER: 00000000
> [    6.725188] CPU 2 didn't respond to backtrace IPI, inspecting paca.
> [    6.725194] NMI backtrace for cpu 2
> [    6.725200] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.9.0-1.g11733e1-default #1 openSUSE Tumbleweed (unreleased)
> [    6.725201] NIP:  c0000000000fb260 LR: c0000000000fdf58 CTR: c00000000012e014
> [    6.725203] REGS: c000000007af7b00 TRAP: 0500   Not tainted  (5.9.0-1.g11733e1-default)
> [    6.725203] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48000024  XER: 00000000
> [    6.725208] CFAR: c0000000001e8fbc IRQMASK: 0 
> [    6.725208] GPR00: 0000000028000024 c000000007af7d90 c000000001cd4a00 0000000000000000 
> [    6.725208] GPR04: c00000007f612c30 0000000000000000 00000000001570d0 000000000014de44 
> [    6.725208] GPR08: 000000007df60000 c00000003fffd000 0000000000000001 00000000ffffffff 
> [    6.725208] GPR12: c0000000000fdec0 c00000003fffd600 0000000000000000 0000000000000000 
> [    6.725208] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [    6.725208] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 
> [    6.725208] GPR24: 0000000000000002 0000000000000000 0000000000000000 c000000001d17640 
> [    6.725208] GPR28: 0000000000000001 c000000007a49e00 c0000000016a7190 c0000000016a7188 
> [    6.725233] NIP [c0000000000fb260] plpar_hcall_norets+0x1c/0x28
> [    6.725235] LR [c0000000000fdf58] pseries_lpar_idle+0x98/0x190
> [    6.725236] Call Trace:
> [    6.725238] [c000000007af7d90] [c000000007a49e00] 0xc000000007a49e00 (unreliable)
> [    6.725241] [c000000007af7e00] [c00000000001da40] arch_cpu_idle+0x50/0x180
> [    6.725266] [c000000007af7e30] [c000000000eacd84] default_idle_call+0x84/0x208
> [    6.725268] CFAR: c0000000001e9170 
> [    6.725272] IRQMASK: 0 
> [    6.725272] GPR00: 0000000028000224 c000000007ad7d90 c000000001cd4a00 0000000000000000 
> [    6.725272] GPR04: c00000007f512c30 0000000000000000 0000000000101b60 00000000000fa76c 
> [    6.725272] GPR08: 000000007de60000 c00000003fffe800 0000000000000001 00000000ffffffff 
> [    6.725272] GPR12: c0000000000fdec0 c00000003fffee00 0000000000000000 0000000000000000 
> [    6.725272] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [    6.725272] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 
> [    6.725272] GPR24: 0000000000000001 0000000000000000 0000000000000000 c000000001d17640 
> [    6.725272] GPR28: 0000000000000001 c000000007a42780 c0000000016a7190 c0000000016a7188 
> [    6.725289] NIP [c0000000000fb260] plpar_hcall_norets+0x1c/0x28
> [    6.725291] LR [c0000000000fdf58] pseries_lpar_idle+0x98/0x190
> [    6.725292] Call Trace:
> [    6.725295] [c000000007ad7d90] [c000000007a42780] 0xc000000007a42780 (unreliable)
> [    6.725298] [c000000007ad7e00] [c00000000001da40] arch_cpu_idle+0x50/0x180
> [    6.725302] [c000000007ad7e30] [c000000000eacd84] default_idle_call+0x84/0x208
> [    6.725305] [c000000007ad7e70] [c0000000001afd6c] cpuidle_idle_call+0x20c/0x2d0
> [    6.725306] [c000000007ad7ec0] [c0000000001aff20] do_idle+0xf0/0x1d0
> [    6.725308] [c000000007ad7f10] [c0000000001b024c] cpu_startup_entry+0x3c/0x40
> [    6.725310] [c000000007ad7f40] [c00000000005f038] start_secondary+0x248/0x250
> [    6.725312] [c000000007ad7f90] [c00000000000c454] start_secondary_prolog+0x10/0x14
> [    6.725312] Instruction dump:
> [    6.725337] 7c0803a6 3884fff8 78630100 78840020 4bfffeb0 3c4c01be 384297bc 7c421378 
> [    6.725340] 7c000026 90010008 60000000 44000022 <80010008> 7c0ff120 4e800020 7c0802a6 
> [    6.725357] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 0 (swapper/2)
> [    6.725517] [c000000007af7e70] [c0000000001afd6c] cpuidle_idle_call+0x20c/0x2d0
> [    6.725556] Back trace of paca->saved_r1 (0x0000000000000000) (possibly stale):
> [    6.725558] Call Trace:
> [    6.725658] [c000000007af7ec0] [c0000000001aff20] do_idle+0xf0/0x1d0
> [    6.725740] Sending NMI from CPU 4 to CPUs 3:
> [    6.725806] [c000000007af7f10] [c0000000001b024c] cpu_startup_entry+0x3c/0x40
> [    6.728644] [c000000007af7f40] [c00000000005f038] start_secondary+0x248/0x250
> [    6.728707] [c000000007af7f90] [c00000000000c454] start_secondary_prolog+0x10/0x14
> [    6.728769] Instruction dump:
> [    6.728805] 7c0803a6 3884fff8 78630100 78840020 4bfffeb0 3c4c01be 384297bc 7c421378 
> [    6.728869] 7c000026 90010008 60000000 44000022 <80010008> 7c0ff120 4e800020 7c0802a6 
> [   12.209770] CPU 3 didn't respond to backtrace IPI, inspecting paca.
> [   12.210353] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 0 (swapper/3)
> [   12.210436] Back trace of paca->saved_r1 (0xc000000007a83760) (possibly stale):
> [   12.210546] Call Trace:
> [   12.210578] [c000000007a83760] [c000000007a837b0] 0xc000000007a837b0 (unreliable)
> [   12.210655] [c000000007a83800] [c0000000001e9574] __pv_queued_spin_lock_slowpath+0x2f4/0x420
> [   12.210739] [c000000007a83850] [c000000000ead414] _raw_spin_lock_irqsave+0xa4/0xc0
> [   12.210814] [c000000007a83880] [c000000000220cc8] rcu_report_qs_rdp.constprop.0+0x38/0x170
> [   12.210892] [c000000007a838c0] [c000000000221334] rcu_core+0xa4/0x2a0
> [   12.210946] [c000000007a83910] [c000000000eadb40] __do_softirq+0x160/0x3f8
> [   12.211001] [c000000007a83a00] [c000000000162cf8] irq_exit+0xd8/0x140
> [   12.211066] [c000000007a83a30] [c0000000000272d8] timer_interrupt+0x128/0x2d0
> [   12.211143] [c000000007a83a90] [c000000000009aa0] decrementer_common_virt+0x190/0x1a0
> [   12.211222] --- interrupt: 900 at plpar_hcall_norets+0x1c/0x28
> [   12.211222]     LR = pseries_lpar_idle+0x98/0x190
> [   12.211307] [c000000007a83d90] [c000000007a4ed00] 0xc000000007a4ed00 (unreliable)
> [   12.211385] [c000000007a83e00] [c00000000001da40] arch_cpu_idle+0x50/0x180
> [   12.211441] [c000000007a83e30] [c000000000eacd84] default_idle_call+0x84/0x208
> [   12.211518] [c000000007a83e70] [c0000000001afd6c] cpuidle_idle_call+0x20c/0x2d0
> [   12.211584] [c000000007a83ec0] [c0000000001aff20] do_idle+0xf0/0x1d0
> [   12.211650] [c000000007a83f10] [c0000000001b024c] cpu_startup_entry+0x3c/0x40
> [   12.211717] [c000000007a83f40] [c00000000005f038] start_secondary+0x248/0x250
> [   12.211792] [c000000007a83f90] [c00000000000c454] start_secondary_prolog+0x10/0x14
> [   12.211879] NMI backtrace for cpu 4
> [   12.211932] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 5.9.0-1.g11733e1-default #1 openSUSE Tumbleweed (unreleased)
> [   12.212022] Call Trace:
> [   12.212055] [c000000007abb4f0] [c0000000008f5de0] dump_stack+0xc4/0x114 (unreliable)
> [   12.212121] [c000000007abb540] [c00000000090316c] nmi_cpu_backtrace+0xac/0x100
> [   12.212187] [c000000007abb5b0] [c00000000090336c] nmi_trigger_cpumask_backtrace+0x1ac/0x1f0
> [   12.212271] [c000000007abb650] [c000000000072e38] arch_trigger_cpumask_backtrace+0x28/0x40
> [   12.212338] [c000000007abb670] [c00000000022479c] rcu_dump_cpu_stacks+0x10c/0x168
> [   12.212416] [c000000007abb6c0] [c00000000021b4d0] print_cpu_stall+0x1d0/0x2a0
> [   12.212492] [c000000007abb770] [c00000000021d998] check_cpu_stall+0x148/0x340
> [   12.212557] [c000000007abb7a0] [c00000000021dbe0] rcu_pending+0x50/0x150
> [   12.212623] [c000000007abb7f0] [c0000000002233f4] rcu_sched_clock_irq+0x84/0x240
> [   12.212699] [c000000007abb830] [c000000000239b28] update_process_times+0x48/0xd0
> [   12.212765] [c000000007abb860] [c0000000002546bc] tick_sched_handle+0x3c/0xd0
> [   12.212842] [c000000007abb890] [c000000000254fb0] tick_sched_timer+0x90/0x140
> [   12.212919] [c000000007abb8d0] [c00000000023ac50] __run_hrtimer+0xb0/0x360
> [   12.212973] [c000000007abb920] [c00000000023afd4] __hrtimer_run_queues+0xd4/0x1a0
> [   12.213050] [c000000007abb980] [c00000000023c144] hrtimer_interrupt+0x124/0x300
> [   12.213128] [c000000007abba30] [c0000000000272b4] timer_interrupt+0x104/0x2d0
> [   12.213203] [c000000007abba90] [c000000000009aa0] decrementer_common_virt+0x190/0x1a0
> [   12.213269] --- interrupt: 900 at plpar_hcall_norets+0x1c/0x28
> [   12.213269]     LR = pseries_lpar_idle+0x98/0x190
> [   12.213361] [c000000007abbd90] [c000000007a44f00] 0xc000000007a44f00 (unreliable)
> [   12.213437] [c000000007abbe00] [c00000000001da40] arch_cpu_idle+0x50/0x180
> [   12.213502] [c000000007abbe30] [c000000000eacd84] default_idle_call+0x84/0x208
> [   12.213567] [c000000007abbe70] [c0000000001afd6c] cpuidle_idle_call+0x20c/0x2d0
> [   12.213645] [c000000007abbec0] [c0000000001aff20] do_idle+0xf0/0x1d0
> [   12.213707] [c000000007abbf10] [c0000000001b0248] cpu_startup_entry+0x38/0x40
> [   12.213775] [c000000007abbf40] [c00000000005f038] start_secondary+0x248/0x250
> [   12.213840] [c000000007abbf90] [c00000000000c454] start_secondary_prolog+0x10/0x14
> [   12.213917] Sending NMI from CPU 4 to CPUs 5:
> [   17.888326] CPU 5 didn't respond to backtrace IPI, inspecting paca.
> [   17.888947] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 0 (swapper/5)
> [   17.889011] Back trace of paca->saved_r1 (0xc000000007ac7720) (possibly stale):
> [   17.889108] Call Trace:
> [   17.889141] [c000000007ac7720] [c0000000001c1684] load_balance+0x1c4/0xaa0 (unreliable)
> [   17.889215] [c000000007ac7800] [c0000000001e9434] __pv_queued_spin_lock_slowpath+0x1b4/0x420
> [   17.889303] [c000000007ac7850] [c000000000ead414] _raw_spin_lock_irqsave+0xa4/0xc0
> [   17.889383] [c000000007ac7880] [c000000000220cc8] rcu_report_qs_rdp.constprop.0+0x38/0x170
> [   17.889463] [c000000007ac78c0] [c000000000221334] rcu_core+0xa4/0x2a0
> [   17.889519] [c000000007ac7910] [c000000000eadb40] __do_softirq+0x160/0x3f8
> [   17.889582] [c000000007ac7a00] [c000000000162cf8] irq_exit+0xd8/0x140
> [   17.889641] [c000000007ac7a30] [c0000000000272d8] timer_interrupt+0x128/0x2d0
> [   17.889712] [c000000007ac7a90] [c000000000009aa0] decrementer_common_virt+0x190/0x1a0
> [   17.889795] --- interrupt: 900 at plpar_hcall_norets+0x1c/0x28
> [   17.889795]     LR = pseries_lpar_idle+0x98/0x190
> [   17.889899] [c000000007ac7d90] [c000000007a5da00] 0xc000000007a5da00 (unreliable)
> [   17.889979] [c000000007ac7e00] [c00000000001da40] arch_cpu_idle+0x50/0x180
> [   17.890044] [c000000007ac7e30] [c000000000eacd84] default_idle_call+0x84/0x208
> [   17.890123] [c000000007ac7e70] [c0000000001afd6c] cpuidle_idle_call+0x20c/0x2d0
> [   17.890201] [c000000007ac7ec0] [c0000000001aff20] do_idle+0xf0/0x1d0
> [   17.890258] [c000000007ac7f10] [c0000000001b0248] cpu_startup_entry+0x38/0x40
> [   17.890337] [c000000007ac7f40] [c00000000005f038] start_secondary+0x248/0x250
> [   17.890402] [c000000007ac7f90] [c00000000000c454] start_secondary_prolog+0x10/0x14
> [   17.890492] Sending NMI from CPU 4 to CPUs 6:
> [   17.890573] NMI backtrace for cpu 6
> [   17.890574] CPU 6 didn't respond to backtrace IPI, inspecting paca.
> [   17.890576] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 0 (swapper/6)
> [   17.890666] CPU: 6 PID: 0 Comm: swapper/6 Not tainted 5.9.0-1.g11733e1-default #1 openSUSE Tumbleweed (unreleased)
> [   17.890719] Back trace of paca->saved_r1 (0x0000000000000000) (possibly stale):
> [   17.890720] Call Trace:
> [   17.890791] NIP:  c0000000000fb260 LR: c0000000000fdf58 CTR: c00000000012e014
> [   17.890885] Sending NMI from CPU 4 to CPUs 7:
> [   17.890955] REGS: c000000007ae7b00 TRAP: 0500   Not tainted  (5.9.0-1.g11733e1-default)
> [   17.890956] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48000024  XER: 00000000
> [   17.891234] CFAR: c0000000000276bc IRQMASK: 0 
> [   17.891234] GPR00: 0000000028000024 c000000007ae7d90 c000000001cd4a00 0000000000000000 
> [   17.891234] GPR04: c000000007ae7e00 0000000000000000 c000000001d03a00 000000000018ed08 
> [   17.891234] GPR08: 000000007e360000 c00000003fff7400 0000000000000001 0000000000000006 
> [   17.891234] GPR12: c0000000000fdec0 c00000003fff7800 0000000000000000 0000000000000000 
> [   17.891234] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [   17.891234] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000001 
> [   17.891234] GPR24: 0000000000000006 0000000000000000 0000000000000000 c000000001d17640 
> [   17.891234] GPR28: 0000000000000001 c000000007a40000 c0000000016a7190 c0000000016a7188 
> [   17.891807] NIP [c0000000000fb260] plpar_hcall_norets+0x1c/0x28
> [   17.891863] LR [c0000000000fdf58] pseries_lpar_idle+0x98/0x190
> [   17.891930] Call Trace:
> [   17.891954] [c000000007ae7d90] [c000000007a40000] 0xc000000007a40000 (unreliable)
> [   17.892034] [c000000007ae7e00] [c00000000001da40] arch_cpu_idle+0x50/0x180
> [   17.892091] [c000000007ae7e30] [c000000000eacd84] default_idle_call+0x84/0x208
> [   17.892172] [c000000007ae7e70] [c0000000001afd6c] cpuidle_idle_call+0x20c/0x2d0
> [   17.892240] [c000000007ae7ec0] [c0000000001aff20] do_idle+0xf0/0x1d0
> [   17.892307] [c000000007ae7f10] [c0000000001b0248] cpu_startup_entry+0x38/0x40
> [   17.892374] [c000000007ae7f40] [c00000000005f038] start_secondary+0x248/0x250
> [   17.892452] [c000000007ae7f90] [c00000000000c454] start_secondary_prolog+0x10/0x14
> [   17.892529] Instruction dump:
> [   17.892566] 7c0803a6 3884fff8 78630100 78840020 4bfffeb0 3c4c01be 384297bc 7c421378 
> [   17.892647] 7c000026 90010008 60000000 44000022 <80010008> 7c0ff120 4e800020 7c0802a6 
> [   24.118936] CPU 7 didn't respond to backtrace IPI, inspecting paca.
> [   24.120313] irq_soft_mask: 0x01 in_mce: 0 in_nmi: 0 current: 0 (swapper/7)
> [   24.120568] Back trace of paca->saved_r1 (0xc000000007aa3760) (possibly stale):
> [   24.120834] Call Trace:
> [   24.120927] [c000000007aa3800] [c0000000001e9574] __pv_queued_spin_lock_slowpath+0x2f4/0x420
> [   24.121220] [c000000007aa3850] [c000000000ead414] _raw_spin_lock_irqsave+0xa4/0xc0
> [   24.121473] [c000000007aa3880] [c000000000220cc8] rcu_report_qs_rdp.constprop.0+0x38/0x170
> [   24.121724] [c000000007aa38c0] [c000000000221334] rcu_core+0xa4/0x2a0
> [   24.121933] [c000000007aa3910] [c000000000eadb40] __do_softirq+0x160/0x3f8
> [   24.122144] [c000000007aa3a00] [c000000000162cf8] irq_exit+0xd8/0x140
> [   24.122354] [c000000007aa3a30] [c0000000000272d8] timer_interrupt+0x128/0x2d0
> [   24.122606] [c000000007aa3a90] [c000000000009aa0] decrementer_common_virt+0x190/0x1a0
> [   24.122858] --- interrupt: 900 at plpar_hcall_norets+0x1c/0x28
> [   24.122858]     LR = pseries_lpar_idle+0x98/0x190
> [   24.123191] [c000000007aa3d90] [c000000007a71600] 0xc000000007a71600 (unreliable)
> [   24.123443] [c000000007aa3e00] [c00000000001da40] arch_cpu_idle+0x50/0x180
> [   24.123654] [c000000007aa3e30] [c000000000eacd84] default_idle_call+0x84/0x208
> [   24.123908] [c000000007aa3e70] [c0000000001afd6c] cpuidle_idle_call+0x20c/0x2d0
> [   24.124159] [c000000007aa3ec0] [c0000000001aff20] do_idle+0xf0/0x1d0
> [   24.124367] [c000000007aa3f10] [c0000000001b0248] cpu_startup_entry+0x38/0x40
> [   24.124619] [c000000007aa3f40] [c00000000005f038] start_secondary+0x248/0x250
> [   24.124869] [c000000007aa3f90] [c00000000000c454] start_secondary_prolog+0x10/0x14
> 
> 
> qemu and overall the problem is the same - the boot stops early.
> 

^ permalink raw reply

* [PATCH v2 0/2] Fixes for coregroup
From: Srikar Dronamraju @ 2020-10-19  4:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Srikar Dronamraju, Peter Zijlstra,
	LKML, Valentin Schneider, Qian Cai, linuxppc-dev, Ingo Molnar

These patches fixes problems introduced by the coregroup patches.
The first patch we remove a redundant variable.
Second patch allows to boot with CONFIG_CPUMASK_OFFSTACK enabled.

Changelog v1->v2:
https://lore.kernel.org/linuxppc-dev/20201008034240.34059-1-srikar@linux.vnet.ibm.com/t/#u
1. 1st patch was not part of previous posting.
2. Updated 2nd patch based on comments from Michael Ellerman

Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Qian Cai <cai@redhat.com>

Srikar Dronamraju (2):
  powerpc/smp: Remove unnecessary variable
  powerpc/smp: Use GFP_ATOMIC while allocating tmp mask

 arch/powerpc/kernel/smp.c | 70 +++++++++++++++++++--------------------
 1 file changed, 35 insertions(+), 35 deletions(-)

-- 
2.18.2


^ permalink raw reply

* [PATCH 1/2] powerpc/smp: Remove unnecessary variable
From: Srikar Dronamraju @ 2020-10-19  4:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Srikar Dronamraju, Peter Zijlstra,
	LKML, Valentin Schneider, Qian Cai, linuxppc-dev, Ingo Molnar
In-Reply-To: <20201019042716.106234-1-srikar@linux.vnet.ibm.com>

Commit 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2")
introduced submask_fn in update_mask_by_l2 to track the right submask.
However commit f6606cfdfbcd ("powerpc/smp: Dont assume l2-cache to be
superset of sibling") introduced sibling_mask in update_mask_by_l2 to
track the same submask. Remove sibling_mask in favour of submask_fn.

Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Qian Cai <cai@redhat.com>
---
 arch/powerpc/kernel/smp.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 8d1c401f4617..a864b9b3228c 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1264,18 +1264,16 @@ static bool update_mask_by_l2(int cpu)
 	cpumask_var_t mask;
 	int i;
 
+	if (has_big_cores)
+		submask_fn = cpu_smallcore_mask;
+
 	l2_cache = cpu_to_l2cache(cpu);
 	if (!l2_cache) {
-		struct cpumask *(*sibling_mask)(int) = cpu_sibling_mask;
-
 		/*
 		 * If no l2cache for this CPU, assume all siblings to share
 		 * cache with this CPU.
 		 */
-		if (has_big_cores)
-			sibling_mask = cpu_smallcore_mask;
-
-		for_each_cpu(i, sibling_mask(cpu))
+		for_each_cpu(i, submask_fn(cpu))
 			set_cpus_related(cpu, i, cpu_l2_cache_mask);
 
 		return false;
@@ -1284,9 +1282,6 @@ static bool update_mask_by_l2(int cpu)
 	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));
 	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));
 
-	if (has_big_cores)
-		submask_fn = cpu_smallcore_mask;
-
 	/* Update l2-cache mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
 
-- 
2.18.2


^ permalink raw reply related

* [PATCH 2/2] powerpc/smp: Use GFP_ATOMIC while allocating tmp mask
From: Srikar Dronamraju @ 2020-10-19  4:27 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Nathan Lynch, Gautham R Shenoy, Srikar Dronamraju, Qian Cai, LKML,
	Valentin Schneider, Peter Zijlstra, linuxppc-dev, Ingo Molnar
In-Reply-To: <20201019042716.106234-1-srikar@linux.vnet.ibm.com>

Qian Cai reported a regression where CPU Hotplug fails with the latest
powerpc/next

BUG: sleeping function called from invalid context at mm/slab.h:494
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/88
no locks held by swapper/88/0.
irq event stamp: 18074448
hardirqs last  enabled at (18074447): [<c0000000001a2a7c>] tick_nohz_idle_enter+0x9c/0x110
hardirqs last disabled at (18074448): [<c000000000106798>] do_idle+0x138/0x3b0
do_idle at kernel/sched/idle.c:253 (discriminator 1)
softirqs last  enabled at (18074440): [<c0000000000bbec4>] irq_enter_rcu+0x94/0xa0
softirqs last disabled at (18074439): [<c0000000000bbea0>] irq_enter_rcu+0x70/0xa0
CPU: 88 PID: 0 Comm: swapper/88 Tainted: G        W         5.9.0-rc8-next-20201007 #1
Call Trace:
[c00020000a4bfcf0] [c000000000649e98] dump_stack+0xec/0x144 (unreliable)
[c00020000a4bfd30] [c0000000000f6c34] ___might_sleep+0x2f4/0x310
[c00020000a4bfdb0] [c000000000354f94] slab_pre_alloc_hook.constprop.82+0x124/0x190
[c00020000a4bfe00] [c00000000035e9e8] __kmalloc_node+0x88/0x3a0
slab_alloc_node at mm/slub.c:2817
(inlined by) __kmalloc_node at mm/slub.c:4013
[c00020000a4bfe80] [c0000000006494d8] alloc_cpumask_var_node+0x38/0x80
kmalloc_node at include/linux/slab.h:577
(inlined by) alloc_cpumask_var_node at lib/cpumask.c:116
[c00020000a4bfef0] [c00000000003eedc] start_secondary+0x27c/0x800
update_mask_by_l2 at arch/powerpc/kernel/smp.c:1267
(inlined by) add_cpu_to_masks at arch/powerpc/kernel/smp.c:1387
(inlined by) start_secondary at arch/powerpc/kernel/smp.c:1420
[c00020000a4bff90] [c00000000000c468] start_secondary_resume+0x10/0x14

Allocating a temporary mask while performing a CPU Hotplug operation
with CONFIG_CPUMASK_OFFSTACK enabled, leads to calling a sleepable
function from a atomic context. Fix this by allocating the temporary
mask with GFP_ATOMIC flag. Also instead of having to allocate twice,
allocate the mask in the caller so that we only have to allocate once.
If the allocation fails, assume the mask to be same as sibling mask, which
will make the scheduler to drop this domain for this CPU.

Fixes: 70a94089d7f7 ("powerpc/smp: Optimize update_coregroup_mask")
Fixes: 3ab33d6dc3e9 ("powerpc/smp: Optimize update_mask_by_l2")
Reported-by: Qian Cai <cai@redhat.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Lynch <nathanl@linux.ibm.com>
Cc: Gautham R Shenoy <ego@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Qian Cai <cai@redhat.com>
---
Changelog v1->v2:
https://lore.kernel.org/linuxppc-dev/20201008034240.34059-1-srikar@linux.vnet.ibm.com/t/#u
Updated 2nd patch based on comments from Michael Ellerman
- Remove the WARN_ON.
- Handle allocation failures in a more subtle fashion
- Allocate in the caller so that we allocate once.

 arch/powerpc/kernel/smp.c | 57 +++++++++++++++++++++------------------
 1 file changed, 31 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index a864b9b3228c..028479e9b66b 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1257,38 +1257,33 @@ static struct device_node *cpu_to_l2cache(int cpu)
 	return cache;
 }
 
-static bool update_mask_by_l2(int cpu)
+static bool update_mask_by_l2(int cpu, cpumask_var_t *mask)
 {
 	struct cpumask *(*submask_fn)(int) = cpu_sibling_mask;
 	struct device_node *l2_cache, *np;
-	cpumask_var_t mask;
 	int i;
 
 	if (has_big_cores)
 		submask_fn = cpu_smallcore_mask;
 
 	l2_cache = cpu_to_l2cache(cpu);
-	if (!l2_cache) {
-		/*
-		 * If no l2cache for this CPU, assume all siblings to share
-		 * cache with this CPU.
-		 */
+	if (!l2_cache || !*mask) {
+		/* Assume only core siblings share cache with this CPU */
 		for_each_cpu(i, submask_fn(cpu))
 			set_cpus_related(cpu, i, cpu_l2_cache_mask);
 
 		return false;
 	}
 
-	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));
-	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));
+	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
 
 	/* Update l2-cache mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_l2_cache_mask);
 
 	/* Skip all CPUs already part of current CPU l2-cache mask */
-	cpumask_andnot(mask, mask, cpu_l2_cache_mask(cpu));
+	cpumask_andnot(*mask, *mask, cpu_l2_cache_mask(cpu));
 
-	for_each_cpu(i, mask) {
+	for_each_cpu(i, *mask) {
 		/*
 		 * when updating the marks the current CPU has not been marked
 		 * online, but we need to update the cache masks
@@ -1298,15 +1293,14 @@ static bool update_mask_by_l2(int cpu)
 		/* Skip all CPUs already part of current CPU l2-cache */
 		if (np == l2_cache) {
 			or_cpumasks_related(cpu, i, submask_fn, cpu_l2_cache_mask);
-			cpumask_andnot(mask, mask, submask_fn(i));
+			cpumask_andnot(*mask, *mask, submask_fn(i));
 		} else {
-			cpumask_andnot(mask, mask, cpu_l2_cache_mask(i));
+			cpumask_andnot(*mask, *mask, cpu_l2_cache_mask(i));
 		}
 
 		of_node_put(np);
 	}
 	of_node_put(l2_cache);
-	free_cpumask_var(mask);
 
 	return true;
 }
@@ -1349,40 +1343,46 @@ static inline void add_cpu_to_smallcore_masks(int cpu)
 	}
 }
 
-static void update_coregroup_mask(int cpu)
+static void update_coregroup_mask(int cpu, cpumask_var_t *mask)
 {
 	struct cpumask *(*submask_fn)(int) = cpu_sibling_mask;
-	cpumask_var_t mask;
 	int coregroup_id = cpu_to_coregroup_id(cpu);
 	int i;
 
-	alloc_cpumask_var_node(&mask, GFP_KERNEL, cpu_to_node(cpu));
-	cpumask_and(mask, cpu_online_mask, cpu_cpu_mask(cpu));
-
 	if (shared_caches)
 		submask_fn = cpu_l2_cache_mask;
 
+	if (!*mask) {
+		/* Assume only siblings are part of this CPU's coregroup */
+		for_each_cpu(i, submask_fn(cpu))
+			set_cpus_related(cpu, i, cpu_coregroup_mask);
+
+		return;
+	}
+
+	cpumask_and(*mask, cpu_online_mask, cpu_cpu_mask(cpu));
+
 	/* Update coregroup mask with all the CPUs that are part of submask */
 	or_cpumasks_related(cpu, cpu, submask_fn, cpu_coregroup_mask);
 
 	/* Skip all CPUs already part of coregroup mask */
-	cpumask_andnot(mask, mask, cpu_coregroup_mask(cpu));
+	cpumask_andnot(*mask, *mask, cpu_coregroup_mask(cpu));
 
-	for_each_cpu(i, mask) {
+	for_each_cpu(i, *mask) {
 		/* Skip all CPUs not part of this coregroup */
 		if (coregroup_id == cpu_to_coregroup_id(i)) {
 			or_cpumasks_related(cpu, i, submask_fn, cpu_coregroup_mask);
-			cpumask_andnot(mask, mask, submask_fn(i));
+			cpumask_andnot(*mask, *mask, submask_fn(i));
 		} else {
-			cpumask_andnot(mask, mask, cpu_coregroup_mask(i));
+			cpumask_andnot(*mask, *mask, cpu_coregroup_mask(i));
 		}
 	}
-	free_cpumask_var(mask);
 }
 
 static void add_cpu_to_masks(int cpu)
 {
 	int first_thread = cpu_first_thread_sibling(cpu);
+	cpumask_var_t mask;
 	int i;
 
 	/*
@@ -1396,10 +1396,15 @@ static void add_cpu_to_masks(int cpu)
 			set_cpus_related(i, cpu, cpu_sibling_mask);
 
 	add_cpu_to_smallcore_masks(cpu);
-	update_mask_by_l2(cpu);
+
+	/* In CPU-hotplug path, hence use GFP_ATOMIC */
+	alloc_cpumask_var_node(&mask, GFP_ATOMIC, cpu_to_node(cpu));
+	update_mask_by_l2(cpu, &mask);
 
 	if (has_coregroup_support())
-		update_coregroup_mask(cpu);
+		update_coregroup_mask(cpu, &mask);
+
+	free_cpumask_var(mask);
 }
 
 /* Activate a secondary processor. */
-- 
2.18.2


^ permalink raw reply related

* Re: KVM on POWER8 host lock up since 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
From: Nicholas Piggin @ 2020-10-19  4:50 UTC (permalink / raw)
  To: Michal Suchánek; +Cc: ro, linuxppc-dev, Hari Bathini
In-Reply-To: <1603066878.gtbyofrzyo.astroid@bobo.none>

Excerpts from Nicholas Piggin's message of October 19, 2020 11:00 am:
> Excerpts from Michal Suchánek's message of October 17, 2020 6:14 am:
>> On Mon, Sep 07, 2020 at 11:13:47PM +1000, Nicholas Piggin wrote:
>>> Excerpts from Michael Ellerman's message of August 31, 2020 8:50 pm:
>>> > Michal Suchánek <msuchanek@suse.de> writes:
>>> >> On Mon, Aug 31, 2020 at 11:14:18AM +1000, Nicholas Piggin wrote:
>>> >>> Excerpts from Michal Suchánek's message of August 31, 2020 6:11 am:
>>> >>> > Hello,
>>> >>> > 
>>> >>> > on POWER8 KVM hosts lock up since commit 10d91611f426 ("powerpc/64s:
>>> >>> > Reimplement book3s idle code in C").
>>> >>> > 
>>> >>> > The symptom is host locking up completely after some hours of KVM
>>> >>> > workload with messages like
>>> >>> > 
>>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 71
>>> >>> > 2020-08-30T10:51:31+00:00 obs-power8-01 kernel: KVM: couldn't grab cpu 47
>>> >>> > 
>>> >>> > printed before the host locks up.
>>> >>> > 
>>> >>> > The machines run sandboxed builds which is a mixed workload resulting in
>>> >>> > IO/single core/mutiple core load over time and there are periods of no
>>> >>> > activity and no VMS runnig as well. The VMs are shortlived so VM
>>> >>> > setup/terdown is somewhat excercised as well.
>>> >>> > 
>>> >>> > POWER9 with the new guest entry fast path does not seem to be affected.
>>> >>> > 
>>> >>> > Reverted the patch and the followup idle fixes on top of 5.2.14 and
>>> >>> > re-applied commit a3f3072db6ca ("powerpc/powernv/idle: Restore IAMR
>>> >>> > after idle") which gives same idle code as 5.1.16 and the kernel seems
>>> >>> > stable.
>>> >>> > 
>>> >>> > Config is attached.
>>> >>> > 
>>> >>> > I cannot easily revert this commit, especially if I want to use the same
>>> >>> > kernel on POWER8 and POWER9 - many of the POWER9 fixes are applicable
>>> >>> > only to the new idle code.
>>> >>> > 
>>> >>> > Any idea what can be the problem?
>>> >>> 
>>> >>> So hwthread_state is never getting back to to HWTHREAD_IN_IDLE on
>>> >>> those threads. I wonder what they are doing. POWER8 doesn't have a good
>>> >>> NMI IPI and I don't know if it supports pdbg dumping registers from the
>>> >>> BMC unfortunately.
>>> >>
>>> >> It may be possible to set up fadump with a later kernel version that
>>> >> supports it on powernv and dump the whole kernel.
>>> > 
>>> > Your firmware won't support it AFAIK.
>>> > 
>>> > You could try kdump, but if we have CPUs stuck in KVM then there's a
>>> > good chance it won't work :/
>>> 
>>> I haven't had any luck yet reproducing this still. Testing with sub 
>>> cores of various different combinations, etc. I'll keep trying though.
>> 
>> Hello,
>> 
>> I tried running some KVM guests to simulate the workload and what I get
>> is guests failing to start with a rcu stall. Tried both 5.3 and 5.9
>> kernel and qemu 4.2.1 and 5.1.0
>> 
>> To start some guests I run
>> 
>> for i in $(seq 0 9) ; do /opt/qemu/bin/qemu-system-ppc64 -m 2048 -accel kvm -smp 8 -kernel /boot/vmlinux -initrd /boot/initrd -nodefaults -nographic -serial mon:telnet::444$i,server,wait & done
>> 
>> To simulate some workload I run
>> 
>> xz -zc9T0 < /dev/zero > /dev/null &
>> while true; do
>>     killall -STOP xz; sleep 1; killall -CONT xz; sleep 1;
>> done &
>> 
>> on the host and add a job that executes this to the ramdisk. However, most
>> guests never get to the point where the job is executed.
>> 
>> Any idea what might be the problem?
> 
> I would say try without pv queued spin locks (but if the same thing is 
> happening with 5.3 then it must be something else I guess). 
> 
> I'll try to test a similar setup on a POWER8 here.

Couldn't reproduce the guest hang, they seem to run fine even with 
queued spinlocks. Might have a different .config.

I might have got a lockup in the host (although different symptoms than 
the original report). I'll look into that a bit further.

Thanks,
Nick

^ permalink raw reply

* Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
From: Joel Stanley @ 2020-10-19  4:55 UTC (permalink / raw)
  To: Christophe Leroy, Segher Boessenkool
  Cc: linux-arch, Arnd Bergmann, Linux Kernel Mailing List,
	Masahiro Yamada, Andrew Morton, linuxppc-dev
In-Reply-To: <96c6172d619c51acc5c1c4884b80785c59af4102.1602949927.git.christophe.leroy@csgroup.eu>

On Sat, 17 Oct 2020 at 15:55, Christophe Leroy
<christophe.leroy@csgroup.eu> wrote:
>
> When building mpc885_ads_defconfig with gcc 10.1,
> the function get_order() appears 50 times in vmlinux:
>
> [linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l
> 50
>
> [linux]# size vmlinux
>    text    data     bss     dec     hex filename
> 3842620  675624  135160 4653404  47015c vmlinux
>
> In the old days, marking a function 'static inline' was forcing
> GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable
> CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline
> a function.
>
> It looks like GCC 10 is taking poor decisions on this.
>
> get_order() compiles into the following tiny function,
> occupying 20 bytes of text.
>
> 0000007c <get_order>:
>   7c:   38 63 ff ff     addi    r3,r3,-1
>   80:   54 63 a3 3e     rlwinm  r3,r3,20,12,31
>   84:   7c 63 00 34     cntlzw  r3,r3
>   88:   20 63 00 20     subfic  r3,r3,32
>   8c:   4e 80 00 20     blr
>
> By forcing get_order() to be __always_inline, the size of text is
> reduced by 1940 bytes, that is almost twice the space occupied by
> 50 times get_order()
>
> [linux-powerpc]# size vmlinux
>    text    data     bss     dec     hex filename
> 3840680  675588  135176 4651444  46f9b4 vmlinux

I see similar results with GCC 10.2 building for arm32. There are 143
instances of get_order with aspeed_g5_defconfig.

Before:
 9071838 2630138  186468 11888444         b5673c vmlinux
After:
 9069886 2630126  186468 11886480         b55f90 vmlinux

1952 bytes smaller with your patch applied. Did you raise this with
anyone from GCC?

Reviewed-by: Joel Stanley <joel@jms.id.au>



> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
> ---
>  include/asm-generic/getorder.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/asm-generic/getorder.h b/include/asm-generic/getorder.h
> index e9f20b813a69..f2979e3a96b6 100644
> --- a/include/asm-generic/getorder.h
> +++ b/include/asm-generic/getorder.h
> @@ -26,7 +26,7 @@
>   *
>   * The result is undefined if the size is 0.
>   */
> -static inline __attribute_const__ int get_order(unsigned long size)
> +static __always_inline __attribute_const__ int get_order(unsigned long size)
>  {
>         if (__builtin_constant_p(size)) {
>                 if (!size)
> --
> 2.25.0
>

^ permalink raw reply

* Re: [PATCH v4 0/2] powerpc/mce: Fix mce handler and add selftest
From: Ganesh @ 2020-10-19  5:26 UTC (permalink / raw)
  To: Michael Ellerman, mpe, linuxppc-dev; +Cc: msuchanek, mahesh, npiggin, keescook
In-Reply-To: <160284791867.1794337.3709853894956238598.b4-ty@ellerman.id.au>

On 10/16/20 5:02 PM, Michael Ellerman wrote:

> On Fri, 9 Oct 2020 12:10:03 +0530, Ganesh Goudar wrote:
>> This patch series fixes mce handling for pseries, Adds LKDTM test
>> for SLB multihit recovery and enables selftest for the same,
>> basically to test MCE handling on pseries/powernv machines running
>> in hash mmu mode.
>>
>> v4:
>> * Use radix_enabled() to check if its in Hash or Radix mode.
>> * Use FW_FEATURE_LPAR instead of machine_is_pseries().
>>
>> [...]
> Patch 1 applied to powerpc/fixes.
>
> [1/2] powerpc/mce: Avoid nmi_enter/exit in real mode on pseries hash
>        https://git.kernel.org/powerpc/c/8d0e2101274358d9b6b1f27232b40253ca48bab5
>
> cheers
Thank you, Any comments on patch 2.

^ permalink raw reply

* Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
From: Christophe Leroy @ 2020-10-19  5:50 UTC (permalink / raw)
  To: Joel Stanley, Segher Boessenkool
  Cc: linux-arch, Arnd Bergmann, Linux Kernel Mailing List,
	Masahiro Yamada, Andrew Morton, linuxppc-dev
In-Reply-To: <CACPK8XfgK0Bj3sLjKCi80jS6vK34FN5BTkU8FvBGcMR=RQn4Xw@mail.gmail.com>



Le 19/10/2020 à 06:55, Joel Stanley a écrit :
> On Sat, 17 Oct 2020 at 15:55, Christophe Leroy
> <christophe.leroy@csgroup.eu> wrote:
>>
>> When building mpc885_ads_defconfig with gcc 10.1,
>> the function get_order() appears 50 times in vmlinux:
>>
>> [linux]# ppc-linux-objdump -x vmlinux | grep get_order | wc -l
>> 50
>>
>> [linux]# size vmlinux
>>     text    data     bss     dec     hex filename
>> 3842620  675624  135160 4653404  47015c vmlinux
>>
>> In the old days, marking a function 'static inline' was forcing
>> GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable
>> CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline
>> a function.
>>
>> It looks like GCC 10 is taking poor decisions on this.
>>
>> get_order() compiles into the following tiny function,
>> occupying 20 bytes of text.
>>
>> 0000007c <get_order>:
>>    7c:   38 63 ff ff     addi    r3,r3,-1
>>    80:   54 63 a3 3e     rlwinm  r3,r3,20,12,31
>>    84:   7c 63 00 34     cntlzw  r3,r3
>>    88:   20 63 00 20     subfic  r3,r3,32
>>    8c:   4e 80 00 20     blr
>>
>> By forcing get_order() to be __always_inline, the size of text is
>> reduced by 1940 bytes, that is almost twice the space occupied by
>> 50 times get_order()
>>
>> [linux-powerpc]# size vmlinux
>>     text    data     bss     dec     hex filename
>> 3840680  675588  135176 4651444  46f9b4 vmlinux
> 
> I see similar results with GCC 10.2 building for arm32. There are 143
> instances of get_order with aspeed_g5_defconfig.
> 
> Before:
>   9071838 2630138  186468 11888444         b5673c vmlinux
> After:
>   9069886 2630126  186468 11886480         b55f90 vmlinux
> 
> 1952 bytes smaller with your patch applied. Did you raise this with
> anyone from GCC?

Yes I did, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445

For the time being, it's at a standstill.

Christophe

> 
> Reviewed-by: Joel Stanley <joel@jms.id.au>
> 
> 
> 
>> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
>> ---
>>   include/asm-generic/getorder.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/asm-generic/getorder.h b/include/asm-generic/getorder.h
>> index e9f20b813a69..f2979e3a96b6 100644
>> --- a/include/asm-generic/getorder.h
>> +++ b/include/asm-generic/getorder.h
>> @@ -26,7 +26,7 @@
>>    *
>>    * The result is undefined if the size is 0.
>>    */
>> -static inline __attribute_const__ int get_order(unsigned long size)
>> +static __always_inline __attribute_const__ int get_order(unsigned long size)
>>   {
>>          if (__builtin_constant_p(size)) {
>>                  if (!size)
>> --
>> 2.25.0
>>

^ permalink raw reply

* Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
From: Segher Boessenkool @ 2020-10-19  8:32 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: linux-arch, Arnd Bergmann, Linux Kernel Mailing List,
	Masahiro Yamada, Joel Stanley, Andrew Morton, linuxppc-dev
In-Reply-To: <0bd0afae-f043-2811-944b-c94d90e231d2@csgroup.eu>

On Mon, Oct 19, 2020 at 07:50:41AM +0200, Christophe Leroy wrote:
> Le 19/10/2020 à 06:55, Joel Stanley a écrit :
> >>In the old days, marking a function 'static inline' was forcing
> >>GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable
> >>CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline
> >>a function.
> >>
> >>It looks like GCC 10 is taking poor decisions on this.

> >1952 bytes smaller with your patch applied. Did you raise this with
> >anyone from GCC?
> 
> Yes I did, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> 
> For the time being, it's at a standstill.

The kernel should just use __always_inline if that is what it *wants*;
that is true here most likely.  GCC could perhaps improve its heuristics
so that it no longer thinks these functions are often too big for
inlining (they *are* pretty big, but not after basic optimisations with
constant integer arguments).


Segher

^ permalink raw reply

* Re: [PATCH] asm-generic: Force inlining of get_order() to work around gcc10 poor decision
From: Christophe Leroy @ 2020-10-19  8:54 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: linux-arch, Arnd Bergmann, Linux Kernel Mailing List,
	Masahiro Yamada, Joel Stanley, Andrew Morton, linuxppc-dev
In-Reply-To: <20201019083225.GN2672@gate.crashing.org>



Le 19/10/2020 à 10:32, Segher Boessenkool a écrit :
> On Mon, Oct 19, 2020 at 07:50:41AM +0200, Christophe Leroy wrote:
>> Le 19/10/2020 à 06:55, Joel Stanley a écrit :
>>>> In the old days, marking a function 'static inline' was forcing
>>>> GCC to inline, but since commit ac7c3e4ff401 ("compiler: enable
>>>> CONFIG_OPTIMIZE_INLINING forcibly") GCC may decide to not inline
>>>> a function.
>>>>
>>>> It looks like GCC 10 is taking poor decisions on this.
> 
>>> 1952 bytes smaller with your patch applied. Did you raise this with
>>> anyone from GCC?
>>
>> Yes I did, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
>>
>> For the time being, it's at a standstill.
> 
> The kernel should just use __always_inline if that is what it *wants*;
> that is true here most likely.  GCC could perhaps improve its heuristics
> so that it no longer thinks these functions are often too big for
> inlining (they *are* pretty big, but not after basic optimisations with
> constant integer arguments).
> 

Yes I guess __always_inline is to be added on functions like this defined in headers for exactly 
that, and that's the purpose of this patch.

However I find it odd that get_order() is outlined by GCC even in some object files that don't use 
it at all, for instance in fs/pipe.o

Christophe

^ permalink raw reply

* [PATCH] drm/amd/display: Fix missing declaration of enable_kernel_vsx()
From: Christophe Leroy @ 2020-10-19  9:39 UTC (permalink / raw)
  To: Harry Wentland, Leo Li, Alex Deucher, Christian König,
	David Airlie, Daniel Vetter
  Cc: dri-devel, linuxppc-dev, linux-kernel, amd-gfx

Include <asm/switch-to.h> in order to avoid following build failure
because of missing declaration of enable_kernel_vsx()

  CC [M]  drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o
In file included from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services_types.h:29,
                 from ./drivers/gpu/drm/amd/amdgpu/../display/dc/dm_services.h:37,
                 from drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:27:
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c: In function 'dcn_bw_apply_registry_override':
./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:64:3: error: implicit declaration of function 'enable_kernel_vsx'; did you mean 'enable_kernel_fp'? [-Werror=implicit-function-declaration]
   64 |   enable_kernel_vsx(); \
      |   ^~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:640:2: note: in expansion of macro 'DC_FP_START'
  640 |  DC_FP_START();
      |  ^~~~~~~~~~~
./drivers/gpu/drm/amd/amdgpu/../display/dc/os_types.h:75:3: error: implicit declaration of function 'disable_kernel_vsx'; did you mean 'disable_kernel_fp'? [-Werror=implicit-function-declaration]
   75 |   disable_kernel_vsx(); \
      |   ^~~~~~~~~~~~~~~~~~
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:676:2: note: in expansion of macro 'DC_FP_END'
  676 |  DC_FP_END();
      |  ^~~~~~~~~
cc1: some warnings being treated as errors
make[5]: *** [drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.o] Error 1

Fixes: 16a9dea110a6 ("amdgpu: Enable initial DCN support on POWER")
Cc: stable@vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 drivers/gpu/drm/amd/display/dc/os_types.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/display/dc/os_types.h b/drivers/gpu/drm/amd/display/dc/os_types.h
index c3bbfe397e8d..9000cf188544 100644
--- a/drivers/gpu/drm/amd/display/dc/os_types.h
+++ b/drivers/gpu/drm/amd/display/dc/os_types.h
@@ -33,6 +33,7 @@
 #include <linux/slab.h>
 
 #include <asm/byteorder.h>
+#include <asm/switch-to.h>
 
 #include <drm/drm_print.h>
 
-- 
2.25.0


^ permalink raw reply related

* Re: [PATCH v4 2/2] lkdtm/powerpc: Add SLB multihit test
From: Michael Ellerman @ 2020-10-19 10:59 UTC (permalink / raw)
  To: Ganesh Goudar, linuxppc-dev
  Cc: msuchanek, Ganesh Goudar, keescook, npiggin, mahesh
In-Reply-To: <20201009064005.19777-3-ganeshgr@linux.ibm.com>

Hi Ganesh,

Some comments below ...

Ganesh Goudar <ganeshgr@linux.ibm.com> writes:
> To check machine check handling, add support to inject slb
> multihit errors.
>
> Cc: Kees Cook <keescook@chromium.org>
> Reviewed-by: Michal Suchánek <msuchanek@suse.de>
> Co-developed-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
> Signed-off-by: Ganesh Goudar <ganeshgr@linux.ibm.com>
> ---
>  drivers/misc/lkdtm/Makefile             |   1 +
>  drivers/misc/lkdtm/core.c               |   3 +
>  drivers/misc/lkdtm/lkdtm.h              |   3 +
>  drivers/misc/lkdtm/powerpc.c            | 156 ++++++++++++++++++++++++
>  tools/testing/selftests/lkdtm/tests.txt |   1 +
>  5 files changed, 164 insertions(+)
>  create mode 100644 drivers/misc/lkdtm/powerpc.c
>
..
> diff --git a/drivers/misc/lkdtm/powerpc.c b/drivers/misc/lkdtm/powerpc.c
> new file mode 100644
> index 000000000000..f388b53dccba
> --- /dev/null
> +++ b/drivers/misc/lkdtm/powerpc.c
> @@ -0,0 +1,156 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include "lkdtm.h"
> +#include <linux/slab.h>
> +#include <linux/vmalloc.h>

Usual style is to include the linux headers first and then the local header.

> +
> +/* Gets index for new slb entry */
> +static inline unsigned long get_slb_index(void)
> +{
> +	unsigned long index;
> +
> +	index = get_paca()->stab_rr;
> +
> +	/*
> +	 * simple round-robin replacement of slb starting at SLB_NUM_BOLTED.
> +	 */
> +	if (index < (mmu_slb_size - 1))
> +		index++;
> +	else
> +		index = SLB_NUM_BOLTED;
> +	get_paca()->stab_rr = index;
> +	return index;
> +}

I'm not sure we need that really?

We can just always insert at SLB_MUM_BOLTED and SLB_NUM_BOLTED + 1.

Or we could allocate from the top down using mmu_slb_size - 1, and
mmu_slb_size - 2.


> +#define slb_esid_mask(ssize)	\
> +	(((ssize) == MMU_SEGSIZE_256M) ? ESID_MASK : ESID_MASK_1T)
> +
> +/* Form the operand for slbmte */
> +static inline unsigned long mk_esid_data(unsigned long ea, int ssize,
> +					 unsigned long slot)
> +{
> +	return (ea & slb_esid_mask(ssize)) | SLB_ESID_V | slot;
> +}
> +
> +#define slb_vsid_shift(ssize)	\
> +	((ssize) == MMU_SEGSIZE_256M ? SLB_VSID_SHIFT : SLB_VSID_SHIFT_1T)
> +
> +/* Form the operand for slbmte */
> +static inline unsigned long mk_vsid_data(unsigned long ea, int ssize,
> +					 unsigned long flags)
> +{
> +	return (get_kernel_vsid(ea, ssize) << slb_vsid_shift(ssize)) | flags |
> +		((unsigned long)ssize << SLB_VSID_SSIZE_SHIFT);
> +}

I realise it's not much code, but I'd rather those were in a header,
rather than copied from slb.c. That way they can never skew vs the
versions in slb.c

Best place I think would be arch/powerpc/include/asm/book3s/64/mmu-hash.h


> +
> +/* Inserts new slb entry */

It inserts two.

> +static void insert_slb_entry(char *p, int ssize)
> +{
> +	unsigned long flags, entry;
> +
> +	flags = SLB_VSID_KERNEL | mmu_psize_defs[MMU_PAGE_64K].sllp;

That won't work if the kernel is built for 4K pages. Or at least it
won't work the way we want it to.

You should use mmu_linear_psize.

But for vmalloc you should use mmu_vmalloc_psize, so it will need to be
a parameter.

> +	preempt_disable();
> +
> +	entry = get_slb_index();
> +	asm volatile("slbmte %0,%1" :
> +			: "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> +			  "r" (mk_esid_data((unsigned long)p, ssize, entry))
> +			: "memory");
> +
> +	entry = get_slb_index();
> +	asm volatile("slbmte %0,%1" :
> +			: "r" (mk_vsid_data((unsigned long)p, ssize, flags)),
> +			  "r" (mk_esid_data((unsigned long)p, ssize, entry))
> +			: "memory");
> +	preempt_enable();
> +	/*
> +	 * This triggers exception, If handled correctly we must recover
> +	 * from this error.
> +	 */
> +	p[0] = '!';

That doesn't belong in here, it should be done by the caller.

That would also mean p could be unsigned long in here, so you wouldn't
have to cast it four times.

> +}
> +
> +/* Inject slb multihit on vmalloc-ed address i.e 0xD00... */
> +static void inject_vmalloc_slb_multihit(void)
> +{
> +	char *p;
> +
> +	p = vmalloc(2048);

vmalloc() allocates whole pages, so it may as well be vmalloc(PAGE_SIZE).

> +	if (!p)
> +		return;

That's unlikely, but it should be an error that's propagated up to the caller.

> +
> +	insert_slb_entry(p, MMU_SEGSIZE_1T);
> +	vfree(p);
> +}
> +
> +/* Inject slb multihit on kmalloc-ed address i.e 0xC00... */
> +static void inject_kmalloc_slb_multihit(void)
> +{
> +	char *p;
> +
> +	p = kmalloc(2048, GFP_KERNEL);
> +	if (!p)
> +		return;
> +
> +	insert_slb_entry(p, MMU_SEGSIZE_1T);
> +	kfree(p);
> +}
> +
> +/*
> + * Few initial SLB entries are bolted. Add a test to inject
> + * multihit in bolted entry 0.
> + */
> +static void insert_dup_slb_entry_0(void)
> +{
> +	unsigned long test_address = 0xC000000000000000;

Should use PAGE_OFFSET;

> +	volatile unsigned long *test_ptr;

Does it need to be a volatile?
The slbmte should act as a compiler barrier (it has a memory clobber)
and a CPU barrier as well?

> +	unsigned long entry, i = 0;
> +	unsigned long esid, vsid;

Please group your variables:

  unsigned long esid, vsid, entry, test_address, i;
  volatile unsigned long *test_ptr;

And then initialise them as appropriate.

> +	test_ptr = (unsigned long *)test_address;
> +	preempt_disable();
> +
> +	asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
> +	asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));

Why do we need to read them out of the SLB rather than just computing
the values?

> +	entry = get_slb_index();
> +
> +	/* for i !=0 we would need to mask out the old entry number */

Or you could just compute esid and then it wouldn't be an issue.

> +	asm volatile("slbmte %0,%1" :
> +			: "r" (vsid),
> +			  "r" (esid | entry)
> +			: "memory");

At this point we've just inserted a duplicate of entry 0. So you don't
need to insert a third entry do you?

> +	asm volatile("slbmfee  %0,%1" : "=r" (esid) : "r" (i));
> +	asm volatile("slbmfev  %0,%1" : "=r" (vsid) : "r" (i));
> +	entry = get_slb_index();
> +
> +	/* for i !=0 we would need to mask out the old entry number */
> +	asm volatile("slbmte %0,%1" :
> +			: "r" (vsid),
> +			  "r" (esid | entry)
> +			: "memory");
> +
> +	pr_info("%s accessing test address 0x%lx: 0x%lx\n",
> +		__func__, test_address, *test_ptr);

This prints the first two instructions of the kernel. I happen to know
what values they should have, but most people won't understand what
they're seeing. A better test would be to read the value at the top of
the function and then load it again here and check we got the right
thing.

eg.
  unsigned long val;

  val = *test_ptr;
  ...
  if (val == *test_ptr)
    pr_info("Success ...")
  else
    pr_info("Fail ...")


> +
> +	preempt_enable();
> +}
> +
> +void lkdtm_PPC_SLB_MULTIHIT(void)
> +{
> +	if (mmu_has_feature(MMU_FTR_HPTE_TABLE)) {

That will be true even if radix is enabled. You need to use radix_enabled().

> +		pr_info("Injecting SLB multihit errors\n");
> +		/*
> +		 * These need not be separate tests, And they do pretty
> +		 * much same thing. In any case we must recover from the
> +		 * errors introduced by these functions, machine would not
> +		 * survive these tests in case of failure to handle.
> +		 */
> +		inject_vmalloc_slb_multihit();
> +		inject_kmalloc_slb_multihit();
> +		insert_dup_slb_entry_0();
> +		pr_info("Recovered from SLB multihit errors\n");
> +	} else {
> +		pr_err("XFAIL: This test is for ppc64 and with hash mode MMU only\n");

The whole file is only built if CONFIG_PPC64, so you don't need to
mention ppc64 in the message.

It should say something more like:

  XFAIL: can't test SLB multi-hit when using Radix MMU



cheers

^ permalink raw reply

* [PATCH v1 0/2] Use H_RPT_INVALIDATE for nested guest
From: Bharata B Rao @ 2020-10-19 11:26 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin

This patchset adds support for the new hcall H_RPT_INVALIDATE
(currently handles nested case only) and replaces the nested tlb flush
calls with this new hcall if the support for the same exists.

Changes in v1:
-------------
- Removed the bits that added the FW_FEATURE_RPT_INVALIDATE feature
  as they are already upstream.

v0: https://lore.kernel.org/linuxppc-dev/20200703104420.21349-1-bharata@linux.ibm.com/T/#m1800c5f5b3d4f6a154ae58fc1c617c06f286358f

H_RPT_INVALIDATE
================
Syntax:
int64   /* H_Success: Return code on successful completion */
        /* H_Busy - repeat the call with the same */
        /* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
        hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
              uint64 pid,       /* PID/LPID to invalidate */
              uint64 target,    /* Invalidation target */
              uint64 type,      /* Type of lookaside information */
              uint64 pageSizes,     /* Page sizes */
              uint64 start,     /* Start of Effective Address (EA) range (inclusive) */
              uint64 end)       /* End of EA range (exclusive) */

Invalidation targets (target)
-----------------------------
Core MMU        0x01 /* All virtual processors in the partition */
Core local MMU  0x02 /* Current virtual processor */
Nest MMU        0x04 /* All nest/accelerator agents in use by the partition */

A combination of the above can be specified, except core and core local.

Type of translation to invalidate (type)
---------------------------------------
NESTED       0x0001  /* Invalidate nested guest partition-scope */
TLB          0x0002  /* Invalidate TLB */
PWC          0x0004  /* Invalidate Page Walk Cache */
PRT          0x0008  /* Invalidate Process Table Entries if NESTED is clear */
PAT          0x0008  /* Invalidate Partition Table Entries if NESTED is set */

A combination of the above can be specified.

Page size mask (pageSizes)
--------------------------
4K              0x01
64K             0x02
2M              0x04
1G              0x08
All sizes       (-1UL)

A combination of the above can be specified.
All page sizes can be selected with -1.

Semantics: Invalidate radix tree lookaside information
           matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
  different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
  LPID allocated to this partition
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
  should be a valid Quadrant address and  end > start.
* Return H_NotSupported if the partition is not in running in radix
  translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
  Else start and end should be aligned to 4kB (lower 11 bits clear).
* If NESTED is clear, then invalidate process scoped lookaside information.
  Else pid specifies a nested LPID, and the invalidation is performed
  on nested guest partition table and nested guest partition scope real
  addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
  quadrant 0 spaces, Else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
  Those which are partially covered are considered outside invalidation
  range, which allows a caller to optimally invalidate ranges that may
  contain mixed page sizes.
* Return H_SUCCESS on success.

Bharata B Rao (2):
  KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case
    only)
  KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM

 Documentation/virt/kvm/api.rst                |  17 +++
 .../include/asm/book3s/64/tlbflush-radix.h    |  18 +++
 arch/powerpc/include/asm/kvm_book3s.h         |   3 +
 arch/powerpc/kvm/book3s_64_mmu_radix.c        |  26 ++++-
 arch/powerpc/kvm/book3s_hv.c                  |  32 ++++++
 arch/powerpc/kvm/book3s_hv_nested.c           | 107 +++++++++++++++++-
 arch/powerpc/kvm/powerpc.c                    |   3 +
 arch/powerpc/mm/book3s64/radix_tlb.c          |   4 -
 include/uapi/linux/kvm.h                      |   1 +
 9 files changed, 200 insertions(+), 11 deletions(-)

-- 
2.26.2


^ permalink raw reply

* [PATCH v1 2/2] KVM: PPC: Book3S HV: Use H_RPT_INVALIDATE in nested KVM
From: Bharata B Rao @ 2020-10-19 11:26 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20201019112642.53016-1-bharata@linux.ibm.com>

In the nested KVM case, replace H_TLB_INVALIDATE by the new hcall
H_RPT_INVALIDATE if available. The availability of this hcall
is determined from "hcall-rpt-invalidate" string in ibm,hypertas-functions
DT property.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 arch/powerpc/kvm/book3s_64_mmu_radix.c | 26 +++++++++++++++++++++-----
 arch/powerpc/kvm/book3s_hv_nested.c    | 13 +++++++++++--
 2 files changed, 32 insertions(+), 7 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_radix.c b/arch/powerpc/kvm/book3s_64_mmu_radix.c
index 22a677b18695e..9934a91adcc3b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_radix.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_radix.c
@@ -21,6 +21,7 @@
 #include <asm/pte-walk.h>
 #include <asm/ultravisor.h>
 #include <asm/kvm_book3s_uvmem.h>
+#include <asm/plpar_wrappers.h>
 
 /*
  * Supported radix tree geometry.
@@ -318,9 +319,17 @@ void kvmppc_radix_tlbie_page(struct kvm *kvm, unsigned long addr,
 	}
 
 	psi = shift_to_mmu_psize(pshift);
-	rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(0, 0, 1),
-				lpid, rb);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE)) {
+		rb = addr | (mmu_get_ap(psi) << PPC_BITLSHIFT(58));
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(0, 0, 1), lpid, rb);
+	} else {
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB,
+					    psize_to_rpti_pgsize(psi),
+					    addr, addr + psize);
+	}
 	if (rc)
 		pr_err("KVM: TLB page invalidation hcall failed, rc=%ld\n", rc);
 }
@@ -334,8 +343,15 @@ static void kvmppc_radix_flush_pwc(struct kvm *kvm, unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(1, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(1, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_PWC, H_RPTI_PAGE_ALL,
+					    0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB PWC invalidation hcall failed, rc=%ld\n", rc);
 }
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 3ec0231628b42..2a187c782e89b 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -19,6 +19,7 @@
 #include <asm/pgalloc.h>
 #include <asm/pte-walk.h>
 #include <asm/reg.h>
+#include <asm/plpar_wrappers.h>
 
 static struct patb_entry *pseries_partition_tb;
 
@@ -402,8 +403,16 @@ static void kvmhv_flush_lpid(unsigned int lpid)
 		return;
 	}
 
-	rc = plpar_hcall_norets(H_TLB_INVALIDATE, H_TLBIE_P1_ENC(2, 0, 1),
-				lpid, TLBIEL_INVAL_SET_LPID);
+	if (!firmware_has_feature(FW_FEATURE_RPT_INVALIDATE))
+		rc = plpar_hcall_norets(H_TLB_INVALIDATE,
+					H_TLBIE_P1_ENC(2, 0, 1),
+					lpid, TLBIEL_INVAL_SET_LPID);
+	else
+		rc = pseries_rpt_invalidate(lpid, H_RPTI_TARGET_CMMU,
+					    H_RPTI_TYPE_NESTED |
+					    H_RPTI_TYPE_TLB | H_RPTI_TYPE_PWC |
+					    H_RPTI_TYPE_PAT,
+					    H_RPTI_PAGE_ALL, 0, -1UL);
 	if (rc)
 		pr_err("KVM: TLB LPID invalidation hcall failed, rc=%ld\n", rc);
 }
-- 
2.26.2


^ permalink raw reply related

* [PATCH v1 1/2] KVM: PPC: Book3S HV: Add support for H_RPT_INVALIDATE (nested case only)
From: Bharata B Rao @ 2020-10-19 11:26 UTC (permalink / raw)
  To: kvm-ppc, linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
In-Reply-To: <20201019112642.53016-1-bharata@linux.ibm.com>

Implements H_RPT_INVALIDATE hcall and supports only nested case
currently.

A KVM capability KVM_CAP_RPT_INVALIDATE is added to indicate the
support for this hcall.

Signed-off-by: Bharata B Rao <bharata@linux.ibm.com>
---
 Documentation/virt/kvm/api.rst                | 17 ++++
 .../include/asm/book3s/64/tlbflush-radix.h    | 18 ++++
 arch/powerpc/include/asm/kvm_book3s.h         |  3 +
 arch/powerpc/kvm/book3s_hv.c                  | 32 +++++++
 arch/powerpc/kvm/book3s_hv_nested.c           | 94 +++++++++++++++++++
 arch/powerpc/kvm/powerpc.c                    |  3 +
 arch/powerpc/mm/book3s64/radix_tlb.c          |  4 -
 include/uapi/linux/kvm.h                      |  1 +
 8 files changed, 168 insertions(+), 4 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 1f26d83e6b168..67e98a56271ae 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -5852,6 +5852,23 @@ controlled by the kvm module parameter halt_poll_ns. This capability allows
 the maximum halt time to specified on a per-VM basis, effectively overriding
 the module parameter for the target VM.
 
+7.21 KVM_CAP_RPT_INVALIDATE
+--------------------------
+
+:Capability: KVM_CAP_RPT_INVALIDATE
+:Architectures: ppc
+:Type: vm
+
+This capability indicates that the kernel is capable of handling
+H_RPT_INVALIDATE hcall.
+
+In order to enable the use of H_RPT_INVALIDATE in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using it if "hcall-rpt-invalidate" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+This capability is always enabled.
+
 8. Other capabilities.
 ======================
 
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
index 94439e0cefc9c..aace7e9b2397d 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush-radix.h
@@ -4,6 +4,10 @@
 
 #include <asm/hvcall.h>
 
+#define RIC_FLUSH_TLB 0
+#define RIC_FLUSH_PWC 1
+#define RIC_FLUSH_ALL 2
+
 struct vm_area_struct;
 struct mm_struct;
 struct mmu_gather;
@@ -21,6 +25,20 @@ static inline u64 psize_to_rpti_pgsize(unsigned long psize)
 	return H_RPTI_PAGE_ALL;
 }
 
+static inline int rpti_pgsize_to_psize(unsigned long page_size)
+{
+	if (page_size == H_RPTI_PAGE_4K)
+		return MMU_PAGE_4K;
+	if (page_size == H_RPTI_PAGE_64K)
+		return MMU_PAGE_64K;
+	if (page_size == H_RPTI_PAGE_2M)
+		return MMU_PAGE_2M;
+	if (page_size == H_RPTI_PAGE_1G)
+		return MMU_PAGE_1G;
+	else
+		return MMU_PAGE_64K; /* Default */
+}
+
 static inline int mmu_get_ap(int psize)
 {
 	return mmu_psize_defs[psize].ap;
diff --git a/arch/powerpc/include/asm/kvm_book3s.h b/arch/powerpc/include/asm/kvm_book3s.h
index d32ec9ae73bd4..0f1c5fa6e8ce3 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -298,6 +298,9 @@ void kvmhv_set_ptbl_entry(unsigned int lpid, u64 dw0, u64 dw1);
 void kvmhv_release_all_nested(struct kvm *kvm);
 long kvmhv_enter_nested_guest(struct kvm_vcpu *vcpu);
 long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu);
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+			 unsigned long type, unsigned long pg_sizes,
+			 unsigned long start, unsigned long end);
 int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu,
 			  u64 time_limit, unsigned long lpcr);
 void kvmhv_save_hv_regs(struct kvm_vcpu *vcpu, struct hv_guest_state *hr);
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 3bd3118c76330..6cbd37af91ebf 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -904,6 +904,28 @@ static int kvmppc_get_yield_count(struct kvm_vcpu *vcpu)
 	return yield_count;
 }
 
+static long kvmppc_h_rpt_invalidate(struct kvm_vcpu *vcpu,
+				    unsigned long pid, unsigned long target,
+				    unsigned long type, unsigned long pg_sizes,
+				    unsigned long start, unsigned long end)
+{
+	if (end < start)
+		return H_P5;
+
+	if ((!type & H_RPTI_TYPE_NESTED))
+		return H_P3;
+
+	if (!nesting_enabled(vcpu->kvm))
+		return H_FUNCTION;
+
+	/* Support only cores as target */
+	if (target != H_RPTI_TARGET_CMMU)
+		return H_P2;
+
+	return kvmhv_h_rpti_nested(vcpu, pid, (type & ~H_RPTI_TYPE_NESTED),
+				   pg_sizes, start, end);
+}
+
 int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 {
 	unsigned long req = kvmppc_get_gpr(vcpu, 3);
@@ -1112,6 +1134,14 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
 		 */
 		ret = kvmppc_h_svm_init_abort(vcpu->kvm);
 		break;
+	case H_RPT_INVALIDATE:
+		ret = kvmppc_h_rpt_invalidate(vcpu, kvmppc_get_gpr(vcpu, 4),
+					      kvmppc_get_gpr(vcpu, 5),
+					      kvmppc_get_gpr(vcpu, 6),
+					      kvmppc_get_gpr(vcpu, 7),
+					      kvmppc_get_gpr(vcpu, 8),
+					      kvmppc_get_gpr(vcpu, 9));
+		break;
 
 	default:
 		return RESUME_HOST;
@@ -1158,6 +1188,7 @@ static int kvmppc_hcall_impl_hv(unsigned long cmd)
 	case H_XIRR_X:
 #endif
 	case H_PAGE_INIT:
+	case H_RPT_INVALIDATE:
 		return 1;
 	}
 
@@ -5334,6 +5365,7 @@ static unsigned int default_hcall_list[] = {
 	H_XIRR,
 	H_XIRR_X,
 #endif
+	H_RPT_INVALIDATE,
 	0
 };
 
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index 6822d23a2da4d..3ec0231628b42 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -1149,6 +1149,100 @@ long kvmhv_do_nested_tlbie(struct kvm_vcpu *vcpu)
 	return H_SUCCESS;
 }
 
+static long do_tlb_invalidate_nested_tlb(struct kvm_vcpu *vcpu,
+					 unsigned long lpid,
+					 unsigned long page_size,
+					 unsigned long ap,
+					 unsigned long start,
+					 unsigned long end)
+{
+	unsigned long addr = start;
+	int ret;
+
+	do {
+		ret = kvmhv_emulate_tlbie_tlb_addr(vcpu, lpid, ap,
+						   get_epn(addr));
+		if (ret)
+			return ret;
+		addr += page_size;
+	} while (addr < end);
+
+	return ret;
+}
+
+static long do_tlb_invalidate_nested_all(struct kvm_vcpu *vcpu,
+					 unsigned long lpid)
+{
+	struct kvm *kvm = vcpu->kvm;
+	struct kvm_nested_guest *gp;
+
+	gp = kvmhv_get_nested(kvm, lpid, false);
+	if (gp) {
+		kvmhv_emulate_tlbie_lpid(vcpu, gp, RIC_FLUSH_ALL);
+		kvmhv_put_nested(gp);
+	}
+	return H_SUCCESS;
+}
+
+long kvmhv_h_rpti_nested(struct kvm_vcpu *vcpu, unsigned long lpid,
+			 unsigned long type, unsigned long pg_sizes,
+			 unsigned long start, unsigned long end)
+{
+	struct kvm_nested_guest *gp;
+	long ret;
+	unsigned long psize, ap;
+
+	/*
+	 * If L2 lpid isn't valid, we need to return H_PARAMETER.
+	 * Nested KVM issues a L2 lpid flush call when creating
+	 * partition table entries for L2. This happens even before
+	 * the corresponding shadow lpid is created in HV. Until
+	 * this is fixed, ignore such flush requests.
+	 */
+	gp = kvmhv_find_nested(vcpu->kvm, lpid);
+	if (!gp)
+		return H_SUCCESS;
+
+	if ((type & H_RPTI_TYPE_NESTED_ALL) == H_RPTI_TYPE_NESTED_ALL)
+		return do_tlb_invalidate_nested_all(vcpu, lpid);
+
+	if ((type & H_RPTI_TYPE_TLB) == H_RPTI_TYPE_TLB) {
+		if (pg_sizes & H_RPTI_PAGE_64K) {
+			psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_64K);
+			ap = mmu_get_ap(psize);
+
+			ret = do_tlb_invalidate_nested_tlb(vcpu, lpid,
+							   (1UL << 16),
+							   ap, start, end);
+			if (ret)
+				return H_P4;
+		}
+
+		if (pg_sizes & H_RPTI_PAGE_2M) {
+			psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_2M);
+			ap = mmu_get_ap(psize);
+
+			ret = do_tlb_invalidate_nested_tlb(vcpu, lpid,
+							   (1UL << 21),
+							   ap, start, end);
+			if (ret)
+				return H_P4;
+		}
+
+		if (pg_sizes & H_RPTI_PAGE_1G) {
+			psize = rpti_pgsize_to_psize(pg_sizes & H_RPTI_PAGE_1G);
+			ap = mmu_get_ap(psize);
+
+			ret = do_tlb_invalidate_nested_tlb(vcpu, lpid,
+							   (1UL << 30),
+							   ap, start, end);
+			if (ret)
+				return H_P4;
+		}
+	}
+	return H_SUCCESS;
+}
+
 /* Used to convert a nested guest real address to a L1 guest real address */
 static int kvmhv_translate_addr_nested(struct kvm_vcpu *vcpu,
 				       struct kvm_nested_guest *gp,
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b7358..8ab4a71c40b4a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -678,6 +678,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = hv_enabled && kvmppc_hv_ops->enable_svm &&
 			!kvmppc_hv_ops->enable_svm(NULL);
 		break;
+	case KVM_CAP_RPT_INVALIDATE:
+		r = 1;
+		break;
 #endif
 	default:
 		r = 0;
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index b487b489d4b68..3a2b12d1d49b8 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -18,10 +18,6 @@
 #include <asm/cputhreads.h>
 #include <asm/plpar_wrappers.h>
 
-#define RIC_FLUSH_TLB 0
-#define RIC_FLUSH_PWC 1
-#define RIC_FLUSH_ALL 2
-
 /*
  * tlbiel instruction for radix, set invalidation
  * i.e., r=1 and is=01 or is=10 or is=11
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7d8eced6f459b..109ede74735d4 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1037,6 +1037,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_SMALLER_MAXPHYADDR 185
 #define KVM_CAP_S390_DIAG318 186
 #define KVM_CAP_STEAL_TIME 187
+#define KVM_CAP_RPT_INVALIDATE 188
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
2.26.2


^ permalink raw reply related

* [PATCH 2/3] powerpc: Fix incorrect stw{, ux, u, x} instructions in __set_pte_at
From: Christophe Leroy @ 2020-10-19 12:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <5ffcb064f695d5285bf1faab91bffa3f9245fc26.1603109522.git.christophe.leroy@csgroup.eu>

From: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

The placeholder for instruction selection should use the second
argument's operand, which is %1, not %0. This could generate incorrect
assembly code if the instruction selection for argument %0 ever differs
from argument %1.

Fixes: 9bf2b5cdc5fe ("powerpc: Fixes for CONFIG_PTE_64BIT for SMP support")
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: <stable@vger.kernel.org> # v2.6.28+
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +-
 arch/powerpc/include/asm/nohash/pgtable.h    | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 36443cda8dcf..34f5ca391f0c 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -524,7 +524,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 	__asm__ __volatile__("\
 		stw%U0%X0 %2,%0\n\
 		eieio\n\
-		stw%U0%X0 %L2,%1"
+		stw%U1%X1 %L2,%1"
 	: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
 	: "r" (pte) : "memory");
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index 4b7c3472eab1..a00e4c1746d6 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -199,7 +199,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 		__asm__ __volatile__("\
 			stw%U0%X0 %2,%0\n\
 			eieio\n\
-			stw%U0%X0 %L2,%1"
+			stw%U1%X1 %L2,%1"
 		: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
 		: "r" (pte) : "memory");
 		return;
-- 
2.25.0


^ permalink raw reply related

* [PATCH 3/3] powerpc: Fix pre-update addressing in inline assembly
From: Christophe Leroy @ 2020-10-19 12:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel
In-Reply-To: <5ffcb064f695d5285bf1faab91bffa3f9245fc26.1603109522.git.christophe.leroy@csgroup.eu>

In several places, inline assembly uses the "%Un" modifier
to enable the use of instruction with pre-update addressing,
but the associated "<>" constraint is missing.

As mentioned in previous patch, this fails with gcc 4.9, so
"<>" can't be used directly.

Use UPD_CONSTR macro everywhere %Un modifier is used.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/atomic.h            | 9 +++++----
 arch/powerpc/include/asm/book3s/32/pgtable.h | 2 +-
 arch/powerpc/include/asm/io.h                | 4 ++--
 arch/powerpc/include/asm/nohash/pgtable.h    | 2 +-
 arch/powerpc/kvm/powerpc.c                   | 4 ++--
 5 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/atomic.h b/arch/powerpc/include/asm/atomic.h
index 8a55eb8cc97b..b82f9154e45a 100644
--- a/arch/powerpc/include/asm/atomic.h
+++ b/arch/powerpc/include/asm/atomic.h
@@ -10,6 +10,7 @@
 #include <linux/types.h>
 #include <asm/cmpxchg.h>
 #include <asm/barrier.h>
+#include <asm/ppc_asm.h>
 
 /*
  * Since *_return_relaxed and {cmp}xchg_relaxed are implemented with
@@ -26,14 +27,14 @@ static __inline__ int atomic_read(const atomic_t *v)
 {
 	int t;
 
-	__asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
+	__asm__ __volatile__("lwz%U1%X1 %0,%1" : "=r"(t) : "m"UPD_CONSTR(v->counter));
 
 	return t;
 }
 
 static __inline__ void atomic_set(atomic_t *v, int i)
 {
-	__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
+	__asm__ __volatile__("stw%U0%X0 %1,%0" : "=m"UPD_CONSTR(v->counter) : "r"(i));
 }
 
 #define ATOMIC_OP(op, asm_op)						\
@@ -316,14 +317,14 @@ static __inline__ s64 atomic64_read(const atomic64_t *v)
 {
 	s64 t;
 
-	__asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m"(v->counter));
+	__asm__ __volatile__("ld%U1%X1 %0,%1" : "=r"(t) : "m"UPD_CONSTR(v->counter));
 
 	return t;
 }
 
 static __inline__ void atomic64_set(atomic64_t *v, s64 i)
 {
-	__asm__ __volatile__("std%U0%X0 %1,%0" : "=m"(v->counter) : "r"(i));
+	__asm__ __volatile__("std%U0%X0 %1,%0" : "=m"UPD_CONSTR(v->counter) : "r"(i));
 }
 
 #define ATOMIC64_OP(op, asm_op)						\
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 34f5ca391f0c..0e1b6e020cef 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -525,7 +525,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 		stw%U0%X0 %2,%0\n\
 		eieio\n\
 		stw%U1%X1 %L2,%1"
-	: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
+	: "=m"UPD_CONSTR (*ptep), "=m"UPD_CONSTR (*((unsigned char *)ptep+4))
 	: "r" (pte) : "memory");
 
 #else
diff --git a/arch/powerpc/include/asm/io.h b/arch/powerpc/include/asm/io.h
index 58635960403c..87964dfb838e 100644
--- a/arch/powerpc/include/asm/io.h
+++ b/arch/powerpc/include/asm/io.h
@@ -122,7 +122,7 @@ static inline u##size name(const volatile u##size __iomem *addr)	\
 {									\
 	u##size ret;							\
 	__asm__ __volatile__("sync;"#insn"%U1%X1 %0,%1;twi 0,%0,0;isync"\
-		: "=r" (ret) : "m" (*addr) : "memory");			\
+		: "=r" (ret) : "m"UPD_CONSTR (*addr) : "memory");	\
 	return ret;							\
 }
 
@@ -130,7 +130,7 @@ static inline u##size name(const volatile u##size __iomem *addr)	\
 static inline void name(volatile u##size __iomem *addr, u##size val)	\
 {									\
 	__asm__ __volatile__("sync;"#insn"%U0%X0 %1,%0"			\
-		: "=m" (*addr) : "r" (val) : "memory");			\
+		: "=m"UPD_CONSTR (*addr) : "r" (val) : "memory");	\
 	mmiowb_set_pending();						\
 }
 
diff --git a/arch/powerpc/include/asm/nohash/pgtable.h b/arch/powerpc/include/asm/nohash/pgtable.h
index a00e4c1746d6..55ef2112ed00 100644
--- a/arch/powerpc/include/asm/nohash/pgtable.h
+++ b/arch/powerpc/include/asm/nohash/pgtable.h
@@ -200,7 +200,7 @@ static inline void __set_pte_at(struct mm_struct *mm, unsigned long addr,
 			stw%U0%X0 %2,%0\n\
 			eieio\n\
 			stw%U1%X1 %L2,%1"
-		: "=m" (*ptep), "=m" (*((unsigned char *)ptep+4))
+		: "=m"UPD_CONSTR (*ptep), "=m"UPD_CONSTR (*((unsigned char *)ptep+4))
 		: "r" (pte) : "memory");
 		return;
 	}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 13999123b735..cf52d26f49cd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -1087,7 +1087,7 @@ static inline u64 sp_to_dp(u32 fprs)
 
 	preempt_disable();
 	enable_kernel_fp();
-	asm ("lfs%U1%X1 0,%1; stfd%U0%X0 0,%0" : "=m" (fprd) : "m" (fprs)
+	asm ("lfs%U1%X1 0,%1; stfd%U0%X0 0,%0" : "=m"UPD_CONSTR (fprd) : "m"UPD_CONSTR (fprs)
 	     : "fr0");
 	preempt_enable();
 	return fprd;
@@ -1099,7 +1099,7 @@ static inline u32 dp_to_sp(u64 fprd)
 
 	preempt_disable();
 	enable_kernel_fp();
-	asm ("lfd%U1%X1 0,%1; stfs%U0%X0 0,%0" : "=m" (fprs) : "m" (fprd)
+	asm ("lfd%U1%X1 0,%1; stfs%U0%X0 0,%0" : "=m"UPD_CONSTR (fprs) : "m"UPD_CONSTR (fprd)
 	     : "fr0");
 	preempt_enable();
 	return fprs;
-- 
2.25.0


^ permalink raw reply related

* [PATCH 1/3] powerpc/uaccess: Don't use "m<>" constraint with GCC 4.9
From: Christophe Leroy @ 2020-10-19 12:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
  Cc: linuxppc-dev, linux-kernel

GCC 4.9 sometimes fails to build with "m<>" constraint in
inline assembly.

  CC      lib/iov_iter.o
In file included from ./arch/powerpc/include/asm/cmpxchg.h:6:0,
                 from ./arch/powerpc/include/asm/atomic.h:11,
                 from ./include/linux/atomic.h:7,
                 from ./include/linux/crypto.h:15,
                 from ./include/crypto/hash.h:11,
                 from lib/iov_iter.c:2:
lib/iov_iter.c: In function 'iovec_from_user.part.30':
./arch/powerpc/include/asm/uaccess.h:287:2: error: 'asm' operand has impossible constraints
  __asm__ __volatile__(    \
  ^
./include/linux/compiler.h:78:42: note: in definition of macro 'unlikely'
 # define unlikely(x) __builtin_expect(!!(x), 0)
                                          ^
./arch/powerpc/include/asm/uaccess.h:583:34: note: in expansion of macro 'unsafe_op_wrap'
 #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
                                  ^
./arch/powerpc/include/asm/uaccess.h:329:10: note: in expansion of macro '__get_user_asm'
  case 4: __get_user_asm(x, (u32 __user *)ptr, retval, "lwz"); break; \
          ^
./arch/powerpc/include/asm/uaccess.h:363:3: note: in expansion of macro '__get_user_size_allowed'
   __get_user_size_allowed(__gu_val, __gu_addr, __gu_size, __gu_err); \
   ^
./arch/powerpc/include/asm/uaccess.h:100:2: note: in expansion of macro '__get_user_nocheck'
  __get_user_nocheck((x), (ptr), sizeof(*(ptr)), false)
  ^
./arch/powerpc/include/asm/uaccess.h:583:49: note: in expansion of macro '__get_user_allowed'
 #define unsafe_get_user(x, p, e) unsafe_op_wrap(__get_user_allowed(x, p), e)
                                                 ^
lib/iov_iter.c:1663:3: note: in expansion of macro 'unsafe_get_user'
   unsafe_get_user(len, &uiov[i].iov_len, uaccess_end);
   ^
make[1]: *** [scripts/Makefile.build:283: lib/iov_iter.o] Error 1

Define a UPD_CONSTR macro that is "<>" by default and
only "" with GCC prior to GCC 5.

Fixes: fcf1f26895a4 ("powerpc/uaccess: Add pre-update addressing to __put_user_asm_goto()")
Fixes: 2f279eeb68b8 ("powerpc/uaccess: Add pre-update addressing to __get_user_asm() and __put_user_asm()")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
 arch/powerpc/include/asm/ppc_asm.h | 14 ++++++++++++++
 arch/powerpc/include/asm/uaccess.h |  4 ++--
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/ppc_asm.h b/arch/powerpc/include/asm/ppc_asm.h
index 511786f0e40d..471c7c57fc98 100644
--- a/arch/powerpc/include/asm/ppc_asm.h
+++ b/arch/powerpc/include/asm/ppc_asm.h
@@ -803,6 +803,20 @@ END_FTR_SECTION_NESTED(CPU_FTR_CELL_TB_BUG, CPU_FTR_CELL_TB_BUG, 96)
 
 #endif /* !CONFIG_PPC_BOOK3E */
 
+#else /* __ASSEMBLY */
+
+/*
+ * Inline assembly memory constraint
+ *
+ * GCC 4.9 doesn't properly handle pre update memory constraint "m<>"
+ *
+ */
+#if defined(GCC_VERSION) && GCC_VERSION < 50000
+#define UPD_CONSTR ""
+#else
+#define UPD_CONSTR "<>"
+#endif
+
 #endif /*  __ASSEMBLY__ */
 
 /*
diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
index 604d705f1bb8..8f27ea48fadb 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -223,7 +223,7 @@ do {								\
 		"1:	" op "%U1%X1 %0,%1	# put_user\n"	\
 		EX_TABLE(1b, %l2)				\
 		:						\
-		: "r" (x), "m<>" (*addr)				\
+		: "r" (x), "m"UPD_CONSTR (*addr)		\
 		:						\
 		: label)
 
@@ -294,7 +294,7 @@ extern long __get_user_bad(void);
 		".previous\n"				\
 		EX_TABLE(1b, 3b)			\
 		: "=r" (err), "=r" (x)			\
-		: "m<>" (*addr), "i" (-EFAULT), "0" (err))
+		: "m"UPD_CONSTR (*addr), "i" (-EFAULT), "0" (err))
 
 #ifdef __powerpc64__
 #define __get_user_asm2(x, addr, err)			\
-- 
2.25.0


^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox