* [powerpc:fixes-test] BUILD SUCCESS c1ed1754f271f6b7acb1bfdc8cfb62220fbed423
From: kernel test robot @ 2020-06-26 4:48 UTC
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git fixes-test
branch HEAD: c1ed1754f271f6b7acb1bfdc8cfb62220fbed423 powerpc/kvm/book3s64: Fix kernel crash with nested kvm & DEBUG_VIRTUAL
elapsed time: 945m
configs tested: 114
configs skipped: 108
The following configs have been built successfully.
More configs may be tested in the coming days.
arm defconfig
arm allyesconfig
arm allmodconfig
arm allnoconfig
arm64 allyesconfig
arm64 defconfig
arm64 allmodconfig
arm64 allnoconfig
arc haps_hs_smp_defconfig
s390 allyesconfig
powerpc g5_defconfig
mips jmr3927_defconfig
sh se7751_defconfig
arm imx_v6_v7_defconfig
arm xcep_defconfig
arm pxa255-idp_defconfig
arm tango4_defconfig
arm pxa_defconfig
arm lpc18xx_defconfig
mips ip27_defconfig
arm eseries_pxa_defconfig
mips loongson3_defconfig
i386 alldefconfig
nds32 allnoconfig
sh se7724_defconfig
mips loongson1b_defconfig
parisc allnoconfig
arm lart_defconfig
i386 allnoconfig
i386 allyesconfig
i386 defconfig
i386 debian-10.3
ia64 allmodconfig
ia64 defconfig
ia64 allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k allnoconfig
m68k sun3_defconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
nios2 allyesconfig
openrisc defconfig
c6x allyesconfig
c6x allnoconfig
openrisc allyesconfig
nds32 defconfig
csky allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
h8300 allmodconfig
xtensa defconfig
arc defconfig
arc allyesconfig
sh allmodconfig
sh allnoconfig
microblaze allnoconfig
mips allyesconfig
mips allnoconfig
mips allmodconfig
parisc defconfig
parisc allyesconfig
parisc allmodconfig
powerpc allyesconfig
powerpc rhel-kconfig
powerpc allmodconfig
powerpc allnoconfig
powerpc defconfig
i386 randconfig-a002-20200624
i386 randconfig-a006-20200624
i386 randconfig-a003-20200624
i386 randconfig-a001-20200624
i386 randconfig-a005-20200624
i386 randconfig-a004-20200624
i386 randconfig-a013-20200624
i386 randconfig-a016-20200624
i386 randconfig-a012-20200624
i386 randconfig-a014-20200624
i386 randconfig-a011-20200624
i386 randconfig-a015-20200624
x86_64 randconfig-a004-20200624
x86_64 randconfig-a002-20200624
x86_64 randconfig-a003-20200624
x86_64 randconfig-a005-20200624
x86_64 randconfig-a001-20200624
x86_64 randconfig-a006-20200624
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
s390 allnoconfig
s390 allmodconfig
s390 defconfig
sparc allyesconfig
sparc defconfig
sparc64 defconfig
sparc64 allnoconfig
sparc64 allyesconfig
sparc64 allmodconfig
um allnoconfig
um defconfig
um allmodconfig
x86_64 rhel-7.6
x86_64 rhel-7.6-kselftests
x86_64 rhel-8.3
x86_64 kexec
x86_64 rhel
x86_64 rhel-7.2-clear
x86_64 lkp
x86_64 fedora-25
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* [powerpc:next] BUILD SUCCESS 105fb38124a490f38e9c1e23bb4c4a0b6ba12fdb
From: kernel test robot @ 2020-06-26 4:48 UTC
To: Michael Ellerman; +Cc: linuxppc-dev
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git next
branch HEAD: 105fb38124a490f38e9c1e23bb4c4a0b6ba12fdb powerpc/8xx: Modify ptep_get()
elapsed time: 942m
configs tested: 120
configs skipped: 5
The following configs have been built successfully.
More configs may be tested in the coming days.
arm defconfig
arm allyesconfig
arm allmodconfig
arm allnoconfig
arm64 allyesconfig
arm64 defconfig
arm64 allmodconfig
arm64 allnoconfig
arc haps_hs_smp_defconfig
powerpc g5_defconfig
mips jmr3927_defconfig
s390 allyesconfig
sh se7751_defconfig
arm imx_v6_v7_defconfig
arm xcep_defconfig
arm pxa255-idp_defconfig
arm tango4_defconfig
arm mainstone_defconfig
arm moxart_defconfig
m68k q40_defconfig
sh sdk7786_defconfig
s390 allnoconfig
arm mps2_defconfig
arm pxa_defconfig
arm lpc18xx_defconfig
mips ip27_defconfig
arm eseries_pxa_defconfig
mips loongson3_defconfig
i386 alldefconfig
nds32 allnoconfig
sh se7724_defconfig
mips loongson1b_defconfig
parisc allnoconfig
arm lart_defconfig
i386 allnoconfig
i386 allyesconfig
i386 defconfig
i386 debian-10.3
ia64 allmodconfig
ia64 defconfig
ia64 allnoconfig
ia64 allyesconfig
m68k allmodconfig
m68k allnoconfig
m68k sun3_defconfig
m68k defconfig
m68k allyesconfig
nios2 defconfig
nios2 allyesconfig
openrisc defconfig
c6x allyesconfig
c6x allnoconfig
openrisc allyesconfig
nds32 defconfig
csky allyesconfig
csky defconfig
alpha defconfig
alpha allyesconfig
xtensa allyesconfig
h8300 allyesconfig
h8300 allmodconfig
xtensa defconfig
arc defconfig
arc allyesconfig
sh allmodconfig
sh allnoconfig
microblaze allnoconfig
mips allyesconfig
mips allnoconfig
mips allmodconfig
parisc defconfig
parisc allyesconfig
parisc allmodconfig
powerpc allyesconfig
powerpc rhel-kconfig
powerpc allmodconfig
powerpc allnoconfig
powerpc defconfig
i386 randconfig-a002-20200624
i386 randconfig-a006-20200624
i386 randconfig-a003-20200624
i386 randconfig-a001-20200624
i386 randconfig-a005-20200624
i386 randconfig-a004-20200624
x86_64 randconfig-a004-20200624
x86_64 randconfig-a002-20200624
x86_64 randconfig-a003-20200624
x86_64 randconfig-a005-20200624
x86_64 randconfig-a001-20200624
x86_64 randconfig-a006-20200624
i386 randconfig-a013-20200624
i386 randconfig-a016-20200624
i386 randconfig-a012-20200624
i386 randconfig-a014-20200624
i386 randconfig-a011-20200624
i386 randconfig-a015-20200624
riscv allyesconfig
riscv allnoconfig
riscv defconfig
riscv allmodconfig
s390 allmodconfig
s390 defconfig
sparc allyesconfig
sparc defconfig
sparc64 defconfig
sparc64 allnoconfig
sparc64 allyesconfig
sparc64 allmodconfig
um allmodconfig
um allnoconfig
um allyesconfig
um defconfig
x86_64 rhel-7.6
x86_64 rhel-7.6-kselftests
x86_64 rhel-8.3
x86_64 kexec
x86_64 rhel
x86_64 rhel-7.2-clear
x86_64 lkp
x86_64 fedora-25
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
* Re: [PATCH RFC 1/1] powerpc/eeh: Provide a unique ID for each EEH recovery
From: Oliver O'Halloran @ 2020-06-26 5:08 UTC
To: Sam Bobroff; +Cc: linuxppc-dev
In-Reply-To: <f81f645ba66b760c31f25014f03a0c3a4970f993.1592975969.git.sbobroff@linux.ibm.com>
On Wed, Jun 24, 2020 at 3:20 PM Sam Bobroff <sbobroff@linux.ibm.com> wrote:
>
> Give a unique ID to each recovery event, to ease log parsing and
> prepare for parallel recovery.
>
> Also add some new messages with a very simple format that may be
> useful to log-parsers.
>
> Signed-off-by: Sam Bobroff <sbobroff@linux.ibm.com>
> ---
> This patch should be applied on top of my recent(ish) set:
> "powerpc/eeh: Synchronization for safety".
If you're going to do a respin I'd post these as a single series and
rebase it on mainline. There's a bit of drift.
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 68e6dfa526a5..54f921ff7621 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -197,7 +197,8 @@ EXPORT_SYMBOL_GPL(eeh_recovery_must_be_locked);
> * for the indicated PCI device, and puts them into a buffer
> * for RTAS error logging.
> */
> -static size_t eeh_dump_dev_log(struct eeh_dev *edev, char *buf, size_t len)
> +static size_t eeh_dump_dev_log(unsigned int event_id, struct eeh_dev *edev,
> + char *buf, size_t len)
If we're going to pass around some event context then IMO we should
pass around the eeh_event itself rather than just an ID number. That
would give us somewhere to put any extra per-event context (such as
the saved stacktrace) rather than dumping it into eeh_pe.
We'd probably have to fix the "special" events so they're signalled by
some means other than a NULL event pointer.
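To make the suggestion above concrete, here is a minimal userspace sketch of passing a per-event structure instead of a bare ID. The struct name, its fields, and the helper are hypothetical and only illustrate the idea; they are not the kernel's struct eeh_event.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a per-event context. The field names are
 * illustrative only and do not reflect the real struct eeh_event.
 */
struct eeh_event_sketch {
	unsigned int id;         /* unique per recovery event */
	const char *saved_trace; /* example of extra per-event context */
};

/*
 * Build the log prefix from the event itself rather than from a bare
 * ID, so callers that already hold the event need no extra parameter.
 */
static int eeh_sketch_log_prefix(const struct eeh_event_sketch *ev,
				 char *buf, size_t len)
{
	return snprintf(buf, len, "EEH(%u): ", ev->id);
}
```

With this shape, anything else that is per-event (such as a saved stack trace) has an obvious home, and the function signatures stop growing as more context is added.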
*snip*
> @@ -280,19 +283,26 @@ static void eeh_pe_report_pdev(struct pci_dev *pdev, eeh_report_fn fn,
> driver = eeh_pcid_get(pdev);
>
> if (!driver)
> - pci_info(pdev, "no driver");
> + pci_info(pdev, "EEH(%u): no driver", event_id);
> else if (!driver->err_handler)
> - pci_info(pdev, "driver not EEH aware");
> + pci_info(pdev, "EEH(%u): driver not EEH aware", event_id);
> else if (late)
> - pci_info(pdev, "driver bound too late");
> + pci_info(pdev, "EEH(%u): driver bound too late", event_id);
> else {
> - new_result = fn(pdev, driver);
> + pr_warn("EEH(%u): EVENT=HANDLER_CALL DEVICE=%04x:%02x:%02x.%x DRIVER='%s' HANDLER='%s'\n",
WHY ARE WE YELLING
> @@ -579,7 +598,8 @@ static void *eeh_add_virt_device(struct eeh_dev *edev)
> * lock is dropped (which it must be in order to take the PCI rescan/remove
> * lock without risking a deadlock).
> */
> -static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
> +static void eeh_rmv_device(unsigned int event_id,
> + struct pci_dev *pdev, void *userdata)
> {
> struct eeh_dev *edev;
> struct pci_driver *driver;
> @@ -588,8 +608,8 @@ static void eeh_rmv_device(struct pci_dev *pdev, void *userdata)
>
> edev = pci_dev_to_eeh_dev(pdev);
> if (!edev) {
> - pci_warn(pdev, "EEH: Device removed during processing (#%d)\n",
> - __LINE__);
> + pci_warn(pdev, "EEH(%u): Device removed during processing (#%d)\n",
> + event_id, __LINE__);
It's already there, but what's with the __LINE__ ?
> diff --git a/arch/powerpc/kernel/eeh_event.c b/arch/powerpc/kernel/eeh_event.c
> index a7a8dc182efb..bd38d6fe5449 100644
> --- a/arch/powerpc/kernel/eeh_event.c
> +++ b/arch/powerpc/kernel/eeh_event.c
> @@ -26,6 +26,9 @@ static DEFINE_SPINLOCK(eeh_eventlist_lock);
> static DECLARE_COMPLETION(eeh_eventlist_event);
> static LIST_HEAD(eeh_eventlist);
>
> +/* Event ID 0 is reserved for special events */
> +static atomic_t eeh_event_id = ATOMIC_INIT(1);
> +
I don't think using zero for all special events is a good idea.
Special events are just events that are detected by the EEH
notification interrupt. Unlike the MMIO / config space detection
mechanism we don't have any device or PE context available in the
interrupt handler so the work of figuring out where the error came
from is punted to the recovery thread.
IMO this function probably shouldn't be calling
eeh_handle_normal_event() at all. Instead it should queue a new
eeh_event (with a unique ID) for each error it finds. That way
handling a "special" event just consists of scanning for which PHB /
PE is currently broken and the actual recovery path is identical. If
we switched to using a threaded IRQ handler (which can block) for the
EEH notification interrupts we could probably kill off special events
entirely.
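As a rough illustration of "a unique ID for each error it finds", the sketch below hands out IDs from a single atomic counter. It uses C11 atomics in userspace as a stand-in for the kernel's atomic_t, and the names are hypothetical; wrap-around is not handled.

```c
#include <stdatomic.h>

/* Hypothetical counter; the patch under review uses a kernel atomic_t. */
static atomic_uint eeh_sketch_next_id = 1;

/*
 * Return a fresh ID for every event, including events discovered while
 * scanning PHBs/PEs after a notification interrupt, so that no two
 * recoveries share an ID.
 */
static unsigned int eeh_sketch_new_event_id(void)
{
	return atomic_fetch_add(&eeh_sketch_next_id, 1);
}
```

Each queued event would call the allocator once at creation time, which keeps the recovery path identical for "normal" and "special" errors.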
> @@ -1338,7 +1367,7 @@ void eeh_handle_special_event(void)
> if (rc == EEH_NEXT_ERR_FROZEN_PE ||
> rc == EEH_NEXT_ERR_FENCED_PHB) {
> eeh_pe_state_mark(pe, EEH_PE_RECOVERING);
> - eeh_handle_normal_event(pe);
> + eeh_handle_normal_event(0, pe);
I think that needs to be a unique ID even if we keep this function
calling eeh_handle_normal_event() directly.
> } else {
> eeh_for_each_pe(pe, tmp_pe)
> eeh_pe_for_each_dev(tmp_pe, edev, tmp_edev)
> @@ -1347,7 +1376,7 @@ void eeh_handle_special_event(void)
> /* Notify all devices to be down */
> eeh_pe_state_clear(pe, EEH_PE_PRI_BUS, true);
> eeh_set_channel_state(pe, pci_channel_io_perm_failure);
> - eeh_pe_report(
> + eeh_pe_report(0,
> "error_detected(permanent failure)", pe,
> eeh_report_failure, NULL);
* [PATCH] crypto: af_alg - Fix regression on empty requests
From: Herbert Xu @ 2020-06-26 6:29 UTC
To: Eric Biggers
Cc: Sachin Sant, David S. Miller, Naresh Kamboju, Jarkko Sakkinen,
Luis Chamberlain, lkft-triage, open list, David Howells,
Linux Next Mailing List, linux-security-module, keyrings,
linux-crypto, chrubis, James Morris, linuxppc-dev, Jan Stancek,
LTP List, Serge E. Hallyn
In-Reply-To: <20200623170217.GB150582@gmail.com>
On Tue, Jun 23, 2020 at 10:02:17AM -0700, Eric Biggers wrote:
>
> The source code for the two failing AF_ALG tests is here:
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypto/af_alg02.c
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/crypto/af_alg05.c
>
> They use read() and write(), not send() and recv().
>
> af_alg02 uses read() to read from a "salsa20" request socket without writing
> anything to it. It is expected that this returns 0, i.e. that behaves like
> encrypting an empty message.
>
> af_alg05 uses write() to write 15 bytes to a "cbc(aes-generic)" request socket,
> then read() to read 15 bytes. It is expected that this fails with EINVAL, since
> the length is not aligned to the AES block size (16 bytes).
This patch should fix the regression:
---8<---
Some user-space programs rely on crypto requests that have no
control metadata. This broke when a check was added to require
the presence of control metadata with the ctx->init flag.
This patch fixes the regression by removing the ctx->init flag.
This means that we do not distinguish the case of no metadata
as opposed to an empty request. IOW it is always assumed that
if you call recv(2) before sending metadata that you are working
with an empty request.
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: f3c802a1f300 ("crypto: algif_aead - Only wake up when...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 9fcb91ea10c4..2d391117c020 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -635,7 +635,6 @@ void af_alg_pull_tsgl(struct sock *sk, size_t used, struct scatterlist *dst,
if (!ctx->used)
ctx->merge = 0;
- ctx->init = ctx->more;
}
EXPORT_SYMBOL_GPL(af_alg_pull_tsgl);
@@ -757,8 +756,7 @@ int af_alg_wait_for_data(struct sock *sk, unsigned flags, unsigned min)
break;
timeout = MAX_SCHEDULE_TIMEOUT;
if (sk_wait_event(sk, &timeout,
- ctx->init && (!ctx->more ||
- (min && ctx->used >= min)),
+ !ctx->more || (min && ctx->used >= min),
&wait)) {
err = 0;
break;
@@ -847,7 +845,7 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
}
lock_sock(sk);
- if (ctx->init && (init || !ctx->more)) {
+ if (!ctx->more && ctx->used) {
err = -EINVAL;
goto unlock;
}
@@ -858,7 +856,6 @@ int af_alg_sendmsg(struct socket *sock, struct msghdr *msg, size_t size,
memcpy(ctx->iv, con.iv->iv, ivsize);
ctx->aead_assoclen = con.aead_assoclen;
- ctx->init = true;
}
while (size) {
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index d48d2156e621..749fe42315be 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -106,7 +106,7 @@ static int _aead_recvmsg(struct socket *sock, struct msghdr *msg,
size_t usedpages = 0; /* [in] RX bufs to be used from user */
size_t processed = 0; /* [in] TX bufs to be consumed */
- if (!ctx->init || ctx->more) {
+ if (ctx->more) {
err = af_alg_wait_for_data(sk, flags, 0);
if (err)
return err;
diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a51ba22fef58..5b6fa5e8c00d 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -61,7 +61,7 @@ static int _skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
int err = 0;
size_t len = 0;
- if (!ctx->init || (ctx->more && ctx->used < bs)) {
+ if (ctx->more && ctx->used < bs) {
err = af_alg_wait_for_data(sk, flags, bs);
if (err)
return err;
diff --git a/include/crypto/if_alg.h b/include/crypto/if_alg.h
index ee6412314f8f..08c087cc89d6 100644
--- a/include/crypto/if_alg.h
+++ b/include/crypto/if_alg.h
@@ -135,7 +135,6 @@ struct af_alg_async_req {
* SG?
* @enc: Cryptographic operation to be performed when
* recvmsg is invoked.
- * @init: True if metadata has been sent.
* @len: Length of memory allocated for this data structure.
*/
struct af_alg_ctx {
@@ -152,7 +151,6 @@ struct af_alg_ctx {
bool more;
bool merge;
bool enc;
- bool init;
unsigned int len;
};
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
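
For readers following the algif_skcipher.c hunk, the change in the receive-side wait condition can be modelled in plain C. This is only a sketch: struct ctx_sketch and both helper names are hypothetical, and the two predicates are copied from the removed and added lines of the diff above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of the af_alg_ctx fields that the condition reads. */
struct ctx_sketch {
	bool init;   /* the flag removed by the patch */
	bool more;   /* sender announced that more data follows */
	size_t used; /* bytes currently queued */
};

/* Pre-patch condition: wait whenever no metadata has been sent. */
static bool must_wait_old(const struct ctx_sketch *ctx, size_t bs)
{
	return !ctx->init || (ctx->more && ctx->used < bs);
}

/* Post-patch condition: wait only while more data is pending. */
static bool must_wait_new(const struct ctx_sketch *ctx, size_t bs)
{
	return ctx->more && ctx->used < bs;
}
```

For a socket that was never written to (init, more and used all zero), the old predicate takes the wait path while the new one does not, which matches the "empty request" behaviour that af_alg02 expects.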
* [bug] LTP mmap03 stuck in page fault loop after c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user")
From: Jan Stancek @ 2020-06-26 6:59 UTC
To: linuxppc-dev, aneesh.kumar, sandipan; +Cc: Rachel Sibley, Jan Stancek
In-Reply-To: <1402271372.18777802.1593153800272.JavaMail.zimbra@redhat.com>
Hi,
LTP mmap03 gets stuck in a page fault loop after commit
c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user").
The system is a ppc64le P9 LPAR [1] running v5.8-rc2-34-g3e08a95294a4.
Here's a minimized reproducer:
------------------------- 8< -----------------------------
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

int main(int ac, char **av)
{
	int page_sz = getpagesize();
	int fildes;
	char *addr;

	fildes = open("tempfile", O_WRONLY | O_CREAT, 0666);
	write(fildes, &fildes, sizeof(fildes));
	close(fildes);

	fildes = open("tempfile", O_RDONLY);
	unlink("tempfile");

	addr = mmap(0, page_sz, PROT_EXEC, MAP_FILE | MAP_PRIVATE, fildes, 0);

	printf("%d\n", *addr);
	return 0;
}
------------------------- >8 -----------------------------
Previously this ended quickly with a segmentation fault. After
commit c46241a370a6 the test gets stuck:
# perf stat timeout 5 ./a.out
Performance counter stats for 'timeout 5 ./a.out':
5,001.74 msec task-clock # 1.000 CPUs utilized
9 context-switches # 0.002 K/sec
0 cpu-migrations # 0.000 K/sec
3,094,893 page-faults # 0.619 M/sec
18,940,869,512 cycles # 3.787 GHz (33.39%)
1,377,005,087 stalled-cycles-frontend # 7.27% frontend cycles idle (50.19%)
10,949,936,056 stalled-cycles-backend # 57.81% backend cycles idle (16.62%)
21,133,828,748 instructions # 1.12 insn per cycle
# 0.52 stalled cycles per insn (33.22%)
4,395,016,137 branches # 878.698 M/sec (49.81%)
164,499,002 branch-misses # 3.74% of all branches (16.60%)
5.001237248 seconds time elapsed
0.321276000 seconds user
4.680772000 seconds sys
access_pkey_error() in the page fault handler now always seems to return false:
  __do_page_fault
    access_pkey_error(is_pkey: 1, is_exec: 0, is_write: 0)
      arch_vma_access_permitted
        pkey_access_permitted
          if (!is_pkey_enabled(pkey))
            return true
      return false
Regards,
Jan
[1]
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 8
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 2
Model: 2.2 (pvr 004e 0202)
Model name: POWER9 (architected), altivec supported
Hypervisor vendor: pHyp
Virtualization type: para
L1d cache: 32 KiB
L1i cache: 32 KiB
NUMA node0 CPU(s):
NUMA node1 CPU(s): 0-7
Physical sockets: 2
Physical chips: 1
Physical cores/chip: 8
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Mitigation; RFI Flush, L1D private per thread
Vulnerability Mds: Not affected
Vulnerability Meltdown: Mitigation; RFI Flush, L1D private per thread
Vulnerability Spec store bypass: Mitigation; Kernel entry/exit barrier (eieio)
Vulnerability Spectre v1: Mitigation; __user pointer sanitization, ori31 speculation barrier enabled
Vulnerability Spectre v2: Mitigation; Indirect branch cache disabled, Software link stack flush
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
* Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available
From: Cédric Le Goater @ 2020-06-26 7:26 UTC
To: Michael Ellerman, Nicholas Piggin, linuxppc-dev
Cc: Anton Blanchard, kvm-ppc, David Gibson
In-Reply-To: <af42c250-cf4b-0815-c91c-9363445383e7@kaod.org>
[ ... ]
>>> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
>>> to get KVM performance back.
>
> An option vector would require a PAPR change. Unless the architecture
> reserves some bits for the implementation, but I don't think so. Same
> for CAS.
>
>> Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
>> that. Though adding PowerVM/KVM specific hacks is obviously a very
>> slippery slope.
>
> QEMU could advertise a property "emulated-msgsndp", or something similar,
> which would be interpreted by Linux as a CPU feature and taken into account
> when doing the IPIs.
This is really a PowerVM optimization.
> The CPU setup for XIVE needs a cleanup also. There is no need to allocate
> interrupts for IPIs anymore in that case.
We need to keep these for the other cores. The XIVE layer is unchanged.
C.
* Re: [PATCH v2 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
From: Gautham R Shenoy @ 2020-06-26 7:45 UTC
To: Madhavan Srinivasan
Cc: nathanl, ego, maddy, suka, anju, Kajol Jain, linuxppc-dev
In-Reply-To: <a6a626e6-22eb-f1c2-4356-dfe1caf8db46@linux.ibm.com>
On Wed, Jun 24, 2020 at 05:58:31PM +0530, Madhavan Srinivasan wrote:
>
>
> On 6/24/20 4:26 PM, Gautham R Shenoy wrote:
> >Hi Kajol,
> >
> >On Wed, Jun 24, 2020 at 03:47:54PM +0530, Kajol Jain wrote:
> >>Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation.
> >>
> >>command:# cat /sys/devices/hv_24x7/cpumask
> >>0
> >Since this sysfs interface is read-only, and the user cannot change
> >the CPU which will be making the HCALLs to obtain the 24x7 counts,
> >does the user even need to know if currently CPU X is the one which is
> >going to make HCALLs to retrive the 24x7 counts ? Does it help in any
> >kind of trouble-shooting ?
> Primary use to expose the cpumask is for the perf tool.
> Which has the capability to parse the driver sysfs folder
> and understand the cpumask file. Having cpumask
> file will reduce the number of perf commandline
> parameters (will avoid "-C" option in the perf tool
> command line). I can also notify the user which is
> the current cpu used to retrieve the counter data.
Fair enough. Can we include this in the patch description ?
>
> >It would have made sense if the interface was read-write, since a user
> >can set this to a CPU which is not running user applications. This
> >would help in minimising jitter on those active CPUs running the user
> >applications.
>
> With cpumask backed by hotplug
> notifiers, enabling user write access to it will
> complicate the code with more additional check.
> CPU will come to play only if the user request for
> counter data. If not, then there will be no HCALLs made
> using the CPU.
Well, I was wondering if you could make the interface writable because
I couldn't think of the use of a read-only interface. With the
perf-use case you have provided, I guess it makes sense. I am ok with
it being a read-only interface.
>
> Maddy
--
Thanks and Regards
gautham.
* Re: [bug] LTP mmap03 stuck in page fault loop after c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user")
From: Aneesh Kumar K.V @ 2020-06-26 7:47 UTC
To: Jan Stancek, linuxppc-dev, sandipan; +Cc: Rachel Sibley
In-Reply-To: <2065283975.18780128.1593154755849.JavaMail.zimbra@redhat.com>
Hi Jan,
On 6/26/20 12:29 PM, Jan Stancek wrote:
> Hi,
>
> LTP mmap03 is getting stuck in page fault loop after commit
> c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user")
>
> System is ppc64le P9 lpar [1] running v5.8-rc2-34-g3e08a95294a4.
>
> Here's a minimized reproducer:
> ------------------------- 8< -----------------------------
> #include <fcntl.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <sys/mman.h>
>
> int main(int ac, char **av)
> {
> int page_sz = getpagesize();
> int fildes;
> char *addr;
>
> fildes = open("tempfile", O_WRONLY | O_CREAT, 0666);
> write(fildes, &fildes, sizeof(fildes));
> close(fildes);
>
> fildes = open("tempfile", O_RDONLY);
> unlink("tempfile");
>
> addr = mmap(0, page_sz, PROT_EXEC, MAP_FILE | MAP_PRIVATE, fildes, 0);
>
> printf("%d\n", *addr);
> return 0;
> }
> ------------------------- >8 -----------------------------
Thanks for the report. This is the execute-only key case, where the vma has
the implied read permission, so the patch does break this case. I will see how
best we can handle both PROT_EXEC and the multi-threaded test that required
the change in the patch.
-aneesh
* Re: [PATCH v2 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
From: kajoljain @ 2020-06-26 8:02 UTC
To: ego, Madhavan Srinivasan; +Cc: nathanl, maddy, suka, anju, linuxppc-dev
In-Reply-To: <20200626074521.GA13159@in.ibm.com>
On 6/26/20 1:15 PM, Gautham R Shenoy wrote:
> On Wed, Jun 24, 2020 at 05:58:31PM +0530, Madhavan Srinivasan wrote:
>>
>>
>> On 6/24/20 4:26 PM, Gautham R Shenoy wrote:
>>> Hi Kajol,
>>>
>>> On Wed, Jun 24, 2020 at 03:47:54PM +0530, Kajol Jain wrote:
>>>> Patch here adds a cpumask attr to hv_24x7 pmu along with ABI documentation.
>>>>
>>>> command:# cat /sys/devices/hv_24x7/cpumask
>>>> 0
>>> Since this sysfs interface is read-only, and the user cannot change
>>> the CPU which will be making the HCALLs to obtain the 24x7 counts,
>>> does the user even need to know if currently CPU X is the one which is
>>> going to make HCALLs to retrive the 24x7 counts ? Does it help in any
>>> kind of trouble-shooting ?
>> Primary use to expose the cpumask is for the perf tool.
>> Which has the capability to parse the driver sysfs folder
>> and understand the cpumask file. Having cpumask
>> file will reduce the number of perf commandline
>> parameters (will avoid "-C" option in the perf tool
>> command line). I can also notify the user which is
>> the current cpu used to retrieve the counter data.
>
> Fair enough. Can we include this in the patch description ?
Sure, I will update this in the next version of the patchset.
Thanks,
Kajol Jain
>
>>
>>> It would have made sense if the interface was read-write, since a user
>>> can set this to a CPU which is not running user applications. This
>>> would help in minimising jitter on those active CPUs running the user
>>> applications.
>>
>> With cpumask backed by hotplug
>> notifiers, enabling user write access to it will
>> complicate the code with more additional check.
>> CPU will come to play only if the user request for
>> counter data. If not, then there will be no HCALLs made
>> using the CPU.
>
> Well, I was wondering if you could make the interface writable because
> I couldn't think of the use of a read-only interface. With the
> perf-use case you have provided, I guess it makes sense. I am ok with
> it being a read-only interface.
>
>>
>> Maddy
>
> --
> Thanks and Regards
> gautham.
>
* Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available
From: Cédric Le Goater @ 2020-06-26 7:55 UTC
To: Michael Ellerman, Nicholas Piggin, linuxppc-dev
Cc: Anton Blanchard, kvm-ppc, David Gibson
In-Reply-To: <af42c250-cf4b-0815-c91c-9363445383e7@kaod.org>
>>> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
>>> to get KVM performance back.
>
> An option vector would require a PAPR change. Unless the architecture
> reserves some bits for the implementation, but I don't think so. Same
> for CAS.
>
>> Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
>> that. Though adding PowerVM/KVM specific hacks is obviously a very
>> slippery slope.
>
> QEMU could advertise a property "emulated-msgsndp", or something similar,
> which would be interpreted by Linux as a CPU feature and taken into account
> when doing the IPIs.
Could we remove msgsndp support from HFSCR in KVM and test it in pseries ?
C.
* Re: [PATCH v2 2/2] cpufreq: Specify default governor on command line
From: Quentin Perret @ 2020-06-26 8:09 UTC
To: Viresh Kumar
Cc: juri.lelli, kernel-team, vincent.guittot, arnd, rafael, peterz,
adharmap, linux-pm, rjw, linux-kernel, mingo, paulus,
linuxppc-dev, tkjos
In-Reply-To: <20200626025346.z3g3ikdcin56gjlo@vireshk-i7>
On Friday 26 Jun 2020 at 08:23:46 (+0530), Viresh Kumar wrote:
> On 23-06-20, 15:21, Quentin Perret wrote:
> > @@ -2789,7 +2796,13 @@ static int __init cpufreq_core_init(void)
> > cpufreq_global_kobject = kobject_create_and_add("cpufreq", &cpu_subsys.dev_root->kobj);
> > BUG_ON(!cpufreq_global_kobject);
> >
> > + mutex_lock(&cpufreq_governor_mutex);
> > + if (!default_governor)
>
> Also is this check really required ? The pointer will always be NULL
> at this point, isn't it ?
Not necessarily in this implementation -- the governors are registered
at core_initcall time too, so I don't think we can assume any ordering
there.
But it looks like your new version has fixed that by design, so I'll go
look at it some more, and try it out.
Thanks for the help!
Quentin
>
> > + default_governor = cpufreq_default_governor();
> > + mutex_unlock(&cpufreq_governor_mutex);
> > +
> > return 0;
> > }
> > module_param(off, int, 0444);
> > +module_param_string(default_governor, cpufreq_param_governor, CPUFREQ_NAME_LEN, 0444);
> > core_initcall(cpufreq_core_init);
> > --
> > 2.27.0.111.gc72c7da667-goog
>
> --
> viresh
* Re: [bug] LTP mmap03 stuck in page fault loop after c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user")
From: Aneesh Kumar K.V @ 2020-06-26 9:09 UTC
To: Jan Stancek, linuxppc-dev, sandipan; +Cc: Rachel Sibley, linuxram
In-Reply-To: <ac99e243-0945-8be0-6ae4-73af29b7a199@linux.ibm.com>
On 6/26/20 1:17 PM, Aneesh Kumar K.V wrote:
> Hi Jan,
>
> On 6/26/20 12:29 PM, Jan Stancek wrote:
>> Hi,
>>
>> LTP mmap03 is getting stuck in page fault loop after commit
>> c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault
>> error to the user")
>>
>> System is ppc64le P9 lpar [1] running v5.8-rc2-34-g3e08a95294a4.
>>
>> Here's a minimized reproducer:
>> ------------------------- 8< -----------------------------
>> #include <fcntl.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <unistd.h>
>> #include <sys/mman.h>
>>
>> int main(int ac, char **av)
>> {
>> int page_sz = getpagesize();
>> int fildes;
>> char *addr;
>>
>> fildes = open("tempfile", O_WRONLY | O_CREAT, 0666);
>> write(fildes, &fildes, sizeof(fildes));
>> close(fildes);
>>
>> fildes = open("tempfile", O_RDONLY);
>> unlink("tempfile");
>>
>> addr = mmap(0, page_sz, PROT_EXEC, MAP_FILE | MAP_PRIVATE,
>> fildes, 0);
>>
>> printf("%d\n", *addr);
>> return 0;
>> }
>> ------------------------- >8 -----------------------------
>
> Thanks for the report. This is execute only key where vma has the
> implied read permission. So The patch do break this case. I will see how
> best we can handle PROT_EXEC and the multi threaded test that required
> the change in the patch.
>
Can you check with this change? While checking for access permission we
are checking against the UAMOR value, which I think is wrong. We just need to
look at the AMR and IAMR values to check whether access is permitted or
not. Even if UAMOR denies userspace management of the key, we should
still do the correct access check.
modified arch/powerpc/mm/book3s64/pkeys.c
@@ -353,9 +353,6 @@ static bool pkey_access_permitted(int pkey, bool write, bool execute)
int pkey_shift;
u64 amr;
- if (!is_pkey_enabled(pkey))
- return true;
-
pkey_shift = pkeyshift(pkey);
if (execute && !(read_iamr() & (IAMR_EX_BIT << pkey_shift)))
return true;
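
The effect of dropping the is_pkey_enabled() early return can be modelled in userspace. This is a simplified sketch under stated assumptions: register values are passed in a struct instead of being read from the SPRs, the bit constants and the AMR read/write check at the end reflect my reading of the powerpc pkey code rather than anything quoted in this thread, and the check_uamor flag is a hypothetical switch between the old and new behaviour.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed layout: two bits per key, key 0 in the most significant
 * bits of a 64-bit register. A set AMR bit disables that access; a
 * set IAMR bit disables execute.
 */
#define PKEY_REG_BITS     64
#define AMR_BITS_PER_PKEY 2
#define AMR_RD_BIT        0x1ULL
#define AMR_WR_BIT        0x2ULL
#define IAMR_EX_BIT       0x1ULL

/* Stand-in for the AMR, IAMR and UAMOR special purpose registers. */
struct pkey_regs {
	uint64_t amr;
	uint64_t iamr;
	uint64_t uamor;
};

static int pkeyshift(int pkey)
{
	return PKEY_REG_BITS - ((pkey + 1) * AMR_BITS_PER_PKEY);
}

/* A key counts as "enabled" when UAMOR lets userspace manage it. */
static bool is_pkey_enabled(const struct pkey_regs *r, int pkey)
{
	return (r->uamor & (0x3ULL << pkeyshift(pkey))) != 0;
}

/* check_uamor = true models the code before the diff, false after. */
static bool access_permitted(const struct pkey_regs *r, int pkey,
			     bool write, bool execute, bool check_uamor)
{
	int shift = pkeyshift(pkey);

	if (check_uamor && !is_pkey_enabled(r, pkey))
		return true; /* the early return that the diff removes */

	if (execute && !(r->iamr & (IAMR_EX_BIT << shift)))
		return true;

	if (write)
		return !(r->amr & (AMR_WR_BIT << shift));
	return !(r->amr & (AMR_RD_BIT << shift));
}
```

For an execute-only key (AMR denies read and write, IAMR allows execute, UAMOR bits clear), a data read is reported as permitted with the early return and as denied without it. In this model, that difference is what turns the reproducer's read of a PROT_EXEC mapping from an endless fault retry back into a fault delivered to the process.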
* Re: [bug] LTP mmap03 stuck in page fault loop after c46241a370a6 ("powerpc/pkeys: Check vma before returning key fault error to the user")
From: Jan Stancek @ 2020-06-26 9:49 UTC
To: Aneesh Kumar K.V; +Cc: Rachel Sibley, linuxppc-dev, sandipan, linuxram
In-Reply-To: <a55e7ccd-09e8-726a-21d2-02b00b48d857@linux.ibm.com>
----- Original Message -----
> Can you check with this change? While checking for access permission we
> are checking against UAMOR value which i think is wrong. We just need to
> look at the AMR and IAMR values to check whether access is permitted or
> not. Even if UAMOR deny the userspace management of the key, we should
> do the correct access check.
>
> modified arch/powerpc/mm/book3s64/pkeys.c
> @@ -353,9 +353,6 @@ static bool pkey_access_permitted(int pkey, bool
> write, bool execute)
> int pkey_shift;
> u64 amr;
>
> - if (!is_pkey_enabled(pkey))
> - return true;
> -
> pkey_shift = pkeyshift(pkey);
> if (execute && !(read_iamr() & (IAMR_EX_BIT << pkey_shift)))
> return true;
>
This change fixes it for me. mmap03 and reproducer from previous
email no longer get stuck.
Thanks,
Jan
^ permalink raw reply
* [PATCH v2 0/4] Prefixed instruction tests to cover negative cases
From: Balamuruhan S @ 2020-06-26 9:51 UTC (permalink / raw)
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
This patchset adds support to test negative scenarios and adds testcase
for paddi with few fixes. It is based on powerpc/next and on top of
Jordan's tests for prefixed instructions patchsets,
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-May/211394.html
https://lists.ozlabs.org/pipermail/linuxppc-dev/2020-June/211768.html
Changes in v2:
-------------
Fix review comments from Sandipan and Jordan,
* use helper function to print word/prefix instructions
* reuse bits of `flags` to represent negative test scenario
* always set NIP instead of only setting for relative prefixed
instructions
Balamuruhan S (4):
powerpc test_emulate_step: enhancement to test negative scenarios
powerpc test_emulate_step: add negative tests for prefixed addi
powerpc sstep: introduce macros to retrieve Prefix instruction
operands
powerpc test_emulate_step: move extern declaration to sstep.h
arch/powerpc/include/asm/sstep.h | 6 ++++
arch/powerpc/lib/sstep.c | 12 ++++----
arch/powerpc/lib/test_emulate_step.c | 42 ++++++++++++++++++++--------
3 files changed, 43 insertions(+), 17 deletions(-)
base-commit: 64677779e8962c20b580b471790fe42367750599
prerequisite-patch-id: 3fff52f42000e816e2e8b4f75a2bca651dec5efe
prerequisite-patch-id: 5d7904bf38248ec39ed0f6223500286b9eaf82a9
prerequisite-patch-id: 7236d3caa4dc6de6079ae893678223876a3bb364
prerequisite-patch-id: 733e7f9b5c6ade64b8a1c7458b5aefe6b7d6fcff
prerequisite-patch-id: 4793e7716f3f56577a49976539d06db37ba31a80
prerequisite-patch-id: ffb024c2590e7249190b0137acf267e821a816a7
prerequisite-patch-id: 86e64f47de2dc6e9a6e1404de12d7c91775c22c8
prerequisite-patch-id: 9fbe5c3af9590696c230944cdee3c45e01f44d6d
prerequisite-patch-id: 3f9c6238023c867e27d87de26487e7e665a9dc12
--
2.24.1
^ permalink raw reply
* [PATCH v2 1/4] powerpc test_emulate_step: enhancement to test negative scenarios
From: Balamuruhan S @ 2020-06-26 9:51 UTC (permalink / raw)
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200626095158.1031507-1-bala24@linux.ibm.com>
Add a provision to declare that a test is a negative scenario, verify
that emulation fails, and avoid executing the instruction.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/lib/test_emulate_step.c | 30 +++++++++++++++++++---------
1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 0ca2b7cc8d8c..7c30a69c174f 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -118,6 +118,7 @@
#define IGNORE_GPR(n) (0x1UL << (n))
#define IGNORE_XER (0x1UL << 32)
#define IGNORE_CCR (0x1UL << 33)
+#define NEGATIVE_TEST (0x1UL << 63)
static void __init init_pt_regs(struct pt_regs *regs)
{
@@ -1202,8 +1203,10 @@ static struct compute_test compute_tests[] = {
};
static int __init emulate_compute_instr(struct pt_regs *regs,
- struct ppc_inst instr)
+ struct ppc_inst instr,
+ bool negative)
{
+ int analysed;
extern s32 patch__exec_instr;
struct instruction_op op;
@@ -1212,13 +1215,17 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
regs->nip = patch_site_addr(&patch__exec_instr);
- if (analyse_instr(&op, regs, instr) != 1 ||
- GETTYPE(op.type) != COMPUTE) {
- pr_info("execution failed, instruction = %s\n", ppc_inst_as_str(instr));
+ analysed = analyse_instr(&op, regs, instr);
+ if (analysed != 1 || GETTYPE(op.type) != COMPUTE) {
+ if (negative)
+ return -EFAULT;
+ pr_info("emulation failed, instruction = %s\n", ppc_inst_as_str(instr));
return -EFAULT;
}
-
- emulate_update_regs(regs, &op);
+ if (analysed == 1 && negative)
+ pr_info("negative test failed, instruction = %s\n", ppc_inst_as_str(instr));
+ if (!negative)
+ emulate_update_regs(regs, &op);
return 0;
}
@@ -1256,7 +1263,7 @@ static void __init run_tests_compute(void)
struct pt_regs *regs, exp, got;
unsigned int i, j, k;
struct ppc_inst instr;
- bool ignore_gpr, ignore_xer, ignore_ccr, passed;
+ bool ignore_gpr, ignore_xer, ignore_ccr, passed, rc, negative;
for (i = 0; i < ARRAY_SIZE(compute_tests); i++) {
test = &compute_tests[i];
@@ -1270,6 +1277,7 @@ static void __init run_tests_compute(void)
instr = test->subtests[j].instr;
flags = test->subtests[j].flags;
regs = &test->subtests[j].regs;
+ negative = flags & NEGATIVE_TEST;
ignore_xer = flags & IGNORE_XER;
ignore_ccr = flags & IGNORE_CCR;
passed = true;
@@ -1284,8 +1292,12 @@ static void __init run_tests_compute(void)
exp.msr = MSR_KERNEL;
got.msr = MSR_KERNEL;
- if (emulate_compute_instr(&got, instr) ||
- execute_compute_instr(&exp, instr)) {
+ rc = emulate_compute_instr(&got, instr, negative) != 0;
+ if (negative) {
+ /* skip executing instruction */
+ passed = rc;
+ goto print;
+ } else if (rc || execute_compute_instr(&exp, instr)) {
passed = false;
goto print;
}
--
2.24.1
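The pass/fail rule this patch introduces can be sketched in isolation. The function below is a hypothetical user-space model, not code from test_emulate_step.c: it takes the outcome of emulation and native execution as plain values and returns the verdict. Only the NEGATIVE_TEST bit position is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NEGATIVE_TEST (UINT64_C(1) << 63)	/* same bit as in the patch */

/* The flag must stay clear of the IGNORE_XER/IGNORE_CCR bits (32 and 33). */
static_assert(NEGATIVE_TEST > (UINT64_C(1) << 33), "flag bits must not collide");

/*
 * Hypothetical model of one subtest verdict.
 * emulate_rc: nonzero when the emulator refused the instruction.
 * execute_rc: nonzero when running it natively failed (unused for negative tests).
 * regs_match: whether emulated and native register state agreed.
 */
bool subtest_passed(uint64_t flags, int emulate_rc, int execute_rc,
		    bool regs_match)
{
	if (flags & NEGATIVE_TEST)
		return emulate_rc != 0;	/* pass only if emulation was refused */

	return emulate_rc == 0 && execute_rc == 0 && regs_match;
}
```

A negative subtest therefore never reaches native execution; its result depends only on whether the emulator rejected the instruction.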
^ permalink raw reply related
* [PATCH v2 2/4] powerpc test_emulate_step: add negative tests for prefixed addi
From: Balamuruhan S @ 2020-06-26 9:51 UTC (permalink / raw)
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200626095158.1031507-1-bala24@linux.ibm.com>
Add test cases for the `paddi` instruction to cover the negative case:
if R is equal to 1 and RA is not equal to 0, the instruction
form is invalid.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/lib/test_emulate_step.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 7c30a69c174f..0ee59301ef99 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1197,6 +1197,16 @@ static struct compute_test compute_tests[] = {
.regs = {
.gpr[21] = 0,
}
+ },
+ /* Invalid instruction form with R = 1 and RA != 0 */
+ {
+ .descr = "RA = R22(0), SI = 0, R = 1",
+ .instr = TEST_PADDI(21, 22, 0, 1),
+ .flags = NEGATIVE_TEST,
+ .regs = {
+ .gpr[21] = 0,
+ .gpr[22] = 0,
+ }
}
}
}
--
2.24.1
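The rule being tested is small enough to state as a predicate. This is a hypothetical helper for illustration only (the name paddi_form_is_valid is not from the kernel): R = 1 selects PC-relative addressing, so naming a base register at the same time is an invalid form.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: a prefixed addi is an invalid form when it asks for
 * PC-relative addressing (R = 1) and also names a base register (RA != 0).
 */
bool paddi_form_is_valid(unsigned int r, unsigned int ra)
{
	assert(r <= 1 && ra <= 31);	/* R is one bit, RA a 5-bit register number */
	return !(r == 1 && ra != 0);
}
```

The new subtest corresponds to paddi_form_is_valid(1, 22), which is false, so the emulator is expected to refuse the instruction.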
^ permalink raw reply related
* [PATCH v2 3/4] powerpc sstep: introduce macros to retrieve Prefix instruction operands
From: Balamuruhan S @ 2020-06-26 9:51 UTC (permalink / raw)
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200626095158.1031507-1-bala24@linux.ibm.com>
Retrieve the prefixed instruction operand RA and the PC-relative bit R
using macros, and adopt them in sstep.c and test_emulate_step.c.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/include/asm/sstep.h | 4 ++++
arch/powerpc/lib/sstep.c | 12 ++++++------
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 3b01c69a44aa..325975b4ef30 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -104,6 +104,10 @@ enum instruction_type {
#define MKOP(t, f, s) ((t) | (f) | SIZE(s))
+/* Prefix instruction operands */
+#define GET_PREFIX_RA(i) (((i) >> 16) & 0x1f)
+#define GET_PREFIX_R(i) ((i) & (1ul << 20))
+
struct instruction_op {
int type;
int reg;
diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
index 5abe98216dc2..fb4c5767663d 100644
--- a/arch/powerpc/lib/sstep.c
+++ b/arch/powerpc/lib/sstep.c
@@ -200,8 +200,8 @@ static nokprobe_inline unsigned long mlsd_8lsd_ea(unsigned int instr,
unsigned int dd;
unsigned long ea, d0, d1, d;
- prefix_r = instr & (1ul << 20);
- ra = (suffix >> 16) & 0x1f;
+ prefix_r = GET_PREFIX_R(instr);
+ ra = GET_PREFIX_RA(suffix);
d0 = instr & 0x3ffff;
d1 = suffix & 0xffff;
@@ -1339,8 +1339,8 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
switch (opcode) {
#ifdef __powerpc64__
case 1:
- prefix_r = word & (1ul << 20);
- ra = (suffix >> 16) & 0x1f;
+ prefix_r = GET_PREFIX_R(word);
+ ra = GET_PREFIX_RA(suffix);
rd = (suffix >> 21) & 0x1f;
op->reg = rd;
op->val = regs->gpr[rd];
@@ -2715,8 +2715,8 @@ int analyse_instr(struct instruction_op *op, const struct pt_regs *regs,
}
break;
case 1: /* Prefixed instructions */
- prefix_r = word & (1ul << 20);
- ra = (suffix >> 16) & 0x1f;
+ prefix_r = GET_PREFIX_R(word);
+ ra = GET_PREFIX_RA(suffix);
op->update_reg = ra;
rd = (suffix >> 21) & 0x1f;
op->reg = rd;
--
2.24.1
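A short user-space sketch of how the two macros behave on an encoded paddi with SI = 0. The macro bodies are the ones from this patch; the encoding constants PREFIX_MLS and SUFFIX_ADDI and the two builder functions are assumptions made for the example and are not taken from the kernel headers.

```c
#include <assert.h>
#include <stdint.h>

/* The two macros as introduced by the patch. */
#define GET_PREFIX_RA(i)	(((i) >> 16) & 0x1f)
#define GET_PREFIX_R(i)		((i) & (1ul << 20))

/* Assumed encoding constants for an MLS-type prefix and the addi suffix. */
#define PREFIX_MLS	0x06000000u
#define SUFFIX_ADDI	0x38000000u

/* Build the prefix word with the given R bit and a zero immediate. */
uint32_t make_paddi_prefix(unsigned int r)
{
	assert(r <= 1);
	return PREFIX_MLS | ((uint32_t)r << 20);
}

/* Build the suffix word with target RT, base RA and a zero immediate. */
uint32_t make_paddi_suffix(unsigned int rt, unsigned int ra)
{
	assert(rt <= 31 && ra <= 31);
	return SUFFIX_ADDI | ((uint32_t)rt << 21) | ((uint32_t)ra << 16);
}
```

Note that GET_PREFIX_R() yields the masked bit (0 or 1ul << 20) rather than 0 or 1, so callers should test it for truth instead of comparing it with 1. RA is read from the suffix word and R from the prefix word, as in the sstep.c hunks above.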
^ permalink raw reply related
* [PATCH v2 4/4] powerpc test_emulate_step: move extern declaration to sstep.h
From: Balamuruhan S @ 2020-06-26 9:51 UTC (permalink / raw)
To: mpe
Cc: ravi.bangoria, jniethe5, Balamuruhan S, paulus, sandipan,
naveen.n.rao, linuxppc-dev
In-Reply-To: <20200626095158.1031507-1-bala24@linux.ibm.com>
Fix checkpatch.pl warnings by moving the extern declaration from the source
file to the header file.
Signed-off-by: Balamuruhan S <bala24@linux.ibm.com>
---
arch/powerpc/include/asm/sstep.h | 2 ++
arch/powerpc/lib/test_emulate_step.c | 2 --
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/sstep.h b/arch/powerpc/include/asm/sstep.h
index 325975b4ef30..c8e37ef060c1 100644
--- a/arch/powerpc/include/asm/sstep.h
+++ b/arch/powerpc/include/asm/sstep.h
@@ -108,6 +108,8 @@ enum instruction_type {
#define GET_PREFIX_RA(i) (((i) >> 16) & 0x1f)
#define GET_PREFIX_R(i) ((i) & (1ul << 20))
+extern s32 patch__exec_instr;
+
struct instruction_op {
int type;
int reg;
diff --git a/arch/powerpc/lib/test_emulate_step.c b/arch/powerpc/lib/test_emulate_step.c
index 0ee59301ef99..c46bf6fc199b 100644
--- a/arch/powerpc/lib/test_emulate_step.c
+++ b/arch/powerpc/lib/test_emulate_step.c
@@ -1217,7 +1217,6 @@ static int __init emulate_compute_instr(struct pt_regs *regs,
bool negative)
{
int analysed;
- extern s32 patch__exec_instr;
struct instruction_op op;
if (!regs || !ppc_inst_val(instr))
@@ -1243,7 +1242,6 @@ static int __init execute_compute_instr(struct pt_regs *regs,
struct ppc_inst instr)
{
extern int exec_instr(struct pt_regs *regs);
- extern s32 patch__exec_instr;
if (!regs || !ppc_inst_val(instr))
return -EINVAL;
--
2.24.1
^ permalink raw reply related
* Re: [PATCH v2 3/5] libnvdimm/nvdimm/flush: Allow architecture to override the flush barrier
From: Michal Suchánek @ 2020-06-26 10:20 UTC (permalink / raw)
To: Mikulas Patocka
Cc: Jan Kara, linux-nvdimm, Aneesh Kumar K.V, Jeff Moyer, alistair,
Dan Williams, linuxppc-dev
In-Reply-To: <alpine.LRH.2.02.2005220845200.17488@file01.intranet.prod.int.rdu2.redhat.com>
On Fri, May 22, 2020 at 09:01:17AM -0400, Mikulas Patocka wrote:
>
>
> On Fri, 22 May 2020, Aneesh Kumar K.V wrote:
>
> > On 5/22/20 3:01 PM, Michal Suchánek wrote:
> > > On Thu, May 21, 2020 at 02:52:30PM -0400, Mikulas Patocka wrote:
> > > >
> > > >
> > > > On Thu, 21 May 2020, Dan Williams wrote:
> > > >
> > > > > On Thu, May 21, 2020 at 10:03 AM Aneesh Kumar K.V
> > > > > <aneesh.kumar@linux.ibm.com> wrote:
> > > > > >
> > > > > > > Moving on to the patch itself--Aneesh, have you audited other
> > > > > > > persistent
> > > > > > > memory users in the kernel? For example, drivers/md/dm-writecache.c
> > > > > > > does
> > > > > > > this:
> > > > > > >
> > > > > > > static void writecache_commit_flushed(struct dm_writecache *wc, bool
> > > > > > > wait_for_ios)
> > > > > > > {
> > > > > > > if (WC_MODE_PMEM(wc))
> > > > > > > wmb(); <==========
> > > > > > > else
> > > > > > > ssd_commit_flushed(wc, wait_for_ios);
> > > > > > > }
> > > > > > >
> > > > > > > I believe you'll need to make modifications there.
> > > > > > >
> > > > > >
> > > > > > Correct. Thanks for catching that.
> > > > > >
> > > > > >
> > > > > > I don't understand dm much, wondering how this will work with
> > > > > > non-synchronous DAX device?
> > > > >
> > > > > That's a good point. DM-writecache needs to be cognizant of things
> > > > > like virtio-pmem that violate the rule that persistent memory writes
> > > > > can be flushed by CPU functions rather than calling back into the
> > > > > driver. It seems we need to always make the flush case a dax_operation
> > > > > callback to account for this.
> > > >
> > > > dm-writecache is normally sitting on the top of dm-linear, so it would
> > > > need to pass the wmb() call through the dm core and dm-linear target ...
> > > > that would slow it down ... I remember that you already did it this way
> > > > some times ago and then removed it.
> > > >
> > > > What's the exact problem with POWER? Could the POWER system have two types
> > > > of persistent memory that need two different ways of flushing?
> > >
> > > As far as I understand the discussion so far
> > >
> > > - on POWER $oldhardware uses $oldinstruction to ensure pmem consistency
> > > - on POWER $newhardware uses $newinstruction to ensure pmem consistency
> > > (compatible with $oldinstruction on $oldhardware)
> >
> > Correct.
> >
> > > - on some platforms instead of barrier instruction a callback into the
> > > driver is issued to ensure consistency
> >
> > This is virtio-pmem only at this point IIUC.
> >
> > -aneesh
>
> And does the virtio-pmem driver track which pages are dirty? Or does it
> need to specify the range of pages to flush in the flush function?
>
> > > None of this is reflected by the dm driver.
>
> We could make a new dax method:
> void *(dax_get_flush_function)(void);
>
> This would return a pointer to "wmb()" on x86 and something else on Power.
>
> The method "dax_get_flush_function" would be called only once when
> initializing the writecache driver (because the call would be slow because
> it would have to go through the DM stack) and then, the returned function
> would be called each time we need write ordering. The returned function
> would do just "sfence; ret".
Hello,
as far as I understand the code, virtio_pmem has a flush function defined
which indeed can make use of the region properties, such as the memory
range. If such a function exists, you need the equivalent of sync(): a call into
the device in question. If it does not, calling arch_pmem_flush_barrier()
instead of wmb() should suffice.
I am not aware of an interface to determine whether the flush function exists
for a particular region.
Thanks
Michal
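The distinction drawn above can be sketched as a small user-space model. Everything here is hypothetical: struct pmem_region, model_flush_barrier() and model_device_flush() are stand-ins invented for the example, and the release fence only approximates what arch_pmem_flush_barrier() or wmb() would do on real hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical region model: flush is NULL when CPU barriers are enough. */
struct pmem_region {
	int (*flush)(struct pmem_region *region);
	int flush_calls;	/* bookkeeping for the example only */
	int barrier_calls;
};

/* Stand-in for arch_pmem_flush_barrier(); the release fence is an assumption. */
void model_flush_barrier(struct pmem_region *region)
{
	atomic_thread_fence(memory_order_release);
	region->barrier_calls++;
}

/* Stand-in for a device callback such as the one virtio_pmem provides. */
int model_device_flush(struct pmem_region *region)
{
	region->flush_calls++;
	return 0;
}

/* Call into the device if it has a flush function, else use the barrier. */
int commit_writes(struct pmem_region *region)
{
	assert(region != NULL);
	if (region->flush)
		return region->flush(region);
	model_flush_barrier(region);
	return 0;
}

/* Returns 1 when each kind of region took its expected path. */
int commit_writes_demo(void)
{
	struct pmem_region sync_region = { .flush = NULL };
	struct pmem_region async_region = { .flush = model_device_flush };

	commit_writes(&sync_region);
	commit_writes(&async_region);
	return sync_region.barrier_calls == 1 && sync_region.flush_calls == 0 &&
	       async_region.flush_calls == 1 && async_region.barrier_calls == 0;
}
```

The open question from the mail remains: this sketch assumes the caller can see whether a flush function exists, which is exactly the interface that does not appear to be available to dm-writecache.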
^ permalink raw reply
* [PATCH v3 1/2] powerpc/perf/hv-24x7: Add cpu hotplug support
From: Kajol Jain @ 2020-06-26 10:28 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: nathanl, ego, maddy, kjain, suka, anju
In-Reply-To: <20200626102824.270923-1-kjain@linux.ibm.com>
This patch adds cpu hotplug functions to the hv_24x7 pmu.
A new cpuhp_state "CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE" enum
is added.
The online callback function updates the cpumask only if it is
empty, as the primary intention of adding hotplug support
is to designate a CPU to make the HCALL to collect the
counter data.
The offline function tests and clears the corresponding cpu in the cpumask
and updates the cpumask to any other active cpu.
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
---
arch/powerpc/perf/hv-24x7.c | 45 +++++++++++++++++++++++++++++++++++++
include/linux/cpuhotplug.h | 1 +
2 files changed, 46 insertions(+)
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index db213eb7cb02..ce4739e2b407 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -31,6 +31,8 @@ static int interface_version;
/* Whether we have to aggregate result data for some domains. */
static bool aggregate_result_elements;
+static cpumask_t hv_24x7_cpumask;
+
static bool domain_is_valid(unsigned domain)
{
switch (domain) {
@@ -1641,6 +1643,44 @@ static struct pmu h_24x7_pmu = {
.capabilities = PERF_PMU_CAP_NO_EXCLUDE,
};
+static int ppc_hv_24x7_cpu_online(unsigned int cpu)
+{
+ /* Make this CPU the designated target for counter collection */
+ if (cpumask_empty(&hv_24x7_cpumask))
+ cpumask_set_cpu(cpu, &hv_24x7_cpumask);
+
+ return 0;
+}
+
+static int ppc_hv_24x7_cpu_offline(unsigned int cpu)
+{
+ int target = -1;
+
+ /* Check if exiting cpu is used for collecting 24x7 events */
+ if (!cpumask_test_and_clear_cpu(cpu, &hv_24x7_cpumask))
+ return 0;
+
+ /* Find a new cpu to collect 24x7 events */
+ target = cpumask_last(cpu_active_mask);
+
+ if (target < 0 || target >= nr_cpu_ids)
+ return -1;
+
+ /* Migrate 24x7 events to the new target */
+ cpumask_set_cpu(target, &hv_24x7_cpumask);
+ perf_pmu_migrate_context(&h_24x7_pmu, cpu, target);
+
+ return 0;
+}
+
+static int hv_24x7_cpu_hotplug_init(void)
+{
+ return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
+ "perf/powerpc/hv_24x7:online",
+ ppc_hv_24x7_cpu_online,
+ ppc_hv_24x7_cpu_offline);
+}
+
static int hv_24x7_init(void)
{
int r;
@@ -1685,6 +1725,11 @@ static int hv_24x7_init(void)
if (r)
return r;
+ /* init cpuhotplug */
+ r = hv_24x7_cpu_hotplug_init();
+ if (r)
+ pr_err("hv_24x7: CPU hotplug init failed\n");
+
r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
if (r)
return r;
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 191772d4a4d7..a2710e654b64 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -181,6 +181,7 @@ enum cpuhp_state {
CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_THREAD_IMC_ONLINE,
CPUHP_AP_PERF_POWERPC_TRACE_IMC_ONLINE,
+ CPUHP_AP_PERF_POWERPC_HV_24x7_ONLINE,
CPUHP_AP_WATCHDOG_ONLINE,
CPUHP_AP_WORKQUEUE_ONLINE,
CPUHP_AP_RCUTREE_ONLINE,
--
2.18.2
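The online/offline policy of this patch can be modelled in user space with a 64-bit mask standing in for cpumask_t. This is a sketch under stated assumptions: the function names are invented, the mask is limited to 64 CPUs, and the active mask passed to the offline step is assumed to no longer contain the outgoing CPU.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* Highest set bit in mask, or -1 when the mask is empty. */
int mask_last(uint64_t mask)
{
	for (int cpu = MODEL_NR_CPUS - 1; cpu >= 0; cpu--)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Online: claim the collector role only if nobody holds it yet. */
uint64_t model_cpu_online(uint64_t collector, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	if (collector == 0)
		collector = UINT64_C(1) << cpu;
	return collector;
}

/*
 * Offline: if the outgoing CPU was the collector, hand the role to the
 * last CPU in active. Assumes active excludes the outgoing CPU.
 */
uint64_t model_cpu_offline(uint64_t collector, int cpu, uint64_t active)
{
	uint64_t bit;
	int target;

	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	bit = UINT64_C(1) << cpu;
	if (!(collector & bit))
		return collector;	/* not the collector: nothing to do */

	collector &= ~bit;
	target = mask_last(active);
	if (target >= 0)
		collector |= UINT64_C(1) << target;
	return collector;
}
```

In the real driver the hand-over is also where perf_pmu_migrate_context() moves the events to the new target; the model leaves that step out.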
^ permalink raw reply related
* [PATCH v3 2/2] powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
From: Kajol Jain @ 2020-06-26 10:28 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: nathanl, ego, maddy, kjain, suka, anju
In-Reply-To: <20200626102824.270923-1-kjain@linux.ibm.com>
This patch adds a cpumask attr to the hv_24x7 pmu along with ABI documentation.
The primary reason to expose the cpumask is the perf tool, which has the
capability to parse the driver sysfs folder and understand the
cpumask file. Having the cpumask file will reduce the number of perf command
line parameters (it avoids the "-C" option in the perf tool
command line). It can also tell the user which
cpu is currently used to retrieve the counter data.
command:# cat /sys/devices/hv_24x7/cpumask
0
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
---
.../sysfs-bus-event_source-devices-hv_24x7 | 7 ++++
arch/powerpc/perf/hv-24x7.c | 36 +++++++++++++++++--
2 files changed, 41 insertions(+), 2 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
index e8698afcd952..f9dd3755b049 100644
--- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -43,6 +43,13 @@ Description: read only
This sysfs interface exposes the number of cores per chip
present in the system.
+What: /sys/devices/hv_24x7/cpumask
+Date: June 2020
+Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
+Description: read only
+ This sysfs file exposes the cpumask which is designated to make
+ HCALLs to retrieve hv-24x7 pmu event counter data.
+
What: /sys/bus/event_source/devices/hv_24x7/event_descs/<event-name>
Date: February 2014
Contact: Linux on PowerPC Developer List <linuxppc-dev@lists.ozlabs.org>
diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index ce4739e2b407..3c699612d29f 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -448,6 +448,12 @@ static ssize_t device_show_string(struct device *dev,
return sprintf(buf, "%s\n", (char *)d->var);
}
+static ssize_t cpumask_get_attr(struct device *dev,
+ struct device_attribute *attr, char *buf)
+{
+ return cpumap_print_to_pagebuf(true, buf, &hv_24x7_cpumask);
+}
+
static ssize_t sockets_show(struct device *dev,
struct device_attribute *attr, char *buf)
{
@@ -1116,6 +1122,17 @@ static DEVICE_ATTR_RO(sockets);
static DEVICE_ATTR_RO(chipspersocket);
static DEVICE_ATTR_RO(coresperchip);
+static DEVICE_ATTR(cpumask, S_IRUGO, cpumask_get_attr, NULL);
+
+static struct attribute *cpumask_attrs[] = {
+ &dev_attr_cpumask.attr,
+ NULL,
+};
+
+static struct attribute_group cpumask_attr_group = {
+ .attrs = cpumask_attrs,
+};
+
static struct bin_attribute *if_bin_attrs[] = {
&bin_attr_catalog,
NULL,
@@ -1143,6 +1160,11 @@ static const struct attribute_group *attr_groups[] = {
&event_desc_group,
&event_long_desc_group,
&if_group,
+ /*
+ * This NULL is a placeholder for the cpumask attr which will update
+ * onlyif cpuhotplug registration is successful
+ */
+ NULL,
NULL,
};
@@ -1683,7 +1705,7 @@ static int hv_24x7_cpu_hotplug_init(void)
static int hv_24x7_init(void)
{
- int r;
+ int r, i = -1;
unsigned long hret;
struct hv_perf_caps caps;
@@ -1727,8 +1749,18 @@ static int hv_24x7_init(void)
/* init cpuhotplug */
r = hv_24x7_cpu_hotplug_init();
- if (r)
+ if (r) {
pr_err("hv_24x7: CPU hotplug init failed\n");
+ } else {
+ /*
+ * Cpu hotplug init is successful, add the
+ * cpumask file as part of pmu attr group and
+ * assign it to very first NULL location.
+ */
+ while (attr_groups[++i])
+ /* nothing */;
+ attr_groups[i] = &cpumask_attr_group;
+ }
r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
if (r)
--
2.18.2
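The placeholder technique used for attr_groups[] can be shown with a plain pointer array. This is a hypothetical user-space sketch (struct group and the names below are invented): the array carries one spare NULL in front of the terminating NULL, and the extra group is stored in the first free slot only when the optional feature came up.

```c
#include <assert.h>
#include <stddef.h>

struct group { const char *name; };

struct group event_group = { "events" };
struct group if_group = { "interface" };
struct group cpumask_group = { "cpumask" };

/* Two trailing NULLs: a spare slot plus the terminator. */
const struct group *groups[] = { &event_group, &if_group, NULL, NULL };

/* Store extra in the first NULL slot and return its index. */
size_t fill_first_free_slot(const struct group **list, size_t len,
			    const struct group *extra)
{
	size_t i = 0;

	while (i < len && list[i])
		i++;
	assert(i + 1 < len);	/* the terminator must stay NULL */
	list[i] = extra;
	return i;
}
```

Unlike the loop in the patch, the sketch bounds the scan by the array length and checks that a terminator remains; the patch relies on the array always ending in two NULLs.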
^ permalink raw reply related
* [PATCH v3 0/2] Add cpu hotplug support for powerpc/perf/hv-24x7
From: Kajol Jain @ 2020-06-26 10:28 UTC (permalink / raw)
To: linuxppc-dev, mpe; +Cc: nathanl, ego, maddy, kjain, suka, anju
This patchset adds cpu hotplug support to the hv_24x7 driver by adding
online/offline cpu hotplug functions. It also adds a sysfs file
"cpumask" to expose the current online cpu that can be used for
hv_24x7 event counting.
Changelog:
v2 -> v3
- Corrected some typos and updated the commit message
as suggested by Gautham R Shenoy.
- Added Reviewed-by tag for the first patch in the patchset.
v1 -> v2
- Changed the function that picks an active cpu on offline
from "cpumask_any_but" to "cpumask_last", as the
cpumask_any_but function picks the very next online cpu, and in the case where
we are sequentially off-lining multiple cpus, "pmu_migrate_context"
can add extra latency.
- Suggested by: Gautham R Shenoy.
- Changed the documentation for cpumask and, rather than hardcoding the
initialization for cpumask_attr_group, added a loop to find the very first
NULL, as suggested by Gautham R Shenoy.
Kajol Jain (2):
powerpc/perf/hv-24x7: Add cpu hotplug support
powerpc/hv-24x7: Add sysfs files inside hv-24x7 device to show cpumask
.../sysfs-bus-event_source-devices-hv_24x7 | 7 ++
arch/powerpc/perf/hv-24x7.c | 79 ++++++++++++++++++-
include/linux/cpuhotplug.h | 1 +
3 files changed, 86 insertions(+), 1 deletion(-)
--
2.18.2
^ permalink raw reply
* Re: [PATCH 09/13] x86: Remove dev->archdata.iommu pointer
From: Borislav Petkov @ 2020-06-26 11:46 UTC (permalink / raw)
To: Joerg Roedel
Cc: linux-ia64, Heiko Stuebner, David Airlie, Joonas Lahtinen,
Thierry Reding, Paul Mackerras, Will Deacon, Marek Szyprowski,
x86, Russell King, Catalin Marinas, Fenghua Yu, Joerg Roedel,
intel-gfx, Jani Nikula, Rodrigo Vivi, Matthias Brugger,
linux-arm-kernel, Tony Luck, linuxppc-dev, linux-kernel, iommu,
Daniel Vetter, David Woodhouse, Lu Baolu
In-Reply-To: <20200625130836.1916-10-joro@8bytes.org>
On Thu, Jun 25, 2020 at 03:08:32PM +0200, Joerg Roedel wrote:
> From: Joerg Roedel <jroedel@suse.de>
>
> There are no users left, all drivers have been converted to use the
> per-device private pointer offered by IOMMU core.
>
> Signed-off-by: Joerg Roedel <jroedel@suse.de>
> ---
> arch/x86/include/asm/device.h | 3 ---
> 1 file changed, 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/device.h b/arch/x86/include/asm/device.h
> index 49bd6cf3eec9..7c0a52ca2f4d 100644
> --- a/arch/x86/include/asm/device.h
> +++ b/arch/x86/include/asm/device.h
> @@ -3,9 +3,6 @@
> #define _ASM_X86_DEVICE_H
>
> struct dev_archdata {
> -#ifdef CONFIG_IOMMU_API
> - void *iommu; /* hook for IOMMU specific extension */
> -#endif
> };
>
> struct pdev_archdata {
> --
Acked-by: Borislav Petkov <bp@suse.de>
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
^ permalink raw reply
* Re: [PATCH] powerpc/pseries: Use doorbells even if XIVE is available
From: Cédric Le Goater @ 2020-06-26 7:17 UTC (permalink / raw)
To: Michael Ellerman, Nicholas Piggin, linuxppc-dev
Cc: Anton Blanchard, kvm-ppc, David Gibson
In-Reply-To: <87r1u4aqzm.fsf@mpe.ellerman.id.au>
Adding David,
On 6/25/20 3:11 AM, Michael Ellerman wrote:
> Nicholas Piggin <npiggin@gmail.com> writes:
>> KVM supports msgsndp in guests by trapping and emulating the
>> instruction, so it was decided to always use XIVE for IPIs if it is
>> available. However on PowerVM systems, msgsndp can be used and gives
>> better performance. On large systems, high XIVE interrupt rates can
>> have sub-linear scaling, and using msgsndp can reduce the load on
>> the interrupt controller.
>>
>> So switch to using core local doorbells even if XIVE is available.
>> This reduces performance for KVM guests with an SMT topology by
>> about 50% for ping-pong context switching between SMT vCPUs.
>
> You have to take explicit steps to configure KVM in that way with qemu.
> eg. "qemu .. -smp 8" will give you 8 SMT1 CPUs by default.
>
>> An option vector (or dt-cpu-ftrs) could be defined to disable msgsndp
>> to get KVM performance back.
An option vector would require a PAPR change. Unless the architecture
reserves some bits for the implementation, but I don't think so. Same
for CAS.
> Qemu/KVM populates /proc/device-tree/hypervisor, so we *could* look at
> that. Though adding PowerVM/KVM specific hacks is obviously a very
> slippery slope.
QEMU could advertise a property "emulated-msgsndp", or something similar,
which would be interpreted by Linux as a CPU feature and taken into account
when doing the IPIs.
The CPU setup for XIVE needs a cleanup also. There is no need to allocate
interrupts for IPIs anymore in that case.
Thanks,
C.
>
>> diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
>> index 6891710833be..a737a2f87c67 100644
>> --- a/arch/powerpc/platforms/pseries/smp.c
>> +++ b/arch/powerpc/platforms/pseries/smp.c
>> @@ -188,13 +188,14 @@ static int pseries_smp_prepare_cpu(int cpu)
>> return 0;
>> }
>>
>> +static void (*cause_ipi_offcore)(int cpu) __ro_after_init;
>> +
>> static void smp_pseries_cause_ipi(int cpu)
>
> This is static so the name could be more descriptive, it doesn't need
> the "smp_pseries" prefix.
>
>> {
>> - /* POWER9 should not use this handler */
>> if (doorbell_try_core_ipi(cpu))
>> return;
>
> Seems like it would be worth making that static inline so we can avoid
> the function call overhead.
>
>> - icp_ops->cause_ipi(cpu);
>> + cause_ipi_offcore(cpu);
>> }
>>
>> static int pseries_cause_nmi_ipi(int cpu)
>> @@ -222,10 +223,7 @@ static __init void pSeries_smp_probe_xics(void)
>> {
>> xics_smp_probe();
>>
>> - if (cpu_has_feature(CPU_FTR_DBELL) && !is_secure_guest())
>> - smp_ops->cause_ipi = smp_pseries_cause_ipi;
>> - else
>> - smp_ops->cause_ipi = icp_ops->cause_ipi;
>> + smp_ops->cause_ipi = icp_ops->cause_ipi;
>> }
>>
>> static __init void pSeries_smp_probe(void)
>> @@ -238,6 +236,18 @@ static __init void pSeries_smp_probe(void)
>
> The comment just above here says:
>
> /*
> * Don't use P9 doorbells when XIVE is enabled. IPIs
> * using MMIOs should be faster
> */
>> xive_smp_probe();
>
> Which is no longer true.
>
>> else
>> pSeries_smp_probe_xics();
>
> I think you should just fold this in, it would make the logic slightly
> easier to follow.
>
>> + /*
>> + * KVM emulates doorbells by reading the instruction, which
>> + * can't be done if the guest is secure. If a secure guest
>> + * runs under PowerVM, it could use msgsndp but would need a
>> + * way to distinguish.
>> + */
>
> It's not clear what it needs to distinguish: That it's running under
> PowerVM and therefore *can* use msgsndp even though it's secure.
>
> Also the comment just talks about the is_secure_guest() test, which is
> not obvious on first reading.
>
>> + if (cpu_has_feature(CPU_FTR_DBELL) &&
>> + cpu_has_feature(CPU_FTR_SMT) && !is_secure_guest()) {
>> + cause_ipi_offcore = smp_ops->cause_ipi;
>> + smp_ops->cause_ipi = smp_pseries_cause_ipi;
>> + }
>
> Because we're at the tail of the function I think this would be clearer
> if it used early returns, eg:
>
> // If the CPU doesn't have doorbells then we must use xics/xive
> if (!cpu_has_feature(CPU_FTR_DBELL))
> return;
>
> // If the CPU doesn't have SMT then doorbells don't help us
> if (!cpu_has_feature(CPU_FTR_SMT))
> return;
>
> // Secure guests can't use doorbells because ...
> if (is_secure_guest())
> return;
>
> /*
> * Otherwise we want to use doorbells for sibling threads and
> * xics/xive for IPIs off the core, because it performs better
> * on large systems ...
> */
> cause_ipi_offcore = smp_ops->cause_ipi;
> smp_ops->cause_ipi = smp_pseries_cause_ipi;
> }
>
>
> cheers
>
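The early-return ordering suggested in the review can be written as a pure decision function. This is a hypothetical user-space model: the enum, the function names and the core-number comparison are invented for the example, and the real code tests CPU features and installs function pointers instead of returning a value.

```c
#include <assert.h>
#include <stdbool.h>

enum ipi_path { IPI_CONTROLLER_ONLY, IPI_DOORBELL_THEN_CONTROLLER };

/* Hypothetical model of the early-return ordering from the review. */
enum ipi_path choose_ipi_path(bool has_dbell, bool has_smt, bool secure_guest)
{
	if (!has_dbell)		/* no doorbells: must use xics/xive */
		return IPI_CONTROLLER_ONLY;
	if (!has_smt)		/* no sibling threads for a doorbell to reach */
		return IPI_CONTROLLER_ONLY;
	if (secure_guest)	/* KVM cannot emulate msgsndp for a secure guest */
		return IPI_CONTROLLER_ONLY;
	return IPI_DOORBELL_THEN_CONTROLLER;
}

/* True when a doorbell delivers the IPI, i.e. the target is a sibling thread. */
bool model_ipi_uses_doorbell(enum ipi_path path, int src_core, int dst_core)
{
	assert(src_core >= 0 && dst_core >= 0);
	return path == IPI_DOORBELL_THEN_CONTROLLER && src_core == dst_core;
}
```

IPIs to another core fall back to the interrupt controller in every case, which matches the cause_ipi_offcore split in the patch.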
^ permalink raw reply
* [PATCH v2 0/3] Off-load TLB invalidations to host for !GTSE
From: Bharata B Rao @ 2020-06-26 13:09 UTC (permalink / raw)
To: linuxppc-dev; +Cc: aneesh.kumar, Bharata B Rao, npiggin
The hypervisor may choose not to enable the Guest Translation Shootdown Enable
(GTSE) option for the guest. When GTSE isn't ON, the guest OS isn't
permitted to use instructions like tlbie and tlbsync directly, but is
expected to make hypervisor calls to get the TLB flushed.
This series enables the TLB flush routines in the radix code to
off-load TLB flushing to hypervisor via the newly proposed hcall
H_RPT_INVALIDATE.
To easily check the availability of GTSE, it is made an MMU feature.
The OV5 handling and H_REGISTER_PROC_TBL hcall are changed to
handle GTSE as an optionally available feature and to not assume GTSE
when radix support is available.
The actual hcall implementation for KVM isn't included in this
patchset and will be posted separately.
Changes in v2
=============
- Dropped the patch that added H_RPT_INVALIDATE calls for the nested
case. This patch will be posted separately along with KVM hcall
implementation.
- Merged first two patches
- A few cleanups
- Rebased to powerpc/next
v1: https://lore.kernel.org/linuxppc-dev/20200618160930.26324-1-bharata@linux.ibm.com/
H_RPT_INVALIDATE
================
Syntax:
int64 /* H_Success: Return code on successful completion */
/* H_Busy - repeat the call with the same */
/* H_Parameter, H_P2, H_P3, H_P4, H_P5 : Invalid parameters */
hcall(const uint64 H_RPT_INVALIDATE, /* Invalidate RPT translation lookaside information */
uint64 pid, /* PID/LPID to invalidate */
uint64 target, /* Invalidation target */
uint64 type, /* Type of lookaside information */
uint64 pageSizes, /* Page sizes */
uint64 start, /* Start of Effective Address (EA) range (inclusive) */
uint64 end) /* End of EA range (exclusive) */
Invalidation targets (target)
-----------------------------
Core MMU 0x01 /* All virtual processors in the partition */
Core local MMU 0x02 /* Current virtual processor */
Nest MMU 0x04 /* All nest/accelerator agents in use by the partition */
A combination of the above can be specified, except core and core local.
Type of translation to invalidate (type)
---------------------------------------
NESTED 0x0001 /* Invalidate nested guest partition-scope */
TLB 0x0002 /* Invalidate TLB */
PWC 0x0004 /* Invalidate Page Walk Cache */
PRT 0x0008 /* Invalidate Process Table Entries if NESTED is clear */
PAT 0x0008 /* Invalidate Partition Table Entries if NESTED is set */
A combination of the above can be specified.
Page size mask (pageSizes)
--------------------------
4K 0x01
64K 0x02
2M 0x04
1G 0x08
All sizes (-1UL)
A combination of the above can be specified.
All page sizes can be selected with -1.
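A minimal sketch of how a caller or hypervisor might validate the three mask parameters listed above. The RPTI_* names and the enum are hypothetical and chosen for the example; only the bit values come from the tables. How a zero target or type should be treated is not stated here, so the sketch does not reject it.

```c
#include <assert.h>
#include <stdint.h>

#define RPTI_TARGET_CMMU	0x01	/* all virtual processors */
#define RPTI_TARGET_CMMU_LOCAL	0x02	/* current virtual processor */
#define RPTI_TARGET_NMMU	0x04	/* nest/accelerator agents */
#define RPTI_TYPE_ALL_BITS	0x0f	/* NESTED | TLB | PWC | PRT/PAT */
#define RPTI_PAGE_ALL_BITS	0x0f	/* 4K | 64K | 2M | 1G */
#define RPTI_PAGE_ALL		UINT64_MAX	/* -1: every page size */

static_assert((RPTI_TARGET_CMMU & RPTI_TARGET_CMMU_LOCAL) == 0,
	      "target bits must be distinct");

/* Hypothetical status codes; comments name the hcall return they stand for. */
enum rpti_status {
	RPTI_OK,
	RPTI_BAD_TARGET,	/* H_P2 */
	RPTI_BAD_TYPE,		/* H_P3 */
	RPTI_BAD_PAGE_SIZES,	/* H_P4 */
};

enum rpti_status rpti_check_args(uint64_t target, uint64_t type,
				 uint64_t page_sizes)
{
	const uint64_t core_both = RPTI_TARGET_CMMU | RPTI_TARGET_CMMU_LOCAL;
	const uint64_t target_bits = core_both | RPTI_TARGET_NMMU;

	/* Undefined bits, or core together with core local, are rejected. */
	if ((target & ~target_bits) || (target & core_both) == core_both)
		return RPTI_BAD_TARGET;
	if (type & ~(uint64_t)RPTI_TYPE_ALL_BITS)
		return RPTI_BAD_TYPE;
	if (page_sizes != RPTI_PAGE_ALL &&
	    (page_sizes & ~(uint64_t)RPTI_PAGE_ALL_BITS))
		return RPTI_BAD_PAGE_SIZES;
	return RPTI_OK;
}
```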
Semantics: Invalidate radix tree lookaside information
matching the parameters given.
* Return H_P2, H_P3 or H_P4 if target, type, or pageSizes parameters are
different from the defined values.
* Return H_PARAMETER if NESTED is set and pid is not a valid nested
LPID allocated to this partition.
* Return H_P5 if (start, end) doesn't form a valid range. Start and end
should be a valid Quadrant address and end > start.
* Return H_NotSupported if the partition is not running in radix
translation mode.
* May invalidate more translation information than requested.
* If start = 0 and end = -1, set the range to cover all valid addresses.
Else start and end should be aligned to 4kB (lower 12 bits clear).
* If NESTED is clear, then invalidate process scoped lookaside information.
Else pid specifies a nested LPID, and the invalidation is performed
on nested guest partition table and nested guest partition scope real
addresses.
* If pid = 0 and NESTED is clear, then valid addresses are quadrant 3 and
quadrant 0 spaces; else valid addresses are quadrant 0.
* Pages which are fully covered by the range are to be invalidated.
Those which are partially covered are considered outside invalidation
range, which allows a caller to optimally invalidate ranges that may
contain mixed page sizes.
* Return H_SUCCESS on success.
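The rule that only fully covered pages are invalidated can be illustrated with a little address arithmetic. The sketch below is not taken from the patchset: the ex_* names are hypothetical, end is treated as exclusive as in the syntax section, and the special start = 0, end = -1 "everything" case is detected separately rather than passed through the page computation.

```c
#include <stdbool.h>
#include <stdint.h>

/* start = 0 and end = -1 select all valid addresses. */
static bool ex_is_full_range(uint64_t start, uint64_t end)
{
    return start == 0 && end == UINT64_MAX;
}

/*
 * Count the pages of size page_size (a power of two) that lie entirely
 * inside [start, end). Partially covered pages at either edge are left
 * out. On a non-zero result, *first receives the base of the first page.
 */
static uint64_t ex_pages_fully_covered(uint64_t start, uint64_t end,
                                       uint64_t page_size, uint64_t *first)
{
    uint64_t mask = page_size - 1;
    uint64_t lo, hi;

    if (page_size == 0 || (page_size & mask) != 0)
        return 0;               /* not a power of two */
    if (start > UINT64_MAX - mask)
        return 0;               /* rounding up would overflow */

    lo = (start + mask) & ~mask; /* round start up to a page boundary */
    hi = end & ~mask;            /* round end down to a page boundary */
    if (hi <= lo)
        return 0;

    if (first)
        *first = lo;
    return (hi - lo) / page_size;
}
```

With start = 0x1000 and end = 0x21000, this yields 32 pages of 4K starting at 0x1000, one 64K page at 0x10000, and no 2M page, which is how a single range can be applied across mixed page sizes.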
Bharata B Rao (2):
powerpc/mm: Enable radix GTSE only if supported.
powerpc/pseries: H_REGISTER_PROC_TBL should ask for GTSE only if
enabled
Nicholas Piggin (1):
powerpc/mm/book3s64/radix: Off-load TLB invalidations to host when
!GTSE
.../include/asm/book3s/64/tlbflush-radix.h | 15 ++++
arch/powerpc/include/asm/hvcall.h | 34 +++++++-
arch/powerpc/include/asm/mmu.h | 4 +
arch/powerpc/include/asm/plpar_wrappers.h | 50 +++++++++++
arch/powerpc/kernel/dt_cpu_ftrs.c | 1 +
arch/powerpc/kernel/prom_init.c | 13 +--
arch/powerpc/mm/book3s64/radix_tlb.c | 82 +++++++++++++++++--
arch/powerpc/mm/init_64.c | 5 +-
arch/powerpc/platforms/pseries/lpar.c | 8 +-
9 files changed, 195 insertions(+), 17 deletions(-)
--
2.21.3