* Re: [PATCH v2] cpuidle/powernv : Add Description for cpuidle state
From: Akshay Adiga @ 2018-06-05 9:08 UTC (permalink / raw)
To: Abhishek
Cc: Benjamin Herrenschmidt, stewart, linux-pm, daniel.lezcano, rjw,
linux-kernel, paulus, linuxppc-dev
In-Reply-To: <17cd53ec-6bc1-5814-8824-3c6a4f0b90cd@linux.vnet.ibm.com>
On Tue, Jun 05, 2018 at 02:24:39PM +0530, Abhishek wrote:
>
>
> On 06/04/2018 05:15 PM, Akshay Adiga wrote:
> > On Mon, Jun 04, 2018 at 07:04:14PM +1000, Benjamin Herrenschmidt wrote:
> > > Is this a new property ? I'm not fan of adding yet another of those
> > > silly arrays.
> > >
> > > I would say this is the right time now to switch over to a node per
> > > state instead, as we discussed with Vaidy.
>
> It is not a new property. Name was being used for description as description
> was not present in device tree. A skiboot patch adding description to device
> tree have been posted. This patch reads those description instead of copying
> name itself into description. And we fall back to reading name into
> description to not break the comaptibility with older firmware.
>From a cpuidle point of view this is a exisiting property, but for
powernv there was no device-tree property describing this.
Abhishek has added the following skiboot patch for adding description
for each idle state in device-tree .
https://patchwork.ozlabs.org/patch/924879/
I agree this can go into new device tree format which each idle state
as a node. Probably i will roll this patch into mine in the next
version.
>
> Thanks
> Abhishek
>
> > I posted the node based device tree here :
> > skiboot patch : https://patchwork.ozlabs.org/patch/923120/
> > kernel patch : https://lkml.org/lkml/2018/5/30/1146
> >
> > Do you have any inputs for this design ?
> >
> > > Additionally, while doing that, we can provide the versioning mechanism
> > > I proposed so we can deal with state specific issues and erratas.
> > >
> > > Cheers,
> > > Ben.
> > >
>
^ permalink raw reply
* Re: [PATCH v2] cpuidle/powernv : Add Description for cpuidle state
From: Abhishek @ 2018-06-05 8:54 UTC (permalink / raw)
To: Akshay Adiga, Benjamin Herrenschmidt
Cc: rjw, daniel.lezcano, paulus, mpe, linux-pm, linuxppc-dev,
linux-kernel, stewart
In-Reply-To: <20180604114500.3rlpuso5aftesnf5@aksadiga.ibm>
On 06/04/2018 05:15 PM, Akshay Adiga wrote:
> On Mon, Jun 04, 2018 at 07:04:14PM +1000, Benjamin Herrenschmidt wrote:
>> Is this a new property ? I'm not fan of adding yet another of those
>> silly arrays.
>>
>> I would say this is the right time now to switch over to a node per
>> state instead, as we discussed with Vaidy.
It is not a new property. Name was being used for description as
description was not present in device tree. A skiboot patch adding
description to device tree have been posted. This patch reads those
description instead of copying name itself into description. And we fall
back to reading name into description to not break the comaptibility
with older firmware.
Thanks
Abhishek
> I posted the node based device tree here :
> skiboot patch : https://patchwork.ozlabs.org/patch/923120/
> kernel patch : https://lkml.org/lkml/2018/5/30/1146
>
> Do you have any inputs for this design ?
>
>> Additionally, while doing that, we can provide the versioning mechanism
>> I proposed so we can deal with state specific issues and erratas.
>>
>> Cheers,
>> Ben.
>>
^ permalink raw reply
* Re: [PATCH] cpuidle:powernv: Make the snooze timeout dynamic.
From: Gautham R Shenoy @ 2018-06-05 8:45 UTC (permalink / raw)
To: Michael Ellerman
Cc: Gautham R. Shenoy, Rafael J. Wysocki, Daniel Lezcano,
Stewart Smith, Michael Neuling, Vaidyanathan Srinivasan,
Shilpasri G Bhat, Akshay Adiga, Nicholas Piggin, linuxppc-dev,
linux-kernel, linux-pm
In-Reply-To: <87fu22bxlf.fsf@concordia.ellerman.id.au>
Hello Michael,
On Mon, Jun 04, 2018 at 09:27:40PM +1000, Michael Ellerman wrote:
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
>
> > From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
> >
> > The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of
> > snooze to deeper idle state") introduced a timeout for the snooze idle
> > state so that it could be eventually be promoted to a deeper idle
> > state. The snooze timeout value is static and set to the target
> > residency of the next idle state, which would train the cpuidle
> > governor to pick the next idle state eventually.
> >
> > The unfortunate side-effect of this is that if the next idle state(s)
> > is disabled, the CPU will forever remain in snooze, despite the fact
> > that the system is completely idle, and other deeper idle states are
> > available.
>
> That sounds like a bug, I'll add?
>
Yes, this is a bug-fix for a customer scenario which we encountered
recently.
> Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state")
> Cc: stable@vger.kernel.org # v4.2+
This patch applies cleanly from v4.13 onwards. Prior to that there are
some (minor) conflicts.
Should I spin a version separately for the prior stable versions ?
>
> cheers
>
^ permalink raw reply
* Re: [v3] powerpc: fix build failure by disabling attribute-alias warning
From: Christophe LEROY @ 2018-06-05 7:00 UTC (permalink / raw)
To: Michael Ellerman, Benjamin Herrenschmidt, Paul Mackerras, segher
Cc: linuxppc-dev, linux-kernel
In-Reply-To: <40zxgb4SmTz9s7c@ozlabs.org>
Le 04/06/2018 à 16:11, Michael Ellerman a écrit :
> On Tue, 2018-05-29 at 16:06:41 UTC, Christophe Leroy wrote:
>> Latest GCC version emit the following warnings
>>
>> As arch/powerpc code is built with -Werror, this breaks build with
>> GCC 8.1
>>
>> This patch inhibits those warnings
>>
>> CC arch/powerpc/kernel/syscalls.o
>> In file included from arch/powerpc/kernel/syscalls.c:24:
>> ./include/linux/syscalls.h:233:18: error: 'sys_mmap2' alias between functions of incompatible types 'long int(long unsigned int, size_t, long unsigned int, long unsigned int, long unsigned int, long unsigned int)' {aka 'long int(long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Werror=attribute-alias]
>> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>> ^~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 'SYSCALL_DEFINE6'
>> SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
>> ^~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:238:18: note: aliased declaration here
>> asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>> ^~~~~~~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/syscalls.c:65:1: note: in expansion of macro 'SYSCALL_DEFINE6'
>> SYSCALL_DEFINE6(mmap2, unsigned long, addr, size_t, len,
>> ^~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:233:18: error: 'sys_mmap' alias between functions of incompatible types 'long int(long unsigned int, size_t, long unsigned int, long unsigned int, long unsigned int, off_t)' {aka 'long int(long unsigned int, long unsigned int, long unsigned int, long unsigned int, long unsigned int, long int)'} and 'long int(long int, long int, long int, long int, long int, long int)' [-Werror=attribute-alias]
>> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>> ^~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/syscalls.c:72:1: note: in expansion of macro 'SYSCALL_DEFINE6'
>> SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
>> ^~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:238:18: note: aliased declaration here
>> asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>> ^~~~~~~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE6(name, ...) SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/syscalls.c:72:1: note: in expansion of macro 'SYSCALL_DEFINE6'
>> SYSCALL_DEFINE6(mmap, unsigned long, addr, size_t, len,
>> ^~~~~~~~~~~~~~~
>> CC arch/powerpc/kernel/signal_32.o
>> In file included from arch/powerpc/kernel/signal_32.c:31:
>> ./include/linux/compat.h:74:18: error: 'compat_sys_swapcontext' alias between functions of incompatible types 'long int(struct ucontext32 *, struct ucontext32 *, int)' and 'long int(long int, long int, long int)' [-Werror=attribute-alias]
>> asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>> ^~~~~~~~~~
>> ./include/linux/compat.h:58:2: note: in expansion of macro 'COMPAT_SYSCALL_DEFINEx'
>> COMPAT_SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~~~~~~
>> arch/powerpc/kernel/signal_32.c:1041:1: note: in expansion of macro 'COMPAT_SYSCALL_DEFINE3'
>> COMPAT_SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>> ^~~~~~~~~~~~~~~~~~~~~~
>> ./include/linux/compat.h:79:18: note: aliased declaration here
>> asmlinkage long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>> ^~~~~~~~~~~~~~~
>> ./include/linux/compat.h:58:2: note: in expansion of macro 'COMPAT_SYSCALL_DEFINEx'
>> COMPAT_SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~~~~~~
>> arch/powerpc/kernel/signal_32.c:1041:1: note: in expansion of macro 'COMPAT_SYSCALL_DEFINE3'
>> COMPAT_SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>> ^~~~~~~~~~~~~~~~~~~~~~
>> CC arch/powerpc/kernel/signal_64.o
>> In file included from arch/powerpc/kernel/signal_64.c:27:
>> ./include/linux/syscalls.h:233:18: error: 'sys_swapcontext' alias between functions of incompatible types 'long int(struct ucontext *, struct ucontext *, long int)' and 'long int(long int, long int, long int)' [-Werror=attribute-alias]
>> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>> ^~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:213:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/signal_64.c:628:1: note: in expansion of macro 'SYSCALL_DEFINE3'
>> SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>> ^~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:238:18: note: aliased declaration here
>> asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>> ^~~~~~~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:213:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/signal_64.c:628:1: note: in expansion of macro 'SYSCALL_DEFINE3'
>> SYSCALL_DEFINE3(swapcontext, struct ucontext __user *, old_ctx,
>> ^~~~~~~~~~~~~~~
>> CC arch/powerpc/kernel/rtas.o
>> In file included from arch/powerpc/kernel/rtas.c:29:
>> ./include/linux/syscalls.h:233:18: error: 'sys_rtas' alias between functions of incompatible types 'long int(struct rtas_args *)' and 'long int(long int)' [-Werror=attribute-alias]
>> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>> ^~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:211:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/rtas.c:1054:1: note: in expansion of macro 'SYSCALL_DEFINE1'
>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>> ^~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:238:18: note: aliased declaration here
>> asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>> ^~~~~~~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:211:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE1(name, ...) SYSCALL_DEFINEx(1, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/rtas.c:1054:1: note: in expansion of macro 'SYSCALL_DEFINE1'
>> SYSCALL_DEFINE1(rtas, struct rtas_args __user *, uargs)
>> ^~~~~~~~~~~~~~~
>> CC arch/powerpc/kernel/pci_64.o
>> In file included from arch/powerpc/kernel/pci_64.c:23:
>> ./include/linux/syscalls.h:233:18: error: 'sys_pciconfig_iobase' alias between functions of incompatible types 'long int(long int, long unsigned int, long unsigned int)' and 'long int(long int, long int, long int)' [-Werror=attribute-alias]
>> asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
>> ^~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:213:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/pci_64.c:206:1: note: in expansion of macro 'SYSCALL_DEFINE3'
>> SYSCALL_DEFINE3(pciconfig_iobase, long, which, unsigned long, in_bus,
>> ^~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:238:18: note: aliased declaration here
>> asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \
>> ^~~~~~~~
>> ./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
>> __SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~~~
>> ./include/linux/syscalls.h:213:36: note: in expansion of macro 'SYSCALL_DEFINEx'
>> #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__)
>> ^~~~~~~~~~~~~~~
>> arch/powerpc/kernel/pci_64.c:206:1: note: in expansion of macro 'SYSCALL_DEFINE3'
>> SYSCALL_DEFINE3(pciconfig_iobase, long, which, unsigned long, in_bus,
>> ^~~~~~~~~~~~~~~
>>
>> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
>
> Applied to powerpc next, thanks.
>
> https://git.kernel.org/powerpc/c/2479bfc9bc600dcce7f932d52dcfa8
Oops, you didn't take v4 but v3 which was missing the fix into pci_32.c
I'll send you a patch to add the fix in pci_32.c
Christophe
>
> cheers
>
^ permalink raw reply
* [PATCH] powerpc: fix build failure by disabling attribute-alias warning in pci_32
From: Christophe Leroy @ 2018-06-05 6:57 UTC (permalink / raw)
To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman
Cc: linux-kernel, linuxppc-dev
Commit 2479bfc9bc600 ("powerpc: Fix build by disabling attribute-alias
warning for SYSCALL_DEFINEx") forgot arch/powerpc/kernel/pci_32.c
Latest GCC version emit the following warnings
As arch/powerpc code is built with -Werror, this breaks build with
GCC 8.1
This patch inhibits this warning
In file included from arch/powerpc/kernel/pci_32.c:14:
./include/linux/syscalls.h:233:18: error: 'sys_pciconfig_iobase' alias between functions of incompatible types 'long int(long int, long unsigned int, long unsigned int)' and 'long int(long int, long int, long int)' [-Werror=attribute-alias]
asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \
^~~
./include/linux/syscalls.h:222:2: note: in expansion of macro '__SYSCALL_DEFINEx'
__SYSCALL_DEFINEx(x, sname, __VA_ARGS__)
^~~~~~~~~~~~~~~~~
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
---
arch/powerpc/kernel/pci_32.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index d63b488d34d7..4f861055a852 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -285,6 +285,9 @@ pci_bus_to_hose(int bus)
* Note that the returned IO or memory base is a physical address
*/
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpragmas"
+#pragma GCC diagnostic ignored "-Wattribute-alias"
SYSCALL_DEFINE3(pciconfig_iobase, long, which,
unsigned long, bus, unsigned long, devfn)
{
@@ -310,3 +313,4 @@ SYSCALL_DEFINE3(pciconfig_iobase, long, which,
return result;
}
+#pragma GCC diagnostic pop
--
2.13.3
^ permalink raw reply related
* Re: [RFC PATCH for 4.18 09/16] powerpc: Add syscall detection for restartable sequences
From: Michael Ellerman @ 2018-06-05 5:21 UTC (permalink / raw)
To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
Andy Lutomirski, Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers,
Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev
In-Reply-To: <20180602124408.8430-10-mathieu.desnoyers@efficios.com>
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
> From: Boqun Feng <boqun.feng@gmail.com>
>
> Syscalls are not allowed inside restartable sequences, so add a call to
> rseq_syscall() at the very beginning of system call exiting path for
> CONFIG_DEBUG_RSEQ=y kernel. This could help us to detect whether there
> is a syscall issued inside restartable sequences.
>
> [ Tested on 64-bit powerpc kernel by Mathieu Desnoyers. Still needs to
> be tested on 32-bit powerpc kernel. ]
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: linuxppc-dev@lists.ozlabs.org
> ---
> arch/powerpc/kernel/entry_32.S | 7 +++++++
> arch/powerpc/kernel/entry_64.S | 8 ++++++++
> 2 files changed, 15 insertions(+)
I don't _love_ the #ifdefs in here, but they look correct and there's
not really a better option until we rewrite the syscall handler in C.
The rseq selftests passed for me with this applied and enabled. So if
you like here's some tags:
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
cheers
^ permalink raw reply
* Re: [RFC PATCH for 4.18 10/16] powerpc: Wire up restartable sequences system call
From: Michael Ellerman @ 2018-06-05 5:18 UTC (permalink / raw)
To: Mathieu Desnoyers, Peter Zijlstra, Paul E . McKenney, Boqun Feng,
Andy Lutomirski, Dave Watson
Cc: linux-kernel, linux-api, Paul Turner, Andrew Morton, Russell King,
Thomas Gleixner, Ingo Molnar, H . Peter Anvin, Andrew Hunter,
Andi Kleen, Chris Lameter, Ben Maurer, Steven Rostedt,
Josh Triplett, Linus Torvalds, Catalin Marinas, Will Deacon,
Michael Kerrisk, Joel Fernandes, Mathieu Desnoyers,
Benjamin Herrenschmidt, Paul Mackerras, linuxppc-dev
In-Reply-To: <20180602124408.8430-11-mathieu.desnoyers@efficios.com>
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> writes:
> From: Boqun Feng <boqun.feng@gmail.com>
>
> Wire up the rseq system call on powerpc.
>
> This provides an ABI improving the speed of a user-space getcpu
> operation on powerpc by skipping the getcpu system call on the fast
> path, as well as improving the speed of user-space operations on per-cpu
> data compared to using load-reservation/store-conditional atomics.
>
> Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> CC: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> CC: Paul Mackerras <paulus@samba.org>
> CC: Michael Ellerman <mpe@ellerman.id.au>
> CC: Peter Zijlstra <peterz@infradead.org>
> CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
> CC: linuxppc-dev@lists.ozlabs.org
> ---
> arch/powerpc/include/asm/systbl.h | 1 +
> arch/powerpc/include/asm/unistd.h | 2 +-
> arch/powerpc/include/uapi/asm/unistd.h | 1 +
> 3 files changed, 3 insertions(+), 1 deletion(-)
Looks fine to me.
I don't have any other new syscalls in my next, so this should not
conflict with anything for 4.18.
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
cheers
^ permalink raw reply
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Christoph Hellwig @ 2018-06-05 4:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Michael S. Tsirkin, Anshuman Khandual, virtualization,
linux-kernel, linuxppc-dev, aik, robh, joe, elfring, david,
jasowang, mpe, hch, luto
In-Reply-To: <6164442a718645a754879eac5c4c5fad9283c211.camel@kernel.crashing.org>
On Tue, Jun 05, 2018 at 09:26:56AM +1000, Benjamin Herrenschmidt wrote:
> Sorry Michael, that doesn't click. Yes of course virtio is implemented
> in qemu, but the problem we are trying to solve is *not* a qemu problem
> (the fact that the Linux drivers bypass the DMA API is wrong, needs
> fixing, and isnt a qemu problem). The fact that the secure guests need
> bounce buffering is not a qemu problem either.
>
> Whether qemu chose to use an iommu or not is, and should remain an
> orthogonal problem.
Agreed. We have a problem with qemu (old qemu only?) punching a hole
into the VM abstraction by deciding that even if firmware tables
claim use of an IOMMU for a PCI bus it expects virtio to use physiscal
addresses. So far so bad. The answer to that should have been to
quirk the affected qemu versions and move on. Instead we now have
virtio not use the DMA API by default and are creating a worse problem.
Let's fix this issue ASAP and quirk the buggy implementations instead
of letting everyone else suffer.
> The DMA API itself isn't the one that needs to learn "per-device
> quirks", it's just plumbing into arch backends. The "quirk" is at the
> point of establishing the backend for a given device.
>
> We can go a good way down that path simply by having virtio in Linux
> start with putting *itself* its own direct ops in there when
> VIRTIO_F_IOMMU_PLATFORM is not set, and removing all the special casing
> in the rest of the driver.
Yes. And we have all the infrastructure for that now. A few RDMA
drivers quirk to virt_dma_ops, and virtio could quirk to dma_direct_ops
anytime now. In fact given how much time we are spending arguing here
I'm going to give it a spin today.
> Once that's done, we have a single point of establishing the dma ops,
> we can quirk in there if needed, that's rather nicely contained, or put
> an arch hook, or whatever is necessary.
Yes.
^ permalink raw reply
* Re: [PATCH v7 0/5] powerpc/64: memcmp() optimization
From: Simon Guo @ 2018-06-04 10:27 UTC (permalink / raw)
To: Michael Ellerman; +Cc: linuxppc-dev, Paul Mackerras, Naveen N. Rao, Cyril Bur
In-Reply-To: <877eneasg9.fsf@concordia.ellerman.id.au>
Hi Michael,
On Tue, Jun 05, 2018 at 12:16:22PM +1000, Michael Ellerman wrote:
> Hi Simon,
>
> wei.guo.simon@gmail.com writes:
> > From: Simon Guo <wei.guo.simon@gmail.com>
> >
> > There is some room to optimize memcmp() in powerpc 64 bits version for
> > following 2 cases:
> > (1) Even src/dst addresses are not aligned with 8 bytes at the beginning,
> > memcmp() can align them and go with .Llong comparision mode without
> > fallback to .Lshort comparision mode do compare buffer byte by byte.
> > (2) VMX instructions can be used to speed up for large size comparision,
> > currently the threshold is set for 4K bytes. Notes the VMX instructions
> > will lead to VMX regs save/load penalty. This patch set includes a
> > patch to add a 32 bytes pre-checking to minimize the penalty.
> >
> > It did the similar with glibc commit dec4a7105e (powerpc: Improve memcmp
> > performance for POWER8). Thanks Cyril Bur's information.
> > This patch set also updates memcmp selftest case to make it compiled and
> > incorporate large size comparison case.
>
> I'm seeing a few crashes with this applied, I haven't had time to look
> into what is happening yet, sorry.
Sorry I didn't catch this in my testing. I will check the root cause
and update later.
Thanks,
- Simon
>
> [ 2471.300595] kselftest: Running tests in user
> [ 2471.302785] calling test_user_copy_init+0x0/0xd14 [test_user_copy] @ 44883
> [ 2471.302892] Unable to handle kernel paging request for data at address 0xc008000018553005
> [ 2471.303014] Faulting instruction address: 0xc00000000001f29c
> [ 2471.303119] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 2471.303193] LE SMP NR_CPUS=2048 NUMA PowerNV
> [ 2471.303256] Modules linked in: test_user_copy(+) vxlan ip6_udp_tunnel udp_tunnel 8021q bridge stp llc dummy test_printf test_firmware vmx_crypto crct10dif_vpmsum crct10dif_common crc32c_vpmsum veth [last unloaded: test_static_key_base]
> [ 2471.303532] CPU: 4 PID: 44883 Comm: modprobe Tainted: G W 4.17.0-rc3-gcc7x-g7204012 #1
> [ 2471.303644] NIP: c00000000001f29c LR: c00000000001f6e4 CTR: 0000000000000000
> [ 2471.303754] REGS: c000001fddc2b560 TRAP: 0300 Tainted: G W (4.17.0-rc3-gcc7x-g7204012)
> [ 2471.303873] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 24222844 XER: 00000000
> [ 2471.303996] CFAR: c00000000001f6e0 DAR: c008000018553005 DSISR: 40000000 IRQMASK: 0
> [ 2471.303996] GPR00: c00000000001f6e4 c000001fddc2b7e0 c008000018529900 0000000002000000
> [ 2471.303996] GPR04: c000001fe4b90020 000000000000ffe0 0000000000000000 03fffffe01b48000
> [ 2471.303996] GPR08: 0000000080000000 c008000018553005 c000001fddc28000 c008000018520df0
> [ 2471.303996] GPR12: c00000000009c430 c000001fffffbc00 0000000020000000 0000000000000000
> [ 2471.303996] GPR16: c000001fddc2bc20 0000000000000030 c0000000001f7ba0 0000000000000001
> [ 2471.303996] GPR20: 0000000000000000 c000000000c772b0 c0000000010b4018 0000000000000000
> [ 2471.303996] GPR24: 0000000000000000 c008000018521c98 0000000000000000 c000001fe4b90000
> [ 2471.303996] GPR28: fffffffffffffff4 0000000002000000 9000000002009033 9000000002009033
> [ 2471.304930] NIP [c00000000001f29c] msr_check_and_set+0x3c/0xc0
> [ 2471.305008] LR [c00000000001f6e4] enable_kernel_altivec+0x44/0x100
> [ 2471.305084] Call Trace:
> [ 2471.305122] [c000001fddc2b7e0] [c00000000009baa8] __copy_tofrom_user_base+0x9c/0x574 (unreliable)
> [ 2471.305240] [c000001fddc2b860] [c00000000001f6e4] enable_kernel_altivec+0x44/0x100
> [ 2471.305336] [c000001fddc2b890] [c00000000009ce40] enter_vmx_ops+0x50/0x70
> [ 2471.305418] [c000001fddc2b8b0] [c00000000009c768] memcmp+0x338/0x680
> [ 2471.305501] [c000001fddc2b9b0] [c008000018520190] test_user_copy_init+0x188/0xd14 [test_user_copy]
> [ 2471.305617] [c000001fddc2ba60] [c00000000000de20] do_one_initcall+0x90/0x560
> [ 2471.305710] [c000001fddc2bb30] [c000000000200630] do_init_module+0x90/0x260
> [ 2471.305795] [c000001fddc2bbc0] [c0000000001fec88] load_module+0x1a28/0x1ce0
> [ 2471.305875] [c000001fddc2bd70] [c0000000001ff1e8] sys_finit_module+0xc8/0x110
> [ 2471.305983] [c000001fddc2be30] [c00000000000b528] system_call+0x58/0x6c
> [ 2471.306066] Instruction dump:
> [ 2471.306112] fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7d1b78 60000000 60000000
> [ 2471.306216] 7fe000a6 3d220003 39299705 7ffeeb78 <89290000> 2f890000 419e0044 60000000
> [ 2471.306326] ---[ end trace daf8d409e65b9841 ]---
>
> And:
>
> [ 19.096709] test_bpf: test_skb_segment: success in skb_segment!
> [ 19.096799] initcall test_bpf_init+0x0/0xae0 [test_bpf] returned 0 after 591217 usecs
> [ 19.115869] calling test_user_copy_init+0x0/0xd14 [test_user_copy] @ 3159
> [ 19.116165] Unable to handle kernel paging request for data at address 0xd000000003852805
> [ 19.116352] Faulting instruction address: 0xc00000000001f44c
> [ 19.116483] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 19.116583] LE SMP NR_CPUS=2048 NUMA pSeries
> [ 19.116684] Modules linked in: test_user_copy(+) lzo_compress crc_itu_t zstd_compress zstd_decompress test_bpf test_static_keys test_static_key_base xxhash test_firmware af_key cls_bpf act_bpf bridge nf_nat_irc xt_NFLOG nfnetlink_log xt_policy nf_conntrack_netlink nfnetlink xt_nat nf_conntrack_irc xt_mark xt_tcpudp nf_nat_sip xt_TCPMSS xt_LOG nf_nat_ftp nf_conntrack_ftp xt_conntrack nf_conntrack_sip xt_addrtype xt_state 8021q iptable_filter ipt_MASQUERADE nf_log_ipv4 iptable_mangle nf_nat_masquerade_ipv4 ipt_REJECT nf_reject_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nf_log_arp nf_log_common ah4 ipcomp xfrm4_tunnel esp4 rpcrdma stp p8022 psnap llc xfrm_ipcomp xfrm_user xfrm_algo platform_lcd lcd ocxl virtio_balloon virtio_crypto crypto_engine
> [ 19.118040] vmx_crypto nbd zram zsmalloc virtio_blk st be2iscsi cxgb3i cxgb4i libcxgbi bnx2i ibmvfc sym53c8xx scsi_transport_spi scsi_dh_alua scsi_dh_rdac qla4xxx mpt3sas scsi_transport_sas cxlflash cxl libiscsi_tcp lpfc crc_t10dif crct10dif_generic crct10dif_common qla2xxx iscsi_boot_sysfs raid_class parport_pc parport powernv_op_panel powernv_rng pseries_rng rng_core virtio_console pcspkr input_leds evdev dm_round_robin dm_mirror dm_region_hash dm_log raid10 dm_service_time multipath dm_queue_length dm_multipath dm_thin_pool faulty dm_persistent_data dm_zero dm_crypt dm_bio_prison dm_snapshot dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq rpadlpar_io rpaphp jsm icom hvcs ib_ipoib ib_srp ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_ucm ib_ucm ib_uverbs
> [ 19.119505] rdma_cm iw_cm ib_cm mlx4_ib iw_cxgb3 iw_cxgb4 ib_mthca ib_core leds_powernv led_class vhost_net vhost macvtap macvlan dummy bsd_comp ppp_async crc_ccitt pppoe ppp_synctty pppox ppp_deflate ppp_generic 3c59x s2io bnx2 cnic uio bnx2x libcrc32c i40e ixgbe ixgb cxgb3 libcxgb cxgb cxgb4 pcnet32 netxen_nic qlge be2net acenic mlx4_en mlx4_core myri10ge bonding slhc tap mdio veth vxlan udp_tunnel tun usb_storage usbmon oprofile sha1_powerpc md5_ppc crc32c_vpmsum kvm hvcserver
> [ 19.120358] CPU: 4 PID: 3159 Comm: modprobe Not tainted 4.17.0-rc3-gcc7x-g7204012 #1
> [ 19.120508] NIP: c00000000001f44c LR: c00000000001f894 CTR: 0000000000000000
> [ 19.120666] REGS: c0000000f8d9f570 TRAP: 0300 Not tainted (4.17.0-rc3-gcc7x-g7204012)
> [ 19.120817] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24222844 XER: 00000000
> [ 19.120984] CFAR: c00000000000c03c DAR: d000000003852805 DSISR: 40000000 IRQMASK: 0
> GPR00: c00000000001f894 c0000000f8d9f7f0 d000000003829900 0000000002000000
> GPR04: c0000000f9a30048 000000000000ffe0 0000000000000000 03fffffff065dffd
> GPR08: 0000000080000000 d000000003852805 c0000000f8d9c000 d000000003820df0
> GPR12: c00000000009ebb0 c00000003fffb300 c0000000f8d9fd90 d000000003840000
> GPR16: d000000003840000 0000000000000000 c0000000011d6900 d000000003821ad0
> GPR20: c000000000bd7860 0000000000000000 c000000000ff9060 00000000014000c0
> GPR24: 0000000000000000 0000000000000000 0000000000000100 c0000000f9a30028
> GPR28: fffffffffffffff4 0000000002000000 8000000002009033 8000000000009033
> [ 19.122454] NIP [c00000000001f44c] msr_check_and_set+0x3c/0xc0
> [ 19.122580] LR [c00000000001f894] enable_kernel_altivec+0x44/0x100
> [ 19.122707] Call Trace:
> [ 19.122789] [c0000000f8d9f7f0] [c00000000009e228] __copy_tofrom_user_base+0x9c/0x574 (unreliable)
> [ 19.122962] [c0000000f8d9f870] [c00000000001f894] enable_kernel_altivec+0x44/0x100
> [ 19.123344] [c0000000f8d9f8a0] [c00000000009f740] enter_vmx_ops+0x50/0x70
> [ 19.123583] [c0000000f8d9f8c0] [c00000000009eee8] memcmp+0x338/0x680
> [ 19.123728] [c0000000f8d9f9c0] [d000000003820190] test_user_copy_init+0x188/0xd14 [test_user_copy]
> [ 19.123909] [c0000000f8d9fa70] [c00000000000e37c] do_one_initcall+0x5c/0x2d0
> [ 19.124094] [c0000000f8d9fb30] [c00000000020066c] do_init_module+0x90/0x264
> [ 19.124234] [c0000000f8d9fbc0] [c0000000001ff084] load_module+0x2f64/0x3600
> [ 19.124371] [c0000000f8d9fd70] [c0000000001ff9c8] sys_finit_module+0xc8/0x110
> [ 19.124530] [c0000000f8d9fe30] [c00000000000b868] system_call+0x58/0x6c
> [ 19.124648] Instruction dump:
> [ 19.124721] fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7d1b78 60000000 60000000
> [ 19.124869] 7fe000a6 3d220003 39298f05 7ffeeb78 <89290000> 2f890000 419e0044 60000000
> [ 19.125034] ---[ end trace 7c08acedd4b4e6aa ]---
>
>
> cheers
^ permalink raw reply
* Re: [PATCH] cpuidle:powernv: Make the snooze timeout dynamic.
From: Stewart Smith @ 2018-06-05 3:47 UTC (permalink / raw)
To: Michael Ellerman, Gautham R. Shenoy, Rafael J. Wysocki,
Daniel Lezcano, Michael Neuling, Vaidyanathan Srinivasan,
Shilpasri G Bhat, Akshay Adiga, Nicholas Piggin
Cc: linuxppc-dev, linux-kernel, linux-pm, Gautham R. Shenoy
In-Reply-To: <87fu22bxlf.fsf@concordia.ellerman.id.au>
Michael Ellerman <mpe@ellerman.id.au> writes:
> "Gautham R. Shenoy" <ego@linux.vnet.ibm.com> writes:
>
>> From: "Gautham R. Shenoy" <ego@linux.vnet.ibm.com>
>>
>> The commit 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of
>> snooze to deeper idle state") introduced a timeout for the snooze idle
>> state so that it could be eventually be promoted to a deeper idle
>> state. The snooze timeout value is static and set to the target
>> residency of the next idle state, which would train the cpuidle
>> governor to pick the next idle state eventually.
>>
>> The unfortunate side-effect of this is that if the next idle state(s)
>> is disabled, the CPU will forever remain in snooze, despite the fact
>> that the system is completely idle, and other deeper idle states are
>> available.
>
> That sounds like a bug, I'll add?
>
> Fixes: 78eaa10f027c ("cpuidle: powernv/pseries: Auto-promotion of snooze to deeper idle state")
> Cc: stable@vger.kernel.org # v4.2+
Yes, it's a bug - we had a customer bug because we lacked this that
meant we had to do firmware changes rather than just tweaking what stop
states were used.
--
Stewart Smith
OPAL Architect, IBM.
^ permalink raw reply
* RE: [PATCH 09/10] dpaa_eth: add support for hardware timestamping
From: Y.b. Lu @ 2018-06-05 3:35 UTC (permalink / raw)
To: Richard Cochran
Cc: netdev@vger.kernel.org, Madalin-cristian Bucur, Rob Herring,
Shawn Guo, David S . Miller, devicetree@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org,
linux-arm-kernel@lists.infradead.org,
linux-kernel@vger.kernel.org
In-Reply-To: <20180604134920.ezhe6jz5ntpnqyzj@localhost>
Hi Richard,
> -----Original Message-----
> From: Richard Cochran [mailto:richardcochran@gmail.com]
> Sent: Monday, June 4, 2018 9:49 PM
> To: Y.b. Lu <yangbo.lu@nxp.com>
> Cc: netdev@vger.kernel.org; Madalin-cristian Bucur
> <madalin.bucur@nxp.com>; Rob Herring <robh+dt@kernel.org>; Shawn Guo
> <shawnguo@kernel.org>; David S . Miller <davem@davemloft.net>;
> devicetree@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH 09/10] dpaa_eth: add support for hardware timestampin=
g
>=20
> On Mon, Jun 04, 2018 at 03:08:36PM +0800, Yangbo Lu wrote:
>=20
> > +if FSL_DPAA_ETH
> > +config FSL_DPAA_ETH_TS
> > + bool "DPAA hardware timestamping support"
> > + select PTP_1588_CLOCK_QORIQ
> > + default n
> > + help
> > + Enable DPAA hardware timestamping support.
> > + This option is useful for applications to get
> > + hardware time stamps on the Ethernet packets
> > + using the SO_TIMESTAMPING API.
> > +endif
>=20
> You should drop this #ifdef. In general, if a MAC supports time stamping=
and
> PHC, then the driver support should simply be compiled in.
>=20
> [ When time stamping incurs a large run time performance penalty to
> non-PTP users, then it might make sense to have a Kconfig option to
> disable it, but that doesn't appear to be the case here. ]
[Y.b. Lu] Actually these timestamping codes affected DPAA networking perfor=
mance in our previous performance test.
That's why we used ifdef for it.
>=20
> > @@ -1615,6 +1635,24 @@ static int dpaa_eth_refill_bpools(struct
> dpaa_priv *priv)
> > skbh =3D (struct sk_buff **)phys_to_virt(addr);
> > skb =3D *skbh;
> >
> > +#ifdef CONFIG_FSL_DPAA_ETH_TS
> > + if (priv->tx_tstamp &&
> > + skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) {
>=20
> This condition fits on one line easily.
[Y.b. Lu] Right. I will use one line in next version.
>=20
> > + struct skb_shared_hwtstamps shhwtstamps;
> > + u64 ns;
>=20
> Local variables belong at the top of the function.
[Y.b. Lu] Ok, will move them to the top in next verison.
>=20
> > + memset(&shhwtstamps, 0, sizeof(shhwtstamps));
> > +
> > + if (!dpaa_get_tstamp_ns(priv->net_dev, &ns,
> > + priv->mac_dev->port[TX],
> > + (void *)skbh)) {
> > + shhwtstamps.hwtstamp =3D ns_to_ktime(ns);
> > + skb_tstamp_tx(skb, &shhwtstamps);
> > + } else {
> > + dev_warn(dev, "dpaa_get_tstamp_ns failed!\n");
> > + }
> > + }
> > +#endif
> > if (unlikely(qm_fd_get_format(fd) =3D=3D qm_fd_sg)) {
> > nr_frags =3D skb_shinfo(skb)->nr_frags;
> > dma_unmap_single(dev, addr, qm_fd_get_offset(fd) + @@ -2086,6
> > +2124,14 @@ static int dpaa_start_xmit(struct sk_buff *skb, struct
> net_device *net_dev)
> > if (unlikely(err < 0))
> > goto skb_to_fd_failed;
> >
> > +#ifdef CONFIG_FSL_DPAA_ETH_TS
> > + if (priv->tx_tstamp &&
> > + skb_shinfo(skb)->tx_flags & SKBTX_HW_TSTAMP) {
>=20
> One line please.
[Y.b. Lu] No problem.
>=20
> > + fd.cmd |=3D FM_FD_CMD_UPD;
> > + skb_shinfo(skb)->tx_flags |=3D SKBTX_IN_PROGRESS;
> > + }
> > +#endif
> > +
> > if (likely(dpaa_xmit(priv, percpu_stats, queue_mapping, &fd) =3D=3D 0=
))
> > return NETDEV_TX_OK;
> >
>=20
> Thanks,
> Richard
^ permalink raw reply
* Re: [PATCH v7 0/5] powerpc/64: memcmp() optimization
From: Michael Ellerman @ 2018-06-05 2:16 UTC (permalink / raw)
To: wei.guo.simon, linuxppc-dev
Cc: Paul Mackerras, Naveen N. Rao, Cyril Bur, Simon Guo
In-Reply-To: <1527672063-6953-1-git-send-email-wei.guo.simon@gmail.com>
Hi Simon,
wei.guo.simon@gmail.com writes:
> From: Simon Guo <wei.guo.simon@gmail.com>
>
> There is some room to optimize memcmp() in powerpc 64 bits version for
> following 2 cases:
> (1) Even src/dst addresses are not aligned with 8 bytes at the beginning,
> memcmp() can align them and go with .Llong comparision mode without
> fallback to .Lshort comparision mode do compare buffer byte by byte.
> (2) VMX instructions can be used to speed up for large size comparision,
> currently the threshold is set for 4K bytes. Notes the VMX instructions
> will lead to VMX regs save/load penalty. This patch set includes a
> patch to add a 32 bytes pre-checking to minimize the penalty.
>
> It did the similar with glibc commit dec4a7105e (powerpc: Improve memcmp
> performance for POWER8). Thanks Cyril Bur's information.
> This patch set also updates memcmp selftest case to make it compiled and
> incorporate large size comparison case.
I'm seeing a few crashes with this applied, I haven't had time to look
into what is happening yet, sorry.
[ 2471.300595] kselftest: Running tests in user
[ 2471.302785] calling test_user_copy_init+0x0/0xd14 [test_user_copy] @ 44883
[ 2471.302892] Unable to handle kernel paging request for data at address 0xc008000018553005
[ 2471.303014] Faulting instruction address: 0xc00000000001f29c
[ 2471.303119] Oops: Kernel access of bad area, sig: 11 [#1]
[ 2471.303193] LE SMP NR_CPUS=2048 NUMA PowerNV
[ 2471.303256] Modules linked in: test_user_copy(+) vxlan ip6_udp_tunnel udp_tunnel 8021q bridge stp llc dummy test_printf test_firmware vmx_crypto crct10dif_vpmsum crct10dif_common crc32c_vpmsum veth [last unloaded: test_static_key_base]
[ 2471.303532] CPU: 4 PID: 44883 Comm: modprobe Tainted: G W 4.17.0-rc3-gcc7x-g7204012 #1
[ 2471.303644] NIP: c00000000001f29c LR: c00000000001f6e4 CTR: 0000000000000000
[ 2471.303754] REGS: c000001fddc2b560 TRAP: 0300 Tainted: G W (4.17.0-rc3-gcc7x-g7204012)
[ 2471.303873] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 24222844 XER: 00000000
[ 2471.303996] CFAR: c00000000001f6e0 DAR: c008000018553005 DSISR: 40000000 IRQMASK: 0
[ 2471.303996] GPR00: c00000000001f6e4 c000001fddc2b7e0 c008000018529900 0000000002000000
[ 2471.303996] GPR04: c000001fe4b90020 000000000000ffe0 0000000000000000 03fffffe01b48000
[ 2471.303996] GPR08: 0000000080000000 c008000018553005 c000001fddc28000 c008000018520df0
[ 2471.303996] GPR12: c00000000009c430 c000001fffffbc00 0000000020000000 0000000000000000
[ 2471.303996] GPR16: c000001fddc2bc20 0000000000000030 c0000000001f7ba0 0000000000000001
[ 2471.303996] GPR20: 0000000000000000 c000000000c772b0 c0000000010b4018 0000000000000000
[ 2471.303996] GPR24: 0000000000000000 c008000018521c98 0000000000000000 c000001fe4b90000
[ 2471.303996] GPR28: fffffffffffffff4 0000000002000000 9000000002009033 9000000002009033
[ 2471.304930] NIP [c00000000001f29c] msr_check_and_set+0x3c/0xc0
[ 2471.305008] LR [c00000000001f6e4] enable_kernel_altivec+0x44/0x100
[ 2471.305084] Call Trace:
[ 2471.305122] [c000001fddc2b7e0] [c00000000009baa8] __copy_tofrom_user_base+0x9c/0x574 (unreliable)
[ 2471.305240] [c000001fddc2b860] [c00000000001f6e4] enable_kernel_altivec+0x44/0x100
[ 2471.305336] [c000001fddc2b890] [c00000000009ce40] enter_vmx_ops+0x50/0x70
[ 2471.305418] [c000001fddc2b8b0] [c00000000009c768] memcmp+0x338/0x680
[ 2471.305501] [c000001fddc2b9b0] [c008000018520190] test_user_copy_init+0x188/0xd14 [test_user_copy]
[ 2471.305617] [c000001fddc2ba60] [c00000000000de20] do_one_initcall+0x90/0x560
[ 2471.305710] [c000001fddc2bb30] [c000000000200630] do_init_module+0x90/0x260
[ 2471.305795] [c000001fddc2bbc0] [c0000000001fec88] load_module+0x1a28/0x1ce0
[ 2471.305875] [c000001fddc2bd70] [c0000000001ff1e8] sys_finit_module+0xc8/0x110
[ 2471.305983] [c000001fddc2be30] [c00000000000b528] system_call+0x58/0x6c
[ 2471.306066] Instruction dump:
[ 2471.306112] fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7d1b78 60000000 60000000
[ 2471.306216] 7fe000a6 3d220003 39299705 7ffeeb78 <89290000> 2f890000 419e0044 60000000
[ 2471.306326] ---[ end trace daf8d409e65b9841 ]---
And:
[ 19.096709] test_bpf: test_skb_segment: success in skb_segment!
[ 19.096799] initcall test_bpf_init+0x0/0xae0 [test_bpf] returned 0 after 591217 usecs
[ 19.115869] calling test_user_copy_init+0x0/0xd14 [test_user_copy] @ 3159
[ 19.116165] Unable to handle kernel paging request for data at address 0xd000000003852805
[ 19.116352] Faulting instruction address: 0xc00000000001f44c
[ 19.116483] Oops: Kernel access of bad area, sig: 11 [#1]
[ 19.116583] LE SMP NR_CPUS=2048 NUMA pSeries
[ 19.116684] Modules linked in: test_user_copy(+) lzo_compress crc_itu_t zstd_compress zstd_decompress test_bpf test_static_keys test_static_key_base xxhash test_firmware af_key cls_bpf act_bpf bridge nf_nat_irc xt_NFLOG nfnetlink_log xt_policy nf_conntrack_netlink nfnetlink xt_nat nf_conntrack_irc xt_mark xt_tcpudp nf_nat_sip xt_TCPMSS xt_LOG nf_nat_ftp nf_conntrack_ftp xt_conntrack nf_conntrack_sip xt_addrtype xt_state 8021q iptable_filter ipt_MASQUERADE nf_log_ipv4 iptable_mangle nf_nat_masquerade_ipv4 ipt_REJECT nf_reject_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables nf_log_arp nf_log_common ah4 ipcomp xfrm4_tunnel esp4 rpcrdma stp p8022 psnap llc xfrm_ipcomp xfrm_user xfrm_algo platform_lcd lcd ocxl virtio_balloon virtio_crypto crypto_engine
[ 19.118040] vmx_crypto nbd zram zsmalloc virtio_blk st be2iscsi cxgb3i cxgb4i libcxgbi bnx2i ibmvfc sym53c8xx scsi_transport_spi scsi_dh_alua scsi_dh_rdac qla4xxx mpt3sas scsi_transport_sas cxlflash cxl libiscsi_tcp lpfc crc_t10dif crct10dif_generic crct10dif_common qla2xxx iscsi_boot_sysfs raid_class parport_pc parport powernv_op_panel powernv_rng pseries_rng rng_core virtio_console pcspkr input_leds evdev dm_round_robin dm_mirror dm_region_hash dm_log raid10 dm_service_time multipath dm_queue_length dm_multipath dm_thin_pool faulty dm_persistent_data dm_zero dm_crypt dm_bio_prison dm_snapshot dm_bufio raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq rpadlpar_io rpaphp jsm icom hvcs ib_ipoib ib_srp ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_ucm ib_ucm ib_uverbs
[ 19.119505] rdma_cm iw_cm ib_cm mlx4_ib iw_cxgb3 iw_cxgb4 ib_mthca ib_core leds_powernv led_class vhost_net vhost macvtap macvlan dummy bsd_comp ppp_async crc_ccitt pppoe ppp_synctty pppox ppp_deflate ppp_generic 3c59x s2io bnx2 cnic uio bnx2x libcrc32c i40e ixgbe ixgb cxgb3 libcxgb cxgb cxgb4 pcnet32 netxen_nic qlge be2net acenic mlx4_en mlx4_core myri10ge bonding slhc tap mdio veth vxlan udp_tunnel tun usb_storage usbmon oprofile sha1_powerpc md5_ppc crc32c_vpmsum kvm hvcserver
[ 19.120358] CPU: 4 PID: 3159 Comm: modprobe Not tainted 4.17.0-rc3-gcc7x-g7204012 #1
[ 19.120508] NIP: c00000000001f44c LR: c00000000001f894 CTR: 0000000000000000
[ 19.120666] REGS: c0000000f8d9f570 TRAP: 0300 Not tainted (4.17.0-rc3-gcc7x-g7204012)
[ 19.120817] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24222844 XER: 00000000
[ 19.120984] CFAR: c00000000000c03c DAR: d000000003852805 DSISR: 40000000 IRQMASK: 0
GPR00: c00000000001f894 c0000000f8d9f7f0 d000000003829900 0000000002000000
GPR04: c0000000f9a30048 000000000000ffe0 0000000000000000 03fffffff065dffd
GPR08: 0000000080000000 d000000003852805 c0000000f8d9c000 d000000003820df0
GPR12: c00000000009ebb0 c00000003fffb300 c0000000f8d9fd90 d000000003840000
GPR16: d000000003840000 0000000000000000 c0000000011d6900 d000000003821ad0
GPR20: c000000000bd7860 0000000000000000 c000000000ff9060 00000000014000c0
GPR24: 0000000000000000 0000000000000000 0000000000000100 c0000000f9a30028
GPR28: fffffffffffffff4 0000000002000000 8000000002009033 8000000000009033
[ 19.122454] NIP [c00000000001f44c] msr_check_and_set+0x3c/0xc0
[ 19.122580] LR [c00000000001f894] enable_kernel_altivec+0x44/0x100
[ 19.122707] Call Trace:
[ 19.122789] [c0000000f8d9f7f0] [c00000000009e228] __copy_tofrom_user_base+0x9c/0x574 (unreliable)
[ 19.122962] [c0000000f8d9f870] [c00000000001f894] enable_kernel_altivec+0x44/0x100
[ 19.123344] [c0000000f8d9f8a0] [c00000000009f740] enter_vmx_ops+0x50/0x70
[ 19.123583] [c0000000f8d9f8c0] [c00000000009eee8] memcmp+0x338/0x680
[ 19.123728] [c0000000f8d9f9c0] [d000000003820190] test_user_copy_init+0x188/0xd14 [test_user_copy]
[ 19.123909] [c0000000f8d9fa70] [c00000000000e37c] do_one_initcall+0x5c/0x2d0
[ 19.124094] [c0000000f8d9fb30] [c00000000020066c] do_init_module+0x90/0x264
[ 19.124234] [c0000000f8d9fbc0] [c0000000001ff084] load_module+0x2f64/0x3600
[ 19.124371] [c0000000f8d9fd70] [c0000000001ff9c8] sys_finit_module+0xc8/0x110
[ 19.124530] [c0000000f8d9fe30] [c00000000000b868] system_call+0x58/0x6c
[ 19.124648] Instruction dump:
[ 19.124721] fba1ffe8 fbc1fff0 fbe1fff8 f8010010 f821ff81 7c7d1b78 60000000 60000000
[ 19.124869] 7fe000a6 3d220003 39298f05 7ffeeb78 <89290000> 2f890000 419e0044 60000000
[ 19.125034] ---[ end trace 7c08acedd4b4e6aa ]---
cheers
^ permalink raw reply
* [PATCH 5/5] powerpc/pkeys: make protection key 0 less special
From: Ram Pai @ 2018-06-05 2:09 UTC (permalink / raw)
To: mpe
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, linuxram, Ulrich.Weigand, fweimer, luto
In-Reply-To: <1528164594-23823-1-git-send-email-linuxram@us.ibm.com>
Applications need the ability to associate an address-range with some
key and latter revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation disallows applications to explicitly associate pkey-0 to
the address range.
Lets make pkey-0 less special and treat it almost like any other key.
Thus it can be explicitly associated with any address range, and can be
freed. This gives the application more flexibility and power. The
ability to free pkey-0 must be used responsibily, since pkey-0 is
associated with almost all address-range by default.
Even with this change pkey-0 continues to be slightly more special
from the following point of view.
(a) it is implicitly allocated.
(b) it is the default key assigned to any address-range.
(c) its permissions cannot be modified by userspace.
NOTE: (c) is specific to powerpc only. pkey-0 is associated by default
with all pages including kernel pages, and pkeys are also active in
kernel mode. If any permission is denied on pkey-0, the kernel running
in the context of the application will be unable to operate.
Tested on powerpc.
cc: Thomas Gleixner <tglx@linutronix.de>
cc: Dave Hansen <dave.hansen@intel.com>
cc: Michael Ellermen <mpe@ellerman.id.au>
cc: Ingo Molnar <mingo@kernel.org>
cc: Andrew Morton <akpm@linux-foundation.org>
cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
cc: Michal Such谩nek <msuchanek@suse.de
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
-----------------------------------------------------------------------
History:
v4: . introduced PKEY_0 macro. No bug fixes. Code
re-arrangement to save a few cycles.
v3: . Corrected a comment in arch_set_user_pkey_access(). .
Clarified the header, to capture the notion that pkey-0
permissions cannot be modified by userspace on powerpc.
-- comment from Thiago
v2: . mm_pkey_is_allocated() continued to treat pkey-0 special.
fixed it.
---
arch/powerpc/include/asm/pkeys.h | 29 +++++++++++++++++++++++------
arch/powerpc/mm/pkeys.c | 13 ++++++-------
2 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
index 0409c80..d349e22 100644
--- a/arch/powerpc/include/asm/pkeys.h
+++ b/arch/powerpc/include/asm/pkeys.h
@@ -13,7 +13,10 @@
DECLARE_STATIC_KEY_TRUE(pkey_disabled);
extern int pkeys_total; /* total pkeys as per device tree */
-extern u32 initial_allocation_mask; /* bits set for reserved keys */
+extern u32 initial_allocation_mask; /* bits set for the initially allocated keys */
+extern u32 reserved_allocation_mask; /* bits set for reserved keys */
+
+#define PKEY_0 0
/*
* Define these here temporarily so we're not dependent on patching linux/mm.h.
@@ -96,15 +99,19 @@ static inline u16 pte_to_pkey_bits(u64 pteflags)
#define __mm_pkey_is_allocated(mm, pkey) \
(mm_pkey_allocation_map(mm) & pkey_alloc_mask(pkey))
-#define __mm_pkey_is_reserved(pkey) (initial_allocation_mask & \
+#define __mm_pkey_is_reserved(pkey) (reserved_allocation_mask & \
pkey_alloc_mask(pkey))
static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
{
- /* A reserved key is never considered as 'explicitly allocated' */
- return ((pkey < arch_max_pkey()) &&
- !__mm_pkey_is_reserved(pkey) &&
- __mm_pkey_is_allocated(mm, pkey));
+ if (pkey < 0 || pkey >= arch_max_pkey())
+ return false;
+
+ /* Reserved keys are never allocated. */
+ if (__mm_pkey_is_reserved(pkey))
+ return false;
+
+ return __mm_pkey_is_allocated(mm, pkey);
}
extern void __arch_activate_pkey(int pkey);
@@ -200,6 +207,16 @@ static inline int arch_set_user_pkey_access(struct task_struct *tsk, int pkey,
{
if (static_branch_likely(&pkey_disabled))
return -EINVAL;
+
+ /*
+ * userspace should not change pkey-0 permissions.
+ * pkey-0 is associated with every page in the kernel.
+ * If userspace denies any permission on pkey-0, the
+ * kernel cannot operate.
+ */
+ if (pkey == PKEY_0)
+ return init_val ? -EINVAL : 0;
+
return __arch_set_user_pkey_access(tsk, pkey, init_val);
}
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 0b98db6..1ebb21b 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -14,7 +14,8 @@
bool pkey_execute_disable_supported;
int pkeys_total; /* Total pkeys as per device tree */
bool pkeys_devtree_defined; /* pkey property exported by device tree */
-u32 initial_allocation_mask; /* Bits set for reserved keys */
+u32 initial_allocation_mask; /* Bits set for the initially allocated keys */
+u32 reserved_allocation_mask; /* Bits set for reserved keys */
u64 pkey_amr_mask; /* Bits in AMR not to be touched */
u64 pkey_iamr_mask; /* Bits in AMR not to be touched */
u64 pkey_uamor_mask; /* Bits in UMOR not to be touched */
@@ -121,8 +122,9 @@ int pkey_initialize(void)
#else
os_reserved = 0;
#endif
- initial_allocation_mask = (0x1 << 0) | (0x1 << 1) |
- (0x1 << EXECUTE_ONLY_KEY);
+ /* Bits are in LE format. */
+ reserved_allocation_mask = (0x1 << 1) | (0x1 << EXECUTE_ONLY_KEY);
+ initial_allocation_mask = reserved_allocation_mask | (0x1 << PKEY_0);
/* register mask is in BE format */
pkey_amr_mask = ~0x0ul;
@@ -135,7 +137,7 @@ int pkey_initialize(void)
pkey_amr_mask |= (AMR_RD_BIT|AMR_WR_BIT) << pkeyshift(EXECUTE_ONLY_KEY);
pkey_uamor_mask = ~0x0ul;
- pkey_uamor_mask &= ~(0x3ul << pkeyshift(0));
+ pkey_uamor_mask &= ~(0x3ul << pkeyshift(PKEY_0));
/*
* key 1 is recommended not to be used.
* PowerISA(3.0) page 1015,
@@ -381,9 +383,6 @@ static bool pkey_access_permitted(int pkey, bool write, bool execute)
int pkey_shift;
u64 amr;
- if (!pkey)
- return true;
-
if (!is_pkey_enabled(pkey))
return true;
--
1.7.1
^ permalink raw reply related
* [PATCH 4/5] powerpc/pkeys: Preallocate execute-only key
From: Ram Pai @ 2018-06-05 2:09 UTC (permalink / raw)
To: mpe
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, linuxram, Ulrich.Weigand, fweimer, luto
In-Reply-To: <1528164594-23823-1-git-send-email-linuxram@us.ibm.com>
execute-only key is allocated dynamically. This is a problem. When a
thread implicitly creates a execute-only key, and resets UAMOR for that
key, the UAMOR value does not percolate to all the other threads. Any
other thread may ignorantly change the permissions on the key. This can
cause the key to be not execute-only for that thread.
Preallocate the execute-only key and ensure that no thread can change
the permission of the key, by resetting the corresponding bit in UAMOR.
CC: Andy Lutomirski <luto@kernel.org>
CC: Florian Weimer <fweimer@redhat.com>
CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
CC: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
arch/powerpc/mm/pkeys.c | 53 +++++++---------------------------------------
1 files changed, 8 insertions(+), 45 deletions(-)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 90ab793..0b98db6 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -25,6 +25,7 @@
#define IAMR_EX_BIT 0x1UL
#define PKEY_REG_BITS (sizeof(u64)*8)
#define pkeyshift(pkey) (PKEY_REG_BITS - ((pkey+1) * AMR_BITS_PER_PKEY))
+#define EXECUTE_ONLY_KEY 2
static void scan_pkey_feature(void)
{
@@ -120,7 +121,8 @@ int pkey_initialize(void)
#else
os_reserved = 0;
#endif
- initial_allocation_mask = (0x1 << 0) | (0x1 << 1);
+ initial_allocation_mask = (0x1 << 0) | (0x1 << 1) |
+ (0x1 << EXECUTE_ONLY_KEY);
/* register mask is in BE format */
pkey_amr_mask = ~0x0ul;
@@ -130,6 +132,7 @@ int pkey_initialize(void)
pkey_amr_mask &= ~(0x3ul << pkeyshift(i));
pkey_iamr_mask &= ~(0x1ul << pkeyshift(i));
}
+ pkey_amr_mask |= (AMR_RD_BIT|AMR_WR_BIT) << pkeyshift(EXECUTE_ONLY_KEY);
pkey_uamor_mask = ~0x0ul;
pkey_uamor_mask &= ~(0x3ul << pkeyshift(0));
@@ -140,6 +143,8 @@ int pkey_initialize(void)
* pseries kernel running on powerVM.
*/
pkey_uamor_mask &= ~(0x3ul << pkeyshift(1));
+ pkey_uamor_mask &= ~(0x3ul << pkeyshift(EXECUTE_ONLY_KEY));
+
for (i = (pkeys_total - os_reserved); i < pkeys_total; i++)
pkey_uamor_mask &= ~(0x3ul << pkeyshift(i));
@@ -153,8 +158,7 @@ void pkey_mm_init(struct mm_struct *mm)
if (static_branch_likely(&pkey_disabled))
return;
mm_pkey_allocation_map(mm) = initial_allocation_mask;
- /* -1 means unallocated or invalid */
- mm->context.execute_only_pkey = -1;
+ mm->context.execute_only_pkey = EXECUTE_ONLY_KEY;
}
static inline u64 read_amr(void)
@@ -333,48 +337,7 @@ static inline bool pkey_allows_readwrite(int pkey)
int __execute_only_pkey(struct mm_struct *mm)
{
- bool need_to_set_mm_pkey = false;
- int execute_only_pkey = mm->context.execute_only_pkey;
- int ret;
-
- /* Do we need to assign a pkey for mm's execute-only maps? */
- if (execute_only_pkey == -1) {
- /* Go allocate one to use, which might fail */
- execute_only_pkey = mm_pkey_alloc(mm);
- if (execute_only_pkey < 0)
- return -1;
- need_to_set_mm_pkey = true;
- }
-
- /*
- * We do not want to go through the relatively costly dance to set AMR
- * if we do not need to. Check it first and assume that if the
- * execute-only pkey is readwrite-disabled than we do not have to set it
- * ourselves.
- */
- if (!need_to_set_mm_pkey && !pkey_allows_readwrite(execute_only_pkey))
- return execute_only_pkey;
-
- /*
- * Set up AMR so that it denies access for everything other than
- * execution.
- */
- ret = __arch_set_user_pkey_access(current, execute_only_pkey,
- PKEY_DISABLE_ACCESS |
- PKEY_DISABLE_WRITE);
- /*
- * If the AMR-set operation failed somehow, just return 0 and
- * effectively disable execute-only support.
- */
- if (ret) {
- mm_pkey_free(mm, execute_only_pkey);
- return -1;
- }
-
- /* We got one, store it and use it from here on out */
- if (need_to_set_mm_pkey)
- mm->context.execute_only_pkey = execute_only_pkey;
- return execute_only_pkey;
+ return mm->context.execute_only_pkey;
}
static inline bool vma_is_pkey_exec_only(struct vm_area_struct *vma)
--
1.7.1
^ permalink raw reply related
* [PATCH 3/5] powerpc/pkeys: fix calculation of total pkeys.
From: Ram Pai @ 2018-06-05 2:09 UTC (permalink / raw)
To: mpe
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, linuxram, Ulrich.Weigand, fweimer, luto
In-Reply-To: <1528164594-23823-1-git-send-email-linuxram@us.ibm.com>
Total number of pkeys calculation is off by 1. Fix it.
CC: Florian Weimer <fweimer@redhat.com>
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
arch/powerpc/mm/pkeys.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 6fc56f4..90ab793 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -92,7 +92,7 @@ int pkey_initialize(void)
* arch-neutral code.
*/
pkeys_total = min_t(int, pkeys_total,
- (ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT));
+ ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT)+1));
if (!pkey_mmu_enabled() || radix_enabled() || !pkeys_total)
static_branch_enable(&pkey_disabled);
--
1.7.1
^ permalink raw reply related
* [PATCH 2/5] powerpc/pkeys: Save the pkey registers before fork
From: Ram Pai @ 2018-06-05 2:09 UTC (permalink / raw)
To: mpe
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, linuxram, Ulrich.Weigand, fweimer, luto
In-Reply-To: <1528164594-23823-1-git-send-email-linuxram@us.ibm.com>
When a thread forks the contents of AMR, IAMR, UAMOR registers in the
newly forked thread are not inherited.
Save the registers before forking, for content of those
registers to be automatically copied into the new thread.
CC: Michael Ellerman <mpe@ellerman.id.au>
CC: Florian Weimer <fweimer@redhat.com>
CC: Andy Lutomirski <luto@kernel.org>
CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
arch/powerpc/kernel/process.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 1237f13..999dd08 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -582,6 +582,7 @@ static void save_all(struct task_struct *tsk)
__giveup_spe(tsk);
msr_check_and_clear(msr_all_available);
+ thread_pkey_regs_save(&tsk->thread);
}
void flush_all_to_thread(struct task_struct *tsk)
--
1.7.1
^ permalink raw reply related
* [PATCH 1/5] powerpc/pkeys: Enable all user-allocatable pkeys at init.
From: Ram Pai @ 2018-06-05 2:09 UTC (permalink / raw)
To: mpe
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, linuxram, Ulrich.Weigand, fweimer, luto
In-Reply-To: <1528164594-23823-1-git-send-email-linuxram@us.ibm.com>
In a multithreaded application, a key allocated by one thread must
be activate and usable on all threads.
Currently this is not the case, because the UAMOR bits for all keys are
disabled by default. When a new key is allocated in one thread, though
the corresponding UAMOR bits for that thread get enabled, the UAMOR bits
for all other existing threads continue to have their bits disabled.
Other threads have no way to set permissions on the key, effectively
making the key useless.
Enable the UAMOR bits for all keys, at process creation. Since the
contents of UAMOR are inherited at fork, all threads are capable of
modifying the permissions on any key.
BTW: changing the permission on unallocated keys has no effect, till
those keys are not associated with any PTEs. The kernel will anyway
disallow to association of unallocated keys with PTEs.
CC: Andy Lutomirski <luto@kernel.org>
CC: Florian Weimer <fweimer@redhat.com>
CC: Thiago Jung Bauermann <bauerman@linux.ibm.com>
CC: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
---
arch/powerpc/mm/pkeys.c | 47 +++++++++++++++++++++++++++++------------------
1 files changed, 29 insertions(+), 18 deletions(-)
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 0eafdf0..6fc56f4 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -15,8 +15,9 @@
int pkeys_total; /* Total pkeys as per device tree */
bool pkeys_devtree_defined; /* pkey property exported by device tree */
u32 initial_allocation_mask; /* Bits set for reserved keys */
-u64 pkey_amr_uamor_mask; /* Bits in AMR/UMOR not to be touched */
+u64 pkey_amr_mask; /* Bits in AMR not to be touched */
u64 pkey_iamr_mask; /* Bits in AMR not to be touched */
+u64 pkey_uamor_mask; /* Bits in UMOR not to be touched */
#define AMR_BITS_PER_PKEY 2
#define AMR_RD_BIT 0x1UL
@@ -119,20 +120,29 @@ int pkey_initialize(void)
#else
os_reserved = 0;
#endif
- initial_allocation_mask = ~0x0;
- pkey_amr_uamor_mask = ~0x0ul;
+ initial_allocation_mask = (0x1 << 0) | (0x1 << 1);
+
+ /* register mask is in BE format */
+ pkey_amr_mask = ~0x0ul;
pkey_iamr_mask = ~0x0ul;
- /*
- * key 0, 1 are reserved.
- * key 0 is the default key, which allows read/write/execute.
- * key 1 is recommended not to be used. PowerISA(3.0) page 1015,
- * programming note.
- */
- for (i = 2; i < (pkeys_total - os_reserved); i++) {
- initial_allocation_mask &= ~(0x1 << i);
- pkey_amr_uamor_mask &= ~(0x3ul << pkeyshift(i));
+
+ for (i = 0; i < (pkeys_total - os_reserved); i++) {
+ pkey_amr_mask &= ~(0x3ul << pkeyshift(i));
pkey_iamr_mask &= ~(0x1ul << pkeyshift(i));
}
+
+ pkey_uamor_mask = ~0x0ul;
+ pkey_uamor_mask &= ~(0x3ul << pkeyshift(0));
+ /*
+ * key 1 is recommended not to be used.
+ * PowerISA(3.0) page 1015,
+ * @TODO: Revisit this. This is only applicable on
+ * pseries kernel running on powerVM.
+ */
+ pkey_uamor_mask &= ~(0x3ul << pkeyshift(1));
+ for (i = (pkeys_total - os_reserved); i < pkeys_total; i++)
+ pkey_uamor_mask &= ~(0x3ul << pkeyshift(i));
+
return 0;
}
@@ -289,9 +299,6 @@ void thread_pkey_regs_restore(struct thread_struct *new_thread,
if (static_branch_likely(&pkey_disabled))
return;
- /*
- * TODO: Just set UAMOR to zero if @new_thread hasn't used any keys yet.
- */
if (old_thread->amr != new_thread->amr)
write_amr(new_thread->amr);
if (old_thread->iamr != new_thread->iamr)
@@ -305,9 +312,13 @@ void thread_pkey_regs_init(struct thread_struct *thread)
if (static_branch_likely(&pkey_disabled))
return;
- thread->amr = read_amr() & pkey_amr_uamor_mask;
- thread->iamr = read_iamr() & pkey_iamr_mask;
- thread->uamor = read_uamor() & pkey_amr_uamor_mask;
+ thread->amr = pkey_amr_mask;
+ thread->iamr = pkey_iamr_mask;
+ thread->uamor = pkey_uamor_mask;
+
+ write_uamor(pkey_uamor_mask);
+ write_amr(pkey_amr_mask);
+ write_iamr(pkey_iamr_mask);
}
static inline bool pkey_allows_readwrite(int pkey)
--
1.7.1
^ permalink raw reply related
* [PATCH 0/5] powerpc/pkeys: fixes to pkeys
From: Ram Pai @ 2018-06-05 2:09 UTC (permalink / raw)
To: mpe
Cc: linuxppc-dev, dave.hansen, aneesh.kumar, bsingharora, hbabu,
mhocko, bauerman, linuxram, Ulrich.Weigand, fweimer, luto
Assortment of fixes to pkey.
Patch 1 makes pkey consumable in multithreaded applications.
Patch 2 fixes fork behavior to inherit the key attributes.
Patch 3 A off-by-one bug made one key unusable. Fixes it.
Patch 4 Makes pkey-0 less special.
Ram Pai (5):
powerpc/pkeys: Enable all user-allocatable pkeys at init.
powerpc/pkeys: Save the pkey registers before fork
powerpc/pkeys: fix calculation of total pkeys.
powerpc/pkeys: Preallocate execute-only key
powerpc/pkeys: make protection key 0 less special
arch/powerpc/include/asm/pkeys.h | 29 ++++++++--
arch/powerpc/kernel/process.c | 1 +
arch/powerpc/mm/pkeys.c | 107 ++++++++++++++------------------------
3 files changed, 64 insertions(+), 73 deletions(-)
^ permalink raw reply
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: David Gibson @ 2018-06-05 1:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Michael S. Tsirkin, Anshuman Khandual, virtualization,
linux-kernel, linuxppc-dev, aik, robh, joe, elfring, jasowang,
mpe, hch
In-Reply-To: <d5df613d6347fe2f9bb6ea65bc6f6be05650ca6f.camel@kernel.crashing.org>
[-- Attachment #1: Type: text/plain, Size: 1720 bytes --]
On Mon, Jun 04, 2018 at 07:48:54PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2018-06-04 at 18:57 +1000, David Gibson wrote:
> >
> > > - First qemu doesn't know that the guest will switch to "secure mode"
> > > in advance. There is no difference between a normal and a secure
> > > partition until the partition does the magic UV call to "enter secure
> > > mode" and qemu doesn't see any of it. So who can set the flag here ?
> >
> > This seems weird to me. As a rule HV calls should go through qemu -
> > or be allowed to go directly to KVM *by* qemu.
>
> It's not an HV call, it's a UV call, qemu won't see it, qemu isn't
> trusted. Now the UV *will* reflect that to the HV via some synthetized
> HV calls, and we *could* have those do a pass by qemu, however, so far,
> our entire design doesn't rely on *any* qemu knowledge whatsoever and
> it would be sad to add it just for that purpose.
>
> Additionally, this is rather orthogonal, see my other email, the
> problem we are trying to solve is *not* a qemu problem and it doesn't
> make sense to leak that into qemu.
>
> > We generally reserve
> > the latter for hot path things. Since this isn't a hot path, having
> > the call handled directly by the kernel seems wrong.
> >
> > Unless a "UV call" is something different I don't know about.
>
> Yes, a UV call goes to the Ultravisor, not the Hypervisor. The
> Hypervisor isn't trusted.
Ah, right. Is that implemented in the host kernel, or in something
further above?
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]
^ permalink raw reply
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Michael S. Tsirkin @ 2018-06-05 1:25 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
aik, robh, joe, elfring, david, jasowang, mpe, hch
In-Reply-To: <6164442a718645a754879eac5c4c5fad9283c211.camel@kernel.crashing.org>
On Tue, Jun 05, 2018 at 09:26:56AM +1000, Benjamin Herrenschmidt wrote:
> I would like to keep however the ability to bypass the iommu for
> performance reasons
So that's easy, clear the IOMMU flag and this means "bypass the IOMMU".
--
MST
^ permalink raw reply
* Re: [RFC V2] virtio: Add platform specific DMA API translation for virito devices
From: Benjamin Herrenschmidt @ 2018-06-04 23:26 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Anshuman Khandual, virtualization, linux-kernel, linuxppc-dev,
aik, robh, joe, elfring, david, jasowang, mpe, hch
In-Reply-To: <20180604184035-mutt-send-email-mst@kernel.org>
On Mon, 2018-06-04 at 19:21 +0300, Michael S. Tsirkin wrote:
>
> > > > - First qemu doesn't know that the guest will switch to "secure mode"
> > > > in advance. There is no difference between a normal and a secure
> > > > partition until the partition does the magic UV call to "enter secure
> > > > mode" and qemu doesn't see any of it. So who can set the flag here ?
> > >
> > > The user should set it. You just tell user "to be able to use with
> > > feature X, enable IOMMU".
> >
> > That's completely backwards. The user has no idea what that stuff is.
> > And it would have to percolate all the way up the management stack,
> > libvirt, kimchi, whatever else ... that's just nonsense.
> >
> > Especially since, as I explained in my other email, this is *not* a
> > qemu problem and thus the solution shouldn't be messing around with
> > qemu.
>
> virtio is implemented in qemu though. If you prefer to stick
> all your code in either guest or the UV that's your decision
> but it looks like qemu could be helpful here.
Sorry Michael, that doesn't click. Yes of course virtio is implemented
in qemu, but the problem we are trying to solve is *not* a qemu problem
(the fact that the Linux drivers bypass the DMA API is wrong, needs
fixing, and isnt a qemu problem). The fact that the secure guests need
bounce buffering is not a qemu problem either.
Whether qemu chose to use an iommu or not is, and should remain an
orthogonal problem.
Forcing qemu to use the iommu to work around a linux side lack of
proper use of the DMA API is not only just papering over the problem,
it's also forcing changes up 3 or 4 levels of the SW stack to create
that new option that no user will understand the meaning of and that
would otherwise be unnecessary.
> For example what if you have a guest that passes physical addresses
> to qemu bypassing swiotlb? Don't you want to detect
> that and fail gracefully rather than crash the guest?
A guest bug then ? Well it wouldn't so much crash as force the pages to
become encrypted and cause horrible ping/pong between qemu and the
guest (the secure pages aren't accessible to qemu directly).
> That's what VIRTIO_F_IOMMU_PLATFORM will do for you.
Again this is orthogonal. Using an iommu will indeed provide a modicum
of protection against buggy drivers, like it does on HW PCI platforms,
whether those guests are secure or not.
Note however that in practice, we tend to disable the iommu even on
real HW whenever we want performance (of course we can't for guests but
for bare metal systems we do, the added RAS isn't worth the performance
lost for very fast networking for example).
> Still that's hypervisor's decision. What isn't up to the hypervisor is
> the way we structure code. We made an early decision to merge a hack
> with xen, among discussion about how with time DMA API will learn to
> support per-device quirks and we'll be able to switch to that.
> So let's do that now?
The DMA API itself isn't the one that needs to learn "per-device
quirks", it's just plumbing into arch backends. The "quirk" is at the
point of establishing the backend for a given device.
We can go a good way down that path simply by having virtio in Linux
start with putting *itself* its own direct ops in there when
VIRTIO_F_IOMMU_PLATFORM is not set, and removing all the special casing
in the rest of the driver.
Once that's done, we have a single point of establishing the dma ops,
we can quirk in there if needed, that's rather nicely contained, or put
an arch hook, or whatever is necessary.
I would like to keep however the ability to bypass the iommu for
performance reasons, and also because it's qemu default mode of
operation and my secure guest has no clean way to force qemu to turn
the iommu on. The hypervisor *could* return something to qemu when the
guest switch to secure as we do know that, and qemu could walk all of
it's virtio devices as a result and "switch" them over but that's
almost grosser from a qemu perspective.
.../...
> > The point is that requiring specific qemu command line arguments isn't
> > going to fly. We have additional problems due to the fact that our
> > firmware (SLOF) inside qemu doesn't currently deal with iommu's etc...
> > though those can be fixed.
> >
> > Overall, however, this seems to be the most convoluted way of achieving
> > things, require user interventions where none should be needed etc...
> >
> > Again, what's wrong with a 2 lines hook instead that solves it all and
> > completely avoids involving qemu ?
> >
> > Ben.
>
> That each platform wants to add hacks in this data path function.
Sure, then add a single platform hook and the platforms can do what
they want here.
But as I said, it should all be done at initialization time rather than
in the data path, this we absolutely agree. We should just chose the
right set of dma_ops, and have the data path always use the DMA API.
Cheers,
Ben.
> > >
> > > > >
> > > > >
> > > > > > arch/powerpc/include/asm/dma-mapping.h | 6 ++++++
> > > > > > arch/powerpc/platforms/pseries/iommu.c | 11 +++++++++++
> > > > > > drivers/virtio/virtio_ring.c | 10 ++++++++++
> > > > > > 3 files changed, 27 insertions(+)
> > > > > >
> > > > > > diff --git a/arch/powerpc/include/asm/dma-mapping.h b/arch/powerpc/include/asm/dma-mapping.h
> > > > > > index 8fa3945..056e578 100644
> > > > > > --- a/arch/powerpc/include/asm/dma-mapping.h
> > > > > > +++ b/arch/powerpc/include/asm/dma-mapping.h
> > > > > > @@ -115,4 +115,10 @@ extern u64 __dma_get_required_mask(struct device *dev);
> > > > > > #define ARCH_HAS_DMA_MMAP_COHERENT
> > > > > >
> > > > > > #endif /* __KERNEL__ */
> > > > > > +
> > > > > > +#define platform_forces_virtio_dma platform_forces_virtio_dma
> > > > > > +
> > > > > > +struct virtio_device;
> > > > > > +
> > > > > > +extern bool platform_forces_virtio_dma(struct virtio_device *vdev);
> > > > > > #endif /* _ASM_DMA_MAPPING_H */
> > > > > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > > > > index 06f0296..a2ec15a 100644
> > > > > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > > > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > > > > @@ -38,6 +38,7 @@
> > > > > > #include <linux/of.h>
> > > > > > #include <linux/iommu.h>
> > > > > > #include <linux/rculist.h>
> > > > > > +#include <linux/virtio.h>
> > > > > > #include <asm/io.h>
> > > > > > #include <asm/prom.h>
> > > > > > #include <asm/rtas.h>
> > > > > > @@ -1396,3 +1397,13 @@ static int __init disable_multitce(char *str)
> > > > > > __setup("multitce=", disable_multitce);
> > > > > >
> > > > > > machine_subsys_initcall_sync(pseries, tce_iommu_bus_notifier_init);
> > > > > > +
> > > > > > +bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > > > +{
> > > > > > + /*
> > > > > > + * On protected guest platforms, force virtio core to use DMA
> > > > > > + * MAP API for all virtio devices. But there can also be some
> > > > > > + * exceptions for individual devices like virtio balloon.
> > > > > > + */
> > > > > > + return (of_find_compatible_node(NULL, NULL, "ibm,ultravisor") != NULL);
> > > > > > +}
> > > > >
> > > > > Isn't this kind of slow? vring_use_dma_api is on
> > > > > data path and supposed to be very fast.
> > > > >
> > > > > > diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> > > > > > index 21d464a..47ea6c3 100644
> > > > > > --- a/drivers/virtio/virtio_ring.c
> > > > > > +++ b/drivers/virtio/virtio_ring.c
> > > > > > @@ -141,8 +141,18 @@ struct vring_virtqueue {
> > > > > > * unconditionally on data path.
> > > > > > */
> > > > > >
> > > > > > +#ifndef platform_forces_virtio_dma
> > > > > > +static inline bool platform_forces_virtio_dma(struct virtio_device *vdev)
> > > > > > +{
> > > > > > + return false;
> > > > > > +}
> > > > > > +#endif
> > > > > > +
> > > > > > static bool vring_use_dma_api(struct virtio_device *vdev)
> > > > > > {
> > > > > > + if (platform_forces_virtio_dma(vdev))
> > > > > > + return true;
> > > > > > +
> > > > > > if (!virtio_has_iommu_quirk(vdev))
> > > > > > return true;
> > > > > >
> > > > > > --
> > > > > > 2.9.3
^ permalink raw reply
* Re: [PATCH net-next] wan/fsl_ucc_hdlc: use dma_zalloc_coherent instead of allocator/memset
From: David Miller @ 2018-06-04 21:28 UTC (permalink / raw)
To: yuehaibing; +Cc: qiang.zhao, netdev, linux-kernel, linuxppc-dev
In-Reply-To: <20180604130759.25024-1-yuehaibing@huawei.com>
From: YueHaibing <yuehaibing@huawei.com>
Date: Mon, 4 Jun 2018 21:07:59 +0800
> Use dma_zalloc_coherent instead of dma_alloc_coherent
> followed by memset 0.
>
> Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Applied.
^ permalink raw reply
* Re: pkeys on POWER: Access rights not reset on execve
From: Florian Weimer @ 2018-06-04 21:00 UTC (permalink / raw)
To: Ram Pai; +Cc: Andy Lutomirski, Linux-MM, linuxppc-dev, Dave Hansen
In-Reply-To: <20180604190229.GB10088@ram.oc3035372033.ibm.com>
On 06/04/2018 09:02 PM, Ram Pai wrote:
> On Mon, Jun 04, 2018 at 07:57:46PM +0200, Florian Weimer wrote:
>> On 06/04/2018 04:01 PM, Ram Pai wrote:
>>> On Mon, Jun 04, 2018 at 12:12:07PM +0200, Florian Weimer wrote:
>>>> On 06/03/2018 10:18 PM, Ram Pai wrote:
>>>>> On Mon, May 21, 2018 at 01:29:11PM +0200, Florian Weimer wrote:
>>>>>> On 05/20/2018 09:11 PM, Ram Pai wrote:
>>>>>>> Florian,
>>>>>>>
>>>>>>> Does the following patch fix the problem for you? Just like x86
>>>>>>> I am enabling all keys in the UAMOR register during
>>>>>>> initialization itself. Hence any key created by any thread at
>>>>>>> any time, will get activated on all threads. So any thread
>>>>>>> can change the permission on that key. Smoke tested it
>>>>>>> with your test program.
>>>>>>
>>>>>> I think this goes in the right direction, but the AMR value after
>>>>>> fork is still strange:
>>>>>>
>>>>>> AMR (PID 34912): 0x0000000000000000
>>>>>> AMR after fork (PID 34913): 0x0000000000000000
>>>>>> AMR (PID 34913): 0x0000000000000000
>>>>>> Allocated key in subprocess (PID 34913): 2
>>>>>> Allocated key (PID 34912): 2
>>>>>> Setting AMR: 0xffffffffffffffff
>>>>>> New AMR value (PID 34912): 0x0fffffffffffffff
>>>>>> About to call execl (PID 34912) ...
>>>>>> AMR (PID 34912): 0x0fffffffffffffff
>>>>>> AMR after fork (PID 34914): 0x0000000000000003
>>>>>> AMR (PID 34914): 0x0000000000000003
>>>>>> Allocated key in subprocess (PID 34914): 2
>>>>>> Allocated key (PID 34912): 2
>>>>>> Setting AMR: 0xffffffffffffffff
>>>>>> New AMR value (PID 34912): 0x0fffffffffffffff
>>>>>>
>>>>>> I mean this line:
>>>>>>
>>>>>> AMR after fork (PID 34914): 0x0000000000000003
>>>>>>
>>>>>> Shouldn't it be the same as in the parent process?
>>>>>
>>>>> Fixed it. Please try this patch. If it all works to your satisfaction, I
>>>>> will clean it up further and send to Michael Ellermen(ppc maintainer).
>>>>>
>>>>>
>>>>> commit 51f4208ed5baeab1edb9b0f8b68d7144449b3527
>>>>> Author: Ram Pai <linuxram@us.ibm.com>
>>>>> Date: Sun Jun 3 14:44:32 2018 -0500
>>>>>
>>>>> Fix for the fork bug.
>>>>> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
>>>>
>>>> Is this on top of the previous patch, or a separate fix?
>>>
>>> top of previous patch.
>>
>> Thanks. With this patch, I get this on an LPAR:
>>
>> AMR (PID 1876): 0x0000000000000003
>> AMR after fork (PID 1877): 0x0000000000000003
>> AMR (PID 1877): 0x0000000000000003
>> Allocated key in subprocess (PID 1877): 2
>> Allocated key (PID 1876): 2
>> Setting AMR: 0xffffffffffffffff
>> New AMR value (PID 1876): 0x0fffffffffffffff
>> About to call execl (PID 1876) ...
>> AMR (PID 1876): 0x0000000000000003
>> AMR after fork (PID 1878): 0x0000000000000003
>> AMR (PID 1878): 0x0000000000000003
>> Allocated key in subprocess (PID 1878): 2
>> Allocated key (PID 1876): 2
>> Setting AMR: 0xffffffffffffffff
>> New AMR value (PID 1876): 0x0fffffffffffffff
>>
>> Test program is still this one:
>>
>> <https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-May/173198.html>
>>
>> So the process starts out with a different AMR value for some
>> reason. That could be a pre-existing bug that was just hidden by the
>> reset-to-zero on fork, or it could be intentional. But the kernel
>
> yes it is a bug, a patch for which is lined up for submission..
>
> The fix is
>
>
> commit eaf5b2ac002ad2f5bca118d7ce075ce28311aa8e
> Author: Ram Pai <linuxram@us.ibm.com>
> Date: Mon Jun 4 10:58:44 2018 -0500
>
> powerpc/pkeys: fix total pkeys calculation
>
> Total number of pkeys calculation is off by 1. Fix it.
>
> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Looks good to me now. Initial AMR value is zero, as is currently intended.
So the remaining question at this point is whether the Intel behavior
(default-deny instead of default-allow) is preferable.
But if you can get the existing fixes into 4.18 and perhaps the relevant
stable kernels, that would already be a great help for my glibc work.
Thanks,
Florian
^ permalink raw reply
* Re: [1/5] powerpc/embedded6xx: Remove C2K board support
From: Mark Greer @ 2018-06-04 20:50 UTC (permalink / raw)
To: Michael Ellerman
Cc: Mark Greer, Benjamin Herrenschmidt, Paul Mackerras, Remi Machet,
Dale Farnsworth, linuxppc-dev, linux-kernel
In-Reply-To: <40zxfb5H0xz9s2S@ozlabs.org>
On Tue, Jun 05, 2018 at 12:10:31AM +1000, Michael Ellerman wrote:
> On Fri, 2018-04-06 at 01:17:16 UTC, Mark Greer wrote:
> > The C2K platform appears to be orphaned so remove code supporting it.
> >
> > CC: Remi Machet <rmachet@nvidia.com>
> > Signed-off-by: Mark Greer <mgreer@animalcreek.com>
> > Acked-by: Remi Machet <remi@machet.us>
> > Signed-off-by: Mark Greer <mgreer@animalcreek.com>
>
> Series applied to powerpc next, thanks.
>
> https://git.kernel.org/powerpc/c/92c8c16f345759e87c5d5b771d438f
Thanks Michael.
Mark
--
^ permalink raw reply
* Re: pkeys on POWER: Access rights not reset on execve
From: Ram Pai @ 2018-06-04 19:02 UTC (permalink / raw)
To: Florian Weimer; +Cc: Andy Lutomirski, Linux-MM, linuxppc-dev, Dave Hansen
In-Reply-To: <f2f61c24-8e8f-0d36-4e22-196a2a3f7ca7@redhat.com>
On Mon, Jun 04, 2018 at 07:57:46PM +0200, Florian Weimer wrote:
> On 06/04/2018 04:01 PM, Ram Pai wrote:
> >On Mon, Jun 04, 2018 at 12:12:07PM +0200, Florian Weimer wrote:
> >>On 06/03/2018 10:18 PM, Ram Pai wrote:
> >>>On Mon, May 21, 2018 at 01:29:11PM +0200, Florian Weimer wrote:
> >>>>On 05/20/2018 09:11 PM, Ram Pai wrote:
> >>>>>Florian,
> >>>>>
> >>>>> Does the following patch fix the problem for you? Just like x86
> >>>>> I am enabling all keys in the UAMOR register during
> >>>>> initialization itself. Hence any key created by any thread at
> >>>>> any time, will get activated on all threads. So any thread
> >>>>> can change the permission on that key. Smoke tested it
> >>>>> with your test program.
> >>>>
> >>>>I think this goes in the right direction, but the AMR value after
> >>>>fork is still strange:
> >>>>
> >>>>AMR (PID 34912): 0x0000000000000000
> >>>>AMR after fork (PID 34913): 0x0000000000000000
> >>>>AMR (PID 34913): 0x0000000000000000
> >>>>Allocated key in subprocess (PID 34913): 2
> >>>>Allocated key (PID 34912): 2
> >>>>Setting AMR: 0xffffffffffffffff
> >>>>New AMR value (PID 34912): 0x0fffffffffffffff
> >>>>About to call execl (PID 34912) ...
> >>>>AMR (PID 34912): 0x0fffffffffffffff
> >>>>AMR after fork (PID 34914): 0x0000000000000003
> >>>>AMR (PID 34914): 0x0000000000000003
> >>>>Allocated key in subprocess (PID 34914): 2
> >>>>Allocated key (PID 34912): 2
> >>>>Setting AMR: 0xffffffffffffffff
> >>>>New AMR value (PID 34912): 0x0fffffffffffffff
> >>>>
> >>>>I mean this line:
> >>>>
> >>>>AMR after fork (PID 34914): 0x0000000000000003
> >>>>
> >>>>Shouldn't it be the same as in the parent process?
> >>>
> >>>Fixed it. Please try this patch. If it all works to your satisfaction,=
I
> >>>will clean it up further and send to Michael Ellermen(ppc maintainer).
> >>>
> >>>
> >>>commit 51f4208ed5baeab1edb9b0f8b68d7144449b3527
> >>>Author: Ram Pai <linuxram@us.ibm.com>
> >>>Date: Sun Jun 3 14:44:32 2018 -0500
> >>>
> >>> Fix for the fork bug.
> >>> Signed-off-by: Ram Pai <linuxram@us.ibm.com>
> >>
> >>Is this on top of the previous patch, or a separate fix?
> >
> >top of previous patch.
>=20
> Thanks. With this patch, I get this on an LPAR:
>=20
> AMR (PID 1876): 0x0000000000000003
> AMR after fork (PID 1877): 0x0000000000000003
> AMR (PID 1877): 0x0000000000000003
> Allocated key in subprocess (PID 1877): 2
> Allocated key (PID 1876): 2
> Setting AMR: 0xffffffffffffffff
> New AMR value (PID 1876): 0x0fffffffffffffff
> About to call execl (PID 1876) ...
> AMR (PID 1876): 0x0000000000000003
> AMR after fork (PID 1878): 0x0000000000000003
> AMR (PID 1878): 0x0000000000000003
> Allocated key in subprocess (PID 1878): 2
> Allocated key (PID 1876): 2
> Setting AMR: 0xffffffffffffffff
> New AMR value (PID 1876): 0x0fffffffffffffff
>=20
> Test program is still this one:
>=20
> <https://lists.ozlabs.org/pipermail/linuxppc-dev/2018-May/173198.html>
>=20
> So the process starts out with a different AMR value for some
> reason. That could be a pre-existing bug that was just hidden by the
> reset-to-zero on fork, or it could be intentional. But the kernel
yes it is a bug, a patch for which is lined up for submission..
The fix is
commit eaf5b2ac002ad2f5bca118d7ce075ce28311aa8e
Author: Ram Pai <linuxram@us.ibm.com>
Date: Mon Jun 4 10:58:44 2018 -0500
powerpc/pkeys: fix total pkeys calculation
=20=20=20=20
Total number of pkeys calculation is off by 1. Fix it.
=20=20=20=20
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
index 4530cdf..3384c4e 100644
--- a/arch/powerpc/mm/pkeys.c
+++ b/arch/powerpc/mm/pkeys.c
@@ -93,7 +93,7 @@ int pkey_initialize(void)
* arch-neutral code.
*/
pkeys_total =3D min_t(int, pkeys_total,
- (ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT));
+ ((ARCH_VM_PKEY_FLAGS >> VM_PKEY_SHIFT)+1));
=20
if (!pkey_mmu_enabled() || radix_enabled() || !pkeys_total)
static_branch_enable(&pkey_disabled);
^ permalink raw reply related
page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox